Recap (Pipelining) - PowerPoint PPT Presentation

recap pipelining l.
Skip this Video
Loading SlideShow in 5 Seconds..
Recap (Pipelining) PowerPoint Presentation
Recap (Pipelining)

play fullscreen
1 / 31
Download Presentation
Views
Download Presentation

Recap (Pipelining)

Presentation Transcript

  1. Recap(Pipelining)

  2. What is Pipelining? • A way of speeding up execution of tasks • Key idea: overlap execution of multiple taks

  3. Automobile Manufacturing 1. Build frame. 60 min. 2. Add engine. 50 min. 3. Build body. 80 min. 4. Paint. 40 min. 5. Finish. 45 min. 275 min. Latency: Time from start to finish for one car. 275 minutes per car. (smaller is better) Throughput: Number of finished cars per time unit. 1 car/275 min = 0.218 cars/hour (larger is better) Issues: How can we make the process better by adding?

  4. 3 4 1 2 2 3 4 1 time 4 1 2 3 3 1 4 2 4 3 2 1 An Assembly line 80 80 60 80 50 80 80 40 45 Last two stages only receive onecar/80 min to work on. Latency: 400 min/car Throughput: 4 cars/640 min (1 car/160 min) Will approach 1 car/80 min as time goes on First two stagescan’t produce faster thanone car/80 min or a backlog will occurat third stage.

  5. 1ns 200ps 200ps 200ps 200ps 200ps Pipeline Register Pipelining a Digital System • Key idea: break big computation up into piecesSeparate each piece with a pipeline register

  6. Non-pipelined: 1 operation finishes every 1ns 1ns Pipelined: 1 operation finishes every 200ps 200ps 200ps 200ps 200ps 200ps Pipelining a Digital System • Why do this? Because it's faster for repeated computations

  7. Comments about pipelining • Pipelining increases throughput, but not latency • Answer available every 200ps, BUT • A single computation still takes 1ns • Limitations: • Computations must be divisible into stages of equal sizes • Pipeline registers add overhead

  8. 30ns 3ns Comb. Logic R E G Clock Another Example Unpipelined System • One operation must complete before next can begin • Operations spaced 33ns apart Delay = 33ns Throughput = 30MHz Op1 Op2 Op3 ?? Time

  9. 10ns 3ns 10ns 3ns 10ns 3ns Comb. Logic R E G Comb. Logic R E G Comb. Logic R E G Clock 3 Stage Pipelining • Space operations 13ns apart • 3 operations occur simultaneously Delay = 39ns Throughput = 77MHz Op1 Op2 Op3 Op4 Time

  10. 5ns 3ns 15ns 3ns 10ns 3ns Com. Log. R E G Comb. Logic R E G Comb. Logic R E G Limitation: Nonuniform Pipelining • Throughput limited by slowest stage • Delay determined by clock period * number of stages • Must attempt to balance stages Delay = 18 * 3 = 54 ns Throughput = 55MHz Clock

  11. 5ns 3ns 5ns 3ns 5ns 3ns 5ns 3ns 5ns 3ns 5ns 3ns Com. Log. R E G Com. Log. R E G Com. Log. R E G Com. Log. R E G Com. Log. R E G Com. Log. R E G Clock Limitation: Deep Pipelines • Diminishing returns as add more pipeline stages • Register delays become limiting factor • Increased latency • Small throughput gains • More hazards Delay = 48ns, Throughput = 128MHz

  12. MIPSPipelining

  13. MIPS 5-stage pipeline • The MIPS processor needs 5 stages to execute instructions • Pipelining stages: • IF - Instruction Fetch • ID - Instruction Decode • EX - Execute / Address Calculation • MEM - Memory Access (read / write) • WB - Write Back (results into register file) • Not all instructions need all the stages (e.g., add instruction does not need the MEM stage)

  14. Pipeline Registers Basic MIPS Pipelined Processor IF/ID ID/EX EX/MEM MEM/WB

  15. Consider the following instruction sequence: lw $r0, 10($r1) sw $sr3, 20($r4) add $r5, $r6, $r7 sub $r8, $r9, $r10 Pipelined Example - Executing Multiple Instructions

  16. Executing Multiple InstructionsClock Cycle 1 LW

  17. Executing Multiple InstructionsClock Cycle 2 SW LW

  18. Executing Multiple InstructionsClock Cycle 3 ADD SW LW

  19. Executing Multiple InstructionsClock Cycle 4 ADD SW LW SUB

  20. Executing Multiple InstructionsClock Cycle 5 SUB ADD SW LW

  21. Executing Multiple InstructionsClock Cycle 6 SUB ADD SW

  22. Executing Multiple InstructionsClock Cycle 7 ADD SUB

  23. Executing Multiple InstructionsClock Cycle 8 SUB

  24. Alternative View - Multicycle Diagram

  25. Processor Pipelining • There are two ways that pipelining can help: • Reduce the clock cycle time, and keep the same CPI • Reduce the CPI, and keep the same clock cycle time CPU time = Instruction count * CPI * Clock cycle time

  26. Reduce the clock cycle time, and keep the same CPI CPI = 1 Clock = X Hz

  27. Pipeline Registers ADD ADD M U X M E U X X T N D Reduce the clock cycle time, and keep the same CPI CPI = 1 Clock = X*5 Hz 4 PC <<2 Instruction I ADDR RD 32 32 16 5 5 5 Instruction Memory RN1 RN2 WN RD1 Register File ALU WD RD2 ADDR Data RD Memory 16 32 WD

  28. Reduce the CPI, and keep the same cycle time CPI = 5 Clock = X*5 Hz

  29. Pipeline Registers ADD ADD 4 PC <<2 Instruction I ADDR RD 32 32 16 5 5 5 Instruction Memory RN1 RN2 WN RD1 Register File ALU WD RD2 M ADDR U X Data RD Memory M E U X 16 32 X WD T N D Reduce the CPI, and keep the same cycle time CPI = 1 Clock = X*5 Hz

  30. Pipeline performance • Ideally we get a speedup (by reducing clock cycle or reducing the CPI) equal to the number of stages. • In practice, we do not achieve that – but we get close: • Pipelining has additional overhead (e.g., pipeline registers) • Pipeline hazards

  31. Pipeline Hazards • Hazards are situations in pipelining which prevent the next instruction in the instruction stream from executing during the designated clock cycle. • Hazards reduce the ideal speedup gained from pipelining (e.g., CPI =1) and are classified into three classes: • Structural hazards • Data hazards • Control hazards