Question
(a) Fill in the pipeline diagram below for the following code, assuming it is running on the pipelined machine with full forwarding support. Make sure to denote stalls and use arrows to show forwarding. You can refer to Fig 4.60 for the pipelined processor overview, with both the hazard detection unit and the forwarding unit.
    lw   $t0, 0($s0)
    lw   $t1, 0($t0)
    addi $t0, $t0, 4
    addi $t1, $t1, 4
    sw   $t1, 0($s0)
(b) Fill in the pipeline diagram below for the code in (a), assuming a new pipelined machine (with full forwarding support). This new pipeline breaks MEM into two stages (presuming that was the bottleneck). So a memory operation (e.g., lw) will start memory access in M1 (still having calculated the address in EX) but does not complete the operation until the end of M2. Make sure to denote stalls and use arrows to show forwarding.
(c) If breaking the MEM stage into two as described in (b) reduces the cycle time from 260ns to 240ns, by how much did the performance of the code in (a) improve?
Explanation / Answer
I was not able to upload the solution image, so the diagrams are written out as text below, with forwarding arrows listed under each diagram.
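a. The second lw needs $t0 from the first lw. This is a load-use hazard: even with full forwarding the loaded value only exists at the end of MEM, so the hazard detection unit inserts one stall. Every other dependence is covered by forwarding. Every instruction passes through all five stages (addi does nothing in M, sw does nothing in WB). The pipeline diagram is as follows:
INSTRUCTION        C1    C2    C3    C4    C5    C6    C7    C8    C9    C10
lw   $t0, 0($s0)   IF    ID    EX    M     WB
lw   $t1, 0($t0)         IF    ID    stall EX    M     WB
addi $t0, $t0, 4               IF    stall ID    EX    M     WB
addi $t1, $t1, 4                           IF    ID    EX    M     WB
sw   $t1, 0($s0)                                 IF    ID    EX    M     WB
Forwarding arrows:
lw $t0 (MEM/WB, end of C4) -> EX of lw $t1 (C5)
lw $t1 (MEM/WB, end of C6) -> EX of addi $t1 (C7)
addi $t1 (EX/MEM, end of C7) -> EX of sw (C8), as the store data
addi $t0 needs no forwarding: it reads $t0 in ID during C5, the same cycle the first lw writes it back (assuming the register file writes in the first half of the cycle and reads in the second half).
The code therefore completes in C10: 5 instructions + 4 cycles to fill the pipeline + 1 stall.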
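b. With MEM split into M1 and M2, a loaded value is not available until the end of M2, so an instruction that uses it in EX must wait one cycle longer than before. The second lw now stalls two cycles on $t0, and addi $t1 stalls one cycle on $t1 even though addi $t0 sits between it and the load.
INSTRUCTION        C1    C2    C3    C4    C5    C6    C7    C8    C9    C10   C11   C12   C13
lw   $t0, 0($s0)   IF    ID    EX    M1    M2    WB
lw   $t1, 0($t0)         IF    ID    stall stall EX    M1    M2    WB
addi $t0, $t0, 4               IF    stall stall ID    EX    M1    M2    WB
addi $t1, $t1, 4                                 IF    ID    stall EX    M1    M2    WB
sw   $t1, 0($s0)                                       IF    stall ID    EX    M1    M2    WB
Forwarding arrows:
lw $t0 (end of M2, C5) -> EX of lw $t1 (C6)
lw $t1 (end of M2, C8) -> EX of addi $t1 (C9)
addi $t1 (end of EX, C9) -> EX of sw (C10), as the store data
addi $t0 again reads $t0 from the register file in ID (C6), the cycle in which the first lw writes it back.
The code completes in C13: 5 instructions + 5 cycles to fill the deeper pipeline + 3 stalls. The shorter cycle time does not come for free; the deeper pipeline costs extra stall and fill cycles.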
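The stall counts for both pipelines can be cross-checked with a small scheduling sketch. This is only a simplified model, not the textbook's hazard detection logic: it assumes in-order issue of one instruction per cycle, full forwarding into EX, ALU results that can be forwarded at the end of EX, load results that can be forwarded at the end of the last memory stage, and store data that is needed in EX. The names PROGRAM, ex_cycles, total_cycles and stall_cycles are made up for this illustration.

```python
# Each instruction: (text, kind, destination register or None, source registers).
PROGRAM = [
    ("lw   $t0, 0($s0)", "load",  "$t0", ["$s0"]),
    ("lw   $t1, 0($t0)", "load",  "$t1", ["$t0"]),
    ("addi $t0, $t0, 4", "alu",   "$t0", ["$t0"]),
    ("addi $t1, $t1, 4", "alu",   "$t1", ["$t1"]),
    ("sw   $t1, 0($s0)", "store", None,  ["$t1", "$s0"]),
]


def ex_cycles(program, mem_stages):
    """Return the cycle in which each instruction occupies EX.

    mem_stages is 1 for the IF/ID/EX/M/WB pipeline and 2 when MEM is
    split into M1 and M2 (assumption: nothing else about the machine changes).
    """
    ready = {}   # register -> cycle at whose end its newest value can be forwarded
    cycles = []
    for _text, kind, dest, sources in program:
        # In-order issue: EX one cycle after the previous instruction's EX.
        # The first instruction has IF in cycle 1, ID in 2, EX in 3.
        earliest = cycles[-1] + 1 if cycles else 3
        for reg in sources:
            if reg in ready:
                # A forwarded value can be used in the cycle after it is produced.
                earliest = max(earliest, ready[reg] + 1)
        cycles.append(earliest)
        if dest is not None:
            # Loads produce their value at the end of the last memory stage,
            # ALU instructions at the end of EX.
            ready[dest] = earliest + (mem_stages if kind == "load" else 0)
    return cycles


def total_cycles(program, mem_stages):
    """Cycle in which the last instruction finishes WB."""
    return ex_cycles(program, mem_stages)[-1] + mem_stages + 1


def stall_cycles(program, mem_stages):
    """Cycles beyond the ideal count of instructions + depth - 1."""
    depth = 3 + mem_stages + 1   # IF, ID, EX, memory stage(s), WB
    return total_cycles(program, mem_stages) - (len(program) + depth - 1)


if __name__ == "__main__":
    print(total_cycles(PROGRAM, 1), stall_cycles(PROGRAM, 1))   # 10 1
    print(total_cycles(PROGRAM, 2), stall_cycles(PROGRAM, 2))   # 13 3
```

Under these assumptions the five-stage pipeline needs 10 cycles with 1 stall, and the pipeline with M1/M2 needs 13 cycles with 3 stalls. A machine that forwards store data into the memory stage instead of EX would give the same counts for this code, because the sw never has to wait.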
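c. The performance of this code does not improve; it gets worse. Each cycle is 20ns shorter, but the two machines do not take the same number of cycles. The original pipeline needs 10 cycles (5 instructions + 4 fill cycles + 1 load-use stall), so it takes 10 x 260ns = 2600ns. The pipeline with M1 and M2 needs 13 cycles (5 instructions + 5 fill cycles + 3 stalls), so it takes 13 x 240ns = 3120ns. The speedup is 2600 / 3120 = 0.83, i.e. the code runs 520ns (20%) longer on the new machine.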
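The arithmetic for part c can be written out as a short sketch. The cycle counts used here (10 for the five-stage pipeline, 13 for the pipeline with M1 and M2) are assumptions taken from counting instructions, fill cycles and stalls under full forwarding; the cycle times are the ones given in the question. The function name execution_time_ns is made up for this example.

```python
def execution_time_ns(cycles, cycle_time_ns):
    """Execution time = number of cycles x cycle time."""
    return cycles * cycle_time_ns


# Assumed cycle counts: 5 instructions + 4 fill + 1 stall, and 5 + 5 fill + 3 stalls.
old_time = execution_time_ns(10, 260)   # original five-stage pipeline
new_time = execution_time_ns(13, 240)   # pipeline with MEM split into M1 and M2

speedup = old_time / new_time                            # below 1 means slower
extra_percent = 100 * (new_time - old_time) / old_time   # extra run time in percent

if __name__ == "__main__":
    print(old_time, new_time)       # 2600 3120
    print(round(speedup, 3))        # 0.833
    print(extra_percent)            # 20.0
```

A speedup below 1 means the shorter cycle time is outweighed by the extra cycles for this particular instruction sequence; code with fewer load-use dependences could come out differently.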