
Use the following code fragment: Start: LD R7, 0(R9) DADDI R7,R7,#1 SD R7,0(R9)


Question

Use the following code fragment:

Start:

LD R7, 0(R9)

DADDI R7,R7,#1

SD R7,0(R9)

DADDI R1,R1,#4

DMULT R9,R9,R7

DSUB R5,R1,R9

BNEZ R5, Start

Assume that the initial value of R3 is R2 + 396.

a. [15] Data hazards are caused by data dependences in the code. Whether a dependency causes a hazard depends on the machine implementation (i.e., number of pipeline stages). List all of the data dependences in the code above. Record the register, source instruction, and destination instruction; for example, there is a data dependency for register R1 from the LD to the DADDI.

b. [15] Show the timing of this instruction sequence for the 5-stage RISC pipeline without any forwarding or bypassing hardware but assuming that a register read and a write in the same clock cycle “forwards” through the register file, as shown in Figure C.6. Use a pipeline timing chart like that in Figure C.5. Assume that the branch is handled by flushing the pipeline. If all memory references take 1 cycle, how many cycles does this loop take to execute?

c. [15] Show the timing of this instruction sequence for the 5-stage RISC pipeline with full forwarding and bypassing hardware. Use a pipeline timing chart like that shown in Figure C.5. Assume that the branch is handled by predicting it as not taken. If all memory references take 1 cycle, how many cycles does this loop take to execute?

d. [15] Show the timing of this instruction sequence for the 5-stage RISC pipeline with full forwarding and bypassing hardware. Use a pipeline timing chart like that shown in Figure C.5. Assume that the branch is handled by predicting it as taken. If all memory references take 1 cycle, how many cycles does this loop take to execute?

e. [25] High-performance processors have very deep pipelines—more than 15 stages. Imagine that you have a 10-stage pipeline in which every stage of the 5-stage pipeline has been split in two. The only catch is that, for data forwarding, data are forwarded from the end of a pair of stages to the beginning of the two stages where they are needed. For example, data are forwarded from the output of the second execute stage to the input of the first execute stage, still causing a 1-cycle delay. Show the timing of this instruction sequence for the 10-stage RISC pipeline with full forwarding and bypassing hardware. Use a pipeline timing chart like that shown in Figure C.5. Assume that the branch is handled by predicting it as taken. If all memory references take 1 cycle, how many cycles does this loop take to execute?

f. [10] Assume that in the 5-stage pipeline the longest stage requires 0.8 ns, and the pipeline register delay is 0.1 ns. What is the clock cycle time of the 5-stage pipeline? If the 10-stage pipeline splits all stages in half, what is the cycle time of the 10-stage machine?

g. [15] Using your answers from parts (d) and (e), determine the cycles per instruction (CPI) for the loop on a 5-stage pipeline and a 10-stage pipeline. Make sure you count only from when the first instruction reaches the write-back stage to the end. Do not count the start-up of the first instruction. Using the clock cycle time calculated in part (f), calculate the average instruction execute time for each machine. (Computer Architecture: A Quantitative Approach, Fifth Edition, by John L. Hennessy)

Explanation / Answer

This question has multiple subparts, so I will answer the first few; please post the rest as a separate question. There also seems to be a problem with the given values: the question gives initial values for R2 and R3, but neither register is used in this sequence.

a)

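Besides the dependence on R7 from the LD to the first DADDI (the counterpart of the R1 example in the question), two dependences between adjacent instructions stand out:

DADDI R7,R7,#1

SD R7,0(R9)

There is a dependence on R7 here: DADDI writes R7, and the very next instruction, SD, reads it as the value to store.

The second one is in

DMULT R9,R9,R7

DSUB R5,R1,R9

Here DMULT writes R9, and the next instruction, DSUB, reads it as a source.

The same rule (a register written by one instruction and read by a later one) gives three more dependences in the fragment: R7 from the first DADDI to the DMULT, R1 from the second DADDI to the DSUB, and R5 from the DSUB to the BNEZ.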
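The dependences in part (a) can also be checked mechanically. Below is a minimal Python sketch, not part of the original exercise: the PROGRAM table and the raw_dependences helper are names made up for illustration, and the store is read as SD R7,0(R9). It remembers the last writer of each register and reports every later read of that register. It only looks at one pass through the loop, so loop-carried dependences (for example R9 from the DMULT back to the LD of the next iteration) are not listed.

```python
# Each entry: (mnemonic, destination register or None, source registers).
# SD and BNEZ write no register; SD reads both its data (R7) and its base (R9).
PROGRAM = [
    ("LD",    "R7", ["R9"]),
    ("DADDI", "R7", ["R7"]),
    ("SD",    None, ["R7", "R9"]),
    ("DADDI", "R1", ["R1"]),
    ("DMULT", "R9", ["R9", "R7"]),
    ("DSUB",  "R5", ["R1", "R9"]),
    ("BNEZ",  None, ["R5"]),
]


def raw_dependences(program):
    """Return (register, producer index, consumer index) for one pass of the code."""
    last_writer = {}  # register -> index of the most recent instruction writing it
    deps = []
    for i, (_, dest, sources) in enumerate(program):
        # Sources are checked before the destination is recorded, so an
        # instruction such as DADDI R7,R7,#1 does not depend on itself.
        for reg in sources:
            if reg in last_writer:
                deps.append((reg, last_writer[reg], i))
        if dest is not None:
            last_writer[dest] = i
    return deps


for reg, src, dst in raw_dependences(PROGRAM):
    print(f"{reg}: {PROGRAM[src][0]} (#{src + 1}) -> {PROGRAM[dst][0]} (#{dst + 1})")
```

Run as is, it lists six register dependences: three on R7 and one each on R1, R9 and R5.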

b)

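With no data dependences, the 7 instructions would take 11 cycles on the 5-stage pipeline: 7 cycles to issue them, plus 4 more for the last one to drain. Without forwarding, an instruction that immediately follows the one producing its operand has to wait in ID until the producer reaches WB, which costs 2 stall cycles. That happens four times in this code: LD to DADDI (R7), DADDI to SD (R7), DMULT to DSUB (R9), and DSUB to BNEZ (R5). So one pass through the code takes 11 + 8 = 19 cycles, not counting whatever is lost to flushing the pipeline after the branch.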
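To sanity-check the count in (b), here is a rough Python sketch of the no-forwarding timing. It is an illustration under stated assumptions, not the textbook's method: every instruction (including SD and BNEZ) reads its registers in ID, a register written in WB can be read in ID in that same cycle, DMULT is treated as a single-cycle EX, and the branch flush is ignored. PROGRAM, id_cycles_no_forwarding and total_cycles are hypothetical names chosen for this example.

```python
# Each entry: (mnemonic, destination register or None, source registers).
PROGRAM = [
    ("LD",    "R7", ["R9"]),
    ("DADDI", "R7", ["R7"]),
    ("SD",    None, ["R7", "R9"]),
    ("DADDI", "R1", ["R1"]),
    ("DMULT", "R9", ["R9", "R7"]),
    ("DSUB",  "R5", ["R1", "R9"]),
    ("BNEZ",  None, ["R5"]),
]


def id_cycles_no_forwarding(program):
    """Cycle in which each instruction is in ID (5 stages, no forwarding)."""
    wb_cycle = {}   # register -> cycle in which its producer is in WB
    id_cycles = []
    prev_id = 1     # makes the first instruction decode in cycle 2 (IF is cycle 1)
    for _, dest, sources in program:
        earliest = prev_id + 1  # in-order pipeline: one instruction in ID per cycle
        for reg in sources:
            # The operand is readable in the same cycle the producer writes it back.
            earliest = max(earliest, wb_cycle.get(reg, 0))
        id_cycles.append(earliest)
        if dest is not None:
            wb_cycle[dest] = earliest + 3  # ID -> EX -> MEM -> WB
        prev_id = earliest
    return id_cycles


def total_cycles(program):
    """Cycle in which the last instruction completes WB."""
    return id_cycles_no_forwarding(program)[-1] + 3


print(id_cycles_no_forwarding(PROGRAM))
print(total_cycles(PROGRAM))
```

Under these assumptions the sketch reports 19 cycles for one pass: the 11-cycle baseline plus 2 stall cycles for each of the four back-to-back dependences (LD to DADDI, DADDI to SD, DMULT to DSUB, DSUB to BNEZ).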

c)

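With forwarding and bypassing, the hazards between ALU instructions disappear, because a result can be passed straight from one instruction's EX output to the next instruction's EX input. The one case forwarding cannot hide is the load: LD only has its value at the end of MEM, so the DADDI right after it still stalls for 1 cycle. That gives 11 + 1 = 12 cycles for one pass if the branch reads R5 in EX. If the branch is resolved in ID instead, the dependence on R5 from DSUB to BNEZ adds one more stall, for 13 cycles.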
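The forwarding case in (c) can be sketched the same way. This is a simplified Python model under assumptions that are mine, not the textbook's: an ALU result can be forwarded at the end of EX, a loaded value at the end of MEM, DMULT takes a single EX cycle, SD is treated as needing its operands in EX, and branch misprediction cost is ignored. The branch_in_id flag is a made-up switch for whether BNEZ needs R5 in ID or in EX; PROGRAM and total_cycles_forwarding are likewise illustrative names.

```python
# Each entry: (mnemonic, destination register or None, source registers).
PROGRAM = [
    ("LD",    "R7", ["R9"]),
    ("DADDI", "R7", ["R7"]),
    ("SD",    None, ["R7", "R9"]),
    ("DADDI", "R1", ["R1"]),
    ("DMULT", "R9", ["R9", "R7"]),
    ("DSUB",  "R5", ["R1", "R9"]),
    ("BNEZ",  None, ["R5"]),
]


def total_cycles_forwarding(program, branch_in_id=False):
    """Cycle in which the last instruction completes WB, with full forwarding."""
    avail = {}    # register -> cycle at whose end its value can be forwarded
    prev_ex = 2   # makes the first instruction execute in cycle 3 (IF=1, ID=2)
    for name, dest, sources in program:
        ex = prev_ex + 1  # in-order pipeline: one instruction in EX per cycle
        # A branch resolved in ID needs its operand one cycle before its EX slot.
        lead = 2 if (name == "BNEZ" and branch_in_id) else 1
        for reg in sources:
            if reg in avail:
                ex = max(ex, avail[reg] + lead)
        if dest is not None:
            # Loads produce their value in MEM, one cycle after EX.
            avail[dest] = ex + 1 if name == "LD" else ex
        prev_ex = ex
    return prev_ex + 2  # MEM and WB follow EX


print(total_cycles_forwarding(PROGRAM))                     # branch reads R5 in EX
print(total_cycles_forwarding(PROGRAM, branch_in_id=True))  # branch reads R5 in ID
```

With these assumptions the model gives 12 cycles when the branch uses R5 in EX (only the load-use stall remains) and 13 when the branch is resolved in ID.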
