How long (in cycles) will the following loop take to execute per iteration?
Question
How long (in cycles) will the following loop take to execute per iteration?
Assume the processor uses the 5-cycle memory pipeline from the text (FET/DEC/EXE/MEM/WB)
with perfect branch prediction and full forwarding. Further assume that the
processor can decode and issue 2 instructions/cycle, that there is a single add/sub ALU,
a single load/store ALU and a multiply ALU. The add/sub ALU also processes branches.
The load/store ALU requires 1 extra cycle in the MEM stage to perform a load or store.
The multiply ALU is fully pipelined and requires 3 cycles to complete a multiplication.
Also assume that the processor supports out-of-order execution with register renaming.
Be sure to show how you obtained your answer, at least including a schedule of stages
by instruction and cycle on the next page.
LOOP:
MUL R2,R1,R0
ADD R3,R2,R0
MUL R4,R5,R1
SUB R6,R7,R8
ADD R9,R6,R1
ST R6,0(R10)
ADDI R10,R10,#4
BNE R10,R11,LOOP
Explanation / Answer
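For the loop in the question, a useful first step is to list the read-after-write dependences and compute a resource lower bound on cycles per iteration. The Python sketch below does that under stated assumptions: the table `LOOP`, the unit names `alu`/`mul`/`mem`, and the helper functions are hypothetical names introduced here, and the store is assumed to occupy the load/store unit for 2 cycles (1 MEM cycle plus the 1 extra cycle). It gives only a bound, not the stage-by-stage schedule the question asks for.

```python
from math import ceil

# (opcode, destination, sources, functional unit) for each instruction.
# ST and BNE write no register. ADDI and BNE are assumed to use the add/sub ALU.
LOOP = [
    ("MUL",  "R2",  ["R1", "R0"],   "mul"),
    ("ADD",  "R3",  ["R2", "R0"],   "alu"),
    ("MUL",  "R4",  ["R5", "R1"],   "mul"),
    ("SUB",  "R6",  ["R7", "R8"],   "alu"),
    ("ADD",  "R9",  ["R6", "R1"],   "alu"),
    ("ST",   None,  ["R6", "R10"],  "mem"),
    ("ADDI", "R10", ["R10"],        "alu"),
    ("BNE",  None,  ["R10", "R11"], "alu"),
]

def raw_dependences(loop):
    """Return (producer, consumer, register) triples within one iteration."""
    last_writer = {}
    deps = []
    for i, (_, dest, srcs, _) in enumerate(loop):
        for reg in srcs:
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if dest is not None:
            last_writer[dest] = i
    return deps

def resource_lower_bound(loop, issue_width=2, mem_occupancy=2):
    """Cycles per iteration that each shared resource needs at minimum."""
    counts = {}
    for _, _, _, unit in loop:
        counts[unit] = counts.get(unit, 0) + 1
    return {
        "issue": ceil(len(loop) / issue_width),      # 2 instructions issued per cycle
        "alu": counts.get("alu", 0),                 # single add/sub ALU, 1 op per cycle
        "mul": counts.get("mul", 0),                 # fully pipelined, 1 new multiply per cycle
        "mem": counts.get("mem", 0) * mem_occupancy, # assumption: store holds the unit 2 cycles
    }

deps = raw_dependences(LOOP)
bounds = resource_lower_bound(LOOP)
print(deps)
print(bounds, "lower bound:", max(bounds.values()))
```

Under these assumptions the single add/sub ALU is the tightest resource: it must handle ADD, SUB, ADD, ADDI, and BNE, so no schedule can sustain fewer than 5 cycles per iteration. Whether 5 is actually reached in steady state still has to be confirmed by writing out the stage schedule.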
Loop: L.D F0, 0(R1) ; F0 = array element
ADD.D F4, F0, F2 ; add scalar
S.D F4, 0(R1) ; store result
DADDUI R1, R1, #-8 ; decrement address pointer
BNE R1, R2, Loop ; branch if R1 != R2
NOP
Loop: L.D F0, 0(R1) ; F0 = array element
stall ; wait for the loaded F0
ADD.D F4, F0, F2 ; add scalar
stall ; wait for F4 from ADD.D
stall
S.D F4, 0(R1) ; store result
DADDUI R1, R1, #-8 ; decrement address pointer
stall ; wait for the updated R1
BNE R1, R2, Loop ; branch if R1 != R2
stall ; branch delay slot
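The cycle count for the stall-annotated listing above can be tallied mechanically. The Python sketch below assumes a single-issue pipeline in which every line of the listing, instruction or stall, takes one cycle; `LISTING` and `count_slots` are hypothetical names used only for illustration.

```python
# The stall-annotated loop body, one issue slot per line (assumption: single issue).
LISTING = """\
L.D F0, 0(R1)
stall
ADD.D F4, F0, F2
stall
stall
S.D F4, 0(R1)
DADDUI R1, R1, #-8
stall
BNE R1, R2, Loop
stall
"""

def count_slots(listing):
    """Return (instruction_count, stall_count) for a one-slot-per-line listing."""
    lines = [ln.strip() for ln in listing.splitlines() if ln.strip()]
    stalls = sum(1 for ln in lines if ln.lower() == "stall")
    return len(lines) - stalls, stalls

instrs, stalls = count_slots(LISTING)
print(instrs, stalls, instrs + stalls)
```

With 5 instructions and 5 stalls, this listing takes 10 cycles per iteration under the one-slot-per-line assumption.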