Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

please show your work Problem 2: consider the following code segment in MIPS 24

ID: 656083 • Letter: P

Question

please show your work

Problem 2: consider the following code segment in MIPS 24 points add $t3, St1, St2 Sub St3, $t1, St2. beq $S3, SS5, Load lw $t4, 8($t0) add $t5, $17, $t6 Load: LW $S5, 16 (SS0) a. Show what happens when the branch is not taken in this instruction sequence, assuming the pipeline is designed by assuming the branch will not be taking. b. Show what happens when the branch is taken in this instruction sequence, assuming the pipeline is designed by assuming the branch will not be taken. c. Show what happens when the branch is not taken in this instruction sequence, assuming the pipeline is designed by assuming always stall.

Explanation / Answer

Use the following code segment:

loop:

LW

R1, 0(R2)

ADDI

R1,R1,#1

SW

R1, 0(R2)

ADDI

R2, R2, #4

SUB

R4, R3, R2

BNEZ

R4, loop

Assume that the initial value of R3 is R2+396.

Throughout this Exercise use the DLX integer pipeline and assume all memory accesses are cache hits.

a) Show the timing of this instruction sequence for the DLX pipeline without any forwarding but assuming a register read and a write in the same clock cycle "forwards" through the register file. Assume that a branch is handled by flushing the pipeline. If the memory references hit in the cache, how many cycles does this loop take to execute?

Solution:

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

l:

LW

R1,0(R2)

IF

ID

EX

ME

WB

ADDI

R1,R1,#1

IF

stall

stall

ID

EX

ME

WB

SW

R1, 0(R2)

IF

stall

stall

ID

EX

ME

WB

ADDI

R2,R2,#4

IF

ID

EX

ME

WB

SUB

R4,R3,R2

IF

stall

stall

ID

EX

ME

WB

BNEZ

R4, l

IF

stall

stall

ID

EX

MEM

WB

IF

stall

stall

IF

Number of steps = (396 / 4) + 1 = 100

Clock Cycles to Execute the loop = Number of steps * Number of Clock Cycles per step

= 100 * 17 = 1700

b) Show the timing of this instruction sequence for the DLX pipeline with normal forwarding hardware. Assume that the branch is handled by predicting it as not taken. If the memory references hit in the cache, how many cycles does this loop take to execute?

Solution:

1

2

3

4

5

6

7

8

l:

LW

R1,0(R2)

IF

ID

EX

ME

WB

ADDI

R1,R1,#1

  

IF

ID

stall

EX

ME

WB

SW

R1, 0(R2)

IF

stall

ID

EX

ME

WB

ADDI

R2,R2,#4

IF

ID

EX

ME

WB

SUB

R2,R3,R2

IF

ID

EX

ME

WB

BNEZ

R4, l

IF

ID

EX

ME

WB

if not taken

IF

idle

idle

idle

l: (if taken)

IF

Keep in mind that 99 times the branch is predicted wrong( as not taken),
and 1 time is actually predicted right!

Clock Cycles to Execute the loop = 99 * 8 + 1 * 7 = 799

c) Assuming the DLX pipeline with a single-cycle delayed branch and forwarding hardware, schedule the instructions in the loop including the branch delay slot. You may reorder instructions and modify the individual instruction operands. Show a pipeline timing diagram and compute the number of cycles needed to execute the entire loop.

Solution:

1

2

3

4

5

6

l:

LW

R1,0(R2)

IF

ID

EX

ME

WB

ADDI

R2,R2,#4

IF

ID

EX

ME

WB

SUB

R4,R3,R2

IF

ID

EX

ME

WB

ADDI

R1,R1,#1

IF

ID

EX

ME

WB

BNEZ

R4, l

IF

ID

EX

ME

WB

SW

R1, 0(R2)

IF

ID

EX

ME

WB

IF

  

  

Clock Cycles to Execute the loop = 100 * 6 = 600

loop:

LW

R1, 0(R2)

ADDI

R1,R1,#1

SW

R1, 0(R2)

ADDI

R2, R2, #4

SUB

R4, R3, R2

BNEZ

R4, loop