Question

The latencies of the individual stages in a five-stage MIPS (Microprocessor without Interlocked Pipeline Stages) architecture are given below.

Instruction | Instruction Fetch | Register Read | Arithmetic Logic Unit (ALU) | Memory Access | Register Write
Latency     | 200 ps            | 100 ps        | 200 ps                      | 300 ps        | 100 ps

(10 pts) What is the clock cycle time in a pipelined and non-pipelined processor?

Pipelined version: ______________

Non-pipelined version: ______________

The classic five-stage MIPS pipeline is used to execute the code fragments below. Assume the following:

Register writes are done in the first half of the clock cycle; register reads are performed in the second half of the clock cycle.

Branches are resolved in the fourth stage of the pipeline, and the architecture does not use any branch prediction mechanism.

Forwarding is not supported.

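(5 pts) Assuming there are no dependences other than those given in the code, show the pipeline diagram.

Clock Cycle →       |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
add R1, R2, R3      |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
add R4, R5, R6      |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
beq R1, R4, target  |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
I4                  |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |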

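(5 pts) Assuming there are no dependences other than those given in the code, show the pipeline diagram.

Clock Cycle →       |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
add R4, R5, R6      |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
lw R1, 0(R2)        |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
beq R1, R4, target  |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
I4                  |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |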

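(5 pts) Assuming there are no dependences other than those given in the code, show the pipeline diagram.

Clock Cycle →       |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12 | 13 |
add R1, R2, R3      |    |    |    |    |    |    |    |    |    |    |    |    |    |
add R1, R1, R4      |    |    |    |    |    |    |    |    |    |    |    |    |    |
add R1, R1, R5      |    |    |    |    |    |    |    |    |    |    |    |    |    |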

(5 pts) Assuming there are no dependences other than those given in the code, show the pipeline diagram.

Clock Cycle →       |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12 | 13 |
lw R1, 4(R2)        |    |    |    |    |    |    |    |    |    |    |    |    |    |
sw R1, 0(R2)        |    |    |    |    |    |    |    |    |    |    |    |    |    |

The classic five-stage MIPS pipeline is used to execute the code fragments below. Assume the following:

Register writes are done in the first half of the clock cycle; register reads are performed in the second half of the clock cycle.

Branches are resolved in the second stage of the pipeline, and the architecture does not use any branch prediction mechanism.

Forwarding is fully supported.

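(5 pts) Assuming there are no dependences other than those given in the code, show the pipeline diagram.

Clock Cycle →       |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
add R1, R2, R3      |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
add R4, R5, R6      |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
beq R1, R4, target  |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
I4                  |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |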

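(5 pts) Assuming there are no dependences other than those given in the code, show the pipeline diagram.

Clock Cycle →       |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
add R4, R5, R6      |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
lw R1, 0(R2)        |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
beq R1, R4, target  |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |
I4                  |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |    |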

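(5 pts) Assuming there are no dependences other than those given in the code, show the pipeline diagram.

Clock Cycle →       |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12 | 13 |
add R1, R2, R3      |    |    |    |    |    |    |    |    |    |    |    |    |    |
add R1, R1, R4      |    |    |    |    |    |    |    |    |    |    |    |    |    |
add R1, R1, R5      |    |    |    |    |    |    |    |    |    |    |    |    |    |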

(5 pts) Assuming there are no dependences other than those given in the code, show the pipeline diagram.

Clock Cycle →       |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12 | 13 |
lw R1, 4(R2)        |    |    |    |    |    |    |    |    |    |    |    |    |    |
sw R1, 0(R2)        |    |    |    |    |    |    |    |    |    |    |    |    |    |

a) (18 pts) A 64 KB L1 cache has a 32 byte block size and is 8-way set-associative.

How many sets does the cache have?

How many bits are used for the offset, index, and tag, assuming that the CPU provides 32-bit addresses?

How large is the tag array, including the valid bit?
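
The arithmetic behind part (a) can be sketched in a few lines of Python. This is only a way to check a hand calculation, not an official solution. The function name `cache_geometry` is hypothetical, and the sketch assumes that every size is a power of two, that 1 KB = 1024 bytes, and that each tag entry stores only the tag plus one valid bit (no dirty or replacement bits).

```python
def cache_geometry(capacity_bytes, block_bytes, ways, address_bits=32):
    """Derive set count, address-field widths, and tag-array size for a cache.

    Assumes capacity, block size, and set count are all powers of two.
    """
    blocks = capacity_bytes // block_bytes         # total cache lines
    sets = blocks // ways                          # lines are grouped 'ways' per set
    offset_bits = block_bytes.bit_length() - 1     # log2(block size)
    index_bits = sets.bit_length() - 1             # log2(number of sets)
    tag_bits = address_bits - index_bits - offset_bits
    tag_array_bits = blocks * (tag_bits + 1)       # one tag + one valid bit per line
    return {
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
        "tag_array_bits": tag_array_bits,
    }

# Parameters from part (a): 64 KB capacity, 32-byte blocks, 8-way set-associative.
geometry = cache_geometry(64 * 1024, 32, 8)
for name, value in geometry.items():
    print(name, value)
```

Dividing `tag_array_bits` by 8 gives the size in bytes under the same assumptions.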

b) (16 pts) Consider a program that can execute with no stalls and a CPI of 1 if the underlying processor can service every load instruction with a 2-cycle L1 cache hit. In practice, 10% of all load instructions suffer from an L1 cache miss. Every cache miss results in a 300-cycle stall while data is fetched from memory. What is the CPI for this program if 20% of the program's instructions are load instructions?
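
One common reading of part (b) is that the 2-cycle hit is already covered by the base CPI of 1, so only misses add stall cycles. Under that assumption, the CPI is the base CPI plus the average stall cycles per instruction. The function name `effective_cpi` below is hypothetical, and the sketch is meant as a check on the arithmetic rather than a definitive answer.

```python
def effective_cpi(base_cpi, load_fraction, miss_rate, miss_penalty_cycles):
    """Base CPI plus the average memory-stall cycles per instruction."""
    # Only loads can miss, and only a fraction of loads actually miss.
    stall_cycles_per_instruction = load_fraction * miss_rate * miss_penalty_cycles
    return base_cpi + stall_cycles_per_instruction

# Values from part (b): 20% loads, 10% of loads miss, 300-cycle miss penalty.
cpi = effective_cpi(base_cpi=1.0, load_fraction=0.20, miss_rate=0.10,
                    miss_penalty_cycles=300)
print(round(cpi, 6))
```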

c) (16 pts) Consider an L1 cache that has 16 sets, is direct-mapped (1-way), and supports a block size of 16 bytes. For the following memory access pattern (shown as byte addresses), show which accesses are hits and misses. For each case, indicate the set number.
0, 8, 16, 24, 32, 40, 48, 256, 28, 8, 36, 12, 20, 260.
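
A hand trace for part (c) can be checked with a small simulation. The sketch below assumes the cache starts empty (every access to an untouched set is a miss) and that a miss always replaces whatever block the set holds. The function name `simulate_direct_mapped` is hypothetical. It records the block number in each set, which is equivalent to comparing tags in a direct-mapped cache.

```python
def simulate_direct_mapped(addresses, num_sets=16, block_bytes=16):
    """Return (address, set index, 'hit' or 'miss') for each byte address."""
    stored = [None] * num_sets           # block number held by each set; None = invalid
    results = []
    for addr in addresses:
        block = addr // block_bytes      # drop the block-offset bits
        set_index = block % num_sets     # low bits of the block number pick the set
        hit = stored[set_index] == block
        if not hit:
            stored[set_index] = block    # fill the line on a miss
        results.append((addr, set_index, "hit" if hit else "miss"))
    return results

# Access pattern from part (c).
trace = [0, 8, 16, 24, 32, 40, 48, 256, 28, 8, 36, 12, 20, 260]
results = simulate_direct_mapped(trace)
for addr, set_index, outcome in results:
    print(f"address {addr:3d} -> set {set_index:2d}: {outcome}")
```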

Explanation / Answer

Pipelined: the cycle time is set by the slowest stage (Memory Access), so it is 300 ps.
Non-pipelined: the cycle time is the sum of all stage latencies: 200 + 100 + 200 + 300 + 100 = 900 ps.
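
The same two rules can be written as a short Python check. This is a minimal sketch that uses the latencies from the question and ignores pipeline-register overhead, which the question does not mention. The function names are hypothetical.

```python
# Stage latencies in picoseconds, taken from the table in the question.
STAGE_LATENCY_PS = {
    "Instruction Fetch": 200,
    "Register Read": 100,
    "ALU": 200,
    "Memory Access": 300,
    "Register Write": 100,
}

def pipelined_cycle_ps(latencies):
    """Pipelined clock period: every stage must fit, so the slowest stage sets it."""
    return max(latencies.values())

def non_pipelined_cycle_ps(latencies):
    """Non-pipelined clock period: one instruction passes through all stages per cycle."""
    return sum(latencies.values())

print("Pipelined:", pipelined_cycle_ps(STAGE_LATENCY_PS), "ps")
print("Non-pipelined:", non_pipelined_cycle_ps(STAGE_LATENCY_PS), "ps")
```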

Please post each of the remaining questions as a separate question.
