One CPU manufacturer proposed the 10-stage pipeline below for a 500MHz (2ns cloc

ID: 3781004 • Letter: O

Question

One CPU manufacturer proposed the 10-stage pipeline below for a 500MHz (2ns clock cycle) machine. Here are the correspondences between this and the MIPS pipeline:

Instructions are fetched in the FET stage.
Register reading is performed in the REG stage.
ALU operations and memory access are both done in the EXE stage.
Branches are resolved in the DET stage.
WRB is the write back stage.

(a) How much time is required to execute one million instructions on this processor, assuming there are no dependencies or branches in the code? (3%)

(b) Without forwarding, how many Assume that the register file could be written and read in the same clock cycle. What is this hazard called? (3%)

(c) If a branch is mispredicted, how many instructions would have to be flushed from the pipeline? (3%)

(d) Assume that a program executes one million instructions. Of these, 15% are load instructions which stall, and 10% of the instructions are branches. The CPU predicts branches correctly 75% of the time. How much time will it take to execute this program?

Explanation / Answer

(a)
It takes 9 cycles to fill the pipeline, and then 1,000,000 more cycles to complete the instructions, for a total
of 1,000,009 cycles. At 2 ns per cycle, that is 1,000,009 * 2 = 2,000,018 ns.
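
The arithmetic in (a) can be checked with a short Python sketch. The function name and parameters below are made up for illustration; the only inputs taken from the question are the 10 stages, the 2 ns cycle, and the one million instructions.

```python
def ideal_pipeline_time(instructions, stages=10, cycle_ns=2):
    """Cycles and time for a pipeline with no stalls or flushes."""
    # The first instruction needs (stages - 1) cycles before the pipeline is full;
    # after that, one instruction completes per cycle.
    fill_cycles = stages - 1
    total_cycles = fill_cycles + instructions
    return total_cycles, total_cycles * cycle_ns


cycles, time_ns = ideal_pipeline_time(1_000_000)
print(cycles, time_ns)  # 1000009 2000018
```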

(b)
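The “lw” would not store its result into $t0 until the WRB stage, but that value must be read by the “add”
in its REG stage. This is a data hazard (a read-after-write dependence on the loaded value).

Assuming, as the question states, that the register file can be written and read in the same clock cycle, it would take 2 stall cycles.

Without that assumption, the “add” could not read $t0 until the cycle after the write, so it would take 3 stall cycles.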

(c)
The branch is resolved in the DET stage, which is the 9th stage, so 8 subsequent instructions have entered the pipeline by this time.
All 8 of them have to be flushed.
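
As a rough sketch of the counting in (c): if a branch is resolved in stage k, each of the k - 1 earlier stages holds one younger instruction. The helper name is hypothetical, and it assumes one instruction is fetched every cycle with no stalls in between.

```python
def flushed_on_mispredict(resolve_stage):
    """Instructions behind a branch when it is resolved in the given stage (1-based)."""
    # One younger instruction sits in every stage before the resolving stage.
    return resolve_stage - 1


print(flushed_on_mispredict(9))  # 8
```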

(d)
From the above answer (a), 1 million instructions would take 1,000,009 cycles.

The CPU predicts branches correctly 75% of the time, so it predicts incorrectly 25% of the time. Since 10% of the one million instructions are branches (100,000), that gives 25,000 mispredicted branches.
Each incorrect prediction incurs a penalty of 8 cycles from flushing.

Also, 15% of the instructions (150,000) are loads that stall, which results in 2 extra cycles for each of those instructions.

Therefore, the total is 1,000,009 + (25,000 * 8) + (150,000 * 2) = 1,500,009 cycles, for a run time of 1,500,009 * 2 = 3,000,018 ns.

Performance is about 1.5 times worse than in the ideal case of (a), which shows the importance of minimizing stalls and predicting branches accurately.
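
The calculation in (d) can be reproduced with a small Python sketch. The function and parameter names are illustrative only; the stall count of 2 and the flush penalty of 8 are the values used in this answer, not something measured. Percentages are kept as integers to avoid floating-point rounding.

```python
def program_cycles(instructions, stages=10, load_pct=15, load_stall=2,
                   branch_pct=10, mispredict_pct=25, flush_penalty=8):
    """Total cycles including load stalls and branch misprediction flushes."""
    base = (stages - 1) + instructions             # ideal pipelined execution
    loads = instructions * load_pct // 100         # loads that stall
    branches = instructions * branch_pct // 100
    mispredicted = branches * mispredict_pct // 100
    return base + loads * load_stall + mispredicted * flush_penalty


cycles = program_cycles(1_000_000)
time_ns = cycles * 2                               # 2 ns clock cycle
slowdown = cycles / 1_000_009                      # relative to the ideal case in (a)
print(cycles, time_ns, round(slowdown, 2))         # 1500009 3000018 1.5
```

Setting load_pct and branch_pct to 0 gives back the 1,000,009 cycles from (a), which is a quick sanity check on the formula.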
