1. Consider a program that executes a large number of instructions. Assume that
Question
1. Consider a program that executes a large number of instructions. Assume that the program does not suffer from stalls from data hazards. Assume that 20% of all instructions are branch instructions, and 65% of these branch instructions are Taken. What is the average CPI for this program when it executes on each of the processors listed below? All of these processors implement an 8-stage pipeline and resolve a branch at the end of the 3rd stage. The 1st stage fetches an instruction, the 2nd stage does decode, and the 3rd stage does register read and branch resolution. (40 points)
a. The processor pauses instruction fetch as soon as it fetches a branch. Instruction fetch is resumed after the branch outcome has been resolved.
b. The processor always fetches instructions sequentially. If a branch is resolved as Taken, the incorrectly fetched instructions after the branch are squashed.
c. The processor implements one branch delay slot. The compiler fills the branch delay slot with an instruction that comes before the branch in the original code (option A in the videos). After fetching the branch delay slot instruction, the processor pauses instruction fetch until the branch outcome has been resolved.
d. The processor implements a hardware branch predictor that makes correct predictions for 95% of all branches. When an incorrect prediction is discovered, the incorrectly fetched instructions after the branch are squashed.
2. Consider a program that can execute with no stalls and a CPI of 1 if the underlying processor can somehow magically service every load instruction with a 1-cycle L1 cache hit. In practice, 10% of all load instructions suffer from an L1 cache miss, and 2% of all load instructions suffer from an L2 cache miss (and are serviced by the memory system). An L1 cache miss stalls the processor for 20 cycles while the L2 is looked up. An L2 cache miss stalls the processor for an additional 250 cycles while data is fetched from memory. What is the CPI for this program if 20% of the program's instructions are load instructions? (20 points)
3. Consider an L1 cache that has 8 sets, is direct-mapped (1-way), and supports a block size of 16 bytes. For the following memory access pattern (shown as byte addresses), show which accesses are hits and misses. For each hit, indicate the set that yields the hit. (20 points)
0, 8, 4, 12, 16, 260, 24, 30, 31, 36, 42, 48, 8, 4, 278.
4. A 64 KB L1 cache has a 128 byte block size and is 8-way set-associative. How many sets does the cache have? How many bits are used for the offset, index, and tag, assuming that the CPU provides 32-bit addresses? How large is the tag array? If you do not explain your steps, you will not receive partial credit for an incorrect answer. (20 points)
Explanation / Answer
1. Consider a program that executes a large number of instructions. Assume that the program does not suffer from stalls from data hazards. Assume that 20% of all instructions are branch instructions, and 65% of these branch instructions are Taken. What is the average CPI for this program when it executes on each of the processors listed below? All of these processors implement an 8-stage pipeline and resolve a branch at the end of the 3rd stage. The 1st stage fetches an instruction, the 2nd stage does decode, and the 3rd stage does register read and branch resolution. (40 points)
a. The processor pauses instruction fetch as soon as it fetches a branch. Instruction fetch is resumed after the branch outcome has been resolved.
On average, an ideal pipelined processor completes one instruction per clock cycle, so the base CPI is 1.
Branches are the exception here: the processor pauses instruction fetch as soon as it fetches a branch and keeps it paused until the branch outcome is resolved.
The branch is resolved at the end of the 3rd stage, so the next instruction is fetched 3 clock cycles after the branch was fetched. Each branch therefore effectively costs 3 clock cycles (1 cycle for the branch itself plus 2 stall cycles).
So the average CPI of the program = 0.20 * 3 + 0.80 * 1 = 1.4
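As a sanity check on part (a), here is a minimal sketch that counts fetch cycles for a made-up instruction stream. The function name `fetch_cycles`, the `resolve_stage` parameter, and the stream with a branch in every 5th position are my own assumptions for illustration; pipeline fill time is ignored because the program is assumed to be very long.

```python
def fetch_cycles(is_branch, resolve_stage=3):
    """Count fetch cycles when fetch pauses after every branch.

    Assumption: the instruction after a branch is fetched in the cycle
    right after the branch is resolved, so each branch adds
    (resolve_stage - 1) idle fetch cycles.
    """
    stall = resolve_stage - 1
    cycles = 0
    for branch in is_branch:
        cycles += 1          # cycle in which this instruction is fetched
        if branch:
            cycles += stall  # fetch idles while the branch is decoded and resolved
    return cycles


# Hypothetical stream: every 5th instruction is a branch (20% branches).
stream = [(i % 5 == 4) for i in range(1000)]
cpi = fetch_cycles(stream) / len(stream)
print(cpi)
```

With 200 branches among 1000 instructions, the count is 1000 + 200 * 2 = 1400 cycles, which agrees with the CPI of 1.4 computed by hand.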
b. The processor always fetches instructions sequentially. If a branch is resolved as Taken, the incorrectly fetched instructions after the branch are squashed.
If a branch is Taken, the two sequentially fetched instructions behind it (in stages 1 and 2) are squashed. A Taken branch therefore effectively costs 3 clock cycles, while Not-Taken branches and all other instructions cost 1 clock cycle.
So the average CPI of the program = 0.20 * 0.65 * 3 + 0.20 * 0.35 * 1 + 0.80 * 1 = 1.26
c. The processor implements one branch delay slot. The compiler fills the branch delay slot with an instruction that comes before the branch in the original code (option A in the videos). After fetching the branch delay slot instruction, the processor pauses instruction fetch until the branch outcome has been resolved.
The branch delay slot is filled with a useful instruction from before the branch, so the cycle spent fetching it is not wasted. The processor then pauses fetch for the one remaining cycle until the branch is resolved at the end of the 3rd stage, so each branch effectively costs 2 clock cycles (1 cycle for the branch plus 1 stall cycle).
So the average CPI of the program = 0.20 * 2 + 0.80 * 1 = 1.20
d. The processor implements a hardware branch predictor that makes correct predictions for 95% of all branches. When an incorrect prediction is discovered, the incorrectly fetched instructions after the branch are squashed.
The hardware predictor is correct for 95% of branches. A correctly predicted branch costs 1 clock cycle; a mispredicted branch costs 3 clock cycles, because the 2 incorrectly fetched instructions behind it are squashed.
So the average CPI of the program = 0.20 * 0.95 * 1 + 0.20 * 0.05 * 3 + 0.80 * 1 = 1.02
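All four results above follow the same pattern: a base CPI of 1 plus (fraction of branches) * (fraction of branches that pay a penalty) * (penalty in cycles). The sketch below recomputes them that way. The helper `average_cpi` and the constant names are hypothetical, chosen only for this illustration.

```python
def average_cpi(branch_frac, penalty_prob, penalty_cycles, base_cpi=1.0):
    """Base CPI plus the expected branch penalty per instruction."""
    return base_cpi + branch_frac * penalty_prob * penalty_cycles


BRANCH_FRAC = 0.20      # 20% of instructions are branches
TAKEN_FRAC = 0.65       # 65% of branches are Taken
MISPREDICT_FRAC = 0.05  # predictor is wrong for 5% of branches

# Branches resolve at the end of stage 3, so a full penalty is 2 cycles.
cpi_a = average_cpi(BRANCH_FRAC, 1.0, 2)              # every branch stalls 2 cycles
cpi_b = average_cpi(BRANCH_FRAC, TAKEN_FRAC, 2)       # only Taken branches squash 2
cpi_c = average_cpi(BRANCH_FRAC, 1.0, 1)              # delay slot hides 1 of the 2 cycles
cpi_d = average_cpi(BRANCH_FRAC, MISPREDICT_FRAC, 2)  # only mispredictions squash 2

for name, value in [("a", cpi_a), ("b", cpi_b), ("c", cpi_c), ("d", cpi_d)]:
    print(f"{name}: {value:.2f}")
```

This form gives the same numbers as the weighted sums in the answer (1.40, 1.26, 1.20, 1.02), just written as "1 + extra cycles" instead of "cycles per instruction type".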
Please let me know in case of any doubts.
Thanks