Question

A processor with in-order execution runs at 1 GHz and has an execution CPI of 1.2 without counting memory stall cycles. The only instructions that access memory are loads (20% of all instructions) and stores (6% of all instructions). The memory system is composed of an I-cache and a D-cache that each have a hit time of 1 clock cycle. The I-cache has a 3% miss rate. The D-cache is write-back with a 5% miss rate for reads and a 2% miss rate for writes. It takes 15 ns on average to access and transfer a block from the unified L2 cache into the I-cache or D-cache. Of all memory references sent to the L2 cache in the system, 20% miss in the L2 cache and require main memory access. It takes 50 ns on average to access and transfer a block from the main memory into the I-cache or D-cache. What is the average memory access time for instruction fetching? What is the average memory access time for data reads? What is the average memory access time for data writes? What is the overall CPI, including memory stall cycles?

Explanation / Answer

Assumption:

CR = 1.1 GHz, cycle = 1/(1.1*10^9) = 0.9 (ns)

CPI excluding memory = 0.7 (cycle)

Load% = 20%

Store% = 5%

L1: HitTime = 0

I-cache: direct mapped, Capacity = 32KB, MR = 2%, 32B/block;

D-cache: direct mapped, write through, Capacity = 32KB, MR = 5%, 16B/block;

Write buffer eliminates 95% stall.

L2: Capacity=512KB, write back, 64B/block, Accesstime=15ns, 128bit/data bus,

64*8/128 = 4 (bus transfers needed for one block)

TR = 266MHz, 128bit/transfer cycle,

one transfer cycle time (TTL2) = 1/(266*10^6) = 3.76ns

MR= 20%, DirtyRate = 50%

MM: 128bit wide, access latency=60ns, 128bit/transfer cycle,

one transfer cycle time (TTmm) = 1/(133*10^6) = 7.52ns

load / write block latency = 60ns + 4*7.52ns = 90.08ns (about 90ns)
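The derived timing values in the assumption list can be checked with a short script. This is only a sketch: the constant and function names are made up for illustration, and the values are the ones assumed above (not the ones in the question).

```python
# Parameters copied from the assumption list; names are illustrative only.
CLOCK_HZ = 1.1e9        # processor clock rate
L2_BUS_HZ = 266e6       # L2 transfer rate, 128 bits per transfer cycle
MM_BUS_HZ = 133e6       # main-memory transfer rate, 128 bits per transfer cycle
BUS_BITS = 128          # data bus width
L2_BLOCK_BYTES = 64     # L2 block size
MM_ACCESS_NS = 60.0     # main-memory access latency


def period_ns(freq_hz):
    """Return the period in nanoseconds of a clock running at freq_hz."""
    return 1e9 / freq_hz


cycle_ns = period_ns(CLOCK_HZ)    # about 0.909 ns; the text rounds to 0.9
tt_l2_ns = period_ns(L2_BUS_HZ)   # about 3.76 ns
tt_mm_ns = period_ns(MM_BUS_HZ)   # about 7.52 ns

# Number of 128-bit transfers needed to move one 64-byte block.
transfers_per_block = L2_BLOCK_BYTES * 8 // BUS_BITS

# Time to load or write back one block from or to main memory.
mm_block_ns = MM_ACCESS_NS + transfers_per_block * tt_mm_ns

print(cycle_ns, tt_l2_ns, tt_mm_ns, transfers_per_block, mm_block_ns)
```

Note that the unrounded cycle time is closer to 0.909 ns; the solution below keeps the rounded 0.9 ns.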

Solution:

a. AMAT of Instruction Access

AMATinstr = HitTime_I-cache + MR_I-cache * (HT_L2 + time to fetch the missed block from L2 + time to fetch from MM + time to write back the dirty block on a miss)

= 0 + 2% * (15 + (32*8/128) * TTL2 + 20% * (60 + (64*8/128) * TTmm) + 20% * 50% * (60 + (64*8/128) * TTmm))

= 2% * (15 + 2*3.76 + 20% * (60 + 4*7.52) + 20% * 50% * (60 + 4*7.52))

= 2% * (15 + 7.52 + 20% * 90.08 + 20% * 50% * 90.08) = 0.99 (ns)
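The instruction-fetch calculation can be reproduced term by term. The sketch below assumes the rounded values used in the text (3.76 ns per L2 transfer, 90.08 ns per main-memory block); the function and parameter names are hypothetical.

```python
def amat_instr_ns(hit_ns=0.0, l1_miss=0.02, l2_access_ns=15.0,
                  l1_block_bytes=32, bus_bits=128, tt_l2_ns=3.76,
                  l2_miss=0.20, dirty=0.50, mm_block_ns=90.08):
    """AMAT for instruction fetch, following the formula in part a."""
    # Moving one 32-byte I-cache block over the 128-bit bus takes 2 transfers.
    l2_transfer_ns = (l1_block_bytes * 8 // bus_bits) * tt_l2_ns
    # 20% of L2 accesses miss and fetch a block from main memory.
    mm_fetch_ns = l2_miss * mm_block_ns
    # Half of those L2 misses also write a dirty block back to memory.
    writeback_ns = l2_miss * dirty * mm_block_ns
    miss_penalty_ns = l2_access_ns + l2_transfer_ns + mm_fetch_ns + writeback_ns
    return hit_ns + l1_miss * miss_penalty_ns


print(round(amat_instr_ns(), 2))
```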

b. AMAT of Data Read

AMAT = HitTime_D-cache + MR_D-cache * (HT_L2 + time to fetch the missed block from L2 + time to fetch from MM + time to write back the dirty block on a miss)

= 0 + 5% * (15 + (16*8/128) * TTL2 + 20% * (60 + (64*8/128) * TTmm) + 20% * 50% * (60 + (64*8/128) * TTmm))

= 5% * (15 + 3.76 + 20% * 1.5 * (60 + 4*7.52)) = 5% * 45.78 = 2.29 (ns)

Here the factor 1.5 = 1 + 50% combines the main-memory fetch with the dirty-block write-back, which happens on half of the L2 misses.
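The data-read line folds the main-memory fetch and the dirty write-back into a single 20% * 1.5 factor. A small sketch, with illustrative names and the rounded values from the text, shows that the folded and unfolded forms agree and reproduces the 2.29 ns result.

```python
MM_BLOCK_NS = 90.08   # 60 + 4*7.52, main-memory block latency from the text
L2_MISS = 0.20        # L2 miss rate
DIRTY = 0.50          # fraction of L2 misses that evict a dirty block


def l2_miss_cost_separate():
    """Fetch term plus write-back term, written out separately."""
    return L2_MISS * MM_BLOCK_NS + L2_MISS * DIRTY * MM_BLOCK_NS


def l2_miss_cost_factored():
    """Same cost with the two terms folded into a factor of 1 + DIRTY."""
    return L2_MISS * (1 + DIRTY) * MM_BLOCK_NS


def amat_read_ns(hit_ns=0.0, l1_miss=0.05, l2_access_ns=15.0,
                 l1_block_bytes=16, bus_bits=128, tt_l2_ns=3.76):
    """AMAT for data reads, following the formula in part b."""
    # One 16-byte D-cache block fits in a single 128-bit transfer.
    l2_transfer_ns = (l1_block_bytes * 8 // bus_bits) * tt_l2_ns
    return hit_ns + l1_miss * (l2_access_ns + l2_transfer_ns
                               + l2_miss_cost_factored())


print(round(amat_read_ns(), 2))
```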

c. AMAT of Data Write (Every write goes to the write buffer, one word at a time.)

AMAT = hit time of L1 (write time when the write buffer absorbs the write) + (1-95%) * 100% * (hit time of L2 + miss rate of L2 * miss penalty of L2)

= 0 + (1-95%) * (15 + 1 * TTL2 + 20% * (60 + 1 * TTmm))

= 5% * (15 + 3.76 + 20% * (60 + 7.52)) (NO WRITE ALLOCATE)

= 1.61 (ns)

* The dirty-block write-back term (50%*90) is left out here: no replacement happens under NO WRITE ALLOCATE.

How about write allocate?

= 0 + (1-95%) * (15 + 1 * TTL2 + 20% * (90 + 50% * 90))

= 5% * (15 + 3.76 + 20% * 135)

= 2.29 (ns)

* 50%*90: latency for writing back the dirty block.
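The two write policies differ only in the main-memory penalty on an L2 miss. The following sketch compares them using the rounded figures from the text (90 ns block latency, 7.52 ns per word transfer); the function name and its parameters are assumptions made for illustration.

```python
def amat_write_ns(write_allocate, hit_ns=0.0, buffer_hides=0.95,
                  l2_access_ns=15.0, tt_l2_ns=3.76, l2_miss=0.20,
                  mm_access_ns=60.0, tt_mm_ns=7.52,
                  mm_block_ns=90.0, dirty=0.50):
    """AMAT for data writes under either L2 write-miss policy."""
    # Only the stalls the write buffer cannot hide reach the processor.
    exposed = 1 - buffer_hides
    # A write-through store sends one word to L2: access plus one transfer.
    l2_time_ns = l2_access_ns + 1 * tt_l2_ns
    if write_allocate:
        # Fetch the whole block, and write back a dirty victim half the time.
        mm_penalty_ns = mm_block_ns + dirty * mm_block_ns
    else:
        # Send the single word on to main memory; nothing is replaced.
        mm_penalty_ns = mm_access_ns + 1 * tt_mm_ns
    return hit_ns + exposed * (l2_time_ns + l2_miss * mm_penalty_ns)


print(round(amat_write_ns(write_allocate=False), 2))
print(round(amat_write_ns(write_allocate=True), 2))
```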

d. Overall CPI

CPI = base CPI + instruction-fetch stalls per instruction + data-read stalls per instruction + data-write stalls per instruction

= 0.7 + 0.99/0.9 + 20% * 2.29/0.9 + 5% * 2.29/0.9 = 0.7 + 1.1 + 0.51 + 0.13 = 2.44
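The CPI sum converts each AMAT from nanoseconds into cycles and weights it by how often that kind of access occurs per instruction. A minimal sketch, assuming the rounded values used in the text and hypothetical parameter names:

```python
def overall_cpi(base_cpi=0.7, cycle_ns=0.9,
                amat_instr_ns=0.99, amat_read_ns=2.29, amat_write_ns=2.29,
                load_frac=0.20, store_frac=0.05):
    """Base CPI plus memory stall cycles per instruction."""
    # Every instruction is fetched once, so the weight is 1.
    instr_stall = amat_instr_ns / cycle_ns
    # Loads are 20% of instructions, stores 5% (the assumed mix).
    read_stall = load_frac * amat_read_ns / cycle_ns
    write_stall = store_frac * amat_write_ns / cycle_ns
    return base_cpi + instr_stall + read_stall + write_stall


print(round(overall_cpi(), 2))
```

The default uses the write-allocate figure of 2.29 ns for stores, as the text does. Passing amat_write_ns=1.61 gives the no-write-allocate variant, which comes out at about 2.40.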
