Problem #4 Consider the following vector code run on a 500 MHz version of VMIPS
Question
Problem #4: Consider the following vector code run on a 500 MHz version of VMIPS for a fixed vector length of 64. The latency of the load and store unit is 12 cycles, the add unit 6 cycles, the multiply unit 7 cycles, and the divide unit 20 cycles.

LV V1, Ra
MULV.D V2, V1, V3
ADDV.D V4, V1, V3
SV Rb, V2
SV Rc, V4

(a) Assuming no chaining and a single memory pipeline, determine how many clock periods it would take to run the above VMIPS vector code.
(b) If the vector sequence is chained, how many clock cycles would it take to run the above vector code?
(c) Suppose VMIPS had three memory pipelines and chaining. If there were no bank conflicts in the accesses for the above loop, how many clock cycles are required to run this sequence?

Explanation / Answer
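(a) With no chaining and a single memory pipeline, each instruction is charged its start-up latency plus 64 cycles (one per element):
LV V1, Ra ; 12 + 64 = 76
MULV.D V2, V1, V3 ; 7 + 64 = 71
ADDV.D V4, V1, V3 ; 6 + 64 = 70
SV Rb, V2 ; 12 + 64 = 76
SV Rc, V4 ; 12 + 64 = 76
Running all five back to back takes approximately 76 + 71 + 70 + 76 + 76 = 369 clock periods. Because MULV.D and ADDV.D are independent and use different functional units, they may share a convoy; the estimate then drops to 76 + 71 + 76 + 76 = 299 clock periods.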
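The unchained estimate is just "start-up latency plus vector length" per step, summed. A minimal Python sketch of that arithmetic, assuming one element completes per cycle after start-up and no overlap between convoys; the names `LATENCY`, `PROGRAM`, `serial_time`, and `convoy_time` are illustrative and not part of VMIPS:

```python
# Start-up latencies in clock cycles, taken from the problem statement.
LATENCY = {"load_store": 12, "add": 6, "multiply": 7}
VL = 64  # fixed vector length: one element completes per cycle after start-up

# The five instructions in program order, tagged with the unit each one uses.
PROGRAM = [
    ("LV V1, Ra", "load_store"),
    ("MULV.D V2, V1, V3", "multiply"),
    ("ADDV.D V4, V1, V3", "add"),
    ("SV Rb, V2", "load_store"),
    ("SV Rc, V4", "load_store"),
]


def serial_time(program, vl=VL):
    """No overlap at all: each instruction costs start-up latency + vl."""
    return sum(LATENCY[unit] + vl for _, unit in program)


def convoy_time(convoys, vl=VL):
    """Convoys run back to back; instructions inside one convoy overlap,
    so a convoy costs its slowest member's start-up latency + vl."""
    return sum(max(LATENCY[unit] for _, unit in convoy) + vl for convoy in convoys)


if __name__ == "__main__":
    for text, unit in PROGRAM:
        print(f"{text:<18} {LATENCY[unit]} + {VL} = {LATENCY[unit] + VL}")
    print("fully serial:", serial_time(PROGRAM))  # 369
    # Assumed grouping: MULV.D and ADDV.D share a convoy (independent, different units).
    convoys = [[PROGRAM[0]], [PROGRAM[1], PROGRAM[2]], [PROGRAM[3]], [PROGRAM[4]]]
    print("MULV.D/ADDV.D in one convoy:", convoy_time(convoys))  # 299
```

Putting every instruction in its own convoy reproduces the fully serial figure, so the two functions agree on the no-overlap case.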
The hardware is changed so that the vector FP multiplier unit is chained to the scalar FP add unit (which writes its result to a scalar FP register). This chaining requires a new bus, the Vector-Scalar (VS) bus, from the vector multiplier to the scalar FP add functional unit. In the context of speculative superscalar MIPS execution, this means a reservation station entry for the DPV in the integer section. That reservation station waits for the 64 results arriving on the VS bus from the vector unit. The VS bus runs in parallel to the integer and FP CDB bus(es).
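The waiting behaviour of that reservation station entry can be modelled in a few lines of Python. This is a hypothetical sketch, not MIPS or VMIPS hardware: the class name `VSReservationStation`, its methods, and the assumption that the scalar FP add unit simply accumulates each incoming product into one scalar register are all illustrative.

```python
class VSReservationStation:
    """Hypothetical reservation-station entry that waits for a fixed number of
    products arriving on the Vector-Scalar (VS) bus and accumulates them."""

    def __init__(self, expected=64):
        self.expected = expected  # number of results the entry must collect
        self.received = 0
        self.accumulator = 0.0    # stands in for the scalar FP destination register

    def on_vs_bus(self, product):
        """Called once per element the vector multiplier places on the VS bus."""
        if self.ready():
            raise RuntimeError("entry already complete")
        self.accumulator += product  # scalar FP add of the incoming product
        self.received += 1

    def ready(self):
        """The entry can complete only after all expected results have arrived."""
        return self.received == self.expected


if __name__ == "__main__":
    a = [float(i) for i in range(64)]
    b = [2.0] * 64
    rs = VSReservationStation()
    for x, y in zip(a, b):  # vector multiplier streams one product per cycle
        rs.on_vs_bus(x * y)
    print(rs.ready(), rs.accumulator)  # True 4032.0
```

The entry stays not-ready until the 64th product arrives, which is the point of the description: the scalar side cannot complete while the vector unit is still streaming.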
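(c) With full chaining and three memory pipelines (and no bank conflicts), all five instructions can overlap, so the execution time is set by the longest chain of start-up latencies, LV → MULV.D → SV Rb, V2: roughly 12 + 7 + 12 + 64 = 95 clock periods.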
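The chained estimates can be checked with the chime (convoy) approximation. The Python sketch below rests on assumptions of ours: within a convoy a chained instruction starts when its producer's first result appears, convoys run back to back with no overlap, and each unit delivers one element per cycle after start-up. The names `DEPS`, `chained_convoy_time`, `total_time`, `PART_B`, and `PART_C` are hypothetical. Under the grouping shown, the same model also gives a figure for part (b).

```python
# Start-up latencies (cycles) per opcode, from the problem statement.
LATENCY = {"LV": 12, "MULV.D": 7, "ADDV.D": 6, "SV": 12}
VL = 64  # fixed vector length

# instruction -> (opcode, instructions whose vector results it reads)
DEPS = {
    "LV V1, Ra": ("LV", []),
    "MULV.D V2, V1, V3": ("MULV.D", ["LV V1, Ra"]),
    "ADDV.D V4, V1, V3": ("ADDV.D", ["LV V1, Ra"]),
    "SV Rb, V2": ("SV", ["MULV.D V2, V1, V3"]),
    "SV Rc, V4": ("SV", ["ADDV.D V4, V1, V3"]),
}


def chained_convoy_time(convoy, vl=VL):
    """One convoy with chaining: an instruction starts when its producers in
    the same convoy emit their first result; the convoy ends vl cycles after
    the latest first result."""
    first_result = {}  # instruction -> cycle of its first result, relative to convoy start
    for instr in convoy:  # convoy must be listed in program order
        op, producers = DEPS[instr]
        # Producers in earlier convoys have already finished, so they add no delay.
        start = max((first_result[p] for p in producers if p in first_result), default=0)
        first_result[instr] = start + LATENCY[op]
    return max(first_result.values()) + vl


def total_time(convoys, vl=VL):
    """Convoys execute back to back."""
    return sum(chained_convoy_time(c, vl) for c in convoys)


# (b) Chaining, one memory pipeline: at most one memory instruction per convoy.
PART_B = [
    ["LV V1, Ra", "MULV.D V2, V1, V3", "ADDV.D V4, V1, V3"],
    ["SV Rb, V2"],
    ["SV Rc, V4"],
]
# (c) Chaining, three memory pipelines, no bank conflicts: one convoy holds everything.
PART_C = [list(DEPS)]  # dicts keep insertion (program) order

if __name__ == "__main__":
    print("(b)", total_time(PART_B))  # 83 + 76 + 76 = 235
    print("(c)", total_time(PART_C))  # 12 + 7 + 12 + 64 = 95
```

A finer-grained model that lets a store begin as soon as the memory pipeline frees up, rather than at a convoy boundary, would give a somewhat smaller figure for (b); the convoy grouping is the conservative choice.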