Given the following single-cycle "Minimal MIPS" datapath, answer the questions
Question
Given the following single-cycle "Minimal MIPS" datapath, answer the questions that follow:
Diagram or explain the hardware changes needed to support the following instructions. Be sure to account for both the data hardware and the control signal changes needed.
– jr (jump to register; copies the value of a register into the PC)
– jal (jump and link; copies the value of PC+4 into a register, then jumps to a new address)
Now we want to convert our datapath to a multi-cycle (but NOT pipelined) version. For both of those instructions (jr and jal), explain what, if anything, they would do during the EX, MEM, and WB phases of execution.
Explanation / Answer
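The question's one-line definitions of jr and jal can be written out as register-transfer steps. The sketch below is a hypothetical Python model, not the Minimal MIPS hardware.

It makes a few assumptions:
- Registers are modeled as a 32-entry list.
- `jr`, `jal`, `MASK32`, and `RA` are illustrative names.
- It follows standard MIPS conventions: $ra is register 31, and jal's 26-bit target field is shifted left 2 and combined with the upper 4 bits of PC+4.

```python
# Hypothetical register-transfer model of MIPS jr and jal (not the datapath itself).
MASK32 = 0xFFFFFFFF
RA = 31  # $ra, the register MIPS jal links into


def jr(regs: list[int], rs: int) -> int:
    """jr rs: PC <- R[rs]. No register is written."""
    return regs[rs] & MASK32


def jal(pc: int, regs: list[int], target26: int) -> int:
    """jal target: R[31] <- PC+4, then PC <- {(PC+4)[31:28], target26, 00}."""
    pc_plus_4 = (pc + 4) & MASK32
    regs[RA] = pc_plus_4
    return (pc_plus_4 & 0xF0000000) | ((target26 & 0x03FFFFFF) << 2)


regs = [0] * 32
pc = jal(0x00400000, regs, 0x0100010)
print(hex(pc), hex(regs[RA]))  # 0x400040 0x400004
pc = jr(regs, RA)
print(hex(pc))                 # 0x400004
```

Read as hardware, the model implies a few datapath requirements:
- jr needs a path from the register file's Read data 1 to the PC.
- jal needs a way to select register 31 as the write register and PC+4 as the write data.

How the Minimal MIPS datapath provides those paths is what the question asks you to diagram.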
$$\text{CPU time}_{X,P} = \text{Instructions executed}_{P} \times \text{CPI}_{X,P} \times \text{Clock cycle time}_{X}$$
Instructions executed:
– We are not interested in the static instruction count, or how many lines of code are in a program.
– Instead we care about the dynamic instruction count, or how many instructions are actually executed when the program runs.
There are three lines of code below, but the number of instructions executed would be 2001: li runs once, and sub and bne each run 1000 times.
        li   $a0, 1000
Ostrich:
        sub  $a0, $a0, 1
        bne  $a0, $0, Ostrich
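To see where 2001 comes from, here is a hypothetical Python function, `count_dynamic_instructions`, that mimics the three-line loop and counts every instruction it executes. It assumes the starting value loaded into $a0 is positive. Non-positive starts are rejected rather than modeling 32-bit wraparound.

```python
def count_dynamic_instructions(start: int) -> int:
    """Count instructions executed by: li $a0,start / Ostrich: sub / bne."""
    if start < 1:
        raise ValueError("sketch only models a positive starting count")
    executed = 0
    a0 = start          # li  $a0, start
    executed += 1
    while True:
        a0 -= 1         # Ostrich: sub $a0, $a0, 1
        executed += 1
        executed += 1   # bne $a0, $0, Ostrich (taken unless a0 == 0)
        if a0 == 0:
            break
    return executed


print(count_dynamic_instructions(1000))  # 2001
```

The static count stays at three no matter what value li loads. The dynamic count grows as 2 × start + 1.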
CPI
The average number of clock cycles per instruction, or CPI, is a function of the machine and program.
– The CPI depends on the actual instructions appearing in the program; a floating-point intensive application might have a higher CPI than an integer-based program.
– It also depends on the CPU implementation. For example, a Pentium can execute the same instructions as an older 80486, but faster.
So far we assumed each instruction took one cycle, so we had CPI = 1.
– The CPI can be >1 due to memory stalls and slow instructions.
– The CPI can be <1 on machines that execute more than 1 instruction per cycle (superscalar).
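Since CPI is an average, it can be computed as total clock cycles divided by instructions executed. The following is a minimal sketch of that definition. The instruction classes, counts, and per-class cycle costs are hypothetical, chosen only to show how a floating-point-heavy mix pushes the average up.

```python
def average_cpi(mix: dict[str, tuple[int, float]]) -> float:
    """mix maps class name -> (instructions executed, cycles per instruction)."""
    total_instructions = sum(count for count, _ in mix.values())
    total_cycles = sum(count * cycles for count, cycles in mix.values())
    return total_cycles / total_instructions


# Hypothetical instruction mixes and cycle costs, for illustration only.
integer_heavy = {"alu": (800, 1.0), "load_store": (200, 2.0)}
fp_heavy = {"alu": (400, 1.0), "load_store": (200, 2.0), "fp": (400, 4.0)}
print(average_cpi(integer_heavy))  # 1.2
print(average_cpi(fp_heavy))       # 2.4
```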
Clock cycle time
One cycle is the minimum time it takes the CPU to do any work.
– The clock cycle time, or clock period, is just the length of a cycle.
– The clock rate, or frequency, is the reciprocal of the cycle time.
Generally, a higher frequency is better. Some typical frequencies:
– A 500MHz processor has a cycle time of 2ns.
– A 2GHz (2000MHz) CPU has a cycle time of just 0.5ns (500ps).
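The reciprocal relationship is a one-liner. This sketch uses a hypothetical helper name, `cycle_time_ns`, and reproduces the two example frequencies.

```python
def cycle_time_ns(clock_hz: float) -> float:
    """Clock cycle time in nanoseconds = 1 / clock rate."""
    return 1e9 / clock_hz


print(cycle_time_ns(500e6))        # 2.0
print(cycle_time_ns(2e9))          # 0.5
print(cycle_time_ns(2e9) * 1000)   # 500.0 (picoseconds)
```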
Execution time, again
$$\text{CPU time}_{X,P} = \text{Instructions executed}_{P} \times \text{CPI}_{X,P} \times \text{Clock cycle time}_{X}$$
The easiest way to remember this is to match up the units:
$$\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$
Make things faster by making any component smaller!
Often it is easy to reduce one component by increasing another.
[Table: which of Program, Compiler, ISA, Organization, and Technology affect Instructions executed, CPI, and Clock cycle time]
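The units form translates directly into a product of the three factors. The following is a minimal sketch; the function name `cpu_time_s` is hypothetical. The design numbers are also hypothetical: they show a faster clock that raises CPI but still wins overall.

```python
def cpu_time_s(instructions: float, cpi: float, cycle_time_ns: float) -> float:
    """Seconds/Program = Instructions/Program x Cycles/Instruction x Seconds/Cycle."""
    return instructions * cpi * cycle_time_ns / 1e9


# Hypothetical trade-off: halving the cycle time raises CPI from 1.0 to 1.5.
baseline = cpu_time_s(1e9, cpi=1.0, cycle_time_ns=1.0)
faster_clock = cpu_time_s(1e9, cpi=1.5, cycle_time_ns=0.5)
print(baseline, faster_clock)  # 1.0 0.75
```

Whether such a trade pays off depends on how much the increased component grows. Here CPI rose by 50% while cycle time fell by 50%, so the net effect is a 25% reduction in time.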
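Example 1: ISA-compatible processors
Let’s compare the performance of two x86-based processors.
– An 800MHz AMD Duron, with a CPI of 1.2 for an MP3 compressor.
– A 1GHz Pentium III, with a CPI of 1.5 for the same program.
Compatible processors implement identical instruction sets and will use the same executable files, with the same number of instructions. But they implement the ISA differently, which leads to different CPIs.
$$\text{CPU time}_{\text{AMD},P} = \text{Instructions}_{P} \times \text{CPI}_{\text{AMD},P} \times \text{Cycle time}_{\text{AMD}} = \text{Instructions}_{P} \times 1.2 \times 1.25\,\text{ns} = 1.5\,\text{ns} \times \text{Instructions}_{P}$$
$$\text{CPU time}_{\text{P3},P} = \text{Instructions}_{P} \times \text{CPI}_{\text{P3},P} \times \text{Cycle time}_{\text{P3}} = \text{Instructions}_{P} \times 1.5 \times 1\,\text{ns} = 1.5\,\text{ns} \times \text{Instructions}_{P}$$
Both processors take the same time to run this program.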
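The example's arithmetic can be checked numerically. In this sketch the instruction count is a hypothetical placeholder. Because the two chips are ISA-compatible and run the same executable, the count is the same for both, so its value does not affect the comparison. Both lines should print 1.500 ms.

```python
def cpu_time_ns(instructions: int, cpi: float, clock_hz: float) -> float:
    """CPU time in ns = instructions x CPI x (1 / clock rate)."""
    cycle_time_ns = 1e9 / clock_hz
    return instructions * cpi * cycle_time_ns


instructions = 1_000_000  # hypothetical dynamic instruction count, same for both chips
amd_duron = cpu_time_ns(instructions, cpi=1.2, clock_hz=800e6)   # 1.25 ns cycle
pentium_iii = cpu_time_ns(instructions, cpi=1.5, clock_hz=1e9)   # 1.0 ns cycle

print(f"AMD Duron:   {amd_duron / 1e6:.3f} ms")
print(f"Pentium III: {pentium_iii / 1e6:.3f} ms")
```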
How the add goes through the datapath
[Figure: single-cycle datapath (PC, instruction memory, register file, sign extend, ALU, data memory, PC+4 and branch-target adders; control signals RegDst, RegWrite, ALUSrc, ALUOp, MemRead, MemWrite, MemToReg, PCSrc) tracing an add with I[25-21] = 01001, I[20-16] = 01010, I[15-11] = 10100, with values 00...01, 00...10, 00...11, and PC+4 shown on the data paths]
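The figure's path for an add can be mimicked in software. This is a hypothetical Python sketch, not the course's datapath, and it makes several assumptions:
- The instruction word 0x012AA020 is the R-type add whose rs, rt, and rd fields match the figure's 01001, 01010, and 10100.
- Register values 1 and 2 are read from the figure's 00...01 and 00...10.
- The helper names are illustrative.
- MIPS add's overflow trap is ignored.

Comments name which multiplexer choice each step corresponds to, without asserting specific 0/1 signal encodings.

```python
MASK32 = 0xFFFFFFFF


def decode_r_type(word: int) -> dict[str, int]:
    """Split a 32-bit R-type instruction into its fields."""
    return {
        "op": (word >> 26) & 0x3F,   # I[31-26]
        "rs": (word >> 21) & 0x1F,   # I[25-21] -> Read register 1
        "rt": (word >> 16) & 0x1F,   # I[20-16] -> Read register 2
        "rd": (word >> 11) & 0x1F,   # I[15-11] -> Write register (chosen by RegDst)
        "funct": word & 0x3F,        # I[5-0]   -> ALU control
    }


def execute_add(pc: int, word: int, regs: list[int]) -> int:
    """One single-cycle step for add; returns the next PC (overflow trap ignored)."""
    f = decode_r_type(word)
    read_data_1 = regs[f["rs"]]
    read_data_2 = regs[f["rt"]]                # ALUSrc selects the register, not the immediate
    alu_result = (read_data_1 + read_data_2) & MASK32
    regs[f["rd"]] = alu_result                 # MemToReg selects the ALU result; RegWrite on
    return (pc + 4) & MASK32                   # PCSrc selects PC+4 (add is not a branch)


word = 0x012AA020                  # add with rs=01001, rt=01010, rd=10100
regs = [0] * 32
regs[9], regs[10] = 1, 2           # assumed from the figure's 00...01 and 00...10
next_pc = execute_add(0x00400000, word, regs)
f = decode_r_type(word)
print(format(f["rs"], "05b"), format(f["rt"], "05b"), format(f["rd"], "05b"))  # 01001 01010 10100
print(regs[20], hex(next_pc))                                                  # 3 0x400004
```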
It gets worse...
We’ve made very optimistic assumptions about memory latency:
– Main memory accesses on modern machines take >50ns.
    • For comparison, an ALU on an AMD Opteron takes ~0.3ns.
Our worst-case cycle (loads/stores) includes 2 memory accesses.
– A modern single-cycle implementation would be stuck at <10MHz.
– Caches will improve common-case access time, not the worst case.
Tying frequency to the worst-case path violates the first law of performance!
– Make the common case fast (we’ll revisit this often).
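The <10MHz figure follows from the cycle having to cover two full memory accesses. The sketch below makes the following assumptions:
- Each access takes exactly 50ns, the slide's lower bound.
- The 0.3ns ALU delay is the only other logic on the path.
- The name `max_clock_mhz` is hypothetical.

Real paths also include register-file, multiplexer, and wire delays, so actual limits would be lower still.

```python
def max_clock_mhz(memory_accesses: int, memory_latency_ns: float,
                  other_logic_ns: float = 0.0) -> float:
    """Upper bound on clock rate when one cycle must cover the worst-case path."""
    cycle_ns = memory_accesses * memory_latency_ns + other_logic_ns
    return 1000.0 / cycle_ns  # 1 / (cycle in ns) is GHz; x 1000 gives MHz


print(max_clock_mhz(2, 50.0))        # 10.0
print(max_clock_mhz(2, 50.0, 0.3))   # slightly under 10 MHz once the ALU is included
```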
Summary
Performance is one of the most important criteria in judging systems.
– Here we’ll focus on execution time.
Our main performance equation explains how performance depends on several factors related to both hardware and software.
$$\text{CPU time}_{X,P} = \text{Instructions executed}_{P} \times \text{CPI}_{X,P} \times \text{Clock cycle time}_{X}$$
It can be hard to measure these factors in real life, but this is a useful guide for comparing systems and designs.
A single-cycle CPU has two main disadvantages.
– The cycle time is limited by the worst-case latency.
– It isn’t efficiently using its hardware.
Next time, we’ll see how this can be rectified with pipelining.