Advanced Computer Architecture 510/610 Winter 2016 (1)
Question
This question will be about the following computer (note that some of the following information may not be needed):
• 1.5 GHz, no pipeline, instruction buffers that negate instruction load penalties
• 32 KB L1 cache, 1-cycle access, 4-way associative, write-through, not write-allocate; 2% miss rate for data and 1% miss rate for instructions
• 1 MB L2 cache, 3-cycle access, 4-way associative, write-back, 30% miss rate, 20% dirty
• Block size is 4 32-bit words for both caches
• 1 GB RAM, DDR2, 1.5 GHz, 20-nanosecond latency
• 2-bit dynamic branch predictor that is right 90% of the time. When the prediction is correct and the branch is not taken, there is no penalty. When the prediction is to branch and it is correct, there is a 1-cycle penalty; there is a 2-cycle penalty if the prediction is incorrect.
Assume branches are 1/5 of all instructions, 4/5 of branches are taken, and memory accesses are 1/10 of all instructions. What is the CPI?
Explanation / Answer
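Before the pipelining background notes, here is one way to combine the given parameters into a CPI estimate. Several points in the question are ambiguous (what exactly the instruction buffers hide, how a dirty L2 write-back is charged, whether the base CPI is 1), so every such choice is marked as an assumption in the code; treat the printed value as a sketch, not the definitive answer.

```python
# Sketch of a CPI estimate for the machine described above.
# Assumptions are marked inline; change them and the result changes.

CLOCK_GHZ = 1.5
BASE_CPI = 1.0            # assumption: 1 cycle per instruction before stalls

# --- branch stalls (branches are 1/5 of all instructions) ---
P_CORRECT, P_TAKEN = 0.90, 0.80
# assumption: prediction accuracy is independent of taken/not-taken
per_branch = (P_CORRECT * P_TAKEN * 1          # correct, taken: 1-cycle penalty
              + P_CORRECT * (1 - P_TAKEN) * 0  # correct, not taken: no penalty
              + (1 - P_CORRECT) * 2)           # mispredict: 2-cycle penalty
branch_stalls = 0.2 * per_branch               # per instruction

# --- memory stalls ---
mem_latency = 20e-9 * CLOCK_GHZ * 1e9          # 20 ns at 1.5 GHz = 30 cycles
# assumption: a dirty L2 miss writes the old block back first (20% of misses)
l2_miss_penalty = mem_latency + 0.20 * mem_latency   # 36 cycles
l1_miss_penalty = 3 + 0.30 * l2_miss_penalty         # L2 access + L2 misses = 13.8
data_stalls = 0.10 * 0.02 * l1_miss_penalty    # 10% memory ops, 2% D-miss rate
instr_stalls = 0.01 * l1_miss_penalty          # 1% I-miss rate
# assumption: "instruction buffers that negate instruction load penalties"
# hides instruction-fetch misses; set HIDE_IFETCH = False to charge them
HIDE_IFETCH = True
if HIDE_IFETCH:
    instr_stalls = 0.0

cpi = BASE_CPI + branch_stalls + data_stalls + instr_stalls
print(round(cpi, 4))
```

Under these assumptions the branch stalls contribute 0.184 CPI and the data-cache stalls about 0.028, giving a CPI of roughly 1.21; charging the instruction-miss stalls instead of hiding them adds a further 0.138.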
Pipelining
• Introduction
– Defining pipelining
– Pipelining instructions
• Hazards
– Structural hazards
– Data hazards
– Control hazards
• Performance
• Controller implementation
Pipeline hazards
• Situations in which one instruction cannot immediately follow another
• Types of hazards
– Structural hazards - two or more instructions attempt to use the same resource
– Control hazards - an attempt to make a branching decision before the branch condition is evaluated
– Data hazards - an attempt to use data before it is ready
• Hazards can always be resolved by waiting
Structural hazards
• Two or more instructions attempt to use the same resource at the same time
• Example: a single memory for instructions and data
– Accessed by the IF stage
– Accessed at the same time by the MEM stage
• Solutions
– Delay the second access by one clock cycle, OR
– Provide separate memories for instructions & data
» This is what the book does
» This is known as a “Harvard architecture”
Real pipelined processors have separate caches
Pipelined example -
Executing multiple instructions
• Consider the following instruction sequence:
lw $r0, 10($r1)
sw $r3, 20($r4)
add $r5, $r6, $r7
sub $r8, $r9, $r10
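The single-memory structural hazard described above can be made concrete with this sequence. The sketch below assumes a classic 5-stage pipeline (IF ID EX MEM WB) where one shared memory port serves both instruction fetch and the MEM stage, so an IF must wait whenever a load/store occupies memory; the stage layout and one-cycle-delay resolution are assumptions, not something the question specifies.

```python
# Pipeline diagram for the sequence above, with a single shared memory port:
# IF stalls whenever an earlier lw/sw is in its MEM stage that cycle.
instrs = ["lw  $r0, 10($r1)", "sw  $r3, 20($r4)",
          "add $r5, $r6, $r7", "sub $r8, $r9, $r10"]
uses_mem = [ins.split()[0] in ("lw", "sw") for ins in instrs]
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

starts, mem_busy = [], set()
for i in range(len(instrs)):
    c = starts[-1] + 1 if starts else 0
    while c in mem_busy:          # single memory port: delay IF one cycle
        c += 1
    starts.append(c)
    if uses_mem[i]:
        mem_busy.add(c + 3)       # MEM stage happens 3 cycles after IF

width = starts[-1] + len(STAGES)
for ins, s in zip(instrs, starts):
    row = [" .  "] * width
    for k, st in enumerate(STAGES):
        row[s + k] = f"{st:<4}"
    print(f"{ins:<20}" + "".join(row))
```

With one memory port, `sub` cannot fetch in cycle 3 (the `lw` is in MEM) or cycle 4 (the `sw` is in MEM), so its IF slips to cycle 5; with separate instruction and data memories all four instructions would issue back to back.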
Structural hazards
Some common structural hazards:
• Memory:
– We’ve already noted this one.
• Floating point:
– Since many floating-point instructions require many cycles, it’s easy for them to interfere with each other.
• Starting up more of one type of instruction than there are resources.
– For example, the PA-8600 can support one ALU plus two load/store instructions per cycle - that’s how much hardware it has available.
Structural hazards
• Structural hazards are reduced with these rules:
– Each instruction uses a resource at most once
– Always use the resource in the same pipeline stage
– Use the resource for one cycle only
• Many RISC ISAs are designed with this in mind
• Sometimes it is very hard to do this.
– For example, memory of necessity is used in both the IF and MEM stages.
Structural hazards
We want to compare the performance of two machines. Which machine is faster?
• Machine A: dual-ported memory - so there are no memory stalls
• Machine B: single-ported memory, but its pipelined implementation has a clock rate that is 1.05 times faster
Assume:
• Ideal CPI = 1 for both
• Loads are 40% of instructions executed
Speedup equations for pipelining
SpeedupA = Pipeline depth / (1 + 0) × (clock_unpipelined / clock_pipelined)
         = Pipeline depth
SpeedupB = Pipeline depth / (1 + 0.4 × 1) × (clock_unpipelined / (clock_unpipelined / 1.05))
         = (Pipeline depth / 1.4) × 1.05
         = 0.75 × Pipeline depth
SpeedupA / SpeedupB = Pipeline depth / (0.75 × Pipeline depth) = 1.33
• Machine A is 1.33 times faster
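The two-machine comparison above can be checked numerically. The pipeline depth is chosen arbitrarily here, since it cancels in the ratio:

```python
# Speedup = depth / (1 + stalls per instruction) * (clock_unpipelined / clock_pipelined)
depth = 5  # arbitrary; cancels in the ratio

# Machine A: dual-ported memory, no memory stalls, same clock
speedup_a = depth / (1 + 0.0)

# Machine B: single-ported memory, so the 40% of instructions that are loads
# each stall 1 cycle, but the clock is 1.05x faster
speedup_b = depth / (1 + 0.4 * 1) * 1.05

ratio = speedup_a / speedup_b
print(round(ratio, 2))
```

The ratio is 1.4 / 1.05 ≈ 1.33, confirming that Machine A comes out ahead despite Machine B's faster clock.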
Pipelining summary
• Speedup ≤ Pipeline depth, if ideal CPI is 1
• Hazards limit performance on computers:
– Structural: need more HW resources
– Data (RAW, WAR, WAW)
– Control
Hardware issues
• Number and type of processors
• Processor control
• Memory hierarchy
• I/O devices and peripherals
• Operating system support
• Application software compatibility
Operating system issues
• Allocating and managing resources
• Access to hardware functions
– Multi-processing
– Multi-threading
• I/O management
• Access to peripherals
• Efficiency
Applications issues
• Compiler/linker support
• Programmability
• OS/hardware feature availability
• Compatibility
• Parallel compilers
– Preprocessor
– Precompiler
– Parallelizing compiler
Architecture evolution
• Scalar architecture
• Prefetch (fetch/execute overlap)
• Multiple functional units
• Pipelining
• Vector processors
• Lock-step processors
• Multi-processors
Flynn’s classification
• Consider instruction streams and data streams separately.
• SISD - Single Instruction, Single Data stream
• SIMD - Single Instruction, Multiple Data streams
• MIMD - Multiple Instruction, Multiple Data streams
• MISD - (uncommon) Multiple Instruction, Single Data stream
SISD
• Conventional computers
• Pipelined systems
• Multiple-functional-unit systems
• Pipelined vector processors
• Includes most computers encountered in everyday life
SIMD
• Multiple processors execute a single program
• Each processor operates on its own data
• Vector processors
• Array processors
• PRAM theoretical model
MIMD
• Multiple processors cooperate on a single task
• Each processor runs a different program
• Each processor operates on different data
• Many commercial examples exist
MISD
• A single data stream passes through multiple processors
• Different operations are triggered on different processors
• Systolic arrays
• Wave-front arrays
Programming issues
• Parallel computers are hard to program
• Automatic parallelization techniques are only partially successful
• Programming languages are few, not well supported, and hard to use
• Parallel algorithms are difficult to design
Performance issues
• Clock rate / cycle time = τ
• Cycles per instruction (average) = CPI
• Instruction count = Ic
• Time: T = Ic × CPI × τ
• p = processor cycles, m = memory cycles, k = memory/processor cycle ratio
• T = Ic × (p + m × k) × τ
Performance issues II
• Ic and p are affected by processor design and compiler technology
• m is affected mainly by compiler technology
• k is affected by memory hierarchy structure and design
Other measures
• MIPS rate - millions of instructions per second
• Clock rate, for comparable processors
• MFLOPS rate - millions of floating-point operations per second
• These measures are not necessarily directly comparable between different types of processors
Parallelizing code
• Implicitly
– Write sequential algorithms
– Use a parallelizing compiler
– Rely on the compiler to find parallelism
• Explicitly
– Design parallel algorithms
– Write in a parallel language
– Rely on a human to find parallelism
Multi-processors
• Multi-processors generally share memory, while multi-computers do not
– Uniform memory model
– Non-uniform memory model
– Cache-only
• MIMD machines
Multi-computers
• Independent computers that don’t share memory
• Connected by a high-speed communication network
• More tightly coupled than a collection of independent computers
• Cooperate on a single problem
Vector computers
• Independent vector hardware
• May be an attached processor
• Has both scalar and vector instructions
• Vector instructions operate in highly pipelined mode
• May be memory-to-memory or register-to-register
SIMD computers
• One control processor
• Several processing elements (PEs)
• All processing elements execute the same instruction at the same time
• The interconnection network among PEs determines memory access and PE interaction
The PRAM model
• SIMD-style programming
• Uniform global memory
• Local memory in each PE
• Memory conflict resolution:
– CRCW - Concurrent Read, Concurrent Write
– CREW - Concurrent Read, Exclusive Write
– EREW - Exclusive Read, Exclusive Write
– ERCW - (uncommon) Exclusive Read, Concurrent Write
The VLSI model
• Implement the algorithm as a mostly combinational circuit
• Determine the area required for implementation
• Determine the depth of the circuit
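The execution-time formulas in the performance-issues notes (T = Ic × CPI × τ and T = Ic × (p + m × k) × τ) can be illustrated numerically. All the numbers below are made-up example values, not figures from the question:

```python
# Numeric illustration of the execution-time formulas:
#   T = Ic * CPI * tau
#   T = Ic * (p + m * k) * tau
# where p = processor cycles, m = memory cycles per instruction,
# and k = memory/processor cycle-time ratio. Example values only.
Ic = 1_000_000        # instruction count (hypothetical)
p, m, k = 1.2, 0.3, 4.0
tau = 1 / 1.5e9       # cycle time for a 1.5 GHz clock

cpi = p + m * k       # effective CPI including memory cycles
T = Ic * cpi * tau    # total execution time in seconds
print(cpi, T)
```

With these values the effective CPI is 2.4 and the million instructions take 1.6 ms; the two formulas agree because CPI is simply decomposed into its processor and memory components.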