Advanced Computer Architecture 510/610 Winter 2016 (1)
Question
This question will be about the following computer (note that some of the following information may not be needed):
• 1.5 GHz, no pipeline, instruction buffers that negate instruction load penalties
• 32 KB L1 cache, 1-cycle access, 4-way associative, write-through, not write-allocate; 2% miss rate for data and 1% miss rate for instructions
• 1 MB L2 cache, 3-cycle access, 4-way associative, write-back, 30% miss rate, 20% dirty
• Block size is 4 32-bit words for both caches
• 1 GB RAM, DDR2, 1.5 GHz, 20-nanosecond latency
• 2-bit dynamic branch predictor that is right 90% of the time. When the prediction is correct and the branch is not taken, there is no penalty. When the prediction is to branch and it is correct, there is a 1-cycle penalty; there is a 2-cycle penalty if the prediction is incorrect.
Assume branches are 1/5 of all instructions, 4/5 of branches are taken, and memory accesses are 1/10 of all instructions. What is the CPI?
Explanation / Answer
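Before the pipelining background notes, here is one way to combine the given parameters into a CPI estimate. Several points in the question are ambiguous (what exactly the instruction buffers hide, how a dirty L2 write-back is charged, whether the base CPI is 1), so every such choice is marked as an assumption in the code; treat the printed value as a sketch, not the definitive answer.

```python
# Sketch of a CPI estimate for the machine described above.
# Assumptions are marked inline; change them and the result changes.

CLOCK_GHZ = 1.5
BASE_CPI = 1.0            # assumption: 1 cycle per instruction before stalls

# --- branch stalls (branches are 1/5 of all instructions) ---
P_CORRECT, P_TAKEN = 0.90, 0.80
# assumption: prediction accuracy is independent of taken/not-taken
per_branch = (P_CORRECT * P_TAKEN * 1          # correct, taken: 1-cycle penalty
              + P_CORRECT * (1 - P_TAKEN) * 0  # correct, not taken: no penalty
              + (1 - P_CORRECT) * 2)           # mispredict: 2-cycle penalty
branch_stalls = 0.2 * per_branch               # per instruction

# --- memory stalls ---
mem_latency = 20e-9 * CLOCK_GHZ * 1e9          # 20 ns at 1.5 GHz = 30 cycles
# assumption: a dirty L2 miss writes the old block back first (20% of misses)
l2_miss_penalty = mem_latency + 0.20 * mem_latency   # 36 cycles
l1_miss_penalty = 3 + 0.30 * l2_miss_penalty         # L2 access + L2 misses = 13.8
data_stalls = 0.10 * 0.02 * l1_miss_penalty    # 10% memory ops, 2% D-miss rate
instr_stalls = 0.01 * l1_miss_penalty          # 1% I-miss rate
# assumption: "instruction buffers that negate instruction load penalties"
# hides instruction-fetch misses; set HIDE_IFETCH = False to charge them
HIDE_IFETCH = True
if HIDE_IFETCH:
    instr_stalls = 0.0

cpi = BASE_CPI + branch_stalls + data_stalls + instr_stalls
print(round(cpi, 4))
```

Under these assumptions the branch stalls contribute 0.184 CPI and the data-cache stalls about 0.028, giving a CPI of roughly 1.21; charging the instruction-miss stalls instead of hiding them adds a further 0.138.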
Pipelining
• Introduction
– Defining pipelining
– Pipelining instructions
• Hazards
– Structural hazards
– Data hazards
– Control hazards
• Performance
• Controller implementation
Pipeline hazards
• Situations in which one instruction cannot immediately follow another
• Types of hazards
– Structural hazards - two or more instructions attempt to use the same resource
– Control hazards - an attempt to make a branching decision before the branch condition is evaluated
– Data hazards - an attempt to use data before it is ready
• Hazards can always be resolved by waiting
Structural hazards
• Two or more instructions attempt to use the same resource at the same time
• Example: a single memory for instructions and data
– Accessed by the IF stage
– Accessed at the same time by the MEM stage
• Solutions
– Delay the second access by one clock cycle, OR
– Provide separate memories for instructions & data
» This is what the book does
» This is known as a “Harvard architecture”
Real pipelined processors have separate caches
Pipelined example -
Executing multiple instructions
• Consider the following instruction sequence:
lw $r0, 10($r1)
sw $r3, 20($r4)
add $r5, $r6, $r7
sub $r8, $r9, $r10
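The single-memory structural hazard described above can be made concrete with this sequence. The sketch below assumes a classic 5-stage pipeline (IF ID EX MEM WB) where one shared memory port serves both instruction fetch and the MEM stage, so an IF must wait whenever a load/store occupies memory; the stage layout and one-cycle-delay resolution are assumptions, not something the question specifies.

```python
# Pipeline diagram for the sequence above, with a single shared memory port:
# IF stalls whenever an earlier lw/sw is in its MEM stage that cycle.
instrs = ["lw  $r0, 10($r1)", "sw  $r3, 20($r4)",
          "add $r5, $r6, $r7", "sub $r8, $r9, $r10"]
uses_mem = [ins.split()[0] in ("lw", "sw") for ins in instrs]
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

starts, mem_busy = [], set()
for i in range(len(instrs)):
    c = starts[-1] + 1 if starts else 0
    while c in mem_busy:          # single memory port: delay IF one cycle
        c += 1
    starts.append(c)
    if uses_mem[i]:
        mem_busy.add(c + 3)       # MEM stage happens 3 cycles after IF

width = starts[-1] + len(STAGES)
for ins, s in zip(instrs, starts):
    row = [" .  "] * width
    for k, st in enumerate(STAGES):
        row[s + k] = f"{st:<4}"
    print(f"{ins:<20}" + "".join(row))
```

With one memory port, `sub` cannot fetch in cycle 3 (the `lw` is in MEM) or cycle 4 (the `sw` is in MEM), so its IF slips to cycle 5; with separate instruction and data memories all four instructions would issue back to back.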
Structural hazards
Some common structural hazards:
• Memory:
– We’ve already noted this one.
• Floating point:
– Since many floating-point instructions require many cycles, it’s easy for them to interfere with each other.
• Starting up more of one type of instruction than there are resources.
– For example, the PA-8600 can support one ALU plus two load/store instructions per cycle - that’s how much hardware it has available.
Structural hazards
• Structural hazards are reduced with these rules:
– Each instruction uses a resource at most once
– Always use the resource in the same pipeline stage
– Use the resource for one cycle only
• Many RISC ISAs are designed with this in mind
• Sometimes it is very hard to do this.
– For example, memory of necessity is used in both the IF and MEM stages.
Structural hazards
We want to compare the performance of two machines. Which machine is faster?
• Machine A: dual-ported memory - so there are no memory stalls
• Machine B: single-ported memory, but its pipelined implementation has a clock rate that is 1.05 times faster
Assume:
• Ideal CPI = 1 for both
• Loads are 40% of instructions executed
Speedup equations for pipelining
SpeedupA = Pipeline depth / (1 + 0) × (clock_unpipelined / clock_pipelined)
         = Pipeline depth
SpeedupB = Pipeline depth / (1 + 0.4 × 1) × (clock_unpipelined / (clock_unpipelined / 1.05))
         = (Pipeline depth / 1.4) × 1.05
         = 0.75 × Pipeline depth
SpeedupA / SpeedupB = Pipeline depth / (0.75 × Pipeline depth) = 1.33
• Machine A is 1.33 times faster
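The two-machine comparison above can be checked numerically. The pipeline depth is chosen arbitrarily here, since it cancels in the ratio:

```python
# Speedup = depth / (1 + stalls per instruction) * (clock_unpipelined / clock_pipelined)
depth = 5  # arbitrary; cancels in the ratio

# Machine A: dual-ported memory, no memory stalls, same clock
speedup_a = depth / (1 + 0.0)

# Machine B: single-ported memory, so the 40% of instructions that are loads
# each stall 1 cycle, but the clock is 1.05x faster
speedup_b = depth / (1 + 0.4 * 1) * 1.05

ratio = speedup_a / speedup_b
print(round(ratio, 2))
```

The ratio is 1.4 / 1.05 ≈ 1.33, confirming that Machine A comes out ahead despite Machine B's faster clock.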
Pipelining summary
• Speedup ≤ Pipeline depth, if ideal CPI is 1
• Hazards limit performance on computers:
– Structural: need more HW resources
– Data (RAW, WAR, WAW)
– Control
Hardware issues
• Number and type of processors
• Processor control
• Memory hierarchy
• I/O devices and peripherals
• Operating system support
• Application software compatibility
Operating system issues
• Allocating and managing resources
• Access to hardware functions
– Multi-processing
– Multi-threading
• I/O management
• Access to peripherals
• Efficiency
Applications issues
• Compiler/linker support
• Programmability
• OS/hardware feature availability
• Compatibility
• Parallel compilers
– Preprocessor
– Precompiler
– Parallelizing compiler
Architecture evolution
• Scalar architecture
• Prefetch (fetch/execute overlap)
• Multiple functional units
• Pipelining
• Vector processors
• Lock-step processors
• Multi-processors
Flynn’s classification
• Consider instruction streams and data streams separately.
• SISD - Single Instruction, Single Data stream
• SIMD - Single Instruction, Multiple Data streams
• MIMD - Multiple Instruction, Multiple Data streams
• MISD - (uncommon) Multiple Instruction, Single Data stream
SISD
• Conventional computers
• Pipelined systems
• Multiple-functional-unit systems
• Pipelined vector processors
• Includes most computers encountered in everyday life
SIMD
• Multiple processors execute a single program
• Each processor operates on its own data
• Vector processors
• Array processors
• PRAM theoretical model
MIMD
• Multiple processors cooperate on a single task
• Each processor runs a different program
• Each processor operates on different data
• Many commercial examples exist
MISD
• A single data stream passes through multiple processors
• Different operations are triggered on different processors
• Systolic arrays
• Wave-front arrays
Programming issues
• Parallel computers are hard to program
• Automatic parallelization techniques are only partially successful
• Programming languages are few, not well supported, and hard to use
• Parallel algorithms are difficult to design
Performance issues
• Clock rate / cycle time = τ
• Cycles per instruction (average) = CPI
• Instruction count = Ic
• Time: T = Ic × CPI × τ
• p = processor cycles, m = memory cycles, k = memory/processor cycle ratio
• T = Ic × (p + m × k) × τ
Performance issues II
• Ic and p are affected by processor design and compiler technology
• m is affected mainly by compiler technology
• k is affected by memory hierarchy structure and design
Other measures
• MIPS rate - millions of instructions per second
• Clock rate, for comparable processors
• MFLOPS rate - millions of floating-point operations per second
• These measures are not necessarily directly comparable between different types of processors
Parallelizing code
• Implicitly
– Write sequential algorithms
– Use a parallelizing compiler
– Rely on the compiler to find parallelism
• Explicitly
– Design parallel algorithms
– Write in a parallel language
– Rely on a human to find parallelism
Multi-processors
• Multi-processors generally share memory, while multi-computers do not
– Uniform memory model
– Non-uniform memory model
– Cache-only
• MIMD machines
Multi-computers
• Independent computers that don’t share memory
• Connected by a high-speed communication network
• More tightly coupled than a collection of independent computers
• Cooperate on a single problem
Vector computers
• Independent vector hardware
• May be an attached processor
• Has both scalar and vector instructions
• Vector instructions operate in highly pipelined mode
• May be memory-to-memory or register-to-register
SIMD computers
• One control processor
• Several processing elements (PEs)
• All processing elements execute the same instruction at the same time
• The interconnection network among PEs determines memory access and PE interaction
The PRAM model
• SIMD-style programming
• Uniform global memory
• Local memory in each PE
• Memory conflict resolution:
– CRCW - Concurrent Read, Concurrent Write
– CREW - Concurrent Read, Exclusive Write
– EREW - Exclusive Read, Exclusive Write
– ERCW - (uncommon) Exclusive Read, Concurrent Write
The VLSI model
• Implement the algorithm as a mostly combinational circuit
• Determine the area required for implementation
• Determine the depth of the circuit
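The execution-time formulas in the performance-issues notes (T = Ic × CPI × τ and T = Ic × (p + m × k) × τ) can be illustrated numerically. All the numbers below are made-up example values, not figures from the question:

```python
# Numeric illustration of the execution-time formulas:
#   T = Ic * CPI * tau
#   T = Ic * (p + m * k) * tau
# where p = processor cycles, m = memory cycles per instruction,
# and k = memory/processor cycle-time ratio. Example values only.
Ic = 1_000_000        # instruction count (hypothetical)
p, m, k = 1.2, 0.3, 4.0
tau = 1 / 1.5e9       # cycle time for a 1.5 GHz clock

cpi = p + m * k       # effective CPI including memory cycles
T = Ic * cpi * tau    # total execution time in seconds
print(cpi, T)
```

With these values the effective CPI is 2.4 and the million instructions take 1.6 ms; the two formulas agree because CPI is simply decomposed into its processor and memory components.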