1. PYTHIAS Inc, a company which builds branch prediction units has consulted you
ID: 3686260 • Letter: 1
Question
1. PYTHIAS Inc, a company which builds branch prediction units has consulted you about a new processor they are designing. (a)[10] The processor has allocated a single bit for a special type of smart branch" instruction which the compiler will set to indicate, for branches, if the branch is associated with a loop or with an if statement. What is a good default prediction for loops? What is a good default prediction for ifs? Why? (b)[5] Describe a way that dynamic branch prediction might help PYTHIAS better predict the result an if statement that is normally true. (c)[5] Branch prediction improves pipelining stalls due to what type of hazard?Explanation / Answer
Branch Prediction
Branch prediction is what the processor uses to decide whether to take a conditional branch or not. Getting this information as accurately as possible is important, as an incorrect prediction (mispredict) will cause the microprocessor to throw out all the instructions that did not need to be executed and start over with the correct set of instructions. This process is particularly expensive with deeply pipelined processors.
Many compilers rely on branch prediction to improve program performance by identifying frequently executed regions and by aiding in scheduling instructions.
Good default prediction for IF
When writing an if-else statement, always make the "then" block more likely to be executed than the else block, so the processor can take advantage of instructions already placed in the instruction fetch buffer.
The first type of construction considered here that causes conditional branches is if-else:
if ( *list == NULL )
{
/*
Code to initialize the list. This
occurs only the first time an element
is inserted.
*/
}
else
{
/*
Code to insert an element in the list.
This will be executed for all inserts
except the first.
*/
}
Good default prediction for loops
Backwards branch passes control to an address that is before the address of the branch instruction) usually control the iteration of loops. However, many non-backwards ranches can also control the iteration of loops either by exiting the loop or continuing the iteration
Backward branches are generated for while, do-while and for statements. The reason to make them likely is that loops are generally executed one or more times, so the branch at the end of the loop that tests for the condition is likely to be taken.
int
while_test( int a, int b, int c )
{
int count;
while ( c < a )
{
c += b;
a--;
count++;
}
return (count);
}
while_test:
stwu 1,-16(1) # allocate stack
b .L9 # jump to condition test
.L11:
add 5,5,4 # c += b
addi 3,3,-1 # a--
addi 9,9,1 # count++
.L9:
cmpw 7,5,3 # check c < a
blt+ 7,.L11 # iterate if true
mr 3,9 # = count
addi 1,1,16 # restore stack
blr # return
int
do_while_test( int a, int b, int c )
{
int count;
do
{
c += b;
a--;
count++;
}
while ( c < a );
return (count);
}
do_while_test:
stwu 1,-16(1) # allocate stack
.L13:
add 5,5,4 # c += b
addi 3,3,-1 # a--
cmpw 7,5,3 # check c < a
addi 9,9,1 # count++
blt+ 7,.L13 # iterate if true
mr 3,9 # = count
addi 1,1,16 # restore stack
blr # return
int
for_test( int a, int b, int c )
{
int count;
for(;c<a;count++)
{
c += b;
a--;
}
return (count);
}
for_test:
stwu 1,-16(1) # allocate stack
b .L27 # jump to condition test
.L28:
add 5,5,4 # c += b
addi 3,3,-1 # a--
addi 9,9,1 # count++
.L27:
cmpw 7,5,3 # check c < a
blt+ 7,.L28 # iterate if true
mr 3,9 # = count
addi 1,1,16 # restore stack
blr # return
Dynamic Branch Prediction
Dynamic branch prediction is done in the microprocessor by using a history log of previously encountered branches containing data for each branch, noting whether or not it was taken. This branch-history log is known as the Branch Target Buffer (BTB). Every time a branch is encountered and the microprocessor knows which direction the branch has taken, the BTB is updated to reflect that information.
The BTB is a buffer that holds a branch’s address and a brief history of the direction that a branch has taken. The address field is used somewhat like an index to the BTB, where it looks up whether a branch should be taken or not taken. There are 16 bits in the Pentium 4 processor to signify whether a branch should be taken or not. The bits work like a circular buffer, with each bit being checked for each check into the BTB.
For Example: The do-while loop has been executed multiple times, with each execution of the loop containing a fixed amount of four iterations. Now that the history for this loop is in the BTB, whenever this code is executed again, it will not cause any branch mispredicts and the accompanying penalty.
One thing to remember is that the BTB is finite. Once all the entries in the BTB have been consumed, an older entry will need to be used for a new branch that is encountered.
Another benefit of loop unrolling is that dependence chains are stretched, allowing for deep pipelines to get more utilization.
There are some rules that need to be followed:
Branch prediction improves pipelining stalls due to what type of hazard?
Control Hazards can result from branch instructions. Here, the branch target address might not be ready in time for the branch to be taken, which results in stalls (dead segments) in the pipeline that have to be inserted as local wait events, until processing can resume after the branch target is executed. Control hazards can be mitigated through accurate branch prediction and by delayed branch strategies.
The most common type of control hazard is the branch instruction, which has two alternative results: (1) jump to the branch target address if the instruction succeeds, or (2) execute the instruction after the branch, if the branch fails.
The following four strategies are employed in resolving control dependencies due to branch instructions.
Assume Branch Not Taken. As we saw previously, we can insert stalls until we find out whether or not the branch is taken. However, this slows pipeline execution unacceptably. A common alternative to stalling is to continue execution of the instruction stream as though the branch was not taken. The intervening instructions between the branch and its target are then executed. If the branch is not taken, this is not a harmful or disruptive technique. However, if the branch is taken, then we must discard the results of the instructions executed after the branch statement
Reducing Branch Delay we currently assume that the branch condition is evaluated of the pipeline (EX). If we move the branch evaluation up one stage, and put special circuitry in the ID, then we can evaluate the branch condition for the BEG instruction. This would allow us to take the branch in EX instead of MEM, since we would have ready for Stage 3 (EX) the Zero output of the comparator that would normally be computed in EX. The hardware needed to compute equality is merely a series of parallel xnor gates and-ed together, then inverted
Related Questions
Navigate
Integrity-first tutoring: explanations and feedback only — we do not complete graded work. Learn more.