Questions
Preface
You will be asked to prepare a diff of your version of gem5 and the source tree. Make a backup of your implementation files before trying to make the diff for fear of losing all your hard work. There is a video on how to prepare a diff.
Additionally, take a look at the git documentation for how to create a diff.
Answer the following questions in a PDF document. Whenever you are required to report numbers/statistics, tables and graphs are welcome, but must be accompanied by discussions.
Your predecessor at AMD collected the following statistics before his... departure:
Data
mcf
Total Branches: 19721575
LocalBP | Taken | Not Taken |
---|---|---|
Forward | 4914814 | 10511241 |
Backward | 3854945 | 987032 |
LocalBP (proportion correctly predicted):
GlobalBP | Taken | Not Taken |
---|---|---|
Forward | 5300192 | 10568128 |
Backward | 4061431 | 858512 |
GlobalBP (proportion correctly predicted):
gcc
Total Branches: 11034363
LocalBP | Taken | Not Taken |
---|---|---|
Forward | 4928492 | 3375839 |
Backward | 2928204 | 882677 |
LocalBP (proportion correctly predicted):
GlobalBP | Taken | Not Taken |
---|---|---|
Forward | 4858809 | 3439865 |
Backward | 2954803 | 932846 |
GlobalBP (proportion correctly predicted):
gcc1
Total Branches: 10889855
LocalBP | Taken | Not Taken |
---|---|---|
Forward | 4877373 | 3528105 |
Backward | 2828925 | 971724 |
LocalBP (proportion correctly predicted):
GlobalBP | Taken | Not Taken |
---|---|---|
Forward | 4863576 | 3582457 |
Backward | 2836581 | 976723 |
GlobalBP (proportion correctly predicted):
Note how the branch prediction accuracy changes not only based on the program run, but also on the workload supplied to the program. For a more explicit example, you can take a look at the abench
benchmark. The abench
benchmark is a synthetic workload designed specifically to demonstrate the behaviour of the branch predictor under different workloads.
Questions
You now have to finish his work (and do so as a part of a team. Remember those collaboration guidelines?) by answering the following questions:
-
General branch frequency:
-
(1 point) How often are branches executed in your benchmarks?
- Report absolute values relative to the total number of instructions.
- Report percentage relative to the total number of instructions.
-
(2 point) If there is a difference between the frequency of branches in the two programs?
- If so, what is your hypothesis to explain it?
-
(3 point) How do the frequencies that you measured with gem5 compare with the "average" frequency of one branch for every 4-7 instructions mentioned in the textbook?
- If there is a difference from the average, what is your hypothesis to explain it?
-
(1 point) Report the frequencies of each element of the Cartesian product (forward, backward) x (taken, not taken) for both benchmarks with each branch prediction strategy. Express your numbers in 6 tables of the following type:
Taken Not Taken Forward Backward
-
-
Prediction accuracy:
- (2 points) Report the number of correct predictions for all branch prediction strategies, including what the static schemes would have achieved.
- For the static schemes you do not have to run a simulation, just use the branch characterization results.
- In particular, discuss whether or not your results support the use of the traditional static scheme of "backward branches taken and forward branches not taken".
- (2 points) Which prediction scheme has the most correct predictions for each program? Why?
- If you need to run additional simulations to justify your analysis and conclusion, do so.
noteDo not be surprised if your results do not coincide with conventional wisdom because you are executing a limited number of instructions at the beginning of an application, and that may not reflect the general behaviour of the whole program.
- (1 point) This question is only applicable if each of your programs performed best with a different branch prediction scheme: if you had to pick only one prediction scheme that would be used for both programs, which would it be and why?
- (2 points) Is there a limit to the number of useful counters and/or history bits/PC bits?
- If you had additional hardware to spend in your predictor, would you increase the number of bits in each counter or would you keep the same number of bits per counter but increase the number of counters?
- Perform simulations to determine what the situation really is. Consider the two new predictors (gshare and perceptron).
- Limit yourself to about one order of magnitude more storage than the base case.
- What do you think explains your data?
tipKeep in mind that when doing a sensitivity analysis of this kind, you should keep all factors constant except for the one that you are varying.
- If you had additional hardware to spend in your predictor, would you increase the number of bits in each counter or would you keep the same number of bits per counter but increase the number of counters?
- (2 points) Report the number of correct predictions for all branch prediction strategies, including what the static schemes would have achieved.
-
(3 points) Analytical question:
A (3,2) correlator predictor is a correlator predictor that has a global register with three bits and in which each entry of the PHT has two bits. For a (3,2) correlated prediction scheme that has a global history register and one pattern history table, show the contents of the history register and the PHT after the following sequence of branch executions have taken place:
- 1 taken branch
- 2 not-taken branches
- 2 taken branches
- 2 not-taken branches
- 2 taken branches
Assume that:
- The prediction bits in the PHT use 2-bit prediction, implemented with saturating counters. The counters are incremented when a branch is taken and decremented when a branch is not taken. Each counter is initialized to the weak taken state, as illustrated in the textbook and in the lecture slides. Assume this state has the value 10.
- The global history register is initialized to zero.
- The addresses of the nine branch instructions are:
ffff0000, ffff0001, ffff0010, ffff0000, ffff0000, ffff0000, ffff0011, ffff0010, ffff0000
... but do you need this information? - The PHT should be shown as updated after the execution of the last branch.
-
(2 points) Bonus for extra credit (not mandatory):
After getting a sense of how prediction correctness varies with changes in the size of the hardware data structures (c.f. question 2.4), the question that microarchitects usually face, is:
"If I have a certain amount of chip area that I can devote to branch prediction, how should I divide it among the various branch data structures for correlating branch prediction, the global history registers, the pattern history table (and the branch target buffer)?"
Try to answer that question for a bit budget of 2K bits plus a few extra bits (say, less than 64) for the branch prediction only (i.e. not the branch target buffer). In essence, you have to come up with a design SAs where S is the number of history registers and s is the number of PHT tables of adequate sizes.
-
(1 point) The best time to be critical and to offer constructive suggestions about an assignment is soon after you complete it. Please include in your report the following:
- A narrative description of any difficulties or misunderstandings that made the assignment unnecessarily difficult, or that led you to waste time.
- Suggestions of changes that could make this assignment more interesting, more relevant or more challenging in future editions of this course.
-
(2 points) Make use of proper typesetting in the report and overall presentation style. This includes the use of properly referenced figures, tables, and graphs (with descriptive axis titles and proper identification of units of measurement) where applicable.
-
(2 points) Meeting notes. See the collaboration requirements for what to include.
-
(1 point) Partner Evaluation Please complete the confidential partner evaluation for your partner. If you both complete the evaluation, you get a point.