Questions

Preface

Danger (warning number 2)

You will be asked to prepare a diff of your version of gem5 and the source tree. Make a backup of your implementation files before trying to make the diff for fear of losing all your hard work. There is a video on how to prepare a diff.

Additionally, take a look at the git documentation for how to create a diff.

Answer the following questions in a PDF document. Whenever you are required to report numbers/statistics, tables and graphs are welcome, but must be accompanied by discussions.

Your predecessor at AMD collected the following statistics before his... departure:

Data

`mcf`

Total Branches: 19721575

LocalBP	Taken	Not Taken
Forward	4914814	10511241
Backward	3854945	987032

LocalBP (proportion correctly predicted): $18694193 / 1972157 = 94.8\%$

GlobalBP	Taken	Not Taken
Forward	5300192	10568128
Backward	4061431	858512

GlobalBP (proportion correctly predicted): $18946810 / 1972157 = 96.1\%$

`gcc`

Total Branches: 11034363

LocalBP	Taken	Not Taken
Forward	4928492	3375839
Backward	2928204	882677

LocalBP (proportion correctly predicted): $10008245 / 11034363 = 90.0\%$

GlobalBP	Taken	Not Taken
Forward	4858809	3439865
Backward	2954803	932846

GlobalBP (proportion correctly predicted): $10147120 / 11034363 = 92.0\%$

`gcc1`

Total Branches: 10889855

LocalBP	Taken	Not Taken
Forward	4877373	3528105
Backward	2828925	971724

LocalBP (proportion correctly predicted): $10000959 / 10889855 = 91.8\%$

GlobalBP	Taken	Not Taken
Forward	4863576	3582457
Backward	2836581	976723

GlobalBP (proportion correctly predicted): $10114105 / 10889855 = 92.8\%$

Workload Dependence

Note how the branch prediction accuracy changes not only based on the program run, but also on the workload supplied to the program. For a more explicit example, you can take a look at the abench benchmark. The abench benchmark is a synthetic workload designed specifically to demonstrate the behaviour of the branch predictor under different workloads.

Questions

You now have to finish his work (and do so as a part of a team. Remember those collaboration guidelines?) by answering the following questions:

General branch frequency:
1. (1 point) How often are branches executed in your benchmarks?
  - Report absolute values relative to the total number of instructions.
  - Report percentage relative to the total number of instructions.
2. (2 point) If there is a difference between the frequency of branches in the two programs?
  - If so, what is your hypothesis to explain it?
3. (3 point) How do the frequencies that you measured with gem5 compare with the "average" frequency of one branch for every 4-7 instructions mentioned in the textbook?
  - If there is a difference from the average, what is your hypothesis to explain it?
4. (1 point) Report the frequencies of each element of the Cartesian product (forward, backward) x (taken, not taken) for both benchmarks with each branch prediction strategy. Express your numbers in 6 tables of the following type:
  
  Taken Not Taken
  Forward
  Backward
Prediction accuracy:
1. (2 points) Report the number of correct predictions for all branch prediction strategies, including what the static schemes would have achieved.
  - For the static schemes you do not have to run a simulation, just use the branch characterization results.
  - In particular, discuss whether or not your results support the use of the traditional static scheme of "backward branches taken and forward branches not taken".
2. (2 points) Which prediction scheme has the most correct predictions for each program? Why?
  - If you need to run additional simulations to justify your analysis and conclusion, do so.
  note
  Do not be surprised if your results do not coincide with conventional wisdom because you are executing a limited number of instructions at the beginning of an application, and that may not reflect the general behaviour of the whole program.
3. (1 point) This question is only applicable if each of your programs performed best with a different branch prediction scheme: if you had to pick only one prediction scheme that would be used for both programs, which would it be and why?
4. (2 points) Is there a limit to the number of useful counters and/or history bits/PC bits?
  - If you had additional hardware to spend in your predictor, would you increase the number of bits in each counter or would you keep the same number of bits per counter but increase the number of counters?
    - Perform simulations to determine what the situation really is. Consider the two new predictors (gshare and perceptron).
    - Limit yourself to about one order of magnitude more storage than the base case.
  - What do you think explains your data?
  tip
  Keep in mind that when doing a sensitivity analysis of this kind, you should keep all factors constant except for the one that you are varying.
(3 points) Analytical question:

A (3,2) correlator predictor is a correlator predictor that has a global register with three bits and in which each entry of the PHT has two bits. For a (3,2) correlated prediction scheme that has a global history register and one pattern history table, show the contents of the history register and the PHT after the following sequence of branch executions have taken place:
- 1 taken branch
- 2 not-taken branches
- 2 taken branches
- 2 not-taken branches
- 2 taken branches
Assume that:
- The prediction bits in the PHT use 2-bit prediction, implemented with saturating counters. The counters are incremented when a branch is taken and decremented when a branch is not taken. Each counter is initialized to the weak taken state, as illustrated in the textbook and in the lecture slides. Assume this state has the value 10.
- The global history register is initialized to zero.
- The addresses of the nine branch instructions are: ffff0000, ffff0001, ffff0010, ffff0000, ffff0000, ffff0000, ffff0011, ffff0010, ffff0000... but do you need this information?
- The PHT should be shown as updated after the execution of the last branch.
(2 points) Bonus for extra credit (not mandatory):

After getting a sense of how prediction correctness varies with changes in the size of the hardware data structures (c.f. question 2.4), the question that microarchitects usually face, is:

"If I have a certain amount of chip area that I can devote to branch prediction, how should I divide it among the various branch data structures for correlating branch prediction, the global history registers, the pattern history table (and the branch target buffer)?"

Try to answer that question for a bit budget of 2K bits plus a few extra bits (say, less than 64) for the branch prediction only (i.e. not the branch target buffer). In essence, you have to come up with a design SAs where S is the number of history registers and s is the number of PHT tables of adequate sizes.
(1 point) The best time to be critical and to offer constructive suggestions about an assignment is soon after you complete it. Please include in your report the following:
1. A narrative description of any difficulties or misunderstandings that made the assignment unnecessarily difficult, or that led you to waste time.
2. Suggestions of changes that could make this assignment more interesting, more relevant or more challenging in future editions of this course.
(2 points) Make use of proper typesetting in the report and overall presentation style. This includes the use of properly referenced figures, tables, and graphs (with descriptive axis titles and proper identification of units of measurement) where applicable.
(2 points) Meeting notes. See the collaboration requirements for what to include.
(1 point) Partner Evaluation Please complete the confidential partner evaluation for your partner. If you both complete the evaluation, you get a point.

Preface​

Data​

mcf​

gcc​

gcc1​

Questions​

Preface

Data

`mcf`

`gcc`

`gcc1`

Questions