Procedure

danger

You will be asked to prepare a diff between your version of gem5 and the original source tree. Back up your implementation files before creating the diff so that you do not lose your work. There is a video (with slides) on how to prepare a diff safely.

Get acquainted with running the gcc and mcf benchmarks on the simulator by following the instructions page.

Important!

To answer the questions that follow, you will need to create additional statistics in gem5, recompile it and (re-)run the benchmarks. Check the [instructions page](/docs/tutorial/Using Gem5/Basic Concepts/statistics) for tips on how to implement new statistics.

Implement statistics that tell you the number of forward and backward branches in the execution.

tip
You can count your statistics on either fetched or committed branches, but you have to choose one and be consistent by always counting at the same point in the pipeline. Also, your report must state if you counted fetched or committed branches.
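The classification itself can be sketched outside gem5. The snippet below is a minimal illustration in Python, not gem5 code: the function names are hypothetical, and treating a branch whose target equals its own PC as backward is an assumed convention that you should state in your report if you adopt it.

```python
from collections import Counter


def classify_branch(pc: int, target: int) -> str:
    """Classify a direct branch by comparing its target with its own PC.

    Assumption: a target at or below the branch PC counts as backward.
    """
    return "forward" if target > pc else "backward"


def count_directions(branches):
    """Count forward and backward branches in an iterable of (pc, target)."""
    counts = Counter()
    for pc, target in branches:
        counts[classify_branch(pc, target)] += 1
    return counts


# Made-up trace of (branch PC, branch target) pairs, for illustration only.
trace = [(0x1000, 0x1020), (0x1020, 0x1008), (0x1008, 0x1040)]
counts = count_directions(trace)
print(counts["forward"], counts["backward"])
```

In gem5 the same comparison would be made at whichever single pipeline point you choose (fetch or commit), with one counter per direction.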

Important!
Due to a limitation of gem5, you will only be able to access branch targets for direct control instructions; don't worry about reporting statistics for indirect instructions for this part.
Amendment (2024): The limitation that made reporting indirect branches impossible has been fixed. However, to keep the scope of the assignment reasonable, we still do not require you to report indirect branches. If you are curious about how to do it anyway, we encourage you to grep through the instruction representation in gem5 and look for the indirect branch data.
Implement the gshare branch predictor discussed in class. You should first apply this patch to your gem5 repository, which will provide you with the templates for the gshare cc and hh files. You can also find the patch in 429-resources/templates/gshare/template.patch if you want to browse what exactly is going on.

tip
Recall that the gshare predictor XORs the global history with some bits of the PC and uses the result to index into its table of counters. You are welcome (even encouraged) to examine any of the included predictors to assist you in implementing your own.
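As a reminder of the idea, here is a minimal behavioral sketch of gshare in Python. It is not the gem5 template: the class name, the parameter names, the 2-bit counter width, and the initial counter value are all assumptions made for illustration, and the sketch ignores speculative history and squash handling, which a predictor inside an out-of-order CPU has to deal with.

```python
class GShareSketch:
    """Behavioral model of a gshare predictor (illustrative, not gem5 code)."""

    def __init__(self, history_bits: int = 12, inst_shift: int = 2):
        self.mask = (1 << history_bits) - 1
        self.inst_shift = inst_shift        # drop low PC bits (assumed 2)
        self.ghr = 0                        # global history register
        # 2-bit saturating counters, initialised to weakly not-taken (1).
        self.counters = [1] * (1 << history_bits)

    def _index(self, pc: int) -> int:
        # XOR the shifted PC with the global history, then mask to table size.
        return ((pc >> self.inst_shift) ^ self.ghr) & self.mask

    def predict(self, pc: int) -> bool:
        # Counter values 2 and 3 predict taken; 0 and 1 predict not taken.
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the actual outcome into the global history.
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask


# Train on a branch that is always taken, then ask for a prediction.
predictor = GShareSketch(history_bits=4)
for _ in range(20):
    predictor.update(0x4000, True)
print(predictor.predict(0x4000))
```

The index computation in `_index` is the part that distinguishes gshare from a purely PC-indexed or purely history-indexed table; everything else is a standard saturating-counter scheme.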
For each of the following branch prediction strategies:
- perceptron (--branch_predictor=MultiperspectivePerceptron8KB)
- gshare (the one you just implemented)
execute both benchmarks (mcf and gcc) using the SiFiveP550.py configuration and record the results.

tip
Save the m5out/stats.txt file for each one of the 6 cases. You will also thank yourself for writing a script that runs all of this inside a tmux session on the lab machines. More details are in the tutorials.
tip: Saving Stats
gem5 has an option, --stats-file=NAME, that sets the name of the stats file for each run. You may want to use it in your script.
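One possible shape for such a script is sketched below in Python. It only builds and prints the command lines; it does not launch gem5. Several names are assumptions rather than facts about the course setup: the binary path `build/RISCV/gem5.opt`, the predictor name `GShare`, and especially the `--benchmark` argument, which is a hypothetical placeholder for however the instructions page tells you to select a workload.

```python
import itertools
import shlex

GEM5_BINARY = "build/RISCV/gem5.opt"   # assumption: adjust to your build
CONFIG_SCRIPT = "SiFiveP550.py"        # path relative to where you run gem5
BENCHMARKS = ["gcc", "mcf"]
# "GShare" is a hypothetical name; use the class name from your implementation.
PREDICTORS = ["MultiperspectivePerceptron8KB", "GShare"]


def build_commands(benchmarks, predictors):
    """Return one gem5 command (as an argument list) per benchmark/predictor pair."""
    commands = []
    for bench, pred in itertools.product(benchmarks, predictors):
        commands.append([
            GEM5_BINARY,
            f"--stats-file=stats_{bench}_{pred}.txt",  # one stats file per run
            CONFIG_SCRIPT,
            f"--branch_predictor={pred}",
            f"--benchmark={bench}",  # HYPOTHETICAL flag: replace with the real one
        ])
    return commands


for cmd in build_commands(BENCHMARKS, PREDICTORS):
    print(shlex.join(cmd))
```

Once the printed commands look right, each argument list could be passed to `subprocess.run` to execute the runs one after another. Add any other predictors you need to evaluate to `PREDICTORS`.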

note
As a good tradeoff between representativeness and execution time, you may, but are not required to, set your maximum instructions to 100 million. Note that even if you do so, gem5 sometimes reports a few instructions fewer or more than the value you set. That is fine as long as your results are consistent.
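To check that the instruction counts of your runs are close to each other, you can read them back from the saved stats files. The sketch below parses text in the usual stats.txt layout (a name, a value, then a `#` comment). The sample text and its values are made up, and the stat name `simInsts` is an assumption: check your own stats.txt for the exact name your gem5 version uses.

```python
def parse_stats(text: str) -> dict:
    """Parse stats.txt-style text into a {name: value} dictionary."""
    stats = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop the trailing description
        if not line or line.startswith("-"):   # skip blanks and banner lines
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            stats[parts[0]] = float(parts[1])
        except ValueError:
            continue                           # ignore non-numeric entries
    return stats


def relative_spread(values) -> float:
    """Return (max - min) / min for a list of positive counts."""
    return (max(values) - min(values)) / min(values)


# Made-up excerpts standing in for two saved stats files.
run_a = """---------- Begin Simulation Statistics ----------
simInsts    100000012    # Number of instructions simulated
"""
run_b = """---------- Begin Simulation Statistics ----------
simInsts    99999987     # Number of instructions simulated
"""

insts = [parse_stats(t)["simInsts"] for t in (run_a, run_b)]
print(relative_spread(insts))
```

A spread that is tiny relative to the instruction budget suggests the runs are comparable; a large one suggests that a run stopped early or used a different configuration.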

Important!
Make sure that in SiFiveP550.py you have specified the O3CPU and keep the machine configuration consistent for all your experiments. The O3CPU is named RiscvO3CPU in the command line options and in code.

tip

The provided configuration scripts (e.g., SiFiveP550.py) have a variable named args.branch_predictor that sets the branch predictor. Change the value of the variable to a string with the name of the desired branch predictor (e.g., args.branch_predictor = "MultiperspectivePerceptron8KB").

You should consult the $GEM_PATH/src/cpu/pred/BranchPredictor.py file for a complete list of the available branch predictors. The name to pass on the command line is the value of the type field of the corresponding class.

Postscript

Synthetic benchmarks often do not reflect the true performance of a branch predictor; real-world programs are the actual targets of branch predictor optimizations. Even so, you may find the source code for the abench benchmark fun to look at. You can find it here.

The workloads passed to the benchmark can be found in 429-resources/cpu2017/inputs. One workload is a random sequence; the other is a lot of ones.
