Skip to main content

Background

Objective

Recall your research into the proportion of instructions that are branches in a given integer program in [Assignment 1](/docs/labs/Assignment 1/). You should now be familiar, through the lectures and assignments, how important accurate branch prediction is in the performance of computer systems; the goal of this lab is to further expand your understanding of branch prediction architecture and gem5 by having you implement a branch predictor.

Branch Predictors in gem5

You will find branch predictor source code in $GEM_PATH/src/cpu/pred. each of the source files corresponds to a different type of predictor and has a few different sections of source code. We will look through the local predictor as an example.

info

Creating a branch predictor requires you to create a custom object in gem5, there is a video and tutorial on how to do so [in the tutorials section](/docs/tutorial/Using Gem5/Advanced Concepts/custom).

There are two key functions in the 2-bit local branch predictor, namely lookup and update. These two make up the majority of the functionality required in a branch predictor.

Lookup

$GEM_PATH/src/cpu/pred/2bit_local.cc

bool
LocalBP::lookup(ThreadID tid, Addr branch_addr, void * &bp_history)
{
bool taken;
unsigned local_predictor_idx = getLocalIndex(branch_addr);

DPRINTF(Fetch, "Looking up index %#x\n",
local_predictor_idx);

uint8_t counter_val = localCtrs[local_predictor_idx];

DPRINTF(Fetch, "prediction is %i.\n",
(int)counter_val);

taken = getPrediction(counter_val);

return taken;
}

The highlighted lines are the key elements of the lookup function for this 2-bit branch predictor in gem5:

  1. Get Index of predictor
  2. Get Value
  3. Is Taken?

In the 2-bit counter, as you hopefully remember from lecture, we model as a flowchart like follows:

Bauer p.134

(Bauer p.134)

where each state is some permutation of our two bits.

tip

Recall that the local prediction scheme assumes that the values of the branch state are attached to each branch, in reality this is not possible so we use a hash based on the address of the branch instruction to index into the table of branch predictors.

Bauer p.136

(Bauer p.136)

So the steps are as follows:

  1. unsigned local_predictor_idx = getLocalIndex(branch_addr); - index into the PHT
  2. uint8_t counter_val = localCtrs[local_predictor_idx]; - get the value of the predictor
  3. taken = getPrediction(counter_val); - get the prediction

Update

$GEM_PATH/src/cpu/pred/2bit_local.cc
void
LocalBP::update(ThreadID tid, Addr branch_addr, bool taken /* important */, void *bp_history,
bool squashed, const StaticInstPtr & inst, Addr corrTarget)
{
assert(bp_history == NULL);
unsigned local_predictor_idx;

// No state to restore, and we do not update on the wrong
// path.
if (squashed) {
return;
}

// Update the local predictor.
local_predictor_idx = getLocalIndex(branch_addr);

DPRINTF(Fetch, "Looking up index %#x\n", local_predictor_idx);

if (taken) {
DPRINTF(Fetch, "Branch updated as taken.\n");
localCtrs[local_predictor_idx]++;
} else {
DPRINTF(Fetch, "Branch updated as not taken.\n");
localCtrs[local_predictor_idx]--;
}
}

Recalling the state diagram from the lookup function, this is pretty straightforward:

  1. local_predictor_idx = getLocalIndex(branch_addr); - index into the PHT
  2. Move the state of the predictor accordingly (c.f. lines 21 and 24)

The Assignment

You are expected to work as a team of 2 people with the second partner that was assigned to you at the beginning of the course. The expectations are covered in our collaboration policy

info

Pleeeeease actually talk to your partner, this is designed to get you both familiar with the ins and outs of gem5, and besides implementing a predictor is a really cool experience (at least I think so... 👉👈)

In this assignment, you will analyze how branching occurs in two benchmarks from the SPEC CPU2006 suite, specifically xz and mcf. To do this, you will analyze the effectiveness of various branch prediction strategies and instrument gem5 to track branch characterizations.

Simulation-Time Estimate

The gcc and xz benchmarks took less than 10 minutes to execute on otherwise idle lab machines, regardless of the predictor (local vs global) used. You will need to implement the GShare predictor and test that as well. In all, it should be roughly one hour of simulation time for the whole assignment, assuming the lab machines are not running anything else.

tip

When you get to implementing your version of GShare, you should strongly consider starting from one of the branch predictors inside the gem5 folder.

danger

You should back up any files you change, gem5 can be... picky...