Background
This reading is optional; feel free to skip ahead to the assignment if you want to forgo this very cool and important knowledge that we're giving you.
Introduction
In 2017, a major security vulnerability in CPUs named Spectre was discovered. It was announced to the public in early 2018, accompanied by mitigations from major vendors. This lab introduces the mechanisms behind Spectre and then examines the attack in detail using gem5.
The attack continues to shape CPU design, as many new attacks and defences have been proposed in the wake of Spectre.
Speculation and Side Effects
Out-of-order execution provides significant performance benefits by allowing CPUs to dynamically schedule instructions and make forward progress despite delays such as cache misses. While a cache miss is being resolved, many instructions may be in flight, and others wait in the reorder buffer until the delay is resolved. Importantly, some of these instructions are dependent on the result of the cache miss and must either stall or speculate on the value that will be returned.
Speculation refers to executing instructions that may turn out not to be needed: when the prediction of which execution path to follow is incorrect, instructions that were never intended to execute are executed speculatively.
Examples of decisions that lead to speculative execution include branch prediction, indirect branch prediction, and load-store dependency prediction.
When a misprediction is detected, the younger in-flight instructions should be squashed so that execution can restart down the correct path from the mispredicted instruction. This understanding is correct but incomplete. For example, consider the following code, which reads from an array only if the index is less than the length of the array:
if(x < len){
y = a[x];
}
This compiles to the following MIPS assembly, with the instructions of interest numbered:
lw $3,0($fp) ;Load x
(1) lw $2,4($fp) ;Load len
slt $2,$3,$2 ;Check if x < len
(2) beq $2,$0,.L8 ;Branch to .L8 if x < len is false
lw $2,0($fp) ;Load x
dsll $2,$2,2 ;Shift x << 2
ld $3,8($fp) ;Load &a
daddu $2,$3,$2 ;Add &a + (x << 2) to get &a[x]
(3) lw $2,0($2) ;Load a[x]
sw $2,16($fp) ;Write result to y
.L8:
Assume that (1) cannot retire because len was recently evicted from the cache and its value must be fetched from memory. Now imagine that the branch predictor predicts that (2) will not be taken; the CPU then begins filling the pipeline and executing instructions from inside the if statement. Consider the load (3): this instruction also requests data from the cache while executing speculatively. In fact, since the bounds-checking branch was speculatively resolved in favour of the in-bounds path, any value of x can be used to index a, and a[x] will be brought into the cache. When (1) finally returns its result and we find that the branch should have been taken, all the younger instructions are squashed, the registers are reset to their state at (2), and execution resumes down the correct path.
However, during this process the cache was perturbed even though the incorrect path was rolled back. Moreover, the value that was brought into the cache could not possibly have been cached had the program executed in program order. Thus, the more nuanced understanding of speculation is that it rolls back the architectural state of the CPU but not the microarchitectural state (i.e. the state of the cache).
Side-channels
The programming model used by programmers to develop code provides an abstract view of the underlying machinery, which leads to a significant improvement in productivity.
For example, the code x++ is very simple at face value but expands into more complicated machine code: a load, an add, and then a store. The load instruction will influence the state of the cache by bringing in the cache line if necessary, and an out-of-order CPU may schedule other instructions between the load and the add to mitigate the load-use data dependency. All of this is invisible to the productive programmer.
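Concretely, a compiler might expand x++ into MIPS along these lines (register choices and stack offsets are illustrative, matching the style of the earlier listing):

lw $2,0($fp) ;Load x from memory (may miss in the cache)
addiu $2,$2,1 ;Add 1
sw $2,0($fp) ;Store the result back to x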
However, these invisible effects can be mobilized to covertly communicate information between processes residing on the same CPU. To understand this, recall that a cache hit is fast while a cache miss is orders of magnitude slower. Now imagine you were tasked with transmitting one byte of information purely through these timing properties of the cache; how would you proceed?
For simplicity, assume that the cache is direct mapped. One approach is to assign 256 cache lines the numbers 0-255, with both the sender and receiver aware of the numbering. With this established, the sender evicts all 256 lines from the cache, then performs a read that brings in the line whose number corresponds to the value it wishes to transmit. At this point, exactly one of the 256 numbered lines is present in the cache. The receiver then accesses each line in turn and measures the time each access takes: 255 accesses incur long delays (misses) and exactly one is quick (the hit). The receiver records the number assigned to the line that hit and has thereby received the value transmitted by the sender. Repeating this process transmits multiple bytes. Consider the example of sending the value 127:
- Sender and Receiver agree on cache line numbering 0-255.
- Sender wants to send 127.
- Sender evicts cache lines 0-255.
- Sender reads the address mapping to line 127.
- Receiver reads addresses mapping to lines 0-255 and measures the timing of each access.
- Receiver notices that lines 0-126 and 128-255 all take ~100 cycles while line 127 takes ~1 cycle.
- Receiver recovers the value 127.
This is known as Flush + Reload, since it involves flushing cache lines and then reloading the value of interest. Flush + Reload is an example of a side channel: the transmission or leakage of information through microarchitectural components. In this example, data was intentionally transmitted through the cache, but later we will see that Spectre uses the same technique to read values that are off-limits.