
# Pipelining

---

CS 130 // 2021-12-01

## Administrivia
- No quiz today, but there will be one on Monday (online, untimed, on Gradescope)
- Midterm Exam 3 is returned on Gradescope
- Uplift Delivery Technology Internship Lunch-n-Learn
    + December 3 at Noon
    + Collier-Scripps Hall, Room 235

# The Datapath

## The Datapath
- What is a "datapath" and what does it consist of?

## K&S Datapath Demo

- [K&S Datapath Simulator](javascript:window.open('http://users.dickinson.edu/%7Ebraught/kands/KandS2/datapath.html','KSA','width=525,height=650');)
- [K&S Simulator with Main Memory](javascript:window.open('http://users.dickinson.edu/%7Ebraught/kands/KandS2/dpandmem.html','KSB','width=800,height=650');)
- [K&S Simulator with Microprogramming](javascript:window.open('http://users.dickinson.edu/%7Ebraught/kands/KandS2/micromachine.html','KSC','width=850,height=650');)
- [K&S Computer Simulator](javascript:window.open('http://users.dickinson.edu/%7Ebraught/kands/KandS2/machine.html','KSC','width=850,height=650');)

## Instruction Execution
1. Uses the PC to fetch the next instruction to be executed from memory
2. Identifies the instruction type and the registers involved, and fetches their contents from the register file
3. Depending on the instruction:
    - Uses the ALU to do the appropriate arithmetic
    - Reads from or writes to RAM for a load/store
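
The three steps above can be sketched as a tiny interpreter loop. This is a minimal illustration, not a model of the real datapath: the tuple instruction format, the `step` function, and the register and memory dictionaries are all hypothetical, and only `add`, `sub`, `lw`, and `sw` are handled.

```python
def step(pc, instr_mem, regs, data_mem):
    """Execute one instruction and return the next PC (hypothetical model)."""
    # 1. Fetch: use the PC to select the next instruction (4 bytes each).
    op, a, b, c = instr_mem[pc // 4]
    # 2. Decode: identify the instruction type and read the registers it names.
    # 3. Execute: use the ALU, or access memory for a load/store.
    if op == "add":                      # add a, b, c
        regs[a] = regs[b] + regs[c]
    elif op == "sub":                    # sub a, b, c
        regs[a] = regs[b] - regs[c]
    elif op == "lw":                     # lw a, c(b)
        regs[a] = data_mem[regs[b] + c]
    elif op == "sw":                     # sw a, c(b)
        data_mem[regs[b] + c] = regs[a]
    else:
        raise ValueError(f"unsupported instruction: {op}")
    return pc + 4


# A small three-instruction program: load, add, store.
program = [
    ("lw", "$t0", "$a3", 40),
    ("add", "$t6", "$t0", "$t2"),
    ("sw", "$t6", "$a3", 40),
]
regs = {"$a3": 0, "$t0": 0, "$t2": 5, "$t6": 0}
data_mem = {40: 7}

pc = 0
while pc < 4 * len(program):
    pc = step(pc, program, regs, data_mem)
```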

## Datapath Implementation

## ALU Control

![ALU Control By Instruction](/teaching/2021f/cs130/assets/images/COD/alu_control_by_instruction.png)

## Datapath with Control

## Control Truth Table

- The control unit can be implemented with a simple combinational logic unit from its truth table:

![Control Truth Table](/teaching/2021f/cs130/assets/images/COD/control_truth_table.png)
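
As a rough software analogy, a truth table is just a lookup from opcode to output signals. The sketch below assumes the usual single-cycle MIPS control values for R-format, `lw`, `sw`, and `beq`; if the figure above differs, the figure is authoritative. The names `SIGNALS`, `CONTROL_TABLE`, and `control` are made up for this example.

```python
# Output signals, in the order used by each row of the table below.
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch", "ALUOp")

X = None  # "don't care": the signal's value does not affect the result

# Assumed values: opcode (6 bits) -> control signal values.
CONTROL_TABLE = {
    0b000000: (1, 0, 0, 1, 0, 0, 0, 0b10),  # R-format
    0b100011: (0, 1, 1, 1, 1, 0, 0, 0b00),  # lw
    0b101011: (X, 1, X, 0, 0, 1, 0, 0b00),  # sw
    0b000100: (X, 0, X, 0, 0, 0, 1, 0b01),  # beq
}


def control(opcode):
    """Return the control signals for an opcode as a name -> value dict."""
    return dict(zip(SIGNALS, CONTROL_TABLE[opcode]))


lw_signals = control(0b100011)
```

Because the outputs depend only on the current opcode and not on any stored state, the same mapping can be built from gates as combinational logic.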

# Performance Issues

## Performance Issues
- Longest delay determines clock period
- Some stages of the datapath are idle waiting for others to finish
- Can improve performance by **pipelining**

# Pipelining

## Pipeline Analogy
- Suppose you need to do four loads of laundry
- Each load of laundry needs to be
    1. Washed via the washing machine
    2. Dried via the dryer
    3. Folded
    4. Put away in the closet
- For simplicity, assume that each task takes 30 mins

## Pipeline Analogy
- How long does it take to complete four loads?
- One approach uses only one stage at a time and does nothing in parallel:
    ![nonpipelined_laundry](/teaching/2021f/cs130/assets/images/COD/unpipelined_laundry.png)
- Notice that the washer is unused 3/4 of the time

## Pipeline Analogy
- Another approach harnesses parallelism by running independent stages simultaneously
    ![pipelined_laundry](/teaching/2021f/cs130/assets/images/COD/pipelined_laundry.png)
- How much of a speedup does this approach give us?
    + $8/3.5 \approx 2.3\times$ speedup for four loads (8 hours vs. 3.5 hours)
    + $2n/0.5n = 4\times$ speedup if running continuously
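
The arithmetic behind these two numbers can be checked with a short sketch. The function names are made up for illustration; the model assumes four equal 30-minute stages and that a new load enters the washer as soon as the previous one leaves it.

```python
def sequential_time(loads, stages, stage_time):
    """Total time when each load finishes every stage before the next starts."""
    return loads * stages * stage_time


def pipelined_time(loads, stages, stage_time):
    """Total time when a new load enters the first stage every stage_time."""
    return (stages + loads - 1) * stage_time


STAGES = 4         # wash, dry, fold, put away
STAGE_HOURS = 0.5  # 30 minutes per task

seq = sequential_time(4, STAGES, STAGE_HOURS)   # 8.0 hours
pipe = pipelined_time(4, STAGES, STAGE_HOURS)   # 3.5 hours
speedup_four_loads = seq / pipe                 # about 2.3

# With many loads, the speedup approaches the number of stages (4).
many = 10_000
speedup_many_loads = (sequential_time(many, STAGES, STAGE_HOURS)
                      / pipelined_time(many, STAGES, STAGE_HOURS))
```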

## Pipelined Datapath

## Pipelined Datapath
- Five stages:
    1. **IF**: Instruction Fetch
    2. **ID**: Instruction Decode
    3. **EX**: Execute
    4. **MEM**: Memory access
    5. **WB**: Write back

## Pipeline Performance
- Assume time for stages is:
    + `$100\text{ps}$` for register read/write
    + `$200\text{ps}$` for other stages

![Pipeline Performance](/teaching/2021f/cs130/assets/images/COD/pipeline_performance_table.png)

## Without a Pipeline
![Pipeline Performance](/teaching/2021f/cs130/assets/images/COD/nonpipelined_mips_instructions.png)

- Why must the clock be set to `$800\text{ps}$` when some instructions like `beq` could be completed in `$500\text{ps}$`?
    + Clock speed is limited by the **slowest** instruction: `lw`

## With a Pipeline
![Pipeline Performance](/teaching/2021f/cs130/assets/images/COD/pipelined_mips_instructions.png)

- How much of a speedup does this approach give us?
    + `$2400/1400 \approx 1.7\times$` speedup
    + `$800n/200n = 4\times$` if running continuously
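
The totals above can be reproduced with a small sketch, assuming the stage times given earlier (100 ps for register read/write, 200 ps otherwise) and three back-to-back instructions as in the figure. The names here are hypothetical.

```python
# Assumed stage times in picoseconds, per the earlier slide.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Without a pipeline, the clock must fit the slowest instruction (lw),
# which uses every stage.
single_cycle_clock = sum(STAGE_PS.values())   # 800 ps

# With a pipeline, the clock must fit the slowest single stage.
pipelined_clock = max(STAGE_PS.values())      # 200 ps


def single_cycle_total(n):
    """Time to run n instructions one after another, without a pipeline."""
    return n * single_cycle_clock


def pipelined_total(n):
    """Time to run n instructions when one enters the pipeline every cycle."""
    return (len(STAGE_PS) + n - 1) * pipelined_clock


speedup_three = single_cycle_total(3) / pipelined_total(3)  # 2400 / 1400
```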

## Pipeline Performance
- Does using a pipeline increase the efficiency of executing **individual** instructions?
    + No, it slows them down from `$800\text{ps}$` to `$1\text{ns}$`
    + Performance benefits come from increased **throughput** due to the parallelism
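
The difference between latency (time for one instruction) and throughput (instructions finished per unit time) can be made concrete. This is a sketch under the same assumed clock periods as above; the variable names are made up.

```python
STAGES = 5
PIPELINED_CLOCK_PS = 200
SINGLE_CYCLE_CLOCK_PS = 800

# Latency: every pipelined instruction passes through all five stages,
# each taking a full 200 ps clock cycle.
pipelined_latency_ps = STAGES * PIPELINED_CLOCK_PS    # 1000 ps = 1 ns
single_cycle_latency_ps = SINGLE_CYCLE_CLOCK_PS       # 800 ps

# Steady-state throughput, in instructions completed per nanosecond.
pipelined_throughput = 1000 / PIPELINED_CLOCK_PS        # 5.0
single_cycle_throughput = 1000 / SINGLE_CYCLE_CLOCK_PS  # 1.25
```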

## Why MIPS is Good for Pipelining
- All MIPS instructions are the **same length**
    + Easy to fetch instruction in cycle 1
    + Easy to decode instruction in cycle 2
- MIPS has only **a few instruction formats**
    + Registers will always be in same location
    + Easy to decode instructions

# Hazards

## Hazards
- Up until now, we have pretended that each instruction is **independent** of the others and that there are no conflicts
- In reality, instructions often depend on previous ones, which may cause naive pipelining to fail

## Hazards
- Situations that prevent immediately executing the next instruction in the pipeline are called **hazards**
- Three types of hazards:
    1. Structure hazards
    2. Data hazards
    3. Control hazards

# Structure Hazards

## Structure Hazards
- In our laundry example, we were assuming the "fold" and "put away" stages could be done simultaneously
    + However, if you are working by yourself, you would have to do them sequentially
- A **structure hazard** occurs when a required resource is busy performing another task

## Structure Hazards
- Recall that our MIPS pipeline has five stages:
    + IF, ID, EX, MEM, WB
- The ID and WB stages read and write to the register file simultaneously
    + Does this create a structure hazard?
    + No, by design, the register file supports simultaneous reading and writing

## Structure Hazards
- Suppose that we stored instruction memory and data memory in the same location
    + Would this create a structure hazard?
    + Yes, the IF and the MEM stages would need to access the same memory simultaneously
    + If using a single memory, the pipeline would need to **stall** to wait for the resource to become available

# Data Hazards

## Data Hazards
- A **data hazard** occurs when one instruction depends on the result of a previous instruction
- For example:
    ```mips
    add $s0, $t0, $t1
    sub $t2, $s0, $t3
    ```
- In what stage does `add` write its result into `$s0` in the register file?
    + Stage 5: Write Back
- In what stage does `sub` read from `$s0`?
    + Stage 2: Instruction Decode

## Data Hazards
- Without forwarding, `sub` would need to stall for two clock cycles until the value of `$s0` is available for reading

![Data Hazard Stall](/teaching/2021f/cs130/assets/images/COD/data_hazard_stall.png)
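
The two-cycle figure can be derived from the stage positions. The sketch below assumes, as the stall diagram does, that the register file is written in the first half of a cycle and read in the second half, so the consumer's ID stage may share a cycle with the producer's WB stage. The function name is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stalls_without_forwarding(distance):
    """Stall cycles needed when the consumer is `distance` instructions
    after the producer (1 = immediately after) and there is no forwarding."""
    # Cycle in which the producer writes back, if it starts in cycle 1.
    producer_wb = STAGES.index("WB") + 1               # cycle 5
    # Cycle in which the consumer would decode with no stalls:
    # it starts in cycle 1 + distance and ID is its second stage.
    consumer_id = distance + STAGES.index("ID") + 1
    return max(0, producer_wb - consumer_id)


back_to_back = stalls_without_forwarding(1)  # add followed directly by sub
```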

## Forwarding (aka Bypassing)
- One way to avoid some data hazards without stalling is to **forward** the result to the next instruction immediately when it is available

![Data Hazard Forwarding](/teaching/2021f/cs130/assets/images/COD/data_hazard_forwarding.png)

## Forwarding (aka Bypassing)
- Can you think of a situation where forwarding cannot resolve a data hazard?
    ```mips
    lw $s0, 20($t1)
    sub $t2, $s0, $t3
    ```
- After what stage does `lw` have the value of `$s0` available?
    + After Stage 4: MEM

## Forwarding (aka Bypassing)

- Why does this create an unavoidable stall?
    ![Data Hazard Load Stall](/teaching/2021f/cs130/assets/images/COD/data_hazard_load.png)

- We cannot send data **backwards in time**

## Data Hazard Exercise
- Consider the following MIPS code:
    ```mips
    lw $t0, 40($a3)
    add $t6, $t0, $t2
    sw $t6, 40($a3)
    ```
- Assuming there is no forwarding implemented, are any stalls necessary?
- How many clock cycles are required to execute these three lines of code without forwarding?

## Data Hazard Exercise
- Consider the following MIPS code:
    ```mips
    lw $t0, 40($a3)
    add $t6, $t0, $t2
    sw $t6, 40($a3)
    ```
- Assuming there IS forwarding implemented, are any stalls necessary?
- How many clock cycles are required to execute these three lines of code with forwarding?

## Rearranging Instructions
- Another way to avoid data hazards is by rearranging instructions
- Consider the following MIPS code:
    ```mips
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    add $t3, $t1, $t2
    sw  $t3, 12($t0)
    lw  $t4, 8($t0)
    add $t5, $t1, $t4
    sw  $t5, 16($t0)
    ```
- Identify any stalls that are necessary even with forwarding and hazard detection active

## Rearranging Instructions
- These two stalls could be avoided by rearranging the code in the following way:
    ```mips
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    lw  $t4, 8($t0)
    add $t3, $t1, $t2
    sw  $t3, 12($t0)
    add $t5, $t1, $t4
    sw  $t5, 16($t0)
    ```
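
The effect of reordering can be checked with a small stall counter. This is a simplified sketch: it assumes full forwarding, so the only remaining stall is a `lw` followed immediately by an instruction that reads the loaded register. The tuple encoding `(op, destination, sources)` and the function names are made up for this example.

```python
def load_use_stalls(program):
    """Count one-cycle stalls where an instruction reads a register
    loaded by the lw directly before it (forwarding assumed)."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        if prev_op == "lw" and prev_dest in curr[2]:
            stalls += 1
    return stalls


def total_cycles(program, stages=5):
    """Cycles to finish the program in a `stages`-deep pipeline."""
    return stages + len(program) - 1 + load_use_stalls(program)


original = [
    ("lw", "$t1", ["$t0"]),
    ("lw", "$t2", ["$t0"]),
    ("add", "$t3", ["$t1", "$t2"]),
    ("sw", None, ["$t3", "$t0"]),
    ("lw", "$t4", ["$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),
    ("sw", None, ["$t5", "$t0"]),
]

reordered = [
    ("lw", "$t1", ["$t0"]),
    ("lw", "$t2", ["$t0"]),
    ("lw", "$t4", ["$t0"]),
    ("add", "$t3", ["$t1", "$t2"]),
    ("sw", None, ["$t3", "$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),
    ("sw", None, ["$t5", "$t0"]),
]

cycles_saved = total_cycles(original) - total_cycles(reordered)
```

Under these assumptions the original order needs 13 cycles and the reordered version 11, because moving the third `lw` up separates each load from the `add` that uses it.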

# Control Hazards

## Control Hazards
- A **control hazard** (aka branching hazard) occurs when the next instruction to be executed is not yet known
- Caused by **branching** instructions such as `beq`
- During a `beq` instruction, at what pipeline stage do we know which branch will be taken?
    + After Stage 3: EX

## Control Hazards
- One way to avoid control hazards is by stalling
    ![Stall on Branch](/teaching/2021f/cs130/assets/images/COD/stall_on_branch.png)
- After every branch instruction, we stall for one cycle

## Control Hazards
- The pros of the "always stall" approach are:
    1. Simple and easy to understand
    2. Will always work
- The con of "always stall" is:
    + It is slow

## Control Hazards
- An alternative to the always stall approach is **branch prediction**
    + Make an educated guess on what the next instruction will be and execute that
    + If incorrectly guessed, "undo" the steps and go to the correct branch

## Control Hazards
- **Static branch prediction** always makes the same prediction for a given branch, using a fixed rule based on typical branching behavior
    + Predict forward branches not taken
    + Predict backward branches taken (loops)
- **Dynamic branch prediction** keeps track of how many times a branch is taken and updates its predictions based on history
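
One common way to implement dynamic prediction is a 2-bit saturating counter per branch. The sketch below is only an illustration of that scheme, not necessarily the one used in any particular processor; the class name, the starting state, and the example outcome pattern are all assumptions.

```python
class TwoBitPredictor:
    """2-bit saturating counter: 0 and 1 predict not taken, 2 and 3 predict taken."""

    def __init__(self):
        self.counter = 0  # assumed starting state: strongly not taken

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        """Move the counter one step toward the actual outcome."""
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def count_mispredictions(outcomes):
    """Run one predictor over a sequence of branch outcomes (True = taken)."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A loop branch that is taken 9 times and then falls through.
one_loop = [True] * 9 + [False]
misses_first_run = count_mispredictions(one_loop)
misses_two_runs = count_mispredictions(one_loop * 2)
```

After the counter warms up, the single not-taken outcome at loop exit costs one misprediction but does not flip the prediction for the next time the loop runs.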

# Pipelined Datapath Design

## Pipelined Datapath Design
<img src='/teaching/2021f/cs130/assets/images/COD/piplined_datapath.png' height='550'>

## Using Registers (Has Bug)
<img src='/teaching/2021f/cs130/assets/images/COD/pipelined_datapath_with_registers.png' height='500'>

## Using Registers (Bug Fixed)

# Pipelined Control

## Pipelined Control Simplified

![Pipeline Control Simplified](/teaching/2021f/cs130/assets/images/COD/pipelined_control_simplified.png)

## Pipelined Control Registers
- Control signals are derived from the instruction and passed along through the pipeline registers to the stages that use them

![Pipeline Control Registers](/teaching/2021f/cs130/assets/images/COD/pipelined_control_registers.png)

## Pipelined Control Complete