====Solution B. Pipeline interlock====

However, consider the following instructions:
<syntaxhighlight lang="nasm">
LD  adr    -> r10
AND r10,r3 -> r11
</syntaxhighlight>
The data read from the address <code>adr</code> is not available from the data cache until after the Memory Access stage of the <code>LD</code> instruction. By this time, the <code>AND</code> instruction is already through the ALU. Resolving this would require passing the data from memory backwards in time to the input of the ALU, which is not possible. The solution is to delay the <code>AND</code> instruction by one cycle. The data hazard is detected in the decode stage, and the fetch and decode stages are '''stalled''': they are prevented from flopping their inputs and so stay in the same state for a cycle. The execute, access, and write-back stages downstream see an extra no-operation instruction (NOP) inserted between the <code>LD</code> and <code>AND</code> instructions.

This NOP is termed a pipeline ''[[bubble (computing)|bubble]]'' since it floats in the pipeline, like an air bubble in a water pipe, occupying resources but not producing useful results. The hardware to detect a data hazard and stall the pipeline until the hazard is cleared is called a '''pipeline interlock'''.
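
The interlock described above can be illustrated with a small cycle-by-cycle model. This is a minimal sketch, not a description of any particular processor: the names <code>Instr</code>, <code>load_use_hazard</code> and <code>simulate</code> are hypothetical, and the model assumes that every other hazard is resolved by forwarding, so only a load followed immediately by a consumer causes a stall.

```python
from typing import NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    op: str                    # mnemonic, e.g. "LD" or "AND"
    dest: Optional[str]        # register written, if any
    srcs: Tuple[str, ...]      # registers read

NOP = Instr("NOP", None, ())   # also used as the pipeline bubble

def load_use_hazard(in_decode: Instr, in_execute: Instr) -> bool:
    """True when Decode holds a consumer of a load that is still in Execute."""
    return in_execute.op == "LD" and in_execute.dest in in_decode.srcs

def simulate(program):
    """Return one tuple per cycle with the ops in (IF, ID, EX, MEM, WB)."""
    if_ = id_ = ex = mem = wb = NOP
    pc = 0
    trace = []
    while True:
        stall = load_use_hazard(id_, ex)
        wb, mem = mem, ex          # the back of the pipeline always advances
        if stall:
            ex = NOP               # bubble enters Execute; IF and ID hold
        else:
            ex, id_ = id_, if_
            if_ = program[pc] if pc < len(program) else NOP
            pc += 1
        drained = all(s is NOP for s in (if_, id_, ex, mem, wb))
        if pc > len(program) and drained:
            break
        trace.append(tuple(s.op for s in (if_, id_, ex, mem, wb)))
    return trace

program = [
    Instr("LD", "r10", ()),              # LD  adr    -> r10
    Instr("AND", "r11", ("r10", "r3")),  # AND r10,r3 -> r11
]
for cycle, stages in enumerate(simulate(program), start=1):
    print(cycle, stages)
```

In this model the <code>AND</code> is held in Decode for one extra cycle while a NOP follows the <code>LD</code> through the Execute, Memory Access and Write-back stages, so the two instructions take seven cycles to drain rather than six.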

{| align=center style="text-align:center"
|'''Bypassing backwards in time'''
|'''Problem resolved using a bubble'''
|-
|[[File:Data Forwarding (Two Stage, error).svg]]
|[[File:Data Forwarding (Two Stage).svg]]
|}

A pipeline interlock does not have to be used with any data forwarding, however. The first example of the <code>SUB</code> followed by <code>AND</code> and the second example of <code>LD</code> followed by <code>AND</code> can both be solved by stalling the fetch and decode stages for three cycles, until write-back is complete and the register file holds the correct data, so that the <code>AND</code>'s Decode stage fetches the correct register value. This causes quite a performance hit, as the processor spends a lot of time processing nothing, but clock speeds can be increased as there is less forwarding logic to wait for.
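
The stall counts with and without forwarding can be compared with a short calculation. The following is a sketch under stated assumptions: <code>stall_cycles</code> is a hypothetical helper, stages are numbered by their cycle offset from fetch, and a written register is assumed to become readable only in the cycle after write-back, which is the model that yields the three-cycle figure above. A register file that can be written and read within the same cycle would need one stall fewer.

```python
# Cycle offsets of each stage, counted from the cycle in which
# the instruction is fetched (classic five-stage pipeline).
IF, ID, EX, MEM, WB = range(5)

def stall_cycles(producer_is_load: bool, distance: int, forwarding: bool) -> int:
    """Bubbles needed before a consumer issued `distance` instructions
    after the instruction that produces its operand (1 = adjacent)."""
    if distance < 1:
        raise ValueError("distance must be at least 1")
    if forwarding:
        # Value can be bypassed the cycle after it is produced:
        # after Execute for ALU operations, after Memory Access for loads.
        ready = (MEM if producer_is_load else EX) + 1
        needed = EX + distance   # cycle the unstalled consumer enters Execute
    else:
        # Assumption: the register is readable only after write-back completes.
        ready = WB + 1
        needed = ID + distance   # cycle the unstalled consumer enters Decode
    return max(0, ready - needed)

for name, is_load in (("SUB -> AND", False), ("LD  -> AND", True)):
    print(name,
          "interlock only:", stall_cycles(is_load, 1, forwarding=False),
          "with forwarding:", stall_cycles(is_load, 1, forwarding=True))
```

Under these assumptions both adjacent pairs cost three bubbles without forwarding, while forwarding reduces the cost to none for <code>SUB</code> followed by <code>AND</code> and to one for <code>LD</code> followed by <code>AND</code>.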

This data hazard can be detected quite easily when the program's machine code is written by the compiler.  The [[Stanford MIPS]] machine relied on the compiler to add the NOP instructions in this case, rather than having the circuitry to detect and (more taxingly) stall the first two pipeline stages.  Hence the name MIPS: Microprocessor without Interlocked Pipeline Stages.  It turned out that the extra NOP instructions added by the compiler expanded the program binaries enough that the instruction cache hit rate was reduced.  The stall hardware, although expensive, was put back into later designs to improve instruction cache hit rate, at which point the acronym no longer made sense.
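
The software approach can be sketched as a pass over the instruction list. This is only an illustration and not the actual Stanford MIPS toolchain: <code>fill_load_delay</code> is a hypothetical name, instructions are modelled as <code>(op, dest, srcs)</code> tuples, and a single load delay cycle is assumed. A real compiler would normally try to move an independent instruction into the slot before falling back to a NOP.

```python
def fill_load_delay(program):
    """Insert a NOP after every load whose result is read by the very
    next instruction (assumes one load delay cycle)."""
    out = []
    for i, (op, dest, srcs) in enumerate(program):
        out.append((op, dest, srcs))
        following = program[i + 1] if i + 1 < len(program) else None
        if op == "LD" and following is not None and dest in following[2]:
            out.append(("NOP", None, ()))   # software-inserted bubble
    return out

program = [
    ("LD", "r10", ()),               # LD  adr    -> r10
    ("AND", "r11", ("r10", "r3")),   # AND r10,r3 -> r11
]
for instr in fill_load_delay(program):
    print(instr)
```

Each inserted NOP occupies space in the binary, which is the code-size cost mentioned above.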