===Control hazards===

'''Control hazards are caused by conditional and unconditional branching.''' The classic RISC pipeline resolves branches in the decode stage, which means the branch resolution recurrence is two cycles long. There are three implications:

* The branch resolution recurrence goes through quite a bit of circuitry: the instruction cache read, register file read, branch condition compute (which involves a 32-bit compare on the MIPS CPUs), and the next instruction address multiplexer.
* Because branch and jump targets are calculated in parallel with the register read, RISC ISAs typically do not have instructions that branch to a register+offset address. Jump to register is supported.
* On any branch taken, the instruction immediately after the branch is always fetched from the instruction cache. If this instruction is ignored, there is a one cycle per taken branch [[Instructions per cycle|IPC]] penalty, which is a significant cost.

There are four schemes to solve this performance problem with branches:

* Predict Not Taken: Always fetch the instruction after the branch from the instruction cache, but only execute it if the branch is not taken.  If the branch is not taken, the pipeline stays full.  If the branch is taken, the instruction is flushed (marked as if it were a NOP), and one cycle's opportunity to finish an instruction is lost.
* Branch Likely: Always fetch the instruction after the branch from the instruction cache, but only execute it if the branch is taken.  The compiler can always fill the branch delay slot on such a branch, and since branches are more often taken than not, such branches have a smaller IPC penalty than predict-not-taken branches.
* [[Branch delay slot|Branch Delay Slot]]: Always fetch the instruction after the branch from the instruction cache; whether it is executed when the branch is taken depends on the design of the delayed branch and on the branch condition. Instead of taking an IPC penalty for some fraction of branches either taken (perhaps 60%) or not taken (perhaps 40%), branch delay slots take an IPC penalty only for those branches whose delay slot the compiler could not fill with a useful instruction.  The SPARC, MIPS, and MC88K designers designed a branch delay slot into their ISAs.
* [[Branch Prediction]]: In parallel with fetching each instruction, guess if the instruction is a branch or jump, and if so, guess the target.  On the cycle after a branch or jump, fetch the instruction at the guessed target.  When the guess is wrong, flush the incorrectly fetched target.
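
The trade-off between the four schemes can be illustrated with a rough back-of-the-envelope model. The sketch below is not taken from any particular processor: it assumes a single lost cycle per penalized branch, uses the "perhaps 60%" taken fraction mentioned above as a default, and introduces hypothetical parameters (<code>slot_fill_rate</code>, <code>mispredict_rate</code>, <code>branch_fraction</code>) whose default values are placeholders rather than measurements.

```python
def penalty_per_branch(scheme, taken_fraction=0.6,
                       slot_fill_rate=0.5, mispredict_rate=0.1):
    """Expected lost cycles per branch under a one-cycle-bubble model.

    All default values are illustrative assumptions, not measured data.
    """
    if scheme == "predict_not_taken":
        # The fetched instruction is flushed whenever the branch is taken.
        return taken_fraction
    if scheme == "branch_likely":
        # The slot instruction is discarded whenever the branch is NOT taken.
        return 1.0 - taken_fraction
    if scheme == "delay_slot":
        # A cycle is wasted only when the compiler had to put a NOP in the slot.
        return 1.0 - slot_fill_rate
    if scheme == "branch_prediction":
        # A cycle is wasted only when the guess was wrong.
        return mispredict_rate
    raise ValueError(f"unknown scheme: {scheme}")


def effective_cpi(scheme, branch_fraction=0.2, **kwargs):
    """Cycles per instruction, assuming an ideal CPI of 1 plus branch bubbles."""
    return 1.0 + branch_fraction * penalty_per_branch(scheme, **kwargs)


if __name__ == "__main__":
    for name in ("predict_not_taken", "branch_likely",
                 "delay_slot", "branch_prediction"):
        print(name, round(effective_cpi(name), 3))
```

Under these assumed numbers the ordering of the schemes depends entirely on the chosen rates; the model is only meant to show which event each scheme pays for.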

Delayed branches were controversial, first, because their semantics are complicated.  A delayed branch specifies that the jump to a new location happens ''after'' the next instruction.  That next instruction is the one unavoidably loaded by the instruction cache after the branch.
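
The semantics can be made concrete with a toy interpreter. The following is a minimal sketch, not a model of any real ISA: the instruction format, the <code>run</code> function, and the use of a separate "next PC" variable are all assumptions made for illustration. With <code>delayed=True</code>, a jump only changes the address of the instruction ''after'' the next one, so the delay slot instruction still executes.

```python
def run(program, delayed=True, max_steps=100):
    """Execute a toy program and return the indices of executed instructions.

    `program` is a list of (op, arg) tuples; only the hypothetical op "jump"
    (arg = target index) affects control flow, anything else falls through.
    """
    trace = []
    pc, next_pc = 0, 1
    while pc < len(program) and len(trace) < max_steps:
        op, arg = program[pc]
        trace.append(pc)
        if op == "jump":
            if delayed:
                # The already-fetched next instruction runs first, then the target.
                pc, next_pc = next_pc, arg
            else:
                # Non-delayed: control transfers immediately to the target.
                pc, next_pc = arg, arg + 1
        else:
            pc, next_pc = next_pc, next_pc + 1
    return trace


program = [
    ("jump", 3),       # 0: branch to index 3
    ("op", "slot"),    # 1: delay slot
    ("op", "skipped"), # 2: never reached in either mode
    ("op", "target"),  # 3: branch target
]

if __name__ == "__main__":
    print(run(program, delayed=True))   # executes 0, then the slot 1, then 3
    print(run(program, delayed=False))  # executes 0, then 3
```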

Delayed branches have been criticized{{By whom|date=May 2012}} as a poor short-term choice in ISA design:
* Compilers typically have some difficulty finding logically independent instructions to place after the branch (the instruction after the branch is called the delay slot), and so must often insert NOPs into the delay slots.
* [[Superscalar]] processors, which fetch multiple instructions per cycle and must have some form of branch prediction, do not benefit from delayed branches. The [[DEC Alpha|Alpha]] ISA left out delayed branches, as it was intended for superscalar processors.
* The most serious drawback to delayed branches is the additional control complexity they entail. If the delay slot instruction takes an exception, the processor has to be restarted on the branch, rather than that next instruction. Exceptions then have essentially two addresses, the exception address and the restart address, and generating and distinguishing between the two correctly in all cases has been a source of bugs for later designs.
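
The two-address problem in the last point can be sketched as follows. This is an illustrative model only, assuming fixed 4-byte instructions and a delay slot that immediately follows its branch; the names <code>ExceptionRecord</code> and <code>record_exception</code> are hypothetical and do not correspond to any specific processor's registers.

```python
from dataclasses import dataclass

INSTRUCTION_BYTES = 4  # assumption: fixed-width 32-bit instructions


@dataclass(frozen=True)
class ExceptionRecord:
    exception_address: int  # address of the instruction that faulted
    restart_address: int    # address at which execution must resume
    in_delay_slot: bool     # lets the handler tell the two cases apart


def record_exception(faulting_pc, in_delay_slot):
    """Compute both addresses for an exception at `faulting_pc`.

    If the faulting instruction sits in a delay slot, restarting at it would
    lose the branch, so the restart address is the preceding branch instead.
    """
    if in_delay_slot:
        restart = faulting_pc - INSTRUCTION_BYTES
    else:
        restart = faulting_pc
    return ExceptionRecord(faulting_pc, restart, in_delay_slot)


if __name__ == "__main__":
    print(record_exception(0x1004, in_delay_slot=True))
    print(record_exception(0x1004, in_delay_slot=False))
```

The extra flag is what allows a handler to distinguish a fault in the delay slot from a fault in the branch itself, since both cases restart at the same address in this model.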