Instruction pipelining
==History==
{{more citations needed section|date=March 2019}}
Seminal uses of pipelining were in the [[ILLIAC II]] project and the [[IBM Stretch]] project, though a simple version was used earlier in the [[Z1 (computer)|Z1]] in 1939 and the [[Z3 (computer)|Z3]] in 1941.<ref name="Rojas_1997">{{cite journal |title=Konrad Zuse's Legacy: The Architecture of the Z1 and Z3 |author-last=Rojas |author-first=Raúl |author-link=Raúl Rojas |journal=[[IEEE Annals of the History of Computing]] |volume=19 |number=2 |date=April–June 1997 |pages=5–16 |doi=10.1109/85.586067 |url=http://ed-thelen.org/comp-hist/Zuse_Z1_and_Z3.pdf |access-date=2022-07-03 |url-status=live |archive-url=https://web.archive.org/web/20220703082408/http://ed-thelen.org/comp-hist/Zuse_Z1_and_Z3.pdf |archive-date=2022-07-03}} (12 pages)</ref>

Pipelining began in earnest in the late 1970s in [[supercomputers]] such as vector processors and array processors.{{Citation needed|date=March 2019}} One of the early supercomputers was the Cyber series built by Control Data Corporation. Its main architect, [[Seymour Cray]], later headed Cray Research. Cray developed the X-MP line of supercomputers, using pipelining for both the multiply and the add/subtract functions. Later, Star Technologies added parallelism (several pipelined functions working in parallel), developed by Roger Chen. In 1984, Star Technologies added the pipelined divide circuit developed by James Bradley. By the mid-1980s, pipelining was used by many different companies around the world.{{Citation needed|date=March 2019}}

Pipelining was not limited to supercomputers. In 1976, the [[Amdahl Corporation]]'s 470 series general-purpose mainframe had a 7-step pipeline and a patented branch prediction circuit.{{Citation needed|date=March 2019}}

===Hazards===
{{main article|Hazard (computer architecture)}}
The model of sequential execution assumes that each instruction completes before the next one begins; this assumption does not hold on a pipelined processor.
A situation in which this assumption produces an incorrect or unexpected result is known as a [[Hazard (computer architecture)|hazard]]. Imagine the following two register instructions to a hypothetical processor:

 1: add 1 to R5
 2: copy R5 to R6

If the processor has the 5 steps listed in the initial illustration (the 'Basic five-stage pipeline' at the start of the article), instruction 1 would be fetched at time ''t''<sub>1</sub> and its execution would be complete at ''t''<sub>5</sub>. Instruction 2 would be fetched at ''t''<sub>2</sub> and would be complete at ''t''<sub>6</sub>. The first instruction might deposit the incremented number into R5 as its fifth step (register write-back) at ''t''<sub>5</sub>, but the second instruction might get the number from R5 (to copy to R6) in its second step (instruction decode and register fetch) at time ''t''<sub>3</sub>. The first instruction would not have incremented the value by then, so the above code invokes a hazard.

Writing computer programs in a [[compiler|compiled]] language might not raise these concerns, as the compiler could be designed to generate machine code that avoids hazards.

====Workarounds====
In some early DSP and RISC processors, the documentation advises programmers to avoid such dependencies in adjacent and nearly adjacent instructions (called [[delay slot]]s), or declares that the second instruction uses an old value rather than the desired value (in the example above, the processor might counter-intuitively copy the unincremented value), or declares that the value it uses is undefined. The programmer may have unrelated work that the processor can do in the meantime; or, to ensure correct results, the programmer may insert [[NOP (code)|NOP]]s into the code, partly negating the advantages of pipelining.
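The timing of the hazard above can be sketched numerically. This is a simplified model, not part of the article: it assumes every instruction advances exactly one stage per cycle, so stage ''s'' of an instruction issued at cycle ''i'' occurs at cycle ''i'' + ''s''.

```python
# Simplified timing model of the classic five-stage pipeline
# (assumption: one stage per cycle, one instruction issued per cycle).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # fetch, decode/register fetch,
                                          # execute, memory access, write-back

def stage_cycle(issue_cycle, stage):
    """Cycle at which `stage` of an instruction issued at `issue_cycle` runs."""
    return issue_cycle + STAGES.index(stage)

# Instruction 1 ("add 1 to R5") issues at t1; instruction 2 ("copy R5 to R6") at t2.
write_back_of_insn1 = stage_cycle(1, "WB")      # R5 is written here
register_fetch_of_insn2 = stage_cycle(2, "ID")  # R5 is read here

print(write_back_of_insn1)      # 5 (t5)
print(register_fetch_of_insn2)  # 3 (t3): R5 is read two cycles before it is written
```

The model makes the read-after-write dependency visible: instruction 2 reads R5 at ''t''<sub>3</sub>, two cycles before instruction 1 writes it at ''t''<sub>5</sub>.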
====Solutions====
Pipelined processors commonly use three techniques to work as expected when the programmer assumes that each instruction completes before the next one begins:
*The pipeline could [[Pipeline stall|stall]], or cease scheduling new instructions until the required values are available. This results in empty slots in the pipeline, or ''bubbles'', in which no work is performed.
*An additional data path can be added that routes a computed value to a future instruction elsewhere in the pipeline before the instruction that produced it has been fully retired, a process called [[operand forwarding]].<ref>{{cite web|url=http://www.csee.umbc.edu/~squire/cs411_l19.html |title=CMSC 411 Lecture 19, Pipelining Data Forwarding |publisher=University of Maryland Baltimore County Computer Science and Electrical Engineering Department |access-date=2020-01-22}}</ref><ref>{{Cite web |url=http://hpc.serc.iisc.ernet.in/~govind/hpc/L10-Pipeline.txt |title=High performance computing, Notes of class 11 |publisher=hpc.serc.iisc.ernet.in |date=September 2000 |access-date=2014-02-08 |url-status=dead |archive-url=https://web.archive.org/web/20131227033204/http://hpc.serc.iisc.ernet.in/~govind/hpc/L10-Pipeline.txt |archive-date=2013-12-27 }}</ref>
*The processor can locate other instructions which are not dependent on the current ones and which can be immediately executed without hazards, an optimization known as [[out-of-order execution]].

===Branches===
A branch out of the normal instruction sequence often involves a hazard. Unless the processor can give effect to the branch in a single time cycle, the pipeline will continue fetching instructions sequentially. Such instructions cannot be allowed to take effect because the programmer has diverted control to another part of the program.

A conditional branch is even more problematic. The processor may or may not branch, depending on a calculation that has not yet occurred.
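The wrong-path fetching described above can be sketched as a toy simulation. This is illustrative only; the instruction names and the assumed two-cycle branch-resolution delay are hypothetical, not taken from any real processor.

```python
# Toy model: a pipeline keeps fetching sequentially past a taken branch,
# then must squash (flush) the wrong-path instructions it already fetched.
program = ["i0", "branch->5", "i2", "i3", "i4", "i5", "i6"]
BRANCH_RESOLVE_DELAY = 2  # assumed cycles before the branch target is known

fetched, executed = [], []
pc = 0
while pc < len(program):
    insn = program[pc]
    fetched.append(insn)
    if insn.startswith("branch->"):
        target = int(insn.split("->")[1])
        # Sequential instructions fetched before the branch resolves:
        wrong_path = program[pc + 1 : pc + 1 + BRANCH_RESOLVE_DELAY]
        fetched.extend(wrong_path)  # they enter the pipeline...
        executed.append(insn)       # ...but only the branch takes effect,
        pc = target                 # and fetching resumes at the target.
        continue
    executed.append(insn)
    pc += 1

print(fetched)   # includes the squashed 'i2' and 'i3'
print(executed)  # ['i0', 'branch->5', 'i5', 'i6']
```

The gap between `fetched` and `executed` is the wasted work: the wrong-path instructions occupied pipeline slots but were not allowed to take effect.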
Various processors may stall, may attempt [[branch prediction]], and may be able to begin executing two different program sequences ([[Speculative execution#Eager execution|eager execution]]), each assuming the branch is or is not taken, discarding all work that pertains to the incorrect guess.{{efn|Early pipelined processors without any of these heuristics, such as the [[PA-RISC]] processor of [[Hewlett-Packard]], dealt with hazards by simply warning the programmer; in this case, that one or more instructions following the branch would be executed whether or not the branch was taken. This could be useful; for instance, after computing a number in a register, a conditional branch could be followed by loading into the register a value more useful to subsequent computations in both the branch and the non-branch case.}}

A processor with an implementation of branch prediction that usually makes correct predictions can minimize the performance penalty from branching. However, if branches are predicted poorly, prediction may create more work for the processor, such as [[pipeline flush|flushing from the pipeline]] the incorrect code path that has begun execution before resuming execution at the correct location.

Programs written for a pipelined processor deliberately avoid branching to minimize possible loss of speed. For example, the programmer can handle the usual case with sequential execution and branch only on detecting unusual cases. Using programs such as [[gcov]] to analyze [[code coverage]] lets the programmer measure how often particular branches are actually executed and gain insight with which to optimize the code. In some cases, a programmer can handle both the usual case and unusual case with [[branch (computer science)#Branch-free code|branch-free code]].

===Special situations===
; Self-modifying programs : The technique of [[self-modifying code]] can be problematic on a pipelined processor.
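One common branch-prediction scheme is a two-bit saturating counter, sketched below. This is a generic textbook mechanism, not a description of any specific processor's implementation; the loop workload is a made-up example.

```python
# Two-bit saturating-counter branch predictor (generic scheme).
# Counter states 0-1 predict "not taken"; states 2-3 predict "taken".
# Two mispredictions in a row are required to flip the prediction,
# so a single anomalous outcome does not disturb a stable pattern.
class TwoBitPredictor:
    def __init__(self):
        self.counter = 2  # start in "weakly taken"

    def predict(self):
        return self.counter >= 2  # True means "predict taken"

    def update(self, taken):
        # Saturating step toward the observed outcome.
        self.counter = min(3, self.counter + 1) if taken else max(0, self.counter - 1)

# A loop branch that is taken nine times, then falls through once:
p = TwoBitPredictor()
correct = 0
for outcome in [True] * 9 + [False]:
    correct += p.predict() == outcome
    p.update(outcome)
print(correct)  # 9 of 10 predictions correct
```

On this workload the predictor mispredicts only the final loop exit, which is why such counters work well for the loop-heavy code that dominates many programs.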
: In this technique, one of the effects of a program is to modify its own upcoming instructions. If the processor has an [[instruction cache]], the original instruction may already have been copied into a [[prefetch input queue]] and the modification will not take effect. Some processors such as the [[Zilog Z280]] can configure their on-chip cache memories for data-only fetches, or as part of their ordinary memory address space, and avoid such difficulties with self-modifying instructions.
; Uninterruptible instructions : An instruction may be uninterruptible to ensure its [[atomicity (programming)|atomicity]], such as when it swaps two items. A sequential processor permits [[interrupt]]s between instructions, but a pipelined processor overlaps instructions, so executing an uninterruptible instruction renders portions of ordinary instructions uninterruptible too. The [[Cyrix coma bug]] would [[hang (computing)|hang]] a single-core system using an infinite loop in which an uninterruptible instruction was always in the pipeline.
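The prefetch-queue problem with self-modifying code can be sketched as follows. This is a purely illustrative model with made-up "instructions"; it only shows the timing effect, not any real instruction encoding.

```python
# Toy model: a prefetch queue executes a stale copy of an instruction
# that the running program has just overwritten in memory.
memory = ["inc", "patch-next", "old-op"]  # a three-"instruction" program;
                                          # "patch-next" rewrites its successor

prefetch_queue = list(memory)  # all three fetched ahead of execution

executed = []
for pc, insn in enumerate(prefetch_queue):
    if insn == "patch-next":
        memory[pc + 1] = "new-op"  # self-modification updates memory only,
                                   # not the already-fetched queue entry
    executed.append(insn)

print(executed[2])  # 'old-op': the stale prefetched copy ran anyway
print(memory[2])    # 'new-op': memory was updated, but too late
```

The divergence between `executed[2]` and `memory[2]` is exactly the failure mode described above: the modification landed in memory after the original instruction had already entered the prefetch queue.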