==History==
The [[IBM 7030 Stretch]], designed in the late 1950s, pre-executes all unconditional branches and any conditional branches that depend on the index registers. For other conditional branches, the first two production models implemented predict untaken; subsequent models were changed to base their predictions on the current values of the indicator bits (corresponding to today's condition codes).<ref>{{cite web|url=https://people.computing.clemson.edu/~mark/stretch.html|title=IBM Stretch (7030) -- Aggressive Uniprocessor Parallelism}}</ref> The Stretch designers had considered static hint bits in the branch instructions early in the project but decided against them. Misprediction recovery was provided by the lookahead unit on Stretch, and part of Stretch's reputation for less-than-stellar performance was attributed to the time required for this recovery. Subsequent IBM large computer designs did not use branch prediction with speculative execution until the [[IBM 3090]] in 1985.

Two-bit predictors were introduced by Tom McWilliams and Curt Widdoes in 1977 for the Lawrence Livermore National Lab S-1 supercomputer and independently by Jim Smith in 1979 at CDC.<ref>{{cite web|url=https://people.computing.clemson.edu/~mark/s1.html|title=S-1 Supercomputer}}</ref>

Microprogrammed processors, popular from the 1960s to the 1980s and beyond, took multiple cycles per instruction, and generally did not require branch prediction. However, in addition to the IBM 3090, there are several other examples of microprogrammed designs that incorporated branch prediction.

The [[Burroughs B2500|Burroughs B4900]], a microprogrammed COBOL machine released around 1982, was pipelined and used branch prediction. The B4900 stores its branch prediction history state back into the in-memory instructions during program execution. It implements 4-state branch prediction by using four semantically equivalent branch opcodes to represent each branch operator type; the opcode used indicates the history of that particular branch instruction. If the hardware determines that the branch prediction state of a particular branch needs to be updated, it rewrites the opcode with the semantically equivalent opcode that encodes the proper history. This scheme obtains a 93% hit rate. {{US patent|4435756|US patent 4,435,756}} and others were granted on this scheme.
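
The opcode-rewriting idea can be illustrated with a minimal sketch. The opcode names below are hypothetical, and the four states are assumed to behave like a two-bit saturating counter; the actual B4900 encoding and state transitions are not described here.

```python
# Hypothetical sketch: the opcode names and the saturating-counter
# transitions are assumptions, not the B4900's documented encoding.

# Four semantically equivalent opcodes for one branch operator,
# ordered from "strongly not taken" (0) to "strongly taken" (3).
VARIANTS = ["BR_SN", "BR_WN", "BR_WT", "BR_ST"]


def predict(opcode):
    """Predict taken when the opcode variant is in the upper half."""
    return VARIANTS.index(opcode) >= 2


def updated_opcode(opcode, taken):
    """Return the variant that encodes the history after this outcome."""
    state = VARIANTS.index(opcode)
    state = min(state + 1, 3) if taken else max(state - 1, 0)
    return VARIANTS[state]


def run(memory, address, outcomes):
    """Execute one branch repeatedly, rewriting its opcode in memory.

    Returns the number of correct predictions.
    """
    hits = 0
    for taken in outcomes:
        if predict(memory[address]) == taken:
            hits += 1
        new_opcode = updated_opcode(memory[address], taken)
        if new_opcode != memory[address]:
            # The prediction history lives in the instruction itself.
            memory[address] = new_opcode
    return hits


# A loop branch that is taken nine times and then falls through.
memory = {0x100: "BR_WN"}
hits = run(memory, 0x100, [True] * 9 + [False])
```

In this sketch no separate prediction table is needed: the state travels with the instruction, so it persists for as long as the rewritten instruction stays in memory.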

The DEC [[VAX 9000]], announced in 1989, is both microprogrammed and pipelined, and performs branch prediction.<ref>{{cite book|chapter-url=https://ieeexplore.ieee.org/document/63652|chapter=Micro-architecture of the VAX 9000|doi=10.1109/CMPCON.1990.63652 |s2cid=24999559 |title=Digest of Papers Compcon Spring '90. Thirty-Fifth IEEE Computer Society International Conference on Intellectual Leverage |year=1990 |last1=Murray |first1=J.E. |last2=Salett |first2=R.M. |last3=Hetherington |first3=R.C. |last4=McKeen |first4=F.X. |pages=44–53 |isbn=0-8186-2028-5 }}</ref>

The first commercial RISC processors, the [[MIPS Technologies|MIPS]] [[R2000 (microprocessor)|R2000]] and [[R3000]] and the earlier [[SPARC]] processors, do only trivial "not-taken" branch prediction. Because they use branch delay slots, fetch just one instruction per cycle, and execute in-order, there is no performance loss. The later [[R4000]] uses the same trivial "not-taken" branch prediction, and loses two cycles to each taken branch because the branch resolution recurrence is four cycles long.

Branch prediction became more important with the introduction of pipelined superscalar processors such as the Intel [[Pentium (brand)|Pentium]], the DEC [[Alpha 21064]], the MIPS [[R8000]], and the [[IBM Power microprocessors|IBM POWER]] series. These processors all rely on one-bit or simple bimodal predictors.

The DEC [[Alpha 21264]] (EV6) uses a next-line predictor overridden by a combined local predictor and global predictor, where the combining choice is made by a bimodal predictor.<ref name="seznec">{{cite conference |first1=A. |last1=Seznec |first2=S. |last2=Felix |first3=V. |last3=Krishnan |first4=Y. |last4=Sazeides |url=https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=988c6e59d2c44b85b8aacc68bc2958cacf76eaf1 |title=Design Tradeoffs for the Alpha EV8 Conditional Branch Predictor |book-title=Proceedings 29th Annual International Symposium on Computer Architecture |doi=10.1109/ISCA.2002.1003587|url-access=subscription }}</ref>
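
The combination of a local and a global predictor under a chooser can be sketched as follows. This is a simplified illustration, not the 21264's actual design: the table sizes, the history length, and the class and function names are assumptions, the chooser is indexed by the branch address, and the next-line predictor is omitted.

```python
def bump(counter, up):
    """Move a two-bit saturating counter one step up or down."""
    return min(counter + 1, 3) if up else max(counter - 1, 0)


class Tournament:
    """Local and global predictors combined by a per-branch chooser.

    All sizes are illustrative assumptions.
    """

    def __init__(self, bits=10, hist_bits=4):
        self.mask = (1 << bits) - 1
        self.hmask = (1 << hist_bits) - 1
        self.local_hist = [0] * (1 << bits)       # per-branch history
        self.local_ctr = [1] * (1 << hist_bits)   # indexed by local history
        self.global_hist = 0                      # history of all branches
        self.global_ctr = [1] * (1 << hist_bits)  # indexed by global history
        self.choice = [1] * (1 << bits)           # >= 2 selects global

    def _components(self, pc):
        i = pc & self.mask
        lh = self.local_hist[i]
        local = self.local_ctr[lh] >= 2
        glob = self.global_ctr[self.global_hist] >= 2
        return i, lh, local, glob

    def predict(self, pc):
        i, _, local, glob = self._components(pc)
        return glob if self.choice[i] >= 2 else local

    def update(self, pc, taken):
        i, lh, local, glob = self._components(pc)
        if local != glob:
            # Train the chooser toward whichever component was right.
            self.choice[i] = bump(self.choice[i], glob == taken)
        gh = self.global_hist
        self.local_ctr[lh] = bump(self.local_ctr[lh], taken)
        self.global_ctr[gh] = bump(self.global_ctr[gh], taken)
        self.local_hist[i] = ((lh << 1) | int(taken)) & self.hmask
        self.global_hist = ((gh << 1) | int(taken)) & self.hmask


# One branch that alternates taken / not taken.
predictor = Tournament()
misses_after_warmup = 0
for n in range(40):
    taken = (n % 2 == 0)
    if n >= 20 and predictor.predict(0x400) != taken:
        misses_after_warmup += 1
    predictor.update(0x400, taken)
```

The chooser is only trained when the two components disagree, since an outcome on which both agree says nothing about which one to trust.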

The [[AMD K8]] has a combined bimodal and global predictor, where the combining choice is another bimodal predictor. This processor caches the base and choice bimodal predictor counters in bits of the L2 cache otherwise used for ECC. As a result, it effectively has very large base and choice predictor tables, and uses parity rather than ECC on instructions in the L2 cache. The parity design is sufficient, since any instruction suffering a parity error can be invalidated and refetched from memory.

The [[Alpha 21464]]<ref name="seznec"/> (EV8, cancelled late in design) had a minimum branch misprediction penalty of 14 cycles. It was to use a complex but fast next-line predictor overridden by a combined bimodal and majority-voting predictor. The majority vote was between the bimodal and two gskew predictors.
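
Majority voting among three component predictors can be sketched as below. The index hash functions, table size, and update policy are placeholder assumptions chosen for brevity; they are not the skewing functions or update rules of the EV8 design.

```python
TABLE_BITS = 8
MASK = (1 << TABLE_BITS) - 1


def majority(a, b, c):
    """Return True when at least two of the three votes are True."""
    return (int(a) + int(b) + int(c)) >= 2


def indices(pc, hist):
    """Three table indices: one bimodal, two history-hashed.

    The two hashes are placeholders, not real skewing functions.
    """
    return (
        pc & MASK,
        (pc ^ hist) & MASK,
        (pc ^ (hist << 1) ^ (hist >> 1)) & MASK,
    )


class MajorityPredictor:
    def __init__(self):
        # Three tables of two-bit counters, initialised weakly not taken.
        self.tables = [[1] * (1 << TABLE_BITS) for _ in range(3)]
        self.hist = 0

    def predict(self, pc):
        votes = [self.tables[t][i] >= 2
                 for t, i in enumerate(indices(pc, self.hist))]
        return majority(*votes)

    def update(self, pc, taken):
        # Simplification: every table is updated on every branch.
        for t, i in enumerate(indices(pc, self.hist)):
            c = self.tables[t][i]
            self.tables[t][i] = min(c + 1, 3) if taken else max(c - 1, 0)
        self.hist = ((self.hist << 1) | int(taken)) & MASK


# An always-taken branch: once the history register saturates, the
# bimodal table and the first hashed table both vote "taken".
p = MajorityPredictor()
for _ in range(12):
    p.update(0x40, True)
prediction = p.predict(0x40)
```

Because each table is indexed differently, two branches that collide in one table are unlikely to collide in the others, so the vote can outweigh a single polluted entry.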

In 2018 a catastrophic [[vulnerability (computing)|security vulnerability]] called [[Spectre (security vulnerability)|Spectre]] was made public by Google's [[Project Zero]] and other researchers. Affecting virtually all modern [[central processing unit|CPUs]], the vulnerability involves priming the branch predictors so that another process (or the kernel) will mispredict a branch and use secret data as an array index, evicting one of the attacker's cache lines. The attacker can then time accesses to their own array to find out which line was evicted, turning this internal (microarchitectural) CPU state into a value the attacker can save, and which carries information about data they could not read directly.<ref>{{cite web|url=https://www.theguardian.com/technology/2018/jan/04/meltdown-spectre-worst-cpu-bugs-ever-found-affect-computers-intel-processors-security-flaw|title=Meltdown and Spectre: 'worst ever' CPU bugs affect virtually all computers|last=Gibbs|first=Samuel|date=2018-01-04|website=the Guardian|language=en|access-date=2018-05-18}}</ref>