Branch predictor
==Implementation==

===Static branch prediction===
Static prediction is the simplest branch prediction technique because it does not rely on information about the dynamic history of code executing. Instead, it predicts the outcome of a branch based solely on the branch instruction.<ref>{{cite book |author-last1=Shen |author-first1=John P. |author-last2=Lipasti |author-first2=Mikko |title=Modern processor design: fundamentals of superscalar processors |url=https://archive.org/details/modernprocessord00shen |url-access=limited |date=2005 |publisher=[[McGraw-Hill Higher Education]] |location=Boston |isbn=0-07-057064-7 |pages=[https://archive.org/details/modernprocessord00shen/page/n236 455]}}</ref>

The early implementations of [[SPARC]] and [[MIPS architecture|MIPS]] (two of the first commercial [[RISC]] architectures) used single-direction static branch prediction: they always predict that a conditional jump will not be taken, so they always fetch the next sequential instruction. Only when the branch or jump is evaluated and found to be taken does the instruction pointer get set to a non-sequential address. Both CPUs evaluate branches in the decode stage and have a single-cycle instruction fetch. As a result, the branch target recurrence is two cycles long, and the machine always fetches the instruction immediately after any taken branch. Both architectures define [[branch delay slot]]s in order to utilize these fetched instructions.

A more advanced form of static prediction presumes that backward branches will be taken and that forward branches will not. A backward branch is one that has a target address lower than its own address. This technique can improve prediction accuracy for loops, which are usually closed by backward-pointing branches that are taken more often than not.

Some processors allow branch prediction hints to be inserted into the code to tell whether the static prediction should be taken or not taken.
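The backward-taken, forward-not-taken rule amounts to a single address comparison, as in this minimal sketch (the function and variable names are illustrative, not taken from any particular ISA):

```python
def btfn_predict_taken(branch_addr: int, target_addr: int) -> bool:
    """Backward-taken, forward-not-taken (BTFN) static prediction.

    A backward branch (target address lower than the branch's own
    address) usually closes a loop, so it is predicted taken;
    forward branches are predicted not taken.
    """
    return target_addr < branch_addr

# A loop-closing branch at 0x104 jumping back to 0x100 is predicted taken;
# a forward branch at 0x100 skipping ahead to 0x120 is predicted not taken.
```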
The Intel [[Pentium 4]] accepts branch prediction hints, but this feature was abandoned in later Intel processors.<ref name="Fog_Microarchitecture">{{cite web |author-last=Fog |author-first=Agner |title=The microarchitecture of Intel, AMD, and VIA CPUs |date=2016-12-01 |url=http://www.agner.org/optimize/microarchitecture.pdf |page=36 |access-date=2017-03-22}}</ref>

Static prediction is used as a fall-back technique in some processors with dynamic branch prediction when dynamic predictors do not have sufficient information to use. Both the Motorola [[PowerPC G4#PowerPC 7450 .22Voyager.22|MPC7450 (G4e)]] and the Intel [[Pentium 4]] use this technique as a fall-back.<ref>{{cite web|url=https://arstechnica.com/articles/paedia/cpu/p4andg4e.ars/4|title=The Pentium 4 and the G4e: an Architectural Comparison|website=[[Ars Technica]]|date=12 May 2001 }}</ref>

In static prediction, all decisions are made at compile time, before the execution of the program.<ref>{{cite web |url=http://ece-research.unm.edu/jimp/611/slides/chap4_5.html |title=CMSC 611: Advanced Computer Architecture, Chapter 4 (Part V) |author-first=Jim |author-last=Plusquellic}}</ref>

===Dynamic branch prediction===
Dynamic branch prediction<ref name="schemes-and-performances"/> uses information about taken or not taken branches gathered at run-time to predict the outcome of a branch.<ref name="dbp-class-report"/>

===Random branch prediction===
Using a random or pseudorandom bit (a pure guess) would guarantee every branch a 50% correct prediction rate, which cannot be improved (or worsened) by reordering instructions. (With the simplest static prediction of "assume taken", [[compiler]]s can reorder instructions to get better than 50% correct prediction.) Also, it would make timing much more nondeterministic.

===Next line prediction===
Some [[superscalar processor]]s (MIPS [[R8000]], [[Alpha 21264]], and [[Alpha 21464]] (EV8)) fetch each line of instructions with a pointer to the next line.
This next-line predictor handles [[branch target predictor|branch target prediction]] as well as branch direction prediction. When a next-line predictor points to aligned groups of 2, 4, or 8 instructions, the branch target will usually not be the first instruction fetched, and so the initial instructions fetched are wasted. Assuming, for simplicity, a uniform distribution of branch targets, 0.5, 1.5, and 3.5 instructions fetched are discarded, respectively. Since the branch itself will generally not be the last instruction in an aligned group, instructions after the taken branch (or its [[delay slot]]) will be discarded. Once again, assuming a uniform distribution of branch instruction placements, 0.5, 1.5, and 3.5 instructions fetched are discarded. The discarded instructions at the branch and destination lines add up to nearly a complete fetch cycle, even for a single-cycle next-line predictor.

===One-level branch prediction===

====Saturating counter====
A 1-bit saturating counter (essentially a [[flip-flop (electronics)|flip-flop]]) records the last outcome of the branch. This is the simplest version of dynamic branch predictor possible, although it is not very accurate.

A 2-bit [[saturation arithmetic|saturating counter]]<ref name="dbp-class-report" /> is a [[state machine]] with four states:

[[File:Branch prediction 2bit saturating counter-dia.svg|600px|thumb|right|Figure 2: State diagram of 2-bit saturating counter]]
<!-- 2-bit saturating counter is different from bimodal predictor -->
* Strongly not taken
* Weakly not taken
* Weakly taken
* Strongly taken

When a branch is evaluated, the corresponding state machine is updated. Branches evaluated as not taken change the state toward strongly not taken, and branches evaluated as taken change the state toward strongly taken. The advantage of the two-bit counter scheme over a one-bit scheme is that a conditional jump has to deviate twice from what it has done most in the past before the prediction changes.
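The four-state counter described above can be sketched as follows (a minimal model; the state encoding and names are illustrative):

```python
# States of the 2-bit saturating counter, in order.
STRONGLY_NOT_TAKEN, WEAKLY_NOT_TAKEN, WEAKLY_TAKEN, STRONGLY_TAKEN = 0, 1, 2, 3

def update(state: int, taken: bool) -> int:
    """Move one step toward the observed outcome, saturating at the ends."""
    if taken:
        return min(state + 1, STRONGLY_TAKEN)
    return max(state - 1, STRONGLY_NOT_TAKEN)

def predict_taken(state: int) -> bool:
    """Predict taken in the two upper states."""
    return state >= WEAKLY_TAKEN
```

Starting from "strongly taken", a single not-taken outcome (such as a loop exit) only moves the counter to "weakly taken", so the next execution of the branch is still predicted taken.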
For example, a loop-closing conditional jump is mispredicted once rather than twice. The original, non-MMX [[Original Intel Pentium (P5 microarchitecture)|Intel Pentium]] processor uses a saturating counter, though with an imperfect implementation.<ref name="Fog_Microarchitecture"/>

On the [[Standard Performance Evaluation Corporation|SPEC]]'89 benchmarks, very large bimodal predictors saturate at 93.5% correct, once every branch maps to a unique counter.<ref name="decwrl-tn-36">{{cite web |author-first=Scott |author-last=McFarling |title=Combining Branch Predictors |id=Digital Western Research Lab (WRL) Technical Report, TN-36 |date=June 1993 |url=https://hplabs.itcs.hp.com/techreports/Compaq-DEC/WRL-TN-36.pdf}}</ref>{{rp|3}}

The predictor table is indexed with the instruction [[memory address|address]] bits, so that the processor can fetch a prediction for every instruction before the instruction is decoded.

===Two-level predictor===
The two-level branch predictor, also referred to as the correlation-based branch predictor, uses a two-dimensional table of counters, also called a "pattern history table". The table entries are two-bit counters.

====Two-level adaptive predictor====
[[File:Two-level branch prediction.svg|420px|thumb|right|Figure 3: Two-level adaptive branch predictor.
Every entry in the pattern history table represents a 2-bit saturating counter of the type shown in figure 2.<ref>{{cite journal |title=New Algorithm Improves Branch Prediction: 3/27/95|url=https://www.cs.cmu.edu/afs/cs/academic/class/15213-f00/docs/mpr-branchpredict.pdf |journal=[[Microprocessor Report]] |volume=9 |issue=4 |date=March 27, 1995 |access-date=2016-02-02 |archive-url=https://web.archive.org/web/20150310190847/https://www.cs.cmu.edu/afs/cs/academic/class/15213-f00/docs/mpr-branchpredict.pdf |archive-date=2015-03-10 |url-status=live}}</ref>]]

If an <code>if</code> statement is executed three times, the decision made on the third execution might depend upon whether the previous two were taken or not. In such scenarios, a two-level adaptive predictor works more efficiently than a saturating counter. Conditional jumps that are taken every second time or have some other regularly recurring pattern are not predicted well by the saturating counter. A two-level adaptive predictor remembers the history of the last n occurrences of the branch and uses one saturating counter for each of the possible 2<sup>n</sup> history patterns. This method is illustrated in figure 3.

Consider the example of n = 2. This means that the last two occurrences of the branch are stored in a two-bit [[shift register]]. This branch history register can have four different [[binary numeral system|binary]] values, 00, 01, 10, and 11, where zero means "not taken" and one means "taken". A pattern history table contains four entries per branch, one for each of the 2<sup>2</sup> = 4 possible branch histories, and each entry in the table contains a two-bit saturating counter of the same type as in figure 2 for each branch. The branch history register is used for choosing which of the four saturating counters to use. If the history is 00, then the first counter is used; if the history is 11, then the last of the four counters is used.
Assume, for example, that a conditional jump is taken every third time. The branch sequence is 001001001... In this case, entry number 00 in the pattern history table will go to state "strongly taken", indicating that after two zeroes comes a one. Entry number 01 will go to state "strongly not taken", indicating that after 01 comes a zero. The same is the case with entry number 10, while entry number 11 is never used because there are never two consecutive ones.

The general rule for a two-level adaptive predictor with an n-bit history is that it can predict any repetitive sequence with any period if all n-bit [[subsequence|sub-sequences]] are different.<ref name="Fog_Microarchitecture"/>

The advantage of the two-level adaptive predictor is that it can quickly learn to predict an arbitrary repetitive pattern. This method was invented by T.-Y. Yeh and [[Yale Patt]] at the [[University of Michigan]].<ref>{{cite conference |author-first1=T.-Y. |author-last1=Yeh |author-last2=Patt |author-first2=Y. N. |author-link2=Yale Patt |title=Two-Level Adaptive Training Branch Prediction |book-title=Proceedings of the 24th annual international symposium on Microarchitecture |pages=51–61 |publisher=ACM |date=1991 |location=Albuquerque, New Mexico |doi=10.1145/123465.123475|doi-access=free }}</ref> Since the initial publication in 1991, this method has become very popular.
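The n = 2 case described above can be sketched as follows (a simplified per-branch model; the table size and the initial counter state are illustrative choices):

```python
class TwoLevelAdaptivePredictor:
    """Per-branch two-level adaptive predictor with an n-bit history.

    The history shift register selects one of 2**n two-bit saturating
    counters in the pattern history table (PHT).
    """

    def __init__(self, n: int = 2):
        self.n = n
        self.history = 0                 # n-bit branch history register
        self.pht = [1] * (2 ** n)        # 2-bit counters, start "weakly not taken"

    def predict(self) -> bool:
        # Predict taken in the two upper counter states (2 and 3).
        return self.pht[self.history] >= 2

    def update(self, taken: bool) -> None:
        ctr = self.pht[self.history]
        self.pht[self.history] = min(ctr + 1, 3) if taken else max(ctr - 1, 0)
        # Shift the outcome into the history register, keeping n bits.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.n) - 1)
```

Trained on the 001001001... sequence from the example, the counter for history 00 saturates at "strongly taken" and those for 01 and 10 at "strongly not taken", after which every prediction is correct.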
Variants of this prediction method are used in most modern microprocessors.{{citation needed|date=September 2015}}

====Two-level neural predictor====
A two-level branch predictor where the second level is replaced with a [[neural network]] has been proposed.<ref>{{cite journal|url=https://www.researchgate.net/publication/264708564|title=Two-Level Branch Prediction using Neural Networks|last1=Egan|first1=Colin|last2=Steven|first2=Gordon|last3=Quick|first3=P.|last4=Anguera|first4=R.|last5=Vintan|first5=Lucian|date=December 2003|journal=Journal of Systems Architecture|volume=49|issue=12–15|pages=557–570|doi=10.1016/S1383-7621(03)00095-X}}</ref>

===Local branch prediction===
A local branch predictor has a separate history buffer for each conditional jump instruction. It may use a two-level adaptive predictor. The history buffer is separate for each conditional jump instruction, while the pattern history table may be separate as well or it may be shared between all conditional jumps.

The [[Intel]] [[Pentium MMX]], [[Pentium II]], and [[Pentium III]] have local branch predictors with a local 4-bit history and a local pattern history table with 16 entries for each conditional jump. On the [[Standard Performance Evaluation Corporation|SPEC]]'89 benchmarks, very large local predictors saturate at 97.1% correct.<ref name="decwrl-tn-36"/>{{rp|6}}

===Global branch prediction===
A global branch predictor does not keep a separate history record for each conditional jump. Instead it keeps a shared history of all conditional jumps. The advantage of a shared history is that any [[correlation]] between different conditional jumps is part of making the predictions. The disadvantage is that the history is diluted by irrelevant information if the different conditional jumps are uncorrelated, and that the history buffer may not include any bits from the same branch if there are many other branches in between. It may use a two-level adaptive predictor.
This scheme is better than the saturating counter scheme only for large table sizes, and it is rarely as good as local prediction. The history buffer must be longer in order to make a good prediction. The size of the pattern history table grows [[exponential function|exponentially]] with the size of the history buffer. Hence, the big pattern history table must be shared among all conditional jumps.

A two-level adaptive predictor with a globally shared history buffer and pattern history table is called a "gshare" predictor if it [[XOR gate|xors]] the global history and branch PC, and "gselect" if it [[concatenation|concatenates]] them. Global branch prediction is used in [[Advanced Micro Devices|AMD]] processors, and in Intel [[Pentium M]], [[intel core|Core]], [[Intel core 2|Core 2]], and [[Silvermont]]-based [[Intel Atom|Atom]] processors.

===Alloyed branch prediction===
An alloyed branch predictor<ref>{{cite conference |author-first=K. |author-last=Skadron |author-last2=Martonosi |author-first2=M. |author-last3=Clark |author-first3=D. W. |title=A Taxonomy of Branch Mispredictions, and Alloyed Prediction as a Robust Solution to Wrong-History Mispredictions |url=https://www.cs.virginia.edu/~skadron/Papers/alloy_pact00.pdf |book-title=Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques |pages=199–206 |date=October 2000 |location=Philadelphia |doi=10.1109/PACT.2000.888344}}</ref> combines the local and global prediction principles by [[concatenation|concatenating]] local and global branch histories, possibly with some bits from the [[program counter]] as well. Tests indicate that the [[VIA Nano]] processor may be using this technique.<ref name="Fog_Microarchitecture"/>

===Agree predictor===
An agree predictor is a two-level adaptive predictor with a globally shared history buffer and pattern history table, and an additional local saturating counter.
The outputs of the local and the global predictors are XORed with each other to give the final prediction. The purpose is to reduce contention in the pattern history table, where two branches with opposite predictions happen to share the same entry.<ref>{{cite conference |author-first1=E. |author-last1=Sprangle |author-first2=R.S. |author-last2=Chappell |author3-first=M. |author3-last=Alsup |author4-first=Y.N. |author4-last=Patt |author4-link=Yale Patt |title=The Agree Predictor: A Mechanism for Reducing Negative Branch History Interference |url=http://meseec.ce.rit.edu/eecc722-fall2006/papers/branch-prediction/5/agree_isca24.pdf |book-title=Proceedings of the 24th International Symposium on Computer Architecture |date=June 1997 |location=Denver |doi=10.1145/264107.264210}}</ref>

===Hybrid predictor===
A hybrid predictor, also called a combined predictor, implements more than one prediction mechanism. The final prediction is based either on a meta-predictor that remembers which of the predictors has made the best predictions in the past, or on a majority vote function over an odd number of different predictors. [[Scott McFarling]] proposed combined branch prediction in his 1993 paper.<ref name="decwrl-tn-36"/> On the SPEC'89 benchmarks, such a predictor is about as good as the local predictor.{{Citation needed|date=April 2009}}

Predictors like gshare use multiple table entries to track the behavior of any particular branch. This multiplication of entries makes it much more likely that two branches will map to the same table entry (a situation called aliasing), which in turn makes it much more likely that prediction accuracy will suffer for those branches. With multiple predictors, it is beneficial to arrange for each predictor to have different aliasing patterns, so that it is more likely that at least one predictor will have no aliasing.
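The gshare (XOR) and gselect (concatenation) indexing schemes can be sketched as follows (the table and field widths are illustrative):

```python
def gshare_index(pc: int, global_history: int, table_bits: int = 12) -> int:
    """gshare: XOR the branch PC with the global history register."""
    mask = (1 << table_bits) - 1
    return (pc ^ global_history) & mask

def gselect_index(pc: int, global_history: int,
                  pc_bits: int = 6, hist_bits: int = 6) -> int:
    """gselect: concatenate low PC bits with low history bits."""
    pc_part = pc & ((1 << pc_bits) - 1)
    hist_part = global_history & ((1 << hist_bits) - 1)
    return (pc_part << hist_bits) | hist_part
```

Either index then selects a two-bit saturating counter in the shared pattern history table; the two schemes differ only in how much PC and history information survives in the index.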
Combined predictors with different indexing functions for the different predictors are called ''gskew'' predictors, and are analogous to [[CPU cache#Two-way skewed associative cache|skewed associative caches]] used for data and instruction caching.

===Loop predictor===
A [[conditional jump]] that controls a [[control flow#Loops|loop]] is best predicted with a special loop predictor. A conditional jump at the bottom of a loop that repeats N times will be taken N−1 times and then not taken once. If the conditional jump is placed at the top of the loop, it will be not taken N−1 times and then taken once. A conditional jump that goes many times one way and then the other way once is detected as having loop behavior. Such a conditional jump can be predicted easily with a simple counter. A loop predictor is part of a hybrid predictor where a meta-predictor detects whether the conditional jump has loop behavior.

===Indirect branch predictor===
An [[indirect branch|indirect jump]] instruction can choose among more than two branches.
Some processors have specialized indirect branch predictors.<ref>{{cite web |url=http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0438h/BABEHAJJ.html |title=Cortex-A15 MPCore Technical Reference Manual, section 6.5.3 "Indirect predictor" |website=[[ARM Holdings]]}}</ref><ref>{{cite web |url=http://hoelzle.org/publications/TRCS97-10.pdf |title=Limits of Indirect Branch Prediction |author-first1=Karel |author-last1=Driesen |author-first2=Urs |author-last2=Hölzle |date=1997-06-25 |archive-url=https://web.archive.org/web/20160506213742/http://hoelzle.org/publications/TRCS97-10.pdf |archive-date=2016-05-06 |url-status=dead}}</ref> Newer processors from Intel<ref>{{cite web |url=https://arstechnica.com/features/2004/02/pentium-m/ |title=A Look at Centrino's Core: The Pentium M |author-first=Jon |author-last=Stokes |date=2004-02-25 |pages=2–3}}</ref> and AMD<ref>{{cite web |url=https://www.realworldtech.com/cpu-perf-analysis/5/ |title=Performance Analysis for Core 2 and K8: Part 1 |page=5 |date=2008-10-28 |author-first=Aaron |author-last=Kanter}}</ref> can predict indirect branches by using a two-level adaptive predictor. This kind of instruction contributes more than one bit to the history buffer.
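As a rough sketch (not any specific processor's design), an indirect-branch predictor can index a small table of full target addresses with a mix of the jump's address and recent global history, so that the same jump can be predicted to go to different targets in different contexts:

```python
class IndirectTargetPredictor:
    """Illustrative indirect-target table: maps (PC mixed with recent
    global history) to the last target observed in that context.
    The table size and the hash are arbitrary choices, not a real design."""

    def __init__(self, table_bits: int = 10):
        self.mask = (1 << table_bits) - 1
        self.targets = {}            # index -> last observed target address

    def _index(self, pc: int, history: int) -> int:
        return (pc ^ history) & self.mask

    def predict(self, pc: int, history: int):
        # Returns None on a cold entry (no prediction available).
        return self.targets.get(self._index(pc, history))

    def update(self, pc: int, history: int, target: int) -> None:
        self.targets[self._index(pc, history)] = target
```

With history mixed into the index, one indirect jump occupies several table entries, matching the observation that such instructions contribute more than one bit of useful context.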
The [[IBM zEC12 (microprocessor)|zEC12]] and later [[z/Architecture]] processors from IBM support a {{Mono|BRANCH PREDICTION PRELOAD}} instruction that can preload the branch predictor entry for a given instruction with a branch target address constructed by adding the contents of a general-purpose register to an immediate displacement value.<ref>{{cite book |url=https://publibfp.dhe.ibm.com/epubs/pdf/a227832d.pdf |title=z/Architecture Principles of Operation |id=SA22-7832-13 |edition=Fourteenth |date=May 2022 |publisher=[[IBM]] |pages=7-42{{hyp}}7-45}}</ref><ref>{{cite web |url=https://www.redbooks.ibm.com/redbooks/pdfs/sg248138.pdf |title=IBM zEnterprise BC12 Technical Guide |date=February 2014 |page=78 |publisher=[[IBM]]}}</ref>

Processors without this mechanism will simply predict an indirect jump to go to the same target as it did last time.<ref name="Fog_Microarchitecture"/>

===Prediction of function returns===
A [[subroutine|function]] will normally return to where it is called from. The [[return statement|return instruction]] is an indirect jump that reads its target address from the [[call stack]]. Many microprocessors have a separate prediction mechanism for return instructions. This mechanism is based on a so-called ''return stack buffer'', which is a local mirror of the call stack. The size of the return stack buffer is typically 4–16 entries.<ref name="Fog_Microarchitecture"/>

===Overriding branch prediction===
The [[trade-off]] between fast branch prediction and good branch prediction is sometimes dealt with by having two branch predictors. The first branch predictor is fast and simple. The second branch predictor, which is slower, more complicated, and equipped with bigger tables, will override a possibly wrong prediction made by the first predictor.

The Alpha 21264 and Alpha EV8 microprocessors used a fast single-cycle next-line predictor to handle the branch target recurrence and provide a simple and fast branch prediction.
Because the next-line predictor is so inaccurate, and the branch resolution recurrence takes so long, both processors have two-cycle secondary branch predictors that can override the prediction of the next-line predictor at the cost of a single lost fetch cycle.

The [[Intel Core i7]] has two [[branch target predictor|branch target buffers]] and possibly two or more branch predictors.<ref>{{cite patent |inventor1-last=Yeh |inventor1-first=Tse-Yu |inventor2-last=Sharangpani |inventor2-first=H. P. |publication-date=2000-03-16 |title=A method and apparatus for branch prediction using a second level branch prediction table |country-code=WO |postscript=<!--None--> |patent-number=2000/014628}}</ref>

===Neural branch prediction===
[[Machine learning]] for branch prediction using [[learning vector quantization|LVQ]] and [[multi-layer perceptron]]s, called "[[artificial neural network|neural]] branch prediction", was proposed by Lucian Vintan ([[Lucian Blaga University of Sibiu]]).<ref>{{cite conference |author-first=Lucian N. |author-last=Vintan |title=Towards a High Performance Neural Branch Predictor |book-title=Proceedings International Joint Conference on Neural Networks (IJCNN) |date=1999 |url=http://webspace.ulbsibiu.ro/lucian.vintan/html/USA.pdf |access-date=2010-12-02 |archive-date=2019-07-13 |archive-url=https://web.archive.org/web/20190713224752/http://webspace.ulbsibiu.ro/lucian.vintan/html/USA.pdf |url-status=dead}}</ref> One year later he developed the perceptron branch predictor.<ref>{{cite journal |author-first=Lucian N.
|author-last=Vintan |title=Towards a Powerful Dynamic Branch Predictor |journal=Romanian Journal of Information Science and Technology |volume=3 |issue=3 |pages=287–301 |issn=1453-8245 |publisher=Romanian Academy |location=Bucharest |date=2000 |url=http://webspace.ulbsibiu.ro/lucian.vintan/html/Rom_JIST.pdf}}</ref> The neural branch predictor research was developed much further by Daniel Jimenez.<ref name="jimenez-perceptrons">{{cite conference |author-first1=D. A. |author-last1=Jimenez |author-first2=C. |author-last2=Lin |title=Dynamic Branch Prediction with Perceptrons |url=https://www.cs.utexas.edu/~lin/papers/hpca01.pdf |book-title=Proceedings of the 7th International Symposium on High Performance Computer Architecture (HPCA-7) |location=Monterrey, NL, Mexico |date=2001 |pages=197–206 |doi=10.1109/HPCA.2001.903263}}</ref> In 2001,<ref name="jimenez-perceptrons"/> the first [[perceptron]] predictor was presented that was feasible to implement in hardware. The first commercial implementation of a perceptron branch predictor was in AMD's [[Piledriver (microarchitecture)|Piledriver microarchitecture]].<ref>{{cite web |url=http://www.anandtech.com/show/5831/amd-trinity-review-a10-4600m-a-new-hope |title=The AMD Trinity Review (A10-4600M): A New Hope |author-first=Jarred |author-last=Walton |date=2012-05-15 |website=[[AnandTech]]}}</ref>

The main advantage of the neural predictor is its ability to exploit long histories while requiring only linear resource growth. Classical predictors require exponential resource growth. Jimenez reports a global improvement of 5.7% over a McFarling-style hybrid predictor.<ref name="jimenez-micro-36">{{cite conference |url=http://www.microarch.org/micro36/html/pdf/jimenez-FastPath.pdf |author-first=Daniel A.
|author-last=Jimenez |title=Fast Path-Based Neural Branch Prediction |doi=10.1109/MICRO.2003.1253199 |conference=The 36th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-36) |location=San Diego, USA |date=December 2003 |pages=243–252 |access-date=2018-04-08 |archive-date=2016-03-31 |archive-url=https://web.archive.org/web/20160331115200/http://www.microarch.org/micro36/html/pdf/jimenez-FastPath.pdf |url-status=dead }}</ref> He also used a gshare/perceptron overriding hybrid predictor.<ref name="jimenez-micro-36"/>

The main disadvantage of the perceptron predictor is its high latency. Even after taking advantage of high-speed arithmetic tricks, the computation latency is relatively high compared to the clock period of many modern microarchitectures. In order to reduce the prediction latency, Jimenez proposed in 2003 the ''fast-path neural predictor'', where the perceptron predictor chooses its weights according to the current branch's path, rather than according to the branch's PC. Many other researchers developed this concept (A. Seznec, M. Monchiero, D. Tarjan & K. Skadron, V. Desmet, Akkary et al., K. Aasaraai, Michael Black, etc.).{{citation needed|date=May 2013}}

Most state-of-the-art branch predictors use a perceptron predictor (see Intel's "Championship Branch Prediction Competition"<ref>{{cite web |url=https://www.jilp.org/cbp2016/|title=Championship Branch Prediction}}</ref>).
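A minimal single-perceptron sketch in the style of the Jimenez–Lin predictor (the history length, training threshold, and simplified training rule here are illustrative choices, not a faithful hardware description):

```python
class PerceptronPredictor:
    """Predicts a branch as the sign of a dot product between a weight
    vector and the recent branch history (outcomes encoded as +1/-1)."""

    def __init__(self, history_len: int = 8, threshold: int = 16):
        self.weights = [0] * (history_len + 1)   # weights[0] is the bias
        self.history = [1] * history_len          # +1 = taken, -1 = not taken
        self.threshold = threshold

    def _output(self) -> int:
        return self.weights[0] + sum(
            w * h for w, h in zip(self.weights[1:], self.history))

    def predict(self) -> bool:
        return self._output() >= 0

    def update(self, taken: bool) -> None:
        t = 1 if taken else -1
        y = self._output()
        # Train on a misprediction, or while confidence is below threshold.
        if (y >= 0) != taken or abs(y) <= self.threshold:
            self.weights[0] += t
            for i, h in enumerate(self.history):
                self.weights[i + 1] += t * h
        self.history = [t] + self.history[:-1]   # shift in the new outcome
```

The dot product replaces a counter-table lookup, so storage grows only linearly with history length, which is the linear-resource advantage noted above; the arithmetic itself is the source of the latency problem.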
Intel implemented this idea in one of the [[IA-64]] simulators (2003).<ref>{{cite conference |author-first1=Edward |author-last1=Brekelbaum |author-first2=Jeff |author-last2=Rupley |author-first3=Chris |author-last3=Wilkerson |author-first4=Bryan |author-last4=Black |title=Hierarchical scheduling windows |doi=10.1109/MICRO.2002.1176236 |book-title=Proceedings of the 35th International Symposium on Microarchitecture |location=Istanbul, Turkey |date=December 2002}}</ref>

The [[AMD]] [[Ryzen]]<ref>{{cite web |url=https://www.pcgamesn.com/amd/amd-zen-release-date-specs-prices-rumours |title=AMD Ryzen reviews, news, performance, pricing, and availability |author-first=Dave |author-last=James |website=[[PCGamesN]]|date=2017-12-06}}</ref><ref>{{cite press release |url=https://www.amd.com/en/press-releases/amd-takes-computing-2016dec13 |title=AMD Takes Computing to a New Horizon with Ryzen™ Processors |publisher=[[AMD]] |access-date=2016-12-14}}</ref><ref>{{cite news |url=http://arstechnica.co.uk/gadgets/2016/12/amd-zen-performance-details-release-date/ |title=AMD's Zen CPU is now called Ryzen, and it might actually challenge Intel |newspaper=Ars Technica UK |access-date=2016-12-14}}</ref> multi-core processor's [[Infinity Control Fabric|Infinity Fabric]] and the [[Samsung]] [[Exynos]] processor include a perceptron-based neural branch predictor.