Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Predication (computer architecture)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
{{Short description|Form of conditionals in computer programming}} {{More citations needed|date=March 2014}} {{Distinguish|branch prediction}} In [[computer architecture]], '''predication''' is a feature that provides an alternative to [[Conditional (computer programming)|conditional]] transfer of [[control flow|control]], as implemented by conditional [[branch (computer science)|branch]] machine [[instruction (computer science)|instructions]]. Predication works by having conditional (''predicated'') non-branch instructions associated with a ''predicate'', a [[Boolean data type|Boolean value]] used by the instruction to control whether the instruction is allowed to modify the architectural state or not. If the predicate specified in the instruction is true, the instruction modifies the architectural state; otherwise, the architectural state is unchanged. For example, a predicated move instruction (a conditional move) will only modify the destination if the predicate is true. Thus, instead of using a conditional branch to select an instruction or a sequence of instructions to [[Execution (computing)|execute]] based on the predicate that controls whether the branch occurs, the instructions to be executed are associated with that predicate, so that they will be executed, or not executed, based on whether that predicate is true or false.<ref name="rvinyard">{{cite web |author=Rick Vinyard |date=2000-04-26 |title=Predication |url=https://www.cs.nmsu.edu/~rvinyard/itanium/predication.htm |archive-url=https://web.archive.org/web/20150420152310/https://www.cs.nmsu.edu/~rvinyard/itanium/predication.htm |archive-date=20 Apr 2015 |url-status=dead |access-date=2014-04-22 |website=cs.nmsu.edu}}</ref> [[Vector processors]], some [[SIMD]] ISAs (such as [[AVX2]] and [[AVX-512]]) and [[GPU]]s in general make heavy use of predication, applying one bit of a conditional ''mask vector'' to the corresponding elements in the vector registers being processed, whereas scalar predication in scalar instruction sets only need the one predicate bit. Where predicate masks become particularly powerful in [[vector processing]] is if an ''array'' of [[condition_code_register|condition codes]], one per vector element, may feed back into predicate masks that are then applied to subsequent vector instructions. ==Overview== Most [[computer program]]s contain [[conditional (computer programming)|conditional]] code, which will be executed only under specific conditions depending on factors that cannot be determined beforehand, for example depending on user input. As the majority of [[central processing unit|processors]] simply execute the next [[instruction (computer science)|instruction]] in a sequence, the traditional solution is to insert ''branch'' instructions that allow a program to conditionally branch to a different section of code, thus changing the next step in the sequence. This was sufficient until designers began improving performance by implementing [[instruction pipelining]], a method which is slowed down by branches. For a more thorough description of the problems which arose, and a popular solution, see [[branch predictor]]. Luckily, one of the more common patterns of code that normally relies on branching has a more elegant solution. Consider the following [[pseudocode]]:<ref name="rvinyard"/> <syntaxhighlight lang="c"> if condition {do_something} else {do_something_else} </syntaxhighlight> On a system that uses conditional branching, this might translate to [[machine language|machine instructions]] looking similar to:<ref name="rvinyard"/> <syntaxhighlight lang="c"> branch_if_condition_to label1 do_something_else branch_always_to label2 label1: do_something label2: ... </syntaxhighlight> With predication, all possible branch paths are coded inline, but some instructions execute while others do not. The basic idea is that each instruction is associated with a predicate (the word here used similarly to its usage in [[predicate logic]]) and that the instruction will only be executed if the predicate is true. The machine code for the above example using predication might look something like this:<ref name="rvinyard"/> <syntaxhighlight lang="c"> (condition) do_something (not condition) do_something_else </syntaxhighlight> Besides eliminating branches, less code is needed in total, provided the architecture provides predicated instructions. While this does not guarantee faster execution in general, it will if the <code>do_something</code> and <code>do_something_else</code> blocks of code are short enough. Predication's simplest form is ''partial predication'', where the architecture has ''conditional move'' or ''conditional select'' instructions. Conditional move instructions write the contents of one register over another only if the predicate's value is true, whereas conditional select instructions choose which of two registers has its contents written to a third based on the predicate's value. A more generalized and capable form is ''full predication''. Full predication has a set of predicate registers for storing predicates (which allows multiple nested or sequential branches to be simultaneously eliminated) and most instructions in the architecture have a register specifier field to specify which predicate register supplies the predicate.<ref>{{cite conference |last1=Mahlke|first1=Scott A.|last2=Hank|first2=Richard E.|last3=McCormick|first3=James E.|last4=August|first4=David I.|last5=Hwn|first5=Wen-mei W.|title=A Comparison of Full and Partial Predicated Execution Support for ILP Processors |conference=The 22nd International Symposium on Computer Architecture, 22–24 June 1995 |year=1995 |doi=10.1145/223982.225965 |isbn=0-89791-698-0 |citeseerx=10.1.1.19.3187}}</ref> ==Advantages== The main purpose of predication is to avoid jumps over very small sections of program code, increasing the effectiveness of [[pipeline (computing)|pipelined]] execution and avoiding problems with the [[CPU cache|cache]]. It also has a number of more subtle benefits: *Functions that are traditionally computed using simple arithmetic and [[bitwise operation]]s may be quicker to compute using predicated instructions. *Predicated instructions with different predicates can be mixed with each other and with unconditional code, allowing better [[instruction scheduling]] and so even better performance. *Elimination of unnecessary branch instructions can make the execution of necessary branches, such as those that make up loops, faster by lessening the load on [[branch prediction]] mechanisms. *Elimination of the cost of a branch misprediction which can be high on deeply pipelined architectures. * Instruction sets that have comprehensive [[Condition_code_register|Condition Codes]] generated by instructions may reduce code size further by directly using the Condition Registers in or as predication. ==Disadvantages== Predication's primary drawback is in increased encoding space. In typical implementations, every instruction reserves a bitfield for the predicate specifying under what conditions that instruction should have an effect. When available memory is limited, as on [[embedded system|embedded devices]], this space cost can be prohibitive. However, some architectures such as [[Thumb-2]] are able to avoid this issue (see below). Other detriments are the following:<ref name="Fisher04">{{cite book |first1=Joseph A. |last1=Fisher |first2=Paolo |last2=Faraboschi |first3=Cliff |last3=Young |year=2004 |title=Embedded Computing — A VLIW Approach to Architecture, Compilers, and Tools |page=172 |chapter-url=https://books.google.com/books?id=DbOa0p1wRD4C&pg=PA172 |chapter=4.5.2 Predication § Predication in the Embedded Domain |isbn=9780080477541 |publisher=Elsevier}}</ref> *Predication complicates the hardware by adding levels of [[control unit|logic]] to critical [[datapath|paths]] and potentially degrades clock speed. *A predicated block includes cycles for all operations, so shorter [[control-flow graph|paths]] may take longer and be penalized. *Predication is not usually speculated and causes a longer dependency chain. For ordered data this translates to a performance loss compared to a predictable branch.<ref>{{cite web |last1=Cordes |first1=Peter |title=assembly - How does Out of Order execution work with conditional instructions, Ex: CMOVcc in Intel or ADDNE (Add not equal) in ARM |url=https://stackoverflow.com/a/50960323 |website=Stack Overflow |quote=Unlike with control dependencies (branches), they don't predict or speculate what the flags will be, so a cmovcc instead of a jcc can create a loop-carried dependency chain and end up being worse than a predictable branch. [https://stackoverflow.com/questions/50959808 gcc optimization flag -O3 makes code slower than -O2] is an example of that.}}</ref> Predication is most effective when paths are balanced or when the longest path is the most frequently executed,<ref name="Fisher04"/> but determining such a path is very difficult at compile time, even in the presence of [[profiling (computer programming)|profiling information]]. ==History== Predicated instructions were popular in European computer designs of the 1950s, including the [[Mailüfterl]] (1955), the [[Z22 (computer)|Zuse Z22]] (1955), the [[ZEBRA (computer)|ZEBRA]] (1958), and the [[Electrologica X1]] (1958). The [[ACS-1|IBM ACS-1]] design of 1967 allocated a "skip" bit in its instruction formats, and the CDC Flexible Processor in 1976 allocated three conditional execution bits in its microinstruction formats. [[Hewlett-Packard]]'s [[PA-RISC]] architecture (1986) had a feature called ''nullification'', which allowed most instructions to be predicated by the previous instruction. [[IBM]]'s [[IBM POWER instruction set architecture|POWER architecture]] (1990) featured conditional move instructions. POWER's successor, [[PowerPC]] (1993), dropped these instructions. [[Digital Equipment Corporation]]'s [[DEC Alpha|Alpha]] architecture (1992) also featured conditional move instructions. [[MIPS architecture|MIPS]] gained conditional move instructions in 1994 with the MIPS IV version; and [[SPARC]] was extended in Version 9 (1994) with conditional move instructions for both integer and floating-point registers. In the [[Hewlett-Packard]]/[[Intel]] [[IA-64]] architecture, most instructions are predicated. The predicates are stored in 64 special-purpose predicate [[processor register|registers]]; and one of the predicate registers is always true so that ''unpredicated'' instructions are simply instructions predicated with the value true. The use of predication is essential in IA-64's implementation of [[software pipelining]] because it avoids the need for writing separated code for prologs and epilogs.{{clarify|date=March 2014}} In the [[x86]] architecture, a family of conditional move instructions (<code>CMOV</code> and <code>FCMOV</code>) were added to the architecture by the [[Intel]] [[Pentium Pro]] (1995) processor. The <code>CMOV</code> instructions copied the contents of the source register to the destination register depending on a predicate supplied by the value of the flag register. In the [[ARM architecture]], the original 32-bit instruction set provides a feature called ''conditional execution'' that allows most instructions to be predicated by one of 13 predicates that are based on some combination of the four condition codes set by the previous instruction. ARM's [[ARM architecture#Thumb|Thumb]] instruction set (1994) dropped conditional execution to reduce the size of instructions so they could fit in 16 bits, but its successor, [[ARM architecture#Thumb-2|Thumb-2]] (2003) overcame this problem by using a special instruction which has no effect other than to supply predicates for the following four instructions. The 64-bit instruction set introduced in ARMv8-A (2011) replaced conditional execution with conditional selection instructions. == SIMD, SIMT and vector predication == Some [[SIMD]] instruction sets, like AVX2, have the ability to use a logical [[Mask (computing)|mask]] to conditionally load/store values to memory, a parallel form of the conditional move, and may also apply individual mask bits to individual arithmetic units executing a parallel operation. The technique [[Flynn's taxonomy#Associative processor|is known in Flynn's taxonomy as "associative processing"]]. This form of predication is also used in [[vector processors]] and [[single instruction, multiple threads]] GPU computing. All the techniques, advantages and disadvantages of single scalar predication apply just as well to the parallel processing case. ==See also== {{Div col|colwidth=20em}} *[[Branch predictor]] *[[Control flow]] *[[Delay slot]] *[[Instruction-level parallelism]] *[[Optimizing compiler]] *[[Pipeline stall]] *[[Software pipelining]] *[[Speculative execution]] *[[Vector processor]] *[[Very long instruction word]] {{Div col end}} ==References== {{Reflist}} ==Further reading== *{{cite book|first=Alan |last=Clements|title=Computer Organization & Architecture: Themes and Variations |chapter=8.3.7 Predication |chapter-url={{google books|id=ySILAAAAQBAJ&pg=PA532|plainurl=yes}}|year=2013|publisher=Cengage Learning|isbn=978-1-285-41542-0|pages=532–9}} {{DEFAULTSORT:Predication}} [[Category:Conditional constructs]] [[Category:Instruction processing]]
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)
Pages transcluded onto the current version of this page
(
help
)
:
Template:Cite book
(
edit
)
Template:Cite conference
(
edit
)
Template:Cite web
(
edit
)
Template:Clarify
(
edit
)
Template:Distinguish
(
edit
)
Template:Div col
(
edit
)
Template:Div col end
(
edit
)
Template:More citations needed
(
edit
)
Template:Reflist
(
edit
)
Template:Short description
(
edit
)