Editing Register allocation (section)

== Register allocation techniques ==
Register allocation can happen over a [[basic block]] of code: it is said to be "local", and was first mentioned by Horwitz et al.{{sfn|Horwitz|Karp|Miller|Winograd|1966|p=43}} As basic blocks do not contain branches, the allocation process is thought to be fast, because the management of [[control-flow graph]] merge points in register allocation reveals itself{{clarify|date=September 2020}} a time-consuming operation.{{sfn|Farach|Liberatore|1998|p=566}} However, this approach is thought not to produce as optimized code as the "global" approach, which operates over the whole compilation unit (a method or procedure for instance).{{sfn|Eisl|Marr|Würthinger|Mössenböck|2017|p=92}}

=== Graph-coloring allocation ===
{{See also|Graph coloring}}
Graph-coloring allocation is the predominant approach to solve register allocation.{{sfn|Eisl|Leopoldseder|Mössenböck|2018|p=1}}{{sfn|Briggs|Cooper|Torczon|1992|p=316}} It was first proposed by Chaitin et al.{{sfn|Chaitin|Auslander|Chandra|Cocke|1981|p=47}}
In this approach, nodes in the [[Graph (discrete mathematics)|graph]] represent live ranges ([[Variable (computer science)|variables]], [[Temporary variable|temporaries]], virtual/symbolic registers) that are candidates for register allocation. Edges connect live ranges that interfere, i.e., live ranges that are simultaneously live at at least one program point. Register allocation then reduces to the [[graph coloring]] problem in which colors (registers) are assigned to the nodes such that two nodes connected by an edge do not receive the same color.{{sfn|Poletto|Sarkar|1999|p=896}}

Using [[Live variable analysis|liveness analysis]], an interference graph can be built. The interference graph, which is an [[Graph (discrete mathematics)#Undirected graph|undirected graph]] where the nodes are the program's variables, is used to model which variables cannot be allocated to the same register.{{sfn|Runeson|Nyström|2003|p=241}}

==== Principle ====
The main phases in a Chaitin-style graph-coloring register allocator are:{{sfn|Briggs|Cooper|Torczon|1992|p=316}}
[[File:Chaitin et al.'s iteravite graph coloring based register allocator.png|thumb|700px|Chaitin et al.'s iterative graph coloring based register allocator]]
# '''Renumber''': discover live range information in the source program.
# '''Build''': build the interference graph.
# '''Coalesce''': merge the live ranges of non-interfering variables related by copy instructions.
# '''Spill cost''': compute the spill cost of each variable. This assesses the impact of mapping a variable to memory on the speed of the final program.
# '''Simplify''': construct an ordering of the nodes in the inferences graph
# '''Spill Code''': insert spill instructions, i.e. loads and stores to commute values between registers and memory.
# '''Select''': assign a register to each variable.

==== Drawbacks and further improvements ====
The graph-coloring allocation has three major drawbacks. First, it relies on graph-coloring, which is an [[NP-completeness|NP-complete problem]], to decide which variables are spilled. Finding a minimal coloring graph is indeed an NP-complete problem.{{sfn|Book|1975|p=618–619}} Second, unless live-range splitting is used, evicted variables are spilled everywhere: store instructions are inserted as early as possible, i.e., just after variable definitions; load instructions are respectively inserted late, just before variable use. Third, a variable that is not spilled is kept in the same register throughout its whole lifetime.{{sfn|Colombet|Brandner|Darte|2011|p=1}}

On the other hand, a single register name may appear in multiple register classes, where a class is a set of register names that are interchangeable in a particular role. Then, multiple register names may be aliases for a single hardware register.{{sfn|Smith|Ramsey|Holloway|2004|p=277}} 
Finally, graph coloring is an aggressive technique for allocating registers, but is computationally expensive due to its use of the interference graph, which can have a worst-case size that is [[Time complexity#Table of common time complexities|quadratic]] in the number of live ranges.{{sfn|Cavazos|Moss|O’Boyle|2006|p=124}}
The traditional formulation of graph-coloring register allocation implicitly assumes a single bank of non-overlapping general-purpose registers and does not handle irregular architectural features like overlapping registers pairs, special purpose registers and multiple register banks.{{sfn|Runeson|Nyström|2003|p=240}}

One later improvement of Chaitin-style graph-coloring approach was found by Briggs et al.: it is called conservative coalescing. This improvement adds a criterion to decide when two live ranges can be merged. Mainly, in addition to the non-interfering requirements, two variables can only be coalesced if their merging will not cause further spilling. Briggs et al. introduces a second improvement to Chaitin's works which is biased coloring. Biased coloring tries to assign the same color in the graph-coloring to live range that are copy related.{{sfn|Briggs|Cooper|Torczon|1992|p=316}}

=== Linear scan ===

Linear scan is another global register allocation approach. It was first proposed by Poletto et al. in 1999.{{sfn|Poletto|Sarkar|1999|p=895}} In this approach, the code is not turned into a graph. Instead, all the variables are linearly scanned to determine their live range, represented as an interval. Once the live ranges of all variables have been figured out, the intervals  are traversed chronologically. Although this traversal could help identifying variables whose live ranges interfere, no interference graph is being built and the variables are allocated in a greedy way.{{sfn|Cavazos|Moss|O’Boyle|2006|p=124}}

The motivation for this approach is speed; not in terms of execution time of the generated code, but in terms of time spent in code generation. Typically, the standard graph coloring approaches produce quality code, but have a significant [[Overhead (computing)|overhead]],{{sfn|Poletto|Sarkar|1999|p=902}}{{sfn|Wimmer|Mössenböck|2005|p=132}} the used graph coloring algorithm having a quadratic cost.{{sfn|Johansson|Sagonas|2002|p=102}} Owing to this feature, linear scan is the approach currently used in several JIT compilers, like the [[HotSpot (virtual machine)|Hotspot client compiler]], [[Chrome V8|V8]], [[Jikes RVM]],{{sfn|Eisl|Grimmer|Simon|Würthinger|2016|p=1}} and the Android Runtime (ART).<ref>{{Cite video |url=https://www.youtube.com/watch?v=fwMM6g7wpQ8 |title=The Evolution of ART - Google I/O 2016  |date=25 May 2016 |time=3m47s |work=Google}}</ref> The [[HotSpot (virtual machine)|Hotspot server compiler]] uses graph coloring for its superior code.{{sfn|Paleczny|Vick|Click|2001|p=1}}

==== Pseudocode ====

This describes the algorithm as first proposed by Poletto et al.,{{sfn|Poletto|Sarkar|1999|p=899}} where:
* '''R''' is the number of available registers.
* '''active''' is the list, sorted in order of increasing end point, of live intervals overlapping the current point and placed in registers.

 '''LinearScanRegisterAllocation'''
     active ← {}
     '''for each''' live interval ''i'', in order of increasing start point '''do'''
         ExpireOldIntervals(i)
         '''if''' length(active) = R '''then'''
             SpillAtInterval(i)
         '''else'''
             register[i] ← a register removed from pool of free registers
             add ''i'' to active, sorted by increasing end point
 
 '''ExpireOldIntervals(i)'''
     '''for each''' interval ''j'' '''in''' active, in order of increasing end point '''do'''
         '''if''' endpoint[j] ≥ startpoint[i] '''then'''
             '''return''' 
         remove ''j'' from active
         add register[j] to pool of free registers
 
 '''SpillAtInterval(i)'''
     spill ← last interval in active
     '''if''' endpoint[spill] > endpoint[i] '''then'''
         register[i] ← register[spill]
         location[spill] ← new stack location
         remove spill from active
         add ''i'' to active, sorted by increasing end point
     '''else'''
         location[i] ← new stack location

==== Drawbacks and further improvements ====
However, the linear scan presents two major drawbacks. First, due to its greedy aspect, it does not take lifetime holes into account, i.e. "ranges where the value of the variable is not needed".{{sfn|Eisl|Grimmer|Simon|Würthinger|2016|p=2}}{{sfn|Traub|Holloway|Smith|1998|p=143}} Besides, a spilled variable will stay spilled for its entire lifetime.
[[File:Shorter live ranges with SSA approach.png|thumb|Shorter live ranges with SSA approach]]
Many other research works followed up on the Poletto's linear scan algorithm. Traub et al., for instance, proposed an algorithm called second-chance binpacking aiming at generating code of better quality.{{sfn|Traub|Holloway|Smith|1998|p=141}}{{sfn|Poletto|Sarkar|1999|p=897}} In this approach, spilled variables get the opportunity to be stored later in a register by using a different [[Heuristic (computer science)|heuristic]] from the one used in the standard linear scan algorithm. Instead of using live intervals, the algorithm relies on live ranges, meaning that if a range needs to be spilled, it is not necessary to spill all the other ranges corresponding to this variable.

Linear scan allocation was also adapted to take advantage from the [[Static single assignment form|SSA form]]: the properties of this intermediate representation simplify the allocation algorithm and allow lifetime holes to be computed directly.{{sfn|Wimmer|Franz|2010|p=170}} First, the time spent in data-flow graph analysis, aimed at building the lifetime intervals, is reduced, namely because variables are unique.{{sfn|Mössenböck|Pfeiffer|2002|p=234}} It consequently produces shorter live intervals, because each new assignment corresponds to a new live interval.{{sfn|Mössenböck|Pfeiffer|2002|p=233}}{{sfn|Mössenböck|Pfeiffer|2002|p=229}} To avoid modeling intervals and liveness holes, Rogers showed a simplification called future-active sets that successfully removed intervals for 80% of instructions.{{sfn|Rogers|2020}}

===Rematerialization===
{{See also|Rematerialization}}

=== Coalescing ===
{{See also|Coalescing (computer science)}}
In the context of register allocation, coalescing is the act of merging variable-to-variable move operations by allocating those two variables to the same location. The coalescing operation takes place after the interference graph is built. Once two nodes have been coalesced, they must get the same color and be allocated to the same register, once the copy operation becomes unnecessary.{{sfn|Chaitin|1982|p=90}}

Doing coalescing might have both positive and negative impacts on the colorability of the interference graph.{{sfn|Bouchez|Darte|Rastello|2007b|p=103}} For example, one negative impact that coalescing could have on graph inference colorability is when two nodes are coalesced, as the result node will have a union of the edges of those being coalesced.{{sfn|Bouchez|Darte|Rastello|2007b|p=103}}
A positive impact of coalescing on inference graph colorability is, for example, when a node interferes with both nodes being coalesced, the degree of the node is reduced by one which leads to improving the overall colorability of the interference graph.{{sfn|Ahn|Paek|2009|p=7}}

There are several coalescing heuristics available:{{sfn|Park|Moon|2004|p=736}}

; Aggressive coalescing: It was first introduced in Chaitin's original register allocator. This heuristic aims at coalescing any non-interfering, copy-related nodes.{{sfn|Chaitin|1982|p=99}} From the perspective of copy elimination, this heuristic has the best results.{{sfn|Park|Moon|2004|p=738}} On the other hand, aggressive coalescing could impact the colorability of the inference graph.{{sfn|Ahn|Paek|2009|p=7}}
;Conservative Coalescing: It mainly uses the same heuristic as aggressive coalescing but it merges moves if, and only if, it does not compromise the colorability of the interference graph.{{sfn|Briggs|Cooper|Torczon|1994|p=433}}
;Iterated coalescing: It removes one particular move at the time, while keeping the colorability of the graph.{{sfn|George|Appel|1996|p=212}}
;Optimistic coalescing: It is based on aggressive coalescing, but if the inference graph colorability is compromised, then it gives up as few moves as possible.{{sfn|Park|Moon|2004|p=741}}

=== Mixed approaches ===
==== Hybrid allocation ====
Some other register allocation approaches do not limit to one technique to optimize register's use. Cavazos et al., for instance, proposed a solution where it is possible to use both the linear scan and the graph coloring algorithms.{{sfn|Eisl|Marr|Würthinger|Mössenböck|2017|p=11}} In this approach, the choice between one or the other solution is determined dynamically: first, a [[machine learning]] algorithm is used "offline", that is to say not at runtime, to build a heuristic function that determines which allocation algorithm needs to be used. The heuristic function is then used at runtime; in light of the code behavior, the allocator can then choose between one of the two available algorithms.{{sfn|Cavazos|Moss|O’Boyle|2006|p=124-127}}

Trace register allocation is a recent approach developed by Eisl et al.{{sfn|Eisl|Grimmer|Simon|Würthinger|2016|p=14:1}}{{sfn|Eisl|Grimmer|Simon|Würthinger|2016|p=1}} This technique handles the allocation locally: it relies on dynamic [[Profiling (computer programming)|profiling]] data to determine which branches will be the most frequently used in a given control flow graph. It then infers a set of "traces" (i.e. code segments) in which the merge point is ignored in favor of the most used branch. Each trace is then independently processed by the allocator. This approach can be considered as hybrid because it is possible to use different register allocation algorithms between the different traces.{{sfn|Eisl|Grimmer|Simon|Würthinger|2016|p=4}}

==== Split allocation ====
Split allocation is another register allocation technique that combines different approaches, usually considered as opposite. For instance, the hybrid allocation technique can be considered as split because the first heuristic building stage is performed offline, and the heuristic use is performed online.{{sfn|Cavazos|Moss|O’Boyle|2006|p=124}} In the same fashion, B. Diouf et al. proposed an allocation technique relying both on offline and online behaviors, namely static and dynamic compilation.{{sfn|Diouf|Cohen|Rastello|Cavazos|2010|p=66}}{{sfn|Cohen|Rohou|2010|p=1}} During the offline stage, an optimal spill set is first gathered using [[Integer programming|Integer Linear Programming]]. Then, live ranges are annotated using the <code>compressAnnotation</code> algorithm which relies on the previously identified optimal spill set. Register allocation is performed afterwards during the online stage, based on the data collected in the offline phase.{{sfn|Diouf|Cohen|Rastello|Cavazos|2010|p=72}}
 
In 2007, Bouchez et al. suggested as well to split the register allocation in different stages, having one stage dedicated to spilling, and one dedicated to coloring and coalescing.{{sfn|Bouchez|Darte|Rastello|2007a|p=1}}