Data-flow analysis
== An iterative algorithm ==
The most common way of solving the data-flow equations is by using an iterative algorithm. It starts with an approximation of the in-state of each block. The out-states are then computed by applying the transfer functions on the in-states. From these, the in-states are updated by applying the join operations. The latter two steps are repeated until we reach the so-called '''fixpoint''': the situation in which the in-states (and the out-states in consequence) do not change.

A basic algorithm for solving data-flow equations is the '''round-robin iterative algorithm''':

:for ''i'' ← 1 to ''N''
::''initialize node i''
:while (''sets are still changing'')
::for ''i'' ← 1 to ''N''
:::''recompute sets at node i''

=== Convergence ===
To be usable, the iterative approach should actually reach a fixpoint. This can be guaranteed by imposing constraints on the combination of the value domain of the states, the transfer functions and the join operation.

The value domain should be a [[partial order]] with '''finite height''' (i.e., there are no infinite ascending chains <math>x_1 < x_2 < \ldots</math>). The combination of the transfer function and the join operation should be [[monotonic]] with respect to this partial order. Monotonicity ensures that on each iteration the value will either stay the same or will grow larger, while finite height ensures that it cannot grow indefinitely. Thus we will ultimately reach a situation where <math>T(x) = x</math> for all <math>x</math>, which is the fixpoint.

=== The work list approach ===
It is easy to improve on the algorithm above by noticing that the in-state of a block will not change if the out-states of its predecessors do not change. Therefore, we introduce a '''work list''': a list of blocks that still need to be processed. Whenever the out-state of a block changes, we add its successors to the work list. In each iteration, a block is removed from the work list. Its out-state is computed.
If the out-state changed, the block's successors are added to the work list. For efficiency, a block should not be in the work list more than once. The algorithm is started by putting information-generating blocks in the work list. It terminates when the work list is empty.

=== Ordering ===
The efficiency of iteratively solving data-flow equations is influenced by the order in which local nodes are visited.<ref name="Cooper_2004"/> Furthermore, it depends on whether the data-flow equations are used for forward or backward data-flow analysis over the CFG. Intuitively, in a forward flow problem, it would be fastest if all predecessors of a block have been processed before the block itself, since then the iteration will use the latest information. In the absence of loops, it is possible to order the blocks in such a way that the correct out-states are computed by processing each block only once.

In the following, a few iteration orders for solving data-flow equations are discussed (a related concept to iteration order of a [[control-flow graph|CFG]] is [[tree traversal]] of a [[Tree (graph theory)|tree]]).

* '''Random order''' - This iteration order does not take into account whether the data-flow equations solve a forward or backward data-flow problem. Therefore, the performance is relatively poor compared to specialized iteration orders.
* '''[[Postorder]]''' - This is a typical iteration order for backward data-flow problems. In ''postorder iteration'', a node is visited after all its successor nodes have been visited. Typically, the postorder iteration is implemented with the '''depth-first''' strategy.
* '''[[depth-first search#Vertex orderings|Reverse postorder]]''' - This is a typical iteration order for forward data-flow problems. In ''reverse-postorder iteration'', a node is visited before any of its successor nodes has been visited, except when the successor is reached by a back edge.
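As a sketch of how a reverse postorder can be obtained, the following Python function derives it from a depth-first traversal of a CFG given as a successor map; the graph shape and node names are invented for illustration:

```python
def reverse_postorder(succ, entry):
    """Return the nodes of a CFG in reverse postorder, starting from entry."""
    visited = set()
    order = []

    def dfs(node):
        visited.add(node)
        for s in succ.get(node, []):
            if s not in visited:
                dfs(s)
        order.append(node)  # appended only after all successors: postorder

    dfs(entry)
    order.reverse()  # a node now precedes its non-back-edge successors
    return order

# Hypothetical diamond-shaped CFG: entry -> a -> {b, c} -> d
cfg = {"entry": ["a"], "a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(reverse_postorder(cfg, "entry"))
```

Reversing the postorder list is what guarantees the property described above: every node is emitted before its successors, unless the edge to that successor is a back edge of the depth-first traversal.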
(Note that reverse postorder is not the same as [[depth-first search#Vertex orderings|preorder]].)

=== Initialization ===
The initial value of the in-states is important to obtain correct and accurate results. If the results are used for compiler optimizations, they should provide '''conservative''' information, i.e., applying the information should not change the semantics of the program. The iteration of the fixpoint algorithm will take the values in the direction of the maximum element. Initializing all blocks with the maximum element is therefore not useful. At least one block starts in a state with a value less than the maximum. The details depend on the data-flow problem. If the minimum element represents totally conservative information, the results can be used safely even during the data-flow iteration. If it represents the most accurate information, the fixpoint should be reached before the results can be applied.
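The work-list and initialization ideas can be combined in a short sketch. The following Python example solves a forward, union-based ("may") problem in the style of reaching definitions; the three-block CFG (with a self-loop on b2) and the gen/kill sets are invented for illustration, and all states start at the minimum element (the empty set):

```python
from collections import deque

def solve_forward(succ, pred, gen, kill):
    """Work-list solver for a forward data-flow problem with union as join."""
    # Initialize every in/out-state to the minimum element (empty set).
    in_s = {n: set() for n in succ}
    out_s = {n: set() for n in succ}
    # Seed the work list with every block; on_list ensures a block
    # appears in the list at most once.
    work = deque(succ)
    on_list = set(succ)
    while work:
        n = work.popleft()
        on_list.discard(n)
        # Join: union the out-states of all predecessors.
        in_s[n] = set().union(*(out_s[p] for p in pred[n])) if pred[n] else set()
        new_out = gen[n] | (in_s[n] - kill[n])  # transfer function
        if new_out != out_s[n]:
            out_s[n] = new_out
            # The out-state changed, so successors must be reprocessed.
            for s in succ[n]:
                if s not in on_list:
                    work.append(s)
                    on_list.add(s)
    return in_s, out_s

# Hypothetical CFG: b1 -> b2, b2 -> {b2, b3}; definitions d1, d2 are invented.
succ = {"b1": ["b2"], "b2": ["b2", "b3"], "b3": []}
pred = {"b1": [], "b2": ["b1", "b2"], "b3": ["b2"]}
gen  = {"b1": {"d1"}, "b2": {"d2"}, "b3": set()}
kill = {"b1": set(), "b2": {"d1"}, "b3": set()}  # b2 redefines d1's variable
in_s, out_s = solve_forward(succ, pred, gen, kill)
print(in_s["b2"])  # definitions reaching the entry of b2
```

Because the empty set is the minimum element and is totally conservative for this may-analysis, the intermediate states are safe to inspect even before the fixpoint is reached, matching the distinction drawn above.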