{{Short description|Graphical representation of a computer program or algorithm}}
{{multiple issues|
{{More citations needed|date=January 2009}}
{{expert needed|Computer science|reason=many bad/unclear definitions and dubious claims.|date=February 2015}}
}}
[[File:Some types of control flow graphs.svg|thumb|Some CFG examples:<br />(a) an [[if-then-else]]<br />(b) a [[while loop]]<br />(c) a natural loop with two exits, e.g. while with an if...break in the middle; non-structured but reducible<br />(d) an irreducible CFG: a loop with two entry points, e.g. goto into a while or for loop]]
[[File:Rust_MIR_CFG.svg|thumb|A control-flow graph used by the [[Rust (programming language)|Rust]] compiler to perform code generation.]]

In [[computer science]], a '''control-flow graph''' ('''CFG''') is a [[Depiction|representation]], using [[directed graph|graph]] notation, of all paths that might be traversed through a [[computer program|program]] during its [[execution (computing)|execution]]. The control-flow graph was conceived by [[Frances E. Allen]],<ref>{{cite journal | author = Frances E. Allen | title = Control flow analysis | journal = SIGPLAN Notices | volume = 5 | issue = 7 |date=July 1970 | pages = 1–19 | doi=10.1145/390013.808479 | author-link = Frances E. Allen }}</ref> who noted that [[Reese Prosser|Reese T. Prosser]] had earlier used [[Adjacency matrix|boolean connectivity matrices]] for flow analysis.<ref>{{cite conference | author = Reese T. Prosser | title = Applications of Boolean matrices to the analysis of flow diagrams | book-title = Papers presented at the December 1–3, 1959, eastern joint IRE-AIEE-ACM computer conference | year = 1959 | pages = 133–138 | doi=10.1145/1460299.1460314 | author-link = Reese Prosser }}</ref> The CFG is essential to many [[compiler optimization#Data-flow optimizations|compiler optimization]]s and [[static code analysis|static-analysis]] tools.

== Definition ==
In a control-flow graph, each [[Vertex (graph theory)|node]] in the [[Graph (discrete mathematics)|graph]] represents a [[basic block]], i.e. a straight-line sequence of code with a single entry point and a single exit point, within which no branches or jumps occur. Basic blocks start at jump targets and end with jumps or branch instructions. Directed [[edge (graph theory)|edge]]s represent jumps in the [[control flow]]. In most presentations there are two specially designated blocks: the ''entry block'', through which control enters the flow graph, and the ''exit block'', through which all control flow leaves.<ref>{{cite conference |chapter=Masking wrong-successor Control Flow Errors employing data redundancy |last1=Yousefi |first1=Javad |title=2015 5th International Conference on Computer and Knowledge Engineering (ICCKE) |date=2015 |publisher=IEEE |pages=201–205 |doi=10.1109/ICCKE.2015.7365827 |url=https://ieeexplore.ieee.org/document/7365827 |isbn=978-1-4673-9280-8}}</ref>

Because of its construction procedure, every edge A→B in a CFG has the property that:
: [[outdegree]](A) > 1 or indegree(B) > 1 (or both).<ref name="TarrWolf2011">{{cite book|author1=Peri L. Tarr|author2=Alexander L. Wolf|title=Engineering of Software: The Continuing Contributions of Leon J. Osterweil|year=2011|publisher=Springer Science & Business Media|isbn=978-3-642-19823-6|page=58}}</ref>

The CFG can thus be obtained, at least conceptually, by starting from the program's (full) flow graph—i.e. the graph in which every node represents an individual instruction—and performing an [[edge contraction]] for every edge that falsifies the predicate above, i.e. contracting every edge whose source has a single exit and whose destination has a single entry. This contraction-based algorithm is of no practical importance, except as a visualization aid for understanding CFG construction, because the CFG can be constructed more efficiently directly from the program by [[Basic block#Creation algorithm|scanning it for basic blocks]].<ref name="TarrWolf2011"/>
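The following short sketch illustrates this direct construction. It is a simplified, hypothetical example in Python rather than the code of any particular compiler: the program is assumed to be a list of instruction records in which a branch or jump carries an explicit <code>jump_target</code> index, and the function collects the "leaders" at which basic blocks begin.

<syntaxhighlight lang="python">
def basic_block_leaders(instructions):
    """Return the sorted indices at which basic blocks begin ("leaders")."""
    leaders = {0}  # the first instruction starts the entry block
    for i, ins in enumerate(instructions):
        if ins.get("jump_target") is not None:
            # The target of a jump or branch starts a new block ...
            leaders.add(ins["jump_target"])
            # ... and so does the instruction following the jump or branch.
            if i + 1 < len(instructions):
                leaders.add(i + 1)
    return sorted(leaders)

# The fragment from the Example section below, in this representation:
program = [
    {"op": "t0 = read_num"},
    {"op": "if t0 mod 2 == 0", "jump_target": 4},  # jump to 4 when odd
    {"op": 'print t0 + " is even."'},
    {"op": "goto 5", "jump_target": 5},
    {"op": 'print t0 + " is odd."'},
    {"op": "end program"},
]
print(basic_block_leaders(program))  # [0, 2, 4, 5] -> blocks A, B, C, D
</syntaxhighlight>

Each basic block then runs from one leader up to (but not including) the next, and edges are added from each block to the blocks containing its jump targets and, for conditional branches, its fall-through successor.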
== Example ==
Consider the following fragment of code:

<pre>
0: (A) t0 = read_num
1: (A) if t0 mod 2 == 0
2: (B) print t0 + " is even."
3: (B) goto 5
4: (C) print t0 + " is odd."
5: (D) end program
</pre>

This fragment has four basic blocks: A from lines 0 to 1, B from 2 to 3, C at 4 and D at 5. Here A is the entry block, D is the exit block, and lines 4 and 5 are jump targets. The control-flow graph for this fragment has edges from A to B, A to C, B to D and C to D.

== Reachability ==
{{Main|Reachability}}
[[Reachability]] is a graph property useful in optimization. If a subgraph is not reachable from the subgraph containing the entry block, that subgraph is never executed and therefore constitutes [[unreachable code]]; under normal conditions it can be safely removed. If the exit block is unreachable from the entry block, an [[infinite loop]] may exist; not all infinite loops are detectable (see the [[halting problem]]), and the program may also terminate through an explicit halt instruction without ever reaching the exit block.

Unreachable code and infinite loops are possible even if the programmer does not explicitly code them: optimizations like [[constant propagation]] and [[constant folding]] followed by [[jump threading]] can collapse multiple basic blocks into one and remove edges from the CFG, possibly disconnecting parts of the graph.

== Domination relationship ==
{{Main|Dominator (graph theory)}}
A block M ''[[dominator (graph theory)|dominates]]'' a block N if every path from the entry block to block N passes through block M. The entry block dominates all blocks. In the reverse direction, block M ''postdominates'' block N if every path from N to the exit passes through block M. The exit block postdominates all blocks.

A block M ''immediately dominates'' block N if M dominates N and there is no intervening block P such that M dominates P and P dominates N; in other words, M is the last dominator on all paths from the entry to N. Every block other than the entry block has a unique immediate dominator. The analogous notion for postdominators is the ''immediate postdominator''.

The [[dominator (graph theory)|''dominator tree'']] is an ancillary data structure depicting the dominator relationships: there is an arc from block M to block N if M is the immediate dominator of N. Because each block has a unique immediate dominator, this graph is a tree, rooted at the entry block. The dominator tree can be calculated efficiently using [[Lengauer–Tarjan's algorithm]]. A ''postdominator tree'' is defined analogously and is rooted at the exit block.
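Dominator sets can also be computed with a simple iterative data-flow algorithm, which is easier to follow than the Lengauer–Tarjan algorithm although asymptotically slower. The sketch below is a hypothetical Python illustration of that iterative scheme; the CFG representation (a dictionary of successor lists) is chosen only for the example and is not taken from any particular library.

<syntaxhighlight lang="python">
def dominators(succ, entry):
    """Map each block to the set of blocks that dominate it.

    succ: dict mapping each block to a list of its successor blocks.
    """
    blocks = set(succ)
    # Derive the predecessor map from the successor map.
    pred = {b: set() for b in blocks}
    for b, ss in succ.items():
        for s in ss:
            pred[s].add(b)

    # The entry block is dominated only by itself; every other block starts
    # out pessimistically dominated by all blocks and is then refined.
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}

    changed = True
    while changed:
        changed = False
        for b in blocks - {entry}:
            # A block is dominated by itself plus whatever dominates
            # all of its predecessors.
            if pred[b]:
                new = {b} | set.intersection(*(dom[p] for p in pred[b]))
            else:
                new = {b}
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom

# The CFG of the example above: edges A->B, A->C, B->D, C->D.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(dominators(cfg, "A"))  # D is dominated only by A and by D itself
</syntaxhighlight>

On the example graph this computation reports that A dominates every block, while neither B nor C dominates D, since D can be reached while avoiding either one of them.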
== Special edges ==
A ''back edge'' is an edge that points to a block that has already been met during a depth-first ([[depth-first search|DFS]]) traversal of the graph. Back edges are typical of loops.

A ''critical edge'' is an edge which is neither the only edge leaving its source block nor the only edge entering its destination block. Such edges must be ''split'' (a new block must be created in the middle of the edge) in order to insert computations on the edge without affecting any other edges.

An ''abnormal edge'' is an edge whose destination is unknown. [[Exception handling]] constructs can produce them. These edges tend to inhibit optimization.

An ''impossible edge'' (also known as a ''fake edge'') is an edge which has been added to the graph solely to preserve the property that the exit block postdominates all blocks. It can never be traversed.

== Loop management ==
A ''loop header'' (sometimes called the ''entry point'' of the loop) is a dominator that is the target of a loop-forming back edge. The loop header dominates all blocks in the loop body. A block may be a loop header for more than one loop. A loop may have multiple entry points, in which case it has no loop header.

Suppose block M is a dominator with several incoming edges, some of them being back edges (so M is a loop header). It is advantageous for several optimization passes to break M up into two blocks M<sub>pre</sub> and M<sub>loop</sub>. The contents of M and the back edges are moved to M<sub>loop</sub>, the remaining edges are redirected to point into M<sub>pre</sub>, and a new edge from M<sub>pre</sub> to M<sub>loop</sub> is inserted (so that M<sub>pre</sub> is the immediate dominator of M<sub>loop</sub>). Initially M<sub>pre</sub> is empty, but passes like [[loop-invariant code motion]] can populate it. M<sub>pre</sub> is called the ''loop pre-header'', and M<sub>loop</sub> is the loop header.

== Reducibility ==
A reducible CFG is one whose edges can be partitioned into two disjoint sets, forward edges and back edges, such that:<ref>{{Cite web |url=http://www.cs.colostate.edu/~mstrout/CS553Fall06/slides/lecture13-control.pdf |title=Archived copy |access-date=2018-03-24 |archive-date=2020-08-01 |archive-url=https://web.archive.org/web/20200801004407/https://www.cs.colostate.edu/~mstrout/CS553Fall06/slides/lecture13-control.pdf |url-status=dead }}</ref>
* Forward edges form a [[directed acyclic graph]] with all nodes reachable from the entry node.
* For all back edges (A, B), node B [[Dominator (graph theory)|dominates]] node A.

[[Structured programming]] languages are often designed such that all CFGs they produce are reducible, and common structured programming statements such as IF, FOR, WHILE, BREAK, and CONTINUE produce reducible graphs. Producing irreducible graphs requires statements such as [[GOTO]]. Irreducible graphs may also be produced by some compiler optimizations.
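A common textbook characterization of reducibility is that every edge whose target is an ancestor of its source in a depth-first traversal (a ''retreating'' edge) must have its target dominating its source. The following hypothetical Python sketch applies that test; it reuses the <code>dominators</code> helper from the earlier example and assumes every block is reachable from the entry block.

<syntaxhighlight lang="python">
def is_reducible(succ, entry):
    """Return True if every retreating DFS edge (A, B) has B dominating A."""
    dom = dominators(succ, entry)
    retreating = []
    visited, on_stack = set(), set()

    def dfs(b):
        visited.add(b)
        on_stack.add(b)
        for s in succ[b]:
            if s in on_stack:
                # Edge to a block still on the DFS stack, i.e. a DFS ancestor.
                retreating.append((b, s))
            elif s not in visited:
                dfs(s)
        on_stack.remove(b)

    dfs(entry)
    return all(target in dom[source] for source, target in retreating)
</syntaxhighlight>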
== Loop connectedness ==
The loop connectedness of a CFG is defined with respect to a given [[depth-first search]] tree (DFST) of the CFG. This DFST should be rooted at the start node and cover every node of the CFG. Edges in the CFG which run from a node to one of its DFST ancestors (including itself) are called back edges. The loop connectedness is the largest number of back edges found in any cycle-free path of the CFG. In a reducible CFG, the loop connectedness is independent of the DFST chosen.<ref name=":0">{{Cite journal|last1=Kam|first1=John B.|last2=Ullman|first2=Jeffrey D.|date=1976-01-01|title=Global Data Flow Analysis and Iterative Algorithms|journal=Journal of the ACM|volume=23|issue=1|pages=158–171|doi=10.1145/321921.321938|s2cid=162375 |issn=0004-5411|doi-access=free}}</ref><ref>{{Cite web|url=https://www.cs.umb.edu/~offner/files/flow_graph.pdf |archive-url=https://web.archive.org/web/20080725053222/http://www.cs.umb.edu/~offner/files/flow_graph.pdf |archive-date=2008-07-25 |url-status=live|title=Notes on Graph Algorithms Used in Optimizing Compilers|last=Offner|first=Carl|access-date=13 April 2018}}</ref>

Loop connectedness has been used to reason about the time complexity of [[data-flow analysis]].<ref name=":0" />

== Inter-procedural control-flow graph ==
While control-flow graphs represent the control flow of a single procedure, inter-procedural control-flow graphs represent the control flow of whole programs.<ref>{{Cite web|url=http://web.cs.iastate.edu/~weile/cs513x/4.ControlFlowAnalysis.pdf |archive-url=https://web.archive.org/web/20161219113226/http://web.cs.iastate.edu/~weile/cs513x/4.ControlFlowAnalysis.pdf |archive-date=2016-12-19 |url-status=live|title=Control Flow Analysis|date=2016}}</ref>

== See also ==
* [[Abstract syntax tree]]
* [[Flowchart]]
* [[Control-flow diagram]]
* [[Control-flow analysis]]
* [[Data-flow analysis]]
* [[Interval (graph theory)]]
* [[Program dependence graph]]
* [[Cyclomatic complexity]]
* [[Static single assignment]]
* [[Compiler construction]]
* [[Intermediate representation]]

== References ==
{{Reflist}}

== External links ==
{{Commons category}}
* [https://www.cs.cmu.edu/afs/cs.cmu.edu/academic/class/15745-s02/public/doc/cfg.html The Machine-SUIF Control Flow Graph Library]
* [https://gcc.gnu.org/onlinedocs/gccint/Control-Flow.html GNU Compiler Collection Internals]
* Paper "[http://www.ucw.cz/~hubicka/papers/proj/node6.html#SECTION02420000000000000000 Infrastructure for Profile Driven Optimizations in GCC Compiler]" by [[Zdeněk Dvořák]] ''et al.''
;Examples
* [http://compilers.cs.ucla.edu/avrora/cfg.html Avrora – Control-Flow Graph Tool] {{Webarchive|url=https://web.archive.org/web/20110825221459/http://compilers.cs.ucla.edu/avrora/cfg.html |date=2011-08-25 }}

[[Category:Compiler construction]]
[[Category:Control-flow analysis|*]]
[[Category:Application-specific graphs]]
[[Category:Modeling languages]]