Control flow graph

Some CFG examples:
(a) an if-then-else
(b) a while loop
(c) a natural loop with two exits, e.g. while with an if...break in the middle; non-structured but reducible
(d) an irreducible CFG: a loop with two entry points, e.g. goto into a while or for loop

A control flow graph (CFG) in computer science is a representation, using graph notation, of all paths that might be traversed through a program during its execution. The control flow graph is due to Frances E. Allen,^[1] who notes that Reese T. Prosser used boolean connectivity matrices for flow analysis before.^[2]

The CFG is essential to many compiler optimizations and static analysis tools.

Definition

In a control flow graph each node in the graph represents a basic block, i.e. a straight-line piece of code without any jumps or jump targets; jump targets start a block, and jumps end a block. Directed edges are used to represent jumps in the control flow. There are, in most presentations, two specially designated blocks: the entry block, through which control enters into the flow graph, and the exit block, through which all control flow leaves.^[3]

Because of its construction procedure, in a non-trivial CFG (i.e. one with more than zero edges), every edge A→B has the property that:

outdegree(A) > 1 or indegree(B) > 1 (or both).^[4]

The CFG can thus be obtained, at least conceptually, by starting from the program's (full) flow graph—i.e. the graph in which every node represents an individual instruction—and performing an edge contraction for every edge that falsifies the predicate above, i.e. contracting every edge whose source has a single exit and whose destination has a single entry. This contraction-based algorithm is of no practical importance, except as a visualization aid for understanding the CFG construction, because the CFG can be more efficiently constructed directly from the program by scanning it for basic blocks.^[4]

Example

Consider the following fragment of code:

0: (A) t0 = read_num
1: (A) if t0 mod 2 == 0
2: (B)   print t0 + " is even."
3: (B)   goto 5
4: (C) print t0 + " is odd."
5: (D) end program

In the above, we have 4 basic blocks: A from 0 to 1, B from 2 to 3, C at 4 and D at 5. In particular, in this case, A is the "entry block", D the "exit block" and lines 4 and 5 are jump targets. A graph for this fragment has edges from A to B, A to C, B to D and C to D.

Reachability

Main article: Reachability

Reachability is a graph property useful in optimization.

If a subgraph is not connected from the subgraph containing the entry block, that subgraph is unreachable during any execution, and so is unreachable code; under normal conditions it can be safely removed.

If the exit block is unreachable from the entry block, an infinite loop may exist. Not all infinite loops are detectable, see Halting problem. A halting order may also exist there.

Unreachable code and infinite loops are possible even if the programmer does not explicitly code them: optimizations like constant propagation and constant folding followed by jump threading can collapse multiple basic blocks into one, cause edges to be removed from a CFG, etc., thus possibly disconnecting parts of the graph.

Domination relationship

Main article: Dominator (graph theory)

A block M dominates a block N if every path from the entry that reaches block N has to pass through block M. The entry block dominates all blocks.

In the reverse direction, block M postdominates block N if every path from N to the exit has to pass through block M. The exit block postdominates all blocks.

It is said that a block M immediately dominates block N if M dominates N, and there is no intervening block P such that M dominates P and P dominates N. In other words, M is the last dominator on all paths from entry to N. Each block has a unique immediate dominator.

Similarly, there is a notion of immediate postdominator : Analogous to immediate dominator.

The dominator tree is an ancillary data structure depicting the dominator relationships. There is an arc from Block M to Block N if M is an immediate dominator of N. This graph is a tree, since each block has a unique immediate dominator. This tree is rooted at the entry block. Can be calculated efficiently using Lengauer–Tarjan's algorithm.

A postdominator tree is analogous to the dominator tree. This tree is rooted at the exit block.

Special edges

A back edge is an edge that points to a block that has already been met during a depth-first (DFS) traversal of the graph. Back edges are typical of loops.

A critical edge is an edge which is neither the only edge leaving its source block, nor the only edge entering its destination block. These edges must be split: a new block must be created in the middle of the edge, in order to insert computations on the edge without affecting any other edges.

An abnormal edge is an edge whose destination is unknown. Exception handling constructs can produce them. These edges tend to inhibit optimization.

An impossible edge (also known as a fake edge) is an edge which has been added to the graph solely to preserve the property that the exit block postdominates all blocks. It cannot ever be traversed.

Loop management

A loop header (sometimes called the entry point of the loop) is a dominator that is the target of a loop-forming back edge. The loop header dominates all blocks in the loop body. A block may be a loop header for more than one loop. A loop may have multiple entry points, in which case it has no "loop header".

Suppose block M is a dominator with several incoming edges, some of them being back edges (so M is a loop header). It is advantageous to several optimization passes to break M up into two blocks M_pre and M_loop. The contents of M and back edges are moved to M_loop, the rest of the edges are moved to point into M_pre, and a new edge from M_pre to M_loop is inserted (so that M_pre is the immediate dominator of M_loop). In the beginning, M_pre would be empty, but passes like loop-invariant code motion could populate it. M_pre is called the loop pre-header, and M_loop would be the loop header.

References

↑ Frances E. Allen (July 1970). "Control flow analysis". SIGPLAN Notices. 5 (7): 1–19. doi:10.1145/390013.808479.
↑ Reese T. Prosser (1959). "Applications of Boolean matrices to the analysis of flow diagrams". pp. 133–138.
↑ Yousefi, Javad (2015). Masking wrong-successor Control Flow Errors employing data redundancy. IEEE. pp. 201–205. doi:10.1109/ICCKE.2015.7365827.
1 2 Peri L. Tarr; Alexander L. Wolf (2011). Engineering of Software: The Continuing Contributions of Leon J. Osterweil. Springer Science & Business Media. p. 58. ISBN 978-3-642-19823-6.

External links

Examples

Avrora – Control Flow Graph Tool

This article is issued from Wikipedia - version of the 9/26/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.