| 
 | 1 | +# Data-flow Analysis  | 
 | 2 | + | 
 | 3 | +If you work on the MIR, you will frequently come across various flavors of  | 
 | 4 | +[data-flow analysis][wiki]. For example, `rustc` uses data-flow to find  | 
 | 5 | +uninitialized variables, determine what variables are live across a generator  | 
 | 6 | +`yield` statement, and compute which `Place`s are borrowed at a given point in  | 
 | 7 | +the control-flow graph.  | 
 | 8 | + | 
 | 9 | +Since data-flow analysis is such a fundamental concept in modern compilers, there  | 
 | 10 | +are ample resources for those who are not yet familiar. [*Static  | 
 | 11 | +Program Analysis*] by Anders Møller and Michael I. Schwartzbach is an  | 
 | 12 | +incredible, freely available textbook. For those who prefer audiovisual  | 
 | 13 | +learning, the Goethe University Frankfurt has published a series of short  | 
 | 14 | +[YouTube lectures][goethe] that are very approachable.  | 
 | 15 | + | 
 | 16 | +The following sections will discuss the framework used to define and inspect  | 
 | 17 | +data-flow analyses in `rustc`. They assume that the reader is familiar with  | 
 | 18 | +common data-flow ideas such as [lattices], fixpoint, and transfer functions.  | 
 | 19 | +Any of the resources listed above should give you enough background to  | 
 | 20 | +understand what comes next.  | 
 | 21 | + | 
 | 22 | +[wiki]: https://en.wikipedia.org/wiki/Data-flow_analysis#Basic_principles  | 
 | 23 | +[goethe]: https://www.youtube.com/watch?v=NVBQSR_HdL0&list=PL_sGR8T76Y58l3Gck3ZwIIHLWEmXrOLV_&index=2  | 
 | 24 | +[lattices]: https://en.wikipedia.org/wiki/Lattice_(order)  | 
 | 25 | +[*Static Program Analysis*]: https://cs.au.dk/~amoeller/spa/  | 
 | 26 | + | 
 | 27 | +## Inspecting the Results of a Data-flow Analysis  | 
 | 28 | + | 
 | 29 | +Before we describe how to define a new data-flow analysis, let's inspect the  | 
 | 30 | +results of an existing one. Once you have constructed an analysis, you must  | 
 | 31 | +pass it to an `Engine`, which is capable of finding the fixpoint of  | 
 | 32 | +your data-flow problem. Calling `iterate_to_fixpoint` will return a `Results`,  | 
 | 33 | +which contains the fixpoint state on entry to each block.  | 
 | 34 | + | 
 | 35 | +Once you have a `Results`, you can inspect the data-flow state at fixpoint  | 
 | 36 | +at any point in the CFG. If you only need the state at a few locations (e.g.,  | 
 | 37 | +each `Drop` terminator), use a [`ResultsCursor`]. If you need the state at *all*  | 
 | 38 | +locations, a [`ResultsVisitor`] will be more efficient.  | 
 | 39 | + | 
 | 40 | +```  | 
 | 41 | +                         Analysis  | 
 | 42 | +                            |  | 
 | 43 | +                            | into_engine(…)  | 
 | 44 | +                            |  | 
 | 45 | +                          Engine  | 
 | 46 | +                            |  | 
 | 47 | +                            | iterate_to_fixpoint()  | 
 | 48 | +                            |  | 
 | 49 | +                         Results  | 
 | 50 | +                         /     \  | 
 | 51 | + into_results_cursor(…) /       \  visit(…)  | 
 | 52 | +                       /         \  | 
 | 53 | +               ResultsCursor  ResultsVisitor  | 
 | 54 | +```  | 
 | 55 | + | 
 | 56 | +The following code example uses the `ResultsVisitor` method...  | 
 | 57 | + | 
 | 58 | + | 
 | 59 | +```rust,ignore  | 
 | 60 | +// Assuming `MyVisitor` implements `ResultsVisitor<FlowState = BitSet<MyAnalysis::Idx>>`...  | 
 | 61 | +let my_visitor = MyVisitor::new();  | 
 | 62 | + | 
 | 63 | +// Inspect the fixpoint state for every location within every block in RPO.  | 
 | 64 | +let results = MyAnalysis()  | 
 | 65 | +    .into_engine(tcx, body, def_id)  | 
 | 66 | +    .iterate_to_fixpoint()  | 
 | 67 | +    .visit(body, traversal::reverse_postorder(body), my_visitor);  | 
 | 68 | +```  | 
 | 69 | + | 
 | 70 | +and this code uses `ResultsCursor`.  | 
 | 71 | + | 
 | 72 | +```rust,ignore  | 
 | 73 | +let mut results = MyAnalysis()  | 
 | 74 | +    .into_engine(tcx, body, def_id)  | 
 | 75 | +    .iterate_to_fixpoint()  | 
 | 76 | +    .into_results_cursor(body);  | 
 | 77 | + | 
 | 78 | +// Inspect the fixpoint state immediately before each `Drop` terminator.  | 
 | 79 | +for (bb, block) in body.basic_blocks().iter_enumerated() {  | 
 | 80 | +    if let TerminatorKind::Drop { .. } = block.terminator().kind {  | 
 | 81 | +        results.seek_before(body.terminator_loc(bb));  | 
 | 82 | +        let state = results.get();  | 
 | 83 | + | 
 | 84 | +        println!("state before drop: {:#?}", state);  | 
 | 85 | +    }  | 
 | 86 | +}  | 
 | 87 | +```  | 
 | 88 | + | 
 | 89 | +[`ResultsCursor`]: #  | 
 | 90 | +[`ResultsVisitor`]: #  | 
 | 91 | + | 
 | 92 | +## Defining a New Data-flow Analysis  | 
 | 93 | + | 
 | 94 | +### Domain  | 
 | 95 | + | 
 | 96 | +A data-flow analysis has two defining characteristics. First is the domain upon  | 
 | 97 | +which the analysis is defined, also known as the data-flow lattice. For  | 
 | 98 | +example, the domain of the [`MaybeInitializedPlaces`] analysis is the set (or,  | 
 | 99 | +more formally, the powerset lattice) of all move paths that are used in a  | 
 | 100 | +function. For now, the MIR data-flow framework only supports analyses whose  | 
 | 101 | +domain is the powerset lattice of some monotonic index, such as a `MovePathIndex`  | 
 | 102 | +or a `Local`.  | 
 | 103 | + | 
 | 104 | +The [`AnalysisDomain`] and [`BottomValue`] traits define the domain of a data-flow  | 
 | 105 | +analysis. `BottomValue` determines the initial value of the data-flow state for  | 
 | 106 | +each basic block: either the empty set (if `BOTTOM_VALUE = false`) or the full  | 
 | 107 | +set (if `BOTTOM_VALUE = true`). It also selects the default lattice join  | 
 | 108 | +operator: union (if `BOTTOM_VALUE = false`) or intersection (if `BOTTOM_VALUE =  | 
 | 109 | +true`). The two are coupled because the initial entry state of each block is  | 
 | 110 | +joined with the exit states of its predecessors. For example, if the initial  | 
 | 111 | +value of the data-flow state were the empty set but intersection were used as  | 
 | 112 | +the join operator, the entry state would never change, since ∅ ∩ A = ∅ for all A.  | 
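
The pairing of bottom value and join operator can be illustrated with a small, self-contained sketch. It is not the rustc API: `State`, `join_union`, `join_intersect`, and `entry_state` are hypothetical names, and a `BTreeSet<u32>` stands in for the real bit set.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a powerset-lattice state; not a rustc type.
type State = BTreeSet<u32>;

/// Join by union: pairs with the empty set as the bottom value.
fn join_union(entry: &State, pred_exit: &State) -> State {
    entry.union(pred_exit).cloned().collect()
}

/// Join by intersection: pairs with the full set as the bottom value.
fn join_intersect(entry: &State, pred_exit: &State) -> State {
    entry.intersection(pred_exit).cloned().collect()
}

/// Joins the exit state of every predecessor into a block's initial entry state.
fn entry_state(
    initial: State,
    pred_exits: &[State],
    join: fn(&State, &State) -> State,
) -> State {
    pred_exits.iter().fold(initial, |acc, exit| join(&acc, exit))
}
```

With predecessor exit states `{0, 1}` and `{1, 2}`, starting from the empty set and joining by union gives `{0, 1, 2}`, and starting from the full set and joining by intersection gives `{1}`. Starting from the empty set and joining by intersection leaves the entry state stuck at the empty set, which is the mismatch described above.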
 | 113 | + | 
 | 114 | +`AnalysisDomain` defines the index type that serves as the element of the  | 
 | 115 | +data-flow state. It is also responsible for initializing the data-flow state for  | 
 | 116 | +the `START_BLOCK`. For example,  | 
 | 117 | +`MaybeInitializedPlaces::initialize_start_block` marks move paths  | 
 | 118 | +corresponding to the parameters of a function as initialized.  | 
 | 119 | + | 
 | 120 | +[`MaybeInitializedPlaces`]: #  | 
 | 121 | +[`BottomValue`]: #  | 
 | 122 | +[`AnalysisDomain`]: #  | 
 | 123 | + | 
 | 124 | +### Transfer Function  | 
 | 125 | + | 
 | 126 | +The second characteristic of a data-flow analysis is its transfer function.  | 
 | 127 | +This describes how the data-flow state changes as a program is executed. For  | 
 | 128 | +the MIR, the transfer function of each basic block is composed of the  | 
 | 129 | +effects of each individual statement followed by the effect of the terminator.  | 
 | 130 | +For example, in `MaybeInitializedPlaces`, the statement effect for an  | 
 | 131 | +assignment marks its destination as initialized.  | 
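
As a rough illustration of how a block's transfer function is built from per-statement effects, here is a toy model. It does not use the MIR types: `Stmt`, `statement_effect`, and `block_effect` are hypothetical, the `MoveOut` rule is an assumption made for the example, and the terminator effect is omitted.

```rust
use std::collections::BTreeSet;

/// Hypothetical toy statements; not the MIR `StatementKind`.
enum Stmt {
    /// Assigning to a local marks it as initialized.
    Assign(u32),
    /// Assumed for illustration: moving out of a local marks it as uninitialized.
    MoveOut(u32),
}

/// Applies the effect of a single statement to the data-flow state.
fn statement_effect(state: &mut BTreeSet<u32>, stmt: &Stmt) {
    match stmt {
        Stmt::Assign(local) => {
            state.insert(*local);
        }
        Stmt::MoveOut(local) => {
            state.remove(local);
        }
    }
}

/// The block's transfer function: each statement effect applied in order.
fn block_effect(state: &mut BTreeSet<u32>, stmts: &[Stmt]) {
    for stmt in stmts {
        statement_effect(state, stmt);
    }
}
```

Starting from `{0}`, the statements `Assign(1)`, `MoveOut(0)`, `Assign(2)` leave the state as `{1, 2}`.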
 | 132 | + | 
 | 133 | +A transfer function is defined for each statement and terminator via the  | 
 | 134 | +`Analysis::effect` methods. When called in sequence, these make up the  | 
 | 135 | +transfer function for the entire basic block. Try to avoid using the  | 
 | 136 | +`before` variants of the effect methods. Unlike the unprefixed variants, their  | 
 | 137 | +effect on a given statement is already applied when `seek_before` is called  | 
 | 138 | +with that statement as the target location. Prefer the unprefixed variants, and  | 
 | 139 | +use `seek_after` or `visit_statement_exit` when inspecting the results.  | 
 | 140 | + | 
 | 141 | +#### Gen-kill data-flow problems  | 
 | 142 | + | 
 | 143 | +[Gen-kill] problems (also known as bit vector problems) are a certain class of  | 
 | 144 | +data-flow analyses whose domain is a powerset lattice and whose transfer  | 
 | 145 | +function only inserts or removes specific elements from the state vector. This  | 
 | 146 | +class of analyses is guaranteed to converge quickly, since we can use a more  | 
 | 147 | +efficient approach when iterating to fixpoint. If your analysis can be defined  | 
 | 148 | +using only `gen` and `kill` operations, it probably should be.  | 
 | 149 | + | 
 | 150 | +[`GenKillAnalysis`] defines the transfer function for such analyses. Unlike the  | 
 | 151 | +[`Analysis`] trait, which can mutate the state vector directly, a  | 
 | 152 | +`GenKillAnalysis` only has access to a generic type that implements the  | 
 | 153 | +[`GenKill`] interface.  | 
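
One reason gen-kill problems are cheap to iterate is that a sequence of gen and kill operations can be summarized once as a pair of sets and then reapplied to any input state. The sketch below models that idea only. It is not the rustc `GenKill` trait: `GenKillSet`, `gen_elem`, `kill_elem`, and `apply` are hypothetical names.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified summary of a gen-kill transfer function over a
/// powerset lattice of `u32` indices. This is not the rustc `GenKill` trait.
#[derive(Default, Debug, Clone, PartialEq)]
struct GenKillSet {
    gens: BTreeSet<u32>,
    kills: BTreeSet<u32>,
}

impl GenKillSet {
    /// Records that `elem` is inserted; this overrides an earlier kill of `elem`.
    fn gen_elem(&mut self, elem: u32) {
        self.kills.remove(&elem);
        self.gens.insert(elem);
    }

    /// Records that `elem` is removed; this overrides an earlier gen of `elem`.
    fn kill_elem(&mut self, elem: u32) {
        self.gens.remove(&elem);
        self.kills.insert(elem);
    }

    /// Applies the summarized effect: out = (in - kills) ∪ gens.
    fn apply(&self, state: &mut BTreeSet<u32>) {
        for elem in &self.kills {
            state.remove(elem);
        }
        state.extend(self.gens.iter().copied());
    }
}
```

Recording `kill 1, gen 2, gen 1, kill 3` yields `gens = {1, 2}` and `kills = {3}`. Applying that summary to `{1, 3, 4}` gives `{1, 2, 4}`, the same result as applying the four operations one at a time.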
 | 154 | + | 
 | 155 | + | 
 | 156 | +[`GenKillAnalysis`]: #  | 
 | 157 | +[`Analysis`]: #  | 
 | 158 | +[`GenKill`]: #  | 
 | 159 | +[Gen-kill]: https://en.wikipedia.org/wiki/Data-flow_analysis#Bit_vector_problems  | 