1 \documentclass[11pt]{article}
2 \title{Design document for DyninstAPI DDG Analysis}
3 \author{Andrew R. Bernat}
6 The purpose of this document is to explain the data dependence graph
7 (DDG) analysis used by the DDG component of the DyninstAPI binary
8 analysis and instrumentation suite. Our DDG closely follows the
9 conventional definition and is built using an analysis optimized for
12 \section*{Definitions}
14 This work uses the following terms:
17 \item[Memory Region] A memory region is a contiguous range of memory
18 locations beginning at 0 and ending at some maximal value $n$. We
19 assume that memory regions to not overlap. Our current approach
20 defines two memory regions for each function: the \emph{global}
21 region and the \emph{stack}.
22 \item[Abstract Location] An \emph{abstract location} (or absloc) represents a
23 register or memory location. A memory location is a (region, offset)
24 pair that identifies a unique location within a memory region.
25 \item[Abstract Region] An \emph{abstract region} refers to multiple
26 offsets within a memory region. Currently an abstract region refers
27 to the entire region; this may become more precise in the future.
28 \item[Aliasing] Imprecision in memory analysis may lead to us not
29 being able to determine the offset within a region that an
30 instruction uses or defines. This leads to the existence of
31 \emph{aliases} - multiple abstractions for the same offset. In
32 particular, an abstract region is an alias of all abstract locations
33 within its set of offsets and vice versa. Aliasing leads to the
34 following issue: if an instruction defines an abstract region it
35 \emph{possibly defines} all abstract locations within that region;
36 however, we do not know which location was actually defined.
37 \item[Operation] Instructions may define multiple abstract
38 locations. For example, consider the IA-32 \emph{push}
39 instruction. Let $SP$ represent the stack pointer, and $*SP$ the top
40 of the stack. Then \emph{push eax} has the following effects: $SP = SP -
41 4$ and $*SP = eax$. Note that although there are two definitions
42 they are independent; the definition of $SP$ does not depend on
43 $eax$. We represent this by splitting instructions into
44 single-definition \emph{operations} represented by an (instruction,
46 \item[Gen Set] The \emph{gen set} (short for \emph{generated set}) for
47 an absloc is the set of operations that definitely define and
48 possibly define (see aliasing, above) that absloc.
49 \item[Kill Set] The \emph{kill set} for an absloc is the set of
50 operations to remove from the gen set.
51 \item[Candidate Node] The analysis may create a number of nodes that
52 are not necessary in the finished DDG. We refer to these as
53 \emph{candidate} nodes (cNodes), and any candidate nodes that are not added
54 to the DDG are discarded.
57 \section*{Data Dependence Graph}
59 A data dependence graph (DDG) is a graph $DDG = (V, E)$. The set of
60 vertices $V$ consists of the following types of nodes:
62 \item[Operation Nodes] Represent operations as defined above.
63 \item[Virtual Node] Any operation that uses no abstract locations is
64 instead given an in edge from a unique \emph{virtual node}. We do
65 this to insure that all operations are reachable in the graph.
66 \item[Formal Parameter Nodes] These represent definitions of abstract
67 locations that occur prior to the function being executed. These
68 include both ``expected'' parameters (as provided in registers or
69 the stack), definitions to global variables, and so on.
70 \item[Formal Return Nodes] These represent definitions of abstract
71 locations that persist after the function completes. This includes
72 the ``expected'' return value as well as definitions to caller-saved
73 registers, the stack, and so on.
74 \item[Actual Parameter Nodes] These summarize abstract locations used
76 \item[Actual Return Nodes] These summarize abstract locations defined
77 during execution of a called function.
80 Nodes are connected with edges that represent use-def
81 chains. Let $i, j$ be instructions and $a, b$ be abstract
82 locations. Let $o = (i,a)$ and $p = (j,b)$ be two operations: $o$
83 represents the definition of $a$ by $i$ and $p$ similarly represents
84 the definition of $b$ by $j$. Then an edge $e = (o,p)$ indicates the following:
86 \item $i$ defines an abstract location $a$.
87 \item $j$ uses $a$ to define $b$.
88 \item There exists an execution path from $i$ to $j$ such that no
89 redefinition of $a$ occurs on that path.
92 Operation nodes, the immediate node, formal parameter nodes, and
93 actual return nodes all define abstract locations and thus have
94 out-edges. Operation nodes, formal return nodes, and actual parameter
95 nodes all use abstract locations and thus have in-edges. We also
96 define \emph{virtual} edges between actual parameter and actual return
97 nodes to represent dependences within the called function.
101 Our analysis is somewhat complex. This section describes both the
102 intuition between each phase of the analysis and the results of the
103 phase. It should be kept in sync with any changes to function names,
106 \subsection*{Overview}
108 The goal of the DDG analysis is to construct an accurate and
109 reasonably precise intraprocedural DDG for a provided function. The
110 DDG is calculated with a fixpoint loop over a CFG. We note that CFGs
111 are not highly connected; instead, they are very \emph{lumpy} as
112 described by basic blocks. Therefore, instead of using a generalized
113 fixpoint we divide the analysis into three phases:
115 \item[Block Summarization] This phase summarizes the gen set and kill
116 set for each block in the program. The results of this phase is the
117 following information for each block and absloc:
119 \item Gen set: the last operation $o$ to definitely define that absloc,
120 plus all operations that possibly define that absloc after $o$.
121 \item Kill set: a boolean; true if there exists a definite definition
122 to the absloc and false otherwise.
124 \item[Inter-Block Reaching Definitions] This phase uses the gen and
125 kill set information determined in Phase 1 to determine the set of
126 reaching definitions at the entry of each block. This is determined
127 over blocks in the classic way. The following equations are per-absloc:
129 \item $IN(B) = \bigcup_{C \in pred(B)} OUT(C)$;
130 \item $OUT(B) = GEN(B) \cup (IN(B) - KILL(B))$
132 We optimize this as follows. The kill set can be compactly
133 represented as a boolean; if true, then $OUT(B) = GEN(B)$;
134 otherwise, $OUT(B) = GEN(B) \cup IN(B)$.
135 \item[Node Creation] Once reaching definitions have been calculated
136 for the entry to each block we can determine the reaching defs to
137 each instruction within the block and create the graph.
140 \subsection*{Block Summarization}
142 The first phase summarizes gen and kill set information for each
143 block. Since we are only interested in the \emph{last} operation that
144 definitely defines a particular absloc, plus any possible definitions
145 that occur after that, we do this with a backwards iteration. This
146 algorithm operates as follows:
150 updateDefSet(GenSet blockGens, Absloc A, cNode N)
152 For each alias A' of A:
155 updateKillSet(KillSet blockKills, Absloc A)
162 Let height = current stack height
163 return A' = (stack region, off + height)
166 summarizeCallGenKill(GenSet blockGens, KillSet blockKills,
168 Let cDDG = callee DDG
169 Let fR = cDDG.formalReturnNodes
170 For each absloc D in fR, do:
172 Let cNode C = ActualReturnNode(callee, D')
173 updateDefSet(blockGens, D', C)
174 updateKillSet(blockKills, D')
176 summarizeBlockGenKill(Block B)
177 Let blockGens = globalGens[B]
178 Let blockKills = globalKills[B]
179 For i = B.last to B.first, do:
180 For each absloc D defined by i, do:
181 If blockKills(D), continue
183 updateDefSet(blockGens, D, C)
184 updateKillSet(blockKills, D)
186 summarizeCallGenKill(blockGens, blockKills, i.callee)
189 For each block B in blocks
190 summarizeBlockGenKill(B)
194 \subsection*{Inter-Block Reaching Definitions}
196 This phase is relatively straightforward. The only complication is
197 created by the need to handle an unknown set of initial definitions
198 from outside the function (parameters). Any absloc that is used before
199 being defined is assumed to have an initial definition, represented by
200 a parameter node, from the caller function. However, we don't want to
201 have to enumerate all possible parameter nodes. Instead, we create
204 Consider the following case. Let $B_1, B_2, B_3$ be basic blocks, with
205 $B_1, B_2$ being predecessors of $B_3$ and $B_2$ being an entry
206 block. Let $A$ be an absloc defined in $B_1$ and used in $B_3$; $A$ is
207 \emph{not} defined in $B_2$. In this case the use of $A$ in $B_3$ has
208 a reaching definition from both $B_1$ (due to the definition) and a
209 parameter definition from $B_2$.
211 We can lazily create these parameter definitions by creating a
212 \emph{virtual parameter definition} for all abstract locations not
213 defined in the block itself.
215 These virtual parameter definitions are lazily created during the
216 merge phase of reaching definitions analysis, as follows:
219 merge(BlockSet blocks)
226 IN[A] += Parameter(A)
229 However, we still have a problem with alias definitions. Suppose that
230 $B_1$ defines $S$, the stack region (a possible definition of all
231 locations in the stack), and that $B_2$ defines $S_1$. When we merge
232 $B_1, B_2$, there will be no information that $B_1$ possibly defined
233 $S_1$ (as we create abstract locations lazily. We thus augment our
234 definition of merge, as follows:
237 merge(BlockSet blocks)
244 IN[A] += Parameter(A)
245 For each D in A.aliases
249 The problem with this definition is that we must know all abstract
250 locations a priori, which means a iterating over all abslocs (even
251 those not defined by previous blocks). This is inefficient. However,
252 we note the following. Parameter nodes are unique and idempotent; we
253 insert a parameter node for absloc $A$ if there exists at least one
254 path along which $A$ is not defined.
256 We therefore use the following definition of merge:
259 merge (BlockSet blocks)
260 Let counts be a map absloc -> integer
261 For each block B in blocks
262 For each absloc A in B.defSet
266 For each D in A.aliases
269 For each absloc A in counts
270 If counts[A] != blocks.size()
271 IN[A] += Parameter(A)
275 The remainder of the fixpoint analysis operates as expected, with the
276 modification of the kill set being represented as booleans rather than
292 \subsection*{Node Creation}
294 Once we have the set of reaching definitions at the entry point of
295 each block we can continue to create the graph. This is done in a
296 straightforward iteration over each block and instruction within the
299 We call the set of reaching definitions to the start of the block the
300 \emph{local reaching definitions} of the block. We then calculate the
301 reaching definitions to each instruction (and operation) as follows:
306 Let localReachingDefs = IN set of B
307 For each instruction I in B
308 For each absloc D defined by I
309 Let node TARGET = (I, D)
310 Let USE be the set of abslocs used by I to define D
312 Let RD = localReachingDefs[U]
313 For each cNode N in RD
315 Insert (S,T) into DDG
316 For each absloc D defined by I
318 localReachingDefs[D] = (I,D)
325 The creation of edges in the DDG is straightforward. Given a
326 definition $D$ (and thus an operation $(I,D)$ and a node $T = (I,D)$):
328 \item Find the abstract locations used to define $D$;
329 \item For each $U$ in this set, find the reaching definitions of $U$;
330 \item Create an edge from each such reaching definition $S$ to $T$.
333 However, once an instruction defines an abstract location it kills all
334 such previous definitions. We thus iterate over all defined locations
335 a second time and remove all appropriate previous definitions of
336 $D$. This is done in a second iteration rather than during the first
337 to insure parallelism of operations.
339 We must also create actual parameter nodes and formal return nodes for
340 calls and returns. Call parameter nodes are created in the same way as
341 instructions, except instead of the used set of the instruction we use
342 the formal parameter nodes of the call. Formal return nodes are
343 created for every reaching definition to the end of the block. These
344 are handled in a straightforward way by the \texttt{createCallNodes}
345 and \texttt{createReturnNodes} functions.