A Relational Approach to Interprocedural Shape Analysis
BERTRAND JEANNET, ALEXEY LOGINOV, THOMAS REPS, and MOOLY SAGIV
This paper addresses the verification of properties of imperative programs with recursive procedure calls, heap-allocated storage, and destructive updating of pointer-valued fields—i.e., interprocedural shape analysis. The paper makes three contributions:
— It introduces a new method for abstracting relations over memory configurations for use in
abstract interpretation.
— It shows how this method furnishes the elements needed for a compositional approach to shape
analysis. In particular, abstracted relations are used to represent the shape transformation
performed by a sequence of operations, and an over-approximation to relational composition
can be performed using the meet operation of the domain of abstracted relations.
— It applies these ideas in a new algorithm for context-sensitive interprocedural shape analysis.
The algorithm creates procedure summaries using abstracted relations over memory configurations, and the meet-based composition operation provides a way to apply the summary
transformer for a procedure P at each call site from which P is called.
The algorithm has been applied successfully to establish properties of both (i) recursive programs
that manipulate lists, and (ii) recursive programs that manipulate binary trees.
Categories and Subject Descriptors: D.2.4 [Software Engineering]: Software/Program Verification—Assertion checkers; D.2.5 [Software Engineering]: Testing and Debugging—Symbolic execution; D.3.3 [Programming Languages]: Language Constructs and Features—Data types and structures; dynamic storage management; procedures, functions and subroutines; recursion; E.1 [Data]: Data Structures—Lists, stacks and queues; trees; E.2 [Data]: Data Storage Representations—Composite structures; linked representations; F.3.1 [Logics and Meanings of Programs]: Specifying and Verifying and Reasoning about Programs—Assertions; invariants
General Terms: Algorithms, Languages, Theory, Verification
Additional Key Words and Phrases: Abstract interpretation, context-sensitive analysis, interprocedural dataflow analysis, destructive updating, pointer analysis, shape analysis, static analysis, 3-valued logic

A preliminary version of this paper appeared in the proceedings of the 11th Int. Static Analysis Symposium (SAS) (Verona, Italy, August 26-28, 2004) [Jeannet et al. 2004].
This work was supported in part by ONR under grants N00014-01-1-0796 and N00014-01-1-0708, and by NSF under grants CCR-9986308, CCF-0540955, and CCF-0524051.
Affiliations: Bertrand Jeannet; INRIA; Bertrand.Jeannet@inrialpes.fr. Alexey Loginov; GrammaTech, Inc.; alexey@grammatech.com. Thomas Reps; Comp. Sci. Dept., University of Wisconsin, and GrammaTech, Inc.; reps@cs.wisc.edu. Mooly Sagiv; School of Comp. Sci., Tel Aviv University; msagiv@post.tau.ac.il.
When the research reported in the paper was carried out, Bertrand Jeannet was affiliated with INRIA or visiting the University of Wisconsin, and Alexey Loginov was affiliated with the University of Wisconsin.
Permission to make digital/hard copy of all or part of this material without fee for personal or classroom use is granted provided that the copies are not made or distributed for profit or commercial advantage, the ACM copyright/server notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists requires prior specific permission and/or a fee.
© 20YY ACM 0000-0000/20YY/0000-0001 $5.00
ACM Journal Name, Vol. V, No. N, Month 20YY, Pages 1–0??.
1. INTRODUCTION
This paper concerns techniques for static analysis of recursive programs that manipulate heap-allocated storage and perform destructive updating of pointer-valued
fields. The goal is to recover shape descriptors that provide information about the
characteristics of the data structures that a program’s pointer variables can point
to. Such information can be used to help programmers understand certain aspects
of the program’s behavior, to verify properties of the program, and to optimize or
parallelize the program.
The work reported in the paper builds on past work by several of the authors
on static analysis based on 3-valued logic [Sagiv et al. 2002; Reps et al. 2003] and
its implementation in the TVLA system [Lev-Ami and Sagiv 2000]. In this setting,
two related logics come into play: an ordinary 2-valued logic, as well as a related
3-valued logic. A memory configuration, or store, is modeled by what logicians
call a logical structure, which consists of a predicate (i.e., a relation of appropriate
arity) for each predicate symbol of a vocabulary P. A store is modeled by a 2-valued
logical structure; a set of stores is abstracted by a (finite) set of bounded-size 3-valued logical structures. An individual of a 3-valued structure’s universe either
models a single memory cell or, in the case of a summary individual, a collection of
memory cells.
The constraint of working with limited-size descriptors entails a loss of information about the store. Certain properties of concrete individuals are lost due to
abstraction, which groups together multiple individuals into summary individuals:
a property can be true for some concrete individuals of the group but false for other
individuals. It is for this reason that 3-valued logic is used; uncertainty about a
property’s value is captured by means of the third truth value, 1/2.
One of the opportunities for scaling up this approach is to exploit the compositional structure of programs. In interprocedural dataflow analysis, one avenue
for accomplishing this is to create a summary transformer for each procedure P ,
and use the summary transformer at each call site at which P is called. Each
summary transformer must capture (an over-approximation of) the net effect of a
call on P . To be able to create summary transformers, the abstract transformers
for individual transitions must have a “composable representation”; that is, given
the representations of two abstract transformers, it must be possible to represent
their composition as an object of roughly the same size. One then carries out a
fixpoint-finding procedure on a collection of equations in which each variable in the
equation set has a transformer-valued value—i.e., a value drawn from the domain
of transformers—rather than a dataflow value proper.
A number of approaches to interprocedural dataflow analysis based on summary
transformers are known [Cousot and Cousot 1977; Sharir and Pnueli 1981; Knoop
and Steffen 1992; Reps et al. 1995; Sagiv et al. 1996; Reps et al. 2005]. However, not
all program-analysis problems have abstract transformers that have a composable
representation.
For some problems, it is possible to address this issue by working pointwise,
tabulating composed transformers using either (i) sets of pairs that consist of an
input abstract value and an output abstract value [Sharir and Pnueli 1981], or (ii)
finer-granularity sets of pairs that capture how parts of an input abstract value
influence parts of an output abstract value [Reps et al. 1995; Sagiv et al. 1996; Ball
and Rajamani 2001]. In essence, these approaches start with the kinds of objects
used in intraprocedural analysis and pair them together to create the objects that
are used in interprocedural analysis.
However, for interprocedural shape analysis, tabulating pairs of 3-valued
structures—the kinds of objects used in intraprocedural shape analysis—has significant drawbacks insofar as precision is concerned: in the 3-valued-logic approach
to shape analysis, individuals—which model memory cells—do not have fixed identities; they are identified only up to their “distinguishing characteristics”, namely,
their values for a specific set of unary predicates. Because these “distinguishing
characteristics” can change during the course of a procedure call, there is no way to
identify individuals in an input abstract structure with their corresponding individuals in the output abstract structure. In essence, a pair of input/output 3-valued
structures loses track of the correlations between the input and output values of
an individual’s unary predicates. Consequently, an approach based on tabulating
composed transformers as sets of pairs of 3-valued structures provides only a weak
characterization of a procedure’s net effect, and is fundamentally limited in the
properties that it can express.
All is not lost, however: instead of “abstracting and then pairing” (as discussed
above), the solution is to “pair and then abstract”.
Observation 1.1. By using a 3-valued structure over a doubled vocabulary P ⊎ P′, where P′ = {p′ | p ∈ P} and ⊎ denotes disjoint union, one can obtain a finite abstraction that relates the predicate values for an individual at the beginning of a transition to the predicate values for the individual at the end of the transition.
This approach provides a way to create much more accurate composable representations of transformers, and hence much more accurate summary transformers, for
a broad class of problems. The advantages come from two effects:
— The addition of the second vocabulary changes the abstraction in use because
individuals now have additional “distinguishing characteristics” [Sagiv et al.
2002].
— The second vocabulary helps permit the changes in a predicate to be tracked
over a sequence of operations [Lev-Ami et al. 2000].
The benefit of these properties is that, in many cases, a relationship on the before
and after values of a predicate can be tracked on individual locations or tuples of locations, over a sequence of operations—even when abstraction has been performed.
The consequence is that two-vocabulary 3-valued structures provide more precise
descriptors of relations between stores than an approach based on pairing abstract
stores from an existing store abstraction.
Moreover, by extending the abstract domain of 3-valued logical structures with
some new operations, it is possible to perform abstract interpretation of call and
return statements without losing too much precision (see §6 and §7). We have
used these ideas to create a context-sensitive shape-analysis algorithm for recursive
programs that manipulate heap-allocated storage and perform destructive updating.
The “pair and then abstract” principle of Observation 1.1 is related to several
well-known concepts:
Pairing without abstraction. The use of a doubled vocabulary is standard in logic-based reasoning about execution behavior: the transition relations of a language’s concrete semantics are often expressed by means of formulas over present-state and next-state variables (e.g., [Gries 1981; Manna and Pnueli 1995; Clarke et al. 1999]). For instance, the semantics of a statement x := y+1 can be expressed as the formula (x′ = y + 1) ∧ (y′ = y). Similarly, a procedure’s post-condition is often expressed using such a doubled vocabulary (i.e., the post-condition expresses a relation over input stores and output stores).
Pairing and then numeric abstraction. For analyzing programs that manipulate numeric data, a composable abstract transformer for a statement such as x := y+1 can be created directly from the formula (x′ = y + 1) ∧ (y′ = y) when using the polyhedral abstract domain [Cousot and Halbwachs 1978]. The number of dimensions in each polyhedron used by the analyzer is double the number |V| of numeric variables V about which the analyzer is trying to obtain information. Each program variable has a primed and an unprimed version, and a polyhedron captures linear relations among the 2|V| variables.
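For intuition only, the "composable representation" idea can be sketched in the much simpler special case of deterministic affine updates (a restriction of the polyhedral setting described above). A transformer over the doubled vocabulary {x, y, x′, y′} is then just an affine map, and the composition of two transformers is another affine map of the same size. All type and function names below are ours, not from the paper:

```c
#include <assert.h>

/* A transformer over the doubled vocabulary {x, y, x', y'} restricted to
 * deterministic affine updates:  x' = m[0][0]*x + m[0][1]*y + c[0]  and
 * y' = m[1][0]*x + m[1][1]*y + c[1].  Because composition of affine maps
 * is affine, composed transformers have the same size as their operands. */
typedef struct {
    long m[2][2];   /* linear part */
    long c[2];      /* constant part */
} Affine2;

/* Composition (second ∘ first): run `first`, then `second`. */
static Affine2 compose(Affine2 second, Affine2 first) {
    Affine2 r;
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 2; j++)
            r.m[i][j] = second.m[i][0] * first.m[0][j]
                      + second.m[i][1] * first.m[1][j];
        r.c[i] = second.m[i][0] * first.c[0]
               + second.m[i][1] * first.c[1] + second.c[i];
    }
    return r;
}

/* Apply a transformer to a concrete store (x, y). */
static void apply(Affine2 t, const long in[2], long out[2]) {
    for (int i = 0; i < 2; i++)
        out[i] = t.m[i][0] * in[0] + t.m[i][1] * in[1] + t.c[i];
}
```

For example, "x := y+1" is the map (x′, y′) = (y+1, y), and "y := 2*x" is (x′, y′) = (x, 2x); composing them yields the single-transformer representation of the two-statement sequence.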
In this paper, we use Observation 1.1 to create composable abstract transformers for
programs that manipulate non-numeric data. Our work provides a new approach to
performing context-sensitive interprocedural shape analysis, and allows us to verify
properties of imperative programs with recursive procedure calls, heap-allocated
storage, and destructive updating of pointer-valued fields.
The contributions of our work include the following:
(1) We introduce a new method for abstracting relations over memory configurations for use in abstract interpretation.
(2) We show how this method furnishes the elements needed for a compositional
approach to shape analysis. In particular, abstracted relations are used to
represent the shape transformation performed by a sequence of operations, and
an over-approximation to relational composition can be performed using the
meet operation of the domain of abstracted relations.
(3) We apply these ideas in a new algorithm for context-sensitive interprocedural
shape analysis. The algorithm creates procedure summaries using abstracted
relations over memory configurations, and the meet-based composition operation provides a way to apply the summary transformer for a procedure P at
each call site from which P is called.
We have been able to apply this approach successfully to establish properties of both (i) recursive programs that manipulate lists, and (ii) recursive programs that manipulate binary trees. While list-manipulation programs can often be implemented in tail-recursive fashion—and hence can be converted easily into loop programs—tree-manipulation programs are much less easily converted to non-recursive form. In particular, the shape properties that characterize sorted binary trees are complex and rely on global properties, whereas the shape properties that characterize sorted lists are mostly local properties—with cyclicity properties being the main exception.

    typedef struct node {
      struct node *n;
      int data;
    } *List;

    List res;

    void main(List l) {
      res = rev(l);
    }

    List rev(List x) {
      List y, z;
      z = x->n;
      x->n = NULL;
      if (z != NULL) {
        y = rev(z);
        z->n = x;
      }
      else y = x;
      return y;
    }

Fig. 1. Recursive list-reversal program. The recursive function rev destructively reverses a non-empty, acyclic, singly-linked list using recursion to traverse the list.
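As a concrete illustration, the following sketch repeats the declarations of Fig. 1 (so that it is self-contained) and adds a small driver corresponding to Fig. 2(a): it builds a four-element list, reverses it with rev, and then executes p = b->n. The helper build and its node values are ours, not from the paper:

```c
#include <stdlib.h>

typedef struct node {
    struct node *n;
    int data;
} *List;

/* The procedure of Fig. 1: destructively reverses a non-empty,
 * acyclic, singly-linked list using recursion to traverse it. */
List rev(List x) {
    List y, z;
    z = x->n;
    x->n = NULL;
    if (z != NULL) {
        y = rev(z);
        z->n = x;     /* reverse the n-edge on the way back up */
    } else {
        y = x;
    }
    return y;
}

/* Helper (ours): build the acyclic list 0 -> 1 -> ... -> len-1. */
static List build(int len) {
    List head = NULL;
    for (int i = len - 1; i >= 0; i--) {
        List c = malloc(sizeof *c);
        c->data = i;
        c->n = head;
        head = c;
    }
    return head;
}
```

Running `List a = build(4); List b = rev(a); List p = b->n;` produces the stores depicted (as one-vocabulary structures) in Fig. 3(a)-(c).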
Organization. The remainder of the paper is organized as follows: §2 presents,
at a semi-formal level, several of the principles that lie behind our approach. §3
presents some background on 2-valued and 3-valued logic. §4 defines the language
to which our analysis applies, and gives a concrete semantics, based on the use
of 2-valued logical structures for representing memory configurations. §5 describes
the abstraction of 2-valued logical structures with bounded-size 3-valued logical
structures [Sagiv et al. 2002]. Our interprocedural shape analysis is based on a
relational semantics, which establishes at each control point a relation between
the input state of the enclosing procedure and the state at the current point. This
semantics requires the ability to represent relations between memory configurations,
which presents certain difficulties at the abstract level. §6 addresses this problem
by abstracting relations between memory configurations using the same principles
as those used to abstract sets of memory configurations in §5. §7 describes the
interprocedural shape-analysis algorithm that we developed based on these ideas.
§8 presents experimental results. §9 discusses related work.
2. OVERVIEW
In this section, we discuss at a semi-formal level the “pairing” aspect of Obs. 1.1
(“pair and then abstract”). Abstraction is the subject of §5. §7 applies the “pair
and then abstract” principle in the context of interprocedural shape analysis.
Consider non-empty, acyclic, singly-linked lists constructed from nodes of the
type List whose declaration is given in Fig. 1. One of the issues discussed below
concerns how to create a summary transformer for a procedure that reverses a list,
using destructive updating. The summary transformer that we give applies both to
recursive and non-recursive destructive list-reversal procedures. Because summary
transformers (also known as “procedure summaries”) are particularly useful for analyzing recursive programs, the running example used in later sections of the paper is the recursive list-reversal program shown in Fig. 1. That procedure destructively reverses a non-empty, acyclic, singly-linked list using recursion to traverse the list.

In the remainder of this section, we discuss the two code fragments shown in Fig. 2. Fig. 3 depicts three four-element, singly-linked, acyclic lists. The nodes of each graph represent memory cells. An address-valued program variable (“pointer variable”) that points to a given memory cell is represented by an arrow from the variable name to the node for the cell. (A pointer variable whose value is NULL is not shown.) The other arrows in the graph, labeled with n, represent the values of cells’ n-fields. Fig. 3(a), (b), and (c) represent lists that arise just before lines [2], [3], and [4] of Fig. 2(a), respectively.

    (a)  [1] a = <a 4-element list>; b = NULL; p = NULL;
         [2] b = rev(a);
         [3] p = b->n;
         [4] . . .

    (b)  [1] a = <a 4-element list>; b = NULL; c = NULL;
         [2] b = rev(a);
         [3] c = rev(b);
         [4] . . .

Fig. 2. Examples to illustrate one-vocabulary structures, two-vocabulary structures, transformer application, and procedure summaries.

[Figure omitted: three graph diagrams of four-element acyclic lists, named S, S′, and S″.]
Fig. 3. (a) The (one-vocabulary) structure that represents a four-element acyclic list that is pointed to by a; (b) the (one-vocabulary) structure that represents the list from (a) after the operation “[2] b = rev(a);”; (c) the (one-vocabulary) structure that represents the list from (b) after the operation “[3] p = b->n;”.
Two Kinds of Pairing. Figs. 4 and 5 illustrate two different kinds of pairing
operations that can be performed on lists:
— Fig. 4(a) depicts a pair of one-vocabulary structures that represent the net
transformation from just before line [2] of Fig. 2(a) to just before line [3];
Fig. 4(b) depicts a pair of one-vocabulary structures that represent the net
transformation from just before line [2] of Fig. 2(a) to just before line [4].
[Figure omitted: two pairs of list diagrams, ⟨S, S′⟩ and ⟨S, S″⟩.]
Fig. 4. Pairs of one-vocabulary structures that represent (a) the net transformation from just before line [2] of Fig. 2(a) to just before line [3]; (b) the net transformation from just before line [2] of Fig. 2(a) to just before line [4].
[Figure omitted: two two-vocabulary list diagrams, S^⟨·,′⟩ and S^⟨·,″⟩.]
Fig. 5. Two-vocabulary structures that represent (a) the net transformation from just before line [2] of Fig. 2(a) to just before line [3]; (b) the net transformation from just before line [2] of Fig. 2(a) to just before line [4]. (The superscript in each structure’s name indicates what vocabularies are present in the structure; “·” stands for “unprimed”.)
— Fig. 5(a) depicts a two-vocabulary structure that represents the net transformation from just before line [2] of Fig. 2(a) to just before line [3]; Fig. 5(b)
depicts a two-vocabulary structure that represents the net transformation from
just before line [2] of Fig. 2(a) to just before line [4].
A two-vocabulary structure has a single set of memory cells that are structured using two vocabularies. In Fig. 5(a), one vocabulary is {a, b, p, n}; the second vocabulary is {a′, b′, p′, n′}. In Fig. 5(b), the two vocabularies are {a, b, p, n} and {a″, b″, p″, n″}.¹ (In Fig. 4(a) and (b), we have used single-primed and double-primed vocabularies in the respective second-component structures to emphasize how they correspond to the two-vocabulary structures of Fig. 5(a) and (b). Strictly speaking, these should have been unprimed vocabularies.)

Even though we have drawn the list in the second component of the pair shown in Fig. 4(a) so that each n′-edge appears to have been reversed from the n-edge in the first component, we have not given names to the nodes, and thus Fig. 4(a) does not contain sufficient information to ensure that each of the original edges has, in fact, been reversed.²

¹ Variables b, p, and p′ do not appear in Fig. 5(a) because they have the value NULL. Likewise, variables b and p do not appear in Fig. 5(b) because they have the value NULL.
² Although it would be easy to give indelible names to nodes in each concrete list, it will become apparent in §5 that this is not the case for nodes in abstract lists. The discussion in this section is intended to convey—using concrete lists—how we overcome the lack of indelible names for nodes in abstract lists.
In contrast, because there is a unique set of nodes in the two-vocabulary structure of Fig. 5(a), we know that for each n-edge there is a corresponding reversed n′-edge, and vice versa.
Transformer Application. Let τ denote the transformation produced by the statement “[3] p = b->n;” in line [3] of Fig. 2(a). Consider three ways of depicting the effect:

— In terms of one-vocabulary structures, the transformation amounts to passing from Fig. 3(b) to Fig. 3(c): τ(S′) = S″.
— In terms of pairs of one-vocabulary structures, the transformation amounts to passing from Fig. 4(a) to Fig. 4(b): τ(⟨S, S′⟩) = ⟨S, τ(S′)⟩ = ⟨S, S″⟩.
— In terms of two-vocabulary structures, the transformation amounts to passing from Fig. 5(a) to Fig. 5(b): τ(S^⟨·,′⟩) = S^⟨·,″⟩, where the superscript indicates what vocabularies are included in the structure (“·” stands for “unprimed”).
Two-Vocabulary Structures as Procedure Summaries. Both (i) a pair of one-vocabulary structures, and (ii) a two-vocabulary structure provide a way to represent the net transformation performed by an operation (or a sequence of operations). However, as illustrated above, in the absence of indelible names for nodes, a two-vocabulary structure can represent information more precisely than a pair of one-vocabulary structures, and thus a two-vocabulary structure can provide a more precise procedure summary than a pair of one-vocabulary structures.
In the remainder of this section, we discuss the code fragment shown in Fig. 2(b). Structure S2^⟨·,′⟩ in Fig. 6(a) summarizes the transformation performed by “[2] b = rev(a);”, and structure S3^⟨′,″⟩ in Fig. 6(b) summarizes the transformation performed by “[3] c = rev(b);”.
Transformer Composition. The result of composing the transformations represented by two two-vocabulary structures can be expressed as another two-vocabulary structure. For instance, consider the two-vocabulary structure S2;3^⟨·,″⟩ shown in Fig. 7, which represents the result of composing Fig. 6(b) with Fig. 6(a) to obtain a two-vocabulary structure for the sequence “[2] b = rev(a); [3] c = rev(b);”.

The composition of the transformations represented by two two-vocabulary structures can be expressed in terms of a meet operation on three-vocabulary structures. To explain this, we introduce the graphical notation of dotted edges to represent unknown information (i.e., with truth value 1/2). For instance, Fig. 8(a) and Fig. 8(b) show two three-vocabulary structures S2^⟨·,′,1/2″⟩ and S3^⟨1/2,′,″⟩, respectively, where the symbol 1/2 in the superscript of a structure name indicates that the structure has only unknown information for a given vocabulary. Note that S2^⟨·,′,1/2″⟩ and S3^⟨1/2,′,″⟩ are three-vocabulary structures that correspond to the two-vocabulary structures S2^⟨·,′⟩ and S3^⟨′,″⟩ from Fig. 6, respectively.

[Figure omitted: two two-vocabulary list diagrams, S2^⟨·,′⟩ and S3^⟨′,″⟩.]
Fig. 6. The two-vocabulary structures that summarize (a) the transformation performed by “[2] b = rev(a);”, and (b) the transformation performed by “[3] c = rev(b);”.

[Figure omitted: a two-vocabulary list diagram.]
Fig. 7. The two-vocabulary structure S2;3^⟨·,″⟩ represents the net transformation performed by the sequence “[2] b = rev(a); [3] c = rev(b)”. Note that for each (unprimed) n-edge there is a corresponding (double-primed) n″-edge, and vice versa.

We introduce the meet operation (⊓), where “unknown” ⊓ “definite information” yields “definite information”.³ With this notation, the composition S3^⟨′,″⟩ ∘ S2^⟨·,′⟩ of the transformations represented by two two-vocabulary structures S2^⟨·,′⟩ and S3^⟨′,″⟩ can be expressed in terms of three-vocabulary structures as

    S3^⟨′,″⟩ ∘ S2^⟨·,′⟩ = project_{1,3}(S3^⟨1/2,′,″⟩ ⊓ S2^⟨·,′,1/2″⟩) = S2;3^⟨·,″⟩.

The three-vocabulary structure S2;3^⟨·,′,″⟩ obtained from S3^⟨1/2,′,″⟩ ⊓ S2^⟨·,′,1/2″⟩ is shown in Fig. 9. Finally, by projecting away the “middle” (single-primed) vocabulary from S2;3^⟨·,′,″⟩, we obtain the two-vocabulary composition result S2;3^⟨·,″⟩ shown in Fig. 7.
How These Ideas are Used in Relational Shape Analysis. In §5, we introduce a
way to use 3-valued structures as abstractions of sets of 2-valued structures. In
§6, this is extended to using two-vocabulary 3-valued structures as abstractions
of transformations on 2-valued structures. This provides what is needed for a
compositional approach to shape analysis:
— the 3-valued analog of the two-vocabulary version of transformer application
can be used for intraprocedural propagation;
³ “Definite information” means “definitely present” (true, denoted by 1) or “definitely absent” (false, denoted by 0). Thus, 1/2 ⊓ 1 = 1 = 1 ⊓ 1/2 and 1/2 ⊓ 0 = 0 = 0 ⊓ 1/2.
[Figure omitted: two three-vocabulary list diagrams, S2^⟨·,′,1/2″⟩ and S3^⟨1/2,′,″⟩, with dotted (1/2-valued) edges.]
Fig. 8. Three-vocabulary structures for the two-vocabulary structures from Fig. 6. Dotted edges indicate predicate tuples that have the value 1/2 (and hence correspond to information that is unknown). In (a), the unprimed and single-primed vocabularies capture the transformation performed by [2] b = rev(a);, and the information in the double-primed vocabulary (predicates a″, b″, c″, and n″) is unknown. In (b), the single-primed and double-primed vocabularies capture the transformation performed by [3] c = rev(b);, and the information in the unprimed vocabulary (predicates a, b, c, and n) is unknown.
[Figure omitted: a three-vocabulary list diagram.]
Fig. 9. The three-vocabulary structure S2;3^⟨·,′,″⟩ obtained from the meet (⊓) of the two structures from Fig. 8: S3^⟨1/2,′,″⟩ ⊓ S2^⟨·,′,1/2″⟩. Note that for each (unprimed) n-edge there is a corresponding (double-primed) n″-edge, and vice versa.
— the 3-valued analog of transformer composition can be used for interprocedural propagation.

Two-vocabulary 3-valued structures are used as summary transformers for the shape transformations performed by the possible sequences of operations in each procedure, and an over-approximation of composition can be performed using the meet operation on three-vocabulary 3-valued structures. In particular, it is possible to perform an over-approximation of the composition of the transformations represented by two two-vocabulary 3-valued structures, (S#)^⟨·,′⟩ and (T#)^⟨′,″⟩, by (i) promoting them to three-vocabulary 3-valued structures (S#)^⟨·,′,1/2″⟩ and (T#)^⟨1/2,′,″⟩, (ii) taking their meet, and (iii) projecting away the middle vocabulary. (See §6.5.)
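The promote/meet/project recipe can be sketched on a toy encoding. The C fragment below is ours, not the paper's algorithm: the paper's meet operates on whole 3-valued structures, whereas this sketch applies the truth-value meet of footnote 3 to a single predicate tuple, with the three vocabulary slots held in a three-element array. A conflicting meet (0 ⊓ 1) is signalled by -1, standing in for an empty (infeasible) composition:

```c
#include <assert.h>

/* 3-valued truth values: 0 and 1 are definite, HALF is unknown (1/2). */
typedef enum { V0 = 0, V1 = 1, HALF = 2 } Val3;

/* Meet in the information order 0, 1 ⊏ 1/2: unknown ⊓ definite = definite.
 * 0 ⊓ 1 has no common refinement; we signal that with -1. */
static int meet3(Val3 a, Val3 b) {
    if (a == HALF) return b;
    if (b == HALF) return a;
    return (a == b) ? (int)a : -1;
}

/* Compose the transformations for one predicate tuple.
 * s = <value before, value after> for the first transformation (vocab <·,'>),
 * t = the same for the second (vocab <',''>).  We (i) promote each to three
 * vocabularies by padding the missing slot with 1/2, (ii) meet pointwise,
 * and (iii) project away the middle vocabulary.  Returns -1 on conflict. */
static int compose3(const Val3 s[2], const Val3 t[2], Val3 result[2]) {
    Val3 a[3] = { s[0], s[1], HALF };   /* <·,'>   promoted to <·,',1/2''>  */
    Val3 b[3] = { HALF, t[0], t[1] };   /* <','' > promoted to <1/2,','' >  */
    int mid = meet3(a[1], b[1]);        /* the middle vocabularies must agree */
    if (mid < 0) return -1;
    result[0] = (Val3)meet3(a[0], b[0]);  /* = s[0], since b[0] is 1/2 */
    result[1] = (Val3)meet3(a[2], b[2]);  /* = t[1], since a[2] is 1/2 */
    return 0;
}
```

For instance, if a predicate goes from 1 to 0 across the first transformation and from 0 to 1 across the second, the composed transformation takes it from 1 to 1, and the intermediate value is checked for consistency by the meet.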
3. PRELIMINARIES
3.1 2-Valued First-Order Logic
We briefly discuss definitions related to first-order logic. We assume a vocabulary P
of predicate symbols and a set of variables, usually denoted by v, v1 , . . .. Formulas
    ⟦1⟧_S(Z)               = 1
    ⟦p(v1, ..., vk)⟧_S(Z)  = ι(p)(Z(v1), ..., Z(vk))
    ⟦¬ϕ1⟧_S(Z)             = 1 − ⟦ϕ1⟧_S(Z)
    ⟦ϕ1 ∨ ϕ2⟧_S(Z)         = max(⟦ϕ1⟧_S(Z), ⟦ϕ2⟧_S(Z))
    ⟦∃v1 : ϕ1⟧_S(Z)        = max_{u∈U} ⟦ϕ1⟧_S(Z[v1 ↦ u])

Table I. Meaning of first-order formulas, given a logical structure S = (U, ι) and an assignment Z.
are defined by the syntax:

    ϕ ::= 1                  logical literal
        | p(v1, ..., vk)     where p is a predicate symbol of arity k
        | ¬ϕ | ϕ ∨ ϕ         logical connectives
        | ∃v : ϕ             existential quantification          (1)
For reasons that will be made explicit in the next paragraph, we do not include the formula v1 = v2 in the grammar itself. Instead, we assume that the vocabulary P contains a special predicate symbol eq of arity 2 that will have a special interpretation. We will write v1 = v2 and v1 ≠ v2 for eq(v1, v2) and ¬eq(v1, v2). The literal 0, the connectives ⇒ and ∧, and the quantifier ∀v are defined in the usual way, in terms of items in grammar (1). A conditional expression ϕ1 ? ϕ2 : ϕ3 is an abbreviation for (ϕ1 ∧ ϕ2) ∨ (¬ϕ1 ∧ ϕ3). The notion of free variables is defined in the usual way.
The set {0, 1} of (2-valued) truth values is denoted by B. A 2-valued logical structure S = (U, ι) is a pair, where the universe U is a set of individuals and the valuation ι : P → ⋃_{k≥0} (U^k → B) maps each predicate symbol of arity k to a predicate (or truth-valued function). The set of 2-valued structures over a vocabulary P is denoted by 2-STRUCT[P]. We assume that for any (U, ι) ∈ 2-STRUCT[P], ι(eq) is defined by ι(eq)(u1, u2) = (u1 = u2).

An assignment Z : {v1, ..., vk} → U maps free variables (implicitly with respect to a formula) to individuals. Given a 2-valued logical structure S = (U, ι) and an assignment Z of free variables, the (2-valued) meaning of a formula ϕ, denoted by ⟦ϕ⟧_S(Z), is defined in Tab. I by induction on the syntax of ϕ. A logical structure satisfies a closed formula ϕ (i.e., without free variables), denoted by S ⊨ ϕ, iff ⟦ϕ⟧_S = 1. For open formulas, satisfaction with respect to an assignment Z is defined by S, Z ⊨ ϕ iff ⟦ϕ⟧_S(Z) = 1.
3.2 3-Valued First-Order Logic
We now extend the definitions from §3.1 to 3-valued logic, in which a third truth value, denoted by 1/2, represents uncertainty. The set B ∪ {1/2} of 3-valued truth values is denoted by T, and is partially ordered by the order l ⊏ 1/2 for l ∈ B. A 3-valued logical structure S = (U, ι) is almost identical to a 2-valued structure, except for the fact that ι : P → ⋃_{k≥0} (U^k → T) maps each predicate symbol of arity k to a 3-valued truth-valued function. The syntax of formulas defined in Eqn. (1) is extended with the logical literal 1/2, which is given the meaning ⟦1/2⟧_S = 1/2. The meaning of other syntactic constructs is still defined by Tab. I. Note that the operations “−” and “max” can accept the value 1/2 as an operand.

A 3-valued logical structure potentially satisfies a closed (3-valued) formula ϕ, denoted by S ⊨ ϕ, iff ⟦ϕ⟧_S ∈ {1/2, 1}. For open formulas, we have S, Z ⊨ ϕ iff ⟦ϕ⟧_S(Z) ∈ {1/2, 1}.

We refer to [Sagiv et al. 2002] for the extension of first-order 2- and 3-valued logic with transitive closure, which we have omitted here for the sake of simplicity. The transitive closure of a formula with two free variables ϕ(v1, v2) is denoted by ϕ*(v1, v2).
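The semantics of Tab. I extends to 3-valued logic without changing any equation, because "1 − x" and "max" work on 1/2 as well. This can be sketched with an integer encoding of ours (0 ↦ 0, 1/2 ↦ 1, 1 ↦ 2, so values stay integral; negation becomes 2 − x and disjunction becomes max). The structure layout, size bound, and the choice of formula ∃v2 : n(v1, v2) are assumptions for illustration:

```c
#include <assert.h>

/* Doubled encoding of T = {0, 1/2, 1}:  0 ↦ 0,  1/2 ↦ 1,  1 ↦ 2. */
#define MAXU 8
typedef struct {
    int size;            /* |U|, at most MAXU */
    int n[MAXU][MAXU];   /* valuation of one binary predicate n, in {0,1,2} */
} Struct3;

static int neg3(int x)       { return 2 - x; }      /* ⟦¬ϕ⟧ = 1 − ⟦ϕ⟧  */
static int or3(int x, int y) { return x > y ? x : y; } /* ⟦ϕ1 ∨ ϕ2⟧ = max */

/* ⟦∃v2 : n(v1, v2)⟧_S(Z) with Z(v1) = u: "u has an n-successor".
 * Per Tab. I, the existential is the max over all bindings of v2. */
static int has_succ(const Struct3 *s, int u) {
    int r = 0;
    for (int v = 0; v < s->size; v++)
        r = or3(r, s->n[u][v]);
    return r;
}
```

In a structure where n(u0, u1) = 1 and n(u1, u2) = 1/2, the formula evaluates to 1 at u0, to 1/2 at u1 (the successor may or may not exist), and to 0 at u2.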
Embedding of 3-Valued Logical Structures. To abstract memory configurations
represented by logical structures, we use the following notion of embedding:
Definition 3.1. Given S = (U, ι) and S′ = (U′, ι′), two 3-valued structures
over the same vocabulary P, and f : U → U′, a surjective function, f embeds S in
S′, denoted by S ⊑f S′, if for all p ∈ P and u1, . . . , uk ∈ U,

  ι(p)(u1, . . . , uk) ⊑ ι′(p)(f(u1), . . . , f(uk))

If, in addition,

  ι′(p)(u′1, . . . , u′k) = ⊔ { ι(p)(u1, . . . , uk) | u1 ∈ f⁻¹(u′1), . . . , uk ∈ f⁻¹(u′k) }

then S′ is the tight embedding of S with respect to f, denoted by S′ = f(S).
Intuitively, f(S) is obtained by merging individuals of S and by defining accordingly
the valuation of predicates (in the most precise way). Observe that ⊑id, which will
be denoted simply by ⊑, is the natural information order between structures that
share the same universe. Note that one has S ⊑f S′ ⇔ f(S) ⊑id S′.
We can now explain the usefulness of the eq predicate. Let S = (U, ι) ∈
2−STRUCT and S′ = (U′, ι′) = f(S). We have

  ι′(eq)(u′1, u′2) =  1    if ∀u1 ∈ f⁻¹(u′1), ∀u2 ∈ f⁻¹(u′2) : ι(eq)(u1, u2) = 1
                      0    if ∀u1 ∈ f⁻¹(u′1), ∀u2 ∈ f⁻¹(u′2) : ι(eq)(u1, u2) = 0
                      1/2  otherwise

which can be simplified to

  ι′(eq)(u′1, u′2) =  1    if u′1 = u′2 ∧ |f⁻¹(u′1)| = 1
                      0    if u′1 ≠ u′2
                      1/2  if u′1 = u′2 ∧ |f⁻¹(u′1)| > 1

Note that u′1 = u′2 in the simplified definition is not a shorthand for eq(u′1, u′2);
it evaluates to true whenever u′1 and u′2 are the same individual of U′. Similarly,
u′1 ≠ u′2 evaluates to true when u′1 and u′2 are distinct individuals of U′. Hence, for
any S″ = (U″, ι″) ⊒f S, if ι″(eq)(u″, u″) = 1 for some u″ ∈ U″, then |f⁻¹(u″)| = 1;
otherwise |f⁻¹(u″)| ≥ 1. Consequently, the value of the formula eq(v, v) evaluated
in a 3-valued structure S″ indicates whether an individual of S″ represents exactly
one individual in each of the structures S that can be embedded into S″, or at least
one individual.
The following preservation theorem makes it possible to interpret logical formulas
in embedded structures in a way that is conservative with respect to the original
structure.
Fig. 10. Interprocedural CFG of the list-reversal program. (The figure shows the CFGs of main,
with start/exit nodes smain/emain and a call res = rev(l), and of rev, with start/exit nodes
srev/erev and a recursive call y = rev(z), linked by call-to-start and exit-to-return-site edges.)
Theorem 3.2 (Embedding theorem [Sagiv et al. 2002]). Let S = (U, ι)
and S′ = (U′, ι′) be two 3-valued structures, such that there exists an embedding
function f with S ⊑f S′. Then, for any formula ϕ(v1, . . . , vk) and assignment
Z : {v1, . . . , vk} → U of the free variables of ϕ, we have

  JϕKS3(Z) ⊑ JϕKS′3(Z′),

where Z′ : {v1, . . . , vk} → U′ is the abstract assignment defined by Z′(vi) = f(Z(vi)).
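As a small illustration of the theorem, the 3-valued value of the closed formula ∃v1, v2 : n(v1, v2) in an abstracted structure over-approximates, in the information order, its 2-valued value in the concrete structure. The encoding below is a toy, not the paper's implementation:

```python
# Checking the conservativeness guarantee of Thm. 3.2 on one formula.
from fractions import Fraction

HALF = Fraction(1, 2)

def exists_edge(universe, n):
    """3-valued meaning of ∃v1, v2 : n(v1, v2), via max (Tab. I)."""
    return max(n[(a, b)] for a in universe for b in universe)

def info_leq(l1, l2):
    return l1 == l2 or l2 == HALF

# Concrete: two nodes, one n edge.
U = ["u1", "u2"]
n = {(a, b): 0 for a in U for b in U}
n[("u1", "u2")] = 1

# Abstract: both nodes merged into a summary node s; the n values {0, 1}
# join to 1/2 under tight embedding.
absU = ["s"]
abs_n = {("s", "s"): HALF}

v_conc = exists_edge(U, n)
v_abs = exists_edge(absU, abs_n)
```

Here the definite concrete answer 1 weakens to 1/2 after abstraction, but never flips to a wrong definite answer.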
4. PROGRAMS AND MEMORY CONFIGURATIONS
We consider programs written in an imperative programming language in which
(1) it is forbidden to take the address of a local variable, a global variable, a parameter, or a function;
(2) parameters are passed by value;
(3) pointer arithmetic is forbidden.
These restrictions prevent direct aliasing among variables; thus, only nodes in
heap-allocated structures can be aliased. The third restriction makes memory
configurations invariant under permutations of addresses. Note that both Java
and ML follow these conventions.
4.1 Program Syntax
A program is defined by a set of procedures Pi , 0 ≤ i ≤ K. Each procedure has
local variables, formal input parameters, and output parameters. To simplify our
notation, we will assume that each procedure has only one input parameter and
one output parameter; the generalization to multiple parameters is straightforward.
We also assume that an input parameter is not modified during the execution of
the procedure. This assumption is made solely for convenience, and involves no loss
of generality because it is always possible to copy input parameters to additional
local variables.
Table II. Two related models of a program state, where D may be B or Int.

                        Set-theoretic view               Logical view
Set of cells            Cell                             U (universe)
Pointer variable z      z ∈ Cell ∪ {NULL}                z : U → B         (unary relation)
Pointer field n         n ∈ Cell → Cell ∪ {NULL}         n : U × U → B     (binary relation)
Data variable x         x ∈ D                            x : D             (nullary function)
Data field d            d ∈ Cell → D                     d : U → D         (unary function)
Fig. 11. A possible store, consisting of a four-node linked list (cells holding 5, 2, 9, and 3)
pointed to by x and y.
Thus, a procedure Pi = hfpii , fpoi , Li , Gi i is defined by its input parameter fpii ,
its output parameter fpoi , its set of local variables Li (containing fpii and fpoi ),
and Gi , its intraprocedural control flow graph (CFG).
A program is represented by a directed graph G∗ = (N ∗ , E ∗ ), called an interprocedural CFG. G∗ consists of a collection of intraprocedural CFGs G1 , G2 , . . . , GK ,
one of which, Gmain , represents the program’s main procedure. Each CFG Gi contains exactly one start node si and exactly one exit node ei . The nodes of a CFG
represent control points and its edges represent individual statements and branches
of a procedure in the usual way. A procedure call statement relates a call node
and a return-site node. For n ∈ N ∗ , proc(n) denotes the (index of the) procedure
that contains n. In addition to the ordinary intraprocedural edges that connect
the nodes of the individual flowgraphs in G∗ , each procedure call, represented by
call-node c and return-site node r, has two edges: (1) a call-to-start edge from c to
the start node of the called procedure; (2) an exit-to-return-site edge from the exit
node of the called procedure to r. The functions call and ret record matching call
and return-site nodes: call(r) = c and ret(c) = r. We assume that a start node has
no incoming edges except call-to-start edges.
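The graph structure just described can be sketched as follows; the class and node names are illustrative, not taken from the paper:

```python
# Sketch of an interprocedural CFG with call-to-start and
# exit-to-return-site edges and the call/ret maps.
from dataclasses import dataclass, field

@dataclass
class Proc:
    name: str
    start: str
    exit: str
    edges: list = field(default_factory=list)  # intraprocedural (src, label, dst)

@dataclass
class ICFG:
    procs: dict = field(default_factory=dict)
    call: dict = field(default_factory=dict)   # return-site node -> call node
    ret: dict = field(default_factory=dict)    # call node -> return-site node
    inter_edges: list = field(default_factory=list)

    def add_call(self, call_node, return_site, callee):
        p = self.procs[callee]
        self.inter_edges.append((call_node, p.start))   # call-to-start edge
        self.inter_edges.append((p.exit, return_site))  # exit-to-return-site edge
        self.call[return_site] = call_node
        self.ret[call_node] = return_site

g = ICFG()
g.procs["main"] = Proc("main", "s_main", "e_main")
g.procs["rev"] = Proc("rev", "s_rev", "e_rev")
g.add_call("c1", "r1", "rev")  # a call to rev at call node c1, return site r1
```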
4.2 Representing Memory Configurations
Consider a program that consists of several procedures, and, for the moment, ignore
the stack of activation records in each state. At a given control point, a program
state s ∈ State is defined by the values of the local variables and the heap. We
describe two ways in which such a state s can be modeled (see Tab. II):
— The set-theoretic model is perhaps more intuitive. We consider a fixed set Cell
of memory cells. The value of a pointer variable z is modeled by an element z ∈
Cell∪{NULL}, where NULL denotes the null value. If cells have a pointer-valued
field n, the values of n fields are modeled by a function n : Cell → Cell ∪ {NULL}
that associates with each memory cell the cell pointed to by the field. If cells
have an Int-valued (or, more generally, a data-valued) field x, the values of
x fields are modeled by a function d : Cell → Int that associates with each
memory cell the value of the corresponding field.
— Sagiv et al. [2002] model states using the tools of logic (cf. §3.1). Each state is
modeled as a 2-valued logical structure: the set of memory cells is replaced by
a universe U of individuals; the value of a program variable z is defined by a
unary predicate on U ; and the value of a field n is defined by a binary predicate
on U . Integrity constraints are used to capture the fact that, for instance, a
unary predicate z that represents what program variable z points to can have
the value “true” for at most one memory cell [Sagiv et al. 2002].
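The two views of Tab. II can be related directly. The sketch below builds both models for the shape part of the store of Fig. 11 (data values dropped, as in Eqn. (3)); the cell names are arbitrary labels (Remark 4.1), and the encoding is ours:

```python
# Set-theoretic view: cells, pointer variables, and the n field map.
cells = ["c1", "c2", "c3", "c4"]
ptr = {"x": "c1", "y": "c1"}                          # z ∈ Cell ∪ {NULL}
n = {"c1": "c2", "c2": "c3", "c3": "c4", "c4": None}  # None plays NULL

# Logical view: a universe plus unary and binary predicates (2-valued).
U = cells
iota = {
    "x": {(u,): int(ptr["x"] == u) for u in U},
    "y": {(u,): int(ptr["y"] == u) for u in U},
    "n": {(a, b): int(n[a] == b) for a in U for b in U},
}
```

The integrity constraint mentioned above corresponds to the fact that each unary variable predicate here holds for at most one individual.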
We use the term “predicate of arity n” for a Boolean function U^n → B. We use
Pn to denote the set of predicate symbols of arity n, and N to denote the set
of integer-valued function symbols. With this notation, the concrete state-space
considered is:⁴

  State = (U → B)^|P1| × (U² → B)^|P2| × (U → Int)^|N|        (2)

where |E| denotes the size of a finite set E. A concrete property in ℘(State) is thus
a set of tuples, each field of which is a function.
From now on, for the sake of simplicity, we will first perform the trivial abstraction
of the concrete state space defined by Eqn. (2) to the state-space

  State = (U → B)^|P1| × (U² → B)^|P2|        (3)

In this case, a state S ∈ State can be represented by a 2-valued logical structure
(U, ι) (§3.1), where the valuation function ι : P → ⋃k (U^k → B) associates each
predicate symbol of arity k with a k-ary relation over U. We thus have State ≃
2−STRUCT[P].
In the sequel, we also assume that the universe U is infinite. Because all countably
infinite sets are isomorphic, we can omit the universe in declarations of 2-valued
structures S = (U, ι) ∈ 2−STRUCT[P], so that S will denote both the 2-valued
structure and its valuation function ι.
Remark 4.1. Because we want shape properties to be invariant under permutations
of memory cells, we implicitly quotient State by the equivalence relation
S ≈ S′ if there is a permutation f : U → U such that

  ∀p ∈ P : S′(p)(u1, . . . , uk) = S(p)(f(u1), . . . , f(uk))   □
The predicates that are part of the underlying semantics of the language to be
analyzed are called core predicates. They will be distinguished from additional predicates that will be introduced later when abstracting concrete heaps. The set of
core predicates that are used is dictated by the semantics of the programming language to be analyzed. (The programming language can have a degree of abstraction
already built into it by the analysis designer, as illustrated by Remarks 4.2 and 4.3
below.) For the programs that we consider, and the part of the state-space that we
chose to analyze (Eqn. (3)), we need to introduce a core predicate for each program
variable and data-structure field, following Tab. II. The set of core predicates is
thus uniquely defined for a given program.
⁴ Eqn. (2) is the concrete state-space that one has when the techniques of [Sagiv et al. 2002]
are combined with those of [Gopan et al. 2004]. To simplify Eqn. (2), we have omitted nullary
predicates, which would be used to model Boolean-valued variables, and nullary functions, which
would be used to model data-valued variables.
Remark 4.2. (Modeling dynamic memory allocation) The free memory
pool required for dynamic memory allocation and deallocation is modeled using a
core predicate free(v), which has the value true for the unbounded number of nodes
modeling free memory cells.
□
Remark 4.3. (Modeling ordering among cells’ data values) In some
experiments of §8.2, we model lists and trees that are ordered with respect to integer
keys. However, according to Eqn. (3), we abstract integer values and we cannot
compare such keys directly. Instead, we introduce a special core predicate leq(v1 , v2 ),
which (i) is a total order, and (ii) has the value true on (v1 , v2 ) whenever the key of
cell v1 is less than or equal to the key of cell v2 . This core predicate can be seen as
an abstraction of the predicate cell1->key <= cell2->key when the state-space
of Eqn. (2) is abstracted into the state-space of Eqn. (3).
□
4.3 Semantics of Intraprocedural Operations
The usefulness of adopting the logical view for modeling memory becomes apparent
when defining the semantics of instructions. This is because one can use the language of first-order logic for specifying how predicates—and hence logical structures
and memory configurations—are transformed by the program’s operations.
In this section, we only discuss intraprocedural operations; the problem of defining the semantics of interprocedural operations is left to §7.1.
Generally speaking, the concrete operational semantics of a programming language
is defined by specifying a state transformer for each kind of operation associated
with intraprocedural edges of the control-flow graph. Among these operations, we
distinguish statements, which modify the program state, from conditions, which
select the program states that satisfy them. The semantics of a statement stm
is a transformer with signature JstmK : State → State; the semantics of a condition
cond is a predicate JcondK : State → B, which can be lifted to a transformer with
signature JcondK : ℘(State) → ℘(State) that filters out the states not satisfying the
condition.
4.3.1 Statements. The transformer of a statement stm acts on states modeled
as logical structures. It is defined using a collection of predicate-update formulas,
c(v1 , . . . , vk ) = ϕcstm (v1 , . . . , vk ), one for each core predicate c (see [Sagiv et al.
2002]). These formulas define how the core predicates of a logical structure S are
transformed by the statement stm to create a logical structure S 0 ; they define the
value of predicate c in S 0 as a function of c’s value in S. Formally,
JstmK : State −→ State
S 7−→ S 0
where
∀c ∈ P : S 0 (c)(u1 , . . . , uk ) = Jϕcstm (v1 , . . . , vk )KS ([v1 7→ u1 , . . . , vk 7→ uk ])
(4)
For instance, the semantics of the assignment statement z->n = NULL; is specified
by the predicate-update formulas
ϕnstm (v1 , v2 ) = n(v1 , v2 ) ∧ ¬z(v1 ),
ϕcstm (v1 , . . . , vk ) = c(v1 , . . . , vk ) for c 6= n
The predicate-update formula ϕnstm should be read as follows: “If the cell v1 is not
pointed to by the variable z, leave the n field of the cell v1 unchanged, otherwise
assign it the value NULL (represented by n(v1, v2) = false for every cell v2).” We
assume that the statements of the analyzed program are decomposed into the
elementary statements listed in Tab. III (which is always possible for the class of
languages considered in this paper). The elementary statements modify the value
of at most one core predicate. We omit writing explicit predicate-update formulas
for predicates that are unchanged by a statement. (The omitted formulas merely
express the identity transformation.)

Table III. Predicate-update formulas for statements.

Statement                                    Predicate-update formula
z = NULL                                     ιz(v) = 0
z = y                                        ιz(v) = y(v)
z = y->sel                                   ιz(v) = ∃v1 : y(v1) ∧ sel(v1, v)
z->sel = NULL                                ιsel(v1, v2) = sel(v1, v2) ∧ ¬z(v1)
z->sel = y (assuming that z->sel = NULL)     ιsel(v1, v2) = sel(v1, v2) ∨ (z(v1) ∧ y(v2))

Table IV. Precondition formulas for conditions.

Condition                                    Precondition formula
z == NULL                                    ∀v : ¬z(v)
z != NULL                                    ∃v : z(v)
z1 == z2                                     ∀v : z1(v) ⇔ z2(v)
z1 != z2                                     ∃v : ¬(z1(v) ⇔ z2(v))
z->sel == NULL (assuming that z != NULL)     ∀v1, v2 : z(v1) ⇒ ¬sel(v1, v2)
z->sel != NULL                               ∃v1, v2 : z(v1) ∧ sel(v1, v2)
z1->sel == z2 (assuming that z1 != NULL)     ∀v1, v2 : z1(v1) ⇒ (sel(v1, v2) ⇔ z2(v2))
z1->sel != z2                                ∃v1, v2 : z1(v1) ∧ ¬(sel(v1, v2) ⇔ z2(v2))
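As an illustration of Eqn. (4) and Tab. III, the following sketch applies the predicate-update formula for z->n = NULL, ϕn(v1, v2) = n(v1, v2) ∧ ¬z(v1), to a two-cell structure; the dictionary encoding is ours:

```python
# Applying a predicate-update formula pointwise, as in Eqn. (4).

def apply_z_n_null(U, iota):
    """Return the post-state S' of the statement z->n = NULL."""
    post = dict(iota)  # every other core predicate is unchanged
    post["n"] = {(v1, v2): int(iota["n"][(v1, v2)] and not iota["z"][(v1,)])
                 for v1 in U for v2 in U}
    return post

# Two cells: z points to c1, and c1's n field points to c2.
U = ["c1", "c2"]
iota = {
    "z": {("c1",): 1, ("c2",): 0},
    "n": {("c1", "c1"): 0, ("c1", "c2"): 1, ("c2", "c1"): 0, ("c2", "c2"): 0},
}
post = apply_z_n_null(U, iota)
```

Only the n edge leaving z's cell is cleared; all other tuples keep their pre-state values.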
4.3.2 Conditions. The semantics of a condition cond is defined by a precondition
formula ϕcond, which is a nullary formula that filters out structures that should not
follow the transition along edges e labeled by the condition. Formally,

  Jϕcond K : ℘(State) → ℘(State)
             X ↦ X′ ⊆ X,   where X′ = {S ∈ X | S |= ϕcond}        (5)
For instance, the semantics of the condition z->n != NULL is given by the precondition formula
∃v1 , v2 : z(v1 ) ∧ n(v1 , v2 ),
which evaluates to false on logical structures for which the n field of the cell pointed
to by z (if any) is equal to NULL. Tab. IV gives the complete semantics of conditions.
Program assumptions, such as z != NULL at the point of a dereference of z, are
checked by the analysis using the “halt” instruction of the TVLA system [Lev-Ami
and Sagiv 2000], which generates an alert when a program assumption is not
satisfied.
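The filtering semantics of Eqn. (5) can be sketched as follows for the condition z != NULL, whose precondition formula is ∃v : z(v) (Tab. IV); the state encoding is illustrative:

```python
# Conditions select the structures that satisfy the precondition formula.

def sat_z_not_null(U, iota):
    """∃v : z(v), evaluated in a 2-valued structure."""
    return any(iota["z"][(v,)] for v in U)

def filter_states(states, pred):
    return [s for s in states if pred(*s)]

U = ["c1", "c2"]
s1 = (U, {"z": {("c1",): 1, ("c2",): 0}})  # z points to c1
s2 = (U, {"z": {("c1",): 0, ("c2",): 0}})  # z is NULL
kept = filter_states([s1, s2], sat_z_not_null)
```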
4.3.3 Memory Allocation and Deallocation. Remark 4.2 introduced the predicate
free(v) for modeling the free memory pool. The semantics of a memory
deallocation instruction dealloc(z) is defined using the predicate-update formulas
τz(v) = 0 and τfree(v) = free(v) ∨ z(v). Intuitively, the semantics of a memory
allocation instruction z = alloc() is to randomly pick a node v0 with free(v0) = 1, and
then to update free(v) and z(v) using the predicate-update formulas τfree(v) = free(v) ∧
¬eq(v, v0) and τz(v) = eq(v, v0).⁵
5. ABSTRACTING MEMORY CONFIGURATIONS
In this section, we discuss the abstraction method developed by Sagiv et al. [2002],
which maps 2-valued logical structures (of arbitrary size) to 3-valued logical structures of bounded size.
The problem with representing and manipulating 2-valued structures is the unbounded universe U . Consequently, the starting point for abstracting a 2-valued
structure is the abstraction of the universe U to an abstract universe U ] of bounded
size. Intuitively, the abstraction consists of (i) merging concrete individuals into a
bounded number of abstract individuals U ] , and (ii) replacing the concrete predicates by abstract versions in which the values of the tuples reflect how concrete
individuals have been merged to create the abstract individuals.
5.1 The Abstraction Principle
Given a finite set U] with a surjective function f : U → U], one can define the
following Galois connection, using the tight embedding on logical structures induced
by f and the partial order defined on 3-valued structures (see Defn. 3.1):

  ℘(2−STRUCT) ⇄ 3−STRUCT   via (αf, γf), with
  αf(X) = ⊔ { f(S) | S ∈ X }
  γf(S]) = { S | S ⊑f S] }

In this abstraction, sets of valuations for predicate symbols ι : P → ⋃k (U^k → B)
are abstracted with a single abstract valuation ι] : P → ⋃k ((U])^k → T).
5.2 The Abstract Domain of 3-Valued Structures
The abstraction principle depicted above is parameterized by a finite abstraction of
the universe U of 2-valued structures. The idea behind canonical abstraction [Sagiv
et al. 2002] is to choose a subset A ⊆ P1 of abstraction predicates and to define
an equivalence relation ≃_A^S on U that is parameterized by the logical structure
S ∈ 2−STRUCT to be abstracted:

  u1 ≃_A^S u2 ⇔ ∀p ∈ A : S(p)(u1) = S(p)(u2)
⁵ Unfortunately, “picking a node randomly” cannot be easily expressed in 2-valued logic, so we
define it directly in 3-valued logic using the special operator Focus that will be introduced in
§5. (To conserve space, we do not give the precise definition here.) An alternative would have
been to employ a concrete model of the free memory pool, e.g., using a singly-linked list, but this
would have increased the complexity of the summaries of procedures that perform allocation and
deallocation.
Fig. 12. Graphical representation of logical structures that represent memory configurations:
(a) a 2-valued structure S that represents a singly-linked list; (b) the canonical abstraction
of S with A = {x}. Unary predicates associated with pointer variables (e.g., x) are depicted
with arrows. The other unary predicates (e.g., r[n, x]) are depicted inside nodes for which
they evaluate to true. (The meaning of r[n, x] will be explained in §5.2.1; see also Tab. V.)
Binary predicates (e.g., n) are depicted using arrows linking the two arguments. Solid arrows
denote the value 1, dashed arrows denote the value 1/2. Summary nodes (for which eq = 1/2)
are depicted using double ovals.
This equivalence relation defines the surjective function f_A^S : U → U/≃_A^S that
maps an individual to its equivalence class. We thus have the Galois connection

  ℘(State) = ℘(2−STRUCT[P]) ⇄ ℘(3−STRUCT[P]) = A   via (α, γ), with
  α(X) = { f_A^S(S) | S ∈ X }
  γ(Y) = { S | ∃S] ∈ Y : S ⊑f S] }

where f_A^S also denotes the tight embedding function for logical structures induced
by f_A^S : U → U/≃_A^S.
The abstraction function α is referred to as canonical abstraction. It defines the
canonical 3-valued structures as those that are the image of canonical abstraction.
Fig. 12 illustrates the abstraction of a singly-linked list using the predicate x as
the unique abstraction predicate. The ordering in A extends the ordering between
3-valued structures as follows: Y1 ⊑ Y2 iff ∀S1] ∈ Y1 : ∃S2] ∈ Y2 : S1] ⊑ S2].
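The grouping performed by canonical abstraction can be sketched directly: individuals are named by their vector of abstraction-predicate values and merged accordingly, with predicate tables re-evaluated by joining over the groups, as in the tight embedding. For the list of Fig. 12, this yields a concrete node for x's target and a summary node for the tail. The encoding and names are ours, not TVLA's:

```python
# Canonical abstraction over unary abstraction predicates A.
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)

def join(vals):
    vs = set(vals)
    return vs.pop() if len(vs) == 1 else HALF

def canonical_abstraction(U, iota, A):
    # Canonical name of u = the tuple of its abstraction-predicate values.
    name = {u: tuple(iota[p][(u,)] for p in A) for u in U}
    absU = sorted(set(name.values()))
    absI = {}
    for p, table in iota.items():
        k = len(next(iter(table)))  # arity of p
        absI[p] = {
            t: join(table[c] for c in
                    product(*[[u for u in U if name[u] == a] for a in t]))
            for t in product(absU, repeat=k)
        }
    return absU, absI

# A 3-cell list pointed to by x, abstracted with A = {x} as in Fig. 12.
U = ["u1", "u2", "u3"]
iota = {
    "x":  {(u,): int(u == "u1") for u in U},
    "eq": {(a, b): int(a == b) for a in U for b in U},
    "n":  {(a, b): 0 for a in U for b in U},
}
iota["n"][("u1", "u2")] = iota["n"][("u2", "u3")] = 1
absU, absI = canonical_abstraction(U, iota, A=["x"])
```

The tail cells u2 and u3 share the canonical name (0,), so eq on that node becomes 1/2, marking it as a summary node.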
Thanks to the Embedding Theorem (Thm. 3.2), one can evaluate a logical formula in a 3-valued structure to obtain a conservative result with respect to the
structure’s concretization as a set of 2-valued structures. Consequently, we can
reuse the formulas that specify the concrete operational semantics of statements
and conditions (see §4): when evaluated in a 3-valued structure, these formulas
yield sound approximations — in the abstract lattice A — of the concrete transformers.
5.2.1 Instrumentation Predicates. As always with abstract interpretation,
there is a danger that as the analysis proceeds, the indefinite value 1/2 will
become pervasive. This can destroy the ability to recover interesting information
(although soundness is maintained). A key role for improving the precision of the
abstraction is played by instrumentation predicates, which record auxiliary information in a logical structure. An instrumentation predicate p of arity k is defined
by a logical formula ψp (v1 , . . . , vk ) over the core predicate symbols, and captures
a property that each k-tuple of nodes may or may not possess. Tab. V lists some
instrumentation predicates that are important for the analysis of programs that use
type List.
If the set of instrumentation predicates is denoted by I ⊆ P, the concretization
function becomes:

  γ(S]) = { S ∈ γA(S]) | ∀p ∈ I : Jp(v1, . . . , vk)KS2 = Jψp(v1, . . . , vk)KS2 }        (6)
p               Intended Meaning                           ψp
t[n](v1, v2)    Is v2 reachable from v1 along n fields?    n∗(v1, v2)
r[n, q](v)      Is v reachable from pointer variable q     ∃v1 : q(v1) ∧ t[n](v1, v)
                along n fields?
c[n](v)         Is v on a directed cycle of n fields?      ∃v1 : n(v, v1) ∧ t[n](v1, v)
is[n](v)        Is v pointed to by 2 or more n fields?     ∃v1, v2 : ¬eq(v1, v2) ∧ n(v1, v) ∧ n(v2, v)

Table V. Defining formulas of instrumentation predicates used to characterize singly-linked lists.
Typically, there is a separate predicate symbol r[n, q] for each pointer variable q.
The constraint in Eqn. (6) that the value of an instrumentation predicate p must
match its defining formula ψp filters out many concrete structures from consideration, thereby increasing the precision of the abstraction.
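On concrete (2-valued) structures, the defining formulas of Tab. V can be evaluated directly. The sketch below computes t[n] by a naive transitive-closure iteration and uses it for c[n] and is[n]; the encoding (n as a successor map) is illustrative:

```python
# Evaluating instrumentation-predicate defining formulas on concrete stores.

def reach(U, n):
    """t[n](v1, v2): v2 is reachable from v1 along n fields (n*(v1, v2))."""
    t = {(a, b): a == b or n.get(a) == b for a in U for b in U}
    changed = True
    while changed:  # naive transitive-closure iteration
        changed = False
        for a in U:
            for b in U:
                if not t[(a, b)] and any(t[(a, m)] and t[(m, b)] for m in U):
                    t[(a, b)] = changed = True
    return t

def is_shared(U, n, v):
    """ψ for is[n]: v is pointed to by 2 or more n fields."""
    return len([a for a in U if n.get(a) == v]) >= 2

def on_cycle(U, n, v):
    """ψ for c[n]: ∃v1 : n(v, v1) ∧ t[n](v1, v)."""
    t = reach(U, n)
    return n.get(v) is not None and t[(n[v], v)]

# An acyclic, unshared 3-cell list c1 -n-> c2 -n-> c3 -n-> NULL.
U = ["c1", "c2", "c3"]
n = {"c1": "c2", "c2": "c3", "c3": None}
```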
Moreover, the use of unary instrumentation predicates as abstraction predicates
provides a way to control which concrete individuals are merged together into summary nodes, and thereby to control the amount of information lost by abstraction.
For instance, in program-analysis applications, reachability properties from specific
pointer variables have the effect of keeping disjoint sublists or subtrees summarized
separately. This is particularly important when analyzing a program in which two
pointers are advanced along disjoint sublists.⁶
When applying the abstract transformer JstmK : 3-STRUCT → 3-STRUCT for
statement stm, one could first update the values of the core predicates, and then
reevaluate each instrumentation predicate’s defining formula in the resulting abstract store. However, this would not provide any additional information. To gain
maximum benefit from instrumentation predicates, their value should be computed
in some other way. This problem, the instrumentation-predicate-maintenance problem, is solved by updating the instrumentation predicates of the post-state as a
function of their values in the pre-state. [Reps et al. 2003] presents an algorithm
to generate an appropriate predicate-maintenance formula for each instrumentation predicate p, using the (core) predicate-update formulas ϕcstm that define the
semantics of stm, together with p’s defining formula ψp (v1 , . . . , vk ).
Given the importance of instrumentation predicates that express reachability
properties—such as t[n](v1 , v2 ) and r[n, q](v) shown in Tab. V—for maintaining
precision under canonical abstraction, there is one limitation of the method from
[Reps et al. 2003] that is worth mentioning: if b is a core binary predicate, and t[b] is
the corresponding reachability predicate, the method from [Reps et al. 2003] works
best when the modification to b by each concrete transformer is a unit-size change—
i.e., when the transformer changes the value of at most one b-tuple. This presents a
problem for creating summary transformers for procedures, because the net action
⁶ A method for automatically identifying appropriate instrumentation predicates, using a process
of abstraction refinement, is presented in [Loginov et al. 2005]. In that paper, the input required
to specify a program analysis consists of (i) a program, (ii) a characterization of the inputs, and
(iii) a query (i.e., a formula that characterizes the intended output). That work, along with [Reps
et al. 2003], provides a framework for automating most of the issues related to instrumentation
predicates that were explicit obligations of an analysis designer in the original formulation of the
3-valued-logic approach to shape analysis [Sagiv et al. 2002]. See also [Loginov 2006].
of a procedure will modify multiple b-tuples, in general. Fortunately, the approach
to applying procedure summaries developed in this paper uses a different approach
to maintaining the values of instrumentation predicates than the one presented in
[Reps et al. 2003] (see §6.5).
5.2.2 Other Operations on Logical Structures. Several additional operations on
logical structures help prevent an analysis from losing precision [Sagiv et al. 2002]:
— Focus is an operation that can be invoked to elaborate a 3-valued structure—
allowing it to be replaced by a set of more precise 3-valued structures (not
necessarily images of canonical abstraction) that represent the same set of concrete stores.
— Coerce is a clean-up operation that may “sharpen” a 3-valued structure by
setting an indefinite value (1/2) to a definite value (0 or 1), or discard a structure
entirely if the structure exhibits some fundamental inconsistency (e.g., it cannot
represent any possible concrete store).
Because the Embedding Theorem applies to any pair of structures for which one
can be embedded into the other, it is not necessary to perform canonical abstraction after the application of each abstract transformer. To ensure that abstract
interpretation terminates, it is only necessary that canonical abstraction be applied
as a widening operator somewhere in each loop, e.g., at the target of each backedge
in the CFG.
6. REPRESENTING AND ABSTRACTING RELATIONS BETWEEN MEMORY CONFIGURATIONS
6.1 Motivation
As discussed more thoroughly in §7 and §9, there are two main approaches to
interprocedural static analysis: the functional and operational approaches [Sharir
and Pnueli 1981]. In this paper, we follow the functional approach (also known
as the relational approach). A key aspect of the functional approach is that it
computes procedure summaries. It computes a predicate transformer for each node
of the program by finding the smallest fixpoint of a set of equations over predicate
transformers. During this process, the effect of a call to procedure P at a call
site c is handled by composing the predicate transformer for c with the predicate
transformer for P . (The predicate transformer for P is the predicate transformer
for the exit node of P .) When the fixpoint solution is obtained, the predicate
transformer for P is the procedure summary for P . In this paper such predicate
transformers will be viewed as relations.
The main point here is that the ability to represent and abstract relations between
memory configurations is fundamental for capturing the input/output behavior of
a procedure. This section shows how representations for relations between memory
configurations that are represented as logical structures can be created. This representation is the basis of the interprocedural shape analysis described in the next
section.
6.2 Principles of the Representation
We now return to the discussion from §2 about two ways to represent and abstract
relations between concrete program states, when a program state is a 2-valued
structure. The first approach described in §2 involved representing relations between concrete program states as sets of pairs of 2-valued structures.
This point of view leads to a simple abstraction, where abstract relations are (sets
of) pairs of 3-valued structures obtained by canonical abstraction; see Fig. 13(b).
However, this solution is unsatisfactory for the following reasons:
— There is a technical difficulty: as explained in Remark 4.1, logical structures are
implicitly defined up to a permutation of individuals. As explained in §2, this
leads to a loss of information compared with first pairing and then abstracting.7
With this representation it is also difficult to implement the application of
a predicate transformer (sets of pairs) to an input predicate (a set of logical
structures).
— From an efficiency point of view, applying such a solution to a complex abstract
domain like 3-valued structures would often lead to combinatorial explosion.8
Fortunately, another approach is possible. We will proceed by analogy with
an approach used when abstracting sets of vectors X ⊆ Rn and sets of relations
R ⊆ Rn × Rn between such vectors. Sets of vectors can be abstracted with convex
polyhedra [Cousot and Halbwachs 1978]:
  ℘(Rⁿ) ←γ− Pol[n]
It is well-known that a good approach to abstracting relations between vectors is not
to consider pairs of polyhedra, but to view relations between n-dimensional vectors
as sets of 2n-dimensional vectors, and to consider polyhedra in 2n dimensions:
  ℘(Rⁿ × Rⁿ) ←γ− Pol[2n]
Indeed, a relation like ~x′ = ~x cannot be finitely represented with pairs of polyhedra,
but is very easily represented with a 2n-dimensional polyhedron. Composing two
such relations P1, P2 ∈ Pol[2n] is also easy: one computes the intersection

  P12(~x, ~x′, ~x″) = P1(~x, ~x′, −) ∩ P2(−, ~x′, ~x″) ∈ Pol[3n],

and then projects out the ~x′ variables in P12.
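The same recipe works for relations over any finite domain, which makes the 3n-dimensional detour easy to see in miniature (a toy sketch, not the polyhedra implementation):

```python
# Compose relations by pairing on the shared middle component (the analogue
# of intersecting P1(x, x', -) with P2(-, x', x'')), then project it out.

def compose(R1, R2):
    """R1, R2 ⊆ D × D as sets of pairs; returns their relational composition."""
    triples = {(x, y, z) for (x, y) in R1 for (y2, z) in R2 if y == y2}
    return {(x, z) for (x, _, z) in triples}  # project out the middle

R1 = {(1, 2), (2, 3)}
R2 = {(2, 10), (3, 30)}
```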
Coming back to 2-valued logical structures (U, ι : P → ⋃k (U^k → B)), an analogy
can be drawn with polyhedra by considering each predicate symbol in a logical
structure over a vocabulary P, where |P| = n, to correspond to a dimension in
an n-dimensional vector. Thus, we will use logical structures over the duplicated
vocabulary P ⊎ P′ to represent relations between logical structures over vocabulary
⁷ In concrete structures, identity of individuals is preserved in any given run of a procedure. The
problem with abstraction-and-pairing is that the identity of the abstract individual to which a
given concrete individual is mapped is not necessarily the same when different concrete structures
are abstracted. The canonical name for u in S1] on entry to a procedure has no a priori fixed
relationship to the canonical name in a structure S2] that arises at the exit of the procedure.
⁸ Even with intraprocedural analysis using single structures, combinatorial explosion needs to be
carefully controlled by choosing a suitable set of abstraction predicates.
Fig. 13. Two abstractions of the relation between an input list and an output list in which a new
cell pointed to by c has been inserted—using destructive updating—somewhere in the middle of
the list: (a) relational representation; (b) tabulated representation. Predicates n[inp] and n[out]
represent the valuations of the n predicate before and after the insertion, respectively.
P. Observe that the representation of concrete and abstract relations is unified by
the notion of 3-valued structures, as before.
Taking the analogy further, the existential quantification of a dimension in a
set of vectors X ⊆ Rn corresponds to assigning the value 1/2 to all tuples of a
predicate. With the addition of a meet operation on 3-valued structures (described
in §6.5.2), we will be able to implement relation composition on two-vocabulary
structures, in a manner similar to convex polyhedra in 2n dimensions.
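The duplicated-vocabulary encoding can be sketched concretely as follows. This is an illustrative model only—universes and interpretations as Python dicts, with invented node names—not TVLA's representation:

```python
# Minimal sketch (not TVLA's implementation) of encoding a relation
# between two 2-valued structures over vocabulary {n} as one structure
# over the duplicated vocabulary {n[inp], n[out]}.

def two_vocab(universe, iota_in, iota_out):
    """Pair two interpretations over the same universe into a single
    structure over the duplicated vocabulary."""
    iota = {}
    for p, table in iota_in.items():
        iota[p + "[inp]"] = dict(table)
    for p, table in iota_out.items():
        iota[p + "[out]"] = dict(table)
    return (universe, iota)

# Input list u0 -n-> u1; output structure in which that edge was removed.
U = {"u0", "u1"}
iota_in  = {"n": {("u0", "u1"): True,  ("u1", "u0"): False}}
iota_out = {"n": {("u0", "u1"): False, ("u1", "u0"): False}}

S = two_vocab(U, iota_in, iota_out)
assert S[1]["n[inp]"][("u0", "u1")] is True
assert S[1]["n[out]"][("u0", "u1")] is False
```

Each predicate of the original vocabulary simply appears twice, once per mode, over the shared universe—mirroring how a point in 2n-dimensional space encodes a pair of n-dimensional points.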
Example 6.1. Fig. 13(a) and (b) illustrate the relational and tabulated representations, respectively, of a relation between input lists pointed to by a pointer
variable list and output lists obtained by the insertion of a cell pointed to by
pointer c.9
The meanings of the relational instrumentation predicates displayed inside the
nodes in Fig. 13(a) are explained in §6.4. They allow the analysis to track whether
the fields of some cells have been modified or not.
Observe that the relational representation provides more information, because
each cell is tracked individually in the representation. For instance, in Fig. 13(b),
the information that the output list contains exactly one more cell than the input
list is lost. Furthermore, with the tabulated representation, there is no way to
determine whether the cells in the output list have been permuted from their order
in the input list. In contrast, with the relational representation and the use of the
relational instrumentation predicates, it is possible to record the fact that the fields
of some cells have not been mutated.
□
9 To reduce clutter, we have omitted certain information from Fig. 13(a); in particular, values have
been omitted for some of the standard list predicates given in Tab. V, and therefore the reason
why certain non-summary nodes have been kept separate from the summary nodes may not be
apparent. This is just to simplify the diagram; the actual system has additional information not
shown in Fig. 13(a) that controls which collections of nodes are summarized.
Bertrand Jeannet et al.
6.3 Structure of the Vocabulary
In this section, we define the vocabularies that are used when two-vocabulary logical
structures are used to represent relations between logical structures.
Because our analysis method will use relation composition (see Eqn. (12) in §7),
we actually need three vocabularies. For each original predicate p ∈ P, we will
define three predicates p[inp], p[out] and p[tmp]. A logical structure that represents
a relation will use only p[inp] and p[out ] predicates. The p[tmp] predicates (which
will be used for computing compositions as explained below) are irrelevant outside
of composition. The “irrelevancy” of a predicate corresponds to “undefinedness”,
and will be modeled in a 3-valued structure using the value 1/2. We will refer to
the labels inp, out and tmp as modes.
We have already distinguished, among predicates, core predicates from instrumentation predicates: P = C ∪ I. Moreover, among core predicates, we have
distinguished predicates related to the local state and those related to the global
state: C = L ∪ G. The vocabulary of core predicates will now contain:
— three sets of predicates corresponding to global core predicates in G: G[inp],
G[out] and G[tmp];
— the set of local core predicates L.
We will assume that the formal input parameter of a procedure is not modified
in the procedure, so as to obtain at the exit node of the procedure a relationship
between the values of predicates in G[inp] ∪ {fpi} and predicates in G[out] ∪ {fpo}.
The other local variables may be forgotten at the exit node.
The case of an instrumentation predicate p is a bit more complex, because it
depends on the predicates involved in its defining formula ψp . If ψp involves at least
one global predicate, the vocabulary will include three copies of the instrumentation
predicate p: p[inp], p[out] and p[tmp]. For instance, the vocabulary will include
three copies of the reachability predicate r[n, q](v) defined in Tab. V, because we
need to characterize a cell by its reachability properties from the pointer variable q
through n links both at the entry of the procedure and at the current control point.
We can now give the precise definition of 3-valued structures S♯ = (U♯, ι♯) ∈
3-STRUCT[P[inp] ∪ P[out]] in terms of a relation R ⊆ (2-STRUCT[C])²:

γr(S♯) = { ((U, ι1), (U, ι2)) | ∃S = (U, ι) ∈ γ(S♯) :
              ∀p ∈ G[inp] : ι1(p) = ι(p[inp])
            ∧ ∀p ∈ G[out] : ι2(p) = ι(p[out])
            ∧ ∀p ∈ L : ι1(p) = ι2(p) = ι(p) }
where the concretization function γ is defined by Eqn. (6).
6.4 Relational Instrumentation Predicates
To prevent loss of essential information, we also need specific instrumentation predicates to capture properties that relate p[inp] predicates and p[out ] predicates. We
call such multi-vocabulary instrumentation predicates relational instrumentation
predicates.
ACM Journal Name, Vol. V, No. N, Month 20YY.
A Relational Approach to Interprocedural Shape Analysis
·
25
In particular, it will be essential to capture accurately the identity relationship (see §7.1, Eqn. (11)). As a consequence, we always use the unary predicates
id succ[n, m1, m2] and id pred[n, m1, m2], where m1, m2 ∈ {inp, out} and m1 ≠ m2,
to record information about the values of different modes of predicate n, such as
whether the value of predicate n[m1] implies n[m2]. These are defined by

id succ[n, m1, m2](v) = ∀v1 : (n[m1](v, v1) ⇒ n[m2](v, v1))
id pred[n, m1, m2](v) = ∀v1 : (n[m1](v1, v) ⇒ n[m2](v1, v)).
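To make the definition concrete, here is a sketch of evaluating id succ on an explicitly represented two-vocabulary structure. The dict-based representation and node names are illustrative assumptions, not the paper's:

```python
# Sketch: evaluating id_succ[n, m1, m2](v), i.e.,
#   forall v1: n[m1](v, v1) => n[m2](v, v1),
# on a concrete two-vocabulary structure.

def id_succ(universe, n_m1, n_m2, v):
    """True iff every n[m1]-edge out of v is also an n[m2]-edge."""
    return all((not n_m1.get((v, v1), False)) or n_m2.get((v, v1), False)
               for v1 in universe)

U = {"u0", "u1", "u2"}
n_inp = {("u0", "u1"): True}                      # u0 -n-> u1 before
n_out = {("u0", "u1"): True, ("u1", "u2"): True}  # edge kept, one added

assert id_succ(U, n_inp, n_out, "u0")      # u0's outgoing edge unchanged
assert not id_succ(U, n_out, n_inp, "u1")  # u1 gained a new edge
```

When both id succ[n, inp, out](v) and id succ[n, out, inp](v) hold at v, the set of n-successors of v is exactly preserved between the two modes.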
Example 6.2. In Fig. 13(a), the fact that id succ[n, inp, out](v) and
id succ[n, out , inp](v) both hold for the two summary nodes captures the fact
that the concrete memory cells represented by these summary nodes have not
been reordered. More generally, the value of id succ[n, m1 , m2 ] on the different
nodes allows the analysis to capture precisely that the only transformation performed on the
list is the addition of the new cell. (Looking ahead to Fig. 15(a), the fact that
id succ[n, inp, tmp](v) and id succ[n, tmp, inp](v) hold globally captures the condition that the n[inp] and n[tmp] predicates are identical.)
□
Generally speaking, relational instrumentation predicates are essential to preserving
relational information that would otherwise be lost when concrete nodes are merged
into summary nodes.
Some additional constraint rules related to these relational instrumentation predicates are also needed for the relation composition operation defined in §6.5. These
constraint rules express logical consequences between relational instrumentation
predicates. For instance, the rule
id succ[n, m1 , m2 ](v) ∧ id succ[n, m2 , m3 ](v) ⇒ id succ[n, m1 , m3 ](v)
for m1 ≠ m2 ≠ m3 is standard for capturing the fact that the composition of two
identity relations is the identity relation. At present, such rules are provided
manually.
Depending on the procedures in the analyzed program and their semantics, one
may need additional relational instrumentation predicates and constraint rules. For
the list-reversal example of Fig. 1, §8.1 discusses the relational instrumentation
predicates used to capture the fact that the list has been reversed.
6.5 Relation Composition
As mentioned in §6.2, relation composition can be defined in terms of meet and
projection operations. In the notation from §2, the composition S⟨′,″⟩ ∘ S⟨·,′⟩ of the
transformations represented by two two-vocabulary structures S⟨·,′⟩ and S⟨′,″⟩ is
performed as follows:

S⟨′,″⟩ ∘ S⟨·,′⟩ = project1,3(S⟨1/2,′,″⟩ ⊓ S⟨·,′,1/2⟩) = S⟨·,″⟩     (7)
We define the projection and meet operations below, and discuss their interaction
with instrumentation predicates.
6.5.1 The Projection Operation. The existential quantification of a (core) predicate symbol p0 in a 2-valued logical structure S = (U, τ) is formally defined as the
disjunction of all the possible values {0, 1} for all tuples of the predicate p0 in S,
leading to a set of 2-valued structures:

∃p0 : S = {S′ = (U, τ′) | ∀p ∈ P \ {p0} : τ′(p) = τ(p)}.
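The 2-valued case can be sketched directly: keep every predicate except p0 fixed and enumerate all valuations of p0's tuples. Representation and names are illustrative assumptions:

```python
from itertools import product

# Sketch: existential quantification of a predicate p0 in a 2-valued
# structure, producing the set of structures that agree with S on every
# predicate except p0, whose tuples may take any value in {0, 1}.

def exists(structure, p0):
    universe, tau = structure
    others = {p: t for p, t in tau.items() if p != p0}
    tuples = sorted(tau[p0])                # domain of p0's table
    results = []
    for values in product([False, True], repeat=len(tuples)):
        tau2 = dict(others)
        tau2[p0] = dict(zip(tuples, values))
        results.append((universe, tau2))
    return results

U = {"u0"}
S = (U, {"c": {("u0",): True}, "q": {("u0",): False}})
out = exists(S, "c")
assert len(out) == 2                        # 2^1 valuations of c's tuple
assert all(s[1]["q"] == {("u0",): False} for s in out)
```

The exponential blow-up in the number of resulting 2-valued structures is exactly what the 1/2 value avoids in the 3-valued setting described next.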
Now consider existential quantification in a 3-valued logical structure S♯. The
goal is to create a 3-valued structure that over-approximates the result of existential
quantification in all 2-valued structures that S♯ represents.
When S♯ contains no instrumentation predicates, existential quantification can
be modeled exactly by assigning the value 1/2 to all tuples of the predicate p0, as
follows:

(∃p0 : S♯) = (U, τ′),

where τ′ is defined by ∀u⃗ ∈ U* : τ′(p0)(u⃗) = 1/2 ∧ ∀p ∈ P \ {p0} : τ′(p) =
τ(p). This operation can be implemented with a predicate-update formula (§5.2).
Applying the concretization operation γ : ℘(3-STRUCT) → ℘(2-STRUCT) gives
back the disjunction of 2-valued structures.
Matters are slightly different when we consider a 3-valued logical structure equipped
with instrumentation predicates. Consider S♯ ∈ 3-STRUCT[P], where P = C ∪ I
has core predicates C and instrumentation predicates I. Quantifying out a core
predicate c alone may not be sufficient to drop all information about c: in particular, every instrumentation predicate whose defining formula involves c provides
(a degree of) redundant information about c; hence, all instrumentation predicates
whose defining formula involves c should also be quantified out.10
Projecting a logical structure in 3-STRUCT[P[inp] ∪ P[out] ∪ P[tmp]] onto the
subspace 3-STRUCT[P[inp] ∪ P[out]] is thus equivalent to the existential quantification of all p[tmp] predicates, for p ∈ P, as well as all relational instrumentation
predicates that involve a predicate in P[tmp].
This operation on 3-valued structures is extended in the standard way to our
abstract domain ℘(3-STRUCT[P[inp] ∪ P[out] ∪ P[tmp]]) that manipulates sets of
such structures.
6.5.2 The Meet Operation. The meet operation is first defined as the greatest-lower-bound operation induced by the approximation order in the lattice
3-STRUCT[P]. It is then extended to the abstract domain ℘(3-STRUCT[P]).
[Arnold et al. 2006] shows that in general the first operation is NP-complete. However, [Arnold et al. 2006] provides an algorithm based on graph matching that
performs rather well in practice. This is discussed in more detail in §8.3.
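The meet on individual truth values—one ingredient of the structure-level operation, which in addition must match the individuals of the two structures (the graph-matching part that makes the general problem hard)—can be sketched as follows. The encoding of 1/2 and of failure is an illustrative choice:

```python
# Value-level meet underlying the greatest lower bound on 3-valued
# structures: 1/2 ("unknown") sits at the top of the approximation
# order, so meeting with 1/2 keeps the other value, while meeting the
# incompatible definite values 0 and 1 yields no structure at all
# (modeled here as None).

HALF = 0.5

def meet_val(a, b):
    if a == b:
        return a
    if a == HALF:
        return b
    if b == HALF:
        return a
    return None  # 0 meet 1: contradictory definite values

assert meet_val(1, HALF) == 1
assert meet_val(HALF, 0) == 0
assert meet_val(HALF, HALF) == HALF
assert meet_val(0, 1) is None
```

Intuitively, the meet of two abstract structures describes exactly the concrete structures represented by both—which is what relation composition in Eqn. (7) exploits.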
The effect of the meet operation on instrumentation predicates deserves a further
remark: In the context of abstract structures, it should be combined with the Coerce
operation discussed in §5.2.2, which propagates logical consequences between (core
and instrumentation) predicates. Indeed, the standard meet operation performs a
logical meet without exploiting the defining formulas of instrumentation predicates:
instrumentation predicates are just treated as independent core predicates.
10 Quantifying out c and all instrumentation predicates whose defining formula involves c might not
be the best correct approximation of quantifying out c in all concrete structures represented by S♯ if
the defining formula ψp of an instrumentation predicate p has a syntactic dependence on c without
involving a true semantic dependence—for instance, if we have ψp(v⃗) = . . . ∧ (c(v⃗) ∨ ¬c(v⃗)).
[Figure 14: (a) S1♯; (b) S2♯; (c) S♯ = S1♯ ⊓ S2♯; (d) coerce(S♯) — node-label content omitted]
Fig. 14. Applying the Coerce operation after the meet operation. c is a core predicate, p a nullary
instrumentation predicate defined by p = ∃v : (c(v) ∧ ∀v′ : v ≠ v′ ⇒ ¬c(v′)).
[Figure 15: (a) S1♯; (b) S2♯; (c) S1♯ ⊓ S2♯; (d) coerce(S1♯ ⊓ S2♯); (e) coerce(project_inp,out(S1♯ ⊓ S2♯)); (f) project_inp,out(coerce(S1♯ ⊓ S2♯)) — node-label content omitted]
Fig. 15. Applying the Coerce operation in relation composition
Consider the example of Fig. 14. The meet operation returns the structure depicted in Fig. 14(c),
where p holds the indefinite value 1/2. However, performing a semantic reduction
on S♯ using Coerce leads to p obtaining the definite value 1, as shown in Fig. 14(d).
In this case, Coerce used constraint rules derived from the defining formula of p to
infer that p must have the value 1. (See [Sagiv et al. 2002, §6.4] for more details
about the use of constraint propagation during Coerce.)
This aspect is even more important in the context of multi-vocabulary logical
structures that are combined for relation composition; see Fig. 15. As discussed
in §6.4, the multi-vocabulary logical structures that we work with are typically
equipped with relational instrumentation predicates and related constraint rules.
To retain precision, it is necessary to make sure that logical consequences of the
predicates in the vocabulary to be dropped have been incorporated into the predicates of the other vocabularies before projection. Fig. 15 illustrates this point
when the id succ[n,m1,m2] relational instrumentation predicates and their related
constraint rules as defined in §6.4 are active. It shows that applying the Coerce operation before projection is the key to obtaining the fact that the resulting relation
in Fig. 15(f) is the identity relation.
As a consequence, the abstract meet operation between 3-valued structures is
defined as

S1♯ ⊓♯ S2♯ = coerce(S1♯ ⊓ S2♯),

where ⊓ is the standard meet on 3-valued structures, and the abstract relation-composition operation (Eqn. (7)) is redefined as

S⟨′,″⟩ ∘ S⟨·,′⟩ = project1,3(S⟨1/2,′,″⟩ ⊓♯ S⟨·,′,1/2⟩) = S⟨·,″⟩.     (8)
The use of the abstract meet operation in Eqn. (8) addresses a problem that was
mentioned in §5.2.1: the instrumentation-predicate-maintenance formulas created
by finite differencing [Reps et al. 2003] are able to maintain definite values for instrumentation predicates that express reachability properties only for unit-size changes
to core predicates. However, procedure summaries can involve non-unit-size changes
to core predicates. We side-step this problem by using abstract meet—rather than
a method that involves finite differencing—to implement abstract relation composition.
7. INTERPROCEDURAL SHAPE ANALYSIS
Our interprocedural shape analysis is based on a variant of the functional approach to interprocedural analysis [Cousot and Cousot 1977; Sharir and Pnueli
1981; Knoop and Steffen 1992], in which the two computation steps referred to in
§6.1 are merged into a single step. Jeannet and Serwe [Jeannet and Serwe 2004]
show how the functional approach can be derived as an abstract interpretation of
the standard operational semantics, modeled using a stack of activation records.
Once the interprocedural semantics is defined in this way, a second abstraction step
may be used to abstract the data (in our case, the values of variables and linked
memory cells).
In this section, we start directly from the derived forward relational semantics
obtained by abstract interpretation of the standard operational semantics, as described in [Jeannet and Serwe 2004]. In §7.1, we first instantiate this forward relational semantics for the case where relations between memory configurations are
represented as sets of pairs. In §7.2, we reformulate it for the case where relations
are represented and abstracted with the two-vocabulary structures defined in §6, so
as to obtain the effective dataflow equations used by our analysis. Finally, in §7.3,
we discuss how these dataflow equations can be modified so that their solutions
can be obtained more rapidly. (Experimental results with the latter technique are
presented in §8.4.)
7.1 Forward Relational Semantics
In the forward relational semantics, each node of the program’s CFG is associated
with a relation between
— the states reachable at the entry node of the current procedure, and
— the states reachable at the current node of the procedure.
The relational semantics is defined as the least fixpoint of a system of equations
over such relations.
Each procedure is viewed as a pure function taking inputs and returning outputs,
without performing any side effect on the global store. However, the programs that
we consider do modify the global store, defined by the heap and the value of global
variables. To account for this, at the semantic level we include the heap and the
global variables as implicit input and output parameters of the functions, in addition
to the explicit input and output parameters.
Notation. For the time being, we represent relations between concrete memory
configurations as sets of pairs of 2-valued structures. Thus, we define the function
R : N → ℘(State × State), which maps each node of the CFG to a relation over
states.
States are represented as 2-valued logical structures over core predicates. Among
the core predicates, some predicates represent information about the local state
of a procedure (i.e., the values of local variables), while other predicates represent
information about the global state of the program, i.e., the structure of the heap
and the values of global variables. We thus decompose the set of core predicates
into local and global predicates:

C = G ∪ L
Intraprocedural Operations. An edge n → n′ of the CFG labeled with a statement
stm or a condition cond generates the following equations, respectively:

R(n′) ⊇ {(S, S″) | (S, S′) ∈ R(n) ∧ S″ = ⟦stm⟧(S′)}     (9)
R(n′) ⊇ {(S, S″) | (S, S′) ∈ R(n) ∧ S″ ∈ ⟦cond⟧({S′})}     (10)
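Eqn. (9) can be sketched with relations as explicit sets of pairs; the toy integer states and the stand-in statement semantics are illustrative, not the paper's 2-valued structures:

```python
# Sketch of Eqn. (9): extend each pair (S, S') in the relation at node n
# by applying the concrete statement semantics to the second component.

def post_statement(relation, sem_stm):
    """R(n') contribution: {(S, S'') | (S, S') in R(n), S'' = [[stm]](S')}."""
    return {(s_entry, sem_stm(s_cur)) for (s_entry, s_cur) in relation}

# Toy states: an integer "heap"; the statement increments it.
R_n = {(0, 0), (0, 1)}                 # relation at node n
R_next = post_statement(R_n, lambda s: s + 1)
assert R_next == {(0, 1), (0, 2)}      # entry component is left unchanged
```

Note that the first component—the state at the procedure's entry—is never touched by intraprocedural steps; only the second component evolves.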
Intuitively, the current relation is composed with the relation induced by the semantics of the operation. We use inclusions in the equations because several edges
may have n′ as their target.
Procedure Calls. In a procedure call, modeled by a call-to-start edge (c, s) labeled
by an expression ⟨call apo = Pi(api)⟩, the current global state and the actual parameter are passed to the callee, while the other local variables become undefined.
One generates the identity relation from the obtained reachable set of states:

R(s) = { (T, T) | (S, S′) ∈ R(c) ∧ T(fpi) = S′(api) ∧ ∀p ∈ G : T(p) = S′(p) }     (11)

Note that an undefined predicate is modeled as: “any value is possible”.
Procedure Returns. This is the most complex operation. We assume that an exit-to-return edge (e, r) is labeled by ⟨ret apo = Pi(api)⟩, and that (e, r)'s corresponding
call-to-start edge is (c, s) (i.e., call(r) = c). The processing of a procedure return
consists of the following steps:
— composing the relation R(c) at the corresponding call node c with the relation
R(e) at the exit node of the callee, to create the global state at r;
— taking the local state at the call node and modifying it with the assignment of
the actual output parameter at the exit node, to create the local state at r.

R(r) = { (S, W) | (S, S′) ∈ R(c) ∧ (T, T′) ∈ R(e)
           ∧ ∀p ∈ G : S′(p) = T(p) ∧ S′(api) = T(fpi)
           ∧ ∀p ∈ L \ {apo} : W(p) = S′(p)
           ∧ ∀p ∈ G : W(p) = T′(p) ∧ W(apo) = T′(fpo) }     (12)

In the above equation, for (S, S′) and (T, T′) to be composable, the states S′ and
T must agree on the input parameters (actual and formal) and the global state. In
the new state W, the values of local variables except the actual output parameter
are inherited from S′, while the global state and the value of the actual output
parameter are taken from T′.
The Initial Set of Relations. Normally, the analysis starts in an initial state (here,
a relation). Assuming that the set of possible memory configurations at the start
node of the main procedure is X, we add the inclusion
R(smain) ⊇ {(S, S) | S ∈ X}     (13)
Reachable States. The set that we want to compute is the least fixpoint of
Eqns. (9), (10), (11), (12), and (13). This defines a framework for interprocedural
dataflow analysis:
— A given analysis is obtained by instantiating these equations for a suitable
abstract domain.
— At each control-flow graph node, the fixpoint solution captures the relation
between the reachable states at the entry of the current procedure and the
reachable states at the current node.
— The states reachable at each node n can thus be extracted by projecting the
relation R(n) onto its second component.
Eqns. (9), (10), (11), (12), and (13) are a particular version of the equations given
in [Jeannet and Serwe 2004], except that the global state is passed back and forth
explicitly. (Also, here we merge the two sets of activation records that were kept
separate in [Jeannet and Serwe 2004] to support backward analysis.) The soundness
of the semantics with respect to the standard operational semantics is proven in
[Jeannet and Serwe 2004] by using abstract interpretation.
7.2 Dataflow Equations
In §6, we showed how to represent relations between logical structures more efficiently and to abstract them more precisely with two-vocabulary structures. We
thus instantiate the equations of §7.1 with this better representation.
Intraprocedural Operations. Eqns. (9) and (10) are replaced by

R(n′) ⊇ ⟦stm⟧(R(n))
R(n′) ⊇ ⟦cond⟧(R(n))

except that predicate-update formulas and precondition formulas in functions ⟦stm⟧
and ⟦cond⟧ defined by Eqns. (4) and (5) are modified by replacing global predicates
p ∈ G with predicates p[out] ∈ G[out]. For instance, in the case of the statement
x->n := NULL, the predicate-update formula becomes

n′[out](v1, v2) = n[out](v1, v2) ∧ ¬x(v1).
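Applying that update formula can be sketched on an explicit structure; the dict-based representation and node names are illustrative assumptions:

```python
# Sketch of applying the predicate-update formula for x->n := NULL on
# the out-vocabulary: n'[out](v1, v2) = n[out](v1, v2) and not x(v1).

def update_n_out(n_out, x, universe):
    """Cut every out-mode n-edge whose source is pointed to by x."""
    return {(v1, v2): n_out.get((v1, v2), False) and not x.get((v1,), False)
            for v1 in universe for v2 in universe}

U = {"u0", "u1"}
x = {("u0",): True}                      # x points to u0
n_out = {("u0", "u1"): True}             # u0 -n-> u1 before the statement

n_out2 = update_n_out(n_out, x, U)
assert n_out2[("u0", "u1")] is False     # the edge out of x is cut
```

The inp-mode copy of n is untouched by the update, which is exactly how the two-vocabulary structure keeps the procedure's entry state available for the relation.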
Procedure Calls. Eqn. (11) is replaced by

R(s) = { T | S ∈ R(c) ∧   T(fpi) = S(api)
                        ∧ ∀p ∈ L \ {fpi} : T(p) = 1/2
                        ∧ ∀p ∈ G : T(p[inp]) = T(p[out]) = S(p[out]) }
Procedure Returns. We proceed in three steps to implement Eqn. (12). First we
take the relation R(e) at the exit node of the callee and transform it by eliminating
local variables that are not formal input or output parameters, and by setting the
values of p[tmp] predicates to the values of p[inp] predicates:

R′(e) = { S′ | ∃S ∈ R(e) :   ∀p ∈ L \ {fpi, fpo} : S′(p) = 1/2
                           ∧ ∀p ∈ G : S′(p[inp]) = 1/2
                           ∧ ∀p ∈ G : S′(p[tmp]) = S(p[inp]) }
We also take the relation R(c) at the call node, set the values of p[tmp] predicates
to the values of p[out] predicates, and equate formal and actual input parameters.
(To simplify the presentation, we assume that there are no name conflicts.)

R′(c) = { S′ | S ∈ R(c) ∧   S′(fpi) = S(api)
                          ∧ ∀p ∈ G : S′(p[tmp]) = S(p[out])
                          ∧ ∀p ∈ G : S′(p[out]) = 1/2 }
The last step consists of combining R′(c) and R′(e) by taking their meet, assigning
the formal output parameter to the actual parameter, and then forgetting p[tmp]
predicates and the formal output parameter of the callee:

R(r) = { S′ | S ∈ R′(c) ⊓♯ R′(e) ∧   S′(apo) = S(fpo)
                                   ∧ S′(fpi) = 1/2
                                   ∧ ∀p ∈ G : S′(p[tmp]) = 1/2 }     (14)

The meet forces the relations R′(c) ∈ 3-STRUCT[P[inp] ∪ P[tmp]] and R′(e) ∈
3-STRUCT[P[tmp] ∪ P[out]] to agree on global p[tmp] predicates, and on actual
and formal parameters.
With the exception of the meet operation, all operations can be implemented using predicate-update formulas (cf. §4.3). We do not specify in the equations above
how instrumentation predicates are updated—the implementation mainly uses the
automatically generated predicate-maintenance formulas created by finite differencing [Reps et al. 2003], although for some simple cases and for instrumentation
predicates that involve only one mode, they were provided manually. In particular,
[Figure 16: (a) Phase I, bottom-up; (b) Phase II, top-down; (c) Combination with forward relational semantics — call-graph content omitted]
Fig. 16. Inequation systems and induced dependences between variables. Solid and dashed lines
are used to distinguish between the first and second calls to f() and g().
for the procedure-call operations, we manually provided the values of relational
instrumentation predicates that model the identity relationship.
7.3 Impact of the Form of the Dataflow Equations on Precision and Efficiency
We now compare our one-phase approach to interprocedural analysis to a two-phase
approach that is in the spirit of [Sharir and Pnueli 1981; Knoop and Steffen 1992].
At the concrete level, the two approaches are semantically equivalent; however, at
the abstract level the one-phase approach can yield more precise answers because
the abstract operations are no longer exact. We exploited this difference to develop an optimization that, in practice, speeds up the convergence of our one-phase
analysis while still retaining its precision advantages (see the experimental results
presented in §8.4).
Our forward relational semantics can be sketched as follows:

R(n′) ⊇ Finstr(R(n))            Intraprocedural statement/condition
R(smain) ⊇ IdState              Uninitialized state at start
R(s) ⊇ Idcodom(R(c))            Procedure call, with call-to-start edge (c, s)
R(r) ⊇ FCombine(R(c), R(e))     Procedure return, with call site c and exit-to-return edge (e, r)
where IdX denotes the identity relation restricted to the domain X and codom(R)
denotes the projection of a relation R on its codomain. In contrast, the two-phase
method involves solving two equation systems in succession. The first system,
which defines the so-called bottom-up phase, computes procedure summaries that
are valid for any input, instead of being specialized to the reachable inputs of the
callee. The second system, which defines the so-called top-down phase, computes
reachability information using the procedure summaries R(e) computed by the first
phase. The corresponding equations are given in Fig. 16.
The advantage of combining the two phases into a single phase—as done in §7.1—
is that the one-phase approach can yield more precise answers (because it computes
procedure summaries that are specialized to the reachable inputs of the callee).
The one-phase approach may converge more slowly than the two-phase approach
because the one-phase equation system is more intricate. However, it is possible
to speed up the convergence of the one-phase analysis while retaining its precision
advantage as follows: it is sound to replace the inequation R(s) ⊇ Idcodom(R(c))
associated with call-to-start edges by R(s) ⊇ Idcodom(R(c))∪X′(c) for any X′(c).
The idea is to choose X′(c) to be a set of states that is very likely to be reachable
(e.g., for a list-manipulating procedure, the set of well-formed lists). Because it
may take several iteration steps for the solver to obtain this information and to
propagate it further, adding it right from the beginning may speed up convergence.
From a semantic point of view, this is equivalent to starting the iterative fixpoint
computation from a higher initial value for R(s), which becomes IdX′(c) instead of
⊥. Two cases can occur:
— X′(c) contains only reachable inputs of the callee, and the initial value of R(s)
is still smaller than the smallest solution of the original equation system; in this
case, there is no impact on precision, and we gain a convergence speed-up.
— X′(c) contains some unreachable inputs of the callee, which can have a negative
impact on precision (with respect to the precision of the one-phase approach).
However, in the limit (i.e., when X′(c) = State), one obtains the equations of
Phase I of the two-phase approach: the propagation from call nodes to start
nodes of only reachable abstract values is completely eliminated. In this case,
the precision of the one-phase approach degenerates to that of Phase I of the
two-phase approach (see Fig. 16).11
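The optimization amounts to seeding the fixpoint variable at each start node with the identity on a guessed input set. A toy sketch, with states and the domain as illustrative stand-ins:

```python
# Sketch of the convergence optimization: initialize R(s) at each
# call-to-start edge with Id restricted to a guessed set X'(c) of likely
# reachable inputs, instead of starting the iteration from bottom
# (the empty relation).

def initial_value(guessed_inputs):
    """Start R(s) at Id_{X'(c)} rather than the empty relation."""
    return {(s, s) for s in guessed_inputs}

# Likely inputs for a list procedure: e.g., some well-formed lists.
X_prime = {"list1", "list2"}
R_s = initial_value(X_prime)
assert R_s == {("list1", "list1"), ("list2", "list2")}
# Soundness is preserved: the solver only ever adds pairs on top of this.
```

If the guess contains only genuinely reachable inputs, the least solution is unchanged and the solver simply starts closer to it.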
The results of our experiments with this optimization are presented in §8.4.
8. IMPLEMENTATION AND EXPERIMENTS
To perform interprocedural shape analysis by the method that is described in §7,
we created a modified version of TVLA [Lev-Ami and Sagiv 2000], an existing
shape-analysis system, to allow it to support the following features:
— We replaced the built-in notion of an intraprocedural CFG by the more general
notion of equation system, in which transfer functions may depend on more
than one variable. This modification was needed for implementing the return
operation (Eqn. (14)).
— We also designed a more general language in which to specify equation systems.
These modifications, originally performed in 2003 [Jeannet et al. 2004], were made
to the version of the TVLA system as it existed in 2003 [Lev-Ami and Sagiv 2000].
Later, we extended the modified system to incorporate the algorithm for the meet
11 In the limit case, the approach described above still solves only one equation system—one that
is equivalent to Phase I of the two-phase approach. Thus, although the results are as precise as
the two-phase approach with respect to summary functions, because we do not perform Phase
II afterwards, the results obtained are imprecise with respect to what Phase II discovers about
the reachable inputs of callees. In such a case, it would be possible to obtain more accurate
information by solving the equations of Phase II.
[Figure 17: (a) S0; (b) S1; (c) S2 — node-label content omitted]
Fig. 17. List-reversal example: The input structure S0 represents all acyclic singly-linked lists
of length two or more. The analysis produces the two output structures S1 and S2. (In each
structure, unary predicates that have the same non-0 value for all individuals are displayed in the
box labeled “Const. unary predicates”. The values of the “irrelevant” predicates of the vocabulary
are not shown. By convention, the in, tmp, or out qualifier for a predicate whose name includes
square-bracket symbols is inserted inside the brackets, e.g., r[n, out, res].)
operator described in [Arnold et al. 2006]. It is this version of TVLA that we used
in the experiments reported here.
This section is organized as follows: §8.1 discusses the analysis of the recursive
list-reversal procedure from Fig. 1; §8.2 describes our experiments on a variety of
list-manipulation and tree-manipulation procedures; §8.3 discusses improvements
(compared to our previous work [Jeannet et al. 2004]) brought about by the use
of an improved meet operation [Arnold et al. 2006]. §8.4 discusses experiments
to speed up the convergence of the analysis method by injecting likely reachable
states at the start nodes of procedures. §8.5 compares our method and experimental
results with those of Rinetzky et al. [2005].
All running times were obtained using a 2GHz Pentium M, equipped with 1 GB
of memory, running Linux.
8.1 Analysis of the List-Reversal Example
Given that the input is an acyclic, singly-linked list, the goal of the analysis of the
procedure from Fig. 1, which destructively reverses an acyclic, singly-linked list,
using recursion to traverse the list, is to show that
(1) the output is an acyclic list,
(2) each link of the output list is the reversal of a link of the input list, and vice versa, and
(3) the cells of the output list are exactly the cells of the input list.
Fig. 17 shows how the summary information that we obtain captures the behavior
of the recursive list-reversal procedure of Figs. 1 and 10. The descriptor of the initial
summary transformer at start node smain was the 3-valued structure S0 , shown in
Fig. 17(a), which represents (the identity transformation on) all linked lists of length
at least two that are pointed to by program variable list. The head of the answer
list is pointed to by program variable res. At the program’s exit node emain , the
summary transformers were the structures S1 and S2 of Fig. 17(b) and Fig. 17(c),
which represent the transformations that reverse lists of length two, and all lists of
length greater than two, respectively.
Note that in both S1 and S2 from Fig. 17, each node has the value 0 for the unary
predicate c[n, out] and each node has the value 1 for r[n, out, res]. This means that
no node lies on a directed cycle of n fields and all nodes are reachable from the new
head of the list res, and hence establishes item 1.
As discussed in §6.4, relational instrumentation predicates need to be introduced
to prevent the loss of essential information. Besides the identity instrumentation predicates defined in §6.4, the unary predicates reverse n succ[m1 , m2 ], with
m1, m2 ∈ {in, out} and m1 ≠ m2, record whether n[m2] is the reverse of n[m1].
These are defined by
reverse n succ[m1, m2](v) = ∀v1 : (n[m1](v, v1) ⇒ n[m2](v1, v)).   (15)
We also provided the following related constraint rules, which allow the analysis to deduce a relationship between n[in] and n[out]:
id succ[n, in, tmp](v) ∧ reverse n succ[tmp, out](v) ⇒ reverse n succ[in, out](v)
reverse n succ[in, tmp](v) ∧ id pred[tmp, out](v) ⇒ reverse n succ[in, out](v)
Note that only the reverse n succ[m1 , m2 ] predicates and the related constraint
rules are specific to the list-reversal example. The other predicates that appear
in Fig. 17 are shape properties that characterize singly-linked lists. (They have
been used in previous papers about shape analysis of list-manipulation programs;
e.g., see [Sagiv et al. 2002].) For instance, r[n, out, list](v) holds the value 1 for
individuals that are reachable from variable list through a chain of n[out] links.
In structures S1 and S2 , the values for the predicates reverse n succ[m1 , m2 ],
with m1, m2 ∈ {in, out} and m1 ≠ m2, show that for each n link n[in](v1, v2) at the
entry node smain , we have an n link n[out](v2 , v1 ) at the exit node emain . In other
words, the procedure reverses all of the n links; this establishes item 2.
Finally, in both of the output structures S1 and S2 , we find that r[n, in, list](v)
and r[n, out, res](v) hold for each node. This means that no nodes are either lost or
gained, and hence the cells of the output list are exactly the cells of the input list;
this establishes item 3.
From the above discussion, it should be clear that the set of 3-valued structures
{S1 , S2 } establishes the desired properties: the output list is the reversal of the
input list, and no elements are either lost or gained.
We generalized this experiment by having procedure main call procedure rev
twice, as in Fig. 2(b). To achieve the same level of accuracy as we obtained for
a single call on rev, we needed to introduce an additional family of unary instrumentation predicates, reverse n pred[m1 , m2 ], whose definition is the same as
reverse n succ[m1 , m2 ] (Eqn. (15)), except with v and v1 exchanged. With these
Program
a. programs on unsorted lists
create creates a list of any length
append (create) appends 2 lists
split (create) cuts a list into 2 lists
reverse (create) destructive list reversal
revappend (create) reverse-append
(using an accumulator parameter)
insert (create) inserts a cell at a random place in a list
delete (create)
removes a cell at a random place in a list
merge (create) merges randomly 2 lists
merge*
splice (create) splices 2 lists (specialized merge)
splice*
b. programs on sorted lists
create creates a list of any length
append (create) appends 2 lists
split (create) cuts a list into 2 lists
reverse (create) destructive list reversal
revappend (create) reverse-append
(using an accumulator parameter)
createo (inserto) creates a sorted list using inserto
inserto (createo)
inserts a cell in the right place in a sorted list
deleteo (createo,inserto)
removes a cell with a given key from a sorted list
mergeo (createo,inserto) merges 2 sorted lists in one
sorted list
mergeo*
spliceo (createo,inserto) splices 2 sorted lists (interleaves their cells)
spliceo*
tailsort (create, inserto) sorts a list recursively using
insert
tailsort*
insertionsort (create, inserto) insertion sort
(using an accumulator parameter)
insertionsort*
mergesort (create,split,mergeo) mergesort
mergesort*
[Table VI data: for each program, columns “# of structs” and “Time (sec)”, for both the iterative and the recursive version.]
Table VI. Experimental results on unsorted and sorted lists. The names in parentheses indicate
the other procedures that are analyzed in the example. The stars indicate the introduction of
“blurring functions” in dataflow equations. The column “# of structs” indicates (i) the maximum
number of logical structures at any control point of the main procedure, and (ii) the maximum
number of logical structures at the summary point.
additional instrumentation predicates, we were able to establish that the second
call to rev always restores the initial memory configuration.
                                                                   Recursive
Program                                                            # of structs  Time (sec)
a. programs on unsorted trees
create  creates an unsorted tree of any size (possibly empty)      10/5          13
create*                                                            10/3          11
spliceLeft (create)  inserts a tree as the leftmost child
  of another tree                                                  21/11         47
insert* (create)  inserts a cell in a tree                         14/7          47
find* (create)  finds a cell in a tree                             74/18         100
removeRoot (create,spliceLeft)  removes the root of a tree         12/9          63
remove* (create,spliceLeft,removeRoot)  removes a cell in a tree   53/25         593
rotate (create)  exchanges left and right subtrees of all nodes    11/5          64
b. programs on sorted trees
create  creates an unsorted tree                                   26/5          61
create*                                                            10/3          14
insertu* (create)  inserts a cell in an (unsorted) tree            14/7          59
spliceLeft (create)  inserts an (unsorted) tree as the leftmost
  child of another (unsorted) tree                                 21/11         109
createo* (insert)  creates a sorted tree                           16/7          654
insert* (createo)  inserts a cell in a sorted tree                 35/12         676
find* (createo,insert)  finds a cell with a given key in a
  sorted tree                                                      41/15         771
removeRoot* (createo,insert,removeRoot,spliceLeft)  removes a
  cell with a given key in a sorted tree                           12/9          1160
remove* (createo,insert,removeRoot,spliceLeft)  removes a cell
  with a given key in a sorted tree                                30/15         1888
split*  splits a tree into two trees according to a key, one with
  cells less than the key, one with cells greater than the key     51/18         1780
rotate (create)  exchanges left and right subtrees of all nodes    19/5          101
Table VII. Experimental results on unsorted and sorted trees. The names in parentheses indicate
the other procedures that are analyzed in the example. The stars indicate the introduction of
“blurring functions” in dataflow equations. The column “# of structs” indicates (i) the maximum
number of logical structures at any control point of the main procedure, and (ii) the maximum
number of logical structures at the summary point.
8.2 Experimental Results on Lists and Trees
Tabs. VI and VII present our experimental results on lists and trees. In these
analyses, memory allocation and deallocation are modeled using a pool of free cells
[Reps et al. 2003]. The instrumentation predicates related to data structures (lists
and trees) are given in Tab. V and Tab. VIII. For sorted lists and trees, we introduce
the total-order core predicate leq(v1 , v2 ) described in Remark 4.3. We also introduce
the related predicates of Tab. IX.
All analyses start with a memory heap consisting of a summary node that represents the free-cell pool and another summary node that represents any context.
The core predicate leq(v1 , v2 ) evaluates globally to 1/2. The examples are named
according to the main analyzed procedure, but for most of them the main procedure
first calls one or more data-structure-creation procedures, and possibly subprocedures, which are also analyzed from scratch.

p                  Intended Meaning and ψp
down(v1, v2)       At least one field of v1 points to v2:
                   left(v1, v2) ∨ right(v1, v2)
both(v1, v2)       Both fields of v1 point to v2:
                   left(v1, v2) ∧ right(v1, v2)
r down[z](v)       Reachability by any field from a variable z:
                   ∃v1 : z(v1) ∧ down*(v1, v)
shared down(v)     Shared property:
                   ∃v1, v2 : v1 ≠ v2 ∧ down(v1, v) ∧ down(v2, v)
cyc down(v)        Cyclicity property:
                   ∃v1 : down(v, v1) ∧ down*(v1, v)

Table VIII. Defining formulas of instrumentation predicates related to binary trees.

p                  ψp and Intended Meaning
Sorted lists:
orda[n](v)         The n field of v points to a cell v2 with v ≤ v2:
                   ∃v2 : n(v, v2) ∧ leq(v, v2)
ordb[n](v)         The n field of v is null: ∀v2 : ¬n(v, v2)
ord[n](v)          Property of all cells of a sorted list:
                   orda[n](v) ∨ ordb[n](v)
Sorted binary trees:
orda right(v)      The keys of the right subtree of v are greater than the key of v:
                   orda right(v) = ∃v1 : right(v, v1) ∧ ∀v2 : down*(v1, v2) ⇒ leq(v, v2)
orda left(v)       The keys of the left subtree of v are less than the key of v:
                   orda left(v) = ∃v1 : left(v, v1) ∧ ∀v2 : down*(v1, v2) ⇒ leq(v2, v)
ordb[n](v)         The n field of v is null: ∀v2 : ¬n(v, v2)
ord tree(v)        The tree is sorted:
                   ord tree(v) = (ordb[right](v) ∨ orda right(v)) ∧ (ordb[left](v) ∨ orda left(v))

Table IX. Defining formulas of instrumentation predicates related to ordering of cells.
Analysis Goals. The goal of each analysis run is to establish that a data-structure
invariant is preserved (or re-established), and that the summary obtained for each
procedure captures its effect with sufficient precision. For unsorted lists (resp.
trees), the output should be a well-formed list (resp. tree), without cell sharing,
cycles, or memory leaks. Additionally, for sorted lists (resp. trees), the output
should satisfy shape properties that define the proper ordering of cells in the data
structure. The input/output invariant that the summary of a procedure should
capture depends on the procedure. Fig. 17 shows the procedure summary computed
for the list-reversal example, which shows that the output list is composed of exactly
the same set of cells as the input list, and that for each cell, the incoming n link
has become an outgoing n link towards the same cell. For the insert and delete
examples, the summary pinpoints the inserted or deleted cell (see Fig. 13(a)). In
general, we observed that when the analysis fails to capture a precise approximation
of the summary of a procedure, the abstract memory configurations obtained at
the return site of the procedure do not establish that the expected data-structure
invariants hold.
Note that the shape property that characterizes an ordered tree is much more
complex than the shape property that characterizes an ordered list (see Tab. IX).
A list is sorted if and only if each of its cells satisfies a local shape property (namely,
that it is in the right order with respect to its immediate successor), whereas a tree
is sorted if and only if each of its cells satisfies a global shape property (namely,
that it is in the right order with respect to all of the cells in its subtrees). The resulting
more complex instrumentation predicates for trees explain, for instance, the time
difference between the spliceLeft example on unsorted trees and on sorted trees. In
the latter case, the analyzer must propagate the values of instrumentation predicates
that hold information about ordering properties.
The analysis times are quite high for procedures on sorted trees. However, the
ability to automatically infer correct summaries for procedures that manipulate
sorted trees is a major success for our technique. Indeed, other approaches to
interprocedural shape analysis have not yet tackled this challenge. For instance, the
tree analyses presented in [Rinetzky et al. 2005] do not establish that orderedness
is maintained.
Choice of Appropriate Instrumentation Predicates. The instrumentation predicates (§5.2.1) that characterize a data structure’s shape properties—such as those
defined in Tabs. V, VIII, and IX—are needed for the analysis to infer interesting
information. As soon as the data structures manipulated by the analyzed program are large enough to generate summary nodes in abstract structures, these
data structures cannot be characterized accurately without these instrumentation
predicates.
Concerning relational instrumentation predicates (see §6.4), besides the id succ[n, m1, m2] predicate, which is needed to model the identity relationship that holds at the entry of procedures, such predicates are also needed in several procedure summaries to capture crucial information about the before and after states. §8.1 discussed the predicate reverse n succ[m1, m2] that models the reversal of n links, used
in the analysis of reverse and revappend. For trees, rotate requires a similar relational predicate. The other examples in Tabs. VI and VII do not require specific
relational instrumentation predicates.
The omission of necessary instrumentation predicates quickly leads to useless
analysis results: an initial minor loss of precision generally leads to a major loss of
precision. The methodology with respect to this issue consists of checking whether
the provided instrumentation predicates allow capturing both (i) shape properties
that characterize the data structure, and (ii) the effects of the procedures in the
analyzed program. We needed some trial-and-error steps to define the appropriate
instrumentation predicates for sorted trees.
An alternative approach to the problem of choosing appropriate instrumentation
predicates would have been to use the method developed by Loginov et al. [Loginov
et al. 2005; Loginov 2006] for performing automatic abstraction refinement, using
inductive logic programming to identify candidate instrumentation predicates. We
did not attempt to use that approach in this work.

Fig. 18. Combinatorial explosion with the summary of merge: illustration with 2 lists of size 2.
Introduction of “Blurring Functions” in the Analysis. From the sorted-list examples, one can observe that the analysis time and complexity (in terms of the number
of structures representing the summary function) become high for merging and
sorting procedures. This is due to the fact that our abstraction is sometimes more
precise than necessary, and this can cause combinatorial explosion. For instance, for
the merge procedure, the abstraction remembers, for each cell of the resulting list,
whether it belonged to the first argument list or the second one. The many possible
interleavings of the first cells in the resulting lists cause a combinatorial explosion
in the result (see Fig. 18). This is all the more frustrating because this information
is rarely relevant: the properties that the summary function of procedure merge
should capture accurately are:
(1) Each cell in the result list was a cell in one of the two input lists, and vice versa.
(2) The result is a list, and it is a sorted list.
One way to limit this combinatorial explosion is to apply an extra abstraction
step at the end of the procedure. For merge,
(1) we introduce an instrumentation predicate r n fp1or2[m](v) = r[n, m, fp1](v) ∨
r[n, m, fp2](v) indicating whether a cell is reachable from one of the two list
arguments fp1 and fp2;
(2) we forget the value of n[inp](v1 , v2 ) and related predicates r[n, inp, fp1](v),
r[n, inp, fp2](v) and r[n, inp, fr1](v) on all cells reachable from the result, using
the assignment:
n[inp](v1, v2)    = (r[n, out, fr1](v1) ? 1/2 : n[inp](v1, v2))
r[n, inp, fp1](v) = (r[n, out, fr1](v) ? 1/2 : r[n, inp, fp1](v))
r[n, inp, fp2](v) = (r[n, out, fr1](v) ? 1/2 : r[n, inp, fp2](v))
r[n, inp, fr1](v) = (r[n, out, fr1](v) ? 1/2 : r[n, inp, fr1](v))
The same phenomenon holds for sorting procedures, where the main information
to be captured is that the resulting list is a sorted permutation of the input list,
but where (partial) knowledge about the applied permutation is superfluous. The
starred versions of the examples in Tabs. VI and VII refer to versions of the dataflow
equations in which such “blurring” functions are introduced to forget information
considered irrelevant.
Our methodology with respect to the introduction of blurring functions is related to the choice of instrumentation predicates, with the difference that it is
guided only by performance issues. If the existing instrumentation predicates lead
to combinatorial explosion with respect to the desired procedure summary, this motivates the application of a blurring function, together with the possible addition of
an instrumentation predicate to preserve essential information. Our experience was
that adding adequate blurring functions and related instrumentation predicates was
quite easy to do, once the origin of the combinatorial explosion was identified (either theoretically or experimentally). The main issue is to blur enough predicates; otherwise, some of them might again take on definite values after semantic reduction
via Coerce. This is because instrumentation predicates are not independent from
each other.
8.3 Improvement Brought About by the Meet Operator
Compared to [Jeannet et al. 2004], our implementation of interprocedural analysis
has been improved by the use of a precise meet operation on the abstract domain of
3-valued structures, proposed by [Arnold et al. 2006] and based on graph matching,
as mentioned in §7.2.
In [Jeannet et al. 2004], we used an approximate implementation of the meet of
two 3-valued structures, based on the conversion of one of the argument structures
to a set of constraint rules, and the application of these additional constraints to
the other argument structure using the Coerce and Focus operations, which were
briefly described in §5.2.2. The approximations came both from the conversion
to constraints and the restricted use of the Focus operation. To be exact would
require the analysis to focus (temporarily) on all predicates common to the two
3-valued structures; for efficiency reasons we decided to focus only on predicates
that represent pointer variables. The method was still rather inefficient:
— The conversion of a 3-valued structure to a 3-valued logical formula and then
to constraint rules generates many rules, in particular due to the restricted
syntax allowed for such rules in TVLA. Given a 3-valued structure of size n on
a vocabulary of size p1 + p2 , where pi is the number of predicates of arity i, the
number of generated constraint rules is in O(n · p1 + n² · p2).
— The Coerce operation is the most expensive operation in the TVLA implementation.12

                                            Old meet                New meet
Program                                     # of structs  Time (s)  # of structs  Time (s)
programs on unsorted lists (recursive version)
reverse  destructive list reversal          7             4.7       5/4           2.4
insert  inserts a cell at a random place
  in a list                                 23            82        10/7          4.2
delete  removes a cell at a random place
  in a list                                 32            84        14/10         4.9

Table X. Comparison of the use of resources with the old and new meet operators. Analysis with the old meet does not include the creation of input lists. It also requires 2 additional instrumentation predicates for the insert and delete examples, due to the approximation induced by the meet.

                                            Normal                  Accelerated
Program                                     # of structs  Time (s)  # of structs  Time (s)
a. programs on sorted lists
tailsort* (create,insert)                   23/3          15        18/3          8.8
insertionsort* (create,insert)              65/3          35        65/3          27
mergesort* (create,split,merge)             69/3          113       48/3          40
b. programs on sorted trees
createo* (insert)                           16/7          654       14/8          250

Table XI. Interprocedural analysis method.
The gains obtained by the use of the precise meet operation of [Arnold et al.
2006] are illustrated in Tab. X for a few simple examples. The gain in efficiency is
impressive, but the gain in precision is also important: for the insert and delete
examples, we did not need to introduce specific instrumentation predicates to capture the effect of these procedures (the triangular pattern shown in Fig. 13(a)). This
precision issue prevented us from experimenting with the old meet implementation
on our full set of examples.
8.4 Speeding up the Analysis by Modifying the Equations
In §7.3, we discussed the possibility of speeding up the convergence of the analysis
by injecting (a subset of) likely reachable states at the start nodes of procedures,
which may reduce the number of iteration steps needed for reaching a fixpoint.
We experimented with this technique on programs that consist of several recursive
procedures: we injected the set of all well-formed (and possibly ordered) lists at the
12 It should be noted that the inefficiency of the Coerce operation was a general problem in
past versions of the TVLA system. It motivated the work reported in [Arnold 2006], which obtained substantial speedups by replacing pairs of Focus/Coerce operations by the meet operation
whenever possible. It also motivated the work of Bogudlov et al. [2007b; 2007a] who developed
techniques that allowed Coerce to run over an order of magnitude faster. These techniques were
not incorporated into the version of TVLA that implements the methods described in the present
paper. The methods of Bogudlov et al. are essentially orthogonal to the ones that we developed, and thus the speed-ups that would be obtained by incorporating their techniques into our
implementation should be comparable to the speed-ups reported in [Bogudlov et al. 2007b; 2007a].
void main(){
    List list = create();
    List acc = NULL;
    List res = insertionsort(list,acc);
}

List insertionsort(List list, List acc){
    List res,t,tt;
    if (list==NULL)
        res = acc;
    else {
        t = list->n; list->n = NULL;
        tt = insert(acc,list);
        res = insertionsort(t,tt);
    }
    return res;
}

List insert(List list, List cell){
    List res,t;
    if (list!=NULL && cell->key > list->key){
        t = list->n; list->n = NULL;
        res = insert(t,cell);
        list->n = res;
        res = list;
    }
    else {
        if (list==NULL) cell->n = NULL;
        else cell->n = list;
        res = cell;
    }
    return res;
}

Fig. 19. Insertionsort example.
entry of all list-manipulating procedures. The results are given in Tab. XI, and show
that in such cases this technique is very effective. Note that in the examples from
Tab. XI, all procedures are recursive, which leads to more complex dependences
than in the example shown in Fig. 16.
For the insertionsort example depicted in Fig. 19, with the standard technique, insertionsort, and then insert, will first be called with the argument acc=NULL. insert will be analyzed for this base case; then insertionsort will be called with a one-element list in acc, which will later be propagated through the body of insert. It therefore takes several steps to infer that insert might be called with any sorted list. Injecting the set of all sorted lists at the entry of insert allows the complete summary of insert to be computed more quickly, which in turn speeds up the propagation of the call to insert in insertionsort.
8.5 Comparison with Cutpoint Semantics and Tabulated Representation
In this section, we compare our method and experimental results with those of
Rinetzky et al. [2005]. The two methods are built on the same abstract domain
of 3-valued logical structures, and both implement a context- and flow-sensitive
interprocedural shape analysis based on procedure summarization.
The effective reuse of procedure summaries in different calling contexts motivated
the development of mechanisms to allow parts of the heap that are not relevant to
the procedure’s actions to be ignored [Rinetzky et al. 2005]. Rinetzky et al. [2005]
use a tabulated representation—i.e., using pairs of abstracted structures, rather
than abstractions of paired structures—to capture summaries of procedures, and
the notion of cutpoints is used to eliminate details of the heap that are inessential to
the callee, thereby permitting procedure summaries to be used in different calling
contexts.13 In §9, we describe the similarities and differences between the methods
more thoroughly, and discuss how a similar effect of eliminating details is obtained
essentially for free with our approach.
13 When control is passed from the caller to the callee, the cutpoints represent the frontier of
vertices in the part of the heap visible to the callee that are reachable from the caller’s pointer
variables (or from pointer variables of other procedures further back in the stack). During the
execution of the callee, it is necessary to track all nodes that are reachable from the local variables,
the global variables, and the cutpoints, but other parts of the heap structure can be removed.
Program          Cutpoint-based   Relational std.   Relational *
a. (Sorted) list-manip. programs
create           8                8                 8
insert           46               10                10
delete           46               35                35
reverse          32               3                 3
revappend        47               7                 7
merge            83               161               45
insertionsort    265              >1000             35
tailsort         65               135               15
mergesort        576              >1000             113
b. (Unsorted) tree-manip. programs
create           10               61                14
insert           25               —                 47
find             67               —                 100
removeRoot       49               —                 63
remove           114              —                 593
spliceLeft       26               —                 47
rotate           43               —                 64

Table XII. Times for cutpoint-based analysis vs. relational analysis. All times are in seconds.
(The columns labeled * report the times for analysis runs in which blurring functions are applied.
A long dash (—) means that the run was not attempted.)
Tab. XII compares the two analyses on a set of examples. We followed [Rinetzky
et al. 2005] by not analyzing procedures in isolation, but instead analyzing a full
program from scratch. This means that the analysis time for the mergesort example
includes the analysis time for the creation of a list (create), as well as the auxiliary
procedures (split and merge). Both analyses were executed on the same computer,
using the same version of the TVLA system (with the exception of the additions
mentioned at the very beginning of this section).
Sorted-List Examples. For this set of examples, the relational method is generally
as efficient as, or more efficient than, the cutpoint-based method. It should be
noted, however, that the relational method sometimes requires the application of
blurring functions to obtain reasonable performance, but in such cases the gain in
performance is significant, even with respect to the cutpoint-based analysis. The
latter is somewhat surprising because the cutpoint-based analysis tabulates pairs
of structures, and, as discussed in §6.2 and illustrated in Fig. 13, the information
computed by the relational analysis is much more precise than the information that
is computed when a tabulated representation is used.
Unsorted-Tree Examples. Because [Rinetzky et al. 2005] did not try to analyze
examples with sorted trees, the experiments dealt only with unsorted trees: the
ordering relation between tree cells is abstracted away. The execution times are
better for the cutpoint-based method, but remain of the same order of magnitude,
with the exception of the remove example. The latter example demonstrates that
the advantages of the relational analysis in terms of precision can have a cost in
terms of efficiency, even when blurring functions (§8.2) are applied. In the case
of the remove procedure, the extra precision of the relational analysis causes the
number of cases that the analyzer has to consider to increase: in particular, the set
of output two-vocabulary structures at the exit node of remove (i.e., the procedure
summary for remove) relates an output tree—in which a cell has been removed—
to an input tree—which contains the cell. Consequently, for essentially the same
output tree, the analyzer ends up enumerating a number of different two-vocabulary
structures according to the different possible positions in the input tree of the cell
to be removed: the cell to be removed is the root; the cell to be removed is a left or
right child of the root; or the cell to be removed is one that lies deeper in the left
or right subtree of the root cell.
9. RELATED WORK
General approaches to interprocedural analysis
One can distinguish two main approaches to interprocedural static analysis. The
first approach, called the functional approach (after the name used in [Sharir and
Pnueli 1981]), uses a denotational semantics of the analyzed program and consists
of two steps. The first step computes predicate transformers associated with the
procedures of the program by finding a fixpoint of a set of equations over predicate
transformers. The operations used in these equations are (primarily) transformer
composition and transformer join. The second step (repeatedly) applies a composed
predicate transformer for a program path to some predicate that characterizes the
possible input states, to obtain a predicate that holds at the end of the path.
[Cousot and Cousot 1977; Sharir and Pnueli 1981; Knoop and Steffen 1992] apply
this approach to different classes of programs.
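For intuition, the two steps of the functional approach can be sketched on a toy finite-state program in which transformers are represented extensionally, as relations over states. This is an illustrative sketch only; the function names and the toy program are ours, not taken from any of the cited analyses:

```python
# Sketch of the functional approach over a finite state space:
# transformers are relations, i.e., sets of (input, output) state pairs.

def compose(r1, r2):
    """Relational composition: run r1, then r2."""
    return {(a, c) for (a, b1) in r1 for (b2, c) in r2 if b1 == b2}

def join(r1, r2):
    """Transformer join: union of behaviors (e.g., of two branches)."""
    return r1 | r2

def apply_transformer(r, states):
    """Step 2: apply a summary transformer to a set of input states."""
    return {b for (a, b) in r if a in states}

# Toy procedure over a single integer variable on states {0,1,2,3}:
# "if x < 2: x := x + 1" followed by "x := x * 2".
states = range(4)
incr_if_small = {(x, x + 1) if x < 2 else (x, x) for x in states}
double = {(x, x * 2) for x in states}

summary = compose(incr_if_small, double)           # step 1: the summary
print(sorted(apply_transformer(summary, {0, 2})))  # step 2 -> [2, 4]
```

In a real analysis the relations are infinite and must themselves be abstracted, which is precisely where the abstract domain (and, in this paper, abstracted two-vocabulary structures) enters.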
The second approach, which we call the operational approach, adopts an operational semantics for programs. Here, as in many intraprocedural verification techniques, the predicates are propagated along the edges of the program’s control-flow
graph, using the predicate transformers associated with program statements and
conditions, until a fixpoint is reached. The analysis can be viewed as a symbolic execution of the program in which values are replaced by properties. In contrast with
the functional approach, there is no computation of (composed) predicate transformers associated with blocks of instructions or procedures. However, to simulate
the execution of the program, one needs to take into account the program’s call
stack: when a procedure returns to its caller, the call site should be popped from
the stack and the local state of the caller should be restored to the state that it had
before the call. The “call-strings” approach of [Sharir and Pnueli 1981] provides one
way to address this issue, by maintaining additional information in the abstract
domain to over-approximate the state of the call stack.
Techniques based on pushdown systems [Bouajjani et al. 1997; Finkel et al. 1997]
and weighted pushdown systems [Bouajjani et al. 2003; Reps et al. 2005] contain
elements of both the functional and operational approaches. Jeannet and Serwe
[Jeannet and Serwe 2004] show how the functional and operational approaches can
be derived as an abstract interpretation of the standard operational semantics,
modeled using a stack of activation records. Once the interprocedural semantics is
defined in this way, a second abstraction step may be used to abstract the data (in
our case, the values of variables and linked memory cells). This is the approach we
followed in §7.1, with the variations described in §7.3.
Interprocedural shape analysis
Several other papers have studied interprocedural shape analysis using canonical
abstraction. In [Rinetzky and Sagiv 2001], the store is augmented to include the
runtime stack as an explicit data structure. The storage abstraction used in [Rinetzky and Sagiv 2001] is an abstraction of the store augmented in this fashion. In
essence, the collection of activation records that form the stack are abstracted using
an abstraction for linked lists. This “stack-materialization” approach causes certain
technical complications; they are not insurmountable, but do cause the designer of
an abstract interpretation to have to identify certain shape properties that relate
the state of the stack and the state of the heap during the execution of the program
(in particular, how the heap cells reachable from the visible and invisible instances
of local variables are related). This approach is reminiscent of the “call-strings”
approach; in contrast, the approach used in the present paper was inspired by the
functional approach, in which the stack is not materialized as an explicit data structure; instead it is an implicit part of the programming-language semantics. Thus,
the designer of an abstract interpretation does not need to be concerned with the
“shape” of the runtime stack nor with such things as visible and invisible instances
of local variables. Because of the different nature of the information obtained,
[Rinetzky and Sagiv 2001] can only show that a list reversed twice yields a list with
the same head and the same set of memory cells (in some order) as the initial list,
while our method shows that it yields the same initial list.
As mentioned in §8.5, [Rinetzky et al. 2005] implements a context- and flow-sensitive analysis that is also inspired by the functional approach, but which uses
tabulation to represent the summaries of procedures. The effective reuse of procedure summaries in different calling contexts is made possible by using the notion
of cutpoints and by considering cutpoint-free programs.14 As with [Rinetzky and Sagiv 2001], the resulting analysis is less precise than ours because their tabulated representation is less expressive than our relational representation, in which
one can track (an approximation of) the evolution of individual objects. It may
perform more efficiently, particularly on trees, but it has not yet been applied to
ordered trees, where the ingredients of the invariants that such trees satisfy need to be tracked. Our approach has the benefit of generality (it is not restricted to
cutpoint-free programs) and conceptual simplicity: it reuses the same algorithms
as the intraprocedural analysis, and relational composition is performed using the
standard notions of intersection and elimination.
A method for performing interprocedural shape analysis using procedure specifications and assume-guarantee reasoning is presented in [Yorsh et al. 2004]. There,
it is assumed that a specification for each procedure—a pre- and post-condition—is
already known; the technique presented in [Yorsh et al. 2004] can be used to interpret a procedure’s pre- and post-condition in the most precise way (for a given
abstraction). For every procedure invocation, one checks if the current abstract
value potentially violates the precondition; if it does, a warning is produced. At
the point immediately after the call, one can assume that the post-condition holds.
Similarly, when a procedure is analyzed, the pre-condition is assumed to hold on entry, and at the end of the procedure the post-condition is checked. The work described
in the present paper is complementary to [Yorsh et al. 2004]: our work provides a
way to identify procedure specifications (in the form of sets of 2-vocabulary 3-valued
structures) that can be used with the method from [Yorsh et al. 2004].
14 In cutpoint-free programs [Rinetzky et al. 2005], the nodes pointed to by a caller's parameters always dominate the nodes that are reachable from the caller's pointer variables (or from pointer variables of other procedures further back in the stack).

Several techniques have been suggested for automatically checking the partial correctness of programs annotated with loop invariants and pre- and post-conditions
[Møller and Schwartzbach 2001; Berdine et al. 2005; Lahiri and Qadeer 2008]. Compared to our approach to shape analysis, those techniques can be faster; in particular, annotations can drastically reduce the cost of interprocedural shape analysis
because they allow the correctness of a set of procedures to be checked modularly,
using a linear pass over each procedure’s body. However, the burden of requiring
programmers to express loop invariants and the required pre- and post-conditions
is much higher than the effort required for providing adequate instrumentation
predicates in our method.
A recent approach to interprocedural shape analysis is based on separation logic,
which has been designed for performing context-independent reasoning about memory shapes [Gotsman et al. 2006]. That approach has to take care of cutpoints, and to abstract them if too many cutpoints appear in the course of the analysis. In our method, the caller's pointer variables that point to cutpoints are forgotten in the callee, but are recovered upon return from the callee thanks to the meet operation in Eqn. (14).
Cutpoints were also used to develop interprocedural shape-analysis algorithms that
are not based on canonical abstractions [Marron et al. 2008]. We believe that the
principles that underlie our relational analysis (i.e., the use of abstractions of two-vocabulary structures) are also applicable to other abstractions, as long as they
support the right interface operations (e.g., projection and meet).
“Heap Modularity”
Both [Rinetzky et al. 2005] and [Gotsman et al. 2006] state that their techniques are
fully “heap modular” in the sense that the procedure summaries computed by the
analyses deal only with the reachable parts of the heap and ignore the (unreachable)
context of the caller in the callee, which cannot be modified by the callee.
This effect is obtained naturally with our approach. Because most core and
instrumentation predicates are related to reachability from visible variables, the part
of the heap that is not reachable from the local variables in a callee is summarized
with a single (or a few) “context” summary nodes. When the callee returns to
its caller, this context summary node is materialized again by the meet of the
summary relation at the return site of the callee and the relation at the call site. In
fact, because some predicates are independent of reachability properties (predicates
related to cyclicity or sharing), there may be several context summary nodes. In
such cases, at the entry of the callee, a predicate-update formula (cf. §4.3) may be
used to assign the value 1/2 to those predicates for non-reachable cells, as follows:
p′(v) = reachable_from_input_parameters(v) ? p(v) : 1/2
This induces a more effective merging of abstract cells not reachable in the callee
(hence not modifiable by the callee). The information is recovered during the
processing at the procedure return site.
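The effect of such an update formula can be sketched on a toy encoding of 3-valued structures, where each unary predicate maps abstract cells to truth values in {0, 1/2, 1}. The encoding, the function name, and the example cells are ours, chosen only for illustration; for simplicity the reachability predicate is assumed to be definite (0 or 1) here:

```python
from fractions import Fraction

HALF = Fraction(1, 2)  # the "unknown" truth value of 3-valued logic

def blur_unreachable(pred, reachable):
    """Apply p'(v) = reachable(v) ? p(v) : 1/2 to a unary predicate,
    both given as maps from abstract cells to 3-valued truth values."""
    return {v: (pred[v] if reachable[v] == 1 else HALF) for v in pred}

reachable = {"u1": 1, "u2": 1, "ctx": 0}   # "ctx" is a caller-context cell
cyclic = {"u1": 0, "u2": 0, "ctx": 1}      # predicate independent of reachability

# After blurring, 'ctx' carries 1/2 for cyclic, so context cells that
# differ only on such predicates can be merged into one summary node.
print(blur_unreachable(cyclic, reachable))
```

The merging becomes possible because canonical abstraction identifies cells with identical abstract predicate values; forcing unreachable cells to 1/2 on reachability-independent predicates removes the distinctions that would otherwise keep them apart.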
Abstract transformers
The analysis described in this paper uses 3-valued structures over a doubled vocabulary. A similar approach is standard when concrete transition relations are
expressed by means of formulas. For instance, the semantics of a statement x := y + 1 can be expressed as (x′ = y + 1) ∧ (y′ = y). Statements such as x := y + 1 can be
transformed into composable abstract transformers for programs that manipulate
numeric data, using several numeric lattices (e.g., polyhedra [Cousot and Halbwachs 1978], octagons [Miné 2006], etc.). A key feature of the approach described
in the present paper is that relational instrumentation predicates can refer to both
the P[inp] and P[out] vocabularies. For instance, the family of unary predicates
reverse_n_succ[m1, m2] discussed in §8 (with m1, m2 ∈ {inp, out} and m1 ≠ m2) records whether n[m2] is an inverse of n[m1].
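Concretely, a doubled-vocabulary formula denotes a relation between input and output stores, and once materialized over a finite store space it behaves like any other relation. A minimal sketch (the encoding of stores as dictionaries and the function name are ours):

```python
# The statement x := y + 1 as a doubled-vocabulary formula:
# (x' = y + 1) and (y' = y), read as a predicate on (store, store') pairs.

def stmt_x_gets_y_plus_1(s, s_prime):
    return s_prime["x"] == s["y"] + 1 and s_prime["y"] == s["y"]

# Materialize the relation over a small finite space of stores.
stores = [{"x": x, "y": y} for x in range(4) for y in range(3)]
rel = {(tuple(sorted(s.items())), tuple(sorted(t.items())))
       for s in stores for t in stores if stmt_x_gets_y_plus_1(s, t)}

# Each of the 12 input stores has exactly one successor store here.
print(len(rel))  # -> 12
```

The point of the relational instrumentation predicates is that they constrain such (input, output) pairs directly, rather than describing each vocabulary in isolation.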
The classic functional approach of Sharir and Pnueli [Sharir and Pnueli 1981] uses
function composition for all operations. As is typically done in analyses based on numerical abstract domains [Cousot and Halbwachs 1978; Miné 2006], the approach
taken in this paper might be more properly described as a hybrid approach:
(1) Intraprocedural propagation is based on a form of transformer application,
rather than transformer composition. That is, for an intraprocedural propagation with respect to transformer τ , the actions of τ are applied to the second
vocabulary, with the first vocabulary kept constant.
(2) Interprocedural propagation is based on the composition of two-vocabulary
structures (using three-vocabulary structures, structure meet, and vocabulary
projection).
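In the concrete, the composition used for interprocedural propagation can be illustrated on plain finite relations: lift both two-vocabulary relations to a shared three-vocabulary space, take their intersection (playing the role of the meet), and project away the middle vocabulary. This is a sketch under simplifying assumptions — the paper's domain applies the same scheme to abstracted 3-valued structures, not to sets of pairs:

```python
# Composition of two-vocabulary relations via a three-vocabulary
# intermediate: lift, intersect (the "meet"), then project out the
# middle vocabulary.

def compose_via_meet(r, s, universe):
    lifted_r = {(a, b, c) for (a, b) in r for c in universe}  # constrains (inp, mid)
    lifted_s = {(a, b, c) for a in universe for (b, c) in s}  # constrains (mid, out)
    meet = lifted_r & lifted_s                                # intersection = meet
    return {(a, c) for (a, _, c) in meet}                     # project out mid

u = {0, 1, 2}
r = {(0, 1), (1, 2)}
s = {(1, 1), (2, 0)}
print(sorted(compose_via_meet(r, s, u)))  # -> [(0, 1), (1, 0)]
```

On abstract elements the intersection is replaced by the domain's meet operation, which makes the composition an over-approximation rather than an exact relational join.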
For shape analysis, the advantage of the hybrid approach has to do with the maintenance of instrumentation predicates that express reachability properties. The
application step used in item (1) is satisfactory when there are unit-size changes
to core relations: the instrumentation-predicate-maintenance formulas created by
finite differencing [Reps et al. 2003] are generally able to maintain definite values for instrumentation predicates that express reachability properties for unit-size
changes to core predicates. The (approximate) composition step used in item (2)
generally allows definite values to be retained under the non-unit-size changes to
core predicates that occur when applying a procedure summary.
Acknowledgments. We are grateful to V. Kuncak for several discussions about
the use of two-vocabulary structures in shape analysis; to N. Rinetzky for many
discussions about interprocedural shape-analysis methods, as well as for his help
with the experiments that compare our methods with his; and to G. Arnold for his
help incorporating his work on the meet operation into our implementation.
REFERENCES
Arnold, G. 2006. Specialized 3-valued logic shape analysis using structure-based refinement and
loose embedding. In Static Analysis Symposium, SAS’06. LNCS, vol. 4134.
Arnold, G., Manevich, R., Sagiv, M., and Shaham, R. 2006. Combining shape analyses by
intersecting abstractions. In Int. Conf. on Verification, Model Checking and Abstract Interpretation, VMCAI’06. LNCS, vol. 3855.
Ball, T. and Rajamani, S. 2001. Bebop: A path-sensitive interprocedural dataflow engine. In
Prog. Analysis for Softw. Tools and Eng. 97–103.
Berdine, J., Calcagno, C., and O’Hearn, P. W. 2005. Smallfoot: Modular automatic assertion
checking with separation logic. In FMCO’05. LNCS, vol. 4111. Springer, 115–137.
Bogudlov, I., Lev-Ami, T., Reps, T., and Sagiv, M. 2007a. Revamping TVLA: Making parametric shape analysis competitive. Tech. Rep. TR-2007-01-01, Tel-Aviv Univ., Tel-Aviv, Israel.
Bogudlov, I., Lev-Ami, T., Reps, T., and Sagiv, M. 2007b. Revamping TVLA: Making parametric shape analysis competitive (tool paper). In Int. Conf. on Computer Aided Verif. LNCS,
vol. 4590.
Bouajjani, A., Esparza, J., and Maler, O. 1997. Reachability analysis of pushdown automata:
Application to model checking. In Proc. CONCUR. LNCS, vol. 1243. Springer-Verlag, 135–150.
Bouajjani, A., Esparza, J., and Touili, T. 2003. A generic approach to the static analysis of
concurrent programs with procedures. In Princ. of Prog. Lang. 62–73.
Clarke, Jr., E., Grumberg, O., and Peled, D. 1999. Model Checking. The M.I.T. Press.
Cousot, P. and Cousot, R. 1977. Static determination of dynamic properties of recursive
procedures. In Formal Descriptions of Programming Concepts, E. Neuhold, Ed. North-Holland,
237–277.
Cousot, P. and Halbwachs, N. 1978. Automatic discovery of linear constraints among variables
of a program. In Princ. of Prog. Lang. 84–96.
Finkel, A., Willems, B., and Wolper, P. 1997. A direct symbolic approach to model checking
pushdown systems. Elec. Notes in Theor. Comp. Sci. 9.
Gopan, D., DiMaio, F., Dor, N., Reps, T., and Sagiv, M. 2004. Numeric domains with summarized dimensions. In Tools and Algs. for the Construct. and Anal. of Syst. LNCS, vol. 2988.
512–529.
Gotsman, A., Berdine, J., and Cook, B. 2006. Interprocedural shape analysis with separated
heap abstractions. In Static Analysis Symp. LNCS, vol. 4134. 240–260.
Gries, D. 1981. The Science of Programming. Springer-Verlag.
Jeannet, B., Loginov, A., Reps, T., and Sagiv, M. 2004. A relational approach to interprocedural shape analysis. In Static Analysis Symp. LNCS, vol. 3148.
Jeannet, B. and Serwe, W. 2004. Abstracting call-stacks for interprocedural verification of
imperative programs. In Algebraic Methodology and Software Technology, AMAST’04. LNCS,
vol. 3116.
Knoop, J. and Steffen, B. 1992. The interprocedural coincidence theorem. In Comp. Construct.
LNCS, vol. 641. 125–140.
Lahiri, S. K. and Qadeer, S. 2008. Back to the future: Revisiting precise program verification
using SMT solvers. In Princ. of Prog. Lang.
Lev-Ami, T., Reps, T., Sagiv, M., and Wilhelm, R. 2000. Putting static analysis to work for
verification: A case study. In Int. Symp. on Softw. Testing and Analysis. 26–38.
Lev-Ami, T. and Sagiv, M. 2000. TVLA: A system for implementing static analyses. In Static
Analysis Symp. LNCS, vol. 1824. 280–301.
Loginov, A. 2006. Refinement-based program verification via three-valued-logic analysis. Ph.D.
thesis, Comp. Sci. Dept., Univ. of Wisconsin, Madison, WI. Tech. Rep. 1574.
Loginov, A., Reps, T., and Sagiv, M. 2005. Abstraction refinement via inductive learning. In
Int. Conf. on Computer Aided Verif. LNCS, vol. 3576.
Manna, Z. and Pnueli, A. 1995. Temporal Verification of Reactive Systems: Safety. Springer-Verlag.
Marron, M., Hermenegildo, M. V., Kapur, D., and Stefanovic, D. 2008. Efficient context-sensitive shape analysis with graph-based heap models. In Comp. Construct. LNCS, vol. 4959.
245–259.
Miné, A. 2006. The octagon abstract domain. Higher-Order and Symbolic Computation 19, 1,
31–100.
Møller, A. and Schwartzbach, M. I. 2001. The pointer assertion logic engine. In Prog. Lang.
Design and Impl. 221–231.
Reps, T., Horwitz, S., and Sagiv, M. 1995. Precise interprocedural dataflow analysis via graph
reachability. In Princ. of Prog. Lang. ACM Press, New York, NY, 49–61.
Reps, T., Sagiv, M., and Loginov, A. 2003. Finite differencing of logical formulas for static
analysis. In European Symp. on Programming. LNCS, vol. 2618. 380–398.
Reps, T., Schwoon, S., Jha, S., and Melski, D. 2005. Weighted pushdown systems and their
application to interprocedural dataflow analysis. Sci. of Comp. Prog. 58, 1–2 (Oct.), 206–263.
Rinetzky, N., Bauer, J., Reps, T., Sagiv, M., and Wilhelm, R. 2005. A semantics for procedure
local heaps and its abstraction. In Proc. of the 32nd ACM SIGPLAN–SIGACT Symposium
on Principles of Programming Languages (POPL'05).
Rinetzky, N. and Sagiv, M. 2001. Interprocedural shape analysis for recursive programs. In
Comp. Construct. LNCS, vol. 2027. 133–149.
Rinetzky, N., Sagiv, M., and Yahav, E. 2005. Interprocedural shape analysis for cutpoint-free
programs. In Static Analysis Symposium, SAS’05. LNCS, vol. 3672.
Sagiv, M., Reps, T., and Horwitz, S. 1996. Precise interprocedural dataflow analysis with
applications to constant propagation. Theor. Comp. Sci. 167, 131–170.
Sagiv, M., Reps, T., and Wilhelm, R. 2002. Parametric shape analysis via 3-valued logic. Trans.
on Prog. Lang. and Syst. 24, 3, 217–298.
Sharir, M. and Pnueli, A. 1981. Two approaches to interprocedural data flow analysis. In
Program Flow Analysis: Theory and Applications, S. Muchnick and N. Jones, Eds. Prentice-Hall, Englewood Cliffs, NJ, Chapter 7, 189–234.
Yorsh, G., Reps, T., and Sagiv, M. 2004. Symbolically computing most-precise abstract operations for shape analysis. In Tools and Algs. for the Construct. and Anal. of Syst. LNCS, vol.
2988. 530–545.