Overview

The purpose of the PCG Analysis is to provide clients with the following:

The PCG data structure representing the state of ownership and borrowing of Rust places at arbitrary program points within a Rust function
For any pair $pp_{i}, pp_{j}$ of consecutive program points, an ordered list of actions that describe the transformation of the PCG of $pp_{i}$ to the PCG of $pp_{j}$.

PCG Analysis Algorithm

Key Concepts:

The PCG Analysis Algorithm operates on the MIR Body of a Rust function and returns a PCG Analysis of the function.
A PCG Analysis contains PCG Data for every reachable¹ MIR location in the Body².
The PCG Data is a tuple of PCG States and PCG Action Data.

The PCG States of a MIR location is a list of the PCGs computed at that location, concretely:

An Initial PCG, followed by
One PCG for every PCG evaluation phase

The PCG Action Data of a MIR location contains a list of PCG Actions for each evaluation phase in the location (i.e. the actions performed at that phase).
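The shape of the PCG Data described above can be sketched as follows. This is only an illustration: the type names `PcgStates`, `PcgActionData` and `PcgData`, the generic parameters `G` (a PCG) and `A` (a PCG action), and the accessor methods are assumptions made for this sketch and are not taken from the implementation.

```rust
/// The four PCG evaluation phases, in evaluation order.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EvalPhase {
    PreOperands,
    PostOperands,
    PreMain,
    PostMain,
}

/// Hypothetical container for the PCG States of one MIR location:
/// an initial PCG followed by one PCG per evaluation phase.
#[derive(Clone, Debug, PartialEq)]
pub struct PcgStates<G> {
    pub initial: G,
    pub per_phase: [G; 4],
}

/// Hypothetical container for the PCG Action Data of one MIR location:
/// the list of actions performed at each evaluation phase.
#[derive(Clone, Debug, PartialEq)]
pub struct PcgActionData<A> {
    pub per_phase: [Vec<A>; 4],
}

/// PCG Data: the pair of PCG States and PCG Action Data.
#[derive(Clone, Debug, PartialEq)]
pub struct PcgData<G, A> {
    pub states: PcgStates<G>,
    pub actions: PcgActionData<A>,
}

impl<G, A> PcgData<G, A> {
    /// The PCG computed for the given phase.
    pub fn state_at(&self, phase: EvalPhase) -> &G {
        &self.states.per_phase[phase as usize]
    }

    /// The actions performed at the given phase.
    pub fn actions_at(&self, phase: EvalPhase) -> &[A] {
        &self.actions.per_phase[phase as usize]
    }
}
```

A client interested in the transformation between two consecutive program points would read the actions of the later phase and apply them to the state of the earlier one.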

The PCG Analysis Algorithm is implemented as a MIR dataflow analysis using PcgDomainData as the domain. PcgDomainData contains a PCGData value and other relevant metadata (e.g. the associated basic block). Notably, the analysis analyzes the statements in each basic block only once. Conceptually, this property holds because the final state at a loop head is computed upon entering it in the analysis (without having first seen the body).

We note that the behaviour of the join operation on PcgDomainData requires tracking which blocks have previously been joined (this is a consequence of the interface of the MIR dataflow analysis). The PcgDomainData join operation joins the PCG $G^{'}$ of block $b^{'}$ into the PCG $G$ at block $b$ as follows:

If no block has ever been joined into $b$ , then set $G$ = $G^{'}$
If the edge from $b^{'}$ to $b$ is not a back edge³ of a loop, then $G^{'}$ is joined into $G$ using the algorithm defined here

Because the join does not modify the PCG for back edges, the analysis can be completed without ever having to re-analyse the statements within a block.
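The join rules above can be sketched as follows. This is a simplified illustration, not the implementation: `Pcg` is a hypothetical stand-in that models a PCG as a set of edge names, and set union stands in for the actual PCG join algorithm. The `is_back_edge` flag is assumed to be computed by the caller.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a PCG: a set of edge names.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct Pcg(pub BTreeSet<&'static str>);

/// Per-block state: the PCG plus the predecessors joined so far.
pub struct BlockState {
    pub pcg: Pcg,
    pub joined_from: BTreeSet<usize>,
}

/// Joins `incoming` (the PCG of block `from`) into `state`.
/// Returns true iff the PCG stored in `state` changed.
pub fn join(state: &mut BlockState, from: usize, incoming: &Pcg, is_back_edge: bool) -> bool {
    // Case 1: nothing has been joined into this block yet, so copy the PCG.
    if state.joined_from.is_empty() {
        state.joined_from.insert(from);
        state.pcg = incoming.clone();
        return true;
    }
    state.joined_from.insert(from);
    // Back edges never modify the PCG, so the loop body is not re-analysed.
    if is_back_edge {
        return false;
    }
    // Case 2: an ordinary edge. Union is only a placeholder for the PCG join.
    let before = state.pcg.0.len();
    state.pcg.0.extend(incoming.0.iter().copied());
    state.pcg.0.len() != before
}
```

Because the back-edge case reports no change, a worklist-based dataflow driver would never re-enqueue the loop head on account of the loop body.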

Our implementation should also be checking that the PCG generated at the loop head is valid w.r.t the state at the back edge here, but this is not happening yet.

PCG Data Structure

The PCG data structure represents the state of ownership and borrowing of Rust places at an arbitrary program point.

It consists of three components:

The Owned PCG, which describes the state of owned places
The Borrow State, which describes borrowed memory (borrowed places and lifetime projections) and borrow relations, and also some auxiliary data structures
The Place Capabilities which is a map from places to capabilities

Owned PCG

The Owned PCG is a forest, where the root of each tree in the forest is a MIR Local.

Each tree in the forest is defined by a set of place expansions, which describe how far each owned place is unpacked. For each expansion $\overline{p}$ in the set, the base $p$ is a node in the forest and $\overline{p}$ are its children. We note that each expansion can equivalently be interpreted as a hyperedge ${p} \to \overline{p}$.

Each local in the body of a MIR function is either unallocated or allocated in the Owned PCG. A local is allocated iff there exists a corresponding root node in the Owned PCG, otherwise it is unallocated.

The operation allocate $v$ in the Owned PCG requires that $v$ is not allocated, and adds a new root for $v$ . The deallocate $v$ operation removes the forest rooted at $v$ .
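A minimal sketch of the Owned PCG as a forest keyed by local, with the allocate and deallocate operations described above. The names `OwnedPcg`, `Place` and `expand`, and the representation of projections as lists of field names, are assumptions made for illustration only.

```rust
use std::collections::BTreeMap;

pub type Local = usize;

/// Hypothetical place representation: a local plus a list of field names.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Place {
    pub local: Local,
    pub projection: Vec<&'static str>,
}

/// One tree per allocated local; each tree is a set of expansions
/// mapping a base place to its children.
#[derive(Debug, Default)]
pub struct OwnedPcg {
    trees: BTreeMap<Local, BTreeMap<Place, Vec<Place>>>,
}

impl OwnedPcg {
    /// A local is allocated iff it has a root node.
    pub fn is_allocated(&self, v: Local) -> bool {
        self.trees.contains_key(&v)
    }

    /// Requires that `v` is not allocated; adds a new root for `v`.
    pub fn allocate(&mut self, v: Local) -> Result<(), String> {
        if self.is_allocated(v) {
            return Err(format!("local _{} is already allocated", v));
        }
        self.trees.insert(v, BTreeMap::new());
        Ok(())
    }

    /// Removes the whole tree rooted at `v`.
    pub fn deallocate(&mut self, v: Local) {
        self.trees.remove(&v);
    }

    /// Records the expansion `base -> children`. The base must already be a
    /// node: either the root or a child of an earlier expansion.
    pub fn expand(&mut self, base: Place, children: Vec<Place>) -> Result<(), String> {
        let tree = match self.trees.get_mut(&base.local) {
            Some(tree) => tree,
            None => return Err(format!("local _{} is unallocated", base.local)),
        };
        let is_node = base.projection.is_empty()
            || tree.values().any(|kids| kids.contains(&base));
        if !is_node {
            return Err(format!("{:?} is not a node of the tree", base));
        }
        tree.insert(base, children);
        Ok(())
    }
}
```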

Borrow State

The Borrow State is a tuple containing a set of Validity Conditions that describes the set of paths leading to the current block and a Borrows Graph, a directed acyclic hypergraph which describes borrowed places, sets of borrows, and their relation to one another.

In our implementation the Borrows Graph is represented as a map from PCG hyperedges to validity conditions.

Because a borrow created within a block exists only for executions that visit that block, we label new borrows using the validity conditions of the block in which they were created.

Place Capabilities

The design for how place capabilities are represented and computed is being updated. In the new design, capabilities are computed from the initialisation state and borrow state rather than stored in an explicit map. See Computing Place Capabilities for details.

Place capabilities $C$ is a partial map from places to capabilities.

A MIR location $l$ is reachable iff its basic block $b_{l}$ is reachable from the start block in the CFG without traversing unwind or fake edges (the latter kind do not correspond to the actual control flow of the function). The original reason for only considering reachable locations was to improve performance; removing this constraint (and instead considering all locations) would be a simple change in the implementation. ↩
The PCG analysis algorithm is implemented as a MIR dataflow analysis defined by the rust compiler crate. In the implementation, the PCG Data is computed for every reachable MIR location during the algorithm execution itself, but only the PCG Data for the entry state of each basic block is stored. The PCG Data for an arbitrary location within a block is re-computed by applying the dataflow analysis transfer function from the entry state. ↩
In the join implementation, we treat an edge from $b^{'}$ to $b$ as a back edge if $b$ dominates $b^{'}$ in the CFG. ↩

Definitions

Types

Type Contains

A type $τ$ contains a type $τ^{'}$ , iff:

$τ = τ^{'}$ , or
$τ$ is an ADT and contains a field $f : τ_{f}$ and $τ_{f}$ contains $τ^{'}$ , or
$τ = \&'r\ mut\ τ_{tgt}$ and $τ_{tgt}$ contains $τ^{'}$

Types Containing Lifetimes

A type $τ$ contains a lifetime $r$ iff $τ$ contains the type $\&'r\ mut\ τ^{'}$ for some type $τ^{'}$. A lifetime $r$ is nested in a type $τ$ iff $τ$ contains a type $\&'r\ mut\ τ^{'}$ and $τ^{'}$ contains $r$. We extend these concepts to places: a place $p : τ$ contains a lifetime $r$ iff $τ$ contains $r$; $r$ is nested in $p : τ$ iff $r$ is nested in $τ$. A lifetime projection $p ↓ r$ is nested if $r$ is nested in $p$.
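The two "contains" relations can be sketched as recursive functions over a toy type representation. The `Ty` enum below is a hypothetical model covering only integers, mutable references and ADTs with a list of field types; it is not the compiler's type representation.

```rust
/// Toy model of types (an assumption made for this sketch).
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Ty {
    Int,
    /// `&'r mut T`, with the lifetime given by name.
    RefMut(&'static str, Box<Ty>),
    /// An ADT given by its name and the types of its fields.
    Adt(&'static str, Vec<Ty>),
}

/// `ty` contains `target` iff they are equal, or a field type or
/// reference target of `ty` contains `target`.
pub fn contains(ty: &Ty, target: &Ty) -> bool {
    if ty == target {
        return true;
    }
    match ty {
        Ty::Int => false,
        Ty::RefMut(_, tgt) => contains(tgt, target),
        Ty::Adt(_, fields) => fields.iter().any(|f| contains(f, target)),
    }
}

/// `ty` contains the lifetime `r` iff it contains `&'r mut T` for some `T`.
pub fn contains_lifetime(ty: &Ty, r: &str) -> bool {
    match ty {
        Ty::Int => false,
        Ty::RefMut(r2, tgt) => *r2 == r || contains_lifetime(tgt, r),
        Ty::Adt(_, fields) => fields.iter().any(|f| contains_lifetime(f, r)),
    }
}
```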

PCG Evaluation Phase

The PCG Evaluation Phases are:

PreOperands
PostOperands
PreMain
PostMain

For every MIR location, a separate PCG is generated for each phase. They represent the following:

PreOperands - A reorganization of the initial state¹ to prepare to apply the effects of evaluating the operands in the statement
PostOperands - The result of applying the operand effects on the PreOperands state
PreMain - A reorganization of the PostOperands state to prepare to apply any other effects of the statement
PostMain - The result of applying any other effects of the statement to the PreMain state.

Program Point

A program point represents a point within the execution of a Rust function in a way that is more fine-grained than a MIR location (each MIR location has multiple program points which correspond to different stages of evaluation of a statement). Concretely, a program point is either:

The start of a basic block
A pair of a MIR location and a PCG evaluation phase

Borrows and Blocking

Blocking

A place $p_{blocker}$ blocks a place $p_{blocked}$ at a location $l$ if a usage of $p_{blocked}$ at $l$ would invalidate a live borrow $b$ contained in the origins of $p_{blocker}$ at $l$.

Borrow Liveness

A borrow $p = & mut p^{'}$ is live at location $l$ if a usage of $p^{'}$ at $l$ would invalidate the borrow.

Directly Borrowed

A place $p$ is directly borrowed by a borrow $b$ if $p$ is exactly the borrowed place (not e.g. a pre- or postfix of the place).

The initial state is either the PostMain state of the previous location within the basic block or, if this is the first statement within the block, the entry state of the block (i.e. the result of joining the final states of the incoming basic blocks). ↩

Capabilities

A capability describes the actions that the program is permitted to perform on a place at a particular program point. There are three main capabilities:

Exclusive (`E`)

Places with this capability can be read from, written to, or mutably borrowed.

We do not differentiate between locals bound with let bindings and let mut bindings: a variable bound with let would still have this capability if it could be written to if it was mutably borrowed.

Read (`R`)

Places with this capability can be read from. Shared borrows can also be created to this place. Shared references with this capability can be dereferenced.

A place $p$ with capability E is downgraded to R if a shared borrow is created to a place that is a pre- or postfix of $p$ .

When a shared reference $p$ is dereferenced, the capability to $p$ is downgraded to R. Any place projecting from a shared reference will have capability R.

Write (`W`)

The place can be written to.

When an exclusive reference $p$ is dereferenced, the capability to $p$ is downgraded to W.

In the implementation we define a fourth capability ShallowExclusive (e), which is used for a rather specific and uncommon situation. When converting a raw pointer *mut T into a Box<T>, there is an intermediate state where the memory for the box is allocated on the heap but the box does not yet hold a value. We use ShallowExclusive to represent this state.

In the implementation, writing to a Box-typed place p via e.g. *p = 5 requires that p have capability e.

PCG Nodes

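$\begin{array}{lll}
i & & \text{(Integer)} \\
v & & \text{(MIR Local)} \\
c & & \text{(MIR Constant)} \\
b & & \text{(MIR Basic Block Index)} \\
l & ::= b [i] & \text{(MIR Location)} \\
p & & \text{(MIR Place)} \\
ℓ & ::= & \text{(Label)} \\
& \mid \text{start } b & \\
& \mid \text{loop } b & \\
& \mid \text{prepare } l & \\
& \mid \text{before-collapse } l & \\
& \mid \text{before-ref-assignment } l & \\
& \mid \text{mid } l & \\
& \mid \text{after } l & \\
\tilde{p} & ::= & \text{(Maybe-Labelled Place)} \\
& \mid p & \text{(Current Place)} \\
& \mid p \text{ at } ℓ & \text{(Labelled Place)} \\
\hat{p} & ::= & \text{(PCG Place)} \\
& \mid \tilde{p} & \text{(Maybe-Labelled Place)} \\
& \mid remote (v) & \text{(Remote Place)} \\
rp & ::= & \text{(Lifetime Projection)} \\
& \mid \hat{p} ↓ r & \text{(Place Projection)} \\
& \mid \hat{p} ↓ r \text{ at } ℓ & \text{(Snapshot of Place Projection)} \\
& \mid \hat{p} ↓ r \text{ at FUTURE} & \text{(Placeholder Projection)} \\
& \mid c ↓ r & \text{(Constant Projection)} \\
n & ::= \hat{p} \mid rp & \text{(PCG Node)}
\end{array}$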

We probably don't need so many label types, but we have them in the implementation currently.

In the implementation we currently refer to lifetime projections as "region projections" and labelled places as "old" places.

Associated Place

The associated place of a PCG node $n$ is defined by the partial function $p (n)$ :

$p (p) = p$
$p (p at ℓ) = p$
$p (rp) = p (base (rp))$

Where $base (rp)$ is the base of the lifetime projection $rp$ as defined here.

Local PCG Nodes

A PCG node $n$ is a local node if it has an associated place $p (n)$ .
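The associated-place function and the local-node predicate can be sketched over a simplified node representation. The enums below are assumptions made for illustration: places are represented as strings, and the labels on lifetime projections (snapshot and placeholder) are omitted because they do not affect the associated place.

```rust
/// Simplified PCG place: current, labelled, or remote.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum PcgPlace {
    Current(String),
    /// A place together with a label (both rendered as strings here).
    Labelled(String, String),
    /// The remote place of a local, given by the local's index.
    Remote(usize),
}

/// The base of a lifetime projection: a PCG place or a constant.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Base {
    Place(PcgPlace),
    Constant(String),
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub enum PcgNode {
    Place(PcgPlace),
    LifetimeProjection { base: Base, region: String },
}

/// Current and labelled places have an associated place; remote places do not.
fn place_of(p: &PcgPlace) -> Option<&str> {
    match p {
        PcgPlace::Current(place) | PcgPlace::Labelled(place, _) => Some(place.as_str()),
        PcgPlace::Remote(_) => None,
    }
}

/// The partial function p(n): `None` where p(n) is undefined.
pub fn associated_place(n: &PcgNode) -> Option<&str> {
    match n {
        PcgNode::Place(p) => place_of(p),
        PcgNode::LifetimeProjection { base: Base::Place(p), .. } => place_of(p),
        PcgNode::LifetimeProjection { base: Base::Constant(_), .. } => None,
    }
}

/// A node is local iff it has an associated place.
pub fn is_local(n: &PcgNode) -> bool {
    associated_place(n).is_some()
}
```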

PCG Hyperedges

A PCG Hyperedge is a directed edge of the form $\overline{n_{s}} \to \overline{n_{t}}$ , where elements of $\overline{n_{s}}$ and $\overline{n_{t}}$ are PCG nodes. The set $\overline{n_{s}}$ are referred to as the source nodes of the edge, and $\overline{n_{t}}$ are the target nodes. Hyperedges in the PCG are labelled with validity conditions¹.

We represent a PCG hyperedge as a tuple of an edge kind and validity conditions.

Edge Kinds

An edge kind is a description of an edge, including its source and target nodes, as well as other associated metadata. The metadata is determined by the type of the edge; the various types are presented below:

Unpack Edges

Unpack edges are used to represent the unpack of an owned place in order to access one of its fields. For example, writing to a field x.f requires expanding x.

An unpack edge connects an owned place $p$ to one of its expansions $\overline{p}$. Each place in $\overline{p}$ is guaranteed to be owned.

For example, if x is an owned place with fields x.f and x.g, the edge {x} -> {x.f, x.g} is a valid unpack edge.

Unpack edges are not generated for dereferences of reference-typed places. Borrow PCG Expansion Edges are used in such cases. Unpack edges are used in dereferences of Box-typed places.

In the implementation, we don't have an explicit data type representing unpack edges. Rather, the unpack edges are conceptually represented as the interior edges in the Owned PCG.

Validity conditions are not currently associated with unpack edges in the implementation.

Borrow PCG Expansion Edge

Borrow PCG Expansion Edges conceptually take one of three forms:

Dereference Expansion: The dereference of a reference-typed place
Place Expansion: The expansion of a borrowed place to access one of its fields
Lifetime Projection Expansion: The expansion of the lifetime projections of an owned or borrowed place

Dereference Expansion

The source nodes of a dereference expansion consist of:

A maybe-labelled place $\tilde{p}_{s}$ , and:
A lifetime projection $rp_{s}$

Where $\tilde{p}_{s}$ and $rp_{s}$ have the same associated place $p_{s}$.

The target node $\tilde{p}_{t}$ of a dereference expansion is a maybe-labelled place with associated place $* p_{s}$.

Place Expansion

The source node of a place expansion is a maybe-labelled place $\tilde{p}_{s}$ with associated place $p_{s}$, where $p_{s}$ is a borrowed place and $p_{s}$ is not a reference.

The target nodes of a place expansion are a set of maybe-labelled places $\overline{\tilde{p}_{t}}$ whose associated places are an expansion of $p_{s}$.

Lifetime Projection Expansion

The source node of a lifetime projection expansion is a lifetime projection $rp_{s}$ where $base (rp_{s})$ is a maybe-labelled place $\tilde{p}_{s}$ with associated place $p_{s}$.

The target nodes of a lifetime projection expansion are a set of lifetime projections $\overline{rp_{t}}$ where the base of each projection is a maybe-labelled place. The associated places of the bases of $\overline{rp_{t}}$ are an expansion of $p_{s}$.

It might make sense to differentiate lifetime projection expansions of owned and borrowed places, since they differ in terms of how placeholder labels should be included. Namely, for owned places there is no need to connect the expansion nodes to the placeholder of the base node (the owned place may never be repacked, or could be repacked with entirely unrelated lifetime projections).

Borrow Edges

Borrow edges are used to represent direct borrows in the program. We define two types:

Local Borrow Edges: A borrow created in the MIR Body, e.g. from a statement let y = &mut x;
Remote Borrow Edges: A borrow created by the caller of this function where the assigned place of the borrow is an input to this function.

Remote Borrows are named as such because (unlike local borrows), the borrowed place does not have a name in the MIR body (since it was created by the caller).

Local Borrows

The source node of a local borrow is a maybe-labelled place $p$ . The target node of a local borrow is a lifetime projection $rp$ where $base (rp)$ is a maybe-labelled place.

Remote Borrows

The source node of a remote borrow is a remote place $remote (v)$ . The target node of a remote borrow is a lifetime projection $rp$ where $base (rp)$ is a maybe-labelled place.

Abstraction Edge

An abstraction edge represents the flow of borrows introduced due to a function call or loop.

Function Call Abstraction Edge

The source node of a function call abstraction edge is a lifetime projection $rp$ where $base (rp)$ is a maybe-labelled place.

The target node of a function call abstraction edge is a lifetime projection $rp$ where $base (rp)$ is a maybe-labelled place.

Loop Abstraction Edge

The source node of a loop abstraction edge is a PCG node.

The target node of a loop abstraction edge is a local PCG node.

Borrow-Flow Edge

A borrow-flow edge represents the flow of borrows between a lifetime projection $rp_{s}$ and a local lifetime projection $rp_{t}$ . This edge is used when the relationship between the blocked and target node is known concretely, but does not correspond to an expansion or a borrow.

Borrow-Flow Edges are labelled with a Borrow-Flow Edge Kind with associated metadata, enumerated as follows:

Aggregate

Metadata:

$i_{f}$ : Field Index
$i_{rp}$ : Lifetime Projection Index

An aggregate kind is used for borrows flowing into an aggregate type (i.e. struct, tuple). The metadata indicates that the blocked lifetime projection flows into the $i_{rp}^{th}$ lifetime projection of the $i_{f}^{th}$ field of the blocking lifetime projection.

Introduced in the following two cases:

Collapsing an owned place:
- edges flow from the lifetime projections of the labelled places of the base to the lifetime projections of the current base
Assigning an aggregate (e.g. x = Aggregate(a, b)):
- edges flow from the lifetime projections of the labelled places in the rvalue to the lifetime projections of x

$i_{rp}$ is probably not necessary. We probably don't even need $i_{f}$ for case 1 (field index can be inferred from the expansion place), so perhaps a different edge kind could be used in that case.

Reference to Constant

Connects a lifetime projection from a constant to some PCG node. For example, given let x: &'x C = c; where c is a constant of type &'c C, an edge {c↓'c} -> {x↓'x} of this kind is created. This is called ConstRef in the implementation.

This seems quite similar to "Borrow Outlives", perhaps we should merge them?

Borrow Outlives

For a borrow, e.g. let x = &mut y;, the PCG analysis inserts edges of this kind to connect the (potentially snapshotted) lifetime projections of y to the lifetime projections of x.

Initial Borrows

To construct the initial PCG state, the PCG analysis adds an edge of this kind from every lifetime projection in each remote place to the corresponding lifetime projection of its corresponding argument.

For example, if $v$ is the local corresponding to a function argument and contains a lifetime projection $v ↓ r$ , an edge ${remote (v) ↓ r} \to {v ↓ r}$ will appear in the graph.

Connects the lifetime projections of remote places to the lifetime projections of their corresponding arguments.

Copy

For a copy let x = y;, the PCG analysis inserts edges of this kind to connect the lifetime projections of y to the lifetime projections of x.

In the implementation this is currently called CopyRef.

Move

For a move let x = move y;, the PCG analysis inserts edges of this kind to connect the (potentially snapshotted) lifetime projections of y to the lifetime projections of x.

Future

These edges are introduced to describe the flow of borrows to the lifetime projections of a place that is currently blocked. When they are created, the target node is a placeholder lifetime projection of a blocked place.

Perhaps this should be its own edge type?

Currently not in the owned portion of the PCG, but this should happen eventually. ↩

Types

A type $τ$ is either:

A type parameter of the form $param i$
An alias type of the form $τ :: T ⟨ \overline{τ} ⟩$
A type constructor application of the form $T ⟨ \overline{τ} ⟩$
A box type of the form $Box ⟨ τ ⟩$

Corresponding Regions

If $r$ is a region in $τ$ , the corresponding region $r_{c}$ in a type $τ_{c}$ is:

If $τ = \&r\ m\ τ^{'}$ and $τ_{c} = \&r_{c}^{'}\ m\ τ_{c}^{'}$ then $r_{c} = r_{c}^{'}$

If $τ = T ⟨ τ_{1}, \dots, τ_{n} ⟩$ and $τ_{c} = T ⟨ τ_{c_{1}}, \dots, τ_{c_{n}} ⟩$, iterate $i$ over $1, \dots, n$, and if there exists an $r_{c}^{'}$ where $r_{c}^{'}$ in $τ_{c_{i}}$ is the corresponding region of $r$ in $τ_{i}$, then $r_{c} = r_{c}^{'}$.

Places

A place $p$ is a tuple of a local $v$ and a projection.

Place Expansion

This is missing some cases

A set of places $\overline{p}$ is a place expansion iff there exists a base $p$ such that:

$p$ is an enum type and $\overline{p} = {p @ V}$ and $V$ is a variant of $p$
$p$ is a struct or tuple type and $\overline{p}$ is the set of places obtained by projecting $p$ with each of the fields in the type of $p$
$p$ is a reference-typed field and $\overline{p} = {* p}$
$p$ is an array or slice and $\overline{p} = p [i]$ (TODO: more cases)

If there is such a $p$ , then that $p$ is unique, and $\overline{p}$ is an expansion of $p$ .

Owned Places

A place is owned iff it does not project from the dereference of a reference-typed place.

Place Liveness

A place $p$ is live at a location $l$ iff there exists a location $l^{'}$ and a control-flow path $c$ from $l$ to $l^{'}$ where a place conflicting with $p$ is used at $l^{'}$ and there are no assignments of any places conflicting with $p$ along $c$.

Place Prefix

A place $p$ is a prefix of a place $p^{'}$ iff:

$p$ and $p^{'}$ have the same local, and
The projection of $p$ is a prefix of the projection of $p^{'}$

Note that $p$ is a prefix of itself.

A place $p$ is a strict prefix of $p^{'}$ iff $p$ is a prefix of $p^{'}$ and $p \neq p^{'}$.
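The prefix relations translate directly into code. In this sketch a place is a hypothetical struct holding a local index and a list of projection elements rendered as strings; the real MIR place type is richer.

```rust
/// Hypothetical place representation: a local and a projection list.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Place {
    pub local: usize,
    pub projection: Vec<&'static str>,
}

/// `p` is a prefix of `q` iff they share a local and the projection
/// of `p` is a prefix of the projection of `q`.
pub fn is_prefix(p: &Place, q: &Place) -> bool {
    p.local == q.local && q.projection.starts_with(&p.projection)
}

/// `p` is a strict prefix of `q` iff it is a prefix and differs from `q`.
pub fn is_strict_prefix(p: &Place, q: &Place) -> bool {
    is_prefix(p, q) && p != q
}
```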

Regions

A region $r$ is one of:

A RegionVid (a region variable identifier)
The static region 'static
An early-bound region

TODO: Other cases

Lifetime Projections

Generalized Lifetimes

A generalized lifetime $gr$ is either a region $r$ or $RegionsIn (τ)$ , where $τ$ is either:

a type parameter, or
a type alias that cannot be normalized

Generalized Lifetime Projections

A generalized lifetime projection $grp$ takes the form $b ↓ gr$ where $b$ is a base having an associated type $τ$. The index $index (grp)$ of a generalized lifetime projection is the index of the occurrence of $gr$ in the generalized lifetime list $GR (τ)$ (the list of generalized lifetimes in $τ$, occurring in the order they appear in $τ$, and with duplicates removed).

A lifetime projection is a generalized lifetime projection of the form $b ↓ r$ (that is, a generalized lifetime projection where the associated generalized lifetime is a region).
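The generalized lifetime list and the projection index can be sketched over a toy type model. The `Ty` and `GenLifetime` enums are assumptions made for this illustration; in particular, non-normalizable type aliases are omitted, and an ADT is assumed to list its lifetimes in the order of its type arguments.

```rust
/// A generalized lifetime: a region, or the regions in a type parameter.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum GenLifetime {
    Region(&'static str),
    RegionsIn(&'static str),
}

/// Toy model of types (hypothetical).
#[derive(Clone, Debug)]
pub enum Ty {
    Int,
    Param(&'static str),
    Ref(&'static str, Box<Ty>),
    Adt(&'static str, Vec<Ty>),
}

fn push_unique(out: &mut Vec<GenLifetime>, gr: GenLifetime) {
    if !out.contains(&gr) {
        out.push(gr);
    }
}

fn visit(ty: &Ty, out: &mut Vec<GenLifetime>) {
    match ty {
        Ty::Int => {}
        Ty::Param(name) => push_unique(out, GenLifetime::RegionsIn(*name)),
        Ty::Ref(r, target) => {
            push_unique(out, GenLifetime::Region(*r));
            visit(target, out);
        }
        Ty::Adt(_, args) => {
            for arg in args {
                visit(arg, out);
            }
        }
    }
}

/// GR(ty): generalized lifetimes in order of appearance, without duplicates.
pub fn generalized_lifetimes(ty: &Ty) -> Vec<GenLifetime> {
    let mut out = Vec::new();
    visit(ty, &mut out);
    out
}

/// The index of the projection `b ↓ gr`, where `b` has type `ty`.
pub fn index_of(ty: &Ty, gr: &GenLifetime) -> Option<usize> {
    generalized_lifetimes(ty).iter().position(|g| g == gr)
}
```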

PCG Lifetime Projections

PCG lifetime projections take the following form

$\begin{array}{lll}
rp & ::= & \text{(Lifetime Projection)} \\
& \mid \hat{p} ↓ r & \text{(Place Projection)} \\
& \mid \hat{p} ↓ r \text{ at } ℓ & \text{(Snapshot of Place Projection)} \\
& \mid \hat{p} ↓ r \text{ at FUTURE} & \text{(Placeholder Projection)} \\
& \mid c ↓ r & \text{(Constant Projection)}
\end{array}$

Lifetime Projection Lifetime

The lifetime $r (rp)$ of a lifetime projection $rp$ is conceptually the lifetime $r$ on the right of the $↓$ , i.e:

$r (\hat{p} ↓ r) = r$
$r (\hat{p} ↓ r \text{ at } ℓ) = r$
$r (\hat{p} ↓ r \text{ at FUTURE}) = r$
$r (c ↓ r) = r$

Lifetime Projection Base

The base $base (rp)$ of a lifetime projection $rp$ is conceptually the part on the left of the $↓$ , i.e. defined as follows:

$base (\hat{p} ↓ r) = \hat{p}$
$base (\hat{p} ↓ r \text{ at } ℓ) = \hat{p}$
$base (\hat{p} ↓ r \text{ at FUTURE}) = \hat{p}$
$base (c ↓ r) = c$

Validity Conditions

We associate borrow PCG edges with validity conditions to identify the control-flow conditions under which they are valid. Consider the following program:

fn main(){
    // BB0
    let mut x = 1;
    let mut y = 2;
    let mut z = 3;
    let mut r = &mut x;
    if rand() {
        // BB1
        r = &mut y;
    }
    // BB2
    if rand2() {
        // BB3
        r = &mut z;
    }
    // BB4
    *r = 5;
}

We represent control flow using a primitive called a branch choice $d \in D$ of the form $b_{i} \to b_{j}$ . A branch choice represents the flow of control from one basic block to another. In the above code there are two possible branch choices from bb0: $bb0 \to bb1$ and $bb0 \to bb2$ and one branch choice from bb1 (to bb2).

For any given basic block $b$ in the program, an execution path leading to $b$ is an ordered list of blocks ending in $b$ . For example, one path of execution to bb4 can be described by: $bb0 \to bb2 \to bb3 \to bb4$ .

Validity conditions $pc$ conceptually define a predicate on execution paths. For example, the validity conditions $pc_{z \to r}$ describing the control flow where r borrows from z at bb3 require that the path contain $bb2 \to bb3$.

We represent validity conditions $pc$ as a partial function $B \to P (B)$ mapping relevant blocks to sets of allowed target blocks.

An execution path $\overline{d}$ is valid for validity conditions $pc$ iff, for each $b_{s} \to b_{t} \in \overline{d}$ , either:

$b_{s}$ is not a relevant block in $pc$ , or
$b_{t}$ is in the set of allowed target blocks of $b_{s}$ in $pc$

Formal Definition

The representation of validity conditions in our implementation corresponds closely to the following description:

Validity conditions $pc \in I$ is a map $B \to P (B)$ describing control-flow conditions. For each block $b \in dom (pc)$ , $pc [b]$ is a subset of the real successors of $b$ .

The join of validity conditions $pc_{i}$ and $pc_{j}$ is defined as:

$(pc_{i} \cup pc_{j}) [b] = pc_{i} [b] \cup pc_{j} [b]$

Validity conditions $pc$ are valid for a path $b_{0}, \dots, b_{n}$ iff:

$\forall i \in {0, \dots, n - 1} : pc [b_{i}] = \emptyset \lor b_{i + 1} \in pc [b_{i}]$
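The formal definition can be sketched as a map from blocks to sets of allowed successors. The type name `ValidityConditions` and its methods are assumptions made for illustration; blocks are represented by their indices, and a block absent from the map is treated the same as one mapped to the empty set.

```rust
use std::collections::{BTreeMap, BTreeSet};

pub type Block = usize;

/// Validity conditions: for each relevant block, the allowed target blocks.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct ValidityConditions(pub BTreeMap<Block, BTreeSet<Block>>);

impl ValidityConditions {
    /// Records the branch choice `from -> to` as allowed.
    pub fn insert(&mut self, from: Block, to: Block) {
        self.0.entry(from).or_insert_with(BTreeSet::new).insert(to);
    }

    /// Pointwise union, as in the definition of join above.
    pub fn join(&self, other: &Self) -> Self {
        let mut out = self.clone();
        for (b, targets) in &other.0 {
            out.0
                .entry(*b)
                .or_insert_with(BTreeSet::new)
                .extend(targets.iter().copied());
        }
        out
    }

    /// A path is valid iff every step leaves a non-relevant block
    /// or takes an allowed target.
    pub fn valid_for(&self, path: &[Block]) -> bool {
        path.windows(2).all(|w| match self.0.get(&w[0]) {
            None => true,
            Some(targets) => targets.is_empty() || targets.contains(&w[1]),
        })
    }
}
```

For the example program, the conditions for the borrow of z created in bb3 contain only the entry bb2 -> {bb3}, so the path bb0, bb2, bb4 is rejected while bb0, bb2, bb3, bb4 is accepted.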

Correctness Theorem (Sketch)

For every borrow edge $e$ in the PCG at location $l$ and execution path $\overline{d}$ to $l$ , the corresponding borrow is live at $l$ iff $\overline{d}$ satisfies the validity conditions of $e$ .

Proof

We prove this for each location in the MIR by induction on the basic blocks of the graph via a reverse-postorder traversal. The inductive hypothesis is that the property holds for each ancestor block.

Having proved for the 1st location in a basic block, it is trivial to show for the remaining locations in the block, therefore we only concern ourselves with the basic blocks.

The case of bb0 is trivial.

We now concern ourselves with the validity conditions $pc_{e}^{n}$ of an arbitrary edge $e$ at block $b_{n}$, assuming via the IH that the property holds for all $b_{n^{'}}$ where $n^{'} < n$. Let $\overline{b_{in}}$ be the set of incoming blocks. We define $pc_{e}^{n}$ as the repeated join: $\bigcup_{b_{f} \in \overline{b_{in}}} \left( pc_{e}^{f} \cup \{b_{f} \to b_{n}\} \right)$.

We first show that the if direction holds: For every borrow edge $e$ in the PCG at location $l$ and execution path $\overline{d}$ to $l$ , if the corresponding borrow is live at $l$ , then $\overline{d}$ satisfies the validity conditions of $e$ .

Every $\overline{d}$ has a prefix $\overline{d_{f}}$ and ends with $b_{f} \to b_{n}$; by our IH, $\overline{d_{f}}$ satisfies $pc_{e}^{f}$. We then need to show that $\overline{d} = \overline{d_{f}} \mathbin{+\!+} [b_{n}]$ satisfies $pc_{e}^{n}$. Our proof is by contradiction.

Suppose $\overline{d}$ did not satisfy $pc_{e}^{n}$; then there must be a pair $(b \to b^{'}) \in \overline{d}$ where $pc_{e}^{n} [b] \neq \emptyset \land b^{'} \notin pc_{e}^{n} [b]$. We know that the pair cannot be the final pair in $\overline{d}$ (i.e. $b_{f} \to b_{n}$), because by the definition of join we have $pc_{e}^{n} [b_{f}] = \emptyset \lor b_{n} \in pc_{e}^{n} [b_{f}]$. Therefore the edge $b \to b^{'}$ must exist in some strict prefix $\overline{d}^{'}$ of $\overline{d}$. Then, there must be a nonempty set $\overline{b_{f}}^{'} \subseteq \overline{b_{in}}$ where $pc_{e}^{f^{'}} [b] \neq \emptyset \land b^{'} \notin pc_{e}^{f^{'}} [b]$ (otherwise, the joined $pc_{e}^{n} [b]$ would be either $\emptyset$ or contain $b^{'}$). But then, because $\overline{d}^{'}$ must contain $b \to b^{'}$, it would not satisfy the validity conditions $pc_{e}^{f^{'}}$ for any $f^{'}$.

We now show the reverse: For every borrow live at $l$ for an execution path $\overline{d}$ , its corresponding edge satisfies the validity conditions of $e$ .

Our proof relies on the monotonicity of joins: if $b^{'} \in pc [b]$ then $b^{'} \in (pc \cup pc^{'}) [b]$ for any $pc^{'}$.

The borrow is created at a unique block $b_{c}$. Then $b_{c}$ is definitely in $\overline{d}$ at some index $i$ and the borrow is not killed in any block $b_{i}$ for all $c < i ⩽ |\overline{d}|$.

Our proof works by building longer prefixes of $\overline{d}$. The base case is when the borrow is created (and therefore becomes live) at $b_{c} = d_{i}$, where the property holds trivially. Then, when going from $d_{i}, \dots, d_{j}$ to $d_{i}, \dots, d_{j + 1}$, we add $d_{j} \to d_{j + 1}$ to the validity conditions and join other incoming blocks. Because the prefix satisfies the prior conditions by induction and also satisfies the additional condition, the longer prefix satisfies the joined conditions by the monotonicity of join. QED.

Functions

Generalized Types and Parameter Environments

A generalized type $τ^{*}$ is either a type $τ$ or a region $r$.

A param env $E$ is a list of constraints; where a constraint takes the form:

$r : r^{'}$ ( $r$ outlives $r^{'}$ )
$τ : r$ (All regions in $τ$ outlive $r$ )
$τ : Tr$ ( $τ$ implements $Tr$ )

Outlives Judgments

The base outlives relation $E ⊢_{0} τ^{*} : r$ holds iff it can be derived from the following rules:

Direct: $(τ^{*} : r) \in E$
Reflexivity: $E ⊢_{0} r : r$
Transitivity: $E ⊢_{0} τ^{*} : r$ and $E ⊢_{0} r : r^{'}$ implies $E ⊢_{0} τ^{*} : r^{'}$

That is, the base outlives relation is the transitive closure of the region-outlives and type-outlives facts in $E$ .
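The base outlives relation can be computed by a simple reachability search over the outlives facts in $E$. The sketch below is an illustration only: `GenTy` is a hypothetical representation of generalized types by name, and the environment is reduced to its outlives constraints (trait constraints play no role in the base relation).

```rust
use std::collections::BTreeSet;

/// A generalized type: a type or a region, identified by name.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum GenTy {
    Ty(&'static str),
    Region(&'static str),
}

/// Decides the base outlives relation `lhs : r` for an environment given
/// as a list of facts `(a, r1)`, each meaning `a : r1`.
pub fn base_outlives(env: &[(GenTy, &'static str)], lhs: &GenTy, r: &str) -> bool {
    let mut seen: BTreeSet<&'static str> = BTreeSet::new();
    let mut work: Vec<&'static str> = Vec::new();
    // Reflexivity: a region outlives itself.
    if let GenTy::Region(l) = lhs {
        work.push(*l);
    }
    // Direct: facts of the form `lhs : r1` in the environment.
    for (a, r1) in env {
        if a == lhs {
            work.push(*r1);
        }
    }
    // Transitivity: follow `x : r2` facts from every region reached so far.
    while let Some(x) = work.pop() {
        if x == r {
            return true;
        }
        if !seen.insert(x) {
            continue;
        }
        for (a, r2) in env {
            if *a == GenTy::Region(x) {
                work.push(*r2);
            }
        }
    }
    false
}
```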

There also exists an outlives relation $E ⊢ τ^{*} : r$ such that $E ⊢_{0} τ^{*} : r$ implies $E ⊢ τ^{*} : r$ . This relation is computed by Rust's type system and additionally accounts for outlives constraints implied by trait membership constraints in $E$ .

Function Signatures

A function signature is a pair $⟨ \overline{τ^{in}}, τ^{out} ⟩$ .

A defined function signature $f$ is a tuple $⟨ defid, \overline{τ^{in}}, τ^{out}, E ⟩$ .

An instantiation $\hat{f}$ of $f$ is the tuple $⟨ f, \overline{τ^{*}} ⟩$ ; where $\overline{τ^{*}}$ is a list of early-bound parameters.

The identity instantiation $f_{I}$ of $f$ is obtained by applying the identity substitution $I_{τ^{*}}$ . Defined function calls are applied to instantiations of a function.

The generalized lifetime projections $GRP (\hat{f})$ of a function instantiation $\hat{f}$ is defined as the set:

$\{arg\ i ↓ gr \mid i ⩽ |\overline{τ^{in}}|,\ gr \in GR (τ_{i}^{in})\} \cup \{result ↓ gr \mid gr \in GR (τ^{out})\}$

Function Calls

A function call target $\tilde{f}$ is either an instantiation $\hat{f}$ or a closure / function pointer $c t$ .

A function call $FC$ takes the form $p = \tilde{f} (\overline{op}) \text{ at } l$, where $p$ is a MIR place, and $\overline{op}$ is a sequence of MIR operands.

$E (FC)$ is the parameter environment of the function with respect to the call site.

The lifetime projections $RP (FC)$ of a function call is the union of the lifetime projections in $p$ and the lifetime projections in $\overline{op}$.

A function call $FC$ is valid iff it satisfies the unique region property: each region in the lifetime projections of $FC$ is unique.

We assume that function calls generated by directly extracting the result place and operands from a MIR body are valid. We note that converting the places to PCG places (which use the type derived from their local), does not necessarily maintain the validity of a function call.

MIR Definitions

Here we describe definitions of MIR concepts that are relevant to the PCG.

It's possible that these definitions will become outdated as the MIR is not stable. If there is any discrepancy between the descriptions here and those from official Rust sources (e.g. the dev guide), this page should be updated accordingly.

MIR Dataflow Analysis

At a high level, a MIR dataflow analysis is defined by the following elements:

A domain $D$
A join operation $join : (D \times D) \to D$
An empty state $d_{ϵ} \in D$
A transfer function $transfer : (D \times Location) \to D$

Performing the dataflow analysis on a MIR body $B$ computes a value of type $D$ for every location in $B$ . The analysis is performed (conceptually) as follows¹:

The analysis defines a map $S$ that maps locations in $B$ to elements of $D$ .
Each location in $S$ is initialized to $d_{ϵ}$
The operation analyze(b) updates $S$ as follows:
- $s [b [0]] \leftarrow transfer (s [b [0]], b [0])$
- For $0 < i ⩽ ∣ b ∣ : s [b [i]] \leftarrow transfer (s [b [i - 1]], b [i])$
The analysis defines a worklist $W = [b_{0}]$
While $W$ is not empty:
- Pop $b$ from $W$
- Perform $ana l yze (b)$
- Let $s_{b}^{exit}$ be the entry of the last location in $b$ in $s$
- For each successor $b^{'}$ of $b$ :
  - Let $s_{b^{'}}^{entry} = s [b^{'} [0]]$
  - Let $s_{b^{'}}^{join} = join (s_{b^{'}}^{entry}, s_{b}^{exit})$
  - If $s_{b^{'}}^{join} \neq s_{b^{'}}^{entry}$ :
    - $s [b^{'} [0]] \leftarrow s_{b^{'}}^{join}$
    - Add $b^{'}$ to $W$
$S$ is the result
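
The conceptual worklist procedure above can be sketched as follows. This is a minimal illustration, not the rustc implementation: the `Analysis` trait, the `Cfg` struct, and the `Visited` example analysis are hypothetical names introduced here, locations are modelled as (block index, statement index) pairs, and every block is assumed to contain at least one location.

```rust
use std::collections::{BTreeSet, VecDeque};

/// Hypothetical stand-in for a MIR dataflow analysis (not the rustc trait).
trait Analysis {
    type Domain: Clone + PartialEq;
    fn empty(&self) -> Self::Domain;
    fn join(&self, entry: &Self::Domain, exit: &Self::Domain) -> Self::Domain;
    fn transfer(&self, state: &Self::Domain, block: usize, stmt: usize) -> Self::Domain;
}

/// Toy CFG: `len[b]` is the number of locations in block `b` (assumed >= 1),
/// `succs[b]` lists the successors of `b`. Block 0 is the entry block.
struct Cfg {
    len: Vec<usize>,
    succs: Vec<Vec<usize>>,
}

/// Returns the map s, indexed as s[block][statement].
fn run<A: Analysis>(a: &A, cfg: &Cfg) -> Vec<Vec<A::Domain>> {
    // Every location is initialized to the empty state.
    let mut s: Vec<Vec<A::Domain>> =
        cfg.len.iter().map(|&n| vec![a.empty(); n]).collect();
    let mut worklist = VecDeque::from([0usize]);
    while let Some(b) = worklist.pop_front() {
        // analyze(b): apply the transfer function along the block.
        let first = a.transfer(&s[b][0], b, 0);
        s[b][0] = first;
        for i in 1..cfg.len[b] {
            let next = a.transfer(&s[b][i - 1], b, i);
            s[b][i] = next;
        }
        let exit = s[b][cfg.len[b] - 1].clone();
        // Join the exit state into each successor; requeue on change.
        for &succ in &cfg.succs[b] {
            let joined = a.join(&s[succ][0], &exit);
            if joined != s[succ][0] {
                s[succ][0] = joined;
                worklist.push_back(succ);
            }
        }
    }
    s
}

/// Example analysis: the set of locations that may have executed.
struct Visited;

impl Analysis for Visited {
    type Domain = BTreeSet<(usize, usize)>;
    fn empty(&self) -> Self::Domain {
        BTreeSet::new()
    }
    fn join(&self, entry: &Self::Domain, exit: &Self::Domain) -> Self::Domain {
        entry.union(exit).cloned().collect()
    }
    fn transfer(&self, state: &Self::Domain, block: usize, stmt: usize) -> Self::Domain {
        let mut next = state.clone();
        next.insert((block, stmt));
        next
    }
}

/// Blocks: 0 -> 1, 1 -> 1 (a self loop), 1 -> 2.
fn example() -> Vec<Vec<BTreeSet<(usize, usize)>>> {
    let cfg = Cfg { len: vec![1, 2, 1], succs: vec![vec![1], vec![1, 2], vec![]] };
    run(&Visited, &cfg)
}
```

In this sketch the worklist is a FIFO queue; that ordering is a choice made for the example and says nothing about the order rustc uses.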

I'm not sure of the order things are popped from $W$ . Any ordering should yield the same $S$ but some blocks may be analyzed more frequently than necessary. We should check the rustc implementation.

¹ The current analysis implementation (defined in the Rust compiler) is more efficient than what we describe because it tracks state per basic block rather than per location, as the state for any location in a block can be computed by repeated application of the transfer function to the entry state.

Borrow-Checker Interface

The PCG Analysis requires an implementation of the borrow-checker interface providing the following:

A predicate $live (n, l)$ , which holds iff the borrow extents in PCG node $n$ are all in scope at MIR location $l$
A predicate $directly_blocked (p, l)$ , which holds iff place $p$ is directly blocked at MIR location $l$
A predicate $blocks (p, p^{'}, l)$ which holds iff mutating $p^{'}$ at $l$ would cause $p$ to be dead
A predicate $outlives (r, r^{'}, l)$ which holds iff $r$ must outlive $r^{'}$ at $l$
A function $borrows_blocking (p, l)$ which returns the set of borrows that would be invalidated if $p$ was modified at $l$
A function $twophase_borrow_activations (l)$ which returns the set of two-phase borrows that are activated at $l$ .

Analysing Statements

The PCG Analysis computes four states for each MIR statement, corresponding to the PCG evaluation phases:

PreOperands
PostOperands
PreMain
PostMain

The analysis for each statement proceeds in two steps:

Step 1: Place Conditions are computed for each statement phase
Step 2: PCG Actions are performed for each statement phase

Determining Place Conditions

A place condition is a predicate on the PCG related to a particular MIR place.

We define the following place conditions:

Capability: Place $p$ must have capability $c$
RemoveCapability: Capability for place $p$ should be removed¹
AllocateOrDeallocate: Storage for local $v$ is allocated (e.g. via the StorageLive instruction)
Unalloc: Storage for local $v$ is not allocated (e.g. via the StorageDead instruction)
ExpandTwoPhase: Place $p$ is the borrowed place of a two-phase borrow
Return: The RETURN place has Exclusive capability

ExpandTwoPhase may not be necessary. AllocateOrDeallocate is a confusing name, also it's not clear if it's any different than just having Write permission.

During this step of the analysis, place conditions are computed for each phase. The determination of place conditions is based on the MIR statement; the state of the PCG is not relevant.

The conditions computed for each phase are as follows:

PreOperands: Pre-conditions on the PCG for the operands in the statement to be evaluated
PostOperands: Post-conditions on the PCG after the operands in the statement have been evaluated
PreMain: Pre-conditions on the PCG for the main effect of the statement to be applied
PostMain: Post-conditions on the PCG after the main effect of the statement has been applied

As an example, the MIR statement: let y = move x would have the following place conditions:

PreOperands: {x: E}
PostOperands: {x: W}
PreMain: {y: W}
PostMain: {y: E}

The rules describing how these place conditions are generated for a statement are described here.

Performing PCG Actions

After the place conditions for each phase are computed, the analysis proceeds by performing the actions for each phase. At a high level, this proceeds as follows:

`PreOperands`

The Borrow PCG graph is minimized by repeatedly removing every effective leaf edge² $e$ for which the target PCG nodes of $e$ are not live at the current MIR location. A Borrow PCG RemoveEdge action is generated for each removed edge. See more details here.

TODO: Precisely define liveness.

The place capabilities required by the PreOperands place conditions are obtained.

`PostOperands`

No actions occur at this stage. Capabilities for every moved-out operand are set to Write.

`PreMain`

The place capabilities required by the PreMain place conditions are obtained.

Then, the behaviour depends on the type of statement:

StorageDead(v)
1. The analysis performs the MakePlaceOld(v, StorageDead) action.
Assign(p, r)
1. If p is a reference-typed value and contained in the borrows graph and the capability for p is R:
  1. The analysis performs the RestoreCapability(p, E) action
2. If $C [p] \neq W$ :
  1. The analysis performs the $Weaken (p, C [p], W)$ action
3. All edges in the borrow PCG blocked by any of the lifetime projections in p are removed

`PostMain`

For every operand move p in the statement:
1. The analysis performs the MakePlaceOld(p, MoveOut) action.
If the statement is a function call p = call f(..):
1. Function call abstraction edges are inserted using the rules defined here
If the statement takes the form Assign(p, r):
1. p is given exclusive permission
2. If $r$ takes the form Aggregate(o_1, ..., o_n):
  1. For every $i \in {i ∣0 ⩽ i < n \land (o_{i} = move p \lor o_{i} = copy p)}$
    1. Let $p_{i}$ be the associated place of $o_{i}$
    2. For all $j, k$ with $0 ⩽ j < ∣ rp (p) ∣$ and $0 ⩽ k < ∣ rp (p_{i}) ∣$ :
      1. If $p_{i} ↓ r_{k}$ outlives $p ↓ r_{j}$ :
        1. Add an Aggregate BorrowFlow edge ${p_{i} ↓ r_{k}} \to {p ↓ r_{j}}$ , with associated field index $i$ and lifetime projection index $k$ .
3. If $r$ takes the form use c, c is a reference-typed constant with lifetime $r_{c}$ , and $p$ is a reference-typed place with lifetime $r_{p}$ , then:
  1. Create a new ConstRef borrow edge of the form ${c ↓ r_{c}} \to {p ↓ r_{p}}$
4. If $r$ takes the form move p_f or cast(_, move p_f):
  1. For all $i, 0 ⩽ i < ∣ r p (p) ∣$ :
    1. Let $p ↓ r$ be the i'th lifetime projection in p
    2. Let $p_{f} ↓ r_{f}$ be the i'th lifetime projection in p_f
    3. Let $ℓ$ be the snapshot location $before l : PostOperands$
    4. Add a Move edge ${p_{f} at ℓ ↓ r_{f}} \to {p ↓ r}$
5. If $r$ takes the form copy p_f or cast(_, copy p_f):
  1. For all $i, 0 ⩽ i < ∣ r p (p) ∣$ :
    1. Let $p ↓ r$ be the i'th lifetime projection in p
    2. Let $p_{f} ↓ r_{f}$ be the i'th lifetime projection in p_f
    3. Add a CopyRef edge ${p_{f} ↓ r_{f}} \to {p ↓ r}$
6. If $r$ takes the form &p or &mut p:
  1. Handle the borrow as described here

¹ This is only used for mutably borrowed places.
² The set of effective leaf edges are the leaf edges in the graph obtained by removing all edges to placeholder lifetime projections.

Removing Edges for out-of-scope Borrow Extents

At the beginning of each statement

Rules for Determining Place Conditions

Place conditions are computed in terms of triples, where a triple is a pair (pre, post) where pre is a place condition and post is either a place condition or None.

The place conditions for each phase are determined by two sets of triples: the operand triples and the main triples. The place conditions for the PreOperands phase are the conditions in the pres of the operand triples, and those for the PostOperands phase are the conditions in the posts of the operand triples. The PreMain and PostMain conditions are defined analogously from the main triples.

Determining Operand Triples for a Statement

For each operand $o$ in the statement:
1. If $o$ takes the form copy p:
  - Add (p: R, None) to the operand triples
2. If $o$ takes the form move p:
  - Add (p: E, p: W) to the operand triples
For each rvalue $r$ in the statement:
1. If $r$ takes the form &p:
  - Add (p: R, p: R) to the operand triples
2. If $r$ takes the form &mut p:
  - If the borrow is a two-phase borrow:
    - Add (ExpandTwoPhase p, p: R) to the operand triples
  - Otherwise, add (p: E, RemoveCapability p) to the operand triples
3. If $r$ takes the form *mut p:
  - Add (p: E, None) to the operand triples
4. If $r$ takes the form *const p:
  - Add (p: R, None) to the operand triples
5. If $r$ takes the form len(p), discriminant(p) or CopyForDeref(p):
  - Add (p: R, None) to the operand triples

Determining Main Triples for a Statement

The rule depends on the statement type:

Assign(p, r)
1. If r takes the form &fake q:
  - Add (p: W, None) to the main triples
2. If r takes the form ShallowInitBox o t:
  - Add (p: W, p: e) to the main triples
3. Otherwise, add (p: W, p: E) to the main triples
FakeRead(_, p)
1. Add (p: R, None) to the main triples
SetDiscriminant(p, ..)
1. Add (p: E, None) to the main triples
Deinit(p)
1. Add (p: E, p: W) to the main triples
StorageLive(v)
1. Add (Unalloc v, AllocateOrDeallocate v) to the main triples
StorageDead(v)
1. Add (AllocateOrDeallocate v, Unalloc v) to the main triples
Retag(_, p)
1. Add (p: E, None) to the main triples
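
As an illustration of how triples turn into per-phase conditions, the sketch below derives the four condition sets for the statement `y = move x`. It covers only the `copy`/`move` operand rules and the default `Assign` rule; the types `Cap`, `Cond`, `Operand` and all function names are hypothetical and chosen for this example only.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Cap {
    E,
    R,
    W,
}

/// A capability place condition: place `0` must have capability `1`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Cond(&'static str, Cap);

/// A triple is a pair (pre, post), where post may be absent.
type Triple = (Cond, Option<Cond>);

enum Operand {
    Copy(&'static str),
    Move(&'static str),
}

/// Operand triples for `copy p` and `move p` (other rvalue forms omitted).
fn operand_triples(operands: &[Operand]) -> Vec<Triple> {
    operands
        .iter()
        .map(|o| match *o {
            Operand::Copy(p) => (Cond(p, Cap::R), None),
            Operand::Move(p) => (Cond(p, Cap::E), Some(Cond(p, Cap::W))),
        })
        .collect()
}

/// Main triples for `Assign(p, r)` in the default case.
fn assign_main_triples(p: &'static str) -> Vec<Triple> {
    vec![(Cond(p, Cap::W), Some(Cond(p, Cap::E)))]
}

fn pres(triples: &[Triple]) -> Vec<Cond> {
    triples.iter().map(|(pre, _)| *pre).collect()
}

fn posts(triples: &[Triple]) -> Vec<Cond> {
    triples.iter().filter_map(|(_, post)| *post).collect()
}

/// Returns [PreOperands, PostOperands, PreMain, PostMain].
fn phase_conditions(operand: &[Triple], main: &[Triple]) -> [Vec<Cond>; 4] {
    [pres(operand), posts(operand), pres(main), posts(main)]
}

/// Conditions for `y = move x`.
fn example() -> [Vec<Cond>; 4] {
    phase_conditions(
        &operand_triples(&[Operand::Move("x")]),
        &assign_main_triples("y"),
    )
}
```

For `y = move x` this yields `{x: E}`, `{x: W}`, `{y: W}` and `{y: E}`, matching the example given earlier for this statement.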

Determining Main Triples for a Terminator

The rule depends on the terminator type:

Return
1. Add (Return, _0: W) to the main triples
Drop(p)
1. Add (p: W, None) to the main triples
Call(p, _)
1. Add (p: W, p: E) to the main triples
Yield(p, _)
1. Add (p: W, p: E) to the main triples

Rules for the Creation of Borrows

Mutable Borrows

Consider the stmt p = &mut q, at a program point $l$ , where $p$ has type $& r_{0} mut τ$ , and $q$ has type $τ$ , and $τ$ is a type containing lifetimes $r_{1}, \dots r_{n}$ .

At the end of the PreOperands phase, the PCG is guaranteed to be in a state where, for each $r_{i} \in {r_{1}, \dots, r_{n}}$ the lifetime projection $q ↓ r_{i}$ is in the graph. During the Operands phase, each lifetime projection $q ↓ r_{i}$ is labelled with the current program point to become $q ↓ r_{i} at l$ . At the end of the PreMain phase, for each $r_{i} \in {r_{0}, \dots, r_{n}}$ , the lifetime projection $p ↓ r_{i}$ is guaranteed not to be in the graph. During the Main phase, these projections are added.

Subsequently, the labelled lifetime projections under $q$ are connected with BorrowFlow edges to the new lifetime projections under $p$ . Namely, for each $i \in {1, \dots n}, j \in {0, \dots, n}$ if $r_{i}$ outlives $r_{j}$ , then a BorrowFlow edge ${q ↓ r_{i} at l} \to {p ↓ r_{j}}$ is added.

Subsequently, we introduce Future nodes and edges to account for nested references as follows. For each $i \in {1, \dots n}$ :

Insert the node $q ↓ r_{i} atFuture$ into the graph
If any Future edges originate from the labelled projection $q ↓ r_{i} at l$ , redirect them such that they originate from $q ↓ r_{i} atFuture$ .
Insert a Future edge ${q ↓ r_{i} at l} \to {q ↓ r_{i} atFuture}$
Insert a Future edge ${p ↓ r_{i}} \to {q ↓ r_{i} atFuture}$

Owned PCG Operations

Collapse

The operation $collapse (p, E, C)$ modifies place expansions $E$ and set of place capabilities $C$ such that $p$ becomes a leaf in the forest corresponding to $E$ . Stated more formally it modifies $E$ to ensure that $E$ contains an expansion $\overline{p}$ containing $p$ , and $p$ is not the base of any expansion in $E$ . Capabilities in $C$ are updated to account for the removal of expansions from $E$ .

collapse returns the set of Owned PCG Actions corresponding to the removed expansions.

This logic is very similar to the collapse defined on the (combined) PCG defined here. This is used in contexts where the Borrow PCG is not available (such as the join on owned PCGs).

We should investigate making a common operation.

The algorithm is implemented as follows:

Let $E^{'}$ be the subset of place expansions in $E$ such that for each $\overline{p^{'}}$ in $E^{'}$ , $p$ is a prefix of the base place $p^{'}$ (i.e. the expansions at or below $p$ ).
Let $E^{''}$ be an ordered list of the expansions in $E^{'}$ sorted in order of descending length of the projections of their base place
Let $R$ be the list of operations to return
For each $\overline{p^{'}}$ in $E^{''}$
1. Let $p^{'}$ be the base of $\overline{p^{'}}$
2. Let $\overline{c^{'}}$ be the capabilities of the places in $\overline{p^{'}}$ in $C$
3. Let $c$ be the minimum common capability of $\overline{c^{'}}$ .
4. Remove capabilities to the places in $\overline{p^{'}}$ from $C$
5. Assign capability $c$ to $p^{'}$ in $C$
6. Remove $\overline{p^{'}}$ from $E$
7. Add $collapse (p^{'}, \overline{p^{'}}, c)$ to $R$
return $R$
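
A minimal sketch of this procedure is shown below, under several simplifying assumptions: places are modelled as projection paths, only the capabilities W and E are considered (so the minimum common capability is an ordinary minimum), and the selected expansions are those whose base lies at or below $p$. The names `Place`, `Expansion`, `Collapse` and `collapse` are hypothetical.

```rust
use std::collections::HashMap;

/// A place as a projection path, e.g. ["x", "f"] for x.f.
type Place = Vec<&'static str>;

/// Only W and E are modelled here, ordered W < E.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Cap {
    W,
    E,
}

#[derive(Clone, Debug, PartialEq)]
struct Expansion {
    base: Place,
    children: Vec<Place>,
}

/// The action recorded for each removed expansion.
#[derive(Debug, PartialEq)]
struct Collapse {
    base: Place,
    children: Vec<Place>,
    cap: Option<Cap>,
}

fn collapse(
    p: &Place,
    exps: &mut Vec<Expansion>,
    caps: &mut HashMap<Place, Cap>,
) -> Vec<Collapse> {
    // Expansions at or below p, deepest base first.
    let mut below: Vec<Expansion> =
        exps.iter().filter(|e| e.base.starts_with(p)).cloned().collect();
    below.sort_by(|a, b| b.base.len().cmp(&a.base.len()));

    let mut actions = Vec::new();
    for e in below {
        // Minimum capability over the expanded places that have one.
        let c = e.children.iter().filter_map(|q| caps.get(q).copied()).min();
        for q in &e.children {
            caps.remove(q);
        }
        if let Some(c) = c {
            caps.insert(e.base.clone(), c);
        }
        exps.retain(|x| x != &e);
        actions.push(Collapse { base: e.base, children: e.children, cap: c });
    }
    actions
}

/// x is expanded to {x.f, x.g} and x.f to {x.f.a, x.f.b}; collapse to x.
fn example() -> (Vec<Collapse>, Vec<Expansion>, HashMap<Place, Cap>) {
    let mut exps = vec![
        Expansion { base: vec!["x"], children: vec![vec!["x", "f"], vec!["x", "g"]] },
        Expansion {
            base: vec!["x", "f"],
            children: vec![vec!["x", "f", "a"], vec!["x", "f", "b"]],
        },
    ];
    let mut caps = HashMap::from([
        (vec!["x", "f", "a"], Cap::E),
        (vec!["x", "f", "b"], Cap::W),
        (vec!["x", "g"], Cap::E),
    ]);
    let actions = collapse(&vec!["x"], &mut exps, &mut caps);
    (actions, exps, caps)
}
```

In the example, x.f is collapsed first (with capability W, because x.f.b only has W), and then x is collapsed with capability W.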

Join Operation

The join algorithm on PCGs takes as input PCGs $G$ and $G^{'}$ and mutates $G$ to join in $G^{'}$ .

We define the join in this manner because this is similar to how the implementation works.

The algorithm proceeds in the following steps:

The Owned PCG of $G^{'}$ is joined into the Owned PCG of $G$ (this may also change the capabilities of $G$ )
The capabilities of $G^{'}$ are joined into the capabilities of $G$ .
The Borrow State of $G^{'}$ is joined into the Borrow State of $G$

We now describe each step in detail:

Owned PCG Join

Let $O$ be the owned PCG of $G$ and $O^{'}$ the owned PCG of $G^{'}$ .

For each local $v$ in the MIR body:
1. If $v$ is allocated in both $O$ and $O^{'}$ :
  1. Join the place expansions rooted at $O^{'} [v]$ into $O [v]$
2. Otherwise, if $v$ is allocated in $O$ , it should be deallocated in $O$

Joining Local Expansions (Algorithm Overview)

The algorithm for joining local expansions also involves modifying capabilities (including those of borrowed places), and may also cause some places to be labelled. We "emulate" a modification of the block to be joined in, so that we can use the resulting modified capabilities and borrows graph in the remainder of the join. We implement the join by defining a "one-way" join operation $⊔^{\leftarrow}$ , and then performing a join $(G, G^{'}) \leftarrow G ⊔^{\leftarrow} G^{'}$ and subsequently $(G^{'}, G) \leftarrow G^{'} ⊔^{\leftarrow} G$ . The owned PCGs of $G$ and $G^{'}$ (including the associated capabilities) should be the same after the join. Formally, we could imagine a partial order $<_{O}$ where $G_{1} <_{O} G_{2}$ iff the owned PCG of $G_{1}$ can be transformed to the owned PCG of $G_{2}$ by a sequence of repack operations on its owned PCG. The join identifies the maximum graph $G^{'}$ (w.r.t. $<_{O}$ ) where $G^{'} <_{O} G_{1}$ and $G^{'} <_{O} G_{2}$ .

Before we present the full algorithm (that takes into account borrows), we first describe a version that would work in the absence of borrows and therefore only considers capabilities W and E. The key point of the join is that all places/capabilities in the resulting joined PCGs should be obtainable from the originals (i.e. via a sequence of PCG repacks). The join starts from the root of both trees, and navigates downwards. For expansions that are the same in both, nothing needs to happen. If the other tree is expanded more than ours, we continue expanding ours: the only reason it would be expanded is that part of it is moved out, so we can expand and weaken the LHS to match the RHS. For example, if the LHS is {x: E}, and the RHS is {x.f: E, x.g: W}, we'd want to expand x in the LHS (note that the subsequent capability join will properly account for capabilities).

If we're at a place where the expansions differ in both trees (e.g. x -> {x@Some} vs x -> {x@None}), we collapse both to the place (e.g. x) and set the capability of the place to W. Neither of the expansion places is accessible by both, so the "most" capability we could obtain is the W permission to the base.

More formally, we define the one-way join $join_{\leftarrow} (G, G^{'})$ as the traversal where:

If we're at a leaf $p$ in $G$ that is not a leaf in $G^{'}$ (having children $\overline{p}$ ), then:
1. Expand $p$ to $\overline{p}$ in $G$
If we're at a node $p$ in $G$ where the expansion $e$ is not the same as the expansion $e^{'}$ in $G^{'}$ :
1. Collapse both $G$ and $G^{'}$ to $p$ for capability $W$
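
The borrow-free traversal can be sketched as below. This is a simplified illustration only: expansions are stored as a map from a base place to its children, places are plain strings, and instead of tracking full capability sets the sketch records the places that were collapsed to W. The names `Exps`, `remove_subtree` and `join_one_way` are hypothetical.

```rust
use std::collections::BTreeMap;

/// Owned expansions of one local: base place -> places it is expanded into.
type Exps = BTreeMap<String, Vec<String>>;

/// Removes the expansion of `p` and all expansions beneath it.
fn remove_subtree(e: &mut Exps, p: &str) {
    if let Some(children) = e.remove(p) {
        for c in &children {
            remove_subtree(e, c);
        }
    }
}

/// One-way join of `other` into `g`, starting at place `p`.
/// Places collapsed to capability W are appended to `weakened`.
fn join_one_way(g: &mut Exps, other: &mut Exps, p: &str, weakened: &mut Vec<String>) {
    match (g.get(p).cloned(), other.get(p).cloned()) {
        // p is a leaf in g but expanded in other: expand p in g too.
        (None, Some(children)) => {
            g.insert(p.to_string(), children.clone());
            for c in &children {
                join_one_way(g, other, c, weakened);
            }
        }
        // Different expansions: collapse both sides to p with W.
        (Some(ours), Some(theirs)) if ours != theirs => {
            remove_subtree(g, p);
            remove_subtree(other, p);
            weakened.push(p.to_string());
        }
        // Same expansion on both sides: keep navigating downwards.
        (Some(ours), Some(_)) => {
            for c in &ours {
                join_one_way(g, other, c, weakened);
            }
        }
        // p is a leaf in other: nothing to do in this direction.
        _ => {}
    }
}
```

Running the traversal once in each direction (first joining `other` into `g`, then `g` into `other`) leaves both sides with the same expansions in this simplified setting.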

Now, we extend the join to work with borrows and the read capability $R$ . Borrows complicate the story because they enable mutually exclusive (at runtime) places to appear in the PCG. For example:

fn f(x: Either<String, String>) -> &String {
    let result = match x {
        Left(l) => &l,
        Right(r) => &r
    };
    result
}

After the assignment, we should have $R$ capability to both x@Left and x@Right because they are borrowed into result. In the case of exclusive borrows, such places would have no capability but would still be present in the owned PCG (again, to reflect the target of result). Therefore, the collapse and expand rules previously shown are no longer sufficient.

If we're at a place $p$ that's a leaf in both $G$ and $G^{'}$ :
1. If $p$ has capability of at least $R$ in $G$ and has capability exactly $R$ in $G^{'}$ :
  1. Then, after the join, the place $p$ can have at most capability $R$ . It's also possible that we can now have incompatible owned expansions of $p$ in the joined PCG that have been borrowed under mutually exclusive paths.

TODO: continue. (An overly formal version is below)

Local Expansions Join: $join_{E}$

The algorithm $join_{E} (⟨ E, B, C ⟩, ⟨ E^{'}, B^{'}, C^{'} ⟩)$ joins a set of place expansions $E^{'}$ into a set of place expansions $E$ , and makes corresponding changes to borrows $B$ and capabilities $C$ .

If either $E$ or $E^{'}$ has any expansions:
1. Perform $join_{E}^{\leftarrow} (⟨ E, B, C ⟩, ⟨ E^{'}, B^{'}, C^{'} ⟩)$
2. Perform $join_{E}^{\leftarrow} (⟨ E^{'}, B^{'}, C^{'} ⟩, ⟨ E, B, C ⟩)$
Otherwise:
1. Let $v$ be the local
2. Perform $join_{v}^{\leftarrow} (⟨ B, C ⟩, ⟨ B^{'}, C^{'} ⟩)$
3. Perform $join_{v}^{\leftarrow} (⟨ B^{'}, C^{'} ⟩, ⟨ B, C ⟩)$

We define $join_{E}^{\leftarrow} (⟨ E, B, C ⟩, ⟨ E^{'}, B^{'}, C^{'} ⟩)$ as:

Let $\overline{p_{collapsed}} = \emptyset$
For each expansion $e^{'} \in E^{'}$ (shortest first):
1. Let $p^{'}$ be the base of $e^{'}$
2. If there exists a $p \in \overline{p_{collapsed}}$ where $p$ is a prefix of $p^{'}$ , then ignore this expansion
3. Otherwise, let $\overline{e}$ be the set of (direct) expansions from $e^{'}$ in $E$
4. If $e^{'} \in \overline{e}$ :
  1. Perform $join_{e^{'}}^{\leftarrow} (⟨ E, B, C ⟩, ⟨ E^{'}, B^{'}, C^{'} ⟩)$
5. Otherwise, if $\overline{e}$ is empty and $C [p^{'}] = c$ :
  1. If $C^{'} [p^{'}] = c^{'}$ :
    1. Perform $expand^{\leftarrow} (e^{'}, ⟨ c, B, C ⟩, ⟨ c^{'}, B^{'}, C^{'} ⟩)$
  2. Otherwise, perform a normal expansion operation of $e^{'}$ in $E$
6. Otherwise, perform $join_{e^{'}}^{\leftrightarrow} (⟨ E, B, C ⟩, ⟨ E^{'}, B^{'}, C^{'} ⟩)$
  1. If the result of the join is to collapse to $p^{'}$ , then add $p^{'}$ to $\overline{p_{collapsed}}$

We define $expand^{\leftarrow} (e, ⟨ c, B, C ⟩, ⟨ c^{'}, B^{'}, C^{'} ⟩)$ as:

If $c ⊔ c^{'} = c^{''}$
1. Add expansion $e$ to $E$
2. Perform $join_{e^{'}}^{\leftarrow} (⟨ E, B, C ⟩, ⟨ E^{'}, B^{'}, C^{'} ⟩)$
Otherwise, if $c = R$ :
1. Let $p$ be the base of $e$
2. Perform $join_{p_{RW}}^{\leftarrow} (p, B, C)$
Otherwise do nothing

Expansion Edge One-Way Join $join_{e}^{\leftarrow}$

We define $join_{e}^{\leftarrow} (⟨ E, B, C ⟩, ⟨ E^{'}, B^{'}, C^{'} ⟩)$ as:

For each place $p$ in the expansion of $e$ :
1. Perform $join_{p}^{\leftarrow} (⟨ E, B, C ⟩, ⟨ E^{'}, B^{'}, C^{'} ⟩)$

Expansion Edge Two-Way Join $join_{e}^{\leftrightarrow}$

We define $join_{e}^{\leftrightarrow} (⟨ E, B, C ⟩, ⟨ E^{'}, B^{'}, C^{'} ⟩)$ as:

Let $p$ be the base of $e$
If $C [p] = R$ and $C^{'} [p] = R$ :
1. Perform a regular expand of $e$
Otherwise, if there exists a descendant of $p$ in $E^{'}$ that does not have a capability:
1. Insert $e$ into $E$ (do not change capabilities)
Otherwise:
1. Collapse $E, C$ and $E^{'}, C^{'}$ to $p$

Place Join $join_{p}^{\leftarrow}$

We define $join_{p}^{\leftarrow} (⟨ E, B, C ⟩, ⟨ E^{'}, B^{'}, C^{'} ⟩)$ as:

If $p$ is not a leaf in $E$ or $p$ is not a leaf in $E^{'}$ or $p \notin C$ or $p \notin C^{'}$ :
1. Abort
Otherwise, let $c = C [p]$ , $c^{'} = C^{'} [p]$
If $c ⩾ R$ and $c^{'} = R$ :
1. Perform $copy_{R}^{\leftarrow} (p, C, C^{'})$
If $c = R$ and $c^{'} = W$ :
1. Perform $join_{p_{RW}}^{\leftarrow} (p, B, C)$
If $c = E$ and $c^{'} = W$ :
1. Perform $copy^{\leftarrow} (p . +, C, C^{'})$

Leaf RW Join $join_{p_{RW}}^{\leftarrow}$

We define $join_{p_{RW}}^{\leftarrow} (p, B, C)$ as:

Label all postfix places of $p$ in $B$
$C [p] = W$

TODO: Actually it seems like shared references should maintain E capability even if they're dereferenced

Place Capabilities Join

The algorithm join( $C, C^{'}$ ) is defined as:

For each p: c in $C$ :
1. If $p \notin C^{'}$ :
  1. Remove capability to $p$ in $C$
2. Otherwise:
  1. If $min (c, C^{'} [p])$ is defined:
    1. Assign capability $min (c, C^{'} [p])$ to $p$ in $C$
  2. Otherwise, remove capability to $p$ in $C$
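
The capability join can be sketched as follows. The partial order used here is an assumption made for illustration: W and R are both below E, and W and R are incomparable, so that $min (R, W)$ is undefined. The names `Cap`, `leq`, `min` and `join_caps` are hypothetical.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Cap {
    E,
    R,
    W,
}

/// Assumed partial order: W <= E and R <= E; W and R are incomparable.
fn leq(a: Cap, b: Cap) -> bool {
    a == b || b == Cap::E
}

/// The minimum of two capabilities, if they are comparable.
fn min(a: Cap, b: Cap) -> Option<Cap> {
    if leq(a, b) {
        Some(a)
    } else if leq(b, a) {
        Some(b)
    } else {
        None
    }
}

/// Joins the capabilities `other` into `c`, mutating `c` in place.
fn join_caps(c: &mut HashMap<&'static str, Cap>, other: &HashMap<&'static str, Cap>) {
    c.retain(|p, cap| {
        // Undefined if p is missing from `other` or the minimum does not exist.
        let joined = other.get(p).and_then(|o| min(*cap, *o));
        match joined {
            Some(m) => {
                *cap = m;
                true
            }
            None => false,
        }
    });
}
```

For instance, joining `{x: W, y: W}` into `{x: E, y: R, z: E}` leaves `{x: W}`: `y` is dropped because R and W have no minimum under the assumed order, and `z` is dropped because it has no capability in the other map.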

Borrow State Join

The borrow graphs are joined
The validity conditions are joined

Borrow PCG Join

Joining $B^{'}$ into $B$ , where $b$ is the block for $B$ and $b^{'}$ is the block for $B^{'}$ .

Update the validity conditions for each edge $e$ in $B^{'}$ to require the branch condition $b^{'} \to b$ .


If $b$ is a loop head perform the loop join algorithm as described here.
Otherwise:
1. For each edge $e^{'}$ in $B^{'}$ :
  1. If there exists an edge $e$ of the same kind in $B$
    1. Join the validity conditions associated with $e^{'}$ in $B^{'}$ to the validity conditions associated with $e$ in $B$
  2. Otherwise, add $e^{'}$ to $B$
2. For all placeholder lifetime projections $\hat{p} ↓ r at FUTURE$ in $B$ :
  1. Label all lifetime projection nodes of the form $\hat{p} ↓ r$ in $B$ with $FUTURE$

Loops

The loop head is the basic block with an incoming back edge. We define $l_{h}$ as the location of the first statement of the loop head.

The pre-loop block is the block prior to the loop head. We assume that there is always a unique pre-loop block. The pre-loop state $G_{pre}$ is the state after evaluating the terminator of the pre-loop block.

The following operations are performed when we join the pre-loop block with the loop head. Note that at this point we've already computed $G_{pre}$ .

We construct the state $G_{h}$ for the loop head as follows:

Step 1 - Identify Relevant Loop Places

We identify the following sets of places:

$P_{live}$ : the places used inside the loop that are live and initialized at $l_{h}$ .

TODO: Doesn't liveness imply initialization?

$P_{blocked}^{loop}$ : the subset of $P_{live}$ that are directly borrowed by a borrow live at $l_{h}$
$P_{blockers}$ : the subset of $P_{live}$ that contain lifetime projections
$P_{loop}$ : Places used in the loop that may be relevant for the invariant: $P_{blocked}^{loop} \cup P_{blockers}$

$N_{loop}$ is the union of $P_{loop}$ and the associated lifetime projections of $P_{loop}$ .

$RP_{blockers}$ are the associated lifetime projections of $P_{blockers}$ .

Step 2 - Obtain Relevant Loop Nodes in Pre-state Graph

The nodes in $N_{loop}$ will need to appear in $G_{h}$ but may not be present in $G_{pre}$ (for example, it's possible that the loop could borrow from a subplace that requires unpacking). We construct a graph $G_{pre}^{'}$ by performing the obtain operation on each place in $P_{loop}$ .

Step 3 - Identify Borrow Roots $P_{roots}$

The borrow roots of a place $p$ are the most immediate places that $p$ could be borrowing from and that could later become accessible, excluding places already in $P_{loop}$ .

We define the borrow roots using the function $roots (p)$ :

Initialize a list $L$ to contain all lifetime projections in $p$
while $L$ is not empty:
- Pop $n$ from $L$
- For each edge $e$ blocked by $n$ in $G_{pre}^{'}$ :
  - If the edge is an unpack edge, add all of its blocked nodes to $L$
  - Otherwise, for each blocked node $n^{'}$ in $e$ :
    - If $n^{'} \in N_{roots}^{p}$ or $n^{'} \in N_{loop}$ , do nothing
    - Otherwise, if $n^{'}$ is live at $l_{h}$ , add $n^{'}$ to $N_{roots}^{p}$
    - Otherwise, if $n^{'}$ is a root in $G_{pre}^{'}$ , add $n^{'}$ to $N_{roots}^{p}$
    - Otherwise, add $n^{'}$ to $L$
The resulting roots are the associated places of $N_{roots}^{p}$
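
The root search can be sketched as the worklist below. It is an illustration under stated assumptions: nodes are identified by strings, the graph is assumed to be acyclic, the four cases for a blocked node are read as an if/else-if chain, and a node counts as a root of the graph when it does not block anything. The `Edge` struct and the `roots` signature are hypothetical, and the function returns the root nodes rather than their associated places.

```rust
use std::collections::HashSet;

/// A (hyper)edge: the nodes in `blocked_by` block the nodes in `blocked`.
struct Edge {
    blocked: Vec<&'static str>,
    blocked_by: Vec<&'static str>,
    unpack: bool,
}

fn roots(
    projections: &[&'static str],
    edges: &[Edge],
    n_loop: &HashSet<&'static str>,
    live_at_head: &HashSet<&'static str>,
) -> HashSet<&'static str> {
    // Assumption: a graph root is a node that blocks nothing.
    let is_graph_root =
        |n: &'static str| !edges.iter().any(|e| e.blocked_by.contains(&n));

    let mut worklist: Vec<&'static str> = projections.to_vec();
    let mut found: HashSet<&'static str> = HashSet::new();
    while let Some(n) = worklist.pop() {
        // Edges blocked by n, i.e. edges in which n is a blocker.
        for e in edges.iter().filter(|e| e.blocked_by.contains(&n)) {
            for &b in &e.blocked {
                if e.unpack {
                    worklist.push(b);
                } else if found.contains(b) || n_loop.contains(b) {
                    // Already a root, or already covered by the loop nodes.
                } else if live_at_head.contains(b) || is_graph_root(b) {
                    found.insert(b);
                } else {
                    worklist.push(b);
                }
            }
        }
    }
    found
}

/// p↓'a blocks t↓'a and q↓'a; t↓'a blocks y↓'a; q↓'a is a loop node.
fn example() -> HashSet<&'static str> {
    let edges = vec![
        Edge { blocked: vec!["t↓'a"], blocked_by: vec!["p↓'a"], unpack: false },
        Edge { blocked: vec!["y↓'a"], blocked_by: vec!["t↓'a"], unpack: false },
        Edge { blocked: vec!["q↓'a"], blocked_by: vec!["p↓'a"], unpack: false },
    ];
    let n_loop = HashSet::from(["q↓'a"]);
    let live = HashSet::new();
    roots(&["p↓'a"], &edges, &n_loop, &live)
}
```

In the example the intermediate node t↓'a is neither live nor a graph root, so the search continues through it and reports y↓'a as the only root.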

We then identify $P_{roots}$ , the most immediate nodes that $P_{loop}$ could be borrowing from and later become accessible (excluding nodes already in $P_{loop}$ ). $P_{roots}$ is the union of the roots for each place in $P_{loop}$ .

Step 4 - Construct Abstraction Graph And Compute Blocked Lifetime Projections

We construct an abstraction graph $A$ that describes the blocking relations potentially introduced in the loop from places in $P_{roots}$ to nodes in $P_{blockers}$ and from nodes in $P_{blocked}^{loop}$ to nodes in $P_{blockers}$ .

connect() function:

We begin by defining a helper function $connect (p_{blocked}, p_{blocker})$ which adds edges to $A$ based on $p_{blocked}$ being blocked by $p_{blocker}$ in the loop:

Identify $p_{blocker} ↓ r_{top}$ : the top-level lifetime projection in $p_{blocker}$
- Insert a loop abstraction edge ${p_{blocked}} \to {p_{blocker} ↓ r_{top}}$ into $A$
For each $p_{blocked} ↓ r \in RP (p_{blocked})$ :
- Identify the lifetime projections in $p_{blocker}$ that may mutate borrows in $p_{blocked} ↓ r$
  - $RP_{mut} = {p ↓ r^{'} ∣ p ↓ r^{'} \in RP (p_{blocker}), r \approx r^{'}}$
  - If $RP_{mut}$ is nonempty:
    - Introduce a placeholder node $p_{blocked} ↓ r at FUTURE$
    - Add a borrowflow hyperedge ${p_{blocked} ↓ r} \to RP_{mut}$
    - Add future hyperedges:
      - ${p_{blocked} ↓ r} \to {p_{blocked} ↓ r at FUTURE}$
      - $RP_{mut} \to {p_{blocked} ↓ r at FUTURE}$
- Identify the lifetime projections in $p_{blocker}$ that borrows in $p_{blocked} ↓ r$ may flow into
  - $RP_{flow} = {p ↓ r^{'} ∣ p ↓ r^{'} \in RP (p_{blocker}), r outlives r^{'}} ∖ RP_{mut}$
  - If $RP_{flow}$ is nonempty:
    - Add a borrowflow hyperedge ${p_{blocked} ↓ r} \to RP_{flow}$

Algorithm:

For each blocker place $p_{blocker} \in P_{blockers}$ :
- For each $p_{blocked} \in (P_{roots} \cup P_{blocked}^{loop})$ :
  - If $p_{blocked}$ is a remote node, or if $p_{blocker}$ blocks $p_{blocked}$ at $l_{h}$ :
    - Perform $connect (p_{blocked}, p_{blocker})$
Subsequently, ensure that $A$ is well-formed by adding unpack edges where appropriate. For example, if (*x).f is in the graph, there should also be an expansion edge from *x to (*x).f.
We identify the set $RP_{rename}$ of lifetime projections that will need to be renamed (indicating they will be expanded in the loop and remain non-accessible at the loop head). $RP_{rename}$ is the set of non-leaf lifetime projection nodes in $A$ (leaf nodes are accessible at the head).
Label all lifetime projections in $RP_{rename}$ with location $l_{h}$ , and add connections to their Future nodes as necessary.

Step 5 - Label Blocked Lifetime Projections in Pre-State

The resulting graph for the loop head will require new labels on lifetime projections modified in the loop. We begin by constructing an intermediate graph $G_{pre}^{''}$ by labelling each lifetime projection in $RP_{rename}$ with $l_{h}$ and removing capabilities to all places in $P_{remove}$ in $G_{pre}^{'}$ .

Step 6 - Identify Pre-State Subgraph to Replace With Abstraction Graph

We then identify the subgraph $G_{cut} \subseteq G_{pre}^{''}$ that will be removed from $G_{pre}^{''}$ and replaced with the loop abstraction $A$ .

Let $N_{cut}$ = $nodes (A)$ .
We construct $G_{cut}$ by including all nodes in $N_{cut}$ and all edges $e$ where there exists $n, n^{'} \in N_{cut}$ where $e$ is on a path between $n$ and $n^{'}$ in $G_{pre}^{''}$ .

Step 7 - Replace Pre-State Subgraph with Abstraction Graph

The graph $G_{h}$ for the loop head is defined as $G_{h} = (G_{pre}^{''} ∖ G_{cut}) \cup A$

Step 8 (Optional) - Confirm Invariant is Correct

To confirm that the resulting graph is correct, for any back edge into the state at $G_{h}$ with state $G^{'}$ , performing the loop join operation on $G^{'}$ and $G_{h}$ should yield $G_{h}$ .

Function Shapes

Background

The PCG generates a "shape" for a function call - a bipartite graph indicating where borrows could flow as a result of the call. In particular this shape represents (1) reborrows that are returned from the function (w.r.t borrows passed in the arguments), and (2) the effects of the nested borrows passed in the operands.

We model the first case by introducing abstraction edges between lifetime projections in the operands and those in the result place: each lifetime projection in the operands is connected to the corresponding lifetime projections that it outlives in the result place. The (compiler-checked) outlives constraint captures whether borrows could be assigned in this way, according to the type system.

For (2), our analysis of function calls directly models the potentially changed sets of borrows by introducing lifetime projections that represent the post-state of each lifetime projection in the operands. Each lifetime projection in the operands is connected with abstraction edges to its corresponding post-state projection as well as the post-state nested lifetime projections that it outlives (analogously to sets of borrows explicitly returned).

Creating Function Shapes

Function Shapes

A function shape source base $B_{S}$ takes the form $arg i$ . A function shape target base $B_{T}$ is either $arg i$ or $result$ .

A function shape source node $N_{B}$ is a pair $⟨ B_{S}, i ⟩$ where $i$ is the region index of the node. Function shape target nodes $N_{T}$ are defined analogously.

A function shape edge is a pair $⟨ N_{B}, N_{T} ⟩$ , and a function shape $S$ is a set of edges.

A shape $S$ permits more borrowing than a shape $S^{'}$ iff $S^{'} \subseteq S$ ; likewise $S$ permits less borrowing than $S^{'}$ iff $S \subseteq S^{'}$ .
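
These definitions map directly onto sets of node pairs. The sketch below is one possible encoding; the names `Base`, `Node`, `Shape` and the two comparison functions are hypothetical, and the restriction that a source base is always an argument is not enforced by the types.

```rust
use std::collections::BTreeSet;

/// A shape base: the i-th argument or the result.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Base {
    Arg(usize),
    Result,
}

/// A node is a base together with a region index.
type Node = (Base, usize);

/// A shape is a set of (source node, target node) edges.
type Shape = BTreeSet<(Node, Node)>;

/// `s` permits more borrowing than `other` iff `other` is a subset of `s`.
fn permits_more_borrowing(s: &Shape, other: &Shape) -> bool {
    other.is_subset(s)
}

/// `s` permits less borrowing than `other` iff `s` is a subset of `other`.
fn permits_less_borrowing(s: &Shape, other: &Shape) -> bool {
    s.is_subset(other)
}
```

Note that both relations are non-strict under this definition: every shape permits both more and less borrowing than itself.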

Signature Shape

The corresponding node $n (grp)$ of a generalized lifetime projection $grp \in GRP (\hat{f})$ is $⟨ base (grp), index (grp) ⟩$ .

The corresponding generalized lifetime projection $grp (n)$ of a node $n = ⟨ b, i ⟩$ is the generalized lifetime projection $grp \in GRP (\hat{f})$ such that $n (grp) = n$ .

A generalized lifetime $gr$ outlives a generalized lifetime $gr^{'}$ in the signature of $\hat{f}$ (denoted $\hat{f} ⊢ gr : gr^{'}$ ) iff:

$gr = gr^{'}$ , or
$gr$ is a generalized type $τ^{*}$ and $gr^{'}$ is a region $r^{'}$ and $E (\hat{f}) ⊢ gr : r^{'}$

Note that $RegionsIn (τ)$ represents all regions that could occur in $τ$ . More generally, $RegionsIn (τ)$ outlives $RegionsIn (τ^{'})$ when all regions in $τ$ outlive all corresponding regions in $τ^{'}$ . The current implementation handles the case where $τ = τ^{'}$ (reflexivity); other cases may be added in the future.

The signature shape $S_{\hat{f}}^{sig}$ for a function instantiation $\hat{f}$ is defined as follows:

For each $⟨ b_{s} ↓ gr_{s}, b_{t} ↓ gr_{t} ⟩ \in GRP (\hat{f}) \times GRP (\hat{f})$ , add $⟨ n (b_{s} ↓ gr_{s}), n (b_{t} ↓ gr_{t}) ⟩$ to $S_{\hat{f}}^{sig}$ if both:

$\hat{f} ⊢ gr_{s} : gr_{t}$ , and
$b_{t}$ is $result$ , or $gr_{s}$ is a region $r$ that is invariant in $b_{t}$ .

Call Shape

The corresponding node $n (rp)$ of a lifetime projection $op ↓ r \in RP (FC)$ is $⟨ arg i, index (rp) ⟩$ .

The corresponding node $n (rp)$ of a lifetime projection $p ↓ r \in RP (FC)$ is $⟨ result, index (rp) ⟩$ .

${⟨ base (rp), index (rp) ⟩ ∣ rp \in RP (\hat{f})}$

The call shape $S_{FC}^{call}$ for a function call $FC$ is defined as follows:

For each $⟨ b_{s} ↓ r_{s}, b_{t} ↓ r_{t} ⟩ \in RP (FC) \times RP (FC)$ , add $⟨ n (b_{s} ↓ r_{s}), n (b_{t} ↓ r_{t}) ⟩$ to $S_{FC}^{call}$ if both:

$r_{s} outlives r_{t}$ at $l$ according to the borrow checker, and
$b_{t}$ is $p$ , or $r_{s}$ is invariant in $b_{t}$ .
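
The call shape construction can be sketched as below. The borrow checker's outlives query and the invariance check are passed in as closures, since their real implementations are outside the scope of this page; `Projection`, `call_shape` and the closure parameters are hypothetical names. The sketch also restricts sources to argument projections, following the definition of function shape source bases.

```rust
use std::collections::BTreeSet;

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Base {
    Arg(usize),
    Result,
}

type Node = (Base, usize);

/// A lifetime projection of the call: its base, region index and region name.
#[derive(Clone, Copy)]
struct Projection {
    base: Base,
    index: usize,
    region: &'static str,
}

fn call_shape(
    rps: &[Projection],
    outlives: impl Fn(&str, &str) -> bool,
    invariant_in: impl Fn(&str, Base) -> bool,
) -> BTreeSet<(Node, Node)> {
    let mut shape = BTreeSet::new();
    // Sources are restricted to projections of the arguments.
    for s in rps.iter().filter(|rp| matches!(rp.base, Base::Arg(_))) {
        for t in rps {
            // The target is the result place, or the source region is
            // invariant in the target base.
            let target_ok = t.base == Base::Result || invariant_in(s.region, t.base);
            if outlives(s.region, t.region) && target_ok {
                shape.insert(((s.base, s.index), (t.base, t.index)));
            }
        }
    }
    shape
}

/// A call shaped like `p = f(x)` with x: &'a mut T, p: &'b mut T and 'a: 'b.
fn example() -> BTreeSet<(Node, Node)> {
    let rps = [
        Projection { base: Base::Arg(0), index: 0, region: "'a" },
        Projection { base: Base::Result, index: 0, region: "'b" },
    ];
    call_shape(
        &rps,
        |r, s| r == s || (r == "'a" && s == "'b"),
        |_, _| false, // assumption: no region is invariant in any base here
    )
}
```

In the example the only edge produced connects the argument's projection to the result's projection; the argument is not connected to itself because `'a` is assumed not to be invariant in the argument's type.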

Type Aliases and Normalization

An alias type $τ_{α}$ is a type of the form $τ :: T ⟨ \overline{τ^{*}} ⟩$ where $T$ is a type constructor. The function $normalize (τ, E)$ returns a type $τ^{'}$ where alias types in $τ$ may possibly be replaced with other types. This normalisation is idempotent, i.e. $normalize (normalize (τ, E), E) = normalize (τ, E)$ .

Signature-Derived Call Shape

For a call $FC = (p = \hat{f} (\overline{o p}))$ at $l$ , the signature-derived call shape $S_{FC}^{sig}$ is obtained as follows:

Let $S_{\hat{f}}^{norm}$ be the normalized signature shape, i.e. the one obtained by replacing each $τ$ in $S_{\hat{f}}^{sig}$ with $normalize (τ, E (FC))$ .

If $b$ is the $i$'th operand in $FC$ , the corresponding normalized type $τ_{b}$ is the type of the $i$'th argument in $S_{\hat{f}}^{norm}$ . Likewise, if $b = result$ , then $τ_{b}$ is the output type of $S_{\hat{f}}^{norm}$ . Then, the corresponding normalized region of a lifetime projection $b ↓ r$ is the region in $τ_{b}$ that corresponds to $r$ in $b$ .

For each $(n_{s}, n_{t}) \in S_{FC}^{call}$ :

Let $b_{s} ↓ r_{s} = r p (n_{s})$ , $b_{t} ↓ r_{t} = r p (n_{t})$ be the corresponding lifetime projections
Then, let $r_{s}^{'}$ and $r_{t}^{'}$ be the corresponding normalized regions of $r_{s}$ and $r_{t}$ respectively.
If $r_{s}^{'}$ outlives $r_{t}^{'}$ in $S_{\hat{f}}^{norm}$ , then add $(n_{s}, n_{t})$ to $S_{FC}^{sig}$
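
The filtering step above can be sketched in Rust as follows. This is only an illustration under stated assumptions: the function name and the two closures (`normalized_region`, which maps a node to its corresponding normalized region, and `outlives`, which decides the outlives relation in the normalized signature shape) are hypothetical stand-ins and not part of the PCG implementation.

```rust
/// Keeps the call-shape edges whose normalized source region outlives the
/// normalized target region.
///
/// `normalized_region` and `outlives` are hypothetical stand-ins for the
/// lookups described in the text; they are not names from the PCG codebase.
fn signature_derived_call_shape<N: Clone, R>(
    call_shape: &[(N, N)],
    normalized_region: impl Fn(&N) -> R,
    outlives: impl Fn(&R, &R) -> bool,
) -> Vec<(N, N)> {
    call_shape
        .iter()
        // An edge survives only if the signature also justifies it.
        .filter(|(n_s, n_t)| outlives(&normalized_region(n_s), &normalized_region(n_t)))
        .cloned()
        .collect()
}
```

With nodes modelled as `(base, index)` pairs, an edge whose source maps to a region that the signature does not relate to the target's region is dropped, even if the borrow checker reports an outlives relation at the call site.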

Using shapes for function calls in the PCG

If the call is to a defined function, then the signature-derived call shape is used. Otherwise, the call shape is used.

More Background

For a function $f$ , there are three types of shapes to consider:

Signature Shapes: The shape of an instantiation of $f$ generated from its signature
Call Shapes: A shape used to represent a call to an instantiation of $f$
Implementation Shapes: A shape representing $f$ 's body, which connects remote lifetime projection nodes to the result.

These different shape types are relevant for Prusti, as:

Signature shapes define the shape of a magic wand
Call shapes are used for magic wands that will be applied
Implementation shapes define magic wands that will be packaged

The call shape and implementation shapes are necessarily related to the signature shape; the former can contain more edges while the latter can contain fewer.

The shape for a function call must necessarily have corresponding edges that appear in the shape for the function signature. The reverse is not necessarily the case. For example, consider the following:

fn caller<'e, 'f: 'e>(x: T<'e>, y: U<'e>, z: T<'f>) {
    let r: W<'e> = f(x, y, z);
}

Using the borrow-checker to generate the shape of the call will result in an edge from z|'f to r|'e: the borrow-checker reflects that 'f: 'e. However, the definition of f does not allow borrows to flow from z to r (which is reflected in the function call shape).

Therefore, to build more precise graphs at function call sites, we want to use the shape of the function to determine the shape of the call. The procedure is as follows:

Generate the shape of the function.
Build a map $M$ from Def Projections to Call Projections by comparing the types of the arg places $T_{p}$ and formal args $T_{a}$ (analogously to the results). By construction, $M$ will contain all projections in the function def (even if they don't appear in the shape).

Here's an example of how it's computed:

If the type of the $i$ 'th place $T_{p}$ is &'?1 mut U<'?2> and the type of the $i$ 'th formal arg $T_{a}$ is &'a mut U<'b>, then the visitor will first compare at the top level, add arg_i|'a -> p_i|'?1 to the map, and continue by comparing U<'?2> to U<'b>.

Then, for each edge in the fn shape, the edge is replaced with the corresponding projections of the call. For example, if the fn shape has an edge arg 1|'a -> result|'b, then for the call shape it will look up the corresponding call projections and add an edge between them.

Reasoning about Associated Types

Consider the following code:

trait MyTrait<'a> {
    type Assoc<'b> where 'a: 'b;

    fn get_assoc<'slf, 'b>(&'slf mut self) -> Self::Assoc<'b> {
        todo!()
    }
}

The full signature for get_assoc is:

fn(&'slf mut Self) -> <Self as MyTrait<'a>>::Assoc<'b>

We observe that the fn sig has the following lifetime projections:

argidx 0|'slf
result 0|'a
result 0|'b

And the shape for get_assoc contains the single edge:

argidx 0|'slf -> result|'b

For the body, self has type &'?8 mut Self/#0 and result has type Alias(Projection, AliasTy { args: [Self/#0, '?6, '?7], def_id: DefId(0:5 ~ input[9b88]::MyTrait::Assoc), .. }).

So we unify &'slf mut Self with &'?8 mut Self, adding argidx 0|'slf -> _1|'?8 to $M$ .

We unify <Self as MyTrait<'a>>::Assoc<'b> with <Self as MyTrait<'?6>>::Assoc<'?7>, adding

result|'a -> result|'?6 to $M$
result|'b -> result|'?7 to $M$

Then applying the substitutions our shape is _1|'?8 -> result|'?7
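
The substitution step in this example can be sketched in Rust. The sketch assumes a simplified representation in which a lifetime projection is a pair of strings and $M$ is a hash map; the names `Projection`, `proj` and `instantiate_shape` are hypothetical and chosen for illustration only.

```rust
use std::collections::HashMap;

/// A lifetime projection, written `base|'region` in the text.
/// Hypothetical simplified representation: both parts are plain strings.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Projection {
    base: String,
    region: String,
}

/// Convenience constructor used by the example.
fn proj(base: &str, region: &str) -> Projection {
    Projection { base: base.to_string(), region: region.to_string() }
}

/// Replaces both endpoints of every edge in the function shape with the
/// corresponding call projections from the map `m`.
/// The text states that `m` contains all projections of the definition, so
/// skipping unmapped edges here is purely defensive.
fn instantiate_shape(
    fn_shape: &[(Projection, Projection)],
    m: &HashMap<Projection, Projection>,
) -> Vec<(Projection, Projection)> {
    fn_shape
        .iter()
        .filter_map(|(src, tgt)| Some((m.get(src)?.clone(), m.get(tgt)?.clone())))
        .collect()
}
```

Feeding it the three entries of $M$ listed above and the single edge argidx 0|'slf -> result|'b yields the edge _1|'?8 -> result|'?7.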

Function Calls

Consider a function call $p_{result} = f (o_{1}, \dots, o_{n})$ at location $l$ . Each $o_{i} \in {o_{1}, \dots, o_{n}}$ is an operand of the form move p | copy p | const c. Let $\overline{p}$ be the set of moved operand places.

The function call abstraction is created in the PostMain evaluation stage, after all of the operands have been evaluated. Therefore, all places in $\overline{p}$ are labelled, as they have been moved-out by this point.

Algorithm

Let $RP$ be the set of lifetime projections in $\overline{p}$ (that is, obtained from places in $\overline{p}$ , regardless of whether they are already in the graph). For any place $p$ , we define $\tilde{p}$ as the place $p \text{ at before } l : PostOperands$ .

We define the set of lifetime projections of the moved-out operands as $\widetilde{RP} = \{\tilde{p} ↓ r ∣ p ↓ r \in RP\}$ .

We define the "pre-call states" of $RP$ as $S = \{\tilde{p} ↓ r \text{ at } PRE ∣ p ↓ r \in RP\}$ .

We define the "post-call states" of $RP$ as $T = \{\tilde{p} ↓ r \text{ at } POST ∣ p ↓ r \in RP \text{ and } r \text{ is nested in } p\}$ .

The definitions of sets $RP$ , $\widetilde{RP}$ , $S$ , and $T$ do not depend on the nodes present in the PCG. In general, we postulate that the following invariants should always hold:

Only the nodes in $RP$ are in the graph at the end of the $PreOperands$ stage
Only the nodes in $\widetilde{RP}$ are in the graph at the end of the $PostOperands$ stage
Only the nodes in $S \cup T$ are in the graph at the end of the $PostMain$ stage

1. Redirect Future Nodes

This follows the standard process, e.g. similar to borrows and borrow PCG expansions

For each Future edge $e$ where the source node is in $RP$ :
1. Change the source node from $⟨ p ↓ r ⟩$ to $⟨ p ↓ r at POST ⟩$

2. Label Lifetime Projections for Pre-State

Label projections in $RP$ to become $S$ in the graph:

For each $0 < i ⩽ ∣ RP ∣$ :
1. Label the $RP_{i}$ with $PRE$

3. Add Abstraction Edges

At a high level, we construct the abstraction by connecting the lifetime projections in $S$ to the lifetime projections in $T$ and the lifetime projections in $p_{result}$ , where connections are made based on outlives constraints. Concretely:

Let $O$ be the union of $T$ and the lifetime projections in $p_{result}$
For each $\tilde{p} ↓ r_{s} \text{ at } PRE$ in $S$ :
1. Let $O_{s}$ contain each lifetime projection $r p \in O$ where $r_{s}$ outlives the lifetime of $r p$
2. If $O_{s}$ is not empty, add the abstraction edge $\{\tilde{p} ↓ r_{s} \text{ at } PRE\} \to O_{s}$ to the PCG
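
A minimal Rust sketch of this edge-construction step is given below. It assumes a simplified model in which projections are pairs of strings and the outlives relation is supplied by the caller as a closure; `Projection` and `abstraction_edges` are hypothetical names, not the implementation's.

```rust
/// Hypothetical simplified lifetime projection: a (labelled) place and a region.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Projection {
    place: String,
    region: String,
}

/// For each source projection in `s`, collects the projections in `o` whose
/// region is outlived by the source's region, and emits one hyperedge
/// (source, targets) whenever that set is non-empty.
///
/// `outlives(a, b)` is an assumed oracle answering "does `a` outlive `b`?".
fn abstraction_edges(
    s: &[Projection],
    o: &[Projection],
    outlives: impl Fn(&str, &str) -> bool,
) -> Vec<(Projection, Vec<Projection>)> {
    s.iter()
        .filter_map(|src| {
            let targets: Vec<Projection> = o
                .iter()
                .filter(|rp| outlives(src.region.as_str(), rp.region.as_str()))
                .cloned()
                .collect();
            // Sources with no outlived target contribute no edge.
            if targets.is_empty() { None } else { Some((src.clone(), targets)) }
        })
        .collect()
}
```

Each returned pair corresponds to one abstraction edge whose single source is a pre-call state and whose targets are the post-call states and result projections it may flow into.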

Generating Annotations

Generating Annotations Between Statements

The annotations for transitioning from statement $i$ to statement $i + 1$ are the recorded PCG actions performed during its analysis.

Generating Annotations Between Basic Blocks

We generate the annotations to get from the state $s$ of $b$ to its successor $b^{'}$ . The state $s^{'}$ of $b^{'}$ may be the join of multiple basic blocks (including $b$ ).

We generate annotations by performing a join that joins $s^{'}$ into a copy of $s$ and recording the actions that occur. Note that this is the reverse of the join that occurs during the analysis where $s$ is joined into $s^{'}$ . Conceptually, the actions performed by the join are the procedure that weakens state $s$ to obtain state $s^{'}$ .

Generating Annotations for Magic Wands

Clients like Prusti want to generate magic wands from function bodies.

The PCG facilitates this via the UnblockGraph interface. The interface takes a PCG $G$ and a PCG node $n$ , and returns an ordered list $L$ of annotations that describe how to unblock $n$ in $G$ . Prusti can consume these annotations to generate magic wands.

The interface could trivially be extended to unblock multiple nodes simultaneously

The annotations generated from the UnblockGraph are Borrow PCG edges.

At a high level, the resulting annotations are a topological sort of the edges in the subgraph of $G$ that contains $n$ and all of its ancestors. Concretely, the procedure to generate the list $L$ of annotations is as follows:

Let $U$ be the subgraph containing $n$ and its ancestors in $G$
While $U$ has at least one edge:
1. Let $\overline{e}$ be the set of leaf edges in $U$ .
2. If $\overline{e}$ is empty, fail
3. Append $\overline{e}$ to $L$
4. Remove $\overline{e}$ from $U$

We note that the above procedure could fail, for example, $U$ could contain a cycle. We expect that in practice this is quite rare. Furthermore, we believe the following property should hold:

For any list $\overline{e}$ of PCG edges forming a cycle, there does not exist any execution path that can satisfy the validity conditions of every edge in the cycle.

Therefore, it should be possible to modify the implementation to e.g. produce multiple lists for distinct paths.
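
The leaf-peeling loop of this section can be sketched in Rust. The sketch makes two assumptions that are not stated in the text: nodes are plain integers, and a "leaf edge" is an edge none of whose target nodes is a source of an edge still in $U$ . The names `Edge` and `unblock_order` are hypothetical.

```rust
/// Hypothetical hyperedge over integer node ids.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Edge {
    sources: Vec<u32>,
    targets: Vec<u32>,
}

/// Repeatedly removes the current leaf edges of `u`, appending them to the
/// result list. Returns `None` if edges remain but none is a leaf, which
/// happens when `u` contains a cycle.
fn unblock_order(mut u: Vec<Edge>) -> Option<Vec<Edge>> {
    let mut l = Vec::new();
    while !u.is_empty() {
        // Assumed leaf criterion: no target of `e` is the source of an edge in `u`.
        let is_leaf = |e: &Edge| {
            !e.targets.iter().any(|t| u.iter().any(|other| other.sources.contains(t)))
        };
        let (leaves, rest): (Vec<Edge>, Vec<Edge>) =
            u.iter().cloned().partition(|e| is_leaf(e));
        if leaves.is_empty() {
            return None; // no progress is possible
        }
        l.extend(leaves);
        u = rest;
    }
    Some(l)
}
```

On a chain with edges 1 -> 2 and 2 -> 3, the edge 2 -> 3 is emitted first and 1 -> 2 second; on the two-edge cycle 1 -> 2, 2 -> 1 the function returns `None`.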

Misc (To Update)

Unpacking Places

An unpack operation introduces an Unpack edge into the graph, and is associated with a source place $p$ and an expansion $\overline{p}$ . An unpack operation can either be mutable or read-only.

An unpack operation is a dereference unpack for lifetime $r$ iff the type of the source place $p$ is &'r mut $τ$ or &'r $τ$ for some type $τ$ .

Applying an unpack operation introduces an Unpack edge as follows:

If the unpack operation is a dereference operation, the edge ${p, p ↓ r} \to {* p}$ is added to the graph.
Otherwise, the edge ${p} \to {\overline{p}}$ is added to the graph.

A mutable unpack operation for capability $c \in \{W, E\}$ requires that the source place $p$ have capability $c$ . When the operation is applied, the capability for $p$ is removed and the capability of all places in $\overline{p}$ is set to $c$ .

A read-only unpack operation requires that $p$ have capability $R$ and assigns capability $R$ to all places in $\overline{p}$ .

Updating Lifetime Projections

Assume the source place $p$ has type $τ$ with lifetimes $r_{1}, \dots, r_{n}$ . If the unpack operation is mutable, then we label each lifetime projection $p ↓ r_{i}$ with location $l$ .

The source lifetime projection for a lifetime $r$ is $p ↓ r$ at $l$ if the unpack is mutable, and $p ↓ r$ otherwise. The target lifetime projections are the set $\{p^{'} ↓ r ∣ p^{'} \in \overline{p} \text{ and } r \text{ is in the type of } p^{'}\}$ .

For each lifetime $r \in {r_{1}, \dots, r_{n}}$ , if the set of target lifetime projections associated with $r$ is nonempty:

For each $t$ in the set of target projections, add a BorrowFlow edge ${s} \to {t}$ where $s$ is the source projection¹.

If the unpack is mutable and the source place of the expansion is either a reference or a borrowed place:
a. Add a Future edge ${s} \to {p ↓ r atFuture}$
b. For each $t$ in the set of target projections, add a Future edge ${t} \to {p ↓ r atFuture}$
c. If any Future edges originate from the source projection $s$ , redirect them such that they originate from $p ↓ r atFuture$ .

Origin Containing Loan

An origin $r$ contains a loan $r_{L}$ created at location $l$ iff:

Polonius: Read directly from output facts

NLL: $r_{L}$ is live at $l$ and $r_{L}$ outlives $l$

TODO: Currently we actually introduce an unpack edge in the implementation, but we should change this. ↩

PCG Operations

Obtaining Capability to a Place

The operation to obtain capability $c$ to a place $p$ in a PCG $G$ proceeds as follows.

When performing this operation to satisfy the place capability required for a statement, the analysis guarantees that there are no live mutable borrows conflicting with $p$ ¹. At a high level, capability to $p$ is obtained by first collapsing the owned places in $G$ to $p$ .

Step 1 - Label dereferences of shared borrows stored in $p$

Reborrows of shared references derived from $p$ (i.e. from any postfix of $p$ ) will survive even if $p$ is moved or mutated. Therefore, if $c$ permits the function to mutate $p$ (i.e. $c \neq R$ ), then the analysis labels all places that project from shared references derived from $p$ .

Formally:

If $c \neq R$ , then for each place $p^{'}$ in $G$ where:

$p^{'}$ is a strict postfix of $p$ , and
$p^{'}$ is a shared reference, and
$* p^{'}$ is in $G$

The analysis labels every postfix of $p^{'}$ in $G$ .

Step 2 - Collapse

Implementation

This operation is implemented as PlaceObtainer::obtain

The obtain(p, o) operation reorganizes a PCG state to a new state in which the PCG node for a place $p$ is present in the graph. The parameter $o$ is an Obtain Type which describes the reason for the expansion. The reason $o$ is either:

To obtain a specified capability to the place
To access the place because it is the borrowed place of a two-phase borrow

An Obtain Type is associated with a result capability $c$ which is either the specified capability (in the first case), or Read (in the case of access for two-phase borrows).

The "two-phase" borrow case is likely unnecessary: we can use the borrow-checker to detect if the place $p$ is also the borrowed place of a two-phase borrow reserved at the current location. In fact, the current implementation makes similar queries as part of the expand step.

Note that a place node for $p$ is not necessarily present in the graph before this occurs.

This proceeds as follows:

Step 1 - Label dereferences of shared borrows stored in $p$

(Same as high-level description)

Step 2 - Upgrading Closest Ancestor From R to E

This step is included to handle a relatively uncommon case (see the Rationale section below).

If the obtain operation is called with permission $W$ or $E$ and the closest ancestor $p^{'}$ to $p$ , that is, the longest prefix of $p$ for which there exists a node in the graph, has $R$ capability, we upgrade $p^{'}$ 's capability to $E$ in exchange for removing capability to all pre- and postfix places of $p^{'}$ in the graph (excluding $p^{'}$ itself).

This is sound because if we need to obtain non-read capability to a place, and there are any ancestors of the place in the graph with R capability, then one such ancestor originally had E capability and was subsequently downgraded. This function finds such an ancestor (if one exists), and performs the capability exchange.

Perhaps it would be better to explicitly track downgrades in the analysis so that they can be upgraded later? This will make the soundness argument more convincing.

Rationale

It's possible that we want to obtain exclusive or write permission to a field that we currently only have read access for. For example, consider the following case:

There is an existing shared borrow of (*c).f1
Therefore we have read permission to *c, (*c).f1, and (*c).f2
Then, we want to create a mutable borrow of (*c).f2
This requires obtaining exclusive permission to (*c).f2

We can upgrade capability of (*c).f2 from R to E by downgrading all other pre-and postfix places of (*c).f2 to None (in this case c and *c). In the example, (*c).f2 is actually the closest read ancestor, but this is not always the case (e.g. if we wanted to obtain (*c).f2.f3 instead)

Step 3 - Collapse

Then, if a node for $p$ exists in the graph and $p$ 's capability is not at least as strong as $c$ , collapse the subgraph rooted at $p$ (and obtain capability $c$ for $p$ ) by performing the collapse(p, c) operation.

collapse

The collapse(p) operation is implemented as follows:

For each $p^{'}$ such that $p$ is a prefix of $p^{'}$ (from longest to shortest) and there is a node for $p^{'}$ in the graph:
- perform the Collapse Repack Operation on $p^{'}$ .
- For each lifetime $^{'} r$ in the type of $p^{'}$ :
  - Create a new lifetime projection node $p^{'} ↓^{'} r$
  - For each lifetime projection node $p^{''} ↓^{'} r$ where $p^{''}$ is an expansion of $p^{'}$ :
    - Label $p^{''}$
    - Create a new BorrowFlow edge ${p^{''} ↓^{'} r} \to {p^{'} ↓^{'} r}$

Step 4 - Labelling $p$

At this point, if $c$ is $W$ , we know that a subsequent operation will mutate $p$ . As a result, if there exists a lifetime projection node for $p$ (for example, if $p$ stores a borrow that has since been reborrowed), it will no longer be tied to the current value of $p$ . So, we label $p$ with reason ReAssign.

Step 5 - Expand

At this point there should be a node for some prefix $p^{'}$ of $p$ in the graph such that $C [p^{'}] ⩾ c$ .

We expand the graph to $p$ (and obtain the capability for $p$ ) by performing the expandTo(p, o) operation.

expandTo

The expandTo operation is implemented as follows:

For each strict prefix $p^{'}$ of $p$ (from shortest to longest):
- If expanding $p^{'}$ one level adds new edges to the graph, then
  - We expand the lifetime projections of $p^{'}$ one level

The operation to expand a place one level is the expandPlaceOneLevel operation, and the operation to expand the lifetime projections one level is expandLifetimeProjectionsOneLevel.

expandLifetimeProjectionsOneLevel

expandLifetimeProjectionsOneLevel is defined with three parameters:

$p_{b}$ : The place to expand
$e$ : The target expansion of $p_{b}$
$o$ : The Obtain Type

The operation is implemented as follows:

Let $\overline{p}$ be the expansion of $p_{b}$ using $e$
For each lifetime projection $p_{b} ↓ r$ of $p_{b}$ :
- Let $\overline{r p}_{r}$ be the set of lifetime projections in $\overline{p}$ with lifetime $r$
- If $\overline{r p}_{r}$ is nonempty²:
  - We identify the base lifetime projection $r p_{b}$ as follows:
    - Let $l$ be the current snapshot location
    - If $o$ is not an obtain for capability R:
      - $r p_{b} = p_{b} ↓ r at l$
    - Otherwise, if $p_{b}$ is blocked by a two-phase borrow live at $l$ :
      - Let $l^{'}$ be the reservation location of the conflicting borrow
      - $r p_{b} = p_{b} ↓ r at l^{'}$
    - Otherwise, $r p_{b} = p_{b} ↓ r$
  - Create a new Borrow PCG Expansion hyperedge $e = {r p_{b}} \to \overline{r p}_{r}$

If the program requires capability to the place to do some action, then the place cannot be borrowed mutably. This invariant does not hold when we are obtaining capability to the place in order to construct a loop abstraction. ↩
This could happen e.g. when expanding x : Option<&'a T> to x@None ↩

Collapsing Owned Places

At the outset of each program point, the collapse_owned_places operation eagerly collapses the Owned PCG.

This operation is implemented as PcgVisitor::collapse_owned_places (see https://github.com/viperproject/pcg/blob/main/src/pcg/visitor/pack.rs ).

It is implemented as follows:

For each place $p$ for which there exists a place node (from longest to shortest):
- If no expansion of $p$ is blocked by a borrow and every expansion of $p$ has the same capability:
  - perform the collapse(p) operation
  - if $p$ has no projection and has $R$ capability, upgrade $p$ 's capability to $E$

Activating Two-Phase Borrows

activateTwophaseBorrowCreatedAt

reserveLocation is a function from borrow edges to the MIR location at which the borrow edge was created.

The activateTwophaseBorrowCreatedAt operation takes a single parameter:

$l$ , a MIR location

The operation is implemented as follows:

If there exists a borrow edge $e = {p} \to p s$ in the graph such that l = reserveLocation(e):
- If there exists a place node for $* p$ in the graph:
  - Restore $E$ capability to $* p$
- If $p$ is not owned:
  - Downgrade the capability of every ancestor of $p$ to None
TODO Logic is bad

Packing Old and Dead Borrow Leaves

The packOldAndDeadBorrowLeaves operation removes leaf nodes in the Borrow PCG that are old or dead (according to the borrow checker).

$l$ : the current MIR location

When analysing a particular location, this operation is performed before the effect of the statement.

Note that the liveness calculation is performed based on what happened at the end of the previous statement.

For example when evaluating:

bb0[1]: let x = &mut y;
bb0[2]: *x = 2;
bb0[3]: ... // x is dead

we do not remove the *x -> y edge until bb0[3]. This ensures that the edge appears in the graph at the end of bb0[2] (rather than never at all).

This operation is implemented as PcgVisitor::pack_old_and_dead_borrow_leaves (see https://github.com/viperproject/pcg/blob/main/src/pcg/visitor/pack.rs ).

We must first introduce some auxiliary operations:

isDead

isDead(n, l) is true if and only if the borrow checker considers the node $n$ to be dead at MIR location $l$ .

removeEdgeAndPerformAssociatedStateUpdates

removeEdgeAndPerformAssociatedStateUpdates is defined with one parameter:

$e$ : the edge to remove

It proceeds as follows:

For each current place node $p$ that would be unblocked by removing $e$ :
- If $p$ does not have $R$ capability, and $p$ is mutable:
  - Update $L$ to map $p$ to $after l$ where $l$ is the current MIR location
Remove $e$ from the graph
For each current place node $p$ that is unblocked by removing $e$ :
- Let $c$ be $R$ if $p$ projects a shared reference and $E$ otherwise
- If $p$ has no capability or $e$ capability, upgrade its capability to $c$
- Unlabel each region projection of $p$
If $e$ is a Borrow PCG Expansion edge:
- If $e$ is a Dereference Expansion ${\tilde{p_{s}}, r p_{s}} \to p_{t}$ where $p_{t}$ is a current place with no lifetime projections:
  - unlabel $r p_{s}$
- If $e$ has a source node $p_{s}$ where $p_{s}$ is a current place:
  - For each place node $p_{t}$ in the expansion of $e$ , label each region projection of $p_{t}$ with $prepare l$ , where $l$ is the current MIR location
If $e$ is a Borrow edge; $isDead (e, l)$ where $l$ is the current MIR location; the target of the borrow is a current place $p$ ; and $p$ has non-zero capability:
- weaken $p$ 's capability to $W$

Main Loop

packOldAndDeadBorrowLeaves proceeds as follows:

Until the PCG remains unchanged across the following steps:
- Let $E$ be the set of edges $e = \overline{n_{s}} \to \overline{n_{t}}$ such that either:
  - $e$ is a Borrow PCG Expansion edge and either:
    - for each $n_{t}$ , either:
      - $isDead (n_{t}, l)$ , where $l$ is the current MIR location
      - or $n_{t}$ is old
    - or $n = p$ , any pair of place nodes in $\overline{n_{t}}$ have the same capability, and for all $n_{t}$ such that $n_{t} = p_{t}$ , $p_{t}$ has the same label as $p$ and $p$ is an exact prefix of $p_{t}$
    - or $n = p ↓^{'} r$ and for all $n_{t}$ such that $n_{t} = p_{t} ↓^{'} r_{t}$ , $p$ is an exact prefix of $p_{t}$ ; $p$ and $p_{t}$ have the same label; and $^{'} r$ and $^{'} r_{t}$ have the same label.
  - or for each $n_{t}$ , where $n_{t} = \hat{p}$ or $n_{t} = \hat{p} ↓ ..$ , either:
    - $\hat{p}$ is old
    - or $\hat{p}$ 's associated place is not a function argument and either:
      - $\hat{p}$ has a non-empty projection and $n_{t}$ is not blocked by an edge
      - or $isDead (n_{t}, l)$ , where $l$ is the current MIR location
- For each $e$ in $E$ :
  - perform removeEdgeAndPerformAssociatedStateUpdates(e)

Repack Operations

Repack operations describe actions on owned places.

RegainLoanedCapability

Fields:

$p$ - Place
$c$ - Capability

This operation is used to indicate that $p$ is no longer borrowed, and can therefore be restored to capability $c$ .

In principle I think $c$ should always be exclusive capability

Applying RegainLoanedCapability

The PCG applies this operation by setting the capability of $p$ to $c$ .

DerefShallowInit

Fields:

$p_{f}$ - From Place
$p_{t}$ - To Place

This operation is used to indicate that $p_{f}$ (which is a shallow-initialized box) was dereferenced.

Applying DerefShallowInit

Let $\overline{p}$ be the expansion of $p_{f}$ obtained by expanding towards $p_{t}$ .

The PCG applies this operation by adding read capability to the places in $\overline{p}$

Why do we use this logic?

Collapse

Fields:

$p$ - Place
$g$ - Collapse Guide
$c$ - Capability

This operation indicates that the expansion of $p$ should be packed (using guide $g$ ) with resulting capability $c$ .

Applying Collapse

Let $\overline{p}$ be the expansion of $p$ towards guide place $g$ .

Preconditions:

Each place $p^{'}$ in $\overline{p}$ has a capability $c_{p^{'}} ⩾ c$

Let $c^{'}$ be the minimum capability of the places in $\overline{p}$ .

Capability for each place in $\overline{p}$ is removed.
Capability for $p$ is set to $c^{'}$ .
The Unpack edge from $p$ is removed

The current implementation guarantees there is only one unpack edge from $p$ . In the future this may change.
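
The effect of applying Collapse can be sketched in Rust as below. This is a simplified illustration, not the implementation: places are strings, capabilities are kept in a hash map, and the `Cap` enum models only Read and Exclusive (with Read weaker than Exclusive) so that "minimum capability" is unambiguous. All names are hypothetical.

```rust
use std::collections::HashMap;

/// Simplified capability with an assumed order Read < Exclusive.
/// Other capabilities are deliberately omitted from this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Cap {
    Read,
    Exclusive,
}

/// Applies Collapse to `p`, whose expansion is `expansion`.
/// Fails if some place in the expansion has no capability or one weaker than `c`.
fn apply_collapse(
    caps: &mut HashMap<String, Cap>,
    unpack_edges: &mut Vec<(String, Vec<String>)>,
    p: &str,
    expansion: &[String],
    c: Cap,
) -> Result<(), String> {
    // Precondition check, computing the minimum capability c' on the way.
    let mut min_cap: Option<Cap> = None;
    for q in expansion {
        match caps.get(q) {
            Some(&cq) if cq >= c => min_cap = Some(min_cap.map_or(cq, |m| m.min(cq))),
            _ => return Err(format!("precondition failed for {q}")),
        }
    }
    let c_min = min_cap.ok_or("empty expansion")?;
    // Remove capabilities of the expansion, then give `p` capability c'.
    for q in expansion {
        caps.remove(q);
    }
    caps.insert(p.to_string(), c_min);
    // Remove the unpack edge(s) whose source is `p`.
    unpack_edges.retain(|(src, _)| src != p);
    Ok(())
}
```

For instance, collapsing x from an expansion in which x.f has Exclusive and x.g has Read leaves x with Read capability and no unpack edge from x.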

Expand

Fields:

$p$ - Place
$g$ - Expand Guide
$c$ - Capability

This operation indicates that $p$ should be expanded (using guide $g$ ) such that each place in the expansion has capability $c$ .

Applying Expand

Let $\overline{p}$ be the expansion of $p$ towards guide place $g$ .

Preconditions:

$p$ has capability $c_{p} ⩾ c$

The unpack edge ${p} \to {\overline{p}}$ is added
Capability for every place in $\overline{p}$ is set to $c$
If $c$ is Read capability, the capability of $p$ is set to Read
Otherwise, if $c$ is not Read, capability of $p$ is removed

Note that reference-typed places will never be expanded.
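
The effect of applying Expand can be sketched in Rust in the same simplified style. As an assumption for illustration, places are strings, capabilities live in a hash map, and `Cap` models only Read and Exclusive with Read weaker than Exclusive; the names are hypothetical and not taken from the implementation.

```rust
use std::collections::HashMap;

/// Simplified capability with an assumed order Read < Exclusive.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Cap {
    Read,
    Exclusive,
}

/// Applies Expand to `p` with the given expansion and target capability `c`.
/// Fails if `p` does not currently hold a capability at least as strong as `c`.
fn apply_expand(
    caps: &mut HashMap<String, Cap>,
    unpack_edges: &mut Vec<(String, Vec<String>)>,
    p: &str,
    expansion: &[String],
    c: Cap,
) -> Result<(), String> {
    // Precondition: capability of `p` is at least `c`.
    match caps.get(p) {
        Some(&c_p) if c_p >= c => {}
        _ => return Err(format!("{p} lacks capability {c:?}")),
    }
    // Add the unpack edge and assign `c` to every place in the expansion.
    unpack_edges.push((p.to_string(), expansion.to_vec()));
    for q in expansion {
        caps.insert(q.clone(), c);
    }
    // A read expansion keeps Read on `p`; any other expansion removes it.
    if c == Cap::Read {
        caps.insert(p.to_string(), Cap::Read);
    } else {
        caps.remove(p);
    }
    Ok(())
}
```

Expanding x with Exclusive capability therefore moves the capability from x to its fields, after which a second expansion of x fails the precondition.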

Borrow PCG Actions

LabelLifetimeProjection

Fields

predicate - A predicate describing lifetime projections that should be labelled
label - The label to apply (current, FUTURE or a label $ℓ$ )

Applying LabelLifetimeProjection

Replaces the label associated with lifetime projections in the borrow PCG matching predicate. If label is current, then the label of each matching lifetime projection is removed.

Weaken

Fields:

$p$ - Place
$c_{f}$ - From capability
$c_{t}$ - (Optional) To capability

Used to reduce the capability of a place. In general the $c_{f}$ is Exclusive, for example in the following cases:

Before writing to $p$ , capability should be reduced to Write
When a two-phase borrow is activated, capabilities to places conflicting with the borrowed place should be removed

Applying Weaken

If $c_{t}$ is defined, the capability of $p$ is set to $c_{t}$ . Otherwise, capability to $p$ is removed.

RestoreCapability

Fields:

$p$ - Place
$c$ - Capability

Instructs that the capability to the place should be restored to the given capability, e.g. after a borrow expires, the borrowed place should be restored to exclusive capability.

Applying RestoreCapability

The capability of $p$ is set to $c$ .

LabelPlace

Fields

$p$ - Place
$s$ - The snapshot location to use
reason - Why the place is to be labelled

The purpose of this action is to label current versions of $p$ (and potentially prefixes and postfixes of $p$ ) with the label corresponding to the last time they were updated.

The following reasons are defined:

StorageDead
MoveOut
ReAssign
LabelSharedDerefProjections
Collapse

Applying LabelPlace

The behaviour of this action depends on reason:

`ReAssign`, `StorageDead`, `MoveOut`

The places to be labelled are:

All postfixes of $p$
All prefixes of $p$ prior to the first dereference of a reference.

For example, if $p$ is (*x).f, then *((*x).f), (*x).f, and *x will be labelled.

`Collapse`

The place $p$ is labelled (but none of its prefixes or postfixes).

`LabelSharedDerefProjections`

All strict postfixes of $p$ are labelled.

RemoveEdge

Removes an edge from the graph. If the removal of the edge causes any place nodes to be removed from the graph, the capability of those places is removed.

AddEdge

Inserts an edge into the graph. This does not change the capabilities.

Coupling (WIP)

The PCG tracks ownership and borrowing at a fine-grained level, and in some cases this granularity cannot be "observed" by the type system. For instance, lifetime projection nodes can represent a notion of reborrowing that is more precise than Rust's borrow-checker itself. Consider the choose function:

fn choose<'a, T>(choice: bool, lhs: &'a mut T, rhs: &'a mut T) -> &'a mut T {
    if choice {
        lhs
    } else {
        rhs
    }
}

The PCG shape of a call choose(x, y) consists of two edges $x ↓^{'} a \to result ↓^{'} a$ and $y ↓^{'} a \to result ↓^{'} a$ . However, because the compiler only tracks lifetimes, the borrows of x and y will always expire at the same time. Accordingly, the two edges corresponding to the call will always be removed from the graph at the same time. These edges are therefore coupled, because the Rust type system forces the PCG to remove them at the same time.

Motivation

Because the type system forces a set of coupled edges to be removed "all-at-once", edges that are known to be coupled could be treated as a single hyperedge.

The primary reason for doing so is to provide more information to analysis tools. For example, Prusti uses coupling information to generate the shape of magic wands: in the choose example, the coupled hyperedge provides precisely the shape of magic wand that Prusti encodes (although this is not always the case).

Another benefit is that coupling can reduce the size of the graphs.

Formal Definitions

Hyperedge

A hyperedge $e$ is an object with an associated set of source nodes and target nodes. The functions $sources (e)$ and $targets (e)$ denote the source and target nodes respectively.

Coupled Edges

A coupled edge $c$ is a hyperedge defined by a set of underlying hyperedges $\overline{e}$ , where the sources and targets are defined as follows:

Let $S$ be the union of the sources of $\overline{e}$ and $T$ be the union of the targets of $\overline{e}$ . Then $sources (c) = S ∖ T$ and $targets (c) = T ∖ S$ .
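
As a small illustration, the sources and targets of a coupled edge can be computed with set differences. The sketch below assumes nodes are string names and uses the hypothetical names `Hyperedge` and `coupled_endpoints`.

```rust
use std::collections::BTreeSet;

/// Hypothetical hyperedge whose nodes are identified by name.
struct Hyperedge {
    sources: BTreeSet<&'static str>,
    targets: BTreeSet<&'static str>,
}

/// Returns (sources(c), targets(c)) for the coupled edge `c` built from `edges`:
/// nodes that are only sources, and nodes that are only targets, respectively.
fn coupled_endpoints(
    edges: &[Hyperedge],
) -> (BTreeSet<&'static str>, BTreeSet<&'static str>) {
    // S: union of all sources; T: union of all targets.
    let s: BTreeSet<&'static str> =
        edges.iter().flat_map(|e| e.sources.iter().copied()).collect();
    let t: BTreeSet<&'static str> =
        edges.iter().flat_map(|e| e.targets.iter().copied()).collect();
    (
        s.difference(&t).copied().collect(),
        t.difference(&s).copied().collect(),
    )
}
```

For the two edges of the choose call, this gives sources {x, y} and target {result}. For a chain a -> b, b -> c, the intermediate node b is in both unions and therefore appears in neither the sources nor the targets of the coupled edge.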

Hypergraph

A hypergraph $G$ is a tuple $⟨ S, E ⟩$ where $S$ is a set of nodes and $E$ is a set of hyperedges. Functions $nodes (G)$ and $edges (G)$ return the sets of nodes and hyperedges respectively.

Blocked Nodes

A node $n$ is blocked in $G$ iff $n \in nodes (G)$ and $n$ is not a leaf in $G$ .

Descendant Relation

We define the descendant relation $⩽_{G}$ as

$s ⩽_{G} s^{'}$ iff $s = s^{'}$ or $s$ is a descendant of $s^{'}$ in $G$ .

Frontier

A set of nodes $S$ is a frontier of a hypergraph $G$ (denoted $frontier (S, G)$ ) iff $S \subseteq nodes (G)$ and $S$ is closed under $⩽_{G}$ .

If $S$ is a frontier of $G$ , it defines a valid expiry. The valid expiry $G ∖ S$ is the subgraph of $G$ obtained by removing all nodes in $S$ and all edges containing sources or targets in $S$ . The expired edges of a valid

Reachable Subgraph

A graph $G^{'}$ is a reachable subgraph of a graph $G$ iff there exists a frontier $S$ such that $G ∖ S = G^{'}$

Desired Properties of Coupled Edges

Edges that will always be removed from the graph at the same time should definitely be coupled. Formally:

A set of edges $\overline{e}$ expire together on a graph $G$ iff for all reachable subgraphs $G^{'}$ of $G$ , $G^{'}$ either contains all edges in $\overline{e}$ or none of them, i.e.:

$\overline{e} \cap edges (G^{'}) = \overline{e}$ , or
$\overline{e} \cap edges (G^{'}) = \emptyset$

If a set of edges $\overline{e}$ expire together on a graph $G$ , then there must exist a set of edges $\overline{e^{'}} \supseteq \overline{e}$ that are coupled for $G$ .
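
The "expire together" condition can be checked directly once the reachable subgraphs are known. The Rust sketch below assumes edges are identified by integers and that each reachable subgraph is given as the set of edge ids it contains; enumerating the reachable subgraphs is outside the scope of this sketch, and the function name is hypothetical.

```rust
use std::collections::BTreeSet;

/// Returns true iff every reachable subgraph (given by its set of edge ids)
/// contains either all of `edges` or none of them.
fn expire_together(edges: &BTreeSet<u32>, reachable_subgraphs: &[BTreeSet<u32>]) -> bool {
    reachable_subgraphs
        .iter()
        .all(|g| edges.is_subset(g) || edges.is_disjoint(g))
}
```

A subgraph that contains only some of the edges is a witness that the edges do not expire together.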

Note that we could in principle define coupling as such, but we could also consider a stronger definition we describe below:

Definition Based on Unblocking Frontier Expiries

Our stronger definition is based on the following two observations:

First, if removing a frontier $S$ does not unblock any node in a graph, there is no way for the removal to be observed in the program. Therefore, such frontiers do not need to be considered for the purpose of asserting properties about a place once it becomes accessible. We can instead define coupling via the notion of unblockings of a graph, where an unblocking is an ordered list of the sets of nodes that become available by repeated removal of frontiers.

Unblockings

An unblocking $U$ of a graph $G$ is an ordered partitioning of the non-root nodes of $G$ into non-empty subsets $S_{1}, \dots, S_{n}$ , satisfying the property that there exists a frontier $S^{'}$ of $G$ with an expiry that unblocks all nodes in $S_{1}$ , and $S_{2}, \dots S_{n}$ is an unblocking of $G ∖ S^{'}$ . The function $Ub (G)$ denotes the set of all unblockings of $G$ .

Correspondingly, we can define edges as coupled if they always observably expire together, i.e. at all points when a node becomes accessible, they are either all in the graph or none of them are.

Reachable Subgraphs

Formally, for a graph $G_{0}$ and an unblocking $U = S_{1}, \dots, S_{n}$ of $G_{0}$ , the reachable subgraphs $R (U, G_{0})$ of $U$ are the list of graphs $G_{0}, \dots, G_{n}$ where $\forall i, 1 ⩽ i ⩽ n . G_{i} = G_{i - 1} ∖ S_{i}$ . The function $\hat{R}$ is the lifting of $R$ to sets of unblockings: $\hat{R} (\overline{U}, G) = \bigcup_{U \in \overline{U}} R (U, G)$

Therefore, edges should be coupled if they are either all present or all absent for each graph in $\hat{R} (Ub (G), G)$ .

The second observation is that this set can be computed by considering only a subset of the unblockings in $G$ . This is because an unblocking $U$ can subsume an unblocking $U^{'}$ in the sense that the reachable subgraphs of $U$ are a superset of the reachable subgraphs of $U^{'}$ .

Subsumption

An unblocking $U = S_{1}, \dots, S_{n}$ is immediately subsumed by an unblocking $U^{'} = S_{1}^{'}, \dots S_{n + 1}^{'}$ (denoted $S u b (U, U^{'})$ ) iff there exists an $i, 1 ⩽ i ⩽ n$ such that $S_{i} = S_{i}^{'} \cup S_{i + 1}^{'}$ and $\forall j < i . S_{j} = S_{j}^{'}$ and $\forall j > i . S_{j} = S_{j + 1}^{'}$ . In other words, $U$ is obtained from $U^{'}$ by merging two adjacent sets.

$U^{'}$ subsumes $U$ (denoted $U < U^{'}$ ) iff $⟨ U, U^{'} ⟩$ is in the transitive closure of $S u b$ .
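
As an illustration, immediate subsumption can be checked by testing whether one unblocking arises from the other by merging a single pair of adjacent sets. The sketch below assumes unblockings are represented as vectors of node sets; the names `Node`, `Unblocking`, and `immediately_subsumed_by` are hypothetical.

```rust
use std::collections::BTreeSet;

/// Hypothetical node identifier.
type Node = u32;

/// An unblocking as an ordered list of non-empty node sets (0-indexed here).
type Unblocking = Vec<BTreeSet<Node>>;

/// Returns true iff `u` is obtained from `u_prime` by merging exactly one
/// pair of adjacent sets, leaving all other sets unchanged and in order.
fn immediately_subsumed_by(u: &Unblocking, u_prime: &Unblocking) -> bool {
    if u_prime.len() != u.len() + 1 {
        return false;
    }
    (0..u.len()).any(|i| {
        let merged: BTreeSet<Node> = u_prime[i].union(&u_prime[i + 1]).copied().collect();
        u[i] == merged
            // Sets before the merge position must agree.
            && u[..i] == u_prime[..i]
            // Sets after the merge position are shifted by one.
            && u[i + 1..] == u_prime[i + 2..]
    })
}
```

For example, the unblocking {1, 2}, {3} is immediately subsumed by {1}, {2}, {3}, but not by {1}, {3}, {2}.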

Theorem (Subsumption): If $U < U^{'}$ , then $R (U, G) \subset R (U^{'}, G)$

Distinct Unblockings

The distinct unblockings of a graph $G$ (denoted $Dub (G)$ ) is the subset of $G$'s unblockings obtained by removing all non-maximal elements w.r.t. $<$ (i.e. every unblocking that is subsumed by another unblocking).

Theorem (Distinct Unblockings): For all graphs $G$ , $\hat{R} (Ub (G), G) = \hat{R} (Dub (G), G)$

Effective and Maximal Coupling

A set of edges $\overline{e}$ is effectively coupled for a graph $G$ iff every reachable subgraph $G^{'}$ in the distinct unblockings of $G$ (i.e. every $G^{'}$ in $\hat{R} (Dub (G), G)$ ) contains either all edges in $\overline{e}$ or none of them. A set of edges $\overline{e}$ is maximally coupled if it is effectively coupled and not a strict subset of another effectively coupled set.

Theorem (Correctness)

If a set of edges $\overline{e}$ expire together on a graph $G$ , then there exists a set of edges $\overline{e^{'}} \supseteq \overline{e}$ that are maximally coupled on $G$ .

Proof

Recall that a set of edges $\overline{e}$ expire together on $G$ if every reachable subgraph $G^{'}$ of $G$ either contains all edges in $\overline{e}$ or none of them, and that a set of edges $\overline{e}$ is effectively coupled if every reachable subgraph $G^{''}$ in the distinct unblockings of $G$ contains either all edges in $\overline{e}$ or none of them.

Therefore it is sufficient to show that the reachable subgraphs in the distinct unblockings of $G$ are a subset of the reachable subgraphs of $G$ . The proof makes use of the following lemma:

Lemma: Valid Expiry on Unions of Frontier Nodes

If $S$ is a frontier of $G$ and $S^{'}$ is a frontier of $G ∖ S$ , then $S \cup S^{'}$ is a frontier of $G$ and:

$G ∖ S ∖ S^{'} = G ∖ (S \cup S^{'})$

Proof is TODO

Then, let $U$ be an arbitrary unblocking of $G$ . It follows by induction on the list of frontiers corresponding to the nodes in $U$ that for every reachable subgraph $G^{'}$ of $U$ , there exists a frontier $S$ of $G$ such that $G ∖ S = G^{'}$ . Therefore, for all unblockings $U$ of $G$ , the reachable subgraphs of $U$ are a subset of the reachable subgraphs of $G$ .

Test Graphs

`m` function

fn m<'a: 'c, 'b: 'e, 'c, 'd, 'e, T>(
    x: &'a mut T,
    y: &'b mut T,
) -> (&'c mut T, &'d mut T, &'e mut T)
where
    'a: 'd,
    'b: 'd,
{
    unimplemented!()
}

`w` function

fn w<'a: 'd, 'b: 'd, 'c: 'e, 'd, 'e, T>(
    x: &'a mut T,
    y: &'b mut T,
    z: &'c mut T,
) -> (&'d mut T, &'e mut T)
where
    'b: 'e,
{
    unimplemented!()
}

Previous Example

Additional Examples from HackMD

One Lifetime Reborrower Function

fn f<'a, T>(x: &'a mut T) -> &'a mut T {
    x
}

Possible Outlives Reborrower Function

fn f<'a, 'b: 'a, T>(x: &'a mut T, y: &'b mut T) -> &'a mut T {
    todo!()
}

Non-Bipartite Graph Example

Owned State

This section describes a design that is currently being implemented and is subject to change.

The Owned State describes the state of owned places at a program point. It consists of two layers:

The Initialisation State, which tracks which owned places are initialised, uninitialised, or shallowly initialised.
Materialised extensions, which extend the leaves of the initialisation state to reach the roots of borrows in the borrow PCG.

Place capabilities are computed from the owned state and the borrow state (see Computing Place Capabilities), rather than being stored and updated directly.

The key motivation for this design is that the initialisation state does not depend on what places are borrowed. This independence simplifies the join algorithm significantly, because borrows no longer force the owned state to be unpacked in borrow-dependent ways.

Initialisation State

Initialisation Capabilities

Each leaf node in the initialisation state carries one of three initialisation capabilities, ordered as $D > S > U$ :

Deep (`D`)

The place is fully initialised. All memory reachable from this place (including through dereferences) is valid and accessible. This is the state of a place after it has been assigned a value.

Shallow (`S`)

The place is shallowly initialised: the place itself holds a valid value, but memory behind a dereference may not be initialised. This state arises only for Box-typed places, where the heap allocation exists but no value has been written through the pointer yet.

Uninit (`U`)

The place is uninitialised or has been moved out of. No reads are permitted; only writes (to re-initialise the place) are allowed.

Tree Structure

The initialisation state is a forest of trees, one per allocated MIR local. The root of each tree is a local variable, and internal nodes correspond to place expansions (unpacking a struct or tuple into its fields).

The key structural properties are:

Leaf nodes carry an initialisation capability (D, S, or U).
Internal nodes have no explicit capability; their capability is derived from their children. An internal node exists only because one or more of its descendants has a different initialisation status than its siblings.
Invariant: if a tree is expanded (i.e. is not a single leaf node), then at least one of its leaves must be U or S. Otherwise, the tree would be collapsed to a single D leaf.

For example, after executing consume(pair.0) on a pair: (String, String):

pair
├── .0: U
└── .1: D

The tree is expanded because pair.0 has been moved out while pair.1 remains initialised.

In contrast, a fully initialised pair is represented as a single leaf:

pair: D
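
The tree shape and its collapse invariant can be sketched as a small Rust type. The names `InitCap`, `InitTree`, and `normalise` are hypothetical and chosen for illustration only; children are identified by position rather than by field name, and internal nodes are assumed to have at least one child.

```rust
/// Initialisation capabilities; the variant order gives U < S < D.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum InitCap {
    U,
    S,
    D,
}

/// One tree of the initialisation state (hypothetical representation).
#[derive(Clone, Debug, PartialEq, Eq)]
enum InitTree {
    /// A leaf carries an explicit capability.
    Leaf(InitCap),
    /// An internal node is a place expansion; it carries no capability.
    Internal(Vec<InitTree>),
}

impl InitTree {
    /// Restores the invariant that an expanded tree has at least one
    /// U or S leaf: any expansion whose children are all `Leaf(D)`
    /// (after normalising them) collapses to a single `Leaf(D)`.
    fn normalise(self) -> InitTree {
        match self {
            InitTree::Leaf(cap) => InitTree::Leaf(cap),
            InitTree::Internal(children) => {
                let children: Vec<InitTree> =
                    children.into_iter().map(InitTree::normalise).collect();
                if children.iter().all(|c| *c == InitTree::Leaf(InitCap::D)) {
                    InitTree::Leaf(InitCap::D)
                } else {
                    InitTree::Internal(children)
                }
            }
        }
    }
}
```

Under this representation, the moved-out pair above is `Internal(vec![Leaf(U), Leaf(D)])` and is left unchanged by `normalise`, whereas an expansion with two `D` leaves collapses to `Leaf(D)`.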

Join Algorithm

The join algorithm on the initialisation state operates pointwise on the tree structure. Because the initialisation state is independent of borrows, the join does not need to consult the borrow state.

The algorithm is defined recursively:

$join (leaf (s_{1}), leaf (s_{2})) = leaf (min (s_{1}, s_{2}))$

$join (leaf (S), internal (n)) = leaf (S)$

$join (leaf (U), internal (n)) = leaf (U)$

$join (leaf (D), internal (n)) = internal (n)$

$join (internal (m), internal (n)) = internal (join (m_{0}, n_{0}), join (m_{1}, n_{1}), \dots)$

The intuition behind these cases:

Two leaves: take the minimum capability. If either side is uninitialised, the join must conservatively assume the place may be uninitialised.
Leaf U or S vs internal: the leaf dominates because if the place is (at best) uninitialised or shallowly initialised, the detailed expansion on the other side is irrelevant.
Leaf D vs internal: the internal node's structure is preserved, because a deeply initialised place is compatible with any expansion of that place.
Two internals: join children pointwise.
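
The recursive cases above translate fairly directly into Rust. The following is a sketch rather than the actual implementation: `InitCap`, `InitTree`, and `join` are hypothetical names, the leaf-versus-internal rules are assumed to apply symmetrically in either argument order, and two internal nodes are assumed to expand the same place and therefore have the same number of children.

```rust
/// Initialisation capabilities; the variant order gives U < S < D.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum InitCap {
    U,
    S,
    D,
}

/// One tree of the initialisation state (hypothetical representation).
#[derive(Clone, Debug, PartialEq, Eq)]
enum InitTree {
    Leaf(InitCap),
    Internal(Vec<InitTree>),
}

/// Joins two initialisation trees for the same place.
fn join(a: &InitTree, b: &InitTree) -> InitTree {
    use InitCap::D;
    use InitTree::{Internal, Leaf};
    match (a, b) {
        // Two leaves: take the minimum capability.
        (Leaf(s1), Leaf(s2)) => Leaf((*s1).min(*s2)),
        // Leaf D vs internal: keep the expansion.
        (Leaf(D), Internal(n)) | (Internal(n), Leaf(D)) => Internal(n.clone()),
        // Leaf U or S vs internal: the leaf dominates.
        (Leaf(s), Internal(_)) | (Internal(_), Leaf(s)) => Leaf(*s),
        // Two internals: join children pointwise.
        (Internal(m), Internal(n)) => {
            Internal(m.iter().zip(n).map(|(x, y)| join(x, y)).collect())
        }
    }
}
```

For instance, joining `Internal(vec![Leaf(U), Leaf(D)])` with `Leaf(U)` yields `Leaf(U)`, while joining it with `Leaf(D)` preserves the expansion.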

Example

Consider the following program:

type Pair = (String, String);

fn f(choice: bool) {
    let mut pair0 = (String::new(), String::new());
    let mut pair1 = (String::new(), String::new());
    let mut pair2: Pair;
    let mut rx: String;
    if choice {
        rx = pair0.0;
        pair2 = pair1; // {pair0: {.0: U, .1: D}, pair1: U}
    } else {
        rx = pair1.0;
        pair2 = pair0; // {pair0: U, pair1: {.0: U, .1: D}}
    }
    // join: {pair0: U, pair1: U}
}

At the join point, pair0 has state {.0: U, .1: D} on one branch and U on the other. Applying the rule $join (leaf (U), internal (n)) = leaf (U)$ , the result is pair0: U. Symmetrically, pair1: U.

Materialised Extensions

The leaves of the initialisation state serve as the roots of materialised extensions. A materialised extension is an additional subtree that grows off a leaf of the initialisation state to reach places that are targets of borrows in the borrow PCG.

Materialised extensions exist because borrowing a sub-place does not change the initialisation state (e.g. &mut x.f does not expand x in the init state), but the owned state still needs to represent the expanded structure so that the borrow target is a node in the graph.

Construction

For each leaf $l$ of the initialisation state with place $p$ , if there exist places $q_{1}, \dots, q_{n}$ in the borrow PCG that are strict descendants of $p$ (i.e. $p$ is a strict prefix of each $q_{i}$ ), then a materialised extension tree is constructed by expanding $p$ toward each $q_{i}$ . The materialised extension tree uses the same expansion structure as owned place expansions. Leaves of the materialised extension carry no additional data.
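
The first step of this construction, finding which places must be expanded below a leaf, can be sketched as follows. The `Place` encoding and the functions `place` and `places_to_expand` are hypothetical. The sketch computes only the set of places that need an expansion edge; producing the sibling fields of each expansion would require type information that is not modelled here.

```rust
use std::collections::BTreeSet;

/// A place as a local name plus projection elements (hypothetical encoding),
/// e.g. x.1.h is { local: "x", projection: ["1", "h"] }.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Place {
    local: String,
    projection: Vec<String>,
}

/// Convenience constructor (hypothetical helper).
fn place(local: &str, projection: &[&str]) -> Place {
    Place {
        local: local.to_string(),
        projection: projection.iter().map(|s| s.to_string()).collect(),
    }
}

impl Place {
    /// True iff `self` is a strict prefix of `other`.
    fn is_strict_prefix_of(&self, other: &Place) -> bool {
        self.local == other.local
            && self.projection.len() < other.projection.len()
            && other.projection.starts_with(&self.projection)
    }
}

/// For a leaf place `leaf` and the places appearing in the borrow PCG,
/// returns every place that must be expanded to reach a strict descendant
/// of `leaf`: the leaf itself and each intermediate prefix of the target.
fn places_to_expand(leaf: &Place, borrowed: &[Place]) -> BTreeSet<Place> {
    let mut out = BTreeSet::new();
    for q in borrowed.iter().filter(|q| leaf.is_strict_prefix_of(q)) {
        for len in leaf.projection.len()..q.projection.len() {
            out.insert(Place {
                local: q.local.clone(),
                projection: q.projection[..len].to_vec(),
            });
        }
    }
    out
}
```

With a leaf `x.1` and a borrow targeting `x.1.h`, the result is `{x.1}`; with a leaf `x` and a borrow targeting `x.a.b`, both `x` and `x.a` need expanding.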

Example

Suppose x.0 is moved out and x.1 is D, and there is a borrow targeting x.1.h:

Initialisation state:
  x
  ├── .0: U
  └── .1: D

Materialised extension off x.1:
  x.1
  └── .h  (materialised)

The full owned state combines both: the init state provides the tree x -> {x.0, x.1}, and the materialised extension provides the edge x.1 -> {x.1.h}.

A simpler example where the init state is not expanded:

fn main() {
    let mut pair = (String::new(), String::new());
    let rx = &mut pair.0;
    // init state: {pair: D}
    // materialised: pair -> {pair.0, pair.1}
}

Here the initialisation state is a single leaf {pair: D} (borrowing does not change initialisation), but the materialised extension expands pair so that pair.0 (the borrow target) is a node in the owned state.

Edge Capabilities

This section describes a design that is currently being implemented and is subject to change.

In the updated design, edges in the Owned PCG (specifically, unpack hyperedges) carry an edge capability that describes whether the expansion is mutable or immutable. This is used to generate upgrade/downgrade annotations when the mutability of an edge changes between PCG states.

Edge Capability Values

An edge capability is one of:

Immutable (I): the expansion is under a shared borrow. The parent place has capability R, and the children inherit R.
Mutable (M): the expansion is not constrained by a shared borrow. Children may have capabilities E, W, or e depending on their initialisation state.

Computing Edge Capabilities

Edge capabilities are computed, not tracked explicitly. An edge's capability is determined by the borrow state: if any place in the subtree rooted at the parent of the edge is blocked by a shared (immutable) borrow, the edge is I; otherwise it is M.
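
A minimal sketch of this rule, assuming places are encoded as paths of segments and the borrow state is summarised as the list of places currently blocked by a shared borrow. The names `EdgeCap`, `Place`, and `edge_capability` are hypothetical.

```rust
/// Capability of an unpack hyperedge.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum EdgeCap {
    Immutable,
    Mutable,
}

/// A place as a path of segments, e.g. pair.0 is ["pair", "0"]
/// (hypothetical encoding).
type Place = Vec<&'static str>;

/// The edge expanding `parent` is Immutable iff some place in the subtree
/// rooted at `parent` (including `parent` itself) is blocked by a shared
/// borrow; otherwise it is Mutable.
fn edge_capability(parent: &Place, shared_blocked: &[Place]) -> EdgeCap {
    if shared_blocked.iter().any(|p| p.starts_with(parent)) {
        EdgeCap::Immutable
    } else {
        EdgeCap::Mutable
    }
}
```

Because the result depends only on the current borrow state, nothing needs to be stored on the edge itself.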

Annotations from Edge Capability Changes

When the edge capability of an unpack hyperedge changes between two consecutive PCG states (e.g. from I to M because a shared borrow expired), an upgrade annotation is emitted. Conversely, when an edge transitions from M to I (e.g. because a shared borrow is created), a downgrade annotation is emitted.
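
The annotation rule amounts to comparing the capability of the same edge in two consecutive states. A sketch, with the hypothetical names `EdgeCap`, `Annotation`, and `annotation_for`:

```rust
/// Capability of an unpack hyperedge.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum EdgeCap {
    Immutable,
    Mutable,
}

/// Annotation emitted when an edge's capability changes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Annotation {
    Upgrade,
    Downgrade,
}

/// Compares the capability of one edge before and after a transition
/// and returns the annotation to emit, if any.
fn annotation_for(before: EdgeCap, after: EdgeCap) -> Option<Annotation> {
    match (before, after) {
        // I -> M, e.g. a shared borrow expired.
        (EdgeCap::Immutable, EdgeCap::Mutable) => Some(Annotation::Upgrade),
        // M -> I, e.g. a shared borrow was created.
        (EdgeCap::Mutable, EdgeCap::Immutable) => Some(Annotation::Downgrade),
        // Unchanged capability: nothing to emit.
        _ => None,
    }
}
```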

These edge-oriented annotations replace the previous per-place capability update annotations (such as "Restore pair.1 to E"). The advantage is that edge annotations make explicit where a capability change originates, rather than describing isolated per-place updates whose source is implicit.

Example

Consider the following program:

fn shared_borrow() {
    let mut pair = (String::new(), String::new());
    // {pair: D} -> {pair: E}

    let r0 = &pair.0;
    // edge {pair} -> {.0, .1}: I
    // {r0: E, pair: R, pair.0: R, pair.1: R}

    let p1 = pair.1;
    // edge {pair} -> {.0, .1}: M   (upgrade emitted)
    // {r0: E, pair: ∅, pair.0: R, pair.1: W}
}

When pair.1 is moved, the shared borrow on pair.0 no longer covers pair as a whole, so the edge {pair} -> {.0, .1} is upgraded from I to M, and an upgrade {pair} -> {.0, .1} annotation is emitted.

Computing Place Capabilities

This section describes a design that is currently being implemented and is subject to change.

In the updated design, place capabilities are computed from the owned state and the borrow state, rather than being tracked and updated via explicit rules. This eliminates a class of soundness issues (see below) and simplifies the capability accounting logic.

Motivation

The previous design maintained a map from places to capabilities that was updated by three mechanisms: statement evaluation, borrow expiry/activation, and control-flow joins. This led to three problems:

Unsoundness: the rule "when a mutable borrow expires, restore the place's capability to E" is unsound when the place has been conditionally moved out. After a join, the move-out information is lost, and the borrow expiry incorrectly restores E to a potentially uninitialised place.
Insufficient annotations: per-place capability update annotations (e.g. "Restore pair.1 to E") do not explain the source of the capability. Edge-oriented annotations (see Edge Capabilities) address this.
Complex rules: the rules for updating capabilities were numerous and difficult to justify, complicating an eventual soundness proof.

Computing Capabilities for Owned Places

The capability of an owned place $p$ is determined as follows:

Look up $p$ in the initialisation state. If $p$ is within a materialised extension, use the initialisation capability of the ancestor leaf from which the extension grows.
Based on the initialisation capability:
- U (uninitialised): the place capability is W (write-only).
- S (shallow): the place capability is e (shallow exclusive).
- Internal node: the place capability is ∅ (none), because the place is only partially initialised.
- D (deep): proceed to check the borrow state below.
If the place is fully initialised (D), consult the borrow state:
- If the type of $p$ contains a region $r$ but the lifetime projection $p ↓ r$ does not exist in the borrow state: the capability is e (shallow exclusive).
- If $p$ or any of its sub-places is blocked by a mutable borrow: the capability is ∅ (none).
- If $p$ or any of its sub-places is blocked by a shared borrow: the capability is R (read).
- Otherwise: the capability is E (exclusive).
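
The decision procedure above can be sketched as a single function. All names here (`InitStatus`, `Capability`, `BorrowFacts`, `owned_capability`) are hypothetical, the borrow state is reduced to three boolean facts about the place, and the borrow-state checks are assumed to be applied in the order in which they are listed.

```rust
/// Result of looking up a place in the initialisation state.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum InitStatus {
    Uninit,   // leaf U
    Shallow,  // leaf S
    Deep,     // leaf D
    Internal, // expanded (partially initialised) node
}

/// Place capabilities: E, e, R, W and none.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Capability {
    Exclusive,
    ShallowExclusive,
    Read,
    Write,
    NoCap,
}

/// Facts about a place extracted from the borrow state (hypothetical summary).
#[derive(Clone, Copy, Debug, Default)]
struct BorrowFacts {
    /// The type contains a region r but the lifetime projection is absent.
    missing_lifetime_projection: bool,
    /// The place or a sub-place is blocked by a mutable borrow.
    blocked_by_mutable: bool,
    /// The place or a sub-place is blocked by a shared borrow.
    blocked_by_shared: bool,
}

/// Computes the capability of an owned place from its initialisation
/// status and the borrow facts, checking the borrow facts in listed order.
fn owned_capability(init: InitStatus, borrows: &BorrowFacts) -> Capability {
    match init {
        InitStatus::Uninit => Capability::Write,
        InitStatus::Shallow => Capability::ShallowExclusive,
        InitStatus::Internal => Capability::NoCap,
        InitStatus::Deep => {
            if borrows.missing_lifetime_projection {
                Capability::ShallowExclusive
            } else if borrows.blocked_by_mutable {
                Capability::NoCap
            } else if borrows.blocked_by_shared {
                Capability::Read
            } else {
                Capability::Exclusive
            }
        }
    }
}
```

Note that an uninitialised place yields W regardless of the borrow facts, which is the property the soundness example below relies on.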

Computing Capabilities for Borrowed Places

For borrowed places, the capability is determined by the borrow PCG:

If the place projects a shared borrow (e.g. *r where r: &T): R (read).
Otherwise, if the place is a leaf in the borrow PCG: E (exclusive).
Otherwise: ∅ (none).

Example: Soundness Fix

The following example demonstrates how the computed capability approach avoids the unsoundness of the previous design:

// Takes ownership of its argument (body assumed; the original leaves `consume` undefined).
fn consume(_s: String) {}

fn conditional_move(choice: bool) {
    let mut p = String::new();
    let mut p2 = String::new();
    let mut rp: &mut String;

    // BB0: init {p: D, p2: D, rp: U} -> cap {p: E, p2: E, rp: W}
    if choice {
        consume(p);
        rp = &mut p2;
        // BB1: init {p: U, p2: D, rp: D} -> cap {p: W, p2: ∅, rp: E}
    } else {
        rp = &mut p;
        // BB2: init {p: D, p2: D, rp: D} -> cap {p: ∅, p2: E, rp: E}
    }

    // BB3: join -> init {p: U, p2: D, rp: D}
    //           -> cap  {p: ∅, p2: ∅, rp: E}

    *rp = String::from("updated");

    // After borrow expiry:
    // init {p: U, p2: D, rp: D} -> cap {p: W, p2: E, rp: e}
}

In the previous design, borrow expiry would have incorrectly restored p to E. In the computed design, p remains W after expiry because the initialisation state still records p: U.
