Relation-Rule Based Callgraph Inference

2024/07/23 Leonard Ritter, Duangle GbR

A relational event graph is described with unpredicated rules relating events, connecting a product of n sources and m conditions to a single sink (also called the goal). The format is as follows:

Y :- X[1], X[2], ..., X[n], c[1], c[2], ..., c[m].

means "when all events X[1]..X[n] have happened, and all conditions c[1]..c[m] have been met, then the goal Y will happen". Because the right hand side is a product, its arguments are commutative and associative, so the rule makes no demands on the order in which the sources are called or the conditions are evaluated.

The resulting graph is a directed hypergraph of the B-graph class, with each rule constituting a B-arc.

The task is now to manifest the relational event graph as a callgraph (as a precursor to a control flow graph) that satisfies all rules.

We recognize the callgraph as a transitive reduction of the hypergraph, which means all source edges of a rule are reducible to one (and only one) edge that connects to a callgraph path through all sources of the rule.

Because the callgraph is initially incomplete, we must start from trivially reducible rules with single sources. This adds more edges to the callgraph, which allows us to reduce more complex rules, until a fixpoint is reached at which no more rules can be reduced. See this example (conditions omitted for clarity):

(1) A.
(2) B :- A.
(3) B :- A, B, C.
(4) C :- A, B.
(5) D :- A, B, C.
// iteration 1: trivially reducible rules
(1) A.
(2) B :- A.
// iteration 2
(4) C :- B. // :- A
// iteration 3
(3) B :- C. // :- B :- A
(5) D :- C. // :- B :- A
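
The iteration above can be sketched in Python. This is a minimal sketch under stated assumptions, not a reference implementation: the names `ancestors` and `infer_callgraph` are hypothetical, conditions are omitted as in the example, and a rule is assumed to be reducible to the edge `S -> Y` when exactly one source `S` has every other source of the rule among its callgraph ancestors. Each round only looks at the edges produced by earlier rounds, which is what makes the rounds line up with the iterations listed above.

```python
from collections import defaultdict

def ancestors(edges, node):
    """Return every event from which `node` is reachable in the callgraph."""
    preds = defaultdict(set)
    for src, dst in edges:
        preds[dst].add(src)
    seen, stack = set(), [node]
    while stack:
        for p in preds[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def infer_callgraph(rules):
    """rules maps a rule id to (goal, set of sources).

    Returns (rounds, edges, ambiguous, pending), where each round maps the
    ids of the rules reduced in that round to the callgraph edge they yield
    (None for a fact without sources).
    """
    edges, pending = set(), dict(rules)
    rounds, ambiguous = [], {}
    while pending:
        snapshot = frozenset(edges)  # only edges from earlier rounds count
        reduced = {}
        for rid, (goal, sources) in pending.items():
            if not sources:
                reduced[rid] = None  # a fact such as "A." adds no edge
                continue
            # Assumed criterion: S qualifies if a path into S covers all sources.
            candidates = sorted(
                s for s in sources if sources <= ancestors(snapshot, s) | {s}
            )
            if len(candidates) == 1:
                reduced[rid] = (candidates[0], goal)
            elif len(candidates) > 1:
                ambiguous[rid] = candidates  # the user has to disambiguate
        if not reduced:
            break  # fixpoint: nothing left that can be reduced
        for rid, edge in reduced.items():
            del pending[rid]
            if edge is not None:
                edges.add(edge)
        rounds.append(reduced)
    return rounds, edges, ambiguous, pending

rules = {
    1: ("A", set()),
    2: ("B", {"A"}),
    3: ("B", {"A", "B", "C"}),
    4: ("C", {"A", "B"}),
    5: ("D", {"A", "B", "C"}),
}
rounds, edges, ambiguous, pending = infer_callgraph(rules)
for i, reduced in enumerate(rounds, start=1):
    print("iteration", i, reduced)
```

Rules left in `pending` after the loop are the ones that could never be reduced, and entries in `ambiguous` are rules for which more than one edge was possible under the assumed criterion.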

If a rule reduction leads to more than one possible callgraph edge, there exists an ambiguity in the program that the user must resolve by adding more edges.

If one or more rules remain undecidable after the fixpoint, the user should be warned, as those rules would never be applied.

The resulting callgraph must still be converted to a CFG. The primary transformation required here is to bring all calls into topological order. Noteworthy are these caveats:

Branching rules may have non-exclusive conditions, implying that all outgoing edges of an event must be called in arbitrary order. Where we can prove outgoing edges to be exclusive, an if/switch optimization can be performed.
Likewise, rules are merged non-exclusively, meaning all incoming edges of an event E exist in an OR relationship and must be serialized in arbitrary order, with E as the final event. Where we can prove incoming edges to be exclusive, a phi-node optimization can be performed.
Because sources are not required to dominate the reduced rules, we cannot guarantee at compile time that all source events of a rule will always have happened; we can only verify that a goal is connected to all sources. An additional transformation of the callgraph is required, in which non-dominating sources are structured into optional arguments on the dominant path, and then tested and destructured where the rule conditions are checked. Once destructured, subsequent events with the same dependence can reuse the work. This mechanism saves the user from having to implement sum type encodings to circumvent dominance issues.
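
The last caveat can be illustrated with a small hand-written Python sketch. Everything in it is an assumption made for illustration: the rules `B :- A, x > 0.`, `C :- A, x <= 0.`, `C :- B.` and `D :- B, C.`, the function names, and the values computed. B is on a path to C but does not dominate it, so B's value travels to C as an optional argument and is tested where the rule for D is checked.

```python
from typing import Optional

def event_a(x: int, trace: list) -> None:
    trace.append("A")
    if x > 0:                       # condition of the assumed rule B :- A, x > 0.
        event_b(x, trace)
    else:                           # assumed rule C :- A, x <= 0.
        event_c(x, None, trace)     # B did not happen on this path

def event_b(x: int, trace: list) -> None:
    trace.append("B")
    event_c(x, x * 2, trace)        # B's value is structured into an optional argument

def event_c(x: int, b: Optional[int], trace: list) -> None:
    trace.append("C")
    c = x + 1
    if b is not None:               # test and destructure for D :- B, C.
        event_d(b, c, trace)

def event_d(b: int, c: int, trace: list) -> None:
    trace.append(f"D({b}, {c})")

def run(x: int) -> list:
    trace = []
    event_a(x, trace)
    return trace

print(run(1))  # path through B, so D can happen
print(run(0))  # B is skipped, so D is never called
```

In this sketch the user never writes the optional type by hand; it stands in for what the callgraph transformation would generate.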

Merge Events

(subject to ongoing research)

i really need to figure out first what the semantics of merge-events (continuations with multiple in-edges, which have a non-exclusive OR relationship) are. these must be sync points, analogous to how branch-events are (non-exclusive) forks.

i am trying to model an interpreter to gain more clarity on this question.

for starters, it seems necessary to require that merge events only evaluate their rules once all linear events reach a fixed state.

but merge event data is still ambiguous.

one prerequisite of our events is that each event is linear, i.e. has a single optional state: Some definitive value or None.

but multiple rules may apply different values to the same merge event, meaning we need to merge those values somehow.

our 4 options:

(1) we choose a value.
(2) the event doesn't happen when there is more than one possible value (XOR behavior).
(3) a fold operator decides how the values are to be combined.
(4) the event must not take any arguments (void fold operator).

(1) and (2) are surprising to the user. they are both difficult to predict and subject to subtle behavioral change when the program is altered.

(3) could be quite powerful, but adds complexity to the system. this is only permissible when there is no other way.

(4) is predictable and stable under program changes, but limits expressiveness.
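
to make the trade-offs concrete, here is a rough python sketch of the four options as functions over the list of values that different rules applied to the same merge event. all names (`merge_choose`, `merge_xor`, `merge_fold`, `merge_void`) are hypothetical, and picking the first value for (1) is just one assumed policy among many.

```python
from functools import reduce
import operator

def merge_choose(values):
    """(1) pick one value; here, assumed: whichever rule happened to apply first."""
    return values[0] if values else None

def merge_xor(values):
    """(2) the event only happens when exactly one value was applied."""
    return values[0] if len(values) == 1 else None

def merge_fold(op, values):
    """(3) a fold operator combines all applied values."""
    return reduce(op, values) if values else None

def merge_void(values):
    """(4) the event carries no data; we only learn whether it happens."""
    return len(values) > 0

# two rules apply the values 3 and 5; rule order is not supposed to matter
print(merge_choose([3, 5]), merge_choose([5, 3]))   # result depends on order
print(merge_xor([3, 5]), merge_xor([3]))            # a second rule silences the event
print(merge_fold(operator.add, [3, 5]), merge_fold(operator.add, [5, 3]))
print(merge_void([3, 5]), merge_void([]))
```

with a commutative and associative operator, (3) gives the same result for any rule order, which matches the requirement that the right hand side of a rule is order-free; (1) does not have that property.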
