Relation-Rule Based Callgraph Inference

2024/07/23 Leonard Ritter, Duangle GbR

A relational event graph is described with unpredicated rules relating events, connecting a product of n sources and m conditions to a single sink (also called the goal). The format is as follows:

Y :- X[1], X[2], ..., X[n], c[1], c[2], ..., c[m].

means "when all events X[1]..X[n] have happened, and all conditions c[1]..c[m] have been met, then the goal Y will happen". Because the right hand side is a product, the arguments are commutative and associative, so that the rule makes no demands as to in what order the sources are called, or the conditions are evaluated.

The resulting graph is a directed hypergraph of the B-graph class, with each rule constituting a B-arc.

The task is now to manifest the relational event graph as a callgraph (as precursor of a control flow graph) that satisfies all rules.

We recognize the callgraph as a transitive reduction of the hypergraph, which means all source edges of a rule are reducible to one (and only one) edge that connects to a callgraph path through all sources of the rule.

Because the callgraph is initially incomplete, we must start from trivially reducible rules with single sources. This adds more edges to the callgraph, which allows us to reduce more complex rules, until a fixpoint is reached because no more rules can be reduced. See this example (conditions omitted for clarity):

(1) A.
(2) B :- A.
(3) B :- A, B, C.
(4) C :- A, B.
(5) D :- A, B, C.
// iteration 1: trivially reducible rules
(1) A.
(2) B :- A.
// iteration 2
(4) C :- B. // :- A
// iteration 3
(3) B :- C. // :- B :- A
(5) D :- C. // :- B :- A

If a rule reduction leads to more than one possible callgraph edge, there exists an ambiguity in the program that the user must resolve by adding more edges.

If one or more rules remain undecidable after the fixpoint, the user should be warned as the rules would never be applied.

The resulting callgraph must still be converted to a CFG. The primary transformation required here is to bring all calls in topological order. Noteworthy are these caveats:

Branching rules may have non-exclusive conditions, implying that all outgoing edges of an event must be called in arbitrary order. Where we can prove outgoing edges to be exclusive, an if/switch optimization can be performed.
Likewise, rules are merged non-exclusive, meaning all incoming edges of an event E exist in an OR relationship and must be serialized in arbitrary order, with E as the final event. Where we can prove incoming edges to be exclusive, a phi-node optimization can be performed.
Because sources are not required to dominate the reduced rules, we can not guarantee at compile time that all source events of a rule will always have happened; we can only verify that a goal is connected to all sources. An additional transformation of the callgraph is required, in which non-dominating sources are structured to optional arguments on the dominant path, and then tested and destructured where the rule conditions are checked. Once destructured, subsequent events with the same dependence can reuse the work. This mechanism saves the user from having to implement sum type encodings to circumvent dominance issues.

Merge Events

(Subject to ongoing research)

The semantics of merging events (continuations with multiple in-edges, which have a non-exclusive OR relationship) need to be defined. Merging events must be sync points, analog to how branching events are (non-exclusive) forks of execution flow.

Let's model rules for an interpreter to gain more clarity on this question.

To begin with, it seems necessary to require that merge events only evaluate their rules once all linear events reach a fixed state. But merge event data is still ambiguous.

One prerequisite of our events is that each event is linear, i.e. has a single optional state: Some definitive value or None.

But multiple rules may apply different values to the same merge event, meaning we need to merge those values somehow.

Our 4 options are:

We arbitrarily choose a value.
The event doesn't happen when there is more than one possible value (XOR behavior).
A fold operator f decides how the values are to be combined. The controller logic is described by the following Scopes pseudocode:
```
fold (x = empty) for y in source-events
    if (empty? y) x
    elseif (empty? x) y
    else (f x y)
```
The event must not take any arguments (void fold operator).

To evaluate:

(1) is surprising to the user. Even when deterministic, its behavior is difficult to predict and subject to subtle behavioral change when the program is altered.

(2) might be surprising to the user since it alters datalog semantics. The :- operator is expected to describe a non-exclusive implication. We would need to use a different operator such as -:-, standing for iff (if and only if). It requires the user to define a separate rule for every possible conjunctive case. Where an event should merge a set of 16 different events, 65535 rules would have to be defined to cover every case. The complexity can be reduced by chaining 15 events of which each defines three rules (a ^ b ^ ab), but that would still require the definition of 45 rules!

(3) could be quite powerful, but adds complexity to the system. This is only permissible when there is no other way.

(4) is predictable and stable under program changes, but limits expressiveness. Due to relaxed dominance, subsequent rules may still depend on both the merge event and one of the events that it merged, but this way, branching values remain irreconcilable. There simply is no way to merge data from two different events into a single one. We can not even build flow that combines two booleans.

paniq/dril_rel_callgraph.md

Relation-Rule Based Callgraph Inference

Merge Events