@Dicklesworthstone
Created September 28, 2025 16:24
TITLE: Optimal & Safe Codegen/Reasoning Contract for HVM/Interaction‑Calculus Runtimes
SCOPE: HVM/HVM2/HVM3, Interaction Calculus (IC) and Interaction Combinators (ICo), ordered/unordered SUP/DUP, linear memory discipline, WHNF destructiveness, atomic linking with polarization, superposition‑driven search, runtime fusion/deforestation, numerics, and CPU/GPU execution.
======================================================================
SECTION 0 — DEFINITIONS (precise meanings used in this prompt)
======================================================================
D0.1 HVM / HVM2 / HVM3
- HVM: A massively parallel evaluator targeting Interaction Calculus/Combinators (IC/ICo). Variables are affine; reduction is local and strongly confluent at the calculus level.
- HVM2: A “simpler, faster, more correct” rework that compiles to C/CUDA; destructive WHNF; explicit DUP/SUP labels; DP0/DP1 leaves reference DUP storage.
- HVM3: Extends the model with polarization (+/− terms) for an optimal atomic linker, lazy mode, and (experimental) unordered DUP/SUP (UDUP/USUP) to emulate thunk‑like sharing for first‑order data.
D0.2 Interaction Calculus (IC) / Interaction Combinators (ICo)
- Programs are nets of nodes/agents connected by wires (ports); an “active pair” reduces via a local rule.
- Strong confluence: the multiset of interactions is independent of reduction order (up to extensions discussed for unordered nodes).
- Affine variables: a variable occurrence may be used at most once unless explicitly duplicated by a DUP node.
D0.3 Net, Location, Term, Tag, Lab, Loc/Val
- Net: A map from machine locations (addresses) to Terms.
- Location (loc): Machine address (e.g., U32/U64).
- Term: A compact cell describing a node/wire. In HVM2, often “(tag, lab, val/loc)”.
- Tag: Node/wire kind. Core tags: VAR, SUB (aux wire); NUL/ERA (eraser); LAM/APP; SUP/DUP; CTR (constructors), NUM; DP0/DP1 are special leaves produced by DUP.
- Lab (label): An integer identifier used to “entangle” a DUP/SUP family. All splits that distribute context across a given fork carry the same lab.
- Val/Loc: Payload or child pointer (layout differs per tag).
D0.4 Polarity (HVM3)
- Every wire/node has a polarity: positive (+) or negative (−). Positive vars store the negative var’s location. Redexes are always of the form “−node ~ +node”. Atomic linking moves positive terms into negative locations and uses only the minimal atomic ops.
D0.5 WHNF / WNF
- Weak‑head normal form: expose only the outer constructor (LAM, SUP, CTR, NUM, etc.). In these runtimes, WHNF is **destructive**: after forcing, the original pointer is considered spent and must not be reused unless explicitly duplicated beforehand.
D0.6 DUP, DP0/DP1, SUP (ordered)
- DUP(lab, •): Linear duplicator. Allocates storage and returns two leaf wires, DP0(lab, loc) and DP1(lab, loc), both pointing (via loc) to the same DUP storage and carrying the same lab.
- DP0/DP1: Leaf terms produced by DUP. They must reference the DUP’s storage; returning DP0/DP1 with loc=0 is a hard bug.
- SUP(lab, S0, S1): First‑class “fork” (superposition). All DUPs used to split context across S0/S1 must reuse this **same** lab, so dataflow remains entangled with the fork. No cross‑branch leakage: DP0 → S0, DP1 → S1.
D0.7 Cloner DUP vs Fork DUP
- Cloner DUP (“fresh”): Duplicate within a single branch; must use a fresh label unique to that clone.
- Fork DUP (“entangled”): Duplicate across branches that will be returned inside a SUP(lab, …); must reuse **that** lab.
D0.8 Unordered DUP/SUP (UDUP/USUP) — experimental (HVM3)
- UDUP/USUP treat the two sides as a bag rather than an ordered pair. They enable Haskell‑like thunk sharing for first‑order data (linearizing some previously quadratic cases).
- UDUP‑LAM is delicate: naive sharing swaps lambda bodies. HVM3 uses a specific rule to create fresh binders and superpose bound variables instead; see §3.3 for semantics and limits.
D0.9 REF/CALL/Book (lazy boundary)
- REF: Named immutable net (definition). CALL expands a REF with fresh variables and links to the argument.
- Safe flag: Definitions without DUPs can be copied on ERASE cheaply. With DUPs, ERASE forces expansion to preserve semantics.
D0.10 ERA/VOID, COMMUTE/ANNIHILATE, OPERATE, SWITCH
- ERA/VOID: Erasure/garbage interactions for structural clearing.
- COMMUTE/ANNIHILATE: Structural rules for symmetric binary nodes and identical kinds, respectively.
- OPERATE: Numeric dispatch (NUM vs operators/partials).
- SWITCH: Natural‑number‑style branching over numeric payloads.
D0.11 Numeric Nodes (NUM/OP)
- NUM carries a small payload (e.g., U24/U32) possibly storing partially applied operators. Typed operand usually disambiguates the operation.
- Comparisons return small integers {0,1}.
D0.12 Collapse (of labeled superpositions)
- A logical “collapse” converts an entangled superposed value into a list (or a single value) by diagonally combining branches with the same label and cartesian‑combining branches with distinct labels. Entanglement by label causes pairwise merging rather than full cartesian products.
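A non‑normative Python sketch of this collapse behavior (the tuple encoding and `collapse_pair` are illustrative assumptions, not HVM structures): same‑label superpositions merge diagonally, distinct labels distribute as a cartesian product.

```python
# Superposed values are nested tuples ("SUP", label, a, b) over plain leaves.
def collapse_pair(a, b):
    """List every (x, y) leaf pair reachable from superposed values a, b."""
    if isinstance(a, tuple) and a[0] == "SUP":
        _, la, a0, a1 = a
        if isinstance(b, tuple) and b[0] == "SUP" and b[1] == la:
            _, _, b0, b1 = b
            # Same label: entangled, so branch 0 pairs only with branch 0.
            return collapse_pair(a0, b0) + collapse_pair(a1, b1)
        # Distinct labels (or a plain leaf on the right): distribute over a.
        return collapse_pair(a0, b) + collapse_pair(a1, b)
    if isinstance(b, tuple) and b[0] == "SUP":
        return collapse_pair(a, b[2]) + collapse_pair(a, b[3])
    return [(a, b)]

same = collapse_pair(("SUP", 0, 1, 2), ("SUP", 0, 10, 20))  # diagonal: 2 pairs
diff = collapse_pair(("SUP", 0, 1, 2), ("SUP", 1, 10, 20))  # cartesian: 4 pairs
```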
D0.13 Composition‑by‑sharing / Fusion / Deforestation
- Program shapes that make `f ∘ f` fuse (bound normal form size) enable exponential‑by‑squaring composition to run in O(log N) applies.
- Runtime fusion/deforestation: evaluation eliminates intermediates at runtime without a heavy optimizing compiler.
D0.14 Linear Context Passing
- Contexts must be split linearly (no aliasing). Patterns: (a) fork then clone inside branch; (b) “mutable refs” via globally scoped lambdas (pure emulation) to pass state linearly without pairs (see §7.3).
D0.15 GPU/CPU Scheduler, LEAK, Width
- Work‑stealing deques on CPU; warp/block scheduling on GPU. LEAK materializes a global view when a local node must connect globally. HVM2 often uses 32‑bit ports (3‑bit tag, 29‑bit payload).
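The 32‑bit port layout can be sketched as plain bit‑packing (the 3‑bit tag / 29‑bit payload split follows the text; the exact HVM2 encoding may differ):

```python
# Toy port packing: tag in the high 3 bits, payload in the low 29 bits.
TAG_BITS = 3
VAL_BITS = 29
VAL_MASK = (1 << VAL_BITS) - 1

def pack_port(tag, val):
    # Both fields must fit their widths.
    assert 0 <= tag < (1 << TAG_BITS) and 0 <= val <= VAL_MASK
    return (tag << VAL_BITS) | val

def unpack_port(port):
    return port >> VAL_BITS, port & VAL_MASK
```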
======================================================================
SECTION 1 — GLOBAL CONTRACT (never violate)
======================================================================
G1. Single‑use pointer invariant (linearity)
- A Term pointer is a unique capability. If two consumers need it, DUP first.
- Never place the same pointer twice inside a constructor or return it twice in any path.
G2. Post‑WHNF non‑reuse
- After `whnf/wnf(x)`, treat `x` as **spent**. If a branch must “return what `x` looked like”, **reconstruct a canonical value** (e.g., `Nil()`, `U32(n)`, `Sup(L, a, b)`). Do not reuse the consumed pointer.
G3. Fork vs Clone label discipline
- If you return `SUP(lab, S0, S1)`, all DUPs that split context across S0/S1 must reuse **lab** (fork DUP). Inside S0 or S1, any additional reuse must allocate a **fresh** label (cloner DUP).
- No cross‑stream mixing: `.dp0 → S0` only, `.dp1 → S1` only.
G4. DP wiring correctness
- DP0/DP1 are leaves whose `loc` **must** be the allocated DUP storage. `loc=0` is a correctness bug.
G5. Freshness & hygiene
- Fresh labels for cloners; the SUP’s label for forkers. Never leak fork‑label values outside their branch; never use a fresh label to split across a fork.
G6. Canonical interactions only
- Apply only the interaction rules in §3. Do not invent hybrid or ad‑hoc interactions.
======================================================================
SECTION 2 — OPERATIONAL MENTAL MODEL
======================================================================
M2.1 Destructive patterning
- Pattern‑matching/WHNF consumes; you must either DUP first (if the original is needed elsewhere) or reconstruct after inspection.
M2.2 Entanglement by labels
- Forking creates a universe split tied by `lab`. All context that is shared across the fork must be split with the same label. Independent clones inside a branch use fresh labels.
M2.3 Locality, strong confluence & atomics
- Active pairs reduce locally. Non‑locality arises only via LINK/substitution. Atomic linking strategies (see §3.2) ensure lock‑free progress.
======================================================================
SECTION 3 — REDUCTION RULES & ATOMIC LINKING
======================================================================
3.1 Standard ordered rules (subset; symmetric forms implied)
- APP‑LAM: `(λx(B) a) → x←a; B`
- APP‑SUP: `(&L{a b} c) → !&L{x0 x1}=c; &L{(a x0) (b x1)}`
- DUP‑LAM: `!&L{x y}=λz(f) → !&L{f0 f1}=f; x←λz0(f0); y←λz1(f1); z←&L{z0 z1}`
- DUP‑SUP (L ≠ R): `!&L{x y}= &R{a b} → x←&R{a0 b0}; y←&R{a1 b1}; !&L{a0 a1}=a; !&L{b0 b1}=b`
- DUP‑CTR/NUM: duplicate fields/payloads structurally, writing two fresh constructors with split fields.
- ERA/VOID, COMMUTE/ANNIHILATE as per IC/ICo; use only when their preconditions hold.
- OPERATE/SWITCH: numeric operator dispatch and Nat‑like case on numeric payloads (see §6).
3.2 HVM3 optimal polarized atomic linker (essence)
- Net: `Map<Location, Term>`. Positive vars store negative counterpart locations.
- Primitive ops: `swap(loc, val)`, `set(loc, val)`, `alloc(n)`.
- Move positive into negative location:
- `move(neg_loc, pos_term) = old ← swap(neg_loc, pos_term); if old.tag==SUB then return else link(old, pos_term)`
- Link negative node with a positive term:
- If `pos` is VAR: `neg′ ← swap(pos.target, neg); if neg′ is SUB: done; else move(pos.target, neg′)`
- Else (pos is node): push (neg ~ pos) to redex bag.
- Only **take** positive children; **move** them into negative ports; this minimizes atomics and preserves invariants.
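A single‑threaded toy model of the `move` step above (a plain dict stands in for the net, and the read‑then‑write here is what the real runtime performs as one atomic swap; `SUB` and the redex list are simplified stand‑ins):

```python
SUB = ("SUB",)  # sentinel term: an empty negative slot awaiting a value

def swap(net, loc, val):
    old = net[loc]
    net[loc] = val  # modeled as two steps; the runtime uses ONE atomic swap
    return old

def move(net, neg_loc, pos_term, redexes):
    old = swap(net, neg_loc, pos_term)
    if old == SUB:
        return                        # slot was empty: the link is complete
    redexes.append((old, pos_term))   # otherwise a new active pair formed

net = {0: SUB, 1: ("LAM", 7)}
redexes = []
move(net, 0, ("NUM", 3), redexes)  # lands in the empty slot, no redex
move(net, 1, ("NUM", 4), redexes)  # displaces a node, yielding a redex
```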
3.3 Unordered rules (UDUP/USUP) — **experimental**
- Purpose: provide thunk‑like sharing for first‑order data; reduce quadratic slowdowns due to chains of redundant dups.
- UDUP‑LAM: do **not** naïvely share a single body across clones (it swaps bodies). Use a rule that creates fresh binders and superposes bound vars instead:
- Intuition: produce two fresh lambdas `λx0(G)` (immediate), `λx1(G)` (future); set `x ← %L{x0 x1}`; duplicate body into `G` with UDUP; keep the UDUP live (producer).
- USUP with APP/OP/SWITCH distributes using a single duplicated argument value (bag semantics).
- Limits: UDUP/USUP may weaken strong confluence at the implementation level due to opportunistic ordering; keep “unordered” restricted to cases that do not interact with higher‑order sharing unless your rule set fully accounts for it.
======================================================================
SECTION 4 — CODEGEN INVARIANTS AND ANTI‑PATTERNS
======================================================================
4.1 Hard invariants (must pass)
- [I1] No pointer used twice without a prior DUP.
- [I2] No reuse of a variable after WHNF; branch must reconstruct canonical values.
- [I3] Fork vs Clone discipline exactly as in §1.
- [I4] DP0/DP1 leaves point to DUP storage (non‑zero, correct loc).
- [I5] Use only rules in §3.
4.2 Canonical anti‑patterns and repairs
- AP‑1 Same pointer used twice (e.g., `Con(x, x)`) → **Repair:** `Dps("X", x); Con(X.dp0, X.dp1)`.
- AP‑2 Reuse after WHNF (e.g., `case wnf(x) of NIL -> return x`) → **Repair:** return `Nil()` (reconstructed).
- AP‑3 Mixed labels: `.dp0` used in S1 or fresh label used to split across SUP → **Repair:** fork splits with `lab`; clones inside branches get fresh labels.
- AP‑4 DP leaves with `loc=0` → **Repair:** fix `makeDup`: DP leaves’ `loc` must be the DUP’s node location.
======================================================================
SECTION 5 — LABEL & NAMING HYGIENE
======================================================================
5.1 Labels
- Fork label = the SUP’s label. Cloner label = fresh unique integer (never reused outside its branch scope).
5.2 Naming convention (recommended)
- Uppercase alias for cloner dups: `Dps("ARG", arg) → ARG.dp0, ARG.dp1`.
- Single underscore `_` suffix for fork dups tied to `lab`: `D_`.
- Double underscore `__` for second‑level cloners inside a branch: `D__`.
======================================================================
SECTION 6 — NUMERICS & CONTROL
======================================================================
6.1 NUM and OPERATE
- NUM payloads carry the value and sometimes a partial operator.
- The typed (non‑operator) operand disambiguates operation shape (left/right/flip).
- Return small ints {0,1} for comparisons.
6.2 SWITCH (Nat‑style)
- `#0 ~ ?(A B) → A`; `#(n+1) ~ ?(A B) → B(n)` (erase the other branch accordingly).
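A minimal sketch of these two arms (illustrative only; `switch` is not a runtime function):

```python
def switch(n, zero_case, succ_case):
    # #0 ~ ?(A B) -> A ;  #(n+1) ~ ?(A B) -> B(n), i.e. the predecessor
    return zero_case if n == 0 else succ_case(n - 1)
```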
======================================================================
SECTION 7 — PATTERNS, TEMPLATES, AND LINEAR CONTEXT
======================================================================
7.1 Safe in‑branch reuse (cloner DUP)
- Pattern:
- `Dups X = Dps("X", x);`
- Use `X.dp0`, `X.dp1` exactly once each within the branch.
7.2 Match → reconstruct (post‑WHNF)
- Pattern:
- `t = wnf(x); switch tag(t) { NIL: return Nil(); U32: return U32(val(t)); SUP: ... }`
7.3 Proper fork with internal clones
- Pattern:
- Fork splits: `new_dps(lab, D)`, `new_dps(lab, K)`, …
- S0 uses only `*.dp0`.
- S1 first **clones inside** the branch: `D__ = Dps("D", D_.dp1)`, etc.
- Return `new_sup(lab, S0, S1)`.
7.4 Linear context passing without pairs (pure “mutable refs” trick)
- Emulate a mutable reference with globally scoped lambdas:
- `@mut(ref, fn) = let $new = fn (ref $new);`
- `@spt(ref, k) = k (λ$y (ref $z)) (λ$z ($y));` // split into two linear aliases
- Use this to thread a context linearly without building pair states; ensures head constructors can be emitted promptly (maintains laziness under structural sharing).
7.5 Superposed enumerators and collapse
- Build enumerators that emit head constructors as early as possible; avoid monadic shapes that delay emission (to preserve HVM’s ability to prune universes).
- Collapse: entangled labels merge pairwise; independent labels cartesian‑combine.
======================================================================
SECTION 8 — REVIEWER’S LEDGER & PRE‑FLIGHT CHECKS
======================================================================
Before returning any code or transformation, auto‑audit:
[C1] Pointer‑Use Ledger
- For each `Term v`, count consumptions along each path. If ≥2 without DUP → insert a DUP.
[C2] Post‑WHNF
- If `wnf(v)` occurred, `v` is spent everywhere downstream; reconstruct canonical values where necessary.
[C3] Fork Discipline
- If returning `SUP(lab, S0, S1)`: all cross‑branch splits used `new_dps(lab, ·)`. No `.dp0` in S1 and vice‑versa. Branch‑internal multi‑use cloned inside the branch with fresh labels.
[C4] DP Wiring
- All DP leaves point to DUP storage; no null `loc`.
[C5] Rule set
- Each redex uses the correct rule (§3). No bespoke hybrids.
[C6] Numerics/Control
- OPERATE typed correctly; SWITCH arms correct; erasures applied when specified.
[C7] Fusion Shape (if optimization requested)
- λ‑encodings/constructors arranged so `f ∘ f` fuses; floats kept out of hot fusion paths; shared lambdas lifted.
======================================================================
SECTION 9 — DIAGNOSTICS (succinct, repair‑oriented)
======================================================================
Emit precise messages that identify the fix:
- LINEARITY: pointer `lib` reused without DUP; insert `Dps("LIB", lib)` before double use.
- WHNF: variable `x` reused after `wnf`; reconstruct canonical `Nil()`/`U32(n)`/`Sup(L,…)`.
- FORK: mixed labels; `.dp0` observed in S1. Fork with `lab`, then clone inside branch with fresh labels.
- DP: DP0/DP1 created with `loc=0`; must point to DUP storage.
- NUM: operator dispatch mismatch; typed operand must determine operation/flip.
- SWITCH: wrong succ/zero handling; fix decrementation and erasures.
======================================================================
SECTION 10 — PERFORMANCE PLAYBOOK (fusion, search, GPU/CPU)
======================================================================
10.1 Composition‑by‑sharing
- Aim for shapes where self‑composition stabilizes (bounded NF size). Use exponentiation‑by‑squaring (`f²`, `f⁴`, …) with parity routing. Keep hot paths in λ‑encodings/ADTs; keep floats out of fusion‑critical paths.
10.2 Superpositions for search & vectorization
- Treat `{a,b}` as two values “at once”; `F` distributes over SUP with shared subwork. Encode hypothesis spaces as superpositions; evaluate once; collapse to extract solutions.
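A toy model of this pattern (the tuple encoding, `map_sup`, and `collapse` are illustrative, not HVM primitives): one function application distributes over the whole hypothesis space, then collapse enumerates every branch.

```python
def map_sup(f, v):
    # Distribute f over a SUP tree, sharing the traversal code.
    if isinstance(v, tuple) and v[0] == "SUP":
        _, lab, a, b = v
        return ("SUP", lab, map_sup(f, a), map_sup(f, b))
    return f(v)

def collapse(v):
    # Enumerate every leaf reachable through the superposition.
    if isinstance(v, tuple) and v[0] == "SUP":
        return collapse(v[2]) + collapse(v[3])
    return [v]

# Hypothesis space {0, 1, 2, 3} as nested superpositions:
hyps = ("SUP", 0, ("SUP", 1, 0, 1), ("SUP", 1, 2, 3))
results = collapse(map_sup(lambda x: x * x, hyps))  # squares of all hypotheses
```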
10.3 Runtime deforestation
- Prefer fold/build and Scott/Church encodings that remove intermediates at runtime.
10.4 Numerics in hot paths
- Use ADT numerics (e.g., Peano/trees/balanced representations) in fusion‑critical loops. NUM is fine at interfaces or cold code.
10.5 Scheduler & memory
- CPU: work‑stealing; keep redexes coarse enough. GPU: minimize warp divergence; share only parallelizable calls; use LEAK when local/global graph interfaces.
======================================================================
SECTION 11 — KNOWN LIMITS & SAFE PRACTICE
======================================================================
L1. Higher‑order clone‑of‑cloner hazards
- A lambda that clones its argument should not itself be blindly cloned (or must be handled by specialized rules/scheduling).
L2. UDUP/USUP scope
- Restrict unordered nodes to first‑order data sharing unless your rule set and proofs handle higher‑order entanglement (UDUP‑LAM defined as in §3.3).
L3. Address width & memory
- 32‑bit ports imply limits on net size. For huge nets, widen ports with appropriate atomic support.
======================================================================
SECTION 12 — OUTPUT STYLE & OBLIGATIONS
======================================================================
O1. Code completeness
- When asked to write code, output **complete, paste‑ready code** with no placeholders such as “...”, “TODO”, or “rest of code”. Provide any helper definitions you rely on (e.g., `Dps`, `new_dps`, `wnf`, tag accessors).
O2. Visible self‑audit
- Include a concise “Decision Summary” listing which checks from §8 you verified and where (do not expose internal token‑level chain‑of‑thought).
O3. Terminology consistency
- Use the DEFINITIONS in §0. If a term is new, define it inline before use.
======================================================================
SECTION 13 — MICRO‑TEMPLATES (ready to adapt)
======================================================================
T13.1 Cloner DUP for same‑branch double use
Dups X = Dps("X", x);
Term a = ... uses X.dp0 ...;
Term b = ... uses X.dp1 ...;
return Pair(a, b);
T13.2 WHNF then reconstruct
Term t = wnf(x);
switch (tag(t)) {
case NIL: return Nil();
case U32: return U32(val(t));
case SUP: { Lab L = lab(t); Term a = left(t); Term b = right(t); return Sup(L, a, b); }
default: abort();
}
T13.3 Proper fork with internal clones
Dups A_ = new_dps(lab, A);
Dups B_ = new_dps(lab, B);
Term S0 = f0(A_.dp0, B_.dp0, ...);
Dups A__ = Dps("A", A_.dp1);
Dups B__ = Dps("B", B_.dp1);
Term h = g(A__.dp0, B__.dp0, ...);
Term t = g2(A__.dp1, B__.dp1, ...);
Term S1 = Con(h, t);
return new_sup(lab, S0, S1);
T13.4 Atomic link (HVM3, sketch)
// Move positive child into negative slot; then link if old was node:
move(neg, pos) { old = swap(neg, pos); if (tag(old)==SUB) return; else link(old, pos); }
T13.5 Pure mutable ref split (linear context without pairs)
// split ref into two linear aliases:
@spt(ref, k) = k (λ$y (ref $z)) (λ$z ($y));
======================================================================
SECTION 14 — SANITY EXAMPLES (what to emulate)
======================================================================
E14.1 Fix for “double use”
// BAD: return Con(x, x);
// GOOD:
Dups X = Dps("X", x);
return Con(X.dp0, X.dp1);
E14.2 Fix for “reuse after WHNF”
// BAD: case wnf(lib) of NIL -> return lib
// GOOD: case wnf(lib) of NIL -> return Nil();
E14.3 Fork discipline
// Fork with lab, then clone internally inside branch; never route .dp0 into S1.
======================================================================
SECTION 15 — TL;DR (one‑screen internalized summary)
======================================================================
- DUP before double use. After WHNF, reconstruct.
- SUP’s label = fork; fresh labels = clones inside branches.
- .dp0 stays in S0; .dp1 stays in S1.
- DP leaves must point to DUP storage.
---
Here is an intuitive explanation and breakdown of the above that relates it to Python:
## The Big Picture: What is HVM?
Imagine if instead of your typical sequential Python program that executes line by line, you had a computing system where:
- **Everything can run in parallel** by default (think of it like having thousands of tiny independent Python threads that never conflict)
- **Variables can only be used once** (like if every Python variable was consumed when you read it)
- **There's no shared mutable state** (no global variables that multiple parts can modify)
HVM is built on something called the **Interaction Calculus**, which is like a massively parallel computing model where your program is a graph of nodes connected by wires, and computation happens by local "interactions" between adjacent nodes.
## Core Concepts Translated
### 1. **Linear Variables (The "Use Once" Rule)**
In Python, you can do:
```python
x = 5
print(x) # Fine
print(x) # Still fine - x is still there
```
In HVM's world, variables are **linear** - they're consumed when used:
```
x = 5
print(x) # Uses up x
print(x) # ERROR! x is gone
```
If you need to use something twice, you must explicitly **duplicate** it first:
```
x = 5
x0, x1 = DUP(x) # Split x into two copies
print(x0) # Uses x0
print(x1) # Uses x1
```
This is like having physical tokens that you pass around - once you give someone a token, you don't have it anymore unless you make copies first.
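The token analogy can be made concrete with a small Python sketch (a toy runtime check; HVM enforces linearity structurally in the graph, and `Ptr`/`dup` are invented names for illustration):

```python
class Ptr:
    """A use-once pointer: reading it a second time is an error."""
    def __init__(self, value):
        self.value = value
        self.spent = False

    def use(self):
        if self.spent:
            raise RuntimeError("linearity violation: pointer already spent")
        self.spent = True
        return self.value

def dup(p):
    # Consuming the original once yields two fresh single-use pointers.
    v = p.use()
    return Ptr(v), Ptr(v)

x = Ptr(5)
x0, x1 = dup(x)
total = x0.use() + x1.use()  # fine: each copy is used exactly once
```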
### 2. **DUP and SUP: The Fork and Join**
**DUP (Duplicator)**: Think of this as a smart copy machine. When you duplicate something, you get two "ports" (DP0 and DP1) that both point to the same underlying value. But here's the trick - each port can only be used once.
**SUP (Superposition)**: This is like quantum superposition but for computation. It represents "both possibilities at once". If you have `SUP(label, a, b)`, it means "this value is both `a` AND `b` simultaneously". This is how HVM achieves massive parallelism.
Think of it like this Python pseudocode:
```python
# Normal Python: sequential
def process(x):
    result_a = expensive_computation_a(x)
    result_b = expensive_computation_b(x)
    return (result_a, result_b)

# HVM style: parallel superposition
def process_hvm(x):
    x0, x1 = DUP(x)  # Split x
    # These happen IN PARALLEL in different "universes"
    return SUP(label,
        expensive_computation_a(x0),  # Universe 0
        expensive_computation_b(x1)   # Universe 1
    )
```
### 3. **Labels: Keeping Track of Parallel Universes**
Labels are like universe IDs. When you create a superposition (fork into parallel paths), everything that happens in those parallel paths needs to be "entangled" with the same label. This prevents mixing up results from different parallel computations.
Imagine you're running multiple parallel simulations:
- **Fork DUP**: Uses the SAME label as the SUP it's feeding into (keeping universes synchronized)
- **Cloner DUP**: Uses a FRESH label for copying within a single universe
### 4. **WHNF (Weak Head Normal Form) and Destructive Reads**
In Python, when you check what something is, you can still use it:
```python
if isinstance(x, list):
    return x  # x is still intact
```
In HVM, checking what something is **consumes it**:
```
t = wnf(x)       # "What is x?" - this DESTROYS x
if t is NIL:
    return x     # ERROR! x is gone
return Nil()     # CORRECT: reconstruct a new Nil
```
This is like opening a sealed envelope to read its contents - once opened, you can't give someone the sealed envelope anymore. You'd need to put the contents in a new envelope.
### 5. **Interaction Rules: The Computation Engine**
The document defines specific rules for how different node types interact. Think of these as the "CPU instructions" of this parallel computer:
- **APP-LAM**: Function application (like calling a function in Python)
- **DUP-LAM**: Duplicating a function (making two copies of a lambda)
- **APP-SUP**: Applying a function to a superposition (the function runs on both branches in parallel)
Each rule is **local** - it only involves the two nodes that are directly connected, which is why everything can run in parallel without conflicts.
### 6. **Atomic Linking and Polarity (HVM3)**
This is about thread-safety at the lowest level. Every connection has a polarity (+ or -), like electrical charges. The system ensures that updates happen atomically (all-or-nothing) without locks:
- Positive nodes actively "push" their values
- Negative nodes passively "receive" values
- This prevents race conditions in parallel execution
It's like having a rule that only one type of connector can initiate a connection, preventing conflicts.
### 7. **The Fork vs Clone Discipline**
This is one of the trickiest parts. When you're working with parallel branches:
**Fork DUPs** split context across parallel universes:
```python
# Pseudocode
def parallel_process(x, y):
    x0, x1 = DUP(x, label="universe_split")
    y0, y1 = DUP(y, label="universe_split")  # SAME label
    return SUP("universe_split",
        process_a(x0, y0),  # Universe 0 gets x0, y0
        process_b(x1, y1)   # Universe 1 gets x1, y1
    )
```
**Cloner DUPs** copy within a single universe:
```python
def single_universe_work(z):
    z0, z1 = DUP(z, label="fresh_123")  # NEW label
    return combine(f(z0), g(z1))        # Both in same universe
```
### 8. **Performance Patterns**
The document describes optimization patterns:
- **Fusion**: Composing functions in a way that intermediate results disappear (like how `map(f, map(g, list))` can fuse into `map(f∘g, list)`)
- **Superposition-driven search**: Exploring multiple possibilities simultaneously
- **Linear context passing**: Threading state through computation without creating intermediate pairs (like using a state monad, but simpler)
### 9. **Safety Rules (The Contract)**
The document is essentially a strict contract that ensures:
1. No variable is used twice without explicit duplication
2. After inspecting a value, you must reconstruct it if you need to return it
3. Parallel branches don't accidentally mix their data
4. Every reference points to valid memory
## Why This Matters
This system enables:
- **Massive parallelism** without race conditions
- **Automatic optimization** that would be impossible in traditional systems
- **GPU execution** of high-level functional code
- **Quantum-like superposition** for exploring multiple computation paths simultaneously
Think of it as a computing model where instead of carefully managing threads and locks, the very structure of your program guarantees it can run in parallel safely. The trade-off is that you must be very explicit about resource usage (the linearity constraint), but in return, you get a system that can automatically parallelize almost everything.
The document is essentially a detailed safety manual for writing code generators and transformers that work with this unusual but powerful computing model. It's like having a strict set of rules that, if followed, guarantee your generated code will run correctly on this massively parallel runtime.
Now let's get into the technical meat that makes this system actually work.
## The Actual Reduction Rules (How Computation Happens)
### The Core Interaction Rules in Detail
Each rule describes what happens when two specific node types meet. Think of these as chemical reactions:
**APP-LAM (Function Application)**:
```
(λx.B) A → B[x := A]
```
But here's the twist - this isn't textual substitution. It's creating a physical wire connection. The λ's body B has a "hole" where x was, and we're literally plugging A's wire into that hole.
**APP-SUP (Application meets Superposition)**:
```
(f {a b}) → {(f₀ a) (f₁ b)}
```
When a function hits a superposition, it needs to apply to BOTH branches. But f can only be used once! So the rule actually:
1. Creates a DUP of f with the SUP's label (creating f₀ and f₁)
2. Applies f₀ to branch a
3. Applies f₁ to branch b
4. Returns the results in a new SUP
**DUP-LAM (Duplicating a Lambda)**:
```
DUP[L](λx.B) → λx₀.B₀, λx₁.B₁ where x becomes SUP[L]{x₀ x₁}
```
This is mind-bending: when you duplicate a function, you get two functions that share their parameter! The original parameter x becomes a superposition of x₀ and x₁. Any computation on x happens in BOTH branches simultaneously.
**DUP-SUP (The Entanglement Rule)**:
When labels match (L = L):
```
DUP[L](SUP[L]{a b}) → (a, b)
```
The DUP "collapses" the matching superposition - it's like observing a quantum state.
When labels differ (L ≠ R):
```
DUP[L](SUP[R]{a b}) → SUP[R]{DUP[L](a) DUP[L](b)}
```
The DUP must be applied to BOTH branches, creating nested superpositions. This is how different parallel computations stay isolated.
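The two cases above can be sketched with a toy tuple encoding (illustrative; leaves here are plain values, whereas the real rule also DUPs each branch of the non‑matching superposition):

```python
def dup_sup(dup_lab, sup):
    # sup is ("SUP", label, a, b)
    _, sup_lab, a, b = sup
    if dup_lab == sup_lab:
        return a, b  # annihilate: each copy observes its own branch
    # Commute: each copy sees a superposition of both branches again.
    # (Duplicating a plain leaf is a no-op in this toy model.)
    return ("SUP", sup_lab, a, b), ("SUP", sup_lab, a, b)

same = dup_sup(0, ("SUP", 0, "a", "b"))  # matching label collapses
diff = dup_sup(9, ("SUP", 0, "a", "b"))  # distinct label stays superposed
```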
### The Critical DUP Storage Mechanism
Here's what actually happens in memory when you create a DUP:
1. Allocate a storage location (let's say address 0x1000)
2. Create two leaf nodes: DP0[label, 0x1000] and DP1[label, 0x1000]
3. Both leaves point to THE SAME storage location
4. The storage location initially contains a "substitution" node (SUB)
When something interacts with DP0:
- It swaps itself into location 0x1000
- If DP1 hasn't been used yet, it will find this value when accessed
- This is how the duplication actually shares the computation
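That sharing behavior can be approximated in Python (a thunk‑style sketch that is closer in spirit to lazy sharing than to the exact cell protocol; `DupCell` is an invented name):

```python
SUB = object()  # sentinel: "nothing written to the storage slot yet"

class DupCell:
    def __init__(self, thunk):
        self.slot = SUB      # the shared storage location
        self.thunk = thunk   # the not-yet-forced shared computation
        self.forces = 0      # how many times the work actually ran

    def take(self):
        if self.slot is SUB:
            self.forces += 1
            self.slot = self.thunk()  # first consumer computes and shares
        return self.slot

cell = DupCell(lambda: 6 * 7)
dp0 = cell.take()  # computes the value and writes it into the slot
dp1 = cell.take()  # finds the value already there; no recomputation
```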
## The Atomic Linking Protocol (HVM3's Lock-Free Magic)
The polarity system is genius. Every wire has a charge:
- **Negative (-)**: Passive slots waiting for values
- **Positive (+)**: Active values looking for homes
The atomic `move` operation:
```
move(negative_location, positive_term):
    old = atomic_swap(negative_location, positive_term)
    if old.tag == SUB:            // Was empty
        return                    // Done!
    else:
        link(old, positive_term)  // Create a new active pair
```
This uses a single atomic swap to:
1. Place the positive term into the negative slot
2. Get back what was there
3. If something was already there, create a new interaction
No locks, no retries, just physics-like rules that guarantee progress.
## The Unordered DUP/SUP Innovation (Experimental)
This is HVM3's attempt to get Haskell-like lazy sharing. Normal DUP/SUP maintains order (left/right), but UDUP/USUP treats both sides as equivalent.
**The Problem**: If you naively share a lambda body:
```
UDUP(λx.body) → λx.body, λx.body // WRONG!
```
Both copies share the SAME x binding, causing chaos when they're both applied.
**The Solution**: Fresh variables with superposed bindings:
```
UDUP(λx.body) → λx₀.G, λx₁.G where x = USUP{x₀ x₁} in G
```
Now each lambda has its own parameter, but the body G sees a superposition of both. This preserves sharing while avoiding variable capture.
## The Collapse Mechanism (Making Sense of Superpositions)
When superpositions with the same label meet, they "collapse" diagonally:
```
SUP[L]{a b} meets SUP[L]{c d}
Results in: [(a,c), (b,d)] // Pairs from same universes
```
When different labels meet, they go full Cartesian:
```
SUP[L]{a b} meets SUP[R]{c d}
Results in: [(a,c), (a,d), (b,c), (b,d)] // All combinations
```
This is how HVM handles non-deterministic computation - exploring all possibilities but keeping entangled computations synchronized.
## Critical Anti-Patterns and Their Fixes
### Anti-Pattern 1: Double Use
```python
# WRONG - x used twice
def bad_pair(x):
    return Cons(x, x)

# FIXED - explicit duplication
def good_pair(x):
    x0, x1 = Dup("X", x)
    return Cons(x0, x1)
```
### Anti-Pattern 2: Post-WHNF Reuse
```python
# WRONG - lib consumed by wnf
def bad_check(lib):
    t = wnf(lib)
    if is_nil(t):
        return lib  # lib is gone!

# FIXED - reconstruct canonical value
def good_check(lib):
    t = wnf(lib)
    if is_nil(t):
        return Nil()  # Fresh nil
```
### Anti-Pattern 3: Cross-Stream Contamination
```python
# WRONG - dp0 used in wrong branch
def bad_fork(a, b):
    a0, a1 = dup_with_label(L, a)
    b0, b1 = dup_with_label(L, b)
    s0 = process(a0, b0)
    s1 = process(a1, b0)  # b0 already used!

# FIXED - proper stream discipline
def good_fork(a, b):
    a0, a1 = dup_with_label(L, a)
    b0, b1 = dup_with_label(L, b)
    s0 = process(a0, b0)  # Stream 0 gets .dp0
    s1 = process(a1, b1)  # Stream 1 gets .dp1
```
## The Numeric System and Control Flow
HVM has a clever approach to numbers. Instead of native machine integers everywhere, it uses:
**NUM nodes**: Carry a small payload (roughly 24-32 bits, depending on the HVM version) and can represent:
- Actual numbers
- Partially applied operators
- Operator + one operand
**OPERATE interactions**: When NUM meets an operator:
```
(+ 3) 5 → 8
(< 3) 5 → 1 // Comparisons return 0 or 1
```
**SWITCH for Natural Number Branching**:
```
SWITCH 0 (Zero: A) (Succ: B) → A
SWITCH (n+1) (Zero: A) (Succ: B) → B(n)
```
This gives you pattern matching on numbers, but note the discipline: the successor branch receives the decremented value, not a binding of the original.
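The decrementing discipline can be sketched on machine naturals; the `switch` helper here is illustrative:

```python
# Sketch of SWITCH on machine naturals. The successor branch receives
# n-1, mirroring SWITCH (n+1) ... → B(n). Illustrative helper names.
def switch(n, zero_case, succ_case):
    if n == 0:
        return zero_case
    return succ_case(n - 1)  # branch sees the predecessor, not n

# Parity via the decrementing discipline: the scrutinee is never rebound.
def is_even(n):
    return switch(n, 1, lambda p: switch(p, 0, lambda q: is_even(q)))
```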
## Linear Context Threading (The "Mutable Ref" Trick)
This is a beautiful hack for threading state without pairs:
```haskell
-- Traditional: build nested pairs
compute x =
  let (y, s1) = step1 x s0
      (z, s2) = step2 y s1
  in (z, s2)

-- Linear trick: use scoped lambdas
@mut(ref, fn) = let $new = fn ref in $new
@spt(ref, k) = k (λ$y.(ref $z)) (λ$z.($y))
```
The `@spt` operation splits a reference into two linear capabilities that can restore each other. This lets you thread state without building intermediate pairs, keeping the head constructor visible for lazy evaluation.
## The Fusion/Deforestation Patterns
For exponential-by-squaring composition:
```python
# Want: f^16(x) in O(log 16) = 4 compositions
def power_compose(f, n, x):
    if n == 0: return x
    if n == 1: return f(x)
    # The magic: f composed with itself
    f2 = compose(f, f)  # Must fuse!
    if n % 2 == 0:
        return power_compose(f2, n // 2, x)
    else:
        return f(power_compose(f2, n // 2, x))
```
For this to work efficiently, `compose(f, f)` must produce a bounded-size result, not grow linearly. This requires careful structuring of your data types.
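To see the cost model concretely, here is a runnable variant of the function above, instrumented to count compositions. Plain Python closures don't model HVM's requirement that the composed function normalize to bounded size; this only demonstrates the O(log n) call structure:

```python
# Runnable power_compose, instrumented to count `compose` calls.
# For n = 16 the squaring pattern performs 4 compositions.
calls = {"compose": 0}

def compose(f, g):
    calls["compose"] += 1
    return lambda x: f(g(x))

def power_compose(f, n, x):
    if n == 0:
        return x
    if n == 1:
        return f(x)
    f2 = compose(f, f)  # square the function
    if n % 2 == 0:
        return power_compose(f2, n // 2, x)
    return f(power_compose(f2, n // 2, x))

result = power_compose(lambda x: x + 1, 16, 0)  # 4 compositions, not 15
```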
## Performance-Critical Patterns
### Superposition-Driven Search
Instead of exploring paths sequentially:
```python
# Traditional backtracking
def solve(state):
    if is_solution(state): return state
    for move in possible_moves(state):
        result = solve(move)
        if result: return result
    return None

# Superposition style (SUP[L]{...} is HVM pseudocode, not Python)
def solve_sup(state):
    if is_solution(state): return state
    moves = possible_moves(state)
    # Create superposition of ALL paths
    futures = SUP[L]{solve_sup(move) for move in moves}
    # All paths explore in parallel!
    return collapse_first_success(futures)
```
### ADT Numerics for Fusion
In hot loops, use algebraic data types for numbers:
```haskell
-- Instead of machine ints
data Nat = Z | S Nat
-- Balanced representation for efficiency
data Bin = E | O Bin | I Bin -- Empty, bit-0, bit-1
```
These fuse during composition, unlike machine integers which block optimization.
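The `Bin` declaration can be transcribed to Python tuples to make the bit discipline concrete; this is a sketch, least-significant bit first, with `inc` showing the localized carry behavior:

```python
# Sketch of the balanced binary representation: E is zero, O/I prepend
# a 0 or 1 bit (LSB first). Names follow the data declaration above.
E = ("E",)
def O(rest): return ("O", rest)  # value 2n
def I(rest): return ("I", rest)  # value 2n + 1

def to_int(b):
    if b[0] == "E": return 0
    if b[0] == "O": return 2 * to_int(b[1])
    return 2 * to_int(b[1]) + 1

def inc(b):
    """Increment: the carry ripples only through trailing I bits."""
    if b[0] == "E": return I(E)
    if b[0] == "O": return I(b[1])
    return O(inc(b[1]))

five = I(O(I(E)))  # bits 1,0,1 (LSB first) = 5
```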
## The LEAK Mechanism and Scheduling
**LEAK** occurs when a local computation needs to reference a global node:
1. Local reducer hits a non-local reference
2. Must "leak" this interaction to global queue
3. Global scheduler handles it
4. This is expensive, so minimize cross-region references
**CPU Scheduling**: Work-stealing deques
- Each thread has its own deque
- Push/pop locally (LIFO for cache locality)
- Steal from others when empty (FIFO)
**GPU Scheduling**: Warp-based
- Groups of 32 threads (warps) work together
- Minimize divergence (different paths in same warp)
- Shared memory for local interactions
- Global memory for LEAK operations
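The CPU discipline above can be simulated in a few lines. This is a single-threaded sketch of the access pattern only; real runtimes use lock-free deques:

```python
# Work-stealing access pattern: pop locally from the back (LIFO,
# cache-hot), steal from the front of another deque (FIFO).
from collections import deque

deques = [deque([1, 2, 3]), deque()]  # worker 0 has work, worker 1 is idle

def pop_local(worker):
    d = deques[worker]
    return d.pop() if d else None      # newest task first

def steal(thief):
    for victim, d in enumerate(deques):
        if victim != thief and d:
            return d.popleft()         # oldest task first
    return None

t_local = pop_local(0)   # worker 0 takes task 3 from its own back
t_stolen = steal(1)      # worker 1 steals task 1 from the front
```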
## The Pre-Flight Audit Checklist
Before any code generation, verify:
**[C1] Pointer-Use Ledger**: Track every variable like an accountant:
```python
def audit_usage(term):
    usage = {}
    for var in extract_vars(term):
        usage[var] = count_uses(var, term)
        if usage[var] > 1 and not has_dup(var):
            raise ValueError(f"{var} used {usage[var]} times without DUP")
```
**[C3] Fork Discipline Verification**:
```python
def verify_fork(sup_term):
    label = sup_term.label
    for dup in find_dups_feeding_into(sup_term):
        assert dup.label == label, "Fork DUP must use SUP's label"
    # Verify no cross-contamination
    assert no_dp0_in_branch1(sup_term)
    assert no_dp1_in_branch0(sup_term)
```
## The Memory Layout
A Term in memory (HVM2) typically looks like:
```c
struct Term {
    uint32_t tag : 3;   // VAR, LAM, APP, DUP, SUP, CTR, NUM, ERA
    uint32_t lab : 29;  // Label for DUP/SUP entanglement
    uint32_t loc;       // Pointer to another term or payload
};
```
That's 64 bits per term, so eight terms fit in a typical 64-byte cache line. The 3-bit tag limits you to 8 node types, but that's enough for the core calculus.
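A quick sketch of packing and unpacking that 64-bit layout; the numeric tag codes here are invented for illustration:

```python
# Pack/unpack the 64-bit term word: 3-bit tag, 29-bit label, 32-bit
# location. Tag codes are made up for this example.
TAGS = {"VAR": 0, "LAM": 1, "APP": 2, "DUP": 3,
        "SUP": 4, "CTR": 5, "NUM": 6, "ERA": 7}

def pack(tag, lab, loc):
    assert lab < (1 << 29) and loc < (1 << 32)
    return TAGS[tag] | (lab << 3) | (loc << 32)

def unpack(word):
    tag = word & 0b111                   # low 3 bits
    lab = (word >> 3) & ((1 << 29) - 1)  # next 29 bits
    loc = word >> 32                     # high 32 bits
    return tag, lab, loc

w = pack("SUP", lab=7, loc=0xDEADBEEF)
```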
## The Complete Code Generation Contract
When generating code, you must:
1. Output **complete, runnable code** - no "..." or "TODO"
2. Include all helper functions (Dps, new_dps, wnf, etc.)
3. Follow the exact patterns from Section 13's micro-templates
4. Include a "Decision Summary" showing which audits you performed
5. Use only the canonical interaction rules from Section 3
6. Maintain the invariants at all times
The document is essentially saying: "This is a radically different computational model. If you follow these rules EXACTLY, you get massive parallelism for free. Violate them, and your program will corrupt memory or deadlock."
The beauty is that once you internalize these patterns, you're writing code that can run on thousands of cores simultaneously without any explicit synchronization. The structure itself IS the synchronization.
Now I will cover the crucial pieces I haven't touched, then give you that synthesis.
## The REF/CALL System (Lazy Module Boundaries)
This is HVM's answer to separate compilation and code reuse:
**REF (Reference)**: An immutable, named net that lives in the "Book":
```haskell
@fibonacci = λn λf λs
  switch n {
    0: s
    _: @fibonacci (- n 1) s (+ f s)
  }
```
**CALL**: Expands a REF with fresh variables:
```
CALL(@fibonacci, arg) →
1. Instantiate @fibonacci with fresh vars
2. Link arg to the parameter
3. Return the expanded net
```
**The "Safe" Flag Optimization**:
- If a REF contains no DUPs, it's marked "safe"
- Safe REFs can be copied on ERASE cheaply (just duplicate the CALL)
- Unsafe REFs must be expanded on ERASE to preserve semantics
This is crucial: it means pure, non-duplicating functions can be erased for free, while stateful or sharing functions force expansion. It's automatic strictness analysis!
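The classification itself is a simple syntactic walk. Here is a sketch on a tuple-encoded AST; the encoding and the `is_safe` helper are illustrative assumptions:

```python
# "Safe" check: a REF body containing no DUP node anywhere can be
# copied/erased cheaply. AST encoding is illustrative.
def is_safe(term):
    if isinstance(term, tuple):
        if term[0] == "DUP":
            return False
        return all(is_safe(t) for t in term[1:])
    return True

pure_ref = ("LAM", "x", ("APP", ("VAR", "f"), ("VAR", "x")))
sharing_ref = ("LAM", "x", ("DUP", "L", ("VAR", "x")))
```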
## ERA/VOID and Garbage Collection
The erasure system is more sophisticated than it seems:
**ERA (Eraser)**: Actively consumes structures
**VOID**: What ERA becomes after consuming something
The rules cascade:
```
ERA ~ LAM(x, body) → x←VOID; ERA~body
ERA ~ APP(f, x) → ERA~f; ERA~x
ERA ~ SUP(L, a, b) → ERA~a; ERA~b
ERA ~ DUP(L, •) → VOID // DUP just disappears!
```
But here's the subtle part with DUPs:
```
ERA ~ DP0(L, loc) →
  if loc.contents is DP1:
    loc.contents ← ERA   // Erase the shared value
  else:
    VOID                 // Other port already erased it
```
This creates a "race" to erase shared values, but it's benign - either port can win, the result is the same.
## The Deeper Composition Pattern
The document hints at but doesn't fully explain the "magic" composition pattern. Here's what's really happening:
```haskell
-- The key: functions that preserve their own structure
f :: Tree → Tree
f (Leaf x)   = Leaf (g x)
f (Node l r) = Node (f l) (f r)

-- When you compose f with itself:
(f ∘ f) (Node l r) = f (Node (f l) (f r))
                   = Node ((f∘f) l) ((f∘f) r)
```
The structure is **preserved** through composition! This means `f^n` has the same shape as `f`, just with `g^n` at the leaves. The normal form size stays bounded, enabling logarithmic-time exponentiation.
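The shape-preservation claim is easy to check concretely for a tree map; the names (`tree_map`, the `Leaf`/`Node` tuples) are illustrative:

```python
# Mapping twice equals mapping the composed leaf function, so
# (f ∘ f) has the same "shape" as f: only the leaf work compounds.
def tree_map(g, t):
    if t[0] == "Leaf":
        return ("Leaf", g(t[1]))
    return ("Node", tree_map(g, t[1]), tree_map(g, t[2]))

g = lambda x: x + 1
t = ("Node", ("Leaf", 1), ("Leaf", 2))

twice_mapped = tree_map(g, tree_map(g, t))     # (f ∘ f) applied to t
mapped_twice = tree_map(lambda x: g(g(x)), t)  # same shape, g∘g at leaves
```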
## The Hidden Optimal Reduction Property
Here's something profound the document assumes but doesn't emphasize: HVM implements **Lévy-optimal reduction**. This means:
1. **No redundant work**: If a computation would be duplicated, it's shared instead
2. **No premature work**: Computations happen only when needed
3. **Optimal sharing**: The system automatically finds the maximum sharing possible
Traditional lazy evaluation (like Haskell) gives you #2 and some of #3. Strict evaluation gives you none. HVM gives you all three, which was thought impossible to implement efficiently until recently.
## The Affine Memory Discipline's Deep Implications
The document mentions "affine variables" but doesn't stress how radical this is:
- **No reference counting**: Variables are used at most once, so no counting needed
- **No garbage collection pauses**: Memory is freed immediately when consumed
- **Perfect memory prediction**: You can statically determine exact memory usage
- **Cache-optimal**: Each piece of data is touched once then discarded
This isn't just about parallelism - it's about predictable, real-time performance.
## The Core Essential Trick - My Synthesis
After digesting all of this, here's what I think makes HVM truly special:
**The core insight: HVM makes sharing explicit, first-class, and compositional.**
In traditional computing:
- Sharing is implicit (pointers/references)
- Sharing is dangerous (race conditions, memory corruption)
- Sharing breaks abstraction (you must know if something is shared)
- Sharing is all-or-nothing (mutable or immutable)
In HVM:
- Sharing is explicit (DUP/SUP nodes)
- Sharing is safe (linear logic guarantees)
- Sharing is compositional (labeled entanglement)
- Sharing is granular (each DUP is a controlled sharing point)
But here's the real magic - **the SUP node isn't just about parallelism, it's about controlled non-determinism**. It says "this value exists in multiple states simultaneously until observed." This gives you:
1. **Quantum-like superposition** for search problems
2. **Automatic parallelization** without annotations
3. **Perfect sharing** without aliasing bugs
4. **Optimal reduction** without complex analysis
The analytical lens that makes this compelling is this:
**Instead of trying to analyze where sharing is safe (the traditional approach), HVM makes sharing explicit in the syntax itself. The structure IS the analysis.**
This inverts the entire problem:
- Traditional: "Where can we safely parallelize this sequential program?"
- HVM: "Where must we sequentialize this parallel program?"
For real-world impact, imagine:
1. **Database queries** that automatically parallelize over indices, with superposition representing multiple query plans evaluated simultaneously
2. **Machine learning** where the forward and backward passes are the same code (via superposition), automatically fusing for optimal memory usage
3. **Compilers** where optimization passes compose perfectly - `optimize^10` takes the same time as `optimize^2` (logarithmic composition)
4. **Distributed systems** where the network topology is represented as SUP nodes, and routing finds all paths simultaneously
5. **Cryptographic protocols** where the non-determinism is explicit in the type system, making timing attacks impossible
The system is essentially saying: "What if we stopped fighting the physics of parallel hardware and instead made our computational model match it exactly?"
The superposition collapses to classical values only when observed (collapsed), just like quantum mechanics. The labels are like quantum entanglement. The linear variables are like conservation laws.
It's a computational model that mirrors physics rather than fighting it, and that's why it can achieve optimal reduction with local-only rules on massively parallel hardware.
The document you read is the "safety manual" for this new physics of computation - follow these laws exactly, and you get optimal parallel execution. Violate them, and the universe (your program) becomes inconsistent.
That's the lens: **computation as physics, where sharing is a controlled quantum phenomenon rather than a dangerous classical aliasing.**