@dims
dims / dataplane-pluggability.md
Last active June 28, 2026 13:37
Agent Substrate — Dataplane Pluggability (design doc, proposed)

Dataplane Pluggability

Status: Proposed, for review. This is an RFC, a position proposal. It is not an approvable architecture decision record, and it is not a contract. Several questions below, including who owns capacity, the lifecycle shape, and how providers are discovered, can still change what the chosen boundary means in practice. The contract sketches in it are illustrative, and a number of questions are still open (see Open questions). The target design is not built yet. What exists on main today is described in The dataplane today and [Current

@dims
dims / 2026-06-25-microvm-pr287-runbook.md
Last active June 25, 2026 16:05
Agent Substrate micro-VM (PR #287) runbook on a kind cluster: counter-microvm + OpenShell helpdesk-microvm demos, validated end-to-end (suspend/resume of cloud-hypervisor VM snapshots).

Micro-VM (substrate PR #287) on bigbox — runbook

Stand up the agent-substrate micro-VM runtime (cloud-hypervisor + kata-agent, PR #287) on a kind cluster on bigbox and run two demos, then tear it down:

  • Part A — counter-microvm: in-RAM counter → suspend (VM memory snapshot) → resume on another worker → count continues. The runtime's own demo; pure substrate.
  • Part B — helpdesk-microvm: the OpenShell helpdesk agent running as a micro-VM actor → /status + /chat (real gpt-oss:20b-cloud completions via Ollama Cloud) → suspend → resume → state continues. Proves a real ~3 GB agent workload boots, does name-based egress, snapshots, and
@dims
dims / criu-checkpoint-restore-design.md
Created June 10, 2026 17:32
Kata Containers CRIU checkpoint/restore — design + intern test guide (PoC: dims/kata-containers criu-cr-containerd)

CRIU Checkpoint / Restore for Kata Containers

Status: prototype against Kata 3.31.0 (runtime-rs). Validated end-to-end (via shim-ctl): a counter's in-memory state survives 3 checkpoint/restore cycles (monotonic, no reset), a live TCP LISTEN socket survives, a ~10 MB memory buffer survives, and a checkpoint restores in a fresh microVM (migration-style). Engine-driven: both ctr (containerd native) and crictl (CRI, no kubelet) do the full checkpoint→restore cycle with the counter surviving; crictl restore needs containerd ≥ 2.3.0. See Proof of concept for the branches.

Motivation

@dims
dims / 2026-06-03-substrate-cross-vendor-contributors.md
Created June 4, 2026 13:35
Agent Substrate (agent-substrate/substrate) — Cross-Vendor Contributor & Affiliation Report (2026-06-03)

Agent Substrate — Cross-Vendor Contributor & Affiliation Report

  • Generated: 2026-06-03
  • Repo: agent-substrate/substrate, "Agent Substrate: the core system" (public, 468★, Apache-2.0)
  • What it is: a system on top of Kubernetes that manages agent-like workloads at higher scale/lower latency by taking the K8s control-plane out of the critical path. Actors run in gVisor sandboxes (ateom), managed by a kubelet-like agent (atelet), with GCS checkpoint/restore (ategcs) and a router (atenet).
  • Window: 2026-05-13 → 2026-06-03 (~3 weeks; a brand-new seed project)
  • Volume analyzed: 95 commits · 117 PRs (all states) · 63 issues (all states)
  • Analysis basis: upstream agent-substrate/substrate@main (e26cfa22), cloned fresh; the local dims/substrate fork checkout (4cbac18) was a few commits behind.

Framing: Unlike the NVIDIA-owned reports (nvsentinel / dra-driver / aicr / OpenShell), this repo is not NVIDIA-owned — it's a

@dims
dims / external-contributor-report.md
Created June 3, 2026 14:40
OpenShell (NVIDIA/OpenShell) — External Contributor & DCO-Hygiene Report (2026-05-26)

External Contributor & DCO-Hygiene Report — nvidia/OpenShell

  • Generated: 2026-05-26
  • Repository: nvidia/OpenShell (working copy: /Users/dsrinivas/go/src/github.com/nvidia/OpenShell)
  • Total commits analyzed (full main history): 754
  • Total unique commit-author emails: 58
  • Total unique GitHub handles (resolved): 51 (excluding bots)

Methodology summary

@dims
dims / 2026-06-03-aicr-external-contributors.md
Created June 3, 2026 14:30
aicr (NVIDIA/aicr) — External Contributor & DCO-Hygiene Report (2026-06-03)

aicr — External Contributor & DCO-Hygiene Report

  • Generated: 2026-06-03
  • Repo: NVIDIA/aicr, "Tooling for optimized, validated, and reproducible GPU-accelerated AI runtime in Kubernetes" (323★)
  • History analyzed: 2026-01-30 → 2026-06-03 (~4 months), main @ f65d7b0
  • Total commits analyzed: 1,205 (44 unique author emails → 35 distinct GitHub handles + 3 bots)
  • Analysis basis: working copy is the dims/aicr2 fork; its main HEAD (f65d7b0eddcda…) is identical to upstream NVIDIA/aicr@main, so the local history faithfully represents upstream.

Methodology: Extracted every commit author via git log (email, name, date, and Signed-off-by trailer via %(trailers)) → resolved each email to a GitHub login through the upstream commit API (GET /repos/NVIDIA/aicr/commits/{sha}.author.login) → classified each handle by (1) Helios LDAP match, (2) @nvidia.com commit email, (3) NVIDIA GitHub-org membership (`GET /orgs/NVIDIA/member
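
The DCO-hygiene part of this methodology (comparing each commit's author email with its Signed-off-by trailers) could be sketched roughly as below. This is an illustration only, not the report's actual tooling. It assumes a hypothetical one-line-per-commit layout, `author_email|author_name|date|signoffs`, such as a `git log --format` string built on `%ae`, `%an`, `%aI` and `%(trailers)` might produce, with multiple sign-offs joined by `;`. The names `Commit`, `parse_commit`, and `dco_status` are invented for this sketch.

```python
import re
from dataclasses import dataclass

# Assumed (hypothetical) record layout, one commit per line:
#   author_email|author_name|date|signoffs
# "signoffs" holds zero or more "Name <email>" values joined by ";".
# Author names containing "|" would break this simple split.
FIELD_SEP = "|"
_EMAIL_RE = re.compile(r"<([^<>\s]+@[^<>\s]+)>")


@dataclass(frozen=True)
class Commit:
    email: str
    name: str
    date: str
    signoff_emails: tuple


def parse_commit(line: str) -> Commit:
    """Split one record into fields and pull the emails out of the sign-offs."""
    email, name, date, signoffs = line.rstrip("\n").split(FIELD_SEP, 3)
    emails = tuple(e.lower() for e in _EMAIL_RE.findall(signoffs))
    return Commit(email.lower(), name, date, emails)


def dco_status(commit: Commit) -> str:
    """Return 'missing', 'ok', or 'mismatch' for a commit's sign-off."""
    if not commit.signoff_emails:
        return "missing"   # no Signed-off-by trailer at all
    if commit.email in commit.signoff_emails:
        return "ok"        # author signed off with the same email
    return "mismatch"      # signed off, but not under the author email


if __name__ == "__main__":
    sample = "dev@example.com|Dev One|2026-06-01T10:00:00+00:00|Dev One <dev@example.com>"
    print(dco_status(parse_commit(sample)))
```

A "mismatch" result is only a flag for manual review: a contributor may legitimately sign off under a different address than the one recorded as commit author.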

@dims
dims / 1-2026-05-29-firecracker-ateom-poc-bigbox.md
Created May 29, 2026 19:18
Agent Substrate — pluggable ateom backend: Firecracker (microVM). [1] PoC on bigbox, [2] design proposal, [3] implementation log.

Firecracker ateom Backend — Working PoC on bigbox (counter demo)

Update (2026-05-29): this standalone PoC has since been turned into a full in-repo implementation (Phases 0–3) and a cluster e2e — a counter actor on a Firecracker worker driven through the real control plane (ate-api-server + atenet), state preserved across suspend/resume, on the existing kind cluster. Branch firecracker-backend (pushed to dims/substrate, commit bc533f5; worktree ~/go/src/github.com/agent-substrate/substrate-firecracker). Full journal: ~/notes/agent-substrate/2026-05-29-firecracker-backend-implementation-log.md. The PoC notes below are retained for the from-scratch microVM bring-up details (rootfs build, Firecracker API sequence, gotchas).

  • Date: 2026-05-29 · Host: bigbox (Ubuntu 24.04, AMD EPYC 7763, nested KVM) · Firecracker: v1.15.1 · Guest kernel: vmlinux-6.1.128
  • Goal: prove a Firecracker backend can satisfy substrate's ateom Run/Checkpoint/Restore contract, preserving
@dims
dims / host-managed-imex-design-v2.md
Last active May 29, 2026 17:04
Host-managed IMEX v2 design and operator guide

Design v2: Host-Managed IMEX, Minimal Alpha

| Field | Value |
|---|---|
| Status | Implementable minimal alpha |
| Feature gate | HostManagedIMEX |
| Scope | Install-wide, not per-ComputeDomain |
| Primary goal | Stop launching per-ComputeDomain IMEX DaemonSets when the host already runs nvidia-imex |
| Primary non-goal | Per-ComputeDomain channel isolation across an IMEX fabric |
# set PATH and check if cluster is present (all terminals)
export PATH="$HOME/go/bin:$PATH"
kubectl version
# ============================================================
# Terminal A — keep this running, watches and port-forwards.
# ============================================================
kubectl port-forward -n ate-system svc/atenet-router 8000:80 &
kubectl port-forward -n ate-openshell-m0 svc/openshell-gateway-substrate 50051:50051 &
@dims
dims / 2026-05-11-dra-driver-nvidia-gpu-external-contributors.md
Last active May 11, 2026 18:20
dra-driver-nvidia-gpu — External Contributor Report (2026-05-11)

dra-driver-nvidia-gpu — External Contributor Report

  • Generated: 2026-05-11 (rev. 2, Helios cross-check added)
  • Repo: kubernetes-sigs/dra-driver-nvidia-gpu
  • Repo history: 2022-07-14 → 2026-05-11 (~3.8 years)
  • Total commits analyzed: 1,853 (47 unique author emails)
  • Methodology: Extracted all unique commit authors via git log → classified by email domain (@nvidia.com = NVIDIA, all others = candidates) → mapped commits to GitHub logins via GET /repos/.../commits/{sha} → verified every candidate against GET /orgs/NVIDIA/members/{username} (HTTP 204 = confirmed member, 404 = not a member) → for ambiguous cases, additionally cross-referenced against NVIDIA Helios LDAP (helios-cli user search) to detect NVIDIA employees who contribute via personal GitHub accounts not registered in the NVIDIA org → cross-referenced GitHub profiles, DCO Signed-off-by trailers, LinkedIn, and corporate-email patterns → folded NVIDIA-personal-e
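
The first two classification steps named in the methodology (email domain, then the org-membership status code) can be illustrated with a small pure function. This is a sketch under assumptions, not the report's script. It takes the HTTP status already returned by GET /orgs/NVIDIA/members/{username} as an input and makes no network calls. The names `Affiliation` and `classify` are hypothetical, and the later Helios and manual cross-checks are out of scope.

```python
from enum import Enum
from typing import Optional


class Affiliation(Enum):
    NVIDIA = "nvidia"
    EXTERNAL_CANDIDATE = "external-candidate"  # still needs the later cross-checks
    UNKNOWN = "unknown"                        # status missing or inconclusive


def classify(email: str, member_status: Optional[int]) -> Affiliation:
    """Classify a commit author from the email domain and org-membership status.

    member_status is the HTTP status of the membership lookup
    (204 = confirmed member, 404 = not a member), or None if no lookup was done.
    """
    if email.lower().endswith("@nvidia.com"):
        return Affiliation.NVIDIA
    if member_status == 204:
        return Affiliation.NVIDIA
    if member_status == 404:
        return Affiliation.EXTERNAL_CANDIDATE
    # Any other status (redirect, rate limit, error) is treated as inconclusive.
    return Affiliation.UNKNOWN


if __name__ == "__main__":
    print(classify("someone@example.org", 404).value)
```

Treating every status other than 204 and 404 as inconclusive keeps a failed or redirected lookup from being silently counted as "external".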