- Treat simulators as the object of study, not the tool. Mech interp methodology applied to mechanistic simulators is almost completely unexplored, and it's a natural fit. Each simulator (CorticalSim, Cytosim, Tubulaton) makes different assumptions and produces different array behaviors. The mech interp move: don't just compare their outputs to data — intervene inside them. Ablate the collision rule and see what global statistic changes. Patch the dynamic instability module from Tubulaton into Cytosim and see if it rescues a phenotype. Treat each model as a circuit and ask which subcircuits are doing the work. This is exactly activation patching, just on a hand-built mechanistic model instead of a learned one. I don't think the plant MT community has framed it this way and the framing alone would be useful.
- Learned surrogates of the simulators. Cytosim takes hours to run; you can't easily do MCMC over its parameters or run gradient-based optimization. Train a neural surrogate that maps (parameters, initial conditions) → (array statistics). Suddenly you can do Bayesian inference, sensitivity analysis, and design of experiments at the speed of inference rather than simulation. This is a standard move in scientific ML but the plant MT field hasn't really adopted it. If you build the surrogate well, you can then do mech interp on the surrogate — find which input dimensions matter, what internal representations the network learns about the array, whether it discovers the same structure the simulators were built to express. That's a genuinely novel research direction.
- SAEs / probing on simulation trajectories. A long microtubule simulation produces a high-dimensional trajectory. Train a sparse autoencoder on the state at each timestep. What features does it find? Probably things like "local nematic order," "defect density," "regions of active zipping." These are exactly the order parameters the field cares about, but currently they're hand-designed. Letting unsupervised methods discover them and then comparing to the hand-designed ones is the kind of project that would interest both communities.
- Inference from microscopy data. The biggest open methodological problem in the field: given a movie of MTs in a real cell, infer what mechanistic rules generated it. This is a simulation-based inference problem (SBI, ABC, neural posterior estimation). The ML community has tooling for this — sbi library, normalizing flows, etc. — that the plant MT community is largely not using. Bringing modern SBI to bear on the "which model explains this microscopy data" question is a clean, high-value contribution.
- The AI safety / interpretability connection that's actually deep. Self-organization in MT arrays is mathematically the same family of problem as: how do emergent capabilities arise in trained networks? How do mesa-objectives form? You have a system of many local interactions producing global structure, and you want to predict and control the global structure from the local rules. The plant MT case has the enormous advantage that the local rules are known (it's a designed simulator), so it's a sandbox for developing interp methods that you then carry back to neural nets where local rules are emergent. I would actually pitch this to your AI safety colleagues. Cortical MTs are a beautiful "interpretability easy mode" testbed: nontrivial emergent organization, known generating rules, rich data, fast iteration. If your methods can't recover the mechanism here, they probably won't on a transformer either.
How to actually engage with this community
Concretely, this week (since you're at the workshop):
- Find the two or three people working on inference or model comparison and have coffee. The names to look for are anyone presenting on parameter fitting, anyone comparing simulators, anyone working with microscopy data quantitatively. Tell them you do mech interp, explain the analogy in one sentence, and ask what their hardest inference problem is. They will tell you, and it will probably be something you have tooling for.
- Don't lead with "I do AI safety." That phrase means different things to different people and you'll spend the conversation explaining instead of collaborating. Lead with "I work on understanding the internals of trained neural networks, and there's a methodological overlap with what you're doing." Then if it's relevant, mention safety.
- Offer something specific and small. Not "let's collaborate"; that goes nowhere. "I could spend a week training a surrogate on your Cytosim outputs and seeing if it recovers your order parameters" is concrete and useful. Or "send me 100 of your simulation trajectories and I'll run SAEs on them and tell you what features it finds." Tractable proof-of-concept first, then a real project.
- The right collaborator is probably a postdoc, not a PI. PIs have full plates and slow timelines. A postdoc who's stuck on an inference problem and would love a methods collaborator is your best bet for actually shipping something.
The longer-term pitch
If you wanted to make this a real research thread rather than a one-off contribution: there's a paper to be written titled something like "Mechanistic interpretability methods applied to biological self-organization" that takes one cortical MT simulator, treats it as the object of study, applies the full mech interp toolkit (activation patching, probing, SAEs, causal interventions), and shows which methods recover known mechanism and which don't. That paper would land well in both communities and would be a natural bridge piece for someone with your background. It would also be a real contribution to AI safety, not a detour from it, because validating interp methods on systems where ground truth is known is exactly what the field needs and is exactly what biological simulators offer.
So: you have more to offer them than you might think, and they have a cleaner testbed than most things you're working with. Worth pushing on.
Computational Modeling of Cytoskeletal Dynamics: A Guide to Cytosim and the Force-Based Mechanisms of Cortical Microtubule Self-Organization
Architecture and Physical Principles of the Cytosim Engine
Cytosim is an open-source, agent-based physical simulation suite designed to model the non-equilibrium self-organization of large assemblies of flexible filaments, such as microtubules and F-actin, alongside their associated proteins, crosslinkers, and molecular motors.1 Developed starting in 1995 and named in 1999 by François Nédélec and collaborators, the platform has been utilized in over one hundred peer-reviewed publications to study cytoskeletal dynamics in one, two, and three dimensions.1
The physical architecture of Cytosim is built upon a Brownian dynamics approach, integrating the collective Langevin equations of motion for flexible fibers suspended in an immobile, highly viscous cytoplasm.1 Because the Reynolds number at the macromolecular scale is exceptionally low ($Re \ll 1$), inertial forces are negligible, allowing the simulation core to omit the mass of the objects from the equations of motion.3
To capture the elasticity of filaments, Cytosim represents each fiber as a series of discrete model points separated by inextensible segments.3 The flexural rigidity of the fiber dictates its resistance to bending, which is calculated directly from the angles between adjacent segments.3 Rather than utilizing stiff longitudinal spring potentials to maintain segment lengths, which would require an excessively small integration time step, Cytosim employs a projection operator to project the equations of motion onto a constrained manifold, maintaining segment inextensibility with high precision.3 This constraint-based method is computationally superior to pairwise repulsive potentials, which suffer from numerical instability in explicit time-stepping schemes.7
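As a rough illustration of how bending resistance can be computed from the angles between adjacent segments, here is a minimal sketch using a common discretization of the worm-like-chain energy, E = (κ/ℓ) Σ (1 − cos θᵢ). This is a generic textbook form, not necessarily the expression Cytosim uses internally; the function name `bending_energy`, the 2D setting, and the equal-segment-length assumption are all choices made for this example.

```python
import math

def bending_energy(points, kappa):
    """Discrete bending energy of a 2D polyline (generic sketch, not Cytosim code).

    points: list of (x, y) model points, assumed equally spaced.
    kappa:  flexural rigidity, e.g. in pN*um^2 if points are in um.
    Returns (kappa / seg_len) * sum(1 - cos(theta_i)) over interior points,
    where theta_i is the angle between consecutive segments.
    """
    segs = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(points, points[1:])]
    seg_len = math.hypot(*segs[0])
    total = 0.0
    for (ax, ay), (bx, by) in zip(segs, segs[1:]):
        cos_theta = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
        total += 1.0 - cos_theta
    return kappa * total / seg_len

straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
kinked = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
print(bending_energy(straight, kappa=20.0))  # straight fiber: no bending cost
print(bending_energy(kinked, kappa=20.0))    # right-angle kink: positive cost
```

A straight fiber costs nothing and any kink costs energy in proportion to κ, which is the qualitative behavior the paragraph above describes.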
The temporal evolution of the coordinates of all model points is computed at each discrete time step $\tau$ using an implicit numerical integration scheme, solving the linearized system 3:
$$\left(I - \tau\,P\mu A\right)\left(x_{t+\tau} - x_t\right) = \tau\,P\mu\left(A\,x_t + G\right) + \delta B_t$$
In this governing equation, $x_t$ represents the coordinate vector of the model points at time $t$, $P$ is the projection matrix enforcing the inextensibility constraints, $\mu$ is the viscous mobility (the inverse of the drag coefficient), $A$ is the stiffness matrix associated with bending elasticity and mechanical links, $G$ represents constant external forces, and $\delta B_t$ denotes the random displacements arising from thermal collisions (Brownian motion).3 The stochastic thermal term is calibrated to satisfy the fluctuation-dissipation theorem, utilizing the thermal energy scale $k_B T$.3 Because the matrix $A$ is highly sparse, Cytosim solves this linear system rapidly using iterative conjugate-gradient-like solvers, enabling the simulation of tens of thousands of filaments and millions of associated proteins on standard CPU architectures.1
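To see why an implicit scheme is attractive for stiff overdamped dynamics, the sketch below compares backward (implicit) and forward (explicit) Euler on a scalar toy problem, dx = μ(a·x + g)·dt. It is an assumption-laden simplification: one degree of freedom, no projection matrix, and no thermal noise by default. All function names are hypothetical and nothing here is taken from the Cytosim source.

```python
def implicit_step(x, h, mu, a, g, noise=0.0):
    """One backward-Euler step of the scalar toy dx = mu*(a*x + g)*dt + noise.

    Solves (1 - h*mu*a) * dx = h*mu*(a*x + g) + noise for dx.
    """
    dx = (h * mu * (a * x + g) + noise) / (1.0 - h * mu * a)
    return x + dx

def explicit_step(x, h, mu, a, g, noise=0.0):
    """One forward-Euler step of the same toy equation."""
    return x + h * mu * (a * x + g) + noise

def run(step, x0, n_steps, h, mu, a, g):
    """Apply `step` n_steps times starting from x0 and return the final x."""
    x = x0
    for _ in range(n_steps):
        x = step(x, h, mu, a, g)
    return x

# Stiff spring: a = -1000 with h = 0.01 gives h*mu*|a| = 10, far beyond the
# explicit stability limit of 2, so forward Euler diverges.
x_imp = run(implicit_step, 1.0, 50, h=0.01, mu=1.0, a=-1000.0, g=0.0)
x_exp = run(explicit_step, 1.0, 50, h=0.01, mu=1.0, a=-1000.0, g=0.0)
print(f"implicit: {x_imp:.3e}   explicit: {x_exp:.3e}")
```

With the same large time step, the implicit update relaxes to equilibrium while the explicit one blows up. In the full multi-fiber problem the scalar division becomes a sparse linear solve.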
Operational Workflows, Compilation, and Directory Structures
Operating Cytosim requires navigating a highly modular C++ compilation process and a command-line driven execution and analysis pipeline.1 The software does not require dedicated graphic processing hardware, running on standard CPUs under UNIX, macOS, GNU/Linux, and Windows (via Cygwin or the Windows Subsystem for Linux).1
Software Dependencies and Environment Setup
The simulation core and visualization tools rely on a standard set of mathematical and graphical libraries, outlined in the table below:
| Library Name | Architectural Role in Cytosim | Operating System Packages |
|---|---|---|
| BLAS | Optimized vector-matrix arithmetic for implicit integration solver 1 | libblas-dev (Linux) / Built-in Apple Accelerate (macOS) 1 |
| LAPACK | Dense linear algebra and matrix decomposition operations 1 | liblapack-dev (Linux) / Apple Accelerate (macOS) 1 |
| Ncurses | Terminal screen handling and text-based interactive menus 1 | Standard system library 1 |
| pthreads | POSIX-compliant multi-threading for parallel CPU execution 1 | Built-in system compiler package 1 |
| OpenGL | Hardware-accelerated 2D and 3D graphical rendering of trajectories 1 | mesa-common-dev (Linux) / Native framework (macOS) 1 |
| GLUT / freeGLUT | Windowing and input event handling for interactive display 1 | freeglut3-dev (Linux) / Native framework (macOS) 1 |
| GLEW | OpenGL Extension Wrangler for modern graphics pipeline support 10 | libglew-dev (Linux) 10 |
| Git | Version control for downloading and updating source code 10 | git 10 |
While CorticalSim is designed for computational speed—allowing rapid scanning of the parameter space to establish phase diagrams of network alignment—it relies on hard-coded, phenomenological collision rules.16 For example, collisions occurring at angles below an arbitrary threshold are programmed to result in zippering, whereas steeper collisions result in a split probability between crossover and catastrophe.17
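A phenomenological collision rule of the kind described above can be written in a few lines. The sketch below is illustrative only and is not CorticalSim's code: the 40° default mirrors the zippering threshold used in the benchmark later in this document, while the function name, the folding of angles onto the acute range, and the 0.5 catastrophe probability are assumptions.

```python
import random

def collision_outcome(angle_deg, rng, zipper_threshold_deg=40.0, p_catastrophe=0.5):
    """Return 'zipper', 'catastrophe', or 'crossover' for one MT-MT collision.

    angle_deg is folded to the acute encounter angle in [0, 90] (an assumption).
    Below the threshold the incoming MT zippers; otherwise it undergoes a
    catastrophe with probability p_catastrophe and crosses over if not.
    """
    a = abs(angle_deg) % 180.0
    if a > 90.0:
        a = 180.0 - a
    if a < zipper_threshold_deg:
        return "zipper"
    return "catastrophe" if rng.random() < p_catastrophe else "crossover"

rng = random.Random(0)
counts = {"zipper": 0, "catastrophe": 0, "crossover": 0}
for _ in range(10_000):
    counts[collision_outcome(rng.uniform(0.0, 180.0), rng)] += 1
print(counts)
```

Because the outcome depends only on the local angle and a random draw, each collision is cheap to evaluate, which is where the speed advantage comes from. The cost is that the threshold and probabilities are inputs rather than consequences of the mechanics.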
Cytosim, conversely, simulates the complete physical mechanics.16 Instead of prescribing collision outcomes based on local angles, Cytosim calculates the balance of elastic bending energy, steric repulsion, and motor- or anchor-mediated forces in three dimensions.16 This allows the software to model phenomena that cannot be captured in 2D simplifications, such as out-of-plane buckling, twist, and spatial segregation in complex volumetric geometries.6
Advanced Physical Phenomena and Future Biophysical Directions
The utility of Cytosim extends beyond basic cytoskeletal simulations to include multi-physics phenomena and structural mechanics at the single-molecule level.34
Mechanical Stress and the Microtubule Breakage Threshold
While microtubules are highly rigid, they can undergo dramatic bending in vivo.34 Understanding how much force a microtubule can sustain before breaking remains a fundamental question in biophysics.34 Fully atomistic molecular dynamics simulations suggest that a microtubule can sustain up to of tensile stress, which corresponds to a longitudinal stretching force of approximately
.34 However, experimental studies under physiological conditions indicate that microtubules break at much lower forces.34
To address this, Cytosim has been used to model the interaction between multi-motor clusters and microtubule networks, integrating experimental measurements with mechanical simulations.34 For instance, the overexpression or phase separation of the kinesin-3 motor KIF1C results in dense multi-kinesin clusters on cargo surfaces.34 When these clusters engage with neighboring microtubules, the walk-away behavior of multiple processive motor domains exerts opposing tensile forces.34
Kinesin Motor Domain Walks Right --->
=========================================== [Microtubule]
<--- Kinesin Motor Domain Walks Left
│
▼
Tensile Stress Buildup (70-120 pN)
By testing a range of mechanical thresholds in Cytosim, researchers estimated the rupture force of microtubules in living cells to be between 70 and 120 pN.34 This is orders of magnitude lower than in vitro estimates for taxol-stabilized microtubules, suggesting that microtubules in vivo are highly sensitive to localized tensile stress, whereas compressive forces are easily dissipated by bending and buckling.34
Bilayer-Mediated Nematic Transitions
Another expanding research front is the coupling of cytoskeletal networks with fluid lipid membranes.35 When motor-propelled microtubules are bound to supported lipid bilayers (SLBs) containing diffusing anchors (such as DGS-NTA lipids binding histidine-tagged kinesins), the motors can diffuse laterally with a diffusion constant of .40
This lateral mobility introduces a physical feedback loop: as microtubules glide and collide, the fluid membrane allows the motors to pool and reorganize.40 This bilayer-mediated restructuring significantly suppresses crossing events, promoting active bundling even in the absence of chemical depletants like PEG.40 As a result, the system transitions from a disordered, isotropic state to an active nematic state characterized by dense, parallel lanes of microtubules traveling across the membrane.40 Modeling these dynamic fluid-structure interfaces in Cytosim is key to understanding the mechanics of cell division, nuclear positioning, and the design of self-organizing synthetic cells.35
Conclusions and Recommendations for Computational Modeling
The physical modeling of cytoskeletal systems demonstrates that complex, cell-scale organization can emerge from local, mechanical feedback loops operating at the single-filament level.26 François Nédélec's and Maud Formanek's force-based model of cortical microtubule organization replaces hard-coded, phenomenological collision rules with a continuous physical competition between flexural rigidity, polymerization force, and anchor compliance.16 This mechanical framework explains how plant cells integrate geometric and mechanical cues to guide morphogenesis.26
For biophysical researchers developing simulations of cytoskeletal assemblies, the choice of modeling framework should be dictated by the specific physical questions being addressed.16 For large-scale qualitative sweeps of phase space, 2D event-driven tools like CorticalSim remain highly effective due to their low computational cost.16 However, when investigating the mechanical feedback of physical confinement, the effects of membrane anchoring density, or the structural consequences of external mechanical stress, a 3D, force-based framework like Cytosim is necessary.16
Works cited
Cytosim - Francois Nedelec - GitLab, accessed on May 20, 2026, https://gitlab.com/f-nedelec/cytosim
simularium/Cytosim: A fork of Cytosim, a cytoskeleton simulation engine (from https://gitlab.com/f-nedelec/cytosim) · GitHub - GitHub, accessed on May 20, 2026, https://github.com/simularium/Cytosim
(PDF) Collective Langevin Dynamics of Flexible Cytoskeletal Fibers - ResearchGate, accessed on May 20, 2026, https://www.researchgate.net/publication/24167863_Collective_Langevin_Dynamics_of_Flexible_Cytoskeletal_Fibers
Geometrical and Mechanical Properties Control Actin Filament Organization - PMC, accessed on May 20, 2026, https://pmc.ncbi.nlm.nih.gov/articles/PMC4446331/
Collective Langevin Dynamics of Flexible Cytoskeletal Fibers - arXiv, accessed on May 20, 2026, https://arxiv.org/pdf/0903.5178
Effects of spatial dimensionality and steric interactions on microtubule-motor self-organization - PMC, accessed on May 20, 2026, https://pmc.ncbi.nlm.nih.gov/articles/PMC7655122/
Toward the cellular-scale simulation of motor-driven cytoskeletal assemblies | eLife, accessed on May 20, 2026, https://elifesciences.org/articles/74160
Centrosome centering and decentering by microtubule network rearrangement - PMC - NIH, accessed on May 20, 2026, https://pmc.ncbi.nlm.nih.gov/articles/PMC5025270/
Nédélec Group | Sainsbury Laboratory - University of Cambridge, accessed on May 20, 2026, https://www.slcu.cam.ac.uk/research/nedelec-group
(PDF) A typical workflow to simulate cytoskeletal systems with Cytosim - ResearchGate, accessed on May 20, 2026, https://www.researchgate.net/publication/360936235_A_typical_workflow_to_simulate_cytoskeletal_systems_with_Cytosim
cytosim for 3D actomyosin simulations - AKBG, accessed on May 20, 2026, https://akbg.uni-goettingen.de/software_doku/ExternalDocs/cytosim.html
doc/main/faq.md · main · Francois Nedelec / Cytosim - GitLab, accessed on May 20, 2026, https://gitlab.com/f-nedelec/cytosim/-/blob/main/doc/main/faq.md
Cytosim is a cytoskeleton simulation engine - GitHub, accessed on May 20, 2026, https://github.com/ykjawale/cytosim
Viscoelasticity Analysis of Coarse-grained Cytoskeletal Simulations with Cytosim and Cytocalc | bioRxiv, accessed on May 20, 2026, https://www.biorxiv.org/content/10.1101/2025.10.30.685558v1.full-text
doc · master · Francois Nedelec / Cytosim-2023 - GitLab, accessed on May 20, 2026, https://gitlab.com/f.nedelec/cytosim/-/tree/master/doc
Microtubule simulations in plant biology: A field coming to maturity - WUR eDepot, accessed on May 20, 2026, https://edepot.wur.nl/662582
Angle dependent outcomes of microtubule collisions. Graphical depiction... - ResearchGate, accessed on May 20, 2026, https://www.researchgate.net/figure/Angle-dependent-outcomes-of-microtubule-collisions-Graphical-depiction-of-the_fig10_262300295
A theory of microtubule catastrophes and their regulation - PMC - NIH, accessed on May 20, 2026, https://pmc.ncbi.nlm.nih.gov/articles/PMC2795527/
Force- and length-dependent catastrophe activities explain interphase microtubule organization in fission yeast - PMC, accessed on May 20, 2026, https://pmc.ncbi.nlm.nih.gov/articles/PMC2671915/
microtubule pushing forces contribute to robust spindle orientation in regular and irregular cell shapes - bioRxiv, accessed on May 20, 2026, https://www.biorxiv.org/content/10.1101/2025.09.22.677921v1.full.pdf
On the large-scale simulation of multibody nonlocal dynamics - Digital Repository, accessed on May 20, 2026, https://d.lib.msu.edu/etd/52658
Efficient event-driven simulations shed new light on microtubule organization in the plant cortical array - Frontiers, accessed on May 20, 2026, https://www.frontiersin.org/journals/physics/articles/10.3389/fphy.2014.00019/full
A Mechanochemical Model Explains Interactions between Cortical Microtubules in Plants, accessed on May 20, 2026, https://pmc.ncbi.nlm.nih.gov/articles/PMC2920726/
Computer simulation and mathematical models of the noncentrosomal plant cortical microtubule cytoskeleton - Washington University in St. Louis Scholarly Repository, accessed on May 20, 2026, https://openscholarship.wustl.edu/cgi/viewcontent.cgi?article=1072&context=bio_facpubs
Probing stress-regulated ordering of the plant cortical microtubule array via a computational approach | bioRxiv, accessed on May 20, 2026, https://www.biorxiv.org/content/10.1101/2022.02.17.480928v1.full-text
(PDF) The self-organization of plant microtubules inside the cell volume yields their cortical localization, stable alignment, and sensitivity to external cues - ResearchGate, accessed on May 20, 2026, https://www.researchgate.net/publication/323296472_The_self-organization_of_plant_microtubules_inside_the_cell_volume_yields_their_cortical_localization_stable_alignment_and_sensitivity_to_external_cues
Microtubule-based nucleation results in a large sensitivity to cell geometry of the plant cortical array - PMC, accessed on May 20, 2026, https://pmc.ncbi.nlm.nih.gov/articles/PMC12425394/
Poster Removal - European Molecular Biology Laboratory (EMBL), accessed on May 20, 2026, https://www.embl.org/about/info/course-and-conference-office/wp-content/uploads/EES24-07-Poster-numbers-1.pdf
Directory - People Search - Cambridge Centre for Physical Biology |, accessed on May 20, 2026, https://www.physbiol.cam.ac.uk/directory/people-search/research-topic/biomechanics/research-topic/research-area/development/research-topic/plant-cells
Directory - People Search - Cambridge Centre for Physical Biology |, accessed on May 20, 2026, https://www.physbiol.cam.ac.uk/directory/people-search/research-topic/cytoskeleton
Microtubule-based nucleation results in a large sensitivity to cell geometry of the plant cortical array - bioRxiv, accessed on May 20, 2026, https://www.biorxiv.org/content/biorxiv/early/2025/05/19/2024.03.25.586463.full.pdf
Microtubule-based nucleation results in a large sensitivity to cell geometry of the plant cortical array | PLOS Computational Biology - Research journals, accessed on May 20, 2026, https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1013282
Machine learning segmentation tool trained on synthetic data for tracking cytoskeleton polymerisation and depolymerisation | bioRxiv, accessed on May 20, 2026, https://www.biorxiv.org/content/10.1101/2025.01.04.631322v1.full-text
Multi-kinesin clusters impart mechanical stress that reveals mechanisms of microtubule breakage in cells - PMC, accessed on May 20, 2026, https://pmc.ncbi.nlm.nih.gov/articles/PMC11838454/
Active Cytoskeletal Systems on Lipid Membranes - mediaTUM, accessed on May 20, 2026, https://mediatum.ub.tum.de/doc/1625312/document.pdf
Multi-kinesin clusters impart mechanical stress that reveals mechanisms of microtubule breakage in cells | Journal of Cell Biology | Rockefeller University Press, accessed on May 20, 2026, https://rupress.org/jcb/article/224/10/e202501070/278191/Multi-kinesin-clusters-impart-mechanical-stress
Multi-kinesin clusters impart mechanical stress that reveals mechanisms of microtubule breakage in cells - ResearchGate, accessed on May 20, 2026, https://www.researchgate.net/publication/394434794_Multi-kinesin_clusters_impart_mechanical_stress_that_reveals_mechanisms_of_microtubule_breakage_in_cells
Multi-kinesin clusters impart mechanical stress that reveals mechanisms of microtubule breakage in cells | bioRxiv, accessed on May 20, 2026, https://www.biorxiv.org/content/10.1101/2025.01.31.635950v1.full-text
(PDF) Multi-kinesin clusters impart mechanical stress that reveals mechanisms of microtubule breakage in cells - ResearchGate, accessed on May 20, 2026, https://www.researchgate.net/publication/388663159_Multi-kinesin_clusters_impart_mechanical_stress_that_reveals_mechanisms_of_microtubule_breakage_in_cells
Active nematic order and dynamic lane formation of microtubules driven by membrane-bound diffusing motors | PNAS, accessed on May 20, 2026, https://www.pnas.org/doi/10.1073/pnas.2117107118
Active nematic order and dynamic lane formation of microtubules driven by membrane-bound diffusing motors - ResearchGate, accessed on May 20, 2026, https://www.researchgate.net/publication/357233925_Active_nematic_order_and_dynamic_lane_formation_of_microtubules_driven_by_membrane-bound_diffusing_motors
A novel mechanism of microtubule length-dependent force to pull centrosomes toward the cell center - PMC, accessed on May 20, 2026, https://pmc.ncbi.nlm.nih.gov/articles/PMC3158624/
A matter of force - European Molecular Biology Laboratory (EMBL), accessed on May 20, 2026, https://www.embl.org/news/science/a-matter-of-force/
Plant MT Simulation Benchmark — What We Actually Agreed On
BIRS Workshop 26w5658 follow-up
Status: Draft for sign-off
Owner of this doc: [name]
Deadline for sign-off: 4 weeks from workshop close
Why This Document Exists
The workshop ended without a shared artifact. Two years of planning produced great conversations and no concrete deliverable beyond a planned special issue with no agreed content.
This document fixes that. It proposes the minimum thing the field needs to stop talking past itself: a shared benchmark that any simulator must pass before its results can be compared to another simulator's results.
This is not a research paper. This is a checkerboard. The point is not to be novel. The point is to be agreed upon.
If you sign this, you commit to:
- Running your simulator against the benchmark within 8 weeks
- Publishing your numbers in the QPB special issue regardless of whether you pass
- Treating the benchmark as a precondition for any future cross-simulator claim
What We Agree On (Sign at the Bottom)
Tier 1 — Isolated Microtubule Dynamic Instability
Setup: 200 non-interacting MTs in free space. BY-2 parameters (growth velocity 0.08 μm/s, shrink velocity 0.16 μm/s, catastrophe rate 0.003/s, rescue rate to be specified). 1800 second simulation. 5 seeds minimum.
Reported statistics:
- Length distribution at t=1800s
- Per-MT growth velocity (from event log, not 1Hz sampling)
- Per-MT shrink velocity (from event log)
- Per-MT catastrophe count
- Survival fraction
- Stutter frequency and duration (STADIA-style analysis, see Tier 1.5)
Pass criterion: All pairwise KS tests p > 0.05 on the four classical statistics.
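For orientation, here is a minimal sketch of what a Tier 1 comparison could look like: an event-driven two-state (growth/shrink) simulation of isolated MTs, followed by a two-sample KS test on final lengths from two seeds. The growth velocity, shrink velocity, and catastrophe rate come from the setup above; the rescue rate of 0.005/s is an assumed placeholder because Tier 1 leaves it unspecified. The function names, the rule that an MT vanishes when it shrinks to zero, and the asymptotic p-value approximation are also assumptions of this example. It stands in for neither simulator.

```python
import math
import random

V_GROW, V_SHRINK = 0.08, 0.16   # um/s, from the Tier 1 setup
K_CAT = 0.003                   # 1/s, from the Tier 1 setup
K_RES = 0.005                   # 1/s, ASSUMED placeholder (not fixed by Tier 1)
T_END = 1800.0                  # s

def simulate_mt(rng, t_end=T_END):
    """Return (final_length_um, n_catastrophes, survived) for one isolated MT."""
    t, length, growing, n_cat = 0.0, 0.0, True, 0
    while True:
        dt = rng.expovariate(K_CAT if growing else K_RES)
        if not growing:
            t_empty = length / V_SHRINK
            if t_empty <= dt and t + t_empty <= t_end:
                return 0.0, n_cat, False        # shrank to zero: MT is lost
        if t + dt >= t_end:
            dt = t_end - t                      # truncate the last interval
            length += V_GROW * dt if growing else -V_SHRINK * dt
            return length, n_cat, True
        length += V_GROW * dt if growing else -V_SHRINK * dt
        t += dt
        if growing:
            n_cat += 1                          # growth ended by a catastrophe
        growing = not growing

def ks_two_sample(a, b):
    """Two-sample KS statistic D and an asymptotic p-value approximation."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    n_eff = len(a) * len(b) / (len(a) + len(b))
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.04:                              # series converges slowly; p is ~1
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

def run_population(seed, n_mts=200):
    rng = random.Random(seed)
    return [simulate_mt(rng) for _ in range(n_mts)]

run_a, run_b = run_population(seed=1), run_population(seed=2)
d_stat, p_value = ks_two_sample([r[0] for r in run_a], [r[0] for r in run_b])
print(f"KS D = {d_stat:.3f}, p = {p_value:.3f}")
```

In a real benchmark run the two samples would come from two different simulators rather than two seeds of the same toy model, and a library routine such as SciPy's `ks_2samp` would be the more usual choice for the test itself.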
Tier 1.5 — Stutters (added based on Mahserejian et al. 2022): Report whether your simulator produces stutter behavior. If yes, report frequency and duration distributions. If no, state explicitly that your model is two-state and does not capture pre-catastrophe slowdown. This is not a pass/fail criterion yet — it is a disclosure requirement. We need to know which simulators can even ask the stutter question before we decide whether stutter agreement matters.
Tier 2 — Collective Behavior with Collision Rules
Setup: Cylindrical cell, 125 × 20 μm (BY-2-like). Same DI parameters as Tier 1. Three collision conditions:
| Condition | Description |
|---|---|
| `zipper_40` | Standard 40° zippering threshold, cross/catastrophe above |
| `always_catas` | All collisions → catastrophe |
| `no_interaction` | MTs pass through each other freely |
10 seeds per condition. 3 hours simulation time minimum.
Reported statistics (all three, jointly):
- S₂ nematic order parameter at t=final
- MT density at t=final (S₂ without density is meaningless — see workshop finding on branched nucleation confound)
- S₂ slope over last 30% of run
- Mean MT length
- Collision event breakdown (zip/cross/catas counts)
Pass criterion: Each simulator reports all five metrics for all three conditions. There is no pass/fail across simulators yet — Tier 2 is the diagnostic. The point is to see where engines diverge and why.
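Since simulators may compute S₂ differently, it helps to pin down one explicit definition. The sketch below uses a common length-weighted 2D form, S₂ = |⟨exp(2iθ)⟩|, together with a least-squares slope over the last 30% of a run. Both function names are hypothetical, and the choice of this definition (rather than, say, a tensor-based one on the cylinder surface) is an assumption that the benchmark would need to fix.

```python
import math

def nematic_order_2d(angles_rad, lengths=None):
    """Length-weighted 2D nematic order parameter, between 0 and 1.

    angles_rad: segment orientations; theta and theta + pi are equivalent.
    lengths:    optional weights (segment lengths); defaults to equal weights.
    """
    if lengths is None:
        lengths = [1.0] * len(angles_rad)
    total = sum(lengths)
    c = sum(w * math.cos(2.0 * a) for a, w in zip(angles_rad, lengths)) / total
    s = sum(w * math.sin(2.0 * a) for a, w in zip(angles_rad, lengths)) / total
    return math.hypot(c, s)

def tail_slope(times, values, tail_fraction=0.3):
    """Least-squares slope of values vs. times over the final tail_fraction."""
    t_cut = times[-1] - tail_fraction * (times[-1] - times[0])
    pts = [(t, v) for t, v in zip(times, values) if t >= t_cut]
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_v = sum(v for _, v in pts) / len(pts)
    num = sum((t - mean_t) * (v - mean_v) for t, v in pts)
    den = sum((t - mean_t) ** 2 for t, _ in pts)
    return num / den

aligned = [0.3, 0.3 + math.pi, 0.3, 0.3]        # same axis, mixed polarity
crossed = [0.0, math.pi / 2, 0.0, math.pi / 2]  # two perpendicular families
print(nematic_order_2d(aligned), nematic_order_2d(crossed))
```

A slope near zero over the tail indicates that S₂ has plateaued; a clearly nonzero slope means the reported final value is still drifting and should be read with that in mind.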
Tier 3 — Orientation-Dependent Catastrophe (ε Condition)
Setup: Same cylinder as Tier 2. Default zippering. Add orientation-dependent catastrophe rate per Tian et al. 2025: r_c(φ) = r_0 (1 + ε sin φ).
Sweep ε ∈ {0, 0.2, 0.4, 0.8}. 10 seeds each.
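The rate law and sweep above translate directly into code. In this sketch the baseline r_0 is set to the Tier 1 catastrophe rate of 0.003/s, and φ is taken in radians relative to whatever reference axis the benchmark ends up fixing. Both are assumptions, as is the function name.

```python
import math

R0 = 0.003                           # 1/s, Tier 1 catastrophe rate used as r_0 (assumed)
EPS_SWEEP = (0.0, 0.2, 0.4, 0.8)     # values from the sweep above

def catastrophe_rate(phi_rad, r0=R0, eps=0.0):
    """Orientation-dependent catastrophe rate r_c(phi) = r0 * (1 + eps * sin(phi))."""
    return r0 * (1.0 + eps * math.sin(phi_rad))

# Rate for MTs at phi = 0 versus phi = pi/2 for each epsilon.
for eps in EPS_SWEEP:
    r_par = catastrophe_rate(0.0, eps=eps)
    r_perp = catastrophe_rate(math.pi / 2, eps=eps)
    print(f"eps={eps:.1f}  r_c(0)={r_par:.4f}/s  r_c(pi/2)={r_perp:.4f}/s")
```

Writing the rule out this way makes one ambiguity visible: for φ between π and 2π the sine is negative, so the angle convention (full circle versus folded onto [0, π)) changes the result and should be stated by each simulator.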
Reported statistics: Same as Tier 2, plus:
- Angular distribution of MTs (binned at 5°)
- Spatial S₂ across cylinder bands (not just global S₂)
Pass criterion: Same as Tier 2 — disclosure, not validation. This tier exists because the ε mechanism is where engines are expected to disagree most.
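A fixed binning convention avoids small mismatches between simulators when reporting the angular distribution. The following sketch folds orientations onto [0°, 180°) and counts them in 5° bins. The folding convention and the function name are assumptions for illustration.

```python
def angular_histogram(angles_deg, bin_width_deg=5.0):
    """Count orientations in bins of bin_width_deg over [0, 180) degrees.

    Angles are folded modulo 180 because an MT axis at theta and theta + 180
    has the same orientation (polarity is ignored here by assumption).
    """
    n_bins = int(round(180.0 / bin_width_deg))
    counts = [0] * n_bins
    for a in angles_deg:
        folded = a % 180.0
        # min() guards against floating-point results that land exactly on 180.
        counts[min(int(folded // bin_width_deg), n_bins - 1)] += 1
    return counts

hist = angular_histogram([0.0, 4.9, 5.0, 185.0, -5.0])
print(len(hist), hist[0], hist[1], hist[-1])
```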
Tier 4 — Branched Nucleation Control
Setup: Tier 2 baseline + 2×2 factorial on (collision rule × nucleation mode):
| | Branched nucleation (accept_MT=0.24) | Isotropic nucleation |
|---|---|---|
| Baseline collisions | A | B |
| `no_interaction` | C | D |
10 seeds per cell. The gap (A − D) is the total ordering contribution. The gap (B − D) isolates collision-rule contribution. The gap (A − B) isolates nucleation contribution.
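The three gaps can be computed mechanically once per-seed S₂ values exist for each cell. In the sketch below, the dictionary layout and function name are hypothetical, and the numbers are made-up placeholders used only to exercise the code. They are not benchmark results.

```python
from statistics import mean

def ordering_contributions(s2_by_cell):
    """Compute the Tier 4 gaps from per-seed S2 values.

    s2_by_cell: dict with keys 'A', 'B', 'C', 'D' mapping to lists of S2 values
    (A: baseline + branched, B: baseline + isotropic,
     C: no_interaction + branched, D: no_interaction + isotropic).
    """
    m = {cell: mean(values) for cell, values in s2_by_cell.items()}
    return {
        "total (A - D)": m["A"] - m["D"],
        "collision (B - D)": m["B"] - m["D"],
        "nucleation (A - B)": m["A"] - m["B"],
    }

# Placeholder values for illustration only; NOT measured data.
example = {
    "A": [0.62, 0.58, 0.60],
    "B": [0.41, 0.39, 0.40],
    "C": [0.22, 0.18, 0.20],
    "D": [0.05, 0.04, 0.06],
}
gaps = ordering_contributions(example)
for name, value in gaps.items():
    print(f"{name}: {value:.3f}")
```

By construction the total gap equals the sum of the collision and nucleation gaps. Cell C does not enter these three gaps, but it still needs to be reported because it shows what branched nucleation does with collisions switched off.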
Why this matters: Workshop finding — S₂ as currently reported is a mixture signal of branched nucleation + collision-induced alignment. Without controlling for nucleation mode, "my simulator agrees with experiment on S₂" is not a meaningful claim.
What We Explicitly Do Not Agree On Yet
These are listed here so they don't get smuggled in later as if they were settled:
- Whether stutters are mechanistically important for plant MT arrays specifically
- Whether MT bending/elasticity needs to be in the model
- What the "correct" katanin severing rule is
- Whether realistic cell shapes change tier-3 conclusions
- Whether mechanical stress feedback to MTs is real or phenomenological
- The right persistence length value
Each of these gets its own benchmark when there's enough infrastructure to test it. Not before.
Deliverables
Per simulator, due 8 weeks from sign-off:
- Code: Public repo with the benchmark runner. Reproducible from `git clone && run.sh`.
- Data: Raw output for all tiers, all seeds, all conditions. Parquet or numpy preferred.
- Report: One markdown file per simulator, filling in the metrics tables.
- Honesty disclosure: What your simulator cannot do that the benchmark asks for. List the tiers you skipped and why.
These four artifacts together become the QPB special issue contribution. The special issue is not a place for papers about your favorite simulator. It is the proceedings of this benchmark.
Process Rules
- No paper chasing. If you want to add a new tier, propose it here. Don't write a parallel benchmark paper.
- Disagreements get resolved by running the benchmark, not by argument. If two simulators disagree on Tier 2, the discussion is "why does our code differ at this specific point," not "which is right."
- Failures are publishable. A simulator that fails Tier 1 is the most informative outcome possible. It means we found a real bug or a real physics disagreement. Either is publishable.
- One person owns each tier. Not a committee. Names below.
Tier Owners (Fill In)
| Tier | Owner | Backup |
|---|---|---|
| Tier 1 (DI) | | |
| Tier 1.5 (Stutters) | | |
| Tier 2 (Collisions) | | |
| Tier 3 (ε condition) | | |
| Tier 4 (Nucleation control) | | |
| Infrastructure (repo, runner, results aggregator) | | |
Signatures
By signing, you commit to running this benchmark on your simulator within 8 weeks and publishing the results in the QPB special issue.
| Simulator | Lead | Signature | Date |
|---|---|---|---|
Appendix A — What Already Exists
The following work was done at the workshop and around it. Some of it is partial; all of it is a starting point:
- Tier-1 reference run: Tim's sim and Cytosim, BY-2 parameters, seed 26058. Passed all four KS tests at p > 0.05. Data at `/home/vincent/bench/`.
- Tier-2 partial: CorticalSim 2/3 conditions done, Cytosim running, Tim's data on disk at `mt_phase1/`. Reproducible.
- Refactored zip_cat: Tim's collision logic split into `collision/decision.py` + `collision/geometry.py` + `collision/api.py`. 81 tests pass. Backward-compatible. Branch `refactor/split-zip-cat` on `bigsnarfdude/elastic-mt-refactor`.
- Branched nucleation confound: Demonstrated empirically. `accept_MT=0.24` contributes substantially to S₂ even when collision-based alignment is removed. See ablation results.
This is not "Vincent's work that you can ignore." This is the existing tier-1 calibration data. If your simulator's tier-1 numbers don't agree with this, that is the conversation to have.
Appendix B — Why We Need This (One Paragraph)
The workshop description called for a "Stanford Bunny" for plant MT simulation. We didn't produce one. This document is the minimum viable Stanford Bunny: an agreed isolated-MT test, an agreed collective-behavior test, and an honest disclosure of what each simulator can and cannot do. Without this, every cross-paper comparison in the field is unmoored. With this, we can argue about science instead of about whose code is right.
The MT Benchmark Nobody Built
- Three groups built three simulators for plant microtubule dynamics. Nobody had ever run them against the same parameters and checked if they agree.
- A one-day sprint compared all three on isolated MT dynamics. They agree. The DI physics layer is consistent across all three codebases.
- The no-interaction condition revealed that the collision rule isn't doing alignment — it's doing population control. Remove it and the array explodes.
- All three engines are built on the 2-state DL model. STADIA 2022 showed that a third state, the stutter, precedes roughly 80% of catastrophes (78% in simulation, 86% in experiment). None of the engines implements it.
- Cancer drug discovery targets MT dynamics. The models drugs are designed against are incomplete. The heavy lifting won't be done by individual labs.
The Setup
The BIRS workshop on plant microtubule modelling brought together the three groups who built the main simulation engines. The frustration in the room: thirty years of work, increasingly detailed biology, and nobody had ever just run the engines against the same parameters and compared outputs. No shared benchmark. No canonical test. No MNIST for microtubule simulators.
Three levels of abstraction. François built the physics. Eva and Tim built rules that approximate the physics. Each one is a bet that the level below it matters less than it looks like.
What We Ran
Tier-1: 200 isolated MTs, Dogterom-Leibler 2-state model, BY-2 plant cell parameters (v+=0.08 µm/s, v-=0.16 µm/s, kcat=0.003/s, kres=0.005/s), 1800s. No interactions. Do the engines agree on basic DI statistics?
Tier-2: Full cortical array on a 125×20 µm cylinder, same DI parameters, three collision conditions: 40° zipper rule, no interaction, always catastrophe. Where do the engines diverge?
What We Found
Every pairwise KS test passes at p>0.05. Growth and shrink velocities match spec to within 1% across all three codebases. Any future disagreement at tier-2 is collision handling, not the growth/shrink model.
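As a rough illustration of what a Tier-1 check involves, the sketch below runs an event-driven two-state simulation with the BY-2 parameters quoted above and compares two length distributions with a two-sample KS test. It is not any engine's code: the function names, the rule that an MT shrinking to zero is removed, and the use of a second seed as a stand-in for a second engine are all assumptions made for illustration. The p-value uses the asymptotic Kolmogorov series.

```python
import math
import random

# Tier-1 parameters quoted in the text (BY-2).
V_GROW, V_SHRINK = 0.08, 0.16   # µm/s
K_CAT, K_RES = 0.003, 0.005     # 1/s
T_END = 1800.0                  # s


def simulate_mt(rng, t_end=T_END):
    """Final length of one isolated two-state MT (0.0 if it shrank away)."""
    t, length, growing = 0.0, 0.0, True
    while t < t_end:
        rate = K_CAT if growing else K_RES
        dwell = min(rng.expovariate(rate), t_end - t)
        if growing:
            length += V_GROW * dwell
        elif V_SHRINK * dwell >= length:
            return 0.0  # assumption: a fully depolymerized MT is removed
        else:
            length -= V_SHRINK * dwell
        t += dwell
        growing = not growing
    return length


def ks_two_sample(a, b):
    """Two-sample KS statistic D and its asymptotic p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:   # step both empirical CDFs past x (handles ties)
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:                    # the series below is unreliable near zero
        return d, 1.0
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                 for k in range(1, 101))
    return d, min(1.0, max(0.0, 2.0 * series))


def run_engine(seed, n_mts=200):
    rng = random.Random(seed)
    return [simulate_mt(rng) for _ in range(n_mts)]


lengths_a = run_engine(26058)  # seed used in the reference run
lengths_b = run_engine(26059)  # hypothetical second seed standing in for another engine
d_stat, p_value = ks_two_sample(lengths_a, lengths_b)
print(f"KS D = {d_stat:.3f}, p = {p_value:.3f}")
```

In a real comparison `lengths_a` and `lengths_b` would come from two different codebases run on the same parameter file; the pass criterion in the text is p > 0.05.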
No-interaction under BY-2 parameters: population grew from 2,800 MTs at t=180s to 14,000 at t=1800s with no sign of stabilizing. BY-2 has rescue rate > catastrophe rate (kres=0.005/s vs kcat=0.003/s), so without collision-induced catastrophe the array reached no equilibrium within the 1800s run. The zipper rule's primary job isn't alignment, it's keeping the population bounded. Remove it and you don't get a disordered array. You get a population explosion.
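A quick way to sanity-check the no-interaction regime is the standard Dogterom-Leibler mean-drift expression for an isolated two-state MT, J = (v+·kres − v-·kcat)/(kcat + kres). The sketch below evaluates it for the Tier-1 parameters. The function names are illustrative, and this is a single-MT back-of-envelope check, not a result from the benchmark: it ignores nucleation, geometry, and everything else the array simulation includes.

```python
def dl_mean_drift(v_grow, v_shrink, k_cat, k_res):
    """Mean length drift (µm/s) of an isolated two-state MT."""
    return (v_grow * k_res - v_shrink * k_cat) / (k_cat + k_res)


def dl_mean_length(v_grow, v_shrink, k_cat, k_res):
    """Steady-state mean length (µm) if growth is bounded, else None."""
    denom = v_shrink * k_cat - v_grow * k_res
    return v_grow * v_shrink / denom if denom > 0 else None


# Tier-1 BY-2 parameters quoted in the text.
drift = dl_mean_drift(0.08, 0.16, 0.003, 0.005)
mean_len = dl_mean_length(0.08, 0.16, 0.003, 0.005)
print(f"drift = {drift:.4f} µm/s, mean length = {mean_len:.0f} µm")
```

With these numbers the drift comes out slightly negative (about −0.01 µm/s) and the bounded-regime mean length is about 160 µm. The sign depends on v+·kres versus v-·kcat, not on the two rates alone, so the growth seen over 1800 s may be a slow approach to a distant steady state rather than the absence of one. Longer runs would be needed to tell the two apart.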
The Dogterom-Leibler model has two states: grow and shrink. STADIA 2022 (Goodson lab) showed that a third state — the stutter, a period of slowed or halted dynamics — precedes 78% of catastrophes in simulation and 86% in experiment. None of the three engines implement it. The benchmark is testing an incomplete model.
Why It Matters Beyond Plant Biology
Microtubule dynamic instability is a primary target for cancer chemotherapy. Taxol, vincristine, and the broader class of MT-targeting agents work by disrupting the grow/shrink cycle. The models these drugs are designed and tested against are built on the same 2-state DL framework.
Individual labs won't fix this. Each one is optimized for their own questions. The benchmark work — agreeing on parameters, running cross-validation, establishing what "passing" looks like, updating the canonical model when new states are discovered — is coordination work. It needs to live outside any single lab.
The Abstraction Stack
François built physics from observations through a microscope. Eva and Tim built decision rules from the same observations. The mech-interp work on Tim's simulator built a model of the rules to understand what they're actually doing (answer: density control, not alignment).
The next step isn't building a better rule. It's removing the handcrafted translation entirely — generate enough trajectories from the event-driven sims (seconds each, 100k overnight) and train something that learns the structure directly. The simulator becomes training data, not the artifact. The stutter state becomes a feature the model discovers, not a state someone has to add by hand.
That's the gap between 1995 and now. The microscopes got better. The simulators didn't change architecture.
What's Still Running
Cytosim tier-1 finishes tonight. Cytosim tier-2 (bundling, crossing, nonsteric) is running on nigel — days away. That's the comparison worth waiting for: does the force-based physics agree with the decision-rule engines on cortical array organization? If yes, the rules are a good approximation. If no, the forces are capturing something the rules miss.
nigel:/home/vincent/bench/ · STADIA: Vemu et al. 2022, PMC9250389 · BIRS 26w5658 · 2026-05-23
MT Simulator Benchmark: Tier 1 & Tier 2 Comparison Report
Date: 2026-05-26 | Machine: nigel (RTX 4070 Ti) | Seed: 26058
Engines
┌─────────────┬──────────────────┬──────────┬───────────────────────────────────┐
│ Engine      │ Author           │ Language │ Type                              │
├─────────────┼──────────────────┼──────────┼───────────────────────────────────┤
│ Cytosim     │ François Nédélec │ C++      │ Physics, continuous forces        │
├─────────────┼──────────────────┼──────────┼───────────────────────────────────┤
│ elastic-mt  │ Tim              │ Python   │ Event-driven, zippering           │
├─────────────┼──────────────────┼──────────┼───────────────────────────────────┤
│ corticalsim │ Eva              │ C++      │ Cortical array, cylinder geometry │
└─────────────┴──────────────────┴──────────┴───────────────────────────────────┘
Tier 1 — Isolated Dynamic Instability
Single MTs, no collisions. Tests whether engines agree on basic DI physics (grow/shrink/catastrophe/rescue) using BY-2 parameters.
Engines compared: Cytosim vs Tim DI wrapper (Tim's full elastic-mt can't run isolated MTs without refactoring — collision detection is baked into the main loop, so a standalone wrapper reimplementing his exact DI parameters was used.)
Results (200 MTs, seed 26058, t=1518s)
┌───────────────────────────┬─────────┬────────┬────────┬──────────┐
│ Statistic                 │ Cytosim │ Tim DI │ Spec   │ Verdict  │
├───────────────────────────┼─────────┼────────┼────────┼──────────┤
│ Growth velocity (µm/s)    │ 0.080   │ 0.080  │ 0.080  │ ✅       │
├───────────────────────────┼─────────┼────────┼────────┼──────────┤
│ Shrink velocity (µm/s)    │ 0.159   │ 0.159  │ 0.160  │ ✅       │
├───────────────────────────┼─────────┼────────┼────────┼──────────┤
│ Mean cats/MT              │ 1.7     │ 1.7    │ n/a    │ ✅       │
├───────────────────────────┼─────────┼────────┼────────┼──────────┤
│ Mean final length (µm)    │ 64.9    │ 59.6   │ n/a    │ ✅       │
├───────────────────────────┼─────────┼────────┼────────┼──────────┤
│ Max final length (µm)     │ 122     │ 121    │ n/a    │ ✅       │
├───────────────────────────┼─────────┼────────┼────────┼──────────┤
│ Alive MTs at t=T          │ 52/200  │ 63/200 │ n/a    │          │
├───────────────────────────┼─────────┼────────┼────────┼──────────┤
│ KS p-value (length dist.) │ n/a     │ n/a    │ p=0.37 │ ✅ PASS  │
└───────────────────────────┴─────────┴────────┴────────┴──────────┘
Verdict: PASS. Both engines agree on all four canonical DI statistics at p > 0.05.
One known artifact: Catastrophe frequency reads ~5.7× the input rate (0.017/s vs spec 0.003/s). This is a 1 Hz sampling artifact: fast sub-second catastrophe+rescue cycles are invisible to the logger. Both engines are equally affected, so they still agree with each other. Fix: sub-second sampling or event-log counting.
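The event-log counting fix can be sketched as follows: count catastrophe events directly and divide by the time the MT actually spent growing, instead of inferring transitions from 1 Hz length samples. The log format here (time-sorted `(time, kind)` tuples, MT growing at t=0) is hypothetical and does not correspond to either engine's output.

```python
def catastrophe_frequency(events, t_end):
    """Catastrophes per second of growth time for one MT.

    events: time-sorted (t, kind) tuples with kind "catastrophe" or "rescue".
    Assumes the MT starts in the growing state at t = 0.
    """
    n_cat, t_grow, t_prev, growing = 0, 0.0, 0.0, True
    for t, kind in events:
        if growing:
            t_grow += t - t_prev          # accumulate time spent growing
        if kind == "catastrophe":
            n_cat += 1
            growing = False
        elif kind == "rescue":
            growing = True
        t_prev = t
    if growing:
        t_grow += t_end - t_prev          # growth segment still open at the end
    return n_cat / t_grow if t_grow > 0 else float("nan")


# Made-up log for illustration only.
example_log = [(100.0, "catastrophe"), (150.0, "rescue"),
               (400.0, "catastrophe"), (420.0, "rescue")]
f_cat = catastrophe_frequency(example_log, t_end=600.0)
print(f"f_cat = {f_cat:.5f} per second of growth")
```

Pooling events and growth time over all 200 MTs before dividing would give the population estimate to compare against the 0.003/s spec.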
Alive MT count (52 vs 63): Within expected stochastic variance for n=1 seed. Not a bug.
Tier 2 — Cortical Array with Collisions
MTs nucleated on a surface, MT-MT interactions enabled. Tests whether engines agree on how collisions shape the steady-state array.
Engines compared: Cytosim (3 variants) × corticalsim/Eva (3 variants).
Variants
┌──────────────────┬─────────┬───────────────────────────────────────────┐
│ Variant          │ Engine  │ Collision physics                         │
├──────────────────┼─────────┼───────────────────────────────────────────┤
│ cyto:nonsteric   │ Cytosim │ No MT-MT interaction (reference)          │
├──────────────────┼─────────┼───────────────────────────────────────────┤
│ cyto:crossing    │ Cytosim │ Steric repulsion only                     │
├──────────────────┼─────────┼───────────────────────────────────────────┤
│ cyto:bundling    │ Cytosim │ Steric attraction + repulsion             │
├──────────────────┼─────────┼───────────────────────────────────────────┤
│ csim:nozipper    │ Eva     │ No zipper, MTs cross freely               │
├──────────────────┼─────────┼───────────────────────────────────────────┤
│ csim:baseline    │ Eva     │ 40° zipper rule + 50% induced catastrophe │
├──────────────────┼─────────┼───────────────────────────────────────────┤
│ csim:alwayscatas │ Eva     │ Always catastrophe on collision           │
└──────────────────┴─────────┴───────────────────────────────────────────┘
▎ Geometry note: Not directly comparable in absolute numbers. Cytosim uses a 50×50 µm flat periodic patch (2,500 µm²); Eva's sim uses a cylinder (18,221 µm², ~7.3× larger) with ~36× higher nucleation flux. Qualitative trends are what matter here.
Steady-state results (last 50% of simulation time, t=900–1800s)
┌──────────────────┬──────────────────┬──────────────────┬───────────┐
│ Variant          │ Density (µm/µm²) │ Mean length (µm) │ MTs alive │
├──────────────────┼──────────────────┼──────────────────┼───────────┤
│ cyto:nonsteric   │ 3.83             │ 29.8             │ 317       │
├──────────────────┼──────────────────┼──────────────────┼───────────┤
│ cyto:crossing    │ 2.51             │ 20.8             │ 301       │
├──────────────────┼──────────────────┼──────────────────┼───────────┤
│ cyto:bundling    │ 3.44             │ 26.1             │ 327       │
├──────────────────┼──────────────────┼──────────────────┼───────────┤
│ csim:nozipper    │ 20.80            │ 30.3             │ 12,383    │
├──────────────────┼──────────────────┼──────────────────┼───────────┤
│ csim:baseline    │ 0.86             │ 7.1              │ 2,203     │
├──────────────────┼──────────────────┼──────────────────┼───────────┤
│ csim:alwayscatas │ 0.54             │ 5.7              │ 1,718     │
└──────────────────┴──────────────────┴──────────────────┴───────────┘
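The densities in this table can be cross-checked as MT count × mean length ÷ area. The sketch below does that arithmetic. The cylinder dimensions (radius 20 µm, length 125 µm, end caps included) are an assumption, chosen because they reproduce the quoted 18,221 µm²; they are not a documented setting of either engine.

```python
import math


def capped_cylinder_area(radius_um, length_um):
    """Side wall plus two end caps, in µm²."""
    return 2 * math.pi * radius_um * length_um + 2 * math.pi * radius_um ** 2


AREA_CYTO = 50.0 * 50.0                          # flat periodic patch, from the geometry note
AREA_CSIM = capped_cylinder_area(20.0, 125.0)    # assumed dimensions, see lead-in

# (reported density, mean length in µm, MTs alive, area) copied from the table.
ROWS = {
    "cyto:nonsteric":   (3.83, 29.8, 317, AREA_CYTO),
    "cyto:crossing":    (2.51, 20.8, 301, AREA_CYTO),
    "cyto:bundling":    (3.44, 26.1, 327, AREA_CYTO),
    "csim:nozipper":    (20.80, 30.3, 12383, AREA_CSIM),
    "csim:baseline":    (0.86, 7.1, 2203, AREA_CSIM),
    "csim:alwayscatas": (0.54, 5.7, 1718, AREA_CSIM),
}


def implied_density(mean_len_um, n_mts, area_um2):
    """Total polymer length per unit area (µm/µm²)."""
    return n_mts * mean_len_um / area_um2


for name, (reported, mean_len, n_mts, area) in ROWS.items():
    implied = implied_density(mean_len, n_mts, area)
    print(f"{name:18s} reported {reported:6.2f}  implied {implied:6.2f}")
```

The implied values land within a few percent of the reported ones. A small residual is expected because a time-averaged product is not the product of time averages.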
Order parameter S2 (Eva's sim only — cylinder geometry required)
┌──────────────────┬───────┬──────────────────────────────────────────────────────────────────┐
│ Variant          │ S2    │ Interpretation                                                   │
├──────────────────┼───────┼──────────────────────────────────────────────────────────────────┤
│ csim:nozipper    │ 0.004 │ Isotropic, no preferred direction                                │
├──────────────────┼───────┼──────────────────────────────────────────────────────────────────┤
│ csim:baseline    │ 0.101 │ Modest alignment from zipper rule                                │
├──────────────────┼───────┼──────────────────────────────────────────────────────────────────┤
│ csim:alwayscatas │ 0.159 │ Strongest alignment: fast turnover lets stable bundles dominate  │
└──────────────────┴───────┴──────────────────────────────────────────────────────────────────┘
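For readers unfamiliar with S2, here is a minimal sketch of a planar nematic order parameter: the length-weighted magnitude of the mean of (cos 2θ, sin 2θ) over segment orientations. This is the generic 2D definition, offered as an assumption about what is being measured. Eva's sim computes S2 on the cylinder in its own way, which may differ in detail.

```python
import math


def nematic_order_2d(angles, weights=None):
    """Planar nematic order parameter in [0, 1].

    angles: segment orientations in radians; weights: e.g. segment lengths.
    Doubling the angle makes theta and theta + pi equivalent (MTs are
    treated as headless rods for alignment purposes).
    """
    if weights is None:
        weights = [1.0] * len(angles)
    total = sum(weights)
    c = sum(w * math.cos(2 * a) for a, w in zip(angles, weights)) / total
    s = sum(w * math.sin(2 * a) for a, w in zip(angles, weights)) / total
    return math.hypot(c, s)


aligned = nematic_order_2d([0.3, 0.3, 0.3 + math.pi])   # parallel and antiparallel
crossed = nematic_order_2d([0.0, math.pi / 2])          # two perpendicular segments
print(f"aligned: {aligned:.3f}, crossed: {crossed:.3f}")
```

Values near 0 correspond to the isotropic row and values near 1 to a fully aligned array; the reported 0.10 to 0.16 sit close to the isotropic end.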
Key Finding
Both engines agree qualitatively but disagree quantitatively: collisions cut density ~1.5× in Cytosim and ~24× in Eva's sim.
Within each engine, adding collision effects reduces density and mean length. The ordering is consistent. But the magnitude of the effect is completely different:
┌───────────┬─────────────────────────────────┬───────────────────┐
│ Engine    │ No collisions → With collisions │ Density reduction │
├───────────┼─────────────────────────────────┼───────────────────┤
│ Cytosim   │ 3.83 → 2.51 µm/µm²              │ ~35% (~1.5×)      │
├───────────┼─────────────────────────────────┼───────────────────┤
│ Eva's sim │ 20.8 → 0.86 µm/µm²              │ ~96% (~24×)       │
└───────────┴─────────────────────────────────┴───────────────────┘
Root cause: The engines implement different biology.
- Cytosim collision = mechanical force only (MTs push/pull, no state change)
- Eva's sim collision = 50% chance of catastrophe (MT dies on contact)
Collision-induced catastrophe is the dominant mechanism. Mechanical repulsion alone barely affects steady-state density. Any simulator without it cannot reproduce in-vivo BY-2 MT densities.
Cytosim oddity (bundling > crossing): Adding attraction (bundling) gives higher density than repulsion alone (crossing). Mechanism: co-alignment from attraction reduces crossing angles, so fewer effective collisions occur. This is consistent with the biology: zippering reduces interference, it doesn't amplify it.
Summary
┌────────┬────────────────────────────────────────┬─────────────────────────────────────────────────────────────────────────────────────────┐
│ Tier   │ Question                               │ Answer                                                                                  │
├────────┼────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤
│ Tier 1 │ Do engines agree on isolated DI?       │ Yes: engines are interchangeable at the single-MT level                                 │
├────────┼────────────────────────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────┤
│ Tier 2 │ Do engines agree on collision effects? │ Qualitatively yes, quantitatively no: 24× gap driven by catastrophe-on-contact physics  │
└────────┴────────────────────────────────────────┴─────────────────────────────────────────────────────────────────────────────────────────┘
The benchmark is working as designed. Tier 1 establishes the floor (DI physics is shared). Tier 2 reveals that the dominant biological question is what happens when two MTs meet — and the two engines encode different answers to that question.
Next Steps (v0.3)
- Add collision-induced catastrophe to Cytosim — implement a custom Hand that triggers catastrophe on fiber contact; would close the engine gap and confirm the mechanism
- Normalize geometry/nucleation — rerun with matched area and nucleation rate so absolute numbers are directly comparable
- Run Tim's elastic-mt in tier-2 — adds a third engine with its own zippering model (already on nigel at ~/elastic-mt/)
- Multi-seed (n=5) — Poisson CIs on density and MT count for all variants
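For the multi-seed step, one simple way to put an interval on MT count is to pool the per-seed counts and use a normal approximation to the Poisson distribution. The sketch below shows that calculation. The five counts are invented placeholders, not benchmark results, and treating array MT counts as Poisson is itself an assumption that should be checked against the sample variance across seeds.

```python
import math


def poisson_mean_ci(counts, z=1.96):
    """Mean count per run with a normal-approximation Poisson interval.

    The pooled total K has variance ~K, so the mean K/n has standard
    error sqrt(K)/n.
    """
    n = len(counts)
    total = sum(counts)
    mean = total / n
    half_width = z * math.sqrt(total) / n
    return mean, mean - half_width, mean + half_width


# Hypothetical per-seed MT counts for one variant (placeholders only).
seed_counts = [2203, 2150, 2260, 2188, 2231]
mean, lo, hi = poisson_mean_ci(seed_counts)
print(f"MTs alive: {mean:.1f} (95% CI {lo:.1f} to {hi:.1f})")
```

If the spread across seeds is clearly wider than this interval, the counts are overdispersed and a t-interval on the per-seed values would be the safer report.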
Why steering + evals is the right project
This plays directly to your comparative advantage. The reframe:
Their simulators are models. You evaluate models. You steer models. That's literally your job.
The plant MT community has three or four competing simulators that disagree, no principled way to choose between them, no standard benchmarks, and no methodology for intervening inside them to test mechanism. That's not a software gap — that's a methodology gap, and it's exactly the gap mech interp fills for neural nets.
What this looks like concretely
Evals — the immediate, obvious win:
Build a standard benchmark suite for cortical MT simulators. Pick 5–10 well-characterized experimental scenarios (wild-type array maturation, specific mutants, drug treatments, mechanical perturbations) with quantitative microscopy data. Define metrics: alignment order parameter trajectories, MT length distributions, collision outcome statistics, response to perturbation. Run all simulators against all scenarios. Publish the leaderboard.
This is methodologically trivial for you and enormously valuable to the field. Right now every paper picks its own metrics and its own validation data, so models can't be compared. You'd be doing for cortical MT modeling what ImageNet did for vision or what HELM did for LLMs: forcing a common evaluation surface.
The paper writes itself: "Standardized Benchmarks for Cortical Microtubule Simulators." It will get cited by every paper in the field for the next decade because everyone will need to report numbers against it.
Steering — the deeper research program:
Treat each simulator as a system to intervene in. Pick the cleanest one (probably Cytosim because the physics is bottom-up, or CorticalSim because it's fastest to iterate on) and develop methods to:
The unifying frame:
You're applying interpretability methodology to scientific simulators. This is a generalizable contribution, not a one-off favor to the plant MT community. The same methods would apply to any mechanistic simulator — climate models, epidemiological models, protein folding pipelines. You'd be staking out a methodological territory.
Why this also serves AI safety
This is the part to emphasize when justifying the time to your home community:
Mech interp has a validation crisis. We don't know if our methods recover real mechanism because in neural nets we don't know what the real mechanism is. Biological simulators are interpretability with ground truth: the mechanism is written in C++, you can read it, you know exactly what circuit you're trying to recover. If activation patching can't localize the collision rule in Cytosim, why would we trust it to localize deception in a transformer?
Plant MT simulators are a cleaner testbed than most toy tasks people use for method development. Real emergent behavior, real comparison to data, real disagreement between competing models — but with known ground truth at the substrate level. That's a unicorn for interp methods development.
So the actual pitch to your safety colleagues: "I'm developing and validating interp methods on biological simulators where ground truth is known, and the validated methods come back to neural nets stronger."
Practical sequencing
Months 1–2: Build the evals harness. Get 2–3 simulators producing comparable outputs against 3–5 benchmark scenarios. Publish a workshop paper or preprint. This earns credibility with the community and gives you the infrastructure for everything else.
Months 3–6: Pick one simulator and do a focused mech interp study. Activation patching analogue, probing, one steering demonstration. Aim for a methods paper that's legible to both communities.
Months 6–12: Either go broader (extend to other simulators, develop the general framework for "interp on mechanistic models") or go deeper (collaborate seriously on a biology question that the methods unlock). Both are good; depends on whether you want to be primarily a methods person or primarily a bridge.
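To make the "activation patching analogue" from months 3–6 concrete, here is a deliberately tiny sketch of rule ablation: a simulator whose rules are swappable functions, one rule replaced by a no-op, and a global statistic compared before and after. Everything in it is hypothetical. The mean-field model, the rule names, and the rate constants are invented to make the example run and are not fitted to Cytosim, CorticalSim, or elastic-mt.

```python
def run_toy_array(rules, t_end=1800.0, dt=1.0):
    """Euler-integrate a mean-field MT count under pluggable rate rules."""
    n = 0.0
    for _ in range(int(t_end / dt)):
        loss_per_mt = rules["intrinsic_loss"](n) + rules["collision_loss"](n)
        n += dt * (rules["nucleation"](n) - loss_per_mt * n)
    return n


# Hypothetical rates, chosen only for illustration.
BASELINE_RULES = {
    "nucleation": lambda n: 10.0,           # new MTs per second
    "intrinsic_loss": lambda n: 1e-3,       # per-MT loss rate without collisions (1/s)
    "collision_loss": lambda n: 5e-7 * n,   # per-MT loss from induced catastrophe (1/s)
}


def ablate(rules, name):
    """Return a copy of the rule set with one rule replaced by a no-op."""
    patched = dict(rules)
    patched[name] = lambda n: 0.0
    return patched


baseline = run_toy_array(BASELINE_RULES)
no_collision = run_toy_array(ablate(BASELINE_RULES, "collision_loss"))
print(f"baseline N = {baseline:.0f}, collision rule ablated N = {no_collision:.0f}")
```

The same pattern extends to patching: copy one rule from a second rule set into the first instead of zeroing it, then check which statistic moves. In a real engine the hard part is exposing the rules as swappable units, which is what the zip_cat refactor is for.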
The collaborator profile
For the evals: any of the simulator authors will want this to exist, but the most useful partner is whoever has the most carefully quantified microscopy data. Probably an experimentalist whose papers include real numbers, not just images.
For the steering: François is the obvious partner because Cytosim is the richest substrate and he's already at the workshop. But realistically he's busy — the right collaborator is probably someone in his group, or a postdoc working on cortical MTs specifically.
Bottom line
Evals + steering = doing your own research, using their substrate, advancing both fields.
You'd be the mech interp person who showed the methodology generalizes beyond neural nets, and the person who gave the plant MT community its first principled evaluation framework. Both are real contributions, neither is wasted effort, and they compound rather than compete with your safety work.