- Date: 2026-03-18
- Branch: `zeming/lance-mapper`
- Script: `claude_scratchpad/fold_val_test.py`
- Dataset: `/bio/projects/es/zlin/esmc2_datasets/260312_uniref_seqonly/val_filtered.lance`
- Model: January training run of ESMCFold hero medium (24blk, 12 diffusion steps, no MSA, confidence-trained)
- Checkpoint: `conf_esmcfold_hero_medium_24blk_12diffu_no_msa_bs128_ctx512_mult2_noise1.1_step1.0_nodiffcond/epoch-0000-step-7000_cleaned.ckpt`
| Property | Value |
|---|---|
| Input rows | 601,760 |
| Schema | id (string), sequence (string), cluster_rep_50, max_seq_id_vs_train |
| Sharding | rows_per_shard=1000 → 602 shards |
| Max sequence length | 1024 (longer sequences skipped) |
| Setting | Value |
|---|---|
| Workers | 16 GPUs (H100, dev QOS) |
| Batch sizing | Dynamic by seqlen: 32 (≤128), 16 (≤256), 8 (≤384), 4 (≤512), 2 (≤768), 1 (≤1024) |
| OOM handling | Fallback to one-by-one on batch OOM |
| Preemption | SIGUSR2 handler, flush + scontrol requeue |
Each folded sequence produces:

- `id` (string) — key column
- `ptm` (float) — predicted TM-score
- `mean_plddt` (float) — mean pLDDT
- `per_residue_plddt` (binary) — npz-compressed float16 array
- `structure_blob` (binary) — ESM structure blob
- `pdb_str` (string) — PDB format string
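The `per_residue_plddt` blob can be round-tripped roughly like this — a minimal sketch of npz-compressed float16 encoding, assuming a single array stored under a key named `plddt` (the key name is an assumption, not confirmed by the script):

```python
import io

import numpy as np

def encode_plddt(plddt: np.ndarray) -> bytes:
    """Compress a per-residue pLDDT array into an npz blob of float16."""
    buf = io.BytesIO()
    # Key name "plddt" is a guess at the on-disk layout.
    np.savez_compressed(buf, plddt=plddt.astype(np.float16))
    return buf.getvalue()

def decode_plddt(blob: bytes) -> np.ndarray:
    """Recover the float16 per-residue array from an npz blob."""
    with np.load(io.BytesIO(blob)) as npz:
        return npz["plddt"]
```

float16 keeps the blob small while retaining ~0.1-unit precision on 0-100 pLDDT values, which is well below pLDDT's meaningful resolution.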
| Metric | Value |
|---|---|
| Effective throughput | ~20 rows/s aggregate across 16 GPUs |
| Per-GPU throughput | ~1.25 rows/s (varies heavily with sequence length) |
| Output rows | 581,613 / 601,760 (20,147 skipped — sequences > 1024 residues) |
| Output size | 37 GB across 602 parquet shards (25-99 MB each), merged to result.lance |
- Long-tail shards: Static shard assignment caused 14/16 workers to finish early while 2 were stuck on long-sequence shards. Fixed by adding work stealing — workers that finish their assigned shards pick up remaining unfinished shards with `.lock` file coordination.
- Work stealing contention: Multiple workers raced on the same last shards, duplicating work. Fixed with `.lock` files containing the SLURM job+task ID, with stale lock cleanup via `squeue` checks.
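The lock-based claim can be sketched as an atomic exclusive create — the filesystem rejects a second `O_CREAT | O_EXCL` open, so exactly one worker wins each shard. Paths and the owner-ID format below are illustrative; the real script additionally validates stale owners against `squeue`:

```python
import os

def try_claim_shard(lock_dir: str, shard_id: int, owner: str) -> bool:
    """Atomically claim a shard by creating its .lock file exclusively.

    Returns True if this worker won the claim, False if another worker
    already holds (or held) the lock.
    """
    lock_path = os.path.join(lock_dir, f"shard_{shard_id:05d}.lock")
    try:
        # O_CREAT | O_EXCL fails with FileExistsError if the lock
        # already exists, making the claim race-free.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(owner)  # e.g. f"{SLURM_JOB_ID}_{SLURM_PROCID}"
    return True
```

Recording the owner inside the lock is what enables stale-lock cleanup: a worker that finds a lock can check whether that job+task is still alive before deciding to steal.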
Merge uses multithreaded `lance.fragment.write_fragments()` plus an atomic commit (following the `lance_dataset.py` pattern). Runs on the login node; no GPU needed.
BTREE scalar index on the `id` column:

```python
ds = lance.dataset("result.lance")
ds.create_scalar_index("id", index_type="BTREE")
```

| Operation | Time |
|---|---|
| Index creation (581K rows) | 1.3s |
| Single lookup | 16ms |
| Batch 10 | 112ms (11ms/row) |
| Batch 100 | 221ms (2.2ms/row) |
| Batch 1000 | 2.0s (2.0ms/row) |
Extrapolated to 1B rows: index creation ~37 min; per-lookup latency somewhat higher, but the BTREE keeps lookups sub-linear in dataset size.
/bio/projects/es/zlin/atlas-folding/test-small.lance
```bash
# Full run
python claude_scratchpad/fold_val_test.py run \
  --input-dataset /bio/projects/es/zlin/esmc2_datasets/260312_uniref_seqonly/val_filtered.lance \
  --output ~/tmp/fold-val-test/result.lance \
  --num-workers 16 --qos dev

# Check progress
python claude_scratchpad/fold_val_test.py status --output ~/tmp/fold-val-test/result.lance

# Single shard smoke test
srun --gres=gpu:1 --mem=32G -c 12 --qos dev -t 1:00:00 \
  pixi run python claude_scratchpad/fold_val_test.py run_single \
  --input-dataset /bio/projects/es/zlin/esmc2_datasets/260312_uniref_seqonly/val_filtered.lance \
  --output ~/tmp/fold-val-test/result.lance --shard-ids 0

# Create index
python -c "import lance; ds = lance.dataset('result.lance'); ds.create_scalar_index('id', index_type='BTREE')"
```