beccajcarlson / 01_paywall_bot_block_compare.md

Last active June 8, 2026 01:49

Production vs OpenAlex paywall+bot-block — n=100 random cohort genes × 10 random papers each

Production vs OpenAlex — body-fetch reachability

Two-bar comparison of how literature reachability changes when the v2 deep-dive's production search strategy is swapped for an OpenAlex-backed retrieval surface. Same 100 random genes × 10 random papers each for direct comparability.
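One way to keep the cohort identical across both strategies is to draw it once from a seeded generator and reuse the result. The sketch below is illustrative only: `sample_cohort`, `papers_by_gene`, and the toy identifiers are hypothetical names, not part of the deep-dive pipeline.

```python
import random

def sample_cohort(papers_by_gene, n_genes=100, n_papers=10, seed=0):
    """Draw a reproducible genes x papers sample (hypothetical helper)."""
    rng = random.Random(seed)
    # Sort before sampling so the draw does not depend on dict or set ordering.
    genes = rng.sample(sorted(papers_by_gene), min(n_genes, len(papers_by_gene)))
    cohort = {}
    for gene in genes:
        papers = sorted(papers_by_gene[gene])
        cohort[gene] = rng.sample(papers, min(n_papers, len(papers)))
    return cohort

# Toy input: 200 made-up genes with 20 made-up paper IDs each.
toy = {f"G{i}": [f"PMID{i}_{j}" for j in range(20)] for i in range(200)}
cohort_a = sample_cohort(toy)
cohort_b = sample_cohort(toy)  # same seed, so the same cohort for the second strategy
```

Fixing the seed means any difference between the two bars comes from the retrieval strategy rather than from the sample.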

What the two strategies are

Production — build_a1_kickoff() from the deep-dive pipeline.

beccajcarlson / 01_paywall_bot_block_overview.md

Last active June 7, 2026 23:19

v2 deep-dive cohort paywall + bot-block landscape (5 anchor runs + 28-gene Unpaywall sample)

Paywall & bot-block landscape for the v2 surfaceome deep-dive cohort

Single-panel horizontal stacked bar showing how 150 random genes from the 6,521-gene v2 deep-dive cohort (candidate_universe_v2.tsv) distribute across the four operationally relevant body-fetch outcomes. For each gene, the five most recent PubMed papers were classified (n = 680 total) by which path through the production fetch chain would succeed.

The four buckets

beccajcarlson / 01_zero_db_rescues_by_triage.md

Last active June 7, 2026 21:29

Zero-DB rescues by triage — what the Sonnet+NCBI agent catches that classical surface DBs miss

Zero-DB rescues by triage — what the agent catches that classical surface DBs miss

Whole-genome view of the genes the Sonnet (+ NCBI) triage agent flags as surface-accessible despite none of the five classical surface DBs (UniProt / GO CC / HPA / SURFY / CSPA) voting "yes." Two grouped bar panels on a shared y-axis show per-reason counts within each verdict bucket:

yes — definite surface, by triage agent's verdict=yes
contextual — state / lineage / partner-dependent surface
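The selection described above can be sketched as a filter followed by a per-reason tally. This is a minimal sketch under assumptions: the record layout (`verdict`, `reason`, `db_votes`), the DB keys, and the example reason strings are all hypothetical, and the real triage output may be shaped differently.

```python
from collections import Counter

# Hypothetical keys for the five classical surface DBs named in the text.
DBS = ("uniprot", "go_cc", "hpa", "surfy", "cspa")

def zero_db_rescues(records):
    """Tally triage reasons for genes that no classical DB voted 'yes' on.

    Returns {verdict: Counter(reason -> gene count)} for the two surface buckets.
    """
    counts = {"yes": Counter(), "contextual": Counter()}
    for rec in records:
        if any(rec["db_votes"].get(db, False) for db in DBS):
            continue  # at least one DB already calls this gene surface
        if rec["verdict"] in counts:
            counts[rec["verdict"]][rec["reason"]] += 1
    return counts

# Made-up records for illustration only.
records = [
    {"gene": "GENE_A", "verdict": "yes", "reason": "recent literature", "db_votes": {}},
    {"gene": "GENE_B", "verdict": "contextual", "reason": "partner-dependent", "db_votes": {}},
    {"gene": "GENE_C", "verdict": "yes", "reason": "recent literature", "db_votes": {"hpa": True}},
    {"gene": "GENE_D", "verdict": "no", "reason": "intracellular", "db_votes": {}},
]
rescues = zero_db_rescues(records)
```

Each `Counter` then maps directly onto one of the two grouped bar panels.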

beccajcarlson / 01_db_vs_sonnet_whole_proteome.md

Last active June 7, 2026 21:29

DB vs Sonnet agreement on the whole proteome — bench-optimized cutoffs

DB ↔ Sonnet agreement on the whole proteome — bench-optimized cutoffs

Whole-genome analog of the 147-gene bench plot db_correctness_by_class. On the bench, ground truth is hand-curated; on the whole proteome (19,324 protein-coding genes) no such ground truth exists, so we use the Sonnet (+ NCBI) triage verdict as the reference and ask, for each surface DB (under its bench-optimized cutoff) and each ≥k-DB ensemble (k = 1..5), what fraction of genes it agrees with Sonnet on, split by Sonnet's verdict bucket.
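A minimal sketch of the per-bucket agreement calculation for a ≥k-DB ensemble follows. It rests on one loud assumption that the text does not confirm: a DB "yes" is counted as agreeing with Sonnet's `yes` and `contextual` buckets, and a DB "no" as agreeing with `no`. The function names and record fields are hypothetical.

```python
from collections import defaultdict

def ensemble_call(votes, k):
    """Return True ('yes') iff at least k of the DB votes are positive."""
    return sum(bool(v) for v in votes) >= k

def agreement_by_bucket(genes, k, surface_buckets=("yes", "contextual")):
    """Fraction of genes per Sonnet bucket on which the >=k ensemble agrees.

    ASSUMPTION: 'yes' and 'contextual' both count as surface for agreement.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for g in genes:
        bucket = g["sonnet"]
        expected = bucket in surface_buckets
        totals[bucket] += 1
        hits[bucket] += int(ensemble_call(g["votes"], k) == expected)
    return {b: hits[b] / totals[b] for b in totals}

# Made-up votes from five DBs for five toy genes.
genes = [
    {"sonnet": "yes", "votes": [1, 1, 0, 0, 0]},
    {"sonnet": "yes", "votes": [1, 0, 0, 0, 0]},
    {"sonnet": "contextual", "votes": [0, 0, 0, 0, 0]},
    {"sonnet": "no", "votes": [0, 0, 0, 0, 0]},
    {"sonnet": "no", "votes": [1, 1, 1, 0, 0]},
]
agreement_k2 = agreement_by_bucket(genes, k=2)
```

A single DB is the special case of a one-element `votes` list with `k=1`, so the same function covers both the per-DB and the ensemble bars.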

beccajcarlson / 01_ensemble_vs_best_db_vs_sonnet.md

Last active June 7, 2026 21:29

Sonnet vs best DB vs >=k DB ensembles — overall accuracy on 147-gene bench

Sonnet vs best DB vs ≥k-DB ensembles — overall accuracy on 147-gene bench

Six-bar comparison of overall verdict accuracy on the 147-gene triage benchmark:

Sonnet (+ NCBI) — Claude Sonnet 4.6 triage agent on its canonical NCBI-resolver prompt variant
UniProt (TM+signal) — best single classical DB under its bench-optimized cutoff (TM OR signal-peptide positive)
≥2 / ≥3 / ≥4 / ≥5 DB — ensemble callers: "yes" iff at least

beccajcarlson / 01_db_correctness_overall.md

Last active June 7, 2026 21:29

Claude triage on the 147-gene bench, LLM-only, hatched by variant (accessible-surfaceome)

LLM overall accuracy — Claude triage on the 147-gene bench

Bars showing overall verdict accuracy for each (model, prompt-variant) Claude triage cell on the 147-gene bench. Grouped by model (Haiku 4.5 / Sonnet 4.6 / Opus 4.7); within each group, four bars encode the prompt variants via hatch:

solid — naive (gene symbol only)
// — + NCBI resolver context
xx — + NCBI + web_search

beccajcarlson / 01_benchmark_cost_vs_accuracy.md

Last active June 7, 2026 21:29

Claude triage: cost vs accuracy on the 147-gene bench (accessible-surfaceome)

Benchmark cost vs accuracy — Claude triage agents on the 147-gene bench

Each point is one (model, prompt-variant) cell of the triage agent benchmark: x = $/whole-genome (cost projected to a 1-replicate-per-gene sweep over 19,324 protein-coding genes), y = verdict accuracy on the 147-gene labelled bench. Ten Claude cells: Haiku 4.5 × {naive, +NCBI, +NCBI+PubMed, +NCBI+web}, Sonnet 4.6 × same 4 variants, and Opus 4.7 × {naive, +NCBI}. Cost amortises prompt-caching using the observed cache-hit rate per cell.
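The x-axis projection can be sketched as a blended input price scaled to the whole genome. This is an illustration under assumptions, not the benchmark's actual cost code: the function name, the token counts, and the per-million-token prices are all hypothetical, and the sketch ignores any cache-write surcharge.

```python
N_GENES = 19_324  # protein-coding genes, as stated in the text

def cost_per_genome(in_tokens, out_tokens, cache_hit_rate,
                    price_in, price_cached, price_out, n_genes=N_GENES):
    """Project per-gene token usage to a one-replicate whole-genome sweep.

    Prices are USD per million tokens (hypothetical values in the example).
    cache_hit_rate is the observed fraction of input tokens served from cache.
    """
    # Blend the uncached and cached input prices by the observed hit rate.
    blended_in = (1 - cache_hit_rate) * price_in + cache_hit_rate * price_cached
    per_gene = (in_tokens * blended_in + out_tokens * price_out) / 1_000_000
    return per_gene * n_genes

# Illustrative numbers only; none of these come from the benchmark.
example_cost = cost_per_genome(
    in_tokens=10_000, out_tokens=500, cache_hit_rate=0.8,
    price_in=3.0, price_cached=0.3, price_out=15.0,
)
```

Using a per-cell hit rate rather than a global one matters because prompt variants with longer shared context benefit more from caching.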

beccajcarlson / 01_db_cutoff_tradeoff.md

Last active June 7, 2026 21:29

Per-DB cutoff trade-off: universe size vs bench accuracy (accessible-surfaceome)

DB cutoff tradeoff — universe size vs benchmark accuracy per source

For each of the 5 M1 surface DBs (UniProt, GO CC, HPA, SURFY, CSPA), a small panel plots how the choice of "surface-vote" cutoff trades universe size (the number of proteins the filter admits, log-scale x) against accuracy on the 147-gene bench (y, %). Lower x means a stricter cutoff.

Markers:

Circle — alternative cutoff option, not currently used in the M1 merge rules.

beccajcarlson / 01_db_overlap_venn.md

Last active June 7, 2026 23:42

M1 surface DB overlap — 5-way Venn (accessible-surfaceome)

DB overlap Venn — 5-way agreement across the M1 surface databases

Topologically-correct 5-ellipse Venn of the M1 candidate universe. Each ellipse is one surface-prediction DB (UniProt subcellular, GO cellular component, HPA, SURFY, CSPA). Cell labels are protein counts in each of the 31 non-empty regions.

Note: a 5-set Venn generally cannot be drawn area-proportionally in 2D, so region areas here do not reflect protein counts. For a view that encodes intersection sizes directly, see the companion UpSet plot.
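The per-region counts behind such a diagram can be computed by grouping each protein by the exact set of DBs that contain it. The sketch below uses made-up accessions and a hypothetical `venn_region_counts` helper; with five sets there are at most 2^5 - 1 = 31 such regions.

```python
from collections import Counter

DBS = ("UniProt", "GO CC", "HPA", "SURFY", "CSPA")

def venn_region_counts(db_members):
    """Map each exclusive Venn region (a frozenset of DB names) to its protein count."""
    membership = {}
    for db, proteins in db_members.items():
        for protein in proteins:
            membership.setdefault(protein, set()).add(db)
    # Proteins sharing the same set of DBs fall in the same region.
    return Counter(frozenset(dbs) for dbs in membership.values())

# Toy membership with made-up accessions P1..P4.
db_members = {
    "UniProt": {"P1", "P2", "P3"},
    "GO CC": {"P2", "P3"},
    "HPA": {"P3"},
    "SURFY": set(),
    "CSPA": {"P4"},
}
regions = venn_region_counts(db_members)
max_regions = 2 ** len(DBS) - 1  # upper bound on non-empty regions
```

The same `Counter` can feed both the Venn cell labels and the bars of an UpSet plot, since both display exclusive intersection sizes.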

beccajcarlson / 01_db_correctness_by_class.md

Last active June 7, 2026 21:29

Surface callers: bench correctness by verdict bucket (accessible-surfaceome)

DB correctness by class — optimized cutoffs vs Sonnet+NCBI on the 147-gene bench

Grouped bars showing accuracy per ground-truth class (overall / yes / contextual / no) for the 5 surface DBs (UniProt, GO CC, HPA, SURFY, CSPA) plus Sonnet+NCBI. DB cutoffs are the optimized versions from the trade-off audit, not the canonical baselines:

UniProt — TM+signal: admit any accession with a TM domain, a signal peptide, OR a strict surface subcellular term (looser than canonical; rescues more bench positives without hurting the

Becca Carlson beccajcarlson
