@aria42
aria42 / experiment-adaptive-v2.md
Created February 27, 2026 00:20
Adaptive problem selection v2: experiment proposal

Adaptive Problem Selection v2: Experiment Proposal

What We Learned from v1

The good news

  • Ridge acceptance rate: 57% vs baseline 42% (late stage). The adaptive scorer IS better at finding useful problems: it needed 22% fewer generation attempts (589 vs 758 attempts for 320 useful groups).
  • The per-level breakdown shows baseline wins at every level, but the adaptive scorer was handicapped by a KL explosion at step 19 (kl=10.2, grad_norm=4.66).
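
The attempt counts quoted above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming that 589 is the adaptive (ridge) attempt count and 758 the baseline count, and that a "wasted" generation is an attempt that did not yield a useful group; the function and variable names are hypothetical. The overall rates implied by these counts need not match the late-stage 57% figure, which covers only part of the run.

```python
def acceptance_stats(useful_groups, attempts):
    """Return the acceptance rate and the number of attempts that yielded nothing useful."""
    return {
        "rate": useful_groups / attempts,
        "wasted": attempts - useful_groups,
    }

# Counts taken from the bullet above; which run is which is an assumption.
ridge = acceptance_stats(useful_groups=320, attempts=589)
baseline = acceptance_stats(useful_groups=320, attempts=758)

# Relative reduction in total attempts, and in wasted attempts only.
fewer_attempts = 1 - 589 / 758
fewer_wasted = 1 - ridge["wasted"] / baseline["wasted"]

print(f"ridge rate:     {ridge['rate']:.1%}")
print(f"baseline rate:  {baseline['rate']:.1%}")
print(f"fewer attempts: {fewer_attempts:.1%}")
print(f"fewer wasted:   {fewer_wasted:.1%}")
```

Under these assumptions the 22% figure corresponds to the reduction in total attempts; counting only wasted attempts gives a larger relative reduction.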

The bad news

  • Eval accuracy: baseline 82.2% > ridge 78.7% > cluster 77.6%. Better exploration didn't translate to better training.
<!DOCTYPE html>
<html>
<head>
<title>Entropy Depth Analysis</title>
<script src="https://cdn.plot.ly/plotly-2.27.0.min.js"></script>
<style>
body { font-family: -apple-system, sans-serif; max-width: 1400px; margin: 0 auto; padding: 20px; }
h1, h2 { color: #333; }
.plot { width: 100%; height: 450px; margin: 20px 0; }
.grid { display: grid; grid-template-columns: 1fr 1fr; gap: 20px; }
aria42 / experiment-answer-conditioned-entropy.md
Created February 26, 2026 05:27
Experiment Plan: Answer-Conditioned Entropy as Zero-Rollout Difficulty Estimator

Experiment: Answer-Conditioned Entropy as a Zero-Rollout Difficulty Estimator

Thesis

We can predict which math problems will produce high reward variance (the useful training signal for GRPO) by measuring the model's logit entropy when prompted with the ground-truth answer. If the model is uncertain about how to derive a known answer, the problem is at the frontier — neither trivially easy nor impossibly hard.
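
The quantity in the thesis can be sketched as follows. This is only an illustration, assuming the per-token logits have already been collected from a forward pass on a prompt that includes the ground-truth answer; the helper names (`softmax`, `token_entropy`, `mean_entropy`) and the use of a plain mean over positions are assumptions, not the experiment's actual implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one position's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution at one position."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_entropy(per_token_logits):
    """Average entropy across positions; a hypothetical per-problem difficulty score."""
    entropies = [token_entropy(row) for row in per_token_logits]
    return sum(entropies) / len(entropies)

# Toy logits over a 4-token vocabulary (made-up values, not model output).
confident = [[10.0, 0.0, 0.0, 0.0], [0.0, 12.0, 0.0, 0.0]]
uncertain = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.5, 0.5]]

print(mean_entropy(confident))
print(mean_entropy(uncertain))
```

A problem whose positions look like `uncertain` would score higher than one that looks like `confident`; whether that score tracks reward variance is the hypothesis the experiment is meant to test.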

aria42 / curriculum-learning-review.html
Created February 25, 2026 23:32
Literature Review: Curriculum Learning & Proposal Distributions for GRPO
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Literature Review: Curriculum Learning &amp; Proposal Distributions for GRPO</title>
<style>
:root {
--bg: #0d1117;
--surface: #161b22;
Does it have sugar?
'Yes, this product has organic low-glycemic coconut palm sugar.'
what size mattress will it fit?
'The Exclusivo Mezcla Waffle Textured Extra Large Fleece Blanket is a 50x70 inch throw blanket that can fit a mattress size of up to queen.'
Ninja Fryer
can I broil a burger?
'Yes, you can broil a burger in the Ninja AF101 Air Fryer.'
is there plastic inside?
No, the Ninja AF101 Air Fryer does not have any plastic inside.
aria42 / conditional_features.py
Last active October 13, 2019 19:58
conditional_features.py
import pandas as pd

def build_conditional_features(df, cond_series, feat_cols=None):
    """
    Return a new dataframe with conditional features.
    """
    # The condition series must align row-for-row with the feature dataframe.
    if len(cond_series) != len(df):
        raise ValueError("Condition series must have the same number of rows as the features dataframe")
    if feat_cols is None:
aria42$ java -cp target/flare-0.1.0-SNAPSHOT-standalone.jar flare.examples.sentence_classification -n 8000 --train-file data/sentiment-train10k.txt --test-file data/sentiment-test10k.txt --embed-file data/glove.6B.300d.txt
WARNING: any? already refers to: #'clojure.core/any? in namespace: vertigo.core, being replaced by: #'vertigo.core/any?
{:train-file data/sentiment-train10k.txt, :test-file data/sentiment-test10k.txt, :embed-file data/glove.6B.300d.txt, :num-classes 2, :emb-size 300, :model-type :bilstm, :lstm-size 25, :num-data 8000}
Params ([lstm/input->gates/b [200]] [hidden->logits/W [2 50]] [lstm/input->gates/W [200 650]] [hidden->logits/b [2]])
Total # params 130302
Optimizing with flare.optimize.Adadelta
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Iteration 0
Caused by: java.lang.RuntimeException: No such var: api/es, compiling:(uncomplicate/neanderthal/linalg.clj:642:6)
at clojure.lang.Compiler.analyze(Compiler.java:6688)
at clojure.lang.Compiler.analyze(Compiler.java:6625)
at clojure.lang.Compiler$InvokeExpr.parse(Compiler.java:3766)
at clojure.lang.Compiler.analyzeSeq(Compiler.java:6870)
at clojure.lang.Compiler.analyze(Compiler.java:6669)
at clojure.lang.Compiler.analyze(Compiler.java:6625)
at clojure.lang.Compiler$IfExpr$Parser.parse(Compiler.java:2797)
at clojure.lang.Compiler.analyzeSeq(Compiler.java:6868)
at clojure.lang.Compiler.analyze(Compiler.java:6669)