Aria Haghighi aria42

Adaptive Problem Selection v2: Experiment Proposal

What We Learned from v1

The good news

Ridge acceptance rate: 57% vs. 42% for baseline (late stage). The adaptive scorer IS better at finding useful problems: it needed 22% fewer generation attempts to collect 320 useful groups (589 vs. 758).
The per-level breakdown shows the baseline winning at every level, but the adaptive scorer was handicapped by a KL explosion at step 19 (kl=10.2, grad_norm=4.66).
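
The attempt counts above can be checked with a few lines of arithmetic. This is only a sketch: the helper name `attempt_stats` is made up for illustration, and it assumes every attempt either yields a useful group or is discarded. The ratios it computes cover the whole run, so they need not match the late-stage acceptance rates quoted above.

```python
def attempt_stats(useful_groups, attempts):
    """Return (acceptance rate, number of discarded attempts).

    Assumption: each attempt either produces one useful group or is discarded.
    """
    acceptance = useful_groups / attempts
    wasted = attempts - useful_groups
    return acceptance, wasted


# Counts taken from the v1 summary: 320 useful groups in both runs.
ridge_rate, ridge_wasted = attempt_stats(320, 589)
base_rate, base_wasted = attempt_stats(320, 758)

# Relative reduction in total attempts (the 22% figure).
fewer_attempts = 1 - 589 / 758

# Relative reduction in discarded attempts only.
fewer_wasted = 1 - ridge_wasted / base_wasted

print(f"ridge acceptance (whole run):    {ridge_rate:.1%}")
print(f"baseline acceptance (whole run): {base_rate:.1%}")
print(f"fewer attempts:                  {fewer_attempts:.1%}")
print(f"fewer discarded attempts:        {fewer_wasted:.1%}")
```

Under that assumption, the 22% figure is the reduction in total attempts; the reduction in discarded attempts alone comes out larger.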

The bad news

Eval accuracy: baseline 82.2% > ridge 78.7% > cluster 77.6%. Better exploration didn't translate to better training.

Experiment: Answer-Conditioned Entropy as a Zero-Rollout Difficulty Estimator

Thesis

We can predict which math problems will produce high reward variance (the useful training signal for GRPO) by measuring the model's logit entropy when prompted with the ground-truth answer. If the model is uncertain about how to derive a known answer, the problem is at the frontier — neither trivially easy nor impossibly hard.
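
A minimal sketch of the two quantities the thesis relates, under stated assumptions. The proposal does not specify the prompt format or which token positions are scored, so the functions below simply take a `[positions, vocab]` array of logits from whatever single forward pass is used. The names `token_entropies`, `answer_conditioned_entropy`, and `binary_reward_variance` are hypothetical, and the reward is assumed to be binary (correct or incorrect).

```python
import numpy as np


def token_entropies(logits):
    """Shannon entropy (in nats) of softmax(logits) at each position.

    logits: array of shape [positions, vocab].
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the row max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_z = np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    log_probs = shifted - log_z
    probs = np.exp(log_probs)
    return -(probs * log_probs).sum(axis=-1)


def answer_conditioned_entropy(logits):
    """Mean per-position entropy; averaging over positions is an assumption."""
    return float(token_entropies(logits).mean())


def binary_reward_variance(pass_rate):
    """Variance of a 0/1 reward with success probability pass_rate."""
    return pass_rate * (1.0 - pass_rate)


# Toy logits, not model output: a uniform row is maximally uncertain,
# a sharply peaked row is nearly certain.
uniform = np.zeros((3, 8))
peaked = np.array([[100.0, 0.0, 0.0, 0.0]])
print(answer_conditioned_entropy(uniform))
print(answer_conditioned_entropy(peaked))
print([binary_reward_variance(p) for p in (0.0, 0.5, 1.0)])
```

With binary rewards the variance p(1 - p) peaks at a 50% pass rate and vanishes at 0% and 100%, which is the sense in which frontier problems carry the most signal. Whether the entropy score actually tracks that variance is the question the experiment is meant to answer.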

	<!DOCTYPE html>
	<html>
	<head>
	<title>Entropy Depth Analysis</title>
	<script src="https://cdn.plot.ly/plotly-2.27.0.min.js"></script>
	<style>
	body { font-family: -apple-system, sans-serif; max-width: 1400px; margin: 0 auto; padding: 20px; }
	h1, h2 { color: #333; }
	.plot { width: 100%; height: 450px; margin: 20px 0; }
	.grid { display: grid; grid-template-columns: 1fr 1fr; gap: 20px; }

	<!DOCTYPE html>
	<html lang="en">
	<head>
	<meta charset="UTF-8">
	<meta name="viewport" content="width=device-width, initial-scale=1.0">
	<title>Literature Review: Curriculum Learning & Proposal Distributions for GRPO</title>
	<style>
	:root {
	--bg: #0d1117;
	--surface: #161b22;

	Ninja Fryer

	can I broil a burger?

	'Yes, you can broil a burger in the Ninja AF101 Air Fryer.'


	is there plastic inside?

	No, the Ninja AF101 Air Fryer does not have any plastic inside.

	import pandas as pd


	def build_conditional_features(df, cond_series, feat_cols=None):
	    """
	    return a new dataframe with conditional features
	    """
	    if len(cond_series) != len(df):
	        raise Exception("Condition series isn't same num rows as features")
	    if feat_cols is None:

	aria42$ java -cp target/flare-0.1.0-SNAPSHOT-standalone.jar flare.examples.sentence_classification -n 8000 --train-file data/sentiment-train10k.txt --test-file data/sentiment-test10k.txt --embed-file data/glove.6B.300d.txt
	WARNING: any? already refers to: #'clojure.core/any? in namespace: vertigo.core, being replaced by: #'vertigo.core/any?
	{:train-file data/sentiment-train10k.txt, :test-file data/sentiment-test10k.txt, :embed-file data/glove.6B.300d.txt, :num-classes 2, :emb-size 300, :model-type :bilstm, :lstm-size 25, :num-data 8000}
	Params ([lstm/input->gates/b [200]] [hidden->logits/W [2 50]] [lstm/input->gates/W [200 650]] [hidden->logits/b [2]])
	Total # params 130302
	Optimizing with flare.optimize.Adadelta
	SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
	SLF4J: Defaulting to no-operation (NOP) logger implementation
	SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
	Iteration 0

	Does it have sugar?

	'Yes, this product has organic low-glycemic coconut palm sugar.'

	what size mattress will it fit?

	'The Exclusivo Mezcla Waffle Textured Extra Large Fleece Blanket is a 50x70 inch throw blanket that can fit a mattress size of up to queen.'

	Caused by: java.lang.RuntimeException: No such var: api/es, compiling:(uncomplicate/neanderthal/linalg.clj:642:6)
	at clojure.lang.Compiler.analyze(Compiler.java:6688)
	at clojure.lang.Compiler.analyze(Compiler.java:6625)
	at clojure.lang.Compiler$InvokeExpr.parse(Compiler.java:3766)
	at clojure.lang.Compiler.analyzeSeq(Compiler.java:6870)
	at clojure.lang.Compiler.analyze(Compiler.java:6669)
	at clojure.lang.Compiler.analyze(Compiler.java:6625)
	at clojure.lang.Compiler$IfExpr$Parser.parse(Compiler.java:2797)
	at clojure.lang.Compiler.analyzeSeq(Compiler.java:6868)
	at clojure.lang.Compiler.analyze(Compiler.java:6669)