Coding-agent-compatible logs: concise, inline guidance, token-efficient

Principles

don't use many tokens
make it so a dumb summary LLM can easily 1) see problems and 2) find clues to diagnose them
include timing information

example of good log

with a single-line SHOULD statement inline in the log that makes clear how the output should look, distinguishes it from subtle failures, and gives a principled clue for diagnosis
keep tables short, and put the longest and least important columns last so that humans can still read them when lines wrap, e.g. short numeric columns first, long text columns (notes, desc) last

use tabulate with a plain, unboxed format (tsv here) for token efficiency; don't log each step or epoch, just the table

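  ```py
  from tabulate import tabulate

  # one header line per sweep, then a single compact table (no per-step logging)
  print(f"\n=== Sweep: {FAMILY_TITLES[family]} ===")
  print(tabulate(
      table_rows,
      headers=["model", "dataset", "condition", "seeds", "|SR|", "hgap(low)", "hgap(0)", "hgap(high)"],
      tablefmt="tsv",
      floatfmt="+.2f",
  ))
  ```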
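The "short columns first, long text last" rule can be applied mechanically before handing rows to tabulate. A minimal sketch, assuming rows are plain dicts: `order_columns` and `pinned_last` are hypothetical names, sorting by widest cell is only one possible heuristic, and the row values are illustrative, not real results.

```python
def order_columns(rows, pinned_last=("note", "desc")):
    """Return column names narrowest first, with free-text columns at the end.

    Hypothetical helper: a column's width is its longest rendered cell (or its
    header), and anything named in `pinned_last` goes last regardless of width.
    """
    if not rows:
        return []
    columns = list(rows[0].keys())

    def width(col):
        return max(len(col), *(len(str(r.get(col, ""))) for r in rows))

    body = [c for c in columns if c not in pinned_last]
    tail = [c for c in columns if c in pinned_last]
    # sorted() is stable, so equally wide columns keep their original order.
    return sorted(body, key=width) + tail


# Illustrative rows only (made-up values).
rows = [
    {"note": "", "model": "Qwen/Qwen3-0.6B", "seed": 42, "abs_sr": 1.5},
    {"note": "<-- selected", "model": "Qwen/Qwen3-0.6B", "seed": 43, "abs_sr": 2.25},
]
headers = order_columns(rows)
table_rows = [[r[h] for h in headers] for r in rows]
```

`headers` and `table_rows` can then be passed to `tabulate(table_rows, headers=headers, ...)` as in the block above.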

use tqdm with mininterval=60 to record times without polluting the logs, e.g. `for task_idx, task in enumerate(tqdm(tasks, desc=f"{model_name} {cot_label}", mininterval=60)):`
have headers for major stages, with a timestamp in them
avoid escape issues: for example, don't write |dS| mean, write abs(dS) mean or similar
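The stage-header and escaping points above can be sketched with the standard library alone. This is only an illustration: `stage_header` and `plain_name` are hypothetical helpers, and the header layout is an assumption, not a fixed format.

```python
import re
from datetime import datetime


def stage_header(title, now=None):
    """One-line header for a major stage, stamped with wall-clock time."""
    now = now or datetime.now()
    return f"\n=== {title} [{now:%Y-%m-%d %H:%M:%S}] ==="


def plain_name(name):
    """Rewrite |x| as abs(x) so column names contain no pipe characters."""
    return re.sub(r"\|([^|]+)\|", r"abs(\1)", name)


print(stage_header("eval"))
print(plain_name("|dS| mean"))  # abs(dS) mean
```

Printing one such header per stage lets a reader recover rough durations from the timestamps even when the progress bar output is sparse.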

loguru with plain messages, no colors, written through tqdm.write so log lines don't break the progress bar

    import os as _os
    from loguru import logger
    from tqdm import tqdm

    logger.remove()  # drop loguru's default stderr handler
    # TODO: change to a config option; env vars are not very trackable
    _LOG_LEVEL = _os.environ.get("SSTEER_LOG_LEVEL", "INFO")
    logger.add(lambda x: tqdm.write(x, end=""), level=_LOG_LEVEL, colorize=False, format="{message}")

because of false positives: if the log contains things that might trigger LLM nannies, like ending a process or traces from red teaming, you might need to give context
because logs are often read with tail, make the last 30 lines carry the most important context: main metric, argv/delta(config), main diagnostics, time, commit / branch, output dir, wandb etc
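One way to guarantee those closing lines is to build them in a single place and print them last. A rough sketch under stated assumptions: `final_summary` and `git_commit` are hypothetical helpers, the field names simply mirror the list above, and the commit lookup falls back to "unknown" when git is not available.

```python
import subprocess
import sys
import time


def git_commit():
    """Short commit hash, or 'unknown' when git or the repo is unavailable."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def final_summary(metric_name, metric_value, out_path, started, argv=None):
    """Build the closing lines of a run log; the main metric goes last."""
    argv = sys.argv if argv is None else argv
    elapsed = time.time() - started
    return "\n".join([
        f"out: {out_path}",
        f"argv: {' '.join(argv)}",
        f"commit: {git_commit()}",
        f"elapsed: {elapsed:.0f}s",
        f"main metric: {metric_name}={metric_value:.3f}",
    ])
```

Calling `print(final_summary(...))` as the very last statement of the script keeps these lines inside whatever `tail` window a reader or summarizer uses.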

Examples of good logs (but should use tabulate tsv)


  coeff    logratio     pmass  passes    note
-------  ----------  --------  --------  -------------------------
 0           13.547  1         ✓
 0.0001      13.547  1         ✓
 0.001       12.641  1         ✓
 0.01        11.109  1         ✓
 0.02        13.625  1         ✓
 0.05        13.297  1         ✓
 0.1         10.844  1         ✓
 0.2          8.188  1         ✓
 0.2375       5.891  1         ✓         <-- selected
 0.275        5.635  0.949219  .         <-- breakdown pmass<floor

SHOULD: logratio should be monotonic until breakdown. should find a place where pmass breaks down and select just before it. coeff=0 should have ~perfect pmass
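The SHOULD line can also be checked in code, so that violations appear in the log as a short note instead of relying on the reader. A minimal sketch using the rows from the table above: `check_sweep` is a hypothetical helper, and `pmass_floor=0.95` is an assumption, since the log only says `pmass<floor`.

```python
def check_sweep(rows, pmass_floor=0.95):
    """rows: list of (coeff, logratio, pmass), sorted by coeff.

    Returns (selected_coeff, violations), where violations lists the coeffs at
    which logratio rose again before the breakdown point.
    """
    # Breakdown = first row whose pmass falls below the (assumed) floor.
    breakdown = next(
        (i for i, (_, _, pmass) in enumerate(rows) if pmass < pmass_floor),
        len(rows),
    )
    healthy = rows[:breakdown]
    # Select the last coefficient before breakdown.
    selected = healthy[-1][0] if healthy else None
    violations = [
        cur[0] for prev, cur in zip(healthy, healthy[1:]) if cur[1] > prev[1]
    ]
    return selected, violations


rows = [
    (0, 13.547, 1.0), (0.0001, 13.547, 1.0), (0.001, 12.641, 1.0),
    (0.01, 11.109, 1.0), (0.02, 13.625, 1.0), (0.05, 13.297, 1.0),
    (0.1, 10.844, 1.0), (0.2, 8.188, 1.0), (0.2375, 5.891, 1.0),
    (0.275, 5.635, 0.949219),
]
selected, violations = check_sweep(rows)
print(f"selected={selected} non_monotonic_at={violations}")
```

On this table the check selects coeff 0.2375 and flags the rise at coeff 0.02, which is the kind of subtle deviation the SHOULD line is meant to make visible.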
---

example of good final 40 lines (note it has the output files, input args, main metric, and a result table with important and short things first)

out: ./outputs/20260426T015439_ssteer_v2_exp_mean_38a4_eval.jsonl
argv: eval_logratio.py --quick --model-name Qwen/Qwen3-0.6B --extraction ssteer_v2 --seed 42 --n-train-steps 5
main metric: abs_sr=6.867 [flags=quick,tasks=1/75]

cue  abs_sr  h_low  h_0   h_high  C_min/max    pmass_min  seed  n  commit   model       method              flags             run                                      out
🟢   6.867   0.758  5.75  7.625   -0.50/+0.28  0.93       42    1  773f4d5  Qwen3-0.6B  ssteer_v2/exp/mean  quick,tasks=1/75  20260426T015439_ssteer_v2_exp_mean_38a4  ./outputs/20260426T015439_ssteer_v2_exp_mean_38a4_eval.jsonl

wassname/logs.md
