/feed/brain-diff

Diff.

Substantive changes across all entry kinds (claims, thoughts, predictions, decisions), generated from git history. Boring edits are filtered out so review focuses on meaningful updates.
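A minimal sketch of how the filtering described above could work, not the actual generator. Everything here is an assumption for illustration: the `EntryVersion` fields, the `is_substantive` helper, and the 0.05 confidence threshold are hypothetical. It treats an edit as boring when only whitespace or letter case changed and confidence barely moved.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntryVersion:
    """One version of an entry as read from a commit (hypothetical shape)."""
    kind: str                      # e.g. "claim", "thought", "prediction", "decision"
    title: str
    body: str
    conf: Optional[float] = None   # confidence, if the kind carries one
    status: str = "active"         # assumed values: "active", "resolved", "deprecated", "reversed"


def _normalize(text: str) -> str:
    # Collapse whitespace and lowercase so formatting-only edits compare equal.
    return re.sub(r"\s+", " ", text).strip().lower()


def is_substantive(old: Optional[EntryVersion], new: EntryVersion,
                   min_conf_delta: float = 0.05) -> bool:
    """Return True if the change from old to new belongs in the feed."""
    if old is None:
        return True                                  # entry was added
    if old.status != new.status:
        return True                                  # resolved, deprecated, reversed, ...
    if (old.conf is None) != (new.conf is None):
        return True                                  # confidence appeared or disappeared
    if old.conf is not None and abs(new.conf - old.conf) >= min_conf_delta:
        return True                                  # confidence moved meaningfully
    # Otherwise only a real wording change counts.
    return (_normalize(old.title) != _normalize(new.title)
            or _normalize(old.body) != _normalize(new.body))
```

Under these assumptions a revision from conf 0.74 to 0.62 passes the filter, while a commit that only reflows whitespace does not. A note added without a confidence change still counts, because the body text differs.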

Today

claim revised

Frontier evals overstate coding ability outside greenfield repos

“Three weeks of refactor work in a large TS codebase. Models still get lost in imports.”

conf 0.74 → 0.62

Yesterday

thought added

Why I keep a manual review step in agentic pipelines

1,840 words

prediction updated

At WWDC 2026, Apple ships an on-device assistant with 2x the intent F1 of Siri 2024

“No movement, but bumped freshness.”

conf 0.55 → 0.55

Apr 28

prediction resolved

An open-weight model will match GPT-4o on MMLU by EOY 2025

“Llama 3.3 405B crossed 86 in March. Counts.”

resolved · true

claim added

Calibration is the only honest measure of an opinion economy

conf 0.78

Apr 22

decision reversed

Use SQLite as the canonical store

“Markdown wins for git-trackability. SQLite is now derived only.”

claim deprecated

RAG over a vector store is the right default for most assistants

conf 0.66 → 0.31

/feed/brain-diff.xml · /feed/brain-diff.json · last build 2026-05-03 14:22 UTC