/feed/brain-diff
Diff.
Substantive changes across all entry kinds (claims, thoughts, predictions, decisions), generated from git history. Trivial edits are filtered out so review focuses on meaningful updates.
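A minimal sketch of how that filter might work, assuming each git-derived change has already been parsed into a record. The `Change` type, its field names, the `is_substantive` helper, and the 20-word threshold are all hypothetical, chosen for illustration; the feed's actual rules are not stated here. The sketch keeps lifecycle events, confidence moves, and any change that carries a note, which is consistent with a "bumped freshness" entry appearing even when confidence is unchanged.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record for one change recovered from git history.
@dataclass
class Change:
    kind: str                           # "claim", "thought", "prediction", "decision"
    action: str                         # "added", "revised", "updated", "resolved", ...
    title: str
    note: Optional[str] = None          # author's one-line comment, if any
    conf_before: Optional[float] = None
    conf_after: Optional[float] = None
    words_changed: int = 0

# Assumption: these actions always count as substantive.
LIFECYCLE_ACTIONS = {"added", "resolved", "reversed", "deprecated"}

def is_substantive(change: Change, min_words: int = 20) -> bool:
    """Return True if the change should appear in the feed (assumed rules)."""
    if change.action in LIFECYCLE_ACTIONS:
        return True
    # A confidence move counts, however small.
    if change.conf_before is not None and change.conf_after is not None:
        if abs(change.conf_after - change.conf_before) > 1e-9:
            return True
    # An explicit note signals the author considers the change meaningful.
    if change.note:
        return True
    # Otherwise fall back to edit size (threshold is an arbitrary assumption).
    return change.words_changed >= min_words

changes = [
    Change("claim", "revised", "Frontier evals overstate coding ability",
           conf_before=0.74, conf_after=0.62),
    Change("prediction", "updated", "On-device assistant prediction",
           note="No movement, but bumped freshness.",
           conf_before=0.55, conf_after=0.55),
    Change("thought", "revised", "Typo fix", words_changed=2),
]
feed = [c for c in changes if is_substantive(c)]
```

With these assumed rules, the first two records are kept and the typo fix is dropped.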
Today
claim revised
Frontier evals overstate coding ability outside greenfield repos
“Three weeks of refactor work in a large TS codebase. Models still get lost in imports.”
conf 0.74 → 0.62
Yesterday
thought added
Why I keep a manual review step in agentic pipelines
1,840 words
prediction updated
At WWDC 2026, Apple ships an on-device assistant that beats Siri 2024 on intent F1 by 2x
“No movement, but bumped freshness.”
conf 0.55 → 0.55
Apr 28
prediction resolved
An open-weight model will match GPT-4o on MMLU by EOY 2025
“Llama 3.3 405B crossed 86 in March. Counts.”
resolved · true
claim added
Calibration is the only honest measure of an opinion economy
conf 0.78
Apr 22
decision reversed
Use SQLite as the canonical store
“Markdown wins for git-trackability. SQLite is now derived only.”
—
claim deprecated
RAG over a vector store is the right default for most assistants
conf 0.66 → 0.31