The Arithmetic Arc

The journey — nine moves, leverage-ordered

The laws land first. The claim card — claim + scope + one-command recount + freshness — becomes the only shape a claim may take on any agent-facing surface: briefs, session primes, tool results, and every future Fides or DE Brain answer. This is also how our own docs stop being naked assertions: the session-prime emitter is the path by which doc content becomes receipted, starting the day this move lands.
↘ go deeper
DONE: no unreceipted-claim code path in agent-facing emitters (grep-provable) + claim-card fields in the shared envelope. Kill: none — a floor.
Checking becomes free-feeling — and the world stays standing. The scale probe (VS Code, 11,508 files) found the dominant cost: the command-line rebuilds the whole world per question — 501s per read, 496s on the re-read, zero reuse. The standing service already holds the cure: build once per commit, read in seconds, command-line included. Plus scale-sane payloads — 6,313 consumers must come back with names, never a bare count.
↘ go deeper
DONE: repeat reads in seconds on the 11.5k-file fixture, measured; top-k shown at any scale; cost-to-check ≤ 1/5 of re-derive — defined as the standing-world recount path vs independent derivation (baseline: median 25 calls / 193s); bench rows gain toolCalls capture. Kill: work that doesn't move measured checking cost → stop polishing, study.
The promotion sweep — the felt payoff. 1 of 136 gate checks may reuse a saved verdict today; the gate re-pays ~16 minutes to re-derive the rest. Promote honestly — declared basis, hermetic proof, a break-tooth per leg — from 1 to ≥10 (arms the meters), then toward 70%. Carries the bet's one real remaining risk, and its kill catches it in days.
↘ go deeper
DONE: ≥10 legs promoted; skip-rate + cold-vs-warm delta off the live bench curve. Kill (W1's): receipt upkeep costs more than re-running → stop and study.
The 5× demonstration — the story gets an owner. The proof bar demands one real task at ≥5× lower cost because settled truth was read; until now nobody owned producing it. Staged deliberately: one recurring task, cold-way vs genome-way, instrumented end to end, published as a one-page before/after that Alie can feel cold.
↘ go deeper
DONE: ≥5× documented with receipts. Kill, honest, no limbo: under 2× → publish the number anyway and study; 2×–5× → iterate with named levers, three attempts max, then re-scope. Ends in a published number in every branch.
Meters where agents live — and the map begins. The reads ledger goes live at its choke point; briefs and primes rebuilt as receipted reads; the twice-daily driver becomes the first born-knowing worker; the first handoff doc retires. Reframed by your sprawl question: the reads ledger is cartography, not telemetry — every read is an edge, the org graph's producer.
↘ go deeper
DONE: ≥1 real read logged · driver primes from genome reads alone · 1 handoff retired with round-trip proof · the first cross-system "who read this fact?" query answered from read-tracks. Kill (W3's): genome briefs underperform hand priming → stop scaling; Law 1 is the known mitigation.
Membership multiplies encounters — and the evidence hardens first. Prism level 1 now, the full-membership PR on Lukasz's own merge, then de-brain, Fides, Eve, then org-wide by playbook. And the graded reviewer's sharpest point, folded: the corrected experiment re-run launches immediately, in parallel — scope-matched claim, fixed paraphrase, graded and un-graded arms; the un-graded condition is the only honest test of trust-at-work, and the first design suppressed it by construction.
↘ go deeper
DONE: corrected run banked in both conditions · Prism answers a real Lukasz question · per-repo counters live. Fallback (named honestly, not a kill): an owner never merges level 2 → a signal to study, never a mandate.
The first practice fiber — n=1, the business begins. W4's merge made this possible now: one real practice's truth — hours, phone, services from Salesforce/GBP — deposited with source-of-record receipts, served from the standing world, answering one Fides-shaped question with honest freshness stamps. Internal only, n=1, Alie's gate untouched by construction; a short spec precedes the build — never improvised against live sources.
↘ go deeper
DONE: one practice answers one real question from receipted truth; the receipt re-runs clean against the live source; the honest-dated floor renders where no source exists. Fallback (the source-receipts rule): a non-deterministic source drops to honest-dated-claim — reported, never dressed up.
One engine, many planes — LANDED. Merged to main as #82: the intent plane with leases, the ledger walls with the F-WALL drill as a standing gate leg, bindings, the re-verify sweep, source receipts — six items, each independently adversary-verified, two defects caught pre-trunk and fixed. Remaining: Eve's dual-write (her own repo, verified green: 337 files / 6,542 tests) waits on your glance — and ratification 8 becomes the intent plane's first deposited record on your word.
It faces the world. Fed by this arc, not built in it: the leadership brief rendered from the genome with currency + blindness stamps (Alie's gate sits exactly where the business gets touched); Neuron's fleet on build-lineage receipts; the Stage-5 wedge in its measured form — the only practices selling check-me-in-one-call truth in a market of naked assertions.

Evidence & provenance

By type — Laws: the rule · L1 · L2 | Load-bearing: the sweep | Demand-side: the 5× story · the fiber | Kills & fallbacks: upkeep · reads · hardening | Calls: laws · volatility · Eve

Counts behind the claims — 0/9 blind trust · best-case receipt-led 21 calls vs 25, receipt-arm median 41 (the gap this plan closes) · assertion +124% / +78% · 9-way set-identical convergence (44 = 44) · live gate reused=1 · scale probe: 11,508 files / 102,711 wires in 515s, honesty held, reads 501s/496s zero reuse · W4 merged #82 (135/136 first pass, census acknowledged, two defects adversary-caught pre-trunk). Grading provenance: independent opus adversary — 6/10, findings folded, re-graded 8.5/10, remaining gaps folded same-day. Sources: PLAN-arithmetic-arc.md · RECEIPT-EXPERIMENT-2026-07-02.md · bench.jsonl.

Locked by data — and the three that need your word

The journey — nine moves, leverage-ordered

What could go wrong — honestly

Your calls

Evidence & provenance