Genome program · execution plan · v2 — graded 6→8.5 by an independent adversary, gaps folded — 2026-07-03

The Arithmetic Arc

The path forward after the receipt experiment and the scale probe. One sentence: stop trying to be believed — make checking cheaper than redoing, everywhere, and let arithmetic compound. The honest numbers: the best receipt-led subject beat re-deriving outright (21 calls vs 25) — but the receipt arm's median cost 41. The crossover exists and is not yet the default path; closing that gap is what this plan is for. The unambiguous finding: a naked assertion costs +124% — agents fight it. Source of truth: docs/ai/PLAN-arithmetic-arc.md · evidence: genome/eval/RECEIPT-EXPERIMENT-2026-07-02.md · scale probe 2026-07-03.

Locked by data — and the three that need your word

Economics, not psychology — no trust UX, ever Structural-first, now evidenced ⭑ LAW 1 · receipts-or-silence — YOURS TO RATIFY ⭑ LAW 2 · scope travels with claims — YOURS TO RATIFY ⭑ Volatility table — YOURS, yes/no Membership: value first, the owner merges (ratified) W4 LANDED — merged #82, live and gate-guarded
The proposed law, plainly: no agent-facing surface ships a claim without its receipt, scope, and freshness attached. A claim travels with its proof or it doesn't travel. The evidence: +124% burned fighting a true claim that arrived naked and badly scoped.

The journey — nine moves, leverage-ordered

  1. The laws land first. The claim card — claim + scope + one-command recount + freshness — becomes the only shape a claim may take on any agent-facing surface: briefs, session primes, tool results, and every future Fides or DE Brain answer. This is also how our own docs stop being naked assertions: the session-prime emitter is the path by which doc content becomes receipted, starting the day this move lands.
    ↘ go deeper
    DONE: no unreceipted-claim code path in agent-facing emitters (grep-provable) + claim-card fields in the shared envelope. Kill: none — a floor.
  2. Checking becomes free-feeling — and the world stays standing. The scale probe (VS Code, 11,508 files) found the dominant cost: the command-line rebuilds the whole world per question — 501s per read, 496s on the re-read, zero reuse. The standing service already holds the cure: build once per commit, read in seconds, command-line included. Plus scale-sane payloads — 6,313 consumers must come back with names, never a bare count.
    ↘ go deeper
    DONE: repeat reads in seconds on the 11.5k-file fixture, measured; top-k shown at any scale; cost-to-check ≤ 1/5 of re-derive — defined as the standing-world recount path vs independent derivation (baseline: median 25 calls / 193s); bench rows gain toolCalls capture. Kill: work that doesn't move measured checking cost → stop polishing, study.
  3. The promotion sweep — the felt payoff. 1 of 136 gate checks may reuse a saved verdict today; the gate re-pays ~16 minutes to re-derive the rest. Promote honestly — declared basis, hermetic proof, a break-tooth per leg — from 1 to ≥10 (arms the meters), then toward 70%. Carries the bet's one real remaining risk, and its kill catches it in days.
    ↘ go deeper
    DONE: ≥10 legs promoted; skip-rate + cold-vs-warm delta off the live bench curve. Kill (W1's): receipt upkeep costs more than re-running → stop and study.
  4. The 5× demonstration — the story gets an owner. The proof bar demands one real task at ≥5× lower cost because settled truth was read; until now nobody owned producing it. Staged deliberately: one recurring task, cold-way vs genome-way, instrumented end to end, published as a one-page before/after that Alie can feel cold.
    ↘ go deeper
    DONE: ≥5× documented with receipts. Kill, honest, no limbo: under 2× → publish the number anyway and study; 2×–5× → iterate with named levers, three attempts max, then re-scope. Ends in a published number in every branch.
  5. Meters where agents live — and the map begins. The reads ledger goes live at its choke point; briefs and primes rebuilt as receipted reads; the twice-daily driver becomes the first born-knowing worker; the first handoff doc retires. Reframed by your sprawl question: the reads ledger is cartography, not telemetry — every read is an edge, the org graph's producer.
    ↘ go deeper
    DONE: ≥1 real read logged · driver primes from genome reads alone · 1 handoff retired with round-trip proof · the first cross-system "who read this fact?" query answered from read-tracks. Kill (W3's): genome briefs underperform hand priming → stop scaling; Law 1 is the known mitigation.
  6. Membership multiplies encounters — and the evidence hardens first. Prism level 1 now, the full-membership PR on Lukasz's own merge, then de-brain, Fides, Eve, then org-wide by playbook. And the graded reviewer's sharpest point, folded: the corrected experiment re-run launches immediately, in parallel — scope-matched claim, fixed paraphrase, graded and un-graded arms; the un-graded condition is the only honest test of trust-at-work, and the first design suppressed it by construction.
    ↘ go deeper
    DONE: corrected run banked in both conditions · Prism answers a real Lukasz question · per-repo counters live. Fallback (named honestly, not a kill): an owner never merges level 2 → a signal to study, never a mandate.
  7. The first practice fiber — n=1, the business begins. W4's merge made this possible now: one real practice's truth — hours, phone, services from Salesforce/GBP — deposited with source-of-record receipts, served from the standing world, answering one Fides-shaped question with honest freshness stamps. Internal only, n=1, Alie's gate untouched by construction; a short spec precedes the build — never improvised against live sources.
    ↘ go deeper
    DONE: one practice answers one real question from receipted truth; the receipt re-runs clean against the live source; the honest-dated floor renders where no source exists. Fallback (the source-receipts rule): a non-deterministic source drops to honest-dated-claim — reported, never dressed up.
  8. One engine, many planes — LANDED. Merged to main as #82: the intent plane with leases, the ledger walls with the F-WALL drill as a standing gate leg, bindings, the re-verify sweep, source receipts — six items, each independently adversary-verified, two defects caught pre-trunk and fixed. Remaining: Eve's dual-write (her own repo, verified green: 337 files / 6,542 tests) waits on your glance — and ratification 8 becomes the intent plane's first deposited record on your word.
  9. It faces the world. Fed by this arc, not built in it: the leadership brief rendered from the genome with currency + blindness stamps (Alie's gate sits exactly where the business gets touched); Neuron's fleet on build-lineage receipts; the Stage-5 wedge in its measured form — the only practices selling check-me-in-one-call truth in a market of naked assertions.

What could go wrong — honestly

Receipts cost more than re-runs. The sweep's kill fires in days, cheaply, on Plato — the arc re-plans around CI/human value. The bet's real remaining risk, priced.
Reads stay near zero after membership. Then encounter rate is a placement problem — study which surfaces agents actually look at before building more.
The hardened re-run weakens the findings. The laws stay (floor-cheap, justified by fail-closed doctrine independently); the ergonomics investment resizes to the honest numbers. It launches immediately — better resized this week than wrong this quarter.
The mirror problem. Our own docs are naked assertions — the exact thing proven toxic. The concrete path: move 1's session-prime emitter is how doc content becomes receipted, starting the day it lands. Until then, docs cite evidence inline — as this one does.

Your calls

Ratify Law 1 (receipts-or-silence) + Law 2 (scope travels with claims)? On your word they bank same-day as roadmap ratification 8 — and become the live intent plane's first deposited record.
The volatility table, yes/no: volatile (phone/hours/listings/pricing/contacts) > medium (services/rosters/tiers/integrations) > stable (rulings/North-Stars/brand/architecture). One word banks the re-verify sweep's priority order; changing it later is a one-line edit.
Eve #209 — glance when ready. Independently verified green (337 test files, 6,542 passed, type-check clean, kill test present, plane hardcoded personal, flag default OFF). I will never merge it for you.
Anything to cut, reorder, or add? Sequencing inside moves is my lane; the order between them is open to your red pen.

Evidence & provenance

By type — Laws: the rule · L1 · L2 | Load-bearing: the sweep | Demand-side: the 5× story · the fiber | Kills & fallbacks: upkeep · reads · hardening | Calls: laws · volatility · Eve

Counts behind the claims — 0/9 blind trust · best-case receipt-led 21 calls vs 25, receipt-arm median 41 (the gap this plan closes) · assertion +124% / +78% · 9-way set-identical convergence (44 = 44) · live gate reused=1 · scale probe: 11,508 files / 102,711 wires in 515s, honesty held, reads 501s/496s zero reuse · W4 merged #82 (135/136 first pass, census acknowledged, two defects adversary-caught pre-trunk). Grading provenance: independent opus adversary — 6/10, findings folded, re-graded 8.5/10, remaining gaps folded same-day. Sources: PLAN-arithmetic-arc.md · RECEIPT-EXPERIMENT-2026-07-02.md · bench.jsonl.