- seven floors · typed cognitive shapes
- goals + beliefs first-class
- self-model · learning log · unlearn
- neurosymbolic interface
the database AGI will run on.
v2.0 is the substrate. v2.1 is brain-aligned multimodal. v2.2–v2.5 builds the cognitive engine on top — pattern completion, formal belief revision, causal claims, world model fragments, closed-loop self-modification with formal safety guarantees. five years committed.
- V-JEPA 2 + W2V-BERT + Llama-3.2-3B sensory
- brain-calibrated θ_brain
- BAMS benchmark · ICLR '26 paper
- Hopfield pattern completion
- AGM-formal belief revision
- analogical retrieval via HDC bind
- causal claim storage · intervention semantics
- world model fragments
- on-line learning state
- enterprise tier · distributed mode
- formal safety on self-modification
- BCI sensory experimental · Brain-JEPA
- closed-loop self-modification
- causal reasoning over beliefs
- full cognitive engine
one week collapses the bet.
every claim on this page resolves at phase 7 · week 12 against a shared harness: LongMemEval-S · LoCoMo · BEAM · cognitive (goal · belief · unlearn · multi-floor). six metrics per run · never a single number · raw logs + harness hash with every claim. three outcomes — and the project commits to one.
- ≥ Zep/Graphiti accuracy on LongMemEval-S (within 1pp F1 + LLM-judge)
- ≥ 3× lower p95 latency vs mem0 (target < 50ms)
- ≥ 3× lower token cost vs mem0 (< 2,500 tokens/query)
- wins noisy-cue degradation test
- all four cognitive benchmarks pass
- holds across all three standard benchmarks · no cherry-picking
- within 3pp of mem0 F1
- ≥ 10× memory savings vs alternatives
- partial cognitive benchmark pass acceptable
- reposition as agidb-lite · embedded cognitive memory for edge agents
- skip brain-alignment milestone
- >10pp behind dense baselines
- gap doesn't close with reranking
- cognitive benchmarks fail
- reposition as ctxgraph · temporal graph memory
- preserve the IP · publish what we learned
agidb — AGI Trajectory
The 5-year roadmap from agidb v2.0 (substrate, 2026) to v2.5 (AGI-grade, 2031). Brain-alignment integrated as the v2.1 additive milestone.
The shape of the bet
agidb is a 5-year commitment, not a 9-month launch. The cognitive-substrate framing is what justifies the AGIDB name; the 12-month v2.1 brain-aligned launch is what justifies the first round of funding; the 5-year trajectory is what justifies the long-term existence of the company.
Each major version adds a capability frontier that compounds on the previous:
| Version | Year | What it adds | Decision gate |
|---|---|---|---|
| v2.0 | 2026 (m9) | Substrate — episodic, semantic, procedural, working, sensory, goals, beliefs, self-model, unlearn, neurosymbolic interface | Phase 7, week 12: commit / reposition / retreat |
| v2.1 | 2026 (m12) | Brain-alignment — V-JEPA 2 + Wav2Vec-BERT + Llama-3.2-3B multimodal sensory, brain-calibrated surprise, BAMS benchmark, ICLR 2026 paper | Phase 16, week 52: paper accepted? BAMS wins associative networks? |
| v2.2 | 2027 | Cognitive engine v0.1 — Hopfield pattern completion, AGM belief revision, analogical retrieval via HDC binding, learned projection (if BAMS plateaus) | End of 2027: design partner production deployments |
| v2.3 | 2028 | Causal layer — causal claim storage with intervention semantics, world model fragments, on-line learning, Causal-JEPA-style object-centric masking | End of 2028: enterprise deal pipeline |
| v2.4 | 2029-2030 | Production-grade — full enterprise tier, distributed mode, formal safety guarantees on self-modification, BCI input experimental (Brain-JEPA, signal-JEPA) | Mid-2030: revenue >$5M ARR |
| v2.5 | 2031 | AGI-grade — substrate for true autonomous systems; closed-loop self-modification, causal reasoning over learned beliefs, cognitive engine fully realized | Year 5: agidb is the de facto AGI substrate or it isn’t |
v2.0 — Substrate (2026, month 9)
The first credible AGI substrate. Inherits sochdb v1’s working HDC kernel, bi-temporal storage, episode binding, tiered recall, and consolidation. Adds five new phases (9-13) for the AGI pivot.
What ships
- All seven cognitive floors with first-class typed shapes
- Goals as state machines with parent-child hierarchy
- Beliefs as revisable claims with audit trails
- Sensory buffer with surprise gating (hand-tuned threshold, no brain calibration yet)
- Self-model audit log + self-vector EMA
- Non-destructive cascading unlearn with self-vector subtraction
- Neurosymbolic interface (signature ↔ triple translation)
- 9 crates: agidb-core, agidb-extract, agidb-ns, agidb-skills, agidb-cli, agidb-mcp, agidb-py, agidb-bench, agidb umbrella
Decision gate
Phase 7, week 12. The benchmark suite vs Mem0, Zep/Graphiti, Letta. Three outcomes:
- Commit — proceed to v2.1 + fundraise
- Reposition — ship as “agidb-lite: embedded cognitive memory for edge agents”
- Retreat — fold back into ctxgraph (predecessor)
See PROJECT.md section 11 for the full threshold definitions.
Success at v2.0 launch (month 9)
- 1M+ episodes on a laptop with sub-100ms p99 recall
- Match/beat Zep on LongMemEval-S (≥ 64% accuracy)
- 3× lower retrieval latency than Mem0 (p95 < 50ms)
- 3× lower token cost than Mem0 (< 2,500 tokens/query)
- All four cognitive benchmarks pass
- 1000+ GitHub stars
- 5+ design-partner deployments
- arxiv whitepaper posted
v2.1 — Brain-alignment (2026, month 12)
Additive expansion. v2.0 substrate stays the core. Brain-alignment is the publishable differentiator that turns agidb from “another Rust memory library” into “an artifact of brain-aligned cognitive science research with production Rust deployment.”
What ships
- `agidb-sensory` crate: V-JEPA 2 + Wav2Vec-BERT + Llama-3.2-3B encoders, Charikar 2002 random projection, VSA multimodal binding
- `observe_multimodal()` API: 30s video + audio + text → one episode HV
- Brain-calibrated surprise gating: θ_brain fit against TRIBE v2 predicted neural surprise on associative cortex
- `agidb-bams` crate: BAMS benchmark suite, six-cortical-network RSA harness, baselines (mem0, letta, zep, hipporag, raw V-JEPA)
- ICLR 2026 MemAgents workshop paper (or CCN 2026 backup)
- 11 crates total (v2.0’s 9 + agidb-sensory + agidb-bams)
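The Charikar 2002 random projection named above can be sketched as sign-random-projection hashing: each output bit records which side of a random hyperplane the encoder embedding falls on. This is a minimal illustration only. The function names, the 64-dimensional input, and the on-the-fly hyperplane generation are assumptions, not the `agidb-sensory` API.

```rust
/// SplitMix64: a small deterministic PRNG so the sketch needs no external crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Standard normal sample via the Box-Muller transform.
    fn next_gaussian(&mut self) -> f64 {
        let u1 = ((self.next_u64() >> 11) as f64 + 1.0) / (1u64 << 53) as f64; // (0, 1]
        let u2 = (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64; // [0, 1)
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}

/// Hypothetical helper: bit i is true when the dot product of `x` with the
/// i-th random Gaussian hyperplane is non-negative. Hyperplanes are regenerated
/// from `seed` on every call, so equal seeds always give equal hyperplanes.
fn simhash(x: &[f64], bits: usize, seed: u64) -> Vec<bool> {
    let mut rng = SplitMix64(seed);
    (0..bits)
        .map(|_| {
            let dot: f64 = x.iter().map(|xi| xi * rng.next_gaussian()).sum();
            dot >= 0.0
        })
        .collect()
}

/// Number of positions where two bit vectors differ.
fn hamming(a: &[bool], b: &[bool]) -> usize {
    a.iter().zip(b).filter(|(x, y)| x != y).count()
}
```

The expected Hamming distance between two hashes grows with the angle between the inputs, which is what lets a dense embedding be compared in binary hypervector space. A production version would presumably cache the hyperplane matrix instead of regenerating it per call.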
Decision gate at v2.1
Phase 16, week 52. The brain-alignment work is judged on:
- BAMS suite open-source with reproducible baselines
- agidb wins BAMS in at least 3 of 6 functional networks (target: DMN, dorsal attention, frontoparietal)
- ICLR 2026 MemAgents paper accepted (or CCN 2026)
- Multimodal pipeline p50 latency ≤ 2s on a laptop CPU
If yes → proceed to seed round + v2.2 cognitive engine. If no → reassess brain-alignment as a v2.2 retry or deprioritize.
See BRAIN_ALIGNMENT.md and BAMS_BENCHMARK.md for full detail.
v2.2 — Cognitive engine (2027)
The first cognitive engine on top of the substrate. Adds operations that turn stored memory into active reasoning.
What ships
- Pattern completion via Hopfield networks. Modern Hopfield (Ramsauer et al. 2021) over stored signatures. Given a partial cue, retrieve the full pattern. Implements “remembering” as continuous attractor dynamics over the signature space, not just nearest-neighbor lookup.
- AGM belief revision. Alchourrón-Gärdenfors-Makinson belief revision semantics. New evidence triggers principled revision of dependent beliefs. Replaces the v2.0 ad-hoc confidence math.
- Analogical retrieval via HDC binding. “If A is to B as X is to ?”: bind(A, B) ⊕ X → answer signature. Recover via nearest-neighbor cleanup. Classic VSA analogy mechanism.
- Learned projection (if BAMS plateaus). Article XVIII clause 5 explicitly leaves this open for v2.2+. Replace Charikar 2002 random projection with a small MLP optimized against BAMS, only if the random baseline saturates.
- Background consolidation scheduler. Tokio-task-based, runs during idle periods. v2.0 ships synchronous consolidate(); v2.2 makes it automatic.
- Procedure success-rate-based retrieval reweighting. Floor 5 procedures with execution traces now influence which skills get retrieved in similar contexts.
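The analogical retrieval item above ("bind(A, B) ⊕ X → answer signature", then nearest-neighbor cleanup) can be sketched with binary hypervectors where binding is bitwise XOR. This is a toy under stated assumptions: the 8192-bit width matches the self-vector size mentioned elsewhere in the roadmap, while the type alias, function names, and codebook layout are hypothetical.

```rust
const WORDS: usize = 128; // 128 x 64 = 8192 bits
type Hv = [u64; WORDS];

/// SplitMix64: a small deterministic PRNG so the sketch needs no external crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    fn random_hv(&mut self) -> Hv {
        std::array::from_fn(|_| self.next_u64())
    }
}

/// XOR binding; applying the same operand twice undoes the binding.
fn bind(a: &Hv, b: &Hv) -> Hv {
    std::array::from_fn(|i| a[i] ^ b[i])
}

fn hamming(a: &Hv, b: &Hv) -> u32 {
    a.iter().zip(b.iter()).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Cleanup memory: return the name of the stored vector closest to `query`.
fn cleanup<'a>(query: &Hv, codebook: &[(&'a str, Hv)]) -> Option<&'a str> {
    codebook
        .iter()
        .min_by_key(|(_, hv)| hamming(query, hv))
        .map(|(name, _)| *name)
}

/// "A is to B as X is to ?": extract the relation A xor B, apply it to X, clean up.
fn solve_analogy<'a>(a: &Hv, b: &Hv, x: &Hv, codebook: &[(&'a str, Hv)]) -> Option<&'a str> {
    let relation = bind(a, b);
    cleanup(&bind(&relation, x), codebook)
}
```

In this sketch the analogy resolves only when both pairs share the same relation vector (for example, `paris = bind(france, capital_of)` and `tokyo = bind(japan, capital_of)`). The cleanup step is what tolerates noise in the cue.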
Decision gate at v2.2
End of 2027. Three design-partner production deployments running >6 months. Multi-week zero-touch uptime. At least one revenue-generating customer. If yes → v2.3 fundraise.
Why this comes after brain-alignment
Pattern completion, AGM, and analogical retrieval are cognitive operations on top of the substrate. They need a credible substrate first (v2.0), benefit from brain-aligned encoders (v2.1), and add new capabilities on top. If brain-alignment validates the representations, v2.2 turns those representations into reasoning.
v2.3 — Causal layer (2028)
Add causal reasoning capabilities. The substrate becomes capable of representing not just what happened but why it happened.
What ships
- Causal claim storage. First-class `CausalClaim` type: “A caused B” with conditions, confidence, evidence. Stored as bound HDC patterns over (cause, effect, condition).
- Intervention semantics. Pearl-style do-calculus operations over stored causal claims. “What would have happened if X hadn’t occurred?” answered via counterfactual replay.
- World model fragments. First-class `WorldModel` type. Causal claims compose into world model fragments. Models can be composed for prediction.
- Causal-JEPA-style object-centric masking (if relevant work has matured). Object-level latent prediction for compositional causal reasoning.
- On-line learning state. Persisted hyperparameter and online-learning rate state, recovers correctly across restarts.
- HRR (Holographic Reduced Representations) as a secondary VSA format. Real-valued vectors with circular convolution binding. Useful for analog scalar values (temperatures, scores, probabilities) that BSC can’t represent natively.
Decision gate at v2.3
End of 2028. Enterprise deal pipeline established. At least 3 paying customers with 6+ figure annual contracts. Series A raised.
v2.4 — Production-grade (2029-2030)
The system goes from research-credible to enterprise-grade. Distributed mode, hardened safety, BCI experimentation.
What ships
- Distributed mode (still optional). Replication, sharding by entity or session, cross-region failover. Embedded-first OSS remains canonical (constitution article III).
- Formal safety guarantees on self-modification. When the agent unlearns or revises core beliefs, formal guarantees about what was changed, audit trail completeness, and recoverability. Type-system enforced where possible.
- Enterprise tier: SSO, audit-log encryption, role-based access, compliance certifications (SOC 2, HIPAA, ISO 27001).
- BCI input experimental. `agidb-bci` crate. EEG/MEG ingestion via Brain-JEPA (arxiv 2406.19260) or signal-JEPA encoders. Surprise gating extends to neural signals.
- Multi-agent shared memory. Beyond the single-agent focus of earlier versions: shared memory pools, conflict resolution, federated consolidation. Inspired by BMAS multi-agent architectures.
Decision gate at v2.4
Mid-2030. Revenue > $5M ARR. Series B raised. agidb is a production database with enterprise deployments and a >$50M valuation.
v2.5 — AGI-grade (2031)
The full cognitive substrate. By year 5, agidb is either the de facto AGI substrate (because frontier labs and OSS AGI projects build on it) or it isn’t (because the field moved past current paradigms — V-JEPA → next paradigm, HDC → spiking, etc).
What ships (if the bet pays off)
- Closed-loop self-modification. The agent can rewrite its own goals, beliefs, and even procedures, with formal safety boundaries.
- Causal reasoning over learned beliefs as a core API.
- Cognitive engine fully realized. Pattern completion, AGM revision, analogical retrieval, causal reasoning, sleep-like consolidation, brain-aligned encoding — all integrated into one substrate.
- Established interop standards. Standard formats for cognitive substrate (.agidb files), shared benchmarks (BAMS evolved to BAMS-2), and interop with the broader AGI ecosystem (OpenCog Hyperon, Monty, frontier-lab proprietary substrates).
- Production-grade with formal verification where applicable. Critical paths formally verified for safety properties.
If the bet doesn’t pay off
The field will have moved past current paradigms. V-JEPA may be replaced by something post-JEPA. HDC may be replaced by spiking neural networks on neuromorphic hardware. agidb v2.5 either pivots aggressively (becomes v3) or sunsets gracefully with substantial OSS legacy. Both outcomes are acceptable if the journey produces real value along the way.
What stays constant across the 5 years
- Constitution articles I-XVIII. The principles are the invariants. Code rots; principles don’t.
- The wedge. Content-addressable HDC retrieval, bi-temporal supersession, embedded Rust binary, no LLM in read path, first-class cognitive primitives, non-destructive unlearn. These differentiators don’t change.
- The audience. Developers building autonomous agents, regulated industries, AGI-curious researchers, local-first builders.
- The OSS-first commitment. The embedded engine stays free, complete, self-hostable, Apache-2.0.
What evolves across the 5 years
- The encoder stack. V-JEPA 2 → V-JEPA 3 (likely 2026-2027) → post-JEPA paradigm (2028+). agidb tracks the best available open-weight encoders.
- The VSA format. Default BSC throughout, with HRR as secondary in v2.3, SBDR (sparse) as candidate for v2.5.
- The brain-encoding ground truth. TRIBE v2 → TRIBE v3 (likely 2027) → whatever the next-best brain encoder is.
- The benchmark surface. LongMemEval/LoCoMo/BEAM + BAMS in v2.1. New benchmarks emerge; agidb runs them all.
- The substrate’s scale. v2.0 single-laptop; v2.4 enterprise multi-node; v2.5 substrate for the AGI ecosystem.
Risks and mitigations
| Risk | Probability | Mitigation |
|---|---|---|
| v2.0 decision gate fails | 30% | Reposition path defined; sochdb code valuable even if standalone |
| BAMS paper rejected at all venues | 15% | Multiple venue options; benchmark stands on its own as a public artifact |
| Direct Rust HDC competitor emerges | 30% | Move fast on v2.1 brain-alignment; differentiate on cognitive primitives |
| Frontier lab open-sources a competing substrate | 20% | Unlikely (no signals as of May 2026); agidb’s OSS-first commitment matches |
| V-JEPA 2 deprecated by post-JEPA paradigm | 20% by 2028 | Trait-based encoder abstraction; swap encoders without rewriting substrate |
| TRIBE v2 replaced by TRIBE v3 mid-cycle | 60% by 2027 | BAMS protocol is version-aware; recalibration documented |
| Funding environment for deep-tech infra deteriorates | 30% | Bootstrap-friendly architecture; revenue paths from enterprise contracts |
| Founder burnout over 5 years | 40% | Realistic milestone pacing; v2.0 ship at month 9 buys credibility for slower v2.2+ |
| Major safety incident in deployed agents | 20% | Constitution article on safety; cascading unlearn + audit log makes incidents recoverable |
Why this trajectory makes sense
Three reasons.
- Each version is independently valuable. v2.0 ships as a credible substrate even if v2.1+ never happens. v2.1 ships as a credible brain-aligned substrate with a workshop paper even if v2.2+ never happens. The optionality compounds.
- The cognitive primitives compound. Goals + beliefs + sensory + self-model + unlearn (v2.0) → multimodal + brain-alignment (v2.1) → pattern completion + analogical retrieval + AGM (v2.2) → causal claims + world models (v2.3) → BCI + multi-agent (v2.4) → closed-loop self-mod (v2.5). Each version’s capability requires the previous version’s foundation.
- The competitive landscape favors a 5-year horizon. Mem0 ($24M, Series A), Letta, Zep, and Cognee are all racing on application-layer agent memory. None have committed to a 5-year substrate roadmap. By month 12, agidb is the only published cognitive substrate with brain-aligned evaluation. By year 3, the gap widens. By year 5, agidb is either the substrate or it isn’t, but no other team will have run this play.
The single non-negotiable
If at any point during the 5-year trajectory the constitution is violated to chase a feature, a customer, or a paper — the bet has been lost regardless of how good the numbers look. The substrate’s value compounds because the principles don’t move. Pivoting on principles ends the project; pivoting on tactics is normal.
See CONSTITUTION.md.
agidb — Roadmap
The week-by-week phase plan from where we are today (sochdb v1 phases 0, 1, 2, 4, and 6 complete, rebranded to agidb v2) through v2.0 launch at month 9 and v2.1 brain-alignment ship at month 12. Sixteen phases total. Decision gate binding at week 12.
Status: weeks counted from agidb v2 kickoff (rebrand from sochdb v1). Phases 0, 1, 2, 4, 6 already complete from sochdb. Remaining critical path: phases 3, 5, 9-13 for v2.0; phases 14-16 for v2.1.
The 16 phases at a glance
| # | Phase | Weeks | Status | Version |
|---|---|---|---|---|
| 0 | Setup | — | ✅ done (sochdb v1) | inherited |
| 1 | HDC kernel | — | ✅ done (sochdb v1) | inherited |
| 2 | Storage | — | ✅ done (sochdb v1) | inherited |
| 3 | Extraction (GLiNER) | 1-4 | ⬜ | v2.0 critical |
| 4 | Binding + recall | — | ✅ done (sochdb v1) | inherited |
| 5 | MCP + Python | 5-8 | ⬜ | v2.0 critical |
| 6 | Consolidation | — | ✅ done (sochdb v1) | inherited |
| 7 | Decision gate | 11-13 | ⬜ | binding |
| 8 | Hardening + launch | 31-36 | ⬜ | v2.0 ship |
| 9 | Cognitive primitives (goals + beliefs) | 13-18 | ⬜ | v2.0 wedge |
| 10 | Sensory + self-model | 19-22 | ⬜ | v2.0 |
| 11 | Unlearn API | 23-25 | ⬜ | v2.0 |
| 12 | Neurosymbolic interface | 26-27 | ⬜ | v2.0 |
| 13 | Cognitive benchmarks | 28-30 | ⬜ | v2.0 |
| 14 | Multimodal sensory (V-JEPA 2 + Wav2Vec-BERT + Llama-3.2-3B) | 37-42 | ⬜ | v2.1 (gated) |
| 15 | Brain-calibrated surprise | 43-46 | ⬜ | v2.1 (gated) |
| 16 | BAMS benchmark + ICLR paper | 47-52 | ⬜ | v2.1 (gated) |
Phase ordering rationale
The ordering reflects a mix of engineering and strategic constraints:
- Phase 3 first — extraction unlocks tier B recall and alias resolution. Without it, the recall cascade is missing its most important tier. Also unlocks belief extraction, which phase 9 needs.
- Phase 5 second — MCP + Python bindings make the engine consumable. Demos and design partners need this before we can run the decision gate.
- Phase 7 at week 12 — the binding decision gate happens after MCP/Python (so we can run real benchmarks against Mem0/Letta/Zep) but before the cognitive primitives. If the substrate doesn’t beat incumbents on the standard agent-memory benchmarks, the cognitive-primitive bet doesn’t get to run.
- Phases 9-13 after decision gate — only build the cognitive primitives if the substrate wins the gate. Otherwise reposition or retreat.
- v2.1 phases 14-16 only on “Commit” — constitutionally gated. No brain-alignment work if v2.0 substrate doesn’t earn its credibility first.
Pre-week-0 — Rebrand and namespace lock
Before the week-counter starts: rename sochdb → agidb across the codebase, push to GitHub, secure namespaces.
Tasks:
- ☐ Rename workspace crates: `sochdb-core` → `agidb-core`, `sochdb-cli` → `agidb-cli`, etc.
- ☐ Update `Cargo.toml` package names, dependency references, README path links.
- ☐ Update doc references from “sochdb” to “agidb” (~50 places across docs/).
- ☐ Rename storage error type: `SochError` → `AgidbError`.
- ☐ Update the manifest format string from “sochdb-v0.1” to “agidb-v2.0”.
- ☐ `cargo build --workspace && cargo test --workspace`: all 44 tests still pass.
- ☐ Buy `agidb.ai`, `agidb.dev`, `agidb.io`, `agidb.co`.
- ☐ Create `github.com/agidb` organization, transfer existing sochdb commits.
- ☐ Reserve `agidb` crate name on crates.io (publish empty 0.0.1 placeholder).
- ☐ Reserve `agidb` package on PyPI (placeholder).
- ☐ Reserve `agidb` on npm (placeholder, even if no JS pkg planned, for namespace hygiene).
- ☐ Send formal prior-inventions email to Naman at Utkrusht.ai (this is the legal hygiene step you mentioned).
Exit criterion: the codebase compiles under the new name, all 44 tests pass, the GitHub org exists, the four domains are locked, the crates.io/PyPI/npm placeholders are claimed. Estimated effort: 1-2 weekends.
This is not counted as a week of the build. It’s prerequisite hygiene.
Weeks 1-4 — Phase 3: Extraction (GLiNER)
Goal: raw text in, structured triples + canonical entities + parsed time anchors + belief candidates out.
Week 1
- ☐ Vendor GLiNER ONNX model + tokenizer code from ctxgraph repo. Compile under `agidb-extract` crate.
- ☐ Wire `ort` (ONNX runtime) into the workspace. Verify CPU-only inference path works.
- ☐ Add `agidb-extract::gliner::GLiNERExtractor` with `extract(text, entity_types) -> Vec<Entity>` API.
- ☐ Write unit tests: 10 hand-labeled observations, check that entities + spans extracted correctly.
Week 2
- ☐ Build `agidb-extract::relations`: given entities + sentence context, extract `(subj, pred, obj)` triples.
- ☐ Add predicate-canonicalization trie (“recommended”, “suggested”, “told me about” → `recommends`).
- ☐ Build `agidb-extract::time`: parse “last weekend”, “two months ago”, ISO dates, etc., into `TimeRange`. Use `chrono_english` for casual phrasings.
- ☐ Build `agidb-extract::alias`: fuzzy match new mentions to existing canonical concepts (exact match + Levenshtein ≤ 3 for typos).
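The alias rule above (exact match, then Levenshtein distance ≤ 3) could look roughly like the sketch below. The function names and the case-insensitive comparison are assumptions for illustration; the roadmap only fixes the distance threshold.

```rust
/// Classic dynamic-programming Levenshtein distance over Unicode scalar values.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] holds the distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            let best = (prev[j] + cost) // substitute or keep
                .min(prev[j + 1] + 1) // delete from `a`
                .min(curr[j] + 1); // insert into `a`
            curr.push(best);
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Hypothetical resolver: returns the closest canonical name within `max_dist`
/// edits, comparing case-insensitively (an assumption). An exact match has
/// distance 0 and therefore always wins.
fn resolve_alias<'a>(mention: &str, canonical: &[&'a str], max_dist: usize) -> Option<&'a str> {
    let needle = mention.to_lowercase();
    canonical
        .iter()
        .map(|c| (levenshtein(&needle, &c.to_lowercase()), *c))
        .filter(|(dist, _)| *dist <= max_dist)
        .min_by_key(|(dist, _)| *dist)
        .map(|(_, c)| c)
}
```

A fixed threshold of 3 is generous for short names, so a real resolver might scale the threshold with the mention length.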
Week 3
- ☐ Wire extraction into `Agidb::observe(text)`: replace today’s “pre-extracted triples only” path with full pipeline.
- ☐ Property tests: 50 synthetic observations with known triples; check F1 > 0.85.
- ☐ Build gold-set evaluation: 100 hand-labeled observations from realistic agent-conversation data; record F1, precision, recall.
- ☐ Activate tier B in the recall cascade (now that triples exist with proper canonicalization).
- ☐ Activate alias resolution in tier A.
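The F1 > 0.85 check above needs a scoring rule. One plausible reading is exact-match scoring over sets of `(subj, pred, obj)` triples, sketched below. Whether the gold-set evaluation pools triples across observations or averages per observation is not stated, so the pooled form here is an assumption, as are the type and function names.

```rust
use std::collections::HashSet;

type Triple = (String, String, String);

#[derive(Debug)]
struct Prf {
    precision: f64,
    recall: f64,
    f1: f64,
}

/// Convenience constructor for a triple (hypothetical helper).
fn t(s: &str, p: &str, o: &str) -> Triple {
    (s.to_string(), p.to_string(), o.to_string())
}

/// Exact-match precision, recall, and F1 of predicted triples against gold triples.
fn score(predicted: &HashSet<Triple>, gold: &HashSet<Triple>) -> Prf {
    let true_pos = predicted.intersection(gold).count() as f64;
    let precision = if predicted.is_empty() { 0.0 } else { true_pos / predicted.len() as f64 };
    let recall = if gold.is_empty() { 0.0 } else { true_pos / gold.len() as f64 };
    let f1 = if precision + recall == 0.0 {
        0.0
    } else {
        2.0 * precision * recall / (precision + recall)
    };
    Prf { precision, recall, f1 }
}
```

With three of four gold triples recovered and one spurious prediction, precision, recall, and F1 all come out at 0.75, which would fail the 0.85 bar.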
Week 4
- ☐ Build belief extractor: detect “X said Y”, “X believes Y”, “X claimed Y” patterns; emit `Belief` candidates with confidence priors (0.5-0.8 depending on predicate).
- ☐ Integration tests for full observe pipeline: text in → episode stored, triples in redb, signature in mmap, belief candidates queued.
- ☐ Benchmark: 100 observations/sec on a laptop CPU end-to-end.
- ☐ Documentation update: `LAYER_2_EXTRACTION.md` reflects shipped behavior, not aspirational.
Exit criterion: cargo test -p agidb-extract passes ≥30 new tests. F1 > 0.85 on the 100-sample gold set. Tier B activates correctly in recall(). Phase 3 complete.
Weeks 5-8 — Phase 5: MCP + Python
Goal: make agidb consumable from outside the Rust workspace. MCP server + Python wheels.
Week 5
- ☐ Build `agidb-mcp` crate. MCP server skeleton over stdio + JSON-RPC.
- ☐ Expose MCP tools: `observe`, `recall`, `consolidate`, `between`. (Goals/beliefs added later, post-phase-9.)
- ☐ Tool schemas: JSON-Schema input/output for each, with examples.
- ☐ Smoke-test against Claude Desktop: register `agidb` as an MCP server, observe + recall via Claude Desktop chat.
Week 6
- ☐ Build `agidb-py` crate. pyo3 bindings, async via pyo3-asyncio.
- ☐ Expose: `Agidb.open`, `observe`, `recall`, `consolidate`, `set_goal` (stub for now), `assert_belief` (stub for now).
- ☐ Build maturin pipeline. Local `pip install -e .` works.
- ☐ Type stubs: `agidb.pyi` for IDE support.
Week 7
- ☐ CI: build wheels for macOS (arm64 + x86), Linux (x86 + arm64), Windows (x86).
- ☐ Test wheels in fresh venvs across all platforms; verify imports and basic ops.
- ☐ Quickstart Python notebook: 50 LOC end-to-end demo (observe a conversation, recall, consolidate).
- ☐ MCP server: configurable port + transport (stdio default + optional WebSocket for non-Anthropic clients).
Week 8
- ☐ Documentation: Python API reference, MCP tool reference.
- ☐ Example agents: 3 small example agents (research-summarizer, journal, todo-helper) using agidb-py.
- ☐ Performance sanity-check across all bindings: end-to-end recall p95 < 100ms even through Python/MCP layer.
Exit criterion: pip install agidb works from a fresh venv. Claude Desktop can use agidb as a memory tool. 3 example agents run. Phase 5 complete.
Weeks 9-10 — Benchmark harness build (phase 7 prep)
Goal: build the harness before the cognitive primitives, so the decision gate at week 12 has working benchmarks ready to run.
Week 9
- ☐ Build `agidb-bench` crate.
- ☐ Implement LongMemEval-S harness: load dataset, run agent loop with agidb backend, score with the official LongMemEval grading prompt.
- ☐ Implement equivalent harness for Mem0 baseline (call Mem0’s Python SDK from agidb-bench via subprocess).
- ☐ Six-metric output: BLEU, F1, LLM-judge, token cost, p95 latency, noisy-cue degradation.
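Of the six metrics, p95 latency is the one most sensitive to how the percentile is defined. The sketch below uses the nearest-rank definition with integer arithmetic. The choice of nearest-rank (rather than interpolation) and the function names are assumptions; the 50 ms target is the one stated for the Mem0 comparison.

```rust
/// Nearest-rank percentile: the smallest sample such that at least `pct`
/// percent of all samples are less than or equal to it.
fn percentile_nearest_rank(samples_ms: &[f64], pct: u32) -> Option<f64> {
    if samples_ms.is_empty() || pct > 100 {
        return None;
    }
    let mut sorted = samples_ms.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // rank = ceil(pct * n / 100), computed in integers to avoid float rounding.
    let rank = (pct as usize * sorted.len() + 99) / 100;
    Some(sorted[rank.max(1) - 1])
}

/// True when the p95 of the samples is strictly below the target.
fn meets_latency_target(samples_ms: &[f64], target_ms: f64) -> bool {
    matches!(percentile_nearest_rank(samples_ms, 95), Some(p95) if p95 < target_ms)
}
```

For example, `meets_latency_target(&latencies, 50.0)` would express the "p95 < 50ms" threshold. Whichever definition the harness adopts, it should be fixed before the run so the reported numbers stay comparable across systems.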
Week 10
- ☐ LoCoMo harness — 10+ session conversations, memory consistency scoring.
- ☐ BEAM harness — millions-of-tokens scale, contradiction resolution.
- ☐ Baselines: Mem0, Letta, Zep/Graphiti (each via their respective Python SDK; subprocess invocation).
- ☐ Reproducibility kit: docker-compose for harness + all baselines, fixed seeds, committed dataset SHAs.
- ☐ Commit harness code by EOW10 (constitution article XIII: “harness committed by week 8” — we’re slightly behind but inside the 13-week window).
Exit criterion: agidb-bench run --suite all --systems agidb,mem0,letta,zep produces a JSON report with the six metrics across the three benchmarks. Reproducible from a docker container.
Weeks 11-13 — Phase 7: Decision gate (binding)
Goal: run the benchmarks, publish results, make the binding commit/reposition/retreat decision.
Week 11
- ☐ Commit thresholds (constitution article XIII: “thresholds committed by week 10” — we’re a week behind but inside the 13-week window). Write them down publicly so they can’t be quietly moved later.
- ☐ Run full benchmark suite. Three movies’ worth of compute.
- ☐ Sanity-check results against published numbers from Mem0/Letta/Zep papers; investigate anything off by > 5%.
Week 12 — the actual gate
- ☐ Final benchmark run. Raw logs preserved.
- ☐ Compare results against the three thresholds:
  - Commit: agidb wins/ties on accuracy, beats on latency 3×+, beats on token cost 3×+, wins noisy-cue degradation. → proceed to phases 9-13 + v2.1.
  - Reposition: agidb within 3pp of Mem0 F1 AND ≥10× memory savings. → ship as “agidb-lite”, skip v2.1, no fundraise.
  - Retreat: more than 10pp behind on accuracy, no closing path. → fold back into ctxgraph.
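Writing the three thresholds as code is one way to keep them from being quietly moved. The sketch below encodes them under stated assumptions: a "tie" on accuracy means within 1pp (the tolerance the landing page gives for LongMemEval-S), the checks run in the order Commit, Retreat, Reposition, and results that match none of the three fall into an `Undetermined` bucket that the roadmap does not define. All type and field names are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum GateOutcome {
    Commit,
    Reposition,
    Retreat,
    /// Not defined by the roadmap; added so the function is total.
    Undetermined,
}

struct GateInputs {
    /// agidb accuracy minus the best baseline, in percentage points (negative = behind).
    accuracy_gap_pp: f64,
    /// agidb F1 minus Mem0 F1, in percentage points.
    f1_gap_vs_mem0_pp: f64,
    /// Baseline p95 latency divided by agidb p95 latency.
    latency_speedup: f64,
    /// Baseline tokens per query divided by agidb tokens per query.
    token_cost_reduction: f64,
    /// Baseline memory footprint divided by agidb memory footprint.
    memory_savings: f64,
    wins_noisy_cue: bool,
}

fn classify(m: &GateInputs) -> GateOutcome {
    let commit = m.accuracy_gap_pp >= -1.0
        && m.latency_speedup >= 3.0
        && m.token_cost_reduction >= 3.0
        && m.wins_noisy_cue;
    if commit {
        GateOutcome::Commit
    } else if m.accuracy_gap_pp < -10.0 {
        GateOutcome::Retreat
    } else if m.f1_gap_vs_mem0_pp >= -3.0 && m.memory_savings >= 10.0 {
        GateOutcome::Reposition
    } else {
        GateOutcome::Undetermined
    }
}
```

The Retreat condition here ignores the "no closing path" qualifier, which is a judgment call rather than a metric.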
Week 13
- ☐ Decision communicated to (a) self, (b) Naman/Utkrusht context (informational), (c) any prospective design partners.
- ☐ If Commit: phase 9 starts week 13. (Phase 9 takes 6 weeks → ends week 18.)
- ☐ If Reposition: pivot the messaging, defer phases 9-13, focus on phase 8 hardening as “agidb-lite”, skip v2.1.
- ☐ If Retreat: write a public post-mortem, transfer code back into ctxgraph repo, retire the agidb name.
Exit criterion (assuming Commit): decision made and publicly logged. Phase 9 begins. Phase 7 complete.
The rest of this roadmap assumes Commit. If Reposition or Retreat, see ROADMAP_REPOSITION.md or ROADMAP_RETREAT.md (TBD docs that get written if those branches activate).
Weeks 13-18 — Phase 9: Cognitive primitives (the wedge)
Goal: Goal and Belief as first-class typed shapes with state machines, revision audit, HDC signatures. The thing no other agent memory system has.
Week 13
- ☐ Add `agidb-core::goal` module. Types: `Goal`, `GoalState`, `GoalPatch`, `GoalTree`, `SuccessCriterion`. State-machine transition validator.
- ☐ Add `agidb-core::belief` module. Types: `Belief`, `BeliefRevision`, `Evidence`, `RevisionReport`.
- ☐ Two new redb tables: `goals`, `beliefs`. Migration code: open v2.0 db without these tables → create them empty.
- ☐ Property tests: goal state machine invariants (Completed/Abandoned are terminal; pause/resume preserves history).
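A minimal sketch of a transition validator that would satisfy the two stated invariants (Completed/Abandoned are terminal; pause/resume preserves history). The `Active` and `Paused` states, the allowed-transition table, and the field names are assumptions; only `Goal` and `GoalState` are named in the plan.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GoalState {
    Active,
    Paused,
    Completed,
    Abandoned,
}

#[derive(Debug, PartialEq, Eq)]
struct InvalidTransition {
    from: GoalState,
    to: GoalState,
}

struct Goal {
    description: String,
    state: GoalState,
    /// Every state the goal has been in, oldest first; never truncated.
    history: Vec<GoalState>,
}

impl Goal {
    fn new(description: &str) -> Self {
        Goal {
            description: description.to_string(),
            state: GoalState::Active,
            history: vec![GoalState::Active],
        }
    }

    fn is_terminal(&self) -> bool {
        matches!(self.state, GoalState::Completed | GoalState::Abandoned)
    }

    /// Apply a transition if the (assumed) table allows it; otherwise leave
    /// the goal untouched and report the rejected pair.
    fn transition(&mut self, to: GoalState) -> Result<(), InvalidTransition> {
        use GoalState::*;
        let allowed = matches!(
            (self.state, to),
            (Active, Paused) | (Paused, Active) | (Active, Completed)
                | (Active, Abandoned) | (Paused, Abandoned)
        );
        if !allowed {
            return Err(InvalidTransition { from: self.state, to });
        }
        self.state = to;
        self.history.push(to);
        Ok(())
    }
}
```

Because no pair starting from `Completed` or `Abandoned` appears in the table, terminality falls out of the table itself, which is the property a random-walk test would exercise.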
Week 14
- ☐ Implement `Agidb::set_goal`, `revise_goal`, `complete_goal`, `abandon_goal`, `active_goals`, `goal_tree`, `get_goal`.
- ☐ Goal HDC signature derivation: bind description tokens with parent context.
- ☐ Add `belief_revisions` redb table (third v2.0 table this phase).
- ☐ Implement `Agidb::assert_belief`, `revise_belief`, `what_do_i_believe`, `belief_history`, `withdraw_belief`.
Week 15
- ☐ Belief revision math: Bayesian-style confidence update on new evidence. Append `BeliefRevision` to log on every change.
- ☐ LLM-assisted revision (constitution article IV amendment): when evidence is ambiguous, call an LLM at write time to judge contradiction. Structured prompt → structured `RevisionDecision`. Document which LLMs are supported (Claude, GPT, local Llama via Ollama).
- ☐ Withdraw a belief when its confidence drops below 0.5 (configurable).
- ☐ 100-step goal-mutation property test: random walk through goal state machines never violates invariants.
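The plan says "Bayesian-style" without fixing a formula. One common choice, assumed here, is an odds update: posterior odds = prior odds × likelihood ratio of the new evidence. The sketch also appends a revision record on every change and withdraws the belief below a configurable threshold. Struct fields and function names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct BeliefRevision {
    old_confidence: f64,
    new_confidence: f64,
}

#[derive(Debug)]
struct Belief {
    confidence: f64,
    withdrawn: bool,
    /// Append-only audit log of every confidence change.
    log: Vec<BeliefRevision>,
}

/// Odds-form Bayes update. A likelihood ratio above 1 supports the belief,
/// below 1 counts against it, and exactly 1 leaves it unchanged.
fn bayes_update(prior: f64, likelihood_ratio: f64) -> f64 {
    let prior = prior.clamp(1e-6, 1.0 - 1e-6); // avoid infinite odds
    let odds = prior / (1.0 - prior) * likelihood_ratio;
    odds / (1.0 + odds)
}

impl Belief {
    fn new(confidence: f64) -> Self {
        Belief { confidence, withdrawn: false, log: Vec::new() }
    }

    fn revise(&mut self, likelihood_ratio: f64, withdraw_below: f64) {
        let new_confidence = bayes_update(self.confidence, likelihood_ratio);
        self.log.push(BeliefRevision { old_confidence: self.confidence, new_confidence });
        self.confidence = new_confidence;
        self.withdrawn = new_confidence < withdraw_below;
    }
}

/// Replays the log from an initial confidence. Returns None if the chain is
/// broken, i.e. some entry does not start where the previous one ended.
fn replay(initial: f64, log: &[BeliefRevision]) -> Option<f64> {
    let mut current = initial;
    for rev in log {
        if rev.old_confidence != current {
            return None;
        }
        current = rev.new_confidence;
    }
    Some(current)
}
```

Starting at 0.5, evidence with a likelihood ratio of 4 moves the confidence to 0.8. The `replay` helper mirrors the property that the revision log alone should reconstruct the current confidence.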
Week 16
- ☐ Wire goal-biased retrieval into `recall()`. Active goals’ HDC signatures up-weight related episode matches by `goal_bias_weight * similarity(episode_sig, goal_sig)`.
- ☐ Add `Recall::active_goals` and `Recall::goal_biased` fields.
- ☐ Extend MCP server with goal/belief tools: `set_goal`, `revise_goal`, `assert_belief`, `revise_belief`, `what_do_i_believe`, `active_goals`.
- ☐ Extend Python bindings with the same.
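The goal-bias term `goal_bias_weight * similarity(episode_sig, goal_sig)` can be illustrated with a toy reranker. Several details are assumptions: signatures are shortened to 256 bits, similarity is 1 minus normalized Hamming distance, and with multiple active goals only the best-matching goal contributes. The struct and function names are hypothetical.

```rust
const WORDS: usize = 4; // 256-bit toy signatures
type Sig = [u64; WORDS];

/// 1.0 for identical signatures, about 0.5 for unrelated ones, 0.0 for complements.
fn similarity(a: &Sig, b: &Sig) -> f64 {
    let dist: u32 = a.iter().zip(b.iter()).map(|(x, y)| (x ^ y).count_ones()).sum();
    1.0 - dist as f64 / (WORDS * 64) as f64
}

struct Candidate {
    id: u32,
    /// Score from the unbiased recall path.
    base_score: f64,
    sig: Sig,
}

/// Adds the goal-bias term to every candidate and returns (id, score) pairs,
/// best first. With no active goals the ranking is the unbiased one.
fn goal_biased_rank(cands: &[Candidate], goals: &[Sig], goal_bias_weight: f64) -> Vec<(u32, f64)> {
    let mut scored: Vec<(u32, f64)> = cands
        .iter()
        .map(|c| {
            let best_goal_sim = goals
                .iter()
                .map(|g| similarity(&c.sig, g))
                .fold(0.0_f64, f64::max);
            (c.id, c.base_score + goal_bias_weight * best_goal_sim)
        })
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}
```

Since unrelated binary signatures already score about 0.5, a real implementation might rescale similarity so that only above-chance matches earn a boost.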
Week 17
- ☐ Belief context in recall results: `Recall::beliefs` field populated with beliefs about the queried subject.
- ☐ Concept-level belief lookups: `what_do_i_believe(ConceptId)` fast (indexed by belief.subject).
- ☐ Property test: belief revision log captures every change; replaying the log reconstructs current confidence.
Week 18
- ☐ Integration test: 20-turn agent simulation where goals get set/revised/completed, beliefs get asserted/revised/withdrawn. Verify final state matches expected.
- ☐ Benchmark: `set_goal` ≤ 5ms, `assert_belief` ≤ 5ms, `revise_belief` ≤ 50ms (LLM-assisted path can be slower).
- ☐ Docs update: `COGNITIVE_PRIMITIVES.md` matches shipped behavior.
Exit criterion: 100-step goal mutation test passes. Belief revision audit log captures every change. Goal-biased retrieval working. Phase 9 complete.
Weeks 19-22 — Phase 10: Sensory + self-model
Goal: floor 1 (sensory ring buffer with surprise gating) and floor 7 (learning event log + self-vector EMA).
Week 19
- ☐ Add `agidb-core::sensory` module. Types: `SensoryFrame`, `SensoryData`, `Modality`, ring-buffer logic.
- ☐ New redb table: `sensory_buffer` (with ring-eviction semantics).
- ☐ Implement `Agidb::observe_sensory`, `working_state`, `surprise_score`.
- ☐ Surprise computation: `1 - similarity(new_sig, bundle_of(recent_beliefs))`.
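The surprise formula above can be made concrete for binary signatures. In this sketch, bundling is a bitwise majority vote and similarity is 1 minus normalized Hamming distance. Both choices, the 256-bit toy width, the tie rule, and the behaviour with no recent beliefs are assumptions.

```rust
const WORDS: usize = 4; // 256-bit toy signatures
type Sig = [u64; WORDS];

/// Bitwise majority vote over the inputs; ties resolve to 0 (an assumption).
fn bundle(sigs: &[Sig]) -> Sig {
    let mut out = [0u64; WORDS];
    for w in 0..WORDS {
        for bit in 0..64 {
            let ones = sigs.iter().filter(|s| (s[w] >> bit) & 1 == 1).count();
            if 2 * ones > sigs.len() {
                out[w] |= 1u64 << bit;
            }
        }
    }
    out
}

fn similarity(a: &Sig, b: &Sig) -> f64 {
    let dist: u32 = a.iter().zip(b.iter()).map(|(x, y)| (x ^ y).count_ones()).sum();
    1.0 - dist as f64 / (WORDS * 64) as f64
}

/// 1 - similarity(new_sig, bundle_of(recent_beliefs)).
/// With no recent beliefs everything is treated as maximally surprising.
fn surprise_score(new_sig: &Sig, recent_beliefs: &[Sig]) -> f64 {
    if recent_beliefs.is_empty() {
        return 1.0;
    }
    1.0 - similarity(new_sig, &bundle(recent_beliefs))
}

/// Gate: promote a frame only when its surprise exceeds the threshold.
fn should_promote(surprise: f64, threshold: f64) -> bool {
    surprise > threshold
}
```

Under this similarity, an unrelated random signature scores a surprise near 0.5, so a gating threshold has to be read relative to that baseline.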
Week 20
- ☐ Surprise-gated promotion: sensory frames with `surprise > threshold` (default 0.4) auto-promote to episodic via internal `observe()` call.
- ☐ Add `agidb-core::learning_log` module. New redb table: `learning_events`.
- ☐ Implement `LearningEvent` enum (closed set per constitution XV implication). Emit events from every state-changing operation across the engine.
Week 21
- ☐ Implement `Agidb::what_did_i_learn(since)`: query the learning log.
- ☐ Add `attention_trace` recording to the recall path. When `query.trace_attention = true`, build `AttentionTrace` and emit to learning log.
- ☐ Implement `Agidb::attention_trace(recall_id)` lookup.
Week 22
- ☐ Self-vector implementation. New redb table: `self_vector_history` (originally scheduled for v2.1, brought forward into v2.0 because phase 11’s unlearn needs it). 8192-bit HV, EMA update on each consolidate pass: `self_vec ← (1-α) self_vec + α bundle(consolidated_atoms)`.
- ☐ Implement `Agidb::self_vector`, `self_vector_at(time)`, `self_vector_history`.
- ☐ Wire self-vector update into the consolidation worker (extends phase 6 code).
- ☐ Benchmark: sensory ingest 1000 frames/sec, surprise gating promotes ~5%, learning log writes don’t bottleneck observe.
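An exponential moving average is not defined directly on bits, so the update `self_vec ← (1-α) self_vec + α bundle(consolidated_atoms)` needs some real-valued state. One possible reading, sketched here, keeps a bipolar accumulator per dimension and thresholds it at zero to obtain the binary hypervector. The accumulator representation and all names are assumptions.

```rust
struct SelfVector {
    /// One real-valued accumulator per bit, kept in [-1, 1].
    acc: Vec<f32>,
}

impl SelfVector {
    fn new(bits: usize) -> Self {
        SelfVector { acc: vec![0.0; bits] }
    }

    /// EMA step: acc <- (1 - alpha) * acc + alpha * b, where each bundled bit
    /// is mapped to b = +1 (true) or b = -1 (false).
    fn update(&mut self, bundled: &[bool], alpha: f32) {
        assert_eq!(bundled.len(), self.acc.len(), "dimension mismatch");
        for (a, bit) in self.acc.iter_mut().zip(bundled) {
            let target = if *bit { 1.0 } else { -1.0 };
            *a = (1.0 - alpha) * *a + alpha * target;
        }
    }

    /// Binary view of the self-vector: a bit is set where the accumulator is positive.
    fn to_bits(&self) -> Vec<bool> {
        self.acc.iter().map(|a| *a > 0.0).collect()
    }
}
```

With this representation a single contradictory consolidation pass does not flip a well-established bit, while a sustained run of them does, which is the drift behaviour the exit criterion describes.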
Exit criterion: sensory buffer ingests at target rate. Surprise gating promotes only the novel. Learning log captures every state change. Self-vector drifts with consolidation. Phase 10 complete.
Weeks 23-25 — Phase 11: Unlearn API
Goal: non-destructive cascading unlearn with self-vector subtraction and permanent audit. Constitution article XVI.
Week 23
- ☐ Add
agidb-core::unlearnmodule. Types:UnlearnTarget,UnlearnReport,Tombstone, cascade-graph computation. - ☐ New redb table:
tombstones. - ☐ Cascade-graph algorithm: given a target (Concept/Episode/Belief/Session/Source), compute the full dependency set across episodes, beliefs, semantic atoms, procedures.
- ☐ Property test: cascade-graph correctly identifies all dependents (gold set of 20 hand-traced cascades).
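The cascade computation is essentially a transitive closure over "derived from" edges. A minimal breadth-first sketch, assuming a hypothetical `Node` enum and an adjacency map that points from a row to the rows derived from it:

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Row kinds that can appear in a cascade; ids are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum Node {
    Episode(u32),
    Belief(u32),
    SemanticAtom(u32),
    Procedure(u32),
}

/// `graph[x]` lists the rows derived from `x` (edge direction is an assumption).
pub type DependencyGraph = HashMap<Node, Vec<Node>>;

/// Breadth-first closure of everything reachable from `target`, target included.
/// The `seen` set also makes the walk safe on cyclic graphs.
pub fn cascade(graph: &DependencyGraph, target: Node) -> BTreeSet<Node> {
    let mut seen = BTreeSet::new();
    let mut queue = VecDeque::new();
    seen.insert(target);
    queue.push_back(target);
    while let Some(node) = queue.pop_front() {
        if let Some(dependents) = graph.get(&node) {
            for &dep in dependents {
                if seen.insert(dep) {
                    queue.push_back(dep);
                }
            }
        }
    }
    seen
}
```

A hand-traced gold set can then be compared against the returned `BTreeSet` directly, since its iteration order is deterministic.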
Week 24
- ☐ Implement `Agidb::unlearn(target, reason)`:
  - Compute cascade.
  - Tombstone all affected rows (set `tombstoned_at`).
  - Invalidate signatures in mmap (mark in slot header).
  - Cascade through beliefs: reduce confidence or withdraw; emit `BeliefRevision`.
  - Cascade through semantic atoms: recompute without removed evidence; withdraw if evidence drops below threshold.
  - Self-vector subtraction: `self_vec ← self_vec - α · bundle(tombstoned_sigs)`. Append corrected snapshot to `self_vector_history`.
  - Emit `LearningEvent::Unlearned` (permanent, survives compaction).
- ☐ Implement `Agidb::unlearn_report`, `unlearn_history`, `restore_within_window` (30-day recovery).
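The self-vector subtraction step could look like the sketch below. It assumes the self-vector is kept as per-bit real-valued accumulators and that signatures share one width; the `Snapshot` type and the in-memory history are illustrative stand-ins for `self_vector_history`.

```rust
use std::cmp::Ordering;

/// One entry in a hypothetical in-memory `self_vector_history`.
#[derive(Debug, Clone, PartialEq)]
pub struct Snapshot {
    pub at: u64,
    pub acc: Vec<f32>,
}

/// Majority vote over equal-width signatures, encoded as +1 / -1 (0 on a tie).
pub fn bundle(sigs: &[Vec<bool>]) -> Vec<f32> {
    let bits = sigs.first().map_or(0, Vec::len);
    (0..bits)
        .map(|i| {
            let ones = sigs.iter().filter(|s| s[i]).count();
            match (2 * ones).cmp(&sigs.len()) {
                Ordering::Greater => 1.0,
                Ordering::Less => -1.0,
                Ordering::Equal => 0.0,
            }
        })
        .collect()
}

/// self_vec ← self_vec - α · bundle(tombstoned_sigs).
/// Earlier snapshots are left untouched; the corrected one is appended.
pub fn subtract_unlearned(
    history: &mut Vec<Snapshot>,
    now: u64,
    alpha: f32,
    tombstoned: &[Vec<bool>],
) -> Result<(), &'static str> {
    let current = history.last().ok_or("no self-vector snapshot to correct")?;
    let removed = bundle(tombstoned);
    if removed.len() != current.acc.len() {
        return Err("signature width does not match the self-vector");
    }
    let acc = current.acc.iter().zip(&removed).map(|(s, r)| s - alpha * r).collect();
    history.push(Snapshot { at: now, acc });
    Ok(())
}
```

Appending rather than rewriting keeps the pre-unlearn snapshot available for audit, in line with the non-destructive goal of this phase.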
Week 25
- ☐ Bi-temporal filter in `recall()` extended: tombstoned rows excluded by default; `as_of` queries can still surface them within the 30-day window.
- ☐ Property tests: unlearn a 100-episode concept → all references gone within 100ms; self-vector hamming distance to pre-unlearn state matches `α · bundle(tombstoned)`.
- ☐ Compliance test: simulate a GDPR Article 17 request (BySource unlearn). Verify all data gone, audit log entry permanent.
- ☐ MCP + Python expose `unlearn`, `unlearn_history`, `restore_within_window`.
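The visibility rule for tombstoned rows can be written as a small predicate. This sketch makes two assumptions the roadmap does not spell out: timestamps are in seconds, and an `as_of` query surfaces a tombstoned row only when `as_of` predates the tombstone and the tombstone is still inside the recovery window at query time.

```rust
/// Seconds in the 30-day recovery window.
pub const RECOVERY_WINDOW_SECS: u64 = 30 * 24 * 60 * 60;

/// Minimal row shape for the filter; field names other than `tombstoned_at`
/// are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub id: u32,
    pub created_at: u64,
    pub tombstoned_at: Option<u64>,
}

/// Default recall (`as_of = None`) hides every tombstoned row. An `as_of`
/// query sees the row as it stood at that instant, but only while the
/// tombstone is younger than the recovery window at `now`.
pub fn visible(row: &Row, as_of: Option<u64>, now: u64) -> bool {
    match (row.tombstoned_at, as_of) {
        (None, None) => true,
        (None, Some(t)) => row.created_at <= t,
        (Some(_), None) => false,
        (Some(dead), Some(t)) => {
            let recoverable = now.saturating_sub(dead) <= RECOVERY_WINDOW_SECS;
            row.created_at <= t && t < dead && recoverable
        }
    }
}

/// Ids of the rows a recall would be allowed to return.
pub fn recall_ids(rows: &[Row], as_of: Option<u64>, now: u64) -> Vec<u32> {
    rows.iter().filter(|r| visible(r, as_of, now)).map(|r| r.id).collect()
}
```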
Exit criterion: 100-episode unlearn completes in ≤100ms. Self-vector verifiably no longer contains the unlearned concept. Audit log permanent. Phase 11 complete.
Weeks 26-27 — Phase 12: Neurosymbolic interface
Goal: expose the implicit signature↔triple translation as a first-class API. Hybrid queries.
Week 26
- ☐ Add `agidb-ns` crate (already scaffolded). Implement the five translation directions: triple_to_signature, signature_to_triples, cue_to_partial_signature, belief_to_signature, multimodal-factorization stub (full multimodal in phase 14).
- ☐ Implement `Agidb::neurosymbolic_query` with `HybridWeights`. Combines structured triple-pattern matching with fuzzy HDC similarity.
- ☐ Default hybrid weights for `recall()`: `{structured: 0.7, fuzzy: 0.3}`.
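A linear blend is the simplest reading of "hybrid weights". The sketch below assumes each candidate already carries a structured match score and a fuzzy similarity score in [0, 1]; the `Candidate` type and the `rank` function are hypothetical.

```rust
/// Weights for blending structured and fuzzy scores.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct HybridWeights {
    pub structured: f64,
    pub fuzzy: f64,
}

impl Default for HybridWeights {
    /// Default stated in the roadmap for `recall()`.
    fn default() -> Self {
        Self { structured: 0.7, fuzzy: 0.3 }
    }
}

/// A recall candidate with both scores precomputed (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub id: u32,
    pub structured_score: f64,
    pub fuzzy_score: f64,
}

/// Weighted sum of the two scores.
pub fn hybrid_score(c: &Candidate, w: HybridWeights) -> f64 {
    w.structured * c.structured_score + w.fuzzy * c.fuzzy_score
}

/// Candidate ids ordered by descending hybrid score; ties break on lower id.
pub fn rank(candidates: &[Candidate], w: HybridWeights) -> Vec<u32> {
    let mut scored: Vec<(f64, u32)> =
        candidates.iter().map(|c| (hybrid_score(c, w), c.id)).collect();
    scored.sort_by(|a, b| b.0.total_cmp(&a.0).then(a.1.cmp(&b.1)));
    scored.into_iter().map(|(_, id)| id).collect()
}
```

With weights (1, 0) the ranking depends only on `structured_score`, and with (0, 1) only on `fuzzy_score`, which is the extreme-weights property the week 27 tests check.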
Week 27
- ☐ Property tests: bind-then-unbind roundtrip recovers triples with low hamming error. Hybrid weights at extremes (1,0) and (0,1) reduce to pure structured / pure fuzzy.
- ☐ MCP + Python expose `neurosymbolic_query`, `signature_to_triples`, `triples_to_signature`.
- ☐ Docs: `NEUROSYMBOLIC.md` matches shipped behavior.
Exit criterion: hybrid queries with 50/50 weights return appropriately blended results. Phase 12 complete.
Weeks 28-30 — Phase 13: Cognitive benchmarks
Goal: the four cognitive benchmarks no other system can run on itself.
Week 28
- ☐ Build `agidb-bench::cognitive` module with four benchmark suites:
  - Goal consistency: 50 simulated agent sessions with goal trees of depth 3; verify state machine never violates invariants.
  - Belief revision: 50 sequences of (assertion, contradiction, re-assertion) with known correct revision history; verify agidb’s audit log matches.
  - Unlearn cascade: 30 GDPR-style requests; verify cascading removal completes correctly + self-vector reflects subtraction.
  - Multi-floor retrieval: 50 queries requiring information from 2+ floors (e.g. “what did Sarah say about my current goal?”); verify recall returns matches grounded across floors.
Week 29
- ☐ Run benchmarks against agidb. Document thresholds: goal consistency ≥99%, belief revision audit ≥95% match, unlearn cascade ≥99%, multi-floor retrieval F1 ≥80%.
- ☐ Comparison baselines (where they’re applicable): run goal consistency + belief revision against mem0/letta/zep — most will score near 0% because they don’t have these primitives. That’s the point.
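The four thresholds can be encoded as a single gate that a CI job could call. This is an illustrative sketch: the `CognitiveResults` struct and suite labels are hypothetical, while the numeric floors are the ones listed above.

```rust
/// Scores from one benchmark run, each as a fraction in [0, 1].
#[derive(Debug, Clone, Copy)]
pub struct CognitiveResults {
    pub goal_consistency: f64,
    pub belief_revision_match: f64,
    pub unlearn_cascade: f64,
    pub multi_floor_f1: f64,
}

/// F1 = 2·TP / (2·TP + FP + FN); defined as 0 when there is nothing to score.
pub fn f1(true_pos: u32, false_pos: u32, false_neg: u32) -> f64 {
    let denom = 2 * true_pos + false_pos + false_neg;
    if denom == 0 {
        0.0
    } else {
        f64::from(2 * true_pos) / f64::from(denom)
    }
}

/// Names of the suites that fall below their documented threshold.
pub fn failing_suites(r: &CognitiveResults) -> Vec<&'static str> {
    let checks = [
        ("goal consistency", r.goal_consistency, 0.99),
        ("belief revision", r.belief_revision_match, 0.95),
        ("unlearn cascade", r.unlearn_cascade, 0.99),
        ("multi-floor retrieval", r.multi_floor_f1, 0.80),
    ];
    checks
        .iter()
        .filter(|(_, score, min)| score < min)
        .map(|(name, _, _)| *name)
        .collect()
}
```

An empty result from `failing_suites` corresponds to the phase 13 exit criterion.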
Week 30
- ☐ Write up cognitive benchmark whitepaper section (becomes part of the eventual v2.0 launch arxiv paper).
- ☐ Integrate cognitive benchmarks into CI: every PR runs goal consistency + multi-floor retrieval as smoke tests.
Exit criterion: all four cognitive benchmarks pass agidb thresholds. Phase 13 complete.
Weeks 31-36 — Phase 8: Hardening + launch (v2.0 ships)
Goal: turn an in-progress engine into a launchable v2.0 substrate.
Week 31-32
- ☐ Expand the harness: add a fuzz target for `observe` (random text strings) and `recall` (random queries); run 24h fuzz, fix anything that crashes.
- ☐ 30-day soak test: continuous load test simulating an agent that observes 100/day, consolidates daily, recalls 1000/day, unlearns 5/week. Run on a laptop; verify no leaks, no degradation, no corruption.
- ☐ Crash-recovery tests: kill mid-write at 100 random points; verify recovery to last commit.
Week 33
- ☐ Write the v2.0 arxiv whitepaper. ~12 pages. Sections: introduction, related work (mem0/letta/zep/cognee/MemMachine), architecture, benchmark methodology, results, cognitive benchmark results, future work (v2.1 brain-alignment teased here).
- ☐ Internal review.
Week 34
- ☐ Onboard 3-5 design partners. Outreach to: 2 frontier-adjacent startups, 1 regulated-industry team (legal or healthcare), 1 local-first AI builder, 1 academic researcher (Hyperon/Monty-adjacent).
- ☐ Each partner gets a private alpha + a Slack channel + biweekly check-ins.
- ☐ Documentation pass: every public API method has rustdoc with examples.
Week 35
- ☐ Launch blog post draft. Demo video (3 minutes): observe → recall → goal → belief → consolidate → unlearn → self-model query.
- ☐ Public website at agidb.ai. Landing + docs + blog.
- ☐ crates.io publish: `agidb` 0.1.0 + all sub-crates. PyPI publish: `agidb` 0.1.0. MCP-registry publish.
Week 36
- ☐ Public launch. arxiv post. blog post. HN/X/lobste.rs announcements. Mastodon for the federated AI/ML crowd.
- ☐ Office hours for the first 2 weeks post-launch: 1h/day for issues + questions.
- ☐ v2.0 SHIPS. Month 9 milestone reached.
Exit criterion: cargo add agidb and pip install agidb work. 3+ design partners running agidb in something resembling production. arxiv paper posted. Blog post live. 1000+ GitHub stars by end of week 36 (aspirational, not exit-gating). Phase 8 complete. v2.0 LAUNCHED.
Weeks 37-42 — Phase 14: Multimodal sensory (v2.1 begins)
Goal: V-JEPA 2 + Wav2Vec-BERT + Llama-3.2-3B sensory encoders, Charikar 2002 random projection to 8192-bit HVs, VSA multimodal binding.
Gate check: v2.1 work begins ONLY if phase 7 decision was “Commit” AND v2.0 launched successfully. Constitution article XVIII clause 2 + XIII extension.
Week 37
- ☐ Create `agidb-sensory` crate. Add to workspace.
- ☐ Wire `ort` (ONNX runtime) for V-JEPA 2 inference. Download V-JEPA 2 Gigantic-256 weights from HuggingFace (CC BY-NC); pin SHA.
- ☐ Implement `agidb-sensory::vjepa::VJepa2Encoder` with `encode(video: &VideoClip) -> Result<[f32; 1024]>`. Spatial mean pooling of the 8192-token output.
- ☐ Smoke test: encode a 64-frame video clip, verify output shape + reasonable values.
Week 38
- ☐ Wire Wav2Vec-BERT 2.0. Download weights, pin SHA. Implement `agidb-sensory::wav2vec::Wav2VecBertEncoder` with `encode(audio: &AudioClip) -> Result<[f32; 1024]>`. Temporal mean pooling.
- ☐ Wire Llama-3.2-3B as a text encoder (forward pass only, not generation). Implement `agidb-sensory::llama::LlamaTextEncoder` with `encode(text: &str) -> Result<[f32; 3072]>`. Mean pooling of the final-layer (layer-28) hidden state.
- ☐ Inference performance baseline on a laptop: measure CPU latency for each.
Week 39
- ☐ Implement `agidb-sensory::project::HDCProjector`: Charikar 2002 thresholded random projection. Per-encoder seeded matrices.
- ☐ Property tests: same input + same seed → same output (determinism). 1000 random latent pairs → hamming distance ordering preserves cosine distance ordering (Spearman correlation > 0.85).
- ☐ Add `MultimodalEncoder` trait. Each encoder gets `encode_and_project(input) -> Result<HV>`.
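Charikar-style projection assigns one bit per random hyperplane: the bit is set when the latent falls on the non-negative side. The sketch below uses a SplitMix64 generator with Box-Muller sampling so it needs no external crates; the generator choice, the `HdcProjector` layout, and the `Vec<bool>` output are assumptions.

```rust
/// SplitMix64 step: a small deterministic PRNG, used here only for seeding.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Standard normal sample via Box-Muller; u1 is mapped to (0, 1] so ln is finite.
fn gaussian(state: &mut u64) -> f32 {
    let scale = (1u64 << 53) as f64;
    let u1 = ((splitmix64(state) >> 11) as f64 + 1.0) / scale;
    let u2 = (splitmix64(state) >> 11) as f64 / scale;
    ((-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()) as f32
}

/// Seeded random-hyperplane projector from `dim` floats to `bits` bits.
pub struct HdcProjector {
    dim: usize,
    planes: Vec<f32>, // row-major, one row of `dim` weights per output bit
}

impl HdcProjector {
    pub fn new(seed: u64, dim: usize, bits: usize) -> Self {
        assert!(dim > 0, "latent dimension must be positive");
        let mut state = seed;
        let planes = (0..bits * dim).map(|_| gaussian(&mut state)).collect();
        Self { dim, planes }
    }

    /// One bit per hyperplane: true when the dot product is non-negative.
    pub fn project(&self, latent: &[f32]) -> Vec<bool> {
        assert_eq!(latent.len(), self.dim, "latent width must match");
        self.planes
            .chunks_exact(self.dim)
            .map(|row| row.iter().zip(latent).map(|(w, x)| w * x).sum::<f32>() >= 0.0)
            .collect()
    }
}

/// Number of differing bits.
pub fn hamming(a: &[bool], b: &[bool]) -> usize {
    a.iter().zip(b).filter(|(x, y)| x != y).count()
}
```

For the production sizes (for example 1024 → 8192) the plane matrix holds about 8.4 million floats, so a real implementation may prefer to regenerate rows from the seed rather than store them.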
Week 40
- ☐ Implement `agidb-sensory::multimodal::bind_multimodal_episode`: VSA role-filler binding, `episode = ROLE_VIDEO ⊕ sig_v XOR ROLE_AUDIO ⊕ sig_a XOR ROLE_TEXT ⊕ sig_t XOR ROLE_GOAL ⊕ sig_g XOR ROLE_TIME ⊕ sig_time`.
- ☐ Implement modality factorization: `extract_modality_signature(episode_sig, modality)` returns approximate sig + nearest-neighbor cleanup against per-modality codebook.
- ☐ Property test: bind 3 modalities, extract each individually with cleanup, hamming distance to original sig ≤ 200 bits (2.5% of 8192).
Week 41
- ☐ Extend `Agidb::observe_multimodal(video, audio, text, ctx)` API. Wire into layer 3 storage: append per-modality signatures to mmap, store offsets in new `modality_signatures` column on episodes.
- ☐ redb tables: `self_vector_history` (already added in phase 10, schema unchanged) and `encoder_versions` (new).
- ☐ Encoder version mismatch detection: open a db with encoders X, binary uses encoders Y → error with migration message.
- ☐ Extend `recall()` to factor multimodal episodes: per-modality similarity scoring when query specifies a modality preference.
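Encoder mismatch detection amounts to comparing the versions recorded in the database with the ones compiled into the binary. A minimal sketch, assuming versions are stored as name → pinned hash strings; the `EncoderMismatch` type and its message wording are hypothetical.

```rust
use std::collections::BTreeMap;
use std::fmt;

/// Encoder name → pinned weight hash (or version string); shape is an assumption.
pub type EncoderVersions = BTreeMap<String, String>;

/// Error returned when the database and the binary disagree on an encoder.
#[derive(Debug, PartialEq, Eq)]
pub struct EncoderMismatch {
    pub encoder: String,
    pub stored: String,
    pub binary: Option<String>,
}

impl fmt::Display for EncoderMismatch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let binary = self.binary.as_deref().unwrap_or("none");
        write!(
            f,
            "encoder '{}' mismatch: database was written with {}, binary provides {}; \
             migrate or re-encode before opening",
            self.encoder, self.stored, binary
        )
    }
}

impl std::error::Error for EncoderMismatch {}

/// Every encoder recorded in the database must be present in the binary
/// with the same version. Returns the first mismatch in name order.
pub fn check_encoders(
    stored: &EncoderVersions,
    binary: &EncoderVersions,
) -> Result<(), EncoderMismatch> {
    for (name, stored_version) in stored {
        match binary.get(name) {
            Some(v) if v == stored_version => {}
            other => {
                return Err(EncoderMismatch {
                    encoder: name.clone(),
                    stored: stored_version.clone(),
                    binary: other.cloned(),
                })
            }
        }
    }
    Ok(())
}
```

Running this check at open time, before any signature is read, keeps signatures from different encoder versions from being compared against each other.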
Week 42
- ☐ End-to-end benchmark: 30s video + 30s audio + 100 tokens text → encoded → projected → bound → stored. P50 latency ≤ 2s CPU on a laptop.
- ☐ Optional Candle backend: pure-Rust ML inference path as alternative to ONNX. Identical outputs to within 1e-3.
- ☐ MCP + Python expose `observe_multimodal`.
- ☐ Docs update: `LAYER_2_EXTRACTION.md`, `BRAIN_ALIGNMENT.md`, `LAYER_3_STORAGE.md` reflect shipped behavior.
Exit criterion: end-to-end multimodal observe pipeline works. P50 latency ≤ 2s on laptop CPU. Modality factorization works (extract recovers original sig with ≤ 200 bits of noise). Phase 14 complete.
Weeks 43-46 — Phase 15: Brain-calibrated surprise
Goal: empirically fit the surprise threshold θ_brain against TRIBE v2 predicted neural surprise.
Week 43
- ☐ Download TRIBE v2 weights from `huggingface.co/facebook/tribev2` (CC BY-NC; research use). Pin SHA.
- ☐ Build TRIBE v2 inference wrapper. v2.1 uses PyO3 subprocess call to a Python script running TRIBE v2 (because TRIBE’s reference inference is Python; pure-Rust port deferred to v2.2+).
- ☐ Verify TRIBE v2 inference matches published numbers on a sample stimulus (within Pearson r±0.005 of the paper’s reported value on a single subject single movie).
Week 44
- ☐ Acquire Courtois NeuroMod dataset access (open access; requires acknowledgment + email registration).
- ☐ Acquire Algonauts 2025 OOD stimulus files (open access via algonauts.org).
- ☐ Pick a representative subject (e.g. Courtois NeuroMod subject 1) and a held-out movie segment (e.g. Pulp Fiction first 20 minutes).
- ☐ Run TRIBE v2 over the stimulus → predicted BOLD per parcel per TR.
Week 45
- ☐ Compute neural surprise: at each TR, `neural_surprise(t) = || BOLD_pred(t) - sliding_mean(BOLD_pred, ±5 TRs) ||` over associative-cortex parcels (TPJ, dlPFC, DMN regions in Schaefer 1000 atlas).
- ☐ Run agidb’s observe_multimodal pipeline over the same stimulus → signature stream.
- ☐ Compute agidb surprise: at each TR, `agidb_surprise(t) = 1 - hamming_sim(sig(t), bundle(sigs[t-K..t]))`.
- ☐ Fit threshold θ_brain to maximize Pearson correlation between Indicator(agidb_surprise > θ_brain) and Indicator(neural_surprise > σ × mean_neural_surprise) for σ ∈ {1.5, 2.0, 2.5}.
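The threshold fit can be sketched as a grid search over candidate θ values, scoring each by the Pearson correlation between the two indicator series. The grid, the function names, and the choice to skip degenerate candidates (where an indicator is constant) are assumptions.

```rust
/// Pearson correlation; None when lengths differ, input is empty,
/// or either series has zero variance.
pub fn pearson(x: &[f64], y: &[f64]) -> Option<f64> {
    if x.len() != y.len() || x.is_empty() {
        return None;
    }
    let n = x.len() as f64;
    let mx = x.iter().sum::<f64>() / n;
    let my = y.iter().sum::<f64>() / n;
    let (mut sxy, mut sxx, mut syy) = (0.0, 0.0, 0.0);
    for (a, b) in x.iter().zip(y) {
        let (dx, dy) = (a - mx, b - my);
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if sxx == 0.0 || syy == 0.0 {
        None
    } else {
        Some(sxy / (sxx * syy).sqrt())
    }
}

/// 1.0 where the series exceeds `cut`, else 0.0.
fn indicator(series: &[f64], cut: f64) -> Vec<f64> {
    series.iter().map(|&v| if v > cut { 1.0 } else { 0.0 }).collect()
}

/// Grid search for θ maximizing the correlation between
/// Indicator(agidb_surprise > θ) and Indicator(neural_surprise > σ·mean).
/// Returns (θ, r); candidates with a constant indicator are skipped.
pub fn fit_theta(agidb: &[f64], neural: &[f64], sigma: f64, grid: &[f64]) -> Option<(f64, f64)> {
    let mean_neural = neural.iter().sum::<f64>() / neural.len() as f64;
    let target = indicator(neural, sigma * mean_neural);
    grid.iter()
        .filter_map(|&theta| pearson(&indicator(agidb, theta), &target).map(|r| (theta, r)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
}
```

Running `fit_theta` once per σ in {1.5, 2.0, 2.5} and comparing the resulting correlations gives the three candidate fits named above.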
Week 46
- ☐ Validate calibration on a held-out movie (Princess Mononoke or World of Tomorrow). Calibrated threshold should generalize within ±10% of fitted value.
- ☐ Publish calibrated θ_brain as the default surprise threshold for new v2.1 databases. Store in `manifest.toml` with provenance (calibration dataset SHA, TRIBE v2 version, fit date).
- ☐ Documentation: `BRAIN_ALIGNMENT.md` section on calibration includes the full reproducible recipe.
- ☐ Add `Agidb::brain_calibration()` and `Agidb::recalibrate(dataset)` APIs.
- ☐ Comparison plot: pre-calibration (θ=0.4) vs post-calibration (θ_brain) sensory promotion patterns on a held-out movie. Visually demonstrate the difference.
Exit criterion: calibrated θ_brain ships in v2.1. Reproducible calibration recipe documented. Phase 15 complete.
Weeks 47-52 — Phase 16: BAMS benchmark + ICLR paper
Goal: ship the brain-aligned memory similarity benchmark suite, run all baselines, write and submit the ICLR 2026 MemAgents workshop paper.
Week 47
- ☐ Create `agidb-bams` crate.
- ☐ Implement `agidb-bams::protocol`, the BAMS protocol (per BAMS_BENCHMARK.md): stimulus loading, TRIBE v2 inference, per-network RDM construction, agent RDM construction, RSA scoring.
- ☐ Implement `agidb-bams::networks`: six functional cortical network definitions (DMN, visual, auditory, language, dorsal attention, frontoparietal), Schaefer-to-network mapping.
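RDM construction and RSA scoring can be illustrated with a compact sketch. It assumes the agent RDM uses normalized hamming distance between signatures and that the RSA score is the Spearman correlation between the upper triangles of the two RDMs, a common choice; whether BAMS uses exactly this variant is defined in `BAMS_BENCHMARK.md`, not here.

```rust
/// Upper triangle (row-major, i < j) of the agent RDM, using normalized
/// hamming distance between equal-width binary signatures.
pub fn rdm_upper(items: &[Vec<bool>]) -> Vec<f64> {
    let mut out = Vec::new();
    for i in 0..items.len() {
        for j in (i + 1)..items.len() {
            let diff = items[i].iter().zip(&items[j]).filter(|(a, b)| a != b).count();
            out.push(diff as f64 / items[i].len() as f64);
        }
    }
    out
}

/// 1-based ranks; tied values share their mean rank.
pub fn ranks(values: &[f64]) -> Vec<f64> {
    let mut order: Vec<usize> = (0..values.len()).collect();
    order.sort_by(|&i, &j| values[i].total_cmp(&values[j]));
    let mut out = vec![0.0; values.len()];
    let mut start = 0;
    while start < order.len() {
        let mut end = start;
        while end + 1 < order.len() && values[order[end + 1]] == values[order[start]] {
            end += 1;
        }
        let mean_rank = (start + end) as f64 / 2.0 + 1.0;
        for &idx in &order[start..=end] {
            out[idx] = mean_rank;
        }
        start = end + 1;
    }
    out
}

/// Pearson correlation; None on length mismatch, empty input, or zero variance.
fn pearson(x: &[f64], y: &[f64]) -> Option<f64> {
    if x.len() != y.len() || x.is_empty() {
        return None;
    }
    let n = x.len() as f64;
    let mx = x.iter().sum::<f64>() / n;
    let my = y.iter().sum::<f64>() / n;
    let (mut sxy, mut sxx, mut syy) = (0.0, 0.0, 0.0);
    for (a, b) in x.iter().zip(y) {
        let (dx, dy) = (a - mx, b - my);
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if sxx == 0.0 || syy == 0.0 { None } else { Some(sxy / (sxx * syy).sqrt()) }
}

/// RSA score: Spearman correlation between the agent RDM and a brain RDM
/// supplied as its upper triangle in the same (i < j) order.
pub fn rsa_score(agent_items: &[Vec<bool>], brain_rdm_upper: &[f64]) -> Option<f64> {
    let agent = rdm_upper(agent_items);
    pearson(&ranks(&agent), &ranks(brain_rdm_upper))
}
```

Because only rank order matters, the brain RDM can stay in whatever distance unit TRIBE-derived responses produce, and a random-signature baseline should land near zero.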
Week 48
- ☐ Build baseline adapters: `agidb-bams::baselines::{mem0, letta, zep, hipporag, raw_vjepa, random}`. Each implements `AgentMemorySystem::replay_stimulus(stream) -> Vec<HV>`.
- ☐ For text-only baselines (mem0/letta/zep), replay strategy: feed text descriptions of stimuli (captions/transcripts) since they don’t support multimodal natively. Document this as a methodological limitation in the paper.
- ☐ Random baseline: random 8192-bit HVs as the statistical null. Should score ~0.
Week 49
- ☐ Run full BAMS suite: 6 movies × 7 systems × 6 networks. Estimated compute: ~8h on a laptop with GPU; ~24h CPU-only. Run on a cloud GPU for speed.
- ☐ Generate report (`agidb-bams report results.json --format html`). Overall + per-network + per-movie tables.
- ☐ Ablations: agidb without VSA binding (concatenation), agidb with attention fusion instead of XOR, agidb without brain-calibrated surprise, agidb without consolidation.
Week 50
- ☐ Paper draft. Title: Brain-Aligned Memory Retrieval: Measuring Cognitive Plausibility in Agent Memory Systems via TRIBE-Derived Ground Truth. Target: ICLR 2026 MemAgents workshop (6-page version). Sections per `BAMS_BENCHMARK.md` paper outline.
- ☐ Figures: overall BAMS scores table, per-network heatmap, ablation table, RDM visualizations (a few representative examples).
- ☐ Internal review.
Week 51
- ☐ Address review feedback. Revise paper.
- ☐ Build reproduction kit: Docker container that runs the full BAMS suite end-to-end with one command. Pin all dependency versions, dataset SHAs, model weight hashes.
- ☐ Open-source `agidb-bams` on `github.com/agidb/agidb-bams` under Apache-2.0 (benchmark code) with explicit notes about TRIBE v2 CC BY-NC for the weight artifacts.
Week 52
- ☐ Submit to ICLR 2026 MemAgents workshop. (If deadline missed, backup is CCN 2026.)
- ☐ Crates.io: publish `agidb 0.2.0` (v2.1) + `agidb-sensory 0.1.0` + `agidb-bams 0.1.0`. PyPI: publish `agidb 0.2.0`.
- ☐ Launch blog post for v2.1. Demo: observe a video clip, recall it via cue, factor by modality, run BAMS self-score.
- ☐ v2.1 SHIPS. Month 12 milestone reached.
Exit criterion: BAMS suite open-source with reproducible baselines. ICLR 2026 MemAgents paper submitted. agidb 0.2.0 published. Phase 16 complete. v2.1 LAUNCHED.
Beyond week 52
After v2.1 ships, the focus shifts to:
- Seed fundraise (if not done sooner): now there’s a substrate + a paper + design partners. Target $1-3M from a deep-tech-friendly fund.
- v2.2 cognitive engine work (2027): pattern completion, AGM belief revision, analogical retrieval. See `AGI_TRAJECTORY.md`.
- Community + ecosystem: developer relations, conference talks (ICLR 2026 in person if accepted, CCN 2026, MLSys 2027 submission, RustConf workshop), contributor onboarding.
- Hardening for the long tail: issues from real production users, performance regressions, the things you only find by being in production for 6+ months.
Risk register and mitigations
| Risk | Phase impacted | Mitigation |
|---|---|---|
| GLiNER F1 lower than 0.85 on real data | 3 | Augment with regex patterns + canonicalization rules; possibly add LLM-fallback for low-confidence extractions (write-time only) |
| Decision gate result ambiguous (scores land close to threshold) | 7 | Pre-commit thresholds in week 10; tiebreaker is noisy-cue degradation (the one mem0 reliably loses) |
| Cognitive primitives ship but no design partners care | 9-13 | Talk to design partners during phases 9-13, not just at launch; iterate on the wedge based on real friction |
| V-JEPA 2 ONNX export incomplete/buggy | 14 | Fallback to Candle backend; or PyO3 subprocess to torch as last resort |
| TRIBE v2 inference too slow to calibrate | 15 | Use a smaller calibration subset (single movie, single subject) for v2.1; full calibration deferred to v2.2 |
| Courtois NeuroMod access friction | 15 | Backup: Algonauts 2025 OOD predictions are public-derivable from TRIBE v2 directly; doesn’t strictly require Courtois |
| BAMS baselines (mem0/letta/zep) don’t support multimodal | 16 | Document as methodological limitation; use text-only stimulus stream for those baselines; still scores meaningfully on language-network alignment |
| MemAgents deadline missed | 16 | Backup: CCN 2026 has a later deadline; if both missed, MLSys 2027 or NeurIPS 2026 main track |
| Burnout across 52 weeks | all | Pace: phases 9-13 are six weeks each, not three. Sleep more than the build dictates. Phases inherited from sochdb v1 = real savings, not aspirational. |
What this roadmap doesn’t try to cover
- Day-to-day engineering tasks (covered by issues + ADRs in the repo).
- Marketing + community-building beyond launch posts.
- Hiring (the plan is solo through v2.1; first hires post-seed in 2027).
- Detailed fundraise mechanics (separate doc when relevant).
- v2.2+ phase plans (see `AGI_TRAJECTORY.md` for the 5-year shape; detailed roadmaps for v2.2+ get written when we get there).
This is a 52-week plan. It will slip. Slip-handling rule: when a phase runs over by more than 1 week, stop and decide explicitly whether to (a) cut scope of the current phase, (b) push everything downstream by the slip amount, or (c) deprioritize a later phase. Don’t let slips compound silently.