Roadmap — Pending Stages

Each stage adds a capability. What we've built is the organism's core. What's ahead is making it battle-ready, observable, and usable.

Done

Stage	Name	What It Unlocked
0	Conception	Heartbeat
1	Amoeba	Single cell (brain + senses + muscles + memory)
2	Hydra	Multi-agent (inbox + scheduling)
3a	Embryo (self-building)	Axiom creates agents
3b	Embryo (governance)	Build/Fuse mode
3c	Embryo (self-sufficiency)	Per-agent data + grants
4a	Ephemeral Pages	Short-link web pages for interactions
4b	Pure Core + Plugins	Channels extracted as plugins
5	Onboarding Conversation	Axiom grows the organism through chat — no forms, no wizards
5a	MiniWhatsApp + plugin	Local WhatsApp-lookalike for real external messaging tests
5c	Claude CLI Architect	Self-coding agents — organism writes new TS tools into sandboxed agent folders
6	Mesh Visualization	Force-directed graph of agents + message flow, aggregated across recent traces

Total so far: 74 tests passing, 18 catalog skills, complete organism core, self-extension via Claude CLI.

Pending — In Suggested Priority Order

✅ Stage 5: Onboarding Conversation (DONE — see stage doc)

What gets built:

Axiom interviews new customers through /talk
Builds their organism through natural conversation
Creates agents, assigns skills, sets up grants
No admin panel, no forms — setup IS a conversation

Flow:

Customer: "Hi, I'm Naveen, I run a small household"
Axiom:    "Welcome. Who's in your household?"
Customer: "My maid Radha speaks Telugu, my driver Agil speaks Tamil"
Axiom:    [creates passive agents] "Got it. What do you need them for?"
Customer: "Daily task delegation via WhatsApp"
Axiom:    [plans agent architecture]
          "I'll set up an attendance tracker and a task router. Sound good?"

What you get:

Best-in-class demo for prospects
Validates the self-building organism capability
Exercises every system (agents, skills, grants, passive, plugins, pages)
Real usage patterns → feedback for improvement

Dependencies: None (all primitives exist)
Complexity: Medium — mostly instruction-writing for Axiom + guided flow logic
Time to build: 1-2 days

✅ Stage 5a: MiniWhatsApp — Local Test Harness (DONE — see stage doc)

What gets built:

A standalone web app living OUTSIDE orbita-core (one directory up):

/Users/sajithmr/1box/
  ├── orbita-core/
  └── miniwhatsapp/          ← new project
      ├── server.ts          HTTP + WebSocket server
      ├── public/
      │   └── index.html     Chat UI (browser app)
      └── package.json

MiniWhatsApp features:

User registers with name + mobile number (any string, no verification)
Session stored in browser localStorage (persistent)
Chat UI showing conversation list + active chat
Send message → broadcasts over WebSocket
Receive message → appears instantly in UI
Multiple browsers/incognito tabs = multiple "WhatsApp users"
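The routing behind these features is small. Below is a minimal sketch of an in-memory hub that server.ts could sit on. It assumes each open browser tab is wrapped in a hypothetical `Client` with a `send(json)` method, which would be a WebSocket in the real server. It routes each message to the recipient's open tabs. `Hub`, `Client`, and `deliver` are illustrative names, not the project's code.

```typescript
// Hypothetical: one open browser tab. In server.ts this would wrap a WebSocket.
interface Client {
  send(data: string): void;
}

interface ChatMessage {
  from: string; // sender's mobile number (any string, no verification)
  to: string;   // recipient's mobile number
  text: string;
}

class Hub {
  private users = new Map<string, string>();        // mobile -> display name
  private clients = new Map<string, Set<Client>>(); // mobile -> open tabs

  register(mobile: string, name: string): void {
    this.users.set(mobile, name);
  }

  connect(mobile: string, client: Client): void {
    const tabs = this.clients.get(mobile) ?? new Set<Client>();
    tabs.add(client);
    this.clients.set(mobile, tabs);
  }

  disconnect(mobile: string, client: Client): void {
    this.clients.get(mobile)?.delete(client);
  }

  // Backs GET /api/users.
  listUsers(): { mobile: string; name: string }[] {
    return Array.from(this.users.entries()).map(([mobile, name]) => ({ mobile, name }));
  }

  // Push a message to every open tab of the recipient; returns how many tabs were reached.
  deliver(msg: ChatMessage): number {
    const tabs = this.clients.get(msg.to);
    if (!tabs) return 0;
    const payload = JSON.stringify(msg);
    tabs.forEach((tab) => tab.send(payload));
    return tabs.size;
  }
}
```

Keying tabs by mobile number is what lets several incognito windows act as separate WhatsApp users. A single user with two tabs open sees the message in both.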

REST/WebSocket API (for the Orbita plugin):

POST /api/send — Orbita sends outbound: { to: "+91...", text: "..." }
WS /api/events — Orbita listens: { from: "+91...", text: "..." } events
GET /api/users — list registered users

Orbita side — plugins/miniwhatsapp.plugin.ts:

Opens WebSocket to miniwhatsapp
Maintains contact book: agent name → miniwhatsapp number
Watches matching agent inboxes → POSTs to /api/send
Receives WS events → writes to agent inbox via inbox.send
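At its core, the plugin performs two translations through the contact book. The minimal sketch below uses the payload shapes from the API list above. A hypothetical `InboxEntry` stands in for whatever `inbox.send` actually takes. The WebSocket client and the HTTP POST are left out so the mapping is easy to see.

```typescript
// Payload shapes from the MiniWhatsApp API above.
interface SendRequest { to: string; text: string } // body of POST /api/send
interface WsEvent { from: string; text: string }   // event on WS /api/events

// Hypothetical stand-in for a message in (or destined for) an agent's inbox.
interface InboxEntry { agent: string; text: string }

class ContactBook {
  private numberOf = new Map<string, string>(); // agent name -> miniwhatsapp number
  private agentAt = new Map<string, string>();  // miniwhatsapp number -> agent name

  link(agent: string, number: string): void {
    // Drop stale reverse entries so re-linking never leaves two owners for one number.
    const oldNumber = this.numberOf.get(agent);
    if (oldNumber !== undefined) this.agentAt.delete(oldNumber);
    const oldAgent = this.agentAt.get(number);
    if (oldAgent !== undefined) this.numberOf.delete(oldAgent);
    this.numberOf.set(agent, number);
    this.agentAt.set(number, agent);
  }

  // Outbound: a message in a watched agent's inbox becomes a POST /api/send body.
  toSendRequest(entry: InboxEntry): SendRequest | null {
    const number = this.numberOf.get(entry.agent);
    return number === undefined ? null : { to: number, text: entry.text };
  }

  // Inbound: a WS event from a known number becomes an entry for that agent's inbox.
  toInboxEntry(event: WsEvent): InboxEntry | null {
    const agent = this.agentAt.get(event.from);
    return agent === undefined ? null : { agent, text: event.text };
  }
}
```

Returning null for unlinked agents and unknown numbers keeps the boundary explicit. Only contacts in the book ever cross between Orbita and MiniWhatsApp.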

What you get:

Full end-to-end external messaging without Meta API
Test as many concurrent users as you have browser windows
Real HTTP + WebSocket transport (no in-memory shortcut)
Confidence that the plugin architecture handles real external systems
When real WhatsApp comes, it's just a new plugin with different transport — core unchanged

Dependencies: None (pure dev tool)
Complexity: Low-Medium — it's just a small web app + plugin
Time to build: 1-2 days (miniwhatsapp) + 1 day (plugin)

Why this is genius: The hardest part of "Stage 5b: Real WhatsApp" is integration testing. You can't iterate easily against Meta's API. With MiniWhatsApp, you can:

Test message routing in minutes
Demo to colleagues without Meta account
CI-friendly (run in tests)
Debug the plugin pattern with real transport

✅ Stage 5c: Claude CLI Architect (Self-Coding Agents) (DONE — see stage doc)

The big idea: A new agent mode that uses Claude CLI (not the API) to generate agents WITH their tool implementations — including TypeScript code. Bring-your-own-agent + build-your-own-agent, all from natural language.

What gets built:

src/cortex/cli-runtime.ts — spawns claude CLI process for agents needing full computer access
Architect-mode agents — agents with mode: "cli" in config.json run via Claude CLI instead of API
skills/architect/SKILL.md — Paperclip-style instructions that teach Claude CLI:
- Orbita's architecture (link to manifesto)
- Existing agents and their skills (dynamic)
- Skill catalog (what tools exist)
- Coding conventions (where files go, how to register)
- How to create: folder + instruction.md + config.json + src/tools/X.skill.ts
add_coded_skill tool — Architect writes new TypeScript tools, registers them in the catalog, and commits
Build mode gate — CLI mode only works in BUILD mode (structural changes = code changes)
Feed-the-brain — the CLI agent receives full context:
- manifesto.md (architecture)
- list_agents output (what exists)
- catalog.listNames() output (available skills)
- Existing src/tools/ structure (example implementations)
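The minimal sketch below shows how runtime selection, the BUILD mode gate, and feed-the-brain could fit together. The `"build" | "fuse"` values echo Stage 3b's Build/Fuse modes. The rest of `AgentConfig`, the `ArchitectSources` fields, and the prompt layout are assumptions for illustration. Spawning the `claude` process itself is left to cli-runtime.ts.

```typescript
type OrganismMode = "build" | "fuse"; // Stage 3b's Build/Fuse modes (lowercase values assumed)

interface AgentConfig {
  name: string;
  mode?: "api" | "cli"; // "cli" = run through Claude CLI; omitted = normal API runtime
  skills: string[];
}

// Hypothetical container for the four feed-the-brain inputs.
interface ArchitectSources {
  manifesto: string;    // manifesto.md contents
  agents: string[];     // list_agents output
  skillNames: string[]; // catalog.listNames() output
  toolFiles: string[];  // paths under src/tools/ to use as examples
}

function selectRuntime(config: AgentConfig, organismMode: OrganismMode): "api" | "cli" {
  const mode = config.mode ?? "api";
  if (mode === "cli" && organismMode !== "build") {
    // CLI agents make structural (code) changes, so they only run in BUILD mode.
    throw new Error(`agent "${config.name}" uses mode "cli", which requires BUILD mode`);
  }
  return mode;
}

function buildArchitectContext(src: ArchitectSources, request: string): string {
  const section = (title: string, items: string[]): string =>
    `## ${title}\n` + (items.length > 0 ? items.map((i) => `- ${i}`).join("\n") : "(none)");
  return [
    "# Orbita architecture\n" + src.manifesto.trim(),
    section("Existing agents", src.agents),
    section("Available skills (reuse before adding new ones)", src.skillNames),
    section("Example tool implementations", src.toolFiles),
    "# Request\n" + request.trim(),
  ].join("\n\n");
}
```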

Example flow:

User: "I need a GST tax calculator agent for India —
       GST on invoices, HSN code lookup, GSTR filing prep"

Axiom (API mode): recognizes need for CODED skills
                  → delegates to architect agent (CLI mode)

Architect (CLI): reads manifesto + existing agents
                 → creates agents/gst-calculator/ folder
                 → writes instruction.md (persona)
                 → creates skills/calculate-gst.skill.md
                 → writes src/tools/calculate-gst.skill.ts (actual TS code)
                 → creates skills/lookup-hsn.skill.md
                 → writes src/tools/lookup-hsn.skill.ts
                 → updates build-catalog.ts to register new skills
                 → updates factories.ts
                 → writes agents/gst-calculator/config.json with skills
                 → runs tests to verify
                 → reports: "gst-calculator agent ready with 2 coded skills"
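To make "actual TS code" concrete, here is a minimal sketch of logic that src/tools/calculate-gst.skill.ts might contain. It applies the standard GST split. An intra-state supply gets CGST and SGST at half the rate each, and an inter-state supply gets IGST at the full rate. It works in integer paise to avoid floating-point drift and takes the rate as input instead of hard-coding slabs. `calculateGst` and its input shape are hypothetical. The real file would wrap this in whatever factory signature build-catalog.ts expects.

```typescript
interface GstInput {
  taxablePaise: number; // invoice value before tax, in paise (integer)
  ratePercent: number;  // GST rate for the item's HSN code, e.g. 18
  interState: boolean;  // true when supplier and buyer are in different states
}

interface GstResult {
  cgstPaise: number;
  sgstPaise: number;
  igstPaise: number;
  totalTaxPaise: number;
  invoiceTotalPaise: number;
}

// Hypothetical tool body; the real skill file would export it through Orbita's tool factory.
function calculateGst(input: GstInput): GstResult {
  const { taxablePaise, ratePercent, interState } = input;
  if (!Number.isInteger(taxablePaise) || taxablePaise < 0) {
    throw new Error("taxablePaise must be a non-negative integer");
  }
  if (!(ratePercent >= 0 && ratePercent <= 100)) {
    throw new Error("ratePercent must be between 0 and 100");
  }
  let cgstPaise = 0;
  let sgstPaise = 0;
  let igstPaise = 0;
  if (interState) {
    igstPaise = Math.round((taxablePaise * ratePercent) / 100);
  } else {
    // Intra-state: the rate is split equally between centre (CGST) and state (SGST).
    cgstPaise = Math.round((taxablePaise * ratePercent) / 200);
    sgstPaise = cgstPaise;
  }
  const totalTaxPaise = cgstPaise + sgstPaise + igstPaise;
  return { cgstPaise, sgstPaise, igstPaise, totalTaxPaise, invoiceTotalPaise: taxablePaise + totalTaxPaise };
}
```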

What you get:

True self-extension — organism writes new code for itself
BYOA/BYOS — customers describe domain needs, system generates the code
Architecture-aware generation — controlled, follows conventions (not ad-hoc)
Dogfood loop — organism can improve its own core too
Paperclip pattern realized — this is what Paperclip proved works
Scales horizontally — each domain (GST, HIPAA, payroll) becomes its own agent with coded tools

Why our controlled CLI instead of just running claude manually?

Manual `claude` in terminal	Orbita-controlled CLI
No knowledge of Orbita architecture	Auto-injects manifesto + conventions
Doesn't know existing agents/skills	Sees live catalog, avoids duplicates
Ad-hoc file placement	Enforces folder/naming conventions
No audit trail	Every CLI run traced with traceId
Could break the system	Gated by BUILD mode, tests run after
Expert-only	Any user can say "build me a X agent"
No tests enforced	Auto-runs test suite after generation

Governance:

CLI mode is structural → BUILD mode only
Every file created is audited to the trace log
Axiom approves the plan before CLI agent executes
Automatic rollback if tests fail

Quality Pipeline (mandatory gates)

Every agent/tool generated by the CLI Architect must pass ALL gates before being activated:

Gate 1: Static Validation

TypeScript compiles (tsc --noEmit on new files)
Imports resolve to existing modules
Exports match the expected factory signature
No disallowed imports (nothing that bypasses inbox/data/loader)
Skill name in markdown matches tool name in code
config.json shape is valid
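`tsc --noEmit` covers compilation and import resolution. The remaining checks are plain text inspection. The minimal sketch below rests on four assumptions: a config.json with `name` and `skills` fields, a `name:` front-matter line in the skill markdown, a `name: "..."` property in the tool code, and a hypothetical deny-list of modules that would bypass inbox/data/loader.

```typescript
interface StaticReport { ok: boolean; errors: string[] }

// Hypothetical deny-list: modules that would let a tool bypass inbox/data/loader.
const DISALLOWED_IMPORTS = ["mongodb", "fs", "node:fs", "child_process", "node:child_process"];

// Collect module specifiers from static imports, dynamic import() and require().
function importsOf(source: string): string[] {
  const re = /(?:\bfrom\s*|\bimport\s*\(?\s*|\brequire\s*\(\s*)["']([^"']+)["']/g;
  const found: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) found.push(m[1]);
  return found;
}

function runStaticChecks(config: unknown, skillMarkdown: string, toolSource: string): StaticReport {
  const errors: string[] = [];

  // config.json shape (assumed: non-empty name, array of skill names).
  if (typeof config !== "object" || config === null) {
    errors.push("config.json must be an object");
  } else {
    const c = config as Record<string, unknown>;
    if (typeof c.name !== "string" || c.name === "") errors.push("config.name must be a non-empty string");
    if (!Array.isArray(c.skills) || !c.skills.every((s) => typeof s === "string")) {
      errors.push("config.skills must be an array of strings");
    }
  }

  for (const spec of importsOf(toolSource)) {
    if (DISALLOWED_IMPORTS.includes(spec)) errors.push(`disallowed import: ${spec}`);
  }

  // Skill name: "name: x" front matter in the .skill.md vs name: "x" in the .skill.ts.
  const mdName = /^name:\s*([\w-]+)\s*$/m.exec(skillMarkdown)?.[1];
  const codeName = /\bname:\s*["']([\w-]+)["']/.exec(toolSource)?.[1];
  if (!mdName || !codeName) errors.push("skill name not found in markdown or code");
  else if (mdName !== codeName) errors.push(`skill name mismatch: "${mdName}" vs "${codeName}"`);

  return { ok: errors.length === 0, errors };
}
```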

Gate 2: Orbita Rules Validation (QA via Claude CLI)

A second CLI pass acts as QA Architect with a focused SKILL.md:

You are the Orbita QA Reviewer. Validate this generated agent against:
  - DNA laws (8 laws)
  - Inbox-only communication (no direct calls)
  - Data namespace isolation (only touches own + granted)
  - Trace continuity (preserves traceId)
  - No reserved names
  - Skill convention compliance
  - Does NOT conflict with existing skills (no duplicates, no overrides)
  - Does NOT break any existing agent (integration analysis)
Output: PASS / FAIL with specific violations.
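The QA pass answers in free text, so the gate needs a strict reader that treats anything ambiguous as a failure. The minimal sketch below assumes a hypothetical output convention: the first non-empty line is `PASS` or `FAIL`, and each violation follows as a `- ` bullet.

```typescript
interface QaVerdict { pass: boolean; violations: string[] }

function parseQaVerdict(output: string): QaVerdict {
  const lines = output.split(/\r?\n/).map((l) => l.trim()).filter((l) => l.length > 0);
  const head = (lines[0] ?? "").toUpperCase();
  const violations = lines.slice(1).filter((l) => l.startsWith("- ")).map((l) => l.slice(2).trim());

  if (head === "PASS" && violations.length === 0) return { pass: true, violations };
  if (head === "FAIL") {
    return { pass: false, violations: violations.length > 0 ? violations : ["FAIL reported without details"] };
  }
  // Fail closed: no recognisable verdict, or PASS alongside listed violations.
  return {
    pass: false,
    violations: violations.length > 0 ? violations : [`unrecognised QA output: ${lines[0] ?? "(empty)"}`],
  };
}
```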

Gate 3: Dependency Management

If new tools need new npm packages:

Architect lists required packages (dependencies array in plan)
package.json updated
npm install runs automatically
Installation success verified
Axiom decides: live reload (if supported) or a restart-required notice
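The minimal sketch below covers the package.json step and is shaped so the failure path can revert it. It returns a new object plus a serialized snapshot of the original instead of mutating in place, and it never overwrites an existing version pin. `PackageJson` is trimmed to the field used, and the plan entry shape is hypothetical. Running and verifying `npm install` is left to the caller.

```typescript
interface PackageJson {
  dependencies?: Record<string, string>;
  [key: string]: unknown;
}

// Hypothetical shape of one entry in the architect plan's dependencies array.
interface PlannedDependency { name: string; version: string }

interface DependencyChange {
  updated: PackageJson; // write this, then run npm install
  added: string[];
  conflicts: string[];  // any conflict should fail the gate
  snapshot: string;     // serialized original, restored on rollback
}

function addDependencies(pkg: PackageJson, plan: PlannedDependency[]): DependencyChange {
  const snapshot = JSON.stringify(pkg, null, 2);
  const deps: Record<string, string> = { ...(pkg.dependencies ?? {}) };
  const added: string[] = [];
  const conflicts: string[] = [];
  for (const { name, version } of plan) {
    const existing = deps[name];
    if (existing === undefined) {
      deps[name] = version;
      added.push(name);
    } else if (existing !== version) {
      // Never overwrite a pin that existing agents may depend on.
      conflicts.push(`${name}: installed ${existing}, plan wants ${version}`);
    }
  }
  return { updated: { ...pkg, dependencies: deps }, added, conflicts, snapshot };
}
```

A differing version is reported as a conflict rather than upgraded. The gate then fails instead of silently changing a package that existing agents rely on.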

Gate 4: Isolated Agent Test

Every new agent MUST ship with a unit test. The architect generates:

tests/agents/<agent-name>.test.ts
Runs in isolated sandbox — copy agent folder to /tmp/orbita-test-<uuid>/
Tests each skill with mock services
Tests happy path + error path for each tool
MongoDB: in-memory test DB (mongodb-memory-server)
Must pass 100% before agent is activated
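The shape of a generated tests/agents/<agent-name>.test.ts can be sketched without committing to a test framework. Here `DataService`, the `makeRecordAttendance` factory, and the in-memory mock are hypothetical stand-ins for Orbita's services and a generated tool. A hand-rolled mock replaces mongodb-memory-server to keep the sketch dependency-free. The pattern is what matters: the tool receives its services by injection, so the test swaps in a mock and checks the happy path plus each error path.

```typescript
// Hypothetical service the tool depends on (the agent's own data namespace in Orbita).
interface DataService {
  insert(collection: string, doc: Record<string, unknown>): void;
}

type ToolResult = { ok: true; message: string } | { ok: false; error: string };

// Hypothetical generated tool: services in, tool function out.
function makeRecordAttendance(data: DataService) {
  return (input: { person: string; date: string }): ToolResult => {
    if (!/^\d{4}-\d{2}-\d{2}$/.test(input.date)) return { ok: false, error: "date must be YYYY-MM-DD" };
    try {
      data.insert("attendance", { person: input.person, date: input.date });
      return { ok: true, message: `${input.person} marked present on ${input.date}` };
    } catch (e) {
      return { ok: false, error: `storage failed: ${(e as Error).message}` };
    }
  };
}

// In-memory mock in place of the real data service (optionally failing on every insert).
function mockData(failWith?: string) {
  const rows: Record<string, unknown>[] = [];
  const service: DataService = {
    insert(_collection, doc) {
      if (failWith !== undefined) throw new Error(failWith);
      rows.push(doc);
    },
  };
  return { service, rows };
}

// What the generated test would assert: the happy path and one check per error path.
function runAgentTests(): string[] {
  const failures: string[] = [];
  const healthy = mockData();
  const tool = makeRecordAttendance(healthy.service);

  const happy = tool({ person: "radha", date: "2024-05-01" });
  if (!happy.ok || healthy.rows.length !== 1) failures.push("happy path");

  const badInput = tool({ person: "radha", date: "yesterday" });
  if (badInput.ok || healthy.rows.length !== 1) failures.push("invalid date was accepted");

  const broken = makeRecordAttendance(mockData("db down").service);
  const storageError = broken({ person: "agil", date: "2024-05-01" });
  if (storageError.ok || !storageError.error.includes("db down")) failures.push("storage error not surfaced");

  return failures;
}
```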

Gate 5: Integration Smoke Test

Start a test runtime with ONLY the new agent + Axiom
Send a sample message to trigger a skill
Verify end-to-end: inbox → skill → tool → result
Trace inspected for trace continuity
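The trace inspection reduces to two checks on the smoke run's events. All events must share one traceId, and the inbox → skill → tool → result stages must appear in order. The minimal sketch below assumes a hypothetical `TraceEvent` record with a `stage` field. Orbita's actual trace entries may be structured differently.

```typescript
type Stage = "inbox" | "skill" | "tool" | "result";

// Hypothetical trace record; real trace entries may carry more fields.
interface TraceEvent { traceId: string; agent: string; stage: Stage; at: number }

const EXPECTED_FLOW: Stage[] = ["inbox", "skill", "tool", "result"];

function checkTraceContinuity(events: TraceEvent[]): string[] {
  if (events.length === 0) return ["no trace events recorded"];
  const problems: string[] = [];

  // Continuity: the whole smoke run must share a single traceId.
  const ids = new Set(events.map((e) => e.traceId));
  if (ids.size !== 1) problems.push(`traceId changed mid-flow (${ids.size} distinct ids)`);

  // End-to-end: each expected stage must appear, in order, in the time-sorted events.
  const ordered = events.slice().sort((a, b) => a.at - b.at);
  let reached = 0;
  for (const e of ordered) {
    if (reached < EXPECTED_FLOW.length && e.stage === EXPECTED_FLOW[reached]) reached++;
  }
  if (reached < EXPECTED_FLOW.length) problems.push(`flow stopped before "${EXPECTED_FLOW[reached]}"`);
  return problems;
}
```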

What Happens on Failure

Any gate failing → rollback:

Delete agent folder
Revert package.json
Revert catalog registration
Log what failed + why to audit trail
Report to user: "Agent generation failed at Gate X: <reason>. Try again with a different approach?"
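One way to make this rollback dependable is to have each step register its undo action as soon as it succeeds. When any gate fails, the undo actions are replayed newest-first. The minimal sketch below assumes synchronous undo actions and a caller-supplied audit logger. `RollbackJournal` is an illustrative name.

```typescript
interface AuditEntry { gate: string; reason: string; undone: string[] }

class RollbackJournal {
  private steps: { label: string; undo: () => void }[] = [];

  // Record the undo action right after each change succeeds.
  record(label: string, undo: () => void): void {
    this.steps.push({ label, undo });
  }

  // Undo everything newest-first, write the audit entry, and return the message for the user.
  fail(gate: string, reason: string, audit: (entry: AuditEntry) => void): string {
    const undone: string[] = [];
    while (this.steps.length > 0) {
      const step = this.steps.pop()!;
      try {
        step.undo();
        undone.push(step.label);
      } catch (e) {
        // Keep going: one failed undo should not leave the remaining changes in place.
        undone.push(`${step.label} (undo failed: ${(e as Error).message})`);
      }
    }
    audit({ gate, reason, undone });
    return `Agent generation failed at ${gate}: ${reason}. Try again with a different approach?`;
  }
}
```

Undoing in reverse order matters here. The catalog registration is reverted before the agent folder it points to is deleted. A failing undo is recorded rather than aborting the rest.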

Agent Test Portability

A generated agent must be testable in isolation:


# Copy an agent (from any Orbita tenant) and test it standalone:
orbita test-agent tenant/agents/gst-calculator/
  → creates /tmp/orbita-test-xxx/
  → copies agent folder + required tools
  → runs the agent's test suite
  → reports pass/fail
  → cleans up
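The key property of this command is that the temp directory is always cleaned up, even when copying fails or the tests throw. The minimal sketch below shows that control flow. A hypothetical `SandboxOps` interface stands in for the filesystem and test-runner calls. In Node these could map onto `fs.mkdtempSync`, `fs.cpSync`, `fs.rmSync`, and a child process for the test run.

```typescript
// Hypothetical operations; in Node these could wrap fs.mkdtempSync, fs.cpSync, fs.rmSync and a child process.
interface SandboxOps {
  makeTempDir(prefix: string): string;
  copy(from: string, toDir: string): void;
  runTests(dir: string): { passed: number; failed: number };
  remove(dir: string): void;
}

function testAgentStandalone(agentPath: string, toolPaths: string[], ops: SandboxOps): boolean {
  const dir = ops.makeTempDir("orbita-test-");
  try {
    ops.copy(agentPath, dir);                          // agent folder
    for (const tool of toolPaths) ops.copy(tool, dir); // plus the tools it requires
    const { passed, failed } = ops.runTests(dir);
    return failed === 0 && passed > 0; // an agent with no tests does not pass
  } finally {
    ops.remove(dir); // always clean up, whether tests passed, failed, or threw
  }
}
```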

This means:

Portable agents — someone shares an agent folder, you test it before using
CI-friendly — every agent tested independently
Trust marketplace — future agent marketplace can verify submissions this way
Safe installs — drop an agent in, test it, only then activate

Dependencies: No technical dependencies, but the claude CLI must be installed on the host
Complexity: High — spawning CLI, context injection, code generation safety, quality gates, isolation
Time to build: 5-7 days (vs 4-6 days without quality gates)

Where it slots in: Before Stage 7 (Immune System), because self-coding unblocks rapid domain expansion.

🟡 Stage 5b: Real WhatsApp Plugin

What gets built:

plugins/whatsapp.plugin.ts — real Meta Business API integration
Contact book (agent name → WhatsApp number) in plugin's own MongoDB collection
Webhook receiver for incoming messages
Template management (Meta's approved templates)
24-hour session window handling
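The 24-hour window drives most of the plugin's send logic. On the WhatsApp Business Platform, free-form messages are allowed only within 24 hours of the contact's last inbound message. Outside that window the business must send a pre-approved template. The minimal sketch below covers the per-send decision. It assumes the plugin stores each contact's last inbound timestamp and has an approved template with one text variable. The function, type, and template names are hypothetical.

```typescript
const SESSION_WINDOW_MS = 24 * 60 * 60 * 1000;

type OutboundPlan =
  | { kind: "text"; body: string }
  | { kind: "template"; templateName: string; params: string[] };

function planOutbound(
  lastInboundAt: number | undefined, // epoch ms of the contact's last message, if any
  now: number,
  body: string,
  templateName: string,              // hypothetical approved template with one text variable
): OutboundPlan {
  const inWindow = lastInboundAt !== undefined && now - lastInboundAt < SESSION_WINDOW_MS;
  return inWindow
    ? { kind: "text", body }
    : { kind: "template", templateName, params: [body] };
}
```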

Flow:

Axiom creates passive agent "radha" (language: te)
Plugin admin UI: "radha = +91 9988776655"
Axiom sends to radha's inbox
WhatsApp plugin formats for Telugu, sends via Meta
Real Radha receives WhatsApp
Real Radha replies → webhook fires → plugin writes to radha's inbox
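Before the webhook in this flow can fire, Meta verifies the endpoint with a GET request. The request carries `hub.mode`, `hub.verify_token`, and `hub.challenge` query parameters. The endpoint must echo the challenge only if the token matches the one configured for the app. The minimal sketch below performs that check with the standard `URL` API, which expects an absolute URL. The function name and response shape are illustrative. Signature checks on incoming message POSTs are a separate step not shown.

```typescript
interface VerifyResponse { status: number; body: string }

// Handles the GET verification request for the webhook URL.
function handleWebhookVerification(requestUrl: string, verifyToken: string): VerifyResponse {
  const params = new URL(requestUrl).searchParams;
  const mode = params.get("hub.mode");
  const token = params.get("hub.verify_token");
  const challenge = params.get("hub.challenge");
  if (mode === "subscribe" && token === verifyToken && challenge !== null) {
    return { status: 200, body: challenge }; // echoing the challenge confirms the subscription
  }
  return { status: 403, body: "verification failed" };
}
```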

What you get:

First real external channel works
Proves plugin pattern end-to-end
Demo to prospects who care about real messaging
Foundation for email/SMS/Telegram plugins later

Dependencies: Meta Business API account (setup can be outsourced)
Complexity: Medium — API integration + webhooks + templates
Time to build: 2-3 days (most time is Meta setup, not code)

✅ Stage 6: Trace Mesh Visualization (DONE — see stage doc)

What gets built:

/trace upgraded to a full-graph visualization (Vue Flow)
Multiple traces overlaid — see the mesh
Real-time updates (SSE or WebSocket)
Click agent → see their activity
Click trace → see full graph
Performance metrics (tokens, time, cost per agent/trace)
Filter by agent, time window, trace type
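Overlaying traces amounts to collapsing per-trace hops into weighted edges between agents. The metrics panel is then a group-by over the same records. The minimal sketch below does both, assuming a hypothetical `Hop` record per inbox message with tokens, time, and cost already attached. The output maps naturally onto Vue Flow nodes and edges but does not use its API.

```typescript
// Hypothetical record of one inbox message handled within a trace.
interface Hop {
  traceId: string;
  from: string;    // sending agent
  to: string;      // receiving agent
  tokens: number;  // LLM tokens spent handling this hop
  ms: number;      // time spent by the receiver
  costUsd: number;
}

interface MeshEdge { from: string; to: string; count: number; traces: number }
interface AgentStats { tokens: number; ms: number; costUsd: number; hops: number }

function buildMesh(hops: Hop[]): { edges: MeshEdge[]; agents: Map<string, AgentStats> } {
  const edges = new Map<string, { edge: MeshEdge; traceIds: Set<string> }>();
  const agents = new Map<string, AgentStats>();
  for (const h of hops) {
    const key = `${h.from}->${h.to}`;
    let entry = edges.get(key);
    if (!entry) {
      entry = { edge: { from: h.from, to: h.to, count: 0, traces: 0 }, traceIds: new Set<string>() };
      edges.set(key, entry);
    }
    entry.edge.count++;                  // total messages along this edge
    entry.traceIds.add(h.traceId);
    entry.edge.traces = entry.traceIds.size; // distinct traces that used it

    // Attribute cost to the agent that did the work (the receiver).
    const s = agents.get(h.to) ?? { tokens: 0, ms: 0, costUsd: 0, hops: 0 };
    s.tokens += h.tokens;
    s.ms += h.ms;
    s.costUsd += h.costUsd;
    s.hops++;
    agents.set(h.to, s);
  }
  return { edges: Array.from(edges.values(), (v) => v.edge), agents };
}
```

Attributing cost to the receiving agent is one choice among several. It answers "which agent is expensive" rather than "which agent triggers expensive work".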

What you get:

True observability — see inside the living organism
Debugging mastery — any issue traceable visually
Customer-facing dashboard (eventually)
Performance tuning data (where are the slow agents?)
Cost tracking per agent

Dependencies: None
Complexity: Medium — Vue Flow + backend streaming
Time to build: 2-3 days

🟠 Stage 7: Immune System (Self-Healing)

What gets built:

src/sentinel/ module
Agent crash detection & auto-restart
Stuck task escalation (task pending too long → notify manager)
Plugin failure handling (retry, circuit breaker, fallback)
Rate limiting per agent / per channel
Dead letter queue for failed inbox messages
Anomaly detection (unusual patterns → alert)
Budget enforcement (LLM cost caps)
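For plugin failures, the usual tool is a circuit breaker. After a number of consecutive failures it opens, and calls fail fast to a fallback until a cooldown passes. One trial call then decides whether to close it again. The minimal sketch below uses an injected clock so it can be tested deterministically. It is synchronous for brevity, whereas a real plugin call would be awaited. The thresholds are illustrative, not decided values.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly maxFailures: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(maxFailures: number, cooldownMs: number, now: () => number = () => Date.now()) {
    this.maxFailures = maxFailures; // consecutive failures before opening
    this.cooldownMs = cooldownMs;   // time to stay open before a trial call
    this.now = now;                 // injectable clock for tests
  }

  getState(): BreakerState {
    return this.state;
  }

  call<T>(action: () => T, fallback: () => T): T {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.cooldownMs) return fallback(); // fail fast
      this.state = "half-open"; // cooldown over: let one trial call through
    }
    try {
      const result = action();
      this.state = "closed";
      this.failures = 0;
      return result;
    } catch {
      this.failures++;
      // A failed trial call, or too many consecutive failures, (re)opens the breaker.
      if (this.state === "half-open" || this.failures >= this.maxFailures) {
        this.state = "open";
        this.openedAt = this.now();
      }
      return fallback();
    }
  }
}
```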

What you get:

Production-ready resilience
Organism survives partial failures
Can deploy with confidence
Alerting for operators
Cost control

Dependencies: Some production load (need real failures to handle)
Complexity: Medium-high — defensive code, many edge cases
Time to build: 3-5 days

🟠 Stage 8: Multi-Tenant Isolation

What gets built:

Control plane API (provision/start/stop tenants)
Per-tenant Docker containers (or shared instance with strict isolation)
Tenant-scoped everything (DB prefix, agents, plugins)
Per-tenant usage tracking (cost, storage)
Tenant-level config (timezone, language defaults, budget caps)
Provisioning workflow: new customer → new isolated organism
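Tenant scoping is easiest to enforce when no code assembles a collection name by hand. The minimal sketch below implements the DB-prefix option. It assumes a hypothetical `<tenant>__<collection>` scheme and a tenant-id alphabet without underscores, so one tenant's id can never produce another tenant's prefix.

```typescript
// Hypothetical rule: lowercase letters, digits and inner hyphens; no "_" so "__" stays unambiguous.
const TENANT_ID = /^[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?$/;
const SEPARATOR = "__";

function scopedCollection(tenantId: string, collection: string): string {
  if (!TENANT_ID.test(tenantId)) throw new Error(`invalid tenant id: "${tenantId}"`);
  if (!/^[A-Za-z0-9_.-]+$/.test(collection) || collection.startsWith("system.")) {
    throw new Error(`invalid collection name: "${collection}"`);
  }
  return tenantId + SEPARATOR + collection;
}

// Reverse lookup, e.g. for per-tenant usage tracking across all collections.
function tenantOf(scopedName: string): string | null {
  const i = scopedName.indexOf(SEPARATOR);
  return i > 0 ? scopedName.slice(0, i) : null;
}
```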

What you get:

Can sell to multiple customers simultaneously
Each customer's organism fully isolated
SaaS-ready
Billing/metering foundation

Dependencies: Stages 5-7 should be solid first
Complexity: High — infrastructure + orchestration
Time to build: 5-7 days

🔵 Stage 9: Adult (Production Polish)

What gets built:

Security audit (token strength, RBAC, input validation)
Backup & restore (automatic MongoDB backups)
Upgrade path (framework version migration)
Extensive docs for developers building plugins
Plugin marketplace (community-shared plugins)
CLI tool (orbita create-tenant, orbita list-agents, etc.)
Zero-downtime deployment
Compliance (data retention, audit log exports)

What you get:

Enterprise-ready product
Sellable at scale
Long-term sustainable

Dependencies: Stages 5-8 complete
Complexity: High
Time to build: Ongoing

Minor Improvements (Can Slot In Anytime)

🟢 Clean up legacy docs

Old stage docs reference add_contact, OrgManager, and other components that no longer exist. Update them to match the current architecture.

🟢 Example plugin library

Add plugins/examples/ with templates: email plugin, SMS plugin, webhook-forwarder plugin.

🟢 Agent templates library

Pre-built passive agent templates (Driver, Nurse, Vendor, Student) that customers can use as starting points.

🟢 CLI for agent creation

A one-line command for power users: orbita add-person Naveen --role=Leader --language=en
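Parsing for a command like this stays small if options are limited to `--key=value`. The minimal sketch below turns the arguments after `orbita add-person` into a person spec. `PersonSpec` and the accepted flag set mirror the example command and are otherwise hypothetical.

```typescript
interface PersonSpec { name: string; role?: string; language?: string }

// Hypothetical flag set, mirroring the example command.
const KNOWN_FLAGS = new Set(["role", "language"]);

// Parses e.g. ["Naveen", "--role=Leader", "--language=en"].
function parseAddPerson(args: string[]): PersonSpec {
  const positional: string[] = [];
  const flags: Record<string, string> = {};
  for (const arg of args) {
    if (!arg.startsWith("--")) {
      positional.push(arg);
      continue;
    }
    const m = /^--([a-z]+)=(.+)$/.exec(arg);
    if (!m || !KNOWN_FLAGS.has(m[1])) throw new Error(`unsupported option: ${arg}`);
    flags[m[1]] = m[2];
  }
  if (positional.length !== 1) {
    throw new Error("usage: orbita add-person <Name> [--role=<role>] [--language=<code>]");
  }
  return { name: positional[0], role: flags["role"], language: flags["language"] };
}
```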

🟢 Getting started guide

Step-by-step tutorial for first-time users. "In 10 minutes, your first organism."

Recommendation

If I were building this for market fit, I'd do:

1. Onboarding Conversation (Stage 5)   ← the demo
2. MiniWhatsApp + Plugin (Stage 5a)    ← local test harness, builds confidence
3. Claude CLI Architect (Stage 5c)     ← self-coding agents, BYOA/BYOS
4. Trace Mesh Viz (Stage 6)            ← operator confidence
5. Real WhatsApp Plugin (Stage 5b)     ← natural extension of 5a
6. Immune System (Stage 7)             ← production hardening
7. Multi-tenant (Stage 8)              ← SaaS scaling

Why this order?

Stage 5 wins prospects' attention (the demo)
Stage 5a is cheap confidence (real transport, no Meta account needed)
Stage 6 gives operators trust (see inside the organism)
Stage 5b gets much easier once 5a works (same plugin pattern, different transport)
Stages 7-8 are production concerns — address when production exists

But it's your call. You know your situation.
