
Roadmap — Pending Stages

Each stage adds a capability. What we've built is the organism's core. What's ahead is making it battle-ready, observable, and usable.

Done

| Stage | Name | What It Unlocked |
| --- | --- | --- |
| 0 | Conception | Heartbeat |
| 1 | Amoeba | Single cell (brain + senses + muscles + memory) |
| 2 | Hydra | Multi-agent (inbox + scheduling) |
| 3a | Embryo (self-building) | Axiom creates agents |
| 3b | Embryo (governance) | Build/Fuse mode |
| 3c | Embryo (self-sufficiency) | Per-agent data + grants |
| 4a | Ephemeral Pages | Short-link web pages for interactions |
| 4b | Pure Core + Plugins | Channels extracted as plugins |
| 5 | Onboarding Conversation | Axiom grows the organism through chat: no forms, no wizards |
| 5a | MiniWhatsApp + plugin | Local WhatsApp-lookalike for real external messaging tests |
| 5c | Claude CLI Architect | Self-coding agents: the organism writes new TS tools into sandboxed agent folders |
| 6 | Mesh Visualization | Force-directed graph of agents + message flow, aggregated across recent traces |

Total so far: 74 tests passing, 18 catalog skills, complete organism core, self-extension via Claude CLI.


Pending — In Suggested Priority Order

✅ Stage 5: Onboarding Conversation (DONE — see stage doc)

What gets built:

  • Axiom interviews new customers through /talk
  • Builds their organism through natural conversation
  • Creates agents, assigns skills, sets up grants
  • No admin panel, no forms — setup IS a conversation

Flow:

Customer: "Hi, I'm Naveen, I run a small household"
Axiom:    "Welcome. Who's in your household?"
Customer: "My maid Radha speaks Telugu, my driver Agil speaks Tamil"
Axiom:    [creates passive agents] "Got it. What do you need them for?"
Customer: "Daily task delegation via WhatsApp"
Axiom:    [plans agent architecture]
          "I'll set up an attendance tracker and a task router. Sound good?"

What you get:

  • Best-in-class demo for prospects
  • Validates the self-building organism capability
  • Exercises every system (agents, skills, grants, passive, plugins, pages)
  • Real usage patterns → feedback for improvement

Dependencies: None (all primitives exist)
Complexity: Medium (mostly instruction-writing for Axiom plus guided flow logic)
Time to build: 1-2 days


✅ Stage 5a: MiniWhatsApp — Local Test Harness (DONE — see stage doc)

What gets built:

A standalone web app living OUTSIDE orbita-core (one directory up):

/Users/sajithmr/1box/
  ├── orbita-core/
  └── miniwhatsapp/          ← new project
      ├── server.ts          HTTP + WebSocket server
      ├── public/
      │   └── index.html     Chat UI (browser app)
      └── package.json

MiniWhatsApp features:

  • User registers with name + mobile number (any string, no verification)
  • Session stored in browser localStorage (persistent)
  • Chat UI showing conversation list + active chat
  • Send message → broadcasts over WebSocket
  • Receive message → appears instantly in UI
  • Multiple browsers/incognito tabs = multiple "WhatsApp users"

REST/WebSocket API (for the Orbita plugin):

  • POST /api/send — Orbita sends outbound: { to: "+91...", text: "..." }
  • WS /api/events — Orbita listens: { from: "+91...", text: "..." } events
  • GET /api/users — list registered users
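
The payloads listed above can be pinned down as types. This is a minimal sketch, not the actual MiniWhatsApp code: the names `SendRequest`, `InboundEvent`, `buildSendRequest`, and `parseInboundEvent` are assumptions for illustration, and only the `to`/`from`/`text` fields come from the endpoint list.

```typescript
// Hypothetical payload shapes, inferred from the endpoint list above.
interface SendRequest { to: string; text: string }    // body of POST /api/send
interface InboundEvent { from: string; text: string } // frame on WS /api/events

// Build the JSON body Orbita would POST for an outbound message.
function buildSendRequest(to: string, text: string): SendRequest {
  return { to, text };
}

// Narrow an untrusted WebSocket frame to an InboundEvent, or return null.
function parseInboundEvent(raw: string): InboundEvent | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const fields = data as Record<string, unknown>;
  const from = fields["from"];
  const text = fields["text"];
  if (typeof from !== "string" || typeof text !== "string") return null;
  return { from, text };
}
```

Validating frames at the boundary keeps malformed events out of agent inboxes, which matters once the transport is swapped for a real external service.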

Orbita side — plugins/miniwhatsapp.plugin.ts:

  • Opens WebSocket to miniwhatsapp
  • Maintains contact book: agent name → miniwhatsapp number
  • Watches matching agent inboxes → POSTs to /api/send
  • Receives WS events → writes to agent inbox via inbox.send
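
The plugin's two directions of routing might look roughly like the sketch below. Everything here is hypothetical (`ContactBook`, `routeInbound`, `routeOutbound`, and the callback types); the transport is injected as plain functions so the routing logic can be exercised without a live WebSocket or HTTP client.

```typescript
// Hypothetical names throughout; only inbox.send and POST /api/send come from the roadmap.
type InboxSend = (agent: string, text: string) => void;
type PostSend = (body: { to: string; text: string }) => void;

// Two-way contact book: agent name <-> miniwhatsapp number.
class ContactBook {
  private byAgent = new Map<string, string>();
  private byNumber = new Map<string, string>();

  link(agent: string, phone: string): void {
    this.byAgent.set(agent, phone);
    this.byNumber.set(phone, agent);
  }
  numberFor(agent: string): string | undefined {
    return this.byAgent.get(agent);
  }
  agentFor(phone: string): string | undefined {
    return this.byNumber.get(phone);
  }
}

// Inbound: WS event -> agent inbox. Returns false when the sender is unknown.
function routeInbound(
  book: ContactBook,
  event: { from: string; text: string },
  inboxSend: InboxSend,
): boolean {
  const agent = book.agentFor(event.from);
  if (agent === undefined) return false;
  inboxSend(agent, event.text);
  return true;
}

// Outbound: agent inbox message -> POST /api/send. Returns false when the agent has no number.
function routeOutbound(
  book: ContactBook,
  agent: string,
  text: string,
  postSend: PostSend,
): boolean {
  const to = book.numberFor(agent);
  if (to === undefined) return false;
  postSend({ to, text });
  return true;
}
```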

What you get:

  • Full end-to-end external messaging without Meta API
  • Test as many concurrent users as you have browser windows
  • Real HTTP + WebSocket transport (no in-memory shortcut)
  • Confidence that the plugin architecture handles real external systems
  • When real WhatsApp comes, it's just a new plugin with different transport — core unchanged

Dependencies: None (pure dev tool)
Complexity: Low-Medium (a small web app plus a plugin)
Time to build: 1-2 days (miniwhatsapp) + 1 day (plugin)

Why this matters: The hardest part of "Stage 5b: Real WhatsApp" is integration testing, because you can't iterate quickly against Meta's API. With MiniWhatsApp, you can:

  • Test message routing in minutes
  • Demo to colleagues without a Meta account
  • Run the harness in CI as part of the test suite
  • Debug the plugin pattern with real transport

✅ Stage 5c: Claude CLI Architect (Self-Coding Agents) (DONE — see stage doc)

The big idea: A new agent mode that uses Claude CLI (not the API) to generate agents WITH their tool implementations — including TypeScript code. Bring-your-own-agent + build-your-own-agent, all from natural language.

What gets built:

  1. src/cortex/cli-runtime.ts — spawns claude CLI process for agents needing full computer access
  2. Architect-mode agents — agents with mode: "cli" in config.json run via Claude CLI instead of API
  3. skills/architect/SKILL.md — Paperclip-style instructions that teach Claude CLI:
    • Orbita's architecture (link to manifesto)
    • Existing agents and their skills (dynamic)
    • Skill catalog (what tools exist)
    • Coding conventions (where files go, how to register)
    • How to create: folder + instruction.md + config.json + src/tools/X.skill.ts
  4. add_coded_skill tool — Architect writes new TypeScript tools, registers in catalog, commits
  5. Build mode gate — CLI mode only works in BUILD mode (structural changes = code changes)
  6. Feed-the-brain — the CLI agent receives full context:
    • manifesto.md (architecture)
    • list_agents output (what exists)
    • catalog.listNames() output (available skills)
    • Existing src/tools/ structure (example implementations)
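
Items 5 and 6 above (the BUILD-mode gate and context feeding) can be sketched as two small functions. This is an illustration under assumptions, not the contents of `src/cortex/cli-runtime.ts`: the `Mode` values are inferred from the "Build/Fuse mode" stage, and `ArchitectContext`, `assertBuildMode`, and `buildArchitectPrompt` are hypothetical names. Actually spawning the CLI process is left out.

```typescript
// Assumed from the "Build/Fuse mode" stage; the real mode names may differ.
type Mode = "BUILD" | "FUSE";

// Hypothetical bundle of everything the CLI agent is fed.
interface ArchitectContext {
  manifesto: string;   // contents of manifesto.md
  agents: string[];    // list_agents output
  skills: string[];    // catalog.listNames() output
  toolFiles: string[]; // paths under src/tools/ used as examples
}

// Build mode gate: structural changes are refused outside BUILD mode.
function assertBuildMode(mode: Mode): void {
  if (mode !== "BUILD") {
    throw new Error("CLI agents may only run in BUILD mode");
  }
}

// Assemble the context into one prompt string for the CLI agent.
function buildArchitectPrompt(ctx: ArchitectContext, request: string): string {
  const bullet = (items: string[]): string[] => items.map((item) => `- ${item}`);
  return [
    "# Architecture",
    ctx.manifesto,
    "# Existing agents",
    ...bullet(ctx.agents),
    "# Available skills",
    ...bullet(ctx.skills),
    "# Example tool implementations",
    ...bullet(ctx.toolFiles),
    "# Request",
    request,
  ].join("\n");
}
```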

Example flow:

User: "I need a GST tax calculator agent for India —
       GST on invoices, HSN code lookup, GSTR filing prep"

Axiom (API mode): recognizes need for CODED skills
                  → delegates to architect agent (CLI mode)

Architect (CLI): reads manifesto + existing agents
                 → creates agents/gst-calculator/ folder
                 → writes instruction.md (persona)
                 → creates skills/calculate-gst.skill.md
                 → writes src/tools/calculate-gst.skill.ts (actual TS code)
                 → creates skills/lookup-hsn.skill.md
                 → writes src/tools/lookup-hsn.skill.ts
                 → updates build-catalog.ts to register new skills
                 → updates factories.ts
                 → writes agents/gst-calculator/config.json with skills
                 → runs tests to verify
                 → reports: "gst-calculator agent ready with 2 coded skills"

What you get:

  • True self-extension — organism writes new code for itself
  • BYOA/BYOS — customers describe domain needs, system generates the code
  • Architecture-aware generation — controlled, follows conventions (not ad-hoc)
  • Dogfood loop — organism can improve its own core too
  • Paperclip pattern realized — this is what Paperclip proved works
  • Scales horizontally — each domain (GST, HIPAA, payroll) becomes its own agent with coded tools

Why our controlled CLI instead of just running claude manually?

| Manual claude in terminal | Orbita-controlled CLI |
| --- | --- |
| No knowledge of Orbita architecture | Auto-injects manifesto + conventions |
| Doesn't know existing agents/skills | Sees live catalog, avoids duplicates |
| Ad-hoc file placement | Enforces folder/naming conventions |
| No audit trail | Every CLI run traced with traceId |
| Could break the system | Gated by BUILD mode, tests run after |
| Expert-only | Any user can say "build me an X agent" |
| No tests enforced | Auto-runs test suite after generation |

Governance:

  • CLI mode is structural → BUILD mode only
  • Every file created is audited to the trace log
  • Axiom approves the plan before CLI agent executes
  • Automatic rollback if tests fail

Quality Pipeline (mandatory gates)

Every agent/tool generated by the CLI Architect must pass ALL gates before being activated:

Gate 1: Static Validation

  • TypeScript compiles (tsc --noEmit on new files)
  • Imports resolve to existing modules
  • Exports match the expected factory signature
  • No disallowed imports (nothing that bypasses inbox/data/loader)
  • Skill name in markdown matches tool name in code
  • config.json shape is valid
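
Two of the checks above (config shape and skill/tool name matching) lend themselves to a short sketch. The `AgentConfig` shape is an assumption: only `mode: "cli"` and a skills list in config.json are mentioned in this roadmap, and the function names are hypothetical.

```typescript
// Assumed config.json shape; the real schema may carry more fields.
interface AgentConfig {
  name: string;
  mode: "api" | "cli";
  skills: string[];
}

// Return a list of human-readable problems; an empty list means the shape is valid.
function validateConfig(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null) return ["config must be an object"];
  const c = raw as Record<string, unknown>;
  const errors: string[] = [];
  const name = c["name"];
  const mode = c["mode"];
  const skills = c["skills"];
  if (typeof name !== "string" || name.length === 0) {
    errors.push("name must be a non-empty string");
  }
  if (mode !== "api" && mode !== "cli") {
    errors.push('mode must be "api" or "cli"');
  }
  if (!Array.isArray(skills) || !skills.every((s) => typeof s === "string")) {
    errors.push("skills must be an array of strings");
  }
  return errors;
}

// Skills declared in config (or markdown) that have no tool of the same name in code.
function skillsWithoutTools(skills: string[], toolNames: string[]): string[] {
  const tools = new Set(toolNames);
  return skills.filter((skill) => !tools.has(skill));
}
```

Returning every violation at once, rather than throwing on the first, gives the failure report more to work with.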

Gate 2: Orbita Rules Validation (QA via Claude CLI)

A second CLI pass acts as QA Architect with a focused SKILL.md:

You are the Orbita QA Reviewer. Validate this generated agent against:
  - DNA laws (8 laws)
  - Inbox-only communication (no direct calls)
  - Data namespace isolation (only touches own + granted)
  - Trace continuity (preserves traceId)
  - No reserved names
  - Skill convention compliance
  - Does NOT conflict with existing skills (no duplicates, no overrides)
  - Does NOT break any existing agent (integration analysis)
Output: PASS / FAIL with specific violations.

Gate 3: Dependency Management

If new tools need new npm packages:

  • Architect lists required packages (dependencies array in plan)
  • package.json updated
  • npm install runs automatically
  • Installation success verified
  • Axiom decides: live reload (if supported) OR restart required notice

Gate 4: Isolated Agent Test

Every new agent MUST ship with a unit test. The architect generates:

  • tests/agents/<agent-name>.test.ts
  • Runs in isolated sandbox — copy agent folder to /tmp/orbita-test-<uuid>/
  • Tests each skill with mock services
  • Tests happy path + error path for each tool
  • MongoDB: in-memory test DB (mongodb-memory-server)
  • Must pass 100% before agent is activated
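
To make "happy path + error path for each tool" concrete, here is a small sketch. The `calculateGst` function is a hypothetical stand-in for the calculate-gst tool (a flat percentage, nothing more), and the tiny `runAll` harness is an assumption used instead of any particular test framework.

```typescript
// Hypothetical tool: flat-rate GST, rounded to two decimals.
function calculateGst(amount: number, ratePercent: number): { tax: number; total: number } {
  if (!Number.isFinite(amount) || amount < 0) {
    throw new RangeError("amount must be a non-negative number");
  }
  if (!Number.isFinite(ratePercent) || ratePercent < 0) {
    throw new RangeError("ratePercent must be a non-negative number");
  }
  const tax = Math.round(amount * ratePercent) / 100;
  return { tax, total: amount + tax };
}

interface TestCase { name: string; run: () => void }

// One happy-path and one error-path case for the tool.
const gstCases: TestCase[] = [
  {
    name: "happy path: 18% on 1000",
    run: () => {
      const r = calculateGst(1000, 18);
      if (r.tax !== 180 || r.total !== 1180) throw new Error("wrong result");
    },
  },
  {
    name: "error path: negative amount is rejected",
    run: () => {
      let threw = false;
      try { calculateGst(-1, 18); } catch { threw = true; }
      if (!threw) throw new Error("expected a throw");
    },
  },
];

// Run every case; the gate passes only when `failed` is empty.
function runAll(cases: TestCase[]): { passed: number; failed: string[] } {
  const failed: string[] = [];
  let passed = 0;
  for (const c of cases) {
    try { c.run(); passed += 1; } catch { failed.push(c.name); }
  }
  return { passed, failed };
}
```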

Gate 5: Integration Smoke Test

  • Start a test runtime with ONLY the new agent + Axiom
  • Send a sample message to trigger a skill
  • Verify end-to-end: inbox → skill → tool → result
  • Trace inspected for trace continuity

What Happens on Failure

Any gate failing → rollback:

  • Delete agent folder
  • Revert package.json
  • Revert catalog registration
  • Log what failed + why to audit trail
  • Report to user: "Agent generation failed at Gate X: <reason>. Try again with a different approach?"
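
The "all gates must pass, any failure rolls back" rule can be sketched as a sequential runner. This is one possible shape under assumptions: `Gate`, `GateResult`, and `runQualityPipeline` are hypothetical names, and the rollback steps listed above are collapsed into a single injected callback.

```typescript
interface GateResult { ok: boolean; reason?: string }
interface Gate { name: string; run: () => Promise<GateResult> }
interface PipelineOutcome { activated: boolean; failedGate?: string; reason?: string }

// Run gates in order; stop at the first failure and roll back before reporting.
async function runQualityPipeline(
  gates: Gate[],
  rollback: (failedGate: string, reason: string) => Promise<void>,
): Promise<PipelineOutcome> {
  for (const gate of gates) {
    let result: GateResult;
    try {
      result = await gate.run();
    } catch (err) {
      // A gate that throws counts as a failed gate, not a crashed pipeline.
      result = { ok: false, reason: err instanceof Error ? err.message : String(err) };
    }
    if (!result.ok) {
      const reason = result.reason ?? "unknown";
      await rollback(gate.name, reason);
      return { activated: false, failedGate: gate.name, reason };
    }
  }
  return { activated: true };
}
```

Later gates never run after a failure, so an expensive smoke test is skipped when static validation already failed.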

Agent Test Portability

A generated agent must be testable in isolation:

# Copy an agent (from any Orbita tenant) and test it standalone:
orbita test-agent tenant/agents/gst-calculator/
  → creates /tmp/orbita-test-xxx/
  → copies agent folder + required tools
  → runs the agent's test suite
  → reports pass/fail
  → cleans up

This means:

  • Portable agents — someone shares an agent folder, you test it before using
  • CI-friendly — every agent tested independently
  • Trust marketplace — future agent marketplace can verify submissions this way
  • Safe installs — drop an agent in, test it, only then activate

Dependencies: None technical, but the claude CLI must be installed on the host
Complexity: High (spawning the CLI, context injection, code generation safety, quality gates, isolation)
Time to build: 5-7 days (vs 4-6 without quality gates)

Where it slots in: Before Stage 7 (Immune System), because self-coding unblocks rapid domain expansion.


🟡 Stage 5b: Real WhatsApp Plugin

What gets built:

  • plugins/whatsapp.plugin.ts — real Meta Business API integration
  • Contact book (agent name → WhatsApp number) in plugin's own MongoDB collection
  • Webhook receiver for incoming messages
  • Template management (Meta's approved templates)
  • 24-hour session window handling
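
The 24-hour session window reduces to one decision per outbound message. The sketch below assumes the usual WhatsApp Business rule: free-form messages are allowed only within 24 hours of the recipient's last inbound message, and an approved template is required otherwise. The names `chooseOutboundKind` and `OutboundKind` are hypothetical.

```typescript
const SESSION_WINDOW_MS = 24 * 60 * 60 * 1000;

type OutboundKind = "free_form" | "template";

// lastInboundAt: epoch ms of the contact's most recent inbound message, or null if none yet.
function chooseOutboundKind(lastInboundAt: number | null, now: number): OutboundKind {
  if (lastInboundAt === null) return "template"; // no open session yet
  return now - lastInboundAt < SESSION_WINDOW_MS ? "free_form" : "template";
}
```

For example, a task reminder sent to radha the morning after her last reply would still go out as free-form text, while one sent two days later would have to use a template.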

Flow:

Axiom creates passive agent "radha" (language: te)
Plugin admin UI: "radha = +91 9988776655"
Axiom sends to radha's inbox
WhatsApp plugin formats for Telugu, sends via Meta
Real Radha receives WhatsApp
Real Radha replies → webhook fires → plugin writes to radha's inbox

What you get:

  • First real external channel works
  • Proves plugin pattern end-to-end
  • Demo to prospects who care about real messaging
  • Foundation for email/SMS/Telegram plugins later

Dependencies: Meta Business API account (account setup can be outsourced)
Complexity: Medium (API integration + webhooks + templates)
Time to build: 2-3 days (most of the time is Meta setup, not code)


✅ Stage 6: Trace Mesh Visualization (DONE — see stage doc)

What gets built:

  • /trace upgraded to a full-graph visualization (Vue Flow)
  • Multiple traces overlaid — see the mesh
  • Real-time updates (SSE or WebSocket)
  • Click agent → see their activity
  • Click trace → see full graph
  • Performance metrics (tokens, time, cost per agent/trace)
  • Filter by agent, time window, trace type
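
The per-agent metrics in the list above come down to an aggregation over trace records. A minimal sketch, assuming a hypothetical `TraceSpan` record with one entry per agent step; the field names are illustrative and not taken from Orbita's trace schema.

```typescript
// Hypothetical trace record: one entry per agent step within a trace.
interface TraceSpan {
  traceId: string;
  agent: string;
  tokens: number;
  durationMs: number;
  costUsd: number;
}

interface AgentTotals { tokens: number; durationMs: number; costUsd: number; spans: number }

// Sum tokens, time, and cost per agent across all given spans.
function totalsByAgent(spans: TraceSpan[]): Map<string, AgentTotals> {
  const totals = new Map<string, AgentTotals>();
  for (const span of spans) {
    const t = totals.get(span.agent) ?? { tokens: 0, durationMs: 0, costUsd: 0, spans: 0 };
    t.tokens += span.tokens;
    t.durationMs += span.durationMs;
    t.costUsd += span.costUsd;
    t.spans += 1;
    totals.set(span.agent, t);
  }
  return totals;
}

// Agent with the highest average duration per span, or null when there is no data.
function slowestAgent(totals: Map<string, AgentTotals>): string | null {
  let slowest: string | null = null;
  let worstAvg = -1;
  totals.forEach((t, agent) => {
    const avg = t.durationMs / t.spans;
    if (avg > worstAvg) {
      worstAvg = avg;
      slowest = agent;
    }
  });
  return slowest;
}
```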

What you get:

  • True observability — see inside the living organism
  • Debugging mastery — any issue traceable visually
  • Customer-facing dashboard (eventually)
  • Performance tuning data (where are the slow agents?)
  • Cost tracking per agent

Dependencies: None
Complexity: Medium (Vue Flow + backend streaming)
Time to build: 2-3 days


🟠 Stage 7: Immune System (Self-Healing)

What gets built:

  • src/sentinel/ module
  • Agent crash detection & auto-restart
  • Stuck task escalation (task pending too long → notify manager)
  • Plugin failure handling (retry, circuit breaker, fallback)
  • Rate limiting per agent / per channel
  • Dead letter queue for failed inbox messages
  • Anomaly detection (unusual patterns → alert)
  • Budget enforcement (LLM cost caps)
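
Of the items above, the circuit breaker for plugin failures is the easiest to sketch in isolation. This is a generic illustration of the pattern, not code from `src/sentinel/`; the class name, thresholds, and injected clock are assumptions.

```typescript
// Generic circuit breaker: opens after N consecutive failures, retries after a cooldown.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly maxFailures: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(maxFailures: number, cooldownMs: number, now: () => number = () => Date.now()) {
    this.maxFailures = maxFailures;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // True when a call may be attempted: circuit closed, or cooldown elapsed (half-open probe).
  allow(): boolean {
    if (this.openedAt === null) return true;
    return this.now() - this.openedAt >= this.cooldownMs;
  }

  // A successful call closes the circuit and clears the failure count.
  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  // A failed call counts toward the threshold; at the threshold the circuit (re)opens.
  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.maxFailures) this.openedAt = this.now();
  }
}
```

A plugin wrapper would check `allow()` before each external call, then report the result with `recordSuccess()` or `recordFailure()`; messages refused while the circuit is open could go to the dead letter queue.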

What you get:

  • Production-ready resilience
  • Organism survives partial failures
  • Can deploy with confidence
  • Alerting for operators
  • Cost control

Dependencies: Some production load (need real failures to handle)
Complexity: Medium-high (defensive code, many edge cases)
Time to build: 3-5 days


🟠 Stage 8: Multi-Tenant Isolation

What gets built:

  • Control plane API (provision/start/stop tenants)
  • Per-tenant Docker containers (or shared instance with strict isolation)
  • Tenant-scoped everything (DB prefix, agents, plugins)
  • Per-tenant usage tracking (cost, storage)
  • Tenant-level config (timezone, language defaults, budget caps)
  • Provisioning workflow: new customer → new isolated organism
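
For the shared-instance option, "tenant-scoped everything (DB prefix ...)" could start from a single naming helper. A minimal sketch under assumptions: the `tenantCollection` name, the `__` separator, and the allowed tenant ID characters are all hypothetical choices, not decisions recorded in this roadmap.

```typescript
// Tenant IDs are restricted to lowercase letters, digits, and hyphens so that the
// "__" separator can never appear inside an ID and prefixes stay unambiguous.
const TENANT_ID_PATTERN = /^[a-z0-9-]+$/;

// Map a logical collection name to its tenant-scoped physical name.
function tenantCollection(tenantId: string, collection: string): string {
  if (!TENANT_ID_PATTERN.test(tenantId)) {
    throw new Error(`invalid tenant id: ${tenantId}`);
  }
  return `${tenantId}__${collection}`;
}
```

Routing every database access through one helper like this makes it harder for an agent or plugin to reach another tenant's data by accident.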

What you get:

  • Can sell to multiple customers simultaneously
  • Each customer's organism fully isolated
  • SaaS-ready
  • Billing/metering foundation

Dependencies: Stages 5-7 should be solid first
Complexity: High (infrastructure + orchestration)
Time to build: 5-7 days


🔵 Stage 9: Adult (Production Polish)

What gets built:

  • Security audit (token strength, RBAC, input validation)
  • Backup & restore (automatic MongoDB backups)
  • Upgrade path (framework version migration)
  • Extensive docs for developers building plugins
  • Plugin marketplace (community-shared plugins)
  • CLI tool (orbita create-tenant, orbita list-agents, etc.)
  • Zero-downtime deployment
  • Compliance (data retention, audit log exports)

What you get:

  • Enterprise-ready product
  • Sellable at scale
  • Long-term sustainable

Dependencies: Stages 5-8 complete
Complexity: High
Time to build: Ongoing


Minor Improvements (Can Slot In Anytime)

🟢 Clean up legacy docs

Old stage docs reference add_contact, OrgManager, etc. that no longer exist. Update them to match current architecture.

🟢 Example plugin library

Add plugins/examples/ with templates: email plugin, SMS plugin, webhook-forwarder plugin.

🟢 Agent templates library

Pre-built passive agent templates (Driver, Nurse, Vendor, Student) that customers can use as starting points.

🟢 CLI for agent creation

orbita add-person Naveen --role=Leader --language=en for power users.

🟢 Getting started guide

Step-by-step tutorial for first-time users. "In 10 minutes, your first organism."


Recommendation

If I were building this for market fit, I'd do:

1. Onboarding Conversation (Stage 5)   ← the demo
2. MiniWhatsApp + Plugin (Stage 5a)    ← local test harness, builds confidence
3. Claude CLI Architect (Stage 5c)     ← self-coding agents, BYOA/BYOS
4. Trace Mesh Viz (Stage 6)            ← operator confidence
5. Real WhatsApp Plugin (Stage 5b)     ← natural extension of 5a
6. Immune System (Stage 7)             ← production hardening
7. Multi-tenant (Stage 8)              ← SaaS scaling

Why this order?

  • Stage 5 wins prospects' attention (the demo)
  • Stage 5a is cheap confidence (real transport, no Meta account needed)
  • Stage 5c unblocks rapid domain expansion (self-coding agents)
  • Stage 6 gives operators trust (see inside the organism)
  • Stage 5b gets much easier once 5a works (same plugin pattern, different transport)
  • Stages 7-8 are production concerns: address them when production exists

But it's your call. You know your situation.

Orbita — We don't build software. We grow organisms.