Otonomii Agents

The Possibility Ladder

The Possibility Ladder is the core decision framework that every Otonomii agent follows. It is a sequential process where each rung builds on the output of the previous one, and the final rung — Learn — feeds back into the first — Observe — creating a perpetual improvement loop. Every rung is approximate by nature. The system never claims certainty; it claims calibrated confidence. Each of the eight rungs is described in detail below.

Execution Team

Routing Logic

Tasks are automatically routed to the team member best suited to handle them. Routing decisions are based on task characteristics, not round-robin assignment. The Orchestrator evaluates each incoming task and determines the optimal route.

Task characteristic → Route

Design review / consensus needed → Reviewer
Algorithmic / math / debugging task → Specialist
> 4 hours or 20+ files affected → Executor
Web / current information needed → Reviewer
Cost-sensitive bulk operations → Specialist (15x cheaper)
Everything else → Orchestrator handles directly
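
A minimal sketch of this routing in Python follows; the Task fields and the ordering of the checks are assumptions for illustration, not the platform's internal implementation.

from dataclasses import dataclass

@dataclass
class Task:
    needs_design_review: bool = False
    is_algorithmic: bool = False
    estimated_hours: float = 0.0
    files_affected: int = 0
    needs_web: bool = False
    cost_sensitive_bulk: bool = False

def route(task: Task) -> str:
    # Checks mirror the routing table above, most specific first.
    if task.needs_design_review:
        return "Reviewer"        # design review / consensus needed
    if task.is_algorithmic:
        return "Specialist"      # algorithmic / math / debugging task
    if task.estimated_hours > 4 or task.files_affected >= 20:
        return "Executor"        # long-running or large refactor
    if task.needs_web:
        return "Reviewer"        # web-grounded / current information
    if task.cost_sensitive_bulk:
        return "Specialist"      # 15x cheaper for bulk operations
    return "Orchestrator"        # everything else handled directly

print(route(Task(is_algorithmic=True)))  # -> Specialist

Ordering matters here: the consensus and algorithmic checks fire before the catch-all Orchestrator route, which is why routing is characteristic-based rather than round-robin.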

Governance

All agent work passes through a structured governance process before reaching production. Advisory boards provide domain-specific review, and overseers provide cross-cutting scrutiny on high-stakes decisions. This governance model ensures that no single perspective — however expert — goes unchallenged.

Building Custom Agents

Custom agents are built using the Otonomii SDK, which provides programmatic access to all four cognitive layers. The SDK supports Python, TypeScript, Go, and Java, with type-safe interfaces that mirror the platform's internal architecture. Building an agent follows the same Discovery-Configuration-Deployment lifecycle described below.
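
As a sketch of the shape such a definition might take, the following self-contained Python mirrors the four-layer configuration. Every type and field name here (BrainConfig, MindConfig, MachineConfig, ArenaBinding) is an illustrative assumption, not the documented SDK surface.

from dataclasses import dataclass, field

# Illustrative stand-ins for the four cognitive layers; the real SDK
# types are not shown in this document.
@dataclass
class BrainConfig:
    retention: str = "90d"             # what to remember and for how long

@dataclass
class MindConfig:
    models: list = field(default_factory=lambda: ["pattern-match"])

@dataclass
class MachineConfig:
    confidence_threshold: float = 0.8  # gate before any action
    stopping_rule: str = "two-gate"

@dataclass
class ArenaBinding:
    feeds: list = field(default_factory=list)
    venues: list = field(default_factory=list)

@dataclass
class Agent:
    name: str
    brain: BrainConfig
    mind: MindConfig
    machine: MachineConfig
    arena: ArenaBinding

agent = Agent(
    name="inventory-watcher",
    brain=BrainConfig(retention="30d"),
    mind=MindConfig(models=["pattern-match", "regime-detect"]),
    machine=MachineConfig(confidence_threshold=0.85),
    arena=ArenaBinding(feeds=["erp.stock"], venues=["purchasing-api"]),
)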


Tool integration uses the Model Context Protocol (MCP), which provides a standardized interface for agents to interact with external systems. MCP tools are declarative — you describe what the tool does, what inputs it accepts, and what outputs it produces. The agent's Machine layer decides when and how to use each tool based on the current state of the Possibility Ladder.
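
A tool description under MCP is plain structured data. The field names below (name, description, inputSchema as JSON Schema) follow the MCP tool schema; the quote-fetching tool itself is a made-up example.

# A declarative MCP-style tool description: what it does and what
# inputs it accepts, with no imperative logic attached.
get_quote = {
    "name": "get_quote",
    "description": "Fetch the latest price quote for an instrument.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "symbol": {"type": "string", "description": "Ticker symbol"},
        },
        "required": ["symbol"],
    },
}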


Agent-to-agent communication follows the Hub-and-Spoke pattern. Each agent operates as an independent spoke, and a coordination hub aggregates their outputs. Agents can share observations, predictions, and confidence levels without sharing internal state. This enables ensemble decision-making where multiple specialized agents contribute to a single outcome — and where conflict between agents is treated as a signal, not an error.
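
A minimal sketch of the hub's aggregation step, assuming each spoke reports only a prediction and a confidence; the weighting scheme is an illustrative choice.

from statistics import pstdev

# Each spoke reports a prediction and a confidence, never internal state.
spoke_reports = [
    {"spoke": "pattern-matcher", "prediction": 0.62, "confidence": 0.8},
    {"spoke": "regime-detector", "prediction": 0.58, "confidence": 0.7},
    {"spoke": "news-reader",     "prediction": 0.15, "confidence": 0.6},
]

def aggregate(reports):
    # Confidence-weighted consensus; disagreement across spokes is kept
    # as a signal rather than averaged away.
    total = sum(r["confidence"] for r in reports)
    consensus = sum(r["prediction"] * r["confidence"] for r in reports) / total
    disagreement = pstdev(r["prediction"] for r in reports)
    return {"consensus": round(consensus, 3), "disagreement": round(disagreement, 3)}

print(aggregate(spoke_reports))

Here the news-reader spoke disagrees sharply with the other two, so the hub surfaces a high disagreement value instead of silently averaging it out: conflict as signal, not error.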

The Eight Rungs of the Possibility Ladder

Observe

Perceive the current state without interpretation. Raw sensory input from all available sources — price feeds, system metrics, news streams, user behavior, environmental data. The key discipline at this stage is to observe without filtering. Premature interpretation narrows the possibility space before it has been fully explored. The Brain layer's write operation captures everything; the Mind layer's encoding has not yet been applied.


Possibilities

Exhaust what CAN happen before estimating what WILL happen. This is the most undervalued step in any decision framework. Most systems jump directly from observation to probability estimation, which means they only consider outcomes they already expect. The Possibilities step forces the system to enumerate the full space of potential outcomes — including unlikely, unprecedented, and uncomfortable ones. Accuracy without knowing the full possibility space is dangerous. Being honest about all possibilities is more valuable than false confidence in one outcome.



Context

Regime, volume, pivots, prior day, time remaining. Context transforms raw possibilities into situated ones. The same set of possibilities has different weightings in a low-volatility grinding market versus a high-volatility breakout regime. Context includes not just current conditions but the path that led here — the branch of the Hierarchical Pattern-Outcome Tree that produced the current state. Context also includes meta-context: how reliable is our context assessment? Are we in a known regime or an ambiguous one?



Probabilities

Weighted consensus from matches and context. Now — and only now — does the system estimate likelihoods. Probabilities are derived from historical pattern matches weighted by contextual similarity. Multiple models vote independently, and the Hub-and-Spoke architecture aggregates their assessments. Agreement increases confidence. Conflict signals uncertainty. Silence signals novelty. Probabilities are always expressed with confidence intervals, never as point estimates.
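
A sketch of how such an estimate might be computed, assuming each historical match carries a binary outcome and a contextual-similarity weight. The interval construction (a normal approximation over an effective sample size) is an illustrative choice, not the platform's method.

matches = [
    {"outcome": 1, "similarity": 0.9},  # close contextual match, resolved up
    {"outcome": 0, "similarity": 0.4},  # weak match, resolved down
    {"outcome": 1, "similarity": 0.7},
]

def estimate(matches):
    # Similarity-weighted probability of the outcome.
    w = sum(m["similarity"] for m in matches)
    p = sum(m["outcome"] * m["similarity"] for m in matches) / w
    # Effective sample size shrinks when matches are weak, which widens
    # the interval: a band, never a point estimate.
    n_eff = w ** 2 / sum(m["similarity"] ** 2 for m in matches)
    half_width = 1.96 * (p * (1 - p) / n_eff) ** 0.5
    return p, (max(0.0, p - half_width), min(1.0, p + half_width))

p, ci = estimate(matches)
print(f"p = {p:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")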



Decision

System presents, human decides — or in autonomous mode, Machine decides. The decision step is where the system commits to a course of action. In human-in-the-loop deployments, the system presents its analysis with full transparency: what it observed, what possibilities it considered, what context it applied, and what probabilities it estimated. The human makes the final call. In fully autonomous mode, the Machine layer's threshold and stopping rule determine whether to act. The system never acts without passing both the confidence gate and the risk assessment gate.
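
A sketch of the two hard gates in code; the threshold values are assumptions for illustration.

# The agent acts only when BOTH gates pass: the confidence gate and the
# risk assessment gate described in the next rung.
def may_act(confidence: float, max_downside: float,
            confidence_gate: float = 0.75, downside_limit: float = 0.02) -> bool:
    passes_confidence = confidence >= confidence_gate
    passes_risk = max_downside <= downside_limit
    return passes_confidence and passes_risk

print(may_act(confidence=0.82, max_downside=0.01))  # True: both gates pass
print(may_act(confidence=0.82, max_downside=0.05))  # False: risk gate blocks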



Risk Assessment

Adversarial recall, counter-examples. Never act without this. Before any action is taken, the system performs a structured adversarial review. It actively searches for counter-examples — historical situations that looked similar but had different outcomes. It recalls scenarios where similar decisions led to losses. It applies the Contrarian Gate: what would happen if this decision is wrong? What is the maximum downside? Is this a One-Way Door (hard to reverse, requiring extra scrutiny) or a Two-Way Door (easily reversed, allowing faster action)? Risk assessment is not optional — it is a hard gate.
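
A sketch of how a Contrarian Gate with door classification might look; the thresholds and the penalty per recalled counter-example are invented for illustration.

def contrarian_gate(confidence: float, reversible: bool,
                    counter_examples: int) -> bool:
    # One-Way Doors (hard to reverse) demand a stricter confidence bar
    # than Two-Way Doors, and each recalled counter-example raises it.
    required = 0.6 if reversible else 0.9
    required += 0.02 * counter_examples
    return confidence >= required

print(contrarian_gate(0.7, reversible=True, counter_examples=0))   # True
print(contrarian_gate(0.7, reversible=False, counter_examples=0))  # False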



Action

Execute. This is the boundary between cognition and behavior. The Machine layer translates the decision into concrete operations — placing orders, triggering workflows, sending notifications, or making API calls. Action is deliberately separated from decision because the quality of execution can differ from the quality of the decision. A good decision poorly executed still fails. The system logs the exact action taken, the timestamp, the market state at execution, and any slippage between intended and actual outcomes.



Learn

Track outcomes. Credit assignment: was the error in observation, possibilities, context, decision, or execution? The Learn step closes the loop by feeding back into Observe. Every outcome is compared against the prediction made at the Probabilities step, and the prediction error is decomposed across all prior steps. If the observation was correct but the possibility space was too narrow, that is a Possibilities error. If the possibilities were correct but the context was misjudged, that is a Context error. This granular credit assignment ensures that learning improves the specific step that failed, rather than making undifferentiated adjustments to the whole system.
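
A sketch of this first-failing-rung attribution, assuming the system keeps a trace of the checks made at each rung; the trace fields are invented for illustration.

def assign_credit(trace: dict) -> str:
    # Walk the rungs in order and blame the first one that failed, so
    # learning targets the step that actually broke.
    if trace["observation_wrong"]:
        return "observe"
    if trace["outcome_not_enumerated"]:
        return "possibilities"   # the possibility space was too narrow
    if trace["regime_misjudged"]:
        return "context"
    if trace["probability_error"] > trace["tolerance"]:
        return "probabilities"
    if trace["slippage"] > trace["slippage_limit"]:
        return "execution"
    return "decision"

example = {
    "observation_wrong": False,
    "outcome_not_enumerated": True,  # realized outcome was never listed
    "regime_misjudged": False,
    "probability_error": 0.10, "tolerance": 0.15,
    "slippage": 0.0, "slippage_limit": 0.01,
}
print(assign_credit(example))  # -> possibilities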

Agent Lifecycle

Discovery

Identify the problem domain, data sources, and success criteria. Map the operational environment — what systems exist, what data flows between them, what decisions are currently being made by humans. Define the Arena: world model, instruments, and charter. Discovery produces a structural model of the problem space before any agent is configured.



Configuration

Assemble the agent from composable cognitive layers. Select and configure Brain storage (what to remember and how), Mind models (what to predict and how to encode), Machine parameters (thresholds, stopping rules, action space), and Arena bindings (which instruments, which data feeds, which execution venues). Configuration is declarative — you describe what the agent should do, not how.
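
A sketch of what such a declarative configuration might look like as plain data. The keys mirror the four layers named above; every value is an invented example, not a documented schema.

# Declarative: the configuration states WHAT the agent should do.
# How each layer achieves it is left to the platform.
agent_config = {
    "brain":   {"remember": ["decisions", "outcomes"], "retention": "90d"},
    "mind":    {"predict": "next-regime", "encoding": "pattern-outcome-tree"},
    "machine": {"confidence_threshold": 0.8,
                "stopping_rule": "risk-gate",
                "action_space": ["alert", "order", "no-op"]},
    "arena":   {"instruments": ["ES"],
                "feeds": ["prices", "news"],
                "venues": ["sim"]},
}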



Deployment

Deploy to the target environment with progressive rollout. Shadow mode runs the agent alongside existing processes without taking action, allowing validation against real data. Canary mode allows limited action on a subset of the workload. Full deployment grants the agent its complete action space. Each stage includes automated health checks and rollback triggers.
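
A sketch of the stage-promotion logic, using the stage names from the text; the health check and rollback policy shown are assumptions.

def next_stage(current: str, health_ok: bool) -> str:
    # Shadow -> canary -> full; a failed health check is a rollback trigger.
    stages = ["shadow", "canary", "full"]
    if not health_ok:
        return "shadow"                         # roll back to observation only
    i = stages.index(current)
    return stages[min(i + 1, len(stages) - 1)]  # promote one stage at a time

print(next_stage("shadow", health_ok=True))   # -> canary
print(next_stage("canary", health_ok=False))  # -> shadow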



Monitoring

Continuous observation of agent behavior against expected patterns. The monitoring system itself uses the Hub-and-Spoke architecture — multiple monitoring spokes watch different aspects of agent performance independently. Agreement between monitors means the agent is behaving as expected. Conflict means something unexpected is happening. Dashboards are viewers, not control panels — the monitoring system itself can trigger responses.



Learning

The agent updates its internal models based on outcome feedback. Learning is not retraining — it is continuous adaptation through prediction error signals. The Mind layer adjusts its models, the Machine layer adjusts its thresholds, and the Brain layer consolidates new experiences into long-term memory. Learning is regime-indexed: what is learned in one regime is stored with that regime's context and only reactivated when similar conditions recur.



Evolution

Periodic structural review of the agent's architecture. Are the right spokes in place? Is the Arena model still accurate? Have new data sources become available? Evolution is the meta-learning step — it changes not just parameters but structure. Evolution decisions are classified as One-Way Door by default and require governance review before implementation.

Execution Team Roles

Orchestrator

Reasoning, design, production code, state management

The Orchestrator is the primary reasoning engine. It handles task decomposition, architectural decisions, production code generation, and state management across the system. It coordinates the other team members, routing tasks to the appropriate specialist based on task type. The Orchestrator maintains the full context of the current operation and is responsible for ensuring coherence across all decisions.

Reviewer

Consensus, web grounding, spec validation


The Reviewer provides independent validation and consensus-building. It is stateless — every invocation must include full context because it maintains no memory between calls. This is a feature, not a limitation: statelessness ensures the Reviewer evaluates each situation on its merits without anchoring to prior opinions. It excels at spec validation, web-grounded fact checking, and identifying gaps in reasoning.

Specialist

Algorithms, math, debugging — 15x cheaper for bulk


The Specialist handles algorithmic work, mathematical proofs, performance optimization, and deep debugging. At 15x lower cost than the Orchestrator for bulk operations, it is the preferred choice for cost-sensitive tasks that require technical depth but not broad reasoning. It excels at tasks with well-defined inputs and outputs — implementing a specific algorithm, optimizing a query, or tracing a bug through a call stack.

Executor

Long-running tasks (24h+), large refactors

The Executor handles tasks that take hours or days to complete — large codebase refactors, multi-file migrations, comprehensive test suite generation. It operates asynchronously and requires a detailed specification upfront because it cannot course-correct mid-execution. The quality of the specification directly determines the quality of the output. Create detailed specs first; the Executor cannot ask clarifying questions.

Subconscious

Cross-session continuity, persistent memory


The Subconscious provides memory that persists across sessions and interactions. It stores decisions, rationale, context, and outcomes that would otherwise be lost when a session ends. It enables the system to build on past work without re-discovering context, maintain long-running projects across multiple interactions, and detect patterns that only emerge over time. The Subconscious is passive — it is queried by other team members, not invoked directly.

Advisory Boards

Causal Inference Board

Causal reasoning, statistical validity, experimental design. Ensures the system ascends the causal ladder — from correlation to intervention to counterfactual. Reviews all learning algorithms and validation methodologies.


Representational Learning Board

Architecture patterns, self-supervised learning, energy-based models. Evaluates whether the system is learning efficient representations. Reviews model architecture decisions and training approaches.

Neuromorphic Systems Board

Hierarchical temporal memory, spatial reasoning, reference frames. Ensures the system maintains structural integrity and that cognitive layers respect their boundaries. Reviews layer interactions and import rules.


Applied Systems Board

Code quality, practical implementation, training infrastructure. Ensures designs are implementable, efficient, and maintainable. Reviews production code, performance characteristics, and operational reliability.