Observability
How to monitor agent activity, track costs at every granularity level, replay any action, configure alerts, and export data for your own analytics. Everything your agents do is logged, metered, and replayable.
Observability in Capx
Running an autonomous company means trusting agents to act on your behalf. Trust requires visibility. Capx records every action, every inference call, every tool invocation, every governance decision, and every credit spent.
This data is available in real time through four surfaces: the activity feed, the cost ledger, the replay log, and the heartbeat monitor.
You can access observability data through three interfaces:
- The Casa dashboard, a visual, browser-based interface.
- The CLI, for terminal-first workflows.
- The REST API, for programmatic access.
All three interfaces expose the same underlying data.
Activity feed
The activity feed is a chronological log of every action taken by every agent in your company. Each entry includes the agent, the action type, the playbook and step that triggered it, the cost, the outcome, and a timestamp.
What is logged
| Event type | Description | Includes |
|---|---|---|
| playbook.started | A playbook execution began. | Playbook name, version, trigger source, inputs. |
| playbook.completed | A playbook execution finished. | Duration, total cost, step count, final status. |
| step.started | A playbook step began executing. | Step ID, agent, tool/prompt, estimated cost. |
| step.completed | A step finished. | Output summary, actual cost, rubric score (if applicable). |
| step.failed | A step failed. | Error message, retry count, escalation path. |
| rubric.evaluated | A rubric graded step output. | Score, criteria, pass/fail, retry decision. |
| approval.requested | An action entered the approval queue. | Action details, estimated cost, context. |
| approval.resolved | A founder approved or rejected an action. | Decision, feedback, response time. |
| heartbeat.fired | An agent's heartbeat cycle ran. | Actions taken, items evaluated, cost. |
| agent.state_changed | An agent changed state. | Old state, new state, reason. |
| budget.warning | A budget threshold was crossed. | Agent/company, threshold, current usage. |
| budget.exceeded | A budget cap was hit. | Agent/company, cap type, action taken. |
Querying the activity feed
Query the same feed from the CLI or the REST API. Both support filtering by agent, event type, playbook, and time window.
    # Recent activity (last 2 hours)
    capx activity --company my-company --since 2h
    # TIME   AGENT       EVENT               COST  STATUS
    # 10:42  engineer    step.completed      22cr  ok
    # 10:38  engineer    step.started        -     running
    # 10:35  marketer    rubric.evaluated    5cr   pass (8/10)
    # 10:31  marketer    step.completed      18cr  ok
    # 10:15  support     heartbeat.fired     4cr   2 actions
    # 10:02  strategist  approval.requested  -     pending
    # 09:58  strategist  step.completed      12cr  ok

    # Filter by agent
    capx activity --company my-company --agent engineer --since 24h

    # Filter by event type
    capx activity --company my-company --type step.failed --since 7d

    # Filter by playbook
    capx activity --company my-company --playbook weekly-content --since 7d

    # Search activity (full-text search on event details)
    capx activity --company my-company --search "customer onboarding" --since 30d
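The same filters can be applied locally to exported activity data. The following is a minimal sketch, not part of Capx: it assumes each event is an object with `agent`, `event`, and `cost` keys, which are hypothetical field names chosen to mirror the CLI columns. Check the real export schema before relying on them.

```python
from collections import Counter

# Hypothetical event records modelled on the CLI output columns.
# The real export schema may use different field names.
events = [
    {"time": "10:42", "agent": "engineer", "event": "step.completed", "cost": 22},
    {"time": "10:38", "agent": "engineer", "event": "step.started", "cost": 0},
    {"time": "10:35", "agent": "marketer", "event": "rubric.evaluated", "cost": 5},
    {"time": "10:31", "agent": "marketer", "event": "step.completed", "cost": 18},
    {"time": "10:15", "agent": "support", "event": "heartbeat.fired", "cost": 4},
]


def filter_events(events, agent=None, event_type=None):
    """Return events matching the given agent and/or event type."""
    return [
        e for e in events
        if (agent is None or e["agent"] == agent)
        and (event_type is None or e["event"] == event_type)
    ]


def cost_by_agent(events):
    """Sum credits spent per agent across the given events."""
    totals = Counter()
    for e in events:
        totals[e["agent"]] += e["cost"]
    return dict(totals)


print(filter_events(events, event_type="step.completed"))
print(cost_by_agent(events))
```

Keeping the filter and the aggregation as separate functions makes it easy to combine them, for example summing cost only over one agent's completed steps.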
Cost ledger
The cost ledger tracks every credit spent with granularity down to individual tool calls within a playbook step. It is the authoritative record for billing and the primary tool for cost optimization.
Granularity levels
| Level | What it shows | Use case |
|---|---|---|
| Company | Total credits per day/week/month across all agents. | Budget tracking, executive reporting. |
| Agent | Credits per agent per time period. | Identifying expensive agents, rebalancing budgets. |
| Playbook | Credits per playbook across all runs. | Optimizing high-cost workflows. |
| Run | Credits per playbook run, broken down by step. | Debugging a specific execution. |
| Step | Credits per step, split into inference and tool costs. | Finding expensive prompts or tools. |
| Call | Individual inference calls and tool invocations. | Deep debugging, model comparison. |
    # Company-level: monthly summary
    capx costs --company my-company
    # Total this month: 15,790 credits ($157.90)

    # Agent-level: per-agent breakdown
    capx costs --company my-company --by agent
    # AGENT       THIS MONTH  % OF TOTAL  BUDGET  REMAINING
    # strategist  4,240cr     26.8%       30,000  25,760
    # engineer    8,720cr     55.2%       20,000  11,280
    # marketer    1,150cr     7.3%        15,000  13,850
    # support     1,680cr     10.6%       8,000   6,320

    # Run-level: step-by-step cost breakdown
    capx costs --company my-company --run run_8k2m9x
    # STEP         AGENT       INFERENCE  TOOLS  STORAGE  TOTAL
    # research     marketer    12cr       6cr    0cr      18cr
    # pick_topic   strategist  8cr        0cr    0cr      8cr
    # draft_post   marketer    22cr       0cr    0cr      22cr
    # rubric_eval  marketer    6cr        0cr    0cr      6cr
    # publish      engineer    2cr        1cr    1cr      4cr
    # notify       strategist  2cr        1cr    0cr      3cr

    # Call-level: individual inference calls within a step
    capx costs --company my-company --run run_8k2m9x --step draft_post --detail
    # CALL  MODEL            INPUT    OUTPUT   COST  LATENCY
    # 1     claude-sonnet-4  3,200tk  1,800tk  16cr  4.2s
    # 2     claude-sonnet-4  1,200tk  400tk    6cr   1.8s (rubric)
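To show how the granularity levels roll up, the sketch below sums the step-level figures from the sample run into component and run totals. It is an illustration only: the dictionary layout is hypothetical, and the conversion of 100 credits per dollar is inferred from the monthly summary line (15,790 credits shown as $157.90) rather than from a documented rate.

```python
# Step-level costs copied from the sample run output (credits).
# The dictionary layout is a hypothetical structure for illustration.
steps = {
    "research":    {"inference": 12, "tools": 6, "storage": 0},
    "pick_topic":  {"inference": 8,  "tools": 0, "storage": 0},
    "draft_post":  {"inference": 22, "tools": 0, "storage": 0},
    "rubric_eval": {"inference": 6,  "tools": 0, "storage": 0},
    "publish":     {"inference": 2,  "tools": 1, "storage": 1},
    "notify":      {"inference": 2,  "tools": 1, "storage": 0},
}

# Assumption: 100 credits = $1, inferred from "15,790 credits ($157.90)".
CREDITS_PER_DOLLAR = 100


def step_totals(steps):
    """Total credits per step (inference + tools + storage)."""
    return {name: sum(parts.values()) for name, parts in steps.items()}


def component_totals(steps):
    """Total credits per cost component across all steps."""
    totals = {}
    for parts in steps.values():
        for component, credits in parts.items():
            totals[component] = totals.get(component, 0) + credits
    return totals


def run_total(steps):
    """Total credits for the whole run."""
    return sum(step_totals(steps).values())


def to_dollars(credits):
    """Convert credits to dollars under the assumed rate."""
    return credits / CREDITS_PER_DOLLAR


print(step_totals(steps))
print(component_totals(steps))
print(run_total(steps), to_dollars(run_total(steps)))
```

Summing by component rather than by step is a quick way to see whether a run is dominated by inference or by tool usage.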
Replay log
The replay log lets you reconstruct exactly what happened during any action. It shows the full context the agent had (system prompt, memory entries, conversation history), the model request, the raw response, tool calls, rubric evaluation, and the governance decision. This is the primary debugging tool when an agent produces unexpected output.
    # Replay a specific step in a run
    capx replay run_8k2m9x:draft_post --company my-company
    # === Replay: run_8k2m9x / step: draft_post ===
    # Agent: marketer
    # Model: claude-sonnet-4
    # Timestamp: 2026-05-25T10:31:14Z
    #
    # System prompt: (234 tokens)
    #   You are a marketer for this company. You create content...
    #
    # Memory entries retrieved: 12 (relevance strategy)
    #   - "Blog audience prefers practical examples" (score: 0.94)
    #   - "Last post on AI tooling got 2x average engagement" (score: 0.91)
    #   - ...
    #
    # Input messages: 3 (1,840 tokens)
    #   [step context from pick_topic output]
    #
    # Model response: (1,800 tokens, 4.2s)
    #   [full text of the generated blog post]
    #
    # Rubric evaluation:
    #   Criteria: "Between 800 and 1500 words, 3+ examples, CTA, no jargon"
    #   Score: 8/10 (threshold: 7) -> PASS
    #
    # Governance: auto-approved (cost 22cr < threshold 50cr)
    # Total cost: 22cr (inference: 16cr + rubric: 6cr)
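The last lines of a replay record two comparisons: the rubric score against its threshold, and the step cost against the governance auto-approval threshold. The sketch below restates those two checks so they are easier to reason about when reading a replay. It is an illustration, not Capx's implementation: the class, the function names, and the "needs-approval" label are hypothetical, and treating a score equal to the threshold as a pass is an assumption.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    """Hypothetical container for the figures shown at the end of a replay."""
    rubric_score: int
    rubric_threshold: int
    cost_credits: int


def rubric_passes(step: StepResult) -> bool:
    # Assumption: a score equal to the threshold counts as a pass.
    return step.rubric_score >= step.rubric_threshold


def governance_decision(step: StepResult, auto_approve_below: int) -> str:
    # Mirrors the replay line "cost 22cr < threshold 50cr".
    # "needs-approval" is a hypothetical label for the other branch.
    if step.cost_credits < auto_approve_below:
        return "auto-approved"
    return "needs-approval"


draft_post = StepResult(rubric_score=8, rubric_threshold=7, cost_credits=22)
print(rubric_passes(draft_post))
print(governance_decision(draft_post, auto_approve_below=50))
```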
Verbose replay with raw payloads
Pass --verbose to include the raw request and response payloads, which is useful when debugging adapters.
    # Include full request/response payloads (useful for adapter debugging)
    capx replay run_8k2m9x:draft_post --company my-company --verbose
    # Include raw JSON request sent to the adapter
    # Include raw JSON response from the model
    # Include token-by-token timing (if available)
Heartbeat monitor
The heartbeat monitor tracks the health and activity of every agent. It shows when each agent last fired its heartbeat, when the next one is due, how many actions the agent took, and whether any errors occurred.
    # Current status of every agent
    capx agent status --company my-company
    # AGENT       LAST HEARTBEAT   NEXT DUE  STATE   ACTIONS  ERRORS (24h)
    # strategist  10:00 (42m ago)  12:00     active  3        0
    # engineer    10:30 (12m ago)  11:00     active  5        0
    # marketer    09:00 (1h ago)   13:00     active  2        0
    # support     10:45 (3m ago)   11:00     active  8        1

    # Heartbeat history for a single agent
    capx agent heartbeats engineer --company my-company --since 24h
    # TIME   ACTIONS  ITEMS EVAL  COST  ERRORS
    # 10:30  5        8           12cr  0
    # 10:00  3        5           8cr   0
    # 09:30  2        3           6cr   0
    # 09:00  4        7           10cr  0
    # 08:30  0        2           4cr   0
    # ...
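If you mirror heartbeat data into your own monitoring, the "next due" time is the value to compare against the clock. The sketch below is a client-side illustration only and does not describe how Capx detects missed heartbeats: the function name, the five-minute grace period, and the dictionary layout are all assumptions, and the due times are taken from the sample status output.

```python
from datetime import datetime, timedelta


def overdue_agents(next_due, now, grace=timedelta(minutes=5)):
    """Return agents whose next heartbeat is late by more than `grace`.

    `next_due` maps agent name to the datetime its next heartbeat is due.
    The grace period is an arbitrary assumption for this sketch.
    """
    return sorted(agent for agent, due in next_due.items() if now > due + grace)


# Due times from the sample status output; the date is illustrative.
day = datetime(2026, 5, 25)
next_due = {
    "strategist": day.replace(hour=12),
    "engineer": day.replace(hour=11),
    "marketer": day.replace(hour=13),
    "support": day.replace(hour=11),
}

# At 11:10, agents due at 11:00 are past the grace period if they have not fired.
print(overdue_agents(next_due, now=day.replace(hour=11, minute=10)))
```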
If an agent misses a scheduled heartbeat, Capx records a heartbeat.missed event in the activity feed and triggers a webhook alert if configured. Missed heartbeats typically indicate that the agent is paused or killed, or that the runtime encountered an error.

Casa dashboard
The Casa dashboard provides a visual overview of your company's operations. It surfaces the same data as the CLI and API in a browser-based interface.
| Dashboard section | What it shows |
|---|---|
| Overview | Total costs today/week/month, active agents, pending approvals, recent errors. |
| Activity | Real-time activity feed with filters and search. |
| Agents | Agent roster with state, heartbeat status, budget usage per agent. |
| Playbooks | Playbook list with run history, average cost, success rate. |
| Costs | Interactive cost charts: by agent, by playbook, by model, over time. |
| Approvals | Pending approval queue with approve/reject actions. |
| Settings | Company configuration, governance rules, secrets, team management. |
Usage analytics
Beyond raw activity and cost data, Capx provides aggregate analytics that help you understand patterns and optimize over time.
    # Playbook success rate over the past 30 days
    capx analytics playbooks --company my-company --period 30d
    # PLAYBOOK             RUNS  SUCCESS  AVG COST  AVG DURATION  RUBRIC AVG
    # weekly-content       4     100%     116cr     8m 12s        8.2/10
    # customer-onboarding  32    94%      18cr      1m 45s        8.7/10
    # support-triage       156   98%      4cr       22s           7.4/10
    # deploy-pipeline      8     88%      62cr      3m 30s        7.8/10

    # Agent efficiency over the past 30 days
    capx analytics agents --company my-company --period 30d
    # AGENT       TASKS  SUCCESS  AVG COST/TASK  RUBRIC AVG  IDLE %
    # strategist  120    97%      35cr           8.5/10      22%
    # engineer    85     91%      102cr          7.9/10      15%
    # marketer    64     95%      18cr           8.3/10      45%
    # support     312    99%      5cr            7.6/10      8%

    # Model efficiency comparison
    capx analytics models --company my-company --period 30d
    # MODEL            CALLS  AVG COST  AVG LATENCY  RUBRIC AVG
    # claude-sonnet-4  2,840  8.2cr     3.1s         8.2/10
    # claude-haiku-4   4,120  1.8cr     0.9s         7.5/10
    # gpt-4.1-mini     340    3.4cr     2.2s         7.8/10
Webhook alerts
Configure webhooks to receive real-time notifications when specific events occur. Webhooks are delivered as POST requests with a JSON payload containing the event type, company ID, and event-specific data.
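On the receiving side, a handler typically authenticates the request before acting on it. This page does not specify how the webhook secret is used, so the sketch below assumes a common convention: an HMAC-SHA256 signature of the raw request body, hex-encoded and sent alongside the request. The signing scheme, the function names, and the way the signature reaches the handler are all assumptions to verify against the actual delivery format. Only the `event`, `company_id`, and `data` fields come from the payload description.

```python
import hashlib
import hmac
import json


def verify_signature(secret: str, body: bytes, signature_hex: str) -> bool:
    """Check an HMAC-SHA256 signature over the raw body (assumed scheme)."""
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature_hex)


def handle_webhook(secret: str, body: bytes, signature_hex: str):
    """Verify the request, then return (event type, company ID, event data)."""
    if not verify_signature(secret, body, signature_hex):
        raise ValueError("signature mismatch")
    payload = json.loads(body)
    return payload["event"], payload["company_id"], payload["data"]


# Local demonstration with a made-up secret and a self-signed body.
demo_secret = "s3cret"
demo_body = json.dumps({
    "event": "budget.warning",
    "company_id": "comp_8k2m9x",
    "data": {"agent": "engineer", "current_usage": 816, "cap_limit": 1000},
}).encode()
demo_signature = hmac.new(demo_secret.encode(), demo_body, hashlib.sha256).hexdigest()

print(handle_webhook(demo_secret, demo_body, demo_signature))
```

Verifying against the raw bytes, before parsing, matters because re-serialized JSON may not match the bytes that were signed.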
Example configuration:

    governance:
      webhooks:
        - url: "https://your-api.com/webhooks/capx"
          secret: "${{ secrets.WEBHOOK_SECRET }}"
          events:
            - step.failed
            - approval.requested
            - budget.warning
            - budget.exceeded
            - heartbeat.missed
            - agent.state_changed
        - url: "https://hooks.slack.com/services/T.../B.../..."
          events:
            - playbook.completed
            - approval.requested
            - budget.exceeded

Example payload:

    {
      "event": "budget.warning",
      "company_id": "comp_8k2m9x",
      "timestamp": "2026-05-25T10:42:00Z",
      "data": {
        "agent": "engineer",
        "cap_type": "daily",
        "threshold": 0.8,
        "current_usage": 816,
        "cap_limit": 1000,
        "percentage": 0.816
      }
    }

Data export
Export activity, cost, and analytics data in JSON or CSV format for use in your own dashboards, spreadsheets, or data pipelines.
    # Export activity as JSON
    capx activity --company my-company --since 30d --format json > activity.json

    # Export costs as CSV
    capx costs --company my-company --period 30d --format csv > costs.csv

    # Export analytics as JSON
    capx analytics playbooks --company my-company --period 90d --format json > playbook-analytics.json

    # Stream activity in real time via API (Server-Sent Events)
    curl -N "https://api.capx.ai/v1/companies/my-company/activity/stream" \
      -H "Authorization: Bearer capx_sk_live_..." \
      -H "Accept: text/event-stream"
| Export format | CLI flag | Best for |
|---|---|---|
| JSON | --format json | Programmatic processing, data pipelines, custom dashboards. |
| CSV | --format csv | Spreadsheets, quick analysis, importing into BI tools. |
| SSE (streaming) | API only | Real-time monitoring, custom alerting, live dashboards. |
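For the streaming option, a client has to split the response into individual events. The sketch below parses the standard text/event-stream framing (`event:` and `data:` fields, blank line as event terminator, lines starting with `:` as comments) from any iterable of text lines. It performs no network I/O, and the event names and JSON shape in the sample lines are assumptions about what the stream carries, not documented output.

```python
import json


def parse_sse(lines):
    """Yield (event_name, data) tuples from an iterable of SSE text lines."""
    event, data = "message", []  # "message" is the SSE default event name
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one leading space after the colon is dropped
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical stream content for illustration.
sample = [
    "event: step.completed\n",
    'data: {"agent": "engineer", "cost": 22}\n',
    "\n",
    ": keep-alive\n",
    'data: {"agent": "support", "cost": 4}\n',
    "\n",
]

for name, data in parse_sse(sample):
    print(name, json.loads(data))
```

Because the parser accepts any iterable of lines, the same function can be fed from a file, a test fixture, or a line iterator over an HTTP response.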
Best practices
- Check the activity feed daily during your first two weeks. Patterns emerge quickly: which agents are busiest, which playbooks fail most often, which steps cost the most. After two weeks you will have a strong intuition for what normal looks like, which makes anomalies obvious.
- Set up webhook alerts from day one for step.failed, budget.exceeded, and heartbeat.missed. These three events cover the most common failure modes: broken playbooks, runaway costs, and unresponsive agents. Route them to Slack or a similar channel where you will actually see them.
- Use the replay log proactively, not just for debugging. Replaying a few random successful actions per week helps you spot quality issues before they become patterns. It also builds your understanding of how agents reason, which makes you better at writing playbooks and tuning rubrics.
