Private beta: The Capx developer platform is available to private beta participants. Join the waitlist for access.

Observability

How to monitor agent activity, track costs at every granularity level, replay any action, configure alerts, and export data for your own analytics. Everything your agents do is logged, metered, and replayable.

Observability in Capx

Running an autonomous company means trusting agents to act on your behalf. Trust requires visibility. Capx records every action, every inference call, every tool invocation, every governance decision, and every credit spent.

This data is available in real time through four surfaces:

Activity feed: A chronological log of every action taken by every agent.
Cost ledger: Every credit spent, down to individual tool calls.
Replay log: Reconstruct exactly what happened during any action.
Heartbeat monitor: Health and activity status for every agent.

You can access observability data through three interfaces:

The Casa dashboard, a visual, browser-based interface.
The CLI, for terminal-first workflows.
The REST API, for programmatic access.

All three interfaces expose the same underlying data.

Activity feed

The activity feed is a chronological log of every action taken by every agent in your company. Each entry includes the agent, the action type, the playbook and step that triggered it, the cost, the outcome, and a timestamp.

What is logged

Event type	Description	Includes
playbook.started	A playbook execution began.	Playbook name, version, trigger source, inputs.
playbook.completed	A playbook execution finished.	Duration, total cost, step count, final status.
step.started	A playbook step began executing.	Step ID, agent, tool/prompt, estimated cost.
step.completed	A step finished.	Output summary, actual cost, rubric score (if applicable).
step.failed	A step failed.	Error message, retry count, escalation path.
rubric.evaluated	A rubric graded step output.	Score, criteria, pass/fail, retry decision.
approval.requested	An action entered the approval queue.	Action details, estimated cost, context.
approval.resolved	A founder approved or rejected an action.	Decision, feedback, response time.
heartbeat.fired	An agent's heartbeat cycle ran.	Actions taken, items evaluated, cost.
agent.state_changed	An agent changed state.	Old state, new state, reason.
budget.warning	A budget threshold was crossed.	Agent/company, threshold, current usage.
budget.exceeded	A budget cap was hit.	Agent/company, cap type, action taken.

Querying the activity feed

Query the same feed from the CLI or the REST API. Both support filtering by agent, event type, playbook, and time window.

# Recent activity (last 2 hours)
capx activity --company my-company --since 2h

# TIME      AGENT        EVENT                COST    STATUS
# 10:42     engineer     step.completed       22cr    ok
# 10:38     engineer     step.started         -       running
# 10:35     marketer     rubric.evaluated     5cr     pass (8/10)
# 10:31     marketer     step.completed       18cr    ok
# 10:15     support      heartbeat.fired      4cr     2 actions
# 10:02     strategist   approval.requested   -       pending
# 09:58     strategist   step.completed       12cr    ok

# Filter by agent
capx activity --company my-company --agent engineer --since 24h

# Filter by event type
capx activity --company my-company --type step.failed --since 7d

# Filter by playbook
capx activity --company my-company --playbook weekly-content --since 7d

# Search activity (full-text search on event details)
capx activity --company my-company --search "customer onboarding" --since 30d
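
The same filters can also be applied offline to an exported copy of the feed. The following is a minimal sketch, assuming each exported entry is a JSON object with `agent`, `event`, and `timestamp` fields. Those field names and the sample entries are illustrative assumptions, not a documented schema.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical entries; the field names are assumptions, not the documented schema.
events = [
    {"agent": "engineer", "event": "step.completed", "timestamp": "2026-05-25T10:42:00Z"},
    {"agent": "engineer", "event": "step.failed", "timestamp": "2026-05-25T09:10:00Z"},
    {"agent": "marketer", "event": "rubric.evaluated", "timestamp": "2026-05-25T10:35:00Z"},
    {"agent": "support", "event": "step.failed", "timestamp": "2026-05-20T08:00:00Z"},
]


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2026-05-25T10:42:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def filter_events(entries, *, agent=None, event_type=None, since=None, now=None):
    """Keep the entries that match every filter that was supplied."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - since if since is not None else None
    result = []
    for entry in entries:
        if agent is not None and entry["agent"] != agent:
            continue
        if event_type is not None and entry["event"] != event_type:
            continue
        if cutoff is not None and parse_ts(entry["timestamp"]) < cutoff:
            continue
        result.append(entry)
    return result


# Roughly equivalent to: --type step.failed --since 24h
now = parse_ts("2026-05-25T10:45:00Z")
recent_failures = filter_events(
    events, event_type="step.failed", since=timedelta(hours=24), now=now
)
print([entry["agent"] for entry in recent_failures])
```

Passing `now` explicitly keeps the time window reproducible, which helps when the same export is analyzed more than once.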

Cost ledger

The cost ledger tracks every credit spent with granularity down to individual tool calls within a playbook step. It is the authoritative record for billing and the primary tool for cost optimization.

Granularity levels

Level	What it shows	Use case
Company	Total credits per day/week/month across all agents.	Budget tracking, executive reporting.
Agent	Credits per agent per time period.	Identifying expensive agents, rebalancing budgets.
Playbook	Credits per playbook across all runs.	Optimizing high-cost workflows.
Run	Credits per playbook run, broken down by step.	Debugging a specific execution.
Step	Credits per step, split into inference and tool costs.	Finding expensive prompts or tools.
Call	Individual inference calls and tool invocations.	Deep debugging, model comparison.

Cost ledger queries

bash

# Company-level: monthly summary
capx costs --company my-company
# Total this month: 15,790 credits ($157.90)

# Agent-level: per-agent breakdown
capx costs --company my-company --by agent
# AGENT        THIS MONTH   % OF TOTAL   BUDGET     REMAINING
# strategist   4,240cr      26.9%        30,000     25,760
# engineer     8,720cr      55.2%        20,000     11,280
# marketer     1,150cr       7.3%        15,000     13,850
# support      1,680cr      10.6%         8,000      6,320

# Run-level: step-by-step cost breakdown
capx costs --company my-company --run run_8k2m9x
# STEP         AGENT       INFERENCE   TOOLS   STORAGE   TOTAL
# research     marketer    12cr        6cr     0cr       18cr
# pick_topic   strategist  8cr         0cr     0cr       8cr
# draft_post   marketer    22cr        0cr     0cr       22cr
# rubric_eval  marketer    6cr         0cr     0cr       6cr
# publish      engineer    2cr         1cr     1cr       4cr
# notify       strategist  2cr         1cr     0cr       3cr

# Call-level: individual inference calls within a step
capx costs --company my-company --run run_8k2m9x --step draft_post --detail
# CALL   MODEL           INPUT     OUTPUT    COST    LATENCY
# 1      claude-sonnet-4  3,200tk   1,800tk   16cr    4.2s
# 2      claude-sonnet-4  1,200tk     400tk    6cr    1.8s  (rubric)
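
The granularity levels are roll-ups of one another: call costs sum to step costs, step costs sum to a run total, and so on. As an illustration only, the sketch below re-adds the step rows from the run-level example for run_8k2m9x. The tuple layout and the `rollup` helper are hypothetical and are not part of the Capx CLI or API.

```python
from collections import defaultdict

# Step rows copied from the run-level example: (step, agent, inference, tools, storage).
steps = [
    ("research", "marketer", 12, 6, 0),
    ("pick_topic", "strategist", 8, 0, 0),
    ("draft_post", "marketer", 22, 0, 0),
    ("rubric_eval", "marketer", 6, 0, 0),
    ("publish", "engineer", 2, 1, 1),
    ("notify", "strategist", 2, 1, 0),
]


def rollup(rows):
    """Aggregate step rows into per-step, per-agent, and run-level totals (credits)."""
    per_step = {}
    per_agent = defaultdict(int)
    for step, agent, inference, tools, storage in rows:
        total = inference + tools + storage
        per_step[step] = total
        per_agent[agent] += total
    return per_step, dict(per_agent), sum(per_step.values())


per_step, per_agent, run_total = rollup(steps)
print(per_agent)   # credits per agent for this run
print(run_total)   # credits for the whole run
```

A check like this is handy when reconciling an exported ledger against the totals shown in the dashboard.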

Replay log

The replay log lets you reconstruct exactly what happened during any action. It shows the full context the agent had (system prompt, memory entries, conversation history), the model request, the raw response, tool calls, rubric evaluation, and the governance decision. This is the primary debugging tool when an agent produces unexpected output.

Replaying an action

bash

# Replay a specific step in a run
capx replay run_8k2m9x:draft_post --company my-company

# === Replay: run_8k2m9x / step: draft_post ===
# Agent: marketer
# Model: claude-sonnet-4
# Timestamp: 2026-05-25T10:31:14Z
#
# System prompt: (234 tokens)
#   You are a marketer for this company. You create content...
#
# Memory entries retrieved: 12 (relevance strategy)
#   - "Blog audience prefers practical examples" (score: 0.94)
#   - "Last post on AI tooling got 2x average engagement" (score: 0.91)
#   - ...
#
# Input messages: 3 (1,840 tokens)
#   [step context from pick_topic output]
#
# Model response: (1,800 tokens, 4.2s)
#   [full text of the generated blog post]
#
# Rubric evaluation:
#   Criteria: "Between 800 and 1500 words, 3+ examples, CTA, no jargon"
#   Score: 8/10 (threshold: 7) -> PASS
#
# Governance: auto-approved (cost 22cr < threshold 50cr)
# Total cost: 22cr (inference: 16cr + rubric: 6cr)
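
The last lines of the replay encode two separate decisions: the rubric verdict and the governance verdict. A minimal sketch of both checks follows, using the numbers from the replay. The `StepResult` type and both functions are hypothetical, and the boundary behavior (a score equal to the threshold passes, a cost equal to the threshold needs approval) is an assumption; the replay only shows a score above and a cost below their thresholds.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    rubric_score: int           # 8 in the replay (out of 10)
    rubric_threshold: int       # 7 in the replay
    cost_cr: int                # 22 in the replay
    approval_threshold_cr: int  # 50 in the replay


def rubric_verdict(result: StepResult) -> str:
    # Assumption: a score equal to the threshold counts as a pass.
    return "PASS" if result.rubric_score >= result.rubric_threshold else "FAIL"


def governance_verdict(result: StepResult) -> str:
    # Mirrors "cost 22cr < threshold 50cr": strictly below the threshold auto-approves.
    if result.cost_cr < result.approval_threshold_cr:
        return "auto-approved"
    return "approval required"


draft_post = StepResult(rubric_score=8, rubric_threshold=7, cost_cr=22, approval_threshold_cr=50)
print(rubric_verdict(draft_post), governance_verdict(draft_post))
```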

Verbose replay with raw payloads

Pass --verbose to include the raw request and response payloads, which is useful when debugging adapters.

Verbose replay

bash

# Include full request/response payloads (useful for adapter debugging)
capx replay run_8k2m9x:draft_post --company my-company --verbose

# Verbose output adds:
#   - the raw JSON request sent to the adapter
#   - the raw JSON response from the model
#   - token-by-token timing (if available)

Tip

When an agent produces unexpected output, replay is the first tool to reach for. It answers the three critical questions: what context did the agent have, what did the model return, and how did governance evaluate it.

Heartbeat monitor

The heartbeat monitor tracks the health and activity of every agent. It shows when each agent last fired its heartbeat, when the next one is due, how many actions the agent took, and whether any errors occurred.

Heartbeat status

bash

capx agent status --company my-company

# AGENT        LAST HEARTBEAT     NEXT DUE      STATE    ACTIONS   ERRORS (24h)
# strategist   10:00 (42m ago)    12:00          active   3         0
# engineer     10:30 (12m ago)    11:00          active   5         0
# marketer     09:00 (1h ago)     13:00          active   2         0
# support      10:45 (3m ago)     11:00          active   8         1

Heartbeat history for an agent

bash

capx agent heartbeats engineer --company my-company --since 24h

# TIME      ACTIONS   ITEMS EVAL   COST    ERRORS
# 10:30     5         8            12cr    0
# 10:00     3         5             8cr    0
# 09:30     2         3             6cr    0
# 09:00     4         7            10cr    0
# 08:30     0         2             4cr    0
# ...

Note

A missed heartbeat (one that should have fired but did not) generates a heartbeat.missed event in the activity feed and, if a webhook is configured for that event, triggers an alert. A missed heartbeat typically means that the agent is paused or killed, or that the runtime encountered an error.
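
If you mirror heartbeat data into your own monitoring, the missed-heartbeat rule can be approximated as "the next due time has passed and nothing fired". The sketch below assumes a fixed interval per agent and adds a small grace period. The function name, the grace period, and the 30-minute interval (inferred from the engineer row in the status table) are assumptions, not documented Capx behavior.

```python
from datetime import datetime, timedelta, timezone


def is_heartbeat_missed(
    last_fired: datetime,
    interval: timedelta,
    now: datetime,
    grace: timedelta = timedelta(minutes=1),
) -> bool:
    """Return True once the next due time plus a grace period has passed."""
    next_due = last_fired + interval
    return now > next_due + grace


# Engineer row from the status table: last heartbeat 10:30, next due 11:00.
last_fired = datetime(2026, 5, 25, 10, 30, tzinfo=timezone.utc)
interval = timedelta(minutes=30)

on_time = is_heartbeat_missed(last_fired, interval, now=datetime(2026, 5, 25, 10, 42, tzinfo=timezone.utc))
overdue = is_heartbeat_missed(last_fired, interval, now=datetime(2026, 5, 25, 11, 5, tzinfo=timezone.utc))
print(on_time, overdue)
```

The grace period avoids flagging a heartbeat that is only a few seconds late.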

Casa dashboard

The Casa dashboard provides a visual overview of your company's operations. It surfaces the same data as the CLI and API in a browser-based interface.

Dashboard section	What it shows
Overview	Total costs today/week/month, active agents, pending approvals, recent errors.
Activity	Real-time activity feed with filters and search.
Agents	Agent roster with state, heartbeat status, budget usage per agent.
Playbooks	Playbook list with run history, average cost, success rate.
Costs	Interactive cost charts: by agent, by playbook, by model, over time.
Approvals	Pending approval queue with approve/reject actions.
Settings	Company configuration, governance rules, secrets, team management.

Usage analytics

Beyond raw activity and cost data, Capx provides aggregate analytics that help you understand patterns and optimize over time.

Usage analytics CLI

bash

# Playbook success rate over the past 30 days
capx analytics playbooks --company my-company --period 30d

# PLAYBOOK               RUNS   SUCCESS   AVG COST   AVG DURATION   RUBRIC AVG
# weekly-content          4      100%      116cr      8m 12s         8.2/10
# customer-onboarding    32       94%       18cr      1m 45s         8.7/10
# support-triage        156       98%        4cr      22s            7.4/10
# deploy-pipeline         8       88%       62cr      3m 30s         7.8/10

# Agent efficiency over the past 30 days
capx analytics agents --company my-company --period 30d

# AGENT        TASKS    SUCCESS   AVG COST/TASK   RUBRIC AVG   IDLE %
# strategist   120      97%       35cr            8.5/10       22%
# engineer      85      91%       102cr           7.9/10       15%
# marketer      64      95%       18cr            8.3/10       45%
# support      312      99%        5cr            7.6/10        8%

# Model efficiency comparison
capx analytics models --company my-company --period 30d

# MODEL             CALLS    AVG COST   AVG LATENCY   RUBRIC AVG
# claude-sonnet-4   2,840    8.2cr      3.1s          8.2/10
# claude-haiku-4    4,120    1.8cr      0.9s          7.5/10
# gpt-4.1-mini        340    3.4cr      2.2s          7.8/10

Webhook alerts

Configure webhooks to receive real-time notifications when specific events occur. Webhooks are delivered as POST requests with a JSON payload containing the event type, company ID, and event-specific data.

company.yaml (webhooks)

yaml

governance:
  webhooks:
    - url: "https://your-api.com/webhooks/capx"
      secret: "${{ secrets.WEBHOOK_SECRET }}"
      events:
        - step.failed
        - approval.requested
        - budget.warning
        - budget.exceeded
        - heartbeat.missed
        - agent.state_changed

    - url: "https://hooks.slack.com/services/T.../B.../..."
      events:
        - playbook.completed
        - approval.requested
        - budget.exceeded

Webhook payload example

json

{
  "event": "budget.warning",
  "company_id": "comp_8k2m9x",
  "timestamp": "2026-05-25T10:42:00Z",
  "data": {
    "agent": "engineer",
    "cap_type": "daily",
    "threshold": 0.8,
    "current_usage": 816,
    "cap_limit": 1000,
    "percentage": 0.816
  }
}
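
On the receiving side, a handler typically parses the body and turns it into a short message. The following sketch does that for the budget.warning payload shown above. The `summarize_budget_warning` helper is hypothetical, and the code assumes the payload always has exactly the fields in that example.

```python
import json

# The example payload from this guide, as it might arrive in a POST body.
raw_body = """
{
  "event": "budget.warning",
  "company_id": "comp_8k2m9x",
  "timestamp": "2026-05-25T10:42:00Z",
  "data": {
    "agent": "engineer",
    "cap_type": "daily",
    "threshold": 0.8,
    "current_usage": 816,
    "cap_limit": 1000,
    "percentage": 0.816
  }
}
"""


def summarize_budget_warning(payload: dict) -> str:
    """Build a one-line, human-readable summary of a budget.warning event."""
    data = payload["data"]
    fraction = data["current_usage"] / data["cap_limit"]
    return (
        f"{payload['event']}: {data['agent']} at {fraction:.1%} of "
        f"{data['cap_type']} cap ({data['current_usage']}/{data['cap_limit']})"
    )


payload = json.loads(raw_body)
message = summarize_budget_warning(payload)
print(message)
```

Recomputing the fraction from `current_usage` and `cap_limit`, rather than trusting `percentage` alone, gives a cheap consistency check on the payload.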

Note

If your endpoint returns a non-2xx status code, the delivery is retried up to 3 times with increasing backoff (1s, 10s, 60s). Failed deliveries are logged in the activity feed.
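
To reason about how long an endpoint outage can last before a delivery is dropped, it can help to model the retry schedule. The sketch below simulates the sender's side using the delays quoted in the note. It assumes one initial attempt followed by three retries (four attempts in total); the `deliver` function and its parameters are hypothetical and do not describe Capx internals.

```python
import time
from typing import Callable

RETRY_DELAYS_S = (1, 10, 60)  # delays quoted in the note


def deliver(send: Callable[[], int], sleep: Callable[[float], None] = time.sleep) -> bool:
    """Attempt once, then retry after each delay until a 2xx status is returned."""
    for delay in (0,) + RETRY_DELAYS_S:
        if delay:
            sleep(delay)
        status = send()
        if 200 <= status < 300:
            return True
    return False


# Simulated endpoint: fails twice, then succeeds. Sleeps are recorded, not performed.
statuses = iter([500, 503, 200])
waits = []
delivered = deliver(lambda: next(statuses), sleep=waits.append)
print(delivered, waits)
```

Under these assumptions an endpoint that stays down for more than about 71 seconds (1 + 10 + 60) would miss the event, so handlers should be quick to recover and safe to call more than once.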

Data export

Export activity, cost, and analytics data in JSON or CSV format for use in your own dashboards, spreadsheets, or data pipelines.

Exporting data

bash

# Export activity as JSON
capx activity --company my-company --since 30d --format json > activity.json

# Export costs as CSV
capx costs --company my-company --period 30d --format csv > costs.csv

# Export analytics as JSON
capx analytics playbooks --company my-company --period 90d --format json > playbook-analytics.json

# Stream activity in real time via API (Server-Sent Events)
curl -N "https://api.capx.ai/v1/companies/my-company/activity/stream" \
  -H "Authorization: Bearer capx_sk_live_..." \
  -H "Accept: text/event-stream"

Export format	CLI flag	Best for
JSON	--format json	Programmatic processing, data pipelines, custom dashboards.
CSV	--format csv	Spreadsheets, quick analysis, importing into BI tools.
SSE (streaming)	API only	Real-time monitoring, custom alerting, live dashboards.
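
A Server-Sent Events stream is plain text: `event:` and `data:` lines, with a blank line ending each event. The sketch below is a minimal parser for that framing, fed here with sample lines instead of a live connection. The sample events and the assumption that each `data:` line carries a JSON object are illustrative; the actual stream contents are not specified in this guide.

```python
import json


def parse_sse(lines):
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event_name, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_lines:
                yield event_name, "\n".join(data_lines)
            event_name, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if field == "event":
                event_name = value
            elif field == "data":
                data_lines.append(value)


# Hypothetical stream contents, for illustration only.
sample_stream = [
    ": keep-alive\n",
    "event: step.completed\n",
    'data: {"agent": "engineer", "cost": 22}\n',
    "\n",
    "event: heartbeat.fired\n",
    'data: {"agent": "support", "cost": 4}\n',
    "\n",
]

parsed = [(name, json.loads(data)) for name, data in parse_sse(sample_stream)]
print(parsed)
```

In practice the lines would come from an HTTP client that iterates over the response body line by line.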

Best practices

Check the activity feed daily during your first two weeks. Patterns emerge quickly: which agents are busiest, which playbooks fail most often, which steps cost the most. After two weeks you will have a strong intuition for what normal looks like, which makes anomalies obvious.
Set up webhook alerts from day one for step.failed, budget.exceeded, and heartbeat.missed. These three events cover the most common failure modes: broken playbooks, runaway costs, and unresponsive agents. Route them to Slack or a similar channel where you will actually see them.
Use the replay log proactively, not just for debugging. Replaying a few random successful actions per week helps you spot quality issues before they become patterns. It also builds your understanding of how agents reason, which makes you better at writing playbooks and tuning rubrics.

Tip

Export cost data weekly and track your cost-per-task metric over time. If it is trending down, your company is getting more efficient. If it is trending up, check for model upgrades, longer prompts, or increased retry rates.
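
One simple way to track that trend is to divide weekly credits by weekly task count and fit a line through the result. The sketch below uses made-up weekly figures purely for illustration; the tuple layout and the helper names are assumptions about how you might store your own exports.

```python
# Hypothetical weekly exports: (ISO week, credits spent, tasks completed).
weekly = [
    ("2026-W18", 3900, 130),
    ("2026-W19", 3640, 130),
    ("2026-W20", 3510, 135),
    ("2026-W21", 3360, 140),
]


def cost_per_task(credits: float, tasks: int) -> float:
    """Credits per completed task; 0.0 when no tasks ran."""
    return credits / tasks if tasks else 0.0


def slope(values):
    """Least-squares slope of values against their index (change per week)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


per_task = [cost_per_task(credits, tasks) for _, credits, tasks in weekly]
trend = slope(per_task)
direction = "down" if trend < 0 else "up" if trend > 0 else "flat"
print(per_task, trend, direction)
```

A fitted slope is less sensitive to a single unusual week than comparing only the first and last values.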

Next steps

Cost Control: Set budgets and alerts for the spend you are now tracking.
Governance & Approvals: Approval queues, spend caps, and kill switches.
Activity & Costs API: Event streams and cost reporting endpoints.
CLI Reference: Every capx command used in this guide, with flags.
