Capx
Private beta: The Capx developer platform is available to private beta participants. Join the waitlist for access.

Observability

How to monitor agent activity, track costs at every granularity level, replay any action, configure alerts, and export data for your own analytics. Everything your agents do is logged, metered, and replayable.

Observability in Capx

Running an autonomous company means trusting agents to act on your behalf. Trust requires visibility. Capx records every action, every inference call, every tool invocation, every governance decision, and every credit spent.

This data is available in real time through three surfaces:

  • The Casa dashboard, a visual, browser-based interface.
  • The CLI, for terminal-first workflows.
  • The REST API, for programmatic access.

All three surfaces expose the same underlying data.

Activity feed

The activity feed is a chronological log of every action taken by every agent in your company. Each entry includes the agent, the action type, the playbook and step that triggered it, the cost, the outcome, and a timestamp.

What is logged

Event type | Description | Includes
playbook.started | A playbook execution began. | Playbook name, version, trigger source, inputs.
playbook.completed | A playbook execution finished. | Duration, total cost, step count, final status.
step.started | A playbook step began executing. | Step ID, agent, tool/prompt, estimated cost.
step.completed | A step finished. | Output summary, actual cost, rubric score (if applicable).
step.failed | A step failed. | Error message, retry count, escalation path.
rubric.evaluated | A rubric graded step output. | Score, criteria, pass/fail, retry decision.
approval.requested | An action entered the approval queue. | Action details, estimated cost, context.
approval.resolved | A founder approved or rejected an action. | Decision, feedback, response time.
heartbeat.fired | An agent's heartbeat cycle ran. | Actions taken, items evaluated, cost.
agent.state_changed | An agent changed state. | Old state, new state, reason.
budget.warning | A budget threshold was crossed. | Agent/company, threshold, current usage.
budget.exceeded | A budget cap was hit. | Agent/company, cap type, action taken.

Querying the activity feed

Query the same feed from the CLI or the REST API. Both support filtering by agent, event type, playbook, and time window.

# Recent activity (last 2 hours)
capx activity --company my-company --since 2h

# TIME      AGENT        EVENT                COST    STATUS
# 10:42     engineer     step.completed       22cr    ok
# 10:38     engineer     step.started         -       running
# 10:35     marketer     rubric.evaluated     5cr     pass (8/10)
# 10:31     marketer     step.completed       18cr    ok
# 10:15     support      heartbeat.fired      4cr     2 actions
# 10:02     strategist   approval.requested   -       pending
# 09:58     strategist   step.completed       12cr    ok

# Filter by agent
capx activity --company my-company --agent engineer --since 24h

# Filter by event type
capx activity --company my-company --type step.failed --since 7d

# Filter by playbook
capx activity --company my-company --playbook weekly-content --since 7d

# Search activity (full-text search on event details)
capx activity --company my-company --search "customer onboarding" --since 30d
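
The same kind of filtering can be done in a script once the feed has been exported as JSON. The sketch below is illustrative only: it assumes the export is a JSON array of event objects with `agent` and `event` fields, which is a guess at the schema rather than a documented format, so adjust the field names to match the real export.

```python
import json
from collections import Counter

# Hypothetical export shape: a JSON array of event objects.
# The field names "agent" and "event" are assumptions, not a documented schema.
SAMPLE_EXPORT = """
[
  {"agent": "engineer", "event": "step.completed"},
  {"agent": "engineer", "event": "step.failed"},
  {"agent": "support", "event": "heartbeat.fired"},
  {"agent": "engineer", "event": "step.failed"}
]
"""


def count_events_by_agent(events, event_type):
    """Count how many events of the given type each agent produced."""
    return Counter(e["agent"] for e in events if e["event"] == event_type)


events = json.loads(SAMPLE_EXPORT)
failures = count_events_by_agent(events, "step.failed")

# Print agents with the most failed steps first.
for agent, count in failures.most_common():
    print(f"{agent}: {count} failed step(s)")
```

Counting failures per agent like this is a quick way to decide which agent's playbooks to inspect first.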

Cost ledger

The cost ledger tracks every credit spent with granularity down to individual tool calls within a playbook step. It is the authoritative record for billing and the primary tool for cost optimization.

Granularity levels

Level | What it shows | Use case
Company | Total credits per day/week/month across all agents. | Budget tracking, executive reporting.
Agent | Credits per agent per time period. | Identifying expensive agents, rebalancing budgets.
Playbook | Credits per playbook across all runs. | Optimizing high-cost workflows.
Run | Credits per playbook run, broken down by step. | Debugging a specific execution.
Step | Credits per step, split into inference and tool costs. | Finding expensive prompts or tools.
Call | Individual inference calls and tool invocations. | Deep debugging, model comparison.

Cost ledger queries
bash
# Company-level: monthly summary
capx costs --company my-company
# Total this month: 15,790 credits ($157.90)

# Agent-level: per-agent breakdown
capx costs --company my-company --by agent
# AGENT        THIS MONTH   % OF TOTAL   BUDGET     REMAINING
# strategist   4,240cr      26.8%        30,000     25,760
# engineer     8,720cr      55.2%        20,000     11,280
# marketer     1,150cr       7.3%        15,000     13,850
# support      1,680cr      10.6%         8,000      6,320

# Run-level: step-by-step cost breakdown
capx costs --company my-company --run run_8k2m9x
# STEP         AGENT       INFERENCE   TOOLS   STORAGE   TOTAL
# research     marketer    12cr        6cr     0cr       18cr
# pick_topic   strategist  8cr         0cr     0cr       8cr
# draft_post   marketer    22cr        0cr     0cr       22cr
# rubric_eval  marketer    6cr         0cr     0cr       6cr
# publish      engineer    2cr         1cr     1cr       4cr
# notify       strategist  2cr         1cr     0cr       3cr

# Call-level: individual inference calls within a step
capx costs --company my-company --run run_8k2m9x --step draft_post --detail
# CALL   MODEL           INPUT     OUTPUT    COST    LATENCY
# 1      claude-sonnet-4  3,200tk   1,800tk   16cr    4.2s
# 2      claude-sonnet-4  1,200tk     400tk    6cr    1.8s  (rubric)
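
To make the relationship between granularity levels concrete, here is a small worked roll-up using the step rows from the run-level example above. It assumes each level is simply the sum of the level below it; the function names are illustrative and not part of any Capx SDK.

```python
# Step rows copied from the run-level example above:
# (step, agent, inference, tools, storage), all in credits.
STEPS = [
    ("research", "marketer", 12, 6, 0),
    ("pick_topic", "strategist", 8, 0, 0),
    ("draft_post", "marketer", 22, 0, 0),
    ("rubric_eval", "marketer", 6, 0, 0),
    ("publish", "engineer", 2, 1, 1),
    ("notify", "strategist", 2, 1, 0),
]


def step_total(row):
    """Total credits for one step: inference + tools + storage."""
    return sum(row[2:])


def rollup_by_agent(steps):
    """Sum step totals per agent (step level rolled up to agent level)."""
    totals = {}
    for row in steps:
        agent = row[1]
        totals[agent] = totals.get(agent, 0) + step_total(row)
    return totals


run_total = sum(step_total(row) for row in STEPS)
by_agent = rollup_by_agent(STEPS)
most_expensive = max(STEPS, key=step_total)[0]

print(f"run total: {run_total}cr")
print(f"by agent: {by_agent}")
print(f"most expensive step: {most_expensive}")
```

For this run the steps add up to 61 credits, with draft_post as the largest single step.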

Replay log

The replay log lets you reconstruct exactly what happened during any action. It shows the full context the agent had (system prompt, memory entries, conversation history), the model request, the raw response, tool calls, rubric evaluation, and the governance decision. This is the primary debugging tool when an agent produces unexpected output.

Replaying an action
bash
# Replay a specific step in a run
capx replay run_8k2m9x:draft_post --company my-company

# === Replay: run_8k2m9x / step: draft_post ===
# Agent: marketer
# Model: claude-sonnet-4
# Timestamp: 2026-05-25T10:31:14Z
#
# System prompt: (234 tokens)
#   You are a marketer for this company. You create content...
#
# Memory entries retrieved: 12 (relevance strategy)
#   - "Blog audience prefers practical examples" (score: 0.94)
#   - "Last post on AI tooling got 2x average engagement" (score: 0.91)
#   - ...
#
# Input messages: 3 (1,840 tokens)
#   [step context from pick_topic output]
#
# Model response: (1,800 tokens, 4.2s)
#   [full text of the generated blog post]
#
# Rubric evaluation:
#   Criteria: "Between 800 and 1500 words, 3+ examples, CTA, no jargon"
#   Score: 8/10 (threshold: 7) -> PASS
#
# Governance: auto-approved (cost 22cr < threshold 50cr)
# Total cost: 22cr (inference: 16cr + rubric: 6cr)
Verbose replay with raw payloads

Pass --verbose to include the raw request and response payloads, which is useful when debugging adapters.

Verbose replay
bash
# Include full request/response payloads (useful for adapter debugging)
capx replay run_8k2m9x:draft_post --company my-company --verbose

# With --verbose, the output additionally includes:
#   - the raw JSON request sent to the adapter
#   - the raw JSON response from the model
#   - token-by-token timing (if available)
Tip
When an agent produces unexpected output, replay is the first tool to reach for. It answers the three critical questions: what context did the agent have, what did the model return, and how did governance evaluate it.

Heartbeat monitor

The heartbeat monitor tracks the health and activity of every agent. It shows when each agent last fired its heartbeat, when the next one is due, how many actions the agent took, and whether any errors occurred.

Heartbeat status
bash
capx agent status --company my-company

# AGENT        LAST HEARTBEAT     NEXT DUE      STATE    ACTIONS   ERRORS (24h)
# strategist   10:00 (42m ago)    12:00         active   3         0
# engineer     10:30 (12m ago)    11:00         active   5         0
# marketer     09:00 (1h ago)     13:00         active   2         0
# support      10:45 (3m ago)     11:00         active   8         1
Heartbeat history for an agent
bash
capx agent heartbeats engineer --company my-company --since 24h

# TIME      ACTIONS   ITEMS EVAL   COST    ERRORS
# 10:30     5         8            12cr    0
# 10:00     3         5             8cr    0
# 09:30     2         3             6cr    0
# 09:00     4         7            10cr    0
# 08:30     0         2             4cr    0
# ...
Note
A missed heartbeat (one that should have fired but did not) generates a heartbeat.missed event in the activity feed and triggers a webhook alert if one is configured. A missed heartbeat typically means that the agent is paused or killed, or that the runtime encountered an error.
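
If you run your own monitor on top of exported heartbeat data, the "should have fired but did not" check can be sketched as below. This is not how Capx itself detects missed heartbeats; the heartbeat interval, the grace period, and the function name are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def is_heartbeat_missed(last_fired, interval, now, grace=timedelta(minutes=2)):
    """Return True if the next heartbeat is overdue by more than `grace`.

    `interval` and `grace` are assumed values chosen by the caller;
    they are not read from Capx.
    """
    next_due = last_fired + interval
    return now > next_due + grace


now = datetime(2026, 5, 25, 10, 42, tzinfo=timezone.utc)

# Last fired at 10:30 with a 30-minute interval: next due at 11:00, not missed.
last = datetime(2026, 5, 25, 10, 30, tzinfo=timezone.utc)
print(is_heartbeat_missed(last, timedelta(minutes=30), now))

# Last fired at 09:00 with a 30-minute interval: due at 09:30, long overdue.
stale = datetime(2026, 5, 25, 9, 0, tzinfo=timezone.utc)
print(is_heartbeat_missed(stale, timedelta(minutes=30), now))
```

The grace period avoids flagging a heartbeat that is only a few seconds late.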

Casa dashboard

The Casa dashboard provides a visual overview of your company's operations. It surfaces the same data as the CLI and API in a browser-based interface.

Dashboard section | What it shows
Overview | Total costs today/week/month, active agents, pending approvals, recent errors.
Activity | Real-time activity feed with filters and search.
Agents | Agent roster with state, heartbeat status, budget usage per agent.
Playbooks | Playbook list with run history, average cost, success rate.
Costs | Interactive cost charts: by agent, by playbook, by model, over time.
Approvals | Pending approval queue with approve/reject actions.
Settings | Company configuration, governance rules, secrets, team management.

Usage analytics

Beyond raw activity and cost data, Capx provides aggregate analytics that help you understand patterns and optimize over time.

Usage analytics CLI
bash
# Playbook success rate over the past 30 days
capx analytics playbooks --company my-company --period 30d

# PLAYBOOK               RUNS   SUCCESS   AVG COST   AVG DURATION   RUBRIC AVG
# weekly-content          4      100%      116cr      8m 12s         8.2/10
# customer-onboarding    32       94%       18cr      1m 45s         8.7/10
# support-triage        156       98%        4cr      22s            7.4/10
# deploy-pipeline         8       88%       62cr      3m 30s         7.8/10

# Agent efficiency over the past 30 days
capx analytics agents --company my-company --period 30d

# AGENT        TASKS    SUCCESS   AVG COST/TASK   RUBRIC AVG   IDLE %
# strategist   120      97%       35cr            8.5/10       22%
# engineer      85      91%       102cr           7.9/10       15%
# marketer      64      95%       18cr            8.3/10       45%
# support      312      99%        5cr            7.6/10        8%

# Model efficiency comparison
capx analytics models --company my-company --period 30d

# MODEL             CALLS    AVG COST   AVG LATENCY   RUBRIC AVG
# claude-sonnet-4   2,840    8.2cr      3.1s          8.2/10
# claude-haiku-4    4,120    1.8cr      0.9s          7.5/10
# gpt-4.1-mini        340    3.4cr      2.2s          7.8/10

Webhook alerts

Configure webhooks to receive real-time notifications when specific events occur. Webhooks are delivered as POST requests with a JSON payload containing the event type, company ID, and event-specific data.

company.yaml (webhooks)
yaml
governance:
  webhooks:
    - url: "https://your-api.com/webhooks/capx"
      secret: "${{ secrets.WEBHOOK_SECRET }}"
      events:
        - step.failed
        - approval.requested
        - budget.warning
        - budget.exceeded
        - heartbeat.missed
        - agent.state_changed

    - url: "https://hooks.slack.com/services/T.../B.../..."
      events:
        - playbook.completed
        - approval.requested
        - budget.exceeded
Webhook payload example
json
{
  "event": "budget.warning",
  "company_id": "comp_8k2m9x",
  "timestamp": "2026-05-25T10:42:00Z",
  "data": {
    "agent": "engineer",
    "cap_type": "daily",
    "threshold": 0.8,
    "current_usage": 816,
    "cap_limit": 1000,
    "percentage": 0.816
  }
}
Note
Webhook deliveries are retried up to 3 times with increasing backoff (1s, 10s, 60s) if your endpoint returns a non-2xx status code. Failed deliveries are logged in the activity feed.
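
A receiving endpoint usually does two things: check that the delivery is authentic, then act on the event. The sketch below shows both as plain functions. The page does not say how the `secret` is used, so the HMAC-SHA256 hex signature over the raw request body is an assumption, as are the function names; confirm the real signing scheme before relying on this.

```python
import hashlib
import hmac
import json


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Compare an HMAC-SHA256 hex digest of the body to the received signature.

    Assumed scheme: the signing method is not documented on this page.
    """
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)


def summarize_event(payload: dict) -> str:
    """Turn a webhook payload into a one-line, human-readable message."""
    event = payload["event"]
    data = payload.get("data", {})
    if event == "budget.warning":
        ratio = data["current_usage"] / data["cap_limit"]
        return f'{data["agent"]}: {ratio:.1%} of {data["cap_type"]} cap used'
    return f'{event} for company {payload["company_id"]}'


def handle_delivery(secret: bytes, body: bytes, signature_hex: str):
    """Return (status_code, message). A non-2xx status causes a retry."""
    if not verify_signature(secret, body, signature_hex):
        return 401, "bad signature"
    return 200, summarize_event(json.loads(body))


# Body taken from the payload example above.
SECRET = b"example-secret"
BODY = json.dumps({
    "event": "budget.warning",
    "company_id": "comp_8k2m9x",
    "timestamp": "2026-05-25T10:42:00Z",
    "data": {"agent": "engineer", "cap_type": "daily", "threshold": 0.8,
             "current_usage": 816, "cap_limit": 1000, "percentage": 0.816},
}).encode()
SIGNATURE = hmac.new(SECRET, BODY, hashlib.sha256).hexdigest()

print(handle_delivery(SECRET, BODY, SIGNATURE))
```

Because non-2xx responses are retried, return a 2xx as soon as the delivery is verified and do slow work (such as posting to a chat channel) afterwards.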

Data export

Export activity, cost, and analytics data in JSON or CSV format for use in your own dashboards, spreadsheets, or data pipelines.

Exporting data
bash
# Export activity as JSON
capx activity --company my-company --since 30d --format json > activity.json

# Export costs as CSV
capx costs --company my-company --period 30d --format csv > costs.csv

# Export analytics as JSON
capx analytics playbooks --company my-company --period 90d --format json > playbook-analytics.json

# Stream activity in real time via API (Server-Sent Events)
curl -N "https://api.capx.ai/v1/companies/my-company/activity/stream" \
  -H "Authorization: Bearer capx_sk_live_..." \
  -H "Accept: text/event-stream"

Export format | CLI flag | Best for
JSON | --format json | Programmatic processing, data pipelines, custom dashboards.
CSV | --format csv | Spreadsheets, quick analysis, importing into BI tools.
SSE (streaming) | API only | Real-time monitoring, custom alerting, live dashboards.
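
To consume the SSE stream from a script, you need to split the response into events. The parser below follows the standard Server-Sent Events line format (`event:` and `data:` fields, comment lines starting with `:`, a blank line ending each event). Whether Capx sets the `event:` field or sends JSON in `data:` is not stated here, so the sample lines are assumptions.

```python
def parse_sse(lines):
    """Yield (event_name, data) tuples from an iterable of SSE text lines."""
    event_name, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; dispatch it if it has data.
            if data_lines:
                yield event_name, "\n".join(data_lines)
            event_name, data_lines = "message", []
        elif line.startswith(":"):
            # Comment line, often used as a keep-alive; ignore it.
            continue
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # Strip the single optional leading space.
            if field == "event":
                event_name = value
            elif field == "data":
                data_lines.append(value)


# Hypothetical stream content, for illustration only.
SAMPLE_LINES = [
    ": keep-alive\n",
    "event: step.failed\n",
    'data: {"agent": "engineer"}\n',
    "\n",
    'data: {"agent": "support"}\n',
    "\n",
]

parsed = list(parse_sse(SAMPLE_LINES))
for name, data in parsed:
    print(name, data)
```

In practice you would feed this generator the decoded lines of the streaming HTTP response instead of a fixed list.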

Best practices

  • Check the activity feed daily during your first two weeks. Patterns emerge quickly: which agents are busiest, which playbooks fail most often, which steps cost the most. After two weeks you will have a strong intuition for what normal looks like, which makes anomalies obvious.
  • Set up webhook alerts from day one for step.failed, budget.exceeded, and heartbeat.missed. These three events cover the most common failure modes: broken playbooks, runaway costs, and unresponsive agents. Route them to Slack or a similar channel where you will actually see them.
  • Use the replay log proactively, not just for debugging. Replaying a few random successful actions per week helps you spot quality issues before they become patterns. It also builds your understanding of how agents reason, which makes you better at writing playbooks and tuning rubrics.
Tip
Export cost data weekly and track your cost-per-task metric over time. If it is trending down, your company is getting more efficient. If it is trending up, check for model upgrades, longer prompts, or increased retry rates.
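
The cost-per-task trend can be computed from a weekly export with a few lines of code. The sketch below assumes a CSV with `week`, `credits`, and `tasks` columns; that layout and the sample numbers are hypothetical, so map them to whatever columns your export actually contains.

```python
import csv
import io

# Hypothetical weekly export; column names and values are illustrative.
SAMPLE_CSV = """week,credits,tasks
2026-W18,4200,140
2026-W19,3900,150
2026-W20,4100,164
"""


def cost_per_task(rows):
    """Return (week, credits per task) for each row."""
    return [(r["week"], int(r["credits"]) / int(r["tasks"])) for r in rows]


def trend(series):
    """Compare the latest cost per task with the earliest one."""
    first, last = series[0][1], series[-1][1]
    if last < first:
        return "down"
    if last > first:
        return "up"
    return "flat"


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
series = cost_per_task(rows)
direction = trend(series)

for week, value in series:
    print(f"{week}: {value:.1f}cr per task")
print(f"trend: {direction}")
```

Comparing only the first and last weeks is deliberately crude; a moving average is a natural refinement once you have more data.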
