AgentHub

Viewing a run

Open the dashboard and pick an agent. Each agent’s Runs tab lists every historical execution, sorted newest-first, with the trigger, the duration, the succeeded/failed status, and the cost. Click any row to land on the run detail page, which shows:

The input that started the run.
Every tool call with its arguments and outputs.
Retrieval contexts and the citations the model emitted.
Approval decisions, with the approver, the policy, and the consumed timestamp.
The final output and any structured artifacts.
A timeline of spans you can also open in your trace viewer.

Three replay modes

From the run detail page, click Replay and pick a mode:

Identical — same agent version, same model, same tool snapshot. Useful for verifying determinism after an infra change.
What-if — same inputs against a newer agent version, so you can see the diff before promoting it. Outputs are diffed token-by-token (text) and schema-aware (structured data).
Verbose — same execution, extra debug logging — every model decision, every retrieval cut, and the full prompt as the runtime saw it. Useful for chasing hard-to-reproduce bugs.

Sandbox semantics

Replays are sandboxed by default:

External tool calls are stubbed against the recorded responses; no email is sent, no CRM record mutates.
Approvals are not consumed; the original approval row stays valid for the original execution.
Idempotency keys are namespaced under the replay run id so dedupe in your downstream services still works.
Replay events carry a replay tag so the cost dashboard separates them from billable runs.

A replay can never double-charge a customer, send a duplicate email, or otherwise reach into the real world.

Debugging traces

Replays propagate the original run’s W3C traceparent header, so the replay’s spans line up side-by-side with the original in your trace viewer. The detail page also exposes a flat span tree filtered to the current run, with per-span latency, model id, token counts, and tool name.

Replaying from the API

You can also kick off a replay from the API — handy for scripted regression suites.

# Replay an existing run in identical mode
curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"mode": "identical"}' \
  "https://app.mvsagents.ai/api/v1/orgs/$ORG_ID/runs/$RUN_ID/replay"

# What-if against a specific agent version
curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"mode": "what_if", "agent_version": "v1.4.2"}' \
  "https://app.mvsagents.ai/api/v1/orgs/$ORG_ID/runs/$RUN_ID/replay"

Promote a replay to an eval case

If a what-if replay surfaces a regression, click Save as eval from the replay detail page. The replay’s inputs and expected outputs become a regression case in the agent’s eval suite, and from then on every promotion of that agent must pass it. See /docs/buildouts.