Viewing a run
Open the dashboard and pick an agent. Each agent’s Runs tab lists every historical execution, sorted newest-first, with the trigger, the duration, the succeeded/failed status, and the cost. Click any row to land on the run detail page, which shows:
- The input that started the run.
- Every tool call with its arguments and outputs.
- Retrieval contexts and the citations the model emitted.
- Approval decisions, with the approver, the policy, and the timestamp at which the approval was consumed.
- The final output and any structured artifacts.
- A timeline of spans you can also open in your trace viewer.
Three replay modes
From the run detail page, click Replay and pick a mode:
- Identical — same agent version, same model, same tool snapshot. Useful for verifying determinism after an infra change.
- What-if — same inputs against a newer agent version, so you can see the diff before promoting it. Text outputs are diffed token by token; structured data gets a schema-aware diff.
- Verbose — same execution with extra debug logging: every model decision, every retrieval cut, and the full prompt as the runtime saw it. Useful for chasing hard-to-reproduce bugs.
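To get a feel for what a token-level diff of text output reports, here is a rough sketch that runs the standard diff utility over whitespace-separated tokens. It is a stand-in, not the product's differ: the tokenizer, the function names tokens and token_diff, and the sample strings are all assumptions made for illustration.

```shell
# Split a string into one whitespace-separated token per line.
# (Assumption: the real differ may tokenize differently.)
tokens() {
  printf '%s\n' "$1" | tr -s '[:space:]' '\n'
}

# Diff two strings token by token with the standard diff utility.
# Returns 0 when the token streams match, 1 when they differ.
token_diff() {
  old=$(mktemp)
  new=$(mktemp)
  tokens "$1" > "$old"
  tokens "$2" > "$new"
  status=0
  diff "$old" "$new" || status=$?
  rm -f "$old" "$new"
  return "$status"
}

# Hypothetical outputs from the original run and a what-if replay.
token_diff "refund approved for order 1001" \
           "refund denied for order 1001" || true
```

Only the changed token shows up in the diff, which is what makes this granularity handy for spotting a small behavioral change in an otherwise identical answer.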
Sandbox semantics
Replays are sandboxed by default:
- External tool calls are stubbed against the recorded responses; no email is sent, no CRM record mutates.
- Approvals are not consumed; the original approval row stays valid for the original execution.
- Idempotency keys are namespaced under the replay run id so dedupe in your downstream services still works.
- Replay events carry a replay tag so the cost dashboard separates them from billable runs.
Because of these defaults, a sandboxed replay cannot double-charge a customer, send a duplicate email, or otherwise affect real-world systems.
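The idempotency-key namespacing can be pictured with a small sketch. The key layout used here, replay:<replay run id>:<original key>, is an assumption chosen for illustration; this page does not specify the actual format, and the function name and sample ids are hypothetical.

```shell
# Hypothetical key layout: replay:<replay_run_id>:<original_key>.
# The format the runtime actually uses is not documented on this page.
namespaced_key() {
  replay_run_id=$1
  original_key=$2
  printf 'replay:%s:%s\n' "$replay_run_id" "$original_key"
}

# Two replays of the same run yield distinct keys, so neither collides
# with the original request's key or with the other replay.
namespaced_key "run_r1" "charge-8841"
namespaced_key "run_r2" "charge-8841"
```

Within a single replay the key stays stable, so a downstream service that dedupes on it would still collapse retries of the same call.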
Debugging traces
Replays propagate the original run’s W3C traceparent header, so the replay’s spans line up side-by-side with the original in your trace viewer. The detail page also exposes a flat span tree filtered to the current run, with per-span latency, model id, token counts, and tool name.
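A traceparent value has four dash-separated fields: version, trace id, parent id, and flags. The sketch below pulls out the trace id, the field that stays constant when the header is propagated, so it is the natural value to search for in a trace viewer. The function name is hypothetical, the check covers only the shape of the value, and the sample is the example from the W3C Trace Context specification rather than a real run.

```shell
# Print the trace-id field of a W3C traceparent value
# (version-traceid-parentid-flags). Returns non-zero if the value
# does not have the expected shape. Shape check only: it does not
# reject reserved values such as an all-zero trace id.
trace_id_from_traceparent() {
  printf '%s\n' "$1" |
    sed -n -E 's/^[0-9a-f]{2}-([0-9a-f]{32})-[0-9a-f]{16}-[0-9a-f]{2}$/\1/p' |
    grep .
}

# Sample value taken from the W3C Trace Context specification.
trace_id_from_traceparent "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
```

Assuming your trace viewer indexes spans by trace id, searching for the printed value should surface the original run and its replays together.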
Replaying from the API
You can also kick off a replay from the API — handy for scripted regression suites.
# Replay an existing run in identical mode
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"mode": "identical"}' \
"https://app.mvsagents.ai/api/v1/orgs/$ORG_ID/runs/$RUN_ID/replay"
# What-if against a specific agent version
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"mode": "what_if", "agent_version": "v1.4.2"}' \
"https://app.mvsagents.ai/api/v1/orgs/$ORG_ID/runs/$RUN_ID/replay"
Promote a replay to an eval case
If a what-if replay surfaces a regression, click Save as eval from the replay detail page. The replay’s inputs and expected outputs become a regression case in the agent’s eval suite, and from then on every promotion of that agent must pass it. See /docs/buildouts.