Catch risky agent actions before they merge.
Mirage sits between your agent and its APIs. It checks every outbound HTTP call against declarative policy, returns deterministic mocks, and fails the build before a hallucinated route or duplicate charge ever ships.
- Outcome taxonomy: 4 deterministic states
- Stack: httpx · pytest · CLI
- Gate: CI fails on policy_violation
Positioning
Three categories. One open seat.
Mirage isn’t another evaluator or another dashboard. It’s the deterministic gate that runs in CI before a regression reaches production.
Vs quality eval
LangSmith · Braintrust · Patronus
They score whether the agent answered correctly.
Mirage scores whether the agent’s actions stayed inside policy.
Vs observability
Agent tracing dashboards
They watch what the agent already did, after the run.
Mirage gates what the agent is about to do — pre-merge, in CI.
Vs HTTP mocks
VCR.py · respx · responses
They replay recorded responses; cassettes pin known behavior, so they only guard calls that were already recorded.
Mirage enforces declarative policy on synthetic mocks, so a brand-new risky action is caught the first time it appears.
How it works
Three files. One deterministic gate.
Mirage’s entire surface area is mocks, policies, and a session wrapper. No SDKs to learn, no model-in-the-loop, no live traffic in tests.
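To make the flow concrete, here is a toy model of a deterministic gate in plain Python: match the outbound request against declared mocks, run it through policy checks, and classify the outcome. This is a sketch under stated assumptions, not Mirage's implementation or API. `Request`, `gate`, `BID_LIMIT`, the `allowed` outcome label, and the order of checks are all hypothetical; only `policy_violation`, `unmatched_route`, `enforce_bid_limit`, and the routes come from the examples on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical stand-in for one outbound HTTP call."""
    method: str
    path: str
    json: dict


# Hypothetical in-memory mock table, mirroring the example mocks on this page.
MOCKS = {
    ("POST", "/v1/get_quote"): (200, {"quote_id": "Q-001", "price": 24500}),
    ("POST", "/v1/submit_bid"): (201, {"order_id": "ORD-9"}),
}

# Assumed limit for illustration only; the real threshold would live in policy config.
BID_LIMIT = 25000


def enforce_bid_limit(req: Request) -> bool:
    """Return True when the request stays inside the (assumed) bid limit."""
    if req.path != "/v1/submit_bid":
        return True
    return req.json.get("amount", 0) <= BID_LIMIT


POLICIES = {"enforce_bid_limit": enforce_bid_limit}


def gate(req: Request) -> dict:
    """Classify a request: unmatched route, policy violation, or allowed."""
    key = (req.method, req.path)
    if key not in MOCKS:
        return {"outcome": "unmatched_route", "policy": None}
    for name, check in POLICIES.items():
        if not check(req):
            return {"outcome": "policy_violation", "policy": name}
    status_code, body = MOCKS[key]
    # "allowed" is a placeholder label, not a documented Mirage outcome.
    return {"outcome": "allowed", "policy": None,
            "status_code": status_code, "json": body}
```

Because the same request always produces the same classification, a 50000 bid fails the gate on every run, with no live API or model judge involved.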
Declare the surface
Author mocks for the routes your agent calls and policies for the rules it must obey. Both are plain YAML — review them in PRs like any config.
# mocks.yaml
mocks:
  - name: get_quote
    method: POST
    path: /v1/get_quote
    response:
      status_code: 200
      json:
        quote_id: Q-001
        price: 24500
  - name: submit_bid
    method: POST
    path: /v1/submit_bid
    response:
      status_code: 201
      json: { order_id: ORD-9 }

Wrap your run
Drop MirageSession around your agent run. Outbound httpx traffic is intercepted, matched, and policy-checked. assert_clean() is your gate.
from mirage import MirageSession

with MirageSession(run_id="procurement") as mirage:
    run_my_agent(client=mirage.client)
    # Fail the build if any call hit a policy_violation
    # or missed the declared mocks.
    mirage.assert_clean()

Fail the build
Every run emits a deterministic trace. CI fails on policy_violation or unmatched_route — no flaky live API, no model-judge in the loop.
{
  "run_id": "procurement",
  "outcome": "policy_violation",
  "policy": "enforce_bid_limit",
  "request": {
    "method": "POST",
    "path": "/v1/submit_bid",
    "json": { "amount": 50000 }
  },
  "side_effect_count": 0
}

Review console
Read every run as a precision instrument.
Outcome taxonomy, response headers, and policy decisions surfaced without dashboards or live agents — just the trace your CI already wrote.
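Since the trace is plain JSON, a few lines of Python are enough to read it yourself. The sketch below assumes trace events are stored one JSON object per line, which is a guess about the file layout rather than a documented Mirage format. The helper names (`load_events`, `blocking_events`, `summarize`, `exit_code`) are hypothetical; the field names come from the example trace on this page.

```python
import json

# Outcomes that this page says fail CI.
BLOCKING_OUTCOMES = frozenset({"policy_violation", "unmatched_route"})


def load_events(text: str) -> list:
    """Parse trace events, assuming one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def blocking_events(events: list) -> list:
    """Keep only the events whose outcome should fail the build."""
    return [e for e in events if e.get("outcome") in BLOCKING_OUTCOMES]


def summarize(event: dict) -> str:
    """Render one event as a single reviewable line."""
    request = event.get("request", {})
    policy = event.get("policy") or "-"
    return (f'{event.get("run_id")}: {event.get("outcome")} [{policy}] '
            f'{request.get("method")} {request.get("path")}')


def exit_code(events: list) -> int:
    """Return 1 if any event is blocking, else 0, ready for sys.exit()."""
    return 1 if blocking_events(events) else 0
```

Fed the trace shown in the previous step, `summarize` yields `procurement: policy_violation [enforce_bid_limit] POST /v1/submit_bid` and `exit_code` returns 1.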
Roadmap
The deterministic engine, today. Chaos engineering for agent governance, next.
Mirage 0.1.x is what runs in your CI today. The numbered milestones below are the path to 1.0 — versioned, public, and shipped on PyPI.
- v0.1.3 (Shipped): The deterministic engine
  - httpx-native proxy and MirageSession
  - Declarative mocks + policies in YAML
  - Outcome taxonomy + deterministic traces
  - Pytest plugin, `mirage gate-run` CLI, review console
- v0.2.0 (In progress): Chaos library + scenario DSL
  - Network and payload chaos modes
  - Scenario YAML loader
  - Reference scenarios under hostile conditions
- v0.3.0 (Planned): Containment metrics
  - Containment rate, false-negative rate, time-to-detect
  - CLI scenario runner with JUnit output
  - Resilience tab in the review console
- v1.0.0 (Horizon): Policy chaos + reference suite
  - Policy-layer chaos modes
  - Authoring docs + reference scenario library
  - 1.0 stability commitment
Get started
Ship the next agent commit behind a deterministic gate.
Install in 30 seconds. Author one mocks file and one policies file. Wrap your agent run. CI fails the next risky action before it merges.