mirage
Open source · v0.2.0 shipped

Same policy file in CI and in production.

Mirage is the deterministic policy runtime for AI agents. Every outbound action is checked against policies written in a portable DSL and decided by rules: allow, block, or flag. The same file gates your CI build and enforces in production. No LLM in the decision loop.

Modes: CI · gateway
Stack: httpx · FastAPI · CLI
Decision: deterministic, no LLM judge
mirage · proxy + ci
~/agent
$ mirage proxy --mocks ./mocks.yaml --policies ./policies.yaml
[mirage] proxy listening on :8001 · 12 routes · 4 policies
 
$ pytest tests/test_procurement_agent.py
 
agent → POST /v1/get_quote allowed · 12ms
agent → GET /v1/suppliers allowed · 31ms
agent → POST /v1/submit_bid { amount: 50000 }
policy_violation
✗ enforce_bid_limit (max: 25000)
 
side_effect_count: 0
trace: artifacts/traces/procurement-run.json
 
1 failed in 0.34s
trace: deterministic · outcome: policy_violation

Positioning

Three categories. One open seat.

Mirage isn’t another evaluator, dashboard, or framework SDK. It’s the deterministic policy runtime — the same file gating CI today and enforcing in production tomorrow.

Vs quality eval

LangSmith · Braintrust · Patronus · Future AGI

They use an LLM judge to grade whether the model answered correctly.

Mirage decides by rule whether the agent’s actions are allowed: deterministic, no judge, and safe to fail a CI build on.

Vs observability

Sentrial · Laminar · Helicone · Datadog-for-agents

They watch what the agent already did, after the side effects landed.

Mirage decides on each action before it takes effect, and runs the same decision in CI and in production.

Vs bundled framework guardrails

Microsoft Agent Governance Toolkit · OpenAI Agents SDK · NeMo Guardrails

They ship as SDK helpers inside one framework or one cloud.

Mirage is the framework-agnostic layer underneath: a portable policy file that survives a stack change.

How it works

Same policy file. CI gate, then runtime gateway.

Mirage’s entire surface area is YAML, a session wrapper, and a gateway binary. No SDKs to learn, no model in the decision loop, one policy file across CI and production.

01

Declare the surface

Author mocks for the routes your agent calls and policies for the rules it must obey. Both are plain YAML — review them in PRs like any config.

mocks.yaml
# mocks.yaml
mocks:
  - name: get_quote
    method: POST
    path: /v1/get_quote
    response:
      status_code: 200
      json:
        quote_id: Q-001
        price: 24500

  - name: submit_bid
    method: POST
    path: /v1/submit_bid
    response:
      status_code: 201
      json: { order_id: ORD-9 }
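The page shows mocks.yaml but not policies.yaml, and it does not describe how matching works internally. As a rough mental model only, here is a minimal Python sketch of one request's evaluation. It assumes routes match on method plus exact path, and it models the enforce_bid_limit rule from the terminal demo as a hypothetical maximum on the amount field. The outcome strings come from this page; MOCKS, BID_LIMIT, Decision, and evaluate are illustrative names, not Mirage's API.

```python
"""Hypothetical sketch of evaluating one request; not Mirage's implementation."""
from dataclasses import dataclass
from typing import Any, Optional

# Mirrors the two routes declared in mocks.yaml, as plain Python data.
MOCKS = [
    {
        "name": "get_quote",
        "method": "POST",
        "path": "/v1/get_quote",
        "response": {"status_code": 200, "json": {"quote_id": "Q-001", "price": 24500}},
    },
    {
        "name": "submit_bid",
        "method": "POST",
        "path": "/v1/submit_bid",
        "response": {"status_code": 201, "json": {"order_id": "ORD-9"}},
    },
]

# Hypothetical encoding of enforce_bid_limit (max: 25000) from the demo.
BID_LIMIT = {"name": "enforce_bid_limit", "path": "/v1/submit_bid", "field": "amount", "max": 25000}


@dataclass
class Decision:
    outcome: str                     # "allowed", "policy_violation", or "unmatched_route"
    policy: Optional[str] = None     # rule that fired, if any
    response: Optional[dict] = None  # mocked response served when allowed


def evaluate(method: str, path: str, body: dict[str, Any]) -> Decision:
    # 1. Match on method + exact path; undeclared routes are unmatched.
    mock = next((m for m in MOCKS if m["method"] == method and m["path"] == path), None)
    if mock is None:
        return Decision("unmatched_route")
    # 2. Apply the rule: a pure comparison, so the same input always yields the same decision.
    if path == BID_LIMIT["path"] and body.get(BID_LIMIT["field"], 0) > BID_LIMIT["max"]:
        return Decision("policy_violation", policy=BID_LIMIT["name"])
    # 3. Otherwise serve the declared mock response.
    return Decision("allowed", response=mock["response"])


print(evaluate("POST", "/v1/submit_bid", {"amount": 50000}).outcome)  # policy_violation
```

Under these assumptions, the GET /v1/suppliers call from the demo would come back as unmatched_route, since the mocks.yaml above declares only two routes.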
02

Wrap your run

Wrap your agent run in MirageSession. Outbound httpx traffic is intercepted, matched against the declared mocks, and policy-checked. assert_clean() is your gate.

test_procurement.py
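from mirage import MirageSession

# Hypothetical import: replace with your own agent's entry point.
from my_agent import run_my_agent


def test_procurement_agent():
    with MirageSession(run_id="procurement") as mirage:
        # Pass Mirage's client so the agent's outbound calls are intercepted.
        run_my_agent(client=mirage.client)
        # Fail the build if any call hit a policy_violation
        # or missed the declared mocks.
        mirage.assert_clean()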
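To make the gate concrete, here is a hypothetical stand-in for what an assert_clean()-style check does, assuming each intercepted call is recorded with its outcome. The record shape, GateFailure, and this assert_clean function are illustrative, not Mirage's API.

```python
"""Hypothetical stand-in for an assert_clean()-style gate; not Mirage's API."""

# Outcomes that fail the build, per the text above.
FAILING_OUTCOMES = {"policy_violation", "unmatched_route"}


class GateFailure(AssertionError):
    """Raised when at least one recorded call should fail the build."""


def assert_clean(records: list[dict]) -> None:
    # Keep only the calls whose outcome should fail the build.
    failures = [r for r in records if r["outcome"] in FAILING_OUTCOMES]
    if failures:
        summary = ", ".join(f'{r["method"]} {r["path"]} -> {r["outcome"]}' for r in failures)
        raise GateFailure(f"{len(failures)} call(s) failed the gate: {summary}")


records = [
    {"method": "POST", "path": "/v1/get_quote", "outcome": "allowed"},
    {"method": "POST", "path": "/v1/submit_bid", "outcome": "policy_violation"},
]
try:
    assert_clean(records)
except GateFailure as exc:
    print(exc)  # 1 call(s) failed the gate: POST /v1/submit_bid -> policy_violation
```

pytest reports any exception raised in a test body as a failure; subclassing AssertionError just makes the intent explicit.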
03

Fail the build

Every run emits a deterministic trace. CI fails on policy_violation or unmatched_route, with no flaky live API and no model judge in the loop.

trace.json
{
  "run_id": "procurement",
  "outcome": "policy_violation",
  "policy": "enforce_bid_limit",
  "request": {
    "method": "POST",
    "path": "/v1/submit_bid",
    "json": { "amount": 50000 }
  },
  "side_effect_count": 0
}
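One way to get the byte-for-byte determinism described above, sketched under assumptions: serialize the trace with sorted keys and fixed separators, then hash the bytes. This is not necessarily how Mirage writes its traces. serialize_trace and trace_digest are hypothetical names, and any timing fields (such as per-call latency) would have to stay out of the hashed content for two runs to compare equal.

```python
"""Sketch: canonical trace serialization so identical runs produce identical bytes (assumed approach)."""
import hashlib
import json


def serialize_trace(trace: dict) -> bytes:
    # sort_keys applies at every nesting level; fixed separators remove whitespace variation.
    return json.dumps(trace, sort_keys=True, separators=(",", ":")).encode("utf-8")


def trace_digest(trace: dict) -> str:
    # A SHA-256 of the canonical bytes lets CI compare runs without diffing files.
    return hashlib.sha256(serialize_trace(trace)).hexdigest()


run_a = {
    "run_id": "procurement",
    "outcome": "policy_violation",
    "policy": "enforce_bid_limit",
    "request": {"method": "POST", "path": "/v1/submit_bid", "json": {"amount": 50000}},
    "side_effect_count": 0,
}
# Same content as run_a, with keys inserted in a different order.
run_b = {
    "side_effect_count": 0,
    "request": {"json": {"amount": 50000}, "path": "/v1/submit_bid", "method": "POST"},
    "policy": "enforce_bid_limit",
    "outcome": "policy_violation",
    "run_id": "procurement",
}

print(trace_digest(run_a) == trace_digest(run_b))  # True
```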
04

Enforce in production

Run `mirage gateway` in front of a real upstream with the same policy file. Start in passthrough (forward and log), then switch to enforce once the logged decisions show containment at the level you want.

production
sh
# Same policies.yaml. Real upstream traffic. Two enforcement modes.

# Passthrough: forward every request, log decisions, do not block.
mirage gateway \
  --upstream https://api.acme.internal \
  --mode passthrough \
  --policies ./policies.yaml

# Enforce: block on policy violation with HTTP 403.
mirage gateway \
  --upstream https://api.acme.internal \
  --mode enforce \
  --policies ./policies.yaml
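The difference between the two modes fits in a few lines. This hypothetical handler assumes the policy decision is already computed and that blocking means answering HTTP 403 without contacting the upstream, as the enforce comment above says. handle, GatewayResult, and forward are illustrative names, not the gateway's code.

```python
"""Hypothetical sketch of passthrough vs enforce; not the gateway's actual code."""
from dataclasses import dataclass
from typing import Callable


@dataclass
class GatewayResult:
    status_code: int
    forwarded: bool  # True if the request reached the upstream
    decision: str    # policy decision recorded for the request


def handle(mode: str, decision: str, forward: Callable[[], int]) -> GatewayResult:
    """Apply one precomputed policy decision under the given mode.

    forward is a hypothetical callable that sends the request upstream
    and returns the upstream status code.
    """
    if mode not in ("passthrough", "enforce"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "enforce" and decision == "policy_violation":
        # Blocked before the upstream is contacted, so no side effect lands.
        return GatewayResult(status_code=403, forwarded=False, decision=decision)
    # Everything else is forwarded: all passthrough traffic, plus allowed traffic under enforce.
    return GatewayResult(status_code=forward(), forwarded=True, decision=decision)


def fake_upstream() -> int:
    # Stand-in for the real upstream; pretends the write succeeded.
    return 201


print(handle("passthrough", "policy_violation", fake_upstream).status_code)  # 201
print(handle("enforce", "policy_violation", fake_upstream).status_code)  # 403
```

Because passthrough forwards every request, it is the low-risk starting point: the recorded decisions show what enforce would have blocked before anything is actually blocked.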

Review console

Read every run as a precision instrument.

Outcome taxonomy, response headers, and policy decisions across both modes — CI runs and gateway traffic, surfaced over the same trace store.

Screenshot: Mirage review console (procurement-run.json, v0.2.0) showing a procurement run that triggered a policy violation.

Roadmap

From CI gate to production gateway. From rule-based to stateful policy.

Mirage 0.2.0, live on PyPI today, ships the production gateway, framework integrations, and a reproducible benchmark harness. v0.3 brings durable rate limits and the hosted control plane. The numbered milestones below chart the path to 1.0: versioned, public, and MIT-licensed.

  1. v0.1.3 Shipped

    Apr 26, 2026

    CI mode of the policy runtime

    • httpx-native proxy and MirageSession
    • Declarative mocks + policies in YAML
    • Deterministic traces and four-outcome taxonomy
    • Pytest plugin, `mirage gate-run` CLI, review console
  2. v0.2.0 Shipped

    May 3, 2026

    Production gateway, integrations, and benchmarks

    • Gateway mode: `mirage gateway` against real upstreams (passthrough + enforce)
    • OpenAI Agents SDK and LangChain integrations
    • Eleven new policy operators, including regex, host allowlist, length, contains, and sequence
    • Containment + decision-latency + time-to-decide metrics in the console
    • Reproducible benchmark harness with three scenarios (`make bench`)
  3. v0.3.0 In progress

    Durable rate limits + hosted control plane

    • Durable, multi-process rate limits (Redis-backed)
    • Hosted control plane: policy authoring, fleet view, audit-log export
    • Stateful policies (cross-call invariants beyond per-run counters)
    • Adversarial benchmark scenarios + chaos library
  4. v1.0.0 Horizon

    Reference scenarios + stability commitment

    • Reference scenario library across verticals
    • SOC 2 / HIPAA-aligned audit log shape
    • Async-native gating in the framework adapters
    • 1.0 stability commitment on policy DSL + outcome taxonomy
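The roadmap names the v0.2.0 operators but not their semantics. Purely as an illustration, here is one plausible reading of five of them as Python predicates. The function names, signatures, and behavior (including reading sequence as "target must come after a required call") are assumptions, not the DSL's definitions.

```python
"""Hypothetical semantics for five operator names from the roadmap; not Mirage's DSL."""
import re
from urllib.parse import urlsplit


def op_regex(value: str, pattern: str) -> bool:
    # True if the whole value matches the pattern.
    return re.fullmatch(pattern, value) is not None


def op_host_allowlist(url: str, allowed: set[str]) -> bool:
    # True if the URL's host is on the allowlist.
    return urlsplit(url).hostname in allowed


def op_length(value: str, max_len: int) -> bool:
    # True if the value is no longer than max_len characters.
    return len(value) <= max_len


def op_contains(value: str, needle: str) -> bool:
    # True if the needle appears anywhere in the value.
    return needle in value


def op_sequence(calls: list[str], required_before: str, target: str) -> bool:
    # True unless `target` is called before any `required_before` call in the run.
    for name in calls:
        if name == required_before:
            return True
        if name == target:
            return False
    return True


print(op_sequence(["submit_bid", "get_quote"], "get_quote", "submit_bid"))  # False
```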

Get started

Ship one policy file across CI and production.

Install in 30 seconds. Author one mocks file and one policies file. Wrap your agent run for CI; point the gateway at your upstream for production. Same decisions, same trace store, no LLM judge.