This panel shows the gap between theoretical reliability and actual deployment performance. Adjust severity and drift, then toggle auditing to see how pre-deployment checks narrow the gap.
Audit effect: Pre-deployment auditing bounds the gap at launch (t = 0) and slows the drift that follows. It cannot eliminate all uncertainty, which is why the gap is called "stochastic", but it significantly reduces the expected deviation.
Concept 2
Markovian Agent State Model
Each step depends only on the agent's current state, not on how it got there. Adjust the transition probabilities to see how the overall success and failure rates change, and where the agent most often gets stuck.
Readouts: P(success), P(failure), Avg reviews/task
Key insight: Because it's Markovian, every probability is analytically computable — no simulation needed. This makes the framework auditable: you can certify risk in advance.
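To make the "no simulation needed" point concrete, here is a minimal sketch of the standard absorbing-chain calculation. The four states (working, in review, success, failure), the function name `absorb`, and every transition probability are illustrative assumptions, not values taken from the demo. With Q holding the transitions among transient states and R the transitions into absorbing states, the fundamental matrix N = (I - Q)^-1 gives expected visit counts and B = N R gives absorption probabilities.

```python
# Hypothetical model: transient states 0 = working, 1 = in review;
# absorbing states 0 = success, 1 = failure. All numbers are assumptions.

def absorb(Q, R):
    """Return (N, B) for a chain with exactly two transient states.

    Q[i][j]: probability of moving from transient state i to transient state j.
    R[i][k]: probability of moving from transient state i to absorbing state k.
    N = (I - Q)^-1 holds expected visit counts; B = N R holds absorption
    probabilities.
    """
    # Entries of the 2x2 matrix I - Q.
    a, b = 1 - Q[0][0], -Q[0][1]
    c, d = -Q[1][0], 1 - Q[1][1]
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("I - Q is singular: the chain can loop forever")
    # Closed-form inverse of a 2x2 matrix.
    N = [[d / det, -b / det],
         [-c / det, a / det]]
    B = [[sum(N[i][t] * R[t][k] for t in range(2)) for k in range(len(R[0]))]
         for i in range(2)]
    return N, B


# Each row of Q and R together sums to 1.
Q = [[0.0, 0.3],   # working -> working, working -> review
     [0.7, 0.0]]   # review  -> working, review  -> review
R = [[0.6, 0.1],   # working -> success, working -> failure
     [0.2, 0.1]]   # review  -> success, review  -> failure

N, B = absorb(Q, R)
p_success, p_failure = B[0]   # the task starts in "working"
avg_reviews = N[0][1]         # expected visits to "in review" per task

print(f"P(success)={p_success:.3f}  P(failure)={p_failure:.3f}  "
      f"avg reviews/task={avg_reviews:.3f}")
```

The three printed quantities correspond to the panel's readouts. A chain with more transient states needs a general matrix inverse in place of the 2x2 closed form, but the calculation is otherwise the same.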
Concept 3
Optimal Oversight Level
Drag the oversight slider to explore cost trade-offs. The optimal point minimizes total cost — the sum of risk exposure and oversight overhead.
Readouts: Risk cost, Oversight cost, Total cost, Optimal level
Try it: Increase α (risk weight) and the optimal point shifts right, because more oversight is justified. Increase β (oversight weight) and it shifts left, because oversight is expensive and it pays to rely more on the agent. The optimal level is simply the minimizer of total cost for the chosen weights.
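The trade-off can be sketched numerically. The page does not state the cost curves, so the forms below are assumptions chosen only for illustration: risk exposure decays exponentially with the oversight level x in [0, 1], oversight overhead grows linearly, and the decay rate `k` is a made-up parameter. Under those assumptions the total cost C(x) = α·exp(-k·x) + β·x is convex, and setting its derivative to zero gives x* = ln(α·k/β)/k, clamped to [0, 1].

```python
import math


def total_cost(x, alpha, beta, k=5.0):
    """Total cost at oversight level x (assumed forms, not the demo's own)."""
    risk = alpha * math.exp(-k * x)   # assumed: risk falls exponentially with oversight
    oversight = beta * x              # assumed: overhead grows linearly with oversight
    return risk + oversight


def optimal_level(alpha, beta, k=5.0):
    """Minimizer of total_cost over x in [0, 1], for alpha > 0 and beta > 0."""
    # dC/dx = -alpha*k*exp(-k*x) + beta = 0  =>  x = ln(alpha*k/beta) / k
    if alpha * k <= beta:
        return 0.0                    # oversight never pays for itself
    return min(1.0, math.log(alpha * k / beta) / k)


base = optimal_level(alpha=1.0, beta=1.0)
more_risk_weight = optimal_level(alpha=2.0, beta=1.0)
more_oversight_weight = optimal_level(alpha=1.0, beta=2.0)

print(f"baseline optimum:  {base:.3f}")
print(f"alpha doubled:     {more_risk_weight:.3f}")
print(f"beta doubled:      {more_oversight_weight:.3f}")
```

Doubling α moves the optimum to a higher oversight level and doubling β moves it lower, matching the behavior described above. Different cost curves would change the numbers but not the direction of the shifts, provided risk decreases and overhead increases with oversight.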