Budget caps
Per-agent and per-environment spend caps that throttle or block turns when the budget is exceeded.
A budget cap is a per-period spend ceiling. When an agent or environment hits its cap, the runtime blocks new turns (and emits a webhook). Caps roll up across the four spend lanes the runtime tracks: model inference, embeddings, LLM-judge evals, and skill metering. Without caps, a runaway prompt loop is a runaway invoice.
What it is
A PlatosBudgetCap row keyed on (scope, agentId | null, period). Carries:
- period: daily, weekly, monthly. Wall-clock periods aligned to UTC midnight.
- capCents: the limit.
- softWarnCents: optional; emits a warn webhook when crossed.
- hardCap: when true, blocks turns. When false, alerts only.
- lanes: optional; restrict to specific cost lanes (e.g. cap only on model_inference, leave embeddings unmetered).
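As a rough illustration, the fields above could be modeled as follows. This is a sketch, not the actual PlatosBudgetCap schema: the interface name, the `scope` string format, and every lane name other than model_inference are assumptions made for the example.

```typescript
// Hypothetical shape inferred from the field list; the real row may differ.
type BudgetPeriod = "daily" | "weekly" | "monthly";

// Only "model_inference" appears in the docs; the other lane names are guesses.
type CostLane = "model_inference" | "embeddings" | "judge_evals" | "skill_metering";

interface BudgetCapSketch {
  scope: string;           // assumed: an environment identifier
  agentId: string | null;  // null means the cap is scope-wide
  period: BudgetPeriod;    // wall-clock, aligned to UTC midnight
  capCents: number;        // the limit
  softWarnCents?: number;  // optional warn threshold
  hardCap: boolean;        // true blocks turns, false alerts only
  lanes?: CostLane[];      // omitted means all lanes count toward the cap
}

const exampleCap: BudgetCapSketch = {
  scope: "env_prod",
  agentId: null,
  period: "daily",
  capCents: 5000,
  softWarnCents: 4000,
  hardCap: true,
};

// Does spend in this lane count toward the cap?
function capCoversLane(cap: BudgetCapSketch, lane: CostLane): boolean {
  return cap.lanes === undefined || cap.lanes.includes(lane);
}
```

Leaving `lanes` unset keeps all four lanes under the cap, which matches the default described in this page.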
BudgetService reads the cap on every turn start and rolls up actual spend from the cost rows. Spend updates land via CostService; the cap check is a Redis lookup against a rolling counter, not a Postgres scan.
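The turn-start check can be pictured with an in-memory counter standing in for Redis. Everything here is a sketch under stated assumptions: the class and function names are hypothetical, the counter key format is invented, and treating "spend has reached the cap" as the blocking condition is a guess at the exact comparison the runtime uses.

```typescript
// "alert" means the turn proceeds but an alert is raised (hardCap = false).
type TurnDecision = "allow" | "alert" | "block";

// In-memory stand-in for the rolling counter that the real runtime keeps in Redis.
class SpendCounterSketch {
  private spentCents = new Map<string, number>();

  // Called when a cost row lands (the role CostService plays in the runtime).
  record(key: string, cents: number): void {
    this.spentCents.set(key, (this.spentCents.get(key) ?? 0) + cents);
  }

  // Cheap lookup at turn start; no scan over individual cost rows.
  get(key: string): number {
    return this.spentCents.get(key) ?? 0;
  }
}

function checkTurnStart(
  spentCents: number,
  capCents: number,
  hardCap: boolean,
  bypassUntil?: Date,
  now: Date = new Date(),
): TurnDecision {
  if (spentCents < capCents) return "allow";
  // An active override lets the turn through even though the cap is exhausted.
  if (bypassUntil !== undefined && now.getTime() < bypassUntil.getTime()) return "allow";
  return hardCap ? "block" : "alert";
}

// Usage: the key format below is hypothetical.
const counter = new SpendCounterSketch();
counter.record("env_prod:agent_a:daily", 3000);
counter.record("env_prod:agent_a:daily", 2500);
const decision = checkTurnStart(counter.get("env_prod:agent_a:daily"), 5000, true);
```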
Override capability: an admin can POST /agent/v1/budgets/:capId/override with a bypassUntil timestamp to let a single agent run beyond cap during incidents.
Why it matters
Left unchecked, an agent's LLM spend can grow exponentially. A single prompt with the wrong loop (the model calls itself in a tool, or the model emits a memory write that triggers an extraction that calls itself) can spend $1,000 in an hour. Budget caps fail loudly before that becomes a postmortem. The runtime stops you from shooting yourself in the foot; the cap is the safety on the gun.
Shadow lanes matter because they are easy to miss. A judge eval at $0.01 per call across 50 inputs and 3 criteria is $1.50, just on the eval. If you only cap on model_inference, the eval lane runs free. The default cap covers all four lanes; restrict only when you know why.
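The eval arithmetic above can be written out as a small helper. It assumes one judge call per (input, criterion) pair, which is what the $1.50 figure implies; the function name is illustrative only.

```typescript
// Cost of one eval run in cents, assuming one judge call per (input, criterion) pair.
function judgeEvalCostCents(inputs: number, criteria: number, centsPerCall: number): number {
  return inputs * criteria * centsPerCall;
}

// 50 inputs x 3 criteria x 1 cent per call
const evalRunCents = judgeEvalCostCents(50, 3, 1);
const evalRunDollars = (evalRunCents / 100).toFixed(2); // "1.50"
```

Working in integer cents, as capCents does, avoids floating-point drift when many small charges are summed.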
How to use it
Set a cap
Open /orgs/{org}/projects/{project}/env/{env}/agent-budgets. Pick the agent (or "all agents" for an env-wide cap). Set the period, the cap, an optional soft warn, and the hard cap toggle, then save.
See remaining budget
The agent detail header shows a budget pill with "X% used today" when a cap exists. Hover for the breakdown by lane. The budgets page shows every cap with current spend.
When the cap hits
A turn that would exceed the cap is rejected before it dispatches to the model. The agent's response to the user is BUDGET_CAP_EXCEEDED plus the reset time (next period boundary). A webhook (budget.exceeded) fires; tie it to your alerting.
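The reset time is the next period boundary in UTC. A minimal sketch of that calculation follows; the function name is hypothetical, and the weekly case assumes weeks start on Monday 00:00 UTC, which this page does not specify.

```typescript
type CapPeriod = "daily" | "weekly" | "monthly";

// Next period boundary in UTC, strictly after `now`.
function nextResetUtc(now: Date, period: CapPeriod): Date {
  const y = now.getUTCFullYear();
  const m = now.getUTCMonth();
  const d = now.getUTCDate();
  switch (period) {
    case "daily":
      // Date.UTC normalizes day overflow (e.g. day 32 rolls into the next month).
      return new Date(Date.UTC(y, m, d + 1));
    case "weekly": {
      // Assumption: the week boundary is Monday 00:00 UTC.
      const dow = now.getUTCDay(); // 0 = Sunday ... 6 = Saturday
      const daysUntilMonday = (8 - dow) % 7 || 7;
      return new Date(Date.UTC(y, m, d + daysUntilMonday));
    }
    case "monthly":
      return new Date(Date.UTC(y, m + 1, 1));
  }
}

// Friday 2024-03-15, 13:45 UTC
const sampleNow = new Date("2024-03-15T13:45:00Z");
const dailyReset = nextResetUtc(sampleNow, "daily").toISOString();     // "2024-03-16T00:00:00.000Z"
const weeklyReset = nextResetUtc(sampleNow, "weekly").toISOString();   // "2024-03-18T00:00:00.000Z"
const monthlyReset = nextResetUtc(sampleNow, "monthly").toISOString(); // "2024-04-01T00:00:00.000Z"
```

Because the boundary is computed in UTC, a daily cap for a team in UTC-8 resets at 16:00 local time, not at local midnight.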
Soft warn
Set softWarnCents to (say) 80% of cap. A budget.soft_warn webhook fires when spend crosses the threshold. The agent keeps running; the warn is informational.
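One way to derive the threshold and route the two webhook events is sketched below. The event names come from this page; the payload fields and the helper names are assumptions, since the webhook body is not documented here.

```typescript
// Soft-warn threshold as a whole-number percentage of the cap (integer math, in cents).
function softWarnAt(capCents: number, percent: number = 80): number {
  return Math.floor((capCents * percent) / 100);
}

// Hypothetical payload shape; only the event names are taken from the docs.
interface BudgetWebhookSketch {
  event: "budget.soft_warn" | "budget.exceeded";
  capId: string;
  spentCents: number;
  capCents: number;
}

// Route the informational warn to a channel and the hard stop to a pager.
function alertSeverity(payload: BudgetWebhookSketch): "notify" | "page" {
  return payload.event === "budget.exceeded" ? "page" : "notify";
}

const warnThreshold = softWarnAt(5000); // 4000
```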
Bypass during an incident
await platos.platos_call("budgets.override", {
  capId, // ID of the cap to bypass
  // Bypass expires one hour from now.
  bypassUntil: new Date(Date.now() + 60 * 60 * 1000).toISOString(),
  reason: "P0 incident workaround",
});
Every override is recorded in the Audit log.
Common pitfalls
- Caps roll up over wall-clock periods in UTC; a midnight UTC boundary is not your local midnight. Configure your alerting accordingly.
- The cap check is per-turn, not per-tool-call. A single turn that fans out into 100 LLM judge calls can briefly exceed the cap before the post-turn write lands. Use lane restrictions plus aggressive soft-warn for tight bounds.
- Override bypasses the cap, not the rate limit. A capped agent under heavy load is still rate-limited. See Rate limits.
- A budget cap on agentId: null means "scope-wide". Per-agent caps stack on top; the agent hits whichever fires first.
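The stacking rule in the last pitfall can be sketched as follows. The types and function are hypothetical; the example assumes a scope-wide cap counts spend from every agent in the scope, while a per-agent cap counts only that agent's spend.

```typescript
interface CapStatusSketch {
  capId: string;
  agentId: string | null; // null = scope-wide
  spentCents: number;     // spend counted against this cap in the current period
  capCents: number;
}

// Caps that apply to the agent and are already exhausted.
// Any one of them is enough to stop the agent's next turn.
function exhaustedCapsFor(caps: CapStatusSketch[], agentId: string): CapStatusSketch[] {
  return caps.filter(
    (c) => (c.agentId === null || c.agentId === agentId) && c.spentCents >= c.capCents,
  );
}

const stackedCaps: CapStatusSketch[] = [
  { capId: "env-wide", agentId: null, spentCents: 9000, capCents: 20000 },
  { capId: "agent-a", agentId: "agent_a", spentCents: 5000, capCents: 5000 },
  { capId: "agent-b", agentId: "agent_b", spentCents: 1000, capCents: 5000 },
];

// agent_a is stopped by its own cap even though the env-wide cap has headroom.
const blockingA = exhaustedCapsFor(stackedCaps, "agent_a").map((c) => c.capId);
// agent_b still has room under both caps.
const blockingB = exhaustedCapsFor(stackedCaps, "agent_b").map((c) => c.capId);
```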
Related
- Costs: the per-lane spend rollup the cap reads from.
- Rate limits: the orthogonal axis (RPS, not dollars).
- Safety and PII: the broader governance surface.
