Rate limits
Per-IP, per-token, per-agent, and per-environment rate limits enforced at the agent runtime.
Rate limits cap the request rate along four axes: IP, auth token, agent, and environment. The runtime enforces them at the auth guard before any business logic runs, so a misbehaving caller burns very little of the agent's CPU before getting a 429.
What it is
RateLimitService plus RateLimitGuard. Backed by Redis sliding-window counters keyed on the following buckets (a sketch of the check follows the list):
- (ip, route): per-source-IP cap. Default 60 req/min on chat endpoints, higher on public read endpoints.
- (token, route): per-auth-token cap. Defaults vary by token type; PATs are tighter than session tokens.
- (agentId): total turns per minute on a specific agent. Defaults configured per agent in its monitoring config.
- (scope): scope-wide cap. The ceiling across every agent in the environment.
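To make the mechanism concrete, here is a minimal sketch of a Redis sliding-window check of the kind RateLimitGuard performs. The key names, the ioredis client, and the window/limit parameters are illustrative assumptions, not the production implementation.

```ts
// Hypothetical sliding-window check; bucket naming and parameters are assumptions.
import Redis from "ioredis";

const redis = new Redis(); // assumes a reachable Redis instance

async function allow(bucket: string, limit: number, windowMs: number): Promise<boolean> {
  const now = Date.now();
  const key = `ratelimit:${bucket}`; // e.g. "ip:chat:203.0.113.7" or "agent:abc123"

  const results = await redis
    .multi()
    .zremrangebyscore(key, 0, now - windowMs)  // drop hits that fell out of the window
    .zadd(key, now, `${now}:${Math.random()}`) // record this hit
    .zcard(key)                                // count hits still inside the window
    .pexpire(key, windowMs)                    // let idle keys expire on their own
    .exec();

  const count = Number(results?.[2]?.[1] ?? 0);
  return count <= limit;
}

// e.g. the per-IP chat default mentioned above: 60 requests per minute
// await allow("ip:chat:203.0.113.7", 60, 60_000);
```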
Hits are logged to RateLimitHit (sampled to avoid log spam) and rolled up on the Monitoring page.
A 429 response carries Retry-After (seconds) plus X-Platos-Rate-Limit-Bucket so you know which axis fired.
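For instance, a throttled chat call might come back with headers like these (values illustrative):

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 12
X-Platos-Rate-Limit-Bucket: ip:chat
```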
Why it matters
Rate limits and budget caps fix different problems:
- A budget cap rejects calls when your spend is too high. It is a money brake.
- A rate limit rejects calls when your request rate is too high. It is a fairness brake.
A user who holds down "send" in the chat UI will eventually burn through a budget cap, but the rate limit kicks in first and protects the agent from cascading downstream errors. A scraper hammering your public agent will not blow through your budget within an hour, but in the meantime it degrades latency for every other user; the rate limit stops that.
How to use it
Tighten or loosen per agent
In the agent's monitoring tab, set rateLimit: { perMinute: 30 }. The agent's per-agent counter then caps at 30 requests per minute regardless of how many users hit it.
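As a sketch, the relevant fragment of the monitoring config might look like this; only the rateLimit key comes from the text above, and any surrounding keys are whatever your agent already defines:

```ts
// Illustrative fragment only; rateLimit.perMinute is the documented knob,
// the rest of the monitoring config schema is not shown here.
const monitoringConfig = {
  rateLimit: {
    perMinute: 30, // per-agent cap: at most 30 turns per minute across all callers
  },
};
```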
Tighten by IP for public agents
Public agents (see Public agents and embed) typically need stricter IP caps to absorb scrape traffic. The default for public-share endpoints is 30 req/min per IP; raise the cap in the agent's share settings if you have a high-traffic public bot.
Diagnose a 429
Inspect X-Platos-Rate-Limit-Bucket:
- ip:chat -> caller IP rate.
- token:chat -> caller token rate.
- agent:abc123 -> the agent's per-minute cap.
- scope:env-id -> the environment's ceiling.
Each bucket has its own remaining-window header so you know exactly when retry will work.
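A sketch of client-side handling for a runtime 429 follows. The endpoint URL and request body are placeholders, and the single-retry policy is an example, not a recommendation.

```ts
// Hypothetical client helper; only the two headers it reads come from the docs above.
async function postOnce(url: string, body: unknown): Promise<Response> {
  return fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
}

async function postWithRetry(url: string, body: unknown): Promise<Response> {
  const res = await postOnce(url, body);
  if (res.status !== 429) return res;

  const bucket = res.headers.get("X-Platos-Rate-Limit-Bucket"); // which axis fired
  const retryAfterSec = Number(res.headers.get("Retry-After") ?? "1");
  console.warn(`rate limited on bucket ${bucket}; retrying in ${retryAfterSec}s`);

  await new Promise((resolve) => setTimeout(resolve, retryAfterSec * 1000));
  return postOnce(url, body);
}
```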
Distinguish from upstream
A 429 from Platos means the runtime throttled you. A rate limit hit at the underlying provider (OpenAI, Anthropic) surfaces instead as a 502 with a provider_rate_limit error code; the runtime never re-emits a provider 429 directly, because that would conflate two different actors.
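A small sketch of telling the two apart. The shape of the 502 error body (a code field) is an assumption; the status codes follow the text above.

```ts
// Classify a failed response; the { code: "provider_rate_limit" } body shape is assumed.
type Failure = "runtime-rate-limit" | "provider-rate-limit" | "other";

async function classifyFailure(res: Response): Promise<Failure> {
  if (res.status === 429) return "runtime-rate-limit"; // the Platos runtime throttled the caller
  if (res.status === 502) {
    const body = await res.json().catch(() => ({} as { code?: string }));
    if (body?.code === "provider_rate_limit") return "provider-rate-limit"; // upstream provider throttled the runtime
  }
  return "other";
}
```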
Common pitfalls
- A long-running BGO does not consume agent rate (it is one tool call per turn). Heavy BGO fan-out hits the engine layer's rate limits, not the agent's. See Queues.
- Public agents should have stricter caps than internal ones; the default is intentionally tight.
- Sliding-window counters are per-replica until the cluster syncs. For multi-replica deployments, set RATE_LIMIT_CLUSTER_SYNC=true so caps are global, not per-replica (see the example after this list).
- Rate limit hits are sampled in the audit log. If you need every hit, raise RATE_LIMIT_LOG_SAMPLE to 1.0; expect log volume to grow.
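For example, a multi-replica deployment that also needs every hit in the audit log would set both flags:

```
RATE_LIMIT_CLUSTER_SYNC=true
RATE_LIMIT_LOG_SAMPLE=1.0
```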
Related
- Budgets: the orthogonal axis (dollars, not RPS).
- Safety and PII: the broader governance surface.
- Auth modes: each auth mode has its own default rate config.
