Configure a PII filter
Block or redact specific PII categories per agent and log every hit.
Detect and act on PII in chat input or memory writes per agent.
The goal
An agent that automatically redacts (or blocks) sensitive content before it reaches the model or memory store, with every hit logged for audit.
Steps
Open the safety tab.
On the agent, Safety -> "PII filters".
Pick categories.
Toggle defaults: email, phone, credit_card, ssn, api_key, iban. Per category, choose a policy:
- flag: pass through, emit an event for review.
- redact: mask the match in the model input or memory write.
- block: refuse the turn or memory write entirely.
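To make the difference between the three policies concrete, here is a minimal Python sketch of how flag, redact, and block could act on a piece of text. It is not the product's implementation: the `apply_policy` function, the event fields, and both regexes are assumptions made for illustration, and the built-in detectors are likely more thorough. Only the [REDACTED:category] placeholder format is taken from this page.

```python
import re

# Illustrative patterns only (assumption); the built-in detectors are not documented here.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def apply_policy(text, policies):
    """Apply per-category policies to text.

    Returns (text_or_None, events). A None text means the turn was blocked.
    The event shape is hypothetical.
    """
    events = []
    for category, policy in policies.items():
        pattern = PATTERNS[category]
        matches = pattern.findall(text)
        if not matches:
            continue
        events.append({"category": category, "policy": policy, "count": len(matches)})
        if policy == "block":
            # Refuse the whole turn or memory write.
            return None, events
        if policy == "redact":
            # Mask each match with the placeholder format shown in the Test step.
            text = pattern.sub(f"[REDACTED:{category}]", text)
        # "flag" leaves the text unchanged; only the event is recorded.
    return text, events


text, events = apply_policy("Reach me at test@example.com", {"email": "redact"})
print(text)  # Reach me at [REDACTED:email]
```

Note that in this sketch every policy records an event, which mirrors the goal of logging every hit regardless of the action taken.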
Add custom regex (optional).
- Name: internal_id
- Pattern: \bACME-\d{6}\b
- Policy: redact
Save.
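Before saving a custom pattern, it can help to check it locally against strings that should and should not match. The sketch below uses Python's `re` module with the pattern from this step; the regex engine the filter actually uses is not stated here, so treat the result as a sanity check rather than a guarantee. The `redact_internal_ids` helper and the [REDACTED:internal_id] placeholder are assumptions extrapolated from the [REDACTED:email] format.

```python
import re

# The custom pattern from the step above.
INTERNAL_ID = re.compile(r"\bACME-\d{6}\b")

# Each sample maps to whether we expect the pattern to match.
samples = {
    "Ticket ACME-123456 is open": True,
    "ACME-12345 is too short": False,    # only five digits
    "ACME-1234567 is too long": False,   # \b fails between the 6th and 7th digit
    "XACME-123456": False,               # no word boundary before "ACME"
    "acme-123456": False,                # the pattern is case-sensitive
}

for text, expected in samples.items():
    found = INTERNAL_ID.search(text) is not None
    assert found == expected, text


def redact_internal_ids(text):
    """Hypothetical helper: mask matches the way a redact policy might."""
    return INTERNAL_ID.sub("[REDACTED:internal_id]", text)
```

If lowercase IDs also occur in your data, the pattern needs a case-insensitive variant, since `\bACME-\d{6}\b` as written matches uppercase only.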
Test.
In the chat panel, send a message containing a fake match (e.g. an email like test@example.com). Watch the response: redacted content appears as [REDACTED:email]. The safety event card lights up on the governance page.
Verify
- A turn with a matched pattern triggers a safety.event webhook.
- The governance page (/agent-governance) lists the event with category, policy, and decrypted match.
- Audit shows the redacted-match log row.
Wire to an alert
Subscribe to safety.event. Filter for events with category credit_card and policy block, and route them to your incident channel.
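The filtering logic on the receiving side might look like the sketch below. The payload shape is an assumption: the field names `type`, `category`, `policy`, and `agent_id` are hypothetical, so check them against an actual safety.event delivery before relying on them. The `notify` argument stands in for whatever posts to your incident channel.

```python
def is_incident(event):
    """Return True for blocked credit-card hits (field names are assumed)."""
    return (
        event.get("type") == "safety.event"
        and event.get("category") == "credit_card"
        and event.get("policy") == "block"
    )


def route(event, notify):
    """Forward matching events to the incident channel via notify()."""
    if not is_incident(event):
        return False
    agent = event.get("agent_id", "unknown")
    notify(f"PII block: {event['category']} on agent {agent}")
    return True


# Usage with a stand-in notifier that collects messages in a list.
sent = []
route({"type": "safety.event", "category": "credit_card", "policy": "block"}, sent.append)
```

Keeping the predicate separate from the delivery call makes it easy to add further category and policy combinations later.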
Common pitfalls
- The US-format phone regex misses international numbers. Add custom patterns for the formats you expect.
- The block policy throws a turn error users see; use redact for first-party data the agent must acknowledge.
- The filter runs on input, not output. Output filtering is configured separately on the agent's outputFilters.
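To illustrate the phone pitfall, the sketch below compares a typical US-only pattern with a loose international one. Both regexes are assumptions written for this example; the built-in phone detector's actual pattern is not shown on this page. The international pattern accepts a plus sign followed by 8 to 15 digits with optional spaces or hyphens between them, and could be added as a custom pattern under a name of your choosing.

```python
import re

# A typical US-only shape: 3 digits, 3 digits, 4 digits (illustrative assumption).
US_PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

# Loose international shape: "+" then 8 to 15 digits, optionally separated
# by single spaces or hyphens (illustrative assumption).
INTL_PHONE = re.compile(r"\+\d(?:[\s-]?\d){7,14}\b")

us_number = "555-867-5309"
uk_number = "+44 20 7946 0958"

print(US_PHONE.search(us_number) is not None)    # True
print(US_PHONE.search(uk_number) is not None)    # False: the US pattern misses it
print(INTL_PHONE.search(uk_number) is not None)  # True
```

A loose pattern like this trades precision for recall, so it is worth starting it on the flag policy and reviewing the events before switching to redact or block.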
Next steps
- Add a human approval gate for tools that act on detected entities.
- See Safety, PII, governance for the full rule engine.
