Memory
Long-term, scoped, ratable memory that powers personalization without leaking across tenants.
Memory in Platos is scoped, ratable, and tiered. Every fact the agent learns is stored as a PlatosMemory row keyed by user and agent (or cluster), embedded for vector recall, surfaced through recall and list_memories meta-tools, and updated through a feedback loop. Cross-tenant access is rejected at the auth layer; you cannot recall a memory you did not write.
What it is
Four classes of memory, all in one table:
- Working memory: short-lived state for the current turn or run. Owned by WorkingMemoryService. Lives in Redis with a TTL.
- Conversation memory: messages and summaries on the active thread. Backed by ConversationService and (optionally) compaction.
- Profile memory: long-term facts about the user (preferences, identifiers, prior context). Owned by ProfileCacheService over a PlatosMemory row tagged kind: "profile".
- Knowledge memory: agent-scoped facts and references. Same table, tagged kind: "knowledge". The Memory graph layers entity nodes and edges over this set.
Each row carries (scope, userId, agentId | clusterId, kind, content, embedding, rating, createdAt, updatedAt). The embedding column is pgvector; recall is HNSW cosine. Ratings flow through MemoryFeedbackService and reweight retrieval ranking.
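The rating reweighting can be pictured as a blended score over cosine similarity and rating. A minimal sketch, assuming a simple linear blend (the 0.2 rating weight and the [-1, +1] to [0, 1] normalization are illustrative, not the shipped formula):

```typescript
// Hypothetical sketch of rating-aware recall ranking.
// The 0.8/0.2 blend and the rating normalization are assumptions.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function blendedScore(cosineSim: number, rating: number, ratingWeight = 0.2): number {
  // Map rating from [-1, +1] to [0, 1] so both terms share a scale.
  const ratingTerm = (rating + 1) / 2;
  return (1 - ratingWeight) * cosineSim + ratingWeight * ratingTerm;
}
```

Under this blend a downrated row needs noticeably higher cosine similarity to outrank an uprated one, which is the behavior the feedback loop relies on.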
Extraction runs after a turn ends, scheduled by MemoryExtractionService plus MemoryScheduler. The agent's extractionPolicy decides when to run (every turn, batched per N turns, or only on user feedback) and what shape the extraction prompt takes.
Why it matters
A chat agent without long-term memory feels stateless: the user reintroduces themselves on every visit and repeats their preferences, because the agent forgets everything between sessions. A naive solution (dump every message into a vector store) leaks private facts across tenants and explodes cost. Platos splits the problem into typed tiers, scopes each tier per (scope, user, agent | cluster), and runs extraction asynchronously so chat latency never pays for memory writes.
The ratings loop is the differentiator. When a user thumbs-down a reply that hallucinated, the message rating cascades back into the memory rows the recall pulled from, downweighting them on the next query. Bad memories starve themselves out without manual cleanup.
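A sketch of that cascade, assuming a per-message recall log and an additive rating delta (the names, the delta values, and the clamp range are hypothetical):

```typescript
// Hypothetical sketch of the feedback cascade: a rating on a message
// reweights every memory row that recall surfaced for that message.
interface MemoryRow { id: string; rating: number }

function applyMessageFeedback(
  memories: Map<string, MemoryRow>,
  recallLog: Map<string, string[]>, // messageId -> memory ids recall used
  messageId: string,
  delta: number, // e.g. -0.5 for thumbs-down, +0.5 for thumbs-up
): void {
  for (const memId of recallLog.get(messageId) ?? []) {
    const row = memories.get(memId);
    if (!row) continue;
    // Clamp so ratings stay within [-1, +1].
    row.rating = Math.max(-1, Math.min(1, row.rating + delta));
  }
}
```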
Setup — embedding provider is mandatory
Memory writes (manual remember calls, update_user_profile, AND the hourly extraction sweep) compute a 1536-dim embedding before insert. Without an embedding provider configured, the embed call throws and the memory never lands. Symptom: agent feels stateless across sessions, the dashboard memory tab is empty even after multi-turn conversations, and the agent log shows VOYAGE_API_KEY not configured (or the OpenAI equivalent) on every failed extraction.
Pick one provider on the agent container env:
# Recommended: Voyage AI, the Anthropic-recommended embedding provider
PLATOS_EMBEDDING_PROVIDER=voyage
VOYAGE_API_KEY=pa-... # https://www.voyageai.com or via the Anthropic console
# OR — reuse your OpenAI key (already needed if you use the OpenAI LLM)
PLATOS_EMBEDDING_PROVIDER=openai
OPENAI_API_KEY=sk-...
Default model on each provider is 1536-dim native (voyage-large-2 / text-embedding-3-small) so it slots straight into the existing pgvector column without a schema migration. If you set PLATOS_EMBEDDING_MODEL to a non-1536-dim model you'll need to widen the column first; see Self-hosting § Required env vars.
These can also be linked per-scope via the dashboard Providers UI, encrypted in Postgres — EmbeddingService.resolveApiKey checks the scope-bound store first and falls back to the container env. Profile-kind memory rows (update_user_profile) skip the embed step (key-value lookup) and don't require this; everything else does.
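The lookup order can be sketched as a two-step fallback. The real EmbeddingService.resolveApiKey signature isn't documented here, so the shapes below are assumptions; only the precedence (scope-bound store first, container env second) comes from the text above:

```typescript
// Sketch of the key-resolution fallback: scope-linked provider keys
// win over container env vars. Shapes are illustrative assumptions.
type ScopeKeyStore = Map<string, string>; // scopeId -> decrypted api key

function resolveApiKey(
  scopeStore: ScopeKeyStore,
  scopeId: string,
  env: Record<string, string | undefined>,
  envVar: string, // e.g. "VOYAGE_API_KEY" or "OPENAI_API_KEY"
): string | undefined {
  // Scope-bound store first, then the container env, else undefined.
  return scopeStore.get(scopeId) ?? env[envVar];
}
```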
How to use it
From the agent's perspective
Three meta-tools cover the bulk of usage:
- remember({ content, kind?: "profile" | "knowledge" }): write a fact.
- recall({ query, limit?: 5 }): vector search over the agent's accessible memory.
- forget({ memoryId }): delete a specific memory.
Plus list_memories({ kind?, query? }) for paged inspection and relate({ from, to, label }) for the graph layer.
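A toy in-memory model of the three core meta-tools, useful for seeing the call shapes. Real recall is vector search over embeddings; this sketch substitutes case-insensitive substring matching so it stays self-contained:

```typescript
// Toy model of remember / recall / forget. Not the Platos
// implementation -- recall here is substring match, not vector search.
type Kind = "profile" | "knowledge";
interface Memory { id: string; content: string; kind: Kind }

class MemoryStore {
  private rows = new Map<string, Memory>();
  private nextId = 1;

  remember({ content, kind = "knowledge" }: { content: string; kind?: Kind }): Memory {
    const row = { id: String(this.nextId++), content, kind };
    this.rows.set(row.id, row);
    return row;
  }

  recall({ query, limit = 5 }: { query: string; limit?: number }): Memory[] {
    return [...this.rows.values()]
      .filter((m) => m.content.toLowerCase().includes(query.toLowerCase()))
      .slice(0, limit);
  }

  forget({ memoryId }: { memoryId: string }): boolean {
    return this.rows.delete(memoryId);
  }
}
```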
Manual writes from the dashboard
/orgs/{org}/projects/{project}/env/{env}/memories lists every memory in scope, filterable by user, agent, kind, and rating. Inline edit, rate, or delete. Useful for forensic cleanup after a known bad turn.
Extraction policy
Set per agent:
{
"mode": "post-turn",
"minTurns": 3,
"maxTokensPerExtraction": 2000,
"extractionPrompt": null
}
post-turn extracts after each turn ends; batched extracts after minTurns turns or when the user rates a message. A non-null extractionPrompt overrides the default extraction template.
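The gating those modes imply can be sketched as a predicate. shouldExtract is an assumed helper, not a documented Platos export; the on-feedback case models the "only on user feedback" option mentioned earlier:

```typescript
// Hypothetical sketch of extraction gating per the policy modes above.
interface ExtractionPolicy {
  mode: "post-turn" | "batched" | "on-feedback";
  minTurns: number;
}

function shouldExtract(
  policy: ExtractionPolicy,
  turnsSinceLastExtraction: number,
  userJustRated: boolean,
): boolean {
  switch (policy.mode) {
    case "post-turn":
      return true; // run after every turn
    case "batched":
      // run after minTurns turns, or early when the user rates a message
      return turnsSinceLastExtraction >= policy.minTurns || userJustRated;
    case "on-feedback":
      return userJustRated;
  }
}
```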
Scoping rules
- Default: (userId, agentId). The agent only sees memory it wrote for that user.
- Cluster: (userId, clusterId). All cluster members share the same memory pool.
- Project-shared: explicit kind: "shared" rows visible to every agent in the project. Use sparingly; this is the only place where one agent's writes leak to another.
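The three rules above can be collapsed into one visibility predicate. The row and agent shapes are illustrative, and the sketch assumes rows have already been filtered to the agent's project:

```typescript
// Hypothetical visibility check for the three scoping rules.
// Assumes all rows passed in belong to this agent's project.
interface Row {
  userId: string;
  agentId?: string;
  clusterId?: string;
  kind: string; // "profile" | "knowledge" | "shared"
}
interface Agent { id: string; clusterId?: string }

function canRead(agent: Agent, userId: string, row: Row): boolean {
  if (row.kind === "shared") return true;            // project-shared rows
  if (row.userId !== userId) return false;           // never cross-user
  if (row.clusterId) return agent.clusterId === row.clusterId; // cluster pool
  return row.agentId === agent.id;                   // default: own writes only
}
```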
GDPR delete
DELETE /agent/v1/memory?userId=... runs the cascade: working memory, profile, knowledge, and any embedded copies. Returns the count of rows deleted. Per-user delete is also exposed through messages.rate with a delete intent and via the dashboard's user detail page.
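The endpoint's observable behavior (delete every tier for one user, return the row count) can be sketched with an assumed flat row model, one tier tag per row:

```typescript
// Sketch of the per-user cascade's contract: remove all of a user's
// rows across tiers and report how many were deleted. The storage
// model here is an assumption, not the Platos schema.
interface Row { userId: string; tier: "working" | "profile" | "knowledge" }

function deleteUser(rows: Row[], userId: string): { deleted: number; remaining: Row[] } {
  const remaining = rows.filter((r) => r.userId !== userId);
  return { deleted: rows.length - remaining.length, remaining };
}
```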
Common pitfalls
- Recall ranking blends cosine similarity with rating. A high-cosine row with rating: -1 ranks lower than a moderate-cosine row with rating: +1. If you import memories without ratings, expect the first turn after import to feel more aggressive than steady-state.
- The scheduler runs in the same trigger.dev queue as extraction. A flood of post-turn extractions can backlog the queue. Consider batched mode for high-volume agents.
- Cluster scope is opt-in. Two agents created independently and later linked into a cluster do not retroactively share their existing memories. New writes flow to the cluster scope; old writes stay agent-scoped.
- embedding.service.ts calls the configured embedding model. If the provider key for that model is unlinked, extraction silently no-ops (logged at warn level). Watch the Monitoring memory-extraction-health card.
Related
- Memory graph: the entity-and-relations view over knowledge memory.
- Conversations and threads: where extraction reads from.
- Agent clusters: cluster-scoped memory pools.
- Safety and PII: PII filtering before a memory is written.
