Conversation Memory
Conversation Memory gives your chatbot the ability to remember what users talked about — both within a single chat session and across multiple sessions over time.
Two Memory Strategies
You choose one of two strategies for how your chatbot manages conversation context. Each works differently under the hood, but both are invisible to your users — the chatbot just gives better, more personalized answers.
| | Graph-Based (Recommended) | Conventional |
|---|---|---|
| How it works | Stores conversations in a knowledge graph. Uses AI-powered ranking to find the most relevant context. | Keeps full conversation history. Summarizes older turns when the context window fills. |
| Best for | Long, detailed conversations. Technical support. Users who reference earlier topics. | Short, linear conversations. Simple Q&A. Cost-sensitive deployments. |
| Within-session recall | Semantic search + graph connectivity — finds relevant older turns even across topic shifts | Full history until summarization triggers, then summary + recent turns |
| Token efficiency | High — only the most relevant turns are injected (about 1,500 tokens vs 15,000+ for full history) | Moderate — full history until threshold, then summary reduces to about 1,500 tokens |
| Requires FalkorDB | Yes | No |
| Default | Yes | — |
Both strategies can be combined with long-term memory (cross-session fact recall) as an independent toggle.
If you're unsure, use Graph-Based — it's the default and works well for all conversation lengths. Switch to Conventional only if you specifically want the simpler summarization approach.
Graph-Based Memory (Knowledge Graph)
The graph-based strategy stores every question-and-answer exchange as connected nodes in a knowledge graph. This creates a rich structure that the system uses to find the most relevant context for each new message.
How it works
- Every turn is stored with its meaning (semantic embedding) and linked to the previous turn with a continuity score
- When a new question arrives, the system searches for relevant older turns using three signals:
  - Semantic similarity: "Does this earlier turn mean something similar?"
  - Keyword matching: "Does it contain the same terms?"
  - Graph connectivity: "Is it connected to other relevant turns?" (Personalized PageRank)
- The most relevant turns are injected into the chatbot's context alongside the last 2 exchanges
- The chatbot sees a focused view of the conversation — not the full history, just what matters for the current question
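One way to picture how the three signals could be blended is a weighted sum per candidate turn. This is a minimal sketch only: the `TurnSignals` type, the `rank_turns` function, the weights, and the sample scores are all hypothetical, since the combination formula is not documented here.

```python
from dataclasses import dataclass


@dataclass
class TurnSignals:
    """Per-turn relevance signals, each assumed to be normalized to 0..1."""
    turn_id: int
    semantic: float  # embedding similarity to the new question
    keyword: float   # term-overlap score
    graph: float     # Personalized PageRank score


def rank_turns(candidates, weights=(0.5, 0.2, 0.3), top_k=3):
    """Return the ids of the top_k turns by weighted sum of the three signals.

    The weights are illustrative assumptions, not the product's real values.
    """
    w_sem, w_kw, w_graph = weights

    def score(turn):
        return w_sem * turn.semantic + w_kw * turn.keyword + w_graph * turn.graph

    ranked = sorted(candidates, key=score, reverse=True)
    return [turn.turn_id for turn in ranked[:top_k]]


candidates = [
    TurnSignals(1, semantic=0.40, keyword=0.0, graph=0.90),
    TurnSignals(5, semantic=0.10, keyword=0.0, graph=0.05),
    TurnSignals(3, semantic=0.55, keyword=0.5, graph=0.80),
]
top_two = rank_turns(candidates, top_k=2)
```

With these made-up signals, turn 3 ranks first and turn 1 second, while the weakly connected turn 5 is left out. Note that turn 1 has no keyword overlap at all and is still selected because of its graph score.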
Intelligent ranking with Personalized PageRank
The graph signal is what makes this strategy special. The system doesn't just search by keywords or meaning — it walks the knowledge graph to discover connected context.
Example: A user's 15-turn conversation includes this sequence:
- Turns 1–4: Detailed milling setup discussion
- Turn 5: Quick question about file export (topic shift)
- Turn 6: "Going back to the milling — what coolant should I use?"
A simple search for "coolant" might miss turns 1–4 (they don't mention coolant). But the graph knows those turns are tightly connected — they form a conversation cluster about milling. Personalized PageRank starts from the coolant question, walks the graph, and discovers the entire milling cluster is relevant.
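The walk described above can be sketched with a small power-iteration version of Personalized PageRank. The graph below is an assumption made for illustration: edge weights between consecutive turns stand in for continuity scores, and the two links from turn 6 to turns 1 and 4 stand in for semantic-similarity links that the system might create. The function name and parameters are hypothetical.

```python
def personalized_pagerank(edges, seed, damping=0.85, iterations=50):
    """Power-iteration Personalized PageRank on a weighted undirected graph.

    edges: {(a, b): weight}; seed: the node the random walk restarts from.
    """
    neighbors = {}
    for (a, b), weight in edges.items():
        neighbors.setdefault(a, {})[b] = weight
        neighbors.setdefault(b, {})[a] = weight

    scores = {node: (1.0 if node == seed else 0.0) for node in neighbors}
    for _ in range(iterations):
        # Restart mass goes back to the seed on every step.
        nxt = {node: ((1.0 - damping) if node == seed else 0.0) for node in neighbors}
        for node, out in neighbors.items():
            total = sum(out.values())
            for neighbor, weight in out.items():
                nxt[neighbor] += damping * scores[node] * weight / total
        scores = nxt
    return scores


edges = {
    (1, 2): 0.7, (2, 3): 0.7, (3, 4): 0.7,  # milling cluster, high continuity
    (4, 5): 0.26, (5, 6): 0.1,              # topic shift to file export and back
    (6, 1): 0.6, (6, 4): 0.6,               # assumed semantic links from turn 6
}
scores = personalized_pagerank(edges, seed=6)
ranked = sorted((t for t in scores if t != 6), key=scores.get, reverse=True)
```

In this toy graph every milling turn (1 through 4) ends up with a higher score than the file-export turn 5, even though only turns 1 and 4 are linked directly to the coolant question. Turns 2 and 3 are reached purely through their connections inside the cluster.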
Conversation flow tracking
Each question is linked to the next with a continuity score that measures topic relatedness:
"What tool for aluminum?" ──0.73──▸ "What feed rate?" ──0.71──▸ "Climb or conventional?"
(same topic, high continuity)
"Climb or conventional?" ──0.26──▸ "How do I export to PDF?"
(topic shift, low continuity)
Questions in a tight topical cluster are more meaningful together than scattered one-off questions. The ranking system uses this: context from focused discussion clusters is prioritized over isolated mentions.
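As a rough illustration of how continuity scores could separate a focused cluster from a one-off question, the sketch below starts a new cluster whenever the score on a link falls below a cutoff. The `split_into_clusters` function and the 0.5 threshold are assumptions; how the system actually forms clusters is not specified here.

```python
def split_into_clusters(questions, continuity, threshold=0.5):
    """Group consecutive questions; start a new cluster when continuity drops.

    continuity[i] is the score on the link from questions[i] to questions[i + 1].
    The 0.5 threshold is an illustrative assumption.
    """
    if not questions:
        return []
    clusters = [[questions[0]]]
    for question, score in zip(questions[1:], continuity):
        if score >= threshold:
            clusters[-1].append(question)   # same topic, extend current cluster
        else:
            clusters.append([question])     # topic shift, open a new cluster
    return clusters


questions = [
    "What tool for aluminum?",
    "What feed rate?",
    "Climb or conventional?",
    "How do I export to PDF?",
]
continuity = [0.73, 0.71, 0.26]
clusters = split_into_clusters(questions, continuity)
```

Using the scores from the diagram, the three milling questions land in one cluster and the PDF export question stands alone.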
Token efficiency
Instead of passing the entire conversation to the AI model (which wastes tokens and money), the graph strategy selectively picks the most relevant turns:
- About 1,500 tokens of memory context per request
- Compare to 15,000–30,000 tokens for full history in a 20-turn session
- 90–95% token reduction while preserving the most important context
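The 90–95% figure follows directly from the two token counts above. A quick check, using a hypothetical helper name:

```python
def reduction_percent(full_history_tokens, memory_tokens=1_500):
    """Percentage of prompt tokens saved by sending memory context instead of full history."""
    return 100 * (full_history_tokens - memory_tokens) / full_history_tokens


low = reduction_percent(15_000)   # shorter 20-turn session
high = reduction_percent(30_000)  # longer 20-turn session
```

Here `low` evaluates to 90.0 and `high` to 95.0, matching the stated range.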
Conventional Memory (Summarize and Truncate)
The conventional strategy keeps the full conversation history in the prompt. When the history grows large, a background summarizer automatically condenses the older turns into a concise summary.
How it works
- Full history is passed to the AI model on every message — no selective retrieval
- When history reaches approximately 100,000 tokens (about 250 turns), a background summarizer kicks in
- The summarizer (Amazon Nova Micro — fast and cost-efficient) condenses older turns into a bullet-point summary
- The summary replaces the original older messages on the next turn
- Recent turns (last 3 exchanges) are always kept verbatim
- The user never notices — summarization happens after the response is sent
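The steps above can be sketched as a single function that decides whether to condense and, if so, what to keep verbatim. This is a simplified sketch, not the actual implementation: `condense_if_needed` is a hypothetical name, and `summarize` and `count_tokens` are caller-supplied stand-ins for the summarizer model and the tokenizer. In the real system the summarization runs in the background after the response is sent.

```python
def condense_if_needed(history, summarize, count_tokens,
                       threshold=100_000, keep_recent=3):
    """Return (summary, recent) when history is over threshold, else (None, history).

    history: list of (user_msg, bot_msg) exchanges, oldest first.
    summarize / count_tokens: hypothetical callables supplied by the caller.
    """
    total = sum(count_tokens(user) + count_tokens(bot) for user, bot in history)
    if total < threshold or len(history) <= keep_recent:
        return None, history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return summarize(older), recent


# Toy stand-ins: count words as tokens and use a tiny threshold.
history = [(f"question {i}", f"answer {i}") for i in range(10)]
summary, recent = condense_if_needed(
    history,
    summarize=lambda older: f"{len(older)} older exchanges condensed",
    count_tokens=lambda text: len(text.split()),
    threshold=20,
)
```

With the toy inputs, the seven oldest exchanges are replaced by one summary string and the last three exchanges are returned unchanged.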
Why this saves you money
Without condensation, a 300-turn conversation sends approximately 120,000 tokens of history to your main AI model on every single message. After condensation, the model receives a 500-token summary plus the last few turns (about 1,500 tokens in total), roughly 80x fewer tokens per request. The one-time summarization cost (about $0.001) pays for itself immediately.
Each subsequent summarization only processes the newly-aged-out turns (not the whole history again), so ongoing costs are minimal (about $0.0001 per condensation).
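The numbers in this section can be reproduced with a few lines of arithmetic. The per-turn average is derived from the stated threshold (100,000 tokens at about 250 turns), and the post-condensation size uses the roughly 1,500-token figure from the comparison table. Variable names are illustrative.

```python
threshold_tokens = 100_000  # condensation threshold
threshold_turns = 250       # approximate turn count at that threshold
tokens_per_turn = threshold_tokens // threshold_turns

history_300_turns = 300 * tokens_per_turn  # uncondensed history sent per message
after_condensation = 1_500                 # summary plus recent turns
savings_factor = history_300_turns // after_condensation
```

This gives 400 tokens per turn, 120,000 tokens for a 300-turn history, and a savings factor of 80.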
Consistent threshold
The condensation threshold is the same regardless of which AI model your chatbot uses. Whether your main model has a 128K or 1M token context window, condensation triggers at the same point. This prevents large-context models from accumulating unlimited history (which would be extremely expensive per request even though the model CAN handle it).
Long-Term Memory (Cross-Session Facts)
Long-term memory is an independent toggle that works with either strategy. When enabled, your chatbot learns and remembers facts about each user across sessions.
How it works
After a conversation ends, the system automatically extracts key facts — things the user said about themselves, their work, their preferences — and stores them for future sessions.
Example:
Session 1 (Monday):
User: "I'm setting up a tapping operation on our CNC-5000 Pro"
Session 2 (Wednesday):
User: "What feed rate should I use?"
Chatbot: "Based on your previous setup on the CNC-5000 Pro, here are the recommended feed rates for tapping..."
The chatbot remembered the machine from Monday's session and used that context to give a better answer on Wednesday — without the user needing to say "remember" or "last time." Facts are always injected automatically.
What gets remembered
The system extracts user-centric facts — things about the person, not documentation content:
| Remembered | Not remembered |
|---|---|
| "User works with a CNC-5000 Pro" | Product command definitions |
| "User prefers climb milling for aluminum" | Step-by-step procedures from docs |
| "User's typical tool diameter is 6mm" | Parameter tables |
How facts connect (Graph strategy only)
When using the graph-based strategy, long-term facts are connected in the same knowledge graph. Facts extracted from the same conversation get linked. Semantically similar facts from different sessions also get linked. This creates a web of connections:
"Works with CNC-5000 Pro" ───── "Prefers climb milling"
│ │
│ │
"Tool diameter is 6mm" ──────── "Programs M12x1 tapping"
When a user asks about milling, all connected facts get boosted — not just the one with the best keyword match. Questions from the current session also link to these facts, so frequently-referenced facts rank higher than one-off mentions.
Scope-aware recall
If your chatbot uses Knowledge Scopes, long-term facts are tagged with the scope they were learned in and only surface when the user is querying that same scope (or a scope chained to it).
Example:
| Scenario | Behavior |
|---|---|
| User discusses MultiDECO setup in the solidcam-2025 scope. Comes back later, still on solidcam-2025. | Facts surface. The bot personalizes the answer. |
| Same user switches to the hyperturn scope and asks a generic question. | MultiDECO facts do NOT surface. The bot doesn't offer personalization it can't back with the right documentation. |
| User has no scopes configured (single namespace chatbot). | All facts surface for every question, unchanged behavior. |
This prevents the chatbot from offering tailored advice it can't deliver — e.g. "Want me to apply this to your MultiDECO setup?" when MultiDECO documentation isn't in the currently-active scope.
Back-compat: Facts learned before scope-tagging shipped have no scope label and continue to surface under any scope (so existing users don't suddenly lose context).
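The scope rules, including the back-compat case, amount to a simple filter. The sketch below is an assumption about shape only: the `Fact` type, the `visible_facts` function, and the way chained scopes are passed in are hypothetical, and the sample fact texts are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    text: str
    scope: Optional[str]  # None = legacy fact stored before scope tagging


def visible_facts(facts, active_scope, chained_scopes=frozenset()):
    """Return the facts that may surface for a query in active_scope."""
    if active_scope is None:  # chatbot has no scopes configured
        return list(facts)
    allowed = {active_scope} | set(chained_scopes)
    # Legacy facts (scope None) surface under any scope.
    return [f for f in facts if f.scope is None or f.scope in allowed]


facts = [
    Fact("User is setting up MultiDECO", scope="solidcam-2025"),
    Fact("User prefers metric units", scope=None),
]
same_scope = visible_facts(facts, "solidcam-2025")
other_scope = visible_facts(facts, "hyperturn")
```

Querying in solidcam-2025 returns both facts, while querying in hyperturn returns only the untagged legacy fact.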
Limits and retention
- Up to 500 facts per user are stored
- When the limit is reached, the least important facts are automatically removed
- Facts that are re-mentioned across sessions gain importance
- Well-connected facts (linked to many other facts) rank higher than isolated ones
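A minimal sketch of limit-based eviction, assuming importance grows with how often a fact is re-mentioned and how many links it has. The importance formula, the 0.5 link weight, and the dictionary fields are hypothetical; only the 500-fact limit comes from the text.

```python
import heapq


def evict_to_limit(facts, limit=500):
    """Keep the `limit` most important facts, most important first.

    facts: list of dicts with 'text', 'mentions', and 'links' keys.
    The importance formula is an illustrative assumption.
    """
    def importance(fact):
        return fact["mentions"] + 0.5 * fact["links"]

    return heapq.nlargest(limit, facts, key=importance)


facts = [
    {"text": "Works with CNC-5000 Pro", "mentions": 3, "links": 2},
    {"text": "Asked once about PDF export", "mentions": 1, "links": 0},
    {"text": "Prefers climb milling", "mentions": 1, "links": 4},
]
kept = evict_to_limit(facts, limit=2)
```

With a limit of two, the isolated one-off fact is the one removed, while the well-connected "Prefers climb milling" fact survives despite being mentioned only once.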
Requirements
- Pro plan or higher — long-term memory is a premium feature
- User authentication required — anonymous users don't have a persistent identity to attach memories to
- Changes take effect within a few minutes (no redeploy needed)
What the Chatbot Sees
With every message, the chatbot automatically receives the most relevant context from memory:
1. Known facts about this user ← long-term memory (if enabled)
- Works with CNC-5000 Pro
- Prefers climb milling for aluminum
2. Earlier relevant context ← recalled from this session
User: "What feed rate for 6mm carbide?"
Bot: "Based on the docs, recommended feed rates are..."
3. Recent conversation ← last 2 exchanges (always included)
User: "Should I use flood or mist coolant?"
Bot: "For aluminum milling..."
User: "What pressure setting?" ← current question
4. Documentation context ← RAG search results from your knowledge base
The token budget for memory is shared (approximately 1,500 tokens total). If many long-term facts are relevant, fewer past turns are included, and vice versa.
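One plausible way to share a single budget between facts and past turns is a greedy fill over a combined, relevance-ranked candidate list. This is a sketch under that assumption: the `fill_memory_budget` function, the tuple layout, and the sample scores and token counts are hypothetical.

```python
def fill_memory_budget(candidates, budget=1_500):
    """Greedily pick the highest-scoring items that fit in the shared budget.

    candidates: list of (kind, text, score, tokens) tuples, where kind is
    'fact' or 'turn'. Returns (chosen, tokens_used).
    """
    chosen, used = [], 0
    for kind, text, score, tokens in sorted(candidates, key=lambda c: c[2], reverse=True):
        if used + tokens <= budget:
            chosen.append((kind, text))
            used += tokens
    return chosen, used


candidates = [
    ("fact", "Works with CNC-5000 Pro", 0.9, 20),
    ("turn", "What feed rate for 6mm carbide?", 0.8, 900),
    ("turn", "Earlier question about file export", 0.3, 700),
    ("fact", "Prefers climb milling for aluminum", 0.7, 20),
]
chosen, used = fill_memory_budget(candidates)
```

In this example both facts and the feed-rate turn fit (940 tokens), and the low-scoring export turn is skipped because it would push the total past 1,500.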
How Memory Appears to End Users
During chat
Memory is invisible to users during normal chatting. The chatbot simply gives better, more contextual answers. Memory provides context, not authority — if a remembered fact conflicts with the documentation, the documentation wins.
Memory Manager
When long-term memory is enabled, users see a "My Memory" option in the chatbot's sidebar:
- View all facts the chatbot has learned about them
- Delete individual facts or all memories at once
- Opt out entirely via Privacy settings
Privacy controls
- Per-user opt-out: Each user can disable memory for themselves
- Pseudonymization: User identifiers are cryptographically hashed
- Data isolation: Each user's memory is completely isolated
- Database constraints: Mandatory user scoping enforced at the database level
- GDPR compliance: Users can view, delete, and opt out at any time
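To illustrate what pseudonymization by cryptographic hashing can look like, the sketch below derives a stable alias from a user identifier with HMAC-SHA256 from Python's standard library. The use of a keyed hash, the function name, and the sample key are assumptions; the text above only states that identifiers are cryptographically hashed.

```python
import hashlib
import hmac


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Return a stable hex alias for user_id using HMAC-SHA256."""
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"example-secret-do-not-use"  # hypothetical key for illustration only
alias = pseudonymize("alice@example.com", key)
```

The same identifier and key always produce the same 64-character alias, so memories can be looked up without storing the raw identifier.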
Configuring Memory
Enable/disable long-term memory
Toggle long-term memory on the chatbot's detail page under "Conversation Memory". Disabling it permanently deletes all stored facts, so you'll be asked to confirm.
Switch short-term strategy
Choose between Graph-based and Conventional in the same card. Switching from graph to conventional permanently deletes stored conversation nodes; you'll see the count and be asked to confirm.
Settings propagation
Both settings propagate within a few minutes via configuration hot-reload. No redeploy needed.
Cost
| Feature | Cost |
|---|---|
| Graph-based short-term memory | Included — embeddings recycled from document search |
| Conventional short-term memory | Included until condensation triggers |
| Conventional condensation (first) | About $0.001 per condensation. Pays for itself via 80x token savings on main model. |
| Conventional condensation (subsequent) | About $0.0001 per condensation |
| Long-term fact extraction | About $0.0003 per completed session |
| Memory retrieval | Included — graph queries run locally |
All AI costs are tracked in your usage dashboard via Langfuse.
FAQ
Does memory slow down my chatbot?
No. Memory retrieval adds 10–50ms to each request. Short-term writes and conventional summarization happen after the response is sent. None of these are perceptible to the user.
Can I use long-term memory with the conventional strategy?
Yes. Long-term memory and short-term strategy are independent. You can combine them in any way: graph + long-term, conventional + long-term, graph only, or conventional only.
Do anonymous users get memory?
Anonymous users get short-term memory within their session. They do NOT get long-term memory — their identity is ephemeral.
Is memory shared between chatbots?
No. Each chatbot has its own isolated memory.
Is memory shared between scopes within the same chatbot?
No — long-term facts are scope-aware. A fact learned in one knowledge scope only surfaces when the user is querying that same scope (or a chained scope). See Scope-aware recall above. Legacy facts from before scope-tagging shipped still surface globally for back-compat.
What's the difference between graph ranking and simple keyword search?
Keyword search finds memories containing the exact words the user typed. Graph ranking goes further — it finds memories that are connected to the matching ones through the knowledge graph. If a user asks about "coolant," the system also surfaces "climb milling" and "6mm carbide end mill" because those facts were discussed in the same conversation cluster.
How does the chatbot decide what to remember vs forget?
The system extracts facts about the USER (their work, preferences, context) — not documentation content. Facts mentioned repeatedly gain importance. Facts never referenced again gradually lose importance and are evicted when the 500-per-user limit is reached. Well-connected facts rank higher than isolated mentions.