Conversation Memory

Conversation Memory gives your chatbot the ability to remember what users talked about — both within a single chat session and across multiple sessions over time.

Two Memory Strategies

You choose one of two strategies for how your chatbot manages conversation context. Each works differently under the hood, but both are invisible to your users — the chatbot just gives better, more personalized answers.

	Graph-Based (Recommended)	Conventional
How it works	Stores conversations in a knowledge graph. Uses AI-powered ranking to find the most relevant context.	Keeps full conversation history. Summarizes older turns when the context window fills.
Best for	Long, detailed conversations. Technical support. Users who reference earlier topics.	Short, linear conversations. Simple Q&A. Cost-sensitive deployments.
Within-session recall	Semantic search + graph connectivity — finds relevant older turns even across topic shifts	Full history until summarization triggers, then summary + recent turns
Token efficiency	High — only the most relevant turns are injected (about 1,500 tokens vs 15,000+ for full history)	Moderate — full history until threshold, then summary reduces to about 1,500 tokens
Requires FalkorDB	Yes	No
Default	Yes	—

Both strategies can be combined with long-term memory (cross-session fact recall) as an independent toggle.

Choosing a strategy

If you're unsure, use Graph-Based — it's the default and works well for all conversation lengths. Switch to Conventional only if you specifically want the simpler summarization approach.

Graph-Based Memory (Knowledge Graph)

The graph-based strategy stores every question-and-answer exchange as connected nodes in a knowledge graph. This creates a rich structure that the system uses to find the most relevant context for each new message.

How it works

Every turn is stored with its meaning (semantic embedding) and linked to the previous turn with a continuity score
When a new question arrives, the system searches for relevant older turns using three signals:
- Semantic similarity — "Does this earlier turn mean something similar?"
- Keyword matching — "Does it contain the same terms?"
- Graph connectivity — "Is it connected to other relevant turns?" (Personalized PageRank)
The most relevant turns are injected into the chatbot's context alongside the last 2 exchanges
The chatbot sees a focused view of the conversation — not the full history, just what matters for the current question

Intelligent ranking with Personalized PageRank

The graph signal is what makes this strategy special. The system doesn't just search by keywords or meaning — it walks the knowledge graph to discover connected context.

Example: A user has a 15-turn conversation:

Turns 1–4: Detailed milling setup discussion
Turn 5: Quick question about file export (topic shift)
Turn 6: "Going back to the milling — what coolant should I use?"

A simple search for "coolant" might miss turns 1–4 (they don't mention coolant). But the graph knows those turns are tightly connected — they form a conversation cluster about milling. Personalized PageRank starts from the coolant question, walks the graph, and discovers the entire milling cluster is relevant.

Conversation flow tracking

Each question is linked to the next with a continuity score that measures topic relatedness:

"What tool for aluminum?"  ──0.73──▸  "What feed rate?"  ──0.71──▸  "Climb or conventional?"
                                (same topic, high continuity)

"Climb or conventional?"  ──0.26──▸  "How do I export to PDF?"
                              (topic shift, low continuity)

Questions in a tight topical cluster are more meaningful together than scattered one-off questions. The ranking system uses this: context from focused discussion clusters is prioritized over isolated mentions.

Token efficiency

Instead of passing the entire conversation to the AI model (which wastes tokens and money), the graph strategy selectively picks the most relevant turns:

About 1,500 tokens of memory context per request
Compare to 15,000–30,000 tokens for full history in a 20-turn session
90–95% token reduction while preserving the most important context

Conventional Memory (Summarize and Truncate)

The conventional strategy keeps the full conversation history in the prompt. When the history grows large, a background summarizer automatically condenses the older turns into a concise summary.

How it works

Full history is passed to the AI model on every message — no selective retrieval
When history reaches approximately 100,000 tokens (about 250 turns), a background summarizer kicks in
The summarizer (Amazon Nova Micro — fast and cost-efficient) condenses older turns into a bullet-point summary
The summary replaces the original older messages on the next turn
Recent turns (last 3 exchanges) are always kept verbatim
The user never notices — summarization happens after the response is sent

Why this saves you money

Cost savings on long conversations

Without condensation, a 300-turn conversation sends approximately 120,000 tokens of history to your main AI model on every single message. After condensation, the model receives a 500-token summary plus the last few turns — roughly 80x fewer tokens per request. The one-time summarization cost (about $0.001) pays for itself immediately.

Each subsequent summarization only processes the newly-aged-out turns (not the whole history again), so ongoing costs are minimal (about $0.0001 per condensation).

Consistent threshold

The condensation threshold is the same regardless of which AI model your chatbot uses. Whether your main model has a 128K or 1M token context window, condensation triggers at the same point. This prevents large-context models from accumulating unlimited history (which would be extremely expensive per request even though the model CAN handle it).

Long-Term Memory (Cross-Session Facts)

Long-term memory is an independent toggle that works with either strategy. When enabled, your chatbot learns and remembers facts about each user across sessions.

How it works

After a conversation ends, the system automatically extracts key facts — things the user said about themselves, their work, their preferences — and stores them for future sessions.

Example:

Session 1 (Monday):

User: "I'm setting up a tapping operation on our CNC-5000 Pro"

Session 2 (Wednesday):

User: "What feed rate should I use?" Chatbot: "Based on your previous setup on the CNC-5000 Pro, here are the recommended feed rates for tapping..."

The chatbot remembered the machine from Monday's session and used that context to give a better answer on Wednesday — without the user needing to say "remember" or "last time." Facts are always injected automatically.

What gets remembered

The system extracts user-centric facts — things about the person, not documentation content:

Remembered	Not remembered
"User works with a CNC-5000 Pro"	Product command definitions
"User prefers climb milling for aluminum"	Step-by-step procedures from docs
"User's typical tool diameter is 6mm"	Parameter tables

How facts connect (Graph strategy only)

When using the graph-based strategy, long-term facts are connected in the same knowledge graph. Facts extracted from the same conversation get linked. Semantically similar facts from different sessions also get linked. This creates a web of connections:

"Works with CNC-5000 Pro" ───── "Prefers climb milling"
         │                                │
         │                                │
"Tool diameter is 6mm" ──────── "Programs M12x1 tapping"

When a user asks about milling, all connected facts get boosted — not just the one with the best keyword match. Questions from the current session also link to these facts, so frequently-referenced facts rank higher than one-off mentions.

Scope-aware recall

If your chatbot uses Knowledge Scopes, long-term facts are tagged with the scope they were learned in and only surface when the user is querying that same scope (or a scope chained to it).

Example:

Scenario	Behavior
User discusses MultiDECO setup in the `solidcam-2025` scope. Comes back later, still on `solidcam-2025`.	Facts surface — bot personalizes the answer.
Same user switches to the `hyperturn` scope and asks a generic question.	MultiDECO facts do NOT surface — bot doesn't offer personalization it can't back with the right documentation.
User has no scopes configured (single namespace chatbot).	All facts surface for every question, unchanged behavior.

This prevents the chatbot from offering tailored advice it can't deliver — e.g. "Want me to apply this to your MultiDECO setup?" when MultiDECO documentation isn't in the currently-active scope.

Back-compat: Facts learned before scope-tagging shipped have no scope label and continue to surface under any scope (so existing users don't suddenly lose context).

Limits and retention

Up to 500 facts per user are stored
When the limit is reached, the least important facts are automatically removed
Facts that are re-mentioned across sessions gain importance
Well-connected facts (linked to many other facts) rank higher than isolated ones

Requirements

info

Pro plan or higher — long-term memory is a premium feature
User authentication required — anonymous users don't have a persistent identity to attach memories to
Changes take effect within a few minutes (no redeploy needed)

What the Chatbot Sees

Every message, the chatbot automatically receives the most relevant context from memory:

1. Known facts about this user          ← long-term memory (if enabled)
   - Works with CNC-5000 Pro
   - Prefers climb milling for aluminum

2. Earlier relevant context              ← recalled from this session
   User: "What feed rate for 6mm carbide?"
   Bot: "Based on the docs, recommended feed rates are..."

3. Recent conversation                   ← last 2 exchanges (always included)
   User: "Should I use flood or mist coolant?"
   Bot: "For aluminum milling..."
   User: "What pressure setting?" ← current question

4. Documentation context                 ← RAG search results from your knowledge base

The token budget for memory is shared (approximately 1,500 tokens total). If many long-term facts are relevant, fewer past turns are included, and vice versa.

How Memory Appears to End Users

During chat

Memory is invisible to users during normal chatting. The chatbot simply gives better, more contextual answers. Memory provides context, not authority — if a remembered fact conflicts with the documentation, the documentation wins.

Memory Manager

When long-term memory is enabled, users see a "My Memory" option in the chatbot's sidebar:

View all facts the chatbot has learned about them
Delete individual facts or all memories at once
Opt out entirely via Privacy settings

Privacy controls

Per-user opt-out: Each user can disable memory for themselves
Pseudonymization: User identifiers are cryptographically hashed
Data isolation: Each user's memory is completely isolated
Database constraints: Mandatory user scoping enforced at the database level
GDPR compliance: Users can view, delete, and opt out at any time

Configuring Memory

Enable/disable long-term memory

Toggle in the chatbot's detail page under "Conversation Memory". Disabling will permanently delete all stored facts — you'll be asked to confirm.

Switch short-term strategy

Choose between Graph-based and Conventional in the same card. Switching from graph to conventional will permanently delete stored conversation nodes — you'll see the count and asked to confirm.

Settings propagation

Both settings propagate within a few minutes via configuration hot-reload. No redeploy needed.

Cost

Feature	Cost
Graph-based short-term memory	Included — embeddings recycled from document search
Conventional short-term memory	Included until condensation triggers
Conventional condensation (first)	About $0.001 per condensation. Pays for itself via 80x token savings on main model.
Conventional condensation (subsequent)	About $0.0001 per condensation
Long-term fact extraction	About $0.0003 per completed session
Memory retrieval	Included — graph queries run locally

All AI costs are tracked in your usage dashboard via Langfuse.

FAQ

Does memory slow down my chatbot?

No. Memory retrieval adds 10–50ms to each request. Short-term writes and conventional summarization happen after the response is sent. None of these are perceptible to the user.

Can I use long-term memory with the conventional strategy?

Yes. Long-term memory and short-term strategy are independent. You can combine them in any way: graph + long-term, conventional + long-term, graph only, or conventional only.

Do anonymous users get memory?

Anonymous users get short-term memory within their session. They do NOT get long-term memory — their identity is ephemeral.

Is memory shared between chatbots?

No. Each chatbot has its own isolated memory.

Is memory shared between scopes within the same chatbot?

No — long-term facts are scope-aware. A fact learned in one knowledge scope only surfaces when the user is querying that same scope (or a chained scope). See Scope-aware recall above. Legacy facts from before scope-tagging shipped still surface globally for back-compat.

What's the difference between graph ranking and simple keyword search?

Keyword search finds memories containing the exact words the user typed. Graph ranking goes further — it finds memories that are connected to the matching ones through the knowledge graph. If a user asks about "coolant," the system also surfaces "climb milling" and "6mm carbide end mill" because those facts were discussed in the same conversation cluster.

How does the chatbot decide what to remember vs forget?

The system extracts facts about the USER (their work, preferences, context) — not documentation content. Facts mentioned repeatedly gain importance. Facts never referenced again gradually lose importance and are evicted when the 500-per-user limit is reached. Well-connected facts rank higher than isolated mentions.

Two Memory Strategies​

Graph-Based Memory (Knowledge Graph)​

How it works​

Intelligent ranking with Personalized PageRank​

Conversation flow tracking​

Token efficiency​

Conventional Memory (Summarize and Truncate)​

How it works​

Why this saves you money​

Consistent threshold​

Long-Term Memory (Cross-Session Facts)​

How it works​

What gets remembered​

How facts connect (Graph strategy only)​

Scope-aware recall​

Limits and retention​

Requirements​

What the Chatbot Sees​

How Memory Appears to End Users​

During chat​

Memory Manager​

Privacy controls​

Configuring Memory​

Enable/disable long-term memory​

Switch short-term strategy​

Settings propagation​

Cost​

FAQ​

Two Memory Strategies

Graph-Based Memory (Knowledge Graph)

How it works

Intelligent ranking with Personalized PageRank

Conversation flow tracking

Token efficiency

Conventional Memory (Summarize and Truncate)

How it works

Why this saves you money

Consistent threshold

Long-Term Memory (Cross-Session Facts)

How it works

What gets remembered

How facts connect (Graph strategy only)

Scope-aware recall

Limits and retention

Requirements

What the Chatbot Sees

How Memory Appears to End Users

During chat

Memory Manager

Privacy controls

Configuring Memory

Enable/disable long-term memory

Switch short-term strategy

Settings propagation

Cost

FAQ