Mem0

Treating Agent Memory as a Separate Tier

Jun 23, 2026

Drop-in long-term memory layer that extracts, stores, and retrieves user facts across a hybrid vector-graph-key-value store. The 2.0 rewrite landed quietly while the discourse was on MCP and tool-calling protocols. The cleanest example of treating memory as a service distinct from retrieval.

Yesterday’s piece made the argument that your agent’s memory should not be your RAG index. The write patterns are different, the consistency needs are different, and the failure mode of retrieving another user’s fact is harder to detect than a hallucination. Today we look at the tool that embodies that separation most clearly. Mem0 ships a pipeline purpose-built for user memory: extract facts from conversation, store them in a hybrid store that mixes vector embeddings with keyword indexes and entity graphs, and retrieve them with temporal awareness so the agent knows which facts are current. It does not try to also be your document retrieval system. That constraint is what makes it good.

Mem0 is a Y Combinator S24 company, Apache 2.0 licensed, and recently crossed 59,000 GitHub stars with 6,800 forks. The star count alone tells you something about the demand signal. Every agent team discovers at some point that the context window is not enough and that stuffing memories into their existing vector store is worse than having no memory at all. Mem0 exists because that discovery keeps happening, and the team built a dedicated store so teams do not have to build their own.

The architecture is the right starting point. Mem0 stores memories at three levels: user, session, and agent. User memory persists across sessions and captures preferences, long-term facts, and behavioral patterns. Session memory covers a single interaction and gets summarized when the session ends. Agent memory is the system-level knowledge the agent accumulates about its own operation. Most memory layers, and most agent frameworks that claim memory support, handle only one of these levels. Mem0 handles all three with a consistent API, and the distinction matters because a user preference like “prefers async updates” has a different retention profile than a session fact like “the user mentioned their deadline is Friday.”
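To make the three-level split concrete, here is a minimal sketch of a scoped store. It is not Mem0's API: ScopedMemoryStore, Scope, and end_session are hypothetical names, and the session "summary" is a plain string join standing in for whatever summarization the real system performs.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple


class Scope(Enum):
    USER = "user"        # persists across sessions
    SESSION = "session"  # lives for a single interaction
    AGENT = "agent"      # what the agent knows about its own operation


@dataclass
class ScopedMemoryStore:
    """Hypothetical in-memory store keyed by (scope, owner id)."""
    _facts: Dict[Tuple[Scope, str], List[str]] = field(default_factory=dict)

    def add(self, scope: Scope, owner_id: str, fact: str) -> None:
        self._facts.setdefault((scope, owner_id), []).append(fact)

    def get(self, scope: Scope, owner_id: str) -> List[str]:
        return list(self._facts.get((scope, owner_id), []))

    def end_session(self, session_id: str) -> str:
        """Drop a session's raw facts and return a one-line summary.

        The join below is a placeholder for an LLM-written summary.
        """
        facts = self._facts.pop((Scope.SESSION, session_id), [])
        return "; ".join(facts)


store = ScopedMemoryStore()
store.add(Scope.USER, "alice", "prefers async updates")
store.add(Scope.SESSION, "sess-1", "deadline is Friday")
summary = store.end_session("sess-1")
```

The user fact outlives the session while the session fact is collapsed at the end of the interaction, which is the retention difference the paragraph above describes.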

The v2.0 rewrite that landed in spring 2026 was a ground-up rearchitecture. The team released a new memory algorithm in April that switched from a complex update-and-delete model to a single-pass ADD-only extraction. The old algorithm tried to track which memories to overwrite and which to keep, producing a correctness surface that was hard to audit. The new algorithm is simpler: it extracts facts on every pass, appends them to the store, and leaves recency weighting to the retrieval layer. The change was driven by the team’s own benchmarks. The old algorithm scored 71.4 on LoCoMo and 67.8 on LongMemEval. The new algorithm scores 91.6 and 94.8 respectively. The benchmark improvements come with a better token profile: both benchmarks complete in roughly 7,000 tokens with sub-second p50 latency. Those are not synthetic numbers: the runs used the same model stack the library uses in production.
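The ADD-only idea is easy to sketch. The code below is an illustration under stated assumptions, not Mem0's implementation: AppendOnlyMemory and MemoryRecord are hypothetical names, and naive_extract splits on periods as a stand-in for LLM-based fact extraction.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class MemoryRecord:
    user_id: str
    text: str
    created_at: float  # timestamp assigned at extraction time


class AppendOnlyMemory:
    """Hypothetical ADD-only store: records are appended, never updated or deleted."""

    def __init__(self, extract: Callable[[str], List[str]]):
        self._extract = extract
        self._log: List[MemoryRecord] = []

    def ingest(self, user_id: str, message: str, now: Optional[float] = None) -> int:
        """Extract facts in a single pass and append them; return how many were added."""
        ts = time.time() if now is None else now
        facts = self._extract(message)
        self._log.extend(MemoryRecord(user_id, fact, ts) for fact in facts)
        return len(facts)

    def records(self, user_id: str) -> List[MemoryRecord]:
        return [r for r in self._log if r.user_id == user_id]


def naive_extract(message: str) -> List[str]:
    # Stand-in for LLM extraction: treat each sentence as one fact.
    return [s.strip() for s in message.split(".") if s.strip()]


mem = AppendOnlyMemory(naive_extract)
mem.ingest("alice", "We have five engineers.", now=1.0)
mem.ingest("alice", "We are up to seven engineers now.", now=2.0)
```

Both statements stay in the log. Nothing at write time decides which one wins; that decision is deferred to ranking at read time, which is what keeps the write path easy to audit.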

The retrieval layer is where the hybrid design earns its keep. Mem0 runs three retrieval signals in parallel and fuses them: a semantic vector search against stored memory embeddings, a BM25 keyword match for exact phrase and term recall, and an entity-matching pass that links extracted entities across memories. The fusion layer scores and ranks results from all three signals so the top result can come from any path. This matters for the use case that gives vector-only stores the most trouble: the agent needs to find a user fact that contains a specific name or number, and the embedding of that query does not land near the memory’s embedding in vector space. The BM25 signal catches it. The entity link catches it. The agent gets the right fact.
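One way to picture the fusion step is reciprocal rank fusion (RRF). This is a swapped-in, well-known technique chosen for illustration; the post does not say which scoring method Mem0's fusion layer uses. The function name rrf_fuse, the constant k=60, and the memory ids are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of memory ids with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) for every id it contains, so an id
    that a signal never returned simply gets nothing from that signal.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda m: (-scores[m], m))


# Hypothetical per-signal results for a query containing a specific name.
semantic = ["m1", "m2"]   # the vector search misses m3 entirely
keyword = ["m3", "m1"]    # the BM25-style match finds the exact term in m3
entity = ["m3"]           # entity matching links the name to m3

fused = rrf_fuse([semantic, keyword, entity])
```

Here m3 ends up on top even though the semantic signal never returned it, which is the failure case for vector-only stores described above.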

The temporal reasoning is the feature that makes the architecture correct for the use case. Mem0 assigns timestamps to every memory extraction and uses time-aware ranking during retrieval. When the agent asks “what is the user’s current project timeline?” the retrieval layer weights the most recent memory about project timelines higher than older ones. When the user says “we rebranded to NexGen last month,” the old company name is not deleted from the store. It is still there for historical queries. The retrieval layer deprioritizes it for questions about the current state. This is how memory should work. Not by overwriting the past, but by ranking it correctly for the present query.
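A rough sketch of time-aware ranking, assuming an exponential half-life decay applied to a relevance score. The decay formula, the 90-day half-life, the Candidate type, and the placeholder company name OldCo are illustrative assumptions, not details from Mem0.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    text: str
    relevance: float    # retrieval score before any time weighting, in [0, 1]
    created_day: float  # extraction timestamp, expressed in days


def time_aware_rank(
    candidates: List[Candidate],
    today: float,
    half_life_days: Optional[float] = 90.0,
) -> List[Candidate]:
    """Rank candidates by relevance, halving the weight every half_life_days.

    Passing half_life_days=None disables the decay, which models a
    historical query where older facts should compete on relevance alone.
    """
    def score(c: Candidate) -> float:
        if half_life_days is None:
            return c.relevance
        age = max(0.0, today - c.created_day)
        return c.relevance * 0.5 ** (age / half_life_days)

    return sorted(candidates, key=score, reverse=True)


candidates = [
    Candidate("company is called OldCo", relevance=0.9, created_day=0.0),
    Candidate("rebranded to NexGen", relevance=0.8, created_day=370.0),
]

current = time_aware_rank(candidates, today=400.0)
historical = time_aware_rank(candidates, today=400.0, half_life_days=None)
```

With decay on, the rebrand outranks the older name despite a lower raw relevance. With decay off, the older fact is still there and still retrievable. Nothing was deleted in either case.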

The multi-signal retrieval with entity linking handles the case that breaks most memory implementations. A user says “I work with five engineers across three time zones” in one session and “we are up to seven engineers now” in a later session. Mem0 extracts entity links for both statements, links them to the same user entity, and at retrieval time the temporal ranker surfaces the newer fact first. The older fact is still retrievable for context. The agent does not have to decide which embedding to trust. The store makes the decision based on time and entity identity.
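The engineer-count example can be sketched as a per-entity timeline. This is a simplified illustration: EntityTimeline and the "team_size" entity key are hypothetical, and real entity linking would come from an extraction step rather than a hand-supplied key.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class EntityTimeline:
    """Hypothetical index that groups facts by (user, entity) and orders them by time."""

    def __init__(self) -> None:
        self._by_entity: Dict[Tuple[str, str], List[Tuple[float, str]]] = defaultdict(list)

    def link(self, user_id: str, entity: str, fact: str, ts: float) -> None:
        # Append only: an update is a newer fact linked to the same entity.
        self._by_entity[(user_id, entity)].append((ts, fact))

    def current(self, user_id: str, entity: str) -> Optional[str]:
        """Return the most recent fact for the entity, or None if none exist."""
        facts = self._by_entity.get((user_id, entity))
        if not facts:
            return None
        return max(facts, key=lambda item: item[0])[1]

    def history(self, user_id: str, entity: str) -> List[str]:
        """Return every fact for the entity, newest first."""
        facts = self._by_entity.get((user_id, entity), [])
        return [fact for _, fact in sorted(facts, key=lambda item: item[0], reverse=True)]


timeline = EntityTimeline()
timeline.link("alice", "team_size", "five engineers across three time zones", ts=1.0)
timeline.link("alice", "team_size", "up to seven engineers now", ts=2.0)

latest = timeline.current("alice", "team_size")
full_history = timeline.history("alice", "team_size")
```

The agent asks for the current value and gets the newer statement, while the older one remains available as context.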

The deployment options matter for different team profiles. Library mode is a pip install mem0ai and a few lines of Python away from a working memory layer, with an OpenAI model as the default. The library stores data locally by default with a configurable backend, so prototyping takes minutes. The self-hosted mode runs as a Docker Compose stack with a dashboard, auth, and API keys. The cloud platform is managed with zero infrastructure overhead. Each tier adds operational capabilities without changing the core API. The team also ships a full CLI for terminal-based memory management and agent signup, plus integrations for LangGraph, CrewAI, and the Vercel AI SDK, so Mem0 slots into existing stacks as a drop-in replacement for the default context-passing patterns.

The npm side of the ecosystem is worth calling out separately because it shows the engineering quality. Mem0 ships a TypeScript SDK (mem0ai on npm) alongside the Python one with the same API surface and a CLI (@mem0/cli) that lets agents register themselves instantly. An agent running in TypeScript or from a terminal can call mem0 init --agent --agent-caller claude-code and have a working memory store in four commands with no human in the loop. The human claims the account later. The memories survive the handoff. That agent-first signup flow is a small detail that reveals a big priority: the team thought about who their primary user is.

The version as of this writing is v2.0.7, released June 17, 2026. The release adds Gemini via Vertex AI as an LLM provider and batched embedding support for Ollama, plus fixes for async error handling, the entity store reset, and several Anthropic and Azure integration edge cases. The project has shipped 7 patch releases since the v2.0.0 rewrite, which suggests active maintenance rather than a team that shipped and moved on. The ts-v3 stream is on version 3.0.9 with its own parallel release cadence, so the TypeScript side is equally active.

The contrast with the mainstream alternatives is instructive. LangChain’s built-in memory abstractions and CrewAI’s default memory store operate at the session level and dump to a local SQLite or a simple key-value store. They cannot distinguish between user-level and session-level facts. They do not extract entities. They do not rank by recency. They do not fuse multiple retrieval signals. They work for demos. Mem0 works for production because it makes a set of architectural decisions that the mainstream memory systems avoid: store the facts separately from the conversation history, retrieve them with multiple signals, and rank by time. Each decision adds complexity. Each decision also adds correctness.

The team published a paper alongside the v3 algorithm rollout that benchmarks the approach rigorously. The LoCoMo score went from 71.4 to 91.6. The LongMemEval score went from 67.8 to 94.8 with a 53.6-point improvement on the assistant-memory-recall subset. The BEAM benchmark, which tests at production-scale token volumes, came in at 64.1 for 1M tokens and 48.6 for 10M tokens. The benchmarks are open-sourced for anyone to reproduce. That level of transparency is unusual in the agent memory space, which is full of projects that claim memory support without defining what they mean by the term.

For teams building the memory-aware agent stack, Mem0 is the memory tier. It sits between the agent and whatever document retrieval system you already have. It does not replace your vector store. It replaces the duct tape you used to stuff user facts into that store. The API is clean, the retrieval is multi-signal, the temporal ranker is correct, and the LLM and embedding provider support covers the major options. The 2.0 rewrite was a bet that the original architecture was not good enough for production. The evidence suggests the bet paid off.

If this was useful, forward it to one engineer who needs less noise in their feed.
