Letta: The Only Agent Framework That Treats Memory as Infrastructure
Every other framework treats memory as a feature. Letta treats it as the runtime.
The thing that breaks most production agents is not the model and not the tools. It is that the agent forgets. The session ends, the process restarts, the conversation rolls past whatever window you bought, and the next request lands on a stranger who has read the same system prompt the original stranger read an hour ago. Every framework I have worked with has a story about memory. Almost none of them have an architecture for it. They have a vector store you bolted on, a summarization step that runs at the end of each turn, and a hope that the embedding lookup will find the right note when the user asks a follow-up. Letta is the only one I have seen that walked into the problem from the other side and said: the agent is a process with state, the state lives outside the model, and the framework’s job is to manage the paging.
Letta is the project formerly known as MemGPT. The name change in 2024 tracked a deeper pivot. MemGPT shipped as a research library with a clever idea: treat the context window like a CPU cache, and let the agent itself decide what to evict to a larger “external” memory and what to pull back in when relevant. The original paper was a good read. The original code was a good demo. Neither of them ran in production for anyone I knew. The rename to Letta came with the actual product, which is a server. You run it. It exposes a REST API. Agents are first-class objects with a database row, a memory hierarchy, and a lifecycle that survives restart, deployment, and the death of whatever process was holding the conversation when the laptop closed.
The current release is 0.16.8 from May 14. The release notes are short, and one line in them tells you everything about the project’s posture: a security fix that replaces pickle with JSON for the sandbox-to-server transport of tool results. That is not the kind of changelog entry you write if you are a research library. That is the kind of entry you write if real users are running you with untrusted tool code and you are taking responsibility for the blast radius. The Letta team is treating this as infrastructure, and the small things in the release notes are the tell.
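To make the pickle point concrete, here is a minimal sketch of why that swap matters. It is not Letta's transport code; the `Hostile` class and the `decode_tool_result` helper are names I made up for illustration, and the result shape with a `status` key is an assumption. The only claim is the general one: loading a pickle can call functions chosen by whoever produced the bytes, while parsing JSON only yields plain data.

```python
import json
import pickle


class Hostile:
    """An object whose pickle form asks the loader to call a function."""

    def __reduce__(self):
        # On load, pickle calls eval("6 * 7"). A real attacker would pick
        # something far worse than arithmetic.
        return (eval, ("6 * 7",))


def decode_tool_result(raw: str) -> dict:
    """Hypothetical JSON decoder: yields plain data, never runs payload code."""
    data = json.loads(raw)
    # Assumed result shape: a JSON object carrying at least a "status" key.
    if not isinstance(data, dict) or "status" not in data:
        raise ValueError("unexpected tool result shape")
    return data


# Unpickling executes the callable named inside the payload.
unpickled = pickle.loads(pickle.dumps(Hostile()))

# The same expression sent as JSON stays an inert string.
decoded = decode_tool_result(json.dumps({"status": "ok", "value": "6 * 7"}))
```

After this runs, `unpickled` is the number the loader computed on the sender's behalf, while `decoded["value"]` is still just text. When the sender is untrusted tool code in a sandbox, that difference is the blast radius.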
The architecture is the part worth dwelling on, because it is where the daylight is between Letta and every framework that markets memory as a checkbox feature. A Letta agent has a tiered memory: a core block of always-resident facts the agent can edit (the “this is who I am, this is who I am talking to” layer), a recall buffer of recent conversation, and an archival store that holds everything the agent has decided is worth keeping for later. The model does not see the archive directly. The framework exposes memory-management tools the agent can call: write to archive, search archive, edit core, summarize and evict from recall. The agent decides what to remember and what to fetch. The framework decides what fits in the prompt at any given moment.
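The split of responsibilities described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not Letta's API: the class name, the method names, the substring search, and the character-based budget are all hypothetical stand-ins. What it shows is the shape: three tiers, a handful of tools the agent may call, and one decision the framework keeps for itself.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TieredMemory:
    """Toy three-tier memory; all names are hypothetical, not Letta's API."""

    core: dict[str, str] = field(default_factory=dict)   # always in the prompt
    recall: deque[str] = field(default_factory=deque)    # recent conversation
    archive: list[str] = field(default_factory=list)     # long-term, out of prompt

    # --- tools the agent (the model) would be allowed to call ---
    def core_edit(self, label: str, value: str) -> None:
        self.core[label] = value

    def archive_write(self, note: str) -> None:
        self.archive.append(note)

    def archive_search(self, query: str, limit: int = 3) -> list[str]:
        # Naive substring match stands in for a real search index.
        hits = [n for n in self.archive if query.lower() in n.lower()]
        return hits[:limit]

    def evict_oldest(self, count: int) -> str:
        # Move the oldest recall messages into the archive, joined into one
        # note as a crude stand-in for a model-written summary.
        evicted = [self.recall.popleft() for _ in range(min(count, len(self.recall)))]
        summary = " | ".join(evicted)
        if summary:
            self.archive.append(summary)
        return summary

    # --- the decision the framework makes: what fits in the prompt ---
    def build_prompt(self, budget_chars: int) -> str:
        core_text = "\n".join(f"[{k}] {v}" for k, v in self.core.items())
        remaining = budget_chars - len(core_text)
        recent: list[str] = []
        for msg in reversed(self.recall):        # walk newest first
            if len(msg) + 1 > remaining:         # +1 for the joining newline
                break
            recent.append(msg)
            remaining -= len(msg) + 1
        return "\n".join([core_text, *reversed(recent)])
```

Note what `build_prompt` never touches: the archive. In this sketch, archived notes reach the model only when the agent calls `archive_search`, which is the property the paragraph above is describing.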
That inversion is the whole game. In a typical RAG-flavored agent, the application code decides what context to inject and the model is a passive consumer. In Letta, the agent is an active manager of its own context, and the application code mostly stays out of the way. The result is an agent that, given enough turns, builds a working representation of the user and the task that the next conversation can resume against without a session handoff. It is the closest thing in the open ecosystem to a stateful assistant rather than a stateless completion wrapped in a chat UI.
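Here is the inversion as two control flows, side by side. Again a sketch with made-up names: the `ARCHIVE` list, the action dictionaries with `search` and `reply` keys, and the scripted fake model are assumptions for illustration, not any framework's real interface.

```python
from typing import Callable

ARCHIVE = ["Ticket 41: refund issued in March.", "User prefers email over phone."]


def archive_search(query: str) -> list[str]:
    """Naive substring search standing in for a real index."""
    return [note for note in ARCHIVE if query.lower() in note.lower()]


def app_driven_turn(user_msg: str, model: Callable[[list[str]], dict]) -> str:
    # The application picks the context up front with a rule it hard-coded;
    # the model only consumes whatever it is handed.
    context = archive_search("refund")
    return model(context + [user_msg])["reply"]


def agent_driven_turn(user_msg: str, model: Callable[[list[str]], dict],
                      max_steps: int = 4) -> str:
    # The model may request a memory lookup before answering; the loop just
    # executes the request and feeds the result back into the transcript.
    transcript = [user_msg]
    for _ in range(max_steps):
        action = model(transcript)
        if "reply" in action:
            return action["reply"]
        hits = archive_search(action["search"])
        transcript.append(f"search({action['search']!r}) -> {hits}")
    return "(gave up: too many memory calls)"


def scripted_model(transcript: list[str]) -> dict:
    """Fake model: searches once, then answers from what it found."""
    if not any(line.startswith("search(") for line in transcript):
        return {"search": "email"}
    return {"reply": f"Noted from memory: {transcript[-1]}"}


answer = agent_driven_turn("How should I reach you?", scripted_model)
```

In the first function the retrieval rule lives in application code and is wrong for this question. In the second, the model chose the query itself, and the only thing the application owns is the `max_steps` cap on how many lookups it will tolerate.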
The honest limitations matter, and they are not the ones the marketing copy would highlight. Letta is a server, which means you run it, and “you run it” is a different operational shape from “you call an API.” There is a Postgres dependency, a process to monitor, a deployment story to own. The hosted option exists, but the project is most coherent when you treat it as a piece of infrastructure you operate, not a SaaS you subscribe to. The agent-as-active-memory-manager pattern also means latency depends on how often the model decides to call its memory tools, which in turn depends on how the prompts and the available tools are tuned. An undertuned agent will either thrash on memory operations or fail to use them at all, and getting the balance right takes iteration. The tiered-memory model is also opinionated in a way that does not map cleanly onto every product. If your use case is single-turn, stateless, retrieval-augmented question answering, Letta is overbuilt. The investment pays back when conversations are long, persistent, and personal.
Where it earns its place is in everything that has historically been awkward in agent design: the customer-support agent that needs to remember the last six tickets without you stuffing them into every prompt, the internal copilot that builds up institutional knowledge about how a team works over weeks rather than minutes, the personal assistant that knows your projects and your preferences without you re-explaining them every Monday. These are the cases where the cost of forgetting is high and the cost of remembering everything is higher, and Letta is the first framework I have used where the answer to “how does the agent remember” was a real architectural answer rather than a hand wave about embeddings.
The comparison that comes up most is with LangGraph’s checkpointer or with the various agent-server projects that have shipped over the last year. LangGraph’s checkpointer is excellent at what it does, which is letting a graph-based agent pause and resume a workflow. It is not memory in the Letta sense. It is execution state. Restoring a LangGraph checkpoint returns you to a point in a flow. Restoring a Letta agent gives you back the same continuous entity, with its accumulated archive, its edited core, and whatever it learned about you in the last six conversations. The other agent servers tend to treat memory as a key-value store with a search index on top, which is the storage model without the management policy. Letta ships the policy, and the policy is the part that takes the most engineering to get right.
The thing nobody will tell you in a launch post is that the policy is also where the failure modes hide. An agent that manages its own memory will sometimes archive the wrong thing, fail to retrieve when it should, or invent a recollection that did not happen. The Letta team has been honest about this in their docs and in their issue tracker, and the work on better recall heuristics is visible in the commit history. The framing matters: this is a problem they are working on, not a problem they are pretending does not exist. That posture is the difference between a project you ship against and a project you watch.
The strategic read on Letta is that it is one of the few open agent frameworks built around a thesis that is going to look obvious in two years. The thesis is that context engineering is the real product surface of agentic systems, that the context window is a managed resource rather than an unlimited input, and that the framework’s job is to make the paging decisions the model is not equipped to make alone. Anthropic is making the same argument in their writing about context. The major labs are all moving toward stateful agents in their hosted products. Letta is the open implementation that is furthest along that road today.
If you are building anything that needs an agent to be the same entity across sessions, Letta is the framework I would start with, knowing the operational cost up front and the policy-tuning work I am taking on. If you are building stateless tool-use, pick something lighter. The choice is about whether memory is a feature you bolt on or a runtime you build around, and the day you decide it is a runtime, Letta is the only framework in the open ecosystem that already agrees with you.