Cognee
GraphRAG Without the Microsoft Tax
Knowledge graph memory for agents, built from unstructured docs, self-hostable, and running on your Postgres instance.
The Microsoft GraphRAG launch was a genuine contribution to the field. It made the case that a vector index alone is not enough, that the relationships between your documents matter at least as much as their semantic proximity, and that a graph layer over unstructured data produces better retrieval than embeddings alone. But the deployment story made it hard to recommend to anyone who was not already running Azure: the cost, the Neo4j dependency, and pipeline orchestration that assumed either a cloud bill or a dedicated infrastructure team. A tool that requires a graph database, a vector database, a Redis cluster, and a relational database before it remembers a single document is a tool that only teams already running that stack ever adopt.
Cognee is the open-source answer that closes the gap. It builds a typed knowledge graph from unstructured documents, maintains a vector index alongside it, and lets agents query by semantic meaning and structural relationship in the same operation. It also runs the entire memory layer on a single Postgres instance if you want it to.
The project has been under active development for over a year. The 1.0 release was a milestone because it consolidated the architecture around a clear four-operation API: remember, recall, forget, and improve. You feed documents in any format into remember, and Cognee runs an ECL pipeline (extract, cognify, and load) that parses the content, infers entities and relationships with LLM-guided ontology generation, and writes the result into a combined graph-plus-vector store. Recall routes your query through the knowledge graph and the vector index, picking the best search strategy automatically. Forget removes data cleanly from every layer. Improve takes feedback from the agent or the user and refines the stored representation. Four verbs, one mental model.
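To make the four-verb mental model concrete, here is a toy in-memory sketch. It is not the Cognee API: the class name ToyMemory, the method signatures, the bag-of-words "embedding", and the scoring weights are all assumptions made up for illustration, and the caller supplies entities by hand where a real pipeline would infer them with an LLM.

```python
import math
import re
from collections import Counter, defaultdict


def _bag(text):
    """Lowercased bag-of-words, a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyMemory:
    """Hypothetical four-verb memory for illustration; NOT the Cognee API."""

    def __init__(self):
        self.docs = {}                   # doc_id -> text
        self.vectors = {}                # doc_id -> bag-of-words "embedding"
        self.edges = defaultdict(set)    # entity -> doc_ids that mention it
        self.boost = defaultdict(float)  # doc_id -> accumulated feedback weight

    def remember(self, doc_id, text, entities):
        # Assumption: entities are passed in instead of being LLM-inferred.
        self.docs[doc_id] = text
        self.vectors[doc_id] = _bag(text)
        for entity in entities:
            self.edges[entity.lower()].add(doc_id)

    def recall(self, query, k=3):
        q = _bag(query)
        # Similarity score plus any feedback collected through improve().
        scores = {d: _cosine(q, v) + self.boost[d] for d, v in self.vectors.items()}
        # Structural step: docs linked to an entity named in the query get a bonus.
        for token in q:
            for d in self.edges.get(token, ()):
                scores[d] += 0.5
        ranked = sorted(scores, key=lambda d: (-scores[d], d))
        return [d for d in ranked if scores[d] > 0][:k]

    def forget(self, doc_id):
        # Remove the document from every layer, not just the text store.
        self.docs.pop(doc_id, None)
        self.vectors.pop(doc_id, None)
        self.boost.pop(doc_id, None)
        for linked in self.edges.values():
            linked.discard(doc_id)

    def improve(self, doc_id, useful):
        # Feedback nudges future rankings up or down.
        if doc_id in self.docs:
            self.boost[doc_id] += 0.1 if useful else -0.1
```

The point of the sketch is the shape, not the scoring: one write path that feeds both the vector side and the graph side, one read path that consults both, and a delete that touches every layer.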
The recent 1.1 and 1.2 releases have been compounding improvements on that foundation. v1.1 brought the Postgres-native architecture that reduces the graph-and-vector-plus-session stack from four services to one. You can run the entire memory layer on a single Postgres instance with pgvector. The graph still exists. The relationships between entities are still stored and traversed. They just live inside the same database where your embeddings and your session cache and your metadata live, so retrieval moves between similarity and structure without crossing service boundaries. In the Cognee team’s benchmarks, that single-database setup runs about ten percent faster than the separate graph-plus-vector arrangement.
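A minimal sketch of why one database helps: a similarity lookup and a graph hop can share a single connection. The example uses SQLite from the Python standard library as a stand-in for Postgres so it runs anywhere. The table names, columns, and sample rows are hypothetical and do not reflect Cognee's actual schema.

```python
import json
import math
import sqlite3

# Stand-in for a single Postgres instance; schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id TEXT PRIMARY KEY, embedding TEXT NOT NULL);
    CREATE TABLE edges (src TEXT NOT NULL, rel TEXT NOT NULL, dst TEXT NOT NULL);
""")
conn.executemany("INSERT INTO nodes VALUES (?, ?)", [
    ("invoice_service", json.dumps([0.9, 0.1])),
    ("billing_team", json.dumps([0.2, 0.8])),
    ("postgres_cluster", json.dumps([0.1, 0.3])),
])
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
    ("invoice_service", "owned_by", "billing_team"),
    ("invoice_service", "runs_on", "postgres_cluster"),
])


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def nearest_with_neighbors(query_vec):
    """Similarity step, then a one-hop graph step, on the same connection."""
    rows = conn.execute("SELECT id, embedding FROM nodes").fetchall()
    best = max(rows, key=lambda r: cosine(query_vec, json.loads(r[1])))[0]
    hops = conn.execute(
        "SELECT rel, dst FROM edges WHERE src = ? ORDER BY rel", (best,)
    ).fetchall()
    return best, hops
```

In Postgres with pgvector the similarity step would typically move into SQL as well, for example by ordering on the `<=>` cosine-distance operator, so both steps could become one query.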
The 1.2 releases added what the team calls a truth subspace. It is a compact index built from distilled, accepted session learnings that helps rerank search results and weight feedback. Think of it as a learned re-ranker that does not need a separate training pipeline. Every time an agent uses Cognee and confirms that a result was useful, that interaction feeds back into the truth subspace and improves the ranking for the next query. The feature shipped just this week in v1.2.2, which means it is fresh enough that I would treat it as directional rather than battle-tested, but the direction is right. Feedback loops in memory systems are the mechanism that separates a knowledge base that stays correct from one that ossifies.
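The general idea of a feedback-driven reranker can be sketched in a few lines. This is not Cognee's truth subspace implementation, whose internals are not described here. The FeedbackReranker class, the running-centroid approach, and the alpha blend weight are all assumptions chosen to show how accepted results can shift later rankings without a training pipeline.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


class FeedbackReranker:
    """Hypothetical stand-in for a feedback-driven rerank index."""

    def __init__(self, dim, alpha=0.3):
        self.centroid = [0.0] * dim  # running mean of accepted result vectors
        self.accepted = 0
        self.alpha = alpha           # how strongly feedback shifts the base score

    def accept(self, vector):
        """Fold one confirmed-useful result into the running mean."""
        self.accepted += 1
        self.centroid = [
            c + (v - c) / self.accepted for c, v in zip(self.centroid, vector)
        ]

    def rerank(self, candidates):
        """candidates: list of (doc_id, base_score, vector) tuples."""
        def score(item):
            _, base, vector = item
            return base + self.alpha * cosine(vector, self.centroid)

        ranked = sorted(candidates, key=score, reverse=True)
        return [doc_id for doc_id, _, _ in ranked]
```

With no feedback the centroid is all zeros and the base scores decide the order. After a few accepted results, candidates that resemble what was confirmed useful move up.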
The reliability story improved across the 1.2 cycle as well. The LLM retry policy got smarter about transient failures from rate limits and network hiccups. Search relevance was upgraded with a new reranking step. Ingestion and background sync both got latency improvements. The release notes for v1.2.0 through v1.2.2 show a maintenance cadence that matters more for production adoption than any single feature: five releases in the last two weeks, each with genuine improvements rather than version bumps for their own sake. That is the signal of a team that actively uses its own tool.
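For readers who have not built one, a retry policy for transient LLM failures usually looks something like the sketch below. It is a generic pattern, exponential backoff with jitter, and not Cognee's actual policy. TransientError and call_with_retry are hypothetical names, and the attempt count and delays are placeholder values.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for rate limits and network hiccups."""


def call_with_retry(fn, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn, retrying only on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts, surface the failure to the caller
            # Full jitter: wait a random time up to the exponential ceiling.
            ceiling = base_delay * 2 ** (attempt - 1)
            sleep(random.uniform(0, ceiling))
```

The important design choice is that only errors classified as transient are retried. A malformed request or an authentication failure should fail immediately instead of burning through the backoff schedule.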
The list of supported backends is worth noting because it reflects a design philosophy that is surprisingly rare in this space. Cognee defaults to Postgres for everything, but you can swap in Neo4j or Neptune for graphs, Redis for sessions, pgvector or LanceDB for vectors, and Qdrant, ChromaDB, Weaviate, or Milvus through community adapters. Local development runs fully embedded on SQLite, LanceDB, and Kuzu with no services to start. You can prototype on a laptop with pip install cognee and zero infrastructure, and deploy the same code against Postgres in production. That path from zero to production without changing your API calls is the right engineering property for a memory layer.
The benchmark story adds credibility. Cognee scored 0.79 on the BEAM benchmark at 100K tokens, beating the previous state of the art of 0.735, and matched the SOTA at 10M tokens with 0.67. BEAM tests whether a system can track a changing long conversation, which is a more honest test of agent memory than the static needle-in-a-haystack benchmarks that dominate the evaluation discourse. The numbers are directional rather than definitive, but the Cognee team published the full methodology and the paper behind it (arXiv 2505.24478), so the benchmark is reproducible.
The integration story is practical. Cognee ships a Claude Code plugin that gives Claude persistent memory across sessions. It captures prompts, tool traces, and assistant responses into session memory, injects relevant context on every prompt, and syncs session memory into the permanent knowledge graph at session end. There are also Rust and TypeScript clients, a CLI, an MCP server, a Docker Compose setup that starts everything with a single command, and prebuilt images on Docker Hub. The breadth of integration points is unusual for a tool at this stage and suggests the team has been watching how people actually want to consume a memory layer.
There are two limitations worth naming. The first is the cost of the LLM calls that drive the extract and cognify stages. Cognee uses the LLM to infer entities and relationships from documents, which produces a richer knowledge graph than purely statistical methods, but it also means every document ingested costs a round of LLM inference. For a hundred documents it does not matter. For a hundred thousand, the inference cost is real and needs to be budgeted. The ontology generation is efficient enough that the per-document cost is reasonable, but it is not zero, and anyone evaluating Cognee against a purely embedding-based approach needs to include that line item in the comparison.
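The budgeting arithmetic is simple enough to write down. The function below is a back-of-the-envelope sketch, and every number you feed it is your own assumption: tokens per document depend on your corpus, and prices depend on your provider. Nothing here comes from Cognee's documentation.

```python
def ingestion_cost_usd(num_docs, tokens_in_per_doc, tokens_out_per_doc,
                       usd_per_million_in, usd_per_million_out):
    """Rough LLM cost of ingesting num_docs documents, in US dollars."""
    total_in = num_docs * tokens_in_per_doc
    total_out = num_docs * tokens_out_per_doc
    return (total_in * usd_per_million_in
            + total_out * usd_per_million_out) / 1_000_000


# Placeholder figures, not real prices: 2,000 input and 500 output tokens
# per document, at $1 and $4 per million tokens respectively.
small = ingestion_cost_usd(100, 2_000, 500, 1.0, 4.0)
large = ingestion_cost_usd(100_000, 2_000, 500, 1.0, 4.0)
```

Under those placeholder figures, a hundred documents cost about forty cents and a hundred thousand cost about four hundred dollars. The cost scales linearly with document count, so a one-off pilot on a sample of your corpus gives you the per-document figure to multiply out.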
The second limitation is that the graph quality is only as good as the LLM doing the extraction. Cognee’s extraction pipeline is provider-agnostic and works with local models through tools like Ollama, but the entity disambiguation and relationship inference benefit from a capable model. If your deployment cannot use a frontier model for the extraction step, you will need to test whether a smaller model produces graph quality that is high enough for your use case. The team’s benchmarks used OpenAI-grade models, and the results reflect that.
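One hedged way to run that test is to hand-label a small reference set of triples for a few documents, run extraction with each candidate model, and compare. The sketch below scores exact matches after light normalization. The triple_scores helper and the (subject, relation, object) tuple format are assumptions for illustration and are not part of Cognee.

```python
def _normalize(triple):
    """Lowercase and trim each part so trivial differences do not count."""
    return tuple(part.strip().lower() for part in triple)


def triple_scores(extracted, reference):
    """Precision, recall, and F1 of extracted triples against a reference set."""
    got = {_normalize(t) for t in extracted}
    want = {_normalize(t) for t in reference}
    hits = len(got & want)
    precision = hits / len(got) if got else 0.0
    recall = hits / len(want) if want else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1
```

Exact matching is strict, so a smaller model that phrases a relation differently will be penalized. Treat the scores as a relative comparison between models on the same documents, not as an absolute quality measure.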
The audience for this tool is any team that tried Microsoft GraphRAG, saw the deployment complexity and the cost, and went back to a vanilla vector index because it was the path of least resistance. Cognee gives you the same graph-augmented retrieval with a deployment story that starts at pip install and scales to a single Postgres instance. The graph is not a feature bolted onto a vector pipeline. It is the architecture from the ground up. The ECL pipeline, the truth subspace, the feedback loop, and the Postgres-native graph layer are designed together because memory with relationships is fundamentally different from memory without them. Cognee is the tool that makes that difference accessible without requiring a dedicated infrastructure team.
If this was useful, forward it to one engineer who needs less noise in their feed.


