Chonkie
The Boring Chunking Library That Beats Your Default Splitter
Tiny, fast, single-purpose text chunking library with token, sentence, semantic, late, code, and neural chunkers and no framework lock-in. Swap your RecursiveCharacterTextSplitter for Chonkie’s semantic chunker and get a 5 to 15 percent retrieval precision lift for twenty lines of code.
The least interesting part of your RAG pipeline is the one you never benchmark. You picked a chunking strategy when you built the demo. You stuck with it because it worked well enough, and because every time you considered optimizing it, you found yourself reading about chunking strategies instead of shipping features. The default splitter in LangChain and LlamaIndex is a recursive character splitter that cuts text on a list of separators to fit a size budget, with a configurable overlap. It is fine. It is also the lowest-hanging improvement in most production RAG systems, and the fix is a library that does one thing well enough that you can swap it in and move on to problems that actually need your attention.
Chonkie, from the chonkie-inc organization on GitHub, is that library. It is a single-purpose text chunking library that ships nine chunkers covering the major strategies from fixed-size tokens to semantic boundaries to code-aware structure. It has no framework dependency. You install it, import the chunker you need, and call it. The whole library sits at around 505 kilobytes, which is smaller than many single images on a modern web page. The v1.6.8 release from June 1, 2026, adds Gemini embedding dimension support, PyEmscripten wheels, and fixes a JSON export issue. The project has had 8 releases since April. The maintainer ships regularly, the documentation is thorough, and the library does not try to become a platform.
The chunkers themselves are worth walking through because the differences matter more than most teams assume. The TokenChunker splits text into fixed-size token windows using a configurable tokenizer. It is the baseline. The FastChunker uses SIMD-accelerated byte-level splitting, with an advertised throughput of 100 GB per second, for when you are processing millions of documents. The SentenceChunker preserves sentence boundaries, which is better than the recursive splitter for prose-heavy documents but still operates at the surface level. The RecursiveChunker applies hierarchical rules to create semantically meaningful chunks from document structure like markdown headings and code fences. The SemanticChunker uses embedding similarity to find natural break points where the topic shifts. The LateChunker embeds the full text at the token level first and then pools those token embeddings per chunk, producing chunk embeddings that reflect the whole passage rather than an isolated window. The CodeChunker understands code structure, splitting at function boundaries, class definitions, and module headers instead of character position. The NeuralChunker uses a trained model to find chunk boundaries. The SlumberChunker, also called AgenticChunker, uses an LLM to identify semantically meaningful chunk boundaries, which is the most expensive option and the one you use when quality matters more than throughput.
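To make the first two strategies concrete, here is a minimal sketch of fixed-size token windows and sentence packing. It is not Chonkie's implementation: the function names are hypothetical, and whitespace-separated words stand in for real tokenizer output.

```python
import re
from typing import List


def token_windows(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Fixed-size windows over whitespace tokens, with overlap between windows."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    tokens = text.split()  # assumption: words stand in for tokenizer tokens
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


def sentence_chunks(text: str, chunk_size: int) -> List[str]:
    """Pack whole sentences into chunks of at most chunk_size tokens.

    A single sentence longer than chunk_size becomes its own oversized chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > chunk_size:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

The window version will happily cut a sentence in half to hit its size. The sentence version never does, at the price of uneven chunk lengths.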
The one that most teams should try first is the SemanticChunker. It implements Greg Kamradt’s approach: embed the text with sentence-level granularity, look for cosine distance spikes between consecutive sentences, and break the text at those spikes. The result is chunks that align with topic shifts rather than character counts. A document that shifts from an architecture overview to a deployment guide gets split at the transition. The RecursiveCharacterTextSplitter would split wherever its size budget runs out, typically somewhere in the middle of the architecture section and again in the middle of the deployment section, leaving a chunk that straddles the transition, mixes content from both topics, and makes retrieval less precise. The SemanticChunker produces chunks that are internally coherent and externally distinct, and the retrieval lift from that coherence is measurable and immediate.
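The break-at-spikes idea fits in a few lines. The sketch below is an illustration under loud assumptions, not the SemanticChunker's code: a bag-of-words counter stands in for a real embedding model, and a fixed distance threshold stands in for whatever spike detection the library actually uses. All names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def bag_of_words(sentence: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def semantic_chunks(
    sentences: List[str],
    threshold: float = 0.5,  # assumption: fixed cutoff instead of a percentile
    embed: Callable[[str], Counter] = bag_of_words,
) -> List[str]:
    """Start a new chunk wherever consecutive sentences are far apart."""
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    groups = [[sentences[0]]]
    for prev, cur, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine_distance(prev, cur) > threshold:
            groups.append([])  # distance spike: topic shift, open a new chunk
        groups[-1].append(sentence)
    return [" ".join(group) for group in groups]
```

With a real embedding model passed in as embed, the same loop finds the architecture-to-deployment transition without knowing anything about headings or character counts.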
The reported numbers support the swap. Teams that move from recursive splitting to semantic chunking report 5 to 15 percent precision improvements on retrieval tasks. The variance comes from the document type. Technical documentation with clear topic boundaries benefits more than conversational text where transitions are gradual. Code documentation with embedded code examples benefits from the CodeChunker, which preserves function boundaries and produces chunks that the retriever can match to specific API calls. The key insight is that the chunker is not a parameter you tune once and forget. It is a choice that interacts with your document type, your embedding model, and your retrieval strategy. Chonkie makes it easy enough to test multiple chunkers on your corpus that leaving the default unmeasured is no longer defensible.
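Measuring this on your own corpus needs very little machinery. Here is a minimal sketch of the two metrics involved, assuming you already have, for each query, a ranked list of retrieved IDs and a hand-labeled set of relevant IDs. The function names and the ID scheme are hypothetical. Because chunk IDs change whenever you change chunkers, one workable assumption is to label relevance at the source passage or document level.

```python
from typing import Dict, Sequence, Set


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved IDs that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in retrieved[:k] if item in relevant)
    return hits / k


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant IDs that appear in the top-k retrieved IDs."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_metric(metric, runs: Dict[str, Sequence[str]],
                labels: Dict[str, Set[str]], k: int) -> float:
    """Average a metric over every labeled query."""
    scores = [metric(runs[q], labels[q], k) for q in labels]
    return sum(scores) / len(scores) if scores else 0.0
```

Run the same labeled queries against an index built with each chunker and compare the means. That is the whole benchmark.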
The library architecture is the right call. Each chunker is a standalone class that implements a consistent interface. You import the one you need, initialize it with your tokenizer and chunk size, and call it on your text. There is no pipeline wrapper, no framework adapter, no configuration layer to learn. The chunkers return lists of Chunk objects with text, token count, and metadata. You can use them directly or build a pipeline with multiple stages if your use case requires sequential chunking by different strategies. The pipeline system stores reusable workflow configurations in a local SQLite database, which is useful for teams running regular ingestion jobs but entirely optional for one-off use.
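The shape of that design is easy to picture. The sketch below uses stand-in types, not Chonkie's own classes: Chunk, Chunker, ParagraphChunker, WordWindowChunker, and run_stages are all hypothetical names that mirror the description above, with whitespace words standing in for tokens.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Protocol


@dataclass
class Chunk:
    """Stand-in chunk record: text, token count, and free-form metadata."""
    text: str
    token_count: int
    metadata: Dict[str, Any] = field(default_factory=dict)


class Chunker(Protocol):
    """The one thing every chunker must do: turn text into chunks."""
    def __call__(self, text: str) -> List[Chunk]: ...


class ParagraphChunker:
    """Split on blank lines."""
    def __call__(self, text: str) -> List[Chunk]:
        parts = [p.strip() for p in text.split("\n\n") if p.strip()]
        return [Chunk(p, len(p.split()), {"stage": "paragraph"}) for p in parts]


class WordWindowChunker:
    """Split into non-overlapping windows of chunk_size words."""
    def __init__(self, chunk_size: int) -> None:
        self.chunk_size = chunk_size

    def __call__(self, text: str) -> List[Chunk]:
        words = text.split()
        windows = [words[i:i + self.chunk_size]
                   for i in range(0, len(words), self.chunk_size)]
        return [Chunk(" ".join(w), len(w), {"stage": "window"}) for w in windows]


def run_stages(text: str, stages: List[Chunker]) -> List[Chunk]:
    """Apply chunkers in sequence; each stage re-chunks the previous output."""
    chunks = [Chunk(text, len(text.split()))]
    for stage in stages:
        chunks = [out for chunk in chunks for out in stage(chunk.text)]
    return chunks
```

Because every stage has the same call signature, swapping one strategy for another is a one-line change, which is the property that makes side-by-side testing cheap.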
The installation model follows the same philosophy. The base install is pip install chonkie. If you need semantic chunking, you install chonkie with the semantic extra. If you need the API server, the code chunker, or the multi-provider embedding support via Catsu, the extras are explicit and documented. The library does not pull in LangChain as a dependency. It does not install a vector database. It installs chunking code and nothing else. The 505-kilobyte footprint is the result of that discipline.
The comparison space is worth understanding because most teams’ chunking strategy is not a strategy at all. They use whatever the framework default is, which is almost always a recursive character splitter. The LangChain RecursiveCharacterTextSplitter splits on a list of separators with a configurable chunk size and overlap. It works. It also produces chunks that cross paragraph boundaries, split code blocks mid-function, and merge unrelated topics when the chunk size is large enough to contain them. The LlamaIndex default is similar. Neither framework prevents you from swapping in a better chunker. The friction is that you have to decide to do it, and chunking is the kind of problem that looks solved until you measure it.
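The failure mode is easier to see in code. What follows is a simplified sketch of recursive separator splitting, not LangChain's actual implementation: it measures size in characters, omits overlap, drops empty pieces, and assumes the separators are ordered from coarse to fine. The function name is hypothetical.

```python
from typing import List, Sequence


def recursive_split(
    text: str,
    chunk_size: int,
    separators: Sequence[str] = ("\n\n", "\n", " "),
) -> List[str]:
    """Split on the coarsest separator, recursing with finer ones as needed."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # Nothing left to split on: fall back to hard character cuts.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    pieces: List[str] = []
    for part in text.split(sep):
        pieces.extend(recursive_split(part, chunk_size, finer))
    # Greedily merge adjacent pieces back together while they still fit.
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

Feed it an intro paragraph, a two-line function, and an outro with a 20-character budget, and the function body ends up separated from its def line and glued to the outro. Nothing in the algorithm knows that the two lines of code belong together; it only knows what fits.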
Unstructured has a more sophisticated approach with its partition pipeline that understands document structure, file types, and layout. It is also a much larger dependency with a heavier installation surface. Chonkie is the right choice when you already know your file format and just need better chunking, not another document processing pipeline.
The use case that sold me on the approach was a team I know that indexes technical documentation at a SaaS company. They used the recursive splitter with 512-token chunks and a 64-token overlap. Their retrieval accuracy was acceptable. They switched to the SemanticChunker with the same chunk size and measured a 12 percent improvement in recall at their top-3 retrieval cutoff. The change was twenty lines of code. The team lead told me they had spent more time deciding whether to benchmark chunking than they spent on the actual swap. That is the pattern that Chonkie breaks. It removes the activation energy from trying a better chunker.
The one caveat is the cost implication. The SemanticChunker runs an embedding model on every sentence to compute the cosine distance spikes. That is not free. For a one-time indexing pass on a static corpus, the cost is negligible. For a continuous ingestion pipeline where every new document gets chunked on write, the embedding calls add up. The SlumberChunker’s LLM-based approach is more expensive still. The TokenChunker and FastChunker have no embedding cost at all. The right chunker depends on whether you are optimizing for throughput, cost, or retrieval quality, and Chonkie makes that tradeoff explicit by offering each option as a separate import rather than a configuration parameter.
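A back-of-envelope estimate is usually enough to decide whether that cost matters. The sketch below is plain arithmetic; the function name is hypothetical and every number in the usage note is a placeholder to replace with your own volumes and your provider's pricing.

```python
def daily_chunking_embed_cost(
    docs_per_day: int,
    sentences_per_doc: int,
    tokens_per_sentence: int,
    price_per_million_tokens: float,
) -> float:
    """Estimated daily spend on the sentence embeddings used to find breaks.

    This covers only the chunking pass. Embedding the final chunks for the
    index is a separate cost you pay with any chunker.
    """
    tokens = docs_per_day * sentences_per_doc * tokens_per_sentence
    return tokens / 1_000_000 * price_per_million_tokens
```

As an example with made-up inputs, 10,000 documents a day at 200 sentences each and 25 tokens per sentence is 50 million tokens a day. At a hypothetical price of 0.02 per million tokens, that comes to 1.00 per day. If you run a local embedding model instead, the same token count translates into compute time rather than an invoice.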
The framework integration story is straightforward. Chonkie does not ship a LangChain document transformer, but writing one is a few lines of code. The output chunks have a consistent structure that any framework can consume. If you are building a custom RAG pipeline without a framework, Chonkie slots in natively. If you are already on LangChain or LlamaIndex, the chunker swap takes ten minutes including testing.
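Here is roughly what those few lines look like. The sketch stays framework-free so it runs anywhere: Document is a local stand-in with the same two fields as a LangChain Document (page_content and metadata), and the chunker is any callable that returns objects with text and token_count attributes. Those attribute names and the chunk_documents helper are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List


@dataclass
class Document:
    """Stand-in mirroring the two fields of a LangChain Document."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def chunk_documents(
    docs: Iterable[Document],
    chunker: Callable[[str], List[Any]],
) -> List[Document]:
    """Replace each document with one document per chunk.

    Parent metadata is copied onto every chunk, plus the chunk's position
    and token count, so retrieved chunks can be traced back to their source.
    """
    out: List[Document] = []
    for doc in docs:
        for index, chunk in enumerate(chunker(doc.page_content)):
            metadata = dict(doc.metadata)
            metadata["chunk_index"] = index
            metadata["token_count"] = chunk.token_count  # assumed attribute
            out.append(Document(chunk.text, metadata))    # assumed attribute
    return out
```

In a real pipeline you would import the framework's own Document class in place of the stand-in and pass in whichever chunker you are testing.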
For teams building on this memory-and-RAG arc we are covering this week, Chonkie is the chunking layer. It sits between the raw documents and the embedding index. Monday’s take argued that your agent’s memory should not be your RAG index, and Tuesday’s Mem0 spotlight showed the memory tier that replaces the duct tape. Chonkie is the landing zone for the other side of that separation. If you are going to have a dedicated memory store for user facts, you want your document retrieval to be as good as it can be on its own terms, which means better chunking. The improvement compounds. Better chunks mean better embeddings. Better embeddings mean better retrieval. Better retrieval means the agent spends less context on finding the right document and more context on using it.
The version as of this writing is v1.6.8, released June 1, 2026. The release adds Gemini embedding dimension support for the Catsu embedding wrappers, a PyEmscripten wheel for web-based environments, and fixes non-ASCII JSON export. The package has 8 releases since mid-April with a mix of features and dependency bumps, which indicates active maintenance. The GitHub repository sits under chonkie-inc/chonkie with documentation at docs.chonkie.ai. The pip install size is minimal, the API surface fits on one page, and the library does not try to be anything other than a good chunking library. That is the whole point.


