<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Signal Over Noise]]></title><description><![CDATA[Most AI content is written by people selling something. This isn't. Signal Over Noise is a practitioner-first publication on AI agents and automation, written by a CTO who's made the expensive mistakes so you don't have to.]]></description><link>https://signalovernoise.tech</link><image><url>https://substackcdn.com/image/fetch/$s_!Owu3!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb7415cf-9ce5-438f-98c5-7acddf07ed8f_1280x1280.png</url><title>Signal Over Noise</title><link>https://signalovernoise.tech</link></image><generator>Substack</generator><lastBuildDate>Mon, 01 Jun 2026 18:03:40 GMT</lastBuildDate><atom:link href="https://signalovernoise.tech/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Justin Wilson]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[signalovernoisetech@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[signalovernoisetech@substack.com]]></itunes:email><itunes:name><![CDATA[Justin Wilson]]></itunes:name></itunes:owner><itunes:author><![CDATA[Justin Wilson]]></itunes:author><googleplay:owner><![CDATA[signalovernoisetech@substack.com]]></googleplay:owner><googleplay:email><![CDATA[signalovernoisetech@substack.com]]></googleplay:email><googleplay:author><![CDATA[Justin Wilson]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[The Best Agent Tools Aren't on Your Timeline]]></title><description><![CDATA[Kicking off a month on the agent tools that earn their place in production without 
earning a viral thread.]]></description><link>https://signalovernoise.tech/p/the-best-agent-tools-arent-on-your</link><guid isPermaLink="false">https://signalovernoise.tech/p/the-best-agent-tools-arent-on-your</guid><dc:creator><![CDATA[Justin Wilson]]></dc:creator><pubDate>Mon, 01 Jun 2026 10:10:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!GNkJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53da380c-7f28-4413-8059-7e05b8c7f6cb_1728x960.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GNkJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53da380c-7f28-4413-8059-7e05b8c7f6cb_1728x960.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GNkJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53da380c-7f28-4413-8059-7e05b8c7f6cb_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!GNkJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53da380c-7f28-4413-8059-7e05b8c7f6cb_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!GNkJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53da380c-7f28-4413-8059-7e05b8c7f6cb_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!GNkJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53da380c-7f28-4413-8059-7e05b8c7f6cb_1728x960.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!GNkJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53da380c-7f28-4413-8059-7e05b8c7f6cb_1728x960.png" width="1456" height="809" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/53da380c-7f28-4413-8059-7e05b8c7f6cb_1728x960.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2455853,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://signalovernoise.tech/i/200099747?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53da380c-7f28-4413-8059-7e05b8c7f6cb_1728x960.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GNkJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53da380c-7f28-4413-8059-7e05b8c7f6cb_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!GNkJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53da380c-7f28-4413-8059-7e05b8c7f6cb_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!GNkJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53da380c-7f28-4413-8059-7e05b8c7f6cb_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!GNkJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53da380c-7f28-4413-8059-7e05b8c7f6cb_1728x960.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p></p><p><em>Kicking off a month on the agent tools that earn their place in production without earning a viral thread.</em></p><div><hr></div><p>The agent tool I have leaned on the hardest for the last six months is one I have never seen mentioned in a single post on my X timeline, and the agent tool I have seen mentioned the most on my timeline is one I removed from a production stack in March after it cost a client a weekend of incident response. That asymmetry is not an accident. 
It is the structural condition of the field right now, and pretending otherwise has become expensive enough that I want to spend a month on it.</p><p>Most of the agent tooling that is winning attention right now is winning it the same way SaaS won attention in 2014: a founder with a strong personal brand, a launch video shot in the same Brooklyn loft as every other launch video, a Show HN that hits the front page on a Tuesday, a sequence of threads that get reposted by the same eight accounts, and a Discord that fills up with people who want to be early on something. That motion produces a particular shape of tool. The tool looks great in a thirty-second demo. The README is beautiful. The marketing site has a gradient. The &#8220;get started&#8221; path takes ninety seconds. The first thing you build with it works. The second thing you build with it works. The third thing you build with it, which is the first thing that touches a real production constraint, is where you discover that the tool was optimized for the launch demo and not for the work.</p><p>The tools that survive the third build are the tools that were never optimized for the launch in the first place. They are usually older. They are usually maintained by a small team that does not have a growth lead. The README is functional and slightly out of date. The marketing site is a single page or does not exist. The &#8220;get started&#8221; path takes an hour because the tool is honest about the assumptions it is making. The first thing you build with it is harder than it would have been on the trendy alternative. The third thing you build with it works, and the tenth thing works, and the hundredth thing works, and a year later you realize the tool has receded into the background of your stack the way good infrastructure is supposed to.</p><p>The problem is that this second category of tool is structurally incapable of competing for attention with the first category. The founders are not on X. 
The maintainers are not running threads. The project does not have a Series A to spend on developer relations. The signal these tools produce is the signal of work getting done, which is the quietest signal there is. A team that has been running a particular orchestration library in production for two years and has not had an incident attributable to it does not write a blog post about it. They write blog posts about the things that hurt. The things that work get inherited by the next engineer and the engineer after that, and the tool&#8217;s reputation propagates through hiring channels and Slack DMs and the kind of conversations that happen at conference dinners, not through the conversations that happen on a platform optimized for outrage.</p><p>I want to deal with the obvious counterargument before it sits unaddressed: the argument that the loud tools are loud because they are good, and the quiet tools are quiet because they are not, and the market is roughly efficient at sorting these things out over a long enough timeline. That argument is wrong in the specific case of agent tooling for two reasons. The first is that the timeline is not long enough. The field is two years old in its current form. The feedback loop between &#8220;this tool seemed great&#8221; and &#8220;this tool burned us in production&#8221; is six to nine months long for most teams, which means the market signal we are getting in June 2026 is the result of decisions made before most of these tools had been stressed under real load. The second reason is that the loudness of a tool is a function of the founder&#8217;s distribution, not the tool&#8217;s quality. A founder with twelve thousand followers and a knack for threads will produce more visible signal in a week than a maintainer with a hundred GitHub followers will produce in a year, regardless of which tool is actually better. The market is not sorting on quality. 
The market is sorting on distribution, and the two are not correlated in this field yet.</p><p>There is a second counterargument that is more honest and harder to dismiss: that the loud tools at least have community, and community matters, and a tool with a small maintainer base is a tool with a bus factor of one. This argument is real. I have lost bets on small-maintainer tools that went quiet at exactly the wrong moment, and the cost of those bets is part of what shaped the criteria I am going to use this month. The mitigation is not to avoid quiet tools. The mitigation is to evaluate quiet tools on the dimensions that actually predict longevity: how clearly the tool is scoped, whether the maintainer has shipped a stable interface, whether the project has a real ecosystem of users running it in production even if those users are not posting about it, and whether the architecture lets you replace the tool with a competitor without rewriting the system around it. A small-maintainer tool with a tight scope and a clean interface is a safer bet than a venture-backed tool with a sprawling surface area, because the small tool can be replaced and the large one cannot. The bus factor is the input. The replaceability is the output, and the output is the thing that matters when you are deciding what to put in a production stack.</p><p>What I am going to do for the rest of June is walk through the agent tools that have earned their place in stacks I have built or advised on, and that do not show up in the threads that dominate this field&#8217;s discourse. Some of them are frameworks. Some of them are execution engines. Some of them are memory layers or governance layers or evaluation harnesses. All of them share the property that I would have to actively explain them to a senior engineer joining a project, because the engineer would not have heard of them, and that conversation would end with the engineer being glad we picked what we picked. 
None of these tools is going to make you look like you are early on the next thing. All of them are going to make you look like you are running a stack that survives the next thing.</p><p>The signal I am trying to amplify with this month is the signal of work. The work of a tool that has been in production for two years and has not been an incident. The work of a maintainer who fixes the bug in the issue you opened on a Sunday and does not post about it. The work of a project that does not have a launch video because the project did not need a launch. That work is the actual content of the field, and it is buried under a layer of vendor marketing and founder threads and capability announcements that have almost nothing to do with whether a tool is worth standing on.</p><p>The case for spending a month on what works without trending is not that the loud tools are bad. Some of the loud tools are excellent. The case is that the loud tools already have the attention they need, and the quiet tools do not, and the asymmetry of attention is producing a generation of AI stacks that look impressive in a demo and fail under load. If you are building anything in this space that has to survive contact with a real customer, the work of finding the quiet tools is the work that pays off, and the work of separating signal from noise is what this publication is for. 
Thirty days of that work starts tomorrow.</p><div><hr></div><p><em>If this was useful, forward it to one engineer who needs less noise in their feed.</em></p>]]></content:encoded></item><item><title><![CDATA[What May 2026 Told Us About the AI Tools That Actually Ship]]></title><description><![CDATA[Thirty days, thirty posts, and the patterns that survived contact with real work.]]></description><link>https://signalovernoise.tech/p/what-may-2026-told-us-about-the-ai</link><guid isPermaLink="false">https://signalovernoise.tech/p/what-may-2026-told-us-about-the-ai</guid><dc:creator><![CDATA[Justin Wilson]]></dc:creator><pubDate>Sun, 31 May 2026 09:59:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!w2D2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F144c62c3-35f6-4d51-a7cd-b6d173220e9d_1728x960.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!w2D2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F144c62c3-35f6-4d51-a7cd-b6d173220e9d_1728x960.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!w2D2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F144c62c3-35f6-4d51-a7cd-b6d173220e9d_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!w2D2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F144c62c3-35f6-4d51-a7cd-b6d173220e9d_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!w2D2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F144c62c3-35f6-4d51-a7cd-b6d173220e9d_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!w2D2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F144c62c3-35f6-4d51-a7cd-b6d173220e9d_1728x960.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!w2D2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F144c62c3-35f6-4d51-a7cd-b6d173220e9d_1728x960.png" width="1456" height="809" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/144c62c3-35f6-4d51-a7cd-b6d173220e9d_1728x960.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1913599,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://signalovernoise.tech/i/199961250?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F144c62c3-35f6-4d51-a7cd-b6d173220e9d_1728x960.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!w2D2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F144c62c3-35f6-4d51-a7cd-b6d173220e9d_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!w2D2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F144c62c3-35f6-4d51-a7cd-b6d173220e9d_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!w2D2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F144c62c3-35f6-4d51-a7cd-b6d173220e9d_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!w2D2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F144c62c3-35f6-4d51-a7cd-b6d173220e9d_1728x960.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p><em>Thirty days, thirty posts, and the patterns that survived contact with real work.</em></p><div><hr></div><p>I wrote thirty articles this month about the tools my teams actually touch when we build AI systems for paying customers, and the most useful thing I can do on the last day of May is tell you which patterns survived the writing and which ones quietly fell apart on the page. I started the month thinking the story was going to be about agent frameworks, because that is where the noise is loudest and where the largest number of vendors are trying to take your attention. The story turned out to be about something else. 
The story turned out to be that the framework layer is the least interesting decision in an AI stack right now, and the interesting decisions are happening one layer up and one layer down from where everyone is looking.</p><p>The first week was the framework spotlight arc: Strands, LangGraph, CrewAI, AutoGen, Pydantic AI, the OpenAI Agents SDK, with the Mastra and Semantic Kernel honorable mentions to round it out. I went in with the prior that the field was overcrowded and came out with the same prior reinforced. The frameworks that survived my own scrutiny were the ones built by people who took debuggability and observability as design requirements from the first commit, not features added in the second year of the project. That filter eliminated a startling amount of the field. The frameworks that earned their place on a real engagement, in my notes from this month, were LangGraph, Pydantic AI, and Strands. The others have their cases. None of them are the case I would defend on a Monday morning standup with a CTO asking why we picked what we picked.</p><p>The second week was the eval arc, and that is the week the through-line of the month actually surfaced. Promptfoo, Braintrust, LangSmith, Langfuse, and Arize Phoenix all got their day, and what became obvious by the end of the week is that the field has not yet decided whether evaluation is a development-time activity or a runtime activity. The vendors who treat it as a runtime concern, which is most of them, end up building observability tools that happen to score outputs. The vendors who treat it as a development-time concern, like Promptfoo and increasingly Braintrust, end up building something closer to a test suite for non-deterministic systems. The two camps are not in competition, even though they look like they are on the landing pages. They are solving different problems for the same team. 
A production AI system needs both, and the teams that figured this out first are the teams that are shipping the most reliably right now. Most teams have not figured it out, which is why the eval gap is the largest open problem on my list heading into June.</p><p>The third week was the older-but-relevant arc, and that one mattered for a reason I did not see coming when I sketched the calendar. Haystack, LlamaIndex, Semantic Kernel, DSPy, and the RAG-to-agent evolution take on the closing day all served the same purpose in retrospect: they were the reminder that the agent ecosystem did not start in 2024, and the tools that have been quietly maturing for three or four years are often the ones with the production stories the new entrants do not have yet. Haystack runs in places where the new frameworks will not survive a procurement review. LlamaIndex has been doing agent work under a different label for longer than most of its competitors have existed. DSPy is the only tool in the field treating prompt construction as a compiler problem rather than a string-formatting exercise, and that bet is going to look smarter in eighteen months than it does today. The lesson of the week, which I want to underline because it is genuinely contrarian in the current moment, is that vendor age is not a liability in this field. Vendor age is the closest thing we have to a proxy for production hardening.</p><p>The fourth week was the personal arc, and that is the one I felt the most uncertain about heading into the month and the most confident about coming out of. The arc was about built-in tools, Claude Skills, the skills-as-npm-packages pattern, and the Composio interrogation that closed it out. The argument that emerged across the seven posts, which I want to state directly because it is the argument I will be making for the rest of the year, is that the agent ecosystem is currently confusing two different distribution problems and treating them as one problem. 
The first distribution problem is how a developer ships a capability to an agent runtime. The second distribution problem is how an organization governs which capabilities its agents are allowed to use. The current generation of registry-style products, Composio being the cleanest example, is solving the second problem by pretending it is the first. That works until you try to put an agent through a SOC 2 audit, at which point the registry model collapses into something the auditor cannot understand and the security team cannot underwrite. The package model, where skills ship as npm or PyPI artifacts with the same supply-chain controls as the rest of your code, solves both problems at once because both problems are already solved at the package layer. Most teams are not going to land on this until the audit cycle forces them to. That is fine. The teams that figure it out before the audit are the teams that get to keep their velocity through Q3.</p><p>The framework scorecard yesterday was the cross-tool retrospective, and the thing I want to call out from it now that I have a day of distance is that the criteria I used to score the frameworks were the same criteria I would use to score anything else in this stack. Debuggability, lock-in, type safety, observability, and team-shareability are not framework-specific. They are the criteria that should be applied to every AI tooling decision a team makes, including the ones that do not look like framework decisions, like which observability vendor to standardize on or which eval harness to integrate into CI. The reason most teams end up with the AI stack they regret is that they apply rigor to the framework decision and then default to whatever is easiest on every decision after it. 
The framework is the most visible choice, which is exactly why it is the wrong place to spend most of your judgment budget.</p><p>The pattern that emerged across all four weeks, which I did not see clearly until I sat down to write this post, is that the AI tools that actually ship are the ones that took some part of the production lifecycle seriously from day one. Strands took observability seriously. Pydantic AI took type contracts seriously. Promptfoo took the test-suite metaphor seriously. Langfuse took self-hosting seriously. Haystack took enterprise procurement seriously. DSPy took the compiler metaphor seriously. None of those tools win on every axis. All of them win on the axis they bet on, and that is the axis that matters when the tool meets a real production constraint. The tools that try to win on every axis at once are the tools you stop hearing about within six months, because the strategy of being slightly above average on everything is the strategy that loses to the strategy of being excellent at the one thing your customer actually feels.</p><p>June is going to look different from May for two reasons. The first is that the framework arc has been written, and I do not see another tool spotlight worth thirty days of attention emerging at the same pace; the field is consolidating and the spotlights will become rarer and more reluctant. The second is that the open problems I identified across the month, the eval gap, the distribution-versus-governance confusion, the production-hardening lag in newer tools, are not problems that get solved by writing about one more framework. They get solved by writing about the architectural patterns that sit above the framework layer, and that is where I am pointing the calendar for the next month. 
Expect posts about the shape of an AI stack that survives a regulated-industry deployment, the patterns for running evals in CI without bankrupting the org on inference costs, and a hard look at what changes when the model layer becomes commodity faster than the tooling layer does.</p><p>The month did not change my mind about anything I came in believing. It sharpened the conviction that the interesting work in this field is not happening at the framework layer, and it gave me a stack of examples to point at when someone asks me why. The framework debate is the conversation the vendors want you to be having, because that conversation is good for them. The conversation that is good for you is the one happening at the eval layer, the distribution layer, and the production-hardening layer, and that is the conversation I am here to keep having.</p><p>If you read every post this month, thank you. If you read one, thank you for that too. The next thirty days are going to be more opinionated than the last thirty, because the last thirty earned me the right to be.</p><div><hr></div><p><em>If this was useful, forward it to one engineer who needs less noise in their feed.</em></p>
]]></content:encoded></item><item><title><![CDATA[The Agent Framework Wars: A Practitioner Scorecard]]></title><description><![CDATA[Six frameworks, four criteria that actually matter in production, and the honest verdict after a month of writing about them.]]></description><link>https://signalovernoise.tech/p/the-agent-framework-wars-a-practitioner</link><guid isPermaLink="false">https://signalovernoise.tech/p/the-agent-framework-wars-a-practitioner</guid><dc:creator><![CDATA[Justin Wilson]]></dc:creator><pubDate>Sat, 30 May 2026 11:10:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!JBt4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2241a61-2875-4faa-b6ca-a5f3460c058f_1728x960.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JBt4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2241a61-2875-4faa-b6ca-a5f3460c058f_1728x960.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JBt4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2241a61-2875-4faa-b6ca-a5f3460c058f_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!JBt4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2241a61-2875-4faa-b6ca-a5f3460c058f_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!JBt4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2241a61-2875-4faa-b6ca-a5f3460c058f_1728x960.png 1272w, 
https://substackcdn.com/image/fetch/$s_!JBt4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2241a61-2875-4faa-b6ca-a5f3460c058f_1728x960.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JBt4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2241a61-2875-4faa-b6ca-a5f3460c058f_1728x960.png" width="1456" height="809" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b2241a61-2875-4faa-b6ca-a5f3460c058f_1728x960.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2513757,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://signalovernoise.tech/i/199854156?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2241a61-2875-4faa-b6ca-a5f3460c058f_1728x960.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JBt4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2241a61-2875-4faa-b6ca-a5f3460c058f_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!JBt4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2241a61-2875-4faa-b6ca-a5f3460c058f_1728x960.png 848w, 
https://substackcdn.com/image/fetch/$s_!JBt4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2241a61-2875-4faa-b6ca-a5f3460c058f_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!JBt4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2241a61-2875-4faa-b6ca-a5f3460c058f_1728x960.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>Six frameworks, five criteria that actually matter in production, and the honest verdict after a month of writing about 
them.</em></p><div><hr></div><p>I have spent the last four weeks of this newsletter writing one tool spotlight after another on the agent frameworks that currently matter: AWS Strands, LangGraph, CrewAI, AutoGen 0.4, Pydantic AI, OpenAI&#8217;s Agents SDK, with a side trip through Semantic Kernel and the TypeScript camp for completeness. Every one of those posts was an attempt to be fair to the framework on its own terms. This one is not. This one is the head-to-head, in the voice of the person who has to pick one for a real team and live with the consequences. The criteria I care about are not &#8220;developer experience&#8221; or &#8220;community size&#8221; or any of the other variables that show up on vendor comparison pages. The criteria are debuggability, lock-in, type safety, observability, and team-shareability, because those are the five things that decide whether a framework choice quietly costs you a quarter of your engineering capacity eighteen months in.</p><p>Debuggability is the one I lead with because it is the one nobody scores against until it is too late. The honest test is not &#8220;can I attach a debugger to a Python process&#8221; but &#8220;when an agent run produces the wrong answer in production, how many minutes does it take a mid-level engineer on my team to find the cause.&#8221; LangGraph wins this category by a margin that surprised me when I went back through my notes from the month. The graph is explicit, the state transitions are inspectable, and when something goes wrong you can replay the run with the same inputs and watch it happen. CrewAI sits in the middle: the role-based abstraction makes the happy path easy to read and makes failure modes harder to localize, because the framework is doing more of the orchestration on your behalf. 
AutoGen 0.4 is better than the 0.2 reputation it inherits but still suffers from the multi-agent conversation pattern that makes &#8220;whose turn was it and why&#8221; a hard question to answer six steps deep. The OpenAI Agents SDK is the worst of the lot for debugging, not because the framework is broken but because the abstraction is thin and the failure mode is usually &#8220;the model made a choice you cannot inspect.&#8221;</p><p>Pydantic AI and Strands sit near the top of the pile here, for different reasons. Pydantic AI is debuggable because the type contracts catch a category of error before it becomes a runtime mystery. If a structured output does not validate, you know exactly which field failed and why, which collapses a whole class of &#8220;the agent gave me garbage&#8221; investigations into a fixable error message. Strands is debuggable for the opposite reason: AWS chose to surface tracing as a first-class concern from the first commit, and the observability hooks are wired through every primitive, so the question of &#8220;what did the agent actually do&#8221; has a real answer without bolting on a third-party tool. The two of them solve different debugging problems, and a team that uses both for different services is doing the right thing.</p><p>Lock-in is the criterion the vendors least want you to think about, which is exactly why it should be the second filter. The OpenAI Agents SDK is the most locked-in framework on the list by a wide margin, and the lock-in is not about model choice, because every framework supports model swapping at this point. The lock-in is in the runtime primitives, the tracing model, the way handoffs are expressed, the assumption that you will use OpenAI&#8217;s evals product downstream. None of those are unreasonable choices on their own. They are unreasonable in aggregate, because the cost of leaving compounds across all of them at once. 
Strands has the same shape of risk wearing different colors: AWS-flavored conventions, AWS-flavored observability story, AWS-flavored deployment path. The framework is open source and the code will keep running if you walk away. The ecosystem around it will not.</p><p>LangGraph, CrewAI, AutoGen, and Pydantic AI are the four that score well on lock-in, and the reasons are worth distinguishing. LangGraph is portable because the graph is a data structure you own; the framework is mostly a runtime for executing it, and if you ever needed to rewrite that runtime you could do it in a week. CrewAI is portable because the role and task abstractions are simple enough to reimplement on top of anything else if you had to, and the team has not built a moat around any particular hosting story. AutoGen 0.4 is portable because Microsoft Research has historically published the protocol designs as research artifacts, which means the abstractions get documented at a level that survives the framework. Pydantic AI is portable in the strongest sense, because the type definitions are the contract and the framework is a thin layer around them; the structured outputs and tool schemas would survive a swap to anything else with two days of glue code.</p><p>Type safety is the criterion that splits the field in a way that has nothing to do with the Python-versus-TypeScript debate, although that is the way it usually gets framed. The real question is whether the framework treats the schema as a first-class concern or an afterthought. Pydantic AI wins this category by a margin that is almost embarrassing to the rest of the field, because the framework is built around the premise that structured outputs and tool definitions should be expressed in the same Pydantic models you already use for your API contracts. 
The end result is that a production AI service ends up with the same level of contract enforcement as a normal HTTP service, which is the thing that lets your CI actually catch regressions before they reach a customer. Mastra, on the TypeScript side, makes the same bet with Zod and earns the same advantage for teams whose backend is already TypeScript.</p><p>LangGraph, CrewAI, and AutoGen are usable but not strong here, in the sense that you can layer Pydantic over them if you choose to and most production teams eventually do, but the framework is not pulling you in that direction by default. Strands is better than I expected, with type hints threaded through the primitives in a way that suggests the AWS team learned from the Python community&#8217;s long argument about static typing. OpenAI&#8217;s Agents SDK is the weakest, which is a strange place for it to land given that OpenAI invented the function-calling abstraction that made this whole conversation possible. The SDK gives you typed tools but the broader runtime is loose, and the gap shows up most visibly in error handling, where untyped exceptions surface from places you did not expect.</p><p>Observability is where I came in with a strong prior and most of it held up after a month of writing about it. LangFuse, Arize Phoenix, and LangSmith are the three open or open-ish tools I covered earlier this month, and the honest summary is that the framework you pick determines how painful or pleasant integration with any of them ends up being. LangGraph is straightforward, because the graph already represents the structure the tracer wants to record. CrewAI has gotten meaningfully better here in the last two releases, with first-class hooks for emitting span data. Strands is genuinely excellent because the AWS team treated this as a launch requirement, which is rare enough to call out. Pydantic AI is good because the contracts already exist; you mostly need to wire them through. 
AutoGen and the OpenAI SDK are workable but more effort than they should be, which is a frustrating thing to discover the week before a production launch.</p><p>Team-shareability is the last criterion and the one I find myself caring most about now that I have written this many posts. The question is whether a person who did not write the original agent can pick it up, understand what it does, change one thing, and ship that change in the same afternoon. CrewAI wins this in the broad case because the role-and-task abstraction reads like documentation; the cost is that the abstraction hides enough of the runtime that the same engineer cannot debug the failure modes I described earlier. LangGraph is second-best, because the graph diagram and the code are close enough that the visualization works as onboarding material. Pydantic AI is excellent for teams that already think in types and a step harder for teams that do not. Strands is good if your team is comfortable with AWS conventions and worse if they are not. AutoGen is hardest in the team-handoff scenario because the multi-agent conversation pattern requires holding more state in your head than the other patterns do.</p><p>The verdict, which I would not state this directly outside of a piece like this, is that there is no one framework that wins. There are three I would defend on a given engagement and three I would only pick under specific conditions. LangGraph, Pydantic AI, and Strands are the three I keep coming back to in real work, and the reason is the same in each case: they were built by people who took debuggability and observability as design requirements rather than feature requests. CrewAI is the right answer when the team is composed of domain experts who need to express workflows without writing graph code. AutoGen is the right answer when the multi-agent conversation pattern is genuinely the shape of the problem, which is less often than people think. 
The OpenAI Agents SDK is the right answer when you have already committed to the OpenAI ecosystem end to end and the lock-in cost is one you have priced in honestly.</p><p>The framework choice is not the most important choice you will make about an agent system, and that is the part I want to land hardest. The model choice matters more, the evaluation harness matters more, the data layer matters more, the deployment posture matters more. The framework decides how those decisions get expressed in code. Pick the one that gets out of your way for the kind of system you are actually building, and budget the time to switch when you discover the system you thought you were building was the wrong one.</p>]]></content:encoded></item><item><title><![CDATA[Composio: The Tool Registry That Wants to Be the npm of Agent Capabilities]]></title><description><![CDATA[A hosted catalog of 1000+ toolkits with auth handled for you.]]></description><link>https://signalovernoise.tech/p/composio-the-tool-registry-that-wants</link><guid 
isPermaLink="false">https://signalovernoise.tech/p/composio-the-tool-registry-that-wants</guid><dc:creator><![CDATA[Justin Wilson]]></dc:creator><pubDate>Fri, 29 May 2026 10:40:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!5yUd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08beeb1-9aaf-4ddd-ade1-e0bbb6d19b78_1728x960.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5yUd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08beeb1-9aaf-4ddd-ade1-e0bbb6d19b78_1728x960.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5yUd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08beeb1-9aaf-4ddd-ade1-e0bbb6d19b78_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!5yUd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08beeb1-9aaf-4ddd-ade1-e0bbb6d19b78_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!5yUd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08beeb1-9aaf-4ddd-ade1-e0bbb6d19b78_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!5yUd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08beeb1-9aaf-4ddd-ade1-e0bbb6d19b78_1728x960.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!5yUd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08beeb1-9aaf-4ddd-ade1-e0bbb6d19b78_1728x960.png" width="1456" height="809" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f08beeb1-9aaf-4ddd-ade1-e0bbb6d19b78_1728x960.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1799028,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://signalovernoise.tech/i/199724714?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08beeb1-9aaf-4ddd-ade1-e0bbb6d19b78_1728x960.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5yUd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08beeb1-9aaf-4ddd-ade1-e0bbb6d19b78_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!5yUd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08beeb1-9aaf-4ddd-ade1-e0bbb6d19b78_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!5yUd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08beeb1-9aaf-4ddd-ade1-e0bbb6d19b78_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!5yUd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff08beeb1-9aaf-4ddd-ade1-e0bbb6d19b78_1728x960.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>A hosted catalog of 1000+ toolkits with auth handled for you. The honest case for and against the registry model.</em></p><div><hr></div><p>The tell on every agent integration story is the OAuth flow nobody wants to write. You can wire a model to a Gmail-send tool in twenty minutes. You can wire it across forty users, three Google Workspace tenants, a token-refresh background job, and a revocation path that survives a compromised laptop in about six weeks. 
That gap is the entire reason hosted tool registries exist, and it is the reason Composio has built the project that it has.</p><p>Composio is an open-source toolkit catalog with the ComposioHQ repo sitting around 28,500 stars this week and a pushed-at timestamp from two days ago, which already says more than the marketing does. The pitch is a single line: 1000+ pre-built toolkits, tool search, context management, authentication, and a sandboxed workbench, sold as the connective tissue between your agent and every SaaS your customers already use. The CLI is shipping beta releases on a roughly daily cadence right now, with <code>@composio/cli@0.2.31-beta.256</code> landing nine days ago and a steady drumbeat of point releases before that. The primary language switched to TypeScript somewhere in the last cycle, with a first-class Python SDK still maintained. MCP is in the topic list. The license is MIT. The repo is alive in a way most &#8220;agent infrastructure&#8221; repos on GitHub are not.</p><p>The product question is not whether Composio works. It does. The question is the one every team adopting a tool registry has to answer honestly, and most don&#8217;t until the second integration breaks: what exactly are you trading away in exchange for never writing an OAuth handler again?</p><p>Here is what the registry does well. The catalog covers the integrations every B2B agent actually needs: Gmail, Outlook, Slack, Notion, Linear, Jira, GitHub, Salesforce, HubSpot, Stripe, the long tail of HR and finance tooling. Each toolkit ships with the auth flow already implemented, the API client wrapped, the action signatures discoverable by the model, and a hosted callback endpoint that handles token exchange. You add a connection through the SDK, point your agent at the toolkit, and the calls work. The execution layer can run inside Composio&#8217;s sandbox or inside your own runtime, which matters more than the marketing emphasizes. 
The MCP support means the same toolkits register as MCP servers if you&#8217;re already invested in that protocol. Tool search exists because the catalog is now too large for a model to consider every action in a single context window, which is itself a real architectural problem solved correctly.</p><p>That is the case for. The case against is structural, and it is the same case against every hosted registry that has ever existed in software.</p><p>The integration you don&#8217;t own is the integration you can&#8217;t fix. When the Gmail toolkit returns a 429 in a way the SDK doesn&#8217;t surface cleanly, your agent gets a generic tool error and the model retries until your rate limit is gone. When a Salesforce field gets renamed in the customer&#8217;s org and Composio&#8217;s wrapper hasn&#8217;t been updated, the action fails with a serialization error that doesn&#8217;t tell you which field. When a vendor deprecates an API version, the fix lives in a PR queue you don&#8217;t control. None of these failures are hypothetical. They are the operational reality of every integration platform from Zapier on forward, and the registry model concentrates the failure mode into a single dependency whose roadmap doesn&#8217;t match yours.</p><p>The skills-as-packages pattern this newsletter has been working through all month inverts that dependency. When the Gmail-send capability lives in an npm package you publish from your own repo, the implementation is in your codebase. The 429 retry logic is yours to write. The Salesforce field name lives in a config file in your repo and changes through a PR with tests. The fix is <code>npm version patch &amp;&amp; git push</code>, not a support ticket. The cost is that you wrote the integration. The benefit is that nobody can break it on a Tuesday morning without your knowledge.</p><p>Composio knows this is the tension. 
The sandboxed workbench, the ability to extend toolkits with custom actions, the self-hostable execution mode, all of it is the company&#8217;s answer to &#8220;what if I want to own more of the surface area.&#8221; It is a genuine answer and worth taking seriously. The honest read is that it lowers the floor of the tradeoff rather than removing it. Even with a custom action, the discovery layer, the auth layer, and the catalog metadata are still owned by Composio. That is fine for capabilities where the API is stable and the vendor is healthy. It is the wrong place to be when the integration is the load-bearing part of your product.</p><p>The right way to think about Composio in 2026 is the way you would think about any platform dependency: as a function of which capabilities sit on your critical path and which don&#8217;t. The internal sales-ops agent that summarizes Salesforce activity for the weekly meeting is a Composio fit. The customer-facing agent that takes refund actions against Stripe and writes back to the billing record is not. The line is not &#8220;internal versus external.&#8221; It is &#8220;can this fail silently for a week before we notice.&#8221; If the answer is yes, the registry is fine. If the answer is no, the integration belongs in a package you own, tested in CI, versioned with the rest of your code, and rolled back with <code>git revert</code> instead of a vendor escalation.</p><p>The cost story is where the registry economics get interesting. The open-source toolkit code is MIT and free to self-host. The hosted execution, the managed auth callbacks, and the connection storage have a paid plan that scales with active connections and action invocations. That is a fair model. The hidden cost, the one nobody puts in the architecture diagram, is the platform tax on every action the agent takes. 
Each call routes through Composio&#8217;s infrastructure, which adds latency, which adds a failure mode, which adds a vendor relationship to your runbook. The right comparison is not Composio versus writing the integration from scratch. It is Composio versus a package in your monorepo that imports a vendor&#8217;s official SDK directly and handles auth through your existing identity provider. The package wins on latency, on debuggability, and on the day the vendor announces an unrelated outage.</p><p>The release cadence is the strongest argument for taking Composio seriously as infrastructure rather than as a hosted tool. A project shipping daily beta CLI releases, pushing changes within the last 48 hours, and maintaining a TypeScript and Python SDK in parallel is a project run by people who have customers in production calling them when things break. That is the right kind of vendor to depend on, if you are going to depend on a vendor at all. Compare it to the field&#8217;s usual graveyard of repos that hit 20,000 stars during the 2024 funding wave and have not tagged a release in nine months. Composio is in the rarer category of agent infrastructure that is still being maintained like a product the maintainers expect to be using themselves next year.</p><p>The verdict for the month is the same verdict that has shown up against every category we have covered. The registry model is real, it solves a real problem, and it is the right answer for the right tier of integration. The wrong tier is the one where the integration is the product. Use Composio for the connections that would otherwise sit in the backlog for a quarter. Keep the load-bearing toolkits in code you own. The registry earns its place by handling the long tail of integrations no engineering team was going to get to. 
It loses its place the moment a team treats it as a substitute for owning the parts of the agent that matter most.</p>]]></content:encoded></item><item><title><![CDATA[Dify: The Visual Builder for People Who Don't Want to Write Agent Code]]></title><description><![CDATA[A node-based LLMOps platform with 140k stars and one honest question to answer.]]></description><link>https://signalovernoise.tech/p/dify-the-visual-builder-for-people</link><guid isPermaLink="false">https://signalovernoise.tech/p/dify-the-visual-builder-for-people</guid><dc:creator><![CDATA[Justin Wilson]]></dc:creator><pubDate>Thu, 28 May 2026 10:46:55 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ESIi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4eb6fd-9238-4a4e-86aa-926dd29ba8d6_1728x960.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!ESIi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4eb6fd-9238-4a4e-86aa-926dd29ba8d6_1728x960.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ESIi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4eb6fd-9238-4a4e-86aa-926dd29ba8d6_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!ESIi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4eb6fd-9238-4a4e-86aa-926dd29ba8d6_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!ESIi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4eb6fd-9238-4a4e-86aa-926dd29ba8d6_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!ESIi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4eb6fd-9238-4a4e-86aa-926dd29ba8d6_1728x960.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ESIi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4eb6fd-9238-4a4e-86aa-926dd29ba8d6_1728x960.png" width="1456" height="809" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4e4eb6fd-9238-4a4e-86aa-926dd29ba8d6_1728x960.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1757719,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://signalovernoise.tech/i/199583549?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4eb6fd-9238-4a4e-86aa-926dd29ba8d6_1728x960.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ESIi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4eb6fd-9238-4a4e-86aa-926dd29ba8d6_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!ESIi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4eb6fd-9238-4a4e-86aa-926dd29ba8d6_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!ESIi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4eb6fd-9238-4a4e-86aa-926dd29ba8d6_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!ESIi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e4eb6fd-9238-4a4e-86aa-926dd29ba8d6_1728x960.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p><em>A node-based LLMOps platform with 140k stars and one honest question to answer.</em></p><div><hr></div><p>The clean test for any visual agent builder is whether the team using it can survive the day the senior engineer leaves. Most can&#8217;t, and that has nothing to do with the tool. It&#8217;s that the visual flow on the screen is the only place the system exists. There&#8217;s no repo to clone, no diff to review, no test to fail. The team owns the artifact the way a marketing department owns a Canva file.</p><p>Dify is the open-source visual builder that takes that problem more seriously than its competitors. 
The langgenius repo crossed 142,000 stars this month, version 1.14.2 shipped on May 19, and the project now pitches itself as a &#8220;production-ready platform for agentic workflow development&#8221; rather than the chatbot builder it started as. The drag-and-drop canvas is still there. So are the prompt IDE, the RAG pipeline, the agent runtime, the tool registry, and the LLMOps tracing layer. It runs self-hosted on Docker in about ten minutes. It is a real piece of software, built by people who have shipped real software, and the recent release notes read like the work of a team that knows what production failure looks like: tenant-scoped credential isolation, workflow execution reliability fixes, memory fetches outside Flask context, base64 file lookup sessions closed correctly. That is not vibes-driven development.</p><p>The honest question Dify has to answer in 2026 is not whether it works. It works. The question is who it&#8217;s for, and whether the people it&#8217;s for know what they&#8217;re trading away.</p><p>The visual builder pitch is always the same: lower the floor so non-engineers can ship. The Dify canvas lets a product manager wire an LLM node to a knowledge retrieval node to a conditional branch to an HTTP call to a response. Human-in-the-loop (HITL) workflows landed properly in 1.14, so you can park a node for human review and resume the trace afterwards. The tool registry has hundreds of pre-built integrations. The same flow can be exposed as a chatbot, an API endpoint, or a workflow that another system calls. None of this requires writing agent code in Python, TypeScript, or anything else. For a four-person team that needs a customer-support assistant on top of their docs by Friday, the value is obvious.</p><p>The problem is the second project. The flow that started as a Friday afternoon chatbot becomes the system that handles refund eligibility, then the system that decides whether a clinical claim gets routed for review, then the system that talks to the billing API. 
Each step makes sense locally. The composite is a piece of production software whose entire definition lives in a database row and whose history lives in the platform&#8217;s audit log. Versioning it means exporting the DSL file. Reviewing changes means diffing the exported DSL in a PR description that nobody can read. Testing it means clicking through the flow on staging and hoping. This is the part of the visual-builder lifecycle that vendors don&#8217;t market and that operators see every time.</p><p>This is the same tension the npm-packaged-skills approach was designed to escape. When the agent&#8217;s behavior lives in a versioned package with a tagged release, the changes get code review, the tests run in CI, and the rollback is a <code>git revert</code>. When the behavior lives in a Dify flow, the rollback is &#8220;restore from yesterday&#8217;s database backup and lose every conversation in between.&#8221; Both models can ship software. Only one of them treats the agent like software.</p><p>That doesn&#8217;t make Dify wrong. It makes it a category, and the category has a job to do. The job is letting people who don&#8217;t write code build useful AI applications without a developer in the loop for every change. That is a real job, it is worth doing, and Dify does it as well as anything I&#8217;ve seen in the open-source space. The mistake is using it for the wrong tier of system. An internal tool that summarizes weekly sales data is the right tier. A workflow that touches PHI, makes refund decisions, or sits in front of a regulated process is the wrong tier, no matter how clean the visual flow looks.</p><p>The case I would actually make for Dify in an engineering organization is narrower than the marketing suggests, and stronger because of it. Use it as the front door for prompt and flow exploration before anything gets promoted to a code repo. 
The product team can sketch a workflow on the canvas, run real traces against real data, prove the idea is worth shipping, and hand the engineering team a working spec with the prompts already battle-tested. The exported flow becomes a starting point for a real implementation in code, not the implementation itself. Visual builder as design surface, code repo as production artifact. Most teams using Dify have the relationship inverted, which is why their second project breaks.</p><p>The other honest use is the internal-tools tier. Operations, HR, customer success, finance. Each of those teams has a constant queue of &#8220;can we get an AI that does X for our department.&#8221; Standing up a development sprint for each request is the wrong economics. A Dify instance behind SSO with department-level tenants is the right economics. The flows live for six months, serve a known internal user base, and don&#8217;t sit on the critical path. When one of them outgrows the canvas, the strain will be obvious, and you migrate that one. The platform is doing what visual builders are good at: enabling the work that wasn&#8217;t going to get engineering attention anyway.</p><p>The cost story matters and is rarely told straight. Self-hosted Dify on a modest VM is essentially free for the platform layer, with the inference cost going to whichever model provider you call. The hosted version has paid tiers that scale with usage. The hidden cost is the operational one. The platform brings its own database, vector store, Redis, sandbox executor for Python tool calls, and plugin daemon. Those are five additional things on the runbook for an SRE team that thought they were adopting one tool. The recent releases have been visibly tightening this. The 1.14.x line has been a steady drumbeat of deployment and runtime fixes, which is what a project does when it knows real ops teams are now running it in real environments. That is healthy. 
It is also a signal that the operational surface area is wider than the marketing implies.</p><p>The release cadence is the strongest signal of all. A project shipping 1.13.2, 1.13.3, 1.14.0, 1.14.1, and 1.14.2 over the past two months is a project with users in production complaining about specific bugs and a maintainer team that resolves them. Compare that to the AI-tool graveyard of repos that hit 30,000 stars in 2024 and haven&#8217;t tagged a release since the Series A closed. Dify is in the rarer category of open-source AI infrastructure that is still being maintained like a product.</p><p>The verdict for 2026 is simple. Dify is the visual builder I would actually deploy if I had a team that needed one, and the visual builder I would actively keep off the critical path of anything regulated. The product is honest. The category has known limits. Pretending those limits aren&#8217;t there is how the visual builder ends up owning a workflow that should have been a code repo six months ago. 
Use it for what it is, hand the rest to a real codebase, and the platform earns the stars.</p><div><hr></div><p><em>If this was useful, forward it to one engineer who needs less noise in their feed.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://signalovernoise.tech/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://signalovernoise.tech/subscribe?"><span>Subscribe now</span></a></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://signalovernoise.tech/?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share Signal Over Noise&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://signalovernoise.tech/?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share Signal Over Noise</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Agents Should Be Distributed Like Libraries, Not Like SaaS]]></title><description><![CDATA[The marketplaces want to be the App Store of agents.]]></description><link>https://signalovernoise.tech/p/agents-should-be-distributed-like</link><guid isPermaLink="false">https://signalovernoise.tech/p/agents-should-be-distributed-like</guid><dc:creator><![CDATA[Justin Wilson]]></dc:creator><pubDate>Wed, 27 May 2026 10:28:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!cAwG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4044e6ed-1833-4062-9fa3-d67ac9fdc3f7_1728x960.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!cAwG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4044e6ed-1833-4062-9fa3-d67ac9fdc3f7_1728x960.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cAwG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4044e6ed-1833-4062-9fa3-d67ac9fdc3f7_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!cAwG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4044e6ed-1833-4062-9fa3-d67ac9fdc3f7_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!cAwG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4044e6ed-1833-4062-9fa3-d67ac9fdc3f7_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!cAwG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4044e6ed-1833-4062-9fa3-d67ac9fdc3f7_1728x960.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cAwG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4044e6ed-1833-4062-9fa3-d67ac9fdc3f7_1728x960.png" width="1456" height="809" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4044e6ed-1833-4062-9fa3-d67ac9fdc3f7_1728x960.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2626231,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://signalovernoise.tech/i/199443952?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4044e6ed-1833-4062-9fa3-d67ac9fdc3f7_1728x960.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cAwG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4044e6ed-1833-4062-9fa3-d67ac9fdc3f7_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!cAwG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4044e6ed-1833-4062-9fa3-d67ac9fdc3f7_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!cAwG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4044e6ed-1833-4062-9fa3-d67ac9fdc3f7_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!cAwG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4044e6ed-1833-4062-9fa3-d67ac9fdc3f7_1728x960.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>The marketplaces want to be the App Store of agents. The package manager has been waiting the whole time.</em></p><div><hr></div><p>The pitch deck arrived last Tuesday and it had four logos on the front page. Four agent marketplaces, all launched in the last eight months, all promising the same thing: a hosted runtime, a discovery surface, a billing rail, and a &#8220;deploy in one click&#8221; path from somebody else&#8217;s agent to your production environment. The model is familiar because it is the same model every vendor reaches for when a new category opens up. Build the SaaS. Own the runtime. Become the App Store of agents.</p><p>The vendors selling this are not stupid and the model is not crazy. Marketplaces capture value. Hosted runtimes lock in customers. The economics work for the platform. 
They work less well for the customer, and they work badly for the agent itself, and the people who will figure that out first are the practitioners who have already had this argument once, around containers, and once more, around CI/CD, and at least twice more around package management. The shape of the answer has been known for a decade. Agents are not a new category. They are libraries with a loop on top, and libraries get distributed through source control and package managers, not through hosted catalogs run by people who want a percentage.</p><p>The case for treating agents like libraries is mostly a case about three things the SaaS model cannot give you and that the library model gives you for free: version pinning, code review, and a clean separation between the artifact and the runtime. None of these are exotic. Every working engineering org has them for normal code. The interesting question is why the agent ecosystem keeps acting like they need to be reinvented.</p><p>Version pinning is the most concrete one. When an agent&#8217;s behavior changes, somebody has to know. In the library model, the agent is a versioned artifact in your <code>package.json</code> or your <code>requirements.txt</code> or your equivalent. A version bump shows up in a pull request. A diff exists. A reviewer signs off. The change is auditable six months later when an incident sends somebody digging through git blame to figure out when the triage flow started misclassifying severity. In the marketplace model, the agent is hosted somewhere, the vendor pushes an update, and your production behavior changes overnight with no diff, no review, and no audit trail beyond the vendor&#8217;s changelog, which is written by the vendor, for the vendor, and which you do not get to verify. The vendors will tell you that you can pin a version. Some of them even mean it. 
Most of the time the pin is a soft pin, the deprecation window is short, and the next major model upgrade silently moves under you anyway because the agent&#8217;s behavior depends on the model and the model is not part of the pin.</p><p>Code review is the second one and it is the one practitioners feel the hardest after a few months of running agents in anger. An agent&#8217;s behavior is encoded in prompts, in tool definitions, in skill files, in retry policies, in routing logic. Every one of those is text. Every one of those benefits from a second pair of eyes. When the agent lives in a GitHub repo, every change to it is a PR, every PR has a review, every review enforces whatever conventions the team has decided matter. When the agent lives behind a vendor portal, the closest thing to review is a &#8220;preview this version&#8221; button and a textarea where one person clicks save. That is not a workflow. That is a foot-gun with a UI on top.</p><p>The runtime question is where the library model breaks cleanly from the SaaS model, because this is where the vendor&#8217;s incentives diverge from yours hardest. A hosted agent runtime gives the vendor leverage over your dependency graph. The agent only runs on their platform. The model calls are routed through their inference layer. The observability data is theirs. The cost model is theirs. The migration path off the platform, if you ever want one, is theirs to define and theirs to make difficult. None of this is hypothetical. We have run this exact movie with workflow automation tools, with iPaaS platforms, with low-code builders, and with every prior generation of vendor-mediated runtime, and the ending is the same every time: the bill goes up faster than the value, the lock-in becomes a board-level conversation, and somebody gets handed the project of getting off the platform two years too late.</p><p>The library model unbundles all of that. The agent is a package. The model is whatever you point it at. 
The runtime is your own infrastructure. The observability is whichever tracing stack you already use. The bill is the inference cost plus a rounding error of CPU. The migration path is the same migration path you have for any other library, which is to update the dependency or fork it, and that path stays open because the artifact is text in a repo you control.</p><p>The objection from the vendor side, and from a non-trivial number of engineers who have been burned by self-hosting, is that the SaaS model exists for a reason. Operating production infrastructure is real work. Patching is real work. Observability plumbing is real work. The platforms charge a premium because they handle that work, and a team that adopts the library model has to handle it themselves. This is true and it is the strongest argument for the SaaS approach. The counter is that the work in question is work the team already does for the rest of its production stack, and the marginal cost of adding agents to that stack is far smaller than the marginal cost of running a parallel governance regime for one vendor-hosted carve-out. If your team is already doing platform work for its services, adding the agent layer to that work is incremental. If your team is not doing platform work at all and the agent is the first production workload, the SaaS model is genuinely the right choice for a while. The trap is staying on it after the first answer stops applying.</p><p>The other objection is discovery. A marketplace solves a real problem, which is that finding good agents written by people you trust is hard, and an open ecosystem of GitHub repos does not solve it by itself. This is fair. The honest answer is that the discovery problem is not solved by handing it to a vendor either. The vendor&#8217;s incentive is to surface the agents that pay for placement, not the agents that work best for your use case. 
The discovery problem is solved by the same mechanism that solved it for libraries, which is reputation, community curation, and the slow accumulation of trust around specific maintainers and specific orgs. That is a slower mechanism than a marketplace launch event, and it is more durable, and it does not put a vendor between you and the artifact.</p><p>Where this lands, after a year of watching teams try both models, is that the agent layer is going to look more like the library layer than the platform layer, and the teams that figure that out first are going to spend less, lock in less, and ship more. The marketplaces will exist. Some of them will even be useful for a particular niche. They are not the default. The default is the same boring answer engineering has been giving for two decades when a new category of artifact appears: put it in a repo, version it, review it, ship it through the package manager you already use, run it on the infrastructure you already own.</p><p>If the agent in your production stack is on somebody else&#8217;s runtime, behind somebody else&#8217;s portal, with a version number you do not control, ask the question now rather than the quarter the bill triples.</p><div><hr></div><p><em>If this was useful, forward it to one engineer who needs less noise in their feed.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://signalovernoise.tech/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://signalovernoise.tech/subscribe?"><span>Subscribe now</span></a></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://signalovernoise.tech/?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share Signal Over Noise&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button 
primary" href="https://signalovernoise.tech/?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share Signal Over Noise</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Shipping Skills as npm Packages: A Workflow That Actually Scales]]></title><description><![CDATA[The package manager already solved this problem.]]></description><link>https://signalovernoise.tech/p/shipping-skills-as-npm-packages-a</link><guid isPermaLink="false">https://signalovernoise.tech/p/shipping-skills-as-npm-packages-a</guid><dc:creator><![CDATA[Justin Wilson]]></dc:creator><pubDate>Tue, 26 May 2026 10:32:40 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!WiNx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed0164b-e80a-4f31-8c87-9bbd7e27b051_1728x960.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WiNx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed0164b-e80a-4f31-8c87-9bbd7e27b051_1728x960.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WiNx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed0164b-e80a-4f31-8c87-9bbd7e27b051_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!WiNx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed0164b-e80a-4f31-8c87-9bbd7e27b051_1728x960.png 848w, 
https://substackcdn.com/image/fetch/$s_!WiNx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed0164b-e80a-4f31-8c87-9bbd7e27b051_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!WiNx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed0164b-e80a-4f31-8c87-9bbd7e27b051_1728x960.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WiNx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed0164b-e80a-4f31-8c87-9bbd7e27b051_1728x960.png" width="1456" height="809" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9ed0164b-e80a-4f31-8c87-9bbd7e27b051_1728x960.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2436611,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://signalovernoise.tech/i/199307138?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed0164b-e80a-4f31-8c87-9bbd7e27b051_1728x960.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WiNx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed0164b-e80a-4f31-8c87-9bbd7e27b051_1728x960.png 424w, 
https://substackcdn.com/image/fetch/$s_!WiNx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed0164b-e80a-4f31-8c87-9bbd7e27b051_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!WiNx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed0164b-e80a-4f31-8c87-9bbd7e27b051_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!WiNx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed0164b-e80a-4f31-8c87-9bbd7e27b051_1728x960.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>The package manager already solved this problem. Stop reinventing it with custom registries and Slack-pasted folders.</em></p><div><hr></div><p>The first six months of running skills across a team looked like a horror show. Engineers were copying SKILL.md folders out of Slack threads. One squad was running v0.3 of the incident-triage skill while another squad was running v0.7 with a completely different prompt structure, and neither knew the other version existed until an outage forced them to compare notes. We had a Confluence page that was supposed to be the source of truth and it had been wrong for three weeks. The skills themselves were working. The distribution was the disaster.</p><p>The fix was sitting in plain sight, and it embarrassed me a little that it took as long as it did to land on it. We already had a perfectly good way to ship code artifacts across teams with version pinning, dependency resolution, semantic versioning, changelogs, and a registry that handles all of this for free. It is called npm. The skills are Markdown files plus the occasional shell script. They are content. npm does not care that they are not JavaScript. The package manager was happy to ship them the moment we stopped treating skills like some new category of artifact that needed a new category of tooling.</p><p>What this looks like in practice is a small package per skill or per logical bundle of skills. The package contains a <code>skills/</code> directory at the root, an index file that exports nothing meaningful from a runtime perspective, and a <code>package.json</code> with name, version, description, and a <code>files</code> field that ships exactly the skill content. The name follows the same scope pattern as everything else: <code>@org/skill-incident-triage</code>, <code>@org/skill-billing-rollforward</code>, and, for a bundle, <code>@org/skills-postmortem</code>. 
The version is semver. A patch bump means the SKILL.md description was tightened. A minor bump means new instructions were added to the body, or a new reference document was included. A major bump means the skill&#8217;s invocation surface changed in a way that a downstream agent might trip on.</p><p>The consumer side is where the workflow earns its keep. An agent project has a <code>package.json</code> like any other Node project. Skills get pulled in as dependencies pinned to exact versions, not floating ranges. When the agent boots, it walks <code>node_modules</code> for any package matching the <code>@org/skill-*</code> or <code>@org/skills-*</code> naming pattern, finds the <code>skills/</code> directory inside each one, and either symlinks or copies the contents into the agent&#8217;s <code>.claude/skills/</code> directory before the model is loaded. Twenty lines of bootstrap code. The agent never has to know which skills it has. It asks the filesystem.</p><p>The reason this matters more than it sounds: every agent in the org ships with an explicit, version-pinned manifest of exactly which skills are loaded and at what version. The manifest is <code>package.json</code>. It lives in source control. It goes through code review. It diffs cleanly when someone bumps a skill version, and the diff shows up in the same PR review surface as every other dependency change. That is the entire workflow problem solved, and apart from the bootstrap we did not write any of the tooling that solved it.</p><p>The version pinning is the part that does the most work in practice. When the incident-triage skill gets a major revision because the underlying playbook changed, the team that owns the skill ships <code>@org/skill-incident-triage@2.0.0</code>. The agents that depend on it stay pinned to their exact <code>1.x</code> release until their owners are ready to test the upgrade. 
That test is the usual one: bump the version in <code>package.json</code>, run the agent against a fixture set of historical incidents, see whether the outputs hold up, ship the bump if they do, file an issue against the skill owners if they do not. Same loop as any other library upgrade. The skill team is not running around chasing every consumer asking them if they have updated. The consumers are not stuck on whatever version was current the day they cloned the repo.</p><p>The objection I get from engineers when I describe this is that npm feels heavy for what is fundamentally a folder of markdown. They are not wrong about the technology being overpowered for the artifact, and the cleaner instinct is to want something custom that fits the actual shape of the problem. The trouble with that instinct is that custom registries take real engineering time to build, real engineering time to maintain, and they fail in ways the team is not used to fixing. npm has been pressure-tested for fifteen years against the most adversarial dependency graphs in the industry. The cost of using a tool that is slightly oversized for the job is somewhere between negligible and zero. The cost of building a new tool that exactly fits the job is six engineer-months and a perpetual maintenance tax. The math is not close.</p><p>The second objection is that npm is JavaScript-flavored and most of the orgs running production agents are Python shops. This one is fair, and the workaround is uglier than I would like, but it works. The agent process does not need to be Node. The skills package can be pulled with <code>npm install</code> as a one-time setup step in the agent&#8217;s container build, with the resulting <code>node_modules</code> directory copied or symlinked into the agent&#8217;s working directory before the Python process starts. The cost is one Node toolchain in the build image. 
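</p><p>One way that container build could look, sketched as a multi-stage Dockerfile so that Node is only present in the build stage. The base image tags, the presence of a lockfile, the <code>agent.py</code> entry point, and a registry reachable without extra auth are all assumptions for illustration.</p><pre><code># Stage 1: resolve the pinned skill packages with npm (Node lives only here).
FROM node:22-slim AS skills
WORKDIR /build
COPY package.json package-lock.json ./
RUN npm ci --omit=dev

# Stage 2: the Python agent image (agent.py is a hypothetical entry point).
FROM python:3.12-slim
WORKDIR /app
COPY --from=skills /build/node_modules ./node_modules
COPY agent.py ./
CMD ["python", "agent.py"]
</code></pre><p>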
Python orgs that want to avoid that can use the same pattern with PyPI and a <code>skills/</code> directory inside a Python package. The mechanics are identical, the file layout is identical, only the index file and metadata change. The principle survives the language switch. The principle is: use the package manager you already have, with the version pinning you already trust, and let the boot-time copy step bridge the runtime gap.</p><p>Where I would push back on a team that is about to adopt this is the temptation to overdo the bundling. The first version of our setup had a single <code>@org/skills-all</code> package that shipped every skill in the org as one giant dependency. It was convenient until it was not. Every skill change cut a new version of the everything-package, every agent had to pull the whole world to update one skill, and the blast radius of a bad skill went from &#8220;the agents that depend on it&#8221; to &#8220;every agent in the company.&#8221; We broke it apart skill by skill over a quarter, and the diff in operational sanity was immediate. Granularity is doing real work here. One package per skill is the default. One package per coherent skill bundle (three skills that always travel together) is the exception. One package per organization is the antipattern that looks like simplification and turns into a load-bearing footgun.</p><p>The repeatable pattern, after roughly four months of running this in production across a dozen agents and three teams, comes down to a checklist. Each skill lives in its own repo, or in a monorepo with one package per skill. Each package follows the <code>@org/skill-&lt;name&gt;</code> naming convention. Each version follows semver with the patch/minor/major rules above. Each agent project pins exact versions in <code>package.json</code> and uses a twenty-line bootstrap to copy the skills into place before model load. Each version bump is a PR, reviewed by the same humans who review code changes. 
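</p><p>The bootstrap in that checklist might look roughly like the sketch below. It is a minimal illustration, not our production code: the function name <code>installSkills</code>, the default <code>@org</code> scope, and the choice to copy instead of symlink are assumptions.</p><pre><code>// Hypothetical bootstrap sketch; names and defaults are illustrative assumptions.
import * as fs from "node:fs";
import * as path from "node:path";

// Matches package directory names such as skill-incident-triage or skills-postmortem.
const SKILL_PACKAGE = /^skills?-/;

export function installSkills(
  projectRoot: string,
  scope = "@org",
  target = path.join(".claude", "skills"),
): string[] {
  const scopeDir = path.join(projectRoot, "node_modules", scope);
  const targetDir = path.join(projectRoot, target);
  const installed: string[] = [];
  if (!fs.existsSync(scopeDir)) return installed;

  fs.mkdirSync(targetDir, { recursive: true });
  for (const pkg of fs.readdirSync(scopeDir).sort()) {
    const skillsDir = path.join(scopeDir, pkg, "skills");
    const stat = fs.statSync(skillsDir, { throwIfNoEntry: false });
    // Skip packages outside the naming pattern or without a skills/ directory.
    if (!SKILL_PACKAGE.test(pkg) || !stat?.isDirectory()) continue;
    // Copy each skill folder; a later package with the same folder name overwrites an earlier one.
    for (const skill of fs.readdirSync(skillsDir)) {
      fs.cpSync(path.join(skillsDir, skill), path.join(targetDir, skill), { recursive: true });
      installed.push(`${scope}/${pkg}:${skill}`);
    }
  }
  return installed;
}
</code></pre><p>Calling <code>installSkills(process.cwd())</code> before the model loads is the whole integration, and the returned list is handy to log at boot. Swapping the copy for a symlink would be a small change to the inner loop.</p><p>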
The whole thing fits into the developer workflow that already exists, and nobody has to learn a new tool.</p><p>The temptation in the agent ecosystem right now is to invent new infrastructure for every new abstraction. The instinct is wrong, and it is the same instinct that gave us a different YAML format for every CI tool, a different secret manager for every cloud, and a different package format for every language runtime. The agent layer is going to be more boring than the marketing wants you to believe, and the boring parts are where the actual leverage lives. Skills are content. npm ships content. Everything else is a distraction from the part of the work that produces value, which is writing skills good enough to be worth shipping in the first place.</p><p>If you are still copying skill folders out of Slack threads, the package manager has been waiting for you the whole time.</p><div><hr></div><p><em>If this was useful, forward it to one engineer who needs less noise in their feed.</em></p>]]></content:encoded></item><item><title><![CDATA[Stop Building Agents. 
Build Skills.]]></title><description><![CDATA[The agent is a loop.]]></description><link>https://signalovernoise.tech/p/stop-building-agents-build-skills</link><guid isPermaLink="false">https://signalovernoise.tech/p/stop-building-agents-build-skills</guid><dc:creator><![CDATA[Justin Wilson]]></dc:creator><pubDate>Mon, 25 May 2026 09:59:39 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!VfZd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe2164a0-1fed-480c-a648-b48a1309075b_1728x960.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VfZd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe2164a0-1fed-480c-a648-b48a1309075b_1728x960.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VfZd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe2164a0-1fed-480c-a648-b48a1309075b_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!VfZd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe2164a0-1fed-480c-a648-b48a1309075b_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!VfZd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe2164a0-1fed-480c-a648-b48a1309075b_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!VfZd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe2164a0-1fed-480c-a648-b48a1309075b_1728x960.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!VfZd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe2164a0-1fed-480c-a648-b48a1309075b_1728x960.png" width="1456" height="809" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/be2164a0-1fed-480c-a648-b48a1309075b_1728x960.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2137633,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://signalovernoise.tech/i/199168169?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe2164a0-1fed-480c-a648-b48a1309075b_1728x960.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VfZd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe2164a0-1fed-480c-a648-b48a1309075b_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!VfZd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe2164a0-1fed-480c-a648-b48a1309075b_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!VfZd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe2164a0-1fed-480c-a648-b48a1309075b_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!VfZd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe2164a0-1fed-480c-a648-b48a1309075b_1728x960.png 1456w" 
sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>The agent is a loop. The skill is the product. Stop investing in the wrong layer.</em></p><div><hr></div><p>Every team I have talked to in the last six months is building an agent. Almost none of them are building skills. The ratio is roughly fifteen to one, and the gap is the reason most of these projects will be rewritten inside a year. The agent is the part that looks impressive in a demo. The skill is the part that survives the second model upgrade, the second framework switch, and the second time a senior engineer leaves. 
One of these things you should be investing in, and most teams are picking the wrong one.</p><p>The architectural mistake is easy to make because the words are loaded. &#8220;Agent&#8221; sounds like the system. &#8220;Skill&#8221; sounds like a feature. The instinct is to spend engineering hours on the thing that sounds like the system and treat the features as cheap content somebody can fill in later. That instinct is exactly backward, and the cost of getting it wrong only shows up when the model under the agent changes, or the framework changes, or the loop pattern itself changes, which in the current ecosystem is roughly every quarter.</p><p>Look at what an agent actually is. Strip the marketing off any production system that calls itself agentic and you find the same skeleton: a loop that calls a model, parses the response, executes any tool calls, feeds the results back, and exits when the model says the work is done. Variations exist around how the loop manages state, how it handles retries, how it composes plans, but those are tactical choices on top of a pattern that fits on a page. The loop is not where your competitive advantage lives. The loop is where your competitive advantage gets written and rewritten as the field moves.</p><p>The skill is the opposite. A skill is the encoded knowledge of how a specific job gets done in your specific environment. How your team writes incident postmortems. How your billing system rolls forward a subscription that hit a card decline. How your sales team scopes a contract before it goes to legal. How your support engineers triage a Sev 2 against a Sev 3. That knowledge took years to accumulate. It lives in your senior people, in your Confluence pages nobody reads, in the Slack scrollback from the last time something broke. 
Packaging it as a skill is the act of moving it out of those places and into a format an agent can use today and a different agent can use tomorrow.</p><p>That portability is the part the field is sleeping on. A skill written against the Claude Skills format is a folder of markdown and optionally some scripts. The format is small enough that translating it to whatever Anthropic&#8217;s competitors ship next quarter is a half-day exercise, not a rewrite. The agent loop that consumes the skill might get replaced four times in the next eighteen months as the frameworks shake out. The skill itself, if you wrote it against the actual workflow and not against the framework&#8217;s idioms, survives every one of those replacements.</p><p>The team I am running has tested this directly. We have rebuilt the agent layer twice in the last nine months. Once when we moved off a vendor-hosted runtime to a self-hosted one, once when we swapped the underlying framework to get better tracing. Both rewrites took less than a week. The skills moved across unchanged. The reverse experiment, where we tried to keep one agent and rewrite the skill collection inside it, took six weeks the one time we did it and produced worse results than what we had started with. The economics are not subtle.</p><p>The complication, and the reason teams do not arrive at this on their own, is that building skills first feels slow. You spend two weeks writing a skill that documents how a workflow runs, and at the end of those two weeks you do not have an agent. You have a folder of markdown. The pull toward the agent is the pull toward something demoable, something a stakeholder can see moving on a screen, something that produces a Slack thread with screenshots. The skill produces none of that until it is mounted into a loop, and even then the visible artifact is the loop, not the skill underneath it.</p><p>The honest counterargument is that not every workflow deserves to be a skill. 
Plenty of agent work is one-off, exploratory, glue between two systems that will not exist in six months. Writing those as skills is over-engineering, and the team that tries to formalize everything is going to drown in a folder of half-finished SKILL.md files that nobody trusts. The judgment call is which workflows are durable enough to encode and which ones should stay in the loop as inline instructions. The rough heuristic I have landed on is that anything you have explained more than twice to a teammate is already a skill in disguise, and the act of writing it down is just making the existing knowledge legible to the agent.</p><p>The deeper reframe is that the agent is infrastructure and the skill is product. Treating the agent as product is the same category error as treating Kubernetes as product. The orchestration layer matters, the choices it makes have real consequences, and it is worth getting right, but the thing your customers or users actually care about is the work that gets done on top of it. Nobody buys a Kubernetes cluster. They buy what runs on it. Nobody is going to buy an agent loop. They are going to buy the collection of skills that the loop happens to execute, and the loop is going to be the cheap, replaceable substrate that makes those skills run.</p><p>What does it look like to actually build this way? It looks like a repository structure where the skills directory is the part with the most commit activity, the most code review, the most test coverage, and the most attention from senior engineers. It looks like a CI pipeline that lints skills the way you would lint code, with the description field validated against an LLM-as-judge to confirm it triggers correctly on representative prompts. It looks like a versioning scheme for skills that lets you ship a v2 without breaking the agents that pin to v1. 
It looks like an internal review process where shipping a new skill goes through the same gates as shipping a new microservice, because in any system that takes this seriously, that is exactly what it is.</p><p>The agent, in that world, is the smallest reasonable amount of code that loads the skills, runs the loop, handles the edges, and exits. The agent is something a competent engineer writes in a week. The skills are what the company spends years building. The investment ratio inverts, and the surface area of the thing that has to change every time the model improves shrinks to the part nobody is emotionally attached to.</p><p>This is the architectural shift the next year is going to force. The teams that get there early will have a portable, composable library of capabilities they can run against whatever the best frontier model is on any given Tuesday. The teams that stay in the agent-first mindset will spend the next year rewriting their loop every time the framework landscape shifts, and they will end that year exactly where they started.</p><p>Stop building agents. Build skills. 
The agent is the part you throw away.</p><div><hr></div><p><em>If this was useful, forward it to one engineer who needs less noise in their feed.</em></p>]]></content:encoded></item><item><title><![CDATA[Claude Skills: The Most Underrated Feature in the Agent Ecosystem]]></title><description><![CDATA[A folder with a markdown file, and somehow it solves the problem every other vendor is still trying to ship around.]]></description><link>https://signalovernoise.tech/p/claude-skills-the-most-underrated</link><guid isPermaLink="false">https://signalovernoise.tech/p/claude-skills-the-most-underrated</guid><dc:creator><![CDATA[Justin Wilson]]></dc:creator><pubDate>Sun, 24 May 2026 10:39:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!I1LT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a091366-9d4d-4128-b0cf-ff550f9c5b70_1728x960.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!I1LT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a091366-9d4d-4128-b0cf-ff550f9c5b70_1728x960.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!I1LT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a091366-9d4d-4128-b0cf-ff550f9c5b70_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!I1LT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a091366-9d4d-4128-b0cf-ff550f9c5b70_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!I1LT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a091366-9d4d-4128-b0cf-ff550f9c5b70_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!I1LT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a091366-9d4d-4128-b0cf-ff550f9c5b70_1728x960.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!I1LT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a091366-9d4d-4128-b0cf-ff550f9c5b70_1728x960.png" width="1456" height="809" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2a091366-9d4d-4128-b0cf-ff550f9c5b70_1728x960.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2099945,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://signalovernoise.tech/i/199054630?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a091366-9d4d-4128-b0cf-ff550f9c5b70_1728x960.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!I1LT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a091366-9d4d-4128-b0cf-ff550f9c5b70_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!I1LT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a091366-9d4d-4128-b0cf-ff550f9c5b70_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!I1LT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a091366-9d4d-4128-b0cf-ff550f9c5b70_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!I1LT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a091366-9d4d-4128-b0cf-ff550f9c5b70_1728x960.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>A folder with a markdown file, and somehow it solves the problem every other vendor is still trying to ship around.</em></p><div><hr></div><p>Anthropic shipped the cleanest abstraction in the agent ecosystem six months ago, and most teams I talk to still cannot tell me what it does. They have opinions about MCP. They have spreadsheets comparing tool-use latency across vendors. They have a slide deck on &#8220;system prompt strategies&#8221; that runs to forty pages. Ask them what a Claude Skill is and you get a pause, then a guess that it is some kind of plugin, or maybe a custom tool, or possibly the same thing as an MCP server. It is not. 
It is a different primitive, and once you see it for what it is, the agent architecture conversation simplifies in a way that is genuinely hard to walk back.</p><p>A skill, in the Claude implementation, is a folder. The folder contains a file called SKILL.md with YAML frontmatter on top: a name, a description, and optionally a license or version. Below the frontmatter is the body, which is instructions to the agent about what the skill does and how to use it. The folder may also contain other files: reference documents the skill points to, runnable scripts the agent can execute, templates it can fill in.</p><p>That is the whole format. There is no registry, no manifest, no install step. You put the folder somewhere Claude can read it, and the skill exists.</p><p>The cleverness is in how the agent loads it. Claude does not pull the full skill body into context at the start of a session. It reads only the name and description from the frontmatter, across every skill available to it, and keeps those in working memory the way a human keeps a mental index of what tools are in the drawer. When the user asks a question that matches a description, the agent reaches into the folder and loads the body. If the body references other files, those load only when the agent actually opens them. This is what Anthropic calls progressive disclosure, and it is the part that makes the abstraction scale. A team can ship fifty skills to an agent without burning context on any of them until the moment one is needed.</p><p>This is where the confusion with MCP starts, and it is worth being precise. MCP is a protocol. It connects an agent to a running server that exposes tools and resources over a defined interface. The server is a process; it has uptime; it can be remote; it handles auth and state and concurrency.</p><p>A skill is none of those things. A skill is a static folder of text and optional scripts. The skill does not run anywhere until the agent decides to use it. 
MCP solves &#8220;how does the agent reach the running system that does the work.&#8221; Skills solve &#8220;how does the agent know what work is even possible and how to do it.&#8221;</p><p>Those are different problems, and the right architecture often uses both. A skill might instruct the agent on a workflow that calls four MCP tools in a specific sequence. The skill is the recipe; the MCP server is the kitchen.</p><p>The confusion with tools, in the function-calling sense, is the other failure mode. A tool is a single function with a schema, exposed to the model so it can call it with structured arguments. A skill is a packaged capability that may contain tools, reference files, and runnable scripts on top of its instructions, and at the lower bound contains nothing but the instructions.</p><p>The smallest valid skill is a SKILL.md with a description and three paragraphs of instructions. That is a complete, useful skill. It teaches the agent how to do something it could not do before, with no code, no server, no schema. The largest skill might be a folder of fifty files and a runnable Python script that the agent invokes via a shell tool. Both are skills, both load through the same progressive-disclosure mechanism, and the format does not care.</p><p>What makes the abstraction clean is the set of properties it has by default. Skills are discoverable through their descriptions, triggering the same way a tool would by matching intent against metadata. Skills are composable, because they are folders, so a skill can be made of other skills the same way a directory can contain subdirectories. Skills are versionable, because they live in git like any other text. Skills are shareable, because copying a skill across teams is copying a folder. Skills are progressively disclosed, which means the context cost is paid only on use.</p><p>None of those properties required a new protocol, a new runtime, or a new format. 
The abstraction worked because Anthropic resisted the temptation to invent any of those things and shipped a folder with a markdown file.</p><p>Where it earns its place is in everything you would otherwise stuff into a system prompt that grows without bound. The skill is the right place for &#8220;how this team writes pull request descriptions,&#8221; for &#8220;the seven steps to onboard a new customer in our billing system,&#8221; for &#8220;what our internal glossary calls a deal versus an account versus a contract.&#8221; Those are not tools. Those are not facts. They are workflows and conventions and the kind of knowledge that lives in a wiki nobody reads. Skills are the agent-shaped expression of that knowledge, and the format is small enough that a domain expert can write one without learning a framework.</p><p>The honest limitations matter, and none of them explain why the feature itself is underrated, which is a separate problem with a different cause. Skills are Claude-only today, with no portable spec across vendors, which means a skill you write does not run on GPT or Gemini without translation. The description field is load-bearing in a way that takes a few tries to get right; a description that is too generic gets ignored by the agent, and a description that is too narrow misses the cases you wanted it to handle.</p><p>The progressive disclosure model is not free, because a skill that auto-loads on a tangential match still spends tokens on its body before the agent decides it was the wrong choice. The ecosystem for sharing skills across teams and organizations is embryonic, with no real registry or distribution model, which is the gap our team has been filling with the npm-packaged approach the rest of this arc covers. 
The scripts a skill includes run in the agent&#8217;s own environment, which means anything you ship as a skill carries the security posture of the host process, not the sandboxed isolation a remote MCP server would give you.</p><p>The reason it is underrated is mostly a marketing accident. Anthropic shipped skills in a docs update that landed between two louder launches, and the community read it as &#8220;oh, a way to organize prompts,&#8221; then moved on. The MCP announcement got conference talks; skills got a paragraph in a changelog.</p><p>Half the engineers I have talked to about this still think a skill is a fancy system prompt. The other half think it is a competitor to MCP. The fact that it is neither, and that the abstraction is small enough to fit in a folder, makes it easy to overlook and hard to evaluate against the noise.</p><p>This is the gap the rest of this week&#8217;s posts are written to close. The next one argues that the architectural shift is to build skills first and treat the agent as the cheap, replaceable wrapper around them. The day after, I cover the npm-packaging pattern our team has landed on for distributing skills across the org with the same workflow as any other library. Then comes the case for treating agents as code you check in, not SaaS you subscribe to, and the comparison against the hosted tool-registry model that several vendors are trying to sell as the alternative.</p><p>The throughline is what yesterday set up. The framework you pick wraps the loop. The skills you ship are the product. 
Claude Skills are the closest thing to a clean specification of that unit that anyone has put in front of practitioners, and the year ahead is going to be defined by who notices.</p><div><hr></div><p><em>If this was useful, forward it to one engineer who needs less noise in their feed.</em></p>]]></content:encoded></item><item><title><![CDATA[Built-In Tools Are the Whole Game]]></title><description><![CDATA[A loop with nothing to call is a chatbot.]]></description><link>https://signalovernoise.tech/p/built-in-tools-are-the-whole-game</link><guid isPermaLink="false">https://signalovernoise.tech/p/built-in-tools-are-the-whole-game</guid><dc:creator><![CDATA[Justin Wilson]]></dc:creator><pubDate>Sat, 23 May 2026 10:40:57 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!plTF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bee0476-f843-41b9-b355-cb8ac189ebf9_1728x960.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!plTF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bee0476-f843-41b9-b355-cb8ac189ebf9_1728x960.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!plTF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bee0476-f843-41b9-b355-cb8ac189ebf9_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!plTF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bee0476-f843-41b9-b355-cb8ac189ebf9_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!plTF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bee0476-f843-41b9-b355-cb8ac189ebf9_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!plTF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bee0476-f843-41b9-b355-cb8ac189ebf9_1728x960.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!plTF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bee0476-f843-41b9-b355-cb8ac189ebf9_1728x960.png" width="1456" height="809" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6bee0476-f843-41b9-b355-cb8ac189ebf9_1728x960.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1985955,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://signalovernoise.tech/i/198949780?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bee0476-f843-41b9-b355-cb8ac189ebf9_1728x960.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!plTF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bee0476-f843-41b9-b355-cb8ac189ebf9_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!plTF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bee0476-f843-41b9-b355-cb8ac189ebf9_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!plTF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bee0476-f843-41b9-b355-cb8ac189ebf9_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!plTF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6bee0476-f843-41b9-b355-cb8ac189ebf9_1728x960.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>A loop with nothing to call is a chatbot. The tools are the product.</em></p><div><hr></div><p>Two teams I talked to this month went down the same road and ended up in different places. The first team had been benchmarking LangGraph against CrewAI against Pydantic AI for four weeks, with a spreadsheet, with side-by-side latency tests, with a draft RFC they were going to standardize on once the decision landed. The second team picked the first framework that compiled on a Tuesday afternoon and spent the same four weeks plumbing their agent into Salesforce, their billing system, their internal Confluence, their support ticket queue, and a read-replica of their orders table. The first team had three demos that worked on synthetic data. 
The second team had something that handled real customer questions on real accounts and saved their support team about fifteen hours a week.</p><p>The framework choice did not matter. The framework choice has never been the point.</p><p>This is the part the agent ecosystem keeps refusing to admit. We have spent eighteen months treating &#8220;which framework&#8221; as the load-bearing architectural decision. Every conference talk is on state machines and graph topology. Every comparison thread on Hacker News pulls thousands of comments. Every Twitter post benchmarks decorators against builder patterns. Meanwhile the differentiator that decides whether your agent ships value is what it can touch. Tools. Tool integrations. The specific, boring, business-shaped capabilities wired into the loop. Not the loop.</p><p>A framework, stripped of marketing language, is a wrapper around a while-loop, a tool-calling protocol, and some state. The frameworks differ in ergonomics, in observability hooks, in how they handle parallel tool calls or human-in-the-loop pauses or streaming responses. Those differences matter once you have something worth running. They do not produce value on their own. A LangGraph agent with no tools and a CrewAI agent with no tools and a Pydantic AI agent with no tools are all the same agent. They are a chatbot. A chatbot is not what anyone is buying.</p><p>What people are buying is the agent that knows their customers, queries their data, files their tickets, schedules their meetings, reads their docs, and writes back into the systems that run their business. None of that lives in the framework. All of it lives in the tools the framework can call. The framework debate is a distraction from the only question that predicts whether the agent does anything useful: what does it have access to?</p><p>The built-in tool catalog from the major vendors hides the second layer of this conversation. 
Anthropic ships Claude with web search, code execution, computer use, and a file system tool, all callable from the API without writing custom integration code. OpenAI&#8217;s Agents SDK pulls in web search, a code interpreter, file search, and a hosted image generator. Google&#8217;s Gemini agents ship with native Google Workspace and search access. Those are the gimme tools. They are useful. They are also the easy ones. Every team that ships an agent gets the same web search and the same code execution, which is why none of those capabilities differentiate any product anyone is paying for.</p><p>The hard tools are the ones that touch your business. The Salesforce integration that knows the difference between an opportunity and a contact and writes back to the right object. The Postgres tool that respects row-level security and does not return PII to a user who should not see it. The Stripe tool that knows the difference between a refund and a chargeback dispute and asks for human confirmation before calling either. The ticketing tool that creates a Linear issue with the right project, the right team, and the right priority based on the user&#8217;s actual intent. Those are the tools that make the agent worth paying for. They are also the tools nobody writes blog posts about, because they are not interesting on a slide deck. They are auth flows, schema mapping, error handling, idempotency keys, rate-limit backoff, and the kind of integration code that makes engineers groan when they see the JIRA ticket.</p><p>This is the work the framework debate keeps hiding. A team that spent six months choosing between LangGraph and Strands has spent the same six months not building the Confluence tool, not writing the Salesforce integration, not figuring out the safe-write semantics for their billing system. Those are not framework problems. No framework solves them. A framework gives you a function signature for <code>register_tool</code> and gets out of the way. 
The work happens after that signature, and it is the work that produces the product.</p><p>The cleanest abstraction the industry has produced for this work is the skill. Anthropic shipped Claude Skills as the first vendor implementation that names the thing: a skill is a packaged capability with instructions, files, and tools bundled together, that the agent loads on demand from a folder. The MCP server ecosystem covers a different shape of the same idea, treating each capability as a small server the agent connects to over a protocol. Both are converging on the same insight. The unit of agent capability is not the framework you wrote it in. It is the tool, the instructions for using the tool, and the auth that lets the tool reach into the system it represents. That bundle is reusable across agents and reusable across frameworks. The framework you pick is the harness. The skills are the work.</p><p>The honest objection is that framework choice does still matter at scale, and it does, marginally. The team running a hundred agents in production cares about debuggability, lock-in, observability, type safety, and team-shareability. Those are real concerns and a framework with weak observability hooks will cost you incident hours when something breaks at three in the morning. None of that, though, is what predicts whether the first agent ships value. The first agent ships value because it has the tools to do something useful. Every team I know that bounced off agents bounced because their agent could not touch the systems that mattered. They had a working LangGraph. They did not have a working integration.</p><p>This reframe is also why the framework wars matter less than the marketing on either side wants you to believe. The skills arc this month picks up here. The framework wars get a scorecard at the end of May, and that scorecard is going to read as anticlimactic, because every framework on the list does the wrapping job well enough to ship. 
What separates the teams that ship from the teams that demo is the catalog of capabilities they have built into their agent: which systems it can reach, which actions it can take, which workflows it can complete end to end without a human gluing the steps together. The next few posts cover Claude Skills as the cleanest vendor-shipped expression of that catalog, the packaging pattern we have landed on for distributing skills as npm packages across teams, and the architectural argument for treating skills as the unit of reuse instead of the agent.</p><p>Stop arguing about the loop. Start cataloging the tools. That is the only inventory that predicts whether you ship.</p><div><hr></div><p><em>If this was useful, forward it to one engineer who needs less noise in their feed.</em></p>]]></content:encoded></item><item><title><![CDATA[DSPy: The Stanford Project That Treats Prompts as a Compiler Problem]]></title><description><![CDATA[The prompt is not the artifact.]]></description><link>https://signalovernoise.tech/p/dspy-the-stanford-project-that-treats</link><guid 
isPermaLink="false">https://signalovernoise.tech/p/dspy-the-stanford-project-that-treats</guid><dc:creator><![CDATA[Justin Wilson]]></dc:creator><pubDate>Fri, 22 May 2026 10:47:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!oB2G!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce11bc2-617b-4eb9-aca3-d175a88c163b_1728x960.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oB2G!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce11bc2-617b-4eb9-aca3-d175a88c163b_1728x960.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oB2G!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce11bc2-617b-4eb9-aca3-d175a88c163b_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!oB2G!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce11bc2-617b-4eb9-aca3-d175a88c163b_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!oB2G!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce11bc2-617b-4eb9-aca3-d175a88c163b_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!oB2G!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce11bc2-617b-4eb9-aca3-d175a88c163b_1728x960.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!oB2G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce11bc2-617b-4eb9-aca3-d175a88c163b_1728x960.png" width="1456" height="809" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5ce11bc2-617b-4eb9-aca3-d175a88c163b_1728x960.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2076429,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://signalovernoise.tech/i/198826592?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce11bc2-617b-4eb9-aca3-d175a88c163b_1728x960.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oB2G!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce11bc2-617b-4eb9-aca3-d175a88c163b_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!oB2G!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce11bc2-617b-4eb9-aca3-d175a88c163b_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!oB2G!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce11bc2-617b-4eb9-aca3-d175a88c163b_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!oB2G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce11bc2-617b-4eb9-aca3-d175a88c163b_1728x960.png 1456w" 
sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>The prompt is not the artifact. The compiler is.</em></p><div><hr></div><p>The third time you rewrite the same production prompt because a new model arrived, you suspect the problem is not the prompts. It&#8217;s the loop that produced them. Every team I know has a folder of brittle, hand-tuned prompts that work right now and that nobody wants to touch. 
Swap the model, retune all of them by hand, and hope the regressions show up in QA instead of three weeks later in a customer complaint.</p><p>DSPy is the Stanford project that looked at that loop in 2022 and decided prompts were a compiler problem. You don&#8217;t write the prompt. You declare what the program needs to do, hand the framework a metric and some training examples, and let it generate and refine the prompts for you. The framework grew out of earlier work on Demonstrate-Search-Predict and ColBERT, which is why the early docs read more like a research paper than a tool guide. The framing has been throwing people off ever since.</p><p>Here is what it actually looks like in code. You define a signature: <code>context, question -&gt; answer</code>. That signature behaves like a Python type hint with intent baked in. You wire it into a module: <code>dspy.Predict</code> for a straight call, <code>dspy.ChainOfThought</code> when you want reasoning steps, <code>dspy.ReAct</code> when the module needs tool use. The module is the executable. The signature tells DSPy what the module is supposed to accomplish, not how the prompt should be worded.</p><p>The interesting part is the compile step. You hand DSPy an optimizer (a &#8220;teleprompter&#8221; in the original naming, which is part of why people think the project is exotic), a metric function, and a small training set. <code>BootstrapFewShot</code> generates demonstrations from your training data. <code>MIPROv2</code> searches over instructions and few-shot examples jointly. <code>COPRO</code> rewrites instructions when the demonstrations alone aren&#8217;t carrying the program. The output is an optimized version of your module with selected examples and refined instructions baked in. You don&#8217;t see the prompts unless you ask. You see better metrics.</p><p>That&#8217;s the part most teams miss. DSPy is not a prompt library. It&#8217;s a compiler that produces prompts as an artifact. 
The mental model shift is the same one that happened when teams moved from hand-tuning SQL to letting the query planner do it. Some queries you still want to write by hand. Most, you don&#8217;t.</p><p>The honest case for using DSPy in 2026 is model migration. Most production teams I see are on their third or fourth model swap since GPT-4 came out: GPT-4o, Sonnet 3.5, Sonnet 4, Sonnet 4.6, Haiku 4.5 when latency matters, open-source mixes when cost dominates. Each swap, the team gets to retune the prompts that worked on the last model. If those prompts were declared in DSPy, the swap is a re-compile against the new model and a delta-check on the metric. That&#8217;s the difference between an afternoon and a sprint.</p><p>The reframe has a price, and most teams haven&#8217;t paid it yet. DSPy compiles against a metric. If your metric is &#8220;does the output look right when I read five examples,&#8221; your compiled program will be tuned to look right when you read five examples. The metric has to correlate with production quality. Building that metric is half the work, and most teams haven&#8217;t done it. They&#8217;re running production AI without a way to score outputs at scale, which is also why they can&#8217;t tell when a model swap silently regressed their pipeline.</p><p>The same problem hits the training set. DSPy&#8217;s optimizers learn from whatever examples you hand them. If your training set is twenty hand-picked clean cases, the compiled program will be sharp on twenty hand-picked clean cases and brittle on everything else. The dataset has to represent prod. That means messy inputs, edge cases, and the kind of failure modes you would find in your worst customer tickets. This is closer to test data discipline than prompt engineering, which is part of the point.</p><p>There are real costs beyond the dataset work. 
A MIPROv2 compile run can cost hundreds of dollars on a complex pipeline, especially if you&#8217;re optimizing against a frontier model. The abstraction feels heavy for the first few hours. Debugging a compiled program is harder than debugging a hand-written prompt, because you&#8217;re now debugging the compiler&#8217;s choices alongside your own. None of these is a blocker, but they are real, and teams that skip the learning curve quietly fall back to hand-written prompts within a week.</p><p>The framework is still underrated in 2026, which is part of why it belongs in a month focused on tools. The agent framework debate keeps pulling oxygen out of conversations about how the prompts inside those agents actually get built. Most agent frameworks treat prompts as strings you write and tune by hand. DSPy treats them as artifacts you compile. Those two stances on prompts don&#8217;t have the same future. The 2024 and 2025 work on MIPROv2 and BetterTogether closed enough of the gap between research demos and production code that if you tried DSPy in 2023 and walked away, the project deserves a second look.</p><p>DSPy is the wrong fit when the prompt is a one-shot formatting task and the metric would cost more to build than the prompt would cost to maintain. A retrieval call that asks the model to extract JSON from a known schema does not need a compiler. A multi-stage RAG pipeline where the first call&#8217;s misclassification cascades into the third call&#8217;s hallucination is exactly where the compiler earns its place. The judgment call is whether the program has enough moving parts to justify the harness work.</p><p>If your team has already built an eval harness and a representative dataset, you&#8217;re eighty percent of the way to getting real value from DSPy. The remaining work is wiring your existing modules through <code>dspy.Predict</code> or <code>dspy.ChainOfThought</code> and watching what the compiler does with them. 
If your team hasn&#8217;t built that harness yet, that&#8217;s the project worth starting first. The compiler is waiting.</p><div><hr></div><p><em>If this was useful, forward it to one engineer who needs less noise in their feed.</em></p>]]></content:encoded></item><item><title><![CDATA[Your RAG Pipeline Is an Agent Now, Whether You Know It or Not]]></title><description><![CDATA[An engineer walked me through his &#8220;RAG pipeline&#8221; last week and, by the third minute, had described query rewriting, intent classification, multi-hop retrieval across two vector stores, a Cohere reranker, a tool call into Salesforce for live opportunity data, conditional routing between summarization and direct-answer modes, and a retry loop when the structured-output validator rejected the model&#8217;s response.]]></description><link>https://signalovernoise.tech/p/your-rag-pipeline-is-an-agent-now</link><guid isPermaLink="false">https://signalovernoise.tech/p/your-rag-pipeline-is-an-agent-now</guid><dc:creator><![CDATA[Justin Wilson]]></dc:creator><pubDate>Thu, 21 May 2026 10:24:02 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!o4uZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1779de56-5368-4a17-b8fa-546d6b0502de_1728x960.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!o4uZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1779de56-5368-4a17-b8fa-546d6b0502de_1728x960.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!o4uZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1779de56-5368-4a17-b8fa-546d6b0502de_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!o4uZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1779de56-5368-4a17-b8fa-546d6b0502de_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!o4uZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1779de56-5368-4a17-b8fa-546d6b0502de_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!o4uZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1779de56-5368-4a17-b8fa-546d6b0502de_1728x960.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!o4uZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1779de56-5368-4a17-b8fa-546d6b0502de_1728x960.png" width="1456" height="809" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1779de56-5368-4a17-b8fa-546d6b0502de_1728x960.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1662376,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://signalovernoise.tech/i/198680924?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1779de56-5368-4a17-b8fa-546d6b0502de_1728x960.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!o4uZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1779de56-5368-4a17-b8fa-546d6b0502de_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!o4uZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1779de56-5368-4a17-b8fa-546d6b0502de_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!o4uZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1779de56-5368-4a17-b8fa-546d6b0502de_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!o4uZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1779de56-5368-4a17-b8fa-546d6b0502de_1728x960.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p>An engineer walked me through his &#8220;RAG pipeline&#8221; last week and, by the third minute, had described query rewriting, intent classification, multi-hop retrieval across two vector stores, a Cohere reranker, a tool call into Salesforce for live opportunity data, conditional routing between summarization and direct-answer modes, and a retry loop when the structured-output validator rejected the model&#8217;s response. Then he said the latency was up and asked if I had ideas on tuning the chunker.</p><p>The chunker is not the problem. The label is.</p><p>Whatever this thing is, it stopped being a retrieval-augmented generation pipeline a while ago. The 2024 RAG playbook had four steps: embed the query, retrieve top-k, stuff context into the prompt, generate. 
That is not a description of any production system I have reviewed this year. The diagrams still say RAG. The runbooks still say RAG. The JIRA tickets still say RAG. The runtime behavior is something else entirely.</p><p>The line between RAG and agents was always thinner than people pretended. Retrieval is a tool call. Reranking is a control-flow decision. Conditional routing through different prompts is a loop with branching. Each of those moves, in isolation, was a small and sensible improvement on a brittle baseline. Stacked together over eighteen months, they add up to a non-deterministic loop with state, tool access, and conditional control flow. That is the working definition of an agent in every framework that ships one.</p><p>What practitioners actually built, while telling themselves they were &#8220;improving the RAG pipeline,&#8221; reads like a tour of the agent playbook. Query decomposition, because single-shot retrieval missed compound questions. Reranking with Cohere or BGE, because semantic similarity alone returned topical-but-wrong context. Multi-hop retrieval, because the answer required chaining facts across documents. Tool calls into Postgres or Snowflake, because the customer wanted live numbers and not a vector of last quarter&#8217;s PDF. Conditional branching, because some questions needed retrieval and some needed a calculator. Retry logic with structured-output validation, because the model returned malformed JSON one time in fifty and oncall got tired of paging. Together, they ship as an agent.</p><p>The honest pushback here is that this is semantics. Call it a pipeline, call it an agent, the thing still answers customer questions and the chunker still needs tuning. That argument would hold if it were not for what the framework maintainers did next. Haystack 2.x rewrote its component model so tool integration is first-class, and the Pipeline type now does what a runtime control-flow graph does. 
LlamaIndex Workflows added explicit state, event-driven steps, and conditional routing, then started shipping templates labeled &#8220;agent&#8221; in the docs. Neither project pivoted. They grew into the shape their users were already forcing on them. The maintainers watched the issues and PRs roll in, and all of them were about multi-step, stateful, tool-using flows. The label changed because the work changed.</p><p>This relabeling problem matters because the operational story you wrote for a deterministic retrieval pipeline does not survive contact with a non-deterministic agent loop. Tracing a RAG pipeline means logging the query, the retrieved chunks, and the prompt. Tracing an agent means logging the trajectory: every step, every tool call, every branch decision, every retry. Eval gating a RAG pipeline scores retrieval precision and answer faithfulness against a golden set. For an agent, you score trajectories instead of endpoints, because two correct answers can come from one good path and one disaster path that happened to converge. Incident response on a RAG pipeline starts at the embedder and the index. On an agent, you ask which tool call failed, which branch fired, and which retry consumed the budget. The runbooks are not interchangeable.</p><p>The cost of the mislabel shows up on a Tuesday afternoon, when oncall posts &#8220;the RAG pipeline is broken&#8221; in the incident channel and starts checking the vector store. Two hours in, they find the actual problem: a tool call to Salesforce hit a rate limit, the retry loop ate the timeout budget, and the model produced a confident answer from stale context because the fallback path did not reflect the failure upstream. None of that lives in the RAG runbook, because on paper this is a RAG system. The team learns the wrong lesson, files the wrong ticket, and the next incident has the same shape because nobody updated the mental model. 
That is what the label costs you.</p><p>The upgrade path here is not what people assume it is. Nobody needs to migrate to LangGraph, swap Haystack for CrewAI, or rebuild on Pydantic AI to fix this. The framework you have is almost certainly fine. The Haystack 2.x, LlamaIndex Workflows, or Semantic Kernel pipeline you already run is a reasonable agent runtime. The upgrade is admitting what you have been running, then giving it the operational treatment an agent deserves. Trace every step end to end with OpenTelemetry into Langfuse or Phoenix. Name your loops in code so the trace UI shows you a graph instead of a flat call list. Eval-gate changes with promptfoo or Braintrust against trajectory-level test cases, not single-shot Q&amp;A pairs. Write the failure-mode catalog: what happens when the reranker times out, when the SQL tool errors, when the validator rejects, when the user asks something out of scope. Own the loops you already wrote.</p><p>Once you admit you have been building an agent, the architectural questions get easier. The &#8220;which framework should we adopt&#8221; debate stops mattering, because the framework you have already does the job and the migration cost is real. The &#8220;is our system an agent&#8221; debate stops mattering, because the answer is yes and the question is how to operate it. What remains is the only question that was ever interesting: what capabilities does your system have, and which ones do you give it next?</p><p>That is the question worth your time. 
The skills arc starts there.</p>]]></content:encoded></item><item><title><![CDATA[Semantic Kernel: The Enterprise Agent Framework Nobody Writes About]]></title><description><![CDATA[The framework AI Twitter ignored and Fortune 500 platform teams quietly shipped.]]></description><link>https://signalovernoise.tech/p/semantic-kernel-the-enterprise-agent</link><guid isPermaLink="false">https://signalovernoise.tech/p/semantic-kernel-the-enterprise-agent</guid><dc:creator><![CDATA[Justin Wilson]]></dc:creator><pubDate>Wed, 20 May 2026 09:43:28 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!bWU_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd29309d0-42ee-44f3-96b3-94fbc4d70e29_1728x960.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bWU_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd29309d0-42ee-44f3-96b3-94fbc4d70e29_1728x960.png" 
data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bWU_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd29309d0-42ee-44f3-96b3-94fbc4d70e29_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!bWU_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd29309d0-42ee-44f3-96b3-94fbc4d70e29_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!bWU_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd29309d0-42ee-44f3-96b3-94fbc4d70e29_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!bWU_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd29309d0-42ee-44f3-96b3-94fbc4d70e29_1728x960.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bWU_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd29309d0-42ee-44f3-96b3-94fbc4d70e29_1728x960.png" width="1456" height="809" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d29309d0-42ee-44f3-96b3-94fbc4d70e29_1728x960.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2449654,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://signalovernoise.tech/i/198535027?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd29309d0-42ee-44f3-96b3-94fbc4d70e29_1728x960.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bWU_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd29309d0-42ee-44f3-96b3-94fbc4d70e29_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!bWU_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd29309d0-42ee-44f3-96b3-94fbc4d70e29_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!bWU_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd29309d0-42ee-44f3-96b3-94fbc4d70e29_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!bWU_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd29309d0-42ee-44f3-96b3-94fbc4d70e29_1728x960.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p><em>The framework AI Twitter ignored and Fortune 500 platform teams quietly shipped.</em></p><div><hr></div><p>Nobody on AI Twitter writes about Semantic Kernel. The Fortune 500 platform teams I talk to keep shipping it. That gap is the most interesting thing about this framework, and it explains why Semantic Kernel has outlasted three full hype cycles while LangChain forks burn through their welcome.</p><p>Microsoft&#8217;s Semantic Kernel started life in 2023 as a Microsoft Research project for stitching LLMs into existing application code. Three years later, it has C#, Python, and Java SDKs, a stable plugin model, a process orchestration framework, and an Agent abstraction that finally landed properly in the 1.x line. 
None of that gets attention because none of it ships with a personality: the release notes read like release notes, the docs read like Microsoft Docs, and the samples are written in C#, which is the kiss of death for any AI library hoping to trend.</p><p>That&#8217;s the marketing problem. The engineering reality is that Semantic Kernel quietly does the things enterprise platform teams need a framework to do, in the languages they already write, with the auth providers they already use. The plugin model is the cleanest separation of capability-from-orchestration in any framework I&#8217;ve evaluated: a plugin is a class, a function on that plugin is a method, type safety carries through to the model&#8217;s tool-use payload, and dependency injection works the way DI works everywhere else in the same codebase. None of this is exciting, which is precisely why it survives integration with a fifteen-year-old monolith.</p><p>The C# story matters more than people give it credit for. I have worked inside healthcare and insurance environments where the entire backend is .NET, audit logs are tied to Entra ID identities, and the security team will not approve a Python service for production because nobody on staff can read it. In those shops, &#8220;use LangChain&#8221; is not a decision the platform team can make. Semantic Kernel slots into the existing build pipeline, the existing observability stack, and the existing service mesh without anyone needing to argue about Python packaging. That alone is enough to ship it.</p><p>The Python SDK has caught up enough to matter. It trailed C# for the first eighteen months, which earned Semantic Kernel a reputation as the .NET one that has been hard to shake. As of the 1.x series, the Python feature parity is close enough that you can pick the language based on team preference instead of capability gaps. 
The Java SDK is further behind, though it exists, which is more than most agent frameworks can say about anything outside Python.</p><p>Where it gets interesting is the Process Framework. Semantic Kernel&#8217;s process model is a graph of steps with explicit event-driven transitions, close to what LangGraph offers but expressed in a way that does not feel like a research artifact. Each step is a regular class. State flows through events. The execution model is observable through the same telemetry pipeline as the rest of your application. If you have built workflows in Azure Durable Functions or AWS Step Functions, the mental model maps directly, and that conceptual continuity is what makes the framework palatable to platform engineers who have spent careers building systems that survive on-call rotations.</p><p>The Agent framework is the newer piece, and it shows the tradeoffs of Microsoft&#8217;s ship-slowly-break-less cadence. Compared to AutoGen 0.4 or CrewAI, the agent abstractions feel deliberately conservative. You get ChatCompletionAgent, OpenAIAssistantAgent, and an AgentGroupChat coordinator for multi-agent scenarios. There is no orchestrator-of-orchestrators pattern, no exotic delegation primitives, no team-of-specialists DSL. It does the basic thing well and leaves the unusual stuff to your own code. For production work, that&#8217;s usually what you actually want. For a hackathon, it&#8217;s underwhelming.</p><p>The places it falls short are predictable. Documentation sprawls across three language tracks and reads like reference material rather than narrative, which is fine when you know what you are looking for and brutal when you are learning. The community plugin ecosystem is a fraction of LangChain&#8217;s, so you will end up writing wrappers around vendor SDKs that someone in the LangChain orbit already wrote. 
The release cadence is slower than the rest of the field, which is a feature in production and a liability when a new model API drops and you are waiting on the official binding.</p><p>The vendor-lock question deserves a direct answer. Semantic Kernel is not Azure-only. It supports OpenAI, Azure OpenAI, Hugging Face, ONNX, Ollama, Mistral, Anthropic, and Google models through first-party or community connectors. It runs anywhere .NET, Python, or Java runs. The Microsoft incentive shows up in subtler ways: the path of least resistance for memory and embeddings points at Azure AI Search, the auth examples assume Entra ID, the deployment patterns lean on Azure Container Apps. None of that is a lock. It is gravity, and most teams who choose Semantic Kernel have already chosen to live inside that gravity for unrelated reasons.</p><p>The honest comparison runs like this. If your team writes Python and lives on AI Twitter, you will hate Semantic Kernel and pick something with more momentum. If your team writes C# or Java, ships to enterprise customers, and answers to a procurement org that wants Microsoft on the receipt, Semantic Kernel is the only serious option in the agent space. If your team writes Python inside an enterprise that has standardized on Azure, the Python SDK gets you the same compliance posture without forcing a language switch. Picking the wrong one of these three modes is how teams end up with frameworks they cannot defend in a security review six months later.</p><p>The Fortune 500 platform teams that quietly ship Semantic Kernel are not making a hype-resistant statement. They are picking the tool that fits the rest of their stack, that their security team will approve, and that has Microsoft&#8217;s name on the support contract. Those criteria do not generate Twitter threads. They do generate production systems that ship and stay shipped. Most of the AI tooling conversation is calibrated for the first hour of building something. 
Semantic Kernel is calibrated for the third year of operating it. That gap, more than anything, is why the framework nobody writes about keeps showing up in the systems that actually matter.</p><div><hr></div><p><em>If this was useful, forward it to one engineer who needs less noise in their feed.</em></p>]]></content:encoded></item><item><title><![CDATA[LlamaIndex Quietly Became an Agent Framework]]></title><description><![CDATA[The pivot from RAG library to agent runtime is real, and the Workflow abstraction is why.]]></description><link>https://signalovernoise.tech/p/llamaindex-quietly-became-an-agent</link><guid isPermaLink="false">https://signalovernoise.tech/p/llamaindex-quietly-became-an-agent</guid><dc:creator><![CDATA[Justin Wilson]]></dc:creator><pubDate>Tue, 19 May 2026 09:40:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!5R0f!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f272fa-9c9f-4be5-8788-76935cd11e11_1728x960.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link 
image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5R0f!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f272fa-9c9f-4be5-8788-76935cd11e11_1728x960.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5R0f!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f272fa-9c9f-4be5-8788-76935cd11e11_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!5R0f!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f272fa-9c9f-4be5-8788-76935cd11e11_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!5R0f!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f272fa-9c9f-4be5-8788-76935cd11e11_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!5R0f!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f272fa-9c9f-4be5-8788-76935cd11e11_1728x960.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5R0f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f272fa-9c9f-4be5-8788-76935cd11e11_1728x960.png" width="1456" height="809" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b6f272fa-9c9f-4be5-8788-76935cd11e11_1728x960.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1909459,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://signalovernoise.tech/i/198384132?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f272fa-9c9f-4be5-8788-76935cd11e11_1728x960.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5R0f!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f272fa-9c9f-4be5-8788-76935cd11e11_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!5R0f!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f272fa-9c9f-4be5-8788-76935cd11e11_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!5R0f!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f272fa-9c9f-4be5-8788-76935cd11e11_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!5R0f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f272fa-9c9f-4be5-8788-76935cd11e11_1728x960.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p><em>The pivot from RAG library to agent runtime is real, and the Workflow abstraction is why.</em></p><div><hr></div><p>Two years ago I used LlamaIndex to do one thing: index a directory of policy PDFs and answer questions against them. The query engine worked. The retriever worked. I didn&#8217;t think about it again. Last month I opened the latest docs to upgrade that pipeline and found a framework I barely recognized.</p><p>The README now describes LlamaIndex as the leading framework for building LLM-powered agents over your data. Two years ago it said something closer to a data framework for LLM applications. The shift is real. Workflows now sits at the center of the architecture. Agents are described as workflows. RAG pipelines are described as workflows. 
The thing that used to be a document indexing library is now an event-driven orchestration engine that happens to ship with the best document loaders in the ecosystem.</p><p>This is the kind of pivot vendors usually botch. The old API gets bolted onto the new one with adapter shims, the documentation forks, the community splits between people who learned the old way and people learning the new way. LlamaIndex avoided most of that. The Workflow primitive is a better abstraction than the Agent class it largely replaced, and the team had the discipline to make the migration path obvious instead of hiding it behind marketing copy.</p><p>Workflows are event-driven steps connected by typed messages. A step receives an event, does some work, and emits another event. The runtime handles routing, retries, and parallel execution. If that sounds like Temporal or Inngest with an LLM-flavored wrapper, that&#8217;s roughly right. The difference is that the events carry LLM-relevant context cleanly and the step decorators handle async-first execution without ceremony. You write a step like this:</p><pre><code>@step
async def retrieve(self, ev: QueryEvent) -&gt; RetrievalEvent:
    nodes = await self.retriever.aretrieve(ev.query)
    return RetrievalEvent(nodes=nodes, query=ev.query)</code></pre><p>Three things matter here. The step is typed in both directions, so the workflow engine can validate that your graph actually wires together before any model call happens. The step is async, so I/O-heavy work runs without blocking. The event is a real object, not a dict, which means refactoring across steps doesn&#8217;t degrade into find-and-replace across keys you hoped you spelled the same everywhere.</p><p>This is the part that pulled me back in. Most agent frameworks I evaluated last quarter either pretended async didn&#8217;t exist or required me to wrap their synchronous APIs in thread pools to recover throughput. LlamaIndex Workflows assumes async. The agent loop is a workflow where one step calls a model and another step routes based on tool calls. There is no separate agent execution model fighting the rest of your code.</p><p>The honest comparison is against LangGraph, which is the framework people reach for when they want graph-based agent orchestration. LangGraph&#8217;s state-machine model is more rigorous. You declare nodes and edges explicitly, the state object is shared and reducible, and the supervisor pattern for multi-agent setups is more developed. LlamaIndex Workflows trades some of that rigor for ergonomics. The event-driven model feels closer to writing normal Python and less like configuring a state machine. For a small team that needs to ship something maintainable, this matters more than the theoretical purity of the abstraction.</p><p>The pivot does have rough edges. The legacy Agent and AgentRunner classes still exist in the codebase, still appear in older tutorials, and still work, which means new practitioners hit the documentation and have to figure out which abstraction is the current one. The team has moved hard toward AgentWorkflow as the canonical pattern, but the older surface area hasn&#8217;t been deprecated cleanly. 
If you&#8217;re starting today, ignore everything that doesn&#8217;t say workflow and you&#8217;ll save yourself a week.</p><p>The other rough edge is the multi-agent story. AgentWorkflow handles handoffs and shared state between agents, but the patterns are less mature than what LangGraph or CrewAI offer. If your use case is one agent with a real toolset and a real retrieval layer, LlamaIndex is at or near the top of the lineup. If your use case is a coordinated team of specialist agents with handoffs and shared scratchpads, you&#8217;ll spend more time reinventing what other frameworks give you out of the box.</p><p>What didn&#8217;t change is the data layer, and this is the actual reason to pick LlamaIndex over a more abstract framework. LlamaHub still has the largest catalog of document loaders, retrievers, and storage integrations in the ecosystem. LlamaParse handles the kind of PDFs that break every other parser I&#8217;ve tried, including the ones with three-column layouts and embedded tables that other libraries reduce to noise. If your agent needs to work with documents rather than chat about them, you start every other framework at a deficit because you&#8217;ll be reimplementing pieces LlamaIndex already shipped two years ago.</p><p>The pivot is real. It&#8217;s not naming. The Workflow abstraction is an architectural commitment, not a marketing gesture. The team didn&#8217;t paint agents on the side of the truck and call it new. They built an event-driven runtime, made it the core, and rewrote the agent patterns to sit on top of it. That&#8217;s the move I want to see when a tools company adapts to a new shape of work, and few teams have managed it without breaking their users along the way.</p><p>Where this lands in production: I&#8217;d use LlamaIndex today for any agent whose primary job is reasoning over documents or structured retrieval, particularly anything where the data layer matters more than the orchestration layer. 
For pure tool-using agents with no retrieval involved, I&#8217;d still reach for Pydantic AI when type safety dominates or LangGraph when explicit state machines do. The choice isn&#8217;t about which framework is best, it&#8217;s about which gap in your system matters most. LlamaIndex is the right answer when the gap is data.</p><p>The library that started as RAG infrastructure didn&#8217;t pretend to become something else. It built the orchestration layer it always needed, made workflows the spine, and kept the data integrations its users depended on. That&#8217;s the rarer transition. Most frameworks pivot by abandoning what made them useful. This one pivoted by building outward from it.</p><div><hr></div><p><em>If this was useful, forward it to one engineer who needs less noise in their feed.</em></p>]]></content:encoded></item><item><title><![CDATA[Haystack Is Still Here, And It's Better Than You Remember]]></title><description><![CDATA[The 2.x rewrite turned a RAG framework into a quietly serious agent platform, and almost nobody talked about 
it.]]></description><link>https://signalovernoise.tech/p/haystack-is-still-here-and-its-better</link><guid isPermaLink="false">https://signalovernoise.tech/p/haystack-is-still-here-and-its-better</guid><dc:creator><![CDATA[Justin Wilson]]></dc:creator><pubDate>Mon, 18 May 2026 10:46:49 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!1NEn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb163fef0-b9a5-4bd4-a44c-dd5d24411441_1728x960.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1NEn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb163fef0-b9a5-4bd4-a44c-dd5d24411441_1728x960.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1NEn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb163fef0-b9a5-4bd4-a44c-dd5d24411441_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!1NEn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb163fef0-b9a5-4bd4-a44c-dd5d24411441_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!1NEn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb163fef0-b9a5-4bd4-a44c-dd5d24411441_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!1NEn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb163fef0-b9a5-4bd4-a44c-dd5d24411441_1728x960.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!1NEn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb163fef0-b9a5-4bd4-a44c-dd5d24411441_1728x960.png" width="1456" height="809" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b163fef0-b9a5-4bd4-a44c-dd5d24411441_1728x960.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2074611,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://signalovernoise.tech/i/198242738?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb163fef0-b9a5-4bd4-a44c-dd5d24411441_1728x960.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1NEn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb163fef0-b9a5-4bd4-a44c-dd5d24411441_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!1NEn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb163fef0-b9a5-4bd4-a44c-dd5d24411441_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!1NEn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb163fef0-b9a5-4bd4-a44c-dd5d24411441_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!1NEn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb163fef0-b9a5-4bd4-a44c-dd5d24411441_1728x960.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>The 2.x rewrite turned a RAG framework into a quietly serious agent platform, and almost nobody talked about it.</em></p><div><hr></div><p>The last RAG service my team shipped without an emergency rollback ran on Haystack. The pipeline was eleven components long. It validated its own wiring at build time, ran async under FastAPI without bespoke glue, exported OpenTelemetry traces to the same dashboard the rest of the platform used, and required exactly one engineer to maintain. Nobody on the team gave a conference talk about it. 
That is the entire point.</p><p>Haystack is the open-source framework from deepset, a German NLP company that has been building production search and QA systems since before &#8220;agent framework&#8221; was a phrase that meant anything. The 1.x line, which most engineers tried once around 2022 and then quietly stopped thinking about, was a monolithic toolkit with separate indexing and query pipelines, a REST surface that did not quite feel native, and abstractions that had grown by accretion rather than design. If your memory of Haystack ends there, the catch-up is real, because Haystack 2.0 in March 2024 was a ground-up rewrite, not a minor version bump. The framework that wears the same name in 2026 is a different framework, and pretending otherwise is the reason it gets skipped in tool roundups by engineers who should know better.</p><p>The architectural primitive in 2.x is the Component, and every Component is a Python class with typed inputs, typed outputs, and a single <code>run</code> method. A Pipeline is a directed graph of Components with the connections drawn explicitly between named sockets. When the pipeline is constructed, Haystack walks the graph and validates that every connection links a producer output to a consumer input of a compatible type. Misnamed sockets, type mismatches, missing components, all caught at build time rather than at the moment a user query hits the broken edge in production. This sounds like table stakes until you compare it to the agent frameworks that spent April defining themselves as &#8220;agnostic&#8221; by making every payload an unstructured dictionary.</p><p>The composition story is what makes the pipeline model age well. A retrieval pipeline is a graph of an embedder, a retriever, and a ranker. A RAG pipeline adds a prompt builder and a generator. An agent pipeline adds an Agent component that owns the tool-calling loop and routes back through retrieval steps as needed. 
The same Component contract holds across all of them, which means the eleven-component pipeline I shipped uses the same primitives as the three-component prototype that came before it. The framework does not change shape as the problem grows.</p><p>The Agent component is where the 2.x story becomes a 2026 story. deepset added it in Haystack 2.12, and the design is intentionally narrow: an Agent is a Component that wraps a generator, a tool list, and a control loop, with state passed through the pipeline graph the same way any other intermediate value would be. There is no separate agent runtime, no parallel execution model, no second framework hiding inside the first. If LangGraph&#8217;s pitch is &#8220;state machines for agents,&#8221; Haystack&#8217;s pitch is closer to &#8220;agents are one Component in a pipeline you already understand.&#8221; That framing trades flexibility for legibility, and the trade pays off the third time someone unfamiliar with the codebase has to debug a production incident at 11pm.</p><p>The integration surface is the part that makes Haystack a viable default rather than a niche pick. The document store list covers Elasticsearch, OpenSearch, Weaviate, Qdrant, Pinecone, Chroma, Milvus, pgvector, MongoDB Atlas, Astra DB, and a half-dozen others, all behind a uniform interface that lets you swap stores without rewriting the rest of the pipeline. The model integrations include OpenAI, Anthropic, Cohere, HuggingFace, AWS Bedrock, Vertex, Azure OpenAI, Ollama, vLLM, and the long tail of OSS inference endpoints. The deployment story is <code>hayhooks</code>, deepset&#8217;s pipeline-as-REST-service runner, or you write your own FastAPI wrapper around the pipeline object, which is a ten-line file. Nothing here is invented for the sake of being new. All of it is the boring thing that already works.</p><p>Where Haystack trails is exactly where you would expect a framework written by people who care about production to trail. 
The developer-experience polish is workmanlike rather than seductive. The pipeline-build error messages are accurate but rarely delightful. The visual builder in deepset Studio exists and is genuinely useful for non-engineering stakeholders, though the OSS package itself is a Python-first tool that assumes you read code more than diagrams. The documentation is thorough but dense, optimized for the engineer who needs the right answer, not the engineer who wants to feel inspired about agents in the next six weeks. None of this is a defect. It is a posture, and the posture is consistent.</p><p>The honest comparison to the frameworks that dominated this newsletter for the last two weeks comes down to what kind of problem you are actually solving. If the work is heavily retrieval-shaped, with documents, embeddings, rerankers, structured outputs, and a model call at the end, Haystack is the framework that was already built for that and has the most mature library of components for the supporting cast. If the work is heavily orchestration-shaped, with branching control flow, human-in-the-loop checkpoints, and multi-step recovery logic, LangGraph&#8217;s state-machine model maps to the problem more naturally. If the work is a structured multi-agent workflow with role specialization, CrewAI&#8217;s vocabulary fits. The categories overlap, though they overlap less than the marketing of any single framework wants to suggest. Picking Haystack for the agent-with-tools-and-retrieval case in 2026 is not a hedge against the hype cycle. It is the answer that was right before the hype cycle and stayed right through it.</p><p>The failure modes are worth naming honestly. The pipeline graph model is rigid in exactly the way it is supposed to be rigid, which means problems that genuinely require dynamic graph mutation at runtime do not fit cleanly. 
Agent loops inside Haystack work well when the tool surface and the control flow are bounded, and they get awkward when the agent needs to compose new sub-pipelines on the fly. The component contract assumes mostly Python, which means polyglot teams that have settled on TypeScript for the agent layer will find Mastra or the OpenAI Agents SDK a more natural fit. None of these are reasons to avoid Haystack. They are reasons to know which slot it fills, which is the production-RAG-and-bounded-agents slot, not the freeform-orchestration slot.</p><p>The reason Haystack belongs in this month&#8217;s framework rundown, and the reason it gets the kickoff slot for the &#8220;older stack holds up&#8221; arc, is the same reason teams ignore it and shouldn&#8217;t. The framework has a name that sounds like 2022. The 2.x architecture is a 2026-grade composition model with typed graphs, async execution, agent support, and a serving story. The gap between the perception and the reality is the kind of gap that careers get made in, because the engineer who has shipped two services on Haystack 2.x in the last year is not waiting for the next agent framework to stabilize. They are already shipping the third. Boring and ships is not a slogan. 
It is the most honest description of a tool that you will ever read in a blog post about it.</p><div><hr></div><p><em>If this was useful, forward it to one engineer who needs less noise in their feed.</em></p>]]></content:encoded></item><item><title><![CDATA[You Can't Debug What You Can't See]]></title><description><![CDATA[Logs were built for deterministic systems.]]></description><link>https://signalovernoise.tech/p/you-cant-debug-what-you-cant-see</link><guid isPermaLink="false">https://signalovernoise.tech/p/you-cant-debug-what-you-cant-see</guid><dc:creator><![CDATA[Justin Wilson]]></dc:creator><pubDate>Sun, 17 May 2026 10:53:28 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!JqY8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf44f95-daa3-4daa-a63f-b5bd219c192c_1728x960.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!JqY8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf44f95-daa3-4daa-a63f-b5bd219c192c_1728x960.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JqY8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf44f95-daa3-4daa-a63f-b5bd219c192c_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!JqY8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf44f95-daa3-4daa-a63f-b5bd219c192c_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!JqY8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf44f95-daa3-4daa-a63f-b5bd219c192c_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!JqY8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf44f95-daa3-4daa-a63f-b5bd219c192c_1728x960.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JqY8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf44f95-daa3-4daa-a63f-b5bd219c192c_1728x960.png" width="1456" height="809" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fdf44f95-daa3-4daa-a63f-b5bd219c192c_1728x960.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1886211,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://signalovernoise.tech/i/198107779?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf44f95-daa3-4daa-a63f-b5bd219c192c_1728x960.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JqY8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf44f95-daa3-4daa-a63f-b5bd219c192c_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!JqY8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf44f95-daa3-4daa-a63f-b5bd219c192c_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!JqY8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf44f95-daa3-4daa-a63f-b5bd219c192c_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!JqY8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf44f95-daa3-4daa-a63f-b5bd219c192c_1728x960.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>Logs were built for deterministic systems. LLMs broke that contract, and most production agents are still running on the old playbook.</em></p>
      <p>
          <a href="https://signalovernoise.tech/p/you-cant-debug-what-you-cant-see">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Arize Phoenix: The Open-Source Tracing Tool That Actually Works Locally]]></title><description><![CDATA[One pip install, one launch call, and the failure mode in your retrieval pipeline stops being invisible.]]></description><link>https://signalovernoise.tech/p/arize-phoenix-the-open-source-tracing</link><guid isPermaLink="false">https://signalovernoise.tech/p/arize-phoenix-the-open-source-tracing</guid><dc:creator><![CDATA[Justin Wilson]]></dc:creator><pubDate>Sat, 16 May 2026 10:22:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!eiG3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0beeae47-7a45-40a2-a3df-b30f0a1aa773_1728x960.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eiG3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0beeae47-7a45-40a2-a3df-b30f0a1aa773_1728x960.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eiG3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0beeae47-7a45-40a2-a3df-b30f0a1aa773_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!eiG3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0beeae47-7a45-40a2-a3df-b30f0a1aa773_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!eiG3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0beeae47-7a45-40a2-a3df-b30f0a1aa773_1728x960.png 1272w, 
https://substackcdn.com/image/fetch/$s_!eiG3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0beeae47-7a45-40a2-a3df-b30f0a1aa773_1728x960.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eiG3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0beeae47-7a45-40a2-a3df-b30f0a1aa773_1728x960.png" width="1456" height="809" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0beeae47-7a45-40a2-a3df-b30f0a1aa773_1728x960.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2083836,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://signalovernoise.tech/i/197977572?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0beeae47-7a45-40a2-a3df-b30f0a1aa773_1728x960.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eiG3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0beeae47-7a45-40a2-a3df-b30f0a1aa773_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!eiG3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0beeae47-7a45-40a2-a3df-b30f0a1aa773_1728x960.png 848w, 
https://substackcdn.com/image/fetch/$s_!eiG3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0beeae47-7a45-40a2-a3df-b30f0a1aa773_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!eiG3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0beeae47-7a45-40a2-a3df-b30f0a1aa773_1728x960.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>One pip install, one launch call, and the failure mode in your retrieval pipeline stops being invisible.</em></p>
      <p>
          <a href="https://signalovernoise.tech/p/arize-phoenix-the-open-source-tracing">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Langfuse: The Open-Source Observability Stack You Can Actually Self-Host]]></title><description><![CDATA[When the SaaS alternative gets killed in security review, this is the tracing platform you ship instead.]]></description><link>https://signalovernoise.tech/p/langfuse-the-open-source-observability</link><guid isPermaLink="false">https://signalovernoise.tech/p/langfuse-the-open-source-observability</guid><dc:creator><![CDATA[Justin Wilson]]></dc:creator><pubDate>Fri, 15 May 2026 10:31:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!dU3g!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a1d75aa-5461-450d-a356-ea87bf2b0d49_1728x960.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dU3g!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a1d75aa-5461-450d-a356-ea87bf2b0d49_1728x960.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dU3g!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a1d75aa-5461-450d-a356-ea87bf2b0d49_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!dU3g!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a1d75aa-5461-450d-a356-ea87bf2b0d49_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!dU3g!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a1d75aa-5461-450d-a356-ea87bf2b0d49_1728x960.png 1272w, 
https://substackcdn.com/image/fetch/$s_!dU3g!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a1d75aa-5461-450d-a356-ea87bf2b0d49_1728x960.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dU3g!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a1d75aa-5461-450d-a356-ea87bf2b0d49_1728x960.png" width="1456" height="809" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0a1d75aa-5461-450d-a356-ea87bf2b0d49_1728x960.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2158400,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://signalovernoise.tech/i/197836188?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a1d75aa-5461-450d-a356-ea87bf2b0d49_1728x960.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dU3g!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a1d75aa-5461-450d-a356-ea87bf2b0d49_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!dU3g!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a1d75aa-5461-450d-a356-ea87bf2b0d49_1728x960.png 848w, 
https://substackcdn.com/image/fetch/$s_!dU3g!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a1d75aa-5461-450d-a356-ea87bf2b0d49_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!dU3g!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a1d75aa-5461-450d-a356-ea87bf2b0d49_1728x960.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>When the SaaS alternative gets killed in security review, this is the tracing platform you ship instead.</em></p>
      <p>
          <a href="https://signalovernoise.tech/p/langfuse-the-open-source-observability">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[LangSmith: Useful Even If You Hate LangChain]]></title><description><![CDATA[The framework lock-in finally broke, and the tracing platform underneath earns a second look.]]></description><link>https://signalovernoise.tech/p/langsmith-useful-even-if-you-hate</link><guid isPermaLink="false">https://signalovernoise.tech/p/langsmith-useful-even-if-you-hate</guid><dc:creator><![CDATA[Justin Wilson]]></dc:creator><pubDate>Thu, 14 May 2026 10:30:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!KO31!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1577aad4-3184-4d67-9c3b-857e8d2bd8f2_1728x960.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KO31!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1577aad4-3184-4d67-9c3b-857e8d2bd8f2_1728x960.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KO31!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1577aad4-3184-4d67-9c3b-857e8d2bd8f2_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!KO31!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1577aad4-3184-4d67-9c3b-857e8d2bd8f2_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!KO31!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1577aad4-3184-4d67-9c3b-857e8d2bd8f2_1728x960.png 1272w, 
https://substackcdn.com/image/fetch/$s_!KO31!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1577aad4-3184-4d67-9c3b-857e8d2bd8f2_1728x960.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KO31!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1577aad4-3184-4d67-9c3b-857e8d2bd8f2_1728x960.png" width="1456" height="809" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1577aad4-3184-4d67-9c3b-857e8d2bd8f2_1728x960.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2030993,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://signalovernoise.tech/i/197666452?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1577aad4-3184-4d67-9c3b-857e8d2bd8f2_1728x960.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KO31!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1577aad4-3184-4d67-9c3b-857e8d2bd8f2_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!KO31!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1577aad4-3184-4d67-9c3b-857e8d2bd8f2_1728x960.png 848w, 
https://substackcdn.com/image/fetch/$s_!KO31!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1577aad4-3184-4d67-9c3b-857e8d2bd8f2_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!KO31!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1577aad4-3184-4d67-9c3b-857e8d2bd8f2_1728x960.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>The framework lock-in finally broke, and the tracing platform underneath earns a second look.</em></p>
      <p>
          <a href="https://signalovernoise.tech/p/langsmith-useful-even-if-you-hate">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Braintrust: Evals as a Product, Not a Side Project]]></title><description><![CDATA[Where evals stop being a notebook script and start being a product surface.]]></description><link>https://signalovernoise.tech/p/braintrust-evals-as-a-product-not</link><guid isPermaLink="false">https://signalovernoise.tech/p/braintrust-evals-as-a-product-not</guid><dc:creator><![CDATA[Justin Wilson]]></dc:creator><pubDate>Wed, 13 May 2026 10:41:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!8WDq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F772bf438-ac6a-472f-a4f1-5982f8ab2d73_1728x960.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8WDq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F772bf438-ac6a-472f-a4f1-5982f8ab2d73_1728x960.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8WDq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F772bf438-ac6a-472f-a4f1-5982f8ab2d73_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!8WDq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F772bf438-ac6a-472f-a4f1-5982f8ab2d73_1728x960.png 848w, https://substackcdn.com/image/fetch/$s_!8WDq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F772bf438-ac6a-472f-a4f1-5982f8ab2d73_1728x960.png 1272w, 
https://substackcdn.com/image/fetch/$s_!8WDq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F772bf438-ac6a-472f-a4f1-5982f8ab2d73_1728x960.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8WDq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F772bf438-ac6a-472f-a4f1-5982f8ab2d73_1728x960.png" width="1456" height="809" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/772bf438-ac6a-472f-a4f1-5982f8ab2d73_1728x960.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:809,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1784814,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://signalovernoise.tech/i/197485218?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F772bf438-ac6a-472f-a4f1-5982f8ab2d73_1728x960.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8WDq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F772bf438-ac6a-472f-a4f1-5982f8ab2d73_1728x960.png 424w, https://substackcdn.com/image/fetch/$s_!8WDq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F772bf438-ac6a-472f-a4f1-5982f8ab2d73_1728x960.png 848w, 
https://substackcdn.com/image/fetch/$s_!8WDq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F772bf438-ac6a-472f-a4f1-5982f8ab2d73_1728x960.png 1272w, https://substackcdn.com/image/fetch/$s_!8WDq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F772bf438-ac6a-472f-a4f1-5982f8ab2d73_1728x960.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p><em>Where evals stop being a notebook script and start being a product surface.</em></p>
      <p>
          <a href="https://signalovernoise.tech/p/braintrust-evals-as-a-product-not">
              Read more
          </a>
      </p>
   ]]></content:encoded></item></channel></rss>