Built-In Tools Are the Whole Game
A loop with nothing to call is a chatbot. The tools are the product.
Two teams I talked to this month started from the same place and ended up somewhere very different. The first team had been benchmarking LangGraph against CrewAI against Pydantic AI for four weeks, with a spreadsheet, with side-by-side latency tests, with a draft RFC they were going to standardize on once the decision landed. The second team picked the first framework that compiled on a Tuesday afternoon and spent the same four weeks plumbing their agent into Salesforce, their billing system, their internal Confluence, their support ticket queue, and a read-replica of their orders table. The first team had three demos that worked on synthetic data. The second team had something that handled real customer questions on real accounts and saved their support team about fifteen hours a week.
The framework choice did not matter. The framework choice has never been the point.
This is the part the agent ecosystem keeps refusing to admit. We have spent eighteen months treating “which framework” as the load-bearing architectural decision. Every conference talk is on state machines and graph topology. Every comparison thread on Hacker News pulls thousands of comments. Every Twitter post benchmarks decorators against builder patterns. Meanwhile the differentiator that decides whether your agent ships value is what it can touch. Tools. Tool integrations. The specific, boring, business-shaped capabilities wired into the loop. Not the loop.
A framework, stripped of marketing language, is a wrapper around a while-loop, a tool-calling protocol, and some state. The frameworks differ in ergonomics, in observability hooks, in how they handle parallel tool calls or human-in-the-loop pauses or streaming responses. Those differences matter once you have something worth running. They do not produce value on their own. A LangGraph agent with no tools and a CrewAI agent with no tools and a Pydantic AI agent with no tools are all the same agent. They are a chatbot. A chatbot is not what anyone is buying.
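To make "a while-loop, a tool-calling protocol, and some state" concrete, here is a minimal sketch of that wrapper. Everything in it is hypothetical: `Agent`, `ToolCall`, `scripted_model`, and `lookup_order` are names invented for illustration, and the model is a scripted stub, not a real LLM call or any framework's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Union


@dataclass
class ToolCall:
    """The 'protocol': the model asks for a tool by name with arguments."""
    name: str
    args: dict


@dataclass
class Agent:
    # Hypothetical model interface: history in, ToolCall or final answer out.
    model: Callable[[list], Union[ToolCall, str]]
    tools: dict = field(default_factory=dict)   # name -> callable
    state: list = field(default_factory=list)   # running message history

    def run(self, user_input: str, max_steps: int = 5) -> str:
        self.state.append(("user", user_input))
        for _ in range(max_steps):               # the loop
            step = self.model(self.state)
            if isinstance(step, str):            # final answer, stop looping
                self.state.append(("assistant", step))
                return step
            tool = self.tools.get(step.name)
            if tool is None:
                result = f"error: unknown tool {step.name}"
            else:
                result = tool(**step.args)
            self.state.append(("tool", step.name, result))
        return "stopped: step limit reached"


def scripted_model(history: list) -> Union[ToolCall, str]:
    """Stand-in for an LLM: request one lookup, then answer from the result."""
    last = history[-1]
    if last[0] == "tool":
        return f"Order status: {last[2]}"
    return ToolCall("lookup_order", {"order_id": "A-1"})


# Same loop, same model. The only difference is what the loop can call.
bare = Agent(model=scripted_model)
wired = Agent(model=scripted_model,
              tools={"lookup_order": lambda order_id: "shipped"})
print(bare.run("Where is my order?"))
print(wired.run("Where is my order?"))
```

The loop is identical in both runs. The bare agent can only report that it has nothing to call; the wired one answers the question.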
What people are buying is the agent that knows their customers, queries their data, files their tickets, schedules their meetings, reads their docs, and writes back into the systems that run their business. None of that lives in the framework. All of it lives in the tools the framework can call. The framework debate is a distraction from the only question that predicts whether the agent does anything useful: what does it have access to?
The built-in tool catalogs from the major vendors obscure a second layer of this conversation. Anthropic ships Claude with web search, code execution, computer use, and a file system tool, all available through the API as vendor-defined tools rather than integrations you design yourself. OpenAI’s Agents SDK pulls in web search, a code interpreter, file search, and a hosted image generator. Google’s Gemini agents ship with native Google Workspace and search access. Those are the gimme tools. They are useful. They are also the easy ones. Every team that ships an agent gets the same web search and the same code execution, which is why none of those capabilities differentiate any product anyone is paying for.
The hard tools are the ones that touch your business. The Salesforce integration that knows the difference between an opportunity and a contact and writes back to the right object. The Postgres tool that respects row-level security and does not return PII to a user who should not see it. The Stripe tool that knows the difference between a refund and a chargeback dispute and asks for human confirmation before calling either. The ticketing tool that creates a Linear issue with the right project, the right team, and the right priority based on the user’s actual intent. Those are the tools that make the agent worth paying for. They are also the tools nobody writes blog posts about, because they are not interesting on a slide deck. They are auth flows, schema mapping, error handling, idempotency keys, rate-limit backoff, and the kind of integration code that makes engineers groan when they see the Jira ticket.
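A rough sketch of what that unglamorous code looks like for a single write tool: human confirmation first, then an idempotency key, then backoff on rate limits. This is not any vendor's SDK. `guarded_write`, `RateLimited`, the `send` and `confirm` callables, and the in-memory `seen` store are all assumptions made up for illustration; a real integration would call the payment provider's client and persist keys somewhere durable.

```python
import hashlib
import time


class RateLimited(Exception):
    """Hypothetical error a backend client raises when throttled."""


def idempotency_key(action: str, payload: dict) -> str:
    # Deterministic key: the same action and payload always hash the same,
    # regardless of dict ordering.
    raw = action + "|" + "|".join(f"{k}={payload[k]}" for k in sorted(payload))
    return hashlib.sha256(raw.encode()).hexdigest()[:16]


def guarded_write(action, payload, send, confirm, seen,
                  max_retries=3, base_delay=0.01):
    """Run a write only after confirmation, at most once per key."""
    if not confirm(action, payload):          # human-in-the-loop gate
        return {"status": "rejected"}

    key = idempotency_key(action, payload)
    if key in seen:                           # already applied, do not repeat
        return {"status": "duplicate", "result": seen[key]}

    for attempt in range(max_retries):
        try:
            result = send(action, payload, key)
            seen[key] = result
            return {"status": "ok", "result": result}
        except RateLimited:
            # Exponential backoff: base_delay, 2x, 4x, ...
            time.sleep(base_delay * 2 ** attempt)
    return {"status": "failed", "reason": "rate limited"}
```

One design choice worth flagging: deriving the key from the payload means two intentionally identical refunds would be treated as one. A real tool would likely fold in a request or conversation identifier, which is exactly the kind of decision no framework makes for you.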
This is the work the framework debate keeps hiding. A team that spent six months choosing between LangGraph and Strands has spent the same six months not building the Confluence tool, not writing the Salesforce integration, not figuring out the safe-write semantics for their billing system. Those are not framework problems. No framework solves them. A framework gives you a function signature for register_tool and gets out of the way. The work happens after that signature, and it is the work that produces the product.
The cleanest abstraction the industry has produced for this work is the skill. Anthropic shipped Claude Skills as the first vendor implementation that names the thing: a skill is a packaged capability with instructions, files, and tools bundled together, that the agent loads on demand from a folder. The MCP server ecosystem covers a different shape of the same idea, treating each capability as a small server the agent connects to over a protocol. Both are converging on the same insight. The unit of agent capability is not the framework you wrote it in. It is the tool, the instructions for using the tool, and the auth that lets the tool reach into the system it represents. That bundle is reusable across agents and reusable across frameworks. The framework you pick is the harness. The skills are the work.
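As a sketch of that bundle, here is one way to model "tool plus instructions plus auth" as a unit that does not care which harness loads it. This is not the Claude Skills folder format or the MCP protocol. The `Skill` class, the `attach` function, the scope strings, and the plain dict used as a tool registry are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    instructions: str                 # how and when the agent should use it
    tools: dict                       # tool name -> callable
    required_scopes: frozenset = frozenset()   # auth the tools depend on


def attach(skill: Skill, granted_scopes, registry: dict, prompt_parts: list) -> None:
    """Load a skill into any harness that exposes a tool registry and a prompt."""
    missing = skill.required_scopes - set(granted_scopes)
    if missing:
        # Refuse to load a skill whose tools would fail on auth anyway.
        raise PermissionError(f"{skill.name} missing scopes: {sorted(missing)}")
    for tool_name, fn in skill.tools.items():
        registry[f"{skill.name}.{tool_name}"] = fn   # namespaced to avoid clashes
    prompt_parts.append(skill.instructions)


# Hypothetical ticketing skill; the tool body is a stub, not a real API call.
ticketing = Skill(
    name="ticketing",
    instructions="Use ticketing.create_issue only after confirming team and priority.",
    tools={"create_issue": lambda title, team, priority: f"{team}/{priority}: {title}"},
    required_scopes=frozenset({"issues:write"}),
)

registry, prompt_parts = {}, []
attach(ticketing, {"issues:write"}, registry, prompt_parts)
```

Nothing in `attach` knows what framework sits on the other side of `registry`, which is why the same bundle can move between agents.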
The honest objection is that framework choice does still matter at scale, and it does, marginally. The team running a hundred agents in production cares about debuggability, lock-in, observability, type safety, and team-shareability. Those are real concerns and a framework with weak observability hooks will cost you incident hours when something breaks at three in the morning. None of that, though, is what predicts whether the first agent ships value. The first agent ships value because it has the tools to do something useful. Every team I know that bounced off agents bounced because their agent could not touch the systems that mattered. They had a working LangGraph. They did not have a working integration.
This reframe is also why the framework wars matter less than any framework’s marketing wants you to believe. The skills arc this month picks up here. The framework wars get a scorecard at the end of May, and that scorecard is going to read as anticlimactic, because every framework on the list does the wrapping job well enough to ship. What separates the teams that ship from the teams that demo is the catalog of capabilities they have built into their agent: which systems it can reach, which actions it can take, which workflows it can complete end to end without a human gluing the steps together. The next few posts cover Claude Skills as the cleanest vendor-shipped expression of that catalog, the packaging pattern we have landed on for distributing skills as npm packages across teams, and the architectural argument for treating skills as the unit of reuse instead of the agent.
Stop arguing about the loop. Start cataloging the tools. That is the only inventory that predicts whether you ship.