Shipping Skills as npm Packages: A Workflow That Actually Scales
The package manager already solved this problem. Stop reinventing it with custom registries and Slack-pasted folders.
The first six months of running skills across a team looked like a horror show. Engineers were copying SKILL.md folders out of Slack threads. One squad was running a v0.3 of the incident-triage skill while another squad was running a v0.7 with a completely different prompt structure, and neither one knew the other existed until an outage forced them to compare notes. We had a Confluence page that was supposed to be the source of truth and it had been wrong for three weeks. The skills themselves were working. The distribution was the disaster.
The fix was sitting in plain sight, and it embarrassed me a little that it took as long as it did to land on it. We already had a perfectly good way to ship code artifacts across teams with version pinning, dependency resolution, semantic versioning, changelogs, and a registry that handles all of this for free. It is called npm. The skills are markdown files plus the occasional shell script. They are content. npm does not care that they are not JavaScript. The package manager was happy to ship them the moment we stopped treating skills like some new category of artifact that needed a new category of tooling.
What this looks like in practice is a small package per skill or per logical bundle of skills. The package contains a skills/ directory at the root, an index file that exports nothing meaningful from a runtime perspective, a package.json with name, version, description, and a files field that ships exactly the skill content. The name follows the same scope pattern as everything else: @org/skill-incident-triage, @org/skill-billing-rollforward, @org/skills-postmortem. The version is semver. A patch bump means the SKILL.md description was tightened. A minor bump means new instructions were added to the body, or a new reference document was included. A major bump means the skill’s invocation surface changed in a way that a downstream agent might trip on.
The consumer side is where the workflow earns its keep. An agent project has a package.json like any other Node project. Skills get pulled in as dependencies pinned to exact versions, not floating ranges. When the agent boots, it walks node_modules for any package matching the @org/skill-* or @org/skills-* naming pattern, finds the skills/ directory inside each one, and either symlinks or copies the contents into the agent’s .claude/skills/ directory before the model is loaded. Twenty lines of bootstrap code. The agent never has to know which skills it has. It asks the filesystem.
The reason this matters more than it sounds: every agent in the org ships with an explicit, version-pinned manifest of exactly which skills are loaded and at what version. The manifest is package.json. It lives in source control. It goes through code review. It diffs cleanly when someone bumps a skill version, and the diff shows up in the same PR review surface as every other dependency change. That is the entire workflow problem solved, and we did not write any of the tooling that solved it.
The version pinning is the part that does the most work in practice. When the incident-triage skill gets a major revision because the underlying playbook changed, the team that owns the skill ships @org/skill-incident-triage@2.0.0. The agents that depend on it stay pinned to ^1.x until their owners are ready to test the upgrade. That test is the usual one: bump the version in package.json, run the agent against a fixture set of historical incidents, see whether the outputs hold up, ship the bump if they do, file an issue against the skill owners if they do not. Same loop as any other library upgrade. The skill team is not running around chasing every consumer asking them if they have updated. The consumers are not stuck on whatever version was current the day they cloned the repo.
The objection I get from engineers when I describe this is that npm feels heavy for what is fundamentally a folder of markdown. They are not wrong about the technology being overpowered for the artifact, and the cleaner instinct is to want something custom that fits the actual shape of the problem. The trouble with that instinct is that custom registries take real engineering time to build, real engineering time to maintain, and they fail in ways the team is not used to fixing. npm has been pressure-tested for fifteen years against the most adversarial dependency graphs in the industry. The cost of using a tool that is slightly oversized for the job is somewhere between negligible and zero. The cost of building a new tool that exactly fits the job is six engineer-months and a perpetual maintenance tax. The math is not close.
The second objection is that npm is JavaScript-flavored and most of the orgs running production agents are Python shops. This one is fair, and the workaround is uglier than I would like, but it works. The agent process does not need to be Node. The skills package can be pulled with npm install as a one-time setup step in the agent’s container build, with the resulting node_modules directory copied or symlinked into the agent’s working directory before the Python process starts. The cost is one Node toolchain in the build image. Python orgs that want to avoid that can use the same pattern with PyPI and a skills/ directory inside a Python package. The mechanics are identical, the file layout is identical, only the index file and metadata change. The principle survives the language switch. The principle is: use the package manager you already have, with the version pinning you already trust, and let the boot-time copy step bridge the runtime gap.
Where I would push back on a team that is about to adopt this is the temptation to overdo the bundling. The first version of our setup had a single @org/skills-all package that shipped every skill in the org as one giant dependency. It was convenient until it was not. Every skill change cut a new version of the everything-package, every agent had to pull the whole world to update one skill, and the blast radius of a bad skill went from “the agents that depend on it” to “every agent in the company.” We broke it apart skill by skill over a quarter, and the diff in operational sanity was immediate. Granularity is doing real work here. One package per skill is the default. One package per coherent skill bundle (three skills that always travel together) is the exception. One package per organization is the antipattern that looks like simplification and turns into a load-bearing footgun.
The repeatable pattern, after roughly four months of running this in production across a dozen agents and three teams, comes down to a checklist. Each skill lives in its own repo, or in a monorepo with one package per skill. Each package follows the @org/skill-<name> naming convention. Each version follows semver with the patch/minor/major rules above. Each agent project pins exact versions in package.json and uses a twenty-line bootstrap to copy the skills into place before model load. Each version bump is a PR, reviewed by the same humans who review code changes. The whole thing fits into the developer workflow that already exists, and nobody has to learn a new tool.
The temptation in the agent ecosystem right now is to invent new infrastructure for every new abstraction. The instinct is wrong, and it is the same instinct that gave us a different YAML format for every CI tool, a different secret manager for every cloud, and a different package format for every language runtime. The agent layer is going to be more boring than the marketing wants you to believe, and the boring parts are where the actual leverage lives. Skills are content. npm ships content. Everything else is a distraction from the part of the work that produces value, which is writing skills good enough to be worth shipping in the first place.
If you are still copying skill folders out of Slack threads, the package manager has been waiting for you the whole time.
If this was useful, forward it to one engineer who needs less noise in their feed.


