
$ Skills, Pipes, and the Unix Philosophy: Why Agents Should Run Programs, Not Replace Them

The Unix philosophy — small programs, composed through pipes, doing one thing well — turns out to be an ideal operating model for AI agents. Skills leverage this by guiding agents to use battle-tested programs rather than reinventing them. Here's why that beats building custom MCP servers for most workflows.


The Philosophy That Keeps Winning

There is a design philosophy that has been quietly winning for over fifty years. It was articulated in the early 1970s at Bell Labs, refined through decades of practice, and embedded so deeply into how we build software that most developers absorb it by osmosis without ever reading the original texts.

The Unix philosophy:

  1. Write programs that do one thing and do it well.
  2. Write programs to work together.
  3. Write programs to handle text streams, because that is a universal interface.

Three principles. That is it.

And yet, as I watch the current wave of agent tooling unfold—MCPs, custom servers, plugin architectures, tool registries—I keep coming back to the same thought:

We already had the right abstraction. We just forgot we had it.

I started programming in 2009 on mainframes in the Israel Defense Forces’ J6 & Cyber Defense Directorate. Those systems were built on a version of this philosophy: small, focused utilities composed into pipelines. No framework-of-the-week. No plugin ecosystem. Just programs that did their job, connected by well-defined interfaces.

In Handcrafted Software in the Age of Automation, I argued that we should guide AI toward existing, proven tools rather than letting it generate bespoke solutions for already-solved problems. This post is the next step in that argument.

The Unix philosophy is not just good engineering advice from the past. It is the ideal operating model for AI agents right now. And skills—lightweight prompt-based guidance—are how you unlock it.


Agents Are Not the Program. They Are the Operator.

There is a popular way to talk about agents that gets the abstraction slightly wrong. We describe them as if they are replacing software.

In practice, the more useful framing is the opposite. An agent is not replacing the program. It is learning how to operate the program.

That distinction matters.

Yes, there are scenarios where an agent can itself behave like a tool. That boundary will shift as models improve. Some workflows that currently require explicit software will eventually collapse into the model itself. But today, agents are generally strongest when they are orchestrators—deciding what to do next, interpreting messy intent, translating goals into steps, and then driving deterministic systems on the user’s behalf.

A user says, “Find all the error logs from the last week and summarize the top failure modes.” The agent’s job is not to become a log parser. Its job is to figure out that find, grep, sort, uniq, and head already solve this problem—and compose them correctly.

The model handles ambiguity: what the user probably means, what the intent is, what sequence of actions makes sense.

The program handles execution: parse the file, filter the lines, count the occurrences, sort the results.

This is a very good pairing. And it maps directly onto the Unix philosophy.

The agent is the operator at the terminal. The programs are the tools on the shelf. The pipes are the composition mechanism. The skill tells the agent which tools to reach for and how to connect them.


What a Skill Actually Is

I think “prompt” is too weak a word for what a good skill does.

A skill is better understood as operational guidance for using a program in a particular way for a specific class of problems.

It tells the agent things like:

  • Which tool to use for this category of task
  • What order to use tools in
  • What assumptions need to be verified before acting
  • What failure modes are common and how to recover
  • What a good output should look like

In Building a Claude Code Skill for Codebase Synchronization, I walked through a practical example: a skill that detects documentation drift across a codebase and fixes it. That skill does not contain the intelligence to understand your codebase. It contains the procedure—inspect these files, compare these sources of truth, show the delta, propose a fix, confirm with the user.

The procedure is the valuable part.

Raw capability is not the same as applied competence. An agent might know that jq exists. A skill tells it: use jq here, with this filter pattern, pipe the output through this transformation, and verify the result has the expected shape before moving on.

An agent might know a shell exists. A skill tells it: this is a shell problem. Inspect state first. Avoid destructive commands. Verify the result. Here is the recovery path if it fails.
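As a concrete sketch of that guidance, here is the kind of inspect-act-verify sequence a skill might steer an agent toward. The directory and file names are hypothetical, purely for illustration:

```shell
#!/bin/sh
# Hypothetical cleanup task, following the skill's procedure:
# inspect state first, avoid blind destructive commands, verify after.
tmpdir=$(mktemp -d)
touch "$tmpdir/old.log" "$tmpdir/keep.txt"

# 1. Inspect: a dry run that only prints what would be removed
find "$tmpdir" -name "*.log" -print

# 2. Act: delete only after the dry run matched expectations
find "$tmpdir" -name "*.log" -delete

# 3. Verify: the target is gone and unrelated files are untouched
[ ! -e "$tmpdir/old.log" ] && [ -e "$tmpdir/keep.txt" ] && echo "verified"
```

A real skill would add a fourth step: the recovery path to follow when verification fails.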

That is not “extra prompting.” That is operational knowledge. And it is the difference between an agent that vaguely knows tools exist and one that can use them effectively in production.

This is not very different from how humans work. Knowing that ffmpeg, git, jq, sed, ImageMagick, pandoc, or psql exist is not the same as knowing how to solve a real problem with them. The useful part is not awareness of the tool. The useful part is the learned procedure.

A skill packages that procedure in a form the agent can reliably follow.
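To make that concrete, a packaged procedure might look something like the following sketch. The file name, frontmatter fields, and steps here are illustrative, not taken from the skills discussed in the posts above:

```markdown
---
name: log-failure-summary
description: Summarize the top failure modes in recent log files
---

# Log failure summary

1. Inspect first: list candidate files with `find /var/log -name "*.log" -mtime -7`.
2. Extract errors with `grep "ERROR"`; do not modify or delete anything.
3. Count and rank with `sort | uniq -c | sort -rn | head -5`.
4. Verify the output is non-empty before summarizing; if it is empty,
   widen the time window and report that to the user.
```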


Why This Works: Training Data Already Contains the Raw Ingredients

One reason skills work so well is that they align with the kinds of patterns models have already absorbed during training.

There are two major sources of leverage here.

1. Tool Use Is Embedded in the Training Data

Training data is saturated with programs being used directly. Bash commands, shell pipelines, Makefiles, Dockerfiles, Python scripts, CI configs, terminal transcripts, debugging sessions, SQL queries, CLI help pages, man pages, setup instructions.

That matters because Unix tools have a text-based interface. A shell command is text. A script is text. A config file is text. An API call is usually text. For a language model, that is home territory.

Consider the difference:

# Finding error patterns in recent logs — Unix pipeline
find /var/log -name "*.log" -mtime -7 -exec grep -l "ERROR" {} + \
  | xargs -I{} sh -c 'echo "=== {} ===" && grep "ERROR" "{}" | sort | uniq -c | sort -rn | head -5'

An agent guided by a skill toward this pipeline is drawing on a deep prior—countless examples of humans solving concrete problems with exactly these mechanisms. The model has seen find, grep, sort, uniq, and head composed together thousands of times across its training data.

This does not mean the model is always right. But it means the model has a strong native bias toward this mode of problem-solving. The skill activates that bias and constrains it toward the specific task.

2. The Internet Is Full of Problem-Solution Pairs

The second source of leverage is even more important: the web is packed with people asking “how do I do X?” and other people answering with step-by-step procedures.

Forums, Q&A sites, documentation, blog posts, README files, tutorials, issue trackers, troubleshooting threads—they all reinforce the same pattern:

  • Here is the problem
  • Here is the environment
  • Here are the constraints
  • Here is the sequence of commands
  • Here is the expected result
  • Here is what to do when it fails

That is almost exactly the structure of agentic work.

When a user asks an agent to solve a task, the agent is often matching that request against a vast library of previously seen “how to accomplish X using Y” examples. A skill sharpens that retrieval path. It narrows the solution space. It says: in this context, use these tools, in this order, with this standard of care.

So the skill is not replacing the model’s knowledge. It is activating and constraining it. Two different operations, both important.


Why This Beats MCPs for Most Workflows

Now let me say the thing directly.

MCP servers—Model Context Protocol servers—are a popular way to give agents access to capabilities. You write a server that exposes tools through a standardized protocol. The agent calls the server. The server runs logic and returns results.

For certain cases, this makes sense. I will get to those.

But for the vast majority of agent workflows, MCPs are the wrong abstraction. They add complexity where none is needed. And they actively work against the grain of what makes agents effective.

Here is why.

MCPs Create Cold Starts

When an agent calls an MCP server, it is interacting with an interface it has likely never seen before. The tool names, parameter schemas, return formats, and behavioral semantics are all novel. The model has zero prior exposure to your custom extract-json-fields MCP tool. It is working from the schema description alone.

Compare that to a skill that guides the agent to use jq:

# Skill-guided: extract active user emails from JSON
jq '.users[] | select(.active == true) | .email' data.json

The model has seen jq used in hundreds of thousands of examples across its training data. It knows the syntax, common patterns, edge cases, error messages. The skill does not need to explain jq from first principles. It only needs to direct behavior: for this class of problems, use jq, here is the pattern.

That is an enormous advantage. You are building on a warm prior instead of starting cold.

MCPs Add Unnecessary Infrastructure

An MCP server requires:

  • Writing code (TypeScript, Python, whatever the runtime)
  • Defining tool schemas
  • Running a server process
  • Managing the connection between agent and server
  • Handling errors, timeouts, and version mismatches
  • Maintaining and updating the codebase over time

A skill requires:

  • Writing a markdown file

I am not exaggerating. A Claude Code skill is a .md file that describes a procedure. No server. No runtime. No dependencies. No deployment. You write it, the agent reads it, the agent follows the procedure using tools that already exist on the system.

The maintenance burden is not comparable. When jq gets an update, you do nothing—your skill still works. When your MCP server’s dependencies need updating, you have a maintenance task.

MCPs Break Composability

One of the most powerful features of Unix programs is that they compose through pipes. The output of one program becomes the input of the next. No special integration required. No orchestration layer. Just text flowing through a pipeline.

# Three programs composed through pipes — no orchestration code needed
curl -s https://api.example.com/users \
  | jq '.[] | select(.role == "admin")' \
  | wc -l

MCPs are silos. Each MCP server is its own world. Composing two MCPs requires the agent to call one, capture the result, transform it, and call the other. That orchestration lives in the agent’s reasoning—which is the least deterministic part of the system.

Unix pipes move the composition into the deterministic layer. The agent constructs the pipeline. The shell executes it. The agent checks the result. That is a much more robust workflow.

MCPs Duplicate What Already Exists

This is the part that bothers me most.

Consider what people build MCP servers for: reading files, searching codebases, extracting data from JSON, transforming text, querying databases, managing git operations, processing images, converting formats.

Now consider what already exists on any Unix system: cat, grep, find, jq, sed, awk, sort, uniq, curl, git, ffmpeg, convert, pandoc, psql, sqlite3, tar, diff, wc, head, tail, cut, tr, xargs.

These tools are battle-tested. They have been refined for decades. They handle edge cases. They are fast. They are documented. They are deterministic. And—crucially—the model already knows how to use them.

Building an MCP server to wrap functionality that jq or grep or git already provides is not just unnecessary overhead. It is actively counterproductive, because you are replacing a tool the model has a deep prior on with a novel interface it has never seen.

When MCPs Do Make Sense

I am not saying MCPs are never the right choice. They make sense when:

  • You need authenticated access to APIs (Slack, GitHub with auth, internal services)
  • You are interfacing with proprietary systems that have no CLI equivalent
  • You need stateful interactions that cannot be expressed as shell commands
  • You are building a product where the MCP protocol is the distribution mechanism

Those are real use cases. But they are a small fraction of what I see people building MCP servers for.

For file processing, text transformation, system inspection, data extraction, format conversion, log analysis, git operations, and most of the day-to-day work that agents do? You do not need a custom server. You need a good skill and access to a shell.


The Multiplier Effect

There is an interesting compounding dynamic here.

If the deterministic program already appears frequently in the model’s training data, the skill becomes more powerful. The reason is simple: the prompt is not teaching the program from scratch. It is cueing a system the model has likely seen many times before.

Instructing an agent to use bash, python, git, curl, ffmpeg, jq, grep, awk, or sed is effective partly because those interfaces are deeply represented in the training corpus. The model has seen them used across documentation, scripts, answers, debugging threads, and tutorials. A skill can therefore be compact while still being highly effective.

The prompt does not need to explain everything. It only needs to direct behavior toward a known operational pattern.

This is one reason skills often feel disproportionately useful relative to their size. A few paragraphs of guidance can unlock a large amount of latent procedural knowledge. The skill is not providing the intelligence. It is providing the direction.

MCPs, by contrast, are always a cold start. Every custom MCP tool is a novel interface. The model works from the schema description, not from deep familiarity. That is a fundamentally weaker position, and no amount of schema documentation fully compensates for the absence of training data.


Determinism Is Part of the Point

One reason this pattern is robust is that the program is deterministic.

The model is not being asked to improvise every step of the solution. It is being asked to drive something that has crisp, predictable behavior.

That gives you a useful split:

  • The model decides — which tools, which order, which parameters
  • The program executes — deterministically, exactly as invoked
  • The model checks — did the output match expectations?
  • The user gets a result — verified through a deterministic channel
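That split can be made concrete. In the sketch below, the data file and expected shape are hypothetical and jq is assumed to be installed; the point is that the check is an explicit, deterministic assertion rather than the model eyeballing the output:

```shell
#!/bin/sh
# Decide: the agent has chosen jq and a filter for this task.
printf '{"users":[{"active":true,"email":"a@example.com"}]}' > data.json

# Execute: the program runs deterministically, exactly as invoked.
emails=$(jq -r '.users[] | select(.active == true) | .email' data.json)

# Check: verify the output matches the expected shape before proceeding.
case "$emails" in
  *@*) echo "check passed" ;;
  *)   echo "check failed" >&2; exit 1 ;;
esac
```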

This is much more reliable than asking the model to do everything internally, especially when the task involves files, formats, transformations, or any workflow where correctness matters.

The skill is what makes that split work consistently. It tells the model how to move between judgment and execution without losing the thread.

Without that layer, the agent may still succeed on any given run. With it, the agent is far more likely to succeed repeatedly.


The Boundary Will Move, But the Principle Will Not

I do not think skills in their current form are permanent. Over time, models will absorb more tool-like behavior. Some explicit programs will disappear behind model interfaces. Some workflows that currently need careful orchestration will become default capabilities.

But the Unix philosophy insight—composition of small, focused, deterministic units—is a design principle, not a temporary workaround. It applies regardless of whether the units are shell programs, API calls, or model-native capabilities. The principle survives because it solves a real problem: managing complexity through composition rather than consolidation.

Today, that composition happens through skills guiding agents to use Unix programs.

Tomorrow, the specific tools may change. But the pattern—small units, clear interfaces, text as the universal medium, composition over monoliths—will still be the right way to build reliable systems.

The Unix philosophy was not designed for AI agents. It just happens to describe exactly what they need.


The Operating Model

Here is the mental model I keep coming back to:

The model brings interpretation. It handles ambiguity, figures out intent, translates fuzzy human goals into structured action.

The program brings precision. It executes deterministically, handles edge cases correctly, and produces predictable output.

The skill connects the two. It encodes operational knowledge—which tools, which order, which patterns, which guardrails.

That connection is powerful precisely because it leverages training data twice. First through the model’s deep familiarity with tool interfaces and executable patterns. Second through the enormous corpus of human problem-solving examples—the entire internet’s worth of “how do I accomplish X using Y.”

A skill is not just a prompt. It is a compact operational wrapper around accumulated procedural knowledge. It tells the agent how to use a deterministic program for a specific class of problems. And when that program already lives in the model’s prior—when the agent has effectively seen it used ten thousand times—the result is surprisingly strong.

We do not need to rebuild the wheel with custom servers and novel protocols for most agent workflows. The programs already exist. The training data already contains the knowledge. The composition mechanism already works.

All we need is the skill to connect them.

Because, for now at least, agents are not at their best when they replace programs.

They are at their best when they know how to run them well.


This post builds on ideas from Handcrafted Software in the Age of Automation, which argued for respecting battle-tested tools in the age of AI, and Building a Claude Code Skill for Codebase Synchronization, which demonstrated what skills look like in practice. Together they form a thread: respect the fundamentals, encode the procedures, and let agents compose what already works.

Written and developed with Claude. The arguments are mine; the drafting was collaborative.
