AI Slop, Human Bandwidth, and Where I Draw the Line
AI can produce code, comments, PR descriptions, and review replies faster than humans can read and own them. The real bottleneck is no longer generation — it's comprehension. Here's how I think about effort symmetry, accountability, and why I don't buy a universal taxonomy.
A few weeks ago I sat down to review a pull request that, on paper, looked fine.
The description was crisp. The commit messages were tidy. There were comments in the code explaining the tricky parts. When I asked a question in the review, I got a confident, well-structured answer within minutes.
Halfway through reading it, I realized something uncomfortable: almost none of it had been written by the person whose name was on the PR. The description read like a model. The inline comments read like a model. The review reply read like a model. The code itself was somewhere between “plausible” and “actually wrong” in places that only careful reading would surface.
I was not annoyed because AI had been used. I was annoyed because the author had shipped me several thousand words of confident prose and zero thinking. My job as a reviewer had quietly been upgraded from check the work to audit the plausibility of an entire synthetic artifact.
That is the thing that worries me most about AI in software. Not that models can generate code. That they can generate everything around the code even faster.
AI Slop Is Not Just Bad Code
When people say “AI slop,” they usually mean low-quality generated code. I think that definition is too narrow.
AI slop is any generated artifact that is cheaper to produce than it is to evaluate.
Sometimes that artifact is code. Sometimes it is a 700-line issue body. Sometimes it is a PR description that says everything and nothing. Sometimes it is a wall of comments added to create the feeling of rigor. Sometimes it is a review response that sounds confident but is really a proxy answer copied from another model run.
The common pattern is not “AI was used.” The common pattern is that the human cost has been externalized. One person gets leverage. Everyone else inherits the reading burden.
That is where the social contract breaks.
The Problem Is Not AI. The Problem Is Offloading Cognition
I am not anti-AI. That framing is too primitive.
I have been shipping with AI agents in my daily workflow for a while now — drafting code, exploring design space, producing scaffolding, generating test cases, arguing with my own plans. A lot of this post itself started as notes I worked through with Claude. The value of these tools is obvious to me. I am not in the abstinence camp.
Sometimes the thing you are writing is not even primarily for a human. It is a token blob meant to prime the next tool call. An issue can be a prompt. A system prompt can be architecture. A structured spec can be less about documentation than about steering an agentic workflow.
That is real work. It is often valuable work.
You can push it further. Run multiple coder agents against the same task. Compare approaches. Use their outputs as competing prompts for a final synthesis. That can be a productive workflow.
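As a rough illustration, that competitive workflow can be sketched in a few lines. Everything here is hypothetical: `run_agent` and `synthesize` stand in for whatever agent framework you actually use, and the stubbed bodies only show the shape of the pipeline.

```python
# Hypothetical sketch of a "multiple coder agents, one synthesis" workflow.
# `run_agent` is a placeholder for a real agent API call.
from concurrent.futures import ThreadPoolExecutor


def run_agent(model: str, task: str) -> str:
    # Placeholder: in practice, call your agent framework here.
    return f"[{model}] draft for: {task}"


def synthesize(task: str, drafts: list[str]) -> str:
    # Placeholder: a final pass (an agent, or you) compares the
    # competing drafts and produces one owned result.
    prompt = f"Task: {task}\n\nCompeting drafts:\n" + "\n---\n".join(drafts)
    return prompt  # in practice: run_agent("synthesizer", prompt)


def compete(task: str, models: list[str]) -> str:
    # Generation fans out in parallel; synthesis stays a single step.
    with ThreadPoolExecutor() as pool:
        drafts = list(pool.map(lambda m: run_agent(m, task), models))
    return synthesize(task, drafts)
```

The point of the sketch is the shape, not the API: generation is parallel and cheap, while synthesis, and the accountability for it, remains one human-owned bottleneck.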
None of this removes the need for human judgment. It just changes where judgment is most scarce and most expensive.
The real bottleneck is no longer generation. It is comprehension.
Accountability Is Not the Same Thing as Understanding
This is where I think many teams still talk past each other.
“Accountability” and “understanding” are related, but they are not the same thing.
Accountability means I can own the result. I can defend the branch I took. I can explain why a dependency was added, what tradeoff was made, what the blast radius is, how this affects DX, reliability, and product quality, and whether the tests are actually good enough. I can say, without hiding behind the tool, this change is mine.
Understanding is narrower and deeper. Do I understand the design pattern behind this module? Do I understand why this abstraction exists? Do I understand how this generic type will behave when the next integration point arrives? Do I understand the failure modes of this virtualization strategy? Did I inspect alternatives, or did I just accept the first plausible answer?
Some work requires both. Some work can survive with stronger accountability than line-by-line understanding. Pretending otherwise is purity theater.
A rule that says “the human must fully understand all code” sounds clean. In practice, it is subjective, hard to operationalize, and easy to weaponize. Ghostty takes a strict version of that stance, requiring disclosure and saying contributors should not submit changes they cannot explain without AI help. That may be exactly right for a terminal emulator maintained by a small group with a specific culture. I do not think it scales cleanly into a general standard.
In real teams, full understanding is not binary. It is uneven, contextual, and often discovered under pressure. A policy that pretends otherwise will either be ignored or enforced selectively, usually both.
Why I Don’t Buy the VisiData 10 Levels as a Standard
Saul Pwanson’s VisiData post is thoughtful, and the taxonomy is useful as a lens. It distinguishes between cases like “bots coded, human understands completely,” “bots coded, human understands mostly,” “human specced, bots coded,” and “bots planned, human approved.” It is a serious attempt to name meaningful differences in human involvement.
I do not agree with treating that 10-level scheme as a de facto standard.
It is too specific to one project’s culture, one maintainer’s concerns, and one style of contribution management. Even VisiData presents it as its own framework, not as an ecosystem-wide norm.
My issue is not that the distinctions are intellectually wrong. My issue is that they are hard to operationalize consistently in policy.
“Reviewed” is observable. “Generated” is observable enough. “Human approved” is at least claimable. But “understands mostly” versus “understands completely” is a subjective mental-state audit. That is not a stable enforcement boundary. It is a vibe.
For a small open source project, a highly opinionated taxonomy may be acceptable. For inner-source or commercial product development, it can become friction masquerading as rigor. Teams with limited resources do not need a more ornate morality scale. They need clarity around ownership, risk, and review burden.
In my own work, most AI-assisted PRs would land somewhere VisiData would call level 7 or level 8. In a simplified internal system I would describe them more bluntly as “AI-generated, human-reviewed” or sometimes “AI-generated, lightly reviewed.” That is exactly why I do not want a policy that performs precision without delivering it. My bar is not whether I can philosophically classify my state of understanding. My bar is whether I can responsibly own the outcome.
The Ecosystem Is Not Converging on One Answer
Another reason I resist universalizing any one project’s framework is simple: open source itself has not converged.
The Linux Foundation is permissive in spirit — AI-generated contributions are allowed, with the burden on contributors to ensure license compatibility and compliance. The Linux kernel now has official guidance for AI coding assistants, but routes them through the ordinary development process rather than through an abstract theory of “AI contribution levels.”
Rust’s recent discussion is reacting to a different failure mode: low-effort, insufficiently self-reviewed, extractive contributions that consume maintainer attention and feel, in their words, like a denial-of-service attack on review capacity. Firefox has focused entirely on accountability and declined to require disclosure. LLVM requires disclosure and calls out extractive contributions explicitly. QEMU sits much further to the restrictive side and says it will decline contributions believed to include or derive from AI-generated content at all.
These projects are not disagreeing about whether AI is good or bad. They are making different tradeoffs about which failure mode they most want to prevent: license risk, maintainer burnout, learning-opportunity capture, code quality, or contributor accountability.
So no, I do not think one project’s 10-level rubric should be treated as the standard. The principles may travel. The taxonomy should not.
My Practical Line
My line is pragmatic.
Use AI aggressively where it compresses toil, expands exploration, or helps structure a problem.
Do not use it as a shield against ownership.
Do not use generated verbosity as a substitute for thinking.
Do not dump unbounded review cost on another human and call it productivity.
Do not answer reviewer questions by laundering model output through your keyboard.
And do not confuse “the tests pass” with “the design is sound” in areas where the real risk is architectural, operational, or product-facing.
If you want a simple rule, mine is this:
The more AI reduced the cost of producing the change, the more the author must compensate with clarity, verification, and accountability.
That is the trade.
Not abstinence. Not blanket permission. Not taxonomy worship.
A fair exchange.
What Should Policies Optimize For?
I think good policies should optimize for three things:
- Transparency about how the work was produced.
- Human accountability for the result.
- Protection of reviewer bandwidth.
That last one matters more than many teams want to admit. Human attention is the scarce resource here. Not tokens. Not compute. Not generated prose.
When a policy makes contribution harder in ways that protect scarce human review capacity, that can be reasonable. When it adds ceremony without improving accountability or reducing review cost, it becomes process cosplay.
This is why I prefer simpler classifications over elaborate ladders. I would rather have a small system that contributors can apply honestly and maintainers can use consistently than a nuanced system that encodes subjective mental states nobody can verify.
I wrote a while back in The Unpopular Opinion About Vibe Coding and AI that AI has not replaced coding — it has shifted where the effort lives, from writing to governance. The same shift applies here. The question is no longer “did a human write this line.” It is “does a human stand behind this change, and did they invest enough effort to justify mine?”
Augmentation, Not Theater
The choice is not between stopping progress and surrendering to slop.
The right frame is augmentation.
AI should help humans move faster, see more options, and spend more time on the parts that actually require judgment. It should not become a machine for manufacturing plausible-looking burden for someone else to sort through.
That is the line for me.
I am not against AI-generated code. I am against unaccountable output. I am against content inflation. I am against policies that pretend every use case is the same. And I am against importing a rigid framework from one project and treating it like a universal answer.
The future is not human-only software development. But it also cannot be a world where machines generate explanation faster than humans can maintain trust.
The scarce asset is not code.
It is responsible attention.
That PR I started with, the one that made me write this post — I did eventually close the review. Not because AI had been used. Because the author could not defend the parts I asked about, and the review itself had started feeling like a conversation with a model wearing a human’s badge. The fix was not a new taxonomy. The fix was a short message that said: do the next one yourself, then loop me back in.
That, to me, is the whole policy in one sentence.
Written and developed with Claude. The arguments are mine; the drafting was collaborative. The inciting PR review was, unfortunately, also mine.