What Is AI Slop?

AI slop is not "anything made with AI." It is the stuff that looks finished before anyone has actually understood it: cheap to generate, expensive to verify, and usually someone else's review problem.
A pull request lands from an agent.
At first, it all looks reassuring: the summary is clean, the file names make sense, the tests are green.
Ten minutes later, you realize it duplicated an abstraction you already had, ignored a constraint buried in an old incident, and added 400 lines nobody wants to own.
From a developer's point of view, that's AI slop. Not because the code was AI-generated. Because it looked done long before it was actually understood.
The term stuck because it names something real. It gives people a useful name for a pattern that now shows up everywhere: content, code, specs, tickets, status updates, and summaries that look finished from a distance but fall apart the moment someone has to rely on them.
AI slop is output with responsibility stripped out
Merriam-Webster now explicitly defines one sense of "slop" as low-quality digital content produced, usually in quantity, by AI.1
Useful, but I think Simon Willison's framing is better for actual work: not all AI-generated content is slop. It becomes slop when it is mindlessly generated, barely reviewed, and pushed onto other people who did not ask for it.2
That difference matters.
Bad content existed long before LLMs. Spam existed. Clickbait existed. Empty consulting prose existed too.
AI changed the math.
Now you can make something that looks credible in seconds. So the temptation is no longer "can I make this?" It is "can I get away with shipping this without thinking very hard?"
For developers, this is where the definition gets practical. Slop is not just a bad blog post or a cursed Facebook image.
Here is a boring but very real version of it. An agent opens a PR called "improve retry handling."
The description says it standardizes resilience across services. The tests pass. The diff is tidy, but the repo already has a retry helper. The patch adds a second one. It also touches a billing path where retries are risky, and the PR never mentions idempotency once.
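To make that concrete, here is a hedged sketch of the duplication pattern, in Python. Every name here is invented for illustration; the point is that the second helper re-implements an existing one, and that blindly retrying a non-idempotent billing call is exactly the risk the PR never mentions.

```python
import time

# Existing helper, e.g. somewhere like shared/http — callers already
# rely on its backoff contract.
def retry(fn, attempts=3, base_delay=0.1):
    """Retry fn with exponential backoff, re-raising on the final failure."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** i)

# What the hypothetical agent PR adds: a second, near-identical helper.
# Worse, if it wraps a billing call that is not idempotent, a retry after
# a timeout can double-charge — the exact concern the PR never raises.
def retry_with_resilience(fn, max_tries=5):
    for i in range(max_tries):
        try:
            return fn()
        except Exception:
            if i == max_tries - 1:
                raise
```

Nothing in the new helper is wrong in isolation, which is why the diff reads clean; the problem only appears when you know the repo already has `retry` and that one call site must never be retried.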
Nothing in it is obviously broken. That's the trap.
You still have to read every line because the output sounds more trustworthy than it deserves.
It is also:
- the PR description that sounds thoughtful but says nothing about why the change exists
- the generated spec that expands a small problem into three pages of padded prose
- the bug summary that reads smoothly but cannot be tied back to logs, tasks, or an actual incident
- the code patch that looks plausible but does not fit the repo it landed in
The common trait is not the tool; what is missing is judgment.
The internal version is the one that hurts
The public version of AI slop is annoying; the internal version is expensive.
It wastes review time, pollutes context, and creates the feeling of progress without much actual progress underneath it.
I do not think there is a paper out there formally measuring "slop," but there are adjacent signals pointing in the same direction.
DORA's 2025 report makes the broad systems point cleanly: AI mostly acts as an amplifier. It magnifies the strengths and weaknesses of the system you already have.3
GitClear's large-scale analysis found more churn and less reuse as AI-assisted coding rose, which is pretty close to what you'd expect when generation gets cheap and review discipline does not rise with it.4
Put those two together and the risk is pretty obvious.
My read is simple: if your team already has vague specs, weak ownership, and poor traceability, AI will not fix that. It will help you produce more artifacts with the same weaknesses baked in.
That's the real developer problem with slop: it is cheap to produce and expensive to absorb.
One person saves fifteen minutes. Three other people lose an hour trying to verify what they were handed.
How to spot AI slop in a dev workflow
The easiest way to spot slop is to stop asking whether it was made with AI and start asking what kind of review burden it creates.
I would use five tests.
1. It sounds local, but ignores local reality
This is the classic one.
The output uses the right language but misses the repo's actual abstractions, naming patterns, or historical constraints. It can talk fluently about caching, auth, retries, or migrations while still violating the exact conventions your system depends on.
That's not intelligence. It's syntax cosplay. If the summary could fit any repo, it probably fits none.
2. It is specific in syntax and vague in intent
AI slop often looks impressively detailed until you ask one level deeper.
Why this approach? What tradeoff did we choose? What previous decision does this respect? What can it safely touch and what must stay stable?
If the artifact gets blurry the moment those questions show up, you are not looking at finished work. You are looking at a polished first draft.
3. It expands the surface area faster than it sharpens the result
Slop loves volume.
More helpers. More wrapper functions. More bullets. More sections. More tickets. More "comprehensive" documentation.
The artifact gets bigger, so everybody gets to pretend the work got sharper.
But if the output keeps getting longer while the problem stays the same size, somebody is paying for that later.
Usually the reviewer.
4. It has no provenance
Good work leaves a trail behind it.
What ticket triggered this? What incident or request does it answer? Which logs, docs, customer notes, or commits support the summary? Which discussion made the tradeoff?
Slop has none of that.
It appears as a free-floating artifact with no evidence attached.
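The provenance test is even roughly machine-checkable. Here is a minimal sketch; the reference formats (JIRA-style ticket IDs, `INC-` incident IDs, commit SHAs) are assumptions for illustration, not any real tracker's conventions.

```python
import re

# Patterns for the kinds of references a provenance trail usually contains.
# These formats are illustrative assumptions, not a standard — adapt them
# to whatever your tracker and VCS actually emit.
PROVENANCE_PATTERNS = [
    r"\b[A-Z]{2,}-\d+\b",    # ticket IDs like PROJ-123
    r"\bINC-\d+\b",          # incident IDs
    r"\b[0-9a-f]{7,40}\b",   # commit SHAs (crude; can false-positive)
]

def has_provenance(text: str) -> bool:
    """Return True if the artifact references at least one traceable source."""
    return any(re.search(p, text) for p in PROVENANCE_PATTERNS)
```

A check like this will not tell you the links are honest, but it cheaply flags the free-floating artifact with no evidence attached at all.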
5. It makes review harder, not easier
This one matters most.
If I need to reverse-engineer assumptions, diff through avoidable boilerplate, or throw most of it away to salvage the useful 20%, the tool did not save time. It moved the labor.
That's slop.
What good AI use looks like instead
The alternative is not "never use AI". That would be shallow too. A better rule is simpler: use AI to compress labor, not to outsource judgment.
Good AI-assisted work usually has a few visible traits.
- the scope is clear before generation starts
- the output is smaller, not just longer
- assumptions are stated instead of hidden
- claims can be checked
- the reviewer can understand the why without playing detective
A good AI-assisted PR usually feels narrower, not broader.
It says: use the existing helper in shared/http, do not touch the billing path, this came from incident X and task Y, and here is what the reviewer should verify.
Very different from "comprehensive refactor of retry handling" followed by 600 lines of fresh boilerplate.
So I do not think the best AI workflow is the one that generates the most.
It is the one that leaves the cleanest trace:
- smaller diffs
- clearer specs
- fewer invented abstractions
- sharper summaries
- better links between tasks, decisions, commits, and outcomes
If AI is genuinely helping, the artifact should become easier to trust.
The standard I would use on a dev team
If an AI-generated artifact has no owner, no source trail, no constraint story, and no review path, reject it.
Not "clean it up later." Reject it.
That's the bar.
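As a rough sketch, that bar can even be expressed as a gate. The field names below are assumptions about how a team might structure artifact metadata, not a prescription; the point is that the rejection rule is mechanical once the fields exist.

```python
# Hypothetical reviewer gate for AI-generated artifacts. Field names are
# assumptions about how your team might structure PR or spec metadata.
REQUIRED_FIELDS = ("owner", "sources", "constraints", "review_path")

def review_gate(artifact: dict) -> tuple[bool, list[str]]:
    """Accept only if every accountability field is present and non-empty."""
    missing = [f for f in REQUIRED_FIELDS if not artifact.get(f)]
    return (len(missing) == 0, missing)
```

The useful part is not the three lines of logic; it is that "no owner, no source trail, no constraint story, no review path" stops being a vibe and becomes a list you can point to when rejecting.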
AI is supposed to remove drudge work while keeping judgment visible, not flood the system with plausible-looking work. If the output makes review harder, it is not acceleration. It is waste.
If you are adopting agents, build the system that keeps specs, tasks, commits, PRs, and decisions connected tightly enough that the output stays legible.
That's the layer we care about at One Horizon.