Creating Agents Is the New Prompt Engineering

The old game was crafting one clever prompt. The new game is designing the system around the model: repo instructions, scoped rules, tools, plugins, subagents, approvals, and task context.
People still talk about prompt engineering as if it means finding the perfect spell for a chat box.
That framing is out of date.
It belonged to the moment when the model sat behind a text field and your only real lever was wording.
Claude Code, Codex, and Cursor do not really work like that anymore. They have memory files, scoped rules, MCP servers, hooks, background execution, agent approvals, GitHub integrations, and specialized subagents.[1][2][3][4][5][6][7]
Once the model can read from multiple instruction layers and act through multiple tools, the prompt stops being the whole product. It becomes one layer in a much larger behavior stack.
That is why I think creating agents, skills, and plugins has quietly become the new prompt engineering.
The single-prompt era is over
Early prompt engineering was mostly local optimization.
You tried to get a better answer by changing phrasing, adding examples, or forcing a format. That still matters, but it is not where most of the leverage sits once you move into coding agents.
OpenAI's own GPT-5 coding guide now warns that vague or conflicting instructions in .cursor/rules or AGENTS.md can hurt results, and explicitly recommends thinking about tool budgets, reasoning effort, and how eager the agent should be when gathering context.[8] That is not a prompt-writing problem in the old sense. That is runtime design.
The same pattern shows up in the products themselves. Anthropic's Claude Code reads CLAUDE.md memory files recursively, exposes slash commands, connects to MCP servers, and lets you create subagents with their own prompts, tool access, and separate context windows.[1][2][3][4] Cursor gives you project rules in .cursor/rules, MCP integrations, and background agents tied into GitHub and remote environments.[5][6][7]
That is the shift.
You are no longer prompting a model once. You are configuring a system that will keep prompting itself.
The prompt is now a stack
The easiest way to miss this change is to keep looking for one magical instruction block.
There usually is not one.
What actually shapes behavior now is the combination of layers around the model. A root-level AGENTS.md tells the agent how the repo works. A scoped rule file overrides that behavior for a subset of files. An MCP server changes what context is available at decision time. A subagent narrows expertise and tool access for a specific type of task. A hook decides what happens before or after an edit. An approval policy determines whether the agent stops to ask or just executes.
By the time the model produces text, most of the real prompt engineering has already happened.
That is why so many teams feel disappointed when they copy a good-looking prompt from a thread and get inconsistent results. The missing variable is usually not the sentence. It is the surrounding system: which rules were active, which files were loaded, which tools were available, what the agent was allowed to touch, and whether the task had enough structure to survive delegation.
Single-prompt advice breaks down because the unit of control moved up a layer.
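To make the stack idea concrete, here is a minimal sketch of how layered instructions might resolve for a single file. Everything in it is an assumption made for illustration: the `AgentStack` and `ScopedRule` classes, the glob patterns, the tool names, and the choice to append scoped rules after root instructions are all hypothetical and do not reflect how any particular vendor resolves precedence.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class ScopedRule:
    """A rule that applies only to paths matching a glob (hypothetical shape)."""
    glob: str
    instructions: list[str]


@dataclass
class AgentStack:
    """Illustrative model of the layers around a model call, not any vendor's schema."""
    repo_instructions: list[str]  # e.g. the contents of a root AGENTS.md
    scoped_rules: list[ScopedRule] = field(default_factory=list)
    allowed_tools: set[str] = field(default_factory=set)  # tools exposed on purpose

    def effective_instructions(self, path: str) -> list[str]:
        """Root instructions first, then every scoped rule whose glob matches the path."""
        layers = list(self.repo_instructions)
        for rule in self.scoped_rules:
            if fnmatch(path, rule.glob):
                layers.extend(rule.instructions)
        return layers


stack = AgentStack(
    repo_instructions=["Run the test suite before proposing a change."],
    scoped_rules=[
        ScopedRule("src/auth/*", ["Do not modify the OAuth provider integration."]),
        ScopedRule("docs/*", ["Keep line length under 100 characters."]),
    ],
    allowed_tools={"read_file", "run_tests"},
)

# The same request sees different instructions depending on the file it touches.
for line in stack.effective_instructions("src/auth/refresh.py"):
    print(line)
```

The point of the sketch is that the text the model finally sees depends on which file is in play, which is exactly the variable a copied prompt leaves out.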
Every serious coding agent converged on the same idea
The vendor names are different. The pattern is not.
Anthropic talks about memory, MCP, slash commands, hooks, and subagents.[1][2][3][4] Cursor talks about rules, MCP, GitHub-backed background agents, and remote environments.[5][6][7] OpenAI's developer docs for Codex now expose configuration areas for Rules, Hooks, AGENTS.md, MCP, Plugins, Skills, and Subagents, which tells you exactly where the company thinks the control surface lives.[9] OpenAI's Docs MCP guide even recommends adding a line to AGENTS.md so the agent reliably consults a documentation MCP server without being reminded every time.[10] And when OpenAI describes the Codex SDK, it says the same thing in plainer language: the prompt, tool definitions, and agent loop were tuned together.[11]
That is not a minor UI detail.
It means the mainstream agent tooling has already accepted a new premise: good agent behavior comes from reusable environment design, not from rediscovering the same instructions in every chat.
Call those reusable pieces skills, plugins, agents, rules, or workflows if you want. The label matters less than the operating model. The winning pattern is always the same. Encode the behavior once. Scope it well. Reuse it across tasks. Keep humans in the loop where judgment still matters.
The old prompt engineer was part copywriter.
The new prompt engineer looks a lot more like a systems designer.
This changes what good prompting actually looks like
If you accept that the prompt is now a stack, the job changes with it.
Good prompt engineering used to mean writing instructions that sounded smart. Good agent design means making behavior legible.
You want small, composable rules instead of one bloated manifesto. You want subagents that own a narrow job instead of a super-agent that does everything badly. You want tools exposed on purpose, not by default. You want MCP servers that bring in real context, not more noise. You want approval policies that match risk. You want hooks and review steps that catch predictable failure modes before somebody merges polished nonsense.
This is also where most teams underestimate the amount of product thinking involved.
A skill is a product decision.
A plugin is a product decision.
A subagent boundary is a product decision.
So is the moment when the agent should stop, ask, summarize, or hand work back to a human.
Those decisions shape trust more than the wording of any single instruction block.
And yes, the sentence-level prompt still matters. If your top-level instructions are vague, contradictory, or bloated, the system will still underperform.[8] But that sentence block is now one ingredient, not the whole dish.
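One way to picture "tools exposed on purpose" and "approval policies that match risk" is a small decision function. This is a sketch under stated assumptions: the tool names, the risk tiers, and the `decide` function are hypothetical and are not taken from Claude Code, Codex, or Cursor.

```python
from enum import Enum


class Decision(Enum):
    AUTO = "auto"  # run without asking
    ASK = "ask"    # pause for human approval
    DENY = "deny"  # never run


# Hypothetical risk tiers; the tool names are made up for illustration.
TOOL_RISK = {
    "read_file": "low",
    "run_tests": "low",
    "edit_file": "medium",
    "git_push": "high",
}

# Hypothetical policy: only low-risk actions run unattended.
POLICY = {"low": Decision.AUTO, "medium": Decision.ASK, "high": Decision.ASK}


def decide(tool: str, allowed_tools: set[str]) -> Decision:
    """Deny anything not explicitly exposed, then map the tool's risk tier to a decision."""
    if tool not in allowed_tools:
        return Decision.DENY
    risk = TOOL_RISK.get(tool, "high")  # unknown tools are treated as high risk
    return POLICY[risk]


exposed = {"read_file", "edit_file"}
print(decide("read_file", exposed).value)
print(decide("edit_file", exposed).value)
print(decide("git_push", exposed).value)
```

The design choice worth noticing is the default: a tool that was never exposed is denied, and a tool with no known risk tier is treated as high risk rather than waved through.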
The real bottleneck is shared task context
This is the part people skip because it is less fun than talking about models.
The best agent stack in the world still falls apart if the underlying task is mush.
If the bug has no reproduction steps, if the initiative has no real definition of done, if the repo rules are tribal knowledge, if the review expectations live in one staff engineer's head, then the agent does what every other system does in that situation. It hallucinates structure. It produces plausible-looking work. It dumps verification cost on somebody else.
That is why I do not think the new prompt engineering starts with the prompt at all.
It starts with the task.
A clean task determines which agent you should invoke, which rules should apply, which tools are safe, what context needs to be pulled in, what the review surface should be, and what outcome counts as done. If that foundation is weak, the stack above it gets brittle fast.
Take a boring engineering task. "Fix the auth bug" is not useful agent context. "Reproduce the 401 on token refresh in mobile web, do not touch the OAuth provider, preserve current session-cookie behavior, and add a regression test around refresh flow" is. The second version already tells the system which files probably matter, which constraints are non-negotiable, what kind of tool use is justified, and how a reviewer should judge the result.
This is also why the phrase "spec-driven development" matters more in the agent era, not less. The spec is no longer just documentation for humans. It is part of the execution environment for the model.
Tasks, bugs, and initiatives are becoming the prompts that matter.
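The auth example above can be written down as a structured task rather than a sentence. The following is a minimal sketch, assuming a hypothetical `TaskSpec` record; the field names and the `missing_fields` check are illustrative and not a standard or any product's schema.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    """Hypothetical task record; field names are illustrative, not a standard."""
    goal: str
    reproduction: str = ""
    constraints: list[str] = field(default_factory=list)
    definition_of_done: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Names of the fields an agent would otherwise have to guess."""
        missing = []
        if not self.reproduction:
            missing.append("reproduction")
        if not self.constraints:
            missing.append("constraints")
        if not self.definition_of_done:
            missing.append("definition_of_done")
        return missing

    def to_context(self) -> str:
        """Render the task as plain text that could be handed to an agent."""
        lines = [f"Goal: {self.goal}"]
        if self.reproduction:
            lines.append(f"Reproduction: {self.reproduction}")
        lines += [f"Constraint: {c}" for c in self.constraints]
        lines += [f"Done when: {d}" for d in self.definition_of_done]
        return "\n".join(lines)


vague = TaskSpec(goal="Fix the auth bug")
clear = TaskSpec(
    goal="Fix the 401 on token refresh in mobile web",
    reproduction="401 on token refresh in mobile web",
    constraints=[
        "Do not touch the OAuth provider.",
        "Preserve current session-cookie behavior.",
    ],
    definition_of_done=["Add a regression test around the refresh flow."],
)

print(vague.missing_fields())
print(clear.to_context())
```

A check like `missing_fields` is deliberately crude. Its only job is to surface, before delegation, which parts of the task the agent would have to invent.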
Treat agent design like infrastructure
If this framing is right, then teams should stop treating agent setup as sidecar wizardry. Version the instructions, name the skills clearly, keep plugins narrow, audit tool access, review subagent boundaries, test approval policies, trim rules that turned into sludge, and track where the system is doing the wrong kind of work too eagerly.
In other words, manage agent behavior the way you would manage architecture, internal tooling, or CI.
Because that is what it is now.
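If agent setup is infrastructure, some of that list can run as a check in CI. Here is a rough sketch of what "audit tool access" and "trim rules that turned into sludge" could look like in code. The `SubagentConfig` shape, the 40-line threshold, and the `audit` function are all assumptions for illustration, not the configuration format of any real tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubagentConfig:
    """Hypothetical subagent definition; not the schema of any specific tool."""
    name: str
    instructions: str
    tools: Optional[list[str]] = None  # None stands for "no explicit tool list"


MAX_INSTRUCTION_LINES = 40  # arbitrary threshold, chosen only for illustration


def audit(configs: list[SubagentConfig]) -> list[str]:
    """Return human-readable findings for configs that need a second look."""
    findings = []
    for cfg in configs:
        if cfg.tools is None:
            findings.append(f"{cfg.name}: no explicit tool list")
        line_count = len(cfg.instructions.splitlines())
        if line_count > MAX_INSTRUCTION_LINES:
            findings.append(
                f"{cfg.name}: instructions are {line_count} lines "
                f"(limit {MAX_INSTRUCTION_LINES})"
            )
    return findings


configs = [
    SubagentConfig("reviewer", "Review diffs for missing tests.", ["read_file"]),
    SubagentConfig("helper", "\n".join(["Another rule."] * 50)),
]

for finding in audit(configs):
    print(finding)
```

A team would pick its own thresholds and checks. What matters is that the findings are versioned and reviewed like any other lint output.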
Prompt engineering did not disappear. It got absorbed into a bigger discipline.
The sentence still matters.
The system matters more.
That is the shift I care about, and it is a big part of how we think about One Horizon. If the real control surface now lives across tasks, tools, instructions, and handoffs, then the missing layer is not a better chat box. It is shared operational context. A system where the task already knows its goal, history, constraints, ownership, and shipped outcome before you hand it to Claude Code, Codex, or Cursor.
That is when the stack gets interesting.
Not when the prompt sounds clever.
When the whole chain holds.
Footnotes
[1] Anthropic, "Manage Claude's memory." https://docs.anthropic.com/en/docs/claude-code/memory
[2] Anthropic, "Slash commands." https://docs.anthropic.com/en/docs/claude-code/slash-commands
[3] Anthropic, "Connect Claude Code to tools via MCP." https://docs.anthropic.com/en/docs/claude-code/mcp
[4] Anthropic, "Subagents." https://docs.anthropic.com/en/docs/claude-code/sub-agents
[5] Cursor, "Rules." https://docs.cursor.com/en/context/rules
[6] Cursor, "Model Context Protocol (MCP)." https://docs.cursor.com/cli/mcp
[7] Cursor, "Background Agents." https://docs.cursor.com/background-agent
[8] OpenAI, "GPT-5 for Coding." The guide says vague or conflicting instructions in .cursor/rules or AGENTS.md can hurt results, and recommends setting tool budgets and eagerness explicitly. https://cdn.openai.com/API/docs/gpt-5-for-coding-cheatsheet.pdf
[9] OpenAI Developers, Codex documentation navigation. The Codex docs expose configuration areas for Rules, Hooks, AGENTS.md, MCP, Plugins, Skills, and Subagents. https://developers.openai.com/
[10] OpenAI Developers, "Docs MCP." The guide recommends adding an AGENTS.md instruction so agents use the docs MCP server automatically when answering OpenAI product questions. https://developers.openai.com/learn/docs-mcp
[11] OpenAI, "Codex is now generally available." OpenAI says the Codex SDK uses the same agent that powers the CLI and that its prompt, tool definitions, and agent loop were tuned together. https://openai.com/index/codex-now-generally-available/



