Claude, Codex, Cursor — you use AI coding agents every day, so why do some people work 10x faster while others are still at the copy-paste level? Looking at @systematicls's publicly shared "How to become a world-class agentic engineer," the key isn't plugins or tools — it's how you work.

TL;DR
1. Design context deliberately
2. Let agents self-verify
3. Remove friction with tools
4. Optimize your codebase for AI
5. Compound all of this across the team

What Is It?

"Agentic Engineering" is the next step beyond "Vibe Coding," the term Andrej Karpathy coined in 2025. If vibe coding was "just tell it and it'll figure it out," agentic engineering is a systematic methodology for designing the process by which AI agents plan, write code, test, and deploy.

Peter Steinberger (who joined OpenAI) went as far as saying on a podcast: "Vibe coding is almost a slur at this point. I call what I do agentic engineering." Karpathy himself emphasized this distinction. "Agentic" because you're delegating work to agents, "engineering" because it requires design and expertise.

So what does this actually look like in practice? Synthesizing @systematicls's original post with various real-world examples, the core principles distill into five.

The key distinction

Vibe coding = "Ask ChatGPT and copy-paste"
Agentic engineering = "Design systems where agents plan, execute, verify, and deploy on their own"
Both use AI, but the quality of output is completely different.

What's Different?

Principle 1. Context Engineering — Cut the noise, feed only the precise context

"The context window is 1 million tokens, so just dump everything in, right?" It's easy to think that way, but Meta Staff Engineer John Kim nails it: "Models produce probabilistic output. You need to feed exactly the right amount for accurate results. More input doesn't mean better output." Features like CLAUDE.md, slash commands, and MCP in Claude Code are ultimately mechanisms for managing context systematically.

Even internally at OpenAI, they call this "Harness Engineering" and experimented with pushing domain knowledge directly into the codebase. Their conclusion was blunt: "If the domain knowledge isn't in the codebase, it doesn't exist to the agent."
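As an illustration only (every path, command, and convention below is invented, not from the original post), a CLAUDE.md written this way reads like an onboarding doc rather than a dump:

```markdown
# CLAUDE.md

## Build & test
- Install: `pnpm install`
- Test: `pnpm test` (run before every commit)

## Conventions
- API handlers live in `src/api/`, one file per route.
- Database access goes through the repository layer in `src/db/`;
  never call the ORM directly from a handler.

## Things you will get wrong
- `legacy/` is frozen; new code belongs in `services/`.
- Feature flags are read at startup only; changing one requires a restart.
```

Note what's absent: nothing the agent can infer from the code itself, only the decisions and traps it cannot.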

Principle 2. Agentic Verification — Let the agent verify its own output

This is a key point emphasized by Boris Cherny, creator of Claude Code. If you tell the agent to write code, then manually review it, then give correction instructions... that's no different from vibe coding. Instead, build loops where the agent verifies its own work: for backends, auto-run the tests; for frontends, open a browser with Playwright to capture and compare screenshots; for mobile, check interactions on an Android emulator via ADB.

PulseMCP calls this "closing the agentic loop." Once the loop is closed, a human invests 5 minutes to kick off the agent, and the agent runs autonomously for 30+ minutes, opening a PR with CI green, tests passing, and self-review complete.
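A closed loop like this can be sketched in a few lines of Python. Both callbacks here are stand-ins of my own invention: in a real harness, `agent_step` would invoke Claude Code (or another agent) and `run_checks` would be your actual test runner, browser check, or log scan.

```python
import subprocess

def pytest_checks() -> tuple[bool, str]:
    """One possible verifier: run the test suite and capture its output."""
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def closed_loop(agent_step, run_checks, max_iterations: int = 5) -> bool:
    """Drive the agent until its own checks pass, or give up.

    agent_step(feedback): one agent invocation; gets the failing output.
    run_checks(): returns (passed, output) from tests/browser/logs.
    """
    feedback = ""  # the first run starts from the task prompt alone
    for _ in range(max_iterations):
        agent_step(feedback)           # agent writes or fixes code
        passed, output = run_checks()  # verification the agent can act on
        if passed:
            return True                # loop closed: green result
        feedback = output              # failures become the next prompt
    return False
```

The same skeleton covers the frontend and mobile cases from above; only `run_checks` changes.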

|  | Vibe Coding (manual review) | Agentic Engineering (automated verification loop) |
| --- | --- | --- |
| Output verification | Developer reviews by eye | Agent self-verifies via tests, browser, and logs |
| Feedback speed | Waits until the developer checks | Instant feedback; auto-corrects iteratively |
| Developer role | Reviews code line by line | Designs verification systems; reviews only the final PR |
| Agent runtime | One prompt → one result | 5 min of setup → 30+ min of autonomous execution |
| Quality consistency | Varies with the developer's focus | Verification system applies consistent standards |

Principle 3. Agentic Tooling — Build tools that remove friction for agents

Peter Steinberger calls this "friction." Every point where an agent gets stuck or can't do something is an opportunity. A specific API isn't accessible via CLI? Build a CLI. A specific validation can't be automated? Build an MCP server. Steinberger credits the CLI tools he built in advance for the success of his OpenClaw project.

Simon Willison adds an important principle: "Hoard things you know how to do." Small code experiments, solved problems, prompts that worked well — accumulating these gives you a powerful reference when asking agents to build new features. In agentic engineering, a developer's real asset isn't "typing speed" but "the range of what they know is possible."
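To make "build a CLI" concrete, here is a minimal sketch in Python. The feature-flag service it wraps is entirely hypothetical (a local JSON file stands in for a real internal API); the point is the shape: one command an agent can run instead of clicking through a dashboard it cannot see.

```python
import argparse
import json
from pathlib import Path

# Hypothetical backing store; a real tool would call your internal API here.
FLAGS_FILE = Path("flags.json")

def _load() -> dict:
    return json.loads(FLAGS_FILE.read_text()) if FLAGS_FILE.exists() else {}

def get_flag(name: str) -> str:
    return str(_load().get(name, "unset"))

def set_flag(name: str, value: str) -> None:
    flags = _load()
    flags[name] = value
    FLAGS_FILE.write_text(json.dumps(flags, indent=2))

def main(argv=None) -> int:
    parser = argparse.ArgumentParser(
        prog="flags",
        description="Read/write feature flags so the agent never needs the UI.",
    )
    sub = parser.add_subparsers(dest="cmd", required=True)
    get_p = sub.add_parser("get")
    get_p.add_argument("name")
    set_p = sub.add_parser("set")
    set_p.add_argument("name")
    set_p.add_argument("value")
    args = parser.parse_args(argv)
    if args.cmd == "get":
        print(get_flag(args.name))  # plain stdout: trivial for agents to parse
    else:
        set_flag(args.name, args.value)
    return 0

if __name__ == "__main__":
    raise SystemExit(main())
```

Once this exists, "flip the flag and re-run the tests" becomes a task the agent can finish alone, which is exactly the friction test described above.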

Principle 4. Agentic Codebase — Make your codebase AI-readable

Is your codebase optimized for AI agents? Probably not. Dead code, half-finished migrations with two competing patterns, conflicting frameworks... when these get into an agent's context, they act like poison. When an agent generates code with a weird pattern, it's not the agent's fault — it's the confusing signals in your codebase.

The OpenAI team went further. They standardized file structures so agents could generate code consistently, added agent-specific logging, and wrote documentation readable by both humans and agents. They're encoding their "Golden Principles" directly into the repo.
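"Agent-specific logging" could take many forms; one plausible sketch (my assumption, not OpenAI's actual setup) is one JSON object per line, so an agent can filter logs with grep or jq instead of guessing at free-form text:

```python
import json
import logging
import sys

class AgentJsonFormatter(logging.Formatter):
    """Format each record as a single JSON object on one line,
    making logs machine-parseable for agents as well as humans."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "module": record.module,
            "event": record.getMessage(),
            # optional machine-readable pointer to the agent's next step
            "hint": getattr(record, "hint", None),
        })

def make_agent_logger(name: str = "agent") -> logging.Logger:
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(AgentJsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Usage would look like `make_agent_logger().info("cache miss", extra={"hint": "run scripts/warm_cache.py"})`, where the `hint` field is an invented convention: a breadcrumb the agent can follow without human interpretation.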

Principle 5. Compound Engineering — Let all four principles compound across the entire team

This is the "Compound Engineering" concept from Dan Schipper, co-founder of Every. Context design, verification loops, tools, codebase optimization — if only one person does these, only individual productivity improves. But when the entire team contributes knowledge to CLAUDE.md, builds new MCPs, and shares verification scripts, compound effects emerge. The next session's agent starts with more tools and context than the previous one.

- 1,000+: weekly PRs merged by Stripe's Minions
- 89%: Zapier's org-wide AI adoption rate
- 500K hrs: engineering time saved by TELUS

These numbers come from organizations that adopted agentic engineering systematically, not from lone individuals. Stripe's Minions system works like this: a developer posts a task in Slack, the agent writes the code, passes CI, and opens a PR. From task assignment to open PR, no human touches the code; the human's only job is the final review and merge.

Quick Start Guide

  1. Rewrite your CLAUDE.md (or AGENTS.md) as an "agent onboarding doc"
    Write down your project's build commands, coding conventions, and architecture decisions. Skip the obvious stuff like "we use TypeScript" and keep only what Claude is likely to get wrong. If domain knowledge isn't in the codebase, it doesn't exist to the agent.
  2. Close one verification loop
    For your most frequent task, make the agent verify its own results. For backend work, instruct "run the tests and iterate until they all pass." For frontend, use Playwright MCP to verify in the browser. Closing just one loop makes a noticeable difference.
  3. Solve one friction point with a tool
    Find the most common thing your agent can't do on its own. Whether it's manually changing something on a website or calling a specific API, build a CLI tool or MCP server that handles it. The test: "Next time this task comes up, can the agent handle it alone?"
  4. Clean up dead code and competing patterns in your codebase
    Half-finished migrations, two coexisting patterns, unused code — clean it up. Agents work probabilistically, so conflicting patterns mean they'll randomly follow either one.
  5. Start sharing across the team & compound your gains
    Commit new skills, MCP servers, and verification scripts to the repo. The test: "Can the tool I just built be used by the next person (or next agent session)?" Solo workflows don't compound.

Warning: 3 anti-patterns

These are things Simon Willison has clearly warned about.
1. Don't merge agent-written code without review. The agent's PR descriptions need human verification too.
2. Don't blindly trust agent-written tests. You can end up with "self-fulfilling tests" that pass with hardcoded values.
3. Don't settle for "it works, so it's fine." If you lower review standards just because code generation costs dropped, technical debt accumulates at AI speed.