Introduction: The Problem with “Free-form” AI Coding
I’ve been experimenting with various spec-driven development (SDD) approaches to Vibe Coding, such as OpenSpec, Spec Kit, and oh-my-openagent’s Prometheus.
Andrej Karpathy introduced the concept of Vibe Coding in February 2025—describing intent in natural language, letting LLMs generate code, “Accept All” without reviewing diffs, and just copy-pasting error messages for AI to fix when things go wrong.
But once I actually started, I discovered a critical problem: AI keeps improvising beyond my intent.
Especially when maintaining existing projects—give it a requirement, and it’ll churn out three different implementation approaches, each “seemingly reasonable,” but which one is actually what you wanted?
What is Spec-Driven Development (SDD)?
Spec-Driven Development (SDD), or “Contract-First Development,” has a core principle that’s straightforward:
Define specifications first, then let AI execute.
This isn’t the traditional “write code first, documentation is scaffolding”—it’s treating specifications as the Single Source of Truth.
GitHub Spec Kit proposes a six-phase workflow:
- /constitution — Define project-level principles (quality standards, architecture constraints)
- /specify — Describe “what” the feature does, not “how”
- /clarify — Eliminate ambiguity, identify boundary conditions
- /plan — Technical decisions, architecture choices
- /tasks — Break down into executable units
- /implement — Write code according to specs
This workflow made me realize: Vibe Coding’s ultimate goal isn’t maximizing model creativity—it’s ensuring intent is 100% understood and implemented.
Harness Engineering: Putting Reins on the Model
Anthropic has a classic metaphor: the model is the horse, and the harness is everything around it that steers that power in the direction you intend.
Harness Engineering is the discipline of designing, building, and operating AI Agent infrastructure—constraining, guiding, verifying, and correcting AI in production environments.
This is fundamentally different from Prompt Engineering:
| Aspect | Prompt Engineering | Harness Engineering |
|---|---|---|
| Scope | Instruction text sent to Model | Entire infrastructure around the Model |
| Focus | Making Model understand the task | How the system constrains, validates, corrects the Model |
| Reliability Improvement | 5-15% | 50-80% |
A real case left a deep impression: A financial company’s Agent ran out of control at 3 AM, continuously retrying a failed API for 11 minutes—847 retries total, costing $2,200 in API fees, and sending 14 duplicate emails to the same customer.
Post-mortem analysis revealed that the model ran fine and the prompts were carefully designed; the problem was at the infrastructure level, which was missing three hard controls: a retry limit, an execution timeout, and a circuit breaker.
This incident truly helped me understand Harness’s core value: Writing “please don’t exceed 10 retries” in a Prompt is just a suggestion—AI can ignore it, it can be overridden by other instructions. But a retry limit enforced at the Harness layer is true governance—Agents cannot bypass it, they must comply.
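To make the distinction concrete, here is a minimal sketch of what such harness-level hard controls might look like. All class, method, and parameter names are my own invention for illustration, not from any real framework:

```python
import time

class CircuitOpenError(Exception):
    """Raised when the circuit breaker refuses further calls."""

class Harness:
    """Hypothetical harness layer: limits the agent cannot talk its way past."""

    def __init__(self, max_retries=10, timeout_s=60.0, failure_threshold=5):
        self.max_retries = max_retries              # hard retry ceiling
        self.timeout_s = timeout_s                  # total execution budget
        self.failure_threshold = failure_threshold  # failures before the breaker trips
        self._consecutive_failures = 0
        self._circuit_open = False

    def call(self, action):
        """Run `action` under retry, timeout, and circuit-breaker limits."""
        if self._circuit_open:
            raise CircuitOpenError("circuit open: refusing further calls")
        deadline = time.monotonic() + self.timeout_s
        for attempt in range(1, self.max_retries + 1):
            if time.monotonic() > deadline:
                raise TimeoutError("execution budget exhausted")
            try:
                result = action()
                self._consecutive_failures = 0
                return result
            except Exception:
                self._consecutive_failures += 1
                if self._consecutive_failures >= self.failure_threshold:
                    self._circuit_open = True  # trip: no 847-retry storms at 3 AM
                    raise CircuitOpenError("circuit tripped after repeated failures")
        raise RuntimeError(f"gave up after {self.max_retries} attempts")
```

With failure_threshold=5, an always-failing API stops after five attempts and every subsequent call is refused immediately, no matter what the prompt says.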
The Paradigm Shift is Complete
From Conversational Interaction to Strict Constraints
In the past, we:
- Kept conversing with AI interactively
- Wrote better prompts
- Used stronger single models (GPT-4 → GPT-4o → Claude Opus)
Now in Harness Engineering:
- Strict constraints: What to do, what not to do
- Human intent understanding: Interview-style clarification, eliminating ambiguity
- Full-process control: From spec → plan → execute → verify
- Multi-agent multi-model collaboration (oh-my-openagent plugin): Sisyphus + Prometheus + Metis + Momus
Core Components
Harness Engineering has five core components:
- Context Engineering — Deciding what information the Agent sees at each step
- Tool Orchestration — Tool selection, parameter validation, execution sandbox
- Verification Loops — Verifying output at each step (83% → 96% task completion rate)
- Cost Envelope Management — Per-task budget ceiling (median × 3)
- Observability — Structured execution tracing, causality recording
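As a sketch of how two of these components might compose, here is a hypothetical verification loop wrapped in a cost envelope using the median × 3 ceiling mentioned above. The function names and interfaces are invented for illustration:

```python
from statistics import median

def run_with_harness(task, generate, verify, history_costs, max_attempts=3):
    """Hypothetical verification loop inside a per-task cost envelope.

    `generate(task)` returns (output, cost); `verify(output)` returns bool.
    The budget ceiling is 3x the median cost of similar past tasks.
    """
    budget = 3 * median(history_costs)  # cost envelope: median × 3
    spent = 0.0
    for _ in range(max_attempts):
        output, cost = generate(task)
        spent += cost
        if spent > budget:
            raise RuntimeError(f"cost envelope exceeded: {spent:.2f} > {budget:.2f}")
        if verify(output):              # verify every step, don't trust self-reports
            return output
    raise RuntimeError("output failed verification after all attempts")
```

The point is that the budget and the verification check live outside the model: a generation that fails verification gets retried, and a run that blows the envelope gets killed, regardless of what the agent intended.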
My Harness Engineering Practice: DDD + gRPC Projects
Step 1: Define gRPC Input/Output Specifications
In DDD architecture projects, you must manually define gRPC input/output specifications first.
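For illustration, a minimal hypothetical contract for an order service might look like this (all package, service, and field names are invented):

```protobuf
syntax = "proto3";

package order.v1;

// Hypothetical order-cancellation contract: the AI may only implement
// what these messages and this service describe.
service OrderService {
  rpc CancelOrder(CancelOrderRequest) returns (CancelOrderResponse);
}

message CancelOrderRequest {
  string order_id = 1;
  string reason   = 2;
}

message CancelOrderResponse {
  bool   cancelled = 1;
  string message   = 2;
}
```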
The .proto file is the contract: the AI cannot improvise beyond its scope.
Step 2: Let AI Fully Explore the Project
Existing projects have many reusable resources in the repo layer, so the AI first needs to do a global, read-only exploration and produce a project-summary markdown.
Using oh-my-openagent’s /init-deep, I launch multiple Agents in parallel to:
- Search existing data models
- Analyze repo layer interfaces
- Extract reusable components
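The fan-out pattern behind this step can be sketched generically; this is not /init-deep’s actual implementation, just an illustration of parallel read-only exploration with invented task names:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical read-only exploration tasks, one per parallel sub-agent.
TASKS = {
    "data_models": "Search existing data models",
    "repo_interfaces": "Analyze repo layer interfaces",
    "reusable": "Extract reusable components",
}

def explore(name, instruction):
    # In a real harness this would launch a read-only sub-agent and
    # collect its findings; here we return a placeholder section.
    return f"## {name}\n{instruction}: ...findings...\n"

def project_summary():
    """Run all exploration tasks in parallel and merge into one markdown doc."""
    with ThreadPoolExecutor() as pool:
        sections = pool.map(lambda kv: explore(*kv), TASKS.items())
    return "\n".join(sections)
```

The merged summary then becomes the shared context for later steps, instead of every session re-reading the whole repo.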
Step 3: Write Business Detail Markdowns
Remember: Don’t try to pack all business details into one document.
Instead, break the details down at small granularity, one use case or module per file.
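For illustration, a hypothetical breakdown for an order feature might look like this (file names invented):

```text
docs/specs/
├── order-overview.md   # domain glossary, invariants (kept short)
├── order-create.md     # one use case per file
├── order-cancel.md
└── order-refund.md     # each small enough to fit one task's context
```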
Principle: don’t let the main Agent burn too much context on any single task. Even with context compression, compression means losing information accuracy.
Step 4: Research Phase—Never Write Code First
This is the most critical step: you must explicitly tell the AI not to start writing code immediately.
As context accumulates, models tend to hallucinate and jump into execution prematurely. Instead, dedicate a single session’s context to strict research: have the AI study the material, raise questions, and refine the previously written specifications together with you.
The pattern is interview-like: the AI reads the spec, asks questions, and I answer until the ambiguity is gone.
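A hypothetical dialogue, invented here for illustration (the spec file name and the questions are made up):

```text
Me: Read docs/specs/order-cancel.md. Do NOT write code yet.
    List every ambiguity or missing boundary condition you find.
AI: Three questions before implementation:
    1. Can a shipped order be cancelled, or should that be rejected?
    2. Is the refund issued synchronously or queued?
    3. What happens on a second cancel request for the same order?
Me: 1. Rejected with a domain error. 2. Queued. 3. Idempotent no-op.
AI: Understood. Updating the spec with these three rules.
```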
This kind of iterative clarification is much more effective than directly letting AI write code.
Step 5: After Research, Use OpenSpec or Prometheus to Build Tasks/Plans
Two approaches I commonly use:
- OpenSpec’s /opsx:propose, which turns the refined specs into a formal change proposal
- oh-my-openagent’s Prometheus, which builds the plan through interview-style questioning
Personally, I prefer the omo plugin’s interview-style, dialogue-driven confirmation of the many details involved.
Step 6: After Plans/Tasks Created, Start New Session to Execute
Once the Plans or Tasks markdown files exist, I create a new AI session dedicated to executing them.
Using omo’s /start-work, or OpenSpec’s /opsx-apply xxxx, the session will:
- Read validated plans
- Delegate to specialized Agents (code generation, testing, integration)
- Independently validate results
- Main Agent coordinates
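The delegation pattern in this step can be sketched generically; this is not any tool’s real API, just the shape of the loop, with invented names:

```python
def execute_plan(plan_tasks, agents, validate):
    """Hypothetical main-agent loop: delegate each task, validate independently.

    `agents` maps a task kind (e.g. "code", "test") to a specialized agent;
    `validate(task, output)` is an independent check, not the agent's self-report.
    """
    results = []
    for task in plan_tasks:
        agent = agents[task["kind"]]      # pick the specialized agent
        output = agent(task)              # delegate: code gen, testing, integration
        if not validate(task, output):    # verify before accepting
            raise RuntimeError(f"task {task['id']} failed validation")
        results.append(output)
    return results
```

The main Agent never writes code itself here; it only routes tasks and accepts or rejects results.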
Side note: oh-my-openagent’s multi-agent, multi-category delegation doesn’t seem to kick in when using OpenSpec skills; it simply lets your current main model execute everything itself.
Summary: Real Feelings After Practice
After 2+ months of using Harness Engineering for Vibe Coding, the biggest takeaway isn’t any metric; it’s that my mindset changed.
Before, using AI to write code, every step felt uneasy—did it misunderstand? Would it add unnecessary features on its own? Would it complicate simple things?
Now with spec constraints, these problems basically disappeared. Not because I used a stronger model, but because I “locked down” intent before AI execution.
Several obvious changes:
Less rework. A feature used to average 3-4 back-and-forth revisions; now most pass on the first try. Not because the AI suddenly got smarter, but because the specifications settled the open questions before it started.
Lower cost. For the same tasks, token consumption dropped by about one-third, because the AI no longer needs repeated attempts and corrections.
Lower model requirements. I used to feel that only Claude Opus produced good work. With comprehensive specs, Sonnet performs comparably.
Vibe Coding’s essence isn’t “making AI smarter”—it’s “making AI follow instructions better”.
The prerequisite for following instructions: You state things clearly first.
This is Harness Engineering’s core—it’s not about unleashing stronger model capabilities, but ensuring your intent is accurately understood and strictly executed.
Building this infrastructure took time; I spent over two months exploring before getting comfortable with it. But once it’s in place, AI coding transforms from a guessing game into building from a blueprint: you can truly enjoy Vibe Coding, instead of constantly watching for the AI to run off track.
If you’re interested in spec-driven development, check out my experience with OpenSpec in OpenCode. And for AI agent model configuration, see my oh-my-opencode optimization journey.