Introduction: The Problem with “Free-form” AI Coding
I’ve been experimenting with various spec-driven development (SDD) approaches to Vibe Coding, such as OpenSpec, Spec Kit, and oh-my-openagent’s Prometheus.
Andrej Karpathy introduced the concept of Vibe Coding in February 2025—describing intent in natural language, letting LLMs generate code, “Accept All” without reviewing diffs, and just copy-pasting error messages for AI to fix when things go wrong.
But once I actually started, I discovered a critical problem: AI keeps improvising beyond my intent.
Especially when maintaining existing projects—give it a requirement, and it’ll churn out three different implementation approaches, each “seemingly reasonable,” but which one is actually what you wanted?
What is Spec-Driven Development (SDD)?
Spec-Driven Development (SDD), or “Contract-First Development,” has a core principle that’s straightforward:
Define specifications first, then let AI execute.
This isn’t the traditional “write code first, documentation is scaffolding”—it’s treating specifications as the Single Source of Truth.
GitHub Spec Kit proposes a six-phase workflow:
- /constitution — Define project-level principles (quality standards, architecture constraints)
- /specify — Describe “what” the feature does, not “how”
- /clarify — Eliminate ambiguity, identify boundary conditions
- /plan — Technical decisions, architecture choices
- /tasks — Break down into executable units
- /implement — Write code according to specs
This workflow made me realize: Vibe Coding’s ultimate goal isn’t maximizing model creativity—it’s ensuring intent is 100% understood and implemented.
Harness Engineering: Putting Reins on the Model
Anthropic has a classic metaphor: the model is the horse, and the harness is everything around it that steers that power in the direction you intend.
Harness Engineering is the discipline of designing, building, and operating AI Agent infrastructure—constraining, guiding, verifying, and correcting AI in production environments.
This is fundamentally different from Prompt Engineering:
| Aspect | Prompt Engineering | Harness Engineering |
|---|---|---|
| Scope | Instruction text sent to Model | Entire infrastructure around the Model |
| Focus | Making Model understand the task | How the system constrains, validates, corrects the Model |
| Reliability Improvement | 5-15% | 50-80% |
A real case left a deep impression: A financial company’s Agent ran out of control at 3 AM, continuously retrying a failed API for 11 minutes—847 retries total, costing $2,200 in API fees, and sending 14 duplicate emails to the same customer.
Post-mortem analysis revealed that the model ran fine and the prompts were carefully designed; the problem was at the infrastructure level, which was missing three hard controls: a retry limit, an execution timeout, and a circuit breaker.
This incident truly helped me understand Harness’s core value: Writing “please don’t exceed 10 retries” in a Prompt is just a suggestion—AI can ignore it, it can be overridden by other instructions. But a retry limit enforced at the Harness layer is true governance—Agents cannot bypass it, they must comply.
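To make the distinction concrete, here is a minimal sketch of what such harness-level hard controls might look like. All class, method, and parameter names are my own invention for illustration, not from any real framework:

```python
import time

class CircuitOpenError(Exception):
    """Raised when the circuit breaker refuses further calls."""

class Harness:
    """Hypothetical harness layer: limits the agent cannot talk its way past."""

    def __init__(self, max_retries=10, timeout_s=60.0, failure_threshold=5):
        self.max_retries = max_retries              # hard retry ceiling
        self.timeout_s = timeout_s                  # total execution budget
        self.failure_threshold = failure_threshold  # failures before the breaker trips
        self._consecutive_failures = 0
        self._circuit_open = False

    def call(self, action):
        """Run `action` under retry, timeout, and circuit-breaker limits."""
        if self._circuit_open:
            raise CircuitOpenError("circuit open: refusing further calls")
        deadline = time.monotonic() + self.timeout_s
        for attempt in range(1, self.max_retries + 1):
            if time.monotonic() > deadline:
                raise TimeoutError("execution budget exhausted")
            try:
                result = action()
                self._consecutive_failures = 0
                return result
            except Exception:
                self._consecutive_failures += 1
                if self._consecutive_failures >= self.failure_threshold:
                    self._circuit_open = True  # trip: no 847-retry storms at 3 AM
                    raise CircuitOpenError("circuit tripped after repeated failures")
        raise RuntimeError(f"gave up after {self.max_retries} attempts")
```

With failure_threshold=5, an always-failing API stops after five attempts and every subsequent call is refused immediately, no matter what the prompt says.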
The Paradigm Shift is Complete
From Conversational Interaction to Strict Constraints
In the past, we:
- Kept conversing with AI interactively
- Wrote better prompts
- Used stronger single models (GPT-4 → GPT-4o → Claude Opus)
Now in Harness Engineering:
- Strict constraints: What to do, what not to do
- Human intent understanding: Interview-style clarification, eliminating ambiguity
- Full-process control: From spec → plan → execute → verify
- Multi-agent multi-model collaboration (oh-my-openagent plugin): Sisyphus + Prometheus + Metis + Momus
Core Components
Harness Engineering has five core components:
- Context Engineering — Deciding what information the Agent sees at each step
- Tool Orchestration — Tool selection, parameter validation, execution sandbox
- Verification Loops — Verifying output at each step (83% → 96% task completion rate)
- Cost Envelope Management — Per-task budget ceiling (median × 3)
- Observability — Structured execution tracing, causality recording
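As a sketch of how two of these components might compose, here is a hypothetical verification loop wrapped in a cost envelope using the median × 3 ceiling mentioned above. The function names and interfaces are invented for illustration:

```python
from statistics import median

def run_with_harness(task, generate, verify, history_costs, max_attempts=3):
    """Hypothetical verification loop inside a per-task cost envelope.

    `generate(task)` returns (output, cost); `verify(output)` returns bool.
    The budget ceiling is 3x the median cost of similar past tasks.
    """
    budget = 3 * median(history_costs)  # cost envelope: median × 3
    spent = 0.0
    for _ in range(max_attempts):
        output, cost = generate(task)
        spent += cost
        if spent > budget:
            raise RuntimeError(f"cost envelope exceeded: {spent:.2f} > {budget:.2f}")
        if verify(output):              # verify every step, don't trust self-reports
            return output
    raise RuntimeError("output failed verification after all attempts")
```

The point is that the budget and the verification check live outside the model: a generation that fails verification gets retried, and a run that blows the envelope gets killed, regardless of what the agent intended.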
My Harness Engineering Practice: DDD + gRPC Projects
Step 1: Define gRPC Input/Output Specifications
In DDD architecture projects, you must manually define gRPC input/output specifications first.
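For illustration, a minimal hypothetical contract for an order service might look like this (all package, service, and field names are invented):

```protobuf
syntax = "proto3";

package order.v1;

// Hypothetical order-cancellation contract: the AI may only implement
// what these messages and this service describe.
service OrderService {
  rpc CancelOrder(CancelOrderRequest) returns (CancelOrderResponse);
}

message CancelOrderRequest {
  string order_id = 1;
  string reason   = 2;
}

message CancelOrderResponse {
  bool   cancelled = 1;
  string message   = 2;
}
```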
The .proto file is the contract: the AI cannot improvise beyond its scope.
Step 2: Let AI Fully Explore the Project
Existing projects have many reusable resources in the repo layer, so the AI first needs to do a global, read-only exploration and produce a project-summary markdown.
Using oh-my-openagent’s /init-deep, I launch multiple Agents in parallel to:
- Search existing data models
- Analyze repo layer interfaces
- Extract reusable components
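The fan-out pattern behind this step can be sketched generically; this is not /init-deep’s actual implementation, just an illustration of parallel read-only exploration with invented task names:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical read-only exploration tasks, one per parallel sub-agent.
TASKS = {
    "data_models": "Search existing data models",
    "repo_interfaces": "Analyze repo layer interfaces",
    "reusable": "Extract reusable components",
}

def explore(name, instruction):
    # In a real harness this would launch a read-only sub-agent and
    # collect its findings; here we return a placeholder section.
    return f"## {name}\n{instruction}: ...findings...\n"

def project_summary():
    """Run all exploration tasks in parallel and merge into one markdown doc."""
    with ThreadPoolExecutor() as pool:
        sections = pool.map(lambda kv: explore(*kv), TASKS.items())
    return "\n".join(sections)
```

The merged summary then becomes the shared context for later steps, instead of every session re-reading the whole repo.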
Step 3: Write Business Detail Markdowns
Remember: Don’t try to pack all business details into one document.
Instead, break the details down at small granularity, one use case or module per file.
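For illustration, a hypothetical breakdown for an order feature might look like this (file names invented):

```text
docs/specs/
├── order-overview.md   # domain glossary, invariants (kept short)
├── order-create.md     # one use case per file
├── order-cancel.md
└── order-refund.md     # each small enough to fit one task's context
```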
Principle: don’t let the main Agent burn too much context on any single task. Even with context compression, compression means losing information accuracy.
Step 4: Research Phase—Never Write Code First
This is the most critical step: you must explicitly tell the AI not to start writing code immediately.
As context accumulates, models tend to hallucinate and jump into execution prematurely. Instead, dedicate a single session’s context to strict research: have the AI study the material, raise questions, and refine the previously written specifications together with you.
The pattern is interview-like: the AI reads the spec, asks questions, and I answer until the ambiguity is gone.
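A hypothetical dialogue, invented here for illustration (the spec file name and the questions are made up):

```text
Me: Read docs/specs/order-cancel.md. Do NOT write code yet.
    List every ambiguity or missing boundary condition you find.
AI: Three questions before implementation:
    1. Can a shipped order be cancelled, or should that be rejected?
    2. Is the refund issued synchronously or queued?
    3. What happens on a second cancel request for the same order?
Me: 1. Rejected with a domain error. 2. Queued. 3. Idempotent no-op.
AI: Understood. Updating the spec with these three rules.
```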
This kind of iterative clarification is much more effective than directly letting AI write code.
Step 5: After Research, Use OpenSpec or Prometheus to Build Tasks/Plans
Two approaches I commonly use:
- OpenSpec’s /opsx:propose, which turns the refined specs into a formal change proposal
- oh-my-openagent’s Prometheus, which builds the plan through interview-style questioning
Personally, I prefer the omo plugin’s interview-style, dialogue-driven confirmation of the many details involved.
Step 6: After Plans/Tasks Created, Start New Session to Execute
Once the Plans or Tasks markdown files exist, I create a new AI session dedicated to executing them.
Using omo’s /start-work, or OpenSpec’s /opsx-apply xxxx, the session will:
- Read validated plans
- Delegate to specialized Agents (code generation, testing, integration)
- Independently validate results
- Main Agent coordinates
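The delegation pattern in this step can be sketched generically; this is not any tool’s real API, just the shape of the loop, with invented names:

```python
def execute_plan(plan_tasks, agents, validate):
    """Hypothetical main-agent loop: delegate each task, validate independently.

    `agents` maps a task kind (e.g. "code", "test") to a specialized agent;
    `validate(task, output)` is an independent check, not the agent's self-report.
    """
    results = []
    for task in plan_tasks:
        agent = agents[task["kind"]]      # pick the specialized agent
        output = agent(task)              # delegate: code gen, testing, integration
        if not validate(task, output):    # verify before accepting
            raise RuntimeError(f"task {task['id']} failed validation")
        results.append(output)
    return results
```

The main Agent never writes code itself here; it only routes tasks and accepts or rejects results.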
Side note: oh-my-openagent’s multi-agent, multi-category delegation doesn’t seem to kick in when using OpenSpec skills; it simply lets your current main model execute everything itself.
Summary: Real Feelings After Practice
After 2+ months of using Harness Engineering for Vibe Coding, the biggest takeaway isn’t any metric; it’s that my mindset changed.
Before, using AI to write code, every step felt uneasy—did it misunderstand? Would it add unnecessary features on its own? Would it complicate simple things?
Now with spec constraints, these problems basically disappeared. Not because I used a stronger model, but because I “locked down” intent before AI execution.
Several obvious changes:
Less rework. A feature used to average 3-4 back-and-forth revisions; now most pass on the first try. Not because the AI suddenly got smarter, but because the specifications settled the open questions before it started.
Lower cost. For the same tasks, token consumption dropped by about one-third, because the AI no longer needs repeated attempts and corrections.
Lower model requirements. I used to feel that only Claude Opus produced good work. With comprehensive specs, Sonnet performs comparably.
Vibe Coding’s essence isn’t “making AI smarter”—it’s “making AI follow instructions better”.
The prerequisite for following instructions: You state things clearly first.
This is Harness Engineering’s core—it’s not about unleashing stronger model capabilities, but ensuring your intent is accurately understood and strictly executed.
Building this infrastructure took time; I spent over two months exploring before getting comfortable with it. But once it’s in place, AI coding transforms from a guessing game into building from a blueprint: you can truly enjoy Vibe Coding, instead of constantly watching for the AI to run off track.
If you’re interested in spec-driven development, check out my experience with OpenSpec in OpenCode. And for AI agent model configuration, see my oh-my-opencode optimization journey.