"The Validation Illusion: When typecheck Lies"
A journey through a deceptive bug where all checks passed but nothing worked, revealing the gap between static validation and runtime reality.
"Show me the data." — Linus Torvalds
This is a story about a feature that passed every automated check yet failed to appear on screen. It exposed a fundamental gap in our Agent workflow: the illusion that compilation equals correctness.
The Scene
What We Built
We were adding an "API Breakdown" component to the Admin Dashboard. The feature would show:
- Core APIs (scrape, crawl) — always expanded, prominently displayed
- Supporting APIs (convert, download, etc.) — collapsed by default
Simple enough. The backend already returned breakdown data. We just needed the frontend to consume it.
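For orientation, here is a hypothetical TypeScript sketch of what that payload might look like — the field names are illustrative, not the project's actual types:

```typescript
// Hypothetical payload shape — field names are illustrative only.
interface EndpointStats {
  endpoint: string;   // e.g. "scrape" or "convert"
  requests: number;   // count within the current window
  errorRate: number;  // 0..1
}

interface StatsResponse {
  requests: number;
  timestamp: string;
  breakdown: {
    core: EndpointStats[];        // scrape, crawl — always expanded
    supporting: EndpointStats[];  // convert, download, … — collapsed by default
  };
}
```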
The Workflow
Designer → Design spec approved
Strategist → Proposal created
Builder → Code implemented
Auditor → Review passed ✅
Main Agent → Committed
Everything green. Ship it.
The Crash
User: "验证通过了吗,我打开怎么没有看到界面呢,这个验收也太粗糙了"
Translation: "Did you actually verify this? I opened the page and see nothing. This acceptance testing is way too sloppy."
The Evidence
$ curl -s "http://localhost:3000/api/stats" | jq 'keys'
[
  "ai",
  "browserPool",
  "requests",
  "timestamp",
  "window"
]
No `breakdown` field. The backend was returning old data.
But wait — we committed the code. The git log showed:
8ce638c feat(stats): add endpoint breakdown following Google Analytics methodology
The code was there. The types were correct. TypeScript was happy. So why wasn't the API returning the new field?
The Root Cause: A Tale of Three Failures
Failure 1: The Auditor's Blind Spot
Our Auditor ran:
pnpm typecheck # ✅ Passed
And declared victory.
But typecheck only verifies that types align at compile time. It says nothing about:
- Whether the server is running the new code
- Whether the UI actually renders the component
- Whether the data flows correctly at runtime
The lesson: Static analysis is necessary but not sufficient.
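A minimal TypeScript sketch of that gap, assuming the endpoint from this incident: the cast satisfies the compiler whether or not the running server actually returns `breakdown`.

```typescript
// A cast is a promise to the compiler, not a check of the data.
interface StatsResponse {
  breakdown: { core: unknown[]; supporting: unknown[] };
}

async function loadStats(): Promise<StatsResponse> {
  const res = await fetch("http://localhost:3000/api/stats");
  return (await res.json()) as StatsResponse; // typechecks even if the field is absent
}

// Compiles cleanly; throws at runtime against a stale server:
// "TypeError: Cannot read properties of undefined (reading 'core')"
loadStats().then((s) => console.log(s.breakdown.core.length));
```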
Failure 2: The Invisible Restart
Node.js doesn't hot-reload server code. When we modified src/lib/stats.ts, the running server kept executing the old in-memory code.
Code on disk: ✅ Has breakdown field
Running server: ❌ Old code without breakdown
The backend needed a restart. Nobody mentioned it. Nobody checked.
The lesson: Code changes require service restarts. Memory doesn't sync with disk.
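One possible mitigation — a sketch, not what the project does: stamp every response with the process boot time, so a stale server is visible from a single `curl`. The field name is an assumption.

```typescript
// Sketch: stamp responses with the process boot time so a stale
// process is detectable from outside. Field name is illustrative.
const BOOTED_AT = new Date().toISOString();

export function withFreshness<T extends object>(payload: T) {
  return { ...payload, serverBootedAt: BOOTED_AT };
}

// If `serverBootedAt` predates your latest commit, the server is
// still running old code — restart before validating.
```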
Failure 3: The Phantom Directory
Our Designer created files in:
openspec/changes/feat-endpoint-breakdown-ui/design/
But the change-id was:
feat-admin-dashboard-breakdown
The Designer invented their own directory name instead of using what the Main Agent specified.
The lesson: Agents may ignore instructions they "should" follow.
The Investigation: Where Do Instructions Go?
This led to a deeper question: How do we ensure Agents actually receive and follow instructions?
The Hierarchy of Certainty
| Location | Agent Receives It? | Notes |
|---|---|---|
| `openspec/AGENTS.md` | Maybe | Agent must choose to read it |
| Main Agent's prompt | Yes | But must be written every time |
| `.codebuddy/agents/*.md` | Always | It's the system prompt |
The AGENTS.md file was a suggestion. Agents could read it, or not. They could read it and ignore it.
But .codebuddy/agents/builder.md? That's the Builder's system prompt. It's injected into every Builder invocation. The Agent cannot not receive it.
"Agent 读取 AGENTS.md 是建议,不是强制。Agent 可能读了但忽略。"
"Reading AGENTS.md is a suggestion, not enforcement. The Agent might read it and still ignore it."
This was the key insight: inject rules where they cannot be bypassed.
The Fix: Three Layers of Defense
Layer 1: Strategist — Generate Better Checklists
We updated strategist.md to require change-type-specific validation checklists:
**Frontend changes:**
- [ ] typecheck passes
- [ ] Service is running/restarted
- [ ] API returns correct data (`curl` verification)
- [ ] UI renders correctly (browser snapshot)
**Backend changes:**
- [ ] typecheck passes
- [ ] Service restarted, API verified (`curl`)
- [ ] Test data generated to verify business logic
Now every tasks.md comes with the right checklist by default.
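As a sketch of how such a generator might work — names and structure are illustrative, not the actual strategist implementation:

```typescript
// Templates mirror the checklists above; all names are illustrative.
type ChangeKind = "frontend" | "backend";

const CHECKLISTS: Record<ChangeKind, string[]> = {
  frontend: [
    "typecheck passes",
    "Service is running/restarted",
    "API returns correct data (`curl` verification)",
    "UI renders correctly (browser snapshot)",
  ],
  backend: [
    "typecheck passes",
    "Service restarted, API verified (`curl`)",
    "Test data generated to verify business logic",
  ],
};

// Render the right checklist as markdown task items for tasks.md.
export function renderChecklist(kind: ChangeKind): string {
  return CHECKLISTS[kind].map((item) => `- [ ] ${item}`).join("\n");
}
```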
Layer 2: Auditor — Mandate E2E Verification
We added to auditor.md:
### End-to-End Validation (Required!)
**Prohibited**: Passing review based solely on typecheck/lint
**Must execute**:
- Confirm service is running latest code
- Backend: Service needs restart (`curl` to verify new fields)
- Frontend: Confirm hot-reload or rebuild
- Execute ALL validation items in tasks.md
- Frontend changes: Browser snapshot to verify UI
- Backend changes: `curl` to verify API response structure
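A minimal sketch of the kind of check this mandates, assuming Node 18+ (global `fetch`) and the endpoint from this incident — the Auditor fails fast if the live server lacks the new field:

```typescript
// Fail loudly if the live server does not return the new field.
const STATS_URL = "http://localhost:3000/api/stats";

async function main(): Promise<void> {
  const res = await fetch(STATS_URL);
  if (!res.ok) throw new Error(`HTTP ${res.status} from ${STATS_URL}`);
  const body = (await res.json()) as Record<string, unknown>;
  if (!("breakdown" in body)) {
    console.error("FAIL: no `breakdown` in live response — stale server? restart and retry");
    process.exit(1);
  }
  console.log("OK: `breakdown` present in live response");
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```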
Layer 3: Builder & Designer — Explicit Constraints
For Builder:
**Service Restart Reminder**:
- Backend code changes don't auto-reload in running services
- In your report, remind: "Backend modified, restart service before validation"
For Designer:
**Directory Specification (Required!)**:
- `<change-id>` is provided by Main Agent in the prompt
- **Prohibited**: Creating your own directory names
- If prompt doesn't provide change-id, **stop** and request it
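A sketch of what structural enforcement of that rule could look like in TypeScript — the helper name and error message are illustrative:

```typescript
import path from "node:path";

// Derive the directory from the provided change-id; refuse to invent one.
function designDir(changeId: string | undefined): string {
  if (!changeId) {
    throw new Error("change-id not provided by Main Agent — stopping instead of inventing a name");
  }
  return path.join("openspec", "changes", changeId, "design");
}

console.log(designDir("feat-admin-dashboard-breakdown"));
// → openspec/changes/feat-admin-dashboard-breakdown/design
```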
The Pattern: Validation Pyramid
┌─────────────────────┐
│ E2E Validation │ ← The missing layer
│ (curl + snapshot) │
└──────────┬──────────┘
│
┌──────────▼──────────┐
│ Unit Tests │
│ (pnpm test) │
└──────────┬──────────┘
│
┌──────────▼──────────┐
│ Static Analysis │
│ (typecheck) │
└─────────────────────┘
Each layer catches different classes of bugs:
| Layer | Catches | Misses |
|---|---|---|
| typecheck | Type errors | Runtime behavior |
| Unit tests | Logic errors | Integration issues |
| E2E validation | Everything | Nothing (if done right) |
The pyramid must be complete. Skipping the top layer creates a false sense of security.
The Philosophy: Instructions That Cannot Be Ignored
The deeper lesson isn't about testing. It's about system design for AI collaboration.
The Old Model: Trust and Hope
Write instructions → Hope Agent reads them → Hope Agent follows them
This fails because Agents have agency. They make decisions. They prioritize. They might decide your instructions aren't relevant to their current task.
The New Model: Structural Enforcement
Inject into system prompt → Agent MUST receive it
Template in generator → Output MUST include it
Check in pipeline → Build MUST fail without it
The principle: Don't rely on compliance. Design for inevitability.
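As an illustration of the third line, here is a hypothetical CI step — the file path is an assumption, not taken from the project — that fails the build while tasks.md still contains unchecked validation items:

```typescript
import { readFileSync } from "node:fs";

// Assumed layout: the change's tasks.md holds the validation checklist.
const TASKS = "openspec/changes/feat-admin-dashboard-breakdown/tasks.md";

const unchecked = readFileSync(TASKS, "utf8")
  .split("\n")
  .filter((line) => line.trimStart().startsWith("- [ ]"));

if (unchecked.length > 0) {
  console.error(`FAIL: ${unchecked.length} unchecked validation item(s) in ${TASKS}:`);
  for (const line of unchecked) console.error("  " + line.trim());
  process.exit(1); // the build cannot go green without the evidence
}
console.log("All validation items checked.");
```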
Epilogue: The Humble Checklist
After all this analysis and all these changes, what's the actual fix?
A checklist. A simple, boring checklist that says:
- [ ] Did you actually open the page?
- [ ] Did you see the thing you built?
- [ ] Is the server running your code?
Sometimes the most sophisticated solution is remembering to look.
"验收必须看到真实数据在真实界面上的呈现。"
"Validation must see real data rendered on the real interface."
Commits
f610524 feat(admin): add API breakdown component to dashboard
5c82d84 refactor(agents): strengthen validation and directory naming rules
Related
- Insight: openspec/insights/agent-validation-gap-2026-01-01.md
- Feature: openspec/changes/feat-admin-dashboard-breakdown/