Development Workflow
Project development process, Git conventions, and tooling guide
Version: 2.5.2
Updated: 2025-12-27
This document describes the project's development workflow configuration, including Git Hooks, code quality gates, and collaboration standards.
Project Overview
A lightweight web scraping tool that converts any website into clean Markdown format for LLM processing.
Core Features
| Feature | Description |
|---|---|
| Single Page Scrape /api/scrape | Scrape a single URL with multiple engine support |
| Multi-Page Crawl /api/crawl | Deep crawl websites with SSE real-time progress |
| HTML Convert /api/convert | Upload HTML files and convert to Markdown |
Scraping Engines
- http (~170ms) - Pure HTTP requests, ideal for static sites
- browser (~5s) - Puppeteer rendering, ideal for SPAs
- auto (~1.2s) - Smart detection, automatic selection
Architecture
project/
├── src/ # Backend (Express + TypeScript)
│ ├── server.ts # Express entry point
│ ├── routes/ # API routes
│ ├── middleware/ # Middleware
│ ├── scraper/ # Single page scraping module
│ │ ├── browser-pool.ts # Browser pool management
│ │ ├── browser.ts # Puppeteer scraping
│ │ ├── fetch.ts # HTTP scraping
│ │ ├── detector.ts # Engine auto-detection
│ │ └── images.ts # Image processing
│ ├── crawler/ # Multi-page crawling module
│ ├── lib/ # Utility functions
│ │ ├── clean-html.ts # HTML cleaning
│ │ ├── html-to-markdown.ts # Markdown conversion (Turndown)
│ │ └── queue/ # Task queue
│ └── types.ts # Type definitions + Zod validation
│
├── ui/ # Frontend (Next.js 15)
│ └── src/
│ ├── app/ # App Router pages
│ ├── components/ # React components (shadcn/ui)
│ ├── hooks/ # Custom Hooks
│ └── lib/ # Utility functions
│
└── tests/ # Tests
├── unit/
├── integration/
└── e2e/
Tech Stack
| Layer | Technology |
|---|---|
| Backend | Node.js + Express + TypeScript |
| Browser Automation | Puppeteer |
| HTML Parsing | Cheerio |
| Markdown Conversion | Turndown + GFM |
| Frontend | Next.js 15 + React + Tailwind |
| Validation | Zod |
| Testing | Vitest |
Data Flow
User requests URL
↓
detector.ts (decide http or browser)
↓
fetch.ts / browser.ts (get HTML)
↓
clean-html.ts (clean HTML, remove ads/scripts)
↓
html-to-markdown.ts (convert to Markdown)
↓
Return LLM-ready content
The codebase is about 1,500 lines and follows the Unix philosophy of "do one thing well".
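The data flow above can be sketched as a single composition of stages. This is a minimal illustration, not the project's actual code: the `Stages` interface, the function name `scrapeToMarkdown`, and every stage signature are assumptions made for the example, and the real modules may expose different APIs.

```typescript
// Hypothetical stage signatures; the real modules (detector.ts, fetch.ts,
// browser.ts, clean-html.ts, html-to-markdown.ts) may differ.
type Engine = "http" | "browser";

interface Stages {
  detect: (url: string) => Promise<Engine>; // detector.ts
  fetchHtml: (url: string) => Promise<string>; // fetch.ts
  renderHtml: (url: string) => Promise<string>; // browser.ts
  cleanHtml: (html: string) => string; // clean-html.ts
  toMarkdown: (html: string) => string; // html-to-markdown.ts
}

async function scrapeToMarkdown(
  url: string,
  engine: Engine | "auto",
  stages: Stages,
): Promise<string> {
  // "auto" defers the choice to the detector; an explicit engine skips it.
  const resolved = engine === "auto" ? await stages.detect(url) : engine;
  const html =
    resolved === "browser"
      ? await stages.renderHtml(url)
      : await stages.fetchHtml(url);
  // Cleaning always runs before conversion, as in the diagram.
  return stages.toMarkdown(stages.cleanHtml(html));
}
```

Passing the stages in as an argument keeps each step replaceable by a stub in tests, which fits the "do one thing well" split between modules.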
Git Hooks Configuration
Pre-commit Hook (Fast Quality Gate)
Goal: Complete checks in < 3 seconds
3-Layer Quality Gate:
| Layer | Name | Function |
|---|---|---|
| Layer 0 | Commit Type Detection | Distinguish code/non-code changes; skip review for docs/config/archive |
| Layer 1 | Code Review Enforcement | Check for code review token; reject commits without token |
| Layer 2 | Biome Format/Lint | Auto-format + error checking; re-stage fixed files |
v2.5.2 Optimization: TypeScript and ESLint checks moved to pre-push to avoid redundant execution.
Key Mechanisms:
- Detect non-code changes, auto-skip Layer 1
- Code changes must have code review token
- Auto-fixed files are immediately re-staged with git add
- The token is single-use (deleted after use)
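The decision made by Layers 0 and 1 can be sketched as a pure function. The file patterns below are assumptions for illustration only; this document does not list the rules the real hook uses to classify docs, config, and archive changes, and `preCommitGate` is a hypothetical name.

```typescript
// Assumed patterns: the real hook's classification rules are not shown here.
const NON_CODE_PATTERNS: RegExp[] = [
  /\.md$/i, // docs
  /\.(json|ya?ml|toml)$/i, // config
  /^openspec\/changes\/archive\//, // archived changes
];

function isNonCode(path: string): boolean {
  return NON_CODE_PATTERNS.some((pattern) => pattern.test(path));
}

type GateResult = "skip-review" | "token-ok" | "reject";

function preCommitGate(stagedFiles: string[], hasToken: boolean): GateResult {
  // Layer 0: a commit that touches only non-code files skips Layer 1.
  if (stagedFiles.every(isNonCode)) return "skip-review";
  // Layer 1: any code change needs a code review token.
  return hasToken ? "token-ok" : "reject";
}
```

For example, `preCommitGate(["src/server.ts"], false)` returns `"reject"`, while a commit containing only `README.md` returns `"skip-review"` regardless of the token.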
Pre-push Hook (Comprehensive Quality Check)
Goal: < 60 seconds (first run) / < 2 seconds (cache hit)
5-Layer Check:
| Layer | Name | Function |
|---|---|---|
| Layer 0.0 | Cache Check | Commit hash based caching; skip all checks on second push of same commit |
| Layer 0.1 | Version Generation | Generate VERSION file from Git tag or package.json |
| Layer 1 | Test Execution | Backend tests pnpm test; frontend lint + test |
| Layer 2 | Build Verification | Frontend build pnpm build (includes TypeScript); verify build output |
| Layer 3 | Security Audit | Critical vulnerabilities block push; High vulnerabilities warn only |
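The Layer 3 policy in the table reduces to a small decision function. The `AuditCounts` shape is a hypothetical input for this sketch; how the real hook extracts severity counts from the audit tool is not described in this document.

```typescript
// Hypothetical input: vulnerability counts per severity.
interface AuditCounts {
  critical: number;
  high: number;
}

type AuditDecision = "block" | "warn" | "pass";

function auditDecision(counts: AuditCounts): AuditDecision {
  if (counts.critical > 0) return "block"; // Critical vulnerabilities block the push
  if (counts.high > 0) return "warn"; // High vulnerabilities only warn
  return "pass";
}
```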
Cache Mechanism (v2.5.2 New):
- Cache file: /tmp/.pre-push-passed-{commit-hash}
- Force skip cache: SKIP_PUSH_CACHE=1 git push
- Clear cache: rm -f /tmp/.pre-push-passed-*
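The cache lookup can be modeled as follows. This is a sketch under assumptions: the function names are hypothetical, and the set of existing files and the environment are passed in as plain values so the logic can be shown without touching the file system.

```typescript
const CACHE_PREFIX = "/tmp/.pre-push-passed-";

// Builds the marker path for a commit, e.g. /tmp/.pre-push-passed-abc123.
function cacheFileFor(commitHash: string): string {
  return `${CACHE_PREFIX}${commitHash}`;
}

function shouldSkipChecks(
  commitHash: string,
  existingFiles: ReadonlySet<string>,
  env: Record<string, string | undefined>,
): boolean {
  // SKIP_PUSH_CACHE=1 forces a full run even when a marker exists.
  if (env["SKIP_PUSH_CACHE"] === "1") return false;
  return existingFiles.has(cacheFileFor(commitHash));
}
```

Because the marker is keyed by commit hash, a second push of the same commit (for example to another remote) hits the cache, while any new commit misses it and runs the full check.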
Optimization Effect: Multi-remote push (origin + github) reduced from ~90s to ~45s
Change Summary: Show file count and line changes when diff > 50 lines
Post-merge Hook (Version Sync)
Function: Auto-update VERSION file to ensure version consistency after pull
Priority:
- Try reading from Git tag
- Fallback to package.json version
- Final fallback to commit SHA version
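The fallback order above can be expressed as a short resolver. The `VersionSources` shape and the `sha-` prefix for the commit-based version are assumptions for this sketch; the document does not specify the exact format of the SHA fallback.

```typescript
// Hypothetical inputs: each source is read elsewhere and may be missing.
interface VersionSources {
  gitTag?: string;
  packageVersion?: string;
  commitSha: string;
}

function resolveVersion(sources: VersionSources): string {
  if (sources.gitTag) return sources.gitTag; // 1. Git tag
  if (sources.packageVersion) return sources.packageVersion; // 2. package.json
  // 3. Commit SHA; the "sha-" prefix and 7-character length are assumed.
  return `sha-${sources.commitSha.slice(0, 7)}`;
}
```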
OpenSpec Workflow Configuration
Directory Structure
openspec/
├── AGENTS.md # Workflow spec + role definitions
├── project.md # Project context
├── changes/ # Changes in progress
│ ├── [change-id]/
│ │ ├── proposal.md # Change proposal (Strategist output)
│ │ ├── design.md # Technical design + architecture review
│ │ └── tasks.md # Executable task list
│ └── archive/ # Completed changes
└── specs/ # Feature spec library
Core Role System (5 Roles)
Main Agent (Coordinator)
↓
├─→ Designer - Visual + interaction design
├─→ Strategist - What + Why
├─→ Builder - How (implementation)
└─→ Auditor - Review + acceptance
| Role | Responsibilities | Prohibited |
|---|---|---|
| Designer | Visual design, interaction definition, design specs | Write code, execute commit |
| Strategist | Requirements analysis, architecture design, create proposals | Modify src/ code, execute commit |
| Builder | Code implementation, test writing, bug fixing | Architecture decisions, execute commit |
| Auditor | Code review, security checks, issue tokens | Modify code, architecture decisions |
OpenSpec Commands
/openspec:proposal <description> # Create change proposal (Strategist)
/openspec:apply <change-id> # Execute change (Builder + Auditor)
/openspec:archive <id> # Archive completed change (Main Agent)
Workflow Stages (6 Stages)
| Stage | Action | Output |
|---|---|---|
| Design (only when design is needed) | Task(designer) | Design spec, integrated into the proposal |
| Proposal | Task(strategist) | proposal.md, design.md, tasks.md |
| Apply | Task(builder) × N | Code changes, test results, completion report |
| Review | Task(auditor) | Review report, token issued |
| Commit | git commit | Committed code |
| Archive | Move to archive/ | Change archived |
Quality Gates
Proposal Stage:
- proposal.md must include "Background" section
- Linus 3 Questions answered (Q1: Problem essence / Q2: Simplest solution / Q3: Automation measures)
- Architecture review passed
- Create proposal.md, design.md, tasks.md
Apply Stage:
- All tasks marked complete [x]
- typecheck passed
- test passed
- Auditor review passed
- Code review token exists
Commit Stage:
- Pre-commit hook passed
- Commit message follows Conventional Commits
Archive Stage:
- Code deployed and verified
- Change moved to archive/
Change Priority & Mode
| Priority | Type | Mode | Linus 3Q | Architecture Review |
|---|---|---|---|---|
| P0 | SECURITY | Full | Required | Required |
| P1 | ARCHITECTURE | Full | Required | Required |
| P2 | FEATURE / REFACTOR | Full | Required | Required |
| P3 | FIX | Light | Optional | Skip |
| P4 | DOCS / CONFIG | Light | Optional | Skip |
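The table maps directly onto a lookup structure. The type and constant names below are hypothetical; only the values come from the table.

```typescript
type ChangeType =
  | "SECURITY"
  | "ARCHITECTURE"
  | "FEATURE"
  | "REFACTOR"
  | "FIX"
  | "DOCS"
  | "CONFIG";

interface ChangePolicy {
  priority: "P0" | "P1" | "P2" | "P3" | "P4";
  mode: "Full" | "Light";
  linus3Q: "Required" | "Optional";
  architectureReview: "Required" | "Skip";
}

const FULL = { mode: "Full", linus3Q: "Required", architectureReview: "Required" } as const;
const LIGHT = { mode: "Light", linus3Q: "Optional", architectureReview: "Skip" } as const;

// One entry per change type, mirroring the rows of the table.
const CHANGE_POLICY: Record<ChangeType, ChangePolicy> = {
  SECURITY: { priority: "P0", ...FULL },
  ARCHITECTURE: { priority: "P1", ...FULL },
  FEATURE: { priority: "P2", ...FULL },
  REFACTOR: { priority: "P2", ...FULL },
  FIX: { priority: "P3", ...LIGHT },
  DOCS: { priority: "P4", ...LIGHT },
  CONFIG: { priority: "P4", ...LIGHT },
};

function requiresArchitectureReview(type: ChangeType): boolean {
  return CHANGE_POLICY[type].architectureReview === "Required";
}
```

Typing the map as `Record<ChangeType, ChangePolicy>` means the compiler flags any change type that is added without a policy.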
AI Collaboration Configuration
Agent Configuration Details
Strategist
Configuration:
- name: strategist
- description: Requirements analysis, architecture design and change proposals
- tools: Read, Write, Edit, Glob, Grep, WebFetch, AskUserQuestion, SlashCommand
- model: opus
Responsibilities:
- Analyze codebase (Read, Grep, Glob)
- Answer Linus 3 Questions
- Conduct architecture review
- Output proposal.md, design.md, tasks.md
Prohibited:
- Modify src/ business code
- Execute git commit
- Run tests, start services
- Call browser / MCP
Output Format:
- proposal.md: Change proposal (with Linus 3Q + Impact scope + Risk assessment)
- design.md: Technical design (with architecture review conclusion 5/5 score)
- tasks.md: Task list (grouped by Phase)
Builder
Configuration:
- name: builder
- description: Code implementation, debugging and testing
- tools: Read, Write, Edit, Bash, Glob, Grep
- model: sonnet
Responsibilities:
- Implement code (Write, Edit)
- Run tests (Bash)
- Fix bugs
- Step-by-step completion (independent test and report per Phase)
Prohibited:
- Make architecture decisions independently
- Execute git commit
- Changes beyond tasks.md scope
Commit Convention (Linus Principle):
- One patch does one thing
- Can be reviewed independently / can be reverted independently / diff < 200 lines
Auditor
Configuration:
- name: auditor
- description: Code review, security checks and change archiving
- tools: Read, Bash, Glob, Grep, TodoWrite, SlashCommand
- model: sonnet
Review Dimensions:
- Spec compliance - Does it match proposal.md
- Code quality - Clear naming, function length, nesting depth, code duplication
- Performance issues - O(n²) check, memory leaks, unnecessary loops
- Security vulnerabilities - SQL injection, XSS, hardcoded secrets
- Edge cases - null/undefined, empty collections, timeout handling
- Test coverage - Critical paths + edge cases
Issue Severity:
| Level | Definition | Response |
|---|---|---|
| Critical | Security vulnerabilities, logic errors, performance deadlocks, resource leaks | Reject merge |
| Warning | O(n²), code duplication, poor maintainability, any type | Recommend changes |
| Info | Naming suggestions, comment additions, code style | Optional optimization |
Review Results:
- ✅ Pass - No Critical issues; no Warnings, or all Warnings confirmed
- ⚠️ Conditional Pass - Has Warning, needs confirmation before proceeding
- ❌ Reject - Has Critical, must fix and resubmit
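The three review results follow mechanically from the issue list. In this sketch the `confirmed` flag is a hypothetical field standing in for however a Warning is recorded as confirmed; the rest mirrors the rules above.

```typescript
type Severity = "Critical" | "Warning" | "Info";
type Verdict = "Pass" | "Conditional Pass" | "Reject";

interface Issue {
  severity: Severity;
  confirmed?: boolean; // hypothetical: set once a Warning has been confirmed
}

function reviewVerdict(issues: Issue[]): Verdict {
  // Any Critical issue rejects the change outright.
  if (issues.some((issue) => issue.severity === "Critical")) return "Reject";
  // Unconfirmed Warnings hold the change until someone confirms them.
  const hasOpenWarning = issues.some(
    (issue) => issue.severity === "Warning" && !issue.confirmed,
  );
  return hasOpenWarning ? "Conditional Pass" : "Pass";
}
```

Info-level issues never affect the verdict, which matches their "Optional optimization" response in the severity table.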
Package.json Scripts
Development Flow
pnpm dev # Start backend (localhost:3000)
pnpm build # Build code (tsc --build)
pnpm typecheck # TypeScript check
pnpm start # Production start
pnpm clean # Delete dist/
Code Quality
pnpm format # Biome format
pnpm lint # Biome lint
pnpm check # Biome check (format + lint)
Testing
pnpm test # Run unit tests (vitest run)
pnpm test:watch # Watch mode testing
pnpm test:integration # Integration tests
pnpm verify # Real website tests
Release
pnpm release # Release flow (choose version)
pnpm release:patch # 1.9.4 → 1.9.5
pnpm release:minor # 1.9.4 → 1.10.0
pnpm release:major # 1.9.4 → 2.0.0
Other Workflow Configuration
Release Script (scripts/release.sh)
Function: Semantic versioning + changelog generation
Usage:
./scripts/release.sh [patch|minor|major]
./scripts/release.sh patch # 1.9.4 → 1.9.5
./scripts/release.sh minor # 1.9.4 → 1.10.0
./scripts/release.sh major # 1.9.4 → 2.0.0
Checks and actions:
- Working directory must be clean (no uncommitted changes)
- Must be on main branch (or confirm to continue)
- Update package.json version
- Create Git tag
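The version arithmetic in the usage examples follows standard semantic versioning. The function below is an illustrative sketch only (the actual script is a shell script, and `bumpVersion` is a hypothetical name); it handles plain MAJOR.MINOR.PATCH versions without pre-release suffixes.

```typescript
type Bump = "patch" | "minor" | "major";

function bumpVersion(version: string, bump: Bump): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) throw new Error(`Not a MAJOR.MINOR.PATCH version: ${version}`);
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const patch = Number(match[3]);
  switch (bump) {
    case "major":
      return `${major + 1}.0.0`; // resets minor and patch
    case "minor":
      return `${major}.${minor + 1}.0`; // resets patch
    case "patch":
      return `${major}.${minor}.${patch + 1}`;
  }
}
```

Parsing each part as a number rather than comparing strings is what makes the minor bump of 1.9.4 yield 1.10.0.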
Frontend Build Verification (scripts/verify-frontend-build.js)
- Called in pre-push hook Layer 2
- Verify build output validity
- Check Tailwind CSS configuration
Environment Configuration (.env)
- Contains sensitive information (API keys, tokens)
- Configured in .gitignore, not checked into version control
- Reference: .env.example
Conventional Commits
Format:
type(scope): subject
body (optional)
- Types: feat, fix, refactor, perf, test, docs, chore
- Scopes: backend, scraper, crawler, api, agents
- Subject: short description (< 50 characters)
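A header check for this format might look like the following. It is a sketch under assumptions: it treats the scope as mandatory and limited to the listed scopes, which the document does not state explicitly, and `validateHeader` is a hypothetical name.

```typescript
const TYPES = ["feat", "fix", "refactor", "perf", "test", "docs", "chore"] as const;
const SCOPES = ["backend", "scraper", "crawler", "api", "agents"] as const;

// Matches "type(scope): subject" using only the listed types and scopes.
const HEADER_PATTERN = new RegExp(
  `^(${TYPES.join("|")})\\((${SCOPES.join("|")})\\): (.+)$`,
);

// Returns a list of problems; an empty list means the header is valid.
function validateHeader(header: string): string[] {
  const match = HEADER_PATTERN.exec(header);
  if (!match) {
    return ["header must be type(scope): subject with a listed type and scope"];
  }
  const subject = match[3] ?? "";
  return subject.length < 50 ? [] : ["subject must be shorter than 50 characters"];
}
```

For example, `validateHeader("feat(scraper): add retry to fetch engine")` returns an empty list, while `validateHeader("feature: add retry")` reports a format error.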
Core Workflow Summary
Daily Development Flow
Write code
↓
git add / git commit (triggers pre-commit)
↓
pre-commit hook quality gate (< 3s)
├─ Layer 0: Detect commit type
├─ Layer 1: Verify code review token
├─ Layer 2: Biome format + Lint
↓
git push (triggers pre-push)
↓
pre-push hook comprehensive check (< 60s first / < 2s cached)
├─ Layer 0.0: Cache check (skip if same commit)
├─ Layer 0.1: Generate VERSION file
├─ Layer 1: Run tests
├─ Layer 2: Build verification (includes TypeScript)
├─ Layer 3: Security audit
↓
Code pushed
↓
git merge/pull (triggers post-merge)
↓
post-merge hook version sync
OpenSpec Change Flow
User submits requirement
↓
/openspec:proposal <description>
↓
Task(strategist) creates proposal
├─ Answer Linus 3 Questions
├─ Conduct architecture review
├─ Output proposal.md, design.md, tasks.md
↓
/openspec:apply <change-id>
↓
Task(builder) × N parallel implementation
├─ Complete step by step by Phase
├─ Test and report per Phase
├─ Mark tasks as complete
↓
Task(auditor) code review
├─ Check 6 dimensions
├─ Categorize issues (Critical/Warning/Info)
├─ Issue code review token
↓
Main Agent git commit
↓
git push (pre-push hook check)
↓
/openspec:archive <id>
↓
Change moved to archive/ (complete)
The workflow follows a Linus Torvalds-style approach that emphasizes automation, simplicity, and separation of responsibilities. Its core philosophy is "machines enforce, humans create".