Development Workflow

Project development process, Git conventions, and tooling guide

Version: 2.5.2
Updated: 2025-12-27

This document describes the project's development workflow configuration, including Git Hooks, code quality gates, and collaboration standards.

Project Overview

A lightweight web scraping tool that converts web pages into clean Markdown for LLM processing.

Core Features

Feature	Description
Single Page Scrape `/api/scrape`	Scrape a single URL using one of several engines (http, browser, auto)
Multi-Page Crawl `/api/crawl`	Crawl multiple pages of a site, streaming real-time progress over SSE
HTML Convert `/api/convert`	Upload HTML files and convert them to Markdown

Scraping Engines

http (~170ms) - Plain HTTP requests, ideal for static sites
browser (~5s) - Puppeteer rendering, ideal for SPAs
auto (~1.2s) - Detects whether the page needs rendering and selects http or browser automatically
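The auto engine's decision can be pictured as a heuristic over the HTML returned by a plain HTTP request. This is a minimal sketch, not the actual logic in detector.ts: the `chooseEngine` name, the SPA marker patterns, and the 200-character visible-text threshold are all assumptions chosen for illustration.

```typescript
type Engine = "http" | "browser";

// Assumed markers of a client-side rendered app shell (hypothetical list).
const SPA_MARKERS: RegExp[] = [
  /<div id="(root|app)">\s*<\/div>/i,
  /<noscript>[^<]*enable javascript/i,
];

// Rough estimate of visible text: drop scripts, styles and tags, collapse whitespace.
function visibleTextLength(html: string): number {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, "")
    .replace(/<style[\s\S]*?<\/style>/gi, "")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim().length;
}

// Pick an engine from HTML fetched over plain HTTP (threshold is an assumption).
function chooseEngine(html: string, minTextLength = 200): Engine {
  if (SPA_MARKERS.some((re) => re.test(html))) return "browser";
  return visibleTextLength(html) < minTextLength ? "browser" : "http";
}
```

A page whose body is only `<div id="root"></div>` plus a script tag would go to the browser engine, while an article with a few hundred characters of server-rendered text would stay on http.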

Architecture

project/
├── src/                    # Backend (Express + TypeScript)
│   ├── server.ts           # Express entry point
│   ├── routes/             # API routes
│   ├── middleware/         # Middleware
│   ├── scraper/            # Single page scraping module
│   │   ├── browser-pool.ts   # Browser pool management
│   │   ├── browser.ts        # Puppeteer scraping
│   │   ├── fetch.ts          # HTTP scraping
│   │   ├── detector.ts       # Engine auto-detection
│   │   └── images.ts         # Image processing
│   ├── crawler/            # Multi-page crawling module
│   ├── lib/                # Utility functions
│   │   ├── clean-html.ts     # HTML cleaning
│   │   ├── html-to-markdown.ts # Markdown conversion (Turndown)
│   │   └── queue/            # Task queue
│   └── types.ts            # Type definitions + Zod validation
│
├── ui/                     # Frontend (Next.js 15)
│   └── src/
│       ├── app/            # App Router pages
│       ├── components/     # React components (shadcn/ui)
│       ├── hooks/          # Custom Hooks
│       └── lib/            # Utility functions
│
└── tests/                  # Tests
    ├── unit/
    ├── integration/
    └── e2e/

Tech Stack

Layer	Technology
Backend	Node.js + Express + TypeScript
Browser Automation	Puppeteer
HTML Parsing	Cheerio
Markdown Conversion	Turndown + GFM
Frontend	Next.js 15 + React + Tailwind
Validation	Zod
Testing	Vitest

Data Flow

User requests URL
    ↓
detector.ts (decide http or browser)
    ↓
fetch.ts / browser.ts (get HTML)
    ↓
clean-html.ts (clean HTML, remove ads/scripts)
    ↓
html-to-markdown.ts (convert to Markdown)
    ↓
Return LLM-ready content
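The data flow is a straight pipeline, so it can be written as plain function composition. A minimal sketch, assuming each module contributes one function; the `Stages` interface and its member names are hypothetical and not the real exports of detector.ts, fetch.ts, browser.ts, clean-html.ts or html-to-markdown.ts. Stages are injected so the sketch runs without Puppeteer, Cheerio or Turndown.

```typescript
type Engine = "http" | "browser";

// Hypothetical stage signatures; the real module exports may differ.
interface Stages {
  detect(url: string): Promise<Engine>;     // detector.ts
  fetchHtml(url: string): Promise<string>;  // fetch.ts
  renderHtml(url: string): Promise<string>; // browser.ts
  clean(html: string): string;              // clean-html.ts
  toMarkdown(html: string): string;         // html-to-markdown.ts
}

// URL -> engine decision -> raw HTML -> cleaned HTML -> Markdown
async function scrape(url: string, s: Stages): Promise<{ engine: Engine; markdown: string }> {
  const engine = await s.detect(url);
  const html = engine === "http" ? await s.fetchHtml(url) : await s.renderHtml(url);
  return { engine, markdown: s.toMarkdown(s.clean(html)) };
}

// Usage with toy stubs standing in for the real HTTP, Puppeteer, cleaning and Turndown steps.
const stubs: Stages = {
  detect: async (): Promise<Engine> => "http",
  fetchHtml: async () => "<h1>Title</h1><script>track()</script>",
  renderHtml: async () => {
    throw new Error("browser engine not needed for this URL");
  },
  clean: (html) => html.replace(/<script[\s\S]*?<\/script>/gi, ""),
  toMarkdown: (html) => html.replace(/<h1>(.*?)<\/h1>/gi, "# $1"),
};
```

Keeping each stage behind a single function is what lets the engines be swapped without touching cleaning or conversion.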

The codebase is about 1,500 lines and follows the Unix philosophy of "do one thing well".

Git Hooks Configuration

Pre-commit Hook (Fast Quality Gate)

Goal: Complete checks in < 3 seconds

3-Layer Quality Gate:

Layer	Name	Function
Layer 0	Commit Type Detection	Distinguish code/non-code changes; skip review for docs/config/archive
Layer 1	Code Review Enforcement	Check for code review token; reject commits without token
Layer 2	Biome Format/Lint	Auto-format + error checking; re-stage fixed files

v2.5.2 Optimization: TypeScript and ESLint checks moved to pre-push to avoid redundant execution.

Key Mechanisms:

Non-code changes are detected and Layer 1 is skipped automatically
Code changes must have a code review token
Files auto-fixed by Biome are immediately re-staged with git add
The token is single-use (deleted after use)
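The Layer 0 / Layer 1 decision reduces to a pure function over the staged file paths. The real hook is presumably a shell script; this TypeScript sketch only illustrates the logic. The non-code patterns (docs, JSON/YAML/TOML config, `openspec/changes/archive/`) and the `preCommitGate` name are assumptions, the token lookup is reduced to a boolean, and Layer 2 (Biome) is left out.

```typescript
// Assumed patterns for non-code files; the real hook's rules may differ.
const NON_CODE_PATTERNS: RegExp[] = [
  /\.(md|txt)$/i,                  // docs
  /\.(json|ya?ml|toml)$/i,         // config (assumption: package.json counts as config)
  /^openspec\/changes\/archive\//, // archived changes
];

function isCodeFile(path: string): boolean {
  return !NON_CODE_PATTERNS.some((re) => re.test(path));
}

interface GateResult {
  allowed: boolean;
  consumeToken: boolean; // true when the single-use token should be deleted
  reason: string;
}

// Layer 0: classify the commit. Layer 1: require a review token for code changes.
function preCommitGate(stagedPaths: string[], hasReviewToken: boolean): GateResult {
  if (!stagedPaths.some(isCodeFile)) {
    return { allowed: true, consumeToken: false, reason: "non-code change, review skipped" };
  }
  if (!hasReviewToken) {
    return { allowed: false, consumeToken: false, reason: "code change without review token" };
  }
  return { allowed: true, consumeToken: true, reason: "review token present" };
}
```

A commit touching only `README.md` passes without a token; one touching `src/server.ts` is rejected until the Auditor has issued a token, which is then consumed.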

Pre-push Hook (Comprehensive Quality Check)

Goal: < 60 seconds (first run) / < 2 seconds (cache hit)

5-Layer Check:

Layer	Name	Function
Layer 0.0	Cache Check	Commit-hash-based caching; skip all checks when the same commit is pushed again
Layer 0.1	Version Generation	Generate `VERSION` file from Git tag or `package.json`
Layer 1	Test Execution	Backend tests `pnpm test`; frontend lint + test
Layer 2	Build Verification	Frontend build `pnpm build` (includes TypeScript); verify build output
Layer 3	Security Audit	Critical vulnerabilities block push; High vulnerabilities warn only
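Layer 3's policy (Critical blocks, High only warns) can be expressed as a small decision function over per-severity counts. A sketch under assumptions: how the counts are obtained from the audit report is omitted, and the `SeverityCounts` shape and `auditGate` name are hypothetical.

```typescript
// Assumed input: number of advisories found per severity level.
interface SeverityCounts {
  low: number;
  moderate: number;
  high: number;
  critical: number;
}

interface AuditDecision {
  block: boolean;     // true -> abort the push
  warnings: string[]; // printed but non-blocking
}

// Any Critical vulnerability blocks the push; High vulnerabilities only warn.
function auditGate(counts: SeverityCounts): AuditDecision {
  const warnings: string[] = [];
  if (counts.high > 0) warnings.push(`${counts.high} high severity vulnerabilities`);
  return { block: counts.critical > 0, warnings };
}
```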

Cache Mechanism (new in v2.5.2):

Cache file: /tmp/.pre-push-passed-{commit-hash}
Force skip cache: SKIP_PUSH_CACHE=1 git push
Clear cache: rm -f /tmp/.pre-push-passed-*
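The cache amounts to a marker file named after the commit hash, written once every check has passed. A minimal sketch of that lookup using Node's `fs` module; the function names are hypothetical, and the directory is a parameter (defaulting to `/tmp`) only so the logic can be exercised in isolation.

```typescript
import { existsSync, rmSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Marker name follows the documented pattern: .pre-push-passed-{commit-hash}
function cacheFile(commitHash: string, dir = "/tmp"): string {
  return join(dir, `.pre-push-passed-${commitHash}`);
}

// Layer 0.0: skip all checks if this commit already passed, unless SKIP_PUSH_CACHE=1.
function shouldSkipChecks(
  commitHash: string,
  env: Record<string, string | undefined>,
  dir = "/tmp",
): boolean {
  if (env.SKIP_PUSH_CACHE === "1") return false; // forced full run
  return existsSync(cacheFile(commitHash, dir));
}

// Record success after Layers 0.1 to 3 have all passed.
function markPassed(commitHash: string, dir = "/tmp"): void {
  writeFileSync(cacheFile(commitHash, dir), new Date().toISOString());
}

// Equivalent of `rm -f` for a single commit's marker.
function clearPassed(commitHash: string, dir = "/tmp"): void {
  rmSync(cacheFile(commitHash, dir), { force: true });
}
```

In a real hook the hash would come from Git (for example `git rev-parse HEAD`) and `env` from the process environment. Because the key is the commit hash, pushing the same commit to a second remote finds the marker and skips the expensive layers.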

Effect: pushing to multiple remotes (origin + github) dropped from ~90s to ~45s, because the second push of the same commit hits the cache

Change summary: when the diff exceeds 50 lines, the hook shows the number of files and lines changed

Post-merge Hook (Version Sync)

Function: Automatically updates the VERSION file so the version stays consistent after a pull

Version source priority:

Read from the Git tag
Fall back to the package.json version
As a final fallback, use a version derived from the commit SHA
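The fallback chain can be sketched as a pure function over the three possible sources. Reading each source (Git tag, package.json, commit SHA) is omitted. Stripping a leading `v` from tags, the `0.0.0-<short-sha>` format for the SHA fallback, and the `0.0.0-unknown` last resort are all assumptions, not documented hook behavior.

```typescript
// Inputs the hook might gather; undefined means the source was unavailable.
interface VersionSources {
  gitTag?: string;         // e.g. "v2.5.2"
  packageVersion?: string; // "version" field of package.json
  commitSha?: string;      // full or abbreviated commit SHA
}

// Resolve VERSION in the documented order: Git tag, package.json, commit SHA.
function resolveVersion(src: VersionSources): string {
  if (src.gitTag) return src.gitTag.replace(/^v/, ""); // assumption: drop leading "v"
  if (src.packageVersion) return src.packageVersion;
  if (src.commitSha) return `0.0.0-${src.commitSha.slice(0, 7)}`; // hypothetical format
  return "0.0.0-unknown"; // hypothetical last resort
}
```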

OpenSpec Workflow Configuration

Directory Structure

openspec/
├── AGENTS.md              # Workflow spec + role definitions
├── project.md             # Project context
├── changes/               # Changes in progress
│   ├── [change-id]/
│   │   ├── proposal.md    # Change proposal (Strategist output)
│   │   ├── design.md      # Technical design + architecture review
│   │   └── tasks.md       # Executable task list
│   └── archive/           # Completed changes
└── specs/                 # Feature spec library

Core Role System (5 Roles)

Main Agent (Coordinator)
    ↓
    ├─→ Designer    - Visual + interaction design
    ├─→ Strategist  - What + Why
    ├─→ Builder     - How (implementation)
    └─→ Auditor     - Review + acceptance

Role	Responsibilities	Prohibited
Designer	Visual design, interaction definition, design specs	Write code, execute commit
Strategist	Requirements analysis, architecture design, create proposals	Modify `src/` code, execute commit
Builder	Code implementation, test writing, bug fixing	Architecture decisions, execute commit
Auditor	Code review, security checks, issue tokens	Modify code, architecture decisions

OpenSpec Commands

/openspec:proposal <description>  # Create change proposal (Strategist)
/openspec:apply <change-id>       # Execute change (Builder + Auditor)
/openspec:archive <id>            # Archive completed change (Main Agent)

Workflow Stages (6 Stages)

Stage	Executor	Output
Design (only when design is needed)	Task(designer)	Design spec, integrated into the proposal
Proposal	Task(strategist)	proposal.md, design.md, tasks.md
Apply	Task(builder) × N	Code changes, test results, completion report
Review	Task(auditor)	Review report, code review token issued
Commit	git commit	Committed code
Archive	Move to archive/	Change archived

Quality Gates

Proposal Stage:

proposal.md must include "Background" section
Linus 3 Questions answered (Q1: Problem essence / Q2: Simplest solution / Q3: Automation measures)
Architecture review passed
proposal.md, design.md, and tasks.md created

Apply Stage:

All tasks marked complete [x]
Typecheck passed
Tests passed
Auditor review passed
Code review token exists
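The first Apply Stage gate ("All tasks marked complete [x]") is straightforward to automate. A minimal sketch, assuming tasks.md uses Markdown task-list syntax (`- [ ]` / `- [x]`); `checkTasks` and `allComplete` are hypothetical helpers.

```typescript
interface TaskStatus {
  total: number;
  done: number;
  pending: string[]; // text of unchecked tasks
}

// Collect Markdown task-list items ("- [ ] ..." or "- [x] ...") from tasks.md content.
function checkTasks(markdown: string): TaskStatus {
  const status: TaskStatus = { total: 0, done: 0, pending: [] };
  for (const line of markdown.split("\n")) {
    const m = /^\s*[-*]\s+\[([ xX])\]\s+(.*)$/.exec(line);
    if (!m) continue; // headings, prose and blank lines are ignored
    status.total++;
    if (m[1].toLowerCase() === "x") status.done++;
    else status.pending.push(m[2].trim());
  }
  return status;
}

// The gate passes only if there is at least one task and none are pending.
const allComplete = (s: TaskStatus): boolean => s.total > 0 && s.pending.length === 0;
```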

Commit Stage:

Pre-commit hook passed
Commit message follows Conventional Commits

Archive Stage:

Code deployed and verified
Change moved to archive/

Change Priority & Mode

Priority	Type	Mode	Linus 3Q	Architecture Review
P0	SECURITY	Full	Required	Required
P1	ARCHITECTURE	Full	Required	Required
P2	FEATURE / REFACTOR	Full	Required	Required
P3	FIX	Light	Optional	Skip
P4	DOCS / CONFIG	Light	Optional	Skip
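The priority table maps directly onto a lookup structure that a gate script could consult to decide which checks apply to a change. A sketch only: the field names are hypothetical, and "Optional" in the Linus 3Q column is encoded as "not required".

```typescript
type ChangeType = "SECURITY" | "ARCHITECTURE" | "FEATURE" | "REFACTOR" | "FIX" | "DOCS" | "CONFIG";

interface ChangePolicy {
  priority: "P0" | "P1" | "P2" | "P3" | "P4";
  mode: "Full" | "Light";
  linus3QRequired: boolean;    // Light mode: optional
  architectureReview: boolean; // Light mode: skipped
}

const FULL = { mode: "Full", linus3QRequired: true, architectureReview: true } as const;
const LIGHT = { mode: "Light", linus3QRequired: false, architectureReview: false } as const;

// Encodes the Change Priority & Mode table row by row.
const POLICIES: Record<ChangeType, ChangePolicy> = {
  SECURITY: { priority: "P0", ...FULL },
  ARCHITECTURE: { priority: "P1", ...FULL },
  FEATURE: { priority: "P2", ...FULL },
  REFACTOR: { priority: "P2", ...FULL },
  FIX: { priority: "P3", ...LIGHT },
  DOCS: { priority: "P4", ...LIGHT },
  CONFIG: { priority: "P4", ...LIGHT },
};
```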

AI Collaboration Configuration

Agent Configuration Details

Strategist

Configuration:

name: strategist
description: Requirements analysis, architecture design and change proposals
tools: Read, Write, Edit, Glob, Grep, WebFetch, AskUserQuestion, SlashCommand
model: opus

Responsibilities:

Analyze codebase (Read, Grep, Glob)
Answer Linus 3 Questions
Conduct architecture review
Output proposal.md, design.md, tasks.md

Prohibited:

Modify src/ business code
Execute git commit
Run tests, start services
Call browser / MCP

Output Format:

proposal.md: Change proposal (with Linus 3Q + Impact scope + Risk assessment)
design.md: Technical design (with architecture review conclusion 5/5 score)
tasks.md: Task list (grouped by Phase)

Builder

Configuration:

name: builder
description: Code implementation, debugging and testing
tools: Read, Write, Edit, Bash, Glob, Grep
model: sonnet

Responsibilities:

Implement code (Write, Edit)
Run tests (Bash)
Fix bugs
Work phase by phase, testing and reporting on each Phase independently

Prohibited:

Make architecture decisions independently
Execute git commit
Make changes beyond the scope of tasks.md

Commit Convention (Linus Principle):

One patch does one thing
Each patch can be reviewed independently, can be reverted independently, and has a diff under 200 lines
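The 200-line limit can be checked mechanically from `git diff --cached --numstat`, which prints one `added<TAB>deleted<TAB>path` line per file and `-` for both counts on binary files. A minimal sketch that parses that output; the function names and the choice to count added plus deleted lines are assumptions.

```typescript
interface DiffSize {
  files: number;
  added: number;
  deleted: number;
}

// Parse `git diff --numstat` output; binary files ("-" counts) add a file but no lines.
function parseNumstat(output: string): DiffSize {
  const size: DiffSize = { files: 0, added: 0, deleted: 0 };
  for (const line of output.split("\n")) {
    const [added, deleted] = line.split("\t");
    if (deleted === undefined) continue; // blank or malformed line
    size.files++;
    if (added !== "-") size.added += Number(added);
    if (deleted !== "-") size.deleted += Number(deleted);
  }
  return size;
}

// Builder convention: diff under 200 lines (added + deleted, by assumption).
function withinPatchLimit(size: DiffSize, limit = 200): boolean {
  return size.added + size.deleted < limit;
}
```

The same parse could back the pre-push change summary, which reports file and line counts once a diff passes 50 lines.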

Auditor

Configuration:

name: auditor
description: Code review, security checks and change archiving
tools: Read, Bash, Glob, Grep, TodoWrite, SlashCommand
model: sonnet

Review Dimensions:

Spec compliance - Does it match proposal.md
Code quality - Clear naming, function length, nesting depth, code duplication
Performance issues - O(n²) check, memory leaks, unnecessary loops
Security vulnerabilities - SQL injection, XSS, hardcoded secrets
Edge cases - null/undefined, empty collections, timeout handling
Test coverage - Critical paths + edge cases

Issue Severity:

Level	Definition	Response
Critical	Security vulnerabilities, logic errors, deadlocks, resource leaks	Reject merge
Warning	O(n²) complexity, code duplication, poor maintainability, use of the `any` type	Recommend changes
Info	Naming suggestions, comment additions, code style	Optional optimization

Review Results:

✅ Pass - No Critical issues, and either no Warnings or all Warnings confirmed
⚠️ Conditional Pass - Unconfirmed Warnings; confirmation needed before proceeding
❌ Reject - Critical issues found; fix and resubmit
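The three review outcomes follow mechanically from the issue list. A minimal sketch, assuming each issue records its severity and, for Warnings, whether it has been confirmed; the `Issue` shape and `reviewResult` name are hypothetical.

```typescript
type Severity = "Critical" | "Warning" | "Info";

interface Issue {
  severity: Severity;
  message: string;
  confirmed?: boolean; // assumed flag: Warning acknowledged before proceeding
}

type ReviewResult = "Pass" | "Conditional Pass" | "Reject";

// Critical -> Reject; any unconfirmed Warning -> Conditional Pass; otherwise Pass.
function reviewResult(issues: Issue[]): ReviewResult {
  if (issues.some((i) => i.severity === "Critical")) return "Reject";
  if (issues.some((i) => i.severity === "Warning" && !i.confirmed)) return "Conditional Pass";
  return "Pass"; // Info items never block
}
```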

Package.json Scripts

Development Flow

pnpm dev              # Start backend (localhost:3000)
pnpm build            # Build code (tsc --build)
pnpm typecheck        # TypeScript check
pnpm start            # Production start
pnpm clean            # Delete dist/

Code Quality

pnpm format           # Biome format
pnpm lint             # Biome lint
pnpm check            # Biome check (format + lint)

Testing

pnpm test             # Run unit tests (vitest run)
pnpm test:watch       # Watch mode testing
pnpm test:integration # Integration tests
pnpm verify           # Tests against real websites

Release

pnpm release          # Release flow (choose version)
pnpm release:patch    # 1.9.4 → 1.9.5
pnpm release:minor    # 1.9.4 → 1.10.0
pnpm release:major    # 1.9.4 → 2.0.0

Other Workflow Configuration

Release Script (scripts/release.sh)

Function: Semantic versioning + changelog generation

Usage:

./scripts/release.sh [patch|minor|major]
./scripts/release.sh patch   # 1.9.4 → 1.9.5
./scripts/release.sh minor   # 1.9.4 → 1.10.0
./scripts/release.sh major   # 1.9.4 → 2.0.0
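The version bumps above follow semantic versioning: patch increments the last number, minor increments the middle number and resets patch, and major increments the first number and resets both. A sketch of that rule; `bump` is a hypothetical helper, not part of release.sh, and pre-release or build suffixes are not handled.

```typescript
type BumpKind = "patch" | "minor" | "major";

// Bump a plain MAJOR.MINOR.PATCH version string.
function bump(version: string, kind: BumpKind): string {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!m) throw new Error(`Not a MAJOR.MINOR.PATCH version: ${version}`);
  let [major, minor, patch] = [Number(m[1]), Number(m[2]), Number(m[3])];
  switch (kind) {
    case "major":
      major++;
      minor = 0;
      patch = 0;
      break;
    case "minor":
      minor++;
      patch = 0;
      break;
    case "patch":
      patch++;
      break;
  }
  return `${major}.${minor}.${patch}`;
}
```

Note that `1.9.4` bumped as minor becomes `1.10.0`, not `2.0.0`: each component is an independent integer.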

Checks:

Working directory must be clean (no uncommitted changes)
Must be on main branch (or confirm to continue)

Actions:

Update package.json version
Create Git tag

Frontend Build Verification (scripts/verify-frontend-build.js)

Called in pre-push hook Layer 2
Verify build output validity
Check Tailwind CSS configuration

Environment Configuration (.env)

Contains sensitive information (API keys, tokens)
Configured in .gitignore, not checked into version control
Reference: .env.example

Conventional Commits

Format:

type(scope): subject

body (optional)

Types: feat, fix, refactor, perf, test, docs, chore
Scopes: backend, scraper, crawler, api, agents
Subject: Short description (< 50 characters)
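The header format can be enforced with a regular expression over the first line of the message. A minimal sketch, assuming the scope is optional (as the Conventional Commits specification allows), that only the listed scopes are accepted, and that the 50-character limit applies to the subject alone; `validateHeader` is hypothetical and not wired into any hook described here.

```typescript
const TYPES = ["feat", "fix", "refactor", "perf", "test", "docs", "chore"];
const SCOPES = ["backend", "scraper", "crawler", "api", "agents"];

// type(scope): subject, with the (scope) part optional by assumption.
const HEADER = /^([a-z]+)(?:\(([a-z-]+)\))?: (.+)$/;

// Return a list of problems with the commit header; an empty list means valid.
function validateHeader(message: string): string[] {
  const header = message.split("\n")[0];
  const m = HEADER.exec(header);
  if (!m) return ["header must match type(scope): subject"];
  const [, type, scope, subject] = m;
  const errors: string[] = [];
  if (!TYPES.includes(type)) errors.push(`unknown type "${type}"`);
  if (scope !== undefined && !SCOPES.includes(scope)) errors.push(`unknown scope "${scope}"`);
  if (subject.length >= 50) errors.push("subject must be under 50 characters");
  return errors;
}
```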

Core Workflow Summary

Daily Development Flow

Write code
    ↓
git add + git commit (the commit triggers pre-commit)
    ↓
pre-commit hook quality gate (< 3s)
├─ Layer 0: Detect commit type
├─ Layer 1: Verify code review token
└─ Layer 2: Biome format + Lint
    ↓
git push (triggers pre-push)
    ↓
pre-push hook comprehensive check (< 60s first / < 2s cached)
├─ Layer 0.0: Cache check (skip if same commit)
├─ Layer 0.1: Generate VERSION file
├─ Layer 1: Run tests
├─ Layer 2: Build verification (includes TypeScript)
└─ Layer 3: Security audit
    ↓
Code pushed
    ↓
git merge/pull (triggers post-merge)
    ↓
post-merge hook version sync

OpenSpec Change Flow

User submits requirement
    ↓
/openspec:proposal <description>
    ↓
Task(strategist) creates proposal
├─ Answer Linus 3 Questions
├─ Conduct architecture review
└─ Output proposal.md, design.md, tasks.md
    ↓
/openspec:apply <change-id>
    ↓
Task(builder) × N parallel implementation
├─ Work through tasks phase by phase
├─ Test and report per Phase
└─ Mark tasks as complete
    ↓
Task(auditor) code review
├─ Check 6 dimensions
├─ Categorize issues (Critical/Warning/Info)
└─ Issue code review token
    ↓
Main Agent git commit
    ↓
git push (pre-push hook check)
    ↓
/openspec:archive <id>
    ↓
Change moved to archive/ (complete)

This Linus Torvalds-style workflow emphasizes automation, simplicity, and separation of responsibilities. Its core philosophy is "machines enforce, humans create".