Development Workflow

Project development process, Git conventions, and tooling guide

Version: 2.5.2
Updated: 2025-12-27

This document describes the project's development workflow configuration, including Git Hooks, code quality gates, and collaboration standards.


Project Overview

A lightweight web scraping tool that converts any website into clean Markdown format for LLM processing.

Core Features

| Feature | Endpoint | Description |
| --- | --- | --- |
| Single Page Scrape | /api/scrape | Scrape a single URL with multiple engine support |
| Multi-Page Crawl | /api/crawl | Deep crawl websites with SSE real-time progress |
| HTML Convert | /api/convert | Upload HTML files and convert to Markdown |

Scraping Engines

  • http (~170ms) - Pure HTTP requests, ideal for static sites
  • browser (~5s) - Puppeteer rendering, ideal for SPAs
  • auto (~1.2s) - Smart detection, automatic selection

Architecture

project/
├── src/                    # Backend (Express + TypeScript)
│   ├── server.ts           # Express entry point
│   ├── routes/             # API routes
│   ├── middleware/         # Middleware
│   ├── scraper/            # Single page scraping module
│   │   ├── browser-pool.ts   # Browser pool management
│   │   ├── browser.ts        # Puppeteer scraping
│   │   ├── fetch.ts          # HTTP scraping
│   │   ├── detector.ts       # Engine auto-detection
│   │   └── images.ts         # Image processing
│   ├── crawler/            # Multi-page crawling module
│   ├── lib/                # Utility functions
│   │   ├── clean-html.ts     # HTML cleaning
│   │   ├── html-to-markdown.ts # Markdown conversion (Turndown)
│   │   └── queue/            # Task queue
│   └── types.ts            # Type definitions + Zod validation
│
├── ui/                     # Frontend (Next.js 15)
│   └── src/
│       ├── app/            # App Router pages
│       ├── components/     # React components (shadcn/ui)
│       ├── hooks/          # Custom Hooks
│       └── lib/            # Utility functions
│
└── tests/                  # Tests
    ├── unit/
    ├── integration/
    └── e2e/

Tech Stack

| Layer | Technology |
| --- | --- |
| Backend | Node.js + Express + TypeScript |
| Browser Automation | Puppeteer |
| HTML Parsing | Cheerio |
| Markdown Conversion | Turndown + GFM |
| Frontend | Next.js 15 + React + Tailwind |
| Validation | Zod |
| Testing | Vitest |

Data Flow

User requests URL
    ↓
detector.ts (decide http or browser)
    ↓
fetch.ts / browser.ts (get HTML)
    ↓
clean-html.ts (clean HTML, remove ads/scripts)
    ↓
html-to-markdown.ts (convert to Markdown)
    ↓
Return LLM-ready content

The codebase is about 1,500 lines and follows the Unix philosophy of "do one thing well".


Git Hooks Configuration

Pre-commit Hook (Fast Quality Gate)

Goal: Complete checks in < 3 seconds

3-Layer Quality Gate:

| Layer | Name | Function |
| --- | --- | --- |
| Layer 0 | Commit Type Detection | Distinguish code/non-code changes; skip review for docs/config/archive |
| Layer 1 | Code Review Enforcement | Check for code review token; reject commits without token |
| Layer 2 | Biome Format/Lint | Auto-format + error checking; re-stage fixed files |

v2.5.2 Optimization: TypeScript and ESLint checks moved to pre-push to avoid redundant execution.

Key Mechanisms:

  • Non-code changes are detected and skip Layer 1 automatically
  • Code changes must carry a code review token
  • Auto-fixed files are re-staged immediately with git add
  • Tokens are single-use (deleted after use)
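
As a rough illustration of the Layer 0 decision, the sketch below classifies staged paths as code or non-code and reports whether a review token is needed. The pattern list and the names `NON_CODE_PATTERNS`, `isNonCode`, and `requiresReviewToken` are assumptions made for this example; the hook's actual rules are not shown in this document.

```typescript
// Hypothetical patterns for "docs/config/archive" changes.
// The real hook may use a different list.
const NON_CODE_PATTERNS: RegExp[] = [
  /\.md$/i,                        // documentation files
  /^docs\//,                       // documentation directory
  /^openspec\/changes\/archive\//, // archived OpenSpec changes
  /\.(json|ya?ml|toml)$/i,         // configuration files
];

// True when the path matches one of the non-code patterns.
function isNonCode(path: string): boolean {
  return NON_CODE_PATTERNS.some((pattern) => pattern.test(path));
}

// Layer 0: a review token is needed only if at least one staged file is code.
function requiresReviewToken(stagedFiles: string[]): boolean {
  return stagedFiles.some((file) => !isNonCode(file));
}
```

With these assumed patterns, a commit touching only `README.md` would skip Layer 1, while a commit that also touches `src/server.ts` would not.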

Pre-push Hook (Comprehensive Quality Check)

Goal: < 60 seconds (first run) / < 2 seconds (cache hit)

5-Layer Check:

| Layer | Name | Function |
| --- | --- | --- |
| Layer 0.0 | Cache Check | Commit hash based caching; skip all checks on second push of same commit |
| Layer 0.1 | Version Generation | Generate VERSION file from Git tag or package.json |
| Layer 1 | Test Execution | Backend tests pnpm test; frontend lint + test |
| Layer 2 | Build Verification | Frontend build pnpm build (includes TypeScript); verify build output |
| Layer 3 | Security Audit | Critical vulnerabilities block push; High vulnerabilities warn only |
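
The Layer 3 rule (Critical blocks, High only warns) can be expressed as a small decision function. This is a minimal sketch: the `AuditCounts` shape and the `auditOutcome` name are hypothetical, and how the hook obtains the counts is not described here.

```typescript
// Hypothetical summary of an audit run: number of findings per severity.
interface AuditCounts {
  critical: number;
  high: number;
}

type AuditOutcome = "block" | "warn" | "pass";

// Critical findings block the push; High findings only produce a warning.
function auditOutcome(counts: AuditCounts): AuditOutcome {
  if (counts.critical > 0) return "block";
  if (counts.high > 0) return "warn";
  return "pass";
}
```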

Cache Mechanism (v2.5.2 New):

  • Cache file: /tmp/.pre-push-passed-{commit-hash}
  • Force skip cache: SKIP_PUSH_CACHE=1 git push
  • Clear cache: rm -f /tmp/.pre-push-passed-*
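
The cache rules above can be sketched as follows. Only the marker path and the `SKIP_PUSH_CACHE` variable come from this document; the function names and the injected `env` and `fileExists` parameters are assumptions chosen so the example has no Node.js dependency.

```typescript
// Build the marker path for a given commit hash.
function cacheFilePath(commitHash: string): string {
  return `/tmp/.pre-push-passed-${commitHash}`;
}

// Decide whether the remaining pre-push layers can be skipped.
// `env` and `fileExists` are injected so the logic stays easy to test.
function canSkipChecks(
  commitHash: string,
  env: Record<string, string | undefined>,
  fileExists: (path: string) => boolean,
): boolean {
  // SKIP_PUSH_CACHE=1 forces a full run even if a marker exists.
  if (env["SKIP_PUSH_CACHE"] === "1") return false;
  return fileExists(cacheFilePath(commitHash));
}
```

Keying the marker on the commit hash is what lets a second push of the same commit (for example to a second remote) skip the expensive layers.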

Optimization Effect: Multi-remote push (origin + github) reduced from ~90s to ~45s

Change Summary: Show file count and line changes when diff > 50 lines

Post-merge Hook (Version Sync)

Function: Auto-update VERSION file to ensure version consistency after pull

Priority:

  1. Try reading from Git tag
  2. Fallback to package.json version
  3. Final fallback to commit SHA version
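
The fallback order above might look like this in code. The `VersionSources` shape is hypothetical, and the `0.0.0-<short sha>` format for the final fallback is an assumption; the document does not specify how the SHA-based version is written.

```typescript
// Hypothetical inputs: each source may or may not be available.
interface VersionSources {
  gitTag?: string;         // e.g. the most recent Git tag, if any
  packageVersion?: string; // the "version" field of package.json, if readable
  commitSha: string;       // current commit SHA
}

// Apply the priority order: Git tag, then package.json, then commit SHA.
function resolveVersion(sources: VersionSources): string {
  if (sources.gitTag) return sources.gitTag;
  if (sources.packageVersion) return sources.packageVersion;
  // Assumed format for the SHA-based fallback.
  return `0.0.0-${sources.commitSha.slice(0, 7)}`;
}
```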

OpenSpec Workflow Configuration

Directory Structure

openspec/
├── AGENTS.md              # Workflow spec + role definitions
├── project.md             # Project context
├── changes/               # Changes in progress
│   ├── [change-id]/
│   │   ├── proposal.md    # Change proposal (Strategist output)
│   │   ├── design.md      # Technical design + architecture review
│   │   └── tasks.md       # Executable task list
│   └── archive/           # Completed changes
└── specs/                 # Feature spec library

Core Role System (5 Roles)

Main Agent (Coordinator)
    ↓
    ├─→ Designer    - Visual + interaction design
    ├─→ Strategist  - What + Why
    ├─→ Builder     - How (implementation)
    └─→ Auditor     - Review + acceptance

| Role | Responsibilities | Prohibited |
| --- | --- | --- |
| Designer | Visual design, interaction definition, design specs | Write code, execute commit |
| Strategist | Requirements analysis, architecture design, create proposals | Modify src/ code, execute commit |
| Builder | Code implementation, test writing, bug fixing | Architecture decisions, execute commit |
| Auditor | Code review, security checks, issue tokens | Modify code, architecture decisions |

OpenSpec Commands

/openspec:proposal <description>  # Create change proposal (Strategist)
/openspec:apply <change-id>       # Execute change (Builder + Auditor)
/openspec:archive <id>            # Archive completed change (Main Agent)

Workflow Stages (6 Stages)

Design Stage      Proposal Stage       Apply Stage        Review Stage       Commit Stage     Archive Stage
(design needs)         ↓                   ↓                  ↓                  ↓                ↓
    ↓            Task(strategist)   Task(builder)×N    Task(auditor)    git commit       move to archive/
Task(designer)         ↓                   ↓                  ↓                  ↓                ↓
    ↓            proposal.md        code changes       review report    commit code      change archived
design spec      design.md          test results       token issued
    ↓            tasks.md           completion report
integrate into proposal

Quality Gates

Proposal Stage:

  • proposal.md must include "Background" section
  • Linus 3 Questions answered (Q1: Problem essence / Q2: Simplest solution / Q3: Automation measures)
  • Architecture review passed
  • Create proposal.md, design.md, tasks.md

Apply Stage:

  • All tasks marked complete [x]
  • typecheck passed
  • test passed
  • Auditor review passed
  • Code review token exists

Commit Stage:

  • Pre-commit hook passed
  • Commit message follows Conventional Commits

Archive Stage:

  • Code deployed and verified
  • Change moved to archive/

Change Priority & Mode

| Priority | Type | Mode | Linus 3Q | Architecture Review |
| --- | --- | --- | --- | --- |
| P0 | SECURITY | Full | Required | Required |
| P1 | ARCHITECTURE | Full | Required | Required |
| P2 | FEATURE / REFACTOR | Full | Required | Required |
| P3 | FIX | Light | Optional | Skip |
| P4 | DOCS / CONFIG | Light | Optional | Skip |
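
The priority table can be encoded as a lookup so tooling can decide which steps a change needs. This is a sketch only: the values are taken from the table, while the `ChangePolicy` shape and the `CHANGE_POLICY` and `policyFor` names are hypothetical.

```typescript
type ChangeType =
  | "SECURITY" | "ARCHITECTURE" | "FEATURE" | "REFACTOR"
  | "FIX" | "DOCS" | "CONFIG";

interface ChangePolicy {
  priority: "P0" | "P1" | "P2" | "P3" | "P4";
  mode: "Full" | "Light";
  linus3Q: "Required" | "Optional";
  architectureReview: "Required" | "Skip";
}

const FULL = { mode: "Full", linus3Q: "Required", architectureReview: "Required" } as const;
const LIGHT = { mode: "Light", linus3Q: "Optional", architectureReview: "Skip" } as const;

// One entry per change type, mirroring the table rows.
const CHANGE_POLICY: Record<ChangeType, ChangePolicy> = {
  SECURITY: { priority: "P0", ...FULL },
  ARCHITECTURE: { priority: "P1", ...FULL },
  FEATURE: { priority: "P2", ...FULL },
  REFACTOR: { priority: "P2", ...FULL },
  FIX: { priority: "P3", ...LIGHT },
  DOCS: { priority: "P4", ...LIGHT },
  CONFIG: { priority: "P4", ...LIGHT },
};

// Look up the policy for a change type.
function policyFor(type: ChangeType): ChangePolicy {
  return CHANGE_POLICY[type];
}
```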

AI Collaboration Configuration

Agent Configuration Details

Strategist

Configuration:

  • name: strategist
  • description: Requirements analysis, architecture design and change proposals
  • tools: Read, Write, Edit, Glob, Grep, WebFetch, AskUserQuestion, SlashCommand
  • model: opus

Responsibilities:

  • Analyze codebase (Read, Grep, Glob)
  • Answer Linus 3 Questions
  • Conduct architecture review
  • Output proposal.md, design.md, tasks.md

Prohibited:

  • Modify src/ business code
  • Execute git commit
  • Run tests, start services
  • Call browser / MCP

Output Format:

  • proposal.md: Change proposal (with Linus 3Q + Impact scope + Risk assessment)
  • design.md: Technical design (with architecture review conclusion 5/5 score)
  • tasks.md: Task list (grouped by Phase)

Builder

Configuration:

  • name: builder
  • description: Code implementation, debugging and testing
  • tools: Read, Write, Edit, Bash, Glob, Grep
  • model: sonnet

Responsibilities:

  • Implement code (Write, Edit)
  • Run tests (Bash)
  • Fix bugs
  • Step-by-step completion (independent test and report per Phase)

Prohibited:

  • Make architecture decisions independently
  • Execute git commit
  • Changes beyond tasks.md scope

Commit Convention (Linus Principle):

  • One patch does one thing
  • Can be reviewed independently / can be reverted independently / diff < 200 lines

Auditor

Configuration:

  • name: auditor
  • description: Code review, security checks and change archiving
  • tools: Read, Bash, Glob, Grep, TodoWrite, SlashCommand
  • model: sonnet

Review Dimensions:

  1. Spec compliance - Does it match proposal.md
  2. Code quality - Clear naming, function length, nesting depth, code duplication
  3. Performance issues - O(n²) check, memory leaks, unnecessary loops
  4. Security vulnerabilities - SQL injection, XSS, hardcoded secrets
  5. Edge cases - null/undefined, empty collections, timeout handling
  6. Test coverage - Critical paths + edge cases

Issue Severity:

| Level | Definition | Response |
| --- | --- | --- |
| Critical | Security vulnerabilities, logic errors, performance deadlocks, resource leaks | Reject merge |
| Warning | O(n²), code duplication, poor maintainability, any type | Recommend changes |
| Info | Naming suggestions, comment additions, code style | Optional optimization |

Review Results:

  • ✅ Pass - No Critical issues, and either no Warnings or all Warnings confirmed
  • ⚠️ Conditional Pass - Has Warning, needs confirmation before proceeding
  • ❌ Reject - Has Critical, must fix and resubmit
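
One way to read the three review results is as a function of the issues found. In the sketch below, the `confirmed` flag is a hypothetical way to model "Warning confirmed"; the document does not say how confirmation is recorded.

```typescript
type IssueLevel = "Critical" | "Warning" | "Info";

interface ReviewIssue {
  level: IssueLevel;
  confirmed?: boolean; // hypothetical: a Warning that was acknowledged
}

type Verdict = "Pass" | "Conditional Pass" | "Reject";

// Any Critical rejects; unconfirmed Warnings give a conditional pass.
function reviewVerdict(issues: ReviewIssue[]): Verdict {
  if (issues.some((issue) => issue.level === "Critical")) return "Reject";
  const hasOpenWarning = issues.some(
    (issue) => issue.level === "Warning" && !issue.confirmed,
  );
  return hasOpenWarning ? "Conditional Pass" : "Pass";
}
```

Info-level issues never affect the verdict in this reading, which matches their "Optional optimization" response.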

Package.json Scripts

Development Flow

pnpm dev              # Start backend (localhost:3000)
pnpm build            # Build code (tsc --build)
pnpm typecheck        # TypeScript check
pnpm start            # Production start
pnpm clean            # Delete dist/

Code Quality

pnpm format           # Biome format
pnpm lint             # Biome lint
pnpm check            # Biome check (format + lint)

Testing

pnpm test             # Run unit tests (vitest run)
pnpm test:watch       # Watch mode testing
pnpm test:integration # Integration tests
pnpm verify           # Real website tests

Release

pnpm release          # Release flow (choose version)
pnpm release:patch    # 1.9.4 → 1.9.5
pnpm release:minor    # 1.9.4 → 1.10.0
pnpm release:major    # 1.9.4 → 2.0.0

Other Workflow Configuration

Release Script (scripts/release.sh)

Function: Semantic versioning + changelog generation

Usage:

./scripts/release.sh [patch|minor|major]
./scripts/release.sh patch   # 1.9.4 → 1.9.5
./scripts/release.sh minor   # 1.9.4 → 1.10.0
./scripts/release.sh major   # 1.9.4 → 2.0.0

Checks:

  • Working directory must be clean (no uncommitted changes)
  • Must be on main branch (or confirm to continue)
  • Update package.json version
  • Create Git tag
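
The patch/minor/major bumps follow standard semantic versioning arithmetic. The sketch below reproduces the 1.9.4 examples; `bumpVersion` is a hypothetical helper and is not taken from scripts/release.sh, whose implementation is not shown here.

```typescript
type Bump = "patch" | "minor" | "major";

// Increment one component of a plain MAJOR.MINOR.PATCH version
// and reset the components to its right.
function bumpVersion(version: string, bump: Bump): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`Not a plain MAJOR.MINOR.PATCH version: ${version}`);
  }
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const patch = Number(match[3]);
  switch (bump) {
    case "major":
      return `${major + 1}.0.0`;
    case "minor":
      return `${major}.${minor + 1}.0`;
    case "patch":
      return `${major}.${minor}.${patch + 1}`;
  }
}
```

Note that a minor bump of 1.9.4 gives 1.10.0, not 2.0.0, because each component is an independent integer.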

Frontend Build Verification (scripts/verify-frontend-build.js)

  • Called in pre-push hook Layer 2
  • Verify build output validity
  • Check Tailwind CSS configuration

Environment Configuration (.env)

  • Contains sensitive information (API keys, tokens)
  • Configured in .gitignore, not checked into version control
  • Reference: .env.example

Conventional Commits

Format:

type(scope): subject

body (optional)

Types: feat, fix, refactor, perf, test, docs, chore
Scopes: backend, scraper, crawler, api, agents
Subject: Short description (< 50 characters)
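
A commit header can be checked against these rules with a small validator. This is a minimal sketch, not the project's actual check: `validateCommitHeader` is a hypothetical name, and treating the scope as optional is an assumption borrowed from the Conventional Commits specification.

```typescript
const COMMIT_TYPES = ["feat", "fix", "refactor", "perf", "test", "docs", "chore"];
const COMMIT_SCOPES = ["backend", "scraper", "crawler", "api", "agents"];

// Returns a list of problems; an empty list means the header is acceptable.
function validateCommitHeader(header: string): string[] {
  const match = /^([a-z]+)(?:\(([^)]+)\))?: (.+)$/.exec(header);
  if (!match) return ["header must look like type(scope): subject"];

  const type = match[1] ?? "";
  const scope = match[2]; // undefined when no scope was given
  const subject = match[3] ?? "";

  const errors: string[] = [];
  if (COMMIT_TYPES.indexOf(type) === -1) errors.push(`unknown type: ${type}`);
  if (scope !== undefined && COMMIT_SCOPES.indexOf(scope) === -1) {
    errors.push(`unknown scope: ${scope}`);
  }
  if (subject.length >= 50) {
    errors.push("subject must be shorter than 50 characters");
  }
  return errors;
}
```

For example, `feat(scraper): add retry to fetch engine` yields no errors, while `fix(ui): adjust layout` is flagged because `ui` is not in the scope list above.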


Core Workflow Summary

Daily Development Flow

Write code
    ↓
git add / git commit (triggers pre-commit)
    ↓
pre-commit hook quality gate (< 3s)
├─ Layer 0: Detect commit type
├─ Layer 1: Verify code review token
├─ Layer 2: Biome format + Lint
    ↓
git push (triggers pre-push)
    ↓
pre-push hook comprehensive check (< 60s first / < 2s cached)
├─ Layer 0.0: Cache check (skip if same commit)
├─ Layer 0.1: Generate VERSION file
├─ Layer 1: Run tests
├─ Layer 2: Build verification (includes TypeScript)
├─ Layer 3: Security audit
    ↓
Code pushed
    ↓
git merge/pull (triggers post-merge)
    ↓
post-merge hook version sync

OpenSpec Change Flow

User submits requirement
    ↓
/openspec:proposal <description>
    ↓
Task(strategist) creates proposal
├─ Answer Linus 3 Questions
├─ Conduct architecture review
├─ Output proposal.md, design.md, tasks.md
    ↓
/openspec:apply <change-id>
    ↓
Task(builder) × N parallel implementation
├─ Complete step by step by Phase
├─ Test and report per Phase
├─ Mark tasks as complete
    ↓
Task(auditor) code review
├─ Check 6 dimensions
├─ Categorize issues (Critical/Warning/Info)
├─ Issue code review token
    ↓
Main Agent git commit
    ↓
git push (pre-push hook check)
    ↓
/openspec:archive <id>
    ↓
Change moved to archive/ complete

The workflow follows Linus Torvalds-style principles: automation, simplicity, and separation of responsibilities. Its core philosophy is "machines enforce, humans create".