EdgeOne SSE Optimization & Markdown Parsing Fix
How we solved EdgeOne 30s timeouts by removing the proxy layer, and fixed non-standard HTML table parsing with custom Turndown rules.
Key Insight: The ultimate form of architectural optimization is "subtraction": removing a proxy layer that added no value solved the timeout issue. In parsers, robustness beats standardization: handling non-standard HTML leniently is more practical than strictly following specs.
Problem Statement
In v2.5.3, we encountered two critical issues affecting user experience:
- SSE Interruption: Frontend crawler progress frequently stalled at 30/40. Diagnosis showed that EdgeOne Node Functions enforce a hard 30-second timeout, and that the CDN was buffering SSE events instead of streaming them.
- Markdown Conversion Failure: Non-standard HTML tables generated by Slate editors (using `<td class="is-header">` instead of `<th>`) were not recognized by the standard `turndown-plugin-gfm` plugin, which only converts tables whose first row it recognizes as a heading row and keeps the rest as raw HTML.
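For context, SSE progress updates are plain-text frames written to one long-lived HTTP response. That is why an intermediary that caps request duration or buffers the response body breaks them. The sketch below shows minimal framing under a few assumptions: the `progress` event name and the `{ done, total }` payload are hypothetical, and `X-Accel-Buffering` is an nginx convention, so whether EdgeOne honors it is also an assumption.

```javascript
// Response headers commonly sent with SSE streams.
// X-Accel-Buffering is an nginx hint; CDN support for it is an assumption.
const SSE_HEADERS = {
  "Content-Type": "text/event-stream",
  "Cache-Control": "no-cache",
  "X-Accel-Buffering": "no",
};

// Serialize one SSE event. Every frame ends with a blank line ("\n\n").
function formatSseEvent(event, data, id) {
  const lines = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  if (event) lines.push(`event: ${event}`);
  // JSON.stringify without indentation emits a single line, so one "data:" line suffices.
  lines.push(`data: ${JSON.stringify(data)}`);
  return lines.join("\n") + "\n\n";
}

// A comment frame (leading ":") keeps idle connections alive; clients ignore it.
const SSE_HEARTBEAT = ": keep-alive\n\n";
```

In a Node handler this would look like `res.writeHead(200, SSE_HEADERS)` followed by `res.write(formatSseEvent("progress", { done: 30, total: 40 }, 1))`. Heartbeats help with idle-connection timeouts, but they cannot extend a hard per-request limit like the 30 seconds described above.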
Key Decisions
Decision 1: Remove Reverse Proxy vs. Optimize Proxy Config
We faced two choices to solve the 30s timeout:
| Option | Pros | Cons |
|---|---|---|
| Optimize Proxy | Maintains architectural consistency, hides backend IP | Constrained by the platform's hard limit (30s); cannot fully solve the timeout |
| Remove Proxy (Direct) | Completely solves timeout, reduces latency, simpler architecture | Requires CORS configuration, requires auth redesign |
Decision: Remove Proxy (Direct Connection)
Reason: The proxy layer imposed a platform limit that could not be bypassed while providing no core value (such as caching or secure aggregation). As Linus Torvalds might say: "Using a service with a 30-second limit to proxy a request that needs 40 seconds isn't architecture; it's asking for trouble."
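Going direct means the browser now calls the backend cross-origin. The backend must therefore answer with CORS headers itself, which is the "requires CORS configuration" cost listed in the table above. The sketch below is framework-agnostic. `ALLOWED_ORIGINS` is a hypothetical allowlist, while the header names are the standard CORS response headers.

```javascript
// Hypothetical allowlist; replace with the real frontend origins.
const ALLOWED_ORIGINS = new Set(["https://app.example.com"]);

// Return CORS response headers for a request Origin, or null if not allowed.
function corsHeaders(origin, allowlist = ALLOWED_ORIGINS) {
  if (!origin || !allowlist.has(origin)) return null;
  return {
    // Echo the specific origin rather than "*" so the allowlist is enforced.
    "Access-Control-Allow-Origin": origin,
    // The response differs per Origin, so shared caches must key on it.
    Vary: "Origin",
    "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type, Authorization",
  };
}
```

Echoing the origin together with `Vary: Origin` matters once a CDN sits in front of the backend. Without it, a cached response for one origin could be served to another.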
Decision 2: Dual-Layer Authentication Strategy
Direct frontend-to-backend connection introduced a new problem: How to protect the API Key? We adopted a hybrid strategy:
| Context | Auth Method | Reason |
|---|---|---|
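| Browser (User) | CORS Whitelist | Browsers cannot safely store API Keys; an origin whitelist keeps other websites from calling the API through users' browsers (non-browser clients can still forge the Origin header) |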
| Server (API) | API Key | Traditional key auth, not limited by CORS, suitable for 3rd-party integration |
Reason: This strategy ensures convenience for Web users (no key config needed) while maintaining API security (preventing abuse).
Decision 3: Custom Turndown Rules vs. HTML Pre-processing
When fixing the table parsing issue, we compared two technical approaches:
| Option | Pros | Cons |
|---|---|---|
| Pre-process HTML | Reuses existing GFM plugin | Requires multiple regex replacements, fragile DOM manipulation, high maintenance |
| Custom Turndown Rules | Faster (-74% conversion time on small documents), robust, cohesive logic | Requires writing custom rule logic |
Decision: Custom Turndown Rules
Reason: Using Turndown's `content` parameter (the already-converted Markdown of the table's children) avoids recursive parsing overhead, because the rule post-processes a string instead of re-walking the DOM. Benchmarks showed custom rules took only 1.7ms for small documents, compared to 6.5ms for the GFM plugin. More importantly, the rules flexibly handle `<colgroup>` and non-standard headers, embodying the principle that "parsers should be robust."
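As a sanity check on the table, (6.5 - 1.7) / 6.5 ≈ 0.74, which matches the -74% figure. Millisecond-scale timings like these are sensitive to JIT warm-up, so a fair comparison needs warm-up runs and a robust statistic. The following is a minimal harness sketch that reports the median over repeated runs. The converter names in the usage note are hypothetical wrappers around the two Turndown setups.

```javascript
// Median wall-clock time in ms of fn(input), after discarding warm-up runs.
// `performance` is a global in Node.js 16+ and in browsers.
function medianMs(fn, input, { warmup = 20, runs = 200 } = {}) {
  for (let i = 0; i < warmup; i++) fn(input); // let the JIT settle
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    fn(input);
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  const mid = Math.floor(samples.length / 2);
  // Even sample counts average the two middle values.
  return samples.length % 2 ? samples[mid] : (samples[mid - 1] + samples[mid]) / 2;
}
```

Usage would look like `medianMs(convertWithCustomRules, html)` versus `medianMs(convertWithGfm, html)`, run on the same set of small and large documents. The median is preferred over the mean because occasional GC pauses skew averages.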
Core Insights
- Architectural Subtraction: The best code is no code. Removing the proxy layer not only solved the timeout but also reduced network hops and failure points.
- Google SRE Mindset: Tests in the release process should be deterministic. We moved E2E tests out of the release blocking path because tests relying on external services (picsum.photos) should not block version releases.
- Parser Design: When converting HTML to Markdown, "duck typing" (if it looks like a table, render it as a table) is more practical than strict spec validation.
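The duck-typing rule can be made concrete as a header check that accepts any object with DOM-like `nodeName`, `className`, and `textContent` fields. The sketch below is independent of Turndown. Its function names are hypothetical, and the "promote the first row" fallback is one possible design choice, not necessarily what the project ships.

```javascript
// Duck-typed header check: a real <th>, or a <td> that Slate marks with
// class="is-header". Works on DOM elements or plain objects with the same fields.
function isHeaderCell(cell) {
  const classes = String(cell.className || "").split(/\s+/);
  return cell.nodeName === "TH" || classes.includes("is-header");
}

// Render rows of cells as a GFM table. GFM requires a header row, so when no
// row looks like one, the first row is promoted instead of giving up.
function tableToMarkdown(rows) {
  if (rows.length === 0 || rows[0].length === 0) return "";
  const found = rows.findIndex((row) => row.length > 0 && row.every(isHeaderCell));
  const headerIdx = found === -1 ? 0 : found;
  const header = rows[headerIdx];
  const body = rows.filter((_, i) => i !== headerIdx);
  // Escape literal pipes so cell text cannot break the column structure.
  const line = (cells) =>
    "| " + cells.map((c) => c.textContent.trim().replace(/\|/g, "\\|")).join(" | ") + " |";
  const separator = "|" + " --- |".repeat(header.length);
  return [line(header), separator, ...body.map(line)].join("\n");
}
```

The check never asks whether the markup is valid HTML. It only asks whether a cell behaves like a header, which is what lets `<td class="is-header">` and `<th>` share one code path.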
Reusable Patterns
Pattern: Context-Aware Hybrid Auth
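```javascript
// Express-style middleware. isAllowedOrigin() is the origin allowlist check,
// defined elsewhere in the project.
export function authenticate(req, res, next) {
  const API_KEY = process.env.API_KEY;
  const origin = req.headers.origin;

  // 1. Browser context: rely on the CORS whitelist (no key needed).
  //    Browsers set Origin themselves, but non-browser clients can forge it.
  if (origin && isAllowedOrigin(origin)) {
    return next();
  }

  // 2. Server context: rely on the API Key.
  //    Check API_KEY first: if it is unset, a request without an
  //    Authorization header would otherwise pass (undefined === undefined).
  const providedKey = req.headers.authorization?.replace(/^Bearer /, "");
  if (API_KEY && providedKey === API_KEY) {
    return next();
  }

  return res.status(401).json({ error: "Unauthorized" });
}
```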
Pattern: Robust Turndown Table Rules
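```javascript
// Don't rely on <th> or <thead>: render any table that has cells.
// Assumes row/cell rules (e.g. from turndown-plugin-gfm) have already turned
// each <tr> into a "| a | b |" line inside `content`.
turndownService.addRule("robustTable", {
  filter: "table",
  replacement: (content) => {
    // Simple string processing is often faster and more robust than DOM ops.
    // Drop blank lines and any separator row a heading-row rule already emitted.
    const rows = content
      .split("\n")
      .map((row) => row.trim())
      .filter((row) => row && !/^\|(\s*:?-+:?\s*\|)+$/.test(row));
    if (rows.length === 0) return "";

    // Count columns from the first row's unescaped pipes.
    const colCount = (rows[0].match(/(?<!\\)\|/g) || []).length - 1;
    // No pipes means no cell rules ran; fall back to the plain text.
    if (colCount < 1) return "\n\n" + rows.join("\n") + "\n\n";
    const separator = "|" + " --- |".repeat(colCount);

    return "\n\n" + [rows[0], separator, ...rows.slice(1)].join("\n") + "\n\n";
  },
});
```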
Next Steps
- Simplify Release: Implement `feat-release-simplify` by removing unstable E2E tests from `release.sh`.
- Enhance Tables: Implement `feat-table-align` to support converting the HTML table `align` attribute.
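For the planned alignment support, GFM encodes column alignment in the separator row with colons: `:---` for left, `:---:` for center, and `---:` for right. The following is a minimal sketch of that mapping. It assumes each header cell's `align` value (`left`, `center`, `right`, or empty) has already been read, and the function name is hypothetical.

```javascript
// GFM separator tokens per HTML align value. A Map avoids matching inherited
// object keys such as "constructor".
const ALIGN_TOKENS = new Map([
  ["left", ":---"],
  ["center", ":---:"],
  ["right", "---:"],
]);

// Build a separator row; missing or unknown align values fall back to "---".
function alignedSeparator(aligns) {
  const cells = aligns.map(
    (a) => ALIGN_TOKENS.get(String(a || "").trim().toLowerCase()) ?? "---"
  );
  return "| " + cells.join(" | ") + " |";
}
```

A table rule could then swap its fixed `" --- |"` separator for `alignedSeparator(...)` once the header cells' `align` values are available.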