Remote Browser Gateway Integration
How Firecrawl integrates the EdgeOne browser gateway using the Look-aside pattern
Core Insight: Integrating complex systems via a minimalist "Look-aside" pattern, outsourcing browser cluster complexity to a gateway service.
Firecrawl needs to support remote browser clusters to improve scalability, but maintaining a browser pool in-house is costly. This document records the architectural decisions and implementation patterns for integrating the EdgeOne browser gateway.
Problem Statement
As concurrent crawling demands increase, the local BrowserPool becomes a resource bottleneck. Although Puppeteer supports connecting to remote browsers via browserWSEndpoint, we need a dynamic, highly available way to acquire these remote instances instead of hardcoding a single address.
Another team built the EdgeOne browser gateway, offering dynamic scheduling capabilities. We need to integrate this service with minimal cost while maintaining system robustness.
Key Decisions
Decision 1: Auth Token Reuse
| Option | Pros | Cons |
|---|---|---|
| New BROWSER_TOKEN | Clear concept, strong isolation | Adds configuration, user confusion |
| Reuse BROWSERLESS_TOKEN | Simple config, seamless migration | Relies on naming convention |
Choice: Reuse BROWSERLESS_TOKEN
Reason: The gateway is essentially a proxy for Browserless. Reusing the existing standard token fits user intuition and reduces configuration burden. This follows the "Simple > Clever" principle.
Decision 2: Client-side Protocol Conversion
Problem: The gateway API returns https://... (for generality), but Puppeteer's browserWSEndpoint requires wss://.
Options:
- Request gateway team to modify API
- Client-side protocol conversion
Choice: Client-side conversion (replace(/^https?:\/\//, "wss://"))
Reason: The "Robustness Principle": be conservative in what you do, be liberal in what you accept from others. Converting on the client side removes the dependency on the gateway team immediately, so integration does not have to wait for a server-side change.
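The conversion above can be sketched as a small helper. This is a minimal illustration that reuses the regular expression from the decision; the function name `toWsEndpoint` and the example URL are assumptions, not taken from the Firecrawl source.

```typescript
// Hypothetical helper; the name is illustrative, not from the Firecrawl source.
function toWsEndpoint(gatewayUrl: string): string {
  // Swap only the leading scheme; host, path, and query string stay untouched.
  return gatewayUrl.replace(/^https?:\/\//, "wss://");
}

// Example URL is made up for illustration.
console.log(toWsEndpoint("https://gateway.example.com/session/abc?token=t"));
// prints "wss://gateway.example.com/session/abc?token=t"
```

Note that this expression also maps a plain `http://` URL to `wss://`. That matches the gateway described here, which returns `https://`, but an endpoint served without TLS would need `ws://` instead.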
Decision 3: Aggressive Degradation Strategy (Fail-safe)
    try {
      // Get remote browser...
    } catch {
      return null; // Silent fallback to local browser on ANY error
    }
Reason: Service availability > Performance. Gateway failures (network issues, downtime) should not render the entire Firecrawl service unavailable. This "silent fallback" strategy ensures that even if the gateway is completely down, the local service continues to work (albeit with reduced concurrency).
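A slightly fuller sketch of this fail-safe wrapper follows. The gateway call is injected as a callback because the gateway's real API is not described here; `EndpointProvider` and `getRemoteEndpoint` are hypothetical names used only for illustration.

```typescript
// Hypothetical signature: resolves to a WebSocket endpoint for a remote browser.
type EndpointProvider = () => Promise<string>;

// Returns the remote endpoint, or null to signal "use the local browser".
async function getRemoteEndpoint(provider: EndpointProvider): Promise<string | null> {
  try {
    const endpoint = await provider();
    // Treat an empty answer like a failure so the caller falls back.
    return endpoint ? endpoint : null;
  } catch {
    // Any error (network, timeout, bad response) degrades silently to local.
    return null;
  }
}
```

Returning `null` rather than rethrowing keeps the caller's logic to a single check: a non-null value means connect remotely, anything else means launch locally. The trade-off is that failures are invisible unless they are logged or counted inside the `catch` branch.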
Core Insights
- Value of "Glue Code": Connecting two complex systems with just ~15 lines of code (src/lib/remote-browser.ts). Good architecture often yields maximum value with minimal code.
- Configuration Consistency: Maintaining consistency in environment variable naming (like BROWSERLESS_TOKEN) is more important than introducing new concepts, reducing cognitive load for users.
- OpenSpec Workflow: Even for small changes, following the "Proposal -> Code -> Archive" flow ensures documentation and decisions are recorded, avoiding "phantom code".
Reusable Patterns
Pattern: Look-aside Gateway
Query the gateway before creating local resources:
- Query: Ask gateway "Are resources available?"
- Hit: Use resources returned by gateway (Remote)
- Miss/Fail: Use local resources (Local Fallback)
This pattern places low availability requirements on the gateway: because every miss or failure falls back to local resources, the overall service stays available even when the gateway is down.
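The Query / Hit / Miss-or-Fail steps can be sketched generically. This is an illustration under assumptions, not the code in src/lib/remote-browser.ts: `acquireLookAside`, `queryGateway`, and `createLocal` are hypothetical names, and the resource type is left generic.

```typescript
type Source = "remote" | "local";

interface Acquired<T> {
  resource: T;
  source: Source;
}

// queryGateway and createLocal are hypothetical callbacks supplied by the caller.
async function acquireLookAside<T>(
  queryGateway: () => Promise<T | null>,
  createLocal: () => Promise<T>,
): Promise<Acquired<T>> {
  try {
    const remote = await queryGateway(); // Query
    if (remote !== null) {
      return { resource: remote, source: "remote" }; // Hit
    }
  } catch {
    // Fail: handled the same way as a miss.
  }
  return { resource: await createLocal(), source: "local" }; // Miss or fail
}
```

With Puppeteer, `queryGateway` would plausibly wrap `puppeteer.connect({ browserWSEndpoint })` and `createLocal` would wrap `puppeteer.launch()`. Errors from `createLocal` are deliberately not caught, since there is no further fallback behind the local path.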
Future Actions
- Monitor gateway connection success rate and latency
- Consider automatic reconnection mechanisms on gateway disconnects
- Update deployment documentation explaining gateway configuration
Resource Isolation Architecture Evolution (Distributed Crawling)
On a resource-constrained host (2C2G: 2 CPU cores, 2 GB RAM), crawling tasks (memory-heavy, bursty) compete with the main business logic (lightweight, continuous), leading to out-of-memory (OOM) failures or service instability.
To solve this, we explored two different evolution paths:
- Remote Gateway Integration: Leveraging the existing EdgeOne browser gateway (maintained by another team).
- Distributed Worker Construction: Utilizing free compute power from CNB CI/CD platform (self-built architecture).
Path Comparison
Path A: Remote Browser Gateway (Implemented)
Outsourcing computation via minimalist integration with external services.
- Pattern: Look-aside Gateway
- Cost: Dependent on external team
- Complexity: Low (client-side integration)
Path B: CNB Distributed Worker (Exploratory)
Leveraging free CI compute, overcoming NAT and cold start limitations via architectural adaptation.
- Pattern: Reverse Tunnel & Time Relay
- Cost: $0 (exchanged for architectural complexity)
- Complexity: High (reverse connection, state management)
Core Insight: The essence of the requirement is often not "distribution" but "resource isolation". The gateway solution isolates via "outsourcing", while the worker solution isolates via "physical separation".