Supacrawler vs Selenium: Local Python Performance Benchmarks
We benchmarked Selenium WebDriver against Supacrawler for JavaScript-heavy web scraping using identical retry and error handling logic on a Mac M4 with 24GB RAM.
See benchmark code: Selenium vs Supacrawler benchmark.
Identical Retry Logic with JavaScript Rendering
This is a critical fairness test because both systems render JavaScript - this is what Selenium was designed for. We used `render_js=True` for Supacrawler to ensure an apples-to-apples comparison.
Supacrawler Service (`internal/core/scrape/service.go`):

```go
maxRetries := 3
for attempt := 0; attempt < maxRetries; attempt++ {
	if attempt > 0 {
		d := time.Duration(1<<(attempt-1)) * time.Second // 1s, 2s, 4s
		time.Sleep(d)
	}
	// JavaScript rendering with Playwright backend
	result, err = s.scrapeWithPlaywright(params.Url, includeHTML, format)
}
```
Selenium Benchmark (test notebook):

```python
while attempt < max_retries and not success:
    try:
        driver.get(url)
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.TAG_NAME, "body"))
        )
        time.sleep(2)  # let JS render
        # ... scraping logic
    except WebDriverException as e:
        attempt += 1
        if attempt < max_retries:
            backoff = 2 ** (attempt - 1)  # 1s, 2s, 4s
            time.sleep(backoff)
```
Critical Setup Details:
- JavaScript Rendering: Both use full browser automation (`render_js=True` for Supacrawler)
- Browser Management: Selenium manages local Chrome; Supacrawler uses cloud infrastructure
- Retry Logic: Both use 3 retries with exponential backoff (1s→2s→4s)
- Timeouts: Both use 10-second timeouts plus an additional 2s wait for JavaScript execution
- Error Classification: Both only retry on WebDriverException/retryable errors
- Environment: Mac M4, 24GB RAM, Chrome headless mode
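The shared retry policy can be distilled into a small, self-contained Python sketch. This is an illustration of the logic both benchmarks use, not code from either codebase; `fetch_with_retries` and its parameters are placeholder names, and `ConnectionError` stands in for retryable errors like `WebDriverException`:

```python
import time

def backoff_schedule(max_retries):
    """Delays slept before retry attempts 1..max_retries-1: 1s, 2s, 4s, ..."""
    return [2 ** (a - 1) for a in range(1, max_retries)]

def fetch_with_retries(fetch, url, max_retries=3,
                       retryable=(ConnectionError,), sleep=time.sleep):
    """Call fetch(url), retrying only on retryable errors with exponential backoff."""
    for attempt in range(max_retries):
        if attempt > 0:
            sleep(2 ** (attempt - 1))  # exponential backoff: 1s, 2s, 4s, ...
        try:
            return fetch(url)
        except retryable:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
```

The injectable `sleep` makes the backoff behavior testable without actually waiting; both benchmarks hardcode `time.sleep`.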
Why Supacrawler Outperforms Selenium
Architecture Advantage: Supacrawler uses a Go-based streaming worker pool architecture vs Selenium's Python sequential processing:
Supacrawler's Go Streaming Architecture (`internal/core/crawl/service.go`):

```go
// Worker pool with concurrent processing
maxWorkers := 10 // 20 for non-JS, 2 for JS rendering
if renderJs {
	maxWorkers = 2 // Optimized for JavaScript workloads
}
// Concurrent workers process URLs in parallel
for i := 0; i < maxWorkers; i++ {
	wg.Add(1)
	go worker(i + 1) // Go goroutines for concurrency
}
// Stream results as they complete
worker := func(id int) {
	for u := range linksCh {
		res, err := s.scrapeWithFreshOption(ctx, u, includeHTML, renderJs, fresh)
		// Process immediately, no waiting
	}
}
```
Selenium's Python Sequential Processing:

```python
# Sequential processing - one URL at a time
for url in urls:
    driver.get(url)  # Block until complete
    # Extract data
    # Move to next URL (sequential bottleneck)
```
Key Technical Differences:
- Concurrency: Go goroutines vs Python sequential execution
- Browser Management: Managed cloud infrastructure vs local browser overhead
- Memory Efficiency: Go's efficient memory model vs Python/Selenium memory usage
- Network Optimization: Supacrawler's optimized network stack vs WebDriver protocol overhead
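The effect of a worker pool can be demonstrated in Python alone, without Go. The sketch below simulates network-bound fetches with `time.sleep` (a stand-in, not real scraping): the sequential loop pays every wait in full, while the pool overlaps them - the same principle behind Supacrawler's goroutine workers:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_scrape(url, latency=0.05):
    """Simulated network-bound fetch: sleeps instead of doing real I/O."""
    time.sleep(latency)
    return f"content of {url}"

urls = [f"https://example.com/page/{i}" for i in range(10)]

# Sequential: total time ~ n * latency
start = time.perf_counter()
sequential = [fake_scrape(u) for u in urls]
seq_elapsed = time.perf_counter() - start

# Worker pool: total time ~ n * latency / workers
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    pooled = list(pool.map(fake_scrape, urls))
pool_elapsed = time.perf_counter() - start
```

Threads suffice here because the work is I/O-bound; Python's GIL only serializes CPU-bound work, which is why Selenium scripts can also be parallelized this way - at the cost of one browser per worker.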
Benchmark Results
Single Page Scrape (https://supabase.com) - JavaScript Rendering:
| Tool | Time | Browser Management | Resource Usage |
|---|---|---|---|
| Selenium | 4.08s | Local Chrome | High CPU/Memory |
| Supacrawler | 1.37s | Cloud managed | Zero local |
Supacrawler is 3.0x faster despite doing the same JavaScript rendering.
Multi-Page Crawling Performance:
| Test Type | Supacrawler | Selenium | Performance Gain |
|---|---|---|---|
| Single Page | 1.37s | 4.08s | 3.0x faster |
| 10 Pages | 2.05s (0.20s/page) | 42.11s (4.21s/page) | 20.6x faster |
| 50 Pages (avg) | 0.77s/page | 6.02s/page | 7.8x faster |
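The speedup column follows directly from the timings; recomputing the ratios from the table makes the claim checkable (rounded to one decimal, so a figure may differ from the table by a decimal place):

```python
# (Selenium seconds, Supacrawler seconds), taken from the table above
timings = {
    "single page": (4.08, 1.37),
    "10 pages (total)": (42.11, 2.05),
    "50 pages (s/page)": (6.02, 0.77),
}

# Speedup is simply Selenium time divided by Supacrawler time
speedups = {name: sel / supa for name, (sel, supa) in timings.items()}
for name, s in speedups.items():
    print(f"{name}: {s:.1f}x faster")
```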
Large-Scale Testing (50 pages per site):

| Website | Selenium | Supacrawler | Performance Gain |
|---|---|---|---|
| supabase.com | 4.83s/page | 0.73s/page | 6.7x faster |
| docs.python.org | 4.13s/page | 0.83s/page | 5.0x faster |
| ai.google.dev | 9.11s/page | 0.74s/page | 12.2x faster |
The Content Quality Trade-off
Selenium Raw Output:

```
Supabase | The Postgres Development Platform. Product DevelopersSolutions Pricing Docs Blog 88.3K Sign in Start your project...
```
Supacrawler LLM-Ready Output:

```markdown
# Build in a weekend, Scale to millions

Supabase is the Postgres development platform. Start your project with a Postgres database, Authentication, instant APIs, Edge Functions, Realtime subscriptions...
```
Supacrawler automatically removes navigation, ads, and boilerplate content while preserving structured content in clean markdown format - all while being faster than Selenium.
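Boilerplate stripping of this kind can be approximated in a few lines of stdlib Python. This is a toy illustration of the idea - dropping text inside navigation, footer, and script containers - not Supacrawler's actual cleaning pipeline:

```python
from html.parser import HTMLParser

# Containers whose text is treated as boilerplate in this toy example
BOILERPLATE_TAGS = {"nav", "header", "footer", "aside", "script", "style"}

class ContentExtractor(HTMLParser):
    """Collects text that is outside any boilerplate container."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # >0 while inside a boilerplate tag
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def extract_content(html):
    parser = ContentExtractor()
    parser.feed(html)
    return " ".join(parser.parts)
```

A real pipeline also has to score ambiguous blocks (sidebars, cookie banners, related-article widgets) rather than rely on tag names alone, which is where most of the engineering effort goes.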
When to Choose Each Tool
Choose Selenium when:
- You need full browser automation (clicking, form filling)
- You're building automated testing suites
- You need complete control over browser interactions
- You're comfortable managing browser infrastructure
Choose Supacrawler when:
- You need high-performance web scraping
- You want LLM-ready, clean markdown output
- You're building production data extraction pipelines
- You want zero infrastructure management
- You need reliable, scalable scraping at volume
The performance difference comes from Supacrawler's purpose-built architecture: Go's concurrency model, streaming worker pools, and managed cloud infrastructure vs Selenium's Python sequential processing and local browser overhead.
See more benchmarks: Supacrawler vs Playwright | Supacrawler vs BeautifulSoup