
Supacrawler vs Selenium: Local Python Performance Benchmarks

We benchmarked Selenium WebDriver against Supacrawler for JavaScript-heavy web scraping using identical retry and error handling logic on a Mac M4 with 24GB RAM.

See benchmark code: Selenium vs Supacrawler benchmark.

Identical Retry Logic with JavaScript Rendering

This is a critical fairness test: both systems render JavaScript, which is exactly what Selenium was designed for. We used render_js=True for Supacrawler to ensure an apples-to-apples comparison.

Supacrawler Service (internal/core/scrape/service.go):

maxRetries := 3
for attempt := 0; attempt < maxRetries; attempt++ {
    if attempt > 0 {
        d := time.Duration(1<<(attempt-1)) * time.Second // 1s, 2s, 4s
        time.Sleep(d)
    }
    // JavaScript rendering with Playwright backend
    result, err = s.scrapeWithPlaywright(params.Url, includeHTML, format)
    if err == nil {
        break // success, stop retrying
    }
}

Selenium Benchmark (test notebook):

while attempt < max_retries and not success:
    try:
        driver.get(url)
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.TAG_NAME, "body"))
        )
        time.sleep(2)  # let JS render
        # ... scraping logic
        success = True
    except WebDriverException as e:
        attempt += 1
        if attempt < max_retries:
            backoff = 2 ** (attempt - 1)  # 1s, 2s, 4s
            time.sleep(backoff)
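The shared retry policy can be factored into a small helper. The sketch below is a generic version of the backoff logic both benchmarks follow; the function name, signature, and base_delay parameter are our own, not from either codebase:

```python
import time

def with_retries(fn, max_retries=3, base_delay=1.0, retryable=(Exception,)):
    """Run fn(), retrying with exponential backoff (1s, 2s before retries)."""
    for attempt in range(max_retries):
        if attempt > 0:
            # Sleep 1s before the 2nd attempt, 2s before the 3rd
            time.sleep(base_delay * (2 ** (attempt - 1)))
        try:
            return fn()
        except retryable:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
```

The same helper wraps either a Selenium `driver.get` call or a Supacrawler request, which is what keeps the comparison fair.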

Critical Setup Details:

  • JavaScript Rendering: Both use full browser automation (render_js=True for Supacrawler)
  • Browser Management: Selenium manages local Chrome, Supacrawler uses cloud infrastructure
  • Retry Logic: Both use 3 retries with exponential backoff (1s→2s→4s)
  • Timeouts: Both use 10-second timeouts with additional 2s for JavaScript execution
  • Error Classification: Both only retry on WebDriverException/retryable errors
  • Environment: Mac M4, 24GB RAM, Chrome headless mode
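To keep timings comparable across tools, each scrape call can be wrapped with a monotonic clock. This harness is our own sketch (not the benchmark notebook itself), shown with a stand-in scrape function:

```python
import time

def timed(fn, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single scrape call."""
    start = time.perf_counter()  # monotonic, unaffected by system clock changes
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Example with a stand-in scrape function
result, elapsed = timed(lambda: "page-content")
```

Using `time.perf_counter()` rather than `time.time()` avoids skew from clock adjustments during long crawl runs.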

Why Supacrawler Outperforms Selenium

Architecture Advantage: Supacrawler uses a Go-based streaming worker-pool architecture, while a typical Python Selenium script processes URLs sequentially:

Supacrawler's Go Streaming Architecture (internal/core/crawl/service.go):

// Worker pool with concurrent processing
maxWorkers := 10 // 20 for non-JS, 2 for JS rendering
if renderJs {
    maxWorkers = 2 // optimized for JavaScript workloads
}
// Each worker streams results as it completes them
worker := func(id int) {
    defer wg.Done()
    for u := range linksCh {
        res, err := s.scrapeWithFreshOption(ctx, u, includeHTML, renderJs, fresh)
        // Process res/err immediately, no waiting for the batch
    }
}
// Concurrent workers process URLs in parallel
for i := 0; i < maxWorkers; i++ {
    wg.Add(1)
    go worker(i + 1) // Go goroutines for concurrency
}

Selenium's Python Sequential Processing:

# Sequential processing - one URL at a time
for url in urls:
    driver.get(url)  # block until the page load completes
    # Extract data
    # Move to next URL (sequential bottleneck)

Key Technical Differences:

  1. Concurrency: Go goroutines vs Python sequential execution
  2. Browser Management: Managed cloud infrastructure vs local browser overhead
  3. Memory Efficiency: Go's efficient memory model vs Python/Selenium memory usage
  4. Network Optimization: Supacrawler's optimized network stack vs WebDriver protocol overhead
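For comparison, the Go worker-pool pattern above has a rough Python analogue using a thread pool. The sketch below (with a stub scrape function standing in for real browser work) streams results as they complete, which the sequential Selenium benchmark does not do:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def scrape(url):
    # Stand-in for a real scrape call (e.g. one Selenium driver per worker)
    return f"content-of-{url}"

def crawl(urls, render_js=True):
    # Mirror the Go pool sizing: fewer workers for heavier JS rendering
    max_workers = 2 if render_js else 10
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(scrape, u): u for u in urls}
        for fut in as_completed(futures):  # stream results as they finish
            results[futures[fut]] = fut.result()
    return results

pages = crawl(["https://example.com/a", "https://example.com/b"])
```

Even with a pool like this, each Python worker still pays the cost of a full local Chrome instance plus WebDriver protocol round-trips, which is why threading alone does not close the gap.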

Benchmark Results

Single Page Scrape (https://supabase.com) - JavaScript Rendering:

| Tool | Time | Browser Management | Resource Usage |
| --- | --- | --- | --- |
| Selenium | 4.08s | Local Chrome | High CPU/memory |
| Supacrawler | 1.37s | Cloud managed | Zero local |

Supacrawler is 3.0x faster despite doing the same JavaScript rendering.

Multi-Page Crawling Performance:

| Test Type | Supacrawler | Selenium | Performance Gain |
| --- | --- | --- | --- |
| Single Page | 1.37s | 4.08s | 3.0x faster |
| 10 Pages | 2.05s (0.20s/page) | 42.11s (4.21s/page) | 20.6x faster |
| 50 Pages (avg) | 0.77s/page | 6.02s/page | 7.8x faster |

Large-Scale Testing (50 pages each site):

| Website | Selenium | Supacrawler | Performance Gain |
| --- | --- | --- | --- |
| supabase.com | 4.83s/page | 0.73s/page | 6.7x faster |
| docs.python.org | 4.13s/page | 0.83s/page | 5.0x faster |
| ai.google.dev | 9.11s/page | 0.74s/page | 12.2x faster |

The Content Quality Trade-off

Selenium Raw Output:

Supabase | The Postgres Development Platform. Product Developers
Solutions Pricing Docs Blog 88.3K Sign in Start your project...

Supacrawler LLM-Ready Output:

# Build in a weekend, Scale to millions
Supabase is the Postgres development platform.
Start your project with a Postgres database, Authentication,
instant APIs, Edge Functions, Realtime subscriptions...

Supacrawler automatically removes navigation, ads, and boilerplate content while preserving structured content in clean markdown format - all while being faster than Selenium.

When to Choose Each Tool

Choose Selenium when:

  • You need full browser automation (clicking, form filling)
  • You're building automated testing suites
  • You need complete control over browser interactions
  • You're comfortable managing browser infrastructure

Choose Supacrawler when:

  • You need high-performance web scraping
  • You want LLM-ready, clean markdown output
  • You're building production data extraction pipelines
  • You want zero infrastructure management
  • You need reliable, scalable scraping at volume

The performance difference comes from Supacrawler's purpose-built architecture: Go's concurrency model, streaming worker pools, and managed cloud infrastructure vs Selenium's Python sequential processing and local browser overhead.


See more benchmarks: Supacrawler vs Playwright | Supacrawler vs BeautifulSoup

By Supacrawler Team
Published on September 12, 2025