Supacrawler vs Selenium: Local Python Performance Benchmarks
We benchmarked Selenium WebDriver against Supacrawler for JavaScript-heavy web scraping using identical retry and error handling logic on a Mac M4 with 24GB RAM.
See benchmark code: Selenium vs Supacrawler benchmark.
Identical Retry Logic with JavaScript Rendering
This is a critical fairness test because both systems render JavaScript - this is what Selenium was designed for. We used `render_js=True` for Supacrawler to ensure an apples-to-apples comparison.
Supacrawler Service (`internal/core/scrape/service.go`):

```go
maxRetries := 3
for attempt := 0; attempt < maxRetries; attempt++ {
	if attempt > 0 {
		d := time.Duration(1<<(attempt-1)) * time.Second // 1s, 2s, 4s
		time.Sleep(d)
	}
	// JavaScript rendering with Playwright backend
	result, err = s.scrapeWithPlaywright(params.Url, includeHTML, format)
}
```
Selenium Benchmark (test notebook):

```python
while attempt < max_retries and not success:
    try:
        driver.get(url)
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.TAG_NAME, "body"))
        )
        time.sleep(2)  # let JS render
        # ... scraping logic
    except WebDriverException as e:
        attempt += 1
        if attempt < max_retries:
            backoff = 2 ** (attempt - 1)  # 1s, 2s, 4s
            time.sleep(backoff)
```
Critical Setup Details:
- JavaScript Rendering: Both use full browser automation (`render_js=True` for Supacrawler)
- Browser Management: Selenium manages local Chrome; Supacrawler uses cloud infrastructure
- Retry Logic: Both use 3 retries with exponential backoff (1s→2s→4s)
- Timeouts: Both use 10-second timeouts plus an additional 2s wait for JavaScript execution
- Error Classification: Both only retry on WebDriverException/retryable errors
- Environment: Mac M4, 24GB RAM, Chrome headless mode
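The shared retry policy can be distilled into a small, self-contained Python sketch. This is an illustration of the logic both benchmarks use, not code from either codebase; `fetch_with_retries` and its parameters are placeholder names, and `ConnectionError` stands in for retryable errors like `WebDriverException`:

```python
import time

def backoff_schedule(max_retries):
    """Delays slept before retry attempts 1..max_retries-1: 1s, 2s, 4s, ..."""
    return [2 ** (a - 1) for a in range(1, max_retries)]

def fetch_with_retries(fetch, url, max_retries=3,
                       retryable=(ConnectionError,), sleep=time.sleep):
    """Call fetch(url), retrying only on retryable errors with exponential backoff."""
    for attempt in range(max_retries):
        if attempt > 0:
            sleep(2 ** (attempt - 1))  # exponential backoff: 1s, 2s, 4s, ...
        try:
            return fetch(url)
        except retryable:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
```

The injectable `sleep` makes the backoff behavior testable without actually waiting; both benchmarks hardcode `time.sleep`.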
Why Supacrawler Outperforms Selenium
Architecture Advantage: Supacrawler uses a Go-based streaming worker pool architecture vs Selenium's Python sequential processing:
Supacrawler's Go Streaming Architecture (`internal/core/crawl/service.go`):

```go
// Worker pool with concurrent processing
maxWorkers := 10 // 20 for non-JS, 2 for JS rendering
if renderJs {
	maxWorkers = 2 // Optimized for JavaScript workloads
}
// Concurrent workers process URLs in parallel
for i := 0; i < maxWorkers; i++ {
	wg.Add(1)
	go worker(i + 1) // Go goroutines for concurrency
}
// Stream results as they complete
worker := func(id int) {
	for u := range linksCh {
		res, err := s.scrapeWithFreshOption(ctx, u, includeHTML, renderJs, fresh)
		// Process immediately, no waiting
	}
}
```
Selenium's Python Sequential Processing:

```python
# Sequential processing - one URL at a time
for url in urls:
    driver.get(url)  # Block until complete
    # Extract data
    # Move to next URL (sequential bottleneck)
```
Key Technical Differences:
- Concurrency: Go goroutines vs Python sequential execution
- Browser Management: Managed cloud infrastructure vs local browser overhead
- Memory Efficiency: Go's efficient memory model vs Python/Selenium memory usage
- Network Optimization: Supacrawler's optimized network stack vs WebDriver protocol overhead
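The effect of a worker pool can be demonstrated in Python alone, without Go. The sketch below simulates network-bound fetches with `time.sleep` (a stand-in, not real scraping): the sequential loop pays every wait in full, while the pool overlaps them - the same principle behind Supacrawler's goroutine workers:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_scrape(url, latency=0.05):
    """Simulated network-bound fetch: sleeps instead of doing real I/O."""
    time.sleep(latency)
    return f"content of {url}"

urls = [f"https://example.com/page/{i}" for i in range(10)]

# Sequential: total time ~ n * latency
start = time.perf_counter()
sequential = [fake_scrape(u) for u in urls]
seq_elapsed = time.perf_counter() - start

# Worker pool: total time ~ n * latency / workers
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    pooled = list(pool.map(fake_scrape, urls))
pool_elapsed = time.perf_counter() - start
```

Threads suffice here because the work is I/O-bound; Python's GIL only serializes CPU-bound work, which is why Selenium scripts can also be parallelized this way - at the cost of one browser per worker.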
Benchmark Results
Single Page Scrape (https://supabase.com) - JavaScript Rendering:
| Tool | Time | Browser Management | Resource Usage |
|---|---|---|---|
| Selenium | 4.08s | Local Chrome | High CPU/Memory |
| Supacrawler | 1.37s | Cloud managed | Zero local |
Supacrawler is 3.0x faster despite doing the same JavaScript rendering.
Multi-Page Crawling Performance:
| Test Type | Supacrawler | Selenium | Performance Gain |
|---|---|---|---|
| Single Page | 1.37s | 4.08s | 3.0x faster |
| 10 Pages | 2.05s (0.20s/page) | 42.11s (4.21s/page) | 20.6x faster |
| 50 Pages (avg) | 0.77s/page | 6.02s/page | 7.8x faster |
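The speedup column follows directly from the timings; recomputing the ratios from the table makes the claim checkable (rounded to one decimal, so a figure may differ from the table by a decimal place):

```python
# (Selenium seconds, Supacrawler seconds), taken from the table above
timings = {
    "single page": (4.08, 1.37),
    "10 pages (total)": (42.11, 2.05),
    "50 pages (s/page)": (6.02, 0.77),
}

# Speedup is simply Selenium time divided by Supacrawler time
speedups = {name: sel / supa for name, (sel, supa) in timings.items()}
for name, s in speedups.items():
    print(f"{name}: {s:.1f}x faster")
```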
Large-Scale Testing (50 pages per site):

| Website | Selenium | Supacrawler | Performance Gain |
|---|---|---|---|
| supabase.com | 4.83s/page | 0.73s/page | 6.7x faster |
| docs.python.org | 4.13s/page | 0.83s/page | 5.0x faster |
| ai.google.dev | 9.11s/page | 0.74s/page | 12.2x faster |
The Content Quality Trade-off
Selenium Raw Output:

```
Supabase | The Postgres Development Platform. Product DevelopersSolutions Pricing Docs Blog 88.3K Sign in Start your project...
```
Supacrawler LLM-Ready Output:

```markdown
# Build in a weekend, Scale to millions

Supabase is the Postgres development platform. Start your project with a Postgres database, Authentication, instant APIs, Edge Functions, Realtime subscriptions...
```
Supacrawler automatically removes navigation, ads, and boilerplate content while preserving structured content in clean markdown format - all while being faster than Selenium.
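Boilerplate stripping of this kind can be approximated in a few lines of stdlib Python. This is a toy illustration of the idea - dropping text inside navigation, footer, and script containers - not Supacrawler's actual cleaning pipeline:

```python
from html.parser import HTMLParser

# Containers whose text is treated as boilerplate in this toy example
BOILERPLATE_TAGS = {"nav", "header", "footer", "aside", "script", "style"}

class ContentExtractor(HTMLParser):
    """Collects text that is outside any boilerplate container."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # >0 while inside a boilerplate tag
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def extract_content(html):
    parser = ContentExtractor()
    parser.feed(html)
    return " ".join(parser.parts)
```

A real pipeline also has to score ambiguous blocks (sidebars, cookie banners, related-article widgets) rather than rely on tag names alone, which is where most of the engineering effort goes.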
When to Choose Each Tool
Choose Selenium when:
- You need full browser automation (clicking, form filling)
- You're building automated testing suites
- You need complete control over browser interactions
- You're comfortable managing browser infrastructure
Choose Supacrawler when:
- You need high-performance web scraping
- You want LLM-ready, clean markdown output
- You're building production data extraction pipelines
- You want zero infrastructure management
- You need reliable, scalable scraping at volume
The performance difference comes from Supacrawler's purpose-built architecture: Go's concurrency model, streaming worker pools, and managed cloud infrastructure vs Selenium's Python sequential processing and local browser overhead.
See more benchmarks: Supacrawler vs Playwright | Supacrawler vs BeautifulSoup