# Rendering JavaScript, Handling Redirects, and Mapping Links
Some sites render most of their content in the browser, or sit behind redirects that break a simple HTTP fetch. In those cases, a plain `curl` will not return the final content. This post shows two essential techniques with `v1/scrape`:
- Enable JavaScript rendering to fetch the real, user-visible content.
- Extract and map links from a page for simple site discovery.
## When `curl` is not enough

The Gemini API docs are a good example. A direct `curl` to the page will bounce through Google auth redirects and fail. With Supacrawler, set `render_js=true` and a real browser fetches and renders the page before extracting content.
**Scrape a JS-rendered page (Gemini docs)**

```bash
curl -G https://api.supacrawler.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d url="https://ai.google.dev/gemini-api/docs" \
  -d format="markdown" \
  -d render_js=true
```
**Real output (truncated)**

```markdown
# Gemini Developer API

[Get a Gemini API Key](https://aistudio.google.com/apikey)

Get a Gemini API key and make your first API request in minutes.

### Python

from google import genai

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain how AI works in a few words",
)
print(response.text)
...
Title: Gemini API | Google AI for Developers
Status: 200
Description: Gemini Developer API Docs and API Reference
```
This matches what a user sees in the browser because the page was rendered before extraction.
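If you prefer Python to the command line, the same request can be sketched with only the standard library. This mirrors the parameters of the curl call above; the URL is built but not sent, so you can inspect it first (`YOUR_API_KEY` remains a placeholder):

```python
from urllib.parse import urlencode
import urllib.request

# Build the same request as the curl example above, using only the
# standard library. YOUR_API_KEY is a placeholder.
BASE = "https://api.supacrawler.com/api/v1/scrape"
params = {
    "url": "https://ai.google.dev/gemini-api/docs",
    "format": "markdown",
    "render_js": "true",
}
request_url = f"{BASE}?{urlencode(params)}"

req = urllib.request.Request(
    request_url,
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
# To actually send it:
# with urllib.request.urlopen(req) as resp:
#     markdown = resp.read().decode()
```

Preparing the URL separately also makes it easy to log or cache requests before they go out.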
## Extracting all links from a page
Use the same endpoint with `format=links` to collect URLs for lightweight discovery or to seed a crawl. You can control `depth` and `max_links` to keep results bounded.
**Get links from a site**

```bash
curl -G https://api.supacrawler.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d url="https://supacrawler.com" \
  -d format="links" \
  -d depth=2 \
  -d max_links=10
```
**Real output**

```json
[
  "https://supacrawler.com/pricing",
  "https://supacrawler.com/about",
  "https://supacrawler.com/contact",
  "https://supacrawler.com/terms-of-service",
  "https://supacrawler.com/work",
  "https://supacrawler.com",
  "https://supacrawler.com/dashboard/scrape",
  "https://supacrawler.com/signin",
  "https://supacrawler.com/privacy-policy",
  "https://supacrawler.com/blog/your-first-web-scrape"
]
```
## Takeaways

- Use `render_js=true` for SPA/redirect-heavy pages.
- Use `format=links` for quick discovery and to seed crawls.
- Combine both to first capture a page, then explore related content safely.
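Putting the two techniques together might look like the sketch below: capture a page as markdown, then collect its links to decide what to explore next. The `scrape` callable is an illustrative stand-in for whatever client wraps the `v1/scrape` endpoint; a stub is used here so the flow runs without network access:

```python
def capture_then_explore(url, scrape, max_links=10):
    """Capture a page as markdown, then collect links to visit next.

    `scrape` is any callable (url, **params) -> API response; in practice
    it would wrap the v1/scrape endpoint shown above (illustrative only).
    """
    page = scrape(url, format="markdown", render_js="true")
    links = scrape(url, format="links", max_links=max_links)
    return page, links

# Stub standing in for a real API client, so the flow is runnable offline.
def fake_scrape(url, **params):
    if params.get("format") == "links":
        return ["https://example.com/a", "https://example.com/b"]
    return "# Example page"

page, frontier = capture_then_explore("https://example.com", fake_scrape)
print(len(frontier), "links to explore")
```

Injecting the fetcher keeps the crawl logic testable and independent of any one HTTP client.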
Ready to try it yourself? Grab an API key and run these examples in minutes.