Rendering JavaScript, Handling Redirects, and Mapping Links

Some sites render most of their content in the browser, or sit behind redirects that break a simple HTTP fetch. In those cases, a plain curl will not return the final content. This post shows two essential techniques with the v1/scrape endpoint:

  1. Enable JavaScript rendering to fetch the real, user-visible content.
  2. Extract and map links from a page for simple site discovery.

When curl is not enough

The Gemini API docs are a good example. A direct curl to the page bounces through Google auth redirects and fails. With Supacrawler, set render_js=true and a real browser fetches and renders the page before the content is extracted.

Scrape a JS-rendered page (Gemini docs)

curl -G https://api.supacrawler.com/api/v1/scrape \
-H "Authorization: Bearer YOUR_API_KEY" \
-d url="https://ai.google.dev/gemini-api/docs" \
-d format="markdown" \
-d render_js=true

Real output (truncated)

# Gemini Developer API
[Get a Gemini API Key](https://aistudio.google.com/apikey)
Get a Gemini API key and make your first API request in minutes.
### Python
from google import genai
client = genai.Client()
response = client.models.generate_content(
model="gemini-2.5-flash",
contents="Explain how AI works in a few words",
)
print(response.text)
...
Title: Gemini API | Google AI for Developers
Status: 200
Description: Gemini Developer API Docs and API Reference

This matches what a user sees in the browser because the page was rendered before extraction.

Extracting all links from a page

Use the same endpoint with format=links to collect URLs for lightweight discovery or to seed a crawl. You can control depth and max_links to keep results bounded.

Get links from a site

curl -G https://api.supacrawler.com/api/v1/scrape \
-H "Authorization: Bearer YOUR_API_KEY" \
-d url="https://supacrawler.com" \
-d format="links" \
-d depth=2 \
-d max_links=10

Real output

[
"https://supacrawler.com/pricing",
"https://supacrawler.com/about",
"https://supacrawler.com/contact",
"https://supacrawler.com/terms-of-service",
"https://supacrawler.com/work",
"https://supacrawler.com",
"https://supacrawler.com/dashboard/scrape",
"https://supacrawler.com/signin",
"https://supacrawler.com/privacy-policy",
"https://supacrawler.com/blog/your-first-web-scrape"
]
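
You can also filter the array before acting on it. Here is a minimal sketch that keeps only the blog URLs; it assumes jq is installed and that the response is a plain JSON array as shown above (the exact filter prefix is just an illustration):

# Fetch links, then keep only entries under /blog (adjust the prefix to your needs)
curl -sG https://api.supacrawler.com/api/v1/scrape \
-H "Authorization: Bearer YOUR_API_KEY" \
-d url="https://supacrawler.com" \
-d format="links" \
-d depth=2 \
-d max_links=10 | jq -r '.[] | select(startswith("https://supacrawler.com/blog"))'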

Takeaways

  • Use render_js=true for SPA/redirect-heavy pages.
  • Use format=links for quick discovery and to seed crawls.
  • Combine both to first capture a page, then explore related content safely; a short sketch follows below.
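
A minimal sketch of that combined flow: collect links with format=links, then scrape each discovered page with JavaScript rendering enabled. It assumes jq is installed and that the links response is a plain JSON array as shown above; adapt the output handling to your own pipeline.

# Step 1: collect links, then loop over them and scrape each page with render_js
curl -sG https://api.supacrawler.com/api/v1/scrape \
-H "Authorization: Bearer YOUR_API_KEY" \
-d url="https://supacrawler.com" \
-d format="links" \
-d depth=2 \
-d max_links=10 | jq -r '.[]' |
while read -r link; do
  # Step 2: print a header, then the rendered markdown for each link
  echo "== $link =="
  curl -sG https://api.supacrawler.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d url="$link" \
  -d format="markdown" \
  -d render_js=true
done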

Ready to try it yourself? Grab an API key and run these examples in minutes.

By Supacrawler Team
Published on August 23, 2025