# Rendering JavaScript, Handling Redirects, and Mapping Links
Some sites render most of their content in the browser, or sit behind redirects that break a simple HTTP fetch. In those cases, a plain `curl` will not return the final content. This post shows two essential techniques with `v1/scrape`:
- Enable JavaScript rendering to fetch the real, user-visible content.
- Extract and map links from a page for simple site discovery.
## When `curl` is not enough

The Gemini API docs are a good example. A direct `curl` to the page will bounce through Google auth redirects and fail. With Supacrawler, set `render_js=true` and a real browser fetches and renders the page before extracting content.
**Scrape a JS-rendered page (Gemini docs)**

```bash
curl -G https://api.supacrawler.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d url="https://ai.google.dev/gemini-api/docs" \
  -d format="markdown" \
  -d render_js=true
```
**Real output (truncated)**

```markdown
# Gemini Developer API

[Get a Gemini API Key](https://aistudio.google.com/apikey)

Get a Gemini API key and make your first API request in minutes.

### Python

from google import genai

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain how AI works in a few words",
)
print(response.text)
...
Title: Gemini API | Google AI for Developers
Status: 200
Description: Gemini Developer API Docs and API Reference
```
This matches what a user sees in the browser because the page was rendered before extraction.
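If you prefer Python to the command line, the same request can be sketched with only the standard library. This mirrors the parameters of the curl call above; the URL is built but not sent, so you can inspect it first (`YOUR_API_KEY` remains a placeholder):

```python
from urllib.parse import urlencode
import urllib.request

# Build the same request as the curl example above, using only the
# standard library. YOUR_API_KEY is a placeholder.
BASE = "https://api.supacrawler.com/api/v1/scrape"
params = {
    "url": "https://ai.google.dev/gemini-api/docs",
    "format": "markdown",
    "render_js": "true",
}
request_url = f"{BASE}?{urlencode(params)}"

req = urllib.request.Request(
    request_url,
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
# To actually send it:
# with urllib.request.urlopen(req) as resp:
#     markdown = resp.read().decode()
```

Preparing the URL separately also makes it easy to log or cache requests before they go out.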
## Extracting all links from a page
Use the same endpoint with `format=links` to collect URLs for lightweight discovery or to seed a crawl. You can control `depth` and `max_links` to keep results bounded.
**Get links from a site**

```bash
curl -G https://api.supacrawler.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d url="https://supacrawler.com" \
  -d format="links" \
  -d depth=2 \
  -d max_links=10
```
**Real output**

```json
[
  "https://supacrawler.com/pricing",
  "https://supacrawler.com/about",
  "https://supacrawler.com/contact",
  "https://supacrawler.com/terms-of-service",
  "https://supacrawler.com/work",
  "https://supacrawler.com",
  "https://supacrawler.com/dashboard/scrape",
  "https://supacrawler.com/signin",
  "https://supacrawler.com/privacy-policy",
  "https://supacrawler.com/blog/your-first-web-scrape"
]
```
## Takeaways

- Use `render_js=true` for SPA/redirect-heavy pages.
- Use `format=links` for quick discovery and to seed crawls.
- Combine both to first capture a page, then explore related content safely.
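Putting the two techniques together might look like the sketch below: capture a page as markdown, then collect its links to decide what to explore next. The `scrape` callable is an illustrative stand-in for whatever client wraps the `v1/scrape` endpoint; a stub is used here so the flow runs without network access:

```python
def capture_then_explore(url, scrape, max_links=10):
    """Capture a page as markdown, then collect links to visit next.

    `scrape` is any callable (url, **params) -> API response; in practice
    it would wrap the v1/scrape endpoint shown above (illustrative only).
    """
    page = scrape(url, format="markdown", render_js="true")
    links = scrape(url, format="links", max_links=max_links)
    return page, links

# Stub standing in for a real API client, so the flow is runnable offline.
def fake_scrape(url, **params):
    if params.get("format") == "links":
        return ["https://example.com/a", "https://example.com/b"]
    return "# Example page"

page, frontier = capture_then_explore("https://example.com", fake_scrape)
print(len(frontier), "links to explore")
```

Injecting the fetcher keeps the crawl logic testable and independent of any one HTTP client.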
Ready to try it yourself? Grab an API key and run these examples in minutes.