Web Scraper and AI processor

Developed by Scraping Samurai · Maintained by Community · Pay-per-event pricing

An adaptive AI controller classifies page quality from fast HTTP fetches and triggers headless rendering only where it is needed, then converts the raw page text into structured JSON according to your natural-language extraction prompt. The Actor balances cost against accuracy with AI-guided escalation, retries, and heuristics for thin or blocked content.

Smart Web Scraper & Data Extractor

Extract structured data from any set of web pages with ease.
This Actor crawls your target URLs, handles blocking automatically, and uses an advanced AI-powered extraction engine to transform messy page text into clean, structured outputs such as JSON.


✨ Features

  • HTTP-first crawling → Pages are fetched with fast, lightweight HTTP requests whenever possible.
  • Automatic browser fallback → If a page blocks bots or requires JS rendering, the Actor switches to a full browser for reliable scraping (see the sketch after this list).
  • AI-powered text extraction → Provide your own natural language instruction (e.g., “Extract all emails and phone numbers as JSON”), and the Actor will return structured results.
  • Robust anti-blocking → Uses concurrency controls, proxy support, and session handling for maximum reliability.
  • Pay-per-event pricing → You pay only for the work done:
    • Run start
    • Each URL processed via HTTP
    • Each URL escalated to browser
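
The HTTP-first flow with browser fallback is handled internally by the Actor. Purely to illustrate the pattern, here is a minimal sketch of how such an escalation can be wired up with Crawlee's CheerioCrawler and PlaywrightCrawler; the threshold value and variable names are assumptions, not this Actor's source code.

import { CheerioCrawler, PlaywrightCrawler } from 'crawlee';

const MIN_CHARS = 2500; // assumed "thin content" threshold, for illustration only
const needsBrowser: string[] = [];
const results: { url: string; content: string }[] = [];

// Pass 1: fast, cheap HTTP fetches parsed with Cheerio.
const httpCrawler = new CheerioCrawler({
    async requestHandler({ request, $ }) {
        const text = $('body').text().trim();
        if (text.length < MIN_CHARS) {
            needsBrowser.push(request.url); // thin or blocked page, escalate later
        } else {
            results.push({ url: request.url, content: text });
        }
    },
});

// Pass 2: only escalated URLs pay the cost of a full Playwright browser.
const browserCrawler = new PlaywrightCrawler({
    async requestHandler({ request, page }) {
        const text = (await page.innerText('body')).trim();
        results.push({ url: request.url, content: text });
    },
});

await httpCrawler.run(['https://apify.com/', 'https://crawlee.dev/']);
if (needsBrowser.length > 0) {
    await browserCrawler.run(needsBrowser);
}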

🚀 Use Cases

  • Lead generation → Extract contact details (emails, phones, LinkedIn URLs).
  • E-commerce monitoring → Get product names, prices, SKUs, and stock statuses.
  • News & blogs → Collect article titles, authors, dates, and summaries.
  • SEO research → Extract H1s, meta descriptions, canonical URLs.
  • Custom reports → Pull out exactly what you need with a single instruction.

🛠️ Input Schema

{
  "urls": [
    "https://apify.com/",
    "https://crawlee.dev/"
  ],
  "extractionInstruction": "Extract the page title and the first H1 as JSON with keys: title, h1."
}

Fields:

  • urls (array, required) — List of page URLs to scrape.
  • extractionInstruction (string, required) — Describe what to extract in plain language.

Note: Advanced crawling options (concurrency, retries, proxy settings, etc.) are set internally and are not user-configurable.
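
Besides the Apify Console, runs can also be started programmatically. A minimal sketch with the apify-client package for Node.js follows; the Actor ID string is a placeholder, so copy the real one from this Actor's page.

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Placeholder Actor ID - replace it with the ID shown on this Actor's page.
const run = await client.actor('scraping-samurai/web-scraper-and-ai-processor').call({
    urls: ['https://apify.com/', 'https://crawlee.dev/'],
    extractionInstruction: 'Extract the page title and the first H1 as JSON with keys: title, h1.',
});

// Each processed URL becomes one record in the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);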


📊 Output Example

{
  "url": "https://crawlee.dev/",
  "content": "…extracted plain text from the page…",
  "aiAnswer": {
    "title": "Crawlee",
    "h1": "The web scraping and browser automation library for Node.js"
  },
  "status": "success"
}

Each record contains:

  • url — Source page
  • content — Extracted raw text
  • aiAnswer — Structured data matching your instruction
  • status — success, blocked, or error
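
Because every record carries a status field, downstream code can keep the successful extractions and collect blocked or failed URLs for another attempt. A small, hypothetical post-processing sketch (the ScrapeRecord shape simply mirrors the output example above):

interface ScrapeRecord {
    url: string;
    content: string;
    aiAnswer: Record<string, unknown> | null; // assumed to be empty for failed pages
    status: 'success' | 'blocked' | 'error';
}

function splitByStatus(items: ScrapeRecord[]) {
    const ok = items.filter((item) => item.status === 'success');
    const retry = items
        .filter((item) => item.status !== 'success')
        .map((item) => item.url);
    return { ok, retry };
}

// Usage: keep ok[i].aiAnswer for your pipeline and feed retry into a new run.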

💵 Pricing Model

This Actor uses a pay-per-event pricing system.
You only pay for what you actually use:

  • Run start (run-start) → A flat fee charged once at the beginning of each run.
  • URL (HTTP) start (url-http-start) → A fee charged for every URL processed with the fast HTTP crawler.
  • URL (Browser) start (url-browser-start) → A higher fee charged only if the Actor needs to escalate a URL to full browser mode (Playwright).

Why this model?

  • Fair → You don’t pay for unused capacity, only for actual work.
  • Predictable → Costs scale with the number of pages and whether they need browser fallback.
  • Efficient → Most pages succeed in fast HTTP mode, so you save money. Browser mode is used only when necessary.

Example

If you run the Actor with 100 URLs:

  • 100 × url-http-start
  • 20 × url-browser-start (if 20 of them needed browser)
  • 1 × run-start

👉 Total = cost of 121 events.
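
With the per-event prices from this Actor's pricing tab, the arithmetic above turns into a quick estimate. The unit prices below are placeholders, not the real rates:

// Placeholder unit prices in USD - check the Actor's pricing tab for the actual values.
const PRICE = { runStart: 0.02, urlHttp: 0.002, urlBrowser: 0.01 };

function estimateCostUsd(totalUrls: number, escalatedUrls: number): number {
    return (
        PRICE.runStart +                  // charged once per run
        totalUrls * PRICE.urlHttp +       // every URL is tried over HTTP first
        escalatedUrls * PRICE.urlBrowser  // only escalated URLs add the browser fee
    );
}

// Example from above: 100 URLs, 20 escalated -> 121 charged events.
console.log(estimateCostUsd(100, 20));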


🔒 Why Choose This Actor?

  • Built on the Apify platform with Crawlee under the hood.
  • Designed for scalability and reliability — from a few URLs to thousands.
  • No brittle CSS selectors — describe what you want in plain language.
  • Handles dynamic pages, blocking, and captchas with minimal setup.

💡 Pro Tips

  • Write precise extraction instructions → “Extract product name, price, and availability as JSON with keys: name, price, availability.” (See the example after these tips.)
  • Large-scale runs rely on the Actor’s built-in proxy support and session handling to avoid rate limits, so no proxy setup is needed on your side.
  • Thin or blocked pages are detected by an internal minimum-content threshold (minCharsThreshold) and automatically retried in browser mode.
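
To illustrate the first tip, a precise instruction names both the fields and the JSON keys it expects. The shape below is hypothetical; it only shows what such an instruction implies for aiAnswer:

// Hypothetical illustration of a precise instruction and the result shape it implies.
const extractionInstruction =
    'Extract product name, price, and availability as JSON with keys: name, price, availability.';

// The keys named in the instruction are the keys to expect in aiAnswer.
interface ProductAnswer {
    name: string;
    price: string;        // returned as text, e.g. "$19.99", unless the instruction asks for a number
    availability: string; // e.g. "in stock"
}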

📈 SEO Keywords

Web scraping, data extraction, structured data, AI extractor, JSON extraction, Apify actor, automatic browser fallback, anti-blocking crawler, scrape websites, intelligent scraper, text-to-JSON, scalable web scraping.


⚡ Get Started Now

  1. Add your URLs and extraction instruction.
  2. Run the Actor on Apify.
  3. Get clean, structured data — fast, reliable, and AI-enhanced.

Turn any website into structured data with one Actor run. Save hours of manual parsing and let the scraper + AI do the heavy lifting.