Pricing

$8.00 / 1,000 results
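
As a rough illustration of the per-result pricing, the sketch below estimates the cost of a run from the listed rate of $8.00 per 1,000 results. The function name `estimate_cost` is hypothetical, and how the Actor counts a "result" (for example, one per crawled page) is an assumption the listing does not confirm.

```python
from decimal import Decimal, ROUND_HALF_UP

# Listed rate: USD 8.00 per 1,000 results.
PRICE_PER_1000_RESULTS = Decimal("8.00")


def estimate_cost(num_results: int) -> Decimal:
    """Return the estimated USD cost for num_results, rounded to cents."""
    if num_results < 0:
        raise ValueError("num_results must be non-negative")
    cost = PRICE_PER_1000_RESULTS * num_results / 1000
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_cost(100))   # 0.80
print(estimate_cost(1000))  # 8.00
```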

AI Web Scraper - Powered by Crawl4AI

Developed by Raizen Technology

A blazing-fast AI web scraper powered by Crawl4AI. Perfect for LLMs, AI agents, AI automation, model training, sentiment analysis, and content generation. Supports deep crawling, multiple extraction strategies and flexible output (Markdown/JSON). Seamlessly integrates with Make.com, n8n, and Zapier.

1.0 (1)

Total users: 178
Monthly users:
Runs succeeded: >99%
Issues response: 1.1 hours
Last modified: 3 months ago

You can run AI Web Scraper - Powered by Crawl4AI programmatically from your own applications through the Apify API. To use the Apify API, you need an Apify account and your API token, found in the Integrations settings in Apify Console.
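
A minimal sketch of calling the Actor over HTTP with only the Python standard library is shown below. It assumes the Apify API v2 `run-sync-get-dataset-items` endpoint, the `username~actor-name` form of the Actor ID in the path, and Bearer-token authentication; the helper names `build_request` and `run_sync` are hypothetical, and the reduced input assumes the Actor applies defaults for fields that are left out.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
# Assumption: API paths take the Actor ID as "username~actor-name".
ACTOR_ID = "raizen~ai-web-scraper"


def build_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build a POST request that runs the Actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_sync(token: str, actor_input: dict, timeout: float = 300.0):
    """Send the request and parse the JSON response (performs a network call)."""
    request = build_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `run_sync(token, {"startUrls": [{"url": "https://example.com"}]})` with a real token would start a run and wait for its results, so it is probably best suited to short crawls. Keeping the token out of source code, for example in an environment variable, is a sensible default.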

$ echo '{
  "startUrls": [
    {
      "url": "https://www.cnbc.com/2025/03/12/googles-deepmind-says-it-will-use-ai-models-to-power-physical-robots.html"
    }
  ],
  "browserConfig": {
    "browser_type": "chromium",
    "headless": true,
    "verbose_logging": false,
    "ignore_https_errors": true,
    "user_agent": "random",
    "proxy": "",
    "viewport_width": 1280,
    "viewport_height": 720,
    "accept_downloads": false,
    "extra_headers": {}
  },
  "crawlerConfig": {
    "cache_mode": "BYPASS",
    "page_timeout": 20000,
    "simulate_user": true,
    "override_navigator": true,
    "magic": true,
    "remove_overlay_elements": true,
    "delay_before_return_html": 0.75,
    "wait_for": "",
    "screenshot": false,
    "pdf": false,
    "enable_rate_limiting": false,
    "memory_threshold_percent": 90,
    "word_count_threshold": 200,
    "css_selector": "",
    "excluded_tags": [],
    "excluded_selector": "",
    "only_text": false,
    "prettify": false,
    "keep_data_attributes": false,
    "remove_forms": false,
    "bypass_cache": false,
    "disable_cache": false,
    "no_cache_read": false,
    "no_cache_write": false,
    "wait_until": "domcontentloaded",
    "wait_for_images": false,
    "check_robots_txt": false,
    "mean_delay": 0.1,
    "max_range": 0.3,
    "js_code": "",
    "js_only": false,
    "ignore_body_visibility": true,
    "scan_full_page": false,
    "scroll_delay": 0.2,
    "process_iframes": false,
    "adjust_viewport_to_content": false,
    "screenshot_wait_for": 0,
    "screenshot_height_threshold": 20000,
    "image_description_min_word_threshold": 50,
    "image_score_threshold": 3,
    "exclude_external_images": false,
    "exclude_social_media_domains": [],
    "exclude_external_links": false,
    "exclude_social_media_links": false,
    "exclude_domains": [],
    "verbose": true,
    "log_console": false,
    "stream": false
  },
  "deepCrawlConfig": {
    "max_pages": 100,
    "max_depth": 3,
    "include_external": false,
    "score_threshold": 0.5,
    "filter_chain": [],
    "keywords": [
      "crawl",
      "example",
      "async",
      "configuration"
    ],
    "weight": 0.7
  },
  "markdownConfig": {
    "ignore_links": false,
    "ignore_images": false,
    "escape_html": true,
    "skip_internal_links": false,
    "include_sup_sub": false,
    "citations": false,
    "body_width": 80,
    "fit_markdown": false
  },
  "contentFilterConfig": {
    "type": "pruning",
    "user_query": "",
    "threshold": 0.45,
    "min_word_threshold": 5,
    "bm25_threshold": 1.2,
    "apply_llm_filter": false,
    "semantic_filter": "",
    "word_count_threshold": 10,
    "sim_threshold": 0.3,
    "max_dist": 0.2,
    "top_k": 3,
    "linkage_method": "ward"
  },
  "userAgentConfig": {
    "user_agent_mode": "random",
    "device_type": "desktop",
    "browser_type": "chrome",
    "num_browsers": 1
  },
  "llmConfig": {
    "provider": "groq/deepseek-r1-distill-llama-70b",
    "api_token": "",
    "instruction": "Summarize content in clean markdown.",
    "base_url": "",
    "chunk_token_threshold": 2048,
    "apply_chunking": true,
    "input_format": "markdown",
    "temperature": 0.7,
    "max_tokens": 4096
  },
  "extractionSchema": {
    "name": "Custom Extraction",
    "baseSelector": "div.article",
    "fields": [
      {
        "name": "title",
        "selector": "h1",
        "type": "text"
      },
      {
        "name": "link",
        "selector": "a",
        "type": "attribute",
        "attribute": "href"
      }
    ]
  }
}' |
apify call raizen/ai-web-scraper --silent --output-dataset
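
The same CLI invocation can be scripted. The sketch below builds a reduced input in Python and pipes it to `apify call` on standard input, mirroring the shell example. The helper names (`build_input`, `build_command`, `run_actor`) are hypothetical, and two things are assumptions rather than documented behavior of this Actor: that omitted input sections fall back to the Actor's defaults, and that `--output-dataset` prints the dataset as JSON on standard output.

```python
import json
import subprocess

# Actor name as used in the CLI example.
ACTOR = "raizen/ai-web-scraper"


def build_input(urls, max_pages=100, max_depth=3):
    """Build a reduced Actor input; omitted sections are assumed to use defaults."""
    return {
        "startUrls": [{"url": url} for url in urls],
        "deepCrawlConfig": {"max_pages": max_pages, "max_depth": max_depth},
    }


def build_command():
    """Return the CLI command with the flags from the shell example."""
    return ["apify", "call", ACTOR, "--silent", "--output-dataset"]


def run_actor(actor_input):
    """Pipe the JSON input to the Apify CLI and parse its output (assumed JSON)."""
    completed = subprocess.run(
        build_command(),
        input=json.dumps(actor_input),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI exits non-zero
    )
    return json.loads(completed.stdout)
```

`run_actor(build_input(["https://example.com"]))` requires the Apify CLI to be installed and logged in; the two builder functions can be used on their own to inspect the input and command before running anything.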

AI Web Scraper - Crawl4AI for LLMs, AI Agents & Automation API through CLI

The Apify CLI is the official tool that allows you to use AI Web Scraper - Powered by Crawl4AI locally, providing convenience functions and automatic retries on errors.

Install the Apify CLI

$ npm i -g apify-cli
$ apify login

Other API clients include:

AI Web Scraper - Powered by Crawl4AI API in Python

AI Web Scraper - Powered by Crawl4AI API in JavaScript

AI Web Scraper - Powered by Crawl4AI OpenAPI definition

AI Web Scraper - Powered by Crawl4AI API

RAG Web Browser

apify/rag-web-browser

Web browser for OpenAI Assistants, RAG pipelines, or AI agents, similar to a web browser in ChatGPT. It queries Google Search, scrapes the top N pages, and returns their content as Markdown for further processing by an LLM. It can also scrape individual URLs. Supports Model Context Protocol (MCP).

Apify

4.9K

4.4

🔥fireScraper AI Prompt Website Content Markdown Scraper

mohamedgb00714/fireScraper-AI-prompt-Website-Content-Markdown-Scraper

fireScrape AI is an advanced web scraper built with Crawlee and Puppeteer. It crawls websites, extracts meaningful content, converts it into Markdown, then runs your custom prompt on the extracted text. It is ideal for generating enriched datasets, summaries, or analyses for LLMs and AI pipelines.

mohamed el hadi msaid

5.0

Crawl4AI

janbuchar/crawl4ai

Wraps the Crawl4AI open-source library for retrieving text content from websites.

Jan Buchar

536

5.0

AI-Powered Web Content & Link Extractor

scrapercoder/ai-powered-web-content-link-extractor

Crawls websites to extract clean, structured content for AI/LLM use, ideal for training datasets, knowledge bases, and RAG systems. JSON output includes text (normalized page content) and links (extracted sub-URLs).

wallnut.ai

102

Universal AI GPT Scraper

louisdeconinck/ai-gpt-scraper

Transform any website into structured data with AI-powered extraction. This versatile tool combines advanced web scraping with intelligent content analysis to deliver clean, customized JSON output - perfect for automating data collection from any web source.

Louis Deconinck

5.0

🔥 FireScrape AI Website Content Markdown Scraper

mohamedgb00714/fireScraper-AI-Website-Content-Markdown-Scraper

Advanced web scraper powered by Crawlee and Puppeteer — extracts website content, converts it to Markdown, and structures it for LLM training datasets.

mohamed el hadi msaid

106

3.8

Ai Ready Web Page To Markdown Converter

mustafa.irshaid.113/ai-ready-web-page-to-markdown-converter

Convert any webpage into structured Markdown and HTML using just a URL. Get the page title, link, and content—perfect for SEO, devs, and AI crawlers. Fast, clean, and ideal for repurposing or analysis. Start turning websites into Markdown instantly.

Mustafa Irshaid

AI Website Content Markdown Scraper

quaking_pail/ai-website-content-markdown-scraper

This Apify Actor, "Website Content Crawler with Markdown Extraction," is designed to perform a comprehensive crawl of specified websites, extract their text content, convert it into Markdown format, and store it in a structured dataset. The extracted content is suitable for feeding LLMs.

AI_Builder

633

4.6

Enhanced Deep Content Crawler

assertive_analogy/advanced-crawler

A fast, Python-powered web crawler with smart content extraction, JS support, metadata capture, and duplicate detection. Ideal for SEO, content migration, and e-commerce scraping. Reliable, scalable, and easy to customize.

Gideon Nesh

Pro Web Content Crawler (With Images)

assertive_analogy/pro-web-content-crawler

Pro Web Content Crawler is a powerful tool that digs deep into web content and images. It handles complex sites, dynamic pages, and hidden content, making it perfect for extracting both data and images. Customizable and API-ready for your unique data needs.

Gideon Nesh

129

5.0