Pricing

$8.00 / 1,000 results

Go to Store

AI Web Scraper - Powered by Crawl4AI

Try for free

Developed by

Raizen Technology

A blazing-fast AI web scraper powered by Crawl4AI. Perfect for LLMs, AI agents, AI automation, model training, sentiment analysis, and content generation. Supports deep crawling, multiple extraction strategies and flexible output (Markdown/JSON). Seamlessly integrates with Make.com, n8n, and Zapier.

1.0 (1)

Pricing

$8.00 / 1,000 results

Total users

178

Monthly users

Runs succeeded

>99%

Issues response

1.1 hours

Last modified

3 months ago

Agents

Automation

You can access the AI Web Scraper - Powered by Crawl4AI programmatically from your own applications by using the Apify API. You can also choose the language preference from below. To use the Apify API, you’ll need an Apify account and your API token, found in Integrations settings in Apify Console.

Python

JavaScript

CLI

OpenAPI

HTTP

MCP

1from apify_client import ApifyClient
2
3# Initialize the ApifyClient with your Apify API token
4# Replace '<YOUR_API_TOKEN>' with your token.
5client = ApifyClient("<YOUR_API_TOKEN>")
6
7# Prepare the Actor input
8run_input = {
9    "startUrls": [{ "url": "https://www.cnbc.com/2025/03/12/googles-deepmind-says-it-will-use-ai-models-to-power-physical-robots.html" }],
10    "browserConfig": {
11        "browser_type": "chromium",
12        "headless": True,
13        "verbose_logging": False,
14        "ignore_https_errors": True,
15        "user_agent": "random",
16        "proxy": "",
17        "viewport_width": 1280,
18        "viewport_height": 720,
19        "accept_downloads": False,
20        "extra_headers": {},
21    },
22    "crawlerConfig": {
23        "cache_mode": "BYPASS",
24        "page_timeout": 20000,
25        "simulate_user": True,
26        "override_navigator": True,
27        "magic": True,
28        "remove_overlay_elements": True,
29        "delay_before_return_html": 0.75,
30        "wait_for": "",
31        "screenshot": False,
32        "pdf": False,
33        "enable_rate_limiting": False,
34        "memory_threshold_percent": 90,
35        "word_count_threshold": 200,
36        "css_selector": "",
37        "excluded_tags": [],
38        "excluded_selector": "",
39        "only_text": False,
40        "prettify": False,
41        "keep_data_attributes": False,
42        "remove_forms": False,
43        "bypass_cache": False,
44        "disable_cache": False,
45        "no_cache_read": False,
46        "no_cache_write": False,
47        "wait_until": "domcontentloaded",
48        "wait_for_images": False,
49        "check_robots_txt": False,
50        "mean_delay": 0.1,
51        "max_range": 0.3,
52        "js_code": "",
53        "js_only": False,
54        "ignore_body_visibility": True,
55        "scan_full_page": False,
56        "scroll_delay": 0.2,
57        "process_iframes": False,
58        "adjust_viewport_to_content": False,
59        "screenshot_wait_for": 0,
60        "screenshot_height_threshold": 20000,
61        "image_description_min_word_threshold": 50,
62        "image_score_threshold": 3,
63        "exclude_external_images": False,
64        "exclude_social_media_domains": [],
65        "exclude_external_links": False,
66        "exclude_social_media_links": False,
67        "exclude_domains": [],
68        "verbose": True,
69        "log_console": False,
70        "stream": False,
71    },
72    "deepCrawlConfig": {
73        "max_pages": 100,
74        "max_depth": 3,
75        "include_external": False,
76        "score_threshold": 0.5,
77        "filter_chain": [],
78        "keywords": [
79            "crawl",
80            "example",
81            "async",
82            "configuration",
83        ],
84        "weight": 0.7,
85    },
86    "markdownConfig": {
87        "ignore_links": False,
88        "ignore_images": False,
89        "escape_html": True,
90        "skip_internal_links": False,
91        "include_sup_sub": False,
92        "citations": False,
93        "body_width": 80,
94        "fit_markdown": False,
95    },
96    "contentFilterConfig": {
97        "type": "pruning",
98        "user_query": "",
99        "threshold": 0.45,
100        "min_word_threshold": 5,
101        "bm25_threshold": 1.2,
102        "apply_llm_filter": False,
103        "semantic_filter": "",
104        "word_count_threshold": 10,
105        "sim_threshold": 0.3,
106        "max_dist": 0.2,
107        "top_k": 3,
108        "linkage_method": "ward",
109    },
110    "userAgentConfig": {
111        "user_agent_mode": "random",
112        "device_type": "desktop",
113        "browser_type": "chrome",
114        "num_browsers": 1,
115    },
116    "llmConfig": {
117        "provider": "groq/deepseek-r1-distill-llama-70b",
118        "api_token": "",
119        "instruction": "Summarize content in clean markdown.",
120        "base_url": "",
121        "chunk_token_threshold": 2048,
122        "apply_chunking": True,
123        "input_format": "markdown",
124        "temperature": 0.7,
125        "max_tokens": 4096,
126    },
127    "extractionSchema": {
128        "name": "Custom Extraction",
129        "baseSelector": "div.article",
130        "fields": [
131            {
132                "name": "title",
133                "selector": "h1",
134                "type": "text",
135            },
136            {
137                "name": "link",
138                "selector": "a",
139                "type": "attribute",
140                "attribute": "href",
141            },
142        ],
143    },
144}
145
146# Run the Actor and wait for it to finish
147run = client.actor("raizen/ai-web-scraper").call(run_input=run_input)
148
149# Fetch and print Actor results from the run's dataset (if there are any)
150print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
151for item in client.dataset(run["defaultDatasetId"]).iterate_items():
152    print(item)
153
154# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

AI Web Scraper - Crawl4AI for LLMs, AI Agents & Automation API in Python

The Apify API client for Python is the official library that allows you to use AI Web Scraper - Powered by Crawl4AI API in Python, providing convenience functions and automatic retries on errors.

Install the apify-client

$pip install apify-client

Other API clients include:

AI Web Scraper - Powered by Crawl4AI API in JavaScript

AI Web Scraper - Powered by Crawl4AI API through CLI

AI Web Scraper - Powered by Crawl4AI OpenAPI definition

AI Web Scraper - Powered by Crawl4AI API

RAG Web Browser

apify/rag-web-browser

Web browser for OpenAI Assistants, RAG pipelines, or AI agents, similar to a web browser in ChatGPT. It queries Google Search, scrapes the top N pages, and returns their content as Markdown for further processing by an LLM. It can also scrape individual URLs. Supports Model Context Protocol (MCP).

Apify

4.9K

4.4

🔥fireScraper AI Prompt Website Content Markdown Scraper

mohamedgb00714/fireScraper-AI-prompt-Website-Content-Markdown-Scraper

fireScrape AI is an advanced web scraper built with Crawlee and Puppeteer. It crawls websites, extracts meaningful content, converts it into Markdown, then runs your custom prompt on the extracted text—ideal for generating enriched datasets, summaries or analyses for LLMs and AI pipelines

mohamed el hadi msaid

5.0

Crawl4AI

janbuchar/crawl4ai

Wraps the Crawl4AI open-source library for retrieving text content from websites.

Jan Buchar

536

5.0

AI-Powered Web Content & Link Extractor

scrapercoder/ai-powered-web-content-link-extractor

Crawls websites to extract clean, structured content for AI/LLM use, ideal for training datasets, knowledge bases, and RAG systems. Json output includes: * text: Normalized page content * links: Extracted sub-URLs

wallnut.ai

102

Universal AI GPT Scraper

louisdeconinck/ai-gpt-scraper

Transform any website into structured data with AI-powered extraction. This versatile tool combines advanced web scraping with intelligent content analysis to deliver clean, customized JSON output - perfect for automating data collection from any web source.

Louis Deconinck

5.0

🔥 FireScrape AI Website Content Markdown Scraper

mohamedgb00714/fireScraper-AI-Website-Content-Markdown-Scraper

Advanced web scraper powered by Crawlee and Puppeteer — extracts website content, converts it to Markdown, and structures it for LLM training datasets.

mohamed el hadi msaid

106

3.8

Ai Ready Web Page To Markdown Converter

mustafa.irshaid.113/ai-ready-web-page-to-markdown-converter

Convert any webpage into structured Markdown and HTML using just a URL. Get the page title, link, and content—perfect for SEO, devs, and AI crawlers. Fast, clean, and ideal for repurposing or analysis. Start turning websites into Markdown instantly.

Mustafa Irshaid

AI Website Content Markdown Scraper

quaking_pail/ai-website-content-markdown-scraper

This Apify Actor, "Website Content Crawler with Markdown Extraction," is designed to perform a comprehensive crawl of specified websites, extract their text content, convert it into Markdown format, and store it in a structured dataset. The extracted content is suitable for feeding LLMs.

AI_Builder

633

4.6

Enhanced Deep Content Crawler

assertive_analogy/advanced-crawler

A fast, Python-powered web crawler with smart content extraction, JS support, metadata capture, and duplicate detection. Ideal for SEO, content migration, and e-commerce scraping. Reliable, scalable, and easy to customize.

Gideon Nesh

Pro Web Content Crawler (With Images)

assertive_analogy/pro-web-content-crawler

Pro Web Content Crawler is a powerful tool that digs deep into web content and images. It handles complex sites, dynamic pages, and hidden content, making it perfect for extracting both data and images. Customizable and API-ready for your unique data needs.

Gideon Nesh

129

5.0