
Website Content Crawler
Pricing
Pay per usage

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
4.0 (41)
1638
Total users: 65K
Monthly users: 8.5K
Runs succeeded: >99%
Issues response: 7.3 days
Last modified: 2 days ago
not able to scroll after
Opened 2 days ago by ninew36494, last comment 2 days ago by ninew36494
No text parsed from Website with Success
Opened 3 days ago by w4s, last comment 3 days ago by w4s
The Actor performs a task that costs $15
Opened 4 days ago by MerrickMa, last comment 4 days ago by MerrickMa
Crawler is timing out for root url
Opened 4 days ago by adarshkm, last comment 2 days ago by Jindřich Bär (jindrich.bar)
0.3.67 CheerioCrawler shows “Request timeout (0) ms exceeded” despite requestTimeoutSecs being set to 60
Opened 10 days ago by uglyrobot, last comment 2 days ago by uglyrobot
Cannot load dataset
Opened 14 days ago by katiev-owner, last comment 11 days ago by Jindřich Bär (jindrich.bar)
New Rust HTTP client failing on valid SSL config: SelectedUnusableCipherSuiteForVersion
Opened a month ago by uglyrobot, last comment a month ago by Jindřich Bär (jindrich.bar)
Glob Patterns are ignored when using Sitemap
Opened a month ago by cirez_d, last comment a month ago by cirez_d
Memory issue
Opened 2 months ago by acarter, last comment 14 days ago by Jindřich Bär (jindrich.bar)
Avoid query parameters when crawling websites
Opened 2 months ago by innovum_admin, last comment 2 months ago by Jindřich Bär (jindrich.bar)
Getting 403 from public page
Opened 2 months ago by formidable_quagmire, last comment 2 months ago by formidable_quagmire
Crawling cannot be done with Arabic website in English
Opened 2 months ago by aswinthazhath, last comment 2 months ago by Jindřich Bär (jindrich.bar)
CORS Error
Opened 3 months ago by fmateen, last comment 2 months ago by Jindřich Bär (jindrich.bar)
Is there a way to crawl URL from the visible HTML after removing "removeElementsCssSelector"
Opened 3 months ago by formidable_quagmire, last comment 2 months ago by Jindřich Bär (jindrich.bar)
Can we get the images on the pages too?
Opened 3 months ago by disarming_rutabaga, last comment 2 months ago by Jiří Spilka (jiri.spilka)
Decode non-UTF-8 text in crawlerType cheerio
Opened a year ago by consoling_knock, last comment a year ago by Jindřich Bär (jindrich.bar)