Website Content Crawler

Pricing

Pay per usage

Developed and maintained by Apify

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

Rating: 4.6 (38)


1.2k

Monthly users: 6.3k

Runs succeeded: >99%

Response time: 3.5 days

Last modified: a day ago

Issues

Exclude Start URL and Disallowed Paths from Output + Return Clean JSON Structure

Opened 3 hours ago by rudy-seo, last comment 3 hours ago by rudy-seo

Error on Zapier Actor

Opened 4 hours ago by insiderperks-owner, last comment 4 hours ago by insiderperks-owner

Issue Crawling Content from Paid Websites Like New York Times

Opened 2 days ago by onlinereach, last comment a day ago by Jakub Kopecký (jakub.kopecky)

Date Format

Opened 3 days ago by rizlene, last comment 3 days ago by Jiří Spilka (jiri.spilka)

Crawler won't click on a specific button

Opened 4 days ago by shikh.sn2021, last comment a day ago by Jakub Kopecký (jakub.kopecky)

Adsterra.com

Opened 4 days ago by Tijjeboy, last comment 3 days ago by Jiří Spilka (jiri.spilka)

Received blocked status code: 429

Opened 7 days ago by josephfalla, last comment 3 days ago by Jiří Spilka (jiri.spilka)

Number of saved lines

Opened 8 days ago by kocsi, last comment 6 hours ago by kocsi

Large number of requests fail

Opened 9 days ago by cirez_d, last comment 9 days ago by cirez_d

Increased usage limit not continuing run

Opened 11 days ago by anlaics2, last comment 9 days ago by anlaics2

How to only have the home page or about us page?

Opened 12 days ago by xemivo2655, last comment 3 days ago by Jiří Spilka (jiri.spilka)

The crawler stops halfway through the crawling process

Opened 12 days ago by avkarma, last comment 9 days ago by Jiří Spilka (jiri.spilka)

How to extract "date" metadata?

Opened 14 days ago by avkarma, last comment 3 days ago by Jiří Spilka (jiri.spilka)

Get this error

Opened 15 days ago by esc4dinh4, last comment 9 days ago by Jakub Kopecký (jakub.kopecky)

Crawling not extracting all text on page

Opened 20 days ago by agungbmtra, last comment 16 days ago by Jakub Kopecký (jakub.kopecky)

Cannot download PDFs

Opened 21 days ago by ftballguy45, last comment 16 days ago by Jakub Kopecký (jakub.kopecky)

Links are not extracted

Opened 22 days ago by tom.a, last comment 22 days ago by tom.a

Add Full File Name to the Key-Value-Stores

Opened 23 days ago by CtrlAltElite, last comment a day ago by CtrlAltElite

Ability to Group Crawled Page with Followed Link and Its Content in a Single Row

Opened a month ago by randomname1234, last comment a month ago by randomname1234

We can't scrape that website because it shows an SSL certificate error. Can you please fix it?

Opened a month ago by anthony.quinn, last comment 16 days ago by Jakub Kopecký (jakub.kopecky)

Pricing

Pricing model: Pay per usage

This Actor is billed by platform usage: the Actor itself is free to use, and you pay only for the Apify platform resources it consumes.