Website Content Crawler

Developed by

Apify

Maintained by Apify

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
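As a rough illustration of the workflow described above, the following minimal sketch builds a run input for the Actor and turns crawled items into simple documents for an LLM or vector-database pipeline. It is not taken from the Actor's documentation: the Actor ID `apify/website-content-crawler`, the input fields `startUrls` and `maxCrawlPages`, and the output fields `url`, `text`, and `markdown` are assumptions to check against the Actor's input and output schema (`crawlerType` with the value `cheerio` appears in an issue title on this page). The helper names `build_run_input`, `items_to_documents`, and `crawl` are hypothetical.

```python
from __future__ import annotations

from typing import Any, Iterable


def build_run_input(start_urls: list[str], crawler_type: str = "cheerio",
                    max_pages: int = 10) -> dict[str, Any]:
    """Build an Actor input payload (field names are assumptions)."""
    return {
        "startUrls": [{"url": u} for u in start_urls],
        "crawlerType": crawler_type,
        "maxCrawlPages": max_pages,
    }


def items_to_documents(items: Iterable[dict[str, Any]]) -> list[dict[str, str]]:
    """Keep the Markdown (or plain text) of each item, skipping empty pages."""
    docs = []
    for item in items:
        content = (item.get("markdown") or item.get("text") or "").strip()
        if content:
            docs.append({"source": item.get("url", ""), "content": content})
    return docs


def crawl(token: str, start_urls: list[str]) -> list[dict[str, str]]:
    """Run the Actor and collect its dataset; needs network and an API token."""
    # Third-party dependency, imported lazily: pip install apify-client
    from apify_client import ApifyClient

    client = ApifyClient(token)
    # Assumes the run completes and the result exposes "defaultDatasetId".
    run = client.actor("apify/website-content-crawler").call(
        run_input=build_run_input(start_urls)
    )
    return items_to_documents(
        client.dataset(run["defaultDatasetId"]).iterate_items()
    )
```

Only `crawl` touches the network; the two helper functions are pure, so they can be tried locally on sample items before spending any platform usage.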

4.6 (38)

Pricing: Pay per usage

1.2k

Monthly users: 6.2k

Runs succeeded: >99%

Response time: 3.6 days

Last modified: 11 hours ago

Issue Crawling Content from Paid Websites Like New York Times

Opened an hour ago by onlinereach, last comment an hour ago by onlinereach

Date Format

Opened a day ago by rizlene, last comment a day ago by Jiří Spilka (jiri.spilka)

Crawler won't click on a specific button

Opened 2 days ago by shikh.sn2021, last comment 2 days ago by shikh.sn2021

Adsterra .com

Opened 3 days ago by Tijjeboy, last comment 2 days ago by Jiří Spilka (jiri.spilka)

number of saved lines

Opened 7 days ago by kocsi, last comment a day ago by Jiří Spilka (jiri.spilka)

Large number of requests fail

Opened 8 days ago by cirez_d, last comment 8 days ago by cirez_d

How to only have the home page or about us page?

Opened 10 days ago by xemivo2655, last comment 2 days ago by Jiří Spilka (jiri.spilka)

Crawling not extracting all text on page

Opened 18 days ago by agungbmtra, last comment 15 days ago by Jakub Kopecký (jakub.kopecky)

Cannot download PDFs

Opened 19 days ago by ftballguy45, last comment 15 days ago by Jakub Kopecký (jakub.kopecky)

Add Full File Name to the Key-Value-Stores

Opened 21 days ago by CtrlAltElite, last comment 4 days ago by CtrlAltElite

Ability to Group Crawled Page with Followed Link and Its Content in a Single Row

Opened 24 days ago by randomname1234, last comment 24 days ago by randomname1234

Page Title

Opened a month ago by CtrlAltElite, last comment a month ago by Jakub Kopecký (jakub.kopecky)

Navigating frame was detached

Opened 2 months ago by stephen.kim, last comment 2 months ago by Jiří Spilka (jiri.spilka)

Scraper doesn't scrape all the website content, such as product descriptions

Opened 2 months ago by maabada.shivok, last comment 2 months ago by maabada.shivok

Crawl hung at finished

Opened 3 months ago by mcantrell, last comment 3 months ago by mykola_scrapes

Decode non-UTF-8 text in crawlerType cheerio

Opened a year ago by consoling_knock, last comment a year ago by Jindřich Bär (jindrich.bar)

Pricing

Pricing model

Pay per usage

This Actor is billed per platform usage. The Actor itself is free to use; you pay only for the Apify platform usage it incurs.