Website Content Crawler

Pricing

Pay per usage

Developed by

Apify

Maintained by Apify

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
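As a rough illustration of how a crawl might be configured, the sketch below assembles an input object using the option names that appear in the issue titles on this page (`crawlerType`, `requestTimeoutSecs`, `removeElementsCssSelector`). The `startUrls` field and its shape, the helper name `build_run_input`, and the default values are assumptions made for the example, not something this page confirms. Actually starting a run would require the Apify API and a token, which are left out here.

```python
import json


def build_run_input(start_url, crawler_type="cheerio",
                    request_timeout_secs=60, remove_selector=None):
    """Assemble a hypothetical input object for a single crawl.

    Only the option names are taken from this page; the helper itself,
    its defaults, and the `startUrls` shape are illustrative assumptions.
    """
    if not start_url.startswith(("http://", "https://")):
        raise ValueError(f"expected an absolute http(s) URL, got {start_url!r}")

    run_input = {
        "startUrls": [{"url": start_url}],  # assumed field name and shape
        "crawlerType": crawler_type,
        "requestTimeoutSecs": request_timeout_secs,
    }
    # Only include the selector when the caller asked for elements to be removed.
    if remove_selector:
        run_input["removeElementsCssSelector"] = remove_selector
    return run_input


if __name__ == "__main__":
    config = build_run_input("https://example.com", remove_selector="nav, footer")
    print(json.dumps(config, indent=2))
```

Keeping the input as a plain dictionary makes it easy to inspect or serialize before handing it to whatever client eventually starts the run.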

4.0 (41)

1638

Total users: 65K

Monthly users: 8.5K

Runs succeeded: >99%

Issues response: 7.3 days

Last modified: 2 days ago

not able to scroll after

Opened 2 days ago by ninew36494, last comment 2 days ago by ninew36494

No text parsed from Website with Success

Opened 3 days ago by w4s, last comment 3 days ago by w4s

The Actor performs a task that costs $15

Opened 4 days ago by MerrickMa, last comment 4 days ago by MerrickMa

Crawler is timing out for root url

Opened 4 days ago by adarshkm, last comment 2 days ago by Jindřich Bär (jindrich.bar)

0.3.67 CheerioCrawler shows “Request timeout (0) ms exceeded” despite requestTimeoutSecs being set to 60

Opened 10 days ago by uglyrobot, last comment 2 days ago by uglyrobot

Cannot load dataset

Opened 14 days ago by katiev-owner, last comment 11 days ago by Jindřich Bär (jindrich.bar)

New Rust HTTP client failing on valid SSL config: SelectedUnusableCipherSuiteForVersion

Opened a month ago by uglyrobot, last comment a month ago by Jindřich Bär (jindrich.bar)

Glob Patterns are ignored when using Sitemap

Opened a month ago by cirez_d, last comment a month ago by cirez_d

Memory issue

Opened 2 months ago by acarter, last comment 14 days ago by Jindřich Bär (jindrich.bar)

Avoid query parameters when crawling websites

Opened 2 months ago by innovum_admin, last comment 2 months ago by Jindřich Bär (jindrich.bar)

Getting 403 from public page

Opened 2 months ago by formidable_quagmire, last comment 2 months ago by formidable_quagmire

Crawling cannot be done with Arabic website in English

Opened 2 months ago by aswinthazhath, last comment 2 months ago by Jindřich Bär (jindrich.bar)

CORS Error

Opened 3 months ago by fmateen, last comment 2 months ago by Jindřich Bär (jindrich.bar)

Is there a way to crawl URL from the visible HTML after removing "removeElementsCssSelector"

Opened 3 months ago by formidable_quagmire, last comment 2 months ago by Jindřich Bär (jindrich.bar)

Can we get the images on the pages too?

Opened 3 months ago by disarming_rutabaga, last comment 2 months ago by Jiří Spilka (jiri.spilka)

Decode non-UTF-8 text in crawlerType cheerio

Opened a year ago by consoling_knock, last comment a year ago by Jindřich Bär (jindrich.bar)