Website Content Crawler

apify/website-content-crawler

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗LangChain, LlamaIndex, and the wider LLM ecosystem.

Requests are failing often

Closed

cirez_d opened this issue
2 months ago

Hello, I recently noticed that requests to the start URL fail, and I do not see why. The issue seems intermittent, and I do not see a pattern in what causes it. Any feedback would be appreciated. Thanks!

janbuchar

Hello! I see you have quite a lot of website content crawler runs. Could you point out the problematic ones?

cirez_d

2 months ago

Hello Jan, please see for example these runs:

Could you please delete the links from this issue after you have accessed them? Thanks!

janbuchar

It is not possible for anyone but the author to edit or delete issue comments, but I have already seen the links, so feel free to redact them now.

janbuchar

I inspected your runs and the most likely explanation is that the web page is having some latency issues. It might help if you try increasing the request timeout.
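
For anyone hitting the same symptom, here is a minimal sketch of how a longer request timeout might be passed when starting the Actor from Python. It assumes the Actor's input accepts `startUrls` and `requestTimeoutSecs` (check the Actor's input schema for the exact field names and defaults), that the `apify-client` package is installed, and that an API token is available in an `APIFY_TOKEN` environment variable. The helper names `build_run_input` and `run_crawler`, the 180-second value, and the example URL are illustrative only.

```python
import os

ACTOR_ID = "apify/website-content-crawler"


def build_run_input(start_url: str, request_timeout_secs: int = 180) -> dict:
    """Build an Actor input with a raised request timeout.

    The field names below are assumptions; verify them against the
    Actor's input schema before relying on them.
    """
    if request_timeout_secs <= 0:
        raise ValueError("request_timeout_secs must be positive")
    return {
        "startUrls": [{"url": start_url}],
        "requestTimeoutSecs": request_timeout_secs,
    }


def run_crawler(start_url: str, token: str, request_timeout_secs: int = 180):
    """Start the Actor and wait for it to finish; returns the run record."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run_input = build_run_input(start_url, request_timeout_secs)
    return client.actor(ACTOR_ID).call(run_input=run_input)


if __name__ == "__main__":
    token = os.environ.get("APIFY_TOKEN")
    if token:
        run = run_crawler("https://example.com", token)
        print(run["status"] if run else "run did not finish")
    else:
        print(build_run_input("https://example.com"))
```

If the start URL still fails intermittently with a longer timeout, the target site's latency is probably not the only cause, and it is worth comparing the logs of a failed run with those of a successful one.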

cirez_d

a month ago

Thank you, it seems to work now. It was weird, as this happened to multiple pages.

Developer
Maintained by Apify
Actor metrics
  • 3k monthly users
  • 465 stars
  • 99.9% runs succeeded
  • 3.1 days response time
  • Created in Mar 2023
  • Modified 10 days ago