Website Content Crawler
No credit card required
Website Content Crawler
No credit card required
Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗LangChain, LlamaIndex, and the wider LLM ecosystem.
Do you want to learn more about this Actor?
Get a demoHello, I noticed recently that requests to the start URL fail, and I do not see why. It seems like an intermittent issue to me and I do not see a pattern what is causing this. Any feedback would be appreciated. Thanks!
Hello! I see you have quite a lot of website content crawler runs. Could you point out the problematic ones?
Hello Jan, please see for example these runs:
Could you please delete the links from this issue after you accessed them? Thanks!
It is not possible for anyone but the author to edit or delete issue comments, but I have already seen the links, so feel free to redact them now.
I inspected your runs and the most likely explanation is that the web page is having some latency issues. It might help if you try increasing the request timeout.
Thank you, it seems to work now. It was weird, as this happened to multiple pages.
- 3k monthly users
- 465 stars
- 99.9% runs succeeded
- 3.1 days response time
- Created in Mar 2023
- Modified 10 days ago