
Website Content Crawler
Pricing
Pay per usage

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
4.0 (41)
1638
Total users: 65K
Monthly users: 8.5K
Runs succeeded: >99%
Issues response: 7.3 days
Last modified: 2 days ago
Crawler is timing out for root URL
Open
When crawling, the behaviour is totally random. Sometimes it is able to scrape text, and sometimes it just fails with 403 errors, even after multiple retries.
Hello Adarsh, and thank you for reporting this.
The run you’re referring to timed out due to a backend issue on our side - sorry about that. The backend was overloaded because of a rolling release and didn't find a suitable timeslot for scheduling your run. This is a very rare issue.
As for the inconsistent behavior and 403 errors you're seeing in other runs - these issues can depend on several factors, such as:
- The crawler mode being used (e.g., browser-based vs. Cheerio)
- Whether you're using proxies, and what kind
- Specific settings in your Actor (such as timeouts, clicked selectors, etc.)
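As a rough illustration of where these three factors live in the Actor input, here is a minimal Python sketch that only builds the input as a dictionary and prints it as JSON. The helper `build_run_input` is hypothetical, and the field names (`crawlerType`, `proxyConfiguration`, `requestTimeoutSecs`, `maxRequestRetries`, `clickElementsCssSelector`) reflect our reading of the Actor's input schema; please check them against the Input tab before relying on them. All values are placeholders, not recommendations.

```python
import json


def build_run_input(start_url, crawler_type="playwright:firefox",
                    use_residential_proxy=False, request_timeout_secs=60,
                    max_request_retries=3, click_selector=None):
    """Hypothetical helper: assemble an input dict for the crawler.

    Field names are assumed from the Actor's input schema and should be
    verified before use; the values here are placeholders.
    """
    # Proxy settings: Apify Proxy, optionally restricted to one proxy group.
    proxy = {"useApifyProxy": True}
    if use_residential_proxy:
        proxy["apifyProxyGroups"] = ["RESIDENTIAL"]

    run_input = {
        "startUrls": [{"url": start_url}],
        # Crawler mode: a browser-based type or a plain HTTP one ("cheerio").
        "crawlerType": crawler_type,
        "proxyConfiguration": proxy,
        # Timeout and retry settings.
        "requestTimeoutSecs": request_timeout_secs,
        "maxRequestRetries": max_request_retries,
    }
    # Only include the click selector when one is explicitly given.
    if click_selector:
        run_input["clickElementsCssSelector"] = click_selector
    return run_input


if __name__ == "__main__":
    example = build_run_input("https://example.com", use_residential_proxy=True)
    print(json.dumps(example, indent=2))
```

Comparing two otherwise identical runs that differ in only one of these fields (for example, the crawler type) is usually the quickest way to see which factor is behind the 403 responses.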
To help you further, we'll need a bit more context. Could you please share:
- The Actor input or settings you're using
- Whether you're using proxies, and which type
- A couple of run links where the 403s or unexpected behavior happened
Once we have that, we can take a closer look and guide you more effectively. Looking forward to your reply!