
Website Content Crawler
Pricing
Pay per usage

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
4.0 (41)
Total users: 65K
Monthly users: 8.5K
Runs succeeded: >99%
Issues response: 7.3 days
Last modified: 2 days ago
0.3.67 CheerioCrawler shows “Request timeout (0) ms exceeded” despite requestTimeoutSecs being set to 60
Open
I’ve set the configuration option requestTimeoutSecs: 60. However, during execution, I see the following error in logs:
ERROR CheerioCrawler: Request failed and reached maximum retries. Error: impit error: Request timeout (0) ms exceeded.
This suggests the timeout is being treated as 0 ms, even though the configuration specifies 60 seconds. This has started happening more and more often, affecting a significant percentage of URLs in any crawl job.
Also, since these requests never get added to the results the way an HTTP 504, for example, would, this is hard to debug.
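For reference, a trimmed sketch of the input I'm passing (the URL is a placeholder; field names match what I set in the Actor configuration):

```javascript
// Trimmed sketch of the crawler input (placeholder URL, not my real targets).
const input = {
  startUrls: [{ url: 'https://example.com' }],
  crawlerType: 'cheerio',
  requestTimeoutSecs: 60, // expected timeout: 60 s, not the 0 ms the log reports
};

// The timeout the crawler should be applying, in milliseconds.
console.log(input.requestTimeoutSecs * 1000); // → 60000
```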
Hello, and thank you for your interest in this Actor.
We just tested this and reproduced the issue. Fortunately, this is just a logging problem: the Actor respects the configured timeout value but logs the wrong number on error.
I filed an issue on GitHub; feel free to follow the progress there.
Thank you for bringing this up!
uglyrobot
Just to be clear, these requests are actually timing out at my 60 s then? And is that a connection timeout or a request timeout?
uglyrobot
OK, I did some debugging. There seems to be a bug in version 0.3.67 of the Actor: when using a custom proxy with Cheerio, a large percentage of requests consistently fail with this timeout error. Running the exact same input on version 0.3.66 works fine.
Hello again, and thank you for the details regarding this error.
I looked into this a bit more, and it seems that the new HTTP client implementation is indeed not respecting the longer user-set timeouts. I made a PR to the underlying library that fixes this.
Until this is merged (and a new WCC version is released), feel free to pin your runs to the last working version (in your case, that would be 0.3.66).
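If it helps, pinning a run to a specific build can be done by passing a `build` value in the run options. A minimal sketch, assuming the `apify-client` JavaScript package; the actual `.call()` is commented out because it needs a valid `APIFY_TOKEN` and would start a real run:

```javascript
// Sketch: pinning a Website Content Crawler run to the last working build.
// Assumes the `apify-client` package; the call itself is commented out
// because it requires an API token and performs a real (billed) run.

const input = {
  startUrls: [{ url: 'https://example.com' }],
  crawlerType: 'cheerio',
  requestTimeoutSecs: 60,
};

// Run options: `build` pins the run to a specific Actor build/version.
const callOptions = { build: '0.3.66' };

// const { ApifyClient } = require('apify-client');
// const client = new ApifyClient({ token: process.env.APIFY_TOKEN });
// const run = await client.actor('apify/website-content-crawler').call(input, callOptions);

console.log(JSON.stringify(callOptions)); // → {"build":"0.3.66"}
```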
I'll let you know here once this is resolved.
Thank you for your patience and for providing useful debugging info. Cheers!
uglyrobot
Thanks!