Website Content Crawler
Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
Dear Support,
I encountered an issue where the Actor timed out, and I am unable to continue my task. Could you please advise how to extend the timeout or assist in resuming the process from where it left off?
Additionally, is there a way to configure the crawler to return predefined content in the page content HTML, instead of an empty response, when no data is found during an API request? This would be essential to prevent my automation in Zapier from failing.
Thank you for your support.
Best regards, Raul
a predefined content in a page content HTML, such as:
Hi, thank you for using Website Content Crawler.
I'm not very familiar with Zapier, but based on the logs, it seems there is a 30-second timeout. This is most likely a setting in Zapier. Could you please check? By default, the crawler uses a timeout of 360,000 seconds.
- Apify API: Run Actor Synchronously:
  In Apify, there is an endpoint to run an Actor synchronously with input and get dataset items:
  https://api.apify.com/v2/acts/:actorId/run-sync-get-dataset-items
  You can specify the timeout as a query parameter here.
- Keep HTML Elements:
  There is an option to keep HTML elements using a CSS selector. Please refer to the documentation and set up a selector for your specific content.
I hope this helps. Please let me know whether you are able to solve it in Zapier. Jiri
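For readers calling the endpoint from their own code rather than from Zapier, the two points in the reply above can be sketched roughly as follows. This is a minimal sketch, not the Actor's documented client: the Actor ID `apify~website-content-crawler`, the `token` query parameter, and the helper names are assumptions for illustration, and the timeout value is assumed to be in seconds. The `items_or_fallback` helper is a client-side workaround for the empty-response case, not a crawler setting.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2/acts"


def build_run_sync_url(actor_id: str, token: str, timeout_secs: int) -> str:
    """Build the run-sync-get-dataset-items URL with a timeout query parameter."""
    # Assumption: the API token is passed as the `token` query parameter.
    query = urllib.parse.urlencode({"token": token, "timeout": timeout_secs})
    return f"{API_BASE}/{actor_id}/run-sync-get-dataset-items?{query}"


def items_or_fallback(items: list, fallback_text: str) -> list:
    """Return the dataset items, or a single placeholder item if none came back."""
    if items:
        return items
    # Hypothetical placeholder shape; adjust it to whatever the automation expects.
    return [{"text": fallback_text}]


def run_crawler(actor_id: str, token: str, actor_input: dict, timeout_secs: int = 120) -> list:
    """POST the Actor input, wait for the run, and return its dataset items."""
    request = urllib.request.Request(
        build_run_sync_url(actor_id, token, timeout_secs),
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The client-side timeout is set a little above the requested run timeout.
    with urllib.request.urlopen(request, timeout=timeout_secs + 30) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call might then look like `items_or_fallback(run_crawler("apify~website-content-crawler", token, {"startUrls": [{"url": "https://example.com"}]}), "No content found")`, so that a downstream step always receives at least one item.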
Actor Metrics
4.1k monthly users
854 stars
>99% runs succeeded
24 hours response time
Created in Mar 2023
Modified 16 hours ago