Web Scraper

apify/web-scraper
Crawls arbitrary websites using the Chrome browser and extracts data from pages using JavaScript code. The Actor supports both recursive crawling and lists of URLs and automatically manages concurrency for maximum performance. This is Apify's basic tool for web crawling and scraping.


Reclaiming failed request back to the list or queue. requestHandler timed out after 60 seconds.

Closed

Dean Sofer (proloser) opened this issue
6 months ago

It looks like it's crawling the page correctly, but I can't figure out why this error is occurring, and I'd prefer to conserve my usage.

https://console.apify.com/actors/moJRLRc85AitArpNN/runs/An5339u0xNa1UKP7D#log

jindrich.bar

Hello, and thank you for your interest in this Actor!

The issue you describe seems to appear randomly. It might be related to the asynchronous requests you are making inside the Page Function. Unfortunately, I cannot provide much more help with your custom code, as I don't know what you are trying to achieve. As a quick remedy, you can raise the requestHandler timeout by increasing the value of the Performance and Limits > Page Function timeout input option.
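One way to keep a slow asynchronous call from running into the 60-second requestHandler limit is to give that call its own, shorter deadline. The following is a minimal sketch, not the Actor's own mechanism: `withTimeout` is a hypothetical helper name, and the fallback value is an assumption about what the custom code can tolerate.

```javascript
// Hypothetical helper: resolve with `fallback` if `promise` has not settled
// within `ms` milliseconds. A rejection of `promise` is passed through as-is.
function withTimeout(promise, ms, fallback = null) {
    let timer;
    const timeout = new Promise((resolve) => {
        timer = setTimeout(() => resolve(fallback), ms);
    });
    // Whichever promise settles first wins; the timer is always cleared
    // afterwards so it cannot linger once the real call has finished.
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Inside a Page Function, the asynchronous call would then be wrapped, for example `await withTimeout(someAsyncCall(), 10000, null)`, where `someAsyncCall` stands in for the custom request. Because Web Scraper evaluates the Page Function in the browser, a helper like this would need to be defined inside the Page Function body.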

By the way, the website you are scraping seems to be completely server-side rendered (and static, i.e., without client-side JS). This means you can process it with our Cheerio Scraper as well. Cheerio Scraper is much faster than Web Scraper, as it doesn't use web browsers to load the page (it uses a simple HTTP request and an HTML parser instead). I see that most of your custom code uses jQuery. Migrating this to Cheerio should be fairly easy, as Cheerio supports a comprehensive subset of jQuery syntax.

Migrating to Cheerio Scraper should give you your results much faster (up to 20x speed improvement) and definitely save you some platform credits as well.

I'll keep this issue open - feel free to ask additional questions if you have any - or close this issue, if you don't. Cheers!

proloser

I am scraping this WordPress blog for events around my city and converting them into events with scheduling details and lat/long coordinates so I can display them on a map. The async request I'm making geocodes each event's address for that purpose.

I will look into parsing the site with Cheerio, but I am guessing I'd then have to use another Actor to geocode the address, as I would not be able to do this in Cheerio, right?

proloser

Hello, I figured out how to do it with Cheerio (their documentation is horrible), and it works great! Thanks.
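For readers attempting the same migration, the sketch below shows roughly what a Cheerio Scraper Page Function with a geocoding step could look like. It is not the code used in this thread. The geocoder endpoint, its `{ lat, lng }` response shape, and both CSS selectors are hypothetical placeholders. It also assumes the runtime provides a global `fetch`; otherwise an HTTP client would have to be passed in as `fetchImpl`.

```javascript
// Hypothetical geocoder client: the URL and the { lat, lng } response shape
// are placeholders, not a real service.
async function geocode(address, fetchImpl = fetch) {
    const url = `https://geocoder.example.com/search?q=${encodeURIComponent(address)}`;
    const response = await fetchImpl(url);
    if (!response.ok) return null; // treat a failed lookup as "no coordinates"
    const { lat, lng } = await response.json();
    return { lat, lng };
}

// Sketch of a Cheerio Scraper Page Function. `$` is the Cheerio handle for the
// parsed HTML and accepts jQuery-style selectors; the selectors are made up.
async function pageFunction(context) {
    const { $, request } = context;
    const title = $('h1.entry-title').first().text().trim();
    const address = $('.event-address').first().text().trim();
    // The Page Function is async, so the geocoding request can be awaited here.
    const coords = address ? await geocode(address) : null;
    return { url: request.url, title, address, coords };
}
```

Passing the HTTP function in as a parameter keeps the geocoding step easy to stub out when trying the extraction logic locally.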
