Smart Article Extractor

lukaskrivka/article-extractor-smart

📰 Smart Article Extractor extracts articles from any scientific, academic, or news website with just one click. The extractor crawls the whole website and automatically distinguishes articles from other web pages. Download your data as HTML table, JSON, Excel, RSS feed, and more.

Timeout and Storage check

Open

hectorlca opened this issue
11 days ago

Could you please help me understand how to pass the timeout parameter using the API? I never see it reflected, even when using the web interface. Additionally, I believe the actor occasionally fails to check which articles are already stored.

I have used this actor for many months now, and in recent weeks some of my runs have been running indefinitely. I don't know what changed. Previously I didn't have to monitor the runs every day because the results were as expected. Now I've had some terrible experiences with the actor running indefinitely, costing me a lot of money.

Edit: these two runs have identical input:

  1. https://console.apify.com/actors/runs/TrPjXGgDm5PHNiM48#output - 3,099 results. (Not desired)
  2. https://console.apify.com/actors/runs/SJcFnVa96s3OFZy2w#output - 66 results. (Desired).
ondrejklinovsky

Hey,

when starting a new run through the API, you can define the timeout with a query parameter. For example, to start a run with a timeout of 60 seconds: POST https://api.apify.com/v2/acts/<ACTOR>/runs?timeout=60. Here's the docs.
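
As a rough illustration of the request above, here is a minimal Python sketch using only the standard library. The helper names `build_run_url` and `start_run` are hypothetical, and the sketch assumes the API token is sent as a Bearer token in the `Authorization` header and that the actor input is sent as a JSON body. Treat it as a starting point, not as the official client.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_url(actor_id: str, timeout_secs: int) -> str:
    """Build the run-start URL with the timeout (in seconds) as a query parameter."""
    query = urllib.parse.urlencode({"timeout": timeout_secs})
    return f"{API_BASE}/acts/{actor_id}/runs?{query}"


def start_run(actor_id: str, token: str, run_input: dict, timeout_secs: int) -> dict:
    """POST the actor input and return the parsed JSON response.

    Assumption: the token is accepted as a Bearer token and the
    input is a JSON object. This function performs a real network call.
    """
    request = urllib.request.Request(
        build_run_url(actor_id, timeout_secs),
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `build_run_url("<ACTOR>", 60)` reproduces the URL from the example, so the timeout travels in the query string rather than in the input body.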

The issue with the run is that the website changed the URLs of its articles by adding an ?amp query parameter. As a result, the actor scraped them again, because their URLs were different from the ones that were already stored. We'll need to figure out how to avoid situations like this. We cannot ignore query parameters completely because they may define the article (e.g. ?articleId=edasdas), so this will require more thinking. Thank you for the report, we'll let you know when we have any updates on this. Let me know if you have any questions.
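
To make the trade-off concrete, here is one possible approach sketched in Python. It is not the actor's implementation. It compares URLs by a deduplication key that drops only an explicit list of presentation-only parameters and keeps everything else. The function name `dedup_key` and the contents of `IGNORED_PARAMS` are assumptions made for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical ignore list: parameters assumed not to identify an article.
IGNORED_PARAMS = frozenset({"amp"})


def dedup_key(url: str, ignored: frozenset = IGNORED_PARAMS) -> str:
    """Return a comparison key for a URL with ignored query parameters removed.

    Parameters such as articleId are kept, so two URLs that differ
    only in a kept parameter still produce different keys.
    """
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in ignored
    ]
    kept.sort()  # make the key independent of parameter order
    # The fragment is dropped as well, since it is not sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

With this key, `https://example.com/story?amp` and `https://example.com/story` compare as the same article, while `?articleId=1` and `?articleId=2` remain distinct. The weakness is the one described above: the ignore list has to be maintained by hand, and a site could use an ignored name for something meaningful.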

Developer
Maintained by Apify

Actor Metrics

  • 277 monthly users

  • 82 stars

  • >99% runs succeeded

  • 2.3 days response time

  • Created in Nov 2019

  • Modified a month ago

Categories