Tripadvisor Reviews Scraper

Pricing

$2.00 / 1,000 reviews

Developed by

Maximillian Copelli

Maintained by Apify

Get and download reviews for chosen places on Tripadvisor. Extract the review text, URL, rating, date of travel, published date, basic reviewer info, owner's response, helpful votes, images, review language, place details. Download reviews in XML, JSON, CSV.

4.3 (6)

82

Monthly users

393

Runs succeeded

>99%

Response time

8.5 hours

Last modified

10 days ago

Review count doesn't line up -- way to re-run on missing?

Closed

kcarriere opened this issue
4 months ago

I just ran a decent-sized job. A few locations came back with less than a 100% return rate (66%, 14%, 66%, 32.5%, 80%, and 49%). For example, locationId==10105731 returned a suspiciously round 21,500 reviews. There's definitely contradictory information on TripAdvisor's side: it lists 32,092 reviews but also says "showing results of..." 25,416.

Regardless, that's between 3.9k and 10.6k reviews not scraped.
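The size of that gap follows from subtracting the returned count from each of the two totals shown on the page. A minimal sketch of the arithmetic, using only the figures quoted in this post (the function and variable names are illustrative, not part of the Actor):

```python
# Figures quoted in the post for locationId 10105731.
SCRAPED = 21_500             # reviews actually returned by the run
LISTED_TOTAL = 32_092        # total review count listed on the page
SHOWING_RESULTS_OF = 25_416  # the "showing results of..." count


def missing_range(scraped, low_total, high_total):
    """Return (min_missing, max_missing) for two candidate totals."""
    return low_total - scraped, high_total - scraped


low, high = missing_range(SCRAPED, SHOWING_RESULTS_OF, LISTED_TOTAL)
print(low, high)  # 3916 10592
```

Which of the two totals is the right denominator is not clear from the page, so the result is best read as a range rather than a single number.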

The log at some point reads for this location: "2024-12-19T13:02:39.104Z INFO Reached max reviews to enqueue per query". That sounds like the scraper potentially got bounced out? I'm not sure what that means.

Sorting the JSON, it's definitely scraping the most recent reviews (12/16/2024, 12/05/2024, 11/28/2024). It's hard to tell whether it's scraping the oldest ones. I can't necessarily re-run because the filter is "only scrape reviews since" and not "only scrape reviews before".

Just trying to figure out the missing data discrepancy here, and how I could solve this.

lukas.prusa

Hi, thanks for opening this issue and for your patience! Sorry this issue got a bit lost on our side.

The problem: you set the global max items limit to 249.5k, which is exactly where the crawler stopped. The limit setting can be found at the bottom of the page.

The resolution: understandably, you are now missing results for some places. I've gone over the logs and found the ones that did not finish. These are the ones you will most likely have to rescrape:

1. https://www.tripadvisor.com/Attraction_Review-g28970-d10105731-Reviews-Lincoln_Memorial-Washington_DC_District_of_Columbia.html
2. https://www.tripadvisor.com/Attraction_Review-g60763-d1687489-Reviews-The_National_9_11_Memorial_Museum-New_York_City_New_York.html
3. https://www.tripadvisor.com/Attraction_Review-g60982-d104386-Reviews-USS_Arizona_Memorial-Honolulu_Oahu_Hawaii.html
4. https://www.tripadvisor.com/Attraction_Review-g187147-d188709-Reviews-Arc_de_Triomphe-Paris_Ile_de_France.html
5. https://www.tripadvisor.com/Attraction_Review-g187323-d617423-Reviews-The_Holocaust_Memorial_Memorial_to_the_Murdered_Jews_of_Europe-Berlin.html

I hope this helps, thanks and happy scraping!
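For anyone preparing the rescrape, here is a rough sketch of how the unfinished URLs listed above could be turned into a new run input. The input field names `startUrls` and `maxItemsPerQuery` are assumptions and should be checked against the Actor's input schema; the helper names are hypothetical. The only thing taken from the thread is the URL list and the fact that the location ID appears in each URL after `-d`.

```python
import re

# URLs of the places whose scrape did not finish (copied from the reply).
UNFINISHED_URLS = [
    "https://www.tripadvisor.com/Attraction_Review-g28970-d10105731-Reviews-Lincoln_Memorial-Washington_DC_District_of_Columbia.html",
    "https://www.tripadvisor.com/Attraction_Review-g60763-d1687489-Reviews-The_National_9_11_Memorial_Museum-New_York_City_New_York.html",
    "https://www.tripadvisor.com/Attraction_Review-g60982-d104386-Reviews-USS_Arizona_Memorial-Honolulu_Oahu_Hawaii.html",
    "https://www.tripadvisor.com/Attraction_Review-g187147-d188709-Reviews-Arc_de_Triomphe-Paris_Ile_de_France.html",
    "https://www.tripadvisor.com/Attraction_Review-g187323-d617423-Reviews-The_Holocaust_Memorial_Memorial_to_the_Murdered_Jews_of_Europe-Berlin.html",
]


def location_id(url):
    """Extract the numeric location ID that follows '-d' in a Tripadvisor URL."""
    match = re.search(r"-d(\d+)-", url)
    return match.group(1) if match else None


def build_rerun_input(urls, max_items_per_query=None):
    """Build a run input dict; the field names here are ASSUMED, not confirmed."""
    run_input = {"startUrls": [{"url": u} for u in urls]}
    if max_items_per_query is not None:
        run_input["maxItemsPerQuery"] = max_items_per_query  # assumed field name
    return run_input


rerun_input = build_rerun_input(UNFINISHED_URLS)
print([location_id(u) for u in UNFINISHED_URLS])
```

The global max items limit mentioned above is set separately in the run options, so it would need to be raised (or left unset) for the rerun to finish. Reviews already collected for these places will be returned again, so the old and new datasets would need to be deduplicated afterwards.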

Pricing

Pricing model

Pay per result 

This Actor is paid per result. You are not charged for the Apify platform usage, but only a fixed price for each dataset of 1,000 items in the Actor outputs.

Price per 1,000 items

$2.00
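As a rough illustration of what this pricing means for a run, the sketch below estimates the cost of a given number of items at $2.00 per 1,000. It assumes the charge scales proportionally with the item count, which the page does not state explicitly; the function name is hypothetical.

```python
from decimal import Decimal

PRICE_PER_1000 = Decimal("2.00")  # price per 1,000 items, from the listing


def estimated_cost(items):
    """Estimate the charge in USD, ASSUMING pro-rata billing per item."""
    return (Decimal(items) / 1000 * PRICE_PER_1000).quantize(Decimal("0.01"))


print(estimated_cost(249_500))  # 499.00
print(estimated_cost(21_500))   # 43.00
```

Under that assumption, the 249.5k-item run discussed in the issue above would come to about $499.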