Tripadvisor Scraper

Pay $3.00 for 1,000 results

maxcopell/tripadvisor
This unofficial Tripadvisor API is a data extraction tool able to get data on hotels, restaurants, things to do, vacation rentals, attractions, tours, and public trips. Get pricing, contact details, amenities, awards, ratings, and more. Download your data in Excel, JSON, CSV, and other formats.

Please publish Input value for all records, not just for invalid URLs

Closed

buddy_props opened this issue
2 months ago

Tripadvisor updates their ID for a place from time to time. When I rescrape a listing to get the most current information, and the ID has changed, Tripadvisor redirects to the place listing with the new ID. When this happens, I have no way to map it back to my existing listing and I get duplicates.

I noticed that when an inputted URL is scraped and no longer exists (or the location ID is invalid), this actor also outputs the Input URL. Can you please add the Input URL to all records? That will allow me to know the original input URL so if the place listing is redirected by Tripadvisor, I can map it back to the original URL and not create a new duplicate.

lukas.prusa

Hi, thanks for opening this issue!

Are you sure TripAdvisor really updates its hotel IDs? We've never seen a hotel change its ID, and it wouldn't make sense from TripAdvisor's perspective, since many users rely on the ID staying permanent in bookmarked URLs. Do you have a specific example of such a hotel?

Either way, this feature still makes perfect sense to us :) We will add it to the scraper, thanks for the suggestion!

I will keep you updated here, thanks!

dhlee3

2 months ago

I don't know about hotels, but for restaurants they do.

For example, here are two URLs that redirect to the same business with a different ID:
https://www.tripadvisor.com/Restaurant_Review-g60827-d26101341-Reviews-Ayat-Brooklyn_New_York.html
https://www.tripadvisor.com/Restaurant_Review-g60763-d611426-Reviews-Buddakan-New_York_City_New_York.html

Glad you can add it! It will be very useful!

lukas.prusa

Oh I see, yes, the first URL redirects for me with a different ID to https://www.tripadvisor.com/Restaurant_Review-g60827-d21337564-Reviews-Ayat_Bayridge-Brooklyn_New_York.html

The second URL doesn't redirect for me, but the first one does.

We will add this feature shortly, thank you!
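The ID change confirmed above (d26101341 redirecting to d21337564) can be detected by comparing the location ID in the input URL with the one in the URL the scraper ends up on. The following is a minimal sketch, not part of the Actor: it assumes the location ID is the run of digits after `-d` in the URL path, which matches the URLs quoted in this thread but is not a documented Tripadvisor guarantee. The helper names `location_id` and `id_changed` are hypothetical.

```python
import re
from typing import Optional

# Assumption: the location ID is the digit run in the "-d<digits>-" URL segment,
# as seen in the URLs quoted in this thread.
_LOCATION_ID = re.compile(r"-d(\d+)-")


def location_id(url: str) -> Optional[str]:
    """Return the location ID found in a Tripadvisor URL, or None if absent."""
    match = _LOCATION_ID.search(url)
    return match.group(1) if match else None


def id_changed(input_url: str, final_url: str) -> bool:
    """True when both URLs carry a location ID and the two IDs differ."""
    before = location_id(input_url)
    after = location_id(final_url)
    return before is not None and after is not None and before != after


if __name__ == "__main__":
    input_url = (
        "https://www.tripadvisor.com/Restaurant_Review-g60827-d26101341"
        "-Reviews-Ayat-Brooklyn_New_York.html"
    )
    final_url = (
        "https://www.tripadvisor.com/Restaurant_Review-g60827-d21337564"
        "-Reviews-Ayat_Bayridge-Brooklyn_New_York.html"
    )
    print(location_id(input_url), location_id(final_url), id_changed(input_url, final_url))
```

A record flagged by `id_changed` is a candidate for updating the stored ID on the existing listing rather than creating a new one.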

mvolfik

We have just released version 0.0.137 which adds this output field.

buddy_props

2 months ago

Thank you! This is very helpful!

buddy_props

2 months ago

Actually, one small request.

When the input URL gives an error, the field you show for the error record is "Input". Can that be renamed to "inputQueryOrUrl" as well, so it matches the output of successfully scraped records? I think it's basically the same field.

For example, in this run I have two columns, "Input" and "inputQueryOrUrl". When I parse the records, it would be much easier if both used the same column name. https://console.apify.com/organization/ehoaF7l5kHY9MMfs1/actors/dbEyMBriog95Fv8CW/runs/oSgueZqmcpZAdVHN0#output

BTW, this is an example of a place URL that has been removed by Tripadvisor. I can use the input url to map it back to my record and mark it as no longer in business. https://www.tripadvisor.com/Restaurant_Review-g60763-d7180903-Reviews-Bar_SixtyFive-New_York_City_New_York.html
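Until the column names are unified, the two fields can be folded into one key on the consumer side. This is only a sketch under stated assumptions: the field names "Input" and "inputQueryOrUrl" come from this thread, and "input" is included because it is the name mentioned later in the thread. The record shape, the `error` key, and the helper names `original_input` and `index_by_input` are hypothetical and not taken from the Actor's documentation.

```python
from typing import Any, Dict, Iterable, Optional

# Field names reported in this thread; the first non-empty one wins.
INPUT_KEYS = ("inputQueryOrUrl", "Input", "input")


def original_input(record: Dict[str, Any]) -> Optional[str]:
    """Return the original input URL or query of a record, whichever key holds it."""
    for key in INPUT_KEYS:
        value = record.get(key)
        if value:
            return value
    return None


def index_by_input(records: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Map each original input to its record so it can be joined to stored listings."""
    indexed: Dict[str, Dict[str, Any]] = {}
    for record in records:
        key = original_input(record)
        if key is not None:
            indexed[key] = record
    return indexed


if __name__ == "__main__":
    # Hypothetical records: one successful item and one error item.
    records = [
        {"inputQueryOrUrl": "https://www.tripadvisor.com/Restaurant_Review-g60827-d26101341-Reviews-Ayat-Brooklyn_New_York.html",
         "name": "Ayat Bayridge"},
        {"Input": "https://www.tripadvisor.com/Restaurant_Review-g60763-d7180903-Reviews-Bar_SixtyFive-New_York_City_New_York.html",
         "error": "not found"},  # "error" is an assumed key, for illustration only
    ]
    for url, record in index_by_input(records).items():
        status = "removed" if "error" in record else "ok"
        print(status, url)
```

Keying on the original input rather than on the scraped ID means a redirected listing updates the existing row, and an error record can be used to mark the stored listing as no longer in business.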

mvolfik

Sorry, I forgot to ping here. Maybe you already noticed: on Monday we released an update, and the field is now named input in both error and normal items.

dhlee3

2 months ago

Great news - thank you!

Developer
Maintained by Apify
Actor metrics
  • 365 monthly users
  • 74 stars
  • 97.3% runs succeeded
  • 4.8 days response time
  • Created in Nov 2019
  • Modified 12 days ago