Tripadvisor Review Scraper

Developed by HappiTap. Maintained by Community.

Pricing: Pay per event

A specialized Apify actor that extracts detailed reviews from TripAdvisor hotels, restaurants, and attractions. Features advanced anti-bot measures, residential proxy support, and comprehensive review data extraction.

What It Does

This scraper extracts structured review data from TripAdvisor review pages. Each review record contains the following fields:

  • reviewId - Unique review identifier
  • title - Review title/headline
  • text - Full review text
  • rating - Star rating (1-5 scale)
  • date - Review publication date as shown on TripAdvisor (e.g. "December 2024")
  • author.name - Reviewer's name
  • author.location - Reviewer's location
  • helpfulCount - Number of helpful votes
  • photos - Array of review photo URLs (optional; populated when includePhotos is true)
  • url - Source TripAdvisor page URL
  • scrapedAt - Timestamp of data extraction (ISO 8601)

Use Cases

  • Sentiment Analysis: Analyze customer sentiment and satisfaction trends
  • Competitor Research: Monitor reviews for competing hotels/restaurants
  • Reputation Management: Track review patterns and identify improvement areas
  • Market Research: Understand customer preferences and pain points
  • Review Monitoring: Get alerts for new reviews and rating changes
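For review monitoring, one simple approach is to compare the items from the latest run with those from the previous run. Any reviewId that was not seen before counts as a new review. The sketch below assumes both runs' items are already loaded as arrays of review objects. findNewReviews is a hypothetical helper and is not part of the actor.

```javascript
// Hypothetical helper (not part of the actor): return reviews present in
// `current` whose reviewId does not appear in `previous`.
function findNewReviews(previous, current) {
  const seen = new Set(previous.map((review) => review.reviewId));
  return current.filter((review) => !seen.has(review.reviewId));
}

// Example with two small, made-up runs.
const lastRun = [{ reviewId: "review_1", rating: 4 }];
const thisRun = [
  { reviewId: "review_1", rating: 4 },
  { reviewId: "review_2", rating: 2 },
];
console.log(findNewReviews(lastRun, thisRun).map((r) => r.reviewId)); // [ 'review_2' ]
```

Matching on reviewId rather than on text avoids false "new" alerts when a reviewer edits an existing review.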

Input

The actor accepts the following input format:

{
  "startUrls": [
    { "url": "https://www.tripadvisor.com/Hotel_Review-g60763-d93452-Reviews-The_Plaza_Hotel-New_York_City_New_York.html" }
  ],
  "maxReviews": 100,
  "includePhotos": false
}

Input Parameters

  • startUrls (Array, required) - Array of objects with a url property pointing to TripAdvisor review pages
  • maxReviews (Number, optional, default 100) - Maximum number of reviews to extract per start URL; the scraper follows pagination until this limit is reached
  • includePhotos (Boolean, optional, default false) - Whether to extract photo URLs from reviews
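Applying the parameter defaults and checking the required field takes only a few lines. The following is a hedged sketch of such validation, not the actor's actual src/main.js. The function name normalizeInput and the error messages are assumptions.

```javascript
// Hypothetical sketch (not the actor's src/main.js): validate input and
// apply the documented defaults (maxReviews = 100, includePhotos = false).
function normalizeInput(input = {}) {
  const { startUrls, maxReviews = 100, includePhotos = false } = input;

  // startUrls is required: a non-empty array of objects with a string url.
  if (!Array.isArray(startUrls) || startUrls.length === 0) {
    throw new Error("startUrls must be a non-empty array");
  }
  for (const entry of startUrls) {
    if (!entry || typeof entry.url !== "string") {
      throw new Error("every startUrls entry needs a string url property");
    }
  }

  // maxReviews must be a positive integer; includePhotos must be a boolean.
  if (!Number.isInteger(maxReviews) || maxReviews < 1) {
    throw new Error("maxReviews must be a positive integer");
  }
  if (typeof includePhotos !== "boolean") {
    throw new Error("includePhotos must be a boolean");
  }

  return { startUrls, maxReviews, includePhotos };
}

const input = normalizeInput({
  startUrls: [
    { url: "https://www.tripadvisor.com/Hotel_Review-g60763-d93452-Reviews-The_Plaza_Hotel-New_York_City_New_York.html" },
  ],
});
console.log(input.maxReviews, input.includePhotos); // 100 false
```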

Supported TripAdvisor URLs

The scraper works with TripAdvisor review pages for:

  • Hotels: https://www.tripadvisor.com/Hotel_Review-g60763-d93452-Reviews-The_Plaza_Hotel-New_York_City_New_York.html
  • Restaurants: https://www.tripadvisor.com/Restaurant_Review-g60763-d1751194-Reviews-Eleven_Madison_Park-New_York_City_New_York.html
  • Attractions: https://www.tripadvisor.com/Attraction_Review-g60763-d104365-Reviews-Statue_of_Liberty-New_York_City_New_York.html
  • Activities: https://www.tripadvisor.com/AttractionProductReview-g60763-d11966990-Reviews-Central_Park_Walking_Tour-New_York_City_New_York.html
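The four supported page types can be told apart by the path prefix of the URL. The sketch below classifies a URL using those prefixes, as an illustration of the kind of check the actor's URL validation might perform. The function name, the returned labels, and the restriction to tripadvisor.com hosts are assumptions.

```javascript
// Hypothetical sketch: classify a TripAdvisor URL by the path prefixes of the
// supported page types. Only tripadvisor.com hosts are accepted here; regional
// domains are deliberately left out of this sketch.
const PAGE_TYPE_PREFIXES = [
  ["/Hotel_Review-", "hotel"],
  ["/Restaurant_Review-", "restaurant"],
  ["/Attraction_Review-", "attraction"],
  ["/AttractionProductReview-", "activity"],
];

function classifyTripadvisorUrl(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch (err) {
    return null; // not a valid absolute URL
  }
  if (!/(^|\.)tripadvisor\.com$/.test(url.hostname)) return null;
  for (const [prefix, pageType] of PAGE_TYPE_PREFIXES) {
    if (url.pathname.startsWith(prefix)) return pageType;
  }
  return null; // a TripAdvisor page, but not one of the supported review pages
}

console.log(
  classifyTripadvisorUrl(
    "https://www.tripadvisor.com/Restaurant_Review-g60763-d1751194-Reviews-Eleven_Madison_Park-New_York_City_New_York.html",
  ),
); // restaurant
```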

Output

The actor outputs structured data for each review found:

{
  "reviewId": "review_123456",
  "title": "Amazing stay at The Plaza!",
  "text": "We had a wonderful time at this hotel. The service was exceptional and the location perfect for exploring NYC. The rooms were clean and comfortable with beautiful views of Central Park.",
  "rating": 5.0,
  "date": "December 2024",
  "author": {
    "name": "John D",
    "location": "Los Angeles, CA"
  },
  "helpfulCount": 15,
  "photos": ["https://media-cdn.tripadvisor.com/media/photo-s/..."],
  "url": "https://www.tripadvisor.com/Hotel_Review-g60763-d93452-Reviews-The_Plaza_Hotel-New_York_City_New_York.html",
  "scrapedAt": "2025-01-15T10:30:00.000Z"
}
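Each item carries a numeric rating, so summary statistics for sentiment or reputation tracking are easy to compute downstream. This is a minimal sketch that assumes the dataset items have already been loaded into an array. summarizeRatings is a hypothetical helper and is not part of the actor.

```javascript
// Hypothetical helper: average rating and a 1-5 star histogram for review records.
function summarizeRatings(reviews) {
  const histogram = { 1: 0, 2: 0, 3: 0, 4: 0, 5: 0 };
  let total = 0;
  let count = 0;
  for (const { rating } of reviews) {
    // Skip records without a usable rating in the documented 1-5 range.
    if (typeof rating !== "number" || rating < 1 || rating > 5) continue;
    histogram[Math.round(rating)] += 1;
    total += rating;
    count += 1;
  }
  return { count, average: count ? total / count : null, histogram };
}

const summary = summarizeRatings([{ rating: 5.0 }, { rating: 4.0 }, { rating: 3.0 }]);
console.log(summary.average); // 4
```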

Example Usage

Single Hotel Reviews

{
  "startUrls": [
    { "url": "https://www.tripadvisor.com/Hotel_Review-g60763-d93452-Reviews-The_Plaza_Hotel-New_York_City_New_York.html" }
  ],
  "maxReviews": 100,
  "includePhotos": false
}

Multiple Properties

{
  "startUrls": [
    { "url": "https://www.tripadvisor.com/Hotel_Review-g60763-d93452-Reviews-The_Plaza_Hotel-New_York_City_New_York.html" },
    { "url": "https://www.tripadvisor.com/Restaurant_Review-g60763-d1751194-Reviews-Eleven_Madison_Park-New_York_City_New_York.html" }
  ],
  "maxReviews": 50,
  "includePhotos": false
}

Reviews with Photos

{
  "startUrls": [
    { "url": "https://www.tripadvisor.com/Hotel_Review-g60763-d93452-Reviews-The_Plaza_Hotel-New_York_City_New_York.html" }
  ],
  "maxReviews": 25,
  "includePhotos": true
}

How It Works

  1. Page Navigation: Uses Puppeteer in stealth mode, routed through residential proxies, to load TripAdvisor review pages
  2. Anti-Bot Bypass: Sends realistic browser headers, uses human-like timing, and rotates residential IPs to avoid detection
  3. Review Detection: Locates review containers by trying multiple CSS selectors, so extraction keeps working when TripAdvisor changes its markup
  4. Data Extraction: Extracts review text, ratings, author info, and (optionally) photos
  5. Pagination: Automatically navigates through review pages until the maxReviews limit is reached
  6. Structured Output: Returns clean, structured review data ready for analysis
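The pagination step can be illustrated by working out which review pages are needed to reach maxReviews. This sketch rests on assumptions this README does not confirm. It assumes TripAdvisor pages reviews by inserting an "-orN-" offset after "-Reviews" in the URL and shows 10 reviews per page. The actor may paginate differently, for example by clicking a next-page link, and page size may vary by page type. paginatedUrls is a hypothetical name.

```javascript
// Hypothetical sketch: build the review-page URLs needed to reach maxReviews,
// ASSUMING an "-orN-" offset segment in the URL and a fixed page size.
function paginatedUrls(startUrl, maxReviews, pageSize = 10) {
  const pageCount = Math.ceil(maxReviews / pageSize);
  const urls = [];
  for (let page = 0; page < pageCount; page += 1) {
    const offset = page * pageSize;
    // First page: the URL as given. Later pages: insert "-or<offset>" after "-Reviews".
    urls.push(offset === 0 ? startUrl : startUrl.replace("-Reviews-", `-Reviews-or${offset}-`));
  }
  return urls;
}

const planned = paginatedUrls(
  "https://www.tripadvisor.com/Hotel_Review-g60763-d93452-Reviews-The_Plaza_Hotel-New_York_City_New_York.html",
  25,
);
console.log(planned.length); // 3
```

A scraper that follows next-page links instead does not depend on the URL format. This version is easier to reason about, but it breaks if the URL scheme changes.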

Features

  • Residential Proxy Support: Uses Apify's residential proxy network for better anti-bot bypass
  • Advanced Anti-Detection: Realistic browser headers, timing, and stealth mode
  • Robust Extraction: Multiple fallback selectors to handle TripAdvisor's changing page structure
  • Review Pagination: Automatically navigates through multiple pages of reviews
  • Photo Extraction: Optional extraction of review photos and media
  • Error Handling: Graceful error handling with detailed logging and blocking detection
  • Data Validation: Ensures data quality with validation checks
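The fallback-selector approach behind Robust Extraction comes down to trying candidate selectors in order and keeping the first one that matches. In this sketch the selector strings are placeholders, not TripAdvisor's real markup, and findWithFallback is a hypothetical name. The query function is injected so the logic can run outside a browser. In a page context it could wrap document.querySelectorAll.

```javascript
// Placeholder selectors (assumptions, not TripAdvisor's actual markup),
// tried in the order listed.
const REVIEW_CONTAINER_SELECTORS = [
  "[data-review-id]",
  ".review-card",
  ".review-container",
];

// Return the first selector that matches at least one element, plus its matches.
// `queryAll` is any function mapping a selector to a list-like of elements.
function findWithFallback(selectors, queryAll) {
  for (const selector of selectors) {
    const matches = Array.from(queryAll(selector));
    if (matches.length > 0) {
      return { selector, matches };
    }
  }
  // Nothing matched: the layout may have changed or the page may be a block page.
  return { selector: null, matches: [] };
}

// In a browser context, for example:
// const { selector, matches } = findWithFallback(
//   REVIEW_CONTAINER_SELECTORS,
//   (s) => document.querySelectorAll(s),
// );
```

Returning which selector matched, not just the elements, makes it easy to log when the scraper falls back to a less preferred selector.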

Installation

  1. Clone this repository
  2. Install dependencies: npm install
  3. Run the actor: npm start

Development

  • npm start - Run the actor
  • npm run format - Format code with Prettier
  • npm run lint - Run ESLint
  • npm run lint:fix - Fix ESLint issues
  • node test-tripadvisor.js - Test TripAdvisor functionality
  • node test-with-proxy.js - Test with Apify residential proxies

Architecture

  • src/main.js - Main entry point and input validation for TripAdvisor URLs
  • src/routes.js - Request routing with TripAdvisor URL validation
  • src/handlers/tripadvisorReviews.js - TripAdvisor review scraping logic with anti-bot measures
  • src/puppeteerLauncher.js - Puppeteer browser configuration with stealth mode

Deployment

apify push

Local Testing with Apify Token

export APIFY_TOKEN=your_apify_token_here
node src/main.js

Notes

  • Residential Proxies Required: TripAdvisor actively blocks datacenter IPs. Deploy to the Apify platform, or run locally with a valid Apify token, to get residential proxy access
  • Anti-Bot Measures: The scraper includes advanced anti-detection measures, but TripAdvisor's blocking is sophisticated and some requests may still be blocked
  • Success Rate: Best results come from running on the Apify platform with residential proxy rotation
  • Pagination Support: Automatically navigates through multiple review pages to reach the maxReviews limit
  • All extracted data is timestamped (scrapedAt) for tracking purposes
  • The scraper is designed to be respectful of TripAdvisor's servers, adding realistic delays between requests
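The realistic delays mentioned above amount to waiting a randomized interval between requests. This is a minimal sketch. The 2 to 5 second range and the function names are assumptions, not the actor's configuration.

```javascript
// Hypothetical sketch: pick a jittered delay. The 2-5 second default range is
// an assumption. rng() returns a value in [0, 1), so for integer bounds the
// result lies in [minMs, maxMs).
function randomDelayMs(minMs = 2000, maxMs = 5000, rng = Math.random) {
  return Math.floor(minMs + rng() * (maxMs - minMs));
}

// Resolve after `ms` milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Usage between two page navigations (inside an async function):
// await sleep(randomDelayMs());
```

Injecting `rng` keeps the delay calculation deterministic for tests while defaulting to Math.random in real runs.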