YouTube Comments Scraper

A production-ready Apify actor that extracts comments from YouTube videos using the youtube-comment-downloader library. Optimized for efficient deployment and execution on the Apify platform.
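
The snippet below is a minimal sketch of how the underlying youtube-comment-downloader library is typically used; the URL is just an example and this is not the actor's internal code:

from itertools import islice
from youtube_comment_downloader import YoutubeCommentDownloader, SORT_BY_POPULAR

# Stream comments for a single video and take the first ten.
downloader = YoutubeCommentDownloader()
comments = downloader.get_comments_from_url(
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    sort_by=SORT_BY_POPULAR,
)
for comment in islice(comments, 10):
    print(comment["text"])

The actor wraps this comment generator with the pagination limits, retries, and proxy handling described below.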

Features

  • 🎯 Multiple Input Methods: YouTube URLs or direct video IDs
  • 📊 Flexible Output: JSON, CSV, and Excel export formats
  • 🔄 Pagination Support: Scrape all comments or limit by count
  • 💬 Reply Handling: Include or exclude comment replies
  • 🌍 Language Support: Configure the language for YouTube-generated text
  • 🔄 Advanced Retry Logic: Multi-tiered retry mechanisms with exponential backoff
  • 📈 Memory Management: Efficient memory usage with monitoring
  • 🔒 Proxy Support: Apify proxy integration for reliable scraping
  • 🛡️ Anti-Bot Evasion: Advanced techniques to bypass YouTube's detection systems
  • 🔄 Session Management: Realistic browser simulation with rotating headers
  • ⚡ Fallback Strategies: Multiple approaches when primary methods fail

Input Configuration

Required Fields

You must provide either videoUrls or videoIds:

{
  "videoUrls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
  ]
}

OR

{
  "videoIds": [
    "dQw4w9WgXcQ"
  ]
}
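
Both forms identify the same video. As a rough, hypothetical sketch of how a URL can be reduced to its 11-character video ID (the regex and helper name are illustrative, not the actor's own parsing):

import re

# Matches watch?v=, youtu.be/ and shorts/ style links (simplified).
VIDEO_ID_RE = re.compile(r"(?:v=|youtu\.be/|shorts/)([A-Za-z0-9_-]{11})")

def to_video_id(url_or_id: str) -> str:
    """Accept either a full YouTube URL or a bare 11-character video ID."""
    match = VIDEO_ID_RE.search(url_or_id)
    return match.group(1) if match else url_or_id

print(to_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))  # dQw4w9WgXcQ
print(to_video_id("dQw4w9WgXcQ"))                                  # dQw4w9WgXcQ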

Optional Configuration

{
  "maxComments": 1000,
  "sortBy": "time",
  "language": "en",
  "includeReplies": true,
  "outputFormat": "json",
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  },
  "retryOptions": {
    "maxRetries": 3,
    "retryDelay": 5
  },
  "memoryMbytes": 2048,
  "debugMode": false
}
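
For orientation, here is a sketch of how a Python actor can turn the proxyConfiguration field into a usable proxy URL with the Apify SDK; this is an assumption about the implementation, not this actor's actual code:

from apify import Actor

async def main() -> None:
    async with Actor:
        actor_input = await Actor.get_input() or {}
        proxy_settings = actor_input.get("proxyConfiguration") or {}

        proxy_config = None
        if proxy_settings.get("useApifyProxy"):
            # Map the input field onto an Apify proxy configuration.
            proxy_config = await Actor.create_proxy_configuration(
                groups=proxy_settings.get("apifyProxyGroups"),
            )

        proxy_url = await proxy_config.new_url() if proxy_config else None
        Actor.log.info(f"Proxy in use: {proxy_url}")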

Input Parameters

Parameter | Type | Default | Description
videoUrls | array | - | List of YouTube video URLs to scrape
videoIds | array | - | Alternative: List of YouTube video IDs
maxComments | integer | 1000 | Maximum comments per video (0 = unlimited)
sortBy | string | "time" | Sort order: "time" or "top"
language | string | "en" | Language code for YouTube text
includeReplies | boolean | true | Whether to include comment replies
outputFormat | string | "json" | Output format: "json", "csv", or "xlsx"
proxyConfiguration | object | - | Apify proxy settings
retryOptions | object | - | Retry configuration
memoryMbytes | integer | 2048 | Memory limit in MB
debugMode | boolean | false | Enable debug logging

Output Data

Each comment includes:

{
  "video_id": "dQw4w9WgXcQ",
  "video_title": "Rick Astley - Never Gonna Give You Up",
  "comment_id": "UgxKREWxIgDrw8w2wOp4AaABAg",
  "author": "John Doe",
  "text": "Great song!",
  "likes": 42,
  "published_at": "2 years ago",
  "published_timestamp": 1640995200,
  "is_reply": false,
  "reply_count": 3,
  "scraped_at": "2024-01-15T10:30:00.000Z"
}
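
As a rough illustration of the CSV and Excel output options, records in this shape can be exported with pandas; the file names and the sample record below are placeholders for this sketch:

import pandas as pd

# "comments" is a list of dicts shaped like the record above.
comments = [
    {"video_id": "dQw4w9WgXcQ", "author": "John Doe", "text": "Great song!", "likes": 42},
]

df = pd.DataFrame(comments)
df.to_csv("comments.csv", index=False)     # outputFormat: "csv"
df.to_excel("comments.xlsx", index=False)  # outputFormat: "xlsx" (needs openpyxl)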

Usage Examples

Basic Usage

{
  "videoUrls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
  "maxComments": 500
}

Advanced Configuration

{
  "videoUrls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://www.youtube.com/watch?v=ScMzIvxBSi4"
  ],
  "maxComments": 2000,
  "sortBy": "top",
  "includeReplies": true,
  "outputFormat": "csv",
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
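
To run the actor programmatically, an input like the one above can be passed through the Apify API client for Python. This is a sketch that assumes results land in the run's default dataset; the token and actor ID are placeholders:

from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

run_input = {
    "videoUrls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
    "maxComments": 500,
}

# "<ACTOR_ID>" stands in for this actor's ID or "username/actor-name".
run = client.actor("<ACTOR_ID>").call(run_input=run_input)

# Read scraped comments from the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["author"], item["text"])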

Performance & Limitations

  • Memory Management: Default 2GB allocation with automatic monitoring
  • Rate Limiting: Built-in delays and retry mechanisms with anti-bot evasion
  • Comment Availability: Only public comments are accessible
  • Video Restrictions: Private, deleted, or restricted videos cannot be scraped
  • Anti-Bot Measures: YouTube actively blocks automated requests; actor includes advanced evasion techniques
  • Success Rate: ~95% success rate with residential proxies, varies by video and region

Anti-Bot Protection

This actor implements sophisticated anti-bot evasion techniques based on extensive research; a simplified sketch of the fallback loop follows the tier list below:

Multi-Tiered Approach

  1. Tier 1: Enhanced Browser Simulation

    • Realistic browser headers with randomization
    • Proper cookie management and session handling
    • Random delays between requests (2-5 seconds)
  2. Tier 2: Alternative Methods

    • Direct video ID approach with extended delays
    • Fresh downloader instances to avoid session tracking
    • Longer random delays (5-10 seconds)
  3. Tier 3: Minimal Approach

    • Stripped-down requests with maximum delays
    • Very long delays (10-20 seconds) to avoid rate limiting
  4. Tier 4: Last Resort

    • Alternative URL formats (youtu.be vs youtube.com)
    • Maximum delays (15-30 seconds)
    • Final attempt before failure
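
A minimal sketch of such a tiered fallback loop, assuming the delay ranges listed above; helper and constant names are illustrative, not the actor's actual implementation:

import random
import time

from youtube_comment_downloader import YoutubeCommentDownloader

# Delay ranges per tier in seconds, mirroring the list above.
TIER_DELAYS = [(2, 5), (5, 10), (10, 20), (15, 30)]

def fetch_with_fallback(video_id: str, max_comments: int = 100) -> list[dict]:
    """Try each tier in turn; a fresh downloader per attempt avoids session tracking."""
    last_error = None
    for tier, (low, high) in enumerate(TIER_DELAYS, start=1):
        time.sleep(random.uniform(low, high))    # randomized delay for this tier
        downloader = YoutubeCommentDownloader()  # fresh instance each attempt
        try:
            if tier == 4:
                # Last resort: switch to the youtu.be URL form.
                comments = downloader.get_comments_from_url(f"https://youtu.be/{video_id}")
            else:
                comments = downloader.get_comments(video_id)
            return [c for _, c in zip(range(max_comments), comments)]
        except Exception as exc:  # e.g. JSON parsing errors while blocked
            last_error = exc
    raise RuntimeError(f"All tiers failed for {video_id}") from last_error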

If you encounter persistent "JSON parsing errors" or empty responses:

  1. Use Apify Residential Proxies: Enable in proxyConfiguration
  2. Reduce Request Frequency: Lower maxComments and add delays
  3. Try Different Videos: Some videos have stronger protection
  4. Geographic Considerations: Some content may be region-restricted
  5. Timing: YouTube's anti-bot measures vary by time of day

Error Messages and Solutions

Error | Cause | Solution
"Expecting value: line 1 column 1 (char 0)" | Anti-bot blocking | Use residential proxies, reduce frequency
"Comments disabled" | Video settings | Try different videos
"Age restrictions" | Content policy | Use authenticated requests (not supported)
"Geographic restrictions" | Regional blocking | Use proxies from different regions

Technical Details

Dependencies

  • apify>=1.7.0 - Apify SDK for Python
  • youtube-comment-downloader>=0.1.76 - Core scraping library
  • pandas>=2.0.0 - Data processing and export
  • tenacity>=8.2.0 - Retry mechanisms
  • psutil>=5.9.0 - Memory monitoring
  • openpyxl>=3.1.0 - Excel export support

Error Handling

  • Advanced Retry Logic: 5-tier retry system with exponential backoff (2-60 seconds); see the sketch after this list
  • Anti-Bot Recovery: Automatic detection and recovery from blocking
  • Memory Protection: Prevents out-of-memory errors with real-time monitoring
  • Graceful Degradation: Continues processing if individual videos fail
  • Comprehensive Logging: Detailed error reporting with specific failure reasons
  • Session Rotation: Fresh sessions and headers for each retry attempt
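
The sketch below shows how this kind of exponential backoff and memory guard can be expressed with the tenacity and psutil dependencies listed above; the decorator parameters, threshold, and function names are assumptions for illustration, not the actor's exact settings:

import psutil
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

MEMORY_LIMIT_MB = 2048  # mirrors the memoryMbytes default; illustrative only

def memory_usage_mb() -> float:
    """Resident memory of the current process in megabytes."""
    return psutil.Process().memory_info().rss / (1024 * 1024)

@retry(
    stop=stop_after_attempt(5),                          # up to five attempts per page
    wait=wait_exponential(multiplier=2, min=2, max=60),  # 2-60 second backoff
    retry=retry_if_exception_type(ValueError),           # e.g. JSON decoding failures
)
def fetch_comment_page(video_id: str) -> list[dict]:
    if memory_usage_mb() > 0.9 * MEMORY_LIMIT_MB:
        # Back off before hitting the hard memory limit.
        raise MemoryError("Approaching memory limit; flush buffered results first")
    ...  # request one page of comments here (placeholder)
    return []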

Best Practices

  1. Use Residential Proxies: Enable Apify residential proxies for best success rates
  2. Set Reasonable Limits: Use maxComments to control resource usage and avoid detection
  3. Monitor Memory: Increase memory allocation for large datasets
  4. Handle Errors: Check logs for failed videos or comments
  5. Respect Rate Limits: Don't scrape too aggressively to avoid IP blocking
  6. Test with Different Videos: Some videos have stronger anti-bot protection
  7. Use Delays: Allow sufficient time between requests (built-in delays are optimized)
  8. Monitor Success Rates: Check actor logs for blocking patterns

Troubleshooting

Common Issues

"JSON parsing error" or empty responses:

  • YouTube's anti-bot system is blocking requests
  • Solution: Enable residential proxies, reduce request frequency
  • The actor automatically implements 4-tier recovery strategies

"All tiers failed" error:

  • Very strong anti-bot protection on specific video
  • Solution: Try different videos, use residential proxies, wait and retry

Memory issues:

  • Large datasets exceeding memory limits
  • Solution: Increase memoryMbytes or reduce maxComments

Slow performance:

  • The built-in anti-bot delays are necessary for success
  • Solution: This is expected behavior; removing the delays would increase the risk of blocking

Research-Based Optimizations

This actor incorporates findings from extensive research on YouTube scraping:

  • Browser fingerprint randomization prevents detection
  • Session management mimics real user behavior
  • Multi-tier fallback strategies ensure maximum success rates
  • Realistic delays and headers based on successful scraping patterns

License

This actor uses the youtube-comment-downloader library, which is distributed under the MIT license.