
YouTube Video Comment Scraper v2
Pricing
$6.00/month + usage

Extract comments from YouTube videos. Supports pagination, filtering, and several output formats.
Total users: 2
Monthly users: 2
Runs succeeded: >99%
Last modified: 4 days ago
YouTube Comments Scraper
A production-ready Apify actor that extracts comments from YouTube videos using the youtube-comment-downloader library. Optimized for efficient deployment and execution on the Apify platform.
Features
- Multiple Input Methods: YouTube URLs or direct video IDs
- Flexible Output: JSON, CSV, and Excel export formats
- Pagination Support: Scrape all comments or limit by count
- Reply Handling: Include or exclude comment replies
- Language Support: Configure language for YouTube generated text
- Advanced Retry Logic: Multi-tiered retry mechanisms with exponential backoff
- Memory Management: Efficient memory usage with monitoring
- Proxy Support: Apify proxy integration for reliable scraping
- Anti-Bot Evasion: Advanced techniques to bypass YouTube's detection systems
- Session Management: Realistic browser simulation with rotating headers
- Fallback Strategies: Multiple approaches when primary methods fail
Input Configuration
Required Fields
You must provide either videoUrls or videoIds:
{"videoUrls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"]}
OR
{"videoIds": ["dQw4w9WgXcQ"]}
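Since either field identifies the same videos, a caller may want to normalize both into one list of IDs before submitting a run. The following is a minimal sketch of that idea using only the standard library; the helper names `extract_video_id` and `resolve_video_ids` are hypothetical and are not part of the actor.

```python
from typing import List, Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a watch URL or youtu.be link, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
        return candidate or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        # Standard watch URLs carry the ID in the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


def resolve_video_ids(actor_input: dict) -> List[str]:
    """Merge videoIds and videoUrls into one ordered list without duplicates."""
    ids = list(actor_input.get("videoIds") or [])
    for url in actor_input.get("videoUrls") or []:
        video_id = extract_video_id(url)
        if video_id:
            ids.append(video_id)
    if not ids:
        raise ValueError("Provide at least one entry in videoUrls or videoIds")
    # dict.fromkeys keeps first-seen order while dropping repeats.
    return list(dict.fromkeys(ids))
```

For example, `resolve_video_ids({"videoUrls": ["https://youtu.be/dQw4w9WgXcQ"]})` yields a single-element list containing the ID from the short link.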
Optional Configuration
{
  "maxComments": 1000,
  "sortBy": "time",
  "language": "en",
  "includeReplies": true,
  "outputFormat": "json",
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  },
  "retryOptions": {
    "maxRetries": 3,
    "retryDelay": 5
  },
  "memoryMbytes": 2048,
  "debugMode": false
}
Input Parameters
Parameter | Type | Default | Description |
---|---|---|---|
videoUrls | array | - | List of YouTube video URLs to scrape |
videoIds | array | - | Alternative: List of YouTube video IDs |
maxComments | integer | 1000 | Maximum comments per video (0 = unlimited) |
sortBy | string | "time" | Sort order: "time" or "top" |
language | string | "en" | Language code for YouTube text |
includeReplies | boolean | true | Whether to include comment replies |
outputFormat | string | "json" | Output format: "json", "csv", or "xlsx" |
proxyConfiguration | object | - | Apify proxy settings |
retryOptions | object | - | Retry configuration |
memoryMbytes | integer | 2048 | Memory limit in MB |
debugMode | boolean | false | Enable debug logging |
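The defaults and allowed values in the table can be checked client-side before a run is started. The sketch below merges user input over the documented defaults and rejects values the table does not list. The function name `build_input` is hypothetical, and the actor's own validation may differ.

```python
# Defaults copied from the parameter table; proxyConfiguration and
# retryOptions have no documented default, so they are left out.
DEFAULTS = {
    "maxComments": 1000,
    "sortBy": "time",
    "language": "en",
    "includeReplies": True,
    "outputFormat": "json",
    "memoryMbytes": 2048,
    "debugMode": False,
}

# Enumerated values named in the table.
ALLOWED = {
    "sortBy": {"time", "top"},
    "outputFormat": {"json", "csv", "xlsx"},
}


def build_input(user_input: dict) -> dict:
    """Merge user input over the documented defaults and validate it."""
    merged = {**DEFAULTS, **user_input}
    if not merged.get("videoUrls") and not merged.get("videoIds"):
        raise ValueError("Either videoUrls or videoIds is required")
    for key, allowed in ALLOWED.items():
        if merged[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}")
    max_comments = merged["maxComments"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(max_comments, bool) or not isinstance(max_comments, int) or max_comments < 0:
        raise ValueError("maxComments must be a non-negative integer (0 = unlimited)")
    return merged
```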
Output Data
Each comment includes:
{
  "video_id": "dQw4w9WgXcQ",
  "video_title": "Rick Astley - Never Gonna Give You Up",
  "comment_id": "UgxKREWxIgDrw8w2wOp4AaABAg",
  "author": "John Doe",
  "text": "Great song!",
  "likes": 42,
  "published_at": "2 years ago",
  "published_timestamp": 1640995200,
  "is_reply": false,
  "reply_count": 3,
  "scraped_at": "2024-01-15T10:30:00.000Z"
}
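Records in this shape are easy to post-process. As an illustration, the sketch below converts `published_timestamp` (assumed here to be Unix seconds in UTC, which matches the sample value) into an ISO 8601 string and writes selected fields to CSV with the standard library. The function name `comments_to_csv` and the `published_utc` column are hypothetical.

```python
import csv
import io
from datetime import datetime, timezone

# Columns to export; published_utc is derived, the rest come from the record.
FIELDS = ["video_id", "comment_id", "author", "text", "likes", "published_utc", "is_reply"]


def comments_to_csv(comments: list) -> str:
    """Serialize comment records to CSV text with a derived UTC timestamp column."""
    buffer = io.StringIO()
    # extrasaction="ignore" skips record keys that are not in FIELDS.
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for comment in comments:
        row = dict(comment)
        row["published_utc"] = datetime.fromtimestamp(
            comment["published_timestamp"], tz=timezone.utc
        ).isoformat()
        writer.writerow(row)
    return buffer.getvalue()
```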
Usage Examples
Basic Usage
{"videoUrls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],"maxComments": 500}
Advanced Configuration
{
  "videoUrls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://www.youtube.com/watch?v=ScMzIvxBSi4"
  ],
  "maxComments": 2000,
  "sortBy": "top",
  "includeReplies": true,
  "outputFormat": "csv",
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
Performance & Limitations
- Memory Management: Default 2GB allocation with automatic monitoring
- Rate Limiting: Built-in delays and retry mechanisms with anti-bot evasion
- Comment Availability: Only public comments are accessible
- Video Restrictions: Private, deleted, or restricted videos cannot be scraped
- Anti-Bot Measures: YouTube actively blocks automated requests; actor includes advanced evasion techniques
- Success Rate: ~95% success rate with residential proxies, varies by video and region
Anti-Bot Protection
This actor implements sophisticated anti-bot evasion techniques based on extensive research:
Multi-Tiered Approach
- Tier 1: Enhanced Browser Simulation
  - Realistic browser headers with randomization
  - Proper cookie management and session handling
  - Random delays between requests (2-5 seconds)
- Tier 2: Alternative Methods
  - Direct video ID approach with extended delays
  - Fresh downloader instances to avoid session tracking
  - Longer random delays (5-10 seconds)
- Tier 3: Minimal Approach
  - Stripped-down requests with maximum delays
  - Very long delays (10-20 seconds) to avoid rate limiting
- Tier 4: Last Resort
  - Alternative URL formats (youtu.be vs youtube.com)
  - Maximum delays (15-30 seconds)
  - Final attempt before failure
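The tiered fallback can be pictured as a loop that tries one strategy per tier and waits a random, tier-specific delay before each attempt. The sketch below illustrates that control flow only. The delay ranges come from the tiers listed here, while `fetch_with_tiers` and the strategy callables are hypothetical stand-ins, not the actor's actual code.

```python
import random
import time
from typing import Callable, Sequence

# (tier name, minimum delay, maximum delay) in seconds, per the tier list.
TIERS = [
    ("enhanced browser simulation", 2, 5),
    ("alternative methods", 5, 10),
    ("minimal approach", 10, 20),
    ("last resort", 15, 30),
]


def fetch_with_tiers(
    video_id: str,
    strategies: Sequence[Callable[[str], list]],
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Try one strategy per tier in order and return the first successful result."""
    errors = []
    for (name, low, high), strategy in zip(TIERS, strategies):
        # Random delay inside the tier's range before each attempt.
        sleep(random.uniform(low, high))
        try:
            return strategy(video_id)
        except Exception as exc:  # a real implementation would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All tiers failed: " + "; ".join(errors))
```

Passing `sleep` as a parameter keeps the function testable: a test can supply a recorder instead of actually waiting.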
Recommended Solutions for Persistent Blocking
If you encounter persistent "JSON parsing errors" or empty responses:
- Use Apify Residential Proxies: Enable in proxyConfiguration
- Reduce Request Frequency: Lower maxComments and add delays
- Try Different Videos: Some videos have stronger protection
- Geographic Considerations: Some content may be region-restricted
- Timing: YouTube's anti-bot measures vary by time of day
Error Messages and Solutions
Error | Cause | Solution |
---|---|---|
"Expecting value: line 1 column 1 (char 0)" | Anti-bot blocking | Use residential proxies, reduce frequency |
"Comments disabled" | Video settings | Try different videos |
"Age restrictions" | Content policy | Not supported (would require authenticated requests) |
"Geographic restrictions" | Regional blocking | Use proxies from different regions |
Technical Details
Dependencies
- apify>=1.7.0 - Apify SDK for Python
- youtube-comment-downloader>=0.1.76 - Core scraping library
- pandas>=2.0.0 - Data processing and export
- tenacity>=8.2.0 - Retry mechanisms
- psutil>=5.9.0 - Memory monitoring
- openpyxl>=3.1.0 - Excel export support
Error Handling
- Advanced Retry Logic: 5-tier retry system with exponential backoff (2-60 seconds)
- Anti-Bot Recovery: Automatic detection and recovery from blocking
- Memory Protection: Prevents out-of-memory errors with real-time monitoring
- Graceful Degradation: Continues processing if individual videos fail
- Comprehensive Logging: Detailed error reporting with specific failure reasons
- Session Rotation: Fresh sessions and headers for each retry attempt
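To make the exponential backoff concrete, the sketch below computes a delay schedule that starts at 2 seconds, doubles on each retry, and is capped at 60 seconds, then wraps a callable in that schedule. It uses only the standard library to stay self-contained. The actor lists tenacity as its retry dependency, so its real implementation likely differs, and the names `backoff_delays` and `call_with_retries` are hypothetical.

```python
import time
from typing import Callable, List


def backoff_delays(max_retries: int, base: float = 2.0, cap: float = 60.0) -> List[float]:
    """Delay before each retry: base * 2**attempt, never above cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def call_with_retries(
    func: Callable[[], object],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
):
    """Call func once, then retry up to max_retries times with backoff."""
    last_error = None
    # A leading 0.0 means the first attempt runs without waiting.
    for delay in [0.0] + backoff_delays(max_retries):
        if delay:
            sleep(delay)
        try:
            return func()
        except Exception as exc:  # narrow this to expected errors in real code
            last_error = exc
    raise last_error
```

The default of three retries mirrors the `maxRetries` value shown in the optional configuration example.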
Best Practices
- Use Residential Proxies: Enable Apify residential proxies for best success rates
- Set Reasonable Limits: Use maxComments to control resource usage and avoid detection
- Monitor Memory: Increase memory allocation for large datasets
- Handle Errors: Check logs for failed videos or comments
- Respect Rate Limits: Don't scrape too aggressively to avoid IP blocking
- Test with Different Videos: Some videos have stronger anti-bot protection
- Use Delays: Allow sufficient time between requests (built-in delays are optimized)
- Monitor Success Rates: Check actor logs for blocking patterns
Troubleshooting
Common Issues
"JSON parsing error" or empty responses:
- YouTube's anti-bot system is blocking requests
- Solution: Enable residential proxies, reduce request frequency
- The actor automatically implements 4-tier recovery strategies
"All tiers failed" error:
- Very strong anti-bot protection on a specific video
- Solution: Try different videos, use residential proxies, wait and retry
Memory issues:
- Large datasets exceeding memory limits
- Solution: Increase memoryMbytes or reduce maxComments
Slow performance:
- Anti-bot delays are necessary for success
- Solution: This is expected behavior to avoid detection
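When triaging failed runs from the logs, the error strings named in this document can be mapped to their likely cause and suggested fix. The sketch below does this with simple substring matching. The `diagnose` helper is hypothetical, and the actual log wording may vary from the quoted messages.

```python
# (substring to look for, likely cause, suggested action), drawn from the
# error messages and troubleshooting notes in this document.
ERROR_HINTS = [
    ("expecting value", "Anti-bot blocking", "Use residential proxies, reduce frequency"),
    ("json parsing", "Anti-bot blocking", "Use residential proxies, reduce frequency"),
    ("all tiers failed", "Strong anti-bot protection", "Try different videos, use residential proxies, wait and retry"),
    ("comments disabled", "Video settings", "Try different videos"),
    ("age restriction", "Content policy", "Not supported (would require authenticated requests)"),
    ("geographic restriction", "Regional blocking", "Use proxies from different regions"),
]


def diagnose(message: str) -> tuple:
    """Return (cause, suggestion) for a log message, or a generic fallback."""
    lowered = message.lower()
    for needle, cause, suggestion in ERROR_HINTS:
        if needle in lowered:
            return cause, suggestion
    return "Unknown", "Check the actor logs for details"
```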
Research-Based Optimizations
This actor incorporates findings from extensive research on YouTube scraping:
- Browser fingerprint randomization prevents detection
- Session management mimics real user behavior
- Multi-tier fallback strategies ensure maximum success rates
- Realistic delays and headers based on successful scraping patterns
License
This actor uses the youtube-comment-downloader library under MIT license terms.