Awesome Google News Scraper
1 day trial then $5.00/month - No credit card required now
This tool scrapes content from Google News, streamlining the collection of the latest information on any topic. Its key feature is the ability to extract full-length articles, not just headlines. You can customize results from brief summaries to complete content.
Unlock the power of comprehensive news analysis with this cutting-edge Apify actor! Designed to revolutionize how you gather and process information, this tool doesn't just scrape headlines – it delivers entire articles right to your fingertips. By leveraging Google News as its source, our actor offers an unparalleled ability to extract, filter, and aggregate full-length news content on any topic you choose.
Features
- Full Article Extraction: Unlike standard RSS feeds or basic scrapers, this actor can retrieve the complete text of articles, giving you access to in-depth content without leaving the platform.
- Customizable Content Length: Whether you need a quick summary or the entire story, you're in control. Choose between a specific word count or opt for the full article.
- Smart Filtering: Easily exclude unwanted content with customizable keyword filters.
- Flexible Time Ranges: Stay current or research past events with adjustable time frame options.
- Streamlined Data Structure: Receive well-organized output including titles, URLs, publication dates, sources, and more.
- Optional Image Retrieval: Choose whether to fetch image URLs for articles, balancing between comprehensive data and faster performance.
Transform your news gathering process and gain deeper insights with our actor's unique ability to provide complete article content. Say goodbye to surface-level summaries and hello to comprehensive news analysis at your fingertips!
Input
The actor accepts the following input parameters:
Parameter | Type | Description |
---|---|---|
keyword | String | The search term for news (e.g., "BRICS", "Politics") |
numberOfItems | Number | The number of news items to fetch (default: 10, maximum: 100) |
filterBadKeywords | Array | Optional array of keywords to filter out unwanted news items |
contentLength | String/Number | Number of words to extract from the article or 'full' for entire content |
timeRange | String | Time range for news articles (e.g., "Past hour", "Past 24 hours", "Past week", "Past year") |
retrieveImage | Boolean | Whether to retrieve image URLs for articles (default: false) |
Example input:
```json
{
  "keyword": "Bitcoin",
  "numberOfItems": 20,
  "filterBadKeywords": ["scam", "fraud"],
  "contentLength": "200",
  "timeRange": "Past week",
  "retrieveImage": false
}
```
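The parameter table can be read as a small validation routine. The sketch below is a hypothetical client-side check, not the actor's own validator: the function name `validate_input`, the error messages, the lower bound of 1 item, and the `"full"` default for `contentLength` are all assumptions. Only the default of 10 items, the maximum of 100, and the `false` default for `retrieveImage` come from the table.

```python
from typing import Any

MAX_ITEMS = 100      # maximum from the parameter table
DEFAULT_ITEMS = 10   # default from the parameter table


def validate_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Return a normalized copy of the input, or raise ValueError."""
    keyword = raw.get("keyword")
    if not isinstance(keyword, str) or not keyword.strip():
        raise ValueError("keyword must be a non-empty string")

    number = raw.get("numberOfItems", DEFAULT_ITEMS)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(number, bool) or not isinstance(number, int) or not 1 <= number <= MAX_ITEMS:
        raise ValueError(f"numberOfItems must be an integer from 1 to {MAX_ITEMS}")

    bad = raw.get("filterBadKeywords", [])
    if not isinstance(bad, list) or not all(isinstance(k, str) for k in bad):
        raise ValueError("filterBadKeywords must be an array of strings")

    # Assumed default: 'full'. Accept 'full', a number, or a numeric string like "200".
    length = raw.get("contentLength", "full")
    if length != "full":
        try:
            length = int(length)
        except (TypeError, ValueError):
            raise ValueError("contentLength must be a word count or 'full'") from None
        if length <= 0:
            raise ValueError("contentLength must be positive")

    retrieve_image = raw.get("retrieveImage", False)
    if not isinstance(retrieve_image, bool):
        raise ValueError("retrieveImage must be a boolean")

    result = {
        "keyword": keyword.strip(),
        "numberOfItems": number,
        "filterBadKeywords": bad,
        "contentLength": length,
        "retrieveImage": retrieve_image,
    }
    # timeRange is only type-checked; the full set of accepted values is not listed.
    if "timeRange" in raw:
        if not isinstance(raw["timeRange"], str):
            raise ValueError("timeRange must be a string")
        result["timeRange"] = raw["timeRange"]
    return result
```

Checking input locally like this can catch a typo such as `"numberOfItems": 1000` before a paid run is started.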
Output
The actor outputs a dataset with the following structure for each news article:
- title: The title of the news article
- link: The resolved URL of the article
- pubDate: The publication date of the article
- source: The source (news outlet) of the article
- imageUrl: The URL of the article's main image (if retrieveImage is set to true)
- summary: A brief summary of the article
- content: The extracted content of the article (based on contentLength parameter)
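How `filterBadKeywords` and `contentLength` relate to the output fields can be illustrated with a small post-processing sketch. The matching rule (case-insensitive substring match against `title` and `summary`) and the whitespace-based word count are assumptions made for illustration; the actor's actual filtering and truncation logic is not documented here.

```python
from typing import Union


def contains_bad_keyword(item: dict, bad_keywords: list[str]) -> bool:
    """True if any keyword appears in the title or summary (assumed rule)."""
    text = f"{item.get('title', '')} {item.get('summary', '')}".lower()
    return any(k.lower() in text for k in bad_keywords)


def truncate_words(text: str, content_length: Union[int, str]) -> str:
    """Keep the first N whitespace-separated words, or everything for 'full'."""
    if content_length == "full":
        return text
    return " ".join(text.split()[: int(content_length)])


def postprocess(items: list[dict], bad_keywords: list[str],
                content_length: Union[int, str]) -> list[dict]:
    """Drop items that match a bad keyword and trim the content of the rest."""
    kept = []
    for item in items:
        if contains_bad_keyword(item, bad_keywords):
            continue
        trimmed = truncate_words(item.get("content", ""), content_length)
        kept.append({**item, "content": trimmed})
    return kept
```

The same functions can be applied to a downloaded dataset if stricter filtering is wanted after a run has finished.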
Usage
- Configure your desired input parameters
- Run the actor
- Retrieve the results from the dataset
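The three steps above can also be performed programmatically. The following is a minimal sketch using only the Python standard library and Apify's `run-sync-get-dataset-items` API endpoint. The actor ID and token are supplied by the caller because this page does not state the actor's ID, and the helper names `build_run_request` and `run_actor` are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build a POST request that runs the actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(actor_id: str, token: str, run_input: dict) -> list[dict]:
    """Send the request and decode the JSON array of news items."""
    request = build_run_request(actor_id, token, run_input)
    # The synchronous endpoint waits for the run to finish, so allow a long timeout.
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.load(response)
```

A call such as `run_actor("username~actor-name", "YOUR_APIFY_TOKEN", {"keyword": "Bitcoin", "numberOfItems": 5})`, with both placeholders replaced by real values, would return a list of dictionaries with the fields described under Output. For larger runs, starting the actor asynchronously and reading the dataset afterwards may be the safer choice, since the synchronous endpoint is time-limited.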
Performance
The performance of this actor can vary based on the number of items requested and the complexity of the articles being scraped. Here are some general guidelines:
- Processing Time: On average, the actor takes about 5-10 seconds per article for full content extraction.
- Scalability: The actor is designed to handle up to 100 items per run efficiently.
- Concurrent Requests: To balance performance and politeness to source websites, the actor processes up to 5 articles concurrently.
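The concurrency cap described above can be sketched with an `asyncio.Semaphore`. This illustrates the general technique only; the actor's internals are not shown on this page, and `fetch_article` is a hypothetical stand-in that sleeps instead of making a network request.

```python
import asyncio

MAX_CONCURRENCY = 5  # limit stated in the list above


async def fetch_article(url: str) -> str:
    """Hypothetical stand-in for downloading and extracting one article."""
    await asyncio.sleep(0.01)
    return f"content of {url}"


async def fetch_all(urls: list[str], limit: int = MAX_CONCURRENCY) -> list[str]:
    """Fetch every URL while keeping at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str) -> str:
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await fetch_article(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))
```

As a rough estimate under the figures above, 50 items processed 5 at a time at 5-10 seconds each would need about 10 rounds, or roughly 50 to 100 seconds of extraction time, ignoring startup and network overhead.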
For optimal performance, we recommend:
- Limiting requests to 50 items or fewer for quicker results.
- Using more specific keywords to target relevant articles and reduce processing time.
- Setting a reasonable `contentLength` if you don't need the full article text.
- Keeping `retrieveImage` set to false unless image URLs are necessary, as this can significantly speed up the scraping process.
Note: Performance can be affected by factors such as network latency and the responsiveness of source websites.
Error Handling
This actor is designed with robust error handling to ensure smooth operation:
- Network Issues: If a connection to Google News fails, the actor will retry up to 3 times before moving on to the next item.
- Rate Limiting: The actor implements a delay between requests to avoid triggering Google's rate limits. If rate limiting is detected, the actor will pause for 60 seconds before retrying.
- Article Extraction: If the full text of an article cannot be extracted, the actor will fall back to providing the summary from the RSS feed.
- Invalid Inputs: The actor validates all inputs and will provide meaningful error messages for any invalid parameters.
In case of any unrecoverable errors, the actor will log the error details and continue processing the remaining items where possible.
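The retry and fallback behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the actor's source: `RateLimited`, `fetch_with_retry`, and `content_or_summary` are hypothetical names, "up to 3 times" is read as three attempts in total, and rate-limit pauses are assumed to count toward that total. The `sleep` parameter is injectable so the logic can be exercised without actually waiting.

```python
import time
from typing import Callable, Optional

MAX_ATTEMPTS = 3          # assumed reading of "retry up to 3 times"
RATE_LIMIT_PAUSE = 60.0   # seconds to pause when rate limiting is detected


class RateLimited(Exception):
    """Hypothetical signal that the server asked the client to slow down."""


def fetch_with_retry(
    fetch: Callable[[], str],
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call `fetch` until it succeeds; return None once attempts run out."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return fetch()
        except RateLimited:
            # Pause before the next attempt, but not after the last one.
            if attempt < MAX_ATTEMPTS:
                sleep(RATE_LIMIT_PAUSE)
        except OSError:
            # Network failure: try again right away (no backoff is specified).
            continue
    return None


def content_or_summary(
    fetch: Callable[[], str],
    summary: str,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Fall back to the RSS summary when the full text cannot be fetched."""
    content = fetch_with_retry(fetch, sleep)
    return content if content else summary
```

Returning `None` instead of raising lets the caller move on to the next item, which matches the stated goal of continuing with the remaining items where possible.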
Actor Metrics
16 monthly users
3 stars
98% runs succeeded
1.5 days response time
Created in Aug 2024
Modified 3 months ago