On clicking this link, you get redirected to the correct website. This is breaking all my flows. It has been happening since last week; I was getting correct links before that.

[Screenshots posted by dynamic_harvest: "After" and "Before"]

Kristýna Lhoťanová (lhotanova)

Hello, thank you for reporting this bug, and I'm sorry for the late reply. I'm currently reworking the Actor to address this issue as well as other bugs. I'll keep you updated about the progress here.

Kristýna Lhoťanová (lhotanova)

Hi again, a recent change to the Google News API caused this bug. The Actor has been fixed today; see the example run: https://console.apify.com/view/runs/oqDhZjFezcTVZHdAi

The Actor first extracts links in the https://news.google.com/ format from the API, then decodes them into the actual target links and opens the target pages to extract preview images. It has to work this way because the Actor doesn't use a web browser, which could resolve the redirects automatically; it uses plain HTTP requests only, to keep costs low. Google News has recently made the encoded target URLs harder to deal with, so the Actor now uses a rather hacky way to decode the links. Hopefully the Google News API will stay stable now and won't break the Actor's flow again.
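The Actor's code isn't shown in this thread. As a rough illustration of the last step (extracting a preview image from a target page's metadata with plain HTTP rather than a browser), here is a minimal Python sketch. It assumes the preview image is exposed as an Open Graph `og:image` meta tag, which is an assumption and not something stated above; the helper name `extract_preview_image` is hypothetical. Fetching the page itself is left out so the sketch runs offline.

```python
from html.parser import HTMLParser
from typing import Optional


class _OgImageParser(HTMLParser):
    """Collects the content of the first <meta property="og:image"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.image: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Stop looking once an image has been found.
        if tag != "meta" or self.image is not None:
            return
        attr_map = dict(attrs)
        # Assumption: the preview image is published as Open Graph metadata.
        if attr_map.get("property") == "og:image" and attr_map.get("content"):
            self.image = attr_map["content"]


def extract_preview_image(html: str) -> Optional[str]:
    """Hypothetical helper: return the og:image URL from raw HTML, or None."""
    parser = _OgImageParser()
    parser.feed(html)
    parser.close()
    return parser.image


if __name__ == "__main__":
    sample = (
        '<html><head><meta property="og:image" '
        'content="https://example.com/preview.jpg"></head></html>'
    )
    print(extract_preview_image(sample))
```

In a real run you would first fetch the decoded target URL over HTTP (for example with `urllib.request.urlopen`) and pass the response body to this function.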

If you encounter any other issues, please report them using new issue threads 🙏

boothdev

Hi, we seem to be getting a lot of links like that, which causes a good number of failures on every run.

boothdev

Any update on this? We are seeing quite a few links come up as https://news.google.com/_/DotsSplashUi/data/batchexecute?rpcids=Fbv4je

I am also having trouble with article URLs being "https://news.google.com/rss/articles/..." rather than the actual article URL. Could this be fixed?

Kristýna Lhoťanová (lhotanova)

Hi, could you please share example runs where this issue occurred?

I did a test run with 1800 results and didn't find any https://news.google.com links stored in the link output field (see screenshot). Knowing which query triggers the issue would help me a lot with debugging this.

It's also possible that it was fixed on Google News' side in the meantime, but I need to redo the problematic runs to verify that.

Regarding the URL https://news.google.com/_/DotsSplashUi/data/batchexecute?rpcids=Fbv4je, the Actor uses it to decode links from the https://news.google.com/ format.
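To spot affected items in your own runs, a quick check over the dataset's link field can flag links that are still in the https://news.google.com/ format, covering both the RSS form (https://news.google.com/rss/articles/...) and the batchexecute decoding URL mentioned above. This is a minimal Python sketch; it assumes each dataset item is a dict with a "link" key (the link output field referred to in this thread), and the function names are hypothetical.

```python
from typing import Iterable, List
from urllib.parse import urlparse

RSS_PATH_PREFIX = "/rss/articles/"
DECODER_PATH = "/_/DotsSplashUi/data/batchexecute"


def classify_link(link: str) -> str:
    """Return 'missing', 'rss', 'decoder', 'google_news' or 'decoded'."""
    if not link:
        return "missing"
    parsed = urlparse(link)
    if parsed.hostname != "news.google.com":
        return "decoded"  # points to a publisher site, not Google News
    if parsed.path.startswith(RSS_PATH_PREFIX):
        return "rss"
    if parsed.path == DECODER_PATH:
        return "decoder"
    return "google_news"  # some other news.google.com URL


def undecoded_items(items: Iterable[dict]) -> List[dict]:
    """Keep items whose link is missing or still points to news.google.com."""
    return [item for item in items if classify_link(item.get("link", "")) != "decoded"]
```

Running `undecoded_items` over a run's dataset items gives a concrete list of failing results, which is the kind of example run that helps with debugging.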

I have another idea of what could be going wrong: did you set the input field extractImages to true or false? Originally, non-RSS article links were served by the RSS API, so the Actor could scrape them without opening the article pages. Crawling article pages could be switched on with extractImages: true, because images were (and still are) extracted from these pages. After the latest change, non-RSS links are no longer available in the RSS API, so it became necessary to crawl article pages to get decoded article links. Therefore, you need to use extractImages: true to get non-RSS article links.

I'm sorry for the confusion; I didn't realize that extractImages had effectively become a switch for whether or not you get decoded article links. I'll add a new field, fetchArticleDetails, instead and possibly deprecate extractImages.

Kristýna Lhoťanová (lhotanova)

So, the fetchArticleDetails input field has been added and extractImages deprecated. extractImages can still be used in the JSON input editor or when calling the Actor via the API, but it is no longer displayed in the Manual input editor.

fetchArticleDetails: true activates additional requests to article pages to decode the links from the RSS format and to extract images from each page's metadata.
- test run: https://console.apify.com/view/runs/huZYRddUxVUsrPxEq
fetchArticleDetails: false disables requests to article pages, so the links are stored in the RSS format https://news.google.com/rss/articles/ and no images are scraped. Runs with this option disabled are much faster and cheaper, though, because they save a lot of extra requests.
- test run: https://console.apify.com/view/runs/oaf7KSCUBS519ARj5
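For API callers, a minimal Python sketch of setting the new field might look like the following. It assumes the official `apify-client` package for the API call; the Actor ID is not given in this thread, so `actor_id` is a placeholder you must fill in, and the helper names are hypothetical. Only `fetchArticleDetails` comes from the discussion above; any other input fields are passed through unchanged.

```python
import json
from typing import Any, Dict, List


def build_run_input(fetch_article_details: bool, **other_fields: Any) -> Dict[str, Any]:
    """Hypothetical helper: build a run input that uses fetchArticleDetails.

    other_fields stands for the Actor's remaining input fields, whose exact
    names are not covered in this thread. The deprecated extractImages is not set.
    """
    run_input = dict(other_fields)
    run_input["fetchArticleDetails"] = fetch_article_details
    return run_input


def run_actor(token: str, actor_id: str, run_input: Dict[str, Any]) -> List[dict]:
    """Start the Actor via the API and return its dataset items (needs apify-client)."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    return client.dataset(run["defaultDatasetId"]).list_items().items


if __name__ == "__main__":
    # Decoded article links and images, at the cost of extra requests per article.
    print(json.dumps(build_run_input(True)))
    # Faster and cheaper: links stay in the https://news.google.com/rss/articles/ format.
    print(json.dumps(build_run_input(False)))
```

The client import sits inside `run_actor` so the input builder works even where `apify-client` isn't installed.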

Please let me know if you still struggle to scrape decoded article URLs with the fetchArticleDetails option enabled. Thank you!

Hi,

With the option "fetchArticleDetails: true", "link" now points to the direct article and not the RSS URL.

Thank you for the fix.

Kristýna Lhoťanová (lhotanova)

Thank you for your feedback, I'm glad it’s working for you now!

