Pricing

$9.00 / 1,000 pages


GPT Scraper


Developed by

Jakub Drobník

Extract data from any website and feed it into GPT via the OpenAI API. Use ChatGPT to proofread content, analyze sentiment, summarize reviews, extract contact details, and much more.

4.4 (7)

108

Total users: 6.1K

Monthly users: 111

Runs succeeded: 99%

Issues response: 7.2 days

Last modified: 7 months ago

Lead generation


Scraper Fails with Google Play Store URLs

Closed

blkbox opened this issue

Error logs

2024-09-10T11:49:34.199Z ACTOR: Pulling Docker image of build nXM2iV00mWsXHDDu6 from repository.
2024-09-10T11:49:34.297Z ACTOR: Creating Docker container.
2024-09-10T11:49:35.463Z ACTOR: Starting Docker container.
2024-09-10T11:49:36.674Z Starting X virtual framebuffer using: Xvfb :99 -ac -screen 0 1920x1080x24+32 -nolisten tcp
2024-09-10T11:49:36.676Z Executing main command
2024-09-10T11:49:39.630Z INFO  System info {"apifyVersion":"3.1.16","apifyClientVersion":"2.9.3","crawleeVersion":"3.8.1","osType":"Linux","nodeVersion":"v18.20.4"}
2024-09-10T11:49:39.773Z INFO  Max pages per crawl: 1
2024-09-10T11:49:40.462Z INFO  Configuration completed. Starting the crawl.
2024-09-10T11:49:40.549Z INFO  PlaywrightCrawler: Starting the crawler.
2024-09-10T11:49:45.697Z INFO  Opening https://play.google.com/store/apps/details?id=com.hardrockdigital.client&hl=en_US&gl=US...
2024-09-10T11:49:46.292Z WARN  PlaywrightCrawler: Reclaiming failed request back to the list or queue. page.evaluate: TypeError: Failed to execute 'parseFromString' on 'DOMParser': This document requires 'TrustedHTML' assignment.
2024-09-10T11:49:46.295Z     at eval (eval at evaluate (:226:30), <anonymous>:2:37)
2024-09-10T11:49:46.298Z     at UtilityScript.evaluate (<anonymous>:228:17)
2024-09-10T11:49:46.300Z     at UtilityScript.<anonymous> (<anonymous>:1:44)
2024-09-10T11:49:46.303Z     at shrinkHtml (/home/myuser/dist/processors.js:10:33) {"id":"u95wgQVoq9JiBtG","url":"https://play.g... [trimmed]
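The trace shows `shrinkHtml` calling `DOMParser.parseFromString` inside `page.evaluate`. A page that enforces Trusted Types through its Content-Security-Policy rejects plain strings passed to that sink, which is exactly the TypeError in the log. The thread does not say how the actor was actually fixed. One possible workaround, sketched here as an assumption, is to fetch the markup with Playwright's `page.content()` and shrink it as a plain string in Node, outside the page's security policy. `shrinkHtmlString` is a hypothetical helper, and its regex-based stripping is a rough approximation rather than a real HTML parser.

```typescript
// Hypothetical helper: shrink page HTML as a plain string in Node, so no
// DOMParser.parseFromString call runs inside a page that enforces Trusted Types.
// Regex stripping is a rough approximation, not a real HTML parser.
function shrinkHtmlString(html: string): string {
  return (
    html
      // Drop HTML comments.
      .replace(/<!--[\s\S]*?-->/g, '')
      // Drop elements whose content rarely helps the model (scripts, styles, icons).
      .replace(/<(script|style|noscript|svg|template)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, '')
      // Drop presentational attributes that only add tokens.
      .replace(/\s(?:class|style|data-[\w-]+)="[^"]*"/gi, '')
      // Collapse runs of whitespace into a single space.
      .replace(/\s+/g, ' ')
      .trim()
  );
}

// Usage inside a Crawlee PlaywrightCrawler requestHandler that receives `page`:
//   const html = shrinkHtmlString(await page.content());
```

Because `page.content()` only serializes the DOM and hands the string back to Node, nothing is assigned to a Trusted Types sink in the page itself.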

blkbox

Input used

{
  "startUrls": [
    {
      "url": "https://play.google.com/store/apps/details?id=com.hardrockdigital.client&hl=en_US&gl=US"
    }
  ],
  "maxPagesPerCrawl": 1,
  "maxCrawlingDepth": 1,
  "linkSelector": "a[href]",
  "instructions": " \"I have a website link that provides information about a specific brand. \"\n          \"Please visit the link and extract detailed information about the brand.\n\"\n          \"Include the following details in your response:\n\n\"\n          \"Brand Name: The official name of the brand.\n\"\n          \"Overview: A brief description of what the brand is about, including its mission, vision, and values.\n\"\n          \"Products/Services: A list of main products or services offered by the brand.\n\"\n          \"Category of products: Provide only the name of categories of product\"\n          \"Unique selling points:\n\"\n          \"Target Audience: The primary demographic or market segment that the brand caters to.\n\"\n          \"Contact Information: How to get in touch with the brand, including customer service, social media handles, and physical addresses if available.\n\"\n          \"Website Link: The original website link provided for reference.\n\"\n          \"Social Media Handles:\n\n\"\n          Provide the output in JSON format with keys representing above points and values representing the details.\n           ",
  "extractor": {
    "type": "gpt",
    "model": "gpt-4",
    "maxTokens": 4000
  },
... [trimmed]
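The `instructions` value in the input above appears to have been pasted from source code: it still carries literal quote characters, `\n` escapes, and indentation left over from string concatenation, and all of these become part of the prompt. A minimal sketch of assembling an equivalent input in TypeScript, assuming only the field names visible above (the rest of the input was trimmed, so any other fields the actor accepts are not modeled). `buildInstructions` and `buildInput` are hypothetical helpers, not part of the actor.

```typescript
// Mirrors only the fields visible in the (trimmed) input above.
interface GptScraperInput {
  startUrls: { url: string }[];
  maxPagesPerCrawl: number;
  maxCrawlingDepth: number;
  linkSelector: string;
  instructions: string;
  extractor: { type: 'gpt'; model: string; maxTokens: number };
}

// Hypothetical helper: build the prompt from field descriptions, so no stray
// quote characters or source-code indentation end up in it.
function buildInstructions(fields: Record<string, string>): string {
  const lines = Object.entries(fields).map(([key, desc]) => `${key}: ${desc}`);
  return [
    'I have a website link that provides information about a specific brand.',
    'Please visit the link and extract detailed information about the brand.',
    'Include the following details in your response:',
    '',
    ...lines,
    '',
    'Provide the output in JSON format with keys representing the above points and values representing the details.',
  ].join('\n');
}

// Hypothetical helper: values copied from the input above, except the page
// budget, which is set to one page per start URL (an assumption).
function buildInput(urls: string[], fields: Record<string, string>): GptScraperInput {
  return {
    startUrls: urls.map((url) => ({ url })),
    maxPagesPerCrawl: urls.length,
    maxCrawlingDepth: 1,
    linkSelector: 'a[href]',
    instructions: buildInstructions(fields),
    extractor: { type: 'gpt', model: 'gpt-4', maxTokens: 4000 },
  };
}
```

`JSON.stringify(buildInput(urls, fields), null, 2)` then produces a JSON object in the same shape as the one above, with a clean prompt string.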

Prukáš Lůša (lukas.prusa)

Hi, thanks for opening this issue!

I can see you've found the original issue reported by another user for the same website: https://console.apify.com/actors/paOtbjvyUiNsr1Qms/issues/puLBJmEVQSvgzuXw7

Since more users are hitting the same problem, we will give this issue higher priority :)

I will keep you updated here, thanks!

blkbox

Hey Lukas,

Thanks for the response. Is there anything we can do to move this to the top of your priority list?

Context: we are a bootstrapped startup that has just launched this product, and we expect to scrape around 4-5K pages per day.

This is a blocker for our business; we will be forced to look for alternatives if this doesn't work.

PS: We love your tool; it has unlocked key capabilities for us. We would be happy to set up a chat with our engineering team if that would help resolve the issue faster.

blkbox

I tried scraping the same URL with apify/playwright-scraper, and it seems to work there.

Prukáš Lůša (lukas.prusa)

Thanks for your insight, we will try to finish it this week :)

It's most likely just some very stupid bug in our backend that's somehow magically clashing with this very specific website, so it's just incredibly annoying to catch.

blkbox

Thank you, looking forward to the fix. :)

Happy debugging, a rubber duck might help ;)

Prukáš Lůša (lukas.prusa)

Hi again, and thanks for the rubber duck; it helped us a ton :) I'm happy to let you know that we've just updated the scraper with the fix!

Try it out and let me know how it works now. Thanks and happy scraping!

blkbox

Hey Lukas,

I can confirm we are able to scrape the play store urls now.

Thank you so much for the fix. :)


Extended GPT Scraper

drobnikj/extended-gpt-scraper

Extract data from any website and feed it into GPT via the OpenAI API. Use ChatGPT to proofread content, analyze sentiment, summarize reviews, extract contact details, and much more.

Jakub Drobník

1.5K

4.1

GPT Browser

anchor/gpt-browser

A GPT browser to use OpenAI prompts on any website. Provide a list of URLs and a prompt, and the GPT agent will give you the answer you need. Fast, easy, and not limited by OpenAI ChatGPT restrictions. The best way to search and use GPT on a large number of websites. Upload Excel or CSV. Screenshots 📸

Anchor

🔍 GPT Search [Private API]

openapi/gpt-search-private-api

Use OpenAI's GPT-4o Search mode via API! No cookie or proxy is required. Fast, cheap and reliable.

Open API

5.0

Universal AI GPT Scraper

louisdeconinck/ai-gpt-scraper

Transform any website into structured data with AI-powered extraction. This versatile tool combines advanced web scraping with intelligent content analysis to deliver clean, customized JSON output - perfect for automating data collection from any web source.

Louis Deconinck

5.0

Auto GPT

lukaskrivka/auto-gpt

Run Auto GPT sessions directly on Apify. No OpenAI account or API token is required! Store parsed thoughts into datasets for later analysis.

Lukáš Křivka

199

GPT Search

tri_angle/gpt-search

Send queries to ChatGPT and retrieve structured answers with full source citations. Easily integrate into your tools or workflows for flexible, scalable AI-powered solutions.

Tri⟁angle

ChatGPT

pertosh/chatgpt

You can use this Actor to transform scraped results, such as restaurant reviews, by rephrasing the sentences. Translation is also supported. You can also use it to generate new website descriptions, keywords, and other similar metadata.

Alper

148

OpenAI Vector Store Integration

jiri.spilka/openai-vector-store-integration

The Apify OpenAI Vector Store integration uploads data from Apify Actors to the OpenAI Vector Store linked to OpenAI Assistant.

Jiří Spilka

182

4.8

OpenRouter - Unified LLM Interface for ChatGPT, Claude, Gemini

xyzzy/open-router

Use the OpenRouter platform to choose the best and most cost-effective model for your prompts through a standardized interface (including ChatGPT, Claude, Gemini, Llama, Mistral, and more). See instructions for creating an OpenRouter account and API key.