Extended GPT Scraper

drobnikj/extended-gpt-scraper

Extract data from any website and feed it into GPT via the OpenAI API. Use ChatGPT to proofread content, analyze sentiment, summarize reviews, extract contact details, and much more.

Inquiry Regarding Extended GPT Scraper for Shein Data Extraction

Closed

tarek_eljazeri opened this issue
4 months ago

Hello,

I hope this message finds you well. I am interested in utilizing your Extended GPT Scraper for a project involving data extraction from the Shein website. Before proceeding, I have a few inquiries that I hope you can assist me with:

Could you please confirm if your Extended GPT Scraper is capable of extracting product data from Shein, including attributes such as color, price, size variations, and images?

What is the recommended approach for configuring the scraper to collect data from Shein efficiently? Are there any specific settings or parameters that I should be aware of?

Is it possible to customize the scraper to extract additional data fields beyond the default ones provided? If so, what is the process for implementing these customizations?

Are there any limitations or known issues when using the Extended GPT Scraper with Shein or similar e-commerce websites? How can I mitigate these challenges effectively?

Lastly, could you provide any guidance or best practices for ensuring the accuracy and reliability of the extracted data?

I would greatly appreciate your insights and assistance in addressing these inquiries. Thank you for your time and support.

Best regards, Tarek

lukas.prusa

Hi Tarek, thanks for raising your questions!

Yes, the scraper can extract product data from Shein's website, including all of the attributes you listed. It's just a matter of configuration.

The main thing to configure is the Page processing settings, which control what is sent to GPT for processing. Each GPT model has a fixed context window, and the page content has to fit inside it. Users often run out of context, in which case data from the page is cut off, so please make sure you are using a model with a large enough context window. Beyond that, you will mostly need to engineer your prompt carefully so that GPT extracts what you want.
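To make the context-window point concrete, here is a minimal Python sketch of a pre-flight check you could run on page text before sending it to a model. It is not part of the Actor. The function names, the reserved output budget, and the rule of thumb of roughly four characters per token are all assumptions for illustration; real token counts depend on the model's tokenizer and on the language of the page.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate. The 4 chars/token ratio is an assumed heuristic,
    not an exact tokenizer count."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(
    page_text: str,
    instructions: str,
    context_window_tokens: int,
    reserved_output_tokens: int = 1000,  # assumed budget for the model's answer
    chars_per_token: float = 4.0,
) -> bool:
    """Return True if page content + instructions + reserved output
    are estimated to fit inside the given context window."""
    needed = (
        estimate_tokens(page_text, chars_per_token)
        + estimate_tokens(instructions, chars_per_token)
        + reserved_output_tokens
    )
    return needed <= context_window_tokens


if __name__ == "__main__":
    page = "x" * 40_000  # stand-in for scraped page text
    prompt = "Extract color, price, size variations and image URLs."
    for window in (8_000, 16_000):  # hypothetical context sizes
        print(window, fits_in_context(page, prompt, window))
```

If the check fails, the options are the ones described above: pick a model with a larger context window, or narrow what the Page processing settings send to GPT.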

I don't know what your data processing pipeline looks like, but I assume you want JSON output. You can configure that under the JSON formatted output settings.
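On the pipeline side, a small validation step can catch records where GPT returned malformed JSON or dropped a field, which also helps with the accuracy question. The sketch below is only an illustration: the field names (`color`, `price`, `sizes`, `images`) are hypothetical and taken from the attributes mentioned in the question, not from the Actor's actual output format.

```python
import json

# Hypothetical field names, chosen to match the attributes asked about above.
REQUIRED_FIELDS = ("color", "price", "sizes", "images")


def validate_product(raw: str) -> dict:
    """Parse one JSON record and check that the expected fields are present.

    Raises ValueError if the text is not a JSON object or a field is missing.
    """
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc

    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")

    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")

    # List-valued fields should really be lists, not a single string.
    for name in ("sizes", "images"):
        if not isinstance(record[name], list):
            raise ValueError(f"field '{name}' should be a list")

    return record


if __name__ == "__main__":
    sample = (
        '{"color": "red", "price": "12.99", '
        '"sizes": ["S", "M"], "images": ["https://example.com/a.jpg"]}'
    )
    print(validate_product(sample))
```

Records that fail validation can be logged and re-run, rather than silently passed downstream.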

I hope this helps, thanks!

Developer
Maintained by Apify
Actor metrics
  • 81 monthly users
  • 28 stars
  • 99.2% runs succeeded
  • 4.4 days response time
  • Created in Jun 2023
  • Modified about 1 month ago