AI Web Scraper [No API Key Needed]

Pricing: $160.00 per 1,000 requests

vulnv/ai-web-scraper-no-api-key-needed

Scrape structured data by describing what you need in plain language, and get results tailored to your request. No coding required.

This AI-powered web scraper extracts structured data from web pages based on a natural-language prompt: you describe the data you want, and the scraper returns it in a structured form. It also attempts to bypass captchas to reduce interruptions during scraping. It is aimed at developers and data analysts who want to streamline their web scraping tasks.

Features

  • Start URLs: Specify the URLs where scraping begins.
  • Natural Language Prompts: Describe the desired output in plain language.
  • Custom Depth: Set the maximum depth for recursive scraping.
  • Captcha Bypass: Attempts to bypass captchas to reduce interruptions.
  • Initial Cookies: Pre-set cookies on all pages the scraper opens.
  • Save HTML: Optionally store the full transformed HTML of every page in the default key-value store.
  • Save Markdown: Optionally convert the transformed HTML of every page to Markdown and store it.

Configuration

Input Schema

The actor accepts the following input parameters:

| Field | Type | Description | Default Value |
|---|---|---|---|
| start_urls | Array | List of URLs to start scraping from. | [{"url": "https://apify.com"}] |
| prompt | String | Natural language description of the desired scraping output. | "List me all the features with their description." |
| max_depth | Integer | The maximum depth for recursive scraping. | 0 |
| initial_cookies | String | Cookies pre-set on all pages the scraper opens. | [] |
| save_html_to_key_value_store | Boolean | If enabled, stores the full transformed HTML of all pages found in the default key-value store. | true |
| save_markdown_to_key_value_store | Boolean | If enabled, converts the transformed HTML of all pages found to Markdown and stores it under the markdown field in the output dataset. | true |
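The defaults listed above can be mirrored client-side, for example to see exactly what a run will receive when you omit fields. The sketch below uses a hypothetical helper, `with_defaults`, which is not part of the actor or of any Apify library. It assumes the defaults in the table are current, and it treats `initial_cookies` as a list because the table gives its type as String but its default as [].

```python
import copy
from typing import Any

# Defaults as listed in the input schema table (assumed to be current).
DEFAULTS: dict[str, Any] = {
    "start_urls": [{"url": "https://apify.com"}],
    "prompt": "List me all the features with their description.",
    "max_depth": 0,
    # The table lists the type as String but the default as []; a list is assumed here.
    "initial_cookies": [],
    "save_html_to_key_value_store": True,
    "save_markdown_to_key_value_store": True,
}


def with_defaults(user_input: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: merge user input over the documented defaults and sanity-check types."""
    merged = copy.deepcopy(DEFAULTS)  # deep copy so callers never mutate DEFAULTS
    merged.update(user_input)

    urls = merged["start_urls"]
    if not isinstance(urls, list) or not all(
        isinstance(entry, dict) and isinstance(entry.get("url"), str) for entry in urls
    ):
        raise TypeError("start_urls must be a list of {'url': ...} objects")
    if not isinstance(merged["prompt"], str) or not merged["prompt"].strip():
        raise TypeError("prompt must be a non-empty string")
    depth = merged["max_depth"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(depth, bool) or not isinstance(depth, int) or depth < 0:
        raise ValueError("max_depth must be a non-negative integer")
    for flag in ("save_html_to_key_value_store", "save_markdown_to_key_value_store"):
        if not isinstance(merged[flag], bool):
            raise TypeError(f"{flag} must be a boolean")
    return merged


run_input = with_defaults({
    "start_urls": [{"url": "https://simple.wikipedia.org/wiki/List_of_European_countries"}],
    "prompt": "List all the European countries",
})
print(run_input["max_depth"], run_input["save_html_to_key_value_store"])  # 0 True
```

Deep-copying the defaults keeps repeated calls independent: appending to one run's `start_urls` does not leak into the next.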

Example Input 1

{
    "start_urls": [
        { "url": "https://simple.wikipedia.org/wiki/List_of_European_countries" }
    ],
    "prompt": "List all the European countries",
    "max_depth": 0
}

Output:

[
    {
        "EuropeanCountries": [
            "Albania",
            "Andorra",
            "Armenia",
            "Austria",
            "Azerbaijan",
            "Belarus",
            "Belgium",
            "Bosnia and Herzegovina",
            "Bulgaria",
            "Croatia",
            "Cyprus",
            "Czech Republic",
            "Denmark",
            "Estonia",
            "Finland",
            "France",
            "Georgia",
            "Germany",
            "Greece",
            "Hungary",
            "Iceland",
            "Ireland",
            "Italy",
            "Kazakhstan",
            "Kosovo",
            "Latvia",
            "Liechtenstein",
            "Lithuania",
            "Luxembourg",
            "Malta",
            "Moldova",
            "Monaco",
            "Montenegro",
            "Netherlands",
            "North Macedonia",
            "Norway",
            "Poland",
            "Portugal",
            "Romania",
            "Russia",
            "San Marino",
            "Serbia",
            "Slovakia",
            "Slovenia",
            "Spain",
            "Sweden",
            "Switzerland",
            "Turkey",
            "Ukraine",
            "United Kingdom",
            "Vatican City"
        ],
        "url": "https://simple.wikipedia.org/wiki/List_of_European_countries",
        "key": "simple_wikipedia_org_wiki_List_of_European_countries"
    }
]
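In both examples, the key field looks like the page URL with the scheme removed and every non-alphanumeric character replaced by an underscore. That rule is inferred from the two examples only and is not documented, so treat the sketch below as an assumption. It also includes a hypothetical helper, `extracted_fields`, that separates the prompt-dependent fields (such as EuropeanCountries, whose name the model chose) from the url and key fields that each example item carries.

```python
import re
from typing import Any


def url_to_key(url: str) -> str:
    """Inferred (undocumented) rule: drop the scheme, then map every non-alphanumeric character to '_'."""
    without_scheme = re.sub(r"^[A-Za-z][A-Za-z0-9+.-]*://", "", url)
    return re.sub(r"[^A-Za-z0-9]", "_", without_scheme)


def extracted_fields(item: dict[str, Any]) -> dict[str, Any]:
    """Return only the fields produced from the prompt; their names can differ from run to run."""
    return {name: value for name, value in item.items() if name not in ("url", "key")}


# Shortened copy of the Example Input 1 output item.
item = {
    "EuropeanCountries": ["Albania", "Andorra", "Armenia"],
    "url": "https://simple.wikipedia.org/wiki/List_of_European_countries",
    "key": "simple_wikipedia_org_wiki_List_of_European_countries",
}
assert url_to_key(item["url"]) == item["key"]
print(list(extracted_fields(item)))  # ['EuropeanCountries']
```

Because the field names come from the model's reading of your prompt, code that consumes the dataset is safer selecting "everything except url and key" than hard-coding a name like EuropeanCountries.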

Example Input 2

{
    "max_depth": 0,
    "prompt": "This page contains a list of fashion products. Per each product, scrape the following fields for each product: product code, full price, price, currency, itemurl, imageurl, product category, product subcategory, product name. The product code is a numeric string that can be found in the item url. Full price is the price of the product without discounts, if any. If there is no discount, use the only price shown. Price is the product price after discounts: if there's no discount, use the only product price available. Currency is the ISO code of the currency used to display prices on this page. Imageurl is the URL of the image of the product, used as a thumbnail on this page.",
    "save_html_to_key_value_store": true,
    "save_markdown_to_key_value_store": true,
    "start_urls": [
        {
            "url": "https://www.net-a-porter.com/en-it/shop/clothing",
            "method": "GET"
        }
    ]
}

Output:

[
    {
        "products": [
            {
                "product_code": "1647597349677034",
                "full_price": "1590",
                "price": "1590",
                "currency": "EUR",
                "itemurl": "https://www.net-a-porter.com/en-it/shop/product/gabriela-hearst/clothing/midi-dresses/tenes-belted-ribbed-silk-and-cashmere-blend-midi-dress/1647597349677034",
                "imageurl": "//www.net-a-porter.com/variants/images/1647597349677034/in/w358_q60.jpg",
                "product_category": "Clothing",
                "product_subcategory": "Midi Dresses",
                "product_name": "Tenes belted ribbed silk and cashmere-blend midi dress"
            },
            {
                "product_code": "1647597344535411",
                "full_price": "3243",
                "price": "3243",
                "currency": "EUR",
                "itemurl": "https://www.net-a-porter.com/en-it/shop/product/suzie-kondi/clothing/long/kyma-cashmere-coat/1647597344535411",
                "imageurl": "//www.net-a-porter.com/variants/images/1647597344535411/in/w358_q60.jpg",
                "product_category": "Clothing",
                "product_subcategory": "Coats",
                "product_name": "Kyma cashmere coat"
            },
            ...
        ],
        "url": "https://www.net-a-porter.com/en-it/shop/clothing",
        "key": "www_net_a_porter_com_en_it_shop_clothing"
    }
]
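In this output, prices are strings and imageurl is a protocol-relative URL (it starts with //). Before analysis you may want to convert them. The sketch below shows one possible post-processing step in Python; `normalize_product` is a hypothetical helper, not part of the actor. It parses prices with `Decimal` and resolves image URLs against the page url with `urllib.parse.urljoin`. It assumes prices are plain numeric strings, as in the two items shown; values with thousands separators or currency symbols would need extra parsing.

```python
from decimal import Decimal
from typing import Any
from urllib.parse import urljoin


def normalize_product(product: dict[str, str], page_url: str) -> dict[str, Any]:
    """Hypothetical post-processing for one product record from this example."""
    full_price = Decimal(product["full_price"])  # assumes a plain numeric string such as "1590"
    price = Decimal(product["price"])
    return {
        **product,
        "full_price": full_price,
        "price": price,
        # A '//host/path' reference takes the scheme of page_url (https here).
        "imageurl": urljoin(page_url, product["imageurl"]),
        "discounted": price < full_price,
    }


page = {
    "url": "https://www.net-a-porter.com/en-it/shop/clothing",
    "products": [
        {  # subset of the first product in the output above
            "product_code": "1647597349677034",
            "full_price": "1590",
            "price": "1590",
            "currency": "EUR",
            "imageurl": "//www.net-a-porter.com/variants/images/1647597349677034/in/w358_q60.jpg",
        },
    ],
}
rows = [normalize_product(p, page["url"]) for p in page["products"]]
print(rows[0]["imageurl"])
# https://www.net-a-porter.com/variants/images/1647597349677034/in/w358_q60.jpg
```

`Decimal` is used instead of `float` so that price comparisons and sums do not pick up binary rounding errors.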

How to Use

  1. Set the start URLs to specify the pages you want to scrape.
  2. Write a prompt to describe your desired output.
  3. Set the maximum depth to control recursive scraping.
  4. Run the actor and get structured results based on your input!
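The four steps above can also be scripted. The sketch below assumes the official `apify-client` Python package (`ApifyClient`, `client.actor(...).call(run_input=...)` and `client.dataset(...).iterate_items()`) and an Apify API token, which any call to an Actor through the Apify API requires. `run_scraper` is a hypothetical helper; the client is passed in as a parameter so the function can be exercised without network access.

```python
from typing import Any

ACTOR_ID = "vulnv/ai-web-scraper-no-api-key-needed"


def run_scraper(client: Any, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Start the Actor, wait for it to finish, and return the items of its default dataset.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    # Step 4: start the run; call() blocks until the run finishes.
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    status = None if run is None else run.get("status")
    if status != "SUCCEEDED":
        raise RuntimeError(f"Actor run did not succeed (status: {status})")
    # Each dataset item holds the prompt-dependent fields for one scraped page.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Steps 1-3: start URLs, prompt, and maximum depth.
run_input = {
    "start_urls": [{"url": "https://simple.wikipedia.org/wiki/List_of_European_countries"}],
    "prompt": "List all the European countries",
    "max_depth": 0,
}

# Real usage (requires `pip install apify-client`, network access, and a token):
#   import os
#   from apify_client import ApifyClient
#   items = run_scraper(ApifyClient(os.environ["APIFY_TOKEN"]), run_input)
```

Returning a list rather than a generator means a failed run raises immediately when `run_scraper` is called, instead of only when the results are first iterated.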

Output

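The actor outputs structured data in JSON format. The fields of each item depend on your prompt; in the examples above, each item also includes the scraped page's url and a key string based on that URL.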
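Because the output fields come from your prompt rather than a fixed schema, it can be worth checking that each record actually contains the fields you asked for before using it. A minimal sketch, assuming records shaped like the products list in Example Input 2; `find_missing_fields` is a hypothetical helper, and the second record below is made up for illustration.

```python
from typing import Any

# Field names requested by the Example Input 2 prompt, as they appear in its output.
REQUIRED_PRODUCT_FIELDS = (
    "product_code", "full_price", "price", "currency", "itemurl",
    "imageurl", "product_category", "product_subcategory", "product_name",
)


def find_missing_fields(records: list[dict[str, Any]], required: tuple[str, ...]) -> dict[int, list[str]]:
    """Map each record index to the required fields that are absent or empty in that record."""
    problems: dict[int, list[str]] = {}
    for index, record in enumerate(records):
        missing = [name for name in required if record.get(name) in (None, "")]
        if missing:
            problems[index] = missing
    return problems


records = [
    {  # copied from the Example Input 2 output
        "product_code": "1647597344535411",
        "full_price": "3243",
        "price": "3243",
        "currency": "EUR",
        "itemurl": "https://www.net-a-porter.com/en-it/shop/product/suzie-kondi/clothing/long/kyma-cashmere-coat/1647597344535411",
        "imageurl": "//www.net-a-porter.com/variants/images/1647597344535411/in/w358_q60.jpg",
        "product_category": "Clothing",
        "product_subcategory": "Coats",
        "product_name": "Kyma cashmere coat",
    },
    # Made-up incomplete record, for illustration only.
    {"product_code": "123", "price": "10", "currency": "EUR"},
]
print(find_missing_fields(records, REQUIRED_PRODUCT_FIELDS))
# {1: ['full_price', 'itemurl', 'imageurl', 'product_category', 'product_subcategory', 'product_name']}
```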

Explore More Actors

Looking for additional solutions? Vulnv publishes more Actors on Apify for web automation and data extraction, covering a range of scenarios: 🌐 Explore Vulnv's Actors on Apify.

📧 For inquiries or support, feel free to reach out to us at apify@vulnv.com.

Developer
Maintained by Community

Actor Metrics

  • 8 monthly users
  • 1 star
  • 97% runs succeeded
  • Created in Dec 2024
