Browserless Scraper Pro

Pay $10.00 for 1,000 results

datavoyantlab/browserless-scraper-pro

Browserless Scraper Pro is designed to automate common web tasks such as web scraping, taking screenshots, and generating PDFs without the need for manual browser interaction.

Simplify Your Web Interactions with Browserless Scraper Pro

Browserless Scraper Pro is inspired by the functionality of Browserless, ScrapingBee ... but tailored to provide a unique, user-friendly experience. It automates common web tasks such as web scraping, taking screenshots, and generating PDFs without manual browser interaction.

Challenges in Web Interactions for AI

Building AI applications that interact with the web presents several challenges:

  • Dynamic Content: Modern websites often use client-side rendering and lazy loading, requiring tools that can execute JavaScript and wait for page hydration to access full content.

  • Infrastructure Overhead: Managing a fleet of headless browsers for scraping at scale involves complexities related to resource contention, reliability, and cold starts.

  • Lack of Web APIs: Many sites lack proper API access, forcing developers to create and maintain custom scrapers.

This actor is designed to tackle these challenges head-on, providing a robust solution for automating web interactions.

Key Features

  • Web Scraping
    Effortlessly extract data from websites in multiple formats including HTML, readability-enhanced content, cleaned HTML, and Markdown. This feature is perfect for data collection and analysis, allowing users to choose the format that best suits their needs.

  • Screenshot Capture
    Obtain high-resolution screenshots of entire web pages or specific sections. This feature includes options for capturing the full page or just the viewport, making it ideal for visual documentation, quality assurance testing, and sharing visuals across teams.

  • PDF Generation
    Convert web pages into well-formatted PDF documents with options for custom delays to handle dynamic content. This is suitable for archiving articles, generating reports, or saving web content for offline use.

  • Flexible Proxy Configuration
    Configure proxy settings to manage and rotate IPs during scraping activities to avoid detection and blocking by target websites. This feature supports both custom proxies and Apify's built-in proxy solutions.

  • Customizable Delays and Timeouts
    Set custom delays between requests to manage scraping speed and comply with website rate limits, ensuring reliable data extraction without overloading the website servers. Additionally, specify a maximum timeout for operations to prevent excessive delays.

  • Comprehensive Output
    Receive detailed JSON outputs including HTML content, metadata, and extracted links, which provide insights into the structure and content of the target web pages.

How It Works

  1. Select the Task:
    Choose from scraping data, capturing a screenshot, or generating a PDF.

  2. Submit the URLs:
    Provide the URLs of the target webpages.

  3. Customize Options:
    Set parameters such as page size for PDFs, full-page or viewport-specific screenshots, scraping selectors, optional delay for operations, and maximum timeout.

  4. Proxy Configuration:
    Configure proxy settings if necessary. Apify Proxy is the default option. (Special Apify proxies are not supported yet.)

  5. Receive Results:
    The tool processes your request and delivers the output in the desired format.
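
The steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the actor's official client code. It assumes the input field names shown in the usage examples on this page (`operation`, `urls`, `format`, `fullPage`, `delay`, `maxTimeout`). `build_run_input` and `run_actor` are hypothetical helpers. `run_actor` relies on the third-party `apify-client` package and an API token, and assumes results land in the run's default dataset; none of this is stated on this page. Proxy input is left out because the page does not show its field name.

```python
from typing import Any

ACTOR_ID = "datavoyantlab/browserless-scraper-pro"
OPERATIONS = {"scrape", "screenshot", "pdf"}


def build_run_input(operation: str, urls: list[str], **options: Any) -> dict[str, Any]:
    """Steps 1-3: pick the task, list the URLs, attach optional settings."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    if not urls:
        raise ValueError("at least one URL is required")
    # Drop options left as None so the actor's own defaults apply.
    extras = {key: value for key, value in options.items() if value is not None}
    return {"operation": operation, "urls": list(urls), **extras}


def run_actor(token: str, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Step 5: start a run and collect its dataset items (needs network access)."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient  # third-party: pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the actor run did not return a result")
    # Assumption: results are stored in the run's default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    print(build_run_input("screenshot", ["https://example.com"], fullPage=True, delay=3000))
```

Keeping input construction separate from the network call means the input can be checked locally before any paid results are produced.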

Usage Examples

Web Scraping Input

{
    "operation": "scrape",
    "urls": ["https://example.com", "https://example2.com"],
    "format": "html", // Optional, defaults to 'html'. Other formats available: 'readability', 'cleaned_html', 'markdown'
    "delay": 5000, // Optional, Delay before scraping (in milliseconds)
    "maxTimeout": 30 // Optional, Maximum timeout for the operation (in seconds)
}

Screenshot Capture Input

{
  "operation": "screenshot",
  "urls": ["https://example.com"],
  "fullPage": true, // Optional, defaults to false
  "delay": 3000, // Optional, Delay before capturing the screenshot (in milliseconds)
  "maxTimeout": 30 // Optional, Maximum timeout for the operation (in seconds)
}

PDF Generation Input

{
  "operation": "pdf",
  "urls": ["https://example.com"],
  "delay": 3000, // Optional, Delay before generating the PDF (in milliseconds)
  "maxTimeout": 30 // Optional, Maximum timeout for the operation (in seconds)
}
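
All three inputs share `delay` (in milliseconds) and `maxTimeout` (in seconds), and the mixed units are easy to get wrong. The sketch below converts both to seconds and checks them before submission. `check_timing` is a hypothetical helper, and the rule that the delay should fit inside the timeout is an assumption, not documented actor behavior.

```python
def check_timing(run_input: dict) -> float:
    """Return the seconds left for the operation once the delay has elapsed."""
    delay_ms = run_input.get("delay", 0)
    timeout_s = run_input.get("maxTimeout")
    if delay_ms < 0:
        raise ValueError("delay must not be negative")
    if timeout_s is None:
        # No timeout given: nothing to compare against.
        return float("inf")
    # delay is in milliseconds, maxTimeout in seconds: convert before comparing.
    remaining_s = timeout_s - delay_ms / 1000
    if remaining_s <= 0:
        raise ValueError(
            f"delay of {delay_ms} ms uses up the whole {timeout_s} s timeout"
        )
    return remaining_s


if __name__ == "__main__":
    # The scrape example above: 5000 ms delay inside a 30 s timeout.
    print(check_timing({"operation": "scrape", "delay": 5000, "maxTimeout": 30}))  # 25.0
```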

Example Output for Web Scraping

Below is an example of the JSON output from a web scraping operation. This output includes the scraped HTML content, metadata about the scrape, and a list of links found on the page.

{
  "content": {
    "html": "<html lang=\"en\" data-theme=\"light\" style=\"color-scheme: light;\"><head>.....</body></html>"
  },
  "metadata": {
    "statusCode": 200,
    "title": "datavoyantlab (DataVoyantLab) · Apify",
    "ogImage": "https://apify.com/og-image/user?username=datavoyantlab",
    "ogTitle": "datavoyantlab (DataVoyantLab) · Apify",
    "urlSource": "https://apify.com/datavoyantlab",
    "description": "🔍 Web Data Extraction Specialist | Building tomorrow's automation tools today | Turning data into decisions 💡",
    "ogDescription": "🔍 Web Data Extraction Specialist | Building tomorrow's automation tools today | Turning data into decisions 💡",
    "language": "en",
    "timestamp": "2025-01-12T22:12:40.497Z"
  },
  "links": [
    {
      "url": "https://apify.com/datavoyantlab#main-content",
      "text": "Skip to content"
    },
    // Additional links omitted for brevity
  ]
}

This output is structured to provide comprehensive details about the scraped page, including the HTML content, response status, and various metadata elements like the page title, description, and the original URL. The links array contains objects representing links found on the page, each with a URL and the link text.
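
As a rough illustration of consuming this structure, the sketch below condenses one output item into a short summary. It assumes only the keys visible in the example above (`content.html`, `metadata.statusCode`, `metadata.title`, `metadata.urlSource`, and `links[].url` / `links[].text`). `summarize_item` is a hypothetical helper, and the sample item is a shortened stand-in for real output.

```python
from urllib.parse import urldefrag


def summarize_item(item: dict) -> dict:
    """Condense one scrape result into status, title, source, and unique links."""
    metadata = item.get("metadata", {})
    unique_links: dict[str, str] = {}
    for link in item.get("links", []):
        # Strip "#fragment" so in-page anchors collapse onto their page URL.
        url, _fragment = urldefrag(link.get("url", ""))
        if url:
            # setdefault keeps the first text seen for each URL, in page order.
            unique_links.setdefault(url, link.get("text", ""))
    return {
        "ok": metadata.get("statusCode") == 200,
        "title": metadata.get("title"),
        "source": metadata.get("urlSource"),
        "html_length": len(item.get("content", {}).get("html", "")),
        "unique_links": list(unique_links),
    }


if __name__ == "__main__":
    sample = {
        "content": {"html": "<html></html>"},
        "metadata": {
            "statusCode": 200,
            "title": "datavoyantlab (DataVoyantLab) · Apify",
            "urlSource": "https://apify.com/datavoyantlab",
        },
        "links": [
            {"url": "https://apify.com/datavoyantlab#main-content", "text": "Skip to content"},
        ],
    }
    print(summarize_item(sample))
```

Using `.get` with defaults keeps the helper tolerant of items where a key is missing, for example when a page could not be loaded.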

Developer
Maintained by Community

Actor Metrics

  • 2 monthly users

  • No stars yet

  • >99% runs succeeded

  • Created in Jan 2025

  • Modified 8 hours ago