Legacy PhantomJS Crawler

Pricing

Pay per usage

Developed by

Apify

Maintained by Apify

Replacement for the legacy Apify Crawler product with a backward-compatible interface. The actor uses the PhantomJS headless browser to recursively crawl websites and extract data from them using a piece of front-end JavaScript code.

5.0 (6)

1.6K

Last modified

23 days ago

Developer tools

Open source

You can access the Legacy PhantomJS Crawler programmatically from your own applications by using the Apify API. To use the Apify API, you'll need an Apify account and your API token, which you can find under Integrations in the Apify Console settings. The example below uses Python.

from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [{
        "key": "START",
        "value": "https://www.example.com/",
    }],
    "crawlPurls": [{
        "key": "MY_LABEL",
        "value": "https://www.example.com/[.*]",
    }],
    "clickableElementsSelector": "a:not([rel=nofollow])",
    "pageFunction": """function pageFunction(context) {
    // called on every page the crawler visits, use it to extract data from it
    var $ = context.jQuery;
    var result = {
        title: $('title').text(),
        myValue: $('TODO').text()
    };
    return result;
}
""",
    "interceptRequest": """function interceptRequest(context, newRequest) {
    // called whenever the crawler finds a link to a new page,
    // use it to override default behavior
    return newRequest;
}
""",
}

# Run the Actor and wait for it to finish
run = client.actor("apify/legacy-phantomjs-crawler").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start
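The "crawlPurls" value in the input above, https://www.example.com/[.*], is a pseudo-URL. The sketch below shows one way such a pattern could be checked locally before starting a run. It assumes that text inside square brackets is a regular expression and everything outside is matched literally. The helper name purl_to_regex is hypothetical and not part of apify-client, and the actor's own matching may differ in details such as case sensitivity.

```python
import re


def purl_to_regex(purl: str) -> "re.Pattern[str]":
    """Turn a pseudo-URL into a compiled regex (illustrative approximation).

    Assumption: parts wrapped in [...] are regular expressions, and all
    other characters are literal. Nested brackets such as [[a-z]+] are
    handled by tracking bracket depth.
    """
    pattern = []   # finished regex fragments
    literal = []   # characters of the current literal segment
    regex = []     # characters of the current [...] segment
    depth = 0      # current bracket nesting depth

    for ch in purl:
        if ch == "[":
            if depth == 0:
                # Close the literal segment and start a regex segment.
                pattern.append(re.escape("".join(literal)))
                literal = []
            else:
                regex.append(ch)
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
            if depth == 0:
                # Close the regex segment; wrap it in a non-capturing group.
                pattern.append("(?:" + "".join(regex) + ")")
                regex = []
            else:
                regex.append(ch)
        elif depth > 0:
            regex.append(ch)
        else:
            literal.append(ch)

    if depth != 0:
        raise ValueError("unbalanced '[' in pseudo-URL: " + purl)

    pattern.append(re.escape("".join(literal)))
    return re.compile("^" + "".join(pattern) + "$")


if __name__ == "__main__":
    matcher = purl_to_regex("https://www.example.com/[.*]")
    for url in ("https://www.example.com/about", "https://www.example.org/"):
        print(url, "->", bool(matcher.match(url)))
```

Checking a few sample URLs this way can catch an overly broad or overly narrow pattern before the crawler spends any usage on it.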

Legacy PhantomJS Crawler - Crawl websites, extract data API in Python

The Apify API client for Python is the official library for using the Legacy PhantomJS Crawler API from Python. It provides convenience functions and automatic retries on errors.

Install the apify-client

$ pip install apify-client
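Once the client is installed, it may be preferable to keep the API token out of the source code. The following is a minimal sketch under stated assumptions: the environment variable name APIFY_TOKEN and the helpers get_token, build_run_input, and run_crawler are choices made for this example, not part of apify-client. The input keys and the actor ID are the ones used in the example above.

```python
import os

# Default interceptRequest from the example above: returns the request unchanged.
DEFAULT_INTERCEPT_REQUEST = """function interceptRequest(context, newRequest) {
    return newRequest;
}
"""


def get_token(env_var: str = "APIFY_TOKEN") -> str:
    """Read the API token from the environment (variable name is an assumption)."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError("Set the " + env_var + " environment variable first")
    return token


def build_run_input(start_url: str, purl: str, page_function: str,
                    label: str = "MY_LABEL") -> dict:
    """Assemble the Actor input using the same keys as the example above."""
    return {
        "startUrls": [{"key": "START", "value": start_url}],
        "crawlPurls": [{"key": label, "value": purl}],
        "clickableElementsSelector": "a:not([rel=nofollow])",
        "pageFunction": page_function,
        "interceptRequest": DEFAULT_INTERCEPT_REQUEST,
    }


def run_crawler(run_input: dict) -> list:
    """Run the Actor, wait for it to finish, and return its dataset items."""
    # Imported here so the helpers above work without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(get_token())
    run = client.actor("apify/legacy-phantomjs-crawler").call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    page_function = """function pageFunction(context) {
    var $ = context.jQuery;
    return { title: $('title').text() };
}
"""
    items = run_crawler(build_run_input(
        "https://www.example.com/",
        "https://www.example.com/[.*]",
        page_function,
    ))
    for item in items:
        print(item)
```

Keeping input construction separate from the network call makes the input easy to inspect or unit-test without starting a run.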

Other API clients include:

Legacy PhantomJS Crawler API in JavaScript

Legacy PhantomJS Crawler API through CLI

Legacy PhantomJS Crawler OpenAPI definition

Legacy PhantomJS Crawler API

Puppeteer Scraper

apify/puppeteer-scraper

Crawls websites with headless Chrome and the Puppeteer library using provided server-side Node.js code. This crawler is an alternative to apify/web-scraper that gives you finer control over the process. Supports both recursive crawling and lists of URLs. Supports logging in to websites.

Apify

9.6K

5.0

Playwright Scraper

apify/playwright-scraper

Crawls websites with a headless Chromium, Chrome, or Firefox browser and the Playwright library using provided server-side Node.js code. Supports both recursive crawling and lists of URLs. Supports logging in to websites.

Apify

2.8K

4.7

Cheerio Scraper

apify/cheerio-scraper

Crawls websites using raw HTTP requests, parses the HTML with the Cheerio library, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This actor is a high-performance alternative to apify/web-scraper for websites that do not require JavaScript.

Apify

10K

4.9

Example Puppeteer

apify/example-puppeteer

Example showing how to use headless Chromium with Puppeteer to open a web page, determine its dimensions, save a screenshot, and print the page to PDF. This actor must use images with Puppeteer (Node.js 8 + Puppeteer on Debian).

Apify

426

4.6

Send FCM

martin.forejt/send-fcm

This actor can be used as an integration with Firebase Cloud Messaging (FCM). It sends a message (a push notification) to a device, a group of devices, or topics. The message can be fully customised and supports all FCM options.

Martin Forejt

5.0

Vanilla JS Scraper

mstephen190/vanilla-js-scraper

Scrape the web using familiar JavaScript methods! Crawls websites using raw HTTP requests, parses the HTML with the JSDOM package, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This actor is a non-jQuery alternative to CheerioScraper.

Matthias Stephens

484

Send Legacy PhantomJS Crawler Results

drobnikj/send-crawler-results

This actor downloads results from a Legacy PhantomJS Crawler task and sends them by email as attachments. It is designed to run from a finish webhook.

Jakub Drobník

Algolia Website Indexer

apify/algolia-website-indexer

The Indexer recursively crawls a website using the Puppeteer browser (headless Chrome) and indexes the selected pages to the Algolia index.

Apify

4.5

API / JSON scraper

pocesar/json-downloader

Scrape any API / JSON URLs directly to the dataset, and return them in CSV, XML, HTML, or Excel formats. Transform and filter the output. Enables you to follow pagination recursively from the payload without the need to visit the HTML page.

Paulo Cesar

540

JSDOM Scraper

apify/jsdom-scraper

Parses the HTML using the JSDOM library, providing the same DOM API as browsers do (e.g. `window`). It is able to process client-side JavaScript without using a real browser. Performance-wise, it stands somewhere between the Cheerio Scraper and the browser scrapers.

Apify

112

4.3