Example Process Crawl Results


Developed and maintained by Apify

Iterates through all results from a crawler run and counts them. It needs to be called from the crawler's finish webhook: add this actor's URL as the finish webhook of your crawler. Use this actor as a starting point for developing custom post-processing of data from the crawler.
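For illustration, here is a minimal sketch of the input validation the actor performs on start, assuming the finish webhook delivers a JSON payload whose `_id` field carries the crawler execution ID (the payload shape shown and the example ID are hypothetical):

```javascript
// Hedged sketch: the finish webhook is assumed to deliver a JSON payload
// that becomes the actor's INPUT; "_id" holds the crawler execution ID.
const input = { _id: 'example-execution-id' }; // hypothetical payload

// The actor refuses to run without an execution ID to iterate over.
if (!input || !input._id) {
  throw new Error('Input is missing the "_id" attribute. Did you start it from a crawler finish webhook?');
}
const executionId = input._id;
console.log(`Processing crawler execution ${executionId}`);
```

If the actor is started manually rather than by the webhook, the `_id` field is missing and the run fails fast with a descriptive error.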

Rating: 4.5 (2)

Pricing: Pay per usage

Monthly users: 1

Last modified: 10 months ago

Dockerfile

# This is a template for a Dockerfile used to run acts in the Actor system.
# The base image name below is set during the act build, based on user settings.
# IMPORTANT: The base image must set a correct working directory, such as /usr/src/app or /home/user
FROM apify/actor-node-basic:v0.21.10

# Copy just package.json and package-lock.json first, since they are
# the only files that affect "npm install" in the next step; this speeds up the build.
COPY package*.json ./

# Install NPM packages, skipping optional and development dependencies to
# keep the image small. Avoid logging too much and print the dependency
# tree for debugging.
RUN npm --quiet set progress=false \
 && npm install --only=prod --no-optional \
 && echo "Installed NPM packages:" \
 && (npm list --all || true) \
 && echo "Node.js version:" \
 && node --version \
 && echo "NPM version:" \
 && npm --version

# Copy the source code to the container.
# Do this in the last step so the build stays fast when only the source code changes.
COPY . ./

# NOTE: The CMD is already defined by the base image.
# Uncomment this for local Node inspector debugging:
# CMD [ "node", "--inspect=0.0.0.0:9229", "main.js" ]

package.json

{
    "name": "apify-project",
    "version": "0.0.1",
    "description": "",
    "author": "It's not you it's me",
    "license": "ISC",
    "dependencies": {
        "apify": "0.21.10",
        "underscore": "latest"
    },
    "scripts": {
        "start": "node main.js"
    }
}

main.js

const Apify = require('apify');
const _ = require('underscore');

Apify.main(async () => {
    // Get act input and validate it
    const input = await Apify.getValue('INPUT');
    console.log('Input:');
    console.dir(input);
    if (!input || !input._id) {
        throw new Error('Input is missing the "_id" attribute. Did you start it from a crawler finish webhook?');
    }
    const executionId = input._id;

    // Print info about the crawler run
    const crawlerRunDetails = await Apify.client.crawlers.getExecutionDetails({ executionId });
    if (!crawlerRunDetails) {
        throw new Error(`There is no crawler run with ID: "${executionId}"`);
    }
    console.log(`Details of the crawler run (ID: ${executionId}):`);
    console.dir(crawlerRunDetails);

    // Iterate through all crawler results and count them.
    // This is the place where you can add something more adventurous :)
    console.log('Counting results from crawler run...');

    const limit = 100;
    let offset = 0;
    let totalItems = 0;
    let results;

    do {
        // Fetch one page of results and advance the offset by the number returned
        results = await Apify.client.crawlers.getExecutionResults({
            executionId,
            limit,
            offset,
        });

        offset += results.count;
        totalItems += results.items.length;
    } while (results.count > 0);

    // Save results
    console.log(`Found ${totalItems} records`);
    await Apify.setValue('OUTPUT', {
        crawlerRunDetails,
        totalItems,
    });
});
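The offset/limit pagination loop above is the part worth adapting. Here is a self-contained sketch of the same pattern with a mock in place of `Apify.client.crawlers.getExecutionResults` (the mock function, the fake item count, and the page size are assumptions for illustration only):

```javascript
// Fake dataset standing in for the crawler's stored results.
const allItems = Array.from({ length: 250 }, (_, i) => ({ id: i }));

// Mock client call: returns up to `limit` items starting at `offset`,
// mirroring the { count, items } shape the loop in main.js relies on.
async function getExecutionResults({ limit, offset }) {
    const items = allItems.slice(offset, offset + limit);
    return { count: items.length, items };
}

async function countAll() {
    const limit = 100;
    let offset = 0;
    let totalItems = 0;
    let results;
    do {
        results = await getExecutionResults({ limit, offset });
        offset += results.count;      // advance by the number of items returned
        totalItems += results.items.length;
    } while (results.count > 0);      // stop when a page comes back empty
    return totalItems;
}

countAll().then((n) => console.log(`Found ${n} records`)); // prints "Found 250 records"
```

The loop terminates because an empty page returns `count === 0`; replacing the counting with your own per-item processing is the intended customization point.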

Pricing

Pricing model: Pay per usage

The Actor itself is free to use; you pay only for the Apify platform usage it consumes.