Diff Datasets

Developed by

Paulo Cesar

Maintained by Community

Last modified

3 years ago

Takes one dataset on the Apify platform (the "base"), compares it to another (the "other"), and outputs the items from "other" that are missing from "base". It can also output only the changed items by using a compound key.

Whole nested objects can be used as values: they are passed through JSON.stringify and then turned into a small, space-efficient, non-cryptographic hash.
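As a rough illustration of the stringify-then-hash step, here is a minimal sketch. The README does not name the hash the Actor uses, so FNV-1a (32-bit) is swapped in as an assumption, and both `fnv1a32` and `hashValue` are hypothetical names rather than part of the Actor.

```javascript
// Hypothetical helpers: the Actor's real hash function is not named in the README.
// FNV-1a (32-bit), applied to UTF-16 code units, stands in as one example of a
// small non-cryptographic hash.
function fnv1a32(text) {
    let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
    for (let i = 0; i < text.length; i++) {
        hash ^= text.charCodeAt(i);
        hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, kept to 32 bits
    }
    return hash >>> 0; // convert to an unsigned 32-bit integer
}

// Serialize any value (primitive or nested object) and hash the resulting string.
// String() covers the case where JSON.stringify returns undefined.
function hashValue(value) {
    return fnv1a32(String(JSON.stringify(value)));
}

const first = hashValue({ sub: { fields: [1, 2] } });
const second = hashValue({ sub: { fields: [1, 2] } });
console.log(first === second); // true: equal structures stringify identically
```

One thing to keep in mind with any JSON.stringify-based approach is that property order matters: `{ a: 1, b: 2 }` and `{ b: 2, a: 1 }` serialize to different strings, so they would hash differently unless the keys are sorted first.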

Example

const Apify = require('apify'); // legacy Apify SDK, where Apify.call is available

Apify.main(async () => {
    await Apify.call('pocesar/diff-datasets', {
        baseDatasetId: 'LdNAlaOY1aKGhwAah', // place the datasets here. The order of "base" and "other" matters
        otherDatasetId: 'Bzu1pgOjenN43VhPY', // items that already exist in "base" are not output from "other"
        uniqueFields: [
            // simple primitive field value, like string, number, boolean
            "pageUrl",

            // you can use lodash.get notation to get nested items,
            // in this case `sub.fields.0` works like `sub.fields[0]` and the object looks like
            // {
            //    pageUrl: "https://pageurl",
            //    sub: {
            //      fields: [
            //        {...},
            //        {...}
            //      ]
            //    }
            //  }
            "sub.fields.0",

            // you can also use .length to count array items or string characters, as in
            "sub.fields.length",
            "pageUrl.length"
        ],
    });
});
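To make the `uniqueFields` paths more concrete, the sketch below resolves each path against one item and joins the results into a single compound key. It is an illustration only: `getPath` is a hypothetical, simplified stand-in for lodash.get that handles dot-separated paths (lodash.get also accepts bracket notation), and `compoundKey` is an assumed name, not the Actor's actual code.

```javascript
// Hypothetical stand-in for lodash.get, limited to dot-separated paths.
// Returns undefined as soon as any segment along the path is missing.
function getPath(item, path) {
    return path.split('.').reduce(
        (current, segment) => (current == null ? undefined : current[segment]),
        item,
    );
}

// Resolve every unique field and serialize the values, in order, as one key.
// JSON.stringify writes missing (undefined) values inside an array as null.
function compoundKey(item, uniqueFields) {
    return JSON.stringify(uniqueFields.map((field) => getPath(item, field)));
}

const item = {
    pageUrl: 'https://pageurl',
    sub: { fields: [{ id: 1 }, { id: 2 }] },
};

console.log(compoundKey(item, [
    'pageUrl',           // primitive value
    'sub.fields.0',      // first element of a nested array
    'sub.fields.length', // number of array elements
    'pageUrl.length',    // number of string characters
]));
// prints ["https://pageurl",{"id":1},2,15]
```

Two items produce the same key only when every listed field matches, which is why adding more fields to `uniqueFields` turns a "missing items" diff into a "changed items" diff.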

Limitations

  • Every value read from the base dataset is kept in memory, so the more items it has, the more memory is needed.
  • The key-value store might choke when saving the in-memory Set if it holds too many items.
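The memory limitation follows from how a set-based diff works, which the following sketch tries to show. It assumes plain in-memory arrays instead of real Apify datasets, and `diffItems` and `keyOf` are hypothetical names used only for illustration.

```javascript
// Sketch only: `keyOf` is a hypothetical caller-supplied function that turns an
// item into its (compound) key string.
function diffItems(baseItems, otherItems, keyOf) {
    // One key per base item is held here for the whole run,
    // so memory use grows with the size of the base dataset.
    const seen = new Set();
    for (const item of baseItems) {
        seen.add(keyOf(item));
    }

    // Items from "other" are output only when their key is absent from "base".
    const missing = otherItems.filter((item) => !seen.has(keyOf(item)));
    return { missing, keysHeld: seen.size };
}

const base = [{ url: '/a', price: 10 }, { url: '/b', price: 20 }];
const other = [{ url: '/a', price: 10 }, { url: '/b', price: 25 }, { url: '/c', price: 5 }];

// Keyed on the URL alone, only the new item is reported.
const byUrl = diffItems(base, other, (item) => item.url);
console.log(byUrl.missing.map((item) => item.url)); // [ '/c' ]

// Keyed on URL and price together, the changed item is reported as well.
const byUrlAndPrice = diffItems(base, other, (item) => JSON.stringify([item.url, item.price]));
console.log(byUrlAndPrice.missing.map((item) => item.url)); // [ '/b', '/c' ]
```

Storing short hashes instead of full key strings reduces the cost per item, but the Set still needs one entry for every distinct base key.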

License

Apache 2.0

Pricing

Pricing model

Pay per usage

This Actor is billed per platform usage: the Actor itself is free to use, and you pay only for the Apify platform usage it consumes.