
Dataset Validity Checker

equidem/dataset-validity-checker

Automatically checks whether the default datasets created by an actor's runs differ too much from previously encountered ones, so it can warn you about web scraping problems caused by, e.g., a change in a website's layout or other significant changes in the resulting data.

Actor Id

actId (string, optional)

ID of the actor whose datasets the validity checker should process.

Task Id

taskId (string, optional)

ID of the task whose datasets the validity checker should process. Takes precedence over actId.

User Token

token (string, optional)

Token of the user who owns the examined actor or task. If omitted, the token of the user running the Dataset Validity Checker is used.
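The precedence rules for actId, taskId, and token can be summarized in a short sketch. The helper name resolve_target and its return shape are hypothetical, and requiring at least one of actId or taskId is an assumption (the page marks both as optional); this only mirrors the documented rules, not the actor's actual code.

```python
from typing import Optional


def resolve_target(act_id: Optional[str], task_id: Optional[str],
                   token: Optional[str], caller_token: str) -> tuple[str, str, str]:
    """Hypothetical helper: return (kind, id, token) for the datasets to examine."""
    if task_id:
        kind, target = "task", task_id  # taskId takes precedence over actId
    elif act_id:
        kind, target = "actor", act_id
    else:
        # Assumption: one of the two identifiers must be supplied.
        raise ValueError("either actId or taskId must be provided")
    # Fall back to the token of the user running the checker.
    return kind, target, token or caller_token
```

For example, passing both an actId and a taskId selects the task, and leaving token empty uses the caller's token.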

Warning Email

warningEmail (string, optional)

Email address to which warnings about invalid datasets should be sent.

Clear History

clearHistory (boolean, optional)

Set to true if you want the validity checker to discard all previously gathered information about datasets and start anew. Use this option if you change the actor in a way that significantly changes its results, or if the website changes significantly in a way that doesn't actually break your actor (e.g., the number of distinct items available for purchase at an e-shop changes drastically).

Default value of this property is false

Previous Datasets Considered

previousDatasetsTakenIntoAccount (integer, optional)

The number of previous datasets considered when determining whether a dataset is valid. If omitted, 100 is used.

Minimal Datasets

minimalDatasetCount (integer, optional)

Minimum number of datasets that must have been processed before further datasets can be validated. Must not exceed 'Previous Datasets Considered'. If omitted, 10 is used.
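How the two counts interact can be sketched as follows, assuming the history is a chronologically ordered list of previously processed datasets. The function names history_window and can_validate are hypothetical; only the defaults (100 and 10) and the "minimal must not exceed previous" constraint come from the field descriptions.

```python
from typing import Optional

DEFAULT_PREVIOUS = 100  # documented default for previousDatasetsTakenIntoAccount
DEFAULT_MINIMAL = 10    # documented default for minimalDatasetCount


def history_window(previous: Optional[int], minimal: Optional[int]) -> tuple[int, int]:
    """Apply the documented defaults and enforce minimal <= previous."""
    previous = DEFAULT_PREVIOUS if previous is None else previous
    minimal = DEFAULT_MINIMAL if minimal is None else minimal
    if minimal > previous:
        raise ValueError("minimalDatasetCount must not exceed previousDatasetsTakenIntoAccount")
    return previous, minimal


def can_validate(history: list, previous: Optional[int] = None,
                 minimal: Optional[int] = None) -> tuple[bool, list]:
    """Return (enough_history, window) for a chronologically ordered history."""
    previous, minimal = history_window(previous, minimal)
    # Keep only the most recent `previous` datasets (safe even when previous == 0).
    window = history[max(0, len(history) - previous):]
    return len(window) >= minimal, window
```

With the defaults, a history of 5 datasets is too short to validate against, while a history of 150 is trimmed to its 100 most recent entries.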

Number Handling Policy

numberHandlingPolicy (enum, optional)

Governs which attribute values the Dataset Validity Checker treats as numbers. With 'Strict', only values stored with the number type are treated as numbers. With 'Loose', strings containing numbers in non-scientific notation are treated as numbers as well. 'Strict' is generally better, but if you don't convert numbers to the proper type, 'Loose' should give you better results.

Value options:

"loose", "strict"

Default value of this property is "loose"
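The difference between the two policies can be illustrated with a small predicate. The function is_number and the regular expression for "non-scientific notation" are assumptions made for illustration; the page does not document the exact pattern the actor accepts.

```python
import re
from typing import Any

# Assumed pattern for non-scientific decimal notation:
# optional sign, digits, optional fractional part (no exponent).
_PLAIN_NUMBER = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)")


def is_number(value: Any, policy: str = "loose") -> bool:
    """Decide whether `value` is treated as a number under `policy`."""
    if isinstance(value, bool):
        return False  # bool is a subclass of int in Python; exclude it
    if isinstance(value, (int, float)):
        return True  # genuine numbers count under both policies
    if policy == "loose" and isinstance(value, str):
        return _PLAIN_NUMBER.fullmatch(value.strip()) is not None
    return False
```

Under this sketch, "3.5" counts as a number only with the loose policy, and "1e5" is rejected by both because it uses scientific notation.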

Starting At

startingAt (string, optional)

Sets the earliest run whose dataset will be processed by this run of the Dataset Validity Checker. Superseded if runs from a later time have already been processed. Must be an ISO 8601 compliant date/time in UTC.

Until

until (string, optional)

Sets the latest run whose dataset will be processed by this run of the Dataset Validity Checker. Must be an ISO 8601 compliant date/time in UTC.
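A sketch of how startingAt, until, and the already-processed history might combine into an effective time window. The names parse_utc, effective_window, and last_processed (the timestamp of the latest run already processed, assumed timezone-aware) are hypothetical, and whether the boundaries are inclusive is not documented.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_utc(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as '2021-03-01T00:00:00Z'."""
    # fromisoformat() accepts a trailing 'Z' only from Python 3.11 on.
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # treat naive input as UTC
    return dt.astimezone(timezone.utc)


def effective_window(starting_at: Optional[str], until: Optional[str],
                     last_processed: Optional[datetime]
                     ) -> tuple[Optional[datetime], Optional[datetime]]:
    """Return (start, end) bounds for the runs to process; None means unbounded."""
    start = parse_utc(starting_at) if starting_at else None
    # startingAt is superseded if later runs have already been processed.
    if last_processed is not None and (start is None or last_processed > start):
        start = last_processed
    end = parse_utc(until) if until else None
    return start, end
```

For instance, with startingAt set to 2021-01-01 and runs up to 2021-01-15 already processed, the effective start moves to 2021-01-15.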

Average Multiplying Coefficient

averageMultiplyingCoefficient (string, optional)

Controls how much a dataset may differ from the previously seen datasets and still be considered valid, expressed as a multiple of the average difference. Default value is 5.

Maximal Multiplying Coefficient

maximalMultiplyingCoefficient (string, optional)

Controls how much a dataset may differ from the previously seen datasets and still be considered valid, expressed as a multiple of the maximum difference. Default value is 2.

Leniency Coefficient

leniencyCoefficient (string, optional)

Lets you control both 'Maximal Multiplying Coefficient' and 'Average Multiplying Coefficient' at the same time. It is multiplicative, so a value of 2 doubles both of them. Default value is 1.
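The three coefficients can be read as two limits scaled by a shared factor. The sketch below assumes a dataset is valid only if its difference passes both limits; the actual combination rule and the difference metric are not documented, so is_within_bounds is a hypothetical illustration. Because the input schema lists the coefficients as strings, the sketch converts them with float().

```python
from statistics import mean
from typing import Union

Number = Union[float, str]  # schema lists the coefficients as strings


def is_within_bounds(new_difference: float, previous_differences: list[float],
                     average_coef: Number = 5, maximal_coef: Number = 2,
                     leniency: Number = 1) -> bool:
    """Hypothetical acceptance rule built from the documented coefficients."""
    if not previous_differences:
        raise ValueError("need at least one previous difference")
    lenient = float(leniency)  # multiplies both coefficients
    avg_limit = float(average_coef) * lenient * mean(previous_differences)
    max_limit = float(maximal_coef) * lenient * max(previous_differences)
    # Assumption: the difference must stay within BOTH limits.
    return new_difference <= avg_limit and new_difference <= max_limit
```

With previous differences of 1, 2, and 3, the default limits are 10 (5 × average 2) and 6 (2 × maximum 3), so a difference of 7 fails; setting leniency to "2" raises the limits to 20 and 12, and it passes.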

Developer
Maintained by Community

Actor Metrics

  • 1 monthly user

  • 3 stars

  • >99% runs succeeded

  • Created in Aug 2019

  • Modified 2 years ago
