Website Checker

lukaskrivka/website-checker

Check any website you plan to scrape for its expected compute unit consumption, anti-scraping software, and reliability.

URLs to check

urlsToCheck (array, required)

A static list of URLs to check for captchas. To be able to add new URLs on the fly, enable the Use request queue option.

For details, see Start URLs in README.
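A minimal input object may help show how these fields fit together. This is a sketch under assumptions: it supposes that urlsToCheck takes objects with a url property (the usual Apify start-URL shape) and that the dotted names on this page, such as checkers.cheerio, are literal flat keys. The interface name is hypothetical and covers only a subset of the fields.

```typescript
// Hypothetical type for a subset of the Website Checker input.
// Assumptions: urlsToCheck items look like { url }, and dotted names are flat keys.
interface WebsiteCheckerInput {
  urlsToCheck: { url: string }[];
  proxyConfiguration?: { useApifyProxy?: boolean };
  "checkers.cheerio"?: boolean;
  "checkers.puppeteer"?: boolean;
  "checkers.playwright"?: boolean;
  maxNumberOfPagesCheckedPerDomain?: number;
  maxConcurrentDomainsChecked?: number;
}

// Example: check one site with Cheerio and Playwright only, through Apify Proxy.
const input: WebsiteCheckerInput = {
  urlsToCheck: [{ url: "https://www.example.com" }],
  proxyConfiguration: { useApifyProxy: true },
  "checkers.cheerio": true,
  "checkers.puppeteer": false,
  "checkers.playwright": true,
  maxNumberOfPagesCheckedPerDomain: 20,
  maxConcurrentDomainsChecked: 5,
};

console.log(JSON.stringify(input, null, 2));
```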

Proxy Configuration

proxyConfiguration (object, optional)

Specifies the proxy servers the checker uses to hide its origin.

For details, see Proxy configuration in README.

Default value of this property is {}

Cheerio

checkers.cheerio (boolean, optional)

Crawl with Cheerio

Default value of this property is true

Puppeteer

checkers.puppeteer (boolean, optional)

Crawl with Puppeteer

Default value of this property is true

Playwright

checkers.playwright (boolean, optional)

Crawl with Playwright

Default value of this property is true

Save snapshot

saveSnapshot (boolean, optional)

Saves the page HTML for Cheerio, and the HTML plus a screenshot for Puppeteer and Playwright.

Default value of this property is true

Enqueue any URL on domain (no need for link selector or pseudo URLs)

enqueueAllOnDomain (boolean, optional)

Enqueues every URL found that belongs to the same domain.

Default value of this property is true
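To illustrate what "any URL on the domain" could mean in practice, here is a minimal sketch of a same-domain filter. It assumes "same domain" means an identical hostname once a leading "www." is ignored; the Actor may apply a different rule, and the helper names are hypothetical.

```typescript
// Hypothetical same-domain filter. Assumption: domains match when hostnames are
// equal after lowercasing and stripping a leading "www.".
function normalizeHost(hostname: string): string {
  return hostname.toLowerCase().replace(/^www\./, "");
}

function isOnSameDomain(startUrl: string, candidate: string): boolean {
  try {
    const start = new URL(startUrl);
    // Resolve relative links (e.g. "/about") against the start URL.
    const link = new URL(candidate, start);
    return normalizeHost(link.hostname) === normalizeHost(start.hostname);
  } catch {
    // Links that cannot be parsed are never enqueued.
    return false;
  }
}

const start = "https://www.example.com/";
const found = ["/about", "https://example.com/blog", "https://other.org/"];
console.log(found.filter((href) => isOnSameDomain(start, href)));
```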

Link Selector

linkSelector (string, optional)

A CSS selector saying which links on the page (<a> elements with href attribute) shall be followed and added to the request queue. This setting only applies if Use request queue is enabled. To filter the links added to the queue, use the Pseudo-URLs setting.

If Link selector is empty, the page links are ignored.

For details, see Link selector in README.

Pseudo-URLs

pseudoUrls (array, optional)

Specifies what kind of URLs found by Link selector should be added to the request queue. A pseudo-URL is a URL with regular expressions enclosed in [] brackets, e.g. http://www.example.com/[.*]. This setting only applies if the Use request queue option is enabled.

If Pseudo-URLs are omitted, the actor enqueues all links matched by the Link selector.

For details, see Pseudo-URLs in README.

Default value of this property is []
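The pseudo-URL rule above (text in [] brackets is a regular expression, everything else is literal) can be sketched as a conversion to a JavaScript RegExp. This simplified version assumes brackets are never nested or escaped and anchors the whole URL; it is an illustration, not the parser the Actor actually uses.

```typescript
// Escape characters that have special meaning in regular expressions.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Convert a pseudo-URL such as "http://www.example.com/[.*]" into a RegExp.
// Assumption: brackets are not nested and never appear escaped.
function pseudoUrlToRegExp(purl: string): RegExp {
  let pattern = "";
  let i = 0;
  while (i < purl.length) {
    const open = purl.indexOf("[", i);
    if (open === -1) {
      pattern += escapeRegExp(purl.slice(i)); // trailing literal text
      break;
    }
    const close = purl.indexOf("]", open);
    if (close === -1) throw new Error(`Unclosed "[" in pseudo-URL: ${purl}`);
    pattern += escapeRegExp(purl.slice(i, open)); // literal part
    pattern += `(?:${purl.slice(open + 1, close)})`; // regex part, grouped
    i = close + 1;
  }
  return new RegExp(`^${pattern}$`);
}

const re = pseudoUrlToRegExp("http://www.example.com/[.*]");
console.log(re.test("http://www.example.com/page"));
```

Grouping each bracketed part keeps an alternation like [a|b] from leaking into the surrounding literal text.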

Repeat checks on provided URLs

repeatChecksOnProvidedUrls (integer, optional)

Loads each provided URL this many times. Useful for testing the same URL repeatedly or for getting past blocking on the first page load.
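Crawlee's request queue deduplicates requests by their uniqueKey, so visiting the same URL several times requires distinct keys. The sketch below shows one way to expand a URL list for repeated checks; the helper name and the "#repeat-N" key format are hypothetical, not taken from the Actor.

```typescript
// Hypothetical request shape: a URL plus the key used for deduplication.
interface RequestSpec {
  url: string;
  uniqueKey: string;
}

// Expand each URL into `repeat` requests with distinct unique keys,
// so a deduplicating queue does not collapse them into one.
function expandRepeats(urls: string[], repeat: number): RequestSpec[] {
  const times = Math.max(1, Math.floor(repeat));
  const requests: RequestSpec[] = [];
  for (const url of urls) {
    for (let n = 0; n < times; n++) {
      requests.push({ url, uniqueKey: `${url}#repeat-${n}` });
    }
  }
  return requests;
}

console.log(expandRepeats(["https://www.example.com"], 2));
```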

Max number of pages checked per domain

maxNumberOfPagesCheckedPerDomain (integer, optional)

The maximum number of pages that the checker will load. The checker will stop when this limit is reached. It's always a good idea to set this limit in order to prevent excess platform usage for misconfigured scrapers. Note that the actual number of pages loaded might be slightly higher than this value.

If set to 0, there is no limit.
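The limit rule, including the special meaning of 0, fits in a few lines. This is a minimal sketch with a hypothetical function name; it also suggests why the real count can overshoot: pages already being processed when the limit is reached presumably still finish.

```typescript
// Returns true once no further pages should be started.
// A limit of 0 means "no limit", as described above.
function reachedPageLimit(pagesChecked: number, maxPages: number): boolean {
  if (maxPages === 0) return false; // 0 disables the limit
  return pagesChecked >= maxPages;
}

console.log(reachedPageLimit(5, 0), reachedPageLimit(20, 20));
```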

Maximum concurrent pages checked per domain

maxConcurrentPagesCheckedPerDomain (integer, optional)

Specifies the maximum number of pages that can be processed by the checker in parallel for one domain. The checker automatically increases and decreases concurrency based on available system resources. This option enables you to set an upper limit, for example to reduce the load on a target website.

Default value of this property is 500
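An upper concurrency bound can be pictured as a fixed pool of workers pulling tasks from a shared list. The sketch below shows only that cap; it does not model the automatic scaling based on system resources described above, and the function name is hypothetical.

```typescript
// Run async tasks with at most `limit` of them in flight at once.
async function runWithConcurrency<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;
  // Each worker takes the next unclaimed task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }
  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Usage: 10 simulated page checks, never more than 3 at a time.
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));
let inFlight = 0;
let peak = 0;
const tasks = Array.from({ length: 10 }, (_, i) => async () => {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await sleep(10);
  inFlight--;
  return i * 2;
});
runWithConcurrency(tasks, 3).then((r) => console.log(r, "peak:", peak));
```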

Maximum number of concurrent domains checked

maxConcurrentDomainsChecked (integer, optional)

Specifies the maximum number of domains checked at the same time. This setting is relevant only when the URLs to check span more than one domain.

Default value of this property is 5

Retire browser instance after request count

retireBrowserInstanceAfterRequestCount (integer, optional)

Number of requests after which a browser instance is retired and replaced with a new one. Pick a higher number to reduce consumption, or a lower number to rotate through (and thus test) more proxies.

Default value of this property is 10
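The retirement rule amounts to a per-browser request counter. In this hypothetical sketch, Browser is a stand-in object rather than a Puppeteer or Playwright type, and the assumption that each new instance would get a fresh proxy session is noted only in a comment.

```typescript
// Stand-in for a browser instance; not a real Puppeteer/Playwright type.
interface Browser {
  id: number;
  requestsHandled: number;
}

class BrowserRotator {
  private nextId = 0;
  private current: Browser;

  constructor(private readonly retireAfter: number) {
    this.current = this.launch();
  }

  // In a real crawler this is where a new browser (and, presumably,
  // a new proxy session) would be started.
  private launch(): Browser {
    return { id: this.nextId++, requestsHandled: 0 };
  }

  // Returns the browser for the next request, replacing the current one
  // once it has handled `retireAfter` requests.
  acquire(): Browser {
    if (this.current.requestsHandled >= this.retireAfter) {
      this.current = this.launch();
    }
    this.current.requestsHandled++;
    return this.current;
  }
}

const rotator = new BrowserRotator(10);
const ids = Array.from({ length: 25 }, () => rotator.acquire().id);
console.log(ids);
```

With the default of 10, 25 requests use three browser instances; lowering the value would use more instances (and more proxies) for the same number of requests.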

navigationTimeoutSecs (integer, optional)

Specifies the maximum time in seconds the request will wait for the page to load. If the page is not loaded within this time, the browser will throw an error and the page will be marked as failed.

Default value of this property is 60
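Conceptually, a navigation timeout races the page load against a timer. Puppeteer and Playwright already expose this as the timeout option of page.goto (in milliseconds), so the generic sketch below only illustrates the behavior; loadPage is a hypothetical stand-in for a real navigation.

```typescript
// Reject if `promise` does not settle within `timeoutSecs` seconds.
function withTimeout<T>(promise: Promise<T>, timeoutSecs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Navigation timed out after ${timeoutSecs} s`)),
      timeoutSecs * 1000,
    );
  });
  // Always clear the timer so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical page load that finishes after `ms` milliseconds.
const loadPage = (ms: number) =>
  new Promise<string>((resolve) => setTimeout(() => resolve("loaded"), ms));

withTimeout(loadPage(50), 60).then(console.log);
```

A page whose load exceeds the limit ends in a rejected promise, which matches the description above of the page being marked as failed.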

Headful browser (XVFB)

puppeteer.headfull (boolean, optional)

Runs the browser in headful mode under XVFB. Applies only to the Puppeteer checker.

Use Chrome

puppeteer.useChrome (boolean, optional)

Uses Google Chrome instead of the bundled Chromium. Applies only to the Puppeteer checker; note that Chrome is not guaranteed to work with Puppeteer.

Wait for

puppeteer.waitFor (string, optional)

Applies only to the Puppeteer checker. Waits on each page; provide either a number of milliseconds or a selector to wait for.

Default value of this property is "2000"
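Because one string field holds either a delay or a selector, the value has to be classified before use. This hypothetical parser assumes a string made only of digits means milliseconds and anything else is a selector; the Actor's exact rule is not documented here.

```typescript
// Result of interpreting the "Wait for" value.
type WaitFor =
  | { kind: "delay"; ms: number }
  | { kind: "selector"; selector: string };

// Assumption: digits only => delay in ms; otherwise => selector.
function parseWaitFor(value: string): WaitFor {
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) {
    return { kind: "delay", ms: Number(trimmed) };
  }
  return { kind: "selector", selector: trimmed };
}

console.log(parseWaitFor("2000"), parseWaitFor("#content"));
```

In a browser checker, the selector case would map naturally onto page.waitForSelector, which both Puppeteer and Playwright provide.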

Memory

puppeteer.memory (integer, optional)

Memory in megabytes. Must be a power of 2 between 128 and 32768.

Default value of this property is 4096
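The memory constraint can be checked with a standard bit trick: a positive power of 2 has exactly one bit set, so n & (n - 1) is 0. This sketch assumes both bounds are inclusive; the function name is hypothetical.

```typescript
// Valid when the value is an integer power of 2 in [128, 32768].
function isValidMemory(mb: number): boolean {
  return (
    Number.isInteger(mb) &&
    mb >= 128 &&
    mb <= 32768 &&
    (mb & (mb - 1)) === 0 // exactly one bit set => power of 2
  );
}

console.log([4096, 3000].map(isValidMemory));
```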

Chrome

playwright.chrome (boolean, optional)

Use Chrome when checking

Default value of this property is false

Firefox

playwright.firefox (boolean, optional)

Use Firefox when checking

Default value of this property is true

Safari (Webkit)

playwright.webkit (boolean, optional)

Use WebKit (the engine behind Safari) when checking

Use Chrome instead of Chromium

playwright.useChrome (boolean, optional)

Applies only to the Playwright checker. Note that Chrome is not guaranteed to work with Playwright.

Headful browser (XVFB)

playwright.headfull (boolean, optional)

Whether to run the browser in headful mode under XVFB.

Wait for

playwright.waitFor (string, optional)

Applies only to the Playwright checker. Waits on each page; provide either a number of milliseconds or a selector to wait for.

Default value of this property is "2000"

Memory

playwright.memory (integer, optional)

Memory in megabytes. Must be a power of 2 between 128 and 32768.

Default value of this property is 4096

Developer
Maintained by Apify

Actor Metrics

  • 7 monthly users

  • 26 stars

  • >99% runs succeeded

  • Created in Jan 2020

  • Modified 7 months ago
