Website Content Crawler

Pricing

Pay per usage

Developed by

Apify

Maintained by Apify

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

How to handle the navigation-away issue

Closed

imkundeng opened this issue
12 days ago

2025-07-21T16:22:34.557Z ACTOR: Pulling Docker image of build N5wC2ArUbLgbcEpaX from registry.
2025-07-21T16:22:34.559Z ACTOR: Creating Docker container.
2025-07-21T16:22:34.606Z ACTOR: Starting Docker container.
2025-07-21T16:22:34.864Z Starting X virtual framebuffer using: Xvfb :99 -ac -screen 0 1920x1080x24+32 -nolisten tcp
2025-07-21T16:22:34.866Z Executing main command
2025-07-21T16:22:36.150Z INFO System info {"apifyVersion":"3.4.2","apifyClientVersion":"2.12.4","crawleeVersion":"3.13.8","osType":"Linux","nodeVersion":"v22.9.0"}
2025-07-21T16:22:36.715Z INFO Crawling will be started using 1 start URLs and 0 sitemap URLs
2025-07-21T16:22:36.842Z [DEBUG] gradually switching: GotScrapingHttpClient -> ImpitHttpClient (picked: ImpitHttpClient)
2025-07-21T16:22:37.251Z INFO PlaywrightCrawler: Starting the crawler.
2025-07-21T16:23:37.251Z INFO PlaywrightCrawler:Statistics: PlaywrightCrawler request statistics: {"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":null,"requestsFinishedPerMinute":0,"requestsFailedPerMinute":0,"requestTotalDurationMillis":0,"requestsTotal":0,"crawlerRuntimeMillis":60404,"retryHistogram":[]}
2025-07-21T16:23:37.282Z INFO PlaywrightCrawler:AutoscaledPool: state {"currentConcurrency":1,"desiredConcurrency":3,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":0},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.6,"actualRatio":0},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":0},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":0}}}
2025-07-21T16:23:39.494Z WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. Navigation timed out after 60 seconds. {"id":"XSSZTx1njETKIQ3","url":"https://help.splunk.com/en/splunk-cloud-platform/search/spl2-search-manual","retryCount":1}
2025-07-21T16:24:17.419Z WARN Failed to process HTML with 'readableText', falling back to 'none' for URL https://help.splunk.com/en/splunk-cloud-platform/search/spl2-search-manual.
2025-07-21T16:24:17.742Z INFO Enqueued 13 new links on https://help.splunk.com/en/splunk-cloud-platform/search/spl2-search-manual.
2025-07-21T16:24:37.274Z INFO PlaywrightCrawler:Statistics: PlaywrightCrawler request statistics: {"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":35028,"requestsFinishedPerMinute":0,"requestsFailedPerMinute":0,"requestTotalDurationMillis":35028,"requestsTotal":1,"crawlerRuntimeMillis":120428,"retryHistogram":[null,1]}
2025-07-21T16:24:37.285Z INFO PlaywrightCrawler:AutoscaledPool: state {"currentConcurrency":4,"desiredConcurrency":5,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":0},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.6,"actualRatio":0.04},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":0.139},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":0}}}
2025-07-21T16:24:53.585Z WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. page.evaluate: Execution context was destroyed, most likely because of a navigation.
2025-07-21T16:24:53.587Z at expandClickableElements (/home/myuser/dist/utils.js:238:16) {"id":"w0nwFEpnzump7LK","url":"https://help.splunk.com/en/splunk-cloud-platform/search/spl2-search-manual/introduction/introduction","retryCount":1}
2025-07-21T16:24:55.492Z WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. page.evaluate: Execution context was destroyed, most likely because of a navigation.
2025-07-21T16:24:55.494Z at expandClickableElements (/home/myuser/dist/utils.js:238:16) {"id":"P8BZusCLBJNfCGp","url":"https://help.splunk.com/en/splunk-cloud-platform/search/spl2-search-manual/get-started-searching/start-searching-using-spl2","retryCount":1}
2025-07-21T16:24:58.115Z WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. page.evaluate: Execution context was destroyed, most likely because of a navigation
2025-07-21T16:24:58.119Z at genericHandler (/home/myuser/dist/request-handler.js:120:31) {"id":"jag5eHdvKm4RSU5","url":"https://help.splunk.com/en/splunk-cloud-platform/search/spl2-search-manual/modules-statements-and-views/modules-and-spl2-statements","retryCount":1}
2025-07-21T16:25:01.899Z WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. page.evaluate: Execution context was destroyed, most likely because of a navigation
2025-07-21T16:25:01.910Z at genericHandler (/home/myuser/dist/request-handler.js:120:31) {"id":"b0ufcCFGczAQpfR","url":"https://help.splunk.com/en/splunk-cloud-platform/search/spl2-search-manual/export-import-and-namespaces/exporting-module-items-using-spl2","retryCount":1}
2025-07-21T16:25:37.275Z INFO PlaywrightCrawler:Statistics: PlaywrightCrawler request statistics: {"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":35028,"requestsFinishedPerMinute":0,"requestsFailedPerMinute":0,"requestTotalDurationMillis":35028,"requestsTotal":1,"crawlerRuntimeMillis":180428,"retryHistogram":[null,1]}
2025-07-21T16:25:37.288Z INFO PlaywrightCrawler:AutoscaledPool: state {"currentConcurrency":2,"desiredConcurrency":3,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":0},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.6,"actualRatio":0},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":0},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":0}}}
2025-07-21T16:26:05.039Z WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. Navigation timed out after 60 seconds. {"id":"6vbiDWw2ohtwWVv","url":"https://help.splunk.com/en/splunk-cloud-platform/search/spl2-search-manual/datasets-and-dataset-literals/datasets","retryCount":1}
2025-07-21T16:26:08.220Z WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. Navigation timed out after 60 seconds. {"id":"KbrNjxjZq4dnXw9","url":"https://help.splunk.com/en/splunk-cloud-platform/search/spl2-search-manual/expressions-and-predicates/types-of-expressions","retryCount":1}
2025-07-21T16:26:28.829Z WARN Failed to process HTML with 'readableText', falling back to 'none' for URL https://help.splunk.com/en/splunk-cloud-platform/search/spl2-search-manual/functions/built-in-and-custom-functions.
2025-07-21T16:26:29.115Z INFO Enqueued 4 new links on https://help.splunk.com/en/splunk-cloud-platform/search/spl2-search-manual/functions/built-in-and-custom-functions.
2025-07-21T16:26:37.275Z INFO PlaywrightCrawler:Statistics: PlaywrightCrawler request statistics: {"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":29549,"requestsFinishedPerMinute":0,"requestsFailedPerMinute":0,"requestTotalDurationMillis":59097,"requestsTotal":2,"crawlerRuntimeMillis":240429,"retryHistogram":[1,1]}
2025-07-21T16:26:37.279Z WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. mouse.wheel: Execution context was destroyed, most likely because of a navigation
2025-07-21T16:26:37.281Z at async genericHandler (/home/myuser/dist/request-handler.js:120:9) {"id":"Fy885mdx1mqP8wJ","url":"https://help.splunk.com/en/splunk-cloud-platform/search/spl2-search-manual/dates-and-time/timestamps-and-time-ranges","retryCount":1}
2025-07-21T16:26:47.290Z INFO PlaywrightCrawler:AutoscaledPool: state {"currentConcurrency":6,"desiredConcurrency":5,"systemStatus":{"isSystemIdle":false,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":0},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.6,"actualRatio":0.039},"cpuInfo":{"isOverloaded":true,"limitRatio":0.4,"actualRatio":0.767},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":0}}}
2025-07-21T16:26:57.884Z WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. page.evaluate: Execution context was destroyed, most likely because of a navigation.
2025-07-21T16:26:57.886Z at expandClickableElements (/home/myuser/dist/utils.js:238:16) {"id":"2Qu5eZTfYUE9BCp","url":"https://help.splunk.com/en/splunk-cloud-platform/search/spl2-search-manual/sort-and-order/lexicographical-order","retryCount":1}
2025-07-21T16:27:05.089Z WARN Failed to process HTML with 'readableText', falling back to 'none' for URL https://help.splunk.com/en/splunk-cloud-platform/search/spl2-search-manual/data-types/built-in-data-types.
2025-07-21T16:27:06.683Z INFO Enqueued 1 new link on https://help.splunk.com/en/splunk-cloud-platform/search/spl2-search-manual/data-types/built-in-data-types.
2025-07-21T16:27:07.992Z WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. page.evaluate: Execution context was destroyed, most likely because of a navigation
2025-07-21T16:27:07.995Z at genericHandler (/home/myuser/dist/request-handler.js:120:31) {"id":"w0nwFEpnzump7LK","url":"https://help.splunk.com/en/splunk-cloud-platform/search/spl2-search-manual/introduction/introduction","retryCount":2}
2025-07-21T16:27:08.484Z WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. page.evaluate: Execution context was destroyed, most likely because of a navigation.
2025-07-21T16:27:08.487Z at expandClickableElements (/home/myuser/dist/utils.js:238:16) {"id":"khO7ubEyjTGxFlr","url":"https://help.splunk.com/en/splunk-cloud-platform/search/spl2-search-manual/wildcards-quotes-and-escape-characters/wildcards","retryCount":1}
2025-07-21T16:27:11.303Z WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. page.evaluate: Execution context was destroyed, most likely because of a navigation.
2025-07-21T16:27:11.305Z at expandClickableElements (/home/myuser/dist/utils.js:238:16) {"id":"zdXirNqOkSKBc9n","url":"https://help.splunk.com/en/splunk-cloud-platform/search/spl2-search-manual/event-segmentation/event-segmentation-and-searching","retryCount":1}

Humble

10 days ago

Also experiencing this:

2025-07-23T14:17:57.329Z ACTOR: Pulling Docker image of build N5wC2ArUbLgbcEpaX from registry.
2025-07-23T14:17:57.330Z ACTOR: Creating Docker container.
2025-07-23T14:17:57.439Z ACTOR: Starting Docker container.
2025-07-23T14:17:57.691Z Starting X virtual framebuffer using: Xvfb :99 -ac -screen 0 1920x1080x24+32 -nolisten tcp
2025-07-23T14:17:57.691Z Executing main command
2025-07-23T14:17:59.178Z INFO System info {"apifyVersion":"3.4.2","apifyClientVersion":"2.12.4","crawleeVersion":"3.13.8","osType":"Linux","nodeVersion":"v22.9.0"}
2025-07-23T14:17:59.773Z INFO Crawling will be started using 1 start URLs and 0 sitemap URLs
2025-07-23T14:17:59.880Z [DEBUG] gradually switching: GotScrapingHttpClient -> ImpitHttpClient (picked: ImpitHttpClient)
2025-07-23T14:17:59.881Z [DEBUG] gradually switching: PlaywrightHttpClient -> ImpitHttpClient (picked: ImpitHttpClient)
2025-07-23T14:18:00.446Z INFO WCCAdaptiveCrawler: Starting the crawler.
2025-07-23T14:18:34.593Z WARN WCCAdaptiveCrawler: Reclaiming failed request back to the list or queue. page.goto: NS_ERROR_NET_TIMEOUT
2025-07-23T14:18:34.601Z Call log:
2025-07-23T14:18:34.602Z - navigating to "https://nontechies.ai/", waiting until "load"
2025-07-23T14:18:34.603Z
2025-07-23T14:18:34.605Z {"id":"H4NAz62wdzYkMnD","url":"https://nontechies.ai/","retryCount":1}
2025-07-23T14:18:52.190Z WARN WCCAdaptiveCrawler: Reclaiming failed request back to the list or queue. page.evaluate: Execution context was destroyed, most likely because of a navigation.
2025-07-23T14:18:52.191Z at expandClickableElements (/home/myuser/dist/utils.js:238:16) {"id":"H4NAz62wdzYkMnD","url":"https://nontechies.ai/","retryCount":2}
2025-07-23T14:19:00.446Z INFO Statistics: WCCAdaptiveCrawler request statistics: {"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":null,"requestsFinishedPerMinute":0,"requestsFailedPerMinute":0,"requestTotalDurationMillis":0,"requestsTotal":0,"crawlerRuntimeMillis":60365,"retryHistogram":[]}
2025-07-23T14:19:00.470Z INFO WCCAdaptiveCrawler:AutoscaledPool: state {"currentConcurrency":1,"desiredConcurrency":3,"systemStatus":{"isSystemIdle":true,"memInfo":{"isOverloaded":false,"limitRatio":0.2,"actualRatio":0},"eventLoopInfo":{"isOverloaded":false,"limitRatio":0.6,"actualRatio":0.027},"cpuInfo":{"isOverloaded":false,"limitRatio":0.4,"actualRatio":0.106},"clientInfo":{"isOverloaded":false,"limitRatio":0.3,"actualRatio":0}}}
2025-07-23T14:19:05.484Z WARN WCCAdaptiveCrawler: Reclaiming failed request back to the list or queue. page.evaluate: Execution context was destroyed, most likely because of a navigation.
2025-07-23T14:19:05.484Z at expandClickableElements (/home/myuser/dist/utils.js:238:16) {"id":"H4NAz62wdzYkMnD","url":"https://nontechies.ai/","retryCount":3}
2025-07-23T14:19:19.088Z ERROR WCCAdaptiveCrawler: Request failed and reached maximum retries. page.evaluate: Execution context was destroyed, most likely because of a navigation.
2025-07-23T14:19:19.088Z at expandClickableElements (/home/myuser/dist/utils.js:238:16)
2025-07-23T14:19:19.089Z at async genericHandler (/home/myuser/dist/request-handler.js:117:9)
2025-07-23T14:19:19.089Z at async wrap (/home/myuser/node_modules/@apify/timeout/cjs/index.cjs:54:21) {"id":"H4NAz62wdzYkMnD","url":"https://nontechies.ai/","method":"GET","uniqueKey":"https://nontechies.ai"}
2025-07-23T14:19:19.240Z INFO WCCAdaptiveCrawler: All requests from the queue have been processed, the crawler will shut down.
2025-07-23T14:19:20.146Z INFO WCCAdaptiveCrawler: Final request statistics: {"requestsFinished":0,"requestsFailed":1,"retryHistogram":[null,null,null,1],"requestAvgFailedDurationMillis":10347,"requestAvgFinishedDurationMillis":null,"requestsFinishedPerMinute":0,"requestsFailedPerMinute":0,"requestTotalDurationMillis":10347,"requestsTotal":1,"crawlerRuntimeMillis":80066}
2025-07-23T14:19:20.147Z INFO WCCAdaptiveCrawler: Error analysis: {"totalErrors":1,"uniqueErrors":1,"mostCommonErrors":["1x: page.evaluate: Execution context was destroyed, most likely because of a navigation. (/home/myuser/dist/utils.js:238:16)"]}
2025-07-23T14:19:20.148Z INFO WCCAdaptiveCrawler: Finished! Total 1 requests: 0 succeeded, 1 failed. {"terminal":true}

jindrich.bar

Hello, and thank you for your interest in this Actor!

In both cases, the problem is caused by the Actor's default behaviour. By default, WCC attempts to click and expand accordion / collapsible elements on the page to extract as much content from the webpage as possible.

On both of those pages, such elements initiate navigation (away from the page), which causes the error you're seeing.

You can modify this behaviour by changing the HTML processing > Expand clickable elements input option to some non-existent selector (e.g. .dont-click). This way, the Actor won't try to click anything on the page (so no errors should happen).
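For anyone setting this through JSON input or the API rather than the Console form, here is a minimal sketch of what the input might look like. It assumes the "Expand clickable elements" option maps to an input field named `clickElementsCssSelector`; that name is not stated in this thread, so please verify it against the Actor's input schema. The helper `build_run_input` is hypothetical and only builds a plain dictionary.

```python
import json

# Assumption: this is the API field behind the "Expand clickable elements"
# option. Check the Actor's input schema before relying on it.
CLICK_SELECTOR_FIELD = "clickElementsCssSelector"


def build_run_input(start_url: str, click_selector: str = ".dont-click") -> dict:
    """Build an Actor input whose click selector is meant to match nothing.

    ".dont-click" is the placeholder selector suggested in the reply above;
    any selector that matches no element on the target pages should work.
    """
    return {
        "startUrls": [{"url": start_url}],
        CLICK_SELECTOR_FIELD: click_selector,
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into the Actor's JSON input editor.
    run_input = build_run_input("https://nontechies.ai/")
    print(json.dumps(run_input, indent=2))
```

The resulting dictionary could then be passed to the Apify API client, for example `ApifyClient(token).actor("apify/website-content-crawler").call(run_input=run_input)` in the Python client, assuming that is the Actor ID you are running.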

I'll close this issue, but feel free to ask additional questions if you have any. Cheers!