Web Scraper


apify/web-scraper

Crawls arbitrary websites using the Chrome browser and extracts data from pages using JavaScript code. The Actor supports both recursive crawling and lists of URLs and automatically manages concurrency for maximum performance. This is Apify's basic tool for web crawling and scraping.


Crawler goes off domain

Closed

chimaro opened this issue
2 months ago

Which setting tells the scraper not to leave the original domain? For example, if I scrape example.com, it finds a social link and then starts scraping facebook.com/link, but I want it to stay on example.com.

jindrich.bar

Hello and thank you for your interest in this Actor!

This is the default behavior of Web Scraper: it only follows links that target the same domain as at least one of the start URLs. See my example run on my personal blog: although it links to other websites (my GitHub profile, LinkedIn, and the Apify homepage), the Actor doesn't visit them. To change this, you can use the Include globs input option, which lets you set custom URL patterns to crawl.
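
As a rough illustration of the rule described above (not the Actor's actual code), here is a minimal Python sketch. It assumes "same domain" means an exact hostname match, which may differ from how the Actor treats subdomains. The helper `stays_on_site` is hypothetical, and the input field names `startUrls` and `globs` are assumptions that should be checked against the Actor's input schema.

```python
from urllib.parse import urlparse


def stays_on_site(start_urls, candidate_url):
    """Return True if candidate_url shares a hostname with any start URL.

    Assumption: an exact hostname comparison stands in for the Actor's
    "same domain" rule; the real rule may handle subdomains differently.
    """
    allowed_hosts = {urlparse(url).hostname for url in start_urls}
    return urlparse(candidate_url).hostname in allowed_hosts


# Hypothetical input fragment. The field names are assumptions; verify them
# against the Actor's input schema before use.
run_input = {
    "startUrls": [{"url": "https://example.com/"}],
    # Custom URL patterns would only be needed to change the default scope.
    "globs": [{"glob": "https://example.com/**"}],
}

start_urls = [entry["url"] for entry in run_input["startUrls"]]
for link in ("https://example.com/about", "https://facebook.com/link"):
    decision = "follow" if stays_on_site(start_urls, link) else "skip"
    print(link, "->", decision)
```

If the crawler does leave the original site, a custom pattern in the Include globs option that is broader than intended would be one thing to check.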

I'll close this issue now, but feel free to ask additional questions if you have any. Cheers! (and sorry for the wait).

Developer
Maintained by Apify

Actor Metrics

  • 2.6k monthly users

  • 340 stars

  • >99% runs succeeded

  • 37 days response time

  • Created in Mar 2019

  • Modified 5 months ago
