Fast Scraper is a blazingly fast web scraper powered by Rust on the backend. It scrapes static CSFD (csfd.cz) HTML pages very quickly, without rendering them, while using only 128 MB of memory. With this scraper, you can maximize the efficiency of your credits on Apify. 🚀🚀🚀

For benchmarks, see https://apify.com/danielherman/fast-scraper.

Explanation of the input

The Actor has some global parameters, described in the Input tab, and a list of requests. Each request has the following structure:

{
    "request_type": string, // required
    "url": string, // optional
    "id": string, // optional
    "headers": object, // optional
    "user-agent": string // optional
}

Only request_type is required. If request_type=Sitemap, url is ignored; for every other request_type, url must be set, otherwise the Actor will panic. The id key is optional and is copied to the results, so you can use it to track requests by something other than the url. The order of scraped data in the results will most likely differ from the order of the requests. Both headers and user-agent are optional; you can also set user-agent directly in headers. Request-level headers and user-agent override the global headers and user-agent. Let's see an example:

{
    "requests": [
        {
            "request_type": "View",
            "url": "https://www.csfd.cz/film/68990-star-trek-hluboky-vesmir-devet/494608-serie-6/prehled/",
            "headers": {
                "dnt": "0",
                "priority": "u=0, i",
                "referer": "https://www.csfd.cz/"
            }
        }
    ],
    "headers": {
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
        "accept-language": "en-US,en;q=0.9",
        "dnt": "1",
        "priority": "u=0, i",
        "sec-ch-ua": "\"Chromium\";v=\"124\", \"Google Chrome\";v=\"124\", \"Not-A.Brand\";v=\"99\"",
        "sec-ch-ua-mobile": "?0",
        "sec-ch-ua-platform": "\"macOS\"",
        "sec-fetch-dest": "document",
        "sec-fetch-mode": "navigate",
        "sec-fetch-site": "none",
        "sec-fetch-user": "?1",
        "upgrade-insecure-requests": "1"
    },
    "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "force_cloud": false,
    "push_data_size": 500,
    "max_concurrency": 10,
    "max_request_retries": 3,
    "max_request_retry_timeout_ms": 10000,
    "request_retry_wait_ms": 5000
}

Here the request is sent with all the global headers and the global user_agent, except that the "dnt" (Do Not Track) header is changed from 1 to 0 and the "referer" header is added. The request also sets "priority", but to the same value as the global header, so nothing changes there. Once you set a global header, you cannot delete it at the request level, only override it.
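The override rule can be pictured with a short Python sketch. This is not the Actor's Rust code; merge_headers is a hypothetical helper, and lower-casing the header names is an assumption made here only because HTTP header names are case-insensitive.

```python
def merge_headers(global_headers, request_headers=None):
    """Return the headers for one request: global values, overridden per request."""
    # Assumption: names are compared case-insensitively, so normalise to lower case.
    merged = {name.lower(): value for name, value in global_headers.items()}
    for name, value in (request_headers or {}).items():
        merged[name.lower()] = value  # override or add; nothing is ever deleted
    return merged


# A shortened version of the example input above.
global_headers = {"accept-language": "en-US,en;q=0.9", "dnt": "1", "priority": "u=0, i"}
request_headers = {"dnt": "0", "priority": "u=0, i", "referer": "https://www.csfd.cz/"}

effective = merge_headers(global_headers, request_headers)
print(effective)
```

Because the merge only ever overrides or adds keys, a global header can be changed per request but never removed, which matches the behaviour described above.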

Supported page types

There are several request types:

Sitemap request_type=Sitemap: scrapes all URLs in the exposed csfd.cz sitemap. This can take a while, and it is done with a single request specification.
View request_type=View: returns information from the overview pages (/prehled) of movies, serials, series and episodes.
View reviews request_type=ViewReviews: returns the comments for a specific movie, serial, series or episode.
User request_type=User: returns information about the user.
User reviews request_type=UserReviews: returns the reviews written by the user.
Ratings request_type=Ratings: returns all the user ratings for a specific movie, serial, series or episode. PLANNED
Creator request_type=Creator: returns information about the creator. PLANNED
Program request_type=Program: returns the parsed TV program from https://www.csfd.cz/televize/program/. PLANNED

At the moment, only one page type is supported: the view type. Rating and comment types will be added soon. You can now also scrape the whole sitemap with this scraper.
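Since a request without a url makes the Actor panic, it can help to check the input before starting a run. The sketch below is a hypothetical pre-flight check in Python; the DOCUMENTED_TYPES set is collected from the type list and the examples in this README and is an assumption, so the Actor itself may accept a different set.

```python
# Types named in this README; the Actor itself may accept a different set.
DOCUMENTED_TYPES = {"Sitemap", "View", "ViewReviews", "ViewRatings", "User", "UserReviews"}


def check_request(request):
    """Return a list of problems found in one request object (empty if none)."""
    problems = []
    request_type = request.get("request_type")
    if not request_type:
        problems.append("request_type is required")
    elif request_type not in DOCUMENTED_TYPES:
        problems.append(f"request_type {request_type!r} is not documented here")
    # Only Sitemap requests may omit the url.
    if request_type != "Sitemap" and not request.get("url"):
        problems.append("url is required unless request_type is Sitemap")
    return problems


print(check_request({"request_type": "View"}))
```

Running check_request over every entry of "requests" before calling the Actor turns a failed run into a readable list of problems.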

Sitemap

Make sure that the Actor's timeout is long enough (e.g. 3600 s). The sitemap is not scraped in parallel.

Input example:

{
    "requests": [
        {
            "request_type": "Sitemap"
        }
    ],
    "user_agent": "ApifyFastScraper/1.0",
    "force_cloud": false,
    "push_data_size": 500,
    "max_concurrency": 10,
    "max_request_retries": 3,
    "max_request_retry_timeout_ms": 10000,
    "request_retry_wait_ms": 5000
}

It fetches the whole published sitemap of csfd.cz, which also contains URLs under:

https://www.csfd.cz/film/
https://www.csfd.cz/tvurce/
https://www.csfd.cz/uzivatel/
https://www.csfd.cz/diskuze/
https://www.csfd.cz/akce/
https://www.csfd.cz/festival/
https://www.csfd.cz/kino/
https://www.csfd.cz/novinky/
https://www.csfd.cz/zanry/

Output example:

[
    {
        "id": "06de9c9d-b17f-44aa-a9f3-e87a6769fffd",
        "request_type": "Sitemap",
        "url": "https://www.csfd.cz/sitemap.xml",
        "data": {
            "Sitemap": [
                "https://www.csfd.cz/film/16-zurov/231-zurov-2/prehled/",
                "https://www.csfd.cz/film/16-zurov/703683-zurov/prehled/",
                "https://www.csfd.cz/film/16-zurov/703684-teorema-lobacevskogo/prehled/"
            ]
        }
    }
]
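One possible follow-up, sketched in Python, is to turn the /film/ URLs from a Sitemap result into View requests for a second run. The helper name view_requests_from_sitemap and its limit parameter are hypothetical, and the sample item is shortened and partly made up for illustration.

```python
def view_requests_from_sitemap(item, limit=None):
    """Build View requests for the /film/ URLs found in one Sitemap result item."""
    urls = item["data"]["Sitemap"]
    film_urls = [u for u in urls if u.startswith("https://www.csfd.cz/film/")]
    if limit is not None:
        film_urls = film_urls[:limit]
    return [{"request_type": "View", "url": u} for u in film_urls]


# Illustrative item in the shape of the output example above.
sitemap_item = {
    "request_type": "Sitemap",
    "url": "https://www.csfd.cz/sitemap.xml",
    "data": {
        "Sitemap": [
            "https://www.csfd.cz/film/16-zurov/231-zurov-2/prehled/",
            "https://www.csfd.cz/tvurce/4060-mike-newell/",
        ]
    },
}

requests = view_requests_from_sitemap(sitemap_item)
print(requests)
```

The resulting list can be used as the "requests" value of a new input.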

Views (film + prehled)

Pages of the type https://www.csfd.cz/film/<movie-id>/, https://www.csfd.cz/film/<movie-id>/<movie-id2>/, https://www.csfd.cz/film/<movie-id>/prehled/ or https://www.csfd.cz/film/<movie-id>/<movie-id2>/prehled/.

Set request_type=View. Input example:

{
    "requests": [
        {
            "request_type": "View",
            "url": "https://www.csfd.cz/film/17592-ctyri-svatby-a-jeden-pohreb/prehled/"
        }
    ],
    "user_agent": "ApifyFastScraper/1.0",
    "force_cloud": false,
    "push_data_size": 500,
    "max_concurrency": 10,
    "max_request_retries": 3,
    "max_request_retry_timeout_ms": 10000,
    "request_retry_wait_ms": 5000
}

The following requests are equivalent:

"requests": [
    {
        "request_type": "View",
        "url": "https://www.csfd.cz/film/17592-ctyri-svatby-a-jeden-pohreb/prehled/"
    },
    {
        "request_type": "View",
        "url": "https://www.csfd.cz/film/17592/prehled/"
    }
]
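Because the slug form and the ID-only form point at the same page, a client may want to de-duplicate URLs before sending them. A minimal Python sketch, assuming that only the leading number of each path segment matters (film_key is a hypothetical helper, not part of the Actor):

```python
import re
from urllib.parse import urlparse


def film_key(url):
    """Return the numeric IDs in a /film/ URL, ignoring the text slugs."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if not segments or segments[0] != "film":
        raise ValueError(f"not a film URL: {url}")
    ids = []
    for segment in segments[1:]:
        match = re.match(r"(\d+)(?:-|$)", segment)
        if match is None:
            break  # reached a page name such as "prehled" or "recenze"
        ids.append(int(match.group(1)))
    return tuple(ids)


slug_url = "https://www.csfd.cz/film/17592-ctyri-svatby-a-jeden-pohreb/prehled/"
short_url = "https://www.csfd.cz/film/17592/prehled/"
print(film_key(slug_url) == film_key(short_url))
```

Keeping a set of film_key values lets you skip a URL whose key has already been requested.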

Output example:

{
  "View": {
    "header_name": "Čtyři svatby a jeden pohřeb",
    "header_name_langs": [
      {
        "country": "Velká Británie",
        "title": "Four Weddings and a Funeral(více)"
      },
      {
        "country": "USA",
        "title": "Four Weddings and a Funeral"
      },
      {
        "country": "Slovensko",
        "title": "Štyri svadby a jeden pohreb(méně)"
      }
    ],
    "rating": "72%",
    "rating_votes_count": 14484,
    "rating_fanklub_count": 45,
    "origin": "Velká Británie / USA, 1994, 117 min(Alternativní 113 min)",
    "plot_full": "Snímek vypráví příběh Charlese (Hugh Grant), vtipného a okouzlujícího muže, který ve svých dvaatřiceti letech stále střídá partnerky jako na běžícím pásu. Jeho životem prošla spousta žen, které zbožňoval, ale s žádnou z nich nedokázal navázat hlubší vztah. Rezervovaný Angličan vystavěl kolem vlastního nitra tak nepropustnou zeď, že nyní nedokáže projevit své city. A čím více svateb společně se svými kamarády navštíví, tím méně se sám hrne do ženění. Až do oné osudné soboty, kdy v jednom kostele spatří Carrie (Andie MacDowellová) – tu nejzajímavější, nejkrásnější, nejdůvtipnější a také nejnedostupnější Američanku, jakou kdy v životě potkal. Charles se ze všech sil snaží, aby ji příliš neuháněl a hlavně se do ní nezamiloval - během jednoho pohřbu a tří dalších svateb…(Cinemax)",
    "genres": [
      "Komedie",
      "Romantický",
      "Drama"
    ],
    "creators": [
      {
        "name": "režie",
        "people": [
          {
            "name": "Mike Newell",
            "url": "/tvurce/4060-mike-newell/"
          }
        ]
      },
      {
        "name": "scénář",
        "people": [
          {
            "name": "Richard Curtis",
            "url": "/tvurce/6726-richard-curtis/"
          }
        ]
      },
      {
        "name": "kamera",
        "people": [
          {
            "name": "Michael Coulter",
            "url": "/tvurce/75908-michael-coulter/"
          }
        ]
      },
      {
        "name": "hudba",
        "people": [
          {
            "name": "Richard Rodney Bennett",
            "url": "/tvurce/63995-richard-rodney-bennett/"
          }
        ]
      },
      {
        "name": "hrají",
        "people": [
          {
            "name": "Hugh Grant",
            "url": "/tvurce/332-hugh-grant/"
          },
          {
            "name": "Andie MacDowell",
            "url": "/tvurce/130-andie-macdowell/"
          },
          {
            "name": "James Fleet",
            "url": "/tvurce/17860-james-fleet/"
          },
          {
            "name": "Simon Callow",
            "url": "/tvurce/12966-simon-callow/"
          },
          {
            "name": "John Hannah",
            "url": "/tvurce/803-john-hannah/"
          },
          {
            "name": "Kristin Scott Thomas",
            "url": "/tvurce/164-kristin-scott-thomas/"
          },
          {
            "name": "Elspet Gray",
            "url": "/tvurce/35549-elspet-gray/"
          },
          {
            "name": "Rowan Atkinson",
            "url": "/tvurce/349-rowan-atkinson/"
          },
          {
            "name": "Corin Redgrave",
            "url": "/tvurce/16584-corin-redgrave/"
          },
          {
            "name": "Anna Chancellor",
            "url": "/tvurce/12166-anna-chancellor/"
          },
          {
            "name": "Hannah Taylor-Gordon",
            "url": "/tvurce/23562-hannah-taylor-gordon/"
          },
          {
            "name": "Bernice Stegers",
            "url": "/tvurce/11078-bernice-stegers/"
          },
          {
            "name": "Jeremy Kemp",
            "url": "/tvurce/53343-jeremy-kemp/"
          },
          {
            "name": "Sophie Thompson",
            "url": "/tvurce/55128-sophie-thompson/"
          },
          {
            "name": "Charlotte Coleman",
            "url": "/tvurce/76910-charlotte-coleman/"
          },
          {
            "name": "David Haig",
            "url": "/tvurce/78156-david-haig/"
          },
          {
            "name": "Nicola Walker",
            "url": "/tvurce/111089-nicola-walker/"
          },
          {
            "name": "Struan Rodger",
            "url": "/tvurce/115678-struan-rodger/"
          },
          {
            "name": "Simon Kunz",
            "url": "/tvurce/145261-simon-kunz/"
          },
          {
            "name": "Duncan Kenworthy",
            "url": "/tvurce/205006-duncan-kenworthy/"
          },
          {
            "name": "Rosalie Crutchley",
            "url": "/tvurce/214678-rosalie-crutchley/"
          },
          {
            "name": "Rupert Vansittart",
            "url": "/tvurce/219917-rupert-vansittart/"
          },
          {
            "name": "Kenneth Griffith",
            "url": "/tvurce/277724-kenneth-griffith/"
          },
          {
            "name": "Philip Voss",
            "url": "/tvurce/298970-philip-voss/"
          },
          {
            "name": "Randall Paul",
            "url": "/tvurce/308830-randall-paul/"
          },
          {
            "name": "Sara Crowe",
            "url": "/tvurce/157205-sara-crowe/"
          },
          {
            "name": "Richard Butler",
            "url": "/tvurce/348163-richard-butler/"
          },
          {
            "name": "Nigel Hastings",
            "url": "/tvurce/368537-nigel-hastings/"
          },
          {
            "name": "Juliette James",
            "url": "/tvurce/529994-juliette-james/"
          },
          {
            "name": "Amanda Mealing",
            "url": "/tvurce/875966-amanda-mealing/"
          }
        ]
      },
      {
        "name": "produkce",
        "people": [
          {
            "name": "Duncan Kenworthy",
            "url": "/tvurce/205006-duncan-kenworthy/"
          },
          {
            "name": "Eric Fellner",
            "url": "/tvurce/150112-eric-fellner/"
          }
        ]
      },
      {
        "name": "střih",
        "people": [
          {
            "name": "Jon Gregory",
            "url": "/tvurce/241299-jon-gregory/"
          }
        ]
      },
      {
        "name": "scénografie",
        "people": [
          {
            "name": "Anna Pinnock",
            "url": "/tvurce/787919-anna-pinnock/"
          }
        ]
      },
      {
        "name": "masky",
        "people": [
          {
            "name": "Ann Buchanan",
            "url": "/tvurce/630463-ann-buchanan/"
          }
        ]
      },
      {
        "name": "kostýmy",
        "people": [
          {
            "name": "Lindy Hemming",
            "url": "/tvurce/254644-lindy-hemming/"
          }
        ]
      }
    ],
    "vod_content": [
      {
        "name": "Apple TV+",
        "ga_name": "vod-service-apple-tv|film|vod",
        "url": "https://tv.apple.com/cz/movie/four-weddings-and-a-funeral/umc.cmc.50uemm7f92zctyyjp8z6x1upu"
      },
      {
        "name": "Google Play",
        "ga_name": "vod-service-google-play|film|vod",
        "url": "https://play.google.com/store/movies/details/Four_Weddings_And_A_Funeral?id=ZnSmxlAWj4s&hl=cs&gl=cz"
      }
    ]
  }
}
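The View output keeps values as they appear on the page: the rating is a string such as "72%" and creator URLs are relative. A hedged Python sketch of post-processing follows; the helper names and BASE_URL are assumptions, and it does not handle ratings in any other form than a number followed by "%".

```python
from urllib.parse import urljoin

BASE_URL = "https://www.csfd.cz"  # assumed base for the relative creator URLs


def rating_percent(view):
    """Convert a rating such as "72%" to an int, or None if it is missing."""
    rating = view.get("rating")
    if not rating:
        return None
    return int(rating.rstrip("%"))


def people_by_role(view):
    """Map each creator group name to a list of (name, absolute URL) pairs."""
    return {
        group["name"]: [(p["name"], urljoin(BASE_URL, p["url"])) for p in group["people"]]
        for group in view.get("creators", [])
    }


# A shortened version of the "View" object from the output example above.
view = {
    "rating": "72%",
    "creators": [
        {"name": "režie", "people": [{"name": "Mike Newell", "url": "/tvurce/4060-mike-newell/"}]}
    ],
}

print(rating_percent(view), people_by_role(view))
```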

View reviews (film + recenze)

Pages of the type https://www.csfd.cz/film/<movie-id>/recenze/?page=<N> and https://www.csfd.cz/film/<movie-id>/<movie-id2>/recenze/?page=<N>.

For request_type=ViewReviews, make sure that the url contains film and recenze. You don't have to put ?page=<N> at the end of the url, because it is replaced with page=1; the scraper then retrieves the number of pages to scan and scrapes them all, one at a time. Input example:

{
    "requests": [
        {
            "request_type": "ViewReviews",
            "url": "https://www.csfd.cz/film/1490468-survivor-cesko-slovensko/1486957-serie-3/recenze/"
        }
    ],
    "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "force_cloud": false,
    "push_data_size": 500,
    "max_concurrency": 10,
    "max_request_retries": 3,
    "max_request_retry_timeout_ms": 10000,
    "request_retry_wait_ms": 5000
}
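The paging described above amounts to rewriting one query parameter. The following Python sketch shows the idea with the standard library; it is an illustration, not the Actor's Rust implementation, and with_page is a hypothetical helper.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url, page, param="page"):
    """Return url with the paging query parameter set to the given page number."""
    parts = urlsplit(url)
    # Drop any existing value of the paging parameter, keep everything else.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    query.append((param, str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


base = "https://www.csfd.cz/film/2294-vykoupeni-z-veznice-shawshank/recenze/?page=7"
first = with_page(base, 1)
print(first)
```

The same helper covers rating pages by passing param="pageRating".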

The results are split into parts of roughly 1 MB or less, to make sure that they can be uploaded to Apify. The id stays the same and the part key indicates the order of the results. Output example:

[
    {
        "id": "191f5d99-e101-455d-b833-9554e7b102e8",
        "request_type": "ViewReviews",
        "url": "https://www.csfd.cz/film/2294-vykoupeni-z-veznice-shawshank/recenze/",
        "data": {
            "part": 1,
            "content": [
                {
                    "user_name": "golfista",
                    "user_url": "/uzivatel/95-golfista/",
                    "star_rating": "5",
                    "comment": "\n Na velmi ošemetnou a těžko zodpověditelnou otázku \"který film je podle vás nejlepší\", mi dal do úst tímhle dílem Frank Darabont odpověď, za kterou se opravdu nebudu stydět. Pokud bych měl jenom jednu (možná dvě :) možnost, pak právě sem patří 6*. Bohužel jsem nestihl tenhle film v kině, ale vydáním na DVD jsem si ho konečně vychutnal i v originále a je to fakt nádhera (tím nechci hanět český dabing, který je mimochodem vynikající).\n",
                    "comment_html": "Na velmi ošemetnou a těžko zodpověditelnou otázku \"který film je podle vás nejlepší\", mi dal do úst tímhle dílem Frank Darabont odpověď, za kterou se opravdu nebudu stydět. Pokud bych měl jenom jednu (možná dvě :) možnost, pak právě sem patří 6*. Bohužel jsem nestihl tenhle film v kině, ale vydáním na DVD jsem si ho konečně vychutnal i v originále a je to fakt nádhera (tím nechci hanět český dabing, který je mimochodem vynikající).",
                    "date": "14.02.2003"
                },
                ...
            ]
        }
    }
]
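Since one request can come back as several parts, a consumer usually wants to stitch them together again. A minimal Python sketch, assuming each item has the id, data.part and data.content keys shown above (merge_parts and the placeholder items are hypothetical):

```python
from collections import defaultdict


def merge_parts(items):
    """Group result items by id and concatenate their content in part order."""
    by_id = defaultdict(list)
    for item in items:
        by_id[item["id"]].append(item)
    merged = {}
    for request_id, parts in by_id.items():
        parts.sort(key=lambda item: item["data"]["part"])
        merged[request_id] = [row for item in parts for row in item["data"]["content"]]
    return merged


# Placeholder items; real ones carry the full review fields.
items = [
    {"id": "a", "data": {"part": 2, "content": [{"user_name": "u3"}]}},
    {"id": "a", "data": {"part": 1, "content": [{"user_name": "u1"}, {"user_name": "u2"}]}},
]

print(merge_parts(items))
```

Sorting by part matters because the order of items in the results is not guaranteed.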

User (uzivatel)

View ratings (film + prehled)

For request_type=ViewRatings, make sure that the url contains film. You don't have to put ?pageRating=<N> at the end of the url, because it is replaced with pageRating=1; the scraper then retrieves the number of pages to scan and scrapes them all, one at a time. Input example:

{
    "requests": [
        {
            "request_type": "ViewRatings",
            "url": "https://www.csfd.cz/film/425904-mizerove-na-zivot-a-na-smrt/prehled/"
        }
    ],
    "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "force_cloud": false,
    "push_data_size": 500,
    "max_concurrency": 10,
    "max_request_retries": 3,
    "max_request_retry_timeout_ms": 10000,
    "request_retry_wait_ms": 5000
}

Output example:

[
    {
        "id": "7749baa2-b364-4920-afbb-88907fa2f194",
        "request_type": "ViewRatings",
        "url": "https://www.csfd.cz/film/425904-mizerove-na-zivot-a-na-smrt/prehled/",
        "data": {
            "part": 1,
            "content": [
                {
                    "user_name": "POMO",
                    "user_url": "/uzivatel/1-pomo/",
                    "date": "Vloženo v 05.06.2024",
                    "star_rating": "3"
                },
                {
                    "user_name": "kleopatra",
                    "user_url": "/uzivatel/1263-kleopatra/",
                    "date": "Vloženo v 07.06.2024",
                    "star_rating": "4"
                },
                ...
            ]
        }
    }
]
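In the ratings output, the date carries a Czech prefix ("Vloženo v") and star_rating is a string. A hedged Python sketch of converting one row to typed values, assuming the date always contains a dd.mm.yyyy pattern and star_rating is always numeric (parse_rating is a hypothetical helper):

```python
import re
from datetime import date, datetime


def parse_rating(row):
    """Return (user_name, stars, date) for one ViewRatings content row."""
    # Pick out the dd.mm.yyyy part and ignore any text around it.
    match = re.search(r"\d{2}\.\d{2}\.\d{4}", row["date"])
    when = datetime.strptime(match.group(0), "%d.%m.%Y").date() if match else None
    return row["user_name"], int(row["star_rating"]), when


row = {
    "user_name": "POMO",
    "user_url": "/uzivatel/1-pomo/",
    "date": "Vloženo v 05.06.2024",
    "star_rating": "3",
}

print(parse_rating(row))
```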

User reviews (uzivatel + recenze)

Pages of the type https://www.csfd.cz/uzivatel/<user-id>/recenze/?page=<N>.

For request_type=UserReviews, make sure that the url contains uzivatel and recenze. You don't have to put ?page=<N> at the end of the url, because it is replaced with page=1; the scraper then retrieves the number of pages to scan and scrapes them all, one at a time. Input example:

{
    "requests": [
        {
            "request_type": "UserReviews",
            "url": "https://www.csfd.cz/uzivatel/195357-verbal/recenze/"
        }
    ],
    "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "force_cloud": false,
    "push_data_size": 500,
    "max_concurrency": 10,
    "max_request_retries": 3,
    "max_request_retry_timeout_ms": 10000,
    "request_retry_wait_ms": 5000
}

Output example:

[
    {
        "id": "e5475be7-a70c-4749-bfff-60ad68bdc38e",
        "request_type": "UserReviews",
        "url": "https://www.csfd.cz/uzivatel/195357-verbal/recenze/",
        "data": {
            "part": 1,
            "content": [
                {
                    "movie_name": "Mizerové: Na život a na smrt",
                    "movie_url": "/film/425904-mizerove-na-zivot-a-na-smrt/",
                    "star_rating": "5",
                    "comment": "\n Jak repujeme my, sportovně založení bílí Dolní Slezané „ Bembajs, bembajs, jak pro tebe du, narobiš pyču!!! “… A oni zas přišli, co nadělám! Šup do kina! Navíc se belgičtí uzenáči od minula o dost zlepšili, scénáristé zavzpomínali na staroškolské fláky, a Špatňáci jsou tak na zase plné kule tím, čím bývali za časů Míši Záliva ve starých dobrých devadesátkách. Tedy pořád docela freš Wilík a ubohý starý trapák Lórenc proti bandě konečně charismatických a bezskrupulózních zlolidí ve vinně potěšující akčně kokotmediální taškařici a lá Smrtonosné smrti. Míša si v tom zase štěknul a docela nechápu, proč si svou vypiplanou značku rovnou nezmáknul sám. Patrně má nahrabáno tolik, že už jen rybaří v Zálivu. Ale i tak furt klasicka blažena oddychovka jak cyp.\n",
                    "comment_html": "Jak repujeme my, sportovně založení bílí Dolní Slezané „<em>Bembajs, bembajs, jak pro tebe du, narobiš pyču!!!</em>“… A oni zas přišli, co nadělám! Šup do kina! Navíc se belgičtí uzenáči od minula o dost zlepšili, scénáristé zavzpomínali na staroškolské fláky, a Špatňáci jsou tak na zase plné kule tím, čím bývali za časů Míši Záliva ve starých dobrých devadesátkách. Tedy pořád docela freš Wilík a ubohý starý trapák Lórenc proti bandě konečně charismatických a bezskrupulózních zlolidí ve vinně potěšující akčně kokotmediální taškařici a lá Smrtonosné smrti. Míša si v tom zase štěknul a docela nechápu, proč si svou vypiplanou značku rovnou nezmáknul sám. Patrně má nahrabáno tolik, že už jen rybaří v Zálivu. Ale i tak furt klasicka blažena oddychovka jak cyp.",
                    "date": "14.06.2024"
                },
                ...
            ]
        }
    }
]
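In the example above, comment is plain text with surrounding whitespace, while comment_html keeps inline markup such as <em>. If you only store comment_html, a plain-text version can be derived with Python's standard library. This is a sketch: plain_text is a hypothetical helper, and it simply drops tags, so a <br> does not become a newline.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect text nodes and drop the tags around them."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def plain_text(comment_html):
    """Return comment_html with tags removed and outer whitespace trimmed."""
    parser = _TextOnly()
    parser.feed(comment_html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks).strip()


sample = "Jak repujeme „<em>Bembajs, bembajs</em>“… Šup do kina!"
print(plain_text(sample))
```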

Your feedback

I am always working on improving the performance of my Actors. So if you’ve got any technical feedback for Fast Scraper or simply found a bug, please create an issue on the Actor’s Issues tab in Apify Console.
