Get Site to Markdown

jhaisley/get-site

An asynchronous web crawler that mirrors a website into a single organized Markdown file, handling images and preserving directory structure. It is designed to run at low cost and works well for building context for AI agents.

Website to Markdown Crawler

An asynchronous web crawler that mirrors websites into a single organized markdown file, with special handling for images and proper directory structure preservation. Built with Python, asyncio, and httpx.

Author: Jordan Haisley (jordan@b-w.pro)

Features

🚀 Fast asynchronous crawling using httpx and asyncio
📁 Preserves site structure - can be limited to specific subdirectories
🖼️ Smart image handling - preserves both alt text and filenames
📝 Clean Markdown output with proper sectioning
🔍 Depth-controlled crawling
🔒 Domain-restricted recursive crawling for safety
🤫 Quiet mode for silent operation
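
The depth-controlled, domain-restricted crawling listed above can be sketched roughly as follows. This is a minimal illustration and not the Actor's actual code: `crawl`, `in_scope`, and `LinkCollector` are hypothetical names, and the meaning of `max_depth` (0 fetches only the start page) is an assumption. The `fetch` argument stands in for the HTTP layer; with httpx it could be a coroutine that awaits `httpx.AsyncClient.get` and returns `response.text`.

```python
import asyncio
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def in_scope(url, start_url):
    """True if url is on the start URL's host and under its directory."""
    start, candidate = urlparse(start_url), urlparse(url)
    base_dir = start.path.rsplit("/", 1)[0] + "/"
    return candidate.netloc == start.netloc and candidate.path.startswith(base_dir)


async def crawl(start_url, fetch, max_depth=1):
    """Breadth-first crawl; returns {url: html} for every page visited.

    Assumption: depth 0 is the start page, depth 1 adds pages it links to.
    """
    pages = {}
    seen = {start_url}
    frontier = [start_url]
    for depth in range(max_depth + 1):
        # Fetch one whole depth level concurrently.
        bodies = await asyncio.gather(*(fetch(u) for u in frontier))
        next_frontier = []
        for url, html in zip(frontier, bodies):
            pages[url] = html
            if depth == max_depth:
                continue  # do not follow links past the depth limit
            collector = LinkCollector()
            collector.feed(html)
            for href in collector.links:
                # Resolve relative links and drop #fragments before comparing.
                link = urldefrag(urljoin(url, href)).url
                if link not in seen and in_scope(link, start_url):
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
        if not frontier:
            break
    return pages
```

Passing the fetcher in as a parameter keeps the traversal logic independent of the network, so the scope and depth rules can be exercised against an in-memory site.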

As an Apify Actor

Example Actor input:

{
    "start_urls": [{"url": "https://example.com"}],
    "max_depth": 1
}
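
As a rough sketch of how an input document of this shape might be read, the helper below pulls out the start URLs and the depth limit. `parse_actor_input` is a hypothetical name, and the default depth of 1 and the validation rules are assumptions, not documented behavior of the Actor.

```python
import json


def parse_actor_input(raw):
    """Return (urls, max_depth) from an input document shaped like the example."""
    data = json.loads(raw)
    urls = [entry["url"] for entry in data.get("start_urls", [])]
    if not urls:
        raise ValueError("start_urls must contain at least one {'url': ...} entry")
    max_depth = int(data.get("max_depth", 1))  # default of 1 is an assumption
    if max_depth < 0:
        raise ValueError("max_depth must be zero or greater")
    return urls, max_depth


example = '{"start_urls": [{"url": "https://example.com"}], "max_depth": 1}'
urls, max_depth = parse_actor_input(example)
```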

Output Format

The generated markdown file contains:

A section for each page
Page title as heading
Original URL reference
Page content in Markdown format
Image references with both alt text and filenames

Example output:

# Page Title
*URL: https://example.com/page*

![Alt text (File: image.jpg)](https://example.com/image.jpg)

Page content in markdown...

----------------
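
One way to produce a section in the format shown above is sketched here. `render_page` and `image_markdown` are hypothetical helpers, not the Actor's own functions, and listing the images ahead of the body is a simplification that mirrors the example; a real converter would more likely leave each image where it appears in the page.

```python
import posixpath
from urllib.parse import urlparse

SEPARATOR = "-" * 16  # horizontal rule that closes each page section


def image_markdown(alt, src):
    """Markdown image whose label carries both the alt text and the filename."""
    filename = posixpath.basename(urlparse(src).path)
    return f"![{alt} (File: {filename})]({src})"


def render_page(title, url, content, images=()):
    """Render one page as a section: heading, URL line, images, body, rule."""
    parts = [f"# {title}", f"*URL: {url}*", ""]
    for alt, src in images:
        parts += [image_markdown(alt, src), ""]
    parts += [content, "", SEPARATOR]
    return "\n".join(parts)


section = render_page(
    "Page Title",
    "https://example.com/page",
    "Page content in markdown...",
    images=[("Alt text", "https://example.com/image.jpg")],
)
```

Joining the sections for all crawled pages with blank lines between them would give the single-file output described above.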

Developer

b-w.pro

Actor Metrics

1 monthly user
0 bookmarks
>99% runs succeeded
Created in Mar 2025
Modified 6 days ago
