Get Site to Markdown

jhaisley/get-site
An asynchronous web crawler that mirrors a website into a single organized Markdown file, handling images and preserving directory structure. It is designed to run at low cost and works well for building context for AI agents.

Website to Markdown Crawler

An asynchronous web crawler that mirrors websites into a single organized Markdown file, with special handling for images and preservation of the site's directory structure. Built with Python, asyncio, and httpx.

Author: Jordan Haisley (jordan@b-w.pro)

Features

  • 🚀 Fast asynchronous crawling using httpx and asyncio
  • 📁 Preserves site structure - can be limited to specific subdirectories
  • 🖼️ Smart image handling - preserves both alt text and filenames
  • 📝 Clean Markdown output with proper sectioning
  • 🔍 Depth-controlled crawling
  • 🔒 Domain-restricted recursive crawling for safety
  • 🤫 Quiet mode for silent operation
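The depth-controlled, domain-restricted crawling listed above could look roughly like the sketch below. It is not the Actor's actual code: the names `crawl`, `in_scope`, `LinkCollector`, and `fetch_with_httpx` are hypothetical, and it assumes that depth 0 means the start page only and that "limited to specific subdirectories" means staying under the start URL's directory. The fetch function is passed in, so the crawl logic can be tried without network access.

```python
import asyncio
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def in_scope(url: str, start_url: str) -> bool:
    """True if url is on the start URL's host and under its directory."""
    u, s = urlparse(url), urlparse(start_url)
    base_dir = s.path.rsplit("/", 1)[0] + "/"
    return u.netloc == s.netloc and (u.path or "/").startswith(base_dir)


async def crawl(start_url: str, fetch, max_depth: int = 1) -> dict[str, str]:
    """Breadth-first crawl; each depth level is fetched concurrently."""
    seen = {start_url}
    frontier = [start_url]
    pages: dict[str, str] = {}
    for depth in range(max_depth + 1):
        bodies = await asyncio.gather(*(fetch(u) for u in frontier))
        next_frontier = []
        for url, html in zip(frontier, bodies):
            pages[url] = html
            if depth == max_depth:
                continue  # deepest level: keep the page, do not follow links
            parser = LinkCollector()
            parser.feed(html)
            for href in parser.links:
                link = urldefrag(urljoin(url, href)).url  # absolute, no #fragment
                if link not in seen and in_scope(link, start_url):
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
        if not frontier:
            break
    return pages


async def fetch_with_httpx(url: str) -> str:
    """One possible fetcher; a real crawler would reuse a single client."""
    import httpx  # imported lazily so the sketch loads without httpx installed

    async with httpx.AsyncClient(follow_redirects=True) as client:
        response = await client.get(url)
        response.raise_for_status()
        return response.text
```

A run against a live site would be `asyncio.run(crawl("https://example.com/", fetch_with_httpx, max_depth=1))`. The `seen` set is what keeps the crawl from revisiting pages that link to each other.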

As an Apify Actor

Actor input schema:

{
    "start_urls": [{"url": "https://example.com"}],
    "max_depth": 1
}
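Before a crawl starts, input of this shape would typically be validated. The following is a minimal sketch under stated assumptions, not the Actor's own code: `CrawlInput` and `parse_actor_input` are hypothetical names, the default of 1 for a missing `max_depth` is a guess, and only the two fields shown above are handled.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class CrawlInput:
    start_urls: list[str]
    max_depth: int


def parse_actor_input(raw: dict) -> CrawlInput:
    """Validate a raw input dict shaped like the example above."""
    urls = [entry["url"] for entry in raw.get("start_urls", [])]
    if not urls:
        raise ValueError("start_urls must contain at least one entry")
    for url in urls:
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")

    max_depth = raw.get("max_depth", 1)  # assumed default, not from the source
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(max_depth, bool) or not isinstance(max_depth, int) or max_depth < 0:
        raise ValueError("max_depth must be a non-negative integer")

    return CrawlInput(start_urls=urls, max_depth=max_depth)
```

Flattening `start_urls` from a list of `{"url": ...}` objects into plain strings keeps the rest of a crawler independent of the input layout.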

Output Format

The generated markdown file contains:

  • A section for each page
  • Page title as heading
  • Original URL reference
  • Page content in Markdown format
  • Image references with both alt text and filenames

Example output:

# Page Title
*URL: https://example.com/page*

![Alt text (File: image.jpg)](https://example.com/image.jpg)

Page content in markdown...

----------------

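To make the layout concrete, here is a small sketch that produces a section in the same shape as the example output. It is an illustration only: `image_markdown` and `render_page` are hypothetical helpers, the 16-dash separator is copied from the example, and the fallback label for images without alt text is an assumption.

```python
from posixpath import basename
from urllib.parse import unquote, urlparse

SEPARATOR = "-" * 16  # matches the rule that closes the example section


def image_markdown(alt: str, src: str) -> str:
    """Build an image reference that keeps both alt text and filename."""
    filename = basename(unquote(urlparse(src).path))
    # Assumed fallback: with no alt text, label the image by filename only.
    label = f"{alt} (File: {filename})" if alt else f"File: {filename}"
    return f"![{label}]({src})"


def render_page(title: str, url: str, body: str) -> str:
    """Render one page as a titled section followed by a separator."""
    return f"# {title}\n*URL: {url}*\n\n{body.strip()}\n\n{SEPARATOR}\n"


section = render_page(
    "Page Title",
    "https://example.com/page",
    image_markdown("Alt text", "https://example.com/image.jpg")
    + "\n\nPage content in markdown...",
)
print(section)
```

Taking the filename from the URL path rather than the whole URL means query strings such as `?v=2` do not end up in the label.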
Developer
Maintained by Community

Actor Metrics

  • Created in Mar 2025
