Get Site to Markdown
Website to Markdown Crawler
An asynchronous web crawler that mirrors websites into a single, organized Markdown file, with special handling for images and preservation of the site's directory structure. Built with Python, asyncio, and httpx, it is designed to run at low cost, which makes it well suited to building context for AI agents.
Author: Jordan Haisley (jordan@b-w.pro)
Features
- 🚀 Fast asynchronous crawling using httpx and asyncio
- 📁 Preserves site structure; crawling can be limited to specific subdirectories
- 🖼️ Smart image handling - preserves both alt text and filenames
- 📝 Clean Markdown output with proper sectioning
- 🔍 Depth-controlled crawling
- 🔒 Domain-restricted recursive crawling for safety
- 🤫 Quiet mode for silent operation
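The crawl behaviour listed above (asynchronous fetching, a depth limit, and domain and subdirectory restriction) can be sketched as a breadth-first search. This is a minimal sketch, not the Actor's actual code. The fetch callback, the in_scope rule, and the reading of max_depth (0 means the start page only, 1 adds the pages it links to) are all assumptions. A real fetch would presumably issue requests through an httpx.AsyncClient and convert the HTML to Markdown; here it is injected so the sketch runs offline.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple
from urllib.parse import urldefrag, urljoin, urlparse

# Hypothetical fetch callback: URL -> (page title, Markdown body, raw href values on the page).
Fetch = Callable[[str], Awaitable[Tuple[str, str, List[str]]]]


def in_scope(url: str, start_url: str) -> bool:
    """Assumed scoping rule: http(s), same host, and at or below the start URL's path."""
    start, cand = urlparse(start_url), urlparse(url)
    start_path = (start.path or "/").rstrip("/")
    cand_path = cand.path or "/"
    return (
        cand.scheme in ("http", "https")
        and cand.netloc == start.netloc
        and (cand_path == start_path or cand_path.startswith(start_path + "/"))
    )


async def crawl(
    start_url: str, fetch: Fetch, max_depth: int = 1, concurrency: int = 5
) -> Dict[str, Tuple[str, str]]:
    """Breadth-first crawl; returns {url: (title, markdown)} for every visited page."""
    start_url = urldefrag(start_url)[0]      # drop any #fragment
    seen = {start_url}
    pages: Dict[str, Tuple[str, str]] = {}
    frontier = [start_url]
    sem = asyncio.Semaphore(concurrency)     # cap the number of in-flight requests

    async def visit(url: str) -> Tuple[str, Tuple[str, str, List[str]]]:
        async with sem:
            return url, await fetch(url)

    for depth in range(max_depth + 1):
        if not frontier:
            break
        # Fetch the whole current level concurrently.
        results = await asyncio.gather(*(visit(u) for u in frontier))
        next_frontier: List[str] = []
        for url, (title, body, links) in results:
            pages[url] = (title, body)
            if depth == max_depth:
                continue                     # depth limit reached: do not follow links
            for link in links:
                absolute = urldefrag(urljoin(url, link))[0]
                if absolute not in seen and in_scope(absolute, start_url):
                    seen.add(absolute)
                    next_frontier.append(absolute)
        frontier = next_frontier
    return pages
```

For example, asyncio.run(crawl("https://example.com/docs/", fetch, max_depth=1)) would return the start page plus every in-scope page it links to, keyed by URL. Processing one depth level at a time keeps the depth accounting simple, while the semaphore bounds concurrency within a level.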
As an Apify Actor
Example Actor input:
```json
{
  "start_urls": [{"url": "https://example.com"}],
  "max_depth": 1
}
```
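A minimal sketch of how an input like the one above might be read on the Python side, assuming the Actor receives it as a plain dict (for example, the result of awaiting Actor.get_input() in the Apify Python SDK). The function name, the default depth, and the validation rules are hypothetical.

```python
from typing import Any, Dict, List, Tuple

DEFAULT_MAX_DEPTH = 1  # hypothetical default; not documented on this page


def parse_actor_input(actor_input: Dict[str, Any]) -> Tuple[List[str], int]:
    """Turn the Actor input JSON into (start URLs, max crawl depth)."""
    entries = actor_input.get("start_urls") or []
    # Each entry is an object like {"url": "https://example.com"}.
    urls = [e["url"] for e in entries if isinstance(e, dict) and e.get("url")]
    if not urls:
        raise ValueError("start_urls must contain at least one {'url': ...} entry")
    max_depth = int(actor_input.get("max_depth", DEFAULT_MAX_DEPTH))
    if max_depth < 0:
        raise ValueError("max_depth must be >= 0")
    return urls, max_depth
```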
Output Format
The generated markdown file contains:
- A section for each page
- Page title as heading
- Original URL reference
- Page content in Markdown format
- Image references with both alt text and filenames
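The image handling described above could look roughly like the following sketch, which uses only the standard library's html.parser. The output convention is an assumption, not the Actor's documented format: the alt text becomes the image's link text (falling back to the filename when alt is empty), and the filename is kept in the Markdown title attribute, so both survive even if the image itself is never downloaded.

```python
import posixpath
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin, urlparse


class ImageCollector(HTMLParser):
    """Collects (alt text, absolute src) pairs from <img> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.images: List[Tuple[str, str]] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # Self-closing <img ... /> tags also reach this method via handle_startendtag.
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src")
        if src:
            self.images.append((attr.get("alt") or "", urljoin(self.base_url, src)))


def image_to_markdown(alt: str, src: str) -> str:
    """Keep both the alt text and the image's filename in the Markdown reference."""
    filename = posixpath.basename(urlparse(src).path) or src
    label = alt.strip() or filename
    return f'![{label}]({src} "{filename}")'


def images_as_markdown(html: str, base_url: str) -> List[str]:
    parser = ImageCollector(base_url)
    parser.feed(html)
    parser.close()
    return [image_to_markdown(alt, src) for alt, src in parser.images]
```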
Example output:
```markdown
# Page Title
*URL: https://example.com/page*

Page content in markdown...

----------------
```
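A sketch of assembling page sections in the format shown above into one Markdown document. The function names are hypothetical, and sorting sections by URL is only an assumed way to keep pages from the same directory next to each other; the Actor may order them differently.

```python
from typing import Iterable, Tuple

SEPARATOR = "-" * 16  # the dashed separator line from the example output


def render_page_section(title: str, url: str, content: str) -> str:
    """One section: title heading, source URL, body, then the separator."""
    return f"# {title}\n*URL: {url}*\n\n{content.strip()}\n\n{SEPARATOR}\n"


def render_document(pages: Iterable[Tuple[str, str, str]]) -> str:
    """Join (title, url, content) sections into a single file, ordered by URL."""
    ordered = sorted(pages, key=lambda page: page[1])
    return "\n".join(render_page_section(t, u, c) for t, u, c in ordered)
```

Writing the result is then a single call such as Path("site.md").write_text(render_document(pages), encoding="utf-8"), where the filename is again only illustrative.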
Actor Metrics
1 monthly user
>99% runs succeeded
Created in Mar 2025
Modified 6 days ago