Website Content Crawler
Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗LangChain, LlamaIndex, and the wider LLM ecosystem.
Website crawler ends successfully but crawls nothing
Opened 11 hours ago by innovum_admin, last comment 9 hours ago by innovum_admin
Use of sitemap makes crawler fail
Opened a day ago by fsmedat, last comment a day ago by fsmedat
scraper started failing
Opened a day ago by fsmedat, last comment a day ago by fsmedat
Failed crawling with 403 code
Opened 4 days ago by ballerine, last comment 4 days ago by ballerine
Can't crawl while logged in
Opened 5 days ago by rust_chimta, last comment 5 days ago by rust_chimta
Fails to scrape useful data from AWS documentation page
Opened 6 days ago by methodical, last comment 5 days ago by methodical
Scrolling functionality breaking our scrape job
Opened 8 days ago by developprotege, last comment 4 days ago by Jan Čurn (jancurn)
Malformed sitemap content and 429s
Opened 8 days ago by MavenAGI, last comment 8 days ago by MavenAGI
My run doesn't work. I have 0 results
Opened 18 days ago by contact_plune, last comment 18 days ago by Jan Buchar (janbuchar)
Poor CPU utilization due to low usage limit
Opened 21 days ago by write2souvik, last comment 10 days ago by write2souvik
Crawling takes longer when calling API vs on site
Opened 24 days ago by adi-kamaraj, last comment 24 days ago by Jan Buchar (janbuchar)
How to ignore broken SSL when using PROXY
Opened a month ago by sash2s, last comment 24 days ago by Jan Buchar (janbuchar)
Unable to crawl https://openai.com/index/extracting-concepts-from-gpt-4/
Opened a month ago by imda_peckyoke, last comment a month ago by Jindřich Bär (jindrich.bar)
Crawler does not identify relative links
Opened a month ago by MavenAGI, last comment a month ago by Jindřich Bär (jindrich.bar)
My Runs do not end
Opened a month ago by matthias.amberg, last comment a month ago by matthias.amberg
Parsing website with CloudFlare protection
Opened a month ago by sash2s, last comment a month ago by sash2s
Unable to crawl the whole website
Opened a month ago by simpleworks, last comment a month ago by Jan Buchar (janbuchar)
Automating Web Content Crawling for Real-Time Updates
Opened 2 months ago by glovebubble, last comment a month ago by Jan Buchar (janbuchar)
Getting duplicate URLs in web crawling
Opened 2 months ago by simpleworks, last comment 24 days ago by Jan Buchar (janbuchar)
Memory limit control
Opened 2 months ago by vitthalrao.lavate, last comment 2 months ago by intriguing_game
- 3k monthly users
- 465 stars
- 99.9% runs succeeded
- 3.1 days response time
- Created in Mar 2023
- Modified 10 days ago