Website Content Crawler

apify/website-content-crawler

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗LangChain, LlamaIndex, and the wider LLM ecosystem.

How to ignore broken SSL when using PROXY

Open

sash2s opened this issue
a month ago

Hi, I'm currently trying to use a proxy from scrapingbee.com, but every request fails with SSL errors when connecting through the proxy (the same request works when tested with "curl -k"). In the "Apify Integration" section of the scrapingbee.com manual, they recommend enabling the "Ignore SSL errors" checkbox, but I don't see that option in the Actor settings.
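
For reference, the "curl -k" check mentioned above can be approximated outside the Actor with Python's standard library. This is only a minimal sketch for reproducing the behavior, not a Website Content Crawler setting: the function names and the proxy URL are hypothetical placeholders, and disabling certificate verification is unsafe outside of debugging.

```python
import ssl
import urllib.request


def make_insecure_context() -> ssl.SSLContext:
    """Return an SSL context that skips certificate checks (like curl -k)."""
    ctx = ssl.create_default_context()
    # check_hostname must be turned off before verify_mode can be CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def build_opener_via_proxy(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build an opener that routes HTTP(S) traffic through proxy_url
    and does not verify TLS certificates."""
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url}),
        urllib.request.HTTPSHandler(context=make_insecure_context()),
    )


if __name__ == "__main__":
    # Hypothetical placeholder; substitute the proxy URL from your provider.
    opener = build_opener_via_proxy("http://USER:PASSWORD@proxy.example.invalid:8886")
    with opener.open("https://example.com", timeout=30) as resp:
        print(resp.status)
```

If this request succeeds while the same request with a default (verifying) context fails, the failure is a certificate verification error rather than a connectivity problem.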

janbuchar

Hello, and thank you for your interest in the Actor! You are right that there is currently no way to do this with Website Content Crawler. We will look into it and let you know here once it is addressed.

sash2s

24 days ago

Is there a way to clone the "Website Content Crawler" Docker image so we can make some changes to the code? We really need this feature.

janbuchar

Unfortunately, the package is not open source, so you cannot modify the code. We will add this, but I cannot make any promises right now. In the meantime, you can use the Apify proxy, which is optimized for this use case.

Developer
Maintained by Apify
Actor metrics
  • 3k monthly users
  • 465 stars
  • 99.9% runs succeeded
  • 3.1 days response time
  • Created in Mar 2023
  • Modified 10 days ago