Email ✉️ & Phone ☎️ Extractor

7 days trial then $30.00/month - No credit card required now

anchor/email-phone-extractor

Extract emails, phone numbers, and other useful contact information (Twitter, LinkedIn, ...) from any list of websites you provide. Best tool for contact lead generation. Export data in structured formats and dominate your outreach game. Capture your leads almost for free, fast, and without limits.

Does not really work

Closed

spr123 opened this issue
22 days ago

guillim (anchor)

22 days ago

Hello, thanks so much for using my Actor, and thank you for your feedback; it's very valuable to me. I ran it on my side with the same Start URL and it found 4 emails and some phone numbers. So the Actor works, and the problem you have is a bit different. I looked at your logs and found that you need more memory. It's very easy to fix: in the INPUT settings, open "Run options" and add more memory (1024 is the minimum to make sure it works). See the attached images for more details. Hope it helps.

spr123

22 days ago

I increased it as you mentioned, but it seemed to be making a great many requests (85), so I aborted it. (It was also expensive because of all these requests.)

How do I get the Actor to search only the home page and the Contact Us page?

It does not make any sense: it just crawls through tons of pages hoping to find contact details.

For example, it searched this page:

https://mosmannaustralia.com/en-gb/collections/all-bamboo-underwear

Would it not make sense for the Actor to search the home page and, if nothing is found there, then look at links like Contact Us?

guillim (anchor)

22 days ago

I agree. Give me 10 minutes and I will answer your question.

guillim (anchor)

22 days ago

I think it's a simple fix: in the INPUT settings, please set the maximum pages per start URL to 1. This should do exactly what you want. You can also select "stay within domain", but that is more useful when you need more than 1 page per start URL. Let me know if it works.

guillim (anchor)

22 days ago

Can you try setting "maximum link depth" to 0?

spr123

22 days ago

Nope, got zero results. Any chance you could test this? It costs me money every time I run it. You have the URL :)

guillim (anchor)

22 days ago

I did run it on my side, and it works fine; that's what I was saying in my first message! Here is what I got as results (see pictures). I also took a screenshot of my INPUT so you can see exactly what I entered. I hope it helps.

spr123

22 days ago

Are you putting in the URL of the contact page? We added the home page. Just to clarify: we have thousands of home page URLs that we want to scan. What we are asking is that, if there are no contact details on the home page, the Actor then scans other pages, for example those with the word "contact" in them.

guillim (anchor)

21 days ago

There are two options for the behaviour of the Actor:

  • You provide a list of URLs with "maximum link depth" set to 0. If a page has no emails, the Actor returns nothing and stops there.
  • You allow the crawler to follow links and explore by itself. However, there is no conditional "if the word contact is in the link" that you can use, unless you add a list of pseudoURLs with a regex for all the homepages and the word contact. That sounds like a very complicated way to me, though. Let me know if it helps.

guillim (anchor)

21 days ago
  • Last option: you enter the contact pages directly instead of the homepages.

spr123

21 days ago

"Last option, you put directly the contact pages instead of the homepages"

Not an option. We have thousands of home page URLs, and I don't think other users would have the actual contact page URLs either. People using scrapers like yours would generally only have the home page.

"you put a list of URLs with "maximum link depth" to 0 and if there are no emails, it will return nothing and stop there"

This will only pull contact details if they are on the home page. Many, many websites do not have their contact information on the home page but on another page.

"you allow the crawler to click on links and see by itself. "

This makes no sense: you could be crawling hundreds of pages that will never have contact information, like the URL I gave earlier.

https://mosmannaustralia.com/en-gb/collections/all-bamboo-underwear

"However, there is no conditional "if word contact in the link" that you can use. unless you add a list of pseudoURLs with the regex for all the homepages and the word contact. It sounds to me like a very complicated way though. Let me know if it helps."

I am not sure I understand your last comment.

Surely it would make sense to scan the home page and also any page that contains the word "Contact". I don't see how this is complicated at all, and it makes a lot of sense to me. Your Actor is made to scrape contact details, right?

As the example below shows, many, many websites do not have their contact details on the home page, but on a separate page like this one:

https://mosmannaustralia.com/pages/customer-service-help-page

I would add words like

Contact Customer Service Help

so that the Actor scans only these pages, as well as the home page, for contact details.

You could even, if it makes things easier, skip checking whether contact details are available on the home page, and simply scan the home page plus any URLs containing these words.
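As a rough illustration of the behaviour requested above (this is not how the Actor is implemented), here is a minimal Python sketch: scan the home page first and, only if no email is found there, scan links whose URL contains one of the suggested words. The names `find_emails`, `looks_like_contact_page`, and the `fetch` callback are hypothetical, the email regex is deliberately simple, and the `example.com` pages are made up for the demo.

```python
import re
from typing import Callable, Iterable, Set

# Deliberately simple email pattern, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Words suggested in the post above, lower-cased.
CONTACT_WORDS = ("contact", "customer", "service", "help")


def looks_like_contact_page(url: str) -> bool:
    """True if the URL contains one of the contact-related words (case-insensitive)."""
    lowered = url.lower()
    return any(word in lowered for word in CONTACT_WORDS)


def find_emails(home_url: str, links: Iterable[str], fetch: Callable[[str], str]) -> Set[str]:
    """Scan the home page; only if it yields no email, scan contact-like links."""
    emails = set(EMAIL_RE.findall(fetch(home_url)))
    if emails:
        return emails
    for link in links:
        if looks_like_contact_page(link):
            emails.update(EMAIL_RE.findall(fetch(link)))
    return emails


if __name__ == "__main__":
    # In-memory stand-in for real HTTP requests (made-up pages).
    pages = {
        "https://example.com/": "<h1>Welcome</h1>",
        "https://example.com/pages/contact-us": "Write to hello@example.com",
        "https://example.com/collections/all": "Lots of products",
    }
    print(find_emails("https://example.com/", list(pages), pages.__getitem__))
```

Passing `fetch` in as a parameter keeps the sketch independent of any HTTP library; product pages such as the collection page are never requested.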

Many thanks

Scott

guillim (anchor)

21 days ago

I found an easy way. In the settings, add a pseudoURL: it's a regex rule that every link must match, otherwise the link is discarded.

In our case, something like

.*(contact|help|service).*

Should do the trick. Feel free to test your own regex using regexr.com for instance.
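To sanity-check the pattern offline, here is a small Python sketch that runs it against the two URLs mentioned earlier in this thread. It only tests the regex itself with Python's `re` module; how the Actor applies pseudoURLs internally is not shown in this thread, and the helper name `is_allowed` is hypothetical.

```python
import re

# The pattern suggested above, compiled as a plain Python regex.
PSEUDO_URL_PATTERN = re.compile(r".*(contact|help|service).*")

urls = [
    "https://mosmannaustralia.com/pages/customer-service-help-page",
    "https://mosmannaustralia.com/en-gb/collections/all-bamboo-underwear",
]


def is_allowed(url: str) -> bool:
    """True if the whole URL matches the pattern, i.e. the link would be kept."""
    return PSEUDO_URL_PATTERN.fullmatch(url) is not None


if __name__ == "__main__":
    for url in urls:
        print(is_allowed(url), url)
```

As written, the pattern is case-sensitive, so a URL containing "Contact" with a capital C would not match; in plain Python, compiling with `re.IGNORECASE` would cover that case.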

Let me know if it solves your inquiry

guillim (anchor)

21 days ago

For your information, I tested it and it worked 🙂

spr123

21 days ago

guillim (anchor)

21 days ago

Yeah, it's because you forgot to change:

  • the "maxDepth" to something like 2 (so that the Actor does not stop at the Start URL)
  • the "maxRequests" to something like 100 (so that the Actor can click on a few links without being blocked by this limit)
  • the "maxRequestsPerStartUrl" to something like 3 (so that the Actor can click on at least one link, if you want it to click on the contact page for instance)

guillim (anchor)

21 days ago

I see that you decided to set "onlyOneEmailPerDomain": true. Note that once the Actor finds at least 1 email, it will skip any other URL on the same domain. So if you want emails from the same website but from different pages, you should change this to false.
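Putting the settings from the last two replies together, the input might look roughly like the sketch below, written as a Python dict and printed as JSON. The keys `maxDepth`, `maxRequests`, `maxRequestsPerStartUrl`, and `onlyOneEmailPerDomain` are quoted from this thread; the `startUrls` and `pseudoUrls` key names, their shapes, and the `example.com` URL are assumptions, so check the Actor's INPUT tab for the real field names.

```python
import json

actor_input = {
    # Assumed key name and shape; placeholder home page URL.
    "startUrls": [{"url": "https://example.com/"}],
    # Assumed key name; the pattern is the one suggested earlier in the thread.
    "pseudoUrls": [".*(contact|help|service).*"],
    # Values suggested in the thread:
    "maxDepth": 2,                   # do not stop at the Start URL
    "maxRequests": 100,              # overall request limit
    "maxRequestsPerStartUrl": 3,     # allows following at least one link per site
    "onlyOneEmailPerDomain": False,  # keep emails found on several pages of one site
}

if __name__ == "__main__":
    print(json.dumps(actor_input, indent=2))
```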

spr123

21 days ago

Getting closer, thanks :)

The phone number is wrong though; it does not seem to like spaces.

Any chance you can also scrape addresses?

guillim (anchor)

20 days ago

I fixed it. You should see Australian numbers (at least the one from your website)
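The actual fix is not shown in this thread. Purely as an illustration of how numbers written with spaces can be handled, here is a Python sketch that matches loosely formatted candidates and then strips the separators. The regex, the digit-count bounds, the function name `extract_phones`, and the sample numbers are all assumptions made for the example, not the Actor's code.

```python
import re
from typing import List

# Loose candidate: optional "+", a digit, then digits, spaces, hyphens or
# parentheses, ending on a digit.
PHONE_CANDIDATE_RE = re.compile(r"\+?\d[\d \-()]{6,}\d")


def extract_phones(text: str) -> List[str]:
    """Return phone-like numbers with separators removed, keeping a leading '+'."""
    phones = []
    for match in PHONE_CANDIDATE_RE.finditer(text):
        raw = match.group()
        digits = re.sub(r"\D", "", raw)
        # 15 digits is the E.164 maximum; the lower bound of 8 is an arbitrary choice.
        if 8 <= len(digits) <= 15:
            phones.append(("+" if raw.startswith("+") else "") + digits)
    return phones


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(extract_phones("Call +61 2 5550 1234 or 0412 345 678"))
```

A pattern this loose will also pick up other long digit runs (order numbers, dates), so a real extractor would need stricter, country-aware validation.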

About postal addresses: I am sorry, but that is technically impossible, due to the variety of ways they are written and formatted.

Hope you finally get something working!

Let me know if I can close this issue after that.

guillim (anchor)

18 days ago

Closing, but feel free to reopen if necessary.

Developer
Maintained by Community
Actor metrics
  • 262 monthly users
  • 30 stars
  • 99.2% runs succeeded
  • 6.9 hours response time
  • Created in Oct 2021
  • Modified 9 days ago