Can I extract emails or phone numbers from websites?
With Octoparse, you can extract emails and phone numbers from webpages that share the same or a similar page layout. You do this by configuring a task in our app.
What type of emails or phone numbers does Octoparse extract?
When Octoparse extracts data, it parses the page's source code and picks up the content from it. Hence, emails or phone numbers that appear as text in the source code can be captured by Octoparse.
For example, when we extract "Krishnam Bio-tech", Octoparse captures the text between the opening and closing tags: <span>capture the text here</span>.
Some websites use anti-crawling measures and embed the email or phone number in an image or another non-text format, even though it looks like text on the webpage. In this case, Octoparse cannot extract it from the image or decode it into text.
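To illustrate the difference between text and image content, here is a minimal Python sketch of text-based extraction. It is not Octoparse's implementation, only an analogy using the standard library. The sample HTML, its class names, the email address, the phone number, and the two regular expressions are all assumptions made for the example.

```python
import re
from html.parser import HTMLParser

# Simplified, assumed patterns; real-world emails and phone numbers vary widely.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


class TextCollector(HTMLParser):
    """Collects only the text that sits between tags in the source code."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for text nodes only; tag attributes such as an image's src are not seen here.
        if data.strip():
            self.chunks.append(data.strip())


def extract_contacts(html):
    """Return the emails and phone numbers found in the text nodes of html."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    text = "\n".join(parser.chunks)
    return {"emails": EMAIL_RE.findall(text), "phones": PHONE_RE.findall(text)}


# Hypothetical listing: one email is plain text, another is shown only as an image.
SAMPLE_HTML = """
<div class="listing">
  <span class="name">Krishnam Bio-tech</span>
  <span class="email">info@example.com</span>
  <span class="phone">(555) 123-4567</span>
  <img src="/img/contact-email.png" alt="">
</div>
"""

result = extract_contacts(SAMPLE_HTML)
print(result)
```

The email and phone number inside the <span> tags are found because they are text in the source code. The address shown only in the image is not, because the source code contains nothing but the image's file path.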
How does Octoparse extract emails or phone numbers?
To capture the emails or phone numbers you are after, first specify their location by clicking directly on them on the webpage.
Take yellowpages.com as an example. To extract emails and phone numbers, specify their location on the webpage by selecting them.
Click on the email and phone numbers, and then select “Extract text of the selected element”.
When the data is selected properly, the selection will be highlighted in green.
Combined with other techniques such as pagination, the same task can scrape an entire category or website.
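The idea behind pagination can be sketched in Python as a loop that extracts data from one page and then follows the "next page" link until none is left. This is an illustration only, not how Octoparse is configured or implemented. The FAKE_SITE dictionary, its URLs, the business names, the phone numbers, and the phone pattern are hypothetical stand-ins so that the example runs without a network connection.

```python
import re

# Simplified, assumed pattern for US-style phone numbers.
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")

# Hypothetical stand-in for a paginated category: each page has text and a link to the next page.
FAKE_SITE = {
    "/category?page=1": {"text": "Acme Plumbing (555) 010-0001", "next": "/category?page=2"},
    "/category?page=2": {"text": "Bolt Electric (555) 010-0002", "next": "/category?page=3"},
    "/category?page=3": {"text": "Cedar Roofing (555) 010-0003", "next": None},
}


def scrape_category(start_url, site, max_pages=100):
    """Extract phone numbers from every page reachable through the 'next' links."""
    results = []
    visited = set()
    url = start_url
    # Stop when there is no next page, on a repeated page, or at the page limit.
    while url is not None and url not in visited and len(visited) < max_pages:
        visited.add(url)
        page = site[url]
        results.extend(PHONE_RE.findall(page["text"]))
        url = page["next"]
    return results


phones = scrape_category("/category?page=1", FAKE_SITE)
print(phones)
```

The visited set and the max_pages limit are a precaution against a "next" link that loops back to an earlier page.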