Yell is the UK's leading online business directory. You can search for local businesses across the UK on this website. This tutorial will show you how to collect business details on Yell.com with Octoparse.
To demonstrate, we will use the URL below as an example.
We will scrape data such as Title, Address, Phone number, and Website from the web page.
Here are the main steps in this tutorial:
1. Go to Web Page - open the target web page
Enter the URL on the home page
Click Start to create a new task
2. Auto-detect web page data - to set up the workflow
Click Auto-detect web page data
Wait for the detection to complete
Go to Data preview to check whether the current data output meets your needs
You can delete unnecessary data fields directly by clicking the trash button
You can also modify the data field names here directly by clicking the edit button
Uncheck the Add a page scroll option
Click Create workflow
Octoparse will automatically generate a workflow with the data fields it has detected.
3. Extract data - extract phone numbers and websites
Some information may not be picked up by auto-detection. We can select it manually and add it to the scrape.
Select the website link of the first business on the web page (make sure to select it from the area highlighted in red)
Choose Extract the URL of the selected link
Click ... and change the XPath of the URL field to //a[contains(text(),'Website')]
Click Apply to confirm
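To see what this XPath matches, here is a minimal Python sketch that runs outside Octoparse. It assumes a simplified, made-up listing markup (the real Yell.com HTML differs), and the helper name website_for is hypothetical. The standard library's ElementTree does not support contains(), so the sketch applies the same "link text contains Website" test in plain Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for two listings; not the real Yell.com HTML.
SAMPLE = """
<div>
  <article>
    <h2>Example Business</h2>
    <a href="/reviews/1">Reviews</a>
    <a href="https://example.com/">Website</a>
  </article>
  <article>
    <h2>Another Business</h2>
    <a href="/reviews/2">Reviews</a>
  </article>
</div>
"""

def website_for(listing):
    """Return the href of the first link whose text contains 'Website'.

    Rough equivalent of //a[contains(text(),'Website')] applied
    inside a single listing element.
    """
    for link in listing.iter("a"):
        if "Website" in (link.text or ""):
            return link.get("href")
    return None  # this listing has no Website link

root = ET.fromstring(SAMPLE)
websites = [website_for(listing) for listing in root.iter("article")]
print(websites)
```

Matching on the link text rather than on its position means a business without a Website link yields an empty value instead of picking up a neighbouring link.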
Scraping phone numbers is trickier in this case because the numbers are not displayed on the web page; they are only present in the HTML code. To get them, we can extract a field and then change its XPath so that it points to the phone number.
Select the Call button on the page and choose Extract the text of the element
Click ... and change the XPath of the field to //span[@itemprop="telephone"]
Click Apply to confirm
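The sketch below illustrates why this XPath can reach a number that is not shown on the page: the value sits in a span that is present in the source even when it is hidden from view. The markup is a made-up stand-in for one listing, the phone number is a placeholder, and phone_number is a hypothetical helper; only the itemprop="telephone" attribute comes from the step above.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for one listing; the number is a placeholder value.
SAMPLE = """
<article>
  <h2>Example Business</h2>
  <a class="call">Call</a>
  <span itemprop="telephone" style="display:none">01632 960000</span>
</article>
"""

def phone_number(markup):
    """Return the text of the first span with itemprop='telephone', or None."""
    root = ET.fromstring(markup)
    # ElementTree supports this attribute predicate directly.
    node = root.find(".//span[@itemprop='telephone']")
    if node is None or not node.text:
        return None
    return node.text.strip()

phone = phone_number(SAMPLE)
print(phone)
```

The Call button itself carries no number in this sketch, which is why the field's XPath is redirected to the span instead.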
The email address cannot be scraped in this case because the web page does not include it in its source code. Clicking the Email button only takes you to a page where you can submit information.
Rename the fields if needed
4. Start extraction - run the task and get the data
Click Run on the upper left side
Select Run on your device to run the task on your computer, or select Run in the Cloud to run the task in the Cloud (for premium users only)
You can export the result data in one of the provided formats, such as Excel, CSV, or JSON, or export it to your database.
Here is the sample output.
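Once the data is exported, it can be checked with a few lines of Python. The sketch below is only an illustration: it assumes a CSV export whose columns are named Title, Address, Phone number, and Website (the actual names depend on how you renamed the fields), and the rows are made-up placeholders rather than real scraped data.

```python
import csv
import io

# Made-up stand-in for an exported CSV file; column names are an assumption.
EXPORTED = """Title,Address,Phone number,Website
Example Business,"1 Example Street, London",01632 960000,https://example.com/
Another Business,"2 Sample Road, Leeds",01632 960001,
"""

def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

rows = load_rows(EXPORTED)

# List the businesses for which no website was captured.
missing_website = [row["Title"] for row in rows if not row["Website"]]
print(missing_website)
```

With a real export, replace io.StringIO(text) with an opened file, for example open("export.csv", newline="", encoding="utf-8"), where export.csv is a hypothetical file name.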