Scrape business details from Yell (V8.4)
Yell is the UK's leading online business directory, where you can search for local businesses across the UK. In this tutorial, we will show you how to collect business details from Yell.com with Octoparse.
To demonstrate, we will use the URL below as an example.
https://www.yell.com/ucs/UcsSearchAction.do?scrambleSeed=627415385&keywords=dentists&location=London
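The search terms are carried in the query string of this URL, so you can point the same task at a different search by changing them. The Python sketch below parses the example URL and builds a new one. It assumes that Yell accepts the same `keywords` and `location` parameters for other searches. It also leaves out `scrambleSeed`, whose purpose this tutorial does not explain. `build_search_url` is a hypothetical helper, not part of Octoparse.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Base path taken from the example URL in this tutorial.
BASE = "https://www.yell.com/ucs/UcsSearchAction.do"

EXAMPLE = ("https://www.yell.com/ucs/UcsSearchAction.do"
           "?scrambleSeed=627415385&keywords=dentists&location=London")


def build_search_url(keywords, location):
    """Hypothetical helper: build a search URL for another keyword/location.

    Assumption: Yell accepts the same `keywords` and `location` parameters
    seen in the example URL. `scrambleSeed` is omitted because its purpose
    is not documented here.
    """
    return BASE + "?" + urlencode({"keywords": keywords, "location": location})


# Read the search terms back out of the example URL.
params = parse_qs(urlparse(EXAMPLE).query)
print(params["keywords"], params["location"])

print(build_search_url("plumbers", "Manchester"))
```

If a URL built this way does not return results, use the URL copied from your own browser search instead.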
We will scrape data such as Title, Address, Phone number, and Website from the web page.
Here are the main steps in this tutorial: [Download demo task file here]
- Go to Web Page - open the target web page
- Auto-detect web page data - to set up the workflow
- Extract data - extract phone numbers and websites
- Start extraction - run the task and get data
1. Go to Web Page - open the target web page
- Enter the URL on the home page
- Click Start to create a new task
2. Auto-detect web page data - to set up the workflow
- Click Auto-detect web page data
- Wait for the detection to complete
- Go to Data preview and check whether the current data output meets your needs
- You can delete unnecessary data fields directly by clicking the icon
- You can also modify the data field names here directly by clicking the icon
- Uncheck the Add a page scroll option
- Click Create workflow
Octoparse will automatically generate a workflow with the data fields it has detected.
3. Extract data - extract phone numbers and websites
Auto-detection may miss some information. We can select those fields on the page and add them manually.
- Select the Website link of the first business on the page (make sure to select it within the area highlighted in red)
- Choose Extract the URL of the selected link
- Click ... and change the XPath of the URL field to //a[contains(text(),'Website')]
- Click Apply to confirm
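The Python sketch below illustrates what that XPath selects: the `<a>` element whose text contains "Website", from which the link URL (`href`) is taken. The HTML is a hypothetical, simplified listing, not Yell's real markup. Python's built-in `xml.etree.ElementTree` supports only a subset of XPath and has no `contains()`. For that reason, the hypothetical function `website_links` emulates `//a[contains(text(),'Website')]` by filtering `<a>` elements on their text.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for one business listing. Yell's real
# HTML is more complex and may change; this only shows what the XPath targets.
LISTING = """
<div class="listing">
  <h2>Example Dental Practice</h2>
  <a href="https://example-dental.co.uk">Website</a>
  <a href="/biz/example-dental/reviews">Reviews</a>
</div>
"""


def website_links(root):
    # Emulates //a[contains(text(),'Website')]: keep <a> elements whose
    # text includes "Website" and return their href attribute values.
    return [a.get("href") for a in root.iter("a")
            if "Website" in (a.text or "")]


root = ET.fromstring(LISTING.strip())
print(website_links(root))
```

Matching on the link text rather than on the link's position keeps the field pointing at the right link in listings where some links are missing.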
Scraping phone numbers is tricky here, because the numbers are not visible on the web page but are stored in the HTML code. To get them, we can extract a field and then change its XPath so that it points to the phone number.
- Select the Call button on the page and choose Extract the text of the element
- Click ... and change the XPath of the field to //span[@itemprop="telephone"]
- Click Apply to confirm
- Rename the fields if needed
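The phone-number steps above rely on the number being present in the HTML even though the page shows only a Call button. The Python sketch below shows what `//span[@itemprop="telephone"]` picks out. The markup is hypothetical and simplified (only the `span` with `itemprop="telephone"` comes from this tutorial), and the number is fictional. `ElementTree` does support attribute predicates, so the path `.//span[@itemprop='telephone']` mirrors the tutorial's XPath relative to one listing. `phone_numbers` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: the number sits inside a span with
# itemprop="telephone" within the Call button's HTML.
LISTING = """
<div class="listing">
  <h2>Example Dental Practice</h2>
  <a class="call-button">Call
    <span itemprop="telephone">020 7946 0000</span>
  </a>
</div>
"""


def phone_numbers(root):
    # Mirrors //span[@itemprop="telephone"] within one listing element
    # and returns the text of each matching span, trimmed of whitespace.
    return [span.text.strip()
            for span in root.findall(".//span[@itemprop='telephone']")
            if span.text]


root = ET.fromstring(LISTING.strip())
print(phone_numbers(root))
```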
4. Start extraction - run the task and get the data
- Click Save
- Click Run on the upper left side
- Select Run on your device to run the task on your computer, or select Run in the Cloud to run the task in the Cloud (for premium users only)
You can export the result data in formats such as Excel, CSV, and JSON, or export it to your database.
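If you export to CSV, the file can be loaded with Python's standard `csv` module for further processing. The sketch below reads a small hypothetical export. Its column names assume the fields were renamed to Title, Address, Phone, and Website in step 3, so replace them with your own field names. `load_rows` is a hypothetical helper, and the rows are made-up sample data, not real Octoparse output.

```python
import csv
import io

# Hypothetical CSV export; column names assume the fields were renamed
# to Title, Address, Phone and Website.
SAMPLE = """Title,Address,Phone,Website
Example Dental Practice,"1 Example Street, London",020 7946 0000,https://example-dental.co.uk
Another Dental Clinic,"2 Sample Road, London",,
"""


def load_rows(text):
    # DictReader uses the header row as keys for every following row;
    # quoted addresses containing commas stay in a single column.
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(SAMPLE)
for row in rows:
    print(row["Title"], "|", row["Phone"] or "(no phone)")
```

For a real export, open the file with `open(path, newline="", encoding="utf-8")` and pass the file object to `csv.DictReader` instead of `io.StringIO`.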
Here is the sample output.
Contact us at any time if you need our help!
Writer: Joy
Editor: Yina