In this tutorial, we will show you how to collect business details from Yell.com with Octoparse.
To demonstrate, we will use the URL below as an example.
We will scrape data such as Title, Address, Phone number, and Website from the web page.
[Download demo task file here]
Here are the main steps in this tutorial:
- Go to Web Page - open the target web page
- Auto-detect web page data - to set up the workflow
- Extract data - modify the data fields
- Start extraction - run the task and get data
1. Go to Web Page - open the target web page
- Enter the URL on the home page
- Click "Start" to create a new task
2. Auto-detect web page data - to set up the workflow
- Click "Auto-detect web page data"
- Wait for the detection to complete
- Go to "Data preview" to see if you're okay with the current data output
- You can delete unnecessary data fields directly by clicking the icon
- You can also modify the data field names here directly by clicking the icon
- Uncheck the "Add a page scroll" option
- Click "Create workflow"
Octoparse will automatically generate a workflow with the data fields it has detected.
3. Extract data - extract phone numbers and websites
Auto-detection may miss some information. You can select those items manually to scrape them.
- Select the "Website" of the first business on the web page (note: select from the area highlighted in red)
- Choose "Extract the URL of the selected link"
- Open the settings of Extract Data and change the XPath of the field to //a[contains(text(),'Website')]
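To see what the XPath above selects, here is a minimal Python sketch that runs outside Octoparse. The markup in SAMPLE is hypothetical and heavily simplified; the real Yell.com HTML will differ. Python's built-in ElementTree does not support contains(), so the sketch performs the same text test in plain Python rather than evaluating the XPath itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified listing markup (an assumption for illustration only).
SAMPLE = """
<div>
  <div class="listing">
    <h2>Acme Plumbing</h2>
    <a href="https://acme.example">Website</a>
    <a href="/email/acme">Email</a>
  </div>
  <div class="listing">
    <h2>Bolt Electric</h2>
    <a href="/email/bolt">Email</a>
  </div>
</div>
"""

def website_links(markup):
    """Return the href of every <a> whose leading text contains 'Website'."""
    root = ET.fromstring(markup)
    # a.text is the text before any child element, which approximates
    # the text() test in //a[contains(text(),'Website')].
    return [a.get("href") for a in root.iter("a") if "Website" in (a.text or "")]

print(website_links(SAMPLE))
```

Only the first listing has a link labelled "Website", so only its URL is returned. The "Email" links are skipped, and a business without a website yields no value for this field.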
Scraping phone numbers is trickier here because the numbers are not visible on the web page; they are only stored in the HTML code. To get them, we can extract a field and then modify its XPath to point to the phone number.
- Select the "Call" button and extract the text
- Open the settings of Extract Data and change the XPath of the field to //span[@itemprop="telephone"]
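The idea behind this XPath can be sketched in Python as follows. The markup is a hypothetical, simplified stand-in for a listing (the hidden span, its style attribute, and the phone number are all made up for illustration), but it shows how an attribute predicate reaches a value that is present in the HTML even though it is not displayed on the page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified listing markup (an assumption for illustration only).
# The phone number is fictional.
SAMPLE = """
<div class="listing">
  <h2>Acme Plumbing</h2>
  <a class="call-button">Call</a>
  <span itemprop="telephone" style="display:none">020 7946 0000</span>
</div>
"""

def phone_numbers(markup):
    """Return the text of every <span> whose itemprop attribute is 'telephone'."""
    root = ET.fromstring(markup)
    # ElementTree supports attribute predicates, so this is the relative
    # form of //span[@itemprop="telephone"].
    spans = root.findall(".//span[@itemprop='telephone']")
    return [(s.text or "").strip() for s in spans]

print(phone_numbers(SAMPLE))
```

Because the match is made on the itemprop attribute and not on visible text, it does not matter that the "Call" button itself shows no number.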
The email address cannot be scraped in this case because the web page does not include it in its source code. Clicking the Email button only takes you to a page for submitting information.
- Rename the fields if needed
4. Start extraction - run the task and get data
- Click on the upper left side
- Select "Run on your device" to run the task on your computer, or select "Run in the Cloud" to run the task in the Cloud (for premium users only)
You can export the result data to formats such as Excel, CSV, or JSON, or to your database.
Here is the sample output.
Tutorial in Spanish: Scrapear los detalles comerciales de Yell
You can also read more web scraping articles on the official website