In this tutorial, we will show you how to collect business details from Yell.com with Octoparse.
To demonstrate, we will use the URL below as an example.
We will scrape the Title, Address, Rating, Phone number, and Reviews from each business detail page.
Here are the main steps in this tutorial:
- Go To Web Page - open the target web page
- Create a pagination loop - enable Octoparse to scrape across all available pages
- Create a "Loop Item" - build a loop over all the businesses on the page, then click into each one
- Extract data - select the data fields needed for the extraction
- Start extraction - run the task and get data
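Conceptually, the steps above amount to two nested loops followed by a field selection. The sketch below is not how Octoparse is implemented and does not touch the network; `RESULT_PAGES`, the placeholder businesses, and the `scrape` function are hypothetical names used only to show the shape of the workflow.

```python
from typing import Dict, List

Business = Dict[str, str]

# Hypothetical stand-in for the site: each inner list is one results page and
# each dict is one business detail page. Placeholder data, not Yell.com content.
RESULT_PAGES: List[List[Business]] = [
    [
        {"Title": "Example Cafe", "Address": "1 Sample Street"},
        {"Title": "Example Bakery", "Address": "2 Sample Street"},
    ],
    [
        {"Title": "Example Florist", "Address": "3 Sample Street"},
    ],
]


def scrape(result_pages: List[List[Business]], fields: List[str]) -> List[Business]:
    """Walk every results page, visit every business, and keep the chosen fields."""
    rows: List[Business] = []
    for page in result_pages:      # pagination loop: one pass per results page
        for business in page:      # Loop Item: one pass per business on the page
            # Extract data: copy only the selected fields into an output row.
            rows.append({name: business.get(name, "") for name in fields})
    return rows


rows = scrape(RESULT_PAGES, ["Title", "Address"])
for row in rows:
    print(row)
```

Each output row corresponds to one business, which is why the pagination loop has to wrap the Loop Item rather than the other way around.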
1. Go To Web Page - open the target web page
- Click "+ Task" to start a task using the Advanced Mode
- Copy and paste the target URL into the "Extraction URL" box
- Uncheck "Retry when page fails to load"
- Click "Save URL" to move on
If you are visiting Yell for the first time, a notification that cannot be blocked will appear. Switch to browser mode and click "Got it" so that the notification does not pop up while the data is being scraped.
2. Create a pagination loop - scrape multiple list pages
- Scroll down the page and click the "next" button
- Click "Loop click the single element" on the "Action Tips" panel
- Uncheck "Retry when the page remains unchanged"
- Click "Load the page with Ajax" and set "AJAX Timeout" to 5s (optional)
The AJAX timeout can also serve as a page-load timeout for a Click action. For example, when a page keeps loading long after the data you need has appeared, setting an AJAX timeout tells Octoparse to move on to the next action once the set time is reached. If you want to learn more about AJAX, you can watch the video tutorial here.
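The idea of a timeout that caps how long a click waits for the page can be sketched in a few lines. This is only an illustration of the general pattern, not Octoparse's internal logic; `wait_for_page`, `page_loaded`, and `poll_s` are hypothetical names, and the short timeout in the usage lines is chosen so that the example finishes quickly.

```python
import time
from typing import Callable


def wait_for_page(page_loaded: Callable[[], bool],
                  timeout_s: float = 5.0,
                  poll_s: float = 0.1) -> bool:
    """Wait until page_loaded() is true or timeout_s has elapsed.

    Returns True if the page finished loading in time and False if the
    timeout was reached. Either way the caller proceeds to the next action.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if page_loaded():
            return True
        time.sleep(poll_s)
    return False


# A page that never finishes loading: the wait gives up after the timeout.
finished = wait_for_page(lambda: False, timeout_s=0.3, poll_s=0.05)
print("loaded" if finished else "timed out, moving on")
```

With a 5s timeout, a page that never reports "loaded" delays the workflow by at most about five seconds per click instead of blocking it indefinitely.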
3. Create a "Loop Item" - build a loop over all the businesses on the page
- Click on the first title of the listing on the current page
- Select "Select all" on the "Action Tips" panel
- Select "Loop click each element" on the "Action Tips" panel
- Uncheck the box for "Retry when page remains unchanged (use discreetly for AJAX loading)"
- Set "AJAX Timeout" to 5s (optional)
- Click "Save"
4. Extract data - select the data fields needed for the extraction
- Click on the specific data fields
- Select "Extract text of the selected element" from the "Action Tips"
- Repeat the same steps until all data fields needed have been captured
- Click on any two reviews so that Octoparse identifies the whole list of reviews
- Select "Extract text from selected elements" on the "Action Tips" panel; all the reviews will then be extracted in a loop
- Click "Action Options" and set "Waiting Before Execution" to 2s
- Rename the fields by selecting from the pre-defined list or entering your own names
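Because a business has one Title or Rating but many Reviews, the single-value fields and the looped field have to be combined into rows somehow. The sketch below assumes one common layout, in which the single-value fields are repeated on every review row; the actual layout of the Octoparse output may differ. The `flatten` helper and the sample values are hypothetical.

```python
from typing import Dict, List


def flatten(business: Dict[str, str], reviews: List[str]) -> List[Dict[str, str]]:
    """Combine single-value fields with a looped field into flat rows."""
    if not reviews:
        # Keep the business even when it has no reviews.
        return [{**business, "Reviews": ""}]
    return [{**business, "Reviews": review} for review in reviews]


# Placeholder values for illustration only.
business = {"Title": "Example Cafe", "Rating": "4.5"}
rows = flatten(business, ["Great coffee", "Friendly staff"])
for row in rows:
    print(row)
```

Under this assumption, a business with two reviews produces two rows that share the same Title and Rating, which is worth keeping in mind when counting businesses in the exported data.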
5. Start extraction - run the task and get data
- Click "Save"
- Click "Start Extraction" on the upper left side
- Select "Local Extraction" to run the task on your computer, or select "Cloud Extraction" to run the task in the Cloud (for premium users only)
Here is a sample of the extracted data.
Happy data hunting!