In this tutorial, we are going to show you how to scrape restaurant information from Google Maps.
For Google Maps, you can also use our easy-to-use "Task Template" on the main screen of the Octoparse scraping tool. All you need to do is enter a few parameters and the task is ready to go. For further details, see: Task Templates
To follow along, you may want to use this URL in the tutorial:
We will scrape data such as the restaurant name, rating, category, location, description, and hours from the details page with Octoparse.
This tutorial will also cover:
- Dealing with AJAX for pagination
Here are the main steps in this tutorial: [Download task file here]
- Go To Web page - to open the targeted web page
- Create a pagination loop - to scrape all the results from multiple pages
- Create a "Loop Item" - to scrape all the item details on the current page
- Extract data - to select the data for extraction
- Start extraction - to run the task and get data
1) Go To Web page - to open the targeted web page
In this tutorial, the first step is a little different. We need to change the "Browser" option in "Settings" so that Google Maps opens correctly, since the default browser cannot open it.
- Click "+ Task" to start a task using Advanced Mode
- Paste the URL into the "Extraction URL" box and click "Save URL" to move on
The default built-in browser is incompatible with Google Maps. Hence, we need to switch to a compatible browser.
- Click "Save" to save your task
- Click "Settings" and change your browser to "Firefox 45.0" and click "Save"
2) Create a pagination loop - to scrape all the results from multiple pages
- Click the next page button ">"
- Click "Loop click Single Button" on "Action Tips"
- Uncheck the box for "Retry when page remains unchanged (use discreetly for AJAX loading)"
- Set the AJAX Load time to 15s for the "Click to paginate" action
- Click "OK" to save
The AJAX timeout can often double as a web page timeout for a Click action. For example, when a page takes forever to finish loading, long after the data you need has loaded, you can use the AJAX timeout to tell Octoparse to move on to the next action once the set time is reached.
If you want to learn more about AJAX, you can watch the video tutorial here.
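To make the timeout idea concrete, here is a minimal Python sketch of "wait for the content, but move on once the set time is reached". It only illustrates the concept and is not how Octoparse is implemented; the function name `wait_for_content` and its parameters are hypothetical.

```python
import time


def wait_for_content(is_ready, timeout_s=15.0, poll_s=0.5):
    """Poll is_ready() until it returns True or timeout_s seconds pass.

    Returns True if the content became ready in time and False if the
    timeout expired. In both cases the caller continues with the next
    action, which mirrors the idea of an AJAX timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# Hypothetical usage: the "page" never reports ready, so we give up
# after a short timeout instead of waiting forever.
loaded = wait_for_content(lambda: False, timeout_s=0.2, poll_s=0.05)
print("content loaded in time:", loaded)
```

A 15s value, as used in this step, simply sets how long the wait may last before the next action starts.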
3) Create a "Loop Item" - to scrape all the item details on the current page
- Click "Go To Web Page" to go back to the first page
When extracting data across multiple pages, always start building your task on the first page.
- Select the first and the second section containing restaurant information on the current page
- Click "Extract data in the loop" on the "Action Tips" panel
Octoparse will automatically select all the sections on the current page. The manually selected sections will be highlighted in green with all the sub-elements highlighted in red.
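The workflow built so far nests a "Loop Item" inside the pagination loop. The sketch below shows that nesting in plain Python, assuming each result page is already available as a list of dictionaries. The sample pages and restaurant names are made up for illustration, and the function `scrape_all_pages` is hypothetical, not part of Octoparse.

```python
def scrape_all_pages(pages):
    """Collect one row per item across all pages.

    pages: a list of pages, where each page is a list of item dicts.
    This stands in for the result pages the pagination loop visits.
    """
    rows = []
    page_index = 0
    # Pagination loop: keep "clicking next" until there are no more pages.
    while page_index < len(pages):
        # Loop Item: visit every result section on the current page.
        for item in pages[page_index]:
            rows.append({
                "name": item.get("name", ""),
                "rating": item.get("rating", ""),
            })
        page_index += 1
    return rows


# Made-up sample data standing in for two pages of results.
sample_pages = [
    [{"name": "Example Bistro", "rating": "4.5"},
     {"name": "Sample Diner", "rating": "4.0"}],
    [{"name": "Demo Cafe"}],  # no rating available for this item
]

for row in scrape_all_pages(sample_pages):
    print(row)
```

Starting the inner loop on the first page, as recommended above, ensures that no page is skipped.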
4) Extract data - to select the data for extraction
- Delete unwanted data fields
We will keep the restaurant name, rating, category, location, description, and hours.
- Click "OK" to save
- Rename the fields by selecting from the pre-defined list or entering your own names
5) Start extraction - to run the task and get data
- Click "Save"
- Click "Start Extraction" on the upper left side
- Select "Local Extraction" to run the task on your computer, or select "Cloud Extraction" to run the task in the Cloud (for premium users only)
Here is the sample output. You can see some blank fields in the "Description" and "Hours" columns. This is because some restaurants do not list a description and/or hours of operation.
Octoparse may also fail to find an element matching the defined pattern and leave the data field blank even if the element is shown on the website. If you encounter this problem, here is a related tutorial you might need:
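If you export the results to CSV, a quick check like the following can show how many rows have blank fields. This is a minimal sketch using Python's standard `csv` module. The column names and sample rows are assumptions for illustration and may differ from your actual export.

```python
import csv
import io


def count_blank_fields(csv_text, columns):
    """Count rows in which each of the given columns is empty or whitespace."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {column: 0 for column in columns}
    for row in reader:
        for column in columns:
            # A missing column and an empty string are both treated as blank.
            if not (row.get(column) or "").strip():
                counts[column] += 1
    return counts


# Made-up sample export; the header names are hypothetical.
sample = (
    "Name,Rating,Description,Hours\n"
    "Example Bistro,4.5,Cozy spot,9AM-9PM\n"
    "Sample Diner,4.0,,\n"
    "Demo Cafe,3.8,Coffee and pastries,\n"
)

print(count_blank_fields(sample, ["Description", "Hours"]))
# prints {'Description': 1, 'Hours': 2}
```

To check a real export, read the file contents into a string and pass it in place of `sample`.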
Happy data hunting!