Google Advanced Search is a more precise way to find information on Google. It relies on Google search operators, also known as "advanced operators": specific characters and commands that go beyond a standard Google search.
An example is shown in the picture below:
Clicking the Advanced Search button at the bottom of the page leads to this search results page: https://www.google.com/search?as_q=web+designer&as_epq=&as_oq=&as_eq=&as_nlo=&as_nhi=&lr=lang_en&cr=&as_qdr=w&as_sitesearch=&as_occt=any&safe=images&as_filetype=&tbs=
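The Advanced Search form encodes each of its fields as a query parameter in this URL. As a minimal sketch (in Python, which the tutorial itself does not use), the parameters can be read back and the URL rebuilt with the standard library. The labels in `FIELD_LABELS` are informal readings of the Advanced Search form, not an official Google API, and `parse_advanced_search` and `build_advanced_search` are hypothetical helper names.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

# The Advanced Search results URL used in this tutorial.
URL = ("https://www.google.com/search?as_q=web+designer&as_epq=&as_oq=&as_eq="
       "&as_nlo=&as_nhi=&lr=lang_en&cr=&as_qdr=w&as_sitesearch=&as_occt=any"
       "&safe=images&as_filetype=&tbs=")

# Informal labels taken from the Advanced Search form (assumption, not an API contract).
FIELD_LABELS = {
    "as_q": "all these words",
    "as_epq": "this exact word or phrase",
    "as_oq": "any of these words",
    "as_eq": "none of these words",
    "as_nlo": "numbers ranging from",
    "as_nhi": "numbers ranging to",
    "lr": "language",
    "cr": "region",
    "as_qdr": "last update",
    "as_sitesearch": "site or domain",
    "as_occt": "terms appearing",
    "safe": "SafeSearch",
    "as_filetype": "file type",
    "tbs": "extra search tool filters",
}


def parse_advanced_search(url: str) -> dict[str, str]:
    """Return the query parameters of a search URL, keeping empty fields."""
    query = urlsplit(url).query
    # keep_blank_values=True preserves unused fields such as "as_epq="
    return {key: values[0] for key, values in parse_qs(query, keep_blank_values=True).items()}


def build_advanced_search(params: dict[str, str]) -> str:
    """Rebuild a search URL from a parameter dict (spaces are encoded as '+')."""
    return "https://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    params = parse_advanced_search(URL)
    # Print only the fields that were actually filled in on the form
    for key, value in params.items():
        if value:
            print(f"{key:<8} ({FIELD_LABELS.get(key, '?')}): {value}")
```

Editing a value such as `as_q` and calling `build_advanced_search(params)` produces a new URL that could be entered in step 1 instead of refilling the form.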
This tutorial will show you how to scrape data from Google Advanced Search results with Octoparse using the URL above.
To save time, you can also open "Task Templates" on the Octoparse main screen and start directly from the ready-to-use Google Advanced Search template. For more details on task templates, see the guide here.
Here are the main steps of this tutorial:
- Create a Go to Web Page - to open the target website
- Create a Pagination - to load more data from the following pages
- Create an Extract Data - to extract the search results
- Set Wait before Action - to make sure data is fully loaded
- Run the task - to get your desired data
1. Create a Go to Web Page - to open the target website
- Enter the target URL into the search bar on the home screen and click Start
2. Create a Pagination - to load more data from the following pages
- Click Next at the bottom of the webpage
- Click "Loop click next page"
- Set AJAX timeout: 7-10s recommended
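Each click on Next loads the following page of results. As an illustration only, the same sequence of pages can be written as URLs, assuming Google's commonly observed zero-based `start` offset with 10 results per page. Neither the offset nor the page size is stated in this tutorial, and Octoparse's "Loop click next page" clicks the button rather than building URLs. `page_urls` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE = "https://www.google.com/search"
RESULTS_PER_PAGE = 10  # assumption: Google's default number of results per page


def page_urls(params: dict[str, str], pages: int) -> list[str]:
    """Return one URL per results page by adding a 'start' offset.

    Assumes Google pages results with a zero-based 'start' parameter
    (0, 10, 20, ...); the first page simply omits it.
    """
    urls = []
    for page in range(pages):
        query = dict(params)  # copy so the caller's dict is not modified
        if page > 0:
            query["start"] = str(page * RESULTS_PER_PAGE)
        urls.append(f"{BASE}?{urlencode(query)}")
    return urls


if __name__ == "__main__":
    for url in page_urls({"as_q": "web designer", "lr": "lang_en", "as_qdr": "w"}, 3):
        print(url)
```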
To learn how Octoparse can resolve CAPTCHAs automatically during extraction, see: Resolve Captcha
3. Create an Extract Data - to extract the search results
- Click the Title and Content of the first item on the webpage
- Click Select All to create a loop of extraction
- Click Extract data
- Click on the More button next to the data field > Customize XPath
- Modify the XPath of the Data field as below:
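Conceptually, Select All turns the clicked item into a loop: one XPath matches every result, and the Title and Content fields are located relative to each matched item. The sketch below mimics that pattern with Python's `xml.etree.ElementTree`, which supports only a subset of XPath. The markup, `ITEM_XPATH`, and `FIELD_XPATHS` are hypothetical: they are neither Google's real HTML nor the XPath used in this task.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for a results page.
# Google's real HTML is different and changes often.
SAMPLE_PAGE = """
<html><body>
  <div class="result">
    <h3>Hire a Web Designer</h3>
    <span class="snippet">Find freelance web designers for your project.</span>
  </div>
  <div class="result">
    <h3>Web Designer Job Description</h3>
    <span class="snippet">Duties, skills and salary of a web designer.</span>
  </div>
</body></html>
"""

# Hypothetical XPaths: one selects every item (the loop),
# the others are evaluated relative to each item.
ITEM_XPATH = ".//div[@class='result']"
FIELD_XPATHS = {"Title": "./h3", "Content": "./span[@class='snippet']"}


def extract(page: str) -> list[dict[str, str]]:
    """Return one row per matched item, with one column per field."""
    root = ET.fromstring(page.strip())
    rows = []
    for item in root.findall(ITEM_XPATH):  # one iteration per search result
        row = {}
        for field, xpath in FIELD_XPATHS.items():
            node = item.find(xpath)  # evaluated relative to the current item
            row[field] = (node.text or "").strip() if node is not None else ""
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in extract(SAMPLE_PAGE):
        print(row)
```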
4. Set Wait before Action - to make sure data is fully loaded
Wait before action is a setting available on every action in the workflow. It makes the task pause for the specified time before the action is executed.
For this task, it is best to add a Wait before action to every step in the workflow so that each page has time to load before the next action runs.
- Click on each step respectively > Options
- Set Wait before action: 2-3s recommended
- Click Apply to save the change
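Conceptually, Wait before action inserts a pause ahead of each step. The minimal model below, with hypothetical `Step` and `run_workflow` names, is not how Octoparse implements the setting; it only shows the ordering it produces: wait, then act, for every step in the workflow. The `sleep` argument is injectable so the sequence can be checked without real delays.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], None]
    wait_before: float = 0.0  # seconds to pause before the action runs


def run_workflow(steps: list[Step], sleep: Callable[[float], None] = time.sleep) -> list[str]:
    """Run steps in order, pausing wait_before seconds ahead of each action.

    Returns a log of what happened, in order.
    """
    log = []
    for step in steps:
        if step.wait_before > 0:
            sleep(step.wait_before)
            log.append(f"waited {step.wait_before}s")
        step.action()
        log.append(f"ran {step.name}")
    return log
```

With the recommended 2-3s values, each step would be declared with `wait_before=2` or `wait_before=3`.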
5. Run the task - to get your desired data
- Click Save on the upper right to save your task
- Click Run next to it and wait for a Run Task window to pop up
- Select Run on your device to run the task on your local device
- Wait for the task to complete
Here is the sample output from a local run: