Scrape Data Via Google Searching
In this tutorial, we are going to show you how to scrape data from Google search results.
To follow along, you may want to use this URL in the tutorial:
We will scrape data such as the title, URL, and description from the search results page with Octoparse.
Here are the main steps in this tutorial:
- "Go To Web Page" - to open the targeted web page
- "Enter Text" - to enter one or more keywords to search for
- Create a pagination loop - to scrape multiple listing pages
- Extract data - to scrape all the items on each page
- Save and start extraction - to run the task and get data
1) "Go To Web Page" - to open the targeted web page
- Click "+ Task" to start a task using Advanced Mode
- Paste the URL into the "Extraction URL" box
- Click "Save URL" to move on
2) "Enter Text" - to enter one or more keywords to search for
- Click the search box on the web page
- Click "Enter text" on the "Action Tips"
- Enter the keyword(s) you want to search for
When you enter multiple keywords, Octoparse generates a loop that types each keyword into the search box, one keyword at a time. (See the multiple keywords tutorial for details.)
- Click "OK"
- Click the "Search" button
- Click "Click button" on the "Action Tips"
Tip: If the default built-in browser does not display the results page correctly, you can change the browser setting.
For more about entering texts/keywords, please refer to Text/keyword input.
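Octoparse builds this keyword loop for you, with no code. For readers who want to see the idea outside the tool, here is a minimal Python sketch, assuming that a Google search can be expressed as `https://www.google.com/search?q=<keyword>`. The function name `search_urls` is hypothetical. It only builds one search URL per keyword, following the same one-keyword-at-a-time pattern as the loop, and does not fetch anything.

```python
from urllib.parse import urlencode

# Assumption: Google reads the search term from the "q" query parameter.
SEARCH_BASE = "https://www.google.com/search"


def search_urls(keywords):
    """Yield one search URL per keyword, in order.

    Mirrors a keyword loop that types each keyword into the
    search box in turn. Hypothetical helper for illustration.
    """
    for keyword in keywords:
        # urlencode escapes spaces as "+" and other special characters.
        yield f"{SEARCH_BASE}?{urlencode({'q': keyword})}"


if __name__ == "__main__":
    for url in search_urls(["web scraping", "octoparse tutorial"]):
        print(url)
```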
3) Create a pagination loop - to scrape multiple listing pages
- Scroll down and click the "Next Page" button on the webpage
- Click "Loop click next page" on "Action Tips"
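The "Loop click next page" action keeps clicking "Next Page" until there is none left. Below is a minimal Python sketch of that loop, assuming a hypothetical `fetch` function that returns a page's items together with the URL behind its "Next Page" button (or `None` on the last page). An in-memory dictionary stands in for real pages, and the `max_pages` cap is a safety limit added for the sketch, not an Octoparse setting.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# A page is modeled as (items on the page, URL of the next page or None).
Page = Tuple[List[str], Optional[str]]


def paginate(start_url: str,
             fetch: Callable[[str], Page],
             max_pages: int = 10) -> Iterator[List[str]]:
    """Follow "Next Page" links until none is left or max_pages is reached,
    yielding the items found on each page."""
    url: Optional[str] = start_url
    visited = 0
    while url is not None and visited < max_pages:
        items, url = fetch(url)  # url becomes the next page, or None
        visited += 1
        yield items


# Hypothetical in-memory pages standing in for real results pages.
FAKE_PAGES = {
    "page1": (["result A", "result B"], "page2"),
    "page2": (["result C"], None),
}

pages = list(paginate("page1", FAKE_PAGES.__getitem__))
```

With a real site, `fetch` would load the page and locate the "Next Page" link; the loop itself stays the same.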
4) Extract data - to scrape all the items on each page
After the pagination loop is created, the built-in browser is on the second results page. Before moving on, go back to the first page:
- Click "Go To Web Page" in the workflow
- Click "Enter text" and "Click item" in sequence
By clicking through each step in the workflow, you can easily see how Octoparse is interacting with the website.
- Click "Pagination" in the workflow
- Click any two search result blocks, one after the other
- Click "Select all sub-elements"
- Click "Select all"
- Click "Extract data"
- Delete any unwanted data fields
- Rename the fields by selecting from the predefined list or typing your own names
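Selecting two results and then "Select all" tells Octoparse to repeat the same extraction on every similar result block. The Python sketch below illustrates that idea with the standard library's `html.parser`, assuming a simplified, hypothetical markup in which each result is a `<div class="result">` containing a link, an `<h3>` title, and a `<span class="desc">` description. Google's real markup differs and changes often, so the class names and the `ResultParser` class are illustrative only.

```python
from html.parser import HTMLParser


class ResultParser(HTMLParser):
    """Collect title, URL and description from each result block.

    Assumes hypothetical, non-nested markup such as:
    <div class="result"><a href="URL"><h3>TITLE</h3></a>
    <span class="desc">DESCRIPTION</span></div>
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # one dict per result block
        self._row = None    # result block currently being read
        self._field = None  # field currently receiving text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "result":
            self._row = {"title": "", "url": "", "description": ""}
        elif self._row is None:
            return  # ignore anything outside a result block
        elif tag == "a" and not self._row["url"]:
            self._row["url"] = attrs.get("href", "")
        elif tag == "h3":
            self._field = "title"
        elif tag == "span" and attrs.get("class") == "desc":
            self._field = "description"

    def handle_endtag(self, tag):
        if tag in ("h3", "span"):
            self._field = None
        elif tag == "div" and self._row is not None:
            self.rows.append(self._row)  # result block finished
            self._row = None

    def handle_data(self, data):
        if self._row is not None and self._field:
            self._row[self._field] += data.strip()


SAMPLE = (
    '<div class="result"><a href="https://example.com/a"><h3>First result</h3></a>'
    '<span class="desc">About the first page.</span></div>'
    '<div class="result"><a href="https://example.com/b"><h3>Second result</h3></a>'
    '<span class="desc">About the second page.</span></div>'
)

parser = ResultParser()
parser.feed(SAMPLE)
rows = parser.rows
```

Each dictionary in `rows` corresponds to one row of extracted data; deleting or renaming fields in Octoparse is roughly like dropping or renaming these keys.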
5) Save and start extraction - to run the task and get data
- Click "Save"
- Click "Start Extraction" on the upper left side
- Select "Local Extraction" to run the task on your computer, or select "Cloud Extraction" to run the task in the Cloud (for premium users only)
Here is the sample output.
From: https://www.octoparse.com/tutorial-7/scrape-data-via-google-searching