In this tutorial, we are going to show you how to scrape list information from Bing.com.
For Bing, you can also use our easy-to-use "Task Template" on the main screen of the Octoparse scraping tool. All you need to do is enter a few parameters and the task is ready to go. For further details, please check it out here: Task Templates
To follow through, you may want to use this URL in the tutorial:
We will scrape data such as the title, URL, and description from the search results list with Octoparse.
Here are the main steps in this tutorial [Download demo task from here]:
- "Go To Web Page" - open the target web page
- Create a pagination loop - scrape multiple listing pages
- Extract data - scrape certain elements on each page
- Save and start extraction - run the task and get data
1. "Go To Web Page" - open the target web page
- Enter the example URL and click "Start"
2. Create a pagination loop - scrape multiple listing pages
- Scroll down and click the ">" button on the web page
- Click "Loop click single URL" on the Tips panel
After the "Pagination" loop is created, you can check that it moves to the next page correctly by manually clicking the "Pagination" and "Click to Paginate" actions in the workflow (as shown in the GIF).
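To make the idea behind the pagination loop concrete, here is a minimal Python sketch. It is only a conceptual illustration, not how Octoparse works internally: the `PAGES` dictionary, the `scrape_all` function, and the `max_pages` safety limit are all hypothetical names invented for this example. Each pass collects the items on the current page and then follows the "next" link until there is none left.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for a search site: each "page" holds its items and
# the key of the next page (None when there is no ">" button left).
PAGES: Dict[str, dict] = {
    "page1": {"items": ["result A", "result B"], "next": "page2"},
    "page2": {"items": ["result C", "result D"], "next": "page3"},
    "page3": {"items": ["result E"], "next": None},
}


def scrape_all(start: str, max_pages: int = 10) -> List[str]:
    """Collect items page by page, following "next" until it runs out."""
    collected: List[str] = []
    current: Optional[str] = start
    visited = 0
    # Stop when there is no next page or the safety limit is reached.
    while current is not None and visited < max_pages:
        page = PAGES[current]
        collected.extend(page["items"])  # corresponds to extracting data
        current = page["next"]           # corresponds to "Click to Paginate"
        visited += 1
    return collected


print(scrape_all("page1"))
```

The `max_pages` limit in this sketch is there so that the loop cannot run forever if a site keeps offering a next page.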
3. Extract data - scrape certain elements from each page
Let's start with the 1st non-Ad item on the search result list.
- Click on the 1st non-Ad item's title on the page
- Click "Select all" on the Tips panel
You will see other similar items being selected.
- Choose "Extract text of the selected elements" on the Tips panel
If all the items are highlighted in red, the loop has been created successfully. This step also generates a field for the title, which you can keep.
- Select an item from the Loop Item list, and you'll see the selected one highlighted in blue
- Click the title of the item
- Choose "Extract the URL of the selected link" on the Tips panel
- If you need the description, click the description text and then choose "Extract the text of the selected element"
- You can also add predefined data fields from the "+" icon. Here we chose "Current date & time" to record when the data was extracted
- If you want to rename a field, just click the icon next to the field name in the "Data Preview".
At this point, some ads are still included in the loop, and we don't need them. To exclude them, we need to modify the XPath of the loop.
- Click to modify the Loop item and change the XPath to //li[@class='b_algo']
- Click "OK" to save
Modifying the XPath manually in Octoparse gives you more flexibility and accuracy than relying on the auto-generated XPath.
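To see why the XPath //li[@class='b_algo'] leaves out the ads, here is a small Python sketch using only the standard library. The sample markup is a simplified, hypothetical stand-in for a results page (the `b_ad` class, the `h2`/`a`/`p` layout, and the example.com URLs are assumptions made for illustration, not Bing's actual HTML), and `extract_results` is a name invented for this example. Note that `xml.etree.ElementTree` supports only a subset of XPath and expects the relative form `.//li[...]` rather than `//li[...]`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified results list: one ad item and two organic items.
SAMPLE = """
<ol id="b_results">
  <li class="b_ad">
    <h2><a href="https://ads.example.com/">Sponsored result</a></h2>
    <p>Ad text.</p>
  </li>
  <li class="b_algo">
    <h2><a href="https://example.com/one">First result</a></h2>
    <p>Description of the first result.</p>
  </li>
  <li class="b_algo">
    <h2><a href="https://example.com/two">Second result</a></h2>
    <p>Description of the second result.</p>
  </li>
</ol>
"""


def extract_results(markup: str) -> list:
    """Return title, URL, and description for each li whose class is b_algo."""
    root = ET.fromstring(markup.strip())
    rows = []
    # Only items whose class attribute is exactly "b_algo" match,
    # so the ad item (class "b_ad") never enters the loop.
    for item in root.findall(".//li[@class='b_algo']"):
        link = item.find(".//h2/a")
        description = item.find(".//p")
        rows.append({
            "title": link.text,
            "url": link.get("href"),
            "description": description.text,
        })
    return rows


for row in extract_results(SAMPLE):
    print(row["title"], row["url"], row["description"], sep=" | ")
```

The predicate compares the whole class attribute, so an item carrying additional classes next to b_algo would not match in this sketch.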
4. Save and start extraction - run the task and get data
- Click "Save" to save the task first
- Then, click "Run" on the upper left side
- Select "Run task on your device" to run the task on your computer, or select "Run task in the Cloud" to run it in the Cloud (for premium users only)
Here is the sample output.
Happy data hunting!