You are browsing a tutorial guide for Octoparse's latest version. If you are running an older version of Octoparse, we strongly recommend you upgrade because it is faster, easier, and more robust! Download and upgrade here if you haven't already done so!

Bing is one of the most popular search engines in the world. In this tutorial, we will show you how to scrape search result information from Bing.com.

For Bing, you can visit our easy-to-use "Task Template" on the main screen of the Octoparse scraping tool. All you need to do is type in several parameters, and the task is ready to go. For further details, please check it out here: Task Templates

To follow along, you may want to use this example URL:

http://www.bing.com/search?q=Web+scraping&form=QBLHVN&sp=-1&pq=web+scraping&sc=8-12&qs=n&sk=&cvid=0F966DDFA0C4442CA6957B085350A50D
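
If you want to reuse the example URL with a different keyword, it helps to see how the query string is put together. The sketch below uses only the Python standard library to read the `q` parameter and to build a new search URL. The helper names `search_keyword` and `build_search_url` are hypothetical, and the idea that `q` alone is enough (with the other parameters being session or tracking values) is an assumption that this tutorial does not confirm.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# The example URL from this tutorial.
EXAMPLE_URL = (
    "http://www.bing.com/search?q=Web+scraping&form=QBLHVN&sp=-1"
    "&pq=web+scraping&sc=8-12&qs=n&sk=&cvid=0F966DDFA0C4442CA6957B085350A50D"
)


def search_keyword(url):
    """Return the decoded value of the q parameter, or None if it is absent."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    values = query.get("q")
    return values[0] if values else None


def build_search_url(keyword):
    """Hypothetical helper: build a search URL that carries only q.

    Assumption: the remaining parameters in the example URL are optional.
    """
    return "http://www.bing.com/search?" + urlencode({"q": keyword})


print(search_keyword(EXAMPLE_URL))
print(build_search_url("Web scraping"))
```

A URL built this way can be pasted into the "Go to Web Page" step in place of the example URL.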

We will scrape data such as the title, URL, and description from the search results list with Octoparse.

Here are the main steps in this tutorial [Download demo task from here]:

  1. "Go to Web Page" - open the target web page

  2. Create a pagination loop - scrape multiple listing pages

  3. Extract data - scrape certain elements on each page

  4. Save and start extraction - run the task and get the data


1. "Go to Web Page" - open the target web page

  • Enter the example URL and click Start

2. Create a pagination loop - scrape multiple listing pages

  • Scroll down and click the ">" button on the web page

  • Click Loop click single URL on the Tips panel
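
Conceptually, a pagination loop repeats two actions: collect the items on the current page, then follow the "next" button until there is none. The sketch below illustrates only that control flow on in-memory data. It is not how Octoparse implements the loop internally, and the names `scrape_all_pages` and `PAGES` are hypothetical.

```python
def scrape_all_pages(pages, start, max_pages=100):
    """Walk a chain of pages and collect every item.

    pages maps a page id to a tuple (items, next_page_id); next_page_id is
    None on the last page. max_pages is a safety cap against endless loops.
    """
    rows = []
    current = start
    visited = 0
    while current is not None and visited < max_pages:
        items, next_page = pages[current]
        rows.extend(items)   # "extract data" on the current page
        current = next_page  # "click the > button"
        visited += 1
    return rows


# Hypothetical stand-in for three result pages.
PAGES = {
    "page1": (["result 1", "result 2"], "page2"),
    "page2": (["result 3", "result 4"], "page3"),
    "page3": (["result 5"], None),
}

print(scrape_all_pages(PAGES, "page1"))
```

The `max_pages` cap plays the same role as limiting how many pages a task is allowed to visit.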

3. Extract data - scrape certain elements on each page

Let's start with the first non-ad item in the search result list.

  • Click on the title of the first non-ad item on the page

  • Click Select all on the Tips panel

  • Choose Extract text of the selected elements on the Tips panel

  • Click on the title of the first item

  • Choose Extract the URL of the selected link from the Tips panel

  • If you need the description, click on the description text of the first item and then choose Extract the text of the element

  • You can also add predefined data fields from the "+" icon. In this example, we choose Current date & time to record when each row was extracted

  • Double-click on a field name to rename it if needed

At this point, some ads are still included in the loop. Since we do not want to scrape the ads, we need to modify the XPath of the loop so that it matches only the regular search results.

  • Click on the Loop Item and change the XPath to //li[@class='b_algo']

  • Click Apply to save
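
To see why this XPath drops the ads, you can try the same expression on a small sample outside Octoparse. The sketch below uses Python's standard `xml.etree.ElementTree` module, which supports a subset of XPath. The markup in `SAMPLE` is a simplified, hypothetical stand-in for a results list (including the `b_ad` class name), not the actual Bing page source.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified results list: one ad and two regular results.
SAMPLE = """
<ol id="b_results">
  <li class="b_ad"><h2><a href="http://ads.example/">Sponsored result</a></h2></li>
  <li class="b_algo"><h2><a href="http://example.com/a">Result A</a></h2><p>About A</p></li>
  <li class="b_algo"><h2><a href="http://example.com/b">Result B</a></h2><p>About B</p></li>
</ol>
"""

root = ET.fromstring(SAMPLE.strip())

# ElementTree needs a leading "." for descendant searches, so the
# tutorial's //li[@class='b_algo'] is written as .//li[@class='b_algo'].
all_items = root.findall(".//li")
organic_items = root.findall(".//li[@class='b_algo']")

print(len(all_items), len(organic_items))
```

Note that `@class='b_algo'` matches only when the class attribute is exactly `b_algo`, which is why an item with any other class value is left out of the loop.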

XPath for the data fields also needs to be modified.

  • Switch the Data Preview to Vertical View

  • Modify the XPath of the fields as below

Title: //h2

Title URL: //h2/a

Description: //p
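
The three field expressions can be read as "look inside the current loop item for this element". The sketch below applies them to one sample item with Python's standard `xml.etree.ElementTree` module. The `ITEM` markup and the `extract_fields` helper are hypothetical, and the leading `.` in each path is an ElementTree requirement rather than something you type into Octoparse.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for a single search result.
ITEM = """
<li class="b_algo">
  <h2><a href="http://example.com/a">Result <strong>A</strong></a></h2>
  <p>About A</p>
</li>
"""


def extract_fields(item):
    """Return the title, title URL, and description of one result item."""
    h2 = item.find(".//h2")      # Title: //h2
    link = item.find(".//h2/a")  # Title URL: //h2/a
    desc = item.find(".//p")     # Description: //p
    return {
        # itertext() also collects text inside nested tags such as <strong>.
        "Title": "".join(h2.itertext()).strip() if h2 is not None else None,
        "Title URL": link.get("href") if link is not None else None,
        "Description": "".join(desc.itertext()).strip() if desc is not None else None,
    }


row = extract_fields(ET.fromstring(ITEM.strip()))
print(row)
```

Returning `None` for a missing element mirrors what you would see as an empty cell when a result has no description.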

TIP: An XPath that you write or adjust by hand in Octoparse is often more flexible and more accurate than the auto-generated one.


4. Save and start extraction - run the task and get the data

  • Click Save to save the task first

  • Click Run on the upper left side

  • Select Run task on your device to run the task on your computer, or select Run task in the Cloud to run the task in the Cloud (for premium users only)

Here is the sample output.

10.png