You are browsing a tutorial for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend upgrading, because the latest version is faster, easier to use, and more robust. Download and upgrade here if you haven't already done so!
Best Buy is a consumer electronics retailer with operations in the U.S., Mexico, and Canada. At its brick-and-mortar locations and online, Best Buy sells a wide variety of items ranging from mobile phones, video games, and music to home appliances like washing machines.
This tutorial shows you how to scrape product information, such as product title, price, rating, and delivery option, from Best Buy with Octoparse.
To follow along, here is the example URL:
The main steps are shown in the menu on the right, and you can download the sample task file here.
1. Create a Go to Web Page step - to open the target website
Enter the target URL on the homepage of Octoparse and click Start
2. Auto-detect the webpage - to create a workflow
Click Auto-detect web page data and wait for it to complete
Go to Data Preview to check that the detected data fields are the ones you need
Delete unnecessary data fields directly by clicking the delete icon next to the field name
Uncheck Add a page scroll
Click Create workflow
3. Set up a Page Scroll - to better load the data on the webpage
Click Go to Web Page > Option panel
Tick Scroll down the page after it is loaded
Set the Scroll Mode to For one screen
Click Apply to save the settings
Repeat the steps above for the Click to Paginate step
4. Clean Data - to get the product rating as text
Click on the rating of any product
Click Text on the Tips panel
Click the More button next to the data field name
Select Customize XPath
Modify the XPath of the Rating field as: //div[contains(@class,'c-ratings-reviews-small')]/p
Click Apply to save the settings
Click the More button next to the data field name
Select Clean Data
Click on Add Step > Match with Regular Expression
Enter the Regular Expression as: (?<=Rating )(.+?)(?=out of)
NOTE: For more information on how to write a Regular Expression with Octoparse, please check: Regular expression tool
Click Apply to save the data cleaning output
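To preview what the regular expression in this step extracts before running the task, here is a minimal sketch in Python. It is not part of Octoparse: Python's re module stands in for Octoparse's built-in regular expression tool, whose regex flavor may differ in details. The function name extract_rating and the sample strings are hypothetical, chosen only to resemble rating text of the form "Rating ... out of ...".

```python
import re

# Same pattern as in the Clean Data step: take whatever sits between
# "Rating " and "out of".
RATING_PATTERN = re.compile(r"(?<=Rating )(.+?)(?=out of)")


def extract_rating(text):
    """Return the rating found in text, or None if the pattern does not match.

    extract_rating is a hypothetical helper for illustration only.
    """
    match = RATING_PATTERN.search(text)
    if match is None:
        return None
    # The lazy group stops right before "out of", so it keeps the space
    # that precedes it; strip() removes that trailing space.
    return match.group(1).strip()


# Hypothetical sample strings, not real scraped data.
print(extract_rating("Rating 4.5 out of 5 stars with 120 reviews"))
print(extract_rating("Not yet reviewed"))
```

With these samples, the first call prints 4.5 and the second prints None. Note that the raw match keeps the space before "out of", so the value may need trimming if it ends up in the output.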
5. Run the task - to get your target data
Click Save and click Run on the upper right side
Select Run on your device to run the task on your computer, or select Run in the Cloud to run the task in the Cloud (for premium users only)
Here's the sample data output for your reference: