Scrape reviews from BestBuy (V7.3)
In this tutorial, we are going to show you how to scrape the listing details from BestBuy.
For BestBuy, you can also use the easy-to-use "Task Template" on the main screen of the Octoparse scraping tool. All you need to do is type in a few parameters and the task is ready to go. For further details, you may check it out here: Task Templates
To follow through, you may want to use this URL in the tutorial:
We will scrape data such as product title, model, and reviews from the product details page with Octoparse.
Here are the main steps in this tutorial: [Download task file here]
- "Go To Web Page" - to open the targeted web page
- Create a pagination loop - to scrape all the reviews from multiple pages
- Create a "Loop Item" - to scrape all the reviews on one page
- Extract data - to select the title and model to extract
- Extract data - to select the reviews to extract
- Run extraction - to run your task and get data
1. "Go To Web Page" - to open the targeted web page
- Click "+ Task" to start a new task with Advanced Mode
- Paste the URL into the "Input URL" box
- Click "Save URL" to move on
If you are visiting the website for the first time, you may encounter a page that asks you to choose a country.
- Click "build-in browser" and then choose your own country
- Click "build-in browser" again to exit the mode and continue to the next step
2. Create a pagination loop - to scrape all the reviews from multiple pages
- Scroll down to the bottom of the page
- Click the next page button ">"
- Click "Loop click next page" on the "Action Tips"
- Uncheck the box for "Retry when the page remains unchanged"
- Set "Wait before execution" to 5 seconds (optional, depending on your network speed)
- Check the box for "Load the page with AJAX" and set AJAX Timeout
Tips! AJAX timeout can often be used as webpage timeout for Click Action. For example, when you have a page that takes forever to finish loading, long after the data you need gets loaded, you can conveniently use AJAX timeout to tell Octoparse to move on to the next action when the set time is reached.
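The idea behind the AJAX timeout tip can be sketched in a few lines of Python. This is only an illustration of "stop waiting once the set time is reached", not how Octoparse works internally. The function name `wait_until` and the `page_finished_loading` callback are hypothetical.

```python
import time

def wait_until(page_finished_loading, timeout_s=5.0, poll_s=0.05):
    """Wait for a page to finish loading, but never longer than timeout_s.

    Returns True if the page reported that it finished loading, and False
    if the timeout was reached first. In both cases the caller simply
    continues with its next action.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if page_finished_loading():
            return True
        time.sleep(poll_s)
    return False

# A page that never finishes loading: the wait is cut off by the timeout.
finished = wait_until(lambda: False, timeout_s=0.2)
print("finished loading:", finished)
```

With a page that never reports "finished", the call returns after roughly the timeout instead of blocking forever, which is the behavior the tip relies on.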
3. Create a "Loop Item" - to scrape all the reviews on one page
- Click the first two titles of the list on the current page
- Click "Loop click each element" on the Action Tips panel
- Uncheck "Retry when the page remains unchanged"
- Check the box for "Load the page with AJAX" and set AJAX Timeout as 5s
- Click "Save" to move on
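Steps 2 and 3 together form a nested loop: the pagination loop is the outer loop, and the "Loop Item" is the inner loop that visits each title on the current page. The sketch below shows that structure on in-memory data. The `pages` layout (a `titles` list plus a `next` index, with `None` meaning there is no next page button) is an assumption made for illustration and is not an Octoparse data format.

```python
def walk_listing(pages):
    """Yield every title, page by page.

    pages is a list of dicts with a "titles" list and a "next" entry that
    holds the index of the following page, or None on the last page.
    """
    if not pages:
        return
    index = 0
    while index is not None:            # outer loop: "Loop click next page"
        page = pages[index]
        for title in page["titles"]:    # inner loop: "Loop click each element"
            yield title
        index = page["next"]            # None ends the pagination loop

# Placeholder listing with two pages.
pages = [
    {"titles": ["Product A", "Product B"], "next": 1},
    {"titles": ["Product C"], "next": None},
]
print(list(walk_listing(pages)))
```

Every action added in the later steps runs inside the inner loop, so it is repeated once per title on every page.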
4. Extract data - to select the title and model to extract
- Click on the data you need on the page
- Select "Extract text of the selected element" from the "Action Tips"
- Click "Ratings and Reviews"
- Choose "Click button"
5. Extract data - to select reviews for extraction
- Scroll down and click "See all customers reviews"
- Choose "Click element"
- Uncheck "Retry when the page remains unchanged"
- Check the box for "Load the page with AJAX" and set AJAX Timeout as 5s
- Click the first review in the built-in browser
- Click "Select all"
- Click "Extract text of the selected element" on the "Action Tips"
- Rename the fields by selecting a name from the pre-defined list or entering your own
6. Start extraction - to run the task and get data
- Click "Save"
- Click "Start Extraction" on the upper left side
- Select "Local Extraction" to run the task on your computer, or select "Cloud Extraction" to run the task in the Cloud (for premium users only)
Here is the sample for your information.
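After a run, the extracted rows can also be post-processed outside Octoparse. The following is a minimal sketch, assuming the data was exported as CSV and that the fields were renamed to `title`, `model`, and `review`. Both the export format and the column names are assumptions, and the rows are placeholders rather than real BestBuy data.

```python
import csv
import io
from collections import defaultdict

def group_reviews(csv_text):
    """Map each (title, model) pair to the list of its review texts."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        review = (row.get("review") or "").strip()
        if review:  # skip rows where no review text was captured
            grouped[(row["title"], row["model"])].append(review)
    return dict(grouped)

# Placeholder rows with the assumed column names.
sample = (
    "title,model,review\n"
    "Example Laptop,EX-100,Works well\n"
    "Example Laptop,EX-100,Battery could be better\n"
    "Example Phone,PH-2,\n"
)
print(group_reviews(sample))
```

Because the title and model repeat on every review row, grouping by those two fields collects all reviews of one product in a single list.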
Happy data hunting!