You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend upgrading, because the latest version is faster, easier to use, and more robust. Download and upgrade here if you haven't already done so!

Sam's Club, a division of Walmart Inc., is a membership warehouse club that offers high-quality products for customers' everyday needs. It has been in high demand around the world in recent years.

This tutorial shows how to scrape basic product information, such as name and price, from Sam's Club.

sam_search.jpg

To follow along with the tutorial, you can use the URL below:

https://www.samsclub.com/s/pillow?xid=hdr_search-typeahead_pillow
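
If you would like to try the same workflow with a different search term, the sample URL can be taken apart and rebuilt as in the sketch below. It assumes that the keyword is the path segment after `/s/`, which is inferred from the sample URL only and is not documented behavior of the site. `build_search_url` and `search_keyword` are hypothetical helper names used for illustration.

```python
from urllib.parse import parse_qs, quote, unquote, urlsplit

# Assumption: search pages live under this prefix, as in the sample URL.
BASE = "https://www.samsclub.com/s/"


def build_search_url(keyword: str) -> str:
    """Hypothetical helper: put the percent-encoded keyword after /s/."""
    return BASE + quote(keyword.strip(), safe="")


def search_keyword(url: str) -> str:
    """Return the last path segment of the URL, decoded."""
    path = urlsplit(url).path
    return unquote(path.rsplit("/", 1)[-1])


example = "https://www.samsclub.com/s/pillow?xid=hdr_search-typeahead_pillow"
print(search_keyword(example))               # keyword taken from the path
print(parse_qs(urlsplit(example).query))     # query parameters as a dict
print(build_search_url("memory foam pillow"))
```

Paste the resulting URL into Octoparse in step 1 in place of the sample URL.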

Here are the main steps in this tutorial: [Download task file here]

  1. Create a Go to Web Page - to open the target website

  2. Auto-detect the webpage - to create a workflow

  3. Modify the XPath of the data field(s) - to locate the data accurately

  4. Create a Pagination - to load and extract more data

  5. Run the task - to get your desired data


1. Create a Go to Web Page - to open the target website

  • Enter the target URL into the search bar on the home screen and click Start

sam_start.jpg

2. Auto-detect the webpage - to create a workflow

Octoparse's Auto-detection function can help you create a workflow quickly according to the design of the target website.

  • Click Auto-detect web page data in Tips and wait for the detection to complete

auto_detect.jpg

  • Check the data fields in Data preview and delete unwanted fields or rename them if needed

data_preview.jpg

  • Uncheck Add a page scroll and uncheck Click on a “Load More” button

  • Click Create workflow

create.jpg

3. Modify the XPath of the data field(s) - to locate the data accurately

The auto-generated XPath of some fields needs to be modified to make sure that Octoparse extracts accurate data.

In this case, the data in the Price field is incomplete, so we need to modify its XPath to get the right data.

  • Click More(...) next to the data field to change its settings

  • Choose Customize XPath

Xpath.jpg

  • Input the Matching XPath for Price as: //span[contains(text(),'current price')]

  • Click Apply to save the change

xpath_price.jpg
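
To see what this kind of expression selects, the sketch below imitates it on a small, made-up fragment. The markup and product names are hypothetical and do not reflect the real page structure. Python's standard library does not support `contains()` in XPath, so the filter is written by hand as a rough stand-in for `//span[contains(text(),'current price')]`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page is more complex.
SAMPLE = """
<div>
  <div class="product">
    <span>Example Pillow A</span>
    <span>current price: $19.98</span>
  </div>
  <div class="product">
    <span>Example Pillow B</span>
    <span>current price: $24.98</span>
  </div>
</div>
"""


def spans_containing(root, needle):
    """Return every <span> whose leading text contains `needle`.

    The comparison is case-sensitive, as it is with XPath contains().
    """
    return [span for span in root.iter("span") if needle in (span.text or "")]


root = ET.fromstring(SAMPLE)
prices = [span.text for span in spans_containing(root, "current price")]
print(prices)
```

Only the spans that carry the price label are kept; the product-name spans are skipped because their text does not contain the phrase.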

NOTE: You may find that the price data contains not only numbers but also irrelevant words such as CURRENT PRICE in this case. If you would like to remove them, check here to learn more about how Octoparse can help refine the data.

current_price.jpg
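
If you prefer to clean the prices after exporting the data rather than inside Octoparse, a short script can do it. The sketch below is one possible approach, not an Octoparse feature. It assumes the exported value looks like a label followed by a dollar amount; the exact format of your data may differ, and `extract_price` is a hypothetical helper name.

```python
import re
from decimal import Decimal
from typing import Optional

# First run of digits, with optional thousands separators and decimals.
PRICE_RE = re.compile(r"(\d[\d,]*(?:\.\d+)?)")


def extract_price(raw: str) -> Optional[Decimal]:
    """Return the first number found in `raw`, or None if there is none."""
    match = PRICE_RE.search(raw)
    if match is None:
        return None
    # Drop thousands separators before converting to a decimal number.
    return Decimal(match.group(1).replace(",", ""))


# Assumed sample values, for illustration only.
for value in ["CURRENT PRICE: $19.98", "current price $1,299.00", "CURRENT PRICE"]:
    print(repr(value), "->", extract_price(value))
```

`Decimal` is used instead of `float` so that prices keep their exact cents when summed or compared.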

4. Create a Pagination - to load and extract more data

  • Click on the next page button at the bottom of the webpage

  • Click Loop click single button on the Tips panel

Pagination.jpg

  • Set an appropriate AJAX timeout (7-10 seconds is recommended)

AJAX.jpg

NOTE: To learn more about how Octoparse handles AJAX websites, check here.

You should now see a workflow like the one below:

sam_workflow.jpg

5. Run the task - to get your desired data

  • Click Save on the upper right to save your task

  • Click Run next to it and wait for a Run Task window to pop up

  • Select Run on your device to run the task on your local device

  • Wait for the task to complete


Here is sample output from a local run:

sam_data.jpg