You are browsing a tutorial guide for the latest Octoparse version. If you are running an older version of Octoparse, we strongly recommend you upgrade because it is faster, easier and more robust! Download and upgrade here if you haven't already done so!

Flipkart, India's biggest e-commerce platform, has a large user base and holds a major share of the Indian market. Millions of people shop on the website every day, and it sells almost everything one needs in daily life.

With data from the website, you can easily compare similar products. In this tutorial, we will use Octoparse to scrape data such as the image URL, product title, price, and other information for T-shirts on Flipkart.

INFO.jpg

To follow along with the tutorial, please use the following URL for reference:

https://www.flipkart.com/search?q=t+shirts&as=on&as-show=on&otracker=AS_Query_TrendingAutoSuggest_1_0_na_na_na&otracker1=AS_Query_TrendingAutoSuggest_1_0_na_na_na&as-pos=1&as-type=HISTORY&suggestionId=t+shirts&requestId=6b1b2bb2-7abd-458f-b79e-311ce7af47cd

Here are the main steps in this tutorial: [Download task file here]

  1. Enter the URL on the home page - to open the target website

  2. Start auto-detection - to create a workflow

  3. Modify XPath for pagination - to successfully go to the next page

  4. Run the task - to get your target data


1. Enter the URL on the home page - to open the target website

To begin scraping, first enter the URL of the target website.

  • Enter the Flipkart search URL into the search box at the center of the home screen. Click Start to create a new task in Advanced Mode.

2022-05-06_10-18-35.jpg

2. Start auto-detection - to create a workflow

The auto-detect function in Octoparse can identify the structure of the page and automatically generate a collection process.

  • Click Auto-detect web page data in the Tips box and wait for the detection to complete

detec.jpg
  • Check the data fields in Data Preview, then delete unwanted fields or rename them if needed

DATA_PREVIEW.jpg
  • Click Create workflow

CREATE_WORKFLOW.jpg

The workflow is then generated as below:

WORKFLOW.jpg

3. Modify XPath for pagination - to successfully go to the next page

For pagination to work correctly, an accurate XPath is essential.

  • Click Pagination

  • Input the modified XPath in Matching XPath: //span[contains(text(),'Next')]/..

  • Click Apply to save the modification
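
To see what this XPath matches, the sketch below emulates it in Python on a small, hypothetical HTML fragment (not Flipkart's actual markup, which may differ and change over time). Python's standard-library ElementTree does not support contains(), so the helper find_next_links (a name invented for this illustration) walks the tree by hand: it finds each span whose text contains "Next", then steps up to its parent, which is what the trailing /.. does.

```python
import xml.etree.ElementTree as ET

# Hypothetical pagination markup, simplified for illustration;
# the real Flipkart page structure may differ.
SAMPLE = """
<nav>
  <a href="/search?q=t+shirts&amp;page=1"><span>Previous</span></a>
  <a href="/search?q=t+shirts&amp;page=2">2</a>
  <a href="/search?q=t+shirts&amp;page=3"><span>Next</span></a>
</nav>
"""


def find_next_links(root):
    """Roughly emulate //span[contains(text(),'Next')]/.. by hand."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    matches = []
    for span in root.iter("span"):
        # contains(text(),'Next'): the span's own text includes "Next".
        if span.text and "Next" in span.text:
            parent = parent_of.get(span)  # "/.." steps up to the parent
            if parent is not None:
                matches.append(parent)
    return matches


root = ET.fromstring(SAMPLE)
next_links = find_next_links(root)
print([link.get("href") for link in next_links])
```

The XPath selects the parent rather than the span itself, presumably because the clickable element is the link that wraps the "Next" label.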


4. Run the task - to get your target data

  • Click the Save button first to save all the settings you have made

  • Then click Run to run your task either locally or in the cloud

mceclip8.png
  • Select Run on your device and click Run Now to run the task on your local device

  • Wait for the task to complete

mceclip9.png

Below is sample data from the local run. The data can be exported in Excel, CSV, HTML, and JSON formats.

data.jpg
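
Once exported, the data can be used for the product comparison mentioned at the start. The following is a minimal sketch in Python, assuming a CSV export with columns named Title, Price, and Image_URL. These column names, the sample rows, and the helpers parse_price and cheapest_first are all hypothetical; the actual names depend on how the fields were named in Data Preview.

```python
import csv
import io

# Hypothetical export content for illustration only; real column names
# and values depend on the fields configured in the task.
SAMPLE_CSV = """Title,Price,Image_URL
Printed Men Round Neck T-Shirt,₹399,https://example.com/a.jpg
Solid Men Polo T-Shirt,"₹1,249",https://example.com/b.jpg
Striped Women T-Shirt,₹279,https://example.com/c.jpg
"""


def parse_price(text):
    """Turn a price string such as '₹1,249' into an integer.

    Assumes whole-rupee prices with no decimal part.
    """
    digits = "".join(ch for ch in text if ch.isdigit())
    return int(digits) if digits else None


def cheapest_first(csv_text):
    """Return the rows sorted by price, lowest first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["price_value"] = parse_price(row["Price"])
    # Rows whose price could not be parsed are left out of the comparison.
    priced = [row for row in rows if row["price_value"] is not None]
    return sorted(priced, key=lambda row: row["price_value"])


result = cheapest_first(SAMPLE_CSV)
for row in result:
    print(row["price_value"], row["Title"])
```

To use it on a real export, read the file's text (for example with open(path, encoding="utf-8")) and pass it to cheapest_first, adjusting the column names to match your own fields.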

TIP: Local runs are great for quick runs and small amounts of data. If you are dealing with more complicated tasks or large volumes of data, Run in the Cloud is recommended for higher speed. You are very welcome to try the premium feature by signing up for the 14-day free trial here. Tasks can be scheduled hourly, daily, or weekly, with data delivered regularly.
