Flipkart is India's biggest e-commerce platform, with a large user base and a major share of the Indian market. Millions of people shop on the website every day, and it sells almost everything needed for daily life.
The data on the website makes it easy to compare similar products. In this tutorial, we will use Octoparse to scrape data such as the image URL, product title, price, and other details of T-shirts on Flipkart.
To follow along with the tutorial, please use the following URL for reference:
Here are the main steps in this tutorial:
- Enter the URL on the home page - to open the target website
- Start auto-detection - to create a workflow
- Modify XPath for pagination - to successfully go to the next page
- Run the task - to get your target data
1. Enter the URL on the home page - to open the target website
To start scraping, first enter the URL of the target website.
- Enter the Flipkart search URL into the search box at the center of the home screen. Click Start to create a new task in Advanced Mode.
2. Start auto-detection - to create a workflow
The auto-detect function in Octoparse can identify the structure of the page and automatically generate a collection process.
- Click Auto-detect web page data in the Tips box and wait for the detection to complete
- Check the data fields in Data Preview and delete unwanted data or rename them if needed
- Click Create workflow
The workflow is then generated as below:
3. Modify XPath for pagination - to successfully go to the next page
For pagination to work correctly, the XPath must accurately locate the Next button.
- Click Pagination
- Input the modified XPath in Matching XPath: //span[contains(text(),'Next')]/..
- Click Apply to save the modification
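To see what the expression //span[contains(text(),'Next')]/.. selects, here is a minimal Python sketch. It makes two assumptions: the HTML snippet is a made-up, simplified stand-in for the real pagination markup, and because the standard library's ElementTree does not support contains(), the helper find_next_link (a hypothetical name) emulates the expression by hand. The idea is to find the span whose text contains "Next" and then step up to its parent, which is the clickable link.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified pagination markup; the real Flipkart page differs.
SAMPLE = """
<nav>
  <a href="/search?page=1"><span>Previous</span></a>
  <a href="/search?page=3"><span>Next</span></a>
</nav>
"""


def find_next_link(root):
    """Emulate //span[contains(text(),'Next')]/.. on an ElementTree root.

    Returns the parent of the first <span> whose text contains 'Next',
    or None when no such span exists (for example, on the last page).
    """
    # ElementTree elements do not store their parent, so build a lookup.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for span in root.iter("span"):
        if span.text and "Next" in span.text:
            return parent_of.get(span)
    return None


root = ET.fromstring(SAMPLE)
next_link = find_next_link(root)
print(next_link.tag, next_link.get("href"))
```

Matching on the button's visible text and then moving to the parent is likely more stable than matching on a class name, since class names on such pages tend to change.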
4. Run the task - to get your target data
- Click the Save button first to save all the settings you have made
- Then click Run to run your task either locally or in the cloud
- Select Run on your device and click Run Now to run the task on your local device
- Wait for the task to complete
Below is sample data from the local run. The data can be exported in Excel, CSV, HTML, or JSON format.
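Once the data is exported, comparing similar products takes only a few lines of code. The sketch below is an illustration under stated assumptions: the rows are made up, the column names Title, Price, and Image_URL are hypothetical (the real ones depend on the field names chosen in Data Preview), and prices are assumed to be whole rupee amounts such as ₹1,299.

```python
import csv
import io

# Made-up rows standing in for an exported CSV file. Real column names
# depend on how the fields were named in Data Preview.
EXPORTED_CSV = """Title,Price,Image_URL
Plain Cotton T-Shirt,₹499,https://example.com/a.jpg
Printed T-Shirt,"₹1,299",https://example.com/b.jpg
Striped T-Shirt,₹349,https://example.com/c.jpg
"""


def parse_price(text):
    """Turn a price string such as '₹1,299' into an integer.

    Assumes whole rupee amounts; a value with decimals would need
    different handling.
    """
    digits = "".join(ch for ch in text if ch.isdigit())
    return int(digits) if digits else None


def load_products(csv_file):
    """Read rows from an open CSV file and return them sorted by price."""
    rows = []
    for row in csv.DictReader(csv_file):
        price = parse_price(row["Price"])
        if price is not None:  # skip rows without a usable price
            rows.append({"title": row["Title"], "price": price})
    return sorted(rows, key=lambda r: r["price"])


products = load_products(io.StringIO(EXPORTED_CSV))
for product in products:
    print(f'{product["price"]:>6}  {product["title"]}')
```

To run this against a real export, replace io.StringIO(EXPORTED_CSV) with a file opened via open(path, newline="", encoding="utf-8") and adjust the column names to match.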