When we scrape product information from e-commerce websites, we often need to extract data not only from the search results page but also from each product's detail page. In this tutorial, we will show you how to build a customized crawler that does both.

Let's say we need to search for "camera lens" on eBay. See the sample URL below:

https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2334524.m570.l1313&_nkw=camera+lens&_sacat=0&LH_TitleDesc=0&_odkw=camera+lens&_osacat=0

list_page_vs_detail_page.jpg

In this case, we want to extract the title of each camera lens from the listing page first and then go to its detail page for the item specifics. There are two ways to do this (a conceptual code sketch of this two-level crawl follows the list):

  1. Use the auto-detect web page function to create the workflow

  2. Manually create the workflow
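
For readers who want to see the idea outside of Octoparse, here is a minimal sketch of the same two-level crawl written with Python's requests and BeautifulSoup libraries. It is only a conceptual illustration: the a.s-item__link selector, the item limit, and the request headers are assumptions about eBay's current markup, not anything Octoparse generates.

```python
import requests
from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "Mozilla/5.0"}
SEARCH_URL = "https://www.ebay.com/sch/i.html?_nkw=camera+lens"

# Level 1: the search results (listing) page -- collect detail-page links.
listing = BeautifulSoup(
    requests.get(SEARCH_URL, headers=HEADERS, timeout=30).text, "html.parser"
)
links = [a.get("href") for a in listing.select("a.s-item__link") if a.get("href")][:5]  # assumed selector, first 5 items

# Level 2: each product's detail page -- read whatever elements we need.
for url in links:
    detail = BeautifulSoup(requests.get(url, headers=HEADERS, timeout=30).text, "html.parser")
    page_title = detail.title.get_text(strip=True) if detail.title else url
    print(page_title)
```

Octoparse builds the same listing-then-detail structure visually, which is what the two methods below walk through.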


1. Use auto-detect web page to create the workflow

The smart detection feature in Octoparse 8.X is more powerful than ever. We can use it to detect the data on the web page automatically and save some time (a simplified code sketch of what detecting repeated page elements involves follows these steps).

  • Click Auto-detect web page data in the Tips box and wait for it to complete

  • Switch between the auto-detect results to find your desired data fields (result 1 in this case)

switch.jpg
  • In the Data Preview section, modify the data fields by renaming them and removing the ones you don't want

data_preview.jpg
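
To give a rough intuition of what "detecting" a results page involves, the snippet below uses a naive heuristic: count how often each tag/class combination repeats on the page and treat the most repeated one as the result block. This is not Octoparse's actual detection algorithm, only a simplified illustration; the search URL is the same sample query used above.

```python
from collections import Counter

import requests
from bs4 import BeautifulSoup

SEARCH_URL = "https://www.ebay.com/sch/i.html?_nkw=camera+lens"  # sample search from above
html = requests.get(SEARCH_URL, headers={"User-Agent": "Mozilla/5.0"}, timeout=30).text
soup = BeautifulSoup(html, "html.parser")

# Count how often each (tag, class list) signature appears on the page.
signatures = Counter(
    (el.name, tuple(el.get("class", [])))
    for el in soup.find_all(True)
    if el.get("class")
)

# The most repeated signature is a rough guess for the "one result item" block.
(tag, classes), count = signatures.most_common(1)[0]
print(f"Most repeated block: <{tag} class='{' '.join(classes)}'> appears {count} times")
```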

When we search for popular product lines like the one in this example, chances are we need to navigate through multiple search result pages and extract data from each of them (a short pagination sketch follows the steps below).

  • Click on the Check button to see if Octoparse has successfully located a next page button

  • Uncheck Add a page scroll and click Create workflow

1.jpg
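
Conceptually, the pagination step keeps loading the next results page until no next-page link is found. Below is a hedged sketch of that loop; the .s-item__title and a.pagination__next selectors are assumptions about eBay's markup and may need re-inspecting, and the page cap only keeps the example short.

```python
import requests
from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "Mozilla/5.0"}
url = "https://www.ebay.com/sch/i.html?_nkw=camera+lens"
MAX_PAGES = 5  # safety cap so the sketch stays short

for _ in range(MAX_PAGES):
    soup = BeautifulSoup(requests.get(url, headers=HEADERS, timeout=30).text, "html.parser")
    titles = [t.get_text(strip=True) for t in soup.select(".s-item__title")]  # assumed selector
    print(f"{url} -> {len(titles)} titles")

    next_link = soup.select_one("a.pagination__next")  # assumed next-page selector
    if not next_link or not next_link.get("href"):
        break  # no next page located, stop paginating
    url = next_link["href"]
```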

Octoparse has now created a Loop Item in the workflow, which scrapes the search results page. Next, we will build the steps that open each product's detail page (a browser-automation sketch of the same idea follows the steps below).

  • Select Click on link(s) to scrape the linked page(s)

  • Choose a field with the URLs you want to click

2.jpg
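
The "Click on link(s) to scrape the linked page(s)" step can be pictured with the Selenium sketch below (not something Octoparse outputs). The a.s-item__link selector is an assumption; the sketch collects the detail-page URLs first and then navigates to each one, which avoids the stale-element errors that clicking and going back would cause.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.ebay.com/sch/i.html?_nkw=camera+lens")

# Collect the detail-page URLs from the results page first (assumed selector),
# then open them one by one instead of clicking back and forth.
links = driver.find_elements(By.CSS_SELECTOR, "a.s-item__link")
urls = [a.get_attribute("href") for a in links[:5]]

for url in urls:
    driver.get(url)                              # open the linked detail page
    print(driver.title, "->", driver.current_url)

driver.quit()
```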

Octoparse has now taken us to the detail page for further data extraction. We can capture the information we want from the page (the sketch after these steps shows the same extraction in code).

  • Click on any web element you want to extract

  • Click Extract the text of the element from the Tips panel

  • Modify the data field names in the Data Preview section

Extract_data.jpg
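
In code terms, "extract the text of the element" amounts to selecting an element and reading its text. The sketch below does that for a couple of fields on a single detail page; the item URL is hypothetical and both selectors are assumptions about eBay's markup.

```python
import requests
from bs4 import BeautifulSoup

DETAIL_URL = "https://www.ebay.com/itm/1234567890"  # hypothetical item URL
FIELDS = {
    "title": "h1.x-item-title__mainTitle",  # assumed selector
    "price": "div.x-price-primary",         # assumed selector
}

soup = BeautifulSoup(
    requests.get(DETAIL_URL, headers={"User-Agent": "Mozilla/5.0"}, timeout=30).text,
    "html.parser",
)

record = {}
for name, css in FIELDS.items():
    el = soup.select_one(css)
    record[name] = el.get_text(strip=True) if el else None  # "Extract the text of the element"

print(record)
```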

2. Manually create the workflow

In case the auto-detect function fails for some websites, we can also set up the workflow manually. See the steps below:

  • Select the first item on the list page

  • Click Select all on the Tips panel

  • Click Extract text of the selected elements

A Loop Item has now been added to the workflow, but only the product title has been scraped so far. We can add other fields (a code sketch of this loop follows below).

  • Select any information you want to scrape from the results page

  • Choose Extract text of the element

manually_create_the_workflow.gif
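
The manual Loop Item corresponds to iterating over every result block and pulling several fields from each. Here is a minimal sketch of that loop, assuming li.s-item result blocks and the title/price selectors shown, all of which may need re-inspecting on the live page.

```python
import requests
from bs4 import BeautifulSoup

def text_or_none(parent, css):
    """Return the stripped text of the first match, or None if nothing matches."""
    el = parent.select_one(css)
    return el.get_text(strip=True) if el else None

soup = BeautifulSoup(
    requests.get(
        "https://www.ebay.com/sch/i.html?_nkw=camera+lens",
        headers={"User-Agent": "Mozilla/5.0"},
        timeout=30,
    ).text,
    "html.parser",
)

# One dictionary per result block -- the code analogue of the Loop Item.
rows = [
    {
        "title": text_or_none(item, ".s-item__title"),
        "price": text_or_none(item, ".s-item__price"),
    }
    for item in soup.select("li.s-item")  # assumed result-block selector
]
print(len(rows), "items scraped from the results page")
```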

Then we need to build an action that clicks the product title link to open its detail page (a click-and-return sketch follows the steps below).

  • Select the first title on the list page

  • Click Click element

click_element.jpg
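
The "Click element" step behaves roughly like the Selenium sketch below: click an item's title link, read what is needed on the detail page, then go back for the next item. The selector is an assumption, the loop is capped at three items to keep the sketch short, and it assumes the links open in the same tab; re-finding the links after every back-navigation avoids stale element references.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.ebay.com/sch/i.html?_nkw=camera+lens")

for i in range(3):  # first three items only, to keep the sketch short
    # Re-find the links after each back-navigation; old references go stale.
    links = driver.find_elements(By.CSS_SELECTOR, "a.s-item__link")  # assumed selector
    if i >= len(links):
        break
    links[i].click()     # the "Click element" step (assumes same-tab navigation)
    print(driver.title)  # read whatever is needed on the detail page here
    driver.back()        # return to the results page for the next item

driver.quit()
```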

Once we are taken to the detail page, we can extract the information from the Item specifics section (a sketch that parses these label/value pairs follows the steps below).

  • Click on any web element you want to extract

  • Click Extract the text of the element from the Tips panel

  • Modify the data field names in the Data Preview section
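
The Item specifics section is a set of label/value pairs, so in code it maps naturally onto a dictionary. The sketch below pairs labels with values using two class names that are assumptions about eBay's current markup (and a hypothetical item URL); re-inspect the page before relying on them.

```python
import requests
from bs4 import BeautifulSoup

DETAIL_URL = "https://www.ebay.com/itm/1234567890"  # hypothetical item URL
soup = BeautifulSoup(
    requests.get(DETAIL_URL, headers={"User-Agent": "Mozilla/5.0"}, timeout=30).text,
    "html.parser",
)

# Pair each label with its value; both class names are assumptions about
# eBay's current "Item specifics" markup.
labels = [el.get_text(strip=True) for el in soup.select(".ux-labels-values__labels")]
values = [el.get_text(strip=True) for el in soup.select(".ux-labels-values__values")]
item_specifics = dict(zip(labels, values))

print(item_specifics)
```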
