Occasionally, we need to extract data that sits outside an existing loop item.

Let's say we want to extract data from the Amazon Best Sellers page. For each product, we need to get its product details and the category it belongs to at the same time, as shown in the picture below:


If we create a loop just for the products, the "category" data will clearly sit outside the "product" loop. You might try to resolve the issue by creating another loop to get the category data. Try as you might, it will not end well, because Octoparse will complain that the two loops overlap. And if the new loop is completely independent of the existing loop, there is no way to link each product to its category across the two loops.

It seems that we are stuck in a dilemma. What can we do? The answer is actually quite simple:

Use the XPath for the product loop as an AXIS and write relative XPath to locate the category data


In case you are still confused, here is a step-by-step walkthrough using the sample website:

Create a loop for all the products


Check the XPath for the products in the HTML source code:


The XPath for the products will be: //li[@class="a-carousel-card"]

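Using this as the axis, we can reach the category data by appending a relative path to it. In the full XPath below, everything after //li[@class="a-carousel-card"] is the added part (highlighted in red):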

//li[@class="a-carousel-card"]/ancestor::div[@class="a-row a-carousel-controls a-carousel-row a-carousel-has-buttons"]/preceding-sibling::div[@class="a-row a-carousel-header-row a-size-large"]//h2

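Because the blue part, //li[@class="a-carousel-card"], is already set as the matching XPath for the loop item (the AXIS), the XPath for the product data field should be left blank. The XPath for the category data field should be the red part, that is, everything after the loop XPath: /ancestor::div[@class="a-row a-carousel-controls a-carousel-row a-carousel-has-buttons"]/preceding-sibling::div[@class="a-row a-carousel-header-row a-size-large"]//h2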
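
To see how the loop XPath and the relative XPath work together, here is a minimal sketch in Python that runs outside Octoparse. It assumes the third-party lxml library is installed. The SAMPLE markup is a hypothetical, heavily simplified stand-in for the real page, and the function name extract_rows is made up for illustration. The relative XPath is written without its leading slash because lxml evaluates it from the current loop item; Octoparse may handle this differently internally.

```python
from lxml import etree  # assumption: lxml is installed (pip install lxml)

# Hypothetical, simplified stand-in for the page: two category sections,
# each with a header row followed by a carousel row of product cards.
SAMPLE = """
<div id="page">
  <div class="section">
    <div class="a-row a-carousel-header-row a-size-large"><h2>Best Sellers in Books</h2></div>
    <div class="a-row a-carousel-controls a-carousel-row a-carousel-has-buttons">
      <ol>
        <li class="a-carousel-card">Book A</li>
        <li class="a-carousel-card">Book B</li>
      </ol>
    </div>
  </div>
  <div class="section">
    <div class="a-row a-carousel-header-row a-size-large"><h2>Best Sellers in Toys</h2></div>
    <div class="a-row a-carousel-controls a-carousel-row a-carousel-has-buttons">
      <ol>
        <li class="a-carousel-card">Toy A</li>
      </ol>
    </div>
  </div>
</div>
"""

# The loop XPath (the AXIS): one match per product.
LOOP_XPATH = '//li[@class="a-carousel-card"]'

# The relative XPath, evaluated from each product: climb to the carousel row,
# step back to the header row that precedes it, then take the heading inside.
CATEGORY_XPATH = (
    'ancestor::div[@class="a-row a-carousel-controls a-carousel-row a-carousel-has-buttons"]'
    '/preceding-sibling::div[@class="a-row a-carousel-header-row a-size-large"]'
    '//h2'
)


def extract_rows(markup):
    """Return (product, category) pairs, one per loop item."""
    root = etree.fromstring(markup)
    rows = []
    for item in root.xpath(LOOP_XPATH):
        product = item.text.strip()               # field with a blank relative XPath
        heading = item.xpath(CATEGORY_XPATH)[0]   # field located relative to the item
        rows.append((product, heading.text.strip()))
    return rows


rows = extract_rows(SAMPLE)
for product, category in rows:
    print(product, "|", category)
```

Because the category is resolved from each product rather than from a separate loop, every row keeps its own category even when the page contains several category sections.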


The sample data will look like this:
