Use relative XPath to locate data outside a loop item
Although it is somewhat rare, we sometimes need to get data from outside an existing loop item.
Let's say we want to extract data from the Amazon Best Sellers page. For each product, we need to get its product details and the category it belongs to at the same time, as shown in the picture below:
If we create a loop just for the products, the "category" data will sit outside the "product" loop. You might try to resolve the issue by creating another loop to get the category data. Try as you might, it will not end well, because Octoparse will yell at you for overlapping two loops. And if the new loop is completely independent of the existing loop, we will have no way to link each category to the products that belong to it.
It seems that we are stuck in a dilemma. What can we do? The answer is actually quite simple:
Use the XPath for the product loop as an AXIS and write relative XPath to locate the category data
In case you are still confused, allow me to explain to you step by step with the sample website: https://www.amazon.com/gp/bestsellers/?ref_=nav_em_cs_bestsellers_0_1_1_2
Step 1 Create a loop for all the products
You can check the XPath for the products in the HTML source code:
The XPath for the products will be:
//li[@class="a-carousel-card"]
Using this as the axis, we can get the category data by appending the red part (everything after the loop XPath) as shown below:
//li[@class="a-carousel-card"]/ancestor::div[@class="a-row a-carousel-controls a-carousel-row a-carousel-has-buttons"]/preceding-sibling::div[@class="a-row a-carousel-header-row a-size-large"]//h2
Because the blue part, //li[@class="a-carousel-card"], is set as the matching XPath for the loop item (the AXIS), the XPath for the product data field should be left blank, while the XPath for the category data field should be the red part, that is, everything from /ancestor::div onward.
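If you would like to see the idea outside Octoparse, here is a minimal Python sketch of how a relative path is resolved from each loop item. It makes several assumptions: the HTML is a simplified, hypothetical stand-in for the Best Sellers markup (not the real page), and the names category_for and extract_rows are made up for illustration. Python's standard xml.etree.ElementTree does not support the ancestor:: or preceding-sibling:: axes, so the helper walks those two steps by hand and picks the nearest matching header.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical markup: two category sections, each with a header
# row followed by a carousel row that holds the product cards.
SAMPLE = """
<div id="page">
  <div class="section">
    <div class="a-row a-carousel-header-row a-size-large"><h2>Books</h2></div>
    <div class="a-row a-carousel-controls a-carousel-row a-carousel-has-buttons">
      <ol>
        <li class="a-carousel-card">Book A</li>
        <li class="a-carousel-card">Book B</li>
      </ol>
    </div>
  </div>
  <div class="section">
    <div class="a-row a-carousel-header-row a-size-large"><h2>Toys</h2></div>
    <div class="a-row a-carousel-controls a-carousel-row a-carousel-has-buttons">
      <ol>
        <li class="a-carousel-card">Toy A</li>
      </ol>
    </div>
  </div>
</div>
"""

HEADER_CLASS = "a-row a-carousel-header-row a-size-large"
CONTROLS_CLASS = "a-row a-carousel-controls a-carousel-row a-carousel-has-buttons"


def category_for(item, parent_of):
    """Mimic ancestor::div[...]/preceding-sibling::div[...]//h2 for one loop item."""
    # Step 1 (ancestor::div): climb from the item until the carousel row is found.
    node = parent_of.get(item)
    while node is not None and not (
        node.tag == "div" and node.get("class") == CONTROLS_CLASS
    ):
        node = parent_of.get(node)
    if node is None:
        return None
    container = parent_of.get(node)
    if container is None:
        return None
    # Step 2 (preceding-sibling::div): scan earlier siblings, nearest first.
    siblings = list(container)
    for sibling in reversed(siblings[: siblings.index(node)]):
        if sibling.tag == "div" and sibling.get("class") == HEADER_CLASS:
            # Step 3 (//h2): take the heading inside the header row.
            heading = sibling.find(".//h2")
            if heading is not None:
                return (heading.text or "").strip()
    return None


def extract_rows(html):
    """Loop over the product cards and resolve the category relative to each one."""
    root = ET.fromstring(html)
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    rows = []
    for item in root.findall(".//li[@class='a-carousel-card']"):
        rows.append(
            {
                "product": (item.text or "").strip(),
                "category": category_for(item, parent_of),
            }
        )
    return rows


if __name__ == "__main__":
    for row in extract_rows(SAMPLE):
        print(row)
```

Each product row carries the category of its own section because the lookup starts from the loop item itself rather than from the top of the page. This is the same reason the category XPath in Octoparse is written relative to the loop's matching XPath.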
The sample data will look like this:
If you have further issues with the task or any suggestions, we’d love to hear about them. Submit a request here.