How to use relative XPath to get data outside a Loop Item? (V7.3)
Question: How to Use Relative XPath to Get Data Outside a Loop Item
If the title doesn't make sense to you, here is a use case you may be familiar with:
How do you extract each product's URL and the category it belongs to at the same time?
Here is a sample of the data you want.
Answer: It's time to talk about the powerful relative XPath!
Relative XPath and matching XPath have always been a tough nut for our users to crack, but they are the key to getting out of this dilemma.
Take this website as an example: http://www.boribori.co.kr/index.html
At first glance, you might try to solve this with two loop items.
The first loop would get each category value along with its corresponding URLs, and the second loop would extract the URL details.
Well, you can try, but it will not end well...
You will find that the URL value in the second loop does not change with the category, because Octoparse cannot detect two loops on the same page. So let's find a workaround.
Instead, create one loop item that gets all the URLs, then use the positional relationship between each category and its product URLs to write an XPath that gets the category's value.
Based on the XPath of Loop Item (URLs):
//ul[@class="list"]/li
You can write the matching XPath of the category:
//ul[@class="list"]/li/preceding::div[@class="itemBox"]/h5/img
The relative XPath is the part of the matching XPath that comes after the Loop Item XPath:
/preceding::div[@class="itemBox"]/h5/img
Make sure the Loop Item's XPath is the beginning of the category's matching XPath. That shared beginning is what links the relative XPath to the matching XPath.
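The rule above can be checked mechanically. Below is a minimal sketch in Python; `relative_xpath` is a hypothetical helper, not part of Octoparse. It assumes the Loop Item XPath must be a literal prefix of the matching XPath and returns the remainder, which is the relative XPath. If the prefix does not match, there is no relative XPath to derive.

```python
def relative_xpath(loop_xpath: str, matching_xpath: str) -> str:
    """Hypothetical helper: return the part of matching_xpath that follows
    loop_xpath, assuming the Loop Item XPath must be a literal prefix."""
    if not matching_xpath.startswith(loop_xpath):
        raise ValueError("Loop Item XPath is not part of the matching XPath")
    suffix = matching_xpath[len(loop_xpath):]
    # The relative part must continue the path with a new step, e.g. "/preceding::...".
    if not suffix.startswith("/"):
        raise ValueError("matching XPath does not add a new step after the Loop Item XPath")
    return suffix


loop = '//ul[@class="list"]/li'
matching = '//ul[@class="list"]/li/preceding::div[@class="itemBox"]/h5/img'
print(relative_xpath(loop, matching))
# prints /preceding::div[@class="itemBox"]/h5/img
```

Going the other way, the Loop Item XPath followed by the relative XPath gives back the matching XPath.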
The way to input the matching XPath and relative XPath is shown here.
However, the sample value is still blank, because by default the field extracts the text of the element.
The default setting works when the category is displayed as text. In this case, however, the category is stored in the image's "alt" attribute, so the field comes out blank.
To fix this, check the option "Extract specified attribute of the Item" and select the "alt" attribute. Now the value finally shows up.
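To see what the loop item and relative XPath do together, here is a minimal Python sketch using only the standard library's `xml.etree.ElementTree`. The page markup, the `nearest_preceding` and `extract` helpers, and the category names are hypothetical, not the example site's real HTML. ElementTree does not support the `preceding::` axis, so the sketch emulates it by hand. It assumes the category you want for each product is the nearest `div.itemBox` before it; in standard XPath 1.0 that would be `preceding::div[@class="itemBox"][1]`, since positional predicates on reverse axes count back from the context node. It also reads the `alt` attribute rather than the element's text, mirroring the option chosen above.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the real structure of the example site may differ.
PAGE = """
<html><body>
  <div class="itemBox"><h5><img src="outer.png" alt="Outerwear"/></h5></div>
  <ul class="list">
    <li><a href="/product/1">Coat</a></li>
    <li><a href="/product/2">Jacket</a></li>
  </ul>
  <div class="itemBox"><h5><img src="tops.png" alt="Tops"/></h5></div>
  <ul class="list">
    <li><a href="/product/3">T-shirt</a></li>
  </ul>
</body></html>
"""


def nearest_preceding(root, parents, node, tag, cls):
    """Emulate preceding::tag[@class=cls][1]: the closest element with that
    tag and class that comes before `node` in document order, excluding
    `node`'s ancestors (XPath's preceding axis never includes them)."""
    ancestors = set()
    p = parents.get(node)
    while p is not None:
        ancestors.add(p)
        p = parents.get(p)
    best = None
    for el in root.iter():  # root.iter() walks the tree in document order
        if el is node:
            break
        if el.tag == tag and el.get("class") == cls and el not in ancestors:
            best = el  # the last match seen before `node` is the nearest
    return best


def extract(page):
    root = ET.fromstring(page.strip())
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    rows = []
    # Loop Item: //ul[@class="list"]/li, in ElementTree's limited path syntax.
    for li in root.iterfind(".//ul[@class='list']/li"):
        url = li.find("a").get("href")
        box = nearest_preceding(root, parents, li, "div", "itemBox")
        img = box.find("h5/img")
        # img.text is None: the category only exists in the alt attribute,
        # which is why extracting the element's text gives a blank field.
        rows.append((url, img.get("alt")))
    return rows


for row in extract(PAGE):
    print(row)
```

With this sample markup, both products in the first list are paired with "Outerwear" and the product in the second list with "Tops", which is the URL-plus-category layout the article is after.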
Here is a sample of the data you can get!
If you have stuck with us this far, congratulations! You now know how to write a relative XPath to extract data outside a loop item.
Article in Spanish: How to use relative XPath to get data outside a Loop Item?
You can also read web scraping articles on the official website.
Author: Yanni
Editor: Yina