Why do I get errors when extracting from a list page? The error says "One data field is missing. HTTP status code 200".
When scraping a list page, this error usually means the loop item has located some unwanted items that do not contain the data field you need.
Take this URL as an example:
After creating a loop item to scrape the job information on the web page, you will find that the loop contains some items that are not job items.
There should be only 25 job items on the page, but the loop item locates 33 items, which means 8 unwanted items have been located as well.
A normal job item looks like this:
But the loop also contains items from the Featured Jobs block, like this:
When Octoparse scrapes the Featured Jobs items, it cannot find the expected data fields, so it reports an error to warn you that something may be wrong with those items.
How can we resolve this error?
It is easy to resolve: modify the XPath so that only the desired job items are located in the loop.
In this case, if you inspect the HTML code of the job items, you will find that they are all in div tags whose id contains "jobsearchresult".
So you can change the loop's XPath to //div[contains(@id,'jobsearchresult')].
After saving the new XPath, you will see the number of items change to 25, which is the correct number.
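To see why the narrower XPath drops the unwanted items, here is a minimal Python sketch that emulates the contains(@id,'jobsearchresult') predicate on a tiny sample. The markup, the id values ("featured_1", "jobsearchresult_1") and the helper name select_job_items are assumptions made up for illustration, not the real page's HTML or anything Octoparse runs internally. Python's standard xml.etree.ElementTree does not support the XPath contains() function, so the filter is written as a plain substring check on each div's id.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: one featured block and two regular job items.
SAMPLE_HTML = """
<html><body>
  <div id="featured_1" class="item"><span>Featured jobs</span></div>
  <div id="jobsearchresult_1" class="item"><h2>Data Analyst</h2></div>
  <div id="jobsearchresult_2" class="item"><h2>Web Developer</h2></div>
</body></html>
"""


def select_job_items(root):
    """Emulate //div[contains(@id,'jobsearchresult')] with a substring check."""
    return [
        div
        for div in root.iter("div")
        if "jobsearchresult" in div.get("id", "")
    ]


root = ET.fromstring(SAMPLE_HTML)

# A broad selector matches every item, including the featured block.
all_items = root.findall(".//div[@class='item']")

# The narrowed selection keeps only divs whose id contains "jobsearchresult".
job_items = select_job_items(root)
job_titles = [div.findtext("h2") for div in job_items]

print(len(all_items), len(job_items), job_titles)
```

In this sample the broad selector matches three items while the narrowed one matches two, which mirrors the drop from 33 to 25 items described above. The featured block has no h2 title here, which is the kind of missing field that would trigger the error.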
If you want to learn more about XPath and how to generate it, here are some related tutorials you might need:
Locate elements with XPath
Video: Octoparse: XPath 101
Article in Spanish: ¿Por qué recibo errores al extraer de la página de lista?
You can also read web scraping articles on the official website.