If an error occurs after extraction starts, an "Extraction error report" is generated next to the "Data extracted" field. "Data cannot be extracted. HTTP status code: 200" is a common error message, and this tutorial shows you how to deal with it.
What does "HTTP status code: 200" mean?
HTTP status code 200 means OK, which is the standard response for a successful HTTP request. In other words, Octoparse loaded the web page successfully but failed to extract some of the data.
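To illustrate why a 200 status does not guarantee that the data is present, here is a minimal Python sketch. It is not Octoparse code: the function name `diagnose`, the marker string, and the sample HTML are all hypothetical and chosen only for illustration.

```python
def diagnose(status_code: int, html: str, marker: str) -> str:
    """Classify a fetch result along the lines described above."""
    if status_code != 200:
        return "page failed to load"
    if marker not in html:
        # The request succeeded, but the target content is not in the page.
        return "page loaded, data missing"
    return "data available"


# Hypothetical page whose price is filled in later by JavaScript.
initial_html = '<div id="price"></div>'
loaded_html = '<div id="price"><span class="amount">19.99</span></div>'

print(diagnose(200, initial_html, 'class="amount"'))
print(diagnose(200, loaded_html, 'class="amount"'))
```

Both calls use status 200, yet only the second page contains the element being looked for, which is the situation the error message describes.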
How to fix it?
First, check whether the web page actually contains the data you want to extract. If it does, work through the following situations.
1. The data had not finished loading when Octoparse tried to extract it, so the extraction failed.
In this case, you can set up "Wait before execution" or "wait until element is found" in the "Advanced Options" of the "Extract data" step so that the data has time to load before the step runs.
Sometimes Octoparse also needs to scroll down automatically before the data loads completely. Check the box for "Scroll down" in the "Advanced Options" of the "Go To Web Page" step and choose how Octoparse should scroll the page. For example, you can set "Scroll times" to 1, "Interval" to 1 second, and the scroll method to "Scroll down for one screen".
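The idea behind a "wait until element is found" option can be sketched as a simple polling loop. This is only an illustration of the general technique, not how Octoparse implements it; `wait_until_found`, `SlowPage`, and the timing values are hypothetical.

```python
import time
from typing import Callable, Optional


def wait_until_found(find: Callable[[], Optional[str]],
                     timeout: float = 5.0,
                     interval: float = 0.1) -> Optional[str]:
    """Call `find` repeatedly until it returns a value or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = find()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            return None  # gave up: the element never appeared
        time.sleep(interval)


class SlowPage:
    """Hypothetical page whose data appears only after a few checks."""

    def __init__(self, ready_after: int) -> None:
        self.checks = 0
        self.ready_after = ready_after

    def find_price(self) -> Optional[str]:
        self.checks += 1
        return "19.99" if self.checks >= self.ready_after else None


page = SlowPage(ready_after=3)
price = wait_until_found(page.find_price, timeout=2.0, interval=0.01)
print(price)
```

Extracting immediately would have returned nothing on the first check. Polling until the element exists, with a timeout as a safety net, avoids both an empty result and an unnecessarily long fixed wait.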
2. The data is found on some pages but not on others after pagination. This happens when the pages are formatted differently, so a single XPath cannot locate the data on all of them. Modify the XPath so that it locates the elements correctly on every page.
Tip: If you don't know how to modify an XPath, see this tutorial: Locate elements with XPath
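As a rough illustration of handling two page layouts, the sketch below tries several candidate paths in order using Python's standard `xml.etree.ElementTree`, which supports only a limited XPath subset. The sample markup, class names, and `extract_price` helper are hypothetical. In an engine with full XPath 1.0 support, the same alternatives could be combined into one expression with the union operator, for example `//span[@class='price'] | //div[@class='cost']/b`.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Two hypothetical page layouts that hold the same field in different markup.
PAGE_A = "<html><body><span class='price'>19.99</span></body></html>"
PAGE_B = "<html><body><div class='cost'><b>24.50</b></div></body></html>"

# Candidate paths, tried in order until one matches.
CANDIDATE_PATHS = [".//span[@class='price']", ".//div[@class='cost']/b"]


def extract_price(markup: str) -> Optional[str]:
    """Return the text of the first candidate path that matches, else None."""
    root = ET.fromstring(markup)
    for path in CANDIDATE_PATHS:
        node = root.find(path)
        if node is not None and node.text:
            return node.text.strip()
    return None


print(extract_price(PAGE_A))
print(extract_price(PAGE_B))
```

A path written only for the first layout would return nothing on the second page, which is the symptom described in situation 2.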
3. Octoparse fails to recognize the new web page. This mostly happens when Octoparse clicks links in a loop. After clicking a link and opening the new page, Octoparse is supposed to recognize that page and extract data from it, but instead it keeps treating the previous page as the current one. As a result, Octoparse cannot extract the target data even though the new page opened successfully.
You can fix this by repairing the workflow, as shown in the GIF below: