In version 7.2, Octoparse can auto-detect AJAX and set up "AJAX Load" and "AJAX Timeout" automatically. Previously, users had to recognize AJAX themselves and configure the "AJAX Load" step manually. Octoparse 7.2 now handles both.
What is AJAX?
- When there's AJAX involved: only part of the page content updates, and no reloading sign appears.
- When there's no AJAX involved: the whole page reloads.
How does AJAX Auto-detection work?
On Walmart.com, we know that AJAX is used for pagination because there is no reloading sign when the next page button is clicked. The webpage updates only a portion of its content (e.g. product information), and the rest of the page stays the same without refreshing.
Let's see how Octoparse detects AJAX and configures the corresponding "AJAX Load" step automatically. If you load the Walmart listing page (https://www.walmart.com/search/?cat_id=0&query=pens) in Octoparse, click the next page button, and select "Loop click next page" in "Action Tips", you'll find that Octoparse automatically sets up "AJAX Load" because AJAX is detected.
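The Walmart observation (content changes, but the page does not reload) can be written down as a small heuristic. The Python sketch below is only an illustration of that idea and is not Octoparse's actual detection logic, which this article does not describe. The `PageSnapshot` type, its fields, and the `looks_like_ajax` function are hypothetical names, and the snapshots are assumed to be captured just before and just after the click.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageSnapshot:
    """Hypothetical record of what the browser shows at one moment."""
    document_id: str  # assumed to change whenever a new document is loaded
    header: str       # content outside the product list
    products: tuple   # content inside the product list


def looks_like_ajax(before: PageSnapshot, after: PageSnapshot) -> bool:
    """Return True when content changed but no new document was loaded."""
    same_document = before.document_id == after.document_id
    content_changed = (before.header, before.products) != (after.header, after.products)
    return same_document and content_changed


# Clicking "next page" swapped the products but kept the same document.
page_1 = PageSnapshot("doc-1", "Search results", ("pen A", "pen B"))
page_2 = PageSnapshot("doc-1", "Search results", ("pen C", "pen D"))

# A full reload produces a new document, so it is not classified as AJAX.
reloaded = PageSnapshot("doc-2", "Search results", ("pen C", "pen D"))

print(looks_like_ajax(page_1, page_2))
print(looks_like_ajax(page_1, reloaded))
```

In this sketch the first call reports AJAX and the second does not, which mirrors the "no reloading sign" cue described for Walmart.com.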
"AJAX Timeout" is used together with "AJAX Load". The default setting is 1 or 3 seconds. That is, Octoparse waits 1 or 3 seconds after executing the current step before it executes the next step.
You can also set up "AJAX Timeout" manually. Depending on your network conditions, you may want to set it to 5 seconds or more, which gives the page more time to load.
When a page is loaded with AJAX, it is critical to set "AJAX Timeout" properly. Otherwise, the crawler may execute the next step before the new content has loaded and fail to work correctly.
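To make the role of the timeout concrete, here is a minimal Python sketch of a workflow runner that pauses after any step marked with an AJAX timeout. It is an assumption-laden illustration, not Octoparse's implementation: in Octoparse these values are set in the step's options, and `run_workflow`, `Step`, and the step descriptions below are hypothetical names.

```python
import time
from typing import Callable, Iterable, Optional, Tuple

# A step is an action plus an optional AJAX timeout in seconds (hypothetical shape).
Step = Tuple[Callable[[], None], Optional[float]]


def run_workflow(steps: Iterable[Step],
                 sleep: Callable[[float], None] = time.sleep) -> None:
    """Run each step, then wait for its AJAX timeout before the next step."""
    for action, ajax_timeout in steps:
        action()
        if ajax_timeout is not None:
            sleep(ajax_timeout)


log = []
waits = []
run_workflow(
    [
        (lambda: log.append("click next page"), 5.0),  # AJAX step, 5-second timeout
        (lambda: log.append("extract data"), None),    # no AJAX, so no wait
    ],
    sleep=waits.append,  # record the waits instead of actually sleeping
)
```

Passing `sleep` as a parameter keeps the example fast to run. With the default `time.sleep`, the runner would pause 5 seconds after the click, giving the page that long to load before extraction starts.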
Article in Spanish: Autodetección AJAX
You can also read web scraping articles on the official website