Many websites use AJAX to create better, faster, and more interactive web pages. Octoparse can handle pages that use AJAX. This article shows you how to handle AJAX in Octoparse.
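1. What is AJAX?
AJAX (Asynchronous JavaScript and XML) is a technique that lets a web page exchange data with the server in the background and update part of the page without reloading the whole page.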
2. How do I know if a web page loads content using AJAX?
When content is loaded by clicking an element, it is fairly straightforward to tell whether AJAX is used. With AJAX, the page loads the additional content without reloading, so the browser's reload icon is a good indicator:
- If AJAX is involved, the page does not reload when the additional content loads, so the reload icon does NOT spin.
- If AJAX is not involved, the page reloads and the reload icon spins after you click to load more content.
3. How do I handle AJAX in Octoparse?
Octoparse treats a page reload as the signal to continue after a click. If the page reloads after an element is clicked, Octoparse executes the next action once the reload finishes. Pages that use AJAX do not reload, so Octoparse never receives that signal and can get stuck. To prevent this, set an AJAX timeout on the "Click Item" or "Click to Paginate" action: when the timeout is reached, Octoparse moves on to the next action. There are two ways to set the AJAX timeout in Octoparse.
Set up AJAX automatically
Octoparse sets up an AJAX timeout automatically when it detects AJAX on the page.
For example, Walmart's website uses AJAX to load the next page, so when we choose to click the next page button, Octoparse automatically sets an AJAX timeout for that action.
If you need a longer or shorter timeout, select a different value from the dropdown menu.
Set up AJAX manually
If you build a task manually, or if Octoparse fails to detect AJAX, you can set the AJAX timeout yourself by clicking the settings button of the "Click Item" or "Click to Paginate" action.
Make the AJAX timeout long enough for the page to load the information you need. If it is too short, Octoparse may move on before that content appears.
4. Consider using an AJAX timeout for web pages without AJAX
Even on pages that do not use AJAX, an AJAX timeout can shorten long waits. For example, if a page keeps loading long after the information you need has appeared, you can use an AJAX timeout to force Octoparse to move on to the next step instead of waiting for the page to finish loading.
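To make the waiting rule concrete, here is a small Python model of how long a click step waits before running the next action. It is a conceptual sketch under stated assumptions, not Octoparse's implementation: the function name `seconds_waited` and its parameters are hypothetical, and the model assumes that when a page both reloads and has an AJAX timeout set, the step moves on at whichever comes first.

```python
from typing import Optional


def seconds_waited(reload_finishes_at: Optional[float],
                   ajax_timeout: Optional[float]) -> Optional[float]:
    """Conceptual model of how long a click step waits (hypothetical helper).

    reload_finishes_at: seconds after the click when a page reload finishes,
        or None if the page never reloads (typical of AJAX pages).
    ajax_timeout: the AJAX timeout set on the click action, or None if unset.
    Returns the wait in seconds, or None if the step would wait forever.
    """
    if ajax_timeout is None:
        # No timeout: the only signal to continue is a finished reload.
        # On an AJAX page that signal never arrives, so the step gets stuck.
        return reload_finishes_at
    if reload_finishes_at is None:
        # AJAX page with a timeout: move on once the timeout is reached.
        return ajax_timeout
    # Assumption: with both signals available, move on at whichever is first,
    # so a timeout caps the wait on a slow-loading page.
    return min(reload_finishes_at, ajax_timeout)


print(seconds_waited(None, None))  # AJAX page, no timeout
print(seconds_waited(None, 5))     # AJAX page, 5-second timeout
print(seconds_waited(60, 10))      # slow page, 10-second timeout
```

In this model, an AJAX page without a timeout returns None (the "stuck" case), a 5-second timeout returns 5, and a page that only finishes reloading after 60 seconds is capped at 10 seconds by a 10-second timeout, which mirrors the use case in section 4.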
If you have any questions, you are welcome to submit a request here. Our support team will get back to you within 24 hours.
Spanish version of this article: Tratar AJAX
You can also read web scraping articles on the official website.