If you have ever scraped an eCommerce website, you have probably needed to scrape data from inside a tab. When a page has a lot of information to show, it often sorts that information into tabs, and the content of each tab is shown only when you click on it.
Let's take this webpage as an example:
On this webpage, to view the data inside the "Shipping and Returns" tab or the "Size Guide" tab, you need to click the corresponding tab.
Now, if we want to extract the data from the "Shipping and Returns" tab, how can it be done? There are two ways to get the data from inside a tab.
1. Scrape data from inside a tab by clicking on the tab first
The straightforward approach is to tell Octoparse to click on each tab and then scrape the content that the tab reveals.
- Click on the "Shipping and Returns" tab
- Select "Click element" on the Tips panel
- Set up AJAX. You can adjust the AJAX timeout based on your network speed
- Then, click on the data you need to capture and select "Extract the text of the element" on the Tips panel
Tips:
- If you want to learn more about AJAX, see the Octoparse tutorials on AJAX.
- For the Click action, please make sure the "Open in a new tab" option is not checked.
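To make the idea of an AJAX timeout more concrete, here is a minimal Python sketch of "click, then wait up to a time limit for the content to appear". It is only an illustration of the general idea, not how Octoparse works internally. The names `wait_for`, `FakePage`, `click_tab` and `tab_text` are hypothetical, and the page is simulated so the example runs on its own.

```python
import time


def wait_for(condition, timeout=5.0, interval=0.1):
    """Poll condition() until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value, or None if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


class FakePage:
    """Simulated page (hypothetical): tab content 'arrives' `delay` seconds after the click."""

    def __init__(self, delay):
        self.delay = delay
        self.clicked_at = None

    def click_tab(self):
        # Record when the tab was clicked; the content loads asynchronously.
        self.clicked_at = time.monotonic()

    def tab_text(self):
        # Empty until the simulated AJAX request has "finished".
        if self.clicked_at is None:
            return ""
        if time.monotonic() - self.clicked_at < self.delay:
            return ""
        return "Free shipping on orders over $50."


page = FakePage(delay=0.3)
page.click_tab()
text = wait_for(page.tab_text, timeout=2.0)
print(text if text else "Timed out before the tab content loaded")
```

If the timeout is shorter than the time the content needs to load, the wait gives up and nothing is captured, which is why a slower network calls for a longer timeout.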
2. Scrape data from inside a tab directly when the content is found in the source code
Even though the information is sorted into different tabs, the content inside each tab may already exist in the source code regardless of whether the respective tab is clicked or not. In this case, we can first have the tab content revealed under Browse mode and then proceed to scrape the target information directly. This way, there's no need to add any Click actions to the workflow.
- To check whether the tab content is provided in the source code, load the web page in your everyday browser and press "F12" on the keyboard to open the developer tools.
- Inspect the source code and see if the target content is there. For this example webpage, we can see that even though we have not clicked on the "Shipping and Returns" tab, we can still find the corresponding data in the source code. This tells us it is possible to scrape the tab content directly without having to click on the tab.
- Now, go back to Octoparse, toggle the button at the top right corner of the built-in browser to switch to the Browse mode
- Click the "Shipping and Returns" tab to reveal the content
- Toggle the Browse mode button again and switch back to the Workflow mode
- Click on the data to capture and select "Extract the text of the element" on the Tips panel
- There you have the tab content captured directly
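The source-code check above can also be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not part of the Octoparse workflow. The sample HTML, the `tab-shipping` id and the helper names are assumptions made up for the example; a real page will use its own markup.

```python
from html.parser import HTMLParser

# Hypothetical page source: the "Shipping and Returns" panel is hidden with CSS,
# but its text is still present in the HTML.
SAMPLE_HTML = """
<div class="tabs">
  <div id="tab-description" class="tab-panel">Soft cotton T-shirt.</div>
  <div id="tab-shipping" class="tab-panel" style="display:none">
    Free shipping on orders over $50.<br>Returns accepted within 30 days.
  </div>
</div>
"""

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PanelTextParser(HTMLParser):
    """Collect the text inside the element whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # greater than 0 while inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_panel_text(html, target_id):
    """Return the whitespace-normalized text of the panel, or "" if it is not in the source."""
    parser = PanelTextParser(target_id)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


text = extract_panel_text(SAMPLE_HTML, "tab-shipping")
print(text if text else "Not in the source code: click the tab first")
```

If the function returns an empty string for the real page source, the tab content is most likely loaded only after the click, and the first method is the one to use.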
Feel free to leave a message if you still have questions about capturing data from inside a tab. We will get back to you ASAP.