When a webpage needs to present a lot of information at once, it often organizes the content into tabs, and the content of each tab is shown only when that tab is clicked.
Let's take this webpage as an example:
On this webpage, to view the data inside the Channel Overview tab and the Competition tab, you need to click each tab in turn.
Click the Channel Overview tab to reveal the data
Click the Competition tab and get the data
Now, if we want to extract the data from the Competition tab, how can it be done? There are two ways to get the data from inside a tab.
1. Clicking on the Tab to Scrape Data from Inside a Tab
The straightforward approach is to tell Octoparse to click each tab and scrape the content inside it.
Select Click Item
Untick Open in a new tab and set up AJAX time.
Then, click on the data you need to capture and select Text on the Tips panel.
2. Directly Scrape Data from a Tab When Its Content Is Available in the Source Code
Even though the information is sorted into different tabs, the content inside each tab may already exist in the source code, regardless of whether the respective tab is clicked or not.
In this case, we can first have the tab content revealed under Browse mode and then proceed to scrape the target information directly. This way, there's no need to add any Click actions to the workflow.
To check whether the tab content is provided in the source code, load the web page in your everyday browser and press F12 on the keyboard to open the developer tools.
Inspect the source code and see if the target content is there.
For this example webpage, we can find the corresponding data in the source code even though we have not clicked on the Competition tab. This tells us it is possible to scrape the tab content directly without having to click on the tab.
Click the Competition tab and locate the corresponding data in the source code.
Without clicking the Competition tab, we can still find the corresponding data in the source code.
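The same check can also be scripted outside Octoparse. The following is only a minimal sketch using Python's standard library: the sample HTML, the element ids, and the helper name source_contains are made up for illustration and are not taken from the example webpage. It searches every text node in the markup, whether or not the node would be visible on screen.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect every text node in the markup, visible on screen or not."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def source_contains(html, expected_text):
    """Return True if expected_text occurs in any text node of the raw HTML."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return any(expected_text in chunk for chunk in collector.chunks)


# Hypothetical page: the Competition panel is hidden until its tab is clicked.
SAMPLE_HTML = """
<div id="overview">Subscribers: 1,200</div>
<div id="competition" style="display:none">Top competitor: ExampleChannel</div>
"""

print(source_contains(SAMPLE_HTML, "Top competitor"))  # True
print(source_contains(SAMPLE_HTML, "Audience"))        # False
```

If the text you are looking for is found this way, the tab content is already in the page source. If it is not found, the content is probably loaded only after the tab is clicked, and the first method is the safer choice.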
Now, go back to Octoparse and toggle the button at the top right corner of the built-in browser to switch to Browse mode.
Click the Competition tab to reveal the content.
Toggle the Browse mode button again to switch back to Workflow mode.
Click on the data to capture and select Text on the Tips panel.
There you have the tab content captured directly.
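To see why no Click action is needed in this case, here is a rough sketch of reading a hidden tab panel straight from the HTML, written in plain Python rather than Octoparse. The page markup, the id "competition", and the names PanelText and panel_text are all hypothetical; a real page will use its own structure.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PanelText(HTMLParser):
    """Collect the text inside the element whose id matches panel_id."""

    def __init__(self, panel_id):
        super().__init__()
        self.panel_id = panel_id
        self.depth = 0  # 0 means the parser is outside the panel
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested element inside the panel
        elif dict(attrs).get("id") == self.panel_id:
            self.depth = 1   # entering the panel itself

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth and text:
            self.chunks.append(text)


def panel_text(html, panel_id):
    """Return the list of text fragments found inside the given panel."""
    parser = PanelText(panel_id)
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical page: the Competition panel exists in the source but is hidden.
PAGE = """
<ul><li>Channel Overview</li><li>Competition</li></ul>
<div id="overview"><p>Subscribers: 1,200</p></div>
<div id="competition" style="display:none">
  <p>Top competitor: ExampleChannel</p>
  <p>Shared audience: 35%</p>
</div>
"""

print(panel_text(PAGE, "competition"))
# ['Top competitor: ExampleChannel', 'Shared audience: 35%']
```

The panel is styled as hidden, yet its text is read without any click, because hiding a tab only changes how the page is displayed and not what the source contains.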