If the data you need is not readily accessible after you've chosen "Auto-detect web page data", click "Not the right webpage?" at the bottom of the Tips panel for more options:
- Login to website
- Close a pop-up
- Search with keyword(s)
- Switch tab
1. Login to website
Use this option if the data you need is behind a login. When you click "Login to website", Octoparse turns on "Browse Mode" automatically so you can enter your login credentials. Type in your username and password just as you would in your everyday browser. Once you've successfully logged in to the account, click "Done" on the Tips panel.
Cookies are then saved automatically to the task and used for future access. Please note that Octoparse does not keep or save your login credentials, and no login steps are generated or added to the workflow in this case.
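To illustrate the general idea of reusing a login through saved cookies rather than stored credentials, here is a minimal Python sketch. It is not how Octoparse stores cookies internally; the cookie names, values, and the file name `example_task_cookies.json` are all assumptions made up for the example.

```python
import json
import tempfile
from pathlib import Path


def save_cookies(cookies: dict[str, str], path: Path) -> None:
    """Persist session cookies (not the username or password) to disk."""
    path.write_text(json.dumps(cookies), encoding="utf-8")


def cookie_header(path: Path) -> str:
    """Rebuild the Cookie request header from the saved cookies."""
    cookies = json.loads(path.read_text(encoding="utf-8"))
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


# Hypothetical cookies a site might set after a successful login.
session_cookies = {"sessionid": "abc123", "theme": "dark"}

# Hypothetical storage location standing in for "saved to the task".
store = Path(tempfile.gettempdir()) / "example_task_cookies.json"
save_cookies(session_cookies, store)

# A later run sends the saved cookies instead of logging in again.
header = cookie_header(store)
print(header)
```

Only the session cookies are written to disk here, which mirrors the point above: future access relies on the saved cookies, and the username and password are never stored.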
Now that you've logged in to the account, you can move on to scrape the desired data by clicking manually or by running "Auto-detect web page data" once again.
This isn't the only way to get data behind a login in Octoparse. For other ways to handle login, see: Scrape data behind a login.
Related case tutorial: Scrape jobs from LinkedIn
2. Close a pop-up
When you load a website in Octoparse, you may run into a pop-up instead of the webpage you need. Pop-ups won't necessarily interfere with the extraction, but they do get in the way of setting up the task, so you may want to close them first. Follow the instructions below to close a pop-up window.
- Click on "Close a pop-up"
- Click the "Close" button on the pop-up window or any other element that does the same thing. In the example below, click the "ACCEPT" button to continue.
- Then, click "Confirm" to finish up.
- Octoparse will ask whether you'd like to adjust the AJAX timeout (see more in Deal with AJAX). Follow the instructions on the AJAX Setup panel if needed.
3. Search with keyword(s)
If you are scraping a directory website, chances are you will need to search with keyword(s) to access the information you need. Follow the instructions below to run a search before scraping the data.
- Click "Search with keyword(s)" on the Tips panel
- Click "Settings" to add a search box.
- Click on the search box on the webpage, then hit "Confirm" to save it.
- Click the icon to add search keyword(s)
- Enter one keyword per line and then hit "Confirm"
- Depending on whether there's a "search" button on the page, choose either "Hit the Enter/Return key when finish entering" or "Click the search button when finishing entering". For the latter, make sure you've clicked on "Setting" and selected the correct "Search" button.
- Lastly, hit "Confirm" to continue. The corresponding steps will be generated and added to the workflow automatically.
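Conceptually, the steps above turn a one-keyword-per-line list into a repeated "enter text, then submit" sequence. The Python sketch below models that idea only. The function names and the action labels (`enter_text`, `press_key`, `click`) are hypothetical and are not Octoparse step names or an Octoparse API.

```python
def parse_keywords(raw: str) -> list[str]:
    """Split one-keyword-per-line input, dropping blank lines and padding."""
    return [line.strip() for line in raw.splitlines() if line.strip()]


def build_search_steps(
    keywords: list[str], submit_with_enter: bool
) -> list[tuple[str, str]]:
    """Return (action, argument) pairs: one search per keyword.

    The action labels are hypothetical, chosen only for illustration.
    """
    # Mirror the two submit choices: press Enter, or click the search button.
    if submit_with_enter:
        submit = ("press_key", "Enter")
    else:
        submit = ("click", "search button")

    steps: list[tuple[str, str]] = []
    for keyword in keywords:
        steps.append(("enter_text", keyword))
        steps.append(submit)
    return steps


# Example input as it might be typed into the keyword box.
raw_input_text = "plumber\n\n electrician \ncarpenter\n"
keywords = parse_keywords(raw_input_text)
steps = build_search_steps(keywords, submit_with_enter=True)
for step in steps:
    print(step)
```

Passing `submit_with_enter=False` models the case where the page has a dedicated search button that must be clicked instead.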
Learn more about how to deal with text/keyword input:
4. Switch tab
To scrape data from inside a tab, follow the instructions below.
Taking the screenshot above as an example, here's how to get the data under the "SPECS" tab.
- Choose "Switch tab" from the Tips panel
- Follow the guide on the Tips panel to click the tab to show the data
- Then, hit "Confirm" to continue
- Now, you can click any data you want to capture under this tab.
Check the following guide for details if you want to customize it manually:
If you need any help with your task, feel free to reach out to us.