Scrape data behind a login (Version 8)
When the target data is behind authentication, you can still access it with Octoparse. Simply enter the login information (username and password) and click the "sign-in" button to log in. In this tutorial, we will show you how to extract data behind a login, as well as how to use cookies to optimize the workflow of your task.
1) Enter login information to sign in
2) Use cookies to optimize the workflow
Enter login information to sign in
- Click on the textbox for username input on the web page
- Select "Enter text" from Action Tips
- Input the username and click "Confirm"; the username is then automatically filled into the username textbox on the web page
- Click on "Continue" and select "Click button" from the Tips panel
(Set an appropriate AJAX timeout as needed.)
- Follow the same steps to enter the password
- Click the "Sign In" button on the page and select "Click button" from the Tips panel
Octoparse has now logged into the website successfully!
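For readers curious about what these steps amount to outside of Octoparse, a form login is usually a POST request that carries the username and password. The sketch below only builds such a request and does not send it. The URL and the form field names (`username`, `password`) are assumptions made for illustration; real websites use their own names, and this is not how Octoparse itself is implemented.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_login_request(login_url, username, password):
    """Build (but do not send) a POST request carrying the login form.

    The field names "username" and "password" are hypothetical; inspect
    the real login form to find the names a given website expects.
    """
    form = urlencode({"username": username, "password": password})
    return Request(login_url, data=form.encode("utf-8"), method="POST")


# Hypothetical URL and credentials, for illustration only.
request = build_login_request("https://example.com/login", "alice", "s3cret")
print(request.get_method(), request.full_url)
print(request.data)
```

In Octoparse, the "Enter text" and "Click button" actions play the role of filling in and submitting this form.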
Tips! Clear cookies
Websites handle cookies differently, so to make the task run consistently you may want it to start with the login steps every time it is executed. To do this, clear any saved cookies before the login page is loaded. The target website will then "forget" you and take you to the login page, where you can enter the login information.
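To see why clearing cookies forces the login page to appear, here is a minimal sketch of the kind of decision a website might make on each visit. The function `page_for`, the cookie name `session_id`, and the session value are all hypothetical; actual websites differ in how they recognize returning visitors.

```python
def page_for(cookies, valid_sessions):
    """Hypothetical server logic: pick the page a visitor is shown."""
    session_id = cookies.get("session_id")
    return "dashboard" if session_id in valid_sessions else "login"


valid_sessions = {"abc123"}               # sessions the site still accepts
saved_cookies = {"session_id": "abc123"}  # cookies saved from a previous run

# With the saved cookie, the site recognizes the visitor and skips login.
with_cookie = page_for(saved_cookies, valid_sessions)

# After clearing, the site has nothing to recognize and shows the login page.
saved_cookies.clear()
after_clear = page_for(saved_cookies, valid_sessions)

print(with_cookie, after_clear)
```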
Use cookies to optimize the workflow
1. Save cookies
Most of the time, you can optimize the workflow by saving the cookies in the task after login. Octoparse will then send the saved cookies to the website when loading the page, and there's a good chance the website will remember "you" and skip the login steps.
- Switch to Browser mode by clicking the Browser mode button on the top right
- Log in to the website just as you would in a regular browser.
- After login, go to the settings of the "Go to web page" action and save the cookies.
- The website should now "remember" the login and skip the login steps the next time the crawler runs.
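The idea of saving cookies after login and sending them again on a later run can be sketched with Python's standard `http.cookiejar` module. This is an illustration of the concept only, not Octoparse's internal mechanism. The cookie name, value, and domain are made up, and the helper `make_cookie` builds by hand a cookie that would normally come from the website's login response.

```python
import os
import tempfile
from http.cookiejar import Cookie, MozillaCookieJar


def make_cookie(name, value, domain, expires=None):
    """Build a cookie by hand (hypothetical stand-in for a login response)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=expires, discard=expires is None,
        comment=None, comment_url=None, rest={},
    )


def save_cookies(cookies, path):
    """Write cookies to a file so that a later run can reuse them."""
    jar = MozillaCookieJar(path)
    for cookie in cookies:
        jar.set_cookie(cookie)
    # Keep session cookies too; they would otherwise be skipped on save.
    jar.save(ignore_discard=True, ignore_expires=True)


def load_cookies(path):
    """Read previously saved cookies back from the file."""
    jar = MozillaCookieJar(path)
    jar.load(ignore_discard=True, ignore_expires=True)
    return jar


cookie_file = os.path.join(tempfile.mkdtemp(), "cookies.txt")
save_cookies([make_cookie("session_id", "abc123", "example.com")], cookie_file)

# A later run loads the same cookies instead of logging in again.
restored = {c.name: c.value for c in load_cookies(cookie_file)}
print(restored)
```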
Tips!
1. A saved cookie only works until it expires
Cookies come in many different forms. Some have a specific expiration time; others expire as soon as the browser is closed. In Octoparse, a saved cookie stops working once it expires. To fix this, go through the login steps again in Browser mode to obtain and save an updated cookie.
2. Your password is well-protected
3. Entering a captcha manually during local extraction
When a captcha is encountered, you can enter it manually while running the task locally. Cloud Extraction does not support captchas.
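The first tip above distinguishes cookies with a fixed expiration time from cookies that only last for the browser session. As a rough illustration using Python's standard `http.cookiejar.Cookie` class, the sketch below sorts cookies into those categories. The cookie names and lifetimes are invented for the example, and `cookie_status` is a hypothetical helper, not part of Octoparse.

```python
import time
from http.cookiejar import Cookie


def make_cookie(name, expires):
    """Build a cookie by hand; `expires` is a Unix timestamp or None."""
    return Cookie(
        version=0, name=name, value="x",
        port=None, port_specified=False,
        domain="example.com", domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=expires, discard=expires is None,
        comment=None, comment_url=None, rest={},
    )


def cookie_status(cookie, now):
    """Classify a cookie as 'session', 'expired', or 'valid'."""
    if cookie.expires is None:
        return "session"  # no expiry set: dropped when the browser closes
    return "expired" if cookie.is_expired(now) else "valid"


now = time.time()
cookies = [
    make_cookie("remember_me", now + 3600),  # expires in one hour
    make_cookie("old_token", now - 3600),    # expired one hour ago
    make_cookie("session_id", None),         # session-only cookie
]
statuses = {c.name: cookie_status(c, now) for c in cookies}
print(statuses)
```

A cookie reported as "expired" here corresponds to the case where the login has to be repeated in Browser mode to save a fresh cookie.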
Tutorial in Spanish: Extract data after login
You can also read more web scraping tutorials on the official website
If you need any help with task configuration or data collection, submit a ticket to our support team! We'll get back to you within 24 hours.
Author: Kara
Editor: Yina