This tutorial explains what Octoparse Advanced Mode is, why you should use it, and how it helps you get the data you need.
Advanced Mode lets anyone scrape data from almost any website using simple point-and-click actions, with no code. If the webpages you want to scrape are a bit more complicated, or if you have yet to extract the data successfully using auto-detect, we strongly recommend giving Advanced Mode a try. With it, you can:
- Scrape information from nearly any web page
- Extract data such as text, URLs, images, and HTML
- Interact with webpages to perform complicated actions such as logging in, searching by keyword, and switching through a drop-down menu
- Fine-tune your workflow, for example by adding wait time, modifying XPath, and reformatting the extracted data
Start a task in Advanced Mode
There are two ways to quickly start a new task using Advanced Mode:
1) Head straight to the home page, enter the URL(s) of the target web page and hit "Start".
2) Right under the Octoparse logo, hover over "+ New" and click "Advanced Mode".
Get to know the Advanced Mode interface
The Built-in Browser: Once you've entered a target webpage URL, the webpage is loaded in Octoparse's built-in browser. You can browse the website in Browse mode, or click to extract the data you need in Select mode.
The Workflow: As you proceed to interact with the webpage, such as opening a web page and clicking on a page element/button, the entire process is defined automatically in the form of a workflow.
Tips panel: Octoparse uses smart tips to "talk" to you and guide you through the task-building process.
Data Preview: Preview the data you have selected. You can also rename data fields or remove the ones you don't need.
How to use Advanced Mode to build tasks manually
To build a task manually using Advanced Mode, skip the auto-detect process by clicking "Turn OFF auto-detect".
Then, simply click on the target data on the webpage. Follow the tips provided on the Tips panel to proceed with the task-building process. The general building steps are straightforward:
Select the data you need on the webpage >> Follow the instructions provided in Action Tips >> Check your workflow >> Run the task to get data
Webpages change all the time, and different people need different sets of data. Advanced Mode is designed to be flexible and versatile enough to handle all kinds of scraping needs while remaining friendly to non-coders, with step-by-step guidance provided in Action Tips.
1. Select the data you'd like to extract on the web page
Within the built-in browser, use simple clicks to select any data you'd like to extract from the webpage. As you hover over the web page, Octoparse tries to "understand" what you'd like to fetch and highlights the page elements around your cursor. If the highlighted area is close to, but not exactly, what you'd like to extract, move your cursor slightly.
Once the data you need is highlighted in blue, click to confirm the selection. The selected page element should now be highlighted in green, indicating that it has been selected successfully.
Repeat the same process if you'd like to extract multiple elements on the same page.
2. Follow the instructions provided in Action Tips
Octoparse guides you through the task-building process by offering the possible next steps in the Action Tips panel. It is a way for Octoparse to "talk" to you.
Every time you select an element, the Action Tips panel pops up with a number of options to choose from. Simply follow the instructions and choose how you'd like to proceed with the selected data. For example, if you'd like to scrape the text of the selected element, choose "Extract the text of the selected element"; if you'd like to click on the selected element to go to the linked page, choose "Click element".
Below are the most frequently used actions:
Extract the text of the selected element - capture the text of the selected page element
Click element - click the selected page element
Extract the HTML of the selected element - capture the source code string of the selected element
Loop click single element - click the selected element repeatedly (similar to Loop click next page or Loop click single URL)
Extract URL of the selected - capture the URL of the selected link (when a link is selected)
Extract URL of the selected image - capture the image URL (when an image is selected)
Select all - select all similar elements (when similar elements are detected)
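If you are curious what these actions mean in terms of a page's HTML, the sketch below may help. It is only an illustration written with Python's standard library, not how Octoparse itself works, and you never need to write code to use Advanced Mode. The page fragment, the "item" class, and the function names are all made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for illustration.
PAGE = """
<div>
  <a class="item" href="/product/1"><img src="/img/1.png" /> First item</a>
  <a class="item" href="/product/2"><img src="/img/2.png" /> Second item</a>
</div>
"""

root = ET.fromstring(PAGE.strip())

# Roughly what "Select all" means: one XPath that matches every similar element.
links = root.findall(".//a[@class='item']")


def extract_text(el):
    # "Extract the text": all text inside the element, whitespace trimmed.
    return "".join(el.itertext()).strip()


def extract_html(el):
    # "Extract the HTML": the element serialized back to a markup string.
    return ET.tostring(el, encoding="unicode").strip()


def extract_url(el):
    # "Extract URL" of a link: the value of its href attribute.
    return el.get("href")


def extract_image_url(el):
    # "Extract URL of the selected image": the src attribute of the img, if any.
    img = el.find("img")
    return img.get("src") if img is not None else None


# One row per matched element, one column per extraction action.
rows = [
    {
        "text": extract_text(a),
        "url": extract_url(a),
        "image": extract_image_url(a),
        "html": extract_html(a),
    }
    for a in links
]

for row in rows:
    print(row["text"], row["url"], row["image"])
```

Each selected element becomes one row, and each action you choose becomes one field in that row, which is similar to what you see in Data Preview.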
3. Check the workflow
As you build the scraping task, Octoparse simultaneously creates a workflow that reflects how you've interacted with the web page and with the Tips panel.
An example workflow:
A few things to check before running the task:
1) Whether the workflow actions are in the correct order.
You can rearrange the actions in the workflow by dragging and dropping them into the right spot.
2) Whether any action needs to be fine-tuned with more settings.
You can check whether an action has been set up correctly by hovering over it. To modify an action's settings, click the action and make changes as needed.
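To picture what "checking the order" means, you can think of a workflow as an ordered list of actions. The sketch below is a simplified mental model in Python, not Octoparse's actual task format. The `Action` class, the action kinds, and the two rules in `check_order` are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str                      # hypothetical kinds: "open_page", "click", "extract"
    settings: dict = field(default_factory=dict)


def check_order(workflow):
    """Return a list of problems found in the order of the actions."""
    problems = []
    if not workflow or workflow[0].kind != "open_page":
        problems.append("workflow should start by opening a page")
    if not any(a.kind == "extract" for a in workflow):
        problems.append("workflow never extracts any data")
    return problems


def move(workflow, src, dst):
    """Return a copy with the action at index src moved to index dst,
    similar to dragging and dropping an action to a new spot."""
    reordered = list(workflow)
    reordered.insert(dst, reordered.pop(src))
    return reordered


# A workflow whose actions are in the wrong order.
workflow = [
    Action("extract", {"field": "title"}),
    Action("open_page", {"url": "https://example.com"}),
]

print(check_order(workflow))   # reports that the page is not opened first
fixed = move(workflow, 1, 0)   # "drag" the open_page action to the top
print(check_order(fixed))      # no problems reported
```

The same idea applies in the real workflow panel: an action can only work with what the actions before it have already loaded or selected, so the order matters.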
Check this tutorial to learn more about how to test your workflow step-by-step:
4. Run the task
Now that you've finished building and testing your task, you can run it by clicking the Run button. You can run the task locally on your device or in the Cloud.
Here are some tutorials on how to deal with different kinds of page structures:
Find out more at Interactive with Webpages