Hi there! Welcome to the brand new Octoparse version 8.5! There are some major updates in this new version so we are putting together a new learning series to help you grasp the new capabilities and improvements in the software.
Working through all the intro lessons will give you a thorough understanding of Octoparse 8.5 and prepare you to scrape data from most web pages. Reading all the lessons normally takes around 30 to 60 minutes.
Let's start by introducing the software's interface and core features.
- The Interface: The home page/ The sidebar menu/ The workspace
- Core Features: Using Task Templates/ Scraping data with Custom Mode/ Cloud extraction
1. The Interface
As soon as you log into Octoparse, you will find two main sections: the home page and the sidebar.
1.1 The Home Page
The search bar at the top of the page serves two purposes: enter one or more target webpage URLs to start building a task, or enter a template name (such as Amazon or eBay) to search for a pre-built scraping template.
You can also access some of the most popular scraping templates and tutorials on the home page.
1.2 The Sidebar Menu
The sidebar menu on the left contains everything you need to navigate within Octoparse.
- + New: create/import a new task or create new task groups.
- Dashboard: where you can find all your scraping tasks. You can edit, delete, rename and organize all the tasks in your account. You can also run, stop or schedule any tasks conveniently.
- Support: where you can search for a tutorial or start a quick chat with the Octoparse support team for any assistance needed.
1.3 The Workspace
The Octoparse workspace is where you will build your tasks. It has 5 main parts, each serving its own purpose.
- The Built-in Browser: Once you have entered a target webpage URL, the webpage will be loaded in Octoparse's built-in browser. You can browse the website in Browse mode or click to extract the data you need in Select mode.
- Tips: Octoparse uses Smart Tips to "talk" to you and guide you through the task-building process.
- The Workflow: As you interact with the webpage, for example by opening a page or clicking on a page element/button, each step is recorded automatically as an action in a workflow.
- Settings: Select an action in the workflow to display its settings options.
- Data Preview: Shows a preview of the selected data. You can also rename the data fields or remove the ones that are not needed.
2. Core Features
2.1 Task Templates
Task Templates are pre-built tasks that let users get data by entering simple parameters like URL(s) or keywords. There are currently over 60 templates covering most mainstream websites. There is no need to build anything, and no technical expertise is required. Simply select the template you need, check the sample data to see if it returns what you are after, and extract data right away!
2.2 Scraping data with Custom Mode
Building your own scraping task with Custom Mode need not be complicated or intimidating. With the new auto-detect algorithm, Octoparse automatically detects elements on a page and generates recommended task settings, such as extracting the list and going to the next page.
On top of the auto-detected data, you can always manually edit the task settings or build a task from scratch by skipping the auto-detect step.
Once you are satisfied with the auto-detected data, simply save the settings and Octoparse will generate the task workflow automatically. You can add extra steps to the workflow or modify the actions manually if needed.
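To make the idea of a generated workflow more concrete, here is a minimal conceptual sketch in Python of the "extract the list, then go to the next page" pattern described above. It is not Octoparse's internal implementation or API. The `Page` class, the `SITE` dictionary, and the `run_workflow` function are all hypothetical names invented for illustration, and the "website" is an in-memory stand-in so the example runs without a network connection.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Page:
    """Hypothetical stand-in for one loaded web page."""
    items: List[str]            # the list elements found on the page
    next_page: Optional[str]    # key of the next page, or None on the last page


# Hypothetical two-page "website" used only for this illustration.
SITE: Dict[str, Page] = {
    "page1": Page(items=["item A", "item B"], next_page="page2"),
    "page2": Page(items=["item C"], next_page=None),
}


def run_workflow(start: str, site: Dict[str, Page]) -> List[str]:
    """Visit pages in order, collecting every list item along the way."""
    rows: List[str] = []
    visited = set()
    current: Optional[str] = start
    # Outer loop: pagination ("go to the next page").
    while current is not None and current not in visited:
        visited.add(current)  # guard against pagination cycles
        page = site[current]
        # Inner loop: extract each element of the list on this page.
        for item in page.items:
            rows.append(item)
        current = page.next_page
    return rows


rows = run_workflow("page1", SITE)
print(rows)
```

The nested loops mirror the two recommended settings mentioned above: an outer pagination step and an inner list-extraction step. Adding an extra step to the workflow would correspond to adding another operation inside one of these loops.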
2.3 Cloud Extraction
Octoparse offers a powerful Cloud platform for premium users (Standard & above) to run their tasks 24/7. When you run a task with "Cloud extraction", it runs in the Cloud on multiple servers using our IPs. You can shut down the app or your computer while the task is running, so there is no need to worry about hardware limitations.
Extracted data is saved in the Cloud and can be accessed at any time. Advanced features such as automatic IP rotation, task scheduling, extraction speed-up, and the Octoparse API are all part of the Octoparse Cloud service.
Continue to >> Lesson 1: Start with Auto-detect