Lesson 0: Octoparse Basics
Hi there! Welcome to the brand new Octoparse version 8.1! The new version brings major changes, so we have put together this learning series to help you use its improved capabilities to extract the data you need.
After going through all the intro lessons, you will know Octoparse 8.1 inside out and be able to scrape data from most webpages out there. Reading all the lessons takes around 30 to 60 minutes. Have fun!
Let's start by introducing the interface and covering the core features.
- The home screen
- The sidebar menu
- The workspace
- Using Task Templates
- Scraping data with Advanced Mode
- Cloud extraction
1. The interface
As soon as you log into Octoparse, you will find two main sections: the home screen and the sidebar.
1.1 The Home screen
At the center of the home screen is a search bar. Enter one or more target webpage URLs to start building a task, or enter a template name (such as Amazon or eBay) to search for a pre-built scraping template.
You can also access some of the most popular scraping templates and tutorials on the home screen.
1.2 The Sidebar Menu
The sidebar menu on the left contains everything you need to navigate within Octoparse.
+New button: create/import a new task or create new task groups.
Dashboard: The one place to manage all your scraping tasks. Edit, delete, rename and organize all the tasks in your account. You can also run, stop or schedule any task from here.
Quick Filters & Recent Tasks: Use these shortcuts to quickly access your tasks.
1.3 The Workspace
The Octoparse workspace is where you build your task. It has four main parts, each serving a particular purpose.
The Built-in Browser: Once you've entered a target webpage URL, the webpage is loaded in Octoparse's built-in browser. You can browse the website in Browse mode, or click to extract the data you need in Select mode.
The Workflow: As you proceed to interact with the webpage, such as opening a web page and clicking on a page element/button, the entire process is defined automatically in the form of a workflow.
Action Tips: Octoparse uses Smart Tips to "talk" to you and guide you through the task building process.
Data Preview: Preview the data you have selected. You can also rename the data fields or remove the ones you don't need.
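To make the idea of a workflow more concrete, here is a minimal Python sketch that models a task as an ordered, nested list of steps. This is only an illustration of the concept, not Octoparse's internal format: the `Step` class, the `outline` helper, the action labels, and the example URL are all hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Step:
    """One action in a workflow (hypothetical model, for illustration only)."""
    action: str                                   # illustrative label, e.g. "Go to Web Page"
    target: str = ""                              # a URL or a description of a page element
    children: list = field(default_factory=list)  # steps nested inside this one, e.g. inside a loop


def outline(steps, depth=0):
    """Return the workflow as indented text lines, one line per step."""
    lines = []
    for step in steps:
        label = f"{step.action}: {step.target}" if step.target else step.action
        lines.append("  " * depth + label)
        lines.extend(outline(step.children, depth + 1))
    return lines


# Assumed example: open a page, loop over its items, extract fields from each one.
workflow = [
    Step("Go to Web Page", "https://example.com/products"),
    Step("Loop Item", "product cards", children=[
        Step("Extract Data", "title, price"),
    ]),
]

print("\n".join(outline(workflow)))
```

The nesting mirrors what you see in the workflow panel: steps placed inside a loop run once for every item the loop covers.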
2. Core Features
2.1 Task Templates
Task Templates are pre-built tasks that let you get data by entering simple parameters such as URLs or keywords. There are currently over 60 templates for the most popular websites. There is nothing to build and no technical skills are required. Simply select the template you need, check the sample data to confirm it covers what you are after, and start extracting right away!
2.2 Scraping data with Advanced Mode
Unlike task templates, where everything is preset, the Octoparse Advanced Mode is a highly flexible and powerful scraping mode that lets you build a scraping task customized to your specific requirements. Advanced Mode is robust enough to scrape complicated web pages, such as pages that rely on JavaScript or AJAX and other dynamic websites.
Building your own scraping task with Advanced Mode need not be complicated or intimidating. With the new auto-detect algorithm, Octoparse automatically detects elements on a page and generates recommended task settings, such as extracting the list, going to the next page, or clicking detail page links.
On top of the auto-detected data, you can always manually edit the task settings or build a task from scratch by skipping the auto-detect step.
Once you are happy with the auto-detected data, simply save the settings and Octoparse will generate the task workflow automatically. You can add extra steps to the workflow or modify the actions manually if needed.
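For readers who like to see the logic spelled out, the following Python sketch shows the general pattern behind "extract the list, then go to the next page". It is a conceptual illustration under stated assumptions, not how Octoparse implements auto-detection: the `PAGES` data is a made-up in-memory stand-in for a website, and `scrape` is a hypothetical helper.

```python
# Hypothetical in-memory "website": each page holds a list of items and
# a pointer to the next page (None on the last page).
PAGES = {
    "/page/1": {
        "items": [
            {"title": "A", "detail": "/item/a"},
            {"title": "B", "detail": "/item/b"},
        ],
        "next": "/page/2",
    },
    "/page/2": {
        "items": [{"title": "C", "detail": "/item/c"}],
        "next": None,
    },
}


def scrape(start, pages, max_pages=100):
    """Collect one row per list item, following 'next' links page by page."""
    rows = []
    url = start
    visited = 0
    while url is not None and visited < max_pages:
        page = pages[url]
        # "Extract the list": one row for each item on the current page.
        for item in page["items"]:
            rows.append({"title": item["title"], "detail_url": item["detail"]})
        # "Go to the next page": stop when there is no next link.
        url = page["next"]
        visited += 1
    return rows


rows = scrape("/page/1", PAGES)
print(rows)
```

The `max_pages` cap is a design choice in this sketch: it keeps the loop from running forever if a site's pagination ever links back to an earlier page.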
2.3 Cloud extraction
Octoparse offers a powerful Cloud platform that lets premium users (Standard plan and above) run tasks 24/7. When you run a task with "Cloud extraction", it runs in the Cloud on multiple servers using our IPs. You can shut down the app or your computer while the task is running, and there is no need to worry about hardware limitations.
Extracted data is saved in the Cloud and can be accessed at any time. Advanced features such as automatic IP rotation, task scheduling, faster extraction, and the Octoparse API are all part of the Octoparse Cloud service.
Good job making it this far! You now have a grasp of the Octoparse basics.
To learn how to build your first scraping task, please continue to >> Lesson 1: Extract data with the all-new auto-detect algorithm