Lesson 3: Getting data - Capture text from a page

1) Create a new task
Once you're logged in, create a new task by clicking the "+ Task" button under Advanced Mode.
Tips!
1. What is a task? A task is a crawler that scrapes data from a website. Each task consists of a set of instructions for crawling data from one particular website. Unless websites share exactly the same page layout, you'll need to create a separate task for each one that tells Octoparse which scraping actions to perform on that particular webpage.
2. Why should I use Advanced Mode? Advanced Mode offers the flexibility needed to scrape many different kinds of websites. It lets you customize each individual action required for the extraction, including keyword searches, login authentication, opening dropdowns, etc.
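To make the idea of a task more concrete, here is a minimal Python sketch of a task as a named set of extraction instructions tied to one page layout. This is only an illustration and not Octoparse's internal format: the `Task` class, the `selectors` mapping, the selector strings and the second blog URL are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Task:
    """Hypothetical model of a task: one URL plus per-field instructions."""
    name: str
    url: str
    selectors: dict  # field name -> selector (made-up values, for illustration)

    def for_url(self, url):
        # Pages with exactly the same layout can reuse the same instructions.
        return replace(self, url=url)


blog_task = Task(
    name="Blog post",
    url="https://www.octoparse.com/blog/top-5-web-scraping-tools-comparison/",
    selectors={"title": "h1", "date": ".date", "content": ".content"},
)

# Hypothetical second post that is assumed to share the same layout.
other_task = blog_task.for_url("https://www.octoparse.com/blog/some-other-post/")
print(other_task.selectors == blog_task.selectors)
```

A page with a different layout would need its own `selectors`, which is why a separate task is created for each differently structured website.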
For this example, we'll use one of our blog posts to show you how to fetch data from a single webpage. Suppose our goal is to extract the post's title, posted date and content from the page.
Copy and paste the target URL (https://www.octoparse.com/blog/top-5-web-scraping-tools-comparison/) into the "Extraction URL" textbox, then click on "Save URL". Octoparse will load the designated webpage within the built-in browser.
Tips!
1. Toggle the "Workflow" button
2. The task name can be edited directly by typing over the auto-generated name. Don’t forget to save the changes.
2) Select data to capture
Now, let's start capturing the data by clicking directly on the various pieces of information.
Click on the title, the posted date and the content of the post. When the data is selected properly, the selections are highlighted in green.
Notice that the data you just selected is now shown in Action Tips. You can edit the field names now or leave that until later.
Select "Extract Data" to complete the text extraction action.
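If you are curious what capturing text from a page amounts to, the rough Python sketch below pulls a title, a date and a content field out of a small HTML snippet using only the standard library. It is not how Octoparse works internally. The sample markup, its class names and its values are invented placeholders, since the real blog page's structure is not shown in this lesson.

```python
from html.parser import HTMLParser

# Invented sample markup; the real page's tags and class names will differ.
SAMPLE_HTML = """
<html><body>
  <h1 class="title">Example Post Title</h1>
  <span class="date">2000-01-01</span>
  <div class="content"><p>First paragraph.</p> <p>Second paragraph.</p></div>
</body></html>
"""

# Hypothetical mapping: CSS class on the page -> field name in the result.
FIELD_CLASSES = {"title": "title", "date": "date", "content": "content"}


class FieldExtractor(HTMLParser):
    """Collect the text inside elements whose class names a wanted field."""

    def __init__(self, class_to_field):
        super().__init__()
        self.class_to_field = class_to_field
        self.fields = {name: "" for name in class_to_field.values()}
        self._current = None  # field currently being captured, if any
        self._depth = 0       # nesting depth inside the captured element

    def handle_starttag(self, tag, attrs):
        if self._current is not None:
            # Nested tag inside a captured element (void tags such as <br>
            # are not handled specially in this simple sketch).
            self._depth += 1
            return
        for cls in (dict(attrs).get("class") or "").split():
            if cls in self.class_to_field:
                self._current = self.class_to_field[cls]
                self._depth = 1
                break

    def handle_endtag(self, tag):
        if self._current is not None:
            self._depth -= 1
            if self._depth == 0:
                self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data


def extract_fields(html, class_to_field):
    parser = FieldExtractor(class_to_field)
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so each field is a single clean string.
    return {name: " ".join(text.split()) for name, text in parser.fields.items()}


row = extract_fields(SAMPLE_HTML, FIELD_CLASSES)
print(row)
```

Each key in `row` plays the role of a field name in Action Tips, and each value is the text captured for that field.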
3) Getting your data
Now you have finished creating your very first scraping task. Click "Save and run" from Action Tips or alternatively, click "Start Extraction" on the upper left-hand side to run the task.
Octoparse offers two ways to extract: local extraction and cloud extraction. Use local extraction for testing your task. Select "Local extraction" and your task will begin to run. Are you getting data like this? [Download the task file created in this lesson]
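When testing a task, it helps to check that every field actually came back with a value. The short Python sketch below shows one way to do that, assuming the extracted rows are available as a list of dictionaries. The `find_empty_fields` helper and the sample rows are hypothetical and are not part of Octoparse.

```python
def find_empty_fields(rows, expected_fields):
    """Return (row_index, field_name) pairs for missing or blank values."""
    problems = []
    for index, row in enumerate(rows):
        for name in expected_fields:
            value = row.get(name)
            if value is None or not str(value).strip():
                problems.append((index, name))
    return problems


# Made-up rows standing in for the output of a test run.
rows = [
    {"title": "Example Post Title", "date": "2000-01-01", "content": "Some body text."},
    {"title": "Another Title", "date": "", "content": "More text."},
]

print(find_empty_fields(rows, ["title", "date", "content"]))  # [(1, 'date')]
```

An empty result means every expected field was captured. Otherwise, the listed fields are the ones to reselect on the page.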
Lesson 4: Capture a list of items
From: https://www.octoparse.com/tutorial-7/capture-text-from-a-page