Content on web pages is usually organized in recognizable patterns, and one of the most common is the list. Here are a few examples of content laid out as a list.
The example web page consists of items that share the same structure: each item contains a title, a date, a keyword, an article, and so on.
Our goal is to extract this data into Excel, with one row per item and one column per field (title, date, keyword, and so on).
Now, let's explore different ways to get this done in Octoparse.
To follow along, use this example URL: https://www.octoparse.com/blog
1. Extract list with Auto-detect
Once you've created a new task with the example URL, select "Auto-detect web page data". Octoparse will detect the data on the page. Then click "Create workflow" to generate the workflow.
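The snippet below is not Octoparse's Auto-detect algorithm, which this tutorial does not describe. It is a minimal sketch of one common heuristic for spotting a list. The heuristic assumes that the parent element with the most children sharing the same tag and class is the list, and that each of those children is one item. The sample HTML and the names `SiblingCounter` and `detect_list` are hypothetical, and the sketch assumes well-formed markup.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical sample markup; the real page's HTML will differ.
SAMPLE_HTML = """
<div class="header"><a class="logo" href="/">Home</a></div>
<ul class="posts">
  <li class="post"><h2>First post</h2><span class="date">2021-01-05</span></li>
  <li class="post"><h2>Second post</h2><span class="date">2021-02-10</span></li>
  <li class="post"><h2>Third post</h2><span class="date">2021-03-15</span></li>
</ul>
"""

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


class SiblingCounter(HTMLParser):
    """For every parent element, count children that share a (tag, class) signature."""

    def __init__(self):
        super().__init__()
        self.next_id = 0
        self.stack = [(0, ("root", None))]  # open elements as (unique id, signature)
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        signature = (tag, dict(attrs).get("class"))
        parent_id, parent_sig = self.stack[-1]
        self.counts[(parent_id, parent_sig, signature)] += 1
        if tag not in VOID_TAGS:
            self.next_id += 1
            self.stack.append((self.next_id, signature))

    def handle_endtag(self, tag):
        # Assumes well-formed markup: each end tag closes the most recent open element.
        if tag not in VOID_TAGS and len(self.stack) > 1:
            self.stack.pop()


def detect_list(html):
    """Return (parent signature, item signature, item count) of the largest sibling group."""
    parser = SiblingCounter()
    parser.feed(html)
    parser.close()
    (_, parent_sig, item_sig), count = parser.counts.most_common(1)[0]
    return parent_sig, item_sig, count


print(detect_list(SAMPLE_HTML))
# (('ul', 'posts'), ('li', 'post'), 3)
```

Real pages contain navigation menus and footers that also repeat, so a practical detector would need more signals than a raw count. This sketch only shows the basic idea.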
2. Extract list manually
If for some reason the Auto-detect fails to detect the list or if you are building a task without Auto-detect, you can always extract the list manually.
1) Method 1:
- Load the web page in Octoparse, hover your cursor over the first item until the entire section is highlighted in blue, then click it
- Click the second item in the same way, and Octoparse will select all the similar items on the page
- Choose "Extract text of the selected elements" and Octoparse will create a Loop Item automatically
You will notice that the first item is now highlighted in red. You can select the information you need, such as the title, date, and keyword, from this highlighted area.
- Select the title and choose "Extract the text of the element"
- Repeat the steps to get other information
- Double click on the field name to rename it if needed
Make sure all the sub-elements you want to extract are included in this highlighted section.
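The idea behind a Loop Item can be sketched outside Octoparse. The sketch visits each repeated item in turn and reads every field only from inside that item, so each item becomes one row. It then writes the rows as CSV, which Excel can open. This is a minimal illustration, not how Octoparse is implemented. The sample HTML, the class names `post`, `title`, `date`, and `keyword`, and the helper names are all hypothetical. The parser also assumes well-formed markup.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample markup; the real page's HTML and class names will differ.
SAMPLE_HTML = """
<ul class="posts">
  <li class="post"><h2 class="title">First <a href="/1">post</a></h2>
    <span class="date">2021-01-05</span><span class="keyword">Web Scraping</span></li>
  <li class="post"><h2 class="title">Second post</h2>
    <span class="date">2021-02-10</span><span class="keyword">Data</span></li>
</ul>
"""

ITEM_CLASS = "post"                           # hypothetical class of each list item
FIELD_CLASSES = ("title", "date", "keyword")  # hypothetical classes of the sub-elements
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


class LoopItemExtractor(HTMLParser):
    """Emit one row per list item; fields are read only from inside the current item."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.row = None       # row being filled, or None when outside an item
        self.depth = 0        # nesting depth inside the current item
        self.field = None     # field whose element is currently open
        self.field_depth = 0  # depth at which that field's element was opened

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.row is None:
            if ITEM_CLASS in classes:  # a new item starts: begin a new row
                self.row = {name: "" for name in FIELD_CLASSES}
                self.depth = 1
            return
        self.depth += 1
        if self.field is None:
            for name in FIELD_CLASSES:
                if name in classes:
                    self.field, self.field_depth = name, self.depth
                    break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.row is None:
            return
        if self.field is not None and self.depth == self.field_depth:
            self.field = None  # the field's own element just closed
        self.depth -= 1
        if self.depth == 0:  # the item element closed: normalize whitespace, save the row
            self.rows.append({k: " ".join(v.split()) for k, v in self.row.items()})
            self.row = None

    def handle_data(self, data):
        # Text (including text in nested tags) counts only while a field element is open.
        if self.row is not None and self.field is not None:
            self.row[self.field] += data


def extract_rows(html):
    parser = LoopItemExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Serialize rows as CSV text, one row per item and one column per field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELD_CLASSES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(rows_to_csv(extract_rows(SAMPLE_HTML)))
```

Reading fields relative to the current item is what keeps each title paired with its own date and keyword. It is also why every sub-element you want has to sit inside the highlighted item.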
2) Method 2:
- Hover your cursor over the first item until the entire section gets highlighted in blue
You will notice that Octoparse detects the sub-elements within the section and highlights them in red.
- Choose "Select sub-elements"
- Choose "Select all"
- Select "Extract data". A loop item will be generated automatically to scrape the list of items on the page.
If you want to edit or delete the extracted data fields, you can click "Extract Data" and modify the fields on the Data Preview panel.
If you need any help with task configuration or data collection, submit a ticket to our support team! We'll get back to you within 24 hours.