Table data is common among websites related to finance, sports, etc. This tutorial will guide you on how to scrape table data.
If you have already learned how to extract a list of data (see Extract a list), table data works in much the same way. Treat each row of the table as one element of the list; each table cell is then a sub-element of that element.
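To make the row-and-cell model concrete, here is a minimal Python sketch that uses only the standard library. It is not how Octoparse works internally; the sample markup, the `TableParser` class, and the `parse_table` helper are all hypothetical and serve only to show rows as list elements and cells as sub-elements.

```python
from html.parser import HTMLParser

# Hypothetical sample markup for illustration; not taken from the example URL.
SAMPLE_HTML = """
<table>
  <tr><th>Symbol</th><th>Price</th><th>Change</th></tr>
  <tr><td>AAA</td><td>10.50</td><td>+0.25</td></tr>
  <tr><td>BBB</td><td>7.20</td><td>-0.10</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect each <tr> as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one entry per table row (the "list element")
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # A finished cell becomes one sub-element of the current row.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return the table as a list of rows, each a list of cell texts."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = parse_table(SAMPLE_HTML)
header, body = rows[0], rows[1:]

# Pair each cell with its column header so every row becomes one record.
records = [dict(zip(header, cells)) for cells in body]
for record in records:
    print(record)
```

Each entry in `records` corresponds to one table row, and each key/value pair in it corresponds to one cell, which mirrors the element and sub-element structure described above.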
The steps below show how to collect table data with Octoparse.
Example URL: https://money.cnn.com/data/hotstocks/index.html
1. Use the Auto-detect function to set up the workflow
Octoparse can auto-detect the table and capture all of its columns. With this feature, you only need to:
1) Enter the web page URL and choose to auto-detect the web page data
2) Check that all the table cells have been captured, then click "Create workflow"
For details about auto-detect, see Lesson 1: Extract data with the brand-new Auto-detect algorithm.
2. Set up the workflow manually
What if auto-detect fails or doesn't collect the complete table data? In that case, you need to set up the task manually. Here are the steps:
1) Select the 1st cell in the 1st row of the table, then click the "Expand the selection area" icon until the whole 1st row is selected.
(If auto-detect starts automatically, click "Turn OFF Auto-detect" or "Cancel Auto-detect" to stop it.)
The Tips panel will say "One or more sub-elements are found". "Sub-elements" are the specific data fields that Octoparse detects in each row of data; the panel is asking whether you want to locate them.
2) Choose "Select all sub-elements" on the Tips panel. All the sub-elements in the 1st row are selected, and Octoparse highlights the other similar elements it finds in red.
3) Choose "Select all" on the Tips panel. All the sub-elements in the table are now selected and highlighted in green.
4) Choose "Extract data" on the Tips panel. Octoparse will now extract all the data fields in the table.
5) Edit data fields as needed (optional)
Now all the data fields are set up in the task. You can rename or delete data fields in the "Data Preview" section.
- Click to rename the data field
- Click for more actions: delete, copy, clean data, etc.
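If you would rather tidy field names after export instead of in the "Data Preview" section, the same rename and delete operations can be sketched in a few lines of Python. This is an illustration only: the sample rows, the field names such as `Field1`, and the `edit_fields` helper are all hypothetical and are not part of Octoparse.

```python
# Hypothetical extracted rows with placeholder field names.
rows = [
    {"Field1": "AAA", "Field2": "10.50", "Field3": "+0.25"},
    {"Field1": "BBB", "Field2": "7.20", "Field3": "-0.10"},
]


def edit_fields(rows, rename=None, delete=()):
    """Return new rows with fields renamed and deleted; the input is not modified."""
    rename = rename or {}
    drop = set(delete)
    return [
        {
            rename.get(name, name): value  # use the new name if one was given
            for name, value in row.items()
            if name not in drop            # skip deleted fields
        }
        for row in rows
    ]


cleaned = edit_fields(
    rows,
    rename={"Field1": "Symbol", "Field2": "Price"},
    delete=["Field3"],
)
for row in cleaned:
    print(row)
```

The helper builds new dictionaries rather than editing in place, so the original extracted rows stay available if a rename turns out to be wrong.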
If you have any trouble with extracting table data, you're welcome to submit a ticket to our Support team.