Scrape page-level data (metadata, page URL, page title, source code)
In this tutorial, we will show you how to use Octoparse to extract page-level data, including the webpage URL, page title, meta description, meta keywords, and HTML source code.
How to add the data
1. In the "Extract Data" action, click "Action Settings".
2. Click "Add data field(s)".
3. Hover over or click "Page-level data", then select the page-level fields you want.
The selected page-level fields are added automatically to "Data Field".
4. Rename the data fields as needed.
Tip: You can also add these fields from the "Data Preview" panel. Click the icon
Meaning of the fields
- Page URL: the URL of the current page, saved alongside the data extracted from it.
This is useful for tracing blank fields back to the page they came from. See: What to do with those blank fields I got in the extracted result?
- Page title: the content of the <title> tag.
The title is a short description of the webpage and appears at the top of the browser window (usually in the tab).
- Meta description: the content of the <meta name="description"> tag.
This tag contains a summary of the page content.
- Meta keywords: the content of the <meta name="keywords"> tag.
Scraping the page title, meta description, and meta keywords is useful for SEO work, such as auditing the titles, descriptions, and keywords set across a site's pages.
- HTML source code: the complete HTML code of the webpage.
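To make clear which parts of a page each field comes from, here is a minimal sketch using Python's standard html.parser module. It is not how Octoparse works internally (Octoparse handles this for you in the "Page-level data" menu); it only assumes the common convention that the description and keywords live in <meta name="description"> and <meta name="keywords"> tags. The function name extract_page_level_data and the sample HTML are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser


class PageLevelParser(HTMLParser):
    """Collects the <title> text and the description/keywords <meta> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Attribute values can be None (e.g. <meta name>), so default to "".
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.meta[name] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self._in_title:
            self.title += data


def extract_page_level_data(url, html):
    """Hypothetical helper: one record with the page-level fields described above."""
    parser = PageLevelParser()
    parser.feed(html)
    parser.close()
    return {
        "Page URL": url,
        "Page title": parser.title.strip(),
        "Meta description": parser.meta.get("description", ""),
        "Meta keywords": parser.meta.get("keywords", ""),
        "HTML source code": html,
    }


# Usage with a made-up page.
sample = """<html><head>
<title>Example Store</title>
<meta name="description" content="Hand-made mugs and bowls.">
<meta name="keywords" content="mugs, bowls, pottery">
</head><body><h1>Shop</h1></body></html>"""

record = extract_page_level_data("https://example.com/shop", sample)
print(record["Page title"])  # Example Store
```

A page without a description or keywords tag simply yields an empty string for that field, which is one way blank fields show up in extracted results.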
Spanish version of this tutorial: Extract page-level data (metadata, page URL, page title, source code)
You can also read more web scraping tutorials on the official website.
If you need any help with task configuration or data collection, submit a ticket to our support team! We'll get back to you soon.
Author: Kara
Editor: Yina