Octoparse offers several predefined data fields that you can add to your extracted data.
Where to add the predefined data fields?
Go to the "Extract Data" step and you can find the "Add predefined field" button on the right panel.
What predefined data fields can I add?
There are four kinds of data fields you can add:
1. Add the current time
This data field records the time at which each data line was scraped.
For example, if you have a scheduled task that runs every day and you would like to know the date on which each data line was scraped, you can simply add this field.
- You can change the format of the current time field with "Reformat extracted date/time".
- Adding the current time in Cloud extraction helps keep all the duplicates. See: Can I keep the duplicates extracted in Cloud?
- The time in Cloud extraction is based on UTC.
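Because the time in Cloud extraction is based on UTC, you may want to convert it to your local time after exporting the data. Below is a minimal Python sketch of that conversion, run outside Octoparse. The function name `utc_to_local`, the timestamp format `%Y-%m-%d %H:%M:%S`, and the sample values are assumptions for illustration; adjust them to match your exported data.

```python
from datetime import datetime, timedelta, timezone

# Assumed timestamp layout; change it to match your exported data.
ASSUMED_FORMAT = "%Y-%m-%d %H:%M:%S"

def utc_to_local(value, utc_offset_hours, fmt=ASSUMED_FORMAT):
    """Read `value` as a UTC timestamp and return it shifted by a fixed offset."""
    utc_time = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return utc_time.astimezone(local_zone).strftime(fmt)

# Hypothetical value, converted to UTC+8.
print(utc_to_local("2024-03-01 23:30:00", 8))  # 2024-03-02 07:30:00
```

A fixed offset keeps the sketch simple, but it does not follow daylight saving changes.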
2. Add a fixed field
This option adds the same fixed value to every data line.
If you are scraping from both Amazon.com and Amazon.fr, for example, and you would like to add a "Website" field to indicate which domain the data were scraped from, you can create the data field using this option.
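To picture what a fixed field does to the output, here is a small Python sketch that mimics the effect on already-exported rows. It is not how Octoparse implements the feature. The helper `add_fixed_field` and the sample rows are hypothetical.

```python
def add_fixed_field(rows, field_name, value):
    """Return copies of `rows` with the same `value` stored under `field_name`."""
    return [{**row, field_name: value} for row in rows]

# Hypothetical sample rows standing in for data scraped from each site.
us_rows = [{"Title": "Kettle", "Price": "19.99"}]
fr_rows = [{"Title": "Bouilloire", "Price": "18.49"}]

# Every row from one site carries the same "Website" value.
combined = (
    add_fixed_field(us_rows, "Website", "Amazon.com")
    + add_fixed_field(fr_rows, "Website", "Amazon.fr")
)
for row in combined:
    print(row)
```

Once the two result sets are merged, the "Website" column is the only thing that tells the rows apart by source.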
3. Add a blank field
This option creates a blank field that you can use to extract any element on the page. To make a blank field useful, you must revise its XPath.
For step-by-step instructions on revising the XPath, please check here:
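If you are new to XPath, the following Python sketch shows how an XPath expression picks one element out of a page. It uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and needs well-formed markup, so expressions there start with `.//` where Octoparse would typically use `//`. The sample page, class names, and the helper `extract_text` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical, well-formed sample page.
SAMPLE_PAGE = """
<html>
  <body>
    <div class="product">
      <h2 class="name">Kettle</h2>
      <span class="price">19.99</span>
    </div>
  </body>
</html>
"""

def extract_text(markup: str, xpath: str) -> Optional[str]:
    """Return the text of the first element matching `xpath`, or None."""
    root = ET.fromstring(markup.strip())
    node = root.find(xpath)
    if node is None or node.text is None:
        return None
    return node.text.strip()

print(extract_text(SAMPLE_PAGE, ".//span[@class='price']"))  # 19.99
print(extract_text(SAMPLE_PAGE, ".//h2[@class='name']"))     # Kettle
```

An expression that matches nothing returns None here, which is similar in spirit to a field that stays blank because its XPath does not match the page.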
4. Add current page information
- Web page URL: add the URL of the current page alongside the corresponding data
This is useful when you would like to check which data fields are missing on a page. See: What to do with those blank fields I got in the extracted result?
- Page title: scrape the content of the title tag.
It is a short description of a webpage and appears at the top of a browser window.
- Meta description: scrape the content of the meta description tag
The tag contains a summary of the page content.
- Meta keywords: scrape the content of the meta keywords tag
Scraping the page title, meta description, and meta keywords is useful when users need to improve their SEO.
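To see which parts of a page these three fields correspond to, here is a minimal Python sketch that reads the same tags from raw HTML with the standard library's `html.parser`. It only illustrates where the values live in the markup and is not Octoparse's own implementation. The class `PageInfoParser`, the helper `page_info`, and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser

class PageInfoParser(HTMLParser):
    """Collect the title text and the description/keywords meta content."""

    def __init__(self):
        super().__init__()
        self.info = {"title": "", "description": "", "keywords": ""}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attributes.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.info[name] = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so append each one.
        if self._in_title:
            self.info["title"] += data

def page_info(html):
    """Return a dict with the page title, meta description, and meta keywords."""
    parser = PageInfoParser()
    parser.feed(html)
    parser.close()
    info = dict(parser.info)
    info["title"] = info["title"].strip()
    return info

# Hypothetical sample page.
SAMPLE_HTML = (
    "<html><head><title>Electric Kettles</title>"
    '<meta name="description" content="Compare electric kettles.">'
    '<meta name="keywords" content="kettle, electric">'
    "</head><body></body></html>"
)
print(page_info(SAMPLE_HTML))
```

Pages that omit a tag simply yield an empty string for that entry in this sketch.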