Octoparse offers several predefined data fields that you can add to a task alongside the data you extract. You can also add a fixed value to your task.

Where to add predefined data fields?

In the Data Preview section, you can add the data field(s) you need.

What predefined data fields can I add?

There are 5 types of custom fields:

1. Capture data on the page

This option guides you through selecting any other elements on the page that you want to capture.

2. Current date & time

This field records the time at which each data line is scraped. For example, if you have a scheduled task that runs every day and you want to know the date on which each data line was scraped, you can add this field.


TIPS:

  1. You can change the format of the current time field with Refine extracted date/time.

  2. Adding the current time in Cloud extraction can help keep all the duplicates. See: How can I keep the duplicates in the Cloud runs?

  3. The time in Cloud extraction is in UTC.
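Because Cloud extraction timestamps are in UTC, you may want to convert them to your own time zone after exporting the data. The following is a minimal Python sketch of that conversion, not part of Octoparse itself. The timestamp format (`%Y-%m-%d %H:%M:%S`) and the function name `utc_to_offset` are assumptions for illustration; adjust the format to match your exported field.

```python
from datetime import datetime, timedelta, timezone

def utc_to_offset(stamp: str, offset_hours: float,
                  fmt: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Convert a UTC timestamp string to a fixed UTC offset.

    `fmt` is an assumed export format, not a documented Octoparse format.
    """
    # Parse the naive string and mark it explicitly as UTC.
    utc_time = datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)
    # Shift to the target offset (for example, -5 for UTC-05:00).
    local_time = utc_time.astimezone(timezone(timedelta(hours=offset_hours)))
    return local_time.strftime(fmt)

print(utc_to_offset("2024-01-15 08:30:00", -5))  # 2024-01-15 03:30:00
```

A fixed offset does not follow daylight saving time. If that matters for your data, the standard-library `zoneinfo` module can be used with a named time zone instead.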

3. Add page-level data

  • Page URL: URL of the current page

  • Page title: title of the current page, a short description of the page that appears at the top of the browser window.

  • Meta description: meta description tag of the current page, which contains a summary of the page.

  • Meta keyword: meta keyword tag of the current page

  • HTML source code: the complete HTML code of the web page
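To make clear where these page-level values come from in a page's markup, here is a rough Python sketch that pulls the same five fields out of an HTML string using only the standard library. It illustrates the underlying HTML elements and is not how Octoparse works internally. The names `PageLevelParser` and `page_level_fields` are hypothetical, and the sample page is made up.

```python
from html.parser import HTMLParser

class PageLevelParser(HTMLParser):
    """Collects the <title> text and named <meta> tags from a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            # e.g. <meta name="description" content="...">
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def page_level_fields(url: str, html: str) -> dict:
    """Return the five page-level fields for one page (illustrative only)."""
    parser = PageLevelParser()
    parser.feed(html)
    return {
        "Page URL": url,
        "Page title": parser.title.strip(),
        "Meta description": parser.meta.get("description", ""),
        "Meta keyword": parser.meta.get("keywords", ""),
        "HTML source code": html,
    }

sample = """<html><head>
<title>Example Shop</title>
<meta name="description" content="Handmade goods.">
<meta name="keywords" content="shop, handmade">
</head><body></body></html>"""

fields = page_level_fields("https://example.com/", sample)
print(fields["Page title"])        # Example Shop
print(fields["Meta description"])  # Handmade goods.
```

Pages that omit the description or keywords tag simply yield an empty string for that field in this sketch.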

4. Add a fixed value

This option allows you to create a fixed value for every data line.

You can enter your own custom field name or choose one from the Common fields, then enter the fixed value you want to add. To add a blank field, leave the Enter Text box empty.

fixedvalue1.jpg

5. Add Original Input URL

If you scrape a list of URLs, you may want to get the original input URL as a field along with your target data, so you can match the two and see whether any URLs have not been scraped.
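Once the data is exported, the matching step can be done with a few lines of Python. The sketch below is one possible approach, assuming the exported rows are available as dictionaries (for example, from `csv.DictReader`) and that the original input URL is stored in a column named `Original_URL`. Both the column name and the function name `find_unscraped` are hypothetical; use the field name from your own task.

```python
def find_unscraped(input_urls, scraped_rows, column="Original_URL"):
    """Return the input URLs that never appear in the scraped rows.

    `column` is an assumed field name; replace it with the one in your export.
    """
    # Collect every input URL that produced at least one data line.
    scraped = {row[column].strip() for row in scraped_rows if row.get(column)}
    # Keep the original order of the input list.
    return [url for url in input_urls if url.strip() not in scraped]

input_urls = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c",
]
rows = [
    {"Original_URL": "https://example.com/a", "Title": "A"},
    {"Original_URL": "https://example.com/c", "Title": "C"},
]

missing = find_unscraped(input_urls, rows)
print(missing)  # ['https://example.com/b']
```

The comparison is an exact string match after trimming whitespace, so URLs that differ only by a trailing slash or letter case would be reported as missing.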
