You are browsing a tutorial guide for the latest Octoparse version (8.5.4). If you are running an older version of Octoparse, we strongly recommend you upgrade because it is faster, easier and more robust! Download and upgrade here if you haven't already done so!

The latest version can download files and images to your local device, so you can open them directly from a local folder. Downloads in jpg, png, gif, doc, pdf, ppt, txt, xls, and zip formats are currently supported.
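If you want to check a list of scraped links against the supported formats before running a task, a short script can do it. The following is a minimal sketch and not part of Octoparse: the function name `has_supported_extension` is hypothetical, and it assumes the format can be read from the file extension in the URL path.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Formats listed in this tutorial as supported for download.
SUPPORTED_EXTENSIONS = {"jpg", "png", "gif", "doc", "pdf", "ppt", "txt", "xls", "zip"}


def has_supported_extension(url: str) -> bool:
    """Return True if the URL path ends in one of the listed extensions.

    Assumption: the file type is visible in the URL path. Links that hide
    the type behind a redirect or query string are reported as unsupported.
    """
    suffix = PurePosixPath(urlparse(url).path).suffix  # e.g. ".pdf" or ""
    return suffix.lower().lstrip(".") in SUPPORTED_EXTENSIONS
```

The check ignores query strings and letter case, so `manual.PDF?x=1` counts as a pdf link.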

In this tutorial, we will show you how to download files and images with Octoparse. Please note that file downloads currently work only in local runs.

  1. Download Files

  2. Download Images

  3. Download Settings

Sample URL for file downloads: https://www.cclcomponents.com/fronius-gen24-plus-primo-3kw-hybrid-inverter

Sample URL for image downloads: https://www.rappi.com.mx/tiendas/tipo/market

1. Download Files

  • Click on one of the Download buttons - Choose one document you want to download. The selected element turns green and similar elements turn red

  • Click on Select All from the Tips box - All documents are identified and selected, and they turn green

  • Click Extract document URLs and download linked files - This extracts the links and downloads the files to local folders

After this step, the download appears as a data field. To modify the data fields, click the ... icon in the upper right corner.

NOTE: Deleting the field with a folder icon in its name will cancel the download settings.

  • Name downloaded files: There are four ways to name the downloaded files. The options are shown in the Tips box.

  1. MD5 Hash Value: Use the MD5 hash value to name the files

  2. Original File Name: Use the original file name (default)

  3. Download Complete Time: Use the time the download completed to name the files

  4. Data Field Value: Use the data field value to name the files
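To make the four naming options concrete, here is a minimal sketch of how each one could produce a file name. It is an illustration only, not Octoparse's implementation. The function `name_download`, its parameters, and the timestamp format are hypothetical, and it assumes the MD5 value is computed over the file content.

```python
import hashlib
from datetime import datetime
from pathlib import PurePosixPath
from urllib.parse import urlparse


def name_download(url, content, strategy, field_value="", finished_at=None):
    """Build a file name for a downloaded file (illustrative only).

    strategy is one of "md5", "original", "complete_time", "field_value".
    """
    original = PurePosixPath(urlparse(url).path).name  # e.g. "manual.pdf"
    ext = PurePosixPath(original).suffix               # e.g. ".pdf"

    if strategy == "md5":
        # Assumption: the hash is taken over the file content.
        return hashlib.md5(content).hexdigest() + ext
    if strategy == "original":
        return original
    if strategy == "complete_time":
        # Assumption: a sortable timestamp format; the real format may differ.
        finished_at = finished_at or datetime.now()
        return finished_at.strftime("%Y%m%d_%H%M%S") + ext
    if strategy == "field_value":
        # The value of another extracted data field, e.g. a product name.
        return field_value + ext
    raise ValueError(f"unknown naming strategy: {strategy}")
```

Content-based names such as the MD5 option give identical files identical names, while the original name and data field options are easier to read but more likely to collide.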


If a file with the same name already exists in the folder, there are three ways to handle the conflict:

  1. Skip the new file: Skip the current downloaded file

  2. Replace the existing file: Replace the existing file with the newly downloaded file

  3. Rename the new file: Rename the new file with a (1) at the end of the file name
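The three conflict options can be sketched as a small function that decides where a new file should be written. This is a hedged illustration, not Octoparse's code: `resolve_conflict` and the mode names are hypothetical, the exact placement of `(1)` in the name is an assumption, and so is counting up to `(2)`, `(3)` when `(1)` is already taken.

```python
from pathlib import Path
from typing import Optional


def resolve_conflict(target: Path, mode: str) -> Optional[Path]:
    """Return the path to write to, or None if the download should be skipped.

    mode is one of "skip", "replace", "rename".
    """
    if not target.exists():
        return target  # no conflict, write to the requested path
    if mode == "skip":
        return None    # keep the existing file, drop the new one
    if mode == "replace":
        return target  # overwrite the existing file
    if mode == "rename":
        # Assumption: the suffix goes before the extension and counts upward.
        counter = 1
        while True:
            candidate = target.with_name(f"{target.stem}({counter}){target.suffix}")
            if not candidate.exists():
                return candidate
            counter += 1
    raise ValueError(f"unknown conflict mode: {mode}")
```

Skipping is the safest choice for repeated runs over the same pages, and renaming is the only option that never loses a file.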


2. Download Images

Downloading images to local folders follows the same steps as downloading files.

  • Click on one image - Choose one image

  • Click on Select All from the Tips box - Select all images

  • Click Extract image URLs and download linked files - This extracts the links and downloads the images to local folders

Note: Octoparse can only download directly from URLs that begin with "https://". If the scraped URL value is only part of the complete download link, use Add prefix or the other data refining features in the Clean Data function to build a valid download link.
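To see what completing a partial link involves, the sketch below turns a relative or protocol-relative value into a full "https://" URL. It is an illustration written outside Octoparse. The function `to_download_url` is hypothetical, and it uses Python's `urljoin` instead of plain prefix concatenation, so its result can differ from what Add prefix produces.

```python
from urllib.parse import urljoin


def to_download_url(scraped: str, prefix: str) -> str:
    """Complete a partial link against a base URL (illustrative only).

    prefix is the base to resolve against, e.g. the site root.
    """
    if scraped.startswith("https://"):
        return scraped  # already a complete download link
    # urljoin resolves "/path", "path" and "//host/path" against the base.
    return urljoin(prefix, scraped)
```

For example, with the prefix `https://example.com/`, the scraped value `/files/a.pdf` becomes `https://example.com/files/a.pdf`.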


3. Download Settings

3.1 Download file settings

  • Click the arrow icon in front of the data field

  • Here you can rename the downloaded files, separate multiple URLs, and enter URLs whose files should be skipped
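As a rough picture of what separating multiple URLs and skipping some of them means, the sketch below splits one field value into individual links and drops the ones on a skip list. It is only an illustration under assumptions: the function `urls_to_download`, the `;` separator, and exact-match skipping are hypothetical and may not match how Octoparse applies these settings.

```python
def urls_to_download(field_value, separator=";", skip=()):
    """Split a field holding several URLs and drop the ones to skip.

    Assumptions: URLs are joined by a single separator string and a URL is
    skipped only when it matches a skip entry exactly.
    """
    skip_set = set(skip)
    urls = [part.strip() for part in field_value.split(separator)]
    # Drop empty fragments (e.g. from a doubled separator) and skipped URLs.
    return [u for u in urls if u and u not in skip_set]
```

A skip list is useful for links that repeat on every page, such as a logo or placeholder image.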


3.2 Download Location settings

  • Click on the task settings icon in the upper right corner of the task settings screen - This opens the settings panel

  • Click the Browse button - Choose a local folder for the downloaded files and images

  • Choose a mode for the When a local run starts setting

  • Click Save - This saves all the modifications