If you run a task multiple times, you may see Octoparse showing duplicates on the Dashboard:
This is because Octoparse stores the data scraped from all runs together and recognizes duplicates. Duplicates are deleted automatically from the Cloud. In addition, Octoparse removes duplicates from exported results by default, whether or not the "Remove duplicates" setting is checked. This ensures that unchanged data is not re-exported in subsequent task runs.
Duplicates are data rows that are identical in all columns. They are unnecessary in most cases, but keeping every scraped row can be useful when you need to compare historical trends or handle audits that require tracking all scraped data, including unchanged entries.
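To make the definition concrete, here is a minimal Python sketch of what "identical in all columns" means. It is only an illustration, not Octoparse's actual implementation: the function name drop_exact_duplicates and the sample rows are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]


def drop_exact_duplicates(rows: List[Row]) -> List[Row]:
    """Keep the first occurrence of each row.

    Two rows count as duplicates only if every column matches.
    (Hypothetical helper for illustration; not an Octoparse API.)
    """
    seen = set()
    unique = []
    for row in rows:
        # A row's identity is the full set of (column, value) pairs.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Made-up sample data from three runs of the same task.
rows = [
    {"title": "Widget", "price": "9.99"},  # run 1
    {"title": "Widget", "price": "9.99"},  # run 2: unchanged, so a duplicate
    {"title": "Widget", "price": "8.99"},  # run 3: price changed, so kept
]

unique_rows = drop_exact_duplicates(rows)
print(len(unique_rows))  # 2
```

The unchanged row from the second run is dropped, while the row whose price changed is kept because it differs in one column.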
How to keep duplicates?
You can add the current date & time as a field in the task:
Go to the Data Preview
Click on the Add Custom Field button
Choose Current date & time
The field will be added like this:
The field records the date and time at which each data row was scraped. Because each row is scraped at a different time, rows that were previously identical now differ in the current_time field, so Octoparse no longer treats them as duplicates.
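The effect of the timestamp field can be sketched in Python. This is an illustration under assumptions, not how Octoparse works internally: stamp_rows is a hypothetical helper, the date format is assumed, and for simplicity the sketch gives every row in a run the same timestamp.

```python
from datetime import datetime, timedelta


def stamp_rows(rows, scraped_at):
    """Return copies of the rows with a current_time column added.

    (Hypothetical helper for illustration; the date format is an assumption.)
    """
    stamp = scraped_at.strftime("%Y-%m-%d %H:%M:%S")
    return [{**row, "current_time": stamp} for row in rows]


# Two runs of the same task, one day apart (made-up times).
first_run = datetime(2024, 1, 1, 9, 0, 0)
second_run = first_run + timedelta(days=1)

# The scraped page did not change between the two runs.
page = [{"title": "Widget", "price": "9.99"}]

all_rows = stamp_rows(page, first_run) + stamp_rows(page, second_run)

# Count rows that differ in at least one column.
distinct = {tuple(sorted(row.items())) for row in all_rows}
print(len(all_rows), len(distinct))  # 2 2
```

Both rows survive because they differ in current_time, even though title and price are unchanged.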