The Problem
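Many international websites display dates in DD/MM/YYYY format, but Octoparse's built-in date cleaner expects MM/DD/YYYY. This breaks the "Reformat date and time" feature. A value such as 31/12/2023 cannot be read as a valid MM/DD/YYYY date because there is no month 31. A value such as 05/06/2023 is read as May 6 instead of 5 June.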
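The two failure modes can be reproduced outside Octoparse with a small Python sketch. It uses `datetime.strptime` with `%m/%d/%Y` as a stand-in for a cleaner that expects MM/DD/YYYY. This is an assumption for illustration, not Octoparse's implementation, and `parse_us_date` is a hypothetical name.

```python
from datetime import datetime

# Stand-in (assumption) for a date cleaner that only accepts MM/DD/YYYY.
US_FORMAT = "%m/%d/%Y"

def parse_us_date(text):
    """Parse text as MM/DD/YYYY; return None if it is not a valid US-style date."""
    try:
        return datetime.strptime(text, US_FORMAT).date()
    except ValueError:
        return None

# 31 December written as DD/MM/YYYY: "31" is not a valid month, so parsing fails.
print(parse_us_date("31/12/2023"))  # None
# 5 June written as DD/MM/YYYY: parses without error, but as May 6.
print(parse_us_date("05/06/2023"))  # 2023-05-06
```

The second case is the more dangerous one, because nothing signals that day and month were swapped.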
The Solution
Use Regular Expressions (RegEx) to swap day and month values before cleaning.
Step-by-Step Guide
After extracting your dates, go to:
Data Preview → ... (More menu) → Clean Data → Add Step
Select "Replace with Regular Expression".
Enter these values:
Find: ([0-9]{1,2})(\/)([0-9]{1,2})(\/)([0-9]{1,4})
Replace with: $3/$1/$5
Click "Apply".
This converts DD/MM/YYYY → MM/DD/YYYY.
(Optional) Add another step, "Reformat date & time", to standardize the format further.
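Before applying the step to a large dataset, you can check what the Find/Replace pair does with a short Python sketch. Python writes backreferences as `\g<3>` rather than `$3`, so the replacement template below is a translation of `$3/$1/$5`. The `standardize` helper is a hypothetical analog of the optional "Reformat date & time" step, not Octoparse's feature.

```python
import re
from datetime import datetime

# Find pattern copied from the steps above.
# Groups: 1 = day, 2 = "/", 3 = month, 4 = "/", 5 = year.
FIND = r"([0-9]{1,2})(\/)([0-9]{1,2})(\/)([0-9]{1,4})"
# Python equivalent of Octoparse's "$3/$1/$5".
REPLACE = r"\g<3>/\g<1>/\g<5>"

def dmy_to_mdy(text):
    """Swap day and month in every DD/MM/YYYY match found in text."""
    return re.sub(FIND, REPLACE, text)

def standardize(mdy_text):
    """Hypothetical follow-up step: rewrite MM/DD/YYYY as YYYY-MM-DD."""
    return datetime.strptime(mdy_text, "%m/%d/%Y").strftime("%Y-%m-%d")

print(dmy_to_mdy("31/12/2023"))               # 12/31/2023
print(dmy_to_mdy("Posted 5/6/2023"))          # Posted 6/5/2023
print(standardize(dmy_to_mdy("31/12/2023")))  # 2023-12-31
```

Because the pattern is not anchored, it also rewrites date-like fragments inside longer strings. That is one more reason to check the preview before applying.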
How the RegEx Works
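| Part | Matches | Example (31/12/2023) |
|---|---|---|
| ([0-9]{1,2}) → $1 | Day (1-2 digits) | 31 |
| (\/) → $2 | Forward slash | / |
| ([0-9]{1,2}) → $3 | Month (1-2 digits) | 12 |
| (\/) → $4 | Forward slash | / |
| ([0-9]{1,4}) → $5 | Year (1-4 digits) | 2023 |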
The replacement $3/$1/$5 writes the month ($3), then the day ($1), then the year ($5), separated by literal slashes. This swaps day and month:
31/12/2023 → 12/31/2023
Pro Tips
🔹 Test with "Preview" before applying to all data
🔹 Combine with "Extract numbers" if dates contain extra text. Month names (e.g., "Jan 31, 2024") are letters, not numbers, so handle those with the text-date pattern below
🔹 For ambiguous dates (e.g., 05/06/2023, which could be 5 June or May 6), add a note recording which format the source site uses
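A date is only ambiguous when both of its first two fields are 12 or less. The Python sketch below flags such values. It also guesses a column's order from its unambiguous rows. Both `is_ambiguous` and `guess_column_order` are hypothetical helpers, and the column check is a heuristic. A column with no day above 12 cannot be resolved this way.

```python
import re

# Same day/month/year shape as the Find pattern, without capturing the slashes.
DATE = re.compile(r"([0-9]{1,2})/([0-9]{1,2})/([0-9]{1,4})")

def is_ambiguous(text):
    """True if both leading fields could be a month (1-12), e.g. 05/06/2023."""
    m = DATE.fullmatch(text)
    if m is None:
        return False
    first, second = int(m.group(1)), int(m.group(2))
    return 1 <= first <= 12 and 1 <= second <= 12

def guess_column_order(values):
    """Heuristic: a first field above 12 implies DD/MM; a second field above 12 implies MM/DD."""
    matches = [DATE.fullmatch(v) for v in values]
    firsts = [int(m.group(1)) for m in matches if m]
    seconds = [int(m.group(2)) for m in matches if m]
    if any(f > 12 for f in firsts):
        return "DD/MM"
    if any(s > 12 for s in seconds):
        return "MM/DD"
    return "ambiguous"

for sample in ["05/06/2023", "31/12/2023", "12/31/2023"]:
    print(sample, is_ambiguous(sample))
print(guess_column_order(["05/06/2023", "31/12/2023"]))  # DD/MM
```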
Need other date formats? Try these RegEx patterns:
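YYYY-MM-DD: keep the same Find pattern and use Replace with: $5-$3-$1
Text dates: Find ([A-Za-z]{3}) (\d{1,2}), (\d{4}) and Replace with $2 $1 $3 (e.g., Jan 31, 2024 → 31 Jan 2024)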
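These alternative patterns can be checked the same way in Python, where `$5-$3-$1` is written `\g<5>-\g<3>-\g<1>`. One limitation is easy to miss: a plain replacement template cannot add leading zeros. As a result, 5/6/2023 becomes 2023-6-5. The `pad_iso` function is a hypothetical Python-side workaround, not an Octoparse option.

```python
import re

# Find patterns copied from the text; groups are numbered left to right.
DMY = r"([0-9]{1,2})(\/)([0-9]{1,2})(\/)([0-9]{1,4})"
TEXT_DATE = r"([A-Za-z]{3}) (\d{1,2}), (\d{4})"

# Octoparse's "$5-$3-$1" written with Python backreferences.
print(re.sub(DMY, r"\g<5>-\g<3>-\g<1>", "31/12/2023"))          # 2023-12-31
# Octoparse's "$2 $1 $3" written with Python backreferences.
print(re.sub(TEXT_DATE, r"\g<2> \g<1> \g<3>", "Jan 31, 2024"))  # 31 Jan 2024

# A template only copies captured text, so single digits stay single.
print(re.sub(DMY, r"\g<5>-\g<3>-\g<1>", "5/6/2023"))            # 2023-6-5

def pad_iso(match):
    """Hypothetical helper: build YYYY-MM-DD with two-digit month and day."""
    day, month, year = match.group(1), match.group(3), match.group(5)
    return f"{year}-{int(month):02d}-{int(day):02d}"

print(re.sub(DMY, pad_iso, "5/6/2023"))                         # 2023-06-05
```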
Now your dates will work perfectly with Octoparse's cleaning tools! 🚀