*****************************************************************************
* ZTreeRegexPuzzleSolutionStrict.html
* A more precise regex solution to Laurent Duchastel's file renaming puzzle
* proposed here: http://www.ztw3.com/forum/forum_entry.php?id=83661
*
* This enhanced solution verifies that the days (01-31) and the months (01-12)
* are within their valid ranges and allows session counts with any number of
* digits, i.e. not just 1 of 9 but also 10 of 100 or 9999 of 100000, e.g.
* LLLL_20071216_400-1000.JPG. It also converts dash delimiters to underscores
* and session delimiters from 'Of' to a single dash, i.e. 1Of2 becomes 1-2.
* It makes sure all required fields exist and that no extra characters are
* present. These regular expressions were developed and tested using the Find
* and Replace function in UltraEdit32, which uses the Boost C++ Regex Library.
* This solution builds upon the regex solutions submitted by Ian Binnie.
*
* Author: Jeff Roberson
* Date: 17-Dec-07
* Updated: 03-Dec-08
****************************************************************************
* regex #1: Step 1 of 2
****************************************************************************
Given:
A filename containing a date having either of the following two structures:
LLLL-DDMMYY-XOfX.JPG or
LLLL_DDMMYY_X-X.JPG
Where valid months are from 01 to 12, valid days are from 01 to 31 and
valid years are from 00 to 99. Also, each of the session count number
fields ('X') may have more than one digit, all of which are to be preserved.
Find:
A strict regex search pattern which matches only filenames with the given
structure, and a replacement string to swap the day and year fields and
extend the year field from 2 to 4 digits to look like the following:
LLLL_20YYMMDD_X-X.JPG
Note that years from 10 through 99 will erroneously become 2010 through 2099.
Solution:
First we chop up the task into its components and assign the captured groups
to substitution variables $1 through $7:
STEP  VAR   FIELD       SOURCE STRING       REGULAR EXPRESSION
----------------------------------------------------------------------
1     $1 =  pre:        'LLLL-' or 'LLLL_'  '^([A-Z]{4})[-_]'
2     $2 =  day:        'DD'                '(0[1-9]|[12]\d|3[01])'
3     $3 =  month:      'MM'                '(0[1-9]|1[012])'
4     $4 =  year:       'YY'                '(\d\d)'
5     $5 =  firstdig:   '-X' or '_X'        '[-_](\d+)'
6     $6 =  separator:  '-' or 'Of'         '(-|Of)'
7     $7 =  post:       'X.JPG'             '(\d+\.JPG)$'
STEP  VERBOSE DESCRIPTION OF REGULAR EXPRESSION INTERPRETATION
---------------------------------------------------------------
1     match the beginning of string position followed by (exactly four
      uppercase letters)=$1 followed by either a dash or an underscore
2     match (a '0' followed by one of '1'-'9', OR a '1' or '2' followed by a
      digit '0'-'9', OR a '3' followed by a '0' or '1')=$2
3     match (a '0' followed by one of '1'-'9', OR a '1' followed by a '0',
      '1' or '2')=$3
4     match (a digit followed by another digit)=$4
5     match a dash or an underscore followed by (one or more digits)=$5
6     match (either a dash OR the letters 'O' then 'f')=$6
7     match (one or more digits then a dot '.' then the uppercase letters
      'J', 'P', then 'G')=$7 followed by the end of string position
Final regex #1 =
'^([A-Z]{4})[-_](0[1-9]|[12]\d|3[01])(0[1-9]|1[012])(\d\d)[-_](\d+)(-|Of)(\d+\.JPG)$'
Replacement string #1 = '$1_20$4$3$2_$5-$7'
See the RegexBuddy description of regex #1 pattern here.
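For readers without UltraEdit32, step 1 can be tried in Python. This is a minimal sketch and not the author's workflow: it assumes Python's `re` module, which writes group references as `\g<1>` rather than `$1`, and the names `REGEX_1`, `REPLACEMENT_1` and `rename_step1` are hypothetical.

```python
import re

# Regex #1 copied from above; Python's re module accepts it unchanged.
REGEX_1 = re.compile(
    r'^([A-Z]{4})[-_](0[1-9]|[12]\d|3[01])(0[1-9]|1[012])(\d\d)'
    r'[-_](\d+)(-|Of)(\d+\.JPG)$'
)

# Replacement string #1 with each $n rewritten as \g<n> (Python syntax).
REPLACEMENT_1 = r'\g<1>_20\g<4>\g<3>\g<2>_\g<5>-\g<7>'


def rename_step1(filename):
    """Return the step 1 result, or None when the name is rejected."""
    if REGEX_1.match(filename) is None:
        return None
    return REGEX_1.sub(REPLACEMENT_1, filename)


print(rename_step1('LLLL-311299-1Of2.JPG'))  # LLLL_20991231_1-2.JPG
print(rename_step1('ABCD_320101_1-9.JPG'))   # None (day 32 is rejected)
```

Returning None for rejected names keeps the strict behaviour visible: a file that does not fit the structure is left alone rather than partially rewritten.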
**************************************************************************
* regex #2: Step 2 of 2
**************************************************************************
Given:
File name output from previous regex having the following structure:
LLLL_2010MMDD_X-X.JPG through LLLL_2099MMDD_X-X.JPG
Find:
A regex search pattern that finds years from 2010 through 2099 and changes
them to 1910 through 1999 thereby correcting the 20th century Y2K error
resulting from the previous regex replacement.
Solution:
STEP  VAR   FIELD     SOURCE STRING  REGULAR EXPRESSION
--------------------------------------------------------------
1     $1 =  prefix:   'LLLL'         '^([A-Z]{4})'
2     $2 =  century:  '_20'          '_(20)'
3     $3 =  year:     '1'-'9'        '([1-9])'
STEP  DESCRIPTION OF REGULAR EXPRESSION INTERPRETATION
-------------------------------------------------------
1     match the beginning of string position followed by (exactly four
      uppercase letters)=$1
2     match an underscore followed by (a '2' followed by a '0')=$2
3     match (one of '1'-'9', the tens digit of the year)=$3
Final regex #2 = '^([A-Z]{4})_(20)([1-9])'
Replacement string #2 = '$1_19$3'
See the RegexBuddy description of regex #2 pattern here.
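Step 2 can be sketched the same way in Python. This is an illustration only: the pattern keeps the underscore outside group 2, as in the step table, the replacement uses Python's `\g<n>` syntax in place of `$n`, and the names `REGEX_2`, `REPLACEMENT_2` and `fix_century` are hypothetical.

```python
import re

# Regex #2; the tens digit [1-9] limits the match to years 2010-2099.
REGEX_2 = re.compile(r'^([A-Z]{4})_(20)([1-9])')

# Replacement string #2 with each $n rewritten as \g<n> (Python syntax).
REPLACEMENT_2 = r'\g<1>_19\g<3>'


def fix_century(filename):
    """Rewrite 201x-209x as 191x-199x; other names pass through unchanged."""
    return REGEX_2.sub(REPLACEMENT_2, filename)


print(fix_century('LDUC_20980418_1-9.JPG'))  # LDUC_19980418_1-9.JPG
print(fix_century('LDUC_20060928_1-3.JPG'))  # LDUC_20060928_1-3.JPG
```

Because only the text up to the tens digit is matched, the rest of the filename is carried over untouched by the substitution.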
**************************************************************************
Notes:
day range = 01 to 31
month range = 01 to 12
year range 19xx = 10 to 99
year range 20xx = 00 to 09
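weaknesses:
* allows February '30' and '31'
* allows day '31' in the 30-day months (April, June, September, November)
* allows February '29' in non-leap years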
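The calendar weakness is hard to close inside the regex itself, but a check outside it is straightforward. The following is a hedged sketch, not part of the original solution: it assumes Python's standard `datetime.date`, which raises `ValueError` for impossible dates, and it assumes years 10-99 belong to 19xx and lower values to 20xx. The function name `is_real_date` is hypothetical.

```python
from datetime import date


def is_real_date(ddmmyy):
    """Return True when a 6-digit DDMMYY string names a real calendar date."""
    day, month, yy = int(ddmmyy[0:2]), int(ddmmyy[2:4]), int(ddmmyy[4:6])
    # Assumed century rule: 10-99 -> 19xx, anything lower -> 20xx.
    year = 1900 + yy if yy >= 10 else 2000 + yy
    try:
        date(year, month, day)  # raises ValueError for e.g. 30 February
    except ValueError:
        return False
    return True


print(is_real_date('300207'))  # False
print(is_real_date('290204'))  # True (2004 is a leap year)
```

Such a check would run on the DDMMYY field after regex #1 has accepted the overall structure.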
Test Data:
ORIGINAL                   AFTER REGEX #1              AFTER REGEX #2
---------------------------------------------------------------------------------
LLLL-311299-1Of2.JPG       LLLL_20991231_1-2.JPG       LLLL_19991231_1-2.JPG
LDUC_280906_1-3.JPG        LDUC_20060928_1-3.JPG
LDUC_280906_2-3.JPG        LDUC_20060928_2-3.JPG
LDUC_280906_3-3.JPG        LDUC_20060928_3-3.JPG
LDUC_290806_1-1.JPG        LDUC_20060829_1-1.JPG
LDUC_180498_1-9.JPG        LDUC_20980418_1-9.JPG       LDUC_19980418_1-9.JPG
Extended valid filespecs
LDUC_311207_10-999.JPG     LDUC_20071231_10-999.JPG
LLLL-020157-123Of4321.JPG  LLLL_20570102_123-4321.JPG  LLLL_19570102_123-4321.JPG
Invalid filespecs check
ABCD_320101_1-9.JPG        Day too big
ABCD_000101_1-9.JPG        Day too small
ABCD_011301_1-9.JPG        Month too big
ABCD_010001_1-9.JPG        Month too small
BCD_010101_1-9.JPG         Missing prefix char
ABCDE_010101_1-9.JPG       Extra prefix char
ABCD_010101_1-9.PG         Missing post char
ABCD_010101_1-9.JPEG       Extra post char
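The test data above can be replayed mechanically. The sketch below is one possible harness in Python, not the procedure used with UltraEdit32: it assumes Python's `re` module (with `\g<n>` standing in for `$n`), and the names `convert`, `VALID` and `INVALID` are hypothetical.

```python
import re

REGEX_1 = re.compile(
    r'^([A-Z]{4})[-_](0[1-9]|[12]\d|3[01])(0[1-9]|1[012])(\d\d)'
    r'[-_](\d+)(-|Of)(\d+\.JPG)$'
)
REPLACEMENT_1 = r'\g<1>_20\g<4>\g<3>\g<2>_\g<5>-\g<7>'
REGEX_2 = re.compile(r'^([A-Z]{4})_(20)([1-9])')
REPLACEMENT_2 = r'\g<1>_19\g<3>'


def convert(filename):
    """Apply both steps; return None for names that regex #1 rejects."""
    if REGEX_1.match(filename) is None:
        return None
    step1 = REGEX_1.sub(REPLACEMENT_1, filename)
    return REGEX_2.sub(REPLACEMENT_2, step1)


# Original name -> final name after both steps (rows of the table above).
VALID = {
    'LLLL-311299-1Of2.JPG': 'LLLL_19991231_1-2.JPG',
    'LDUC_280906_1-3.JPG': 'LDUC_20060928_1-3.JPG',
    'LDUC_280906_2-3.JPG': 'LDUC_20060928_2-3.JPG',
    'LDUC_280906_3-3.JPG': 'LDUC_20060928_3-3.JPG',
    'LDUC_290806_1-1.JPG': 'LDUC_20060829_1-1.JPG',
    'LDUC_180498_1-9.JPG': 'LDUC_19980418_1-9.JPG',
    'LDUC_311207_10-999.JPG': 'LDUC_20071231_10-999.JPG',
    'LLLL-020157-123Of4321.JPG': 'LLLL_19570102_123-4321.JPG',
}

# Names from the invalid filespecs check; all must be rejected.
INVALID = [
    'ABCD_320101_1-9.JPG', 'ABCD_000101_1-9.JPG', 'ABCD_011301_1-9.JPG',
    'ABCD_010001_1-9.JPG', 'BCD_010101_1-9.JPG', 'ABCDE_010101_1-9.JPG',
    'ABCD_010101_1-9.PG', 'ABCD_010101_1-9.JPEG',
]

for original, expected in VALID.items():
    assert convert(original) == expected, original
for original in INVALID:
    assert convert(original) is None, original
print('all test rows behave as listed')
```

Rows with an empty third column in the table are unchanged by regex #2, so their expected final name equals the step 1 output.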
**************************************************************************