*****************************************************************************
* ZTreeRegexPuzzleSolutionStrict.html
* A more precise regex solution to Laurent Duchastel's file renaming puzzle
* proposed here: http://www.ztw3.com/forum/forum_entry.php?id=83661
*
* This enhanced solution verifies that the days (01-31) and the months
* (01-12) are within their valid ranges and allows session counts of any
* length, i.e. not just 1 of 9 but also 10 of 100 or 9999 of 100000, e.g.
* LLLL_20071216_400-1000.JPG. It also converts dash delimiters to
* underscores and session delimiters from 'Of' to a single dash, i.e. 1Of2
* becomes 1-2. It makes sure all required fields exist and that no extra
* characters are present. These regular expressions were developed and
* tested using the Find and Replace function in UltraEdit32, which uses the
* Boost C++ Regex Library. This solution builds upon the RE solutions
* submitted by Ian Binnie.
*
* Author:  Jeff Roberson
* Date:    17-Dec-07
* Updated: 03-Dec-08
*****************************************************************************
* regex #1: Step 1 of 2
*****************************************************************************
Given:
  A filename containing a date, in either of the following two structures:

    LLLL-DDMMYY-XOfX.JPG   or   LLLL_DDMMYY_X-X.JPG

  where valid months are 01 to 12, valid days are 01 to 31 and valid years
  are 00 to 99. Each of the session count fields ('X') may have more than
  one digit, and all of its digits are to be preserved.

Find:
  A strict regex search pattern which matches only filenames with the given
  structure, and a replacement string which swaps the day and year fields
  and extends the year field from 2 to 4 digits, giving:

    LLLL_20YYMMDD_X-X.JPG

  Note that years 10 through 99 will erroneously become 2010 through 2099;
  regex #2 corrects this.

Solution:
  First we chop up the task into its components and assign the captured
  groups to substitution variables $1 through $7:

  STEP VAR  FIELD       SOURCE STRING        REGULAR EXPRESSION
  ------------------------------------------------------------------------
  1    $1 = pre:        'LLLL-' or 'LLLL_'   '^([A-Z]{4})[-_]'
  2    $2 = day:        'DD'                 '(0[1-9]|[12]\d|3[01])'
  3    $3 = month:      'MM'                 '(0[1-9]|1[012])'
  4    $4 = year:       'YY'                 '(\d\d)'
  5    $5 = firstdig:   '-X' or '_X'         '[-_](\d+)'
  6    $6 = separator:  '-' or 'Of'          '(-|Of)'
  7    $7 = post:       'X.JPG'              '(\d+\.JPG)$'

  STEP VERBOSE DESCRIPTION OF REGULAR EXPRESSION INTERPRETATION
  ------------------------------------------------------------------------
  1    match the beginning of string position followed by (exactly four
       uppercase letters)=$1 followed by either a dash or an underscore
  2    match (a '0' followed by one of '1'-'9', OR a '1' or '2' followed by
       a digit '0'-'9', OR a '3' followed by a '0' or '1')=$2
  3    match (a '0' followed by one of '1'-'9', OR a '1' followed by a '0',
       '1' or '2')=$3
  4    match (a digit followed by another digit)=$4
  5    match a dash or an underscore followed by (one or more digits)=$5
  6    match (either a dash OR the letters 'O' then 'f')=$6
  7    match (one or more digits, then a dot '.', then the uppercase letters
       'J', 'P', then 'G')=$7 followed by the end of string position

  Final regex #1 =
    '^([A-Z]{4})[-_](0[1-9]|[12]\d|3[01])(0[1-9]|1[012])(\d\d)[-_](\d+)(-|Of)(\d+\.JPG)$'

  Replacement string #1 = '$1_20$4$3$2_$5-$7'

  Note that $6 (the separator) is captured but not used in the replacement;
  a single dash is always written in its place.

  See the RegexBuddy description of regex #1 pattern here.
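The step 1 search and replace can be tried outside UltraEdit with a short
Python sketch. This is an illustration under assumptions, not the author's
tested setup: it uses Python's re module instead of the Boost library, the
helper name step_one is hypothetical, and Python writes the group references
as \g<N> where the text above writes $N.

```python
import re

# Regex #1 exactly as given in the text; Python's re accepts this syntax.
REGEX_1 = re.compile(
    r"^([A-Z]{4})[-_](0[1-9]|[12]\d|3[01])(0[1-9]|1[012])(\d\d)"
    r"[-_](\d+)(-|Of)(\d+\.JPG)$"
)

# Same field order as '$1_20$4$3$2_$5-$7', in Python's \g<N> notation.
REPLACEMENT_1 = r"\g<1>_20\g<4>\g<3>\g<2>_\g<5>-\g<7>"


def step_one(filename: str) -> str:
    """Apply regex #1 (hypothetical helper).

    Names that do not match the strict pattern are returned unchanged.
    """
    return REGEX_1.sub(REPLACEMENT_1, filename)


print(step_one("LLLL-311299-1Of2.JPG"))  # LLLL_20991231_1-2.JPG
print(step_one("ABCD_320101_1-9.JPG"))   # unchanged, day 32 is rejected
```

Because the pattern is anchored with ^ and $, a name with any missing or
extra character fails to match as a whole and is left untouched.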
**************************************************************************
* regex #2: Step 2 of 2
**************************************************************************
Given:
  A filename output by the previous regex, having the following structure:

    LLLL_2010MMDD_X-X.JPG   through   LLLL_2099MMDD_X-X.JPG

Find:
  A regex search pattern that finds years from 2010 through 2099 and
  changes them to 1910 through 1999, thereby correcting the wrong-century
  (Y2K) error introduced by the previous regex replacement.

Solution:

  STEP VAR  FIELD       SOURCE STRING        REGULAR EXPRESSION
  ------------------------------------------------------------------------
  1    $1 = prefix:     'LLLL'               '^([A-Z]{4})'
  2    $2 = century:    '_20'                '_(20)'
  3    $3 = year:       '1'-'9'              '([1-9])'

  STEP DESCRIPTION OF REGULAR EXPRESSION INTERPRETATION
  ------------------------------------------------------------------------
  1    match the beginning of string followed by (exactly four uppercase
       letters)=$1
  2    match an underscore followed by (a '2' followed by a '0')=$2
  3    match (one of '1'-'9', the first digit of the two-digit year)=$3

  Final regex #2 = '^([A-Z]{4})_(20)([1-9])'

  Replacement string #2 = '$1_19$3'

  See the RegexBuddy description of regex #2 pattern here.
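The century correction can be sketched the same way. As before, this is an
illustration only: it assumes Python's re module rather than Boost, the
pattern is assembled from the three component rows of the table above, the
helper name step_two is hypothetical, and \g<N> stands in for $N.

```python
import re

# Prefix, century and first year digit, joined from the component table.
REGEX_2 = re.compile(r"^([A-Z]{4})_(20)([1-9])")

# Same as '$1_19$3': keep the prefix, write 19, keep the first year digit.
REPLACEMENT_2 = r"\g<1>_19\g<3>"


def step_two(filename: str) -> str:
    """Apply regex #2 (hypothetical helper) to a name produced by regex #1."""
    return REGEX_2.sub(REPLACEMENT_2, filename)


print(step_two("LLLL_20991231_1-2.JPG"))  # LLLL_19991231_1-2.JPG
print(step_two("LDUC_20060928_1-3.JPG"))  # unchanged, '0' is not in [1-9]
```

Only the text up to the first year digit is matched and rewritten; the rest
of the filename is not part of the match, so it is carried over untouched.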
**************************************************************************
Notes:
  day range       = 01 to 31
  month range     = 01 to 12
  year range 19xx = 10 to 99
  year range 20xx = 00 to 09

  weaknesses:
  * allows February '30' and '31' (days are not checked against the month)

Test Data:
  A blank in the last column means regex #2 does not match and the name
  from regex #1 is already final.

  ORIGINAL                   AFTER REGEX #1              AFTER REGEX #2
  ---------------------------------------------------------------------------
  LLLL-311299-1Of2.JPG       LLLL_20991231_1-2.JPG       LLLL_19991231_1-2.JPG
  LDUC_280906_1-3.JPG        LDUC_20060928_1-3.JPG
  LDUC_280906_2-3.JPG        LDUC_20060928_2-3.JPG
  LDUC_280906_3-3.JPG        LDUC_20060928_3-3.JPG
  LDUC_290806_1-1.JPG        LDUC_20060829_1-1.JPG
  LDUC_180498_1-9.JPG        LDUC_20980418_1-9.JPG       LDUC_19980418_1-9.JPG

  Extended valid filespecs
  LDUC_311207_10-999.JPG     LDUC_20071231_10-999.JPG
  LLLL-020157-123Of4321.JPG  LLLL_20570102_123-4321.JPG  LLLL_19570102_123-4321.JPG

  Invalid filespecs check (regex #1 must not match any of these)
  ABCD_320101_1-9.JPG        Day too big
  ABCD_000101_1-9.JPG        Day too small
  ABCD_011301_1-9.JPG        Month too big
  ABCD_010001_1-9.JPG        Month too small
  BCD_010101_1-9.JPG         Missing prefix char
  ABCDE_010101_1-9.JPG       Extra prefix char
  ABCD_010101_1-9.PG         Missing post char
  ABCD_010101_1-9.JPEG       Extra post char
**************************************************************************
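The test data above can be checked mechanically. The following sketch runs
both steps in sequence over a selection of the listed rows. It rests on the
same assumptions as any port away from UltraEdit: Python's re module stands
in for Boost, \g<N> stands in for $N, and the names rename, EXPECTED and
INVALID are hypothetical.

```python
import re

REGEX_1 = re.compile(
    r"^([A-Z]{4})[-_](0[1-9]|[12]\d|3[01])(0[1-9]|1[012])(\d\d)"
    r"[-_](\d+)(-|Of)(\d+\.JPG)$"
)
REPLACEMENT_1 = r"\g<1>_20\g<4>\g<3>\g<2>_\g<5>-\g<7>"
REGEX_2 = re.compile(r"^([A-Z]{4})_(20)([1-9])")
REPLACEMENT_2 = r"\g<1>_19\g<3>"


def rename(filename):
    """Run step 1 then step 2; return None if regex #1 rejects the name."""
    if REGEX_1.match(filename) is None:
        return None
    after_step_one = REGEX_1.sub(REPLACEMENT_1, filename)
    return REGEX_2.sub(REPLACEMENT_2, after_step_one)


# Original name -> final name, taken from the test data table.
EXPECTED = {
    "LLLL-311299-1Of2.JPG": "LLLL_19991231_1-2.JPG",
    "LDUC_280906_1-3.JPG": "LDUC_20060928_1-3.JPG",
    "LDUC_290806_1-1.JPG": "LDUC_20060829_1-1.JPG",
    "LDUC_180498_1-9.JPG": "LDUC_19980418_1-9.JPG",
    "LDUC_311207_10-999.JPG": "LDUC_20071231_10-999.JPG",
    "LLLL-020157-123Of4321.JPG": "LLLL_19570102_123-4321.JPG",
}

# The invalid filespecs from the table; each must be rejected by regex #1.
INVALID = [
    "ABCD_320101_1-9.JPG",   # day too big
    "ABCD_000101_1-9.JPG",   # day too small
    "ABCD_011301_1-9.JPG",   # month too big
    "ABCD_010001_1-9.JPG",   # month too small
    "BCD_010101_1-9.JPG",    # missing prefix char
    "ABCDE_010101_1-9.JPG",  # extra prefix char
    "ABCD_010101_1-9.PG",    # missing post char
    "ABCD_010101_1-9.JPEG",  # extra post char
]

for original, final in EXPECTED.items():
    assert rename(original) == final, original
for name in INVALID:
    assert rename(name) is None, name
print("all listed test rows behave as described")
```

Returning None for a rejected name, instead of the unchanged string, is a
design choice of this sketch so that a caller can tell "not renamed" apart
from "renamed to the same text".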