© Jeff Roberson
Created: 2009-Mar-02
Edited: 2012-Oct-06
Revision History
The Internet Engineering Task Force's (IETF) Request for Comments (RFC) document number 3986 titled: "Uniform Resource Identifier (URI): Generic Syntax" (RFC3986), is a standard which describes the precise syntax of all components that make up a valid generic Uniform Resource Identifier (URI). This article seeks to provide a set of regular expressions (regex) which accurately, clearly and efficiently describe all the various components of a URI.
To define the syntax of the various URI components, RFC3986 utilizes the Augmented Backus-Naur Form (ABNF) notation as defined in RFC4234. Table 1. lists all the (trivial) core rules defined in RFC4234 which are used by RFC3986.
ABNF rule name | ABNF rule syntax [ ; comments ] |
---|---|
Regular Expression rule syntax | |
ALPHA | %x41-5A / %x61-7A ; A-Z / a-z |
[A-Za-z] | |
CR | %x0D ; carriage return |
\r | |
DIGIT | %x30-39 ; 0-9 |
[0-9] | |
DQUOTE | 0x22 ; " (Double Quote) |
" | |
HEXDIG | DIGIT / "A" / "B" / "C" / "D" / "E" / "F" |
[0-9A-Fa-f] | |
LF | 0x0A ; linefeed |
\n | |
SP | 0x20 ; space |
\x20 |
RFC3986 Appendix A describes all of the rules which make up the syntax for a valid URI. Each rule has a name and an ABNF syntax definition. All of these rules from RFC3986 are reproduced in Table 2 along with a regular expression implementation for each rule. For non-trivial URI ABNF rules, the regex is presented in verbose free-spacing mode with comments (taken from the ABNF syntax wording, which describe each major component of the regex). Trivial regexes are presented in non-verbose native regex mode.
SOURCE CODE SNIPPETS: With Javascript enabled, you can double-click on any regular expression to generate a TEXTAREA box containing a code snippet for one of several different languages. Double-click again to turn it off.
ABNF rule name | ABNF rule syntax [ ; comments ] |
---|---|
Regular Expression rule syntax | |
gen-delims | ":" / "/" / "?" / "#" / "[" / "]" / "@" |
[:/?#[\]@] | |
sub-delims | "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "=" |
[!$&'()*+,;=] | |
reserved | gen-delims / sub-delims |
[:/?#[\]@!$&'()*+,;=] | |
unreserved | ALPHA / DIGIT / "-" / "." / "_" / "~" |
[A-Za-z0-9\-._~] | |
pct-encoded | "%" HEXDIG HEXDIG |
%[0-9A-Fa-f]{2} | |
dec-octet | DIGIT ; 0-9 / %x31-39 DIGIT ; 10-99 / "1" 2DIGIT ; 100-199 / "2" %x30-34 DIGIT ; 200-249 / "25" %x30-35 ; 250-255 |
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) | |
IPv4address | dec-octet "." dec-octet "." dec-octet "." dec-octet |
# RFC-3986 URI component: IPv4address (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3} # dec-octet "." dec-octet "." (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) # dec-octet "." dec-octet | |
h16 | 1*4HEXDIG |
[0-9A-Fa-f]{1,4} | |
ls32 | ( h16 ":" h16 ) / IPv4address |
# RFC-3986 URI component: ls32 (?: # ( [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4} # ( h16 ":" h16 ) | (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3} # / IPv4address (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) ) # ) | |
IPv6address | 6( h16 ":" ) ls32 / "::" 5( h16 ":" ) ls32 / [ h16 ] "::" 4( h16 ":" ) ls32 / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32 / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32 / [ *3( h16 ":" ) h16 ] "::" h16 ":" ls32 / [ *4( h16 ":" ) h16 ] "::" ls32 / [ *5( h16 ":" ) h16 ] "::" h16 / [ *6( h16 ":" ) h16 ] "::" |
# RFC-3986 URI component: IPv6address (?: (?: (?:[0-9A-Fa-f]{1,4}:){6} | :: (?:[0-9A-Fa-f]{1,4}:){5} | (?: [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){4} | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){3} | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){2} | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4}: | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? :: ) (?: # ls32 [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4} # factored out | (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3} # from first 7 lines (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) # of ABNF rule above. ) | (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4} | (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? :: ) | |
IPvFuture | "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" ) |
[Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+ | |
IP-literal | "[" ( IPv6address / IPvFuture ) "]" |
# RFC-3986 URI component: IP-literal \[ # "[" (?: # ( (?: # IPv6address (?: (?:[0-9A-Fa-f]{1,4}:){6} | :: (?:[0-9A-Fa-f]{1,4}:){5} | (?: [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){4} | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){3} | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){2} | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4}: | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? :: ) (?: [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4} | (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) ) | (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4} | (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? :: ) | [Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+ # / IPvFuture ) # ) \] # "]" | |
reg-name | *( unreserved / pct-encoded / sub-delims ) |
(?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})* | |
host | IP-literal / IPv4address / reg-name |
# RFC-3986 URI component: host (?: # ( \[ # IP-literal (?: (?: (?: (?:[0-9A-Fa-f]{1,4}:){6} | :: (?:[0-9A-Fa-f]{1,4}:){5} | (?: [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){4} | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){3} | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){2} | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4}: | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? :: ) (?: [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4} | (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) ) | (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4} | (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? :: ) | [Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+ ) \] | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3} # / IPv4address (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) | (?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})* # / reg-name ) # ) | |
userinfo | *( unreserved / pct-encoded / sub-delims / ":" ) |
(?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})* | |
port | *DIGIT |
[0-9]* | |
authority | [ userinfo "@" ] host [ ":" port ] |
# RFC-3986 URI component: authority (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})* @)? # [ userinfo "@" ] (?: # host \[ (?: (?: (?: (?:[0-9A-Fa-f]{1,4}:){6} | :: (?:[0-9A-Fa-f]{1,4}:){5} | (?: [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){4} | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){3} | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){2} | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4}: | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? :: ) (?: [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4} | (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) ) | (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4} | (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? :: ) | [Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+ ) \] | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) | (?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})* ) (?: : [0-9]* )? # [ ":" port ] | |
pchar | unreserved / pct-encoded / sub-delims / ":" / "@" |
(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2}) | |
segment | *pchar |
(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* | |
segment-nz | 1*pchar |
(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+ | |
segment-nz-nc | 1*( unreserved / pct-encoded / sub-delims / "@" ) ; non-zero-length segment without any colon ":" |
(?:[A-Za-z0-9\-._~!$&'()*+,;=@]|%[0-9A-Fa-f]{2})+ | |
path-abempty | *( "/" segment ) |
# RFC-3986 URI component: path-abempty (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* # *( "/" segment ) | |
path-absolute | "/" [ segment-nz *( "/" segment ) ] |
# RFC-3986 URI component: path-absolute / # "/" (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+ # [ segment-nz (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* # *( "/" segment ) )? # ] | |
path-noscheme | segment-nz-nc *( "/" segment ) |
# RFC-3986 URI component: path-noscheme (?:[A-Za-z0-9\-._~!$&'()*+,;=@] |%[0-9A-Fa-f]{2})+ # segment-nz-nc (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* # *( "/" segment ) | |
path-rootless | segment-nz *( "/" segment ) |
# RFC-3986 URI component: path-rootless (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+ # segment-nz (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* # *( "/" segment ) | |
path-empty | 0pchar |
.{0} | |
path | path-abempty ; begins with "/" or is empty / path-absolute ; begins with "/" but not "//" / path-noscheme ; begins with a non-colon segment / path-rootless ; begins with a segment / path-empty ; zero characters |
# RFC-3986 URI component: path (?: # ( (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* # path-abempty | / # / path-absolute (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+ (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* )? | (?:[A-Za-z0-9\-._~!$&'()*+,;=@] |%[0-9A-Fa-f]{2})+ # / path-noscheme (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* | (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+ # / path-rootless (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* | # / path-empty ) # ) | |
hier-part | "//" authority path-abempty / path-absolute / path-rootless / path-empty |
# RFC-3986 URI component: hier-part (?: // # ( "//" (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})* @)? # authority (?: \[ (?: (?: (?: (?:[0-9A-Fa-f]{1,4}:){6} | :: (?:[0-9A-Fa-f]{1,4}:){5} | (?: [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){4} | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){3} | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){2} | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4}: | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? :: ) (?: [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4} | (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) ) | (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4} | (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? :: ) | [Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+ ) \] | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) | (?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})* ) (?: : [0-9]* )? (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* # path-abempty | / # / path-absolute (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+ (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* )? | (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+ # / path-rootless (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* | # / path-empty ) # ) | |
scheme | ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) |
[A-Za-z][A-Za-z0-9+\-.]* | |
query | *( pchar / "/" / "?" ) |
(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* | |
absolute-URI | scheme ":" hier-part [ "?" query ] |
# free-spacing mode regex for URI component: absolute-URI [A-Za-z][A-Za-z0-9+\-.]* : # scheme ":" (?: // # hier-part (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})* @)? (?: \[ (?: (?: (?: (?:[0-9A-Fa-f]{1,4}:){6} | :: (?:[0-9A-Fa-f]{1,4}:){5} | (?: [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){4} | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){3} | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){2} | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4}: | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? :: ) (?: [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4} | (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) ) | (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4} | (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? :: ) | [Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+ ) \] | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) | (?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})* ) (?: : [0-9]* )? (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* | / (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+ (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* )? | (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+ (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* | ) (?:\? (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )? # [ "?" query ] | |
fragment | *( pchar / "/" / "?" ) |
(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* | |
URI | scheme ":" hier-part [ "?" query ] [ "#" fragment ] |
# RFC-3986 URI component: URI [A-Za-z][A-Za-z0-9+\-.]* : # scheme ":" (?: // # hier-part (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})* @)? (?: \[ (?: (?: (?: (?:[0-9A-Fa-f]{1,4}:){6} | :: (?:[0-9A-Fa-f]{1,4}:){5} | (?: [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){4} | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){3} | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){2} | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4}: | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? :: ) (?: [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4} | (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) ) | (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4} | (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? :: ) | [Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+ ) \] | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) | (?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})* ) (?: : [0-9]* )? (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* | / (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+ (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* )? | (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+ (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* | ) (?:\? (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )? # [ "?" query ] (?:\# (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )? # [ "#" fragment ] | |
relative-part | "//" authority path-abempty / path-absolute / path-noscheme / path-empty |
# RFC-3986 URI component: relative-part (?: // # ( "//" (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})* @)? # authority (?: \[ (?: (?: (?: (?:[0-9A-Fa-f]{1,4}:){6} | :: (?:[0-9A-Fa-f]{1,4}:){5} | (?: [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){4} | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){3} | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){2} | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4}: | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? :: ) (?: [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4} | (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) ) | (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4} | (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? :: ) | [Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+ ) \] | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) | (?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})* ) (?: : [0-9]* )? (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* # path-abempty | / # / path-absolute (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+ (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* )? | (?:[A-Za-z0-9\-._~!$&'()*+,;=@] |%[0-9A-Fa-f]{2})+ # / path-noscheme (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* | # / path-empty ) # ) | |
relative-ref | relative-part [ "?" query ] [ "#" fragment ] |
# RFC-3986 URI component: relative-ref (?: // # relative-part (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})* @)? (?: \[ (?: (?: (?: (?:[0-9A-Fa-f]{1,4}:){6} | :: (?:[0-9A-Fa-f]{1,4}:){5} | (?: [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){4} | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){3} | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){2} | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4}: | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? :: ) (?: [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4} | (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) ) | (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4} | (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? :: ) | [Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+ ) \] | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) | (?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})* ) (?: : [0-9]* )? (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* | / (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+ (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* )? | (?:[A-Za-z0-9\-._~!$&'()*+,;=@] |%[0-9A-Fa-f]{2})+ (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* | ) (?:\? (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )? # [ "?" query ] (?:\# (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )? # [ "#" fragment ] | |
URI-reference | URI / relative-ref |
# RFC-3986 URI component: URI-reference (?: # ( [A-Za-z][A-Za-z0-9+\-.]* : # URI (?: // (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})* @)? (?: \[ (?: (?: (?: (?:[0-9A-Fa-f]{1,4}:){6} | :: (?:[0-9A-Fa-f]{1,4}:){5} | (?: [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){4} | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){3} | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){2} | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4}: | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? :: ) (?: [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4} | (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) ) | (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4} | (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? :: ) | [Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+ ) \] | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) | (?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})* ) (?: : [0-9]* )? (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* | / (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+ (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* )? | (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+ (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* | ) (?:\? (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )? (?:\# (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )? | (?: // # / relative-ref (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})* @)? (?: \[ (?: (?: (?: (?:[0-9A-Fa-f]{1,4}:){6} | :: (?:[0-9A-Fa-f]{1,4}:){5} | (?: [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){4} | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){3} | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){2} | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4}: | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? :: ) (?: [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4} | (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) ) | (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? :: [0-9A-Fa-f]{1,4} | (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? :: ) | [Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+ ) \] | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) | (?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})* ) (?: : [0-9]* )? (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* | / (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+ (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* )? | (?:[A-Za-z0-9\-._~!$&'()*+,;=@] |%[0-9A-Fa-f]{2})+ (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )* | ) (?:\? (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )? (?:\# (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )? ) # ) |
Regular Expressions are a great tool to help describe (and validate) the precise text semantics which uniquely describe Universal Resource Identifiers. When accurately and consistently adhered to by all users, this URI methodology brings order to the addressing system used to locate each piece of information found on the global internet. Hopefully, this article will help web/network developers properly (and painlessly) implement their networking resource addressing requirements, by reducing their time spent formulating accurate and efficient regular expressions used to parse and validate URIs.
Happy regexing!