Regular Expression URI Validation

© Jeff Roberson
Created: 2009-Mar-02
Edited: 2012-Oct-06

Introduction

The Internet Engineering Task Force (IETF) Request for Comments (RFC) document number 3986, titled "Uniform Resource Identifier (URI): Generic Syntax" (RFC3986), is the standard that defines the precise syntax of every component of a valid generic Uniform Resource Identifier (URI). This article provides a set of regular expressions (regexes) that describe each of those components accurately, clearly and efficiently.

Core Rules (RFC4234)

To define the syntax of the various URI components, RFC3986 uses the Augmented Backus-Naur Form (ABNF) notation defined in RFC4234. Table 1 lists the (trivial) core rules from RFC4234 that RFC3986 relies on.

Table 1. Core ABNF syntax rules from RFC4234
ABNF rule name
ABNF rule syntax [ ; comments ]
Regular Expression rule syntax
ALPHA
%x41-5A / %x61-7A                         ; A-Z / a-z
[A-Za-z]
CR
%x0D                                      ; carriage return
\r
DIGIT
%x30-39                                   ; 0-9
[0-9]
DQUOTE
%x22                                      ; " (Double Quote)
"
HEXDIG
DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
[0-9A-Fa-f]
LF
%x0A                                      ; linefeed
\n
SP
%x20                                      ; space
\x20
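
To make the mapping in Table 1 concrete, here is a minimal Python sketch that stores each core rule's regex and checks single characters against it. The names `CORE_RULES` and `matches_rule` are illustrative and do not come from either RFC. Note that ABNF literal strings are case-insensitive, which is why the HEXDIG regex accepts lowercase `a-f` as well.

```python
import re

# Regexes for the core ABNF rules, copied from Table 1.
# The dictionary and function names are illustrative assumptions.
CORE_RULES = {
    "ALPHA": r"[A-Za-z]",
    "CR": r"\r",
    "DIGIT": r"[0-9]",
    "DQUOTE": r'"',
    "HEXDIG": r"[0-9A-Fa-f]",
    "LF": r"\n",
    "SP": r"\x20",
}


def matches_rule(rule_name: str, char: str) -> bool:
    """Return True if the given text fully matches the named core rule."""
    return re.fullmatch(CORE_RULES[rule_name], char) is not None
```

For example, `matches_rule("HEXDIG", "f")` is true while `matches_rule("HEXDIG", "g")` is not.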

URI Rules (RFC3986)

RFC3986 Appendix A lists all of the rules that make up the syntax of a valid URI. Each rule has a name and an ABNF syntax definition. Table 2 reproduces every one of these rules alongside a regular expression that implements it. For non-trivial rules, the regex is presented in verbose free-spacing mode, with comments (taken from the ABNF wording) that describe each major component of the regex. Trivial regexes are presented in ordinary, non-verbose form.
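
As a rough illustration of how a free-spacing regex can be used, the Python sketch below compiles the IPv4address pattern with `re.VERBOSE` and applies it with `fullmatch`, so the whole string must conform. The Python names are assumptions made for this example, and the comments inside the pattern are reworded. In free-spacing mode a literal space or `#` outside a character class has to be escaped, which is why the tables write SP as `\x20`.

```python
import re

# IPv4address in free-spacing form, using the dec-octet regex from the table.
IPV4ADDRESS = r"""
(?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3}  # three times: dec-octet "."
    (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)         # final dec-octet
"""

# re.VERBOSE tells the engine to skip unescaped whitespace and "#" comments
# that appear outside character classes.
ipv4_re = re.compile(IPV4ADDRESS, re.VERBOSE)

# Hypothetical stricter dec-octet (not from the table): no leading zeros.
STRICT_DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
strict_ipv4_re = re.compile(rf"(?:{STRICT_DEC_OCTET}\.){{3}}{STRICT_DEC_OCTET}")


def is_ipv4address(text: str) -> bool:
    """Return True if the entire string matches the table's IPv4address regex."""
    return ipv4_re.fullmatch(text) is not None


def is_strict_ipv4address(text: str) -> bool:
    """Same check using the stricter, hypothetical dec-octet variant."""
    return strict_ipv4_re.fullmatch(text) is not None
```

One thing this sketch shows: the dec-octet pattern as written also accepts leading zeros (for example `01.02.03.04`), which the ABNF alternatives for dec-octet do not generate. The `STRICT_DEC_OCTET` variant is one possible tightening if strict conformance matters. It is a suggestion, not part of the table.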


Table 2. URI ABNF syntax rules from RFC3986
ABNF rule name
ABNF rule syntax [ ; comments ]
Regular Expression rule syntax
gen-delims
":" / "/" / "?" / "#" / "[" / "]" / "@"
[:/?#[\]@]
sub-delims
"!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
[!$&'()*+,;=]
reserved
gen-delims / sub-delims
[:/?#[\]@!$&'()*+,;=]
unreserved
ALPHA / DIGIT / "-" / "." / "_" / "~"
[A-Za-z0-9\-._~]
pct-encoded
"%" HEXDIG HEXDIG
%[0-9A-Fa-f]{2}
dec-octet
  DIGIT                 ; 0-9
/ %x31-39 DIGIT         ; 10-99
/ "1" 2DIGIT            ; 100-199
/ "2" %x30-34 DIGIT     ; 200-249
/ "25" %x30-35          ; 250-255
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
IPv4address
dec-octet "." dec-octet "." dec-octet "." dec-octet
# RFC-3986 URI component:  IPv4address
(?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3}  # 3( dec-octet "." )
    (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)         # dec-octet
h16
1*4HEXDIG
[0-9A-Fa-f]{1,4}
ls32
( h16 ":" h16 ) / IPv4address
# RFC-3986 URI component:  ls32
(?:                                                    # (
  [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4}                  # ( h16 ":" h16 )
| (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3}  # / IPv4address
      (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
)                                                      # )
IPv6address
                             6( h16 ":" ) ls32
/                       "::" 5( h16 ":" ) ls32
/ [               h16 ] "::" 4( h16 ":" ) ls32
/ [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32
/ [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32
/ [ *3( h16 ":" ) h16 ] "::"    h16 ":"   ls32
/ [ *4( h16 ":" ) h16 ] "::"              ls32
/ [ *5( h16 ":" ) h16 ] "::"              h16
/ [ *6( h16 ":" ) h16 ] "::"
# RFC-3986 URI component:  IPv6address
(?:
  (?:                                                    (?:[0-9A-Fa-f]{1,4}:){6}
  |                                                   :: (?:[0-9A-Fa-f]{1,4}:){5}
  | (?:                            [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){4}
  | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){3}
  | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){2}
  | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}:
  | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? ::
  ) (?:                                                    # ls32
      [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4}                  # factored out
    | (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3}  # from first 7 lines
          (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)         # of ABNF rule above.
    )
|   (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}
|   (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? ::
)
IPvFuture
"v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
[Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+
IP-literal
"[" ( IPv6address / IPvFuture  ) "]"
# RFC-3986 URI component:  IP-literal
\[                                                          # "["
(?:                                                         # (
  (?:                                                       # IPv6address
    (?:                                                    (?:[0-9A-Fa-f]{1,4}:){6}
    |                                                   :: (?:[0-9A-Fa-f]{1,4}:){5}
    | (?:                            [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){4}
    | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){3}
    | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){2}
    | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}:
    | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? ::
    ) (?:
        [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4}
      | (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3}
            (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
      )
  |   (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}
  |   (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? ::
  )
| [Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+           # / IPvFuture
)                                                           # )
\]                                                          # "]"
reg-name
*( unreserved / pct-encoded / sub-delims )
(?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*
host
IP-literal / IPv4address / reg-name
# RFC-3986 URI component:  host
(?:                                                           # (
  \[                                                          # IP-literal
  (?:
    (?:
      (?:                                                    (?:[0-9A-Fa-f]{1,4}:){6}
      |                                                   :: (?:[0-9A-Fa-f]{1,4}:){5}
      | (?:                            [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){4}
      | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){3}
      | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){2}
      | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}:
      | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? ::
      ) (?:
          [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4}
        | (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3}
              (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
        )
    |   (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}
    |   (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? ::
    )
  | [Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+
  )
  \]
| (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}           # / IPv4address
     (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
| (?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*            # / reg-name
)                                                             # )
userinfo
*( unreserved / pct-encoded / sub-delims / ":" )
(?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})*
port
*DIGIT
[0-9]*
authority
[ userinfo "@" ] host [ ":" port ]
# RFC-3986 URI component:  authority
(?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})* @)?     # [ userinfo "@" ]
(?:                                                           # host
  \[
  (?:
    (?:
      (?:                                                    (?:[0-9A-Fa-f]{1,4}:){6}
      |                                                   :: (?:[0-9A-Fa-f]{1,4}:){5}
      | (?:                            [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){4}
      | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){3}
      | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){2}
      | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}:
      | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? ::
      ) (?:
          [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4}
        | (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3}
              (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
        )
    |   (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}
    |   (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? ::
    )
  | [Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+
  )
  \]
| (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
     (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
| (?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*
)
(?: : [0-9]* )?                                               # [ ":" port ]
pchar
unreserved / pct-encoded / sub-delims / ":" / "@"
(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})
segment
*pchar
(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})*
segment-nz
1*pchar
(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+
segment-nz-nc
1*( unreserved / pct-encoded / sub-delims / "@" )
               ; non-zero-length segment without any colon ":"
(?:[A-Za-z0-9\-._~!$&'()*+,;=@]|%[0-9A-Fa-f]{2})+
path-abempty
*( "/" segment )
# RFC-3986 URI component:  path-abempty
  (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*  # *( "/" segment )
path-absolute
"/" [ segment-nz *( "/" segment ) ]
# RFC-3986 URI component:  path-absolute
/                                                             # "/"
(?:    (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+     # [ segment-nz
  (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*  # *( "/" segment )
)?                                                            # ]
path-noscheme
segment-nz-nc *( "/" segment )
# RFC-3986 URI component:  path-noscheme
       (?:[A-Za-z0-9\-._~!$&'()*+,;=@] |%[0-9A-Fa-f]{2})+     # segment-nz-nc
  (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*  # *( "/" segment )
path-rootless
segment-nz *( "/" segment )
# RFC-3986 URI component:  path-rootless
       (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+     # segment-nz
  (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*  # *( "/" segment )
path-empty
0<pchar>
.{0}
path
  path-abempty    ; begins with "/" or is empty
/ path-absolute   ; begins with "/" but not "//"
/ path-noscheme   ; begins with a non-colon segment
/ path-rootless   ; begins with a segment
/ path-empty      ; zero characters
# RFC-3986 URI component:  path
(?:                                                             # (
    (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*  #   path-abempty
| /                                                             # / path-absolute
  (?:    (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+
    (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
  )?
|        (?:[A-Za-z0-9\-._~!$&'()*+,;=@] |%[0-9A-Fa-f]{2})+     # / path-noscheme
    (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
|        (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+     # / path-rootless
    (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
|                                                               # / path-empty
)                                                               # )
hier-part
"//" authority path-abempty
/ path-absolute
/ path-rootless
/ path-empty
# RFC-3986 URI component:  hier-part
(?: //                                                          # ( "//"
  (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})* @)?     # authority
  (?:
    \[
    (?:
      (?:
        (?:                                                    (?:[0-9A-Fa-f]{1,4}:){6}
        |                                                   :: (?:[0-9A-Fa-f]{1,4}:){5}
        | (?:                            [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){4}
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){3}
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){2}
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}:
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? ::
        ) (?:
            [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4}
          | (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3}
                (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
          )
      |   (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}
      |   (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? ::
      )
    | [Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+
    )
    \]
  | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
       (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
  | (?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*
  )
  (?: : [0-9]* )?
  (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*    # path-abempty
| /                                                             # / path-absolute
  (?:    (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+
    (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
  )?
|        (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+     # / path-rootless
    (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
|                                                               # / path-empty
)                                                               # )
scheme
ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
[A-Za-z][A-Za-z0-9+\-.]*
query
*( pchar / "/" / "?" )
(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*
absolute-URI
scheme ":" hier-part [ "?" query ]
# RFC-3986 URI component:  absolute-URI
[A-Za-z][A-Za-z0-9+\-.]* :                                      # scheme ":"
(?: //                                                          # hier-part
  (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})* @)?
  (?:
    \[
    (?:
      (?:
        (?:                                                    (?:[0-9A-Fa-f]{1,4}:){6}
        |                                                   :: (?:[0-9A-Fa-f]{1,4}:){5}
        | (?:                            [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){4}
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){3}
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){2}
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}:
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? ::
        ) (?:
            [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4}
          | (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3}
                (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
          )
      |   (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}
      |   (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? ::
      )
    | [Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+
    )
    \]
  | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
       (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
  | (?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*
  )
  (?: : [0-9]* )?
  (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
| /
  (?:    (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+
    (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
  )?
|        (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+
    (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
|
)
(?:\? (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )?   # [ "?" query ]
fragment
*( pchar / "/" / "?" )
(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*
URI
scheme ":" hier-part [ "?" query ] [ "#" fragment ]
# RFC-3986 URI component:  URI
[A-Za-z][A-Za-z0-9+\-.]* :                                      # scheme ":"
(?: //                                                          # hier-part
  (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})* @)?
  (?:
    \[
    (?:
      (?:
        (?:                                                    (?:[0-9A-Fa-f]{1,4}:){6}
        |                                                   :: (?:[0-9A-Fa-f]{1,4}:){5}
        | (?:                            [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){4}
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){3}
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){2}
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}:
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? ::
        ) (?:
            [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4}
          | (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3}
                (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
          )
      |   (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}
      |   (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? ::
      )
    | [Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+
    )
    \]
  | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
       (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
  | (?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*
  )
  (?: : [0-9]* )?
  (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
| /
  (?:    (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+
    (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
  )?
|        (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+
    (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
|
)
(?:\? (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )?   # [ "?" query ]
(?:\# (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )?   # [ "#" fragment ]
relative-part
"//" authority path-abempty
/ path-absolute
/ path-noscheme
/ path-empty
# RFC-3986 URI component:  relative-part
(?: //                                                          # ( "//"
  (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})* @)?     # authority
  (?:
    \[
    (?:
      (?:
        (?:                                                    (?:[0-9A-Fa-f]{1,4}:){6}
        |                                                   :: (?:[0-9A-Fa-f]{1,4}:){5}
        | (?:                            [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){4}
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){3}
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){2}
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}:
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? ::
        ) (?:
            [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4}
          | (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3}
                (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
          )
      |   (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}
      |   (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? ::
      )
    | [Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+
    )
    \]
  | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
       (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
  | (?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*
  )
  (?: : [0-9]* )?
  (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*    # path-abempty
| /                                                             # / path-absolute
  (?:    (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+
    (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
  )?
|        (?:[A-Za-z0-9\-._~!$&'()*+,;=@] |%[0-9A-Fa-f]{2})+     # / path-noscheme
    (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
|                                                               # / path-empty
)                                                               # )
relative-ref
relative-part [ "?" query ] [ "#" fragment ]
# RFC-3986 URI component:  relative-ref
(?: //                                                          # relative-part
  (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})* @)?
  (?:
    \[
    (?:
      (?:
        (?:                                                    (?:[0-9A-Fa-f]{1,4}:){6}
        |                                                   :: (?:[0-9A-Fa-f]{1,4}:){5}
        | (?:                            [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){4}
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){3}
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){2}
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}:
        | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? ::
        ) (?:
            [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4}
          | (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3}
                (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
          )
      |   (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}
      |   (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? ::
      )
    | [Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+
    )
    \]
  | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
       (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
  | (?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*
  )
  (?: : [0-9]* )?
  (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
| /
  (?:    (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+
    (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
  )?
|        (?:[A-Za-z0-9\-._~!$&'()*+,;=@] |%[0-9A-Fa-f]{2})+
    (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
|
)
(?:\? (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )?   # [ "?" query ]
(?:\# (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )?   # [ "#" fragment ]
URI-reference
URI / relative-ref
# RFC-3986 URI component:  URI-reference
(?:                                                               # (
  [A-Za-z][A-Za-z0-9+\-.]* :                                      # URI
  (?: //
    (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})* @)?
    (?:
      \[
      (?:
        (?:
          (?:                                                    (?:[0-9A-Fa-f]{1,4}:){6}
          |                                                   :: (?:[0-9A-Fa-f]{1,4}:){5}
          | (?:                            [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){4}
          | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){3}
          | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){2}
          | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}:
          | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? ::
          ) (?:
              [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4}
            | (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3}
                  (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
            )
        |   (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}
        |   (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? ::
        )
      | [Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+
      )
      \]
    | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
         (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
    | (?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*
    )
    (?: : [0-9]* )?
    (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
  | /
    (?:    (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+
      (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
    )?
  |        (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+
      (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
  |
  )
  (?:\? (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )?
  (?:\# (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )?
| (?: //                                                          # / relative-ref
    (?: (?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})* @)?
    (?:
      \[
      (?:
        (?:
          (?:                                                    (?:[0-9A-Fa-f]{1,4}:){6}
          |                                                   :: (?:[0-9A-Fa-f]{1,4}:){5}
          | (?:                            [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){4}
          | (?: (?:[0-9A-Fa-f]{1,4}:){0,1} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){3}
          | (?: (?:[0-9A-Fa-f]{1,4}:){0,2} [0-9A-Fa-f]{1,4})? :: (?:[0-9A-Fa-f]{1,4}:){2}
          | (?: (?:[0-9A-Fa-f]{1,4}:){0,3} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}:
          | (?: (?:[0-9A-Fa-f]{1,4}:){0,4} [0-9A-Fa-f]{1,4})? ::
          ) (?:
              [0-9A-Fa-f]{1,4} : [0-9A-Fa-f]{1,4}
            | (?: (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) \.){3}
                  (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
            )
        |   (?: (?:[0-9A-Fa-f]{1,4}:){0,5} [0-9A-Fa-f]{1,4})? ::    [0-9A-Fa-f]{1,4}
        |   (?: (?:[0-9A-Fa-f]{1,4}:){0,6} [0-9A-Fa-f]{1,4})? ::
        )
      | [Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+
      )
      \]
    | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
         (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
    | (?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*
    )
    (?: : [0-9]* )?
    (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
  | /
    (?:    (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})+
      (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
    )?
  |        (?:[A-Za-z0-9\-._~!$&'()*+,;=@] |%[0-9A-Fa-f]{2})+
      (?:/ (?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})* )*
  |
  )
  (?:\? (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )?
  (?:\# (?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})* )?
)                                                                       # )
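
The larger regexes in Table 2 are the smaller ones substituted into one another. The Python sketch below rebuilds URI-reference that way, from named fragments copied from the table, instead of pasting the fully expanded pattern. All Python names (`H16`, `HIER_PART`, `is_uri_reference`, and so on) are illustrative assumptions, and the composed pattern is intended to be equivalent to the expanded regex above, not a replacement for it.

```python
import re

# Fragments copied from the Table 2 regexes (non-verbose form).
H16 = r"[0-9A-Fa-f]{1,4}"
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
IPV4 = rf"(?:{DEC_OCTET}\.){{3}}{DEC_OCTET}"
LS32 = rf"(?:{H16}:{H16}|{IPV4})"
IPV6 = (
    r"(?:"
    rf"(?:(?:{H16}:){{6}}"
    rf"|::(?:{H16}:){{5}}"
    rf"|(?:{H16})?::(?:{H16}:){{4}}"
    rf"|(?:(?:{H16}:){{0,1}}{H16})?::(?:{H16}:){{3}}"
    rf"|(?:(?:{H16}:){{0,2}}{H16})?::(?:{H16}:){{2}}"
    rf"|(?:(?:{H16}:){{0,3}}{H16})?::{H16}:"
    rf"|(?:(?:{H16}:){{0,4}}{H16})?::"
    rf"){LS32}"  # ls32 factored out of the first seven alternatives
    rf"|(?:(?:{H16}:){{0,5}}{H16})?::{H16}"
    rf"|(?:(?:{H16}:){{0,6}}{H16})?::"
    r")"
)
IPVFUTURE = r"[Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+"
IP_LITERAL = rf"\[(?:{IPV6}|{IPVFUTURE})\]"
REG_NAME = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=]|%[0-9A-Fa-f]{2})*"
HOST = rf"(?:{IP_LITERAL}|{IPV4}|{REG_NAME})"
USERINFO = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:]|%[0-9A-Fa-f]{2})*"
AUTHORITY = rf"(?:{USERINFO}@)?{HOST}(?::[0-9]*)?"
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
SEGMENT_NZ_NC = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=@]|%[0-9A-Fa-f]{2})+"
PATH_ABEMPTY = rf"(?:/{PCHAR}*)*"
PATH_ABSOLUTE = rf"/(?:{PCHAR}+{PATH_ABEMPTY})?"
PATH_ROOTLESS = rf"{PCHAR}+{PATH_ABEMPTY}"
PATH_NOSCHEME = rf"{SEGMENT_NZ_NC}{PATH_ABEMPTY}"
# The trailing empty alternative in each group is path-empty.
HIER_PART = rf"(?://{AUTHORITY}{PATH_ABEMPTY}|{PATH_ABSOLUTE}|{PATH_ROOTLESS}|)"
RELATIVE_PART = rf"(?://{AUTHORITY}{PATH_ABEMPTY}|{PATH_ABSOLUTE}|{PATH_NOSCHEME}|)"
SCHEME = r"[A-Za-z][A-Za-z0-9+\-.]*"
QUERY = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*"
FRAGMENT = QUERY  # fragment and query share the same syntax
URI = rf"{SCHEME}:{HIER_PART}(?:\?{QUERY})?(?:#{FRAGMENT})?"
RELATIVE_REF = rf"{RELATIVE_PART}(?:\?{QUERY})?(?:#{FRAGMENT})?"
URI_REFERENCE_RE = re.compile(rf"(?:{URI}|{RELATIVE_REF})")


def is_uri_reference(text: str) -> bool:
    """Return True if the entire string matches the URI-reference rule."""
    return URI_REFERENCE_RE.fullmatch(text) is not None
```

Because relative-ref allows path-empty, the empty string is itself a valid URI-reference, so `is_uri_reference("")` is true. Callers that need an absolute URI could compile `URI` on its own instead.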

Miscellaneous Table Notes

Summary

Regular expressions are a great tool for describing (and validating) the precise syntax of Uniform Resource Identifiers. When all users adhere to it accurately and consistently, the URI syntax brings order to the addressing system used to locate each piece of information on the global Internet. Hopefully, this article will help web and network developers implement their resource addressing requirements properly (and painlessly), by reducing the time they spend formulating accurate and efficient regular expressions to parse and validate URIs.

Happy regexing!
