URL Linkification (HTTP/FTP).

Version 20101010_1000

PHP version: linkify.php. JavaScript version: linkify.html.

Click on paragraphs below to apply JavaScript linkification to un-linkified URLs.

Well-formed URL syntax (required to match 100% correctly):

Plain URLs (not delimited):
foo http://example.com bar...
foo http://example.com:80 bar...
foo http://example.com:80/path/ bar...
foo http://example.com:80/path/file.txt bar...
foo http://example.com:80/path/file.txt?query=val&var2=val2 bar...
foo http://example.com:80/path/file.txt?query=val&var2=val2#fragment bar...
foo http://example.com/(file's_name.txt) bar... (with ' and (parentheses))
foo http://[2001:0db8:85a3:08d3:1319:8a2e:0370:7348] bar... ([IPv6 literal])
foo http://[2001:0db8:85a3:08d3:1319:8a2e:0370:7348]/file.txt bar... ([IPv6] with path)

URLs ending with [.!',;:?] punctuation:
foo http://example.com. bar...
foo http://example.com! bar...
foo http://example.com' bar...
foo http://example.com, bar...
foo http://example.com; bar...
foo http://example.com: bar...
foo http://example.com? bar...

URLs within matching "()[]{}<>" delimiters:
foo (http://example.com) bar...
foo [http://example.com] bar...
foo {http://example.com} bar...
foo <http://example.com> bar... (encoded as: &lt;URL&gt;)
foo <http://example.com> bar... (encoded as: &#60;URL&#62;)
foo <http://example.com> bar... (encoded as: &#x3C;URL&#x3E;)
foo (http://example.com/(path)/file.txt) bar... (with inside (parentheses))
foo (http://example.com/path/(file.txt)) bar... (with ending (parentheses))
foo [http://[2001:0db8:85a3:08d3:1319:8a2e:0370:7348]] bar... ([IPv6 literal])

URLs within matching "()[]{}<>" delimiters ending with [.!',;:?] punctuation:
foo (http://example.com.) bar...
foo [http://example.com!] bar...
foo {http://example.com'} bar...
foo <http://example.com,> bar... (encoded as: &lt;URL&gt;)
foo <http://example.com;> bar... (encoded as: &#60;URL&#62;)
foo <http://example.com:> bar... (encoded as: &#x3C;URL&#x3E;)
foo (http://example.com/(path)/file.txt?) bar... (with inside (parentheses))
foo (http://example.com/path/(file.txt).) bar... (with ending (parentheses))
foo [http://[2001:0db8:85a3:08d3:1319:8a2e:0370:7348]!] bar... ([IPv6 literal])

URLs within matching quotes:
foo 'http://example.com' bar...
foo 'http://example.com' bar... (encoded as: &apos;URL&apos; Note 1.)
foo 'http://example.com' bar... (encoded as: &#39;URL&#39;)
foo 'http://example.com' bar... (encoded as: &#039;URL&#039;)
foo 'http://example.com' bar... (encoded as: &#x27;URL&#x27;)
foo 'http://example.com' bar... (encoded as: &#x027;URL&#x027;)
foo "http://example.com" bar...
foo "http://example.com" bar... (encoded as: &quot;URL&quot;)
foo "http://example.com" bar... (encoded as: &#34;URL&#34;)
foo "http://example.com" bar... (encoded as: &#034;URL&#034;)
foo "http://example.com" bar... (encoded as: &#x22;URL&#x22;)
foo "http://example.com" bar... (encoded as: &#x022;URL&#x022;)

Note 1. The &apos; entity is not part of the HTML 4 standard, and Internet Explorer 6 does not recognize it. If you are viewing the HTML version of this page with IE, this entity may initially appear as "&apos;". In Firefox, Opera and Safari, it appears as "'". However, the linkify_html() function converts each &apos; to its numeric HTML entity equivalent, &#39;, so once it has run (either by clicking on the paragraph or by loading the PHP version of the page), they should all appear correctly. Note also that the W3C recommends NOT using the &apos; entity in HTML documents and using &#39; instead. This page uses it to demonstrate how this character is handled by the Linkify() function.
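
The entity conversion described in Note 1 can be sketched in a few lines of JavaScript. This is only an illustration: normalizeApos is a hypothetical helper name, not a function taken from linkify.js, and the real linkify_html() may do this differently.

```javascript
// Hypothetical helper (an assumption, not the author's code): rewrite the
// named entity &apos; as its numeric equivalent &#39;, which older browsers
// such as IE6 also understand.
function normalizeApos(html) {
  // Entity names are case-sensitive, so no "i" flag is used here.
  return html.replace(/&apos;/g, '&#39;');
}

console.log(normalizeApos('foo &apos;http://example.com&apos; bar...'));
```

Running the conversion before matching means the pattern only ever has to deal with the numeric forms of the single-quote entity.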

URLs within matching quotes and ending [.!',;:?] punctuation inside:
foo 'http://example.com.' bar...
foo 'http://example.com!' bar... (encoded as: &apos;URL&apos; Note 1.)
foo 'http://example.com'' bar... (encoded as: &#39;URL&#39;)
foo 'http://example.com,' bar... (encoded as: &#039;URL&#039;)
foo 'http://example.com;' bar... (encoded as: &#x27;URL&#x27;)
foo 'http://example.com:' bar... (encoded as: &#x027;URL&#x027;)
foo "http://example.com?" bar...
foo "http://example.com." bar... (encoded as: &quot;URL&quot;)
foo "http://example.com!" bar... (encoded as: &#34;URL&#34;)
foo "http://example.com'" bar... (encoded as: &#034;URL&#034;)
foo "http://example.com," bar... (encoded as: &#x22;URL&#x22;)
foo "http://example.com;" bar... (encoded as: &#x022;URL&#x022;)

URLs within matching quotes and ending [.!',;:?] punctuation outside:
foo 'http://example.com'. bar...
foo 'http://example.com'! bar... (encoded as: &apos;URL&apos; Note 1.)
foo 'http://example.com'' bar... (encoded as: &#39;URL&#39;)
foo 'http://example.com', bar... (encoded as: &#039;URL&#039;)
foo 'http://example.com'; bar... (encoded as: &#x27;URL&#x27;)
foo 'http://example.com': bar... (encoded as: &#x027;URL&#x027;)
foo "http://example.com"? bar...
foo "http://example.com". bar... (encoded as: &quot;URL&quot;)
foo "http://example.com"! bar... (encoded as: &#34;URL&#34;)
foo "http://example.com"' bar... (encoded as: &#034;URL&#034;)
foo "http://example.com", bar... (encoded as: &#x22;URL&#x22;)
foo "http://example.com"; bar... (encoded as: &#x022;URL&#x022;)

URLs with embedded quote and ampersand HTML entities:
foo http://example.com/file's_name.txt bar... ("'" encoded as: &apos; Note 1.)
foo http://example.com/file's_name.txt bar... ("'" encoded as: &#39;)
foo http://example.com/file's_name.txt bar... ("'" encoded as: &#x27;)
foo http://example.com/file&s_name.txt bar... ("&" encoded as: &amp;)

Improperly delimited (not well-formed) URL syntax (may not match 100% correctly):

URLs within only opening "()[]{}<>" delimiter:
foo (http://example.com bar...
foo [http://example.com bar...
foo {http://example.com bar...
foo <http://example.com bar... (encoded as: &lt;URL)
foo <http://example.com bar... (encoded as: &#60;URL)
foo <http://example.com bar... (encoded as: &#x3C;URL)
foo (http://example.com/(path)/file.txt bar... (Note 2.)
foo (http://example.com/path/(file.txt) bar... (Note 2.)
foo [http://[2001:0db8:85a3:08d3:1319:8a2e:0370:7348] bar... (Note 2.)

URLs within only closing "()[]{}<>" delimiter:
foo http://example.com) bar... (Note 2.)
foo http://example.com] bar... (Note 2.)
foo http://example.com} bar...
foo http://example.com> bar... (encoded as: URL&gt;)
foo http://example.com> bar... (encoded as: URL&#62;)
foo http://example.com> bar... (encoded as: URL&#x3E;)
foo http://example.com/(path)/file.txt) bar... (Note 2.)
foo http://example.com/path/(file.txt)) bar... (Note 2.)
foo http://[2001:0db8:85a3:08d3:1319:8a2e:0370:7348]] bar... (Note 2.)

Note 2. The linkify function demonstrated by this web page uses a single regex replace operation, which cannot exclude the trailing delimiter that is erroneously included in these examples. A linkify function with more sophisticated logic could handle these cases. As an example, the analyse_links() function in linkify.js checks for balanced bracket nesting to determine which links to mark red.
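
As a rough illustration of the "more sophisticated logic" mentioned in Note 2, the sketch below post-processes a matched URL and strips closing delimiters from its end while they have no opening partner inside the URL. The name trimUnbalancedClosers is hypothetical and this is not the analyse_links() function from linkify.js. It only counts delimiters rather than checking their order, and it handles only the surplus closing delimiter cases, not the "only opening delimiter" ones.

```javascript
// Hypothetical helper (an assumption, not the author's analyse_links()):
// split a matched URL into the URL proper and any surplus closing
// delimiters that were swallowed at its end.
function trimUnbalancedClosers(url) {
  const pairs = { ')': '(', ']': '[', '}': '{' };
  let end = url.length;
  while (end > 0) {
    const closer = url[end - 1];
    const opener = pairs[closer];
    if (opener === undefined) break;   // last char is not a closing delimiter
    // Count this delimiter pair within the current candidate URL.
    let balance = 0;
    for (let i = 0; i < end; i++) {
      if (url[i] === opener) balance++;
      else if (url[i] === closer) balance--;
    }
    if (balance >= 0) break;           // the closer has a partner; keep it
    end--;                             // surplus closer; exclude it
  }
  return { url: url.slice(0, end), trailing: url.slice(end) };
}

console.log(trimUnbalancedClosers('http://example.com/path/(file.txt))'));
```

The "trailing" part would then be emitted after the closing </a> tag instead of inside the link.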

URLs within only opening quotes:
foo 'http://example.com bar...
foo 'http://example.com bar... (encoded as: &apos;URL Note 1.)
foo 'http://example.com bar... (encoded as: &#39;URL)
foo 'http://example.com bar... (encoded as: &#039;URL)
foo 'http://example.com bar... (encoded as: &#x27;URL)
foo 'http://example.com bar... (encoded as: &#x027;URL)
foo "http://example.com bar...
foo "http://example.com bar... (encoded as: &quot;URL)
foo "http://example.com bar... (encoded as: &#34;URL)
foo "http://example.com bar... (encoded as: &#034;URL)
foo "http://example.com bar... (encoded as: &#x22;URL)
foo "http://example.com bar... (encoded as: &#x022;URL)

URLs within only closing quotes:
foo http://example.com' bar...
foo http://example.com' bar... (encoded as: URL&apos; Note 1.)
foo http://example.com' bar... (encoded as: URL&#39;)
foo http://example.com' bar... (encoded as: URL&#039;)
foo http://example.com' bar... (encoded as: URL&#x27;)
foo http://example.com' bar... (encoded as: URL&#x027;)
foo http://example.com" bar...
foo http://example.com" bar... (encoded as: URL&quot;)
foo http://example.com" bar... (encoded as: URL&#34;)
foo http://example.com" bar... (encoded as: URL&#034;)
foo http://example.com" bar... (encoded as: URL&#x22;)
foo http://example.com" bar... (encoded as: URL&#x022;)

URLs within only closing quotes and ending [.!',;:?] punctuation inside:
foo http://example.com.' bar...
foo http://example.com!' bar... (encoded as: URL&apos; Note 1.)
foo http://example.com'' bar... (encoded as: URL&#39;)
foo http://example.com,' bar... (encoded as: URL&#039;)
foo http://example.com;' bar... (encoded as: URL&#x27;)
foo http://example.com:' bar... (encoded as: URL&#x027;)
foo http://example.com?" bar...
foo http://example.com." bar... (encoded as: URL&quot;)
foo http://example.com!" bar... (encoded as: URL&#34;)
foo http://example.com'" bar... (encoded as: URL&#034;)
foo http://example.com," bar... (encoded as: URL&#x22;)
foo http://example.com;" bar... (encoded as: URL&#x022;)

URLs within only closing quotes and ending [.!',;:?] punctuation outside:
foo http://example.com'. bar...
foo http://example.com'! bar... (encoded as: URL&apos; Note 1.)
foo http://example.com'' bar... (encoded as: URL&#39;)
foo http://example.com', bar... (encoded as: URL&#039;)
foo http://example.com'; bar... (encoded as: URL&#x27;)
foo http://example.com': bar... (encoded as: URL&#x027;)
foo http://example.com"? bar...
foo http://example.com". bar... (encoded as: URL&quot;)
foo http://example.com"! bar... (encoded as: URL&#34;)
foo http://example.com"' bar... (encoded as: URL&#034;)
foo http://example.com", bar... (encoded as: URL&#x22;)
foo http://example.com"; bar... (encoded as: URL&#x022;)

Pre-linkified URLs in HTML or BBCode syntax (should never match):

URLs preceded with "=" (i.e. inside HTML tags):
foo href=http://example.com bar... (unquoted, no spacing)
foo href="http://example.com" bar... (double-quoted, no spacing)
foo href='http://example.com' bar... (single-quoted, no spacing)
foo href = http://example.com bar... (unquoted, with spacing)
foo href = "http://example.com" bar... (double-quoted, with spacing)
foo href = 'http://example.com' bar... (single-quoted, with spacing)

URLs preceded with "=" (i.e. inside BBCode tags):
foo [url=http://example.com/path/]LINK[/url] bar...
foo [url = http://example.com/path/]LINK[/url] bar...
foo [url="http://example.com/path/"]LINK[/url] bar...
foo [url = "http://example.com/path/"]LINK[/url] bar...
foo [url='http://example.com/path/']LINK[/url] bar...
foo [url = 'http://example.com/path/']LINK[/url] bar...
foo [url]http://example.com/path/[/url] bar...

Here's the regular expression that plucks URLs from text (PHP version):

$url_pattern = '/# Rev:20100913_0900 github.com\/jmrware\/LinkifyURL
# Match http & ftp URL that is not already linkified.
  # Alternative 1: URL delimited by (parentheses).
  (\()                     # $1: "(" start delimiter.
  ((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]+)  # $2: URL.
  (\))                     # $3: ")" end delimiter.
| # Alternative 2: URL delimited by [square brackets].
  (\[)                     # $4: "[" start delimiter.
  ((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]+)  # $5: URL.
  (\])                     # $6: "]" end delimiter.
| # Alternative 3: URL delimited by {curly braces}.
  (\{)                     # $7: "{" start delimiter.
  ((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]+)  # $8: URL.
  (\})                     # $9: "}" end delimiter.
| # Alternative 4: URL delimited by <angle brackets>.
  (<|&(?:lt|\#60|\#x3c);)  # $10: "<" start delimiter (or HTML entity).
  ((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]+)  # $11: URL.
  (>|&(?:gt|\#62|\#x3e);)  # $12: ">" end delimiter (or HTML entity).
| # Alternative 5: URL not delimited by (), [], {} or <>.
  (                        # $13: Prefix proving URL not already linked.
    (?: ^                  # Can be a beginning of line or string, or
    | [^=\s\'"\]]          # a non-"=", non-quote, non-"]", followed by
    ) \s*[\'"]?            # optional whitespace and optional quote;
  | [^=\s]\s+              # or... a non-equals sign followed by whitespace.
  )                        # End $13. Non-prelinkified-proof prefix.
  ( \b                     # $14: Other non-delimited URL.
    (?:ht|f)tps?:\/\/      # Required literal http, https, ftp or ftps prefix.
    [a-z0-9\-._~!$\'()*+,;=:\/?#[\]@%]+ # All URI chars except "&" (normal*).
    (?:                    # Either on a "&" or at the end of URI.
      (?!                  # Allow a "&" char only if not start of an...
        &(?:gt|\#0*62|\#x0*3e);                  # HTML ">" entity, or
      | &(?:amp|apos|quot|\#0*3[49]|\#x0*2[27]); # a [&\'"] entity if
        [.!&\',:?;]?        # followed by optional punctuation then
        (?:[^a-z0-9\-._~!$&\'()*+,;=:\/?#[\]@%]|$)  # a non-URI char or EOS.
      ) &                  # If neg-assertion true, match "&" (special).
      [a-z0-9\-._~!$\'()*+,;=:\/?#[\]@%]* # More non-& URI chars (normal*).
    )*                     # Unroll-the-loop (special normal*)*.
    [a-z0-9\-_~$()*+=\/#[\]@%]  # Last char can\'t be [.!&\',;:?]
  )                        # End $14. Other non-delimited URL.
/imx';
$url_replace = '$1$4$7$10$13<a href="$2$5$8$11$14">$2$5$8$11$14</a>$3$6$9$12';

Here's the JavaScript version (with some added line breaks):

var url_pattern = /(\()((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&'()*+,;=:\/?#[\]@%]+)(\))
|(\[)((?:ht|f)tps?:\/\/[a-z0-9\-._~!$&'()*+,;=:\/?#[\]@%]+)(\])|(\{)((?:ht|f)tps?
:\/\/[a-z0-9\-._~!$&'()*+,;=:\/?#[\]@%]+)(\})|(<|&(?:lt|#60|#x3c);)((?:ht|f)tps?:
\/\/[a-z0-9\-._~!$&'()*+,;=:\/?#[\]@%]+)(>|&(?:gt|#62|#x3e);)|((?:^|[^=\s'"\]])\s
*['"]?|[^=\s]\s+)(\b(?:ht|f)tps?:\/\/[a-z0-9\-._~!$'()*+,;=:\/?#[\]@%]+(?:(?!&(?:
gt|#0*62|#x0*3e);|&(?:amp|apos|quot|#0*3[49]|#x0*2[27]);[.!&',:?;]?(?:[^a-z0-9\-.
_~!$&'()*+,;=:\/?#[\]@%]|$))&[a-z0-9\-._~!$'()*+,;=:\/?#[\]@%]*)*[a-z0-9\-_~$()*+
=\/#[\]@%])/img;
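
To show how a single replace call can serve several alternatives, here is a much reduced sketch. The names miniPattern, miniReplace and miniLinkify are hypothetical, and the pattern is a two-alternative simplification written for this example, NOT the full expression above: it accepts a smaller set of URL characters and uses a simpler "not already linked" prefix (line start or whitespace only). The point it illustrates is the replacement string: capture groups from alternatives that did not take part in the match expand to empty strings, so concatenating them all yields the right output whichever alternative matched.

```javascript
// Reduced illustration pattern (an assumption, not the author's regex).
// Alternative 1: $1 "(", $2 URL, $3 ")".
// Alternative 2: $4 prefix (line start or whitespace), $5 URL whose last
//                character may not be trailing punctuation such as "." or "?".
const miniPattern = /(\()((?:ht|f)tps?:\/\/[a-z0-9\-._~\/?#=&%]+)(\))|(^|\s)((?:ht|f)tps?:\/\/[a-z0-9\-._~\/?#=&%]*[a-z0-9\-_~\/#=%])/gim;

// Groups from the alternative that did not match are empty strings.
const miniReplace = '$1$4<a href="$2$5">$2$5</a>$3';

function miniLinkify(text) {
  return text.replace(miniPattern, miniReplace);
}

console.log(miniLinkify('foo (http://example.com) bar...'));
console.log(miniLinkify('see http://example.com/path. Done'));
console.log(miniLinkify('foo href=http://example.com bar...'));
```

In this sketch the first call wraps only the URL between the parentheses, the second leaves the sentence-ending period outside the link, and the third leaves the text unchanged because the URL directly follows "=". The full pattern applies the same idea with fourteen groups.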

Happy regexing!
©2010 Jeff Roberson.
Released as open source under the MIT License