This is the regular expression, along with its replacement string, that my markup parser uses to turn URLs into links. The markup parser formats all the text on my weblog, including all comments. Both strings are fed to preg_replace to do their job.
<?php
$regex = "/(?<=^|[\s({])(\w+:\/\/[\w.,\/?+~&=_:;#$%{}-]+[^\s.?!:,()<>[\]'\"&]+?)/e"; #urls
$style = '\'<a href="\'.\'$1\'.\'">\'.((($len = strlen(\'$1\')) > '.
$flags['max_url_length'].') ? (substr(\'$1\',0,'.
ceil($flags['max_url_length']/2).').\'...\'.substr(\'$1\',$len-'.
(floor($flags['max_url_length']/2)-3).')) : \'$1\').\'</a>\'';
?>
The code's been changed a little bit (for instance, broken up so it's not too long on one line), but you get the idea.
As you can see, this code wasn't built from any formal grammar of URLs; it's just been built ad hoc as I've come across URLs it didn't parse. It's possible that it's horribly inefficient compared to what it could be. If you see anything that should be different, let me know.
Besides recently fixing the code to allow tildes in URLs (which should have been in there a long time ago), I finally took a few minutes today to fix the code so that really long links won't blow out my layout. Works exactly right.
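To make the arithmetic of the long-link fix easier to follow, here is a minimal sketch of the same truncation pulled out into a standalone function. The name shorten_url_label() and its parameters are made up for illustration; the real parser keeps this logic inline in the replacement string and reads the limit from $flags['max_url_length'].

```php
<?php
// Hypothetical helper: the name and signature are illustrative only,
// not part of the original parser.
function shorten_url_label(string $url, int $max): string
{
    $len = strlen($url);
    if ($len <= $max) {
        return $url; // short enough, show it unchanged
    }

    // Keep ceil(max/2) characters from the start and floor(max/2) - 3
    // from the end, so that head + '...' + tail is exactly $max long.
    $head = (int) ceil($max / 2);
    $tail = (int) floor($max / 2) - 3;

    // Guard for very small limits, where the tail would be empty.
    $end = $tail > 0 ? substr($url, $len - $tail) : '';

    return substr($url, 0, $head) . '...' . $end;
}

echo shorten_url_label('http://example.com/abcdefghijklmnopqrstuvwxyz', 20), "\n";
// prints "http://exa...tuvwxyz"
```

Only the visible link text is shortened this way; the href still gets the full URL.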
By the way, if you want to implement this code, make sure you stick two calls to htmlspecialchars() in there; otherwise it won't work correctly for ampersands and other special characters.
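For anyone on a newer PHP: the /e modifier was deprecated in PHP 5.5 and removed in PHP 7, so the replacement string above can't be used as-is there. Below is a rough sketch of how the same idea might look with preg_replace_callback(), including the two htmlspecialchars() calls. The function name linkify_urls() and the $maxLen parameter are my own placeholders, and the sketch assumes the input text has not already been HTML-escaped.

```php
<?php
// Sketch only: linkify_urls() and $maxLen are illustrative names,
// not part of the original parser. Assumes $text is raw (unescaped).
function linkify_urls(string $text, int $maxLen): string
{
    // Same pattern as in the post, minus the /e modifier.
    $regex = '/(?<=^|[\s({])(\w+:\/\/[\w.,\/?+~&=_:;#$%{}-]+[^\s.?!:,()<>[\]\'"&]+?)/';

    $result = preg_replace_callback(
        $regex,
        function (array $m) use ($maxLen): string {
            $url   = $m[1];
            $len   = strlen($url);
            $label = $url;

            if ($len > $maxLen) {
                // Middle-truncate the visible text to $maxLen characters.
                $head  = (int) ceil($maxLen / 2);
                $tail  = (int) floor($maxLen / 2) - 3;
                $label = substr($url, 0, $head) . '...'
                       . ($tail > 0 ? substr($url, $len - $tail) : '');
            }

            // The two htmlspecialchars() calls: one for the attribute,
            // one for the link text.
            return '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">'
                 . htmlspecialchars($label, ENT_QUOTES, 'UTF-8') . '</a>';
        },
        $text
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $text;
}

echo linkify_urls('See http://example.com/a?b=1&c=2.', 80), "\n";
```

Note that this only escapes the URL itself. The surrounding text passes through untouched, so it needs to be escaped elsewhere in the parser.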
Note: updated the regex a bit.
I've mentioned HTTrack before, but that's kind of the point here. One of the main reasons I keep my weblog is to keep track of links I've come across, so that I can find them when I need them. I didn't need HTTrack at the time, but I finally did, so I found it, used it, and it worked flawlessly. This is how my blog is supposed to work.
I recommend the program, by the way.