O'Reilly Network: Spidering Hacks
From the "Holy crap, I never knew this existed, but it's awesome!" department, check out Template::Extract:
One day, I was fiddling about with the Template Toolkit (http://www.template-toolkit.com/) and it dawned on me that all these sites were, at some level, generated with some templating engine. The Template Toolkit takes a template and some data and produces HTML output.
Okay, you might think, very interesting, but how does this relate to scraping web pages for RSS? Well, we know what the HTML looks like, and we can make a reasonable guess at what the template ought to look like, but we want only the data. If only I could apply the Template Toolkit backward somehow. Taking HTML output and a template that could conceivably generate the output, I could retrieve the original data structure and, from then on, generating RSS from the data structure would be a piece of cake.
Like most brilliant ideas, this is hardly original, and an equally brilliant man named Autrijus Tang not only had the idea a long time before me, but—and this is the hard part—actually worked out how to implement it. His Template::Extract Perl module (http://search.cpan.org/author/AUTRIJUS/Template-Extract/) does precisely this: extract a data structure from its template and output.
This tip is an excerpt from O'Reilly's new book, Spidering Hacks, by Kevin Hemenway (author of AmphetaDesk) and Tara Calishain (of ResearchBuzz fame!).
Also check out Autrijus Tang's Template::Generate, which completes the trifecta of Perl template parsing tools:
Template: ($template + $data) ==> $document # normal
Template::Extract: ($document + $template) ==> $data # tricky
Template::Generate: ($data + $document) ==> $template # very tricky
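To make the "tricky" middle line a bit more concrete, here is a rough sketch of the idea of running a template backward. It is written in Python rather than Perl, and it is not how Template::Extract is implemented or what its API looks like. It only handles flat [% name %] placeholders, by turning the template into a regular expression. The function names (template_to_regex, extract) and the example.com URLs are made up for illustration.

```python
import re

# Matches a placeholder such as [% title %] and captures the variable name.
PLACEHOLDER = re.compile(r"\[%\s*(\w+)\s*%\]")


def template_to_regex(template):
    """Escape the literal text and turn each [% name %] into a named group.

    Assumption: each variable name appears at most once in the template,
    since a regex cannot define the same group name twice.
    """
    parts = []
    pos = 0
    for m in PLACEHOLDER.finditer(template):
        parts.append(re.escape(template[pos:m.start()]))  # literal text
        parts.append("(?P<%s>.*?)" % m.group(1))           # the unknown data
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("".join(parts), re.DOTALL)


def extract(template, document):
    """($document + $template) ==> $data: one dict per match in the document."""
    pattern = template_to_regex(template)
    return [m.groupdict() for m in pattern.finditer(document)]


# Hypothetical template and page, standing in for a real site.
template = '<li><a href="[% url %]">[% title %]</a></li>'
document = (
    "<ul>\n"
    '<li><a href="http://example.com/a">First story</a></li>\n'
    '<li><a href="http://example.com/b">Second story</a></li>\n'
    "</ul>\n"
)

records = extract(template, document)
for record in records:
    print(record["title"], record["url"])
```

Each dict in records is one recovered data item, which is the sort of structure you would then feed into an RSS generator. A real reverse-templating engine also has to cope with loops, nesting and ignorable text, which this sketch deliberately leaves out.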
You've got to be kidding me 
You may not realise it, but a tutorial on Template::Extract was published in the Perl Advent Calendar today!
Template::Extract
http://perladvent.org/2003/5th/
...thought you might find it interesting.