Keith Devens .com

Archive: May 08, 2003

← May 07, 2003 | May 09, 2003 →

Thursday, May 8, 2003

Top 100 video games of all time

Check out IGN's list of the top 100 video games of all time. I had a lot of the role-playing games, like the Secret of Mana and Chrono Trigger.

Yuengling

Yuengling has got to be my favorite beer.

The Regex Coach

Ok, I just downloaded and installed The Regex Coach (sorry, I left out the "The" yesterday), and it's as incredible as it sounds. Really, if you've ever used regular expressions, get this. I can't overstate how fantastic this program is. You're guaranteed to understand regular expressions better after you play around with this for a while. You can step through a regex to see what happens at each step, you can get a tree view of the regular expression, and everything else it says it does :) But I just wanted to reiterate... great program.

Oh, and on another note, this was the one thing Komodo really had going for it that was unique (IMO), and now it doesn't have that anymore. On the other hand, now that I've spent a minute looking around the Komodo site, it really does look like a kick ass IDE. I hated it in the past because it was dog slow since it was built on Mozilla. Hopefully they've fixed that a bit. It also didn't work right for me in the past, but since I've reformatted and installed a different OS, maybe that'll be better.

Building my StructuredText parser

I'm working on the second version of my StructuredText parser again. I started the second version a long time ago in PHP, but I started from scratch this time using Python. I hate PHP, but if I get the programming right in Python first (which is much easier to do than in PHP, and much more pleasurable), then translating it to PHP won't be so much of a pain. Of course, I need to translate it to PHP since my site runs on PHP.

It's structured in three parts. The first part is the parser, which builds up a StructuredText document (the second part), which can then be handed off to any number of document generators (not sure what to call these yet, but that's the third part) that can output to HTML, LaTeX, DocBook, plain text again, PDF, you name it.

I've already got the document format down, and the document generator for HTML done. Those parts are easy. The document is just a plain Python data structure that looks like this (it's easier to just paste my test data in than describe the data structure):

document = {
    'head':None,
    'body': {
        'children':[
            {
                'type':ST_PARAGRAPH,
                'children':[
                    {
                        'type':ST_LINE,
                        'children':None,
                        'text':'This is some text'
                    }
                ]
            },
            {
                'type':ST_LIST,
                'params':{'type':'ul'},
                'children':[
                    {
                        'type':ST_LIST_ITEM,
                        'children':[
                            {
                                'type':ST_LINE,
                                'children':None,
                                'text':'This is a list item'
                            }
                        ]
                    }
                ]
            }
        ]
    }
}
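If you want to poke at that structure yourself, here's a minimal sketch. The post doesn't show how the ST_* constants are defined, so plain strings stand in for them here, and the `outline` helper is hypothetical, added only to show how the nesting can be walked.

```python
# Assumed values: the real ST_* constants are not shown in the post.
ST_PARAGRAPH = 'paragraph'
ST_LINE = 'line'
ST_LIST = 'list'
ST_LIST_ITEM = 'list_item'

document = {
    'head': None,
    'body': {
        'children': [
            {'type': ST_PARAGRAPH, 'children': [
                {'type': ST_LINE, 'children': None, 'text': 'This is some text'},
            ]},
            {'type': ST_LIST, 'params': {'type': 'ul'}, 'children': [
                {'type': ST_LIST_ITEM, 'children': [
                    {'type': ST_LINE, 'children': None, 'text': 'This is a list item'},
                ]},
            ]},
        ]
    },
}

def outline(node, depth=0):
    """Return one indented string per node, depth-first (hypothetical helper)."""
    result = []
    # Leaf nodes carry 'children': None, so fall back to an empty list.
    for child in node.get('children') or []:
        label = child['type']
        if 'text' in child:
            label += ': ' + child['text']
        result.append('  ' * depth + label)
        result.extend(outline(child, depth + 1))
    return result

print('\n'.join(outline(document['body'])))
```

Every node is just a dict with a 'type' and a 'children' entry, so one recursive function is enough to visit the whole tree.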

Then, you have a series of handlers, which are dispatched and called recursively to handle the whole document. Some code will be illustrative again:

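import sys

def dispatch(document):
    # Call the registered handler for each child of this node.
    # Leaf nodes store 'children': None, so treat None as "no children".
    if document is not None and document.get('children'):
        for child in document['children']:
            if child['type'] in handlers:
                handlers[child['type']](child)

#and one sample handler...
def HtmlPara(para):
    sys.stdout.write('<p>')
    dispatch(para)
    sys.stdout.write('</p>')

# handlers and HtmlPara have to exist before this call runs
dispatch(document['body'])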
Right now I'm just using plain functions as handlers, though later I'll probably stick all common functions for each document generator in a class. Also, handlers is just a lookup table for all the handlers for each type of element in a StructuredText document.
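To make the lookup table concrete, here's a rough sketch of what a complete HTML generator along these lines could look like. It's an illustration under assumptions, not the actual code: the ST_* values, the handler names other than the paragraph one, and the choice to pass an explicit `out` stream instead of writing to sys.stdout are all made up for this example.

```python
import html
import io

# Assumed values: the real ST_* constants are not shown in the post.
ST_PARAGRAPH, ST_LINE, ST_LIST, ST_LIST_ITEM = 'paragraph', 'line', 'list', 'list_item'

def dispatch(node, out):
    """Call the handler registered for each child of node."""
    for child in (node or {}).get('children') or []:
        handler = handlers.get(child['type'])
        if handler is not None:
            handler(child, out)

def html_para(node, out):
    out.write('<p>')
    dispatch(node, out)
    out.write('</p>')

def html_line(node, out):
    # Lines are leaves: escape the text so '<' and '&' survive as text.
    out.write(html.escape(node['text']))

def html_list(node, out):
    tag = node.get('params', {}).get('type', 'ul')  # 'ul' or 'ol'
    out.write('<%s>' % tag)
    dispatch(node, out)
    out.write('</%s>' % tag)

def html_list_item(node, out):
    out.write('<li>')
    dispatch(node, out)
    out.write('</li>')

# The lookup table: element type -> handler function.
handlers = {
    ST_PARAGRAPH: html_para,
    ST_LINE: html_line,
    ST_LIST: html_list,
    ST_LIST_ITEM: html_list_item,
}

def render_html(document):
    out = io.StringIO()
    dispatch(document['body'], out)
    return out.getvalue()

sample = {'head': None, 'body': {'children': [
    {'type': ST_PARAGRAPH, 'children': [
        {'type': ST_LINE, 'children': None, 'text': 'This is some text'}]},
    {'type': ST_LIST, 'params': {'type': 'ul'}, 'children': [
        {'type': ST_LIST_ITEM, 'children': [
            {'type': ST_LINE, 'children': None, 'text': 'This is a list item'}]}]},
]}}

print(render_html(sample))
```

A generator for another output format would only need its own `handlers` table; `dispatch` and the document stay the same.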

Now I'm moving on to the hard part, the parser. Some assumptions: some elements in a StructuredText document can nest. For instance, you can have a blockquote within a blockquote, and within blockquotes you have to have paragraphs, and each paragraph is made of lines, and each line can have a whole host of inline formatting doohickeys to do bold, italics, links, etc. The StructuredText document can of course represent more than it's easy to come up with syntax for in plain text! :)
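One way to pin those nesting assumptions down is a table of which element types may appear inside which. The sketch below is only an illustration: the ST_* values, the `ALLOWED_CHILDREN` table, and the `nesting_errors` checker are all hypothetical names, and inline formatting is left out.

```python
# Assumed values for illustration only.
ST_ROOT, ST_BLOCKQUOTE, ST_PARAGRAPH, ST_LINE = 'root', 'blockquote', 'paragraph', 'line'

# Which element types may appear directly inside each container.
ALLOWED_CHILDREN = {
    ST_ROOT: {ST_BLOCKQUOTE, ST_PARAGRAPH},
    ST_BLOCKQUOTE: {ST_BLOCKQUOTE, ST_PARAGRAPH},
    ST_PARAGRAPH: {ST_LINE},
    ST_LINE: set(),
}

def nesting_errors(node):
    """Return a message for every child that may not nest in its parent."""
    errors = []
    allowed = ALLOWED_CHILDREN.get(node['type'], set())
    for child in node.get('children') or []:
        if child['type'] not in allowed:
            errors.append('%s not allowed inside %s' % (child['type'], node['type']))
        errors.extend(nesting_errors(child))
    return errors

good = {'type': ST_ROOT, 'children': [
    {'type': ST_BLOCKQUOTE, 'children': [
        {'type': ST_BLOCKQUOTE, 'children': [
            {'type': ST_PARAGRAPH, 'children': [
                {'type': ST_LINE, 'children': None, 'text': 'quoted twice'}]}]}]}]}

bad = {'type': ST_ROOT, 'children': [
    {'type': ST_PARAGRAPH, 'children': [
        {'type': ST_BLOCKQUOTE, 'children': []}]}]}

print(nesting_errors(good))
print(nesting_errors(bad))
```

The same table could later tell the parser which components to try at each point.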

But, for a sample, a blockquote looks like:

> this is a blockquote

> second blockquote

In the first version of my ST parser, every line of a blockquote had to have a ">" before it, but that won't be necessary in my second version:

> this is a blockquote
that continues onto
multiple lines
with no problem

And a blank line signifies the end of the blockquote.
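As a rough sketch of that rule (the function name and return shape are assumptions, not the actual parser): start at a line beginning with ">", keep taking lines until the first blank one, and strip a leading ">" wherever one appears.

```python
def take_blockquote(lines, start):
    """Collect the blockquote that begins at lines[start].

    Returns (inner_lines, next_index), where next_index is the index of the
    blank line that ended the blockquote (or len(lines) if there was none).
    Returns None if lines[start] does not open a blockquote.
    """
    if not lines[start].startswith('>'):
        return None
    inner = []
    i = start
    while i < len(lines) and lines[i].strip():
        line = lines[i]
        if line.startswith('>'):
            # Strip one level of quoting plus the whitespace after it.
            line = line[1:].lstrip()
        inner.append(line)
        i += 1
    return inner, i

text = """> this is a blockquote
that continues onto
multiple lines
with no problem

next paragraph"""

print(take_blockquote(text.splitlines(), 0))
```

Only one ">" is removed per line, so a line like "> > nested" keeps its inner ">" for a later pass to pick up.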

So, the way I envision this working (and this is mostly how my PHP version worked), is that you go through every line of the document, and for each line test it against a set of ST components that are "allowed" at each point. (For instance, a heading isn't allowed within a paragraph, but it's allowed within the document root.) Each component tries to match at that point, and if it recognizes something (a blockquote component recognizes a blockquote if the line begins with ">"), then it sucks up everything that it knows belongs to the component (blockquote sucks up everything from the first ">" to the first blank line), stripping markup specific to the element (blockquote strips any leading ">"'s) and then recursing, calling the parser on the text it's just saved (and of course, the parser will only match on the components that are allowed to nest within that other component). Oh, and when a particular component is done parsing, it leaves its parent parser at the line it finished on.
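Here's a small sketch of how that loop could look, with only blockquote and paragraph components. It's one possible reading of the description above, not the real parser: the ST_* values, the function names, and the `ALLOWED` table are assumptions for illustration.

```python
# Assumed values for illustration only.
ST_BLOCKQUOTE, ST_PARAGRAPH, ST_LINE = 'blockquote', 'paragraph', 'line'

def parse_blockquote(lines, i):
    """Match a blockquote at lines[i]; return (node, next_index) or None."""
    if not lines[i].startswith('>'):
        return None
    inner = []
    while i < len(lines) and lines[i].strip():
        line = lines[i]
        if line.startswith('>'):
            line = line[1:].lstrip()  # strip one level of quoting
        inner.append(line)
        i += 1
    # Recurse on the saved text, using only what may nest in a blockquote.
    node = {'type': ST_BLOCKQUOTE, 'children': parse(inner, ALLOWED[ST_BLOCKQUOTE])}
    return node, i

def parse_paragraph(lines, i):
    """Match a run of non-blank lines; return (node, next_index) or None."""
    if not lines[i].strip():
        return None
    children = []
    while i < len(lines) and lines[i].strip():
        children.append({'type': ST_LINE, 'children': None, 'text': lines[i]})
        i += 1
    return {'type': ST_PARAGRAPH, 'children': children}, i

# Components allowed at each point, tried in order.
ALLOWED = {
    'root': [parse_blockquote, parse_paragraph],
    ST_BLOCKQUOTE: [parse_blockquote, parse_paragraph],
}

def parse(lines, allowed):
    nodes = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # blank lines only separate components
            continue
        for component in allowed:
            result = component(lines, i)
            if result is not None:
                node, i = result  # resume where the component finished
                nodes.append(node)
                break
        else:
            i += 1  # nothing recognized this line; skip it
    return nodes

def parse_document(text):
    return {'head': None, 'body': {'children': parse(text.splitlines(), ALLOWED['root'])}}

sample = "> > inner quote\ncontinues here\n\nplain paragraph\nsecond line\n"
print(parse_document(sample))
```

Each component returns the index it stopped at, which is how the parent parser ends up "at the line it finished on". One limitation of this sketch: a nested quote that starts partway through an outer blockquote gets swallowed by the paragraph before it, since a paragraph runs to the next blank line.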

That's pretty much it, I think. I'm not quite sure what to do with all the intermediate text that gets left around everywhere, but for now I won't worry about it.

If anyone has any comments about the design, or suggestions about how you'd structure the parser part of it (which is the most complex part and the part I'm most uncertain about), I'd appreciate it if you'd share :) This is one of my most important projects. All of my weblog posts use my ST parser, all of my wiki pages do, and if I ever write a book, or documentation for stuff, or a paper for school, etc., I want to type it in plain text and process it with my StructuredText parser.
