StructuredText parser release history
- StructuredText parser release history
- .80 first public release: 4/14/02
- .81 4/15/02
- .85 4/19/02
- .86 4/19/02 (a few minutes later)
- .87 4/19/02 (a little while later)
- .88 4/21/02
- .88b 4/22/02
- .89 4/23/02
- .90 4/23/02
- .91 4/23/02 (same day)
- .91b 4/23/02
- .91c 4/24/02
- .95 4/26/02
- .95b 4/27/02
- .95c 4/27/02
- .95d 5/8/02
- .96 5/13/02
- .97 5/14/02
- .98 5/17/02
- .98b 5/17/02
- .98c 5/20/02
- .98d 5/21/02
- .99 5/21/02
- .99b 5/22/02
- .99c 5/22/02
- .99d 5/23/02
- .99e 5/23/02
- .99f 5/24/02
- .99g 5/25/02
- To do
- Notes
Note: Some styles shown here may look slightly different from how they are actually used. At the time of writing (4/16/02, version .81), I don't have escaping built into the parser yet, so I've had to cheat a little. Also, these rules are meant to be kept up to date with the parser, so at any time you should be able to get a full summary of what the parser can do.
.80 first public release: 4/14/02
Featured:
- Headings
- Unordered lists and Ordered lists of any type, nested in any way, to any level
- Links: both autogenerated from anything that looks like a URL, and in the format "linktext"="url"
- Inline styles: bold, italic, underline, and monospaced, in any combination
- Blockquotes
- Horizontal rules
- Tables (with colspans)
- Images in the format img:"url" or img:"url":"alttext"
- A few other minor variations of the URL and image formats, such as "linktext":"url"
- Any arbitrary stylesheet
- Raw HTML (since normally, all HTML is escaped)
.81 4/15/02
Added:
- Smilies
- The text substitution "framework": now you can do plain text substitutions rather than everything having to be a regex
- Started work on more directives: ?page and ?leadin. You can now get a list of pages in the document with the get_page_info flag.
- Added the absolute_links flag (converts any relative link into an absolute link), but no code yet.
- Little bit of code reorg/cleanup
.85 4/19/02
- Finished the ?page directive. When you ask for page 0, it gives you all pages. When you ask for a page > 0, it gives you just that page. If you ask for a page greater than the number of pages, it just returns an empty string. Oh, almost forgot: if you ask for all pages with the "get_page_info" flag set, it returns a list of page titles instead. This is useful for generating a table of contents for a multi-page article.
- Made a lot of progress in replacing regexes with normal string functions where possible: far fewer preg_replace calls and far more substr calls. Hopefully this means the parser will be faster. I have yet to do any benchmarking, however; I'd rather spend my time coding than benchmarking, of course. Right before version 1, I'll probably go over past versions (which I've been saving) and see how they compare.
- Added the table of contents directive. You can now see it on this page and get an idea for how it works.
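For illustration, here is a minimal Python sketch of the ?page rules above (the parser itself is PHP). The function name get_page and the list-of-(title, html) representation are assumptions made for this sketch, not the parser's interface.

```python
def get_page(pages, n, get_page_info=False):
    """Hypothetical helper mimicking the ?page rules; not the parser's API.

    pages: list of (title, html) pairs, one per page in the document.
    """
    if n == 0:
        # Page 0 means "all pages"; with get_page_info set, return just the titles.
        if get_page_info:
            return [title for title, _ in pages]
        return ''.join(body for _, body in pages)
    if 1 <= n <= len(pages):
        return pages[n - 1][1]  # pages are numbered from 1
    return ''  # past the last page (or negative): empty string
```

The list of titles is what a table of contents for a multi-page article would be built from.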
.86 4/19/02 (a few minutes later)
- Added block preformatted text
- Greatly improved output markup appearance, and basically fixed the way I'm dealing with line breaks (and yet again, fewer regexes)
.87 4/19/02 (a little while later)
- A brand-new algorithm for links in the format "linktext"="url". It uses no regexes, and no hacks to work around limitations in the regexes either! It had better be faster this way.
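A rough Python sketch of a regex-free scanner for "linktext"="url", assuming a simple scan-by-index approach (the actual PHP algorithm isn't shown here). It also honors the \" escape that .95 added and accepts ":" as the separator; read_quoted and parse_link are hypothetical names.

```python
def read_quoted(s, i):
    """Read a double-quoted string starting at s[i]; a backslash before a quote escapes it.
    Returns (value, index just past the closing quote), or None if there is no closed string."""
    if i >= len(s) or s[i] != '"':
        return None
    out = []
    i += 1
    while i < len(s):
        if s[i] == '\\' and i + 1 < len(s) and s[i + 1] == '"':
            out.append('"')  # \" is a literal quote inside the string
            i += 2
        elif s[i] == '"':
            return ''.join(out), i + 1
        else:
            out.append(s[i])
            i += 1
    return None  # unterminated string: not a link


def parse_link(s, i=0):
    """Parse "linktext"="url" (or "linktext":"url") at position i.
    Returns (text, url, end_index) or None. Hypothetical helper, not the parser's API."""
    first = read_quoted(s, i)
    if first is None:
        return None
    text, j = first
    if j >= len(s) or s[j] not in '=:':
        return None
    second = read_quoted(s, j + 1)
    if second is None:
        return None
    url, end = second
    return text, url, end
```

The returned end index lets the caller continue scanning the rest of the line right after the link.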
.88 4/21/02
- minor changes, bugfixes.
- Added the ability to control the width and alignment of a horizontal rule.
- "---" at the beginning of a line is just "<hr>"
- "---50" at the beginning of a line is "<hr width="50">"
- "---75" at the beginning of a line is "<hr width="75">"
- if a width is specified, you can also specify an alignment
- A horizontal rule with "r" or "R" at the end of it is aligned right
- A horizontal rule with an "l" or "L" at the end of it is aligned left
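A minimal Python sketch of these horizontal rule rules, assuming the width is written as plain digits and the alignment is emitted as an align attribute (both assumptions; the exact output markup for alignment isn't specified above). parse_hr is a hypothetical name.

```python
import re

# "---", optionally followed by a width, optionally followed by l/L or r/R.
# The alignment letter is only accepted after a width, as described above.
HR = re.compile(r'^---(?:(\d+)([lLrR])?)?\s*$')


def parse_hr(line):
    """Return the <hr> markup for a rule line, or None if the line is not a rule."""
    m = HR.match(line)
    if m is None:
        return None
    width, align = m.groups()
    attrs = ''
    if width:
        attrs += f' width="{width}"'
        if align:
            # Assumed output: an align attribute on the <hr>.
            attrs += ' align="right"' if align in 'rR' else ' align="left"'
    return f'<hr{attrs}>'
```

For example, "---75r" gives <hr width="75" align="right">, while "---r" is not treated as a rule because alignment needs a width.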
.88b 4/22/02
- Improved the URL regex so it no longer captures periods, commas, exclamation points, etc. at the end of a URL.
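One way to keep trailing punctuation out of a URL match is to require the last character to be something other than punctuation. This Python sketch shows the idea; the exact character sets are assumptions, not the parser's regex.

```python
import re

# A URL may contain dots, commas, etc. internally, but its last character must not be
# one of . , ! ? ; : so sentence punctuation after a URL is left out of the match.
# (Assumed character sets, for illustration only.)
URL = re.compile(r'https?://[^\s<>"]*[^\s<>".,!?;:]')


def find_urls(text):
    """Return every URL found in text, without trailing punctuation."""
    return URL.findall(text)
```

A comma in the middle of a URL is still kept; only punctuation at the very end is dropped.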
.89 4/23/02
- Fixed a bug in the URL regex that I introduced in .88b

- Put in a lot of flags to control which features are available. You get fine-grained control and can disallow headings, images, monospaced or preformatted text (since someone could break your layout by including long lines), tables, directives, smileys, and on and on.
- Put in the strikeout style. I couldn't decide whether to make it one hyphen (-) or two, so I compromised and allowed both.

- Neatened up the regexes for all the other styles
.90 4/23/02
- Finished ?leadin directive
- Added markup escaping: *emphasis* is normally emphasis, and \*emphasis* escapes it. Most things can be escaped. Unfortunately, right now the escaping system isn't completely general, but it's pretty good. I'm going for DWIMmetry above all else here, so I'd rather make it work better for how it's most often going to be used than make it work "right" all the time. Maybe later I'll figure out how to make it work "right" and DWIM at the same time.
- To give you some idea of what I'm talking about: the parser works so that you don't always have to escape a backslash, even though backslash is the escaping character. So a lone \ can stand on its own, but it's special in certain contexts, like when you're escaping \*styles*. Anyone who programs knows what it's like to have to escape every single backslash character. This lets you avoid that. It DWIMs.
- Finished adding "enable" flags that let you choose what features to enable. Now I get to make sure people don't put headings, directives, or raw HTML in comments on my weblog.
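A minimal Python sketch of the escaping rule for a single style (emphasis), assuming a backslash directly before the opening * disables the style and is then removed, while any other backslash is left as typed. render_emphasis is a hypothetical name; this is not the parser's code and covers only this one style.

```python
import re

# *text* becomes <em>text</em> unless the opening * has a backslash right before it.
EMPHASIS = re.compile(r'(?<!\\)\*(\w[^*]*?)\*')


def render_emphasis(text):
    """Hypothetical helper covering only the emphasis style."""
    out = EMPHASIS.sub(r'<em>\1</em>', text)
    # Remove a backslash only where it escaped a *; other backslashes stay as typed.
    return out.replace('\\*', '*')
```

This is the DWIM tradeoff in miniature: a path like C:\temp needs no doubled backslash, at the cost of \* always being read as an escape.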
.91 4/23/02 (same day)
- Wrote the code for the absolute_links flag. It automatically converts any relative link, such as /weblog/, into http://www.keithdevens.com/weblog/. This way, images and links can be relative on my main weblog page and be made absolute for my RSS feed. Pretty cool, huh? The logic could probably be improved a little, but it seems to work well as is.
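A small Python sketch of absolute-link conversion using urllib.parse.urljoin from the standard library. The SITE base URL and the href/src rewriting regex are assumptions for illustration; the parser's own logic may differ.

```python
import re
from urllib.parse import urljoin

# Assumed base; the parser takes its location from its own settings.
SITE = 'http://www.keithdevens.com/'


def make_absolute(href, base=SITE):
    """Resolve a relative link against base; absolute URLs come back unchanged."""
    return urljoin(base, href)


# Assumption: links and images appear as double-quoted href/src attributes.
LINK_ATTR = re.compile(r'\b(href|src)="([^"]*)"')


def absolutize_links(markup, base=SITE):
    """Rewrite every href/src value in markup to an absolute URL."""
    return LINK_ATTR.sub(lambda m: f'{m.group(1)}="{make_absolute(m.group(2), base)}"', markup)
```

Passing a different base (for example a subdirectory) is one way relative links could resolve somewhere other than the site root.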
.91b 4/23/02
- Put in a simple framework for giving the ?leadin directive somewhere to point to for the rest of the text

.91c 4/24/02
- Improved the URL regex a bit. The fact that HTML is escaped before my regular expressions run is giving me a headache. > is replaced with &gt;, for instance, so if a link is surrounded like so: <http://www.keithdevens.com/>, the escaped ending &gt; still ends up on the end of the URL, because it's hard to say "not &gt;" in a regex, though it's easy to say "not >" (that's just [^>]).
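One regex workaround for "not &gt;" is a negative lookahead checked before each character, since a character class can only exclude single characters. A Python sketch, assuming the input is escaped with html.escape first (standing in for the parser's own escaping); find_urls_in_escaped is a hypothetical name.

```python
import html
import re

# After escaping, "<http://x/>" becomes "&lt;http://x/&gt;". A character class such as
# [^>] can't exclude the four-character "&gt;", so a negative lookahead is checked
# before every character of the URL instead.
ESCAPED_URL = re.compile(r'https?://(?:(?!&gt;|&lt;|&quot;)\S)+')


def find_urls_in_escaped(raw_text):
    """Escape the text (standing in for the parser's own escaping), then find URLs."""
    return ESCAPED_URL.findall(html.escape(raw_text))
```

Note that & can't simply be excluded instead: an escaped query string such as ?a=1&amp;b=2 must stay inside the URL.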
.95 4/26/02
- Changed the design around a lot.
- The design is cleaner.
- It now outputs valid XHTML. It closes paragraphs and everything!

- It now collapses blank lines between paragraphs. If you have a bunch of blank lines, it treats them simply as the end of one paragraph and the beginning of another, not as a paragraph with a bunch of <br />s in it. This lets it produce the same output even if you decide to put some leading whitespace before a directive, before a blockquote, etc. This is usually what you want. If you really want a bunch of blank lines, you can put equals signs down the left for however many lines you want (that makes preformatted text).
- Because all normal text is enclosed within <p> tags now, I added an option ("use_initial_para") to skip the first paragraph so it won't be padded with whitespace.
- Added an escape to links in the form "linktext"="url": just put a \ in front of whichever quote in the link you want to escape. That also lets you include quotes within the link text, like so: Keith "The Rock" Devens
- Similar escape also added to image links: img:"/images/kbd.gif":"This is the \"KBD\" image on my website" renders the image with the quotes kept in its alt text.

- Added the "location_offset" 'flag' so you can have absolute links resolve to places aside from your root.
- Renamed the "full_location" flag to "main_location". main_location is the 'canonical' location for the links after a ?leadin to point to.
- Added a "mad" smiley face.
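A minimal Python sketch of the blank-line collapsing, reading the input line by line. Joining the lines of one paragraph with <br /> is an assumption about the parser's output, split_paragraphs is a hypothetical name, and the sketch ignores directives, blockquotes, and other block types.

```python
def split_paragraphs(text):
    """Group lines into <p> elements; any run of blank lines ends a paragraph.
    Joining a paragraph's lines with <br /> is an assumption about the output."""
    paragraphs, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:  # first blank line after some text closes the paragraph
            paragraphs.append(current)
            current = []
        # further blank lines (or leading ones) are simply skipped
    if current:
        paragraphs.append(current)
    return ['<p>' + '<br />'.join(p) + '</p>' for p in paragraphs]
```

Because only the current paragraph is kept between lines, this stays close to the line-by-line, minimal-state style of parsing.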

.95b 4/27/02
- Fixed a bug with strikeout: if you had text like this -- here is some text between double hyphens -- it would be struck out, even though it obviously shouldn't be. That's because the hyphen was considered a word character (something allowed to be surrounded by inline markup). So, just for the strikeout style, I removed the hyphen as a word character, and that fixed the problem. Note that --here is some text between double hyphens-- (with no spaces just inside the hyphens) is supposed to be struck out.
.95c 4/27/02
- Fixed heading id generation. Now all headings are guaranteed to have valid ids for use in a ?tableofcontents.
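A Python sketch of one way to generate valid, unique heading ids (uniqueness for repeated heading text arrives in .98). The slug scheme, the "h-" prefix, the "heading" fallback, and the "-2" suffixes are assumptions; the one hard rule relied on is that HTML 4 and XHTML 1.0 ids must begin with a letter.

```python
import re


def make_heading_id(text, used):
    """Hypothetical helper: turn heading text into a valid, unique id.
    `used` is the set of ids issued so far in this document."""
    slug = re.sub(r'[^A-Za-z0-9]+', '-', text).strip('-').lower()
    if not slug:
        slug = 'heading'  # assumed fallback when nothing usable is left
    elif not slug[0].isalpha():
        slug = 'h-' + slug  # HTML 4 / XHTML 1.0 ids must start with a letter
    candidate, n = slug, 2
    while candidate in used:  # same text seen before: append -2, -3, ...
        candidate = f'{slug}-{n}'
        n += 1
    used.add(candidate)
    return candidate
```

A heading like ".98b 5/17/02" starts with a digit once punctuation is stripped, which is exactly the case the prefix handles.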
.95d 5/8/02
- Changed the formatting code for strikeout. Now it must be two hyphens -- rather than one, and it won't happen in the middle of something--like--this anymore. Unless you use three hyphens, as in three---hyphens---here, I suppose. I'll have to improve my regex to more completely specify the boundary characters I want. Right now I just use "\W", which includes the hyphen, which I don't want. Time to get out my ASCII chart.
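A Python sketch of a stricter strikeout regex along the lines described above: instead of \W, the boundary check treats hyphens like word characters, and the struck text must not start or end with a space. The <del> output element and the name render_strikeout are assumptions.

```python
import re

# --text-- is struck out only when the opening -- isn't glued to a word character or
# another hyphen, the closing -- isn't followed by one, and there is no space just
# inside the markers. Hyphens are part of the boundary set on purpose (\W would not
# exclude them). The <del> element is an assumed output.
STRIKE = re.compile(r'(?<![\w-])--(?=\S)(.+?)(?<=\S)--(?![\w-])')


def render_strikeout(text):
    return STRIKE.sub(r'<del>\1</del>', text)
```

With this pattern, something--like--this and three---hyphens---here are both left alone.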
.96 5/13/02
- Added an "output_directly" flag that will make it output the markup directly rather than returning it as a string - function returns an empty string if this flag is set
- Added stylesheet classes to all HTML tags output by the parser.
.97 5/14/02
- Added the ability to autonumber headings, like the headings in the CSS2 specification, for instance.
- The only thing left is to be able to choose what number you start numbering at
- Added stylesheet classes to lists, which I had missed before
- Added the ability to have unordered lists with no "bullet" by using the prefixes _ or *~.
I don't expect _ to clash with underlining too much. Oops, it did. Now you can only do "blank lists" with *~
- Also changed the lists to use the "list-style-type" style rather than the deprecated type attribute.
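A minimal Python sketch of heading autonumbering with one counter per level, where a shallower heading resets the deeper counters. This is an illustration only; autonumber is a hypothetical name, and the parser's own implementation and number format aren't shown.

```python
def autonumber(levels):
    """Given heading levels in document order (1 for h1, 2 for h2, ...),
    return section numbers such as "1", "1.1", "1.2.1"."""
    counters = []
    numbers = []
    for level in levels:
        while len(counters) < level:  # entering a deeper level: its counter starts at 0
            counters.append(0)
        del counters[level:]  # leaving deeper levels: forget their counters
        counters[level - 1] += 1
        numbers.append('.'.join(str(c) for c in counters))
    return numbers
```

If a level is skipped (an h1 followed directly by an h3), this sketch yields "1.0.1".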
.98 5/17/02
- Added footnotes[1]. An autonumbered footnote is written "[# this is an autonumbered footnote]"[2], and one with a specific label is written "[#label This is a footnote with a specific label]"[label]. They'll be displayed at the bottom of this list.
- Also added a ?footnotes directive. By default, all footnotes will be displayed at the end of the document. However, if you'd like to show all footnotes per heading, per paragraph, etc. all you have to do is use the footnotes directive. It'll show all footnotes seen up to that point, and then clear the list of footnotes for next time.
- Added two more unordered list beginnings: "-" and "+". "-" seems to be commonly used for lists, and reStructuredText also allows "+". I figure neither should clash with anything, so I added them[3]
- Made a few other improvements and fixes. For instance, all headings now get a unique id, so even if two headings contain the same text (which shouldn't happen anyway), their ids in the HTML will differ.
- Finally, I added an "enable_footnotes" flag to match the new footnotes functionality.
.98b 5/17/02
- Minor bug fixes: I had to change the "precedence" of lists versus horizontal rules, now that I included "-" as a list begin character.
- Also changed how footnotes are formatted, slightly.
Footnotes:
[1]: The only thing left to improve is making it differentiate between numeric labels and autonumbered footnotes. In other words, I have to make sure that if someone uses a numeric label, it won't overwrite an automatically generated footnote. This will let you have footnotes with specific numbers if you choose.
[2]: this is an autonumbered footnote
[label]: This is a footnote with a specific label
[3]: You have to be consistent in your use of the different types of unordered lists. If you begin a list with a "*" and try to continue it with "-", it'll be considered a new list.
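A Python sketch of the footnote bookkeeping described for .98: footnotes are replaced by references as they are seen, and a ?footnotes-style flush emits everything collected so far and then clears the list. The Footnotes class, the plain-text output, and the choice that labeled footnotes don't use up an autonumber are all assumptions.

```python
import re

# "[# text]" is autonumbered; "[#label text]" uses the given label.
FOOTNOTE = re.compile(r'\[#(\S*) ([^\]]+)\]')


class Footnotes:
    """Hypothetical footnote collector, not the parser's API."""

    def __init__(self):
        self.pending = []  # (label, text) pairs seen since the last flush
        self.counter = 0   # last autonumber handed out

    def replace(self, text):
        """Swap each footnote for its [label] reference and remember its text."""
        def repl(match):
            label = match.group(1)
            if not label:  # no label given: autonumber it
                self.counter += 1
                label = str(self.counter)
            self.pending.append((label, match.group(2)))
            return f'[{label}]'
        return FOOTNOTE.sub(repl, text)

    def flush(self):
        """What a ?footnotes directive would print: everything seen so far, then clear."""
        lines = [f'[{label}]: {text}' for label, text in self.pending]
        self.pending = []
        return lines
```

Calling flush at the end of the document gives the default behavior; calling it earlier gives per-heading or per-paragraph footnotes.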
.98c 5/20/02
- Bug fixes/minor improvements:
- Made all lists output pretty-printed HTML
- Fixed a bug in footnotes. I had to make the formatting more specific; otherwise it lost some DWIMmetry. Before, footnotes could look like this: [# auto-numbered footnote] or like this: [label labeled footnote]. Now they have to look like [# auto-numbered footnote] or [#label labeled footnote].
- Added the ability to put whitespace between list item specifiers, so your lists can begin with things like "* * #" rather than "**#". You can put any number of tabs, spaces, etc. It's not quite as nice as the way DocUtils handles it yet, but at least it makes deeply nested lists of different types a little easier to read.
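A Python sketch of reading a list prefix with optional whitespace between the markers. It only knows * (unordered) and # (ordered), a simplification of the parser's list types, and it requires whitespace after the last marker so inline styles like *emphasis* aren't mistaken for lists. parse_list_item is a hypothetical name.

```python
import re

# One or more list markers (* unordered, # ordered; an assumed subset), optionally
# separated by spaces or tabs, then at least one space or tab before the item text.
LIST_ITEM = re.compile(r'[ \t]*((?:[*#][ \t]*)*[*#])[ \t]+(.*)$')


def parse_list_item(line):
    """Return (markers, text) with inner whitespace removed, or None if not a list item."""
    m = LIST_ITEM.match(line)
    if m is None:
        return None
    markers = re.sub(r'[ \t]+', '', m.group(1))
    return markers, m.group(2)
```

The nesting depth is len(markers), and each marker character gives the list type at that level.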
.98d 5/21/02
- Added the ability to float an image right or left by writing imgr:"url":"alt" or imgl:"url":"alt" instead of just img:"url":"alt"
- Made some of the function's data static for a speedup (hopefully) on calls after the first.
.99 5/21/02
- Got that whole extension framework in
- And the first extension was a code highlighter (which right now has a PHP mode)
- The extension to highlight code in general is ?!code
- And if you give it a language parameter of php ("?!code:language=php") then it'll PHP highlight the code.
- And you end the extension by ?!/code.
.99b 5/22/02
- Added line numbering to my code block extension. The parameter is "line_numbers=1". Could be improved a tad, maybe.
- Also made sure that if no language is specified, htmlspecialchars and nl2br are applied to the source text.
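A Python sketch of the ?!code ... ?!/code extension flow: collect the lines, look up a highlighter by the language parameter, and otherwise fall back to escaping plus line breaks, as htmlspecialchars and nl2br do in PHP. The function names, the highlighter dictionary, and treating further ":" as separators between parameters are assumptions.

```python
import html


def parse_params(rest):
    """Parse ":language=php"-style parameters. Treating further ':' as a separator
    between several parameters is an assumption."""
    params = {}
    for part in rest.split(':'):
        if '=' in part:
            key, value = part.split('=', 1)
            params[key] = value
    return params


def render_code_blocks(lines, highlighters):
    """Hypothetical helper: replace each ?!code ... ?!/code block with HTML."""
    out, i = [], 0
    while i < len(lines):
        line = lines[i]
        if not line.startswith('?!code'):
            out.append(line)
            i += 1
            continue
        params = parse_params(line[len('?!code'):])
        body = []
        i += 1
        while i < len(lines) and lines[i].strip() != '?!/code':
            body.append(lines[i])
            i += 1
        i += 1  # step past the closing ?!/code line
        source = '\n'.join(body)
        highlight = highlighters.get(params.get('language'))
        if highlight is not None:
            out.append(highlight(source))
        else:
            # No language: escape it and keep the line breaks,
            # like htmlspecialchars followed by nl2br in PHP.
            out.append(html.escape(source).replace('\n', '<br />\n'))
    return out
```

Keeping highlighters in a dictionary keyed by language is one way a PHP mode and an HTML mode could sit side by side.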
.99c 5/22/02
- Added an HTML code highlighter
.99d 5/23/02
- Improved blockquote handling. Now blockquotes behave more like paragraphs: consecutive lines beginning with ">" are in the same blockquote, and a blank line ends it, so the next ">" line starts a new blockquote.
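A minimal Python sketch of the blockquote grouping: consecutive lines starting with ">" are gathered into one blockquote, and any other line (including a blank one) closes it. Wrapping the content in a <p> follows the later .99g change; joining lines with <br /> and the name group_blockquotes are assumptions, and the sketch checks for a raw ">" rather than the escaped &gt; the parser actually sees.

```python
def group_blockquotes(lines):
    """Hypothetical helper: merge consecutive ">" lines into one <blockquote>."""
    out, quote = [], []

    def flush():
        if quote:
            out.append('<blockquote><p>' + '<br />'.join(quote) + '</p></blockquote>')
            quote.clear()

    for line in lines:
        if line.startswith('>'):
            quote.append(line[1:].strip())
        else:  # a blank or ordinary line ends the current blockquote
            flush()
            out.append(line)
    flush()
    return out
```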
.99e 5/23/02
- Replaced the string comparisons in all state checks with defined constants. It had to be done.

.99f 5/24/02
- In an asymptotic march towards version 1.0, added some self-reflection features. There is now a ?settings directive that will show you all current flags and their settings.
- Cleaned up code a little.
- Changed the behavior of directives when the directive isn't enabled.
- Previously, the line would be blanked out if a directive was used which wasn't enabled.
- Now, it'll actually display the text of the directive. In other words "?footnotes", "?tableofcontents", etc. will literally be displayed.
- Fixed leadin behavior
- Now the enable_leadin flag controls whether leadins are enabled, and show_leadin controls whether the leadin is displayed when leadins are enabled. There was also a bug I introduced along the way where I compared the first seven characters after the ? (for the directive) to see if they matched "leadin". However, "leadin" is only six characters, so that comparison was wrong.
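A small Python sketch of a directive check that avoids the hard-coded length bug: each name is compared against exactly len(name) characters. Returning None for a disabled directive lets the caller display the line literally, as .99f describes. The DIRECTIVES table and match_directive are assumptions, not the parser's code.

```python
# Hypothetical directive table; the real parser's list and dispatch are not shown here.
DIRECTIVES = ('tableofcontents', 'footnotes', 'settings', 'leadin', 'page')


def match_directive(line, enabled):
    """Return the directive name for a "?name" line if it is enabled, else None.

    None means "not a usable directive", so the caller outputs the line as text.
    """
    if not line.startswith('?'):
        return None
    for name in DIRECTIVES:
        # Compare exactly len(name) characters instead of a hard-coded count,
        # so "leadin" is checked against 6 characters, not 7.
        if line[1:1 + len(name)] == name:
            return name if name in enabled else None
    return None
```

With a seven-character comparison, "?leadin" followed by a newline yields "leadin" plus the newline, which never equals "leadin".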
.99g 5/25/02
- Turns out CSS doesn't allow underscores in class selectors, so I changed the class for all elements from "st_markup" to "st-markup". Before 1.0, I may also allow you to define your own class to go along with markup elements.
- Added paragraphs inside blockquotes - just found out they're necessary for valid HTML.
- Added alt attributes to all the image replacements (for smileys) for valid HTML.
- Changed <u> to <em style="font-style: normal; text-decoration: underline"> because <u> isn't allowed in XHTML Strict (boo).
To do
- For Version 1
- Fix a parsing peculiarity with extensions: if there's no blank line after one, the parser isn't put back into a "nothing" state before continuing.
- Add a "top level heading" flag that sets the highest heading level allowed in your document. It would break the structure of a document if you had an h1 where an h4 (or whatever) should be.
- Add a few more styles, including plain indenting (without having to be a blockquote or a list)
- Make sure I'm generating valid XHTML.
- I just found out blockquotes have to have internal elements - like paragraphs, etc.
- Add comments
- Provide arbitrary anchors - with this in place I can go about rewriting all the files on my site to use the markup parser. I'll never have to write HTML again! Mwa ha ha ha! (Although of course, for static content, using the markup parser to render it every time doesn't make sense)
- For Version 2 (not necessarily everything)
- Add definition lists
- Provide an option for headings to link back to the table of contents. (actually, this requires two passes over the source - I don't like it)
- Do inclusion and transclusion
- Inclusion is when a file is taken verbatim and included in the final output
- Can be from the local filesystem
- Or from an Internet resource
- And I need to be able to specify whether the file is then to be interpreted or not
- Transclusion is when you include a resource and put it in an iframe, basically

- Maybe redo quoted links to be a little smarter, and look for "=" instead of just a quote.
- Maybe add PHP code highlighting
- Other directives:
- ?c comment? - probably just ??
- ?pragma type - (maybe) - can switch modes for the following text until another pragma is hit.
- So if you hit ?pragma raw you don't have to put ";" in front of every line.
- This also solves the nesting issue I had worried about with making tags like [tag]a bunch of lines[/tag]
- Be able to choose what number your headings start at if they're autonumbered - this is definitely for after version 1
- Custom text substitutions somehow? so (?scriptingnews?) or (?= scriptingnews?) might turn into <a href="http://www.scripting.com/">Scripting News</a>? Or maybe even use backquotes (`) to do this?
- Or maybe just any arbitrary anchor. All of this may be an extension of the normal linking format: "link"="#anchor".
- (but how are anchors generated... ?anchor directives?)
- Add the option to consider all markup one paragraph.
- Steal ideas from every wiki, StructuredText NG, etc.

- Other needs, which I'm not sure how to provide now:
- I need some way to have multiple paragraphs in a list item
- I need a "literal" type of markup, distinct from "raw HTML". Literal will do HTML escaping, but not any of the styles. I suppose it's a cross between raw HTML and PRE blocks. I'll probably make my literal a directive, ?literal, ending with ?/literal, maybe. It would be nice to have an inline format, but that would complicate just about everything else in the script, so that's probably out. If you want it inline, use escapes. If I were to use an inline format, I'd probably do it with backquotes (`). However, if I cheat a little and have the "literal" inline markup just escape all other markup, and then evaluate it as normal, it should work just fine. Something to try.
- I want to do definition lists, but I haven't been able to decide on a syntax. I'll probably do something that looks like RFC822 headers (maybe with the colon in the front like some of the StructuredText parsers do, but I want to be very careful with my use of the colon on the first line. I only get to use it once, so I want to use it for something good, and definition lists aren't used that frequently).
- I want to support multiple lines in a cell, rowspans, alignment within table cells, and header rows and columns. Definitely not the way other StructuredText parsers do it, which is to have this gross thing that really does look like a table, with hyphens and plus signs and everything: not easy to parse, too much typing, etc. A major goal is to be able to have this stuff parsed line by line, while maintaining minimal state between lines. That's how I have it now, and I'm confident it can be continued that way. Other StructuredText parser formats are far too complex in this respect.
Notes
- reStructuredText uses significant leading whitespace to delimit markup. I'm not so sure I like this, so I don't think I'm going to do it.
- It also has "adorned" headings, which I definitely don't like (way more complicated, takes more typing, complicates the parser a lot, isn't well defined, etc.), so I'm not going to do that either.