Keith Devens .com

Thursday, May 15, 2003

Explanation of the remaining problem with my StructuredText parser

I figured I'd throw this out there. This explains what the last remaining difficulty in my StructuredText parser is.

The problem is this: if you have tokens that are made up of the same characters, the parser needs to do a better job of figuring out which one opens and which one closes. It has to DWIM (do what I mean) better than it's doing. So, if bold is '**' and italic is '*', and you have a line like ***bold italic** italic* or ***bold italic* bold**, it has to be able to parse both correctly. The regular expressions in my current parser do that correctly, so when I get a chance to dive into it I should be able to make what worked for my regular expressions work for my new parser, which doesn't use regular expressions. How hard that's going to be to code depends on whether the regular expressions worked because of backtracking.

I think part of the solution is going to be, for a given open or close tag (such as '*'), to look at what other tags share that character and simply make sure that it can only be one tag. That is, to make sure it's not ambiguous. This involves precedence rules to disambiguate, such as "short tags match first", so that '*' will match before '**', and it shouldn't involve searching around more than a few characters. Hopefully, it will only involve code when matching an opening tag, and hopefully it'll only involve looking forward, and not back. We'll see.
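
To make the idea concrete, here is a minimal sketch in Python of one way a non-regex parser could disambiguate a leading '***'. It is not my parser: the names (`split_runs`, `next_run_length`, `render`), the HTML output, and the rule itself are assumptions made for illustration. The rule used is "look forward to the next run of asterisks and open last whichever tag closes first", which only looks forward but may scan further than a few characters.

```python
from itertools import groupby

OPEN = {"i": "<i>", "b": "<b>"}
CLOSE = {"i": "</i>", "b": "</b>"}
WIDTH = {"i": 1, "b": 2}  # asterisks consumed by each tag: '*' italic, '**' bold


def split_runs(text):
    """Split text into alternating plain chunks and runs of asterisks."""
    return ["".join(group) for _, group in groupby(text, key=lambda ch: ch == "*")]


def next_run_length(tokens, pos):
    """Length of the first asterisk run after tokens[pos], or 0 if there is none."""
    for tok in tokens[pos + 1:]:
        if tok.startswith("*"):
            return len(tok)
    return 0


def render(text):
    """Render '*italic*' and '**bold**' as HTML (hypothetical output format)."""
    tokens = split_runs(text)
    out, stack = [], []  # stack holds the open tags, innermost last

    def open_tag(tag):
        stack.append(tag)
        out.append(OPEN[tag])

    for pos, tok in enumerate(tokens):
        if not tok.startswith("*"):
            out.append(tok)
            continue
        n = len(tok)
        while n > 0:
            top = stack[-1] if stack else None
            # A '**' seen while only italic is open starts bold; it does not
            # close italic and reopen it.
            opens_bold = top == "i" and n == 2 and "b" not in stack
            if top is not None and n >= WIDTH[top] and not opens_bold:
                stack.pop()
                out.append(CLOSE[top])
                n -= WIDTH[top]
            elif not stack and n >= 3:
                # Ambiguous '***': the tag that closes first must be opened last.
                bold_closes_first = next_run_length(tokens, pos) == 2
                for tag in (("i", "b") if bold_closes_first else ("b", "i")):
                    open_tag(tag)
                n -= 3
            elif n >= 2 and "b" not in stack:
                open_tag("b")
                n -= 2
            elif "i" not in stack:
                open_tag("i")
                n -= 1
            else:
                out.append("*" * n)  # nothing sensible to match: keep literally
                n = 0
    while stack:  # close anything left open at the end of the line
        out.append(CLOSE[stack.pop()])
    return "".join(out)


print(render("***bold italic** italic*"))
print(render("***bold italic* bold**"))
```

With this rule, the first line comes out with italic on the outside and the second with bold on the outside. Mis-nested input is handled only crudely, and nothing here says whether a "few characters" of lookahead would be enough in practice.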

By the way, I've added a StructuredText page on my wiki to keep track of different StructuredText implementations. Also keep in mind that when I refer to StructuredText, that's not one set thing. I use "StructuredText" to refer to any plain text format that uses conventions to give structure to the text without having to use a formal markup language like XML or LaTeX.
