Keith Devens .com
Friday, August 29, 2008

Hans (http://zephyrfalcon.org/) wrote:
Hans (http://zephyrfalcon.org/) wrote:
That should really have been "\\:", of course; it's lucky that "\:" happens to work as well.
Keith (http://keithdevens.com/) wrote:
Replace the escaped colon "\:" with the special character, then split on the real colon...
But you still have the question of how to identify the escaped colon. For instance, you can have "key\::value", "key\\::value", "key\\\::value", or "key\\\\::value". The first colon is "real" only in the second and fourth examples, but it looks like "\:" in all of them.
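To make the rule concrete, here is a minimal Python sketch: a colon is "real" exactly when the run of backslashes immediately before it has even length (zero included). The helper name `is_real_colon` is made up for illustration and is not something from this thread.

```python
def is_real_colon(s, i):
    """Return True if the colon at index i is preceded by an even run of backslashes."""
    if s[i] != ":":
        raise ValueError("no colon at index %d" % i)
    run = 0
    j = i - 1
    # Walk left over the backslashes that sit directly before the colon.
    while j >= 0 and s[j] == "\\":
        run += 1
        j -= 1
    return run % 2 == 0

# The four examples from above, as raw strings so every backslash is literal.
examples = [r"key\::value", r"key\\::value", r"key\\\::value", r"key\\\\::value"]
results = [is_real_colon(s, s.index(":")) for s in examples]
print(results)  # [False, True, False, True]
```

Only the second and fourth come out as real, which is exactly the distinction a plain search for "\:" can't see.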
Leland Johnson (http://protoplasmic.org/) wrote:
Well, in perl you can use the fancy "zero-width negative look-behind assertion" and "zero-width positive look-behind assertion" thingies.
Try this:
perl -e'print join("\n", split(qr/(?<=\\\\)
(?<!\\):/, <STDIN>))'
with this input:
asdfsdf:here be colon \: haha!:ending backslash \\:real end
Should result in:
asdfsdf
here be colon \: haha!
ending backslash \\
real end
Here's it explained:
(?<=\\\\):
Match "\\:" without including backslash backslash in the match (so the split doesn't eat it up).
or
(?<!\\):
Match a ":" that does not have a backslash preceding it.
(Note that backslashes were escaped in the regex, but not the sample strings.)
So you'd still have to replace "\:" with ":" and "\\" with "\" afterwards anyway.
The "zero-width negative look-behind assertion", "(?<!pattern)", and "zero-width positive look-behind assertion", "(?<=pattern)", expressions are quite arcane/obscure though.
I'll readily admit that I had to pull up perlre to remember this, and I got it wrong a few times too.
I would very much advise against using this in production code. It doesn't handle "\\\\\:" etc. properly. A regex doesn't beat a state machine though. I think it wins on speed and readability, but I wouldn't know. Thanks for the fun problem though!
Text::Balanced might be able to do what you want.
And your blogging software seems to be actually using something like my solution - I can't get "n backslashes and a double quote" (where n >= 3) to display properly.
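For comparison, the state-machine approach mentioned above might look roughly like this in Python. It's only a sketch: the name `split_unescaped` is invented here, and it assumes a backslash always escapes the single character that follows it. Like the Perl one-liner, it leaves the backslashes in the output.

```python
def split_unescaped(s, sep=":", esc="\\"):
    """Split s on sep, skipping any sep that is escaped by esc."""
    parts = []
    current = []
    escaped = False  # True when the previous character was an unpaired esc
    for ch in s:
        if escaped:
            current.append(ch)  # escaped character is kept verbatim
            escaped = False
        elif ch == esc:
            current.append(ch)  # keep the backslash itself, unescape later if wanted
            escaped = True
        elif ch == sep:
            parts.append("".join(current))  # a real separator ends the field
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts

line = r"asdfsdf:here be colon \: haha!:ending backslash \\:real end"
fields = split_unescaped(line)
print("\n".join(fields))
```

On the sample input this gives the same four fields as listed above, and a string like "key\\\\\:value" stays in one piece because the fifth backslash pairs with the colon.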
Leland Johnson (http://protoplasmic.org/) wrote:
And your blogging software stole the colon pipe in the middle of the expression and replaced it with
. 

Posting from lynx is not fun.
Keith (http://keithdevens.com/) wrote:
If you'd used a code block it wouldn't have happened:
perl -e'print join("\n", split(qr/(?<=\\\\):|(?<!\\):/, <STDIN>))'
Keith (http://keithdevens.com/) wrote:
I think what I want to do is provably impossible. Only, I'm not sure how to prove it. I was hoping someone more clever than I could think of a way.
Jonas wrote:
import re
print re.split(r'(?<!\\):', 'te\:st\\1:va\\lue1')
129.42.208.182 wrote:
Instead of saying "one or more non-colon characters", you say "one or more of either a backslash followed by anything or a non-colon character".
/^\s*((\\.|[^:])+)\s*:\s*(.*?)\s*$/
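Here is the same pattern tried from Python's `re` module, as a quick sketch on a well-behaved input. The variable names are arbitrary. Because the inner group is capturing, the value ends up in group 3 rather than group 2.

```python
import re

# Same pattern as above: the key is one or more of
# "a backslash followed by anything" or "a non-colon character".
PAIR = re.compile(r'^\s*((\\.|[^:])+)\s*:\s*(.*?)\s*$')

m = PAIR.match(r'foo\:bar:baz\:xyzzy')
key, value = m.group(1), m.group(3)
print(key)    # foo\:bar
print(value)  # baz\:xyzzy
```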
Keith (http://keithdevens.com/) wrote:
>>> re.split(r'(?<!\\):', 'te\\:st\\1:va\\lue1')
['te\\:st\\1', 'va\\lue1']
Keith (http://keithdevens.com/) wrote:
129.42.208.182:
Excellent. Only one problem:
$\ = "\n";
$_ = 'key\\\\\\\\\\:value';
/^\s*((?:\\.|[^:])+?)\s*:\s*(.*?)\s*$/;
print;
print $1;
print $2;
(regex modified slightly to make the inner group non-capturing and to make the capture non-greedy)
Prints:
key\\\\\:value
key\\\\\
value
So, any way you can think of to get it to not split in a case like this when it shouldn't?
Note that if there was another option for it:
key\\\\\:value:value2
key\\\\\:value
value2
It correctly waits until it finds a match later on in the string.
Keith (http://keithdevens.com/) wrote:
Hey, I think this did it:
$\ = "\n";
$_ = 'key\\\\\\\\\\:value';
/^\s*((?:\\.|[^:])+)\s*(:?)\s*(.*?)\s*$/;
print;
print $1;
print $2;
print $3;
Prints:
key\\\\\:value
key\\\\\:value
And if you add a backslash to the input above, it prints:
key\\\\\\:value
key\\\\\\
:
value
So, as long as $2 is set to a colon you know it got through the key to the value, and not just that there was a blank value.
Alternatively:
use warnings;
use strict;
$\ = "\n";
$_ = 'key\\\\\\\\\\:value';
/^\s*((?:\\.|[^:])+)\s*:?\s*(.*?)??\s*$/;
print;
print $1;
print $2;
Gives:
key\\\\\:value
key\\\\\:value
Use of uninitialized value in print at test.pl line 8.
Now I just wonder if there's any way to make the regex fail altogether on an invalid key:value.
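One possible way to get that, sketched in Python rather than Perl: take the backslash out of the "ordinary character" class, so a backslash can only ever be consumed together with the character after it. The regex engine then has no way to backtrack into the middle of an escape pair. This assumes a backslash is never legal on its own in a key; `STRICT` and `parse_pair` are names made up for this example, and I haven't tested it beyond the cases shown.

```python
import re

# Assumption: a backslash only appears as the first half of an escape pair,
# so it is excluded from the plain-character class [^:\\].
STRICT = re.compile(r'^\s*((?:\\.|[^:\\])+?)\s*:\s*(.*?)\s*$')

def parse_pair(line):
    """Return (key, value), or None when there is no unescaped colon."""
    m = STRICT.match(line)
    return (m.group(1), m.group(2)) if m else None

print(parse_pair(r'key\\\\\:value'))         # None: the only colon is escaped
print(parse_pair(r'key\\\\\\:value'))        # splits: six backslashes, colon is real
print(parse_pair(r'key\\\\\:value:value2'))  # splits at the second colon
```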

I'm not sure a regular expression is necessary here. If there is a character that is certain not to be used in your key/value string (e.g. 0xff), then you can use that. Replace the escaped colon "\:" with the special character, then split on the real colon, then replace the special character in the parts with a colon again.
>>> s = "foo\:bar:baz\:xyzzy"
>>> t = s.replace("\:", "\xff")
>>> t
'foo\xffbar:baz\xffxyzzy'
>>> parts = t.split(":")
>>> parts
['foo\xffbar', 'baz\xffxyzzy']
>>> key = parts[0].replace("\xff", ":")
>>> value = parts[1].replace("\xff", ":")
>>> key, value
('foo:bar', 'baz:xyzzy')
Just my $0.02...
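A possible extension of the same idea, as a Python 3 sketch: hide escaped backslashes behind a second placeholder before hiding escaped colons, so that an even run of backslashes in front of a colon is no longer mistaken for "\:". Both placeholder characters are assumed never to occur in the input, and `split_pair` is just an illustrative name.

```python
ESC_BACKSLASH = "\xfe"  # assumption: never occurs in the input
ESC_COLON = "\xff"      # assumption: never occurs in the input (same as the session above)

def split_pair(s):
    # Hide "\\" first; str.replace scans left to right, so each run of
    # backslashes is paired off from its start before "\:" is looked at.
    t = s.replace("\\\\", ESC_BACKSLASH).replace("\\:", ESC_COLON)
    parts = t.split(":")
    # Put the hidden characters back, now without their escaping backslash.
    return [p.replace(ESC_COLON, ":").replace(ESC_BACKSLASH, "\\") for p in parts]

first = split_pair(r"foo\:bar:baz\:xyzzy")
second = split_pair(r"key\\:value")
print(first)   # ['foo:bar', 'baz:xyzzy']
print(second)  # ['key\\', 'value']
```

In the second case the key ends in a single literal backslash (shown doubled by Python's repr) and the colon is treated as a real separator.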