The X-Philes: a list of valid XHTML sites. Via Jacques, via Sam. I'm going to ask to be added to the list.
My site isn't bulletproof because I don't strip control characters (which I should). I also serve a MIME type other than application/xhtml+xml to user agents that don't list it in their Accept header, but I think that's a completely reasonable approach.
I need to write a cross-platform C library that can handle strings in a way that makes everyone happy, from a Unix programmer who likes to deal with UTF-8-encoded char* strings to a COM programmer who needs to use BSTRs (or any of the multitude of other string types Windows has).
Essentially, the library needs to parse a data file format with string values that are UTF-8 encoded and can contain null bytes. This is library code, so I need to consume the file and pass the caller a data structure with strings he can use.
How do I make everybody happy? I could pull a BSTR[1] (except with chars instead of wchar_ts), or I could just make every string value a struct with a length and a char*. Either way, Windows people could create a BSTR from it, assuming there's some Windows routine to convert UTF-8 to UTF-16 that doesn't expect null-terminated strings. And I assume "normal" C programmers would be happy enough with a char* pointing to a bunch of bytes containing a UTF-8-encoded string that they can do whatever they want with.
Anyway, what's the best way to handle this so that it makes the most people happy? What would you recommend I do if I were using C++ instead?
Also, I have a separate question. I've been wondering about this for a while: if you create something like a BSTR, where your pointer points not to the beginning of the memory you allocated but to some offset within it, what will free do if you pass it that pointer instead of the start of the allocation? I assume it isn't smart enough to figure out that your pointer is in the middle of block X and free the whole block, right?
Footnotes:
[1]: For those who don't know, a BSTR is a pointer into something like struct { uint32_t size; wchar_t chars[length]; }, where size is the string's length in bytes (not counting the terminator) and chars is null-terminated (two zero bytes, since wchar_t is two bytes on Windows). The BSTR points at struct.chars rather than at the start of the struct, so you step back four bytes to read the length, and you can treat the BSTR as a normal null-terminated wchar_t string as long as your string doesn't contain null characters.