Keith Devens .com

Friday, July 4, 2008 Flag waving
"Let me get this straight. He met with terrorists? Oh, that's good." – John Edwards (on finding out John Kerry's own diary testified to him meeting with North Vietnamese terrorists in Paris)
← My site is now fully unicode-ized and xhtml-ized | Size does matter! Shorter is better :) →

Saturday, May 15, 2004

Unicode testing

I figured it'd be fun to paste in some foreign-language text and see how my site handles it now that I do Unicode :)

I got the following texts from this helpful Unicode test page. I really wanted to find the Shema somewhere where it wasn't shown as an image, but I wasn't able to.

По оживлённым берегам
Громады стройные теснятся
Дворцов и башен; корабли
Толпой со всех концов земли
К богатым пристаням стремятся;

Ἰοὺ ἰού· τὰ πάντʼ ἂν ἐξήκοι σαφῆ.
Ὦ φῶς, τελευταῖόν σε προσϐλέψαιμι νῦν,
ὅστις πέφασμαι φύς τʼ ἀφʼ ὧν οὐ χρῆν, ξὺν οἷς τʼ
οὐ χρῆν ὁμιλῶν, οὕς τέ μʼ οὐκ ἔδει κτανών.

पशुपतिरपि तान्यहानि कृच्छ्राद्
अगमयदद्रिसुतासमागमोत्कः ।
कमपरमवशं न विप्रकुर्युर्
विभुमपि तं यदमी स्पृशन्ति भावाः ॥

This is supposedly Chinese, but my browser doesn't have the fonts installed so I get a bunch of question marks:

子曰:「學而時習之,不亦說乎?有朋自遠方來,不亦樂乎?
人不知而不慍,不亦君子乎?」

有子曰:「其為人也孝弟,而好犯上者,鮮矣;
不好犯上,而好作亂者,未之有也。君子務本,本立而道生。
孝弟也者,其為仁之本與!」

ஸ்றீனிவாஸ ராமானுஜன் ஐயங்கார்

بِسْمِ ٱللّٰهِ ٱلرَّحْمـَبنِ ٱلرَّحِيمِ

ٱلْحَمْدُ لِلّٰهِ رَبِّ ٱلْعَالَمِينَ

ٱلرَّحْمـَبنِ ٱلرَّحِيمِ

مَـالِكِ يَوْمِ ٱلدِّينِ

إِيَّاكَ نَعْبُدُ وَإِيَّاكَ نَسْتَعِينُ

ٱهْدِنَــــا ٱلصِّرَاطَ ٱلمُسْتَقِيمَ

صِرَاطَ ٱلَّذِينَ أَنعَمْتَ عَلَيهِمْ غَيرِ ٱلمَغضُوبِ عَلَيهِمْ وَلاَ ٱلضَّالِّين

Here's a test of some punctuation:

The convention in English is “to use double quotation marks to indicate quotation, and ‘single quotation marks’ for nested quotations.”

En français la convention est « d'utiliser les guillemets français doubles pour les citations, et “ les guillemets anglais doubles ” ou bien ‹ les guillemets français simples › pour les citations imbriquées. »

Auf Deutsch ist die Vereinbarung »umgekehrte zweifache Anführungszeichen für die Zitate zu benutzen, sogar ›einfache Anführungszeichen‹ für die verschachtelte Zitate«; diese Anführungszeichen „dürfen auch solche ‚englische‘ Anführungszeichen sein.“

The en-dash is used between numbers such as in: 1685–1750 (J. S. Bach). It is longer than the hyphen (as in “en-dash”, or, more properly, “en‐dash”) but shorter than the em-dash, which is used — like this — as a sort of parenthesis. Neither should be confused with the horizontal bar which is used to introduce quotation in some cases.
― Like this?
― Right.

The ellipsis is… well, it just is.
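The punctuation marks above look similar but are distinct Unicode code points. As a rough sketch (Python here, not the PHP this site runs on, and `describe` is just an illustrative helper name), the standard `unicodedata` module can report which character is which:

```python
import unicodedata

# The marks discussed above, written as escapes so this source stays ASCII:
# hyphen-minus, hyphen, en dash, em dash, horizontal bar, ellipsis.
MARKS = ["\u002d", "\u2010", "\u2013", "\u2014", "\u2015", "\u2026"]


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for mark in MARKS:
    print(describe(mark))
```

Pasting a suspicious character into `describe` is a quick way to tell whether a page stored a real en dash or just a plain hyphen.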

It'll be interesting to see if my StructuredText parser dies on all of this.

Other than that, one concern I have is that someone could post invalid Unicode data to my site and I would accept it. I wonder whether any environments other than PHP ensure that incoming text is in the proper encoding.
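For illustration only, here is roughly what such a check on incoming data could look like. This is a Python sketch rather than the PHP this site runs on, it assumes the site expects UTF-8, and the function names `is_valid_utf8` and `clean_utf8` are made up for the example:

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        # Strict decoding raises on truncated, overlong, or stray bytes.
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def clean_utf8(raw: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for any invalid bytes."""
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(is_valid_utf8("\u5b50\u66f0".encode("utf-8")))  # well-formed input
    print(is_valid_utf8(b"\xff\xfe"))                     # not valid UTF-8
```

Whether to reject a bad post outright (`is_valid_utf8`) or store a repaired copy (`clean_utf8`) is a policy choice; rejecting is the safer default if the data goes straight into a database.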

Ok, here goes. Will I corrupt MySQL? Will I make PHP crash? Tune in next time, same bat time, same bat channel.


Ok then. Everything seems to have worked (though of course I still can't verify the Chinese), except that the Arabic is not right-aligned. I wonder what I need to do to fix that. The page I got this from began the Arabic blockquote with: <blockquote xml:lang="ar" lang="ar" dir="rtl">. Language and direction are two things I guess I should shoehorn into my StructuredText parser. Interestingly, that page's encoding was ASCII and it used character entities for all the other languages, so the page itself didn't contain any non-ASCII characters at all.
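As a sketch of both ideas (in Python, not the PHP and StructuredText parser this site actually uses, and with hypothetical function names), a parser could guess the direction from the first strongly directional character, emit the same attributes that test page used, and optionally produce a pure-ASCII page with numeric character references:

```python
import unicodedata
from html import escape


def guess_direction(text: str) -> str:
    """Return 'rtl' or 'ltr' from the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew-style and Arabic-style letters
            return "rtl"
        if bidi == "L":          # Latin, Cyrillic, CJK, and so on
            return "ltr"
    return "ltr"                 # nothing decisive found: assume left-to-right


def blockquote(text: str, lang: str) -> str:
    """Wrap text in a blockquote carrying language and direction attributes."""
    direction = guess_direction(text)
    code = escape(lang)
    return (
        f'<blockquote xml:lang="{code}" lang="{code}" dir="{direction}">'
        f"{escape(text)}</blockquote>"
    )


def to_ascii_entities(markup: str) -> bytes:
    """Encode markup as ASCII, turning other characters into &#NNNN; references."""
    return markup.encode("ascii", errors="xmlcharrefreplace")


if __name__ == "__main__":
    quote = blockquote("\u0628\u0633\u0645", "ar")  # three Arabic letters
    print(to_ascii_entities(quote).decode("ascii"))
```

Guessing direction from the first strong character is only a heuristic; the language code still has to come from the author, since it cannot be inferred reliably from the text alone.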


Comments

Adam V. wrote:

I see Asian characters, not question marks. Though of course I can't verify that they are in fact Chinese characters.

∴ Adam V. | 15-May-2004 1:22pm est | #4586

Keith (http://keithdevens.com/) wrote:

Cool, thanks.

Keith | 15-May-2004 2:18pm est | http://keithdevens.com/ | #4587

Adam V. wrote:

If you have an MS Office CD set handy, it may have the "Arial Unicode MS" font on it. This font used to be free for download, but apparently it has been removed. It's a rather large font (23MB!), but worth tracking down if you're dealing with Unicode issues.

Some combination of Firefox and Windows chose to render this entry using that font, so I can see all the languages above.

∴ Adam V. | 15-May-2004 2:33pm est | #4588

Keith (http://keithdevens.com/) wrote:

I just tested under IE6. It has boxes all over the place for many of the different languages. Mozilla deals with Unicode better.

Keith | 15-May-2004 4:02pm est | http://keithdevens.com/ | #4589

209.114.245.216 wrote:

The Chinese text is, in fact, Chinese. Some sort of classical poem, I'm guessing. It's actually traditional Chinese characters, so I'm going to paste some simplified characters into this comment:

这达标、那达标,
都要农民掏腰包;

这大办、那大办,
都是农民血和汗。

∴ 209.114.245.216 | 15-May-2004 5:52pm est | #4590

Keith Gaughan (http://talideon.com/) wrote:

Some nitpicking: you're not supposed to have spaces on each side of an em-dash.

Bit annoyed today because my Compilers final didn't go as well as it should have because my lecturer (who set the paper) is an idiot. Grrr! :(

∴ Keith Gaughan | 17-May-2004 10:13am est | http://talideon.com/ | #4595

Randy Charles Morin (http://www.kbcafe.com) wrote:

∴ Randy Charles Morin | 1-Jun-2004 3:21pm est | http://www.kbcafe.com | #4716

Keith (http://keithdevens.com/) wrote:

What the heck? I thought I followed their guidelines precisely, but I made a mistake. Thanks for pointing that out. Now my page validates.

Keith | 4-Jun-2004 7:36pm est | http://keithdevens.com/ | #4731

Gerardo (http://ase-usa.net) wrote:

Your testing of Chinese characters works fine on my browser. I am particularly having a hard time on a project where I have to use XSLT and XML doc and an XSL style sheet to display a language translation. Do you know of any specific declarations or markups that have to be made to the XML or XSL for this to occur? Is your page an XSLT translation? Many thanks, anyone!

∴ Gerardo | 2-Jul-2004 2:55pm est | http://ase-usa.net | #4900

Keith (http://keithdevens.com/) wrote:

> Do you know of any specific declarations or markups that have to be made to the XML or XSL for this to occur? Is your page an XSLT translation? Many thanks, anyone!

My page is not an XSLT translation. The only thing I can recommend to you is that you make sure all of your text is in Unicode.

Keith | 3-Jul-2004 6:03am est | http://keithdevens.com/ | #4904

Gerardo (http://ase-usa.net) wrote:

Thanks for your response. I went away on vacation last week (yeah -- it was too short). I just read my previous post. I meant to say transform instead of translate. I am using the .Net framework's XslTransform class' Transform method. I'm not even sure if you are using .Net for your blog. It's hard to find help on encoding! Thanks again.

Gerardo

∴ Gerardo | 12-Jul-2004 10:31am est | http://ase-usa.net | #4985
