<?xml version="1.0" ?>
<rss version="2.0">
	<channel>
		<title>Keith's Weblog: Comments on &quot;Iñtërnâtiônàlizætiøn&quot;</title>
		<description>Keith's Weblog: Comments on &quot;Iñtërnâtiônàlizætiøn&quot;, posted on March 16, 2005</description>
		<link>http://keithdevens.com/weblog/archive/2005/Mar/16/I%c3%b1t%c3%abrn%c3%a2ti%c3%b4n%c3%a0liz%c3%a6ti%c3%b8n</link>

		<category>Programming</category>
		<category>This website</category>
		<language>en-us</language>
		<image>
			<link>http://keithdevens.com/weblog</link>
			<title>Keith Devens .com</title>
			<url>http://keithdevens.com/images/kbd.gif</url>
		</image>

		<item>
			<title>by Anne</title>
			<link>http://keithdevens.com/weblog/archive/2005/Mar/16/I%c3%b1t%c3%abrn%c3%a2ti%c3%b4n%c3%a0liz%c3%a6ti%c3%b8n#comment7231</link>
			<guid isPermaLink="false">http://keithdevens.com/weblog/6712#comment7231</guid>
			<pubDate>Thu, 17 Mar 2005 08:22:29 +0000</pubDate>
			<description>&lt;p class=&quot;st-markup&quot;&gt;You can see it quite easily if you hover over the link &amp;quot;Anne claimed&amp;quot; in Firefox. You could also pass the hexadecimal-encoded characters to something that you know uses UTF-8, like Google, and see what the result is. This URI seems to be compliant with the IRI specification, though.&lt;/p&gt;

</description>
		</item>
		<item>
			<title>by Keith</title>
			<link>http://keithdevens.com/weblog/archive/2005/Mar/16/I%c3%b1t%c3%abrn%c3%a2ti%c3%b4n%c3%a0liz%c3%a6ti%c3%b8n#comment7232</link>
			<guid isPermaLink="false">http://keithdevens.com/weblog/6712#comment7232</guid>
			<pubDate>Thu, 17 Mar 2005 09:13:50 +0000</pubDate>
			<description>&lt;p class=&quot;st-markup&quot;&gt;When I hover over it in Firefox the correct Hebrew is displayed in my status bar. Also, if I have the URI in my HTML &lt;em&gt;without&lt;/em&gt; doing the URI percent-encoding, Firefox doesn't complain about invalid UTF-8 (which it would if it were invalid, since I'm using XHTML). Assuming Firefox handles the Unicode for the Hebrew text correctly, there's no way it should be incorrect, since it goes right from Firefox to my db.&lt;/p&gt;

&lt;p class=&quot;st-markup&quot;&gt;When I said I don't know how to check, I meant that I'm not familiar enough with the Hebrew part of Unicode to know what the UTF-8 should be. But as further evidence that it's valid, I ran it through &lt;a href=&quot;/weblog/archive/2004/Jun/29/UTF-8.regex&quot;&gt;this regex&lt;/a&gt; and it said it was valid.&lt;/p&gt;

&lt;p class=&quot;st-markup&quot;&gt;What makes you think it's invalid?&lt;/p&gt;
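
&lt;p class=&quot;st-markup&quot;&gt;The same validity check can be done without a regex by letting a strict UTF-8 decoder accept or reject the octets. This is only a minimal sketch in Python, not the regex linked above; the function name and the sample strings are made up for illustration.&lt;/p&gt;

```python
from urllib.parse import unquote_to_bytes


def is_valid_utf8(percent_encoded: str) -> bool:
    """Return True if the percent-decoded octets are well-formed UTF-8."""
    raw = unquote_to_bytes(percent_encoded)
    try:
        # A strict decode raises on truncated or malformed sequences.
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Illustrative inputs (assumptions, not taken from the database):
print(is_valid_utf8("%D7%A9%D7%81"))  # two complete two-octet sequences
print(is_valid_utf8("%D7%A9%D7"))     # lead octet with no continuation octet
print(is_valid_utf8("%EB"))           # a lone latin-1 octet
```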

</description>
		</item>
		<item>
			<title>by Anne</title>
			<link>http://keithdevens.com/weblog/archive/2005/Mar/16/I%c3%b1t%c3%abrn%c3%a2ti%c3%b4n%c3%a0liz%c3%a6ti%c3%b8n#comment7233</link>
			<guid isPermaLink="false">http://keithdevens.com/weblog/6712#comment7233</guid>
			<pubDate>Thu, 17 Mar 2005 09:22:16 +0000</pubDate>
			<description>&lt;p class=&quot;st-markup&quot;&gt;I get &amp;quot;‏שְׁמַ&amp;quot; as the result, which seems to be too short. Also, the URI itself seems to be too short, as you need to have about 6 characters to express &amp;quot;ë&amp;quot;. See &lt;a href=&quot;http://www.w3.org/2003/06/mod_fileiri/#Testing&quot;&gt;http://www.w3.org/2003/06/mod_fileiri/#Testing&lt;/a&gt;&lt;/p&gt;

&lt;p class=&quot;st-markup&quot;&gt;(Note also that you should probably enable UTF-8 for URIs in Firefox. That is on by default in recent nightlies.)&lt;/p&gt;

&lt;p class=&quot;st-markup&quot;&gt;It looks more like legacy encoding than UTF-8 to me.&lt;/p&gt;

</description>
		</item>
		<item>
			<title>by Keith</title>
			<link>http://keithdevens.com/weblog/archive/2005/Mar/16/I%c3%b1t%c3%abrn%c3%a2ti%c3%b4n%c3%a0liz%c3%a6ti%c3%b8n#comment7234</link>
			<guid isPermaLink="false">http://keithdevens.com/weblog/6712#comment7234</guid>
			<pubDate>Thu, 17 Mar 2005 09:39:25 +0000</pubDate>
			<description>&lt;blockquote class=&quot;st-markup&quot;&gt;&lt;p&gt;I get &amp;quot;‏שְׁמַ&amp;quot; as result which seems to be too short.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p class=&quot;st-markup&quot;&gt;Well, that's all I put. Two characters with two vowels. Though in fact the &amp;quot;Shema&amp;quot; &lt;em&gt;should&lt;/em&gt; have included the silent third letter--‏שְׁמַ֖ע--so that was my mistake in only including the first two. I've &lt;em&gt;never&lt;/em&gt; understood why Hebrew has all these silent letters!&lt;/p&gt;

&lt;p class=&quot;st-markup&quot;&gt;Here's what I get if I break the URL up into UTF-8 sequences (one sequence of three octets, then six sequences of two octets):&lt;/p&gt;

&lt;pre class=&quot;st-markup&quot;&gt;%E2%80%8F %D7%A9 %D7%81 %D6%B0 %D7%9E %D6%B7 %D6%96
&lt;/pre&gt;

&lt;p class=&quot;st-markup&quot;&gt;So, that's seven code points: a right-to-left mark (U+200F), then the first letter with its dot and vowel, then the second letter with its vowel and an accent mark (U+0596). The Hebrew is encoded such that it encodes the letter that serves as the base for both &amp;quot;sin&amp;quot; (שׂ) and &amp;quot;shin&amp;quot; (‏‏שׁ) separately from the dot (which I'm sure has a name that I forget). Two letters plus their marks account for all fifteen octets, so it adds up correctly.&lt;/p&gt;
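
&lt;p class=&quot;st-markup&quot;&gt;As a cross-check on the grouping, a UTF-8 decoder can split the octets itself. The following is a minimal sketch in Python that assumes the percent-encoded string quoted in this comment; the variable and function names are illustrative.&lt;/p&gt;

```python
import unicodedata
from urllib.parse import unquote_to_bytes

# The octets quoted in this comment, joined back into one string.
encoded = "%E2%80%8F%D7%A9%D7%81%D6%B0%D7%9E%D6%B7%D6%96"

# Percent-decode to raw octets, then decode those octets as UTF-8.
text = unquote_to_bytes(encoded).decode("utf-8")


def describe(ch: str) -> str:
    """Show one code point, the UTF-8 octets it uses, and its Unicode name."""
    octets = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
    name = unicodedata.name(ch, "UNNAMED")
    return f"U+{ord(ch):04X}  {octets:8}  {name}"


code_points = [ord(ch) for ch in text]

for ch in text:
    print(describe(ch))
```

&lt;p class=&quot;st-markup&quot;&gt;Each printed row is one code point, so the number of rows is the number of characters the URL segment really contains.&lt;/p&gt;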

</description>
		</item>
		<item>
			<title>by Keith</title>
			<link>http://keithdevens.com/weblog/archive/2005/Mar/16/I%c3%b1t%c3%abrn%c3%a2ti%c3%b4n%c3%a0liz%c3%a6ti%c3%b8n#comment7235</link>
			<guid isPermaLink="false">http://keithdevens.com/weblog/6712#comment7235</guid>
			<pubDate>Thu, 17 Mar 2005 09:49:02 +0000</pubDate>
			<description>&lt;p class=&quot;st-markup&quot;&gt;Oh, by the way Anne. Just so I'm straight, IRIs are just URIs where non-ASCII characters don't have to be percent-encoded so long as they're in UTF-8, right?&lt;/p&gt;

&lt;p class=&quot;st-markup&quot;&gt;What's the status of IRIs? Is the standard done? Are we just waiting for browsers to catch up so we can start using them? How bad is browser support, currently? (Sorry to bother you with so many questions.)&lt;/p&gt;

</description>
		</item>
		<item>
			<title>by Anne</title>
			<link>http://keithdevens.com/weblog/archive/2005/Mar/16/I%c3%b1t%c3%abrn%c3%a2ti%c3%b4n%c3%a0liz%c3%a6ti%c3%b8n#comment7237</link>
			<guid isPermaLink="false">http://keithdevens.com/weblog/6712#comment7237</guid>
			<pubDate>Thu, 17 Mar 2005 12:53:52 +0000</pubDate>
			<description>&lt;p class=&quot;st-markup&quot;&gt;IRIs are a standard. The IRI specification was published together with the new URI RFC. See RFC 3986 (URI) and RFC 3987 (IRI). Some browsers already support them if you use the correct configuration. Here is a simple test case: &lt;a href=&quot;http://www.w3.org/2001/08/iri-test/resumeHtmlImgSrcBase.html&quot;&gt;http://www.w3.org/2001/08/iri-test/resumeHtmlImgSrcBase.html&lt;/a&gt;&lt;/p&gt;

&lt;p class=&quot;st-markup&quot;&gt;(Recent nightlies of Mozilla show a green image there. You can get the same result in Firefox if you change some options in about:config.)&lt;/p&gt;

&lt;p class=&quot;st-markup&quot;&gt;I wasn't aware that you just took a few characters of Hebrew. I thought you took the whole title, just like you are doing here.&lt;/p&gt;

</description>
		</item>
		<item>
			<title>by Keith</title>
			<link>http://keithdevens.com/weblog/archive/2005/Mar/16/I%c3%b1t%c3%abrn%c3%a2ti%c3%b4n%c3%a0liz%c3%a6ti%c3%b8n#comment7238</link>
			<guid isPermaLink="false">http://keithdevens.com/weblog/6712#comment7238</guid>
			<pubDate>Thu, 17 Mar 2005 20:29:13 +0000</pubDate>
			<description>&lt;blockquote class=&quot;st-markup&quot;&gt;&lt;p&gt;I wasn't aware that you just took a few characters of Hebrew. I thought you took the whole title, just like you are doing here.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p class=&quot;st-markup&quot;&gt;No problem. I'm just glad to know it's not broken.&lt;/p&gt;

&lt;p class=&quot;st-markup&quot;&gt;Thanks for the info on IRIs and the link to the test case. I played around with the Firefox setting and now I understand what the trouble is. The page encoding is latin-1, but the URIs always have to be in UTF-8 regardless of page encoding. So, it seems that if your pages use UTF-8 you get IRIs &amp;quot;for free&amp;quot; regardless of what your browser does?&lt;/p&gt;
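
&lt;p class=&quot;st-markup&quot;&gt;The mismatch between page encoding and URI encoding is easy to see by percent-encoding the same character both ways. This is a rough sketch in Python using the standard library's urllib.parse.quote; the variable names are made up, and the title is the one from this post's URL.&lt;/p&gt;

```python
from urllib.parse import quote

ch = "ë"

# Percent-encode the UTF-8 octets of the character (what an IRI maps to).
utf8_form = quote(ch, encoding="utf-8")

# Percent-encode the single latin-1 octet instead (the legacy form).
latin1_form = quote(ch, encoding="latin-1")

# Map a whole non-ASCII path segment to its URI form via UTF-8.
title = "Iñtërnâtiônàlizætiøn"
uri_segment = quote(title, safe="")

print(utf8_form)    # six characters: two octets, each written as %XX
print(latin1_form)  # three characters: one octet
print(uri_segment)
```

&lt;p class=&quot;st-markup&quot;&gt;Note that quote emits uppercase hex digits, while the links on this site use lowercase ones; the two spellings are equivalent in a URI.&lt;/p&gt;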

</description>
		</item>
		<item>
			<title>by Anne</title>
			<link>http://keithdevens.com/weblog/archive/2005/Mar/16/I%c3%b1t%c3%abrn%c3%a2ti%c3%b4n%c3%a0liz%c3%a6ti%c3%b8n#comment7239</link>
			<guid isPermaLink="false">http://keithdevens.com/weblog/6712#comment7239</guid>
			<pubDate>Thu, 17 Mar 2005 23:34:36 +0000</pubDate>
			<description>&lt;p class=&quot;st-markup&quot;&gt;Yeah, I believe that is an advantage of UTF-8 :-)&lt;/p&gt;

</description>
		</item>
	</channel>
</rss>
