MozillaZine has info on Internationalized Domain Names (IDNs). I'm interested in learning how they work. Here's a page on the standards involved, including a list of relevant RFCs, and here's a list of "technical documents".
What's interesting is the following:
Encoding Scheme
The encoding scheme for IDNs will be an ASCII Compatible Encoding (ACE) that will encode the local language characters of an IDN into ASCII characters such that DNS can accurately answer a request for an address record. There are several types of ACE. In order to select an ACE as the standard, IETF must consider the difficult balance between compression and implementation. The preferred ACE will allow the greatest number of characters (code points) to be represented and will not be difficult to deploy. The IETF has chosen an ACE known as Punycode to be the standard.
So it seems they aren't using UTF-8? Here's the RFC for Punycode. One of the most obvious questions I can think of is "How does Punycode compare to UTF-16 and UTF-8?", yet none of their FAQs answer it.
OK, after reading a little of the RFC for Punycode:
Punycode is a simple and efficient transfer encoding syntax designed
for use with Internationalized Domain Names in Applications (IDNA).
It uniquely and reversibly transforms a Unicode string into an ASCII
string. ASCII characters in the Unicode string are represented
literally, and non-ASCII characters are represented by ASCII
characters that are allowed in host name labels (letters, digits, and
hyphens). This document defines a general algorithm called
Bootstring that allows a string of basic code points to uniquely
represent any string of code points drawn from a larger set.
Punycode is an instance of Bootstring that uses particular parameter
values specified by this document, appropriate for IDNA.
So Punycode seems to be something like Base64, but for Unicode strings: it re-encodes them using only ASCII characters. It appears that the canonical form of an internationalized domain name will be in Punycode, so now my only question is how those are distinguished from ordinary domain names (how are the namespaces separate?).
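To get a feel for the transformation, here is a small sketch using the `punycode` codec from Python's standard library (my choice of tool, not something the RFC prescribes). The label `müller` is only an illustrative input. The sketch shows the ASCII characters being carried over literally, the non-ASCII character being folded into the part after the last hyphen, and the round trip being lossless. For contrast, it also prints the UTF-8 form, which contains non-ASCII bytes and so could not go into a host name label as-is.

```python
# Sketch using Python's standard-library "punycode" codec.
label = "müller"  # illustrative label, chosen for this example

encoded = label.encode("punycode")    # ASCII-only bytes
decoded = encoded.decode("punycode")  # back to the original string

utf8 = label.encode("utf-8")          # same label as raw UTF-8 bytes

print(encoded)                        # b'mller-kva'
print(decoded == label)               # True: the transformation is reversible
print(utf8)                           # b'm\xc3\xbcller'
print(encoded.isascii(), utf8.isascii())  # True False
```

Note that this is the bare Punycode step only; it does not add any prefix or do the other processing that IDNA layers on top.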
Well, here are some datasheets and whitepapers. I'll have to figure out the rest later.
Keith:
By way of introduction, I am the technical director for the i-Nav family of plug-ins here at Verisign. Our plug-ins essentially provide resolution and display capabilities to IE, Outlook and Outlook Express. This is required until native support, as in the case of Mozilla, is built into the application based on the standards you reference above.
You raised the question at the end:
"...so now my only question is how those are distinguished from ordinary domain names (how are the namespaces separate?)"
The point is that they are not. The encoding of the domain is an ASCII representation of that domain based upon the IDNA standards (which you reference above). As far as DNS is concerned, this is in the same namespace as any other ASCII domain. This is true for all domains irrespective of the TLD.
In other words, let's take the domain müller.de. The encoded form of this domain is xn--mller-kva.de. If you ran dig against that name, it would return the following:
; <<>> DiG 2.1 <<>> @dns1.menandmice.is xn--mller-kva.de A
; (1 server found)
;; res options: init recurs defnam dnsrch
;; got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10
;; flags: qr rd ra; Ques: 1, Ans: 1, Auth: 2, Addit: 0
;; QUESTIONS:
;; xn--mller-kva.de, type = A, class = IN
;; ANSWERS:
xn--mller-kva.de. 3600 A 81.2.176.59
;; AUTHORITY RECORDS:
xn--mller-kva.de. 3600 NS ns1.nameservice.de.
xn--mller-kva.de. 3600 NS ns6.nameservice.de.
;; Total query time: 166 msec
;; FROM: us.mirror.menandmice.com to SERVER: default -- 0.0.0.0
;; WHEN: Wed Apr 14 18:24:01 2004
;; MSG SIZE sent: 34 rcvd: 98
As you can see, the name is not in a separate namespace but sits directly in the .de zone.
Hope this helps.
Gary.
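The müller.de conversion from the comment above can be reproduced with a short sketch using the `idna` codec in Python's standard library. That codec follows the older IDNA 2003 rules, and using it here is an assumption for illustration, not a description of how any particular browser or plug-in does it. The helper name `is_ace_label` is hypothetical. The sketch shows that only the non-ASCII label gets the `xn--` prefix, so the name handed to DNS is an ordinary ASCII name.

```python
ACE_PREFIX = "xn--"  # prefix IDNA puts in front of Punycode-encoded labels

def is_ace_label(label: str) -> bool:
    """Hypothetical helper: does this label carry the ACE prefix?"""
    return label.lower().startswith(ACE_PREFIX)

domain = "müller.de"

# Convert each label to its ASCII-compatible form, as an application would
# before handing the name to a resolver.
ascii_name = domain.encode("idna").decode("ascii")
labels = ascii_name.split(".")

print(ascii_name)                                  # xn--mller-kva.de
print([is_ace_label(l) for l in labels])           # [True, False]
print(ascii_name.encode("ascii").decode("idna"))   # müller.de
```

So the prefix is a convention that applications recognize in order to display the Unicode form; the DNS itself just sees letters, digits and hyphens.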