Last week's tizzy about IDN (Internationalized Domain Name) spoofing was an interesting exercise in watching how people react to the unknown. The nearly universal response to the problem, which had been described in detail many years ago, was "turn off IDNs" instead of "assume that the people who created IDNs knew about this, so let's do some research."
The following is based on my thoughts this week. For those of you who are not familiar with my earlier work, I'm one of the authors of the IDN standards, and I busted my butt for a couple of years on getting them done correctly and standardized. Yes, I get a tad prickly when people attack "my" standard, particularly when they do so without knowing what they are talking about. I have been trying to get out of the IDN business for the past year or so, but I can see that I'm failing, at least today.
The problem started with Eric Johanson's paper, which came out of the recent Shmoocon. The "problem" is that Eric presented the issue mostly correctly, but stated that the solution was to disable IDNs in browsers. Disabling a feature is only a good security measure when the insecure feature has no benefit; IDNs clearly have lots of benefit, just not to the people who are advocating turning them off. Eric showed the problem very well in a proof of concept that includes SSL servers as well, using Cyrillic characters that look just like ASCII in the fonts used by IE and Mozilla to show URLs.
Quick answers that don't work for the world
After Eric's paper, numerous "fixes" to the problem came out. They fell into one of two categories, both of which are predicated on the idea that IDNs are bad:
- Turn off IDN resolution in the browser altogether
- Make IDN resolution obnoxious so that users pay attention to it
Reading the ensuing Slashdot and other coverage gave me the feeling that nearly everyone talking was from the US, UK, or Australia, the three countries that have the least native need for IDNs.
It also became clear that few of the folks in the discussion knew much about Unicode (and, in some cases, the DNS...). Suggestions like "find all the homographs and map them together" and "ban all domain names that have more than one language in them" reminded me of discussions four years ago with people who were also unfamiliar with the basic topics but felt empowered to speak anyway.
For completeness, I should explain why both of those proposals are silly. The number of homographs in Unicode is in the thousands under the best of situations, and much higher in the worst. The count depends, of course, on the font being used, the carefulness of the person who created the font, and so on. For example, U+013C (ļ, lowercase "l" with cedilla) sometimes appears without the cedilla in smaller fonts. Banning all domain names with more than one "language" would ban names that include both non-ASCII and ASCII characters. This ignores how deeply English and French have mixed with other languages; it is common to find businesses with the word "shop" or "café" in their names throughout the world.
Better solutions
Given the assumption that billions of people would actually like to have their domain names be in characters that they use every day, there have to be better solutions to the homograph spoofing problem. Fortunately, there are. I talked with Eric (a week too late, unfortunately, but I blame that on our mutual friend Rodney Thayer, who managed not to hook us up even though Rodney knew I was co-author on the IDN standard...), the folks at VeriSign's i-Nav group (which has the very popular plugin for Internet Explorer, and with whom I have worked a bit), and a few other weary folks from the IDN standards days. Most agreed that "turn it off" and "make it obnoxious" were not the right solutions for anyone other than the English-only crowd.
Given that the problem is that domain names with more than one script can cause homograph confusion, the solution should highlight names that have more than one script and say what script the characters come from. This can be done with a hover-over pop-up that looks something like:
Note that the pop-up is not a warning, it is informative. There are zillions of valid names that have two scripts in them; there are many, particularly in Japan, that will have three scripts.
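As a rough illustration of the information such a pop-up could carry, here is a minimal Python sketch. It is not how any browser does this. The function names are hypothetical, and the "script" is only guessed from the first word of each character's Unicode name, because Python's standard library does not expose the Unicode Script property. That heuristic is an assumption and will be wrong for some characters.

```python
import unicodedata

def script_of(ch):
    """Guess a script from the first word of the Unicode character name.

    This is a heuristic stand-in for the real Unicode Script property.
    """
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"

def scripts_in_label(label):
    """Map each guessed script to the code points in the label that use it.

    Digits, hyphens, and other non-letters are skipped, since they are
    shared by many scripts.
    """
    found = {}
    for ch in label:
        if not ch.isalpha():
            continue
        found.setdefault(script_of(ch), []).append(f"U+{ord(ch):04X}")
    return found

def popup_text(label):
    """Return informative text for labels with two or more scripts, else None."""
    scripts = scripts_in_label(label)
    if len(scripts) < 2:
        return None
    lines = [f"This name uses {len(scripts)} scripts:"]
    for script, points in scripts.items():
        lines.append(f"  {script}: {', '.join(points)}")
    return "\n".join(lines)

if __name__ == "__main__":
    # "ex\u0430mple" contains a Cyrillic letter that resembles Latin "a".
    for label in ["example", "caf\u00e9", "ex\u0430mple"]:
        print(repr(label), "->", popup_text(label))
```

Note that "café" produces no pop-up here, because "é" is still a Latin letter. The check is about scripts, not about ASCII versus non-ASCII.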
The difficult question is how to show the pop-up in a way that alerts about spoofing but doesn't get in the way of normal IDNs. One easy way is to put an icon to the left of the "favicon" in the address bar, such as:
(Better and more noticeable art is obviously possible.) If that can't be done, replacing the favicon itself is probably sufficient if it is glaring enough, possibly even blinking.
As Eric showed, there is also an issue with SSL sites. Users rarely look at certificates, but if they did today for sites with IDNs in their names, they would see the "xn--"ified name, not the readable one. This would certainly cause confusion. However, even if a certifier (such as Thawte) puts the readable name in the certificate so that it matches what the user sees, doing so would mask possible spoofing. At that point, it would be up to the browser showing the certificate to give more clues (such as my proposed pop-up above) about the actual contents of the domain name.
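To make the "xn--"ified form concrete, the following sketch uses Python's built-in `idna` codec, which implements the original IDNA conversion (RFC 3490). The helper names `to_ascii` and `to_unicode` and the sample domain names are made up for illustration. The point is only that a look-alike name and the real name are different on the wire even when they look the same on screen.

```python
def to_ascii(name):
    """Convert a readable domain name to its ASCII ("xn--") form."""
    return name.encode("idna").decode("ascii")

def to_unicode(ace):
    """Convert an ASCII ("xn--") domain name back to its readable form."""
    return ace.encode("ascii").decode("idna")

if __name__ == "__main__":
    real = "example.com"
    spoof = "ex\u0430mple.com"  # U+0430 is CYRILLIC SMALL LETTER A

    # An all-ASCII name passes through unchanged.
    print(to_ascii(real))

    # The look-alike gets an "xn--" label that does not resemble the original.
    print(to_ascii(spoof))

    # The conversion round-trips, so a browser can display either form.
    print(to_unicode(to_ascii("caf\u00e9.example")))
```

A certificate viewer that shows only the first form leaves the user staring at gibberish, and one that shows only the second hides the difference between the two names, which is why the script information matters in both places.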
Next steps
It would be wonderful, but unlikely, if the various browser developers could work together on this. The meta-discussions last week showed all too well that there is some very deep, possibly justified, mistrust in the browser developer community. If someone took the lead on doing the right thing, others would have to at least respond, if not exactly match what the leader did.
Thus, my greatest hope is for Mozilla, which has consistently shown a great concern for getting internationalization right. Their first round of IDN support wasn't great, but they responded to bug reports and did some very good followup. Apple has another round of Safari coming out in a few months, and VeriSign can slip-stream new versions of the i-Nav plugin to their users at any time. Each of these organizations will probably make a different tradeoff between anti-spoofing and IDN usability; hopefully, none of them will take the extreme stance seen last week.
Thus, there is hope that the "just turn off IDNs" and "make IDNs unattractive" crowds can be quelled, or maybe just limited to the English-dominant countries. There are probably other good solutions that I haven't thought about, and hopefully Eric's re-raising the issue will help spur some creative thinking that leads to even better uses of IDNs in browsers. For example, the Unicode Consortium has already been discussing IDN spoofing at the same time as it looks at other security-related topics. The early draft of their technical report suggests many different ways of thinking about the problem. But it is clear that the best outcome would be for the proposed solutions to come from people who have both a reasonable understanding of internationalization and a reasonable amount of care about languages other than English.