Date: 2018-06-28

When ICANN began to allow registration of internationalized domain names (IDNs), that is, domain names that use non-ASCII characters, it unwittingly opened a new avenue for phishing campaigns. Visually similar characters from different scripts, called homoglyphs, can be used to create domain names whose differences are visually indiscernible, making it easy to fool users into believing that one domain is actually another.

Without using links, consider the differences between ТесhRерubliс and TechRepublic. One is written normally, with ASCII characters. The other substitutes Cyrillic characters for the Latin T, e, c, and p. (The answer to which is which is at the bottom of this article.) Russian lends itself well to homoglyph attacks, as the lowercase a, o, x, and y have visually identical Cyrillic counterparts (а, о, х, and у), with further possibilities among non-Russian Cyrillic characters. Other, less precise homoglyphs are possible as well. For example, the letter i is visually similar to і (Cyrillic) and ì (Latin, with grave).
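One way to see the substitution without relying on your eyes is to ask for each character's Unicode name. The following is a minimal sketch, assuming that the first word of a character's Unicode name is a good enough stand-in for its script (Python's standard library has no true script lookup); the helper names `script_of` and `scripts_used` are made up for this illustration.

```python
import unicodedata

def script_of(ch):
    """Rough script guess: the first word of the character's Unicode name."""
    try:
        return unicodedata.name(ch).split()[0]
    except ValueError:  # code point has no name
        return "UNKNOWN"

def scripts_used(label):
    """Group the letters of a label by their guessed script."""
    found = {}
    for ch in label:
        if ch.isalpha():
            found.setdefault(script_of(ch), []).append(ch)
    return found

# Cyrillic letters are written as escapes so they cannot be mistaken for Latin ones.
spoofed = "\u0422\u0435\u0441hR\u0435\u0440ubli\u0441"
genuine = "TechRepublic"

for name in (spoofed, genuine):
    # ascii() makes the non-ASCII code points visible in the printed output.
    print(ascii(name), sorted(scripts_used(name)))
```

A label that reports more than one script is not necessarily malicious, but it is the pattern that homoglyph attacks on Latin-script brand names tend to produce.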

This is, to some extent, a problem in other languages as well. Japanese has three writing systems: Hiragana, Katakana, and Kanji. The company name Mitsubishi is normally written as 三菱 (three diamonds), and the kanji for three (三) looks similar to the katakana for mi (ミ), which can lead to confusion. As would be expected, it is possible to mix and match these writing systems when registering domain names. In Traditional and Simplified Chinese, many characters are homoglyphs of each other as well.

Principally, this becomes a problem when attackers use these homoglyphs in phishing attacks, as it would be easy to impersonate popular websites using this type of strategy. ICANN's policies on how to deal with this problem—or IDNs in general—are sparse. As a result, each registry has its own rules about how to handle IDNs.

Many ccTLDs and new gTLDs disallow IDNs or restrict how they can be used, though these rules are inconsistent between registries. The .com and .net registries essentially allow anything through. Because these are the most popular TLDs for legitimate websites, the lack of protection makes them especially attractive to phishers.

At present, Google Chrome, Microsoft Edge, and Mozilla Firefox handle mixed-script IDNs by reverting to Punycode, that is, the ASCII representation of an IDN. (Because of the complexity of changing character encoding, IDNs were implemented in a somewhat kludge-like fashion.) So, for the spoofed example above, instead of seeing what appears to be techrepublic.com in the address bar, you would see xn--hrubli-2ofc3hgib.com.
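The conversion itself can be tried with Python's built-in `punycode` codec. This is only a rough sketch of the encoding step: it assumes the label is already lowercase and skips the normalization and validity checks that full IDNA processing performs, and the helper names `to_ace` and `from_ace` are invented for this example.

```python
ACE_PREFIX = "xn--"  # marks a label as Punycode-encoded

def to_ace(label):
    """Return the ASCII form of a single lowercase domain label."""
    if label.isascii():
        return label  # plain ASCII labels are left untouched
    return ACE_PREFIX + label.encode("punycode").decode("ascii")

def from_ace(label):
    """Reverse to_ace for a single label."""
    if not label.startswith(ACE_PREFIX):
        return label
    return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")

# Lowercase form of the spoofed name, with the Cyrillic letters as escapes.
spoofed = "\u0442\u0435\u0441hr\u0435\u0440ubli\u0441"

ace = to_ace(spoofed)
print(ace)                       # the ASCII letters h, r, u, b, l, i come first
print(from_ace(ace) == spoofed)  # round trip back to the Unicode label
```

Note how the encoded form keeps only the genuinely ASCII letters up front, which is why the spoofed name stops looking anything like the real one once a browser falls back to Punycode.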

However, this behavior breaks legitimate cases in which non-Latin characters are expected to be mixed with standard ASCII characters. Microsoft tried to fix this problem in IE by manually whitelisting scripts that are allowed to mix with ASCII without reverting to Punycode.

There is a more elegant solution to this problem, however. For domain names that mix ASCII and non-ASCII characters, changing individual non-ASCII characters in a domain name to red in the address bar would sufficiently differentiate characters otherwise useful for homoglyph attacks while preserving the intended use of IDNs. For obvious reasons, extension engines in browsers generally do not allow this behavior to be implemented as an extension, making it necessary to implement as a feature of the browser itself.
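As a rough illustration of the idea, the sketch below marks non-ASCII characters in red using ANSI terminal escape codes. A browser would do this in its address bar rendering rather than in a terminal, so treat this purely as a stand-in; the function name `highlight_mixed` is hypothetical.

```python
RED = "\033[31m"   # ANSI escape: switch to red text
RESET = "\033[0m"  # ANSI escape: back to the default color

def highlight_mixed(label):
    """Color non-ASCII characters red, but only when the label mixes
    ASCII letters or digits with non-ASCII characters."""
    has_ascii = any(ch.isascii() and ch.isalnum() for ch in label)
    has_non_ascii = any(not ch.isascii() for ch in label)
    if not (has_ascii and has_non_ascii):
        return label  # pure ASCII or pure non-ASCII: leave as is
    return "".join(
        ch if ch.isascii() else f"{RED}{ch}{RESET}" for ch in label
    )

spoofed = "\u0442\u0435\u0441hr\u0435\u0440ubli\u0441"
print(highlight_mixed(spoofed) + ".com")
print(highlight_mixed("techrepublic") + ".com")
```

A label written entirely in one non-ASCII script passes through unmarked, which matches the proposal's scope: it targets names that mix ASCII with other characters and leaves single-script IDNs alone.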

This solution, however, is only a band-aid for a problem that exists because of ICANN's failure to produce a coherent and universally applicable set of standards for registering IDNs that would prevent this type of abuse. From a registry perspective, the best solution is probably that of .ca, which prevents another registrant from buying an accented version of an existing name.
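A rule of that kind could be sketched as follows. This is not the .ca registry's actual implementation, only a guess at the general shape: names are reduced to an accent-free key, and a key belongs to whoever registered it first. The `Registry` class, `strip_accents`, and the registrant names are all hypothetical.

```python
import unicodedata

def strip_accents(label):
    """Lowercase a label and drop combining marks (accents)."""
    decomposed = unicodedata.normalize("NFD", label.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

class Registry:
    """Toy registry in which all accent variants of a name share one owner."""

    def __init__(self):
        self._owners = {}  # accent-free key -> registrant

    def register(self, label, registrant):
        """Return True if the registration is allowed, False if blocked."""
        key = strip_accents(label)
        owner = self._owners.setdefault(key, registrant)
        return owner == registrant

registry = Registry()
print(registry.register("cafe", "alice"))          # first registration of the key
print(registry.register("caf\u00e9", "mallory"))   # accented variant, other registrant
print(registry.register("caf\u00e9", "alice"))     # accented variant, same registrant
```

Stripping accents covers only Latin variants such as é versus e. Cross-script homoglyphs like Cyrillic а are separate letters rather than accented forms, so a registry would need an additional confusables mapping to catch them.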

*Solution: The first TechRepublic is written using Cyrillic substitute characters.

