When ICANN began to allow registration of internationalized domain names (IDNs), that is, domain names that use non-ASCII characters, it unwittingly opened a new avenue for phishing campaigns. Visually similar characters from different scripts, called homoglyphs, can be used to create domain names that are practically indistinguishable from legitimate ones, easily fooling users into believing that one domain is actually another.
Without using links, consider the differences between ТесhRерubliс and TechRepublic. One is written normally, with ASCII characters. The other substitutes Cyrillic characters for the Latin-based ASCII characters T, e, c, and p. (Which is which is revealed at the bottom of this article.) Russian lends itself well to homoglyph attacks, as the lowercase a, o, x, and y can be rendered identically, as а, о, х, and у, with other possibilities extant in non-Russian Cyrillic characters. Other, less precise homoglyphs are possible as well. For example, the letter i is visually similar to і (Cyrillic) and ì (Latin, with grave).
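The difference that the eye misses is easy to expose in code. The following is a minimal Python sketch, not a production detector: it guesses each letter's script from the first word of its Unicode character name, which is an approximation, and the helper names script_of and scripts_in are invented here for illustration.

```python
import unicodedata

def script_of(ch):
    """Approximate a character's script from the first word of its Unicode name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def scripts_in(text):
    """Return the set of approximate scripts used by the letters in text."""
    return {script_of(ch) for ch in text if ch.isalpha()}

# Cyrillic Т, е, с, and р written as escapes so the substitution is visible.
spoofed = "\u0422\u0435\u0441hR\u0435\u0440ubli\u0441"
genuine = "TechRepublic"

print(spoofed == genuine)            # False
print(sorted(scripts_in(spoofed)))   # ['CYRILLIC', 'LATIN']
print(sorted(scripts_in(genuine)))   # ['LATIN']
```

A name that reports more than one script is not necessarily malicious, but it is the pattern that homoglyph attacks rely on.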
This is, to some extent, a problem in other languages as well. Japanese, for example, has three writing systems: hiragana, katakana, and kanji. The company name Mitsubishi is normally written as 三菱 (three diamonds). The kanji for three (三) looks similar to the katakana for mi (ミ), which can lead to confusion. As would be expected, it is possible to mix and match these writing systems when registering domain names. Many Traditional and Simplified Chinese characters are homoglyphs of each other as well.
Principally, this becomes a problem when attackers use these homoglyphs in phishing attacks, as it would be easy to impersonate popular websites using this type of strategy. ICANN's policies on how to deal with this problem—or IDNs in general—are sparse. As a result, each registry has its own rules about how to handle IDNs.
Many ccTLDs and new gTLDs disallow IDNs, or restrict how they can be used, though these rules are inconsistent between registries. The .com and .net registries essentially allow anything through. Because these are the most popular TLDs for legitimate websites, that lack of protection makes them all the more attractive to phishers.
At present, Google Chrome, Microsoft Edge, and Mozilla Firefox handle mixed-character IDNs by reverting to punycode, that is, the ASCII representation of an IDN. Because of the complexity of changing character encoding, IDNs were implemented in a somewhat kludge-like fashion. So, from the above example, instead of seeing techrepublic.com in the address bar, you would see xn--hrubli-2ofc3hgib.com.
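To see the conversion for yourself, here is a rough Python sketch using the built-in "idna" codec. That codec follows the older IDNA 2003 rules, and browsers may apply newer ones, so treat the exact output as illustrative rather than as what any particular browser displays.

```python
# The spoofed name from earlier in the article, with Cyrillic Т, е, с, and р
# written as escapes so they cannot be confused with their Latin look-alikes.
spoofed = "\u0422\u0435\u0441hR\u0435\u0440ubli\u0441.com"

# Encode each label to its ASCII-compatible form. The codec lowercases the
# name first, then moves the remaining ASCII letters to the front of the label.
ascii_form = spoofed.encode("idna")
print(ascii_form)

# Decoding gives back the lowercased Unicode name.
round_trip = ascii_form.decode("idna")
print(round_trip == spoofed.lower())
```

The ASCII letters that survive the substitution (h, r, u, b, l, i) appear right after the xn-- prefix, while the Cyrillic letters are packed into the suffix after the final hyphen.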
However, this behavior breaks cases where non-Latin characters are legitimately mixed with standard ASCII characters. Microsoft tried to fix this problem in IE by manually whitelisting scripts that are allowed to mix with ASCII without reverting to punycode.
There is a more elegant solution to this problem, however. For domain names that mix ASCII and non-ASCII characters, changing individual non-ASCII characters in a domain name to red in the address bar would sufficiently differentiate characters otherwise useful for homoglyph attacks while preserving the intended use of IDNs. For obvious reasons, extension engines in browsers generally do not allow this behavior to be implemented as an extension, making it necessary to implement as a feature of the browser itself.
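As a rough illustration of the idea, the Python sketch below marks up a single domain label as HTML. It is not how any browser renders its address bar: the function name highlight_label and the inline red style are assumptions made for this example.

```python
from html import escape

def highlight_label(label):
    """Return HTML for a label, wrapping non-ASCII characters in red spans.

    Labels that are entirely ASCII or entirely non-ASCII are left unmarked,
    since only mixed labels are the target of the proposal.
    """
    has_ascii = any(ch.isascii() for ch in label)
    has_non_ascii = any(not ch.isascii() for ch in label)
    if not (has_ascii and has_non_ascii):
        return escape(label)
    return "".join(
        escape(ch) if ch.isascii()
        else '<span style="color:red">' + escape(ch) + "</span>"
        for ch in label
    )

# Cyrillic с (U+0441) standing in for the final Latin c.
print(highlight_label("techrepubli\u0441"))
```

A label written entirely in Cyrillic or entirely in ASCII passes through unchanged, so legitimate single-script IDNs would still display normally.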
This solution, however, is only a band-aid to a problem that exists because of ICANN's failure to generate a coherent and universally applicable set of standards for registration of IDNs to prevent this type of abuse. From a registry perspective, the best solution is probably that of .ca, which disallows another registrant from buying an accented version of an existing name.
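A registry-side check in that spirit could look something like the following Python sketch. It is a simplification and not the .ca registry's actual implementation: the registry is modeled as a plain dictionary of name to owner, and the function names are invented for this example.

```python
import unicodedata

def strip_accents(label):
    """Lowercase a label and drop combining marks after decomposition."""
    decomposed = unicodedata.normalize("NFD", label.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def may_register(name, registrant, registry):
    """Allow a name unless someone else owns a variant differing only by accents.

    registry is a hypothetical mapping of registered name -> owner.
    """
    key = strip_accents(name)
    return all(
        owner == registrant
        for existing, owner in registry.items()
        if strip_accents(existing) == key
    )

registry = {"cafe": "alice"}
print(may_register("caf\u00e9", "bob", registry))    # False: alice owns "cafe"
print(may_register("caf\u00e9", "alice", registry))  # True: same registrant
```

Note that this only catches accented variants. Cross-script homoglyphs such as Cyrillic а do not decompose to a Latin letter, so they would need a separate look-alike table.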
Update (June 29, 2018): EURid, the operator of the .eu registry, has issued a notice indicating that domains using Cyrillic characters will be deleted as of June 1, 2019. The same organization is requiring domain names with Cyrillic characters to use the matching .ею TLD instead, which is also controlled by EURid. According to the organization, the move is part of a requirement forcing domain name owners to match the script of the TLD with the second-level name in order to avoid homoglyph attacks.
A report in The Register noted that this is inconsistent, as this still allows the use of any letter of the Greek alphabet, as well as accented characters from multiple European languages, including the "German ü, the Romanian ș, and the Swedish å."
Political questions aside, this is good in terms of minimizing phishing attacks, but still insufficient for differentiating characters. In Greek, omicron (ο) and, in certain fonts, nu (ν) are the closest matches to ASCII characters, though slightly more abstract matches also exist. In order, εικηρτυωχγ resembles eiknptuwxy to a degree, with larger variances depending on the fonts involved. Accented characters are too numerous to mention. While these variants do exist, the further attackers stray from the intended character, the more likely a ransom note effect will occur.
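The Greek pairings listed above can be turned into a small look-alike table. The Python sketch below maps each Greek letter to the ASCII letter it resembles so that two names can be compared by their "skeleton." The table covers only the letters mentioned here and is nowhere near a complete confusables list, and the name latin_skeleton is made up for this example.

```python
# Greek ε ι κ η ρ τ υ ω χ γ ο ν, written as escapes, mapped to their
# approximate ASCII look-alikes in the same order.
GREEK_LOOKALIKES = str.maketrans(
    "\u03b5\u03b9\u03ba\u03b7\u03c1\u03c4\u03c5\u03c9\u03c7\u03b3\u03bf\u03bd",
    "eiknptuwxyov",
)

def latin_skeleton(label):
    """Lowercase a label and replace listed Greek letters with ASCII look-alikes."""
    return label.lower().translate(GREEK_LOOKALIKES)

# "google" with both o's replaced by Greek omicron (U+03BF).
lookalike = "g\u03bf\u03bfgle"
print(lookalike == "google")                  # False
print(latin_skeleton(lookalike) == "google")  # True
```

Two names with the same skeleton are candidates for confusion, which is the kind of comparison a registry or browser could run before accepting or displaying a name.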
As it is, the most inclusive way to preserve the intended display of IDNs while keeping users secure is to change the color of non-ASCII characters.
While the practice of mass deletion is generally abnormal for a registry to engage in, the European Commission issued a notice in March that registrants of .eu domain names within the United Kingdom will lose their eligibility to hold .eu
*Solution: The first TechRepublic is written using Cyrillic substitute characters.
James Sanders is a Tokyo-based programmer and technology journalist. Since 2013, he has been a regular contributor to TechRepublic and Tech Pro Research.