The Internet becomes a scary place if DNS can't be trusted. Michael Kassner would like to explain why, and how a group of DNS experts recently prevented that from happening.
The Internet becomes a scary place if DNS can't be trusted. I'd like to explain why this is so and how a group of DNS experts recently prevented that from happening.
In a recent article, "DNS: Painful Reminders of How Important It Is," I alluded to the problem uncovered by Dan Kaminsky (director of pen testing at IOActive and Doxpara Research). This past week at Black Hat 2008, Kaminsky finally revealed the actual details of the bug he discovered. The design flaw makes it a great deal easier to poison a name server's cache, voiding any trust in query results from that name server. In order to understand the magnitude of the bug, we need to be familiar with how a DNS query works, so let's start there.
Domain Name System (DNS)
Humans like to use easily remembered names (FQDN), whereas digital machines such as computers like to use numbers (IP addresses). DNS is simply the go-between, allowing us to enter a name into a Web browser's address bar or e-mail address instead of a cryptic number. For example, I want to go to www.techrepublic.com. Let's follow the steps of how my Web browser accomplishes that:
- I type www.techrepublic.com in the Web browser.
- The Web browser then asks the resolver (a background client DNS application that resolves FQDNs to IP addresses) to convert www.techrepublic.com into an IP address.
- The resolver first checks the hosts file and the local cache to see if the FQDN/IP address combination is available. If it is, the resolver passes the IP address to my computer's network protocols to set up a connection. If not, the resolver contacts the name server at my ISP, asking it for the IP address of www.techrepublic.com.
- If the ISP's name server has that information in its cache, it will immediately return the IP address to the resolver, which once again sets up a connection. One important term that we need to understand is recursive name server. It's simply a name server such as my ISP's that relies on other name servers for authoritative answers.
- If not found, my ISP's name server takes over looking for the techrepublic.com name server by querying what are called authoritative name servers. To explain, the three components of the FQDN www.techrepublic.com determine the order of the authoritative name servers queried. Com is the top level of this particular name, so the search starts there; the next level is techrepublic, and finally www.
- First, my ISP's name server queries a root name server (which knows the name servers for top-level domains such as .com and .net). The root name server returns the IP address of the appropriate .com name server to my ISP's name server.
- Next, my ISP's name server queries the .com name server (authoritative for name servers in the .com domain) for the IP address of the techrepublic.com name servers. The .com name server returns the appropriate IP addresses to my ISP's name server.
- Now my ISP's name server queries the techrepublic.com name server for the IP address associated with www.techrepublic.com. Finally, my ISP's name server has the required IP address, sending it to the resolver application on my computer so it can establish a connection.
- Once connected, the Web server will send the specified Web pages back for display on my Web browser.
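The lookup chain above can be sketched in a few lines of Python. This is only an illustration: the three dictionaries stand in for the root, .com, and techrepublic.com name servers, and every IP address is a made-up value from the documentation address ranges, not a real server.

```python
# Hypothetical data standing in for three tiers of name servers.
ROOT_ZONE = {"com.": "192.0.2.1"}                    # root refers us to a .com server
COM_ZONE = {"techrepublic.com.": "192.0.2.53"}       # .com refers us to the domain's server
TECHREPUBLIC_ZONE = {"www.techrepublic.com.": "198.51.100.10"}  # final answer


def resolve(fqdn, cache):
    """Mimic the ISP's name server: answer from cache, else walk root -> .com -> domain."""
    name = fqdn if fqdn.endswith(".") else fqdn + "."
    if name in cache:
        return cache[name], ["cache"]

    labels = name.rstrip(".").split(".")             # ['www', 'techrepublic', 'com']
    lookups = [
        ("root", ROOT_ZONE, labels[-1] + "."),
        (".com", COM_ZONE, ".".join(labels[-2:]) + "."),
        ("techrepublic.com", TECHREPUBLIC_ZONE, name),
    ]
    asked = []
    address = None
    for server, zone, key in lookups:
        if key not in zone:
            raise LookupError(f"{server} server has no data for {key}")
        asked.append(server)
        address = zone[key]                          # the last lookup yields the answer

    cache[name] = address                            # remember it for next time
    return address, asked
```

Calling `resolve("www.techrepublic.com", {})` walks all three tiers. Calling it again with the same cache dictionary returns immediately from the cache, which is the shortcut described in the fourth step.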
That seems like a great deal of back and forth just to get the IP address of a Web site. Thankfully, these transactions take only milliseconds due to several features incorporated in the DNS protocol. I'd like to give a quick overview of these features as they are important to our discussion about the new DNS bug.
To help reduce the number of queries, DNS uses caching. For example, my ISP's name server may have recently communicated with the .com name server. If so, my ISP's name server stores that information in its cache, cutting out a step. The more popular a Web site is, the better chance my ISP's name server will have the Web site's IP address already stored in its cache.
Time to live (TTL)
The IT department at TechRepublic decides to change the IP address of the Web site www.techrepublic.com. When that happens, how do all the name servers around the world receive the new IP address for www.techrepublic.com? It's taken care of by a component of caching called Time to live (TTL). TTL controls how long a server will cache the FQDN/IP address information. This allows new information to propagate efficiently, keeping all DNS records in the cache accurate.
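To make the TTL idea concrete, here is a minimal sketch of a cache that forgets a record once its TTL has passed. The class name `TtlCache` and its methods are invented for this illustration; real name servers are considerably more involved.

```python
import time


class TtlCache:
    """Toy FQDN -> IP address cache that honors a per-record TTL (in seconds)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so the behavior can be tested
        self._records = {}           # name -> (address, expiry timestamp)

    def put(self, name, address, ttl_seconds):
        self._records[name] = (address, self._clock() + ttl_seconds)

    def get(self, name):
        record = self._records.get(name)
        if record is None:
            return None              # never cached: a fresh query is needed
        address, expires_at = record
        if self._clock() >= expires_at:
            del self._records[name]  # TTL expired: drop the stale record
            return None
        return address
```

When `get` returns `None`, the name server has to go back out and query the authoritative name servers again, which is how a changed IP address eventually reaches everyone.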
"Out of bailiwick"
"Out of bailiwick" refers to any information that doesn't pertain to the original DNS query. For example, an ISP's name server asks for information about www.techrepublic.com, but information about www.faketechrepublic.com is returned instead. That information is "out of bailiwick." This problem surfaced in the early editions of DNS but is no longer an issue. It does relate to the discussion on Kaminsky's DNS bug though.
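A bailiwick check boils down to asking whether a returned name falls inside the domain that was queried. The function below is a simplified sketch of that test, not the code any particular name server uses; `in_bailiwick` is a name I made up for the example.

```python
def in_bailiwick(record_name, zone):
    """Return True if record_name is the zone itself or a name underneath it."""
    record = record_name.lower().rstrip(".")
    zone = zone.lower().rstrip(".")
    if zone == "":
        return True                          # the root zone contains every name
    # The leading dot matters: "faketechrepublic.com" ends with
    # "techrepublic.com" as a string, but is not inside that domain.
    return record == zone or record.endswith("." + zone)
```

With this check, a record for www.faketechrepublic.com in a response about techrepublic.com is rejected, while ns1.techrepublic.com is accepted.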
Now that we have all these terms defined and have a good idea of how a DNS query works, I'd like to delve "under-the-hood." While doing so, I'd also like to include some DNS history.
Original DNS protocol
DNS started out as a completely trusting application. As the Internet matured, it became apparent that tricks could be played on DNS, redirecting users to Internet locations they didn't ask for. At first, it was harmless, but then attackers realized they could redirect users to spoofed Web sites with the hope of stealing personal information using "DNS cache poisoning."
In the original DNS protocol, the DNS query packet contained a 16-bit number (0-65535) called a Transaction ID (TID), which was generated by a linearly incrementing counter. The TID's purpose was to verify the DNS query response, as the queried name server must return the same TID it received in the initial query.
Therefore, if the attacker knew the Transaction ID used in the victim DNS server's latest query, the attacker could send malicious DNS query responses using Transaction IDs starting with that number, incrementing the ID with each additional response. It's guesswork, but the attack can be accomplished. It's also important to know that the original cache-poisoning method required the attacker to wait patiently until the victim DNS server sent out a query (TTL expired) for the domain the attacker was interested in.
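Here is a rough sketch of where the Transaction ID lives in a query and why a counter makes it easy to predict. The packet layout (a 12-byte header whose first two bytes are the TID, followed by the question) is standard DNS; the helper names are my own and the code only builds bytes, it doesn't send anything.

```python
import struct


def build_query(tid, fqdn):
    """Build a minimal DNS query for an A record with the given Transaction ID."""
    # Header: TID, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", tid, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed with its length, ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in fqdn.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)   # QTYPE=A, QCLASS=IN


def transaction_id(packet):
    """Read the Transaction ID back out of the first two bytes."""
    return struct.unpack("!H", packet[:2])[0]


def response_matches(query, response):
    """The only check discussed so far: the response must echo the query's TID."""
    return transaction_id(query) == transaction_id(response)


def next_sequential_ids(last_seen, count):
    """With an incrementing counter, the next TIDs are trivial to predict."""
    return [(last_seen + i) % 65536 for i in range(1, count + 1)]
```

If an attacker observes TID 4660 in one query, `next_sequential_ids(4660, 10)` lists the ten values the next queries would use under a simple counter, so only a handful of forged responses are needed.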
Cache poisoning is only possible because DNS uses UDP, a connectionless protocol as defined by Wikipedia:
"Connectionless describes communication between two network end points in which a message can be sent from one end point to another without prior arrangement. The device at one end of the communication transmits data to the other, without first ensuring that the recipient is available and ready to receive the data. The device sending a message simply sends it addressed to the intended recipient."
Because UDP is connectionless, its traffic isn't authenticated. Therefore, it's possible to send spoofed query responses with altered IP addresses and port numbers.
In order to harden DNS, the use of random Transaction IDs created by a pseudorandom number generator became part of the protocol. This markedly increased the strength of DNS by forcing the attacker to guess a number from 65,536 choices, not just the next increment. To review: the attacker has two variables to resolve in order to successfully inject a cache-poisoning query response: the Transaction ID (now difficult) and the IP address/port combination (not so difficult).
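As a back-of-the-envelope sketch, the odds of guessing a random 16-bit Transaction ID can be worked out directly. The function names are invented for the example, and I'm assuming the attacker uses a different TID in each forged response.

```python
import secrets

TID_SPACE = 2 ** 16      # a 16-bit field has 65,536 possible values


def random_transaction_id():
    """Pick an unpredictable TID instead of incrementing a counter."""
    return secrets.randbelow(TID_SPACE)


def guess_probability(forged_responses):
    """Chance that at least one of N distinct guessed TIDs is the right one."""
    return min(forged_responses, TID_SPACE) / TID_SPACE
```

One forged response succeeds about 0.0015% of the time; a hundred distinct guesses raise that to roughly 0.15%. That is far better than a predictable counter, but, as the next section explains, still not good enough.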
16-bit randomization is still not secure
Several well-known DNS experts, including Dan Bernstein, have pointed out that with today's processing power, 16-bit randomization afforded by the Transaction ID isn't enough. As a point of reference, Bernstein developed his own DNS application that randomizes the query port (Internet traffic requires an IP address and port number).
Up until now, DNS typically used a static query port (port 53 in most cases), meaning the attacker had to guess only the Transaction ID. With roughly 65,000 ports available, Bernstein determined that query port randomization would add close to 16 bits of entropy to the mix, making it roughly 65,000 times more difficult to guess the required query port and Transaction ID combination.
Sounds like a plan, but most variations of DNS don't use query port randomization because it gets complicated: firewall and NAT router issues come into play when you use port randomization. BIND, Cisco, and Microsoft DNS applications are some examples; they only just recently implemented query port randomization.
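The arithmetic behind that claim can be sketched as follows. The figure of 64,512 usable ports is my own assumption (everything above the well-known range 0-1023); the real number depends on the operating system and the name server's configuration.

```python
import math


def search_space(tid_bits=16, usable_ports=1):
    """Number of (Transaction ID, query port) combinations an attacker faces."""
    return (2 ** tid_bits) * usable_ports


def entropy_bits(space):
    """Express the size of the search space in bits."""
    return math.log2(space)


fixed_port = search_space()                          # TID only
random_port = search_space(usable_ports=64512)       # assumed port range 1024-65535

print(entropy_bits(fixed_port))                      # 16.0
print(round(entropy_bits(random_port), 2))           # just under 32 bits
```

In other words, randomizing the query port takes the attacker from about 65,000 possibilities to roughly four billion.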
Do you remember how an attacker has to wait until the TTL expired before the attacker could attempt injecting a poisoned DNS query response? Well, Kaminsky figured out how to get around that. The best way to explain would be to step through an attack. Let's say I wanted to poison my ISP's DNS record for the techrepublic.com domain. Here's what I'd do:
- I obtain the IP addresses of the authoritative name servers for techrepublic.com. I also get the IP address of my ISP's name server and what port it's using for DNS queries.
- I then send a query for a fake computer at techrepublic.com, called 11.techrepublic.com.
- Since there isn't a machine by that name, my ISP's name server must send a DNS query to techrepublic.com's name server.
- At that time, I construct a bunch of DNS query response packets that contain:
  - Spoofed source IP address (so it appears to be coming from techrepublic.com)
  - The correct query port (used by my ISP's name server in the DNS query)
  - A Transaction ID (random selection; sending enough query responses will produce a collision)
  - IP addresses of name servers of my choosing (notice this is "in bailiwick," thus accepted)
Therefore, waiting until the TTL of a domain expired is no longer required. In my example, I'm controlling when my ISP's name server is sending out a DNS query. If my query for 11.techrepublic.com didn't work, all I have to do is try 12.techrepublic.com and go through the same process until I get a collision. I'll know when that happens, because I'll get DNS information for 11 or 12.techrepublic.com from my ISP.
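To see why being able to retry at will matters so much, here is a simplified probability sketch. It assumes a known, fixed query port (as in the example above), a fresh random Transaction ID for every lookup, and a fixed number of distinct forged responses landing before the real answer arrives. It ignores timing and network effects, and all function names are hypothetical.

```python
import math


def candidate_names(start, count, domain="techrepublic.com"):
    """Names that don't exist, so each one forces a fresh query: 11, 12, 13, ..."""
    return [f"{n}.{domain}" for n in range(start, start + count)]


def success_probability(rounds, forged_per_round, id_space=65536):
    """Chance that at least one round produces a Transaction ID collision."""
    per_round = min(forged_per_round, id_space) / id_space
    return 1 - (1 - per_round) ** rounds


def rounds_for(target_probability, forged_per_round, id_space=65536):
    """Rounds needed to reach a target chance of success (target between 0 and 1)."""
    per_round = min(forged_per_round, id_space) / id_space
    if per_round >= 1:
        return 1
    return math.ceil(math.log(1 - target_probability) / math.log(1 - per_round))
```

Under these assumptions, `rounds_for(0.5, 100)` gives the number of made-up names needed for even odds of success when 100 forged responses land per attempt. Since nothing forces me to wait for a TTL to expire, I can run through those attempts as fast as I can send packets.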
Several concepts in play here make this cache-poisoning attack vector extremely dangerous:
- Since the DNS query response was "in bailiwick," my ISP's name server thinks the IP addresses that I gave it are authoritative for the whole techrepublic.com domain.
- I can set the TTL of the FQDN/IP address information to an extremely large amount; it's a 32-bit number. That way the false DNS information will not expire.
- I can now set up phishing Web sites that will not trip any alarms or phishing filters.
- This design flaw is present in every recursive name server.
Do you recall Dan Bernstein and his idea of using random query ports? That's the fix for now. It adds a significant amount of randomness, enough that the attack is no longer viable. The problem is getting all the ISPs and entities running recursive DNS servers to install patches. If you'd like to know if your ISP's name servers are patched, there are two Web sites that you can visit to check. Kaminsky has a test application on his Web site "Doxpara.com," and there is one that I especially like at "DNS-OARC.net."
If you find that your ISP's DNS servers are not randomizing the query port, I'd suggest asking the ISP whether and when the DNS servers will be patched. Then immediately switch to a patched and secure DNS service like OpenDNS until your ISP's DNS servers are upgraded. It's also important to know this attack vector doesn't work on Web sites using SSL, because poisoning DNS doesn't give the attacker a valid certificate for the site. Therefore, you're safe as long as you make sure the certificate is correct and the URL displays https.
If you have gotten this far, I sincerely thank you, as the article is long. I just felt it was important to explain a DNS query and how the attack vector works. Imagine not knowing for sure if the Web site you are looking at is the one you asked for. I realize that most important sites use SSL, but some of those aren't secure until the user actually logs in. Before then, the Web sites could be spoofed and you wouldn't be any the wiser.
I wanted to thank Dan Kaminsky. I feel that he handled the situation responsibly. Taking any other path could have led to serious alterations of our Internet experience. I'd also like to extend my thanks to all the ISPs that have implemented the patch, because it's not a trivial undertaking.
One final point: I want to point to "Microsoft Patch MS08-037," the Windows 2000 and 2003 Server fix for the DNS bug. I suspect many organizations have internal DNS servers that may also be acting as recursive name servers.
Michael Kassner has been involved with wireless communications for 40 plus years, starting with amateur radio (K0PBX) and now as a network field engineer for Orange Business Services and an independent wireless consultant with MKassner Net. Current certifications include Cisco ESTQ Field Engineer, CWNA, and CWSP.