BGP and Internet security: Is it better to be lucky or good?

Does "it's not a problem until it actually happens" apply to Internet security? Michael P. Kassner interviews a networking expert who's wondering about the same thing.
What would you think if someone intimately familiar with the inner workings of digital networking began a paper by pronouncing the Internet insecure, and then asked, "Does this mean we have simply been lucky, or are the issues more theoretical than actual?"

The "nonsecure" proclamation and evocative question were found in this ACM paper written by Chris Hall, senior engineer with Highwayman Associates and founding executive director of Communications Research Network at Cambridge University Computer Laboratory.

Is there a problem?

Since the Internet consists of countless disparate networks, there has to be some way to figure out how traffic gets from point A to point B. For example, what tells the digital bits making up an email from me how to get to my friend in Sousse, Tunisia, some 8000 kilometers away?

Border Gateway Protocol (BGP) does the lion's share of the figuring. It is the protocol Internet backbone devices use to make high-level routing decisions. And it is BGP that has Chris concerned.

Way back in 2008, I wrote an article in which an expert talked about some of the same concerns Chris has, mentioning that BGP's vulnerabilities are an Internet time bomb waiting to go off. I asked Chris to describe what he sees as the problem with BGP.

Hall: When we talk about BGP and its vulnerabilities, we are rarely talking about the protocol itself; we are generally talking about the routing system built on the protocol. Routers which implement BGP also use other common mechanisms to process the information carried by the protocol, so it can be tricky to distinguish between BGP and BGP implementation issues.

The most obvious vulnerability is the inability to check if the routing information carried by BGP is correct. A less obvious issue is that BGP announces that a destination is reachable, but does not announce how much traffic can be handled.

Every now and then, some network administrator somewhere makes a small mistake, generating bogus routing information that BGP blindly accepts and relays across the Internet. The effect of such a Route Leak is that data is diverted from its intended destination, usually ending up in a black hole from which there is no return. Since that can be achieved by accident, people worry about what might be achievable with malice aforethought.

Another issue with BGP is the Route Hijack, in which some network announces routes for addresses it has no business using. The most benign case of this is where some network co-opts unused addresses. A less benign use would be announcing routes to divert traffic to the announcer, where it could be examined, discarded, or otherwise disrupted.

The final issue with BGP is how long it takes to respond after a major flap. Faced with a large-scale change in routing, it may take BGP minutes to cope. For many purposes, that is not a problem, but it will disrupt services like VoIP. More of a challenge to the system would be repeated large-scale changes in routing, where slow responses could mean that by the time a given route reaches some distant part of the Internet, it is no longer valid.
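To make the Route Hijack that Chris describes more concrete, here is a minimal Python sketch. It is not how a real router works: the RoutingTable class, the documentation prefixes, and the reserved AS numbers are all illustrative assumptions. It shows only the core idea that a table which accepts announcements without verification will hand traffic to whoever announces the most specific matching route.

```python
import ipaddress


class RoutingTable:
    """Toy routing table (hypothetical). Like BGP, it accepts every
    announcement without checking whether the announcer is entitled to it."""

    def __init__(self):
        self.routes = {}  # maps ip_network -> origin AS number

    def announce(self, prefix, origin_as):
        # No verification step: the announcement is simply believed.
        self.routes[ipaddress.ip_network(prefix)] = origin_as

    def lookup(self, address):
        """Return the origin AS of the longest (most specific) matching prefix."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


table = RoutingTable()

# The legitimate holder (assumed to be AS 64500) announces its prefix.
table.announce("198.51.100.0/24", 64500)
before = table.lookup("198.51.100.25")

# Another network (assumed to be AS 64511) announces a more specific
# route for half of that address space.
table.announce("198.51.100.0/25", 64511)
after = table.lookup("198.51.100.25")

print(before, after)
```

In this sketch the address 198.51.100.25 is first delivered toward AS 64500 and then toward AS 64511, while addresses in the upper half of the /24 are unaffected. Real hijacks and leaks involve path selection, filtering, and propagation details that are left out here.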

Why not fix BGP?

Chris mentioned in the paper that the current version of BGP is 18 years old. Chris also indicated that securing BGP would be costly and take years to accomplish. I asked Chris if BGPSEC might be the answer.

Hall: BGPSEC holds out promise that the information it covers can be verified. But current routers do not meet the processing and memory requirements needed by BGPSEC. Implementing BGPSEC means either expensive upgrades to existing routers, extensive network changes that move BGPSEC out of routers into a new BGPSEC-plane, or waiting for equipment turnover.

Also, BGPSEC is not a complete solution; it only covers part of the information carried by BGP. It does not allow a network to verify that its announcements are consistent with policy, nor can a remote network's announcements be checked against their policy -- so BGPSEC is of no help with Route Leaks.

Fortunately, the operational layer seems to be doing a good job.

Operational layer

You may be wondering what makes up the operational layer -- people. According to Chris:

Each network in the Internet has a Network Operation Center (NOC) that monitors its own network and its connections to other networks, responds to incidents when they occur, and strives to maintain acceptable levels of service and reliability at an acceptable cost. Each NOC acts independently and interacts with other NOCs, collectively forming the operational layer.

I asked Chris if he had an example of where the people working in a NOC made a difference.

Hall: One good example is the China Telecom incident (the "18-minute mystery") that occurred in April of 2010. Approximately 15% of all Internet addresses were disrupted, but only for 18 minutes. This mishap is a testament to the effectiveness of the operational layer.

For what it’s worth, the "China Incident" is a wonderful example of a Route Leak -- it speaks volumes about the ability of ignorance coupled with paranoia to wind a small fat-finger incident into an attack on everything from national security to apple pie.

For the long haul

Because of the Internet's complexity, it seems human intervention will be required for quite some time. Is that how you see it, Chris?

Hall: I wouldn't put it that way. The Internet is not just IP and BGP. The Internet is a many-layered system, and each layer plays a part. There are technical solutions to some security and reliability issues. However, if we only consider a technical solution for a given problem we may not come up with the most effective solution.

Furthermore, if we do not consider commercial and economic implications, we may find that the proposed solution will never be implemented, because it is not cost effective and there are no economic incentives to implement it.

Without becoming too philosophical: we can always improve the automatic systems which run large networks. We can always improve monitoring systems to ensure the network is working properly, and we can make the systems easier to use. But, when something unusual happens...

An Internet NOC

Chris mentioned numerous times how important the NOC was to a network's health, and its ability to interact with other networks. I asked Chris if there was some kind of a centralized NOC for the Internet.

Hall: There is no global view of how well the Internet works, and no view of how it responds to events large or small. We also know next to nothing about demand or capacity.

ISPs have an economic incentive to monitor their own networks, but no incentive to consider the Internet as a whole. If we collectively consider the Internet to be a common good, then we should strive to understand the Internet's performance parameters. It would lead to a more secure Internet -- a common good. But sadly, the incentive is missing.

Final thoughts

Now to the question that started it all:

"Does this mean we have simply been lucky, or are the issues more theoretical than actual?"

Have we been lucky? Maybe. Between the hard-working people running NOCs and the fact that bad guys are just as dependent on the Internet as the rest of us, I'd say lucky or not is relative.

As for theoretical versus actual, one can go into metaphysical overload debating when or if an issue turns from theoretical to actual, but it is evident that something is working, and as Chris pointed out, that something is having people at the controls.

I'd like to thank Chris for his worthy explanations and the ACM for allowing me to use parts of the article. And I almost forgot, Chris wanted me to mention the ACM article was derived from this ENISA paper.

About

Information is my field...Writing is my passion...Coupling the two is my mission.

7 comments
rm

I'm reminded of the old-fashioned concept of a "public document". To view one, you had to go to a physical location and pay a fee for a copy. The barrier to entry was fairly high and demanded that the viewer spend a lot of time in pursuit of the information. As we quickly learned when these "public documents" became available on the Internet, access became trivial and embarrassingly abused. I'm afraid that security vulnerabilities on an open networked system will ALWAYS be found and exploited. It is just a matter of time, just as it was when we found that DES encryption wasn't good enough and began to use double and triple DES. Neal Stephenson makes the case in Cryptonomicon that 2048-bit encryption will probably be inadequate in a few years.

Meet the ants in my kitchen. Every time I find the crack from which they emerge and seal it, the kitchen is ant free for a few days, weeks, or months. Then they find a new crack and the cycle begins anew. Code, like a Wikipedia entry, is ruthlessly and endlessly scrutinized.

wdewey@cityofsalem.net

Security is about more than just technical controls. To say the Internet is insecure because it relies on an imperfect protocol ignores the administrative controls in place. There is a limited number of AS numbers that can be allocated, and there are controls over who gets them. To pass data back and forth, a relationship has to be configured, which makes it more difficult to just inject routes.

While I think it would be great to improve BGP, I would also say that there are a lot of controls already in place. I don't know enough about BGP to say that it is or is not secure, but there have been accidental incidents that indicate that NOC techs are paying attention and that incidents are noticed in minutes and completely resolved in hours. Is this sufficient for current business needs? I have opinions on this matter, but it is really up to each business to decide that.

I have to disagree a little when the author says that BGP is 18 years old. While that may be technically true, there have been numerous improvements to the protocol. According to Wikipedia (not the best source, I know), version 4 was codified in 2006, which is only 7 years ago. It's kind of like a 747. The airframe may be 20 years old, but oftentimes subsystems are redesigned and fleets retrofitted. This is also what has happened with IPv4. It has been tweaked (private IP space, IPsec) and has been running longer than most people thought it would. Without private IP space, IPv4 addresses would have run out years ago.

Bill

Edited to change "ratified" to "codified" (the exact verbiage used by Wikipedia).

Michael Kassner

The point was that people working in NOCs are what is keeping the system going. As for each business unit deciding, that seems like an inappropriate idea, as BGP errors or mishaps affect more than just the associated business unit.

wdewey@cityofsalem.net

People working in any environment are what keep the system going. If you look at a DNS-amplified DDoS attack, you see a creative manipulation of an automated system. Without people and administrative controls, every system can be manipulated in a malicious manner. I don't see this as being any different from any other system. I was simply saying that if the current response times are insufficient, then entities need to find ways to avoid or accept the risk. "Each business unit deciding" was about the level of risk of getting knocked off the Internet by a BGP error. If this level of risk is unacceptable, then the options are either having some mitigation in place or pushing standards bodies to develop new standards.

There are processes that can be used to help with this. The Internet Routing Registry (IRR) and the Routing Policy Specification Language (RPSL) can help keep larger ISPs from transmitting errors. These tools basically help automate policies so that routes are only trusted and readvertised if they originate from an entity that has control of that IP space.

This is an interesting service as well (http://www.renesys.com/products/routing-alarms/). This service will alert you if your route is hijacked. There were a couple of other services listed on the site I got this from, but they appear to no longer be available, and one specifically said there wasn't a lot of interest, so the system was a low priority for the entity.

Bill
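The registry-based filtering described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions: REGISTERED_ROUTES is a hypothetical in-memory stand-in for registry data, not a real IRR query or RPSL parser, and the prefixes and AS numbers are documentation values. Accepting more-specific prefixes that fall inside a registered block is also a policy choice made here for brevity; a stricter filter could require an exact match.

```python
import ipaddress

# Hypothetical registry: prefix -> set of AS numbers allowed to originate it.
REGISTERED_ROUTES = {
    ipaddress.ip_network("198.51.100.0/24"): {64500},
    ipaddress.ip_network("203.0.113.0/24"): {64501},
}


def is_authorized(prefix, origin_as, registry=REGISTERED_ROUTES):
    """Return True if origin_as is registered for a block covering prefix."""
    announced = ipaddress.ip_network(prefix)
    for registered, origins in registry.items():
        # subnet_of() requires both networks to be the same IP version.
        if announced.version != registered.version:
            continue
        if announced.subnet_of(registered) and origin_as in origins:
            return True
    return False


# Example announcements as (prefix, origin AS) pairs; all values are made up.
announcements = [
    ("198.51.100.0/24", 64500),  # matches the registry
    ("198.51.100.0/25", 64511),  # registered block, wrong origin
    ("192.0.2.0/24", 64502),     # not in the registry at all
]

# Only trust (and readvertise) routes whose origin checks out.
accepted = [a for a in announcements if is_authorized(*a)]
print(accepted)
```

A filter like this only helps as far as the registry data is accurate and the neighbors actually apply it, which is again a matter for the people running the networks.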

wdewey@cityofsalem.net

To be honest, I just found them when I was looking up some info to make sure I remembered things correctly. The number of network engineers and companies that run BGP is fairly limited (only about 65000 globally reachable AS numbers exist), so it is a niche market from a global perspective.

Bill

Michael Kassner

I was unaware of the tools you mentioned. The Renesys tool is of particular interest, but your point that there is low interest also makes sense.