What would you think if someone intimately familiar with the inner workings of digital networking begins a paper by pronouncing the Internet insecure, then asks, “Does this mean we have simply been lucky, or are the issues more theoretical than actual?”

The “nonsecure” proclamation and evocative question were found in this ACM paper written by Chris Hall, senior engineer with Highwayman Associates and founding executive director of Communications Research Network at Cambridge University Computer Laboratory.

Is there a problem?

Since the Internet consists of countless disparate networks, there has to be some way to figure out how traffic gets from point A to point B. For example, what tells the digital bits making up an email from me how to get to my friend in Sousse, Tunisia, some 8,000 kilometers away?

Border Gateway Protocol (BGP) does the “lion’s share” of the figuring. It is the protocol Internet backbone devices use to make high-level routing decisions. And it is BGP that has Chris concerned.

Way back in 2008, I wrote an article in which an expert talked about some of the same concerns Chris has, mentioning that BGP’s vulnerabilities are an Internet time bomb waiting to go off. I asked Chris to describe what he sees as the problem with BGP.

Hall: When we talk about BGP and its vulnerabilities, we are rarely talking about the protocol itself; we are generally talking about the routing system built on the protocol. Routers which implement BGP also use other common mechanisms to process the information carried by the protocol, so it can be tricky to distinguish between BGP and BGP implementation issues.

The most obvious vulnerability is the inability to check if the routing information carried by BGP is correct. A less obvious issue is that BGP announces that a destination is reachable, but does not announce how much traffic the route can handle.

Every now and then, some network administrator somewhere makes a small mistake, generating bogus routing information that BGP blindly accepts and relays across the Internet. The effect of such a Route Leak is that data is diverted from its intended destination, usually ending up in a black hole from which there is no return. Since that can be achieved by accident, people worry about what might be achievable with malice aforethought.

Another issue with BGP is the Route Hijack, in which some network announces routes for addresses it has no business using. The most benign case of this is where some network co-opts unused addresses. A less benign use would be announcing routes to divert traffic to the announcer, where it could be examined, discarded, or otherwise disrupted.
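As an aside that is not from Chris or his paper, here is a rough Python sketch of why a hijack can divert traffic. It assumes the usual forwarding rule that the most specific matching prefix wins, and nothing in it checks whether the announcer has any right to the addresses. The prefixes are documentation ranges and the network names are made up; `best_route` is a hypothetical helper, not part of any router software.

```python
import ipaddress


def best_route(table, destination):
    """Return the (prefix, announcer) entry with the longest matching prefix."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, who) for net, who in table if addr in net]
    if not matches:
        return None
    # Longest prefix wins; ownership of the addresses is never verified.
    return max(matches, key=lambda entry: entry[0].prefixlen)


# Hypothetical routing table holding one legitimate announcement.
table = [
    (ipaddress.ip_network("203.0.113.0/24"), "legitimate-network"),
]
before = best_route(table, "203.0.113.10")

# Another network announces a more specific slice of the same address space.
table.append((ipaddress.ip_network("203.0.113.0/25"), "hijacking-network"))
after = best_route(table, "203.0.113.10")

print(before[1])  # legitimate-network
print(after[1])   # hijacking-network
```

Addresses outside the more specific slice, such as 203.0.113.200 in this toy table, still follow the original announcement, which is one reason a partial hijack can go unnoticed for a while.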

The final issue with BGP is how long it takes to respond after a major flap. Faced with a large-scale change in routing, it may take BGP minutes to cope. For many purposes, that is not a problem, but it will disrupt services like VoIP. More of a challenge to the system would be repeated large-scale changes in routing, where slow responses could mean by the time a given route reaches some distant part of the Internet, it is no longer valid.
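To put rough numbers on that last point, here is a back-of-the-envelope sketch of my own. It assumes a deliberately crude model in which every hop adds the same fixed delay before passing an update along, and routing changes arrive at a fixed interval. Both figures are invented for illustration and are not measurements from Chris's paper.

```python
def hops_before_stale(per_hop_delay_s, change_interval_s):
    """How many hops an update can travel before the route changes again."""
    return int(change_interval_s // per_hop_delay_s)


def is_stale_on_arrival(hops, per_hop_delay_s, change_interval_s):
    """True if the route has already changed by the time the update arrives."""
    return hops * per_hop_delay_s >= change_interval_s


# Assumed values: 30 seconds of delay per hop, a routing change every 120 seconds.
PER_HOP_DELAY_S = 30
CHANGE_INTERVAL_S = 120

print(hops_before_stale(PER_HOP_DELAY_S, CHANGE_INTERVAL_S))
for hops in (3, 5):
    print(hops, is_stale_on_arrival(hops, PER_HOP_DELAY_S, CHANGE_INTERVAL_S))
```

Under these assumed figures, a network three hops away still receives usable information, while one five hops away is always acting on a route that has already been replaced.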

Why not fix BGP?

Chris mentioned in the paper that the current version of BGP is 18 years old. He also indicated that securing BGP would be costly and take years to accomplish. I asked Chris if BGPSEC might be the answer.

Hall: BGPSEC holds out the promise that the information it covers can be verified. But current routers do not meet the processing and memory requirements needed by BGPSEC. Implementing BGPSEC means either expensive upgrades to existing routers, extensive network changes that move BGPSEC out of routers into a new BGPSEC-plane, or waiting for equipment turnover.

Also, BGPSEC is not a complete solution; it only covers part of the information carried by BGP. It does not allow a network to verify that its own announcements are consistent with its policy, nor can a remote network’s announcements be checked against that network’s policy, so BGPSEC is of no help with Route Leaks.
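A small sketch of my own, not taken from the paper, may help show why verifying the information does not catch a leak. It is a heavy simplification: the authenticity check below only compares the origin against a made-up table and stands in for the kind of verification Chris describes, while the policy check uses a commonly described export convention (routes learned from a provider or peer are passed on only to customers). The class, the table, and the AS number are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Announcement:
    prefix: str
    origin: str
    learned_from: str  # relationship to the neighbor it came from
    exported_to: str   # relationship to the neighbor it is sent to


# Hypothetical table of which network may originate which prefix.
AUTHORIZED_ORIGINS = {"203.0.113.0/24": "AS64500"}


def origin_is_authentic(a):
    """Stand-in authenticity check: is the origin authorized for the prefix?"""
    return AUTHORIZED_ORIGINS.get(a.prefix) == a.origin


def export_follows_policy(a):
    """Assumed convention: only customer-learned routes go to peers or providers."""
    if a.learned_from == "customer":
        return True
    return a.exported_to == "customer"


# A leak: a route learned from one provider is re-announced to another provider.
leak = Announcement("203.0.113.0/24", "AS64500", "provider", "provider")

print(origin_is_authentic(leak))    # True
print(export_follows_policy(leak))  # False
```

Every fact in the leaked announcement is true, so the authenticity check passes. Only a check against policy, which each network keeps to itself, would flag it.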

Fortunately, the operational layer seems to be doing a good job.

Operational layer

You may be wondering what makes up the operational layer — people. According to Chris:

Each network in the Internet has a Network Operation Center (NOC) that monitors its own network and its connections to other networks, responds to incidents when they occur, and strives to maintain acceptable levels of service and reliability at an acceptable cost. Each NOC acts independently and interacts with other NOCs, collectively forming the operational layer.

I asked Chris if he had an example of where the people working in a NOC made a difference.

Hall: One good example is the China Telecom incident (18 minute mystery) that occurred in April of 2010. Approximately 15% of all Internet addresses were disrupted, but only for 18 minutes. This mishap is a testament to the effectiveness of the operational layer.

For what it’s worth, the “China Incident” is a wonderful example of a Route Leak — it speaks volumes about the ability of ignorance coupled with paranoia to wind a small fat-finger incident into an attack on everything from national security to apple pie.

For the long haul

Because of the Internet’s complexity, it seems human intervention will be required for quite some time. Is that how you see it, Chris?

Hall: I wouldn’t put it that way. The Internet is not just IP and BGP. The Internet is a many-layered system, and each layer plays a part. There are technical solutions to some security and reliability issues. However, if we only consider a technical solution for a given problem we may not come up with the most effective solution.

Furthermore, if we do not consider commercial and economic implications, we may find that the proposed solution will never be implemented, because it is not cost-effective and there are no economic incentives to implement it.

Without becoming too philosophical: we can always improve the automatic systems which run large networks. We can always improve monitoring systems to ensure the network is working properly, and we can make the systems easier to use. But, when something unusual happens…

An Internet NOC

Chris mentioned numerous times how important the NOC was to a network’s health, and its ability to interact with other networks. I asked Chris if there was some kind of a centralized NOC for the Internet.

Hall: There is no global view of how well the Internet works, and no view of how it responds to events large or small. We also know next to nothing about demand or capacity.

ISPs have an economic incentive to monitor their own networks, but no incentive to consider the Internet as a whole. If we collectively consider the Internet to be a common good, then we should strive to understand the Internet’s performance parameters. It would lead to a more secure Internet — a common good. But sadly, the incentive is missing.

Final thoughts

Now to the question that started it all:

“Does this mean we have simply been lucky, or are the issues more theoretical than actual?”

Have we been lucky? Maybe. Between the hard-working people running NOCs and the realization that bad guys are just as dependent on the Internet as the rest of us, I’d say lucky or not is relative.

As for theoretical versus actual, one can go into metaphysical overload debating when or if an issue turns from theoretical to actual, but it is evident that something is working, and as Chris pointed out, that something is having people at the controls.

I’d like to thank Chris for his worthy explanations and the ACM for allowing me to use parts of the article. And I almost forgot, Chris wanted me to mention the ACM article was derived from this ENISA paper.