
Not Invented Here has no place in open source development

Last week, many Debian users got something of a shock when they realized that encryption keys for OpenSSH, OpenSSL, and OpenVPN have all been vulnerable to relatively easy compromise for a while. Previously, I discussed how you can detect and replace vulnerable SSH keys on Debian, and Vincent Danen explained another means to find and fix crypto key vulnerabilities that arose as a result of this snafu. So much for the technical matters -- now I'll provide a quick overview of the rest of the story.

What happened

The problem arose when Debian package maintainers used valgrind to profile some security tools and "fixed" some "problems" it found without understanding what they were doing. Valgrind reports uses of uninitialized memory, which is usually a good sign of a bad bug, but is not always a sign that anything is wrong. In fact, the uninitialized memory usage in this case was critical to the proper operation of crypto tools like OpenSSH.

By "fixing" the uninitialized memory issue, Debian package maintainers destroyed the ability of tools like OpenSSL to add any entropy to the pool it uses to generate encryption keys. Not only did these maintainers modify the code that offended valgrind to eliminate uninitialized memory use, but they eliminated the ability of these tools to add any new memory to the entropy pool at all. The end result is that these tools were effectively restricted to a very limited set of potential encryption keys, making a brute-force attack not just possible, but even easy, by some measures.

The actual problem occurred with a patch to the OpenSSL libraries, which are used by the OpenSSH and OpenVPN projects to generate random numbers for encryption key generation.
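
Structurally, the mistake looks something like the hypothetical sketch below. The function names here are invented for illustration -- the real code mixes data into the pool through OpenSSL's internal MD_Update() calls -- but the shape is the same: the read that makes valgrind complain and the read that feeds the entropy pool are one and the same.

    /* entropy_sketch.c -- hypothetical sketch, not the real OpenSSL
     * source. mix_into_pool() stands in for OpenSSL's MD_Update()-based
     * mixing. */
    #include <stddef.h>

    #define POOL_SIZE 1024
    static unsigned char pool[POOL_SIZE];

    static void mix_into_pool(const unsigned char *data, size_t len)
    {
        for (size_t i = 0; i < len; i++)
            pool[i % POOL_SIZE] ^= data[i];  /* fold bytes into the pool */
    }

    void rand_add(const void *buf, size_t num)
    {
        /* Callers sometimes pass buffers they never initialized. That is
         * deliberate: unknown bytes can only add entropy or, at worst,
         * add none. Valgrind's Memcheck tracks those undefined bytes and
         * reports them later, once they influence a branch or output.
         * Removing this call to silence the warnings -- effectively what
         * the Debian patch did -- throws away ALL caller-supplied
         * entropy, not just the uninitialized bytes. */
        mix_into_pool((const unsigned char *)buf, num);
    }

    int main(void)
    {
        unsigned char junk[16];        /* deliberately left uninitialized */
        rand_add(junk, sizeof junk);   /* harmless at worst, helpful at best */
        return 0;
    }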

What they did right

Valgrind is a great tool, and there's no reason to avoid using it. The fact that it flagged something that, in this case, is not actually a problem is not where the real issue arose. Tools like valgrind and Purify (which would also produce warnings about uninitialized memory) are extremely helpful, and any C developer should familiarize himself or herself with one or both. Used correctly, they can prove invaluable in discovering and fixing security issues.
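
For anyone who hasn't used such a tool, here's roughly what a Memcheck session looks like. The file and program names below are invented for demonstration; the deliberate bug is a branch that depends on memory no one ever initialized.

    /* uninit_demo.c -- deliberately branches on uninitialized memory so
     * valgrind's Memcheck has something to report. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int *p = malloc(sizeof *p);  /* allocated, but contents indeterminate */
        if (p && *p > 0)             /* Memcheck: "Conditional jump or move
                                        depends on uninitialised value(s)" */
            puts("positive");
        free(p);
        return 0;
    }

Compile with debugging symbols ("gcc -g uninit_demo.c -o uninit_demo") and run "valgrind --track-origins=yes ./uninit_demo", and Memcheck will point at both the offending branch and the allocation the undefined value came from. As this whole episode demonstrates, though, that report is where the work begins, not where it ends: deciding whether the flagged read is actually a bug still takes human judgment.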

The Debian package maintainers and the people working with them also employed some due diligence in tracking down the source of the apparent problem to the best of their ability. They discussed possible solutions, and they even chose real solutions rather than just hiding what they saw as a real problem.

The OpenSSH, OpenSSL, and OpenVPN teams, for their parts, have done a great job over the years of maintaining useful, high-quality, very secure tools, and have not neglected the software in their charge when presented with evidence of bugs and vulnerabilities that needed fixing.

What they did wrong

The Debian package maintainers may have discussed their options for solving the problem in a well-reasoned, careful manner, with an eye toward fixing rather than hiding problems, but they missed the single most obvious -- and obviously correct -- option of all. They failed to consider getting in touch with the upstream developers for the security tools in question.

When you have a problem with a piece of software developed and maintained by another team -- a team that obviously knows a fair bit about that type of software -- you should bring any bugs and issues you find to the attention of the upstream developers. This is doubly important when you yourself are not a subject matter expert, and that level of importance increases by an order of magnitude when the software in question is security software.

Developing non-security oriented software in a secure manner requires some knowledge of good development practices, common security failings and vulnerabilities, and effective testing techniques. That's knowledge that any developer should have in his head. Developing actual security software, however, is an exacting, demanding, and highly specialized activity. It requires a level of expertise that one cannot simply fall into. If you are not deeply involved in security software development yourself with a fair bit of experience behind you, you should never take it upon yourself to second-guess the decisions of the security software development experts without an expert or six of your own looking over your shoulder and checking your work.

This is a problem that can come up under pretty much any circumstances, of course -- it's not particular to a Linux distribution project's package maintainers, or even to software that legitimately uses constructs usually treated as signs of a bug, as with uninitialized memory reads used to refresh an entropy pool. This particular incident is an even more egregious example of poor handling of a security software issue than usual, however, because it happened with an open source project.

One of the benefits of open source software is that anyone who discovers a problem, or something that may be a problem (but actually isn't, as in this case), can collaborate with upstream developers to solve it not only for themselves but for all users of that software. Doing so also removes some of the software maintenance weight from their shoulders and places it where it belongs: on the shoulders of the upstream developers.

As such, what the Debian package maintainers did was the wrong answer whether uninitialized memory use was a real problem or not. If it had been a real problem, they would have been fixing it only locally, which could conflict with later upstream updates, would deprive other downstream users of the fix by not sharing it, and would add to their own workload by taking on maintenance of the fix personally rather than handing it to the people whose job it is to maintain such things. Since it was not a real problem, they "fixed" something that should not have been "fixed" at all, and created a huge problem out of nothing.

Regardless of what upstream source provides the software you use, or redistribute to others -- whether it's a closed source commercial developer or an open source project -- your first instinct after documenting a potential problem to the best of your ability should always be to contact the upstream software developer through whatever appropriate channels are available to you. Only after you have done so should you even consider fixing it locally rather than getting a fix from upstream (perhaps after submitting a patch to the upstream developers). If it's security software, even then you shouldn't fix it yourself unless you are a security software development expert, or have one on hand to review your work and make sure you're not making any silly mistakes.

But wait -- there's more! This is not entirely a problem with the Debian package maintainers. The OpenSSL team may need to clean up its act just a little, too. Getting in touch with the right people at the OpenSSL project is a less than obvious process, apparently. OpenSSL core team member Ben Laurie said that the openssl-dev mailing list, despite its name, is not the place to discuss development of OpenSSL. The OpenSSL Support page identifies it as a mailing list for "Discussions on development of the OpenSSL library," however. Even worse, the "openssl-team" e-mail address he suggests for reporting issues like the valgrind warnings doesn't appear to have been noted in any OpenSSL documentation or Web pages.

The OpenSSL team needs to make its preferred means of receiving such reports more accessible. On the other hand, that doesn't excuse the Debian package maintainers' failure to submit a patch to the OpenSSL team, rather than just patching the code locally in the Debian project and leaving it at that. To varying degrees, it seems everyone involved was culpable.

Another example

Another recent, if less disturbing, example of downstream software users mishandling apparent bugs is the celebrated 25-year-old BSD bug. The Samba team discovered a bug in BSD Unix code for handling the MS-DOS filesystem that results in a conflict when the same file is accessed simultaneously. This problem has, since 1983, affected every BSD Unix and every significantly derivative OS, including Apple's Mac OS X.

The failure here is that the Samba team wrote a work-around into Samba's code when they discovered the issue, rather than alerting BSD developers to the problem so that it could be properly fixed. Once again, a lack of communication led to a suboptimal result. Once again, the problem was compounded by the fact that the offending parties never bothered to get in touch with the upstream developers of an open source project. Luckily, this error in judgment didn't result in a widespread, potentially very damaging security vulnerability.

The Lesson

All I can offer as a lesson is a repeat of what I've already said:

  1. Your first instinct should always be to bring a bug to the attention of upstream developers.
  2. Even if, after talking to the upstream developers, you feel the need to make local changes (short of forking the project) to work around systemic maintenance problems in the upstream project, you should never try to fix a code problem you don't understand.
  3. Perhaps most importantly, never assume you understand a problem (or even that there *is* a problem) with security software unless you yourself are an experienced security software developer or have one near at hand to double-check your work.
  4. Finally -- and this should be the most obvious point of all -- take advantage of the open source development model whenever possible. Quietly introducing downstream changes to fix a bug in software you get from some other source is exactly the wrong way to do it, in part because you're depriving the upstream developers of the benefit of your development efforts, in part because you aren't taking advantage of the upstream developers' familiarity with the software, but mostly because it's the upstream developers' job to maintain the software after any fixes have been applied.

There's no room for Not Invented Here syndrome in open source software development. When you let NIH get in the way of doing the right thing, you're not doing open source development any longer.

About

Chad Perrin is an IT consultant, developer, and freelance professional writer. He holds both Microsoft and CompTIA certifications and is a graduate of two IT industry trade schools.

11 comments
Jaqui

As you mentioned with OpenSSL not having accurate contact info posted anywhere: I went looking into a bug that only occurred on one distro for a package -- it was a distro config error that caused the bug -- but the upstream developers never responded to my post asking for their input. PCLinuxOS fixed the bug, which caused a signal 11 to be thrown on exiting the app in question on their distro. As long as you did not exit the app, or try to browse the file system to open a specific file in the running app, there were no errors. Both issues were fixed fairly quickly by PCLinuxOS, before I actually contacted them. I was trying to duplicate the behaviour in a source build and was not able to, even on a PCLinuxOS system.

itpapers

You got it wrong. The problem was discussed upstream - it just fell through the cracks. Indeed upstream approved removing the code that valgrind found wanting. The trouble is the identical bit of code occurred twice and it was the second removal that was essential. Please do more thorough research before posting errant thoughts.

The Ref

The point is that this vulnerability was injected in Sep 06, and the point of origin for the change in openssl was the RSA exponent 3 vulnerability and subsequent exploit. This should have indicated how difficult it is to get security right. To reiterate your point 3: "never assume you understand a problem (or even that there *is* a problem) with security software unless you yourself are an experienced security software developer..." The fact that this vulnerability was created whilst trying to close another major vulnerability is a bit disconcerting. (Note that there is no causality between the two vulnerabilities; rather, unintentional changes made alongside the fix for one caused the other.) The developer should have realised the intricate nature of cryptography and known enough not to mess with it.

apotheon

When you succumb to NIH syndrome, you're no longer doing open source development.

Take the brakes on a bicycle as an example. They're analogous to a benefit of open source development -- in this case, the benefit of being able to work with upstream developers to solve problems with software in a way that simply isn't possible with closed source development models. The bicycle, meanwhile, is analogous to a popular open source security application. If you use those brakes, you make your use of that bicycle pretty safe. If you ignore them, and try to use the soles of your shoes dragging on the ground to do all your braking, you'll have a tendency to slide into the middle of an intersection and get run over -- particularly after making good time down a steep hill, or if the light changes from green to yellow then red just as you're getting to the intersection.

The question that arises then is a simple one: Why are you using open source software at all, if you aren't going to make full -- or even minimal -- use of the open source development model's benefits?

Have you ever been on either side of a failure of communication between upstream and downstream software maintainers? What went wrong -- and what did you learn from the experience?

apotheon

"[i]You got it wrong. The problem was discussed upstream - it just fell through the cracks.[/i]" I suppose it's good that you have confidence in your beliefs about the matter -- but you haven't offered anything except your word that everything you say is correct. If you're going to challenge my statements by making some of your own (including the implication that I don't know anything), please take the time to actually offer some kind of evidence to back up your accusations of incompetence. As for it being "discussed upstream", I don't think so. In absence of any more explanation of what you mean, I'm going to have to assume that you're basing this incredible statement on the notion that the sparse comments on the dev list I mentioned in the article qualify as "discussed upstream". If that's the case, I recommend you learn something about what "upstream" means, and try reconciling that with the statements by core developers that the mailing list in question does [b]not[/b] constitute a proper forum for bug reports. "[i]Indeed upstream approved removing the code that valgrind found wanting.[/i]" Uh . . . what? It looks to me like you're saying that the OpenSSL developers "approved" code that produced errors when valgrind was used to profile the code, as if that somehow refutes what I said in the article. Considering I specifically stated that valgrind threw warnings when it encountered code that was intentionally included in the OpenSSL libraries, your apparent implication that I said otherwise is, well, [b]wrong[/b]. If that's not what you intended to imply, I fear your ability to express yourself in the English language could use some polishing. "[i]Please do more thorough research before posting errant thoughts.[/i]" Please bother to understand what you're reading before you dispute it. In fact, your error seems to be ironically very similar to that of the Debian package maintainers.

Sterling chip Camden

... open source or not. Developers use some package to jump-start their efforts, but fail to understand how or why it works the way it does. As their needs change, they don't even look at what that package already has that might address that change -- rather, they reinvent a solution on top of or outside the package, usually creating some type of conflict that they live with like a fracture that was never set right. The larger problem is that developers are so consumed with the specific problem they have to get resolved today that they don't take time to look at the big picture.

tuomo

Correct - ".. open source or not." Happens too often! And you are right, most often when a developer tries to "fix" or enhance one part of the system without the big picture. Most irritating, especially when part of system is outsourced and suddenly starts failing but just sometimes and a long time after it has already been production. It has made at least me so suspicious that I nowadays often use a lot of time trying first to find if anybody did change anything somewhere else in system. IMHO closed source is even worse, the project which did the mistake may not even exist anymore when the bug is found - like last one I had, 12 years after the (stupid! useless!) changes were made. Which means that you have to collect all the change / new information and hope it was kept even after the project was deemed successful. Or you start digging, long nights, find the cause and hope that the "enhancement" is local, which unfortunately often it isn't! One thing really has changed, a long time ago I was able to get in contact with a person who wrote/changed the SVC, JES, SNA printer, etc code - now you are lucky if you even know which company, organization or consultant did the work. Many companies don't track this information - stupid and dangerous for several reasons!

apotheon

You noticed the error and admitted it. That's as good as not making the error in the first place, in many cases.

Penguin_me

Sorry, should have read a bit more closely before responding - I noticed (too late) you'd mentioned the OpenSSL-dev mailing list. Apologies.
