
The most expensive programming mistake ever?

In this programming news roundup, read about the Ajax Control Toolkit, HTML5 and ASP.NET 4, developers' preference to code on Macs rather than Linux, and more.

Poul-Henning Kamp wrote an excellent piece claiming that the choice of NULL-terminated strings for the C language may be the most expensive mistake in the history of programming, and it was only a one-byte mistake at that. I write the TechRepublic Patch Tuesday column every month, and I can tell you that the issues NULL-terminated strings cause are the biggest source of security problems for Windows, costing billions of dollars every year in security violations and time lost patching systems.

Does "social" fit into your mobile apps?

Appcelerator released its latest quarterly mobile developer survey, which has a big focus on "social" and how it fits into mobile apps. I think that mobile apps + social is a great blend. While I'm not that big on social stuff for enterprise scenarios (do you really want your server Tweeting that it's had a failure?), I think it's a good match with mobile devices, which tend to have a lot of consumer features.

Windows Phone 7 has a lot of social capabilities, and it is a great experience. For example, when my daughter was born a couple of months ago, I was able to get details and photos out to my friends and family within minutes of the birth, allowing those who could not be there to still participate.

I'm not sold on Google+ for a number of reasons, and Google's history with its APIs makes me reluctant to recommend integrating apps with them, but an integration with Facebook and Twitter (where appropriate) makes perfect sense for many mobile apps.

Language/library updates

Ajax Control Toolkit

The Ajax Control Toolkit recently got an upgrade, with a number of bug fixes and a new HTML editor component.

HTML5 and ASP.NET 4

Scott Hunter wrote a short piece explaining recent changes to the .NET Framework to enhance its ability to handle HTML5.

Tools and products

AppDynamics releases .NET cloud performance tool

AppDynamics released an application performance management (APM) tool designed for .NET apps in the cloud, including Azure.

CumuLogic betas Java PaaS software

CumuLogic is now offering beta use of its Java Platform-as-a-Service (PaaS) software, which allows you to create Java platforms as a cloud service model using a combination of in-house and third-party servers.

Syncfusion Essential Studio Enterprise Volume 3

Syncfusion's latest update to its Essential Studio Enterprise bundle of .NET components adds a full Excel Editor control, new HTML5 controls, a PDF viewer for Silverlight and WPF, and more.

Jaspersoft Self-Service Express

Jaspersoft created a new support offering called Self-Service Express. It bridges the gap between the free, community-provided support and the paid support contracts.

App Hub enhancements

App Hub (the WP7 app marketplace) just got a number of important enhancements, including private distribution (which is critical for enterprises that want to use WP7).

NetBeans 7.0.1

NetBeans 7.0.1 has been released. It looks to be mostly patches and bug fixes compared to 7.0.

BlackBerry App World 3.0 beta

RIM has a beta of its newest version of App World (its online app marketplace) available, with a big overhaul of the UI, social features, and more.

Editorial and commentary

Devs prefer Macs to Linux

Apparently, developers prefer to code on Macs rather than on Linux. This is no surprise to me. Many of the people I know who are seriously building software for *Nix deployment are working on Macs because they are easier to use and support than Linux and have better application options, while providing an environment with the same capabilities as Linux.

Tips and tricks

ASP.NET MVC 3 on Azure

Mark Berryman of the Visual Web Developer Team Blog wrote an in-depth article about putting ASP.NET MVC 3 applications on Azure.

Using the Razor view engine for text templating

Phil Haack posted a tutorial that shows how to use the Razor view engine from ASP.NET MVC to create text templates.

Building a Tree with lambdas

Chris Eargle has an article series demonstrating the use of lambdas in C# to build a Tree data structure. It's definitely an interesting exercise, and it shows one of those situations where a high-level language like C# beats the pants off a lower-level language like C.

Events

Eclipse Testing Day

If you want to learn more about testing with Eclipse, there is an event on September 7 in Neuss, Germany, that you should check out.

ConFoo 2012 call for papers

ConFoo 2012 (a large Web development event in Canada) has opened its call for papers. The event runs February 29 - March 2 in Montreal, and the call for papers closes September 2.

J.Ja

About

Justin James is the Lead Architect for Conigent.

30 comments
medomoreno

There also exist interoperability issues with the address + length format: endianness, word-length, field-alignment, definition of #bits / char to name a few. The NUL-terminated choice seems to have served Ken, Dennis, and Brian well in terms of simplicity. My suspicion is that, had they chosen the address + length format, we would now be reading about the costs generated by the mistake of them choosing THAT format 40+ years ago.

awans

The event's name seems pretty anim.. and that's driving me away

redave

While this is a problem, it can be made into a minor one. The major problem I have seen is that there is never enough time to design a program, but always enough time to fix it. One of the best programs I worked on used DOS for the I/O (think speed, reliability, and no programming needed), assembly where speed was needed, and C for all the rest. This uses the right tool in the right place. The second thing was a top-down design, with almost the entire program comprised of library routines. This application was for testing PC I/O boards, so there was a lot of repetition. The top-down design made sure that all repetitive tasks were done exactly the same way, by making them a library function. The code was half the size of the previous version, did almost three times as much, and was about 2.5 times faster. I say this because when a program is structured by design, there are fewer mistakes and problems than with linear coding, and they are easier to identify and fix. I realize a lot of you will say "I do that," but a lot wish their management would allow them to do so. The only good argument for linear coding came from a friend who works for a government contractor: no one but his company has anyone who understands the code that they write!!

Mark Miller

That's more identifying the symptom. The real problem was using C in the first place on such a wide scale. Even if the standard library could've been rewritten to use some other method for ending strings, the same problems would still be evident, because the C runtime never checks if you're going beyond the bounds of allocated memory. As I keep saying, C is best thought of as a high level assembly language. When I programmed in assembler in college we usually ended our strings with the LF (line-feed) character. We could've used anything, though we used a few rare Unix system functions (which were C-based) to get input and produce output, in which case we had to deal with null-termination.

One of the most dangerous functions in the standard library IMO is strcpy(). It's seductive, a seemingly nice and easy function for copying strings. It has what I'd call a nasty bug: if you give it a source buffer that's larger than the destination buffer, it will blithely write past the end of the destination buffer, even if the destination has a null terminator right at the last element. It only stops copying characters when it comes to the null terminator in the source buffer! In a place where I worked, we eventually decided not to use strcpy() most of the time. We came up with our own secure coding solutions. What we were worried about at the time was corrupted data, and unstable software that would crash mysteriously, not hackers. I remember I wrote up a macro to copy strings that only used strncpy() (a function that allows you to specify the number of characters to copy), in conjunction with sizeof() (for stack-based buffers). It would truncate the copied string if the destination buffer was too small. This lessened our buffer overflow worries quite a bit.

C was considered a fast language in the '80s and '90s, which was a boon for slow hardware. In hindsight, I guess for the software in the '80s, which wasn't that large by today's standards, you could get away with using it without causing too many problems. The worst that would happen if you ran into one of these bugs was some of your data might get corrupted and your computer would crash. Computers were usually only networked in LANs, small-scale networks, in a controlled environment. So the risk of a security issue arising because of these bugs was pretty low. The problems IMO came in the '90s when C became dominant, and was used for everything. The complexity of software got larger, and became too much for many programmers to handle in C. Add to that the fact that software infrastructure was being made public for everyone to use via the web, which made it available to be abused as well. The industry should've looked for a better development solution.

One of the more secure languages that was available in the 1980s and into the '90s, and at one point was widely used, was Pascal. It was rather popular in the 1980s, when C was becoming popular, and it bounds-checks arrays at runtime. It was usually implemented as a p-code interpreter, and Pascal code would be compiled down to that. Turbo Pascal, which compiled directly to machine code, was the exception, not the rule. Pascal was considered a bit slower than C, but I don't think there was that big a difference. I think developers gravitated to C because it was considered a hacker language. Pascal was more wordy and clunky IMO, and not as pleasant to use for software production, mainly because of its type strictness, particularly around arrays. You had to define arrays as their own types, as I recall (maybe this was only for passing them into functions). So if you wanted to have a 5-element integer array and a 10-element integer array, you had to define different types for each. Quite a pain, though there may have been dialects of Pascal which didn't enforce types to this degree. It was originally designed as an educational language, but it was not a low-powered language. The original Mac OS that ran on the first Macintoshes was written in Pascal, as were some of the early Mac applications.

That doesn't mean that software written for the Mac didn't have some nasty hidden bugs. I remember reading articles written by David Small, who created a hardware/software Mac emulator for the Atari ST, called Spectre GCR. He gave some details about how quite a few pieces of Mac software were neglectful about using uninitialized pointers. Information would frequently be stored at the NULL address, whatever that location was (probably address 0), because the Mac wouldn't bat an eye if you did that. Apparently there was a "pocket" of memory at this location that wasn't used by the system. Programmers didn't do this on purpose, but it was an easy error to miss. He talked about how he had to compensate for it in his emulator, because the Atari would barf if software tried to do that.

I think the reason the industry stayed with C into the '90s was due to the pursuit of speed, developer enthusiasm, market momentum, and developer tradition. And they had a perfectly good excuse, because it had a reputation for being the next fastest thing to assembly language. It was a huge influence on the developer community. We still have syntactic traditions from C floating around in the popular languages used today, just because they look familiar to developers.

Even when the industry moved to C++, it didn't totally get rid of these problems. I remember we had to deal with this when I worked in VC++ 6, using a version of STL. We used the MFC CString class, which was bounds-checked, but we used a class derived from "vector," which we called BVector (for "bounded vector"), for arrays, because the MFC containers sucked. Even so, the STL vector class was written such that you could assign a value past the end of its own internally allocated memory! So we used BVector, which added bounds checking. Oy! And then, of course, there was the issue of managing heap memory, which was a whole other can of worms... Smart pointers, anyone?

One is tempted to say, "The industry should've moved to Java a lot sooner," because it's had bounds-checked, garbage-collected memory that does away with all of this crap. The thing was, it was not as stable as claimed when it was first introduced, and it earned its reputation for being slow on the hardware of the time. It really was! But it had some good ideas in it.
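
A minimal sketch of the kind of truncating copy macro described above, assuming the destination is a stack-allocated array so sizeof() reports the real buffer size (the SAFE_STRCPY name and details are illustrative, not the original code):

    #include <string.h>

    /* Copy at most sizeof(dest) - 1 characters and always null-terminate.
     * Only valid when 'dest' is a true array, not a pointer. */
    #define SAFE_STRCPY(dest, src)                        \
        do {                                              \
            strncpy((dest), (src), sizeof(dest) - 1);     \
            (dest)[sizeof(dest) - 1] = '\0';              \
        } while (0)

    int main(void)
    {
        char name[8];
        SAFE_STRCPY(name, "a source string far longer than eight bytes");
        /* 'name' now holds the first 7 characters plus '\0'; no overflow. */
        return 0;
    }

Unlike strcpy(), the worst case here is silent truncation rather than writing past the end of the destination.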

AnsuGisalas

I mean, I can analyze the structure of the piece - so I can see that it's a news breakdown. Short and sweet - and non sequitur. That's well and good (and well done, too). But these pieces always have a big title based on the first non sequitur in the list, always making me wonder how the title relates to item #2 in the list... it's only when I (puzzled) proceed to items #3 & 4 that I have enough data to lock down the genre, thus dispelling the misunderstanding. Could the title be modified? F.ex. "Justin James' Development Newsreel: Most expensive mistake, and more." That would be less confusing.

Tony Hopkinson

Not really. The real issue was being able to inadvertently overwrite the terminator and thereby extend the string. That sort of error could, and quite probably would, have applied to a length descriptor.

oz penguin

NULL terminated strings are not a mistake, but using them without understanding them is. Just like F1 cars, guns (and other things), NULL terminated strings are very fast and powerful, but you don't let untrained people play with them and brand them as professionals! Let the undisciplined programmers go back to BASIC, and let the undisciplined managers realise that people have limits and software needs to be tested.

Tony Hopkinson

Planning, forethought, experience, i.e. not cheap and quick.

Tony Hopkinson

and then get them with Low and High, but you can do that because of range checking; but yes, you do have to have them if you declare a type. As you say, though, the real problem was C. You could give yourself the same sort of issues in Pascal, you just had to write a fair bit of code to do it. :p

Justin James

Thanks, I'll pass that on to my editor, and get her thoughts on it. J.Ja

TommCatt

The "mistake" mentioned is at least understandable in that the security implications could not have been foreseen when C was being developed. However, the comment that "using them without understanding them is [a mistake]" is rather short-sighted on its own. The null-terminated string is a good example of a leaky abstraction. Ideally, the programmer shouldn't know or care about how a string is terminated, only that the end of the string is "somehow" detectable. The best implementation I have seen was how it was done on the VAX. The header was two words. The first word consisted of two short-words: the number of bytes allocated and the number of bytes actually used. The second word was the address of the first character. Using something like this, the current length and the maximum defined length are easily discerned and resizing was also very easy. There would also be no reason for the programmer to know anything about the descriptor or how it worked. Only that it did.

Justin James

... it is clear that the number of people programming in C who have the discipline to safely do so is much smaller than the number of people who currently program in C... J.Ja

Tony Hopkinson

doesn't have a string type.... If it had, they would have more options. Choice was pretty much taken out of our hands, but anyone sane who compared the gain in performance of null terminated strings versus the huge number of really painful bugs they could, would, did, and still cause, would have opted for something with a bit more discipline. I've seen one or two C experts make this mistake as well, even the ones who built their own routines and passed length as well, and did their own checking.

george

There always have been, and always will be, professed programmers who are looking to pad their resumes with languages; look to the .NET camp for ample examples. The ability to write a 'hello world' program does not make you a master of that language. There are many C programmers out there but very few masters of the language.

nwallette

I've only been using C for a short while now. I've always meant to learn it, so I finally did. Therefore, I'm probably erring on the pedantic side -- because I really don't have the experience to define the difference between "this could happen" and "it could, but never does in practice." :-) My portfolio of tested compilers consists of: GCC on Linux x86-64.

Mark Miller

True. Some older compilers put struct fields on even address boundaries, perhaps for more optimal access. Most of the ones I saw, though, allocated memory contiguously. I never saw a compiler put fields in reverse order. Maybe you're talking about big endian vs. little endian, where the bytes are reversed? C has been known as a portable language, but there are all sorts of pitfalls like this. I used to write Unix servers, and the way we kept source code portability was to stick pretty much with POSIX compliance, and use libraries that were designed to be portable between the platforms our customers used. Whenever I'd install our server on a new system, I would spend at least a day surveying the customer's system, reading man ("manual") pages describing the features of its C compiler, and what libraries were on the system, so I could see what adjustments needed to be made to our make files and source code. The source usually only required minor tweaks. The make file was the bigger adjustment. The big thing in each project was making sure that our third-party libraries worked on the new system. If that worked, we were off and running.

nwallette

I liked that too. Then I read that the memory allocation used by structs is implementation-specific. So, you can't guarantee that a struct with two 32-bit integers will be ordered as intA, intB. It might be intB, intA. Or intA, 32-bit pad, intB, 32-bit pad. Or anything else the compiler wants to do. I started reading my numeric fields 8 bits at a time, and using bit shifting to turn them into 32-bit ints, since architecture, byte order, and compiler can all cause the same code to fail at some future time, when I've long forgotten that I need to remember to consider it. It makes me sad, as it could have so easily been defined by the spec, and then those shortcuts would be totally legitimate.
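
A small sketch of the byte-at-a-time approach described above, assuming the field is stored little-endian in the file or buffer; assembling the value with shifts keeps the result independent of host byte order and struct padding:

    #include <stdint.h>

    /* Build a 32-bit value from four consecutive bytes (little-endian data). */
    static uint32_t read_u32_le(const unsigned char *p)
    {
        return (uint32_t)p[0]
             | ((uint32_t)p[1] << 8)
             | ((uint32_t)p[2] << 16)
             | ((uint32_t)p[3] << 24);
    }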

Mark Miller

...but you're right. You had to be careful with it. In one of the projects I worked on years ago, I learned about a technique I could use in parsing a flat file. If I defined a struct with fields that matched the fixed-width fields in a text file, byte for byte, I could use the following code to automatically put the fields into slots: fread(&structVar, sizeof(structType), 1, fileHandle); We were developing a client-server system, and we used flat files to transmit data back and forth between the client and the server. I forget if the client had the same structs and just serialized them, or if my server code had to use strncpy() and sizeof(), and manually null-terminate the strings, to get the field values out. In any case, it was a very effective technique, and it didn't require a lot of code.
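
A sketch of that flat-file technique under an invented record layout: a struct whose char fields match the fixed-width fields byte for byte, filled with one fread(), then pulled apart with bounded copies and a manual null terminator:

    #include <stdio.h>
    #include <string.h>

    struct record {
        char name[20];    /* 20-byte fixed-width field, not null-terminated */
        char amount[10];  /* 10-byte fixed-width field                      */
    };

    /* name_out must have room for at least sizeof rec->name + 1 bytes. */
    static int read_record(FILE *fp, struct record *rec, char *name_out)
    {
        if (fread(rec, sizeof *rec, 1, fp) != 1)
            return -1;
        strncpy(name_out, rec->name, sizeof rec->name);
        name_out[sizeof rec->name] = '\0';   /* bounded copy + manual terminator */
        return 0;
    }

This works here because char arrays need no padding between them; with mixed field types, compiler-inserted padding could break the byte-for-byte match.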

Mark Miller

...if that helps explain my comment. It was the original standard developed by Kernighan & Ritchie before ANSI C came along. I wrote a blog post a while back on how C is misunderstood. In it I have a section where I talk about some of the stuff you could do in K & R C that you can't do in ANSI C. I used it to explain why C is the way it is today, because I see there is a "reason behind the madness." Some got the impression that I was a big fan of C, and that I was bashing people who didn't "get it." That wasn't my intention. I was just saying that there were many who were criticizing the language for the wrong reasons, and that there were different, legitimate reasons to criticize it, from a standpoint of understanding what the creators of C were actually trying to do. I said at the end that I'm glad I'm not using it anymore. Not to say I would never, ever go back to it, but I'm glad there are some better options out there.

Re. the changing sizes of types: that was something I thought about back in the '90s when I was working in C, though it didn't keep me up at night. I remember asking my boss about it. I noticed that ints had already expanded from 16 to 32 bits, and I figured that someday we'd get 64-bit computers and the bit width for ints would expand still further. I asked how the software we were writing then would work on newer hardware, even if the instruction sets were still compatible. Since we always supplied our customers with complete source code in the projects we wrote for them, he said, "They can just recompile the source code with a newer compiler." I got the idea, but I was still wary of it. This was a selling point of Java when it came out: an int would always remain 32 bits, because unlike C and C++, primitive types were not dependent on the hardware architecture.

The main thing I worried about (though it still didn't keep me up at night), since everyone was becoming more conscious of the constraints programmed into systems 20-30 years earlier with Y2K coming up, was the time value used in C for getting the current date. It had the same wrap-around limitation that nwallette talked about earlier with integers. I figured out that, given the bit width of integers at the time (I forget what the bit width was for the time value. Maybe it was a long), it wouldn't become an issue until 2034, or something. I remember the president of our small outfit asked us to think about any unresolved issues we might need to focus on in the near future to address Y2K. I mentioned this, and he just laughed it off...

Tony Hopkinson

the weak implicit casting, or passing pointers in and then casting back. The power behind that can be very useful, but at the point you do it the compiler cannot help you. The way it was described to me: a Pascal compiler is a policeman, and a C compiler is an accomplice. :p Compiled C is powerful because of the minimal set of constraints it imposes, but that means the programmer has to police themselves.
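
A tiny illustration of that point: once a value goes through a void pointer and a cast, the compiler takes the programmer's word for the type and cannot catch the mismatch:

    #include <stdio.h>

    static void print_as_int(void *p)
    {
        printf("%d\n", *(int *)p);   /* the cast is taken on faith */
    }

    int main(void)
    {
        double d = 3.14;
        print_as_int(&d);   /* compiles cleanly, but reinterprets a double
                               as an int: garbage output, undefined behavior */
        return 0;
    }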

nwallette

I don't know if I agree on the type argument -- though I am looking at it from the perspective of someone who has only ever used ANSI C. I find it to be very strongly typed. BUT, there are really only a very small number of actual data types to choose from. There are, however, a lot of "specific" types that are just recycled generic types under the hood. Char is the perfect example. You can have a signed char because a char is just an aliased integer. If it were a unique type, there probably wouldn't be a signed variation. (And if strings were a type, 0x00 would just be another character.) I honestly don't know how C programmers slept at night before u/intXX_t, size_t, etc., were conceived. Not knowing whether your numbers rolled over at 16, 32, or 64 bits would send me into neurotic fits. Also, I'm not sure whether pointer arithmetic is a blessing or a curse. I love that you can iterate over arrays, or through memory, by adding to an address. But one of the reasons I like C is the lack of black magic. Unfortunately: int *i = &someaddress; i++; ... is totally black magic. How many bytes did you just jump? I would really have preferred: int *i = &someaddress; i += sizeof(someaddress); ... which, of course, does NOT do what you expect, if you're new and don't realize that's exactly what "++" is really doing for you.
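
For anyone new to this, a short demonstration of what "++" on a pointer actually does: it advances by one element, i.e. by sizeof(*p) bytes, not by one byte:

    #include <stdio.h>

    int main(void)
    {
        int arr[2] = { 10, 20 };
        int *p = arr;

        p++;                 /* moves forward by sizeof(int) bytes (typically 4) */
        printf("%d\n", *p);  /* prints 20 */
        printf("jumped %zu bytes\n", (size_t)((char *)p - (char *)arr));
        return 0;
    }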

Mark Miller

Re. C doesn't have a string type: this goes back to the original K & R C language. One of the first things I learned about C was that it didn't really care about types. C looked like a high level language that would care about types, would make sure you passed the correct number of parameters to a function, and would make sure you didn't go beyond the bounds of allocated memory, but it really wasn't. ANSI C implements type safety, and counts parameters (still no bounds checking), but this only came in the early '90s. In K & R, types were mere descriptors for bit length and offsets into memory. I remember wondering what the point of using these descriptors was. They really didn't mean anything. A char was 8 bits, a byte. That was all. An int was 16 bits, a word. And a long was 32 bits (a long word). This has all since changed. As the bit-width of processors has increased, the bit-width signified by these descriptors has increased as well.

With K & R, if you defined your own type with typedef, or a struct, again, all that meant to it was a descriptor for the size of the blob of memory you needed, and what, if any, offsets it needed to calculate for access into that blob. All a string is in C is an array where each element is "char length," 8 or 16 bits, depending on how the compiler interprets the descriptor. What adds even more weirdness is that you can have signed and unsigned chars. Why? Because really, char is like all the other types. It's a little blob of memory that stores a number. It is nothing more than that. Since the data in memory is numeric (which is what all data is in your computer), it can be signed or unsigned, telling whether the value is positive or negative. A negative character? No, a negative 8- or 16-bit number!

This is the thing that probably throws most people off about C. It puts a pretty cover over the internals of your computer, which is what you're really manipulating with it. It does some nice things for you, which an assembly programmer would have to do manually, like allocating memory, calculating offsets into memory, dereferencing an address to get to data, and pushing and popping things off a runtime stack. Plus, it provides some constructs for conditionals and loops, allows you to create "char array literals" (text you surround in double quotes, for which the compiler allocates memory and adds a null terminator), and a couple of data structures: structs and unions. Otherwise, you've got the standard library that provides some level of abstraction for I/O, and strings. Other than that, you're up to your neck in the computer's internals, and you're on your own!
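
Two of those points in code: a char is just a small integer, and a double-quoted literal is a char array for which the compiler allocates memory and appends a null terminator:

    #include <stdio.h>

    int main(void)
    {
        signed char c = -1;         /* a "negative character" is perfectly legal */
        printf("%d\n", c);          /* prints -1 */

        char s[] = "hi";            /* compiler allocates 3 bytes: 'h', 'i', '\0' */
        printf("%zu\n", sizeof s);  /* prints 3 */
        return 0;
    }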

Justin James

The lack of bounds checking is part of the problem, and it's related to it being null terminated. The fundamental problem, as pointed out by others, is that C doesn't actually have a true "string" type that functions automatically handle properly (which is part of the null termination decision). It's too easy to slip up and do the wrong thing and just copy content directly. J.Ja

Mark Miller

What scared me to death with C was floating-point calculations. I never felt like I quite got a handle on how they worked. All I ever read about them was that they were "estimates." Oh, that's comforting! Where they really got screwed up was when I'd use atof() (or perhaps strtod()) to convert a floating-point value stored as a string into a bona fide double value. I don't know what it was, but the string-to-double conversion always screwed up even the simplest of floating-point calculations. My guess is there were some garbage bits left over in the mantissa from the conversion. You could take the simplest of cases, like convert "9.00" to 9.0 (or so I thought) and divide that by 3.0, and end up with some screwy decimal like 3.2395725! This was a nasty bug, because it was difficult to detect. You could even say printf("%f\n", convertedValue) just as a sanity check to see if there was something wrong with the converted value, and you'd get an innocuous output, something like 9.0, just like you expected. Nothing wrong, right? It drove me nuts!

We got this nasty bug in production code once that went totally undetected through our own testing, and acceptance testing. When I saw it, I was horrified. We were developing a client-server system. One part of the app downloaded a record that contained a monetary amount, stored as a string. It turned out every time somebody downloaded this type of record, and then sent it back to the server, the monetary value increased by a penny, or something. I forget. Maybe it was more than that. There was absolutely nothing in our code where we were explicitly increasing the amount. It was happening because of the text-to-double conversions we were doing, and possibly some calculations we were doing with the values.

I came up with a fix for it, but I felt like I shouldn't have to do this. I'd multiply the converted value by 100.0 (apparently multiplications of this type didn't cause problems), convert the value to an int, just to make sure I got rid of any "junk bits" left in the mantissa from conversion, then convert the int back to double, and divide by 100.0. I think before this I tried using floor(), and I'd still get screwy results. Doing what I described was the only way I could find to make sure I got a "clean" conversion value. It was nuts!
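
A sketch of that clean-up step, assuming a currency value parsed from text; the function name is made up, and lround() is used here instead of a plain int cast so values like 8.999999 snap to the nearest cent rather than truncating:

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    /* Parse a monetary string and force it onto an exact number of cents. */
    static double to_clean_currency(const char *text)
    {
        double raw   = strtod(text, NULL);
        long   cents = lround(raw * 100.0);   /* snap to whole cents */
        return (double)cents / 100.0;
    }

    int main(void)
    {
        printf("%.2f\n", to_clean_currency("9.00") / 3.0);  /* prints 3.00 */
        return 0;
    }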

oz penguin

J.Ja, your problem seems to be with the lack of bounds checking on the allocated memory object/variable, and not actually with NULL terminated strings. If that is the point you are trying to make, then I agree that it is a problem, but it is not limited to strings, and it makes the title of the article quite misleading. I still would not classify that as the worst problem; I do not see it as being as big an issue as SQL injection (for one example).

seanferd

But those sorts of programmers shouldn't be writing code for, say, the Windows operating system. Or the code should be better vetted rather than patched after release. Better to not have the option of null-terminated strings if the major software houses cannot get it right.

nwallette

I love C. I really do. But I always battle a little internally with whether I should negate any performance benefit I would get by using C, or allow thousands of tiny bugs. Because, face it, any line like this: i++; ... is technically a bug. Instead, the only "safe" way would be: if (i < [max_val_for_this_type]) i++; It's even more fun with arithmetic: if ((b > 0) && (a <= (INT_MAX - b))) { a += b; } And that only catches some of the possible problems for addition. Furthermore, what goes in the "else" clause? Most of the time, I shrug it off and accept the fact that, someday, my code will crash, or an integer will wrap and things will get very unpredictable, because the user opened a file with a size > 2^64 bytes. I still can't stand Java, though.
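
A fuller version of that guarded addition, using INT_MAX and INT_MIN from limits.h and reporting failure instead of overflowing in either direction (the function name is illustrative):

    #include <limits.h>
    #include <stdbool.h>

    /* Store a + b in *out only if it cannot overflow; otherwise report failure. */
    static bool add_checked(int a, int b, int *out)
    {
        if ((b > 0 && a > INT_MAX - b) ||
            (b < 0 && a < INT_MIN - b))
            return false;    /* would overflow: the caller decides what to do */
        *out = a + b;
        return true;
    }

What goes in the caller's "else" branch is still the hard part, which is exactly the complaint above.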

Justin James

You got it... it's the buffer overruns, often caused by not properly handling strings in C... that cause those remote code execution bugs that fill my monthly Patch Tuesday piece. :( What's bothersome about it is that there are a lot of things that developers can do to mitigate the risks, but it takes a lot of remembering to do the right thing to make it happen. J.Ja

AnsuGisalas

Because in that case it certainly is a mistake. That's like designing a commercial airliner to have a self-destruct button sitting in the middle of similar-looking buttons in the cockpit... Wait - scratch that - it's like designing a commercial airliner to have a self-destruct button sitting next to the seat-adjust lever under each seat... A disaster just waiting to happen - again and again and again...

Justin James

... is that the number of competent people is much lower than the number of people in it... it's not specific to programming. The problem is, C raises the knowledge/experience bar high enough that it's very difficult to hire enough good people. J.Ja
