Operating systems

GNU is not Unix: How the GNU Project has gone astray

The GNU Project is a famous attempt to supplant commercial Unix systems, but it is not Unix.

In 1969, a small group of developers at Bell Labs created a new operating system whose descendants have become one of the most widely used OS families in the world. That operating system was Unix, which ultimately became the technological basis for the Internet. Because of a consent decree in an antitrust case, AT&T was prohibited by law from engaging in the computer business. In the 1970s, AT&T basically gave Unix to academic institutions for free.

The University of California at Berkeley became one of the biggest centers of Unix development, along with Bell Labs itself. Thanks to the heavy research and development occurring at Berkeley, Unix effectively split into two separate operating system families. One was the Berkeley Software Distribution, or BSD Unix -- and the other eventually evolved into what AT&T called System V, or SysV Unix, for short.

Somewhere along the way, the UNIX trademark (all capital letters, somewhat distinct from the term Unix) became the mark of a commercial certification of an OS as conforming to a standard. Only certified OSes may use the UNIX trademark, which means that most BSD Unix systems are not technically UNIX systems -- but the major BSD Unix OSes most certainly are Unix. Aside from official UNIX systems and other Unix systems, however, there are also other Unix-like OSes.

In the 1980s, people started developing clones of Unix. One of the earliest was MINIX, initially developed as an instructional tool, though MINIX 3 is a much more ambitious project. Eventually, the most popular clone appeared, under the name Linux.

Another one of the earliest Unix cloning projects was the GNU Project. Core utilities, compilers, and other accessories for a free Unix clone were created, but Linux-based systems leveraged those tools to produce a complete OS before the GNU Project ever got around to developing its own OS kernel.

The tools that the GNU Project has developed have become important components for a lot of operating systems over the years, however -- most notably Linux-based systems. As a result, they have become very widely used, and some of them have even been ported to Microsoft Windows OSes. GCC, the GNU Compiler Collection, is one of the most widely used compiler packages in the world. Even the major modern BSD Unix systems have used GCC heavily for a number of years, though they have been starting to move away from it recently.

The GNU Project is hailed by a lot of people as the genesis of the entire free Unix-like OS ecosystem, the reason we even have open source OSes like Linux-based systems. Though it is framed as giving credit to the GNU Project, this is actually a very Linux-centric view of the world. The truth is that other free Unix-like systems -- notably MINIX and the BSD Unix systems -- existed to varying degrees at the time, and fully open source Unix-like systems would surely exist now regardless of whether the GNU Project itself had ever existed.

There have been some criticisms of the GNU Project's toolset over the years. The project is regarded by some as very exclusive and unlikely to accept outside contributions, for instance. The source code of some of its applications, such as GNU Screen, is generally regarded as buggy and unmaintainable. The GNU Project's departures from expected behavior for Unix tools are a major sticking point for a lot of Unix users as well.

Common cases of annoying variation from standard Unix tools include many instances of basic command line options being changed, renamed (usually to a longer or more arcane option syntax), or even eliminated, sometimes in favor of something generally less useful. Developers notorious for their dedication to clean, stable, secure software design -- and still better known for their loathing of its opposite -- have some unflattering things to say about GCC; OpenBSD founder Theo de Raadt is among them. While we are at it, let us not forget the absurdity of the GNU su security model, which seems to be "We don't need no stinking security."

Some of the problems people encounter with GNU tools are actually intentional incompatibilities with standard Unix tool behavior. One might easily come to the conclusion that someone in the GNU Project is consciously pursuing something akin to Microsoft's "embrace, extend, extinguish" strategy. The direction of development for specific tools, and in some cases the very existence of the tools themselves, have over the years served as handy evidence for such a claim. For instance, GNU Emacs has expanded to embody so much functionality that the old joke -- that Emacs is a fine operating system, but the speaker prefers Unix -- has gotten less and less joke-like as Emacs has absorbed more and more functionality that people associate with their OSes. Another example is GNU Info, a byzantine, perversely designed, user-hostile help page system with a bad case of featuritis that is meant to replace the venerable Unix manpage.

Doug McIlroy is an engineer, mathematician, and programmer whose contributions to the development of Unix as we know it are so widespread and fundamental to the Unix experience that he can reasonably be regarded as one of the founders of the Unix operating system tradition. He actually invented the Unix pipeline:

ls -l | grep Sep

That vertical bar symbol is called the "pipe" character in the context of the Unix command line. It provides an incredibly elegant, simple way to pass data between command line tools. That is what he invented. In this case, it sends the output of an ls -l command (long listing of directory contents) to the grep Sep command (filters out everything that does not contain "Sep", the indicator for the month September in ls -l output). You could easily connect other commands into the pipeline:

ls -l | grep Sep | sort | less

It was probably Doug McIlroy who first articulated a Unix philosophy, and his explanation is generally accepted as its definition:

Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

The first sentence of that is by far the most often quoted. Tools like GNU Emacs, GNU grep, and GNU Info violate that principle -- that tools should do one thing and do it well -- many times over without breaking a sweat. It is true that the guys at cat-v.org complain about Berkeley's cat utility, rightly pointing out that command line options like -v (which prints visible representations of normally non-printing characters) are not part of cat's actual primary use: concatenating the contents of files. The Berkeley version of cat used on FreeBSD has seven options, at least six of which do not serve the core purpose of cat. The GNU version, however, offers twelve options according to its manpage, a distinct worsening of the feature creep problem as compared with the Berkeley version of the tool. Of course, maybe there are more that are only documented in the Info page for it. That sort of hidden documentation problem is common with GNU tools.
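
To make the contrast concrete, here is a minimal sketch of cat used for its one real job next to the kind of option-stacking the cat-v.org crowd objects to (the file names are purely hypothetical, and the exact option sets differ a little between the BSD and GNU versions):

cat header.txt body.txt footer.txt > combined.txt

cat -n -s -v combined.txt

The first command just concatenates three files into one, which is the whole point of the tool. The second presses cat into service as a half-baked formatter -- numbering lines, squeezing blank lines, and marking non-printing characters -- which is work that separate filters and pagers already do.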

GNU tools abandoned any meaningful sense of the Unix philosophy quite thoroughly, a long time ago. Maybe it really is for the best that someone else, namely Linus Torvalds, deflected the early development of a GNU operating system by providing a kernel outside of the GNU Project's direct control to be used with those tools. The Linux community's desire for a free Unix-like system that imitates much of the SysV branch of the extended Unix family has probably served to retard the growth of the contrary GNU philosophy, which would probably have run egregiously out of control if a complete GNU OS had been released at about the same time.

The GNU Project is aptly named. GNU means "GNU's Not Unix". If what you want is Unix, look elsewhere.

About

Chad Perrin is an IT consultant, developer, and freelance professional writer. He holds both Microsoft and CompTIA certifications and is a graduate of two IT industry trade schools.

49 comments
martosurf

"...freelance professional writer. He holds both Microsoft and CompTIA certifications..." It's all already said.

Jaqui

Since the intent from the project's inception was never to write a Unix clone, but to have a Unix-like system, I can't say it has gone astray. The GNU system is a Unix-like. The Linux project has never gone beyond an OS kernel and is completely dependent on the GNU system. I think the differences between the BSD and GNU systems are caused more by a difference in focus: the BSDs are programmer / systems admin friendly, GNU is more focused on end user friendly. The 2 don't need to be incompatible, but usually are.

j-mart

I followed the link in the article to MINIX 3: http://www.minix3.org/ Is this the real future of OS design, the micro-kernel? Maybe it is time to start again from scratch. The old ones -- Microsoft, Unix, Linux, Mac -- have had their time. In with a different approach. Download this demo http://demo.tudos.org/ for a look at a less bloated direction.

namekuseijin

in an ideal GNU world, Scheme Lisp would be the "scripting language" of choice... :)

fairportfan

...the Linux and Windows fanbois sniping at each other ... but on a higher technical plane.

Justin James

For whatever reason, I just realized that command line piping is a very primitive form of functional programming. J.Ja

cjcoats

From the classic UNIX point-of-view, command line options "-omp" and "-o mp" are completely different. Whitespace is significant. For many UNIX compilers, the former says "Use OpenMP parallelism" and the latter says "output file is 'mp'". The GNU toolchain, in a misdesign worthy of obsolete FORTRAN66 (where whitespace is also not significant -- one of the worst errors in language design, after the significant TAB in "make"), declares that the whitespace is NOT significant, and silently mis-parses a command line which is either perfectly valid or (in the case that OpenMP is not supported) an easily-diagnosed error. IDIOTS !!!!
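
To spell the example out (with a hypothetical compiler driver called cc and a hypothetical source file, since the exact flag spellings vary from vendor to vendor):

cc -omp program.c

cc -o mp program.c

Under classic whitespace-significant parsing, the first asks for OpenMP-style parallelism and the second says "compile program.c and name the output file mp". A parser that throws the whitespace away can silently turn one of those into the other.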

bmnfan

I do not believe the majority of GNU developers provide the plethora of --options to disturb compatibility. That's what they do, but I guess the intention is usability and featuritis. From a longtime Linux user standpoint however, the BSD systems should give in and adopt the major --gnu options. For the benefit of us users. An entirely different thing is bash. Obviously it doesn't claim to be the original bourne shell. But the intention here is clearly to lock people into one implementation. In recent years bash became the main memory sink under Linux. It degraded from a shell to a programming language. I see the need for arrays and regular expressions. However all these things could be accomplished by sed, awk and some pipes. Anything more complex should be relegated to Perl. It's the domain of Perl and Python to provide system glue code. But the FSF and the GNU guys can't stand to be dependent on too freely licensed tools. Hence bash tries to be the GNU project's JVM. (It's the backend of the abysmal automake "makefiles", which really would have caused less cancer if they just f**king used Perl.) The GNU project is well aware that they cause lock-in with all the bash-reliant shell scripts in most Linux distributions. As they do with not-quite-standard syntax features in GCC. And given the FSF's general attitude about deserved attribution I believe this all isn't accidental. Like Microsoft, if charm alone doesn't do it, they strive for technical compatibility hurdles.
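
As a rough sketch of what I mean (the log file name is hypothetical), compare the bash-only way of pulling matching lines with the plain-pipes way:

while read -r line; do [[ $line =~ ^ERROR ]] && echo "$line"; done < app.log

grep '^ERROR' app.log

The first depends on the bash-specific [[ =~ ]] regular expression operator; the second is an ordinary filter that any Bourne-style shell can drive and that slots straight into a pipeline.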

apotheon

What is it that you think that means? Wait -- before you answer that, maybe you should take a quick look at these articles:

* Radiohead knows more than Microsoft about security
* Microsoft finally catches the eight year bug
* Microsoft may be Firefox's worst vulnerability
* Unix vs. Microsoft Windows: How system designs reflect security philosophy
* Promote openness: Custom applications and standardized formats

I wouldn't want you to miss the implications of the fact that the same person wrote all those articles and end up making a complete fool of yourself by making accusations of being a Microsoft shill. Yeah, that's right -- I can smell the stink of that bad idea in the air. Just in case you're somehow getting ready to say something denigrating about those certification sources: yeah, they suck about as much as all certifications as measures of skill. It takes someone who has gotten a few certs to know that for sure, though. Okay, so . . . now it's your turn. Tell me what you're trying to imply.

apotheon

I don't consider "sorta similar, but really totally different in terms of the fundamental principles on which it is built" to be very much "like" what it isn't actually emulating very closely. The GNU Project borrowed the names of some core utilities, gave its own core utilities some functionality in common with them, and otherwise basically did nothing else the same way as the Unix philosophy would suggest it should be done. > GNU is more focused on end user friendly. I don't see that at all. I think it's every bit as user-hostile. It's more ideologue-friendly, though.

Jaqui

The history and current state of HURD, the GNU microkernel-based OS: the length of time it has taken to even get a barely bootable system with HURD says it isn't an easy direction to go.

apotheon

In an "ideal" GNU world, elisp and Guile would be the scripting languages of choice. Neither is satisfactorily RNRS compliant. See a brief discussion of GNU Guile implementation compromises, for instance.

apotheon

Have you ever seen any salvos in the vi vs. emacs war?

Sterling chip Camden

Only expressed in reverse order: ls | grep | sort | uniq is functionally: (uniq (sort (grep (ls)))) Each takes an input and returns an output.

apotheon

Note that "primitive" refers to the fundamental tokens in a programming language, in many contexts. The truth of the matter is that the Unix environment is in many respects a programming language with greater utility than almost any other.

aureolin

>> BSD systems should give in and adopt the major --gnu options. For the benefit of us users.

jg

My memory may be wrong, but I think make pre-dates perl, so it would have been hard for the designers of make to use perl. Moreover, doesn't make use the SHELL env vbl? So, if you want it (make) to use PERL to process the scripts embedded in make files, go right ahead. That is, make does dependency analysis (one thing) not script processing (another thing).

CharlieSpencer

"Write programs to handle text streams, because that is a universal interface" may be outdated too.

apotheon

Yes. Without that philosophy, Unix would have cmd.exe and command.com instead of actually useful shells. We also wouldn't have the pipe without that philosophy. Either everything would have to be implemented over and over again, as it is in the majority of third party GUI programs on MS Windows, or everything would have to be tied together so closely in a tangle of tightly coupled spaghetti code that a problem with one thing screws up the entire system, like the majority of first-party Microsoft GUI programs. Both approaches impose tremendous development costs and end-user reliability issues. I don't adopt a development philosophy without thinking about its benefits and detriments.

CharlieSpencer

Done feeding the troll, Chad? Can I take it off now? :D

apotheon

MINIX 3 seems to be making comparatively rapid progress, by contrast. HURD has been in active development for about twenty years (since 1990), and Stallman announced this year that he thinks the whole damned thing is a turkey that will never really be very good; by contrast, MINIX 3 went from announcement to first really usable release in four years, complete with interesting advances in the state of the art for Unix-like systems. It gets worse if you consider that the GNU Project as a whole started in 1984, which means it is taking about 2.5 decades and counting to have a complete GNU system that anyone would even want to use for reasons other than GNU boosterism. The big feature for HURD was the fact that it was a modular microkernel system (in development, in theory, et cetera), but it has been blown away on that front already by something that has been in development for about a quarter of the time, already works better, and represents a big advance in the state of the art over HURD (to say nothing of the fact it's better-licensed and doesn't come with a crapton of sanctimonious ideological GNU baggage). The problem isn't that it's a hard problem. Sure, some of it might be hard problem stuff; but solving hard problems in innovative ways is what geeks love to do. The fact someone else is achieving much more in a much shorter time, aiming for far more ambitious goals, without the GNU Project behind it, suggests that the difficulty holding things up isn't the actual problem they're trying to solve, but rather the people trying to solve it.

SciHacker

Yup ... Emacs is its own OS, and I use vi which is fast, arcane and moderately powerful ... of course I still yearn for an editor whose sandals these could not touch ... TECO!

apotheon

I'm pretty sure the vi/emacs war predates the PC/Apple war.

fairportfan

I had enough of religious wars in the original Apple II/PC/Mac brouhaha

apotheon

formatted with indentation:

(uniq
  (sort
    (grep
      (ls))))

postfix notation:

((((ls) grep) sort) uniq)

I'm really not sure how else that would be formatted with Lisp-style parenthetical code, using postfix notation. Functional using object oriented dot notation:

ls.grep.sort.uniq

M-expression style:

uniq sort grep ls

I'm just babbling at this point, really. I do have a question, though: What's the point of a grep with no parameters?

apotheon

Are you saying that the text stream is no longer a universally accessible interface? Is there something that has supplanted it?

Justin James

It may not be perfect, and it locks your tool choice down tightly, but PowerShell is onto something by combining the assumption of piping with an object model. J.Ja

seanferd

In that case, let us dispense with the stored program paradigm. I'm sure something equally interesting could be developed from the ground up.

Allen Halsey

I, for one, am glad GNU strives to improve CLI usage with innovations like long options and dragging ancient programs like tar up to obey today's modern option conventions. I'm trying to teach my son the CLI. I'm sure glad I don't have to explain to him that options begin with a hyphen except for some programs (tar and ps). If it weren't for GNU adding some modernization and consistency to CLI tools, I'd reluctantly teach him Microsoft PowerShell instead. What do you want the UNIX command line interface to be like in 20 years?
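
For instance -- assuming GNU tar, where both spellings are documented, and a made-up archive name -- the same extraction can be written either way:

tar -xzf archive.tar.gz

tar --extract --gzip --file archive.tar.gz

The long form is far easier to read aloud and explain to a beginner, even if the short form is what you end up typing once you know the tool.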

Jaqui

The base design decisions for HURD, like moving the system scheduler for clearing RAM into user space rather than keeping it part of the kernel -- some choices have made HURD a failure from the start.

apotheon

Darwin, the base OS beneath all the Apple frippery, uses a Mach kernel. It just doesn't really take advantage of a microkernel architecture in any way, as far as I'm aware. I really don't know why.

seanferd

of Mach, Hurd's predecessor. I'm not overly familiar with Nextstep or XNU/Darwin, but I have wondered why Apple dropped the microkernel aspect when designing OS X. Is it because it was just easier to use the work already done by the BSDs (which already had bits of Mach incorporated)? Not that I expect anyone to know, I'm just wondering "out loud", as it were.

Sterling chip Camden

... the ability to shoot yourself in the foot. One slip of the fingers, and you blow off the lower half of your body.

fairportfan

But my awareness of such things doesn't. OTOH, the Apple/PC war taught me to avoid all such conflicts... (Just did a quick Google to superficially renew my memory of vi and emacs. Yes, i feel i did well staying out of *that* war.)

Sterling chip Camden

@[apotheon,seanferd]: thanks, guys
@neon: that would be postfix. German often seems to like that, too.

Neon Samurai

"god heaven earth built" -- I think that's it, anyhow.

seanferd

Probably easier for me to understand that way around, as well.

apotheon

Leave it to Sterling to come up with an interesting demonstration by way of scriptural differences in wildly divergent natural languages. I'm impressed.

Sterling chip Camden

Just as a + b can be written a.add(b) in many languages. It agrees with the English language's preference for subject-verb-object ordering, whereas Lisp and other prefix languages correspond to the verb-first human languages. For example, Genesis 1:1 in Hebrew:

...barah elohim hashamaim ve haaretz
"built God(s) the heavens and the earth"

Hebrew prefers (built god (list heavens earth))
English likes god.created [heavens,earth]

apotheon

Yeah . . . I noticed a long time ago that the .message notation is basically just postfix notation. It turns into infix notation when there are additional arguments, though, because of the parenthetical argument list that follows the method name.

Sterling chip Camden

The OO dot notation is nothing more nor less than a special way of passing a parameter to a generic function -- and is often implemented that way under the hood. object.method(arg) passes object as the first parameter to a generic function "method" that dispatches to the correct implementation for the argument signature, just like generic functions in CLOS (for languages that do that).

Sterling chip Camden

... finds the sound of one hand clapping? Actually, it produces an error -- on FreeBSD, at least.

Justin James

"An example, if necessary, would be the difference between "dir" providing an object of type "DirectoryListing", which consists of "DirectoryListingLine" objects -- vs. -- "dir" provides each line as an array of fields with types "datetime", "int", "string"" This *sounds* about right from what I've seen. I've never programmed directly against PS, but I've used it a bit with my sys admin hat on. You can use stuff like an output formatter to refer to specific fields of output, for example. Like if you did something like (pseudo-code): dir | format -show:directory,filesize,timestamp | more J.Ja

nwallette

Being ignorant of how it works internally, I'm wary of the whole object model thing. Are the objects of specific types, such that a command[let] needs to understand every type of object that it is likely to ever provide or receive? Or are they more like database tables, where the data itself is opaque, but each field's (for lack of a better term) datatype is defined? An example, if necessary, would be the difference between "dir" providing an object of type "DirectoryListing", which consists of "DirectoryListingLine" objects -- vs. -- "dir" provides each line as an array of fields with types "datetime", "int", "string".

apotheon

You're thinking of .NET tools written in other languages as if they're routines written in the shell language. Objects passed between utilities as data should be used when appropriate, and not when they aren't. They aren't when what you're trying to tie together is a bunch of arbitrary tools. More to the point, objects aren't appropriate as a data format when you have no idea how the data might be used in the future. This might not seem like a realistic condition to someone who's immersed in the .NET environment, but that's not because it's unrealistic. It's because the .NET environment with PowerShell as its glue code mechanism provides sort of a self-fulfilling prophecy along those lines. When your data format is particularly suited only to specific tools, you only use those tools, and may not ever notice that there might be other ways of doing things if you aren't currently familiar with environments where those other ways of doing things are practical. The whole thing is a towering, tightly integrated vendor lock-in stack.

> You have to write a parser and attempt to define a common format, even for textual data in many cases.

The parsers have already been written. If you're writing them yourself, either:

1. you are doing it wrong
2. you are trying to do something so new it requires a new parser

If you define your environment by restricted data formats, you start to think within those limits, though. You might never notice the need for something other than what's easy within the .NET environment if you think largely in terms of the capabilities of that environment. I think it's for this reason that every once in a while Microsoft provides something it hasn't before that is basically a poor man's implementation of what has been available in other environments for decades, and all those developers working in the MS environment think Microsoft invented the idea.

> Or you can define a mess of switches to allow the caller to define the output or input format.

A "mess of switches" is so the tool can be used directly, which is (believe it or not) often very useful. If you don't want the user to be able to actually use the tool, you don't need a "mess of switches" in any shell.

> I agree that piping is a wonderful thing, but the PowerShell approach makes life a lot easier (in a "sell your soul to the devil" kind of way) in certain regards.

The PowerShell approach makes some things (marginally) easier at the expense of making other things damned near impossible. It's not a trade-off I'm really enthusiastic to make.

edit: In fact, PowerShell is so relatively hostile to a simple means of interacting with the system that I find a Scheme REPL (namely Ypsilon) preferable to PowerShell as a shell environment. If I wanted to interact with .NET tools more, of course, I'd surely prefer PowerShell for those purposes -- but as a general purpose shell, I find it woefully misdesigned.
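
To illustrate the point that the parsers have already been written, here is the sort of thing I mean -- a generic text tool carving fields out of the output of an arbitrary command (the column numbers depend on your ps flavor, so treat this as a sketch rather than gospel):

ps aux | awk '{print $2, $11}'

Nothing about awk knows or cares what ps is; the text stream is the entire contract between them.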

Justin James

... is the "arbitrary" nature of the data in piping. You have to write a parser and attempt to define a common format, even for textual data in many cases. Or you can define a mess of switches to allow the caller to define the output or input format. With PowerShell you don't have that, reflection handles it all. I agree that piping is a wonderful thing, but the PowerShell approach makes life a lot easier (in a "sell your soul to the devil" kind of way) in certain regards. J.Ja

apotheon

The PowerShell pipeline is a typical Microsoftism: it is far from a universal interface. In fact, objects can be passed between programs via pipeline in Unix shells, too. While Microsoft might pretend this is a feature particular to PowerShell, the truth of the matter is that the Unix pipeline can handle arbitrary data -- not just text streams. McIlroy doesn't say we should write programs to accept and emit text streams because the Unix pipeline can't handle anything else; he does so because text streams offer a universal interface. Meanwhile, the PowerShell pipeline interface is particular to the .NET framework, and as such is much more limited and much less portable. I, for one, am not a fan of vendor lock-in.
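
As a minimal sketch of what "arbitrary data" means in practice (the directory and host names are hypothetical), here is raw binary flowing through a pipeline with no text assumptions at all:

tar cf - projectdir | gzip | ssh backuphost 'cat > projectdir.tar.gz'

The archive stream is never interpreted as text by anything along the way; each stage just reads bytes on standard input and writes bytes on standard output.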

apotheon

> I, for one, am glad GNU strives to improve CLI usage with innovations like long options and dragging ancient programs like tar up to obey today's modern option conventions.

There's nothing wrong with support for long options. There's something wrong with eliminating support for short options.

There's nothing wrong with support for long options. There's something wrong with changing traditional option names just to provide new options with GNU Emacs compatibility in a core utility.

There's nothing wrong with support for long options. There's something wrong with using one syntax for long options in one core utility, and another syntax for long options in another core utility.

There's nothing wrong with support for long options. There's something wrong with piling on more and more options that have nothing to do with the actual purpose of a given tool, duplicating functionality of filter utilities that already exist and do that job just fine, just because long options provide fewer command option namespace conflicts.

. . . et cetera.

> I'm trying to teach my son the CLI. I'm sure glad I don't have to explain to him that options begin with a hyphen except for some programs (tar and ps).

When did I say that standardizing option syntax is a bad thing? My problem is with stuff like how GNU eliminates standardization of option formats. After all, the GNU Project is not the only place where dashes before options were added to tools for standardization purposes, but it is the only place where people can't make up their damned minds whether core utilities should use one dash or two for long option names.

> If it weren't for GNU adding some modernization and consistency to CLI tools, I'd reluctantly teach him Microsoft PowerShell instead.

You act like the GNU Project invented these things, rather than just inconsistently applying them after getting the idea from elsewhere.

> What do you want the UNIX command line interface to be like in 20 years?

Consistent, and without unnecessary bloat.

edit: Frankly, you kinda sound like one of these guys that think Microsoft invented everything we use on home computers -- except your focus is on GNU instead of Microsoft. (edit: typo)