What does upstream and downstream development even mean?

Have you heard the terms upstream and downstream applied to source code, and wondered what they meant? Here's a simplified explanation.


If you've ever dealt with open source software in any shape or form, chances are pretty good you've heard the terms upstream and downstream. These terms matter more to open source development than you might think.

But what do they even mean? 

I'm going to explain it to you.


The flow of data

The terms upstream and downstream refer to the flow of data (aka code) between related projects. Code can flow in one of two directions, upstream or downstream, and the destination of the code determines which one it is. These directions are crucial to the development of open source projects.

Let's examine the destinations, by way of forking an application.

Let's say we have Application A. This is the original iteration of the software. Eventually, Application B is forked from Application A. So at this point we have:

A -> B

Next, Application C is also forked from Application A, and Application D is forked from Application B. Now we have:

A -> B -> D
|
C

Judging by the arrows, you might be able to see a "stream" taking shape. From A to B to D and from A to C. How does it work? Simple.

Say the developer of Application A commits a change to their software. That change is then picked up by the developer of Application B. That flow of data is downstream, as it is flowing away from the origin. However, if the developer of Application C commits a change that the developer of Application A wants to incorporate, that flow of code is upstream, as it is going toward the original source.

So to put it simply:

If the flow of data goes toward the original source, that flow is upstream. If the flow of data goes away from the original source, that flow is downstream.
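
To make that rule concrete, here is a minimal Python sketch of the A, B, C, D example. The FORKED_FROM map and the function names are my own illustrative assumptions, not part of any real tool. A flow counts as upstream when the destination is an ancestor of the source, and downstream when it is a descendant.

```python
# Hypothetical fork map for the example: each fork -> the application it was forked from.
FORKED_FROM = {"B": "A", "C": "A", "D": "B"}


def ancestors(app: str) -> list[str]:
    """Return the applications that `app` descends from, nearest first."""
    chain = []
    while app in FORKED_FROM:
        app = FORKED_FROM[app]
        chain.append(app)
    return chain


def flow_direction(source: str, destination: str) -> str:
    """Classify a flow of code from `source` to `destination`."""
    if destination in ancestors(source):
        return "upstream"    # moving toward the original source
    if source in ancestors(destination):
        return "downstream"  # moving away from the original source
    return "neither"         # e.g. siblings such as B and C


print(flow_direction("A", "B"))  # downstream
print(flow_direction("C", "A"))  # upstream
print(flow_direction("D", "B"))  # upstream
```

Note that a flow between B and C is neither upstream nor downstream in this model, because neither application was forked from the other.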

Which is the better flow?

Whether upstream or downstream is the better direction depends on the commit. Say, for example, the developer of Application B makes a change to the application that adds a new feature unique to B. If this feature has no bearing on Application A, but does have a use in Application D, the only logical flow is downstream.

If, on the other hand, the developer of Application D submits a change that would affect all other applications, then the flow should be upstream to the source (otherwise, the change wouldn't make it to applications B or C).

But what if the developer of Application D codes a new feature that would benefit B but not A? That flow of data would still be upstream, because D was forked from B (even though the commit wouldn't make it to A).
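
One way to reason about these cases is to ask which applications can end up with a change, depending on where it lands. The sketch below is only an illustration: the FORKED_FROM map and the reach function are hypothetical names, and it assumes every fork eventually pulls changes from the application it was forked from.

```python
# Hypothetical fork map for the example: each fork -> the application it was forked from.
FORKED_FROM = {"B": "A", "C": "A", "D": "B"}


def reach(landing_app: str) -> set[str]:
    """Applications that can receive a change once it lands in `landing_app`.

    Assumes each fork eventually pulls from its parent.
    """
    reached = {landing_app}
    changed = True
    while changed:
        changed = False
        for child, parent in FORKED_FROM.items():
            if parent in reached and child not in reached:
                reached.add(child)
                changed = True
    return reached


print(sorted(reach("A")))  # ['A', 'B', 'C', 'D'] - D's change accepted at the source
print(sorted(reach("B")))  # ['B', 'D'] - D's change accepted only by B
print(sorted(reach("D")))  # ['D'] - D's change never leaves D
```

This matches the cases above: a change meant for everyone has to travel up to A, while a change that only benefits B and D needs to go no further than B.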

The benefit of upstream

An upstream flow of data has one major benefit (besides all forks gaining access to the commit). Let's say you're the developer of Application B and you've made a change to the core of the software. If you only send that change downstream, you and the developer of D will benefit. However, when the developer of Application A makes a different change to the core of the software, and that change is sent downstream, it could overwrite (or conflict with) your change in Application B.

If, instead, the developer of Application B sent the core change upstream, this wouldn't be the case, because the change would remain in place unless the developer of Application A overwrites or removes it. Because of this, it is beneficial to send such changes upstream. Otherwise, you'll wind up having to make that change again after the upstream code from Application A (which doesn't contain your change) is applied to Application B.
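
Here is a deliberately crude Python sketch of that difference. It assumes that syncing simply replaces the fork's core with a copy of the upstream core; real version control would more likely report a conflict for you to resolve than silently overwrite anything. The sync_from_upstream function and the "scheduler" and "logging" keys are made up for illustration.

```python
def sync_from_upstream(upstream_core: dict[str, str]) -> dict[str, str]:
    """Toy sync: the fork's core becomes a copy of the upstream core."""
    return dict(upstream_core)


# Scenario 1: B keeps its core change local.
a_core = {"scheduler": "original"}
b_core = sync_from_upstream(a_core)
b_core["scheduler"] = "B's change"            # never sent upstream
a_core["scheduler"] = "A's different change"  # A changes the same part of the core
b_core = sync_from_upstream(a_core)
local_only_result = b_core["scheduler"]

# Scenario 2: B sends the change upstream and A accepts it.
a_core = {"scheduler": "original", "logging": "original"}
a_core["scheduler"] = "B's change"            # accepted into the original source
a_core["logging"] = "A's later change"        # unrelated later work by A
b_core = sync_from_upstream(a_core)
upstreamed_result = b_core["scheduler"]

print(local_only_result)  # A's different change
print(upstreamed_result)  # B's change
```

In the first scenario B has to redo its work after the sync. In the second, the change comes back to B as part of the upstream code.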

Linux distributions

The terms upstream and downstream also apply to Linux distributions, with a slight twist. Let's examine Ubuntu Linux. Upstream refers to the software that a distribution packages and ships to users (such as GNOME, Firefox, and the Linux kernel). So if the Ubuntu developers make a change to GNOME, they would then send that change upstream (to the GNOME developers). If the GNOME developers make a change that would affect Ubuntu, they would send that change downstream.

Because Ubuntu is a derivative distribution, Ubuntu itself has an upstream: Debian. Debian serves as the upstream for much of Ubuntu. However, for the likes of GNOME and the kernel, Ubuntu packages directly from the upstream project and bypasses Debian. So Ubuntu has multiple upstreams.

The downstream of Ubuntu would be all derived distributions (such as Elementary, Mint, and Kubuntu).
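
Because a distribution can have several upstreams, the picture is a graph rather than a single chain. The following Python sketch records only the relationships named above. The UPSTREAMS table and both function names are illustrative assumptions, not an official list of who packages what.

```python
# Upstream relationships as described in the text; a project may have several.
UPSTREAMS = {
    "Ubuntu": ["Debian", "GNOME", "Linux kernel"],
    "Elementary": ["Ubuntu"],
    "Mint": ["Ubuntu"],
    "Kubuntu": ["Ubuntu"],
}


def all_upstreams(project: str) -> set[str]:
    """Collect the direct and indirect upstreams of `project`."""
    found: set[str] = set()
    stack = list(UPSTREAMS.get(project, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(UPSTREAMS.get(current, []))
    return found


def direct_downstreams(project: str) -> list[str]:
    """List the projects that name `project` as one of their upstreams."""
    return sorted(name for name, ups in UPSTREAMS.items() if project in ups)


print(sorted(all_upstreams("Mint")))
# ['Debian', 'GNOME', 'Linux kernel', 'Ubuntu']
print(direct_downstreams("Ubuntu"))
# ['Elementary', 'Kubuntu', 'Mint']
```

So a fix sent upstream to GNOME can, in this model, eventually reach Ubuntu and every distribution derived from it.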

Streams of confusion

It's not exactly simple, but it's also not rocket science. Just remember that the direction in which data flows is crucial to the development of open source projects. And now, when you hear a reference to the upstream kernel, you know that kernel comes from the original source, whereas a downstream kernel has come from a source beyond the origin.

Or, with regard to the kernel, upstream is Linux (and kernel.org) and downstream is everything else.
