Why developers are abuzz about distributed version control systems

Learn what sets distributed version control systems apart from traditional version control systems, and why more and more developers prefer distributed version control.

Most developers see version control as a fact of life. You need it so you don't make a mess, but it's a bit of a hassle, and it rarely makes your life wonderful on its own.

When I hear developers talk about version control, it is typically to complain, especially about some of the older systems that are (fortunately) discontinued, unsupported, and quickly going away. So when developers I know and respect started talking about how much they like their distributed version control systems (DVCSs), I paid attention. Based on my research into DVCSs, here's a look at what makes these systems different from traditional version control systems, and why many developers prefer them.

What sets DVCS apart

The primary difference between DVCS and traditional version control is where the repository lives.

In traditional version control systems, there is a centralized repository (perhaps mirrored to a few nodes for performance and redundancy) that contains the master copy of every file. Developers have a non-authoritative copy on their local drives, and when a developer wants to make that copy the master copy, they have to check it back in. Users can lock files in the repository to signal to others that the file is being changed. When a file is checked in and it was changed separately by different users, any differences between what is on the server and what is being checked in need to be resolved. If you want to experiment or create a different version, you branch the code tree, which makes an entirely separate copy. Eventually, you can merge the two trees back together, and you have to resolve the differences between them at that time.
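To make the centralized model concrete, here is a minimal sketch of the lock and check-in cycle described above. It is a toy, not how any particular product works: the class `CentralRepo`, its methods, and the `based_on` parameter are all hypothetical names invented for illustration.

```python
class CentralRepo:
    """Toy model of a centralized repository (hypothetical, for illustration)."""

    def __init__(self):
        self.files = {}  # path -> (version number, content): the master copies
        self.locks = {}  # path -> name of the user holding the lock

    def lock(self, user, path):
        # A lock signals to everyone else that this file is being changed.
        holder = self.locks.get(path)
        if holder not in (None, user):
            raise PermissionError(f"{path} is locked by {holder}")
        self.locks[path] = user

    def checkin(self, user, path, content, based_on):
        # Refuse the check-in if someone else holds the lock.
        holder = self.locks.get(path)
        if holder not in (None, user):
            raise PermissionError(f"{path} is locked by {holder}")
        # Refuse it too if the server copy changed since this user's checkout;
        # the differences must be resolved before checking in.
        current_version = self.files.get(path, (0, ""))[0]
        if based_on != current_version:
            raise ValueError(f"{path} changed on the server; resolve differences first")
        # The new content replaces the master copy under a new version number.
        self.files[path] = (current_version + 1, content)
        self.locks.pop(path, None)
        return current_version + 1
```

Note that the server only knows that the version number moved on. It has no record of what each developer actually changed, which is the gap that change sets fill.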

A DVCS turns this around. Instead of a centralized repository on a central server, each developer maintains their own repository (this is what makes it "distributed"). It is very similar to a peer-to-peer network. This creates much more redundancy than in a traditional version control system, but in and of itself, it is nothing special. The big difference is the approach to check-ins. A DVCS creates a change set on each check-in and records which change set it was based on, whereas many traditional systems just store the new file and give it a new version number. If two developers make changes based on the same initial parent in the tree, this creates two parallel branches automatically. Merging can occur more easily because the system knows what is actually different between the two branches, as opposed to just seeing a different version number and forcing a comparison. The Mercurial wiki has an excellent tutorial on the topic that explains it in depth.
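The idea that two check-ins against the same parent automatically form two parallel branches can be sketched as a small graph of change sets. This is only an illustration under simplifying assumptions: the `History` class and its method names are hypothetical, and real tools such as Git and Mercurial store far more than parent links.

```python
class History:
    """Toy change set graph: each change set records its parent change sets."""

    def __init__(self):
        self.parents = {}  # change set id -> tuple of parent ids

    def commit(self, cs_id, *parents):
        for parent in parents:
            if parent not in self.parents:
                raise KeyError(f"unknown parent {parent}")
        self.parents[cs_id] = parents

    def heads(self):
        # A head is a change set that no other change set builds on.
        # More than one head means there are parallel branches.
        has_child = {p for ps in self.parents.values() for p in ps}
        return sorted(cs for cs in self.parents if cs not in has_child)

    def ancestors(self, cs_id):
        # Every change set reachable through parent links, including cs_id.
        seen, stack = set(), [cs_id]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents[current])
        return seen

    def merge_base(self, a, b):
        # Shared ancestors that are not themselves behind another shared
        # ancestor: the point(s) where the two branches diverged.
        common = self.ancestors(a) & self.ancestors(b)
        return sorted(
            c for c in common
            if not any(c != d and c in self.ancestors(d) for d in common)
        )


history = History()
history.commit("root")
history.commit("base", "root")
history.commit("alice-1", "base")  # Alice and Bob both start from "base",
history.commit("bob-1", "base")    # so the history now has two heads.
history.commit("merged", "alice-1", "bob-1")  # a merge has two parents
```

Because the graph records where the branches diverged, a merge tool can compare each side against that shared starting point rather than only against each other.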

Where DVCS shines

There are two big advantages cited for DVCS: its distributed nature and its workflow. The distributed aspect is great for geographically dispersed teams, loosely organized teams, highly mobile teams, and other scenarios where not everyone is always connected to a central repository. But, as Joel Spolsky said, it's the branching/merging that is much more interesting, and from what I've read, I agree.

Because of the nature of change sets, it is much easier to merge someone else's changes with your own: instead of reconciling your changes against a central branch and then their changes against that same central branch, you can just pull in the changes they made. This makes it much easier to actually experiment, create various proof-of-concept versions, cut production or QA versions, and so on. And when these tasks become easier, they happen more often.
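One way to picture why knowing the shared starting point helps is a three-way merge. The sketch below works at whole-file granularity, which is a deliberate simplification (real tools merge line by line within a file), and the function name `three_way_merge` and its dictionary inputs are assumptions made for illustration.

```python
def three_way_merge(base, ours, theirs):
    """Merge two snapshots that both descend from `base`.

    Each snapshot is a dict of path -> content. A missing path means the
    file does not exist in that snapshot. Returns (merged, conflicts).
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o            # both sides agree (or neither changed it)
        elif o == b:
            result = t            # only they changed it: take their change
        elif t == b:
            result = o            # only we changed it: keep our change
        else:
            conflicts.append(path)  # both changed it differently
            continue
        if result is not None:    # None means the file was deleted
            merged[path] = result
    return merged, conflicts


base = {"a.txt": "1", "b.txt": "1", "c.txt": "1"}
ours = {"a.txt": "2", "b.txt": "1", "c.txt": "ours"}
theirs = {"a.txt": "1", "b.txt": "1", "c.txt": "theirs", "d.txt": "new"}
merged, conflicts = three_way_merge(base, ours, theirs)
```

In this example only `c.txt` needs a human decision, because both sides changed it differently. Every other file is settled automatically by comparing each side with the shared base.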

As someone who has been using Team Foundation Server (TFS) for a couple of years, I hate its merging story. It's so bad that I've learned not to branch, and we had a lot of growing pains when our developers started working together on the same project. I can definitely see how easier merging is a huge advantage for large teams with a lot of moving parts, because each sub-team can work on its own part of the system and easily merge back into a project repository when its changes are ready to be seen by others.

Plans to experiment with DVCS

I'm frustrated with TFS. Is it a bad tool? Not at all. TFS has a lot of built-in functionality, such as being able to tie work items to check-ins, decent reporting capabilities, and build management. But it is really miserable to use for version control. At least once a week someone on our team has a TFS question, and the cause is not user error; it is the sheer complexity of TFS and its unintuitive revision model.

I am ready for a change. I will probably experiment with either Git or Mercurial on personal projects in the near future, and if that pans out, I will look into bringing it back into work. Based on the results and comments in my recent TechRepublic poll about version control, it seems that I am not the only one moving in this direction.


Disclosure of Justin's industry affiliations: Justin James has a contract with Spiceworks to write product buying guides; he has a contract with OpenAmplify, which is owned by Hapax, to write a series of blogs, tutorials, and articles; and he has a contract with OutSystems to write articles, sample code, etc.