
Atlassian introduces "Smart Mirroring" to accelerate Git transactions in Bitbucket

A new caching system in Bitbucket improves Git performance, cutting clone times for large repositories to a few minutes and easing the development workflow for geographically distributed teams.


The use of Git for version control, along with the ecosystem of frontends and extensions built around it, has made software development with multiple developers substantially easier than before.

However, for large organizations, a development team spread across continents and connecting to a central Git server can see substantially worse performance than developers with local access to the same resources. The problem gets worse as repositories grow larger.

Atlassian has introduced new features in Bitbucket to address both problems. The first, "Smart Mirroring," copies the content of the primary Git server to a geographically closer mirror to reduce clone times for large repositories. Changes are still pushed to the primary server, from which the mirror pulls its updates. Atlassian's own developers have considerable firsthand experience with this problem, as programmers at the company's satellite campuses around the world collaborate with its home office in Sydney, Australia. As with the circumstances that led Linus Torvalds to create Git, necessity was the mother of invention.
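To make the read/write split concrete, here is a toy in-memory model in Python. The classes `Primary`, `Mirror`, and `Repository` are hypothetical and not part of any Atlassian API; the sketch assumes only what is described above: clones are served from a nearby copy, pushes go to the primary, and the mirror refreshes itself from the primary. Here `sync()` is an explicit call, since the article does not say how Bitbucket schedules real synchronization.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """A toy repository: an ordered list of commit IDs."""
    commits: list[str] = field(default_factory=list)


class Primary:
    """The authoritative server in the home region; the only place pushes land."""

    def __init__(self) -> None:
        self.repo = Repository()

    def push(self, commit: str) -> None:
        self.repo.commits.append(commit)

    def fetch(self) -> list[str]:
        return list(self.repo.commits)


class Mirror:
    """A read-only copy close to remote developers, refreshed from the primary."""

    def __init__(self, primary: Primary) -> None:
        self.primary = primary
        self.cache = Repository()

    def sync(self) -> None:
        # One long-haul transfer from the primary refreshes the local copy.
        self.cache.commits = self.primary.fetch()

    def clone(self) -> list[str]:
        # Clones are served from the nearby cache, not the distant primary.
        return list(self.cache.commits)


if __name__ == "__main__":
    primary = Primary()
    primary.push("a1")
    mirror = Mirror(primary)
    mirror.sync()
    print(mirror.clone())  # ['a1']
    primary.push("b2")     # a developer pushes to the primary, not the mirror
    print(mirror.clone())  # ['a1'] (stale until the next sync)
    mirror.sync()
    print(mirror.clone())  # ['a1', 'b2']
```

Because the mirror in this model never accepts writes, there is a single source of truth, and the only consistency question is how stale the mirror may become between syncs.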

Architecturally, this resembles the caching and mirroring approach commonly built with the popular nginx web server, which underpins content delivery networks such as CloudFlare, though Atlassian representatives declined to comment on the exact implementation of Bitbucket's Smart Mirroring. A full nginx deployment would be somewhat excessive for a Git repository, however: its static file handling and reverse proxy features are useful, but the aggressive load balancing used on public-facing production web servers is less necessary for a Git server.

Describing Atlassian's internal use of Smart Mirroring, product manager Roger Barnes said:

"In internal performance tests, we observed a 5GB repository taking over an hour to clone from the primary instance in San Francisco to Sydney. With a local mirror (via Bitbucket Smart Mirroring), that time was reduced to a few minutes, resulting in a more than 25x speedup for the end user. However, performance gains will vary depending on the size of the repositories and capability of the remote link. Customers with limited bandwidth who have resorted to posting portable drives with pre-seeded copies of their repos, for example, could see improvements that are many times greater."
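Barnes' figures can be sanity-checked with simple arithmetic. The Python sketch below assumes "over an hour" means exactly 60 minutes and that the full 5GB crosses the link; under those assumptions, a more than 25x speedup means the mirror clone finished in under about 2.4 minutes, and the original long-haul clone averaged roughly 11 Mbit/s. The function names are illustrative only.

```python
def speedup(primary_minutes: float, mirror_minutes: float) -> float:
    """How many times faster the mirror clone is than the long-haul clone."""
    return primary_minutes / mirror_minutes


def max_mirror_minutes(primary_minutes: float, min_speedup: float) -> float:
    """Longest mirror clone time still consistent with a claimed minimum speedup."""
    return primary_minutes / min_speedup


def avg_mbit_per_s(size_gb: float, minutes: float) -> float:
    """Average transfer rate, taking 1 GB = 10**9 bytes and 1 Mbit = 10**6 bits."""
    bits = size_gb * 1e9 * 8
    return bits / (minutes * 60) / 1e6


if __name__ == "__main__":
    # Assumption: "over an hour" is taken as exactly 60 minutes.
    print(max_mirror_minutes(60, 25))       # 2.4
    print(round(avg_mbit_per_s(5, 60), 1))  # 11.1
```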

In a further effort to improve Bitbucket's speed, Atlassian also added support for Git Large File Storage (LFS), which replaces large binary files such as images, video, or audio with small reference files in the repository. Such assets are often integral to a project, but they fall outside what Git was originally designed to track. For multimedia-heavy projects such as smartphone games, reducing how much static media must be synchronized during development can greatly cut the time spent waiting on Git operations before work can begin.
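Under Git LFS, the file committed to the repository is a short text pointer, while the actual content lives in a separate LFS store and is downloaded for the files being checked out instead of travelling with every version in each clone. The Python sketch below builds and parses a pointer in the layout published in the Git LFS specification: a `version` line, a SHA-256 `oid`, and the `size` in bytes. It only illustrates the pointer format; the helper names are hypothetical, and it neither talks to an LFS server nor replaces the `git lfs` client.

```python
import hashlib

LFS_SPEC = "https://git-lfs.github.com/spec/v1"


def lfs_pointer(content: bytes) -> str:
    """Build the small pointer text that Git tracks in place of a large file."""
    oid = hashlib.sha256(content).hexdigest()
    # `version` comes first; the other keys follow in alphabetical order,
    # and every line ends with a newline.
    return f"version {LFS_SPEC}\noid sha256:{oid}\nsize {len(content)}\n"


def parse_pointer(text: str) -> dict[str, str]:
    """Split pointer text back into its key/value pairs."""
    pairs: dict[str, str] = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        pairs[key] = value
    return pairs


if __name__ == "__main__":
    asset = bytes(5_000_000)  # stand-in for a 5 MB binary asset
    pointer = lfs_pointer(asset)
    # The repository stores only the pointer, a tiny fraction of the asset's size.
    print(len(asset), len(pointer.encode()))
    print(parse_pointer(pointer)["size"])  # 5000000
```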

On how this affects current Bitbucket users, iOS developer Ruben H. Baca of KryptonWare Solutions said: "Collaboration is an increasingly important part of computer science. In the past, it seemed like solo-authored publications were the norm but my (completely informal and unsystematic) impression of the publication process is that multi-authored publications are becoming as common in computer science as they are in other social sciences (like psychology) and the physical and biological sciences where coauthoring has been common for a long time. Bitbucket allows us to do just that."

What about you?

Does your organization use Git for version control? Have you experienced slow performance when cloning a repository from a geographically distant server? Are there other pain points you have experienced with Git? Share your development experiences in the comments.


About

James Sanders is a Java programmer specializing in software as a service and thin client design, and virtualizing legacy programs for modern hardware.
