CoreOS eliminates downtime from server OS patching

Rackspace and Amazon have come under fire for disruptions caused by patching a Xen bug. Learn how CoreOS lets you do server OS patches without downtime.

The Shellshock vulnerability and the more recent Xen hypervisor vulnerability remind us of an important fact of enterprise IT: server OS patching is hard. Patching is one of the least glamorous yet most visible aspects of managing an IT infrastructure. One often-overlooked promise of abstracted infrastructure is the flexibility to patch the underlying layers with no impact on the application layer.

It starts with the network

Most organizations have network patching down to a science. Data center fabrics are redundant by design, so network operations can normally deploy rolling updates without any downtime from rebooting network devices.

The lesson from network patching is to use redundancy to avoid downtime. Amazon and Rackspace have come under fire for disruptions caused by patching their Xen hypervisor-based infrastructures; the flavor of Xen used by Amazon and Rackspace doesn't leverage live migration. In contrast, VMware and Microsoft leverage live migration to allow for rolling updates that don't require application downtime. This approach solves the challenge of hypervisor-level patching, but applying the same concept to the server OS is harder.
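As a rough illustration of the rolling-update idea, here is a minimal Python sketch. It is a toy model, not any vendor's tooling: the `Node` class, the `rolling_update` function, and the `min_in_service` parameter are all hypothetical names made up for this example, and "patching" is reduced to changing a version string.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    version: str
    in_service: bool = True


def rolling_update(nodes, new_version, min_in_service):
    """Patch nodes one at a time, never dropping below min_in_service."""
    log = []
    for node in nodes:
        serving = sum(1 for n in nodes if n.in_service)
        if serving - 1 < min_in_service:
            raise RuntimeError("not enough redundancy to take a node out of service")
        node.in_service = False      # drain: the redundant peers carry the load
        log.append(f"drained {node.name}")
        node.version = new_version   # stand-in for the real patch and reboot
        node.in_service = True       # rejoin before touching the next node
        log.append(f"patched {node.name}")
    return log


# Hypothetical two-switch fabric: one switch always stays in service.
fabric = [Node("switch-a", "1.0"), Node("switch-b", "1.0")]
update_log = rolling_update(fabric, "1.1", min_in_service=1)
```

With a single node and `min_in_service=1`, the same function refuses to proceed, which is the situation a host without live migration is in: there is nowhere for the workload to go while the patch is applied.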

Leverage OS clustering when patching servers

The challenge has been to bring the operational flexibility found in hypervisors and networks to servers. One way to achieve this flexibility is to leverage OS clustering.

Today, OS clustering is complex and normally reserved for the highest-tier applications. It's common to see non-disruptive patching for server infrastructures supporting ERP, email, and large databases. These applications are typically clustered at the application layer.

CoreOS is trying to simplify OS patching. According to CoreOS CEO Alex Polvi, who was a guest on a recent Cloudcast podcast, CoreOS can solve patching headaches. The long-term value proposition of the company's primary product, also named CoreOS, is to bring Google Chrome-like updates to servers.

Abstraction is key

CoreOS leverages a commodity approach to clustering, which provides the prerequisite for non-disruptive patching. OS-level clustering normally requires the application to be cluster-aware. CoreOS tries to abstract the underlying OS from the application layer so that applications can be live migrated from one OS instance to another, similar to how live virtual machine migration works.

According to Polvi, Docker is a potential ecosystem partner. Docker also looks to abstract the OS from the application. The long-term vision of Docker is similar to one that the industry has heard before.

In the 1990s, the promise of Java was to write once and run everywhere. Docker has a similar objective: software packaged with Docker can potentially run on any OS running Docker. In theory, a machine running Windows Server 2012 would be able to run an application developed and packaged on Red Hat Linux.

With Docker sitting between the OS and the application, CoreOS could provide the clustering needed to perform live migrations of all the applications running on one server or a set of servers. With this capability, CoreOS could provide the Chrome-like updates it is promising. Additionally, the patching challenge isn't just moved from the OS to the container: the same live migration capability could be used to patch the container software.
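To make the idea concrete, here is a small Python sketch of emptying one host before it is patched. It is purely conceptual and does not use any CoreOS or Docker API: the `drain_host` function, the `placement` dictionary, and the per-host `capacity` limit are assumptions invented for this illustration.

```python
def drain_host(placement, host, capacity):
    """Move every app off `host` to the least-loaded peer with spare room.

    `placement` maps a host name to the list of apps it runs; `capacity`
    is the assumed maximum number of apps any single host may run.
    """
    peers = [h for h in placement if h != host]
    if not peers:
        raise RuntimeError("no peer hosts to migrate to")
    for app in list(placement[host]):
        # Pick the peer currently running the fewest apps.
        target = min(peers, key=lambda h: len(placement[h]))
        if len(placement[target]) >= capacity:
            raise RuntimeError(f"no spare capacity for {app}")
        placement[host].remove(app)
        placement[target].append(app)
    return placement


# Hypothetical three-host cluster; host-1 is about to be patched.
cluster = {"host-1": ["web", "cache"], "host-2": ["db"], "host-3": []}
drain_host(cluster, "host-1", capacity=2)
```

Once `host-1` is empty it can be patched and rebooted without touching a running application, and the same routine can then be applied to the next host in turn.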

Conclusion

According to CoreOS, the clustering portion of its solution is still in beta and has a long way to go before the company will support it in production. However, CoreOS is confident in its Chrome-like OS update capability, which is available today.

There is currently no capability to live migrate Docker containers between hosts, nor is there the ability to run Docker apps on any OS. One of the principles of computer science is to abstract the lowest levels of technology. In this case, the lowest level is the OS, and at some point the industry will abstract the OS, making patching much less disruptive.

Note: TechRepublic and ZDNet are CBS Interactive properties.

About Keith Townsend

Keith Townsend is a technology management consultant with more than 15 years of related experience designing, implementing, and managing data center technologies. His areas of expertise include virtualization, networking, and storage solutions for Fo...
