Two VMware engineers attempted to virtualize 100% of the workloads in an enterprise data center. This is a recap of the lessons they learned.
Virtualization has been a key enabler in the enterprise data center. Server virtualization has cut costs through a smaller physical footprint, higher server utilization, and lower power consumption. It has also shortened the time to market for application development and deployment.
Virtualization is the foundation of cloud computing. Organizations looking to one day migrate their entire data center to cloud-based resources must reach the goal of 100% virtualization.
Before jumping into the technical hurdles of reaching 100% virtualization within a data center, it's worthwhile to acknowledge the non-technical challenges. These challenges were not addressed as part of the VMworld session.
Business drivers such as prohibitive software licensing costs can prevent some workloads from being virtualized. For example, some enterprise software vendors charge license fees based on the number of sockets in a virtualization cluster. So, if you have a 32-node cluster with four sockets per node, you will be charged license fees for 128 sockets.
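The socket arithmetic above is easy to sketch. The snippet below is a minimal illustration, not any vendor's pricing model: the per-socket price parameter and the smaller dedicated cluster used for comparison are hypothetical and do not come from the session.

```python
def licensed_sockets(nodes: int, sockets_per_node: int) -> int:
    """Return the sockets billed when every socket in the cluster is licensed."""
    if nodes < 0 or sockets_per_node < 0:
        raise ValueError("nodes and sockets_per_node must be non-negative")
    return nodes * sockets_per_node


def license_cost(nodes: int, sockets_per_node: int, price_per_socket: int) -> int:
    """Total license fee; price_per_socket is a hypothetical figure."""
    return licensed_sockets(nodes, sockets_per_node) * price_per_socket


# The example from the text: 32 nodes with four sockets each.
full_cluster = licensed_sockets(32, 4)

# Hypothetical alternative: a dedicated two-node cluster for the same software.
dedicated_cluster = licensed_sockets(2, 4)

print(full_cluster, dedicated_cluster)
```

Comparing the two counts shows why per-cluster socket licensing can push a workload onto a small dedicated cluster, or onto bare metal, instead of the shared virtualization cluster.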
Another non-technical driver may be the workload's size. If an application requires as much compute capacity as your largest VM host provides, virtualizing it would be cost prohibitive. For instance, suppose a large database server consumes 96 GB of RAM and your largest physical VM host has 96 GB of RAM. The advantages of virtualization may not outweigh the cost of adding hypervisor overhead to the workload.
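One rough way to frame the sizing problem is to ask how many VMs of a given size fit on one host. The sketch below assumes memory is the only constraint and treats hypervisor overhead as a single flat number; both are simplifications, and the 2 GB overhead figure is a hypothetical placeholder rather than a measured value.

```python
def vms_per_host(host_ram_gb: int, vm_ram_gb: int, overhead_gb: int = 0) -> int:
    """Return how many VMs of this size fit in one host's RAM.

    overhead_gb is the memory assumed to be reserved for the hypervisor.
    """
    if vm_ram_gb <= 0:
        raise ValueError("vm_ram_gb must be positive")
    usable_gb = host_ram_gb - overhead_gb
    return max(usable_gb // vm_ram_gb, 0)


# The example from the text: a 96 GB workload on a 96 GB host.
no_overhead = vms_per_host(96, 96)           # ignores hypervisor overhead
with_overhead = vms_per_host(96, 96, 2)      # hypothetical 2 GB overhead
small_vms = vms_per_host(96, 8, 2)           # a host full of 8 GB VMs instead

print(no_overhead, with_overhead, small_vms)
```

With any overhead at all, the 96 GB workload no longer fits on the 96 GB host, and even without overhead the host carries a single VM, so there is no consolidation benefit to offset the cost.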
One last non-technical barrier is the politics surrounding mission-critical apps. Even in today's climate, some people believe that mission-critical applications require bare-metal hardware deployments.
Fine-tune your environment
VMware claims that vSphere is well-tuned for generalized workloads; it has been my experience that this is more or less true. Therefore, if a customer is experiencing poor performance on a virtualized workload, the first place to examine is the tuning of both the guest and the VM host. One common area of misconfiguration is the I/O subsystem. Engineers should first look to storage and network configuration within both the VM host and the guest OS.
Always follow best practices for configuring guest I/O. Some of VMware's best practices, along with some of my own, include:
- ensure the VMXNET3 driver is used in the VM guest;
- install the latest version of VMware Tools;
- disable unnecessary services;
- use host-wide virus protection rather than an antivirus agent in each VM; and
- ensure storage multi-path is configured and working.
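Some of the checks above lend themselves to a simple audit script. The following is a minimal sketch over a hypothetical inventory record: `GuestRecord` and its fields are invented for illustration and are not vSphere API objects, and the two-path threshold for multipathing is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GuestRecord:
    """Hypothetical inventory record; not a vSphere API object."""
    name: str
    nic_type: str              # e.g. "vmxnet3" or "e1000"
    tools_up_to_date: bool     # True if the latest VMware Tools is installed
    active_storage_paths: int  # storage paths currently working for the VM


def audit(guest: GuestRecord) -> List[str]:
    """Return a list of findings for one guest; empty means no issues found."""
    findings = []
    if guest.nic_type.lower() != "vmxnet3":
        findings.append(f"{guest.name}: NIC is {guest.nic_type}, expected VMXNET3")
    if not guest.tools_up_to_date:
        findings.append(f"{guest.name}: VMware Tools is out of date")
    # Assumption: fewer than two working paths means multipathing is not effective.
    if guest.active_storage_paths < 2:
        findings.append(f"{guest.name}: fewer than two active storage paths")
    return findings


misconfigured = GuestRecord("db01", "e1000", False, 1)
healthy = GuestRecord("web01", "vmxnet3", True, 4)

for finding in audit(misconfigured):
    print(finding)
```

In practice the records would be populated from whatever inventory or monitoring tooling is already in place; the point is only that the checklist can be expressed as repeatable checks.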
While these suggestions will go a long way for most workloads, more complex systems require advanced optimizations. Typical workloads that encounter performance issues when virtualized include high-performance computing (HPC) and big data applications.
It would be unrealistic to think the abstraction that enables the benefits of virtualization comes without a cost. The hypervisor adds latency to each CPU and I/O transaction. The more performance an application demands, the more that added latency hurts.
VMware has provided advanced optimizations that reduce the impact of this hypervisor overhead; these tools have improved in each version of vSphere. Most of the optimizations have come in the form of exposing the physical sub-systems to the guest operating systems.
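Remote Direct Memory Access (RDMA) lets one machine read and write another machine's memory over the network without involving either operating system, a capability many HPC workloads rely on. Delivering it inside a VM requires giving the guest direct access to the physical hardware. VMware provides three technologies for that access: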
- DirectPath I/O (supported in vSphere 4.0 and higher);
- SR-IOV (supported in vSphere 5.1 and higher); and
- vRDMA (expected in a future release of vSphere).
As an example, consider the first of the three. DirectPath I/O takes advantage of the I/O virtualization extensions Intel VT-d and AMD-Vi. DirectPath I/O for Networking gives a VM direct access to a server's network interface card (NIC), which means the VM can use the advanced packet offloading features of a high-end NIC. In a performance test, VMware showed that a high-traffic web server could host 15% more users per CPU with DirectPath enabled than with a traditional virtual NIC.
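To put the 15% figure in capacity terms, the sketch below works through the arithmetic. The baseline of 1,000 users per CPU and the 50,000-user target are hypothetical numbers chosen for illustration, and the 15% uplift is VMware's result for one web server test, so it should not be assumed to carry over to other workloads.

```python
def users_per_cpu_with_uplift(baseline_users_per_cpu: int, uplift_percent: int = 15) -> int:
    """Apply a percentage uplift using integer math to avoid float rounding."""
    return baseline_users_per_cpu * (100 + uplift_percent) // 100


def cpus_needed(target_users: int, users_per_cpu: int) -> int:
    """Return the CPUs required to serve target_users, rounded up."""
    if users_per_cpu <= 0:
        raise ValueError("users_per_cpu must be positive")
    return -(-target_users // users_per_cpu)  # ceiling division


baseline = 1000                                   # hypothetical users per CPU
with_directpath = users_per_cpu_with_uplift(baseline)

cpus_virtual_nic = cpus_needed(50_000, baseline)
cpus_directpath = cpus_needed(50_000, with_directpath)

print(with_directpath, cpus_virtual_nic, cpus_directpath)
```

Under these assumed numbers, the same user population needs fewer CPUs with DirectPath enabled, which is the kind of saving to weigh against the features it disables.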
However, these direct access optimizations come at a cost. Enabling DirectPath I/O for Networking for a virtual machine disables advanced vSphere features such as vMotion. VMware is working on technologies that will enable direct hardware access without sacrificing features.
Complex workloads can challenge the desire to reach 100% virtualization within a data center. While VMware has closed the gap for the most demanding workloads, it may still prove impractical to virtualize some workloads.
What's your experience?
Have you found the overhead associated with hypervisors a hindrance to virtualizing your most demanding workloads? Share your thoughts in the comments.
Note: The information about vRDMA was updated on Sept. 17, 2015 based on a clarification from VMware.