As the cost of hardware continues to decline and virtualization technology improves, organizations are looking more closely at server virtualization. Most of the techniques used to secure non-virtualized servers apply to virtualized systems, but there are some differences. In this article, I step through how hypervisor-based virtualization works, dispel some myths about virtual server security, and provide recommendations for giving virtual environments the same level of trust that exists for traditional server implementations.
What is virtualization?
Server virtualization technology creates two or more virtual machines (VMs) out of one physical hardware platform and manages how those VMs share resources. The VMs share the physical system resources and access them through virtualized representations.
There are two basic types of virtualization. In the first type, a standard operating system (OS) runs on top of the hardware layer and hosts one or more guest OS instances. These guest instances are VMs. In the second type, a program known as a hypervisor abstracts the hardware from the operating system and manages how the VMs communicate with each other and with physical or virtual resources. I focus on hypervisor technology in this article.
How does virtualization work?
There are various server virtualization products on the market today. As an example environment I chose Xen, the open source hypervisor that Novell ships as its virtualization solution for SUSE Linux. In addition, Microsoft has recently committed to ensuring Xen compatibility in its Windows operating systems. Xen is a good representation of how hypervisor-based virtualization functions. Other hypervisor-based products include:
- Intel with vPro
- Microsoft (expected after the Longhorn release)
- VMware ESX
All examples and descriptions in this section are based on research conducted by IBM and documented in “Building a MAC-based Security Architecture for the Xen Opensource Hypervisor” (Sailer et al., 8 Jun 2005).
Figure 1 depicts the layers of the Xen environment. The Xen hypervisor is a small application that runs on top of the physical machine hardware layer. It implements and manages the virtual CPU (vCPU), virtual memory (vMemory), event channels, and memory shared by the resident VMs. It also controls I/O and memory access to devices.
Figure 1: Xen Hypervisor Architecture
In Xen, VMs are called domains. They are labeled Dom0 and DomU in Figure 1. Dom0 is the first VM created. It is used to manage the other domains, known as user domains (DomU). Management through Dom0 consists of creating, destroying, migrating, saving, or restoring user domains.
An operating system running in a user domain is configured so that privileged operations are executed via calls to the hypervisor (hypercalls) rather than directly, because these operations are powerful enough to compromise the hypervisor. According to IBM, there are three characteristics associated with these calls:
- They offer access to virtual resources (e.g., event channels and shared memory)
- They accelerate critical path operations such as page table management
- They emulate privileged operations that are restricted to the hypervisor but might also be necessary in guest operating systems
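The mediation described above can be pictured with a small toy model. This is only a sketch in Python, not Xen's actual interface: the `ToyHypervisor` class, its `hypercall` method, and every operation name are hypothetical and chosen purely for illustration. The one idea it shows is that a guest asks the hypervisor to act on its behalf, and the hypervisor validates the request first.

```python
class HypercallDenied(Exception):
    """Raised when a domain requests an operation it may not perform."""


class ToyHypervisor:
    # Hypothetical operation names, loosely mirroring the three categories above.
    GUEST_OPS = {"event_channel_send", "update_page_table", "set_timer"}
    # In this toy model, only the management domain (Dom0, id 0) may use these.
    MGMT_OPS = {"create_domain", "destroy_domain"}

    def __init__(self):
        self.log = []  # record of every request the hypervisor accepted

    def hypercall(self, domain_id, op, *args):
        """Validate a request from a domain before acting on its behalf."""
        if op not in self.GUEST_OPS | self.MGMT_OPS:
            raise HypercallDenied(f"unknown operation {op}")
        if op in self.MGMT_OPS and domain_id != 0:
            raise HypercallDenied(f"domain {domain_id} may not call {op}")
        self.log.append((domain_id, op, args))
        return "ok"
```

In this model a user domain can request a page table update, but an attempt by the same domain to create another domain raises `HypercallDenied`, because that operation is reserved for Dom0.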
Within the Xen environment each VM, as well as the hypervisor, has its own resources. Resources allocated to the hypervisor include the CPU, I/O memory, and hypervisor memory. These are exclusive to the hypervisor; VMs are blocked from accessing them.
VM resources include vMemory and vCPU. In addition, all VMs share event channels and shared memory. An event channel provides for synchronization of interactions between VMs. Shared memory, managed by a grant table in the hypervisor, lets a VM grant another VM access to vMemory pages it owns.
Shared hardware resources such as a network adapter are typically managed by device drivers inside each VM. It’s also possible to run device drivers from a special domain created for that purpose; drivers for SCSI disks and Ethernet devices are often placed in such a device domain. However, a compromise of that domain can expose all domains to downtime or leave them open to attack.
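To make the grant mechanism concrete, here is a minimal sketch of page sharing controlled by the owner. It is a simplified model and not Xen's real grant table structure: the `GrantTable` class and its method names are my own assumptions. It illustrates the rule that only the VM that owns a page can open it to another VM, and that access is denied otherwise.

```python
class GrantTable:
    """Toy model of hypervisor-managed page sharing (not Xen's real structure)."""

    def __init__(self):
        self._owner = {}      # page number -> owning VM
        self._grants = set()  # (page number, grantee VM) pairs

    def assign(self, page, vm):
        """Record that the hypervisor allocated this page to a VM."""
        self._owner[page] = vm

    def grant(self, granter, page, grantee):
        """Let the owner of a page share it with one other VM."""
        if self._owner.get(page) != granter:
            raise PermissionError(f"{granter} does not own page {page}")
        self._grants.add((page, grantee))

    def revoke(self, granter, page, grantee):
        """Let the owner withdraw a grant it made earlier."""
        if self._owner.get(page) != granter:
            raise PermissionError(f"{granter} does not own page {page}")
        self._grants.discard((page, grantee))

    def can_access(self, vm, page):
        """Access is allowed only to the owner or an explicit grantee."""
        return self._owner.get(page) == vm or (page, vm) in self._grants


table = GrantTable()
table.assign(7, "DomU-1")
table.grant("DomU-1", 7, "DomU-2")
print(table.can_access("DomU-2", 7))  # True
print(table.can_access("DomU-3", 7))  # False
```

The default is the important design choice here: a page that was never granted is invisible to every VM except its owner.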
In a Dark Reading article, Paul Lin, senior director of project management at VMware, is quoted as saying, “Virtualization is both an opportunity and a threat” (Kelly Jackson Higgins, “VMs Create Potential Risks”, 21 Feb 2007). Lin goes on to say, however, that because VMs are less complex than traditional operating systems, they should be easier to secure. So what are the risks organizations deploying virtual environments should address?
Just for a moment, let’s take a look at virtual machines in general. One concern is the uncontrolled, or unmanaged, proliferation of virtual servers. This could result in rogue VMs that are unprotected. The ease with which VMs can be created can also increase the complexity of patch and change management.
Another, and possibly greater, risk is the potential for free and open communication between VMs. Unless steps are taken to control how information is shared between processing environments, the isolation approach, which keeps one physical server from acting as a launching platform for malware or other types of attacks, breaks down.
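One simple way to monitor for rogue VMs is to compare what is actually running against an approved inventory. The sketch below assumes you can already obtain both lists (from your management tools and your change records, respectively); the function name and the VM names are hypothetical.

```python
def find_rogue_vms(running, approved):
    """Return the names of VMs that are running but not in the approved inventory."""
    return sorted(set(running) - set(approved))


# Hypothetical data: names reported by a host versus names in change records.
running_vms = ["web01", "db01", "test-tmp", "web02"]
approved_vms = ["web01", "web02", "db01"]
print(find_rogue_vms(running_vms, approved_vms))  # ['test-tmp']
```

Any name this check reports is a VM nobody approved, which also means nobody is likely to be patching it.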
Safeguarding encryption keys might also be a challenge if sharing of vMemory between VMs is not controlled. Keys residing in memory could be compromised if another VM accesses the right (or wrong) memory page. Normal operation of VMs on the same server might not allow this to happen, but a compromised hypervisor or VM might break weak controls designed to prevent such mistakes.
All of the security issues raised so far can be solved by the right policies and processes. The proliferation of VMs is a management issue that must be handled through training, monitoring, and compliance enforcement. Implementing communication and sharing policies in the virtualized environment itself is a little more challenging.
One of the most critical assets to protect is the hypervisor itself. Compromise of the hypervisor exposes all VMs on a single physical server to attack. One method of protecting the hypervisor is through attestation at boot up. Using technology like the Trusted Platform Module (TPM), the hypervisor can establish a trusted relationship with the host hardware that is maintained during server operation. This trust can also be extended to the VMs. (See IBM’s “Virtual Trusted Platform Module”.)
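Attestation rests on measuring each boot component into a register that can only be extended, never overwritten. The sketch below is a simplified software model of that extend operation as TPM 1.2 defines it with SHA-1 (new value = hash of the old value followed by the hash of the component). It does not talk to a real TPM, and the component contents and function names are placeholders I made up.

```python
import hashlib

PCR_SIZE = 20  # bytes in a SHA-1 digest, the hash used by TPM 1.2 registers


def extend(pcr: bytes, component: bytes) -> bytes:
    """Fold the hash of one component into the running register value."""
    measurement = hashlib.sha1(component).digest()
    return hashlib.sha1(pcr + measurement).digest()


def measure_boot(components) -> bytes:
    """Measure components in boot order, starting from an all-zero register."""
    pcr = bytes(PCR_SIZE)
    for blob in components:
        pcr = extend(pcr, blob)
    return pcr


# Hypothetical boot components; real measurements cover the actual binaries.
known_good = measure_boot([b"bios", b"bootloader", b"hypervisor"])
tampered = measure_boot([b"bios", b"bootloader", b"hypervisor-with-rootkit"])
print(known_good == tampered)  # False
```

Because each value depends on everything measured before it, a change to any component, or to the order in which components load, produces a different final value that a verifier can detect.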
Addressing inter-VM communication is the objective of sHype, IBM’s proposed architecture for securing VMs in a hypervisor-based environment. Implementation of policies within the hypervisor allows administrators to determine which VM or groups of VMs are allowed to communicate and what resources they can share. Portions of the sHype specification are being included in Xen. (See “sHype: Secure Hypervisor Approach to Trusted Virtualized Systems”.)
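The kind of policy decision described here can be sketched as a default-deny check on group labels. This is not the sHype implementation or its policy format: the `CommPolicy` class, the labels, and the VM names are assumptions made for illustration. Each VM is assigned to a group, and two VMs may communicate only if an administrator has explicitly allowed that pair of groups.

```python
class CommPolicy:
    """Toy default-deny policy for inter-VM communication (illustrative only)."""

    def __init__(self):
        self._labels = {}      # VM name -> group label
        self._allowed = set()  # unordered pairs of labels allowed to communicate

    def label(self, vm, group):
        """Assign a VM to a group."""
        self._labels[vm] = group

    def allow(self, group_a, group_b):
        """Explicitly permit communication between two groups."""
        self._allowed.add(frozenset((group_a, group_b)))

    def may_communicate(self, vm_a, vm_b):
        """Deny unless both VMs are labeled and their groups were allowed."""
        a = self._labels.get(vm_a)
        b = self._labels.get(vm_b)
        if a is None or b is None:
            return False
        return frozenset((a, b)) in self._allowed


policy = CommPolicy()
policy.label("web01", "dmz")
policy.label("app01", "internal")
policy.label("hr-db", "restricted")
policy.allow("dmz", "internal")
print(policy.may_communicate("web01", "app01"))  # True
print(policy.may_communicate("web01", "hr-db"))  # False
```

An unlabeled VM is denied everything in this model, so a rogue VM gains no communication rights simply by being created.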
The final word
In closing, I’m providing a list of recommendations for securing hypervisor-based virtualized servers taken from Gartner’s “Secure Hypervisor Hype: Myths, Realities, and Recommendations” (Neil MacDonald, Pub. ID: G00140754, 6 July 2006).
- Require your hypervisor provider to support hardware-based attestation of hypervisor integrity. The hypervisor should be able to extend its root of trust (built during attestation) to other critical partitions (domains).
- Immutable roots of trust in the BIOS or secure BIOS update mechanisms should be mandatory hypervisor selection criteria.
- Fully understand the level at which your hypervisor provider hosts drivers. (Drivers are a weak link in any server security model. Ensure that the compromise of a single driver doesn’t compromise the entire virtualized environment.)
- Security policies that define the configuration of the hypervisor, access controls, LAN or disk-based sharing, VLANs, and so on should be protected against tampering.
- The ability to update policies should be tightly controlled, requiring strong authentication and verifiable digital signatures for integrity and validity.
- Restrict the ability of administrators to load arbitrary software in security, management, and other critical partitions.
- Plan for the single point of failure potentially caused by a service or parent partition.
- To protect against DoS attacks, no single host OS partition should be able to consume 100% of any physical resource.
- By default, VMs should not share their resources with other hosted VMs, unless explicitly configured to do so in compliance with the principle of least privilege.
- Inter-VM communication should only be enabled when configured through tightly controlled, explicit policy.
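The resource recommendation in this list lends itself to an automated configuration check. The sketch below assumes each partition has a configured cap expressed as a percentage of a physical resource; the function name, the partition names, and the 80% limit are hypothetical values of my own, not figures from Gartner.

```python
def check_caps(caps_percent, per_partition_limit=80):
    """Return a message for every partition whose cap exceeds the limit.

    caps_percent maps a partition name to the maximum percentage of a
    physical resource (CPU, for example) that the partition may consume.
    """
    problems = []
    for name, cap in sorted(caps_percent.items()):
        if cap > per_partition_limit:
            problems.append(
                f"{name}: cap {cap}% exceeds limit {per_partition_limit}%"
            )
    return problems


# Hypothetical CPU caps for three partitions.
cpu_caps = {"web01": 40, "db01": 100, "batch": 60}
for problem in check_caps(cpu_caps):
    print(problem)  # db01: cap 100% exceeds limit 80%
```

Run against every resource type you cap, a check like this flags any partition that could starve its neighbors before that configuration reaches production.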