John Joyner explains the benefits of blade-based server deployments, especially as a scalable format for private clouds, and covers monitoring strategies with the HP BladeSystem enclosures.
Blade-based server deployments could have a renaissance as a preferred platform for many private cloud initiatives in the mid-market and distributed enterprise. A small farm of identical blade servers is an ideal platform for one or more clusters of Microsoft Hyper-V, VMware ESX, or Citrix Xen virtualization hosts, the core of the private cloud. Here are some reasons the blade server format is appealing for private clouds, which might consist of several dozen to a few hundred virtual machines (VMs):
- Ease and safety in scaling from just a couple of server blades up to a fully populated blade chassis: this lets you start your private cloud migration on a small scale and not lose your investment as you add capacity.
- Guarantees uniformity of scale units, so you don't have to support, for example, multiple network adapter models as you add virtualization hosts in the future.
- Disassociating the compute nodes (i.e., the blade servers) from the storage nodes, most likely storage area networks (SANs) or storage blades, is inherent in blade design and aligns with virtualization cluster architecture.
- The converged networking possible in blade enclosure-based switches supports a wide range of scenarios, such as Quality of Service (QoS) policies and Virtual Local Area Networks (VLANs).
- Simplicity in duplicating cloud fabrics at primary and failover sites in Disaster Recovery (DR) scenarios.
- The ability to leverage vendor- and operating system (OS)-specific "bare metal" blade provisioning technologies, such as transforming a blade in a shipping box into a production virtualization host in a few minutes.
Figure A: Enclosure Diagram View of HP BladeSystems in the System Center Operations Manager console.
Managing the blade enclosure
Since the enclosure consolidates power and cooling, networking, storage, and out of band (OOB) connectivity, managing each of these functions depends on software-based instrumentation (you can't walk behind the server and move cables around). Blade vendors may equip their enclosures with "management modules" that provide a way to interact with the intelligent enclosure itself, such as a web-based management portal. These management modules can, in turn, be monitored, providing a stand-off view of the health and performance of multiple enclosures and their blades.
Keeping track of which servers are in which enclosure bays is obviously a critical management task, as is providing a rapid means to connect to the OOB management port of each blade and to the management module of the enclosure. The more blade enclosures you have in your organization, and the more blades each contains, the higher the administrative burden and the criticality of effective monitoring. Complexity can quickly grow to the point that the efficiencies of the blade design are offset by difficult administration or missed alerts of pre-failure conditions.
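The bay-tracking task above is essentially an inventory lookup problem. The following is a minimal Python sketch of such an inventory, mapping enclosures to bays, blades, and their OOB addresses; all names, bay numbers, and IP addresses are hypothetical, and a real tool would pull this data from the enclosure's management module rather than hard-code it.

```python
from dataclasses import dataclass, field

# Hypothetical inventory model; names and addresses are illustrative only.
@dataclass
class Blade:
    bay: int
    name: str
    oob_address: str          # e.g., the blade's iLO IP

@dataclass
class Enclosure:
    name: str
    mgmt_address: str         # e.g., the Onboard Administrator IP
    blades: dict = field(default_factory=dict)   # bay number -> Blade

    def add(self, blade: Blade):
        self.blades[blade.bay] = blade

    def locate(self, server_name: str):
        """Return (bay, OOB address) for a server, or None if absent."""
        for blade in self.blades.values():
            if blade.name == server_name:
                return blade.bay, blade.oob_address
        return None

enc = Enclosure("ENC01", "10.0.0.10")
enc.add(Blade(1, "HV-HOST-01", "10.0.0.11"))
enc.add(Blade(2, "HV-HOST-02", "10.0.0.12"))

print(enc.locate("HV-HOST-02"))   # (2, '10.0.0.12')
```

With such a map in hand, the "rapid means to connect" becomes a lookup of the OOB address followed by a one-click connection, which is exactly the convenience the management pack tasks provide.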
The diagram view of the HP C7000 blade enclosures in Figure A illustrates how the server and storage blades in the device bays are discrete objects, while shared features of the enclosure, such as the power and cooling systems, are associated with the enclosure chassis. Imagine how much time this would save in correlating chassis issues with affected blades if you had dozens of enclosures to manage. Notice also that with a server in a device bay selected, the Actions pane on the right offers handy one-click tasks to connect to the relevant HP Integrated Lights-Out (server OOB) and HP Onboard Administrator (enclosure management module).
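The chassis-versus-blade modeling above can be sketched as a simple health rollup: shared components report health on the chassis object, and a chassis-level fault implicates every blade in that enclosure at once. The component names and health states below are illustrative, not SCOM's actual object model.

```python
# Sketch of a worst-of health rollup, illustrating why shared enclosure
# components (power, cooling) are modeled on the chassis rather than
# duplicated on each blade. States and names are assumptions.

def enclosure_health(shared_components: dict) -> str:
    """Roll up shared-component states to a single chassis health."""
    order = {"Healthy": 0, "Warning": 1, "Critical": 2}
    return max(shared_components.values(), key=lambda s: order[s])

def affected_blades(shared_components: dict, blades: list) -> list:
    """A critical chassis fault implicates every blade in the enclosure."""
    if enclosure_health(shared_components) == "Critical":
        return list(blades)
    return []

shared = {"power": "Critical", "cooling": "Healthy", "oa": "Healthy"}
print(enclosure_health(shared))                      # Critical
print(affected_blades(shared, ["HV-01", "HV-02"]))   # ['HV-01', 'HV-02']
```

One chassis alert replacing a storm of per-blade alerts is the time saving described above, multiplied across dozens of enclosures.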
Monitoring the blade enclosure in the private cloud context
Of course, HP makes its own monitoring solution, HP Systems Insight Manager (HP-SIM), which does a great job managing HP equipment of all types. (IBM, Cisco, and Dell have their respective solutions as well.) However, HP-SIM does not monitor application performance, something System Center Operations Manager (SCOM) excels at. Figure B illustrates the simple power of this concept. It is an alert view from the SCOM console, which lists in chronological order the alerts received from an environment with two blade enclosures. Notice that the heartbeat-failure alerts from the monitored blade servers were preceded, by 50 seconds, by alerts of failure to connect to the blade enclosures.
Real world: Enclosure connectivity alerts precede heartbeat failure alerts
In the real-world case shown in Figure B, the SCOM operator knows at a glance that the loss of connectivity to the monitored servers is caused by a loss of connectivity to the entire enclosures. A targeted investigation can begin immediately at the furthest junction point in the connectivity paths to both enclosures. Anticipating such a scenario, a SCOM distributed application you might have authored could include a superset diagram that immediately isolates this fault.
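The correlation the operator performs by eye in this scenario can be expressed as a time-window rule: a blade heartbeat failure is a symptom, not a root cause, if its enclosure raised a connectivity alert shortly before. The alert fields and the 60-second window below are assumptions for illustration, not SCOM's actual correlation engine.

```python
from datetime import datetime, timedelta

# Hypothetical alert records; a real source would be the SCOM alert feed.
WINDOW = timedelta(seconds=60)

def root_cause(alerts):
    """Tag heartbeat alerts whose enclosure lost connectivity just before."""
    enclosure_down = {}   # enclosure name -> time of last connectivity alert
    verdicts = []
    for a in sorted(alerts, key=lambda a: a["time"]):
        if a["type"] == "enclosure_connectivity":
            enclosure_down[a["enclosure"]] = a["time"]
            verdicts.append((a["source"], "root cause candidate"))
        elif a["type"] == "heartbeat_failure":
            t = enclosure_down.get(a["enclosure"])
            if t is not None and a["time"] - t <= WINDOW:
                verdicts.append((a["source"], "symptom of enclosure outage"))
            else:
                verdicts.append((a["source"], "investigate server"))
    return verdicts

alerts = [
    {"time": datetime(2012, 5, 1, 9, 0, 0), "type": "enclosure_connectivity",
     "enclosure": "ENC01", "source": "ENC01"},
    {"time": datetime(2012, 5, 1, 9, 0, 50), "type": "heartbeat_failure",
     "enclosure": "ENC01", "source": "HV-HOST-01"},
]
print(root_cause(alerts))
# [('ENC01', 'root cause candidate'), ('HV-HOST-01', 'symptom of enclosure outage')]
```

This mirrors the 50-second lead time seen in Figure B: the enclosure alert arrives first, so the heartbeat failures that follow are classified as downstream symptoms.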
HP BladeSystem Enclosures Management Pack
There are two approaches hardware vendors can take when writing management packs for monitoring devices like blade enclosures with SCOM 2007 R2 (the release of SCOM that was current until SCOM 2012 shipped in April 2012): (1) leveraging the native Simple Network Management Protocol (SNMP) features of SCOM 2007, and (2) building a software add-on to SCOM that uses the vendor's proprietary technology, rather than SCOM 2007's SNMP features, to monitor the devices. HP wisely followed the second approach and produced an elegant and useful tool (which also works with SCOM 2012).

Figure C shows the HP BladeSystem Enclosure Monitor Manager application that HP provides along with the management pack. You run the monitor service on one or more management servers in your organization, and the service interacts with SCOM to provide health and performance data. This approach scales better than using native SNMP with SCOM 2007: more enclosures can be managed without negatively impacting the performance of the overall SCOM management group. (See the note below about SCOM 2012.)
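The scaling argument for the add-on approach can be sketched in miniature: a dedicated monitor service polls many enclosures concurrently and hands the management group one consolidated result, instead of the management group driving each device itself. The fetch function and enclosure names below are stand-ins for the vendor's proprietary API, not HP's actual implementation.

```python
import concurrent.futures

# Stand-in for a call to an enclosure's management module; a real
# monitor service would query the Onboard Administrator here.
def fetch_health(enclosure: str) -> tuple:
    return enclosure, "Healthy"

def poll_all(enclosures, max_workers=8):
    """Poll every enclosure in parallel; return {name: health}."""
    with concurrent.futures.ThreadPoolExecutor(max_workers) as pool:
        return dict(pool.map(fetch_health, enclosures))

print(poll_all([f"ENC{i:02d}" for i in range(1, 5)]))
```

The point of the pattern is that adding enclosures adds work to the dedicated poller, not to the management group, which only consumes the consolidated health data.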