Isolating points of failure in a VPN
CIO Republic is introducing a new monthly column, the VPN Advisor, covering VPN issues and trends. Columnist Salvatore Salamone will answer TechRepublic member questions, so we invite you to send in any questions you have concerning VPNs.
Reliability and redundancy
Q: How reliable are VPNs, and are some more reliable than others? What about redundancy?
—Dan Shundoff, President, IntelliCom Computer Consulting, Inc.
Salamone: With VPNs, there are several factors that contribute to reliability. I’m assuming a typical VPN setup in which there is a VPN gateway or server at the company’s headquarters. Remote users connect either by dialing into an Internet service provider or through DSL or cable modem services. Remote sites connect using a gateway or server of their own.
One possible point of failure for the VPN is the gateway or server within the corporate headquarters. For that reason, many companies use a more resilient VPN gateway or server for the headquarters location. Typically, the product chosen includes such features as redundant power supplies and fans, hot-swappable modules, sophisticated remote network management tools, and the ability to monitor such things as the temperature of the cabinet and the speed of the unit’s cooling fans.
Another reliability concern relating to the headquarters VPN device is how to keep the VPN up and running if that device fails. Many VPN equipment vendors offer load-balancing or clustering technology so that if one device fails, VPN users can connect to a second device. The level of sophistication of this failover feature varies greatly within the industry. Some vendors’ products perform a live cutover to keep an existing VPN session alive if the primary VPN server fails. More common is a less automated approach. If a VPN server fails, the existing sessions are terminated, and the user must relaunch his or her VPN client software. The new session is then initiated on a secondary VPN server.
An alternative approach to deal with a VPN gateway or server failure is to use a third-party load-balancing switch in front of two or more VPN devices. If one VPN device fails, the traffic will be directed to another VPN server or gateway.
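The failover behavior described above, in which a new session lands on a secondary device when the primary is down, can be sketched in a few lines of Python. This is only an illustration of the idea, not any vendor's implementation: the gateway names and the `is_reachable` health check are hypothetical stand-ins.

```python
from typing import Callable, Iterable, Optional


def choose_gateway(gateways: Iterable[str],
                   is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Return the first gateway that passes the health check, or None."""
    for gateway in gateways:
        if is_reachable(gateway):
            return gateway
    return None


# Hypothetical gateway names; the primary is listed first.
GATEWAYS = ["vpn1.example.com", "vpn2.example.com"]

# Simulate a failed primary: only the secondary answers the health check.
reachable_now = {"vpn2.example.com"}
selected = choose_gateway(GATEWAYS, lambda g: g in reachable_now)
print(selected)  # vpn2.example.com
```

A load-balancing switch would typically do more than this, such as spreading new sessions across all healthy devices, but the core decision is the same: skip any device that fails its health check.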
The service used to connect remote workers and sites can also be a source of problems. Some service providers offer premium dial-access services for dial-up users. These services typically come with service level agreements (SLAs) guaranteeing that a user will connect on the first or second dial attempt roughly 90 or 95 percent of the time. Some of these services base the call-completion rate SLA on other metrics, such as the Internet service provider benchmarks by Keynote Systems, Inc.
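To make the call-completion SLA concrete, here is a rough sketch of how a company might check its own dial-up records against a 95 percent target. The log format and the sample data are invented for illustration; a real provider would define the measurement method in the SLA itself.

```python
def completion_rate(attempts_needed, max_attempts=2):
    """Fraction of connections that succeeded within max_attempts dials.

    attempts_needed holds one entry per connection: the number of dial
    attempts it took, or None if the user never got connected.
    """
    if not attempts_needed:
        raise ValueError("no connection records")
    ok = sum(1 for n in attempts_needed
             if n is not None and n <= max_attempts)
    return ok / len(attempts_needed)


# Hypothetical dial log: attempts needed for each of ten connections.
log = [1, 1, 2, 1, 3, 1, 1, None, 1, 2]

rate = completion_rate(log)
meets_sla = rate >= 0.95  # assumed SLA target of 95 percent
print(f"{rate:.0%} within two attempts; SLA met: {meets_sla}")
```

In this made-up sample, eight of ten connections succeeded within two attempts, so the 95 percent target is missed.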
For site-to-site VPNs, many companies choose premium access services that offer SLAs on both the availability of the provider’s backbone network and the latency of traffic passing over that backbone.
When the VPN is a replacement for other data networks (e.g., it replaces a Frame Relay network), another reliability issue to explore is the use of multiple service providers. In this arrangement, the main VPN server or gateway is connected to two or more providers’ networks, so that if one provider has a system failure, all traffic has a second route to follow. This type of setup is not widely used, but it is an option for cases in which the VPN is essential.
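A quick calculation shows why a second provider helps. The sketch below assumes the providers fail independently of one another, which is optimistic since providers can share underlying infrastructure, and the 99 percent availability figure is a made-up example rather than any provider's actual SLA.

```python
def combined_availability(availabilities):
    """Availability of parallel links, assuming independent failures.

    The connection is down only when every provider is down at once.
    """
    all_down = 1.0
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        all_down *= 1.0 - a
    return 1.0 - all_down


single = 0.99  # hypothetical availability of one provider
dual = combined_availability([single, single])
print(f"one provider: {single:.4f}, two providers: {dual:.4f}")
```

Under these assumptions, two 99 percent providers together yield roughly 99.99 percent availability.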
Scalability
Q: As more and more of our customers look towards VPNs as the primary WAN connection between offices, we are faced with addressing the basic issue: How scalable is a VPN, and are some solutions more scalable than others?
—TechRepublic member who requested anonymity
Salamone: VPN equipment exists that can support up to hundreds of thousands of simultaneous sessions. However, this equipment is primarily aimed at the large telcos and service providers—and it is fairly expensive.
Most companies don’t buy hardware of that size. In most companies, VPNs are deployed gradually. In some scenarios, after a pilot project, all telecommuters might be moved to the VPN first. Next, a few remote offices that were previously too expensive to connect to the corporate backbone are added. Later still, branch offices currently connected through other services, such as Frame Relay, are migrated to the VPN. Somewhere in this process, the VPN is also used to provide secure connections to business partners and customers.
In such a scenario, companies start with fairly modest VPN needs. The challenge most companies face is: How do we buy a product that will support a pilot program and then allow us to move to full-scale deployment? The key issue is how many simultaneous VPN sessions at a certain data rate the system can support.
There are a couple of ways to scale a VPN. One approach is to look for VPN devices that can be clustered. In that way, you could buy a lower-end device that supports, say, 20 or even 100 simultaneous sessions to run a VPN pilot program and to support early users. As the VPN grows, you can purchase more devices and link them together to support more simultaneous users.
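The clustering approach lends itself to a simple sizing estimate. The sketch below assumes each device handles a fixed number of simultaneous sessions and ignores data rate and spare capacity for failover, both of which matter in practice. The rollout stages and session counts are hypothetical.

```python
import math


def devices_needed(target_sessions, sessions_per_device):
    """Smallest number of clustered devices that covers the target load."""
    if sessions_per_device <= 0:
        raise ValueError("sessions_per_device must be positive")
    if target_sessions < 0:
        raise ValueError("target_sessions cannot be negative")
    return math.ceil(target_sessions / sessions_per_device)


# Hypothetical rollout stages with expected simultaneous sessions.
stages = [("pilot", 20), ("telecommuters", 250), ("branch offices", 600)]

for name, sessions in stages:
    # Assumes a device rated for 100 simultaneous sessions.
    print(f"{name}: {devices_needed(sessions, 100)} device(s)")
```

With a device rated for 100 sessions, this made-up rollout grows from one device for the pilot to three and then six as later stages come online.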
Another approach is to use a chassis-based VPN system, which uses modules that allow the system to scale up as VPN requirements grow. Most chassis-based systems offer modules that, when added to the chassis, increase the processing power of the entire system (which is needed to handle the additional encryption and tunneling tasks) so that more simultaneous sessions can be supported. This is a fairly common approach taken with other networking equipment, like routers and switches.
One downside to the chassis-based system approach is that the initial cost of the chassis is fairly high compared to a lower-end VPN system. However, the additional cost is often offset in large deployments because the additional modules cost less than standalone VPN systems. Additionally, the higher initial cost buys things like advanced management features, the electrical and cooling power to support a much larger number of VPN sessions, and the ability to grow the system.
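The cost trade-off between a chassis and standalone devices can be framed as a break-even calculation. The sketch below assumes one module provides the same capacity as one standalone device and uses whole-number prices; every price shown is invented for illustration and does not reflect any real product.

```python
def chassis_cost(units, chassis_price, module_price):
    """Total cost of a chassis populated with the given number of modules."""
    return chassis_price + units * module_price


def standalone_cost(units, device_price):
    """Total cost of the same capacity bought as standalone devices."""
    return units * device_price


def break_even_units(chassis_price, module_price, device_price):
    """Smallest unit count at which the chassis is strictly cheaper.

    Returns None if modules are not cheaper than standalone devices,
    in which case the chassis never pays for itself on price alone.
    """
    saving_per_unit = device_price - module_price
    if saving_per_unit <= 0:
        return None
    return chassis_price // saving_per_unit + 1


# Hypothetical prices, in whole dollars.
CHASSIS, MODULE, DEVICE = 20_000, 3_000, 5_000

n = break_even_units(CHASSIS, MODULE, DEVICE)
print(n, chassis_cost(n, CHASSIS, MODULE), standalone_cost(n, DEVICE))
```

With these made-up figures, the chassis becomes the cheaper option at the eleventh unit. The calculation leaves out the management and growth benefits mentioned above, which may justify the chassis sooner.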