Do you agree that lower-end servers will see a resurgence in popularity?
Whenever the government pokes its nose into anything, it gets screwed up. Backup systems and redundancy are the growth sectors for 2008-10.
...has been from hardware to software. That is, the biggest threat to uptime used to be hard disk, power supply, and other component failures. These days most of my time seems to go to OS and software snafus (thanks, Microsoft, for maintaining my job security). Even on the cheapest boxes I oversee, it's rare to see anything spontaneously fail, save for hard disks and power supplies. Power supplies can be replaced quickly, and I tend to swap hard disks on mission-critical systems long before they hit five years.
Personally, for the most part I don't see expensive computers as cost-justified on reliability alone, since inexpensive "generic" components are now nearly as reliable, if not equally or more so. I think that in the future, the selling points of expensive systems will have to be performance and energy consumption.
The reason Google can take that approach is that their operating system supports it. As you pointed out, the OS handles individual failures, resubmitting the task to another CPU and disabling the broken one. They could not do that if they were running Windows on those servers.
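The resubmit-and-disable behaviour described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not Google's actual software: `run_with_failover`, `WorkerFailed` and the node names are hypothetical, and a hardware failure is simulated by a worker raising an exception.

```python
from typing import Callable, Dict, Set


class WorkerFailed(Exception):
    """Hypothetical: raised by a simulated worker whose hardware has 'died'."""


def run_with_failover(task: int,
                      workers: Dict[str, Callable[[int], int]],
                      disabled: Set[str]) -> int:
    """Try the task on each healthy worker in turn.

    A worker that raises WorkerFailed is added to `disabled` so later
    tasks skip it, and the same task is resubmitted to the next worker.
    """
    for name, work in workers.items():
        if name in disabled:
            continue  # never schedule onto a node already marked broken
        try:
            return work(task)
        except WorkerFailed:
            disabled.add(name)  # take the broken node out of rotation
    raise RuntimeError("no healthy workers left")


def healthy(x: int) -> int:
    return x * x


def broken(x: int) -> int:
    raise WorkerFailed("simulated disk/PSU failure")


if __name__ == "__main__":
    workers = {"node-a": broken, "node-b": healthy, "node-c": healthy}
    disabled: Set[str] = set()
    results = [run_with_failover(t, workers, disabled) for t in range(4)]
    print(results)
    print(disabled)
```

Once `node-a` fails on the first task it is skipped for every later task, so the job still completes on the remaining nodes.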
Been there, done that.
At the start of my IT career I worked for a company that used Tandem computers. Tandem was one of the original distributed processing vendors. Everything (hardware and OS) was designed and built redundant. Their minimum computer box had 2 CPUs, one transparently redundant for the other. Controllers had 2 ports, and there was lots of RAID. Most importantly, the OS, Tandem Non-Stop, was designed to handle distributed processing totally transparently. If a piece of hardware failed, the OS handled the failover. Multiple CPU boxes could be networked together, even between buildings (I was there once when a box was transferred from "development" to "production" logically, with no physical move required). If traffic required it, the same program could be run on multiple boxes. A program could spawn parallel processes on separate boxes ... Tandem was started in the mid-1970s.
(anyone get the idea I am a fan of theirs?)
It sounds like everything the big shops are trying to re-invent in terms of reliability, failover and parallel/grid processing.
So where is this paragon? In the mid-'90s they were bought out, and then bought again by, you guessed it, HP. So odds are that the (Tandem?) pair of very expensive servers you bought have some of that Non-Stop technology in them to make them more reliable.
You know, with all the recent hype around "green computing", it might be that the big industrial players realized that as well and are already playing their cards in preparation for the future.
Most current distributed software is ill-prepared to deal with hardware failures. Checkpointing and rollbacks are complex issues that are exacerbated by the recent trend toward asynchronous rather than synchronous communication. In many ways software engineering has not yet come to grips with the difficulties inherent in distributed applications, and we are probably a decade away from being able to rely on hardware redundancy to make up for lower hardware reliability.
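The checkpoint-and-rollback idea mentioned above can be sketched for a single process in Python. This is a hypothetical toy (`run_job` and `SimulatedCrash` are made up) and it deliberately leaves out the distributed, asynchronous part that makes the real problem hard.

```python
import copy
from typing import Dict, Set


class SimulatedCrash(Exception):
    """Hypothetical stand-in for a node failure partway through a job."""


def run_job(steps: int, checkpoint_every: int, crash_at: Set[int]) -> Dict[str, int]:
    """Sum 0..steps-1, checkpointing state and rolling back on a crash.

    `crash_at` lists step numbers at which a one-off failure is injected.
    """
    state = {"next_step": 0, "total": 0}
    checkpoint = copy.deepcopy(state)   # last known-good state
    pending_crashes = set(crash_at)     # each injected crash fires once
    while state["next_step"] < steps:
        step = state["next_step"]
        try:
            if step in pending_crashes:
                pending_crashes.discard(step)
                raise SimulatedCrash(step)  # fail before mutating state
            state["total"] += step
            state["next_step"] += 1
            if state["next_step"] % checkpoint_every == 0:
                checkpoint = copy.deepcopy(state)  # persist progress
        except SimulatedCrash:
            state = copy.deepcopy(checkpoint)  # roll back and redo lost work
    return state


if __name__ == "__main__":
    print(run_job(10, 3, {7}))
```

Even this toy depends on ordering: the crash happens before any state changes, so rolling back never double-counts a step. Keeping that guarantee across machines exchanging asynchronous messages is far harder.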
I am inclined to agree with your statement that, due to shortcomings in distributed software in general, we are a decade away from being able to rely on hardware redundancy to make up for lower hardware reliability.
However, I believe that virtualization is stealing the show here. If you look at VMware's ESX 3, it has fairly advanced VM failover features.
So it does look like even the software does not matter so much now.
Virtualisation is the key trend behind hardware reliability becoming largely irrelevant.
It won't be long before the SME market can afford a SAN and a couple of high-powered servers and can virtualise end to end.
Yes, I agree that virtualisation is one of the key enabling factors, though I personally feel that a SAN is not necessarily as important here as the hardware vendors would like us to believe.
Useful, yes, but its importance has to be measured against actual uptime/failover requirements.
I think you have missed the point of Google building their own servers. They have done this not because hardware reliability is irrelevant, but because it is so poor that they are constantly needing to replace components (mostly disk).
Rather than pay for manufacturers' maintenance on these very commoditized products, they have brought their break-fix function in house to save money. Once they could do their own maintenance, they could do what many kids are doing in their garages: build their own servers from commodity parts, run a commodity OS, and cut out the middleman on both product and service.
For most users this is an impractical approach to dealing with server hardware failures. Reliability is more important than ever, but its visibility is masked by hardware redundancy. This has created an environment where users perceive far better reliability than is actually being manufactured.
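To put rough numbers on how redundancy masks component reliability, here is a small Python sketch of the textbook parallel-redundancy formula, 1 - (1 - a)^n, where a is the availability of one unit and n is the number of interchangeable copies. It assumes independent failures and perfect, instant failover, which real hardware only approximates; `parallel_availability` is a hypothetical helper.

```python
def parallel_availability(a: float, n: int) -> float:
    """Availability of n redundant copies, any one of which suffices.

    Assumes independent failures and perfect, instant failover.
    """
    if not 0.0 <= a <= 1.0 or n < 1:
        raise ValueError("need 0 <= a <= 1 and n >= 1")
    # The system is down only if all n copies are down at once.
    return 1.0 - (1.0 - a) ** n


if __name__ == "__main__":
    for a in (0.99, 0.95):
        for n in (1, 2):
            print(f"a={a:.2f} n={n} -> {parallel_availability(a, n):.4%}")
```

Under those assumptions a pair of 95%-available boxes comes out at 99.75%, better than a single 99% box, which is one way users can see better uptime than the individual parts deliver.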
You state that you would build your own servers if not for the lower reliability. How do you know your own product would be less reliable?
By the way - I believe the best business decision is always to buy the most reliable product. The key is to know which is which.