Human error the cause of WGA meltdown

The facts are in. The weekend snafus involving Windows Genuine Advantage (WGA), which caused validation and activation problems for some 12,000 Windows Vista systems, were the result of human error.

According to Microsoft, the problem began when preproduction code was mistakenly sent to production servers. A rollback resolved the activation problems within 30 minutes. However, it did not fix the validation portion of the system, which remained affected for some 20 hours.

Excerpt from eWeek:

Microsoft, based in Redmond, Wash., confirmed Aug. 27 that the problems with processing validations started at about 3:30 p.m. PDT on Friday, Aug. 24, and continued until around 11:15 a.m. Pacific Time on Saturday.

Excerpt from the WGA Blog:

Nothing more than human error started it all. Preproduction code was sent to production servers. The production servers had not yet been upgraded with a recent change to enable stronger encryption/decryption of product keys during the activation and validation processes.

While the response to the activation issue was quick (less than thirty minutes), the effect on our validation service continued even after the rollback took place. We expected the rollback to fix both issues at the same time but we now realize that we didn't have the right monitoring in place to be sure the fixes had the intended effect.

Microsoft is keen to stress that its WGA system is designed to default to genuine if the server is disrupted or unavailable.

In its own words: "If our servers are down, your system will pass validation every time. This event was not the same as an outage because in this case the trusted source of validations itself responded incorrectly."

Does this report restore some of your confidence in WGA?
