I encountered every developer's worst nightmare this weekend: severe data corruption in the database. The corruption was in the worst place possible from my viewpoint as a consumer: my credit card company. Not only did I uncover a case of significant data corruption, but it looks like the backup system was out to lunch too.
A few weeks ago, I received a paper bill from Capital One instead of the usual e-mail notifying me that my statement was posted online. When I went to pay my bill, I didn't notice anything unusual -- although, in retrospect, I should have. I paid my bill a few days before the due date and went on my merry way. This Saturday (after the due date), I received a letter in the mail informing me that my payment was rejected due to an invalid bank account number used for payment. Huh? My checking account has not changed in well over five years. The number of the checking account the company tried to debit was definitely not mine, but the number was striking a chord deep inside.
Customer service tried to help me, but they were having a hard time understanding that the last time I changed that information in their system was five years ago; they kept reminding me that no one other than me has access to the system. "Then how did it change?" was my usual response. Near the end of the customer service merry-go-round, I found out on my own what happened.
I went through the bill payment system to see what account they had on file, and I figured out why the checking account number on file looked familiar. It was the checking account I originally had in the system -- the one I shut down more than five years ago. That data was removed from Capital One's system at that time and replaced by my current information.
Data corruption is the silent killer of databases, and it is both a source of and, often, a result of security breaches, system failures, and programming mistakes. I hit the panic button, big time.
Customer service transferred me to its Web site support team, and I spoke to a friendly woman named Megan. Unlike the people I spoke with at customer service, Megan immediately had an answer for me. She told me that during a "Web site upgrade" some data had been "deleted" with no knowledge of what data was lost and whose data was affected, leaving no possibility of restoring it. I thought her explanation was pretty bad; how can a "Web site upgrade" affect the checking account I have on file? I am also unsure why the company cannot inspect the database changes to determine what records were altered. I am baffled why "deleting" data reverted it to values from 2002. And if Capital One did do a database restore, why is the most recent data from 2002? Finally, why was the customer service department ignorant of this situation, leading me to spend an hour winding myself up with fear of a hacked account and frustration at them?
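To make the "inspect the database changes" question concrete, here is a minimal sketch of the kind of audit trail I have in mind. I have no idea how Capital One's database is actually built; the table names (`payment_account`, `payment_account_audit`), the columns, and the helper functions are all hypothetical, and SQLite stands in for whatever database a real system would use.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical account table and audit trail."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE payment_account (
            customer_id    INTEGER PRIMARY KEY,
            account_number TEXT NOT NULL
        );

        -- One row per change, so "what was altered, and for whom?" has an answer.
        CREATE TABLE payment_account_audit (
            audit_id           INTEGER PRIMARY KEY AUTOINCREMENT,
            customer_id        INTEGER NOT NULL,
            old_account_number TEXT,
            new_account_number TEXT,
            changed_at         TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );

        -- Record the old and new values whenever an account number is updated.
        CREATE TRIGGER payment_account_update_audit
        AFTER UPDATE OF account_number ON payment_account
        BEGIN
            INSERT INTO payment_account_audit
                (customer_id, old_account_number, new_account_number)
            VALUES (OLD.customer_id, OLD.account_number, NEW.account_number);
        END;
        """
    )
    return conn


def changed_accounts(conn):
    """Return (customer_id, old_value, new_value) for every recorded change."""
    return conn.execute(
        "SELECT customer_id, old_account_number, new_account_number "
        "FROM payment_account_audit ORDER BY audit_id"
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    conn.execute("INSERT INTO payment_account VALUES (1, 'CURRENT-1234')")
    conn.execute("INSERT INTO payment_account VALUES (2, 'CURRENT-5678')")
    # Simulate an upgrade that clobbers one customer's account number.
    conn.execute(
        "UPDATE payment_account SET account_number = 'OLD-0001' WHERE customer_id = 1"
    )
    print(changed_accounts(conn))
```

With something like this in place, "we do not know whose data was affected" stops being an acceptable answer: the audit table lists every customer whose account number changed, along with the value it held before.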
From a customer service perspective, Capital One did a good job of reminding me why I have been a customer for so long. My Capital One card is not my best credit card; it has an extremely low limit, a fairly high interest rate (I got it right after college), and few benefits. I usually just use it to "firewall" myself from untrusted vendors online.
Megan transferred me back to customer service, and they reversed the "past due" fee and took the associated black mark off my record within a minute or two. She also told me that the message was passed to customer service to let them know what had happened globally, but obviously not everyone got the memo (which was odd, since I spoke to three customer service representatives and a floor manager). Despite my increasing frustration and volume of voice, the customer service representatives stuck with me and treated me quite well. I still like Capital One as a company and as a vendor of financial services.
But, at the end of the day, Capital One committed more than one of the top 10 "thou shalt nots" in IT with this incident.
- It allowed data to be severely corrupted.
- It deployed code without an appropriate rollback or backout plan or path.
- It did not notify its customers, even though the mistake causes late payments that result in fees and credit history problems if left uncorrected.
- It did not properly prepare the customer service team to handle the situation.
- It allowed the user to see that data had been corrupted, which has destroyed all trust in the system.
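The rollback item in the list above is the one that is easiest to show in code. What follows is only a sketch under my own assumptions: a hypothetical `payment_account` table, a hypothetical `run_upgrade` wrapper, and the invariant that a Web site upgrade has no business changing payment account data. A real deployment would need a far more thorough check, but the shape is the same: snapshot, change, verify, and back out if the verification fails.

```python
import sqlite3


class UpgradeVerificationError(Exception):
    """Raised when an upgrade changes data it was not supposed to touch."""


def snapshot(conn):
    """Return {customer_id: account_number} for the hypothetical account table."""
    return dict(
        conn.execute("SELECT customer_id, account_number FROM payment_account")
    )


def run_upgrade(conn, migrate):
    """Apply migrate(conn) in a transaction; back it out if account data changed.

    Returns True if the upgrade was committed, False if it was rolled back.
    """
    before = snapshot(conn)
    try:
        # The connection context manager commits on success and rolls back
        # if an exception escapes the block.
        with conn:
            migrate(conn)
            if snapshot(conn) != before:
                raise UpgradeVerificationError("payment account data changed")
    except UpgradeVerificationError:
        return False
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payment_account ("
        "customer_id INTEGER PRIMARY KEY, account_number TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO payment_account VALUES (1, 'CURRENT-1234')")
    conn.commit()

    def bad_upgrade(c):
        # Simulates an upgrade that overwrites current data with stale values.
        c.execute("UPDATE payment_account SET account_number = 'OLD-0001'")

    print(run_upgrade(conn, bad_upgrade))
    print(snapshot(conn))
```

The point is not the specific check. The point is that the decision to back out is made by the deployment itself, before any customer ever sees the bad data.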
The real problem with data corruption is that fixing it completely is nearly impossible unless you shut everything down and perform a full database restore from a backup that is known to be good. For something like a credit card system, that is close to impossible and possibly, even probably, illegal. It is why the folks who work on those systems get paid so much. An event like that puts the company out of business nearly instantly. Even if the system survives the issue at a technical level, the effect it has on the users' trust will ruin you. The last thing the user of a system that handles money or "identity theft class" data should ever have to ask is, "What else was changed or lost?" To the user, it looks like the system was hacked. There is a fear that someone else's account is now tied to your data. It makes users worry that their history with the company, past payments, or other details are also messed up.
Don't let this happen to you, your code, or your systems. There is just no recovering from it, folks. I am not taking any chances -- I am paying off my balance (through the mail) and closing the account that I have had open since right after college.
J.Ja
Note: Edited on 12/24/2007 to correct the spelling of "Capital One". Sorry folks, ever since I read "Swiss Family Robinson" I always spell it with an "o" even when I should not!
Justin James is the Lead Architect for Conigent.