Banking disaster

By JamesRL
Most of you non-Canadians may have missed this.

Canada's largest bank, the Royal Bank of Canada (RBC), applied a patch to its software and had major problems recovering from it.

I'm sure there are lots of lessons to learn about software testing, crisis communications and other areas.

It is easy to play Monday-morning quarterback, but are we all prepared to face something like this?


All Comments

by Oz_Media In reply to Banking disaster

I know the guy who suggested they use a different updater (he's a heavy coder and I don't understand much of what he does). He had written this same script for another major bank when they were patching and upgrading; I believe it emulates a drive mirror/image or something. This would not have happened if they had used the newer scripts. NOTE: This guy also custom-codes their PKI system, so he's already known to the bank.

Oh well, it didn't slow or impede a single cent of my money. It apparently affected only a VERY small portion of their transactions during a 3-hour period (even though that adds up, it was only a couple of hours, not days), which has apparently been sorted out now.

Your URL is broken; they added a %20 space in the word "article". Here's the condensed URL for anyone having problems.

The issue with the code was that this is ONE bank that still uses C++ scripting, and the code is MUCH MUCH harder to debug and not as stable as Python.

If this had been a Python script, as the PKI is structured, it would never have happened, since it was (so I am told) due to a redundant code error. This is eliminated in Python, as the code is far easier to comprehend and doesn't use redundant statements.

They are now asking for a second proposal from my friend's company and are investigating using Python in the future.

Call in show

by JamesRL In reply to Humor

There was a radio call-in show about it on CBC, with one spot-on caller. He said it shouldn't be the guy who wrote the script who should be worried; it should be the guy who tested it and signed off on it.

I have had similar situations where something works perfectly on the test server and doesn't work on the production server. It was because the two configurations weren't totally in sync in terms of patches and other configuration issues.

Our solution was to: a) keep the test box and the production server in sync as much as possible and b) institute a series of "validation" tests in the production environment to validate the assumptions we made from our testing server.

This saved a lot of headaches. I'm not sure whether it would have caught the RBC error, but it's one approach.
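As a rough illustration of points (a) and (b) above, here is a minimal Python sketch. Everything in it is an assumption made up for the example: the function names `config_drift` and `run_validation`, the settings in the two dictionaries, and the idea that each server's configuration can be dumped to a simple key/value inventory. It is not what we ran, and certainly not what RBC ran; it only shows the shape of the idea.

```python
def config_drift(test_cfg, prod_cfg):
    """Return {setting: (test_value, prod_value)} for every setting that differs.

    A setting missing on one side shows up with None for that side.
    """
    drift = {}
    for key in sorted(set(test_cfg) | set(prod_cfg)):
        test_val = test_cfg.get(key)
        prod_val = prod_cfg.get(key)
        if test_val != prod_val:
            drift[key] = (test_val, prod_val)
    return drift


def run_validation(checks):
    """Run each named check (a zero-argument callable returning True/False).

    Returns the names of the checks that failed or raised an exception.
    """
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False  # a check that blows up counts as a failure
        if not ok:
            failed.append(name)
    return failed


# Hypothetical inventories for the two machines (illustrative values only).
test_box = {"os_patch": "SP3", "db_version": "8.1.7", "batch_timeout": 300}
prod_box = {"os_patch": "SP2", "db_version": "8.1.7", "batch_timeout": 300}

print(config_drift(test_box, prod_box))  # prints {'os_patch': ('SP3', 'SP2')}
```

The drift report covers (a): any non-empty result means the test box no longer matches production, so test results may not carry over. `run_validation` covers (b): after a change goes live, a handful of small checks run in production confirm the assumptions made on the test server.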

Not really the same issue with this type of script

by Oz_Media In reply to Call in show

My point here is that the script is ancient. Other banks have been using a more stable system that runs in compressed code, so where it is tested makes no difference. The issue was buggy code that didn't need to be buggy, as was pointed out to RBC last year (actually, I think it was in 2001), and they simply didn't see a need for it.

Whether the code was tested or not means very little when the TYPE of code used is the main issue.

It will take 2000 lines of C++ code to equal what you can achieve with 25 lines of Python. I'll never forget showing a friend a SMALL C++ script I wrote that was only just over 60 lines. I thought I did pretty good, until he showed me how the EXACT same functions and results were achieved in less than 8 lines of Python.

The code itself is the problem, not the person writing or testing it. Had they used something more efficient and easier to spot errors in, they would not have an issue to address.

RBC didn't follow suit with the other institutions they are grouped with, which had migrated to Python for their PKI code instead of the hackable, full-of-holes products we so often see due to over-coded and redundant C++ code.

Importance of a good and regularly tested DRP

by MooneyS In reply to Banking disaster

From what I gather - one of the lessons learnt is having a good, tested plan for reverting to the original state of the system - something RBC did not have in place!

Secondly, the patch was obviously not tested adequately before being introduced into the production network - many organizations do not have an adequate test network that simulates the actual production environment. Testing a patch on one or two boxes is not the best answer!

Another thing that comes to mind is perhaps having dedicated staff for testing patches.

Having a proper Disaster Recovery Plan is crucial for most businesses, but especially for a financial institution, and testing it regularly to ensure its integrity is also critical. On the surface, it would appear that RBC did not take this seriously.

It can shake the trust of even the most loyal customers and partners!
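To make the first lesson concrete, a plan for reverting to the original state can be as simple in outline as "snapshot, patch, verify, restore on failure". The Python sketch below is only an illustration under stated assumptions: the function name `apply_with_rollback` is hypothetical, the "system" is assumed to be a single directory of files, and `apply_patch` and `verify` are placeholder callables supplied by whoever runs it. A real banking system would need far more than this, and nothing here reflects how RBC actually works.

```python
import shutil
import tempfile
from pathlib import Path


def apply_with_rollback(target, apply_patch, verify):
    """Patch the directory `target`, restoring the original files on failure.

    apply_patch(target) makes the change; verify(target) returns True if the
    patched state is acceptable. Returns True if the patch was kept and
    False if it was rolled back.
    """
    target = Path(target)
    with tempfile.TemporaryDirectory() as tmp:
        backup = Path(tmp) / "backup"
        shutil.copytree(target, backup)  # snapshot the original state first
        try:
            apply_patch(target)
            if verify(target):
                return True
        except Exception:
            pass  # any error while patching or verifying means roll back
        # Roll back: discard the patched files and restore the snapshot.
        shutil.rmtree(target)
        shutil.copytree(backup, target)
        return False
```

The point of writing it down this way is that the restore path is exercised by the same routine that applies the patch, so it can be rehearsed on a test box as often as the patch itself.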
