Operating systems optimize

Time to improve application deployment


I am sure that most conscientious developers do their best to make sure that the next coder who tries to maintain the program has an easy time. But how many of us pay any attention to the system engineers who need to actually deploy these things? The answer is, quite unfortunately, not enough of us.

UNIX is a great example of what I am talking about. Up until very, very recently, UNIX was known to be a system that only a programmer could like. Being a UNIX systems administrator involved knowledge of things like makefiles, building kernels, applying code patches, and so on. As a result, UNIX got a bad reputation that continues long after the situation has been changed.

Likewise, many large "enterprise class applications" are so complex to install and configure that companies will spend as much (or even more!) to hire outside experts to install them as they did on the original software itself. Now I do not know about you, but I think that if installing Microsoft Office cost $500 or installing Photoshop cost $600 in outside help, these applications would not be used too much. Even when you go to a PC shop, they rarely charge for more than an hour’s worth of time to install Windows. The current state of desktop application installation (including *Nix and Mac) shows that installation does not need to be tough. Most desktop apps, regardless of platform, use that platform’s standard installation system. Once the user learns that system, that is it.

And then we get to the custom coded solutions that most developers are working on. What a mess! At best, there might be some .aspx or .jsp or .php (or whatever) files to copy over. At the medium end, there might be some compiled binaries or bytecode to move and possibly an application server restart. And then we get the really hairy ones, the kind with custom DB scripts that may or may not overwrite existing configuration data, the ones that require a full reboot, the ones that involve stopping and then starting the database server, and so on. You know the kind that I mean: downtime, possible failure, and lengthy recovery times.

On the desktop side, anything involving Oracle is a guaranteed mess when it needs the Oracle client. Java desktop apps are another pain point, particularly when they need a non-standard JVM. There is nothing that will aggravate desktop support quite like juggling four apps that need three different JVMs. And, of course, the users love installers or patches that "require" a reboot that really is not needed.

Why are we still stuck in this situation? Sure, some of the systems have improved, like application servers that maintain session state throughout a restart or are nice enough to reload bytecode if the file changes. On the other hand, redeploying a Java EAR file or running an MSI installer to update the application is a lot more time consuming and occasionally more painful than running "cp -R /path/to/staging/* /path/to/production".

When the deployment fails, our only real option is to stop the application server, stop the database server, restore the database, roll back the files, and fire everything up again. This can take hours, during which customers are furious. At least in a bigger system, there is some load balancing or clustering going on, so the system is not hard down. But still, no one likes the situation at all.

I am really not sure what the solution is. I know one major item is to fully and completely separate configuration from runtime data. That takes away a lot of the potential damage and difficulty of both the deployment and the rollback (even in a clustered scenario, if you deployed bad metadata or configuration to the central storage area, all nodes are bad). Another item is to make sure that the deployment is as easy as possible for the person doing it and throws off clear, concise error messages, and does not commit anything unless all operations are 100% complete. After all, it might not be the rock star sys admin or coder doing the move, but a summer intern who has an escalation path in case he notices an error. Finally, make sure that as many scripts as possible pull from the system’s environment and not your test box assumptions. Remember, your backend Ubuntu machine may very well have a different directory hierarchy from the Red Hat or Solaris machine that the system guys are deploying it to.
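As a rough illustration of the "do not commit anything unless all operations are 100% complete" idea, here is a minimal Python sketch. It is only one possible approach, not a recommendation of a specific tool: it assumes a POSIX system where symlinks are available, a layout where each release lives in its own directory, and configuration stored outside the release directories. The function names and the APP_STAGING, APP_RELEASES, and APP_CURRENT environment variables are hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path


def paths_from_environment():
    """Read deployment paths from the environment instead of hard-coding them.

    APP_STAGING, APP_RELEASES and APP_CURRENT are hypothetical variable names.
    """
    try:
        return tuple(Path(os.environ[name])
                     for name in ("APP_STAGING", "APP_RELEASES", "APP_CURRENT"))
    except KeyError as missing:
        raise SystemExit(f"deployment aborted: environment variable {missing} is not set")


def switch_current(target: Path, current_link: Path) -> None:
    """Point current_link at target by renaming a fresh symlink over the old one."""
    tmp_link = current_link.with_name(current_link.name + ".tmp")
    if tmp_link.is_symlink():
        tmp_link.unlink()
    tmp_link.symlink_to(target, target_is_directory=True)
    os.replace(tmp_link, current_link)  # the rename is atomic on POSIX systems


def deploy(staging: Path, releases: Path, current_link: Path, version: str) -> Path:
    """Copy the staged build into releases/<version>, then switch to it.

    The running application sees no change until the final symlink swap.
    """
    releases.mkdir(parents=True, exist_ok=True)
    target = releases / version
    if target.exists():
        raise FileExistsError(f"release {version} already exists at {target}")

    # Copy into a scratch directory first, so a failed copy never leaves a
    # half-populated release directory behind.
    scratch = Path(tempfile.mkdtemp(prefix=f".{version}-", dir=releases))
    try:
        shutil.copytree(staging, scratch, dirs_exist_ok=True)
        scratch.rename(target)
    except Exception:
        shutil.rmtree(scratch, ignore_errors=True)
        raise

    switch_current(target, current_link)
    return target
```

With this layout, rolling back the files is just another call to switch_current() with the previous release directory. It does nothing for database changes, which still need their own rollback plan.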

There is a lot more work to be done until we all have easy-to-deploy and patch systems, but until then, let’s do our best to keep the server folks happy.

J.Ja

About

Justin James is the Lead Architect for Conigent.

23 comments
Flyers70

I do packaging and deployment for a living. There are two really indispensable items I believe orgs that deploy software should have:
1.) A rock solid, agreed-upon process for how software gets pushed through the deployment cycle, replete with test environments.
2.) A web-based change management portal to track it all.
My biggest headaches? They are the following:
1.) Weak installation packages written by vendors (actually, Microsoft does a pretty good job here... you can pretty much automate their stuff rather easily). Siebel is EASILY the worst I have ever dealt with. And I still don't understand why Oracle felt the need to write their own installation system. There are many good commercial ones available.
2.) Short-sighted, self-centered developers who believe their stuff should take precedence over everyone else's. In my experience, most developers are laughably bad admins.
3.) Unreasonable schedules. I understand time to market is important, but you shouldn't sacrifice quality, consistency, etc., and too often I have seen quality and consistency traded away.

Justin James

How could I forget Oracle installs, not just the client, with their insane Java GUI thingy? My favorite is when it sets the file permissions wrong, which you do not find out until you try running it, and due to some bizarre behavior, you need to correct the permissions and reboot the server to get it to work... just what you need on a Production server. J.Ja

tszecsy

Hi, If you use Oracle 10g and OCI support is enough then use Oracle Instant Client, where it is a file copy to install client software. On the server side I think there is no way to avoid it. If you use Oracle 10g XE (the free version of Oracle RDBMS) then it uses some kind of Windows installer, no Java :) Tamas Szecsy

Justin James

... it was still fairly miserable, if I recall. Unfortunately, Oracle has really poor documentation, and it is a real trick to figure out what you actually need to install to get your app to connect to a server, vs. all of the potential options. It is kind of like going to Sun's Java page, and trying to figure out what you actually need to download and install to try programming in Java... too many packages, not enough explanation. J.Ja

Wayne M.

My favorite Oracle upgrade story was, I believe, the Oracle 6 client upgrade. The installer for Oracle 8i didn't support an upgrade and said to uninstall the old version. Well, the uninstaller left Windows NT in an unbootable state; it removed one of the critical operating system files! Luckily, we had tested on a lab server and could rebuild it. Unfortunately, during the remote upgrade of 1 of 12 servers, the engineer messed up and I had to get a new server shipped out via next day express.

Justin James

... in other words, better than par! I have always been bewildered at Oracle's market share, considering that their software is only of good quality at the basic level of stability and performance. Nothing about it is pleasant. Its performance edge is fairly marginal at this point, it is riddled with security holes, it is a PITA to deal with, and its stability is matched by others. And all of its competitors have a better ecosphere to go along with it. J.Ja

Tony Hopkinson

Not really, they are very different jobs. I've done both, at the same time when I was a one man shop, in a 24/7 environment as well. Developers tend to weigh more heavily to the creative, i.e. they learn by making mistakes, sometimes very profitably. In operations 'we' concentrate on cutting down on the risk of making a mistake. It's a good idea to rotate your developers through the deployment team; they are never going to appreciate the issues, particularly how they can ease deployment, without getting some first hand experience. Our biggest cock up came from deploying MSDE along with our applications. It just rolls back if File and Printer sharing isn't enabled. Oh, and that's for the agent service, which we didn't want anyway. Never got an explanation as to the why of this asshole dependency either. Of course every machine we tested it on had it enabled. The error message in the install log, once you find the damn thing, was completely uninformative, requiring trawling all over the WWW to work out what it meant. Another task for the installation checker.

john_mattson

Been installing software for 30+ years. Have seen most all of it, The Good, the Bad, and the UGLY. I could say that painful installs DO provide job security; but I would really rather do more good installs, and have some time to pretty things up, make sure it all works, and make nice to those who pay me. In the construction business they refer to "Call Backs". Hate them, they cost, not pay; and the users do not forget. Creating good install packages should be a highly paid specialty. Their final test should be to give it to a DMV employee and have them successfully install without help. (Apologies to good DMV folks, recent bad experience.)

Justin James

... the DMV folks. I like to test software on my mother. She's a pretty smart lady who has been around PCs for a while, but definitely not an expert, and represents the average office worker fairly well in that regard. If she can't use it from CD unwrapping to day-to-day usage to uninstall, time to go back to the drawing board. J.Ja

Tony Hopkinson

Deploying to your own machines is something that there is no excuse for failing at. Back-outs, good procedures, tested ones! Stick production in a VM and deploy to it. Best tip I can give: if the update script, exe, procedure, whatever goes off track, stop, find out why, roll all the way back and start again with version + 1. Do not fix one in mid stride and assume you know exactly what went wrong. Installs on development machines do not count; that's for fixing spelling mistakes and such. Deploying to clients' machines, as well as the above, is fraught with difficulty. That's when you find out you can't put MSDE on because file and printer sharing is disabled. That you need to elevate under Vista to map a drive. That even though you never tried it, some twit is trying to put it on XP Home or, worse still, Media Centre, or they aren't patched or service packed up to date. That the firewall is blocking something, a false positive from an anti-virus (worse still, a true positive :( ), that the user installing doesn't have the necessary rights.... An installation checker is a must in my opinion. There will always be systems where installation will fail; concentrate on the majority. Have some machines to test deployment on, various flavours of Windows, with possibly different combinations of web and database servers and network topologies. Don't forget Citrix type set ups as well; workgroup vs domain can have a massive impact. It's a damn nightmare, and the chances of waking up from it are minimal. Put some resource into this, otherwise your carefully crafted genius level code :p will go straight into the bin.
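To make the "installation checker" idea a little more concrete, here is a minimal Python sketch of a pre-install check. It is an illustration under assumptions, not a complete checker: the function name, the three checks chosen (write access, free disk space, a free local TCP port) and the thresholds a caller would pass in are all hypothetical, and a real checker for the cases above would also need OS-specific checks such as service packs or file and printer sharing.

```python
import os
import shutil
import socket
from pathlib import Path
from typing import List


def check_install_target(install_dir: Path, min_free_bytes: int, port: int) -> List[str]:
    """Return a list of human-readable problems; an empty list means 'go ahead'."""
    problems = []

    # The install directory may not exist yet, so probe its nearest existing parent.
    probe = install_dir
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent

    if not os.access(probe, os.W_OK):
        problems.append(f"no write permission on {probe}")

    free = shutil.disk_usage(probe).free
    if free < min_free_bytes:
        problems.append(f"only {free} bytes free on {probe}, need {min_free_bytes}")

    # Try to bind the port the application will listen on (hypothetical requirement).
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind(("127.0.0.1", port))
        except OSError:
            problems.append(f"TCP port {port} is not available")

    return problems
```

The point of returning every problem at once, rather than stopping at the first, is that the person installing gets one clear list to fix or escalate before anything has been changed on the machine.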

Justin James

"Put some resource into this otherwise your carefully crafted genius level code will go straight into the bin." I know exactly what you mean. I've seen it happen too many times! Especially when the customer has a 90 day period with minimal financial commitment. Or worse, when you've dropped big bucks on software, and never got it past the test server because it was such a mess. That kills careers. J.Ja

gardoglee

Planning of the deployment is usually not a design consideration for large system programming projects. Instead, a few hardy fools are allocated to figure out the 'conversion', which is expected to be a simple reformat of existing data files or database tables. The rest is assumed to be simple. Try to bring up deployment as a necessary requirement during design and people will assume you have a VCR at home blinking 12:00, and shouldn't be in the room at all. It isn't until someone discovers at a client site that some critical elements are missing at a critical time, or some critical product is missing on users' machines, or worse yet that some critical product already on the remote users' machines is incompatible with the new application (and wasn't present on the test machines), that deployment gets any attention.

Justin James

... are always the ones who get hurt the worst by this attitude that you describe, and of course, they are supposed to be the reference accounts. That is one thing to be said for hosted applications, you let someone else worry about the deployment... J.Ja

Wayne M.

I like to treat deployment as the most critical feature in a development effort. If the software can't be deployed, the rest of the features are immaterial.
1) Make installation the first iteration deliverable. As soon as there is an executable, web page, or whatever with a version number or even just "Hello, World", install it on a near production (or perhaps even a full production) system.
2) Automate the installation. Take the time to write an installer, scripts, custom executables, etc. in whatever combination needed. Even though the deployment may be a one time event, it needs to be tested multiple times. Do not accept any results saying, "Gee, I guess I forgot a step this time."
3) Have an installation validation, preferably automated. Verify all files are put in the expected directories. Verify the version numbers or dates of installed files (hint, set the file date to its create date in the install routine). Verify basic operation. Don't just read over the install program, verify it actually does what you think it does.
4) Always have a rollback plan and test it. Know exactly what you plan to do if an error crops up. Plan for both immediate detection and detecting the error after about a week of operation. You may have to have both a partial and full rollback plan. Some items, such as DLL upgrades or Oracle Client upgrades, may be riskier to roll back than to just leave in place.
5) Avoid configuration options in developed software. Lock it down! Why test the software and then leave the door open for someone to misconfigure it?
6) Always use the install utility internally to set up new machines. Do not use any short cuts or simply copy files to save time. If the install fails, drop everything else, debug and fix the installer. Remember, the installer is your highest priority deliverable.
7) Always have at least one test machine. Developer machines are never configured the same as user machines; try to keep them close, but recognize they will differ. It is often useful to have some baseline user configurations ghosted for test.
I am always surprised that we will spend months developing an application and then try to define the deployment approach the day before release. Develop the deployment infrastructure first and avoid the "But it works on my machine" syndrome.
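The automated installation validation described above could be sketched roughly as follows in Python. Note the swap: instead of the version numbers or file dates the comment suggests, this sketch compares SHA-256 checksums against a manifest built from the tested build, which is a different technique with the same goal. The function names and the manifest format (relative path mapped to hex digest) are assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large binaries do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Record every file under root as {relative path: checksum}."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def validate_install(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Compare an installed tree against the manifest and list every mismatch."""
    errors = []
    for relative, expected in manifest.items():
        installed = root / relative
        if not installed.is_file():
            errors.append(f"missing: {relative}")
        elif sha256_of(installed) != expected:
            errors.append(f"changed: {relative}")
    return errors
```

One way to use it: run build_manifest() against the build that passed testing, ship the manifest with the installer, and run validate_install() on the target machine as the installer's last step. It checks that files landed where expected; it does not replace the "verify basic operation" step.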

Justin James

"I am always surprised that we will spend months developing an application and then try to define the deployment approach the day before release. Develop the deployment infrastructure first and avoid the "But it works on my machine" syndrome." Too many times I've seen great software get a bad rep or worse, never get past the customer's 90 day pilot program because of this attitude! J.Ja

monte_bertrand

I always try to treat my client as a customer, regardless of whether they are internal, external, or whatever. The deployment dept should be treated with as much consideration as the guy who signs your check. I think a deployment should be like the treatment you get at a Jiffy Lube. As opposed to other shops where they just slam down the hood and say "you're all set", Jiffy Lube has a person come by and say "Here is the service we performed for you today, sir." Even if it's a checklist produced on a spreadsheet that gets emailed to all parties, there should be a report to stakeholders of what you did during the deployment process. You would be surprised how many times people will come back to you and say "Whoa, we did what to the production box?". (No matter how many times you tell them beforehand, they never seem to read the email until after you deploy.) Lots of proactive steps during the staging process will come up when you take this approach. That way, the real end user doesn't have to deal with as many hiccups. "So, today sir, we suggest you upgrade to the new Vista air filter ..."

Justin James

... to ease application deployment pain? J.Ja

apotheon

Between you talking about the importance of deployment as a development concern, and Wayne M.'s "Deployment Philosophy" post, I was inspired to write up my own set of guidelines for developing deployment procedures. I composed it mainly as a means of reminding myself of the important factors involved in deployment planning for developers, but sometimes find that the best way to make it stick is to share it with others. That also makes it more difficult to justify doing a half-assed job of specifying a set of guidelines -- it has to stand up to public scrutiny: 10 tips for developing deployment procedures (or: Deployment Is Development), http://sob.apotheon.org/?p=262 If you have (or if anyone else, for that matter, has) any suggestions for how I might improve the guideline list, please let me know. I'm all about making things better.

Tony Hopkinson

Probably the only jib would be two systems to test deployment on, especially in Windows world. A good thing to do, with the customer's consent of course, is to collect set up info: Windows version (db and web as well, possibly), service packs, memory size .... Possibly as part of the install itself. Build up a picture of your deployment targets. Even without going mad, we test deployment on many combinations (server, client and network), each one subtly different.

Justin James

Good work on that! I agree with Tony, there should be an easy way (opt in needed, of course!) to collect system info for both failed *and* successful deployments, to help the developers out. J.Ja
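As a small illustration of opt-in collection, here is a Python sketch that gathers basic platform details for both failed and successful installs. It uses only the standard library's platform module, so it does not cover everything mentioned above (memory size, database and web server versions, and service packs would need other, OS-specific sources). The function name and the field names in the report are hypothetical.

```python
import json
import platform
from typing import Optional


def collect_system_info(succeeded: bool, opted_in: bool) -> Optional[str]:
    """Return a JSON report of the deployment target, or None without consent."""
    if not opted_in:
        return None  # collect nothing unless the customer agreed
    info = {
        "install_succeeded": succeeded,
        "os": platform.system(),
        "os_release": platform.release(),
        "os_version": platform.version(),
        "machine": platform.machine(),
    }
    return json.dumps(info, sort_keys=True)
```

How the report gets back to the developers (email, upload, a file the customer sends along with a support ticket) is left open here; the sketch only shows that the consent check comes before any data is gathered.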

Deadly Ernest

The boss wanted a new application deployed that I said wouldn't work right with our network set up. He insisted it just loaded and ran. He claims more expertise than me. He wanted it on before I went on 4 weeks leave. So when it came to deployment, we loaded it on the server together, made sure he was happy with everything and left it to auto deploy via our auto update system overnight. This was a Thursday evening; Friday morning at 3.00 am I'm off driving interstate. About 11 am he rings my mobile to tell me half the corporate network is down due to the application. I tell him to log onto the server and run a script named 'believe_me_now', password 'Rasberry'. This deactivated the new application and started a renewed auto update for the whole network, taking it back to an hour before we did the installation. An hour later the whole system is back up OK. And no, I didn't get fired for this. The next two application deployments went down while I was again on leave. I just seemed to have perfect timing; my leave applications were approved before the deployment timing was announced, but it did look very suspicious. One would think I could work out how long it took the people to work out their deployment strategy and timing. It was rather simple to do. After that, when I put in for leave, the boss checks what's about to be deployed.

Justin James

That one is typical... the boss who believes the vendor's "easy deployment" promises, all evidence from the actual workers to the contrary. J.Ja

Deadly Ernest

so well. This same boss thought we could buy a single appliance that did anti-virus and web proxy in one and use it to replace a whole secure gateway with built in redundancy. We tried to tell him the mail server wouldn't work through it, as the system came with totally default settings that, for security reasons, blocked all but the few ports needed for the AV and the web proxy. Hate to think how much it cost to fix the damage at the several client sites where he had it installed. I made a point of including in the branch meeting minutes with the big boss that the device would NOT work as advertised and I'd proven that. For that I GOT FIRED; well, contract not renewed is the legal term. I heard later several big bosses failed to make bonuses the next year due to the costs of fixing up that little faux pas. Edited to add: the default settings could NOT be user changed.