Software optimize

Alternatives to using Office COM objects within web apps

Don't use Office COM objects within web applications. Developer Justin James explains why it's a bad idea and suggests three better alternatives.

A requirement that comes up at least several times a year for me is "how do we work with Office documents in our web application?" The common thought by those who have never done it is almost invariably to host the Office COM objects and use them to manipulate the documents. After all, nothing is definitely going to be 100% compatible, a copy of Office is pretty cheap, and the techniques are well known and documented. Unfortunately, this is the worst thing possible to do.

The problem with working with Office inside a web application is simple: It is a desktop application. For one thing, the various pieces sometimes require or expect user interaction, so the object can lock up. And it requires a desktop session and for the Office application to be started, which can be a performance drag. There is a lot more to it, but Microsoft has made it very clear that using Office COM in web apps is a really bad idea. There are three better alternatives.

The first one is to re-evaluate the need for manipulating Office formats. Perhaps the use case officially says "import data from Excel" but the real use case can be met with a CSV file import, or the "must read/write Word documents" in the feature list is really a substitute for an on-screen HTML editor component. Many times when I have seen the "needs to work with Office files" requirement, deeper investigation showed it was not a true need.

The more likely alternative is to buckle down and get a library to work with Office formats. I have used Syncfusion's packages and really liked them; Syncfusion packages also handle PDFs, making the offering a fine value. I have heard good things about Aspose. There may be some less expensive or free components available.

The third alternative is to ignore the older Office formats and only work with the newer XML-based formats, which are fairly easy to manipulate (just unzip them and there's a pile of XML inside). However, by working directly with the XML, you lose the business logic that the format represents. Depending on your needs, this may be just fine.

Whichever path you take, remember: do not use the Office COM objects in a web application.

J.Ja

Keep your engineering skills up to date by signing up for TechRepublic's free Software Engineer newsletter, delivered each Tuesday.

About

Justin James is the Lead Architect for Conigent.

5 comments
HusamKhoulahTech
HusamKhoulahTech

For Xml based formats I think the best alternative is the Open XML SDK offered by Microsoft

Duke E Love
Duke E Love

I have been using it for spreadsheets for a good while now and I am quite happy with it.

clavius
clavius

I've had good luck with the Apache POI library (http://poi.apache.org/). It lets you open various kinds of Office documents (including Word and Excel) in a Java environment and do useful things with them.

paulschapman
paulschapman

As you point out manipulating office documents on a server with the COM objects is a really bad idea. There is however a way of doing it without using the COM objects. For a number of versions now Office has used XML to store documents. This provides a perfectly good alternative. The XML can be parsed if required and can be generated in order to create a Word document.

Mark Miller
Mark Miller

I remember talking to you about doing this probably 6 years ago. Ever since Office '07 came out (I think) they've had .Net components that were designed to be run on a server. I'm surprised this is still an issue. The version of Office I was using when I did this was about 3 years old. I know you say "Don't do it," but in my situation I really didn't feel like I had a choice. As I recall, I discussed it with my boss, and he recommended I do it. I'm fairly sure I discussed the CSV option, and the customer didn't want to do that. I can't remember why. One of the other options at the time was a component of SQL Server that would import/export from/to various data formats, including Office. It did it lightning quick, too. For some reason that wasn't going to work, either. I asked about the 3rd party options that were available, and neither he nor the client were willing to shell out the money for them. I told my boss about the warnings I'd heard about using Office on the server, and he said, "They're only going to have one or two people using that feature at the same time anyway." The main concern I heard at the time was running Office in a multi-threaded environment. It's components were not thread-safe. I felt as though I was given no other option. So I did it, and it seemed to work, though it was pretty slow. I had to put up a "wait" screen while it ran, because it could take as long as a couple minutes to process a single spreadsheet. They weren't even that large. It definitely felt like a kludged solution. I remember the only way I was able to really run it safely was to make sure I null'd out *every single reference* to a COM object before I programmatically quit out of Office, and shut down the translation object I had written, plus I had to do something risky, which was to run GC.Collect() to get the COM components to shut down on demand. Otherwise they could've had several "zombie" instances of Office running on their server, hogging memory. The reason was the COM components wouldn't shut down until the GC got around to them, and so Office wouldn't shut down until then, either. I never heard any complaints about it. In fact when the customer tried it they were thrilled almost beyond words. I agree with your admonition to not do it, but I'd say only do it as a last resort, and with a serious restriction on it. It will absolutely not work if you bring up more than 3 instances at once, from what I'd heard at the time. It is not a scalable solution. Call it for what it is: a kludge, but one you can make work for really specialized circumstances.