How do I... Modify Word documents using C#?

Accessing Word components from C# isn't quite as straightforward as many other features of C# and the .NET Framework. With that said -- it's not rocket science either. You simply need to know what to reference and how to use the components. Zach Smith lays out exactly what you need to do.

This blog post is also available in PDF form in a TechRepublic download, which includes a sample Visual Studio project file with all the necessary code.

Referencing the Word assemblies

The first step is to reference the correct assemblies for Word. The assembly name varies with the version of Word you have installed -- in my case, it is version 11, or "Microsoft Word 11.0 Object Library."

The Add Reference dialog is shown in Figure A with the correct reference highlighted.

Figure A

Word Reference
Note: The Word library is found under the COM tab of the Add Reference dialog.

After you click OK, you should see the following under References in your C# project:

  • Microsoft.Office.Core
  • VBIDE (This is used for Visual Basic For Applications functionality)
  • Word

Using the Word components

Once you have the references set up, you can begin using the Word components. However, these components are a little tricky to deal with and can act in unexpected ways. They work by creating an instance of Word under the current session and giving you access to Word's functionality. You can even see the Word instance running if you instruct it to be visible. This is both intriguing and scary. On the one hand, you have access to a wealth of Word functionality, but on the other, every new instance of Word takes up valuable RAM.
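
As a minimal sketch of that lifecycle, the snippet below starts Word, makes the window visible, and then quits so the instance does not stay resident. It assumes the interop assembly exposes the Microsoft.Office.Interop.Word namespace (aliased here as Word; depending on how the reference was added, your project may expose a plain Word namespace instead). The class name is purely illustrative.

```csharp
using System;
using System.Runtime.InteropServices;
using Word = Microsoft.Office.Interop.Word; // assumption: PIA namespace

class WordLifetimeSketch // hypothetical name
{
    [STAThread]
    static void Main()
    {
        object missing = System.Reflection.Missing.Value;

        // Starts a Word instance under the current user session.
        Word.Application app = new Word.Application();
        try
        {
            // Show the window so the automated instance can be watched.
            app.Visible = true;
            Console.WriteLine("Running Word version " + app.Version);
        }
        finally
        {
            // Quit so the instance does not linger in memory. The cast
            // selects the Quit method rather than the Quit event.
            ((Word._Application)app).Quit(ref missing, ref missing, ref missing);
            Marshal.ReleaseComObject(app);
        }
    }
}
```

Wrapping the work in try/finally is one way to make sure Word is shut down even if a call in between throws.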

For this reason, it's best to use these components in an environment where you don't expect a lot of heavy use, or where you know only one person will be using it at one time. For instance, a Web server is not a good place to use this functionality. However, using this in a client-based application would be fine.

Okay, now let's get to the code. To use the Word components, you will follow these basic steps:

  1. Instantiate a new Word.Application object.
  2. Create a Word.Document object, either by opening an existing file or by adding a new document.
  3. Call Word.Document.Activate to make sure our document has focus.
  4. Do some action on the document (we'll replace some text in the example).
  5. Save the document using Word.Document.Save or Word.Document.SaveAs.
  6. Close the document.
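
The steps above can be sketched roughly as follows. This is an illustration under assumptions rather than the listing from the download: the file path and class name are hypothetical, the Word alias assumes the Microsoft.Office.Interop.Word namespace, and the argument counts match the Word 11 object library (other versions may differ).

```csharp
using System;
using Word = Microsoft.Office.Interop.Word; // assumption: PIA namespace

class ModifyDocumentSketch // hypothetical name
{
    [STAThread]
    static void Main()
    {
        object missing = System.Reflection.Missing.Value;
        object fileName = @"C:\Temp\letter.doc"; // hypothetical path
        object doNotSave = Word.WdSaveOptions.wdDoNotSaveChanges;

        // 1. Instantiate the Word application.
        Word.Application app = new Word.Application();
        try
        {
            // 2. Get a Document object by opening an existing file.
            Word.Document doc = app.Documents.Open(ref fileName,
                ref missing, ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing, ref missing);

            // 3. Make sure the document has focus.
            doc.Activate();

            // 4. Act on the document: type text at the current selection.
            app.Selection.TypeText("Text inserted by the sketch. ");

            // 5. Save the changes back to the same file.
            doc.Save();

            // 6. Close the document (cast avoids the Close event ambiguity).
            ((Word._Document)doc).Close(ref missing, ref missing, ref missing);
        }
        finally
        {
            // Shut Word down without prompting about unsaved documents.
            ((Word._Application)app).Quit(ref doNotSave, ref missing, ref missing);
        }
    }
}
```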

The code shown in Figure B and available in a Visual Studio project file in the download version of this document demonstrates these steps. This code opens an existing document, replaces several tags (as if the document is a letter), inserts some text before and after the existing content, and then saves the document.

Figure B

Code Listing 1

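You'll notice the "ref missing" parameter in several of the Word object calls. This is due to the Word components being accessed through COM Interop: the methods take their optional parameters by reference, so an argument has to be supplied for each one. We pass the missing variable (an object typically set to System.Reflection.Missing.Value) to tell the components that we want the default value for that particular parameter.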
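
To make the pattern concrete, here is a small sketch that saves an open document as plain text, supplying only the file name and format and passing the missing placeholder for everything else. The helper and class names are hypothetical, and the sixteen-argument SaveAs call assumes the Word 11 object library.

```csharp
using Word = Microsoft.Office.Interop.Word; // assumption: PIA namespace

static class SaveAsSketch // hypothetical name
{
    // Saves an open document as plain text. Every argument passed as
    // "missing" falls back to Word's default for that parameter.
    public static void SaveAsPlainText(Word.Document doc, string path)
    {
        object missing = System.Reflection.Missing.Value;
        object fileName = path;
        object format = Word.WdSaveFormat.wdFormatText;

        doc.SaveAs(ref fileName, ref format,
            ref missing, ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing);
    }
}
```

A call such as SaveAsSketch.SaveAsPlainText(doc, @"C:\Temp\letter.txt") would write the text version next to the original; the path here is only an example.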

Another thing to point out is the FindAndReplace function, which is actually a helper function that wraps around the Word.Application.Selection.Find.Execute method. Figure C shows the code of the FindAndReplace function.

Figure C

Code Listing 2

Since there is no text selected, the Find method defaults to searching the entire document.
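
A helper along these lines might look like the sketch below. It is not the author's listing; the class name and the choice of options (case-sensitive match, wrap around the document, replace all) are assumptions, and the fifteen-argument Execute call assumes the Word 11 object library.

```csharp
using Word = Microsoft.Office.Interop.Word; // assumption: PIA namespace

static class FindReplaceSketch // hypothetical name
{
    // Replaces every occurrence of findText with replaceText.
    // Returns true if Word reports that the search succeeded.
    public static bool FindAndReplace(Word.Application app,
                                      string findText, string replaceText)
    {
        object missing = System.Reflection.Missing.Value;
        object find = findText;
        object replaceWith = replaceText;
        object matchCase = true;
        object forward = true;
        // Keep searching past the end of the document back to the start.
        object wrap = Word.WdFindWrap.wdFindContinue;
        // wdReplaceAll has the numeric value 2.
        object replace = Word.WdReplace.wdReplaceAll;

        return app.Selection.Find.Execute(
            ref find, ref matchCase, ref missing, ref missing, ref missing,
            ref missing, ref forward, ref wrap, ref missing, ref replaceWith,
            ref replace, ref missing, ref missing, ref missing, ref missing);
    }
}
```

With a helper like this, replacing a tag in a letter becomes a one-line call such as FindReplaceSketch.FindAndReplace(app, "<name>", "Adrian"), where the tag and value are made-up examples.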

Other functionality

Take note of the number of parameters you can send to the Find.Execute method shown above. As I said before, a wealth of functionality is available from the Word components. More than you'll probably ever need.

Some of the more useful functionality is listed below:

  • CheckSpelling -- Runs spell checker on the document
  • DowngradeDocument -- Downgrades a document so it can be opened in previous versions of Word
  • FitToPages -- Decreases the font size so that the document will fit into a certain number of printed pages
  • Password -- Defines a password for the document
  • PrintOut -- Prints the document
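
As a rough illustration of two of these members, the sketch below sets a password on an open document and sends it to the default printer. The helper name is hypothetical, and the eighteen-argument PrintOut call assumes the Word 11 object library; other versions may expect a different number of arguments.

```csharp
using Word = Microsoft.Office.Interop.Word; // assumption: PIA namespace

static class OtherFeaturesSketch // hypothetical name
{
    // Sets the password required to open the document, then prints it
    // with Word's default print settings.
    public static void SetPasswordAndPrint(Word.Document doc, string password)
    {
        object missing = System.Reflection.Missing.Value;

        // Password is a write-only property on the document.
        doc.Password = password;

        // All print options are left at their defaults.
        doc.PrintOut(
            ref missing, ref missing, ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing, ref missing, ref missing);
    }
}
```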

TechRepublic's free Visual Studio Developer newsletter, delivered every Wednesday, contains useful tips and coding examples on topics such as ASP.NET, ADO.NET, and Visual Studio .NET.  Automatically sign up today!

27 comments
borisi

I just ran across this topic and want to shed some light of my own on it. When creating a document generation solution, you basically have three ways to do it:
- Office COM (described in this blog)
- Microsoft Open XML SDK
- Buying an existing 3rd party solution

It can be said that using the first solution (Office COM) is bad in almost all cases, especially when used on the server side in ASP.NET apps or other client-server architectures. Even Microsoft doesn't recommend it.

A much better way is to use MS's OOXML SDK, but this approach has the drawback of being much more tedious, because the SDK APIs do not let you manipulate a document object model; instead, they let you manipulate document packages/files. It is much more difficult to write a DocGen solution this way than with Office COM, but it is still a better solution because it doesn't rely on MS Word.

The third approach can also be an option. There are several existing DocGen solutions on the market, but as far as I know none is free. Most of them try to mimic the Office COM APIs, and there are also ones that allow you to create an MS Word document containing tags/placeholders and then populate such documents (templates) with your data from within your .NET app. One such library for .NET is Docentric Toolkit. It is a very versatile tool because you can place not just simple value placeholders but also data-bound charts, images, and hyperlinks, and it has its own add-in for MS Word, which makes the tool easy to use and allows non-developers to create templates.

guido.leenders

I prefer to work declaratively and use as little coding as possible. That way, business changes directly reflect changes in the technical approach for the document. And when technology needs to be updated only on a business change, there is never an issue with budget :-)

You can work with the document also in a declarative way when you install Invantive Composition within Word (requires a Word version that has .NET, but I think with C# that is reasonable to ask for). It allows expanding documents using tags in the document and a declarative model which also specifies database queries if needed. All interaction with central servers and databases is done through a secured webservice on-premises. You can add extensions to Invantive Composition Word templates using C#. See http://www.invantive.com/products/invantive-composition for information on modifying Word documents in a declarative way.

Please note that I work for Invantive, so I may be biased.

aseke123

Hello, thank you for the post. I have used this code to replace all the "l" letters with "m" letters, but I have a question: there is also a replaceAll function available, and I'm wondering which one will work faster if, for example, I have 1000 or more instances of the letter "l" in my file. I also couldn't understand what this means: ref replace=2. Why 2?

transkinger

Great article! Thank you so much. But Word automation cannot be used in ASP.NET. I used Spire.Doc to modify Word documents using C#; it is easy to use.

peter.j

We left MS Word automation some time ago. It is a messy, unstable and unreliable solution for document creation. We were looking for a solid document generation library and found the Docentric toolkit, which doesn't provide APIs for document creation and manipulation. Instead, it provides solid, template-based document generation with the ability to edit document templates in MS Word: www.docentric.com. If you are looking for a .NET tool that will help you create and maintain complex data-bound documents, then check it out.

Synbari

This is really a great article and saved me a lot of time. Just a small problem I am facing when trying the replace function: an AccessViolationException is thrown. Any idea is really appreciated.

dhays

I guess the newsletter is no longer available, a search on TR shows none. The listing of available newsletters doesn't show it either.

camainc

If you really need to modify Word documents using C# (or VB.Net, for that matter) you will save yourself a ton of headaches by using the Aspose Words API. Believe me, it is WELL worth the investment. http://www.aspose.com/ They have APIs for Excel, Powerpoint, PDF, and a bunch of other good stuff too. And no, I don't work for Aspose. I just use their excellent products.

dawgit

It is NOT 'C'. That could, just might, throw a few people off a bit.

charakadharma

Thanks man... worked like a charm. Put the 's in my word document, built a UI to it, and bam... done. THANKS

vikas0286

Thanks a lot... it solved my problem... How can I use it to print out?

nedaema

THANK YOU VERY MUCH

Tony Hopkinson

This one is almost a classic example of why MS's monolithic design philosophy sucks big time. I gave it a go, threw my hands in the air, and dropped the entire idea. You might as well record keystrokes as use this toss. PS: you might want to mention that which version of MS Word you have deployed can have a wee impact on what you do and can do...

dotNiemand

How are you going to deal with MS Word COM on shared hosting? Use some lib like word.net or Invoke Docx. For example, I use Invoke for my web project (it needs to generate documents online). It's pretty small and simple, check here - http://invoke.co.nz/products/docx.aspx

rajiv216

System.Runtime.InteropServices.COMException (0x800A1436): This file could not be found. Try one or more of the following: * Check the spelling of the name of the document. * Try a different file name. (ranjan.doc)
   at Microsoft.Office.Interop.Word.Documents.Open(Object& FileName, Object& ConfirmConversions, Object& ReadOnly, Object& AddToRecentFiles, Object& PasswordDocument, Object& PasswordTemplate, Object& Revert, Object& WritePasswordDocument, Object& WritePasswordTemplate, Object& Format, Object& Encoding, Object& Visible, Object& OpenAndRepair, Object& DocumentDirection, Object& NoEncodingDialog, Object& XMLTransform)
   at ResearchAndDevelopment.FormMicrosoftWord.ReadWordDocument(Object fileName, TextBoxBase textFileData) in C:\Documents and Settings\Sun\Desktop\Dot Net\ResearchAndDevelopment\ResearchAndDevelopment\Forms\FormMicrosoftWord.cs:line 84

snath

This is a cool article, but if the server doesn't provide a service for Word.Application, then what should we do? Do you have any idea for some sort of API for Word where we need not install Word.Application on the server? Thanking you, subimal mckvie.subimal@gmail.com

places_intheheart

Perfect, but what if I want to save the word document as .txt, where should I place the desired format?

geertpante

IMHO, the COM API for Word is hardly worth calling an API. It's written on top of the user interface instead of providing an API to the underlying document model. For example: you need to activate a document before you can use it, you need to explicitly create a selection to change it, you need to programmatically type text to insert anything in a document, etc. I'd rather use a standalone library like Jakarta POI than have to launch MS Word to programmatically create documents.

Tony Hopkinson

doesn't require word to be installed, at least that's what I was told by MS at TechEd when they launched 3.5. It was about the only good part of the entire thing, (the word API, not TechEd).

pauleta85

Hi, how do I embed an image inside an MS Word document? Thanks in advance.