Note: The following article was originally published on June 13, 2008. I revived and updated it for the TR Dojo Challenge series.
In last week's TR Dojo Challenge question, I asked TechRepublic members how to save images embedded within a Microsoft Word document as separate files.
Imagine the following scenario. Someone sent you a Word document loaded with pictures (30 or more). You need the pictures as individual image files, but for some reason the document's creator can't send you the images.
Now, you could open the document in Word, select a single image, copy it, paste the image into your favorite image-editing application, and then save the picture. But with 30 or more images, this would take too long. You could also create a script or macro to copy the images, but again, this is more work than necessary. By saving the file as a Web page (Word 2000, Word 2002/XP, or Word 2003) or by unzipping the .docx file (Word 2007), you can quickly save embedded images as individual files.
Save as Web page
Use the following steps for Word 2000, Word 2002/XP, or Word 2003:
- Open the document in Word.
- Click File from the Standard Toolbar.
- Click Save As.
- Specify your Save in location.
- Select Web Page (*.htm; *.html) from the Save as type drop-down menu, as shown in Figure A.
- Click Save.
Make sure you choose Web Page (*.htm; *.html) and not Single File Web Page (*.mht; *.mhtml).
When you save the document as a Web page, Word creates an .htm file and a folder containing the embedded images, as shown in Figure B.
By default, Word saves supporting files to a subfolder in the same location as the main .htm file. From the Web Options settings window, you can instruct Word to save the files to the .htm file's location instead of a subfolder.
The .htm file contains the document's text, formatting information, properties, image references, and so forth. Open the .htm file with an HTML editor, and you can see the code Word generates. As I mentioned above, the folder contains the document's embedded images and a filelist.xml file, as shown in Figure C.
If the image has been resized within Word, the folder will contain both the original image and a resized copy. Word will preserve each file's original format (.jpg, .png, etc.) but will not preserve the image's original file name. Word renames the files in ascending order starting with the first image in the document. Each original image is immediately followed by the resized copy, if it exists.
Depending on the Web Options settings, Word may automatically create a resized image when you save the file as a Web page. Word may also convert the image to a .gif. For example, if you haven't told Word to allow .png as a graphics format under Web Options and you insert a .png file into your document, the supporting-files folder will contain both the original image file and a resized, reformatted .gif copy. You can now copy the file(s) to another location.
Note: If you only want the resized images and not the originals, you can choose Web Page, Filtered (*.htm; *.html) from the Save as type drop-down menu, instead of Web Page (*.htm; *.html).
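If the supporting-files folder holds dozens of pictures, the final copy step can be scripted. The following Python sketch is only an illustration, not part of the original procedure: the function name collect_images and the list of image extensions are my own assumptions, and the folder name in the usage note is hypothetical. It copies every image file from the folder to a destination and skips non-image files such as filelist.xml.

```python
import shutil
from pathlib import Path

# Assumption: the image types we expect to find in the supporting-files folder.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tif", ".tiff", ".emf", ".wmf"}


def collect_images(files_dir, dest_dir):
    """Copy image files from Word's supporting-files folder into dest_dir.

    Returns the list of copied paths, sorted by the names Word assigned.
    """
    files_dir = Path(files_dir)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for item in sorted(files_dir.iterdir()):
        # Skip subfolders and non-image files such as filelist.xml.
        if item.is_file() and item.suffix.lower() in IMAGE_EXTENSIONS:
            copied.append(Path(shutil.copy2(item, dest_dir / item.name)))
    return copied
```

For example, collect_images("report_files", "extracted_images") would copy the pictures out of a hypothetical report_files folder sitting next to report.htm. Because the files are processed in name order, an original and its resized copy should land next to each other in the result.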
Unzipping a .docx file
With Word 2007, Microsoft introduced the XML-based .docx file format. The new format is essentially a ZIP container, which contains a series of XML files and any embedded images. To access the embedded images in a .docx file, use the following steps:
- If it's not already a .docx file, open the file in Word 2007 and save it as a Word Document (*.docx).
- Change the file extension on the original file from .docx to .zip, as shown in Figure D.
- Open the file using a ZIP application. The image files are stored in the archive's word/media folder and, depending on your ZIP application, should be listed at the top of the file list, as shown in Figure E.
You can now copy the file(s) to another location.
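The same result can be reached without renaming the file, because a .docx file can be read directly as a ZIP archive. Here is a minimal Python sketch, offered as one possible approach rather than the definitive one: the function name extract_docx_images is my own, and it assumes the pictures sit under the word/media/ folder inside the archive, which is where Word 2007 normally stores them.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_docx_images(docx_path, out_dir):
    """Save every file stored under word/media/ in a .docx to out_dir.

    Returns the list of paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    saved = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Assumption: embedded pictures live under word/media/.
            if not name.startswith("word/media/") or name.endswith("/"):
                continue
            # Keep only the base file name so nothing is written outside out_dir.
            target = out_dir / PurePosixPath(name).name
            target.write_bytes(archive.read(name))
            saved.append(target)
    return saved
```

Calling extract_docx_images("report.docx", "extracted_images") on a hypothetical report.docx would write files such as image1.png into the extracted_images folder. As with the manual method, the files carry the names Word assigned, not the original file names.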
And the TechRepublic swag goes to...
I was pleasantly surprised by the number of TechRepublic members who actually remembered I discussed this problem back in June 2008 and even linked to the original TR Dojo article. As with last week's Challenge, several TechRepublic members submitted answers that described both of the methods I outline above, but I'm awarding the TR swag to Ole88 (who first suggested saving the document as a Web page), bsteenkamp (who was first to mention opening the .docx file), and TechRepFollower (who was first to suggest saving the file with the Web Page, Filtered option).
Thanks to everyone who submitted an answer. If you don't see your answer here, be sure to give this week's question, "How can you say 'no to all' replacements when copying files in Windows XP?" a try.
Bill Detwiler has nothing to disclose. He doesn't hold investments in the technology companies he covers.
Bill Detwiler is Managing Editor of Tech Pro Research and the host of Cracking Open, CNET and TechRepublic's popular online show. He was most recently Managing Editor for TechRepublic Pro. Prior to joining TechRepublic in 2000, Bill was an IT manager, database administrator, and desktop support specialist in the social research and energy industries. He has bachelor's and master's degrees from the University of Louisville, where he has also lectured on computer crime and crime prevention.