RE: Convert Word files to XML?

Subject: RE: Convert Word files to XML?
From: "Janoff, Steven" <Steven -dot- Janoff -at- ga -dot- com>
To: "techwr-l -at- lists -dot- techwr-l -dot- com TECHWR-L" <techwr-l -at- lists -dot- techwr-l -dot- com>, Richard Hamilton <dick -at- rlhamilton -dot- net>
Date: Fri, 25 Jul 2014 11:47:01 -0700

Big thanks to Robert, Tony, and Richard for the responses so far!

Yes, sorry, I should have been more specific.

For some applications I'll be converting the Word files to S1000D/XML for use in Arbortext.

For others, I'll be converting the Word files to DITA/XML for use in both Arbortext and Oxygen.

There isn't a large volume of Word files up front, but the project could grow. (The first few documents are 50-100 pages each.)

Yes, I've seen Eliot Kimber's sources. They look good, and I will check them out.

Thank you again. I'll follow up with more details as I learn more about the project. In the meantime, I'll look into what's been suggested so far.

Thanks, Richard, for the in-depth discussion. (I do have some XSLT/XSL-FO experience, but I'll need to refresh it.)


PS - I'm also wondering what you do when you receive a plain text file and want to convert it to XML (DITA, for example). Do you primarily start with templates? Thanks for any direction there too; I may be receiving some of those files.
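
For the plain-text case, one possible starting point is a small script that wraps the text in a bare DITA topic skeleton, which you then restructure by hand or against a template. This is a minimal Python sketch, not a standard tool: it assumes (hypothetically) that the first line is the title and that blank lines separate paragraphs, and the function name `text_to_dita_topic` and default topic ID are illustrative only.

```python
import re
import xml.etree.ElementTree as ET

DITA_TOPIC_DOCTYPE = '<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">'


def text_to_dita_topic(text, topic_id="converted_topic"):
    """Wrap plain text in a minimal DITA <topic>.

    Hypothetical conventions: the first line is the title,
    and blank lines separate paragraphs.
    """
    # Split on blank lines (lines that contain only whitespace count as blank)
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    if not blocks:
        raise ValueError("no text to convert")
    # The first line of the first block is the title; anything after it is body text
    title, _, rest = blocks[0].partition("\n")
    paragraphs = ([rest] if rest.strip() else []) + blocks[1:]

    topic = ET.Element("topic", id=topic_id)
    ET.SubElement(topic, "title").text = title.strip()
    body = ET.SubElement(topic, "body")
    for para in paragraphs:
        # Collapse hard-wrapped lines into one line per <p>
        ET.SubElement(body, "p").text = " ".join(para.split())
    # ElementTree escapes &, <, and > in the text when serializing
    xml = ET.tostring(topic, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + DITA_TOPIC_DOCTYPE + "\n" + xml
```

The output is only a first pass: you would save it as a .dita file and open it in Oxygen or Arbortext to apply real structure (for example, turning numbered lines into a task's steps).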

-----Original Message-----
From: On Behalf Of Richard Hamilton
Sent: Friday, July 25, 2014 11:21 AM
To: techwr-l -at- lists -dot- techwr-l -dot- com TECHWR-L
Subject: Convert Word files to XML?

Hi Steve,

The answer depends on several factors. The most important are the XML schema you are converting to, how clean your Word content is, and how much content you need to convert.

The bottom line for me: if you have a lot of content to convert, you should seriously consider contracting the job out to a conversion company, unless you have serious expertise with XSL and related tools.

Here are some details on tools to consider if you want to go it alone:

I convert Word to DocBook XML using OpenOffice, which can export DocBook directly. Sometimes, however, it's better to export HTML and then use a utility called Herold to convert the HTML to DocBook. I've also used the rather circuitous route of uploading Word files to a Confluence wiki and then exporting DocBook with a plug-in exporter developed by a company called K15t Software. Which route I use in a given case depends on what the input looks like.

You can convert Word to DITA using DITA for Publishers. I haven't used it myself, but I know the developer (Eliot Kimber), and he does quality work, so I'd definitely give it a try if you're headed toward DITA.
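
To give a feel for what Word-to-XML converters work with: a .docx file is a zip archive whose main content is word/document.xml, and paragraph style names are the main structural signal available. The following Python sketch (standard library only) extracts each paragraph's style ID and text and maps styles to DITA-like element names. It is an illustration under stated assumptions, not how DITA for Publishers or any other tool is implemented; the `STYLE_MAP` contents and the `UNMAPPED:` marker are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used in word/document.xml, in ElementTree's {uri} form
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Hypothetical style-to-element map; tune it to the styles your templates actually use
STYLE_MAP = {
    "Heading1": "title",
    "ListBullet": "li",
    "Normal": "p",
}


def document_paragraphs(document_xml):
    """Yield (style_id, text) for each <w:p> in the XML of word/document.xml."""
    root = ET.fromstring(document_xml)
    for p in root.iter(W + "p"):
        style = p.find(f"{W}pPr/{W}pStyle")
        # No explicit style: treated here as "Normal" (a simplification)
        style_id = style.get(W + "val") if style is not None else "Normal"
        # A paragraph's text is split across runs; join all <w:t> pieces
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        yield style_id, text


def docx_paragraphs(path):
    """Read (style_id, text) pairs from a .docx file (a zip archive)."""
    with zipfile.ZipFile(path) as zf:
        return list(document_paragraphs(zf.read("word/document.xml")))


def map_styles(paragraphs, style_map=STYLE_MAP):
    """Map each style to an element name; flag unmapped styles for cleanup."""
    return [(style_map.get(style, "UNMAPPED:" + style), text)
            for style, text in paragraphs]
```

For example, `map_styles(docx_paragraphs("manual.docx"))` (a hypothetical file name) lists every paragraph with its target element. Counting the `UNMAPPED:` entries is a quick way to judge how consistently styled, and therefore how convertible, a Word file is.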

One caveat: I've found it exceedingly rare for a conversion to be completely clean. Unless your input is very simple and well suited to the tool you use (which, with Word, I've never seen :-), plan on cleaning up the output of any of these tools with an XSL stylesheet, Perl, manual editing, or a combination of all three.
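
As one concrete example of that kind of cleanup pass, here is a minimal sketch in Python (rather than XSL or Perl) that removes empty para and emphasis elements, a common leftover when Word authors add blank paragraphs for spacing. The function name `drop_empty` and the default element list are assumptions for illustration; a real pass would be driven by what your particular conversion actually produces.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any {namespace} prefix (DocBook 5 uses one; DocBook 4 does not)."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def _remove_keep_tail(parent, child):
    """Remove child from parent without losing the text that follows it."""
    if child.tail:
        siblings = list(parent)
        idx = siblings.index(child)
        if idx > 0:
            prev = siblings[idx - 1]
            prev.tail = (prev.tail or "") + child.tail
        else:
            parent.text = (parent.text or "") + child.tail
    parent.remove(child)


def drop_empty(root, names=("para", "emphasis")):
    """Remove elements named in `names` that have no text, children, or attributes.

    Returns the number of elements removed.
    """
    removed = 0
    # Reverse document order visits descendants before ancestors, so a <para>
    # emptied by removing its <emphasis> is itself removed in the same pass.
    for parent in reversed(list(root.iter())):
        for child in list(parent):
            if (local_name(child.tag) in names
                    and len(child) == 0
                    and not (child.text or "").strip()
                    and not child.attrib):
                _remove_keep_tail(parent, child)
                removed += 1
    return removed
```

Typical use would be `tree = ET.parse("converted.xml")`, `drop_empty(tree.getroot())`, then `tree.write("cleaned.xml", encoding="utf-8")` (hypothetical file names). Note that ElementTree does not keep the DOCTYPE or comments when it writes the file, which is one reason an XSL stylesheet is often the better choice for production cleanup.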

Best regards,
XML Press
XML for Technical Communicators
hamilton -at- xmlpress -dot- net

On Jul 25, 2014, at 10:46 AM, Janoff, Steven wrote:

> Hi,
> For those with experience converting Word files to XML:
> What's the easiest or most effective way you've found to do this?
> Does it depend on the XML editor you're importing into?
> Arbortext is currently my editor of choice, but I might also have the opportunity to install Oxygen at home.
> Thanks for your advice. I'll also be researching on the web, but the information there looks like a bit of a mish-mash.
> Steve
