Thanks very much for the follow-up, Yves. My experience with converting documents from Word and unstructured FrameMaker is a few years old and clearly there are much better tools available now.
Adding structure information: Aside from metadata, I'm pretty surprised that automated tools can do much to create typed topics from unstructured Word, unless the source documents are very carefully formatted. One of my projects is a set of manuals converted to DITA from SGML (IBMIDDoc), which I assume is a much cleaner starting point than Word. We found that reference and concept topics came across very easily as long as the original material was not too mixed. However, because the structure rules for task topics are more restrictive, these required a lot of manual rework.
Even cleanly converted content required manual effort. For example, topic introductions came across as perfectly valid <p> tags, but every one had to be checked and often rewritten to produce a suitable <shortdesc>.
So while it might not be too hard to get to the point where the new DITA topics don't throw validation errors, we found there was a lot more work making those topics well-structured.
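As a rough illustration of that gap between "valid" and "well-structured", a short script could at least list the topics that still lack a <shortdesc>, so a writer knows where to look first. This is only a sketch: the sample topics and the function name are made up for illustration, and it ignores a <shortdesc> nested inside an <abstract>.

```python
import xml.etree.ElementTree as ET

# Topic element names to inspect (the base DITA topic types).
TOPIC_TYPES = ("topic", "concept", "task", "reference")

def topics_missing_shortdesc(xml_text):
    """Return the ids of topics that have no direct <shortdesc> child."""
    root = ET.fromstring(xml_text)
    missing = []
    for element in root.iter():
        if element.tag in TOPIC_TYPES and element.find("shortdesc") is None:
            missing.append(element.get("id", "(no id)"))
    return missing

# Hypothetical converted content: both topics are valid, one has no shortdesc.
sample = """<dita>
<concept id="overview"><title>Overview</title>
<conbody><p>This product converts reports to PDF.</p></conbody></concept>
<concept id="formats"><title>Formats</title>
<shortdesc>The supported output formats.</shortdesc>
<conbody><p>PDF and HTML are supported.</p></conbody></concept>
</dita>"""

for topic_id in topics_missing_shortdesc(sample):
    print("No shortdesc:", topic_id)
```

The script only finds the candidates. Deciding what the short description should actually say is still manual work.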
Importing tables: Do you mean that these editors can Smart Paste copied Word tables directly into DITA? One of my colleagues wrote some scripts to generate complex tables as an Excel workbook. We use Paste Excel in Arbortext to convert these and tables taken from Word-formatted design documents to DITA tables. I don't think Arbortext lets us paste Word tables directly.
Regards,
Stuart
----- Original Message -----
From: "Yves Barbion" <yves -dot- barbion -at- gmail -dot- com>
To: "Stuart Burnfield" <slb -at- westnet -dot- com -dot- au>
Cc: "Techwr-l" <techwr-l -at- lists -dot- techwr-l -dot- com>
Sent: Friday, 26 July, 2013 3:14:52 PM GMT +08:00 Beijing / Chongqing / Hong Kong / Urumqi
Subject: Re: Chicken and eggs scenario for structured writing
On Fri, Jul 26, 2013 at 4:14 AM, Stuart Burnfield < slb -at- westnet -dot- com -dot- au > wrote:
(...)
Start experimenting with a conversion process. It would be a mountain of work to manually paste your Word content into skeleton DITA topics. There won't be a simple single-step 'Save As DITA' but you should be able to at least semi-automate the process. For example:
- Save a Word document as XML. This creates a text file with Word formatting mapped to generic tags. Use scripts, macros or regular expressions to convert each generic XML tag <foo> to the actual DITA tag <bar> and to delete unwanted MS Word fluff.
[Yves] >>> Instead of saving the Word file as (Microsoft) XML, you could also try Eliot Kimber's Word-to-DITA transformation Framework:
2. Use mif2go ( www.mif2go.com ) to set up your conversion (basically a style to element mapping), for example:
- heading* = title
- bulleted list = ul li
- instruction = cmd
- heading 4 = section
3. Save the FM file as DITA.
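To show the idea behind a style-to-element mapping, here is a minimal Python sketch of the four rules listed above. It is not Mif2Go's configuration syntax or behaviour. The table, the function name and the "exact name beats wildcard" rule are assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical style-to-element table, copied from the mapping listed above.
# A value such as "ul li" means an <li> inside a <ul>.
STYLE_MAP = {
    "heading 4": "section",
    "heading*": "title",
    "bulleted list": "ul li",
    "instruction": "cmd",
}

def element_for_style(style_name):
    """Return the DITA element path for a paragraph style, or None."""
    name = style_name.strip().lower()
    # Assumption: an exact style name wins over a wildcard pattern, so
    # "heading 4" maps to section even though it also matches "heading*".
    if name in STYLE_MAP:
        return STYLE_MAP[name]
    for pattern, element in STYLE_MAP.items():
        if fnmatchcase(name, pattern):
            return element
    return None

for style in ("Heading 1", "Heading 4", "Instruction", "Body Text"):
    print(style, "->", element_for_style(style))
```

A style that returns None (here "Body Text") has no rule yet, which is a useful list to review before running a full conversion.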
- Use MS Excel as an intermediate step to convert large tables to DITA.
[Yves] >>> That's not really required. Some DITA editors, such as oXygen XML Author and FrameMaker 11, have a "Smart Paste" function, which handles tables very nicely, even more complex tables with merged cells.
As Chris says, you can't magically add structure information that isn't there in the original, but you can automate a lot of the grunt work.
[Yves] >>> Well, actually, you can, depending on the type of structure information which needs to be added. Metadata is very important if you use DITA. In a topic, you can add metadata in the <prolog> element, which can contain things like the name of the author, the creation and modification date of the topic, the product name, the version of the topic etc. MIF2Go can add this prolog to each topic during the conversion.
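For readers who want to see what such a prolog looks like, the sketch below adds one to an already converted topic with Python's standard ElementTree module. It does not show how Mif2Go does this. The function name and all sample values (author, date, product, version) are placeholders.

```python
import xml.etree.ElementTree as ET

def add_prolog(topic, author, created, product, version):
    """Insert a <prolog> before the topic body and return it."""
    prolog = ET.Element("prolog")
    ET.SubElement(prolog, "author").text = author
    critdates = ET.SubElement(prolog, "critdates")
    ET.SubElement(critdates, "created", date=created)
    metadata = ET.SubElement(prolog, "metadata")
    prodinfo = ET.SubElement(metadata, "prodinfo")
    ET.SubElement(prodinfo, "prodname").text = product
    vrmlist = ET.SubElement(prodinfo, "vrmlist")
    ET.SubElement(vrmlist, "vrm", version=version)

    # The prolog comes after the title and any shortdesc or abstract,
    # so count those leading children and insert right after them.
    position = 0
    for child in topic:
        if child.tag in ("title", "titlealts", "shortdesc", "abstract"):
            position += 1
        else:
            break
    topic.insert(position, prolog)
    return prolog

# Placeholder topic and metadata values, for illustration only.
topic = ET.fromstring(
    "<task id='preview'><title>Preview the output</title>"
    "<taskbody><steps><step><cmd>Click Preview.</cmd></step></steps>"
    "</taskbody></task>"
)
add_prolog(topic, "Example Author", "2013-07-26", "Example Product", "1")
print(ET.tostring(topic, encoding="unicode"))
```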
Still, you will have to check your content after the conversion and restructure it (a bit, YMMV). For example, suppose you have this paragraph in the original text:
"Click on Preview. You should see that this output creates a PDF of the 3D view only."
After conversion, you will get this:
<step><cmd>Click on Preview. You should see that this output creates a PDF of the 3D view only.</cmd></step>
This is *valid* DITA, but not well-structured yet. You need to restructure and refactor this to get something like this:
<step><cmd>Click <uicontrol>Preview</uicontrol>.</cmd>
<stepresult>You should see that this output creates a PDF of the 3D view only.</stepresult></step>
Conversion can be automated; restructuring/refactoring cannot because you actually have to read the text and then decide that, in this case, the second sentence is the result of the instruction in the first sentence. In other cases, however, the second sentence may be an example (stepxmp), a tip (note type="tip"), or just some more information about the instruction (info).
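What a script can do is point you at the steps that probably need this attention. The sketch below flags every <cmd> that seems to contain more than one sentence. It deliberately does not try to decide whether the extra sentence is a stepresult, stepxmp, note or info. The sentence test is a crude assumption (punctuation followed by a capital letter), and the second sample step is made up.

```python
import re
import xml.etree.ElementTree as ET

# Rough heuristic: sentence-ending punctuation, whitespace, then a capital.
SENTENCE_BREAK = re.compile(r"[.!?]\s+(?=[A-Z])")

def steps_needing_review(xml_text):
    """Return the text of each <cmd> that looks like more than one sentence."""
    root = ET.fromstring(xml_text)
    flagged = []
    for cmd in root.iter("cmd"):
        # Join the text of the cmd and its inline children, then
        # collapse runs of whitespace into single spaces.
        text = " ".join("".join(cmd.itertext()).split())
        if SENTENCE_BREAK.search(text):
            flagged.append(text)
    return flagged

sample = """<steps>
<step><cmd>Click on Preview. You should see that this output creates a PDF of the 3D view only.</cmd></step>
<step><cmd>Click <uicontrol>Save</uicontrol>.</cmd></step>
</steps>"""

for text in steps_needing_review(sample):
    print("REVIEW:", text)
```

Only the first step is flagged. A human still has to read it and choose the right element for the second sentence.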
Cheers
--
Yves Barbion
www.scripto.nu