Help! Need to reformat over 1000 *plain text* Word docs quickly

Subject: Help! Need to reformat over 1000 *plain text* Word docs quickly
From: "Hart, Geoff" <Geoff-H -at- MTL -dot- FERIC -dot- CA>
To: "TECHWR-L" <techwr-l -at- lists -dot- raycomm -dot- com>
Date: Fri, 31 Jan 2003 14:39:39 -0500


wieland -at- marinbrillante -dot- com has a sticky problem: <<... over a thousand Word
documents have been created by the copyediting team, each document matching
up with a web page to be entered [into Vignette] and edited by another team
of contractors. Unfortunately, most of the data... appears on the Word
document in the wrong order, in plain text, and in a non-intuitive
fashion.>>

If there's truly no pattern to the madness, then you won't be able to come
up with an automated solution, and you're stuck with doing all or most of
the work by hand. Sounds like that might be at least partially the case:

<<For instance, the title text might appear on page four of 50 Word
documents, but appear as the first input box in the corresponding Vignette
template.>>

Can you group the Word documents based on similarity? For example, if 50
have the title as the first paragraph, 50 have it as the final paragraph,
etc., you should create separate directories to hold these two groups of
similar documents. This lets you use the same approach on all the files in a
given directory. If the documents really have no common thread, you'll have
to impose order on the chaos by manually rearranging the contents of each of
the thousand documents into the same order.
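
If you'd rather not sort a thousand files into groups by eye, a small script
might do the first pass. Here's a minimal Python sketch, assuming the Word
files have first been exported as plain text and that each component can be
recognized by a label at the start of its line. The label names and the
`layout_signature` and `group_by_layout` helpers are hypothetical, made up
purely for illustration:

```python
from collections import defaultdict

# Hypothetical labels; replace with whatever actually marks each component.
LABELS = ("title", "graphic", "terms of use")

def layout_signature(text, labels=LABELS):
    """Return the labels in the order they first appear in the text."""
    seen = []
    for line in text.splitlines():
        lowered = line.strip().lower()
        for label in labels:
            if lowered.startswith(label) and label not in seen:
                seen.append(label)
    return tuple(seen)

def group_by_layout(docs):
    """Map each layout signature to the names of the documents sharing it.

    docs is a dict of {document name: plain text}.
    """
    groups = defaultdict(list)
    for name, text in docs.items():
        groups[layout_signature(text)].append(name)
    return dict(groups)

# Made-up sample documents, standing in for the exported files.
docs = {
    "a.txt": "Title: Home\nGraphic: logo.gif\nTerms of use: none",
    "b.txt": "Graphic: map.gif\nTitle: Contact\nTerms of use: none",
    "c.txt": "Title: About\nGraphic: team.gif\nTerms of use: none",
}
groups = group_by_layout(docs)
for signature, names in groups.items():
    print(signature, names)
```

Each signature would then correspond to one directory of similar documents.
Files whose signature comes back empty or incomplete are the ones that
probably need a human to look at them.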

<<What's more, each unique component of information (for instance, the title
of the page, the featured graphic, the terms of use box, etc.) is listed
plain text in Word. No fields!>>

The Word documents you're forced to work with presumably originated as
someone's Web site. Before you get too hung up on using the current files,
see if you can get hold of the original HTML files. Someone probably
downloaded them and
saved them as plain text rather than HTML, and in so doing, erased all the
style tags. If you can open the original HTML in Word, then do a "save as"
in Word format, you'll end up with a document whose structure (e.g.,
paragraph vs. heading) is defined. That will enormously simplify your job.
Of course, if all the Web pages are plain text, and the creator didn't even
bother distinguishing between headings and body text, you're going to have
to tag the paragraphs manually.
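
If the original HTML does turn up, you could also check how much structure it
really contains before opening anything in Word. The following is a rough
Python sketch using the standard library's `html.parser`. The
`StructureParser` class, the `extract_structure` helper, and the set of tags
it keeps are my own assumptions for illustration, and the sample page is
invented:

```python
from html.parser import HTMLParser

class StructureParser(HTMLParser):
    """Collect (tag, text) pairs for titles, headings, and paragraphs."""

    KEEP = {"title", "h1", "h2", "h3", "p"}  # assumed tags of interest

    def __init__(self):
        super().__init__()
        self.items = []    # finished (tag, text) pairs
        self._tag = None   # tag currently being collected
        self._text = []    # text fragments seen inside that tag

    def _flush(self):
        """Store the element collected so far, if it has any text."""
        if self._tag is not None:
            text = " ".join("".join(self._text).split())
            if text:
                self.items.append((self._tag, text))
        self._tag = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.KEEP:
            self._flush()  # tolerate unclosed <p> tags
            self._tag = tag

    def handle_data(self, data):
        if self._tag is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._flush()

def extract_structure(html):
    parser = StructureParser()
    parser.feed(html)
    parser.close()
    parser._flush()  # pick up a final element that was never closed
    return parser.items

sample = (
    "<html><head><title>Home</title></head><body>"
    "<h1>Welcome</h1><p>Some <b>body</b> text.</p></body></html>"
)
items = extract_structure(sample)
for tag, text in items:
    print(tag, "|", text)
```

If every page comes back as nothing but `p` entries, the creator didn't
distinguish headings from body text, and you're back to tagging by hand.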

<<any ideas that would allow us to change the order of the data so that the
workflow would be easier.>>

Once you've got the files tagged, use Word's search function to find all
entries with a given style (one at a time) and cut and paste them into some
kind of standard order. There's probably a macro you could write that would
do this for you, but not if there's no underlying logic to the existing
structure. Where logic is lacking, you'll have to provide it yourself.
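
To make the reordering idea concrete, here's a minimal Python sketch of what
such a macro would have to do, assuming each paragraph has already been
tagged with a style name. The style names, the `TEMPLATE_ORDER` list, and the
`reorder` function are hypothetical, and this is not Word macro code:

```python
# Hypothetical style names, listed in the order the template wants them.
TEMPLATE_ORDER = ["Title", "Abstract", "Body", "Terms"]

def reorder(paragraphs, template_order=TEMPLATE_ORDER):
    """Sort (style, text) pairs into template order.

    sorted() is stable, so paragraphs that share a style keep their
    original relative order. Unknown styles are pushed to the end.
    """
    rank = {style: i for i, style in enumerate(template_order)}
    unknown = len(template_order)
    return sorted(paragraphs, key=lambda p: rank.get(p[0], unknown))

# Made-up tagged paragraphs, in the "wrong" order.
tagged = [
    ("Body", "First paragraph."),
    ("Terms", "No copying."),
    ("Title", "Welcome"),
    ("Body", "Second paragraph."),
    ("Abstract", "Summary."),
]
result = reorder(tagged)
for style, text in result:
    print(style, "|", text)
```

The same logic, rewritten as a Word macro, would only work once the styles
are applied consistently; it can't guess which paragraph is the title.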

<<What we need is some sort of system where batches of these Word documents
can be reorganized. Each piece on the page needs to appear in the order
corresponding to the content management system's entry template used.>>

Start by creating this template in Word; it can be an actual style template
(better), or simply a document composed of lines that say "title goes here,
abstract goes here, etc." Once the Web pages are tagged (as noted above),
you can use Word's outline view (View--->Outline) to display the overall
structure of the document. The cool thing about the outliner is that you can
collapse it to show only certain levels of heading, then drag those headings
around within the outline until they're where you want them; all subordinate
topics grouped under a higher-level heading will move along with the heading
that you're dragging. Once the headings match the content management
system's required order, apply the appropriate paragraph styles and you're
done.
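
For anyone who wants to see the outliner's behavior spelled out, the sketch
below mimics it in Python: paragraphs are grouped under their headings, and
each heading moves together with everything beneath it. It assumes a single
heading level ("Heading 1") and paragraphs given as (style, text) pairs. The
function names and the sample heading text are hypothetical:

```python
def split_sections(paragraphs, heading_style="Heading 1"):
    """Group paragraphs into sections, each starting at a heading."""
    sections = []
    for style, text in paragraphs:
        if style == heading_style or not sections:
            sections.append([])
        sections[-1].append((style, text))
    return sections

def reorder_sections(paragraphs, heading_order, heading_style="Heading 1"):
    """Move whole sections so their headings follow heading_order."""
    rank = {heading: i for i, heading in enumerate(heading_order)}

    def key(section):
        style, text = section[0]
        if style != heading_style:
            return -1  # text before the first heading stays first
        return rank.get(text, len(heading_order))  # unknown headings go last

    ordered = sorted(split_sections(paragraphs, heading_style), key=key)
    return [paragraph for section in ordered for paragraph in section]

# Made-up document, with sections in the wrong order.
paragraphs = [
    ("Heading 1", "Terms of use"),
    ("Normal", "No copying."),
    ("Heading 1", "Title"),
    ("Normal", "Welcome page"),
    ("Heading 1", "Abstract"),
    ("Normal", "A short summary."),
    ("Normal", "More summary."),
]
result = reorder_sections(paragraphs, ["Title", "Abstract", "Terms of use"])
for style, text in result:
    print(style, "|", text)
```

This is only a model of the drag-and-drop step, useful for checking that the
target order is what the content management system's template expects.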

Wish there were a simpler way, but it doesn't sound like it from what you've
told us. Hope you're billing by the hour!

--Geoff Hart, geoff-h -at- mtl -dot- feric -dot- ca
Forest Engineering Research Institute of Canada
580 boul. St-Jean
Pointe-Claire, Que., H9R 3J9 Canada

"Work is of two kinds: first, altering the position of matter at or near the
earth's surface relative to other matter; second, telling other people to do
so. The first is unpleasant and ill-paid; the second is pleasant and highly
paid."--Bertrand Russell



