Converting Word to text?

Subject: Converting Word to text?
From: "Hart, Geoff" <Geoff-H -at- MTL -dot- FERIC -dot- CA>
To: "Techwr-L (E-mail)" <TECHWR-L -at- lists -dot- raycomm -dot- com>
Date: Tue, 22 Feb 2000 09:33:00 -0500

Ruth Lundquist wondered: <<Does anyone know of a macro, tool, program,
utility, potion, magic spell, and/or sacrificial offering that will convert
a Word document to *readable* text?...It's possible to simply "Save
As=text", but when you try that on a document that has columns, tables, or
other formatting, you are, in a word, screwed.>>

The biggest problem would seem to be "how to convert it into a format your
client's legacy system can use", not how to convert it to readable text. The
underlying cause of that problem is that ASCII simply doesn't support any of
Word's advanced layout features (headers, footers, tables, graphics, page
numbering, etc.), and you'll never completely recapture any of this
formatting in ASCII.

My first suggestion would be to use something like MacLink or Conversions
Plus to convert the Word files into an intermediate format, then convert
that intermediate format into the legacy system's format. (We did this long
ago for getting WPS Plus files off a VAX and into PageMaker.) Without
knowing how your client's software imports text (i.e., what formats it can
and can't handle), it's pretty hard for me to provide any more detailed
advice here. With a little digging, you may find that their software
supports (say) WordPerfect 5.1 files or WPS Plus or something even more
obscure that the translation packages can still handle more or less well.
You'll still have to test the conversion and probably simplify the Word file
before you try converting it (see below), but if you're lucky, there's a
silver bullet that will magically do most of the work for you.

There are two steps that _will_ work reasonably well, but they're only
general directions for you to explore rather than actual solutions; you'll
need to dig deep to find out what import capabilities your client's software
possesses, and do some tweaking on your own to figure out how to generate a
compatible format.

First, make a copy of your document and simplify it, removing as much
formatting as possible: convert the file to a single-column layout, with the
headers and footers stripped out, pagination turned off, section breaks
removed, and the tables converted to text. For each table, select the table
and tell Word (under the Table menu) to "Convert table to text...". That
will give you tab-delimited ASCII (or any of several other options), which
should be readable by just about any software.

Second, export the file to HTML; better still, open it in an HTML editor
such as Dreamweaver or BBEdit that generates _clean_ HTML. You now have an
ASCII file that includes a bunch of HTML markup, but all the paragraphs will
be intact as paragraphs. Open that file in any text editor and either strip
out the codes (e.g., do a global search and replace that replaces every
<...> tag with nothing) or replace them with some kind of tags that your
client's system can read.
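
If you'd rather script the tag-stripping step than do it by hand in a text
editor, here is a minimal sketch in Python. It is an illustration under
stated assumptions, not a tested converter: the function name html_to_text
and the list of tags treated as paragraph ends are my own choices, and the
simple <...> pattern assumes reasonably clean HTML (no script or style
blocks, no ">" characters inside attribute values).

```python
import re
from html import unescape

# Tags assumed to end a paragraph-like block (a hypothetical list;
# adjust it to whatever your exported HTML actually contains).
BLOCK_END = re.compile(r"</(?:p|h[1-6]|li|tr|div)\s*>|<br\s*/?>", re.IGNORECASE)

# The "global search and replace for all <...> tags" from the text above.
ANY_TAG = re.compile(r"<[^>]+>")


def html_to_text(html: str) -> str:
    """Strip HTML tags while keeping each paragraph intact as one line."""
    # Flatten the source first, so line wrapping inside the HTML file
    # is not mistaken for a paragraph boundary.
    flat = " ".join(html.split())
    # Mark the end of every block with a newline, then delete all tags.
    marked = BLOCK_END.sub("\n", flat)
    stripped = ANY_TAG.sub("", marked)
    # Decode entities such as &amp; and tidy the spacing in each paragraph.
    paragraphs = (" ".join(unescape(line).split()) for line in stripped.split("\n"))
    # Separate the surviving paragraphs with a blank line.
    return "\n\n".join(p for p in paragraphs if p)


if __name__ == "__main__":
    sample = "<h1>Report</h1>\n<p>First <b>bold</b>\nline.</p><p>Fish &amp; chips</p>"
    print(html_to_text(sample))
```

To replace the tags with markers your client's system understands instead of
deleting them, you could substitute those markers for the "\n" in the
BLOCK_END step; what the markers look like depends entirely on the client's
import format.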

As I said, that's only the overview; you'll have to work out the details
with your client.

--Geoff Hart, Pointe-Claire, Quebec
geoff-h -at- mtl -dot- feric -dot- ca
"The paperless office will arrive when the paperless toilet
arrives."--Matthew Stevens



