Subject: Re: Word 2000 HTML conversion
From: Ray Dembek <RFDembek -at- MEDIAONE -dot- NET>
Date: Wed, 18 Aug 1999 12:18:06 -0400
You can download the Microsoft Office HTML Filter, a tool you can use to
remove Office-specific markup tags embedded in Office 2000 documents saved
as HTML. See http://officeupdate.microsoft.com/2000/downloadDetails/htmlfilter.htm.
Installing this filter also adds an Export to HTML command to the
File menu in Word 2000.
The HTML that comes out of this is not "minimal," but it is a lot cleaner
than what you get when you save a Word doc as a web page. The "Hello, World"
test now produces a 5K HTML document, which still has more markup than I want.
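
For anyone who wants to see what kind of markup is involved, here is a rough
Python sketch of stripping Office-specific tags from Word-generated HTML. It is
not how the Microsoft filter works internally (that is not documented in this
thread); the function name strip_office_markup, the regular expressions, and the
sample markup are all my own assumptions, based on the sort of tags Word 2000
typically emits (conditional comments, o: namespaced tags, Mso classes, and
mso- style properties).

```python
import re


def strip_office_markup(html: str) -> str:
    """Remove common Office-specific markup from Word-generated HTML.

    Hypothetical helper for illustration only; it uses regular expressions,
    so it will not cope with every document a real filter has to handle.
    """
    flags = re.DOTALL | re.IGNORECASE
    # Drop conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", "", html, flags=flags)
    # Drop namespaced tags such as <o:p> and </o:p>, keeping any inner text
    html = re.sub(r"</?[a-z]+:[^>]*>", "", html, flags=flags)
    # Drop xmlns declarations (usually found on the <html> element)
    html = re.sub(r'\s+xmlns(:\w+)?="[^"]*"', "", html, flags=flags)
    # Drop class attributes that name Word styles (MsoNormal and so on)
    html = re.sub(r'\s+class="?Mso\w*"?', "", html, flags=flags)
    # Drop mso-* declarations inside style attributes
    html = re.sub(r"mso-[^;\"']+;?\s*", "", html, flags=flags)
    # Drop style attributes that are now empty
    html = re.sub(r"\s+style=(\"\s*\"|'\s*')", "", html, flags=flags)
    return html


# Made-up sample in the style of Word 2000 output (not taken from a real file)
sample = (
    '<html xmlns:o="urn:schemas-microsoft-com:office:office"><body>'
    "<!--[if gte mso 9]><xml><o:p>x</o:p></xml><![endif]-->"
    '<p class="MsoNormal" style="mso-pagination:widow-orphan">'
    "Hello, World<o:p></o:p></p></body></html>"
)
cleaned = strip_office_markup(sample)
print(cleaned)
```

Regular expressions keep the sketch short, but for real documents an HTML
parser would be a safer choice.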
-----Original Message-----
From: Technical Writers List; for all Technical Communication issues
[mailto:TECHWR-L -at- LISTSERV -dot- OKSTATE -dot- EDU]On Behalf Of Eric J. Ray
Sent: Wednesday, August 18, 1999 11:22 AM
To: TECHWR-L -at- LISTSERV -dot- OKSTATE -dot- EDU
Subject: Re: Word 2000 HTML conversion
> Using Word 97 to convert a Word doc to HTML results in a lot of extraneous
> tags and a lot of manual clean-up. Someone told me here that Word 2000
> produces a tighter, cleaner file.
>
> Has anyone out there tried this? Confirm? Deny?
It's worse. Because Microsoft claims that you can round-trip
files from Word to HTML and back to Word without losing
formatting, they've had to add a ton of XML (pseudo XML,
actually, with a lot of non-standard namespace issues)
to each HTML document. When tech editing a new book on
MS Word 2000, I tested it with the classic "Hello, World"
and nothing else visible in the Word file, and came up with
just about 100 lines of code in the HTML document.
It might be a non-issue on an intranet, but you've got
enough bloat to be a serious issue on the Internet.
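
To put a number on that kind of bloat, a small Python sketch like the one below
compares the size of the whole HTML file with the size of the text a reader
actually sees. The names markup_overhead and _VisibleText and the sample markup
are made up for illustration; nothing here comes from Word or from the test
described above.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collect text outside <head>, <style>, and <script> elements."""

    _SKIPPED = ("head", "style", "script")

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Comments never reach this method, so conditional comments are ignored
        if self._skip_depth == 0:
            self.chunks.append(data)


def markup_overhead(html: str) -> tuple[int, int]:
    """Return (total bytes, visible-text bytes) for an HTML string."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so indentation does not count as content
    text = " ".join("".join(parser.chunks).split())
    return len(html.encode("utf-8")), len(text.encode("utf-8"))


# Made-up sample; a real Word 2000 export would be far longer
sample_doc = (
    "<html><head><title>Doc</title>"
    "<style>p.MsoNormal {margin:0in;}</style></head>"
    '<body><p class="MsoNormal">Hello, World</p></body></html>'
)
total_bytes, text_bytes = markup_overhead(sample_doc)
print(f"{text_bytes} bytes of text in a {total_bytes}-byte file")
```

Running it against a saved Word file (read in as a string) would show how much
of the download is markup rather than content.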
Eric
--
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Eric J. Ray ejray -at- raycomm -dot- com
UNIX Visual QuickStart Guide is "a superb book!"
Don't believe it? Check for yourself!
Find out more at http://www.raycomm.com/