Re: XML-based Help Authoring tools for customized help

Subject: Re: XML-based Help Authoring tools for customized help
From: "Mark Baker" <listsub -at- analecta -dot- com>
To: "TECHWR-L" <techwr-l -at- lists -dot- raycomm -dot- com>
Date: Mon, 15 Dec 2003 12:28:13 -0500


David Neeley wrote:

> Thanks for your detailed treatment of my comments. I still think I may be
> a little hazy as to several of the points, though, so perhaps you can
> enlighten me further.

And thank you for your detailed examination of my points. God really is in
the details.

> First, regarding whether DocBook is an "application" in the sense that
> Frame or Word are "applications."

Okay, let's get away from the word "applications" because it really is not
the important word in my statement that DocBook is a packaged application
like Word or Frame. The important word is "packaged".

So let's state it this way:

Word, Frame, and OpenOffice are all packages. At the heart of each package is a
data format with a specific set of semantics. Those semantics differ in
detail, but they are all essentially document semantics, and they are close
enough to each other that data can be transferred effectively between one
package and another with minimal loss of information. Each of these formats
can be expressed in multiple different syntaxes. Word's format, for
instance, can be expressed in a binary syntax, in RTF, or in Word's XML
format. The syntax is different, but the underlying semantics are the same.
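
As a rough illustration of "same semantics, different syntax", here is a minimal sketch. The document model (a list of style/text pairs), the `doc` and `para` element names, and the JSON layout are all invented for the example; they are not Word's actual formats. The point is only that the model survives a round trip through either syntax unchanged.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical document model: a list of (style name, text) pairs.
doc = [("Heading1", "Installing"), ("Body", "Run the installer.")]

def to_xml(model):
    """Express the model in an XML syntax."""
    root = ET.Element("doc")
    for style, text in model:
        para = ET.SubElement(root, "para", style=style)
        para.text = text
    return ET.tostring(root, encoding="unicode")

def from_xml(data):
    """Recover the model from the XML syntax."""
    return [(p.get("style"), p.text) for p in ET.fromstring(data).iter("para")]

def to_json(model):
    """Express the same model in a JSON syntax."""
    return json.dumps([{"style": s, "text": t} for s, t in model])

def from_json(data):
    """Recover the model from the JSON syntax."""
    return [(d["style"], d["text"]) for d in json.loads(data)]
```

Both `from_xml(to_xml(doc))` and `from_json(to_json(doc))` give back the original list, because the syntax carries the semantics but does not define them.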

Each package also includes a number of programs for working on that data
format: a main application plus various supplementary, third-party
(RoboHelp, WebWorks, etc.), and ad hoc programs. Each package
provides specific access points for developing third-party and ad hoc
programs.

If you adopt one of these packages (which may or may not include purchasing
some or all of the third-party pieces, and may or may not include developing
your own ad hoc programs), you get a rich set of capabilities which you can
use to write and publish documents. The range of those capabilities,
however, is limited by the underlying semantics of the data format. There
may be additional uses you can make of that data format by writing ad hoc
applications, and people do this using Frame Script or VBA. But the
fundamental limit on the range of such applications is set by the basic
semantics of the file format.

Finally, it is true of all these packages that their underlying file formats
are published and you can use them by themselves by writing your own set of
tools completely independent of the main package. This is not done often,
for obvious reasons, but it has been done.

Now my point is that Docbook is just like this on all the salient points.

At its heart, as you point out, is a data format with a specific set of
semantics. Those semantics are document structure semantics, just as they
are in Word and Frame's formats. Just like Word and Frame's formats, they
can be expressed in different syntaxes, in this case XML and SGML. And, as
you have also pointed out, exchange between DocBook and these other packages
is relatively simple. This is likewise because they have similar semantics.

There is also a standard package of applications for working with Docbook.
It is, I believe, part of some Linux distributions, as well as being
available for download. There are also third-party programs and access
points that allow you to write ad hoc programs. Now the exact architecture
of the programs in the packages may differ. Word and Frame both separate the
authoring function from the formatting and publishing function through the
use of stylesheets, but they combine these two operations into a single
executable program. In the Docbook package, the same separation between
authoring and publishing occurs, but it is packaged as separate executables.
This distinction may sometimes be important in choosing between one package
and another, but it does not alter the fact that they are all packages.

If you adopt Docbook, then you are adopting this package and all the
functionality it includes. You may not install all the parts of the package
and you may not do any ad hoc programming. But whether you do or you don't,
the limits of the capability of your package are set by the underlying
semantics of the data format.

And finally, it is true that you can download the file format alone and
build your own processing programs from scratch, but very few people do.

The decision to adopt Docbook, therefore, is a decision to adopt a particular
publishing package with a particular set of document oriented semantics, and
it must be judged on its merits as such.

Bill was originally arguing for the adoption of Docbook, as opposed to the
development of a custom DTD, precisely on the ground that Docbook is a
complete package, as described above, and therefore does not require a lot
of custom development. All I did was to point out that, precisely because it
has this characteristic of being a complete package, it has all the
advantages and all the disadvantages of a complete package and should
therefore be judged against them.

Docbook is Docbook. Word is Word. OpenOffice is OpenOffice. Each uses XML
syntax as one means of expressing its file format. Not one of them is XML
incarnate.

> > However, they are not designed to be altered with any sort of equivalent
> > to schema or DTD...
>
> "Sure they have. A Word stylesheet or a Frame template are the equivalent
> of an XML DTD: They define a set of named "elements" which can be assigned
> to spans of text. You can change a stylesheet or template just as you can
> change a DTD, and for much the same purpose"
>
> Again, a very strained comparison, I believe. Again, I cannot speak for
> the latest incarnation of Word. For Frame, the DocBook DTD is more
> equivalent to a portion of the structured Frame EDD--which in this case
> would incorporate the DocBook DTD as a part and various text formatting
> rules as another, IIRC.

You are correct. A Word stylesheet incorporates both the definition of
element names (a DTD function) and the mapping of element names to
formatting outcomes (a function performed by XSL or a similar language with
Docbook). But the point is that the same functionality exists in each case.
Only the placing of implementation boundaries is different.
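
To make the "implementation boundaries" point concrete, here is a small sketch. The style names and formatting properties are made up for illustration and do not come from any real Word template or DocBook stylesheet. One structure holds names and formatting together; the other holds them apart; each can be derived from the other.

```python
# Word/Frame style: one structure both names the elements and formats them.
combined_stylesheet = {
    "Heading1": {"size_pt": 18, "bold": True},
    "Body": {"size_pt": 11, "bold": False},
}

# DocBook style: names are declared in one place (the DTD's role) ...
declared_elements = {"Heading1", "Body"}
# ... and formatting is mapped in another (the XSL stylesheet's role).
formatting_rules = {
    "Heading1": {"size_pt": 18, "bold": True},
    "Body": {"size_pt": 11, "bold": False},
}

def split(stylesheet):
    """Derive the two-part form from the combined form."""
    return set(stylesheet), dict(stylesheet)

def combine(elements, rules):
    """Derive the combined form from the two-part form."""
    return {name: rules[name] for name in elements}
```

Here `split(combined_stylesheet)` yields the declared names and the rules, and `combine(declared_elements, formatting_rules)` yields the combined stylesheet again. Nothing is gained or lost; only the boundary moves.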

> To assert that DocBook or *any* XML language is not all-encompassing is to
> create a straw man--easy to knock down, perhaps, but not useful in the real
> world outside of argumentation. DocBook, while very useful, was never
> *intended* to be an "all things to all people" XML language.

Exactly my point. The trouble is, people keep making posts to the list that
claim that the proper way to go to XML is to go to Docbook. To refute this
argument it is necessary to point out that Docbook is only one DTD and that
it may not be a suitable DTD for a particular application. That is the point
I have been trying to make and I am glad that we agree on it.

However, you yourself go on to assert that Docbook is the appropriate
starting point for any move to XML, which it clearly is not if what you say
above is correct and Docbook is not designed to meet all needs.

Your aversion to consultants is understandable given that many consultants in
this business go into a corporation telling them that they need to create
"a" DTD. They then indulge in an extensive document modeling exercise across a
broad range of company documents which, in the end, inevitably produces
something more or less like Docbook. If your aversion to custom solutions
comes from an experience of this sort it is highly understandable. If you
want a generalized document format, then there is no reason not to use
Docbook rather than invent your own. However, there are some quite good
reasons, as Eric has articulated, for choosing Frame rather than Docbook.

My point is that there are some applications for which a generalized
document format is not a good format for data capture or manipulation. In
these cases there are no standard packages available and you have to develop
a custom application (or else just not bother). And if you want to develop a
custom application rapidly and inexpensively and if you want to make sure
that it is easy to learn, use, and maintain, you need to make sure that its
data formats are small, tight, and topical. This can be done inexpensively
and successfully. I know. I've done it myself and seen others do it as well.

Docbook is just simply not a good place to start from or to borrow from when
developing those small, tight, topical data formats. That is just one of the
many applications it is not suitable for. However, Docbook can play an
important role in developing a tool chain for such an application. You can
hijack the entire Docbook tool chain by writing your custom processing
routines to create Docbook as an output format. Your custom application then
sits in front of the Docbook tool chain. This is a very good use of Docbook.
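
A minimal sketch of such a front-end routine follows. The topical input format (`command`, `purpose`, `option`) is hypothetical, invented for this example. The output uses real DocBook element names (`section`, `title`, `para`, `variablelist`, `varlistentry`, `term`, `option`, `listitem`), but the mapping is only one plausible choice and the result is not validated against the DocBook DTD here.

```python
import xml.etree.ElementTree as ET

# Hypothetical small, topical input format for documenting a command.
TOPICAL = """<command name="ls">
  <purpose>List directory contents.</purpose>
  <option flag="-l">Use a long listing format.</option>
  <option flag="-a">Include hidden entries.</option>
</command>"""

def to_docbook(topical_xml):
    """Turn one topical <command> record into a DocBook <section>."""
    src = ET.fromstring(topical_xml)
    section = ET.Element("section")
    ET.SubElement(section, "title").text = src.get("name")
    ET.SubElement(section, "para").text = src.findtext("purpose")

    opts = src.findall("option")
    if opts:  # avoid emitting an empty variablelist
        varlist = ET.SubElement(section, "variablelist")
        for opt in opts:
            entry = ET.SubElement(varlist, "varlistentry")
            term = ET.SubElement(entry, "term")
            ET.SubElement(term, "option").text = opt.get("flag")
            item = ET.SubElement(entry, "listitem")
            ET.SubElement(item, "para").text = opt.text
    return ET.tostring(section, encoding="unicode")
```

The string returned by `to_docbook(TOPICAL)` could then be handed to whatever DocBook stylesheets and formatters you already have. Authors only ever see the small topical format.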

> Actually, I believe that not all information can be "chunked" with the
> granularity you seem to insist upon.

I agree. Not all information can be usefully chunked. Narrative documents
are the only way to express many kinds of information. By the same token,
explicit document structure markup doesn't do much for such information
either. Word and Frame work quite well for writing and publishing
narratives.

> In many cases, in fact, such chunking strains the ability of the reader to
> grasp the finished product.

Only if you do not reassemble the chunks into a readable synthesis.
Sometimes, chunking allows you to deliver highly individualized,
task-specific information that could never be economically created by hand.
However, to accomplish this you need very specific topical data models, not
general document structures.
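
Here is a sketch of what reassembly might look like. The `Chunk` fields (`task`, `platform`, `text`) and the sample content are assumptions made up for the example; a real topical model would carry whatever labels matter for your subject.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    task: str      # what the reader is trying to do
    platform: str  # which platform it applies to, or "any"
    text: str

CHUNKS = [
    Chunk("install", "linux", "Unpack the archive and run ./install.sh."),
    Chunk("install", "windows", "Run setup.exe."),
    Chunk("configure", "any", "Edit settings.ini to set the port."),
]

def assemble(chunks, tasks, platform):
    """Build one readable text for the given tasks on one platform."""
    lines = []
    for task in tasks:  # keep the order the reader needs
        for chunk in chunks:
            if chunk.task == task and chunk.platform in (platform, "any"):
                lines.append(chunk.text)
    return "\n".join(lines)
```

Calling `assemble(CHUNKS, ["install", "configure"], "linux")` produces a two-line procedure for a Linux user and nothing about Windows. The selection works only because each chunk carries topical labels; generic paragraph markup would give the routine nothing to select on.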

> It is also very difficult to manage at present, since most organizations
> that have the greatest need also have weak abilities to find if a
> particular "chunk" has already been defined elsewhere in their
> documentation base.

You cannot easily find chunks if you model information as documents. This is
precisely why you need topical DTDs, not document structure DTDs. The key to
finding the information you need is not to let it get lost in the first
place. Modeling information in a generalized document structure DTD is a
sure way to lose it.
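
A small sketch of the difference, using two invented fragments. The `errors`, `error`, `cause`, and `remedy` elements are a hypothetical topical model; the generic fragment uses DocBook-like `section` and `para`. The same lookup succeeds against one and finds nothing in the other.

```python
import xml.etree.ElementTree as ET

# Generic document structure: the facts are there, but only as prose.
GENERIC = """<section>
  <title>Errors</title>
  <para>Error 1042 means the license file is missing. Reinstall the license.</para>
</section>"""

# Hypothetical topical model: the same facts, labeled as what they are.
TOPICAL = """<errors>
  <error code="1042">
    <cause>The license file is missing.</cause>
    <remedy>Reinstall the license.</remedy>
  </error>
</errors>"""

def find_remedy(xml_text, code):
    """Return the remedy for an error code, or None if it cannot be located."""
    root = ET.fromstring(xml_text)
    node = root.find(f"error[@code='{code}']/remedy")
    return node.text if node is not None else None
```

`find_remedy(TOPICAL, "1042")` returns the remedy directly, while `find_remedy(GENERIC, "1042")` returns `None`: the information is present in the paragraph, but the markup gives no handle for finding it.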

> Frankly, I suspect strongly that this will continue until we have a better
> handle upon automatically generated semantic analysis of a given document
> base with what is being written or prepared.

Automated post-facto semantic analysis is unnecessary if the semantics are
captured up front by authors working with topical data models designed to
capture those semantics. Why throw the semantic information away by using a
general document structure DTD and then try to use an automated semantic
analyzer to get it back again? Capture the relevant semantics at the
time the data is created.

> The point is, though, that not all information is readily reusable. In
> fact, many times the effort to parse it into various levels of detail so
> that any particular level might be extracted through some sort of automated
> tool for various forms of output is simply not worth the candle.

Not all information is readily reusable. However, a great deal of content
can be made more easily reusable by proper labeling. To get authors to do
that labeling accurately and quickly is the key to success. To get that to
happen you need to provide them with small, tight, topical data models.

> In fact, with the present state of the art, reaching the true level of
> understanding of the information as information and not simply as
> linguistic elements implies a great deal of metadata identification which
> is all too often completely beyond any realistic budget of time, manpower,
> or money.

I agree, and I would go further. Capturing the full semantics of content in
anything more regular than natural language itself is impossible. But that
isn't the point. You do not need or want to capture all semantics. You need
only capture strategic elements of the identity of subjects and the type of
information about those subjects in order to permit some very powerful
techniques for creating, managing, and delivering information.

And to do that you need data models that are (drum roll please) small,
tight, and topical.

Mark Baker
Analecta Communications

