From: "Edward C. Bailey"
To: docbook-tools-discuss@sourceware.cygnus.com
Subject: Re: docbook-tools-discuss: Re: I'm trying to set up docbook-tools...
Date: Wed, 27 Dec 2000 06:36:00 -0000
Message-id:
References: <200007041511.LAA15779@snark.thyrsus.com>
	<00070410352500.07357@ehome.inhouse> <873dlnjklb.fsf@nwalsh.com>
	<20000706095446.A13085@kstarr.celestial.com> <87puoqniw7.fsf@nwalsh.com>

>>>>> "Norm" == Norman Walsh writes:

...

Norm> A little more discussion about how to convert from procedural markup
Norm> to structural markup is probably in order, but tools to do this are
Norm> very, very hard to write.  This is the problem I call "dragging markup
Norm> up hill".  Look at the troff source for an (old) O'Reilly book (I
Norm> have :-), and you'll find that the same troff markup for "italic" is
Norm> used for all the things that are italic in print.  (Quelle
Norm> surprise).  But if you want to mark those things up semantically, you
Norm> have to distinguish between at least three or four different kinds of
Norm> italic things, which is nearly impossible to do accurately.

We had the same problem going from LaTeX to DocBook; for every
\texttt{foo}, our script converted it to <TT?>foo</TT?>.  We then used
Emacs to do multiple query-replaces (i.e., one to go from "TT?" to
"filename", one from "TT?" to "command", etc.).  Once you got going, it
was possible to crank through a surprising volume of markup in a
reasonable amount of time.  Pretty mind-numbing, though... :-)  And I
wouldn't recommend taking this approach if you're a large company
converting tons of legacy content.

To really automate this kind of thing requires something on the order
of a HAL 9000 -- by looking at the few words surrounding the content in
question, a human being can make a pretty accurate assessment in a
second or so, but having a machine do the same thing is "Sir
Not-Appearing-in-This-Film", at least for the time being... :-)

Ed

--
Ed Bailey        Red Hat, Inc.          http://www.redhat.com/
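
P.S. A minimal sketch of the kind of \texttt-to-placeholder pass
described above, in case it helps anyone: the <TT?> tag matches the
query-replace step in the message, but the script itself, its regex,
and the choice of Python are illustrative assumptions, not the actual
conversion script we used.

    #!/usr/bin/env python3
    # Sketch: turn every \texttt{foo} in a LaTeX file into a
    # <TT?>foo</TT?> placeholder, ready for interactive query-replace
    # (TT? -> filename, TT? -> command, ...) in Emacs.  Tag name,
    # regex, and I/O handling are assumptions for illustration.
    import re
    import sys

    TEXTTT = re.compile(r'\\texttt\{([^{}]*)\}')

    def to_placeholder(tex):
        # \texttt{ls -l}  ->  <TT?>ls -l</TT?>
        return TEXTTT.sub(r'<TT?>\1</TT?>', tex)

    if __name__ == '__main__':
        sys.stdout.write(to_placeholder(sys.stdin.read()))

Run it as a filter (python3 tt2tag.py < chapter.tex > chapter.sgml)
and then do the query-replaces by hand; the point is that the script
only drags the markup halfway up the hill, and a human still has to
pick the semantic element for each occurrence.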