From: Mark Whitis <>
To: Tim Waugh <>
Cc: "Éric Bischoff" <>,
Subject: Re: multiple bugs and security hole (was: Re: BUG: docbook2html --nochunks)
Date: Wed, 10 Apr 2002 19:40:00 -0000
Message-ID: <20020410194000.3wORln-qgkmz95VXn1KGy03unGYg6zfAQ5j-8uac1E8@z>
In-Reply-To: <>

Sorry if this message seems overly negative.  I only have time
to mention what is broken, not to talk about what works or
contribute patches for what is broken.  Thanks to everyone who is
making DocBook, SGML, and HTML document preparation feasible.
I coauthored an 800 page Linux book; DocBook is MUCH better
than the publisher's dreadful MS-Word style sheet based

Versions in use (Redhat RPMs):
(I'd need to download half of Skipjack to install newer versions; this
is only a slight exaggeration, if preliminary examination
and past experience are any indication.)

On Wed, 10 Apr 2002, Tim Waugh wrote:
> Indeed.  Here is the patch I used:

Thanks for the quick response.  I applied that patch directly to
/usr/bin/jw and it sorta-kinda fixed the problem.  Still, it is a
kludge rather than a proper bugfix. docbook2html still can't be used
as a proper filter, for example:

   <generate_docbook> | docbook2html ... | tidy ... | ...

This is un*x.  Filters should be able to take input on standard in
and send output to standard out with errors to standard error.
Obviously, you can't do that if you are using it set to blow chunks,
but that is a special mode of operation suitable only for documents
that are not only very large but also written by someone you trust.
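
Until docbook2html can read standard input and write standard output itself, a wrapper can approximate filter behaviour. The following is a minimal sketch, not part of docbook-utils: `run_as_filter` and `main` are hypothetical names, and the sketch assumes the converter takes the source file name as its last argument and writes exactly one output file into its working directory (which is what `docbook2html --nochunks` is assumed to do here).

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_as_filter(command, document: bytes) -> bytes:
    """Run a file-oriented converter as if it were a stdin/stdout filter.

    `command` is the converter and its options, for example
    ["docbook2html", "--nochunks"]; the source file name is appended.
    """
    with tempfile.TemporaryDirectory() as workdir:
        source = Path(workdir) / "input.sgml"
        source.write_bytes(document)
        # The converter's stderr is inherited, so its errors still reach
        # standard error; its stdout chatter is discarded so it cannot
        # pollute the filtered output.
        subprocess.run(
            [*command, source.name],
            cwd=workdir,
            check=True,
            stdout=subprocess.DEVNULL,
        )
        outputs = [
            p for p in Path(workdir).iterdir() if p.is_file() and p != source
        ]
        if len(outputs) != 1:
            raise RuntimeError(f"expected one output file, found {len(outputs)}")
        return outputs[0].read_bytes()


def main() -> None:
    # Intended use: <generate_docbook> | wrapper docbook2html --nochunks | tidy
    html = run_as_filter(sys.argv[1:], sys.stdin.buffer.read())
    sys.stdout.buffer.write(html)
```

Running the converter in a scratch directory only makes the pipe convenient. It does not stop a converter from writing to an absolute path elsewhere on the system.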

The blow chunks mode is also probably a serious security
hole in many situations (it creates files on the host system with
names based on text supplied by the untrustworthy remote user who
supplied the file).   Don't believe me?  Try this:
     <chapter id="/etc/youarescrewed">
Jade will complain about "/" in an id but
will still happily create or overwrite /etc/youarescrewed if you run it as
root.  Even as a
non-root user, there are plenty of files which can be compromised, such as
".ssh/authorized_keys".  Yes, there will probably be a ".html" tacked
onto the end of the file name.   That DOES NOT mean you are safe.
Not only might there be another hole that lets an attacker
get rid of the ".html" (perhaps as simple as a buffer
overflow), there are places  where a file can
do serious damage even with a ".html" extension, like in
the directories "/etc/rc.d/rc3.d/" (executable flag probably required) and
"/etc/xinetd.d/" (executable flag not required).  And if docbook2html
is not run as root, there may still be dot file directories whose contents
will be executed even with a .html extension.  Not to mention that an HTML
file itself may be the target of an attack; here, let me just trojan
this copy of the Nimda worm (which is carried in .html files)
into ~/public_html/index.html.   Or maybe I will distribute
a trojan docbook document which overwrites /www/index.html with
   7|-|15 5173 15 0\/\/|\|3D BY 7|-|3 D00D
If the person who reads that document has access to /www, there
will be trouble.
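
The mechanics of the hole can be illustrated with a short sketch. This is not jade's actual code: `chunk_path` is a hypothetical stand-in for "build the output file name from the id", and `stays_inside` is one assumed way to check the result before writing. POSIX path rules are assumed.

```python
import os


def chunk_path(output_dir: str, element_id: str) -> str:
    """Naive mapping from an id to an output file name (illustration only)."""
    return os.path.join(output_dir, element_id + ".html")


def stays_inside(output_dir: str, candidate: str) -> bool:
    """Return True only if `candidate` resolves to a path under `output_dir`."""
    root = os.path.realpath(output_dir)
    target = os.path.realpath(candidate)
    return os.path.commonpath([root, target]) == root


# An absolute id silently discards the output directory.
escaped = chunk_path("out", "/etc/youarescrewed")
print(escaped, stays_inside("out", escaped))

# A relative id with ".." climbs out of it.
climbed = chunk_path("out", "../../.ssh/authorized_keys")
print(climbed, stays_inside("out", climbed))

# An ordinary id stays where it belongs.
normal = chunk_path("out", "intro")
print(normal, stays_inside("out", normal))
```

A containment check like this only catches escapes from the output directory. It does nothing about a hostile id overwriting another file inside that directory.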

So, no document should ever be processed in blow chunks mode
unless you personally wrote it.     Which also means no document
should be distributed that needs blow chunks mode
to be processed.   And blow chunks mode should definitely
NOT be the default for docbook2html.   And no document
should ever be processed by docbook2html in a cgi-bin
unless the document was written by a user on that server.
So, you can forget about:
   - An HTML translation service that allows web users
     who do not have docbook translation software on their
     computer to enter a URL of a docbook or upload it
     and view the results.
   - Any server that publishes documents from anyone
     who isn't absolutely trustworthy and accepts those
     documents in docbook format and generates html.
     So much for the LDP, etc.   One clown
     writes a trojan "Post-It note mini-HOWTO" and
     the server is compromised.   It is not like
     the people administering those servers are
     likely to have time to validate every revision
     of every document.
   - A secure non-shell ISP which requires people to create their web
     pages in XML/SGML instead of using Frontpage.
   - A web browser which supports docbook by using docbook2html
     as an external filter.

At least using docbook2html as a standard unix filter, if that
were actually allowed, would be inherently more secure.  Of course,
you still can't allow sloppy code leading to buffer overflows in
openjade, jw, etc.

Lest some naive person suggest that fixing jade so it will not
accept "/" in a chapter "id" is the fix for the problem: history
has repeatedly demonstrated that deny-known-bad is NOT an effective
security procedure.  Someone will, to use a hypothetical example, find
out that the kernel also accepts ASCII character 254 as a synonym for "/".
Or that id="%2Fetc%2Fpasswd" gets through.   Yes, jade needs to
be fixed to allow only an extremely limited character set to appear
in filenames based on document supplied text; for example,
every single character other than A-Za-z0-9 should be translated
to an underscore, multiple consecutive underscores should be compressed
to one, leading and trailing underscores should be deleted, the total
length should be limited, and if appropriate a number should be added to
the end to prevent two identifiers from mapping to the same file name.
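
The allow-known-good rules above can be sketched in a few lines. This is an illustration of the proposal, not a patch for jade: the name `safe_name`, the 32 character limit, and the "chunk" fallback for ids that sanitize to nothing are all assumptions.

```python
import re
from typing import Set

MAX_LENGTH = 32  # assumed limit; the text only says the length should be limited


def safe_name(text: str, used: Set[str], max_length: int = MAX_LENGTH) -> str:
    """Map document-supplied text to a file name using only A-Za-z0-9 and "_"."""
    # Translate every run of disallowed characters to a single underscore,
    # which also compresses consecutive underscores into one.
    name = re.sub(r"[^A-Za-z0-9]+", "_", text)
    # Drop leading and trailing underscores, then limit the length.
    name = name.strip("_")[:max_length].rstrip("_")
    if not name:
        name = "chunk"  # assumed fallback when nothing usable is left
    # Append a number if another identifier already mapped to this name.
    candidate = name
    counter = 1
    while candidate in used:
        counter += 1
        candidate = f"{name}_{counter}"
    used.add(candidate)
    return candidate


used: Set[str] = set()
for raw in ["/etc/youarescrewed", "%2Fetc%2Fpasswd", "etc youarescrewed", "///"]:
    print(repr(raw), "->", safe_name(raw, used))
```

The uniqueness suffix is added after truncation, so a name can exceed the limit by a few characters. A stricter version would reserve room for the suffix.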

Denial of service attack:  Let's suppose that on a system with
a 65536 inode limit, I process a malicious file which has 65536

Again, don't blow chunks unless you seriously trust the document
author.  Even if it was a 300 page book, I would run as a pipe
producing a single document unless I had a good reason to trust
the author.

On a related note, docbook2html output files actually need to be tidy'ed
so badly that you might consider making a call to tidy (with configurable
options) a built-in option (or better yet, fix the generator, but that
is probably jade).   The output is technically legal HTML but the
formatting violates the spirit of HTML.

> I hope to have time to look at making a new release in the next few
> weeks.

Another question: does either 0.6.9 or the upcoming release fix
the "URL not supported" problem?   docbook2html chokes on the DOCTYPE
in files generated by abiword:
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"

Results in
   URL not supported by this version
   DTD did not contain element declaration for document type name
   element "BOOK" undefined
   element "CHAPTER" undefined
   element "SECTION" undefined
   element "TITLE" undefined
   element "PARA" undefined

Now, this appears to be at least two bugs:
   - URL in DOCTYPE is an unimplemented feature

   - failure to use a good catch-all document type where an exact
     stylesheet match is not found.   If someone ran docbook2html,
     they have already said the document is in some form of docbook.
     The program should not puke because someone omitted a doctype
     or used a doctype different from, or newer than, the style sheets
     on my system.

      <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.0//EN">
      <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.1//EN">
      <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.2//EN">

     Now it is very likely that at some point, a docbook user
     is going to receive a document in a newer version of
     docbook than their system recognizes.  One of the primary
     reasons for using XML/SGML markup in the first place
     instead of horrible proprietary formats is that new
     features can be added in the future but old programs
     can still read the document although they may not
     be able to render any new constructs.  If you
     process this document with a default stylesheet,
     you will probably get 95% of the content or more.
     You may even get 100% since a document might well
     be labeled "DocBook 4.2" but actually only use
     DocBook 4.1 features.

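The catch-all behaviour asked for in the second bug could look roughly like this. It is a sketch of the policy only, not of how docbook2html or jw select a DTD: `KNOWN_VERSIONS` is an assumed list of installed versions, and `parse_version` and `choose_version` are hypothetical names.

```python
import re
from typing import List, Optional, Tuple

Version = Tuple[int, ...]

# Assumed: the DocBook versions this system has DTDs and stylesheets for.
KNOWN_VERSIONS: List[Version] = [(4, 0), (4, 1)]

PUBLIC_ID = re.compile(r"-//OASIS//DTD DocBook (?:XML )?V(\d+(?:\.\d+)*)//EN")


def parse_version(public_id: str) -> Optional[Version]:
    """Extract (4, 1, 2) from "-//OASIS//DTD DocBook XML V4.1.2//EN"."""
    match = PUBLIC_ID.fullmatch(public_id.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def choose_version(
    public_id: Optional[str], known: List[Version] = KNOWN_VERSIONS
) -> Version:
    """Pick an installed version instead of refusing the document."""
    installed = sorted(known)
    requested = parse_version(public_id) if public_id else None
    if requested is None:
        # Missing or unrecognized doctype: fall back to the newest installed.
        return installed[-1]
    not_newer = [v for v in installed if v <= requested]
    # Exact match or the newest installed version that is not newer than
    # the request; if the request predates everything, use the oldest.
    return not_newer[-1] if not_newer else installed[0]


for pid in [
    "-//OASIS//DTD DocBook V4.0//EN",
    "-//OASIS//DTD DocBook V4.2//EN",
    "-//OASIS//DTD DocBook XML V4.1.2//EN",
    None,
]:
    print(pid, "->", choose_version(pid))
```

A real implementation would also warn on standard error when it falls back, so the user knows some newer constructs may not be rendered.
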
   blow chunks:  (blo chungks)
     v. intr.
     1. (slang) To regurgitate
     2. An extremely disdainful term for
     running docbook2html without --nochunks, or a
     similar program in a similar mode, such that you generate
     multiple tiny output files from a single source file (or
     a single source file with other files included by reference).
     As would be done by a webmaster with poor judgement on
     small documents, causing grief for users loading, reading, printing,
     searching, archiving, and making offline use of  web pages.
     Or as might legitimately be done
     by the author of a VERY large document such as a multihundred
     page book (which should still be available as a single file
     if the user wants it that way) that might legitimately
     be viewed as a number of smaller _but still substantial_
     documents.  Dividing documents up into screenful sized
     chunks of text is something only a propeller head would do;
     it might appeal to other propeller heads who are easily
     amused by having buttons to push but it really pisses off
     people who have useful work to do.

Mark Whitis       NO SPAM
Author of many open source software packages.
Coauthor: Linux Programming Unleashed (1st Edition)
