public inbox for docbook-tools-discuss@sourceware.org
* BUG: docbook2html --nochunks
From: Mark Whitis @ 2002-04-09 17:57 UTC
  To: docbook-tools-discuss

If you tell docbook2html not to blow chunks, it writes the output
html document to standard out.   This is fine except for the fact
that it also outputs status messages to standard out (instead
of sending them to stderr, where they belong).  So you end up
with an invalid document because the status messages are mixed in.

So, you get stuff like this:
   Using catalogs: /etc/sgml/sgml-docbook-4.0.cat
   Using stylesheet: /usr/share/sgml/docbook/utils-0.6/docbook-utils.dsl#html
   Working on: /home/whitis/docbook/sample.docbook
   [...]
   Done.
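
A minimal sketch of the split being asked for, using a hypothetical "convert" stub rather than the real jw/backend code: status lines go to stderr through a small "log" helper, so redirecting or piping stdout captures only the document.

```sh
#!/bin/sh
# Hypothetical helper: status and progress messages go to stderr (fd 2).
log() {
    printf '%s\n' "$*" >&2
}

# Hypothetical converter stub standing in for the real backend.
# $1 = catalog, $2 = input file.  Only the document goes to stdout.
convert() {
    log "Using catalogs: $1"
    log "Working on: $2"
    printf '<html><body>converted %s</body></html>\n' "$2"
    log "Done."
}
```

Run as "convert catalog sample.docbook > sample.html", the status lines still appear on the terminal while sample.html contains only HTML.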


--
Mark Whitis   http://www.freelabs.com/~whitis/       NO SPAM
Author of many open source software packages.
Coauthor: Linux Programming Unleashed (1st Edition)


* Re: BUG: docbook2html --nochunks
From: Éric Bischoff @ 2002-04-09 23:57 UTC
  To: Mark Whitis, docbook-tools-discuss, twaugh

On Wednesday 10 April 2002 02:58, Mark Whitis wrote:
> If you tell docbook2html not to blow chunks, it writes the output
> html document to standard out.   This is fine except for the fact
> that it also outputs status messages to standard out (instead
> of sending them to stderr, where they belong).  So you end up
> with an invalid document because the status messages are mixed in.
>
> So, you get stuff like this:
>    Using catalogs: /etc/sgml/sgml-docbook-4.0.cat
>    Using stylesheet: /usr/share/sgml/docbook/utils-0.6/docbook-utils.dsl#html
>    Working on: /home/whitis/docbook/sample.docbook
>    [...]
>    Done.

I think this has already been fixed. Tim ?


* Re: BUG: docbook2html --nochunks
From: Tim Waugh @ 2002-04-10  0:03 UTC
  To: Éric Bischoff; +Cc: Mark Whitis, docbook-tools-discuss


On Wed, Apr 10, 2002 at 08:56:48AM +0200, Éric Bischoff wrote:

> I think this has already been fixed. Tim ?

Indeed.  Here is the patch I used:

--- docbook-utils-0.6.9/bin/jw.in.nochunks	Tue Jul  3 14:57:32 2001
+++ docbook-utils-0.6.9/bin/jw.in	Tue Jul  3 14:59:52 2001
@@ -369,7 +369,12 @@
 cd $SGML_OUTPUT_DIRECTORY
 export SGML_JADE SGML_FILE_NAME SGML_ARGUMENTS
 export SGML_CATALOG_FILES SGML_BASE_DIR SGML_FILE SGML_STYLESHEET
-sh $SGML_BACKEND
+if [ -z "$SGML_NOCHUNKS" ]
+then
+	sh $SGML_BACKEND
+else
+	sh $SGML_BACKEND >$SGML_FILE_NAME.html
+fi
 SGML_RETURN=$?
 cd $SGML_CURRENT_DIRECTORY
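
The logic the patch adds can be tried in isolation.  A sketch: the same SGML_NOCHUNKS test wrapped in a hypothetical run_backend function, with the variables quoted (the patch leaves them unquoted, which breaks on paths containing spaces).  Only the backend's stdout is redirected; anything printed outside that one command is unaffected.

```sh
#!/bin/sh
# run_backend: standalone copy of the dispatch the patch adds to jw.
# SGML_BACKEND, SGML_FILE_NAME and SGML_NOCHUNKS mirror jw's variables.
run_backend() {
    if [ -z "$SGML_NOCHUNKS" ]
    then
        # Chunking mode: the backend writes its own output files.
        sh "$SGML_BACKEND"
    else
        # --nochunks: capture the backend's stdout in a single file.
        sh "$SGML_BACKEND" >"$SGML_FILE_NAME.html"
    fi
}
```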

I hope to have time to look at making a new release in the next few
weeks.
 
Tim.
*/



* Re: multiple bugs and security hole (was: Re: BUG: docbook2html --nochunks)
From: Mark Whitis @ 2002-04-10 19:40 UTC
  To: Tim Waugh; +Cc: Éric Bischoff, docbook-tools-discuss

Sorry if this message seems overly negative.  I only have time
to mention what is broken, not to talk about what works or
contribute patches for what's broken.  Thanks to everyone who is
making DocBook, SGML, and HTML document preparation feasible.
I coauthored an 800-page Linux book; DocBook is MUCH better
than the publisher's dreadful MS-Word style-sheet-based
monstrosity.

Versions in use (Redhat RPMs):
   openjade-1.3-13
   docbook-utils-0.6-13
(I'd need to download half of Skipjack to install newer versions; this
is only a slight exaggeration, if preliminary examination
and past experience are any indication.)

On Wed, 10 Apr 2002, Tim Waugh wrote:
> Indeed.  Here is the patch I used:

Thanks for the quick response.  I applied that patch directly to
/usr/bin/jw and it sorta-kinda fixed the problem.  Still, it is a
kludge rather than a proper bugfix. docbook2html still can't be used
as a proper filter, for example:

   <generate_docbook> | docbook2html ... | tidy ... | ...

This is un*x.  Filters should be able to take input on standard in
and send output to standard out with errors to standard error.
Obviously, you can't do that if it is set to blow chunks,
but that is a special mode of operation suitable only for documents
that are not only very large but also written by someone you trust.
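
Until docbook2html behaves as a filter, a wrapper can approximate one.  A sketch, assuming the converter writes input.html into the current directory when given input.sgml (check what your jw version actually names its --nochunks output): stdin is copied into a private temporary directory, the converter runs there with its own chatter sent to stderr, and the resulting HTML is written to stdout.  as_filter is a hypothetical name.

```sh
#!/bin/sh
# as_filter CMD [ARGS...]
# Hypothetical wrapper: read a DocBook document on stdin, run
# "CMD ARGS... input.sgml" in a private temporary directory, and copy
# the resulting input.html to stdout.  Variables are global (no "local"
# in POSIX sh), hence the af_ prefix.
as_filter() {
    af_dir=$(mktemp -d) || return 1
    cat >"$af_dir/input.sgml"
    (
        cd "$af_dir" || exit 1
        # The converter's own stdout chatter goes to stderr, not the pipe.
        "$@" input.sgml >&2 || exit 1
        cat input.html
    )
    af_status=$?
    rm -rf "$af_dir"
    return "$af_status"
}
```

Used as:  <generate_docbook> | as_filter docbook2html --nochunks | tidy ... | ...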

The blow chunks mode is probably also a serious security
hole in many situations (it creates files on the host system with
names based on text supplied by the untrustworthy remote user who
supplied the file).   Don't believe me?  Try this:
     <chapter id="/etc/youarescrewed">
Jade will complain about "/" in an id but will still happily create
(or overwrite) /etc/youarescrewed if you run it as root.  Even as a
non-root user, there are plenty of files which can be compromised, such as
".ssh/authorized_keys".  Yes, there will probably be a ".html" tacked
onto the end of the file name.   That DOES NOT mean you are safe.
Not only might there be another hole that lets an attacker
get rid of the ".html" (perhaps as simple as a buffer
overflow), there are places where a file can
do serious damage even with a ".html" extension, like in
the directories "/etc/rc.d/rc3.d/" (executable flag probably required) and
/etc/xinetd.d/ (executable flag not required).  And if docbook2html
is not run as root, there may still be dot-file directories whose contents
are executed even with a .html extension.  Not to mention that an HTML file
may itself be the target of an attack: let me just drop this copy of the
Nimda worm (which is carried in .html files) into
~/public_html/index.html.   Or maybe I will distribute
a trojan docbook document which overwrites /www/index.html with
   7|-|15 5173 15 0\/\/|\|3D BY 7|-|3 D00D
   (THIS SITE IS OWNED BY THE DUDE)
If the person who reads that document has access to /www, there
will be trouble.

So, no document should ever be processed in blow chunks mode
unless you personally wrote it.  Which also means no document
should be distributed that needs blow chunks mode
to be processed.  And blow chunks mode should definitely
NOT be the default for docbook2html.  And no document
should ever be processed by docbook2html in a cgi-bin
unless the document was written by a user on that server.
So, you can forget about:
   - An HTML translation service that allows web users
     who do not have docbook translation software on their
     computer to enter the URL of a DocBook document or upload it
     and view the results.
   - Any server that publishes documents from anyone
     who isn't absolutely trustworthy and accepts those
     documents in docbook format and generates html.
     So much for the LDP, linux.org, etc.   One clown
     writes a trojan "Post-It note mini-HOWTO" and
     the server is compromised.   It is not like
     the people administering those servers are
     likely to have time to validate every revision
     of every document.
   - A secure non-shell ISP that requires people to create their web
     pages in XML/SGML instead of using FrontPage.
   - A web browser that supports DocBook by using docbook2html
     as an external filter.

Using docbook2html as a standard Unix filter, if that
were actually allowed, would at least be inherently more secure.  Of course,
you still can't allow sloppy code leading to buffer overflows in
openjade, jw, etc.

Lest some naive person suggest that fixing jade so it will not
accept "/" in a chapter "id" is the fix for the problem, history
has repeatedly demonstrated that deny-known-bad is NOT an effective
security procedure.  Someone will, to take a hypothetical example, find
out that the kernel also accepts ASCII character 254 as a synonym for "/",
or that id="%2Fetc%2Fpasswd" gets through.   Yes, jade needs to
be fixed to allow only an extremely limited character set to appear
in filenames based on document-supplied text.  For example,
every single character other than A-Za-z0-9 should be translated
to an underscore, runs of consecutive underscores should be compressed
to one, leading and trailing underscores should be deleted, the total
length should be limited, and if appropriate a number should be added
to the end to prevent two identifiers from mapping to the same file name.
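
The file-name rule above is straightforward to express.  A minimal sketch in POSIX sh (safe_name is a hypothetical name): the 32-character cap, the "id" fallback for names that sanitise to nothing, and the _2, _3 (and so on) collision suffix are assumptions, and the names already issued are passed in as extra arguments.

```sh
#!/bin/sh
# safe_name ID [USED...]
# Map a document-supplied ID to a file-name stem: every byte outside
# A-Za-z0-9 becomes "_", runs of "_" are squeezed to one, a leading or
# trailing "_" is dropped, and the stem is capped at 32 characters
# (the cap applies before any collision suffix is appended).
safe_name() {
    sn_base=$(printf '%s' "$1" |
        LC_ALL=C sed 's/[^A-Za-z0-9]/_/g; s/__*/_/g; s/^_//; s/_$//' |
        cut -c1-32 | sed 's/_$//')
    shift
    # Hypothetical fallback for IDs made only of disallowed characters.
    [ -n "$sn_base" ] || sn_base=id
    sn_name=$sn_base
    sn_n=1
    # Append _2, _3, ... until the name is not among the USED arguments.
    while :; do
        sn_clash=no
        for sn_u in "$@"; do
            [ "$sn_u" = "$sn_name" ] && sn_clash=yes
        done
        [ "$sn_clash" = no ] && break
        sn_n=$((sn_n + 1))
        sn_name=${sn_base}_$sn_n
    done
    printf '%s\n' "$sn_name"
}
```

This is an allow-list: the attacker-controlled ID can only ever contribute letters, digits and single underscores, so "/", "%2F" and high-bit bytes never reach the file system.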

Denial of service attack:  Let's suppose that on a system with
a 65536 inode limit, I process a malicious file which has 65536
<chapter> elements.
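
One cheap mitigation for the inode-exhaustion case, sketched under assumptions: before chunking, count the start tags of elements that are likely to become separate files and refuse to continue above a limit.  The element list and the default limit of 500 are hypothetical, and the count is a text match, not an SGML parse.

```sh
#!/bin/sh
# chunk_guard FILE [LIMIT]
# Hypothetical pre-flight check: fail (status 1) if FILE has more than
# LIMIT (default 500) start tags of likely chunk elements; status 2 if
# FILE is unreadable.
chunk_guard() {
    [ -r "$1" ] || return 2
    cg_limit=${2:-500}
    # -o prints each match on its own line, so several tags per line
    # are all counted; -i because SGML tag names are case-insensitive.
    cg_count=$(grep -oiE '<(chapter|appendix|preface|sect1|refentry)[[:space:]>]' "$1" |
        wc -l | tr -d ' ')
    if [ "$cg_count" -gt "$cg_limit" ]; then
        printf '%s: %s chunks exceeds the limit of %s\n' \
            "$1" "$cg_count" "$cg_limit" >&2
        return 1
    fi
    return 0
}
```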

Again, don't blow chunks unless you seriously trust the document
author.  Even if it was a 300 page book, I would run as a pipe
producing a single document unless I had a good reason to trust
the author.

On a related note, docbook2html output actually needs to be tidied so badly
that you might consider making a call to tidy (with configurable
options) a built-in option (or better yet, fix the generator, but that
is probably jade).   The output is technically legal HTML but the
formatting violates the spirit of HTML.
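
A sketch of the suggested post-processing hook, assuming HTML Tidy is installed.  -m (modify the file in place), -i (indent) and -q (quiet) are standard Tidy options; TIDY and TIDY_OPTS are hypothetical configuration variables.  Tidy exits with 1 when it only issued warnings and 2 on errors, so the hook treats only 2 and above as failure.

```sh
#!/bin/sh
# tidy_outputs FILE...
# Hypothetical hook: run HTML Tidy in place on each generated file.
tidy_outputs() {
    for to_f in "$@"; do
        # Unquoted on purpose: TIDY_OPTS may hold several options.
        ${TIDY:-tidy} ${TIDY_OPTS:--m -i -q} "$to_f"
        to_rc=$?
        # Tidy: 0 = clean, 1 = warnings only, 2 = errors.
        if [ "$to_rc" -ge 2 ]; then
            printf 'tidy failed on %s (status %s)\n' "$to_f" "$to_rc" >&2
            return "$to_rc"
        fi
    done
    return 0
}
```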

> I hope to have time to look at making a new release in the next few
> weeks.


Another question: does either 0.6.9 or the upcoming release fix
the "URL not supported" problem?   docbook2html chokes on the DOCTYPE
in files generated by abiword:
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
	"http://www.oasis-open.org/docbook/xml/4.0/docbookx.dtd"

Results in
   URL not supported by this version
   DTD did not contain element declaration for document type name
   element "BOOK" undefined
   element "CHAPTER" undefined
   element "SECTION" undefined
   element "TITLE" undefined
   element "PARA" undefined
   ...

Now, this appears to be at least two bugs:
   - URL in DOCTYPE is unimplemented feature

   - failure to use a good catch-all document type where an exact
     stylesheet match is not found.   If someone ran docbook2html,
     they have already said the document is in some form of docbook.
     The program should not puke because someone omitted a doctype
     or uses a doctype different/newer than the style sheets on
     my system.

     works:
      <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.0//EN">
      <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.1//EN">
     pukes:
      <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.2//EN">

     Now it is very likely that at some point, a docbook user
     is going to receive a document in a newer version of
     docbook than their system recognizes.  One of the primary
     reasons for using XML/SGML markup in the first place
     instead of horrible proprietary formats is that new
     features can be added in the future but old programs
     can still read the document although they may not
     be able to render any new constructs.  If you
     process this document with a default stylesheet,
     you will probably get 95% of the content or more.
     You may even get 100% since a document might well
     be labeled "DocBook 4.2" but actually only uses
     "Docbook 4.1 features".

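A sketch of the proposed fallback, under stated assumptions: the version is read from the DocBook public identifier, and if no DTD for that version is installed the newest installed version is used, with a warning on stderr.  The installed versions are passed in by hand here; a real tool would have to derive them from its SGML catalogs.  Both function names are hypothetical.

```sh
#!/bin/sh
# docbook_version PUBLIC-ID
# Print the version from a DocBook public identifier, e.g.
# "-//OASIS//DTD DocBook V4.2//EN" -> 4.2 (XML identifiers work too).
docbook_version() {
    printf '%s\n' "$1" |
        sed -n 's/.*DocBook .*V\([0-9][0-9.]*\)\/\/.*/\1/p'
}

# pick_dtd_version WANTED INSTALLED...
# Print WANTED if installed; otherwise warn on stderr and print the
# highest installed version.  Fails if nothing is installed.
pick_dtd_version() {
    pv_want=$1
    shift
    for pv_v in "$@"; do
        if [ "$pv_v" = "$pv_want" ]; then
            printf '%s\n' "$pv_v"
            return 0
        fi
    done
    # Numeric sort on each dot-separated field, keep the largest.
    pv_best=$(printf '%s\n' "$@" | sort -t. -k1,1n -k2,2n -k3,3n | tail -n 1)
    [ -n "$pv_best" ] || return 1
    printf 'DocBook %s not installed; falling back to %s\n' \
        "$pv_want" "$pv_best" >&2
    printf '%s\n' "$pv_best"
}
```

The warning matters: the output may silently lose constructs that only exist in the newer DTD.
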
[Definition:
   blow chunks:  (blo chungks)
     v. intr.
     1. (slang) To regurgitate
     2. An extremely disdainful term for
     running docbook2html without --nochunks, or a
     similar program in a similar mode, such that you generate
     multiple tiny output files from a single source file (or
     a single source file with other files included by reference).
     As would be done by a webmaster with poor judgement on
     small documents, causing grief for users loading, reading, printing,
     searching, archiving, and making offline use of  web pages.
     Or as might legitimately be done
     by the author of a VERY large document such as a multihundred
     page book (which should still be available as a single file
     if the user wants it that way) that might legitimately
     be viewed as a number of smaller _but still substantial_
     documents.  Dividing documents up into screenful sized
     chunks of text is something only a propeller head would do;
     it might appeal to other propeller heads who are easily
     amused by having buttons to push but it really pisses off
     people who have useful work to do.
]





* New location of the "Crash Course to DocBook"
From: Éric Bischoff @ 2002-04-11  3:25 UTC
  To: Docbook tools

Hi all,

As I'm leaving Caldera to start my own company, I had to move the location of 
the "Crash Course to DocBook".

Former address was:
	http://www.caldera.de/~eric/crash-course/HTML/index.html

New address is:
	http://www.bureau-cornavin.com/opensource/crash-course/index.html

Sorry for the inconvenience.


<spam mode="for-those-interested">
The "Bureau Cornavin" is a new company specialized in:
- translation of technical documents
(English to German, French, Italian, Spanish, Portuguese, Brazilian,
 Czech, Romanian, Turkish, Hungarian and Polish; German to French).
- documentation writing
- documentation and XML expertise
</spam>


* Re: multiple bugs and security hole (was: Re: BUG: docbook2html --nochunks)
From: Tim Waugh @ 2002-04-11  7:17 UTC
  To: Mark Whitis; +Cc: Éric Bischoff, docbook-tools-discuss


Hi Mark,

Thanks for your feedback.

> Thanks for the quick response.  I applied that patch directly to
> /usr/bin/jw and it sorta-kinda fixed the problem.  Still, it is a
> kludge rather than a proper bugfix. docbook2html still can't be used
> as a proper filter, for example:
> 
>    <generate_docbook> | docbook2html ... | tidy ... | ...

Well then all the other backends are 'broken', if you take that
attitude.  I think a more useful approach is to have consistent
behaviour across all the backends: that of generating one or more
output files in the current (or a specified) directory.  That's what
the man page says it does.

> This is un*x.  Filters should be able to take input on standard in
> and send output to standard out with errors to standard error.

If jw were to output to stdout, it would (in general) need to send a
tar file!
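
Tim's remark can be made concrete: a chunking run could still sit in a pipe if its whole output directory were streamed as an archive.  A sketch, assuming the converter writes its files into the current directory; chunks_to_tar is a hypothetical name, and "tar -cf - -C DIR ." (supported by GNU and BSD tar) writes an archive of DIR to stdout.

```sh
#!/bin/sh
# chunks_to_tar CMD [ARGS...] < input
# Hypothetical wrapper: run a chunking converter on stdin inside a
# private directory, then stream every file it produced as a tar
# archive on stdout.  Converter chatter goes to stderr.
chunks_to_tar() {
    ct_dir=$(mktemp -d) || return 1
    mkdir "$ct_dir/out"
    # Keep the input outside out/ so it is not archived.
    cat >"$ct_dir/input.sgml"
    (
        cd "$ct_dir/out" || exit 1
        "$@" ../input.sgml >&2 || exit 1
    ) && tar -cf - -C "$ct_dir/out" .
    ct_status=$?
    rm -rf "$ct_dir"
    return "$ct_status"
}
```

For example:  <generate_docbook> | chunks_to_tar docbook2html | tar -xf - -C site/   (site/ being a hypothetical destination directory).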

> The blow chunks mode is probably also a serious security
> hole in many situations (it creates files on the host system with
> names based on text supplied by the untrustworthy remote user who
> supplied the file).   Don't believe me?  Try this:
>      <chapter id="/etc/youarescrewed">

Yes, this is an interesting attack.  The docbook-dsssl package by
default makes up its own names for output files when chunking; the Red
Hat Linux docbook-utils package comes with a default custom stylesheet
which turns on a feature to use IDs as filenames.  We'll be correcting
that shortly.

> Denial of service attack:  Let's suppose that on a system with
> a 65536 inode limit, I process a malicious file which has 65536
> <chapter> elements.

I can say the same thing about tar files (for example).

> On a related note, docbook2html output actually needs to be tidied so
> badly that you might consider making a call to tidy (with
> configurable options) a built-in option (or better yet, fix the
> generator, but that is probably jade).  The output is technically
> legal HTML but the formatting violates the spirit of HTML.

The output is determined by the stylesheets.  They are the way they
are because of technical details---significant whitespace is the
reason for '>' being separate to the rest of the element, for example.

I'm sure that Norm would welcome patches that make the HTML output
nicer to read.  How's your DSSSL? ;-)

(On the other hand, who is it that is editing generated output rather
than editing the source?)

> Another question: does either 0.6.9 or the upcoming release fix
> the "URL not supported" problem?   docbook2html chokes on the DOCTYPE
> in files generated by abiword:
> <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
> 	"http://www.oasis-open.org/docbook/xml/4.0/docbookx.dtd"

For a long time the Red Hat Linux openjade package came with HTTP
support disabled.  It is enabled in the current package (in
Skipjack).

But you might want to consider using an XSL processor for DocBook
XML.  Take a look at the xmlto package for a way to start.

> Now, this appears to be at least two bugs:
>    - URL in DOCTYPE is unimplemented feature

(Actually a feature that defaults to 'disabled'.)

>    - failure to use a good catch-all document type where an exact
>      stylesheet match is not found.

This is an unreasonable requirement and would just generate bogus bug
reports.  People should install the DTD for the document they are
processing.

Tim.
*/

[-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* BUG: docbook2html --nochunks
@ 2002-12-20 19:23 Mark Whitis
  2002-04-09 17:57 ` Mark Whitis
  2002-12-20 19:23 ` Éric Bischoff
  0 siblings, 2 replies; 12+ messages in thread
From: Mark Whitis @ 2002-12-20 19:23 UTC (permalink / raw)
  To: docbook-tools-discuss

If you tell docbook2html not to blow chunks, it writes the output
html document to standard out.   This is fine except for the fact
that it also outputs status messages to standard out (instead
of sending them to stderr where the belong).  So you end up
with an invalid document because the status messages are mixed in.

So, you get stuff like this:
   Using catalogs: /etc/sgml/sgml-docbook-4.0.cat
   Using stylesheet: /usr/share/sgml/docbook/utils-0.6/docbook-utils.dsl#html
   Working on: /home/whitis/docbook/sample.docbook
   [...]
   Done.


--
Mark Whitis   http://www.freelabs.com/~whitis/       NO SPAM
Author of many open source software packages.
Coauthor: Linux Programming Unleashed (1st Edition)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: BUG: docbook2html --nochunks
  2002-12-20 19:23 ` Éric Bischoff
  2002-04-09 23:57   ` Éric Bischoff
@ 2002-12-20 19:23   ` Tim Waugh
  2002-04-10  0:03     ` Tim Waugh
  2002-12-20 19:23     ` multiple bugs and security hole (was: Re: BUG: docbook2html --nochunks) Mark Whitis
  1 sibling, 2 replies; 12+ messages in thread
From: Tim Waugh @ 2002-12-20 19:23 UTC (permalink / raw)
  To: Éric Bischoff; +Cc: Mark Whitis, docbook-tools-discuss

[-- Attachment #1: Type: text/plain, Size: 715 bytes --]

On Wed, Apr 10, 2002 at 08:56:48AM +0200, Éric Bischoff wrote:

> I think this has already been fixed. Tim ?

Indeed.  Here is the patch I used:

--- docbook-utils-0.6.9/bin/jw.in.nochunks	Tue Jul  3 14:57:32 2001
+++ docbook-utils-0.6.9/bin/jw.in	Tue Jul  3 14:59:52 2001
@@ -369,7 +369,12 @@
 cd $SGML_OUTPUT_DIRECTORY
 export SGML_JADE SGML_FILE_NAME SGML_ARGUMENTS
 export SGML_CATALOG_FILES SGML_BASE_DIR SGML_FILE SGML_STYLESHEET
-sh $SGML_BACKEND
+if [ -z "$SGML_NOCHUNKS" ]
+then
+	sh $SGML_BACKEND
+else
+	sh $SGML_BACKEND >$SGML_FILE_NAME.html
+fi
 SGML_RETURN=$?
 cd $SGML_CURRENT_DIRECTORY

I hope to have time to look at making a new release in the next few
weeks.
 
Tim.
*/

[-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: multiple bugs and security hole (was: Re: BUG: docbook2html --nochunks)
  2002-12-20 19:23     ` multiple bugs and security hole (was: Re: BUG: docbook2html --nochunks) Mark Whitis
  2002-04-10 19:40       ` Mark Whitis
@ 2002-12-20 19:23       ` Tim Waugh
  2002-04-11  7:17         ` Tim Waugh
  2002-12-20 19:23       ` New location of the "Crash Course.to DocBook" Éric Bischoff
  2 siblings, 1 reply; 12+ messages in thread
From: Tim Waugh @ 2002-12-20 19:23 UTC (permalink / raw)
  To: Mark Whitis; +Cc: Éric Bischoff, docbook-tools-discuss

[-- Attachment #1: Type: text/plain, Size: 3426 bytes --]

Hi Mark,

Thanks for your feedback.

> Thanks for the quick response.  I applied that patch directly to
> /usr/bin/jw and it sorta-kinda fixed the problem.  Still, it is a
> kludge rather than a proper bugfix. docbook2html still can't be used
> as a proper filter, for example:
> 
>    <generate_docbook> | docbook2html ... | tidy ... | ...

Well then all the other backends are 'broken', if you take that
attitude.  I think a more useful approach is to have consistent
behaviour across all the backends: that of generating one or more
output files in the current (or a specified) directory.  That's what
the man page says it does.

> This is un*x.  Filters should be able to take input on standard in
> and send output to standard out with errors to standard error.

If jw were to output to stdout, it would (in general) need to send a
tar file!

> The blow chunks mode is also probably also a serious security
> hole in many situations (it creates files on the host system with
> names based on text supplied by the untrustworthy remote user who
> supplied the file).   Don't believe me?  Try this
>      <chapter id="/etc/youarescrewed">

Yes, this is an interesting attack.  The docbook-dsssl package by
default makes up its own names for output files when chunking; the Red
Hat Linux docbook-utils package comes with a default custom stylesheet
which turns on a feature to use IDs as filenames.  We'll be correcting
that shortly.

> Denial of service attack:  Lets suppose that on a system with
> a 65536 inode limit, I process a mailicious file which has 65536
> <chapter>'s.

I can say the same thing about tar files (for example).

> On a related note, Docbook2html files actually need to be tidy'ed so
> badly that you might consider making a call to tidy (with
> configurable options), a built option (or better yet, fix the
> generator - but that is probably jade).  The output is technically
> legal HTML but the formatting violates the spirit of HTML.

The output is determined by the stylesheets.  They are the way they
are because of technical details---significant whitespace is the
reason for '>' being separate to the rest of the element, for example.

I'm sure that Norm would welcome patches that make the HTML output
nicer to read.  How's your DSSSL? ;-)

(On the other hand, who is it that is editing generating output rather
than editing the source?)

> Another question: does either 0.6.9 or the upcoming release fix
> the "URL not supported" problem?   docbook2html chokes on the DOCTYPE
> in files generated by abiword:
> <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
> 	"http://www.oasis-open.org/docbook/xml/4.0/docbookx.dtd"

For a long time the Red Hat Linux openjade package came with HTTP
support disabled.  It is enabled in the current package (in
Skipjack).

But you might want to consider using an XSL processor for DocBook
XML.  Take a look at the xmlto package for a way to start.

> Now, this appears to be at least two bugs:
>    - URL in DOCTYPE is unimplemented feature

(Actually a feature that defaults to 'disabled'.)

>    - failure to use a good catch-all document type where an exact
>      stylesheet match is not found.

This is an unreasonable requirement and would just generate bogus bug
reports.  People should install the DTD for the document they are
processing.

Tim.
*/

[-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: BUG: docbook2html --nochunks
  2002-12-20 19:23 BUG: docbook2html --nochunks Mark Whitis
  2002-04-09 17:57 ` Mark Whitis
@ 2002-12-20 19:23 ` Éric Bischoff
  2002-04-09 23:57   ` Éric Bischoff
  2002-12-20 19:23   ` Tim Waugh
  1 sibling, 2 replies; 12+ messages in thread
From: Éric Bischoff @ 2002-12-20 19:23 UTC (permalink / raw)
  To: Mark Whitis, docbook-tools-discuss, twaugh

On Wednesday 10 April 2002 02:58, Mark Whitis wrote:
> If you tell docbook2html not to blow chunks, it writes the output
> html document to standard out.   This is fine except for the fact
> that it also outputs status messages to standard out (instead
> of sending them to stderr where the belong).  So you end up
> with an invalid document because the status messages are mixed in.
>
> So, you get stuff like this:
>    Using catalogs: /etc/sgml/sgml-docbook-4.0.cat
>    Using stylesheet:
> /usr/share/sgml/docbook/utils-0.6/docbook-utils.dsl#html Working on:
> /home/whitis/docbook/sample.docbook
>    [...]
>    Done.

I think this has already been fixed. Tim ?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* New location of the "Crash Course.to DocBook"
  2002-12-20 19:23     ` multiple bugs and security hole (was: Re: BUG: docbook2html --nochunks) Mark Whitis
  2002-04-10 19:40       ` Mark Whitis
  2002-12-20 19:23       ` Tim Waugh
@ 2002-12-20 19:23       ` Éric Bischoff
  2002-04-11  3:25         ` Éric Bischoff
  2 siblings, 1 reply; 12+ messages in thread
From: Éric Bischoff @ 2002-12-20 19:23 UTC (permalink / raw)
  To: Docbook tools

Hi all,

As I'm leaving Caldera to start my own company, I had to move the location of 
the "Crash Course to DocBook".

Former address was :
	http://www.caldera.de/~eric/crash-course/HTML/index.html

New address is :
	http://www.bureau-cornavin.com/opensource/crash-course/index.html

Sorry for the inconvenience.


<spam mode="for-those-interested">
The "Bureau Cornavin" is a new company specialized in:
- translation of technical documents
(English to German, French, Italian, Spanish, Portuguese, Brazilian,
 Czech, Romanian, Turkish, Hungarian and Polish ; German to French).
- documentation writing
- documentation and XML expertise
</spam>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: multiple bugs and security hole (was: Re: BUG: docbook2html --nochunks)
  2002-12-20 19:23   ` Tim Waugh
  2002-04-10  0:03     ` Tim Waugh
@ 2002-12-20 19:23     ` Mark Whitis
  2002-04-10 19:40       ` Mark Whitis
                         ` (2 more replies)
  1 sibling, 3 replies; 12+ messages in thread
From: Mark Whitis @ 2002-12-20 19:23 UTC (permalink / raw)
  To: Tim Waugh; +Cc: Éric Bischoff, docbook-tools-discuss

Sorry if this message seems overly negative.  I only have time
to mention that which is broke.  Not talk about what works or
contribute patches for what's broke.  Thanks to everyone who is
making Docbook, SGML, and HTML document preparation feasable.
I coauthored an 800 page linux book; docbook is MUCH better
than the publisher's dreadful MS-Word style sheet based
monstrosity.

Versions in use (Redhat RPMs):
   openjade-1.3-13
   docbook-utils-0.6-13
(i'd need to download half of skipjack to install newer versions - this
is only a little bit of an exageration if preliminary examination
and past experience are any indication).

On Wed, 10 Apr 2002, Tim Waugh wrote:
> Indeed.  Here is the patch I used:

Thanks for the quick response.  I applied that patch directly to
/usr/bin/jw and it sorta-kinda fixed the problem.  Still, it is a
kludge rather than a proper bugfix. docbook2html still can't be used
as a proper filter, for example:

   <generate_docbook> | docbook2html ... | tidy ... | ...

This is un*x.  Filters should be able to take input on standard in
and send output to standard out with errors to standard error.
Obviously, you can't do that if you are using it set to blow chunks,
but that is a special mode of operation suitable only for documents
that are not only very large but also written by someone you trust.

The blow chunks mode is also probably also a serious security
hole in many situations (it creates files on the host system with
names based on text supplied by the untrustworthy remote user who
supplied the file).   Don't believe me?  Try this
     <chapter id="/etc/youarescrewed">
Jade will complain aobut "/" in an id but
will still happily create overwrite /etc/youarescrewed if you run it as
root.  Even as a
non-root user, there are plenty of files which can be compromised, such as
".ssh/authorized_keys".  Yes, there will probably be a ".html" tacked
onto the end of the file name.   That DOES NOT mean you are safe.
Not only might there be another hole that lets an attacker
get rid of the ".html" (perhaps as simple as a buffer
overflow), there are places  where a file can
do serious damage even with a ".html" extension, like in
the directories "/etc/rc.d/rc3.d/" (executable flag probably required) and
/etc/xinetd.d/ (executable flag not required).  And if docbook2html
is not run as root, there may still be dot file directorys which will be
executed even with a .html extension.  Not to mention an HTML file
itself may be the target of an attack; here let me just trojan
this copy of the nimbda worm (which is carred in .html files)
into ~/public_html/index.html.   Or maybe I will distribute
a trojan docbook document which overwrites /www/index.html with
   7|-|15 5173 15 0\/\/|\|3D BY 7|-|3 D00D
   (THIS SITEIS OWNED BY THE DUDE)
If the person who reads that document has access to /www, there
will be trouble.

So, no document should ever be processed in blow chunks mode
unless you personally wrote it.     Which also means no document
should should be distributed that needs blow chunks mode
to be processed.   And blow chunks mode should definitely
NOT be the default for docbook2html.   And no document
should every be processed by docbook2html in a vgi-bin
unless the document was written by a user on that server.
So, you can forget about:
   - An HTML translation service that allows web users
     who do not have docbook translation software on their
     computer to enter a URL of a docbook or upload it
     and view the results.
   - Any server that publishes documents from anyone
     who isn't absolutely trustworthy and accepts those
     documents in docbook format and generates html.
     So much for the LDP, linux.org, etc.   One clown
     writes a trojan "Post-It note mini-HOWTO" and
     the server is compromised.   It is not like
     the people administering those servers are
     likely to have time to validate every revision
     of every document.
   - secure non-shell ISP which requires people to create their web
     pages in XML/SGML instead of using Frontpage.
   - a web browser which supports docbook by using docbook2html
     as an external filter.

At least using docbook2html as a standard unix filter, if that
were actually allowed, would be inherently more secure.  Of course,
you still can't allow sloppy code leading to buffer overlows in
openjade, jw, etc.

Lest some naive person suggest that fixing jade so it will not
accept "/" in a chapter "id" is the fix for the problem, history
has repeatedly demonstrated that deny-known-bad is NOT an effective
security procedure.  Someone will, to take a hypothetical example, find
out that the kernel also accepts character 254 as a synonym for "/".
Or that id="%2Fetc%2Fpasswd" gets through.  Yes, jade needs to
be fixed to allow only an extremely limited character set to appear
in filenames derived from document-supplied text.  For example,
every single character other than A-Za-z0-9 should be translated
to an underscore, runs of consecutive underscores should be compressed
to one, leading and trailing underscores should be deleted, the total
length should be limited, and, where needed, a number should be added
to the end to prevent two identifiers from mapping to the same file name.

Denial of service attack:  Let's suppose that on a system with
a 65536 inode limit, I process a malicious file which has 65536
<chapter>s.  Chunk mode writes a separate file for each of them,
and the filesystem runs out of inodes.

Again, don't blow chunks unless you seriously trust the document
author.  Even if it were a 300-page book, I would run it as a pipe
producing a single document unless I had a good reason to trust
the author.
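
One possible guard against the inode attack, sketched in Python under stated assumptions: estimate how many files chunk mode would write before running it, and fall back to a single output file when that number is unreasonable.  The element list, the 500-file cap and the 1% share of free inodes are hypothetical tuning values, not anything docbook2html does today.

```python
import os
import re

def count_chunk_elements(source, tags=("chapter", "appendix", "preface", "sect1")):
    """Rough count of elements that would each become a separate file.

    Assumption: which elements start a new chunk depends on the
    stylesheet, so this tag list is illustrative only.  A plain text
    scan also misses content pulled in through external entities.
    """
    # \b stops <chapterinfo> or <sect1info> from being counted.
    pattern = re.compile(r"<(?:%s)\b" % "|".join(tags), re.IGNORECASE)
    return len(pattern.findall(source))

def free_inodes(path="."):
    """Inodes available to unprivileged users on the filesystem holding path (POSIX only)."""
    return os.statvfs(path).f_favail

def chunking_allowed(n_chunks, available_inodes, max_chunks=500, max_fraction=0.01):
    """Hypothetical policy: refuse chunk mode when a document would
    create an unreasonable number of files."""
    if n_chunks > max_chunks:
        return False
    return n_chunks <= available_inodes * max_fraction
```

If the check fails, the document would be processed with --nochunks instead.  Some filesystems may report 0 for `f_favail`; with this policy that simply means chunking is always refused, which errs on the safe side.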

On a related note, docbook2html output actually needs to be tidy'ed so
badly that you might consider making a call to tidy (with configurable
options) a built-in option (or better yet, fix the generator, but that
is probably jade).  The output is technically legal HTML, but the
formatting violates the spirit of HTML.

> I hope to have time to look at making a new release in the next few
> weeks.


Another question: does either 0.6.9 or the upcoming release fix
the "URL not supported" problem?   docbook2html chokes on the DOCTYPE
in files generated by AbiWord:
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
	"http://www.oasis-open.org/docbook/xml/4.0/docbookx.dtd">

Results in
   URL not supported by this version
   DTD did not contain element declaration for document type name
   element "BOOK" undefined
   element "CHAPTER" undefined
   element "SECTION" undefined
   element "TITLE" undefined
   element "PARA" undefined
   ...

Now, there appear to be at least two bugs here:
   - A URL in the DOCTYPE is an unimplemented feature.

   - Failure to use a good catch-all document type where an exact
     stylesheet match is not found.  If someone ran docbook2html,
     they have already said the document is in some form of DocBook.
     The program should not puke because someone omitted a doctype
     or used a doctype different from, or newer than, the stylesheets
     on my system.

     works:
      <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.0//EN">
      <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.1//EN">
     pukes:
      <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.2//EN">

     Now it is very likely that at some point, a docbook user
     is going to receive a document in a newer version of
     docbook than their system recognizes.  One of the primary
     reasons for using XML/SGML markup in the first place
     instead of horrible proprietary formats is that new
     features can be added in the future but old programs
     can still read the document although they may not
     be able to render any new constructs.  If you
     process this document with a default stylesheet,
     you will probably get 95% of the content or more.
     You may even get 100%, since a document might well
     be labeled "DocBook 4.2" but actually use only
     DocBook 4.1 features.

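The catch-all behaviour argued for above could look something like this Python sketch.  The set of installed versions is hypothetical, and treating the XML identifier "V4.1.2" as the SGML 4.1 version is an approximation; the point is only that an unknown or newer DocBook public identifier degrades to the newest known version instead of aborting, and that the system identifier (the URL) is never fetched.

```python
import re

# Hypothetical: the DocBook versions this system has DTDs and stylesheets for.
KNOWN_VERSIONS = {(4, 0), (4, 1)}

# Matches both SGML and XML DocBook public identifiers, e.g.
# "-//OASIS//DTD DocBook V4.1//EN" and "-//OASIS//DTD DocBook XML V4.1.2//EN".
FPI_RE = re.compile(r"-//OASIS//DTD DocBook (?:XML )?V(\d+(?:\.\d+)*)//EN")

def pick_docbook_version(public_id, known=KNOWN_VERSIONS):
    """Return (version, exact) to process a document with.

    An exact major.minor match is used as-is; a missing, unrecognized,
    or newer identifier falls back to the newest known version.
    """
    newest = max(known)
    m = FPI_RE.search(public_id or "")
    if m is None:
        return newest, False
    parts = [int(p) for p in m.group(1).split(".")]
    wanted = tuple(parts[:2])  # "4.1.2" -> (4, 1)
    if wanted in known:
        return wanted, True
    return newest, False
```

A caller would print a warning when `exact` is False and carry on, rather than failing with "element undefined" errors for every tag.
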
[Definition:
   blow chunks:  (blo chungks)
     v. intr.
     1. (slang) To regurgitate
     2. An extremely disdainful term for
     running docbook2html without --nochunks, or a
     similar program in a similar mode, such that you generate
     multiple tiny output files from a single source file (or
     a single source file with other files included by reference),
     as would be done by a webmaster with poor judgement on
     small documents, causing grief for users loading, reading, printing,
     searching, archiving, and making offline use of web pages.
     Or as might legitimately be done
     by the author of a VERY large document such as a multi-hundred-page
     book (which should still be available as a single file
     if the user wants it that way) that might legitimately
     be viewed as a number of smaller _but still substantial_
     documents.  Dividing documents up into screenful-sized
     chunks of text is something only a propeller head would do;
     it might appeal to other propeller heads who are easily
     amused by having buttons to push but it really pisses off
     people who have useful work to do.
]



--
Mark Whitis   http://www.freelabs.com/~whitis/       NO SPAM
Author of many open source software packages.
Coauthor: Linux Programming Unleashed (1st Edition)


end of thread, other threads:[~2002-04-11 14:17 UTC | newest]

Thread overview: 12+ messages
2002-12-20 19:23 BUG: docbook2html --nochunks Mark Whitis
2002-04-09 17:57 ` Mark Whitis
2002-12-20 19:23 ` Éric Bischoff
2002-04-09 23:57   ` Éric Bischoff
2002-12-20 19:23   ` Tim Waugh
2002-04-10  0:03     ` Tim Waugh
2002-12-20 19:23     ` multiple bugs and security hole (was: Re: BUG: docbook2html --nochunks) Mark Whitis
2002-04-10 19:40       ` Mark Whitis
2002-12-20 19:23       ` Tim Waugh
2002-04-11  7:17         ` Tim Waugh
2002-12-20 19:23       ` New location of the "Crash Course.to DocBook" Éric Bischoff
2002-04-11  3:25         ` Éric Bischoff
