From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (qmail 16196 invoked by alias); 11 Apr 2002 02:40:40 -0000
Mailing-List: contact docbook-tools-discuss-help@sources.redhat.com; run by ezmlm
Precedence: bulk
Sender: docbook-tools-discuss-owner@sources.redhat.com
Received: (qmail 16179 invoked from network); 11 Apr 2002 02:40:35 -0000
Received: from unknown (HELO scaup.prod.itd.earthlink.net) (207.217.120.49) by sources.redhat.com with SMTP; 11 Apr 2002 02:40:35 -0000
Received: from sdn-ar-001vacharp257.dialsprint.net ([168.191.212.147]) by scaup.prod.itd.earthlink.net with esmtp (Exim 3.33 #1) id 16vUVi-0004Bi-00; Wed, 10 Apr 2002 19:40:31 -0700
Date: Fri, 20 Dec 2002 19:23:00 -0000
From: Mark Whitis
To: Tim Waugh
cc: Éric Bischoff
Subject: Re: multiple bugs and security hole (was: Re: BUG: docbook2html --nochunks)
In-Reply-To: <20020410080309.D13205@redhat.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-SW-Source: 2002/txt/msg00047.txt.bz2

Sorry if this message seems overly negative. I only have time to mention what's broken, not to talk about what works or contribute patches for what's broken. Thanks to everyone who is making DocBook, SGML, and HTML document preparation feasible. I coauthored an 800-page Linux book; DocBook is MUCH better than the publisher's dreadful MS-Word-style-sheet-based monstrosity.

Versions in use (Red Hat RPMs):

  openjade-1.3-13
  docbook-utils-0.6-13

(I'd need to download half of Skipjack to install newer versions; this is only a slight exaggeration, if preliminary examination and past experience are any indication.)

On Wed, 10 Apr 2002, Tim Waugh wrote:
> Indeed. Here is the patch I used:

Thanks for the quick response. I applied that patch directly to /usr/bin/jw and it sorta-kinda fixed the problem. Still, it is a kludge rather than a proper bugfix.

docbook2html still can't be used as a proper filter, for example:

  | docbook2html ... | tidy ... | ...

This is Un*x. Filters should be able to take input on standard input and send output to standard output, with errors going to standard error. Obviously you can't do that if you have it set to blow chunks, but that is a special mode of operation suitable only for documents that are not only very large but also written by someone you trust.

The blow-chunks mode is also probably a serious security hole in many situations: it creates files on the host system with names based on text supplied by the untrustworthy remote user who supplied the file. Don't believe me? Try this:

Jade will complain about "/" in an id but will still happily create or overwrite /etc/youarescrewed if you run it as root. Even as a non-root user, there are plenty of files which can be compromised, such as ".ssh/authorized_keys".

Yes, there will probably be a ".html" tacked onto the end of the file name. That DOES NOT mean you are safe. Not only might there be another hole that lets an attacker get rid of the ".html" (perhaps as simple as a buffer overflow), there are places where a file can do serious damage even with a ".html" extension, such as the directories /etc/rc.d/rc3.d/ (executable flag probably required) and /etc/xinetd.d/ (executable flag not required). And if docbook2html is not run as root, there may still be dot-file directories whose contents will be executed even with a .html extension.

Not to mention that an HTML file may itself be the target of an attack: "here, let me just trojan this copy of the Nimda worm (which is carried in .html files) into ~/public_html/index.html." Or maybe I will distribute a trojan DocBook document which overwrites /www/index.html with

  7|-|15 5173 15 0\/\/|\|3D BY 7|-|3 D00D
  (THIS SITE IS OWNED BY THE DUDE)

If the person who processes that document has write access to /www, there will be trouble.
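
To make the attack concrete, here is a minimal Python sketch (not Jade's actual code) of what goes wrong when a chunk file name is built by joining an output directory with a document-supplied id. On POSIX systems `os.path.join` discards everything before an absolute component, so an id of `/etc/youarescrewed` lands outside the output directory, and `../` sequences climb out of it too. `naive_chunk_path` and `stays_inside` are hypothetical helpers; the containment check is only defence in depth and is no substitute for the allow-list renaming the post argues for.

```python
import os.path


def naive_chunk_path(output_dir: str, element_id: str) -> str:
    # Hypothetical careless chunker: trusts the document-supplied id verbatim.
    # os.path.join drops output_dir entirely if element_id is absolute.
    return os.path.join(output_dir, element_id + ".html")


def stays_inside(output_dir: str, path: str) -> bool:
    # Resolve ".." and any existing symlinks, then check that the target
    # is still under output_dir.
    base = os.path.realpath(output_dir)
    target = os.path.realpath(path)
    return os.path.commonpath([base, target]) == base


if __name__ == "__main__":
    for element_id in ["intro", "/etc/youarescrewed", "../../.ssh/authorized_keys"]:
        path = naive_chunk_path("/tmp/out", element_id)
        print(f"{element_id!r:30} -> {path}  inside={stays_inside('/tmp/out', path)}")
```

On a Unix box only `intro` stays under `/tmp/out`: `/etc/youarescrewed` becomes `/etc/youarescrewed.html`, and the `../` id escapes the output directory entirely, ".html" suffix and all.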

So, no document should ever be processed in blow-chunks mode unless you personally wrote it. That also means no document that needs blow-chunks mode to be processed should be distributed, and blow-chunks mode should definitely NOT be the default for docbook2html. And no document should ever be processed by docbook2html in a cgi-bin unless the document was written by a user on that server. So, you can forget about:

- An HTML translation service that lets web users who do not have DocBook translation software on their computers enter the URL of a DocBook document, or upload one, and view the results.
- Any server that publishes documents from anyone who isn't absolutely trustworthy, accepts those documents in DocBook format, and generates HTML. So much for the LDP, linux.org, etc. One clown writes a trojan "Post-It note mini-HOWTO" and the server is compromised. It is not as if the people administering those servers are likely to have time to validate every revision of every document.
- A secure non-shell ISP which requires people to create their web pages in XML/SGML instead of using FrontPage.
- A web browser which supports DocBook by using docbook2html as an external filter.

At least using docbook2html as a standard Unix filter, if that were actually allowed, would be inherently more secure. Of course, you still can't allow sloppy code leading to buffer overflows in openjade, jw, etc.

Lest some naive person suggest that fixing Jade so it will not accept "/" in a chapter "id" is the fix for the problem, let me point out that history has repeatedly demonstrated that deny-known-bad is NOT an effective security procedure. Someone will, to use a hypothetical example, find out that the kernel also accepts character code 254 as a synonym for "/". Or that id="%2Fetc%2Fpasswd" gets through.
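
The allow-list renaming proposed in the following paragraph (translate, collapse, strip, truncate, number the collisions) is simple enough to sketch. This is a minimal Python illustration under stated assumptions: the 32-character limit, the `chunk` fallback for ids with no usable characters, and the names `safe_stem` and `assign_file_names` are all hypothetical; neither Jade nor docbook2html is known to provide them.

```python
from __future__ import annotations

import re

MAX_STEM = 32  # hypothetical limit; the post only says length "should be limited"


def safe_stem(element_id: str, max_len: int = MAX_STEM) -> str:
    """Map a document-supplied id onto [A-Za-z0-9_] only (allow-list)."""
    stem = re.sub(r"[^A-Za-z0-9]", "_", element_id)  # everything else -> "_"
    stem = re.sub(r"_+", "_", stem)                   # collapse runs of "_"
    stem = stem.strip("_")                            # drop leading/trailing "_"
    stem = stem[:max_len].rstrip("_")                 # cap length, re-trim the cut
    return stem or "chunk"                            # hypothetical fallback name


def assign_file_names(ids: list[str], max_len: int = MAX_STEM) -> list[str]:
    """Return one unique "<stem>.html" per id, in input order, numbering collisions."""
    used: set[str] = set()
    names: list[str] = []
    for element_id in ids:
        stem = safe_stem(element_id, max_len)
        candidate, n = stem, 1
        while candidate in used:  # e.g. "a.b" and "a/b" both reduce to "a_b"
            n += 1
            candidate = f"{stem}_{n}"
        used.add(candidate)
        names.append(candidate + ".html")
    return names
```

With this rule `/etc/youarescrewed` becomes `etc_youarescrewed` and `%2Fetc%2Fpasswd` becomes `2Fetc_2Fpasswd`; since only the allowed characters survive, an unexpected alternate separator never reaches the file system. Note that the numeric suffix can push a name a few characters past `max_len`.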

Yes, Jade needs to be fixed to allow only an extremely limited character set in filenames derived from document-supplied text. For example: every character other than A-Za-z0-9 should be translated to an underscore, runs of consecutive underscores should be compressed to one, leading and trailing underscores should be deleted, the total length should be limited, and, if appropriate, a number should be appended to prevent two identifiers from mapping to the same file name.

Denial-of-service attack: let's suppose that on a system with a 65536-inode limit, I process a malicious file which has 65536 's. Again, don't blow chunks unless you seriously trust the document author. Even for a 300-page book, I would run it as a pipe producing a single document unless I had a good reason to trust the author.

On a related note, docbook2html output needs to be tidied so badly that you might consider adding a call to tidy (with configurable options) as a built-in option (or, better yet, fixing the generator, though that is probably Jade). The output is technically legal HTML, but the formatting violates the spirit of HTML.

> I hope to have time to look at making a new release in the next few
> weeks.

Another question: does either 0.6.9 or the upcoming release fix the "URL not supported" problem? docbook2html chokes on the DOCTYPE in files generated by AbiWord:

pukes:

Now it is very likely that at some point a DocBook user is going to receive a document in a newer version of DocBook than their system recognizes. One of the primary reasons for using XML/SGML markup instead of horrible proprietary formats in the first place is that new features can be added in the future while old programs can still read the document, although they may not be able to render any new constructs. If you process such a document with a default stylesheet, you will probably get 95% of the content or more.
You may even get 100%, since a document might well be labeled "DocBook 4.2" but actually use only DocBook 4.1 features.

[Definition: blow chunks: (blo chungks) v. intr. 1. (slang) To regurgitate. 2. An extremely disdainful term for running docbook2html without --nochunks, or a similar program in a similar mode, such that you generate multiple tiny output files from a single source file (or a single source file with other files included by reference). As would be done by a webmaster with poor judgement on small documents, causing grief for users loading, reading, printing, searching, archiving, and making offline use of web pages. Or as might legitimately be done by the author of a VERY large document such as a multi-hundred-page book (which should still be available as a single file if the user wants it that way) that might legitimately be viewed as a number of smaller _but still substantial_ documents. Dividing documents up into screenful-sized chunks of text is something only a propeller head would do; it might appeal to other propeller heads who are easily amused by having buttons to push, but it really pisses off people who have useful work to do.]

--
Mark Whitis http://www.freelabs.com/~whitis/ NO SPAM
Author of many open source software packages.
Coauthor: Linux Programming Unleashed (1st Edition)