public inbox for bzip2-devel@sourceware.org
 help / color / mirror / Atom feed
* Re: bzip2 1.0.7 released
  2019-01-01  0:00       ` Julian Seward
@ 2019-01-01  0:00         ` Mark Wielaard
  2019-01-01  0:00           ` Mark Wielaard
                             ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Mark Wielaard @ 2019-01-01  0:00 UTC (permalink / raw)
  To: jseward, Federico Mena Quintero, bzip2-devel

[-- Attachment #1: Type: text/plain, Size: 5304 bytes --]

Hi Julian,

On Fri, 2019-06-28 at 08:08 +0200, Julian Seward wrote:
> I see the following, in the 1.0.6 sources:
> 
> * bzlib_private.h:#define BZ_MAX_SELECTORS (2 + (900000 / BZ_G_SIZE))
> 
> * compress.c has
>     AssertH( nSelectors < 32768 &&
>              nSelectors <= (2 + (900000 / BZ_G_SIZE)),
>              3003 );
> 
> * decompress.c has (well, wants to have)
>        if (nSelectors < 1 || nSelectors > BZ_MAX_SELECTORS)
> 
> Mark observes this:
> 
>  > So it looks like some implementations might add more selectors than
>  > necessary. For example lbzip2 seems to use a max of 18000 + 1 + 7.  Which
>  > might explain why our 18002 = 2 + (900000 / 50) isn't enough, and why my
>  > random increase of 5 seemed to work for the given file.
> 
> Seems plausible, except that (18000 + 1 + 7) - (2 + (900000 / 50)) == 6,
> not 5.  So we'd have to increase the number by 6, not 5.

:} off-by-one...

> IIUC (which I may not), bzip2-1.0.7 is internally self-consistent, in that it
> won't create more than 18002 selectors, and it also won't accept more than
> 18002 selectors.  But some lbzip2 implementations can create up to 18008
> selectors (per Mark's comments), and that is what is causing the problem here.

Yes. if I understand correctly, then 18002 is also the theoretical
limit of the number of selectors. Being 2 (a zero and end of block
marker) plus (900000 / BZ_G_SIZE), the maximum block size, because you
cannot have more symbols than fit in a block, divided by BZ_G_SIZE
(50), the minimum group of symbols per tree.

But some implementations might "round up" that number. And I see some
implementations do accept the theoretical limit of 32768 (being 2^15).
But that seems impossible, and is just an upper limit because 18002
doesn't fit in 14 bits, so you have to use 15 bits to encode the number
of selectors.

> I would propose the following fix for discussion:
> 
> * split BZ_MAX_SELECTORS into two different values:
> 
>    - BZ_MAX_SELECTORS_ENC = (2 + (900000 / BZ_G_SIZE)) [= 18002], the max
>      number of selectors that bzip2 can create ("encode")
> 
>    - BZ_MAX_SELECTORS_DEC = BZ_MAX_SELECTORS_ENC + 6 [= 18008], the max number
>      of selectors that we'll accept during decoding, and add a comment
>      explaining that the difference is due to buggy lbzip2/whatever creating
>      more than BZ_MAX_SELECTORS_ENC
> 
> * use BZ_MAX_SELECTORS_ENC to dimension the arrays in struct EState
> 
> * use BZ_MAX_SELECTORS_DEC to dimension the arrays in struct DState, and
>    for the new decompress.c range check
> 
> * change the compress.c assertion
> 
>     AssertH( nSelectors < 32768 &&
>              nSelectors <= (2 + (900000 / BZ_G_SIZE)),
>              3003 );
> 
>     to actually mention BZ_MAX_SELECTORS_ENC directly, instead of
>     (2 + (900000 / BZ_G_SIZE)), [which is embarrassingly lame]
> 
> It seems to me to be important to now split BZ_MAX_SELECTORS into these two
> parts so as to make it clear to everybody that we're accepting (decompressing)
> a slightly larger set of inputs than we create (a la that old saying about
> network protocol implementations), so as to tolerate other compressors.

That seems good. The attached patch does this and makes it possible to
decode the problematic bz2 file.

> That then leaves the questions:
> 
> * Why did it work before?  I suspect Federico's guess that "I think the len
>    arrays are still uninitialized and will get filled until later" is correct,
>    but I'd need to look more at that.

Yes, I think Federico is correct, the len array comes directly after
the selectorMtf array and is being filled in after the selectorMtf
array overflow has been written. It is also large enough (BZ_N_GROUPS *
BZ_MAX_ALPHA_SIZE = 6 * 258 = 1548 UChars) so would have caught a small
overflow. The extra selectors are also never really used, since they
were just padding beyond the theoretical max. So the file would decode
correctly.

> * How to verify any change?  For whatever fix we arrive at, I'd like to do
>    some test runs over big trees of files (several tens of GB) with both
>    compressor and decompressor compiled with asan (and/or ubsan), since I
>    believe at least asan can detect overruns of the statically sized arrays
>    within the EState and DState structs -- V can't.

I couldn't get gcc -fsanitize=address to find the overwrite in the
subtle case. But the original report was for a fuzzed file. With a
ridiculously large selector value it should be easy to show (even under
valgrind).

>    Per communication with Federico (which didn't go to Mark, unfortunately), I
>    have a C driver program that compresses and decompresses every file in a
>    tree and checks it is the same as the original.  But I don't know what the
>    copyright status of it is and so I'm not happy to redistribute it (alas).
>    We should try to replace it.
> 
>    In particular, the three test files in the tarball merely serve to verify
>    that the build didn't fail in some obvious way.  They are in no way a
>    comprehensive test set.

Yes, having a more comprehensive test set would be great.
Especially having file encoded with other bzip2 encoders.

Thanks,

Mark

[-- Attachment #2: 0001-Be-more-liberal-in-the-number-of-selectors-we-accept.patch --]
[-- Type: text/x-patch, Size: 3871 bytes --]

From 8c99340e3cb4538b91bc508ebfe778f7de2174b1 Mon Sep 17 00:00:00 2001
From: Mark Wielaard <mark@klomp.org>
Date: Fri, 28 Jun 2019 12:38:41 +0200
Subject: [PATCH] Be more liberal in the number of selectors we accept when
 decoding.

As proposed by Julian Sewart:

* split BZ_MAX_SELECTORS into two different values:

  - BZ_MAX_SELECTORS_ENC = (2 + (900000 / BZ_G_SIZE)) [= 18002], the max
    number of selectors that bzip2 can create ("encode")

  - BZ_MAX_SELECTORS_DEC = BZ_MAX_SELECTORS_ENC + 6 [= 18008], the max
    number of selectors that we'll accept during decoding, and add a
    comment explaining that the difference is due to buggy lbzip2/whatever
    creating more than BZ_MAX_SELECTORS_ENC

* use BZ_MAX_SELECTORS_ENC to dimension the arrays in struct EState

* use BZ_MAX_SELECTORS_DEC to dimension the arrays in struct DState,
  and for the new decompress.c range check

* change the compress.c assertion

  AssertH( nSelectors < 32768 &&
           nSelectors <= (2 + (900000 / BZ_G_SIZE)),
           3003 );

  to actually mention BZ_MAX_SELECTORS_ENC directly, instead of
  (2 + (900000 / BZ_G_SIZE)), [which is embarrassingly lame]
---
 bzlib_private.h | 15 ++++++++++-----
 compress.c      |  2 +-
 decompress.c    |  2 +-
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/bzlib_private.h b/bzlib_private.h
index 7975552..6208ba1 100644
--- a/bzlib_private.h
+++ b/bzlib_private.h
@@ -122,7 +122,12 @@ extern void bz_internal_error ( int errcode );
 #define BZ_G_SIZE   50
 #define BZ_N_ITERS  4
 
-#define BZ_MAX_SELECTORS (2 + (900000 / BZ_G_SIZE))
+/* The max number of selectors that bzip2 can create ("encode") [= 18002] */
+#define BZ_MAX_SELECTORS_ENC (2 + (900000 / BZ_G_SIZE))
+/* The max number of selectors that bzip2 accept during decoding [= 18008]
+   This is larger than BZ_MAX_SELECTORS_ENC because some implementations,
+   might round up the number of selectors to a factor of 8. */
+#define BZ_MAX_SELECTORS_DEC (BZ_MAX_SELECTORS_ENC + 6)
 
 
 
@@ -253,8 +258,8 @@ typedef
       /* stuff for coding the MTF values */
       Int32    nMTF;
       Int32    mtfFreq    [BZ_MAX_ALPHA_SIZE];
-      UChar    selector   [BZ_MAX_SELECTORS];
-      UChar    selectorMtf[BZ_MAX_SELECTORS];
+      UChar    selector   [BZ_MAX_SELECTORS_ENC];
+      UChar    selectorMtf[BZ_MAX_SELECTORS_ENC];
 
       UChar    len     [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
       Int32    code    [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
@@ -399,8 +404,8 @@ typedef
       /* for decoding the MTF values */
       UChar    mtfa   [MTFA_SIZE];
       Int32    mtfbase[256 / MTFL_SIZE];
-      UChar    selector   [BZ_MAX_SELECTORS];
-      UChar    selectorMtf[BZ_MAX_SELECTORS];
+      UChar    selector   [BZ_MAX_SELECTORS_DEC];
+      UChar    selectorMtf[BZ_MAX_SELECTORS_DEC];
       UChar    len  [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
 
       Int32    limit  [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
diff --git a/compress.c b/compress.c
index 237620d..9b660f8 100644
--- a/compress.c
+++ b/compress.c
@@ -454,7 +454,7 @@ void sendMTFValues ( EState* s )
 
    AssertH( nGroups < 8, 3002 );
    AssertH( nSelectors < 32768 &&
-            nSelectors <= (2 + (900000 / BZ_G_SIZE)),
+            nSelectors <= BZ_MAX_SELECTORS_ENC,
             3003 );
 
 
diff --git a/decompress.c b/decompress.c
index 20ce493..d24a052 100644
--- a/decompress.c
+++ b/decompress.c
@@ -287,7 +287,7 @@ Int32 BZ2_decompress ( DState* s )
       GET_BITS(BZ_X_SELECTOR_1, nGroups, 3);
       if (nGroups < 2 || nGroups > BZ_N_GROUPS) RETURN(BZ_DATA_ERROR);
       GET_BITS(BZ_X_SELECTOR_2, nSelectors, 15);
-      if (nSelectors < 1 || nSelectors > BZ_MAX_SELECTORS) RETURN(BZ_DATA_ERROR);
+      if (nSelectors < 1 || nSelectors > BZ_MAX_SELECTORS_DEC) RETURN(BZ_DATA_ERROR);
       for (i = 0; i < nSelectors; i++) {
          j = 0;
          while (True) {
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bzip2 1.0.7 released
  2019-01-01  0:00     ` Federico Mena Quintero
@ 2019-01-01  0:00       ` Julian Seward
  2019-01-01  0:00         ` Mark Wielaard
  0 siblings, 1 reply; 16+ messages in thread
From: Julian Seward @ 2019-01-01  0:00 UTC (permalink / raw)
  To: Federico Mena Quintero, Mark Wielaard, bzip2-devel


Mark, Federico,

Thank you both for the analysis.  Here's my Eur 0.02:

I see the following, in the 1.0.6 sources:

* bzlib_private.h:#define BZ_MAX_SELECTORS (2 + (900000 / BZ_G_SIZE))

* compress.c has
    AssertH( nSelectors < 32768 &&
             nSelectors <= (2 + (900000 / BZ_G_SIZE)),
             3003 );

* decompress.c has (well, wants to have)
       if (nSelectors < 1 || nSelectors > BZ_MAX_SELECTORS)

Mark observes this:

 > So it looks like some implementations might add more selectors than
 > necessary. For example lbzip2 seems to use a max of 18000 + 1 + 7.  Which
 > might explain why our 18002 = 2 + (900000 / 50) isn't enough, and why my
 > random increase of 5 seemed to work for the given file.

Seems plausible, except that (18000 + 1 + 7) - (2 + (900000 / 50)) == 6,
not 5.  So we'd have to increase the number by 6, not 5.

IIUC (which I may not), bzip2-1.0.7 is internally self-consistent, in that it
won't create more than 18002 selectors, and it also won't accept more than
18002 selectors.  But some lbzip2 implementations can create up to 18008
selectors (per Mark's comments), and that is what is causing the problem here.

I would propose the following fix for discussion:

* split BZ_MAX_SELECTORS into two different values:

   - BZ_MAX_SELECTORS_ENC = (2 + (900000 / BZ_G_SIZE)) [= 18002], the max
     number of selectors that bzip2 can create ("encode")

   - BZ_MAX_SELECTORS_DEC = BZ_MAX_SELECTORS_ENC + 6 [= 18008], the max number
     of selectors that we'll accept during decoding, and add a comment
     explaining that the difference is due to buggy lbzip2/whatever creating
     more than BZ_MAX_SELECTORS_ENC

* use BZ_MAX_SELECTORS_ENC to dimension the arrays in struct EState

* use BZ_MAX_SELECTORS_DEC to dimension the arrays in struct DState, and
   for the new decompress.c range check

* change the compress.c assertion

    AssertH( nSelectors < 32768 &&
             nSelectors <= (2 + (900000 / BZ_G_SIZE)),
             3003 );

    to actually mention BZ_MAX_SELECTORS_ENC directly, instead of
    (2 + (900000 / BZ_G_SIZE)), [which is embarrassingly lame]

It seems to me to be important to now split BZ_MAX_SELECTORS into these two
parts so as to make it clear to everybody that we're accepting (decompressing)
a slightly larger set of inputs than we create (a la that old saying about
network protocol implementations), so as to tolerate other compressors.

That then leaves the questions:

* Why did it work before?  I suspect Federico's guess that "I think the len
   arrays are still uninitialized and will get filled until later" is correct,
   but I'd need to look more at that.

* How to verify any change?  For whatever fix we arrive at, I'd like to do
   some test runs over big trees of files (several tens of GB) with both
   compressor and decompressor compiled with asan (and/or ubsan), since I
   believe at least asan can detect overruns of the statically sized arrays
   within the EState and DState structs -- V can't.

   Per communication with Federico (which didn't go to Mark, unfortunately), I
   have a C driver program that compresses and decompresses every file in a
   tree and checks it is the same as the original.  But I don't know what the
   copyright status of it is and so I'm not happy to redistribute it (alas).
   We should try to replace it.

   In particular, the three test files in the tarball merely serve to verify
   that the build didn't fail in some obvious way.  They are in no way a
   comprehensive test set.

J

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bzip2 1.0.7 released
  2019-01-01  0:00 bzip2 1.0.7 released Mark Wielaard
@ 2019-01-01  0:00 ` Mark Wielaard
  2019-01-01  0:00   ` Mark Wielaard
                     ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Mark Wielaard @ 2019-01-01  0:00 UTC (permalink / raw)
  To: bzip2-devel; +Cc: Julian Seward

Hi,

On Thu, Jun 27, 2019 at 08:54:08PM +0200, Mark Wielaard wrote:
> * Make sure nSelectors is not out of range (CVE-2019-12900)

Well, that was quick... There is already a regression report about
this fix. See https://bugs.launchpad.net/ubuntu/+source/bzip2/+bug/1834494

The fix itself is certainly correct:

diff --git a/decompress.c b/decompress.c
index ab6a624..f3db91d 100644
--- a/decompress.c
+++ b/decompress.c
@@ -280,21 +280,21 @@ Int32 BZ2_decompress ( DState* s )
                if (uc == 1) s->inUse[i * 16 + j] = True;
             }
       makeMaps_d ( s );
       if (s->nInUse == 0) RETURN(BZ_DATA_ERROR);
       alphaSize = s->nInUse+2;
 
       /*--- Now the selectors ---*/
       GET_BITS(BZ_X_SELECTOR_1, nGroups, 3);
       if (nGroups < 2 || nGroups > 6) RETURN(BZ_DATA_ERROR);
       GET_BITS(BZ_X_SELECTOR_2, nSelectors, 15);
-      if (nSelectors < 1) RETURN(BZ_DATA_ERROR);
+      if (nSelectors < 1 || nSelectors > BZ_MAX_SELECTORS) RETURN(BZ_DATA_ERROR);
       for (i = 0; i < nSelectors; i++) {
          j = 0;
          while (True) {
             GET_BIT(BZ_X_SELECTOR_3, uc);
             if (uc == 0) break;
             j++;
             if (j >= nGroups) RETURN(BZ_DATA_ERROR);
          }
          s->selectorMtf[i] = j;
       }

Because if nSelectors would be > BZ_MAX_SELECTORS it would write over
memory after the selectorMtf array.

The problem with the file in the report is that it does contain some
nSelectors that are slightly larger than BZ_MAX_SELECTORS.

The test file can be found here:
https://developer.nvidia.com/embedded/dlc/l4t-jetson-xavier-driver-package-31-1-0

The fix is simple:

diff --git a/bzlib_private.h b/bzlib_private.h
index 7975552..ef870d9 100644
--- a/bzlib_private.h
+++ b/bzlib_private.h
@@ -122,7 +122,7 @@ extern void bz_internal_error ( int errcode );
 #define BZ_G_SIZE   50
 #define BZ_N_ITERS  4
 
-#define BZ_MAX_SELECTORS (2 + (900000 / BZ_G_SIZE))
+#define BZ_MAX_SELECTORS (7 + (900000 / BZ_G_SIZE))
 
 
 
But of course I cannot tell why increasing the max with 5 is correct.
It might well be that the file is invalid. Before the fix bunzip2
would overwrite some memory after the selectorMtf array. So it might
be the file decompressed by accident in the past.

I'll look a but deeper, but if people have a clue what exactly is
going on that would be appreciated.

Cheers,

Mark

^ permalink raw reply	[flat|nested] 16+ messages in thread

* bzip2 1.0.7 released
@ 2019-01-01  0:00 Mark Wielaard
  2019-01-01  0:00 ` Mark Wielaard
  0 siblings, 1 reply; 16+ messages in thread
From: Mark Wielaard @ 2019-01-01  0:00 UTC (permalink / raw)
  To: bzip2-devel; +Cc: lwn, Julian Seward

We are happy to announce the release of bzip2 1.0.7.

This is an emergency release because the old bzip2 home
is gone and there were outstanding security issues.
The original bzip2 home, downloads and documentation
can now be found at: https://sourceware.org/bzip2/

bzip2 1.0.7 contains only the following bug/security fixes:

* Fix undefined behavior in the macros SET_BH, CLEAR_BH, & ISSET_BH
* bzip2: Fix return value when combining --test,-t and -q.
* bzip2recover: Fix buffer overflow for large argv[0]
* bzip2recover: Fix use after free issue with outFile (CVE-2016-3189)
* Make sure nSelectors is not out of range (CVE-2019-12900)

A future 1.1.x release is being prepared by Federico Mena Quintero
which will include more fixes, an updated build system and possibly
an updated SONAME default.

Please read his blog for more background on this:
https://people.gnome.org/~federico/blog/tag/bzip2.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bzip2 1.0.7 released
  2019-01-01  0:00 ` Mark Wielaard
  2019-01-01  0:00   ` Mark Wielaard
@ 2019-01-01  0:00   ` Federico Mena Quintero
  2019-01-01  0:00   ` Jeffrey Walton
  2 siblings, 0 replies; 16+ messages in thread
From: Federico Mena Quintero @ 2019-01-01  0:00 UTC (permalink / raw)
  To: Mark Wielaard, bzip2-devel; +Cc: Julian Seward

On Thu, 2019-06-27 at 22:58 +0200, Mark Wielaard wrote:

> The problem with the file in the report is that it does contain some
> nSelectors that are slightly larger than BZ_MAX_SELECTORS.

There's a discussion here:
https://gitlab.com/federicomenaquintero/bzip2/commit/74de1e2e6ffc9d51ef9824db71a8ffee5962cdbc

It has links to the lbzip2 bug, and the commit that fixed it.

Summary: a bug in lbzip2 was causing it to write more than bzip2's
allowed number of selectors; it got fixed in December 2018.

If lbzip2's goals is to produce files which are compatible with bzip2,
it can be argued that indeed it was producing broken files for a while.

> -#define BZ_MAX_SELECTORS (2 + (900000 / BZ_G_SIZE))
> +#define BZ_MAX_SELECTORS (7 + (900000 / BZ_G_SIZE))
>  
>  
>  
> But of course I cannot tell why increasing the max with 5 is correct.
> It might well be that the file is invalid. Before the fix bunzip2
> would overwrite some memory after the selectorMtf array. So it might
> be the file decompressed by accident in the past.

The comments in the lbzip2 commit that fixes this refer to the padding
required to make the block size a multiple of 8 bits.  I don't know
enough about bzip2's format yet, or how the move-to-front algorithm
works, to be able to tell if *that* simple fix for bzip2 of increasing
BZ_MAX_SELECTORS is enough to make bzip2 able to read all of lbzip2's
files.

I think some distros ship lbzip2 instead of bzip2.  If it was
generating non-fully-compatible files... due to a bug in bzip2... it's
probably fair to allow these files in bzip2 and be done with it.  I'd
still like to hear some confirmation (from Julian or the lbzip2
authors?) on whether the analysis above is correct.

  Federico

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bzip2 1.0.7 released
  2019-01-01  0:00   ` Mark Wielaard
@ 2019-01-01  0:00     ` Federico Mena Quintero
  2019-01-01  0:00       ` Julian Seward
  0 siblings, 1 reply; 16+ messages in thread
From: Federico Mena Quintero @ 2019-01-01  0:00 UTC (permalink / raw)
  To: Mark Wielaard, bzip2-devel; +Cc: Julian Seward

On Fri, 2019-06-28 at 00:46 +0200, Mark Wielaard wrote:
> 
> So it looks like some implementations might add more selectors than
> necessary. For example lbzip2 seems to use a max of 18000 + 1 + 7.
> Which might explain why our 18002 = 2 + (900000 / 50) isn't enough,
> and
> why my random increase of 5 seemed to work for the given file.
> 
> In general the nSelector field can be up to 15 bits, so 32768. So we
> definitely do want to check the input doesn't overflow (or make
> BZ_MAX_SELECTORS 32768, but that seems excessive).

I've posted a little more analysis here:

https://gitlab.com/federicomenaquintero/bzip2/issues/24

  Federico

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bzip2 1.0.7 released
  2019-01-01  0:00 ` Mark Wielaard
  2019-01-01  0:00   ` Mark Wielaard
  2019-01-01  0:00   ` bzip2 1.0.7 released Federico Mena Quintero
@ 2019-01-01  0:00   ` Jeffrey Walton
  2 siblings, 0 replies; 16+ messages in thread
From: Jeffrey Walton @ 2019-01-01  0:00 UTC (permalink / raw)
  To: Mark Wielaard; +Cc: bzip2-devel, Julian Seward

On Thu, Jun 27, 2019 at 4:58 PM Mark Wielaard <mark@klomp.org> wrote:
>
> On Thu, Jun 27, 2019 at 08:54:08PM +0200, Mark Wielaard wrote:
> > * Make sure nSelectors is not out of range (CVE-2019-12900)
>
> Well, that was quick... There is already a regression report about
> this fix. See https://bugs.launchpad.net/ubuntu/+source/bzip2/+bug/1834494
> ...
>
> -#define BZ_MAX_SELECTORS (2 + (900000 / BZ_G_SIZE))
> +#define BZ_MAX_SELECTORS (7 + (900000 / BZ_G_SIZE))
>
> But of course I cannot tell why increasing the max with 5 is correct.
> It might well be that the file is invalid. Before the fix bunzip2
> would overwrite some memory after the selectorMtf array. So it might
> be the file decompressed by accident in the past.
>
> I'll look a but deeper, but if people have a clue what exactly is
> going on that would be appreciated.

Hi Mark.

At this point, I think you should perform the root cause analysis.

I think you are having trouble identifying the the problem(s) because
the code is not instrumented, and it does not debug itself. It
requires manual intervention and time under the debugger. Time under a
debugger is mostly a waste of time in my opinion. You have better
things to do with your time.

If this were my module, then I would fill the program with ASSERT's so
the module debugs itself in non-release builds. Then, run all the
malformed test data you can find.

The good thing about instrumenting with asserts is, it is monkey work.
You can task an intern with it, and not waste a senior dev's time with
it. Even better, the intern can then identify the problems so you
don't waste your time with it.

I know assert's are not preferred in the Unix world. Many Unix dev's
prefer to cling to the old K&R way of writing code from the 1970's.
But there are a lot better ways to develop nowadays, and self
debugging code is one of them.

(This is really a policies and procedures problem. The development
process has gaps. The fix is to require completely instrumented code.
But to do that, you have to deal with the political problem of people
clinging to ancient K&R patterns).

Jeff

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bzip2 1.0.7 released
  2019-01-01  0:00         ` Mark Wielaard
@ 2019-01-01  0:00           ` Mark Wielaard
  2019-01-01  0:00             ` Federico Mena Quintero
  2019-01-01  0:00           ` Alternative nSelectors patch (Was: bzip2 1.0.7 released) Mark Wielaard
  2019-01-01  0:00           ` bzip2 test suite " Mark Wielaard
  2 siblings, 1 reply; 16+ messages in thread
From: Mark Wielaard @ 2019-01-01  0:00 UTC (permalink / raw)
  To: jseward, Federico Mena Quintero, bzip2-devel

[-- Attachment #1: Type: text/plain, Size: 299 bytes --]

On Fri, 2019-06-28 at 13:10 +0200, Mark Wielaard wrote:
> That seems good. The attached patch does this and makes it possible
> to decode the problematic bz2 file.

The patch was correct, but I misspelled Julian's last name in the
commit message... embarrassing. Sorry. Fixed in the attached.

[-- Attachment #2: 0001-Be-more-liberal-in-the-number-of-selectors-we-accept.patch --]
[-- Type: text/x-patch, Size: 3871 bytes --]

From d970bad3371761555d249374e70012bba705ed73 Mon Sep 17 00:00:00 2001
From: Mark Wielaard <mark@klomp.org>
Date: Fri, 28 Jun 2019 12:38:41 +0200
Subject: [PATCH] Be more liberal in the number of selectors we accept when
 decoding.

As proposed by Julian Seward:

* split BZ_MAX_SELECTORS into two different values:

  - BZ_MAX_SELECTORS_ENC = (2 + (900000 / BZ_G_SIZE)) [= 18002], the max
    number of selectors that bzip2 can create ("encode")

  - BZ_MAX_SELECTORS_DEC = BZ_MAX_SELECTORS_ENC + 6 [= 18008], the max
    number of selectors that we'll accept during decoding, and add a
    comment explaining that the difference is due to buggy lbzip2/whatever
    creating more than BZ_MAX_SELECTORS_ENC

* use BZ_MAX_SELECTORS_ENC to dimension the arrays in struct EState

* use BZ_MAX_SELECTORS_DEC to dimension the arrays in struct DState,
  and for the new decompress.c range check

* change the compress.c assertion

  AssertH( nSelectors < 32768 &&
           nSelectors <= (2 + (900000 / BZ_G_SIZE)),
           3003 );

  to actually mention BZ_MAX_SELECTORS_ENC directly, instead of
  (2 + (900000 / BZ_G_SIZE)), [which is embarrassingly lame]
---
 bzlib_private.h | 15 ++++++++++-----
 compress.c      |  2 +-
 decompress.c    |  2 +-
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/bzlib_private.h b/bzlib_private.h
index 7975552..6208ba1 100644
--- a/bzlib_private.h
+++ b/bzlib_private.h
@@ -122,7 +122,12 @@ extern void bz_internal_error ( int errcode );
 #define BZ_G_SIZE   50
 #define BZ_N_ITERS  4
 
-#define BZ_MAX_SELECTORS (2 + (900000 / BZ_G_SIZE))
+/* The max number of selectors that bzip2 can create ("encode") [= 18002] */
+#define BZ_MAX_SELECTORS_ENC (2 + (900000 / BZ_G_SIZE))
+/* The max number of selectors that bzip2 accept during decoding [= 18008]
+   This is larger than BZ_MAX_SELECTORS_ENC because some implementations,
+   might round up the number of selectors to a factor of 8. */
+#define BZ_MAX_SELECTORS_DEC (BZ_MAX_SELECTORS_ENC + 6)
 
 
 
@@ -253,8 +258,8 @@ typedef
       /* stuff for coding the MTF values */
       Int32    nMTF;
       Int32    mtfFreq    [BZ_MAX_ALPHA_SIZE];
-      UChar    selector   [BZ_MAX_SELECTORS];
-      UChar    selectorMtf[BZ_MAX_SELECTORS];
+      UChar    selector   [BZ_MAX_SELECTORS_ENC];
+      UChar    selectorMtf[BZ_MAX_SELECTORS_ENC];
 
       UChar    len     [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
       Int32    code    [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
@@ -399,8 +404,8 @@ typedef
       /* for decoding the MTF values */
       UChar    mtfa   [MTFA_SIZE];
       Int32    mtfbase[256 / MTFL_SIZE];
-      UChar    selector   [BZ_MAX_SELECTORS];
-      UChar    selectorMtf[BZ_MAX_SELECTORS];
+      UChar    selector   [BZ_MAX_SELECTORS_DEC];
+      UChar    selectorMtf[BZ_MAX_SELECTORS_DEC];
       UChar    len  [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
 
       Int32    limit  [BZ_N_GROUPS][BZ_MAX_ALPHA_SIZE];
diff --git a/compress.c b/compress.c
index 237620d..9b660f8 100644
--- a/compress.c
+++ b/compress.c
@@ -454,7 +454,7 @@ void sendMTFValues ( EState* s )
 
    AssertH( nGroups < 8, 3002 );
    AssertH( nSelectors < 32768 &&
-            nSelectors <= (2 + (900000 / BZ_G_SIZE)),
+            nSelectors <= BZ_MAX_SELECTORS_ENC,
             3003 );
 
 
diff --git a/decompress.c b/decompress.c
index 20ce493..d24a052 100644
--- a/decompress.c
+++ b/decompress.c
@@ -287,7 +287,7 @@ Int32 BZ2_decompress ( DState* s )
       GET_BITS(BZ_X_SELECTOR_1, nGroups, 3);
       if (nGroups < 2 || nGroups > BZ_N_GROUPS) RETURN(BZ_DATA_ERROR);
       GET_BITS(BZ_X_SELECTOR_2, nSelectors, 15);
-      if (nSelectors < 1 || nSelectors > BZ_MAX_SELECTORS) RETURN(BZ_DATA_ERROR);
+      if (nSelectors < 1 || nSelectors > BZ_MAX_SELECTORS_DEC) RETURN(BZ_DATA_ERROR);
       for (i = 0; i < nSelectors; i++) {
          j = 0;
          while (True) {
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* bzip2 test suite (Was: bzip2 1.0.7 released)
  2019-01-01  0:00         ` Mark Wielaard
  2019-01-01  0:00           ` Mark Wielaard
  2019-01-01  0:00           ` Alternative nSelectors patch (Was: bzip2 1.0.7 released) Mark Wielaard
@ 2019-01-01  0:00           ` Mark Wielaard
  2 siblings, 0 replies; 16+ messages in thread
From: Mark Wielaard @ 2019-01-01  0:00 UTC (permalink / raw)
  To: jseward, Federico Mena Quintero, bzip2-devel

[-- Attachment #1: Type: text/plain, Size: 2488 bytes --]

On Fri, 2019-06-28 at 13:10 +0200, Mark Wielaard wrote:
> >    In particular, the three test files in the tarball merely serve to verify
> >    that the build didn't fail in some obvious way.  They are in no way a
> >    comprehensive test set.
> 
> Yes, having a more comprehensive test set would be great.
> Especially having file encoded with other bzip2 encoders.

I don't claim it is comprehensive at all, but I have gone through all
the other bzip2 encoder/decoder projects I could find and collected the
samples/tests they used. To not clutter up the main repo I have setup a
new repository with a test driver.

https://sourceware.org/git/?p=bzip2-tests.git;a=summary
git clone git://sourceware.org/git/bzip2-tests.git

It only has one commit:

commit 3e71fa8acda490a6b288833d759eb4129ad573e2
Author: Mark Wielaard <mark@klomp.org>
Date:   Sun Jun 30 20:56:33 2019 +0200

    Initial bzip2 test suite.
    
    Contains test files from the commons-compress, dotnetzip, go, lbzip2
    and pyflate projects. Each test file has either an associated md5 sum
    to check it decompressed correctly. Or it is named bz2.bad to indicate
    it cannot be decompressed (or might need --force to decompress).
    
    The run-tests.sh test wrapper runs a bzip2 binary in a couple of
    configurations, optionally under valgrind, on all the test files in
    the tree.

See the attached README to see how to run it.

Even though it might not be comprehensive, it is a start. And you can
use the ./run-tests.sh test runner to run over a full directory tree.
See the end of the README file.

It already contains a testcase (from lbzip2) that shows the original
issue with bzip2 1.0.6
(compiled with gcc -fsanitize=undefined -fno-sanitize-recover):

Processing lbzip2/32767.bz2
  Decompress...
decompress.c:299:24: runtime error: index 18002 out of bounds for type 'UChar [18002]'
!!! bad decompress result 1

And with bzip 1.0.7 it does fail as follows:

Processing lbzip2/32767.bz2
  Decompress...

bzip2: Data integrity error when decompressing.
!!! bad decompress result 2

Sadly it also shows that the fix we have isn't complete.
Even with that fix it fails as above.

I think I have a fix for that though. Will post in a minute.

I'll try to integrate test suite with the buildbots (although some will
have to at least use --without-valgrind because testing is really,
really, really slow under valgrind).

Cheers,

Mark

[-- Attachment #2: README --]
[-- Type: text/plain, Size: 2156 bytes --]

= BZ2 test file collection =

This is a collection of "interesting" .bz2 files that can be used to
test bzip2 works correctly. They come from different projects.

Each directory should contain a README file explaining where the .bz2
files originally came from. Plus a reference to the (Free Software)
license that the project files were distributed under.

Some files are deliberately bad, and are use to see how bzip2 handles
corrupt files. They are explicitly not intended to decompress correctly,
but to catch errors in bzip2 trying to deal with deliberately bad data.
All such files have a name ending in .bz2.bad.

All none bad files end in bz2. And should come with a .md5 file for
the original input file. The .md5 file is used to check that bzip2
could correctly decompress the file. The original (non-compressed)
files are deliberately not checked in.

A .md5 sum is generated by:
  md5sum < file > file.md5

This generates a .md5 file that doesn't carry a file name (but just "-").
They can then be checked again with:
  md5sum --check file.md5 < file

Note do NOT name a file ending in .testfilecopy or .testfilecopy.bz2.
Those will automatically be cleaned by up the testframework.

There is a simple bash script to run the tests:

run-tests [--bzip2=bzip2-command] [--without-valgrind]
          [--ignore-md5] [--tests-dir=/path/to/bzip2-tests-dir]

It will by default test with the command 'bzip2', running under
valgrind (if installed on the system), checking md5 sum files
after decompression using the current directory (".") to find
any .bz2 or .bz2.bad files (and .md5 files if checked).

For each .bz2 file found it is decompressed, recompressed and
decompressed again. Once with the default bzip2 settings and
once in --small (-s) mode.

For each .bz2.bad file decompression is tried twice also. In
default mode and small mode. The bzip2 binary is expected to
return either 1 or 2 as exit status. Any other exit code is
interpreted as failure.

If you just want to check a directory (and any subdirectories)
full of (known good) .bz2 files you can invoke the script as:

  ./run-test --ignore-md5 --tests-dir=/dir/full/off/bz2/files


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bzip2 1.0.7 released
  2019-01-01  0:00           ` Mark Wielaard
@ 2019-01-01  0:00             ` Federico Mena Quintero
  2019-01-01  0:00               ` Mark Wielaard
  0 siblings, 1 reply; 16+ messages in thread
From: Federico Mena Quintero @ 2019-01-01  0:00 UTC (permalink / raw)
  To: Mark Wielaard, jseward, bzip2-devel

On Fri, 2019-06-28 at 13:27 +0200, Mark Wielaard wrote:
> 
> The patch was correct, but I misspelled Julian's last name in the
> commit message... embarrassing. Sorry. Fixed in the attached.

I have to be out today running errands, but this is great work!  Thanks
Julian for the insight to just split the limits in two constants and
Mark for implementing it.

Mark, do you want to do something like release a 1.0.8pre test and
advertise it to the Ubuntu people who reported problems?

  Federico

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bzip2 1.0.7 released
  2019-01-01  0:00             ` Federico Mena Quintero
@ 2019-01-01  0:00               ` Mark Wielaard
  0 siblings, 0 replies; 16+ messages in thread
From: Mark Wielaard @ 2019-01-01  0:00 UTC (permalink / raw)
  To: Federico Mena Quintero, jseward, bzip2-devel

On Fri, 2019-06-28 at 11:04 -0500, Federico Mena Quintero wrote:
> On Fri, 2019-06-28 at 13:27 +0200, Mark Wielaard wrote:
> > 
> > The patch was correct, but I misspelled Julian's last name in the
> > commit message... embarrassing. Sorry. Fixed in the attached.
> 
> I have to be out today running errands, but this is great
> work!  Thanks
> Julian for the insight to just split the limits in two constants and
> Mark for implementing it.
> 
> Mark, do you want to do something like release a 1.0.8pre test and
> advertise it to the Ubuntu people who reported problems?

I added a note to their bug tracker [*]. Pointing to the workaround
patch. Lets do some more testing during the weekend and then decide
whether to do a new release.

Cheers,

Mark

[*] 
https://bugs.launchpad.net/ubuntu/+source/bzip2/+bug/1834494/comments/7

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: bzip2 1.0.7 released
  2019-01-01  0:00 ` Mark Wielaard
@ 2019-01-01  0:00   ` Mark Wielaard
  2019-01-01  0:00     ` Federico Mena Quintero
  2019-01-01  0:00   ` bzip2 1.0.7 released Federico Mena Quintero
  2019-01-01  0:00   ` Jeffrey Walton
  2 siblings, 1 reply; 16+ messages in thread
From: Mark Wielaard @ 2019-01-01  0:00 UTC (permalink / raw)
  To: bzip2-devel; +Cc: Julian Seward

Hi,

A bit more analysis before I go to sleep.

On Thu, 2019-06-27 at 22:58 +0200, Mark Wielaard wrote:
> On Thu, Jun 27, 2019 at 08:54:08PM +0200, Mark Wielaard wrote:
> > * Make sure nSelectors is not out of range (CVE-2019-12900)
> 
> Well, that was quick... There is already a regression report about
> this fix. See 
> https://bugs.launchpad.net/ubuntu/+source/bzip2/+bug/1834494
> 
> The fix itself is certainly correct:
> 
> diff --git a/decompress.c b/decompress.c
> index ab6a624..f3db91d 100644
> --- a/decompress.c
> +++ b/decompress.c
> @@ -280,21 +280,21 @@ Int32 BZ2_decompress ( DState* s )
>                 if (uc == 1) s->inUse[i * 16 + j] = True;
>              }
>        makeMaps_d ( s );
>        if (s->nInUse == 0) RETURN(BZ_DATA_ERROR);
>        alphaSize = s->nInUse+2;
>  
>        /*--- Now the selectors ---*/
>        GET_BITS(BZ_X_SELECTOR_1, nGroups, 3);
>        if (nGroups < 2 || nGroups > 6) RETURN(BZ_DATA_ERROR);
>        GET_BITS(BZ_X_SELECTOR_2, nSelectors, 15);
> -      if (nSelectors < 1) RETURN(BZ_DATA_ERROR);
> +      if (nSelectors < 1 || nSelectors > BZ_MAX_SELECTORS)
> RETURN(BZ_DATA_ERROR);
>        for (i = 0; i < nSelectors; i++) {
>           j = 0;
>           while (True) {
>              GET_BIT(BZ_X_SELECTOR_3, uc);
>              if (uc == 0) break;
>              j++;
>              if (j >= nGroups) RETURN(BZ_DATA_ERROR);
>           }
>           s->selectorMtf[i] = j;
>        }
> 
> Because if nSelectors would be > BZ_MAX_SELECTORS it would write over
> memory after the selectorMtf array.
> 
> The problem with the file in the report is that it does contain some
> nSelectors that are slightly larger than BZ_MAX_SELECTORS.
> 
> The test file can be found here:
> 
https://developer.nvidia.com/embedded/dlc/l4t-jetson-xavier-driver-package-31-1-0
> 
> The fix is simple:
> 
> diff --git a/bzlib_private.h b/bzlib_private.h
> index 7975552..ef870d9 100644
> --- a/bzlib_private.h
> +++ b/bzlib_private.h
> @@ -122,7 +122,7 @@ extern void bz_internal_error ( int errcode );
>  #define BZ_G_SIZE   50
>  #define BZ_N_ITERS  4
>  
> -#define BZ_MAX_SELECTORS (2 + (900000 / BZ_G_SIZE))
> +#define BZ_MAX_SELECTORS (7 + (900000 / BZ_G_SIZE))
>  
>  
>  
> But of course I cannot tell why increasing the max with 5 is correct.
> It might well be that the file is invalid. Before the fix bunzip2
> would overwrite some memory after the selectorMtf array. So it might
> be the file decompressed by accident in the past.
> 
> I'll look a but deeper, but if people have a clue what exactly is
> going on that would be appreciated.

So it looks like some implementations might add more selectors than
necessary. For example lbzip2 seems to use a max of 18000 + 1 + 7.
Which might explain why our 18002 = 2 + (900000 / 50) isn't enough, and
why my random increase of 5 seemed to work for the given file.

In general the nSelector field can be up to 15 bits, so 32768. So we
definitely do want to check the input doesn't overflow (or make
BZ_MAX_SELECTORS 32768, but that seems excessive).

Cheers,

Mark

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Alternative nSelectors patch (Was: bzip2 1.0.7 released)
  2019-01-01  0:00         ` Mark Wielaard
  2019-01-01  0:00           ` Mark Wielaard
@ 2019-01-01  0:00           ` Mark Wielaard
  2019-01-01  0:00             ` Julian Seward
  2019-01-01  0:00           ` bzip2 test suite " Mark Wielaard
  2 siblings, 1 reply; 16+ messages in thread
From: Mark Wielaard @ 2019-01-01  0:00 UTC (permalink / raw)
  To: jseward, Federico Mena Quintero, bzip2-devel

[-- Attachment #1: Type: text/plain, Size: 1021 bytes --]

Hi,

On Fri, 2019-06-28 at 13:10 +0200, Mark Wielaard wrote:
> > It seems to me to be important to now split BZ_MAX_SELECTORS into these two
> > parts so as to make it clear to everybody that we're accepting (decompressing)
> > a slightly larger set of inputs than we create (a la that old saying about
> > network protocol implementations), so as to tolerate other compressors.
> 
> That seems good. The attached patch does this and makes it possible to
> decode the problematic bz2 file.

Sorry, it is a bit too late here to properly document this patch and
explain why I think it is a better one than the "split-max-selectors"
fix. But hopefully the new testsuite example and the comment in the
patch make clear what my thinking is.

This resolved both the issue with the large file reported as with the
new test suite file (lbzip2/32767.bz2). The whole testsuite passes now,
even under valgrind and with gcc -fsanitize=undefined.

Comments on the patch idea more than welcome.

Thanks,

Mark

[-- Attachment #2: limit-nSelectors-processing.patch --]
[-- Type: text/x-patch, Size: 1523 bytes --]

diff --git a/compress.c b/compress.c
index 237620d..76adee6 100644
--- a/compress.c
+++ b/compress.c
@@ -454,7 +454,7 @@ void sendMTFValues ( EState* s )
 
    AssertH( nGroups < 8, 3002 );
    AssertH( nSelectors < 32768 &&
-            nSelectors <= (2 + (900000 / BZ_G_SIZE)),
+            nSelectors <= BZ_MAX_SELECTORS,
             3003 );
 
 
diff --git a/decompress.c b/decompress.c
index 20ce493..3303499 100644
--- a/decompress.c
+++ b/decompress.c
@@ -287,7 +287,7 @@ Int32 BZ2_decompress ( DState* s )
       GET_BITS(BZ_X_SELECTOR_1, nGroups, 3);
       if (nGroups < 2 || nGroups > BZ_N_GROUPS) RETURN(BZ_DATA_ERROR);
       GET_BITS(BZ_X_SELECTOR_2, nSelectors, 15);
-      if (nSelectors < 1 || nSelectors > BZ_MAX_SELECTORS) RETURN(BZ_DATA_ERROR);
+      if (nSelectors < 1) RETURN(BZ_DATA_ERROR);
       for (i = 0; i < nSelectors; i++) {
          j = 0;
          while (True) {
@@ -296,8 +296,14 @@ Int32 BZ2_decompress ( DState* s )
             j++;
             if (j >= nGroups) RETURN(BZ_DATA_ERROR);
          }
-         s->selectorMtf[i] = j;
+         /* Having more than BZ_MAX_SELECTORS doesn't make much sense
+            since they will never be used, but some implementations might
+            "round up" the number of selectors, so just ignore those. */
+         if (i < BZ_MAX_SELECTORS)
+           s->selectorMtf[i] = j;
       }
+      if (nSelectors > BZ_MAX_SELECTORS)
+        nSelectors = BZ_MAX_SELECTORS;
 
       /*--- Undo the MTF values for the selectors. ---*/
       {

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Alternative nSelectors patch (Was: bzip2 1.0.7 released)
  2019-01-01  0:00             ` Julian Seward
@ 2019-01-01  0:00               ` Mark Wielaard
  2019-01-01  0:00                 ` Mark Wielaard
  0 siblings, 1 reply; 16+ messages in thread
From: Mark Wielaard @ 2019-01-01  0:00 UTC (permalink / raw)
  To: jseward, Federico Mena Quintero, bzip2-devel

[-- Attachment #1: Type: text/plain, Size: 1905 bytes --]

On Tue, 2019-07-02 at 08:34 +0200, Julian Seward wrote:
> This seems to me like a better patch than my proposal, so I retract
> my proposal and vote for this one instead.

I did keep your cleanup of the compress.c assert. And I don't think I
would have proposed something different if I didn't just happen to
stumble upon that odd 32767.bz2 testcase. It seems unlikely someone
would use so many unused selectors. But the file format allows for it,
so lets just handle it.

> The one thing that concerned me was that, it would be a disaster -- having
> ignored all selectors above 18002 -- if subsequent decoding actually *did*
> manage somehow to try to read more than 18002 selectors out of s->selectorMtf,
> because we'd be reading uninitialised memory.  But this seems to me can't
> happen because, after the selector-reading loop, you added
> 
> +      if (nSelectors > BZ_MAX_SELECTORS)
> +        nSelectors = BZ_MAX_SELECTORS;
> 
> and the following loop:
> 
>        /*--- Undo the MTF values for the selectors. ---*/
>        ...
> 
> is the only place that reads s->selectorMtf, and then only for the range
> 0 .. nSelectors-1.
> 
> So it seems good to me.  Does this sync with your analysis?

Yes. There is also the other BZ_MAX_SELECTORS sized array s->selector.
But that is similarly guarded. First it is filled from the selectorMtf
array with that for loop 0 .. nSelectors-1. Then it is accessed through
the GET_MTF_VAL macro as selector[groupNo], but that access is guarded
with:

      if (groupNo >= nSelectors)                  \
         RETURN(BZ_DATA_ERROR);                   \

So that prevents any bad access.

Attached is the patch with a commit message that hopefully explains why
the change is correct (and why the CVE, although a source code bug,
wasn't really exploitable in the first place). Hope it makes sense.

Cheers,

Mark

[-- Attachment #2: Type: text/x-patch, Size: 3096 bytes --]

From b357f4ec14a8b5b11b37621ee9f2a10f518b6c65 Mon Sep 17 00:00:00 2001
From: Mark Wielaard <mark@klomp.org>
Date: Wed, 3 Jul 2019 01:28:11 +0200
Subject: [PATCH] Accept as many selectors as the file format allows.

But ignore any larger that the theoretical maximum, BZ_MAX_SELECTORS.

The theoretical maximum number of selectors depends on the maximum
blocksize (900000 bytes) and the number of symbols (50) that can be
encoded with a different Huffman tree. BZ_MAX_SELECTORS is 18002.

But the bzip2 file format allows the number of selectors to be encoded
with 15 bits (because 18002 isn't a factor of 2 and doesn't fit in
14 bits). So the file format maximum is 32767 selectors.

Some bzip2 encoders might actually have written out more selectors
than the theoretical maximum because they rounded up the number of
selectors to some convenient factor of 8.

The extra 14766 selectors can never be validly used by the decompression
algorithm. So we can read them, but then discard them.

This is effectively what was done (by accident) before we added a
check for nSelectors to be at most BZ_MAX_SELECTORS to mitigate
CVE-2019-12900.

The extra selectors were written out after the array inside the
EState struct. But the struct has extra space allocated after the
selector arrays of 18060 bytes (which is larger than 14766).
All of which will be initialized later (so the overwrite of that
space with extra selector values would have been harmless).
---
 compress.c   |  2 +-
 decompress.c | 10 ++++++++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/compress.c b/compress.c
index 237620d..76adee6 100644
--- a/compress.c
+++ b/compress.c
@@ -454,7 +454,7 @@ void sendMTFValues ( EState* s )
 
    AssertH( nGroups < 8, 3002 );
    AssertH( nSelectors < 32768 &&
-            nSelectors <= (2 + (900000 / BZ_G_SIZE)),
+            nSelectors <= BZ_MAX_SELECTORS,
             3003 );
 
 
diff --git a/decompress.c b/decompress.c
index 20ce493..3303499 100644
--- a/decompress.c
+++ b/decompress.c
@@ -287,7 +287,7 @@ Int32 BZ2_decompress ( DState* s )
       GET_BITS(BZ_X_SELECTOR_1, nGroups, 3);
       if (nGroups < 2 || nGroups > BZ_N_GROUPS) RETURN(BZ_DATA_ERROR);
       GET_BITS(BZ_X_SELECTOR_2, nSelectors, 15);
-      if (nSelectors < 1 || nSelectors > BZ_MAX_SELECTORS) RETURN(BZ_DATA_ERROR);
+      if (nSelectors < 1) RETURN(BZ_DATA_ERROR);
       for (i = 0; i < nSelectors; i++) {
          j = 0;
          while (True) {
@@ -296,8 +296,14 @@ Int32 BZ2_decompress ( DState* s )
             j++;
             if (j >= nGroups) RETURN(BZ_DATA_ERROR);
          }
-         s->selectorMtf[i] = j;
+         /* Having more than BZ_MAX_SELECTORS doesn't make much sense
+            since they will never be used, but some implementations might
+            "round up" the number of selectors, so just ignore those. */
+         if (i < BZ_MAX_SELECTORS)
+           s->selectorMtf[i] = j;
       }
+      if (nSelectors > BZ_MAX_SELECTORS)
+        nSelectors = BZ_MAX_SELECTORS;
 
       /*--- Undo the MTF values for the selectors. ---*/
       {
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Alternative nSelectors patch (Was: bzip2 1.0.7 released)
  2019-01-01  0:00               ` Mark Wielaard
@ 2019-01-01  0:00                 ` Mark Wielaard
  0 siblings, 0 replies; 16+ messages in thread
From: Mark Wielaard @ 2019-01-01  0:00 UTC (permalink / raw)
  To: jseward, Federico Mena Quintero, bzip2-devel

Hi,

> Attached is the patch with a commit message that hopefully explains why
> the change is correct (and why the CVE, although a source code bug,
> wasn't really exploitable in the first place). Hope it makes sense.

So the https://sourceware.org/git/bzip2-tests.git was integrated into
the buildbot and it turned RED. As expected, since without this fix it
fails with:
 - ./lbzip2/32767.bz2 bad decompress result

So I have now pushed the patch and hopefully that turns the buildbot
green: https://builder.wildebeest.org/buildbot/#/builders?tags=bzip2

Cheers,

Mark

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Alternative nSelectors patch (Was: bzip2 1.0.7 released)
  2019-01-01  0:00           ` Alternative nSelectors patch (Was: bzip2 1.0.7 released) Mark Wielaard
@ 2019-01-01  0:00             ` Julian Seward
  2019-01-01  0:00               ` Mark Wielaard
  0 siblings, 1 reply; 16+ messages in thread
From: Julian Seward @ 2019-01-01  0:00 UTC (permalink / raw)
  To: Mark Wielaard, Federico Mena Quintero, bzip2-devel


Hi Mark,

This seems to me like a better patch than my proposal, so I retract my
proposal and vote for this one instead.

The one thing that concerned me was that, it would be a disaster -- having
ignored all selectors above 18002 -- if subsequent decoding actually *did*
manage somehow to try to read more than 18002 selectors out of s->selectorMtf,
because we'd be reading uninitialised memory.  But this seems to me can't
happen because, after the selector-reading loop, you added

+      if (nSelectors > BZ_MAX_SELECTORS)
+        nSelectors = BZ_MAX_SELECTORS;

and the following loop:

       /*--- Undo the MTF values for the selectors. ---*/
       ...

is the only place that reads s->selectorMtf, and then only for the range
0 .. nSelectors-1.

So it seems good to me.  Does this sync with your analysis?

J


On 01/07/2019 01:36, Mark Wielaard wrote:
> Hi,
> 
> On Fri, 2019-06-28 at 13:10 +0200, Mark Wielaard wrote:
>>> It seems to me to be important to now split BZ_MAX_SELECTORS into these two
>>> parts so as to make it clear to everybody that we're accepting (decompressing)
>>> a slightly larger set of inputs than we create (a la that old saying about
>>> network protocol implementations), so as to tolerate other compressors.
>>
>> That seems good. The attached patch does this and makes it possible to
>> decode the problematic bz2 file.
> 
> Sorry, it is a bit too late here to properly document this patch and
> explain why I think it is a better one than the "split-max-selectors"
> fix. But hopefully the new testsuite example and the comment in the
> patch make clear what my thinking is.
> 
> This resolved both the issue with the large file reported as with the
> new test suite file (lbzip2/32767.bz2). The whole testsuite passes now,
> even under valgrind and with gcc -fsanitize=undefined.
> 
> Comments on the patch idea more than welcome.
> 
> Thanks,
> 
> Mark
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-07-09 21:38 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-01  0:00 bzip2 1.0.7 released Mark Wielaard
2019-01-01  0:00 ` Mark Wielaard
2019-01-01  0:00   ` Mark Wielaard
2019-01-01  0:00     ` Federico Mena Quintero
2019-01-01  0:00       ` Julian Seward
2019-01-01  0:00         ` Mark Wielaard
2019-01-01  0:00           ` Mark Wielaard
2019-01-01  0:00             ` Federico Mena Quintero
2019-01-01  0:00               ` Mark Wielaard
2019-01-01  0:00           ` Alternative nSelectors patch (Was: bzip2 1.0.7 released) Mark Wielaard
2019-01-01  0:00             ` Julian Seward
2019-01-01  0:00               ` Mark Wielaard
2019-01-01  0:00                 ` Mark Wielaard
2019-01-01  0:00           ` bzip2 test suite " Mark Wielaard
2019-01-01  0:00   ` bzip2 1.0.7 released Federico Mena Quintero
2019-01-01  0:00   ` Jeffrey Walton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).