public inbox for bzip2-devel@sourceware.org
From: Mark Wielaard <mark@klomp.org>
To: jseward@acm.org, Federico Mena Quintero <federico@gnome.org>,
	 bzip2-devel@sourceware.org
Subject: Re: Alternative nSelectors patch (Was: bzip2 1.0.7 released)
Date: Tue, 01 Jan 2019 00:00:00 -0000
Message-ID: <f9230fc65a3529b59b31f13494c72a1c01a6148e.camel@klomp.org>
In-Reply-To: <4f434101-5ce3-d757-2f61-c9e419911e00@acm.org>

[-- Attachment #1: Type: text/plain, Size: 1905 bytes --]

On Tue, 2019-07-02 at 08:34 +0200, Julian Seward wrote:
> This seems to me like a better patch than my proposal, so I retract
> my proposal and vote for this one instead.

I did keep your cleanup of the compress.c assert. And I don't think I
would have proposed something different if I hadn't happened to
stumble upon that odd 32767.bz2 testcase. It seems unlikely that anyone
would write out so many unused selectors. But the file format allows for
it, so let's just handle it.

> The one thing that concerned me was that, it would be a disaster -- having
> ignored all selectors above 18002 -- if subsequent decoding actually *did*
> manage somehow to try to read more than 18002 selectors out of s->selectorMtf,
> because we'd be reading uninitialised memory.  But this seems to me can't
> happen because, after the selector-reading loop, you added
> 
> +      if (nSelectors > BZ_MAX_SELECTORS)
> +        nSelectors = BZ_MAX_SELECTORS;
> 
> and the following loop:
> 
>        /*--- Undo the MTF values for the selectors. ---*/
>        ...
> 
> is the only place that reads s->selectorMtf, and then only for the range
> 0 .. nSelectors-1.
> 
> So it seems good to me.  Does this sync with your analysis?

Yes. There is also the other BZ_MAX_SELECTORS-sized array, s->selector.
But that is similarly guarded. First it is filled from the selectorMtf
array by that for loop over 0 .. nSelectors-1. Then it is accessed through
the GET_MTF_VAL macro as selector[groupNo], but that access is guarded
with:

      if (groupNo >= nSelectors)                  \
         RETURN(BZ_DATA_ERROR);                   \

So that prevents any bad access.
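
To make the argument above concrete, here is a minimal sketch of the
read-then-clamp idea, not the actual decompress.c code. SketchState,
read_selectors and selector_in_range are hypothetical names invented for
this illustration, and the selector values come from a plain byte array
rather than from the bit stream via GET_BITS/GET_BIT.

```c
#include <assert.h>
#include <stddef.h>

#define BZ_G_SIZE 50
/* 2 + 900000/50 = 18002, the expression used in the old compress.c assert. */
#define BZ_MAX_SELECTORS (2 + (900000 / BZ_G_SIZE))

/* Hypothetical stand-in for the decoder state; the real DState has many
   more fields. */
typedef struct {
    unsigned char selectorMtf[BZ_MAX_SELECTORS];
} SketchState;

/* Consume `declared` selector values from `input` (assumed to hold at
   least `declared` bytes), store only the first BZ_MAX_SELECTORS of them,
   and return the clamped count. Returns -1 when `declared` is below 1,
   standing in for RETURN(BZ_DATA_ERROR). */
static int read_selectors(SketchState *s, const unsigned char *input,
                          int declared)
{
    int i;

    assert(s != NULL && input != NULL);
    if (declared < 1)
        return -1;

    for (i = 0; i < declared; i++) {
        unsigned char j = input[i];
        /* Surplus selectors are still consumed, but never stored. */
        if (i < BZ_MAX_SELECTORS)
            s->selectorMtf[i] = j;
    }

    /* Clamp so that later loops only touch initialised entries. */
    if (declared > BZ_MAX_SELECTORS)
        declared = BZ_MAX_SELECTORS;
    return declared;
}

/* Mirrors the intent of the GET_MTF_VAL guard: a group index is usable
   only while it is below the (clamped) selector count. */
static int selector_in_range(int groupNo, int nSelectors)
{
    return groupNo >= 0 && groupNo < nSelectors;
}
```

With a 32767-entry input, read_selectors returns 18002 and only indices
0 .. 18001 of selectorMtf are written, so any later index that passes
selector_in_range stays inside the array.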

Attached is the patch with a commit message that hopefully explains why
the change is correct (and why the CVE, although a source code bug,
wasn't really exploitable in the first place). Hope it makes sense.

Cheers,

Mark

[-- Attachment #2: Type: text/x-patch, Size: 3096 bytes --]

From b357f4ec14a8b5b11b37621ee9f2a10f518b6c65 Mon Sep 17 00:00:00 2001
From: Mark Wielaard <mark@klomp.org>
Date: Wed, 3 Jul 2019 01:28:11 +0200
Subject: [PATCH] Accept as many selectors as the file format allows.

But ignore any larger than the theoretical maximum, BZ_MAX_SELECTORS.

The theoretical maximum number of selectors depends on the maximum
blocksize (900000 bytes) and the number of symbols (50) that can be
encoded with a different Huffman tree. BZ_MAX_SELECTORS is 18002.

But the bzip2 file format allows the number of selectors to be encoded
with 15 bits (because 18002 isn't a power of 2 and doesn't fit in
14 bits). So the file format maximum is 32767 selectors.

Some bzip2 encoders might actually have written out more selectors
than the theoretical maximum because they rounded up the number of
selectors to some convenient multiple of 8.

The extra 14766 selectors can never be validly used by the decompression
algorithm. So we can read them, but then discard them.

This is effectively what was done (by accident) before we added a
check for nSelectors to be at most BZ_MAX_SELECTORS to mitigate
CVE-2019-12900.

The extra selectors were written out after the array inside the
EState struct. But the struct has extra space allocated after the
selector arrays of 18060 bytes (which is larger than 14766).
All of which will be initialized later (so the overwrite of that
space with extra selector values would have been harmless).
---
 compress.c   |  2 +-
 decompress.c | 10 ++++++++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/compress.c b/compress.c
index 237620d..76adee6 100644
--- a/compress.c
+++ b/compress.c
@@ -454,7 +454,7 @@ void sendMTFValues ( EState* s )
 
    AssertH( nGroups < 8, 3002 );
    AssertH( nSelectors < 32768 &&
-            nSelectors <= (2 + (900000 / BZ_G_SIZE)),
+            nSelectors <= BZ_MAX_SELECTORS,
             3003 );
 
 
diff --git a/decompress.c b/decompress.c
index 20ce493..3303499 100644
--- a/decompress.c
+++ b/decompress.c
@@ -287,7 +287,7 @@ Int32 BZ2_decompress ( DState* s )
       GET_BITS(BZ_X_SELECTOR_1, nGroups, 3);
       if (nGroups < 2 || nGroups > BZ_N_GROUPS) RETURN(BZ_DATA_ERROR);
       GET_BITS(BZ_X_SELECTOR_2, nSelectors, 15);
-      if (nSelectors < 1 || nSelectors > BZ_MAX_SELECTORS) RETURN(BZ_DATA_ERROR);
+      if (nSelectors < 1) RETURN(BZ_DATA_ERROR);
       for (i = 0; i < nSelectors; i++) {
          j = 0;
          while (True) {
@@ -296,8 +296,14 @@ Int32 BZ2_decompress ( DState* s )
             j++;
             if (j >= nGroups) RETURN(BZ_DATA_ERROR);
          }
-         s->selectorMtf[i] = j;
+         /* Having more than BZ_MAX_SELECTORS doesn't make much sense
+            since they will never be used, but some implementations might
+            "round up" the number of selectors, so just ignore those. */
+         if (i < BZ_MAX_SELECTORS)
+           s->selectorMtf[i] = j;
       }
+      if (nSelectors > BZ_MAX_SELECTORS)
+        nSelectors = BZ_MAX_SELECTORS;
 
       /*--- Undo the MTF values for the selectors. ---*/
       {
-- 
1.8.3.1
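
The numbers in the commit message can be checked with a few lines of C.
This is only a sketch of the arithmetic: BZ_MAX_BLOCK_BYTES and the three
helper functions are hypothetical names for this illustration, while
BZ_G_SIZE (50) and the 900000 byte block size are taken from the patch
text above.

```c
#include <assert.h>

#define BZ_G_SIZE 50               /* symbols covered by one selector */
#define BZ_MAX_BLOCK_BYTES 900000  /* hypothetical name for the max block size */

/* Theoretical maximum selector count, using the same expression that the
   old compress.c assert spelled out: 2 + (900000 / BZ_G_SIZE). */
static int theoretical_max_selectors(void)
{
    return 2 + BZ_MAX_BLOCK_BYTES / BZ_G_SIZE;
}

/* Number of bits needed to represent `value` as an unsigned integer. */
static int bits_needed(unsigned value)
{
    int bits = 0;
    while (value > 0) {
        bits++;
        value >>= 1;
    }
    return bits;
}

/* Largest value that fits in an unsigned field of `bits` bits. */
static int field_max(int bits)
{
    assert(bits > 0 && bits < 31);
    return (1 << bits) - 1;
}
```

Here theoretical_max_selectors() gives 18002, bits_needed(18002) gives 15,
field_max(14) gives 16383 and field_max(15) gives 32767. Note that
32767 - 18002 is 14765, so the surplus count quoted in the commit message
appears to be one too high; the conclusion is unaffected, since either
figure is well below the 18060 bytes of slack mentioned there.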

