On Tue, 2019-07-02 at 08:34 +0200, Julian Seward wrote:
> This seems to me like a better patch than my proposal, so I retract
> my proposal and vote for this one instead.

I did keep your cleanup of the compress.c assert.

And I don't think I would have proposed something different if I hadn't
just happened to stumble upon that odd 32767.bz2 testcase. It seems
unlikely anyone would use so many unused selectors. But the file format
allows for it, so let's just handle it.

> The one thing that concerned me was that it would be a disaster -- having
> ignored all selectors above 18002 -- if subsequent decoding actually *did*
> manage somehow to read more than 18002 selectors out of s->selectorMtf,
> because we'd be reading uninitialised memory. But it seems to me this
> can't happen because, after the selector-reading loop, you added
>
>    +      if (nSelectors > BZ_MAX_SELECTORS)
>    +        nSelectors = BZ_MAX_SELECTORS;
>
> and the following loop:
>
>        /*--- Undo the MTF values for the selectors. ---*/
>        ...
>
> is the only place that reads s->selectorMtf, and then only for the range
> 0 .. nSelectors-1.
>
> So it seems good to me. Does this sync with your analysis?

Yes. There is also the other BZ_MAX_SELECTORS-sized array, s->selector.
But that is similarly guarded. First it is filled from the selectorMtf
array by that for loop over 0 .. nSelectors-1. Then it is accessed
through the GET_MTF_VAL macro as selector[groupNo], but that access is
guarded with:

      if (groupNo >= nSelectors)              \
         RETURN(BZ_DATA_ERROR);               \

So that prevents any bad access.

Attached is the patch with a commit message that hopefully explains why
the change is correct (and why the CVE, although a source code bug,
wasn't really exploitable in the first place). Hope it makes sense.

Cheers,

Mark