public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Martin Sebor <msebor@gmail.com>
To: Andrew MacLeod <amacleod@redhat.com>, Jeff Law <law@redhat.com>,
	gcc-patches <gcc-patches@gcc.gnu.org>,
	Aldy Hernandez <aldyh@redhat.com>
Subject: Re: [PATCH] warn for integer overflow in allocation calls (PR 96838)
Date: Thu, 17 Sep 2020 14:18:35 -0600	[thread overview]
Message-ID: <b06e6ad3-bbc8-e036-6090-7afce4d9c1b1@gmail.com> (raw)
In-Reply-To: <5ad6fd82-18fb-eeb2-0c5f-959a39bcee1a@redhat.com>

On 9/17/20 12:39 PM, Andrew MacLeod wrote:
> On 9/17/20 12:08 PM, Martin Sebor via Gcc-patches wrote:
>> On 9/16/20 9:23 PM, Jeff Law wrote:
>>>
>>> On 9/15/20 1:47 PM, Martin Sebor wrote:
>>>> Overflowing the size of a dynamic allocation (e.g., malloc or VLA)
>>>> can lead to a subsequent buffer overflow corrupting the heap or
>>>> stack.  The attached patch diagnoses a subset of these cases where
>>>> the overflow/wraparound is still detectable.
>>>>
>>>> Besides regtesting GCC on x86_64-linux I also verified the warning
>>>> doesn't introduce any false positives into Glibc or Binutils/GDB
>>>> builds on the same target.
>>>>
>>>> Martin
>>>>
>>>> gcc-96838.diff
>>>>
>>>> PR middle-end/96838 - missing warning on integer overflow in calls 
>>>> to allocation functions
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>>     PR middle-end/96838
>>>>     * calls.c (eval_size_vflow): New function.
>>>>     (get_size_range): Call it.  Add argument.
>>>>     (maybe_warn_alloc_args_overflow): Diagnose overflow/wraparound.
>>>>     * calls.h (get_size_range): Add argument.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>>     PR middle-end/96838
>>>>     * gcc.dg/Walloc-size-larger-than-19.c: New test.
>>>>     * gcc.dg/Walloc-size-larger-than-20.c: New test.
>>>
>>> If an attacker can control an integer overflow that feeds an 
>>> allocation, then they can do all kinds of bad things.  In fact, when 
>>> my son was asking me attack vectors, this is one I said I'd look at 
>>> if I were a bad guy.
>>>
>>>
>>> I'm a bit surprised you can't just query the range of the argument 
>>> and get the overflow status out of that range, but I don't see that 
>>> in the APIs.  How painful would it be to make that part of the API? 
>>> The conceptual model would be to just ask for the range of the 
>>> argument to malloc which would include the range and a status bit 
>>> indicating the computation might have overflowed.
>>
>   Do we know if it did/would have wrapped? sure.  since we have to do 
> the math.    so you are correct in that the information is there. but is 
> it useful?
> 
> We are in the very annoying habit of subtracting one by adding 
> 0xFFFFFFF.  which means you get an overflow for unsigned when you 
> subtract one.   From what I have seen of unsigned math, we would be 
> flagging very many operations as overflows, so you would still have the 
> difficulty of figuring out whether its a "real" overflow or a fake one 
> because of the way we do unsigned math

You and me both :)

> 
> At the very start, I did have an overflow flag in the range class... but 
> it was turning out to be fairly useless so it was removed.
> .
> 
>> I agree that being able to evaluate an expression in an as-if
>> infinite precision (in addition to its type) would be helpful.
> 
> SO again, we get back to adding 0x0fffff when we are trying to subtract 
> one...  now, with infinite precision you are going to see
> 
>   [2,10]  - 1      we end up with [2,10]+0xFFFFFF, which will now give 
> you  [0x100000001, 0x100000009]    so its going to look like it overflowed?
> 
> 
> 
>> But just to make sure I understood correctly, let me ask again
>> using an example:
>>
>>   void* f (size_t n)
>>   {
>>     if (n < PTRDIFF_MAX / 2)
>>       n = PTRDIFF_MAX / 2;
>>
>>     return malloc (n * sizeof (int));
>>   }
>>
>> Can the unsigned wraparound in the argument be readily detected?
>>
>> On trunk, this ends up with the following:
>>
>>   # RANGE [4611686018427387903, 18446744073709551615]
>>   _6 = MAX_EXPR <n_2(D), 4611686018427387903>;
>>   # RANGE [0, 18446744073709551615] NONZERO 18446744073709551612
>>   _1 = _6 * 4;
>>   ...
>>   p_5 = mallocD.1206 (_1); [tail call]
>>   ...
>>   return p_5;
>>
>> so _1's range reflects the wraparound in size_t, but _6's range
>> has enough information to uncover it.  So detecting it is possible
>> and is done in the patch so we get a warning:
>>
>> warning: argument 1 range [18446744073709551612, 0x3fffffffffffffffc] 
>> is too large to represent in ‘long unsigned int’ 
>> [-Walloc-size-larger-than=]
>>     6 |   return malloc (n * sizeof (int));
>>       |          ^~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> The code is very simplistic and only handles a small subset of cases.
>> It could be generalized and exposed by a more generic API but it does
>> seem like the ranger must already have all the logic built into it so
>> if it isn't exposed now it should be a matter of opening it up.
> 
> everything is exposed in range-ops.  well, mostly.
> if we have _1 = _6 * 4
> 
> if one wanted to do that infinite precision, you query the range for _6, 
> and the range for 4 (which would be [4,4] :-)
> range_of_expr (op1r, _6, stmt)
> range_of_expr (op2r, 4, stmt)
> 
> you could take their current types, and cast those ranges to whatever 
> the next higher precsion is,
> range_cast  (op1r, highertype)
> range_cast (op2r, highertype)
> then invoke the operation on those parameters
> 
> gimple_range_fold (r, stmt,  op1r, op2r)
> 
> and that will do your operation in the higher precision.  you could 
> compare that to the value in regular precision too i suppose.

The patch does pretty much exactly what you described, except in
offset_int, and only for a limited set of arithmetic operations.
It figures out if an unsigned expression wrapped simply by checking
to see if the mathematically correct result fits in its type.

It sounds like I should be able to use the range APIs above instead
and get the same result, but for arbitrary expressions.  I don't
need to (and very well can't) compare the infinitely precise result
to the regular result because when it's a range that wraps the result
is the full range of the type (i.e., [0, SIZE_MAX] in the test above).

> 
> The ranger is designed to track ranges as they are represented in the 
> IL.  You are asking for us to interpret the IL as something other than 
> what is there, and increase the precision.  Thats a different task.
> 
> could that be done?    maybe. It might involve parallel tracking of 
> ssa-Name ranges in a higher precision... and recalculating every 
> expression using those values.   Im not convinced that a generalized 
> ranger which works in higher precision is really going to solve the 
> problem the way you hope it would.. i think we'd get a lot of false info 
> when unsigned values are involved

I don't think that's necessary for unsigned wrapping.  Most of
the time the (possibly) wrapped result is what we need because
that's what the languages say is supposed to happen.  The cases
when we need to check for the dangerous wraparound (or overflow)
should be rare (just allocation calls) and can be handled on
demand by walking the IL and doing the computation in the infinite
precision.

For signed overflow, though, computing the mathematically correct
result seems preferable.  Or at least tracking (and propagating)
the overflow bit should be.  That way we could tell when
an expression is definitely invalid.

> 
> If I were to see a clear solution/description as to what is a 
> problematic overflow and what isnt, there might be something more 
> general that could be done...

Does the distinction above (unsigned wrapping/signed overflow) help?

> 
>> I mentioned it to Aldy in response to the irange best practices
>> document.  He says there's no way to do that and no plans to
>> make it possible.
>>
> Thats not quite what he said.   . He said  "If the operation may wrap or 
> overflow, it will show up as so in the range.  I don't think we have any 
> plans of changing this behavior."
> 
> In fact. looking back, he said the same thing I just did....  There IS a 
> mechanism there for working in higher precision should one desire to do 
> so, but as I pointed out above, Im not convinced as a general thing in 
> the ranger it will work how you imagine.
> 
> My guess is what you really want is to be invoking range-ops on stmts 
> you care about whether they might overflow or not with higher precision 
> versions and then identify the cases which trigger that you actually 
> care about.

Yes, it looks like that's the way to go.  I was hoping the ranger
had an API that would do all that for me (i.e., walk the IL and
compute the range of the result on demand in the precision of
my choice), since it has to do that as it is, except that it uses
the precision of the expression.  But if not, it sounds like with
range_ops I should be able to fairly easily write one myself.

> So maybe you want an "overflow calculator" which is similar to the 
> ranger, and  identifies which kinds of stmts expect certain range of 
> results, and if they get results outside of  the expected ranges, 
> triggers a flag.
> maybe it could convert all unsigned and signed  values to signed, and 
> then work in a higher precision,  and then if any intermediate result 
> cant be accurately cast back to the original type,  it gets flagged?  is 
> that the kind of overflow that is cared about?

There are two kinds of "overflow:" unsigned wrapping and signed
overflow.  Both are a problem in allocation calls like:

   int *p = malloc (nelts * size);

If nelts and size are signed, the compile-time result seems to
be done in saturation arithmetic (its range looks to be capped
at TYPE_MAX).  If they are unsigned, they wrap around zero.
Neither is usually intended by the user but the wraparound is
the more dangerous of the two because it's well-defined and
because it shrinks the size to a very small number.

> 
> I dont know... this isnt my space :-)       Im just telling you that I'm 
> not sure the current range query mechanism is the right place to be 
> asking if an "overflow" occurred because I think the general answer wont 
> be useful.. too many kinds of overflows thrown together.
> 
> I think this requires more specifoc detailed information, and maybe a 
> simple overflow analyzer will do the trick based on range-ops.... I 
> think range-ops has the required abilities.  whether it needs some 
> enhancing, or something else exposed,  I'm not sure yet

Thank you!  This was a useful clarification to what Aldy said.
I'd been meaning to follow up with him on this point when I was
done with what I'm working on but Jeff's question just got me
the answers I was looking for quicker.  Let me experiment with
range_ops and get back to you.

> 
> 
> Andrew
> 
> PS  One could always take a ranger and overload the range_of_stmt() 
> routine that calculates results, so that it calls current functionality, 
> then try converting the arguements to  higher precision ranges and 
> re-invoking range-ops on those arguments, get a new higher precision 
> range back and then look at the results vs the normal ones.  Might be a 
> place to start experimenting...   then pass a flag along sayign an 
> overflow occurred.

That sounds like something to look into.  I don't see range_of_stmt
on trunk.  Is that the right name or has it not been merged yet?

Thanks again.
Martin


  reply	other threads:[~2020-09-17 20:18 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-15 19:47 Martin Sebor
2020-09-16  9:39 ` Bernhard Reutner-Fischer
2020-09-17  3:23 ` Jeff Law
2020-09-17 16:08   ` Martin Sebor
2020-09-17 18:39     ` Andrew MacLeod
2020-09-17 20:18       ` Martin Sebor [this message]
2020-09-18  6:29         ` Aldy Hernandez
2020-09-19 21:22           ` Martin Sebor
2020-09-20  6:39             ` Aldy Hernandez
2020-09-21 15:13               ` Martin Sebor
2020-11-09 16:00                 ` Martin Sebor
2020-11-21  5:07                   ` Jeff Law
2020-11-21 13:26                     ` Andrew MacLeod
2020-11-23 21:38                       ` Martin Sebor
2020-11-24 17:42                         ` Andrew MacLeod
2020-11-24 17:44                           ` Andrew MacLeod
2020-11-24 18:39                             ` Martin Sebor
2020-12-04 19:14                               ` Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b06e6ad3-bbc8-e036-6090-7afce4d9c1b1@gmail.com \
    --to=msebor@gmail.com \
    --cc=aldyh@redhat.com \
    --cc=amacleod@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=law@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).