public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Martin Sebor <msebor@gmail.com>
To: Jeff Law <law@redhat.com>, Jakub Jelinek <jakub@redhat.com>
Cc: !Gcc Patch List <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] handle integer overflow/wrapping in printf directives (PR 78622)
Date: Fri, 02 Dec 2016 22:54:00 -0000	[thread overview]
Message-ID: <7d9d10aa-36c2-f649-b89d-1131bdb59424@gmail.com> (raw)
In-Reply-To: <945f3abe-dddc-8579-977b-a99ebeef54a9@redhat.com>

Thanks for looking at this!  I realize it's dense code and not easy
to make sense out of.

>> PR middle-end/78622 - [7 Regression]
>> -Wformat-length/-fprintf-return-value incorrect with overflow/wrapping
>>
>> gcc/ChangeLog:
>>
>>     PR middle-end/78622
>>     * gimple-ssa-sprintf.c (min_bytes_remaining): Use res.knownrange
>>     rather than res.bounded.
>>     (adjust_range_for_overflow): New function.
>>     (format_integer): Always set res.bounded to true unless either
>>     precision or width is specified and unknown.
>>     Call adjust_range_for_overflow.
>>     (format_directive): Remove vestigial quoting.
>>     (add_bytes): Correct the computation of boundrange used to
>>     decide whether a warning is of a "maybe" or "defnitely" kind.
> s/defnitely/definitely/

Will fix.

>> +/* With the range [*ARGMIN, *ARGMAX] of an integer directive's actual
>> +   argument, due to the conversion from either *ARGMIN or *ARGMAX to
>> +   the type of the directive's formal argument it's possible for both
>> +   to result in the same number of bytes or a range of bytes that's
>> +   less than the number of bytes that would result from formatting
>> +   some other value in the range [*ARGMIN, *ARGMAX].  This can be
>> +   determined by checking for the actual argument being in the range
>> +   of the type of the directive.  If it isn't it must be assumed to
>> +   take on the full range of the directive's type.
>> +   Return true when the range has been adjusted to the range of
>> +   DIRTYPE, false otherwise.  */
> I wish I'd counted how many times I read that.

Let me see if I can word it better.

>
>> +
>> +static bool
>> +adjust_range_for_overflow (tree dirtype, tree *argmin, tree *argmax)
>> +{
>> +  unsigned argprec = TYPE_PRECISION (TREE_TYPE (*argmin));
>> +  unsigned dirprec = TYPE_PRECISION (dirtype);
>> +
>> +  /* The logic here was inspired/lifted from  the CONVERT_EXPR_CODE_P
>> +     branch in the extract_range_from_unary_expr function in tree-vrp.c.
>> +  */
> Formatting it.  If the comment-close won't fit, then line wrap prior to
> the last word in the comment.

Okay.  I didn't remember what the exact rules were.

>> +
>> +  if (TREE_CODE (*argmin) == INTEGER_CST
>> +      && TREE_CODE (*argmax) == INTEGER_CST
>> +      && (dirprec >= argprec
>> +      || integer_zerop (int_const_binop (RSHIFT_EXPR,
>> +                         int_const_binop (MINUS_EXPR,
>> +                                  *argmax,
>> +                                  *argmin),
>> +                         size_int (dirprec)))))
>> +  {
>> +    *argmin = force_fit_type (dirtype, wi::to_widest (*argmin), 0,
>> false);
>> +    *argmax = force_fit_type (dirtype, wi::to_widest (*argmax), 0,
>> false);
>> +    return false;
>> +  }
> So in this case we're not changing the values, we're just converting
> them to a equal or wider type, right?  Thus no adjustment (what about a
> sign change?  Is that an adjustment?) and we return false per the
> function's specifications.

This casts the value to the destination type, so it may result in
different values.  The important postcondition it preserves is that
the difference ARGMAX - ARGMIN is less than or equal to TYPE_MAX of
the directive's unsigned TYPE.  (If it isn't, the range cannot be
converted.)  This lets us take, say:

   int f (int i)
   {
     if (i < 1024 || 1033 < i)
        i = 1024;

     return snprintf (0, 0, "%hhi", i);
   }

and fold the function call into 1 because (signed char)1024 is 0 and
(signed char)1033 is 9, and every other value in that same original
range is also in the same range after the conversion.  We know it's
safe because (1033 - 1024 <= UCHAR_MAX) holds.  But in in this:

   int g (int i)
   {
     if (i < 1024 || 1289 < i)
        i = 1024;

     return snprintf (0, 0, "%hhi", i);
   }

even though (signed char)1024 is 0 and (signed char)1289 is also 9
like above, since (1289 - 1024) is 265 and thus greater than UCHAR_MAX,
folding the result wouldn't be safe (for example, (sighed char)1034
yields 10).

I picture this as a window bounded by the range of the directive's
type sliding within another window bounded by the argument's type:

   [<-----[...dirtype...]--->]

the outer brackets delimit the range of the argument and the inner
ones the directive's.  The smaller window can slide left and right
and it can shrink but it can't be wider that the directive's type's
range.

> What about overflows when either argmin or argmax won't fit into dirtype
> without an overflow?  What's the right behavior there?

That's fine just as long as the property above holds.

I think the algorithm works but the code came from tree-vrp where
there are all sorts of additional checks for some infinities which
VRP distinguishes from type limits like SCHAR_MIN and _MAX.  I don't
fully understand those and so I'd feel better if someone who does
double checked this code.

>
>> +
>> +  tree dirmin = TYPE_MIN_VALUE (dirtype);
>> +  tree dirmax = TYPE_MAX_VALUE (dirtype);
> From this point onward argmin/argmax are either not integers or we're
> doing a narrowing conversion, right?  In both cases the fact that we're
> doing a narrowing conversion constrains the range

Argmin and argmax are always integers.  The rest of the function
handles the case where the postcondition above doesn't hold (e.g.,
in function g above).

>
>> +
>> +  if (TYPE_UNSIGNED (dirtype))
>> +    {
>> +      *argmin = dirmin;
>> +      *argmax = dirmax;
>> +    }
>> +  else
>> +    {
>> +      *argmin = integer_zero_node;
>> +      *argmax = dirmin;
>> +    }
> Should this be dirmax? Am I missing something here?

It's dirmin because for a signed type, due to the sign, it results
in more bytes on output than dirmin would.  (E.g., with SCHAR_MIN
= -128 and SCHAR_MAX = 127 it's four bytes vs three.)

Martin

  reply	other threads:[~2016-12-02 22:54 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-01  3:26 Martin Sebor
2016-12-01  7:26 ` Jakub Jelinek
2016-12-01  9:15   ` Jakub Jelinek
2016-12-02  2:31     ` Martin Sebor
2016-12-02 20:52       ` Jeff Law
2016-12-02 22:54         ` Martin Sebor [this message]
2016-12-07 18:43           ` Jeff Law
2016-12-07 19:18             ` Martin Sebor
2016-12-12 18:18               ` Jeff Law
2016-12-05 20:26       ` Jakub Jelinek
2016-12-06  3:43         ` Martin Sebor
2016-12-07 18:58           ` Jeff Law
2016-12-12  0:21             ` Martin Sebor
2016-12-12 20:28               ` Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7d9d10aa-36c2-f649-b89d-1131bdb59424@gmail.com \
    --to=msebor@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=law@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).