Re: [PING][PATCH] correct handling of indices into arrays with elements larger than 1 (PR c++/96511)

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Jason Merrill <jason@redhat.com>
To: Martin Sebor <msebor@gmail.com>, gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PING][PATCH] correct handling of indices into arrays with elements larger than 1 (PR c++/96511)
Date: Wed, 7 Oct 2020 11:07:02 -0400	[thread overview]
Message-ID: <8a6bf322-0262-5111-96f0-92e012a69556@redhat.com> (raw)
In-Reply-To: <45f5e9f2-1855-91b5-0841-a6bcd342a09b@gmail.com>

On 10/7/20 10:42 AM, Martin Sebor wrote:
> On 10/7/20 8:26 AM, Jason Merrill wrote:
>> On 9/28/20 6:01 PM, Martin Sebor wrote:
>>> On 9/25/20 11:17 PM, Jason Merrill wrote:
>>>> On 9/22/20 4:05 PM, Martin Sebor wrote:
>>>>> The rebased and retested patches are attached.
>>>>>
>>>>> On 9/21/20 3:17 PM, Martin Sebor wrote:
>>>>>> Ping: 
>>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553906.html
>>>>>>
>>>>>> (I'm working on rebasing the patch on top of the latest trunk which
>>>>>> has changed some of the same code but it'd be helpful to get a go-
>>>>>> ahead on substance the changes.  I don't expect the rebase to
>>>>>> require any substantive modifications.)
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>> On 9/14/20 4:01 PM, Martin Sebor wrote:
>>>>>>> On 9/4/20 11:14 AM, Jason Merrill wrote:
>>>>>>>> On 9/3/20 2:44 PM, Martin Sebor wrote:
>>>>>>>>> On 9/1/20 1:22 PM, Jason Merrill wrote:
>>>>>>>>>> On 8/11/20 12:19 PM, Martin Sebor via Gcc-patches wrote:
>>>>>>>>>>> -Wplacement-new handles array indices and pointer offsets the 
>>>>>>>>>>> same:
>>>>>>>>>>> by adjusting them by the size of the element.  That's correct 
>>>>>>>>>>> for
>>>>>>>>>>> the latter but wrong for the former, causing false positives 
>>>>>>>>>>> when
>>>>>>>>>>> the element size is greater than one.
>>>>>>>>>>>
>>>>>>>>>>> In addition, the warning doesn't even attempt to handle 
>>>>>>>>>>> arrays of
>>>>>>>>>>> arrays.  I'm not sure if I forgot or if I simply didn't think of
>>>>>>>>>>> it.
>>>>>>>>>>>
>>>>>>>>>>> The attached patch corrects these oversights by replacing most
>>>>>>>>>>> of the -Wplacement-new code with a call to compute_objsize which
>>>>>>>>>>> handles all this correctly (plus more), and is also better 
>>>>>>>>>>> tested.
>>>>>>>>>>> But even compute_objsize has bugs: it trips up while converting
>>>>>>>>>>> wide_int to offset_int for some pointer offset ranges.  Since
>>>>>>>>>>> handling the C++ IL required changes in this area the patch also
>>>>>>>>>>> fixes that.
>>>>>>>>>>>
>>>>>>>>>>> For review purposes, the patch affects just the middle end.
>>>>>>>>>>> The C++ diff pretty much just removes code from the front end.
>>>>>>>>>>
>>>>>>>>>> The C++ changes are OK.
>>>>>>>>>
>>>>>>>>> Thank you for looking at the rest as well.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> -compute_objsize (tree ptr, int ostype, access_ref *pref,
>>>>>>>>>>> -                bitmap *visited, const vr_values *rvals /* = 
>>>>>>>>>>> NULL */)
>>>>>>>>>>> +compute_objsize (tree ptr, int ostype, access_ref *pref, 
>>>>>>>>>>> bitmap *visited,
>>>>>>>>>>> +                const vr_values *rvals)
>>>>>>>>>>
>>>>>>>>>> This reformatting seems unnecessary, and I prefer to keep the 
>>>>>>>>>> comment about the default argument.
>>>>>>>>>
>>>>>>>>> This overload doesn't take a default argument.  (There was a stray
>>>>>>>>> declaration of a similar function at the top of the file that had
>>>>>>>>> one.  I've removed it.)
>>>>>>>>
>>>>>>>> Ah, true.
>>>>>>>>
>>>>>>>>>>> -      if (!size || TREE_CODE (size) != INTEGER_CST)
>>>>>>>>>>> -       return false;
>>>>>>>>>>  >...
>>>>>>>>>>
>>>>>>>>>> You change some failure cases in compute_objsize to return 
>>>>>>>>>> success with a maximum range, while others continue to return 
>>>>>>>>>> failure. This needs commentary about the design rationale.
>>>>>>>>>
>>>>>>>>> This is too much for a comment in the code but the background is
>>>>>>>>> this: compute_objsize initially returned the object size as a 
>>>>>>>>> constant.
>>>>>>>>> Recently, I have enhanced it to return a range to improve 
>>>>>>>>> warnings for
>>>>>>>>> allocated objects.  With that, a failure can be turned into 
>>>>>>>>> success by
>>>>>>>>> having the function set the range to that of the largest 
>>>>>>>>> object. That
>>>>>>>>> should simplify the function's callers and could even improve
>>>>>>>>> the detection of some invalid accesses.  Once this change is made
>>>>>>>>> it might even be possible to change its return type to void.
>>>>>>>>>
>>>>>>>>> The change that caught your eye is necessary to make the function
>>>>>>>>> a drop-in replacement for the C++ front end code which makes this
>>>>>>>>> same assumption.  Without it, a number of test cases that exercise
>>>>>>>>> VLAs fail in g++.dg/warn/Wplacement-new-size-5.C.  For example:
>>>>>>>>>
>>>>>>>>>    void f (int n)
>>>>>>>>>    {
>>>>>>>>>      char a[n];
>>>>>>>>>      new (a - 1) int ();
>>>>>>>>>    }
>>>>>>>>>
>>>>>>>>> Changing any of the other places isn't necessary for existing 
>>>>>>>>> tests
>>>>>>>>> to pass (and I didn't want to introduce too much churn).  But I do
>>>>>>>>> want to change the rest of the function along the same lines at 
>>>>>>>>> some
>>>>>>>>> point.
>>>>>>>>
>>>>>>>> Please do change the other places to be consistent; better to 
>>>>>>>> have more churn than to leave the function half-updated.  That 
>>>>>>>> can be a separate patch if you prefer, but let's do it now 
>>>>>>>> rather than later.
>>>>>>>
>>>>>>> I've made most of these changes in the other patch (also attached).
>>>>>>> I'm quite happy with the result but it turned out to be a lot more
>>>>>>> work than either of us expected, mostly due to the amount of 
>>>>>>> testing.
>>>>>>>
>>>>>>> I've left a couple of failing cases in place mainly as reminders
>>>>>>> to handle them better (which means I also didn't change the caller
>>>>>>> to avoid testing for failures).  I've also added TODO notes with
>>>>>>> reminders to handle some of the new codes more completely.
>>>>>>>
>>>>>>>>
>>>>>>>>>>> +  special_array_member sam{ };
>>>>>>>>>>
>>>>>>>>>> sam is always set by component_ref_size, so I don't think it's 
>>>>>>>>>> necessary to initialize it at the declaration.
>>>>>>>>>
>>>>>>>>> I find initializing pass-by-pointer local variables helpful but
>>>>>>>>> I don't insist on it.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> @@ -187,7 +187,7 @@ decl_init_size (tree decl, bool min)
>>>>>>>>>>>    tree last_type = TREE_TYPE (last);
>>>>>>>>>>>    if (TREE_CODE (last_type) != ARRAY_TYPE
>>>>>>>>>>>        || TYPE_SIZE (last_type))
>>>>>>>>>>> -    return size;
>>>>>>>>>>> +    return size ? size : TYPE_SIZE_UNIT (type);
>>>>>>>>>>
>>>>>>>>>> This change seems to violate the comment for the function.
>>>>>>>>>
>>>>>>>>> By my reading (and writing) the change is covered by the first
>>>>>>>>> sentence:
>>>>>>>>>
>>>>>>>>>     Returns the size of the object designated by DECL considering
>>>>>>>>>     its initializer if it either has one or if it would not affect
>>>>>>>>>     its size, ...
>>>>>>>>
>>>>>>>> OK, I see it now.
>>>>>>>>
>>>>>>>>> It handles a number of cases in Wplacement-new-size.C fail that
>>>>>>>>> construct a larger object in an extern declaration of a template,
>>>>>>>>> like this:
>>>>>>>>>
>>>>>>>>>    template <class> struct S { char c; };
>>>>>>>>>    extern S<int> s;
>>>>>>>>>
>>>>>>>>>    void f ()
>>>>>>>>>    {
>>>>>>>>>      new (&s) int ();
>>>>>>>>>    }
>>>>>>>>>
>>>>>>>>> I don't know why DECL_SIZE isn't set here (I don't think it can
>>>>>>>>> be anything but equal to TYPE_SIZE, can it?) and other than struct
>>>>>>>>> objects with a flexible array member where this identity doesn't
>>>>>>>>> hold I can't think of others.  Am I missing something?
>>>>>>>>
>>>>>>>> Good question.  The attached patch should fix that, so you 
>>>>>>>> shouldn't need the change to decl_init_size:
>>>>>>>
>>>>>>> I've integrated it into the bug fix.
>>>>>>>
>>>>>>> Besides the usual x86_64-linux bootstrap/regtest I tested both
>>>>>>> patches by building a few packages, including Binutils/GDB, Glibc,
>>>>>>> and  verifying no new warnings show up.
>>>>>>>
>>>>>>> Martin
>>>>
>>>>> +offset_int
>>>>> +access_ref::size_remaining (offset_int *pmin /* = NULL */) const
>>>>
>>>> For the various member functions, please include the comments with 
>>>> the definition as well as the in-class declaration.
>>>
>>> Only one access_ref member function is defined out-of-line: 
>>> offset_bounded().  I've adjusted the comment and copied it above
>>> the function definition.
>>>
>>>>
>>>>> +      if (offrng[1] < offrng[0])
>>>>
>>>> What does it mean for the max offset to be less than the min offset? 
>>>> I wouldn't expect that to ever happen with wide integers.
>>>
>>> The offset is represented in sizetype with negative values represented
>>> as large positive values, but has to be converted to ptrdiff_t.
>>
>> It looks to me like the offset is offset_int, which is both signed and 
>> big enough to hold all values of sizetype without turning large 
>> positive values into negative values.  Where are these sign-switching 
>> conversions happening?
> 
> In get_offset_range in builtins.c.

Since we're converting to offset_int there, why not give the offset_int 
the real value rather than a bogus negative value?

>>> These
>>> cases come up when the unsigned offset is an ordinary range that
>>> corresponds to an anti-range, such as here:
>>>
>>>    extern char a[2];
>>>
>>>    void f (unsigned long i)
>>>    {
>>>      if (i == 0)
>>>        return;
>>>      a[i] = 0;   // i's range is [1, -1] (i.e., [1, SIZE_MAX]
>>>    }
>>>
>>>>
>>>>> +  /* Return true if OFFRNG is bounded to a subrange of possible 
>>>>> offset
>>>>> +     values.  */
>>>>> +  bool offset_bounded () const;
>>>>
>>>> I don't understand how you're using this.  The implementation checks 
>>>> for the possible offset values falling outside those representable 
>>>> by ptrdiff_t, unless the range is only a single value.  And then the 
>>>> only use is
>>>>
>>>>> +  if (ref.offset_zero () || !ref.offset_bounded ())
>>>>> +    inform (DECL_SOURCE_LOCATION (ref.ref),
>>>>> +        "%qD declared here", ref.ref);
>>>>> +  else if (ref.offrng[0] == ref.offrng[1])
>>>>> +    inform (DECL_SOURCE_LOCATION (ref.ref),
>>>>> +        "at offset %wi from %qD declared here",
>>>>> +        ref.offrng[0].to_shwi (), ref.ref);
>>>>> +  else
>>>>> +    inform (DECL_SOURCE_LOCATION (ref.ref),
>>>>> +        "at offset [%wi, %wi] from %qD declared here",
>>>>> +        ref.offrng[0].to_shwi (), ref.offrng[1].to_shwi (), ref.ref);
>>>>
>>>> So if the possible offsets are all representable by ptrdiff_t, we 
>>>> don't print the range?  The middle case also looks unreachable, 
>>>> since offset_bounded will return false in that case.
>>>
>>> The function was originally named "offset_unbounded."  I changed
>>> it to "offset_bounded" but looks like I didn't finish the job or
>>> add any tests for it.
>>>
>>> The goal of conditionals is to avoid overwhelming the user with
>>> excessive numbers that may not be meaningful or even relevant
>>> to the warning.  I've corrected the function body, tweaked and
>>> renamed the get_range function to get_offset_range to do a better
>>> job of extracting ranges from the types of some nonconstant
>>> expressions the front end passes it, and added a new test for
>>> all this.  Attached is the new revision.
>>>
>>> Martin
>>
>

next prev parent reply	other threads:[~2020-10-07 15:07 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-11 16:19 [PATCH] " Martin Sebor
2020-08-19 15:00 ` [PING][PATCH] " Martin Sebor
2020-08-28 15:42   ` [PING 2][PATCH] " Martin Sebor
2020-09-01 19:22 ` [PATCH] " Jason Merrill
2020-09-03 18:44   ` Martin Sebor
2020-09-04 17:14     ` Jason Merrill
2020-09-14 22:01       ` Martin Sebor
2020-09-21 21:17         ` [PING][PATCH] " Martin Sebor
2020-09-22 20:05           ` Martin Sebor
2020-09-26  5:17             ` Jason Merrill
2020-09-28 22:01               ` Martin Sebor
2020-10-05 16:37                 ` Martin Sebor
2020-10-07 14:26                 ` Jason Merrill
2020-10-07 14:42                   ` Martin Sebor
2020-10-07 15:07                     ` Jason Merrill [this message]
2020-10-07 15:19                       ` Martin Sebor
2020-10-07 19:28                         ` Jason Merrill
2020-10-07 20:11                           ` Martin Sebor
2020-10-07 21:01                             ` Jason Merrill
2020-10-08 19:18                               ` Martin Sebor
2020-10-08 19:40                                 ` Jason Merrill
2020-10-09 14:51                                   ` Martin Sebor
2020-10-09 15:13                                     ` Jason Merrill
2020-10-11 22:45                                       ` Martin Sebor
2020-10-12  3:44                                         ` Jason Merrill
2020-10-12 15:21                                           ` Martin Sebor
2020-10-13  9:46                 ` Christophe Lyon
2020-10-13 16:59                   ` Martin Sebor

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8a6bf322-0262-5111-96f0-92e012a69556@redhat.com \
    --to=jason@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=msebor@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).