From: Jason Merrill <jason@redhat.com>
To: Martin Sebor <msebor@gmail.com>, gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PING][PATCH] correct handling of indices into arrays with elements larger than 1 (PR c++/96511)
Date: Wed, 7 Oct 2020 15:28:07 -0400 [thread overview]
Message-ID: <647ab677-41be-2030-c734-bde03071057a@redhat.com> (raw)
In-Reply-To: <28c7421a-4fe8-ec6b-a288-001aff24df18@gmail.com>
On 10/7/20 11:19 AM, Martin Sebor wrote:
> On 10/7/20 9:07 AM, Jason Merrill wrote:
>> On 10/7/20 10:42 AM, Martin Sebor wrote:
>>> On 10/7/20 8:26 AM, Jason Merrill wrote:
>>>> On 9/28/20 6:01 PM, Martin Sebor wrote:
>>>>> On 9/25/20 11:17 PM, Jason Merrill wrote:
>>>>>> On 9/22/20 4:05 PM, Martin Sebor wrote:
>>>>>>> The rebased and retested patches are attached.
>>>>>>>
>>>>>>> On 9/21/20 3:17 PM, Martin Sebor wrote:
>>>>>>>> Ping:
>>>>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553906.html
>>>>>>>>
>>>>>>>>
>>>>>>>> (I'm working on rebasing the patch on top of the latest trunk which
>>>>>>>> has changed some of the same code but it'd be helpful to get a go-
>>>>>>>> ahead on substance the changes. I don't expect the rebase to
>>>>>>>> require any substantive modifications.)
>>>>>>>>
>>>>>>>> Martin
>>>>>>>>
>>>>>>>> On 9/14/20 4:01 PM, Martin Sebor wrote:
>>>>>>>>> On 9/4/20 11:14 AM, Jason Merrill wrote:
>>>>>>>>>> On 9/3/20 2:44 PM, Martin Sebor wrote:
>>>>>>>>>>> On 9/1/20 1:22 PM, Jason Merrill wrote:
>>>>>>>>>>>> On 8/11/20 12:19 PM, Martin Sebor via Gcc-patches wrote:
>>>>>>>>>>>>> -Wplacement-new handles array indices and pointer offsets
>>>>>>>>>>>>> the same:
>>>>>>>>>>>>> by adjusting them by the size of the element. That's
>>>>>>>>>>>>> correct for
>>>>>>>>>>>>> the latter but wrong for the former, causing false
>>>>>>>>>>>>> positives when
>>>>>>>>>>>>> the element size is greater than one.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In addition, the warning doesn't even attempt to handle
>>>>>>>>>>>>> arrays of
>>>>>>>>>>>>> arrays. I'm not sure if I forgot or if I simply didn't
>>>>>>>>>>>>> think of
>>>>>>>>>>>>> it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The attached patch corrects these oversights by replacing most
>>>>>>>>>>>>> of the -Wplacement-new code with a call to compute_objsize
>>>>>>>>>>>>> which
>>>>>>>>>>>>> handles all this correctly (plus more), and is also better
>>>>>>>>>>>>> tested.
>>>>>>>>>>>>> But even compute_objsize has bugs: it trips up while
>>>>>>>>>>>>> converting
>>>>>>>>>>>>> wide_int to offset_int for some pointer offset ranges. Since
>>>>>>>>>>>>> handling the C++ IL required changes in this area the patch
>>>>>>>>>>>>> also
>>>>>>>>>>>>> fixes that.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For review purposes, the patch affects just the middle end.
>>>>>>>>>>>>> The C++ diff pretty much just removes code from the front end.
>>>>>>>>>>>>
>>>>>>>>>>>> The C++ changes are OK.
>>>>>>>>>>>
>>>>>>>>>>> Thank you for looking at the rest as well.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> -compute_objsize (tree ptr, int ostype, access_ref *pref,
>>>>>>>>>>>>> - bitmap *visited, const vr_values *rvals /*
>>>>>>>>>>>>> = NULL */)
>>>>>>>>>>>>> +compute_objsize (tree ptr, int ostype, access_ref *pref,
>>>>>>>>>>>>> bitmap *visited,
>>>>>>>>>>>>> + const vr_values *rvals)
>>>>>>>>>>>>
>>>>>>>>>>>> This reformatting seems unnecessary, and I prefer to keep
>>>>>>>>>>>> the comment about the default argument.
>>>>>>>>>>>
>>>>>>>>>>> This overload doesn't take a default argument. (There was a
>>>>>>>>>>> stray
>>>>>>>>>>> declaration of a similar function at the top of the file that
>>>>>>>>>>> had
>>>>>>>>>>> one. I've removed it.)
>>>>>>>>>>
>>>>>>>>>> Ah, true.
>>>>>>>>>>
>>>>>>>>>>>>> - if (!size || TREE_CODE (size) != INTEGER_CST)
>>>>>>>>>>>>> - return false;
>>>>>>>>>>>> >...
>>>>>>>>>>>>
>>>>>>>>>>>> You change some failure cases in compute_objsize to return
>>>>>>>>>>>> success with a maximum range, while others continue to
>>>>>>>>>>>> return failure. This needs commentary about the design
>>>>>>>>>>>> rationale.
>>>>>>>>>>>
>>>>>>>>>>> This is too much for a comment in the code but the background is
>>>>>>>>>>> this: compute_objsize initially returned the object size as a
>>>>>>>>>>> constant.
>>>>>>>>>>> Recently, I have enhanced it to return a range to improve
>>>>>>>>>>> warnings for
>>>>>>>>>>> allocated objects. With that, a failure can be turned into
>>>>>>>>>>> success by
>>>>>>>>>>> having the function set the range to that of the largest
>>>>>>>>>>> object. That
>>>>>>>>>>> should simplify the function's callers and could even improve
>>>>>>>>>>> the detection of some invalid accesses. Once this change is
>>>>>>>>>>> made
>>>>>>>>>>> it might even be possible to change its return type to void.
>>>>>>>>>>>
>>>>>>>>>>> The change that caught your eye is necessary to make the
>>>>>>>>>>> function
>>>>>>>>>>> a drop-in replacement for the C++ front end code which makes
>>>>>>>>>>> this
>>>>>>>>>>> same assumption. Without it, a number of test cases that
>>>>>>>>>>> exercise
>>>>>>>>>>> VLAs fail in g++.dg/warn/Wplacement-new-size-5.C. For example:
>>>>>>>>>>>
>>>>>>>>>>> void f (int n)
>>>>>>>>>>> {
>>>>>>>>>>> char a[n];
>>>>>>>>>>> new (a - 1) int ();
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> Changing any of the other places isn't necessary for existing
>>>>>>>>>>> tests
>>>>>>>>>>> to pass (and I didn't want to introduce too much churn). But
>>>>>>>>>>> I do
>>>>>>>>>>> want to change the rest of the function along the same lines
>>>>>>>>>>> at some
>>>>>>>>>>> point.
>>>>>>>>>>
>>>>>>>>>> Please do change the other places to be consistent; better to
>>>>>>>>>> have more churn than to leave the function half-updated. That
>>>>>>>>>> can be a separate patch if you prefer, but let's do it now
>>>>>>>>>> rather than later.
>>>>>>>>>
>>>>>>>>> I've made most of these changes in the other patch (also
>>>>>>>>> attached).
>>>>>>>>> I'm quite happy with the result but it turned out to be a lot more
>>>>>>>>> work than either of us expected, mostly due to the amount of
>>>>>>>>> testing.
>>>>>>>>>
>>>>>>>>> I've left a couple of failing cases in place mainly as reminders
>>>>>>>>> to handle them better (which means I also didn't change the caller
>>>>>>>>> to avoid testing for failures). I've also added TODO notes with
>>>>>>>>> reminders to handle some of the new codes more completely.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>> + special_array_member sam{ };
>>>>>>>>>>>>
>>>>>>>>>>>> sam is always set by component_ref_size, so I don't think
>>>>>>>>>>>> it's necessary to initialize it at the declaration.
>>>>>>>>>>>
>>>>>>>>>>> I find initializing pass-by-pointer local variables helpful but
>>>>>>>>>>> I don't insist on it.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> @@ -187,7 +187,7 @@ decl_init_size (tree decl, bool min)
>>>>>>>>>>>>> tree last_type = TREE_TYPE (last);
>>>>>>>>>>>>> if (TREE_CODE (last_type) != ARRAY_TYPE
>>>>>>>>>>>>> || TYPE_SIZE (last_type))
>>>>>>>>>>>>> - return size;
>>>>>>>>>>>>> + return size ? size : TYPE_SIZE_UNIT (type);
>>>>>>>>>>>>
>>>>>>>>>>>> This change seems to violate the comment for the function.
>>>>>>>>>>>
>>>>>>>>>>> By my reading (and writing) the change is covered by the first
>>>>>>>>>>> sentence:
>>>>>>>>>>>
>>>>>>>>>>> Returns the size of the object designated by DECL
>>>>>>>>>>> considering
>>>>>>>>>>> its initializer if it either has one or if it would not
>>>>>>>>>>> affect
>>>>>>>>>>> its size, ...
>>>>>>>>>>
>>>>>>>>>> OK, I see it now.
>>>>>>>>>>
>>>>>>>>>>> It handles a number of cases in Wplacement-new-size.C fail that
>>>>>>>>>>> construct a larger object in an extern declaration of a
>>>>>>>>>>> template,
>>>>>>>>>>> like this:
>>>>>>>>>>>
>>>>>>>>>>> template <class> struct S { char c; };
>>>>>>>>>>> extern S<int> s;
>>>>>>>>>>>
>>>>>>>>>>> void f ()
>>>>>>>>>>> {
>>>>>>>>>>> new (&s) int ();
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> I don't know why DECL_SIZE isn't set here (I don't think it can
>>>>>>>>>>> be anything but equal to TYPE_SIZE, can it?) and other than
>>>>>>>>>>> struct
>>>>>>>>>>> objects with a flexible array member where this identity doesn't
>>>>>>>>>>> hold I can't think of others. Am I missing something?
>>>>>>>>>>
>>>>>>>>>> Good question. The attached patch should fix that, so you
>>>>>>>>>> shouldn't need the change to decl_init_size:
>>>>>>>>>
>>>>>>>>> I've integrated it into the bug fix.
>>>>>>>>>
>>>>>>>>> Besides the usual x86_64-linux bootstrap/regtest I tested both
>>>>>>>>> patches by building a few packages, including Binutils/GDB, Glibc,
>>>>>>>>> and verifying no new warnings show up.
>>>>>>>>>
>>>>>>>>> Martin
>>>>>>
>>>>>>> +offset_int
>>>>>>> +access_ref::size_remaining (offset_int *pmin /* = NULL */) const
>>>>>>
>>>>>> For the various member functions, please include the comments with
>>>>>> the definition as well as the in-class declaration.
>>>>>
>>>>> Only one access_ref member function is defined out-of-line:
>>>>> offset_bounded(). I've adjusted the comment and copied it above
>>>>> the function definition.
>>>>>
>>>>>>
>>>>>>> + if (offrng[1] < offrng[0])
>>>>>>
>>>>>> What does it mean for the max offset to be less than the min
>>>>>> offset? I wouldn't expect that to ever happen with wide integers.
>>>>>
>>>>> The offset is represented in sizetype with negative values represented
>>>>> as large positive values, but has to be converted to ptrdiff_t.
>>>>
>>>> It looks to me like the offset is offset_int, which is both signed
>>>> and big enough to hold all values of sizetype without turning large
>>>> positive values into negative values. Where are these
>>>> sign-switching conversions happening?
>>>
>>> In get_offset_range in builtins.c.
>>
>> Since we're converting to offset_int there, why not give the
>> offset_int the real value rather than a bogus negative value?
>
> I don't understand the question: the real offset (in the program)
> is negative when its sizetype representation is greater than
> PTRDIFF_MAX. It's worked this way for years.
OK, then we're back to my original question: why is the max offset less
than the min offset? If the range includes negative values, why isn't
the more-negative one the minimum?
>>>>> These
>>>>> cases come up when the unsigned offset is an ordinary range that
>>>>> corresponds to an anti-range, such as here:
>>>>>
>>>>> extern char a[2];
>>>>>
>>>>> void f (unsigned long i)
>>>>> {
>>>>> if (i == 0)
>>>>> return;
>>>>> a[i] = 0; // i's range is [1, -1] (i.e., [1, SIZE_MAX]
>>>>> }
>>>>>
>>>>>>
>>>>>>> + /* Return true if OFFRNG is bounded to a subrange of possible
>>>>>>> offset
>>>>>>> + values. */
>>>>>>> + bool offset_bounded () const;
>>>>>>
>>>>>> I don't understand how you're using this. The implementation
>>>>>> checks for the possible offset values falling outside those
>>>>>> representable by ptrdiff_t, unless the range is only a single
>>>>>> value. And then the only use is
>>>>>>
>>>>>>> + if (ref.offset_zero () || !ref.offset_bounded ())
>>>>>>> + inform (DECL_SOURCE_LOCATION (ref.ref),
>>>>>>> + "%qD declared here", ref.ref);
>>>>>>> + else if (ref.offrng[0] == ref.offrng[1])
>>>>>>> + inform (DECL_SOURCE_LOCATION (ref.ref),
>>>>>>> + "at offset %wi from %qD declared here",
>>>>>>> + ref.offrng[0].to_shwi (), ref.ref);
>>>>>>> + else
>>>>>>> + inform (DECL_SOURCE_LOCATION (ref.ref),
>>>>>>> + "at offset [%wi, %wi] from %qD declared here",
>>>>>>> + ref.offrng[0].to_shwi (), ref.offrng[1].to_shwi (),
>>>>>>> ref.ref);
>>>>>>
>>>>>> So if the possible offsets are all representable by ptrdiff_t, we
>>>>>> don't print the range? The middle case also looks unreachable,
>>>>>> since offset_bounded will return false in that case.
>>>>>
>>>>> The function was originally named "offset_unbounded." I changed
>>>>> it to "offset_bounded" but looks like I didn't finish the job or
>>>>> add any tests for it.
>>>>>
>>>>> The goal of conditionals is to avoid overwhelming the user with
>>>>> excessive numbers that may not be meaningful or even relevant
>>>>> to the warning. I've corrected the function body, tweaked and
>>>>> renamed the get_range function to get_offset_range to do a better
>>>>> job of extracting ranges from the types of some nonconstant
>>>>> expressions the front end passes it, and added a new test for
>>>>> all this. Attached is the new revision.
>>>>>
>>>>> Martin
>>>>
>>>
>>
>
next prev parent reply other threads:[~2020-10-07 19:28 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-11 16:19 [PATCH] " Martin Sebor
2020-08-19 15:00 ` [PING][PATCH] " Martin Sebor
2020-08-28 15:42 ` [PING 2][PATCH] " Martin Sebor
2020-09-01 19:22 ` [PATCH] " Jason Merrill
2020-09-03 18:44 ` Martin Sebor
2020-09-04 17:14 ` Jason Merrill
2020-09-14 22:01 ` Martin Sebor
2020-09-21 21:17 ` [PING][PATCH] " Martin Sebor
2020-09-22 20:05 ` Martin Sebor
2020-09-26 5:17 ` Jason Merrill
2020-09-28 22:01 ` Martin Sebor
2020-10-05 16:37 ` Martin Sebor
2020-10-07 14:26 ` Jason Merrill
2020-10-07 14:42 ` Martin Sebor
2020-10-07 15:07 ` Jason Merrill
2020-10-07 15:19 ` Martin Sebor
2020-10-07 19:28 ` Jason Merrill [this message]
2020-10-07 20:11 ` Martin Sebor
2020-10-07 21:01 ` Jason Merrill
2020-10-08 19:18 ` Martin Sebor
2020-10-08 19:40 ` Jason Merrill
2020-10-09 14:51 ` Martin Sebor
2020-10-09 15:13 ` Jason Merrill
2020-10-11 22:45 ` Martin Sebor
2020-10-12 3:44 ` Jason Merrill
2020-10-12 15:21 ` Martin Sebor
2020-10-13 9:46 ` Christophe Lyon
2020-10-13 16:59 ` Martin Sebor
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=647ab677-41be-2030-c734-bde03071057a@redhat.com \
--to=jason@redhat.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=msebor@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).