From: Jason Merrill <jason@redhat.com>
To: Martin Sebor <msebor@gmail.com>, Jeff Law <law@redhat.com>,
gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: PING [PATCH] avoid -Warray-bounds checks for vtable assignments (PR 98266)
Date: Tue, 2 Mar 2021 09:11:09 -0500 [thread overview]
Message-ID: <7c667491-dcfc-da4c-40e7-bfd303d69e2b@redhat.com> (raw)
In-Reply-To: <87a14261-2595-11f7-61b5-f3ce30132dd4@gmail.com>
On 3/1/21 6:11 PM, Martin Sebor wrote:
> On 2/24/21 5:35 PM, Jason Merrill wrote:
>> On 2/23/21 6:07 PM, Martin Sebor wrote:
>>> On 2/23/21 2:52 PM, Jason Merrill wrote:
>>>> On 2/23/21 11:02 AM, Martin Sebor wrote:
>>>>> [CC Jason for any further comments/clarification]
>>>>>
>>>>> On 2/9/21 10:49 AM, Martin Sebor wrote:
>>>>>> On 2/8/21 4:11 PM, Jeff Law wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2/8/21 3:44 PM, Martin Sebor wrote:
>>>>>>>> On 2/8/21 3:26 PM, Jeff Law wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2/8/21 2:56 PM, Martin Sebor wrote:
>>>>>>>>>> On 2/8/21 12:59 PM, Jeff Law wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 1/19/21 5:56 PM, Martin Sebor via Gcc-patches wrote:
>>>>>>>>>>>> Similar to the problem reported for -Wstringop-overflow in
>>>>>>>>>>>> pr98266
>>>>>>>>>>>> and already fixed, -Warray-bounds is also susceptible to false
>>>>>>>>>>>> positives in assignments and copies involving virtual
>>>>>>>>>>>> inheritance.
>>>>>>>>>>>> Because the two warnings don't share code yet (hopefully in
>>>>>>>>>>>> GCC 12)
>>>>>>>>>>>> the attached patch adds its own workaround for this problem to
>>>>>>>>>>>> gimple-array-bounds.cc, this one slightly more crude because of
>>>>>>>>>>>> the limited insight the array bounds checking has into the
>>>>>>>>>>>> checked
>>>>>>>>>>>> expressions.
>>>>>>>>>>>>
>>>>>>>>>>>> Tested on x86_64-linux.
>>>>>>>>>>>>
>>>>>>>>>>>> Martin
>>>>>>>>>>>>
>>>>>>>>>>>> gcc-98266.diff
>>>>>>>>>>>>
>>>>>>>>>>>> PR middle-end/98266 - bogus array subscript is partly
>>>>>>>>>>>> outside array
>>>>>>>>>>>> bounds on virtual inheritance
>>>>>>>>>>>>
>>>>>>>>>>>> gcc/ChangeLog:
>>>>>>>>>>>>
>>>>>>>>>>>> PR middle-end/98266
>>>>>>>>>>>> * gimple-array-bounds.cc
>>>>>>>>>>>> (array_bounds_checker::check_array_bounds):
>>>>>>>>>>>> Avoid checking references involving artificial members.
>>>>>>>>>>>>
>>>>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>>>>
>>>>>>>>>>>> PR middle-end/98266
>>>>>>>>>>>> * g++.dg/warn/Warray-bounds-15.C: New test.
>>>>>>>>>>> It seems to me that we've got the full statement at some
>>>>>>>>>>> point and
>>>>>>>>>>> thus
>>>>>>>>>>> the full expression so at some point couldn't we detect when
>>>>>>>>>>> TYPE_SIZE_UNIT!= DECL_SIZE_UNIT? Or should we be using
>>>>>>>>>>> TYPE_SIZE_UNIT
>>>>>>>>>>> rather than DECL_SIZE_UNIT in gimple-array-bounds.cc
>>>>>>>>>>>
>>>>>>>>>>> Am I missing something?
>>>>>>>>>>
>>>>>>>>>> The expression we're looking at when the false positive is issued
>>>>>>>>>> is the MEM_REF in the LHS of:
>>>>>>>>>>
>>>>>>>>>> MEM[(struct D *)&D.2652 + 24B]._vptr.D = &MEM <int (*) ()> [(void
>>>>>>>>>> *)&_ZTC1E24_1D + 24B];
>>>>>>>>>>
>>>>>>>>>> TREE_TYPE(LHS) is D, DECL_SIZE_UNIT (D.2652) is 24, and
>>>>>>>>>> TYPE_SIZE_UNIT(D) is also 24, so there's no discrepancy between
>>>>>>>>>> DECL_SIZE and TYPE_SIZE.
>>>>>>>>> So that seems like it's a different issue then, unrelated to
>>>>>>>>> 97595.
>>>>>>>>> Right?
>>>>>>>>
>>>>>>>> I think the underlying problem is the same. We're getting a size
>>>>>>>> that doesn't correspond to what's actually being accessed, and it
>>>>>>>> happens because of the virtual inheritance. In pr97595 Jason
>>>>>>>> suggested to use the decl/type size inequality to identify this
>>>>>>>> case but I think we could have just as well used DECL_ARTIFICIAL
>>>>>>>> instead. At least the test cases from pr97595 both pass with
>>>>>>>> this change.
>>>>>>> But in the 98266 case the type and decl sizes are the same. So
>>>>>>> to be
>>>>>>> true that would mean that the underlying type we're using to access
>>>>>>> memory differs from its actual type. Is that the case in the IL?
>>>>>>> And
>>>>>>> does this have wider implications for diagnostics or
>>>>>>> optimizations that
>>>>>>> rely on accurate type sizing?
>>>>>>>
>>>>>>> I'm just trying to make sure I understand, not accepting or
>>>>>>> rejecting
>>>>>>> the patch yet.
>>>>>>
>>>>>> The part of the IL with the MEM_REF is this:
>>>>>>
>>>>>> void g ()
>>>>>> {
>>>>>> void * D.2789;
>>>>>> struct E D.2652;
>>>>>>
>>>>>> <bb 2> [local count: 1073741824]:
>>>>>> E::E (&D.2652, "");
>>>>>> f (&D.2652);
>>>>>>
>>>>>> <bb 3> [local count: 1073741824]:
>>>>>> MEM[(struct D *)&D.2652 + 24B]._vptr.D = &MEM <int (*) ()>
>>>>>> [(void *)&_ZTC1E24_1D + 24B];
>>>>>> ...
>>>>>>
>>>>>> The access here is to the _vptr.D pointer member of D.2652 which is
>>>>>> just past the end of the parent object (as reflected by its SIZE):
>>>>>> it sets sets up the virtual table pointer.
>>>>>>
>>>>>> The access in pr97595 is to the member subobject, which, as Jason
>>>>>> explained (and I accordingly documented under DECL_SIZE in tree.h),
>>>>>> is also laid out separately from the parent object.
>>>>>>
>>>>>> These cases aren't exactly the same (which is also why the test
>>>>>> I added for -Warray-bounds in pr97595 didn't expose this bug) but
>>>>>> they are closely related. The one here can be distinguished by
>>>>>> DECL_ARTIFICAL. The other by the DECL_SIZE != TYPE_SIZE member
>>>>>> inequality.
>>>>>>
>>>>>> Might this impact other warnings? I'd say so if they don't take
>>>>>> these things into account. I just learned about this in pr97595
>>>>>> which was a -Wstringop-overflow false positive but I also saw
>>>>>> a similar instance of -Warray-bounds with my patch to improve
>>>>>> caching and enhance array bounds checking. I dealt with that
>>>>>> instance of the warning in that patch but proactively added
>>>>>> a test case to the fix for pr97595. But the test case is focused
>>>>>> on the subobject access and not on one to the virtual table so
>>>>>> (as I said above) it didn't expose this bug.
>>>>>>
>>>>>> Might this also impact optimizations? I can imagine someone
>>>>>> unaware of this "gotcha" making the same "naive" assumption
>>>>>> I did, but I'd also expect such an invalid assumption to be
>>>>>> found either in code review or quickly cause problems in
>>>>>> testing.
>>>>>
>>>>> Jeff, does this answer your question?
>>>>
>>>> I don't see how the issue here depends on the artificiality of the
>>>> vptr; I'd expect to see the same problem with a data member. The
>>>> problem is that a D base subobject is smaller than a complete D
>>>> object, and in this case the base subobject is allocated such that
>>>> if it were a full D object, it would overlap the end of E. And
>>>> we're checking the MEM_REF as though accessing a full D object, so
>>>> we get a warning.
>>>
>>> Thanks for chiming in!
>>>
>>> There is no warning for accesses to data members defined in
>>> the virtual bases because their offset isn't constant. For example,
>>> given the following modified version of the test case from the patch:
>>>
>>> struct A
>>> {
>>> virtual ~A ();
>>> int a;
>>> };
>>>
>>> struct B: virtual A { int b; };
>>> struct C: virtual A { int c; };
>>> struct D: virtual B, virtual C
>>> {
>>> int d;
>>> };
>>>
>>> D d;
>>>
>>> void g ()
>>> {
>>> D *p = &d;
>>> p->a = p->b = p->c = p->d = 1;
>>> }
>>>
>>> we end up with:
>>>
>>> void g ()
>>> {
>>> ...
>>> <bb 2> [local count: 1073741824]:
>>> d.d = 1; <<< okay
>>> _1 = d._vptr.D; <<< DECL_ARTIFICIAL
>>> _2 = MEM[(long int *)_1 + -40B]; <<< not checked
>>> _3 = (sizetype) _2; <<< not constant...
>>> _4 = &d + _3;
>>> MEM[(struct C *)_4].c = 1; <<< ...no warning
>>
>> Interesting. I do get a warning if I embed the access in the destructor:
>>
>> struct A
>> {
>> virtual ~A ();
>> int a;
>> };
>>
>> struct B: virtual A { };
>> struct C: virtual A {
>> int c;
>> ~C() { c = 0; } // warns twice, for _vptr and c accesses
>> };
>> struct D: virtual B, virtual C
>> {
>> D();
>> };
>>
>> void g()
>> {
>> D d;
>> }
>
> Ah, right. Unlike in member functions, in a dtor (or in the ctors
> of the derived class) the offsets are known/constant so accesses
> involving those are checked and we do get a false positive. Thanks
> for the test case!
>
>>
>>> _5 = MEM[(long int *)_1 + -24B];
>>> _6 = (sizetype) _5;
>>> _7 = &d + _6;
>>> MEM[(struct B *)_7].b = 1;
>>> _8 = MEM[(long int *)_1 + -32B];
>>> _9 = (sizetype) _8;
>>> _10 = &d + _9;
>>> MEM[(struct A *)_10].a = 1;
>>> return;
>>>
>>> }
>>>
>>>> The general issue about the confusion between complete and as-base
>>>> types is PR 22488; I have various thoughts about a proper fix, but
>>>> there have always been higher-priority things to work on.
>>>
>>> Thanks for the reference!
>>>
>>>> One possible approach to this PR is that we don't need to check an
>>>> intermediate _REF; in
>>>>
>>>> MEM[(struct D *)&D.2651 + 24B]._vptr.D
>>>>
>>>> we aren't loading a whole D, we're only looking at the _vptr, which
>>>> is within the bounds of E.
>>>
>>> This is also what the patch does: it keeps us from reaching
>>> the MEM_REF because _vptr.D is artificial.
>>
>> My point applies equally to the access to c in my testcase above: we
>> shouldn't check lvalue refs as though we're loading or storing the
>> whole aggregate object.
>>
>> The full access is (*((D*)&e))._vptr.D. The MEM_REF represents the *,
>> which we shouldn't check by itself, because it doesn't imply a memory
>> access of its own. We should only check the COMPONENT_REF around it
>> that is an actual memory access.
>
> The -Warray-bounds implementation checks every ARRAY_REF and MEM_REF,
> regardless of their context(*).
Aha. That seems like a flaw, but as you say, not one that can be
addressed easily.
> It doesn't check COMPONENT_REF except
> while checking the other REFs (while walking up the use-def chains),
> and the checks don't know the context of the REF. The hack this patch
> put in was to circumvent that when the walk saw a COMPONENT_REF to
> an artificial member (which as you showed isn't enough).
>
> I'm not sure if what you suggest (IIUC) is feasible without reworking
> how the checking works in general.
> What is feasible is working harder
> and detecting if a) the type of the accessed object has virtual bases
> and b) if the offset of the accessed field is within the bounds of
> the enclosing object. In that case, skipping the MEM_REF check avoids
> the warnings for all the cases we have discussed (artificial members,
> ctors and dtors). It still seems hacky but sufficiently narrow in
> scope for GCC 11(**).
>
> Please see the attached revision.
> + else if (TREE_CODE (t) == COMPONENT_REF
> + && TREE_CODE (TREE_OPERAND (t, 0)) == MEM_REF)
> + {
> + /* Hack: Skip MEM_REF checks in accesses to a member of a type
> + with virtual bases at an offset that's in bounds of the size
> + of the enclosing object. See pr98266 and pr97595. */
> + tree ref = TREE_OPERAND (TREE_OPERAND (t, 0), 0);
> + if (TREE_CODE (ref) == ADDR_EXPR)
> + ref = TREE_OPERAND (ref, 0);
> + tree reftype = TREE_TYPE (ref);
> + if (DECL_P (ref)
> + && RECORD_OR_UNION_TYPE_P (reftype)
> + && TYPE_BINFO (reftype))
> + {
> + tree fld = TREE_OPERAND (t, 1);
> + tree fldpos = byte_position (fld);
> + if (tree refsize = DECL_SIZE_UNIT (ref))
> + if (TREE_CODE (refsize) == INTEGER_CST
> + && tree_int_cst_lt (fldpos, refsize))
> + *walk_subtree = false;
> + }
> + }
Here fld F is a member of a base class B, and ref is a variable of a
derived class D, so you're comparing the offset of F in B to the size of
D without considering the offset of B in D. I think you just need to
add the MEM_REF offset to fldpos before comparing.
Jason
next prev parent reply other threads:[~2021-03-02 14:11 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-20 0:56 Martin Sebor
2021-01-29 17:22 ` PING " Martin Sebor
2021-02-06 17:12 ` PING 2 " Martin Sebor
2021-02-08 19:59 ` Jeff Law
2021-02-08 21:56 ` Martin Sebor
2021-02-08 22:26 ` Jeff Law
2021-02-08 22:44 ` Martin Sebor
2021-02-08 23:11 ` Jeff Law
2021-02-09 17:49 ` Martin Sebor
2021-02-23 16:02 ` PING " Martin Sebor
2021-02-23 21:52 ` Jason Merrill
2021-02-23 23:07 ` Martin Sebor
2021-02-25 0:35 ` Jason Merrill
2021-03-01 23:11 ` Martin Sebor
2021-03-02 14:11 ` Jason Merrill [this message]
2021-03-03 23:20 ` Martin Sebor
2021-03-04 5:33 ` Jason Merrill
2021-03-04 17:44 ` Martin Sebor
2021-03-04 20:00 ` Jason Merrill
2021-02-25 23:47 ` Jeff Law
2021-02-26 3:56 ` Jason Merrill
2021-02-25 23:40 ` Jeff Law
2021-03-01 23:16 ` Martin Sebor
2021-02-25 23:15 ` Jeff Law
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7c667491-dcfc-da4c-40e7-bfd303d69e2b@redhat.com \
--to=jason@redhat.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=law@redhat.com \
--cc=msebor@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).