public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Jason Merrill <jason@redhat.com>
To: Martin Sebor <msebor@gmail.com>, Jeff Law <law@redhat.com>,
	gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: PING [PATCH] avoid -Warray-bounds checks for vtable assignments (PR 98266)
Date: Tue, 2 Mar 2021 09:11:09 -0500	[thread overview]
Message-ID: <7c667491-dcfc-da4c-40e7-bfd303d69e2b@redhat.com> (raw)
In-Reply-To: <87a14261-2595-11f7-61b5-f3ce30132dd4@gmail.com>

On 3/1/21 6:11 PM, Martin Sebor wrote:
> On 2/24/21 5:35 PM, Jason Merrill wrote:
>> On 2/23/21 6:07 PM, Martin Sebor wrote:
>>> On 2/23/21 2:52 PM, Jason Merrill wrote:
>>>> On 2/23/21 11:02 AM, Martin Sebor wrote:
>>>>> [CC Jason for any further comments/clarification]
>>>>>
>>>>> On 2/9/21 10:49 AM, Martin Sebor wrote:
>>>>>> On 2/8/21 4:11 PM, Jeff Law wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2/8/21 3:44 PM, Martin Sebor wrote:
>>>>>>>> On 2/8/21 3:26 PM, Jeff Law wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2/8/21 2:56 PM, Martin Sebor wrote:
>>>>>>>>>> On 2/8/21 12:59 PM, Jeff Law wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 1/19/21 5:56 PM, Martin Sebor via Gcc-patches wrote:
>>>>>>>>>>>> Similar to the problem reported for -Wstringop-overflow in 
>>>>>>>>>>>> pr98266
>>>>>>>>>>>> and already fixed, -Warray-bounds is also susceptible to false
>>>>>>>>>>>> positives in assignments and copies involving virtual 
>>>>>>>>>>>> inheritance.
>>>>>>>>>>>> Because the two warnings don't share code yet (hopefully in 
>>>>>>>>>>>> GCC 12)
>>>>>>>>>>>> the attached patch adds its own workaround for this problem to
>>>>>>>>>>>> gimple-array-bounds.cc, this one slightly more crude because of
>>>>>>>>>>>> the limited insight the array bounds checking has into the 
>>>>>>>>>>>> checked
>>>>>>>>>>>> expressions.
>>>>>>>>>>>>
>>>>>>>>>>>> Tested on x86_64-linux.
>>>>>>>>>>>>
>>>>>>>>>>>> Martin
>>>>>>>>>>>>
>>>>>>>>>>>> gcc-98266.diff
>>>>>>>>>>>>
>>>>>>>>>>>> PR middle-end/98266 - bogus array subscript is partly 
>>>>>>>>>>>> outside array
>>>>>>>>>>>> bounds on virtual inheritance
>>>>>>>>>>>>
>>>>>>>>>>>> gcc/ChangeLog:
>>>>>>>>>>>>
>>>>>>>>>>>>       PR middle-end/98266
>>>>>>>>>>>>       * gimple-array-bounds.cc
>>>>>>>>>>>> (array_bounds_checker::check_array_bounds):
>>>>>>>>>>>>       Avoid checking references involving artificial members.
>>>>>>>>>>>>
>>>>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>>>>
>>>>>>>>>>>>       PR middle-end/98266
>>>>>>>>>>>>       * g++.dg/warn/Warray-bounds-15.C: New test.
>>>>>>>>>>> It seems to me that we've got the full statement at some 
>>>>>>>>>>> point and
>>>>>>>>>>> thus
>>>>>>>>>>> the full expression so at some point couldn't we detect when
>>>>>>>>>>> TYPE_SIZE_UNIT!= DECL_SIZE_UNIT?  Or should we be using 
>>>>>>>>>>> TYPE_SIZE_UNIT
>>>>>>>>>>> rather than DECL_SIZE_UNIT in gimple-array-bounds.cc
>>>>>>>>>>>
>>>>>>>>>>> Am I missing something?
>>>>>>>>>>
>>>>>>>>>> The expression we're looking at when the false positive is issued
>>>>>>>>>> is the MEM_REF in the LHS of:
>>>>>>>>>>
>>>>>>>>>> MEM[(struct D *)&D.2652 + 24B]._vptr.D = &MEM <int (*) ()> [(void
>>>>>>>>>> *)&_ZTC1E24_1D + 24B];
>>>>>>>>>>
>>>>>>>>>> TREE_TYPE(LHS) is D, DECL_SIZE_UNIT (D.2652) is 24, and
>>>>>>>>>> TYPE_SIZE_UNIT(D) is also 24, so there's no discrepancy between
>>>>>>>>>> DECL_SIZE and TYPE_SIZE.
>>>>>>>>> So that seems like it's a different issue then, unrelated to 
>>>>>>>>> 97595.
>>>>>>>>> Right?
>>>>>>>>
>>>>>>>> I think the underlying problem is the same.  We're getting a size
>>>>>>>> that doesn't correspond to what's actually being accessed, and it
>>>>>>>> happens because of the virtual inheritance.  In pr97595 Jason
>>>>>>>> suggested to use the decl/type size inequality to identify this
>>>>>>>> case but I think we could have just as well used DECL_ARTIFICIAL
>>>>>>>> instead.  At least the test cases from pr97595 both pass with
>>>>>>>> this change.
>>>>>>> But in the 98266 case the type and decl sizes are the same.  So 
>>>>>>> to be
>>>>>>> true that would mean that the underlying type we're using to access
>>>>>>> memory differs from its actual type.  Is that the case in the IL? 
>>>>>>> And
>>>>>>> does this have wider implications for diagnostics or 
>>>>>>> optimizations that
>>>>>>> rely on accurate type sizing?
>>>>>>>
>>>>>>> I'm just trying to make sure I understand, not accepting or 
>>>>>>> rejecting
>>>>>>> the patch yet.
>>>>>>
>>>>>> The part of the IL with the MEM_REF is this:
>>>>>>
>>>>>> void g ()
>>>>>> {
>>>>>>    void * D.2789;
>>>>>>    struct E D.2652;
>>>>>>
>>>>>>    <bb 2> [local count: 1073741824]:
>>>>>>    E::E (&D.2652, "");
>>>>>>    f (&D.2652);
>>>>>>
>>>>>>    <bb 3> [local count: 1073741824]:
>>>>>>    MEM[(struct D *)&D.2652 + 24B]._vptr.D = &MEM <int (*) ()> 
>>>>>> [(void *)&_ZTC1E24_1D + 24B];
>>>>>>    ...
>>>>>>
>>>>>> The access here is to the _vptr.D pointer member of D.2652 which is
>>>>>> just past the end of the parent object (as reflected by its SIZE):
>>>>>> it sets sets up the virtual table pointer.
>>>>>>
>>>>>> The access in pr97595 is to the member subobject, which, as Jason
>>>>>> explained (and I accordingly documented under DECL_SIZE in tree.h),
>>>>>> is also laid out separately from the parent object.
>>>>>>
>>>>>> These cases aren't exactly the same (which is also why the test
>>>>>> I added for -Warray-bounds in pr97595 didn't expose this bug) but
>>>>>> they are closely related.  The one here can be distinguished by
>>>>>> DECL_ARTIFICAL.  The other by the DECL_SIZE != TYPE_SIZE member
>>>>>> inequality.
>>>>>>
>>>>>> Might this impact other warnings?  I'd say so if they don't take
>>>>>> these things into account.  I just learned about this in pr97595
>>>>>> which was a -Wstringop-overflow false positive but I also saw
>>>>>> a similar instance of -Warray-bounds with my patch to improve
>>>>>> caching and enhance array bounds checking.  I dealt with that
>>>>>> instance of the warning in that patch but proactively added
>>>>>> a test case to the fix for pr97595.  But the test case is focused
>>>>>> on the subobject access and not on one to the virtual table so
>>>>>> (as I said above) it didn't expose this bug.
>>>>>>
>>>>>> Might this also impact optimizations?  I can imagine someone
>>>>>> unaware of this "gotcha" making the same "naive" assumption
>>>>>> I did, but I'd also expect such an invalid assumption to be
>>>>>> found either in code review or quickly cause problems in
>>>>>> testing.
>>>>>
>>>>> Jeff, does this answer your question?
>>>>
>>>> I don't see how the issue here depends on the artificiality of the 
>>>> vptr; I'd expect to see the same problem with a data member.  The 
>>>> problem is that a D base subobject is smaller than a complete D 
>>>> object, and in this case the base subobject is allocated such that 
>>>> if it were a full D object, it would overlap the end of E.  And 
>>>> we're checking the MEM_REF as though accessing a full D object, so 
>>>> we get a warning.
>>>
>>> Thanks for chiming in!
>>>
>>> There is no warning for accesses to data members defined in
>>> the virtual bases because their offset isn't constant.  For example,
>>> given the following modified version of the test case from the patch:
>>>
>>> struct A
>>> {
>>>    virtual ~A ();
>>>    int a;
>>> };
>>>
>>> struct B: virtual A { int b; };
>>> struct C: virtual A { int c; };
>>> struct D: virtual B, virtual C
>>> {
>>>    int d;
>>> };
>>>
>>> D d;
>>>
>>> void g ()
>>> {
>>>    D *p = &d;
>>>    p->a = p->b = p->c = p->d = 1;
>>> }
>>>
>>> we end up with:
>>>
>>> void g ()
>>> {
>>>    ...
>>>    <bb 2> [local count: 1073741824]:
>>>    d.d = 1;                           <<< okay
>>>    _1 = d._vptr.D;                    <<< DECL_ARTIFICIAL
>>>    _2 = MEM[(long int *)_1 + -40B];   <<< not checked
>>>    _3 = (sizetype) _2;                <<< not constant...
>>>    _4 = &d + _3;
>>>    MEM[(struct C *)_4].c = 1;         <<< ...no warning
>>
>> Interesting.  I do get a warning if I embed the access in the destructor:
>>
>> struct A
>> {
>>    virtual ~A ();
>>    int a;
>> };
>>
>> struct B: virtual A { };
>> struct C: virtual A {
>>    int c;
>>    ~C() { c = 0; } // warns twice, for _vptr and c accesses
>> };
>> struct D: virtual B, virtual C
>> {
>>    D();
>> };
>>
>> void g()
>> {
>>    D d;
>> }
> 
> Ah, right.  Unlike in member functions, in a dtor (or in the ctors
> of the derived class) the offsets are known/constant so accesses
> involving those are checked and we do get a false positive.  Thanks
> for the test case!
> 
>>
>>>    _5 = MEM[(long int *)_1 + -24B];
>>>    _6 = (sizetype) _5;
>>>    _7 = &d + _6;
>>>    MEM[(struct B *)_7].b = 1;
>>>    _8 = MEM[(long int *)_1 + -32B];
>>>    _9 = (sizetype) _8;
>>>    _10 = &d + _9;
>>>    MEM[(struct A *)_10].a = 1;
>>>    return;
>>>
>>> }
>>>
>>>> The general issue about the confusion between complete and as-base 
>>>> types is PR 22488; I have various thoughts about a proper fix, but 
>>>> there have always been higher-priority things to work on.
>>>
>>> Thanks for the reference!
>>>
>>>> One possible approach to this PR is that we don't need to check an 
>>>> intermediate _REF; in
>>>>
>>>> MEM[(struct D *)&D.2651 + 24B]._vptr.D
>>>>
>>>> we aren't loading a whole D, we're only looking at the _vptr, which 
>>>> is within the bounds of E.
>>>
>>> This is also what the patch does: it keeps us from reaching
>>> the MEM_REF because _vptr.D is artificial.
>>
>> My point applies equally to the access to c in my testcase above: we 
>> shouldn't check lvalue refs as though we're loading or storing the 
>> whole aggregate object.
>>
>> The full access is (*((D*)&e))._vptr.D.  The MEM_REF represents the *, 
>> which we shouldn't check by itself, because it doesn't imply a memory 
>> access of its own.  We should only check the COMPONENT_REF around it 
>> that is an actual memory access.
> 
> The -Warray-bounds implementation checks every ARRAY_REF and MEM_REF,
> regardless of their context(*).

Aha.  That seems like a flaw, but as you say, not one that can be 
addressed easily.

> It doesn't check COMPONENT_REF except
> while checking the other REFs (while walking up the use-def chains),
> and the checks don't know the context of the REF.  The hack this patch
> put in was to circumvent that when the walk saw a COMPONENT_REF to
> an artificial member (which as you showed isn't enough).
> 
> I'm not sure if what you suggest (IIUC) is feasible without reworking
> how the checking works in general.

> What is feasible is working harder
> and detecting if a) the type of the accessed object has virtual bases
> and b) if the offset of the accessed field is within the bounds of
> the enclosing object.  In that case, skipping the MEM_REF check avoids
> the warnings for all the cases we have discussed (artificial members,
> ctors and dtors).  It still seems hacky but sufficiently narrow in
> scope for GCC 11(**).
> 
> Please see the attached revision.

> +  else if (TREE_CODE (t) == COMPONENT_REF
> +	   && TREE_CODE (TREE_OPERAND (t, 0)) == MEM_REF)
> +    {
> +      /* Hack: Skip MEM_REF checks in accesses to a member of a type
> +	 with virtual bases at an offset that's in bounds of the size
> +	 of the enclosing object.  See pr98266 and pr97595.  */
> +      tree ref = TREE_OPERAND (TREE_OPERAND (t, 0), 0);
> +      if (TREE_CODE (ref) == ADDR_EXPR)
> +	ref = TREE_OPERAND (ref, 0);
> +      tree reftype = TREE_TYPE (ref);
> +      if (DECL_P (ref)
> +	  && RECORD_OR_UNION_TYPE_P (reftype)
> +	  && TYPE_BINFO (reftype))
> +	{
> +	  tree fld = TREE_OPERAND (t, 1);
> +	  tree fldpos = byte_position (fld);
> +	  if (tree refsize = DECL_SIZE_UNIT (ref))
> +	    if (TREE_CODE (refsize) == INTEGER_CST
> +		&& tree_int_cst_lt (fldpos, refsize))
> +	      *walk_subtree = false;
> +	}
> +    }

Here fld F is a member of a base class B, and ref is a variable of a 
derived class D, so you're comparing the offset of F in B to the size of 
D without considering the offset of B in D.  I think you just need to 
add the MEM_REF offset to fldpos before comparing.

Jason


  reply	other threads:[~2021-03-02 14:11 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-20  0:56 Martin Sebor
2021-01-29 17:22 ` PING " Martin Sebor
2021-02-06 17:12   ` PING 2 " Martin Sebor
2021-02-08 19:59 ` Jeff Law
2021-02-08 21:56   ` Martin Sebor
2021-02-08 22:26     ` Jeff Law
2021-02-08 22:44       ` Martin Sebor
2021-02-08 23:11         ` Jeff Law
2021-02-09 17:49           ` Martin Sebor
2021-02-23 16:02             ` PING " Martin Sebor
2021-02-23 21:52               ` Jason Merrill
2021-02-23 23:07                 ` Martin Sebor
2021-02-25  0:35                   ` Jason Merrill
2021-03-01 23:11                     ` Martin Sebor
2021-03-02 14:11                       ` Jason Merrill [this message]
2021-03-03 23:20                         ` Martin Sebor
2021-03-04  5:33                           ` Jason Merrill
2021-03-04 17:44                             ` Martin Sebor
2021-03-04 20:00                               ` Jason Merrill
2021-02-25 23:47                 ` Jeff Law
2021-02-26  3:56                   ` Jason Merrill
2021-02-25 23:40         ` Jeff Law
2021-03-01 23:16           ` Martin Sebor
2021-02-25 23:15     ` Jeff Law

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7c667491-dcfc-da4c-40e7-bfd303d69e2b@redhat.com \
    --to=jason@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=law@redhat.com \
    --cc=msebor@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).