From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [63.128.21.124]) by sourceware.org (Postfix) with ESMTP id A3E563854830 for ; Thu, 4 Mar 2021 05:34:10 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org A3E563854830 Received: from mail-qv1-f72.google.com (mail-qv1-f72.google.com [209.85.219.72]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-478-Ty3t_jFWN1qilsF3G-PUsA-1; Thu, 04 Mar 2021 00:34:08 -0500 X-MC-Unique: Ty3t_jFWN1qilsF3G-PUsA-1 Received: by mail-qv1-f72.google.com with SMTP id q104so19585049qvq.20 for ; Wed, 03 Mar 2021 21:34:08 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=v/xrnEpBRDJ8GQkeM0PPi7vLRZljU3BFC9Z6cRH2mF4=; b=NcgckRkFmW+JCUVkOHa6+X+C8udiODPmfMmPWTqDZnuh/qACo1sHy5seaXyGwL5gxa inoZ4P4TvMqYBKdNWJqWVihOov/tSS2vSc9Z5yAc4n1EaMMiPBe9pCRwZt251GKQIrG4 FWf1aN3S7uDKuLhfhF59RR0AecNCkaukHL7afPE6mf19esYIj49cYIv4hKm3EWD4RcAb wPwMHttkD/RWxNpIoHHvcxmLPdeaZd7ca9/RD/dKgaZXtUC2WO5rS9hgoVBvQ14E5swz b1IEbezrz9O9L+DLVqRa/f+uul7hcO8i6C6Ni7XJA1iko4umlnztBZq2u3t0Ow4u4J3y 9BCA== X-Gm-Message-State: AOAM531CF98O4V1iVkIkPdwUqFtKtoO0JPC3Cd8W27xCIFvAMM08MgL4 p89PdsVrsTs7hiESEZLVT8Hq4v7Ji+XljTm/XTUNRkTTohujhVCf42w4upMdfuMdfdqebM+zMIk uTocfx7Y9YwzRNd+Ttl7ciNHnNbEV0VSr0e7AVH0ZnDBvwBLAxWkCp++RQ/FM/v21Jw== X-Received: by 2002:ac8:7501:: with SMTP id u1mr2768154qtq.330.1614836047312; Wed, 03 Mar 2021 21:34:07 -0800 (PST) X-Google-Smtp-Source: ABdhPJy5W6ebuPCWhhkpLoN3hDmpXn+lSJ/GyMw4uBgI1g/OUhfrs5sOXE8ocYxl3f1k4/bVN5QVvA== X-Received: by 2002:ac8:7501:: with SMTP id u1mr2768138qtq.330.1614836046770; Wed, 03 Mar 2021 21:34:06 -0800 (PST) Received: from [192.168.1.148] (209-6-216-142.s141.c3-0.smr-cbr1.sbo-smr.ma.cable.rcncustomer.com. [209.6.216.142]) by smtp.gmail.com with ESMTPSA id u1sm16881806qkd.13.2021.03.03.21.34.00 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 03 Mar 2021 21:34:05 -0800 (PST) Subject: Re: PING [PATCH] avoid -Warray-bounds checks for vtable assignments (PR 98266) To: Martin Sebor , Jeff Law , gcc-patches References: <7fa3c1e0-8a09-3436-8669-6d1a577ca247@gmail.com> <55235bab-1a5a-4e3b-3a57-55d9a84ae00c@redhat.com> <042e2b8f-3031-519d-4a2f-1cb33e2004a7@gmail.com> <354bf06f-ad62-7966-a5d5-e4931ed1d5ed@redhat.com> <355d1e90-bba2-ad97-3544-c8fda473cfa0@gmail.com> <571fbfd1-d0f1-c1ca-1eef-4e804d7f8088@redhat.com> <8067ec57-5bb5-87a3-6a72-7ab43004248c@gmail.com> <3ced50a4-06b9-f83b-da73-9c4842944714@redhat.com> <87a14261-2595-11f7-61b5-f3ce30132dd4@gmail.com> <7c667491-dcfc-da4c-40e7-bfd303d69e2b@redhat.com> From: Jason Merrill Message-ID: <508a5907-0645-1871-16f5-85c93c50c726@redhat.com> Date: Thu, 4 Mar 2021 00:33:55 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.6.1 MIME-Version: 1.0 In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-8.3 required=5.0 tests=BAYES_00, BODY_8BITS, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, NICE_REPLY_A, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Mar 2021 05:34:13 -0000 On 3/3/21 6:20 PM, Martin Sebor wrote: > On 3/2/21 7:11 AM, Jason Merrill wrote: >> On 3/1/21 6:11 PM, Martin Sebor wrote: >>> On 2/24/21 5:35 PM, Jason Merrill wrote: >>>> On 2/23/21 6:07 PM, Martin Sebor wrote: >>>>> On 2/23/21 2:52 PM, Jason Merrill wrote: >>>>>> On 2/23/21 11:02 AM, Martin Sebor wrote: >>>>>>> [CC Jason for any further comments/clarification] >>>>>>> >>>>>>> On 2/9/21 10:49 AM, Martin Sebor wrote: >>>>>>>> On 2/8/21 4:11 PM, Jeff Law wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> On 2/8/21 3:44 PM, Martin Sebor wrote: >>>>>>>>>> On 2/8/21 3:26 PM, Jeff Law wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 2/8/21 2:56 PM, Martin Sebor wrote: >>>>>>>>>>>> On 2/8/21 12:59 PM, Jeff Law wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 1/19/21 5:56 PM, Martin Sebor via Gcc-patches wrote: >>>>>>>>>>>>>> Similar to the problem reported for -Wstringop-overflow in >>>>>>>>>>>>>> pr98266 >>>>>>>>>>>>>> and already fixed, -Warray-bounds is also susceptible to >>>>>>>>>>>>>> false >>>>>>>>>>>>>> positives in assignments and copies involving virtual >>>>>>>>>>>>>> inheritance. >>>>>>>>>>>>>> Because the two warnings don't share code yet (hopefully >>>>>>>>>>>>>> in GCC 12) >>>>>>>>>>>>>> the attached patch adds its own workaround for this >>>>>>>>>>>>>> problem to >>>>>>>>>>>>>> gimple-array-bounds.cc, this one slightly more crude >>>>>>>>>>>>>> because of >>>>>>>>>>>>>> the limited insight the array bounds checking has into the >>>>>>>>>>>>>> checked >>>>>>>>>>>>>> expressions. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Tested on x86_64-linux. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Martin >>>>>>>>>>>>>> >>>>>>>>>>>>>> gcc-98266.diff >>>>>>>>>>>>>> >>>>>>>>>>>>>> PR middle-end/98266 - bogus array subscript is partly >>>>>>>>>>>>>> outside array >>>>>>>>>>>>>> bounds on virtual inheritance >>>>>>>>>>>>>> >>>>>>>>>>>>>> gcc/ChangeLog: >>>>>>>>>>>>>> >>>>>>>>>>>>>>       PR middle-end/98266 >>>>>>>>>>>>>>       * gimple-array-bounds.cc >>>>>>>>>>>>>> (array_bounds_checker::check_array_bounds): >>>>>>>>>>>>>>       Avoid checking references involving artificial members. >>>>>>>>>>>>>> >>>>>>>>>>>>>> gcc/testsuite/ChangeLog: >>>>>>>>>>>>>> >>>>>>>>>>>>>>       PR middle-end/98266 >>>>>>>>>>>>>>       * g++.dg/warn/Warray-bounds-15.C: New test. >>>>>>>>>>>>> It seems to me that we've got the full statement at some >>>>>>>>>>>>> point and >>>>>>>>>>>>> thus >>>>>>>>>>>>> the full expression so at some point couldn't we detect when >>>>>>>>>>>>> TYPE_SIZE_UNIT!= DECL_SIZE_UNIT?  Or should we be using >>>>>>>>>>>>> TYPE_SIZE_UNIT >>>>>>>>>>>>> rather than DECL_SIZE_UNIT in gimple-array-bounds.cc >>>>>>>>>>>>> >>>>>>>>>>>>> Am I missing something? >>>>>>>>>>>> >>>>>>>>>>>> The expression we're looking at when the false positive is >>>>>>>>>>>> issued >>>>>>>>>>>> is the MEM_REF in the LHS of: >>>>>>>>>>>> >>>>>>>>>>>> MEM[(struct D *)&D.2652 + 24B]._vptr.D = &MEM >>>>>>>>>>>> [(void >>>>>>>>>>>> *)&_ZTC1E24_1D + 24B]; >>>>>>>>>>>> >>>>>>>>>>>> TREE_TYPE(LHS) is D, DECL_SIZE_UNIT (D.2652) is 24, and >>>>>>>>>>>> TYPE_SIZE_UNIT(D) is also 24, so there's no discrepancy between >>>>>>>>>>>> DECL_SIZE and TYPE_SIZE. >>>>>>>>>>> So that seems like it's a different issue then, unrelated to >>>>>>>>>>> 97595. >>>>>>>>>>> Right? >>>>>>>>>> >>>>>>>>>> I think the underlying problem is the same.  We're getting a size >>>>>>>>>> that doesn't correspond to what's actually being accessed, and it >>>>>>>>>> happens because of the virtual inheritance.  In pr97595 Jason >>>>>>>>>> suggested to use the decl/type size inequality to identify this >>>>>>>>>> case but I think we could have just as well used DECL_ARTIFICIAL >>>>>>>>>> instead.  At least the test cases from pr97595 both pass with >>>>>>>>>> this change. >>>>>>>>> But in the 98266 case the type and decl sizes are the same.  So >>>>>>>>> to be >>>>>>>>> true that would mean that the underlying type we're using to >>>>>>>>> access >>>>>>>>> memory differs from its actual type.  Is that the case in the >>>>>>>>> IL? And >>>>>>>>> does this have wider implications for diagnostics or >>>>>>>>> optimizations that >>>>>>>>> rely on accurate type sizing? >>>>>>>>> >>>>>>>>> I'm just trying to make sure I understand, not accepting or >>>>>>>>> rejecting >>>>>>>>> the patch yet. >>>>>>>> >>>>>>>> The part of the IL with the MEM_REF is this: >>>>>>>> >>>>>>>> void g () >>>>>>>> { >>>>>>>>    void * D.2789; >>>>>>>>    struct E D.2652; >>>>>>>> >>>>>>>>    [local count: 1073741824]: >>>>>>>>    E::E (&D.2652, ""); >>>>>>>>    f (&D.2652); >>>>>>>> >>>>>>>>    [local count: 1073741824]: >>>>>>>>    MEM[(struct D *)&D.2652 + 24B]._vptr.D = &MEM >>>>>>>> [(void *)&_ZTC1E24_1D + 24B]; >>>>>>>>    ... >>>>>>>> >>>>>>>> The access here is to the _vptr.D pointer member of D.2652 which is >>>>>>>> just past the end of the parent object (as reflected by its SIZE): >>>>>>>> it sets sets up the virtual table pointer. >>>>>>>> >>>>>>>> The access in pr97595 is to the member subobject, which, as Jason >>>>>>>> explained (and I accordingly documented under DECL_SIZE in tree.h), >>>>>>>> is also laid out separately from the parent object. >>>>>>>> >>>>>>>> These cases aren't exactly the same (which is also why the test >>>>>>>> I added for -Warray-bounds in pr97595 didn't expose this bug) but >>>>>>>> they are closely related.  The one here can be distinguished by >>>>>>>> DECL_ARTIFICAL.  The other by the DECL_SIZE != TYPE_SIZE member >>>>>>>> inequality. >>>>>>>> >>>>>>>> Might this impact other warnings?  I'd say so if they don't take >>>>>>>> these things into account.  I just learned about this in pr97595 >>>>>>>> which was a -Wstringop-overflow false positive but I also saw >>>>>>>> a similar instance of -Warray-bounds with my patch to improve >>>>>>>> caching and enhance array bounds checking.  I dealt with that >>>>>>>> instance of the warning in that patch but proactively added >>>>>>>> a test case to the fix for pr97595.  But the test case is focused >>>>>>>> on the subobject access and not on one to the virtual table so >>>>>>>> (as I said above) it didn't expose this bug. >>>>>>>> >>>>>>>> Might this also impact optimizations?  I can imagine someone >>>>>>>> unaware of this "gotcha" making the same "naive" assumption >>>>>>>> I did, but I'd also expect such an invalid assumption to be >>>>>>>> found either in code review or quickly cause problems in >>>>>>>> testing. >>>>>>> >>>>>>> Jeff, does this answer your question? >>>>>> >>>>>> I don't see how the issue here depends on the artificiality of the >>>>>> vptr; I'd expect to see the same problem with a data member.  The >>>>>> problem is that a D base subobject is smaller than a complete D >>>>>> object, and in this case the base subobject is allocated such that >>>>>> if it were a full D object, it would overlap the end of E.  And >>>>>> we're checking the MEM_REF as though accessing a full D object, so >>>>>> we get a warning. >>>>> >>>>> Thanks for chiming in! >>>>> >>>>> There is no warning for accesses to data members defined in >>>>> the virtual bases because their offset isn't constant.  For example, >>>>> given the following modified version of the test case from the patch: >>>>> >>>>> struct A >>>>> { >>>>>    virtual ~A (); >>>>>    int a; >>>>> }; >>>>> >>>>> struct B: virtual A { int b; }; >>>>> struct C: virtual A { int c; }; >>>>> struct D: virtual B, virtual C >>>>> { >>>>>    int d; >>>>> }; >>>>> >>>>> D d; >>>>> >>>>> void g () >>>>> { >>>>>    D *p = &d; >>>>>    p->a = p->b = p->c = p->d = 1; >>>>> } >>>>> >>>>> we end up with: >>>>> >>>>> void g () >>>>> { >>>>>    ... >>>>>    [local count: 1073741824]: >>>>>    d.d = 1;                           <<< okay >>>>>    _1 = d._vptr.D;                    <<< DECL_ARTIFICIAL >>>>>    _2 = MEM[(long int *)_1 + -40B];   <<< not checked >>>>>    _3 = (sizetype) _2;                <<< not constant... >>>>>    _4 = &d + _3; >>>>>    MEM[(struct C *)_4].c = 1;         <<< ...no warning >>>> >>>> Interesting.  I do get a warning if I embed the access in the >>>> destructor: >>>> >>>> struct A >>>> { >>>>    virtual ~A (); >>>>    int a; >>>> }; >>>> >>>> struct B: virtual A { }; >>>> struct C: virtual A { >>>>    int c; >>>>    ~C() { c = 0; } // warns twice, for _vptr and c accesses >>>> }; >>>> struct D: virtual B, virtual C >>>> { >>>>    D(); >>>> }; >>>> >>>> void g() >>>> { >>>>    D d; >>>> } >>> >>> Ah, right.  Unlike in member functions, in a dtor (or in the ctors >>> of the derived class) the offsets are known/constant so accesses >>> involving those are checked and we do get a false positive.  Thanks >>> for the test case! >>> >>>> >>>>>    _5 = MEM[(long int *)_1 + -24B]; >>>>>    _6 = (sizetype) _5; >>>>>    _7 = &d + _6; >>>>>    MEM[(struct B *)_7].b = 1; >>>>>    _8 = MEM[(long int *)_1 + -32B]; >>>>>    _9 = (sizetype) _8; >>>>>    _10 = &d + _9; >>>>>    MEM[(struct A *)_10].a = 1; >>>>>    return; >>>>> >>>>> } >>>>> >>>>>> The general issue about the confusion between complete and as-base >>>>>> types is PR 22488; I have various thoughts about a proper fix, but >>>>>> there have always been higher-priority things to work on. >>>>> >>>>> Thanks for the reference! >>>>> >>>>>> One possible approach to this PR is that we don't need to check an >>>>>> intermediate _REF; in >>>>>> >>>>>> MEM[(struct D *)&D.2651 + 24B]._vptr.D >>>>>> >>>>>> we aren't loading a whole D, we're only looking at the _vptr, >>>>>> which is within the bounds of E. >>>>> >>>>> This is also what the patch does: it keeps us from reaching >>>>> the MEM_REF because _vptr.D is artificial. >>>> >>>> My point applies equally to the access to c in my testcase above: we >>>> shouldn't check lvalue refs as though we're loading or storing the >>>> whole aggregate object. >>>> >>>> The full access is (*((D*)&e))._vptr.D.  The MEM_REF represents the >>>> *, which we shouldn't check by itself, because it doesn't imply a >>>> memory access of its own.  We should only check the COMPONENT_REF >>>> around it that is an actual memory access. >>> >>> The -Warray-bounds implementation checks every ARRAY_REF and MEM_REF, >>> regardless of their context(*). >> >> Aha.  That seems like a flaw, but as you say, not one that can be >> addressed easily. >> >>> It doesn't check COMPONENT_REF except >>> while checking the other REFs (while walking up the use-def chains), >>> and the checks don't know the context of the REF.  The hack this patch >>> put in was to circumvent that when the walk saw a COMPONENT_REF to >>> an artificial member (which as you showed isn't enough). >>> >>> I'm not sure if what you suggest (IIUC) is feasible without reworking >>> how the checking works in general. >> >>> What is feasible is working harder >>> and detecting if a) the type of the accessed object has virtual bases >>> and b) if the offset of the accessed field is within the bounds of >>> the enclosing object.  In that case, skipping the MEM_REF check avoids >>> the warnings for all the cases we have discussed (artificial members, >>> ctors and dtors).  It still seems hacky but sufficiently narrow in >>> scope for GCC 11(**). >>> >>> Please see the attached revision. >> >>> +  else if (TREE_CODE (t) == COMPONENT_REF >>> +       && TREE_CODE (TREE_OPERAND (t, 0)) == MEM_REF) >>> +    { >>> +      /* Hack: Skip MEM_REF checks in accesses to a member of a type >>> +     with virtual bases at an offset that's in bounds of the size >>> +     of the enclosing object.  See pr98266 and pr97595.  */ >>> +      tree ref = TREE_OPERAND (TREE_OPERAND (t, 0), 0); >>> +      if (TREE_CODE (ref) == ADDR_EXPR) >>> +    ref = TREE_OPERAND (ref, 0); >>> +      tree reftype = TREE_TYPE (ref); >>> +      if (DECL_P (ref) >>> +      && RECORD_OR_UNION_TYPE_P (reftype) >>> +      && TYPE_BINFO (reftype)) >>> +    { >>> +      tree fld = TREE_OPERAND (t, 1); >>> +      tree fldpos = byte_position (fld); >>> +      if (tree refsize = DECL_SIZE_UNIT (ref)) >>> +        if (TREE_CODE (refsize) == INTEGER_CST >>> +        && tree_int_cst_lt (fldpos, refsize)) >>> +          *walk_subtree = false; >>> +    } >>> +    } >> >> Here fld F is a member of a base class B, and ref is a variable of a >> derived class D, so you're comparing the offset of F in B to the size >> of D without considering the offset of B in D.  I think you just need >> to add the MEM_REF offset to fldpos before comparing. > > I see what you mean, thanks, but I can't think of a test case to > trigger a false negative due to the smaller offset.  Any suggestions? My only ideas involve undefined behavior, casting the address to a pointer to an unrelated too-large class. I don't know if that would show up as a MEM_REF of this pattern. > With more testing I also realized that focusing solely on an underlying > DECL like in the above doesn't prevent the warning when an object is > created in dynamically allocated memory or in a backing buffer.  So > the attached revision both adjusts the offset computation upward and > handles all kinds of backing store and moves the logic into a helper > function for better readability.  I've also added more tests. > > Retested on x86_64-linux. > > Thanks again for your help! > > Martin > > PS The TYPE_BINFO test isn't as quite as restrictive as I had hoped. > It means we consider all derived class, not just those with virtual > bases. It's even less restrictive than that: all C++ classes have TYPE_BINFO. > I tried also requiring BINFO_VIRTUAL_P() to be true but that > doesn't work. Right, BINFO_VIRTUAL_P is true for the binfo representing the virtual base in the inheritance hierarchy, not on the binfo for the derived class. > Using BINFO_VTABLE() does work but it still isn't the same. Indeed, virtual functions and virtual bases both cause a class to have a vtable. You could also check BINFO_N_BASE_BINFOS. Or not bother trying to restrict this to classes with virtual bases (i.e. leave the patch as it is), since the result is just as correct for other classes. Jason