From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by sourceware.org (Postfix) with ESMTP id A5D7B3864858 for ; Wed, 7 Oct 2020 21:02:04 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org A5D7B3864858 Received: from mail-qk1-f197.google.com (mail-qk1-f197.google.com [209.85.222.197]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-473-RDu9-_VfO72RyYNsOoea6g-1; Wed, 07 Oct 2020 17:02:01 -0400 X-MC-Unique: RDu9-_VfO72RyYNsOoea6g-1 Received: by mail-qk1-f197.google.com with SMTP id w126so2264503qka.5 for ; Wed, 07 Oct 2020 14:02:01 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=nB3rGCa1zC5uT8bS5Es1Oexxi4WYGkjMVGo2fPjQTPw=; b=sdWgIM3c2OAObY7eHMGthe5XQ0IPaNec/DrZkoTBBBXnZZx+nPihAw8oh5SGOwi1gQ AtsHyTksbiyp7KrBE6+bVN1jaPsOczFjga+9YMC/JwucMaOReKHJKKynoaijQefS+0md 3/HGCp3w51STbvhASDYzwqhSUoDnPMHiVdcQegXuzGmAc1cXcl1u1wrpMd4sbYvkx0uU jxivdwZMPSCSbaxb1sEWGLpIf59JhWDoKNFbvnnOwZKDLjmXOuo3K6WPaLrtkUbmsWC3 hxEBpX7/W/QDONrhlu98Sgqu/GssxInMaANHIu2bz11ddNwo62KXJRQTX+5TA/G8UeYg sJ8A== X-Gm-Message-State: AOAM530VBc2sFdkmYaNqydUbrCGsLintzlDhf0xEv8V+lea4lODKp9kM Xzx0Rs63f5S1xqnIAkBHMtiRP8EE3w4clSw82AHoIeQ0ApnQzuPHr63wRbxOlYfHnfjrqyUxqkB Cb7c8wQ5Tp8KBlFZvfw== X-Received: by 2002:ac8:411c:: with SMTP id q28mr5392031qtl.254.1602104520631; Wed, 07 Oct 2020 14:02:00 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyDiFITRwRLsYVcxpj0j1Sjjb+pzIWYvUQbL/4HGu+oRonWciGAfe+5V4Qf1zSOSt85sQRhvA== X-Received: by 2002:ac8:411c:: with SMTP id q28mr5391991qtl.254.1602104520075; Wed, 07 Oct 2020 14:02:00 -0700 (PDT) Received: from [192.168.1.148] (209-6-216-142.s141.c3-0.smr-cbr1.sbo-smr.ma.cable.rcncustomer.com. [209.6.216.142]) by smtp.gmail.com with ESMTPSA id q7sm2433952qtd.49.2020.10.07.14.01.58 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 07 Oct 2020 14:01:59 -0700 (PDT) Subject: Re: [PING][PATCH] correct handling of indices into arrays with elements larger than 1 (PR c++/96511) To: Martin Sebor , gcc-patches References: <38c7996c-2c9a-c2c3-e18e-db19a2a805a0@gmail.com> <9c1a04cb-f01e-90c9-df38-6766a0bc9b70@gmail.com> <7c28029d-4f6a-3605-30df-7b9281f988d7@gmail.com> <0a0f49ad-66bc-b59b-6f33-c40a52b06277@redhat.com> <183d8ff6-21e8-6302-7524-f4f0fd6a4e77@gmail.com> <45f5e9f2-1855-91b5-0841-a6bcd342a09b@gmail.com> <8a6bf322-0262-5111-96f0-92e012a69556@redhat.com> <28c7421a-4fe8-ec6b-a288-001aff24df18@gmail.com> <647ab677-41be-2030-c734-bde03071057a@redhat.com> From: Jason Merrill Message-ID: Date: Wed, 7 Oct 2020 17:01:58 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.6.0 MIME-Version: 1.0 In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00, BODY_8BITS, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, KAM_SHORT, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 07 Oct 2020 21:02:07 -0000 On 10/7/20 4:11 PM, Martin Sebor wrote: > On 10/7/20 1:28 PM, Jason Merrill wrote: >> On 10/7/20 11:19 AM, Martin Sebor wrote: >>> On 10/7/20 9:07 AM, Jason Merrill wrote: >>>> On 10/7/20 10:42 AM, Martin Sebor wrote: >>>>> On 10/7/20 8:26 AM, Jason Merrill wrote: >>>>>> On 9/28/20 6:01 PM, Martin Sebor wrote: >>>>>>> On 9/25/20 11:17 PM, Jason Merrill wrote: >>>>>>>> On 9/22/20 4:05 PM, Martin Sebor wrote: >>>>>>>>> The rebased and retested patches are attached. >>>>>>>>> >>>>>>>>> On 9/21/20 3:17 PM, Martin Sebor wrote: >>>>>>>>>> Ping: >>>>>>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553906.html >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> (I'm working on rebasing the patch on top of the latest trunk >>>>>>>>>> which >>>>>>>>>> has changed some of the same code but it'd be helpful to get a >>>>>>>>>> go- >>>>>>>>>> ahead on substance the changes.  I don't expect the rebase to >>>>>>>>>> require any substantive modifications.) >>>>>>>>>> >>>>>>>>>> Martin >>>>>>>>>> >>>>>>>>>> On 9/14/20 4:01 PM, Martin Sebor wrote: >>>>>>>>>>> On 9/4/20 11:14 AM, Jason Merrill wrote: >>>>>>>>>>>> On 9/3/20 2:44 PM, Martin Sebor wrote: >>>>>>>>>>>>> On 9/1/20 1:22 PM, Jason Merrill wrote: >>>>>>>>>>>>>> On 8/11/20 12:19 PM, Martin Sebor via Gcc-patches wrote: >>>>>>>>>>>>>>> -Wplacement-new handles array indices and pointer offsets >>>>>>>>>>>>>>> the same: >>>>>>>>>>>>>>> by adjusting them by the size of the element.  That's >>>>>>>>>>>>>>> correct for >>>>>>>>>>>>>>> the latter but wrong for the former, causing false >>>>>>>>>>>>>>> positives when >>>>>>>>>>>>>>> the element size is greater than one. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> In addition, the warning doesn't even attempt to handle >>>>>>>>>>>>>>> arrays of >>>>>>>>>>>>>>> arrays.  I'm not sure if I forgot or if I simply didn't >>>>>>>>>>>>>>> think of >>>>>>>>>>>>>>> it. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The attached patch corrects these oversights by replacing >>>>>>>>>>>>>>> most >>>>>>>>>>>>>>> of the -Wplacement-new code with a call to >>>>>>>>>>>>>>> compute_objsize which >>>>>>>>>>>>>>> handles all this correctly (plus more), and is also >>>>>>>>>>>>>>> better tested. >>>>>>>>>>>>>>> But even compute_objsize has bugs: it trips up while >>>>>>>>>>>>>>> converting >>>>>>>>>>>>>>> wide_int to offset_int for some pointer offset ranges. >>>>>>>>>>>>>>> Since >>>>>>>>>>>>>>> handling the C++ IL required changes in this area the >>>>>>>>>>>>>>> patch also >>>>>>>>>>>>>>> fixes that. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> For review purposes, the patch affects just the middle end. >>>>>>>>>>>>>>> The C++ diff pretty much just removes code from the front >>>>>>>>>>>>>>> end. >>>>>>>>>>>>>> >>>>>>>>>>>>>> The C++ changes are OK. >>>>>>>>>>>>> >>>>>>>>>>>>> Thank you for looking at the rest as well. >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> -compute_objsize (tree ptr, int ostype, access_ref *pref, >>>>>>>>>>>>>>> -                bitmap *visited, const vr_values *rvals >>>>>>>>>>>>>>> /* = NULL */) >>>>>>>>>>>>>>> +compute_objsize (tree ptr, int ostype, access_ref *pref, >>>>>>>>>>>>>>> bitmap *visited, >>>>>>>>>>>>>>> +                const vr_values *rvals) >>>>>>>>>>>>>> >>>>>>>>>>>>>> This reformatting seems unnecessary, and I prefer to keep >>>>>>>>>>>>>> the comment about the default argument. >>>>>>>>>>>>> >>>>>>>>>>>>> This overload doesn't take a default argument.  (There was >>>>>>>>>>>>> a stray >>>>>>>>>>>>> declaration of a similar function at the top of the file >>>>>>>>>>>>> that had >>>>>>>>>>>>> one.  I've removed it.) >>>>>>>>>>>> >>>>>>>>>>>> Ah, true. >>>>>>>>>>>> >>>>>>>>>>>>>>> -      if (!size || TREE_CODE (size) != INTEGER_CST) >>>>>>>>>>>>>>> -       return false; >>>>>>>>>>>>>>  >... >>>>>>>>>>>>>> >>>>>>>>>>>>>> You change some failure cases in compute_objsize to return >>>>>>>>>>>>>> success with a maximum range, while others continue to >>>>>>>>>>>>>> return failure. This needs commentary about the design >>>>>>>>>>>>>> rationale. >>>>>>>>>>>>> >>>>>>>>>>>>> This is too much for a comment in the code but the >>>>>>>>>>>>> background is >>>>>>>>>>>>> this: compute_objsize initially returned the object size as >>>>>>>>>>>>> a constant. >>>>>>>>>>>>> Recently, I have enhanced it to return a range to improve >>>>>>>>>>>>> warnings for >>>>>>>>>>>>> allocated objects.  With that, a failure can be turned into >>>>>>>>>>>>> success by >>>>>>>>>>>>> having the function set the range to that of the largest >>>>>>>>>>>>> object. That >>>>>>>>>>>>> should simplify the function's callers and could even improve >>>>>>>>>>>>> the detection of some invalid accesses.  Once this change >>>>>>>>>>>>> is made >>>>>>>>>>>>> it might even be possible to change its return type to void. >>>>>>>>>>>>> >>>>>>>>>>>>> The change that caught your eye is necessary to make the >>>>>>>>>>>>> function >>>>>>>>>>>>> a drop-in replacement for the C++ front end code which >>>>>>>>>>>>> makes this >>>>>>>>>>>>> same assumption.  Without it, a number of test cases that >>>>>>>>>>>>> exercise >>>>>>>>>>>>> VLAs fail in g++.dg/warn/Wplacement-new-size-5.C.  For >>>>>>>>>>>>> example: >>>>>>>>>>>>> >>>>>>>>>>>>>    void f (int n) >>>>>>>>>>>>>    { >>>>>>>>>>>>>      char a[n]; >>>>>>>>>>>>>      new (a - 1) int (); >>>>>>>>>>>>>    } >>>>>>>>>>>>> >>>>>>>>>>>>> Changing any of the other places isn't necessary for >>>>>>>>>>>>> existing tests >>>>>>>>>>>>> to pass (and I didn't want to introduce too much churn). >>>>>>>>>>>>> But I do >>>>>>>>>>>>> want to change the rest of the function along the same >>>>>>>>>>>>> lines at some >>>>>>>>>>>>> point. >>>>>>>>>>>> >>>>>>>>>>>> Please do change the other places to be consistent; better >>>>>>>>>>>> to have more churn than to leave the function half-updated. >>>>>>>>>>>> That can be a separate patch if you prefer, but let's do it >>>>>>>>>>>> now rather than later. >>>>>>>>>>> >>>>>>>>>>> I've made most of these changes in the other patch (also >>>>>>>>>>> attached). >>>>>>>>>>> I'm quite happy with the result but it turned out to be a lot >>>>>>>>>>> more >>>>>>>>>>> work than either of us expected, mostly due to the amount of >>>>>>>>>>> testing. >>>>>>>>>>> >>>>>>>>>>> I've left a couple of failing cases in place mainly as reminders >>>>>>>>>>> to handle them better (which means I also didn't change the >>>>>>>>>>> caller >>>>>>>>>>> to avoid testing for failures).  I've also added TODO notes with >>>>>>>>>>> reminders to handle some of the new codes more completely. >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>>>> +  special_array_member sam{ }; >>>>>>>>>>>>>> >>>>>>>>>>>>>> sam is always set by component_ref_size, so I don't think >>>>>>>>>>>>>> it's necessary to initialize it at the declaration. >>>>>>>>>>>>> >>>>>>>>>>>>> I find initializing pass-by-pointer local variables helpful >>>>>>>>>>>>> but >>>>>>>>>>>>> I don't insist on it. >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> @@ -187,7 +187,7 @@ decl_init_size (tree decl, bool min) >>>>>>>>>>>>>>>    tree last_type = TREE_TYPE (last); >>>>>>>>>>>>>>>    if (TREE_CODE (last_type) != ARRAY_TYPE >>>>>>>>>>>>>>>        || TYPE_SIZE (last_type)) >>>>>>>>>>>>>>> -    return size; >>>>>>>>>>>>>>> +    return size ? size : TYPE_SIZE_UNIT (type); >>>>>>>>>>>>>> >>>>>>>>>>>>>> This change seems to violate the comment for the function. >>>>>>>>>>>>> >>>>>>>>>>>>> By my reading (and writing) the change is covered by the first >>>>>>>>>>>>> sentence: >>>>>>>>>>>>> >>>>>>>>>>>>>     Returns the size of the object designated by DECL >>>>>>>>>>>>> considering >>>>>>>>>>>>>     its initializer if it either has one or if it would not >>>>>>>>>>>>> affect >>>>>>>>>>>>>     its size, ... >>>>>>>>>>>> >>>>>>>>>>>> OK, I see it now. >>>>>>>>>>>> >>>>>>>>>>>>> It handles a number of cases in Wplacement-new-size.C fail >>>>>>>>>>>>> that >>>>>>>>>>>>> construct a larger object in an extern declaration of a >>>>>>>>>>>>> template, >>>>>>>>>>>>> like this: >>>>>>>>>>>>> >>>>>>>>>>>>>    template struct S { char c; }; >>>>>>>>>>>>>    extern S s; >>>>>>>>>>>>> >>>>>>>>>>>>>    void f () >>>>>>>>>>>>>    { >>>>>>>>>>>>>      new (&s) int (); >>>>>>>>>>>>>    } >>>>>>>>>>>>> >>>>>>>>>>>>> I don't know why DECL_SIZE isn't set here (I don't think it >>>>>>>>>>>>> can >>>>>>>>>>>>> be anything but equal to TYPE_SIZE, can it?) and other than >>>>>>>>>>>>> struct >>>>>>>>>>>>> objects with a flexible array member where this identity >>>>>>>>>>>>> doesn't >>>>>>>>>>>>> hold I can't think of others.  Am I missing something? >>>>>>>>>>>> >>>>>>>>>>>> Good question.  The attached patch should fix that, so you >>>>>>>>>>>> shouldn't need the change to decl_init_size: >>>>>>>>>>> >>>>>>>>>>> I've integrated it into the bug fix. >>>>>>>>>>> >>>>>>>>>>> Besides the usual x86_64-linux bootstrap/regtest I tested both >>>>>>>>>>> patches by building a few packages, including Binutils/GDB, >>>>>>>>>>> Glibc, >>>>>>>>>>> and  verifying no new warnings show up. >>>>>>>>>>> >>>>>>>>>>> Martin >>>>>>>> >>>>>>>>> +offset_int >>>>>>>>> +access_ref::size_remaining (offset_int *pmin /* = NULL */) const >>>>>>>> >>>>>>>> For the various member functions, please include the comments >>>>>>>> with the definition as well as the in-class declaration. >>>>>>> >>>>>>> Only one access_ref member function is defined out-of-line: >>>>>>> offset_bounded().  I've adjusted the comment and copied it above >>>>>>> the function definition. And size_remaining, as quoted above? I also don't see a comment above the definition of offset_bounded in the new patch? >>>>>>>>> +      if (offrng[1] < offrng[0]) >>>>>>>> >>>>>>>> What does it mean for the max offset to be less than the min >>>>>>>> offset? I wouldn't expect that to ever happen with wide integers. >>>>>>> >>>>>>> The offset is represented in sizetype with negative values >>>>>>> represented >>>>>>> as large positive values, but has to be converted to ptrdiff_t. >>>>>> >>>>>> It looks to me like the offset is offset_int, which is both signed >>>>>> and big enough to hold all values of sizetype without turning >>>>>> large positive values into negative values.  Where are these >>>>>> sign-switching conversions happening? >>>>> >>>>> In get_offset_range in builtins.c. >>>> >>>> Since we're converting to offset_int there, why not give the >>>> offset_int the real value rather than a bogus negative value? >>> >>> I don't understand the question: the real offset (in the program) >>> is negative when its sizetype representation is greater than >>> PTRDIFF_MAX.  It's worked this way for years. >> >> OK, then we're back to my original question: why is the max offset >> less than the min offset?  If the range includes negative values, why >> isn't the more-negative one the minimum? > > I showed a simple example where it happens: > >    extern char a[2]; > >    void f (unsigned long i) >    { >      if (i == 0) >        return; >      a[i] = 0;   // i's range is [1, -1] (i.e., [1, SIZE_MAX]) >    } > The "negative" offset can't be the minimum because it would include > zero which is the one value the range excludes.  It's effectively > an anti-range represented as an inverted range. So if the excluded value was 1, the range would be [2, 0]? OK, then the answer to my question is "it indicates an anti-range" and the sign wrapping was just a distraction. >>>>>>> These >>>>>>> cases come up when the unsigned offset is an ordinary range that >>>>>>> corresponds to an anti-range, such as here: >>>>>>> >>>>>>>    extern char a[2]; >>>>>>> >>>>>>>    void f (unsigned long i) >>>>>>>    { >>>>>>>      if (i == 0) >>>>>>>        return; >>>>>>>      a[i] = 0;   // i's range is [1, -1] (i.e., [1, SIZE_MAX] >>>>>>>    } >>>>>>> >>>>>>>> >>>>>>>>> +  /* Return true if OFFRNG is bounded to a subrange of >>>>>>>>> possible offset >>>>>>>>> +     values.  */ >>>>>>>>> +  bool offset_bounded () const; >>>>>>>> >>>>>>>> I don't understand how you're using this.  The implementation >>>>>>>> checks for the possible offset values falling outside those >>>>>>>> representable by ptrdiff_t, unless the range is only a single >>>>>>>> value.  And then the only use is >>>>>>>> >>>>>>>>> +  if (ref.offset_zero () || !ref.offset_bounded ()) >>>>>>>>> +    inform (DECL_SOURCE_LOCATION (ref.ref), >>>>>>>>> +        "%qD declared here", ref.ref); >>>>>>>>> +  else if (ref.offrng[0] == ref.offrng[1]) >>>>>>>>> +    inform (DECL_SOURCE_LOCATION (ref.ref), >>>>>>>>> +        "at offset %wi from %qD declared here", >>>>>>>>> +        ref.offrng[0].to_shwi (), ref.ref); >>>>>>>>> +  else >>>>>>>>> +    inform (DECL_SOURCE_LOCATION (ref.ref), >>>>>>>>> +        "at offset [%wi, %wi] from %qD declared here", >>>>>>>>> +        ref.offrng[0].to_shwi (), ref.offrng[1].to_shwi (), >>>>>>>>> ref.ref); >>>>>>>> >>>>>>>> So if the possible offsets are all representable by ptrdiff_t, >>>>>>>> we don't print the range?  The middle case also looks >>>>>>>> unreachable, since offset_bounded will return false in that case. >>>>>>> >>>>>>> The function was originally named "offset_unbounded."  I changed >>>>>>> it to "offset_bounded" but looks like I didn't finish the job or >>>>>>> add any tests for it. >>>>>>> >>>>>>> The goal of conditionals is to avoid overwhelming the user with >>>>>>> excessive numbers that may not be meaningful or even relevant >>>>>>> to the warning.  I've corrected the function body, tweaked and >>>>>>> renamed the get_range function to get_offset_range to do a better >>>>>>> job of extracting ranges from the types of some nonconstant >>>>>>> expressions the front end passes it, and added a new test for >>>>>>> all this.  Attached is the new revision. offset_bounded looks unchanged in the new patch. It still returns true iff either the range is a single value or one of the bounds are unrepresentable in ptrdiff_t. I'm still unclear how this corresponds to "Return true if OFFRNG is bounded to a subrange of possible offset values." It still looks like the middle case is unreachable: if offrng[0] == offrng[1], offset_bounded returns false, so we don't get as far as testing it for the middle case. Jason