From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ot1-x341.google.com (mail-ot1-x341.google.com [IPv6:2607:f8b0:4864:20::341]) by sourceware.org (Postfix) with ESMTPS id C75073857C74 for ; Wed, 7 Oct 2020 15:19:32 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org C75073857C74 Received: by mail-ot1-x341.google.com with SMTP id t15so2593893otk.0 for ; Wed, 07 Oct 2020 08:19:32 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=TkemL5Ha3v2hCPc7lHy8VuyOOcaLMe8rGyN61Z1w7bM=; b=H0CdWMc6f3vJh/rDai2eBF+qGJe0vWFRLYAiZfGBrJZ9Yu+M8B7Etaka0iwRjtiN6/ oPnmeLsHmw0LEDxz3VGn3hbvQAOT+hiJpxKqVw5jE+8xaeUHnAtg1z9Yo5c4gfRwdqVZ C7Dsa+PqkcKOlIaYw1cQ/83/ocWduw1cyyu3/GiqXpaa+KCeOtNK4eTN/5sSS8205xhz 9IqRpHnf5cvSTJcLO7DhUvGNuAuKOIklt3kmKyBXGNqJzFpRv9aYAL/Sy+pYkt+IvCMw KYSEaH0V7tFCMlKb3dE5lsAO0tA3LWHr1Sg6u2oZX2M+4uZidrONTHWLEOjEbPemRtNC 3K/A== X-Gm-Message-State: AOAM531NfH8s0aOGHk+KIsvTBb839EEV7wUSc9Jcc/Q5qp4TfJSEUmPp OY9ONUuwYQbeC9tDcJ2H8AhPyY96TTk= X-Google-Smtp-Source: ABdhPJwISbphTiPo62DlHY3Z1aAxZQJ4ny4r8nHkXSkpclI1WWhoeOTMdCGrCmn0uAK/jMu0Cc9NUg== X-Received: by 2002:a9d:6d95:: with SMTP id x21mr2232822otp.339.1602083970561; Wed, 07 Oct 2020 08:19:30 -0700 (PDT) Received: from [192.168.0.41] (97-118-127-189.hlrn.qwest.net. [97.118.127.189]) by smtp.gmail.com with ESMTPSA id x13sm2494482oot.24.2020.10.07.08.19.29 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 07 Oct 2020 08:19:29 -0700 (PDT) Subject: Re: [PING][PATCH] correct handling of indices into arrays with elements larger than 1 (PR c++/96511) To: Jason Merrill , gcc-patches References: <38c7996c-2c9a-c2c3-e18e-db19a2a805a0@gmail.com> <9c1a04cb-f01e-90c9-df38-6766a0bc9b70@gmail.com> <7c28029d-4f6a-3605-30df-7b9281f988d7@gmail.com> <0a0f49ad-66bc-b59b-6f33-c40a52b06277@redhat.com> <183d8ff6-21e8-6302-7524-f4f0fd6a4e77@gmail.com> <45f5e9f2-1855-91b5-0841-a6bcd342a09b@gmail.com> <8a6bf322-0262-5111-96f0-92e012a69556@redhat.com> From: Martin Sebor Message-ID: <28c7421a-4fe8-ec6b-a288-001aff24df18@gmail.com> Date: Wed, 7 Oct 2020 09:19:29 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.2 MIME-Version: 1.0 In-Reply-To: <8a6bf322-0262-5111-96f0-92e012a69556@redhat.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.8 required=5.0 tests=BAYES_00, BODY_8BITS, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, KAM_SHORT, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 07 Oct 2020 15:19:35 -0000 On 10/7/20 9:07 AM, Jason Merrill wrote: > On 10/7/20 10:42 AM, Martin Sebor wrote: >> On 10/7/20 8:26 AM, Jason Merrill wrote: >>> On 9/28/20 6:01 PM, Martin Sebor wrote: >>>> On 9/25/20 11:17 PM, Jason Merrill wrote: >>>>> On 9/22/20 4:05 PM, Martin Sebor wrote: >>>>>> The rebased and retested patches are attached. >>>>>> >>>>>> On 9/21/20 3:17 PM, Martin Sebor wrote: >>>>>>> Ping: >>>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553906.html >>>>>>> >>>>>>> (I'm working on rebasing the patch on top of the latest trunk which >>>>>>> has changed some of the same code but it'd be helpful to get a go- >>>>>>> ahead on substance the changes.  I don't expect the rebase to >>>>>>> require any substantive modifications.) >>>>>>> >>>>>>> Martin >>>>>>> >>>>>>> On 9/14/20 4:01 PM, Martin Sebor wrote: >>>>>>>> On 9/4/20 11:14 AM, Jason Merrill wrote: >>>>>>>>> On 9/3/20 2:44 PM, Martin Sebor wrote: >>>>>>>>>> On 9/1/20 1:22 PM, Jason Merrill wrote: >>>>>>>>>>> On 8/11/20 12:19 PM, Martin Sebor via Gcc-patches wrote: >>>>>>>>>>>> -Wplacement-new handles array indices and pointer offsets >>>>>>>>>>>> the same: >>>>>>>>>>>> by adjusting them by the size of the element.  That's >>>>>>>>>>>> correct for >>>>>>>>>>>> the latter but wrong for the former, causing false positives >>>>>>>>>>>> when >>>>>>>>>>>> the element size is greater than one. >>>>>>>>>>>> >>>>>>>>>>>> In addition, the warning doesn't even attempt to handle >>>>>>>>>>>> arrays of >>>>>>>>>>>> arrays.  I'm not sure if I forgot or if I simply didn't >>>>>>>>>>>> think of >>>>>>>>>>>> it. >>>>>>>>>>>> >>>>>>>>>>>> The attached patch corrects these oversights by replacing most >>>>>>>>>>>> of the -Wplacement-new code with a call to compute_objsize >>>>>>>>>>>> which >>>>>>>>>>>> handles all this correctly (plus more), and is also better >>>>>>>>>>>> tested. >>>>>>>>>>>> But even compute_objsize has bugs: it trips up while converting >>>>>>>>>>>> wide_int to offset_int for some pointer offset ranges.  Since >>>>>>>>>>>> handling the C++ IL required changes in this area the patch >>>>>>>>>>>> also >>>>>>>>>>>> fixes that. >>>>>>>>>>>> >>>>>>>>>>>> For review purposes, the patch affects just the middle end. >>>>>>>>>>>> The C++ diff pretty much just removes code from the front end. >>>>>>>>>>> >>>>>>>>>>> The C++ changes are OK. >>>>>>>>>> >>>>>>>>>> Thank you for looking at the rest as well. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> -compute_objsize (tree ptr, int ostype, access_ref *pref, >>>>>>>>>>>> -                bitmap *visited, const vr_values *rvals /* >>>>>>>>>>>> = NULL */) >>>>>>>>>>>> +compute_objsize (tree ptr, int ostype, access_ref *pref, >>>>>>>>>>>> bitmap *visited, >>>>>>>>>>>> +                const vr_values *rvals) >>>>>>>>>>> >>>>>>>>>>> This reformatting seems unnecessary, and I prefer to keep the >>>>>>>>>>> comment about the default argument. >>>>>>>>>> >>>>>>>>>> This overload doesn't take a default argument.  (There was a >>>>>>>>>> stray >>>>>>>>>> declaration of a similar function at the top of the file that had >>>>>>>>>> one.  I've removed it.) >>>>>>>>> >>>>>>>>> Ah, true. >>>>>>>>> >>>>>>>>>>>> -      if (!size || TREE_CODE (size) != INTEGER_CST) >>>>>>>>>>>> -       return false; >>>>>>>>>>>  >... >>>>>>>>>>> >>>>>>>>>>> You change some failure cases in compute_objsize to return >>>>>>>>>>> success with a maximum range, while others continue to return >>>>>>>>>>> failure. This needs commentary about the design rationale. >>>>>>>>>> >>>>>>>>>> This is too much for a comment in the code but the background is >>>>>>>>>> this: compute_objsize initially returned the object size as a >>>>>>>>>> constant. >>>>>>>>>> Recently, I have enhanced it to return a range to improve >>>>>>>>>> warnings for >>>>>>>>>> allocated objects.  With that, a failure can be turned into >>>>>>>>>> success by >>>>>>>>>> having the function set the range to that of the largest >>>>>>>>>> object. That >>>>>>>>>> should simplify the function's callers and could even improve >>>>>>>>>> the detection of some invalid accesses.  Once this change is made >>>>>>>>>> it might even be possible to change its return type to void. >>>>>>>>>> >>>>>>>>>> The change that caught your eye is necessary to make the function >>>>>>>>>> a drop-in replacement for the C++ front end code which makes this >>>>>>>>>> same assumption.  Without it, a number of test cases that >>>>>>>>>> exercise >>>>>>>>>> VLAs fail in g++.dg/warn/Wplacement-new-size-5.C.  For example: >>>>>>>>>> >>>>>>>>>>    void f (int n) >>>>>>>>>>    { >>>>>>>>>>      char a[n]; >>>>>>>>>>      new (a - 1) int (); >>>>>>>>>>    } >>>>>>>>>> >>>>>>>>>> Changing any of the other places isn't necessary for existing >>>>>>>>>> tests >>>>>>>>>> to pass (and I didn't want to introduce too much churn).  But >>>>>>>>>> I do >>>>>>>>>> want to change the rest of the function along the same lines >>>>>>>>>> at some >>>>>>>>>> point. >>>>>>>>> >>>>>>>>> Please do change the other places to be consistent; better to >>>>>>>>> have more churn than to leave the function half-updated.  That >>>>>>>>> can be a separate patch if you prefer, but let's do it now >>>>>>>>> rather than later. >>>>>>>> >>>>>>>> I've made most of these changes in the other patch (also attached). >>>>>>>> I'm quite happy with the result but it turned out to be a lot more >>>>>>>> work than either of us expected, mostly due to the amount of >>>>>>>> testing. >>>>>>>> >>>>>>>> I've left a couple of failing cases in place mainly as reminders >>>>>>>> to handle them better (which means I also didn't change the caller >>>>>>>> to avoid testing for failures).  I've also added TODO notes with >>>>>>>> reminders to handle some of the new codes more completely. >>>>>>>> >>>>>>>>> >>>>>>>>>>>> +  special_array_member sam{ }; >>>>>>>>>>> >>>>>>>>>>> sam is always set by component_ref_size, so I don't think >>>>>>>>>>> it's necessary to initialize it at the declaration. >>>>>>>>>> >>>>>>>>>> I find initializing pass-by-pointer local variables helpful but >>>>>>>>>> I don't insist on it. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> @@ -187,7 +187,7 @@ decl_init_size (tree decl, bool min) >>>>>>>>>>>>    tree last_type = TREE_TYPE (last); >>>>>>>>>>>>    if (TREE_CODE (last_type) != ARRAY_TYPE >>>>>>>>>>>>        || TYPE_SIZE (last_type)) >>>>>>>>>>>> -    return size; >>>>>>>>>>>> +    return size ? size : TYPE_SIZE_UNIT (type); >>>>>>>>>>> >>>>>>>>>>> This change seems to violate the comment for the function. >>>>>>>>>> >>>>>>>>>> By my reading (and writing) the change is covered by the first >>>>>>>>>> sentence: >>>>>>>>>> >>>>>>>>>>     Returns the size of the object designated by DECL considering >>>>>>>>>>     its initializer if it either has one or if it would not >>>>>>>>>> affect >>>>>>>>>>     its size, ... >>>>>>>>> >>>>>>>>> OK, I see it now. >>>>>>>>> >>>>>>>>>> It handles a number of cases in Wplacement-new-size.C fail that >>>>>>>>>> construct a larger object in an extern declaration of a template, >>>>>>>>>> like this: >>>>>>>>>> >>>>>>>>>>    template struct S { char c; }; >>>>>>>>>>    extern S s; >>>>>>>>>> >>>>>>>>>>    void f () >>>>>>>>>>    { >>>>>>>>>>      new (&s) int (); >>>>>>>>>>    } >>>>>>>>>> >>>>>>>>>> I don't know why DECL_SIZE isn't set here (I don't think it can >>>>>>>>>> be anything but equal to TYPE_SIZE, can it?) and other than >>>>>>>>>> struct >>>>>>>>>> objects with a flexible array member where this identity doesn't >>>>>>>>>> hold I can't think of others.  Am I missing something? >>>>>>>>> >>>>>>>>> Good question.  The attached patch should fix that, so you >>>>>>>>> shouldn't need the change to decl_init_size: >>>>>>>> >>>>>>>> I've integrated it into the bug fix. >>>>>>>> >>>>>>>> Besides the usual x86_64-linux bootstrap/regtest I tested both >>>>>>>> patches by building a few packages, including Binutils/GDB, Glibc, >>>>>>>> and  verifying no new warnings show up. >>>>>>>> >>>>>>>> Martin >>>>> >>>>>> +offset_int >>>>>> +access_ref::size_remaining (offset_int *pmin /* = NULL */) const >>>>> >>>>> For the various member functions, please include the comments with >>>>> the definition as well as the in-class declaration. >>>> >>>> Only one access_ref member function is defined out-of-line: >>>> offset_bounded().  I've adjusted the comment and copied it above >>>> the function definition. >>>> >>>>> >>>>>> +      if (offrng[1] < offrng[0]) >>>>> >>>>> What does it mean for the max offset to be less than the min >>>>> offset? I wouldn't expect that to ever happen with wide integers. >>>> >>>> The offset is represented in sizetype with negative values represented >>>> as large positive values, but has to be converted to ptrdiff_t. >>> >>> It looks to me like the offset is offset_int, which is both signed >>> and big enough to hold all values of sizetype without turning large >>> positive values into negative values.  Where are these sign-switching >>> conversions happening? >> >> In get_offset_range in builtins.c. > > Since we're converting to offset_int there, why not give the offset_int > the real value rather than a bogus negative value? I don't understand the question: the real offset (in the program) is negative when its sizetype representation is greater than PTRDIFF_MAX. It's worked this way for years. Martin >>>> These >>>> cases come up when the unsigned offset is an ordinary range that >>>> corresponds to an anti-range, such as here: >>>> >>>>    extern char a[2]; >>>> >>>>    void f (unsigned long i) >>>>    { >>>>      if (i == 0) >>>>        return; >>>>      a[i] = 0;   // i's range is [1, -1] (i.e., [1, SIZE_MAX] >>>>    } >>>> >>>>> >>>>>> +  /* Return true if OFFRNG is bounded to a subrange of possible >>>>>> offset >>>>>> +     values.  */ >>>>>> +  bool offset_bounded () const; >>>>> >>>>> I don't understand how you're using this.  The implementation >>>>> checks for the possible offset values falling outside those >>>>> representable by ptrdiff_t, unless the range is only a single >>>>> value.  And then the only use is >>>>> >>>>>> +  if (ref.offset_zero () || !ref.offset_bounded ()) >>>>>> +    inform (DECL_SOURCE_LOCATION (ref.ref), >>>>>> +        "%qD declared here", ref.ref); >>>>>> +  else if (ref.offrng[0] == ref.offrng[1]) >>>>>> +    inform (DECL_SOURCE_LOCATION (ref.ref), >>>>>> +        "at offset %wi from %qD declared here", >>>>>> +        ref.offrng[0].to_shwi (), ref.ref); >>>>>> +  else >>>>>> +    inform (DECL_SOURCE_LOCATION (ref.ref), >>>>>> +        "at offset [%wi, %wi] from %qD declared here", >>>>>> +        ref.offrng[0].to_shwi (), ref.offrng[1].to_shwi (), >>>>>> ref.ref); >>>>> >>>>> So if the possible offsets are all representable by ptrdiff_t, we >>>>> don't print the range?  The middle case also looks unreachable, >>>>> since offset_bounded will return false in that case. >>>> >>>> The function was originally named "offset_unbounded."  I changed >>>> it to "offset_bounded" but looks like I didn't finish the job or >>>> add any tests for it. >>>> >>>> The goal of conditionals is to avoid overwhelming the user with >>>> excessive numbers that may not be meaningful or even relevant >>>> to the warning.  I've corrected the function body, tweaked and >>>> renamed the get_range function to get_offset_range to do a better >>>> job of extracting ranges from the types of some nonconstant >>>> expressions the front end passes it, and added a new test for >>>> all this.  Attached is the new revision. >>>> >>>> Martin >>> >> >