From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <msebor@gmail.com>
Received: from mail-ot1-x341.google.com (mail-ot1-x341.google.com
 [IPv6:2607:f8b0:4864:20::341])
 by sourceware.org (Postfix) with ESMTPS id C75073857C74
 for <gcc-patches@gcc.gnu.org>; Wed,  7 Oct 2020 15:19:32 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org C75073857C74
Received: by mail-ot1-x341.google.com with SMTP id t15so2593893otk.0
 for <gcc-patches@gcc.gnu.org>; Wed, 07 Oct 2020 08:19:32 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:subject:to:references:from:message-id:date
 :user-agent:mime-version:in-reply-to:content-language
 :content-transfer-encoding;
 bh=TkemL5Ha3v2hCPc7lHy8VuyOOcaLMe8rGyN61Z1w7bM=;
 b=H0CdWMc6f3vJh/rDai2eBF+qGJe0vWFRLYAiZfGBrJZ9Yu+M8B7Etaka0iwRjtiN6/
 oPnmeLsHmw0LEDxz3VGn3hbvQAOT+hiJpxKqVw5jE+8xaeUHnAtg1z9Yo5c4gfRwdqVZ
 C7Dsa+PqkcKOlIaYw1cQ/83/ocWduw1cyyu3/GiqXpaa+KCeOtNK4eTN/5sSS8205xhz
 9IqRpHnf5cvSTJcLO7DhUvGNuAuKOIklt3kmKyBXGNqJzFpRv9aYAL/Sy+pYkt+IvCMw
 KYSEaH0V7tFCMlKb3dE5lsAO0tA3LWHr1Sg6u2oZX2M+4uZidrONTHWLEOjEbPemRtNC
 3K/A==
X-Gm-Message-State: AOAM531NfH8s0aOGHk+KIsvTBb839EEV7wUSc9Jcc/Q5qp4TfJSEUmPp
 OY9ONUuwYQbeC9tDcJ2H8AhPyY96TTk=
X-Google-Smtp-Source: ABdhPJwISbphTiPo62DlHY3Z1aAxZQJ4ny4r8nHkXSkpclI1WWhoeOTMdCGrCmn0uAK/jMu0Cc9NUg==
X-Received: by 2002:a9d:6d95:: with SMTP id x21mr2232822otp.339.1602083970561; 
 Wed, 07 Oct 2020 08:19:30 -0700 (PDT)
Received: from [192.168.0.41] (97-118-127-189.hlrn.qwest.net. [97.118.127.189])
 by smtp.gmail.com with ESMTPSA id x13sm2494482oot.24.2020.10.07.08.19.29
 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
 Wed, 07 Oct 2020 08:19:29 -0700 (PDT)
Subject: Re: [PING][PATCH] correct handling of indices into arrays with
 elements larger than 1 (PR c++/96511)
To: Jason Merrill <jason@redhat.com>, gcc-patches <gcc-patches@gcc.gnu.org>
References: <a47e256b-11ce-7818-8391-8921ae114a99@gmail.com>
 <dd4f23e5-7fc7-1c90-43bd-8b57ac1ecf36@redhat.com>
 <38c7996c-2c9a-c2c3-e18e-db19a2a805a0@gmail.com>
 <dbd55109-70f6-aed0-61fb-6f27e66717ce@redhat.com>
 <dcadf96e-402b-cee1-78a1-02eda22cff55@gmail.com>
 <9c1a04cb-f01e-90c9-df38-6766a0bc9b70@gmail.com>
 <7c28029d-4f6a-3605-30df-7b9281f988d7@gmail.com>
 <0a0f49ad-66bc-b59b-6f33-c40a52b06277@redhat.com>
 <183d8ff6-21e8-6302-7524-f4f0fd6a4e77@gmail.com>
 <a3d8a916-42d9-1b8e-61f0-a6acaca4a38d@redhat.com>
 <45f5e9f2-1855-91b5-0841-a6bcd342a09b@gmail.com>
 <8a6bf322-0262-5111-96f0-92e012a69556@redhat.com>
From: Martin Sebor <msebor@gmail.com>
Message-ID: <28c7421a-4fe8-ec6b-a288-001aff24df18@gmail.com>
Date: Wed, 7 Oct 2020 09:19:29 -0600
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
 Thunderbird/68.2.2
MIME-Version: 1.0
In-Reply-To: <8a6bf322-0262-5111-96f0-92e012a69556@redhat.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-2.8 required=5.0 tests=BAYES_00, BODY_8BITS,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM,
 KAM_SHORT, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 07 Oct 2020 15:19:35 -0000

On 10/7/20 9:07 AM, Jason Merrill wrote:
> On 10/7/20 10:42 AM, Martin Sebor wrote:
>> On 10/7/20 8:26 AM, Jason Merrill wrote:
>>> On 9/28/20 6:01 PM, Martin Sebor wrote:
>>>> On 9/25/20 11:17 PM, Jason Merrill wrote:
>>>>> On 9/22/20 4:05 PM, Martin Sebor wrote:
>>>>>> The rebased and retested patches are attached.
>>>>>>
>>>>>> On 9/21/20 3:17 PM, Martin Sebor wrote:
>>>>>>> Ping: 
>>>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553906.html
>>>>>>>
>>>>>>> (I'm working on rebasing the patch on top of the latest trunk which
>>>>>>> has changed some of the same code but it'd be helpful to get a go-
>>>>>>> ahead on substance the changes.  I don't expect the rebase to
>>>>>>> require any substantive modifications.)
>>>>>>>
>>>>>>> Martin
>>>>>>>
>>>>>>> On 9/14/20 4:01 PM, Martin Sebor wrote:
>>>>>>>> On 9/4/20 11:14 AM, Jason Merrill wrote:
>>>>>>>>> On 9/3/20 2:44 PM, Martin Sebor wrote:
>>>>>>>>>> On 9/1/20 1:22 PM, Jason Merrill wrote:
>>>>>>>>>>> On 8/11/20 12:19 PM, Martin Sebor via Gcc-patches wrote:
>>>>>>>>>>>> -Wplacement-new handles array indices and pointer offsets 
>>>>>>>>>>>> the same:
>>>>>>>>>>>> by adjusting them by the size of the element.  That's 
>>>>>>>>>>>> correct for
>>>>>>>>>>>> the latter but wrong for the former, causing false positives 
>>>>>>>>>>>> when
>>>>>>>>>>>> the element size is greater than one.
>>>>>>>>>>>>
>>>>>>>>>>>> In addition, the warning doesn't even attempt to handle 
>>>>>>>>>>>> arrays of
>>>>>>>>>>>> arrays.  I'm not sure if I forgot or if I simply didn't 
>>>>>>>>>>>> think of
>>>>>>>>>>>> it.
>>>>>>>>>>>>
>>>>>>>>>>>> The attached patch corrects these oversights by replacing most
>>>>>>>>>>>> of the -Wplacement-new code with a call to compute_objsize 
>>>>>>>>>>>> which
>>>>>>>>>>>> handles all this correctly (plus more), and is also better 
>>>>>>>>>>>> tested.
>>>>>>>>>>>> But even compute_objsize has bugs: it trips up while converting
>>>>>>>>>>>> wide_int to offset_int for some pointer offset ranges.  Since
>>>>>>>>>>>> handling the C++ IL required changes in this area the patch 
>>>>>>>>>>>> also
>>>>>>>>>>>> fixes that.
>>>>>>>>>>>>
>>>>>>>>>>>> For review purposes, the patch affects just the middle end.
>>>>>>>>>>>> The C++ diff pretty much just removes code from the front end.
>>>>>>>>>>>
>>>>>>>>>>> The C++ changes are OK.
>>>>>>>>>>
>>>>>>>>>> Thank you for looking at the rest as well.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> -compute_objsize (tree ptr, int ostype, access_ref *pref,
>>>>>>>>>>>> -                bitmap *visited, const vr_values *rvals /* 
>>>>>>>>>>>> = NULL */)
>>>>>>>>>>>> +compute_objsize (tree ptr, int ostype, access_ref *pref, 
>>>>>>>>>>>> bitmap *visited,
>>>>>>>>>>>> +                const vr_values *rvals)
>>>>>>>>>>>
>>>>>>>>>>> This reformatting seems unnecessary, and I prefer to keep the 
>>>>>>>>>>> comment about the default argument.
>>>>>>>>>>
>>>>>>>>>> This overload doesn't take a default argument.  (There was a 
>>>>>>>>>> stray
>>>>>>>>>> declaration of a similar function at the top of the file that had
>>>>>>>>>> one.  I've removed it.)
>>>>>>>>>
>>>>>>>>> Ah, true.
>>>>>>>>>
>>>>>>>>>>>> -      if (!size || TREE_CODE (size) != INTEGER_CST)
>>>>>>>>>>>> -       return false;
>>>>>>>>>>>  >...
>>>>>>>>>>>
>>>>>>>>>>> You change some failure cases in compute_objsize to return 
>>>>>>>>>>> success with a maximum range, while others continue to return 
>>>>>>>>>>> failure. This needs commentary about the design rationale.
>>>>>>>>>>
>>>>>>>>>> This is too much for a comment in the code but the background is
>>>>>>>>>> this: compute_objsize initially returned the object size as a 
>>>>>>>>>> constant.
>>>>>>>>>> Recently, I have enhanced it to return a range to improve 
>>>>>>>>>> warnings for
>>>>>>>>>> allocated objects.  With that, a failure can be turned into 
>>>>>>>>>> success by
>>>>>>>>>> having the function set the range to that of the largest 
>>>>>>>>>> object. That
>>>>>>>>>> should simplify the function's callers and could even improve
>>>>>>>>>> the detection of some invalid accesses.  Once this change is made
>>>>>>>>>> it might even be possible to change its return type to void.
>>>>>>>>>>
>>>>>>>>>> The change that caught your eye is necessary to make the function
>>>>>>>>>> a drop-in replacement for the C++ front end code which makes this
>>>>>>>>>> same assumption.  Without it, a number of test cases that 
>>>>>>>>>> exercise
>>>>>>>>>> VLAs fail in g++.dg/warn/Wplacement-new-size-5.C.  For example:
>>>>>>>>>>
>>>>>>>>>>    void f (int n)
>>>>>>>>>>    {
>>>>>>>>>>      char a[n];
>>>>>>>>>>      new (a - 1) int ();
>>>>>>>>>>    }
>>>>>>>>>>
>>>>>>>>>> Changing any of the other places isn't necessary for existing 
>>>>>>>>>> tests
>>>>>>>>>> to pass (and I didn't want to introduce too much churn).  But 
>>>>>>>>>> I do
>>>>>>>>>> want to change the rest of the function along the same lines 
>>>>>>>>>> at some
>>>>>>>>>> point.
>>>>>>>>>
>>>>>>>>> Please do change the other places to be consistent; better to 
>>>>>>>>> have more churn than to leave the function half-updated.  That 
>>>>>>>>> can be a separate patch if you prefer, but let's do it now 
>>>>>>>>> rather than later.
>>>>>>>>
>>>>>>>> I've made most of these changes in the other patch (also attached).
>>>>>>>> I'm quite happy with the result but it turned out to be a lot more
>>>>>>>> work than either of us expected, mostly due to the amount of 
>>>>>>>> testing.
>>>>>>>>
>>>>>>>> I've left a couple of failing cases in place mainly as reminders
>>>>>>>> to handle them better (which means I also didn't change the caller
>>>>>>>> to avoid testing for failures).  I've also added TODO notes with
>>>>>>>> reminders to handle some of the new codes more completely.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>> +  special_array_member sam{ };
>>>>>>>>>>>
>>>>>>>>>>> sam is always set by component_ref_size, so I don't think 
>>>>>>>>>>> it's necessary to initialize it at the declaration.
>>>>>>>>>>
>>>>>>>>>> I find initializing pass-by-pointer local variables helpful but
>>>>>>>>>> I don't insist on it.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> @@ -187,7 +187,7 @@ decl_init_size (tree decl, bool min)
>>>>>>>>>>>>    tree last_type = TREE_TYPE (last);
>>>>>>>>>>>>    if (TREE_CODE (last_type) != ARRAY_TYPE
>>>>>>>>>>>>        || TYPE_SIZE (last_type))
>>>>>>>>>>>> -    return size;
>>>>>>>>>>>> +    return size ? size : TYPE_SIZE_UNIT (type);
>>>>>>>>>>>
>>>>>>>>>>> This change seems to violate the comment for the function.
>>>>>>>>>>
>>>>>>>>>> By my reading (and writing) the change is covered by the first
>>>>>>>>>> sentence:
>>>>>>>>>>
>>>>>>>>>>     Returns the size of the object designated by DECL considering
>>>>>>>>>>     its initializer if it either has one or if it would not 
>>>>>>>>>> affect
>>>>>>>>>>     its size, ...
>>>>>>>>>
>>>>>>>>> OK, I see it now.
>>>>>>>>>
>>>>>>>>>> It handles a number of cases in Wplacement-new-size.C fail that
>>>>>>>>>> construct a larger object in an extern declaration of a template,
>>>>>>>>>> like this:
>>>>>>>>>>
>>>>>>>>>>    template <class> struct S { char c; };
>>>>>>>>>>    extern S<int> s;
>>>>>>>>>>
>>>>>>>>>>    void f ()
>>>>>>>>>>    {
>>>>>>>>>>      new (&s) int ();
>>>>>>>>>>    }
>>>>>>>>>>
>>>>>>>>>> I don't know why DECL_SIZE isn't set here (I don't think it can
>>>>>>>>>> be anything but equal to TYPE_SIZE, can it?) and other than 
>>>>>>>>>> struct
>>>>>>>>>> objects with a flexible array member where this identity doesn't
>>>>>>>>>> hold I can't think of others.  Am I missing something?
>>>>>>>>>
>>>>>>>>> Good question.  The attached patch should fix that, so you 
>>>>>>>>> shouldn't need the change to decl_init_size:
>>>>>>>>
>>>>>>>> I've integrated it into the bug fix.
>>>>>>>>
>>>>>>>> Besides the usual x86_64-linux bootstrap/regtest I tested both
>>>>>>>> patches by building a few packages, including Binutils/GDB, Glibc,
>>>>>>>> and  verifying no new warnings show up.
>>>>>>>>
>>>>>>>> Martin
>>>>>
>>>>>> +offset_int
>>>>>> +access_ref::size_remaining (offset_int *pmin /* = NULL */) const
>>>>>
>>>>> For the various member functions, please include the comments with 
>>>>> the definition as well as the in-class declaration.
>>>>
>>>> Only one access_ref member function is defined out-of-line: 
>>>> offset_bounded().  I've adjusted the comment and copied it above
>>>> the function definition.
>>>>
>>>>>
>>>>>> +      if (offrng[1] < offrng[0])
>>>>>
>>>>> What does it mean for the max offset to be less than the min 
>>>>> offset? I wouldn't expect that to ever happen with wide integers.
>>>>
>>>> The offset is represented in sizetype with negative values represented
>>>> as large positive values, but has to be converted to ptrdiff_t.
>>>
>>> It looks to me like the offset is offset_int, which is both signed 
>>> and big enough to hold all values of sizetype without turning large 
>>> positive values into negative values.  Where are these sign-switching 
>>> conversions happening?
>>
>> In get_offset_range in builtins.c.
> 
> Since we're converting to offset_int there, why not give the offset_int 
> the real value rather than a bogus negative value?

I don't understand the question: the real offset (in the program)
is negative when its sizetype representation is greater than
PTRDIFF_MAX.  It's worked this way for years.

Martin

>>>> These
>>>> cases come up when the unsigned offset is an ordinary range that
>>>> corresponds to an anti-range, such as here:
>>>>
>>>>    extern char a[2];
>>>>
>>>>    void f (unsigned long i)
>>>>    {
>>>>      if (i == 0)
>>>>        return;
>>>>      a[i] = 0;   // i's range is [1, -1] (i.e., [1, SIZE_MAX]
>>>>    }
>>>>
>>>>>
>>>>>> +  /* Return true if OFFRNG is bounded to a subrange of possible 
>>>>>> offset
>>>>>> +     values.  */
>>>>>> +  bool offset_bounded () const;
>>>>>
>>>>> I don't understand how you're using this.  The implementation 
>>>>> checks for the possible offset values falling outside those 
>>>>> representable by ptrdiff_t, unless the range is only a single 
>>>>> value.  And then the only use is
>>>>>
>>>>>> +  if (ref.offset_zero () || !ref.offset_bounded ())
>>>>>> +    inform (DECL_SOURCE_LOCATION (ref.ref),
>>>>>> +        "%qD declared here", ref.ref);
>>>>>> +  else if (ref.offrng[0] == ref.offrng[1])
>>>>>> +    inform (DECL_SOURCE_LOCATION (ref.ref),
>>>>>> +        "at offset %wi from %qD declared here",
>>>>>> +        ref.offrng[0].to_shwi (), ref.ref);
>>>>>> +  else
>>>>>> +    inform (DECL_SOURCE_LOCATION (ref.ref),
>>>>>> +        "at offset [%wi, %wi] from %qD declared here",
>>>>>> +        ref.offrng[0].to_shwi (), ref.offrng[1].to_shwi (), 
>>>>>> ref.ref);
>>>>>
>>>>> So if the possible offsets are all representable by ptrdiff_t, we 
>>>>> don't print the range?  The middle case also looks unreachable, 
>>>>> since offset_bounded will return false in that case.
>>>>
>>>> The function was originally named "offset_unbounded."  I changed
>>>> it to "offset_bounded" but looks like I didn't finish the job or
>>>> add any tests for it.
>>>>
>>>> The goal of conditionals is to avoid overwhelming the user with
>>>> excessive numbers that may not be meaningful or even relevant
>>>> to the warning.  I've corrected the function body, tweaked and
>>>> renamed the get_range function to get_offset_range to do a better
>>>> job of extracting ranges from the types of some nonconstant
>>>> expressions the front end passes it, and added a new test for
>>>> all this.  Attached is the new revision.
>>>>
>>>> Martin
>>>
>>
>