From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-510281-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 48967 invoked by alias); 4 Oct 2019 16:14:47 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 48955 invoked by uid 89); 4 Oct 2019 16:14:45 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-25.3 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,SPF_HELO_PASS autolearn=ham version=3.3.1 spammy=
X-HELO: mx1.redhat.com
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 04 Oct 2019 16:14:39 +0000
Received: from mail-wm1-f71.google.com (mail-wm1-f71.google.com [209.85.128.71])	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))	(No client certificate requested)	by mx1.redhat.com (Postfix) with ESMTPS id 25AB08E58C	for <gcc-patches@gcc.gnu.org>; Fri,  4 Oct 2019 16:14:38 +0000 (UTC)
Received: by mail-wm1-f71.google.com with SMTP id c188so1780779wmd.9        for <gcc-patches@gcc.gnu.org>; Fri, 04 Oct 2019 09:14:38 -0700 (PDT)
Received: from abulafia.quesejoda.com (136.red-2-139-5.dynamicip.rima-tde.net. [2.139.5.136])        by smtp.gmail.com with ESMTPSA id b5sm4770162wmj.18.2019.10.04.09.14.34        (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);        Fri, 04 Oct 2019 09:14:35 -0700 (PDT)
Subject: Re: [patch] canonicalize unsigned [1,MAX] ranges into ~[0,0]
To: Jeff Law <law@redhat.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>, Andrew MacLeod <amacleod@redhat.com>
References: <f57ee68e-f1ee-e406-e90b-32c651a7683e@redhat.com> <995b4560-6a76-6742-888f-eadbfb9ff9cc@redhat.com> <74fb2cf8-6312-6ae2-b620-3d36deb61080@redhat.com> <6a557c4c-d86b-7b7f-70c2-483fff2d5f59@redhat.com>
From: Aldy Hernandez <aldyh@redhat.com>
Message-ID: <d7af1c4e-1f66-4293-a4b8-29440bd59ec0@redhat.com>
Date: Fri, 04 Oct 2019 16:14:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.0
MIME-Version: 1.0
In-Reply-To: <6a557c4c-d86b-7b7f-70c2-483fff2d5f59@redhat.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
X-IsSubscribed: yes
X-SW-Source: 2019-10/txt/msg00352.txt.bz2


On 10/4/19 12:02 PM, Jeff Law wrote:
> On 10/4/19 9:49 AM, Aldy Hernandez wrote:
>>
>>
>> On 10/4/19 11:38 AM, Jeff Law wrote:
>>> On 10/4/19 6:59 AM, Aldy Hernandez wrote:
>>>> When I did the value_range canonicalization work, I noticed that an
>>>> unsigned [1,MAX] and an ~[0,0] could be two different representations
>>>> for the same thing.Â  I didn't address the problem then because callers
>>>> to ranges_from_anti_range() would go into an infinite loop trying to
>>>> extract ~[0,0] into [1,MAX] and [].Â  We had a lot of callers to
>>>> ranges_from_anti_range,Â andÂ itÂ smelledÂ likeÂ aÂ rat'sÂ nest,Â soÂ IÂ bailed.
>>>>
>>>> Now that we have one main caller (from the symbolic PLUS/MINUS
>>>> handling), it's a lot easier to contain.Â  Well, singleton_p also calls
>>>> it,Â butÂ it'sÂ alreadyÂ handlingÂ nonzeroÂ specially,Â soÂ itÂ wouldn'tÂ beÂ affected.
>>>>
>>>>
>>>>
>>>> With some upcoming cleanups I'm about to post, the fact that [1,MAX] and
>>>> ~[0,0] are equal_p(), but not nonzero_p(), matters.Â  Plus, it's just
>>>> good form to have one representation, giving us the ability to pick at
>>>> nonzero_pÂ rangesÂ withÂ ease.
>>>>
>>>> The code in extract_range_from_plus_minus_expr() continues to be a mess
>>>> (as it has always been), but at least it's contained, and with this
>>>> patch,Â it'sÂ slightlyÂ smaller.
>>>>
>>>> Note, I'm avoiding adding a comment header for functions with highly
>>>> descriptiveÂ obviousÂ names.
>>>>
>>>> OK?
>>>>
>>>> Aldy
>>>>
>>>> canonicalize-nonzero-ranges.patch
>>>>
>>>> commit 1c333730deeb4ddadc46ad6d12d5344f92c0352c
>>>> Author: Aldy Hernandez <aldyh@redhat.com>
>>>> Date:Â Â  Fri Oct 4 08:51:25 2019 +0200
>>>>
>>>>  Â Â Â Â  Canonicalize UNSIGNED [1,MAX] into ~[0,0].
>>>>  Â Â Â Â  Â Â Â Â  Adapt PLUS/MINUS symbolic handling, so it doesn't call
>>>>  Â Â Â Â  ranges_from_anti_range with a VR_ANTI_RANGE containing one
>>>> sub-range.
>>>>
>>>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>>>> index 6e4f145af46..3934b41fdf9 100644
>>>> --- a/gcc/ChangeLog
>>>> +++ b/gcc/ChangeLog
>>>> @@ -1,3 +1,18 @@
>>>> +2019-10-04Â  Aldy HernandezÂ  <aldyh@redhat.com>
>>>> +
>>>> +Â Â Â  * tree-vrp.c (value_range_base::singleton_p): Use num_pairs
>>>> +Â Â Â  instead of calling vrp_val_is_*.
>>>> +Â Â Â  (value_range_base::set): Canonicalize unsigned [1,MAX] into
>>>> +Â Â Â  non-zero.
>>>> +Â Â Â  (range_has_numeric_bounds_p): New.
>>>> +Â Â Â  (range_int_cst_p): Use range_has_numeric_bounds_p.
>>>> +Â Â Â  (ranges_from_anti_range): Assert that we won't recurse
>>>> +Â Â Â  indefinitely.
>>>> +Â Â Â  (extract_extremes_from_range): New.
>>>> +Â Â Â  (extract_range_from_plus_minus_expr): Adapt so we don't call
>>>> +Â Â Â  ranges_from_anti_range with an anti-range containing only one
>>>> +Â Â Â  sub-range.
>>> So no problem with the implementation, but I do have a higher level
>>> question.
>>>
>>> One of the goals of the representation side of the Ranger project is to
>>> drop anti-ranges.Â  Canonicalizing [1, MAX] to ~[0,0] seems to be going
>>> in the opposite direction.Â Â  So do we really want to canonicalize to
>>> ~[0,0]?
>>
>> Hmmm, Andrew had the same question.
>>
>> It really doesn't matter what we canonicalize too, as long as we're
>> consistent, but there are a bunch of non-zero tests throughout that were
>> checking for the ~[0,0] construct, and I didn't want to rock the boat
>> too much.Â  Although in all honesty, most of those should already be
>> converted to the ::nonzero_p() API.
>>
>> However, if we canonicalize into [1,MAX] for unsigned, we have the
>> problem that a signed non-zero will still be ~[0,0], so our ::nonzero_p
>> code will have to check two different representations, not to mention it
>> will now have to check TYPE_UNSIGNED(type).
> ISTM that the right thing to do is to first move to the ::nonzero_p API,
> which should be a behavior preserving change.  It'd probably be a medium
> sized change, but highly mechanical and behavior preserving, so easy to
> review.

Ughh, please no?  This was just a means to get the general range_fold* 
cleanups which are important and make everything easier to read.  I'd 
rather not get distracted by having to audit all the nonzero checking 
throughout.

Besides, we can't get away from anti-ranges inasmuch as we have 
value_range_base, which hopefully we can move away from in a future 
release.  So this will eventually become a non-issue.  At that point, 
we'll loose ALL anti-ranges once and for all.

~[0,0] has been the accepted way for a long time, I'd really prefer to 
keep that (for now).

And really, this is just plain ugly:

bool
value_range_base::nonzero_p ()
{
   if (m_kind == VR_ANTI_RANGE
       && !TYPE_UNSIGNED (type ())
       && integer_zerop (m_min)
       && integer_zerop (m_max))
     return true;

   return (m_kind == VR_RANGE
	  && TYPE_UNSIGNED (type ())
	  && integer_onep (m_min)
	  && vrp_val_is_max (m_max));
}

Aldy

> 
> Once that's in place, then I'd seriously look at [1,MAX] as the
> canonical form for unsigned types and [MIN, -1] [1, MAX] as the
> canonical form for signed types.  If we're trying to get away from ANTI
> ranges, canonicalizing on ~[0,0] just seems wrong.
> 
> Where things may get painful is things like [MIN, -3] [3, MAX] which
> occur due to arithmetic and pointer alignments.  I think we have a hack
> somewhere which special cases this into [MIN, -1], [1, MAX] even though
> it's technically less precise.
> 
> jeff
>