Subject: Re: [PATCH] PR tree-optimization/108359 - Utilize op1 == op2 when invoking range-ops folding.
Date: Mon, 16 Jan 2023 08:32:46 +0100
From: Aldy Hernandez
To: Richard Biener, Andrew MacLeod
Cc: Jakub Jelinek, gcc-patches

On 1/16/23 08:19, Richard Biener wrote:
> On Fri, Jan 13, 2023 at 11:07 PM Andrew MacLeod wrote:
>>
>> On 1/13/23 16:54, Jakub Jelinek wrote:
>>> On Fri, Jan 13, 2023 at 04:23:20PM -0500, Andrew MacLeod via Gcc-patches wrote:
>>>> fold_range() already invokes wi_fold_in_parts to try to get more refined
>>>> information.  If the subranges are quite small, it will do each individual
>>>> calculation and combine the results.
>>>>
>>>> x * y with x = [1,3] and y = [1,3] is broken down, we calculate each
>>>> possibility, and we end up with [1,4][6,6][9,9] instead of [1,9].
>>>>
>>>> We limit this because the time is between quadratic and exponential,
>>>> depending on the number of elements in x and y.
>>>>
>>>> If we also check the relation and determine that x == y, we don't need to
>>>> worry about that growth, as this process is linear.  The above case will be
>>>> broken down to just 1*1, 2*2 and 3*3, resulting in a range of
>>>> [1,1][4,4][9,9].
>>>>
>>>> In the testcase, it happens to be the right_shift operation, but this
>>>> solution is generic and applies to all range-op operations.  I added a
>>>> testcase which checks >>, *, + and %.
>>>>
>>>> I also arbitrarily chose 8 elements as the limit for breaking down
>>>> individual operations.  The overall compile-time change for this is
>>>> negligible.
>>>>
>>>> Although this is a regression fix, it will affect all operations where
>>>> x == y, which is where my initial hesitancy arose.
>>>>
>>>> Regardless, bootstrapped on x86_64-pc-linux-gnu with no regressions.
>>>> OK for trunk?
>>> Will defer to Aldy, just some nits.
>> Did you mean Richi?
>
> I suppose Aldy, since it's a regression fix it's OK even during stage4.

I don't have any issues with the patch.  Whatever the release managers
agree to, I'm fine with.

Aldy
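To make the difference concrete, here is a minimal, self-contained sketch of
the idea described above.  It is a toy model in plain C++, not the GCC
irange/range_operator code; the subrange representation and the helper names
are illustrative only.  It contrasts the full cross product with the pairwise
fold that applies once we know op1 == op2:

// Toy model: fold x * y over small integer ranges, once as a full cross
// product and once pairwise under the assumption that op1 == op2.
#include <cstdio>
#include <set>
#include <utility>
#include <vector>

using subrange = std::pair<long, long>;          // inclusive [lo, hi]

// Collapse a sorted set of values into maximal contiguous sub-ranges.
static std::vector<subrange> to_subranges (const std::set<long> &vals)
{
  std::vector<subrange> out;
  for (long v : vals)
    if (!out.empty () && v == out.back ().second + 1)
      out.back ().second = v;
    else
      out.push_back ({v, v});
  return out;
}

int main ()
{
  std::vector<long> x = {1, 2, 3};               // x = [1,3], and y == x

  std::set<long> cross, pairwise;
  for (long a : x)                               // full cross product: n * n folds
    for (long b : x)
      cross.insert (a * b);
  for (long a : x)                               // x == y: matching pairs only, n folds
    pairwise.insert (a * a);

  auto dump = [] (const char *tag, const std::vector<subrange> &r)
    {
      std::printf ("%s:", tag);
      for (auto [lo, hi] : r)
        std::printf (" [%ld,%ld]", lo, hi);
      std::printf ("\n");
    };
  dump ("cross product", to_subranges (cross));    // [1,4] [6,6] [9,9]
  dump ("pairwise fold", to_subranges (pairwise)); // [1,1] [4,4] [9,9]
}

The pairwise loop performs one fold per value instead of one per pair of
values, which is why the split stays cheap (linear) when the operands are
known to be equal, as the patch description above says.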
> I do have a comment as well though, you do
>
> +  // If op1 and op2 are equivalences, then we don't need a complete cross
> +  // product, just pairs of matching elements.
> +  if (relation_equiv_p (rel) && lh == rh && num_lh <= 16)
> +    {
> +      int_range_max tmp;
> +      r.set_undefined ();
> +      for (unsigned x = 0; x < num_lh; ++x)
> +        {
> +          wide_int lh_lb = lh.lower_bound (x);
> +          wide_int lh_ub = lh.upper_bound (x);
> +          wi_fold_in_parts_equiv (tmp, type, lh_lb, lh_ub);
>
> and that does
>
> +  widest_int lh_range = wi::sub (widest_int::from (lh_ub, TYPE_SIGN (type)),
> +                                 widest_int::from (lh_lb, TYPE_SIGN (type)));
> +  // if there are 1 to 8 values in the LH range, split them up.
> +  r.set_undefined ();
> +  if (lh_range >= 0 && lh_range <= 7)
> +    {
> +      for (unsigned x = 0; x <= lh_range; x++)
>
> which in total limits the number of sub-ranges in the output but in an
> odd way.  It's also all-or-nothing.  IIRC there's a hard limit on the
> number of sub-ranges in the output anyway via int_range, so
> why not use that and always do the first loop over the sub-ranges
> of the inputs and the second loop over the range members but
> stop when we reach N-1 and then use wi_fold on the remainder?
>
> Your above code suggests we go up to 112 sub-ranges and once
> we'd reach 113 we'd fold down to a single.
>
> Maybe my "heuristic" wouldn't be much better, but then somehow
> breaking the heuristic down to a single magic number would be
> better, esp. since .union_ will undo some of the breakup when
> reaching N?
>
>>>> +      // if there are 1 to 8 values in the LH range, split them up.
>>>> +      r.set_undefined ();
>>>> +      if (lh_range >= 0 && lh_range <= 7)
>>>> +        {
>>>> +          unsigned x;
>>>> +          for (x = 0; x <= lh_range; x++)
>>> Nothing uses x after the loop, so why not
>>>   for (unsigned x = 0; x <= lh_range; x++)
>>> instead?
>>
>> Just old habits.
>>
>>>> @@ -234,6 +264,26 @@ range_operator::fold_range (irange &r, tree type,
>>>>    unsigned num_lh = lh.num_pairs ();
>>>>    unsigned num_rh = rh.num_pairs ();
>>>>
>>>> +  // If op1 and op2 are equivalences, then we don't need a complete cross
>>>> +  // product, just pairs of matching elements.
>>>> +  if (relation_equiv_p (rel) && (lh == rh))
>>> The ()s around lh == rh look superfluous to me.
>> Yeah, I just found it marginally more readable, but it is superfluous.
>>>> +    {
>>>> +      int_range_max tmp;
>>>> +      r.set_undefined ();
>>>> +      for (unsigned x = 0; x < num_lh; ++x)
>>> fold_range has an upper bound of num_lh * num_rh > 12, shouldn't something
>>> like that be there for this case too?
>>> I mean, every wi_fold_in_parts_equiv can result in 8 subranges,
>>> but num_lh could be up to 255 here, it is true it is linear and union_
>>> should merge excess ones, but still I wonder if some larger num_lh upper
>>> bound like 20 or 32 wouldn't be useful.  Up to you...
>> fold_range has the num_lh * num_rh limit because it was
>> quadratic/exponential and changes rapidly.  Since this was linear based
>> on the number of sub-ranges I didn't think it would matter much, but
>> sure, we can put a similar limit on it.  16 seems reasonable.
>>>> +        {
>>>> +          wide_int lh_lb = lh.lower_bound (x);
>>>> +          wide_int lh_ub = lh.upper_bound (x);
>>>> +          wi_fold_in_parts_equiv (tmp, type, lh_lb, lh_ub);
>>>> +          r.union_ (tmp);
>>>> +          if (r.varying_p ())
>>>> +            break;
>>>> +        }
>>>> +      op1_op2_relation_effect (r, type, lh, rh, rel);
>>>> +      update_known_bitmask (r, m_code, lh, rh);
>>>> +      return true;
>>>> +    }
>>>> +
>>> 	Jakub
>>>
>> Updated patch attached... I'll run it through testing when the current
>> one is done.
>>
>> Andrew
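As a footnote to the heuristic discussion above, here is one possible reading
of the "stop when we reach N-1 and coarsely fold the rest" suggestion, again
as a self-contained toy sketch rather than the actual irange/wi_fold code.
The squaring op, the vector representation, and the exact cut-off test are
all illustrative assumptions, not the proposed GCC change:

// Toy model: split the values of x (with x == y) one at a time, but once
// the result is one sub-range away from a hard limit, fold every remaining
// value as a single coarse range.  The op here is v * v on positive values,
// which is monotone, so the coarse tail is simply [first*first, last*last];
// the real code would instead call wi_fold on the remaining bounds.
#include <algorithm>
#include <cstdio>
#include <utility>
#include <vector>

using subrange = std::pair<long, long>;          // inclusive [lo, hi]

// Append a sub-range, merging it with the previous one if they touch.
static void append (std::vector<subrange> &r, subrange s)
{
  if (!r.empty () && s.first <= r.back ().second + 1)
    r.back ().second = std::max (r.back ().second, s.second);
  else
    r.push_back (s);
}

static std::vector<subrange>
fold_square_equiv (const std::vector<long> &x, size_t limit)
{
  std::vector<subrange> r;
  for (size_t i = 0; i < x.size (); ++i)
    {
      if (r.size () + 1 == limit && i + 1 < x.size ())
        {
          // Budget exhausted: one coarse fold of everything that is left.
          append (r, {x[i] * x[i], x.back () * x.back ()});
          return r;
        }
      append (r, {x[i] * x[i], x[i] * x[i]});    // exact singleton fold
    }
  return r;
}

int main ()
{
  std::vector<long> x = {1, 2, 3, 4, 5, 6};      // x = [1,6], x == y
  const size_t limits[] = {8, 4, 2};
  for (size_t limit : limits)
    {
      std::printf ("limit %zu:", limit);
      for (auto [lo, hi] : fold_square_equiv (x, limit))
        std::printf (" [%ld,%ld]", lo, hi);
      std::printf ("\n");
    }
}

With a budget of four sub-ranges the first three values stay exact and the
tail collapses into one coarse range ([1,1] [4,4] [9,9] [16,36]), instead of
the all-or-nothing behaviour the review comments point out.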