Re: [PATCH] PR tree-optimization/108359 - Utilize op1 == op2 when invoking range-ops folding.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Aldy Hernandez <aldyh@redhat.com>
To: Richard Biener <richard.guenther@gmail.com>,
	Andrew MacLeod <amacleod@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>, gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] PR tree-optimization/108359 - Utilize op1 == op2 when invoking range-ops folding.
Date: Mon, 16 Jan 2023 08:32:46 +0100	[thread overview]
Message-ID: <c1220a94-dba6-47fb-3c25-e0d2e3fbca6b@redhat.com> (raw)
In-Reply-To: <CAFiYyc1QygxmQTBX86zy5DDfB6fcjf+4+jxJNdw7amfzkPNTqw@mail.gmail.com>



On 1/16/23 08:19, Richard Biener wrote:
> On Fri, Jan 13, 2023 at 11:07 PM Andrew MacLeod <amacleod@redhat.com> wrote:
>>
>>
>> On 1/13/23 16:54, Jakub Jelinek wrote:
>>> On Fri, Jan 13, 2023 at 04:23:20PM -0500, Andrew MacLeod via Gcc-patches wrote:
>>>> fold_range() already invokes wi_fold_in_parts to try to get more refined
>>>> information. If the subranges are quite small, it will do each individual
>>>> calculation and combine the results.
>>>>
>>>> x * y with x = [1,3] and y = [1,3]  is broken down and we calculate each
>>>> possibility and we end up with [1,4][6,6][9,9] instead of [1,9]
>>>>
>>>> We limit this as the time is between quadratic to exponential depending on
>>>> the number of elements in x and y.
>>>>
>>>> If we also check the relation and determine that x == y, we don't need to
>>>> worry about that growth as this process is linear.  The above case will be
>>>> broken down to just  1*1, 2*2 and 3*3, resulting in a range of [1,
>>>> 1][4,4][9,9].
>>>>
>>>>    In the testcase, it happens to be the right_shift operation, but this
>>>> solution is generic and applies to all range-op operations. I added a
>>>> testcase which checks >>, *, + and %.
>>>>
>>>> I also arbitrarily chose 8 elements as the limit for breaking down
>>>> individual operations.  The overall compile time change for this is
>>>> negligible.
>>>>
>>>> Although this is a regression fix, it will affect all operations where x ==
>>>> y, which is where my initial hesitancy arose.
>>>>
>>>> Regardless, bootstrapped on x86_64-pc-linux-gnu with no regressions.  OK for
>>>> trunk?
>>> Will defer to Aldy, just some nits.
>> Did you mean Richi?
> 
> I suppose Aldy, since it's a regression fix it's OK even during stage4.

I don't have any issues with the patch.  Whatever the release managers 
agree to, I'm fine with.

Aldy

> 
> I do have a comment as well though, you do
> 
> +  // If op1 and op2 are equivalences, then we don't need a complete cross
> +  // product, just pairs of matching elements.
> +  if (relation_equiv_p (rel) && lh == rh && num_lh <= 16)
> +    {
> +      int_range_max tmp;
> +      r.set_undefined ();
> +      for (unsigned x = 0; x < num_lh; ++x)
> +       {
> +         wide_int lh_lb = lh.lower_bound (x);
> +         wide_int lh_ub = lh.upper_bound (x);
> +         wi_fold_in_parts_equiv (tmp, type, lh_lb, lh_ub);
> 
> and that does
> 
> +  widest_int lh_range = wi::sub (widest_int::from (lh_ub, TYPE_SIGN (type)),
> +                                widest_int::from (lh_lb, TYPE_SIGN (type)));
> +  // if there are 1 to 8 values in the LH range, split them up.
> +  r.set_undefined ();
> +  if (lh_range >= 0 && lh_range <= 7)
> +    {
> +      for (unsigned x = 0; x <= lh_range; x++)
> 
> which in total limits the number of sub-ranges in the output but in an
> odd way.  It's also all-or-nothing.  IIRC there's a hard limit on the
> number of sub-ranges in the output anyway via int_range<N>, so
> why not use that and always do the first loop over the sub-ranges
> of the inputs and the second loop over the range members but
> stop when we reach N-1 and then use wi_fold on the remainds?
> 
> Your above code suggests we go up to 112 sub-ranges and once
> we'd reach 113 we'd fold down to a single.
> 
> Maybe my "heuristic" wouldn't be much better, but then somehow
> breaking the heuristic down to a single magic number would be
> better, esp. since .union_ will undo some of the breakup when
> reaching N?
> 
>>>
>>>> +  // if there are 1 to 8 values in the LH range, split them up.
>>>> +  r.set_undefined ();
>>>> +  if (lh_range >= 0 && lh_range <= 7)
>>>> +    {
>>>> +      unsigned x;
>>>> +      for (x = 0; x <= lh_range; x++)
>>> Nothing uses x after the loop, so why not
>>>         for (unsigned x = 0; x <= lh_range; x++)
>>> instead?
>>
>> Just old habits.
>>
>>
>>>> @@ -234,6 +264,26 @@ range_operator::fold_range (irange &r, tree type,
>>>>      unsigned num_lh = lh.num_pairs ();
>>>>      unsigned num_rh = rh.num_pairs ();
>>>>
>>>> +  // If op1 and op2 are equivalences, then we don't need a complete cross
>>>> +  // product, just pairs of matching elements.
>>>> +  if (relation_equiv_p (rel) && (lh == rh))
>>> The ()s around lh == rh look superfluous to me.
>> Yeah I just found it marginally more readable, but it is superfluous
>>>> +    {
>>>> +      int_range_max tmp;
>>>> +      r.set_undefined ();
>>>> +      for (unsigned x = 0; x < num_lh; ++x)
>>> fold_range has an upper bound of num_lh * num_rh > 12, shouldn't something
>>> like that be there for this case too?
>>> I mean, every wi_fold_in_parts_equiv can result in 8 subranges,
>>> but num_lh could be up to 255 here, it is true it is linear and union_
>>> should merge excess ones, but still I wonder if some larger num_lh upper
>>> bound like 20 or 32 wouldn't be useful.  Up to you...
>> fold_range has the num_lh * num_rh limit because it was
>> quadratic/exponential and changes rapidly. Since this was linear based
>> on the number of sub ranges I didn't think it would matter much, but
>> sure, we can put a similar limit on it.. 16 seems reasonable.
>>>> +    {
>>>> +      wide_int lh_lb = lh.lower_bound (x);
>>>> +      wide_int lh_ub = lh.upper_bound (x);
>>>> +      wi_fold_in_parts_equiv (tmp, type, lh_lb, lh_ub);
>>>> +      r.union_ (tmp);
>>>> +      if (r.varying_p ())
>>>> +        break;
>>>> +    }
>>>> +      op1_op2_relation_effect (r, type, lh, rh, rel);
>>>> +      update_known_bitmask (r, m_code, lh, rh);
>>>> +      return true;
>>>> +    }
>>>> +
>>>        Jakub
>>>
>> Updated patch attached...  I'll run it through testing whe the current
>> one is done.
>>
>>
>> Andrew
>

next prev parent reply	other threads:[~2023-01-16  7:32 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-13 21:23 Andrew MacLeod
2023-01-13 21:54 ` Jakub Jelinek
2023-01-13 22:07   ` Andrew MacLeod
2023-01-16  7:19     ` Richard Biener
2023-01-16  7:32       ` Aldy Hernandez [this message]
2023-01-16  8:32       ` Jakub Jelinek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c1220a94-dba6-47fb-3c25-e0d2e3fbca6b@redhat.com \
    --to=aldyh@redhat.com \
    --cc=amacleod@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).