From: Richard Biener <rguenther@suse.de>
To: Richard Sandiford <richard.sandiford@arm.com>
Cc: Tamar Christina <tamar.christina@arm.com>,
	Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>,
	nd <nd@arm.com>,
	jlaw@ventanamicro.com, Andrew MacLeod <amacleod@redhat.com>
Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]
Date: Fri, 10 Feb 2023 19:34:37 +0100	[thread overview]
Message-ID: <F6AE5834-ACA9-4FE2-9697-3E20F8A53D47@suse.de> (raw)
In-Reply-To: <mpt1qmx1jbc.fsf@arm.com>



> On 10.02.2023 at 19:12, Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> Tamar Christina <Tamar.Christina@arm.com> writes:
>>> -----Original Message-----
>>> From: Richard Sandiford <richard.sandiford@arm.com>
>>> Sent: Friday, February 10, 2023 4:57 PM
>>> To: Tamar Christina <Tamar.Christina@arm.com>
>>> Cc: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>; nd
>>> <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
>>> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask
>>> by using new optabs [PR108583]
>>> 
>>> Tamar Christina <Tamar.Christina@arm.com> writes:
>>>>> -----Original Message-----
>>>>> From: Richard Sandiford <richard.sandiford@arm.com>
>>>>> Sent: Friday, February 10, 2023 4:25 PM
>>>>> To: Tamar Christina <Tamar.Christina@arm.com>
>>>>> Cc: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>; nd
>>>>> <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
>>>>> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of
>>>>> div-bitmask by using new optabs [PR108583]
>>>>> 
>>>>> Tamar Christina <Tamar.Christina@arm.com> writes:
>>>>>>> -----Original Message-----
>>>>>>> From: Richard Sandiford <richard.sandiford@arm.com>
>>>>>>> Sent: Friday, February 10, 2023 3:57 PM
>>>>>>> To: Tamar Christina <Tamar.Christina@arm.com>
>>>>>>> Cc: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>; nd
>>>>>>> <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
>>>>>>> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of
>>>>>>> div-bitmask by using new optabs [PR108583]
>>>>>>> 
>>>>>>> Tamar Christina <Tamar.Christina@arm.com> writes:
>>>>>>>>>> a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
>>>>>>>>>> index 6934aebc69f231af24668f0a1c3d140e97f55487..e39d7e6b362ef44eb2fc467f3369de2afea139d6 100644
>>>>>>>>>> --- a/gcc/tree-vect-patterns.cc
>>>>>>>>>> +++ b/gcc/tree-vect-patterns.cc
>>>>>>>>>> @@ -3914,12 +3914,82 @@ vect_recog_divmod_pattern (vec_info *vinfo,
>>>>>>>>>>       return pattern_stmt;
>>>>>>>>>>     }
>>>>>>>>>>   else if ((cst = uniform_integer_cst_p (oprnd1))
>>>>>>>>>> -       && targetm.vectorize.can_special_div_by_const (rhs_code, vectype,
>>>>>>>>>> -                              wi::to_wide (cst),
>>>>>>>>>> -                              NULL, NULL_RTX,
>>>>>>>>>> -                              NULL_RTX))
>>>>>>>>>> +       && TYPE_UNSIGNED (itype)
>>>>>>>>>> +       && rhs_code == TRUNC_DIV_EXPR
>>>>>>>>>> +       && vectype
>>>>>>>>>> +       && direct_internal_fn_supported_p (IFN_ADDH, vectype,
>>>>>>>>>> +                          OPTIMIZE_FOR_SPEED))
>>>>>>>>>>     {
>>>>>>>>>> -      return NULL;
>>>>>>>>>> +      /* div optimizations using narrowings
>>>>>>>>>> +       we can do the division e.g. shorts by 255 faster by calculating it as
>>>>>>>>>> +       (x + ((x + 257) >> 8)) >> 8 assuming the operation is done in
>>>>>>>>>> +       double the precision of x.
>>>>>>>>>> +
>>>>>>>>>> +       If we imagine a short as being composed of two blocks of bytes then
>>>>>>>>>> +       adding 257 or 0b0000_0001_0000_0001 to the number is equivalent to
>>>>>>>>>> +       adding 1 to each sub component:
>>>>>>>>>> +
>>>>>>>>>> +        short value of 16-bits
>>>>>>>>>> +       ┌──────────────┬────────────────┐
>>>>>>>>>> +       │              │                │
>>>>>>>>>> +       └──────────────┴────────────────┘
>>>>>>>>>> +     8-bit part1 ▲  8-bit part2   ▲
>>>>>>>>>> +             │                │
>>>>>>>>>> +             │                │
>>>>>>>>>> +            +1               +1
>>>>>>>>>> +
>>>>>>>>>> +       after the first addition, we have to shift right by 8, and narrow the
>>>>>>>>>> +       results back to a byte.  Remember that the addition must be done in
>>>>>>>>>> +       double the precision of the input.  However if we know that the addition
>>>>>>>>>> +       `x + 257` does not overflow then we can do the operation in the current
>>>>>>>>>> +       precision.  In which case we don't need the pack and unpacks.  */
>>>>>>>>>> +      auto wcst = wi::to_wide (cst);
>>>>>>>>>> +      int pow = wi::exact_log2 (wcst + 1);
>>>>>>>>>> +      if (pow == (int) (element_precision (vectype) / 2))
>>>>>>>>>> +    {
>>>>>>>>>> +      wide_int min,max;
>>>>>>>>>> +      /* If we're in a pattern we need to find the orginal definition.  */
>>>>>>>>>> +      tree op0 = oprnd0;
>>>>>>>>>> +      gimple *stmt = SSA_NAME_DEF_STMT (oprnd0);
>>>>>>>>>> +      stmt_vec_info stmt_info = vinfo->lookup_stmt (stmt);
>>>>>>>>>> +      if (is_pattern_stmt_p (stmt_info))
>>>>>>>>>> +        {
>>>>>>>>>> +          auto orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
>>>>>>>>>> +          if (is_gimple_assign (STMT_VINFO_STMT (orig_stmt)))
>>>>>>>>>> +        op0 = gimple_assign_lhs (STMT_VINFO_STMT (orig_stmt));
>>>>>>>>>> +        }
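As a quick aside on the arithmetic described in that comment: the identity is easy to check exhaustively with a throwaway program along the lines of the sketch below.  This is only an illustration (not part of the patch), and it checks the "double the precision" form, i.e. the intermediate arithmetic is done in 32 bits for a 16-bit input.

#include <assert.h>
#include <stdint.h>

int
main (void)
{
  /* For every 16-bit x, x / 0xff equals (x + ((x + 257) >> 8)) >> 8
     as long as the intermediate additions are carried out in wider
     (here 32-bit) precision.  */
  for (uint32_t x = 0; x <= UINT16_MAX; x++)
    assert (((x + ((x + 257) >> 8)) >> 8) == x / 0xffu);
  return 0;
}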
>>>>>>>>> 
>>>>>>>>> If this is generally safe (I'm skipping thinking about it in
>>>>>>>>> the interests of a quick review :-)), then I think it should be
>>>>>>>>> done in vect_get_range_info instead.  Using gimple_get_lhs
>>>>>>>>> would be more general than handling just assignments.
>>>>>>>>> 
>>>>>>>>>> +
>>>>>>>>>> +      /* Check that no overflow will occur.  If we don't have range
>>>>>>>>>> +         information we can't perform the optimization.  */
>>>>>>>>>> +      if (vect_get_range_info (op0, &min, &max))
>>>>>>>>>> +        {
>>>>>>>>>> +          wide_int one = wi::to_wide (build_one_cst (itype));
>>>>>>>>>> +          wide_int adder = wi::add (one, wi::lshift (one, pow));
>>>>>>>>>> +          wi::overflow_type ovf;
>>>>>>>>>> +          /* We need adder and max in the same precision.  */
>>>>>>>>>> +          wide_int zadder
>>>>>>>>>> +        = wide_int_storage::from (adder, wi::get_precision (max),
>>>>>>>>>> +                      UNSIGNED);
>>>>>>>>>> +          wi::add (max, zadder, UNSIGNED, &ovf);
>>>>>>>>> 
>>>>>>>>> Could you explain this a bit more?  When do we have mismatched
>>>>>>>>> precisions?
>>>>>>>> 
>>>>>>>> C promotion rules will promote e.g.
>>>>>>>> 
>>>>>>>> void fun2(uint8_t* restrict pixel, uint8_t level, int n) {
>>>>>>>>  for (int i = 0; i < n; i+=1)
>>>>>>>>    pixel[i] = (pixel[i] + level) / 0xff; }
>>>>>>>> 
>>>>>>>> And have the addition be done as a 32-bit integer.  The
>>>>>>>> vectorizer will demote this down to a short, but range
>>>>>>>> information is not stored for patterns.  So in the above the
>>>>>>>> range will correctly be 0x1fe but the precision will be that of
>>>>>>>> the original expression, so 32.  This will be a mismatch with
>>>>>>>> itype, which is derived from the size the vectorizer will
>>>>>>>> perform the operation in.
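To make the no-overflow reasoning concrete: with the 0x1fe bound, the whole computation fits in 16 bits, so the demoted form the vectorizer is after could be written by hand roughly as below.  This is an illustrative sketch only; fun2_narrow is a made-up name and the explicit casts merely spell out the narrower types.

#include <stdint.h>

void
fun2_narrow (uint8_t *pixel, uint8_t level, int n)
{
  for (int i = 0; i < n; i += 1)
    {
      /* pixel[i] + level is at most 0xff + 0xff = 0x1fe, so both the sum
	 and sum + 0x101 (at most 0x2ff) fit in 16 bits, and the division
	 by 0xff can be done without widening to 32 bits.  */
      uint16_t sum = (uint16_t) ((uint16_t) pixel[i] + level);
      pixel[i] = (uint8_t) ((sum + ((sum + 0x101) >> 8)) >> 8);
    }
}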
>>>>>>> 
>>>>>>> Gah, missed this first time round, sorry.
>>>>>>> 
>>>>>>> Richi would know better than me, but I think it's dangerous to
>>>>>>> rely on the orig/pattern link for range information.  The end
>>>>>>> result of a pattern
>>>>>>> (vect_stmt_to_vectorize) has to have the same type as the lhs of
>>>>>>> the original statement.  But the other statements in the pattern
>>>>>>> sequence can do arbitrary things.  Their range isn't predictable
>>>>>>> from the range of the original statement result.
>>>>>>> 
>>>>>>> IIRC, the addition above is converted to:
>>>>>>> 
>>>>>>>  a' = (uint16_t) pixel[i]
>>>>>>>  b' = (uint16_t) level
>>>>>>>  sum' = a' + b'
>>>>>>>  sum = (int) sum'
>>>>>>> 
>>>>>>> where sum is the direct replacement of "pixel[i] + level", with
>>>>>>> the same type and range.  The division then uses sum' instead of sum.
>>>>>>> 
>>>>>>> But the fact that sum' is part of the same pattern as sum doesn't
>>>>>>> guarantee that sum' has the same range as sum.  E.g. the pattern
>>>>>>> statements added by the division optimisation wouldn't have this
>>>>> property.
>>>>>> 
>>>>>> So my assumption is that no pattern would replace a statement with
>>>>>> something that has higher precision than the C statement.  The
>>>>>> pattern above is demoted by the vectorizer based on range information
>>>>>> already.
>>>>>> My assumption was that the precision can only ever be smaller,
>>>>>> because otherwise the pattern has violated the semantics of the C
>>>>>> code, which would be dangerous if e.g. the expression escapes?
>>>>> 
>>>>> IMO the difference in precisions was a symptom of the problem rather
>>>>> than the direct cause.
>>>>> 
>>>>> The point is more that "B = vect_orig_stmt(A)" just says "A is used
>>>>> somehow in a new calculation of B".  A might equal B (if A replaces
>>>>> B), or A might be an arbitrary temporary result.  The code above is
>>>>> instead using it to mean "A equals B, expressed in a different type".
>>>>> That happens to be true for sum' in the sequence above, but it isn't
>>>>> true of non-final pattern statements in general.
>>>>> 
>>>> 
>>>> Sorry for being dense, but I thought that's exactly what the code does
>>>> and what I tried to explain before.  If B isn't a final statement then it
>>>> won't have an original statement.
>>>> AFAIK, the only place we set the original statement is the root of the
>>>> pattern expression.
>>> 
>>> Final pattern statements (those not in DEF_SEQ) always have the same type
>>> and value as the original statements.  We wouldn't see mismatched
>>> precisions if we were only looking at final pattern statements.
>> 
>> We would because the entire problem is that pattern statements have no ranges.
>> Ranger does not track them after they have been created.  This could of course
>> trivially be solved if we tell ranger about the demotion we did, but we don't do
>> so at the moment.  It will just return varying here.  This is the root cause of the issue.
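For what it's worth, "tell ranger about the demotion" would presumably amount to a fragment like the one below at the point where the over-widening pattern creates the narrowed statement.  This is only a sketch: new_name stands for the freshly created pattern SSA name, [0, 0x1fe] is the range from the example above, and the int_range / set_range_info signatures are written from memory, so treat them as assumptions rather than the exact current API.

  /* Record the known range of the demoted value on its SSA name so that
     a later range query does not degrade to VARYING.  */
  int_range<1> r (build_int_cst (itype, 0), build_int_cst (itype, 0x1fe));
  set_range_info (new_name, r);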
>> 
>>> 
>>> Like you say, the 16-bit addition didn't exist before vectorisation (it was a 32-
>>> bit addition instead).  So to make things type-correct, the 32-bit addition:
>>> 
>>>   A: sum = a + b           (STMT_VINFO_RELATED_STMT == A2)
>>> 
>>> is replaced with:
>>> 
>>>   DEF_SEQ:
>>>     A1: tmp = a' + b'      (STMT_VINFO_RELATED_STMT == A)
>>>   A2: sum' = (int) tmp     (STMT_VINFO_RELATED_STMT == A)
>>> 
>>> (using different notation from before, just to confuse things).
>>> Here, A2 is the final pattern statement for A and A1 is just a temporary result.
>>> sum == sum'.
>>> 
>>> Later, we do a similar thing for the division itself.  We have:
>>> 
>>>   B: quotient = sum / 0xff            (STMT_VINFO_RELATED_STMT == B2)
>>> 
>>> We realise that this can be a 16-bit division, so (IIRC) we use
>>> vect_look_through_possible_promotion on sum to find the best starting
>>> point.  This should give:
>>> 
>>>   DEF_SEQ:
>>>     B1: tmp2 = tmp / (uint16_t) 0xff  (STMT_VINFO_RELATED_STMT == B)
>>>   B2: quotient' = (int) tmp2          (STMT_VINFO_RELATED_STMT == B)
>>> 
>>> Both changes are done by vect_widened_op_tree.
>>> 
>>> We then apply the division pattern to B1.  B1 is a nonfinal pattern statement
>>> that uses the result (tmp) of another nonfinal pattern statement (A1).
>>> 
>>> The code does:
>>> 
>>>      if (is_pattern_stmt_p (stmt_info))
>>>        {
>>>          auto orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
>>>          if (is_gimple_assign (STMT_VINFO_STMT (orig_stmt)))
>>>        op0 = gimple_assign_lhs (STMT_VINFO_STMT (orig_stmt));
>>>        }
>>> 
>>> is_pattern_stmt_p is true for both A1 and A2, and
>>> STMT_VINFO_RELATED_STMT is A for both A1 and A2.  I would expect:
>>> 
>>>  gcc_assert (stmt_info == vect_stmt_to_vectorize (orig_stmt));
>>> 
>>> (testing for a final pattern) to fail for the motivating example.
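Put differently (reading the suggestion back as code; this is an untested sketch, not a proposed patch), the lookup would only be conservatively safe if it were guarded so that the original statement is consulted solely for final pattern statements:

      if (is_pattern_stmt_p (stmt_info))
	{
	  stmt_vec_info orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
	  /* Only the final pattern statement is guaranteed to have the same
	     value and type as the original scalar statement.  */
	  if (stmt_info == vect_stmt_to_vectorize (orig_stmt)
	      && is_gimple_assign (STMT_VINFO_STMT (orig_stmt)))
	    op0 = gimple_assign_lhs (STMT_VINFO_STMT (orig_stmt));
	}

and, as noted above, that guard would not hold for the motivating example.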
>>> 
>> 
>> I think we're actually saying the same thing. I believe all I'm saying is that looking
>> at the original statement is a safe alternative as it conservatively will overestimate
>> to VARYING or give a wider range than the pattern would have.
> 
> Hmm, but you said "If B isn't a final statement than it won't have an
> original statement. AFAIK, the only places we set original statement
> is the root of the pattern expression."  My point was that that isn't true.
> All statements in the pattern have an original statement, not just the root.
> And we're specifically relying on that for the motivating example to work.
> 
>> I'm saying it's conservatively safe, while not overly accurate.  The alternative would be
>> to tell ranger about the demotions in vect_recog_over_widening_pattern using range::set.
>> 
>> But for this to work, the general widening pattern also has to update the range information.
>> 
>> I think where we're disagreeing is that I consider looking at the original scalar statement a safe
>> conservative estimate.  It will fail in some cases, but that's a missed optimization, not a mis-optimization.
> 
> Yeah, like you say, I disagree that it's conservatively correct.
> It means that we're hoping (without proving) that the only things
> between stmt_info and the final pattern statement are conversions.
> I don't think there's any reason in principle why that must hold.
> 
> What would be conservatively correct would be to start from the
> final pattern statement and work our way down to the value that
> is actually being used.  That seems a bit convoluted though,
> so I'd prefer not to do that...
> 
>> In any case, if you disagree I don't really see a way forward aside from making this its own pattern
>> and running it before the overwidening pattern.
> 
> I think we should look to see if ranger can be persuaded to provide the
> range of the 16-bit addition, even though the statement that produces it
> isn't part of a BB.  It shouldn't matter that the addition originally
> came from a 32-bit one: the range follows directly from the ranges of
> the operands (i.e. the fact that the operands are the results of
> widening conversions).

I think you can ask ranger about operations on names defined in the IL, so you can work your way through the sequence of operations in the pattern sequence to compute ranges on their defs (and possibly even store them in the SSA info).  You just need to pick the correct ranger API for this.  Andrew CCed.
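Concretely, something along the lines of the sketch below might work (untested, and the API names and signatures are from memory, so take them as assumptions to be checked against the current ranger / range-op interfaces; pattern_stmt stands for a statement taken from the pattern definition sequence):

  /* Compute a range for the def of a pattern statement from the ranges of
     its operands, which are names defined in the IL and therefore known to
     ranger, instead of going through the original scalar statement.  */
  int_range_max r, r0, r1;
  range_query *q = get_range_query (cfun);
  if (gassign *assign = dyn_cast <gassign *> (pattern_stmt))
    if (q->range_of_expr (r0, gimple_assign_rhs1 (assign))
	&& q->range_of_expr (r1, gimple_assign_rhs2 (assign)))
      {
	tree type = TREE_TYPE (gimple_assign_lhs (assign));
	range_op_handler handler (gimple_assign_rhs_code (assign), type);
	if (handler && handler.fold_range (r, type, r0, r1))
	  {
	    /* r now over-approximates the value of the pattern def and could
	       be recorded, e.g. in the SSA range info.  */
	  }
      }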

Richard 

> Thanks,
> Richard

Thread overview: 47+ messages
2023-02-09 17:16 Tamar Christina
2023-02-09 17:22 ` [PATCH 2/2]AArch64 Update div-bitmask to implement new optab instead of target hook [PR108583] Tamar Christina
2023-02-10 10:35   ` Tamar Christina
2023-02-10 14:10   ` Richard Sandiford
2023-02-10 10:34 ` [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583] Tamar Christina
2023-02-10 13:13 ` Richard Biener
2023-02-10 13:36 ` Richard Sandiford
2023-02-10 13:52   ` Richard Biener
2023-02-10 14:13   ` Tamar Christina
2023-02-10 14:30     ` Richard Sandiford
2023-02-10 14:54       ` Tamar Christina
2023-02-27 11:09       ` Tamar Christina
2023-02-27 12:11         ` Richard Sandiford
2023-02-27 12:14           ` Tamar Christina
2023-02-27 21:33             ` Richard Sandiford
2023-02-27 22:10               ` Tamar Christina
2023-02-28 11:08                 ` Richard Sandiford
2023-02-28 11:12                   ` Tamar Christina
2023-02-28 12:03                     ` Richard Sandiford
2023-03-01 11:30                       ` Richard Biener
2023-02-10 15:56     ` Richard Sandiford
2023-02-10 16:09       ` Tamar Christina
2023-02-10 16:25         ` Richard Sandiford
2023-02-10 16:33           ` Tamar Christina
2023-02-10 16:57             ` Richard Sandiford
2023-02-10 17:01               ` Richard Sandiford
2023-02-10 17:14               ` Tamar Christina
2023-02-10 18:12                 ` Richard Sandiford
2023-02-10 18:34                   ` Richard Biener [this message]
2023-02-10 20:58                     ` Andrew MacLeod
2023-02-13  9:54                       ` Tamar Christina
2023-02-15 12:51                         ` Tamar Christina
2023-02-15 16:05                           ` Andrew MacLeod
2023-02-15 17:13                             ` Tamar Christina
2023-02-15 17:50                               ` Andrew MacLeod
2023-02-15 18:42                                 ` Andrew MacLeod
2023-02-22 12:51                                   ` Tamar Christina
2023-02-22 16:41                                   ` Andrew MacLeod
2023-02-22 18:03                                     ` Tamar Christina
2023-02-22 18:33                                       ` Andrew MacLeod
2023-02-23  8:36                                         ` Tamar Christina
2023-02-23 16:39                                           ` Andrew MacLeod
2023-02-23 16:56                                             ` Tamar Christina
2023-03-01 16:57                                             ` Andrew Carlotti
2023-03-01 18:16                                               ` Tamar Christina
2023-02-22 13:06                                 ` Tamar Christina
2023-02-22 15:19                                   ` Andrew MacLeod
