From: Richard Sandiford <richard.sandiford@arm.com>
To: Tamar Christina <Tamar.Christina@arm.com>
Cc: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>,
	nd <nd@arm.com>, "rguenther@suse.de" <rguenther@suse.de>,
	"jlaw@ventanamicro.com" <jlaw@ventanamicro.com>
Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]
Date: Fri, 10 Feb 2023 16:57:28 +0000 [thread overview]
Message-ID: <mptsffd1mrr.fsf@arm.com> (raw)
In-Reply-To: <VI1PR08MB5325AF01FE7CF4FD9464E48AFFDE9@VI1PR08MB5325.eurprd08.prod.outlook.com> (Tamar Christina's message of "Fri, 10 Feb 2023 16:33:20 +0000")
Tamar Christina <Tamar.Christina@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Friday, February 10, 2023 4:25 PM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>; nd
>> <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
>> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask
>> by using new optabs [PR108583]
>>
>> Tamar Christina <Tamar.Christina@arm.com> writes:
>> >> -----Original Message-----
>> >> From: Richard Sandiford <richard.sandiford@arm.com>
>> >> Sent: Friday, February 10, 2023 3:57 PM
>> >> To: Tamar Christina <Tamar.Christina@arm.com>
>> >> Cc: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>; nd
>> >> <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
>> >> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of
>> >> div-bitmask by using new optabs [PR108583]
>> >>
>> >> Tamar Christina <Tamar.Christina@arm.com> writes:
>> >> >> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
>> >> >> > index 6934aebc69f231af24668f0a1c3d140e97f55487..e39d7e6b362ef44eb2fc467f3369de2afea139d6 100644
>> >> >> > --- a/gcc/tree-vect-patterns.cc
>> >> >> > +++ b/gcc/tree-vect-patterns.cc
>> >> >> > @@ -3914,12 +3914,82 @@ vect_recog_divmod_pattern (vec_info *vinfo,
>> >> >> >        return pattern_stmt;
>> >> >> >      }
>> >> >> >    else if ((cst = uniform_integer_cst_p (oprnd1))
>> >> >> > -	   && targetm.vectorize.can_special_div_by_const (rhs_code, vectype,
>> >> >> > -							  wi::to_wide (cst),
>> >> >> > -							  NULL, NULL_RTX,
>> >> >> > -							  NULL_RTX))
>> >> >> > +	   && TYPE_UNSIGNED (itype)
>> >> >> > +	   && rhs_code == TRUNC_DIV_EXPR
>> >> >> > +	   && vectype
>> >> >> > +	   && direct_internal_fn_supported_p (IFN_ADDH, vectype,
>> >> >> > +					      OPTIMIZE_FOR_SPEED))
>> >> >> >     {
>> >> >> > -      return NULL;
>> >> >> > +      /* div optimizations using narrowings
>> >> >> > +	 we can do the division e.g. shorts by 255 faster by calculating it as
>> >> >> > +	 (x + ((x + 257) >> 8)) >> 8 assuming the operation is done in
>> >> >> > +	 double the precision of x.
>> >> >> > +
>> >> >> > +	 If we imagine a short as being composed of two blocks of bytes then
>> >> >> > +	 adding 257 or 0b0000_0001_0000_0001 to the number is equivalent to
>> >> >> > +	 adding 1 to each sub component:
>> >> >> > +
>> >> >> > +	      short value of 16-bits
>> >> >> > +	 ┌──────────────┬────────────────┐
>> >> >> > +	 │              │                │
>> >> >> > +	 └──────────────┴────────────────┘
>> >> >> > +	   8-bit part1 ▲  8-bit part2   ▲
>> >> >> > +	               │                │
>> >> >> > +	               │                │
>> >> >> > +	              +1               +1
>> >> >> > +
>> >> >> > +	 after the first addition, we have to shift right by 8, and narrow the
>> >> >> > +	 results back to a byte.  Remember that the addition must be done in
>> >> >> > +	 double the precision of the input.  However if we know that the addition
>> >> >> > +	 `x + 257` does not overflow then we can do the operation in the current
>> >> >> > +	 precision.  In which case we don't need the pack and unpacks.  */
>> >> >> > +      auto wcst = wi::to_wide (cst);
>> >> >> > +      int pow = wi::exact_log2 (wcst + 1);
>> >> >> > +      if (pow == (int) (element_precision (vectype) / 2))
>> >> >> > +	{
>> >> >> > +	  wide_int min, max;
>> >> >> > +	  /* If we're in a pattern we need to find the original
>> >> >> > +	     definition.  */
>> >> >> > +	  tree op0 = oprnd0;
>> >> >> > +	  gimple *stmt = SSA_NAME_DEF_STMT (oprnd0);
>> >> >> > +	  stmt_vec_info stmt_info = vinfo->lookup_stmt (stmt);
>> >> >> > +	  if (is_pattern_stmt_p (stmt_info))
>> >> >> > +	    {
>> >> >> > +	      auto orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
>> >> >> > +	      if (is_gimple_assign (STMT_VINFO_STMT (orig_stmt)))
>> >> >> > +		op0 = gimple_assign_lhs (STMT_VINFO_STMT (orig_stmt));
>> >> >> > +	    }
>> >> >>
>> >> >> If this is generally safe (I'm skipping thinking about it in the
>> >> >> interests of a quick review :-)), then I think it should be done
>> >> >> in vect_get_range_info instead. Using gimple_get_lhs would be
>> >> >> more general than handling just assignments.
>> >> >>
>> >> >> > +
>> >> >> > +	  /* Check that no overflow will occur.  If we don't have range
>> >> >> > +	     information we can't perform the optimization.  */
>> >> >> > +	  if (vect_get_range_info (op0, &min, &max))
>> >> >> > +	    {
>> >> >> > +	      wide_int one = wi::to_wide (build_one_cst (itype));
>> >> >> > +	      wide_int adder = wi::add (one, wi::lshift (one, pow));
>> >> >> > +	      wi::overflow_type ovf;
>> >> >> > +	      /* We need adder and max in the same precision.  */
>> >> >> > +	      wide_int zadder
>> >> >> > +		= wide_int_storage::from (adder, wi::get_precision (max),
>> >> >> > +					  UNSIGNED);
>> >> >> > +	      wi::add (max, zadder, UNSIGNED, &ovf);
>> >> >>
>> >> >> Could you explain this a bit more? When do we have mismatched
>> >> >> precisions?
>> >> >
>> >> > C promotion rules will promote e.g.
>> >> >
>> >> > void fun2(uint8_t* restrict pixel, uint8_t level, int n)
>> >> > {
>> >> >   for (int i = 0; i < n; i += 1)
>> >> >     pixel[i] = (pixel[i] + level) / 0xff;
>> >> > }
>> >> >
>> >> > And have the addition be done as a 32-bit integer.  The vectorizer
>> >> > will demote this down to a short, but range information is not
>> >> > stored for patterns.  So in the above the range will correctly be
>> >> > 0x1fe but the precision will be that of the original expression, so
>> >> > 32.  This will be a mismatch with itype, which is derived from the
>> >> > size the vectorizer will perform the operation in.
>> >>
>> >> Gah, missed this first time round, sorry.
>> >>
>> >> Richi would know better than me, but I think it's dangerous to rely
>> >> on the orig/pattern link for range information. The end result of a
>> >> pattern
>> >> (vect_stmt_to_vectorize) has to have the same type as the lhs of the
>> >> original statement. But the other statements in the pattern sequence
>> >> can do arbitrary things. Their range isn't predictable from the
>> >> range of the original statement result.
>> >>
>> >> IIRC, the addition above is converted to:
>> >>
>> >> a' = (uint16_t) pixel[i]
>> >> b' = (uint16_t) level
>> >> sum' = a' + b'
>> >> sum = (int) sum'
>> >>
>> >> where sum is the direct replacement of "pixel[i] + level", with the
>> >> same type and range. The division then uses sum' instead of sum.
>> >>
>> >> But the fact that sum' is part of the same pattern as sum doesn't
>> >> guarantee that sum' has the same range as sum. E.g. the pattern
>> >> statements added by the division optimisation wouldn't have this
>> property.
>> >
>> > So my assumption is that no pattern would replace a statement with
>> > something that has higher precision than the C statement.  The pattern
>> > above is demoted by the vectorizer based on range information already.
>> > My assumption was that the precision can only ever be smaller, because
>> > otherwise the pattern has violated the semantics of the C code, which
>> > would be dangerous if e.g. the expression escapes?
>>
>> IMO the difference in precisions was a symptom of the problem rather than
>> the direct cause.
>>
>> The point is more that "B = vect_orig_stmt(A)" just says "A is used somehow
>> in a new calculation of B". A might equal B (if A replaces B), or A might be an
>> arbitrary temporary result. The code above is instead using it to mean "A
>> equals B, expressed in a different type". That happens to be true for sum' in
>> the sequence above, but it isn't true of non-final pattern statements in
>> general.
>>
>
> Sorry for being dense, but I thought that's exactly what the code does and what I
> tried to explain before.  If B isn't a final statement then it won't have an original
> statement.  AFAIK, the only place we set the original statement is the root of the
> pattern expression.
Final pattern statements (those not in DEF_SEQ) always have the same
type and value as the original statements. We wouldn't see mismatched
precisions if we were only looking at final pattern statements.
Like you say, the 16-bit addition didn't exist before vectorisation
(it was a 32-bit addition instead). So to make things type-correct,
the 32-bit addition:
A: sum = a + b (STMT_VINFO_RELATED_STMT == A2)
is replaced with:
DEF_SEQ:
A1: tmp = a' + b' (STMT_VINFO_RELATED_STMT == A)
A2: sum' = (int) tmp (STMT_VINFO_RELATED_STMT == A)
(using different notation from before, just to confuse things).
Here, A2 is the final pattern statement for A and A1 is just a
temporary result. sum == sum'.
Later, we do a similar thing for the division itself. We have:
B: quotient = sum / 0xff (STMT_VINFO_RELATED_STMT == B2)
We realise that this can be a 16-bit division, so (IIRC) we use
vect_look_through_possible_promotion on sum to find the best
starting point. This should give:
DEF_SEQ:
B1: tmp2 = tmp / (uint16_t) 0xff (STMT_VINFO_RELATED_STMT == B)
B2: quotient' = (int) tmp2 (STMT_VINFO_RELATED_STMT == B)
Both changes are done by vect_widened_op_tree.
We then apply the division pattern to B1. B1 is a nonfinal pattern
statement that uses the result (tmp) of another nonfinal pattern
statement (A1).
The code does:
if (is_pattern_stmt_p (stmt_info))
{
auto orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
if (is_gimple_assign (STMT_VINFO_STMT (orig_stmt)))
op0 = gimple_assign_lhs (STMT_VINFO_STMT (orig_stmt));
}
is_pattern_stmt_p is true for both A1 and A2, and STMT_VINFO_RELATED_STMT
is A for both A1 and A2. I would expect:
gcc_assert (stmt_info == vect_stmt_to_vectorize (orig_stmt));
(testing for a final pattern) to fail for the motivating example.
Thanks,
Richard
Thread overview: 47+ messages
2023-02-09 17:16 Tamar Christina
2023-02-09 17:22 ` [PATCH 2/2]AArch64 Update div-bitmask to implement new optab instead of target hook [PR108583] Tamar Christina
2023-02-10 10:35 ` Tamar Christina
2023-02-10 14:10 ` Richard Sandiford
2023-02-10 10:34 ` [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583] Tamar Christina
2023-02-10 13:13 ` Richard Biener
2023-02-10 13:36 ` Richard Sandiford
2023-02-10 13:52 ` Richard Biener
2023-02-10 14:13 ` Tamar Christina
2023-02-10 14:30 ` Richard Sandiford
2023-02-10 14:54 ` Tamar Christina
2023-02-27 11:09 ` Tamar Christina
2023-02-27 12:11 ` Richard Sandiford
2023-02-27 12:14 ` Tamar Christina
2023-02-27 21:33 ` Richard Sandiford
2023-02-27 22:10 ` Tamar Christina
2023-02-28 11:08 ` Richard Sandiford
2023-02-28 11:12 ` Tamar Christina
2023-02-28 12:03 ` Richard Sandiford
2023-03-01 11:30 ` Richard Biener
2023-02-10 15:56 ` Richard Sandiford
2023-02-10 16:09 ` Tamar Christina
2023-02-10 16:25 ` Richard Sandiford
2023-02-10 16:33 ` Tamar Christina
2023-02-10 16:57 ` Richard Sandiford [this message]
2023-02-10 17:01 ` Richard Sandiford
2023-02-10 17:14 ` Tamar Christina
2023-02-10 18:12 ` Richard Sandiford
2023-02-10 18:34 ` Richard Biener
2023-02-10 20:58 ` Andrew MacLeod
2023-02-13 9:54 ` Tamar Christina
2023-02-15 12:51 ` Tamar Christina
2023-02-15 16:05 ` Andrew MacLeod
2023-02-15 17:13 ` Tamar Christina
2023-02-15 17:50 ` Andrew MacLeod
2023-02-15 18:42 ` Andrew MacLeod
2023-02-22 12:51 ` Tamar Christina
2023-02-22 16:41 ` Andrew MacLeod
2023-02-22 18:03 ` Tamar Christina
2023-02-22 18:33 ` Andrew MacLeod
2023-02-23 8:36 ` Tamar Christina
2023-02-23 16:39 ` Andrew MacLeod
2023-02-23 16:56 ` Tamar Christina
2023-03-01 16:57 ` Andrew Carlotti
2023-03-01 18:16 ` Tamar Christina
2023-02-22 13:06 ` Tamar Christina
2023-02-22 15:19 ` Andrew MacLeod