public inbox for gcc-patches@gcc.gnu.org
From: Richard Sandiford <richard.sandiford@arm.com>
To: Tamar Christina <Tamar.Christina@arm.com>
Cc: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>,
	 nd <nd@arm.com>,  "rguenther@suse.de" <rguenther@suse.de>,
	 "jlaw@ventanamicro.com" <jlaw@ventanamicro.com>
Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]
Date: Tue, 28 Feb 2023 12:03:24 +0000	[thread overview]
Message-ID: <mptlekingj7.fsf@arm.com> (raw)
In-Reply-To: <VI1PR08MB5325CBCD5D554B714AD922BAFFAC9@VI1PR08MB5325.eurprd08.prod.outlook.com> (Tamar Christina's message of "Tue, 28 Feb 2023 11:12:43 +0000")

Tamar Christina <Tamar.Christina@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Tuesday, February 28, 2023 11:09 AM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>; nd
>> <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
>> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask
>> by using new optabs [PR108583]
>> 
>> Tamar Christina <Tamar.Christina@arm.com> writes:
>> >> -----Original Message-----
>> >> From: Richard Sandiford <richard.sandiford@arm.com>
>> >> Sent: Monday, February 27, 2023 9:33 PM
>> >> To: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>
>> >> Cc: Tamar Christina <Tamar.Christina@arm.com>; nd <nd@arm.com>;
>> >> rguenther@suse.de; jlaw@ventanamicro.com
>> >> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of
>> >> div-bitmask by using new optabs [PR108583]
>> >>
>> >> Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> >> >> -----Original Message-----
>> >> >> From: Richard Sandiford <richard.sandiford@arm.com>
>> >> >> Sent: Monday, February 27, 2023 12:12 PM
>> >> >> To: Tamar Christina <Tamar.Christina@arm.com>
>> >> >> Cc: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>; nd
>> >> >> <nd@arm.com>; rguenther@suse.de; jlaw@ventanamicro.com
>> >> >> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of
>> >> >> div-bitmask by using new optabs [PR108583]
>> >> >>
>> >> >> Tamar Christina <Tamar.Christina@arm.com> writes:
>> >> >> > Hi,
>> >> >> >
>> >> >> >> > I avoided open coding it with add and shift because it creates
>> >> >> >> > a dependency chain of 4 instructions (including shifts, which
>> >> >> >> > are typically slow) instead of a load and multiply.  This
>> >> >> >> > change is unlikely to be beneficial unless the target is known
>> >> >> >> > to optimize it further.  And by the time we get to costing, the
>> >> >> >> > only alternative is to undo the existing pattern, and so you
>> >> >> >> > lose the general shift optimization.
>> >> >> >> >
>> >> >> >> > So it seemed unwise to open code it as shifts, given that the
>> >> >> >> > codegen out of the vectorizer would be degenerate for most
>> >> >> >> > targets, or one would need the more complicated route of
>> >> >> >> > costing during pattern matching.
>> >> >> >>
>> >> >> >> Hmm, OK.  That seems like a cost-model thing though, rather
>> >> >> >> than something that should be exposed through optabs.  And I
>> >> >> >> imagine the open-coded version would still be better than
>> >> >> >> nothing on targets without highpart multiply.
>> >> >> >>
>> >> >> >> So how about replacing the hook with one that simply asks
>> >> >> >> whether division through highpart multiplication is preferred
>> >> >> >> over the add/shift sequence?
>> >> >> >> (Unfortunately it's not going to be possible to work that out
>> >> >> >> from existing
>> >> >> >> information.)
>> >> >> >
>> >> >> > So this doesn't work for SVE.  For SVE the multiplication
>> >> >> > widening pass introduces FMAs at gimple level.  So in the cases
>> >> >> > where the operation is fed from a widening multiplication we end
>> >> >> > up generating FMA.  If that was it I could have matched FMA.
>> >> >> >
>> >> >> > But it also pushes the multiplication in the second operand
>> >> >> > because it no longer has a mul to share the results with.
>> >> >> >
>> >> >> > In any case, the gimple code is transformed into
>> >> >> >
>> >> >> > vect__3.8_122 = .MASK_LOAD (_29, 8B, loop_mask_121);
>> >> >> > vect_patt_57.9_123 = (vector([8,8]) unsigned short) vect__3.8_122;
>> >> >> > vect_patt_64.11_127 = .FMA (vect_patt_57.9_123, vect_cst__124, { 257, ... });
>> >> >> > vect_patt_65.12_128 = vect_patt_64.11_127 >> 8;
>> >> >> > vect_patt_66.13_129 = .FMA (vect_patt_57.9_123, vect_cst__124, vect_patt_65.12_128);
>> >> >> > vect_patt_62.14_130 = vect_patt_66.13_129 >> 8;
>> >> >> > vect_patt_68.15_131 = (vector([8,8]) unsigned charD.21) vect_patt_62.14_130;
>> >> >> >
>> >> >> > This transformation is much worse than the original code: it
>> >> >> > extends the dependency chain with another expensive instruction.
>> >> >> > I can try to correct this in RTL by matching the FMA and shift
>> >> >> > and splitting them into MUL + ADDHNB, hoping CSE takes care of
>> >> >> > the extra mul.
>> >> >> >
>> >> >> > But this seems like a hack, and it's basically undoing the
>> >> >> > earlier transformation.  It seems to me that the open coding is a
>> >> >> > bad idea.
>> >> >>
>> >> >> Could you post the patch that gives this result?  I'll have a poke around.
>> >> >
>> >> > Sure, I'll post the new series, it needs all of them.
>> >>
>> >> Thanks.  Which testcase did you use to get the above?
>> >>
>> >
>> > #include <stdint.h>
>> >
>> > #define N 16
>> > #define TYPE uint8_t
>> >
>> > void fun3(TYPE* restrict pixel, TYPE level, int n)
>> > {
>> >   for (int i = 0; i < (n & -16); i += 1)
>> >     pixel[i] = (pixel[i] * level) / 0xff;
>> > }
>> 
>> Thanks.  In that testcase, isn't the FMA handling an anti-optimisation in its
>> own right though?  It's duplicating a multiplication into two points on a
>> dependency chain.
>
> Most definitely, that's what I meant above. The "optimization" doesn't take into
> account the effect on the rest of the chain.
>
>> 
>> E.g. for:
>> 
>> unsigned int
>> f1 (unsigned int a, unsigned int b, unsigned int c)
>> {
>>   unsigned int d = a * b;
>>   return d + ((c + d) >> 1);
>> }
>> unsigned int
>> g1 (unsigned int a, unsigned int b, unsigned int c)
>> {
>>   return a * b + c;
>> }
>> 
>> __Uint32x4_t
>> f2 (__Uint32x4_t a, __Uint32x4_t b, __Uint32x4_t c)
>> {
>>   __Uint32x4_t d = a * b;
>>   return d + ((c + d) >> 1);
>> }
>> __Uint32x4_t
>> g2 (__Uint32x4_t a, __Uint32x4_t b, __Uint32x4_t c)
>> {
>>   return a * b + c;
>> }
>> 
>> typedef unsigned int vec __attribute__((vector_size(32)));
>> vec
>> f3 (vec a, vec b, vec c)
>> {
>>   vec d = a * b;
>>   return d + ((c + d) >> 1);
>> }
>> vec
>> g3 (vec a, vec b, vec c)
>> {
>>   return a * b + c;
>> }
>> 
>> compiled with -O2 -msve-vector-bits=256 -march=armv8.2-a+sve, all the g
>> functions use multiply-add (as expected), but the f functions are:
>> 
>> f1:
>>         mul     w1, w0, w1
>>         add     w0, w1, w2
>>         add     w0, w1, w0, lsr 1
>>         ret
>> 
>> f2:
>>         mul     v0.4s, v0.4s, v1.4s
>>         add     v2.4s, v0.4s, v2.4s
>>         usra    v0.4s, v2.4s, 1
>>         ret
>> 
>> f3:
>>         ...
>>         mla     z0.s, p0/m, z1.s, z2.s
>>         lsr     z0.s, z0.s, #1
>>         mad     z1.s, p0/m, z2.s, z0.s
>>         ...
>> 
>> What we do for f3 doesn't seem like a good idea.
>
> Agreed, I guess this means I have to fix that as well? ☹
>
> I'll go take a look then...

How about something like this, before the main loop in
convert_mult_to_fma:

  /* There is no numerical difference between fused and unfused integer FMAs,
     and the assumption below that FMA is as cheap as addition is unlikely
     to be true, especially if the multiplication occurs multiple times on
     the same chain.  E.g., for something like:

         (((a * b) + c) >> 1) + (a * b)

     we do not want to duplicate the a * b into two additions, not least
     because the result is not a natural FMA chain.  */
  if (ANY_INTEGRAL_TYPE_P (type)
      && !has_single_use (mul_result))
    return false;

?  Richi, would that be OK with you?

From a quick check, it passes the aarch64-sve{,2}.exp tests.

Thanks,
Richard


Thread overview: 47+ messages
2023-02-09 17:16 Tamar Christina
2023-02-09 17:22 ` [PATCH 2/2]AArch64 Update div-bitmask to implement new optab instead of target hook [PR108583] Tamar Christina
2023-02-10 10:35   ` Tamar Christina
2023-02-10 14:10   ` Richard Sandiford
2023-02-10 10:34 ` [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583] Tamar Christina
2023-02-10 13:13 ` Richard Biener
2023-02-10 13:36 ` Richard Sandiford
2023-02-10 13:52   ` Richard Biener
2023-02-10 14:13   ` Tamar Christina
2023-02-10 14:30     ` Richard Sandiford
2023-02-10 14:54       ` Tamar Christina
2023-02-27 11:09       ` Tamar Christina
2023-02-27 12:11         ` Richard Sandiford
2023-02-27 12:14           ` Tamar Christina
2023-02-27 21:33             ` Richard Sandiford
2023-02-27 22:10               ` Tamar Christina
2023-02-28 11:08                 ` Richard Sandiford
2023-02-28 11:12                   ` Tamar Christina
2023-02-28 12:03                     ` Richard Sandiford [this message]
2023-03-01 11:30                       ` Richard Biener
2023-02-10 15:56     ` Richard Sandiford
2023-02-10 16:09       ` Tamar Christina
2023-02-10 16:25         ` Richard Sandiford
2023-02-10 16:33           ` Tamar Christina
2023-02-10 16:57             ` Richard Sandiford
2023-02-10 17:01               ` Richard Sandiford
2023-02-10 17:14               ` Tamar Christina
2023-02-10 18:12                 ` Richard Sandiford
2023-02-10 18:34                   ` Richard Biener
2023-02-10 20:58                     ` Andrew MacLeod
2023-02-13  9:54                       ` Tamar Christina
2023-02-15 12:51                         ` Tamar Christina
2023-02-15 16:05                           ` Andrew MacLeod
2023-02-15 17:13                             ` Tamar Christina
2023-02-15 17:50                               ` Andrew MacLeod
2023-02-15 18:42                                 ` Andrew MacLeod
2023-02-22 12:51                                   ` Tamar Christina
2023-02-22 16:41                                   ` Andrew MacLeod
2023-02-22 18:03                                     ` Tamar Christina
2023-02-22 18:33                                       ` Andrew MacLeod
2023-02-23  8:36                                         ` Tamar Christina
2023-02-23 16:39                                           ` Andrew MacLeod
2023-02-23 16:56                                             ` Tamar Christina
2023-03-01 16:57                                             ` Andrew Carlotti
2023-03-01 18:16                                               ` Tamar Christina
2023-02-22 13:06                                 ` Tamar Christina
2023-02-22 15:19                                   ` Andrew MacLeod
