* [PATCH] PR tree-optimization/101895: Fold VEC_PERM to help recognize FMA.
@ 2022-03-11 23:48 Roger Sayle
2022-03-12 23:39 ` Marc Glisse
0 siblings, 1 reply; 5+ messages in thread
From: Roger Sayle @ 2022-03-11 23:48 UTC (permalink / raw)
To: 'GCC Patches'
[-- Attachment #1: Type: text/plain, Size: 1723 bytes --]
This patch resolves PR tree-optimization/101895 a missed optimization
regression, by adding a constant folding simplification to match.pd to
simplify the transform "mult; vec_perm; plus" into "vec_perm; mult; plus"
with the aim that keeping the multiplication and addition next to each
other allows them to be recognized as fused-multiply-add on suitable
targets. This transformation requires a tweak to match.pd's
vec_same_elem_p predicate to handle CONSTRUCTOR_EXPRs using the same
SSA_NAME_DEF_STMT idiom used for constructors elsewhere in match.pd.
The net effect is that the following code example:
void foo(float * __restrict__ a, float b, float *c) {
a[0] = c[0]*b + a[0];
a[1] = c[2]*b + a[1];
a[2] = c[1]*b + a[2];
a[3] = c[3]*b + a[3];
}
when compiled on x86_64-pc-linux-gnu with -O2 -march=cascadelake
currently generates:
vbroadcastss %xmm0, %xmm0
vmulps (%rsi), %xmm0, %xmm0
vpermilps $216, %xmm0, %xmm0
vaddps (%rdi), %xmm0, %xmm0
vmovups %xmm0, (%rdi)
ret
but with this patch now generates the improved:
vpermilps $216, (%rsi), %xmm1
vbroadcastss %xmm0, %xmm0
vfmadd213ps (%rdi), %xmm0, %xmm1
vmovups %xmm1, (%rdi)
ret
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures. Ok for mainline?
2022-03-11 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR tree-optimization/101895
* match.pd (vec_same_elem_p): Handle CONSTRUCTOR_EXPR def.
(plus (vec_perm (mult ...) ...) ...): New reordering simplification.
gcc/testsuite/ChangeLog
* gcc.target/i386/pr101895.c: New test case.
Thanks in advance,
Roger
--
[-- Attachment #2: patchvp.txt --]
[-- Type: text/plain, Size: 1413 bytes --]
diff --git a/gcc/match.pd b/gcc/match.pd
index 97399e5..9184276 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7695,10 +7695,22 @@ and,
(match vec_same_elem_p
(vec_duplicate @0))
+(match vec_same_elem_p
+ CONSTRUCTOR@0
+ (if (uniform_vector_p (TREE_CODE (@0) == SSA_NAME
+ ? gimple_assign_rhs1 (SSA_NAME_DEF_STMT (@0)) : @0))))
+
(simplify
(vec_perm vec_same_elem_p@0 @0 @1)
@0)
+/* Push VEC_PERM earlier if that may help FMA perception (PR101895). */
+(for plusminus (plus minus)
+ (simplify
+ (plusminus (vec_perm (mult@0 @1 vec_same_elem_p@2) @0 @3) @4)
+ (plusminus (mult (vec_perm @1 @1 @3) @2) @4)))
+
+
/* Match count trailing zeroes for simplify_count_trailing_zeroes in fwprop.
The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
constant which when multiplied by a power of 2 contains a unique value
diff --git a/gcc/testsuite/gcc.target/i386/pr101895.c b/gcc/testsuite/gcc.target/i386/pr101895.c
new file mode 100644
index 0000000..4d0f1cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr101895.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=cascadelake" } */
+
+void foo(float * __restrict__ a, float b, float *c) {
+ a[0] = c[0]*b + a[0];
+ a[1] = c[2]*b + a[1];
+ a[2] = c[1]*b + a[2];
+ a[3] = c[3]*b + a[3];
+}
+
+/* { dg-final { scan-assembler "vfmadd" } } */
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] PR tree-optimization/101895: Fold VEC_PERM to help recognize FMA.
2022-03-11 23:48 [PATCH] PR tree-optimization/101895: Fold VEC_PERM to help recognize FMA Roger Sayle
@ 2022-03-12 23:39 ` Marc Glisse
2022-03-14 7:37 ` Richard Biener
0 siblings, 1 reply; 5+ messages in thread
From: Marc Glisse @ 2022-03-12 23:39 UTC (permalink / raw)
To: Roger Sayle; +Cc: 'GCC Patches'
On Fri, 11 Mar 2022, Roger Sayle wrote:
+(match vec_same_elem_p
+ CONSTRUCTOR@0
+ (if (uniform_vector_p (TREE_CODE (@0) == SSA_NAME
+ ? gimple_assign_rhs1 (SSA_NAME_DEF_STMT (@0)) : @0))))
Ah, I didn't remember we needed that, we don't seem to be very consistent
about it. Probably for this reason, the transformation "Prefer vector1 <<
scalar to vector1 << vector2" does not match
typedef int vec __attribute__((vector_size(16)));
vec f(vec a, int b){
vec bb = { b, b, b, b };
return a << bb;
}
which is only optimized at vector lowering time.
+/* Push VEC_PERM earlier if that may help FMA perception (PR101895). */
+(for plusminus (plus minus)
+ (simplify
+ (plusminus (vec_perm (mult@0 @1 vec_same_elem_p@2) @0 @3) @4)
+ (plusminus (mult (vec_perm @1 @1 @3) @2) @4)))
Don't you want :s on mult and vec_perm?
--
Marc Glisse
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] PR tree-optimization/101895: Fold VEC_PERM to help recognize FMA.
2022-03-12 23:39 ` Marc Glisse
@ 2022-03-14 7:37 ` Richard Biener
2022-03-15 7:25 ` Roger Sayle
0 siblings, 1 reply; 5+ messages in thread
From: Richard Biener @ 2022-03-14 7:37 UTC (permalink / raw)
To: GCC Patches; +Cc: Roger Sayle, Marc Glisse
On Sun, Mar 13, 2022 at 12:39 AM Marc Glisse via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Fri, 11 Mar 2022, Roger Sayle wrote:
>
> +(match vec_same_elem_p
> + CONSTRUCTOR@0
> + (if (uniform_vector_p (TREE_CODE (@0) == SSA_NAME
> + ? gimple_assign_rhs1 (SSA_NAME_DEF_STMT (@0)) : @0))))
>
> Ah, I didn't remember we needed that, we don't seem to be very consistent
> about it. Probably for this reason, the transformation "Prefer vector1 <<
> scalar to vector1 << vector2" does not match
>
> typedef int vec __attribute__((vector_size(16)));
> vec f(vec a, int b){
> vec bb = { b, b, b, b };
> return a << bb;
> }
>
> which is only optimized at vector lowering time.
Few more comments - since match.pd is matching in match.pd order the
(match vec_same_elem_p
@0
(...))
should come last. Please use
+(match vec_same_elem_p
+ CONSTRUCTOR@0
(if (TREE_CODE (@0) == SSA_NAME
&& uniform_vector_p (...
since otherwise we'll try uniform_vector_p twice on all CTORs (that
are not uniform).
> +/* Push VEC_PERM earlier if that may help FMA perception (PR101895). */
> +(for plusminus (plus minus)
> + (simplify
> + (plusminus (vec_perm (mult@0 @1 vec_same_elem_p@2) @0 @3) @4)
> + (plusminus (mult (vec_perm @1 @1 @3) @2) @4)))
>
> Don't you want :s on mult and vec_perm?
Yes. Also for plus you want :c on it , likewise you want :c on the
mult. The :c
on the plus will require splitting the plus and minus case :/
Otherwise looks reasonable.
Richard.
>
> --
> Marc Glisse
^ permalink raw reply [flat|nested] 5+ messages in thread
* RE: [PATCH] PR tree-optimization/101895: Fold VEC_PERM to help recognize FMA.
2022-03-14 7:37 ` Richard Biener
@ 2022-03-15 7:25 ` Roger Sayle
2022-03-15 7:43 ` Richard Biener
0 siblings, 1 reply; 5+ messages in thread
From: Roger Sayle @ 2022-03-15 7:25 UTC (permalink / raw)
To: 'Richard Biener', 'GCC Patches'; +Cc: 'Marc Glisse'
[-- Attachment #1: Type: text/plain, Size: 3053 bytes --]
Hi Richard and Marc,
Many thanks for both your feedback on my patch for PR 101895.
Here's version 2 of this patch, incorporating all of the suggested improvements.
The one minor complication is that the :s qualifier doesn't automatically
recognize that a capture already has two (or N) uses in a pattern,
so I have to manually confirm that there are no other uses of the mult
using num_imm_uses.
This revision has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures. Ok for mainline?
2022-03-15 Roger Sayle <roger@nextmovesoftware.com>
Marc Glisse <marc.glisse@inria.fr>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog
PR tree-optimization/101895
* match.pd (vec_same_elem_p): Handle CONSTRUCTOR_EXPR def.
(plus (vec_perm (mult ...) ...) ...): New reordering simplification.
gcc/testsuite/ChangeLog
PR tree-optimization/101895
* gcc.target/i386/pr101895.c: New test case.
Thanks in advance,
Roger
--
> -----Original Message-----
> From: Richard Biener <richard.guenther@gmail.com>
> Sent: 14 March 2022 07:38
> To: GCC Patches <gcc-patches@gcc.gnu.org>
> Cc: Roger Sayle <roger@nextmovesoftware.com>; Marc Glisse
> <marc.glisse@inria.fr>
> Subject: Re: [PATCH] PR tree-optimization/101895: Fold VEC_PERM to help
> recognize FMA.
>
> On Sun, Mar 13, 2022 at 12:39 AM Marc Glisse via Gcc-patches <gcc-
> patches@gcc.gnu.org> wrote:
> >
> > On Fri, 11 Mar 2022, Roger Sayle wrote:
> >
> > +(match vec_same_elem_p
> > + CONSTRUCTOR@0
> > + (if (uniform_vector_p (TREE_CODE (@0) == SSA_NAME
> > + ? gimple_assign_rhs1 (SSA_NAME_DEF_STMT (@0))
> > +: @0))))
> >
> > Ah, I didn't remember we needed that, we don't seem to be very
> > consistent about it. Probably for this reason, the transformation
> > "Prefer vector1 << scalar to vector1 << vector2" does not match
> >
> > typedef int vec __attribute__((vector_size(16))); vec f(vec a, int b){
> > vec bb = { b, b, b, b };
> > return a << bb;
> > }
> >
> > which is only optimized at vector lowering time.
>
> Few more comments - since match.pd is matching in match.pd order the
>
> (match vec_same_elem_p
> @0
> (...))
>
> should come last. Please use
>
> +(match vec_same_elem_p
> + CONSTRUCTOR@0
> (if (TREE_CODE (@0) == SSA_NAME
> && uniform_vector_p (...
>
> since otherwise we'll try uniform_vector_p twice on all CTORs (that are not
> uniform).
>
> > +/* Push VEC_PERM earlier if that may help FMA perception (PR101895).
> > +*/ (for plusminus (plus minus)
> > + (simplify
> > + (plusminus (vec_perm (mult@0 @1 vec_same_elem_p@2) @0 @3) @4)
> > + (plusminus (mult (vec_perm @1 @1 @3) @2) @4)))
> >
> > Don't you want :s on mult and vec_perm?
>
> Yes. Also for plus you want :c on it , likewise you want :c on the mult. The :c on
> the plus will require splitting the plus and minus case :/
>
> Otherwise looks reasonable.
>
> Richard.
>
> >
> > --
> > Marc Glisse
[-- Attachment #2: patchvp3.txt --]
[-- Type: text/plain, Size: 1807 bytes --]
diff --git a/gcc/match.pd b/gcc/match.pd
index 97399e5..12c92f4 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7689,16 +7689,33 @@ and,
/* VEC_PERM_EXPR (v, v, mask) -> v where v contains same element. */
(match vec_same_elem_p
+ (vec_duplicate @0))
+
+(match vec_same_elem_p
+ CONSTRUCTOR@0
+ (if (TREE_CODE (@0) == SSA_NAME
+ && uniform_vector_p (gimple_assign_rhs1 (SSA_NAME_DEF_STMT (@0))))))
+
+(match vec_same_elem_p
@0
(if (uniform_vector_p (@0))))
-(match vec_same_elem_p
- (vec_duplicate @0))
(simplify
(vec_perm vec_same_elem_p@0 @0 @1)
@0)
+/* Push VEC_PERM earlier if that may help FMA perception (PR101895). */
+(simplify
+ (plus:c (vec_perm:s (mult:c@0 @1 vec_same_elem_p@2) @0 @3) @4)
+ (if (TREE_CODE (@0) == SSA_NAME && num_imm_uses (@0) == 2)
+ (plus (mult (vec_perm @1 @1 @3) @2) @4)))
+(simplify
+ (minus (vec_perm:s (mult:c@0 @1 vec_same_elem_p@2) @0 @3) @4)
+ (if (TREE_CODE (@0) == SSA_NAME && num_imm_uses (@0) == 2)
+ (minus (mult (vec_perm @1 @1 @3) @2) @4)))
+
+
/* Match count trailing zeroes for simplify_count_trailing_zeroes in fwprop.
The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
constant which when multiplied by a power of 2 contains a unique value
diff --git a/gcc/testsuite/gcc.target/i386/pr101895.c b/gcc/testsuite/gcc.target/i386/pr101895.c
new file mode 100644
index 0000000..4d0f1cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr101895.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=cascadelake" } */
+
+void foo(float * __restrict__ a, float b, float *c) {
+ a[0] = c[0]*b + a[0];
+ a[1] = c[2]*b + a[1];
+ a[2] = c[1]*b + a[2];
+ a[3] = c[3]*b + a[3];
+}
+
+/* { dg-final { scan-assembler "vfmadd" } } */
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] PR tree-optimization/101895: Fold VEC_PERM to help recognize FMA.
2022-03-15 7:25 ` Roger Sayle
@ 2022-03-15 7:43 ` Richard Biener
0 siblings, 0 replies; 5+ messages in thread
From: Richard Biener @ 2022-03-15 7:43 UTC (permalink / raw)
To: Roger Sayle; +Cc: GCC Patches, Marc Glisse
On Tue, Mar 15, 2022 at 8:25 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Hi Richard and Marc,
> Many thanks for both your feedback on my patch for PR 101895.
> Here's version 2 of this patch, incorporating all of the suggested improvements.
> The one minor complication is that the :s qualifier doesn't automatically
> recognize that a capture already has two (or N) uses in a pattern,
> so I have to manually confirm that there are no other uses of the mult
> using num_imm_uses.
>
> This revision has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check with no new failures. Ok for mainline?
OK.
Thanks,
Richard.
> 2022-03-15 Roger Sayle <roger@nextmovesoftware.com>
> Marc Glisse <marc.glisse@inria.fr>
> Richard Biener <rguenther@suse.de>
>
> gcc/ChangeLog
> PR tree-optimization/101895
> * match.pd (vec_same_elem_p): Handle CONSTRUCTOR_EXPR def.
> (plus (vec_perm (mult ...) ...) ...): New reordering simplification.
>
> gcc/testsuite/ChangeLog
> PR tree-optimization/101895
> * gcc.target/i386/pr101895.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
>
> > -----Original Message-----
> > From: Richard Biener <richard.guenther@gmail.com>
> > Sent: 14 March 2022 07:38
> > To: GCC Patches <gcc-patches@gcc.gnu.org>
> > Cc: Roger Sayle <roger@nextmovesoftware.com>; Marc Glisse
> > <marc.glisse@inria.fr>
> > Subject: Re: [PATCH] PR tree-optimization/101895: Fold VEC_PERM to help
> > recognize FMA.
> >
> > On Sun, Mar 13, 2022 at 12:39 AM Marc Glisse via Gcc-patches <gcc-
> > patches@gcc.gnu.org> wrote:
> > >
> > > On Fri, 11 Mar 2022, Roger Sayle wrote:
> > >
> > > +(match vec_same_elem_p
> > > + CONSTRUCTOR@0
> > > + (if (uniform_vector_p (TREE_CODE (@0) == SSA_NAME
> > > + ? gimple_assign_rhs1 (SSA_NAME_DEF_STMT (@0))
> > > +: @0))))
> > >
> > > Ah, I didn't remember we needed that, we don't seem to be very
> > > consistent about it. Probably for this reason, the transformation
> > > "Prefer vector1 << scalar to vector1 << vector2" does not match
> > >
> > > typedef int vec __attribute__((vector_size(16))); vec f(vec a, int b){
> > > vec bb = { b, b, b, b };
> > > return a << bb;
> > > }
> > >
> > > which is only optimized at vector lowering time.
> >
> > Few more comments - since match.pd is matching in match.pd order the
> >
> > (match vec_same_elem_p
> > @0
> > (...))
> >
> > should come last. Please use
> >
> > +(match vec_same_elem_p
> > + CONSTRUCTOR@0
> > (if (TREE_CODE (@0) == SSA_NAME
> > && uniform_vector_p (...
> >
> > since otherwise we'll try uniform_vector_p twice on all CTORs (that are not
> > uniform).
> >
> > > +/* Push VEC_PERM earlier if that may help FMA perception (PR101895).
> > > +*/ (for plusminus (plus minus)
> > > + (simplify
> > > + (plusminus (vec_perm (mult@0 @1 vec_same_elem_p@2) @0 @3) @4)
> > > + (plusminus (mult (vec_perm @1 @1 @3) @2) @4)))
> > >
> > > Don't you want :s on mult and vec_perm?
> >
> > Yes. Also for plus you want :c on it , likewise you want :c on the mult. The :c on
> > the plus will require splitting the plus and minus case :/
> >
> > Otherwise looks reasonable.
> >
> > Richard.
> >
> > >
> > > --
> > > Marc Glisse
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2022-03-15 7:43 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-11 23:48 [PATCH] PR tree-optimization/101895: Fold VEC_PERM to help recognize FMA Roger Sayle
2022-03-12 23:39 ` Marc Glisse
2022-03-14 7:37 ` Richard Biener
2022-03-15 7:25 ` Roger Sayle
2022-03-15 7:43 ` Richard Biener
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).