[PATCH]middle-end: don't form FMAs when multiplication is not single use. [PR108583]

public inbox for gcc-patches@gcc.gnu.org

* [PATCH]middle-end: don't form FMAs when multiplication is not single use. [PR108583]
@ 2023-03-09 19:36 Tamar Christina
  2023-03-10  2:30 ` Hongtao Liu
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Tamar Christina @ 2023-03-09 19:36 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, jlaw, richard.sandiford

Hi All,

The testcase

typedef unsigned int vec __attribute__((vector_size(32)));
vec
f3 (vec a, vec b, vec c)
{
  vec d = a * b;
  return d + ((c + d) >> 1);
}

shows a case where we don't want to form an FMA, because the MUL is not single
use.  To form an FMA here we would have to redo the MUL, since its result can
no longer be shared between the two uses.

As such, forming an FMA here would be a de-optimization.
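
To make the cost argument concrete, here is a rough scalar sketch of the two
shapes.  The helper mla () is a hypothetical stand-in for an integer
multiply-add instruction and the function names are made up for illustration;
this is not what the pass emits, only a model of the operation counts.

```c
#include <assert.h>

/* Hypothetical stand-in for an integer multiply-add instruction.  */
static unsigned int
mla (unsigned int a, unsigned int b, unsigned int acc)
{
  return a * b + acc;
}

/* Shape we want to keep: one MUL whose result feeds both uses of d.  */
unsigned int
f_shared (unsigned int a, unsigned int b, unsigned int c)
{
  unsigned int d = a * b;
  return d + ((c + d) >> 1);
}

/* Shape after forming an FMA at each use of d: a * b is computed twice.  */
unsigned int
f_fused (unsigned int a, unsigned int b, unsigned int c)
{
  unsigned int t = mla (a, b, c) >> 1;
  return mla (a, b, t);
}

/* Unsigned arithmetic wraps, so both shapes give the same value.  */
void
check_shapes (void)
{
  assert (f_shared (2, 3, 4) == f_fused (2, 3, 4));
  assert (f_shared (0xFFFFFFFFu, 2, 1) == f_fused (0xFFFFFFFFu, 2, 1));
}
```

The shared form needs one multiply, two adds and a shift; the fused form needs
two multiply-adds and a shift, so the multiplication is paid for twice with no
numerical benefit for integers.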

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR target/108583
	* tree-ssa-math-opts.cc (convert_mult_to_fma): Inhibit FMA in case not
	single use.

gcc/testsuite/ChangeLog:

	PR target/108583
	* gcc.dg/mla_1.c: New test.

Co-Authored-By: Richard Sandiford <richard.sandiford@arm.com>

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/mla_1.c b/gcc/testsuite/gcc.dg/mla_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..a92ecf248116d89b1bc4207a907ea5ed95728a28
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mla_1.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-options "-O2 -msve-vector-bits=256 -march=armv8.2-a+sve -fdump-tree-optimized" } */
+
+unsigned int
+f1 (unsigned int a, unsigned int b, unsigned int c) {
+  unsigned int d = a * b;
+  return d + ((c + d) >> 1);
+}
+
+unsigned int
+g1 (unsigned int a, unsigned int b, unsigned int c) {
+  return a * b + c;
+}
+
+__Uint32x4_t
+f2 (__Uint32x4_t a, __Uint32x4_t b, __Uint32x4_t c) {
+  __Uint32x4_t d = a * b;
+  return d + ((c + d) >> 1);
+}
+
+__Uint32x4_t
+g2 (__Uint32x4_t a, __Uint32x4_t b, __Uint32x4_t c) {
+  return a * b + c;
+}
+
+typedef unsigned int vec __attribute__((vector_size(32))); vec
+f3 (vec a, vec b, vec c)
+{
+  vec d = a * b;
+  return d + ((c + d) >> 1);
+}
+
+vec
+g3 (vec a, vec b, vec c)
+{
+  return a * b + c;
+}
+
+/* { dg-final { scan-tree-dump-times {\.FMA } 1 "optimized" { target aarch64*-*-* } } } */
diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index 5ab5b944a573ad24ce8427aff24fc5215bf05dac..26ed91d58fa4709a67c903ad446d267a3113c172 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -3346,6 +3346,20 @@ convert_mult_to_fma (gimple *mul_stmt, tree op1, tree op2,
 		    param_avoid_fma_max_bits));
   bool defer = check_defer;
   bool seen_negate_p = false;
+
+  /* There is no numerical difference between fused and unfused integer FMAs,
+     and the assumption below that FMA is as cheap as addition is unlikely
+     to be true, especially if the multiplication occurs multiple times on
+     the same chain.  E.g., for something like:
+
+	 (((a * b) + c) >> 1) + (a * b)
+
+     we do not want to duplicate the a * b into two additions, not least
+     because the result is not a natural FMA chain.  */
+  if (ANY_INTEGRAL_TYPE_P (type)
+      && !has_single_use (mul_result))
+    return false;
+
   /* Make sure that the multiplication statement becomes dead after
      the transformation, thus that all uses are transformed to FMAs.
      This means we assume that an FMA operation has the same cost





* Re: [PATCH]middle-end: don't form FMAs when multiplication is not single use. [PR108583]
  2023-03-09 19:36 [PATCH]middle-end: don't form FMAs when multiplication is not single use. [PR108583] Tamar Christina
@ 2023-03-10  2:30 ` Hongtao Liu
  2023-03-10  7:46   ` Richard Biener
  2023-03-10  7:41 ` Richard Biener
  2023-03-14  7:42 ` Jakub Jelinek
  2 siblings, 1 reply; 5+ messages in thread
From: Hongtao Liu @ 2023-03-10  2:30 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, rguenther, jlaw, richard.sandiford

On Fri, Mar 10, 2023 at 3:37 AM Tamar Christina via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi All,
>
> The testcase
>
> typedef unsigned int vec __attribute__((vector_size(32)));
> vec
> f3 (vec a, vec b, vec c)
> {
>   vec d = a * b;
>   return d + ((c + d) >> 1);
> }
>
> shows a case where we don't want to form an FMA due to the MUL not being single
> use.  In this case to form an FMA we have to redo the MUL as well as we no
> longer have it to share.
>
> As such making an FMA here would be a de-optimization.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>         PR target/108583
>         * tree-ssa-math-opts.cc (convert_mult_to_fma): Inhibit FMA in case not
>         single use.
>
> gcc/testsuite/ChangeLog:
>
>         PR target/108583
>         * gcc.dg/mla_1.c: New test.
>
> Co-Authored-By: Richard Sandiford <richard.sandiford@arm.com>
>
> --- inline copy of patch --
> diff --git a/gcc/testsuite/gcc.dg/mla_1.c b/gcc/testsuite/gcc.dg/mla_1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..a92ecf248116d89b1bc4207a907ea5ed95728a28
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/mla_1.c
> @@ -0,0 +1,40 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-options "-O2 -msve-vector-bits=256 -march=armv8.2-a+sve -fdump-tree-optimized" } */
> +
> +unsigned int
> +f1 (unsigned int a, unsigned int b, unsigned int c) {
> +  unsigned int d = a * b;
> +  return d + ((c + d) >> 1);
> +}
> +
> +unsigned int
> +g1 (unsigned int a, unsigned int b, unsigned int c) {
> +  return a * b + c;
> +}
> +
> +__Uint32x4_t
> +f2 (__Uint32x4_t a, __Uint32x4_t b, __Uint32x4_t c) {
> +  __Uint32x4_t d = a * b;
> +  return d + ((c + d) >> 1);
> +}
> +
> +__Uint32x4_t
> +g2 (__Uint32x4_t a, __Uint32x4_t b, __Uint32x4_t c) {
> +  return a * b + c;
> +}
> +
> +typedef unsigned int vec __attribute__((vector_size(32))); vec
> +f3 (vec a, vec b, vec c)
> +{
> +  vec d = a * b;
> +  return d + ((c + d) >> 1);
> +}
> +
> +vec
> +g3 (vec a, vec b, vec c)
> +{
> +  return a * b + c;
> +}
> +
> +/* { dg-final { scan-tree-dump-times {\.FMA } 1 "optimized" { target aarch64*-*-* } } } */
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index 5ab5b944a573ad24ce8427aff24fc5215bf05dac..26ed91d58fa4709a67c903ad446d267a3113c172 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -3346,6 +3346,20 @@ convert_mult_to_fma (gimple *mul_stmt, tree op1, tree op2,
>                     param_avoid_fma_max_bits));
>    bool defer = check_defer;
>    bool seen_negate_p = false;
> +
> +  /* There is no numerical difference between fused and unfused integer FMAs,
> +     and the assumption below that FMA is as cheap as addition is unlikely
> +     to be true, especially if the multiplication occurs multiple times on
> +     the same chain.  E.g., for something like:
> +
> +        (((a * b) + c) >> 1) + (a * b)
> +
> +     we do not want to duplicate the a * b into two additions, not least
> +     because the result is not a natural FMA chain.  */
> +  if (ANY_INTEGRAL_TYPE_P (type)
> +      && !has_single_use (mul_result))
What about floating point?
> +    return false;
> +
>    /* Make sure that the multiplication statement becomes dead after
>       the transformation, thus that all uses are transformed to FMAs.
>       This means we assume that an FMA operation has the same cost
>
>
>
>
> --



-- 
BR,
Hongtao


* Re: [PATCH]middle-end: don't form FMAs when multiplication is not single use. [PR108583]
  2023-03-10  2:30 ` Hongtao Liu
@ 2023-03-10  7:46   ` Richard Biener
  0 siblings, 0 replies; 5+ messages in thread
From: Richard Biener @ 2023-03-10  7:46 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: Tamar Christina, gcc-patches, nd, jlaw, richard.sandiford

On Fri, 10 Mar 2023, Hongtao Liu wrote:

> On Fri, Mar 10, 2023 at 3:37 AM Tamar Christina via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hi All,
> >
> > The testcase
> >
> > typedef unsigned int vec __attribute__((vector_size(32)));
> > vec
> > f3 (vec a, vec b, vec c)
> > {
> >   vec d = a * b;
> >   return d + ((c + d) >> 1);
> > }
> >
> > shows a case where we don't want to form an FMA due to the MUL not being single
> > use.  In this case to form an FMA we have to redo the MUL as well as we no
> > longer have it to share.
> >
> > As such making an FMA here would be a de-optimization.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> >         PR target/108583
> >         * tree-ssa-math-opts.cc (convert_mult_to_fma): Inhibit FMA in case not
> >         single use.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         PR target/108583
> >         * gcc.dg/mla_1.c: New test.
> >
> > Co-Authored-By: Richard Sandiford <richard.sandiford@arm.com>
> >
> > --- inline copy of patch --
> > diff --git a/gcc/testsuite/gcc.dg/mla_1.c b/gcc/testsuite/gcc.dg/mla_1.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..a92ecf248116d89b1bc4207a907ea5ed95728a28
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/mla_1.c
> > @@ -0,0 +1,40 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-options "-O2 -msve-vector-bits=256 -march=armv8.2-a+sve -fdump-tree-optimized" } */
> > +
> > +unsigned int
> > +f1 (unsigned int a, unsigned int b, unsigned int c) {
> > +  unsigned int d = a * b;
> > +  return d + ((c + d) >> 1);
> > +}
> > +
> > +unsigned int
> > +g1 (unsigned int a, unsigned int b, unsigned int c) {
> > +  return a * b + c;
> > +}
> > +
> > +__Uint32x4_t
> > +f2 (__Uint32x4_t a, __Uint32x4_t b, __Uint32x4_t c) {
> > +  __Uint32x4_t d = a * b;
> > +  return d + ((c + d) >> 1);
> > +}
> > +
> > +__Uint32x4_t
> > +g2 (__Uint32x4_t a, __Uint32x4_t b, __Uint32x4_t c) {
> > +  return a * b + c;
> > +}
> > +
> > +typedef unsigned int vec __attribute__((vector_size(32))); vec
> > +f3 (vec a, vec b, vec c)
> > +{
> > +  vec d = a * b;
> > +  return d + ((c + d) >> 1);
> > +}
> > +
> > +vec
> > +g3 (vec a, vec b, vec c)
> > +{
> > +  return a * b + c;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times {\.FMA } 1 "optimized" { target aarch64*-*-* } } } */
> > diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> > index 5ab5b944a573ad24ce8427aff24fc5215bf05dac..26ed91d58fa4709a67c903ad446d267a3113c172 100644
> > --- a/gcc/tree-ssa-math-opts.cc
> > +++ b/gcc/tree-ssa-math-opts.cc
> > @@ -3346,6 +3346,20 @@ convert_mult_to_fma (gimple *mul_stmt, tree op1, tree op2,
> >                     param_avoid_fma_max_bits));
> >    bool defer = check_defer;
> >    bool seen_negate_p = false;
> > +
> > +  /* There is no numerical difference between fused and unfused integer FMAs,
> > +     and the assumption below that FMA is as cheap as addition is unlikely
> > +     to be true, especially if the multiplication occurs multiple times on
> > +     the same chain.  E.g., for something like:
> > +
> > +        (((a * b) + c) >> 1) + (a * b)
> > +
> > +     we do not want to duplicate the a * b into two additions, not least
> > +     because the result is not a natural FMA chain.  */
> > +  if (ANY_INTEGRAL_TYPE_P (type)
> > +      && !has_single_use (mul_result))
> What about floating point?

I think for a case like the one above, that is

 ((a * b) + c) + (a * b)

it's profitable to handle this as

  fma (a, b, fma (a, b, c))

as this saves one add and has one op less latency?  For the case
where the second use is not part of the dependence chain it's
less obvious, but since FMA is usually not (very much) more expensive
than an add, erring on the optimization side didn't look wrong
(IIRC the FMA forming analysis isn't "global", i.e. it doesn't count
the untransformed mults left in the end).

Richard.
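
A small sketch of the floating-point chain discussed above, for readers who
want to count the operations.  fma_model () is a hypothetical stand-in: a real
fused multiply-add (fma from <math.h>) rounds only once, which this model does
not capture, so it only illustrates the shape of the dependence chain.

```c
#include <assert.h>

/* Stand-in for a fused multiply-add; it does NOT model the single
   rounding of a real fma.  */
static double
fma_model (double a, double b, double c)
{
  return a * b + c;
}

/* Unfused: mul -> add -> add, three dependent operations.  */
double
chain_unfused (double a, double b, double c)
{
  double d = a * b;
  return (d + c) + d;
}

/* Fused: fma (a, b, fma (a, b, c)), two dependent operations and no
   separate add.  */
double
chain_fused (double a, double b, double c)
{
  return fma_model (a, b, fma_model (a, b, c));
}

/* Inputs chosen so that every intermediate value is exact.  */
void
check_chains (void)
{
  assert (chain_unfused (2.0, 3.0, 4.0) == chain_fused (2.0, 3.0, 4.0));
}
```

With inputs that are not exactly representable, a real fused chain can differ
from the unfused one in the last bits, which is why the integer and
floating-point cases are treated separately.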

> > +    return false;
> > +
> >    /* Make sure that the multiplication statement becomes dead after
> >       the transformation, thus that all uses are transformed to FMAs.
> >       This means we assume that an FMA operation has the same cost
> >
> >
> >
> >
> > --
> 
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


* Re: [PATCH]middle-end: don't form FMAs when multiplication is not single use. [PR108583]
  2023-03-09 19:36 [PATCH]middle-end: don't form FMAs when multiplication is not single use. [PR108583] Tamar Christina
  2023-03-10  2:30 ` Hongtao Liu
@ 2023-03-10  7:41 ` Richard Biener
  2023-03-14  7:42 ` Jakub Jelinek
  2 siblings, 0 replies; 5+ messages in thread
From: Richard Biener @ 2023-03-10  7:41 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, jlaw, richard.sandiford

On Thu, 9 Mar 2023, Tamar Christina wrote:

> Hi All,
> 
> The testcase
> 
> typedef unsigned int vec __attribute__((vector_size(32)));
> vec
> f3 (vec a, vec b, vec c)
> {
>   vec d = a * b;
>   return d + ((c + d) >> 1);
> }
> 
> shows a case where we don't want to form an FMA due to the MUL not being single
> use.  In this case to form an FMA we have to redo the MUL as well as we no
> longer have it to share.
> 
> As such making an FMA here would be a de-optimization.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

OK.

Thanks,
Richard.


-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


* Re: [PATCH]middle-end: don't form FMAs when multiplication is not single use. [PR108583]
  2023-03-09 19:36 [PATCH]middle-end: don't form FMAs when multiplication is not single use. [PR108583] Tamar Christina
  2023-03-10  2:30 ` Hongtao Liu
  2023-03-10  7:41 ` Richard Biener
@ 2023-03-14  7:42 ` Jakub Jelinek
  2 siblings, 0 replies; 5+ messages in thread
From: Jakub Jelinek @ 2023-03-14  7:42 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, rguenther, jlaw, richard.sandiford

On Thu, Mar 09, 2023 at 07:36:21PM +0000, Tamar Christina via Gcc-patches wrote:
> 	PR target/108583
> 	* gcc.dg/mla_1.c: New test.

The testcase FAILs on all targets but AArch64 (maybe ARM is fine too).
While f1/g1 are compilable on all targets, and f3/g3 are too with -Wno-psabi
in dg-options, f2/g2 are AArch64 specific.  So, I think either
the testcase should be moved to gcc.target/aarch64/ as a whole,
or you should split it: have gcc.dg/mla_1.c contain everything but
f2/g2, drop the vect_int requirement and change dg-options to
"-O2 -Wno-psabi", and then add gcc.target/aarch64/mla_1.c, which has the
dg- directives as you currently have them, uses
#include "../../gcc.dg/mla_1.c" to get the functions, and adds f2/g2.
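
A possible shape for the generic half of that split is sketched below.  This
is only an illustration of the suggestion, not a posted or committed patch;
where the dg-final scan for .FMA ends up is left open here and is an
assumption for whoever does the split to settle.

```c
/* Sketch of a target-independent gcc.dg/mla_1.c: everything but f2/g2,
   no vect_int requirement, and -Wno-psabi for the generic vector type.  */
/* { dg-do compile } */
/* { dg-options "-O2 -Wno-psabi" } */

unsigned int
f1 (unsigned int a, unsigned int b, unsigned int c) {
  unsigned int d = a * b;
  return d + ((c + d) >> 1);
}

unsigned int
g1 (unsigned int a, unsigned int b, unsigned int c) {
  return a * b + c;
}

typedef unsigned int vec __attribute__((vector_size(32)));

vec
f3 (vec a, vec b, vec c)
{
  vec d = a * b;
  return d + ((c + d) >> 1);
}

vec
g3 (vec a, vec b, vec c)
{
  return a * b + c;
}
```

The AArch64-specific file would then carry the original dg- directives,
include this file, and add f2/g2 with __Uint32x4_t.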

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/mla_1.c
> @@ -0,0 +1,40 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-options "-O2 -msve-vector-bits=256 -march=armv8.2-a+sve -fdump-tree-optimized" } */
> +
> +unsigned int
> +f1 (unsigned int a, unsigned int b, unsigned int c) {
> +  unsigned int d = a * b;
> +  return d + ((c + d) >> 1);
> +}
> +
> +unsigned int
> +g1 (unsigned int a, unsigned int b, unsigned int c) {
> +  return a * b + c;
> +}
> +
> +__Uint32x4_t
> +f2 (__Uint32x4_t a, __Uint32x4_t b, __Uint32x4_t c) {
> +  __Uint32x4_t d = a * b;
> +  return d + ((c + d) >> 1);
> +}
> +
> +__Uint32x4_t
> +g2 (__Uint32x4_t a, __Uint32x4_t b, __Uint32x4_t c) {
> +  return a * b + c;
> +}
> +
> +typedef unsigned int vec __attribute__((vector_size(32))); vec
> +f3 (vec a, vec b, vec c)
> +{
> +  vec d = a * b;
> +  return d + ((c + d) >> 1);
> +}
> +
> +vec
> +g3 (vec a, vec b, vec c)
> +{
> +  return a * b + c;
> +}
> +
> +/* { dg-final { scan-tree-dump-times {\.FMA } 1 "optimized" { target aarch64*-*-* } } } */

	Jakub

