[PATCH]middle-end: don't form FMAs when multiplication is not single use. [PR108583]

From: Tamar Christina <tamar.christina@arm.com>
To: gcc-patches@gcc.gnu.org
Cc: nd@arm.com, rguenther@suse.de, jlaw@ventanamicro.com,
	richard.sandiford@arm.com
Subject: [PATCH]middle-end: don't form FMAs when multiplication is not single use. [PR108583]
Date: Thu, 9 Mar 2023 19:36:21 +0000	[thread overview]
Message-ID: <patch-17101-tamar@arm.com> (raw)

Hi All,

The testcase

typedef unsigned int vec __attribute__((vector_size(32)));
vec
f3 (vec a, vec b, vec c)
{
  vec d = a * b;
  return d + ((c + d) >> 1);
}

shows a case where we don't want to form an integer FMA because the MUL is not
single use.  To form FMAs here we would have to redo the MUL in each of them,
since its result could no longer be shared between the uses.

As such, forming an FMA here would be a de-optimization.
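
To make the cost argument concrete, here is a rough scalar sketch of the two
shapes.  The names shared_mul, int_fma, duplicated_mul and self_check are
hypothetical and used only for illustration; int_fma is a plain C stand-in for
a fused integer multiply-add, not GCC's internal .FMA function.  The second
shape is an assumption about what results when both additions that use d are
turned into FMAs.

```c
#include <assert.h>

/* Shape of the testcase: one multiply whose result feeds two additions.  */
unsigned int
shared_mul (unsigned int a, unsigned int b, unsigned int c)
{
  unsigned int d = a * b;	/* single multiply, used twice below */
  return d + ((c + d) >> 1);
}

/* Hypothetical stand-in for an integer multiply-add: a * b + c.  */
unsigned int
int_fma (unsigned int a, unsigned int b, unsigned int c)
{
  return a * b + c;
}

/* Assumed shape after turning both additions into FMAs: the separate
   multiply goes away, but a * b is now computed inside each FMA.  */
unsigned int
duplicated_mul (unsigned int a, unsigned int b, unsigned int c)
{
  unsigned int t = int_fma (a, b, c) >> 1;	/* (a * b + c) >> 1 */
  return int_fma (a, b, t);			/* a * b + t */
}

/* Both shapes agree for unsigned arithmetic, including on wraparound,
   so the only difference is how often a * b is evaluated.  */
void
self_check (void)
{
  assert (shared_mul (3, 4, 5) == duplicated_mul (3, 4, 5));
  assert (shared_mul (0xffffffffu, 2, 7) == duplicated_mul (0xffffffffu, 2, 7));
}
```

Since the integer results are identical either way, the only thing that changes
is the amount of multiplication work, which is why the fused form does not pay
off in this case.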

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR target/108583
	* tree-ssa-math-opts.cc (convert_mult_to_fma): Inhibit integer FMA
	formation when the multiplication result is not single use.

gcc/testsuite/ChangeLog:

	PR target/108583
	* gcc.dg/mla_1.c: New test.

Co-Authored-By: Richard Sandiford <richard.sandiford@arm.com>

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.dg/mla_1.c b/gcc/testsuite/gcc.dg/mla_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..a92ecf248116d89b1bc4207a907ea5ed95728a28
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mla_1.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-options "-O2 -msve-vector-bits=256 -march=armv8.2-a+sve -fdump-tree-optimized" } */
+
+unsigned int
+f1 (unsigned int a, unsigned int b, unsigned int c) {
+  unsigned int d = a * b;
+  return d + ((c + d) >> 1);
+}
+
+unsigned int
+g1 (unsigned int a, unsigned int b, unsigned int c) {
+  return a * b + c;
+}
+
+__Uint32x4_t
+f2 (__Uint32x4_t a, __Uint32x4_t b, __Uint32x4_t c) {
+  __Uint32x4_t d = a * b;
+  return d + ((c + d) >> 1);
+}
+
+__Uint32x4_t
+g2 (__Uint32x4_t a, __Uint32x4_t b, __Uint32x4_t c) {
+  return a * b + c;
+}
+
+typedef unsigned int vec __attribute__((vector_size(32))); vec
+f3 (vec a, vec b, vec c)
+{
+  vec d = a * b;
+  return d + ((c + d) >> 1);
+}
+
+vec
+g3 (vec a, vec b, vec c)
+{
+  return a * b + c;
+}
+
+/* { dg-final { scan-tree-dump-times {\.FMA } 1 "optimized" { target aarch64*-*-* } } } */
diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index 5ab5b944a573ad24ce8427aff24fc5215bf05dac..26ed91d58fa4709a67c903ad446d267a3113c172 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -3346,6 +3346,20 @@ convert_mult_to_fma (gimple *mul_stmt, tree op1, tree op2,
 		    param_avoid_fma_max_bits));
   bool defer = check_defer;
   bool seen_negate_p = false;
+
+  /* There is no numerical difference between fused and unfused integer FMAs,
+     and the assumption below that FMA is as cheap as addition is unlikely
+     to be true, especially if the multiplication occurs multiple times on
+     the same chain.  E.g., for something like:
+
+	 (((a * b) + c) >> 1) + (a * b)
+
+     we do not want to duplicate the a * b into two additions, not least
+     because the result is not a natural FMA chain.  */
+  if (ANY_INTEGRAL_TYPE_P (type)
+      && !has_single_use (mul_result))
+    return false;
+
   /* Make sure that the multiplication statement becomes dead after
      the transformation, thus that all uses are transformed to FMAs.
      This means we assume that an FMA operation has the same cost




