[gcc(refs/users/meissner/heads/work117)] Do not generate fmaddfp and fnmsubfp

public inbox for gcc-cvs@sourceware.org
help / color / mirror / Atom feed

* [gcc(refs/users/meissner/heads/work117)] Do not generate fmaddfp and fnmsubfp
@ 2023-04-06 18:48 Michael Meissner
  0 siblings, 0 replies; 7+ messages in thread
From: Michael Meissner @ 2023-04-06 18:48 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:d5a1284d923c38004ac855357bc32ec86d84bf80

commit d5a1284d923c38004ac855357bc32ec86d84bf80
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Thu Apr 6 14:48:24 2023 -0400

    Do not generate fmaddfp and fnmsubfp
    
    The Altivec instructions fmaddfp and fnmsubfp have different rounding behaviors
    than the VSX xvmaddsp and xvnmsubsp instructions.  In particular, generating
    these instructions seems to break Eigen.
    
    GCC has generated the Altivec fmaddfp and fnmsubfp instructions on VSX systems
    as an alternative to the xsmadd{a,m}sp and xsnmsub{a,m}sp instructions.  The
    advantage  of the Altivec instructions is that they are 4 operand instructions
    (i.e. the target register does not have to overlap with one of the input
    registers).  The advantage is it can eliminate an extra move instruction.  The
    disadvantage is it does round the same was as the VSX instructions.
    
    This patch eliminates the generation of the Altivec fmaddfp and fnmsubfp
    instructions as alternatives in the VSX instruction insn support, and in the
    Altivec insns it adds a test to prevent the insn from being used if VSX is
    available.  I also added a test to the regression test suite.
    
    I have done bootstrap builds on power9 little endian (with both IEEE long
    double and IBM long double).  I have also done the builds and test on a power8
    big endian system (testing both 32-bit and 64-bit code generation).  Chip has
    verified that it fixes the problem that Eigen encountered.  Can I check this
    into the master GCC branch?  After a burn-in period, can I check this patch
    into the active GCC branches?
    
    Thanks in advance.
    
    2023-04-06   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            PR target/70243
            * config/rs6000/rs6000.md (isa attribute): Add fastmath.
            (enabled attribute): Add support for fastmath.
            * config/rs6000/vsx.md (vsx_fmav4sf4): Set the isa attribute to
            fastmath to disable Altivec instruction generatins normally.
            (vsx_nfmsv4sf4): Likewise.
    
    gcc/testsuite/
    
            PR target/70243
            * gcc.target/powerpc/pr70243.c: New test.
            * gcc.target/powerpc/pr70243-2.c: New test.

Diff:
---
 gcc/config/rs6000/rs6000.md                  |  6 +++-
 gcc/config/rs6000/vsx.md                     |  6 ++--
 gcc/testsuite/gcc.target/powerpc/pr70243-2.c | 41 ++++++++++++++++++++++++++++
 gcc/testsuite/gcc.target/powerpc/pr70243.c   | 41 ++++++++++++++++++++++++++++
 4 files changed, 91 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 44f7dd509cb..7fea6a40e0c 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -354,7 +354,7 @@
   (const (symbol_ref "(enum attr_cpu) rs6000_tune")))
 
 ;; The ISA we implement.
-(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10"
+(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10,fastmath"
   (const_string "any"))
 
 ;; Is this alternative enabled for the current CPU/ISA/etc.?
@@ -402,6 +402,10 @@
      (and (eq_attr "isa" "p10")
 	  (match_test "TARGET_POWER10"))
      (const_int 1)
+
+     (and (eq_attr "isa" "fastmath")
+	  (match_test "flag_unsafe_math_optimizations"))
+     (const_int 1)
     ] (const_int 0)))
 
 ;; If this instruction is microcoded on the CELL processor
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0865608f94a..85d4ac5082f 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2025,7 +2025,8 @@
    xvmaddasp %x0,%x1,%x2
    xvmaddmsp %x0,%x1,%x3
    vmaddfp %0,%1,%2,%3"
-  [(set_attr "type" "vecfloat")])
+  [(set_attr "type" "vecfloat")
+   (set_attr "isa" "*,*,fastmath")])
 
 (define_insn "*vsx_fmav2df4"
   [(set (match_operand:V2DF 0 "vsx_register_operand" "=wa,wa")
@@ -2078,7 +2079,8 @@
    xvnmsubasp %x0,%x1,%x2
    xvnmsubmsp %x0,%x1,%x3
    vnmsubfp %0,%1,%2,%3"
-  [(set_attr "type" "vecfloat")])
+  [(set_attr "type" "vecfloat")
+   (set_attr "isa" "*,*,fastmath")])
 
 (define_insn "*vsx_nfmsv2df4"
   [(set (match_operand:V2DF 0 "vsx_register_operand" "=wa,wa")
diff --git a/gcc/testsuite/gcc.target/powerpc/pr70243-2.c b/gcc/testsuite/gcc.target/powerpc/pr70243-2.c
new file mode 100644
index 00000000000..2a4c2b17f6f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr70243-2.c
@@ -0,0 +1,41 @@
+/* { dg-do compile */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-Ofast -mvsx" } */
+
+/* PR 70423, Make sure we don't generate fmaddfp or fnmsubfp unless -ffast-math
+   is used.  These instructions do not round the same way the normal VSX
+   instructions do.  These tests are written where the 3 inputs and target are
+   all separate registers where the register allocator would prefer to issue
+   the 4 argument FMA instruction over the 3 argument instruction plus an extra
+   move.  */
+
+#include <altivec.h>
+
+vector float
+do_add1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return (a * b) + c;
+}
+
+vector float
+do_nsub1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return -((a * b) - c);
+}
+
+vector float
+do_add2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_madd (a, b, c);
+}
+
+vector float
+do_nsub2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_nmsub (a, b, c);
+}
+
+/* { dg-final { scan-assembler-not "xvmaddsp"  } } */
+/* { dg-final { scan-assembler-not "xvnmsubsp" } } */
+/* { dg-final { scan-assembler     "fmaddfp"   } } */
+/* { dg-final { scan-assembler     "fnmsubfp"  } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr70243.c b/gcc/testsuite/gcc.target/powerpc/pr70243.c
new file mode 100644
index 00000000000..7860767d83f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr70243.c
@@ -0,0 +1,41 @@
+/* { dg-do compile */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+/* PR 70423, Make sure we don't generate fmaddfp or fnmsubfp unless -ffast-math
+   is used.  These instructions do not round the same way the normal VSX
+   instructions do.  These tests are written where the 3 inputs and target are
+   all separate registers where the register allocator would prefer to issue
+   the 4 argument FMA instruction over the 3 argument instruction plus an extra
+   move.  */
+
+#include <altivec.h>
+
+vector float
+do_add1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return (a * b) + c;
+}
+
+vector float
+do_nsub1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return -((a * b) - c);
+}
+
+vector float
+do_add2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_madd (a, b, c);
+}
+
+vector float
+do_nsub2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_nmsub (a, b, c);
+}
+
+/* { dg-final { scan-assembler     "xvmaddsp"  } } */
+/* { dg-final { scan-assembler     "xvnmsubsp" } } */
+/* { dg-final { scan-assembler-not "fmaddfp"   } } */
+/* { dg-final { scan-assembler-not "fnmsubfp"  } } */

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [gcc(refs/users/meissner/heads/work117)] Do not generate fmaddfp and fnmsubfp
@ 2023-04-06 20:32 Michael Meissner
  0 siblings, 0 replies; 7+ messages in thread
From: Michael Meissner @ 2023-04-06 20:32 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:80575f56b4392f1f0afe58ee0b1478e54d41a4ca

commit 80575f56b4392f1f0afe58ee0b1478e54d41a4ca
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Thu Apr 6 16:31:58 2023 -0400

    Do not generate fmaddfp and fnmsubfp
    
    The Altivec instructions fmaddfp and fnmsubfp have different rounding behaviors
    than the VSX xvmaddsp and xvnmsubsp instructions.  In particular, generating
    these instructions seems to break Eigen.
    
    GCC has generated the Altivec fmaddfp and fnmsubfp instructions on VSX systems
    as an alternative to the xsmadd{a,m}sp and xsnmsub{a,m}sp instructions.  The
    advantage  of the Altivec instructions is that they are 4 operand instructions
    (i.e. the target register does not have to overlap with one of the input
    registers).  The advantage is it can eliminate an extra move instruction.  The
    disadvantage is it does round the same was as the VSX instructions.
    
    This patch eliminates the generation of the Altivec fmaddfp and fnmsubfp
    instructions as alternatives in the VSX instruction insn support, and in the
    Altivec insns it adds a test to prevent the insn from being used if VSX is
    available.  I also added a test to the regression test suite.
    
    I have done bootstrap builds on power9 little endian (with both IEEE long
    double and IBM long double).  I have also done the builds and test on a power8
    big endian system (testing both 32-bit and 64-bit code generation).  Chip has
    verified that it fixes the problem that Eigen encountered.  Can I check this
    into the master GCC branch?  After a burn-in period, can I check this patch
    into the active GCC branches?
    
    Thanks in advance.
    
    2023-04-06   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            PR target/70243
            * config/rs6000/rs6000.md (isa attribute): Add fastmath.
            (enabled attribute): Add support for fastmath.
            * config/rs6000/vsx.md (vsx_fmav4sf4): Set the isa attribute to
            fastmath to disable Altivec instruction generatins normally.
            (vsx_nfmsv4sf4): Likewise.
    
    gcc/testsuite/
    
            PR target/70243
            * gcc.target/powerpc/pr70243.c: New test.
            * gcc.target/powerpc/pr70243-2.c: New test.

Diff:
---
 gcc/config/rs6000/rs6000.md                  |  6 +++-
 gcc/config/rs6000/vsx.md                     |  6 ++--
 gcc/testsuite/gcc.target/powerpc/pr70243-2.c | 41 ++++++++++++++++++++++++++++
 gcc/testsuite/gcc.target/powerpc/pr70243.c   | 41 ++++++++++++++++++++++++++++
 4 files changed, 91 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 44f7dd509cb..7fea6a40e0c 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -354,7 +354,7 @@
   (const (symbol_ref "(enum attr_cpu) rs6000_tune")))
 
 ;; The ISA we implement.
-(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10"
+(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10,fastmath"
   (const_string "any"))
 
 ;; Is this alternative enabled for the current CPU/ISA/etc.?
@@ -402,6 +402,10 @@
      (and (eq_attr "isa" "p10")
 	  (match_test "TARGET_POWER10"))
      (const_int 1)
+
+     (and (eq_attr "isa" "fastmath")
+	  (match_test "flag_unsafe_math_optimizations"))
+     (const_int 1)
     ] (const_int 0)))
 
 ;; If this instruction is microcoded on the CELL processor
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0865608f94a..85d4ac5082f 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2025,7 +2025,8 @@
    xvmaddasp %x0,%x1,%x2
    xvmaddmsp %x0,%x1,%x3
    vmaddfp %0,%1,%2,%3"
-  [(set_attr "type" "vecfloat")])
+  [(set_attr "type" "vecfloat")
+   (set_attr "isa" "*,*,fastmath")])
 
 (define_insn "*vsx_fmav2df4"
   [(set (match_operand:V2DF 0 "vsx_register_operand" "=wa,wa")
@@ -2078,7 +2079,8 @@
    xvnmsubasp %x0,%x1,%x2
    xvnmsubmsp %x0,%x1,%x3
    vnmsubfp %0,%1,%2,%3"
-  [(set_attr "type" "vecfloat")])
+  [(set_attr "type" "vecfloat")
+   (set_attr "isa" "*,*,fastmath")])
 
 (define_insn "*vsx_nfmsv2df4"
   [(set (match_operand:V2DF 0 "vsx_register_operand" "=wa,wa")
diff --git a/gcc/testsuite/gcc.target/powerpc/pr70243-2.c b/gcc/testsuite/gcc.target/powerpc/pr70243-2.c
new file mode 100644
index 00000000000..6eb3a53a76e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr70243-2.c
@@ -0,0 +1,41 @@
+/* { dg-do compile */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-Ofast -mvsx" } */
+
+/* PR 70423, Make sure we don't generate fmaddfp or fnmsubfp unless -ffast-math
+   is used.  These instructions do not round the same way the normal VSX
+   instructions do.  These tests are written where the 3 inputs and target are
+   all separate registers where the register allocator would prefer to issue
+   the 4 argument FMA instruction over the 3 argument instruction plus an extra
+   move.  */
+
+#include <altivec.h>
+
+vector float
+do_add1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return (a * b) + c;
+}
+
+vector float
+do_nsub1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return -((a * b) - c);
+}
+
+vector float
+do_add2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_madd (a, b, c);
+}
+
+vector float
+do_nsub2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_nmsub (a, b, c);
+}
+
+/* { dg-final { scan-assembler-not {\mxvmadd[am]sp\M}  } } */
+/* { dg-final { scan-assembler-not {\mxvnmsub[am]sp\M} } } */
+/* { dg-final { scan-assembler     {\mvmaddfp\M}       } } */
+/* { dg-final { scan-assembler     {\mvnmsubfp\M}      } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr70243.c b/gcc/testsuite/gcc.target/powerpc/pr70243.c
new file mode 100644
index 00000000000..3d8ca9cff29
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr70243.c
@@ -0,0 +1,41 @@
+/* { dg-do compile */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+/* PR 70423, Make sure we don't generate fmaddfp or fnmsubfp unless -ffast-math
+   is used.  These instructions do not round the same way the normal VSX
+   instructions do.  These tests are written where the 3 inputs and target are
+   all separate registers where the register allocator would prefer to issue
+   the 4 argument FMA instruction over the 3 argument instruction plus an extra
+   move.  */
+
+#include <altivec.h>
+
+vector float
+do_add1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return (a * b) + c;
+}
+
+vector float
+do_nsub1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return -((a * b) - c);
+}
+
+vector float
+do_add2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_madd (a, b, c);
+}
+
+vector float
+do_nsub2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_nmsub (a, b, c);
+}
+
+/* { dg-final { scan-assembler     {\mxvmadd[am]sp\M}  } } */
+/* { dg-final { scan-assembler     {\mxvnmsub[am]sp\M} } } */
+/* { dg-final { scan-assembler-not {\mvmaddfp\M}       } } */
+/* { dg-final { scan-assembler-not {\mvnmsubfp\M}      } } */

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [gcc(refs/users/meissner/heads/work117)] Do not generate fmaddfp and fnmsubfp
@ 2023-04-06 19:52 Michael Meissner
  0 siblings, 0 replies; 7+ messages in thread
From: Michael Meissner @ 2023-04-06 19:52 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:887189cd66871102fef739a57a3404310f88c499

commit 887189cd66871102fef739a57a3404310f88c499
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Thu Apr 6 15:52:05 2023 -0400

    Do not generate fmaddfp and fnmsubfp
    
    The Altivec instructions fmaddfp and fnmsubfp have different rounding behaviors
    than the VSX xvmaddsp and xvnmsubsp instructions.  In particular, generating
    these instructions seems to break Eigen.
    
    GCC has generated the Altivec fmaddfp and fnmsubfp instructions on VSX systems
    as an alternative to the xsmadd{a,m}sp and xsnmsub{a,m}sp instructions.  The
    advantage  of the Altivec instructions is that they are 4 operand instructions
    (i.e. the target register does not have to overlap with one of the input
    registers).  The advantage is it can eliminate an extra move instruction.  The
    disadvantage is it does round the same was as the VSX instructions.
    
    This patch eliminates the generation of the Altivec fmaddfp and fnmsubfp
    instructions as alternatives in the VSX instruction insn support, and in the
    Altivec insns it adds a test to prevent the insn from being used if VSX is
    available.  I also added a test to the regression test suite.
    
    I have done bootstrap builds on power9 little endian (with both IEEE long
    double and IBM long double).  I have also done the builds and test on a power8
    big endian system (testing both 32-bit and 64-bit code generation).  Chip has
    verified that it fixes the problem that Eigen encountered.  Can I check this
    into the master GCC branch?  After a burn-in period, can I check this patch
    into the active GCC branches?
    
    Thanks in advance.
    
    2023-04-06   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            PR target/70243
            * config/rs6000/rs6000.md (isa attribute): Add fastmath.
            (enabled attribute): Add support for fastmath.
            * config/rs6000/vsx.md (vsx_fmav4sf4): Set the isa attribute to
            fastmath to disable Altivec instruction generatins normally.
            (vsx_nfmsv4sf4): Likewise.
    
    gcc/testsuite/
    
            PR target/70243
            * gcc.target/powerpc/pr70243.c: New test.
            * gcc.target/powerpc/pr70243-2.c: New test.

Diff:
---
 gcc/config/rs6000/rs6000.md                  |  6 +++-
 gcc/config/rs6000/vsx.md                     |  6 ++--
 gcc/testsuite/gcc.target/powerpc/pr70243-2.c | 41 ++++++++++++++++++++++++++++
 gcc/testsuite/gcc.target/powerpc/pr70243.c   | 41 ++++++++++++++++++++++++++++
 4 files changed, 91 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 44f7dd509cb..7fea6a40e0c 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -354,7 +354,7 @@
   (const (symbol_ref "(enum attr_cpu) rs6000_tune")))
 
 ;; The ISA we implement.
-(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10"
+(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10,fastmath"
   (const_string "any"))
 
 ;; Is this alternative enabled for the current CPU/ISA/etc.?
@@ -402,6 +402,10 @@
      (and (eq_attr "isa" "p10")
 	  (match_test "TARGET_POWER10"))
      (const_int 1)
+
+     (and (eq_attr "isa" "fastmath")
+	  (match_test "flag_unsafe_math_optimizations"))
+     (const_int 1)
     ] (const_int 0)))
 
 ;; If this instruction is microcoded on the CELL processor
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0865608f94a..85d4ac5082f 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2025,7 +2025,8 @@
    xvmaddasp %x0,%x1,%x2
    xvmaddmsp %x0,%x1,%x3
    vmaddfp %0,%1,%2,%3"
-  [(set_attr "type" "vecfloat")])
+  [(set_attr "type" "vecfloat")
+   (set_attr "isa" "*,*,fastmath")])
 
 (define_insn "*vsx_fmav2df4"
   [(set (match_operand:V2DF 0 "vsx_register_operand" "=wa,wa")
@@ -2078,7 +2079,8 @@
    xvnmsubasp %x0,%x1,%x2
    xvnmsubmsp %x0,%x1,%x3
    vnmsubfp %0,%1,%2,%3"
-  [(set_attr "type" "vecfloat")])
+  [(set_attr "type" "vecfloat")
+   (set_attr "isa" "*,*,fastmath")])
 
 (define_insn "*vsx_nfmsv2df4"
   [(set (match_operand:V2DF 0 "vsx_register_operand" "=wa,wa")
diff --git a/gcc/testsuite/gcc.target/powerpc/pr70243-2.c b/gcc/testsuite/gcc.target/powerpc/pr70243-2.c
new file mode 100644
index 00000000000..27460150631
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr70243-2.c
@@ -0,0 +1,41 @@
+/* { dg-do compile */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-Ofast -mvsx" } */
+
+/* PR 70423, Make sure we don't generate fmaddfp or fnmsubfp unless -ffast-math
+   is used.  These instructions do not round the same way the normal VSX
+   instructions do.  These tests are written where the 3 inputs and target are
+   all separate registers where the register allocator would prefer to issue
+   the 4 argument FMA instruction over the 3 argument instruction plus an extra
+   move.  */
+
+#include <altivec.h>
+
+vector float
+do_add1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return (a * b) + c;
+}
+
+vector float
+do_nsub1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return -((a * b) - c);
+}
+
+vector float
+do_add2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_madd (a, b, c);
+}
+
+vector float
+do_nsub2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_nmsub (a, b, c);
+}
+
+/* { dg-final { scan-assembler-not {\mxvmadd[am]sp\M}  } } */
+/* { dg-final { scan-assembler-not {\mxvnmsub[am]sp\M} } } */
+/* { dg-final { scan-assembler     {\mfmaddfp\M}       } } */
+/* { dg-final { scan-assembler     {\mfnmsubfp\M}      } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr70243.c b/gcc/testsuite/gcc.target/powerpc/pr70243.c
new file mode 100644
index 00000000000..91b75b68986
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr70243.c
@@ -0,0 +1,41 @@
+/* { dg-do compile */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+/* PR 70423, Make sure we don't generate fmaddfp or fnmsubfp unless -ffast-math
+   is used.  These instructions do not round the same way the normal VSX
+   instructions do.  These tests are written where the 3 inputs and target are
+   all separate registers where the register allocator would prefer to issue
+   the 4 argument FMA instruction over the 3 argument instruction plus an extra
+   move.  */
+
+#include <altivec.h>
+
+vector float
+do_add1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return (a * b) + c;
+}
+
+vector float
+do_nsub1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return -((a * b) - c);
+}
+
+vector float
+do_add2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_madd (a, b, c);
+}
+
+vector float
+do_nsub2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_nmsub (a, b, c);
+}
+
+/* { dg-final { scan-assembler     {\mxvmadd[am]sp\M}  } } */
+/* { dg-final { scan-assembler     {\mxvnmsub[am]sp\M} } } */
+/* { dg-final { scan-assembler-not {\mfmaddfp\M}       } } */
+/* { dg-final { scan-assembler-not {\mfnmsubfp\M}      } } */

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [gcc(refs/users/meissner/heads/work117)] Do not generate fmaddfp and fnmsubfp
@ 2023-04-06 17:20 Michael Meissner
  0 siblings, 0 replies; 7+ messages in thread
From: Michael Meissner @ 2023-04-06 17:20 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:be052de3b9adde5c7a445bb2eb2f346099aef3f9

commit be052de3b9adde5c7a445bb2eb2f346099aef3f9
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Thu Apr 6 13:20:21 2023 -0400

    Do not generate fmaddfp and fnmsubfp
    
    The Altivec instructions fmaddfp and fnmsubfp have different rounding behaviors
    than the VSX xvmaddsp and xvnmsubsp instructions.  In particular, generating
    these instructions seems to break Eigen.
    
    GCC has generated the Altivec fmaddfp and fnmsubfp instructions on VSX systems
    as an alternative to the xsmadd{a,m}sp and xsnmsub{a,m}sp instructions.  The
    advantage  of the Altivec instructions is that they are 4 operand instructions
    (i.e. the target register does not have to overlap with one of the input
    registers).  The advantage is it can eliminate an extra move instruction.  The
    disadvantage is it does round the same was as the VSX instructions.
    
    This patch eliminates the generation of the Altivec fmaddfp and fnmsubfp
    instructions as alternatives in the VSX instruction insn support, and in the
    Altivec insns it adds a test to prevent the insn from being used if VSX is
    available.  I also added a test to the regression test suite.
    
    I have done bootstrap builds on power9 little endian (with both IEEE long
    double and IBM long double).  I have also done the builds and test on a power8
    big endian system (testing both 32-bit and 64-bit code generation).  Chip has
    verified that it fixes the problem that Eigen encountered.  Can I check this
    into the master GCC branch?  After a burn-in period, can I check this patch
    into the active GCC branches?
    
    Thanks in advance.
    
    2023-04-06   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            PR target/70243
            * config/rs6000/rs6000.md (isa attribute): Add fastmath.
            (enabled attribute): Add support for fastmath.
            * config/rs6000/vsx.md (vsx_fmav4sf4): Set the isa attribute to
            fastmath to disable Altivec instruction generatins normally.
            (vsx_nfmsv4sf4): Likewise.
    
    gcc/testsuite/
    
            PR target/70243
            * gcc.target/powerpc/pr70243.c: New test.
            * gcc.target/powerpc/pr70243-2.c: New test.

Diff:
---
 gcc/config/rs6000/rs6000.md                  |  6 +++-
 gcc/config/rs6000/vsx.md                     |  6 ++--
 gcc/testsuite/gcc.target/powerpc/pr70243-2.c | 41 ++++++++++++++++++++++++++++
 gcc/testsuite/gcc.target/powerpc/pr70243.c   | 41 ++++++++++++++++++++++++++++
 4 files changed, 91 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 44f7dd509cb..21eae610efc 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -354,7 +354,7 @@
   (const (symbol_ref "(enum attr_cpu) rs6000_tune")))
 
 ;; The ISA we implement.
-(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10"
+(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10,fastmath"
   (const_string "any"))
 
 ;; Is this alternative enabled for the current CPU/ISA/etc.?
@@ -402,6 +402,10 @@
      (and (eq_attr "isa" "p10")
 	  (match_test "TARGET_POWER10"))
      (const_int 1)
+
+     (and (eq_attr "isa" "fastmath")
+	  (match_test "flag_unsafe_math_optimizations || flag_rounding_math"))
+     (const_int 1)
     ] (const_int 0)))
 
 ;; If this instruction is microcoded on the CELL processor
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0865608f94a..85d4ac5082f 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2025,7 +2025,8 @@
    xvmaddasp %x0,%x1,%x2
    xvmaddmsp %x0,%x1,%x3
    vmaddfp %0,%1,%2,%3"
-  [(set_attr "type" "vecfloat")])
+  [(set_attr "type" "vecfloat")
+   (set_attr "isa" "*,*,fastmath")])
 
 (define_insn "*vsx_fmav2df4"
   [(set (match_operand:V2DF 0 "vsx_register_operand" "=wa,wa")
@@ -2078,7 +2079,8 @@
    xvnmsubasp %x0,%x1,%x2
    xvnmsubmsp %x0,%x1,%x3
    vnmsubfp %0,%1,%2,%3"
-  [(set_attr "type" "vecfloat")])
+  [(set_attr "type" "vecfloat")
+   (set_attr "isa" "*,*,fastmath")])
 
 (define_insn "*vsx_nfmsv2df4"
   [(set (match_operand:V2DF 0 "vsx_register_operand" "=wa,wa")
diff --git a/gcc/testsuite/gcc.target/powerpc/pr70243-2.c b/gcc/testsuite/gcc.target/powerpc/pr70243-2.c
new file mode 100644
index 00000000000..5045e1271a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr70243-2.c
@@ -0,0 +1,41 @@
+/* { dg-do compile */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx -ffast-math" } */
+
+/* PR 70423, Make sure we don't generate fmaddfp or fnmsubfp unless -ffast-math
+   is used.  These instructions do not round the same way the normal VSX
+   instructions do.  These tests are written where the 3 inputs and target are
+   all separate registers where the register allocator would prefer to issue
+   the 4 argument FMA instruction over the 3 argument instruction plus an extra
+   move.  */
+
+#include <altivec.h>
+
+vector float
+do_add1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return (a * b) + c;
+}
+
+vector float
+do_nsub1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return -((a * b) - c);
+}
+
+vector float
+do_add2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_madd (a, b, c);
+}
+
+vector float
+do_nsub2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_nmsub (a, b, c);
+}
+
+/* { dg-final { scan-assembler-not "xvmaddsp"  } } */
+/* { dg-final { scan-assembler-not "xvnmsubsp" } } */
+/* { dg-final { scan-assembler     "fmaddfp"   } } */
+/* { dg-final { scan-assembler     "fnmsubfp"  } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr70243.c b/gcc/testsuite/gcc.target/powerpc/pr70243.c
new file mode 100644
index 00000000000..7860767d83f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr70243.c
@@ -0,0 +1,41 @@
+/* { dg-do compile */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+/* PR 70423, Make sure we don't generate fmaddfp or fnmsubfp unless -ffast-math
+   is used.  These instructions do not round the same way the normal VSX
+   instructions do.  These tests are written where the 3 inputs and target are
+   all separate registers where the register allocator would prefer to issue
+   the 4 argument FMA instruction over the 3 argument instruction plus an extra
+   move.  */
+
+#include <altivec.h>
+
+vector float
+do_add1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return (a * b) + c;
+}
+
+vector float
+do_nsub1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return -((a * b) - c);
+}
+
+vector float
+do_add2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_madd (a, b, c);
+}
+
+vector float
+do_nsub2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_nmsub (a, b, c);
+}
+
+/* { dg-final { scan-assembler     "xvmaddsp"  } } */
+/* { dg-final { scan-assembler     "xvnmsubsp" } } */
+/* { dg-final { scan-assembler-not "fmaddfp"   } } */
+/* { dg-final { scan-assembler-not "fnmsubfp"  } } */

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [gcc(refs/users/meissner/heads/work117)] Do not generate fmaddfp and fnmsubfp
@ 2023-04-06  3:14 Michael Meissner
  0 siblings, 0 replies; 7+ messages in thread
From: Michael Meissner @ 2023-04-06  3:14 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:705e878b8261bf703a24a9067d0d44d842f39846

commit 705e878b8261bf703a24a9067d0d44d842f39846
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed Apr 5 23:13:52 2023 -0400

    Do not generate fmaddfp and fnmsubfp
    
    The Altivec instructions fmaddfp and fnmsubfp have different rounding behaviors
    than the VSX xvmaddsp and xvnmsubsp instructions.  In particular, generating
    these instructions seems to break Eigen.
    
    2023-04-05   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            PR target/70243
            * config/rs6000/altivec.md (altivec_fmav4sf4): Add a test to prevent
            fmaddfp and fnmsubfp from being generated on VSX systems.
            (altivec_vnmsubfp): Likewise.
            * config/rs6000/rs6000.md (vsx_fmav4sf4): Do not generate fmaddfp or
            fnmsubfp.
            (vsx_nfmsv4sf4): Likewise.
    
    gcc/testsuite/
    
            PR target/70243
            * gcc.target/powerpc/pr70243.c: New test.

Diff:
---
 gcc/config/rs6000/altivec.md               |  9 +++++--
 gcc/config/rs6000/vsx.md                   | 29 ++++++++++-----------
 gcc/testsuite/gcc.target/powerpc/pr70243.c | 41 ++++++++++++++++++++++++++++++
 3 files changed, 61 insertions(+), 18 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 49b0c964f4d..63eab228d0d 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -750,12 +750,15 @@
 
 ;; Fused multiply add.
 
+;; If we are using VSX instructions, do not generate the vmaddfp instruction
+;; since is has different rounding behavior than the xvmaddsp instruction.
+
 (define_insn "*altivec_fmav4sf4"
   [(set (match_operand:V4SF 0 "register_operand" "=v")
 	(fma:V4SF (match_operand:V4SF 1 "register_operand" "v")
 		  (match_operand:V4SF 2 "register_operand" "v")
 		  (match_operand:V4SF 3 "register_operand" "v")))]
-  "VECTOR_UNIT_ALTIVEC_P (V4SFmode)"
+  "VECTOR_UNIT_ALTIVEC_P (V4SFmode) && !TARGET_VSX"
   "vmaddfp %0,%1,%2,%3"
   [(set_attr "type" "vecfloat")])
 
@@ -984,6 +987,8 @@
   [(set_attr "type" "vecsimple")])
 
 ;; Fused multiply subtract 
+;; If we are using VSX instructions, do not generate the vnmsubfp instruction
+;; since is has different rounding behavior than the xvnmsubsp instruction.
 (define_insn "*altivec_vnmsubfp"
   [(set (match_operand:V4SF 0 "register_operand" "=v")
 	(neg:V4SF
@@ -991,7 +996,7 @@
 		   (match_operand:V4SF 2 "register_operand" "v")
 		   (neg:V4SF
 		    (match_operand:V4SF 3 "register_operand" "v")))))]
-  "VECTOR_UNIT_ALTIVEC_P (V4SFmode)"
+  "VECTOR_UNIT_ALTIVEC_P (V4SFmode) && !TARGET_VSX"
   "vnmsubfp %0,%1,%2,%3"
   [(set_attr "type" "vecfloat")])
 
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0865608f94a..03c1d787b6c 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2009,22 +2009,20 @@
   "x<VSv>tsqrt<sd>p %0,%x1"
   [(set_attr "type" "<VStype_simple>")])
 
-;; Fused vector multiply/add instructions. Support the classical Altivec
-;; versions of fma, which allows the target to be a separate register from the
-;; 3 inputs.  Under VSX, the target must be either the addend or the first
-;; multiply.
+;; Fused vector multiply/add instructions. Do not use the classical Altivec
+;; versions of fma.  Those instructions allows the target to be a separate
+;; register from the 3 inputs, but they have different rounding behaviors.
 
 (define_insn "*vsx_fmav4sf4"
-  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
 	(fma:V4SF
-	  (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
-	  (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
-	  (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))]
+	  (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
+	  (match_operand:V4SF 2 "vsx_register_operand" "wa,0")
+	  (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))]
   "VECTOR_UNIT_VSX_P (V4SFmode)"
   "@
    xvmaddasp %x0,%x1,%x2
-   xvmaddmsp %x0,%x1,%x3
-   vmaddfp %0,%1,%2,%3"
+   xvmaddmsp %x0,%x1,%x3"
   [(set_attr "type" "vecfloat")])
 
 (define_insn "*vsx_fmav2df4"
@@ -2066,18 +2064,17 @@
   [(set_attr "type" "<VStype_mul>")])
 
 (define_insn "*vsx_nfmsv4sf4"
-  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
 	(neg:V4SF
 	 (fma:V4SF
-	   (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
-	   (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
+	   (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
+	   (match_operand:V4SF 2 "vsx_register_operand" "wa,0")
 	   (neg:V4SF
-	     (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))))]
+	     (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))))]
   "VECTOR_UNIT_VSX_P (V4SFmode)"
   "@
    xvnmsubasp %x0,%x1,%x2
-   xvnmsubmsp %x0,%x1,%x3
-   vnmsubfp %0,%1,%2,%3"
+   xvnmsubmsp %x0,%x1,%x3"
   [(set_attr "type" "vecfloat")])
 
 (define_insn "*vsx_nfmsv2df4"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr70243.c b/gcc/testsuite/gcc.target/powerpc/pr70243.c
new file mode 100644
index 00000000000..4ec54734f3e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr70243.c
@@ -0,0 +1,41 @@
+/* { dg-do compile */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+/* PR 70423, Make sure we don't generate fmaddfp or fnmsubfp.  These
+   instructions have different rounding modes than the VSX instructions
+   xvmaddsp and xvnmsubsp.  These tests are written where the 3 inputs and
+   target are all separate registers.  Because fmaddfp and fnmsubfp are no
+   longer generated the compiler will have to generate an xsmadd[am]sp or
+   xsnmsub[am]sp instruction along with a move operation.  */
+
+#include <altivec.h>
+
+vector float
+do_add1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return (a * b) + c;
+}
+
+vector float
+do_nsub1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return -((a * b) - c);
+}
+
+vector float
+do_add2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_madd (a, b, c);
+}
+
+vector float
+do_nsub2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_nmsub (a, b, c);
+}
+
+/* { dg-final { scan-assembler     {\mxvmadd[am]sp\M}  } } */
+/* { dg-final { scan-assembler     {\mxvnmsub[am]sp\M} } } */
+/* { dg-final { scan-assembler-not {\mfmaddfp\M}       } } */
+/* { dg-final { scan-assembler-not {\mfnmsubfp\M}      } } */

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [gcc(refs/users/meissner/heads/work117)] Do not generate fmaddfp and fnmsubfp
@ 2023-04-06  2:48 Michael Meissner
  0 siblings, 0 replies; 7+ messages in thread
From: Michael Meissner @ 2023-04-06  2:48 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:e20b21164990960fb8c2afaa4811d2c9ad645794

commit e20b21164990960fb8c2afaa4811d2c9ad645794
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed Apr 5 22:47:04 2023 -0400

    Do not generate fmaddfp and fnmsubfp
    
    The Altivec instructions fmaddfp and fnmsubfp have different rounding behaviors
    than the VSX xvmaddsp and xvnmsubsp instructions.  In particular, generating
    these instructions seems to break Eigen.
    
    2023-04-05   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            PR target/70243
            * config/rs6000/altivec.md (altivec_fmav4sf4): Add a test to prevent
            fmaddfp and fnmsubfp from being generated on VSX systems.
            (altivec_vnmsubfp): Likewise.
            * config/rs6000/rs6000.md (vsx_fmav4sf4): Do not generate fmaddfp or
            fnmsubfp.
            (vsx_nfmsv4sf4): Likewise.
    
    gcc/testsuite/
    
            PR target/70243
            * gcc.target/powerpc/pr70243.c: New test.

Diff:
---
 gcc/config/rs6000/altivec.md               |  9 +++++--
 gcc/config/rs6000/vsx.md                   | 29 ++++++++++-----------
 gcc/testsuite/gcc.target/powerpc/pr70243.c | 41 ++++++++++++++++++++++++++++++
 3 files changed, 61 insertions(+), 18 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 49b0c964f4d..63eab228d0d 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -750,12 +750,15 @@
 
 ;; Fused multiply add.
 
+;; If we are using VSX instructions, do not generate the vmaddfp instruction
+;; since is has different rounding behavior than the xvmaddsp instruction.
+
 (define_insn "*altivec_fmav4sf4"
   [(set (match_operand:V4SF 0 "register_operand" "=v")
 	(fma:V4SF (match_operand:V4SF 1 "register_operand" "v")
 		  (match_operand:V4SF 2 "register_operand" "v")
 		  (match_operand:V4SF 3 "register_operand" "v")))]
-  "VECTOR_UNIT_ALTIVEC_P (V4SFmode)"
+  "VECTOR_UNIT_ALTIVEC_P (V4SFmode) && !TARGET_VSX"
   "vmaddfp %0,%1,%2,%3"
   [(set_attr "type" "vecfloat")])
 
@@ -984,6 +987,8 @@
   [(set_attr "type" "vecsimple")])
 
 ;; Fused multiply subtract 
+;; If we are using VSX instructions, do not generate the vnmsubfp instruction
+;; since is has different rounding behavior than the xvnmsubsp instruction.
 (define_insn "*altivec_vnmsubfp"
   [(set (match_operand:V4SF 0 "register_operand" "=v")
 	(neg:V4SF
@@ -991,7 +996,7 @@
 		   (match_operand:V4SF 2 "register_operand" "v")
 		   (neg:V4SF
 		    (match_operand:V4SF 3 "register_operand" "v")))))]
-  "VECTOR_UNIT_ALTIVEC_P (V4SFmode)"
+  "VECTOR_UNIT_ALTIVEC_P (V4SFmode) && !TARGET_VSX"
   "vnmsubfp %0,%1,%2,%3"
   [(set_attr "type" "vecfloat")])
 
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0865608f94a..03c1d787b6c 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2009,22 +2009,20 @@
   "x<VSv>tsqrt<sd>p %0,%x1"
   [(set_attr "type" "<VStype_simple>")])
 
-;; Fused vector multiply/add instructions. Support the classical Altivec
-;; versions of fma, which allows the target to be a separate register from the
-;; 3 inputs.  Under VSX, the target must be either the addend or the first
-;; multiply.
+;; Fused vector multiply/add instructions. Do not use the classical Altivec
+;; versions of fma.  Those instructions allows the target to be a separate
+;; register from the 3 inputs, but they have different rounding behaviors.
 
 (define_insn "*vsx_fmav4sf4"
-  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
 	(fma:V4SF
-	  (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
-	  (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
-	  (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))]
+	  (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
+	  (match_operand:V4SF 2 "vsx_register_operand" "wa,0")
+	  (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))]
   "VECTOR_UNIT_VSX_P (V4SFmode)"
   "@
    xvmaddasp %x0,%x1,%x2
-   xvmaddmsp %x0,%x1,%x3
-   vmaddfp %0,%1,%2,%3"
+   xvmaddmsp %x0,%x1,%x3"
   [(set_attr "type" "vecfloat")])
 
 (define_insn "*vsx_fmav2df4"
@@ -2066,18 +2064,17 @@
   [(set_attr "type" "<VStype_mul>")])
 
 (define_insn "*vsx_nfmsv4sf4"
-  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
 	(neg:V4SF
 	 (fma:V4SF
-	   (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
-	   (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
+	   (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
+	   (match_operand:V4SF 2 "vsx_register_operand" "wa,0")
 	   (neg:V4SF
-	     (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))))]
+	     (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))))]
   "VECTOR_UNIT_VSX_P (V4SFmode)"
   "@
    xvnmsubasp %x0,%x1,%x2
-   xvnmsubmsp %x0,%x1,%x3
-   vnmsubfp %0,%1,%2,%3"
+   xvnmsubmsp %x0,%x1,%x3"
   [(set_attr "type" "vecfloat")])
 
 (define_insn "*vsx_nfmsv2df4"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr70243.c b/gcc/testsuite/gcc.target/powerpc/pr70243.c
new file mode 100644
index 00000000000..1dfc13a8864
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr70243.c
@@ -0,0 +1,41 @@
+/* { dg-do compile */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+/* PR 70423, Make sure we don't generate fmaddfp or fnmsubfp.  These
+   instructions have different rounding modes than the VSX instructions
+   xvmaddsp and xvnmsubsp.  These tests are written where the 3 inputs and
+   target are all separate registers.  Because fmaddfp and fnmsubfp are no
+   longer generated the compiler will have to generate an xsmaddsp or xsnmsubsp
+   instruction followed by a move operation.  */
+
+#include <altivec.h>
+
+vector float
+do_add1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return (a * b) + c;
+}
+
+vector float
+do_nsub1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return -((a * b) - c);
+}
+
+vector float
+do_add2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_madd (a, b, c);
+}
+
+vector float
+do_nsub2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_nmsub (a, b, c);
+}
+
+/* { dg-final { scan-assembler     "xvmaddsp"  } } */
+/* { dg-final { scan-assembler     "xvnmsubsp" } } */
+/* { dg-final { scan-assembler-not "fmaddfp"   } } */
+/* { dg-final { scan-assembler-not "fnmsubfp"  } } */

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [gcc(refs/users/meissner/heads/work117)] Do not generate fmaddfp and fnmsubfp
@ 2023-04-05 23:20 Michael Meissner
  0 siblings, 0 replies; 7+ messages in thread
From: Michael Meissner @ 2023-04-05 23:20 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:6367e6b4c9ae6ffc0f5ba9e0430a2f44be867a8d

commit 6367e6b4c9ae6ffc0f5ba9e0430a2f44be867a8d
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed Apr 5 19:20:08 2023 -0400

    Do not generate fmaddfp and fnmsubfp
    
    The Altivec instructions fmaddfp and fnmsubfp have different rounding behaviors
    than the VSX xvmaddsp and xvnmsubsp instructions.  In particular, generating
    these instructions seems to break Eigen.
    
    2023-04-05   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            PR target/70243
            * config/rs6000/altivec.md (altivec_fmav4sf4): Add a test to prevent
            fmaddfp and fnmsubfp from being generated on VSX systems.
            (altivec_vnmsubfp): Likewise.
            * config/rs6000/rs6000.md (vsx_fmav4sf4): Do not generate fmaddfp or
            fnmsubfp.
            (vsx_nfmsv4sf4): Likewise.
    
    gcc/testsuite/
    
            PR target/70243
            * gcc.target/powerpc/pr70243.c: New test.

Diff:
---
 gcc/config/rs6000/altivec.md               |  9 +++++--
 gcc/config/rs6000/vsx.md                   | 29 ++++++++++------------
 gcc/testsuite/gcc.target/powerpc/pr70243.c | 39 ++++++++++++++++++++++++++++++
 3 files changed, 59 insertions(+), 18 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 49b0c964f4d..63eab228d0d 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -750,12 +750,15 @@
 
 ;; Fused multiply add.
 
+;; If we are using VSX instructions, do not generate the vmaddfp instruction
+;; since is has different rounding behavior than the xvmaddsp instruction.
+
 (define_insn "*altivec_fmav4sf4"
   [(set (match_operand:V4SF 0 "register_operand" "=v")
 	(fma:V4SF (match_operand:V4SF 1 "register_operand" "v")
 		  (match_operand:V4SF 2 "register_operand" "v")
 		  (match_operand:V4SF 3 "register_operand" "v")))]
-  "VECTOR_UNIT_ALTIVEC_P (V4SFmode)"
+  "VECTOR_UNIT_ALTIVEC_P (V4SFmode) && !TARGET_VSX"
   "vmaddfp %0,%1,%2,%3"
   [(set_attr "type" "vecfloat")])
 
@@ -984,6 +987,8 @@
   [(set_attr "type" "vecsimple")])
 
 ;; Fused multiply subtract 
+;; If we are using VSX instructions, do not generate the vnmsubfp instruction
+;; since is has different rounding behavior than the xvnmsubsp instruction.
 (define_insn "*altivec_vnmsubfp"
   [(set (match_operand:V4SF 0 "register_operand" "=v")
 	(neg:V4SF
@@ -991,7 +996,7 @@
 		   (match_operand:V4SF 2 "register_operand" "v")
 		   (neg:V4SF
 		    (match_operand:V4SF 3 "register_operand" "v")))))]
-  "VECTOR_UNIT_ALTIVEC_P (V4SFmode)"
+  "VECTOR_UNIT_ALTIVEC_P (V4SFmode) && !TARGET_VSX"
   "vnmsubfp %0,%1,%2,%3"
   [(set_attr "type" "vecfloat")])
 
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0865608f94a..03c1d787b6c 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2009,22 +2009,20 @@
   "x<VSv>tsqrt<sd>p %0,%x1"
   [(set_attr "type" "<VStype_simple>")])
 
-;; Fused vector multiply/add instructions. Support the classical Altivec
-;; versions of fma, which allows the target to be a separate register from the
-;; 3 inputs.  Under VSX, the target must be either the addend or the first
-;; multiply.
+;; Fused vector multiply/add instructions. Do not use the classical Altivec
+;; versions of fma.  Those instructions allows the target to be a separate
+;; register from the 3 inputs, but they have different rounding behaviors.
 
 (define_insn "*vsx_fmav4sf4"
-  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
 	(fma:V4SF
-	  (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
-	  (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
-	  (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))]
+	  (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
+	  (match_operand:V4SF 2 "vsx_register_operand" "wa,0")
+	  (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))]
   "VECTOR_UNIT_VSX_P (V4SFmode)"
   "@
    xvmaddasp %x0,%x1,%x2
-   xvmaddmsp %x0,%x1,%x3
-   vmaddfp %0,%1,%2,%3"
+   xvmaddmsp %x0,%x1,%x3"
   [(set_attr "type" "vecfloat")])
 
 (define_insn "*vsx_fmav2df4"
@@ -2066,18 +2064,17 @@
   [(set_attr "type" "<VStype_mul>")])
 
 (define_insn "*vsx_nfmsv4sf4"
-  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
 	(neg:V4SF
 	 (fma:V4SF
-	   (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
-	   (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
+	   (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
+	   (match_operand:V4SF 2 "vsx_register_operand" "wa,0")
 	   (neg:V4SF
-	     (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))))]
+	     (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))))]
   "VECTOR_UNIT_VSX_P (V4SFmode)"
   "@
    xvnmsubasp %x0,%x1,%x2
-   xvnmsubmsp %x0,%x1,%x3
-   vnmsubfp %0,%1,%2,%3"
+   xvnmsubmsp %x0,%x1,%x3"
   [(set_attr "type" "vecfloat")])
 
 (define_insn "*vsx_nfmsv2df4"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr70243.c b/gcc/testsuite/gcc.target/powerpc/pr70243.c
new file mode 100644
index 00000000000..5dc4e997193
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr70243.c
@@ -0,0 +1,39 @@
+/* { dg-do compile { } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power7" } */
+
+/* PR 70423, Make sure we don't generate fmaddfp or fnmsubfp.  These
+   instructions have different rounding modes than the VSX instructions
+   xvmaddsp and xvnmsubsp.  */
+
+#include <altivec.h>
+
+vector float
+do_add1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return (a * b) + c;
+}
+
+vector float
+do_nsub1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return -((a * b) - c);
+}
+
+vector float
+do_add2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_madd (a, b, c);
+}
+
+vector float
+do_nsub2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_nmsub (a, b, c);
+}
+
+/* { dg-final { scan-assembler     "xvmaddsp"  } } */
+/* { dg-final { scan-assembler     "xvnmsubsp" } } */
+/* { dg-final { scan-assembler-not "fmaddfp"   } } */
+/* { dg-final { scan-assembler-not "fnmsubfp"  } } */

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-04-06 20:32 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-06 18:48 [gcc(refs/users/meissner/heads/work117)] Do not generate fmaddfp and fnmsubfp Michael Meissner
  -- strict thread matches above, loose matches on Subject: below --
2023-04-06 20:32 Michael Meissner
2023-04-06 19:52 Michael Meissner
2023-04-06 17:20 Michael Meissner
2023-04-06  3:14 Michael Meissner
2023-04-06  2:48 Michael Meissner
2023-04-05 23:20 Michael Meissner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).