public inbox for gcc-cvs@sourceware.org
help / color / mirror / Atom feed
* [gcc(refs/users/meissner/heads/work117)] PR target/70243: Do not generate vmaddfp and vnmsubfp
@ 2023-04-07  4:36 Michael Meissner
  0 siblings, 0 replies; only message in thread
From: Michael Meissner @ 2023-04-07  4:36 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:48da331d20f75f36b83f333e9ea51248b726c94d

commit 48da331d20f75f36b83f333e9ea51248b726c94d
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Fri Apr 7 00:36:22 2023 -0400

    PR target/70243: Do not generate vmaddfp and vnmsubfp
    
    This is version 2 of the patch.  The first version was posted on April 6th.
    
    In this version, I eliminated the changes to Altivec.md that added checks to
    altivec_fmav4sf4 and altivec_vnmsubfp.  After writing the code, I remembered
    that VECTOR_UNIT_ALTIVEC_P that is used by those insns will not be true if the
    VSX instruction set is enabled, so no additional test is needed.
    
    As we discussed in a private chat room, I modified the code to generate vmaddfp
    and vnmsubfp if -Ofast (-ffast-math) is used.  This allows the compiler to
    eliminate the extra move if the user does not care about strict floating point
    code generation, but it generates only the VSX instructions in the normal
    case.
    
    I reworked the examples and split them into two tests to test both the normal
    case when -Ofast is not used and when it is used.
    
    I also fixed the instructions mentioned in the comments to be the actual
    instructions (vmaddfp and vnmsubfp) instead of fmaddfp and fnmsubdp.  Sorry
    about tat.
    
    The AltiVec (VMX) instructions vmaddfp and vnmsubfp have different rounding
    behaviors than the VSX xvmadd{a,m}sp and xvnmsub{a,m}sp instructions.  In
    particular, generating these instructions seems to break Eigen.
    
    The bug is that GCC has generated the VMX vmaddfp and vnmsubfp instructions on
    VSX systems as an alternative to the xsmadd{a,m}sp and xsnmsub{a,m}sp
    instructions.  The advantage of the VMX instructions is that they are 4 operand
    instructions (i.e. the target register does not have to overlap with one of the
    input registers).  This can mean that the compiler can eliminate an extra move
    instruction. The disadvantage of generating these instructions is it does not
    round the same was as the VSX instructions.
    
    This patch will only generate the VMX vmaddfp and vnmsubfp instructions as
    alternatives in the VSX instruction insn support if -Ofast (-ffast-math) is
    used.  I also added 2 tests to the regression suite.
    
    I have done bootstrap builds on power9 little endian (with both IEEE long
    double and IBM long double).  I have also done the builds and test on a power8
    big endian system (testing both 32-bit and 64-bit code generation).  Chip has
    verified that it fixes the problem that Eigen encountered.  Can I check this
    into the master GCC branch?  After a burn-in period, can I check this patch
    into the active GCC branches?
    
    Thanks in advance.
    
    2023-04-07   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            PR target/70243
            * config/rs6000/rs6000.md (isa attribute): Add fastmath.
            (enabled attribute): Add support for fastmath.
            * config/rs6000/vsx.md (vsx_fmav4sf4): Set the isa attribute to
            fastmath to disable Altivec instruction generatins normally.
            (vsx_nfmsv4sf4): Likewise.
    
    gcc/testsuite/
    
            PR target/70243
            * gcc.target/powerpc/pr70243.c: New test.
            * gcc.target/powerpc/pr70243-2.c: New test.

Diff:
---
 gcc/config/rs6000/rs6000.md                  |  6 +++-
 gcc/config/rs6000/vsx.md                     | 17 +++++++-----
 gcc/testsuite/gcc.target/powerpc/pr70243-2.c | 41 ++++++++++++++++++++++++++++
 gcc/testsuite/gcc.target/powerpc/pr70243.c   | 41 ++++++++++++++++++++++++++++
 4 files changed, 97 insertions(+), 8 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 44f7dd509cb..7fea6a40e0c 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -354,7 +354,7 @@
   (const (symbol_ref "(enum attr_cpu) rs6000_tune")))
 
 ;; The ISA we implement.
-(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10"
+(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10,fastmath"
   (const_string "any"))
 
 ;; Is this alternative enabled for the current CPU/ISA/etc.?
@@ -402,6 +402,10 @@
      (and (eq_attr "isa" "p10")
 	  (match_test "TARGET_POWER10"))
      (const_int 1)
+
+     (and (eq_attr "isa" "fastmath")
+	  (match_test "flag_unsafe_math_optimizations"))
+     (const_int 1)
     ] (const_int 0)))
 
 ;; If this instruction is microcoded on the CELL processor
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0865608f94a..7f64a2dd356 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2009,11 +2009,12 @@
   "x<VSv>tsqrt<sd>p %0,%x1"
   [(set_attr "type" "<VStype_simple>")])
 
-;; Fused vector multiply/add instructions. Support the classical Altivec
-;; versions of fma, which allows the target to be a separate register from the
-;; 3 inputs.  Under VSX, the target must be either the addend or the first
-;; multiply.
-
+;; Fused vector multiply/add instructions. Under VSX, the target must be either
+;; the addend or the first multiply.  If the user used -Ofast, also support the
+;; classical VMX versions of fma (vmaddfp and vnmsubfp), which allows the
+;; target to be a separate register from the 3 inputs.  This restriction is due
+;; to the fact that vmaddfp and vnmsubfp have different rounding behaviors
+;; compared to xvmadd{a,m}sp or xvnmsub{a,m}sp.
 (define_insn "*vsx_fmav4sf4"
   [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
 	(fma:V4SF
@@ -2025,7 +2026,8 @@
    xvmaddasp %x0,%x1,%x2
    xvmaddmsp %x0,%x1,%x3
    vmaddfp %0,%1,%2,%3"
-  [(set_attr "type" "vecfloat")])
+  [(set_attr "type" "vecfloat")
+   (set_attr "isa" "*,*,fastmath")])
 
 (define_insn "*vsx_fmav2df4"
   [(set (match_operand:V2DF 0 "vsx_register_operand" "=wa,wa")
@@ -2078,7 +2080,8 @@
    xvnmsubasp %x0,%x1,%x2
    xvnmsubmsp %x0,%x1,%x3
    vnmsubfp %0,%1,%2,%3"
-  [(set_attr "type" "vecfloat")])
+  [(set_attr "type" "vecfloat")
+   (set_attr "isa" "*,*,fastmath")])
 
 (define_insn "*vsx_nfmsv2df4"
   [(set (match_operand:V2DF 0 "vsx_register_operand" "=wa,wa")
diff --git a/gcc/testsuite/gcc.target/powerpc/pr70243-2.c b/gcc/testsuite/gcc.target/powerpc/pr70243-2.c
new file mode 100644
index 00000000000..ef475f39b12
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr70243-2.c
@@ -0,0 +1,41 @@
+/* { dg-do compile */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-Ofast -mvsx" } */
+
+/* PR 70423.  Make sure we don't generate vmaddfp or vnmsubfp unless -Ofast is
+   used.  These instructions do not round the same way the normal VSX
+   instructions do.  These tests are written where the 3 inputs and target are
+   all separate registers where the register allocator would prefer to issue
+   the 4 argument FMA instruction over the 3 argument instruction plus an extra
+   move.  */
+
+#include <altivec.h>
+
+vector float
+do_add1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return (a * b) + c;
+}
+
+vector float
+do_nsub1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return -((a * b) - c);
+}
+
+vector float
+do_add2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_madd (a, b, c);
+}
+
+vector float
+do_nsub2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_nmsub (a, b, c);
+}
+
+/* { dg-final { scan-assembler-not {\mxvmadd[am]sp\M}  } } */
+/* { dg-final { scan-assembler-not {\mxvnmsub[am]sp\M} } } */
+/* { dg-final { scan-assembler     {\mvmaddfp\M}       } } */
+/* { dg-final { scan-assembler     {\mvnmsubfp\M}      } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr70243.c b/gcc/testsuite/gcc.target/powerpc/pr70243.c
new file mode 100644
index 00000000000..c1a5c676fc3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr70243.c
@@ -0,0 +1,41 @@
+/* { dg-do compile */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+/* PR 70423.  Make sure we don't generate vmaddfp or vnmsubfp unless -Ofast is
+   used.  These instructions do not round the same way the normal VSX
+   instructions do.  These tests are written where the 3 inputs and target are
+   all separate registers where the register allocator would prefer to issue
+   the 4 argument FMA instruction over the 3 argument instruction plus an extra
+   move.  */
+
+#include <altivec.h>
+
+vector float
+do_add1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return (a * b) + c;
+}
+
+vector float
+do_nsub1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return -((a * b) - c);
+}
+
+vector float
+do_add2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_madd (a, b, c);
+}
+
+vector float
+do_nsub2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_nmsub (a, b, c);
+}
+
+/* { dg-final { scan-assembler     {\mxvmadd[am]sp\M}  } } */
+/* { dg-final { scan-assembler     {\mxvnmsub[am]sp\M} } } */
+/* { dg-final { scan-assembler-not {\mvmaddfp\M}       } } */
+/* { dg-final { scan-assembler-not {\mvnmsubfp\M}      } } */

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2023-04-07  4:36 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-07  4:36 [gcc(refs/users/meissner/heads/work117)] PR target/70243: Do not generate vmaddfp and vnmsubfp Michael Meissner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).