PR target/70243: Do not generate fmaddfp and fnmsubfp

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* PR target/70243: Do not generate fmaddfp and fnmsubfp
@ 2023-04-06 15:12 Michael Meissner
  2023-04-06 20:37 ` Segher Boessenkool
  0 siblings, 1 reply; 4+ messages in thread
From: Michael Meissner @ 2023-04-06 15:12 UTC (permalink / raw)
  To: gcc-patches, Michael Meissner, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner, Will Schmidt, chip.kerchner

The Altivec instructions fmaddfp and fnmsubfp have different rounding behaviors
than the VSX xvmaddsp and xvnmsubsp instructions.  In particular, generating
these instructions seems to break Eigen.

GCC has generated the Altivec fmaddfp and fnmsubfp instructions on VSX systems
as an alternative to the xsmadd{a,m}sp and xsnmsub{a,m}sp instructions.  The
advantage  of the Altivec instructions is that they are 4 operand instructions
(i.e. the target register does not have to overlap with one of the input
registers).  The advantage is it can eliminate an extra move instruction.  The
disadvantage is it does round the same was as the VSX instructions.

This patch eliminates the generation of the Altivec fmaddfp and fnmsubfp
instructions as alternatives in the VSX instruction insn support, and in the
Altivec insns it adds a test to prevent the insn from being used if VSX is
available.  I also added a test to the regression test suite.

I have done bootstrap builds on power9 little endian (with both IEEE long
double and IBM long double).  I have also done the builds and test on a power8
big endian system (testing both 32-bit and 64-bit code generation).  Chip has
verified that it fixes the problem that Eigen encountered.  Can I check this
into the master GCC branch?  After a burn-in period, can I check this patch
into the active GCC branches?

Thanks in advance.

2023-04-06   Michael Meissner  <meissner@linux.ibm.com>

gcc/

	PR target/70243
	* config/rs6000/altivec.md (altivec_fmav4sf4): Add a test to prevent
	fmaddfp and fnmsubfp from being generated on VSX systems.
	(altivec_vnmsubfp): Likewise.
	* config/rs6000/rs6000.md (vsx_fmav4sf4): Do not generate fmaddfp or
	fnmsubfp.
	(vsx_nfmsv4sf4): Likewise.

gcc/testsuite/

	PR target/70243
	* gcc.target/powerpc/pr70243.c: New test.
---
 gcc/config/rs6000/altivec.md               |  9 +++--
 gcc/config/rs6000/vsx.md                   | 29 +++++++--------
 gcc/testsuite/gcc.target/powerpc/pr70243.c | 41 ++++++++++++++++++++++
 3 files changed, 61 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr70243.c

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 49b0c964f4d..63eab228d0d 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -750,12 +750,15 @@ (define_insn "altivec_vsel<mode>4"
 
 ;; Fused multiply add.
 
+;; If we are using VSX instructions, do not generate the vmaddfp instruction
+;; since is has different rounding behavior than the xvmaddsp instruction.
+
 (define_insn "*altivec_fmav4sf4"
   [(set (match_operand:V4SF 0 "register_operand" "=v")
 	(fma:V4SF (match_operand:V4SF 1 "register_operand" "v")
 		  (match_operand:V4SF 2 "register_operand" "v")
 		  (match_operand:V4SF 3 "register_operand" "v")))]
-  "VECTOR_UNIT_ALTIVEC_P (V4SFmode)"
+  "VECTOR_UNIT_ALTIVEC_P (V4SFmode) && !TARGET_VSX"
   "vmaddfp %0,%1,%2,%3"
   [(set_attr "type" "vecfloat")])
 
@@ -984,6 +987,8 @@ (define_insn "vstril_p_direct_<mode>"
   [(set_attr "type" "vecsimple")])
 
 ;; Fused multiply subtract 
+;; If we are using VSX instructions, do not generate the vnmsubfp instruction
+;; since is has different rounding behavior than the xvnmsubsp instruction.
 (define_insn "*altivec_vnmsubfp"
   [(set (match_operand:V4SF 0 "register_operand" "=v")
 	(neg:V4SF
@@ -991,7 +996,7 @@ (define_insn "*altivec_vnmsubfp"
 		   (match_operand:V4SF 2 "register_operand" "v")
 		   (neg:V4SF
 		    (match_operand:V4SF 3 "register_operand" "v")))))]
-  "VECTOR_UNIT_ALTIVEC_P (V4SFmode)"
+  "VECTOR_UNIT_ALTIVEC_P (V4SFmode) && !TARGET_VSX"
   "vnmsubfp %0,%1,%2,%3"
   [(set_attr "type" "vecfloat")])
 
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 0865608f94a..03c1d787b6c 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -2009,22 +2009,20 @@ (define_insn "*vsx_tsqrt<mode>2_internal"
   "x<VSv>tsqrt<sd>p %0,%x1"
   [(set_attr "type" "<VStype_simple>")])
 
-;; Fused vector multiply/add instructions. Support the classical Altivec
-;; versions of fma, which allows the target to be a separate register from the
-;; 3 inputs.  Under VSX, the target must be either the addend or the first
-;; multiply.
+;; Fused vector multiply/add instructions. Do not use the classical Altivec
+;; versions of fma.  Those instructions allows the target to be a separate
+;; register from the 3 inputs, but they have different rounding behaviors.
 
 (define_insn "*vsx_fmav4sf4"
-  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
 	(fma:V4SF
-	  (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
-	  (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
-	  (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))]
+	  (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
+	  (match_operand:V4SF 2 "vsx_register_operand" "wa,0")
+	  (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))]
   "VECTOR_UNIT_VSX_P (V4SFmode)"
   "@
    xvmaddasp %x0,%x1,%x2
-   xvmaddmsp %x0,%x1,%x3
-   vmaddfp %0,%1,%2,%3"
+   xvmaddmsp %x0,%x1,%x3"
   [(set_attr "type" "vecfloat")])
 
 (define_insn "*vsx_fmav2df4"
@@ -2066,18 +2064,17 @@ (define_insn "*vsx_nfma<mode>4"
   [(set_attr "type" "<VStype_mul>")])
 
 (define_insn "*vsx_nfmsv4sf4"
-  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
 	(neg:V4SF
 	 (fma:V4SF
-	   (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
-	   (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
+	   (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
+	   (match_operand:V4SF 2 "vsx_register_operand" "wa,0")
 	   (neg:V4SF
-	     (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))))]
+	     (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))))]
   "VECTOR_UNIT_VSX_P (V4SFmode)"
   "@
    xvnmsubasp %x0,%x1,%x2
-   xvnmsubmsp %x0,%x1,%x3
-   vnmsubfp %0,%1,%2,%3"
+   xvnmsubmsp %x0,%x1,%x3"
   [(set_attr "type" "vecfloat")])
 
 (define_insn "*vsx_nfmsv2df4"
diff --git a/gcc/testsuite/gcc.target/powerpc/pr70243.c b/gcc/testsuite/gcc.target/powerpc/pr70243.c
new file mode 100644
index 00000000000..1dfc13a8864
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr70243.c
@@ -0,0 +1,41 @@
+/* { dg-do compile */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx" } */
+
+/* PR 70423, Make sure we don't generate fmaddfp or fnmsubfp.  These
+   instructions have different rounding modes than the VSX instructions
+   xvmaddsp and xvnmsubsp.  These tests are written where the 3 inputs and
+   target are all separate registers.  Because fmaddfp and fnmsubfp are no
+   longer generated the compiler will have to generate an xsmaddsp or xsnmsubsp
+   instruction followed by a move operation.  */
+
+#include <altivec.h>
+
+vector float
+do_add1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return (a * b) + c;
+}
+
+vector float
+do_nsub1 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return -((a * b) - c);
+}
+
+vector float
+do_add2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_madd (a, b, c);
+}
+
+vector float
+do_nsub2 (vector float dummy, vector float a, vector float b, vector float c)
+{
+  return vec_nmsub (a, b, c);
+}
+
+/* { dg-final { scan-assembler     "xvmaddsp"  } } */
+/* { dg-final { scan-assembler     "xvnmsubsp" } } */
+/* { dg-final { scan-assembler-not "fmaddfp"   } } */
+/* { dg-final { scan-assembler-not "fnmsubfp"  } } */
-- 
2.39.2


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: PR target/70243: Do not generate fmaddfp and fnmsubfp
  2023-04-06 15:12 PR target/70243: Do not generate fmaddfp and fnmsubfp Michael Meissner
@ 2023-04-06 20:37 ` Segher Boessenkool
  2023-04-07  6:32   ` Michael Meissner
  0 siblings, 1 reply; 4+ messages in thread
From: Segher Boessenkool @ 2023-04-06 20:37 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Kewen.Lin, David Edelsohn,
	Peter Bergner, Will Schmidt, chip.kerchner

Hi!

On Thu, Apr 06, 2023 at 11:12:11AM -0400, Michael Meissner wrote:
> The Altivec instructions fmaddfp and fnmsubfp have different rounding behaviors

Those are not existing instructions.  You mean "vmaddfp" etc.

> than the VSX xvmaddsp and xvnmsubsp instructions.  In particular, generating
> these instructions seems to break Eigen.

Those instructions use round-to-nearest-tiea-to-even, like all other
VMX FP insns.  A proper patch has to deal with all VMX FP insns.  But,
almost all programs expect that rounding mode anyway, so this is not a
problem in practice.  What happened on Eigen is that the Linux kernel
starts every new process with VSCR[NJ]=1, breaking pretty much
everything that wants floating point for non-toy purposes.  (There
currently is a bug on LE that sets the wrong bit, hiding the problem in
that configuration, but it is intended there as well).

> GCC has generated the Altivec fmaddfp and fnmsubfp instructions on VSX systems
> as an alternative to the xsmadd{a,m}sp and xsnmsub{a,m}sp instructions.  The
> advantage  of the Altivec instructions is that they are 4 operand instructions
> (i.e. the target register does not have to overlap with one of the input
> registers).  The advantage is it can eliminate an extra move instruction.  The
> disadvantage is it does round the same was as the VSX instructions.

And it gets the VSCR[NJ] setting applied.  Yup.

> This patch eliminates the generation of the Altivec fmaddfp and fnmsubfp
> instructions as alternatives in the VSX instruction insn support, and in the
> Altivec insns it adds a test to prevent the insn from being used if VSX is
> available.  I also added a test to the regression test suite.

Please leave the latter out, it does not belong in this patch.  If you
want a patch to do that deal with *all* VMX FP insns?  There also are
add, sub, mul, etc.  Well I think those (as well as madd and nmsub) are
the only ones that use the NJ bit or the RN bits, but please check.

> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -750,12 +750,15 @@ (define_insn "altivec_vsel<mode>4"
>  
>  ;; Fused multiply add.
>  
> +;; If we are using VSX instructions, do not generate the vmaddfp instruction
> +;; since is has different rounding behavior than the xvmaddsp instruction.
> +

No blank lines please.

>  (define_insn "*altivec_fmav4sf4"
>    [(set (match_operand:V4SF 0 "register_operand" "=v")
>  	(fma:V4SF (match_operand:V4SF 1 "register_operand" "v")
>  		  (match_operand:V4SF 2 "register_operand" "v")
>  		  (match_operand:V4SF 3 "register_operand" "v")))]
> -  "VECTOR_UNIT_ALTIVEC_P (V4SFmode)"
> +  "VECTOR_UNIT_ALTIVEC_P (V4SFmode) && !TARGET_VSX"

This is very error-prone.  Maybe add a test to the VECTOR_UNIT_ALTIVEC
macro instead?

> -;; Fused vector multiply/add instructions. Support the classical Altivec
> -;; versions of fma, which allows the target to be a separate register from the
> -;; 3 inputs.  Under VSX, the target must be either the addend or the first
> -;; multiply.
> +;; Fused vector multiply/add instructions. Do not use the classical Altivec

(Two spaces after dot, and AltiVec is spelled with a capital V.  I don't
like it either, VMX is a much nicer and more regular name).

> +;; versions of fma.  Those instructions allows the target to be a separate
> +;; register from the 3 inputs, but they have different rounding behaviors.
>  
>  (define_insn "*vsx_fmav4sf4"
> -  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
> +  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
>  	(fma:V4SF
> -	  (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
> -	  (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
> -	  (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))]
> +	  (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
> +	  (match_operand:V4SF 2 "vsx_register_operand" "wa,0")
> +	  (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))]
>    "VECTOR_UNIT_VSX_P (V4SFmode)"
>    "@
>     xvmaddasp %x0,%x1,%x2
> -   xvmaddmsp %x0,%x1,%x3
> -   vmaddfp %0,%1,%2,%3"
> +   xvmaddmsp %x0,%x1,%x3"
>    [(set_attr "type" "vecfloat")])

So this part looks okay, and it alone is safe for GCC 13 as well.

>  (define_insn "*vsx_nfmsv4sf4"
> -  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
> +  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
>  	(neg:V4SF
>  	 (fma:V4SF
> -	   (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
> -	   (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
> +	   (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
> +	   (match_operand:V4SF 2 "vsx_register_operand" "wa,0")
>  	   (neg:V4SF
> -	     (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))))]
> +	     (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))))]
>    "VECTOR_UNIT_VSX_P (V4SFmode)"
>    "@
>     xvnmsubasp %x0,%x1,%x2
> -   xvnmsubmsp %x0,%x1,%x3
> -   vnmsubfp %0,%1,%2,%3"
> +   xvnmsubmsp %x0,%x1,%x3"
>    [(set_attr "type" "vecfloat")])

Well, together with this of course :-)

Could you please do that?


Segher

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: PR target/70243: Do not generate fmaddfp and fnmsubfp
  2023-04-06 20:37 ` Segher Boessenkool
@ 2023-04-07  6:32   ` Michael Meissner
  2023-04-07 11:44     ` Segher Boessenkool
  0 siblings, 1 reply; 4+ messages in thread
From: Michael Meissner @ 2023-04-07  6:32 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Michael Meissner, gcc-patches, Kewen.Lin, David Edelsohn,
	Peter Bergner, Will Schmidt, chip.kerchner

On Thu, Apr 06, 2023 at 03:37:59PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Apr 06, 2023 at 11:12:11AM -0400, Michael Meissner wrote:
> > The Altivec instructions fmaddfp and fnmsubfp have different rounding behaviors
> 
> Those are not existing instructions.  You mean "vmaddfp" etc.

Yes, sorry about that.  I guess I was thinking about the scalar instructions.

> > than the VSX xvmaddsp and xvnmsubsp instructions.  In particular, generating
> > these instructions seems to break Eigen.
> 
> Those instructions use round-to-nearest-tiea-to-even, like all other
> VMX FP insns.  A proper patch has to deal with all VMX FP insns.  But,
> almost all programs expect that rounding mode anyway, so this is not a
> problem in practice.  What happened on Eigen is that the Linux kernel
> starts every new process with VSCR[NJ]=1, breaking pretty much
> everything that wants floating point for non-toy purposes.  (There
> currently is a bug on LE that sets the wrong bit, hiding the problem in
> that configuration, but it is intended there as well).
> 
> > GCC has generated the Altivec fmaddfp and fnmsubfp instructions on VSX systems
> > as an alternative to the xsmadd{a,m}sp and xsnmsub{a,m}sp instructions.  The
> > advantage  of the Altivec instructions is that they are 4 operand instructions
> > (i.e. the target register does not have to overlap with one of the input
> > registers).  The advantage is it can eliminate an extra move instruction.  The
> > disadvantage is it does round the same was as the VSX instructions.
> 
> And it gets the VSCR[NJ] setting applied.  Yup.
> 
> > This patch eliminates the generation of the Altivec fmaddfp and fnmsubfp
> > instructions as alternatives in the VSX instruction insn support, and in the
> > Altivec insns it adds a test to prevent the insn from being used if VSX is
> > available.  I also added a test to the regression test suite.
> 
> Please leave the latter out, it does not belong in this patch.  If you
> want a patch to do that deal with *all* VMX FP insns?  There also are
> add, sub, mul, etc.  Well I think those (as well as madd and nmsub) are
> the only ones that use the NJ bit or the RN bits, but please check.

After I posted the patch I refreshed my memory of the VECTOR_UNIT_ALTIVEC_P
macro and it is not true if VSX code generation is enabled.  So I dropped the
changes to altivec.md.

In addition, as far as I know, the only AltiVec (VMX) floating point
instructions generated when VSX is used are the vmaddfp and vnmsubfp
instructions.  In the case of add and subtract, xvaddsp and xvsubsp is more
general than vaddfp or vsubfp since it can access all VSX registers.  VMX does
not have a stand-alone multiply (it generates FMA with a zero register) and it
does not have a division operation.  And VMX does not have xvmsub{a,m}sp nor
xvnadd{a,m}sp variations of the FMA instructions.

> > --- a/gcc/config/rs6000/altivec.md
> > +++ b/gcc/config/rs6000/altivec.md
> > @@ -750,12 +750,15 @@ (define_insn "altivec_vsel<mode>4"
> >  
> >  ;; Fused multiply add.
> >  
> > +;; If we are using VSX instructions, do not generate the vmaddfp instruction
> > +;; since is has different rounding behavior than the xvmaddsp instruction.
> > +
> 
> No blank lines please.

Ok.

> >  (define_insn "*altivec_fmav4sf4"
> >    [(set (match_operand:V4SF 0 "register_operand" "=v")
> >  	(fma:V4SF (match_operand:V4SF 1 "register_operand" "v")
> >  		  (match_operand:V4SF 2 "register_operand" "v")
> >  		  (match_operand:V4SF 3 "register_operand" "v")))]
> > -  "VECTOR_UNIT_ALTIVEC_P (V4SFmode)"
> > +  "VECTOR_UNIT_ALTIVEC_P (V4SFmode) && !TARGET_VSX"
> 
> This is very error-prone.  Maybe add a test to the VECTOR_UNIT_ALTIVEC
> macro instead?

As I said that part of the code is not in the next patch.

> > -;; Fused vector multiply/add instructions. Support the classical Altivec
> > -;; versions of fma, which allows the target to be a separate register from the
> > -;; 3 inputs.  Under VSX, the target must be either the addend or the first
> > -;; multiply.
> > +;; Fused vector multiply/add instructions. Do not use the classical Altivec

> (Two spaces after dot, and AltiVec is spelled with a capital V.  I don't
> like it either, VMX is a much nicer and more regular name).

When the name might be more regular, but in terms of the instruction set, it
does have holes that I mentioned above (no multiply that is not a FMA, two of
the four FMA variants are not provided).

> > +;; versions of fma.  Those instructions allows the target to be a separate
> > +;; register from the 3 inputs, but they have different rounding behaviors.
> >  
> >  (define_insn "*vsx_fmav4sf4"
> > -  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
> > +  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
> >  	(fma:V4SF
> > -	  (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
> > -	  (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
> > -	  (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))]
> > +	  (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
> > +	  (match_operand:V4SF 2 "vsx_register_operand" "wa,0")
> > +	  (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))]
> >    "VECTOR_UNIT_VSX_P (V4SFmode)"
> >    "@
> >     xvmaddasp %x0,%x1,%x2
> > -   xvmaddmsp %x0,%x1,%x3
> > -   vmaddfp %0,%1,%2,%3"
> > +   xvmaddmsp %x0,%x1,%x3"
> >    [(set_attr "type" "vecfloat")])
> 
> So this part looks okay, and it alone is safe for GCC 13 as well.

Well as we were discussing on a private channel, it is desirable to generate
vmaddfp and vnmsubfp if -Ofast is used, so the next patch incorporates that
change.  It will consider generating those instructions if -Ofast is used, and
if not, it will only generate VSX instructions.

> >  (define_insn "*vsx_nfmsv4sf4"
> > -  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa,v")
> > +  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
> >  	(neg:V4SF
> >  	 (fma:V4SF
> > -	   (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa,v")
> > -	   (match_operand:V4SF 2 "vsx_register_operand" "wa,0,v")
> > +	   (match_operand:V4SF 1 "vsx_register_operand" "%wa,wa")
> > +	   (match_operand:V4SF 2 "vsx_register_operand" "wa,0")
> >  	   (neg:V4SF
> > -	     (match_operand:V4SF 3 "vsx_register_operand" "0,wa,v")))))]
> > +	     (match_operand:V4SF 3 "vsx_register_operand" "0,wa")))))]
> >    "VECTOR_UNIT_VSX_P (V4SFmode)"
> >    "@
> >     xvnmsubasp %x0,%x1,%x2
> > -   xvnmsubmsp %x0,%x1,%x3
> > -   vnmsubfp %0,%1,%2,%3"
> > +   xvnmsubmsp %x0,%x1,%x3"
> >    [(set_attr "type" "vecfloat")])
> 
> Well, together with this of course :-)
> 
> Could you please do that?

I will submit the patch separately.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: PR target/70243: Do not generate fmaddfp and fnmsubfp
  2023-04-07  6:32   ` Michael Meissner
@ 2023-04-07 11:44     ` Segher Boessenkool
  0 siblings, 0 replies; 4+ messages in thread
From: Segher Boessenkool @ 2023-04-07 11:44 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Kewen.Lin, David Edelsohn,
	Peter Bergner, Will Schmidt, chip.kerchner

Hi!

On Fri, Apr 07, 2023 at 02:32:04AM -0400, Michael Meissner wrote:
> On Thu, Apr 06, 2023 at 03:37:59PM -0500, Segher Boessenkool wrote:
> > > This patch eliminates the generation of the Altivec fmaddfp and fnmsubfp
> > > instructions as alternatives in the VSX instruction insn support, and in the
> > > Altivec insns it adds a test to prevent the insn from being used if VSX is
> > > available.  I also added a test to the regression test suite.
> > 
> > Please leave the latter out, it does not belong in this patch.  If you
> > want a patch to do that deal with *all* VMX FP insns?  There also are
> > add, sub, mul, etc.  Well I think those (as well as madd and nmsub) are
> > the only ones that use the NJ bit or the RN bits, but please check.
> 
> After I posted the patch I refreshed my memory of the VECTOR_UNIT_ALTIVEC_P
> macro and it is not true if VSX code generation is enabled.  So I dropped the
> changes to altivec.md.

Right you are.  We still run into all the same problems for -maltivec
-mno-vsx compilations, but no one (knock on wood) does that except it is
the default on old systems.  We can live with that / we'll just have to
live with that, take your pick.

> In addition, as far as I know, the only AltiVec (VMX) floating point
> instructions generated when VSX is used are the vmaddfp and vnmsubfp
> instructions.

That is more likely given that VECTOR_UNIT_ALTIVEC_P means things match
if *only* VMX registers are allowed, right.  But it still sits very
uneasy with me, the way it is written is not very defensive.

> In the case of add and subtract, xvaddsp and xvsubsp is more
> general than vaddfp or vsubfp since it can access all VSX registers.  VMX does
> not have a stand-alone multiply (it generates FMA with a zero register) and it
> does not have a division operation.  And VMX does not have xvmsub{a,m}sp nor
> xvnadd{a,m}sp variations of the FMA instructions.

Yes, it has only the more frequent two variants.

> > >  (define_insn "*altivec_fmav4sf4"
> > >    [(set (match_operand:V4SF 0 "register_operand" "=v")
> > >  	(fma:V4SF (match_operand:V4SF 1 "register_operand" "v")
> > >  		  (match_operand:V4SF 2 "register_operand" "v")
> > >  		  (match_operand:V4SF 3 "register_operand" "v")))]
> > > -  "VECTOR_UNIT_ALTIVEC_P (V4SFmode)"
> > > +  "VECTOR_UNIT_ALTIVEC_P (V4SFmode) && !TARGET_VSX"
> > 
> > This is very error-prone.  Maybe add a test to the VECTOR_UNIT_ALTIVEC
> > macro instead?
> 
> As I said that part of the code is not in the next patch.

Excellent.

> > > -;; Fused vector multiply/add instructions. Support the classical Altivec
> > > -;; versions of fma, which allows the target to be a separate register from the
> > > -;; 3 inputs.  Under VSX, the target must be either the addend or the first
> > > -;; multiply.
> > > +;; Fused vector multiply/add instructions. Do not use the classical Altivec
> 
> > (Two spaces after dot, and AltiVec is spelled with a capital V.  I don't
> > like it either, VMX is a much nicer and more regular name).
> 
> When the name might be more regular, but in terms of the instruction set, it
> does have holes that I mentioned above (no multiply that is not a FMA, two of
> the four FMA variants are not provided).

Yes.  And several new insns (like all QP insns) are VMX only, in terms
of what registers are allowed.  The mnemonics for most of those insns
(with noteworthy exception the QP insns) starts with a "v", too.

> > So this part looks okay, and it alone is safe for GCC 13 as well.
> 
> Well as we were discussing on a private channel, it is desirable to generate
> vmaddfp and vnmsubfp if -Ofast is used, so the next patch incorporates that
> change.

If only considering the RN effects.  But not when considering NJ screws
us over big time -- we should never generate VMX FP insns unless
explicitly asked for, its semantics are just too foreign.

We could add a separate flag for just this, but is there any demand?


Segher

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-04-07 11:45 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-06 15:12 PR target/70243: Do not generate fmaddfp and fnmsubfp Michael Meissner
2023-04-06 20:37 ` Segher Boessenkool
2023-04-07  6:32   ` Michael Meissner
2023-04-07 11:44     ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).