rs6000 fused multiply-add patch

public inbox for gcc-patches@gcc.gnu.org

* rs6000 fused multiply-add patch
@ 2002-12-02 19:01 Geoffrey Keating
  2002-12-03 15:41 ` Segher Boessenkool
  0 siblings, 1 reply; 33+ messages in thread
From: Geoffrey Keating @ 2002-12-02 19:01 UTC (permalink / raw)
  To: gcc-patches; +Cc: dje, pinskia, segher, dalej


I looked at Segher's patch, and while I thought it was the right
direction, I kept finding more missing pieces the more I looked at
it.  So, I wrote my own.  This one has test cases, documentation, and
works in nearly every case.

It ensures that the fused multiply-add operations are used whenever
appropriate when -ffast-math is used.  It also gets it right in nearly
all cases when -ffast-math is not being used, by using fneg followed
by a fused multiply-add operation.

The one case it misses is

(minus A (mult B C))

when -ffast-math is not in effect, which it should really do as

T = (neg B)
(plus (mult T C) A)

causing suboptimal code generation in the a[7] case in the testcase.
If I make combine do this, it'll generate an extra operation on
non-powerpc systems, which I thought would probably be a bad idea (all
the other simplifications I added just move operations around or
delete operations).  Then I tried to make a splitter, which I think
ought to be the right long-term solution, but I couldn't get one that
works, is recognized by combine, and has T as a new temporary.  So I'm
leaving that as a project for later.
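
The reason this case needs an fneg rather than a plain fnmsub can be
seen from the sign of a zero result.  The following is a minimal sketch
in plain C, not GCC code: the function names are invented for
illustration, ordinary unfused arithmetic stands in for the fused
instructions, and it assumes IEEE 754 doubles with the default rounding
mode and no -ffast-math.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return 1 if the sign bit of x is set; this tells -0.0 from +0.0. */
static int sign_bit(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (int)(bits >> 63);
}

/* What the source asks for: A - B*C. */
static double minus_form(double a, double b, double c)
{
    return a - b * c;
}

/* The value an fnmsub-style operation yields: -(B*C - A). */
static double fnmsub_form(double a, double b, double c)
{
    return -(b * c - a);
}

/* fneg followed by a multiply-add: T = -B, then T*C + A. */
static double fneg_fmadd_form(double a, double b, double c)
{
    double t = -b;
    return t * c + a;
}

/* Hypothetical self-check comparing the three forms. */
void check_minus_forms(void)
{
    /* Nonzero result: all three agree. */
    assert(minus_form(5.0, 2.0, 3.0) == -1.0);
    assert(fnmsub_form(5.0, 2.0, 3.0) == -1.0);
    assert(fneg_fmadd_form(5.0, 2.0, 3.0) == -1.0);

    /* Zero result (1 - 1*1): only the fnmsub form flips the sign. */
    assert(sign_bit(minus_form(1.0, 1.0, 1.0)) == 0);
    assert(sign_bit(fneg_fmadd_form(1.0, 1.0, 1.0)) == 0);
    assert(sign_bit(fnmsub_form(1.0, 1.0, 1.0)) == 1);
}
```

Calling check_minus_forms() from a test driver exercises all three
forms.  The fneg form matches A - B*C bit for bit, which is why it stays
valid when signed zeros are honoured, while the fnmsub form is only
equivalent once -ffast-math lets the compiler ignore the sign of zero.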

I'm running a bootstrap & testrun on powerpc-darwin and if it works
I'll commit to mainline.

-- 
- Geoffrey Keating <geoffk@apple.com>

===File ~/patches/rs6000-fnmadd.patch=======================
Index: ChangeLog
2002-12-02  Geoffrey Keating  <geoffk@apple.com>

	* combine.c (combine_simplify_rtx): Add new canonicalizations.
	* doc/md.texi (Insn Canonicalizations): Document new
	canonicalizations for multiply/add combinations.
	* config/rs6000/rs6000.md: Add and modify floating add/multiply
	patterns to ensure they're used whenever they can be.

Index: testsuite/ChangeLog
2002-12-02  Geoffrey Keating  <geoffk@apple.com>

	* gcc.dg/ppc-fmadd-1.c: New file.
	* gcc.dg/ppc-fmadd-2.c: New file.
	* gcc.dg/ppc-fmadd-3.c: New file.

Index: combine.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/combine.c,v
retrieving revision 1.324
diff -u -p -u -p -r1.324 combine.c
--- combine.c	20 Nov 2002 09:43:19 -0000	1.324
+++ combine.c	3 Dec 2002 02:32:22 -0000
@@ -4029,6 +4029,23 @@ combine_simplify_rtx (x, op0_mode, last,
 	return gen_binary (MINUS, mode, XEXP (XEXP (x, 0), 1),
 			   XEXP (XEXP (x, 0), 0));
 
+      /* (neg (plus A B)) is canonicalized to (minus (neg A) B).  */
+      if (GET_CODE (XEXP (x, 0)) == PLUS
+	  && !HONOR_SIGNED_ZEROS (mode)
+	  && !HONOR_SIGN_DEPENDENT_ROUNDING (mode))
+	{
+	  temp = simplify_gen_unary (NEG, mode, XEXP (XEXP (x, 0), 0), mode);
+	  return gen_binary (MINUS, mode, temp, XEXP (XEXP (x, 0), 1));
+	}
+
+      /* (neg (mult A B)) becomes (mult (neg A) B).  
+         This works even for floating-point values.  */
+      if (GET_CODE (XEXP (x, 0)) == MULT)
+	{
+	  temp = simplify_gen_unary (NEG, mode, XEXP (XEXP (x, 0), 0), mode);
+	  return gen_binary (MULT, mode, temp, XEXP (XEXP (x, 0), 1));
+	}
+
       /* (neg (xor A 1)) is (plus A -1) if A is known to be either 0 or 1.  */
       if (GET_CODE (XEXP (x, 0)) == XOR && XEXP (XEXP (x, 0), 1) == const1_rtx
 	  && nonzero_bits (XEXP (XEXP (x, 0), 0), mode) == 1)
@@ -4217,6 +4234,19 @@ combine_simplify_rtx (x, op0_mode, last,
 #endif
 
     case PLUS:
+      /* Canonicalize (plus (mult (neg B) C) A) to (minus A (mult B C)).
+       */
+      if (GET_CODE (XEXP (x, 0)) == MULT 
+	  && GET_CODE (XEXP (XEXP (x, 0), 0)) == NEG)
+	{
+	  rtx in1, in2;
+	 
+	  in1 = XEXP (XEXP (XEXP (x, 0), 0), 0);
+	  in2 = XEXP (XEXP (x, 0), 1);
+	  return gen_binary (MINUS, mode, XEXP (x, 1),
+			     gen_binary (MULT, mode, in1, in2));
+	}
+
       /* If we have (plus (plus (A const) B)), associate it so that CONST is
 	 outermost.  That's because that's the way indexed addresses are
 	 supposed to appear.  This code used to check many more cases, but
@@ -4322,6 +4352,32 @@ combine_simplify_rtx (x, op0_mode, last,
 	  && rtx_equal_p (XEXP (XEXP (x, 1), 0), XEXP (x, 0)))
 	return simplify_and_const_int (NULL_RTX, mode, XEXP (x, 0),
 				       -INTVAL (XEXP (XEXP (x, 1), 1)) - 1);
+
+      /* Canonicalize (minus A (mult (neg B) C)) to (plus (mult B C) A).
+       */
+      if (GET_CODE (XEXP (x, 1)) == MULT 
+	  && GET_CODE (XEXP (XEXP (x, 1), 0)) == NEG)
+	{
+	  rtx in1, in2;
+	 
+	  in1 = XEXP (XEXP (XEXP (x, 1), 0), 0);
+	  in2 = XEXP (XEXP (x, 1), 1);
+	  return gen_binary (PLUS, mode, gen_binary (MULT, mode, in1, in2),
+			     XEXP (x, 0));
+	}
+
+       /* Canonicalize (minus (neg A) (mult B C)) to 
+	  (minus (mult (neg B) C) A). */
+      if (GET_CODE (XEXP (x, 1)) == MULT 
+	  && GET_CODE (XEXP (x, 0)) == NEG)
+	{
+	  rtx in1, in2;
+	 
+	  in1 = simplify_gen_unary (NEG, mode, XEXP (XEXP (x, 1), 0), mode);
+	  in2 = XEXP (XEXP (x, 1), 1);
+	  return gen_binary (MINUS, mode, gen_binary (MULT, mode, in1, in2),
+			     XEXP (XEXP (x, 0), 0));
+	}
 
       /* Canonicalize (minus A (plus B C)) to (minus (minus A B) C) for
 	 integers.  */
Index: config/rs6000/rs6000.md
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/rs6000/rs6000.md,v
retrieving revision 1.222
diff -u -p -u -p -r1.222 rs6000.md
--- config/rs6000/rs6000.md	16 Nov 2002 18:01:51 -0000	1.222
+++ config/rs6000/rs6000.md	3 Dec 2002 02:32:41 -0000
@@ -5280,7 +5280,18 @@
 	(neg:SF (plus:SF (mult:SF (match_operand:SF 1 "gpc_reg_operand" "%f")
 				  (match_operand:SF 2 "gpc_reg_operand" "f"))
 			 (match_operand:SF 3 "gpc_reg_operand" "f"))))]
-  "TARGET_POWERPC && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_FUSED_MADD"
+  "TARGET_POWERPC && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_FUSED_MADD
+   && HONOR_SIGNED_ZEROS (SFmode)"
+  "fnmadds %0,%1,%2,%3"
+  [(set_attr "type" "fp")])
+
+(define_insn ""
+  [(set (match_operand:SF 0 "gpc_reg_operand" "=f")
+	(minus:SF (mult:SF (neg:SF (match_operand:SF 1 "gpc_reg_operand" "f"))
+			   (match_operand:SF 2 "gpc_reg_operand" "f"))
+			 (match_operand:SF 3 "gpc_reg_operand" "f")))]
+  "TARGET_POWERPC && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_FUSED_MADD
+   && ! HONOR_SIGNED_ZEROS (SFmode)"
   "fnmadds %0,%1,%2,%3"
   [(set_attr "type" "fp")])
 
@@ -5295,10 +5306,31 @@
 
 (define_insn ""
   [(set (match_operand:SF 0 "gpc_reg_operand" "=f")
+	(minus:SF (mult:SF (neg:SF (match_operand:SF 1 "gpc_reg_operand" "f"))
+			   (match_operand:SF 2 "gpc_reg_operand" "f"))
+			 (match_operand:SF 3 "gpc_reg_operand" "f")))]
+  "! TARGET_POWERPC && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_FUSED_MADD
+   && ! HONOR_SIGNED_ZEROS (SFmode)"
+  "{fnma|fnmadd} %0,%1,%2,%3"
+  [(set_attr "type" "dmul")])
+
+(define_insn ""
+  [(set (match_operand:SF 0 "gpc_reg_operand" "=f")
 	(neg:SF (minus:SF (mult:SF (match_operand:SF 1 "gpc_reg_operand" "%f")
 				   (match_operand:SF 2 "gpc_reg_operand" "f"))
 			  (match_operand:SF 3 "gpc_reg_operand" "f"))))]
-  "TARGET_POWERPC && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_FUSED_MADD"
+  "TARGET_POWERPC && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_FUSED_MADD
+   && HONOR_SIGNED_ZEROS (SFmode)"
+  "fnmsubs %0,%1,%2,%3"
+  [(set_attr "type" "fp")])
+
+(define_insn ""
+  [(set (match_operand:SF 0 "gpc_reg_operand" "=f")
+	(minus:SF (match_operand:SF 3 "gpc_reg_operand" "f")
+		  (mult:SF (match_operand:SF 1 "gpc_reg_operand" "%f")
+			   (match_operand:SF 2 "gpc_reg_operand" "f"))))]
+  "TARGET_POWERPC && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_FUSED_MADD
+   && ! HONOR_SIGNED_ZEROS (SFmode)"
   "fnmsubs %0,%1,%2,%3"
   [(set_attr "type" "fp")])
 
@@ -5311,6 +5343,16 @@
   "{fnms|fnmsub} %0,%1,%2,%3"
   [(set_attr "type" "dmul")])
 
+(define_insn ""
+  [(set (match_operand:SF 0 "gpc_reg_operand" "=f")
+	(minus:SF (match_operand:SF 3 "gpc_reg_operand" "f")
+		  (mult:SF (match_operand:SF 1 "gpc_reg_operand" "%f")
+			   (match_operand:SF 2 "gpc_reg_operand" "f"))))]
+  "! TARGET_POWERPC && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_FUSED_MADD
+   && ! HONOR_SIGNED_ZEROS (SFmode)"
+  "{fnms|fnmsub} %0,%1,%2,%3"
+  [(set_attr "type" "fp")])
+
 (define_expand "sqrtsf2"
   [(set (match_operand:SF 0 "gpc_reg_operand" "")
 	(sqrt:SF (match_operand:SF 1 "gpc_reg_operand" "")))]
@@ -5524,7 +5566,18 @@
 	(neg:DF (plus:DF (mult:DF (match_operand:DF 1 "gpc_reg_operand" "%f")
 				  (match_operand:DF 2 "gpc_reg_operand" "f"))
 			 (match_operand:DF 3 "gpc_reg_operand" "f"))))]
-  "TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_FUSED_MADD"
+  "TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_FUSED_MADD
+   && HONOR_SIGNED_ZEROS (DFmode)"
+  "{fnma|fnmadd} %0,%1,%2,%3"
+  [(set_attr "type" "dmul")])
+
+(define_insn ""
+  [(set (match_operand:DF 0 "gpc_reg_operand" "=f")
+	(minus:DF (mult:DF (neg:DF (match_operand:DF 1 "gpc_reg_operand" "f"))
+			   (match_operand:DF 2 "gpc_reg_operand" "f"))
+		  (match_operand:DF 3 "gpc_reg_operand" "f")))]
+  "TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_FUSED_MADD
+   && ! HONOR_SIGNED_ZEROS (DFmode)"
   "{fnma|fnmadd} %0,%1,%2,%3"
   [(set_attr "type" "dmul")])
 
@@ -5533,7 +5586,18 @@
 	(neg:DF (minus:DF (mult:DF (match_operand:DF 1 "gpc_reg_operand" "%f")
 				   (match_operand:DF 2 "gpc_reg_operand" "f"))
 			  (match_operand:DF 3 "gpc_reg_operand" "f"))))]
-  "TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_FUSED_MADD"
+  "TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_FUSED_MADD
+   && HONOR_SIGNED_ZEROS (DFmode)"
+  "{fnms|fnmsub} %0,%1,%2,%3"
+  [(set_attr "type" "dmul")])
+
+(define_insn ""
+  [(set (match_operand:DF 0 "gpc_reg_operand" "=f")
+	(minus:DF (match_operand:DF 3 "gpc_reg_operand" "f")
+	          (mult:DF (match_operand:DF 1 "gpc_reg_operand" "%f")
+			   (match_operand:DF 2 "gpc_reg_operand" "f"))))]
+  "TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_FUSED_MADD 
+   && ! HONOR_SIGNED_ZEROS (DFmode)"
   "{fnms|fnmsub} %0,%1,%2,%3"
   [(set_attr "type" "dmul")])
 
Index: doc/md.texi
===================================================================
RCS file: /cvs/gcc/gcc/gcc/doc/md.texi,v
retrieving revision 1.53
diff -u -p -u -p -r1.53 md.texi
--- doc/md.texi	1 Nov 2002 07:05:57 -0000	1.53
+++ doc/md.texi	3 Dec 2002 02:32:43 -0000
@@ -3670,6 +3670,14 @@ For these operators, if only one operand
 @code{mult}, @code{plus}, or @code{minus} expression, it will be the
 first operand.
 
+@item
+In combinations of @code{neg}, @code{mult}, @code{plus}, and
+@code{minus}, the @code{neg} operations (if any) will be moved inside
+the operations as far as possible.  For instance, 
+@code{(neg (mult A B))} is canonicalized as @code{(mult (neg A) B)}, but
+@code{(plus (mult (neg A) B) C)} is canonicalized as
+@code{(minus A (mult B C))}.
+
 @cindex @code{compare}, canonicalization of
 @item
 For the @code{compare} operator, a constant is always the second operand
Index: testsuite/gcc.dg/ppc-fmadd-1.c
===================================================================
RCS file: testsuite/gcc.dg/ppc-fmadd-1.c
diff -N testsuite/gcc.dg/ppc-fmadd-1.c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ testsuite/gcc.dg/ppc-fmadd-1.c	3 Dec 2002 02:33:56 -0000
@@ -0,0 +1,43 @@
+/* { dg-do compile { target powerpc*-*-* } } */
+/* { dg-options "-ffast-math -O2" } */
+/* { dg-final { scan-assembler-not "f(add|sub|mul|neg)" } } */
+
+void foo(double *a, double *b, double *c, double *d)
+{
+  a[0] =  b[0] + c[0] * d[0];		// fmadd
+  a[1] =  b[1] - c[1] * d[1];		// fnmsub with fast-math
+  a[2] = -b[2] + c[2] * d[2];   	// fmsub
+  a[3] = -b[3] - c[3] * d[3];		// fnmadd with fast-math
+  a[4] = -( b[4] + c[4] * d[4]);	// fnmadd
+  a[5] = -( b[5] - c[5] * d[5]);	// fmsub with fast-math
+  a[6] = -(-b[6] + c[6] * d[6]);	// fnmsub
+  a[7] = -(-b[7] - c[7] * d[7]);	// fmadd with fast-math
+  a[10] =  b[10] - c[10] * -d[10];	// fmadd
+  a[11] =  b[11] + c[11] * -d[11];	// fnmsub with fast-math
+  a[12] = -b[12] - c[12] * -d[12];   	// fmsub
+  a[13] = -b[13] + c[13] * -d[13];	// fnmadd with fast-math
+  a[14] = -( b[14] - c[14] * -d[14]);	// fnmadd
+  a[15] = -( b[15] + c[15] * -d[15]);	// fmsub with fast-math
+  a[16] = -(-b[16] - c[16] * -d[16]);	// fnmsub
+  a[17] = -(-b[17] + c[17] * -d[17]);	// fmadd with fast-math
+}
+
+void foos(float *a, float *b, float *c, float *d)
+{
+  a[0] =  b[0] + c[0] * d[0];		// fmadd
+  a[1] =  b[1] - c[1] * d[1];		// fnmsub with fast-math
+  a[2] = -b[2] + c[2] * d[2];   	// fmsub
+  a[3] = -b[3] - c[3] * d[3];		// fnmadd with fast-math
+  a[4] = -( b[4] + c[4] * d[4]);	// fnmadd
+  a[5] = -( b[5] - c[5] * d[5]);	// fmsub with fast-math
+  a[6] = -(-b[6] + c[6] * d[6]);	// fnmsub
+  a[7] = -(-b[7] - c[7] * d[7]);	// fmadd with fast-math
+  a[10] =  b[10] - c[10] * -d[10];	// fmadd
+  a[11] =  b[11] + c[11] * -d[11];	// fnmsub with fast-math
+  a[12] = -b[12] - c[12] * -d[12];   	// fmsub
+  a[13] = -b[13] + c[13] * -d[13];	// fnmadd with fast-math
+  a[14] = -( b[14] - c[14] * -d[14]);	// fnmadd
+  a[15] = -( b[15] + c[15] * -d[15]);	// fmsub with fast-math
+  a[16] = -(-b[16] - c[16] * -d[16]);	// fnmsub
+  a[17] = -(-b[17] + c[17] * -d[17]);	// fmadd with fast-math
+}
Index: testsuite/gcc.dg/ppc-fmadd-2.c
===================================================================
RCS file: testsuite/gcc.dg/ppc-fmadd-2.c
diff -N testsuite/gcc.dg/ppc-fmadd-2.c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ testsuite/gcc.dg/ppc-fmadd-2.c	3 Dec 2002 02:33:56 -0000
@@ -0,0 +1,27 @@
+/* { dg-do compile { target powerpc*-*-* } } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "f(add|sub|mul|neg)" } } */
+
+void foo(double *a, double *b, double *c, double *d)
+{
+  a[0] =  b[0] + c[0] * d[0];		// fmadd
+  a[2] = -b[2] + c[2] * d[2];   	// fmsub
+  a[4] = -( b[4] + c[4] * d[4]);	// fnmadd
+  a[6] = -(-b[6] + c[6] * d[6]);	// fnmsub
+  a[10] =  b[10] - c[10] * -d[10];	// fmadd
+  a[12] = -b[12] - c[12] * -d[12];   	// fmsub
+  a[14] = -( b[14] - c[14] * -d[14]);	// fnmadd
+  a[16] = -(-b[16] - c[16] * -d[16]);	// fnmsub
+}
+
+void foos(float *a, float *b, float *c, float *d)
+{
+  a[0] =  b[0] + c[0] * d[0];		// fmadd
+  a[2] = -b[2] + c[2] * d[2];   	// fmsub
+  a[4] = -( b[4] + c[4] * d[4]);	// fnmadd
+  a[6] = -(-b[6] + c[6] * d[6]);	// fnmsub
+  a[10] =  b[10] - c[10] * -d[10];	// fmadd
+  a[12] = -b[12] - c[12] * -d[12];   	// fmsub
+  a[14] = -( b[14] - c[14] * -d[14]);	// fnmadd
+  a[16] = -(-b[16] - c[16] * -d[16]);	// fnmsub
+}
Index: testsuite/gcc.dg/ppc-fmadd-3.c
===================================================================
RCS file: testsuite/gcc.dg/ppc-fmadd-3.c
diff -N testsuite/gcc.dg/ppc-fmadd-3.c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ testsuite/gcc.dg/ppc-fmadd-3.c	3 Dec 2002 02:33:58 -0000
@@ -0,0 +1,36 @@
+/* { dg-do compile { target powerpc*-*-* } } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "f(add|sub|mul)" } } */
+
+void foo(double *a, double *b, double *c, double *d)
+{
+#if 0
+  a[1] =  b[1] - c[1] * d[1];		// fneg, fmadd without fast-math
+#endif
+  a[3] = -b[3] - c[3] * d[3];		// fneg, fmsub without fast-math
+#if 0
+  a[5] = -( b[5] - c[5] * d[5]);	// fneg, fnmadd without fast-math
+#endif
+  a[7] = -(-b[7] - c[7] * d[7]);	// fneg, fnmsub without fast-math
+  a[11] =  b[11] + c[11] * -d[11];	// fneg, fmadd without fast-math
+  a[13] = -b[13] + c[13] * -d[13];	// fneg, fmsub without fast-math
+  a[15] = -( b[15] + c[15] * -d[15]);	// fneg, fnmadd without fast-math
+  a[17] = -(-b[17] + c[17] * -d[17]);	// fneg, fnmsub without fast-math
+}
+
+void foos(float *a, float *b, float *c, float *d)
+{
+#if 0
+  a[1] =  b[1] - c[1] * d[1];		// fneg, fmadd without fast-math
+#endif
+  a[3] = -b[3] - c[3] * d[3];		// fneg, fmsub without fast-math
+#if 0
+  a[5] = -( b[5] - c[5] * d[5]);	// fneg, fnmadd without fast-math
+#endif
+  a[7] = -(-b[7] - c[7] * d[7]);	// fneg, fnmsub without fast-math
+  a[11] =  b[11] + c[11] * -d[11];	// fneg, fmadd without fast-math
+  a[13] = -b[13] + c[13] * -d[13];	// fneg, fmsub without fast-math
+  a[15] = -( b[15] + c[15] * -d[15]);	// fneg, fnmadd without fast-math
+  a[17] = -(-b[17] + c[17] * -d[17]);	// fneg, fnmsub without fast-math
+}
+
============================================================

* Re: rs6000 fused multiply-add patch
  2002-12-02 19:01 rs6000 fused multiply-add patch Geoffrey Keating
@ 2002-12-03 15:41 ` Segher Boessenkool
  2002-12-03 16:59   ` Geoff Keating
  0 siblings, 1 reply; 33+ messages in thread
From: Segher Boessenkool @ 2002-12-03 15:41 UTC (permalink / raw)
  To: Geoffrey Keating; +Cc: gcc-patches, dje, pinskia, dalej

Geoffrey Keating wrote:
> 
> I looked at Segher's patch, and while I thought it was the right
> direction, I kept finding more missing pieces the more I looked at
> it.  So, I wrote my own.  This one has test cases, documentation, and
> works in nearly every case.

Mine works in _every_ case -- I _did_ test it, after all :)

I didn't test your patch yet, but...

> --- config/rs6000/rs6000.md     16 Nov 2002 18:01:51 -0000      1.222
> +++ config/rs6000/rs6000.md     3 Dec 2002 02:32:41 -0000
> @@ -5280,7 +5280,18 @@
>         (neg:SF (plus:SF (mult:SF (match_operand:SF 1 "gpc_reg_operand" "%f")
>                                   (match_operand:SF 2 "gpc_reg_operand" "f"))
>                          (match_operand:SF 3 "gpc_reg_operand" "f"))))]
> -  "TARGET_POWERPC && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_FUSED_MADD"
> +  "TARGET_POWERPC && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_FUSED_MADD
> +   && HONOR_SIGNED_ZEROS (SFmode)"
> +  "fnmadds %0,%1,%2,%3"
> +  [(set_attr "type" "fp")])
> +
> +(define_insn ""
> +  [(set (match_operand:SF 0 "gpc_reg_operand" "=f")
> +       (minus:SF (mult:SF (neg:SF (match_operand:SF 1 "gpc_reg_operand" "f"))
> +                          (match_operand:SF 2 "gpc_reg_operand" "f"))
> +                        (match_operand:SF 3 "gpc_reg_operand" "f")))]
> +  "TARGET_POWERPC && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_FUSED_MADD
> +   && ! HONOR_SIGNED_ZEROS (SFmode)"
>    "fnmadds %0,%1,%2,%3"
>    [(set_attr "type" "fp")])

We're not supposed to add duplicate patterns to the .md -- that's what started
all of this, after all?


Segher

* Re: rs6000 fused multiply-add patch
  2002-12-03 15:41 ` Segher Boessenkool
@ 2002-12-03 16:59   ` Geoff Keating
  2002-12-03 17:12     ` Segher Boessenkool
  0 siblings, 1 reply; 33+ messages in thread
From: Geoff Keating @ 2002-12-03 16:59 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, dje, pinskia, dalej

Segher Boessenkool <segher@koffie.nl> writes:

> Geoffrey Keating wrote:
> > --- config/rs6000/rs6000.md     16 Nov 2002 18:01:51 -0000      1.222
> > +++ config/rs6000/rs6000.md     3 Dec 2002 02:32:41 -0000
> > @@ -5280,7 +5280,18 @@
> >         (neg:SF (plus:SF (mult:SF (match_operand:SF 1 "gpc_reg_operand" "%f")
> >                                   (match_operand:SF 2 "gpc_reg_operand" "f"))
> >                          (match_operand:SF 3 "gpc_reg_operand" "f"))))]
> > -  "TARGET_POWERPC && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_FUSED_MADD"
> > +  "TARGET_POWERPC && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_FUSED_MADD
> > +   && HONOR_SIGNED_ZEROS (SFmode)"
> > +  "fnmadds %0,%1,%2,%3"
> > +  [(set_attr "type" "fp")])
> > +
> > +(define_insn ""
> > +  [(set (match_operand:SF 0 "gpc_reg_operand" "=f")
> > +       (minus:SF (mult:SF (neg:SF (match_operand:SF 1 "gpc_reg_operand" "f"))
> > +                          (match_operand:SF 2 "gpc_reg_operand" "f"))
> > +                        (match_operand:SF 3 "gpc_reg_operand" "f")))]
> > +  "TARGET_POWERPC && TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_FUSED_MADD
> > +   && ! HONOR_SIGNED_ZEROS (SFmode)"
> >    "fnmadds %0,%1,%2,%3"
> >    [(set_attr "type" "fp")])

> We're not supposed to add duplicate patterns to the .md -- that's
> what started all of this, after all?

These patterns aren't duplicates, only one is active at a time.
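
One way to see why the two fnmadds patterns carry opposite
HONOR_SIGNED_ZEROS conditions is to evaluate both RTL shapes on inputs
whose result is zero.  This is a sketch in plain C, not part of the
patch: the function names are hypothetical, unfused arithmetic stands in
for the instruction, and it assumes 32-bit IEEE 754 floats compiled
without -ffast-math.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return 1 only for the bit pattern of -0.0f. */
static int is_negative_zero(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits == UINT32_C(0x80000000);
}

/* Shape used when signed zeros are honoured:
   (neg (plus (mult a b) c)).  */
static float neg_plus_mult(float a, float b, float c)
{
    return -(a * b + c);
}

/* Shape used when they are not:
   (minus (mult (neg a) b) c).  */
static float minus_mult_neg(float a, float b, float c)
{
    return (-a) * b - c;
}

/* Hypothetical self-check comparing the two shapes. */
void check_fnmadd_shapes(void)
{
    /* Nonzero result: both shapes give -(2*3 + 4). */
    assert(neg_plus_mult(2.0f, 3.0f, 4.0f) == -10.0f);
    assert(minus_mult_neg(2.0f, 3.0f, 4.0f) == -10.0f);

    /* 1*1 + (-1) is +0, so negating it gives -0 ... */
    assert(is_negative_zero(neg_plus_mult(1.0f, 1.0f, -1.0f)));
    /* ... whereas (-1)*1 - (-1) is +0. */
    assert(!is_negative_zero(minus_mult_neg(1.0f, 1.0f, -1.0f)));
}
```

The two shapes differ only in the sign of a zero result, so mapping the
second one to the same instruction is safe exactly when signed zeros
need not be honoured.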

-- 
- Geoffrey Keating <geoffk@geoffk.org>

* Re: rs6000 fused multiply-add patch
  2002-12-03 16:59   ` Geoff Keating
@ 2002-12-03 17:12     ` Segher Boessenkool
  2002-12-03 17:29       ` David Edelsohn
  0 siblings, 1 reply; 33+ messages in thread
From: Segher Boessenkool @ 2002-12-03 17:12 UTC (permalink / raw)
  To: Geoff Keating; +Cc: gcc-patches, dje, pinskia, dalej

Geoff Keating wrote:
> 
> Segher Boessenkool <segher@koffie.nl> writes:
> > We're not supposed to add duplicate patterns to the .md -- that's
> > what started all of this, after all?
> 
> These patterns aren't duplicates, only one is active at a time.

That's true for my very first patch, too.


Segher

* Re: rs6000 fused multiply-add patch
  2002-12-03 17:12     ` Segher Boessenkool
@ 2002-12-03 17:29       ` David Edelsohn
  2002-12-04 19:41         ` Segher Boessenkool
  0 siblings, 1 reply; 33+ messages in thread
From: David Edelsohn @ 2002-12-03 17:29 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Geoff Keating, gcc-patches, pinskia, dalej

>>>>> Segher Boessenkool writes:

Segher> That's true for my very first patch, too.

	I would recommend that you focus on understanding why Geoff's
patch is more correct.  The different patterns provide logically different
building blocks for GCC to use instead of duplicating patterns to provide
missing optimization transformations.  MD files should provide canonical
forms of patterns and both patterns in the new patch are canonical,
although they are the same machine instruction.  The original patch was
not limited to canonical patterns.

David

* Re: rs6000 fused multiply-add patch
  2002-12-03 17:29       ` David Edelsohn
@ 2002-12-04 19:41         ` Segher Boessenkool
  2002-12-05 14:04           ` Geoff Keating
  0 siblings, 1 reply; 33+ messages in thread
From: Segher Boessenkool @ 2002-12-04 19:41 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Geoff Keating, gcc-patches, pinskia, dalej

David Edelsohn wrote:

>         I would recommend that you focus on understanding why Geoff's
> patch is more correct.  The different patterns provide logically different
> building blocks for GCC to use instead of duplicating patterns to provide
> missing optimization transformations.  MD files should provide canonical
> forms of patterns and both patterns in the new patch are canonical,
> although they are the same machine instruction.  The original patch was
> not limited to canonical patterns.

Geoff's patch redefines canonical ;) :


Geoffrey Keating wrote:
> +@item
> +In combinations of @code{neg}, @code{mult}, @code{plus}, and
> +@code{minus}, the @code{neg} operations (if any) will be moved inside
> +the operations as far as possible.  For instance,
> +@code{(neg (mult A B))} is canonicalized as @code{(mult (neg A) B)}, but
> +@code{(plus (mult (neg A) B) C)} is canonicalized as
> +@code{(minus A (mult B C))}.

Why is it better to move the NEG's inward instead of outward?


> +void foo(double *a, double *b, double *c, double *d)
> +{
> +  a[0] =  b[0] + c[0] * d[0];          // fmadd
> +  a[1] =  b[1] - c[1] * d[1];          // fnmsub with fast-math
> +  a[2] = -b[2] + c[2] * d[2];          // fmsub
> +  a[3] = -b[3] - c[3] * d[3];          // fnmadd with fast-math
> +  a[4] = -( b[4] + c[4] * d[4]);       // fnmadd
> +  a[5] = -( b[5] - c[5] * d[5]);       // fmsub with fast-math
> +  a[6] = -(-b[6] + c[6] * d[6]);       // fnmsub
> +  a[7] = -(-b[7] - c[7] * d[7]);       // fmadd with fast-math
> +  a[10] =  b[10] - c[10] * -d[10];     // fmadd
> +  a[11] =  b[11] + c[11] * -d[11];     // fnmsub with fast-math
> +  a[12] = -b[12] - c[12] * -d[12];     // fmsub
> +  a[13] = -b[13] + c[13] * -d[13];     // fnmadd with fast-math
> +  a[14] = -( b[14] - c[14] * -d[14]);  // fnmadd
> +  a[15] = -( b[15] + c[15] * -d[15]);  // fmsub with fast-math
> +  a[16] = -(-b[16] - c[16] * -d[16]);  // fnmsub
> +  a[17] = -(-b[17] + c[17] * -d[17]);  // fmadd with fast-math
> +}

You have no tests that have the multiply first (to the left of the addition);
this mattered in my testing (with the non-patched gcc, too).


Segher

* Re: rs6000 fused multiply-add patch
  2002-12-04 19:41         ` Segher Boessenkool
@ 2002-12-05 14:04           ` Geoff Keating
  2002-12-20 21:08             ` rs6000 fused multiply-add patch [+ patchlet] Segher Boessenkool
  0 siblings, 1 reply; 33+ messages in thread
From: Geoff Keating @ 2002-12-05 14:04 UTC (permalink / raw)
  To: segher; +Cc: dje, gcc-patches, pinskia, dalej

> Date: Thu, 05 Dec 2002 04:41:59 +0100
> From: Segher Boessenkool <segher@koffie.nl>
> 
> David Edelsohn wrote:
> 
> >         I would recommend that you focus on understanding why Geoff's
> > patch is more correct.  The different patterns provide logically different
> > building blocks for GCC to use instead of duplicating patterns to provide
> > missing optimization transformations.  MD files should provide canonical
> > forms of patterns and both patterns in the new patch are canonical,
> > although they are the same machine instruction.  The original patch was
> > not limited to canonical patterns.
> 
> Geoff's patch redefines canonical ;) :

It _extends_ canonical, which is the point.

> Geoffrey Keating wrote:
> > +@item
> > +In combinations of @code{neg}, @code{mult}, @code{plus}, and
> > +@code{minus}, the @code{neg} operations (if any) will be moved inside
> > +the operations as far as possible.  For instance,
> > +@code{(neg (mult A B))} is canonicalized as @code{(mult (neg A) B)}, but
> > +@code{(plus (mult (neg A) B) C)} is canonicalized as
> > +@code{(minus A (mult B C))}.
> 
> Why is it better to move the NEG's inward instead of outward?

It makes no difference; I picked this direction because we move NOTs
inwards too and I wanted that tiny bit of consistency.

> > +void foo(double *a, double *b, double *c, double *d)
> > +{
> > +  a[0] =  b[0] + c[0] * d[0];          // fmadd
> > +  a[1] =  b[1] - c[1] * d[1];          // fnmsub with fast-math
> > +  a[2] = -b[2] + c[2] * d[2];          // fmsub
> > +  a[3] = -b[3] - c[3] * d[3];          // fnmadd with fast-math
> > +  a[4] = -( b[4] + c[4] * d[4]);       // fnmadd
> > +  a[5] = -( b[5] - c[5] * d[5]);       // fmsub with fast-math
> > +  a[6] = -(-b[6] + c[6] * d[6]);       // fnmsub
> > +  a[7] = -(-b[7] - c[7] * d[7]);       // fmadd with fast-math
> > +  a[10] =  b[10] - c[10] * -d[10];     // fmadd
> > +  a[11] =  b[11] + c[11] * -d[11];     // fnmsub with fast-math
> > +  a[12] = -b[12] - c[12] * -d[12];     // fmsub
> > +  a[13] = -b[13] + c[13] * -d[13];     // fnmadd with fast-math
> > +  a[14] = -( b[14] - c[14] * -d[14]);  // fnmadd
> > +  a[15] = -( b[15] + c[15] * -d[15]);  // fmsub with fast-math
> > +  a[16] = -(-b[16] - c[16] * -d[16]);  // fnmsub
> > +  a[17] = -(-b[17] + c[17] * -d[17]);  // fmadd with fast-math
> > +}
> 
> > You have no tests that have the multiply first (to the left of the addition);
> > this mattered in my testing (with the non-patched gcc, too).

It didn't seem to matter for me, I tried (with fast-math):

void fooi(double *a, double *b, double *c, double *d)
{ 
  a[0] = c[0] * d[0] +  b[0];
  a[1] = c[1] * d[1] -  b[1];
  a[2] = c[2] * d[2] + -b[2];
  a[3] = c[3] * d[3] - -b[3];
  a[4] = -(c[4] * d[4] +  b[4]);
  a[5] = -(c[5] * d[5] -  b[5]);
  a[6] = -(c[6] * d[6] + -b[6]);
  a[7] = -(c[7] * d[7] - -b[7]);
  a[10] = c[10] * -d[10] -  b[10];
  a[11] = c[11] * -d[11] +  b[11];
  a[12] = c[12] * -d[12] - -b[12];
  a[13] = c[13] * -d[13] + -b[13];
  a[14] = -(c[14] * -d[14] -  b[14]);
  a[15] = -(c[15] * -d[15] +  b[15]);
  a[16] = -(c[16] * -d[16] - -b[16]);
  a[17] = -(c[17] * -d[17] + -b[17]);
}

Although in general, GCC won't try to pick between something like
(mult (plus A B) (neg C)) and
(mult (neg C) (plus A B))
this sort of case won't happen with multiply-add combinations because
the equivalent form is
(plus (mult A B) (neg C))
which has a canonical form with a MINUS, likewise with
(minus (mult A B) (neg C))
there's a canonical form with a PLUS, and with
(minus (neg C) (mult A B))
I declared that the canonical form is
(minus (mult (neg A) B) C)
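
The canonical forms listed above can be modelled with a toy rewriter.
This is a minimal sketch in plain C and not GCC code: struct expr,
canon, show and the other names are invented for illustration, only the
rules quoted in this thread are implemented, and the removal of double
negation is an assumption about what the simplifier does elsewhere.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy expression tree for illustration; this is not GCC's rtx. */
enum code { VAR, NEG, MULT, PLUS, MINUS };

struct expr {
    enum code code;
    char name;              /* used only when code == VAR */
    struct expr *op0, *op1; /* op1 is NULL for NEG, both NULL for VAR */
};

static struct expr *mk(enum code code, char name,
                       struct expr *op0, struct expr *op1)
{
    struct expr *e = malloc(sizeof *e);
    if (e == NULL)
        abort();
    e->code = code;
    e->name = name;
    e->op0 = op0;
    e->op1 = op1;
    return e;
}

static struct expr *var(char n) { return mk(VAR, n, NULL, NULL); }
static struct expr *neg(struct expr *a) { return mk(NEG, 0, a, NULL); }
static struct expr *bin(enum code c, struct expr *a, struct expr *b)
{
    return mk(c, 0, a, b);
}

static int mult_of_neg(const struct expr *e)
{
    return e->code == MULT && e->op0->code == NEG;
}

/* Canonicalize the operands first, then rewrite this node until no
   rule fits.  Nodes are never freed; fine for a short-lived sketch. */
static struct expr *canon(struct expr *e)
{
    struct expr *a, *b;

    if (e->code == VAR)
        return e;
    a = e->op0 = canon(e->op0);
    b = e->op1 = e->op1 ? canon(e->op1) : NULL;

    if (e->code == NEG && a->code == NEG)     /* (neg (neg A)) -> A */
        return a->op0;
    if (e->code == NEG && a->code == MULT)    /* -> (mult (neg A) B) */
        return canon(bin(MULT, neg(a->op0), a->op1));
    if (e->code == PLUS && mult_of_neg(a))    /* -> (minus A (mult B C)) */
        return canon(bin(MINUS, b, bin(MULT, a->op0->op0, a->op1)));
    if (e->code == MINUS && mult_of_neg(b))   /* -> (plus (mult B C) A) */
        return canon(bin(PLUS, bin(MULT, b->op0->op0, b->op1), a));
    if (e->code == MINUS && a->code == NEG && b->code == MULT)
        return canon(bin(MINUS, bin(MULT, neg(b->op0), b->op1), a->op0));
    return e;
}

/* Append a Lisp-style rendering of e to the string in out. */
static void show(const struct expr *e, char *out)
{
    static const char *names[] = { "", "neg", "mult", "plus", "minus" };
    if (e->code == VAR) {
        size_t n = strlen(out);
        out[n] = e->name;
        out[n + 1] = '\0';
        return;
    }
    strcat(out, "(");
    strcat(out, names[e->code]);
    strcat(out, " ");
    show(e->op0, out);
    if (e->op1 != NULL) {
        strcat(out, " ");
        show(e->op1, out);
    }
    strcat(out, ")");
}

/* Hypothetical driver: the last example from the text above. */
void check_canon(void)
{
    char buf[64] = "";
    struct expr *e = bin(MINUS, neg(var('C')),
                         bin(MULT, var('A'), var('B')));
    show(canon(e), buf);
    assert(strcmp(buf, "(minus (mult (neg A) B) C)") == 0);
}
```

Because operands are canonicalized before their parent, each rule only
ever sees operands that are already in canonical form, which keeps the
rule list short.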

I'd have used your testcases, but they didn't seem to be in your patch.

-- 
- Geoffrey Keating <geoffk@geoffk.org>

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2002-12-05 14:04           ` Geoff Keating
@ 2002-12-20 21:08             ` Segher Boessenkool
  2002-12-20 21:38               ` Geoff Keating
  0 siblings, 1 reply; 33+ messages in thread
From: Segher Boessenkool @ 2002-12-20 21:08 UTC (permalink / raw)
  To: Geoff Keating; +Cc: dje, gcc-patches, pinskia, dalej

Sorry for the delay...

First, a trivial extra patch that's needed to get fma's in all
fastmath cases:

2002-12-21  Segher Boessenkool  <segher@koffie.nl>

	* combine.c (combine_simplify_rtx): Add a simplification
	for the multiplication of two negations.

*** gcc/combine.c.orig	Sat Dec 21 01:22:18 2002
--- gcc/combine.c	Sat Dec 21 01:49:03 2002
*************** combine_simplify_rtx (x, op0_mode, last,
*** 4417,4422 ****
--- 4417,4427 ----
  	  if (tem)
  	    return gen_binary (DIV, mode, tem, XEXP (XEXP (x, 0), 1));
  	}
+ 
+       /* Simplify (mult (neg A) (neg B)) to (mult A B).  */
+       if (GET_CODE (XEXP (x, 0)) == NEG && GET_CODE (XEXP (x, 1)) == NEG)
+ 	return gen_binary (MULT, mode, XEXP (XEXP (x, 0), 0),
+ 				       XEXP (XEXP (x, 1), 0));
        break;

      case UDIV:

---end of patch

This is needed because of the pushing-inwards of NEG's, when there's
already a NEG inside, like in  -(b * -c) - a .  With this, it passes
all my 96 cases for a single fma insn expression.  Great!
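
That the simplification is safe even without -ffast-math can be checked
numerically.  The sketch below is plain C with invented function names,
assuming IEEE 754 doubles; it compares (-a) * (-b) with a * b bit for
bit over a few values, including both zeros and products that overflow
or underflow.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Compare two doubles bit for bit, so +0.0 and -0.0 count as different. */
static int same_bits(double x, double y)
{
    uint64_t bx, by;
    memcpy(&bx, &x, sizeof bx);
    memcpy(&by, &y, sizeof by);
    return bx == by;
}

/* (mult (neg A) (neg B)) */
static double mult_neg_neg(double a, double b) { return (-a) * (-b); }

/* (mult A B) */
static double mult_plain(double a, double b) { return a * b; }

/* Hypothetical self-check over a small grid of operands. */
void check_mult_neg_neg(void)
{
    static const double v[] = { 0.0, -0.0, 1.5, -2.25, 1e300, 1e-300 };
    size_t n = sizeof v / sizeof v[0];

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            assert(same_bits(mult_neg_neg(v[i], v[j]),
                             mult_plain(v[i], v[j])));
}
```

The sign of a product is the exclusive-or of the operand signs and the
magnitude is unaffected by negation, so the two forms agree for every
pair in the grid.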

Geoff Keating wrote:
> > Geoffrey Keating wrote:
> > > +@item
> > > +In combinations of @code{neg}, @code{mult}, @code{plus}, and
> > > +@code{minus}, the @code{neg} operations (if any) will be moved inside
> > > +the operations as far as possible.  For instance,
> > > +@code{(neg (mult A B))} is canonicalized as @code{(mult (neg A) B)}, but
> > > +@code{(plus (mult (neg A) B) C)} is canonicalized as
> > > +@code{(minus A (mult B C))}.

[Typo: this last example is in error (it mixes up the ABC).]

This exact example is the only case where ieee-math doesn't use an
fma instruction where it could [as you yourself already mentioned, btw]
(it uses an fmul/fsub sequence instead of an fneg/fmadd sequence); it might
be nice to get the extra precision the fma provides.  Slightly worse though,
for -(a - b * c), it generates fmul/fsub/fneg instead of fneg/fnmadd.

I'll try to write a splitter for this, like you suggest.

> I'd have used your testcases, but they didn't seem to be in your patch.

I don't know how to write a testsuite thingy yet; maybe tomorrow, if I can
find myself a working version of dejagnu.  I'll send my testcases, promise :)

Oh btw, for a function like

float bla(float a, float b, float c)
{
	return a + b * c;
}

gcc generates something like

	fmadds	2,2,3,1
	fmr	1,2
	blr

any idea what needs to be fixed to get rid of the fmr insn?

Segher

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2002-12-20 21:08             ` rs6000 fused multiply-add patch [+ patchlet] Segher Boessenkool
@ 2002-12-20 21:38               ` Geoff Keating
  2002-12-20 22:21                 ` Segher Boessenkool
  0 siblings, 1 reply; 33+ messages in thread
From: Geoff Keating @ 2002-12-20 21:38 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: dje, gcc-patches, pinskia, dalej

Segher Boessenkool <segher@koffie.nl> writes:

> Sorry for the delay...
> 
> First, a trivial extra patch that's needed to get fma's in all
> fastmath cases:
> 
> 
> 2002-12-21  Segher Boessenkool  <segher@koffie.nl>
> 
> 	* combine.c (combine_simplify_rtx): Add a simplification
> 	for the multiplication of two negations.

The canonical form for (mult A (neg B)) is (mult (neg A) B);
this is documented but may not be implemented.  If it was implemented,
(mult (neg A) (neg B)) should be simplified to (mult (neg (neg A)) B),
and that simplifies down to (mult A B).

> > > > +@code{(plus (mult (neg A) B) C)} is canonicalized as
> > > > +@code{(minus A (mult B C))}.
> 
> [Typo: this last example is in error (it mixes up the ABC).]

Sigh.  Yes, it should be '(minus C (mult A B))'.

> This exact example is the only case where ieee-math doesn't use an
> fma instruction where it could [as you yourself already mentioned, btw]
> (it uses an fmul/fsub sequence instead of an fneg/fmadd sequence); it might
> be nice to get the extra precision the fma provides.  Slightly worse though,
> for -(a - b * c), it generates fmul/fsub/fneg instead of fneg/fnmadd.

Yes.

> Oh btw, for a function like
> 
> float bla(float a, float b, float c)
> {
> 	return a + b * c;
> }
> 
> gcc generates something like
> 
> 	fmadds	2,2,3,1
> 	fmr	1,2
> 	blr
> 
> any idea what needs to be fixed to get rid of the fmr insn?

The register allocator.

-- 
- Geoffrey Keating <geoffk@geoffk.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2002-12-20 21:38               ` Geoff Keating
@ 2002-12-20 22:21                 ` Segher Boessenkool
  2002-12-20 22:28                   ` David Edelsohn
                                     ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Segher Boessenkool @ 2002-12-20 22:21 UTC (permalink / raw)
  To: Geoff Keating; +Cc: dje, gcc-patches, pinskia, dalej

Geoff Keating wrote:
> The canonical form for (mult A (neg B)) is (mult (neg A) B);
> this is documented but may not be implemented.  If it was implemented,
> (mult (neg A) (neg B)) should be simplified to (mult (neg (neg A)) B),
> and that simplifies down to (mult A B).

I'll make a patch to fix that, then.

> > Oh btw, for a function like
> >
> > float bla(float a, float b, float c)
> > {
> >       return a + b * c;
> > }
> >
> > gcc generates something like
> >
> >       fmadds  2,2,3,1
> >       fmr     1,2
> >       blr
> >
> > any idea what needs to be fixed to get rid of the fmr insn?
> 
> The register allocator.

I was hoping for something a little bit more specific than this, like,
if it's a known shortcoming for example.


Segher

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2002-12-20 22:21                 ` Segher Boessenkool
@ 2002-12-20 22:28                   ` David Edelsohn
  2002-12-21 21:55                   ` Geoff Keating
  2002-12-28 22:08                   ` Segher Boessenkool
  2 siblings, 0 replies; 33+ messages in thread
From: David Edelsohn @ 2002-12-20 22:28 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Geoff Keating, gcc-patches, pinskia, dalej

>>>>> Segher Boessenkool writes:

>> >       fmadds  2,2,3,1
>> >       fmr     1,2
>> >
>> > any idea what needs to be fixed to get rid of the fmr insn?
>> 
>> The register allocator.

Segher> I was hoping for something a little bit more specific than this, like,
Segher> if it's a known shortcoming for example.

	It's a known problem.

David

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2002-12-20 22:21                 ` Segher Boessenkool
  2002-12-20 22:28                   ` David Edelsohn
@ 2002-12-21 21:55                   ` Geoff Keating
  2002-12-28 22:08                   ` Segher Boessenkool
  2 siblings, 0 replies; 33+ messages in thread
From: Geoff Keating @ 2002-12-21 21:55 UTC (permalink / raw)
  To: segher; +Cc: dje, gcc-patches, pinskia, dalej

> Date: Sat, 21 Dec 2002 06:58:50 +0100
> From: Segher Boessenkool <segher@koffie.nl>

> > > Oh btw, for a function like
> > >
> > > float bla(float a, float b, float c)
> > > {
> > >       return a + b * c;
> > > }
> > >
> > > gcc generates something like
> > >
> > >       fmadds  2,2,3,1
> > >       fmr     1,2
> > >       blr
> > >
> > > any idea what needs to be fixed to get rid of the fmr insn?
> > 
> > The register allocator.
> 
> I was hoping for something a little bit more specific than this, like,
> if it's a known shortcoming for example.

I don't know about this particular case, but it's known that the
allocator doesn't prefer to allocate pseudos to hard registers that
don't have their own separate register class just because the pseudo
is copied to that register.

-- 
- Geoffrey Keating <geoffk@geoffk.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2002-12-20 22:21                 ` Segher Boessenkool
  2002-12-20 22:28                   ` David Edelsohn
  2002-12-21 21:55                   ` Geoff Keating
@ 2002-12-28 22:08                   ` Segher Boessenkool
  2 siblings, 0 replies; 33+ messages in thread
From: Segher Boessenkool @ 2002-12-28 22:08 UTC (permalink / raw)
  To: Geoff Keating, dje, gcc-patches, pinskia, dalej

Segher Boessenkool wrote:
> 
> Geoff Keating wrote:
> > The canonical form for (mult A (neg B)) is (mult (neg A) B);
> > this is documented but may not be implemented.  If it was implemented,
> > (mult (neg A) (neg B)) should be simplified to (mult (neg (neg A)) B),
> > and that simplifies down to (mult A B).
> 
> I'll make a patch to fix that, then.

The simplifications are implemented, but combine won't simplify two insns
into two different insns.  The attached patch fixes this.  Bootstrapped on
powerpc-unknown-linux-gnu (c,c++,objc); regression checked too.


Segher


2002-12-28  Segher Boessenkool  <segher@koffie.nl>

	* combine.c (try_combine): Handle simplifying two insns into
	two different insns.



*** ../../gcc-clean/gcc/combine.c	Fri Dec 27 03:21:29 2002
--- ./combine.c	Sat Dec 28 07:18:19 2002
*************** try_combine (i3, i2, i1, new_direct_jump
*** 2113,2120 ****
       insns.  There are two ways to do this.  It can be split using a
       machine-specific method (like when you have an addition of a large
       constant) or by combine in the function find_split_point.  */
  
!   if (i1 && insn_code_number < 0 && GET_CODE (newpat) == SET
        && asm_noperands (newpat) < 0)
      {
        rtx m_split, *split;
--- 2113,2123 ----
       insns.  There are two ways to do this.  It can be split using a
       machine-specific method (like when you have an addition of a large
       constant) or by combine in the function find_split_point.  */
+   /* We need to do this when combining only two insns, too; otherwise,
+      canonicalizations and simplifications that do not result in a single
+      machine insn will never be performed.  */
  
!   if (insn_code_number < 0 && GET_CODE (newpat) == SET
        && asm_noperands (newpat) < 0)
      {
        rtx m_split, *split;
*************** try_combine (i3, i2, i1, new_direct_jump
*** 2472,2477 ****
--- 2475,2498 ----
        return 0;
      }
  
+   /* If combining two insns doesn't improve things, fail.  */
+   /* If NEWI2PAT is a PARALLEL, it might still improve things;
+      that will need additional checks though, as I2PAT can be
+      part of an identical PARALLEL, and we end up with an
+      infinite loop.  For now, just don't allow it.  */
+   /* Similarly, it is also possible that there is some valid
+      simplification for which the GET_CODE ... clause below
+      is true.  */
+ 
+   if (i1 == 0 && newi2pat
+       && (GET_CODE (i2pat) != SET || GET_CODE (newi2pat) == PARALLEL
+ 	  || (GET_CODE (newi2pat) == SET
+ 	      && GET_CODE (SET_SRC (i2pat)) == GET_CODE (SET_SRC (newi2pat)))))
+     {
+       undo_all ();
+       return 0;
+     }
+ 
    /* If we had to change another insn, make sure it is valid also.  */
    if (undobuf.other_insn)
      {
*************** try_combine (i3, i2, i1, new_direct_jump
*** 2666,2672 ****
         patterns, move from I1 to I2 then I2 to I3 so that we get the
         proper movement on registers that I2 modifies.  */
  
!     if (newi2pat)
        {
  	move_deaths (newi2pat, NULL_RTX, INSN_CUID (i1), i2, &midnotes);
  	move_deaths (newpat, newi2pat, INSN_CUID (i1), i3, &midnotes);
--- 2687,2693 ----
         patterns, move from I1 to I2 then I2 to I3 so that we get the
         proper movement on registers that I2 modifies.  */
  
!     if (newi2pat && i1)
        {
  	move_deaths (newi2pat, NULL_RTX, INSN_CUID (i1), i2, &midnotes);
  	move_deaths (newpat, newi2pat, INSN_CUID (i1), i3, &midnotes);
*************** try_combine (i3, i2, i1, new_direct_jump
*** 2869,2875 ****
        && INSN_CUID (added_links_insn) < INSN_CUID (i3))
      return added_links_insn;
    else
!     return newi2pat ? i2 : i3;
  }
  \f
  /* Undo all the modifications recorded in undobuf.  */
--- 2890,2896 ----
        && INSN_CUID (added_links_insn) < INSN_CUID (i3))
      return added_links_insn;
    else
!     return (i1 && newi2pat) ? i2 : i3;
  }
  \f
  /* Undo all the modifications recorded in undobuf.  */


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
@ 2002-12-29  5:13 Richard Kenner
  2002-12-29 19:25 ` Segher Boessenkool
  0 siblings, 1 reply; 33+ messages in thread
From: Richard Kenner @ 2002-12-29  5:13 UTC (permalink / raw)
  To: segher; +Cc: gcc-patches

    The simplifications are implemented, but combine won't simplify two insns
    into two different insns.  

Correct, and it *must* not.

The reason is that combine must always *decrease* the number of insns
to avoid infinite loops.

    The attached patch fixes this.  

This patch is *not* OK.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2002-12-29  5:13 Richard Kenner
@ 2002-12-29 19:25 ` Segher Boessenkool
  0 siblings, 0 replies; 33+ messages in thread
From: Segher Boessenkool @ 2002-12-29 19:25 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc-patches

Richard Kenner wrote:
> 
>     The simplifications are implemented, but combine won't simplify two insns
>     into two different insns.
> 
> Correct, and it *must* not.
> 
> The reason is that combine must always *decrease* the number of insns
> to avoid infinite loops.
> 
>     The attached patch fixes this.
> 
> This patch is *not* OK.


Isn't this part of the patch preventing such loops?


Segher


*************** try_combine (i3, i2, i1, new_direct_jump
*** 2472,2477 ****
--- 2475,2498 ----
        return 0;
      }
  
+   /* If combining two insns doesn't improve things, fail.  */
+   /* If NEWI2PAT is a PARALLEL, it might still improve things;
+      that will need additional checks though, as I2PAT can be
+      part of an identical PARALLEL, and we end up with an
+      infinite loop.  For now, just don't allow it.  */
+   /* Similarly, it is also possible that there is some valid
+      simplification for which the GET_CODE ... clause below
+      is true.  */
+ 
+   if (i1 == 0 && newi2pat
+       && (GET_CODE (i2pat) != SET || GET_CODE (newi2pat) == PARALLEL
+ 	  || (GET_CODE (newi2pat) == SET
+ 	      && GET_CODE (SET_SRC (i2pat)) == GET_CODE (SET_SRC (newi2pat)))))
+     {
+       undo_all ();
+       return 0;
+     }
+ 
    /* If we had to change another insn, make sure it is valid also.  */
    if (undobuf.other_insn)
      {

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
@ 2002-12-30  4:58 Richard Kenner
  2002-12-30 20:17 ` Segher Boessenkool
  0 siblings, 1 reply; 33+ messages in thread
From: Richard Kenner @ 2002-12-30  4:58 UTC (permalink / raw)
  To: segher; +Cc: gcc-patches

    Isn't this part of the patch preventing such loops?

Yes, that's *part* of the code, but the other part is what you've taken
out: not combining two insns into two!

The *purpose* of combine is to reduce the number of insns.  Why would it
be helpful to replace two insns by two?  You still have the same number
of insns.  What's better about it?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2002-12-30  4:58 Richard Kenner
@ 2002-12-30 20:17 ` Segher Boessenkool
  2002-12-30 20:51   ` David Edelsohn
  0 siblings, 1 reply; 33+ messages in thread
From: Segher Boessenkool @ 2002-12-30 20:17 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc-patches

Richard Kenner wrote:
> 
>     Isn't this part of the patch preventing such loops?
> 
> Yes, that's *part* of the code, but the other part is what you've taken
> out: not combining two insns into two!
> 
> The *purpose* of combine is to reduce the number of insns.  Why would it
> be helpful to replace two insns by two?  You still have the same number
> of insns.  What's better about it?

Without it, lots of simplifications don't ever get applied.  This results
in worse code.  For example, with the patch applied, bootstrap time goes
down by a few percent (powerpc-unknown-linux-gnu), as well as code size.

One common example is, without the patch, computations involving bitfields
use mfcr insns; with it, they use logic instructions.

Another example: with the patch, loads + mask are done as a smaller width
load if appropriate, which sometimes enables further optimizations.

There are many more cases than just these two examples, though.  I'll try
and find some on i686 tomorrow.

Segher

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2002-12-30 20:17 ` Segher Boessenkool
@ 2002-12-30 20:51   ` David Edelsohn
  2003-01-02  0:52     ` Segher Boessenkool
  0 siblings, 1 reply; 33+ messages in thread
From: David Edelsohn @ 2002-12-30 20:51 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

	The combine pass is called "combine" because it is predicated on
combining instructions and uses decreased instructions as the goal.  If
more optimal instruction sequences should be used, that needs to be
optimized by a different pass.  Your bitfield+mfcr -> logic instruction
example might be appropriate as define_peephole2 patterns.  Your load+mask
-> narrower load already should be handled correctly by other
optimizations or the combiner because it seems like it should decrease the
number of instructions.  If the code size is decreasing then the number of
instructions is decreasing, so maybe we need to add patterns transforming
three instructions into two instructions.

David

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2002-12-30 20:51   ` David Edelsohn
@ 2003-01-02  0:52     ` Segher Boessenkool
  2003-01-02  1:44       ` Geoff Keating
  0 siblings, 1 reply; 33+ messages in thread
From: Segher Boessenkool @ 2003-01-02  0:52 UTC (permalink / raw)
  To: David Edelsohn, Richard Kenner; +Cc: gcc-patches

David Edelsohn wrote:
> 
>         The combine pass is called "combine" because it is predicated on
> combining instructions and uses decreased instructions as the goal.

Yes, I understand that.

> If more optimal instruction sequences should be used, that needs to be
> optimized by a different pass.  Your bitfield+mfcr -> logic instruction
> example might be appropriate as define_peephole2 patterns.  Your load+mask
> -> narrower load already should be handled correctly by other
> optimizations or the combiner because it seems like it should decrease the
> number of instructions.

Well, all these optimizations are already there, in simplify-rtx.c etc., it's
just that they never get done by the current combine, because it doesn't
decrease the number of rtl insns _if looking through a very small (3 insn)
window_.

> If the code size is decreasing then the number of
> instructions is decreasing, so maybe we need to add patterns transforming
> three instructions into two instructions.

That's a lot of different patterns; also, it's not only 3->2, but also 4->3
and I saw an 8->7, even.

Richard Kenner wrote:
> 
>     Without it, lots of simplifications don't ever get applied.  This results
>     in worse code.  For example, with the patch applied, bootstrap time goes
>     down by a few percent (powerpc-unknown-linux-gnu), as well as code size.
> 
>     One common example is, without the patch, computations involving bitfields
>     use mfcr insns; with it, they use logic instructions.
> 
> But that's not what combine is supposed to do!  The purpose of combine
> is what its name says, to *combine* insns.

Yes, but sometimes doing a 2->2 simplification will allow it to do a 2->1
simplification, or two 2->2 simplifications will allow a 3->2, or maybe
some even longer chain.

If combine is supposed to apply (recursively) all possible simplifications,
it needs to do all canonicalizations that are in simplify-rtx and friends,
or it will fail to do some simplifications because the simplification
patterns assume their "sub patterns" (child nodes? nomenclature fails me)
are already simplified.

> If there is a simpler way to do an insn, it should be in the MD file.

Most of these are not machine dependent.  The mfcr thing just stuck out
because I looked at GCC itself as an example of "normal" big code, which
of course it is not (much too many bitfields).

Segher

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2003-01-02  0:52     ` Segher Boessenkool
@ 2003-01-02  1:44       ` Geoff Keating
  2003-01-04  1:59         ` Segher Boessenkool
  0 siblings, 1 reply; 33+ messages in thread
From: Geoff Keating @ 2003-01-02  1:44 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

Segher Boessenkool <segher@koffie.nl> writes:

> David Edelsohn wrote:
> > 
> >         The combine pass is called "combine" because it is predicated on
> > combining instructions and uses decreased instructions as the goal.
> 
> Yes, I understand that.
> 
> > If more optimal instruction sequences should be used, that needs to be
> > optimized by a different pass.  Your bitfield+mfcr -> logic instruction
> > example might be appropriate as define_peephole2 patterns.  Your load+mask
> > -> narrower load already should be handled correctly by other
> > optimizations or the combiner because it seems like it should decrease the
> > number of instructions.
> 
> Well, all these optimizations are already there, in simplify-rtx.c etc., it's
> just that they never get done by the current combine, because it doesn't
> decrease the number of rtl insns _if looking through a very small (3 insn)
> window_.

What you usually do in this case is create a combination insn+splitter
to give combine an appropriate intermediate result.

If you could provide testcases/code samples for the cases you mention,
it would be easier to see precisely what's going on.  Often several
iterations and many eyes are necessary to get the proper fix.

It is also possible that some rearchitecting of combine might be
necessary, but that's something that would require careful
consideration and an examination of all the possible alternatives,
not an ad-hoc patch to solve a specific problem.

> > If the code size is decreasing then the number of
> > instructions is decreasing, so maybe we need to add patterns transforming
> > three instructions into two instructions.
> 
> That's a lot of different patterns; also, it's not only 3->2, but also 4->3
> and I saw an 8->7, even.

-- 
- Geoffrey Keating <geoffk@geoffk.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2003-01-02  1:44       ` Geoff Keating
@ 2003-01-04  1:59         ` Segher Boessenkool
  0 siblings, 0 replies; 33+ messages in thread
From: Segher Boessenkool @ 2003-01-04  1:59 UTC (permalink / raw)
  To: Geoff Keating; +Cc: gcc-patches

Geoff Keating wrote:
> 
> What you usually do in this case is create a combination insn+splitter
> to give combine an appropriate intermediate result.
> 
> If you could provide testcases/code samples for the cases you mention,
> it would be easier to see precisely what's going on.  Often several
> iterations and many eyes are necessary to get the proper fix.

Here are some of the cases that happen most on PowerPC:

-- comparing an SI to a "big" constant.
	(set (reg2) (big_constant))
	(set:CC (compare:CC (reg1) (reg2)))
		becomes
	(set (reg1) (xor (reg1) (hi_constant)))
	(set:CC (compare:CC (reg1) (lo_constant)))

	There is a define_split to handle this; it splits two rtl insns
	into two different rtl insns.  In actual machine insns this is
	an improvement, though (either fewer insns, or less register
	pressure if lo == 0).  So you're saying there should be a define_insn
	for this too, so combine would think this is just one insn?

-- sign extend of xor (twice) => xor (twice) of sign extend.

	Swapping an xor with a sign extend doesn't help per se, but it
	does in cases like this.  combine can't help it: it would need
	to look at 4 insns.

> It is also possible that some rearchitecting of combine might be
> necessary, but that's something that would require careful
> consideration and an examination of all the possible alternatives,
> not an ad-hoc patch to solve a specific problem.

Small progress is progress too.  Of course there's more to be gained
by writing a new'n'improved combine pass, or a new separate pass, but
that's not a reason to not accept other improvements.  All IMHO of course,
and assuming it is actually an improvement :)


Segher


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
@ 2002-12-31  3:59 Richard Kenner
  2003-01-05 23:24 ` Zack Weinberg
  0 siblings, 1 reply; 33+ messages in thread
From: Richard Kenner @ 2002-12-31  3:59 UTC (permalink / raw)
  To: segher; +Cc: gcc-patches

    Without it, lots of simplifications don't ever get applied.  This results
    in worse code.  For example, with the patch applied, bootstrap time goes
    down by a few percent (powerpc-unknown-linux-gnu), as well as code size.

    One common example is, without the patch, computations involving bitfields
    use mfcr insns; with it, they use logic instructions.

But that's not what combine is supposed to do!  The purpose of combine
is what its name says, to *combine* insns.

If there is a simpler way to do an insn, it should be in the MD file.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2002-12-31  3:59 Richard Kenner
@ 2003-01-05 23:24 ` Zack Weinberg
  2003-01-06  2:26   ` David Edelsohn
  0 siblings, 1 reply; 33+ messages in thread
From: Zack Weinberg @ 2003-01-05 23:24 UTC (permalink / raw)
  To: Richard Kenner; +Cc: segher, gcc-patches

kenner@vlsi1.ultra.nyu.edu (Richard Kenner) writes:

>     Without it, lots of simplifications don't ever get applied.
>     This results in worse code.  For example, with the patch
>     applied, bootstrap time goes down by a few percent
>     (powerpc-unknown-linux-gnu), as well as code size.
>
>     One common example is, without the patch, computations involving
>     bitfields use mfcr insns; with it, they use logic instructions.
>
> But that's not what combine is supposed to do!  The purpose of
> combine is what its name says, to *combine* insns.

That's what it has historically done, but that is not adequate reason
to refuse to make it do something else, if the something else is a
more effective optimization.  It sounds like you are reacting
negatively to this patch without even considering what merit it might
have.

The alternative you and others have suggested -- adding splitters to
the machine description -- neglects to consider that this change
potentially benefits *all* architectures, and without requiring
backend authors to jump through even more hoops than they already do.

I am not saying that this patch should definitely be applied; I just
think it is worthy of serious consideration on its technical merits,
rather than being blown off because it makes the name "combine.c"
marginally inaccurate.

zw

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2003-01-05 23:24 ` Zack Weinberg
@ 2003-01-06  2:26   ` David Edelsohn
  2003-01-06  4:02     ` Segher Boessenkool
  2003-01-06 23:21     ` Geoff Keating
  0 siblings, 2 replies; 33+ messages in thread
From: David Edelsohn @ 2003-01-06  2:26 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Richard Kenner, segher, gcc-patches

	I don't think that anyone is objecting to the concept and the
benefit.  If I understand correctly, the patch violates the semantics of
the combiner algorithm which requires a declining cost calculated as the
number of instructions.  Allowing combinations that do not decrease the
cost would make the algorithm non-deterministic and possibly not converge,
right? 

David

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2003-01-06  2:26   ` David Edelsohn
@ 2003-01-06  4:02     ` Segher Boessenkool
  2003-01-06  4:07       ` Segher Boessenkool
  2003-01-06 23:21     ` Geoff Keating
  1 sibling, 1 reply; 33+ messages in thread
From: Segher Boessenkool @ 2003-01-06  4:02 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Zack Weinberg, Richard Kenner, gcc-patches

David Edelsohn wrote:
> 
>         I don't think that anyone is objecting to the concept and the
> benefit.  If I understand correctly, the patch violates the semantics of
> the combiner algorithm which requires a declining cost calculated as the
> number of instructions.  Allowing combinations that do not decrease the
> cost would make the algorithm non-deterministic and possibly not converge,
> right?

Only if the "canonicalizations" don't actually canonicalize... but yes, in
that (hypothetical) case that could happen.  Some documentation might be
needed to that extent if this patch ever gets accepted.


Segher

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2003-01-06  4:02     ` Segher Boessenkool
@ 2003-01-06  4:07       ` Segher Boessenkool
  0 siblings, 0 replies; 33+ messages in thread
From: Segher Boessenkool @ 2003-01-06  4:07 UTC (permalink / raw)
  To: David Edelsohn, Zack Weinberg, Richard Kenner, gcc-patches

Segher Boessenkool wrote:
> 
> David Edelsohn wrote:
> >
> >         I don't think that anyone is objecting to the concept and the
> > benefit.  If I understand correctly, the patch violates the semantics of
> > the combiner algorithm which requires a declining cost calculated as the
> > number of instructions.  Allowing combinations that do not decrease the
> > cost would make the algorithm non-deterministic and possibly not converge,
> > right?
> 
> Only if the "canonicalizations" don't actually canonicalize... but yes, in
> that (hypothetical) case that could happen.  Some documentation might be
> needed to that extent if this patch ever gets accepted.

I should add that this would require two simplifications that do exactly the
opposite transform, and would maybe already recurse -- in a subroutine of combine.


Segher

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2003-01-06  2:26   ` David Edelsohn
  2003-01-06  4:02     ` Segher Boessenkool
@ 2003-01-06 23:21     ` Geoff Keating
  2003-01-09 22:41       ` Segher Boessenkool
  1 sibling, 1 reply; 33+ messages in thread
From: Geoff Keating @ 2003-01-06 23:21 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Richard Kenner, segher, gcc-patches

David Edelsohn <dje@watson.ibm.com> writes:

> 	I don't think that anyone is objecting to the concept and the
> benefit.  If I understand correctly, the patch violates the semantics of
> the combiner algorithm which requires a declining cost calculated as the
> number of instructions.  Allowing combinations that do not decrease the
> cost would make the algorithm non-deterministic and possibly not converge,
> right? 

It could also cause combine to perform a de-optimisation.  Now, its
default behaviour is to leave the original code alone, which at least
allows the user to try to obtain the best code sequence.

The difficulty I have with this patch is the justification for it.
It's not claimed that the change in the RTL itself improves
performance, but that because of other limitations in combine, the
change allows other changes that improve performance; and this example
is used as justification for allowing a whole class of changes only
one of which is the one that has been justified.

No analysis has been made of the impact of this patch on compile
speed, no analysis has been made of the impact of the patch on
compiled code performance on other architectures, only limited
analysis has been made of compiled code performance on powerpc, and
there is no sound theoretical foundation for the patch.

-- 
- Geoffrey Keating <geoffk@geoffk.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2003-01-06 23:21     ` Geoff Keating
@ 2003-01-09 22:41       ` Segher Boessenkool
  0 siblings, 0 replies; 33+ messages in thread
From: Segher Boessenkool @ 2003-01-09 22:41 UTC (permalink / raw)
  To: Geoff Keating; +Cc: David Edelsohn, Richard Kenner, gcc-patches

Geoff Keating wrote:
> 
> It could also cause combine to perform a de-optimisation.

Example?

> Now, its
> default behaviour is to leave the original code alone, which at least
> allows the user to try to obtain the best code sequence.

Try, yes.  But the user code really has almost no influence at this
level.

> The difficulty I have with this patch is the justification for it.

I agree with you here.  This is a somewhat scary patch, and it needs good
testing.  But I can't do much more than I already did.

> It's not claimed that the change in the RTL itself improves
> performance,

No, it is claimed.

> but that because of other limitations in combine, the
> change allows other changes that improve performance;

That, too.  With a hypothetical infinite-window combine, the patch wouldn't
be useful, of course.

> and this example
> is used as justification for allowing a whole class of changes only
> one of which is the one that has been justified.

No, my justification was "if a transformation makes the code worse, the
transformation shouldn't be there in the first place"; also, "the
documentation implies this is the intended behaviour"; the examples I
gave I just did because I was asked to give such examples.

> No analysis has been made of the impact of this patch on compile
> speed,

I did.  See my mail from dec 31.  It helped bootstrap time, which
is good enough imho.

> no analysis has been made of the impact of the patch on
> compiled code performance on other architectures,

I can compile for other architectures, but not run on them, sorry.
Maybe someone else?

> only limited
> analysis has been made of compiled code performance on powerpc,

I don't have access to SPEC etc., sorry.  Maybe someone else?

> and there is no sound theoretical foundation for the patch.

If you say so, but then, neither is there for combine *at all*.
Reducing the amount of rtl isn't the same as improving performance.
The only difference in this regard is that the original combine
is easier to prove to be terminating.

Segher

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
@ 2003-01-02  1:39 Richard Kenner
  2003-01-04  1:59 ` Segher Boessenkool
  0 siblings, 1 reply; 33+ messages in thread
From: Richard Kenner @ 2003-01-02  1:39 UTC (permalink / raw)
  To: segher; +Cc: gcc-patches

    Yes, but sometimes doing a 2->2 simplification will allow it to do a
    2->1 simplification, or two 2->2 simplifications will allow a 3->2, or
    maybe some even longer chain.

But that should get done when combining 3 insns, for example.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2003-01-02  1:39 Richard Kenner
@ 2003-01-04  1:59 ` Segher Boessenkool
  0 siblings, 0 replies; 33+ messages in thread
From: Segher Boessenkool @ 2003-01-04  1:59 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc-patches

Richard Kenner wrote:
> 
>     Yes, but sometimes doing a 2->2 simplification will allow it to do a
>     2->1 simplification, or two 2->2 simplifications will allow a 3->2, or
>     maybe some even longer chain.
> 
> But that should get done when combining 3 insns, for example.

But that won't work if 3 insns don't reduce to 2, but with one more, it
_does_ reduce to 3 insns, for example:

	(set (t1) (op1 (reg0) (const X1)))
	(set (reg1) (op2 (t1)))
	(set (t2) (op1 (reg0) (const X2)))
	(set (reg2) (op2 (t2)))

where op1 and op2 can be swapped (like, op1 is xor and op2 is extend), so we get

	(set (t1) (op2 (reg0)))
	(set (reg1) (op1 (t1) (const X1)))
	(set (t2) (op2 (reg0)))
	(set (reg2) (op1 (t2) (const X2)))

where the sets of t1 and t2 can be merged.
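
To make the xor/extend case concrete, here is a minimal C sketch of the
two orderings.  It assumes op1 is xor with a constant, op2 is
sign-extension from 16 to 32 bits, and int is at least 32 bits wide;
the constants and every function name are made up for illustration and
do not come from the patch or from GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants standing in for X1 and X2. */
#define X1 ((int16_t)0x1234)
#define X2 ((int16_t)-0x0100)

/* Original order: op1 (xor) in the narrow mode, then op2 (sign-extend). */
static void before(int16_t reg0, int32_t *reg1, int32_t *reg2)
{
	int16_t t1 = (int16_t)(reg0 ^ X1);
	int16_t t2 = (int16_t)(reg0 ^ X2);

	*reg1 = (int32_t)t1;
	*reg2 = (int32_t)t2;
}

/* Swapped order: one shared sign-extend, then xor in the wide mode. */
static void after(int16_t reg0, int32_t *reg1, int32_t *reg2)
{
	int32_t t = (int32_t)reg0;

	*reg1 = t ^ (int32_t)X1;
	*reg2 = t ^ (int32_t)X2;
}

/* Returns 1 when both orderings agree for this input. */
static int same(int16_t reg0)
{
	int32_t a1, a2, b1, b2;

	before(reg0, &a1, &a2);
	after(reg0, &b1, &b2);
	return a1 == b1 && a2 == b2;
}

/* Exhaustive check over every 16-bit input. */
static void check_all(void)
{
	for (int32_t v = INT16_MIN; v <= INT16_MAX; v++)
		assert(same((int16_t)v));
}
```

In before() there are two extends and two xors (four operations); in
after() the extend is shared, leaving three.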

Without always canonicalizing all rtl, you miss some simplifications.  But
canonicalizing won't always help, either; sometimes the non-canonical pattern
will simplify better.  It may very well be too slow to try all possible
transformations, however.  Always doing the canonicalizations was a win on
almost all the code I inspected (gcc itself, some codecs, a forth vm); I saw
worse code only two or three times (and those were all non-eliminated dead code).

Some more to try (these use mfcr while better and/or shorter patterns are possible):

int xx1(int a, int b)
{
	return a == 0 && b == 0;
}

int xx2(int a, int b)
{
	a &= 1; b &= 1;
	return a == 0 && b == 0;
}
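
As a point of comparison, these are source-level forms that compute the
same results without needing a condition-register move.  They are only
a sketch of what "better and/or shorter" could mean; the *_alt names
are hypothetical, and nothing here claims to be what GCC emits or what
the patch produces.

```c
#include <assert.h>

static int xx1(int a, int b)
{
	return a == 0 && b == 0;
}

static int xx2(int a, int b)
{
	a &= 1; b &= 1;
	return a == 0 && b == 0;
}

/* Both are zero exactly when their bitwise or is zero. */
static int xx1_alt(int a, int b)
{
	return (a | b) == 0;
}

/* Only the low bits matter: the result is the inverted low bit of a|b. */
static int xx2_alt(int a, int b)
{
	return ~(a | b) & 1;
}

/* Compare the original and alternative forms over a small range. */
static void check_alt(void)
{
	for (int a = -4; a <= 4; a++)
		for (int b = -4; b <= 4; b++) {
			assert(xx1(a, b) == xx1_alt(a, b));
			assert(xx2(a, b) == xx2_alt(a, b));
		}
}
```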

Another nice one:

int xx3(int a, int b)
{
	return ~a ^ b ^ a ^ ~b;
}
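
For reference, xx3 is a constant: ~a ^ a is all ones, b ^ ~b is all
ones, and all-ones xor all-ones is zero, so an ideal simplifier would
reduce the whole function to "return 0".  The check below is a small
sketch confirming that; check_xx3 is a hypothetical helper name.

```c
#include <assert.h>
#include <limits.h>

static int xx3(int a, int b)
{
	return ~a ^ b ^ a ^ ~b;
}

/* xx3 should be zero for every input, including the extremes. */
static void check_xx3(void)
{
	static const int samples[] = { INT_MIN, -7, -1, 0, 1, 5, INT_MAX };
	const int n = (int)(sizeof samples / sizeof samples[0]);

	for (int i = 0; i < n; i++)
		for (int j = 0; j < n; j++)
			assert(xx3(samples[i], samples[j]) == 0);
}
```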

Segher

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
@ 2003-01-06 22:09 Richard Kenner
  0 siblings, 0 replies; 33+ messages in thread
From: Richard Kenner @ 2003-01-06 22:09 UTC (permalink / raw)
  To: dje; +Cc: gcc-patches

    I don't think that anyone is objecting to the concept and the benefit.
    If I understand correctly, the patch violates the semantics of the
    combiner algorithm which requires a declining cost calculated as the
    number of instructions.  Allowing combinations that do not decrease
    the cost would make the algorithm non-deterministic and possibly not
    converge, right?

That's what I'm saying, yes.  I'm out of the country right now with poor
email access and will have more to say on this when I get back in a few days.
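
The convergence concern can be illustrated with a toy model.  This is
not combine's code; the struct, the step functions and the iteration
limit are all hypothetical.  A step that must lower the insn count can
only succeed a bounded number of times, while a cost-neutral step can
keep applying forever unless something else stops it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy "program": an insn count plus which of two equivalent forms it is in. */
struct toy {
	int insns;
	int form;
};

/* Cost-reducing step: succeeds only while an insn can still be removed. */
static bool step_reduce(struct toy *p)
{
	if (p->insns <= 1)
		return false;
	p->insns--;
	return true;
}

/* Cost-neutral step: flips between two forms and is always applicable. */
static bool step_neutral(struct toy *p)
{
	p->form ^= 1;
	return true;
}

/* Apply step until it fails or the limit is hit; return the step count. */
static int run(struct toy p, bool (*step)(struct toy *), int limit)
{
	int n = 0;

	while (n < limit && step(&p))
		n++;
	return n;
}

static void check_toy(void)
{
	struct toy start = { 4, 0 };

	/* 4 insns can shrink at most 3 times, whatever the limit is. */
	assert(run(start, step_reduce, 1000) == 3);
	/* The neutral step never stops by itself; only the limit ends it. */
	assert(run(start, step_neutral, 1000) == 1000);
}
```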

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
@ 2003-01-10  2:15 Richard Kenner
  2003-01-12  1:38 ` Segher Boessenkool
  0 siblings, 1 reply; 33+ messages in thread
From: Richard Kenner @ 2003-01-10  2:15 UTC (permalink / raw)
  To: segher; +Cc: gcc-patches

    Reducing the amount of rtl isn't the same as improving performance.

No, but unless the MD file has a serious problem, it is *one way* of
improving performance.

    The only difference in this regard is that the original combine
    is easier to prove to be terminating.

And that it sticks to the semantics, which is *combining* insns.

There's no question that we don't currently have a pass which is good at
finding the simplest form of the RTL.  The idea is to move as much of
combine as possible into simplify-rtx.c.  Then combine is very small and
other passes will do what you are trying to do.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: rs6000 fused multiply-add patch [+ patchlet]
  2003-01-10  2:15 Richard Kenner
@ 2003-01-12  1:38 ` Segher Boessenkool
  0 siblings, 0 replies; 33+ messages in thread
From: Segher Boessenkool @ 2003-01-12  1:38 UTC (permalink / raw)
  To: Richard Kenner; +Cc: gcc-patches

Richard Kenner wrote:
> 
>     Reducing the amount of rtl isn't the same as improving performance.
> 
> No, but unless the MD file has a serious problem, it is *one way* of
> improving performance.

I don't dispute that.

> There's no question that we don't currently have a pass which is good at
> finding the simplest form of the RTL.  The idea is to move as much of
> combine as possible into simplify-rtx.c.

It's on my todo list (it's in the beginner projects list).

> Then combine is very small and other passes will do what you are
> trying to do.

Part of it, maybe.  But certainly not all, unless we run simplify
and combine over and over again until nothing changes anymore.


Segher


^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2003-01-12  1:38 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
2002-12-02 19:01 rs6000 fused multiply-add patch Geoffrey Keating
2002-12-03 15:41 ` Segher Boessenkool
2002-12-03 16:59   ` Geoff Keating
2002-12-03 17:12     ` Segher Boessenkool
2002-12-03 17:29       ` David Edelsohn
2002-12-04 19:41         ` Segher Boessenkool
2002-12-05 14:04           ` Geoff Keating
2002-12-20 21:08             ` rs6000 fused multiply-add patch [+ patchlet] Segher Boessenkool
2002-12-20 21:38               ` Geoff Keating
2002-12-20 22:21                 ` Segher Boessenkool
2002-12-20 22:28                   ` David Edelsohn
2002-12-21 21:55                   ` Geoff Keating
2002-12-28 22:08                   ` Segher Boessenkool
2002-12-29  5:13 Richard Kenner
2002-12-29 19:25 ` Segher Boessenkool
2002-12-30  4:58 Richard Kenner
2002-12-30 20:17 ` Segher Boessenkool
2002-12-30 20:51   ` David Edelsohn
2003-01-02  0:52     ` Segher Boessenkool
2003-01-02  1:44       ` Geoff Keating
2003-01-04  1:59         ` Segher Boessenkool
2002-12-31  3:59 Richard Kenner
2003-01-05 23:24 ` Zack Weinberg
2003-01-06  2:26   ` David Edelsohn
2003-01-06  4:02     ` Segher Boessenkool
2003-01-06  4:07       ` Segher Boessenkool
2003-01-06 23:21     ` Geoff Keating
2003-01-09 22:41       ` Segher Boessenkool
2003-01-02  1:39 Richard Kenner
2003-01-04  1:59 ` Segher Boessenkool
2003-01-06 22:09 Richard Kenner
2003-01-10  2:15 Richard Kenner
2003-01-12  1:38 ` Segher Boessenkool
