public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [RS6000] bswapdi2 pattern, reload and lra
@ 2013-12-17 11:50 Alan Modra
  2013-12-18 14:53 ` David Edelsohn
  2015-03-16  2:51 ` David Edelsohn
  0 siblings, 2 replies; 5+ messages in thread
From: Alan Modra @ 2013-12-17 11:50 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Edelsohn, pthaugen

This patch is aimed at fixing test failures introduced by my
2013-12-07 change to bswapdi2_32bit:
FAIL: gcc.target/powerpc/pr53199.c scan-assembler-times lwbrx 6
FAIL: gcc.target/powerpc/pr53199.c scan-assembler-times stwbrx 6

The 2013-12-07 change was necessary to make -m32 -mlra generate good
code for pr53199.c:reg_reverse.  Too many '?'s on the r->r alternative
result in lra never choosing that option.  Instead we get Z->r,
ie. storing the input reg to a stack temp then using lwbrx from there.
That means we have a load-hit-store flush with a severe slowdown.
(See http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00002.html for the
corresponding -m64 result, a 4x slowdown.)

A similar problem occurs with -m64 -mcpu=power7 -mlra due to
bswapdi2_ldbrx having two '?'s on r->r.

To fix this I ran into a whole lot of pain.  reload and lra are quite
different in their selection of insn alternatives.  I could not find
any combination of '?'s that generated the best code for both reload
an lra on pr53199.c.  To see why, it's necessary to look (from a great
height) at how reload chooses amongst alternatives.  A particular
alternative gets a "loser" score, with the lowest "loser" being
chosen.  "loser" is incremented
a) when an operand does not match its constraint.
b) when an alternative has a '?' in the constraint.
c) when a scratch register is required.
d) when an early clobber output clashes with one of the inputs.

a) is fairly obvious.  For example, if we have a MEM when the operand
   alternative needs a reg, then we'll require a reload.
b) is also quite obvious.  Multiple '?'s accumulate.
c) is a little more subtle.  It bites you when alternatives require
   differing numbers of scratch registers.  Take for example
   bswapdi2_64bit, which before this patch has three alternatives
   Z->r with 3 scratch regs, (Z is a subset of m)
   r->Z with 2 scratch regs,
   r->r with 3 scratch regs.
   All other things being equal, with reload you could correct this
   disparity by adding a '?' to the r->Z alternative.  We might want
   to do that so that Z->r and r->r are the same "distance" apart
   as r->Z is from r->r.  With lra it seems that scratch regs are
   weighted differently..
d) is also tricky, and a trap for anyone optimizing insn selection for
   functions like some in pr53199.c that have just one rtl insn with
   early clobbers.  PowerPC generally returns function results in the
   same register as the first argument, so these hit the early
   clobber.  Code elsewhere in larger functions probably won't.
   lra penalizes early clobbers differently to reload (a lot less).

So, putting this all together..  Point (d) test implication is covered
by the additional functions in pr53199.c.  Avoiding early clobbers
where possible is good since it reduces differences between reload and
lra insn alternative costing.  We also generate better code.  I
managed to do this for all the Z->r bswapdi patterns, but stopped
short of changing r->r as well since at that point everything looked
OK!  Avoiding extra scratch registers for one alternative is also good.
The op4 r->r scratch wasn't even used, and op4 for Z->r fairly easy to
do without.  Renaming word_high/low to word1/2 was to make it a little
easier to track lifetime of addr1/2.

Bootstrapped and regression tested powerpc64-linux.  Output of
pr53199.c inspected for sanity with -mcpu=power{6,7} -m{32,64} and
{-mlra,}.  OK to apply?

gcc/
	* config/rs6000/rs6000.md (bswapdi2): Remove one scratch reg.
	Modify Z->r bswapdi splitter to use dest in place of scratch.
	In r->Z and Z->r bswapdi splitter rename word_high, word_low
	to word1, word2 and rearrange logic to suit.
	(bswapdi2_64bit): Remove early clobber on Z->r alternative.
	(bswapdi2_ldbrx): Likewise.  Remove '??' on r->r.
	(bswapdi2_32bit): Remove early clobber on Z->r alternative.
	Add one '?' on r->r.  Modify Z->r splitter to avoid need for
	early clobber.
gcc/testsuite/
	* gcc.target/powerpc/pr53199.c: Add extra functions.

Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 206009)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -2344,8 +2344,7 @@
 		   (bswap:DI
 		    (match_operand:DI 1 "reg_or_mem_operand" "")))
 	      (clobber (match_scratch:DI 2 ""))
-	      (clobber (match_scratch:DI 3 ""))
-	      (clobber (match_scratch:DI 4 ""))])]
+	      (clobber (match_scratch:DI 3 ""))])]
   ""
 {
   if (!REG_P (operands[0]) && !REG_P (operands[1]))
@@ -2363,11 +2362,10 @@
 
 ;; Power7/cell has ldbrx/stdbrx, so use it directly
 (define_insn "*bswapdi2_ldbrx"
-  [(set (match_operand:DI 0 "reg_or_mem_operand" "=&r,Z,??&r")
+  [(set (match_operand:DI 0 "reg_or_mem_operand" "=r,Z,&r")
 	(bswap:DI (match_operand:DI 1 "reg_or_mem_operand" "Z,r,r")))
    (clobber (match_scratch:DI 2 "=X,X,&r"))
-   (clobber (match_scratch:DI 3 "=X,X,&r"))
-   (clobber (match_scratch:DI 4 "=X,X,&r"))]
+   (clobber (match_scratch:DI 3 "=X,X,&r"))]
   "TARGET_POWERPC64 && TARGET_LDBRX
    && (REG_P (operands[0]) || REG_P (operands[1]))"
   "@
@@ -2379,11 +2377,10 @@
 
 ;; Non-power7/cell, fall back to use lwbrx/stwbrx
 (define_insn "*bswapdi2_64bit"
-  [(set (match_operand:DI 0 "reg_or_mem_operand" "=&r,Z,&r")
+  [(set (match_operand:DI 0 "reg_or_mem_operand" "=r,Z,&r")
 	(bswap:DI (match_operand:DI 1 "reg_or_mem_operand" "Z,r,r")))
    (clobber (match_scratch:DI 2 "=&b,&b,&r"))
-   (clobber (match_scratch:DI 3 "=&r,&r,&r"))
-   (clobber (match_scratch:DI 4 "=&r,X,&r"))]
+   (clobber (match_scratch:DI 3 "=&r,&r,&r"))]
   "TARGET_POWERPC64 && !TARGET_LDBRX
    && (REG_P (operands[0]) || REG_P (operands[1]))
    && !(MEM_P (operands[0]) && MEM_VOLATILE_P (operands[0]))
@@ -2395,8 +2392,7 @@
   [(set (match_operand:DI 0 "gpc_reg_operand" "")
 	(bswap:DI (match_operand:DI 1 "indexed_or_indirect_operand" "")))
    (clobber (match_operand:DI 2 "gpc_reg_operand" ""))
-   (clobber (match_operand:DI 3 "gpc_reg_operand" ""))
-   (clobber (match_operand:DI 4 "gpc_reg_operand" ""))]
+   (clobber (match_operand:DI 3 "gpc_reg_operand" ""))]
   "TARGET_POWERPC64 && !TARGET_LDBRX && reload_completed"
   [(const_int 0)]
   "
@@ -2405,15 +2401,14 @@
   rtx src    = operands[1];
   rtx op2    = operands[2];
   rtx op3    = operands[3];
-  rtx op4    = operands[4];
   rtx op3_32 = simplify_gen_subreg (SImode, op3, DImode,
 				    BYTES_BIG_ENDIAN ? 4 : 0);
-  rtx op4_32 = simplify_gen_subreg (SImode, op4, DImode,
-				    BYTES_BIG_ENDIAN ? 4 : 0);
+  rtx dest_32 = simplify_gen_subreg (SImode, dest, DImode,
+				     BYTES_BIG_ENDIAN ? 4 : 0);
   rtx addr1;
   rtx addr2;
-  rtx word_high;
-  rtx word_low;
+  rtx word1;
+  rtx word2;
 
   addr1 = XEXP (src, 0);
   if (GET_CODE (addr1) == PLUS)
@@ -2438,29 +2433,29 @@
       addr2 = gen_rtx_PLUS (Pmode, op2, addr1);
     }
 
+  word1 = change_address (src, SImode, addr1);
+  word2 = change_address (src, SImode, addr2);
+
   if (BYTES_BIG_ENDIAN)
     {
-      word_high = change_address (src, SImode, addr1);
-      word_low  = change_address (src, SImode, addr2);
+      emit_insn (gen_bswapsi2 (op3_32, word2));
+      emit_insn (gen_bswapsi2 (dest_32, word1));
     }
   else
     {
-      word_high = change_address (src, SImode, addr2);
-      word_low  = change_address (src, SImode, addr1);
+      emit_insn (gen_bswapsi2 (op3_32, word1));
+      emit_insn (gen_bswapsi2 (dest_32, word2));
     }
 
-  emit_insn (gen_bswapsi2 (op3_32, word_low));
-  emit_insn (gen_bswapsi2 (op4_32, word_high));
-  emit_insn (gen_ashldi3 (dest, op3, GEN_INT (32)));
-  emit_insn (gen_iordi3 (dest, dest, op4));
+  emit_insn (gen_ashldi3 (op3, op3, GEN_INT (32)));
+  emit_insn (gen_iordi3 (dest, dest, op3));
 }")
 
 (define_split
   [(set (match_operand:DI 0 "indexed_or_indirect_operand" "")
 	(bswap:DI (match_operand:DI 1 "gpc_reg_operand" "")))
    (clobber (match_operand:DI 2 "gpc_reg_operand" ""))
-   (clobber (match_operand:DI 3 "gpc_reg_operand" ""))
-   (clobber (match_operand:DI 4 "" ""))]
+   (clobber (match_operand:DI 3 "gpc_reg_operand" ""))]
   "TARGET_POWERPC64 && !TARGET_LDBRX && reload_completed"
   [(const_int 0)]
   "
@@ -2475,8 +2470,8 @@
 				    BYTES_BIG_ENDIAN ? 4 : 0);
   rtx addr1;
   rtx addr2;
-  rtx word_high;
-  rtx word_low;
+  rtx word1;
+  rtx word2;
 
   addr1 = XEXP (dest, 0);
   if (GET_CODE (addr1) == PLUS)
@@ -2501,27 +2496,28 @@
       addr2 = gen_rtx_PLUS (Pmode, op2, addr1);
     }
 
+  word1 = change_address (dest, SImode, addr1);
+  word2 = change_address (dest, SImode, addr2);
+
   emit_insn (gen_lshrdi3 (op3, src, GEN_INT (32)));
+
   if (BYTES_BIG_ENDIAN)
     {
-      word_high = change_address (dest, SImode, addr1);
-      word_low  = change_address (dest, SImode, addr2);
+      emit_insn (gen_bswapsi2 (word1, src_si));
+      emit_insn (gen_bswapsi2 (word2, op3_si));
     }
   else
     {
-      word_high = change_address (dest, SImode, addr2);
-      word_low  = change_address (dest, SImode, addr1);
+      emit_insn (gen_bswapsi2 (word2, src_si));
+      emit_insn (gen_bswapsi2 (word1, op3_si));
     }
-  emit_insn (gen_bswapsi2 (word_high, src_si));
-  emit_insn (gen_bswapsi2 (word_low, op3_si));
 }")
 
 (define_split
   [(set (match_operand:DI 0 "gpc_reg_operand" "")
 	(bswap:DI (match_operand:DI 1 "gpc_reg_operand" "")))
    (clobber (match_operand:DI 2 "gpc_reg_operand" ""))
-   (clobber (match_operand:DI 3 "gpc_reg_operand" ""))
-   (clobber (match_operand:DI 4 "" ""))]
+   (clobber (match_operand:DI 3 "gpc_reg_operand" ""))]
   "TARGET_POWERPC64 && reload_completed"
   [(const_int 0)]
   "
@@ -2544,7 +2540,7 @@
 }")
 
 (define_insn "bswapdi2_32bit"
-  [(set (match_operand:DI 0 "reg_or_mem_operand" "=&r,Z,&r")
+  [(set (match_operand:DI 0 "reg_or_mem_operand" "=r,Z,?&r")
 	(bswap:DI (match_operand:DI 1 "reg_or_mem_operand" "Z,r,r")))
    (clobber (match_scratch:SI 2 "=&b,&b,X"))]
   "!TARGET_POWERPC64 && (REG_P (operands[0]) || REG_P (operands[1]))"
@@ -2573,7 +2569,8 @@
   if (GET_CODE (addr1) == PLUS)
     {
       emit_insn (gen_add3_insn (op2, XEXP (addr1, 0), GEN_INT (4)));
-      if (TARGET_AVOID_XFORM)
+      if (TARGET_AVOID_XFORM
+	  || REGNO (XEXP (addr1, 1)) == REGNO (dest2))
 	{
 	  emit_insn (gen_add3_insn (op2, XEXP (addr1, 1), op2));
 	  addr2 = op2;
@@ -2581,7 +2578,8 @@
       else
 	addr2 = gen_rtx_PLUS (SImode, op2, XEXP (addr1, 1));
     }
-  else if (TARGET_AVOID_XFORM)
+  else if (TARGET_AVOID_XFORM
+	   || REGNO (addr1) == REGNO (dest2))
     {
       emit_insn (gen_add3_insn (op2, addr1, GEN_INT (4)));
       addr2 = op2;
@@ -2596,6 +2594,8 @@
   word2 = change_address (src, SImode, addr2);
 
   emit_insn (gen_bswapsi2 (dest2, word1));
+  /* The REGNO (dest2) tests above ensure that addr2 has not been trashed,
+     thus allowing us to omit an early clobber on the output.  */
   emit_insn (gen_bswapsi2 (dest1, word2));
 }")
 
Index: gcc/testsuite/gcc.target/powerpc/pr53199.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr53199.c	(revision 206009)
+++ gcc/testsuite/gcc.target/powerpc/pr53199.c	(working copy)
@@ -1,7 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
 /* { dg-options "-O2 -mcpu=power6 -mavoid-indexed-addresses" } */
-/* { dg-final { scan-assembler-times "lwbrx" 6 } } */
+/* { dg-final { scan-assembler-times "lwbrx" 12 } } */
 /* { dg-final { scan-assembler-times "stwbrx" 6 } } */
 
 /* PR 51399: bswap gets an error if -mavoid-indexed-addresses was used in
@@ -25,6 +25,24 @@
   return __builtin_bswap64 (p[i]);
 }
 
+long long
+load64_reverse_4 (long long dummy __attribute__ ((unused)), long long *p)
+{
+  return __builtin_bswap64 (*p);
+}
+
+long long
+load64_reverse_5 (long long dummy __attribute__ ((unused)), long long *p)
+{
+  return __builtin_bswap64 (p[1]);
+}
+
+long long
+load64_reverse_6 (long long dummy __attribute__ ((unused)), long long *p, int i)
+{
+  return __builtin_bswap64 (p[i]);
+}
+
 void
 store64_reverse_1 (long long *p, long long x)
 {
@@ -44,7 +62,13 @@
 }
 
 long long
-reg_reverse (long long x)
+reg_reverse_1 (long long x)
 {
   return __builtin_bswap64 (x);
 }
+
+long long
+reg_reverse_2 (long long dummy __attribute__ ((unused)), long long x)
+{
+  return __builtin_bswap64 (x);
+}

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RS6000] bswapdi2 pattern, reload and lra
  2013-12-17 11:50 [RS6000] bswapdi2 pattern, reload and lra Alan Modra
@ 2013-12-18 14:53 ` David Edelsohn
  2015-03-12 11:15   ` Alan Modra
  2015-03-16  2:51 ` David Edelsohn
  1 sibling, 1 reply; 5+ messages in thread
From: David Edelsohn @ 2013-12-18 14:53 UTC (permalink / raw)
  To: GCC Patches, Pat Haugen, Alan Modra

On Tue, Dec 17, 2013 at 6:50 AM, Alan Modra <amodra@gmail.com> wrote:

> Bootstrapped and regression tested powerpc64-linux.  Output of
> pr53199.c inspected for sanity with -mcpu=power{6,7} -m{32,64} and
> {-mlra,}.  OK to apply?
>
> gcc/
>         * config/rs6000/rs6000.md (bswapdi2): Remove one scratch reg.
>         Modify Z->r bswapdi splitter to use dest in place of scratch.
>         In r->Z and Z->r bswapdi splitter rename word_high, word_low
>         to word1, word2 and rearrange logic to suit.
>         (bswapdi2_64bit): Remove early clobber on Z->r alternative.
>         (bswapdi2_ldbrx): Likewise.  Remove '??' on r->r.
>         (bswapdi2_32bit): Remove early clobber on Z->r alternative.
>         Add one '?' on r->r.  Modify Z->r splitter to avoid need for
>         early clobber.
> gcc/testsuite/
>         * gcc.target/powerpc/pr53199.c: Add extra functions.

> @@ -2438,29 +2433,29 @@
>        addr2 = gen_rtx_PLUS (Pmode, op2, addr1);
>      }
>
> +  word1 = change_address (src, SImode, addr1);
> +  word2 = change_address (src, SImode, addr2);
> +
>    if (BYTES_BIG_ENDIAN)
>      {
> -      word_high = change_address (src, SImode, addr1);
> -      word_low  = change_address (src, SImode, addr2);
> +      emit_insn (gen_bswapsi2 (op3_32, word2));
> +      emit_insn (gen_bswapsi2 (dest_32, word1));
>      }
>    else
>      {
> -      word_high = change_address (src, SImode, addr2);
> -      word_low  = change_address (src, SImode, addr1);
> +      emit_insn (gen_bswapsi2 (op3_32, word1));
> +      emit_insn (gen_bswapsi2 (dest_32, word2));
>      }
>
> -  emit_insn (gen_bswapsi2 (op3_32, word_low));
> -  emit_insn (gen_bswapsi2 (op4_32, word_high));
> -  emit_insn (gen_ashldi3 (dest, op3, GEN_INT (32)));
> -  emit_insn (gen_iordi3 (dest, dest, op4));
> +  emit_insn (gen_ashldi3 (op3, op3, GEN_INT (32)));
> +  emit_insn (gen_iordi3 (dest, dest, op3));
>  }")

Why change the code from swapping the words at the initial
change_address() to swapping the words in the call to gen_bswapsi2()?

- David

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RS6000] bswapdi2 pattern, reload and lra
  2013-12-18 14:53 ` David Edelsohn
@ 2015-03-12 11:15   ` Alan Modra
  0 siblings, 0 replies; 5+ messages in thread
From: Alan Modra @ 2015-03-12 11:15 UTC (permalink / raw)
  To: David Edelsohn; +Cc: GCC Patches

On Wed, Dec 18, 2013 at 09:53:38AM -0500, David Edelsohn wrote:
https://gcc.gnu.org/ml/gcc-patches/2013-12/msg01599.html
> Why change the code from swapping the words at the initial
> change_address() to swapping the words in the call to gen_bswapsi2()?

Sorry for dropping this on the floor for so long.  I've been prodded
back into action by Redhat people and pr63150.

I don't recall a compelling technical reason for the change.  It was
probably to make my life easier in tracking the lifetimes of addr1
and addr2, necessary due to losing one of the scratch registers along
with early clobbers.  (In the splitter you question, addr1 might be
the same register as dest/dest_32.)

I suppose it also makes those splitters look a little more like the
one for bswapdi2_32bit, so a plus for maintenance.

The patch applies with some minor changes (see pr63150) and I've
checked for regressions on a current powerpc64le build.  OK to apply,
and on the branches?

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RS6000] bswapdi2 pattern, reload and lra
  2013-12-17 11:50 [RS6000] bswapdi2 pattern, reload and lra Alan Modra
  2013-12-18 14:53 ` David Edelsohn
@ 2015-03-16  2:51 ` David Edelsohn
  2015-03-16  3:33   ` Alan Modra
  1 sibling, 1 reply; 5+ messages in thread
From: David Edelsohn @ 2015-03-16  2:51 UTC (permalink / raw)
  To: GCC Patches, Alan Modra

On Tue, Dec 17, 2013 at 6:50 AM, Alan Modra <amodra@gmail.com> wrote:
> This patch is aimed at fixing test failures introduced by my
> 2013-12-07 change to bswapdi2_32bit:
> FAIL: gcc.target/powerpc/pr53199.c scan-assembler-times lwbrx 6
> FAIL: gcc.target/powerpc/pr53199.c scan-assembler-times stwbrx 6
>
> The 2013-12-07 change was necessary to make -m32 -mlra generate good
> code for pr53199.c:reg_reverse.  Too many '?'s on the r->r alternative
> result in lra never choosing that option.  Instead we get Z->r,
> ie. storing the input reg to a stack temp then using lwbrx from there.
> That means we have a load-hit-store flush with a severe slowdown.
> (See http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00002.html for the
> corresponding -m64 result, a 4x slowdown.)
>
> A similar problem occurs with -m64 -mcpu=power7 -mlra due to
> bswapdi2_ldbrx having two '?'s on r->r.
>
> To fix this I ran into a whole lot of pain.  reload and lra are quite
> different in their selection of insn alternatives.  I could not find
> any combination of '?'s that generated the best code for both reload
> an lra on pr53199.c.  To see why, it's necessary to look (from a great
> height) at how reload chooses amongst alternatives.  A particular
> alternative gets a "loser" score, with the lowest "loser" being
> chosen.  "loser" is incremented
> a) when an operand does not match its constraint.
> b) when an alternative has a '?' in the constraint.
> c) when a scratch register is required.
> d) when an early clobber output clashes with one of the inputs.
>
> a) is fairly obvious.  For example, if we have a MEM when the operand
>    alternative needs a reg, then we'll require a reload.
> b) is also quite obvious.  Multiple '?'s accumulate.
> c) is a little more subtle.  It bites you when alternatives require
>    differing numbers of scratch registers.  Take for example
>    bswapdi2_64bit, which before this patch has three alternatives
>    Z->r with 3 scratch regs, (Z is a subset of m)
>    r->Z with 2 scratch regs,
>    r->r with 3 scratch regs.
>    All other things being equal, with reload you could correct this
>    disparity by adding a '?' to the r->Z alternative.  We might want
>    to do that so that Z->r and r->r are the same "distance" apart
>    as r->Z is from r->r.  With lra it seems that scratch regs are
>    weighted differently..
> d) is also tricky, and a trap for anyone optimizing insn selection for
>    functions like some in pr53199.c that have just one rtl insn with
>    early clobbers.  PowerPC generally returns function results in the
>    same register as the first argument, so these hit the early
>    clobber.  Code elsewhere in larger functions probably won't.
>    lra penalizes early clobbers differently to reload (a lot less).
>
> So, putting this all together..  Point (d) test implication is covered
> by the additional functions in pr53199.c.  Avoiding early clobbers
> where possible is good since it reduces differences between reload and
> lra insn alternative costing.  We also generate better code.  I
> managed to do this for all the Z->r bswapdi patterns, but stopped
> short of changing r->r as well since at that point everything looked
> OK!  Avoiding extra scratch registers for one alternative is also good.
> The op4 r->r scratch wasn't even used, and op4 for Z->r fairly easy to
> do without.  Renaming word_high/low to word1/2 was to make it a little
> easier to track lifetime of addr1/2.
>
> Bootstrapped and regression tested powerpc64-linux.  Output of
> pr53199.c inspected for sanity with -mcpu=power{6,7} -m{32,64} and
> {-mlra,}.  OK to apply?
>
> gcc/
>         * config/rs6000/rs6000.md (bswapdi2): Remove one scratch reg.
>         Modify Z->r bswapdi splitter to use dest in place of scratch.
>         In r->Z and Z->r bswapdi splitter rename word_high, word_low
>         to word1, word2 and rearrange logic to suit.
>         (bswapdi2_64bit): Remove early clobber on Z->r alternative.
>         (bswapdi2_ldbrx): Likewise.  Remove '??' on r->r.
>         (bswapdi2_32bit): Remove early clobber on Z->r alternative.
>         Add one '?' on r->r.  Modify Z->r splitter to avoid need for
>         early clobber.
> gcc/testsuite/
>         * gcc.target/powerpc/pr53199.c: Add extra functions.

Okay.

Thanks, David

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RS6000] bswapdi2 pattern, reload and lra
  2015-03-16  2:51 ` David Edelsohn
@ 2015-03-16  3:33   ` Alan Modra
  0 siblings, 0 replies; 5+ messages in thread
From: Alan Modra @ 2015-03-16  3:33 UTC (permalink / raw)
  To: David Edelsohn; +Cc: GCC Patches

On Sun, Mar 15, 2015 at 10:51:02PM -0400, David Edelsohn wrote:
> On Tue, Dec 17, 2013 at 6:50 AM, Alan Modra <amodra@gmail.com> wrote:
	PR target/61350
> > gcc/
> >         * config/rs6000/rs6000.md (bswapdi2): Remove one scratch reg.
> >         Modify Z->r bswapdi splitter to use dest in place of scratch.
> >         In r->Z and Z->r bswapdi splitter rename word_high, word_low
> >         to word1, word2 and rearrange logic to suit.
> >         (bswapdi2_64bit): Remove early clobber on Z->r alternative.
> >         (bswapdi2_ldbrx): Likewise.  Remove '??' on r->r.
> >         (bswapdi2_32bit): Remove early clobber on Z->r alternative.
> >         Add one '?' on r->r.  Modify Z->r splitter to avoid need for
> >         early clobber.
> > gcc/testsuite/
> >         * gcc.target/powerpc/pr53199.c: Add extra functions.
> 
> Okay.

Committed revision 221445.  I'll leave it a few days before applying
to gcc-4.9.

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-03-16  3:33 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-17 11:50 [RS6000] bswapdi2 pattern, reload and lra Alan Modra
2013-12-18 14:53 ` David Edelsohn
2015-03-12 11:15   ` Alan Modra
2015-03-16  2:51 ` David Edelsohn
2015-03-16  3:33   ` Alan Modra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).