public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Tighten LRA cycling check
@ 2018-01-04 21:28 Richard Sandiford
  2018-01-05 23:11 ` Jeff Law
  0 siblings, 1 reply; 3+ messages in thread
From: Richard Sandiford @ 2018-01-04 21:28 UTC (permalink / raw)
  To: gcc-patches; +Cc: vmakarov

LRA has code to try to prevent cycling, by avoiding reloads that
look too similar to the instruction being reloaded.  E.g. if we
have a R<-C move for some constant C, reloading the source with
another R<-C move is unlikely to be a good idea.

However, this safeguard unnecessarily triggered in tests like
the one in the patch.  We started with instructions like:

(insn 12 9 13 5 (set (reg:DI 0 x0)
        (reg/f:DI 459)) "reg-alloc-1.c":18 47 {*movdi_aarch64}
     (expr_list:REG_EQUAL (symbol_ref:DI ("x00") [flags 0xc0]  <var_decl 0x7f3c03c1f510 x00>)
        (nil)))

where r459 didn't get allocated a register and is equivalent to
constant x00.  LRA would then handle it like this:

Changing pseudo 459 in operand 1 of insn 12 on equiv `x00'
            1 Non-pseudo reload: reject+=2
            1 Non input pseudo reload: reject++
          alt=0,overall=609,losers=1,rld_nregs=1
[...]
          alt=13,overall=9,losers=1,rld_nregs=1
[...]
         Choosing alt 13 in insn 12:  (0) r  (1) w {*movdi_aarch64}

In other words, to avoid loading the constant x00 into another GPR,
LRA decided instead to move it into a floating-point register,
then move that floating-point register into x0:

      Creating newreg=630, assigning class FP_REGS to r630
      Set class ALL_REGS for r631
   12: x0:DI=r630:DI
      REG_EQUAL `x00'
    Inserting insn reload before:
  815: r631:DI=high(`x00')
  816: r630:DI=r631:DI+low(`x00')
      REG_EQUAL `x00'

That's inefficient and doesn't really help to resolve a cycling
problem, since the r630 destination of 816 needs to be reloaded into
a GPR anyway.

The cycling check already had an exception for source values that are
the result of an elimination.  This patch extends it to include the
result of equivalence substitution.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
Also tested by comparing the before and after assembly output for at
least one target per CPU directory.  The targets most affected were:

  aarch64_be-linux-gn
  aarch64-linux-gnu
  powerpc-eabispe
  riscv32-elf
  riscv64-elf
  sparc64-linux-gnu
  sparc-linux-gnu
  spu-elf

for which it improved code size overall.  There were minor register
renaming differences on some other targets.  x86 and powerpc*-linux-gnu
targets were unaffected.

OK to install?

Richard


2018-01-04  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* lra-constraints.c (process_alt_operands): Test for the equivalence
	substitutions when detecting a possible reload cycle.

gcc/testsuite/
	* gcc.target/aarch64/reg-alloc-1.c: New test.

Index: gcc/lra-constraints.c
===================================================================
--- gcc/lra-constraints.c	2018-01-04 21:28:22.495890864 +0000
+++ gcc/lra-constraints.c	2018-01-04 21:28:22.637885000 +0000
@@ -2866,7 +2866,12 @@ process_alt_operands (int only_alternati
 		  /* If it is a result of recent elimination in move
 		     insn we can transform it into an add still by
 		     using this alternative.  */
-		  && GET_CODE (no_subreg_reg_operand[1]) != PLUS)))
+		  && GET_CODE (no_subreg_reg_operand[1]) != PLUS
+		  /* Likewise if the source has been replaced with an
+		     equivalent value.  This only happens once -- the reload
+		     will use the equivalent value instead of the register it
+		     replaces -- so there should be no danger of cycling.  */
+		  && !equiv_substition_p[1])))
 	{
 	  /* We have a move insn and a new reload insn will be similar
 	     to the current insn.  We should avoid such situation as
Index: gcc/testsuite/gcc.target/aarch64/reg-alloc-1.c
===================================================================
--- /dev/null	2018-01-04 17:16:45.681472446 +0000
+++ gcc/testsuite/gcc.target/aarch64/reg-alloc-1.c	2018-01-04 21:28:22.637885000 +0000
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-reload-details" } */
+
+#define R1(X, Y) X (Y##0) X (Y##1) X (Y##2) X (Y##3) \
+                 X (Y##4) X (Y##5) X (Y##6) X (Y##7)
+#define R2(X) R1 (X, 0) R1 (X, 1) R1 (X, 2) R1 (X, 3)
+
+#define DEFINE(N) extern int x##N;
+R2 (DEFINE)
+
+void b1 (int *);
+
+void
+foo (int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+#define CALL(N) b1 (&x##N);
+      R2 (CALL);
+      R2 (CALL);
+    }
+#define INC(N) x##N += 1;
+  R2 (INC);
+}
+
+/* { dg-final { scan-rtl-dump-not "DI 32 v0" "reload" } } */

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Tighten LRA cycling check
  2018-01-04 21:28 Tighten LRA cycling check Richard Sandiford
@ 2018-01-05 23:11 ` Jeff Law
  2018-01-06 11:13   ` Richard Sandiford
  0 siblings, 1 reply; 3+ messages in thread
From: Jeff Law @ 2018-01-05 23:11 UTC (permalink / raw)
  To: gcc-patches, vmakarov, richard.sandiford

On 01/04/2018 02:28 PM, Richard Sandiford wrote:
> LRA has code to try to prevent cycling, by avoiding reloads that
> look too similar to the instruction being reloaded.  E.g. if we
> have a R<-C move for some constant C, reloading the source with
> another R<-C move is unlikely to be a good idea.
> 
> However, this safeguard unnecessarily triggered in tests like
> the one in the patch.  We started with instructions like:
> 
> (insn 12 9 13 5 (set (reg:DI 0 x0)
>         (reg/f:DI 459)) "reg-alloc-1.c":18 47 {*movdi_aarch64}
>      (expr_list:REG_EQUAL (symbol_ref:DI ("x00") [flags 0xc0]  <var_decl 0x7f3c03c1f510 x00>)
>         (nil)))
> 
> where r459 didn't get allocated a register and is equivalent to
> constant x00.  LRA would then handle it like this:
> 
> Changing pseudo 459 in operand 1 of insn 12 on equiv `x00'
>             1 Non-pseudo reload: reject+=2
>             1 Non input pseudo reload: reject++
>           alt=0,overall=609,losers=1,rld_nregs=1
> [...]
>           alt=13,overall=9,losers=1,rld_nregs=1
> [...]
>          Choosing alt 13 in insn 12:  (0) r  (1) w {*movdi_aarch64}
> 
> In other words, to avoid loading the constant x00 into another GPR,
> LRA decided instead to move it into a floating-point register,
> then move that floating-point register into x0:
> 
>       Creating newreg=630, assigning class FP_REGS to r630
>       Set class ALL_REGS for r631
>    12: x0:DI=r630:DI
>       REG_EQUAL `x00'
>     Inserting insn reload before:
>   815: r631:DI=high(`x00')
>   816: r630:DI=r631:DI+low(`x00')
>       REG_EQUAL `x00'
> 
> That's inefficient and doesn't really help to resolve a cycling
> problem, since the r630 destination of 816 needs to be reloaded into
> a GPR anyway.
> 
> The cycling check already had an exception for source values that are
> the result of an elimination.  This patch extends it to include the
> result of equivalence substitution.
> 
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> Also tested by comparing the before and after assembly output for at
> least one target per CPU directory.  The targets most affected were:
> 
>   aarch64_be-linux-gn
>   aarch64-linux-gnu
>   powerpc-eabispe
>   riscv32-elf
>   riscv64-elf
>   sparc64-linux-gnu
>   sparc-linux-gnu
>   spu-elf
> 
> for which it improved code size overall.  There were minor register
> renaming differences on some other targets.  x86 and powerpc*-linux-gnu
> targets were unaffected.
> 
> OK to install?
> 
> Richard
> 
> 
> 2018-01-04  Richard Sandiford  <richard.sandiford@linaro.org>
> 
> gcc/
> 	* lra-constraints.c (process_alt_operands): Test for the equivalence
> 	substitutions when detecting a possible reload cycle.
> 
> gcc/testsuite/
> 	* gcc.target/aarch64/reg-alloc-1.c: New test.
OK.  Presumably related to 83699?  Do you want to reference it in the
ChangeLog?

jeff

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Tighten LRA cycling check
  2018-01-05 23:11 ` Jeff Law
@ 2018-01-06 11:13   ` Richard Sandiford
  0 siblings, 0 replies; 3+ messages in thread
From: Richard Sandiford @ 2018-01-06 11:13 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches, vmakarov

Jeff Law <law@redhat.com> writes:
> On 01/04/2018 02:28 PM, Richard Sandiford wrote:
>> LRA has code to try to prevent cycling, by avoiding reloads that
>> look too similar to the instruction being reloaded.  E.g. if we
>> have a R<-C move for some constant C, reloading the source with
>> another R<-C move is unlikely to be a good idea.
>> 
>> However, this safeguard unnecessarily triggered in tests like
>> the one in the patch.  We started with instructions like:
>> 
>> (insn 12 9 13 5 (set (reg:DI 0 x0)
>>         (reg/f:DI 459)) "reg-alloc-1.c":18 47 {*movdi_aarch64}
>>      (expr_list:REG_EQUAL (symbol_ref:DI ("x00") [flags 0xc0]  <var_decl 0x7f3c03c1f510 x00>)
>>         (nil)))
>> 
>> where r459 didn't get allocated a register and is equivalent to
>> constant x00.  LRA would then handle it like this:
>> 
>> Changing pseudo 459 in operand 1 of insn 12 on equiv `x00'
>>             1 Non-pseudo reload: reject+=2
>>             1 Non input pseudo reload: reject++
>>           alt=0,overall=609,losers=1,rld_nregs=1
>> [...]
>>           alt=13,overall=9,losers=1,rld_nregs=1
>> [...]
>>          Choosing alt 13 in insn 12:  (0) r  (1) w {*movdi_aarch64}
>> 
>> In other words, to avoid loading the constant x00 into another GPR,
>> LRA decided instead to move it into a floating-point register,
>> then move that floating-point register into x0:
>> 
>>       Creating newreg=630, assigning class FP_REGS to r630
>>       Set class ALL_REGS for r631
>>    12: x0:DI=r630:DI
>>       REG_EQUAL `x00'
>>     Inserting insn reload before:
>>   815: r631:DI=high(`x00')
>>   816: r630:DI=r631:DI+low(`x00')
>>       REG_EQUAL `x00'
>> 
>> That's inefficient and doesn't really help to resolve a cycling
>> problem, since the r630 destination of 816 needs to be reloaded into
>> a GPR anyway.
>> 
>> The cycling check already had an exception for source values that are
>> the result of an elimination.  This patch extends it to include the
>> result of equivalence substitution.
>> 
>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
>> Also tested by comparing the before and after assembly output for at
>> least one target per CPU directory.  The targets most affected were:
>> 
>>   aarch64_be-linux-gn
>>   aarch64-linux-gnu
>>   powerpc-eabispe
>>   riscv32-elf
>>   riscv64-elf
>>   sparc64-linux-gnu
>>   sparc-linux-gnu
>>   spu-elf
>> 
>> for which it improved code size overall.  There were minor register
>> renaming differences on some other targets.  x86 and powerpc*-linux-gnu
>> targets were unaffected.
>> 
>> OK to install?
>> 
>> Richard
>> 
>> 
>> 2018-01-04  Richard Sandiford  <richard.sandiford@linaro.org>
>> 
>> gcc/
>> 	* lra-constraints.c (process_alt_operands): Test for the equivalence
>> 	substitutions when detecting a possible reload cycle.
>> 
>> gcc/testsuite/
>> 	* gcc.target/aarch64/reg-alloc-1.c: New test.
> OK.

Thanks.

> Presumably related to 83699?  Do you want to reference it in the
> ChangeLog?

It's actually an unrelated problem that I'd originally seen on SVE.
I guess I was just lucky to have two cycling bugs around the same time :-)

Richard

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2018-01-06 11:13 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-04 21:28 Tighten LRA cycling check Richard Sandiford
2018-01-05 23:11 ` Jeff Law
2018-01-06 11:13   ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).