[Patch, lra] PR70751, correct the cost for spilling non-pseudo into memory

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [Patch, lra] PR70751, correct the cost for spilling non-pseudo into memory
@ 2016-06-08 16:47 Jiong Wang
  2016-06-09 15:52 ` Jeff Law
  0 siblings, 1 reply; 14+ messages in thread
From: Jiong Wang @ 2016-06-08 16:47 UTC (permalink / raw)
  To: GCC Patches; +Cc: Vladimir N Makarov

[-- Attachment #1: Type: text/plain, Size: 1213 bytes --]

As discussed on the PR

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70751,

here is the patch.

For this particular failure on arm, *arm_movsi_insn has the following operand
constraints:
   
   operand 0: "=rk,r,r,r,rk,m"
   operand 1: "rk, I,K,j,mi,rk"

gcc won't explicitly refuse an unmatch CT_MEMORY operand (r235184) if it
comes from substituion that alternative (alt) 4 got a chance to compete with
alt 0, and eventually be the winner as it's with rld_nregs=0 while alt 0 is
with rld_nregs=1.

I fell it's OK to give alt 4 a chance here, but we should calculate the cost
correctly.

For alt 4, it should be treated as spill into memory, but currently lra only
recognize a spill for pseudo register.   While the spilled rtx for alt 4 is a
plus after equiv substitution.

          (plus:SI (reg/f:SI 102 sfp)
             (const_int 4 [0x4]))

This patch thus let lra-constraint cost spill of Non-pseudo as well and fixed
the regression.

x86_64/aarch64/arm boostrap and regression OK.
arm bootstrapped cc1 is about 0.3% smaller in code size.

OK for trunk?

gcc/
         PR rtl-optimization/70751
         * lra-constraints.c (process_alt_operands): Recognize Non-pseudo spilled into
         memory.


[-- Attachment #2: lra.patch --]
[-- Type: text/x-patch, Size: 1627 bytes --]

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index e4e6c8c..8f2db87 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -2474,14 +2474,29 @@ process_alt_operands (int only_alternative)
 	      /* We are trying to spill pseudo into memory.  It is
 		 usually more costly than moving to a hard register
 		 although it might takes the same number of
-		 reloads.  */
-	      if (no_regs_p && REG_P (op) && hard_regno[nop] >= 0)
+		 reloads.
+
+		 PR 70751, non-pseudo spill may happen also.  For example, if an
+		 operand comes from equiv substitution, then we won't reject it
+		 if it's an unmatch CT_MEMORY in above code (see r235184).
+		 Suppose a target allows both register and memory in the
+		 operand constraint alternatives, then it's typical that an
+		 eliminable register has a substition of "base + offset" which
+		 can either be reloaded by a simple "new_reg <= base + offset"
+		 which will match the register constraint, or a similar reg
+		 addition followed by further spill to and reload from memory
+		 which will match the memory constraint, but this memory spill
+		 will be much more costly usually.
+
+		 Code below increases the reject for both pseudo and non-pseudo
+		 spill.  */
+	      if (no_regs_p && !(REG_P (op) && hard_regno[nop] < 0))
 		{
 		  if (lra_dump_file != NULL)
 		    fprintf
 		      (lra_dump_file,
-		       "            %d Spill pseudo into memory: reject+=3\n",
-		       nop);
+		       "            %d Spill %spseudo into memory: reject+=3\n",
+		       nop, REG_P (op) ? "" : "Non-");
 		  reject += 3;
 		  if (VECTOR_MODE_P (mode))
 		    {

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch, lra] PR70751, correct the cost for spilling non-pseudo into memory
  2016-06-08 16:47 [Patch, lra] PR70751, correct the cost for spilling non-pseudo into memory Jiong Wang
@ 2016-06-09 15:52 ` Jeff Law
  2016-06-27 16:33   ` Dominik Vogt
  0 siblings, 1 reply; 14+ messages in thread
From: Jeff Law @ 2016-06-09 15:52 UTC (permalink / raw)
  To: Jiong Wang, GCC Patches; +Cc: Vladimir N Makarov

On 06/08/2016 10:47 AM, Jiong Wang wrote:
> As discussed on the PR
>
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70751,
>
> here is the patch.
>
> For this particular failure on arm, *arm_movsi_insn has the following
> operand
> constraints:
>     operand 0: "=rk,r,r,r,rk,m"
>   operand 1: "rk, I,K,j,mi,rk"
>
> gcc won't explicitly refuse an unmatch CT_MEMORY operand (r235184) if it
> comes from substituion that alternative (alt) 4 got a chance to compete
> with
> alt 0, and eventually be the winner as it's with rld_nregs=0 while alt 0 is
> with rld_nregs=1.
>
> I fell it's OK to give alt 4 a chance here, but we should calculate the
> cost
> correctly.
>
> For alt 4, it should be treated as spill into memory, but currently lra
> only
> recognize a spill for pseudo register.   While the spilled rtx for alt 4
> is a
> plus after equiv substitution.
>
>          (plus:SI (reg/f:SI 102 sfp)
>             (const_int 4 [0x4]))
>
> This patch thus let lra-constraint cost spill of Non-pseudo as well and
> fixed
> the regression.
>
> x86_64/aarch64/arm boostrap and regression OK.
> arm bootstrapped cc1 is about 0.3% smaller in code size.
>
> OK for trunk?
>
> gcc/
>         PR rtl-optimization/70751
>         * lra-constraints.c (process_alt_operands): Recognize Non-pseudo
> spilled into
>         memory.
>
I think the change itself is fine, but the comment could use some work. 
Actually I think you just need to remove the first sentence (the one 
referring to BZ70751 and r235184) and keep everything from "Suppose a 
target" onward.

OK with that fix.

jeff


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch, lra] PR70751, correct the cost for spilling non-pseudo into memory
  2016-06-09 15:52 ` Jeff Law
@ 2016-06-27 16:33   ` Dominik Vogt
  2016-06-28 10:01     ` Jiong Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Dominik Vogt @ 2016-06-27 16:33 UTC (permalink / raw)
  To: gcc-patches
  Cc: Jeff Law, Jiong Wang, Vladimir N Makarov, Andreas Krebbel, Robin Dapp

On Thu, Jun 09, 2016 at 09:52:39AM -0600, Jeff Law wrote:
> On 06/08/2016 10:47 AM, Jiong Wang wrote:
> >As discussed on the PR
> >
> >  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70751,
> >
> >here is the patch.

This commit has introduced an ICE with s390x, march=z13.  Is it a
backend bug or one in the middleend?

-- x.c --
void foo(int *n, int off, int *m)
{
  int i;

  for(i = 0 ;i <= 3; i++)
    {
      n[off + i] = m[2 * i];
      n[off + 7 - i] = m[2 * i + 1];
    }
}
-- snip --

  $ gcc -S -O3 -march=z13 x.c                                   
x.c: In function â€˜fooâ€™:
x.c:10:1: internal compiler error: Max. number of generated reload insns per insn is achieved (90)

 }
 ^
0x804c34b7 lra_constraints(bool)
	gcc/lra-constraints.c:4468
0x804b085f lra(_IO_FILE*)
	gcc/lra.c:2290
0x804690fb do_reload
	gcc/ira.c:5384
0x804690fb execute
	gcc/ira.c:5568

> >
> >For this particular failure on arm, *arm_movsi_insn has the following
> >operand
> >constraints:
> >    operand 0: "=rk,r,r,r,rk,m"
> >  operand 1: "rk, I,K,j,mi,rk"
> >
> >gcc won't explicitly refuse an unmatch CT_MEMORY operand (r235184) if it
> >comes from substituion that alternative (alt) 4 got a chance to compete
> >with
> >alt 0, and eventually be the winner as it's with rld_nregs=0 while alt 0 is
> >with rld_nregs=1.
> >
> >I fell it's OK to give alt 4 a chance here, but we should calculate the
> >cost
> >correctly.
> >
> >For alt 4, it should be treated as spill into memory, but currently lra
> >only
> >recognize a spill for pseudo register.   While the spilled rtx for alt 4
> >is a
> >plus after equiv substitution.
> >
> >         (plus:SI (reg/f:SI 102 sfp)
> >            (const_int 4 [0x4]))
> >
> >This patch thus let lra-constraint cost spill of Non-pseudo as well and
> >fixed
> >the regression.
> >
> >x86_64/aarch64/arm boostrap and regression OK.
> >arm bootstrapped cc1 is about 0.3% smaller in code size.
> >
> >OK for trunk?
> >
> >gcc/
> >        PR rtl-optimization/70751
> >        * lra-constraints.c (process_alt_operands): Recognize Non-pseudo
> >spilled into
> >        memory.
> >
> I think the change itself is fine, but the comment could use some
> work. Actually I think you just need to remove the first sentence
> (the one referring to BZ70751 and r235184) and keep everything from
> "Suppose a target" onward.
> 
> OK with that fix.
> 
> jeff
> 


Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch, lra] PR70751, correct the cost for spilling non-pseudo into memory
  2016-06-27 16:33   ` Dominik Vogt
@ 2016-06-28 10:01     ` Jiong Wang
  2016-06-28 10:10       ` Jiong Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Jiong Wang @ 2016-06-28 10:01 UTC (permalink / raw)
  To: vogt, gcc-patches, Jeff Law, Vladimir N Makarov, Andreas Krebbel,
	Robin Dapp

On 27/06/16 17:26, Dominik Vogt wrote:
> On Thu, Jun 09, 2016 at 09:52:39AM -0600, Jeff Law wrote:
>> On 06/08/2016 10:47 AM, Jiong Wang wrote:
>>> As discussed on the PR
>>>
>>>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70751,
>>>
>>> here is the patch.
> This commit has introduced an ICE with s390x, march=z13.  Is it a
> backend bug or one in the middleend?
>
> -- x.c --
> void foo(int *n, int off, int *m)
> {
>    int i;
>
>    for(i = 0 ;i <= 3; i++)
>      {
>        n[off + i] = m[2 * i];
>        n[off + 7 - i] = m[2 * i + 1];
>      }
> }
> -- snip --
>
>    $ gcc -S -O3 -march=z13 x.c
> x.c: In function â€˜fooâ€™:
> x.c:10:1: internal compiler error: Max. number of generated reload insns per insn is achieved (90)

I tried the following on gcc trunk (r237817, 28-June-2016), can't 
reproduce the issue.

configure --target=s390-linux
./cc1 -O3 1.c -nostdinc -march=z13

Can you please paste "gcc -v -S -O3 -march=z13 x.c" to see what's passed 
to cc1?

Thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch, lra] PR70751, correct the cost for spilling non-pseudo into memory
  2016-06-28 10:01     ` Jiong Wang
@ 2016-06-28 10:10       ` Jiong Wang
  2016-06-28 14:19         ` Jiong Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Jiong Wang @ 2016-06-28 10:10 UTC (permalink / raw)
  To: vogt, gcc-patches, Jeff Law, Vladimir N Makarov, Andreas Krebbel,
	Robin Dapp



On 28/06/16 10:51, Jiong Wang wrote:
> On 27/06/16 17:26, Dominik Vogt wrote:
>> On Thu, Jun 09, 2016 at 09:52:39AM -0600, Jeff Law wrote:
>>> On 06/08/2016 10:47 AM, Jiong Wang wrote:
>>>> As discussed on the PR
>>>>
>>>>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70751,
>>>>
>>>> here is the patch.
>> This commit has introduced an ICE with s390x, march=z13.  Is it a
>> backend bug or one in the middleend?
>>
>> -- x.c --
>> void foo(int *n, int off, int *m)
>> {
>>    int i;
>>
>>    for(i = 0 ;i <= 3; i++)
>>      {
>>        n[off + i] = m[2 * i];
>>        n[off + 7 - i] = m[2 * i + 1];
>>      }
>> }
>> -- snip --
>>
>>    $ gcc -S -O3 -march=z13 x.c
>> x.c: In function â€˜fooâ€™:
>> x.c:10:1: internal compiler error: Max. number of generated reload 
>> insns per insn is achieved (90)
>
> I tried the following on gcc trunk (r237817, 28-June-2016), can't 
> reproduce the issue.
>
> configure --target=s390-linux
> ./cc1 -O3 1.c -nostdinc -march=z13
>
> Can you please paste "gcc -v -S -O3 -march=z13 x.c" to see what's 
> passed to cc1?

Reproduced with --target=s390x-linux.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch, lra] PR70751, correct the cost for spilling non-pseudo into memory
  2016-06-28 10:10       ` Jiong Wang
@ 2016-06-28 14:19         ` Jiong Wang
  2016-06-29  7:28           ` Andreas Krebbel
  0 siblings, 1 reply; 14+ messages in thread
From: Jiong Wang @ 2016-06-28 14:19 UTC (permalink / raw)
  To: vogt, Jeff Law, Vladimir N Makarov
  Cc: gcc-patches, Andreas Krebbel, Robin Dapp



On 28/06/16 11:01, Jiong Wang wrote:
>
>
> On 28/06/16 10:51, Jiong Wang wrote:
>> On 27/06/16 17:26, Dominik Vogt wrote:
>>> On Thu, Jun 09, 2016 at 09:52:39AM -0600, Jeff Law wrote:
>>>> On 06/08/2016 10:47 AM, Jiong Wang wrote:
>>>>> As discussed on the PR
>>>>>
>>>>>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70751,
>>>>>
>>>>> here is the patch.
>>> This commit has introduced an ICE with s390x, march=z13.  Is it a
>>> backend bug or one in the middleend?
>>>
>>> -- x.c --
>>> void foo(int *n, int off, int *m)
>>> {
>>>    int i;
>>>
>>>    for(i = 0 ;i <= 3; i++)
>>>      {
>>>        n[off + i] = m[2 * i];
>>>        n[off + 7 - i] = m[2 * i + 1];
>>>      }
>>> }
>>> -- snip --
>>>
>>>    $ gcc -S -O3 -march=z13 x.c
>>> x.c: In function â€˜fooâ€™:
>>> x.c:10:1: internal compiler error: Max. number of generated reload 
>>> insns per insn is achieved (90)
>>
>> I tried the following on gcc trunk (r237817, 28-June-2016), can't 
>> reproduce the issue.
>>
>> configure --target=s390-linux
>> ./cc1 -O3 1.c -nostdinc -march=z13
>>
>> Can you please paste "gcc -v -S -O3 -march=z13 x.c" to see what's 
>> passed to cc1?
>
> Reproduced with --target=s390x-linux.

The insn causing trouble is

(insn 41 38 120 4 (set (mem:V4SI (plus:DI (plus:DI (reg/v/f:DI 116 [ n ])
                     (reg:DI 69 [ _39 ]))
                 (const_int -16 [0xfffffffffffffff0]))
         (subreg:V4SI (reg:V16QI 134) 0)) 1.c:8 300 {movv4si}

The mem address actually doesn't qualify s390_short_displacement as the
disp is negative number, but s390 has allowed it to pass
legitiate_address_p check during early address generation stage. The
following draft patch fixed this ICE.

(plus:DI (plus:DI (reg/v/f:DI 116 [ n ])
                     (reg:DI 69 [ _39 ]))

So my first impression is TARGET_LEGITIMATE_ADDRESS_P on s390 do need a
fix here. The following draft patch fix this, my fix may be in
correct as normally we will allow illegal constant offset if it's
associated with virtual frame register before virtual register
elimination, I don't know if that should be considered here. Also I
don't know if such check should be constrainted under some architecture
features.

--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -4374,6 +4374,11 @@ s390_legitimate_address_p (machine_mode mode, rtx 
addr, bool strict)
                || REGNO_REG_CLASS (REGNO (ad.indx)) == ADDR_REGS))
           return false;
      }
+
+  if (ad.disp
+      && !s390_short_displacement (ad.disp))
+    return false;
+
    return true;
  }

Meanwhile I feel LRA might be too strict here, as we can't assume all
address legitimized before lra, right? we still need to handle reload
illegal address. While now, a reload of

   set (mem (reg + reg + offset)) regA

   (The offset is out of range that the inner address doesn't pass
    constraint check.)

into

    regB <- reg + reg + offset
   (set (mem (regB)) regA)

is treate as spill after r237277, so I feel the following check in
lra-constraints.c also needs a relax?

if (no_regs_p && !(REG_P (op) && hard_regno[nop] < 0))


Comments?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch, lra] PR70751, correct the cost for spilling non-pseudo into memory
  2016-06-28 14:19         ` Jiong Wang
@ 2016-06-29  7:28           ` Andreas Krebbel
  2016-06-29 21:35             ` Jiong Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Andreas Krebbel @ 2016-06-29  7:28 UTC (permalink / raw)
  To: Jiong Wang, vogt, Jeff Law, Vladimir N Makarov; +Cc: gcc-patches, Robin Dapp

On 06/28/2016 04:16 PM, Jiong Wang wrote:
...
> So my first impression is TARGET_LEGITIMATE_ADDRESS_P on s390 do need a
> fix here. The following draft patch fix this, my fix may be in
> correct as normally we will allow illegal constant offset if it's
> associated with virtual frame register before virtual register
> elimination, I don't know if that should be considered here. Also I
> don't know if such check should be constrainted under some architecture
> features.
> 
> --- a/gcc/config/s390/s390.c
> +++ b/gcc/config/s390/s390.c
> @@ -4374,6 +4374,11 @@ s390_legitimate_address_p (machine_mode mode, rtx 
> addr, bool strict)
>                 || REGNO_REG_CLASS (REGNO (ad.indx)) == ADDR_REGS))
>            return false;
>       }
> +
> +  if (ad.disp
> +      && !s390_short_displacement (ad.disp))
> +    return false;
> +
>     return true;
>   }

Whether short displacement is required or not depends on the instruction. We always relied on reload
to take care of this.  legitimate_address_p needs to accept the union of all valid adresses on
S/390.  That change probably would cause a severe performance regression.

> Meanwhile I feel LRA might be too strict here, as we can't assume all
> address legitimized before lra, right? we still need to handle reload
> illegal address. While now, a reload of
> 
>    set (mem (reg + reg + offset)) regA
> 
>    (The offset is out of range that the inner address doesn't pass
>     constraint check.)
> 
> into
> 
>     regB <- reg + reg + offset
>    (set (mem (regB)) regA)

This is exactly what is supposed to happen in that case. All addresses which might be invalid to be
used directly in an insn can be handled with load address or load address relative long (+ secondary
reload for odd addresses).

> is treate as spill after r237277, so I feel the following check in
> lra-constraints.c also needs a relax?
> 
> if (no_regs_p && !(REG_P (op) && hard_regno[nop] < 0))
> 
> 
> Comments?
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Patch, lra] PR70751, correct the cost for spilling non-pseudo into memory
  2016-06-29  7:28           ` Andreas Krebbel
@ 2016-06-29 21:35             ` Jiong Wang
  2016-06-30 17:35               ` [lra] Cleanup the use of offmemok and don't count spilling cost for it Jiong Wang
       [not found]               ` <47d1401e-2d80-f7de-20c3-57f1876b680a@foss.arm.com>
  0 siblings, 2 replies; 14+ messages in thread
From: Jiong Wang @ 2016-06-29 21:35 UTC (permalink / raw)
  To: Andreas Krebbel
  Cc: vogt, Jeff Law, Vladimir N Makarov, gcc-patches, Robin Dapp


Andreas Krebbel writes:

> On 06/28/2016 04:16 PM, Jiong Wang wrote:
> ...
>> So my first impression is TARGET_LEGITIMATE_ADDRESS_P on s390 do need a
>> fix here. The following draft patch fix this, my fix may be in
>> correct as normally we will allow illegal constant offset if it's
>> associated with virtual frame register before virtual register
>> elimination, I don't know if that should be considered here. Also I
>> don't know if such check should be constrainted under some architecture
>> features.
>> 
>> --- a/gcc/config/s390/s390.c
>> +++ b/gcc/config/s390/s390.c
>> @@ -4374,6 +4374,11 @@ s390_legitimate_address_p (machine_mode mode, rtx 
>> addr, bool strict)
>>                 || REGNO_REG_CLASS (REGNO (ad.indx)) == ADDR_REGS))
>>            return false;
>>       }
>> +
>> +  if (ad.disp
>> +      && !s390_short_displacement (ad.disp))
>> +    return false;
>> +
>>     return true;
>>   }
>
> Whether short displacement is required or not depends on the instruction. We always relied on reload
> to take care of this.  legitimate_address_p needs to accept the union of all valid adresses on
> S/390.  That change probably would cause a severe performance regression.
>
>> Meanwhile I feel LRA might be too strict here, as we can't assume all
>> address legitimized before lra, right? we still need to handle reload
>> illegal address. While now, a reload of
>> 
>>    set (mem (reg + reg + offset)) regA
>> 
>>    (The offset is out of range that the inner address doesn't pass
>>     constraint check.)
>> 
>> into
>> 
>>     regB <- reg + reg + offset
>>    (set (mem (regB)) regA)
>
> This is exactly what is supposed to happen in that case. All addresses which might be invalid to be
> used directly in an insn can be handled with load address or load address relative long (+ secondary
> reload for odd addresses).
>
>> is treate as spill after r237277, so I feel the following check in
>> lra-constraints.c also needs a relax?
>> 
>> if (no_regs_p && !(REG_P (op) && hard_regno[nop] < 0))
>> 
>> 
>> Comments?

I am testing a fix which teach LRA don't add spill cost if it's "offmemok".

-- 
Regards,
Jiong

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [lra] Cleanup the use of offmemok and don't count spilling cost for it
  2016-06-29 21:35             ` Jiong Wang
@ 2016-06-30 17:35               ` Jiong Wang
  2016-07-04 13:12                 ` Bernd Schmidt
       [not found]               ` <47d1401e-2d80-f7de-20c3-57f1876b680a@foss.arm.com>
  1 sibling, 1 reply; 14+ messages in thread
From: Jiong Wang @ 2016-06-30 17:35 UTC (permalink / raw)
  To: gcc-patches
  Cc: Andreas Krebbel, vogt, Jeff Law, Vladimir N Makarov, Robin Dapp

[-- Attachment #1: Type: text/plain, Size: 3169 bytes --]

On 29/06/16 22:31, Jiong Wang wrote:
> Andreas Krebbel writes:
>
>> On 06/28/2016 04:16 PM, Jiong Wang wrote:
>> ...
>>> So my first impression is TARGET_LEGITIMATE_ADDRESS_P on s390 do need a
>>> fix here. The following draft patch fix this, my fix may be in
>>> correct as normally we will allow illegal constant offset if it's
>>> associated with virtual frame register before virtual register
>>> elimination, I don't know if that should be considered here. Also I
>>> don't know if such check should be constrainted under some architecture
>>> features.
>>>
>>> --- a/gcc/config/s390/s390.c
>>> +++ b/gcc/config/s390/s390.c
>>> @@ -4374,6 +4374,11 @@ s390_legitimate_address_p (machine_mode mode, rtx
>>> addr, bool strict)
>>>                  || REGNO_REG_CLASS (REGNO (ad.indx)) == ADDR_REGS))
>>>             return false;
>>>        }
>>> +
>>> +  if (ad.disp
>>> +      && !s390_short_displacement (ad.disp))
>>> +    return false;
>>> +
>>>      return true;
>>>    }
>> Whether short displacement is required or not depends on the instruction. We always relied on reload
>> to take care of this.  legitimate_address_p needs to accept the union of all valid adresses on
>> S/390.  That change probably would cause a severe performance regression.
>>
>>> Meanwhile I feel LRA might be too strict here, as we can't assume all
>>> address legitimized before lra, right? we still need to handle reload
>>> illegal address. While now, a reload of
>>>
>>>     set (mem (reg + reg + offset)) regA
>>>
>>>     (The offset is out of range that the inner address doesn't pass
>>>      constraint check.)
>>>
>>> into
>>>
>>>      regB <- reg + reg + offset
>>>     (set (mem (regB)) regA)
>> This is exactly what is supposed to happen in that case. All addresses which might be invalid to be
>> used directly in an insn can be handled with load address or load address relative long (+ secondary
>> reload for odd addresses).
>>
>>> is treate as spill after r237277, so I feel the following check in
>>> lra-constraints.c also needs a relax?
>>>
>>> if (no_regs_p && !(REG_P (op) && hard_regno[nop] < 0))
>>>
>>>
>>> Comments?
> I am testing a fix which teach LRA don't add spill cost if it's "offmemok".
Here is the patch,

 From my understanding, "offmemok" is used to represent a memory operand
who's address we want to reload, and searching of it's reference location
seems confirmed my understanding as it's always used together with MEM_P 
check.

So this patch does the following modifications:

   * Only set offmemok to true if MEM_P is also true, as otherwise offmemok
     is not used.
   * Remove redundant MEM_P check which was used together with offmemok.
   * Avoid the addition of spilling cost if offmemok be true as an address
     calculation reload is not spilling.

bootstrap & gcc/g++ regression OK on x86_64/aarch64/arm.

OK for trunk?

2016-06-30  Jiong Wang  <jiong.wang@arm.com>

gcc
   * lra-constraints.c (process_alt_operands): Only set "offmemok" for
   MEM_P.  Remove redundant MEM_P check if it's used together with
   "offmemok" check.  Don't add spilling cost for "offmemok".
   (curr_insn_transform): Remove redundant MEM_P check.


[-- Attachment #2: lra.patch --]
[-- Type: text/x-patch, Size: 1841 bytes --]

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index bf08dce..4546f1e 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -1967,7 +1967,6 @@ process_alt_operands (int only_alternative)
 			   combination, because we can't reload
 			   it.	*/
 			if (curr_alt_offmemok[m]
-			    && MEM_P (*curr_id->operand_loc[m])
 			    && curr_alt[m] == NO_REGS && ! curr_alt_win[m])
 			  continue;
 		      }
@@ -2089,7 +2088,8 @@ process_alt_operands (int only_alternative)
 			  || equiv_substition_p[nop])
 			badop = false;
 		      constmemok = true;
-		      offmemok = true;
+		      if (MEM_P (op))
+			offmemok = true;
 		      break;
 
 		    case CT_ADDRESS:
@@ -2436,8 +2436,7 @@ process_alt_operands (int only_alternative)
 		  reject += LRA_MAX_REJECT;
 		}
 
-	      if (! (MEM_P (op) && offmemok)
-		  && ! (const_to_mem && constmemok))
+	      if (!offmemok && ! (const_to_mem && constmemok))
 		{
 		  /* We prefer to reload pseudos over reloading other
 		     things, since such reloads may be able to be
@@ -2488,7 +2487,9 @@ process_alt_operands (int only_alternative)
 
 		 Code below increases the reject for both pseudo and non-pseudo
 		 spill.  */
-	      if (no_regs_p && !(REG_P (op) && hard_regno[nop] < 0))
+	      if (no_regs_p
+		  && !offmemok
+		  && !(REG_P (op) && hard_regno[nop] < 0))
 		{
 		  if (lra_dump_file != NULL)
 		    fprintf
@@ -3909,7 +3910,7 @@ curr_insn_transform (bool check_only_p)
 	 appearing where an offsettable address will do.  It also may
 	 be a case when the address should be special in other words
 	 not a general one (e.g. it needs no index reg).  */
-      if (goal_alt_matched[i][0] == -1 && goal_alt_offmemok[i] && MEM_P (op))
+      if (goal_alt_matched[i][0] == -1 && goal_alt_offmemok[i])
 	{
 	  enum reg_class rclass;
 	  rtx *loc = &XEXP (op, 0);

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [lra] Cleanup the use of offmemok and don't count spilling cost for it
  2016-06-30 17:35               ` [lra] Cleanup the use of offmemok and don't count spilling cost for it Jiong Wang
@ 2016-07-04 13:12                 ` Bernd Schmidt
  2016-07-04 14:05                   ` Jiong Wang
  0 siblings, 1 reply; 14+ messages in thread
From: Bernd Schmidt @ 2016-07-04 13:12 UTC (permalink / raw)
  To: Jiong Wang, gcc-patches
  Cc: Andreas Krebbel, vogt, Jeff Law, Vladimir N Makarov, Robin Dapp

On 06/30/2016 07:24 PM, Jiong Wang wrote:
> From my understanding, "offmemok" is used to represent a memory operand
> who's address we want to reload, and searching of it's reference location
> seems confirmed my understanding as it's always used together with MEM_P
> check.
>
> So this patch does the following modifications:
>
>   * Only set offmemok to true if MEM_P is also true, as otherwise offmemok
>     is not used.
 >   * Remove redundant MEM_P check which was used together with offmemok.

I really dislike this part. The various _ok variables say what is 
acceptable - the type of the operand doesn't really factor into that. I 
think the code becomes more confusing when merging the two.

>   * Avoid the addition of spilling cost if offmemok be true as an address
>     calculation reload is not spilling.

This part seems to be plausible. I am however unclear how this would fix 
the ICE (if it does - Andreas?) since it only seems to modify cost 
computations. What exactly is preventing the correct sequence of events 
(reloading the address) from triggering without this patch?


Bernd

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [lra] Cleanup the use of offmemok and don't count spilling cost for it
  2016-07-04 13:12                 ` Bernd Schmidt
@ 2016-07-04 14:05                   ` Jiong Wang
  2016-07-04 17:40                     ` Bernd Schmidt
  0 siblings, 1 reply; 14+ messages in thread
From: Jiong Wang @ 2016-07-04 14:05 UTC (permalink / raw)
  To: Bernd Schmidt
  Cc: gcc-patches, Andreas Krebbel, vogt, Jeff Law, Vladimir N Makarov,
	Robin Dapp

On 04/07/16 14:12, Bernd Schmidt wrote:
> On 06/30/2016 07:24 PM, Jiong Wang wrote:
>> From my understanding, "offmemok" is used to represent a memory operand
>> who's address we want to reload, and searching of it's reference 
>> location
>> seems confirmed my understanding as it's always used together with MEM_P
>> check.
>>
>> So this patch does the following modifications:
>>
>>   * Only set offmemok to true if MEM_P is also true, as otherwise 
>> offmemok
>>     is not used.
> >   * Remove redundant MEM_P check which was used together with offmemok.
>
> I really dislike this part. The various _ok variables say what is 
> acceptable - the type of the operand doesn't really factor into that. 
> I think the code becomes more confusing when merging the two.
>
>>   * Avoid the addition of spilling cost if offmemok be true as an 
>> address
>>     calculation reload is not spilling.
>
> This part seems to be plausible. I am however unclear how this would 
> fix the ICE (if it does - Andreas?) since it only seems to modify cost 
> computations. What exactly is preventing the correct sequence of 
> events (reloading the address) from triggering without this patch?
>

Hi Bernd,

The ICE reported was a "Max. number of generated reload insns per insn
is achieved",

The input rtx pattern which triggers this issue is:

(insn 41 38 120 4 (set (mem:V4SI
         (plus:DI (plus:DI (reg/v/f:DI 116) (reg:DI 69 ))
                  (const_int -16))
      (subreg:V4SI (reg:V16QI 134) 0)) {movv4si}

And the corresponding s390 patten is "mov<mode>" for V_128.

(define_insn "mov<mode>"
   [(set (match_operand:V_128 0 "" "=v,v,R,  v,  v,  v,  v,  v,v,d")
           (match_operand:V_128 1 "" "v,R,v,j00,jm1,jyy,jxx,jKK,d,v"))]

As the offset "-16" does not qualify s390_short_displacement, we need a
reload.

Ideally we want alternative 2, for which gcc simply reload the mem
address into a address register.

r157:DI=r116:DI+r69:DI-0x10
[r157:DI]=r134:V16QI#0

While after r237277, gcc is treating the reload of insn 41 as a spill
and thus increased the costs for it, then alternative 8 beat alternative
2, thus the following reload sequences are generated.

r157:V4SI=r134:V16QI#0
[r116:DI+r69:DI-0x10]=r157:V4SI

GCC move the vector register into general register, then a second
instruction to store the general register into memory so it can match
alternative 8, which is "v", "d".

However the second instructions still constains the illegal mem address,
thus a further reload triggered, and gcc triggers above max number
reload issue.

The functional chang of this patch is to make gcc don't treat an memory
address reload as spill which is regression caused by r237277.

Does this explanation make sense?

Thanks.

Regards,
Jiong

>
> Bernd

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [lra] Cleanup the use of offmemok and don't count spilling cost for it
  2016-07-04 14:05                   ` Jiong Wang
@ 2016-07-04 17:40                     ` Bernd Schmidt
  0 siblings, 0 replies; 14+ messages in thread
From: Bernd Schmidt @ 2016-07-04 17:40 UTC (permalink / raw)
  To: Jiong Wang
  Cc: gcc-patches, Andreas Krebbel, vogt, Jeff Law, Vladimir N Makarov,
	Robin Dapp

On 07/04/2016 04:05 PM, Jiong Wang wrote:
> And the corresponding s390 patten is "mov<mode>" for V_128.
>
> (define_insn "mov<mode>"
>   [(set (match_operand:V_128 0 "" "=v,v,R,  v,  v,  v,  v,  v,v,d")
>           (match_operand:V_128 1 "" "v,R,v,j00,jm1,jyy,jxx,jKK,d,v"))]
>
> As the offset "-16" does not qualify s390_short_displacement, we need a
> reload.
>
> Ideally we want alternative 2, for which gcc simply reload the mem
> address into a address register.
>
> r157:DI=r116:DI+r69:DI-0x10
> [r157:DI]=r134:V16QI#0
>
> While after r237277, gcc is treating the reload of insn 41 as a spill
> and thus increased the costs for it, then alternative 8 beat alternative
> 2, thus the following reload sequences are generated.
>
> r157:V4SI=r134:V16QI#0
> [r116:DI+r69:DI-0x10]=r157:V4SI
>
> GCC move the vector register into general register, then a second
> instruction to store the general register into memory so it can match
> alternative 8, which is "v", "d".
>
> However the second instructions still constains the illegal mem address,
> thus a further reload triggered, and gcc triggers above max number
> reload issue.
>
> The functional chang of this patch is to make gcc don't treat an memory
> address reload as spill which is regression caused by r237277.
>
> Does this explanation make sense?

Yes, it explains well what's going on. I think the part of your patch 
that avoids counting a reload of an address as a spill looks ok. I'm 
uncertain whether the code still has issues after that, it seems a 
little iffy not to count the cost of reloading the memory address at 
all. We might have to add code at some point to detect if we're 
reloading a move instruction and would be generating an identical one 
when picking a given alternative.


Bernd

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Fwd: Re: [lra] Cleanup the use of offmemok and don't count spilling cost for it
       [not found]                 ` <577ABC99.9090102@redhat.com>
@ 2016-07-04 19:53                   ` Vladimir Makarov
  2016-07-05 16:09                   ` Jiong Wang
  1 sibling, 0 replies; 14+ messages in thread
From: Vladimir Makarov @ 2016-07-04 19:53 UTC (permalink / raw)
  To: gcc-patches

gcc-patches has rejected the original message as it contained invalid 
MIME type.  Therefore I am re-sending it.


-------- Forwarded Message --------
Subject: 	Re: [lra] Cleanup the use of offmemok and don't count spilling 
cost for it
Date: 	Mon, 4 Jul 2016 15:44:25 -0400
From: 	Vladimir Makarov <vmakarov@redhat.com>
To: 	Jiong Wang <jiong.wang@foss.arm.com>, gcc-patches@gcc.gnu.org
CC: 	Andreas Krebbel <krebbel@linux.vnet.ibm.com>, 
vogt@linux.vnet.ibm.com, Jeff Law <law@redhat.com>, Robin Dapp 
<rdapp@linux.vnet.ibm.com>



On 06/30/2016 01:22 PM, Jiong Wang wrote:
>
> Here is the patch,
>
> From my understanding, "offmemok" is used to represent a memory 
> operand who's address we want to reload, and searching of it's 
> reference location seems confirmed my understanding as it's always 
> used together with MEM_P check. So this patch does the following 
> modifications: * Only set offmemok to true if MEM_P is also true, as 
> otherwise offmemok is not used.  * Remove redundant MEM_P check which 
> was used together with offmemok. * Avoid the addition of spilling cost 
> if offmemok be true as an address calculation reload is not spilling. 
> bootstrap & gcc/g++ regression OK on x86_64/aarch64/arm. OK for trunk?

Yes.  The patch looks OK to me.  Thank you for working on the solution, 
Jiong.  As I wrote the code is very sensitive and any its change might 
affect some targets.  Usually patches for this part of LRA can take a 
few iterations.

>
> 2016-06-30 Jiong Wang <jiong.wang@arm.com> gcc * lra-constraints.c 
> (process_alt_operands): Only set "offmemok" for MEM_P. Remove 
> redundant MEM_P check if it's used together with "offmemok" check. 
> Don't add spilling cost for "offmemok". (curr_insn_transform): Remove 
> redundant MEM_P check.
>



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [lra] Cleanup the use of offmemok and don't count spilling cost for it
       [not found]                 ` <577ABC99.9090102@redhat.com>
  2016-07-04 19:53                   ` Fwd: " Vladimir Makarov
@ 2016-07-05 16:09                   ` Jiong Wang
  1 sibling, 0 replies; 14+ messages in thread
From: Jiong Wang @ 2016-07-05 16:09 UTC (permalink / raw)
  To: Vladimir Makarov, gcc-patches
  Cc: Andreas Krebbel, vogt, Jeff Law, Robin Dapp, Bernd Schmidt

[-- Attachment #1: Type: text/plain, Size: 1359 bytes --]

On 04/07/16 20:44, Vladimir Makarov wrote:
> On 06/30/2016 01:22 PM, Jiong Wang wrote:
>>
>> Here is the patch,
>>  From my understanding, "offmemok" is used to represent a memory operand
>> who's address we want to reload, and searching of it's reference location
>> seems confirmed my understanding as it's always used together with MEM_P check.
>>
>> So this patch does the following modifications:
>>
>>    * Only set offmemok to true if MEM_P is also true, as otherwise offmemok
>>      is not used.
>>    * Remove redundant MEM_P check which was used together with offmemok.
>>    * Avoid the addition of spilling cost if offmemok be true as an address
>>      calculation reload is not spilling.
>>
>> bootstrap & gcc/g++ regression OK on x86_64/aarch64/arm.
>>
>> OK for trunk?
>
> Yes.  The patch looks OK to me.  Thank you for working on the 
> solution, Jiong.  As I wrote the code is very sensitive and any its 
> change might affect some targets.  Usually patches for this part of 
> LRA can take a few iterations.

Thanks for the review Vlad.

As Bernd has concerns on merging MEM_P into offmemok. I committed the
following patch as r238010 which keeps the functional change but without
merging checks.

2016-07-05  Jiong Wang<jiong.wang@arm.com>

gcc/
   * lra-constraints.c (process_alt_operands): Don't add spilling cost
   for "offmemok".



[-- Attachment #2: 1.patch --]
[-- Type: text/x-patch, Size: 512 bytes --]

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index bf08dce..e9d3e43 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -2488,7 +2488,9 @@ process_alt_operands (int only_alternative)
 
 		 Code below increases the reject for both pseudo and non-pseudo
 		 spill.  */
-	      if (no_regs_p && !(REG_P (op) && hard_regno[nop] < 0))
+	      if (no_regs_p
+		  && !(MEM_P (op) && offmemok)
+		  && !(REG_P (op) && hard_regno[nop] < 0))
 		{
 		  if (lra_dump_file != NULL)
 		    fprintf

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2016-07-05 16:09 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-08 16:47 [Patch, lra] PR70751, correct the cost for spilling non-pseudo into memory Jiong Wang
2016-06-09 15:52 ` Jeff Law
2016-06-27 16:33   ` Dominik Vogt
2016-06-28 10:01     ` Jiong Wang
2016-06-28 10:10       ` Jiong Wang
2016-06-28 14:19         ` Jiong Wang
2016-06-29  7:28           ` Andreas Krebbel
2016-06-29 21:35             ` Jiong Wang
2016-06-30 17:35               ` [lra] Cleanup the use of offmemok and don't count spilling cost for it Jiong Wang
2016-07-04 13:12                 ` Bernd Schmidt
2016-07-04 14:05                   ` Jiong Wang
2016-07-04 17:40                     ` Bernd Schmidt
     [not found]               ` <47d1401e-2d80-f7de-20c3-57f1876b680a@foss.arm.com>
     [not found]                 ` <577ABC99.9090102@redhat.com>
2016-07-04 19:53                   ` Fwd: " Vladimir Makarov
2016-07-05 16:09                   ` Jiong Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).