public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))"
@ 2022-08-03  1:35 Takayuki 'January June' Suwa
  2022-08-03  7:52 ` Richard Sandiford
  0 siblings, 1 reply; 20+ messages in thread
From: Takayuki 'January June' Suwa @ 2022-08-03  1:35 UTC (permalink / raw)
  To: GCC Patches

Emitting "(clobber (reg X))" before "(set (subreg (reg X)) (...))" keeps
data flow consistent, but it also increases register allocation pressure
and thus often creates many unwanted register-to-register moves that
cannot be optimized away.  It seems just analogous to partial register
stall which is a famous problem on processors that do register renaming.

In my opinion, when the register to be clobbered is a composite of hard
ones, we should clobber the individual elements separetely, otherwise
clear the entire to zero prior to use as the "init-regs" pass does (like
partial register stall workarounds on x86 CPUs).  Such redundant zero
constant assignments will be removed later in the "cprop_hardreg" pass.

This patch may give better output code quality for the reasons above,
especially on architectures that don't have DFmode hard registers
(On architectures with such hard registers, this patch changes virtually
nothing).

For example (Espressif ESP8266, Xtensa without FP hard regs):

    /* example */
    double _Complex conjugate(double _Complex z) {
      __imag__(z) *= -1;
      return z;
    }

    ;; before
    conjugate:
        movi.n  a6, -1
        slli    a6, a6, 31
        mov.n   a8, a2
        mov.n   a9, a3
        mov.n   a7, a4
        xor     a6, a5, a6
        mov.n   a2, a8
        mov.n   a3, a9
        mov.n   a4, a7
        mov.n   a5, a6
        ret.n

    ;; after
    conjugate:
        movi.n  a6, -1
        slli    a6, a6, 31
        xor     a6, a5, a6
        mov.n   a5, a6
        ret.n

gcc/ChangeLog:

	* lower-subreg.cc (resolve_simple_move):
	Add zero clear of the entire register immediately after
	the clobber.
	* expr.cc (emit_move_complex_parts):
	Change to clobber the real and imaginary parts separately
	instead of the whole complex register if possible.
---
 gcc/expr.cc         | 26 ++++++++++++++++++++------
 gcc/lower-subreg.cc |  7 ++++++-
 2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 80bb1b8a4c5..9732e8fd4e5 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -3775,15 +3775,29 @@ emit_move_complex_push (machine_mode mode, rtx x, rtx y)
 rtx_insn *
 emit_move_complex_parts (rtx x, rtx y)
 {
-  /* Show the output dies here.  This is necessary for SUBREGs
-     of pseudos since we cannot track their lifetimes correctly;
-     hard regs shouldn't appear here except as return values.  */
-  if (!reload_completed && !reload_in_progress
-      && REG_P (x) && !reg_overlap_mentioned_p (x, y))
-    emit_clobber (x);
+  rtx_insn *re_insn, *im_insn;
 
   write_complex_part (x, read_complex_part (y, false), false, true);
+  re_insn = get_last_insn ();
   write_complex_part (x, read_complex_part (y, true), true, false);
+  im_insn = get_last_insn ();
+
+  /* Show the output dies here.  This is necessary for SUBREGs
+     of pseudos since we cannot track their lifetimes correctly.  */
+  if (can_create_pseudo_p ()
+      && REG_P (x) && ! reg_overlap_mentioned_p (x, y))
+    {
+      /* Hard regs shouldn't appear here except as return values.  */
+      if (HARD_REGISTER_P (x) && REG_NREGS (x) % 2 == 0)
+	{
+	  emit_insn_before (gen_clobber (SET_DEST (PATTERN (re_insn))),
+			    re_insn);
+	  emit_insn_before (gen_clobber (SET_DEST (PATTERN (im_insn))),
+			    im_insn);
+	}
+      else
+	emit_insn_before (gen_clobber (x), re_insn);
+    }
 
   return get_last_insn ();
 }
diff --git a/gcc/lower-subreg.cc b/gcc/lower-subreg.cc
index 03e9326c663..4ff0a7d1556 100644
--- a/gcc/lower-subreg.cc
+++ b/gcc/lower-subreg.cc
@@ -1086,7 +1086,12 @@ resolve_simple_move (rtx set, rtx_insn *insn)
       unsigned int i;
 
       if (REG_P (dest) && !HARD_REGISTER_NUM_P (REGNO (dest)))
-	emit_clobber (dest);
+	{
+	  emit_clobber (dest);
+	  /* We clear the entire of dest with zero after the clobber,
+	     similar to the "init-regs" pass.  */
+	  emit_move_insn (dest, CONST0_RTX (GET_MODE (dest)));
+	}
 
       for (i = 0; i < words; ++i)
 	{
-- 
2.20.1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))"
  2022-08-03  1:35 [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))" Takayuki 'January June' Suwa
@ 2022-08-03  7:52 ` Richard Sandiford
  2022-08-03 11:17   ` Takayuki 'January June' Suwa
  2022-08-03 17:23   ` Jeff Law
  0 siblings, 2 replies; 20+ messages in thread
From: Richard Sandiford @ 2022-08-03  7:52 UTC (permalink / raw)
  To: Takayuki 'January June' Suwa via Gcc-patches

Takayuki 'January June' Suwa via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Emitting "(clobber (reg X))" before "(set (subreg (reg X)) (...))" keeps
> data flow consistent, but it also increases register allocation pressure
> and thus often creates many unwanted register-to-register moves that
> cannot be optimized away.

There are two things here:

- If emit_move_complex_parts emits a clobber of a hard register,
  then that's probably a bug/misfeature.  The point of the clobber is
  to indicate that the register has no useful contents.  That's useful
  for wide pseudos that are written to in parts, since it avoids the
  need to track the liveness of each part of the pseudo individually.
  But it shouldn't be necessary for hard registers, since subregs of
  hard registers are simplified to hard registers wherever possible
  (which on most targets is "always").

  So I think the emit_move_complex_parts clobber should be restricted
  to !HARD_REGISTER_P, like the lower-subreg clobber is.  If that helps
  (if only partly) then it would be worth doing as its own patch.

- I think it'd be worth looking into more detail why a clobber makes
  a difference to register pressure.  A clobber of a pseudo register R
  shouldn't make R conflict with things that are live at the point of
  the clobber.

>  It seems just analogous to partial register
> stall which is a famous problem on processors that do register renaming.
>
> In my opinion, when the register to be clobbered is a composite of hard
> ones, we should clobber the individual elements separetely, otherwise
> clear the entire to zero prior to use as the "init-regs" pass does (like
> partial register stall workarounds on x86 CPUs).  Such redundant zero
> constant assignments will be removed later in the "cprop_hardreg" pass.

I don't think we should rely on the zero being optimised away later.

Emitting the zero also makes it harder for the register allocator
to elide the move.  For example, if we have:

  (set (subreg:SI (reg:DI P) 0) (reg:SI R0))
  (set (subreg:SI (reg:DI P) 4) (reg:SI R1))

then there is at least a chance that the RA could assign hard registers
R0:R1 to P, which would turn the moves into nops.  If we emit:

  (set (reg:DI P) (const_int 0))

beforehand then that becomes impossible, since R0 and R1 would then
conflict with P.

TBH I'm surprised we still run init_regs for LRA.  I thought there was
a plan to stop doing that, but perhaps I misremember.

Thanks,
Richard

> This patch may give better output code quality for the reasons above,
> especially on architectures that don't have DFmode hard registers
> (On architectures with such hard registers, this patch changes virtually
> nothing).
>
> For example (Espressif ESP8266, Xtensa without FP hard regs):
>
>     /* example */
>     double _Complex conjugate(double _Complex z) {
>       __imag__(z) *= -1;
>       return z;
>     }
>
>     ;; before
>     conjugate:
>         movi.n  a6, -1
>         slli    a6, a6, 31
>         mov.n   a8, a2
>         mov.n   a9, a3
>         mov.n   a7, a4
>         xor     a6, a5, a6
>         mov.n   a2, a8
>         mov.n   a3, a9
>         mov.n   a4, a7
>         mov.n   a5, a6
>         ret.n
>
>     ;; after
>     conjugate:
>         movi.n  a6, -1
>         slli    a6, a6, 31
>         xor     a6, a5, a6
>         mov.n   a5, a6
>         ret.n
>
> gcc/ChangeLog:
>
> 	* lower-subreg.cc (resolve_simple_move):
> 	Add zero clear of the entire register immediately after
> 	the clobber.
> 	* expr.cc (emit_move_complex_parts):
> 	Change to clobber the real and imaginary parts separately
> 	instead of the whole complex register if possible.
> ---
>  gcc/expr.cc         | 26 ++++++++++++++++++++------
>  gcc/lower-subreg.cc |  7 ++++++-
>  2 files changed, 26 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 80bb1b8a4c5..9732e8fd4e5 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -3775,15 +3775,29 @@ emit_move_complex_push (machine_mode mode, rtx x, rtx y)
>  rtx_insn *
>  emit_move_complex_parts (rtx x, rtx y)
>  {
> -  /* Show the output dies here.  This is necessary for SUBREGs
> -     of pseudos since we cannot track their lifetimes correctly;
> -     hard regs shouldn't appear here except as return values.  */
> -  if (!reload_completed && !reload_in_progress
> -      && REG_P (x) && !reg_overlap_mentioned_p (x, y))
> -    emit_clobber (x);
> +  rtx_insn *re_insn, *im_insn;
>  
>    write_complex_part (x, read_complex_part (y, false), false, true);
> +  re_insn = get_last_insn ();
>    write_complex_part (x, read_complex_part (y, true), true, false);
> +  im_insn = get_last_insn ();
> +
> +  /* Show the output dies here.  This is necessary for SUBREGs
> +     of pseudos since we cannot track their lifetimes correctly.  */
> +  if (can_create_pseudo_p ()
> +      && REG_P (x) && ! reg_overlap_mentioned_p (x, y))
> +    {
> +      /* Hard regs shouldn't appear here except as return values.  */
> +      if (HARD_REGISTER_P (x) && REG_NREGS (x) % 2 == 0)
> +	{
> +	  emit_insn_before (gen_clobber (SET_DEST (PATTERN (re_insn))),
> +			    re_insn);
> +	  emit_insn_before (gen_clobber (SET_DEST (PATTERN (im_insn))),
> +			    im_insn);
> +	}
> +      else
> +	emit_insn_before (gen_clobber (x), re_insn);
> +    }
>  
>    return get_last_insn ();
>  }
> diff --git a/gcc/lower-subreg.cc b/gcc/lower-subreg.cc
> index 03e9326c663..4ff0a7d1556 100644
> --- a/gcc/lower-subreg.cc
> +++ b/gcc/lower-subreg.cc
> @@ -1086,7 +1086,12 @@ resolve_simple_move (rtx set, rtx_insn *insn)
>        unsigned int i;
>  
>        if (REG_P (dest) && !HARD_REGISTER_NUM_P (REGNO (dest)))
> -	emit_clobber (dest);
> +	{
> +	  emit_clobber (dest);
> +	  /* We clear the entire of dest with zero after the clobber,
> +	     similar to the "init-regs" pass.  */
> +	  emit_move_insn (dest, CONST0_RTX (GET_MODE (dest)));
> +	}
>  
>        for (i = 0; i < words; ++i)
>  	{

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))"
  2022-08-03  7:52 ` Richard Sandiford
@ 2022-08-03 11:17   ` Takayuki 'January June' Suwa
  2022-08-04  9:49     ` Richard Sandiford
  2022-08-03 17:23   ` Jeff Law
  1 sibling, 1 reply; 20+ messages in thread
From: Takayuki 'January June' Suwa @ 2022-08-03 11:17 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: GCC Patches

Thanks for your response.

On 2022/08/03 16:52, Richard Sandiford wrote:
> Takayuki 'January June' Suwa via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> Emitting "(clobber (reg X))" before "(set (subreg (reg X)) (...))" keeps
>> data flow consistent, but it also increases register allocation pressure
>> and thus often creates many unwanted register-to-register moves that
>> cannot be optimized away.
> 
> There are two things here:
> 
> - If emit_move_complex_parts emits a clobber of a hard register,
>   then that's probably a bug/misfeature.  The point of the clobber is
>   to indicate that the register has no useful contents.  That's useful
>   for wide pseudos that are written to in parts, since it avoids the
>   need to track the liveness of each part of the pseudo individually.
>   But it shouldn't be necessary for hard registers, since subregs of
>   hard registers are simplified to hard registers wherever possible
>   (which on most targets is "always").
> 
>   So I think the emit_move_complex_parts clobber should be restricted
>   to !HARD_REGISTER_P, like the lower-subreg clobber is.  If that helps
>   (if only partly) then it would be worth doing as its own patch.
> 
> - I think it'd be worth looking into more detail why a clobber makes
>   a difference to register pressure.  A clobber of a pseudo register R
>   shouldn't make R conflict with things that are live at the point of
>   the clobber.

I agree with its worth.
In fact, aside from other ports, on the xtensa one, RA in code with frequent D[FC]mode pseudos is terribly bad.
For example, in __muldc3 on libgcc2, the size of the stack frame reserved will almost double depending on whether or not this patch is applied.

> 
>>  It seems just analogous to partial register
>> stall which is a famous problem on processors that do register renaming.
>>
>> In my opinion, when the register to be clobbered is a composite of hard
>> ones, we should clobber the individual elements separetely, otherwise
>> clear the entire to zero prior to use as the "init-regs" pass does (like
>> partial register stall workarounds on x86 CPUs).  Such redundant zero
>> constant assignments will be removed later in the "cprop_hardreg" pass.
> 
> I don't think we should rely on the zero being optimised away later.
> 
> Emitting the zero also makes it harder for the register allocator
> to elide the move.  For example, if we have:
> 
>   (set (subreg:SI (reg:DI P) 0) (reg:SI R0))
>   (set (subreg:SI (reg:DI P) 4) (reg:SI R1))
> 
> then there is at least a chance that the RA could assign hard registers
> R0:R1 to P, which would turn the moves into nops.  If we emit:
> 
>   (set (reg:DI P) (const_int 0))
> 
> beforehand then that becomes impossible, since R0 and R1 would then
> conflict with P.

Ah, surely, as you pointed out for targets where "(reg: DI)" corresponds to one hard register.

> 
> TBH I'm surprised we still run init_regs for LRA.  I thought there was
> a plan to stop doing that, but perhaps I misremember.

Sorry I am not sure about the status of LRA... because the xtensa port is still using reload.

As conclusion, trying to tweak the common code side may have been a bit premature.
I'll consider if I can deal with those issues on the side of the target-specific code.

> 
> Thanks,
> Richard
> 
>> This patch may give better output code quality for the reasons above,
>> especially on architectures that don't have DFmode hard registers
>> (On architectures with such hard registers, this patch changes virtually
>> nothing).
>>
>> For example (Espressif ESP8266, Xtensa without FP hard regs):
>>
>>     /* example */
>>     double _Complex conjugate(double _Complex z) {
>>       __imag__(z) *= -1;
>>       return z;
>>     }
>>
>>     ;; before
>>     conjugate:
>>         movi.n  a6, -1
>>         slli    a6, a6, 31
>>         mov.n   a8, a2
>>         mov.n   a9, a3
>>         mov.n   a7, a4
>>         xor     a6, a5, a6
>>         mov.n   a2, a8
>>         mov.n   a3, a9
>>         mov.n   a4, a7
>>         mov.n   a5, a6
>>         ret.n
>>
>>     ;; after
>>     conjugate:
>>         movi.n  a6, -1
>>         slli    a6, a6, 31
>>         xor     a6, a5, a6
>>         mov.n   a5, a6
>>         ret.n
>>
>> gcc/ChangeLog:
>>
>> 	* lower-subreg.cc (resolve_simple_move):
>> 	Add zero clear of the entire register immediately after
>> 	the clobber.
>> 	* expr.cc (emit_move_complex_parts):
>> 	Change to clobber the real and imaginary parts separately
>> 	instead of the whole complex register if possible.
>> ---
>>  gcc/expr.cc         | 26 ++++++++++++++++++++------
>>  gcc/lower-subreg.cc |  7 ++++++-
>>  2 files changed, 26 insertions(+), 7 deletions(-)
>>
>> diff --git a/gcc/expr.cc b/gcc/expr.cc
>> index 80bb1b8a4c5..9732e8fd4e5 100644
>> --- a/gcc/expr.cc
>> +++ b/gcc/expr.cc
>> @@ -3775,15 +3775,29 @@ emit_move_complex_push (machine_mode mode, rtx x, rtx y)
>>  rtx_insn *
>>  emit_move_complex_parts (rtx x, rtx y)
>>  {
>> -  /* Show the output dies here.  This is necessary for SUBREGs
>> -     of pseudos since we cannot track their lifetimes correctly;
>> -     hard regs shouldn't appear here except as return values.  */
>> -  if (!reload_completed && !reload_in_progress
>> -      && REG_P (x) && !reg_overlap_mentioned_p (x, y))
>> -    emit_clobber (x);
>> +  rtx_insn *re_insn, *im_insn;
>>  
>>    write_complex_part (x, read_complex_part (y, false), false, true);
>> +  re_insn = get_last_insn ();
>>    write_complex_part (x, read_complex_part (y, true), true, false);
>> +  im_insn = get_last_insn ();
>> +
>> +  /* Show the output dies here.  This is necessary for SUBREGs
>> +     of pseudos since we cannot track their lifetimes correctly.  */
>> +  if (can_create_pseudo_p ()
>> +      && REG_P (x) && ! reg_overlap_mentioned_p (x, y))
>> +    {
>> +      /* Hard regs shouldn't appear here except as return values.  */
>> +      if (HARD_REGISTER_P (x) && REG_NREGS (x) % 2 == 0)
>> +	{
>> +	  emit_insn_before (gen_clobber (SET_DEST (PATTERN (re_insn))),
>> +			    re_insn);
>> +	  emit_insn_before (gen_clobber (SET_DEST (PATTERN (im_insn))),
>> +			    im_insn);
>> +	}
>> +      else
>> +	emit_insn_before (gen_clobber (x), re_insn);
>> +    }
>>  
>>    return get_last_insn ();
>>  }
>> diff --git a/gcc/lower-subreg.cc b/gcc/lower-subreg.cc
>> index 03e9326c663..4ff0a7d1556 100644
>> --- a/gcc/lower-subreg.cc
>> +++ b/gcc/lower-subreg.cc
>> @@ -1086,7 +1086,12 @@ resolve_simple_move (rtx set, rtx_insn *insn)
>>        unsigned int i;
>>  
>>        if (REG_P (dest) && !HARD_REGISTER_NUM_P (REGNO (dest)))
>> -	emit_clobber (dest);
>> +	{
>> +	  emit_clobber (dest);
>> +	  /* We clear the entire of dest with zero after the clobber,
>> +	     similar to the "init-regs" pass.  */
>> +	  emit_move_insn (dest, CONST0_RTX (GET_MODE (dest)));
>> +	}
>>  
>>        for (i = 0; i < words; ++i)
>>  	{

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))"
  2022-08-03  7:52 ` Richard Sandiford
  2022-08-03 11:17   ` Takayuki 'January June' Suwa
@ 2022-08-03 17:23   ` Jeff Law
  2022-08-04  9:39     ` Richard Sandiford
  1 sibling, 1 reply; 20+ messages in thread
From: Jeff Law @ 2022-08-03 17:23 UTC (permalink / raw)
  To: gcc-patches



On 8/3/2022 1:52 AM, Richard Sandiford via Gcc-patches wrote:
> Takayuki 'January June' Suwa via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> Emitting "(clobber (reg X))" before "(set (subreg (reg X)) (...))" keeps
>> data flow consistent, but it also increases register allocation pressure
>> and thus often creates many unwanted register-to-register moves that
>> cannot be optimized away.
> There are two things here:
>
> - If emit_move_complex_parts emits a clobber of a hard register,
>    then that's probably a bug/misfeature.  The point of the clobber is
>    to indicate that the register has no useful contents.  That's useful
>    for wide pseudos that are written to in parts, since it avoids the
>    need to track the liveness of each part of the pseudo individually.
>    But it shouldn't be necessary for hard registers, since subregs of
>    hard registers are simplified to hard registers wherever possible
>    (which on most targets is "always").
>
>    So I think the emit_move_complex_parts clobber should be restricted
>    to !HARD_REGISTER_P, like the lower-subreg clobber is.  If that helps
>    (if only partly) then it would be worth doing as its own patch.
Agreed.

>
> - I think it'd be worth looking into more detail why a clobber makes
>    a difference to register pressure.  A clobber of a pseudo register R
>    shouldn't make R conflict with things that are live at the point of
>    the clobber.
Also agreed.

>
>>   It seems just analogous to partial register
>> stall which is a famous problem on processors that do register renaming.
>>
>> In my opinion, when the register to be clobbered is a composite of hard
>> ones, we should clobber the individual elements separetely, otherwise
>> clear the entire to zero prior to use as the "init-regs" pass does (like
>> partial register stall workarounds on x86 CPUs).  Such redundant zero
>> constant assignments will be removed later in the "cprop_hardreg" pass.
> I don't think we should rely on the zero being optimised away later.
>
> Emitting the zero also makes it harder for the register allocator
> to elide the move.  For example, if we have:
>
>    (set (subreg:SI (reg:DI P) 0) (reg:SI R0))
>    (set (subreg:SI (reg:DI P) 4) (reg:SI R1))
>
> then there is at least a chance that the RA could assign hard registers
> R0:R1 to P, which would turn the moves into nops.  If we emit:
>
>    (set (reg:DI P) (const_int 0))
>
> beforehand then that becomes impossible, since R0 and R1 would then
> conflict with P.
>
> TBH I'm surprised we still run init_regs for LRA.  I thought there was
> a plan to stop doing that, but perhaps I misremember.
I have vague memories of dealing with some of this nonsense a few 
release cycles.  I don't recall all the details, but init-regs + 
lower-subreg + regcprop + splitting all conspired to generate poor code 
on the MIPS targets.  See pr87761, though it doesn't include all my 
findings -- I can't recall if I walked through the entire tortured 
sequence in the gcc-patches discussion or not.

I ended up working around in the mips backend in conjunction with some 
changes to regcprop IIRC.

Jeff


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))"
  2022-08-03 17:23   ` Jeff Law
@ 2022-08-04  9:39     ` Richard Sandiford
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Sandiford @ 2022-08-04  9:39 UTC (permalink / raw)
  To: Jeff Law via Gcc-patches

Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On 8/3/2022 1:52 AM, Richard Sandiford via Gcc-patches wrote:
>> Takayuki 'January June' Suwa via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>> Emitting "(clobber (reg X))" before "(set (subreg (reg X)) (...))" keeps
>>> data flow consistent, but it also increases register allocation pressure
>>> and thus often creates many unwanted register-to-register moves that
>>> cannot be optimized away.
>> There are two things here:
>>
>> - If emit_move_complex_parts emits a clobber of a hard register,
>>    then that's probably a bug/misfeature.  The point of the clobber is
>>    to indicate that the register has no useful contents.  That's useful
>>    for wide pseudos that are written to in parts, since it avoids the
>>    need to track the liveness of each part of the pseudo individually.
>>    But it shouldn't be necessary for hard registers, since subregs of
>>    hard registers are simplified to hard registers wherever possible
>>    (which on most targets is "always").
>>
>>    So I think the emit_move_complex_parts clobber should be restricted
>>    to !HARD_REGISTER_P, like the lower-subreg clobber is.  If that helps
>>    (if only partly) then it would be worth doing as its own patch.
> Agreed.
>
>>
>> - I think it'd be worth looking into more detail why a clobber makes
>>    a difference to register pressure.  A clobber of a pseudo register R
>>    shouldn't make R conflict with things that are live at the point of
>>    the clobber.
> Also agreed.
>
>>
>>>   It seems just analogous to partial register
>>> stall which is a famous problem on processors that do register renaming.
>>>
>>> In my opinion, when the register to be clobbered is a composite of hard
>>> ones, we should clobber the individual elements separetely, otherwise
>>> clear the entire to zero prior to use as the "init-regs" pass does (like
>>> partial register stall workarounds on x86 CPUs).  Such redundant zero
>>> constant assignments will be removed later in the "cprop_hardreg" pass.
>> I don't think we should rely on the zero being optimised away later.
>>
>> Emitting the zero also makes it harder for the register allocator
>> to elide the move.  For example, if we have:
>>
>>    (set (subreg:SI (reg:DI P) 0) (reg:SI R0))
>>    (set (subreg:SI (reg:DI P) 4) (reg:SI R1))
>>
>> then there is at least a chance that the RA could assign hard registers
>> R0:R1 to P, which would turn the moves into nops.  If we emit:
>>
>>    (set (reg:DI P) (const_int 0))
>>
>> beforehand then that becomes impossible, since R0 and R1 would then
>> conflict with P.
>>
>> TBH I'm surprised we still run init_regs for LRA.  I thought there was
>> a plan to stop doing that, but perhaps I misremember.
> I have vague memories of dealing with some of this nonsense a few 
> release cycles.  I don't recall all the details, but init-regs + 
> lower-subreg + regcprop + splitting all conspired to generate poor code 
> on the MIPS targets.  See pr87761, though it doesn't include all my 
> findings -- I can't recall if I walked through the entire tortured 
> sequence in the gcc-patches discussion or not.
>
> I ended up working around in the mips backend in conjunction with some 
> changes to regcprop IIRC.

Thanks for the pointer, hadn't seen that.  And yeah, for the early-ish
passes, I guess the interaction between lower-subreg and init-regs is
important too, not just the interaction between lower-subreg and RA.
It probably also ties into the problems with overly-scalarised register
moves, like in PR 106106.

So maybe I was being too optimistic :-)

Richard

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))"
  2022-08-03 11:17   ` Takayuki 'January June' Suwa
@ 2022-08-04  9:49     ` Richard Sandiford
  2022-08-04 12:35       ` Takayuki 'January June' Suwa
  2022-08-05 16:12       ` [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))" Jeff Law
  0 siblings, 2 replies; 20+ messages in thread
From: Richard Sandiford @ 2022-08-04  9:49 UTC (permalink / raw)
  To: Takayuki 'January June' Suwa; +Cc: GCC Patches

Takayuki 'January June' Suwa <jjsuwa_sys3175@yahoo.co.jp> writes:
> Thanks for your response.
>
> On 2022/08/03 16:52, Richard Sandiford wrote:
>> Takayuki 'January June' Suwa via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>> Emitting "(clobber (reg X))" before "(set (subreg (reg X)) (...))" keeps
>>> data flow consistent, but it also increases register allocation pressure
>>> and thus often creates many unwanted register-to-register moves that
>>> cannot be optimized away.
>> 
>> There are two things here:
>> 
>> - If emit_move_complex_parts emits a clobber of a hard register,
>>   then that's probably a bug/misfeature.  The point of the clobber is
>>   to indicate that the register has no useful contents.  That's useful
>>   for wide pseudos that are written to in parts, since it avoids the
>>   need to track the liveness of each part of the pseudo individually.
>>   But it shouldn't be necessary for hard registers, since subregs of
>>   hard registers are simplified to hard registers wherever possible
>>   (which on most targets is "always").
>> 
>>   So I think the emit_move_complex_parts clobber should be restricted
>>   to !HARD_REGISTER_P, like the lower-subreg clobber is.  If that helps
>>   (if only partly) then it would be worth doing as its own patch.
>> 
>> - I think it'd be worth looking into more detail why a clobber makes
>>   a difference to register pressure.  A clobber of a pseudo register R
>>   shouldn't make R conflict with things that are live at the point of
>>   the clobber.
>
> I agree with its worth.
> In fact, aside from other ports, on the xtensa one, RA in code with frequent D[FC]mode pseudos is terribly bad.
> For example, in __muldc3 on libgcc2, the size of the stack frame reserved will almost double depending on whether or not this patch is applied.

Yeah, that's a lot.

>>>  It seems just analogous to partial register
>>> stall which is a famous problem on processors that do register renaming.
>>>
>>> In my opinion, when the register to be clobbered is a composite of hard
>>> ones, we should clobber the individual elements separetely, otherwise
>>> clear the entire to zero prior to use as the "init-regs" pass does (like
>>> partial register stall workarounds on x86 CPUs).  Such redundant zero
>>> constant assignments will be removed later in the "cprop_hardreg" pass.
>> 
>> I don't think we should rely on the zero being optimised away later.
>> 
>> Emitting the zero also makes it harder for the register allocator
>> to elide the move.  For example, if we have:
>> 
>>   (set (subreg:SI (reg:DI P) 0) (reg:SI R0))
>>   (set (subreg:SI (reg:DI P) 4) (reg:SI R1))
>> 
>> then there is at least a chance that the RA could assign hard registers
>> R0:R1 to P, which would turn the moves into nops.  If we emit:
>> 
>>   (set (reg:DI P) (const_int 0))
>> 
>> beforehand then that becomes impossible, since R0 and R1 would then
>> conflict with P.
>
> Ah, surely, as you pointed out for targets where "(reg: DI)" corresponds to one hard register.

I was thinking here about the case where (reg:DI …) corresponds to
2 hard registers.  Each subreg move is then a single hard register
copy, but assigning P to the combination R0:R1 can remove both of
the subreg moves.

>> TBH I'm surprised we still run init_regs for LRA.  I thought there was
>> a plan to stop doing that, but perhaps I misremember.
>
> Sorry I am not sure about the status of LRA... because the xtensa port is still using reload.

Ah, hadn't realised that.  If you have time to work on it, it would be
really good to move over to LRA.  There are plans to remove old reload.

It might be that old reload *does* treat a pseudo clobber as a conflict.
I can't remember now.  If so, then zeroing the register wouldn't be
too bad (for old reload only).

> As conclusion, trying to tweak the common code side may have been a bit premature.
> I'll consider if I can deal with those issues on the side of the target-specific code.

It's likely to be at least partly a target-independent issue, so tweaking
the common code makes sense in principle.

Does adding !HARD_REGISTER_P (x) to:

  /* Show the output dies here.  This is necessary for SUBREGs
     of pseudos since we cannot track their lifetimes correctly;
     hard regs shouldn't appear here except as return values.  */
  if (!reload_completed && !reload_in_progress
      && REG_P (x) && !reg_overlap_mentioned_p (x, y))
    emit_clobber (x);

in emit_move_complex_parts help?  If so, I think we should do at
least that much.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))"
  2022-08-04  9:49     ` Richard Sandiford
@ 2022-08-04 12:35       ` Takayuki 'January June' Suwa
  2022-08-05 16:20         ` Jeff Law
  2022-08-05 16:12       ` [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))" Jeff Law
  1 sibling, 1 reply; 20+ messages in thread
From: Takayuki 'January June' Suwa @ 2022-08-04 12:35 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: GCC Patches

(sorry repost due to the lack of cc here)
Hi!

On 2022/08/04 18:49, Richard Sandiford wrote:
> Takayuki 'January June' Suwa <jjsuwa_sys3175@yahoo.co.jp> writes:
>> Thanks for your response.
>>
>> On 2022/08/03 16:52, Richard Sandiford wrote:
>>> Takayuki 'January June' Suwa via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>>> Emitting "(clobber (reg X))" before "(set (subreg (reg X)) (...))" keeps
>>>> data flow consistent, but it also increases register allocation pressure
>>>> and thus often creates many unwanted register-to-register moves that
>>>> cannot be optimized away.
>>>
>>> There are two things here:
>>>
>>> - If emit_move_complex_parts emits a clobber of a hard register,
>>>   then that's probably a bug/misfeature.  The point of the clobber is
>>>   to indicate that the register has no useful contents.  That's useful
>>>   for wide pseudos that are written to in parts, since it avoids the
>>>   need to track the liveness of each part of the pseudo individually.
>>>   But it shouldn't be necessary for hard registers, since subregs of
>>>   hard registers are simplified to hard registers wherever possible
>>>   (which on most targets is "always").
>>>
>>>   So I think the emit_move_complex_parts clobber should be restricted
>>>   to !HARD_REGISTER_P, like the lower-subreg clobber is.  If that helps
>>>   (if only partly) then it would be worth doing as its own patch.
>>>
>>> - I think it'd be worth looking into more detail why a clobber makes
>>>   a difference to register pressure.  A clobber of a pseudo register R
>>>   shouldn't make R conflict with things that are live at the point of
>>>   the clobber.
>>
>> I agree with its worth.
>> In fact, aside from other ports, on the xtensa one, RA in code with frequent D[FC]mode pseudos is terribly bad.
>> For example, in __muldc3 on libgcc2, the size of the stack frame reserved will almost double depending on whether or not this patch is applied.
> 
> Yeah, that's a lot.

So lots, but almost double might be an overstatement :)

BTW after some quick experimentation, I found that turning on -fsplit-wide-types-early would roughly (but not completely) solve the problem.  Surely, the output was not so bad in the past...

> 
>>>>  It seems just analogous to partial register
>>>> stall which is a famous problem on processors that do register renaming.
>>>>
>>>> In my opinion, when the register to be clobbered is a composite of hard
>>>> ones, we should clobber the individual elements separetely, otherwise
>>>> clear the entire to zero prior to use as the "init-regs" pass does (like
>>>> partial register stall workarounds on x86 CPUs).  Such redundant zero
>>>> constant assignments will be removed later in the "cprop_hardreg" pass.
>>>
>>> I don't think we should rely on the zero being optimised away later.
>>>
>>> Emitting the zero also makes it harder for the register allocator
>>> to elide the move.  For example, if we have:
>>>
>>>   (set (subreg:SI (reg:DI P) 0) (reg:SI R0))
>>>   (set (subreg:SI (reg:DI P) 4) (reg:SI R1))
>>>
>>> then there is at least a chance that the RA could assign hard registers
>>> R0:R1 to P, which would turn the moves into nops.  If we emit:
>>>
>>>   (set (reg:DI P) (const_int 0))
>>>
>>> beforehand then that becomes impossible, since R0 and R1 would then
>>> conflict with P.
>>
>> Ah, surely, as you pointed out for targets where "(reg: DI)" corresponds to one hard register.
> 
> I was thinking here about the case where (reg:DI …) corresponds to
> 2 hard registers.  Each subreg move is then a single hard register
> copy, but assigning P to the combination R0:R1 can remove both of
> the subreg moves.
> 
>>> TBH I'm surprised we still run init_regs for LRA.  I thought there was
>>> a plan to stop doing that, but perhaps I misremember.
>>
>> Sorry I am not sure about the status of LRA... because the xtensa port is still using reload.
> 
> Ah, hadn't realised that.  If you have time to work on it, it would be
> really good to move over to LRA.  There are plans to remove old reload.

Alas you do overestimate me :) I've only been working about the GCC development for a little over a year.
Well it's a lie that I'm not interested in it, but too much for me.

> 
> It might be that old reload *does* treat a pseudo clobber as a conflict.
> I can't remember now.  If so, then zeroing the register wouldn't be
> too bad (for old reload only).
> 
>> As conclusion, trying to tweak the common code side may have been a bit premature.
>> I'll consider if I can deal with those issues on the side of the target-specific code.
> 
> It's likely to be at least partly a target-independent issue, so tweaking
> the common code makes sense in principle.
> 
> Does adding !HARD_REGISTER_P (x) to:
> 
>   /* Show the output dies here.  This is necessary for SUBREGs
>      of pseudos since we cannot track their lifetimes correctly;
>      hard regs shouldn't appear here except as return values.  */
>   if (!reload_completed && !reload_in_progress
>       && REG_P (x) && !reg_overlap_mentioned_p (x, y))
>     emit_clobber (x);
> 
> in emit_move_complex_parts help?  If so, I think we should do at

Probably yes.  Quick test says the abovementioned mod makes the ad-hoc fix I posted earlier (https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596626.html) a thing of the past.

> least that much.
> 
> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))"
  2022-08-04  9:49     ` Richard Sandiford
  2022-08-04 12:35       ` Takayuki 'January June' Suwa
@ 2022-08-05 16:12       ` Jeff Law
  1 sibling, 0 replies; 20+ messages in thread
From: Jeff Law @ 2022-08-05 16:12 UTC (permalink / raw)
  To: gcc-patches



On 8/4/2022 3:49 AM, Richard Sandiford via Gcc-patches wrote:
>
>>> TBH I'm surprised we still run init_regs for LRA.  I thought there was
>>> a plan to stop doing that, but perhaps I misremember.
>> Sorry I am not sure about the status of LRA... because the xtensa port is still using reload.
> Ah, hadn't realised that.  If you have time to work on it, it would be
> really good to move over to LRA.  There are plans to remove old reload.
Definitely worth investigating.  With the cc0 removal done I think the 
last blocker for removing the old reload pass is gone.   We just need to 
get the remaining targets converted to LRA.

> It might be that old reload *does* treat a pseudo clobber as a conflict.
> I can't remember now.  If so, then zeroing the register wouldn't be
> too bad (for old reload only).
No idea anymore either.  I'd be a bit surprised since IIRC the main 
purpose was to tell the old uninit warning code that the entire object 
was set by the subsequent libcall sequence.  But all that code is long-gone.

Which I think raises a question.  Do we even need those CLOBBERSs anymore?


>
>> As conclusion, trying to tweak the common code side may have been a bit premature.
>> I'll consider if I can deal with those issues on the side of the target-specific code.
> It's likely to be at least partly a target-independent issue, so tweaking
> the common code makes sense in principle.
>
> Does adding !HARD_REGISTER_P (x) to:
>
>    /* Show the output dies here.  This is necessary for SUBREGs
>       of pseudos since we cannot track their lifetimes correctly;
>       hard regs shouldn't appear here except as return values.  */
>    if (!reload_completed && !reload_in_progress
>        && REG_P (x) && !reg_overlap_mentioned_p (x, y))
>      emit_clobber (x);
>
> in emit_move_complex_parts help?  If so, I think we should do at
> least that much.
If we can't remove the CLOBBERs entirely, then this sounds like a good 
thing, even if it doesn't help this specific case.

jeff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))"
  2022-08-04 12:35       ` Takayuki 'January June' Suwa
@ 2022-08-05 16:20         ` Jeff Law
  2022-10-14 11:06           ` [PATCH] xtensa: Prepare the transition from Reload to LRA Takayuki 'January June' Suwa
  0 siblings, 1 reply; 20+ messages in thread
From: Jeff Law @ 2022-08-05 16:20 UTC (permalink / raw)
  To: gcc-patches



On 8/4/2022 6:35 AM, Takayuki 'January June' Suwa via Gcc-patches wrote:
> So lots, but almost double might be an overstatement :)
>
> BTW after some quick experimentation, I found that turning on -fsplit-wide-types-early would roughly (but not completely) solve the problem.  Surely, the output was not so bad in the past...
It could have been.


>> Ah, hadn't realised that.  If you have time to work on it, it would be
>> really good to move over to LRA.  There are plans to remove old reload.
> Alas you do overestimate me :) I've only been working about the GCC development for a little over a year.
> Well it's a lie that I'm not interested in it, but too much for me.
It may actually be trivial -- change TARGET_LRA_P to be 
hook_bool_void_true in the xtensa port, then rebuild & test. In the 
couple conversions I've done it's exposed a very small number of issues 
that were trivially resolved.  Given the what I've seen from you I would 
expect it's within your capabilities and there's folks here that can 
help if you do run into problems.

jeff


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH] xtensa: Prepare the transition from Reload to LRA
  2022-08-05 16:20         ` Jeff Law
@ 2022-10-14 11:06           ` Takayuki 'January June' Suwa
  2022-10-16  5:03             ` Max Filippov
  0 siblings, 1 reply; 20+ messages in thread
From: Takayuki 'January June' Suwa @ 2022-10-14 11:06 UTC (permalink / raw)
  To: GCC Patches; +Cc: Jeff Law, max Filippov

On 2022/08/06 1:20, Jeff Law via Gcc-patches wrote:
>>> Ah, hadn't realised that.  If you have time to work on it, it would be
>>> really good to move over to LRA.  There are plans to remove old reload.
>> Alas you do overestimate me :) I've only been working about the GCC development for a little over a year.
>> Well it's a lie that I'm not interested in it, but too much for me.
> It may actually be trivial -- change TARGET_LRA_P to be hook_bool_void_true in the xtensa port, then rebuild & test. In the couple conversions I've done it's exposed a very small number of issues that were trivially resolved.  Given the what I've seen from you I would expect it's within your capabilities and there's folks here that can help if you do run into problems.

I was flattered, so I decided to go along with the offer :)
===
This patch provides the first step in the transition from Reload to LRA
in Xtensa.

gcc/ChangeLog:

	* config/xtensa/xtensa-proto.h (xtensa_split1_is_finished_p):
	New prototype.
	* config/xtensa/xtensa.cc
	(xtensa_split1_is_finished_p, xtensa_lra_p): New functions.
	(TARGET_LRA_P): Replace the dummy hook with xtensa_lra_p.
	(xt_true_regnum): Rework.
	* gcc/config/xtensa/xtensa.h (CALL_REALLY_USED_REGISTERS):
	Rename from CALL_USED_REGISTERS, and remove what correspond to
	FIXED_REGISTERS.
	* gcc/config/xtensa/constraints.md (Y):
	Use !xtensa_split1_is_finished_p() instead of can_create_pseudo_p().
	* gcc/config/xtensa/predicates.md (move_operand): Ditto.
	* gcc/config/xtensa/xtensa.md:
	Add new split pattern that puts out-of-constraint integer constants
	into the constant pool.
	* gcc/config/xtensa/xtensa.opt (-mlra): New target-specific option
	for testing purpose.
---
 gcc/config/xtensa/constraints.md  |  2 +-
 gcc/config/xtensa/predicates.md   |  2 +-
 gcc/config/xtensa/xtensa-protos.h |  1 +
 gcc/config/xtensa/xtensa.cc       | 48 ++++++++++++++++++++++++-------
 gcc/config/xtensa/xtensa.h        |  6 ++--
 gcc/config/xtensa/xtensa.md       | 12 ++++++++
 gcc/config/xtensa/xtensa.opt      |  4 +++
 7 files changed, 60 insertions(+), 15 deletions(-)

diff --git a/gcc/config/xtensa/constraints.md b/gcc/config/xtensa/constraints.md
index e4c314b267c..111e5f29fd2 100644
--- a/gcc/config/xtensa/constraints.md
+++ b/gcc/config/xtensa/constraints.md
@@ -121,7 +121,7 @@
  (ior (and (match_code "const_int,const_double,const,symbol_ref,label_ref")
 	   (match_test "TARGET_AUTO_LITPOOLS"))
       (and (match_code "const_int")
-	   (match_test "can_create_pseudo_p ()"))))
+	   (match_test "! xtensa_split1_is_finished_p ()"))))
 
 ;; Memory constraints.  Do not use define_memory_constraint here.  Doing so
 ;; causes reload to force some constants into the constant pool, but since
diff --git a/gcc/config/xtensa/predicates.md b/gcc/config/xtensa/predicates.md
index 0590c0f81a9..fc9de01fd71 100644
--- a/gcc/config/xtensa/predicates.md
+++ b/gcc/config/xtensa/predicates.md
@@ -149,7 +149,7 @@
      (ior (and (match_code "const_int")
 	       (match_test "(GET_MODE_CLASS (mode) == MODE_INT
 			     && xtensa_simm12b (INTVAL (op)))
-			    || can_create_pseudo_p ()"))
+			    || ! xtensa_split1_is_finished_p ()"))
 	  (and (match_code "const_int,const_double,const,symbol_ref,label_ref")
 	       (match_test "(TARGET_CONST16 || TARGET_AUTO_LITPOOLS)
 			    && CONSTANT_P (op)
diff --git a/gcc/config/xtensa/xtensa-protos.h b/gcc/config/xtensa/xtensa-protos.h
index 459e2aac9fc..55779f9d199 100644
--- a/gcc/config/xtensa/xtensa-protos.h
+++ b/gcc/config/xtensa/xtensa-protos.h
@@ -58,6 +58,7 @@ extern char *xtensa_emit_call (int, rtx *);
 extern char *xtensa_emit_sibcall (int, rtx *);
 extern bool xtensa_tls_referenced_p (rtx);
 extern enum rtx_code xtensa_shlrd_which_direction (rtx, rtx);
+extern bool xtensa_split1_is_finished_p (void);
 
 #ifdef TREE_CODE
 extern void init_cumulative_args (CUMULATIVE_ARGS *, int);
diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index 828c7642b7c..ce2aab673df 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "hw-doloop.h"
 #include "rtl-iter.h"
 #include "insn-attr.h"
+#include "tree-pass.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -199,6 +200,7 @@ static void xtensa_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED,
 				    HOST_WIDE_INT delta,
 				    HOST_WIDE_INT vcall_offset,
 				    tree function);
+static bool xtensa_lra_p (void);
 
 static rtx xtensa_delegitimize_address (rtx);
 
@@ -295,7 +297,7 @@ static rtx xtensa_delegitimize_address (rtx);
 #define TARGET_CANNOT_FORCE_CONST_MEM xtensa_cannot_force_const_mem
 
 #undef TARGET_LRA_P
-#define TARGET_LRA_P hook_bool_void_false
+#define TARGET_LRA_P xtensa_lra_p
 
 #undef TARGET_LEGITIMATE_ADDRESS_P
 #define TARGET_LEGITIMATE_ADDRESS_P	xtensa_legitimate_address_p
@@ -492,21 +494,30 @@ xtensa_mask_immediate (HOST_WIDE_INT v)
 int
 xt_true_regnum (rtx x)
 {
-  if (GET_CODE (x) == REG)
+  if (REG_P (x))
     {
-      if (reg_renumber
-	  && REGNO (x) >= FIRST_PSEUDO_REGISTER
-	  && reg_renumber[REGNO (x)] >= 0)
+      if (HARD_REGISTER_P (x)
+	  && reg_renumber
+	  && (lra_in_progress || reg_renumber[REGNO (x)] >= 0))
 	return reg_renumber[REGNO (x)];
       return REGNO (x);
     }
-  if (GET_CODE (x) == SUBREG)
+  if (SUBREG_P (x))
     {
       int base = xt_true_regnum (SUBREG_REG (x));
-      if (base >= 0 && base < FIRST_PSEUDO_REGISTER)
-        return base + subreg_regno_offset (REGNO (SUBREG_REG (x)),
-                                           GET_MODE (SUBREG_REG (x)),
-                                           SUBREG_BYTE (x), GET_MODE (x));
+
+      if (base >= 0
+	  && HARD_REGISTER_NUM_P (base))
+	{
+	  struct subreg_info info;
+
+	  subreg_get_info (lra_in_progress
+			   ? (unsigned) base : REGNO (SUBREG_REG (x)),
+			   GET_MODE (SUBREG_REG (x)),
+			   SUBREG_BYTE (x), GET_MODE (x), &info);
+	  if (info.representable_p)
+	    return base + info.offset;
+	}
     }
   return -1;
 }
@@ -2477,6 +2488,15 @@ xtensa_shlrd_which_direction (rtx op0, rtx op1)
 }
 
 
+/* Return true after "split1" pass is finished.  */
+
+bool
+xtensa_split1_is_finished_p (void)
+{
+  return cfun && (cfun->curr_properties & PROP_rtl_split_insns);
+}
+
+
 /* Implement TARGET_CANNOT_FORCE_CONST_MEM.  */
 
 static bool
@@ -5119,4 +5139,12 @@ xtensa_delegitimize_address (rtx op)
   return op;
 }
 
+/* Implement TARGET_LRA_P.  */
+
+static bool
+xtensa_lra_p (void)
+{
+  return TARGET_LRA;
+}
+
 #include "gt-xtensa.h"
diff --git a/gcc/config/xtensa/xtensa.h b/gcc/config/xtensa/xtensa.h
index 16e3d55e896..6b60e596062 100644
--- a/gcc/config/xtensa/xtensa.h
+++ b/gcc/config/xtensa/xtensa.h
@@ -242,10 +242,10 @@ along with GCC; see the file COPYING3.  If not see
 
    Proper values are computed in TARGET_CONDITIONAL_REGISTER_USAGE.  */
 
-#define CALL_USED_REGISTERS						\
+#define CALL_REALLY_USED_REGISTERS					\
 {									\
-  1, 1, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2,			\
-  1, 1, 1,								\
+  0, 0, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2,			\
+  0, 0, 1,								\
   1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,			\
   1,									\
 }
diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 608110c20bc..3b37f01e117 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -1017,6 +1017,18 @@
    (set_attr "mode"	"SI")
    (set_attr "length"	"2,2,2,2,2,2,3,3,3,3,6,3,3,3,3,3")])
 
+(define_split
+  [(set (match_operand:SI 0 "register_operand")
+	(match_operand:SI 1 "const_int_operand"))]
+  "!TARGET_CONST16 && !TARGET_AUTO_LITPOOLS
+   && ! xtensa_split1_is_finished_p ()
+   && ! xtensa_simm12b (INTVAL (operands[1]))"
+  [(set (match_dup 0)
+	(match_dup 1))]
+{
+  operands[1] = force_const_mem (SImode, operands[1]);
+})
+
 (define_split
   [(set (match_operand:SI 0 "register_operand")
 	(match_operand:SI 1 "constantpool_operand"))]
diff --git a/gcc/config/xtensa/xtensa.opt b/gcc/config/xtensa/xtensa.opt
index 08338e39060..00d2db4eae1 100644
--- a/gcc/config/xtensa/xtensa.opt
+++ b/gcc/config/xtensa/xtensa.opt
@@ -34,6 +34,10 @@ mextra-l32r-costs=
 Target RejectNegative Joined UInteger Var(xtensa_extra_l32r_costs) Init(0)
 Set extra memory access cost for L32R instruction, in clock-cycle units.
 
+mlra
+Target Mask(LRA)
+Use LRA instead of reload (transitional).
+
 mtarget-align
 Target
 Automatically align branch targets to reduce branch penalties.
-- 
2.30.2

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] xtensa: Prepare the transition from Reload to LRA
  2022-10-14 11:06           ` [PATCH] xtensa: Prepare the transition from Reload to LRA Takayuki 'January June' Suwa
@ 2022-10-16  5:03             ` Max Filippov
  2022-10-18  2:57               ` [PATCH v2] " Takayuki 'January June' Suwa
  0 siblings, 1 reply; 20+ messages in thread
From: Max Filippov @ 2022-10-16  5:03 UTC (permalink / raw)
  To: Takayuki 'January June' Suwa; +Cc: GCC Patches, Jeff Law

Hi Suwa-san,

On Fri, Oct 14, 2022 at 4:19 AM Takayuki 'January June' Suwa
<jjsuwa_sys3175@yahoo.co.jp> wrote:
> This patch provides the first step in the transition from Reload to LRA
> in Xtensa.
>
> gcc/ChangeLog:
>
>         * config/xtensa/xtensa-proto.h (xtensa_split1_is_finished_p):
>         New prototype.
>         * config/xtensa/xtensa.cc
>         (xtensa_split1_is_finished_p, xtensa_lra_p): New functions.
>         (TARGET_LRA_P): Replace the dummy hook with xtensa_lra_p.
>         (xt_true_regnum): Rework.
>         * gcc/config/xtensa/xtensa.h (CALL_REALLY_USED_REGISTERS):
>         Rename from CALL_USED_REGISTERS, and remove what correspond to
>         FIXED_REGISTERS.
>         * gcc/config/xtensa/constraints.md (Y):
>         Use !xtensa_split1_is_finished_p() instead of can_create_pseudo_p().
>         * gcc/config/xtensa/predicates.md (move_operand): Ditto.
>         * gcc/config/xtensa/xtensa.md:
>         Add new split pattern that puts out-of-constraint integer constants
>         into the constant pool.
>         * gcc/config/xtensa/xtensa.opt (-mlra): New target-specific option
>         for testing purpose.
> ---
>  gcc/config/xtensa/constraints.md  |  2 +-
>  gcc/config/xtensa/predicates.md   |  2 +-
>  gcc/config/xtensa/xtensa-protos.h |  1 +
>  gcc/config/xtensa/xtensa.cc       | 48 ++++++++++++++++++++++++-------
>  gcc/config/xtensa/xtensa.h        |  6 ++--
>  gcc/config/xtensa/xtensa.md       | 12 ++++++++
>  gcc/config/xtensa/xtensa.opt      |  4 +++
>  7 files changed, 60 insertions(+), 15 deletions(-)

Thank you for doing this, I couldn't find time to get back to it since 2020 ):

This change results in a few new regressions in the following tests
caused by ICE even when running without -mlra option:

+FAIL: gcc.c-torture/execute/pr92904.c   -O1  (internal compiler
error: in extract_insn, at recog.cc:2791)
+FAIL: gcc.c-torture/execute/pr92904.c   -O2  (internal compiler
error: in extract_insn, at recog.cc:2791)
+FAIL: gcc.c-torture/execute/pr92904.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  (internal
compiler error: in extract_insn, at recog.cc:2791)
+FAIL: gcc.c-torture/execute/pr92904.c   -O3 -g  (internal compiler
error: in extract_insn, at recog.cc:2791)
+FAIL: gcc.c-torture/execute/pr92904.c   -Os  (internal compiler
error: in extract_insn, at recog.cc:2791)
+FAIL: gcc.c-torture/execute/pr92904.c   -O2 -flto
-fno-use-linker-plugin -flto-partition=none  (internal compiler error:
in extract_insn, at recog.cc:2791)
+FAIL: gcc.c-torture/execute/pr92904.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  (internal compiler error: in extract_insn, at
recog.cc:2791)
+FAIL: g++.dg/torture/vshuf-v2si.C   -O3 -g  (internal compiler error:
in extract_insn, at recog.cc:2791)
+FAIL: g++.dg/torture/vshuf-v8qi.C   -O3 -g  (internal compiler error:
in extract_insn, at recog.cc:2791)

The backtraces look like this in all of them:

gcc/gcc/testsuite/gcc.c-torture/execute/pr92904.c:395:1: error:
unrecognizable insn:
(insn 10501 7 10502 2 (set (reg:SI 5913)
       (const_int 1431655765 [0x55555555]))
"gcc/gcc/testsuite/gcc.c-torture/execute/pr92904.c":239:9 -1
    (nil))
during RTL pass: subreg3
gcc/gcc/testsuite/gcc.c-torture/execute/pr92904.c:395:1: internal
compiler error: in extract_insn, at recog.cc:2791
0x6b17f7 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
       gcc/gcc/rtl-error.cc:108
0x6b187a _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
       gcc/gcc/rtl-error.cc:116
0x6a2aa4 extract_insn(rtx_insn*)
       gcc/gcc/recog.cc:2791
0x179e94d decompose_multiword_subregs
       gcc/gcc/lower-subreg.cc:1678
0x179ebdd execute
       gcc/gcc/lower-subreg.cc:1820

There's also the following runtime failures, but only on
call0 configuration:

+FAIL: gcc.c-torture/execute/20010122-1.c   -O1  execution test
+FAIL: gcc.c-torture/execute/20010122-1.c   -O2  execution test
+FAIL: gcc.c-torture/execute/20010122-1.c   -O3 -g  execution test
+FAIL: gcc.c-torture/execute/20010122-1.c   -Os  execution test
+FAIL: gcc.c-torture/execute/20010122-1.c   -O2 -flto
-fno-use-linker-plugin -flto-partition=none  execution test

-- 
Thanks.
-- Max

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2] xtensa: Prepare the transition from Reload to LRA
  2022-10-16  5:03             ` Max Filippov
@ 2022-10-18  2:57               ` Takayuki 'January June' Suwa
  2022-10-18  3:14                 ` Max Filippov
  0 siblings, 1 reply; 20+ messages in thread
From: Takayuki 'January June' Suwa @ 2022-10-18  2:57 UTC (permalink / raw)
  To: GCC Patches; +Cc: Max Filippov

On 2022/10/16 14:03, Max Filippov wrote:
> Hi Suwa-san,
Hi!

> This change results in a few new regressions in the following tests caused by ICE even when running without -mlra option:
> 
> +FAIL: gcc.c-torture/execute/pr92904.c   -O1  (internal compiler error: in extract_insn, at recog.cc:2791)
> 
> The backtraces look like this in all of them:
> 
> gcc/gcc/testsuite/gcc.c-torture/execute/pr92904.c:395:1: error:
> unrecognizable insn:
> (insn 10501 7 10502 2 (set (reg:SI 5913)
>        (const_int 1431655765 [0x55555555]))
> "gcc/gcc/testsuite/gcc.c-torture/execute/pr92904.c":239:9 -1
>     (nil))
> during RTL pass: subreg3
> gcc/gcc/testsuite/gcc.c-torture/execute/pr92904.c:395:1: internal compiler error: in extract_insn, at recog.cc:2791

"expand" pass generates the below from referencing to the struct:

  ;; MEM <long long int> [(union Y *)&u] = 6148914691236517205;
  (set (reg:DI X) (mem:DI (symbol_ref:SI ("*.LC_u"))))

and then "fwprop1" transforms it by dereference:

  (set (reg:DI X) (const_int 0x5555555555555555))

finally "subreg3" (but not "split1") splits it into the two that don't satisfy the constraint:

  (set (reg:SI X0) (const_int 0x55555555))
  (set (reg:SI X1) (const_int 0x55555555))

> There's also the following runtime failures, but only on call0 configuration:
> 
> +FAIL: gcc.c-torture/execute/20010122-1.c   -O1  execution test
> +FAIL: gcc.c-torture/execute/20010122-1.c   -O2  execution test
> +FAIL: gcc.c-torture/execute/20010122-1.c   -O3 -g  execution test
> +FAIL: gcc.c-torture/execute/20010122-1.c   -Os  execution test
> +FAIL: gcc.c-torture/execute/20010122-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test

both assembler outputs with and without this patch are identical on my side, but perhaps it can break runtime init and/or libraries due to my silly mistake:

-+      if (HARD_REGISTER_P (x)
++      if (! HARD_REGISTER_P (x)

===
This patch provides the first step in the transition from Reload to LRA
in Xtensa.

gcc/ChangeLog:

	* config/xtensa/xtensa-proto.h
	(xtensa_split1_finished_p, xtensa_split_DI_reg_imm): New prototypes.
	* config/xtensa/xtensa.cc
	(xtensa_split1_finished_p, xtensa_split_DI_reg_imm, xtensa_lra_p):
	New functions.
	(TARGET_LRA_P): Replace the dummy hook with xtensa_lra_p.
	(xt_true_regnum): Rework.
	* gcc/config/xtensa/xtensa.h (CALL_REALLY_USED_REGISTERS):
	Rename from CALL_USED_REGISTERS, and remove what correspond to
	FIXED_REGISTERS.
	* gcc/config/xtensa/constraints.md (Y):
	Use !xtensa_split1_finished_p() instead of can_create_pseudo_p().
	* gcc/config/xtensa/predicates.md (move_operand): Ditto.
	* gcc/config/xtensa/xtensa.md: Add two new split patterns:
	  - splits DImode immediate load into two SImode ones
	  - puts out-of-constraint SImode constants into the constant pool
	* gcc/config/xtensa/xtensa.opt (-mlra): New target-specific option
	for testing purpose.
---
 gcc/config/xtensa/constraints.md  |  2 +-
 gcc/config/xtensa/predicates.md   |  2 +-
 gcc/config/xtensa/xtensa-protos.h |  2 +
 gcc/config/xtensa/xtensa.cc       | 69 ++++++++++++++++++++++++++-----
 gcc/config/xtensa/xtensa.h        |  6 +--
 gcc/config/xtensa/xtensa.md       | 36 ++++++++++++----
 gcc/config/xtensa/xtensa.opt      |  4 ++
 7 files changed, 98 insertions(+), 23 deletions(-)

diff --git a/gcc/config/xtensa/constraints.md b/gcc/config/xtensa/constraints.md
index e4c314b267c..cd200d6d15a 100644
--- a/gcc/config/xtensa/constraints.md
+++ b/gcc/config/xtensa/constraints.md
@@ -121,7 +121,7 @@
  (ior (and (match_code "const_int,const_double,const,symbol_ref,label_ref")
 	   (match_test "TARGET_AUTO_LITPOOLS"))
       (and (match_code "const_int")
-	   (match_test "can_create_pseudo_p ()"))))
+	   (match_test "! xtensa_split1_finished_p ()"))))
 
 ;; Memory constraints.  Do not use define_memory_constraint here.  Doing so
 ;; causes reload to force some constants into the constant pool, but since
diff --git a/gcc/config/xtensa/predicates.md b/gcc/config/xtensa/predicates.md
index 0590c0f81a9..c11e8634dbe 100644
--- a/gcc/config/xtensa/predicates.md
+++ b/gcc/config/xtensa/predicates.md
@@ -149,7 +149,7 @@
      (ior (and (match_code "const_int")
 	       (match_test "(GET_MODE_CLASS (mode) == MODE_INT
 			     && xtensa_simm12b (INTVAL (op)))
-			    || can_create_pseudo_p ()"))
+			    || ! xtensa_split1_finished_p ()"))
 	  (and (match_code "const_int,const_double,const,symbol_ref,label_ref")
 	       (match_test "(TARGET_CONST16 || TARGET_AUTO_LITPOOLS)
 			    && CONSTANT_P (op)
diff --git a/gcc/config/xtensa/xtensa-protos.h b/gcc/config/xtensa/xtensa-protos.h
index 459e2aac9fc..bc75ad9698a 100644
--- a/gcc/config/xtensa/xtensa-protos.h
+++ b/gcc/config/xtensa/xtensa-protos.h
@@ -58,6 +58,8 @@ extern char *xtensa_emit_call (int, rtx *);
 extern char *xtensa_emit_sibcall (int, rtx *);
 extern bool xtensa_tls_referenced_p (rtx);
 extern enum rtx_code xtensa_shlrd_which_direction (rtx, rtx);
+extern bool xtensa_split1_finished_p (void);
+extern void xtensa_split_DI_reg_imm (rtx *);
 
 #ifdef TREE_CODE
 extern void init_cumulative_args (CUMULATIVE_ARGS *, int);
diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index 828c7642b7c..950eb5a59be 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "hw-doloop.h"
 #include "rtl-iter.h"
 #include "insn-attr.h"
+#include "tree-pass.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -199,6 +200,7 @@ static void xtensa_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED,
 				    HOST_WIDE_INT delta,
 				    HOST_WIDE_INT vcall_offset,
 				    tree function);
+static bool xtensa_lra_p (void);
 
 static rtx xtensa_delegitimize_address (rtx);
 
@@ -295,7 +297,7 @@ static rtx xtensa_delegitimize_address (rtx);
 #define TARGET_CANNOT_FORCE_CONST_MEM xtensa_cannot_force_const_mem
 
 #undef TARGET_LRA_P
-#define TARGET_LRA_P hook_bool_void_false
+#define TARGET_LRA_P xtensa_lra_p
 
 #undef TARGET_LEGITIMATE_ADDRESS_P
 #define TARGET_LEGITIMATE_ADDRESS_P	xtensa_legitimate_address_p
@@ -492,21 +494,30 @@ xtensa_mask_immediate (HOST_WIDE_INT v)
 int
 xt_true_regnum (rtx x)
 {
-  if (GET_CODE (x) == REG)
+  if (REG_P (x))
     {
-      if (reg_renumber
-	  && REGNO (x) >= FIRST_PSEUDO_REGISTER
-	  && reg_renumber[REGNO (x)] >= 0)
+      if (! HARD_REGISTER_P (x)
+	  && reg_renumber
+	  && (lra_in_progress || reg_renumber[REGNO (x)] >= 0))
 	return reg_renumber[REGNO (x)];
       return REGNO (x);
     }
-  if (GET_CODE (x) == SUBREG)
+  if (SUBREG_P (x))
     {
       int base = xt_true_regnum (SUBREG_REG (x));
-      if (base >= 0 && base < FIRST_PSEUDO_REGISTER)
-        return base + subreg_regno_offset (REGNO (SUBREG_REG (x)),
-                                           GET_MODE (SUBREG_REG (x)),
-                                           SUBREG_BYTE (x), GET_MODE (x));
+
+      if (base >= 0
+	  && HARD_REGISTER_NUM_P (base))
+	{
+	  struct subreg_info info;
+
+	  subreg_get_info (lra_in_progress
+			   ? (unsigned) base : REGNO (SUBREG_REG (x)),
+			   GET_MODE (SUBREG_REG (x)),
+			   SUBREG_BYTE (x), GET_MODE (x), &info);
+	  if (info.representable_p)
+	    return base + info.offset;
+	}
     }
   return -1;
 }
@@ -2477,6 +2488,36 @@ xtensa_shlrd_which_direction (rtx op0, rtx op1)
 }
 
 
+/* Return true after "split1" pass has been finished.  */
+
+bool
+xtensa_split1_finished_p (void)
+{
+  return cfun && (cfun->curr_properties & PROP_rtl_split_insns);
+}
+
+
+/* Split a DImode pair of reg (operand[0]) and const_int (operand[1]) into
+   two SImode pairs, the low-part (operands[0] and [1]) and the high-part
+   (operands[2] and [3]).  */
+
+void
+xtensa_split_DI_reg_imm (rtx *operands)
+{
+  rtx lowpart, highpart;
+
+  if (WORDS_BIG_ENDIAN)
+    split_double (operands[1], &highpart, &lowpart);
+  else
+    split_double (operands[1], &lowpart, &highpart);
+
+  operands[3] = highpart;
+  operands[2] = gen_highpart (SImode, operands[0]);
+  operands[1] = lowpart;
+  operands[0] = gen_lowpart (SImode, operands[0]);
+}
+
+
 /* Implement TARGET_CANNOT_FORCE_CONST_MEM.  */
 
 static bool
@@ -5119,4 +5160,12 @@ xtensa_delegitimize_address (rtx op)
   return op;
 }
 
+/* Implement TARGET_LRA_P.  */
+
+static bool
+xtensa_lra_p (void)
+{
+  return TARGET_LRA;
+}
+
 #include "gt-xtensa.h"
diff --git a/gcc/config/xtensa/xtensa.h b/gcc/config/xtensa/xtensa.h
index 16e3d55e896..6b60e596062 100644
--- a/gcc/config/xtensa/xtensa.h
+++ b/gcc/config/xtensa/xtensa.h
@@ -242,10 +242,10 @@ along with GCC; see the file COPYING3.  If not see
 
    Proper values are computed in TARGET_CONDITIONAL_REGISTER_USAGE.  */
 
-#define CALL_USED_REGISTERS						\
+#define CALL_REALLY_USED_REGISTERS					\
 {									\
-  1, 1, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2,			\
-  1, 1, 1,								\
+  0, 0, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2,			\
+  0, 0, 1,								\
   1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,			\
   1,									\
 }
diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 608110c20bc..2e7f76ada5c 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -940,14 +940,9 @@
 	 because of offering further optimization opportunities.  */
       if (register_operand (operands[0], DImode))
 	{
-	  rtx lowpart, highpart;
-
-	  if (TARGET_BIG_ENDIAN)
-	    split_double (operands[1], &highpart, &lowpart);
-	  else
-	    split_double (operands[1], &lowpart, &highpart);
-	  emit_insn (gen_movsi (gen_lowpart (SImode, operands[0]), lowpart));
-	  emit_insn (gen_movsi (gen_highpart (SImode, operands[0]), highpart));
+	  xtensa_split_DI_reg_imm (operands);
+	  emit_move_insn (operands[0], operands[1]);
+	  emit_move_insn (operands[2], operands[3]);
 	  DONE;
 	}
 
@@ -981,6 +976,19 @@
     }
 })
 
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+	(match_operand:DI 1 "const_int_operand"))]
+  "!TARGET_CONST16 && !TARGET_AUTO_LITPOOLS
+   && ! xtensa_split1_finished_p ()"
+  [(set (match_dup 0)
+	(match_dup 1))
+   (set (match_dup 2)
+	(match_dup 3))]
+{
+  xtensa_split_DI_reg_imm (operands);
+})
+
 ;; 32-bit Integer moves
 
 (define_expand "movsi"
@@ -1017,6 +1025,18 @@
    (set_attr "mode"	"SI")
    (set_attr "length"	"2,2,2,2,2,2,3,3,3,3,6,3,3,3,3,3")])
 
+(define_split
+  [(set (match_operand:SI 0 "register_operand")
+	(match_operand:SI 1 "const_int_operand"))]
+  "!TARGET_CONST16 && !TARGET_AUTO_LITPOOLS
+   && ! xtensa_split1_finished_p ()
+   && ! xtensa_simm12b (INTVAL (operands[1]))"
+  [(set (match_dup 0)
+	(match_dup 1))]
+{
+  operands[1] = force_const_mem (SImode, operands[1]);
+})
+
 (define_split
   [(set (match_operand:SI 0 "register_operand")
 	(match_operand:SI 1 "constantpool_operand"))]
diff --git a/gcc/config/xtensa/xtensa.opt b/gcc/config/xtensa/xtensa.opt
index 08338e39060..00d2db4eae1 100644
--- a/gcc/config/xtensa/xtensa.opt
+++ b/gcc/config/xtensa/xtensa.opt
@@ -34,6 +34,10 @@ mextra-l32r-costs=
 Target RejectNegative Joined UInteger Var(xtensa_extra_l32r_costs) Init(0)
 Set extra memory access cost for L32R instruction, in clock-cycle units.
 
+mlra
+Target Mask(LRA)
+Use LRA instead of reload (transitional).
+
 mtarget-align
 Target
 Automatically align branch targets to reduce branch penalties.
-- 
2.30.2

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] xtensa: Prepare the transition from Reload to LRA
  2022-10-18  2:57               ` [PATCH v2] " Takayuki 'January June' Suwa
@ 2022-10-18  3:14                 ` Max Filippov
  2022-10-18 12:16                   ` Max Filippov
  0 siblings, 1 reply; 20+ messages in thread
From: Max Filippov @ 2022-10-18  3:14 UTC (permalink / raw)
  To: Takayuki 'January June' Suwa; +Cc: GCC Patches

On Mon, Oct 17, 2022 at 7:57 PM Takayuki 'January June' Suwa
<jjsuwa_sys3175@yahoo.co.jp> wrote:
> On 2022/10/16 14:03, Max Filippov wrote:
> > There's also the following runtime failures, but only on call0 configuration:
> >
> > +FAIL: gcc.c-torture/execute/20010122-1.c   -O1  execution test
> > +FAIL: gcc.c-torture/execute/20010122-1.c   -O2  execution test
> > +FAIL: gcc.c-torture/execute/20010122-1.c   -O3 -g  execution test
> > +FAIL: gcc.c-torture/execute/20010122-1.c   -Os  execution test
> > +FAIL: gcc.c-torture/execute/20010122-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
>
> both assembler outputs with and without this patch are identical on my side

Interesting. In -O1 test I see the following difference that is going to affect
the return value of the corresponding functions:

--- gcc-13-3308-gb4a4c6382b14-call0-le/20010122-1.s      2022-10-17
20:07:32.390363204 -0700
+++ gcc-13-3309-g851636ecd015-call0-le/20010122-1.s      2022-10-17
20:06:36.613785546 -0700
@@ -143,13 +143,10 @@
test2:
       addi    sp, sp, -16
       s32i.n  a0, sp, 12
-       s32i.n  a12, sp, 8
-       mov.n   a12, a0
       l32r    a2, .LC6
       callx0  a2
-       mov.n   a2, a12
+       mov.n   a2, a0
       l32i.n  a0, sp, 12
-       l32i.n  a12, sp, 8
       addi    sp, sp, 16
       ret.n
       .size   test2, .-test2
@@ -161,13 +158,10 @@
test3:
       addi    sp, sp, -16
       s32i.n  a0, sp, 12
-       s32i.n  a12, sp, 8
-       mov.n   a12, a0
       l32r    a2, .LC7
       callx0  a2
-       mov.n   a2, a12
+       mov.n   a2, a0
       l32i.n  a0, sp, 12
-       l32i.n  a12, sp, 8
       addi    sp, sp, 16
       ret.n
       .size   test3, .-test3
@@ -258,14 +252,11 @@
test8:
       addi    sp, sp, -16
       s32i.n  a0, sp, 12
-       s32i.n  a12, sp, 8
-       mov.n   a12, a0
       l32r    a2, .LC12
       callx0  a2
       l32r    a2, .LC13
-       s32i.n  a12, a2, 0
+       s32i.n  a0, a2, 0
       l32i.n  a0, sp, 12
-       l32i.n  a12, sp, 8
       addi    sp, sp, 16
       ret.n
       .size   test8, .-test8

-- 
Thanks.
-- Max

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2] xtensa: Prepare the transition from Reload to LRA
  2022-10-18  3:14                 ` Max Filippov
@ 2022-10-18 12:16                   ` Max Filippov
  2022-10-19  8:16                     ` [PATCH v3] " Takayuki 'January June' Suwa
  0 siblings, 1 reply; 20+ messages in thread
From: Max Filippov @ 2022-10-18 12:16 UTC (permalink / raw)
  To: Takayuki 'January June' Suwa; +Cc: GCC Patches

Hi Suwa-san,

v2 fixes the regressions caused by ICEs, but not the runtime failures.

On Mon, Oct 17, 2022 at 8:14 PM Max Filippov <jcmvbkbc@gmail.com> wrote:
> On Mon, Oct 17, 2022 at 7:57 PM Takayuki 'January June' Suwa
> <jjsuwa_sys3175@yahoo.co.jp> wrote:
> > On 2022/10/16 14:03, Max Filippov wrote:
> > > There's also the following runtime failures, but only on call0 configuration:
> > >
> > > +FAIL: gcc.c-torture/execute/20010122-1.c   -O1  execution test
> > > +FAIL: gcc.c-torture/execute/20010122-1.c   -O2  execution test
> > > +FAIL: gcc.c-torture/execute/20010122-1.c   -O3 -g  execution test
> > > +FAIL: gcc.c-torture/execute/20010122-1.c   -Os  execution test
> > > +FAIL: gcc.c-torture/execute/20010122-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
> >
> > both assembler outputs with and without this patch are identical on my side
>
> Interesting. In -O1 test I see the following difference that is going to affect
> the return value of the corresponding functions:
>
> --- gcc-13-3308-gb4a4c6382b14-call0-le/20010122-1.s      2022-10-17
> 20:07:32.390363204 -0700
> +++ gcc-13-3309-g851636ecd015-call0-le/20010122-1.s      2022-10-17
> 20:06:36.613785546 -0700
> @@ -143,13 +143,10 @@
> test2:
>        addi    sp, sp, -16
>        s32i.n  a0, sp, 12
> -       s32i.n  a12, sp, 8
> -       mov.n   a12, a0
>        l32r    a2, .LC6
>        callx0  a2
> -       mov.n   a2, a12
> +       mov.n   a2, a0
>        l32i.n  a0, sp, 12
> -       l32i.n  a12, sp, 8
>        addi    sp, sp, 16
>        ret.n
>        .size   test2, .-test2
> @@ -161,13 +158,10 @@
> test3:
>        addi    sp, sp, -16
>        s32i.n  a0, sp, 12
> -       s32i.n  a12, sp, 8
> -       mov.n   a12, a0
>        l32r    a2, .LC7
>        callx0  a2
> -       mov.n   a2, a12
> +       mov.n   a2, a0
>        l32i.n  a0, sp, 12
> -       l32i.n  a12, sp, 8
>        addi    sp, sp, 16
>        ret.n
>        .size   test3, .-test3
> @@ -258,14 +252,11 @@
> test8:
>        addi    sp, sp, -16
>        s32i.n  a0, sp, 12
> -       s32i.n  a12, sp, 8
> -       mov.n   a12, a0
>        l32r    a2, .LC12
>        callx0  a2
>        l32r    a2, .LC13
> -       s32i.n  a12, a2, 0
> +       s32i.n  a0, a2, 0
>        l32i.n  a0, sp, 12
> -       l32i.n  a12, sp, 8
>        addi    sp, sp, 16
>        ret.n
>        .size   test8, .-test8

I've noticed that this is related to the following hunk:

-#define CALL_USED_REGISTERS                                            \
+#define CALL_REALLY_USED_REGISTERS                                     \
 {                                                                      \
-  1, 1, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2,                      \
-  1, 1, 1,                                                             \
+  0, 0, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2,                      \
+  0, 0, 1,                                                             \
   1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,                      \
   1,                                                                   \
 }

And the following change on top of v2 fixes this regression for me:

diff --git a/gcc/config/xtensa/xtensa.h b/gcc/config/xtensa/xtensa.h
index 6b60e5960625..897f87f735da 100644
--- a/gcc/config/xtensa/xtensa.h
+++ b/gcc/config/xtensa/xtensa.h
@@ -244,7 +244,7 @@ along with GCC; see the file COPYING3.  If not see

#define CALL_REALLY_USED_REGISTERS                                     \
{                                                                      \
-  0, 0, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2,                      \
+  1, 0, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2,                      \
  0, 0, 1,                                                             \
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,                      \
  1,                                                                   \


-- 
Thanks.
-- Max

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v3] xtensa: Prepare the transition from Reload to LRA
  2022-10-18 12:16                   ` Max Filippov
@ 2022-10-19  8:16                     ` Takayuki 'January June' Suwa
  2022-10-19 11:31                       ` Max Filippov
  2022-10-25 20:09                       ` Jan-Benedict Glaw
  0 siblings, 2 replies; 20+ messages in thread
From: Takayuki 'January June' Suwa @ 2022-10-19  8:16 UTC (permalink / raw)
  To: GCC Patches; +Cc: Max Filippov

On 2022/10/18 21:16, Max Filippov wrote:
> Hi Suwa-san,
Hi!

> I've noticed that this is related to the following hunk:
> 
> -#define CALL_USED_REGISTERS                                            \
> +#define CALL_REALLY_USED_REGISTERS                                     \
>  {                                                                      \
> -  1, 1, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2,                      \
> -  1, 1, 1,                                                             \
> +  0, 0, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2,                      \
> +  0, 0, 1,                                                             \
>    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,                      \
>    1,                                                                   \
>  }
> 
> And the following change on top of v2 fixes this regression for me:
> 
> diff --git a/gcc/config/xtensa/xtensa.h b/gcc/config/xtensa/xtensa.h
> index 6b60e5960625..897f87f735da 100644
> --- a/gcc/config/xtensa/xtensa.h
> +++ b/gcc/config/xtensa/xtensa.h
> @@ -244,7 +244,7 @@ along with GCC; see the file COPYING3.  If not see
> 
> #define CALL_REALLY_USED_REGISTERS                                     \
> {                                                                      \
> -  0, 0, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2,                      \
> +  1, 0, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2,                      \
>   0, 0, 1,                                                             \
>   1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,                      \
>   1,                                                                   \

If we take "CALL_REALLY_USED_REGISTERS" literally, it makes sense because the CALL0/CALLX0 instructions actually change the A0 register.
(little question: Then FIXED_REGISTERS is interpreted as not including the link register, but is that correct...)

===
This patch provides the first step in the transition from Reload to LRA
in Xtensa.

gcc/ChangeLog:

	* config/xtensa/xtensa-proto.h
	(xtensa_split1_finished_p, xtensa_split_DI_reg_imm): New prototypes.
	* config/xtensa/xtensa.cc
	(xtensa_split1_finished_p, xtensa_split_DI_reg_imm, xtensa_lra_p):
	New functions.
	(TARGET_LRA_P): Replace the dummy hook with xtensa_lra_p.
	(xt_true_regnum): Rework.
	* gcc/config/xtensa/xtensa.h (CALL_REALLY_USED_REGISTERS):
	Switch from CALL_USED_REGISTERS, and revise the comment.
	* gcc/config/xtensa/constraints.md (Y):
	Use !xtensa_split1_finished_p() instead of can_create_pseudo_p().
	* gcc/config/xtensa/predicates.md (move_operand): Ditto.
	* gcc/config/xtensa/xtensa.md: Add two new split patterns:
	  - splits DImode immediate load into two SImode ones
	  - puts out-of-constraint SImode constants into the constant pool
	* gcc/config/xtensa/xtensa.opt (-mlra): New target-specific option
	for testing purpose.
---
 gcc/config/xtensa/constraints.md  |  2 +-
 gcc/config/xtensa/predicates.md   |  2 +-
 gcc/config/xtensa/xtensa-protos.h |  2 +
 gcc/config/xtensa/xtensa.cc       | 69 ++++++++++++++++++++++++++-----
 gcc/config/xtensa/xtensa.h        |  8 ++--
 gcc/config/xtensa/xtensa.md       | 36 ++++++++++++----
 gcc/config/xtensa/xtensa.opt      |  4 ++
 7 files changed, 99 insertions(+), 24 deletions(-)

diff --git a/gcc/config/xtensa/constraints.md b/gcc/config/xtensa/constraints.md
index e4c314b267c..cd200d6d15a 100644
--- a/gcc/config/xtensa/constraints.md
+++ b/gcc/config/xtensa/constraints.md
@@ -121,7 +121,7 @@
  (ior (and (match_code "const_int,const_double,const,symbol_ref,label_ref")
 	   (match_test "TARGET_AUTO_LITPOOLS"))
       (and (match_code "const_int")
-	   (match_test "can_create_pseudo_p ()"))))
+	   (match_test "! xtensa_split1_finished_p ()"))))
 
 ;; Memory constraints.  Do not use define_memory_constraint here.  Doing so
 ;; causes reload to force some constants into the constant pool, but since
diff --git a/gcc/config/xtensa/predicates.md b/gcc/config/xtensa/predicates.md
index 0590c0f81a9..c11e8634dbe 100644
--- a/gcc/config/xtensa/predicates.md
+++ b/gcc/config/xtensa/predicates.md
@@ -149,7 +149,7 @@
      (ior (and (match_code "const_int")
 	       (match_test "(GET_MODE_CLASS (mode) == MODE_INT
 			     && xtensa_simm12b (INTVAL (op)))
-			    || can_create_pseudo_p ()"))
+			    || ! xtensa_split1_finished_p ()"))
 	  (and (match_code "const_int,const_double,const,symbol_ref,label_ref")
 	       (match_test "(TARGET_CONST16 || TARGET_AUTO_LITPOOLS)
 			    && CONSTANT_P (op)
diff --git a/gcc/config/xtensa/xtensa-protos.h b/gcc/config/xtensa/xtensa-protos.h
index 459e2aac9fc..bc75ad9698a 100644
--- a/gcc/config/xtensa/xtensa-protos.h
+++ b/gcc/config/xtensa/xtensa-protos.h
@@ -58,6 +58,8 @@ extern char *xtensa_emit_call (int, rtx *);
 extern char *xtensa_emit_sibcall (int, rtx *);
 extern bool xtensa_tls_referenced_p (rtx);
 extern enum rtx_code xtensa_shlrd_which_direction (rtx, rtx);
+extern bool xtensa_split1_finished_p (void);
+extern void xtensa_split_DI_reg_imm (rtx *);
 
 #ifdef TREE_CODE
 extern void init_cumulative_args (CUMULATIVE_ARGS *, int);
diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index 828c7642b7c..950eb5a59be 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "hw-doloop.h"
 #include "rtl-iter.h"
 #include "insn-attr.h"
+#include "tree-pass.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -199,6 +200,7 @@ static void xtensa_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED,
 				    HOST_WIDE_INT delta,
 				    HOST_WIDE_INT vcall_offset,
 				    tree function);
+static bool xtensa_lra_p (void);
 
 static rtx xtensa_delegitimize_address (rtx);
 
@@ -295,7 +297,7 @@ static rtx xtensa_delegitimize_address (rtx);
 #define TARGET_CANNOT_FORCE_CONST_MEM xtensa_cannot_force_const_mem
 
 #undef TARGET_LRA_P
-#define TARGET_LRA_P hook_bool_void_false
+#define TARGET_LRA_P xtensa_lra_p
 
 #undef TARGET_LEGITIMATE_ADDRESS_P
 #define TARGET_LEGITIMATE_ADDRESS_P	xtensa_legitimate_address_p
@@ -492,21 +494,30 @@ xtensa_mask_immediate (HOST_WIDE_INT v)
 int
 xt_true_regnum (rtx x)
 {
-  if (GET_CODE (x) == REG)
+  if (REG_P (x))
     {
-      if (reg_renumber
-	  && REGNO (x) >= FIRST_PSEUDO_REGISTER
-	  && reg_renumber[REGNO (x)] >= 0)
+      if (! HARD_REGISTER_P (x)
+	  && reg_renumber
+	  && (lra_in_progress || reg_renumber[REGNO (x)] >= 0))
 	return reg_renumber[REGNO (x)];
       return REGNO (x);
     }
-  if (GET_CODE (x) == SUBREG)
+  if (SUBREG_P (x))
     {
       int base = xt_true_regnum (SUBREG_REG (x));
-      if (base >= 0 && base < FIRST_PSEUDO_REGISTER)
-        return base + subreg_regno_offset (REGNO (SUBREG_REG (x)),
-                                           GET_MODE (SUBREG_REG (x)),
-                                           SUBREG_BYTE (x), GET_MODE (x));
+
+      if (base >= 0
+	  && HARD_REGISTER_NUM_P (base))
+	{
+	  struct subreg_info info;
+
+	  subreg_get_info (lra_in_progress
+			   ? (unsigned) base : REGNO (SUBREG_REG (x)),
+			   GET_MODE (SUBREG_REG (x)),
+			   SUBREG_BYTE (x), GET_MODE (x), &info);
+	  if (info.representable_p)
+	    return base + info.offset;
+	}
     }
   return -1;
 }
@@ -2477,6 +2488,36 @@ xtensa_shlrd_which_direction (rtx op0, rtx op1)
 }
 
 
+/* Return true after "split1" pass has been finished.  */
+
+bool
+xtensa_split1_finished_p (void)
+{
+  return cfun && (cfun->curr_properties & PROP_rtl_split_insns);
+}
+
+
+/* Split a DImode pair of reg (operand[0]) and const_int (operand[1]) into
+   two SImode pairs, the low-part (operands[0] and [1]) and the high-part
+   (operands[2] and [3]).  */
+
+void
+xtensa_split_DI_reg_imm (rtx *operands)
+{
+  rtx lowpart, highpart;
+
+  if (WORDS_BIG_ENDIAN)
+    split_double (operands[1], &highpart, &lowpart);
+  else
+    split_double (operands[1], &lowpart, &highpart);
+
+  operands[3] = highpart;
+  operands[2] = gen_highpart (SImode, operands[0]);
+  operands[1] = lowpart;
+  operands[0] = gen_lowpart (SImode, operands[0]);
+}
+
+
 /* Implement TARGET_CANNOT_FORCE_CONST_MEM.  */
 
 static bool
@@ -5119,4 +5160,12 @@ xtensa_delegitimize_address (rtx op)
   return op;
 }
 
+/* Implement TARGET_LRA_P.  */
+
+static bool
+xtensa_lra_p (void)
+{
+  return TARGET_LRA;
+}
+
 #include "gt-xtensa.h"
diff --git a/gcc/config/xtensa/xtensa.h b/gcc/config/xtensa/xtensa.h
index 16e3d55e896..2275fe6d426 100644
--- a/gcc/config/xtensa/xtensa.h
+++ b/gcc/config/xtensa/xtensa.h
@@ -229,7 +229,7 @@ along with GCC; see the file COPYING3.  If not see
 }
 
 /* 1 for registers not available across function calls.
-   These must include the FIXED_REGISTERS and also any
+   These need not include the FIXED_REGISTERS but must any
    registers that can be used without being saved.
    The latter must include the registers where values are returned
    and the register where structure-value addresses are passed.
@@ -242,10 +242,10 @@ along with GCC; see the file COPYING3.  If not see
 
    Proper values are computed in TARGET_CONDITIONAL_REGISTER_USAGE.  */
 
-#define CALL_USED_REGISTERS						\
+#define CALL_REALLY_USED_REGISTERS					\
 {									\
-  1, 1, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2,			\
-  1, 1, 1,								\
+  1, 0, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2,			\
+  0, 0, 1,								\
   1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,			\
   1,									\
 }
diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 608110c20bc..2e7f76ada5c 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -940,14 +940,9 @@
 	 because of offering further optimization opportunities.  */
       if (register_operand (operands[0], DImode))
 	{
-	  rtx lowpart, highpart;
-
-	  if (TARGET_BIG_ENDIAN)
-	    split_double (operands[1], &highpart, &lowpart);
-	  else
-	    split_double (operands[1], &lowpart, &highpart);
-	  emit_insn (gen_movsi (gen_lowpart (SImode, operands[0]), lowpart));
-	  emit_insn (gen_movsi (gen_highpart (SImode, operands[0]), highpart));
+	  xtensa_split_DI_reg_imm (operands);
+	  emit_move_insn (operands[0], operands[1]);
+	  emit_move_insn (operands[2], operands[3]);
 	  DONE;
 	}
 
@@ -981,6 +976,19 @@
     }
 })
 
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+	(match_operand:DI 1 "const_int_operand"))]
+  "!TARGET_CONST16 && !TARGET_AUTO_LITPOOLS
+   && ! xtensa_split1_finished_p ()"
+  [(set (match_dup 0)
+	(match_dup 1))
+   (set (match_dup 2)
+	(match_dup 3))]
+{
+  xtensa_split_DI_reg_imm (operands);
+})
+
 ;; 32-bit Integer moves
 
 (define_expand "movsi"
@@ -1017,6 +1025,18 @@
    (set_attr "mode"	"SI")
    (set_attr "length"	"2,2,2,2,2,2,3,3,3,3,6,3,3,3,3,3")])
 
+(define_split
+  [(set (match_operand:SI 0 "register_operand")
+	(match_operand:SI 1 "const_int_operand"))]
+  "!TARGET_CONST16 && !TARGET_AUTO_LITPOOLS
+   && ! xtensa_split1_finished_p ()
+   && ! xtensa_simm12b (INTVAL (operands[1]))"
+  [(set (match_dup 0)
+	(match_dup 1))]
+{
+  operands[1] = force_const_mem (SImode, operands[1]);
+})
+
 (define_split
   [(set (match_operand:SI 0 "register_operand")
 	(match_operand:SI 1 "constantpool_operand"))]
diff --git a/gcc/config/xtensa/xtensa.opt b/gcc/config/xtensa/xtensa.opt
index 08338e39060..00d2db4eae1 100644
--- a/gcc/config/xtensa/xtensa.opt
+++ b/gcc/config/xtensa/xtensa.opt
@@ -34,6 +34,10 @@ mextra-l32r-costs=
 Target RejectNegative Joined UInteger Var(xtensa_extra_l32r_costs) Init(0)
 Set extra memory access cost for L32R instruction, in clock-cycle units.
 
+mlra
+Target Mask(LRA)
+Use LRA instead of reload (transitional).
+
 mtarget-align
 Target
 Automatically align branch targets to reduce branch penalties.
-- 
2.30.2

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] xtensa: Prepare the transition from Reload to LRA
  2022-10-19  8:16                     ` [PATCH v3] " Takayuki 'January June' Suwa
@ 2022-10-19 11:31                       ` Max Filippov
  2022-10-25 20:09                       ` Jan-Benedict Glaw
  1 sibling, 0 replies; 20+ messages in thread
From: Max Filippov @ 2022-10-19 11:31 UTC (permalink / raw)
  To: Takayuki 'January June' Suwa; +Cc: GCC Patches

On Wed, Oct 19, 2022 at 1:16 AM Takayuki 'January June' Suwa
<jjsuwa_sys3175@yahoo.co.jp> wrote:
> This patch provides the first step in the transition from Reload to LRA
> in Xtensa.
>
> gcc/ChangeLog:
>
>         * config/xtensa/xtensa-proto.h
>         (xtensa_split1_finished_p, xtensa_split_DI_reg_imm): New prototypes.
>         * config/xtensa/xtensa.cc
>         (xtensa_split1_finished_p, xtensa_split_DI_reg_imm, xtensa_lra_p):
>         New functions.
>         (TARGET_LRA_P): Replace the dummy hook with xtensa_lra_p.
>         (xt_true_regnum): Rework.
>         * gcc/config/xtensa/xtensa.h (CALL_REALLY_USED_REGISTERS):
>         Switch from CALL_USED_REGISTERS, and revise the comment.
>         * gcc/config/xtensa/constraints.md (Y):
>         Use !xtensa_split1_finished_p() instead of can_create_pseudo_p().
>         * gcc/config/xtensa/predicates.md (move_operand): Ditto.
>         * gcc/config/xtensa/xtensa.md: Add two new split patterns:
>           - splits DImode immediate load into two SImode ones
>           - puts out-of-constraint SImode constants into the constant pool
>         * gcc/config/xtensa/xtensa.opt (-mlra): New target-specific option
>         for testing purpose.
> ---
>  gcc/config/xtensa/constraints.md  |  2 +-
>  gcc/config/xtensa/predicates.md   |  2 +-
>  gcc/config/xtensa/xtensa-protos.h |  2 +
>  gcc/config/xtensa/xtensa.cc       | 69 ++++++++++++++++++++++++++-----
>  gcc/config/xtensa/xtensa.h        |  8 ++--
>  gcc/config/xtensa/xtensa.md       | 36 ++++++++++++----
>  gcc/config/xtensa/xtensa.opt      |  4 ++
>  7 files changed, 99 insertions(+), 24 deletions(-)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master after fixing the changelog.

-- 
Thanks.
-- Max

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] xtensa: Prepare the transition from Reload to LRA
  2022-10-19  8:16                     ` [PATCH v3] " Takayuki 'January June' Suwa
  2022-10-19 11:31                       ` Max Filippov
@ 2022-10-25 20:09                       ` Jan-Benedict Glaw
  2022-10-26  3:23                         ` Takayuki 'January June' Suwa
  2022-10-26  6:27                         ` [PATCH] xtensa: Fix out-of-bounds array access Takayuki 'January June' Suwa
  1 sibling, 2 replies; 20+ messages in thread
From: Jan-Benedict Glaw @ 2022-10-25 20:09 UTC (permalink / raw)
  To: Takayuki 'January June' Suwa; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 3083 bytes --]

Hi!

On Wed, 2022-10-19 17:16:24 +0900, Takayuki 'January June' Suwa via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 	* gcc/config/xtensa/xtensa.md: Add two new split patterns:
> 	  - splits DImode immediate load into two SImode ones
> 	  - puts out-of-constraint SImode constants into the constant pool

> --- a/gcc/config/xtensa/xtensa.md
> +++ b/gcc/config/xtensa/xtensa.md
> @@ -940,14 +940,9 @@
>  	 because of offering further optimization opportunities.  */
>        if (register_operand (operands[0], DImode))
>  	{
> -	  rtx lowpart, highpart;
> -
> -	  if (TARGET_BIG_ENDIAN)
> -	    split_double (operands[1], &highpart, &lowpart);
> -	  else
> -	    split_double (operands[1], &lowpart, &highpart);
> -	  emit_insn (gen_movsi (gen_lowpart (SImode, operands[0]), lowpart));
> -	  emit_insn (gen_movsi (gen_highpart (SImode, operands[0]), highpart));
> +	  xtensa_split_DI_reg_imm (operands);
> +	  emit_move_insn (operands[0], operands[1]);
> +	  emit_move_insn (operands[2], operands[3]);

This results in a new warning for me:

[all 2022-10-25 16:04:19] g++  -fno-PIE -c   -g -O2   -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common  -DHAVE_CONFIG_H -I. -I. -I../../gcc/gcc -I../../gcc/gcc/. -I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include -I../../gcc/gcc/../libcody  -I../../gcc/gcc/../libdecnumber -I../../gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I../../gcc/gcc/../libbacktrace   -o insn-emit.o -MT insn-emit.o -MMD -MP -MF ./.deps/insn-emit.TPo insn-emit.cc
[all 2022-10-25 16:04:22] ../../gcc/gcc/config/xtensa/xtensa.md: In function 'rtx_def* gen_movdi(rtx, rtx)':
[all 2022-10-25 16:04:22] ../../gcc/gcc/config/xtensa/xtensa.md:945:26: error: array subscript 3 is above array bounds of 'rtx_def* [2]' [-Werror=array-bounds]
[all 2022-10-25 16:04:22]   945 |           emit_move_insn (operands[2], operands[3]);
[all 2022-10-25 16:04:22]       |           ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
[all 2022-10-25 16:04:22] ../../gcc/gcc/config/xtensa/xtensa.md:897:9: note: while referencing 'operands'
[all 2022-10-25 16:04:22]   897 |    (set_attr "mode"     "SF")
[all 2022-10-25 16:04:22]       |         ^~~~~~~~
[all 2022-10-25 16:04:22] ../../gcc/gcc/config/xtensa/xtensa.md:945:26: error: array subscript 2 is above array bounds of 'rtx_def* [2]' [-Werror=array-bounds]
[all 2022-10-25 16:04:22]   945 |           emit_move_insn (operands[2], operands[3]);
[all 2022-10-25 16:04:22]       |           ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
[all 2022-10-25 16:04:22] ../../gcc/gcc/config/xtensa/xtensa.md:897:9: note: while referencing 'operands'
[all 2022-10-25 16:04:22]   897 |    (set_attr "mode"     "SF")
[all 2022-10-25 16:04:22]       |         ^~~~~~~~

I didn't yet actually check the warning, it may be bogus.

Thanks,
  Jan-Benedict

-- 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] xtensa: Prepare the transition from Reload to LRA
  2022-10-25 20:09                       ` Jan-Benedict Glaw
@ 2022-10-26  3:23                         ` Takayuki 'January June' Suwa
  2022-10-26  6:27                         ` [PATCH] xtensa: Fix out-of-bounds array access Takayuki 'January June' Suwa
  1 sibling, 0 replies; 20+ messages in thread
From: Takayuki 'January June' Suwa @ 2022-10-26  3:23 UTC (permalink / raw)
  To: Jan-Benedict Glaw; +Cc: GCC Patches

On 2022/10/26 5:09, Jan-Benedict Glaw wrote:
> Hi!
Hi!

> This results in a new warning for me:
> 
> [all 2022-10-25 16:04:19] g++  -fno-PIE -c   -g -O2   -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common  -DHAVE_CONFIG_H -I. -I. -I../../gcc/gcc -I../../gcc/gcc/. -I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include -I../../gcc/gcc/../libcody  -I../../gcc/gcc/../libdecnumber -I../../gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I../../gcc/gcc/../libbacktrace   -o insn-emit.o -MT insn-emit.o -MMD -MP -MF ./.deps/insn-emit.TPo insn-emit.cc
> [all 2022-10-25 16:04:22] ../../gcc/gcc/config/xtensa/xtensa.md: In function 'rtx_def* gen_movdi(rtx, rtx)':
> [all 2022-10-25 16:04:22] ../../gcc/gcc/config/xtensa/xtensa.md:945:26: error: array subscript 3 is above array bounds of 'rtx_def* [2]' [-Werror=array-bounds]
> [all 2022-10-25 16:04:22]   945 |           emit_move_insn (operands[2], operands[3]);
> [all 2022-10-25 16:04:22]       |           ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
> [all 2022-10-25 16:04:22] ../../gcc/gcc/config/xtensa/xtensa.md:897:9: note: while referencing 'operands'
> [all 2022-10-25 16:04:22]   897 |    (set_attr "mode"     "SF")
> [all 2022-10-25 16:04:22]       |         ^~~~~~~~
> [all 2022-10-25 16:04:22] ../../gcc/gcc/config/xtensa/xtensa.md:945:26: error: array subscript 2 is above array bounds of 'rtx_def* [2]' [-Werror=array-bounds]
> [all 2022-10-25 16:04:22]   945 |           emit_move_insn (operands[2], operands[3]);
> [all 2022-10-25 16:04:22]       |           ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
> [all 2022-10-25 16:04:22] ../../gcc/gcc/config/xtensa/xtensa.md:897:9: note: while referencing 'operands'
> [all 2022-10-25 16:04:22]   897 |    (set_attr "mode"     "SF")
> [all 2022-10-25 16:04:22]       |         ^~~~~~~~
> I didn't yet actually check the warning, it may be bogus.

Thanks for your report.  I'm aware of that warning.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH] xtensa: Fix out-of-bounds array access
  2022-10-25 20:09                       ` Jan-Benedict Glaw
  2022-10-26  3:23                         ` Takayuki 'January June' Suwa
@ 2022-10-26  6:27                         ` Takayuki 'January June' Suwa
  2022-10-26 17:05                           ` Max Filippov
  1 sibling, 1 reply; 20+ messages in thread
From: Takayuki 'January June' Suwa @ 2022-10-26  6:27 UTC (permalink / raw)
  To: GCC Patches; +Cc: Max Filippov, Jan-Benedict Glaw

On 2022/10/26 5:09, Jan-Benedict Glaw wrote:
> I didn't yet actually check the warning, it may be bogus.

This "problem" can occur in the following two places calling xtensa_split_DI_reg_imm():

- (define_expand "movdi") @ line 943-945
- (define_split) @ line 989

and the former causes the "real" problem:

[from gcc/insn-emit.cc (generated by building)]

> /* ../../gcc/config/xtensa/xtensa.md:932 */
> rtx
> gen_movdi (rtx operand0,
> 	rtx operand1)
> {
>   rtx_insn *_val = 0;
>   start_sequence ();
>   {
>     rtx operands[2];					// only 2 elements
>     operands[0] = operand0;
>     operands[1] = operand1;
> #define FAIL return (end_sequence (), _val)
> #define DONE return (_val = get_insns (), end_sequence (), _val)
> #line 936 "../../gcc/config/xtensa/xtensa.md"
> {
>   if (CONSTANT_P (operands[1]))
>     {
>       /* Split in halves if 64-bit Const-to-Reg moves
> 	 because of offering further optimization opportunities.  */
>       if (register_operand (operands[0], DImode))
> 	{
> 	  xtensa_split_DI_reg_imm (operands);		// out-of-bounds!
> 	  emit_move_insn (operands[0], operands[1]);
> 	  emit_move_insn (operands[2], operands[3]);	// out-of-bounds!
> 	  DONE;
> 	}

The latter is not a problem as the array is large enough (up to MAX_RECOG_OPERANDS-1).

===

gcc/ChangeLog:

	* config/xtensa/xtensa.md (movdi):
	Copy operands[0...1] to ops[0...3] and then use the latter before
	calling xtensa_split_DI_reg_imm() and emitting insns.
---
 gcc/config/xtensa/xtensa.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 2e7f76ada5c..de9bcbf24f7 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -940,9 +940,10 @@
 	 because of offering further optimization opportunities.  */
       if (register_operand (operands[0], DImode))
 	{
-	  xtensa_split_DI_reg_imm (operands);
-	  emit_move_insn (operands[0], operands[1]);
-	  emit_move_insn (operands[2], operands[3]);
+	  rtx ops[4] = { operands[0], operands[1] };
+	  xtensa_split_DI_reg_imm (ops);
+	  emit_move_insn (ops[0], ops[1]);
+	  emit_move_insn (ops[2], ops[3]);
 	  DONE;
 	}
 
-- 
2.30.2

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] xtensa: Fix out-of-bounds array access
  2022-10-26  6:27                         ` [PATCH] xtensa: Fix out-of-bounds array access Takayuki 'January June' Suwa
@ 2022-10-26 17:05                           ` Max Filippov
  0 siblings, 0 replies; 20+ messages in thread
From: Max Filippov @ 2022-10-26 17:05 UTC (permalink / raw)
  To: Takayuki 'January June' Suwa; +Cc: GCC Patches, Jan-Benedict Glaw

On Tue, Oct 25, 2022 at 11:27 PM Takayuki 'January June' Suwa
<jjsuwa_sys3175@yahoo.co.jp> wrote:
>
> On 2022/10/26 5:09, Jan-Benedict Glaw wrote:
> > I didn't yet actually check the warning, it may be bogus.
>
> This "problem" can occur in the following two places calling xtensa_split_DI_reg_imm():
>
> - (define_expand "movdi") @ line 943-945
> - (define_split) @ line 989
>
> and the former causes the "real" problem:
>
> [from gcc/insn-emit.cc (generated by building)]
>
> > /* ../../gcc/config/xtensa/xtensa.md:932 */
> > rtx
> > gen_movdi (rtx operand0,
> >       rtx operand1)
> > {
> >   rtx_insn *_val = 0;
> >   start_sequence ();
> >   {
> >     rtx operands[2];                                  // only 2 elements
> >     operands[0] = operand0;
> >     operands[1] = operand1;
> > #define FAIL return (end_sequence (), _val)
> > #define DONE return (_val = get_insns (), end_sequence (), _val)
> > #line 936 "../../gcc/config/xtensa/xtensa.md"
> > {
> >   if (CONSTANT_P (operands[1]))
> >     {
> >       /* Split in halves if 64-bit Const-to-Reg moves
> >        because of offering further optimization opportunities.  */
> >       if (register_operand (operands[0], DImode))
> >       {
> >         xtensa_split_DI_reg_imm (operands);           // out-of-bounds!
> >         emit_move_insn (operands[0], operands[1]);
> >         emit_move_insn (operands[2], operands[3]);    // out-of-bounds!
> >         DONE;
> >       }
>
> The latter is not a problem as the array is large enough (up to MAX_RECOG_OPERANDS-1).
>
> ===
>
> gcc/ChangeLog:
>
>         * config/xtensa/xtensa.md (movdi):
>         Copy operands[0...1] to ops[0...3] and then use the latter before
>         calling xtensa_split_DI_reg_imm() and emitting insns.
> ---
>  gcc/config/xtensa/xtensa.md | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)

Committed to master as obvious after cleaning up
the commit message.

-- 
Thanks.
-- Max

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-10-26 17:05 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-03  1:35 [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))" Takayuki 'January June' Suwa
2022-08-03  7:52 ` Richard Sandiford
2022-08-03 11:17   ` Takayuki 'January June' Suwa
2022-08-04  9:49     ` Richard Sandiford
2022-08-04 12:35       ` Takayuki 'January June' Suwa
2022-08-05 16:20         ` Jeff Law
2022-10-14 11:06           ` [PATCH] xtensa: Prepare the transition from Reload to LRA Takayuki 'January June' Suwa
2022-10-16  5:03             ` Max Filippov
2022-10-18  2:57               ` [PATCH v2] " Takayuki 'January June' Suwa
2022-10-18  3:14                 ` Max Filippov
2022-10-18 12:16                   ` Max Filippov
2022-10-19  8:16                     ` [PATCH v3] " Takayuki 'January June' Suwa
2022-10-19 11:31                       ` Max Filippov
2022-10-25 20:09                       ` Jan-Benedict Glaw
2022-10-26  3:23                         ` Takayuki 'January June' Suwa
2022-10-26  6:27                         ` [PATCH] xtensa: Fix out-of-bounds array access Takayuki 'January June' Suwa
2022-10-26 17:05                           ` Max Filippov
2022-08-05 16:12       ` [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))" Jeff Law
2022-08-03 17:23   ` Jeff Law
2022-08-04  9:39     ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).