Re: [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))"

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Takayuki 'January June' Suwa <jjsuwa_sys3175@yahoo.co.jp>
To: Richard Sandiford <richard.sandiford@arm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))"
Date: Wed, 3 Aug 2022 20:17:23 +0900	[thread overview]
Message-ID: <7e3fe210-6dbc-fc29-dbb8-b951e89cf7e9@yahoo.co.jp> (raw)
In-Reply-To: <mpt4jytpxw5.fsf@arm.com>

Thanks for your response.

On 2022/08/03 16:52, Richard Sandiford wrote:
> Takayuki 'January June' Suwa via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> Emitting "(clobber (reg X))" before "(set (subreg (reg X)) (...))" keeps
>> data flow consistent, but it also increases register allocation pressure
>> and thus often creates many unwanted register-to-register moves that
>> cannot be optimized away.
> 
> There are two things here:
> 
> - If emit_move_complex_parts emits a clobber of a hard register,
>   then that's probably a bug/misfeature.  The point of the clobber is
>   to indicate that the register has no useful contents.  That's useful
>   for wide pseudos that are written to in parts, since it avoids the
>   need to track the liveness of each part of the pseudo individually.
>   But it shouldn't be necessary for hard registers, since subregs of
>   hard registers are simplified to hard registers wherever possible
>   (which on most targets is "always").
> 
>   So I think the emit_move_complex_parts clobber should be restricted
>   to !HARD_REGISTER_P, like the lower-subreg clobber is.  If that helps
>   (if only partly) then it would be worth doing as its own patch.
> 
> - I think it'd be worth looking into more detail why a clobber makes
>   a difference to register pressure.  A clobber of a pseudo register R
>   shouldn't make R conflict with things that are live at the point of
>   the clobber.

I agree with its worth.
In fact, aside from other ports, on the xtensa one, RA in code with frequent D[FC]mode pseudos is terribly bad.
For example, in __muldc3 on libgcc2, the size of the stack frame reserved will almost double depending on whether or not this patch is applied.

> 
>>  It seems just analogous to partial register
>> stall which is a famous problem on processors that do register renaming.
>>
>> In my opinion, when the register to be clobbered is a composite of hard
>> ones, we should clobber the individual elements separetely, otherwise
>> clear the entire to zero prior to use as the "init-regs" pass does (like
>> partial register stall workarounds on x86 CPUs).  Such redundant zero
>> constant assignments will be removed later in the "cprop_hardreg" pass.
> 
> I don't think we should rely on the zero being optimised away later.
> 
> Emitting the zero also makes it harder for the register allocator
> to elide the move.  For example, if we have:
> 
>   (set (subreg:SI (reg:DI P) 0) (reg:SI R0))
>   (set (subreg:SI (reg:DI P) 4) (reg:SI R1))
> 
> then there is at least a chance that the RA could assign hard registers
> R0:R1 to P, which would turn the moves into nops.  If we emit:
> 
>   (set (reg:DI P) (const_int 0))
> 
> beforehand then that becomes impossible, since R0 and R1 would then
> conflict with P.

Ah, surely, as you pointed out for targets where "(reg: DI)" corresponds to one hard register.

> 
> TBH I'm surprised we still run init_regs for LRA.  I thought there was
> a plan to stop doing that, but perhaps I misremember.

Sorry I am not sure about the status of LRA... because the xtensa port is still using reload.

As conclusion, trying to tweak the common code side may have been a bit premature.
I'll consider if I can deal with those issues on the side of the target-specific code.

> 
> Thanks,
> Richard
> 
>> This patch may give better output code quality for the reasons above,
>> especially on architectures that don't have DFmode hard registers
>> (On architectures with such hard registers, this patch changes virtually
>> nothing).
>>
>> For example (Espressif ESP8266, Xtensa without FP hard regs):
>>
>>     /* example */
>>     double _Complex conjugate(double _Complex z) {
>>       __imag__(z) *= -1;
>>       return z;
>>     }
>>
>>     ;; before
>>     conjugate:
>>         movi.n  a6, -1
>>         slli    a6, a6, 31
>>         mov.n   a8, a2
>>         mov.n   a9, a3
>>         mov.n   a7, a4
>>         xor     a6, a5, a6
>>         mov.n   a2, a8
>>         mov.n   a3, a9
>>         mov.n   a4, a7
>>         mov.n   a5, a6
>>         ret.n
>>
>>     ;; after
>>     conjugate:
>>         movi.n  a6, -1
>>         slli    a6, a6, 31
>>         xor     a6, a5, a6
>>         mov.n   a5, a6
>>         ret.n
>>
>> gcc/ChangeLog:
>>
>> 	* lower-subreg.cc (resolve_simple_move):
>> 	Add zero clear of the entire register immediately after
>> 	the clobber.
>> 	* expr.cc (emit_move_complex_parts):
>> 	Change to clobber the real and imaginary parts separately
>> 	instead of the whole complex register if possible.
>> ---
>>  gcc/expr.cc         | 26 ++++++++++++++++++++------
>>  gcc/lower-subreg.cc |  7 ++++++-
>>  2 files changed, 26 insertions(+), 7 deletions(-)
>>
>> diff --git a/gcc/expr.cc b/gcc/expr.cc
>> index 80bb1b8a4c5..9732e8fd4e5 100644
>> --- a/gcc/expr.cc
>> +++ b/gcc/expr.cc
>> @@ -3775,15 +3775,29 @@ emit_move_complex_push (machine_mode mode, rtx x, rtx y)
>>  rtx_insn *
>>  emit_move_complex_parts (rtx x, rtx y)
>>  {
>> -  /* Show the output dies here.  This is necessary for SUBREGs
>> -     of pseudos since we cannot track their lifetimes correctly;
>> -     hard regs shouldn't appear here except as return values.  */
>> -  if (!reload_completed && !reload_in_progress
>> -      && REG_P (x) && !reg_overlap_mentioned_p (x, y))
>> -    emit_clobber (x);
>> +  rtx_insn *re_insn, *im_insn;
>>  
>>    write_complex_part (x, read_complex_part (y, false), false, true);
>> +  re_insn = get_last_insn ();
>>    write_complex_part (x, read_complex_part (y, true), true, false);
>> +  im_insn = get_last_insn ();
>> +
>> +  /* Show the output dies here.  This is necessary for SUBREGs
>> +     of pseudos since we cannot track their lifetimes correctly.  */
>> +  if (can_create_pseudo_p ()
>> +      && REG_P (x) && ! reg_overlap_mentioned_p (x, y))
>> +    {
>> +      /* Hard regs shouldn't appear here except as return values.  */
>> +      if (HARD_REGISTER_P (x) && REG_NREGS (x) % 2 == 0)
>> +	{
>> +	  emit_insn_before (gen_clobber (SET_DEST (PATTERN (re_insn))),
>> +			    re_insn);
>> +	  emit_insn_before (gen_clobber (SET_DEST (PATTERN (im_insn))),
>> +			    im_insn);
>> +	}
>> +      else
>> +	emit_insn_before (gen_clobber (x), re_insn);
>> +    }
>>  
>>    return get_last_insn ();
>>  }
>> diff --git a/gcc/lower-subreg.cc b/gcc/lower-subreg.cc
>> index 03e9326c663..4ff0a7d1556 100644
>> --- a/gcc/lower-subreg.cc
>> +++ b/gcc/lower-subreg.cc
>> @@ -1086,7 +1086,12 @@ resolve_simple_move (rtx set, rtx_insn *insn)
>>        unsigned int i;
>>  
>>        if (REG_P (dest) && !HARD_REGISTER_NUM_P (REGNO (dest)))
>> -	emit_clobber (dest);
>> +	{
>> +	  emit_clobber (dest);
>> +	  /* We clear the entire of dest with zero after the clobber,
>> +	     similar to the "init-regs" pass.  */
>> +	  emit_move_insn (dest, CONST0_RTX (GET_MODE (dest)));
>> +	}
>>  
>>        for (i = 0; i < words; ++i)
>>  	{

next prev parent reply	other threads:[~2022-08-03 11:17 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-03  1:35 Takayuki 'January June' Suwa
2022-08-03  7:52 ` Richard Sandiford
2022-08-03 11:17   ` Takayuki 'January June' Suwa [this message]
2022-08-04  9:49     ` Richard Sandiford
2022-08-04 12:35       ` Takayuki 'January June' Suwa
2022-08-05 16:20         ` Jeff Law
2022-10-14 11:06           ` [PATCH] xtensa: Prepare the transition from Reload to LRA Takayuki 'January June' Suwa
2022-10-16  5:03             ` Max Filippov
2022-10-18  2:57               ` [PATCH v2] " Takayuki 'January June' Suwa
2022-10-18  3:14                 ` Max Filippov
2022-10-18 12:16                   ` Max Filippov
2022-10-19  8:16                     ` [PATCH v3] " Takayuki 'January June' Suwa
2022-10-19 11:31                       ` Max Filippov
2022-10-25 20:09                       ` Jan-Benedict Glaw
2022-10-26  3:23                         ` Takayuki 'January June' Suwa
2022-10-26  6:27                         ` [PATCH] xtensa: Fix out-of-bounds array access Takayuki 'January June' Suwa
2022-10-26 17:05                           ` Max Filippov
2022-08-05 16:12       ` [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))" Jeff Law
2022-08-03 17:23   ` Jeff Law
2022-08-04  9:39     ` Richard Sandiford

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7e3fe210-6dbc-fc29-dbb8-b951e89cf7e9@yahoo.co.jp \
    --to=jjsuwa_sys3175@yahoo.co.jp \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=richard.sandiford@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).