From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.sandiford@arm.com>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by sourceware.org (Postfix) with ESMTP id 0FC273858CDB
 for <gcc-patches@gcc.gnu.org>; Wed,  3 Aug 2022 07:52:13 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 0FC273858CDB
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 2E8DC13D5;
 Wed,  3 Aug 2022 00:52:13 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.37])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 081F33F73B;
 Wed,  3 Aug 2022 00:52:11 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: Takayuki 'January June' Suwa via Gcc-patches <gcc-patches@gcc.gnu.org>
Mail-Followup-To: Takayuki 'January June' Suwa via Gcc-patches
 <gcc-patches@gcc.gnu.org>,
 Takayuki 'January June' Suwa <jjsuwa_sys3175@yahoo.co.jp>,
 richard.sandiford@arm.com
Subject: Re: [PATCH] lower-subreg,
 expr: Mitigate inefficiencies derived from "(clobber (reg X))"
 followed by "(set (subreg (reg X)) (...))"
References: <ea289929-5955-d366-9341-34b95f9ceb57@yahoo.co.jp>
Date: Wed, 03 Aug 2022 08:52:10 +0100
In-Reply-To: <ea289929-5955-d366-9341-34b95f9ceb57@yahoo.co.jp> (Takayuki Suwa
 via Gcc-patches's message of "Wed, 3 Aug 2022 10:35:19 +0900")
Message-ID: <mpt4jytpxw5.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Status: No, score=-52.7 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, SPF_HELO_NONE,
 SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 03 Aug 2022 07:52:15 -0000

Takayuki 'January June' Suwa via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Emitting "(clobber (reg X))" before "(set (subreg (reg X)) (...))" keeps
> data flow consistent, but it also increases register allocation pressure
> and thus often creates many unwanted register-to-register moves that
> cannot be optimized away.

There are two things here:

- If emit_move_complex_parts emits a clobber of a hard register,
  then that's probably a bug/misfeature.  The point of the clobber is
  to indicate that the register has no useful contents.  That's useful
  for wide pseudos that are written to in parts, since it avoids the
  need to track the liveness of each part of the pseudo individually.
  But it shouldn't be necessary for hard registers, since subregs of
  hard registers are simplified to hard registers wherever possible
  (which on most targets is "always").

  So I think the emit_move_complex_parts clobber should be restricted
  to !HARD_REGISTER_P, like the lower-subreg clobber is.  If that helps
  (if only partly) then it would be worth doing as its own patch.

- I think it'd be worth looking into more detail why a clobber makes
  a difference to register pressure.  A clobber of a pseudo register R
  shouldn't make R conflict with things that are live at the point of
  the clobber.

>  It seems just analogous to partial register
> stall which is a famous problem on processors that do register renaming.
>
> In my opinion, when the register to be clobbered is a composite of hard
> ones, we should clobber the individual elements separetely, otherwise
> clear the entire to zero prior to use as the "init-regs" pass does (like
> partial register stall workarounds on x86 CPUs).  Such redundant zero
> constant assignments will be removed later in the "cprop_hardreg" pass.

I don't think we should rely on the zero being optimised away later.

Emitting the zero also makes it harder for the register allocator
to elide the move.  For example, if we have:

  (set (subreg:SI (reg:DI P) 0) (reg:SI R0))
  (set (subreg:SI (reg:DI P) 4) (reg:SI R1))

then there is at least a chance that the RA could assign hard registers
R0:R1 to P, which would turn the moves into nops.  If we emit:

  (set (reg:DI P) (const_int 0))

beforehand then that becomes impossible, since R0 and R1 would then
conflict with P.

TBH I'm surprised we still run init_regs for LRA.  I thought there was
a plan to stop doing that, but perhaps I misremember.

Thanks,
Richard

> This patch may give better output code quality for the reasons above,
> especially on architectures that don't have DFmode hard registers
> (On architectures with such hard registers, this patch changes virtually
> nothing).
>
> For example (Espressif ESP8266, Xtensa without FP hard regs):
>
>     /* example */
>     double _Complex conjugate(double _Complex z) {
>       __imag__(z) *= -1;
>       return z;
>     }
>
>     ;; before
>     conjugate:
>         movi.n  a6, -1
>         slli    a6, a6, 31
>         mov.n   a8, a2
>         mov.n   a9, a3
>         mov.n   a7, a4
>         xor     a6, a5, a6
>         mov.n   a2, a8
>         mov.n   a3, a9
>         mov.n   a4, a7
>         mov.n   a5, a6
>         ret.n
>
>     ;; after
>     conjugate:
>         movi.n  a6, -1
>         slli    a6, a6, 31
>         xor     a6, a5, a6
>         mov.n   a5, a6
>         ret.n
>
> gcc/ChangeLog:
>
> 	* lower-subreg.cc (resolve_simple_move):
> 	Add zero clear of the entire register immediately after
> 	the clobber.
> 	* expr.cc (emit_move_complex_parts):
> 	Change to clobber the real and imaginary parts separately
> 	instead of the whole complex register if possible.
> ---
>  gcc/expr.cc         | 26 ++++++++++++++++++++------
>  gcc/lower-subreg.cc |  7 ++++++-
>  2 files changed, 26 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 80bb1b8a4c5..9732e8fd4e5 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -3775,15 +3775,29 @@ emit_move_complex_push (machine_mode mode, rtx x, rtx y)
>  rtx_insn *
>  emit_move_complex_parts (rtx x, rtx y)
>  {
> -  /* Show the output dies here.  This is necessary for SUBREGs
> -     of pseudos since we cannot track their lifetimes correctly;
> -     hard regs shouldn't appear here except as return values.  */
> -  if (!reload_completed && !reload_in_progress
> -      && REG_P (x) && !reg_overlap_mentioned_p (x, y))
> -    emit_clobber (x);
> +  rtx_insn *re_insn, *im_insn;
>  
>    write_complex_part (x, read_complex_part (y, false), false, true);
> +  re_insn = get_last_insn ();
>    write_complex_part (x, read_complex_part (y, true), true, false);
> +  im_insn = get_last_insn ();
> +
> +  /* Show the output dies here.  This is necessary for SUBREGs
> +     of pseudos since we cannot track their lifetimes correctly.  */
> +  if (can_create_pseudo_p ()
> +      && REG_P (x) && ! reg_overlap_mentioned_p (x, y))
> +    {
> +      /* Hard regs shouldn't appear here except as return values.  */
> +      if (HARD_REGISTER_P (x) && REG_NREGS (x) % 2 == 0)
> +	{
> +	  emit_insn_before (gen_clobber (SET_DEST (PATTERN (re_insn))),
> +			    re_insn);
> +	  emit_insn_before (gen_clobber (SET_DEST (PATTERN (im_insn))),
> +			    im_insn);
> +	}
> +      else
> +	emit_insn_before (gen_clobber (x), re_insn);
> +    }
>  
>    return get_last_insn ();
>  }
> diff --git a/gcc/lower-subreg.cc b/gcc/lower-subreg.cc
> index 03e9326c663..4ff0a7d1556 100644
> --- a/gcc/lower-subreg.cc
> +++ b/gcc/lower-subreg.cc
> @@ -1086,7 +1086,12 @@ resolve_simple_move (rtx set, rtx_insn *insn)
>        unsigned int i;
>  
>        if (REG_P (dest) && !HARD_REGISTER_NUM_P (REGNO (dest)))
> -	emit_clobber (dest);
> +	{
> +	  emit_clobber (dest);
> +	  /* We clear the entire of dest with zero after the clobber,
> +	     similar to the "init-regs" pass.  */
> +	  emit_move_insn (dest, CONST0_RTX (GET_MODE (dest)));
> +	}
>  
>        for (i = 0; i < words; ++i)
>  	{