From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.sandiford@arm.com>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by sourceware.org (Postfix) with ESMTP id F066A385840F
 for <gcc-patches@gcc.gnu.org>; Mon, 13 Jun 2022 08:40:42 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org F066A385840F
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id BA88ED6E;
 Mon, 13 Jun 2022 01:40:42 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.37])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 5DA7E3F792;
 Mon, 13 Jun 2022 01:40:41 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: Tamar Christina <tamar.christina@arm.com>
Mail-Followup-To: Tamar Christina <tamar.christina@arm.com>,
 gcc-patches@gcc.gnu.org, nd@arm.com, rguenther@suse.de,
 richard.sandiford@arm.com
Cc: gcc-patches@gcc.gnu.org,  nd@arm.com,  rguenther@suse.de
Subject: Re: [PATCH]middle-end Use subregs to expand COMPLEX_EXPR to set the
 lowpart.
References: <patch-15778-tamar@arm.com>
Date: Mon, 13 Jun 2022 09:40:39 +0100
In-Reply-To: <patch-15778-tamar@arm.com> (Tamar Christina's message of "Thu, 9
 Jun 2022 08:52:35 +0100")
Message-ID: <mptbkuxdkhk.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Status: No, score=-58.5 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, KAM_LOTSOFHASH,
 SPF_HELO_NONE, SPF_NONE, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Mon, 13 Jun 2022 08:40:44 -0000

Tamar Christina <tamar.christina@arm.com> writes:
> Hi All,
>
> When lowering COMPLEX_EXPR we currently emit two VEC_EXTRACTs.  One for the
> lowpart and one for the highpart.
>
> The problem with this is that in RTL the lvalue of the RTX is the only thing
> tying the two instructions together.
>
> This means that e.g. combine is unable to try to combine the two instructions
> for setting the lowpart and highpart.
>
> For ISAs that have bit extract instructions we can eliminate one of the extracts
> if, and only if we're setting the entire complex number.
>
> This change changes the expand code when we're setting the entire complex number
> to generate a subreg for the lowpart instead of a vec_extract.
>
> This allows us to optimize sequences such as:
>
> _Complex int f(int a, int b) {
>     _Complex int t = a + b * 1i;
>     return t;
> }
>
> from:
>
> f:
> 	bfi     x2, x0, 0, 32
> 	bfi     x2, x1, 32, 32
> 	mov     x0, x2
> 	ret
>
> into:
>
> f:
> 	bfi	x0, x1, 32, 32
> 	ret
>
> I have also confirmed the codegen for x86_64 did not change.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues.
>
> Ok for master?

I'm not sure this is endian-safe.  For big-endian it's the imaginary
part that can be written as a subreg.  The real part might be the
high part of a register.

Maybe a more general way to handle this would be to add (yet another)
parameter to store_bit_field that indicates that the current value
of the structure is undefined.  That would also be useful in at
least one other caller (from calls.cc).  write_complex_part could
then have a similar parameter, true for the first write and false
for the second.

Thanks,
Richard

>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	* emit-rtl.cc (validate_subreg): Accept subregs of complex modes.
> 	* expr.cc (emit_move_complex_parts): Emit subreg of lowpart if possible.
>
> gcc/testsuite/ChangeLog:
>
> 	* g++.target/aarch64/complex-init.C: New test.
>
> --- inline copy of patch -- 
> diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
> index f4404d7abe33b565358b7f609a91114c75ecf4e7..15ffca2ffe986bca56c1fae9381bd33f5d6b012d 100644
> --- a/gcc/emit-rtl.cc
> +++ b/gcc/emit-rtl.cc
> @@ -947,9 +947,11 @@ validate_subreg (machine_mode omode, machine_mode imode,
>  	   && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
>      ;
>    /* Subregs involving floating point modes are not allowed to
> -     change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
> +     change size unless it's an insert into a complex mode.
> +     Therefore (subreg:DI (reg:DF) 0) and (subreg:CS (reg:SF) 0) are fine, but
>       (subreg:SI (reg:DF) 0) isn't.  */
> -  else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> +  else if ((FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> +	   && !COMPLEX_MODE_P (omode))
>      {
>        if (! (known_eq (isize, osize)
>  	     /* LRA can use subreg to store a floating point value in
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 5f7142b975ada2cd8b00663d35ba1e0004b8e28d..fce672c236fdbc4d40adb6e2614c234c02a61933 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -3740,7 +3740,17 @@ emit_move_complex_parts (rtx x, rtx y)
>        && REG_P (x) && !reg_overlap_mentioned_p (x, y))
>      emit_clobber (x);
>  
> -  write_complex_part (x, read_complex_part (y, false), false);
> +  /* If we're writing the entire value using a concat into a register
> +     then emit the lower part as a simple mov followed by an insert
> +     into the top part.  */
> +  if (GET_CODE (y) == CONCAT && !reload_completed && REG_P (x))
> +    {
> +      rtx val = XEXP (y, false);
> +      rtx dest = lowpart_subreg (GET_MODE (val), x, GET_MODE (x));
> +      emit_move_insn (dest, val);
> +    }
> +  else
> +    write_complex_part (x, read_complex_part (y, false), false);
>    write_complex_part (x, read_complex_part (y, true), true);
>  
>    return get_last_insn ();
> diff --git a/gcc/testsuite/g++.target/aarch64/complex-init.C b/gcc/testsuite/g++.target/aarch64/complex-init.C
> new file mode 100644
> index 0000000000000000000000000000000000000000..497cc4bca3e2c59da95c871ceb5cc96216fc302d
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/aarch64/complex-init.C
> @@ -0,0 +1,40 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> +
> +/*
> +** _Z1fii:
> +** ...
> +** 	bfi	x0, x1, 32, 32
> +** 	ret
> +** ...
> +*/
> +_Complex int f(int a, int b) {
> +    _Complex int t = a + b * 1i;
> +    return t;
> +}
> +
> +/*
> +** _Z2f2ii:
> +** ...
> +** 	bfi	x0, x1, 32, 32
> +** 	ret
> +** ...
> +*/
> +_Complex int f2(int a, int b) {
> +    _Complex int t = {a, b};
> +    return t;
> +}
> +
> +/*
> +** _Z12f_convolutedii:
> +** ...
> +** 	bfi	x0, x1, 32, 32
> +** 	ret
> +** ...
> +*/
> +_Complex int f_convoluted(int a, int b) {
> +    _Complex int t = (_Complex int)a;
> +    __imag__ t = b;
> +    return t;
> +}