From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id F066A385840F for ; Mon, 13 Jun 2022 08:40:42 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org F066A385840F Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id BA88ED6E; Mon, 13 Jun 2022 01:40:42 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.37]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 5DA7E3F792; Mon, 13 Jun 2022 01:40:41 -0700 (PDT) From: Richard Sandiford To: Tamar Christina Mail-Followup-To: Tamar Christina , gcc-patches@gcc.gnu.org, nd@arm.com, rguenther@suse.de, richard.sandiford@arm.com Cc: gcc-patches@gcc.gnu.org, nd@arm.com, rguenther@suse.de Subject: Re: [PATCH]middle-end Use subregs to expand COMPLEX_EXPR to set the lowpart. References: Date: Mon, 13 Jun 2022 09:40:39 +0100 In-Reply-To: (Tamar Christina's message of "Thu, 9 Jun 2022 08:52:35 +0100") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-58.5 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, KAM_LOTSOFHASH, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 13 Jun 2022 08:40:44 -0000 Tamar Christina writes: > Hi All, > > When lowering COMPLEX_EXPR we currently emit two VEC_EXTRACTs. One for the > lowpart and one for the highpart. > > The problem with this is that in RTL the lvalue of the RTX is the only thing > tying the two instructions together. > > This means that e.g. combine is unable to try to combine the two instructions > for setting the lowpart and highpart. > > For ISAs that have bit extract instructions we can eliminate one of the extracts > if, and only if we're setting the entire complex number. > > This change changes the expand code when we're setting the entire complex number > to generate a subreg for the lowpart instead of a vec_extract. > > This allows us to optimize sequences such as: > > _Complex int f(int a, int b) { > _Complex int t = a + b * 1i; > return t; > } > > from: > > f: > bfi x2, x0, 0, 32 > bfi x2, x1, 32, 32 > mov x0, x2 > ret > > into: > > f: > bfi x0, x1, 32, 32 > ret > > I have also confirmed the codegen for x86_64 did not change. > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu > and no issues. > > Ok for master? I'm not sure this is endian-safe. For big-endian it's the imaginary part that can be written as a subreg. The real part might be the high part of a register. Maybe a more general way to handle this would be to add (yet another) parameter to store_bit_field that indicates that the current value of the structure is undefined. That would also be useful in at least one other caller (from calls.cc). write_complex_part could then have a similar parameter, true for the first write and false for the second. Thanks, Richard > > Thanks, > Tamar > > gcc/ChangeLog: > > * emit-rtl.cc (validate_subreg): Accept subregs of complex modes. > * expr.cc (emit_move_complex_parts): Emit subreg of lowpart if possible. > > gcc/testsuite/ChangeLog: > > * g++.target/aarch64/complex-init.C: New test. > > --- inline copy of patch -- > diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc > index f4404d7abe33b565358b7f609a91114c75ecf4e7..15ffca2ffe986bca56c1fae9381bd33f5d6b012d 100644 > --- a/gcc/emit-rtl.cc > +++ b/gcc/emit-rtl.cc > @@ -947,9 +947,11 @@ validate_subreg (machine_mode omode, machine_mode imode, > && GET_MODE_INNER (omode) == GET_MODE_INNER (imode)) > ; > /* Subregs involving floating point modes are not allowed to > - change size. Therefore (subreg:DI (reg:DF) 0) is fine, but > + change size unless it's an insert into a complex mode. > + Therefore (subreg:DI (reg:DF) 0) and (subreg:CS (reg:SF) 0) are fine, but > (subreg:SI (reg:DF) 0) isn't. */ > - else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode)) > + else if ((FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode)) > + && !COMPLEX_MODE_P (omode)) > { > if (! (known_eq (isize, osize) > /* LRA can use subreg to store a floating point value in > diff --git a/gcc/expr.cc b/gcc/expr.cc > index 5f7142b975ada2cd8b00663d35ba1e0004b8e28d..fce672c236fdbc4d40adb6e2614c234c02a61933 100644 > --- a/gcc/expr.cc > +++ b/gcc/expr.cc > @@ -3740,7 +3740,17 @@ emit_move_complex_parts (rtx x, rtx y) > && REG_P (x) && !reg_overlap_mentioned_p (x, y)) > emit_clobber (x); > > - write_complex_part (x, read_complex_part (y, false), false); > + /* If we're writing the entire value using a concat into a register > + then emit the lower part as a simple mov followed by an insert > + into the top part. */ > + if (GET_CODE (y) == CONCAT && !reload_completed && REG_P (x)) > + { > + rtx val = XEXP (y, false); > + rtx dest = lowpart_subreg (GET_MODE (val), x, GET_MODE (x)); > + emit_move_insn (dest, val); > + } > + else > + write_complex_part (x, read_complex_part (y, false), false); > write_complex_part (x, read_complex_part (y, true), true); > > return get_last_insn (); > diff --git a/gcc/testsuite/g++.target/aarch64/complex-init.C b/gcc/testsuite/g++.target/aarch64/complex-init.C > new file mode 100644 > index 0000000000000000000000000000000000000000..497cc4bca3e2c59da95c871ceb5cc96216fc302d > --- /dev/null > +++ b/gcc/testsuite/g++.target/aarch64/complex-init.C > @@ -0,0 +1,40 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */ > + > +/* > +** _Z1fii: > +** ... > +** bfi x0, x1, 32, 32 > +** ret > +** ... > +*/ > +_Complex int f(int a, int b) { > + _Complex int t = a + b * 1i; > + return t; > +} > + > +/* > +** _Z2f2ii: > +** ... > +** bfi x0, x1, 32, 32 > +** ret > +** ... > +*/ > +_Complex int f2(int a, int b) { > + _Complex int t = {a, b}; > + return t; > +} > + > +/* > +** _Z12f_convolutedii: > +** ... > +** bfi x0, x1, 32, 32 > +** ret > +** ... > +*/ > +_Complex int f_convoluted(int a, int b) { > + _Complex int t = (_Complex int)a; > + __imag__ t = b; > + return t; > +}