From mboxrd@z Thu Jan 1 00:00:00 1970
Received: by sourceware.org (Postfix, from userid 1251)
	id 1D167386DC7E; Mon, 27 Jun 2022 06:49:23 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 1D167386DC7E
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="utf-8"
From: Roger Sayle
To: gcc-cvs@gcc.gnu.org
Subject: [gcc r13-1282] Implement __imag__ of float _Complex using shufps on x86_64.
X-Act-Checkin: gcc
X-Git-Author: Roger Sayle
X-Git-Refname: refs/heads/master
X-Git-Oldrev: f3f73e86ec8613f176db3e52bbfbfbb9636cb714
X-Git-Newrev: 64d4f27a0ce47e97867512bda7fa5683acf8a134
Message-Id: <20220627064923.1D167386DC7E@sourceware.org>
Date: Mon, 27 Jun 2022 06:49:23 +0000 (GMT)
X-BeenThere: gcc-cvs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-cvs mailing list
X-List-Received-Date: Mon, 27 Jun 2022 06:49:23 -0000

https://gcc.gnu.org/g:64d4f27a0ce47e97867512bda7fa5683acf8a134

commit r13-1282-g64d4f27a0ce47e97867512bda7fa5683acf8a134
Author: Roger Sayle
Date:   Mon Jun 27 07:47:40 2022 +0100

    Implement __imag__ of float _Complex using shufps on x86_64.

    This patch is a follow-up improvement to my recent patch for
    PR rtl-optimization/7061.  That patch added the test case
    gcc.target/i386/pr7061-2.c:

    float im(float _Complex a) { return __imag__ a; }

    For which GCC on x86_64 currently generates:

            movq    %xmm0, %rax
            shrq    $32, %rax
            movd    %eax, %xmm0
            ret

    but with this patch we now generate (the same as LLVM):

            shufps  $85, %xmm0, %xmm0
            ret

    This is achieved by providing a define_insn_and_split that allows
    truncated lshiftrt:DI by 32 to be performed on either SSE or general
    regs, where if the register allocator prefers to use SSE, we split to
    a shufps_v4si, or if not, we use a regular shrq.

    2022-06-27  Roger Sayle

    gcc/ChangeLog
            PR rtl-optimization/7061
            * config/i386/i386.md (*highpartdisi2): New define_insn_and_split.

    gcc/testsuite/ChangeLog
            PR rtl-optimization/7061
            * gcc.target/i386/pr7061-2.c: Update to look for shufps.

Diff:
---
 gcc/config/i386/i386.md                  | 25 +++++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr7061-2.c |  4 ++++
 2 files changed, 29 insertions(+)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index dd173f78508..125a3b44a6d 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -13249,6 +13249,31 @@
 	      (const_string "*")))
    (set_attr "mode" "")])
 
+;; Specialization of *lshr3_1 below, extracting the SImode
+;; highpart of a DI to be extracted, but allowing it to be clobbered.
+(define_insn_and_split "*highpartdisi2"
+  [(set (subreg:DI (match_operand:SI 0 "register_operand" "=r,x,?k") 0)
+        (lshiftrt:DI (match_operand:DI 1 "register_operand" "0,0,k")
+		     (const_int 32)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_64BIT"
+  "#"
+  "&& reload_completed"
+  [(parallel
+    [(set (match_dup 0) (lshiftrt:DI (match_dup 1) (const_int 32)))
+     (clobber (reg:CC FLAGS_REG))])]
+{
+  if (SSE_REG_P (operands[0]))
+    {
+      rtx tmp = gen_rtx_REG (V4SImode, REGNO (operands[0]));
+      emit_insn (gen_sse_shufps_v4si (tmp, tmp, tmp,
+				      const1_rtx, const1_rtx,
+				      GEN_INT (5), GEN_INT (5)));
+      DONE;
+    }
+  operands[0] = gen_rtx_REG (DImode, REGNO (operands[0]));
+})
+
 (define_insn "*lshr3_1"
   [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,?k")
 	(lshiftrt:SWI48
diff --git a/gcc/testsuite/gcc.target/i386/pr7061-2.c b/gcc/testsuite/gcc.target/i386/pr7061-2.c
index ac33340099b..837cd83e156 100644
--- a/gcc/testsuite/gcc.target/i386/pr7061-2.c
+++ b/gcc/testsuite/gcc.target/i386/pr7061-2.c
@@ -1,5 +1,9 @@
 /* { dg-do compile { target { ! ia32 } } } */
 /* { dg-options "-O2" } */
 float im(float _Complex a) { return __imag__ a; }
+/* { dg-final { scan-assembler "shufps" } } */
+/* { dg-final { scan-assembler-not "movd" } } */
+/* { dg-final { scan-assembler-not "movq" } } */
 /* { dg-final { scan-assembler-not "movss" } } */
 /* { dg-final { scan-assembler-not "rsp" } } */
+/* { dg-final { scan-assembler-not "shr" } } */
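
As a reading aid for the split in *highpartdisi2, here is a minimal C sketch
of why the shufps form and the shrq form yield the same SImode highpart.  It
is a software model only, not code from the patch: shufps_model,
shufps_imm8, float_bits, highpart_gpr and highpart_sse are hypothetical
names introduced for illustration.  The imm8 encoding in shufps_imm8 is an
assumption based on my reading of the sse_shufps pattern, where selectors
4..7 are taken to index the second operand.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of "shufps $imm8, src, dst" on four 32-bit lanes:
   result lanes 0-1 are picked from dst, lanes 2-3 from src.  */
static void
shufps_model (uint32_t dst[4], const uint32_t src[4], unsigned imm8)
{
  uint32_t r[4];
  r[0] = dst[imm8 & 3];
  r[1] = dst[(imm8 >> 2) & 3];
  r[2] = src[(imm8 >> 4) & 3];
  r[3] = src[(imm8 >> 6) & 3];
  memcpy (dst, r, sizeof r);
}

/* Assumed encoding: selectors s2 and s3 are in the range 4..7 and refer
   to the second operand, so 4 is subtracted before packing.  */
static unsigned
shufps_imm8 (unsigned s0, unsigned s1, unsigned s2, unsigned s3)
{
  return s0 | (s1 << 2) | ((s2 - 4) << 4) | ((s3 - 4) << 6);
}

/* Bit pattern of a float, so results can be compared exactly.  */
static uint32_t
float_bits (float f)
{
  uint32_t u;
  assert (sizeof u == sizeof f);
  memcpy (&u, &f, sizeof u);
  return u;
}

/* General-register alternative: shrq $32, then use the low 32 bits.  */
static uint32_t
highpart_gpr (uint64_t di)
{
  return (uint32_t) (di >> 32);
}

/* SSE alternative: the DImode value sits in lanes 0 (low half) and 1
   (high half); broadcasting lane 1 puts the high half in lane 0.  */
static uint32_t
highpart_sse (uint64_t di)
{
  uint32_t xmm[4] = { (uint32_t) di, (uint32_t) (di >> 32), 0, 0 };
  shufps_model (xmm, xmm, shufps_imm8 (1, 1, 5, 5));
  return xmm[0];
}
```

Under these assumptions, the selectors (1, 1, 5, 5) pack to the immediate 85
seen in the commit message, and both helpers return the bits of the
imaginary part when a float _Complex is packed with its real part in the low
32 bits, as in the pr7061-2.c test case.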