From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail.loongson.cn (mail.loongson.cn [114.242.206.163]) by sourceware.org (Postfix) with ESMTP id BD3083858D20 for ; Fri, 17 Nov 2023 08:45:07 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org BD3083858D20 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=loongson.cn ARC-Filter: OpenARC Filter v1.0.0 sourceware.org BD3083858D20 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=114.242.206.163 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1700210710; cv=none; b=ozrWtRuIx2EPnUw8Kd7uK256bBFbZJ7fpjfSSDDh2z8hbMSj0YWxEwDAjQwL9W6u9do0IpWQEDxTiJO5D/SHReeCBGni/WTai4p/zAKL0RZj6uWVQ8u4MbMTRNRf6UN3EliNsMsggrNE/0mGlzO1c4jq7wP5ezoESCzbLuP1yr4= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1700210710; c=relaxed/simple; bh=5PUSfMZvFYFxY4Xb4+omDsgxTDNsvk1R3GmFtPj81a0=; h=Subject:To:From:Message-ID:Date:MIME-Version; b=uoqfW9qm1IVFMs5gQbGDpwIa5m2IIc0nndhAIonyTladxpFnrTbSJcZged0hsEe983M1G0zRrFWc//VoOyW36NH8d8/MIo0ldkCaaW6CBzxEeEM8HOQRm2aYv4u06jEx8tQ87mSGHGlbG+MeufvVO0kX0KH4qgrvZg10ZFfV6qM= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from loongson.cn (unknown [10.20.4.107]) by gateway (Coremail) with SMTP id _____8BxyeoRKFdld8M6AA--.40812S3; Fri, 17 Nov 2023 16:45:05 +0800 (CST) Received: from [10.20.4.107] (unknown [10.20.4.107]) by localhost.localdomain (Coremail) with SMTP id AQAAf8Bxni8PKFdlmidFAA--.20751S3; Fri, 17 Nov 2023 16:45:04 +0800 (CST) Subject: Re: [PATCH] LoongArch: Handle vectorized copysign (x, -1) expansion efficiently To: Xi Ruoyao , gcc-patches@gcc.gnu.org Cc: i@xen0n.name, xuchenghua@loongson.cn References: <20231113200840.339229-1-xry111@xry111.site> From: chenglulu Message-ID: Date: Fri, 17 Nov 2023 16:45:03 +0800 User-Agent: Mozilla/5.0 (X11; Linux loongarch64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0 
MIME-Version: 1.0
In-Reply-To: <20231113200840.339229-1-xry111@xry111.site>
Content-Type: text/plain; charset=gbk; format=flowed
Content-Transfer-Encoding: 8bit
Content-Language: en-US

LGTM. Thanks.

On 2023/11/14 at 4:07 AM, Xi Ruoyao wrote:
> With LSX or LASX, copysign (x[i], -1) (or any negative constant) can be
> vectorized using [x]vbitseti.{w/d} instructions to directly set the
> signbits.
>
> Inspired by Tamar Christina's "AArch64: Handle copysign (x, -1) expansion
> efficiently" (r14-5289).
>
> gcc/ChangeLog:
>
>     * config/loongarch/lsx.md (copysign3): Allow operand[2] to
>     be a reg_or_vector_same_val_operand.  If it's a const vector
>     with same negative elements, expand the copysign with a bitset
>     instruction.  Otherwise, force it into a register.
>     * config/loongarch/lasx.md (copysign3): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>     * g++.target/loongarch/vect-copysign-negconst.C: New test.
>     * g++.target/loongarch/vect-copysign-negconst-run.C: New test.
> ---
>
> Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
>
>  gcc/config/loongarch/lasx.md                  | 22 ++++++++-
>  gcc/config/loongarch/lsx.md                   | 22 ++++++++-
>  .../loongarch/vect-copysign-negconst-run.C    | 47 +++++++++++++++++++
>  .../loongarch/vect-copysign-negconst.C        | 27 +++++++++++
>  4 files changed, 116 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/loongarch/vect-copysign-negconst-run.C
>  create mode 100644 gcc/testsuite/g++.target/loongarch/vect-copysign-negconst.C
>
> diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
> index f0f2dd08dd8..2e11f061202 100644
> --- a/gcc/config/loongarch/lasx.md
> +++ b/gcc/config/loongarch/lasx.md
> @@ -3136,11 +3136,31 @@ (define_expand "copysign3"
>                     (match_operand:FLASX 1 "register_operand")))
>     (set (match_dup 5)
>          (and:FLASX (match_dup 3)
> -                   (match_operand:FLASX 2 "register_operand")))
> +                   (match_operand:FLASX 2 "reg_or_vector_same_val_operand")))
>     (set (match_operand:FLASX 0 "register_operand")
>          (ior:FLASX (match_dup 4) (match_dup 5)))]
>    "ISA_HAS_LASX"
>  {
> +  /* copysign (x, -1) should instead be expanded as setting the sign
> +     bit.  */
> +  if (!REG_P (operands[2]))
> +    {
> +      rtx op2_elt = unwrap_const_vec_duplicate (operands[2]);
> +      if (GET_CODE (op2_elt) == CONST_DOUBLE
> +          && real_isneg (CONST_DOUBLE_REAL_VALUE (op2_elt)))
> +        {
> +          rtx n = GEN_INT (8 * GET_MODE_SIZE (mode) - 1);
> +          operands[0] = lowpart_subreg (mode, operands[0],
> +                                        mode);
> +          operands[1] = lowpart_subreg (mode, operands[1],
> +                                        mode);
> +          emit_insn (gen_lasx_xvbitseti_ (operands[0],
> +                                          operands[1], n));
> +          DONE;
> +        }
> +    }
> +
> +  operands[2] = force_reg (mode, operands[2]);
>    operands[3] = loongarch_build_signbit_mask (mode, 1, 0);
>
>    operands[4] = gen_reg_rtx (mode);
> diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
> index 55c7d79a030..8ea41c85b01 100644
> --- a/gcc/config/loongarch/lsx.md
> +++ b/gcc/config/loongarch/lsx.md
> @@ -2873,11 +2873,31 @@ (define_expand "copysign3"
>                    (match_operand:FLSX 1 "register_operand")))
>     (set (match_dup 5)
>          (and:FLSX (match_dup 3)
> -                  (match_operand:FLSX 2 "register_operand")))
> +                  (match_operand:FLSX 2 "reg_or_vector_same_val_operand")))
>     (set (match_operand:FLSX 0 "register_operand")
>          (ior:FLSX (match_dup 4) (match_dup 5)))]
>    "ISA_HAS_LSX"
>  {
> +  /* copysign (x, -1) should instead be expanded as setting the sign
> +     bit.  */
> +  if (!REG_P (operands[2]))
> +    {
> +      rtx op2_elt = unwrap_const_vec_duplicate (operands[2]);
> +      if (GET_CODE (op2_elt) == CONST_DOUBLE
> +          && real_isneg (CONST_DOUBLE_REAL_VALUE (op2_elt)))
> +        {
> +          rtx n = GEN_INT (8 * GET_MODE_SIZE (mode) - 1);
> +          operands[0] = lowpart_subreg (mode, operands[0],
> +                                        mode);
> +          operands[1] = lowpart_subreg (mode, operands[1],
> +                                        mode);
> +          emit_insn (gen_lsx_vbitseti_ (operands[0], operands[1],
> +                                        n));
> +          DONE;
> +        }
> +    }
> +
> +  operands[2] = force_reg (mode, operands[2]);
>    operands[3] = loongarch_build_signbit_mask (mode, 1, 0);
>
>    operands[4] = gen_reg_rtx (mode);
> diff --git a/gcc/testsuite/g++.target/loongarch/vect-copysign-negconst-run.C b/gcc/testsuite/g++.target/loongarch/vect-copysign-negconst-run.C
> new file mode 100644
> index 00000000000..d2d5d15c933
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/loongarch/vect-copysign-negconst-run.C
> @@ -0,0 +1,47 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -march=loongarch64 -mlasx -mno-strict-align" } */
> +/* { dg-require-effective-target loongarch_asx_hw } */
> +
> +#include "vect-copysign-negconst.C"
> +
> +double d[] = {1.2, -3.4, -5.6, 7.8};
> +float f[] = {1.2, -3.4, -5.6, 7.8, -9.0, -11.4, 51.4, 1919.810};
> +
> +double _abs(double x) { return __builtin_fabs (x); }
> +float _abs(float x) { return __builtin_fabsf (x); }
> +
> +template <typename T>
> +void
> +check (T *arr, T *orig, int len)
> +{
> +  for (int i = 0; i < len; i++)
> +    {
> +      if (arr[i] > 0)
> +        __builtin_trap ();
> +      if (_abs (arr[i]) != _abs (orig[i]))
> +        __builtin_trap ();
> +    }
> +}
> +
> +int
> +main()
> +{
> +  double test_d[4];
> +  float test_f[8];
> +
> +  __builtin_memcpy (test_d, d, sizeof (test_d));
> +  force_negative<2> (test_d);
> +  check (test_d, d, 2);
> +
> +  __builtin_memcpy (test_d, d, sizeof (test_d));
> +  force_negative<4> (test_d);
> +  check (test_d, d, 4);
> +
> +  __builtin_memcpy (test_f, f, sizeof (test_f));
> +  force_negative<4> (test_f);
> +  check (test_f, f, 4);
> +
> +  __builtin_memcpy (test_f, f, sizeof (test_f));
> +  force_negative<8> (test_f);
> +  check (test_f, f, 8);
> +}
> diff --git a/gcc/testsuite/g++.target/loongarch/vect-copysign-negconst.C b/gcc/testsuite/g++.target/loongarch/vect-copysign-negconst.C
> new file mode 100644
> index 00000000000..5e8820d2bca
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/loongarch/vect-copysign-negconst.C
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=loongarch64 -mlasx -mno-strict-align" } */
> +/* { dg-final { scan-assembler "\txvbitseti.*63" } } */
> +/* { dg-final { scan-assembler "\txvbitseti.*31" } } */
> +/* { dg-final { scan-assembler "\tvbitseti.*63" } } */
> +/* { dg-final { scan-assembler "\tvbitseti.*31" } } */
> +
> +template <int N>
> +__attribute__ ((noipa)) void
> +force_negative (float *arr)
> +{
> +  for (int i = 0; i < N; i++)
> +    arr[i] = __builtin_copysignf (arr[i], -2);
> +}
> +
> +template <int N>
> +__attribute__ ((noipa)) void
> +force_negative (double *arr)
> +{
> +  for (int i = 0; i < N; i++)
> +    arr[i] = __builtin_copysign (arr[i], -3);
> +}
> +
> +template void force_negative<4>(float *);
> +template void force_negative<8>(float *);
> +template void force_negative<2>(double *);
> +template void force_negative<4>(double *);
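
For anyone following the thread, here is a rough scalar sketch (not part of the patch) of the identity the new expansion relies on: copysign (x, c) with a negative constant c gives the same result as setting the top bit of x, which is bit 63 for double and bit 31 for float. That matches the 63 and 31 immediates the compile test scans for. The sketch assumes IEEE 754 binary64/binary32 layouts, and the names set_sign_bit and check_equivalence are made up for illustration; they do not appear in GCC or in the testsuite.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: set bit 63 of a double, the scalar analogue of
// what one lane of [x]vbitseti.d with immediate 63 would do.
// Assumes double is IEEE 754 binary64.
static double set_sign_bit(double x)
{
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits |= std::uint64_t{1} << 63;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// Hypothetical helper: set bit 31 of a float (immediate 31 for the .w form).
// Assumes float is IEEE 754 binary32.
static float set_sign_bit(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits |= std::uint32_t{1} << 31;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// Compare the bit trick against std::copysign with a negative constant,
// using a few of the values from the run test.
static void check_equivalence()
{
    const double ds[] = {1.2, -3.4, -5.6, 7.8};
    for (double d : ds)
        assert(set_sign_bit(d) == std::copysign(d, -3.0));

    const float fs[] = {1.2f, -3.4f, -9.0f, 51.4f};
    for (float f : fs)
        assert(set_sign_bit(f) == std::copysign(f, -2.0f));
}
```

Calling check_equivalence () from a test driver should pass silently on a target with IEEE 754 floats. NaN inputs are left out on purpose, since they never compare equal.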