From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [PATCH] LoongArch: Optimize LSX vector shuffle on floating-point vector
To: Xi Ruoyao, gcc-patches@gcc.gnu.org
Cc: i@xen0n.name, xuchenghua@loongson.cn
References: <20231119070102.3053-2-xry111@xry111.site>
From: chenglulu
Date: Wed, 22 Nov 2023 14:41:33 +0800
In-Reply-To: <20231119070102.3053-2-xry111@xry111.site>
Content-Type: text/plain; charset=utf-8; format=flowed

On 2023/11/19 3:01 PM, Xi Ruoyao wrote:
> The vec_perm expander was wrongly defined.  GCC internal says:
>
> Operand 3 is the “selector”.  It is an integral mode vector of the same
> width and number of elements as mode M.
>
> With this mistake, the generic code manages to work around and it ends
> up creating some very nasty code for a simple __builtin_shuffle (a, b,
> c) where a and b are V4SF, c is V4SI:
>
>     la.local    $r12,.LANCHOR0
>     la.local    $r13,.LANCHOR1
>     vld         $vr1,$r12,48
>     vslli.w     $vr1,$vr1,2
>     vld         $vr2,$r12,16
>     vld         $vr0,$r13,0
>     vld         $vr3,$r13,16
>     vshuf.b     $vr0,$vr1,$vr1,$vr0
>     vld         $vr1,$r12,32
>     vadd.b      $vr0,$vr0,$vr3
>     vandi.b     $vr0,$vr0,31
>     vshuf.b     $vr0,$vr1,$vr2,$vr0
>     vst         $vr0,$r12,0
>     jr          $r1
>
> This is obviously stupid.  Fix the expander definition and adjust
> loongarch_expand_vec_perm to handle it correctly.
>
> gcc/ChangeLog:
>
>     * config/loongarch/lsx.md (vec_perm): Make the
>     selector VIMODE.
>     * config/loongarch/loongarch.cc (loongarch_expand_vec_perm):
>     Use the mode of the selector (instead of the shuffled vector)
>     for truncating it.  Operate on subregs in the selector mode if
>     the shuffled vector has a different mode (i. e. it's a
>     floating-point vector).
>
> gcc/testsuite/ChangeLog:
>
>     * gcc.target/loongarch/vect-shuf-fp.c: New test.
> ---
>
> Bootstrapped & regtested on loongarch64-linux-gnu.  Ok for trunk?

LGTM. Thanks!
>
>  gcc/config/loongarch/loongarch.cc             | 18 ++++++++++--------
>  gcc/config/loongarch/lsx.md                   |  2 +-
>  .../gcc.target/loongarch/vect-shuf-fp.c       | 16 ++++++++++++++++
>  3 files changed, 27 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/vect-shuf-fp.c
>
> diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
> index ce601a331f7..33357c670e1 100644
> --- a/gcc/config/loongarch/loongarch.cc
> +++ b/gcc/config/loongarch/loongarch.cc
> @@ -8607,8 +8607,9 @@ void
>  loongarch_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
>  {
>    machine_mode vmode = GET_MODE (target);
> +  machine_mode vimode = GET_MODE (sel);
>    auto nelt = GET_MODE_NUNITS (vmode);
> -  auto round_reg = gen_reg_rtx (vmode);
> +  auto round_reg = gen_reg_rtx (vimode);
>    rtx round_data[MAX_VECT_LEN];
>
>    for (int i = 0; i < nelt; i += 1)
> @@ -8616,9 +8617,16 @@ loongarch_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
>        round_data[i] = GEN_INT (0x1f);
>      }
>
> -  rtx round_data_rtx = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (nelt, round_data));
> +  rtx round_data_rtx = gen_rtx_CONST_VECTOR (vimode, gen_rtvec_v (nelt, round_data));
>    emit_move_insn (round_reg, round_data_rtx);
>
> +  if (vmode != vimode)
> +    {
> +      target = lowpart_subreg (vimode, target, vmode);
> +      op0 = lowpart_subreg (vimode, op0, vmode);
> +      op1 = lowpart_subreg (vimode, op1, vmode);
> +    }
> +
>    switch (vmode)
>      {
>      case E_V16QImode:
> @@ -8626,17 +8634,11 @@ loongarch_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
>        emit_insn (gen_lsx_vshuf_b (target, op1, op0, sel));
>        break;
>      case E_V2DFmode:
> -      emit_insn (gen_andv2di3 (sel, sel, round_reg));
> -      emit_insn (gen_lsx_vshuf_d_f (target, sel, op1, op0));
> -      break;
>      case E_V2DImode:
>        emit_insn (gen_andv2di3 (sel, sel, round_reg));
>        emit_insn (gen_lsx_vshuf_d (target, sel, op1, op0));
>        break;
>      case E_V4SFmode:
> -      emit_insn (gen_andv4si3 (sel, sel, round_reg));
> -      emit_insn (gen_lsx_vshuf_w_f (target, sel, op1, op0));
> -      break;
>      case E_V4SImode:
>        emit_insn (gen_andv4si3 (sel, sel, round_reg));
>        emit_insn (gen_lsx_vshuf_w (target, sel, op1, op0));
> diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
> index 8ea41c85b01..5e8d8d74b43 100644
> --- a/gcc/config/loongarch/lsx.md
> +++ b/gcc/config/loongarch/lsx.md
> @@ -837,7 +837,7 @@ (define_expand "vec_perm"
>    [(match_operand:LSX 0 "register_operand")
>     (match_operand:LSX 1 "register_operand")
>     (match_operand:LSX 2 "register_operand")
> -   (match_operand:LSX 3 "register_operand")]
> +   (match_operand:<VIMODE> 3 "register_operand")]
>    "ISA_HAS_LSX"
>  {
>    loongarch_expand_vec_perm (operands[0], operands[1],
> diff --git a/gcc/testsuite/gcc.target/loongarch/vect-shuf-fp.c b/gcc/testsuite/gcc.target/loongarch/vect-shuf-fp.c
> new file mode 100644
> index 00000000000..7acc2113afe
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/loongarch/vect-shuf-fp.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mlasx -O3" } */
> +/* { dg-final { scan-assembler "vshuf\.w" } } */
> +
> +#define V __attribute__ ((vector_size (16)))
> +
> +int a V;
> +float b V;
> +float c V;
> +float d V;
> +
> +void
> +test (void)
> +{
> +  d = __builtin_shuffle (b, c, a);
> +}
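
For readers following the thread, here is a minimal scalar sketch in portable C of the idea behind the patch: a floating-point shuffle can be done on the same bits viewed as integer lanes, driven by an integer selector of the same width and element count. This is not GCC or LoongArch code. The names shuffle_v4si and shuffle_v4sf are hypothetical, and the selector is reduced modulo 2*N as the GCC manual documents for the two-operand __builtin_shuffle. It does not model the patch's 0x1f mask or the operand order of the vshuf instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NELT 4 /* four 32-bit lanes, as in V4SI / V4SF */

/* Hypothetical reference model: two-operand shuffle on integer lanes.
   Each selector element is reduced modulo 2*NELT; indices below NELT
   pick from a, the rest pick from b.  */
static void
shuffle_v4si (uint32_t out[NELT], const uint32_t a[NELT],
              const uint32_t b[NELT], const int32_t sel[NELT])
{
  uint32_t tmp[NELT]; /* temporary, so out may alias a or b */

  for (int i = 0; i < NELT; i++)
    {
      unsigned idx = (uint32_t) sel[i] % (2 * NELT);
      tmp[i] = idx < NELT ? a[idx] : b[idx - NELT];
    }
  memcpy (out, tmp, sizeof tmp);
}

/* Float variant: copy the bits into integer lanes, shuffle them, and
   copy the bits back.  This is only an analogy for operating on a
   same-sized integer view of a float vector; no value conversion
   takes place.  */
static void
shuffle_v4sf (float out[NELT], const float a[NELT],
              const float b[NELT], const int32_t sel[NELT])
{
  uint32_t ia[NELT], ib[NELT], io[NELT];

  assert (sizeof (float) == sizeof (uint32_t));
  memcpy (ia, a, sizeof ia);
  memcpy (ib, b, sizeof ib);
  shuffle_v4si (io, ia, ib, sel);
  memcpy (out, io, sizeof io);
}
```

With b = {1, 2, 3, 4}, c = {5, 6, 7, 8} and selector {0, 5, 2, 15}, shuffle_v4sf (d, b, c, sel) leaves d = {1, 6, 3, 8}: index 15 wraps to 7 and selects the last element of c.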