Subject: Re: [PATCH] LoongArch: Use LSX and LASX for block move
From: chenglulu
To: Xi Ruoyao, gcc-patches@gcc.gnu.org
Cc: Chenghui Pan, i@xen0n.name, xuchenghua@loongson.cn
Date: Sat, 9 Sep 2023 15:04:07 +0800
Message-ID: <3875e341-97a0-29cc-be12-417ee62a38e8@loongson.cn>
In-Reply-To: <20230907161407.27338-2-xry111@xry111.site>
References: <20230907161407.27338-2-xry111@xry111.site>

Hi, Ruoyao:

I think the test case memcpy-vec-3.c committed in r14-3818 is implemented incorrectly. With '-mstrict-align', the 16-byte length in this test makes can_move_by_pieces return true, so the copy is expanded by move_by_pieces instead of going through the LoongArch block-move expander, and no vector load instructions are generated.

On 2023/9/8 at 12:14 AM, Xi Ruoyao wrote:
> gcc/ChangeLog:
>
>         * config/loongarch/loongarch.h (LARCH_MAX_MOVE_PER_INSN):
>         Define to the maximum amount of bytes able to be loaded or
>         stored with one machine instruction.
>         * config/loongarch/loongarch.cc (loongarch_mode_for_move_size):
>         New static function.
>         (loongarch_block_move_straight): Call
>         loongarch_mode_for_move_size for machine_mode to be moved.
>         (loongarch_expand_block_move): Use LARCH_MAX_MOVE_PER_INSN
>         instead of UNITS_PER_WORD.
> ---
>
> Bootstrapped and regtested on loongarch64-linux-gnu, with PR110939 patch
> applied, the "lib_build_self_spec = %<..." line in t-linux commented out
> (because it's silently making -mlasx in BOOT_CFLAGS ineffective, Yujie
> is working on a proper fix), and BOOT_CFLAGS="-O3 -mlasx".
>
> Ok for trunk?
>
>  gcc/config/loongarch/loongarch.cc | 22 ++++++++++++++++++----
>  gcc/config/loongarch/loongarch.h  |  3 +++
>  2 files changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
> index 6698414281e..509ef2b97f1 100644
> --- a/gcc/config/loongarch/loongarch.cc
> +++ b/gcc/config/loongarch/loongarch.cc
> @@ -5191,6 +5191,20 @@ loongarch_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED,
>    return true;
>  }
>
> +static machine_mode
> +loongarch_mode_for_move_size (HOST_WIDE_INT size)
> +{
> +  switch (size)
> +    {
> +    case 32:
> +      return V32QImode;
> +    case 16:
> +      return V16QImode;
> +    }
> +
> +  return int_mode_for_size (size * BITS_PER_UNIT, 0).require ();
> +}
> +
>  /* Emit straight-line code to move LENGTH bytes from SRC to DEST.
>     Assume that the areas do not overlap.  */
>
> @@ -5220,7 +5234,7 @@ loongarch_block_move_straight (rtx dest, rtx src, HOST_WIDE_INT length,
>
>    for (delta_cur = delta, i = 0, offs = 0; offs < length; delta_cur /= 2)
>      {
> -      mode = int_mode_for_size (delta_cur * BITS_PER_UNIT, 0).require ();
> +      mode = loongarch_mode_for_move_size (delta_cur);
>
>        for (; offs + delta_cur <= length; offs += delta_cur, i++)
>          {
> @@ -5231,7 +5245,7 @@ loongarch_block_move_straight (rtx dest, rtx src, HOST_WIDE_INT length,
>
>    for (delta_cur = delta, i = 0, offs = 0; offs < length; delta_cur /= 2)
>      {
> -      mode = int_mode_for_size (delta_cur * BITS_PER_UNIT, 0).require ();
> +      mode = loongarch_mode_for_move_size (delta_cur);
>
>        for (; offs + delta_cur <= length; offs += delta_cur, i++)
>          loongarch_emit_move (adjust_address (dest, mode, offs), regs[i]);
> @@ -5326,8 +5340,8 @@ loongarch_expand_block_move (rtx dest, rtx src, rtx r_length, rtx r_align)
>
>    HOST_WIDE_INT align = INTVAL (r_align);
>
> -  if (!TARGET_STRICT_ALIGN || align > UNITS_PER_WORD)
> -    align = UNITS_PER_WORD;
> +  if (!TARGET_STRICT_ALIGN || align > LARCH_MAX_MOVE_PER_INSN)
> +    align = LARCH_MAX_MOVE_PER_INSN;
>
>    if (length <= align * LARCH_MAX_MOVE_OPS_STRAIGHT)
>      {
> diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
> index 3fc9dc43ab1..7e391205583 100644
> --- a/gcc/config/loongarch/loongarch.h
> +++ b/gcc/config/loongarch/loongarch.h
> @@ -1181,6 +1181,9 @@ typedef struct {
>     least twice.  */
>  #define LARCH_MAX_MOVE_OPS_STRAIGHT (LARCH_MAX_MOVE_OPS_PER_LOOP_ITER * 2)
>
> +#define LARCH_MAX_MOVE_PER_INSN \
> +  (ISA_HAS_LASX ? 32 : (ISA_HAS_LSX ? 16 : UNITS_PER_WORD))
> +
>  /* The base cost of a memcpy call, for MOVE_RATIO and friends.  These
>     values were determined experimentally by benchmarking with CSiBE.
>  */
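To make the effect of the patch easier to see, below is a small standalone C model (not GCC code) of how the patched loongarch_expand_block_move clamps the alignment and how loongarch_block_move_straight then splits the copy into pieces. The helper names plan_block_move, mode_for_move_size, max_move_per_insn and format_plan are hypothetical, UNITS_PER_WORD is assumed to be 8 (LoongArch64), and mode names are plain strings mirroring GCC's machine modes. The LARCH_MAX_MOVE_OPS_STRAIGHT length check and the loop fallback are left out, and the model does not cover can_move_by_pieces at all, which is exactly the step that bypasses this path in the 16-byte '-mstrict-align' test.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: LoongArch64, so a word is 8 bytes.  */
#define UNITS_PER_WORD 8

/* One load/store pair in the straight-line expansion.  */
struct piece
{
  long offset;      /* Byte offset from the start of SRC/DEST.  */
  long size;        /* Bytes moved by this piece.  */
  const char *mode; /* GCC-style machine mode name (without "mode").  */
};

/* Mirrors LARCH_MAX_MOVE_PER_INSN from the patch.  */
static long
max_move_per_insn (int has_lsx, int has_lasx)
{
  return has_lasx ? 32 : (has_lsx ? 16 : UNITS_PER_WORD);
}

/* Mirrors loongarch_mode_for_move_size: vector modes for 32 and 16 bytes,
   otherwise the integer mode of the same width.  */
static const char *
mode_for_move_size (long size)
{
  switch (size)
    {
    case 32: return "V32QI";
    case 16: return "V16QI";
    case 8:  return "DI";
    case 4:  return "SI";
    case 2:  return "HI";
    case 1:  return "QI";
    }
  return "?";
}

/* Hypothetical model of the align clamp in loongarch_expand_block_move
   followed by the piece loop of loongarch_block_move_straight.  Writes
   at most CAP pieces to OUT and returns how many were written.  */
int
plan_block_move (long length, long align, int strict_align,
                 int has_lsx, int has_lasx, struct piece *out, int cap)
{
  long max = max_move_per_insn (has_lsx, has_lasx);
  long delta, offs = 0;
  int n = 0;

  /* Same clamp as the patched code: without strict alignment (or with a
     larger known alignment) use the widest single move available.  */
  if (!strict_align || align > max)
    align = max;
  assert (align > 0 && (align & (align - 1)) == 0);

  /* Use the widest piece while it still fits, then halve the piece size.  */
  for (delta = align; offs < length && delta > 0; delta /= 2)
    for (; offs + delta <= length && n < cap; offs += delta, n++)
      {
        out[n].offset = offs;
        out[n].size = delta;
        out[n].mode = mode_for_move_size (delta);
      }
  return n;
}

/* Render a plan as "MODE@OFFSET ..." into BUF (LEN must be > 0).  */
void
format_plan (const struct piece *p, int n, char *buf, size_t len)
{
  buf[0] = '\0';
  for (int i = 0; i < n; i++)
    {
      size_t used = strlen (buf);
      snprintf (buf + used, len - used, "%s%s@%ld",
                i ? " " : "", p[i].mode, p[i].offset);
    }
}
```

Called from a main of your own, a 40-byte copy with byte alignment and no '-mstrict-align' comes out as "V32QI@0 DI@32" with LASX, "V16QI@0 V16QI@16 DI@32" with LSX only, and five DI moves otherwise. With '-mstrict-align' and a known 4-byte alignment, a 16-byte copy stays at "SI@0 SI@4 SI@8 SI@12", so vector moves only appear under strict alignment when the known alignment is at least 16 and the expander is actually reached.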