From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH 3/3] Loongarch: Add ifunc support for strncmp{aligned, lsx}
To: Xi Ruoyao, Richard Henderson, libc-alpha@sourceware.org
Cc: adhemerval.zanella@linaro.org, caiyinyu@loongson.cn,
 xuchenghua@loongson.cn, huangpei@loongson.cn
References: <20230822021118.3489949-1-dengjianbo@loongson.cn>
 <20230822021118.3489949-4-dengjianbo@loongson.cn>
 <540864c7-ae1f-2358-54d0-41f38ebe43fb@loongson.cn>
 <35f2640e67ab009e008a87782319a617bfaa9d03.camel@xry111.site>
 <0d67b98f3b093a8e0eded89234164fbebcb7f9f7.camel@xry111.site>
From: dengjianbo
Message-ID: <1d499ec5-08c4-57cd-470a-c5b5eca416c4@loongson.cn>
Date: Wed, 23 Aug 2023 15:25:27 +0800
MIME-Version: 1.0
In-Reply-To: <0d67b98f3b093a8e0eded89234164fbebcb7f9f7.camel@xry111.site>
Content-Type: multipart/alternative;
 boundary="------------16705ABCDF29C8C4DC48B9DF"

This is a multi-part message in MIME format.
--------------16705ABCDF29C8C4DC48B9DF
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

On 2023-08-22 19:23, Xi Ruoyao wrote:
> On Tue, 2023-08-22 at 19:13 +0800, Xi Ruoyao via Libc-alpha wrote:
>> On Tue, 2023-08-22 at 14:37 +0800, dengjianbo wrote:
>>
>>> Putting the data here is due to the performance. When the vld
>>> instruction is executed, the data will be in the cache, it can
>>> speed up the data loading.
>> AFAIK LoongArch CPUs have separate icache and dcache like all modern
>> CPUs, so this is not valid to me.
> And even if it can really improve the performance, this is not on the
> hot path of the algorithm so we should not use bizarre optimizations
> here for marginal improvement.
>
> --
> Xi Ruoyao
> School of Aerospace Science and Technology, Xidian University

Thanks for your suggestion. We have changed strcmp and strncmp to put
the data in the rodata section with mergeable flags, and also use
pcalau12i and %pc_lo12 with the vld to get the data.

diff --git a/sysdeps/loongarch/lp64/multiarch/strncmp-lsx.S b/sysdeps/loongarch/lp64/multiarch/strncmp-lsx.S
index 595472fcda..0b4eee2a98 100644
--- a/sysdeps/loongarch/lp64/multiarch/strncmp-lsx.S
+++ b/sysdeps/loongarch/lp64/multiarch/strncmp-lsx.S
@@ -25,15 +25,11 @@
 
 # define STRNCMP __strncmp_lsx
 
-L(magic_num):
-    .align          6
-    .dword          0x0706050403020100
-    .dword          0x0f0e0d0c0b0a0908
-ENTRY_NO_ALIGN(STRNCMP)
+LEAF(STRNCMP, 6)
     beqz            a2, L(ret0)
-    pcaddi          t0, -5
+    pcalau12i       t0, %pc_hi20(L(INDEX))
     andi            a3, a0, 0xf
-    vld             vr2, t0, 0
+    vld             vr2, t0, %pc_lo12(L(INDEX))
 
     andi            a4, a1, 0xf
     li.d            t2, 16
@@ -202,5 +198,11 @@ L(ret0):
     jr              ra
 END(STRNCMP)
 
+    .section         .rodata.cst16,"M",@progbits,16
+    .align           4
+L(INDEX):
+    .dword           0x0706050403020100
+    .dword           0x0f0e0d0c0b0a0908
+
 libc_hidden_builtin_def (STRNCMP)
 #endif
--------------16705ABCDF29C8C4DC48B9DF--
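
For readers following the thread, here is a rough C model of the address
arithmetic behind the two load sequences in the diff. It is only a sketch:
the function names are made up for illustration, the addresses in any call
are arbitrary, and the relocation formulas reflect my reading of the
LoongArch psABI (%pc_hi20 rounds by 0x800 so that the sign-extended 12-bit
offset of vld lands back on the symbol). It also checks that the two
.dword values, stored little-endian, form the byte sequence 0..15.

```c
#include <stdint.h>

/* pcaddi rd, si20: rd = PC + SignExtend(si20 << 2). */
static uint64_t pcaddi(uint64_t pc, int32_t si20)
{
    return pc + (uint64_t)((int64_t)si20 * 4);
}

/* pcalau12i rd, si20: rd = PC + SignExtend(si20 << 12), low 12 bits cleared. */
static uint64_t pcalau12i(uint64_t pc, int32_t si20)
{
    return (pc + (uint64_t)((int64_t)si20 * 4096)) & ~(uint64_t)0xfff;
}

/* Assumed model of %pc_hi20(sym): page delta between sym and PC, with
   0x800 added first to compensate for the sign-extended low part. */
static int32_t pc_hi20(uint64_t sym, uint64_t pc)
{
    uint64_t sym_page = (sym + 0x800) & ~(uint64_t)0xfff;
    uint64_t pc_page = pc & ~(uint64_t)0xfff;
    return (int32_t)((int64_t)(sym_page - pc_page) / 4096);
}

/* Assumed model of %pc_lo12(sym) as consumed by vld: the low 12 bits of
   sym, interpreted as a signed 12-bit immediate. */
static int32_t pc_lo12(uint64_t sym)
{
    int32_t lo = (int32_t)(sym & 0xfff);
    return lo >= 0x800 ? lo - 0x1000 : lo;
}

/* Old sequence: pcaddi is the second instruction (entry + 4), and
   -5 * 4 = -20 bytes from there is entry - 16, where the 16 bytes of
   data sat directly in front of the function. */
static uint64_t old_load_address(uint64_t entry)
{
    return pcaddi(entry + 4, -5);
}

/* New sequence: pcalau12i yields a 4 KiB-aligned base, and vld adds the
   signed low 12 bits, so the result is sym wherever the linker puts it. */
static uint64_t new_load_address(uint64_t pc, uint64_t sym)
{
    uint64_t base = pcalau12i(pc, pc_hi20(sym, pc));
    return base + (uint64_t)(int64_t)pc_lo12(sym);
}

/* Lay the two .dword constants out as little-endian bytes. */
static void index_bytes(uint8_t out[16])
{
    const uint64_t dwords[2] = { 0x0706050403020100ULL,
                                 0x0f0e0d0c0b0a0908ULL };
    for (int i = 0; i < 16; i++)
        out[i] = (uint8_t)(dwords[i / 8] >> (8 * (i % 8)));
}
```

The point of the model is that the old pcaddi form only works while the
data stays exactly 16 bytes in front of the entry point, whereas the
pcalau12i/vld pair resolves to L(INDEX) in whatever section it ends up,
which is what allows moving it to .rodata.cst16.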