From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [PATCH 1/4] LoongArch: Add ifunc support for strcpy{aligned, unaligned, lsx, lasx}
From: dengjianbo
To: Xi Ruoyao, libc-alpha@sourceware.org
Cc: adhemerval.zanella@linaro.org, caiyinyu@loongson.cn, xuchenghua@loongson.cn, huangpei@loongson.cn
References: <20230908093357.3119822-1-dengjianbo@loongson.cn> <20230908093357.3119822-2-dengjianbo@loongson.cn> <7246aa6f75703fd18f9a22e83d759dae1264797b.camel@xry111.site> <1ce4cff2-72a5-06cc-ef64-904c3ec47e4f@loongson.cn>
Message-ID: <5ace60fa-da0e-c1fe-9e31-539c2dfd4872@loongson.cn>
Date: Wed, 13 Sep 2023 15:47:32 +0800
User-Agent: Mozilla/5.0 (X11; Linux loongarch64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0
MIME-Version: 1.0
In-Reply-To: <1ce4cff2-72a5-06cc-ef64-904c3ec47e4f@loongson.cn>
Content-Type: multipart/alternative; boundary="------------8585831EF02328A2731EDAF4"
Content-Language: en-US

This is a multi-part message in MIME format.
--------------8585831EF02328A2731EDAF4
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

We have changed the strcpy source to implement both strcpy and stpcpy,
using USE_AS_STPCPY to distinguish the two functions. The stpcpy files
define the related macros and include the strcpy source code.

See patch v2:
https://sourceware.org/pipermail/libc-alpha/2023-September/151531.html

On 2023-09-11 17:53, dengjianbo wrote:
> Tested strcpy-lasx against strcpy (which calls stpcpy-lasx): the
> difference between the two timings is 0.28, and strcpy-lasx takes less
> time. When the length of the data is less than 32, it can reduce the
> runtime by more than 30%.
>
> See:
> https://github.com/jiadengx/glibc_test/blob/main/bench/strcpy_lasx_compare_generic_strcpy.out
>
> There is some duplicated code in strcpy from stpcpy, since the main
> part is almost the same. Maybe we can use one source file with a
> macro, USE_AS_STPCPY, to distinguish strcpy and stpcpy, as x86_64
> does? That could avoid the performance degradation.
>
> On 2023-09-08 22:22, Xi Ruoyao wrote:
>> On Fri, 2023-09-08 at 17:33 +0800, dengjianbo wrote:
>>> According to the glibc strcpy microbenchmark results (changed to use
>>> generic_strcpy instead of strlen + memcpy), compared with generic_strcpy,
>>> this implementation reduces the runtime as follows:
>>>
>>> Name              Percent of runtime reduced
>>> strcpy-aligned    10%-45%
>>> strcpy-unaligned  10%-49%; compared with the aligned version, the
>>>                   unaligned version performs better when src and dest
>>>                   cannot both be aligned to 8 bytes
>>> strcpy-lsx        20%-80%
>>> strcpy-lasx       15%-86%
>> Generic strcpy calls stpcpy, so if we've optimized stpcpy maybe it's not
>> necessary to duplicate everything in strcpy. Is there a benchmark
>> result comparing the timing with and without this patch, but both with
>> the second patch (optimized stpcpy)?
>>

--------------8585831EF02328A2731EDAF4--
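
For readers unfamiliar with the single-source approach discussed in this
thread, here is a rough C sketch of the idea. It is not the code from the
patch, which is LoongArch assembly selected with #ifdef USE_AS_STPCPY in a
file that the stpcpy source includes. To keep the sketch in one compilable
unit, a function-like macro stands in for the shared file. The names
DEFINE_STRING_COPY, sketch_strcpy, sketch_stpcpy and check_variants are
hypothetical and exist only for illustration.

```c
#include <string.h>

/* Hypothetical stand-in for the shared source file.  RETURN_END plays
   the role of USE_AS_STPCPY: 1 selects the stpcpy flavour, 0 selects
   the strcpy flavour.  */
#define DEFINE_STRING_COPY(name, RETURN_END)                       \
  char *name (char *dst, const char *src)                          \
  {                                                                \
    char *d = dst;                                                 \
    /* Copy bytes up to and including the terminating NUL.  */     \
    while ((*d = *src++) != '\0')                                  \
      d++;                                                         \
    /* Only the return value differs between the two variants.  */ \
    return (RETURN_END) ? d : dst;                                 \
  }

/* One shared body, expanded twice.  */
DEFINE_STRING_COPY (sketch_strcpy, 0)
DEFINE_STRING_COPY (sketch_stpcpy, 1)

/* Return 1 when both variants copy S correctly, sketch_strcpy returns
   the destination, and sketch_stpcpy returns a pointer to the copied
   NUL byte.  Strings of 64 bytes or more are rejected.  */
int
check_variants (const char *s)
{
  char a[64], b[64];
  size_t len = strlen (s);

  if (len >= sizeof a)
    return 0;

  char *ra = sketch_strcpy (a, s);
  char *rb = sketch_stpcpy (b, s);

  return ra == a && rb == b + len
         && strcmp (a, s) == 0 && strcmp (b, s) == 0;
}
```

Because each variant gets its own copy of the loop, neither one has to
call the other, which is the property the benchmark comparison in this
thread is concerned with.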