From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH 1/4] LoongArch: Add ifunc support for strcpy{aligned, unaligned, lsx, lasx}
To: Xi Ruoyao , libc-alpha@sourceware.org
Cc: adhemerval.zanella@linaro.org, caiyinyu@loongson.cn, xuchenghua@loongson.cn, huangpei@loongson.cn
References: <20230908093357.3119822-1-dengjianbo@loongson.cn> <20230908093357.3119822-2-dengjianbo@loongson.cn> <7246aa6f75703fd18f9a22e83d759dae1264797b.camel@xry111.site>
From: dengjianbo
Message-ID: <1ce4cff2-72a5-06cc-ef64-904c3ec47e4f@loongson.cn>
Date: Mon, 11 Sep 2023 17:53:26 +0800
In-Reply-To: <7246aa6f75703fd18f9a22e83d759dae1264797b.camel@xry111.site>

I tested strcpy-lasx against the generic strcpy (which calls stpcpy-lasx).
The difference between the two timings is 0.28, with strcpy-lasx taking
less time. When the length of the data is less than 32, it can reduce the
runtime by more than 30%.

See:
https://github.com/jiadengx/glibc_test/blob/main/bench/strcpy_lasx_compare_generic_strcpy.out

There is some duplicated code in strcpy from stpcpy, since the main part is
almost the same. Maybe we can use a single source file with a macro
USE_AS_STPCPY to distinguish strcpy and stpcpy, like x86_64 does? That
would avoid the performance degradation.

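To illustrate the single-source idea, here is a minimal C sketch, not the actual LoongArch assembly. It assumes one shared copy loop whose only difference between the two variants is the return value. The names DEFINE_COPY, sketch_strcpy and sketch_stpcpy are hypothetical, and the macro parameter stands in for a USE_AS_STPCPY switch. In a real tree the switch would more likely be a #define in a small stpcpy file that then includes the strcpy source.

```c
#include <stddef.h>

/* Hypothetical sketch: one shared copy loop, expanded twice.
   The use_as_stpcpy flag plays the role of a USE_AS_STPCPY macro
   and only changes what is returned.  */
#define DEFINE_COPY(name, use_as_stpcpy)                      \
  char *name (char *dst, const char *src)                     \
  {                                                           \
    char *d = dst;                                            \
    /* Copy bytes up to and including the terminating NUL.  */ \
    while ((*d = *src++) != '\0')                             \
      d++;                                                    \
    /* stpcpy returns a pointer to the NUL, strcpy the start.  */ \
    return (use_as_stpcpy) ? d : dst;                         \
  }

DEFINE_COPY (sketch_strcpy, 0)
DEFINE_COPY (sketch_stpcpy, 1)
```

Since the flag is a compile-time constant, each expansion keeps only its own return path, so neither variant pays for a call into the other.
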
On 2023-09-08 22:22, Xi Ruoyao wrote:
> On Fri, 2023-09-08 at 17:33 +0800, dengjianbo wrote:
>> According to glibc strcpy microbenchmark test results (changed to use
>> generic_strcpy instead of strlen + memcpy), comparing with generic_strcpy,
>> this implementation could reduce the runtime as following:
>>
>> Name              Percent of runtime reduced
>> strcpy-aligned    10%-45%
>> strcpy-unaligned  10%-49%, comparing with the aligned version, unaligned
>>                   version experience better performance in case src and dest
>>                   cannot be both aligned with 8bytes
>> strcpy-lsx        20%-80%
>> strcpy-lasx       15%-86%
> Generic strcpy calls stpcpy, so if we've optimized stpcpy maybe it's not
> necessary to duplicate everything in strcpy.  Is there a benchmark
> result comparing the timing with and without this patch, but both with
> the second patch (optimized stpcpy)?
>