From mboxrd@z Thu Jan 1 00:00:00 1970
From: dengjianbo
To: libc-alpha@sourceware.org
Cc: adhemerval.zanella@linaro.org, xry111@xry111.site, caiyinyu@loongson.cn, xuchenghua@loongson.cn, huangpei@loongson.cn, dengjianbo
Subject: [PATCH v2 0/3] LoongArch: Add ifunc support for str{cp, rchr},
Date: Wed, 13 Sep 2023 15:34:58 +0800
Message-Id: <20230913073501.1018239-1-dengjianbo@loongson.cn>
X-Mailer: git-send-email 2.31.1
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

This patch series adds multiple versions of strcpy, stpcpy, and strrchr,
implemented with basic LoongArch instructions, LSX instructions, and LASX
instructions. Although these implementations regress in a few cases, the
overall performance gains are significant. See:

https://github.com/jiadengx/glibc_test/blob/main/bench/strcpy_compare_v2.out
https://github.com/jiadengx/glibc_test/blob/main/bench/stpcpy_compare_v2.out

In the benchmark, the strcpy and stpcpy results are compared against the
generic strcpy and stpcpy, not against strlen + memcpy.

Generic strrchr is implemented as strlen + memrchr, so each variant is
compared against a generic_strrchr built from the matching primitives:
strrchr_lasx against generic_strrchr using strlen-lasx and memrchr-lasx,
strrchr-lsx against generic_strrchr using strlen-lsx and memrchr-lsx, and
strrchr-aligned against generic_strrchr using strlen-aligned and
memrchr-generic.
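For readers unfamiliar with the strlen + memrchr composition mentioned
above, here is a minimal sketch in C. The name sketch_strrchr is
hypothetical and used only for illustration; this is not the code from the
patches, and it assumes a libc that provides the GNU memrchr extension.

```c
#define _GNU_SOURCE             /* memrchr is a GNU extension */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration only; not a glibc symbol.
   Finds the last occurrence of C in S by first measuring the string
   and then scanning backwards over it.  */
static char *
sketch_strrchr (const char *s, int c)
{
  assert (s != NULL);
  /* Scan strlen (s) + 1 bytes so that c == '\0' matches the terminator.  */
  return memrchr (s, c, strlen (s) + 1);
}
```

Because the whole cost is one strlen pass plus one memrchr pass, the
baseline for each variant depends on which strlen and memrchr
implementations it is paired with, which is why the comparisons are split
as described.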
strrchr benchmark results:

https://github.com/jiadengx/glibc_test/blob/main/bench/strrchr_lasx_compare.out
https://github.com/jiadengx/glibc_test/blob/main/bench/strrchr_lsx_compare.out
https://github.com/jiadengx/glibc_test/blob/main/bench/strrchr_aligned_compare.out

In the data, a positive value in parentheses means that our implementation
took less time (a performance improvement); a negative value means that it
took more time (a regression).

The following summarizes the performance relative to the generic version in
the glibc microbenchmark:

name               time reduction
strcpy-aligned     8%-45%
strcpy-unaligned   8%-48% (performs better than the aligned version
                   when src and dest cannot both be 8-byte aligned)
strcpy-lsx         20%-80%
strcpy-lasx        15%-86%
stpcpy-lasx        10%-87%
stpcpy-lsx         10%-80%
stpcpy-aligned     6%-43%
stpcpy-unaligned   8%-48%
strrchr-lasx       10%-50%
strrchr-lsx        0%-50%
strrchr-aligned    5%-50%

dengjianbo (3):
  LoongArch: Add ifunc support for strcpy, stpcpy{aligned, unaligned, lsx, lasx}
  LoongArch: Add ifunc support for strrchr{aligned, lsx, lasx}
  LoongArch: Change to put magic number to .rodata section

 sysdeps/loongarch/lp64/multiarch/Makefile     |  11 +
 .../lp64/multiarch/ifunc-impl-list.c          |  26 +++
 .../loongarch/lp64/multiarch/ifunc-strrchr.h  |  41 ++++
 .../loongarch/lp64/multiarch/memmove-lsx.S    |  20 +-
 .../loongarch/lp64/multiarch/stpcpy-aligned.S |  27 +++
 .../loongarch/lp64/multiarch/stpcpy-lasx.S    |  22 ++
 sysdeps/loongarch/lp64/multiarch/stpcpy-lsx.S |  22 ++
 .../lp64/multiarch/stpcpy-unaligned.S         |  22 ++
 sysdeps/loongarch/lp64/multiarch/stpcpy.c     |  42 ++++
 .../loongarch/lp64/multiarch/strcpy-aligned.S | 202 ++++++++++++++++
 .../loongarch/lp64/multiarch/strcpy-lasx.S    | 215 ++++++++++++++++++
 sysdeps/loongarch/lp64/multiarch/strcpy-lsx.S | 212 +++++++++++++++++
 .../lp64/multiarch/strcpy-unaligned.S         | 138 +++++++++++
 sysdeps/loongarch/lp64/multiarch/strcpy.c     |  35 +++
 .../lp64/multiarch/strrchr-aligned.S          | 170 ++++++++++++++
 .../loongarch/lp64/multiarch/strrchr-lasx.S   | 176 ++++++++++++++
 .../loongarch/lp64/multiarch/strrchr-lsx.S    | 144 ++++++++++++
 sysdeps/loongarch/lp64/multiarch/strrchr.c    |  36 +++
 18 files changed, 1551 insertions(+), 10 deletions(-)
 create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-strrchr.h
 create mode 100644 sysdeps/loongarch/lp64/multiarch/stpcpy-aligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/stpcpy-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/stpcpy-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/stpcpy-unaligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/stpcpy.c
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strcpy-aligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strcpy-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strcpy-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strcpy-unaligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strcpy.c
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strrchr-aligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strrchr-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strrchr-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/strrchr.c

-- 
2.40.0