From mboxrd@z Thu Jan 1 00:00:00 1970
From: dengjianbo
To: libc-alpha@sourceware.org
Cc: adhemerval.zanella@linaro.org, xry111@xry111.site, caiyinyu@loongson.cn, xuchenghua@loongson.cn, huangpei@loongson.cn, dengjianbo
Subject: [PATCH 0/6] LoongArch: Add ifunc support for {raw}memchr,
Date: Mon, 28 Aug 2023 15:26:45 +0800
Message-Id: <20230828072651.3085034-1-dengjianbo@loongson.cn>
X-Mailer: git-send-email 2.31.1
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

This patch series adds multiple versions of rawmemchr, memchr, memrchr,
memset and memcmp, implemented with LoongArch basic instructions, LSX
instructions and LASX instructions. Compared with the current generic
versions, these implementations are slower in a few cases, but overall
the performance gains are significant.

See:
https://github.com/jiadengx/glibc_test/blob/main/bench/rawmemchr_compare.out
https://github.com/jiadengx/glibc_test/blob/main/bench/memchr_compare.out
https://github.com/jiadengx/glibc_test/blob/main/bench/memrchr_compare.out
https://github.com/jiadengx/glibc_test/blob/main/bench/memset_compare.out
https://github.com/jiadengx/glibc_test/blob/main/bench/memcmp_compare.out

In the data, a positive value in parentheses indicates that our
implementation took less time (a performance improvement); a negative
value indicates that our implementation took more time (a performance
decrease).
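
To make the sign convention concrete, here is a minimal C sketch. It
assumes the value in parentheses is the time reduction relative to the
generic version; the benchmark output does not spell out the formula, so
that is an assumption, and percent_time_reduced is a hypothetical helper
that is not part of glibc or of this series.

```c
#include <assert.h>

/* Hypothetical helper (not from the patch series): relative time
   reduction in percent.  Positive means the new implementation took
   less time than the generic one; negative means it took more.  */
double percent_time_reduced(double generic_ns, double new_ns)
{
    assert(generic_ns > 0.0);   /* a zero baseline has no meaningful ratio */
    return (generic_ns - new_ns) / generic_ns * 100.0;
}
```

Under this assumption, a generic time of 200 and a new time of 100 gives
a positive value (an improvement), while a new time of 250 against the
same baseline gives a negative value (a regression).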
The following is a summary of the performance compared with the generic
version in the glibc microbenchmark:

Name                Percent of time reduced
rawmemchr-lasx      40%-80%
rawmemchr-lsx       40%-66%
rawmemchr-aligned   20%-40%
memchr-lasx         37%-83%
memchr-lsx          30%-66%
memchr-aligned      0%-15%
memrchr-lasx        20%-83%
memrchr-lsx         20%-64%
memset-lasx         15%-75%
memset-lsx          15%-50%
memset-unaligned    performance is close when the length is larger
                    than 128; 30%-70% for lengths 8-128
memset-aligned      performance is close when the length is larger
                    than 128; 20%-50% for lengths 8-128
memcmp-lasx         16%-74%
memcmp-lsx          20%-50%
memcmp-aligned      5%-20%

dengjianbo (6):
  LoongArch: Add ifunc support for rawmemchr{aligned, lsx, lasx}
  LoongArch: Add ifunc support for memchr{aligned, lsx, lasx}
  LoongArch: Add ifunc support for memrchr{lsx, lasx}
  LoongArch: Add ifunc support for memset{aligned, unaligned, lsx, lasx}
  LoongArch: Add ifunc support for memcmp{aligned, lsx, lasx}
  LoongArch: Change loongarch to LoongArch in comments

 sysdeps/loongarch/lp64/multiarch/Makefile     |  16 +
 .../lp64/multiarch/dl-symbol-redir-ifunc.h    |  24 ++
 .../lp64/multiarch/ifunc-impl-list.c          |  40 +++
 .../loongarch/lp64/multiarch/ifunc-memchr.h   |  40 +++
 .../loongarch/lp64/multiarch/ifunc-memcmp.h   |  40 +++
 .../loongarch/lp64/multiarch/ifunc-memrchr.h  |  40 +++
 .../lp64/multiarch/ifunc-rawmemchr.h          |  40 +++
 .../loongarch/lp64/multiarch/memchr-aligned.S |  95 ++++++
 .../loongarch/lp64/multiarch/memchr-lasx.S    | 117 +++++++
 sysdeps/loongarch/lp64/multiarch/memchr-lsx.S | 102 ++++++
 sysdeps/loongarch/lp64/multiarch/memchr.c     |  37 +++
 .../loongarch/lp64/multiarch/memcmp-aligned.S | 292 ++++++++++++++++++
 .../loongarch/lp64/multiarch/memcmp-lasx.S    | 207 +++++++++++++
 sysdeps/loongarch/lp64/multiarch/memcmp-lsx.S | 269 ++++++++++++++++
 sysdeps/loongarch/lp64/multiarch/memcmp.c     |  43 +++
 .../loongarch/lp64/multiarch/memcpy-aligned.S |   2 +-
 .../loongarch/lp64/multiarch/memcpy-lasx.S    |   2 +-
 sysdeps/loongarch/lp64/multiarch/memcpy-lsx.S |   2 +-
 .../lp64/multiarch/memcpy-unaligned.S         |   2 +-
 .../lp64/multiarch/memmove-aligned.S          |   2 +-
 .../loongarch/lp64/multiarch/memmove-lasx.S   |   2 +-
 .../loongarch/lp64/multiarch/memmove-lsx.S    |   2 +-
 .../lp64/multiarch/memmove-unaligned.S        |   2 +-
 .../lp64/multiarch/memrchr-generic.c          |  23 ++
 .../loongarch/lp64/multiarch/memrchr-lasx.S   | 123 ++++++++
 .../loongarch/lp64/multiarch/memrchr-lsx.S    | 105 +++++++
 sysdeps/loongarch/lp64/multiarch/memrchr.c    |  33 ++
 .../loongarch/lp64/multiarch/memset-aligned.S | 174 +++++++++++
 .../loongarch/lp64/multiarch/memset-lasx.S    | 142 +++++++++
 sysdeps/loongarch/lp64/multiarch/memset-lsx.S | 135 ++++++++
 .../lp64/multiarch/memset-unaligned.S         | 162 ++++++++++
 sysdeps/loongarch/lp64/multiarch/memset.c     |  37 +++
 .../lp64/multiarch/rawmemchr-aligned.S        | 124 ++++++++
 .../loongarch/lp64/multiarch/rawmemchr-lasx.S |  82 +++++
 .../loongarch/lp64/multiarch/rawmemchr-lsx.S  |  71 +++++
 sysdeps/loongarch/lp64/multiarch/rawmemchr.c  |  37 +++
 .../loongarch/lp64/multiarch/strchr-aligned.S |   2 +-
 .../loongarch/lp64/multiarch/strchr-lasx.S    |   2 +-
 sysdeps/loongarch/lp64/multiarch/strchr-lsx.S |   2 +-
 .../lp64/multiarch/strchrnul-aligned.S        |   2 +-
 .../loongarch/lp64/multiarch/strchrnul-lasx.S |   2 +-
 .../loongarch/lp64/multiarch/strchrnul-lsx.S  |   2 +-
 .../loongarch/lp64/multiarch/strcmp-aligned.S |   2 +-
 sysdeps/loongarch/lp64/multiarch/strcmp-lsx.S |   2 +-
 .../loongarch/lp64/multiarch/strlen-aligned.S |   2 +-
 .../loongarch/lp64/multiarch/strlen-lasx.S    |   2 +-
 sysdeps/loongarch/lp64/multiarch/strlen-lsx.S |   2 +-
 .../lp64/multiarch/strncmp-aligned.S          |   2 +-
 .../loongarch/lp64/multiarch/strncmp-lsx.S    |   2 +-
 .../lp64/multiarch/strnlen-aligned.S          |   2 +-
 .../loongarch/lp64/multiarch/strnlen-lasx.S   |   2 +-
 .../loongarch/lp64/multiarch/strnlen-lsx.S    |   2 +-
 52 files changed, 2674 insertions(+), 24 deletions(-)
 create mode 100644 sysdeps/loongarch/lp64/multiarch/dl-symbol-redir-ifunc.h
 create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-memchr.h
 create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-memcmp.h
 create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-memrchr.h
 create mode 100644 sysdeps/loongarch/lp64/multiarch/ifunc-rawmemchr.h
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memchr-aligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memchr-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memchr-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memchr.c
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memcmp-aligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memcmp-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memcmp-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memcmp.c
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memrchr-generic.c
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memrchr-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memrchr-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memrchr.c
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memset-aligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memset-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memset-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memset-unaligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/memset.c
 create mode 100644 sysdeps/loongarch/lp64/multiarch/rawmemchr-aligned.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/rawmemchr-lasx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/rawmemchr-lsx.S
 create mode 100644 sysdeps/loongarch/lp64/multiarch/rawmemchr.c

-- 
2.40.0
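
For readers unfamiliar with the per-function variants listed above
(aligned, lsx, lasx), the following is a rough plain-C analogue of how
one variant might be chosen per CPU. It is only a sketch under stated
assumptions: the feature bits FEAT_LSX and FEAT_LASX, the function
select_memchr and the three stand-in bodies are hypothetical names
invented for illustration, and the preference order (LASX, then LSX,
then the basic-instruction version) is an assumption. The real series
uses the GNU indirect-function mechanism with selectors in the
ifunc-*.h headers and assembly implementations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature bits; the real code queries CPU capabilities.  */
enum { FEAT_LSX = 1 << 0, FEAT_LASX = 1 << 1 };

typedef void *(*memchr_fn)(const void *, int, size_t);

/* Portable byte-wise search used as the body of every stand-in.  */
static void *memchr_bytewise(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    for (size_t i = 0; i < n; i++)
        if (p[i] == (unsigned char) c)
            return (void *) (p + i);
    return NULL;
}

/* Stand-ins for the basic-instruction, LSX and LASX variants.  In the
   patch series these are separate assembly files; here they share one
   C body so the example builds on any machine.  */
void *memchr_aligned(const void *s, int c, size_t n) { return memchr_bytewise(s, c, n); }
void *memchr_lsx(const void *s, int c, size_t n)     { return memchr_bytewise(s, c, n); }
void *memchr_lasx(const void *s, int c, size_t n)    { return memchr_bytewise(s, c, n); }

/* Pick the widest vector variant the CPU reports (assumed order).  */
memchr_fn select_memchr(unsigned features)
{
    memchr_fn chosen = memchr_aligned;
    if (features & FEAT_LASX)
        chosen = memchr_lasx;
    else if (features & FEAT_LSX)
        chosen = memchr_lsx;
    assert(chosen != NULL);
    return chosen;
}
```

A caller would resolve the pointer once, for example
`memchr_fn f = select_memchr(FEAT_LSX);`, and then call `f(buf, c, n)`.
With a real indirect function the dynamic linker performs that
resolution at symbol binding time, so callers keep using the plain
function name.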