Subject: Re: [PATCH 0/5] LoongArch: Multiarch string and memory copy routines for unaligned access
To: Xi Ruoyao, libc-alpha@sourceware.org
Cc: Wang Xuerui, Adhemerval Zanella Netto
References: <20230415112340.38431-1-xry111@xry111.site>
From: caiyinyu
Date: Tue, 18 Apr 2023 11:01:02 +0800
In-Reply-To: <20230415112340.38431-1-xry111@xry111.site>

We are preparing a series of patches that includes ifunc support
(aligned/unaligned/vectorized assembly implementations) for the str/mem
functions, tunable functionality, and a vectorized _dl_runtime_resolve.
However, we are not currently able to submit them to the upstream
community. In the future we may consider publishing them on GitHub, as
with gcc and binutils. We will temporarily keep your patches.

On 2023/4/15 at 7:23 PM, Xi Ruoyao wrote:
> LoongArch CPUs may have hardware unaligned access support.  For the
> launched LoongArch CPUs, those branded as Loongson-3 (for desktops or
> servers) have hardware unaligned access support, but those branded as
> Loongson-2 (for embedded or industrial applications) do not.
>
> On Linux, the unaligned access support is indicated by a HWCAP bit
> provided by the kernel.  So we can multiarch stpcpy and memcpy with
> ifunc to take the advantage on the CPUs with unaligned access support.
>
> On a Loongson-3A5000HV CPU running at 2.5GHz, "make bench" has shown
> these changes can really improve the performance:
>
> - https://www.linuxfromscratch.org/~xry111/loongarch-ual-bench/bench-stpcpy-summary.txt
> - https://www.linuxfromscratch.org/~xry111/loongarch-ual-bench/bench-memcpy-summary.txt
>
> Xi Ruoyao (5):
>   LoongArch: Add bits/hwcap.h for Linux
>   LoongArch: Add LOONGARCH_HAVE_UAL macro
>   string: stpcpy.c: Only alias __stpcpy to stpcpy if STPCPY undefined
>   LoongArch: Multiarch stpcpy for unaligned access
>   LoongArch: Multiarch memcpy for unaligned access
>
>  string/stpcpy.c                               |  3 ++
>  sysdeps/loongarch/loongarch-features.h        | 26 ++++++++++
>  sysdeps/loongarch/multiarch/Makefile          |  6 +++
>  sysdeps/loongarch/multiarch/memcpy-generic.c  | 27 ++++++++++
>  sysdeps/loongarch/multiarch/memcpy-ual.c      | 50 +++++++++++++++++++
>  sysdeps/loongarch/multiarch/memcpy.c          | 39 +++++++++++++++
>  sysdeps/loongarch/multiarch/stpcpy-generic.c  | 25 ++++++++++
>  sysdeps/loongarch/multiarch/stpcpy-ual.c      | 43 ++++++++++++++++
>  sysdeps/loongarch/multiarch/stpcpy.c          | 37 ++++++++++++++
>  .../loongarch/multiarch/wordcopy-ual-inline.c | 31 ++++++++++++
>  .../unix/sysv/linux/loongarch/bits/hwcap.h    | 37 ++++++++++++++
>  .../sysv/linux/loongarch/loongarch-features.h | 30 +++++++++++
>  sysdeps/unix/sysv/linux/loongarch/sysdep.h    |  1 +
>  13 files changed, 355 insertions(+)
>  create mode 100644 sysdeps/loongarch/loongarch-features.h
>  create mode 100644 sysdeps/loongarch/multiarch/Makefile
>  create mode 100644 sysdeps/loongarch/multiarch/memcpy-generic.c
>  create mode 100644 sysdeps/loongarch/multiarch/memcpy-ual.c
>  create mode 100644 sysdeps/loongarch/multiarch/memcpy.c
>  create mode 100644 sysdeps/loongarch/multiarch/stpcpy-generic.c
>  create mode 100644 sysdeps/loongarch/multiarch/stpcpy-ual.c
>  create mode 100644 sysdeps/loongarch/multiarch/stpcpy.c
>  create mode 100644 sysdeps/loongarch/multiarch/wordcopy-ual-inline.c
>  create mode 100644 sysdeps/unix/sysv/linux/loongarch/bits/hwcap.h
>  create mode 100644 sysdeps/unix/sysv/linux/loongarch/loongarch-features.h
>
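
For readers following the thread, the HWCAP-based selection described in the quoted cover letter can be sketched roughly as below. This is only an illustration and not code from the series: the names DEMO_HWCAP_UAL, memcpy_generic_demo, memcpy_ual_demo, select_memcpy and select_memcpy_for_this_cpu are hypothetical, and the bit position is assumed to match the kernel's LoongArch UAL capability bit. A real glibc ifunc would bind the chosen routine once at symbol resolution time; a plain function pointer stands in for that here so the sketch builds on any Linux host.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/auxv.h>   /* getauxval, AT_HWCAP (Linux) */

/* ASSUMPTION: illustrative value intended to mirror the kernel's
   LoongArch "unaligned access" HWCAP bit; not taken from the patches.  */
#define DEMO_HWCAP_UAL (1UL << 2)

typedef void *(*memcpy_fn) (void *, const void *, size_t);

/* Byte-by-byte copy: never issues a multi-byte unaligned access.  */
void *
memcpy_generic_demo (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  while (n--)
    *d++ = *s++;
  return dst;
}

/* Word-at-a-time copy with no alignment fix-up.  The fixed-size memcpy
   calls are the portable C way to express unaligned word loads and
   stores; a compiler may lower them to single instructions on CPUs
   that support unaligned access.  */
void *
memcpy_ual_demo (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  while (n >= sizeof (unsigned long))
    {
      unsigned long w;
      memcpy (&w, s, sizeof w);
      memcpy (d, &w, sizeof w);
      s += sizeof w;
      d += sizeof w;
      n -= sizeof w;
    }
  while (n--)
    *d++ = *s++;
  return dst;
}

/* Pure selection logic, kept separate so it can be tested with any
   HWCAP value.  */
memcpy_fn
select_memcpy (unsigned long hwcap)
{
  return (hwcap & DEMO_HWCAP_UAL) ? memcpy_ual_demo : memcpy_generic_demo;
}

/* Reads the HWCAP word the kernel passes in the auxiliary vector.
   The bit is only meaningful on LoongArch; other architectures assign
   the same position to unrelated features.  */
memcpy_fn
select_memcpy_for_this_cpu (void)
{
  return select_memcpy (getauxval (AT_HWCAP));
}
```

Keeping the selector a pure function of the HWCAP word makes the decision easy to check without LoongArch hardware; only select_memcpy_for_this_cpu touches the running system.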