From: Christoph Müllner
Date: Mon, 3 Jun 2024 23:08:33 +0200
Subject: Re: [PATCH v2 00/15] RISC-V: Add Zbb-optimized string routines as ifuncs
To: libc-alpha@sourceware.org, Adhemerval Zanella, Palmer Dabbelt, Darius Rad, Andrew Waterman, Philipp Tomsich, Evan Green, DJ Delorie, Vineet Gupta, Kito Cheng, Jeff Law
References: <20240527111900.1060546-1-christoph.muellner@vrull.eu>
In-Reply-To: <20240527111900.1060546-1-christoph.muellner@vrull.eu>

Ping.

On Mon, May 27, 2024 at 1:19 PM Christoph Müllner wrote:
>
> Glibc recently got hwprobe() support for RISC-V, which allows querying
> available extensions at runtime. On top of that, an optimized memcpy()
> routine (for fast unaligned accesses) has been merged, which is built by
> recompiling the generic C code with a different compiler flag. An ifunc
> resolver then detects which routine should be run using hwprobe().
>
> This patchset follows this idea and recompiles the following functions
> for Zbb (via function attributes) and enables the existing Zbb/orc.b
> optimization in riscv/string-fza.h:
> memchr, memrchr, strchrnul, strcmp, strlen, strncmp.
> The resulting optimized routines are then selected by the resolver function
> if the Zbb extension is present at runtime.
>
> To use target function attributes, a few issues had to be resolved:
> - The functions above got a mechanism to be compiled with function attributes
>   (patches 2-7). Only the routines required for the purpose of this
>   patchset have been touched.
> - Ensuring that inlined functions also get the same function attributes
>   (first patch).
> - Add mechanism to explicitly enable the orc.b optimization for string functions
>   (patch 8), which is a bit inspired by USE_FFS_BUILTIN.
>
> One of the design questions is whether Zbb represents a broad enough
> optimization target. Tests with Zb* extensions showed that no further code
> improvements can be achieved with them. Also, most other extensions likely
> won't affect the generated code for string routines (ignoring vector
> instructions, which are a different topic). Therefore, Zbb seemed like a
> sufficient target.
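
For readers unfamiliar with the orc.b optimization mentioned in the quoted text: orc.b turns every non-zero byte of a register into 0xff and leaves zero bytes as 0x00, so a word-at-a-time string routine can locate a NUL byte by inverting the result. What follows is only a minimal portable sketch of that idea, not the code from riscv/string-fza.h. The names orc_b_emulated, first_zero_byte, load_le64 and strnlen_wordwise are hypothetical, and orc.b is emulated in plain C here so that the sketch builds on any host.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable emulation of the Zbb orc.b instruction (hypothetical helper):
   every non-zero byte of x becomes 0xff, every zero byte stays 0x00.  */
uint64_t
orc_b_emulated (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    if ((x >> (8 * i)) & 0xff)
      r |= (uint64_t) 0xff << (8 * i);
  return r;
}

/* Index of the first zero byte of x, counting from the least significant
   byte, or 8 if x contains no zero byte.  */
unsigned
first_zero_byte (uint64_t x)
{
  /* After inversion, 0xff marks exactly the zero bytes of x.  */
  uint64_t mask = ~orc_b_emulated (x);
  unsigned idx = 0;

  if (mask == 0)
    return 8;
  while ((mask & 0xff) == 0)
    {
      mask >>= 8;
      idx++;
    }
  return idx;
}

/* Assemble 8 bytes into a word in little-endian order, so that byte
   index i of the buffer maps to byte i of the word on any host.  */
uint64_t
load_le64 (const unsigned char *p)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    r |= (uint64_t) p[i] << (8 * i);
  return r;
}

/* Length of the string in s, scanning at most buflen bytes.  Whole words
   are checked with the orc.b trick; a short tail is checked bytewise.
   Returns buflen if no NUL byte is found.  */
size_t
strnlen_wordwise (const char *s, size_t buflen)
{
  const unsigned char *p = (const unsigned char *) s;
  size_t off = 0;

  for (; off + 8 <= buflen; off += 8)
    {
      unsigned idx = first_zero_byte (load_le64 (p + off));
      if (idx < 8)
        return off + idx;
    }
  while (off < buflen && p[off] != 0)
    off++;
  return off;
}
```

With Zbb available, the loop in orc_b_emulated would be expected to collapse into a single orc.b instruction, which is where the gain over the generic bit tricks comes from. The sketch bounds every read by buflen to stay within defined C behaviour; it makes no claim about how the glibc routines handle alignment.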
>
> This series was tested by writing a simple test program to invoke the
> libc routines (e.g. strcmp) and a modified QEMU that reports the
> emulation of orc.b on stderr. With that, QEMU can be used to test
> if the optimized routines are executed (-cpu "rv64,zbb=[false,true]").
> Further, this series was tested with SPEC CPU 2017 intrate with Zbb
> enabled. The function attribute detection mechanism was tested with
> GCC 13 and GCC 14.
>
> Changes in v2:
> - Drop "Use .insn directive form for orc.b"
> - Introduce use of target function attribute (and all dependencies)
> - Introduce detection of target function attribute support
> - Make orc.b optimization explicit
> - Small cleanups
>
> Christoph Müllner (15):
>   cdefs: Add mechanism to add attributes to __always_inline functions
>   string/memchr: Add mechanism to set function attributes
>   string/memrchr: Add mechanism to set function attributes
>   string/strchrnul: Add mechanism to set function attributes
>   string/strcmp: Add mechanism to set function attributes
>   string/strlen: Add mechanism to set function attributes
>   string/strncmp: Add mechanism to set function attributes
>   RISC-V: string-fz[a,i].h: Make orc.b optimization explicit
>   RISC-V: Add compiler test for Zbb function attribute support
>   RISC-V: Add Zbb optimized memchr as ifunc
>   RISC-V: Add Zbb optimized memrchr as ifunc
>   RISC-V: Add Zbb optimized strchrnul as ifunc
>   RISC-V: Add Zbb optimized strcmp as ifunc
>   RISC-V: Add Zbb optimized strlen as ifunc
>   RISC-V: Add Zbb optimized strncmp as ifunc
>
>  config.h.in                                   |  3 +
>  misc/sys/cdefs.h                              |  8 ++-
>  string/memchr.c                               |  5 ++
>  string/memrchr.c                              |  5 ++
>  string/strchrnul.c                            |  5 ++
>  string/strcmp.c                               |  8 +++
>  string/strlen.c                               |  5 ++
>  string/strncmp.c                              |  8 +++
>  sysdeps/riscv/configure                       | 27 ++++++++
>  sysdeps/riscv/configure.ac                    | 18 +++++
>  sysdeps/riscv/multiarch/memchr-generic.c      | 24 +++++++
>  sysdeps/riscv/multiarch/memchr-zbb.c          | 23 +++++++
>  sysdeps/riscv/multiarch/memrchr-generic.c     | 24 +++++++
>  sysdeps/riscv/multiarch/memrchr-zbb.c         | 23 +++++++
>  sysdeps/riscv/multiarch/strchrnul-generic.c   | 24 +++++++
>  sysdeps/riscv/multiarch/strchrnul-zbb.c       | 23 +++++++
>  sysdeps/riscv/multiarch/strcmp-generic.c      | 24 +++++++
>  sysdeps/riscv/multiarch/strcmp-zbb.c          | 23 +++++++
>  sysdeps/riscv/multiarch/strlen-generic.c      | 24 +++++++
>  sysdeps/riscv/multiarch/strlen-zbb.c          | 23 +++++++
>  sysdeps/riscv/multiarch/strncmp-generic.c     | 26 +++++++
>  sysdeps/riscv/multiarch/strncmp-zbb.c         | 25 +++++++
>  sysdeps/riscv/string-fza.h                    | 22 +++++-
>  sysdeps/riscv/string-fzi.h                    | 20 +++++-
>  .../unix/sysv/linux/riscv/multiarch/Makefile  | 23 +++++++
>  .../linux/riscv/multiarch/ifunc-impl-list.c   | 67 +++++++++++++++++--
>  .../unix/sysv/linux/riscv/multiarch/memchr.c  | 60 +++++++++++++++++
>  .../unix/sysv/linux/riscv/multiarch/memrchr.c | 63 +++++++++++++++++
>  .../sysv/linux/riscv/multiarch/strchrnul.c    | 63 +++++++++++++++++
>  .../unix/sysv/linux/riscv/multiarch/strcmp.c  | 59 ++++++++++++++++
>  .../unix/sysv/linux/riscv/multiarch/strlen.c  | 59 ++++++++++++++++
>  .../unix/sysv/linux/riscv/multiarch/strncmp.c | 59 ++++++++++++++++
>  32 files changed, 863 insertions(+), 10 deletions(-)
>  create mode 100644 sysdeps/riscv/multiarch/memchr-generic.c
>  create mode 100644 sysdeps/riscv/multiarch/memchr-zbb.c
>  create mode 100644 sysdeps/riscv/multiarch/memrchr-generic.c
>  create mode 100644 sysdeps/riscv/multiarch/memrchr-zbb.c
>  create mode 100644 sysdeps/riscv/multiarch/strchrnul-generic.c
>  create mode 100644 sysdeps/riscv/multiarch/strchrnul-zbb.c
>  create mode 100644 sysdeps/riscv/multiarch/strcmp-generic.c
>  create mode 100644 sysdeps/riscv/multiarch/strcmp-zbb.c
>  create mode 100644 sysdeps/riscv/multiarch/strlen-generic.c
>  create mode 100644 sysdeps/riscv/multiarch/strlen-zbb.c
>  create mode 100644 sysdeps/riscv/multiarch/strncmp-generic.c
>  create mode 100644 sysdeps/riscv/multiarch/strncmp-zbb.c
>  create mode 100644 sysdeps/unix/sysv/linux/riscv/multiarch/memchr.c
>  create mode 100644 sysdeps/unix/sysv/linux/riscv/multiarch/memrchr.c
>  create mode 100644 sysdeps/unix/sysv/linux/riscv/multiarch/strchrnul.c
>  create mode 100644 sysdeps/unix/sysv/linux/riscv/multiarch/strcmp.c
>  create mode 100644 sysdeps/unix/sysv/linux/riscv/multiarch/strlen.c
>  create mode 100644 sysdeps/unix/sysv/linux/riscv/multiarch/strncmp.c
>
> --
> 2.45.1
>
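
In case it helps reviewers reproduce the check described in the quoted testing paragraph, a test program of that kind could look roughly like the sketch below. This is an illustration under stated assumptions, not the program that was actually used: the function name exercise_string_routines and the call_* pointers are hypothetical, and only the ISO C routines from the list are covered (memrchr and strchrnul are GNU extensions and are left out to keep the sketch free of feature-test macros).

```c
#include <stddef.h>
#include <string.h>

/* Calling through volatile function pointers keeps the compiler from
   folding or inlining the calls, so each one really enters libc.  */
static size_t (*volatile call_strlen) (const char *) = strlen;
static int (*volatile call_strcmp) (const char *, const char *) = strcmp;
static int (*volatile call_strncmp) (const char *, const char *,
                                     size_t) = strncmp;
static void *(*volatile call_memchr) (const void *, int, size_t) = memchr;

/* Invoke each routine once with a known input and return the number of
   results that differ from the expected value (0 means all passed).  */
int
exercise_string_routines (void)
{
  static const char text[] = "Zbb-optimized string routines";
  int failures = 0;

  failures += call_strlen (text) != sizeof text - 1;
  failures += call_strcmp (text, text) != 0;
  failures += call_strcmp ("abc", "abd") >= 0;
  failures += call_strncmp ("abcX", "abcY", 3) != 0;
  /* The first '-' sits at index 3; '#' does not occur at all.  */
  failures += call_memchr (text, '-', sizeof text) != text + 3;
  failures += call_memchr (text, '#', sizeof text) != NULL;

  return failures;
}
```

Calling exercise_string_routines() from main and running the binary under the modified QEMU once with zbb=false and once with zbb=true should then show orc.b reports only in the second run, assuming the ifunc resolver picks the Zbb variants.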