public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Palmer Dabbelt <palmer@dabbelt.com>
To: philipp.tomsich@vrull.eu
Cc: goldstein.w.n@gmail.com, christoph.muellner@vrull.eu,
	libc-alpha@sourceware.org, Darius Rad <darius@bluespec.com>,
	Andrew Waterman <andrew@sifive.com>, DJ Delorie <dj@redhat.com>,
	Vineet Gupta <vineetg@rivosinc.com>,
	kito.cheng@sifive.com, jeffreyalaw@gmail.com,
	heiko.stuebner@vrull.eu
Subject: Re: [RFC PATCH 18/19] riscv: Add an optimized strncmp routine
Date: Wed, 08 Feb 2023 09:55:03 -0800 (PST)	[thread overview]
Message-ID: <mhng-e7534968-425c-4164-a403-1d02725c661c@palmer-ri-x1c9> (raw)
In-Reply-To: <CAAeLtUC0gCvUyaUaH5rVGE37ibaMFSA1TojoG4jdOkwkZAcPsw@mail.gmail.com>

On Wed, 08 Feb 2023 07:13:44 PST (-0800), philipp.tomsich@vrull.eu wrote:
> On Tue, 7 Feb 2023 at 02:20, Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>>
>> On Mon, Feb 6, 2023 at 6:23 PM Christoph Muellner
>> <christoph.muellner@vrull.eu> wrote:
>> >
>> > From: Christoph Müllner <christoph.muellner@vrull.eu>
>> >
>> > The implementation of strncmp() can be accelerated using Zbb's orc.b
>> > instruction. Let's add an optimized implementation that makes use
>> > of this instruction.
>> >
>> > Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
>>
>> Not necessary, but imo performance patches should have at least some reference
>> to the expected speedup versus the existing alternatives.
>
> Given that this is effectively a SWAR-like optimization (orc.b allows
> us to test 8 bytes in parallel for a NUL byte), we should be able to
> show the benefit through a reduction in dynamic instructions.  Would
> this be considered reasonable reference data?

Generally for performance improvements the only metrics that count come 
from real hardware.  Processor implementation is complex and it's not 
generally true that reducing dynamic instructions results in better 
performance (particularly when more complex flavors instructions replace 
simpler ones).

We've not been so good about this on the RISC-V side of things, though.  
I think that's largely because we didn't have all that much complexity 
around this, but there's a ton of stuff showing up right now.  The 
general theory has been that Zbb instructions will execute faster than 
their corresponding I sequences, but nobody has proved that.  I believe 
the new JH7110 has Zba and Zbb, so maybe the right answer there is to 
just benchmark things before merging them?  That way we can get back to 
doing things sanely before we go too far down the premature optimization 
rabbit hole.

FWIW: we had a pretty similar discussion in Linux land around these and 
nobody could get the JH7110 to boot, but given that we have ~6 months 
until glibc releases again hopefully that will be sorted out.  There's a 
bunch of ongoing work looking at the more core issues like probing, so 
maybe it's best to focus on getting that all sorted out first?  It's 
kind of awkward to have a bunch of routines posted in a whole new 
framework that's not sorting out all the probing dependencies.

>> > ---
>> >  sysdeps/riscv/multiarch/Makefile          |   3 +-
>> >  sysdeps/riscv/multiarch/ifunc-impl-list.c |   1 +
>> >  sysdeps/riscv/multiarch/strncmp.c         |   6 +-
>> >  sysdeps/riscv/multiarch/strncmp_zbb.S     | 119 ++++++++++++++++++++++
>> >  4 files changed, 127 insertions(+), 2 deletions(-)
>> >  create mode 100644 sysdeps/riscv/multiarch/strncmp_zbb.S
>> >
>> > diff --git a/sysdeps/riscv/multiarch/Makefile b/sysdeps/riscv/multiarch/Makefile
>> > index 056ce2ffc0..9f22e31b99 100644
>> > --- a/sysdeps/riscv/multiarch/Makefile
>> > +++ b/sysdeps/riscv/multiarch/Makefile
>> > @@ -14,5 +14,6 @@ sysdep_routines += \
>> >         strcmp_generic \
>> >         strcmp_zbb \
>> >         strcmp_zbb_unaligned \
>> > -       strncmp_generic
>> > +       strncmp_generic \
>> > +       strncmp_zbb
>> >  endif
>> > diff --git a/sysdeps/riscv/multiarch/ifunc-impl-list.c b/sysdeps/riscv/multiarch/ifunc-impl-list.c
>> > index eb37ed6017..82fd34d010 100644
>> > --- a/sysdeps/riscv/multiarch/ifunc-impl-list.c
>> > +++ b/sysdeps/riscv/multiarch/ifunc-impl-list.c
>> > @@ -64,6 +64,7 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
>> >               IFUNC_IMPL_ADD (array, i, strcmp, 1, __strcmp_generic))
>> >
>> >    IFUNC_IMPL (i, name, strncmp,
>> > +             IFUNC_IMPL_ADD (array, i, strncmp, 1, __strncmp_zbb)
>> >               IFUNC_IMPL_ADD (array, i, strncmp, 1, __strncmp_generic))
>> >    return i;
>> >  }
>> > diff --git a/sysdeps/riscv/multiarch/strncmp.c b/sysdeps/riscv/multiarch/strncmp.c
>> > index 970aeb8b85..5b0fe08e98 100644
>> > --- a/sysdeps/riscv/multiarch/strncmp.c
>> > +++ b/sysdeps/riscv/multiarch/strncmp.c
>> > @@ -30,8 +30,12 @@
>> >
>> >  extern __typeof (__redirect_strncmp) __libc_strncmp;
>> >  extern __typeof (__redirect_strncmp) __strncmp_generic attribute_hidden;
>> > +extern __typeof (__redirect_strncmp) __strncmp_zbb attribute_hidden;
>> >
>> > -libc_ifunc (__libc_strncmp, __strncmp_generic);
>> > +libc_ifunc (__libc_strncmp,
>> > +           HAVE_RV(zbb)
>> > +           ? __strncmp_zbb
>> > +           : __strncmp_generic);
>> >
>> >  # undef strncmp
>> >  strong_alias (__libc_strncmp, strncmp);
>> > diff --git a/sysdeps/riscv/multiarch/strncmp_zbb.S b/sysdeps/riscv/multiarch/strncmp_zbb.S
>> > new file mode 100644
>> > index 0000000000..29cff30def
>> > --- /dev/null
>> > +++ b/sysdeps/riscv/multiarch/strncmp_zbb.S
>> > @@ -0,0 +1,119 @@
>> > +/* Copyright (C) 2022 Free Software Foundation, Inc.
>> > +
>> > +   This file is part of the GNU C Library.
>> > +
>> > +   The GNU C Library is free software; you can redistribute it and/or
>> > +   modify it under the terms of the GNU Lesser General Public
>> > +   License as published by the Free Software Foundation; either
>> > +   version 2.1 of the License, or (at your option) any later version.
>> > +
>> > +   The GNU C Library is distributed in the hope that it will be useful,
>> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> > +   Lesser General Public License for more details.
>> > +
>> > +   You should have received a copy of the GNU Lesser General Public
>> > +   License along with the GNU C Library.  If not, see
>> > +   <https://www.gnu.org/licenses/>.  */
>> > +
>> > +#include <sysdep.h>
>> > +#include <sys/asm.h>
>> > +
>> > +/* Assumptions: rvi_zbb.  */
>> > +
>> > +#define src1           a0
>> > +#define result         a0
>> > +#define src2           a1
>> > +#define len            a2
>> > +#define data1          a2
>> > +#define data2          a3
>> > +#define align          a4
>> > +#define data1_orcb     t0
>> > +#define limit          t1
>> > +#define fast_limit      t2
>> > +#define m1             t3
>> > +
>> > +#if __riscv_xlen == 64
>> > +# define REG_L ld
>> > +# define SZREG 8
>> > +# define PTRLOG        3
>> > +#else
>> > +# define REG_L lw
>> > +# define SZREG 4
>> > +# define PTRLOG        2
>> > +#endif
>> > +
>> > +#ifndef STRNCMP
>> > +# define STRNCMP __strncmp_zbb
>> > +#endif
>> > +
>> > +.option push
>> > +.option arch,+zbb
>> > +
>> > +ENTRY_ALIGN (STRNCMP, 6)
>> > +       beqz    len, L(equal)
>> > +       or      align, src1, src2
>> > +       and     align, align, SZREG-1
>> > +       add     limit, src1, len
>> > +       bnez    align, L(simpleloop)
>> > +       li      m1, -1
>> > +
>> > +       /* Adjust limit for fast-path.  */
>> > +       andi fast_limit, limit, -SZREG
>> > +
>> > +       /* Main loop for aligned string.  */
>> > +       .p2align 3
>> > +L(loop):
>> > +       bge     src1, fast_limit, L(simpleloop)
>> > +       REG_L   data1, 0(src1)
>> > +       REG_L   data2, 0(src2)
>> > +       orc.b   data1_orcb, data1
>> > +       bne     data1_orcb, m1, L(foundnull)
>> > +       addi    src1, src1, SZREG
>> > +       addi    src2, src2, SZREG
>> > +       beq     data1, data2, L(loop)
>> > +
>> > +       /* Words don't match, and no null byte in the first
>> > +        * word. Get bytes in big-endian order and compare.  */
>> > +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
>> > +       rev8    data1, data1
>> > +       rev8    data2, data2
>> > +#endif
>> > +       /* Synthesize (data1 >= data2) ? 1 : -1 in a branchless sequence.  */
>> > +       sltu    result, data1, data2
>> > +       neg     result, result
>> > +       ori     result, result, 1
>> > +       ret
>> > +
>> > +L(foundnull):
>> > +       /* Found a null byte.
>> > +        * If words don't match, fall back to simple loop.  */
>> > +       bne     data1, data2, L(simpleloop)
>> > +
>> > +       /* Otherwise, strings are equal.  */
>> > +       li      result, 0
>> > +       ret
>> > +
>> > +       /* Simple loop for misaligned strings.  */
>> > +       .p2align 3
>> > +L(simpleloop):
>> > +       bge     src1, limit, L(equal)
>> > +       lbu     data1, 0(src1)
>> > +       addi    src1, src1, 1
>> > +       lbu     data2, 0(src2)
>> > +       addi    src2, src2, 1
>> > +       bne     data1, data2, L(sub)
>> > +       bnez    data1, L(simpleloop)
>> > +
>> > +L(sub):
>> > +       sub     result, data1, data2
>> > +       ret
>> > +
>> > +L(equal):
>> > +       li      result, 0
>> > +       ret
>> > +
>> > +.option pop
>> > +
>> > +END (STRNCMP)
>> > +libc_hidden_builtin_def (STRNCMP)
>> > --
>> > 2.39.1
>> >

  reply	other threads:[~2023-02-08 17:55 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-07  0:15 [RFC PATCH 00/19] riscv: ifunc support with optimized mem*/str*/cpu_relax routines Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 01/19] Inhibit early libcalls before ifunc support is ready Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 02/19] riscv: LEAF: Use C_LABEL() to construct the asm name for a C symbol Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 03/19] riscv: Add ENTRY_ALIGN() macro Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 04/19] riscv: Add hart feature run-time detection framework Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 05/19] riscv: Introduction of ISA extensions Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 06/19] riscv: Adding ISA string parser for environment variables Christoph Muellner
2023-02-07  6:20   ` David Abdurachmanov
2023-02-07  0:16 ` [RFC PATCH 07/19] riscv: hart-features: Add fast_unaligned property Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 08/19] riscv: Add (empty) ifunc framework Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 09/19] riscv: Add ifunc support for memset Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 10/19] riscv: Add accelerated memset routines for RV64 Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 11/19] riscv: Add ifunc support for memcpy/memmove Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 12/19] riscv: Add accelerated memcpy/memmove routines for RV64 Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 13/19] riscv: Add ifunc support for strlen Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 14/19] riscv: Add accelerated strlen routine Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 15/19] riscv: Add ifunc support for strcmp Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 16/19] riscv: Add accelerated strcmp routines Christoph Muellner
2023-02-07 11:57   ` Xi Ruoyao
2023-02-07 14:15     ` Christoph Müllner
2023-03-31  5:06       ` Jeff Law
2023-03-31 12:31         ` Adhemerval Zanella Netto
2023-03-31 14:30           ` Jeff Law
2023-03-31 14:48             ` Adhemerval Zanella Netto
2023-03-31 17:19               ` Palmer Dabbelt
2023-03-31 14:32       ` Jeff Law
2023-02-07  0:16 ` [RFC PATCH 17/19] riscv: Add ifunc support for strncmp Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 18/19] riscv: Add an optimized strncmp routine Christoph Muellner
2023-02-07  1:19   ` Noah Goldstein
2023-02-08 15:13     ` Philipp Tomsich
2023-02-08 17:55       ` Palmer Dabbelt [this message]
2023-02-08 19:48         ` Adhemerval Zanella Netto
2023-02-08 18:04       ` Noah Goldstein
2023-02-07  0:16 ` [RFC PATCH 19/19] riscv: Add __riscv_cpu_relax() to allow yielding in busy loops Christoph Muellner
2023-02-07  0:23   ` Andrew Waterman
2023-02-07  0:29     ` Christoph Müllner
2023-02-07  2:59 ` [RFC PATCH 00/19] riscv: ifunc support with optimized mem*/str*/cpu_relax routines Kito Cheng
2023-02-07 16:40 ` Adhemerval Zanella Netto
2023-02-07 17:16   ` DJ Delorie
2023-02-07 19:32     ` Philipp Tomsich
2023-02-07 21:14       ` DJ Delorie
2023-02-08 11:26         ` Christoph Müllner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mhng-e7534968-425c-4164-a403-1d02725c661c@palmer-ri-x1c9 \
    --to=palmer@dabbelt.com \
    --cc=andrew@sifive.com \
    --cc=christoph.muellner@vrull.eu \
    --cc=darius@bluespec.com \
    --cc=dj@redhat.com \
    --cc=goldstein.w.n@gmail.com \
    --cc=heiko.stuebner@vrull.eu \
    --cc=jeffreyalaw@gmail.com \
    --cc=kito.cheng@sifive.com \
    --cc=libc-alpha@sourceware.org \
    --cc=philipp.tomsich@vrull.eu \
    --cc=vineetg@rivosinc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).