public inbox for libc-alpha@sourceware.org
From: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
To: Palmer Dabbelt <palmer@dabbelt.com>, philipp.tomsich@vrull.eu
Cc: goldstein.w.n@gmail.com, christoph.muellner@vrull.eu,
	libc-alpha@sourceware.org, Darius Rad <darius@bluespec.com>,
	Andrew Waterman <andrew@sifive.com>, DJ Delorie <dj@redhat.com>,
	Vineet Gupta <vineetg@rivosinc.com>,
	kito.cheng@sifive.com, jeffreyalaw@gmail.com,
	heiko.stuebner@vrull.eu
Subject: Re: [RFC PATCH 18/19] riscv: Add an optimized strncmp routine
Date: Wed, 8 Feb 2023 16:48:56 -0300	[thread overview]
Message-ID: <ae18aa51-f38d-952f-8576-a31614069adb@linaro.org> (raw)
In-Reply-To: <mhng-e7534968-425c-4164-a403-1d02725c661c@palmer-ri-x1c9>



On 08/02/23 14:55, Palmer Dabbelt wrote:
> On Wed, 08 Feb 2023 07:13:44 PST (-0800), philipp.tomsich@vrull.eu wrote:
>> On Tue, 7 Feb 2023 at 02:20, Noah Goldstein <goldstein.w.n@gmail.com> wrote:
>>>
>>> On Mon, Feb 6, 2023 at 6:23 PM Christoph Muellner
>>> <christoph.muellner@vrull.eu> wrote:
>>> >
>>> > From: Christoph Müllner <christoph.muellner@vrull.eu>
>>> >
>>> > The implementation of strncmp() can be accelerated using Zbb's orc.b
>>> > instruction. Let's add an optimized implementation that makes use
>>> > of this instruction.
>>> >
>>> > Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
>>>
>>> Not necessary, but imo performance patches should have at least some reference
>>> to the expected speedup versus the existing alternatives.
>>
>> Given that this is effectively a SWAR-like optimization (orc.b allows
>> us to test 8 bytes in parallel for a NUL byte), we should be able to
>> show the benefit through a reduction in dynamic instructions.  Would
>> this be considered reasonable reference data?
> 
> Generally for performance improvements the only metrics that count come from real hardware.  Processor implementation is complex and it's not generally true that reducing dynamic instructions results in better performance (particularly when more complex flavors of instructions replace simpler ones).
> 

I agree with Noah here that we need some baseline performance numbers, even
though we are comparing against naive implementations (what glibc used to have
as its generic implementations).

> We've not been so good about this on the RISC-V side of things, though.  I think that's largely because we didn't have all that much complexity around this, but there's a ton of stuff showing up right now.  The general theory has been that Zbb instructions will execute faster than their corresponding I sequences, but nobody has proved that.  I believe the new JH7110 has Zba and Zbb, so maybe the right answer there is to just benchmark things before merging them?  That way we can get back to doing things sanely before we go too far down the premature optimization rabbit hole.
> 
> FWIW: we had a pretty similar discussion in Linux land around these and nobody could get the JH7110 to boot, but given that we have ~6 months until glibc releases again hopefully that will be sorted out.  There's a bunch of ongoing work looking at the more core issues like probing, so maybe it's best to focus on getting that all sorted out first?  It's kind of awkward to have a bunch of routines posted in a whole new framework that's not sorting out all the probing dependencies.

Just a heads up that, with the latest generic string routine optimizations, all
str* routines should now use the new Zbb extensions (if the compiler is
instructed to do so).  I think you might squeeze out some cycles with
hand-crafted assembly routines, but I would rather focus on trying to optimize
code generation instead.
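
Just to make the Zbb point concrete, what the generic routines boil down to is
a byte-parallel "does this word contain a NUL byte" test.  A rough sketch
(illustrative only; this is not the actual glibc code, and the __riscv_zbb
macro and asm details here are my assumptions, not taken from the tree):

  #include <stdint.h>

  /* Portable bit trick: nonzero iff some byte of X is zero.  */
  static inline int
  has_zero_portable (uint64_t x)
  {
    return ((x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL) != 0;
  }

  #ifdef __riscv_zbb  /* Assumed predefined when building with -march=..._zbb.  */
  /* orc.b sets each result byte to 0xff if the input byte is nonzero and
     to 0x00 if it is zero, so a single compare detects embedded NULs.  */
  static inline int
  has_zero_zbb (uint64_t x)
  {
    uint64_t y;
    asm ("orc.b %0, %1" : "=r" (y) : "r" (x));
    return y != UINT64_MAX;
  }
  #endif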

The generic routines still assume that the hardware either cannot issue
unaligned memory accesses or can only do so at prohibitive cost.  However, I
think we should move in that direction and start adding unaligned variants
where it makes sense.
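
For instance, a word-at-a-time scan written under that assumption has to reach
an aligned boundary with byte accesses first; an unaligned variant could skip
the head loop and load the first word directly.  A rough strlen-like sketch
(not the glibc implementation, which handles the first word differently):

  #include <stddef.h>
  #include <stdint.h>

  typedef uint64_t __attribute__ ((may_alias)) word_t;

  size_t
  strlen_aligned_sketch (const char *s)
  {
    const char *p = s;

    /* Head: byte accesses only, until P is 8-byte aligned.  */
    while ((uintptr_t) p % sizeof (word_t) != 0)
      {
        if (*p == '\0')
          return p - s;
        p++;
      }

    /* Body: one aligned 8-byte load per iteration; the bit trick is
       nonzero iff the word contains a NUL byte.  */
    const word_t *w = (const word_t *) p;
    while ((((*w - 0x0101010101010101ULL) & ~*w) & 0x8080808080808080ULL) == 0)
      w++;

    /* Tail: locate the exact NUL inside the final word.  */
    p = (const char *) w;
    while (*p != '\0')
      p++;
    return p - s;
  }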

Another usual tuning is loop unrolling, which depends on the underlying
hardware.  Unfortunately we need to explicitly force gcc to unroll some loop
constructions (for instance, check sysdeps/powerpc/powerpc64/power4/Makefile),
so this might be another approach you could use to tune the RISC-V routines.
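
As a source-level illustration of the same knob (a sketch of an alternative,
assuming a GCC that honors "#pragma GCC unroll"; glibc itself does it through
per-file CFLAGS in the sysdeps Makefile, e.g. -funroll-loops in the powerpc
case, if I recall correctly):

  #include <stddef.h>
  #include <stdint.h>

  void
  fill_words (uint64_t *dst, uint64_t value, size_t nwords)
  {
    /* Ask GCC to unroll the loop body four times.  */
  #pragma GCC unroll 4
    for (size_t i = 0; i < nwords; i++)
      dst[i] = value;
  }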

memcpy, memmove, memset, and memcmp are a slightly different subject.  Although
the current generic mem routines do use some explicit unrolling, they do not
take into consideration unaligned access, vector instructions, or special
instructions (such as cache-clearing ones).  And these usually make a lot of
difference.
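
One example of what I mean: with cheap unaligned access, small copies can be
done with a pair of possibly overlapping word accesses instead of a byte loop,
which the generic code cannot assume.  A rough sketch (illustrative only, not
tied to any existing glibc routine):

  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>

  /* Copy N bytes, 8 <= N <= 16, with two possibly overlapping 8-byte
     accesses.  The memcpy calls below compile down to plain (unaligned)
     loads and stores on targets where those are cheap.  */
  static void
  copy_8_to_16 (void *dst, const void *src, size_t n)
  {
    uint64_t head, tail;
    memcpy (&head, src, sizeof head);                         /* bytes [0, 8)    */
    memcpy (&tail, (const char *) src + n - 8, sizeof tail);  /* bytes [n-8, n)  */
    memcpy (dst, &head, sizeof head);
    memcpy ((char *) dst + n - 8, &tail, sizeof tail);
  }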

What I would expect is that maybe we can use a strategy similar to what Google
is doing with LLVM libc, which based its work on the automemcpy paper [1].  It
means that, for the unaligned case, each architecture would reimplement only
the memory routine building blocks.  Although that project focuses on static
compilation, I think using hooks over assembly routines might be a better
approach (you can reuse code blocks or try different strategies more easily).
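
To sketch what I mean by hooks (a hypothetical shape, every name below is made
up and is not an existing glibc interface): the generic routine keeps the
overall structure, and each architecture only overrides small inline building
blocks, for example:

  #include <stddef.h>

  /* Generic fallback building block; an architecture would override this
     (e.g. with unaligned, vector, or cache-block based versions) via its
     own header instead of rewriting the whole routine in assembly.  */
  static inline void
  copy_block_32 (char *dst, const char *src)
  {
    for (int i = 0; i < 32; i++)
      dst[i] = src[i];
  }

  static void *
  memcpy_sketch (void *dstp, const void *srcp, size_t n)
  {
    char *dst = dstp;
    const char *src = srcp;

    while (n >= 32)
      {
        copy_block_32 (dst, src);   /* Architecture-specific hook.  */
        dst += 32;
        src += 32;
        n -= 32;
      }
    while (n-- > 0)
      *dst++ = *src++;
    return dstp;
  }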

[1] https://storage.googleapis.com/pub-tools-public-publication-data/pdf/4f7c3da72d557ed418828823a8e59942859d677f.pdf

Thread overview: 42+ messages
2023-02-07  0:15 [RFC PATCH 00/19] riscv: ifunc support with optimized mem*/str*/cpu_relax routines Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 01/19] Inhibit early libcalls before ifunc support is ready Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 02/19] riscv: LEAF: Use C_LABEL() to construct the asm name for a C symbol Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 03/19] riscv: Add ENTRY_ALIGN() macro Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 04/19] riscv: Add hart feature run-time detection framework Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 05/19] riscv: Introduction of ISA extensions Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 06/19] riscv: Adding ISA string parser for environment variables Christoph Muellner
2023-02-07  6:20   ` David Abdurachmanov
2023-02-07  0:16 ` [RFC PATCH 07/19] riscv: hart-features: Add fast_unaligned property Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 08/19] riscv: Add (empty) ifunc framework Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 09/19] riscv: Add ifunc support for memset Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 10/19] riscv: Add accelerated memset routines for RV64 Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 11/19] riscv: Add ifunc support for memcpy/memmove Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 12/19] riscv: Add accelerated memcpy/memmove routines for RV64 Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 13/19] riscv: Add ifunc support for strlen Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 14/19] riscv: Add accelerated strlen routine Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 15/19] riscv: Add ifunc support for strcmp Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 16/19] riscv: Add accelerated strcmp routines Christoph Muellner
2023-02-07 11:57   ` Xi Ruoyao
2023-02-07 14:15     ` Christoph Müllner
2023-03-31  5:06       ` Jeff Law
2023-03-31 12:31         ` Adhemerval Zanella Netto
2023-03-31 14:30           ` Jeff Law
2023-03-31 14:48             ` Adhemerval Zanella Netto
2023-03-31 17:19               ` Palmer Dabbelt
2023-03-31 14:32       ` Jeff Law
2023-02-07  0:16 ` [RFC PATCH 17/19] riscv: Add ifunc support for strncmp Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 18/19] riscv: Add an optimized strncmp routine Christoph Muellner
2023-02-07  1:19   ` Noah Goldstein
2023-02-08 15:13     ` Philipp Tomsich
2023-02-08 17:55       ` Palmer Dabbelt
2023-02-08 19:48         ` Adhemerval Zanella Netto [this message]
2023-02-08 18:04       ` Noah Goldstein
2023-02-07  0:16 ` [RFC PATCH 19/19] riscv: Add __riscv_cpu_relax() to allow yielding in busy loops Christoph Muellner
2023-02-07  0:23   ` Andrew Waterman
2023-02-07  0:29     ` Christoph Müllner
2023-02-07  2:59 ` [RFC PATCH 00/19] riscv: ifunc support with optimized mem*/str*/cpu_relax routines Kito Cheng
2023-02-07 16:40 ` Adhemerval Zanella Netto
2023-02-07 17:16   ` DJ Delorie
2023-02-07 19:32     ` Philipp Tomsich
2023-02-07 21:14       ` DJ Delorie
2023-02-08 11:26         ` Christoph Müllner
