From: Palmer Dabbelt <palmer@dabbelt.com>
To: adhemerval.zanella@linaro.org
Cc: jeffreyalaw@gmail.com, christoph.muellner@vrull.eu,
	xry111@xry111.site, libc-alpha@sourceware.org,
	Darius Rad <darius@bluespec.com>,
	Andrew Waterman <andrew@sifive.com>, DJ Delorie <dj@redhat.com>,
	Vineet Gupta <vineetg@rivosinc.com>,
	kito.cheng@sifive.com, philipp.tomsich@vrull.eu,
	heiko.stuebner@vrull.eu
Subject: Re: [RFC PATCH 16/19] riscv: Add accelerated strcmp routines
Date: Fri, 31 Mar 2023 10:19:37 -0700 (PDT)
Message-ID: <mhng-3745a0bb-3612-415c-a367-b30f69daeb82@palmer-ri-x1c9a>
In-Reply-To: <8db65d2d-e0ed-b999-8d28-35cbcfadc4ed@linaro.org>

On Fri, 31 Mar 2023 07:48:43 PDT (-0700), adhemerval.zanella@linaro.org wrote:
>
>
> On 31/03/23 11:30, Jeff Law wrote:
>>
>>
>> On 3/31/23 06:31, Adhemerval Zanella Netto wrote:
>>
>>>> Jeff
>>>
>>> Is this implementation really better than the new generic one [1]? On a target
>>> with Zbb support, the generic word comparison should use the orc.b instruction [2].
>>> And the final comparison, once the last word or the mismatching word is found,
>>> should use the clz/ctz instructions [3] (which also results in branchless code, albeit
>>> I have not checked whether it is better than the snippet this implementation uses).
>> I haven't done any comparisons against the updated generic bits.  I nearly suggested to Christoph to do that evaluation, but when I wandered around sysdeps I saw that we still had multiple custom strcmp implementations and set that suggestion aside.
>>
>>
>>>
>>> The generic implementation also has the advantage of using word instructions
>>> in the unaligned case, where this implementation does a naive byte-by-byte
>>> check.
>> Yea, but in my digging this just didn't happen terribly often.  I don't think there's a lot of value there.  Along the same lines, my investigation didn't show any significant value to realign cases and I nearly suggested dropping them to avoid the branch in the hot path, but I wasn't confident enough in the breadth of my investigations to push it.
>>
>>> So maybe a better option would be to further optimize the generic implementation.
>>> One option might be to parametrize the final_cmp so you can use the branchless
>>> trick (if it indeed is better than generic code).  Another option that the
>>> generic implementation does not explore is manual loop unrolling, as done by
>>> multiple assembly implementations.
>> I could certainly support that.  I was on the fence about pushing to use the generic bits, a little nudge could easily push me to that side.
>
> The initial realign could be tuned; I added it mostly because it simplifies both
> the aligned and unaligned cases a lot.  But it should be doable to use a similar
> strategy as strchr/strlen to mask off the bits based on the input alignment.
>
> The unaligned case is just there to avoid a drastic performance difference between
> input alignments; it is cheap and in the end should just be additional code
> size.
>
> But the main gain of using the generic implementation is one less assembly
> routine to maintain and tune; and by improving the generic implementation
> the ecosystem as a whole gains.

I think we should use the generic stuff where we can, just to avoid 
extra maintenance issues.  I think we'll eventually end up with 
vendor-specific assembly routines, but IMO it's best to only merge those 
if there's a meaningful performance advantage and there's no way to 
replicate it without resorting to assembly.
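
For reference, here is a minimal C sketch of the kind of generic
word-at-a-time comparison discussed above.  It is not the glibc
implementation: all names (load_word, zero_byte_mask, final_cmp,
sketch_strcmp_padded) are made up for illustration.  It assumes a
little-endian target, GCC/Clang's __builtin_ctzll, and that both
buffers are readable up to the next multiple of 8 bytes past their
NUL.  A real routine would handle alignment instead of relying on
padding, and on a Zbb target the compiler or an orc.b-based helper
could replace the portable zero-byte test.

```c
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Load 8 bytes without alignment requirements.  Assumption: the
   target is little-endian, so byte 0 lands in the low bits. */
static uint64_t load_word(const char *p)
{
    uint64_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Nonzero iff w contains a zero byte.  The lowest set bit is bit 7
   of the first (lowest-addressed) zero byte; higher bits may contain
   false positives, which the code below never looks at. */
static uint64_t zero_byte_mask(uint64_t w)
{
    return (w - ONES) & ~w & HIGHS;
}

/* Branchless final comparison.  Precondition: w1 contains a NUL or
   w1 != w2, so that the argument to ctz is nonzero. */
static int final_cmp(uint64_t w1, uint64_t w2)
{
    uint64_t stop = zero_byte_mask(w1) | (w1 ^ w2);
    /* Round the bit index down to the start of its byte. */
    unsigned shift = (unsigned)__builtin_ctzll(stop) & ~7u;
    unsigned c1 = (unsigned)(w1 >> shift) & 0xff;
    unsigned c2 = (unsigned)(w2 >> shift) & 0xff;
    return (int)c1 - (int)c2;
}

/* Hypothetical strcmp-like loop.  Assumption: both strings live in
   buffers padded to a multiple of 8 bytes, so whole-word loads never
   read out of bounds. */
static int sketch_strcmp_padded(const char *s1, const char *s2)
{
    for (;;) {
        uint64_t w1 = load_word(s1);
        uint64_t w2 = load_word(s2);
        if (w1 != w2 || zero_byte_mask(w1) != 0)
            return final_cmp(w1, w2);
        s1 += 8;
        s2 += 8;
    }
}
```

On a big-endian target the same idea would use a count-leading-zeros
operation instead, since the first byte in memory then sits in the
high bits of the word.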


Thread overview: 42+ messages
2023-02-07  0:15 [RFC PATCH 00/19] riscv: ifunc support with optimized mem*/str*/cpu_relax routines Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 01/19] Inhibit early libcalls before ifunc support is ready Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 02/19] riscv: LEAF: Use C_LABEL() to construct the asm name for a C symbol Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 03/19] riscv: Add ENTRY_ALIGN() macro Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 04/19] riscv: Add hart feature run-time detection framework Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 05/19] riscv: Introduction of ISA extensions Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 06/19] riscv: Adding ISA string parser for environment variables Christoph Muellner
2023-02-07  6:20   ` David Abdurachmanov
2023-02-07  0:16 ` [RFC PATCH 07/19] riscv: hart-features: Add fast_unaligned property Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 08/19] riscv: Add (empty) ifunc framework Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 09/19] riscv: Add ifunc support for memset Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 10/19] riscv: Add accelerated memset routines for RV64 Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 11/19] riscv: Add ifunc support for memcpy/memmove Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 12/19] riscv: Add accelerated memcpy/memmove routines for RV64 Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 13/19] riscv: Add ifunc support for strlen Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 14/19] riscv: Add accelerated strlen routine Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 15/19] riscv: Add ifunc support for strcmp Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 16/19] riscv: Add accelerated strcmp routines Christoph Muellner
2023-02-07 11:57   ` Xi Ruoyao
2023-02-07 14:15     ` Christoph Müllner
2023-03-31  5:06       ` Jeff Law
2023-03-31 12:31         ` Adhemerval Zanella Netto
2023-03-31 14:30           ` Jeff Law
2023-03-31 14:48             ` Adhemerval Zanella Netto
2023-03-31 17:19               ` Palmer Dabbelt [this message]
2023-03-31 14:32       ` Jeff Law
2023-02-07  0:16 ` [RFC PATCH 17/19] riscv: Add ifunc support for strncmp Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 18/19] riscv: Add an optimized strncmp routine Christoph Muellner
2023-02-07  1:19   ` Noah Goldstein
2023-02-08 15:13     ` Philipp Tomsich
2023-02-08 17:55       ` Palmer Dabbelt
2023-02-08 19:48         ` Adhemerval Zanella Netto
2023-02-08 18:04       ` Noah Goldstein
2023-02-07  0:16 ` [RFC PATCH 19/19] riscv: Add __riscv_cpu_relax() to allow yielding in busy loops Christoph Muellner
2023-02-07  0:23   ` Andrew Waterman
2023-02-07  0:29     ` Christoph Müllner
2023-02-07  2:59 ` [RFC PATCH 00/19] riscv: ifunc support with optimized mem*/str*/cpu_relax routines Kito Cheng
2023-02-07 16:40 ` Adhemerval Zanella Netto
2023-02-07 17:16   ` DJ Delorie
2023-02-07 19:32     ` Philipp Tomsich
2023-02-07 21:14       ` DJ Delorie
2023-02-08 11:26         ` Christoph Müllner
