public inbox for libc-alpha@sourceware.org
From: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
To: Palmer Dabbelt <palmer@rivosinc.com>, jeffreyalaw@gmail.com
Cc: Evan Green <evan@rivosinc.com>,
	libc-alpha@sourceware.org, slewis@rivosinc.com,
	Vineet Gupta <vineetg@rivosinc.com>
Subject: Re: [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface
Date: Fri, 31 Mar 2023 16:32:42 -0300	[thread overview]
Message-ID: <71550310-e6ef-dfa1-db3b-dcd7407fd413@linaro.org> (raw)
In-Reply-To: <mhng-d66a0944-bc47-40e0-95a8-2b080de8c8db@palmer-ri-x1c9a>



On 31/03/23 15:34, Palmer Dabbelt wrote:
> On Fri, 31 Mar 2023 11:07:02 PDT (-0700), jeffreyalaw@gmail.com wrote:
>>
>>
>> On 3/30/23 13:38, Adhemerval Zanella Netto wrote:
>>>
>>>
>>> On 30/03/23 03:20, Jeff Law wrote:
>>>>
>>>>
>>>> On 3/29/23 13:45, Palmer Dabbelt wrote:
>>>>
>>>>> It's not in for-next yet, but various patch sets / proposals have been on the lists for a few months and it seems like discussion on the kernel side has pretty much died down.  That's why I was pinging the glibc side of things: if anyone here has comments on the interface, then it's time to chime in.  If there are no comments then we're likely to end up with this in the next release (so queued into for-next soon, Linus' master in a month or so).
>>>> Right.  And I've suggested that we at least try to settle on the various mem* and str* implementations independently of the kernel->glibc interface question.
>>>>
>>>> I don't much care how we break down the problem of selecting implementations, just that we get started.   That can and probably should be happening in parallel with the kernel->glibc API work.
>>>>
>>>> I've got some performance testing to do in this space (primarily of the VRULL implementations).  It's just going to take a long time to get the data.  And that implementation probably needs some revamping after all the work on the mem* and str* infrastructure that landed earlier this year.
>>>>
>>>
>>> I don't think glibc is the right place for a code dump, especially for implementations
>>> that do not have representative performance numbers on real hardware and might
>>> require further tuning.  It can be even trickier if you require a different build config
>>> for testing, as some ABIs used to (for instance powerpc with --with-cpu);
>>> at least for ifunc we have a mechanism to test multiple variants, assuming the
>>> chip at least supports them (which should be the case for unaligned).
>> It's not meant to be a "code dump".  It's "these are the recommended
>> implementations and we're just waiting for the final ifunc wiring to use
>> them automatically."
>>
>> But I understand your point.  Even just agreeing on the
>> implementations, without committing them until the ifunc interface is
>> settled, would be a major step forward.
>>
>> My larger point is that we need to work through the str* and mem*
>> implementations and settle on them, and that can happen
>> independently of the interface discussion with the kernel team.  If
>> we've settled on specific implementations, why not go ahead and put them
>> into the repo with the expectation that we can trivially wire them into
>> the ifunc resolver once the ABI interface is sorted out.
> 
> IMO that's fine: we've got a bunch of other infrastructure around these optimized routines that will need to get built (glibc_hwcaps, for example) so it's not like just having hwprobe means we're done.
> 
> The only issue I see with having these in tree is that we'll end up with glibc binaries that have vendor-specific tunings, but no way to provide those with generic binaries.  That means vendors will end up shipping these non-portable binaries.  We've historically tried to avoid that wherever possible, but it's probably time to call that a pipe dream -- the only base we could really have is rv64gc, and that's going to be so slow it's essentially useless for any real systems.
> 
> So if you guys have actual performance gain numbers to talk about, then I'm happy taking the optimized glibc routines (or at least whatever bits of them are in RISC-V land) for that hardware -- even if it means there's a build-time configuration that results in Ventana-specific binaries.
> 
> I think we do want to keep pushing on the dynamic flavors of stuff, just so we can try to dig out of this hole at some point, but we're going to have a mess until the ISA get sorted out.  My guess is that will take years, and blocking the optimizations until then is just going to lead to a bunch of out-of-tree ports from vendors and an even bigger mess.

It is still not clear to me how RISC-V, as an ABI and not as a specific vendor, 
wants to provide arch- and vendor-specific str* and mem* routines.  Christophe
has hinted that the focus is not a compile-only approach, so I take it that
--with-cpu support (similar to what some old ABIs used to provide, like powerpc)
is not an option.  However, that is not what the RVV proposal does [3], which is
to enable RVV iff you target glibc at rvv (so compile-only).

And that's why I asked you guys to first define how you want to approach
it.

So I take it that RISC-V wants to follow what x86_64 and aarch64 do, which is to
provide optimized routines for a minimum ABI (say rv64gc), and then provide
runtime selection through ifunc for either ABI- or vendor-specific routines
(including variants like the unaligned optimization).  You can still follow
what x86_64 and s390 recently did, which is: if you define a minimum ABI
version, you default to the optimized version and either skip ifunc selection
or set up a more restricted set (so in the future you can have an rvv-only build
that does not need to provide old zbb or rv64gc support).

Which then leads to how to actually test and provide such support.  The
str* and mem* tests consult which ifunc variants are supported
(ifunc-impl-list.c) on the underlying hardware, while the selector returns
the best option.  Both rely on a way to query the hardware, or at least which
versions are supported, so I think RISC-V should first figure out this part
(unless you do want to follow the compile-only approach...).
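
For reference, the test-side hookup mentioned above is the per-architecture
__libc_ifunc_impl_list.  Following the existing aarch64/x86_64 pattern, a
RISC-V version might look roughly like the fragment below; the variant names
and the misaligned_fast predicate are the same hypothetical ones as in the
previous sketch, and only the IFUNC_IMPL / IFUNC_IMPL_ADD machinery is real
glibc infrastructure:

/* Hypothetical sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c,
   modeled on the aarch64 file; names are illustrative only.  */
#include <ifunc-impl-list.h>
#include <string.h>

extern __typeof (memcpy) __memcpy_generic;
extern __typeof (memcpy) __memcpy_noalignment;
extern int misaligned_fast (void);

size_t
__libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
			size_t max)
{
  size_t i = 0;

  IFUNC_IMPL (i, name, memcpy,
	      /* Report the unaligned variant as usable only when the
		 kernel says misaligned accesses are fast, so the string
		 tests exercise it just on hardware that can run it.  */
	      IFUNC_IMPL_ADD (array, i, memcpy, misaligned_fast (),
			      __memcpy_noalignment)
	      IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_generic))

  return i;
}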

So it does not make sense to me to have ifunc variants in the repo that are
neither selected nor tested, only to be enabled at some foreseeable point in
the future.

[1] https://sourceware.org/pipermail/libc-alpha/2023-February/145392.html
[2] https://sourceware.org/pipermail/libc-alpha/2023-February/145414.html
[3] https://sourceware.org/pipermail/libc-alpha/2023-March/thread.html

> 
>>> So for experimental routines, where you expect frequent tuning once you have
>>> tests and benchmarks on different chips, an external project
>>> might be a better idea, syncing with glibc once the routines are tested and validated.
>>> And these RISC-V ones do seem to be still very experimental, where the performance
>>> numbers are still synthetic ones from emulators.
>> I think we're actually a lot closer than you might think :-)  My goal
>> would be that we're not doing frequent tuning, and that we avoid uarch-specific
>> versions if we at all can.  There's a reasonable chance we can do that
>> if we have good baseline, zbb, and vector versions.  I'm not including
> 
> Unfortunately there's going to be very wide variation in performance between vendors for the vector extension; we're going to have at least 3 flavors of anything there (plus whatever Allwinner/T-Head ends up needing, but that's a whole can of worms).  So I think at this point we'd be better off just calling these vendor-specific routines, and if there's some commonality between them we can sort it out later.
> 
>> cboz memory clear right now -- there's already evidence that uarch
>> considerations around cboz may be significant.
> 
> Yep, again there's at least 3 ways of implementing CBOZ that I've seen floating around so we're going to have a vendor-specific mess there.
> 
>>> Another possibility might be to improve the generic implementation, as we have done
>>> recently, where RISC-V bitmanip was a matter of adding just 2 files and 4 functions
>>> to optimize multiple string functions [2].  I have some WIP patches to add support
>>> for unaligned memcpy/memmove with a very simple strategy.
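
(Not the WIP patches themselves, just an illustration of the kind of "very
simple strategy" meant here: copy word-sized chunks without aligning first,
which pays off exactly when the hardware reports fast misaligned access, then
finish with a byte tail.  The function name is made up.)

/* Illustrative only.  memcpy() into a local is the strict-aliasing-safe
   way to express a potentially misaligned load/store in portable C.  */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

void *
copy_unaligned_words (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  while (n >= sizeof (uint64_t))
    {
      uint64_t w;
      memcpy (&w, s, sizeof w);   /* possibly misaligned load  */
      memcpy (d, &w, sizeof w);   /* possibly misaligned store */
      d += sizeof w;
      s += sizeof w;
      n -= sizeof w;
    }

  while (n--)                     /* byte tail */
    *d++ = *s++;

  return dst;
}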
>> As I noted elsewhere, I was on the fence about pushing for improvements
>> to the generic strcmp bits, but could be easily swayed to that position.
>>
>> jeff


Thread overview: 27+ messages
2023-02-21 19:15 Evan Green
2023-02-21 19:15 ` [PATCH v2 1/3] riscv: Add Linux hwprobe syscall support Evan Green
2023-03-29 18:38   ` Adhemerval Zanella Netto
2023-02-21 19:15 ` [PATCH v2 2/3] riscv: Add hwprobe vdso call support Evan Green
2023-03-29 18:39   ` Adhemerval Zanella Netto
2023-02-21 19:15 ` [PATCH v2 3/3] riscv: Add and use alignment-ignorant memcpy Evan Green
2023-03-28 22:54 ` [PATCH v2 0/3] RISC-V: ifunced memcpy using new kernel hwprobe interface Palmer Dabbelt
2023-03-28 23:41   ` Adhemerval Zanella Netto
2023-03-29  0:01     ` Palmer Dabbelt
2023-03-29 19:16       ` Adhemerval Zanella Netto
2023-03-29 19:45         ` Palmer Dabbelt
2023-03-29 20:13           ` Adhemerval Zanella Netto
2023-03-30 18:31             ` Evan Green
2023-03-30 19:43               ` Adhemerval Zanella Netto
2023-03-30  6:20           ` Jeff Law
2023-03-30 18:43             ` Evan Green
2023-03-31  5:09               ` Jeff Law
2023-03-30 19:38             ` Adhemerval Zanella Netto
2023-03-31 18:07               ` Jeff Law
2023-03-31 18:34                 ` Palmer Dabbelt
2023-03-31 19:32                   ` Adhemerval Zanella Netto [this message]
2023-03-31 20:19                     ` Jeff Law
2023-03-31 21:03                       ` Palmer Dabbelt
2023-03-31 21:35                         ` Jeff Law
2023-03-31 21:38                           ` Palmer Dabbelt
2023-03-31 22:10                             ` Jeff Law
2023-04-07 15:36                               ` Palmer Dabbelt
