[RFC PATCH 12/19] riscv: Add accelerated memcpy/memmove routines for RV64


From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
To: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: 'GNU C Library' <libc-alpha@sourceware.org>
Subject: [RFC PATCH 12/19] riscv: Add accelerated memcpy/memmove routines for RV64
Date: Thu, 9 Feb 2023 11:43:15 +0000
Message-ID: <PAWPR08MB898220123E75682E73D8E91C83D99@PAWPR08MB8982.eurprd08.prod.outlook.com>

Hi Adhemerval,

> The generic routines still assumes that hardware can't or is prohibitive 
> expensive to issue unaligned memory access.  However, I think we move toward 
> this direction to start adding unaligned variants when it makes sense.

There is a _STRING_ARCH_unaligned define that can be set per target. It needs
cleaning up since it's used mostly for premature micro-optimizations (e.g. getenv.c)
where using a fixed-size memcpy would be best (it also appears to have big-endian
bugs).
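
As a rough illustration of the fixed-size memcpy idea (a sketch only, not the
actual getenv.c code; the helper names load_u16 and first_two_bytes_equal are
made up for this example): a constant-size memcpy lets the compiler emit a
single load where the target allows it, without needing a per-target define
or an unaligned pointer cast.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 16 bits from a possibly unaligned address.
   A constant-size memcpy is well defined for any alignment, and compilers
   typically turn it into one load on targets with fast unaligned access.  */
static inline uint16_t
load_u16 (const void *p)
{
  uint16_t v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* Hypothetical helper: compare the first two bytes of two buffers.
   Both buffers must have at least two readable bytes.  Only equality is
   tested, so the result does not depend on byte order.  */
static inline int
first_two_bytes_equal (const char *a, const char *b)
{
  return load_u16 (a) == load_u16 (b);
}
```

Because the comparison is a plain equality of two native loads, there is no
endian-specific byte assembly to get wrong.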

> Another usual tuning is loop unrolling, which depends on underlying hardware.
> Unfortunately we need to explicit force gcc to unroll some loop construction
> (for instance check sysdeps/powerpc/powerpc64/power4/Makefile), so this might
> be another approach you might use to tune RISCV routines.

Compiler unrolling is unlikely to give improved results, especially on GCC, where
the default unroll factor is still 16, which will just bloat the code.
So all reasonable unrolling is best done by hand (and doesn't need to be
target-specific).
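
To make "unrolling by hand" concrete, here is a minimal sketch in generic C
(the function name copy_unrolled4 and the factor of 4 are assumptions picked
for illustration, not taken from glibc): four 8-byte moves per iteration,
with a simple byte loop for the tail.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: copy N bytes between non-overlapping buffers,
   moving 32 bytes per iteration as four 8-byte blocks.  The fixed-size
   memcpy calls are expected to compile to plain loads and stores.  */
static void
copy_unrolled4 (void *restrict dst, const void *restrict src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  while (n >= 32)
    {
      uint64_t a, b, c, e;
      /* Issue all loads first, then all stores.  */
      memcpy (&a, s, 8);
      memcpy (&b, s + 8, 8);
      memcpy (&c, s + 16, 8);
      memcpy (&e, s + 24, 8);
      memcpy (d, &a, 8);
      memcpy (d + 8, &b, 8);
      memcpy (d + 16, &c, 8);
      memcpy (d + 24, &e, 8);
      s += 32;
      d += 32;
      n -= 32;
    }

  /* Tail: fewer than 32 bytes remain.  */
  while (n > 0)
    {
      *d++ = *s++;
      n--;
    }
}
```

A modest, explicit factor like this keeps code size predictable regardless of
compiler unrolling options.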

> The memcpy, memmove, memset, memcmp are a slight different subject.  Although
> current generic mem routines does use some explicit unrolling, it also does
> not take in consideration unaligned access, vector instructions, or special 
> instruction (such as cache clear one).  And these usually make a lot of
> difference.

Indeed. However, it is also quite difficult to make use of all these without a lot of
target-specific code and inline assembler. And at that point you might as well use
assembler...

> What I would expect it maybe we can use a similar strategy Google is doing
> with llvm libc, which based its work on the automemcpy paper [1]. It means
> that for unaligned, each architecture will reimplement the memory routine
> block.  Although the project focus on static compiling, I think using hooks
> over assembly routines might be a better approach (you might reuse code
> blocks or try different strategies more easily).
>
> [1] https://storage.googleapis.com/pub-tools-public-publication-data/pdf/4f7c3da72d557ed418828823a8e59942859d677f.pdf

I'm still not convinced about this strategy - it's hard to beat assembler using
generic code. The way it works in LLVM is that you implement a new set of
builtins that inline an optimal memcpy for a fixed size. But you don't know the
alignment, so this only works on targets that support fast unaligned access.
And with different compiler versions/options you get major performance
variations due to code reordering, register allocation differences or failure
to emit load/store pairs...
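
For readers unfamiliar with the fixed-size building block approach, a small
sketch under stated assumptions: plain memcpy with a constant size stands in
for the compiler builtin, and the name copy_8_to_16 is invented here. Sizes
from 8 to 16 bytes are handled with two 8-byte blocks that may overlap in the
middle, which is only cheap when unaligned access is fast.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: copy N bytes, where 8 <= N <= 16, using two
   fixed-size 8-byte blocks.  The second block is anchored at the end of
   the buffer, so the two blocks overlap when N < 16.  Neither address is
   known to be aligned.  */
static void
copy_8_to_16 (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;
  uint64_t head, tail;

  /* Load both blocks before storing either of them.  */
  memcpy (&head, s, 8);
  memcpy (&tail, s + n - 8, 8);
  memcpy (d, &head, 8);
  memcpy (d + n - 8, &tail, 8);
}
```

Whether this compiles to four memory instructions or something worse depends
on the compiler version and options, which is the variation described above.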

I believe it is reasonable to ensure the generic string functions are efficient
to avoid having to write assembler for every string function. However, it
becomes crazy when you set the goal to be as close as possible to the best
assembler version in all cases. Most targets will add assembly versions for
key functions like memcpy, strlen, etc.

Cheers,
Wilco

Thread overview: 3+ messages
2023-02-09 11:43 Wilco Dijkstra [this message]
2023-02-09 12:25 ` Adhemerval Zanella Netto
  -- strict thread matches above, loose matches on Subject: below --
2023-02-07  0:15 [RFC PATCH 00/19] riscv: ifunc support with optimized mem*/str*/cpu_relax routines Christoph Muellner
2023-02-07  0:16 ` [RFC PATCH 12/19] riscv: Add accelerated memcpy/memmove routines for RV64 Christoph Muellner
