public inbox for libc-alpha@sourceware.org
From: Noah Goldstein <goldstein.w.n@gmail.com>
To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Cc: GNU C Library <libc-alpha@sourceware.org>,
	 Adhemerval Zanella <adhemerval.zanella@linaro.org>,
	"H.J. Lu" <hjl.tools@gmail.com>
Subject: Re: [PATCH v2] x86-64: Optimize bzero
Date: Wed, 23 Feb 2022 02:12:13 -0600	[thread overview]
Message-ID: <CAFUsyfJKpM+SpEt5ShCU8Dfu2+sp-rQMgmHX_zBzpc-Scvg6Ww@mail.gmail.com> (raw)
In-Reply-To: <AS8PR08MB65342CBD569FB0206F8522D883349@AS8PR08MB6534.eurprd08.prod.outlook.com>

On Tue, Feb 15, 2022 at 7:38 AM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>
> Hi,
>
> > Is there any way it can be setup so that one C impl can cover all the
> > arch that want to just leave `__memsetzero` as an alias to `memset`?
> > I know they have incompatible interfaces that make it hard but would
> > a weak static inline in string.h work?
>
> No that won't work. A C implementation similar to current string/bzero.c
> adds unacceptable overhead (since most targets just implement memset and
> will continue to do so). An inline function in string.h would introduce target
> hacks in our headers, something we've been working hard to remove over the
> years.
>
> The only reasonable option is a target specific optimization in GCC and LLVM
> so that memsetzero is only emitted when it is known an optimized GLIBC
> implementation exists (similar to mempcpy).
>
> > It's worth noting that between the two `memset` is the cold function
> > and `__memsetzero` is the hot one. Based on profiles of GCC11 and
> > Python3.7.7 setting zero covers 99%+ cases.
>
> There is no doubt memset of zero is by far the most common. What is in doubt
> is whether micro-optimizing is worth it on modern cores. Does Python speed up
> by a measurable amount if you use memsetzero?

I ran a few benchmarks with GCC and Python 3.7.

There is no measurable benefit from using '__memsetzero' in Python 3.7.

For GCC there are some cases where there is a consistent speedup,
though it's not universal.

Times are the geomean (N=30) of the ratio memsetzero / memset
(1.0 means no difference, less than 1 means an improvement, and
greater than 1 a regression).

 Size, N Funcs,  Type, memsetzero / memset
small,       1, bench,             0.99986
small,       1, build,             0.99378
small,       1,  link,             0.99241
small,      10, bench,             0.99712
small,      10, build,             0.99393
small,      10,  link,             0.99245
small,     100, bench,             0.99659
small,     100, build,             0.99271
small,     100,  link,             0.99227
small,     250, bench,             1.00195
small,     250, build,             0.99609
small,     250,  link,             0.99744
large,     N/A, bench,             0.99930


The "small" size means the file was filled with essentially empty
functions, i.e.:
```
int foo(void) { return 0; }
```

"N Funcs" refers to the number of these functions per file, so small-250
would be 250 empty functions per file.

Bench recompiled the same file 100 times.
Build compiled all the files.
Link linked all the files with a main that emitted one call per function.

The "large" size was a realistic file someone might compile (in this case
a freeze of sqlite3.c).

The performance improvement in the build/link steps across varying
numbers of small functions per file was consistently in the ~0.8% range.
Not mind-blowing, but I believe it's a genuine improvement.

I don't think this shows that typical GCC usage is going to be faster, but
I do think it shows that the effects of this change could be noticeable in
a real application.

NB: I'm not exactly certain why 'bench' doesn't follow the same trend as
build/link. The only thing I notice is that 'bench' takes longer (it is
implemented in a Makefile loop), so possibly the fixed '+ c' overhead term
just dampens any performance differences. The math for this doesn't work
out 100%, so there is still some reason to be skeptical.


>
> Cheers,
> Wilco

