Re: [PATCH v3] Reversing calculation of __x86_shared_non_temporal_threshold

public inbox for libc-alpha@sourceware.org

From: Carlos O'Donell <carlos@redhat.com>
To: Patrick McGehearty <patrick.mcgehearty@oracle.com>,
	libc-alpha@sourceware.org
Subject: Re: [PATCH v3] Reversing calculation of __x86_shared_non_temporal_threshold
Date: Sun, 27 Sep 2020 09:54:05 -0400
Message-ID: <e24f28ca-68b0-3391-97bf-4a7a4da474c7@redhat.com>
In-Reply-To: <1601072475-22682-1-git-send-email-patrick.mcgehearty@oracle.com>

On 9/25/20 6:21 PM, Patrick McGehearty via Libc-alpha wrote:
> The __x86_shared_non_temporal_threshold determines when memcpy on x86
> uses non-temporal stores to avoid pushing other data out of the last
> level cache.
> 
> This patch proposes to revert the calculation change made by H.J. Lu's
> patch of June 2, 2017.
> 
> H.J. Lu's patch selected a threshold suitable for a single thread
> getting maximum performance. It was tuned using the single-threaded
> large memcpy microbenchmark on an 8-core processor. That change raised
> the threshold from 3/4 of one thread's share of the cache to 3/4 of
> the entire cache of a multi-threaded system before switching to
> non-temporal stores. Multi-threaded systems with more than a few
> threads are server-class and typically have many active threads. If
> one thread consumes 3/4 of the cache available to all threads, it
> will evict other active threads' data from the cache. Two examples
> show the range of the effect. John McCalpin's widely parallel Stream
> benchmark, which runs in parallel and fetches data sequentially, saw
> a 20% slowdown with the 2017 change in an internal system test of
> 128 threads. This regression was discovered when comparing OL8
> performance to OL7. An example that compares normal stores to
> non-temporal stores may be found at
> https://vgatherps.github.io/2018-09-02-nontemporal/. A simple test
> shows a performance loss of 400 to 500% due to a failure to use
> non-temporal stores. These performance losses are most likely to
> occur when the system load is heaviest and good performance is
> critical.
> 
> The tunable x86_non_temporal_threshold can be used to override the
> default for the knowledgeable user who really wants maximum cache
> allocation to a single thread in a multi-threaded system.
> The manual entry for the tunable has been expanded to provide
> more information about its purpose.

Patrick,

Thank you for doing this work, and for all of the comments you made
downthread on the original posting.

I agree it is very easy to lose sight of the bigger "up and out"
picture of development when all you do is look at the core C library
performance for one process.

Your shared cautionary tales sparked various discussions within the
platform tools team here at Red Hat :-)

There is no silver bullet here, and the microbenchmarks in glibc are
there to give us a starting point for a discussion.

I'm curious to know if you think there is some kind of balancing
microbenchmark we could write to show the effects of process-to-process
optimizations?

I'm happy if we all agree that the kind of "adjustments" you made
today will only be derived from an adaptive process involving customers,
applications, modern hardware, engineers, and the mixing of all of them
together to make such adjustments.

Thank you again.

-- 
Cheers,
Carlos.

Thread overview: 14+ messages
2020-09-25 22:21 Patrick McGehearty
2020-09-25 22:26 ` H.J. Lu
2020-09-28 12:55   ` Florian Weimer
2020-09-27 13:54 ` Carlos O'Donell [this message]
2020-10-01 16:04   ` Patrick McGehearty
2020-10-01 21:02     ` Carlos O'Donell
     [not found] ` <CAMe9rOr3QUQKGgAnk+UBBq6hLXkU6i8XcNUMKkNRo1iAK=7ceA@mail.gmail.com>
2023-04-19 22:30   ` Noah Goldstein
2023-04-19 22:43     ` H.J. Lu
2023-04-19 23:24       ` Noah Goldstein
2023-04-20  0:12         ` H.J. Lu
2023-04-20  0:27           ` Noah Goldstein
2023-04-20 16:17             ` H.J. Lu
2023-04-20 20:23               ` Noah Goldstein
2023-04-20 23:50                 ` H.J. Lu
