Re: New x86-64 micro-architecture levels

public inbox for gcc@gcc.gnu.org

From: Richard Biener <richard.guenther@gmail.com>
To: "H.J. Lu" <hjl.tools@gmail.com>
Cc: Florian Weimer <fweimer@redhat.com>,
	GCC Development <gcc@gcc.gnu.org>,
	 GNU C Library <libc-alpha@sourceware.org>,
	Tom Stellard <tstellar@redhat.com>,
	llvm-dev@lists.llvm.org,  "Mallappa,
	Premachandra" <Premachandra.Mallappa@amd.com>,
	x86-64-abi <x86-64-abi@googlegroups.com>
Subject: Re: New x86-64 micro-architecture levels
Date: Mon, 13 Jul 2020 08:23:32 +0200
Message-ID: <CAFiYyc1xNCaz+hzKLwZASi4BuXG+tWuMZuVSex1R8rE6fQ1TPA@mail.gmail.com>
In-Reply-To: <CAMe9rOrakDcTNFJ3XfwBAjmv5UFozW3LCqccjXc5kYhO8kOU5A@mail.gmail.com>

On Fri, Jul 10, 2020 at 11:45 PM H.J. Lu via Gcc <gcc@gcc.gnu.org> wrote:
>
> On Fri, Jul 10, 2020 at 10:30 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > Most Linux distributions still compile against the original x86-64
> > baseline that was based on the AMD K8 (minus the 3DNow! parts, for Intel
> > EM64T compatibility).
> >
> > There has been an attempt to use the existing AT_PLATFORM-based loading
> > mechanism in the glibc dynamic linker to enable a selection of optimized
> > libraries.  But the general selection mechanism in glibc is problematic:
> >
> >   hwcaps subdirectory selection in the dynamic loader
> >   <https://sourceware.org/pipermail/libc-alpha/2020-May/113757.html>
> >
> > We also have the problem that the glibc version of "haswell" is distinct
> > from GCC's -march=haswell (and presumably other compilers):
> >
> >   Definition of "haswell" platform is inconsistent with GCC
> >   <https://sourceware.org/bugzilla/show_bug.cgi?id=24080>
> >
> > And that the selection criteria are not what people expect:
> >
> >   Epyc and other current AMD CPUs do not select the "haswell" platform
> >   subdirectory
> >   <https://sourceware.org/bugzilla/show_bug.cgi?id=23249>
> >
> > Since the hwcaps-based selection does not work well regardless of
> > architecture (even in cases where the kernel provides glibc with data), I
> > worked on a new mechanism that does not have the problems associated
> > with the old mechanism:
> >
> >   [PATCH 00/30] RFC: elf: glibc-hwcaps support
> >   <https://sourceware.org/pipermail/libc-alpha/2020-June/115250.html>
> >
> > (Don't be concerned that these patches have not been reviewed; we are
> > busy preparing the glibc 2.32 release, and these changes do not alter
> > the glibc ABI itself, so they do not have immediate priority.  I'm
> > fairly confident that a version of these changes will make it into glibc
> > 2.33, and I hope to backport them into Fedora 33, Fedora 32, and Red Hat
> > Enterprise Linux 8.4.  Debian as well, but I have never done anything
> > like it there, so I don't know if the patches will be accepted.)
> >
> > Out of the box, this should work fairly well for IBM POWER and Z, where
> > there is a clear progression of silicon versions (at least on paper;
> > virtualization may blur the picture somewhat).
> >
> > However, for x86, we do not have such a clear progression of
> > micro-architecture versions.  This is not just a result of the
> > AMD/Intel competition, but also of ongoing product differentiation
> > within one chip vendor.  I think we need these levels broadly for the
> > following reasons:
> >
> > * Selecting on individual CPU features (similar to the old hwcaps
> >   mechanism) in glibc has scalability issues, particularly for
> >   LD_LIBRARY_PATH processing.
> >
> > * Developers need guidance about useful targets for optimization.  I
> >   think there is value in limiting the choices, in the sense that “if
> >   you are able to test three builds in total, these are the things you
> >   should build”.
> >
> > * glibc and the compilers should align in their definition of the
> >   levels, so that developers can use an -march= option to build for a
> >   particular level that is recognized by glibc.  This is why I think the
> >   description of the levels should go into the psABI supplement.
> >
> > * A preference order for these levels avoids falling back to the K8
> >   baseline if the platform progresses to a new version due to
> >   glibc/kernel/hypervisor/hardware upgrades.
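To put a rough number on the scalability concern in the first point: if libraries could be placed in a subdirectory for any combination of k individually selected CPU features, a loader might have to probe up to 2^k subdirectories per LD_LIBRARY_PATH entry, whereas an ordered list of L levels needs at most L + 1 probes (each level plus the default location). Taking the 20 features named in levels A to D (counting LAHF/SAHF as one) as k, purely for illustration:

```latex
\[
  N_{\text{features}} = 2^{k} = 2^{20} = 1\,048\,576
  \qquad\text{vs.}\qquad
  N_{\text{levels}} = L + 1 = 4 + 1 = 5
\]
```

This is a worst-case count under that assumption, not a measurement of the existing glibc hwcaps code.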
> >
> > I'm including a proposal for the levels below.  I use single letters for
> > them, but I expect that the concrete implementation of this proposal
> > will use names like “x86-100”, “x86-101”, like in the glibc patch
> > referenced above.  (But we can discuss other approaches.)
> >
> > I looked at various machines in the Red Hat labs and talked to Intel and
> > AMD engineers about this, but this concrete proposal is based on my own
> > analysis of the situation.  I excluded CPU features related to
> > cryptography and cache management, including hardware transactional
> > memory, and CPU timing.  I assume that we will see some of these
> > features being disabled by the firmware or the kernel over time.  That
> > would eliminate entire levels from selection, which is not desirable.
> > For cryptographic code, I expect that localized selection of an
> > optimized implementation works because such code tends to be isolated
> > blocks, running for dozens of cycles each time, not something that gets
> > scattered all over the place by the compiler.
> >
> > We previously discussed not emitting VZEROUPPER at later levels, but I
> > don't think this is beneficial because the ABI does not have
> > callee-saved vector registers, so it can only be useful with local
> > functions (or whatever LTO considers local), where there is no ABI
> > impact anyway.
> >
> > I did not include FSGSBASE because the FS base is already available at
> > %fs:0.  Changing the FS base in userspace breaks too much, so the main
> > benefit is the tighter encoding of rdfsbase, which seems very slim.
> >
> > Not covered in this are tuning decisions.  I think we can benefit from
> > some variance in this area between implementations; it should not affect
> > correctness.  32-bit support is also a separate matter.
> >
> > * Level A
> >
> > CMPXCHG16B, LAHF/SAHF, POPCNT, SSE3, SSE4.1, SSE4.2, SSSE3
> >
> > This is one step above the K8 baseline and corresponds to a mainline CPU
> > model ca. 2008 to 2011.  It is also implemented by recent-ish
> > generations of Intel Atom server CPUs (although I haven't tested the
> > latest version).  A 32-bit variant would have to list many additional
> > CPU features here.
> >
> > * Level B
> >
> > AVX, plus everything in level A.
> >
> > This step is so small that it probably can be dropped, unless the
> > benefits from using VEX encoding are truly significant.
> >
> > For AVX and some of the following features, it is assumed that the
> > run-time selection takes full support coverage (from silicon to the
> > kernel) into account.
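One concrete part of "full support coverage" for AVX is that the kernel must have enabled saving of the YMM register state, not just that the silicon implements AVX. A minimal sketch of that check, assuming GCC's or Clang's `<cpuid.h>` on x86: CPUID leaf 1 reports AVX and OSXSAVE, and XCR0 (read with XGETBV) must have the XMM and YMM state bits (bits 1 and 2) set. The function name `avx_usable` is hypothetical; the AVX-512 features of level D would additionally need the opmask and ZMM state bits (bits 5 to 7) of XCR0.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Read XCR0 with XGETBV.  Only valid once CPUID has reported OSXSAVE. */
static unsigned long long read_xcr0(void)
{
    unsigned int eax, edx;
    __asm__ volatile ("xgetbv" : "=a"(eax), "=d"(edx) : "c"(0));
    return ((unsigned long long)edx << 32) | eax;
}

/* AVX is usable only if the CPU implements it, the OS has enabled XSAVE
   (OSXSAVE), and the OS saves both XMM and YMM state (XCR0 bits 1 and 2). */
bool avx_usable(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    if ((ecx & bit_AVX) == 0 || (ecx & bit_OSXSAVE) == 0)
        return false;
    return (read_xcr0() & 0x6) == 0x6;
}
#else
/* Not an x86 target: AVX is never usable. */
bool avx_usable(void) { return false; }
#endif
```

A loader or a library constructor would call `avx_usable()` once and cache the result, rather than trusting the CPUID feature bit alone.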
> >
> > * Level C
> >
> > AVX2, BMI1, BMI2, F16C, FMA, LZCNT, MOVBE, plus everything in level B.
> >
> > This is close to what glibc currently calls "haswell".
> >
> > * Level D
> >
> > AVX512F, AVX512BW, AVX512CD, AVX512DQ, AVX512VL, plus everything in
> > level C.
> >
> > This is the AVX-512 level implemented by Xeon Scalable Processors, not
> > the Xeon Phi variant.
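Because each level includes everything in the previous one, the levels can be modelled as nested bitmasks. The sketch below encodes the feature lists exactly as proposed; the bit assignments and the function `highest_level` are illustrative assumptions, not part of any existing API. The highest level a machine qualifies for is the last one in the A to D sequence whose whole mask is present.

```c
#include <stdint.h>

/* Illustrative feature bits: names follow the proposal, bit values are arbitrary. */
#define F_CMPXCHG16B (UINT32_C(1) << 0)
#define F_LAHF_SAHF  (UINT32_C(1) << 1)
#define F_POPCNT     (UINT32_C(1) << 2)
#define F_SSE3       (UINT32_C(1) << 3)
#define F_SSE4_1     (UINT32_C(1) << 4)
#define F_SSE4_2     (UINT32_C(1) << 5)
#define F_SSSE3      (UINT32_C(1) << 6)
#define F_AVX        (UINT32_C(1) << 7)
#define F_AVX2       (UINT32_C(1) << 8)
#define F_BMI1       (UINT32_C(1) << 9)
#define F_BMI2       (UINT32_C(1) << 10)
#define F_F16C       (UINT32_C(1) << 11)
#define F_FMA        (UINT32_C(1) << 12)
#define F_LZCNT      (UINT32_C(1) << 13)
#define F_MOVBE      (UINT32_C(1) << 14)
#define F_AVX512F    (UINT32_C(1) << 15)
#define F_AVX512BW   (UINT32_C(1) << 16)
#define F_AVX512CD   (UINT32_C(1) << 17)
#define F_AVX512DQ   (UINT32_C(1) << 18)
#define F_AVX512VL   (UINT32_C(1) << 19)

/* Each level contains everything in the level before it. */
#define LEVEL_A (F_CMPXCHG16B | F_LAHF_SAHF | F_POPCNT | F_SSE3 \
                 | F_SSE4_1 | F_SSE4_2 | F_SSSE3)
#define LEVEL_B (LEVEL_A | F_AVX)
#define LEVEL_C (LEVEL_B | F_AVX2 | F_BMI1 | F_BMI2 | F_F16C | F_FMA \
                 | F_LZCNT | F_MOVBE)
#define LEVEL_D (LEVEL_C | F_AVX512F | F_AVX512BW | F_AVX512CD \
                 | F_AVX512DQ | F_AVX512VL)

static const uint32_t level_masks[] = { LEVEL_A, LEVEL_B, LEVEL_C, LEVEL_D };

/* Returns 0 for the K8 baseline or 1..4 for levels A..D: the highest
   level whose complete feature set is present in 'usable'. */
int highest_level(uint32_t usable)
{
    int level = 0;
    for (int i = 0; i < 4; i++) {
        if ((usable & level_masks[i]) != level_masks[i])
            break; /* a missing feature rules out this and every later level */
        level = i + 1;
    }
    return level;
}
```

This also shows why a single disabled feature is costly: with AVX masked off (for example by a hypervisor), a CPU that otherwise has every level D feature only qualifies for level A.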
> >
> >
> > glibc (or an alternative loader implementation) would search for
> > libraries starting at level D, going back to level A, and finally the
> > baseline implementation in the default library location.
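The search order described above can be sketched as building a candidate list from the detected level downwards. The `<libdir>/glibc-hwcaps/<level>/` layout and the `level-A` to `level-D` names are placeholders assumed for illustration (the proposal expects names such as "x86-100"), and `build_search_list` is a hypothetical helper, not glibc code.

```c
#include <stddef.h>
#include <stdio.h>

#define PATH_LEN 256

/* Placeholder subdirectory names for levels A..D. */
static const char *const level_dirs[] = { "level-A", "level-B", "level-C", "level-D" };

/* Fills 'out' with candidate paths for 'soname' in search order: the
   highest supported level first, down to level A, then the default
   location.  'max_level' is 0 (baseline) to 4 (level D).  Returns the
   number of paths written. */
size_t build_search_list(const char *libdir, const char *soname, int max_level,
                         char out[][PATH_LEN], size_t max_out)
{
    size_t n = 0;
    if (max_level > 4)
        max_level = 4;
    for (int lvl = max_level; lvl >= 1 && n < max_out; lvl--, n++)
        snprintf(out[n], PATH_LEN, "%s/glibc-hwcaps/%s/%s",
                 libdir, level_dirs[lvl - 1], soname);
    if (n < max_out)
        snprintf(out[n++], PATH_LEN, "%s/%s", libdir, soname); /* baseline build */
    return n;
}
```

On a level C machine this yields the level C, B and A subdirectories followed by the default location, so a library built only for level A is still preferred over the K8 baseline.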
> >
> > I expect that some distributions will also use these levels to set a
> > baseline for the entire distribution (i.e., everything would be built to
> > level A or maybe even level C), and these libraries would then be
> > installed in the default location.
> >
> > I'll be glad if I can get any feedback on this proposal.  I plan to turn
> > it into a merge request for the x86-64 psABI document eventually.
> >
>
> Looks good.  I like it.

Likewise.  Btw, did you check that VIA family chips slot into Level A
at least?  Where do AMD bdverN slot in?

>  My only concerns are
>
> 1. Names like “x86-100”, “x86-101”, what features do they support?

Indeed I didn't get the -100, -101 part.  On the GCC side I'd have
suggested -march=generic-{A,B,C,D} implying the respective
-mtune.
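A -march=generic-{A,B,C,D} option would amount to a compiler-side definition of each level. As a hedged sketch of what that definition covers, C code can already classify the active -march setting from the ISA macros that GCC predefines; other compilers' spellings should be checked with `-dM -E`. CMPXCHG16B and LAHF/SAHF are deliberately not checked here, and the `LEVEL_*_OK` macros and `compile_time_level` are hypothetical names.

```c
/* Derive the proposed level implied by the current -march/-m options. */
#if defined(__SSE3__) && defined(__SSSE3__) && defined(__SSE4_1__) \
    && defined(__SSE4_2__) && defined(__POPCNT__)
#  define LEVEL_A_OK 1
#else
#  define LEVEL_A_OK 0
#endif

#if LEVEL_A_OK && defined(__AVX__)
#  define LEVEL_B_OK 1
#else
#  define LEVEL_B_OK 0
#endif

#if LEVEL_B_OK && defined(__AVX2__) && defined(__BMI__) && defined(__BMI2__) \
    && defined(__F16C__) && defined(__FMA__) && defined(__LZCNT__) \
    && defined(__MOVBE__)
#  define LEVEL_C_OK 1
#else
#  define LEVEL_C_OK 0
#endif

#if LEVEL_C_OK && defined(__AVX512F__) && defined(__AVX512BW__) \
    && defined(__AVX512CD__) && defined(__AVX512DQ__) && defined(__AVX512VL__)
#  define LEVEL_D_OK 1
#else
#  define LEVEL_D_OK 0
#endif

/* 0 = K8 baseline, 1..4 = levels A..D.  The sum works because each
   LEVEL_*_OK already requires the previous one. */
#define COMPILE_TIME_LEVEL (LEVEL_A_OK + LEVEL_B_OK + LEVEL_C_OK + LEVEL_D_OK)

int compile_time_level(void)
{
    return COMPILE_TIME_LEVEL;
}
```

With plain `-march=x86-64` this should give 0; the macros a given setting defines can be listed with `gcc -march=haswell -dM -E - </dev/null`.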

Do the patches end up annotating ELF binaries with the architecture
level and does ld.so check that info?

For example, IIRC there's a penalty for switching between VEX and
non-VEX encoded instructions, so even on AVX-capable hardware
it might be profitable to use non-AVX libraries if the program is
only using architecture level A?

On that note, does architecture level B+ suggest using VEX encoding
everywhere?  It would indeed be nice to have the architecture levels
documented in the psABI.

> 2. I have a library with AVX2 and FMA, which directory should it go?

Eventually GCC/gas can annotate objects with the lowest architecture
level that is applicable?
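If the levels stay strictly cumulative, H.J.'s question has a mechanical answer, which is also what such an annotation would need to compute: the lowest applicable level is the highest level at which any used feature first appears. A sketch under that assumption; the lowercase feature strings and `lowest_covering_level` are hypothetical names, and features outside every level (for example the cryptographic extensions the proposal leaves out) are reported as -1.

```c
#include <string.h>

struct feature_level { const char *name; int level; };

/* Level (1 = A .. 4 = D) at which each feature first appears in the proposal. */
static const struct feature_level table[] = {
    {"cmpxchg16b", 1}, {"lahf_sahf", 1}, {"popcnt", 1}, {"sse3", 1},
    {"sse4.1", 1}, {"sse4.2", 1}, {"ssse3", 1},
    {"avx", 2},
    {"avx2", 3}, {"bmi1", 3}, {"bmi2", 3}, {"f16c", 3}, {"fma", 3},
    {"lzcnt", 3}, {"movbe", 3},
    {"avx512f", 4}, {"avx512bw", 4}, {"avx512cd", 4}, {"avx512dq", 4},
    {"avx512vl", 4},
};

/* Returns the lowest level whose feature set covers all 'n' features:
   0 for an empty set (baseline), -1 if a feature belongs to no level. */
int lowest_covering_level(const char *const *features, size_t n)
{
    int result = 0;
    for (size_t i = 0; i < n; i++) {
        int found = -1;
        for (size_t j = 0; j < sizeof table / sizeof table[0]; j++) {
            if (strcmp(features[i], table[j].name) == 0) {
                found = table[j].level;
                break;
            }
        }
        if (found < 0)
            return -1;   /* not expressible as a level */
        if (found > result)
            result = found;
    }
    return result;
}
```

For a library that uses AVX2 and FMA this returns 3, i.e. the level C directory.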

Thanks for doing this,
Richard.

> Can we pass such info to ld.so and ld.so prints out the best directory
> name?
>
> --
> H.J.
