From: Florian Weimer
To: "Mallappa, Premachandra"
Cc: libc-alpha@sourceware.org, gcc@gcc.gnu.org, llvm-dev@lists.llvm.org,
    x86-64-abi@googlegroups.com, "H.J. Lu", matz@suse.de, Tom Stellard,
    Jeff Law
Subject: Re: New x86-64 micro-architecture levels
References: <87365zz3a6.fsf@oldenburg2.str.redhat.com>
Date: Tue, 21 Jul 2020 20:04:15 +0200
In-Reply-To: (Premachandra Mallappa's message of "Tue, 21 Jul 2020 16:05:36 +0000")
Message-ID: <87imegn3s0.fsf@oldenburg2.str.redhat.com>

* Premachandra Mallappa:

> [AMD Public Use]
>
> Hi Floarian,
>
>> I'm including a proposal for the levels below. I use single letters
>> for them, but I expect that the concrete implementation of this
>> proposal will use names like “x86-100”, “x86-101”, like in the glibc
>> patch referenced above. (But we can discuss other approaches.)
>
> Personally I am not a big fan of this, for 2 reasons
> 1. uses just x86 in name on x86_64 as well

That's deliberate, so that we can use the same x86-* names for 32-bit
library selection (once we define matching micro-architecture levels
there).

GCC has -m32 -march=x86-64 for K8 without 3DNow! (essentially the
shared x86-64/EM64T baseline), but I find this a bit confusing.

> 2. 100/101 not very intuitive

Any suggestions?

The advantage is that these numbers show a strong preference ordering.
They also do not make false suggestions about feature sets: if we
named Level C "x86-avx2", it would still be wrong for glibc to load
libraries found in that directory just because a system has AVX2
support, because the libraries might also need FMA (based on the
Level C definition). On the GCC side, it avoids a confusion between
-mavx2 and -march=x86-avx2.

If numbers are out, what should we use instead? x86-sse4, x86-avx2,
x86-avx512? Would that work?

>> * Level A
> ...
>> * Level B
>> This step is so small that it probably can be dropped, unless the
>> benefits from using VEX encoding are truly significant.
>
> Yes, Agree, the delta is too small, can be clubbed into A or C.

Let's merge Level B into level C then?

>> * Level C
>> * Level D
>
> Others are inline with the what we expect as logical grouping.

Thanks.

> Also we would also like to have dynamic loader support for "zen" /
> "zen2" as a version of "Level D" and takes preference over Level D,
> which may have super-optimized libraries from AMD or other vendors.

*That* shouldn't be too hard to implement if we can nail down the
selection criteria. Let's call this Zen-specific Level C x86-zen-avx2
for the sake of exposition.

What's going to be difficult is the choice for a hypothetical Zen
successor that's compatible feature-flag-wise with Level D. Basically,
there are two choices here:

* Level D wins because it's the more powerful ISA.

* x86-zen-avx2 wins because it has the Zen architecture optimizations.

There's also a related issue with Level C vs x86-zen-avx2 depending on
how we implement the Zen detection for AMD family numbers in the glibc
dynamic linker. What do I mean by this? glibc detects that this is a
Level C capable Zen-type CPU, but it's not one of the family/model
numbers that were hard-coded into the glibc sources. What should we
do then? Should we still prefer the x86-zen-avx2 library over the
Level C library?

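To make the selection question concrete, here is a rough sketch in
Python (not glibc code). It assumes abbreviated, made-up requirement
sets for the levels, placeholder directory names "level-a", "level-c"
and "level-d", and a boolean is_zen that stands in for whatever
family/model detection the dynamic linker would use. The flag
zen_beats_level_d is likewise hypothetical and only selects between
the two choices listed above.

```python
# Abbreviated, hypothetical requirement sets. The real level definitions
# are longer; only "Level C needs AVX2 and FMA" is taken from the thread.
LEVEL_A = frozenset({"sse4_2", "popcnt"})
LEVEL_C = LEVEL_A | {"avx", "avx2", "fma"}
LEVEL_D = LEVEL_C | {"avx512f"}

# Generic directories, most preferred first (names are placeholders).
LEVELS = [("level-d", LEVEL_D), ("level-c", LEVEL_C), ("level-a", LEVEL_A)]


def select_level(cpu_features):
    """Return the most preferred level whose requirements are all met."""
    for name, required in LEVELS:
        if required <= cpu_features:
            return name
    return None  # fall back to the baseline libraries


def search_order(cpu_features, is_zen, zen_beats_level_d):
    """Return the directories to try, most preferred first.

    is_zen and zen_beats_level_d are assumptions of this sketch: the
    first hides the family/model detection, the second picks one of the
    two policies for a Zen successor that also supports Level D.
    """
    generic = [name for name, required in LEVELS if required <= cpu_features]
    if not (is_zen and LEVEL_C <= cpu_features):
        return generic
    if zen_beats_level_d or "level-d" not in generic:
        return ["x86-zen-avx2"] + generic
    # Level D wins: keep it first, then the Zen directory ahead of Level C.
    return ["level-d", "x86-zen-avx2"] + generic[1:]


# A CPU with AVX2 but without FMA must not get the Level C directory.
print(select_level({"sse4_2", "popcnt", "avx", "avx2"}))
# The two policies for a hypothetical Zen successor with Level D support.
print(search_order(set(LEVEL_D), is_zen=True, zen_beats_level_d=False))
print(search_order(set(LEVEL_D), is_zen=True, zen_beats_level_d=True))
```

The subset test is the reason a name such as "x86-avx2" would mislead:
the directory is usable only if every required feature is present, not
just the one in the name. Whether is_zen should be true for an unknown
family/model number is exactly the open question.
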
> These libraries are expected to be optimized according to
> micro-architectural details, not just ISA.

If it's supposed to be generally useful, we really need to document
the selection criteria for the subdirectory and make sure that it
matches what these libraries actually require at run time in terms of
ISA.

I want to avoid two things here specifically: A hardware upgrade
results in crashes because we incorrectly load an incompatible
library. And, if possible: A hardware upgrade (or kernel/hypervisor
upgrade that exposes more of the actual hardware) causes us to drop
optimizations, so that users experience a performance regression.

With the levels I proposed, these aspects are covered. But if we
start to create vendor-specific forks in the feature progression,
things get complicated.

Do you think we need to figure this out in this iteration? If yes,
then I really need a semi-formal description of the selection criteria
for this x86-zen-avx2 directory, so that I can pass it along with my
psABI proposal.

Thanks,
Florian
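
The two properties above can be checked mechanically for a linear
chain of levels. The following Python sketch assumes a tiny, made-up
feature universe and abbreviated requirement sets (they are not the
proposal's definitions). It enumerates every "upgrade" from a feature
set to a superset and confirms that the selected level is always
usable and never gets worse.

```python
from itertools import combinations

# Hypothetical, abbreviated feature universe and level chain.
FEATURES = ["sse4_2", "avx2", "fma", "avx512f"]
CHAIN = [
    ("level-a", frozenset({"sse4_2"})),
    ("level-c", frozenset({"sse4_2", "avx2", "fma"})),
    ("level-d", frozenset({"sse4_2", "avx2", "fma", "avx512f"})),
]


def rank(features):
    """Index of the best usable level in CHAIN, or -1 for the baseline."""
    best = -1
    for index, (_, required) in enumerate(CHAIN):
        if required <= features:
            best = index
    return best


def all_subsets(items):
    """Yield every subset of items as a frozenset."""
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def upgrade_is_safe_and_monotonic():
    """Check both properties for every pair old <= new of feature sets."""
    for old in all_subsets(FEATURES):
        for new in all_subsets(FEATURES):
            if not old <= new:
                continue  # not an upgrade
            selected = rank(new)
            if selected >= 0 and not CHAIN[selected][1] <= new:
                return False  # would load an incompatible library
            if selected < rank(old):
                return False  # the upgrade dropped an optimization level
    return True


print(upgrade_is_safe_and_monotonic())
```

The check passes here because each level's requirements contain those
of the level below it. A vendor-specific directory that is not part of
this chain has no such guarantee, which is why its selection criteria
would need to be written down.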