From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andy Lutomirski
Date: Thu, 25 Mar 2021 21:38:24 -0700
Subject: Why does glibc use AVX-512?
To: libc-alpha, "H. J. Lu", X86 ML, LKML, "Bae, Chang Seok", Florian Weimer, "Carlos O'Donell", Rich Felker
List-Id: Libc-alpha mailing list

Hi all-

glibc appears to use AVX512F for memcpy by default. (Unless Prefer_ERMS is default-on, but I genuinely can't tell if this is the case. I did some searching.) The commit adding it refers to a 2016 email saying that it's 30% faster on KNL.
Unfortunately, AVX-512 is now available in normal hardware, and the overhead from switching between normal and AVX-512 code appears to vary from bad to genuinely horrible. And, once anything has used the high parts of YMM and/or ZMM, those states tend to get stuck with XINUSE=1.

I'm wondering whether glibc should stop using AVX-512 by default.

Meanwhile, some of you may have noticed a little ABI break we have. On AVX-512 hardware, the size of a signal frame is unreasonably large, and this is causing problems even for existing software that doesn't use AVX-512. Do any of you have any clever ideas for how to fix it? We have some kernel patches around to try to fail more cleanly, but we still fail.

I think we should seriously consider solutions in which, for new tasks, XCR0 has new giant features (e.g. AMX) and possibly even AVX-512 cleared, and programs need to explicitly request enablement. This would allow programs to opt into not saving/restoring across signals or to save/restore in buffers supplied when the feature is enabled.

This has all kinds of pros and cons, and I'm not sure it's a great idea. But, in the absence of some change to the ABI, the default outcome is that, on AMX-enabled kernels on AMX-enabled hardware, the signal frame will be more than 8kB, and this will affect *every* signal regardless of whether AMX is in use.

--Andy