From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andy Lutomirski
Date: Thu, 25 Mar 2021 21:38:24 -0700
Subject: Why does glibc use AVX-512?
To: libc-alpha, "H. J. Lu", X86 ML, LKML, "Bae, Chang Seok", Florian Weimer, "Carlos O'Donell", Rich Felker
List-Id: Libc-alpha mailing list

Hi all-

glibc appears to use AVX512F for memcpy by default. (Unless Prefer_ERMS is default-on, but I genuinely can't tell if this is the case. I did some searching.) The commit adding it refers to a 2016 email saying that it's 30% faster on KNL.
Unfortunately, AVX-512 is now available in normal hardware, and the overhead from switching between normal and AVX-512 code appears to vary from bad to genuinely horrible. And, once anything has used the high parts of YMM and/or ZMM, those states tend to get stuck with XINUSE=1.

I'm wondering whether glibc should stop using AVX-512 by default.

Meanwhile, some of you may have noticed a little ABI break we have. On AVX-512 hardware, the size of a signal frame is unreasonably large, and this is causing problems even for existing software that doesn't use AVX-512. Do any of you have any clever ideas for how to fix it? We have some kernel patches around to try to fail more cleanly, but we still fail.

I think we should seriously consider solutions in which, for new tasks, XCR0 has new giant features (e.g. AMX) and possibly even AVX-512 cleared, and programs need to explicitly request enablement. This would allow programs to opt into not saving/restoring across signals or to save/restore in buffers supplied when the feature is enabled.

This has all kinds of pros and cons, and I'm not sure it's a great idea. But, in the absence of some change to the ABI, the default outcome is that, on AMX-enabled kernels on AMX-enabled hardware, the signal frame will be more than 8kB, and this will affect *every* signal regardless of whether AMX is in use.

--Andy