From: Florian Weimer
To: Andy Lutomirski
Cc: "Bae, Chang Seok", Dave Hansen, X86 ML, LKML,
    linux-abi@vger.kernel.org, libc-alpha@sourceware.org, Rich Felker,
    Kyle Huey, Keno Fischer
Subject: Re: Candidate Linux ABI for Intel AMX and hypothetical new related features
Date: Mon, 12 Apr 2021 16:19:29 +0200
In-Reply-To: (Andy Lutomirski's message of "Fri, 26 Mar 2021 16:12:25 -0700")
Message-ID: <87lf9nk2ku.fsf@oldenburg.str.redhat.com>
List-Id: Libc-alpha mailing list

* Andy Lutomirski:

> Optionally we make AVX-512 also default off, which fixes what is
> arguably a serious ABI break with AVX-512: lots of programs, following
> POSIX (!), seem to think that they know much much space to allocate
> for sigaltstack().  AVX-512 is too big.
>
> Thoughts?

Maybe we could have done this in 2016 when I reported this for the
first time.  Now it is too late, as more and more software is using
CPUID-based detection for AVX-512.

Our users have been using AVX-512 hardware for quite some time now, and
I haven't seen *that* many issues resulting from the context size.  That
isn't to say that problems do not exist, but they are more of this kind:
the increased stack usage means that areas of the stack that used to be
zero no longer are, so users encounter different side effects from
uninitialized-variable bugs.

How much software depends on the signal handler data layout?  The %zmm
state does not appear to be exposed today, so perhaps some savings could
be had there.

The suggestion to make CPUID trap doesn't sound workable to me.  At
least in the past, it's been suggested as a serializing instruction to
be used alongside RDTSC, which makes it rather time-critical for some
applications.

Even today, signal handlers do not really compose well in the sense
that multiple libraries can use them and collaborate without being
aware of each other (like they can divide up TLS memory with the help
of the dynamic linker, or carve out address space using mmap).
Proposals to set special process-wide flags only make that situation
worse.  Code that installs a signal handler often does not have control
over which thread an asynchronous signal is delivered to, or which code
it interrupts.
A single process-wide flag cannot capture that accurately, even if it is
per signal number.

There is some work on handlers that do not save GP registers, for
userspace interrupts.  But I do not think that generalizes to existing
signal handlers because floating point is widely used in signal
handlers today (at least implicitly).

The rseq extension might work conceptually, but it requires making
operations idempotent, with periodic checkpoints, and of course
inlining/flattening all calls.  And it requires compiler work; the
present model based on inline asm hacks doesn't look workable.  Maybe
that works for AMX.  I have not checked if there is any public
documentation of the programming model yet.

I think someone expressed the sentiment (maybe on another thread) that
the current CPU feature enablement process does not work.  I do not
agree.  Currently it is only necessary to upgrade the kernel and maybe
glibc (but not in all cases), and then you are good to go.  You can keep
using your old libraries, old compilers, and even old assemblers if you
are okay with .byte hacks.  You do not need special userspace libraries,
new compilers for different languages, special firmware or binary
blobs.  Overall, it just works.

On x86, we are really bad about actually using CPU features
pervasively, but that is a different story.

Thanks,
Florian