From: Andy Lutomirski
Date: Tue, 13 Apr 2021 15:58:58 -0700
Subject: Re: Candidate Linux ABI for Intel AMX and hypothetical new related features
To: Len Brown
Cc: Andy Lutomirski, Willy Tarreau, Florian Weimer, "Bae, Chang Seok", Dave Hansen, X86 ML, LKML, linux-abi@vger.kernel.org, "libc-alpha@sourceware.org", Rich Felker, Kyle Huey, Keno Fischer
References: <87lf9nk2ku.fsf@oldenburg.str.redhat.com>

On Tue, Apr 13, 2021 at 3:47 PM Len Brown wrote:
>
> On Tue, Apr 13, 2021 at 4:16 PM Andy Lutomirski wrote:
> >
> > On Mon, Apr 12, 2021 at 4:46 PM Len Brown wrote:
> > >
> > > On Mon, Apr 12, 2021 at 11:21 AM Andy Lutomirski wrote:
> > >
> > > > AMX: Multiplying a 4x4 matrix probably looks *great* in a
> > > > microbenchmark. Do it once and you permanently allocate 8kB (is that
> > > > even a constant? can it grow in newer parts?), potentially hurts all
> > > > future context switches, and does who-knows-what to Turbo licenses and
> > > > such.
> > >
> > > Intel expects that AMX will be extremely valuable to key workloads.
> > > It is true that you may never run that kind of workload on the machine
> > > in front of you,
> > > and so you have every right to be doubtful about the value of AMX.
> >
> > I fully believe that AMX will be amazing when used for the right
> > workload. The problem is that a library may have no way to tell
> > whether a workload is the type of computationally intensive workload
> > for which it makes sense. Imagine you have a little function:
> >
> > int matrix_times_vector(int dim, float *out, const float *matrix,
> > const float *vector);
> >
> > A clever library might use AMX for this. If dim == 4 and the caller
> > is planning to call it in a long, tight loop, maybe this even makes
> > sense. If dim == 4 and it's being called once, AMX is probably a
> > losing proposition. With previous technologies, at least the impact
> > was limited to the function itself and maybe once per call to the
> > caller. But now, with AMX, the program that invoked this takes a
> > performance and memory hit *forever* if it uses AMX once.
>
> Again...
>
> As this is a "clever" library, built with a clever toolchain, and the
> result is that
> TILERELEASE was properly issued at the end of computation.
> Thus the hardware knows that the (volatile) AMX registers are no longer live.

My argument has *nothing* to do with TILERELEASE. Let me try again.

Suppose I write some user code and call into a library that uses AMX
because the library authors benchmarked it and determined that using
AMX is faster when called in a loop. But I don't call it in a loop.
Then I take the transition penalty into and out of AMX code (I'll
believe there is no penalty when I see it -- we've had a penalty with
VEX and with AVX-512) and my program runs *slower*. And, to top it
off, I've just permanently allocated 8kB of extra FPU state buffer,
*and* I'm taking either an XCR0 or an XFD write penalty on every
future context switch.

Someone or something needs to make a decision as to whether AMX should
actually be used for a given algorithm. The user library community
has swept this under the rug by declaring that libraries should use
the best-in-a-tight-loop code for the entire existence of extensions
beyond XMM, and the cost keeps getting higher.

> > Beyond that, we have the signal handling issue.
>
> I'm unaware of any unresolved feedback on the signal handling series
> other than a wistful "wouldn't a new SIGFAIL be more clear (for future apps)
> than the existing SIGSEGV?" I agree with this sentiment, but I don't
> think we should hold up a patch to prevent corrupting user data
> because a new signal number to describe the scenario doesn't exist.
> Particularly since the new code that knows about the new SIGFAIL
> will also be new code that has been compiled with the new glibc
> that for most cases will prevent this scenario in the first place...
>
> > One solution, going
> > off of what Willy mentioned, is:
> >
> > bool amx_begin(void *signal_save_buffer);
> > void amx_end();
> >
> > In the amx_begin() region, if you get a signal, the AMX state is saved
> > in the buffer. Outside the region, if you get a signal and AMX is in
> > use, the kernel will either unceremoniously kill the task or will
> > deliver SIGYOUBLEWIT. [0]
>
> I think it is clear that if a new signal ABI is going to be invented,
> that it should be opt-in on state, so that it can run fast on machines
> far into the future by not choosing to opt-in on anything.
>
> It isn't clear that changing the signal save state around critical regions
> (in multiple threads) so that a single (per process definition) of a signal
> handler gets a different result at different times is going to make that
> (new) signal handler author especially happy. More likely they
> either always want the state, or they do not.

Perhaps some form of decision should be reached before AMX lands?
Landing AMX in its current form is a decision, and we should make a
credible effort to decide if it's the right one.

--Andy
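
For concreteness, here is a minimal sketch of how a library might combine
the amx_begin()/amx_end() pair quoted above with an explicit caller hint,
so that the "is this a tight loop?" decision is made by someone who
actually knows. Everything in it is an assumption: amx_begin() and
amx_end() are stubs, not an existing kernel or libc interface;
matrix_times_vector_hinted(), the expect_hot_loop parameter and
AMX_SAVE_BYTES are hypothetical names; and the "AMX" path simply reuses
the scalar loop because the sketch contains no tile code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The 8kB figure comes from the thread; whether it is a constant is an
 * open question there, so treat this value as a placeholder. */
#define AMX_SAVE_BYTES 8192

/* HYPOTHETICAL stubs for the proposed region API. A real version would
 * tell the kernel where to save AMX state if a signal arrives. */
static bool amx_region_open = false;

static bool amx_begin(void *signal_save_buffer)
{
    if (signal_save_buffer == NULL)
        return false;           /* no buffer, no AMX region */
    amx_region_open = true;
    return true;
}

static void amx_end(void)
{
    assert(amx_region_open);    /* begin/end must be balanced */
    amx_region_open = false;
}

/* Portable row-major matrix-vector product. */
static void mv_scalar(int dim, float *out, const float *matrix,
                      const float *vector)
{
    for (int i = 0; i < dim; i++) {
        float acc = 0.0f;
        for (int j = 0; j < dim; j++)
            acc += matrix[i * dim + j] * vector[j];
        out[i] = acc;
    }
}

/* Returns 1 if the AMX-gated path ran, 0 if the scalar fallback ran,
 * and -1 if dim is not positive. expect_hot_loop is a hypothetical
 * caller-supplied hint; the library never guesses on its own. */
int matrix_times_vector_hinted(int dim, float *out, const float *matrix,
                               const float *vector, bool expect_hot_loop)
{
    unsigned char save_buf[AMX_SAVE_BYTES];

    if (dim <= 0)
        return -1;

    if (expect_hot_loop && amx_begin(save_buf)) {
        /* Tile instructions would go here; the sketch reuses the
         * scalar loop so that it stays runnable anywhere. */
        mv_scalar(dim, out, matrix, vector);
        amx_end();
        return 1;
    }

    mv_scalar(dim, out, matrix, vector);
    return 0;
}
```

A one-off caller passes expect_hot_loop = false and never enters an AMX
region, so it avoids the transition penalty and the extra state
allocation. Only a caller that declares a hot loop pays those costs.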