From: Andy Lutomirski
Subject: Re: Candidate Linux ABI for Intel AMX and hypothetical new related features
Date: Wed, 31 Mar 2021 09:53:50 -0700
To: Len Brown
Cc: David Laight, Dave Hansen, Andy Lutomirski, Greg KH, "Bae, Chang Seok", X86 ML, LKML, libc-alpha, Florian Weimer, Rich Felker, Kyle Huey, Keno Fischer, Linux API

> On Mar 31, 2021, at 9:31 AM, Len Brown wrote:
>
> On Tue, Mar 30, 2021 at 6:01 PM David Laight wrote:
>
>>> Can we leave it in live registers? That would be the speed-of-light
>>> signal handler approach. But we'd need to teach the signal handler to
>>> not clobber it. Perhaps that could be part of the contract that a
>>> fast signal handler signs? INIT=0 AMX state could simply sit
>>> patiently in the AMX registers for the duration of the signal handler.
>>> You can't get any faster than doing nothing :-)
>>>
>>> Of course part of the contract for the fast signal handler is that it
>>> knows that it can't possibly use XRSTOR of the stuff on the stack to
>>> necessarily get back to the state of the signaled thread (assuming we
>>> even used XSTATE format on the fast signal handler stack, it would
>>> forget the contents of the AMX registers, in this example)
>>
>> gcc will just use the AVX registers for 'normal' code within
>> the signal handler.
>> So it has to have its own copy of all the registers.
>> (Well, maybe you could make the TMX instructions fault,
>> but that would need a nested signal delivered.)
>
> This is true, by default, but it doesn't have to be true.
>
> Today, gcc has an annotation for user-level interrupts
> https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#x86-Function-Attributes
>
> An analogous annotation could be created for fast signals.
> gcc can be told exactly what registers and instructions it can use for
> that routine.
>
> Of course, this begs the question about what routines that handler calls,
> and that would need to be constrained too.
>
> Today signal-safety(7) advises programmers to limit what legacy signal handlers
> can call. There is no reason that a fast-signal-safety(7) could not be created
> for the fast path.
>
>> There is also the register save buffer that you need in order
>> to long-jump out of a signal handler.
>> Unfortunately that is required to work.
>> I'm pretty sure the original setjmp/longjmp just saved the stack
>> pointer - but that really doesn't work any more.
>>
>> OTOH most signal handlers don't care - but there isn't a flag
>> to sigset() (etc) to ask for a specific register layout.
>
> Right, the idea is to optimize for *most* signal handlers,
> since making any changes to *all* signal handlers is intractable.
>
> So the idea is that opting-in to a fast signal handler would opt-out
> of some legacy signal capabilities. Complete state is one of them,
> and thus long-jump is not supported, because the complete state
> may not automatically be available.

Long jump is probably the easiest problem of all: sigsetjmp() is a *function*, following ABI, so sigsetjmp() is expected to clobber most or all of the extended state.

But this whole annotation thing will require serious compiler support. We already have problems with compilers inlining functions and getting confused about attributes.

An API like:

    if (get_amx()) {
      use AMX;
    } else {
      don’t;
    }

avoids this problem. And making XCR0 dynamic, for all its faults, at least helps force a degree of discipline on user code.

>
> thanks,
> Len Brown, Intel Open Source Technology Center
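
To make the get_amx() pattern above concrete, here is a minimal sketch in C. Everything except the name get_amx() is an assumption for illustration: request_amx_from_kernel() is a hypothetical stand-in for whatever kernel interface would grant AMX (the thread does not define one), and dot_amx() is only a placeholder for real tile code. The point is just that the opt-in is an ordinary run-time branch, so nothing depends on the compiler propagating a function attribute through inlining.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* HYPOTHETICAL: stands in for the kernel interface that would grant AMX.
 * This stub always refuses, so the sketch runs on any machine. */
static bool request_amx_from_kernel(void)
{
    return false;
}

/* get_amx() as named in the mail: ask once, then remember the answer.
 * The cache is not thread-safe; a real version would need to be. */
static bool get_amx(void)
{
    static int cached = -1;              /* -1 means "not asked yet" */
    if (cached < 0)
        cached = request_amx_from_kernel() ? 1 : 0;
    return cached == 1;
}

/* Baseline path: plain scalar code that touches no AMX state. */
static long dot_scalar(const int *a, const int *b, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (long)a[i] * b[i];
    return sum;
}

/* Placeholder for the AMX path. Real tile code would go here; it reuses
 * the scalar loop so the example stays portable. */
static long dot_amx(const int *a, const int *b, size_t n)
{
    return dot_scalar(a, b, n);
}

/* The explicit check selects the path at run time, with no annotation
 * on either implementation. */
static long dot(const int *a, const int *b, size_t n)
{
    assert(a != NULL && b != NULL);
    if (get_amx())
        return dot_amx(a, b, n);
    else
        return dot_scalar(a, b, n);
}
```

A caller just uses dot(); with the stub above the request is refused and the scalar path runs. Code that never calls get_amx() never reaches the AMX path, which is the kind of discipline the explicit opt-in is meant to encourage.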