From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <lenb417@gmail.com>
Received: from mail-ej1-f44.google.com (mail-ej1-f44.google.com
 [209.85.218.44])
 by sourceware.org (Postfix) with ESMTPS id 521433858022
 for <libc-alpha@sourceware.org>; Tue, 30 Mar 2021 20:42:36 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 521433858022
Received: by mail-ej1-f44.google.com with SMTP id jy13so26827776ejc.2
 for <libc-alpha@sourceware.org>; Tue, 30 Mar 2021 13:42:36 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc:content-transfer-encoding;
 bh=6y0w4epdy1zgw6lkitkraNKrAbYioLiFpd2DfbWTyl8=;
 b=V2fO1Y0sOCfImYLM6y0UPv9lJTZLsKc1hC4BKE2rKq2Zw3E3qnx0FjQ4szFMB9ze8I
 kqWNE93ipF8YiWwmnnpSgal0TDgxqsN+sSaaziygwcFxe/pSoMOkEK7naO0u84Q98zWX
 wWu+RvozFOL1JWF3eF/Cyp0z/GoscyREbRHAloz8l+9qn2/ucsn2PEyVa9gKKw3A1pr2
 XboZmueYi8QJYuLIVwmwp6V339LTsPJmMt99tp+3yZNanbQv3jbYG/v+tcbn7q4D7HF4
 MX9gxFR/pWxUUg4Mz0OLlHCXGTPzSJTwhmWf5zleOf5pEWXBQ59sZ6sc/sKSRia2Fy7y
 6shA==
X-Gm-Message-State: AOAM531Alv9nWYdd0p4oJEjMhznh2EfIYwXLrwh2HCd1WjAwKLnF1dPw
 33meWC8ByEGdrsLMQb8uupeCp3reRfKee17WQZk=
X-Google-Smtp-Source: ABdhPJyrA2haUymQO29Ez3sUhVUwRRFyYvafzbD5Smm8CpiSI7WxlTg1TiYD2c/PTh+vB6yqvyj441BjJtKgKmzV2eU=
X-Received: by 2002:a17:907:ea3:: with SMTP id
 ho35mr35338549ejc.219.1617136955046; 
 Tue, 30 Mar 2021 13:42:35 -0700 (PDT)
MIME-Version: 1.0
References: <d10affcb-d315-cebc-4162-084f0a1e4d43@intel.com>
 <F2653B18-239A-42BB-84EE-04F18B712279@amacapital.net>
In-Reply-To: <F2653B18-239A-42BB-84EE-04F18B712279@amacapital.net>
From: Len Brown <lenb@kernel.org>
Date: Tue, 30 Mar 2021 16:42:23 -0400
Message-ID: <CAJvTdKnwexRpHrLFQv+2ykK9WEqtXMwehjfa_D7T+O_8DO_CGA@mail.gmail.com>
Subject: Re: Candidate Linux ABI for Intel AMX and hypothetical new related
 features
To: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>, Andy Lutomirski <luto@kernel.org>, 
 Greg KH <gregkh@linuxfoundation.org>, "Bae,
 Chang Seok" <chang.seok.bae@intel.com>, 
 X86 ML <x86@kernel.org>, LKML <linux-kernel@vger.kernel.org>, 
 libc-alpha <libc-alpha@sourceware.org>, Florian Weimer <fweimer@redhat.com>, 
 Rich Felker <dalias@libc.org>, Kyle Huey <me@kylehuey.com>, 
 Keno Fischer <keno@juliacomputing.com>, Linux API <linux-api@vger.kernel.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00,
 FREEMAIL_ENVFROM_END_DIGIT, FREEMAIL_FORGED_FROMDOMAIN, FREEMAIL_FROM,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, RCVD_IN_DNSWL_NONE,
 RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=no autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Tue, 30 Mar 2021 20:42:37 -0000

On Tue, Mar 30, 2021 at 4:20 PM Andy Lutomirski <luto@amacapital.net> wrote:
>
>
> > On Mar 30, 2021, at 12:12 PM, Dave Hansen <dave.hansen@intel.com> wrote:
> >
> > On 3/30/21 10:56 AM, Len Brown wrote:
> > >> On Tue, Mar 30, 2021 at 1:06 PM Andy Lutomirski <luto@amacapital.net> wrote:
> >>>> On Mar 30, 2021, at 10:01 AM, Len Brown <lenb@kernel.org> wrote:
> >>>> Is it required (by the "ABI") that a user program has everything
> >>>> on the stack for user-space XSAVE/XRSTOR to get back
> >>>> to the state of the program just before receiving the signal?
> >>> The current Linux signal frame format has XSTATE in uncompacted format,
> >>> so everything has to be there.
> >>> Maybe we could have an opt in new signal frame format, but the details
> >>> would need to be worked out.
> >>>
> >>> It is certainly the case that a signal should be able to be delivered,
> >>> run “async-signal-safe” code,
> >>> and return, without corrupting register contents.
> >> And so an acknowledgement:
> >>
> >> We can't change the legacy signal stack format without breaking
> >> existing programs.  The legacy is uncompressed XSTATE.  It is a
> >> complete set of architectural state -- everything necessary to
> >> XRSTOR.  Further, the sigreturn flow allows the signal handler to
> >> *change* any of that state, so that it becomes active upon return from
> >> signal.
> >
> > One nit with this: XRSTOR itself can work with the compacted format or
> > uncompacted format.  Unlike the XSAVE/XSAVEC side where compaction is
> > explicit from the instruction itself, XRSTOR changes its behavior by
> > reading XCOMP_BV.  There's no XRSTORC.
> >
> > The issue with using the compacted format is when legacy software in the
> > signal handler needs to go access the state.  *That* is what can't
> > handle a change in the XSAVE buffer format (either optimized/XSAVEOPT,
> > or compacted/XSAVEC).
>
> The compacted format isn’t compact enough anyway. If we want to keep AMX
> and AVX512 enabled in XCR0 then we need to further muck with the format to
> omit the not-in-use features. I *think* we can pull this off in a way that
> still does the right thing wrt XRSTOR.

Agreed.  Compacted format doesn't save any space when INIT=0, so it is
only a half-step forward.
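
To put rough numbers on that, here is a minimal sketch, not kernel code.
It reads INIT=0 as "the component is not in its initial configuration,
so it has to be saved".  The component sizes in the table are an
assumption (values typical of recent parts); real code has to enumerate
them with CPUID leaf 0xD.  The names xcomp, xcomps and compacted_size
are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative component table (assumption): sizes typical of recent x86
 * parts. Real code must enumerate them with CPUID leaf 0xD. */
struct xcomp {
    unsigned bit;   /* state-component bit number */
    size_t   size;  /* bytes the component occupies */
};

static const struct xcomp xcomps[] = {
    {  2,  256 },   /* AVX */
    {  5,   64 },   /* AVX-512 opmask */
    {  6,  512 },   /* AVX-512 ZMM_Hi256 */
    {  7, 1024 },   /* AVX-512 Hi16_ZMM */
    { 17,   64 },   /* AMX TILECFG */
    { 18, 8192 },   /* AMX TILEDATA */
};

/* Bytes needed for a compacted image holding the components in comp_mask.
 * The 512-byte legacy area and 64-byte header are always present.
 * Alignment padding is ignored; every size above is a multiple of 64. */
static size_t compacted_size(uint64_t comp_mask)
{
    size_t total = 512 + 64;

    /* Bit 63 is the format flag in XCOMP_BV, not a component. */
    assert((comp_mask >> 63) == 0);

    for (size_t i = 0; i < sizeof xcomps / sizeof xcomps[0]; i++)
        if (comp_mask & (UINT64_C(1) << xcomps[i].bit))
            total += xcomps[i].size;
    return total;
}
```

With these assumed sizes, compacted_size(0xE4) (AVX plus AVX-512) comes
to 2432 bytes, and setting bits 17 and 18 as well brings it to 10688.
Compaction only drops what is absent from the mask; once the tile data
has to be saved, its 8K is in the image either way.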

> If we go this route, I think we want a way for sigreturn to understand a
> pointer to the state instead of inline state to allow programs to change
> the state.  Or maybe just to have a way to ask sigreturn to skip the restore
> entirely.

The legacy approach puts all architectural state on the signal stack
in XSTATE format.

If we make the signal stack smaller with a new fast-signal scheme, we
need to find another place for that state to live.

It can't live in the task context switch buffer.  If we put it there
and then take an interrupt while running the signal handler, then we'd
overwrite the signaled thread's state with the signal handler's state.
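
A toy user-space model of that overwrite, purely for illustration.
Nothing here is a kernel structure or API: toy_regs, toy_task and
toy_save are hypothetical stand-ins, and a 16-byte array plays the role
of the register state.

```c
#include <string.h>

struct toy_regs { unsigned char tile[16]; };     /* stand-in for register state */
struct toy_task { struct toy_regs switch_buf; }; /* one context switch buffer per task */

/* Model of "save live registers into the task's switch buffer". */
static void toy_save(struct toy_task *t, const struct toy_regs *live)
{
    t->switch_buf = *live;
}

/* Returns 1 if the signaled thread's state is still in the buffer
 * after an interrupt arrives while the handler is running. */
static int toy_state_survives(void)
{
    struct toy_task task;
    struct toy_regs live, thread_state;

    memset(&thread_state, 0xAA, sizeof thread_state);
    live = thread_state;

    /* Signal delivery: park the thread's state in the switch buffer
     * (the scheme being ruled out). */
    toy_save(&task, &live);

    /* The handler runs and changes the live registers. */
    memset(&live, 0x55, sizeof live);

    /* Interrupt during the handler: live registers are saved into the
     * very same buffer. */
    toy_save(&task, &live);

    return memcmp(&task.switch_buf, &thread_state, sizeof thread_state) == 0;
}
```

In this model toy_state_survives() returns 0: the second save replaces
the only copy of the signaled thread's state with the handler's.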

Can we leave it in live registers?  That would be the speed-of-light
signal handler approach.  But we'd need to teach the signal handler to
not clobber it.  Perhaps that could be part of the contract that a
fast signal handler signs?  INIT=0 AMX state could simply sit
patiently in the AMX registers for the duration of the signal handler.
You can't get any faster than doing nothing :-)

Of course, part of the contract for a fast signal handler is that it
knows it cannot rely on XRSTOR of what is on the stack to get back to
the state of the signaled thread.  Even if we used XSTATE format on the
fast signal handler stack, that image would be missing the contents of
the AMX registers in this example.

Len Brown, Intel Open Source Technology Center