From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oa1-x2b.google.com (mail-oa1-x2b.google.com [IPv6:2001:4860:4864:20::2b]) by sourceware.org (Postfix) with ESMTPS id BA5B23858C62 for ; Thu, 8 Jun 2023 09:01:48 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org BA5B23858C62 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-oa1-x2b.google.com with SMTP id 586e51a60fabf-1a28817f7d8so295307fac.3 for ; Thu, 08 Jun 2023 02:01:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1686214907; x=1688806907; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=UW/48cXYLcGKxYWMXz7p2s7b/kK9F/rRdtEGQB8vm9M=; b=VnD+iv17yo/Eae0F6ZEL6M0B4T2iJGrprkn7ESj7PO0zpdb6SGGY0nxOPLg4w9nqnY 8y7MomcZ5Nm4rkefzqDpwxDXn/N+mwirvKWA/RqNnyGmWM0azM12a5zywyIECRTr2rp3 XWFxrcRWsPV26JMY/n2RLNUAWGZYpzOclJNr1BBNCIym0GvhuI1RtI9g1ziQ1E9fsNLo BTOWjuivWdDMCOj/CLWe821qJp2b7lSFRjEY2+PdKFzScVgj2Lh7IT6cmXRHN2/La9sX 2Uh49dbpODe5MCYm9fQXhztTQ6aYJoX8DVdejnfpWjuynkxN7lNhGviSEUmIF7Ha1RrB kQ/Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686214907; x=1688806907; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=UW/48cXYLcGKxYWMXz7p2s7b/kK9F/rRdtEGQB8vm9M=; b=Oe93Q3k77xBSROvXE1bok0ec6f4UhNJB4kMd+kgchAVD3rRjT9U41ac4qm21M9bVyZ YErDWJUWKTQX0FTzFCUjDcvRmX856U3rcJGkiIJ+o9q7sxCzd+siQHGSASm2p+l+zSh9 BnSK5mrI/C1fLvtg8t8ZKZYZ7RxooJ/5cg/j5HbftK6iyBcdGAXkNUNcPKTvCAUvVlPu KfRix5AzrJaCoZBKyiSUD3lhsnKFqSyIXOF4yeFTYNsUHyoB5HlRQzUSiKGw5Xm33Ien 36AlQ3pYCflRZKNYWcJcEsd3o/vrFtxBZsxEfSUMnNVo8KkuZvvjWmrItzUGKrZ2cyRY PNdQ== X-Gm-Message-State: AC+VfDwP//9+C2na1uHY94dlvkdMvCI7nnxG9L1YcTZO2C0oNkPVbZrG czpH9aatNU8saXrRctEPSHJL4USMbofrXOxfkW4= X-Google-Smtp-Source: ACHHUZ6YGjG67NeDPwK76Uzi+Hal1Ao0mDPTSmWfGqbH/SgmDP5Rd38ajYoMt2uETCgotTte3kn7ex/hNaeeID3z2PA= X-Received: by 2002:a05:6870:e516:b0:19f:499c:4dc5 with SMTP id y22-20020a056870e51600b0019f499c4dc5mr4017030oag.51.1686214906925; Thu, 08 Jun 2023 02:01:46 -0700 (PDT) MIME-Version: 1.0 References: <20230607194643.2081329-1-goldstein.w.n@gmail.com> <20230607194643.2081329-2-goldstein.w.n@gmail.com> In-Reply-To: From: Noah Goldstein Date: Thu, 8 Jun 2023 04:01:35 -0500 Message-ID: Subject: Re: [PATCH v1 2/2] x86: Add `prepare_context_switch` to initialize register inuse states To: "H.J. Lu" Cc: libc-alpha@sourceware.org, carlos@systemhalted.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Wed, Jun 7, 2023 at 4:59=E2=80=AFPM Noah Goldstein wrote: > > On Wed, Jun 7, 2023 at 3:46=E2=80=AFPM H.J. Lu wrot= e: > > > > On Wed, Jun 7, 2023 at 12:46=E2=80=AFPM Noah Goldstein wrote: > > > > > > xsave/xrstor have optimization to skip saving/restoring register > > > classes if those register classes are in the init state > > > (inuse[bit]=3D=3D0). > > > > > > We can get: > > > SSE state > > > AVX state > > > ZMM_HI256 state > > > > > > to init state using `vzeroall`. Doing this before syscalls that will > > > cause a proper context switch can be beneficial in terms of the amoun= t > > > of state the kernel needs to save/restore. This can save time and > > > memory. > > > --- > > > sysdeps/generic/prepare-context-switch.h | 28 +++++++++++++ > > > sysdeps/unix/sysv/linux/clock_nanosleep.c | 2 + > > > sysdeps/unix/sysv/linux/sched_yield.c | 2 + > > > sysdeps/x86/prepare-context-switch.h | 50 +++++++++++++++++++++= ++ > > > 4 files changed, 82 insertions(+) > > > create mode 100644 sysdeps/generic/prepare-context-switch.h > > > create mode 100644 sysdeps/x86/prepare-context-switch.h > > > > > > diff --git a/sysdeps/generic/prepare-context-switch.h b/sysdeps/gener= ic/prepare-context-switch.h > > > new file mode 100644 > > > index 0000000000..6153847905 > > > --- /dev/null > > > +++ b/sysdeps/generic/prepare-context-switch.h > > > @@ -0,0 +1,28 @@ > > > +/* Prepare process for context switch. generic version > > > + Copyright (C) 2023 Free Software Foundation, Inc. > > > + This file is part of the GNU C Library. > > > + > > > + The GNU C Library is free software; you can redistribute it and/o= r > > > + modify it under the terms of the GNU Lesser General Public > > > + License as published by the Free Software Foundation; either > > > + version 2.1 of the License, or (at your option) any later version= . > > > + > > > + The GNU C Library is distributed in the hope that it will be usef= ul, > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > > + Lesser General Public License for more details. > > > + > > > + You should have received a copy of the GNU Lesser General Public > > > + License along with the GNU C Library; if not, see > > > + . */ > > > + > > > +#ifndef _PREPARE_CONTEXT_SWITCH_H > > > +#define _PREPARE_CONTEXT_SWITCH_H > > > + > > > +static void > > > +prepare_context_switch (void) > > > +{ > > > + /* Empty. */ > > > +} > > > + > > > +#endif > > > diff --git a/sysdeps/unix/sysv/linux/clock_nanosleep.c b/sysdeps/unix= /sysv/linux/clock_nanosleep.c > > > index ac2d810632..e674f0ac54 100644 > > > --- a/sysdeps/unix/sysv/linux/clock_nanosleep.c > > > +++ b/sysdeps/unix/sysv/linux/clock_nanosleep.c > > > @@ -23,6 +23,7 @@ > > > #include "kernel-posix-cpu-timers.h" > > > > > > #include > > > +#include > > > > > > /* We can simply use the syscall. The CPU clocks are not supported > > > with this function. */ > > > @@ -44,6 +45,7 @@ __clock_nanosleep_time64 (clockid_t clock_id, int f= lags, > > > #endif > > > > > > int r; > > > + prepare_context_switch(); > > > #ifdef __ASSUME_TIME64_SYSCALLS > > > r =3D INTERNAL_SYSCALL_CANCEL (clock_nanosleep_time64, clock_id, f= lags, req, > > > rem); > > > diff --git a/sysdeps/unix/sysv/linux/sched_yield.c b/sysdeps/unix/sys= v/linux/sched_yield.c > > > index 154bf725b0..d26c0f8a9f 100644 > > > --- a/sysdeps/unix/sysv/linux/sched_yield.c > > > +++ b/sysdeps/unix/sysv/linux/sched_yield.c > > > @@ -17,10 +17,12 @@ > > > . */ > > > > > > #include > > > +#include > > > > > > int > > > __sched_yield (void) > > > { > > > + prepare_context_switch(); > > > return INLINE_SYSCALL_CALL (sched_yield); > > > } > > > libc_hidden_def (__sched_yield); > > > diff --git a/sysdeps/x86/prepare-context-switch.h b/sysdeps/x86/prepa= re-context-switch.h > > > new file mode 100644 > > > index 0000000000..bf33a7a1b3 > > > --- /dev/null > > > +++ b/sysdeps/x86/prepare-context-switch.h > > > @@ -0,0 +1,50 @@ > > > +/* Prepare process for context switch. x86 version > > > + Copyright (C) 2023 Free Software Foundation, Inc. > > > + This file is part of the GNU C Library. > > > + > > > + The GNU C Library is free software; you can redistribute it and/o= r > > > + modify it under the terms of the GNU Lesser General Public > > > + License as published by the Free Software Foundation; either > > > + version 2.1 of the License, or (at your option) any later version= . > > > + > > > + The GNU C Library is distributed in the hope that it will be usef= ul, > > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > > + Lesser General Public License for more details. > > > + > > > + You should have received a copy of the GNU Lesser General Public > > > + License along with the GNU C Library; if not, see > > > + . */ > > > + > > > +#ifndef _PREPARE_CONTEXT_SWITCH_H > > > +#define _PREPARE_CONTEXT_SWITCH_H > > > + > > > +#ifdef __AVX__ > > > > Please use > > > > if (CPU_FEATURE_ACTIVE (AVX)) > > > > to detect it at run-time. > > > Wanted to avoid overhead. Think if we want runtime check should ifunc > the functions > we want to put it in (just clock_nanosleep64 and sched_yield). WDYT? > > > > +static void > > > +prepare_context_switch (void) > > > +{ > > > + /* vzeroall before context switch will restore xsave/xrstor state = of the > > > + following to init state: > > > + - SSE state > > > + - AVX state > > > + - ZMM_HI256 state > > > + This saves a touch of overhead and memory in context switches. > > > + This function can/should be used before an operation that will > > > + cause a context switch in the current process (sched_yield, > > > + *sleep, etc...). > > > + */ > > > + __asm__ volatile ("vzeroall" > > > > Can you use _mm256_zeroall? > > > > > + : > > > + : > > > + : "zmm0", "zmm1", "zmm2", "zmm3", "zmm4", "zmm5",= "zmm6", > > > + "zmm7", "zmm8", "zmm9", "zmm10", "zmm11", "zmm1= 2", > > > + "zmm13", "zmm14", "zmm15"); > > > + /* TODO: Add xtilerelease for amx state. */ > > > +} > > > + > > > +#else > > > +# undef _PREPARE_CONTEXT_SWITCH_H > > > +# include > > > +#endif > > > + > > > +#endif > > > -- > > > 2.34.1 > > > > > > > > > -- > > H.J. Abandoning this patch in favor of the versions at: "x86: Implement sched_yield syscall for x86 only." and "x86: Implement clock_nanosleep{_time64} syscall for x86 only."