From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yw1-x1129.google.com (mail-yw1-x1129.google.com [IPv6:2607:f8b0:4864:20::1129]) by sourceware.org (Postfix) with ESMTPS id 178143858C54 for ; Wed, 7 Jun 2023 20:46:53 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 178143858C54 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-yw1-x1129.google.com with SMTP id 00721157ae682-569386b7861so15936687b3.0 for ; Wed, 07 Jun 2023 13:46:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1686170812; x=1688762812; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=ESSACH6ZKTR242fRnBDYCxBwHLDr2y0JqKJVmmuvmCU=; b=h03bUxeGwUJHKjjXHe0GgbJo9SrrLY4sQ75Vw7NCsdA+bzVe/tmbEXgo3HMFsGRLny VbspYdi18YF8sDpC5IIMuU/mIcroz+foL1L04lSDXoOTjbC+UAt5J/n3xONyDtknY5hs YL5cS8ae44uYG7uQ8Rr1qvGke671YFYGxMp8aY7rIElAljraF/bIalgYMtoZ4b8uM7jj 91h7/vtykCDpNfKtGnnFT9DIb8gohr46wo/Qat6N/e8cA39JeIduV4z5FRZNJrma7Sc3 9d0YtM+AmlGkUp7vHyXHdkxvnDt4G3bzMlpbCLK0FFNuF14PVLc2TV6htOFx/C+leH5i s9cg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686170812; x=1688762812; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ESSACH6ZKTR242fRnBDYCxBwHLDr2y0JqKJVmmuvmCU=; b=N4Vza0s3M+0eVs2upzh53vbAZy8PK3zD4l07osCKaY370JqbfOZSsEwayMf3f88IAc 5Yegxp0z62lQ49MD21t04LUY2D9JBh0TmEQQhj5IGdT59bYZF7n0qq3dfs0/YBS9APdU kj3oYhQjMtasVEnNGNsjjT2bYustYOp8oNCVjfoh1ixYFV8/WpKrhLrFe+ENFZ4eLepH Pz6fMhdythAvGZJPbvAg6VR1exXnF8+VHEg2ycqEUAEcde8O6LhtjE+x610Tl/e3hAXX r368y6h1aYzSt8I9tzc4aMwPcE8ZiQiVq9i96nK6lVTUj6HjljOtv8ahbcNd/CcHcsV5 G87Q== X-Gm-Message-State: AC+VfDxa289mlB1dW3peTMYg083DHniJgaGTMd3UbftdQgqJoDWnDmIh MdZmd5lZuyJWpdj+d6p0uPvRE5Pkdon5VxrL/3Q= X-Google-Smtp-Source: ACHHUZ5ldc9F56aZ79yXEsogyYVtWEGaj2Y1EsCEK9A2Ki2NBbgsIzGxgwyo+gWNmh6Sg+s7U4h7rVpaH6iDlgBkR+M= X-Received: by 2002:a81:7c55:0:b0:568:ab19:8bb with SMTP id x82-20020a817c55000000b00568ab1908bbmr292717ywc.3.1686170812218; Wed, 07 Jun 2023 13:46:52 -0700 (PDT) MIME-Version: 1.0 References: <20230607194643.2081329-1-goldstein.w.n@gmail.com> <20230607194643.2081329-2-goldstein.w.n@gmail.com> In-Reply-To: <20230607194643.2081329-2-goldstein.w.n@gmail.com> From: "H.J. Lu" Date: Wed, 7 Jun 2023 13:46:16 -0700 Message-ID: Subject: Re: [PATCH v1 2/2] x86: Add `prepare_context_switch` to initialize register inuse states To: Noah Goldstein Cc: libc-alpha@sourceware.org, carlos@systemhalted.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-3021.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Wed, Jun 7, 2023 at 12:46=E2=80=AFPM Noah Goldstein wrote: > > xsave/xrstor have optimization to skip saving/restoring register > classes if those register classes are in the init state > (inuse[bit]=3D=3D0). > > We can get: > SSE state > AVX state > ZMM_HI256 state > > to init state using `vzeroall`. Doing this before syscalls that will > cause a proper context switch can be beneficial in terms of the amount > of state the kernel needs to save/restore. This can save time and > memory. > --- > sysdeps/generic/prepare-context-switch.h | 28 +++++++++++++ > sysdeps/unix/sysv/linux/clock_nanosleep.c | 2 + > sysdeps/unix/sysv/linux/sched_yield.c | 2 + > sysdeps/x86/prepare-context-switch.h | 50 +++++++++++++++++++++++ > 4 files changed, 82 insertions(+) > create mode 100644 sysdeps/generic/prepare-context-switch.h > create mode 100644 sysdeps/x86/prepare-context-switch.h > > diff --git a/sysdeps/generic/prepare-context-switch.h b/sysdeps/generic/p= repare-context-switch.h > new file mode 100644 > index 0000000000..6153847905 > --- /dev/null > +++ b/sysdeps/generic/prepare-context-switch.h > @@ -0,0 +1,28 @@ > +/* Prepare process for context switch. generic version > + Copyright (C) 2023 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + . */ > + > +#ifndef _PREPARE_CONTEXT_SWITCH_H > +#define _PREPARE_CONTEXT_SWITCH_H > + > +static void > +prepare_context_switch (void) > +{ > + /* Empty. */ > +} > + > +#endif > diff --git a/sysdeps/unix/sysv/linux/clock_nanosleep.c b/sysdeps/unix/sys= v/linux/clock_nanosleep.c > index ac2d810632..e674f0ac54 100644 > --- a/sysdeps/unix/sysv/linux/clock_nanosleep.c > +++ b/sysdeps/unix/sysv/linux/clock_nanosleep.c > @@ -23,6 +23,7 @@ > #include "kernel-posix-cpu-timers.h" > > #include > +#include > > /* We can simply use the syscall. The CPU clocks are not supported > with this function. */ > @@ -44,6 +45,7 @@ __clock_nanosleep_time64 (clockid_t clock_id, int flags= , > #endif > > int r; > + prepare_context_switch(); > #ifdef __ASSUME_TIME64_SYSCALLS > r =3D INTERNAL_SYSCALL_CANCEL (clock_nanosleep_time64, clock_id, flags= , req, > rem); > diff --git a/sysdeps/unix/sysv/linux/sched_yield.c b/sysdeps/unix/sysv/li= nux/sched_yield.c > index 154bf725b0..d26c0f8a9f 100644 > --- a/sysdeps/unix/sysv/linux/sched_yield.c > +++ b/sysdeps/unix/sysv/linux/sched_yield.c > @@ -17,10 +17,12 @@ > . */ > > #include > +#include > > int > __sched_yield (void) > { > + prepare_context_switch(); > return INLINE_SYSCALL_CALL (sched_yield); > } > libc_hidden_def (__sched_yield); > diff --git a/sysdeps/x86/prepare-context-switch.h b/sysdeps/x86/prepare-c= ontext-switch.h > new file mode 100644 > index 0000000000..bf33a7a1b3 > --- /dev/null > +++ b/sysdeps/x86/prepare-context-switch.h > @@ -0,0 +1,50 @@ > +/* Prepare process for context switch. x86 version > + Copyright (C) 2023 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + . */ > + > +#ifndef _PREPARE_CONTEXT_SWITCH_H > +#define _PREPARE_CONTEXT_SWITCH_H > + > +#ifdef __AVX__ Please use if (CPU_FEATURE_ACTIVE (AVX)) to detect it at run-time. > +static void > +prepare_context_switch (void) > +{ > + /* vzeroall before context switch will restore xsave/xrstor state of t= he > + following to init state: > + - SSE state > + - AVX state > + - ZMM_HI256 state > + This saves a touch of overhead and memory in context switches. > + This function can/should be used before an operation that will > + cause a context switch in the current process (sched_yield, > + *sleep, etc...). > + */ > + __asm__ volatile ("vzeroall" Can you use _mm256_zeroall? > + : > + : > + : "zmm0", "zmm1", "zmm2", "zmm3", "zmm4", "zmm5", "zm= m6", > + "zmm7", "zmm8", "zmm9", "zmm10", "zmm11", "zmm12", > + "zmm13", "zmm14", "zmm15"); > + /* TODO: Add xtilerelease for amx state. */ > +} > + > +#else > +# undef _PREPARE_CONTEXT_SWITCH_H > +# include > +#endif > + > +#endif > -- > 2.34.1 > --=20 H.J.