From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf1-x430.google.com (mail-pf1-x430.google.com [IPv6:2607:f8b0:4864:20::430]) by sourceware.org (Postfix) with ESMTPS id E88A53858D33 for ; Thu, 18 Apr 2024 20:19:17 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org E88A53858D33 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=vrull.eu Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=vrull.eu ARC-Filter: OpenARC Filter v1.0.0 sourceware.org E88A53858D33 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::430 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1713471560; cv=none; b=v6md1srjZK70kxtd2sTRMvyqnPHyJExfDBueAkONVW6kgIDBLcrOOF1D3x5sSLO80JhvQ6Voke+K1Z2KHcImnPUaGanP6sH5aad9UuuT3OYIritaqRONN24AIIxryThGq7YizQHvSCp40G6akMnjmdD4Zm+7hzbdS/iApmAoAn4= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1713471560; c=relaxed/simple; bh=ofV3oJvxATUvCbBuOQILUg4QJ9JURjvzWLZWeu6aiSk=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=MjjsqswWyxXW6yLgGOpuP/Q57Nf9obM5+Z2Pj447utdM91bsV1svN7rlbJHPz3W3YqHDlahHq1MhaS2AiC3aAFNhYiaxybz1Pjko8J+DSQRwUZ7XYEFZ/MGlTDlrAWNP/YjLd+JZc9CpJ4FYe1t+WnpZWw5NN2vXREjUsd6TX+Y= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-pf1-x430.google.com with SMTP id d2e1a72fcca58-6ed11782727so1226848b3a.1 for ; Thu, 18 Apr 2024 13:19:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=vrull.eu; s=google; t=1713471557; x=1714076357; darn=sourceware.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=MC/ylTAjqFG7lzAAG62A8cwT92T70hWLWqD2B5z5rRI=; b=BG4Y98n/5ns6ZpeshEfJ7b4kM9x+UVdthJKo1J+Z61qkKLRp/HhBwd2vz7NegNI28P ZP8hkzo+1sYjbg2biAC+b9/1+ACp6lCmAzfdtuoNHsqC/BnqlzbhBMO0FNalhQCpgmA0 HVF2Vxt1V1KHcqTi8CXV6Bv/1TWyMaGOHhLzPFfJnaIIag4leBKC8N3H5nJ+aOZcjWJD vSHP5DPOig71iLcc+UELnUVYrNPSi92jyU10Ybj4NB7KdoS/v0YsIPnxQatYQWi+FXev Rp8J3S9VnLm9gfyUi6ijFzCrhz6IcSmk4ckw5mpuP+qZgNcYaPp+SbB1nXiytm9qj0FY 11Ew== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1713471557; x=1714076357; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=MC/ylTAjqFG7lzAAG62A8cwT92T70hWLWqD2B5z5rRI=; b=gguoYqDn8hk8V60CsajF3uMCBarRz4t3VGb27YUdHQqZLaR4LVcNkBDz6V3LYbkF4w LVMmFJ97ujdyXkymxxzRYOa0wzyZbicqzEJwA6b1CqXbA6EcjGpXlTKfOt3XTAygGQc1 GQUDKUk61CSP13yNzd5kviV1B1PKdsiQoRp6sAb9pJydJ2ZyI9IBCXBfkMLhHp3D4Rbv vpS9ljXG7c721p0HcWXCmKMEvFDykTz0/Yvi4igmU0sMHEjAX9fhR2z6H52bYya0ENWM VJvXzKO+066kEEv3BUDPFa+YkRCOZ3wu628HXM/N5DT1qvD9C47SO3kGiBTHShWxvC0V gx/w== X-Gm-Message-State: AOJu0YyqYXZztZF+8vsdmDuLglT+aLoR2gUT5Ba+iCpdSvcH/8MhfMei neOFcIZNSs92M+9spvzcIQeTSzRPr7hFlDf+t/o7BX/6WHUE0MN7GE5JgNPg/lSS+06j7oNmpS/ VLXIGMU8P2BtDqF9WyaCKVVI8jSueJi0bjUA0yg== X-Google-Smtp-Source: AGHT+IE07wjpanqvISuavs3TyR6fBlHedDU/kz/vrkH4bIwNow7BncX3Khug+BZxxAHnNyoCPMj7e7bw/zeGq3sbIi4= X-Received: by 2002:a05:6a20:7f84:b0:1a3:d60f:738f with SMTP id d4-20020a056a207f8400b001a3d60f738fmr411596pzj.18.1713471556706; Thu, 18 Apr 2024 13:19:16 -0700 (PDT) MIME-Version: 1.0 References: <20240418094635.3502009-4-christoph.muellner@vrull.eu> In-Reply-To: From: =?UTF-8?Q?Christoph_M=C3=BCllner?= Date: Thu, 18 Apr 2024 22:19:04 +0200 Message-ID: Subject: Re: [RFC PATCH 3/3] RISC-V: Implement CPU yielding for busy loops with Zihintpause/Zawrs To: Palmer Dabbelt Cc: libc-alpha@sourceware.org, adhemerval.zanella@linaro.org, Darius Rad , Andrew Waterman , philipp.tomsich@vrull.eu, Evan Green , kito.cheng@sifive.com, jeffreyalaw@gmail.com, Vineet Gupta Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-10.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Thu, Apr 18, 2024 at 7:17=E2=80=AFPM Palmer Dabbelt = wrote: > > On Thu, 18 Apr 2024 02:46:35 PDT (-0700), christoph.muellner@vrull.eu wro= te: > > The macro atomic_spin_nop can be used to implement arch-specific > > CPU yielding that is used in busy loops (e.g. in pthread_spin_lock). > > This patch introduces an ifunc-based implementation for RISC-V, > > that uses Zihintpause's PAUSE instruction for that matter (as PAUSE > > is a HINT instruction there is not dependency to Zihintpause at > > runtime). Further, we test for Zawrs via hwprobe() and if found > > we use WRS.STO instead of PAUSE. > > > > Signed-off-by: Christoph M=C3=BCllner > > --- > > sysdeps/riscv/multiarch/cpu-relax_generic.S | 31 +++++++++++++++ > > sysdeps/riscv/multiarch/cpu-relax_zawrs.S | 28 +++++++++++++ > > .../unix/sysv/linux/riscv/atomic-machine.h | 3 ++ > > .../unix/sysv/linux/riscv/multiarch/Makefile | 8 ++++ > > .../sysv/linux/riscv/multiarch/cpu-relax.c | 39 +++++++++++++++++++ > > .../linux/riscv/multiarch/ifunc-impl-list.c | 32 +++++++++++++-- > > 6 files changed, 137 insertions(+), 4 deletions(-) > > create mode 100644 sysdeps/riscv/multiarch/cpu-relax_generic.S > > create mode 100644 sysdeps/riscv/multiarch/cpu-relax_zawrs.S > > create mode 100644 sysdeps/unix/sysv/linux/riscv/multiarch/cpu-relax.c > > > > diff --git a/sysdeps/riscv/multiarch/cpu-relax_generic.S b/sysdeps/risc= v/multiarch/cpu-relax_generic.S > > new file mode 100644 > > index 0000000000..d3ccfdce84 > > --- /dev/null > > +++ b/sysdeps/riscv/multiarch/cpu-relax_generic.S > > @@ -0,0 +1,31 @@ > > +/* CPU strand yielding for busy loops. RISC-V version. > > + Copyright (C) 2024 Free Software Foundation, Inc. > > + This file is part of the GNU C Library. > > + > > + The GNU C Library is free software; you can redistribute it and/or > > + modify it under the terms of the GNU Lesser General Public > > + License as published by the Free Software Foundation; either > > + version 2.1 of the License, or (at your option) any later version. > > + > > + The GNU C Library is distributed in the hope that it will be useful= , > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > + Lesser General Public License for more details. > > + > > + You should have received a copy of the GNU Lesser General Public > > + License along with the GNU C Library. If not, see > > + . */ > > + > > +#include > > +#include > > + > > +.option push > > +.option arch, +zihintpause > > +ENTRY (__cpu_relax_generic) > > + /* While we can use the `pause` instruction without > > + the need of Zihintpause (because it is a HINT instruction), > > + we still have to enable Zihintpause for the assembler. */ > > + pause > > + ret > > +END (__cpu_relax_generic) > > +.option pop > > diff --git a/sysdeps/riscv/multiarch/cpu-relax_zawrs.S b/sysdeps/riscv/= multiarch/cpu-relax_zawrs.S > > new file mode 100644 > > index 0000000000..6d27b354df > > --- /dev/null > > +++ b/sysdeps/riscv/multiarch/cpu-relax_zawrs.S > > @@ -0,0 +1,28 @@ > > +/* CPU strand yielding for busy loops. RISC-V version with Zawrs. > > + Copyright (C) 2024 Free Software Foundation, Inc. > > + This file is part of the GNU C Library. > > + > > + The GNU C Library is free software; you can redistribute it and/or > > + modify it under the terms of the GNU Lesser General Public > > + License as published by the Free Software Foundation; either > > + version 2.1 of the License, or (at your option) any later version. > > + > > + The GNU C Library is distributed in the hope that it will be useful= , > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > + Lesser General Public License for more details. > > + > > + You should have received a copy of the GNU Lesser General Public > > + License along with the GNU C Library. If not, see > > + . */ > > + > > +#include > > +#include > > + > > +.option push > > +.option arch, +zawrs > > +ENTRY (__cpu_relax_zawrs) > > + wrs.sto > > This has the same forward progress/eventual success violation as the > code you sent for GCC and Linux does. It doesn't really matter if the > user of the reservation is in a builtin, an asm block, or a function. > The compiler just doesn't know about those reservation rules and isn't > going to generate code that follows them. I see. The main issue is that we don't have a valid reservation when calling WRS, so the whole use of Zawrs instructions is pointless. So the only way to move Zawrs forward would be to adjust the locking routin= es (introducing new primitives that have to be implemented for all architectur= es). > > That said, as far as I can tell the same grey area in the WRS spec that > I pointed out during one of the Linux reviews still exists. So if > there's some hardware that has a more generous set of LR->WRS rules than > the ISA-required LR->SC rules that's fine, we just need to know what the > hardware actually is so we can write down the rules and then stick to > them -- it's just a performance thing for WRS, so having weaker rules > seems entirely plausible. > > I don't know of any hardware with WRS, but sorry if I missed one. Do > you have something publicly availiable that behaves this way? > > > + ret > > +END (__cpu_relax_zawrs) > > +.option pop > > diff --git a/sysdeps/unix/sysv/linux/riscv/atomic-machine.h b/sysdeps/u= nix/sysv/linux/riscv/atomic-machine.h > > index c1c9d949a0..02b9b7a421 100644 > > --- a/sysdeps/unix/sysv/linux/riscv/atomic-machine.h > > +++ b/sysdeps/unix/sysv/linux/riscv/atomic-machine.h > > @@ -178,4 +178,7 @@ > > # error "ISAs that do not subsume the A extension are not supported" > > #endif /* !__riscv_atomic */ > > > > +extern void __cpu_relax (void); > > +#define atomic_spin_nop() __cpu_relax() > > + > > #endif /* bits/atomic.h */ > > diff --git a/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile b/sysdeps= /unix/sysv/linux/riscv/multiarch/Makefile > > index fcef5659d4..0cdf37a38b 100644 > > --- a/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile > > +++ b/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile > > @@ -1,3 +1,11 @@ > > +# nscd uses atomic_spin_nop which in turn requires cpu_relax > > +ifeq ($(subdir),nscd) > > +sysdep_routines +=3D \ > > + cpu-relax \ > > + cpu-relax_generic \ > > + cpu-relax_zawrs > > +endif > > + > > ifeq ($(subdir),string) > > sysdep_routines +=3D \ > > memcpy \ > > diff --git a/sysdeps/unix/sysv/linux/riscv/multiarch/cpu-relax.c b/sysd= eps/unix/sysv/linux/riscv/multiarch/cpu-relax.c > > new file mode 100644 > > index 0000000000..5aeb120e21 > > --- /dev/null > > +++ b/sysdeps/unix/sysv/linux/riscv/multiarch/cpu-relax.c > > @@ -0,0 +1,39 @@ > > +/* Multiple versions of cpu-relax. > > + All versions must be listed in ifunc-impl-list.c. > > + Copyright (C) 2024 Free Software Foundation, Inc. > > + This file is part of the GNU C Library. > > + > > + The GNU C Library is free software; you can redistribute it and/or > > + modify it under the terms of the GNU Lesser General Public > > + License as published by the Free Software Foundation; either > > + version 2.1 of the License, or (at your option) any later version. > > + > > + The GNU C Library is distributed in the hope that it will be useful= , > > + but WITHOUT ANY WARRANTY; without even the implied warranty of > > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > + Lesser General Public License for more details. > > + > > + You should have received a copy of the GNU Lesser General Public > > + License along with the GNU C Library; if not, see > > + . */ > > + > > +# include > > +# include > > +# include > > + > > +void __cpu_relax (void); > > +extern __typeof (__cpu_relax) __cpu_relax_generic attribute_hidden; > > +extern __typeof (__cpu_relax) __cpu_relax_zawrs attribute_hidden; > > + > > +static inline __typeof (__cpu_relax) * > > +select_cpu_relax_ifunc (uint64_t dl_hwcap, __riscv_hwprobe_t hwprobe_f= unc) > > +{ > > + unsigned long long int v; > > + if (__riscv_hwprobe_one (hwprobe_func, RISCV_HWPROBE_KEY_IMA_EXT_0, = &v) =3D=3D 0 > > + && (v & RISCV_HWPROBE_EXT_ZAWRS)) > > + return __cpu_relax_zawrs; > > + > > + return __cpu_relax_generic; > > +} > > + > > +riscv_libc_ifunc (__cpu_relax, select_cpu_relax_ifunc); > > diff --git a/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c = b/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c > > index 9f806d7a9e..9c7a8c2e1f 100644 > > --- a/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c > > +++ b/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c > > @@ -20,24 +20,48 @@ > > #include > > #include > > > > +#define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0])) > > + > > +void __cpu_relax (void); > > + > > size_t > > __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *arra= y, > > size_t max) > > { > > size_t i =3D max; > > + struct riscv_hwprobe pairs[] =3D { > > + { .key =3D RISCV_HWPROBE_KEY_IMA_EXT_0 }, > > + { .key =3D RISCV_HWPROBE_KEY_CPUPERF_0 }, > > + }; > > > > bool fast_unaligned =3D false; > > + bool has_zawrs =3D false; > > + > > + if (__riscv_hwprobe (pairs, ARRAY_SIZE (pairs), 0, NULL, 0) =3D=3D 0= ) > > + { > > + struct riscv_hwprobe *pair; > > > > - struct riscv_hwprobe pair =3D { .key =3D RISCV_HWPROBE_KEY_CPUPERF_0= }; > > - if (__riscv_hwprobe (&pair, 1, 0, NULL, 0) =3D=3D 0 > > - && (pair.value & RISCV_HWPROBE_MISALIGNED_MASK) > > + /* RISCV_HWPROBE_KEY_IMA_EXT_0 */ > > + pair =3D &pairs[0]; > > + if (pair->value & RISCV_HWPROBE_EXT_ZAWRS) > > + has_zawrs =3D true; > > + > > + /* RISCV_HWPROBE_KEY_CPUPERF_0 */ > > + pair =3D &pairs[1]; > > + if ((pair->value & RISCV_HWPROBE_MISALIGNED_MASK) > > =3D=3D RISCV_HWPROBE_MISALIGNED_FAST) > > - fast_unaligned =3D true; > > + fast_unaligned =3D true; > > + } > > > > IFUNC_IMPL (i, name, memcpy, > > IFUNC_IMPL_ADD (array, i, memcpy, fast_unaligned, > > __memcpy_noalignment) > > IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_generic)) > > > > + IFUNC_IMPL (i, name, __cpu_relax, > > + IFUNC_IMPL_ADD (array, i, __cpu_relax, has_zawrs, > > + __cpu_relax_zawrs) > > + IFUNC_IMPL_ADD (array, i, __cpu_relax, 1, __cpu_relax_gener= ic)) > > + > > return 0; > > }