From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pj1-x102b.google.com (mail-pj1-x102b.google.com [IPv6:2607:f8b0:4864:20::102b]) by sourceware.org (Postfix) with ESMTPS id 43EC53858D33 for ; Thu, 18 Apr 2024 20:25:23 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 43EC53858D33 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=vrull.eu Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=vrull.eu ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 43EC53858D33 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::102b ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1713471925; cv=none; b=HPGjNj7nSUqwxywVV/jiCzIVZoXoAsl3aTg1GNcRd6PFp9MhLUAbJi0Robno2JTIEW9FDYAzqDDe256bpfYAUk5/E1LjFaXDVjlj2wH5SLk8cZwPWFWuQAMdjapbZ22ioYTTFUWLB3rr69HG7nnd16b+Vng9azETWgjUo+Jmywo= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1713471925; c=relaxed/simple; bh=TvyLar9qbWIg0nJM0zK9fx7Xzhrh/3X3BKb4d0VBakY=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=cL9+EDnKgTMMOQVSSmV6pxe0wUhbt3ngQDpX2nEDtj5D2lOFEtO797yRV6E8VrQfUmAO3LQWYD8yntqeuM7RodY+c3A4rpefonICZL8mhysRUMS2x15fveZOfW/MkjYBqcusZjxTCqgV3i7RvErAbw7RY2DKr+D+qfRwYOEPPLg= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-pj1-x102b.google.com with SMTP id 98e67ed59e1d1-2a2d248a2e1so1748608a91.0 for ; Thu, 18 Apr 2024 13:25:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=vrull.eu; s=google; t=1713471922; x=1714076722; darn=sourceware.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=67Z5nNDdzQv5uILhOV44rQb5zqMPXHjToXCvi4/ofMk=; b=NJkdLkQ3bxaQ+ReDDTWF3zoL7Z9sUpdRQfmMeGhwzfsvnICxpORmNEpfH/DWrf48fX Y+1mVgKPpPwOtDEOWYuPdGBbUI4vK2j8fP8gPn4cE5+pUvefEcuVy5ehK138vV2xFmqe 651xKQ5C/F0sIgswhxarNDyKUFnrlV/Rr+SkxJqAaGS7yu35X/CNabKQPzHcrFs02NvI QsKNOLJJu5x49uyteQD+Pr0nseFHTtUM8UqFcyyFvKJa3xn1WQ3oKt1/krvxsZcVLmRx V/lY/DPmMHTrUtbVtsw9XkQhilTmf12voKR9Mhi3lxKPFzJflRuFBqjYAkOuFWjzG60h Sifw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1713471922; x=1714076722; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=67Z5nNDdzQv5uILhOV44rQb5zqMPXHjToXCvi4/ofMk=; b=q8jCevbbZ7i3Rr2helP8xVgNB6eEgqSrAqYLaB+UJwXGIwwoOpzIjLkF/7ABP8cqOO L/pJ7W/eCRacfH60OViIlx5vdbEgKVu68Fj+Z555AFWUoAmaXgcLFSV0m8LCk3n0ugG4 IaGmP0gFTUC2CihN/mewp7t+AWqXF1L7/SG5wgbPESAYeHI1+CZypCATyYnvVVWaLm/y c/uOTtrVZ1b+qdnBKdzyQYRbBber6NOLaIFiFkzYWIQ+TJo3WcEceoSkHIBcK8hBRmp5 9QcTKv4E8M2Yz6uJhUl2YK8OInPrOO9AJLWh86ZbOixJIzoc48QD2LHob0MUJwGjkOs5 av6A== X-Forwarded-Encrypted: i=1; AJvYcCWjah4v7315rnaAwgcZkDpPSP0BHevAzqLX2JSBsfWynsf25ZQ6UsR3BFZra4rDMw88t7WZt+5MhzVeJ5FIlsw2iOnESRkHwMqE X-Gm-Message-State: AOJu0YzZo82+7bfwTmES+04fWdNmdVjML5JOh1gHlFxkye7K2R1E1u8J 9XD9abDdHZqC4FfyQX8Exj6jpvt+JbbpZ48kSQk8mQh39DKpCFJHeyQKXSED7sLmHKtdJwJt3jh y0CYG3wMT2IIvWGBJk6x4SbfhtiktVVcYQFL8MQ== X-Google-Smtp-Source: AGHT+IE8ichdDcYKvUomYF/cr/JyHMeM/K+hpR7Yp2cx3zx78JwyZ4LSXzc8ekn/ZLHP0r8Sfg2zEx9SaH8bjlTuMQY= X-Received: by 2002:a17:90a:9411:b0:2a4:6ce7:37ad with SMTP id r17-20020a17090a941100b002a46ce737admr414017pjo.5.1713471922040; Thu, 18 Apr 2024 13:25:22 -0700 (PDT) MIME-Version: 1.0 References: <45e809bf-fc10-4cab-9190-48b21d8cf263@rivosinc.com> In-Reply-To: <45e809bf-fc10-4cab-9190-48b21d8cf263@rivosinc.com> From: =?UTF-8?Q?Christoph_M=C3=BCllner?= Date: Thu, 18 Apr 2024 22:25:10 +0200 Message-ID: Subject: Re: [RFC PATCH 3/3] RISC-V: Implement CPU yielding for busy loops with Zihintpause/Zawrs To: Vineet Gupta Cc: Palmer Dabbelt , libc-alpha@sourceware.org, adhemerval.zanella@linaro.org, Darius Rad , Andrew Waterman , philipp.tomsich@vrull.eu, Evan Green , kito.cheng@sifive.com, jeffreyalaw@gmail.com, gnu-toolchain Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-4.1 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Thu, Apr 18, 2024 at 10:03=E2=80=AFPM Vineet Gupta wrote: > > Hi Christoph, > > My 2 cents. > > On 4/18/24 10:17, Palmer Dabbelt wrote: > > On Thu, 18 Apr 2024 02:46:35 PDT (-0700), christoph.muellner@vrull.eu w= rote: > >> The macro atomic_spin_nop can be used to implement arch-specific > >> CPU yielding that is used in busy loops (e.g. in pthread_spin_lock). > >> This patch introduces an ifunc-based implementation for RISC-V, > >> that uses Zihintpause's PAUSE instruction for that matter (as PAUSE > >> is a HINT instruction there is not dependency to Zihintpause at > >> runtime). Further, we test for Zawrs via hwprobe() and if found > >> we use WRS.STO instead of PAUSE. > > [snip] > > >> +ENTRY (__cpu_relax_generic) > >> + /* While we can use the `pause` instruction without > >> + the need of Zihintpause (because it is a HINT instruction), > >> + we still have to enable Zihintpause for the assembler. */ > >> + pause > >> + ret > >> +END (__cpu_relax_generic) > > [snip] > > >> +.option push > >> +.option arch, +zawrs > >> +ENTRY (__cpu_relax_zawrs) > >> + wrs.sto > > This has the same forward progress/eventual success violation as the > > code you sent for GCC and Linux does. It doesn't really matter if the > > user of the reservation is in a builtin, an asm block, or a function. > > The compiler just doesn't know about those reservation rules and isn't > > going to generate code that follows them. > > So, this routine maps to atomic_spin_nop () called in generic code in a > bunch of places. > > The easiest case is nptl/pthread_spin_lock.c which looks something like t= his > > __pthread_spin_lock (pthread_spinlock_t *lock) > ... > do > { > atomic_spin_nop (); > val =3D atomic_load_relaxed (lock); > } > while (val !=3D 0); > > This works fine for a PAUSE based implementation which doesn't need a > prior reservation, but WRS does and even if both the load and spin are > inlined, reservation happens after WRS which is wrong. > > So I think before we can implement this optimization for riscv, we need > a generic glibc change to replace of atomic_spin_nop () with a variant > which also takes the lock/memory under consideration. > The fallback implementation of atomic_load_and_spin_if_cond() will be > what the above loop does. For arches that do implement the API, based on > ISA semantics, they can choose to ignore the mem arg completely (RISC-V > PAUSE based implementation) or implement a little loop which explicitly > takes reservation on the mem/lock and calls WRS. > > Does that make sense ? I did something similar in the Linux patch a while ago. Here I wanted to avoid changes in arch-independent code. But you are right, wrs.sto does not make sense without a valid reservation. > > We just need to sort out the 2nd use of atomic_spin_nop in > nptl/pthread_mutex_lock.c where this conversion might not be straight fwd= . > The recent back off back spin looping is getting in the way of obvious > solution, however for discussion sake if we ignore it, the code looks > like this: > > PTHREAD_MUTEX_LOCK (pthread_mutex_t *mutex) > ... > int cnt=3D0, max_cnt =3D something; > do > { > if (cnt ++>=3D max_cnt) > { > LLL_MUTEX_LOCK (mutex); > break; > } > atomic_spin_nop (); > } > while (LLL_MUTEX_READ_LOCK (mutex) !=3D 0 || LLL_MUTEX_TRYLOCK > (mutex) !=3D 0); > > And this seems like it can be converted to a > atomic_load_and_spin_if_cond(mutex, !=3D0 ) kind of API. > > Let me know if you want me to spin this - pun intended ;-) > > -Vineet > > > > That said, as far as I can tell the same grey area in the WRS spec that > > I pointed out during one of the Linux reviews still exists. So if > > there's some hardware that has a more generous set of LR->WRS rules tha= n > > the ISA-required LR->SC rules that's fine, we just need to know what th= e > > hardware actually is so we can write down the rules and then stick to > > them -- it's just a performance thing for WRS, so having weaker rules > > seems entirely plausible. > > > > I don't know of any hardware with WRS, but sorry if I missed one. Do > > you have something publicly availiable that behaves this way?