Message-ID: <45e809bf-fc10-4cab-9190-48b21d8cf263@rivosinc.com>
Date: Thu, 18 Apr 2024 13:03:50 -0700
Subject: Re: [RFC PATCH 3/3] RISC-V: Implement CPU yielding for busy loops with Zihintpause/Zawrs
To: Palmer Dabbelt, christoph.muellner@vrull.eu
Cc: libc-alpha@sourceware.org, adhemerval.zanella@linaro.org, Darius Rad, Andrew Waterman, philipp.tomsich@vrull.eu, Evan Green, kito.cheng@sifive.com, jeffreyalaw@gmail.com, gnu-toolchain
From: Vineet Gupta

Hi Christoph,

My 2 cents.

On 4/18/24 10:17, Palmer Dabbelt wrote:
> On Thu, 18 Apr 2024 02:46:35 PDT (-0700), christoph.muellner@vrull.eu wrote:
>> The macro atomic_spin_nop can be used to implement arch-specific
>> CPU yielding that is used in busy loops (e.g. in pthread_spin_lock).
>> This patch introduces an ifunc-based implementation for RISC-V,
>> that uses Zihintpause's PAUSE instruction for that matter (as PAUSE
>> is a HINT instruction there is not dependency to Zihintpause at
>> runtime). Further, we test for Zawrs via hwprobe() and if found
>> we use WRS.STO instead of PAUSE.

[snip]

>> +ENTRY (__cpu_relax_generic)
>> +    /* While we can use the `pause` instruction without
>> +       the need of Zihintpause (because it is a HINT instruction),
>> +       we still have to enable Zihintpause for the assembler.  */
>> +    pause
>> +    ret
>> +END (__cpu_relax_generic)

[snip]

>> +.option push
>> +.option arch, +zawrs
>> +ENTRY (__cpu_relax_zawrs)
>> +    wrs.sto

> This has the same forward progress/eventual success violation as the
> code you sent for GCC and Linux does.  It doesn't really matter if the
> user of the reservation is in a builtin, an asm block, or a function.
> The compiler just doesn't know about those reservation rules and isn't
> going to generate code that follows them.

So, this routine maps to atomic_spin_nop () called in generic code in a
bunch of places.
The easiest case is nptl/pthread_spin_lock.c, which looks something like
this:

__pthread_spin_lock (pthread_spinlock_t *lock)
...
   do
    {
      atomic_spin_nop ();
      val = atomic_load_relaxed (lock);
    }
  while (val != 0);

This works fine for a PAUSE-based implementation, which doesn't need a
prior reservation, but WRS does. Even if both the load and the spin are
inlined, the reservation happens after WRS, which is wrong.

So I think before we can implement this optimization for riscv, we need
a generic glibc change to replace atomic_spin_nop () with a variant that
also takes the lock/memory into consideration.

The fallback implementation of atomic_load_and_spin_if_cond() will be
what the above loop does. Arches that do implement the API can, based on
ISA semantics, choose to ignore the mem arg completely (RISC-V
PAUSE-based implementation) or implement a little loop which explicitly
takes a reservation on the mem/lock and calls WRS.

Does that make sense?

We just need to sort out the 2nd use of atomic_spin_nop, in
nptl/pthread_mutex_lock.c, where this conversion might not be
straightforward. The recent backoff spin looping is getting in the way
of the obvious solution; however, for discussion's sake, if we ignore
it, the code looks like this:

PTHREAD_MUTEX_LOCK (pthread_mutex_t *mutex)
...
       int cnt = 0, max_cnt = something;
       do
        {
            if (cnt++ >= max_cnt)
            {
                 LLL_MUTEX_LOCK (mutex);
                 break;
            }
            atomic_spin_nop ();
       }
       while (LLL_MUTEX_READ_LOCK (mutex) != 0 || LLL_MUTEX_TRYLOCK (mutex) != 0);

And this seems like it can be converted to an
atomic_load_and_spin_if_cond (mutex, != 0) kind of API.

Let me know if you want me to spin this - pun intended ;-)

-Vineet

> That said, as far as I can tell the same grey area in the WRS spec that
> I pointed out during one of the Linux reviews still exists.  So if
> there's some hardware that has a more generous set of LR->WRS rules than
> the ISA-required LR->SC rules that's fine, we just need to know what the
> hardware actually is so we can write down the rules and then stick to
> them -- it's just a performance thing for WRS, so having weaker rules
> seems entirely plausible.
>
> I don't know of any hardware with WRS, but sorry if I missed one.  Do
> you have something publicly available that behaves this way?
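
Coming back to the atomic_load_and_spin_if_cond() idea above, here is a
rough sketch of what a generic fallback could look like. This is only an
illustration, not glibc code: atomic_load_and_spin_if_nonzero,
cpu_relax_hint and the sketch_* functions are made-up names, the
condition is hard-wired to "!= 0", C11 <stdatomic.h> stands in for
glibc's internal atomic macros, and the spin budget is an assumption
added so the helper can return instead of looping forever.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical arch hook.  A PAUSE-based port would emit the hint here and
   ignore MEM; a Zawrs port would instead take a reservation on MEM and
   then wait.  This portable stand-in does nothing.  */
static inline void
cpu_relax_hint (atomic_int *mem)
{
  (void) mem;
}

/* Hypothetical generic fallback, specialised to the "!= 0" condition.
   Hints, then reloads *MEM, until it reads 0 or MAX_SPINS loads were
   done (at least one load always happens).  Returns the last value
   loaded, so 0 means the word was seen free.  */
static inline int
atomic_load_and_spin_if_nonzero (atomic_int *mem, unsigned int max_spins)
{
  unsigned int cnt = 0;
  int val;

  do
    {
      cpu_relax_hint (mem);
      val = atomic_load_explicit (mem, memory_order_relaxed);
    }
  while (val != 0 && ++cnt < max_spins);

  return val;
}

/* Spin lock built on the helper: try to grab the lock, and on failure
   wait until it looks free before trying again.  */
static inline void
sketch_spin_lock (atomic_int *lock)
{
  while (atomic_exchange_explicit (lock, 1, memory_order_acquire) != 0)
    while (atomic_load_and_spin_if_nonzero (lock, 1000) != 0)
      ;
}

static inline void
sketch_spin_unlock (atomic_int *lock)
{
  atomic_store_explicit (lock, 0, memory_order_release);
}

/* Single-threaded usage: an uncontended lock/unlock round trip.
   Returns the final lock word, which should be 0.  */
static inline int
sketch_demo (void)
{
  atomic_int lock = 0;

  sketch_spin_lock (&lock);
  assert (atomic_load (&lock) == 1);
  sketch_spin_unlock (&lock);
  return atomic_load (&lock);
}
```

The point of passing the memory location down is that only
cpu_relax_hint (or an arch override of the whole helper) needs to know
about reservations; callers stay generic. For the pthread_mutex_lock
case, the budget would presumably play the role of max_cnt, with
LLL_MUTEX_LOCK as the fallback once it runs out.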