From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=GpmG=AJ=gmail.com=jeffreyalaw@sourceware.org>
Received: from mail-pg1-x531.google.com (mail-pg1-x531.google.com [IPv6:2607:f8b0:4864:20::531])
	by sourceware.org (Postfix) with ESMTPS id DF2B73857712
	for <gcc-patches@gcc.gnu.org>; Tue, 18 Apr 2023 16:59:10 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org DF2B73857712
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com
Received: by mail-pg1-x531.google.com with SMTP id 41be03b00d2f7-517bdc9e81dso1034673a12.1
        for <gcc-patches@gcc.gnu.org>; Tue, 18 Apr 2023 09:59:10 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20221208; t=1681837149; x=1684429149;
        h=content-transfer-encoding:in-reply-to:from:references:cc:to
         :content-language:subject:user-agent:mime-version:date:message-id
         :from:to:cc:subject:date:message-id:reply-to;
        bh=eXXW5F0Ij3V6ZxhO9IbMOes/N27elU0l76B8esJmgH4=;
        b=sTgvXBOsgO15JldDGLrfqzrFk1BTtE+3e3+DHisNVlCxV3hyDBG14c8oA3IyX754bQ
         aF1VjPmGq4LPDK2W0+CIV1gt4aK5KGKvk7G0qZZXe0OCOX7N4MEZFebadxvy3Io8zWma
         tj/ywMmNwunQxPBm7zmOx2drQqzVOEUCRJFeQIETL9NuXSLtUwNkrL0V9aBLoMjGYOhp
         zqaWrXF2+dkTv58GMItWrXSvK0YBIfm6FvU3nr2l4fBlMyGTx/QZhJfyOs2cmAdrRLPq
         mijZ7b8y6zJ/tzrlZU9nd3Zlba06ypByDtklILs/XfNoJ04Ikv2YMTwRcAJwtlqOZUhZ
         MJCA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20221208; t=1681837149; x=1684429149;
        h=content-transfer-encoding:in-reply-to:from:references:cc:to
         :content-language:subject:user-agent:mime-version:date:message-id
         :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=eXXW5F0Ij3V6ZxhO9IbMOes/N27elU0l76B8esJmgH4=;
        b=ONZJ/WXaN2oo38/qYjRuJu1A6EIPQR8mNrb4mlmHXz5Ydb8aD28VOKr50on4GAzoTa
         6wMF1WF3j3tyfZ+WS0GwRiywpyvtuLNjncFcECP4fEIJTskOE8xb/k6N/1bY+oeOLWcc
         zLUMjmlHnhg+EY2IKw0kuzBAhz82DgvKq8d3SAZjt8/JPm2dQbFXbubmLs8s/szNQGWR
         AaW3pY5YIJBWRzGDjsPcXoQ/UUHd+7yRGQXmkOHzcEZfHHvXLwIWoIylzKTh6HnK1jbn
         EqLmBlcMETiAFCla5H8u9ja02Tf3KrFvUbUWqG+JPACfH0qxhI6DGtv8sX+1aaOIw/Qz
         tolw==
X-Gm-Message-State: AAQBX9esZ9tW2a2Fgyp39HBu+hp/eBFk0NUGrc6a1jH7AaGctQuannYg
	k5tqHnrXhTJnNhC9WyKkD0U=
X-Google-Smtp-Source: AKy350ZBeufPXQ/uRwOjqANRRIl0IvSVAU0+KFX3OO0RYftEZOBQzDlNjBiWuFYtst408A+3UWL+fw==
X-Received: by 2002:a17:903:22cf:b0:1a5:1438:9bcd with SMTP id y15-20020a17090322cf00b001a514389bcdmr3034457plg.35.1681837149167;
        Tue, 18 Apr 2023 09:59:09 -0700 (PDT)
Received: from ?IPV6:2601:681:8600:13d0::f0a? ([2601:681:8600:13d0::f0a])
        by smtp.gmail.com with ESMTPSA id p7-20020a170903248700b001a68d45e52dsm9412943plw.249.2023.04.18.09.59.07
        (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
        Tue, 18 Apr 2023 09:59:08 -0700 (PDT)
Message-ID: <dcab135e-dc8a-f610-dd0b-c38bab6e292b@gmail.com>
Date: Tue, 18 Apr 2023 10:59:07 -0600
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.9.1
Subject: Re: [PATCH v5] RISCV: Inline subword atomic ops
Content-Language: en-US
To: Patrick O'Neill <patrick@rivosinc.com>, gcc-patches@gcc.gnu.org
Cc: palmer@rivosinc.com, kito.cheng@gmail.com, david.abd@gmail.com
References: <20220821215823.18207-1-palmer@rivosinc.com>
 <20230418142858.2424851-1-patrick@rivosinc.com>
From: Jeff Law <jeffreyalaw@gmail.com>
In-Reply-To: <20230418142858.2424851-1-patrick@rivosinc.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=-9.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>


On 4/18/23 08:28, Patrick O'Neill wrote:
> RISC-V has no support for subword atomic operations; code currently
> generates libatomic library calls.
> 
> This patch changes the default behavior to inline subword atomic calls
> (using the same logic as the existing library call).
> Behavior can be specified using the -minline-atomics and
> -mno-inline-atomics command line flags.
> 
> gcc/libgcc/config/riscv/atomic.c has the same logic implemented in asm.
> This will need to stay for backwards compatibility and the
> -mno-inline-atomics flag.
> 
> 2023-04-18 Patrick O'Neill <patrick@rivosinc.com>
> 
> 	PR target/104338
> 	* riscv-protos.h: Add helper function stubs.
> 	* riscv.cc: Add helper functions for subword masking.
> 	* riscv.opt: Add command-line flag.
> 	* sync.md: Add masking logic and inline asm for fetch_and_op,
> 	fetch_and_nand, CAS, and exchange ops.
> 	* invoke.texi: Add blurb regarding command-line flag.
> 	* inline-atomics-1.c: New test.
> 	* inline-atomics-2.c: Likewise.
> 	* inline-atomics-3.c: Likewise.
> 	* inline-atomics-4.c: Likewise.
> 	* inline-atomics-5.c: Likewise.
> 	* inline-atomics-6.c: Likewise.
> 	* inline-atomics-7.c: Likewise.
> 	* inline-atomics-8.c: Likewise.
> 	* atomic.c: Add reference to duplicate logic.
So for others who may be interested.  The motivation here is that for a 
sub-word atomic we currently have to explicitly link in libatomic or we 
get undefined symbols.

This is particularly problematical for the distros because we're one of 
the few (only?) architectures supported by the distros that require 
linking in libatomic for these cases.  THe distros don't want to adjust 
each affected packages and be stuck carrying that change forward or 
negotiating with all the relevant upstreams.  The distros might tackle 
this problem by porting this patch into their compiler tree which has 
its own set of problems with long term maintenance.

The net is from a usability standpoint it's best if we get this problem 
addressed and backported to our gcc-13 RISC-V coordination branch.

We had held this up pending resolution of some other issues in the 
atomics space.  In retrospect that might have been a mistake.

So with that background...  Here we go...

>   
> +/* Helper function for extracting a subword from memory.  */
> +
> +void
> +riscv_subword_address (rtx mem, rtx *aligned_mem, rtx *shift, rtx *mask,
> +		       rtx *not_mask)
So I'd expand on that comment.  The idea is we would like someone 
working in the backend to be able to read the function comment and have 
a reasonable sense of what the function does as well as the inputs and 
return value.  So perhaps something like this:

/* Given memory reference MEM, expand code to compute the aligned
    memory address, shift and mask values and store them into
    *ALIGNED_MEM, *SHIFT, *MASK and *NOT_MASK.  */

Or something like that.


> +{
> +  /* Align the memory addess to a word.  */
s/addess/address/


> +  rtx addr = force_reg (Pmode, XEXP (mem, 0));
> +
> +  rtx aligned_addr = gen_reg_rtx (Pmode);
> +  emit_move_insn (aligned_addr,  gen_rtx_AND (Pmode, addr,
> +					      gen_int_mode (-4, Pmode)));
So rather than -4 as a magic number, GET_MODE_MASK would be better. 
That may result in needing to rewrap this code.  I'd bring the 
gen_rtx_AND down on a new line, aligned with aligned_addr.

Presumably using SImode is intentional here rather than wanting to use 
word_mode which would be SImode for rv32 and DImode for rv64?  I'm going 
to work based on that assumption, but if it isn't there's more work to 
do to generalize this code.


> +
> +  /* Calculate the shift amount.  */
> +  emit_move_insn (*shift, gen_rtx_AND (SImode, gen_lowpart (SImode, addr),
> +				       gen_int_mode (3, SImode)));
> +  emit_move_insn (*shift, gen_rtx_ASHIFT (SImode, *shift,
> +					  gen_int_mode(3, SImode)));


Formatting nit.  space after before open paren in the gen_int_mode call 
on that last line above.  This minor goof shows up in various places, 
please review the patch as a whole looking for similar nits.


> +
> +  /* Calculate the mask.  */
> +  int unshifted_mask;
> +  if (GET_MODE (mem) == QImode)
> +    unshifted_mask = 0xFF;
> +  else
> +    unshifted_mask = 0xFFFF;
Can you just use GET_MASK_MODE here which should simplify this to 
something like

unshifted_mask = GET_MODE_MASK (GET_MODE (mem));


> +
> +(define_expand "atomic_fetch_nand<mode>"
> +  [(set (match_operand:SHORT 0 "register_operand" "=&r")
> +	(match_operand:SHORT 1 "memory_operand" "+A"))
> +   (set (match_dup 1)
> +	(unspec_volatile:SHORT
> +	  [(not:SHORT (and:SHORT (match_dup 1)
> +				 (match_operand:SHORT 2 "reg_or_0_operand" "rJ")))
> +	   (match_operand:SI 3 "const_int_operand")] ;; model
> +	 UNSPEC_SYNC_OLD_OP_SUBWORD))]
> +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
Just a note, constraints aren't necessary for a define_expand.  They 
don't hurt anything though.  They do document expectations, but then you 
have to maintain them over time.  I'm OK leaving them, mostly wanted to 
make sure you're aware they aren't strictly necessary for a define_expand.


> +
> +(define_expand "atomic_fetch_<atomic_optab><mode>"
> +  [(set (match_operand:SHORT 0 "register_operand" "=&r")	      ;; old value at mem
> +	(match_operand:SHORT 1 "memory_operand" "+A"))		      ;; mem location
> +   (set (match_dup 1)
> +	(unspec_volatile:SHORT
> +	  [(any_atomic:SHORT (match_dup 1)
> +		     (match_operand:SHORT 2 "reg_or_0_operand" "rJ")) ;; value for op
> +	   (match_operand:SI 3 "const_int_operand")]		      ;; model
> +	 UNSPEC_SYNC_OLD_OP_SUBWORD))]
> +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
> +{
> +  /* We have no QImode/HImode atomics, so form a mask, then use
> +     subword_atomic_fetch_strong_<mode> to implement a LR/SC version of the
> +     operation. */
> +
> +  /* Logic duplicated in gcc/libgcc/config/riscv/atomic.c for use when inlining
> +     is disabled */
Makes me wonder if we should expose builtins that could be used by 
atomic.c rather than having the logic open coded in two places.  That 
could be a follow-up in my opinion rather than a blocker for this patch.


>   
> +(define_expand "atomic_compare_and_swap<mode>"
> +  [(match_operand:SI 0 "register_operand" "")    ;; bool output
> +   (match_operand:SHORT 1 "register_operand" "") ;; val output
> +   (match_operand:SHORT 2 "memory_operand" "")   ;; memory
> +   (match_operand:SHORT 3 "reg_or_0_operand" "") ;; expected value
> +   (match_operand:SHORT 4 "reg_or_0_operand" "") ;; desired value
> +   (match_operand:SI 5 "const_int_operand" "")   ;; is_weak
> +   (match_operand:SI 6 "const_int_operand" "")   ;; mod_s
> +   (match_operand:SI 7 "const_int_operand" "")]  ;; mod_f
> +  "TARGET_ATOMIC && TARGET_INLINE_SUBWORD_ATOMIC"
I nearly suggested you use this form for the earlier expanders where the 
RTL in the expander is ultimately thrown away because we generate all 
the RTL we need in the C fragment and finish with a "DONE" tag.


> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
> new file mode 100644
> index 00000000000..5c5623d9b2f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mno-inline-atomics" } */
> +/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "fetch_and_nand" { target *-*-* } 0 } */
> +/* { dg-final { scan-assembler "\tcall\t__sync_fetch_and_add_1" } } */
> +/* { dg-final { scan-assembler "\tcall\t__sync_fetch_and_nand_1" } } */
> +/* { dg-final { scan-assembler "\tcall\t__sync_bool_compare_and_swap_1" } } */
> +
> +char foo;
> +char bar;
> +char baz;
> +
> +int
> +main ()
> +{
> +  __sync_fetch_and_add(&foo, 1);
> +  __sync_fetch_and_nand(&bar, 1);
> +  __sync_bool_compare_and_swap (&baz, 1, 2);
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
> new file mode 100644
> index 00000000000..fdce7a5d71f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/inline-atomics-2.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* Verify that subword atomics do not generate calls.  */
> +/* { dg-options "-minline-atomics" } */
> +/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "fetch_and_nand" { target *-*-* } 0 } */
> +/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_add_1" } } */
> +/* { dg-final { scan-assembler-not "\tcall\t__sync_fetch_and_nand_1" } } */
> +/* { dg-final { scan-assembler-not "\tcall\t__sync_bool_compare_and_swap_1" } } */
Note that you can #include another test file.  When you do that the 
dg-directives in the #included file are ignored.   Meaning that you can 
share test code, but use different dg-directives.  So for inline-2.c, 
you can have the dg-* directives above, followed by:

#include "inline-atomics-1.c"

Overall it looks pretty close to ready.

Jeff