From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=f0dT=3X=gmail.com=jeffreyalaw@sourceware.org>
Received: from mail-pj1-x1035.google.com (mail-pj1-x1035.google.com [IPv6:2607:f8b0:4864:20::1035])
	by sourceware.org (Postfix) with ESMTPS id A529F3858291
	for <gcc-patches@gcc.gnu.org>; Wed, 23 Nov 2022 05:07:34 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A529F3858291
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com
Received: by mail-pj1-x1035.google.com with SMTP id y14-20020a17090a2b4e00b002189a1b84d4so862288pjc.2
        for <gcc-patches@gcc.gnu.org>; Tue, 22 Nov 2022 21:07:34 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20210112;
        h=content-transfer-encoding:in-reply-to:from:references:cc:to
         :content-language:subject:user-agent:mime-version:date:message-id
         :from:to:cc:subject:date:message-id:reply-to;
        bh=5B3gTYjTMiZ27LwLlo6agSuoe8j0EuIOPDjMzO+MeoQ=;
        b=CDMazrJa4P2hAEAgU3aQNP3CooAsvuOHYL2PLkocn8RK6GDInXD9caK+IjPDfzR5+R
         ezEqWNuqg6lHPnc7l24tQa/rg7QUsdQW+cTNP+05l9RWkJbcK1sehv6nEOsO8v4kjZaC
         ZOYc6DeeirO5EGO3mpFoFD6PwpBawAtIbjOMiFWUQdPhQDxXFixQ5Xq2a2iyxiYDAFOC
         mpnmzIINB27iMaN+Dl7vJX+4qzosQbTcX38qQ/q8i0sTzXAk+hYkW67ecyGWKl4CDImq
         RjW+VC1/BnfI5TZ08wOgW0OlATIrVoUNplVdOblu9SBTwBV1iX1eDsB6Ohgwl9pljtFb
         EgjQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=content-transfer-encoding:in-reply-to:from:references:cc:to
         :content-language:subject:user-agent:mime-version:date:message-id
         :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=5B3gTYjTMiZ27LwLlo6agSuoe8j0EuIOPDjMzO+MeoQ=;
        b=0kPvBRo44+qUS7svBnda2TR2qQ+O20VS5VV+W37Q6xtkawgkutGLaCc6H80/XUhuis
         5fZm4XwcsUyWev8/N1VLOF6gxilrxjuDOPmYQGC9RrgW4gSyT04gLgVb/YV6GXNTMqrO
         Acfv/av9nhmcs9tJVjN+ogvAWieENlwXltFbL7NgdEEn1z4PLfu9WsSXgv04Urvm0Sao
         pwIxSHyQMIPQVoTregILyRTXc15BTl6Tv/HtQo/+4j1vU9zOUYGboaD8G15sK/SlgNUo
         wXiqX3MvfsN8Pjggz4mYRDeLW5gGKqWqWxs2R1EAeENoYBxEwVavKW6rDNDCN/kI2t4f
         Ofig==
X-Gm-Message-State: ANoB5pnx9FkvcwJFJPEEdXCyZaTXf97VJ85gdLGM8GdQ0taasXSTU7D1
	dy2OT9tlu3fXnoHUfteh28o=
X-Google-Smtp-Source: AA0mqf4mxse+wkLa14h1SXNvFB9Bw2C4bWbKOdQC5xiLqq98OMSESU9FeZdC/b/YEaQaAGC7v/iJZw==
X-Received: by 2002:a17:90a:2b8c:b0:212:f4f1:96ee with SMTP id u12-20020a17090a2b8c00b00212f4f196eemr36013307pjd.72.1669180053345;
        Tue, 22 Nov 2022 21:07:33 -0800 (PST)
Received: from ?IPV6:2601:681:8600:13d0::99f? ([2601:681:8600:13d0::99f])
        by smtp.gmail.com with ESMTPSA id 85-20020a621958000000b0056b4c5dde61sm11928525pfz.98.2022.11.22.21.07.32
        (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
        Tue, 22 Nov 2022 21:07:32 -0800 (PST)
Message-ID: <bb0bbf77-27c4-14bc-d2c3-88dfabdd3671@gmail.com>
Date: Tue, 22 Nov 2022 22:07:31 -0700
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.3.1
Subject: Re: [PATCH v3] RISC-V modified add3 for large stack frame
 optimization [PR105733]
Content-Language: en-US
To: Kevin Lee <kevinl@rivosinc.com>, gcc-patches@gcc.gnu.org
Cc: gnu-toolchain@rivosinc.com
References: <20221104005331.775049-1-kevinl@rivosinc.com>
From: Jeff Law <jeffreyalaw@gmail.com>
In-Reply-To: <20221104005331.775049-1-kevinl@rivosinc.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-7.3 required=5.0 tests=BAYES_00,BODY_8BITS,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>


On 11/3/22 18:53, Kevin Lee wrote:
> This is the identical patch with https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604814.html, but with the correct plaintext format.
>
>> The loop still seems a bit odd which may point to further improvements
>> that could be made to this patch.  Consider this fragment of the loop:
> Thank you for the review Jeff! I am currently looking into this issue
> in a different patch. I'll come back with some improvement.
>   
> gcc/ChangeLog:
>     Jim Wilson <jim.wilson.gcc@gmail.com>
>     Michael Collison <collison@rivosinc.com>
>     Kevin Lee <kevinl@rivosinc.com>
>     
> 	* config/riscv/predicates.md (const_lui_operand): New Predicate.
> 	(add_operand): Ditto.
> 	(reg_or_const_int_operand): Ditto.
> 	* config/riscv/riscv-protos.h (riscv_eliminable_reg): New
>     function.
> 	* config/riscv/riscv-selftests.cc (calculate_x_in_sequence):
>     Consider Parallel insns.
> 	* config/riscv/riscv.cc (riscv_eliminable_reg): New function.
> 	(riscv_adjust_libcall_cfi_prologue): Use gen_rtx_SET and
>     gen_rtx_fmt_ee instead of gen_add3_insn.
> 	(riscv_adjust_libcall_cfi_epilogue): Ditto.
> 	* config/riscv/riscv.md (addsi3): Remove.
> 	(add<mode>3): New instruction for large stack frame
>     optimization.
> 	(add<mode>3_internal): Ditto.
> 	(adddi3): Remove.
> 	(add<mode>3_internal2): New instruction for insns generated in
>     the prologue and epilogue pass.
> ---
>
> diff --git a/gcc/config/riscv/riscv-selftests.cc b/gcc/config/riscv/riscv-selftests.cc
> index 636874ebc0f..50457db708e 100644
> --- a/gcc/config/riscv/riscv-selftests.cc
> +++ b/gcc/config/riscv/riscv-selftests.cc
> @@ -116,6 +116,9 @@ calculate_x_in_sequence (rtx reg)
>         rtx pat = PATTERN (insn);
>         rtx dest = SET_DEST (pat);
>   
> +      if (GET_CODE (pat) == PARALLEL)
> +	dest = SET_DEST (XVECEXP (pat, 0, 0));

So this assumes you've got a parallel where the first vector is a SET.  
That may well be true, but it's probably safer to verify with something like


     gcc_assert (GET_CODE (XVECEXP (pat, 0, 0)) == SET)


That way we're not surprised in the future if we have more patterns that 
use PARALLEL, perhaps with the first not being a simple set.


> +{
> +  return REG_P (x) && (REGNO (x) == FRAME_POINTER_REGNUM
> +		       || REGNO (x) == ARG_POINTER_REGNUM
> +		       || (REGNO (x) >= FIRST_VIRTUAL_REGISTER
> +			   && REGNO (x) <= LAST_VIRTUAL_REGISTER));

Instead I'd write it as


   return (REG_P (x)
           && (REGNO (x) == FRAME_POINTER_REGNUM
               || REGNO (x) == ARG_POINTER_REGNUM
               || (REGNO (x) >= FIRST_VIRUTAL_REGISTER
                   && REGNO (x) <= LAST_VIRTUAL_REGISTER)));


That's just the style most GCC folks are used to reading.


> @@ -4887,8 +4897,9 @@ riscv_adjust_libcall_cfi_prologue ()
>         }
>   
>     /* Debug info for adjust sp.  */
> -  adjust_sp_rtx = gen_add3_insn (stack_pointer_rtx,
> -				 stack_pointer_rtx, GEN_INT (-saved_size));
> +  adjust_sp_rtx = gen_rtx_SET (stack_pointer_rtx,
> +				    gen_rtx_fmt_ee (PLUS, GET_MODE (stack_pointer_rtx),
> +						  stack_pointer_rtx, GEN_INT (saved_size)));
>     dwarf = alloc_reg_note (REG_CFA_ADJUST_CFA, adjust_sp_rtx,
>   			  dwarf);
>     return dwarf;
> @@ -4990,8 +5001,9 @@ riscv_adjust_libcall_cfi_epilogue ()
>     int saved_size = cfun->machine->frame.save_libcall_adjustment;
>   
>     /* Debug info for adjust sp.  */
> -  adjust_sp_rtx = gen_add3_insn (stack_pointer_rtx,
> -				 stack_pointer_rtx, GEN_INT (saved_size));
> +  adjust_sp_rtx = gen_rtx_SET (stack_pointer_rtx,
> +				    gen_rtx_fmt_ee (PLUS, GET_MODE (stack_pointer_rtx),
> +						  stack_pointer_rtx, GEN_INT (saved_size)));
>     dwarf = alloc_reg_note (REG_CFA_ADJUST_CFA, adjust_sp_rtx,
>   			  dwarf);

I think this duplicates a change from Phillip & his team. This as to fix 
ICEs in the dwarf2 CFI generator, right?  Please double check as remove 
if it does duplicate a change from Philipp & his team.


> +
> +(define_insn_and_split "add<mode>3_internal"
> +  [(set (match_operand:GPR          0 "register_operand" "=r,r,&r,!&r")
> +	(plus:GPR (match_operand:GPR 1 "register_operand" " %r,r,r,0")
> +		  (match_operand:GPR 2 "add_operand"      " r,I,L,L")))
> +  (clobber (match_scratch:GPR 3 "=X,X,X,&r"))]
> +  ""
> +{
> +  if ((which_alternative == 2) || (which_alternative == 3))
> +    return "#";
> +
> +  if (TARGET_64BIT && <MODE>mode == SImode)
> +    return "add%i2w\t%0,%1,%2";
> +  else
> +    return "add%i2\t%0,%1,%2";
> +}
> +  "&& reload_completed && const_lui_operand (operands[2], <MODE>mode)"
> +  [(const_int 0)]
> +{
> +  if (REGNO (operands[0]) != REGNO (operands[1]))
> +    {
> +      emit_insn (gen_mov<mode> (operands[0], operands[2]));
> +      emit_insn (gen_add<mode>3_internal (operands[0], operands[0], operands[1]));
> +    }
> +  else
> +    {
> +      emit_insn (gen_mov<mode> (operands[3], operands[2]));
> +      emit_insn (gen_add<mode>3_internal (operands[0], operands[0], operands[3]));
> +    }
> +  DONE;
> +}
>     [(set_attr "type" "arith")
> -   (set_attr "mode" "SI")])
> +   (set_attr "mode" "<MODE>")])

So I think it was Kito that was concerned about potential performance 
impacts, particularly with the clobber expression.      Probably the 
best thing I can think of to minimize the potential for codegen changes 
would be to leave your existing add<mode>3 pattern alone.


Instead define an addptr pattern.  That one is special in that it's only 
used by LRA.  So the potential for impacting other code is minimized.


Rather than checking REGNO (operands[0]) != REGNO (operands[1]) use 
reg_overlap_mentioned_p instead.  The only notable difference for this 
code is that reg_overlap_mentioned_p will work with modes that cover 
multiple hard regs as well as subregs.

Jeff