From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=4gl/=3S=ventanamicro.com=jlaw@sourceware.org>
Received: from mail-qk1-x735.google.com (mail-qk1-x735.google.com [IPv6:2607:f8b0:4864:20::735])
	by sourceware.org (Postfix) with ESMTPS id 80F913853D40
	for <gcc-patches@gcc.gnu.org>; Fri, 18 Nov 2022 22:14:46 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 80F913853D40
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=ventanamicro.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=ventanamicro.com
Received: by mail-qk1-x735.google.com with SMTP id z1so4434512qkl.9
        for <gcc-patches@gcc.gnu.org>; Fri, 18 Nov 2022 14:14:46 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=ventanamicro.com; s=google;
        h=content-transfer-encoding:in-reply-to:from:references:cc:to
         :content-language:subject:user-agent:mime-version:date:message-id
         :from:to:cc:subject:date:message-id:reply-to;
        bh=Aj/TE3W+uTILU4id61kAs+Zr91bQndPhPHZoEIR+cwc=;
        b=OcTsly5w3xpOjHbs2AKiLA1Ry2/0Mo6nRwPrZVpbeTxk9r4pC33P/2u8kMnl60kdW/
         o8Ny1d3MCvdoKUfB1WvtIhFY6iXwINPmuA1B3inGH7M1s3VwB0hbg4J9FumV6TI3SiR9
         dmiKkPsIN47QWIIfB1CoS92Xf2dldVtMMZORjMU2nQVn5tKY8hc6Ha4GyF13p645h0uZ
         evufRPjn/MZhKpcbDsExWL9zSa+4mXcwlCWoUrEA/3WiJjrGStgaVGtSmTReIr9QAxEs
         JsiyYAcATXgnp6Evjwv/iIO5ymCYQQQTGyoYqK0gObPmTZQInsZVHjynj724TM1ourbH
         TV0Q==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=content-transfer-encoding:in-reply-to:from:references:cc:to
         :content-language:subject:user-agent:mime-version:date:message-id
         :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=Aj/TE3W+uTILU4id61kAs+Zr91bQndPhPHZoEIR+cwc=;
        b=aDc/WpwIyTiKX5Uu/9AQmI0QUer38PkLSJ8pYN8qqtOk8kRoB0wJ1KGRWKAIT5iC7S
         Prl45RGJgRBXYplS54HrdzaRXb3FKVB4kOkTaQLizyN4ADteFWVLhk1SafUIR6mws6V8
         2tRK5FH9YRizGvBa72yJFO8Q3PsXP+ZA4dcHMxfWoU27pWRgJJlZtD8/RUMNJ79Txq+O
         7lvdtbaGe6KznxLCrmPdgjavhifjAEJhYdGSExC2sJNLsDfnJQwn8zUmBBmkFdDu+BcU
         vCIwkibINfLFnmc6EcSIosxErsBgrNZ9zkj949VNdaYvFq30sXRqL0aaD03W7NzaTkbh
         rUFg==
X-Gm-Message-State: ANoB5pmVotqzUfacw4Kmyog+0fmbspWfQ45YlNlWJnXE9ZT7BXXELat2
	1wGNXM5Qz/cqdxyiUJLoQ4+g1g==
X-Google-Smtp-Source: AA0mqf6JHyQZtOScT0U0WzRIXrD0IhsOzZavTwW6J0YgxgljE08JhnVWu99NbID3+21Z07DpWKU6zg==
X-Received: by 2002:a05:620a:1fa:b0:6ee:24d5:b8fc with SMTP id x26-20020a05620a01fa00b006ee24d5b8fcmr7716490qkn.336.1668809685814;
        Fri, 18 Nov 2022 14:14:45 -0800 (PST)
Received: from ?IPV6:2601:681:8600:13d0::f0a? ([2601:681:8600:13d0::f0a])
        by smtp.gmail.com with ESMTPSA id d5-20020ac85445000000b0039492d503cdsm2686766qtq.51.2022.11.18.14.14.44
        (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
        Fri, 18 Nov 2022 14:14:45 -0800 (PST)
Message-ID: <966b3a77-5cda-c4d0-be4e-2083c502ba05@ventanamicro.com>
Date: Fri, 18 Nov 2022 15:14:43 -0700
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.3.1
Subject: Re: [PATCH] RISC-V: Optimise adding a (larger than simm12) constant
Content-Language: en-US
To: Philipp Tomsich <philipp.tomsich@vrull.eu>,
 Jeff Law <jeffreyalaw@gmail.com>
Cc: gcc-patches@gcc.gnu.org, Christoph Muellner
 <christoph.muellner@vrull.eu>, Palmer Dabbelt <palmer@rivosinc.com>,
 Vineet Gupta <vineetg@rivosinc.com>, Kito Cheng <kito.cheng@gmail.com>
References: <20221109230718.3240479-1-philipp.tomsich@vrull.eu>
 <84ccfc10-df8c-5c77-913e-2819c7a14ee4@gmail.com>
 <CAAeLtUBU=12NzX_XN_Wy4tbCgQraPCDqwDtjLZ=HUnR6qPMHXw@mail.gmail.com>
From: Jeff Law <jlaw@ventanamicro.com>
In-Reply-To: <CAAeLtUBU=12NzX_XN_Wy4tbCgQraPCDqwDtjLZ=HUnR6qPMHXw@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>


On 11/18/22 14:26, Philipp Tomsich wrote:
> On Fri, 18 Nov 2022 at 22:13, Jeff Law <jeffreyalaw@gmail.com> wrote:
>>
>> On 11/9/22 16:07, Philipp Tomsich wrote:
>>> Handling the register-const_int addition has very quickly escalated to
>>> creating a full sign-extended 32bit constant and performing a
>>> register-register for RISC-V in GCC so far, resulting in sequences like
>>> (for the case of "a + 2048"):
>>>        li      a5,4096
>>>        addi    a5,a5,-2048
>>>        add     a0,a0,a5
>>>
>>> By adding an expansion for add<mode>3, we can emit optimised RTL that
>>> matches the capabilities of RISC-V better by adding support for the
>>> following, previously unoptimised cases:
>>>     - addi + addi
>>>        addi    a0,a0,2047
>>>        addi    a0,a0,1
>>>     - li + sh[123]add (if Zba is enabled)
>>>        li      a5,960
>>>        sh3add  a0,a5,a0
>>>
>>> With this commit, we also fix up riscv_adjust_libcall_cfi_prologue()
>>> and riscv_adjust_libcall_cfi_epilogue() to not use gen_add3_insn, as
>>> the expander will otherwise wrap the resulting set-expression in an
>>> insn (causing an ICE at dwarf2-time) when invoked with -msave-restore.
>>>
>>> This closes the gap to LLVM, which has already been emitting these
>>> optimised sequences.
>>>
>>> Note that this benefits is perlbench (in SPEC CPU 2017), which needs
>>> to add the constant 3840.
>>>
>>> gcc/ChangeLog:
>>>
>>>        * config/riscv/bitmanip.md (*shNadd): Rename.
>>>        (riscv_shNadd<X:mode>): Expose as gen_riscv_shNadd{di/si}.
>>>        * config/riscv/predicates.md (const_arith_shifted123_operand):
>>>        New predicate (for constants that are a simm12, shifted by
>>>        1, 2 or 3).
>>>        (const_arith_2simm12_operand): New predicate (that can be
>>>        expressed by adding 2 simm12 together).
>>>        (addi_operand): New predicate (an immedaite operand suitable
>>>        for the new add<mode>3 expansion).
>>>        * config/riscv/riscv.cc (riscv_adjust_libcall_cfi_prologue):
>>>        Don't use gen_add3_insn, where a RTX instead of an INSN is
>>>        required (otherwise this will break as soon as we have a
>>>        define_expand for add<mode>3).
>>>        (riscv_adjust_libcall_cfi_epilogue): Same.
>>>        * config/riscv/riscv.md (addsi3): Rename.
>>>        (riscv_addsi3): New name for addsi3.
>>>        (adddi3): Rename.
>>>        (riscv_adddi3): New name for adddi3.
>>>        (add<mode>3): New expander that handles the basic and fancy
>>>        (such as li+sh[123]add, addi+addi, ...) cases for adding
>>>        register-register and register-const_int.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>        * gcc.target/riscv/addi.c: New test.
>>>        * gcc.target/riscv/zba-shNadd-06.c: New test.
>>>
>>> Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
>>> ---
>>>
>>>    gcc/config/riscv/bitmanip.md                  |  2 +-
>>>    gcc/config/riscv/predicates.md                | 28 +++++++++
>>>    gcc/config/riscv/riscv.cc                     | 10 ++--
>>>    gcc/config/riscv/riscv.md                     | 58 ++++++++++++++++++-
>>>    gcc/testsuite/gcc.target/riscv/addi.c         | 39 +++++++++++++
>>>    .../gcc.target/riscv/zba-shNadd-06.c          | 11 ++++
>>>    6 files changed, 141 insertions(+), 7 deletions(-)
>>>    create mode 100644 gcc/testsuite/gcc.target/riscv/addi.c
>>>    create mode 100644 gcc/testsuite/gcc.target/riscv/zba-shNadd-06.c
>>>
>>>
>>>
>>> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
>>> index 171a0cdced6..289ff7470c6 100644
>>> --- a/gcc/config/riscv/riscv.md
>>> +++ b/gcc/config/riscv/riscv.md
>>> @@ -464,6 +464,60 @@
>>>      [(set_attr "type" "arith")
>>>       (set_attr "mode" "DI")])
>>>
>>> +(define_expand "add<mode>3"
>>> +  [(set (match_operand:GPR           0 "register_operand"      "=r,r")
>>> +     (plus:GPR (match_operand:GPR 1 "register_operand"      " r,r")
>>> +               (match_operand:GPR 2 "addi_operand"          " r,I")))]
>>> +  ""
>>> +{
>>> +  if (arith_operand (operands[2], <MODE>mode))
>>> +    emit_insn (gen_riscv_add<mode>3 (operands[0], operands[1], operands[2]));
>>> +  else if (const_arith_2simm12_operand (operands[2], <MODE>mode))
>>> +    {
>>> +      /* Split into two immediates that add up to the desired value:
>>> +       * e.g., break up "a + 2445" into:
>>> +       *         addi        a0,a0,2047
>>> +       *      addi   a0,a0,398
>>> +       */
>> Nit.  GNU comment style please.
>>
>>
>>> +
>>> +      HOST_WIDE_INT val = INTVAL (operands[2]);
>>> +      HOST_WIDE_INT saturated = HOST_WIDE_INT_M1U << (IMM_BITS - 1);
>>> +
>>> +      if (val >= 0)
>>> +      saturated = ~saturated;
>>> +
>>> +      val -= saturated;
>>> +
>>> +      rtx tmp = gen_reg_rtx (<MODE>mode);
>> Can't add<mode>3 be generated by LRA?  If so, don't you have to guard
>> against going into this path as we shouldn't be creating new pseudos at
>> that point (I know LRA can create some internally, but I don't think it
>> handles new ones showing up due to target expanders).
>>
>>
>> Similarly for the shifted_123 case immediately following.
>>
>>
>> If we do indeed have an issue here, I'm not sure how best to resolve.
>> If the output operand does not overlap with the inputs, then we're
>> golden and can just re-use it to form the constant.  If not,  then it's
>> a bit tougher.  I'm not keen to add a test of no_new_pseudos to the
>> operand predicate, but I don't see a better option yet.
>  From a cursory glance, LRA does not try to go through gen_add3_insn,
> but rather forms PLUS rtx.  This matches my expectation, as (if we
> can't create a pseudo) the original add<mode>3 definition would run
> into the same problem (as an immediate that doesn't satisfy the
> SMALL_OPERAND test will need to be placed in a register).

I should have remembered that.  It vectors through emit_add3_insn which 
you and I have looked at it not terribly long ago.  We don't define 
addptr3, so it'll just form the expression directly rather than going 
through the expanders.


So, OK.


jeff