From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qk1-x735.google.com (mail-qk1-x735.google.com [IPv6:2607:f8b0:4864:20::735]) by sourceware.org (Postfix) with ESMTPS id 80F913853D40 for ; Fri, 18 Nov 2022 22:14:46 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 80F913853D40 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=ventanamicro.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=ventanamicro.com Received: by mail-qk1-x735.google.com with SMTP id z1so4434512qkl.9 for ; Fri, 18 Nov 2022 14:14:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ventanamicro.com; s=google; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=Aj/TE3W+uTILU4id61kAs+Zr91bQndPhPHZoEIR+cwc=; b=OcTsly5w3xpOjHbs2AKiLA1Ry2/0Mo6nRwPrZVpbeTxk9r4pC33P/2u8kMnl60kdW/ o8Ny1d3MCvdoKUfB1WvtIhFY6iXwINPmuA1B3inGH7M1s3VwB0hbg4J9FumV6TI3SiR9 dmiKkPsIN47QWIIfB1CoS92Xf2dldVtMMZORjMU2nQVn5tKY8hc6Ha4GyF13p645h0uZ evufRPjn/MZhKpcbDsExWL9zSa+4mXcwlCWoUrEA/3WiJjrGStgaVGtSmTReIr9QAxEs JsiyYAcATXgnp6Evjwv/iIO5ymCYQQQTGyoYqK0gObPmTZQInsZVHjynj724TM1ourbH TV0Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Aj/TE3W+uTILU4id61kAs+Zr91bQndPhPHZoEIR+cwc=; b=aDc/WpwIyTiKX5Uu/9AQmI0QUer38PkLSJ8pYN8qqtOk8kRoB0wJ1KGRWKAIT5iC7S Prl45RGJgRBXYplS54HrdzaRXb3FKVB4kOkTaQLizyN4ADteFWVLhk1SafUIR6mws6V8 2tRK5FH9YRizGvBa72yJFO8Q3PsXP+ZA4dcHMxfWoU27pWRgJJlZtD8/RUMNJ79Txq+O 7lvdtbaGe6KznxLCrmPdgjavhifjAEJhYdGSExC2sJNLsDfnJQwn8zUmBBmkFdDu+BcU vCIwkibINfLFnmc6EcSIosxErsBgrNZ9zkj949VNdaYvFq30sXRqL0aaD03W7NzaTkbh rUFg== X-Gm-Message-State: ANoB5pmVotqzUfacw4Kmyog+0fmbspWfQ45YlNlWJnXE9ZT7BXXELat2 1wGNXM5Qz/cqdxyiUJLoQ4+g1g== X-Google-Smtp-Source: AA0mqf6JHyQZtOScT0U0WzRIXrD0IhsOzZavTwW6J0YgxgljE08JhnVWu99NbID3+21Z07DpWKU6zg== X-Received: by 2002:a05:620a:1fa:b0:6ee:24d5:b8fc with SMTP id x26-20020a05620a01fa00b006ee24d5b8fcmr7716490qkn.336.1668809685814; Fri, 18 Nov 2022 14:14:45 -0800 (PST) Received: from ?IPV6:2601:681:8600:13d0::f0a? ([2601:681:8600:13d0::f0a]) by smtp.gmail.com with ESMTPSA id d5-20020ac85445000000b0039492d503cdsm2686766qtq.51.2022.11.18.14.14.44 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 18 Nov 2022 14:14:45 -0800 (PST) Message-ID: <966b3a77-5cda-c4d0-be4e-2083c502ba05@ventanamicro.com> Date: Fri, 18 Nov 2022 15:14:43 -0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.3.1 Subject: Re: [PATCH] RISC-V: Optimise adding a (larger than simm12) constant Content-Language: en-US To: Philipp Tomsich , Jeff Law Cc: gcc-patches@gcc.gnu.org, Christoph Muellner , Palmer Dabbelt , Vineet Gupta , Kito Cheng References: <20221109230718.3240479-1-philipp.tomsich@vrull.eu> <84ccfc10-df8c-5c77-913e-2819c7a14ee4@gmail.com> From: Jeff Law In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 11/18/22 14:26, Philipp Tomsich wrote: > On Fri, 18 Nov 2022 at 22:13, Jeff Law wrote: >> >> On 11/9/22 16:07, Philipp Tomsich wrote: >>> Handling the register-const_int addition has very quickly escalated to >>> creating a full sign-extended 32bit constant and performing a >>> register-register for RISC-V in GCC so far, resulting in sequences like >>> (for the case of "a + 2048"): >>> li a5,4096 >>> addi a5,a5,-2048 >>> add a0,a0,a5 >>> >>> By adding an expansion for add3, we can emit optimised RTL that >>> matches the capabilities of RISC-V better by adding support for the >>> following, previously unoptimised cases: >>> - addi + addi >>> addi a0,a0,2047 >>> addi a0,a0,1 >>> - li + sh[123]add (if Zba is enabled) >>> li a5,960 >>> sh3add a0,a5,a0 >>> >>> With this commit, we also fix up riscv_adjust_libcall_cfi_prologue() >>> and riscv_adjust_libcall_cfi_epilogue() to not use gen_add3_insn, as >>> the expander will otherwise wrap the resulting set-expression in an >>> insn (causing an ICE at dwarf2-time) when invoked with -msave-restore. >>> >>> This closes the gap to LLVM, which has already been emitting these >>> optimised sequences. >>> >>> Note that this benefits is perlbench (in SPEC CPU 2017), which needs >>> to add the constant 3840. >>> >>> gcc/ChangeLog: >>> >>> * config/riscv/bitmanip.md (*shNadd): Rename. >>> (riscv_shNadd): Expose as gen_riscv_shNadd{di/si}. >>> * config/riscv/predicates.md (const_arith_shifted123_operand): >>> New predicate (for constants that are a simm12, shifted by >>> 1, 2 or 3). >>> (const_arith_2simm12_operand): New predicate (that can be >>> expressed by adding 2 simm12 together). >>> (addi_operand): New predicate (an immedaite operand suitable >>> for the new add3 expansion). >>> * config/riscv/riscv.cc (riscv_adjust_libcall_cfi_prologue): >>> Don't use gen_add3_insn, where a RTX instead of an INSN is >>> required (otherwise this will break as soon as we have a >>> define_expand for add3). >>> (riscv_adjust_libcall_cfi_epilogue): Same. >>> * config/riscv/riscv.md (addsi3): Rename. >>> (riscv_addsi3): New name for addsi3. >>> (adddi3): Rename. >>> (riscv_adddi3): New name for adddi3. >>> (add3): New expander that handles the basic and fancy >>> (such as li+sh[123]add, addi+addi, ...) cases for adding >>> register-register and register-const_int. >>> >>> gcc/testsuite/ChangeLog: >>> >>> * gcc.target/riscv/addi.c: New test. >>> * gcc.target/riscv/zba-shNadd-06.c: New test. >>> >>> Signed-off-by: Philipp Tomsich >>> --- >>> >>> gcc/config/riscv/bitmanip.md | 2 +- >>> gcc/config/riscv/predicates.md | 28 +++++++++ >>> gcc/config/riscv/riscv.cc | 10 ++-- >>> gcc/config/riscv/riscv.md | 58 ++++++++++++++++++- >>> gcc/testsuite/gcc.target/riscv/addi.c | 39 +++++++++++++ >>> .../gcc.target/riscv/zba-shNadd-06.c | 11 ++++ >>> 6 files changed, 141 insertions(+), 7 deletions(-) >>> create mode 100644 gcc/testsuite/gcc.target/riscv/addi.c >>> create mode 100644 gcc/testsuite/gcc.target/riscv/zba-shNadd-06.c >>> >>> >>> >>> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md >>> index 171a0cdced6..289ff7470c6 100644 >>> --- a/gcc/config/riscv/riscv.md >>> +++ b/gcc/config/riscv/riscv.md >>> @@ -464,6 +464,60 @@ >>> [(set_attr "type" "arith") >>> (set_attr "mode" "DI")]) >>> >>> +(define_expand "add3" >>> + [(set (match_operand:GPR 0 "register_operand" "=r,r") >>> + (plus:GPR (match_operand:GPR 1 "register_operand" " r,r") >>> + (match_operand:GPR 2 "addi_operand" " r,I")))] >>> + "" >>> +{ >>> + if (arith_operand (operands[2], mode)) >>> + emit_insn (gen_riscv_add3 (operands[0], operands[1], operands[2])); >>> + else if (const_arith_2simm12_operand (operands[2], mode)) >>> + { >>> + /* Split into two immediates that add up to the desired value: >>> + * e.g., break up "a + 2445" into: >>> + * addi a0,a0,2047 >>> + * addi a0,a0,398 >>> + */ >> Nit. GNU comment style please. >> >> >>> + >>> + HOST_WIDE_INT val = INTVAL (operands[2]); >>> + HOST_WIDE_INT saturated = HOST_WIDE_INT_M1U << (IMM_BITS - 1); >>> + >>> + if (val >= 0) >>> + saturated = ~saturated; >>> + >>> + val -= saturated; >>> + >>> + rtx tmp = gen_reg_rtx (mode); >> Can't add3 be generated by LRA? If so, don't you have to guard >> against going into this path as we shouldn't be creating new pseudos at >> that point (I know LRA can create some internally, but I don't think it >> handles new ones showing up due to target expanders). >> >> >> Similarly for the shifted_123 case immediately following. >> >> >> If we do indeed have an issue here, I'm not sure how best to resolve. >> If the output operand does not overlap with the inputs, then we're >> golden and can just re-use it to form the constant. If not, then it's >> a bit tougher. I'm not keen to add a test of no_new_pseudos to the >> operand predicate, but I don't see a better option yet. > From a cursory glance, LRA does not try to go through gen_add3_insn, > but rather forms PLUS rtx. This matches my expectation, as (if we > can't create a pseudo) the original add3 definition would run > into the same problem (as an immediate that doesn't satisfy the > SMALL_OPERAND test will need to be placed in a register). I should have remembered that.  It vectors through emit_add3_insn which you and I have looked at it not terribly long ago.  We don't define addptr3, so it'll just form the expression directly rather than going through the expanders. So, OK. jeff