From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-vk1-xa31.google.com (mail-vk1-xa31.google.com [IPv6:2607:f8b0:4864:20::a31]) by sourceware.org (Postfix) with ESMTPS id BA1803858C39 for ; Fri, 4 Nov 2022 00:59:34 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org BA1803858C39 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-vk1-xa31.google.com with SMTP id h16so1636407vkn.4 for ; Thu, 03 Nov 2022 17:59:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=vKyl8Rc+6RBRH6rcpTAVDYX8wrTJhQTobExXPOqzo6Y=; b=huCUEj1+MADOBruDFUxzf2IZLkFVaq59kBxhlDe8+XjYh2qhtHDpqX7BYZ+H3HjYsH hGN5u/F8lVMxdZ5RbKUak5EHR4SpgFRD1OzMKeoXZWYzen8NQvDuaqfA48+Z11Vgi8wq hkGejIGxRaB/eIJIX2KvuXR6deelymjoIEE8qILJ7/xKbBQt2XUcYjRsyvSy62gczyyw AhIwGgfb0N+WWAkHhds2rArU/+IOlqTs4Pd20+CkG9rAbELXIRHN8Ls/JqZOx1I+F0WT iIPsbYfK8SVH8qaVQ04OhiWsaFQxT80IZLNwriGqVi0/GU4fGNZc9eb152OmfNiZTFhx mE2g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=vKyl8Rc+6RBRH6rcpTAVDYX8wrTJhQTobExXPOqzo6Y=; b=KKr63++4GxvqdSIuIWEwEiKMAOzj6i3/ylzcTb+scb40GDuu32I8WAnvHaIP+cwBYp QGXFGxKKwy2qQIMtTyNNY+R3NQt/YSQtruRURiE5QM+1Tm6hjCbU4hZu9mf9eqzBGfgj UDwEl0pDDZFPd/kP4CkFBfYmbGsCEJW3/RsIUaZMG6DVLLaFI3EK6C/dmlNrQmwb5rW2 x+hlytZB0Ao3ydzRTjji6b+U0SRHVly2tsnj+kUD5voV5udCQB3W9pn8+/K4ym4VUXqn +b9VJS6b4WE6Ij029XWxt8gieyLprsB/AuMCCiqHvl4EaA0z2ldS+h2y0fljCakll7kB c9nA== X-Gm-Message-State: ACrzQf1Jnr1MVTLUOaBfQK9Efui0T3rKX3N6mxhsYVdUA7gTxsOLssdZ Bw8ua31Sq0Ls5nGhTsWigGxp2CwQ6gEmIudWVV8ABZTMC1gqtA== X-Google-Smtp-Source: AMsMyM5QGvONI57KzR2Iiyru8JUe9y+yrJ+bvXXMGUReQeMN611XBfMnUdYPhBCWW138DT54eQiNJ+11nmrTVdQs2vc= X-Received: by 2002:a1f:f28a:0:b0:3b6:868d:5fbf with SMTP id q132-20020a1ff28a000000b003b6868d5fbfmr18833148vkh.5.1667523573803; Thu, 03 Nov 2022 17:59:33 -0700 (PDT) MIME-Version: 1.0 References: <20221104005331.775049-1-kevinl@rivosinc.com> In-Reply-To: <20221104005331.775049-1-kevinl@rivosinc.com> From: Kito Cheng Date: Thu, 3 Nov 2022 17:59:22 -0700 Message-ID: Subject: Re: [PATCH v3] RISC-V modified add3 for large stack frame optimization [PR105733] To: Kevin Lee Cc: gcc-patches@gcc.gnu.org, gnu-toolchain@rivosinc.com, Jeff Law Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-7.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: I would like to see some benchmark results instead of just a simple case, to make sure everything is alright, the add pattern is used literally anywhere, my most worry is the clobber might bring some negative impact like cause register pressure estimation get higher, and then result worse code gen. Dynamic instruction count is good enough and no cycle count or complete spec score. On Thu, Nov 3, 2022 at 5:57 PM Kevin Lee wrote: > > This is the identical patch with https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604814.html, but with the correct plaintext format. > > >The loop still seems a bit odd which may point to further improvements > >that could be made to this patch. Consider this fragment of the loop: > > Thank you for the review Jeff! I am currently looking into this issue > in a different patch. I'll come back with some improvement. > > gcc/ChangeLog: > Jim Wilson > Michael Collison > Kevin Lee > > * config/riscv/predicates.md (const_lui_operand): New Predicate. > (add_operand): Ditto. > (reg_or_const_int_operand): Ditto. > * config/riscv/riscv-protos.h (riscv_eliminable_reg): New > function. > * config/riscv/riscv-selftests.cc (calculate_x_in_sequence): > Consider Parallel insns. > * config/riscv/riscv.cc (riscv_eliminable_reg): New function. > (riscv_adjust_libcall_cfi_prologue): Use gen_rtx_SET and > gen_rtx_fmt_ee instead of gen_add3_insn. > (riscv_adjust_libcall_cfi_epilogue): Ditto. > * config/riscv/riscv.md (addsi3): Remove. > (add3): New instruction for large stack frame > optimization. > (add3_internal): Ditto. > (adddi3): Remove. > (add3_internal2): New instruction for insns generated in > the prologue and epilogue pass. > --- > gcc/config/riscv/predicates.md | 13 +++++ > gcc/config/riscv/riscv-protos.h | 1 + > gcc/config/riscv/riscv-selftests.cc | 3 ++ > gcc/config/riscv/riscv.cc | 20 +++++-- > gcc/config/riscv/riscv.md | 84 ++++++++++++++++++++++++----- > 5 files changed, 104 insertions(+), 17 deletions(-) > > diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md > index c2ff41bb0fd..3149f7227ac 100644 > --- a/gcc/config/riscv/predicates.md > +++ b/gcc/config/riscv/predicates.md > @@ -35,6 +35,14 @@ > (ior (match_operand 0 "arith_operand") > (match_operand 0 "lui_operand"))) > > +(define_predicate "const_lui_operand" > + (and (match_code "const_int") > + (match_test "(INTVAL (op) & 0xFFF) == 0 && INTVAL (op) != 0"))) > + > +(define_predicate "add_operand" > + (ior (match_operand 0 "arith_operand") > + (match_operand 0 "const_lui_operand"))) > + > (define_predicate "const_csr_operand" > (and (match_code "const_int") > (match_test "IN_RANGE (INTVAL (op), 0, 31)"))) > @@ -59,6 +67,11 @@ > (ior (match_operand 0 "const_0_operand") > (match_operand 0 "register_operand"))) > > +;; For use in adds, when adding to an eliminable register. > +(define_predicate "reg_or_const_int_operand" > + (ior (match_code "const_int") > + (match_operand 0 "register_operand"))) > + > ;; Only use branch-on-bit sequences when the mask is not an ANDI immediate. > (define_predicate "branch_on_bit_operand" > (and (match_code "const_int") > diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h > index 5a718bb62b4..9348ac71956 100644 > --- a/gcc/config/riscv/riscv-protos.h > +++ b/gcc/config/riscv/riscv-protos.h > @@ -63,6 +63,7 @@ extern void riscv_expand_conditional_move (rtx, rtx, rtx, rtx_code, rtx, rtx); > extern rtx riscv_legitimize_call_address (rtx); > extern void riscv_set_return_address (rtx, rtx); > extern bool riscv_expand_block_move (rtx, rtx, rtx); > +extern bool riscv_eliminable_reg (rtx); > extern rtx riscv_return_addr (int, rtx); > extern poly_int64 riscv_initial_elimination_offset (int, int); > extern void riscv_expand_prologue (void); > diff --git a/gcc/config/riscv/riscv-selftests.cc b/gcc/config/riscv/riscv-selftests.cc > index 636874ebc0f..50457db708e 100644 > --- a/gcc/config/riscv/riscv-selftests.cc > +++ b/gcc/config/riscv/riscv-selftests.cc > @@ -116,6 +116,9 @@ calculate_x_in_sequence (rtx reg) > rtx pat = PATTERN (insn); > rtx dest = SET_DEST (pat); > > + if (GET_CODE (pat) == PARALLEL) > + dest = SET_DEST (XVECEXP (pat, 0, 0)); > + > if (GET_CODE (pat) == CLOBBER) > continue; > > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc > index 32f9ef9ade9..de9344b37a3 100644 > --- a/gcc/config/riscv/riscv.cc > +++ b/gcc/config/riscv/riscv.cc > @@ -4686,6 +4686,16 @@ riscv_initial_elimination_offset (int from, int to) > return src - dest; > } > > +/* Return true if X is a register that will be eliminated later on. */ > +bool > +riscv_eliminable_reg (rtx x) > +{ > + return REG_P (x) && (REGNO (x) == FRAME_POINTER_REGNUM > + || REGNO (x) == ARG_POINTER_REGNUM > + || (REGNO (x) >= FIRST_VIRTUAL_REGISTER > + && REGNO (x) <= LAST_VIRTUAL_REGISTER)); > +} > + > /* Implement RETURN_ADDR_RTX. We do not support moving back to a > previous frame. */ > > @@ -4887,8 +4897,9 @@ riscv_adjust_libcall_cfi_prologue () > } > > /* Debug info for adjust sp. */ > - adjust_sp_rtx = gen_add3_insn (stack_pointer_rtx, > - stack_pointer_rtx, GEN_INT (-saved_size)); > + adjust_sp_rtx = gen_rtx_SET (stack_pointer_rtx, > + gen_rtx_fmt_ee (PLUS, GET_MODE (stack_pointer_rtx), > + stack_pointer_rtx, GEN_INT (saved_size))); > dwarf = alloc_reg_note (REG_CFA_ADJUST_CFA, adjust_sp_rtx, > dwarf); > return dwarf; > @@ -4990,8 +5001,9 @@ riscv_adjust_libcall_cfi_epilogue () > int saved_size = cfun->machine->frame.save_libcall_adjustment; > > /* Debug info for adjust sp. */ > - adjust_sp_rtx = gen_add3_insn (stack_pointer_rtx, > - stack_pointer_rtx, GEN_INT (saved_size)); > + adjust_sp_rtx = gen_rtx_SET (stack_pointer_rtx, > + gen_rtx_fmt_ee (PLUS, GET_MODE (stack_pointer_rtx), > + stack_pointer_rtx, GEN_INT (saved_size))); > dwarf = alloc_reg_note (REG_CFA_ADJUST_CFA, adjust_sp_rtx, > dwarf); > > diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md > index 798f7370a08..985dbdd50c4 100644 > --- a/gcc/config/riscv/riscv.md > +++ b/gcc/config/riscv/riscv.md > @@ -446,23 +446,80 @@ > [(set_attr "type" "fadd") > (set_attr "mode" "")]) > > -(define_insn "addsi3" > - [(set (match_operand:SI 0 "register_operand" "=r,r") > - (plus:SI (match_operand:SI 1 "register_operand" " r,r") > - (match_operand:SI 2 "arith_operand" " r,I")))] > +(define_expand "add3" > + [(parallel > + [(set (match_operand:GPR 0 "register_operand" "") > + (plus:GPR (match_operand:GPR 1 "register_operand" "") > + (match_operand:GPR 2 "add_operand" ""))) > + (clobber (match_scratch:GPR 3 ""))])] > "" > - "add%i2%~\t%0,%1,%2" > +{ > + if (riscv_eliminable_reg (operands[1])) > + { > + if (splittable_const_int_operand (operands[2], mode)) > + { > + /* The idea here is that we emit > + add op0, op1, %hi(op2) > + addi op0, op0, %lo(op2) > + Then when op1, the eliminable reg, gets replaced with sp+offset, > + we can simplify the constants. */ > + HOST_WIDE_INT high_part = CONST_HIGH_PART (INTVAL (operands[2])); > + emit_insn (gen_add3_internal (operands[0], operands[1], > + GEN_INT (high_part))); > + operands[1] = operands[0]; > + operands[2] = GEN_INT (INTVAL (operands[2]) - high_part); > + } > + else if (! const_arith_operand (operands[2], mode)) > + operands[2] = force_reg (mode, operands[2]); > + } > +}) > + > +(define_insn_and_split "add3_internal" > + [(set (match_operand:GPR 0 "register_operand" "=r,r,&r,!&r") > + (plus:GPR (match_operand:GPR 1 "register_operand" " %r,r,r,0") > + (match_operand:GPR 2 "add_operand" " r,I,L,L"))) > + (clobber (match_scratch:GPR 3 "=X,X,X,&r"))] > + "" > +{ > + if ((which_alternative == 2) || (which_alternative == 3)) > + return "#"; > + > + if (TARGET_64BIT && mode == SImode) > + return "add%i2w\t%0,%1,%2"; > + else > + return "add%i2\t%0,%1,%2"; > +} > + "&& reload_completed && const_lui_operand (operands[2], mode)" > + [(const_int 0)] > +{ > + if (REGNO (operands[0]) != REGNO (operands[1])) > + { > + emit_insn (gen_mov (operands[0], operands[2])); > + emit_insn (gen_add3_internal (operands[0], operands[0], operands[1])); > + } > + else > + { > + emit_insn (gen_mov (operands[3], operands[2])); > + emit_insn (gen_add3_internal (operands[0], operands[0], operands[3])); > + } > + DONE; > +} > [(set_attr "type" "arith") > - (set_attr "mode" "SI")]) > + (set_attr "mode" "")]) > > -(define_insn "adddi3" > - [(set (match_operand:DI 0 "register_operand" "=r,r") > - (plus:DI (match_operand:DI 1 "register_operand" " r,r") > - (match_operand:DI 2 "arith_operand" " r,I")))] > - "TARGET_64BIT" > - "add%i2\t%0,%1,%2" > +(define_insn "add3_internal2" > + [(set (match_operand:GPR 0 "register_operand" "=r,r") > + (plus:GPR (match_operand:GPR 1 "register_operand" " %r,r") > + (match_operand:GPR 2 "arith_operand" " r,I")))] > + "" > + { > + if (TARGET_64BIT && mode == SImode) > + return "add%i2w\t%0,%1,%2"; > + else > + return "add%i2\t%0,%1,%2"; > + } > [(set_attr "type" "arith") > - (set_attr "mode" "DI")]) > + (set_attr "mode" "")]) > > (define_expand "addv4" > [(set (match_operand:GPR 0 "register_operand" "=r,r") > @@ -508,6 +565,7 @@ > DONE; > }) > > + > (define_expand "uaddv4" > [(set (match_operand:GPR 0 "register_operand" "=r,r") > (plus:GPR (match_operand:GPR 1 "register_operand" " r,r") > -- > 2.25.1 >