From: Christoph Muellner <christoph.muellner@vrull.eu>
To: gcc-patches@gcc.gnu.org, Kito Cheng <kito.cheng@sifive.com>,
Jim Wilson <jim.wilson.gcc@gmail.com>,
Palmer Dabbelt <palmer@dabbelt.com>,
Andrew Waterman <andrew@sifive.com>,
Philipp Tomsich <philipp.tomsich@vrull.eu>,
Jeff Law <jeffreyalaw@gmail.com>,
Cooper Qu <cooper.qu@linux.alibaba.com>,
Lifang Xia <lifang_xia@linux.alibaba.com>,
Yunhai Shang <yunhai@linux.alibaba.com>,
Zhiwei Liu <zhiwei_liu@linux.alibaba.com>
Cc: "Christoph Müllner" <christoph.muellner@vrull.eu>
Subject: [PATCH v3 09/11] riscv: thead: Add support for the XTheadMemPair ISA extension
Date: Fri, 24 Feb 2023 06:51:25 +0100
Message-ID: <20230224055127.2500953-10-christoph.muellner@vrull.eu>
In-Reply-To: <20230224055127.2500953-1-christoph.muellner@vrull.eu>
From: Christoph Müllner <christoph.muellner@vrull.eu>
The XTheadMemPair ISA extension allows pairing two loads or stores:
* th.ldd (2x LD)
* th.lwd (2x LW)
* th.lwud (2x LWU)
* th.sdd (2x SD)
* th.swd (2x SW)
The displacement of these instructions is quite limited:
* Displacement := imm2 << shamt
* imm2 is a 2-bit unsigned value {0..3}
* shamt is 4 for th.ldd/th.sdd and 3 otherwise
But even with this small displacement we can identify many candidates.
The merging of the two loads/stores is realized in the form of peephole2
passes that support instruction reordering.
The CFA expansion (saving/restoring registers to/from the stack) is not
processed by the peephole2 pass and therefore needs special treatment.
Many ideas in this patch are inspired by similar approaches
in other backends.
gcc/ChangeLog:
* config/riscv/peephole.md: New peephole passes for
XTheadMemPair.
* config/riscv/riscv-protos.h (thead_mempair_operands_p):
New prototype.
(thead_mempair_order_operands): Likewise.
(thead_mempair_output_move): Likewise.
* config/riscv/riscv.cc (extract_base_offset_in_addr):
New function.
(riscv_split_plus): New function.
(thead_mempair_output_move): New function.
(thead_mempair_check_consecutive_mems): New function.
(thead_mempair_operand_p): New function.
(thead_mempair_load_overlap_p): New function.
(thead_mempair_operands_p): New function.
(thead_mempair_order_operands): New function.
(riscv_save_reg): Moved code.
(th_mempair_save_reg): New function.
(riscv_restore_reg): Moved code.
(th_mempair_restore_reg): New function.
(riscv_for_each_saved_reg): Add code to handle mempair
instructions.
* config/riscv/thead.md (*th_mempair_load_<GPR:mode>2):
New pattern.
(*th_mempair_store_<GPR:mode>2): Likewise.
(*th_mempair_load_extendsidi2): Likewise.
(*th_mempair_load_zero_extendsidi2): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadmempair-1.c: New test.
* gcc.target/riscv/xtheadmempair-2.c: New test.
* gcc.target/riscv/xtheadmempair-3.c: New test.
Changes in v3:
- Don't emit instructions during peephole2, but emit parallel INSNs
- Add proper checks for the INSN patterns to avoid ICEs or illegal
instructions reported by the assembler
- Don't insert any `add` instructions
- Rework the constraint handling
- Simplify the output function
- Restructure and simplify CFA processing
- Add debug notes to CFA instructions
- Emit parallel INSNs in the CFA code (same as peephole2)
- Drop tests that target reordering
- Drop tests that are irrelevant (e.g. unrolled loops)
- Add tests for all possible displacements and all instructions
- Add tests for CFA
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
---
gcc/config/riscv/peephole.md | 56 +++
gcc/config/riscv/riscv-protos.h | 6 +
gcc/config/riscv/riscv.cc | 450 ++++++++++++++++--
gcc/config/riscv/thead.md | 52 ++
.../gcc.target/riscv/xtheadmempair-1.c | 98 ++++
.../gcc.target/riscv/xtheadmempair-2.c | 84 ++++
.../gcc.target/riscv/xtheadmempair-3.c | 29 ++
7 files changed, 746 insertions(+), 29 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c
diff --git a/gcc/config/riscv/peephole.md b/gcc/config/riscv/peephole.md
index 0ef0c04410b..ea696f3116c 100644
--- a/gcc/config/riscv/peephole.md
+++ b/gcc/config/riscv/peephole.md
@@ -38,3 +38,59 @@ (define_peephole2
{
operands[5] = GEN_INT (INTVAL (operands[2]) - INTVAL (operands[5]));
})
+
+;; XTheadMemPair: merge two SI or DI loads
+(define_peephole2
+ [(set (match_operand:GPR 0 "register_operand" "")
+ (match_operand:GPR 1 "memory_operand" ""))
+ (set (match_operand:GPR 2 "register_operand" "")
+ (match_operand:GPR 3 "memory_operand" ""))]
+ "TARGET_XTHEADMEMPAIR
+ && thead_mempair_operands_p (operands, true, <GPR:MODE>mode)"
+ [(parallel [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2) (match_dup 3))])]
+{
+ thead_mempair_order_operands (operands, true, <GPR:MODE>mode);
+})
+
+;; XTheadMemPair: merge two SI or DI stores
+(define_peephole2
+ [(set (match_operand:GPR 0 "memory_operand" "")
+ (match_operand:GPR 1 "register_operand" ""))
+ (set (match_operand:GPR 2 "memory_operand" "")
+ (match_operand:GPR 3 "register_operand" ""))]
+ "TARGET_XTHEADMEMPAIR
+ && thead_mempair_operands_p (operands, false, <GPR:MODE>mode)"
+ [(parallel [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2) (match_dup 3))])]
+{
+ thead_mempair_order_operands (operands, false, <GPR:MODE>mode);
+})
+
+;; XTheadMemPair: merge two SI loads with sign-extension
+(define_peephole2
+ [(set (match_operand:DI 0 "register_operand" "")
+ (sign_extend:DI (match_operand:SI 1 "memory_operand" "")))
+ (set (match_operand:DI 2 "register_operand" "")
+ (sign_extend:DI (match_operand:SI 3 "memory_operand" "")))]
+ "TARGET_XTHEADMEMPAIR && TARGET_64BIT
+ && thead_mempair_operands_p (operands, true, SImode)"
+ [(parallel [(set (match_dup 0) (sign_extend:DI (match_dup 1)))
+ (set (match_dup 2) (sign_extend:DI (match_dup 3)))])]
+{
+ thead_mempair_order_operands (operands, true, SImode);
+})
+
+;; XTheadMemPair: merge two SI loads with zero-extension
+(define_peephole2
+ [(set (match_operand:DI 0 "register_operand" "")
+ (zero_extend:DI (match_operand:SI 1 "memory_operand" "")))
+ (set (match_operand:DI 2 "register_operand" "")
+ (zero_extend:DI (match_operand:SI 3 "memory_operand" "")))]
+ "TARGET_XTHEADMEMPAIR && TARGET_64BIT
+ && thead_mempair_operands_p (operands, true, SImode)"
+ [(parallel [(set (match_dup 0) (zero_extend:DI (match_dup 1)))
+ (set (match_dup 2) (zero_extend:DI (match_dup 3)))])]
+{
+ thead_mempair_order_operands (operands, true, SImode);
+})
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5cf4fafd662..1b7ba02726d 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -54,7 +54,13 @@ extern bool riscv_split_64bit_move_p (rtx, rtx);
extern void riscv_split_doubleword_move (rtx, rtx);
extern const char *riscv_output_move (rtx, rtx);
extern const char *riscv_output_return ();
+
+extern bool thead_mempair_operands_p (rtx[4], bool, machine_mode);
+extern void thead_mempair_order_operands (rtx[4], bool, machine_mode);
+
#ifdef RTX_CODE
+extern const char *thead_mempair_output_move (rtx[4], bool, machine_mode, RTX_CODE);
+
extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx);
extern void riscv_expand_float_scc (rtx, enum rtx_code, rtx, rtx);
extern void riscv_expand_conditional_branch (rtx, enum rtx_code, rtx, rtx);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 48f2cb399ae..33854393bd2 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2943,6 +2943,332 @@ riscv_output_move (rtx dest, rtx src)
gcc_unreachable ();
}
+/* If MEM is in the form of "base+offset", extract the two parts
+   of the address and store them in BASE and OFFSET; otherwise
+   return false after clearing BASE and OFFSET.  */
+
+static bool
+extract_base_offset_in_addr (rtx mem, rtx *base, rtx *offset)
+{
+ rtx addr;
+
+ gcc_assert (MEM_P (mem));
+
+ addr = XEXP (mem, 0);
+
+ if (REG_P (addr))
+ {
+ *base = addr;
+ *offset = const0_rtx;
+ return true;
+ }
+
+ if (GET_CODE (addr) == PLUS
+ && REG_P (XEXP (addr, 0)) && CONST_INT_P (XEXP (addr, 1)))
+ {
+ *base = XEXP (addr, 0);
+ *offset = XEXP (addr, 1);
+ return true;
+ }
+
+ *base = NULL_RTX;
+ *offset = NULL_RTX;
+
+ return false;
+}
+
+/* If X is a PLUS of a CONST_INT, return the two terms in *BASE_PTR
+ and *OFFSET_PTR. Return X in *BASE_PTR and 0 in *OFFSET_PTR otherwise. */
+
+static void
+riscv_split_plus (rtx x, rtx *base_ptr, HOST_WIDE_INT *offset_ptr)
+{
+ if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
+ {
+ *base_ptr = XEXP (x, 0);
+ *offset_ptr = INTVAL (XEXP (x, 1));
+ }
+ else
+ {
+ *base_ptr = x;
+ *offset_ptr = 0;
+ }
+}
+
+/* Output a mempair instruction with the provided OPERANDS.
+   LOAD_P is true if we have a pair of loads (stores otherwise).
+   MODE is the access mode (DI or SI).
+   CODE is the extension code (UNKNOWN, SIGN_EXTEND or ZERO_EXTEND).
+   This function does not handle invalid inputs gracefully, but is
+   full of assertions to ensure that only valid instructions are
+   emitted.  */
+
+const char *
+thead_mempair_output_move (rtx operands[4], bool load_p,
+ machine_mode mode, RTX_CODE code)
+{
+ rtx reg1, reg2, mem1, mem2, base1, base2;
+ HOST_WIDE_INT offset1, offset2;
+ rtx output_operands[5];
+ const char* format;
+
+ gcc_assert (mode == SImode || mode == DImode);
+
+ /* Paired 64-bit access instructions have a fixed shift amount of 4.
+ Paired 32-bit access instructions have a fixed shift amount of 3. */
+ unsigned shamt = (mode == DImode) ? 4 : 3;
+
+ if (load_p)
+ {
+ reg1 = copy_rtx (operands[0]);
+ reg2 = copy_rtx (operands[2]);
+ mem1 = copy_rtx (operands[1]);
+ mem2 = copy_rtx (operands[3]);
+
+ if (mode == SImode)
+ if (code == ZERO_EXTEND)
+ format = "th.lwud\t%0, %1, (%2), %3, %4";
+	else /* SIGN_EXTEND or UNKNOWN.  */
+ format = "th.lwd\t%0, %1, (%2), %3, %4";
+ else
+ format = "th.ldd\t%0, %1, (%2), %3, %4";
+ }
+ else
+ {
+ reg1 = copy_rtx (operands[1]);
+ reg2 = copy_rtx (operands[3]);
+ mem1 = copy_rtx (operands[0]);
+ mem2 = copy_rtx (operands[2]);
+
+ if (mode == SImode)
+ format = "th.swd\t%z0, %z1, (%2), %3, %4";
+ else
+ format = "th.sdd\t%z0, %z1, (%2), %3, %4";
+ }
+
+ riscv_split_plus (XEXP (mem1, 0), &base1, &offset1);
+ riscv_split_plus (XEXP (mem2, 0), &base2, &offset2);
+ gcc_assert (rtx_equal_p (base1, base2));
+ auto size1 = MEM_SIZE (mem1);
+ auto size2 = MEM_SIZE (mem2);
+ gcc_assert (known_eq (size1, size2));
+ gcc_assert (known_eq (offset1 + size1, offset2));
+
+ HOST_WIDE_INT imm2 = offset1 >> shamt;
+
+ /* Make sure all mempair instruction constraints are met. */
+ gcc_assert (imm2 >= 0 && imm2 < 4);
+ gcc_assert ((imm2 << shamt) == offset1);
+ gcc_assert (REG_P (reg1));
+ gcc_assert (REG_P (reg2));
+ gcc_assert (REG_P (base1));
+ if (load_p)
+ {
+ gcc_assert (REGNO (reg1) != REGNO (reg2));
+ gcc_assert (REGNO (reg1) != REGNO (base1));
+ gcc_assert (REGNO (reg2) != REGNO (base1));
+ }
+
+ /* Output the mempair instruction. */
+ output_operands[0] = copy_rtx (reg1);
+ output_operands[1] = copy_rtx (reg2);
+ output_operands[2] = copy_rtx (base1);
+  output_operands[3] = GEN_INT (imm2);
+  output_operands[4] = GEN_INT (shamt);
+ output_asm_insn (format, output_operands);
+
+ return "";
+}
+
+/* Analyze if a pair of loads/stores MEM1 and MEM2 with given MODE
+   are consecutive so they can be merged into a mempair instruction.
+   REVERSED will be set to true if a reversal of the accesses is
+   required (false otherwise).  Returns true if the accesses can be
+   merged (even if reversing is necessary) and false if not.  */
+
+static bool
+thead_mempair_check_consecutive_mems (machine_mode mode, rtx *mem1, rtx *mem2,
+ bool *reversed)
+{
+ rtx base1, base2, offset1, offset2;
+ extract_base_offset_in_addr (*mem1, &base1, &offset1);
+ extract_base_offset_in_addr (*mem2, &base2, &offset2);
+
+ /* Make sure both mems are in base+offset form. */
+ if (!base1 || !base2)
+ return false;
+
+ /* If both mems use the same base register, just check the offsets. */
+ if (rtx_equal_p (base1, base2))
+ {
+ auto size = GET_MODE_SIZE (mode);
+
+ if (known_eq (UINTVAL (offset1) + size, UINTVAL (offset2)))
+ {
+ *reversed = false;
+ return true;
+ }
+
+ if (known_eq (UINTVAL (offset2) + size, UINTVAL (offset1)))
+ {
+ *reversed = true;
+ return true;
+ }
+
+ return false;
+ }
+
+ return false;
+}
+
+/* Check if the given MEM can be used to define the address of a mempair
+ instruction. */
+
+static bool
+thead_mempair_operand_p (rtx mem, machine_mode mode)
+{
+ if (!MEM_SIZE_KNOWN_P (mem))
+ return false;
+
+ /* Only DI or SI mempair instructions exist. */
+ auto mem_sz = MEM_SIZE (mem);
+ auto mode_sz = GET_MODE_SIZE (mode);
+ if (!known_eq (mem_sz, mode_sz))
+ return false;
+
+ /* Paired 64-bit access instructions have a fixed shift amount of 4.
+ Paired 32-bit access instructions have a fixed shift amount of 3. */
+ machine_mode mem_mode = GET_MODE (mem);
+ unsigned shamt = (mem_mode == DImode) ? 4 : 3;
+
+ rtx base;
+ HOST_WIDE_INT offset;
+ riscv_split_plus (XEXP (mem, 0), &base, &offset);
+ HOST_WIDE_INT imm2 = offset >> shamt;
+
+ if (imm2 < 0 || imm2 >= 4)
+ return false;
+
+ if ((imm2 << shamt) != offset)
+ return false;
+
+ return true;
+}
+
+static bool
+thead_mempair_load_overlap_p (rtx reg1, rtx reg2, rtx mem)
+{
+ if (REGNO (reg1) == REGNO (reg2))
+ return true;
+
+ if (reg_overlap_mentioned_p (reg1, mem))
+ return true;
+
+ rtx base;
+ HOST_WIDE_INT offset;
+ riscv_split_plus (XEXP (mem, 0), &base, &offset);
+
+ if (!REG_P (base))
+ return true;
+
+ if (REG_P (base))
+ {
+ if (REGNO (base) == REGNO (reg1)
+ || REGNO (base) == REGNO (reg2))
+ return true;
+ }
+
+ return false;
+}
+
+/* Given OPERANDS of a consecutive load/store pair, check if we can
+   merge them into a load-pair or store-pair instruction.
+   LOAD_P is true if they are load instructions.
+   MODE is the mode of the memory operation.  */
+
+bool
+thead_mempair_operands_p (rtx operands[4], bool load_p,
+ machine_mode mode)
+{
+ rtx mem_1, mem_2, reg_1, reg_2;
+
+ if (load_p)
+ {
+ reg_1 = operands[0];
+ mem_1 = operands[1];
+ reg_2 = operands[2];
+ mem_2 = operands[3];
+ if (!REG_P (reg_1) || !REG_P (reg_2))
+ return false;
+ if (thead_mempair_load_overlap_p (reg_1, reg_2, mem_1))
+ return false;
+ if (thead_mempair_load_overlap_p (reg_1, reg_2, mem_2))
+ return false;
+ }
+ else
+ {
+ mem_1 = operands[0];
+ reg_1 = operands[1];
+ mem_2 = operands[2];
+ reg_2 = operands[3];
+ }
+
+ /* Check if the registers are GP registers. */
+ if (!REG_P (reg_1) || !GP_REG_P (REGNO (reg_1))
+ || !REG_P (reg_2) || !GP_REG_P (REGNO (reg_2)))
+ return false;
+
+ /* The mems cannot be volatile. */
+ if (!MEM_P (mem_1) || !MEM_P (mem_2))
+ return false;
+ if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2))
+ return false;
+
+ /* If we have slow unaligned access, we only accept aligned memory. */
+ if (riscv_slow_unaligned_access_p
+ && known_lt (MEM_ALIGN (mem_1), GET_MODE_SIZE (mode) * BITS_PER_UNIT))
+ return false;
+
+ /* Check if the addresses are in the form of [base+offset]. */
+ bool reversed = false;
+ if (!thead_mempair_check_consecutive_mems (mode, &mem_1, &mem_2, &reversed))
+ return false;
+
+  /* The first memory access must be a valid mempair operand.  */
+ if ((!reversed && !thead_mempair_operand_p (mem_1, mode))
+ || (reversed && !thead_mempair_operand_p (mem_2, mode)))
+ return false;
+
+ /* The operands must be of the same size. */
+ gcc_assert (known_eq (GET_MODE_SIZE (GET_MODE (mem_1)),
+ GET_MODE_SIZE (GET_MODE (mem_2))));
+
+ return true;
+}
+
+/* Given OPERANDS of a consecutive load/store pair that can be
+   merged, swap the operands so that the memory accesses are in
+   ascending order.  */
+void
+thead_mempair_order_operands (rtx operands[4], bool load_p, machine_mode mode)
+{
+ int mem_op = load_p ? 1 : 0;
+ bool reversed = false;
+ if (!thead_mempair_check_consecutive_mems (mode,
+ operands + mem_op,
+ operands + mem_op + 2,
+ &reversed))
+ gcc_unreachable ();
+
+ if (reversed)
+ {
+ /* Irrespective of whether this is a load or a store,
+ we do the same swap. */
+ std::swap (operands[0], operands[2]);
+ std::swap (operands[1], operands[3]);
+ }
+}
+
const char *
riscv_output_return ()
{
@@ -4963,6 +5289,61 @@ riscv_set_return_address (rtx address, rtx scratch)
riscv_emit_move (gen_frame_mem (GET_MODE (address), slot_address), address);
}
+/* Save register REG to MEM. Make the instruction frame-related. */
+
+static void
+riscv_save_reg (rtx reg, rtx mem)
+{
+ riscv_emit_move (mem, reg);
+ riscv_set_frame_expr (riscv_frame_set (mem, reg));
+}
+
+/* Save a register pair to the stack.  Make the insn frame-related.  */
+
+static void
+th_mempair_save_reg (rtx operands[4])
+{
+ rtx set1 = gen_rtx_SET (operands[0], operands[1]);
+ rtx set2 = gen_rtx_SET (operands[2], operands[3]);
+ rtx insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set1, set2)));
+ RTX_FRAME_RELATED_P (insn) = 1;
+ add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set1));
+ add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set2));
+}
+
+/* Restore register REG from MEM. */
+
+static void
+riscv_restore_reg (rtx reg, rtx mem)
+{
+ rtx insn = riscv_emit_move (reg, mem);
+ rtx dwarf = NULL_RTX;
+ dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
+
+ if (epilogue_cfa_sp_offset && REGNO (reg) == HARD_FRAME_POINTER_REGNUM)
+ {
+ rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
+ GEN_INT (epilogue_cfa_sp_offset));
+ dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
+ }
+
+ REG_NOTES (insn) = dwarf;
+ RTX_FRAME_RELATED_P (insn) = 1;
+}
+
+/* Restore a register pair from the stack.  Make the insn frame-related.  */
+
+static void
+th_mempair_restore_reg (rtx operands[4])
+{
+ rtx set1 = gen_rtx_SET (operands[0], operands[1]);
+ rtx set2 = gen_rtx_SET (operands[2], operands[3]);
+ rtx insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, set1, set2)));
+ RTX_FRAME_RELATED_P (insn) = 1;
+ add_reg_note (insn, REG_CFA_RESTORE, operands[0]);
+ add_reg_note (insn, REG_CFA_RESTORE, operands[2]);
+}
+
/* A function to save or store a register. The first argument is the
register and the second is the stack slot. */
typedef void (*riscv_save_restore_fn) (rtx, rtx);
@@ -5057,6 +5438,46 @@ riscv_for_each_saved_reg (poly_int64 sp_offset, riscv_save_restore_fn fn,
&& riscv_is_eh_return_data_register (regno))
continue;
+ if (TARGET_XTHEADMEMPAIR)
+ {
+ /* Get the next reg/offset pair. */
+ HOST_WIDE_INT offset2 = offset;
+ unsigned int regno2 = riscv_next_saved_reg (regno, limit, &offset2);
+
+ /* Prepare the operands for a mempair instruction. */
+ bool load_p = (fn == riscv_restore_reg);
+ int reg_op = load_p ? 0 : 1;
+ int mem_op = load_p ? 1 : 0;
+ rtx operands[4];
+ operands[reg_op] = gen_rtx_REG (word_mode, regno);
+ rtx mem1 = plus_constant (word_mode, stack_pointer_rtx, offset);
+ mem1 = gen_frame_mem (word_mode, mem1);
+ operands[mem_op] = mem1;
+ operands[2 + reg_op] = gen_rtx_REG (word_mode, regno2);
+ rtx mem2 = plus_constant (word_mode, stack_pointer_rtx, offset2);
+ mem2 = gen_frame_mem (word_mode, mem2);
+ operands[2 + mem_op] = mem2;
+
+ /* Validate everything before emitting a mempair instruction. */
+ if (regno2 != INVALID_REGNUM
+ && !cfun->machine->reg_is_wrapped_separately[regno2]
+ && !(epilogue && !maybe_eh_return
+ && riscv_is_eh_return_data_register (regno2))
+ && thead_mempair_operands_p (operands, load_p, word_mode))
+ {
+ thead_mempair_order_operands (operands, load_p, word_mode);
+
+ if (load_p)
+ th_mempair_restore_reg (operands);
+ else
+ th_mempair_save_reg (operands);
+
+ offset = offset2;
+ regno = regno2;
+ continue;
+ }
+ }
+
riscv_save_restore_reg (word_mode, regno, offset, fn);
}
@@ -5075,35 +5496,6 @@ riscv_for_each_saved_reg (poly_int64 sp_offset, riscv_save_restore_fn fn,
}
}
-/* Save register REG to MEM. Make the instruction frame-related. */
-
-static void
-riscv_save_reg (rtx reg, rtx mem)
-{
- riscv_emit_move (mem, reg);
- riscv_set_frame_expr (riscv_frame_set (mem, reg));
-}
-
-/* Restore register REG from MEM. */
-
-static void
-riscv_restore_reg (rtx reg, rtx mem)
-{
- rtx insn = riscv_emit_move (reg, mem);
- rtx dwarf = NULL_RTX;
- dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
-
- if (epilogue_cfa_sp_offset && REGNO (reg) == HARD_FRAME_POINTER_REGNUM)
- {
- rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
- GEN_INT (epilogue_cfa_sp_offset));
- dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
- }
-
- REG_NOTES (insn) = dwarf;
- RTX_FRAME_RELATED_P (insn) = 1;
-}
-
/* For stack frames that can't be allocated with a single ADDI instruction,
compute the best value to initially allocate. It must at a minimum
allocate enough space to spill the callee-saved registers. If TARGET_RVC,
diff --git a/gcc/config/riscv/thead.md b/gcc/config/riscv/thead.md
index 6ec3613071f..cbfe51cbd85 100644
--- a/gcc/config/riscv/thead.md
+++ b/gcc/config/riscv/thead.md
@@ -297,3 +297,55 @@ (define_insn "*th_sextw_msubhisi4"
[(set_attr "type" "imul")
(set_attr "mode" "SI")]
)
+
+;; XTheadMemPair
+
+;; MEMPAIR load 64/32 bit
+(define_insn "*th_mempair_load_<GPR:mode>2"
+ [(set (match_operand:GPR 0 "register_operand" "=r")
+ (match_operand:GPR 1 "memory_operand" "m"))
+ (set (match_operand:GPR 2 "register_operand" "=r")
+ (match_operand:GPR 3 "memory_operand" "m"))]
+ "TARGET_XTHEADMEMPAIR && reload_completed
+ && thead_mempair_operands_p (operands, true, <GPR:MODE>mode)"
+ { return thead_mempair_output_move (operands, true, <GPR:MODE>mode, UNKNOWN); }
+ [(set_attr "move_type" "load")
+ (set_attr "mode" "<GPR:MODE>")])
+
+;; MEMPAIR store 64/32 bit
+(define_insn "*th_mempair_store_<GPR:mode>2"
+ [(set (match_operand:GPR 0 "memory_operand" "=m")
+ (match_operand:GPR 1 "register_operand" "r"))
+ (set (match_operand:GPR 2 "memory_operand" "=m")
+ (match_operand:GPR 3 "register_operand" "r"))]
+ "TARGET_XTHEADMEMPAIR && reload_completed
+ && thead_mempair_operands_p (operands, false, <GPR:MODE>mode)"
+ { return thead_mempair_output_move (operands, false, <GPR:MODE>mode, UNKNOWN); }
+ [(set_attr "move_type" "store")
+ (set_attr "mode" "<GPR:MODE>")])
+
+;; MEMPAIR load DI extended signed SI
+(define_insn "*th_mempair_load_extendsidi2"
+ [(set (match_operand 0 "register_operand" "=r")
+ (sign_extend:DI (match_operand 1 "memory_operand" "m")))
+ (set (match_operand 2 "register_operand" "=r")
+ (sign_extend:DI (match_operand 3 "memory_operand" "m")))]
+ "TARGET_XTHEADMEMPAIR && TARGET_64BIT && reload_completed
+ && thead_mempair_operands_p (operands, true, SImode)"
+ { return thead_mempair_output_move (operands, true, SImode, SIGN_EXTEND); }
+ [(set_attr "move_type" "load")
+ (set_attr "mode" "DI")
+ (set_attr "length" "8")])
+
+;; MEMPAIR load DI extended unsigned SI
+(define_insn "*th_mempair_load_zero_extendsidi2"
+ [(set (match_operand 0 "register_operand" "=r")
+ (zero_extend:DI (match_operand 1 "memory_operand" "m")))
+ (set (match_operand 2 "register_operand" "=r")
+ (zero_extend:DI (match_operand 3 "memory_operand" "m")))]
+ "TARGET_XTHEADMEMPAIR && TARGET_64BIT && reload_completed
+ && thead_mempair_operands_p (operands, true, SImode)"
+ { return thead_mempair_output_move (operands, true, SImode, ZERO_EXTEND); }
+ [(set_attr "move_type" "load")
+ (set_attr "mode" "DI")
+ (set_attr "length" "8")])
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadmempair-1.c b/gcc/testsuite/gcc.target/riscv/xtheadmempair-1.c
new file mode 100644
index 00000000000..c324555fd35
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/xtheadmempair-1.c
@@ -0,0 +1,98 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-g" "-Oz" "-Os" "-flto" } } */
+/* { dg-options "-march=rv64gc_xtheadmempair -mtune=thead-c906" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_xtheadmempair -mtune=thead-c906" { target { rv32 } } } */
+
+#include <inttypes.h>
+
+#if __riscv_xlen == 32
+typedef uint32_t xlen_t;
+#else
+typedef uint64_t xlen_t;
+#endif
+
+void foof (xlen_t*, xlen_t, xlen_t);
+void foor (xlen_t*, xlen_t, xlen_t);
+void foowu (uint32_t*, uint64_t, uint64_t);
+void foows (int32_t*, int64_t, int64_t);
+
+#define LxD_TEST(f, T, i1, i2) \
+void \
+f ## i1 ## i2(T *arr) \
+{ \
+ foo ## f(arr, arr[i1], arr[i2]); \
+}
+
+// works
+LxD_TEST(f, xlen_t, 0, 1)
+// does not work (can't merge with unaligned offset)
+LxD_TEST(f, xlen_t, 1, 2)
+// works
+LxD_TEST(f, xlen_t, 2, 3)
+// does not work (can't merge with unaligned offset)
+LxD_TEST(f, xlen_t, 3, 4)
+// works
+LxD_TEST(f, xlen_t, 4, 5)
+// does not work (can't merge with unaligned offset)
+LxD_TEST(f, xlen_t, 5, 6)
+// works
+LxD_TEST(f, xlen_t, 6, 7)
+// does not work (can't merge with unaligned offset)
+LxD_TEST(f, xlen_t, 7, 8)
+// does not work (out of range)
+LxD_TEST(f, xlen_t, 8, 9)
+
+// works with reordering
+LxD_TEST(r, xlen_t, 1, 0)
+// does not work (can't merge with unaligned offset)
+LxD_TEST(r, xlen_t, 2, 1)
+// works with reordering
+LxD_TEST(r, xlen_t, 3, 2)
+// does not work (can't merge with unaligned offset)
+LxD_TEST(r, xlen_t, 4, 3)
+// works with reordering
+LxD_TEST(r, xlen_t, 5, 4)
+// does not work (can't merge with unaligned offset)
+LxD_TEST(r, xlen_t, 6, 5)
+// works with reordering
+LxD_TEST(r, xlen_t, 7, 6)
+// does not work (can't merge with unaligned offset)
+LxD_TEST(r, xlen_t, 8, 7)
+// does not work (out of range)
+LxD_TEST(r, xlen_t, 9, 8)
+
+#if __riscv_xlen != 32
+// works
+LxD_TEST(wu, uint32_t, 0, 1)
+LxD_TEST(ws, int32_t, 0, 1)
+// does not work (can't merge with unaligned offset)
+LxD_TEST(wu, uint32_t, 1, 2)
+LxD_TEST(ws, int32_t, 1, 2)
+// works
+LxD_TEST(wu, uint32_t, 2, 3)
+LxD_TEST(ws, int32_t, 2, 3)
+// does not work (can't merge with unaligned offset)
+LxD_TEST(wu, uint32_t, 3, 4)
+LxD_TEST(ws, int32_t, 3, 4)
+// works
+LxD_TEST(wu, uint32_t, 4, 5)
+LxD_TEST(ws, int32_t, 4, 5)
+// does not work (can't merge with unaligned offset)
+LxD_TEST(wu, uint32_t, 5, 6)
+LxD_TEST(ws, int32_t, 5, 6)
+// works
+LxD_TEST(wu, uint32_t, 6, 7)
+LxD_TEST(ws, int32_t, 6, 7)
+// does not work (can't merge with unaligned offset)
+LxD_TEST(wu, uint32_t, 7, 8)
+LxD_TEST(ws, int32_t, 7, 8)
+// does not work (out of range)
+LxD_TEST(wu, uint32_t, 8, 9)
+LxD_TEST(ws, int32_t, 8, 9)
+#endif
+
+/* { dg-final { scan-assembler-times "th.ldd\t" 8 { target { rv64 } } } } */
+/* { dg-final { scan-assembler-times "th.luwd\t" 4 { target { rv64 } } } } */
+/* { dg-final { scan-assembler-times "th.lwd\t" 4 { target { rv64 } } } } */
+
+/* { dg-final { scan-assembler-times "th.lwd\t" 8 { target { rv32 } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadmempair-2.c b/gcc/testsuite/gcc.target/riscv/xtheadmempair-2.c
new file mode 100644
index 00000000000..ed639709196
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/xtheadmempair-2.c
@@ -0,0 +1,84 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-g" "-Oz" "-Os" "-flto" } } */
+/* { dg-options "-march=rv64gc_xtheadmempair -mtune=thead-c906" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_xtheadmempair -mtune=thead-c906" { target { rv32 } } } */
+
+#include <inttypes.h>
+
+#if __riscv_xlen == 32
+typedef uint32_t xlen_t;
+#else
+typedef uint64_t xlen_t;
+#endif
+
+#define SxD_TEST(f, T, i1, i2) \
+void \
+f ## i1 ## i2(T *arr, T x, T y) \
+{ \
+ arr[i1] = x; \
+ arr[i2] = y; \
+}
+
+// works
+SxD_TEST(f, xlen_t, 0, 1)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(f, xlen_t, 1, 2)
+// works
+SxD_TEST(f, xlen_t, 2, 3)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(f, xlen_t, 3, 4)
+// works
+SxD_TEST(f, xlen_t, 4, 5)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(f, xlen_t, 5, 6)
+// works
+SxD_TEST(f, xlen_t, 6, 7)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(f, xlen_t, 7, 8)
+// does not work (out of range)
+SxD_TEST(f, xlen_t, 8, 9)
+
+// works with reordering
+SxD_TEST(r, xlen_t, 1, 0)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(r, xlen_t, 2, 1)
+// works with reordering
+SxD_TEST(r, xlen_t, 3, 2)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(r, xlen_t, 4, 3)
+// works with reordering
+SxD_TEST(r, xlen_t, 5, 4)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(r, xlen_t, 6, 5)
+// works with reordering
+SxD_TEST(r, xlen_t, 7, 6)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(r, xlen_t, 8, 7)
+// does not work (out of range)
+SxD_TEST(r, xlen_t, 9, 8)
+
+#if __riscv_xlen != 32
+// works
+SxD_TEST(w, uint32_t, 0, 1)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(w, uint32_t, 1, 2)
+// works
+SxD_TEST(w, uint32_t, 2, 3)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(w, uint32_t, 3, 4)
+// works
+SxD_TEST(w, uint32_t, 4, 5)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(w, uint32_t, 5, 6)
+// works
+SxD_TEST(w, uint32_t, 6, 7)
+// does not work (can't merge with unaligned offset)
+SxD_TEST(w, uint32_t, 7, 8)
+// does not work (out of range)
+SxD_TEST(w, uint32_t, 8, 9)
+#endif
+
+/* { dg-final { scan-assembler-times "th.sdd\t" 8 { target { rv64 } } } } */
+/* { dg-final { scan-assembler-times "th.swd\t" 4 { target { rv64 } } } } */
+
+/* { dg-final { scan-assembler-times "th.swd\t" 8 { target { rv32 } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c b/gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c
new file mode 100644
index 00000000000..5dec702819a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/xtheadmempair-3.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-g" "-Oz" "-Os" "-flto" } } */
+/* { dg-options "-march=rv64gc_xtheadmempair -mtune=thead-c906" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc_xtheadmempair -mtune=thead-c906" { target { rv32 } } } */
+
+#include <inttypes.h>
+
+#if __riscv_xlen == 32
+typedef uint32_t xlen_t;
+#else
+typedef uint64_t xlen_t;
+#endif
+
+void foo (xlen_t, xlen_t, xlen_t, xlen_t, xlen_t, xlen_t, xlen_t, xlen_t);
+void bar (xlen_t, xlen_t, xlen_t, xlen_t, xlen_t, xlen_t, xlen_t, xlen_t);
+
+void baz (xlen_t a, xlen_t b, xlen_t c, xlen_t d, xlen_t e, xlen_t f, xlen_t g, xlen_t h)
+{
+ foo (a, b, c, d, e, f, g, h);
+ /* RV64: We don't use 0(sp), therefore we can only get 3 mempairs. */
+ /* RV32: We don't use 0(sp)-8(sp), therefore we can only get 2 mempairs. */
+ bar (a, b, c, d, e, f, g, h);
+}
+
+/* { dg-final { scan-assembler-times "th.ldd\t" 3 { target { rv64 } } } } */
+/* { dg-final { scan-assembler-times "th.sdd\t" 3 { target { rv64 } } } } */
+
+/* { dg-final { scan-assembler-times "th.lwd\t" 2 { target { rv32 } } } } */
+/* { dg-final { scan-assembler-times "th.swd\t" 2 { target { rv32 } } } } */
--
2.39.2