public inbox for gcc-patches@gcc.gnu.org
* [PATCH 4/5] RISC-V: Add Zcmp extension supports.
@ 2023-04-25 10:11 Fei Gao
From: Fei Gao @ 2023-04-25 10:11 UTC
  To: jiawei; +Cc: gcc-patches



Hi Jiawei,

Please ignore my previous reply. I accidentally sent the email before I finished it.
Sorry about that!

I downloaded your patch series and found that in some cases
it fails to generate Zcmp push and pop insns.

Test case:

char my_getchar();
int test_s0()
{

        int a = my_getchar();
        int b = my_getchar();
        return a+b;
}

cc1 -fno-shrink-wrap-separate -O2 -march=rv32e_zca_zcmp -mabi=ilp32e  -mcmodel=medlow test.c

-fno-shrink-wrap-separate is used here to avoid interference from shrink-wrap-separate, which is
enabled by default at -O2.

As I'm also interested in Zc*, I made some changes, mainly in the prologue and epilogue pass, quite similar to
what has been done for save and restore. The exception is the CFI directives, which differ because Zcmp
pushes and pops ra and the s-registers in the reverse order from save and restore.
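
To illustrate the ordering difference, here is a minimal C sketch. It assumes the cm.push layout from the Zcmp specification (highest-numbered s-register at the highest address, ra at the lowest) and the save/restore libcall layout (ra at the highest address). The helper names are hypothetical and are not part of the patch.

```c
/* Offsets are relative to the incoming sp (the CFA), so they are negative.
   IDX selects the register: 0 = ra, 1 = s0, 2 = s1, ...
   WORD is the register size in bytes (4 on rv32, 8 on rv64).  */

/* Save/restore libcall layout: ra occupies the highest slot.  */
long save_restore_slot (unsigned idx, unsigned word)
{
  return -(long) ((idx + 1) * word);
}

/* Zcmp cm.push layout (assumed from the spec): with N registers saved,
   the last s-register occupies the highest slot and ra the lowest.  */
long zcmp_push_slot (unsigned idx, unsigned n, unsigned word)
{
  return -(long) ((n - idx) * word);
}
```

For {ra, s0, s1} on rv32, ra lands at -4 under save/restore but at -12 under cm.push, which is why the CFI offsets cannot simply be copied from the save/restore path.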

I will refine and share the code soon for your review.

BR
Fei




On Thu Apr 6 06:21:17 GMT 2023  Jiawei jiawei@iscas.ac.cn wrote:
>
>Add support for the Zcmp extension instructions. Generate push/pop
>with the following steps:
>
>  1. preprocessing:
>    1.1. if there is no push rtx, then just return. e.g.
>    (note 5 1 22 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>    (insn/f 22 5 23 2 (set (reg/f:SI 2 sp)
>      (plus:SI (reg/f:SI 2 sp)
>        (const_int -32 [0xffffffffffffffe0])))
>    (nil))
>    (note 23 22 2 2 NOTE_INSN_PROLOGUE_END)
>    1.2. if push rtx exists, then we compute the number of
>    pushed s-registers, n_sreg.
>
>  push rtx should be found before the NOTE_INSN_PROLOGUE_END tag
>
>  [2 and 3 happen simultaneously]
>
>  2. find valid move pattern, mv sN, aN, where N < n_sreg,
>    and aN is not used the move pattern, and sN is not
>    defined before the move pattern (from prologue to the
>    position of move pattern).
>
>  3. analyze the use and reach of every instruction from the prologue
>    to the position of the move pattern.
>    if any sN is used, then we mark the corresponding argument list
>    candidate as invalid.
>    e.g.
>        push  {ra,s0-s3}, {}, -32
>        sw      s0,44(sp) # s0 is used, then argument list is invalid
>        mv      a0,a5     # a0 is defined, then argument list is invalid
>        ...
>        mv      s0,a0
>        mv      s1,a1
>        mv      s2,a2
>
>  4. if there is a valid argument list, then replace the pop
>    push parallel insn, and delete mv pattern.
>     if not, skip.
>
>Every "zcmpe" refers to Zcmp with the RVE extension.
>The push/pop instruction implementation was mostly done by Sinan Lin.
>
>Co-Authored by: Sinan Lin <sinan....@linux.alibaba.com>
>Co-Authored by: Simon Cook <simon.c...@embecosm.com>
>Co-Authored by: Shihua Liao <shi...@iscas.ac.cn>
>
>gcc/ChangeLog:
>
>        * config.gcc: New object.
>        * config/riscv/predicates.md (riscv_stack_push_operation):
>          New predicate.
>        (riscv_stack_pop_operation): Ditto.
>        (pop_return_value_constant): Ditto.
>        * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): New pass.
>        * config/riscv/riscv-protos.h (riscv_output_popret_p):
>          New routine.
>        (riscv_valid_stack_push_pop_p): Ditto.
>        (riscv_check_regno): Ditto.
>        (make_pass_zcmp_popret): New pass.
>        * config/riscv/riscv.cc (struct riscv_frame_info): New variable.
>        (riscv_output_popret_p): New function.
>        (riscv_print_pop_size): Ditto.
>        (riscv_print_reglist): Ditto.
>        (riscv_print_operand): New case symbols.
>        (riscv_save_push_pop_count): New function.
>        (riscv_push_pop_base_sp_adjust): Ditto.
>        (riscv_use_push_pop): Ditto.
>        (riscv_compute_frame_info): Adjust frame value.
>        (riscv_emit_pop_insn): New function.
>        (riscv_check_regno): Ditto.
>        (riscv_valid_stack_push_pop_p): Ditto.
>        (riscv_emit_push_insn): Ditto.
>        (riscv_expand_prologue): Modify frame pattern.
>        (riscv_expand_epilogue): Ditto.
>        * config/riscv/riscv.h (RETURN_VALUE_REGNUM):
>        (RISCV_ZCE_PUSH_POP_MASK): New mask.
>        (RISCV_ZCMPE_PUSH_POP_MASK): Ditto.
>        * config/riscv/riscv.md: Add new reg number and include info.
>        * config/riscv/t-riscv: New object rules.
>        * config/riscv/riscv-zcmp-popret.cc: New file.
>        * config/riscv/zc.md: New file.
>---
> gcc/config.gcc                        |   2 +-
> gcc/config/riscv/predicates.md        |  16 +
> gcc/config/riscv/riscv-passes.def     |   1 +
> gcc/config/riscv/riscv-protos.h       |   4 +
> gcc/config/riscv/riscv-zcmp-popret.cc | 260 +++++++++++++++
> gcc/config/riscv/riscv.cc             | 437 +++++++++++++++++++++++++-
> gcc/config/riscv/riscv.h              |   4 +
> gcc/config/riscv/riscv.md             |   3 +
> gcc/config/riscv/t-riscv              |   4 +
> gcc/config/riscv/zc.md                |  47 +++
> 10 files changed, 767 insertions(+), 11 deletions(-)
> create mode 100644 gcc/config/riscv/riscv-zcmp-popret.cc
> create mode 100644 gcc/config/riscv/zc.md
>
>diff --git a/gcc/config.gcc b/gcc/config.gcc
>index 629d324b5ef..a991c5273f9 100644
>--- a/gcc/config.gcc
>+++ b/gcc/config.gcc
>@@ -529,7 +529,7 @@ pru-*-*)
>        ;;
> riscv*)
>        cpu_type=riscv
>-       extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o"
>+       extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o riscv-zcmp-popret.o"
>        extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
>        extra_objs="${extra_objs} thead.o"
>        d_target_objs="riscv-d.o"
>diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
>index 0d9d7701c7e..6bff6cd047a 100644
>--- a/gcc/config/riscv/predicates.md
>+++ b/gcc/config/riscv/predicates.md
>@@ -412,3 +412,19 @@
>   (and (match_code "const_int")
>        (ior (match_operand 0 "not_uimm_extra_bit_operand")
>            (match_operand 0 "const_nottwobits_operand"))))
>+
>+(define_special_predicate "riscv_stack_push_operation"
>+  (match_code "parallel")
>+{
>+  return riscv_valid_stack_push_pop_p (op, true);
>+})
>+
>+(define_special_predicate "riscv_stack_pop_operation"
>+  (match_code "parallel")
>+{
>+  return riscv_valid_stack_push_pop_p (op, false);
>+})
>+
>+(define_predicate "pop_return_value_constant"
>+  (and (match_code "const_int")
>+       (match_test "INTVAL (op) == 0")))
>diff --git a/gcc/config/riscv/riscv-passes.def b/gcc/config/riscv/riscv-passes.def
>index 4084122cf0a..25625b9af3e 100644
>--- a/gcc/config/riscv/riscv-passes.def
>+++ b/gcc/config/riscv/riscv-passes.def
>@@ -19,3 +19,4 @@
> 
> INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
> INSERT_PASS_BEFORE (pass_fast_rtl_dce, 1, pass_vsetvl);
>+INSERT_PASS_AFTER (pass_cprop_hardreg, 1, pass_zcmp_popret);
>diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
>index 4611447ddde..8f243cd5f44 100644
>--- a/gcc/config/riscv/riscv-protos.h
>+++ b/gcc/config/riscv/riscv-protos.h
>@@ -54,6 +54,7 @@ extern bool riscv_split_64bit_move_p (rtx, rtx);
> extern void riscv_split_doubleword_move (rtx, rtx);
> extern const char *riscv_output_move (rtx, rtx);
> extern const char *riscv_output_return ();
>+extern bool riscv_output_popret_p (rtx);
> 
> #ifdef RTX_CODE
> extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx);
>@@ -79,6 +80,8 @@ extern void riscv_reinit (void);
> extern poly_uint64 riscv_regmode_natural_size (machine_mode);
> extern bool riscv_v_ext_vector_mode_p (machine_mode);
> extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT);
>+extern bool riscv_valid_stack_push_pop_p (rtx, bool);
>+extern bool riscv_check_regno(rtx, unsigned);
> 
> /* Routines implemented in riscv-c.cc.  */
> void riscv_cpu_cpp_builtins (cpp_reader *);
>@@ -99,6 +102,7 @@ extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
> 
> rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
> rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
>+rtl_opt_pass * make_pass_zcmp_popret (gcc::context *ctxt);
> 
> /* Information about one CPU we know about.  */
> struct riscv_cpu_info {
>diff --git a/gcc/config/riscv/riscv-zcmp-popret.cc b/gcc/config/riscv/riscv-zcmp-popret.cc
>new file mode 100644
>index 00000000000..d7b40f6a3e2
>--- /dev/null
>+++ b/gcc/config/riscv/riscv-zcmp-popret.cc
>@@ -0,0 +1,260 @@
>+#include "config.h"
>+#include "system.h"
>+#include "coretypes.h"
>+#include "tm.h"
>+#include "rtl.h"
>+#include "backend.h"
>+#include "regs.h"
>+#include "target.h"
>+#include "memmodel.h"
>+#include "emit-rtl.h"
>+#include "df.h"
>+#include "predict.h"
>+#include "tree-pass.h"
>+#include "tree.h"
>+#include "tm_p.h"
>+#include "optabs.h"
>+#include "recog.h"
>+#include "cfgrtl.h"
>+
>+#define IN_TARGET_CODE 1
>+
>+namespace {
>+
>+/*
>+  1. preprocessing:
>+    1.1. if there is no push rtx, then just return. e.g.
>+    (note 5 1 22 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>+    (insn/f 22 5 23 2 (set (reg/f:SI 2 sp)
>+      (plus:SI (reg/f:SI 2 sp)
>+       (const_int -32 [0xffffffffffffffe0])))
>+    (nil))
>+    (note 23 22 2 2 NOTE_INSN_PROLOGUE_END)
>+    1.2. if push rtx exists, then we compute the number of
>+    pushed s-registers, n_sreg.
>+
>+  push rtx should be found before the NOTE_INSN_PROLOGUE_END tag
>+
>+  [2 and 3 happen simultaneously]
>+  2. find valid move pattern, mv sN, aN, where N < n_sreg,
>+    and aN is not used the move pattern, and sN is not
>+    defined before the move pattern (from prologue to the
>+    position of move pattern).
>+  3. analyze the use and reach of every instruction from the prologue
>+    to the position of the move pattern.
>+    if any sN is used, then we mark the corresponding argument list
>+    candidate as invalid.
>+    e.g.
>+       push  {ra,s0-s3}, {}, -32
>+       sw      s0,44(sp) # s0 is used, then argument list is invalid
>+       mv      a0,a5     # a0 is defined, then argument list is invalid
>+       ...
>+       mv      s0,a0
>+       mv      s1,a1
>+       mv      s2,a2
>+
>+  4. if there is a valid argument list, then replace the pop
>+    push parallel insn, and delete mv pattern.
>+     if not, skip.
>+*/
>+
>+static void
>+emit_zcmp_popret (rtx_insn *pop_rtx,
>+                 rtx_insn **candidates,
>+                 basic_block bb)
>+{
>+  bool gen_popretz_p = candidates [0];
>+  bool gen_popret_p = candidates [2];
>+
>+  if (!(gen_popret_p || gen_popretz_p))
>+    return;
>+
>+  gcc_assert ((gen_popret_p && !gen_popretz_p)
>+      || (gen_popretz_p && gen_popret_p));
>+
>+  rtx pop_pat = PATTERN (pop_rtx);
>+  unsigned pop_idx = 0, popret_idx = 0;
>+  unsigned n_pop_par = XVECLEN (pop_pat, 0);
>+  unsigned n_popret_par = n_pop_par
>+       + (gen_popretz_p ? 2 : 0)
>+       + (gen_popret_p ? 2 : 0);
>+
>+  rtx popret_par = gen_rtx_PARALLEL (VOIDmode,
>+         rtvec_alloc (n_popret_par));
>+
>+  /* return zero pattern */
>+  if (gen_popretz_p)
>+    {
>+      XVECEXP (popret_par, 0, 0) = PATTERN (candidates[0]);
>+      XVECEXP (popret_par, 0, 1) = PATTERN (candidates[1]);
>+      popret_idx += 2;
>+      delete_insn (candidates[0]);
>+      delete_insn (candidates[1]);
>+    }
>+
>+  /* Copy pop sequence.  */
>+  for (; pop_idx < n_pop_par;
>+      pop_idx ++, popret_idx ++)
>+    {
>+      XVECEXP (popret_par, 0, popret_idx) =
>+         XVECEXP (pop_pat, 0, pop_idx);
>+    }
>+
>+  /* ret pattern.  */
>+  rtx ret_pat = PATTERN (candidates[2]);
>+  gcc_assert (GET_CODE (ret_pat) == PARALLEL);
>+
>+  for (int i = 0; i < XVECLEN (ret_pat, 0);
>+      i++, popret_idx++)
>+  {
>+    XVECEXP (popret_par, 0, popret_idx) =
>+       XVECEXP (ret_pat, 0, i);
>+  }
>+
>+  rtx_insn *insn = emit_jump_insn_after (
>+         popret_par,
>+         BB_END (bb));
>+  JUMP_LABEL (insn) = simple_return_rtx;
>+
>+  REG_NOTES (insn) = REG_NOTES (pop_rtx);
>+  RTX_FRAME_RELATED_P (insn) = 1;
>+
>+  if (dump_file)
>+    {
>+      fprintf(dump_file, "new insn:\n");
>+      print_rtl (dump_file, insn);
>+    }
>+
>+  delete_insn (candidates [2]);
>+  delete_insn (pop_rtx);
>+}
>+
>+static void
>+zcmp_popret (void)
>+{
>+  basic_block bb;
>+  rtx_insn *insn = NULL, *pop_rtx = NULL;
>+  rtx_insn *pop_candidates[3] = {NULL, };
>+  /*
>+    find NOTE_INSN_EPILOGUE_BEG, but pop_rtx not found => return
>+    find NOTE_INSN_EPILOGUE_BEG, and pop_rtx is found => looking for a0
>+  */
>+
>+  FOR_EACH_BB_REVERSE_FN (bb, cfun)
>+  {
>+    FOR_BB_INSNS_REVERSE (bb, insn)
>+      {
>+       if (!pop_rtx
>+           && NOTE_P (insn)
>+           && NOTE_KIND (insn) == NOTE_INSN_EPILOGUE_BEG)
>+         return;
>+
>+       if (NOTE_P (insn)
>+           && NOTE_KIND (insn) == NOTE_INSN_FUNCTION_BEG)
>+         {
>+           if (pop_rtx)
>+             emit_zcmp_popret (pop_rtx, pop_candidates, bb);
>+           return;
>+         };
>+
>+       if (!(NONDEBUG_INSN_P (insn)
>+           || CALL_P (insn)))
>+         continue;
>+
>+       rtx pop_pat = PATTERN (insn);
>+
>+       if (GET_CODE (pop_pat) == PARALLEL
>+           && riscv_valid_stack_push_pop_p (pop_pat, false))
>+         {
>+           pop_rtx = insn;
>+           continue;
>+         }
>+
>+       /* pattern for `ret`.  */
>+       if (JUMP_P (insn)
>+           && GET_CODE (pop_pat) == PARALLEL
>+           && XVECLEN (pop_pat, 0) == 2
>+           && GET_CODE (XVECEXP (pop_pat, 0, 0)) == SIMPLE_RETURN
>+           && GET_CODE (XVECEXP (pop_pat, 0, 1)) == USE)
>+         {
>+           rtx use_reg = XEXP (XVECEXP (pop_pat, 0, 1), 0);
>+           if (REG_P (use_reg)
>+             && REGNO (use_reg) == RETURN_ADDR_REGNUM)
>+             {
>+               pop_candidates [2] = insn;
>+               continue;
>+             }
>+         }
>+
>+       if (!pop_rtx)
>+         continue;
>+
>+       /* pattern for return value.  */
>+       if (!pop_candidates [0]
>+           && GET_CODE (pop_pat) == USE)
>+         {
>+           rtx_insn *set_insn = PREV_INSN (insn);
>+           rtx pat_set = PATTERN (set_insn);
>+
>+           if (riscv_check_regno (XEXP (pop_pat, 0),
>+                   RETURN_VALUE_REGNUM)
>+               && insn
>+               && pat_set != NULL
>+               && GET_CODE (pat_set) == SET
>+               && riscv_check_regno (SET_DEST (pat_set),
>+                      RETURN_VALUE_REGNUM)
>+               && CONST_INT_P (SET_SRC (pat_set))
>+               && INTVAL (SET_SRC (pat_set)) == 0)
>+             {
>+               pop_candidates [0] = set_insn;
>+               pop_candidates [1] = insn;
>+               break;
>+             }
>+         }
>+      }
>+
>+    if (pop_rtx)
>+      {
>+       emit_zcmp_popret (pop_rtx, pop_candidates, bb);
>+       return;
>+      }
>+  }
>+}
>+
>+const pass_data pass_data_zcmp_popret =
>+{
>+  RTL_PASS, /* type */
>+  "zcmp-popret", /* name */
>+  OPTGROUP_NONE, /* optinfo_flags */
>+  TV_NONE, /* tv_id */
>+  0, /* properties_required */
>+  0, /* properties_provided */
>+  0, /* properties_destroyed */
>+  0, /* todo_flags_start */
>+  0, /* todo_flags_finish */
>+};
>+
>+class pass_zcmp_popret : public rtl_opt_pass
>+{
>+public:
>+  pass_zcmp_popret (gcc::context *ctxt)
>+    : rtl_opt_pass (pass_data_zcmp_popret, ctxt)
>+  {}
>+
>+  /* opt_pass methods: */
>+  virtual bool gate (function *)
>+    { return TARGET_ZCMP; }
>+  virtual unsigned int execute (function *)
>+    {
>+      zcmp_popret ();
>+      return 0;
>+    }
>+}; // class pass_zcmp_popret
>+
>+} // anon namespace
>+
>+rtl_opt_pass *
>+make_pass_zcmp_popret (gcc::context *ctxt)
>+{
>+  return new pass_zcmp_popret (ctxt);
>+}
>diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>index 5f8cbfc15ed..17df2f3f8cf 100644
>--- a/gcc/config/riscv/riscv.cc
>+++ b/gcc/config/riscv/riscv.cc
>@@ -114,6 +114,9 @@ struct GTY(())  riscv_frame_info {
>   /* Likewise FPR X.  */
>   unsigned int fmask;
> 
>+  /* How much the push/pop routines adjust sp (or 0 if unused).  */
>+  unsigned push_pop_sp_adjust;
>+
>   /* How much the GPR save/restore routines adjust sp (or 0 if unused).  */
>   unsigned save_libcall_adjustment;
> 
>@@ -401,6 +404,20 @@ static const unsigned gpr_save_reg_order[] = {
>   S10_REGNUM, S11_REGNUM
> };
> 
>+/* Order for the CLOBBERs/USEs of push/pop.  */
>+static const unsigned push_save_reg_order[] = {
>+  INVALID_REGNUM, RETURN_ADDR_REGNUM, S0_REGNUM,
>+  S1_REGNUM, S2_REGNUM, S3_REGNUM, S4_REGNUM,
>+  S5_REGNUM, S6_REGNUM, S7_REGNUM, S8_REGNUM,
>+  S9_REGNUM, S10_REGNUM, S11_REGNUM
>+};
>+
>+/* Order for the CLOBBERs/USEs of push/pop in rve.  */
>+static const unsigned push_save_reg_order_zcmpe[] = {
>+  INVALID_REGNUM, RETURN_ADDR_REGNUM, S0_REGNUM,
>+  S1_REGNUM
>+};
>+
> /* A table describing all the processors GCC knows about.  */
> static const struct riscv_tune_info riscv_tune_info_table[] = {
> #define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO)       \
>@@ -2989,6 +3006,17 @@ riscv_output_return ()
>   return "ret";
> }
> 
>+bool
>+riscv_output_popret_p (rtx op)
>+{
>+  unsigned n_rtx = XVECLEN (op, 0);
>+  rtx use = XVECEXP (op, 0, n_rtx - 1);
>+  rtx ret = XVECEXP (op, 0, n_rtx - 2);
>+
>+    return GET_CODE (ret) == SIMPLE_RETURN
>+       &&  GET_CODE (use) == USE;
>+}
>+
> 
>
> /* Return true if CMP1 is a suitable second operand for integer ordering
>    test CODE.  See also the *sCC patterns in riscv.md.  */
>@@ -4306,6 +4334,74 @@ riscv_memmodel_needs_amo_acquire (enum memmodel model)
>     }
> }
> 
>+/* Print Sp adjustment field of pop instruction.  */
>+
>+static void
>+riscv_print_pop_size (FILE *file, rtx op)
>+{
>+  unsigned sp_adjust_idx = XVECLEN (op, 0) - 1;
>+  rtx sp_adjust_rtx = XVECEXP (op, 0, sp_adjust_idx);
>+
>+  /* Skip ret or pattern.  */
>+  while (GET_CODE (sp_adjust_rtx) != SET)
>+    sp_adjust_rtx = XVECEXP (op, 0, --sp_adjust_idx);
>+
>+  rtx elt_plus = SET_SRC (sp_adjust_rtx);
>+  fprintf (file, "%ld", INTVAL (XEXP (elt_plus, 1)));
>+}
>+
>+/* Print push/pop register list. */
>+
>+static void
>+riscv_print_reglist (FILE *file, rtx op)
>+{
>+  /* we only deal with three formats:
>+      push {ra}
>+      push {ra, s0}
>+      push {ra, s0-sN}
>+    or
>+      pop {ra}
>+      pop {ra, s0}
>+      pop {ra, s0-sN}
>+    registers except ra has to be continuous s-register,
>+    and it is supposed to be checked before.
>+    register list patterns in push:
>+    (set/f (mem/c:SI
>+      (plus:SI (reg/f:SI 2 sp)
>+       (const_int 28 [0x1c])) [2  S4 A32])
>+      (reg:SI 1 ra))
>+    register list patterns in pop:
>+    (set/f (reg:DI 1 ra)
>+      (mem/c:DI (plus:DI (reg/f:DI 2 sp)
>+       (const_int 8 [0x8])) [2  S8 A64]))
>+  */
>+  int total_count = XVECLEN (op, 0);
>+  int n_regs = 0;
>+  bool push_p = GET_CODE (XVECEXP (op, 0, 0)) == SET
>+      && GET_CODE (SET_SRC (XVECEXP (op, 0, 0))) == PLUS;
>+
>+  for (int idx = 0; idx < total_count; ++idx)
>+    {
>+      rtx ele = XVECEXP (op, 0, idx);
>+      if (GET_CODE (ele) != SET)
>+       continue;
>+
>+      bool restore_save_p = push_p ?
>+         MEM_P (SET_DEST (ele)) :
>+         MEM_P (SET_SRC (ele));
>+
>+      if (restore_save_p)
>+       n_regs ++;
>+    }
>+
>+  if (n_regs > 2)
>+    fprintf (file, "ra,s0-s%u", n_regs - 2);
>+  else if (n_regs > 1)
>+    fprintf (file, "ra,s0");
>+  else
>+    fputs("ra", file);
>+}
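
For readers following the register-list printing above, the same three-way formatting can be sketched as a standalone C helper. This is only an illustration: `format_reglist` is a hypothetical name, and it takes the count of saved registers (ra included) directly instead of walking an RTL PARALLEL.

```c
#include <stdio.h>

/* Hypothetical helper: write the push/pop register list for N_REGS saved
   registers (ra counts as one) into BUF.  Returns snprintf's result.  */
int format_reglist (char *buf, size_t len, unsigned n_regs)
{
  if (n_regs > 2)
    return snprintf (buf, len, "ra,s0-s%u", n_regs - 2);
  if (n_regs > 1)
    return snprintf (buf, len, "ra,s0");
  return snprintf (buf, len, "ra");
}
```

Note that this sketch, like the quoted function, does not itself reject a count that would yield `ra,s0-s10`; the patch filters that case in riscv_use_push_pop.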
>+
> /* Return true if a FENCE should be emitted to before a memory access to
>    implement the release portion of memory model MODEL.  */
> 
>@@ -4517,6 +4613,14 @@ riscv_print_operand (FILE *file, rtx op, int letter)
>       fputs (GET_RTX_NAME (code), file);
>       break;
> 
>+    case 'L':
>+      riscv_print_reglist (file, op);
>+      break;
>+
>+    case 's':
>+      riscv_print_pop_size (file, op);
>+      break;
>+
>     case 'S':
>       {
>        rtx newop = GEN_INT (ctz_hwi (INTVAL (op)));
>@@ -4777,6 +4881,66 @@ riscv_use_save_libcall (const struct riscv_frame_info *frame)
>   return frame->save_libcall_adjustment != 0;
> }
> 
>+/* Determine how many instructions related to push/pop instructions.  */
>+
>+static unsigned
>+riscv_save_push_pop_count (unsigned mask)
>+{
>+  if (!BITSET_P (mask, GP_REG_FIRST + RETURN_ADDR_REGNUM))
>+    return 0;
>+  for (unsigned n = GP_REG_LAST; n > GP_REG_FIRST; n--)
>+    if (BITSET_P (mask, n)
>+       && !call_used_regs [n])
>+      /* add ra saving and sp adjust. */
>+      return CALLEE_SAVED_REG_NUMBER (n) + 1 + 2;
>+  abort ();
>+}
>+
>+/* Calculate the maximum sp adjustment of push/pop instruction. */
>+
>+static unsigned
>+riscv_push_pop_base_sp_adjust (unsigned mask)
>+{
>+  unsigned n_regs = riscv_save_push_pop_count (mask) - 1;
>+  return (n_regs * UNITS_PER_WORD + 15) & (~0xf);
>+}
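
The rounding in riscv_push_pop_base_sp_adjust can be checked in isolation with a small C sketch. The function name and the explicit word-size parameter below are assumptions for illustration; in the patch the word size comes from UNITS_PER_WORD.

```c
/* Bytes needed to save N_REGS registers of WORD bytes each, rounded up
   to the next multiple of 16 (the stack alignment assumed by the patch).  */
unsigned base_sp_adjust (unsigned n_regs, unsigned word)
{
  return (n_regs * word + 15u) & ~15u;
}
```

For example, {ra, s0} on rv32 needs 8 bytes and rounds up to 16, while {ra, s0-s3} on rv32 needs 20 bytes and rounds up to 32.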
>+
>+/* Determine whether to call push/pop routines.  */
>+
>+static bool
>+riscv_use_push_pop (const struct riscv_frame_info *frame, const HOST_WIDE_INT frame_size)
>+{
>+  if (!TARGET_ZCMP)
>+    return false;
>+
>+  /* We do not handle variable argument cases currently.  */
>+  if (cfun->machine->varargs_size != 0)
>+    return false;
>+
>+  HOST_WIDE_INT base_size = riscv_push_pop_base_sp_adjust (frame->mask);
>+  /*
>+     Pr 960215-1.c in rv64 outputs
>+
>+       addi    sp,sp,-32
>+       sd      ra,24(sp)
>+       sd      s0,16(sp)
>+       sd      s2,8(sp)
>+       sd      s3,0(sp)
>+     it is a rare case in which the callee-saved registers are non-contiguous,
>+     which breaks the old push implementation, and we just reject this case
>+     like save-restore does now.
>+  */
>+  if (base_size > frame_size)
>+    return false;
>+
>+  /* {ra,s0-s10} is invalid. */
>+  if (frame->mask & (1 << (S10_REGNUM - GP_REG_FIRST))
>+      && !(frame->mask & (1 << (S11_REGNUM - GP_REG_FIRST))))
>+    return false;
>+
>+  return frame->mask & (1 << (RETURN_ADDR_REGNUM - GP_REG_FIRST));
>+}
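
The two mask tests at the end of riscv_use_push_pop can be sketched on their own as follows. This is a simplified illustration: the function name is hypothetical, the bit positions assume the x-register numbering (ra = x1, s10 = x26, s11 = x27) with GP_REG_FIRST equal to 0, and the varargs and frame-size checks are left out.

```c
#include <stdbool.h>

/* x-register numbers used as bit positions in the save mask.  */
enum { RA_BIT = 1, S10_BIT = 26, S11_BIT = 27 };

/* Return true if MASK can be expressed as a Zcmp register list:
   ra must be saved, and s10 may only appear together with s11
   because {ra,s0-s10} has no encoding.  */
bool reglist_encodable (unsigned mask)
{
  if (!(mask & (1u << RA_BIT)))
    return false;
  if ((mask & (1u << S10_BIT)) && !(mask & (1u << S11_BIT)))
    return false;
  return true;
}
```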
>+
> /* Determine which GPR save/restore routine to call.  */
> 
> static unsigned
>@@ -4934,6 +5098,8 @@ riscv_compute_frame_info (void)
>   /* Only use save/restore routines when the GPRs are atop the frame.  */
>   if (known_ne (frame->hard_frame_pointer_offset, frame->total_size))
>     frame->save_libcall_adjustment = 0;
>+
>+  frame->push_pop_sp_adjust = 0;
> }
> 
> /* Make sure that we're not trying to eliminate to the wrong hard frame
>@@ -5171,6 +5337,86 @@ riscv_for_each_saved_reg (poly_int64 sp_offset, riscv_save_restore_fn fn,
>       }
> }
> 
>+static void
>+riscv_emit_pop_insn (struct riscv_frame_info *frame, HOST_WIDE_INT offset, HOST_WIDE_INT size)
>+{
>+  unsigned int veclen = riscv_save_push_pop_count (frame->mask);
>+  unsigned int n_reg = veclen - 1;
>+  rtvec vec = rtvec_alloc (veclen);
>+  HOST_WIDE_INT sp_adjust;
>+  rtx dwarf = NULL_RTX;
>+
>+  const unsigned *reg_order = (TARGET_ZCMP && TARGET_RVE)
>+       ? push_save_reg_order_zcmpe
>+       : push_save_reg_order;
>+
>+  gcc_assert (n_reg >= 1
>+       && TARGET_ZCMP
>+       && ((TARGET_RVE && (n_reg <= ARRAY_SIZE (push_save_reg_order_zcmpe)))
>+           || (TARGET_ZCMP && (n_reg <= ARRAY_SIZE (push_save_reg_order)))));
>+
>+  /* sp adjust pattern */
>+  int max_allow_sp_adjust = riscv_push_pop_base_sp_adjust (frame->mask) + 48;
>+  int aligned_size = size;
>+
>+  /* if sp adjustment is too large, we should split it first. */
>+  if (aligned_size > max_allow_sp_adjust)
>+    {
>+      rtx dwarf_pre_sp_adjust = NULL_RTX;
>+      rtx pre_adjust_rtx = gen_add3_insn (stack_pointer_rtx,
>+                       stack_pointer_rtx,
>+                       GEN_INT (aligned_size - max_allow_sp_adjust));
>+      rtx insn = emit_insn (pre_adjust_rtx);
>+
>+      rtx cfa_pre_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
>+                       GEN_INT (aligned_size - max_allow_sp_adjust));
>+      dwarf_pre_sp_adjust = alloc_reg_note (REG_CFA_DEF_CFA,
>+               cfa_pre_adjust_rtx,
>+               dwarf_pre_sp_adjust);
>+
>+      RTX_FRAME_RELATED_P (insn) = 1;
>+      REG_NOTES (insn) = dwarf_pre_sp_adjust;
>+
>+      sp_adjust = max_allow_sp_adjust;
>+    }
>+  else
>+    sp_adjust = (aligned_size + 15) & (~0xf);
>+
>+  /* register save sequence. */
>+  for (unsigned i = 1; i < veclen; ++i)
>+    {
>+      offset -= UNITS_PER_WORD;
>+      unsigned regno = reg_order[i];
>+      rtx reg = gen_rtx_REG (Pmode, regno);
>+      rtx mem = gen_frame_mem (Pmode, plus_constant (Pmode,
>+             stack_pointer_rtx,
>+             offset));
>+      rtx set = gen_rtx_SET (reg, mem);
>+      RTVEC_ELT (vec, i - 1) = set;
>+      RTX_FRAME_RELATED_P (set) = 1;
>+      dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
>+    }
>+
>+  /* sp adjust pattern */
>+  rtx adjust_sp_rtx
>+      = gen_rtx_SET (stack_pointer_rtx,
>+           plus_constant (Pmode,
>+               stack_pointer_rtx,
>+               sp_adjust));
>+  RTVEC_ELT (vec, veclen - 1) = adjust_sp_rtx;
>+
>+  rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
>+       const0_rtx);
>+  dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
>+
>+  frame->gp_sp_offset -= (veclen - 1) * UNITS_PER_WORD;
>+  frame->push_pop_sp_adjust = sp_adjust;
>+
>+  rtx insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, vec));
>+  RTX_FRAME_RELATED_P (insn) = 1;
>+  REG_NOTES (insn) = dwarf;
>+}
>+
> /* For stack frames that can't be allocated with a single ADDI instruction,
>    compute the best value to initially allocate.  It must at a minimum
>    allocate enough space to spill the callee-saved registers.  If TARGET_RVC,
>@@ -5270,6 +5516,146 @@ riscv_emit_stack_tie (void)
>     emit_insn (gen_stack_tiedi (stack_pointer_rtx, hard_frame_pointer_rtx));
> }
> 
>+bool
>+riscv_check_regno(rtx pat, unsigned regno)
>+{
>+  return REG_P (pat)
>+      && REGNO (pat) == regno;
>+}
>+
>+/* Function to check whether the OP is a valid stack push/pop operation.
>+   This part is borrowed from nds32 nds32_valid_stack_push_pop_p */
>+
>+bool
>+riscv_valid_stack_push_pop_p (rtx op, bool push_p)
>+{
>+  int index;
>+  int total_count;
>+  int sp_adjust_rtx_index;
>+  rtx elt;
>+  rtx elt_reg;
>+  rtx elt_plus;
>+
>+  if (!TARGET_ZCMP)
>+    return false;
>+
>+  total_count = XVECLEN (op, 0);
>+  sp_adjust_rtx_index = push_p ? 0 : total_count - 1;
>+
>+  /* At least sp + one callee save/restore register rtx */
>+  if (total_count < 2)
>+    return false;
>+
>+  /* Perform some quick check for that every element should be 'set',
>+     for pop, it might contain `ret` and `ret value` pattern.  */
>+  for (index = 0; index < total_count; index++)
>+    {
>+      elt = XVECEXP (op, 0, index);
>+
>+      /* skip pop return value rtx */
>+      if (!push_p && GET_CODE (elt) == SET
>+         && riscv_check_regno (SET_DEST (elt), RETURN_VALUE_REGNUM)
>+         && total_count >= 4
>+         && index + 1 < total_count
>+         && GET_CODE (XVECEXP (op, 0, index + 1)) == USE)
>+       {
>+         rtx use_reg = XEXP (XVECEXP (op, 0, index + 1), 0);
>+
>+         if (!riscv_check_regno (use_reg, RETURN_VALUE_REGNUM))
>+           return false;
>+
>+         index += 1;
>+         continue;
>+       }
>+
>+      /* skip ret rtx */
>+      if (!push_p && GET_CODE (elt) == SIMPLE_RETURN
>+         && total_count >= 4
>+         && index + 1 < total_count
>+         && GET_CODE (XVECEXP (op, 0, index + 1)) == USE)
>+       {
>+         rtx use_reg = XEXP (XVECEXP (op, 0, index + 1), 0);
>+
>+         if (!riscv_check_regno (use_reg, RETURN_ADDR_REGNUM))
>+           return false;
>+
>+         index += 1;
>+         sp_adjust_rtx_index -= 2;
>+         continue;
>+       }
>+
>+      if (GET_CODE (elt) != SET)
>+       return false;
>+    }
>+
>+  elt = XVECEXP (op, 0, sp_adjust_rtx_index);
>+  elt_reg  = SET_DEST (elt);
>+  elt_plus = SET_SRC (elt);
>+
>+  /* Check this is (set (stack_reg) (plus stack_reg const)) pattern.  */
>+  if (GET_CODE (elt_plus) != PLUS
>+      || !riscv_check_regno (elt_reg, STACK_POINTER_REGNUM))
>+    return false;
>+
>+  /* Pass all test, this is a valid rtx.  */
>+  return true;
>+}
>+
>+/* Generate push/pop rtx */
>+
>+static void
>+riscv_emit_push_insn (struct riscv_frame_info *frame, HOST_WIDE_INT size)
>+{
>+  unsigned int veclen = riscv_save_push_pop_count (frame->mask);
>+  unsigned int n_reg = veclen - 1;
>+  rtvec vec = rtvec_alloc (veclen);
>+
>+  const unsigned *reg_order = (TARGET_ZCMP && TARGET_RVE)
>+       ? push_save_reg_order_zcmpe
>+       : push_save_reg_order;
>+
>+  int aligned_size = (size + 15) & (~0xf);
>+
>+  gcc_assert (n_reg >= 1
>+       && TARGET_ZCMP
>+       && ((TARGET_RVE && (n_reg <= ARRAY_SIZE (push_save_reg_order_zcmpe)))
>+           || (TARGET_ZCMP && (n_reg <= ARRAY_SIZE (push_save_reg_order)))));
>+
>+  /* sp adjust pattern */
>+  int max_allow_sp_adjust = riscv_push_pop_base_sp_adjust (frame->mask) + 48;
>+  int sp_adjust = aligned_size > max_allow_sp_adjust ?
>+      max_allow_sp_adjust
>+      : aligned_size;
>+
>+  /*TODO: move this part to frame computation function. */
>+  frame->gp_sp_offset = (veclen - 1) * UNITS_PER_WORD;
>+  frame->push_pop_sp_adjust = sp_adjust;
>+
>+  rtx adjust_sp_rtx
>+      = gen_rtx_SET (stack_pointer_rtx,
>+           plus_constant (Pmode,
>+           stack_pointer_rtx,
>+           -sp_adjust));
>+  RTVEC_ELT (vec, 0) = adjust_sp_rtx;
>+
>+  /* Register save sequence. */
>+  for (unsigned i = 1; i < veclen; ++i)
>+    {
>+      sp_adjust -= UNITS_PER_WORD;
>+      unsigned regno = reg_order[i];
>+      rtx reg = gen_rtx_REG (Pmode, regno);
>+      rtx mem = gen_frame_mem (Pmode, plus_constant (Pmode,
>+             stack_pointer_rtx,
>+             sp_adjust));
>+      rtx set = gen_rtx_SET (mem, reg);
>+      RTVEC_ELT (vec, i) = set;
>+      RTX_FRAME_RELATED_P (set) = 1;
>+    }
>+
>+  rtx insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, vec));
>+  RTX_FRAME_RELATED_P (insn) = 1;
>+}
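
The way the quoted prologue code divides the first stack step between the push instruction and a follow-up addi can be sketched as below. The struct and function names are hypothetical; the +48 limit mirrors max_allow_sp_adjust in the patch, and the remainder is clamped at zero the same way riscv_expand_prologue does with MAX.

```c
/* Result of splitting a stack step: the part folded into the push and
   the part left for a separate sp adjustment.  */
struct push_split
{
  long push_adjust;
  long remaining;
};

/* FRAME_SIZE is the first stack step in bytes; BASE_ADJUST is the
   16-byte-aligned space needed for the saved registers.  */
struct push_split split_push_adjust (long frame_size, long base_adjust)
{
  long aligned = (frame_size + 15) & ~15L;
  long max_adjust = base_adjust + 48;   /* push can add at most 48 extra bytes */
  struct push_split s;

  s.push_adjust = aligned > max_adjust ? max_adjust : aligned;
  s.remaining = frame_size > s.push_adjust ? frame_size - s.push_adjust : 0;
  return s;
}
```

With a 100-byte step and a 16-byte base, the push takes 64 bytes and 36 bytes remain for the separate adjustment.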
>+
> /* Expand the "prologue" pattern.  */
> 
> void
>@@ -5278,6 +5664,7 @@ riscv_expand_prologue (void)
>   struct riscv_frame_info *frame = &cfun->machine->frame;
>   poly_int64 size = frame->total_size;
>   unsigned mask = frame->mask;
>+  HOST_WIDE_INT step1 = riscv_first_stack_step (frame);
>   rtx insn;
> 
>   if (flag_stack_usage_info)
>@@ -5300,19 +5687,32 @@ riscv_expand_prologue (void)
>       REG_NOTES (insn) = dwarf;
>     }
> 
>+    if (size.is_constant ())
>+    step1 = MIN (size.to_constant(), step1);
>+  if (riscv_use_push_pop (frame, step1))
>+    {
>+      riscv_emit_push_insn (frame, step1);
>+
>+      step1 = MAX (step1 - frame->push_pop_sp_adjust, 0);
>+      size = MAX (size.to_constant() - frame->push_pop_sp_adjust, 0);
>+      frame->mask &= ~ ((TARGET_ZCMP && TARGET_RVE) ?
>+                 RISCV_ZCMPE_PUSH_POP_MASK
>+               : RISCV_ZCE_PUSH_POP_MASK);
>+    }
>+
>   /* Save the registers.  */
>   if ((frame->mask | frame->fmask) != 0)
>     {
>-      HOST_WIDE_INT step1 = riscv_first_stack_step (frame);
>-      if (size.is_constant ())
>-       step1 = MIN (size.to_constant(), step1);
>-
>-      insn = gen_add3_insn (stack_pointer_rtx,
>-                           stack_pointer_rtx,
>-                           GEN_INT (-step1));
>-      RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
>-      size -= step1;
>-      riscv_for_each_saved_reg (size, riscv_save_reg, false, false);
>+       if (step1 > 0)
>+       {
>+         insn = gen_add3_insn (stack_pointer_rtx,
>+                       stack_pointer_rtx,
>+                       GEN_INT (-step1));
>+         RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
>+         size -= step1;
>+       }
>+     riscv_for_each_saved_reg (size, riscv_save_reg,
>+        false /* bool epilogue */, false /* bool maybe_eh_return */);
>     }
> 
>   frame->mask = mask; /* Undo the above fib.  */
>@@ -5412,6 +5812,8 @@ riscv_expand_epilogue (int style)
>   rtx ra = gen_rtx_REG (Pmode, RETURN_ADDR_REGNUM);
>   rtx insn;
> 
>+  bool use_zcmp_pop = !use_restore_libcall && !(crtl->calls_eh_return);
>+
>   /* We need to add memory barrier to prevent read from deallocated stack.  */
>   bool need_barrier_p = known_ne (get_frame_size ()
>                                  + cfun->machine->frame.arg_pointer_offset, 0);
>@@ -5538,6 +5940,18 @@ riscv_expand_epilogue (int style)
>   if (use_restore_libcall)
>     frame->mask = 0; /* Temporarily fib that we need not save GPRs.  */
> 
>+  if (use_zcmp_pop && riscv_use_push_pop (frame, step2))
>+    {
>+      /* Emit a barrier to prevent loads from a deallocated stack.  */
>+      riscv_emit_stack_tie ();
>+      need_barrier_p = false;
>+      riscv_emit_pop_insn (frame, frame->total_size.to_constant(), step2);
>+      frame->mask &= ~ ((TARGET_ZCMP && TARGET_RVE) ?
>+                 RISCV_ZCMPE_PUSH_POP_MASK
>+               : RISCV_ZCE_PUSH_POP_MASK);
>+      step2 = 0;
>+    }
>+
>   /* Restore the registers.  */
>   riscv_for_each_saved_reg (frame->total_size - step2, riscv_restore_reg,
>                            true, style == EXCEPTION_RETURN);
>@@ -5552,6 +5966,9 @@ riscv_expand_epilogue (int style)
>   if (need_barrier_p)
>     riscv_emit_stack_tie ();
> 
>+  if (use_zcmp_pop)
>+    frame->mask = mask;
>+
>   /* Deallocate the final bit of the frame.  */
>   if (step2 > 0)
>     {
>diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
>index d05b1d59853..6e6e3ee2c25 100644
>--- a/gcc/config/riscv/riscv.h
>+++ b/gcc/config/riscv/riscv.h
>@@ -383,6 +383,7 @@ ASM_MISA_SPEC
> #define HARD_FRAME_POINTER_REGNUM 8
> #define STACK_POINTER_REGNUM 2
> #define THREAD_POINTER_REGNUM 4
>+#define RETURN_VALUE_REGNUM 10
> 
> /* These two registers don't really exist: they get eliminated to either
>    the stack or hard frame pointer.  */
>@@ -1097,4 +1098,7 @@ extern void riscv_remove_unneeded_save_restore_calls (void);
> #define DWARF_REG_TO_UNWIND_COLUMN(REGNO) \
>   ((REGNO == RISCV_DWARF_VLENB) ? (FIRST_PSEUDO_REGISTER + 1) : REGNO)
> 
>+#define RISCV_ZCE_PUSH_POP_MASK 0x0ffc0302u
>+#define RISCV_ZCMPE_PUSH_POP_MASK 0x302u
>+
> #endif /* ! GCC_RISCV_H */
>diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
>index bc384d9aedf..b9f2a426e48 100644
>--- a/gcc/config/riscv/riscv.md
>+++ b/gcc/config/riscv/riscv.md
>@@ -108,12 +108,14 @@
> 
> (define_constants
>   [(RETURN_ADDR_REGNUM         1)
>+   (SP_REGNUM                  2)
>    (GP_REGNUM                  3)
>    (TP_REGNUM                  4)
>    (T0_REGNUM                  5)
>    (T1_REGNUM                  6)
>    (S0_REGNUM                  8)
>    (S1_REGNUM                  9)
>+   (A0_REGNUM                  10)
>    (S2_REGNUM                  18)
>    (S3_REGNUM                  19)
>    (S4_REGNUM                  20)
>@@ -3147,3 +3149,4 @@
> (include "sifive-7.md")
> (include "thead.md")
> (include "vector.md")
>+(include "zc.md")
>diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
>index 6e326fc7e02..9ef522306a5 100644
>--- a/gcc/config/riscv/t-riscv
>+++ b/gcc/config/riscv/t-riscv
>@@ -90,6 +90,10 @@ riscv-v.o: $(srcdir)/config/riscv/riscv-v.cc \
>        $(COMPILE) $<
>        $(POSTCOMPILE)
> 
>+riscv-zcmp-popret.o: $(srcdir)/config/riscv/riscv-zcmp-popret.cc
>+       $(COMPILE) $<
>+       $(POSTCOMPILE)
>+
> thead.o: $(srcdir)/config/riscv/thead.cc \
>   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TARGET_H) backend.h $(RTL_H) \
>   memmodel.h $(EMIT_RTL_H) poly-int.h output.h
>diff --git a/gcc/config/riscv/zc.md b/gcc/config/riscv/zc.md
>new file mode 100644
>index 00000000000..3ad34dacd49
>--- /dev/null
>+++ b/gcc/config/riscv/zc.md
>@@ -0,0 +1,47 @@
>+;; Machine description for ZCE extension.
>+;; Copyright (C) 2021 Free Software Foundation, Inc.
>+
>+;; This file is part of GCC.
>+
>+;; GCC is free software; you can redistribute it and/or modify
>+;; it under the terms of the GNU General Public License as published by
>+;; the Free Software Foundation; either version 3, or (at your option)
>+;; any later version.
>+
>+;; GCC is distributed in the hope that it will be useful,
>+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
>+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>+;; GNU General Public License for more details.
>+
>+;; You should have received a copy of the GNU General Public License
>+;; along with GCC; see the file COPYING3.  If not see
>+;; <http://www.gnu.org/licenses/>.
>+
>+(define_insn "*stack_push<mode>"
>+  [(match_parallel 0 "riscv_stack_push_operation"
>+    [(set (reg:X SP_REGNUM) (plus:X (reg:X SP_REGNUM)
>+      (match_operand:X 1 "const_int_operand" "")))])]
>+  "TARGET_ZCMP"
>+  "cm.push\t{%L0},%1")
>+
>+(define_insn "*stack_pop<mode>"
>+  [(match_parallel 0 "riscv_stack_pop_operation"
>+    [(set (match_operand:X 1 "register_operand" "")
>+      (mem:X (plus:X (reg:X SP_REGNUM)
>+       (match_operand:X 2 "const_int_operand" ""))))])]
>+  "TARGET_ZCMP"
>+  {
>+    return riscv_output_popret_p (operands[0]) ?
>+       "cm.popret\t{%L0},%s0" :
>+       "cm.pop\t{%L0},%s0";
>+  })
>+
>+(define_insn "*stack_pop_with_return_value<mode>"
>+  [(match_parallel 0 "riscv_stack_pop_operation"
>+    [(set (reg:ANYI A0_REGNUM)
>+      (match_operand:ANYI 1 "pop_return_value_constant" ""))])]
>+  "TARGET_ZCMP"
>+  {
>+    gcc_assert (riscv_output_popret_p (operands[0]));
>+    return "cm.popretz\t{%L0},%s0";
>+  })
>-- 
>2.25.1

* Re: [PATCH 4/5] RISC-V: Add Zcmp extension supports.
  2023-04-25 10:11 [PATCH 4/5] RISC-V: Add Zcmp extension supports Fei Gao
@ 2023-05-05 15:57 ` Sinan
  2023-05-06  8:53   ` Fei Gao
  0 siblings, 1 reply; 9+ messages in thread
From: Sinan @ 2023-05-05 15:57 UTC (permalink / raw)
  To: Fei Gao; +Cc: Jiawei, gcc-patches

> hi Jiawei
> 
> Please ignore my previous reply. I accidentally sent the email before I finished it.
> Sorry for that!
> 
> I downloaded the series of patches from you and found in some cases
> it fails to generate zcmp push and pop insns.
> 
> TC:
> 
> char my_getchar();
> int test_s0()
> {
> 
> int a = my_getchar();
> int b = my_getchar();
> return a+b;
> }
> 
> cc1 -fno-shrink-wrap-separate -O2 -march=rv32e_zca_zcmp -mabi=ilp32e -mcmodel=medlow test.c
> 
> -fno-shrink-wrap-separate is used here to avoid the impact of shrink-wrap-separate, which is enabled by default
> at -O2.
> 
> As I'm also interested in Zc*, I made some changes, mainly in the prologue and epilogue pass, quite similar to
> what has been done for save and restore, except for the CFI directives, because zcmp pushes and pops ra and
> the s regs in the reverse order from save and restore.
> 
> I will refine and share the code soon for your review.
> 
> BR
> Fei
Hi Fei,
In the current implementation, cm.push is never allowed to adjust sp by more than the frame originally needs. cm.push has a minimum adjustment size of 16, but in your example the sp adjustment is only 12, so cm.push is not generated.
You can find the check in riscv_use_push_pop:
> > + */
> > + if (base_size > frame_size)
> > + return false;
> > +
If this check is removed, you get the output you expect:
```
 cm.push {ra,s0},-16
 call my_getchar
 mv s0,a0
 call my_getchar
 add a0,s0,a0
 cm.popret {ra,s0},16
```
As a result, cm.push cannot be generated in many rv32e scenarios. Perhaps we can remove this check? I have not tested whether removing it is safe, so I am CCing Jiawei to help test it.
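To make the arithmetic concrete, here is a minimal standalone sketch of that comparison. It is not the GCC code: the helper names base_sp_adjust and push_allowed are hypothetical, the rounding only mirrors what riscv_push_pop_base_sp_adjust does in the patch, and it assumes a 4-byte word on rv32e with {ra,s0} counted as two saved registers.

```c
#include <stdbool.h>

/* Hypothetical stand-in for riscv_push_pop_base_sp_adjust: the space
   needed for n_regs saved registers, rounded up to a multiple of 16.  */
static unsigned
base_sp_adjust (unsigned n_regs, unsigned word_bytes)
{
  return (n_regs * word_bytes + 15) & ~0xfu;
}

/* Hypothetical stand-in for the check quoted above: push/pop is rejected
   when the base adjustment is larger than the frame size.  */
static bool
push_allowed (unsigned n_regs, unsigned word_bytes, long frame_size)
{
  return (long) base_sp_adjust (n_regs, word_bytes) <= frame_size;
}
```

With {ra,s0} on rv32e, base_sp_adjust (2, 4) is 16, so push_allowed (2, 4, 12) is false for the 12-byte frame in your test case, while a 16-byte frame would pass.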
BR,
Sinan
------------------------------------------------------------------
Sender:Fei Gao <gaofei@eswincomputing.com>
Sent At:2023 Apr. 25 (Tue.) 18:12
Recipient:jiawei <jiawei@iscas.ac.cn>
Cc:gcc-patches <gcc-patches@gcc.gnu.org>
Subject:[PATCH 4/5] RISC-V: Add Zcmp extension supports.
On Thu Apr 6 06:21:17 GMT 2023 Jiawei jiawei@iscas.ac.cn wrote:
>
>Add support for the Zcmp extension instructions. Generate push/pop
>with the following steps:
>
> 1. preprocessing:
> 1.1. if there is no push rtx, then just return. e.g.
> (note 5 1 22 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> (insn/f 22 5 23 2 (set (reg/f:SI 2 sp)
> (plus:SI (reg/f:SI 2 sp)
> (const_int -32 [0xffffffffffffffe0])))
> (nil))
> (note 23 22 2 2 NOTE_INSN_PROLOGUE_END)
> 1.2. if push rtx exists, then we compute the number of
> pushed s-registers, n_sreg.
>
> push rtx should be found before the NOTE_INSN_PROLOGUE_END tag
>
> [2 and 3 happen simultaneously]
>
> 2. find valid move pattern, mv sN, aN, where N < n_sreg,
> and aN is not used the move pattern, and sN is not
> defined before the move pattern (from prologue to the
> position of move pattern).
>
> 3. analyze use and reach of every instruction from prologue
> to the position of move pattern.
> if any sN is used, then we mark the corresponding argument list
> candidate as invalid.
> e.g.
> push {ra,s0-s3}, {}, -32
> sw s0,44(sp) # s0 is used, then argument list is invalid
> mv a0,a5 # a0 is defined, then argument list is invalid
> ...
> mv s0,a0
> mv s1,a1
> mv s2,a2
>
> 4. if there is a valid argument list, then replace the pop
> push parallel insn, and delete mv pattern.
> if not, skip.
>
>Every "zcmpe" refers to Zcmp with the RVE extension.
>The push/pop instruction implementation was mostly done by Sinan Lin.
>
>Co-Authored by: Sinan Lin <sinan....@linux.alibaba.com>
>Co-Authored by: Simon Cook <simon.c...@embecosm.com>
>Co-Authored by: Shihua Liao <shi...@iscas.ac.cn>
>
>gcc/ChangeLog:
>
> * config.gcc: New object.
> * config/riscv/predicates.md (riscv_stack_push_operation):
> New predicate.
> (riscv_stack_pop_operation): Ditto.
> (pop_return_value_constant): Ditto.
> * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): New pass.
> * config/riscv/riscv-protos.h (riscv_output_popret_p):
> New routine.
> (riscv_valid_stack_push_pop_p): Ditto.
> (riscv_check_regno): Ditto.
> (make_pass_zcmp_popret): New pass.
> * config/riscv/riscv.cc (struct riscv_frame_info): New variable.
> (riscv_output_popret_p): New function.
> (riscv_print_pop_size): Ditto.
> (riscv_print_reglist): Ditto.
> (riscv_print_operand): New case symbols.
> (riscv_save_push_pop_count): New function.
> (riscv_push_pop_base_sp_adjust): Ditto.
> (riscv_use_push_pop): Ditto.
> (riscv_compute_frame_info): Adjust frame value.
> (riscv_emit_pop_insn): New function.
> (riscv_check_regno): Ditto.
> (riscv_valid_stack_push_pop_p): Ditto.
> (riscv_emit_push_insn): Ditto.
> (riscv_expand_prologue): Modify frame pattern.
> (riscv_expand_epilogue): Ditto.
> * config/riscv/riscv.h (RETURN_VALUE_REGNUM):
> (RISCV_ZCE_PUSH_POP_MASK): New mask.
> (RISCV_ZCMPE_PUSH_POP_MASK): Ditto.
> * config/riscv/riscv.md: Add new reg number and include info.
> * config/riscv/t-riscv: New object rules.
> * config/riscv/riscv-zcmp-popret.cc: New file.
> * config/riscv/zc.md: New file.
>---
> gcc/config.gcc | 2 +-
> gcc/config/riscv/predicates.md | 16 +
> gcc/config/riscv/riscv-passes.def | 1 +
> gcc/config/riscv/riscv-protos.h | 4 +
> gcc/config/riscv/riscv-zcmp-popret.cc | 260 +++++++++++++++
> gcc/config/riscv/riscv.cc | 437 +++++++++++++++++++++++++-
> gcc/config/riscv/riscv.h | 4 +
> gcc/config/riscv/riscv.md | 3 +
> gcc/config/riscv/t-riscv | 4 +
> gcc/config/riscv/zc.md | 47 +++
> 10 files changed, 767 insertions(+), 11 deletions(-)
> create mode 100644 gcc/config/riscv/riscv-zcmp-popret.cc
> create mode 100644 gcc/config/riscv/zc.md
>
>diff --git a/gcc/config.gcc b/gcc/config.gcc
>index 629d324b5ef..a991c5273f9 100644
>--- a/gcc/config.gcc
>+++ b/gcc/config.gcc
>@@ -529,7 +529,7 @@ pru-*-*)
> ;;
> riscv*)
> cpu_type=riscv
>- extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o"
>+ extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o riscv-zcmp-popret.o"
> extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
> extra_objs="${extra_objs} thead.o"
> d_target_objs="riscv-d.o"
>diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
>index 0d9d7701c7e..6bff6cd047a 100644
>--- a/gcc/config/riscv/predicates.md
>+++ b/gcc/config/riscv/predicates.md
>@@ -412,3 +412,19 @@
> (and (match_code "const_int")
> (ior (match_operand 0 "not_uimm_extra_bit_operand")
> (match_operand 0 "const_nottwobits_operand"))))
>+
>+(define_special_predicate "riscv_stack_push_operation"
>+ (match_code "parallel")
>+{
>+ return riscv_valid_stack_push_pop_p (op, true);
>+})
>+
>+(define_special_predicate "riscv_stack_pop_operation"
>+ (match_code "parallel")
>+{
>+ return riscv_valid_stack_push_pop_p (op, false);
>+})
>+
>+(define_predicate "pop_return_value_constant"
>+ (and (match_code "const_int")
>+ (match_test "INTVAL (op) == 0")))
>diff --git a/gcc/config/riscv/riscv-passes.def b/gcc/config/riscv/riscv-passes.def
>index 4084122cf0a..25625b9af3e 100644
>--- a/gcc/config/riscv/riscv-passes.def
>+++ b/gcc/config/riscv/riscv-passes.def
>@@ -19,3 +19,4 @@
> 
> INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
> INSERT_PASS_BEFORE (pass_fast_rtl_dce, 1, pass_vsetvl);
>+INSERT_PASS_AFTER (pass_cprop_hardreg, 1, pass_zcmp_popret);
>diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
>index 4611447ddde..8f243cd5f44 100644
>--- a/gcc/config/riscv/riscv-protos.h
>+++ b/gcc/config/riscv/riscv-protos.h
>@@ -54,6 +54,7 @@ extern bool riscv_split_64bit_move_p (rtx, rtx);
> extern void riscv_split_doubleword_move (rtx, rtx);
> extern const char *riscv_output_move (rtx, rtx);
> extern const char *riscv_output_return ();
>+extern bool riscv_output_popret_p (rtx);
> 
> #ifdef RTX_CODE
> extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx);
>@@ -79,6 +80,8 @@ extern void riscv_reinit (void);
> extern poly_uint64 riscv_regmode_natural_size (machine_mode);
> extern bool riscv_v_ext_vector_mode_p (machine_mode);
> extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT);
>+extern bool riscv_valid_stack_push_pop_p (rtx, bool);
>+extern bool riscv_check_regno(rtx, unsigned);
> 
> /* Routines implemented in riscv-c.cc. */
> void riscv_cpu_cpp_builtins (cpp_reader *);
>@@ -99,6 +102,7 @@ extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
> 
> rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
> rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
>+rtl_opt_pass * make_pass_zcmp_popret (gcc::context *ctxt);
> 
> /* Information about one CPU we know about. */
> struct riscv_cpu_info {
>diff --git a/gcc/config/riscv/riscv-zcmp-popret.cc b/gcc/config/riscv/riscv-zcmp-popret.cc
>new file mode 100644
>index 00000000000..d7b40f6a3e2
>--- /dev/null
>+++ b/gcc/config/riscv/riscv-zcmp-popret.cc
>@@ -0,0 +1,260 @@
>+#include "config.h"
>+#include "system.h"
>+#include "coretypes.h"
>+#include "tm.h"
>+#include "rtl.h"
>+#include "backend.h"
>+#include "regs.h"
>+#include "target.h"
>+#include "memmodel.h"
>+#include "emit-rtl.h"
>+#include "df.h"
>+#include "predict.h"
>+#include "tree-pass.h"
>+#include "tree.h"
>+#include "tm_p.h"
>+#include "optabs.h"
>+#include "recog.h"
>+#include "cfgrtl.h"
>+
>+#define IN_TARGET_CODE 1
>+
>+namespace {
>+
>+/*
>+ 1. preprocessing:
>+ 1.1. if there is no push rtx, then just return. e.g.
>+ (note 5 1 22 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>+ (insn/f 22 5 23 2 (set (reg/f:SI 2 sp)
>+ (plus:SI (reg/f:SI 2 sp)
>+ (const_int -32 [0xffffffffffffffe0])))
>+ (nil))
>+ (note 23 22 2 2 NOTE_INSN_PROLOGUE_END)
>+ 1.2. if push rtx exists, then we compute the number of
>+ pushed s-registers, n_sreg.
>+
>+ push rtx should be found before the NOTE_INSN_PROLOGUE_END tag
>+
>+ [2 and 3 happen simultaneously]
>+ 2. find valid move pattern, mv sN, aN, where N < n_sreg,
>+ and aN is not used the move pattern, and sN is not
>+ defined before the move pattern (from prologue to the
>+ position of move pattern).
>+ 3. analyze use and reach of every instruction from prologue
>+ to the position of move pattern.
>+ if any sN is used, then we mark the corresponding argument list
>+ candidate as invalid.
>+ e.g.
>+ push {ra,s0-s3}, {}, -32
>+ sw s0,44(sp) # s0 is used, then argument list is invalid
>+ mv a0,a5 # a0 is defined, then argument list is invalid
>+ ...
>+ mv s0,a0
>+ mv s1,a1
>+ mv s2,a2
>+
>+ 4. if there is a valid argument list, then replace the pop
>+ push parallel insn, and delete mv pattern.
>+ if not, skip.
>+*/
>+
>+static void
>+emit_zcmp_popret (rtx_insn *pop_rtx,
>+ rtx_insn **candidates,
>+ basic_block bb)
>+{
>+ bool gen_popretz_p = candidates [0];
>+ bool gen_popret_p = candidates [2];
>+
>+ if (!(gen_popret_p || gen_popretz_p))
>+ return;
>+
>+ gcc_assert ((gen_popret_p && !gen_popretz_p)
>+ || (gen_popretz_p && gen_popret_p));
>+
>+ rtx pop_pat = PATTERN (pop_rtx);
>+ unsigned pop_idx = 0, popret_idx = 0;
>+ unsigned n_pop_par = XVECLEN (pop_pat, 0);
>+ unsigned n_popret_par = n_pop_par
>+ + (gen_popretz_p ? 2 : 0)
>+ + (gen_popret_p ? 2 : 0);
>+
>+ rtx popret_par = gen_rtx_PARALLEL (VOIDmode,
>+ rtvec_alloc (n_popret_par));
>+
>+ /* return zero pattern */
>+ if (gen_popretz_p)
>+ {
>+ XVECEXP (popret_par, 0, 0) = PATTERN (candidates[0]);
>+ XVECEXP (popret_par, 0, 1) = PATTERN (candidates[1]);
>+ popret_idx += 2;
>+ delete_insn (candidates[0]);
>+ delete_insn (candidates[1]);
>+ }
>+
>+ /* Copy the pop sequence. */
>+ for (; pop_idx < n_pop_par;
>+ pop_idx ++, popret_idx ++)
>+ {
>+ XVECEXP (popret_par, 0, popret_idx) =
>+ XVECEXP (pop_pat, 0, pop_idx);
>+ }
>+
>+ /* ret pattern. */
>+ rtx ret_pat = PATTERN (candidates[2]);
>+ gcc_assert (GET_CODE (ret_pat) == PARALLEL);
>+
>+ for (int i = 0; i < XVECLEN (ret_pat, 0);
>+ i++, popret_idx++)
>+ {
>+ XVECEXP (popret_par, 0, popret_idx) =
>+ XVECEXP (ret_pat, 0, i);
>+ }
>+
>+ rtx_insn *insn = emit_jump_insn_after (
>+ popret_par,
>+ BB_END (bb));
>+ JUMP_LABEL (insn) = simple_return_rtx;
>+
>+ REG_NOTES (insn) = REG_NOTES (pop_rtx);
>+ RTX_FRAME_RELATED_P (insn) = 1;
>+
>+ if (dump_file)
>+ {
>+ fprintf(dump_file, "new insn:\n");
>+ print_rtl (dump_file, insn);
>+ }
>+
>+ delete_insn (candidates [2]);
>+ delete_insn (pop_rtx);
>+}
>+
>+static void
>+zcmp_popret (void)
>+{
>+ basic_block bb;
>+ rtx_insn *insn = NULL, *pop_rtx = NULL;
>+ rtx_insn *pop_candidates[3] = {NULL, };
>+ /*
>+ find NOTE_INSN_EPILOGUE_BEG, but pop_rtx not found => return
>+ find NOTE_INSN_EPILOGUE_BEG, and pop_rtx is found => looking for a0
>+ */
>+
>+ FOR_EACH_BB_REVERSE_FN (bb, cfun)
>+ {
>+ FOR_BB_INSNS_REVERSE (bb, insn)
>+ {
>+ if (!pop_rtx
>+ && NOTE_P (insn)
>+ && NOTE_KIND (insn) == NOTE_INSN_EPILOGUE_BEG)
>+ return;
>+
>+ if (NOTE_P (insn)
>+ && NOTE_KIND (insn) == NOTE_INSN_FUNCTION_BEG)
>+ {
>+ if (pop_rtx)
>+ emit_zcmp_popret (pop_rtx, pop_candidates, bb);
>+ return;
>+ };
>+
>+ if (!(NONDEBUG_INSN_P (insn)
>+ || CALL_P (insn)))
>+ continue;
>+
>+ rtx pop_pat = PATTERN (insn);
>+
>+ if (GET_CODE (pop_pat) == PARALLEL
>+ && riscv_valid_stack_push_pop_p (pop_pat, false))
>+ {
>+ pop_rtx = insn;
>+ continue;
>+ }
>+
>+ /* pattern for `ret`. */
>+ if (JUMP_P (insn)
>+ && GET_CODE (pop_pat) == PARALLEL
>+ && XVECLEN (pop_pat, 0) == 2
>+ && GET_CODE (XVECEXP (pop_pat, 0, 0)) == SIMPLE_RETURN
>+ && GET_CODE (XVECEXP (pop_pat, 0, 1)) == USE)
>+ {
>+ rtx use_reg = XEXP (XVECEXP (pop_pat, 0, 1), 0);
>+ if (REG_P (use_reg)
>+ && REGNO (use_reg) == RETURN_ADDR_REGNUM)
>+ {
>+ pop_candidates [2] = insn;
>+ continue;
>+ }
>+ }
>+
>+ if (!pop_rtx)
>+ continue;
>+
>+ /* pattern for return value. */
>+ if (!pop_candidates [0]
>+ && GET_CODE (pop_pat) == USE)
>+ {
>+ rtx_insn *set_insn = PREV_INSN (insn);
>+ rtx pat_set = PATTERN (set_insn);
>+
>+ if (riscv_check_regno (XEXP (pop_pat, 0),
>+ RETURN_VALUE_REGNUM)
>+ && insn
>+ && pat_set != NULL
>+ && GET_CODE (pat_set) == SET
>+ && riscv_check_regno (SET_DEST (pat_set),
>+ RETURN_VALUE_REGNUM)
>+ && CONST_INT_P (SET_SRC (pat_set))
>+ && INTVAL (SET_SRC (pat_set)) == 0)
>+ {
>+ pop_candidates [0] = set_insn;
>+ pop_candidates [1] = insn;
>+ break;
>+ }
>+ }
>+ }
>+
>+ if (pop_rtx)
>+ {
>+ emit_zcmp_popret (pop_rtx, pop_candidates, bb);
>+ return;
>+ }
>+ }
>+}
>+
>+const pass_data pass_data_zcmp_popret =
>+{
>+ RTL_PASS, /* type */
>+ "zcmp-popret", /* name */
>+ OPTGROUP_NONE, /* optinfo_flags */
>+ TV_NONE, /* tv_id */
>+ 0, /* properties_required */
>+ 0, /* properties_provided */
>+ 0, /* properties_destroyed */
>+ 0, /* todo_flags_start */
>+ 0, /* todo_flags_finish */
>+};
>+
>+class pass_zcmp_popret : public rtl_opt_pass
>+{
>+public:
>+ pass_zcmp_popret (gcc::context *ctxt)
>+ : rtl_opt_pass (pass_data_zcmp_popret, ctxt)
>+ {}
>+
>+ /* opt_pass methods: */
>+ virtual bool gate (function *)
>+ { return TARGET_ZCMP; }
>+ virtual unsigned int execute (function *)
>+ {
>+ zcmp_popret ();
>+ return 0;
>+ }
>+}; // class pass_zcmp_popret
>+
>+} // anon namespace
>+
>+rtl_opt_pass *
>+make_pass_zcmp_popret (gcc::context *ctxt)
>+{
>+ return new pass_zcmp_popret (ctxt);
>+}
>diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>index 5f8cbfc15ed..17df2f3f8cf 100644
>--- a/gcc/config/riscv/riscv.cc
>+++ b/gcc/config/riscv/riscv.cc
>@@ -114,6 +114,9 @@ struct GTY(()) riscv_frame_info {
> /* Likewise FPR X. */
> unsigned int fmask;
> 
>+ /* How much the push/pop routines adjust sp (or 0 if unused). */
>+ unsigned push_pop_sp_adjust;
>+
> /* How much the GPR save/restore routines adjust sp (or 0 if unused). */
> unsigned save_libcall_adjustment;
> 
>@@ -401,6 +404,20 @@ static const unsigned gpr_save_reg_order[] = {
> S10_REGNUM, S11_REGNUM
> };
> 
>+/* Order for the CLOBBERs/USEs of push/pop. */
>+static const unsigned push_save_reg_order[] = {
>+ INVALID_REGNUM, RETURN_ADDR_REGNUM, S0_REGNUM,
>+ S1_REGNUM, S2_REGNUM, S3_REGNUM, S4_REGNUM,
>+ S5_REGNUM, S6_REGNUM, S7_REGNUM, S8_REGNUM,
>+ S9_REGNUM, S10_REGNUM, S11_REGNUM
>+};
>+
>+/* Order for the CLOBBERs/USEs of push/pop in rve. */
>+static const unsigned push_save_reg_order_zcmpe[] = {
>+ INVALID_REGNUM, RETURN_ADDR_REGNUM, S0_REGNUM,
>+ S1_REGNUM
>+};
>+
> /* A table describing all the processors GCC knows about. */
> static const struct riscv_tune_info riscv_tune_info_table[] = {
> #define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO) \
>@@ -2989,6 +3006,17 @@ riscv_output_return ()
> return "ret";
> }
> 
>+bool
>+riscv_output_popret_p (rtx op)
>+{
>+ unsigned n_rtx = XVECLEN (op, 0);
>+ rtx use = XVECEXP (op, 0, n_rtx - 1);
>+ rtx ret = XVECEXP (op, 0, n_rtx - 2);
>+
>+ return GET_CODE (ret) == SIMPLE_RETURN
>+ && GET_CODE (use) == USE;
>+}
>+
> 
>
> /* Return true if CMP1 is a suitable second operand for integer ordering
> test CODE. See also the *sCC patterns in riscv.md. */
>@@ -4306,6 +4334,74 @@ riscv_memmodel_needs_amo_acquire (enum memmodel model)
> }
> }
> 
>+/* Print Sp adjustment field of pop instruction. */
>+
>+static void
>+riscv_print_pop_size (FILE *file, rtx op)
>+{
>+ unsigned sp_adjust_idx = XVECLEN (op, 0) - 1;
>+ rtx sp_adjust_rtx = XVECEXP (op, 0, sp_adjust_idx);
>+
>+ /* Skip ret or pattern. */
>+ while (GET_CODE (sp_adjust_rtx) != SET)
>+ sp_adjust_rtx = XVECEXP (op, 0, --sp_adjust_idx);
>+
>+ rtx elt_plus = SET_SRC (sp_adjust_rtx);
>+ fprintf (file, "%ld", INTVAL (XEXP (elt_plus, 1)));
>+}
>+
>+/* Print push/pop register list. */
>+
>+static void
>+riscv_print_reglist (FILE *file, rtx op)
>+{
>+ /* we only deal with three formats:
>+ push {ra}
>+ push {ra, s0}
>+ push {ra, s0-sN}
>+ or
>+ pop {ra}
>+ pop {ra, s0}
>+ pop {ra, s0-sN}
>+ registers other than ra have to be contiguous s-registers,
>+ which is supposed to have been checked beforehand.
>+ register list patterns in push:
>+ (set/f (mem/c:SI
>+ (plus:SI (reg/f:SI 2 sp)
>+ (const_int 28 [0x1c])) [2 S4 A32])
>+ (reg:SI 1 ra))
>+ register list patterns in pop:
>+ (set/f (reg:DI 1 ra)
>+ (mem/c:DI (plus:DI (reg/f:DI 2 sp)
>+ (const_int 8 [0x8])) [2 S8 A64]))
>+ */
>+ int total_count = XVECLEN (op, 0);
>+ int n_regs = 0;
>+ bool push_p = GET_CODE (XVECEXP (op, 0, 0)) == SET
>+ && GET_CODE (SET_SRC (XVECEXP (op, 0, 0))) == PLUS;
>+
>+ for (int idx = 0; idx < total_count; ++idx)
>+ {
>+ rtx ele = XVECEXP (op, 0, idx);
>+ if (GET_CODE (ele) != SET)
>+ continue;
>+
>+ bool restore_save_p = push_p ?
>+ MEM_P (SET_DEST (ele)) :
>+ MEM_P (SET_SRC (ele));
>+
>+ if (restore_save_p)
>+ n_regs ++;
>+ }
>+
>+ if (n_regs > 2)
>+ fprintf (file, "ra,s0-s%u", n_regs - 2);
>+ else if (n_regs > 1)
>+ fprintf (file, "ra,s0");
>+ else
>+ fputs("ra", file);
>+}
>+
> /* Return true if a FENCE should be emitted to before a memory access to
> implement the release portion of memory model MODEL. */
> 
>@@ -4517,6 +4613,14 @@ riscv_print_operand (FILE *file, rtx op, int letter)
> fputs (GET_RTX_NAME (code), file);
> break;
> 
>+ case 'L':
>+ riscv_print_reglist (file, op);
>+ break;
>+
>+ case 's':
>+ riscv_print_pop_size (file, op);
>+ break;
>+
> case 'S':
> {
> rtx newop = GEN_INT (ctz_hwi (INTVAL (op)));
>@@ -4777,6 +4881,66 @@ riscv_use_save_libcall (const struct riscv_frame_info *frame)
> return frame->save_libcall_adjustment != 0;
> }
> 
>+/* Determine how many instructions related to push/pop instructions. */
>+
>+static unsigned
>+riscv_save_push_pop_count (unsigned mask)
>+{
>+ if (!BITSET_P (mask, GP_REG_FIRST + RETURN_ADDR_REGNUM))
>+ return 0;
>+ for (unsigned n = GP_REG_LAST; n > GP_REG_FIRST; n--)
>+ if (BITSET_P (mask, n)
>+ && !call_used_regs [n])
>+ /* add ra saving and sp adjust. */
>+ return CALLEE_SAVED_REG_NUMBER (n) + 1 + 2;
>+ abort ();
>+}
>+
>+/* Calculate the base (minimum) sp adjustment of push/pop instruction. */
>+
>+static unsigned
>+riscv_push_pop_base_sp_adjust (unsigned mask)
>+{
>+ unsigned n_regs = riscv_save_push_pop_count (mask) - 1;
>+ return (n_regs * UNITS_PER_WORD + 15) & (~0xf);
>+}
>+
>+/* Determine whether to call push/pop routines. */
>+
>+static bool
>+riscv_use_push_pop (const struct riscv_frame_info *frame, const HOST_WIDE_INT frame_size)
>+{
>+ if (!TARGET_ZCMP)
>+ return false;
>+
>+ /* We do not handle variable argument cases currently. */
>+ if (cfun->machine->varargs_size != 0)
>+ return false;
>+
>+ HOST_WIDE_INT base_size = riscv_push_pop_base_sp_adjust (frame->mask);
>+ /*
>+ Pr 960215-1.c in rv64 outputs
>+
>+ addi sp,sp,-32
>+ sd ra,24(sp)
>+ sd s0,16(sp)
>+ sd s2,8(sp)
>+ sd s3,0(sp)
>+ In this rare case the callee-saved registers are non-contiguous,
>+ which breaks the old push implementation, so we just reject it,
>+ as save-restore does now.
>+ */
>+ if (base_size > frame_size)
>+ return false;
>+
>+ /* {ra,s0-s10} is invalid. */
>+ if (frame->mask & (1 << (S10_REGNUM - GP_REG_FIRST))
>+ && !(frame->mask & (1 << (S11_REGNUM - GP_REG_FIRST))))
>+ return false;
>+
>+ return frame->mask & (1 << (RETURN_ADDR_REGNUM - GP_REG_FIRST));
>+}
>+
> /* Determine which GPR save/restore routine to call. */
> 
> static unsigned
>@@ -4934,6 +5098,8 @@ riscv_compute_frame_info (void)
> /* Only use save/restore routines when the GPRs are atop the frame. */
> if (known_ne (frame->hard_frame_pointer_offset, frame->total_size))
> frame->save_libcall_adjustment = 0;
>+
>+ frame->push_pop_sp_adjust = 0;
> }
> 
> /* Make sure that we're not trying to eliminate to the wrong hard frame
>@@ -5171,6 +5337,86 @@ riscv_for_each_saved_reg (poly_int64 sp_offset, riscv_save_restore_fn fn,
> }
> }
> 
>+static void
>+riscv_emit_pop_insn (struct riscv_frame_info *frame, HOST_WIDE_INT offset, HOST_WIDE_INT size)
>+{
>+ unsigned int veclen = riscv_save_push_pop_count (frame->mask);
>+ unsigned int n_reg = veclen - 1;
>+ rtvec vec = rtvec_alloc (veclen);
>+ HOST_WIDE_INT sp_adjust;
>+ rtx dwarf = NULL_RTX;
>+
>+ const unsigned *reg_order = (TARGET_ZCMP && TARGET_RVE)
>+ ? push_save_reg_order_zcmpe
>+ : push_save_reg_order;
>+
>+ gcc_assert (n_reg >= 1
>+ && TARGET_ZCMP
>+ && ((TARGET_RVE && (n_reg <= ARRAY_SIZE (push_save_reg_order_zcmpe)))
>+ || (TARGET_ZCMP && (n_reg <= ARRAY_SIZE (push_save_reg_order)))));
>+
>+ /* sp adjust pattern */
>+ int max_allow_sp_adjust = riscv_push_pop_base_sp_adjust (frame->mask) + 48;
>+ int aligned_size = size;
>+
>+ /* if sp adjustment is too large, we should split it first. */
>+ if (aligned_size > max_allow_sp_adjust)
>+ {
>+ rtx dwarf_pre_sp_adjust = NULL_RTX;
>+ rtx pre_adjust_rtx = gen_add3_insn (stack_pointer_rtx,
>+ stack_pointer_rtx,
>+ GEN_INT (aligned_size - max_allow_sp_adjust));
>+ rtx insn = emit_insn (pre_adjust_rtx);
>+
>+ rtx cfa_pre_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
>+ GEN_INT (aligned_size - max_allow_sp_adjust));
>+ dwarf_pre_sp_adjust = alloc_reg_note (REG_CFA_DEF_CFA,
>+ cfa_pre_adjust_rtx,
>+ dwarf_pre_sp_adjust);
>+
>+ RTX_FRAME_RELATED_P (insn) = 1;
>+ REG_NOTES (insn) = dwarf_pre_sp_adjust;
>+
>+ sp_adjust = max_allow_sp_adjust;
>+ }
>+ else
>+ sp_adjust = (aligned_size + 15) & (~0xf);
>+
>+ /* register save sequence. */
>+ for (unsigned i = 1; i < veclen; ++i)
>+ {
>+ offset -= UNITS_PER_WORD;
>+ unsigned regno = reg_order[i];
>+ rtx reg = gen_rtx_REG (Pmode, regno);
>+ rtx mem = gen_frame_mem (Pmode, plus_constant (Pmode,
>+ stack_pointer_rtx,
>+ offset));
>+ rtx set = gen_rtx_SET (reg, mem);
>+ RTVEC_ELT (vec, i - 1) = set;
>+ RTX_FRAME_RELATED_P (set) = 1;
>+ dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
>+ }
>+
>+ /* sp adjust pattern */
>+ rtx adjust_sp_rtx
>+ = gen_rtx_SET (stack_pointer_rtx,
>+ plus_constant (Pmode,
>+ stack_pointer_rtx,
>+ sp_adjust));
>+ RTVEC_ELT (vec, veclen - 1) = adjust_sp_rtx;
>+
>+ rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
>+ const0_rtx);
>+ dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
>+
>+ frame->gp_sp_offset -= (veclen - 1) * UNITS_PER_WORD;
>+ frame->push_pop_sp_adjust = sp_adjust;
>+
>+ rtx insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, vec));
>+ RTX_FRAME_RELATED_P (insn) = 1;
>+ REG_NOTES (insn) = dwarf;
>+}
>+
> /* For stack frames that can't be allocated with a single ADDI instruction,
> compute the best value to initially allocate. It must at a minimum
> allocate enough space to spill the callee-saved registers. If TARGET_RVC,
>@@ -5270,6 +5516,146 @@ riscv_emit_stack_tie (void)
> emit_insn (gen_stack_tiedi (stack_pointer_rtx, hard_frame_pointer_rtx));
> }
> 
>+bool
>+riscv_check_regno(rtx pat, unsigned regno)
>+{
>+ return REG_P (pat)
>+ && REGNO (pat) == regno;
>+}
>+
>+/* Function to check whether the OP is a valid stack push/pop operation.
>+ This part is borrowed from nds32 nds32_valid_stack_push_pop_p */
>+
>+bool
>+riscv_valid_stack_push_pop_p (rtx op, bool push_p)
>+{
>+ int index;
>+ int total_count;
>+ int sp_adjust_rtx_index;
>+ rtx elt;
>+ rtx elt_reg;
>+ rtx elt_plus;
>+
>+ if (!TARGET_ZCMP)
>+ return false;
>+
>+ total_count = XVECLEN (op, 0);
>+ sp_adjust_rtx_index = push_p ? 0 : total_count - 1;
>+
>+ /* At least sp + one callee save/restore register rtx */
>+ if (total_count < 2)
>+ return false;
>+
>+ /* Perform some quick check for that every element should be 'set',
>+ for pop, it might contain `ret` and `ret value` pattern. */
>+ for (index = 0; index < total_count; index++)
>+ {
>+ elt = XVECEXP (op, 0, index);
>+
>+ /* skip pop return value rtx */
>+ if (!push_p && GET_CODE (elt) == SET
>+ && riscv_check_regno (SET_DEST (elt), RETURN_VALUE_REGNUM)
>+ && total_count >= 4
>+ && index + 1 < total_count
>+ && GET_CODE (XVECEXP (op, 0, index + 1)) == USE)
>+ {
>+ rtx use_reg = XEXP (XVECEXP (op, 0, index + 1), 0);
>+
>+ if (!riscv_check_regno (use_reg, RETURN_VALUE_REGNUM))
>+ return false;
>+
>+ index += 1;
>+ continue;
>+ }
>+
>+ /* skip ret rtx */
>+ if (!push_p && GET_CODE (elt) == SIMPLE_RETURN
>+ && total_count >= 4
>+ && index + 1 < total_count
>+ && GET_CODE (XVECEXP (op, 0, index + 1)) == USE)
>+ {
>+ rtx use_reg = XEXP (XVECEXP (op, 0, index + 1), 0);
>+
>+ if (!riscv_check_regno (use_reg, RETURN_ADDR_REGNUM))
>+ return false;
>+
>+ index += 1;
>+ sp_adjust_rtx_index -= 2;
>+ continue;
>+ }
>+
>+ if (GET_CODE (elt) != SET)
>+ return false;
>+ }
>+
>+ elt = XVECEXP (op, 0, sp_adjust_rtx_index);
>+ elt_reg = SET_DEST (elt);
>+ elt_plus = SET_SRC (elt);
>+
>+ /* Check this is (set (stack_reg) (plus stack_reg const)) pattern. */
>+ if (GET_CODE (elt_plus) != PLUS
>+ || !riscv_check_regno (elt_reg, STACK_POINTER_REGNUM))
>+ return false;
>+
>+ /* Pass all test, this is a valid rtx. */
>+ return true;
>+}
>+
>+/* Generate push/pop rtx */
>+
>+static void
>+riscv_emit_push_insn (struct riscv_frame_info *frame, HOST_WIDE_INT size)
>+{
>+ unsigned int veclen = riscv_save_push_pop_count (frame->mask);
>+ unsigned int n_reg = veclen - 1;
>+ rtvec vec = rtvec_alloc (veclen);
>+
>+ const unsigned *reg_order = (TARGET_ZCMP && TARGET_RVE)
>+ ? push_save_reg_order_zcmpe
>+ : push_save_reg_order;
>+
>+ int aligned_size = (size + 15) & (~0xf);
>+
>+ gcc_assert (n_reg >= 1
>+ && TARGET_ZCMP
>+ && ((TARGET_RVE && (n_reg <= ARRAY_SIZE (push_save_reg_order_zcmpe)))
>+ || (TARGET_ZCMP && (n_reg <= ARRAY_SIZE (push_save_reg_order)))));
>+
>+ /* sp adjust pattern */
>+ int max_allow_sp_adjust = riscv_push_pop_base_sp_adjust (frame->mask) + 48;
>+ int sp_adjust = aligned_size > max_allow_sp_adjust ?
>+ max_allow_sp_adjust
>+ : aligned_size;
>+
>+ /*TODO: move this part to frame computation function. */
>+ frame->gp_sp_offset = (veclen - 1) * UNITS_PER_WORD;
>+ frame->push_pop_sp_adjust = sp_adjust;
>+
>+ rtx adjust_sp_rtx
>+ = gen_rtx_SET (stack_pointer_rtx,
>+ plus_constant (Pmode,
>+ stack_pointer_rtx,
>+ -sp_adjust));
>+ RTVEC_ELT (vec, 0) = adjust_sp_rtx;
>+
>+ /* Register save sequence. */
>+ for (unsigned i = 1; i < veclen; ++i)
>+ {
>+ sp_adjust -= UNITS_PER_WORD;
>+ unsigned regno = reg_order[i];
>+ rtx reg = gen_rtx_REG (Pmode, regno);
>+ rtx mem = gen_frame_mem (Pmode, plus_constant (Pmode,
>+ stack_pointer_rtx,
>+ sp_adjust));
>+ rtx set = gen_rtx_SET (mem, reg);
>+ RTVEC_ELT (vec, i) = set;
>+ RTX_FRAME_RELATED_P (set) = 1;
>+ }
>+
>+ rtx insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, vec));
>+ RTX_FRAME_RELATED_P (insn) = 1;
>+}
>+
> /* Expand the "prologue" pattern. */
> 
> void
>@@ -5278,6 +5664,7 @@ riscv_expand_prologue (void)
> struct riscv_frame_info *frame = &cfun->machine->frame;
> poly_int64 size = frame->total_size;
> unsigned mask = frame->mask;
>+ HOST_WIDE_INT step1 = riscv_first_stack_step (frame);
> rtx insn;
> 
> if (flag_stack_usage_info)
>@@ -5300,19 +5687,32 @@ riscv_expand_prologue (void)
> REG_NOTES (insn) = dwarf;
> }
> 
>+ if (size.is_constant ())
>+ step1 = MIN (size.to_constant(), step1);
>+ if (riscv_use_push_pop (frame, step1))
>+ {
>+ riscv_emit_push_insn (frame, step1);
>+
>+ step1 = MAX (step1 - frame->push_pop_sp_adjust, 0);
>+ size = MAX (size.to_constant() - frame->push_pop_sp_adjust, 0);
>+ frame->mask &= ~ ((TARGET_ZCMP && TARGET_RVE) ?
>+ RISCV_ZCMPE_PUSH_POP_MASK
>+ : RISCV_ZCE_PUSH_POP_MASK);
>+ }
>+
> /* Save the registers. */
> if ((frame->mask | frame->fmask) != 0)
> {
>- HOST_WIDE_INT step1 = riscv_first_stack_step (frame);
>- if (size.is_constant ())
>- step1 = MIN (size.to_constant(), step1);
>-
>- insn = gen_add3_insn (stack_pointer_rtx,
>- stack_pointer_rtx,
>- GEN_INT (-step1));
>- RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
>- size -= step1;
>- riscv_for_each_saved_reg (size, riscv_save_reg, false, false);
>+ if (step1 > 0)
>+ {
>+ insn = gen_add3_insn (stack_pointer_rtx,
>+ stack_pointer_rtx,
>+ GEN_INT (-step1));
>+ RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
>+ size -= step1;
>+ }
>+ riscv_for_each_saved_reg (size, riscv_save_reg,
>+ false /* bool epilogue */, false /* bool maybe_eh_return */);
> }
> 
> frame->mask = mask; /* Undo the above fib. */
>@@ -5412,6 +5812,8 @@ riscv_expand_epilogue (int style)
> rtx ra = gen_rtx_REG (Pmode, RETURN_ADDR_REGNUM);
> rtx insn;
> 
>+ bool use_zcmp_pop = !use_restore_libcall && !(crtl->calls_eh_return);
>+
> /* We need to add memory barrier to prevent read from deallocated stack. */
> bool need_barrier_p = known_ne (get_frame_size ()
> + cfun->machine->frame.arg_pointer_offset, 0);
>@@ -5538,6 +5940,18 @@ riscv_expand_epilogue (int style)
> if (use_restore_libcall)
> frame->mask = 0; /* Temporarily fib that we need not save GPRs. */
> 
>+ if (use_zcmp_pop && riscv_use_push_pop (frame, step2))
>+ {
>+ /* Emit a barrier to prevent loads from a deallocated stack. */
>+ riscv_emit_stack_tie ();
>+ need_barrier_p = false;
>+ riscv_emit_pop_insn (frame, frame->total_size.to_constant(), step2);
>+ frame->mask &= ~ ((TARGET_ZCMP && TARGET_RVE) ?
>+ RISCV_ZCMPE_PUSH_POP_MASK
>+ : RISCV_ZCE_PUSH_POP_MASK);
>+ step2 = 0;
>+ }
>+
> /* Restore the registers. */
> riscv_for_each_saved_reg (frame->total_size - step2, riscv_restore_reg,
> true, style == EXCEPTION_RETURN);
>@@ -5552,6 +5966,9 @@ riscv_expand_epilogue (int style)
> if (need_barrier_p)
> riscv_emit_stack_tie ();
> 
>+ if (use_zcmp_pop)
>+ frame->mask = mask;
>+
> /* Deallocate the final bit of the frame. */
> if (step2 > 0)
> {
>diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
>index d05b1d59853..6e6e3ee2c25 100644
>--- a/gcc/config/riscv/riscv.h
>+++ b/gcc/config/riscv/riscv.h
>@@ -383,6 +383,7 @@ ASM_MISA_SPEC
> #define HARD_FRAME_POINTER_REGNUM 8
> #define STACK_POINTER_REGNUM 2
> #define THREAD_POINTER_REGNUM 4
>+#define RETURN_VALUE_REGNUM 10
> 
> /* These two registers don't really exist: they get eliminated to either
> the stack or hard frame pointer. */
>@@ -1097,4 +1098,7 @@ extern void riscv_remove_unneeded_save_restore_calls (void);
> #define DWARF_REG_TO_UNWIND_COLUMN(REGNO) \
> ((REGNO == RISCV_DWARF_VLENB) ? (FIRST_PSEUDO_REGISTER + 1) : REGNO)
> 
>+#define RISCV_ZCE_PUSH_POP_MASK 0x0ffc0302u
>+#define RISCV_ZCMPE_PUSH_POP_MASK 0x302u
>+
> #endif /* ! GCC_RISCV_H */
>diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
>index bc384d9aedf..b9f2a426e48 100644
>--- a/gcc/config/riscv/riscv.md
>+++ b/gcc/config/riscv/riscv.md
>@@ -108,12 +108,14 @@
> 
> (define_constants
> [(RETURN_ADDR_REGNUM 1)
>+ (SP_REGNUM 2)
> (GP_REGNUM 3)
> (TP_REGNUM 4)
> (T0_REGNUM 5)
> (T1_REGNUM 6)
> (S0_REGNUM 8)
> (S1_REGNUM 9)
>+ (A0_REGNUM 10)
> (S2_REGNUM 18)
> (S3_REGNUM 19)
> (S4_REGNUM 20)
>@@ -3147,3 +3149,4 @@
> (include "sifive-7.md")
> (include "thead.md")
> (include "vector.md")
>+(include "zc.md")
>diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
>index 6e326fc7e02..9ef522306a5 100644
>--- a/gcc/config/riscv/t-riscv
>+++ b/gcc/config/riscv/t-riscv
>@@ -90,6 +90,10 @@ riscv-v.o: $(srcdir)/config/riscv/riscv-v.cc \
> $(COMPILE) $<
> $(POSTCOMPILE)
> 
>+riscv-zcmp-popret.o: $(srcdir)/config/riscv/riscv-zcmp-popret.cc
>+ $(COMPILE) $<
>+ $(POSTCOMPILE)
>+
> thead.o: $(srcdir)/config/riscv/thead.cc \
> $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TARGET_H) backend.h $(RTL_H) \
> memmodel.h $(EMIT_RTL_H) poly-int.h output.h
>diff --git a/gcc/config/riscv/zc.md b/gcc/config/riscv/zc.md
>new file mode 100644
>index 00000000000..3ad34dacd49
>--- /dev/null
>+++ b/gcc/config/riscv/zc.md
>@@ -0,0 +1,47 @@
>+;; Machine description for ZCE extension.
>+;; Copyright (C) 2021 Free Software Foundation, Inc.
>+
>+;; This file is part of GCC.
>+
>+;; GCC is free software; you can redistribute it and/or modify
>+;; it under the terms of the GNU General Public License as published by
>+;; the Free Software Foundation; either version 3, or (at your option)
>+;; any later version.
>+
>+;; GCC is distributed in the hope that it will be useful,
>+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
>+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>+;; GNU General Public License for more details.
>+
>+;; You should have received a copy of the GNU General Public License
>+;; along with GCC; see the file COPYING3. If not see
>+;; <http://www.gnu.org/licenses/>.
>+
>+(define_insn "*stack_push<mode>"
>+ [(match_parallel 0 "riscv_stack_push_operation"
>+ [(set (reg:X SP_REGNUM) (plus:X (reg:X SP_REGNUM)
>+ (match_operand:X 1 "const_int_operand" "")))])]
>+ "TARGET_ZCMP"
>+ "cm.push\t{%L0},%1")
>+
>+(define_insn "*stack_pop<mode>"
>+ [(match_parallel 0 "riscv_stack_pop_operation"
>+ [(set (match_operand:X 1 "register_operand" "")
>+ (mem:X (plus:X (reg:X SP_REGNUM)
>+ (match_operand:X 2 "const_int_operand" ""))))])]
>+ "TARGET_ZCMP"
>+ {
>+ return riscv_output_popret_p (operands[0]) ?
>+ "cm.popret\t{%L0},%s0" :
>+ "cm.pop\t{%L0},%s0";
>+ })
>+
>+(define_insn "*stack_pop_with_return_value<mode>"
>+ [(match_parallel 0 "riscv_stack_pop_operation"
>+ [(set (reg:ANYI A0_REGNUM)
>+ (match_operand:ANYI 1 "pop_return_value_constant" ""))])]
>+ "TARGET_ZCMP"
>+ {
>+ gcc_assert (riscv_output_popret_p (operands[0]));
>+ return "cm.popretz\t{%L0},%s0";
>+ })
>-- 
>2.25.1
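
For reference, the two masks added to riscv.h in the patch above can be
decoded with a small standalone sketch. This is not part of the patch; the
helper name push_pop_mask is made up for illustration, and the sketch assumes
the usual RISC-V numbering (ra = x1, s0-s1 = x8-x9, s2-s11 = x18-x27).

```c
#include <stdint.h>

/* Hypothetical helper, not from the patch: build a GPR bit mask holding
   ra (x1) and s0-s1 (x8-x9), plus s2-s11 (x18-x27) when WITH_S2_S11 is
   nonzero.  */
static uint32_t
push_pop_mask (int with_s2_s11)
{
  uint32_t mask = 1u << 1;             /* ra */
  mask |= (1u << 8) | (1u << 9);       /* s0, s1 */
  if (with_s2_s11)
    for (unsigned regno = 18; regno <= 27; regno++)
      mask |= 1u << regno;             /* s2 .. s11 */
  return mask;
}
```

Under those assumptions, push_pop_mask (0) gives the RVE value 0x302 and
push_pop_mask (1) gives 0x0ffc0302, which suggests the masks simply cover
every register a Zcmp register list can name.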


* Re: Re: [PATCH 4/5] RISC-V: Add Zcmp extension supports.
  2023-05-05 15:57 ` Sinan
@ 2023-05-06  8:53   ` Fei Gao
  2023-05-12  8:12     ` Sinan
  0 siblings, 1 reply; 9+ messages in thread
From: Fei Gao @ 2023-05-06  8:53 UTC (permalink / raw)
  To: Sinan; +Cc: jiawei, gcc-patches

On 2023-05-05 23:57  Sinan <sinan.lin@linux.alibaba.com> wrote:
>
>> hi Jiawei
>>
>> Please ignore my previous reply. I accidentally sent the email before I finished it.
>> Sorry for that!
>>
>> I downloaded the series of patches from you and found in some cases
>> it fails to generate zcmp push and pop insns.
>>
>> TC:
>>
>> char my_getchar();
>> int test_s0()
>> {
>>
>> int a = my_getchar();
>> int b = my_getchar();
>> return a+b;
>> }
>>
>> cc1 -fno-shrink-wrap-separate -O2 -march=rv32e_zca_zcmp -mabi=ilp32e -mcmodel=medlow test.c
>>
>> -fno-shrink-wrap-separate is used here to avoid the impact of shrink-wrap-separate, which is enabled by
>> default at -O2.
>>
>> As I'm also interested in Zc*, I made some changes, mainly in the prologue and epilogue pass, quite similar to
>> what has been done for save and restore, except for the CFI directives, because zcmp pushes and pops
>> ra and the s regs in the reverse order of what save and restore do.
>>
>> I will refine and share the code soon for your review.
>>
>> BR
>> Fei
>Hi Fei,
>In the current implementation, cm.push never enlarges the original stack pointer adjustment. cm.push has a minimum adjustment of 16 bytes, but in your example the sp adjustment is only 12, so cm.push is not generated.
>You can find the check in riscv_use_push_pop:
>> > + */
>> > + if (base_size > frame_size)
>> > + return false;
>> > +
>And if this check is removed, then you can get the output that you expect.
>```
> cm.push {ra,s0},-16
> call my_getchar
> mv s0,a0
> call my_getchar
> add a0,s0,a0
> cm.popret {ra,s0},16
>```
>As a result, cm.push cannot be generated in many rv32e scenarios. Perhaps we can remove this check? I haven't tested whether removing it is safe, so I'm CCing Jiawei to help test it.
>BR,
>Sinan 
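
To make the arithmetic in the quoted explanation concrete, here is a minimal
standalone sketch of the base-adjustment check. It is not the patch code: the
names WORD_BYTES, base_sp_adjust and push_rejected_p are made up for
illustration, and it assumes rv32 (4-byte registers) with two saved registers
(ra and s0), as in the test case.

```c
#include <stdbool.h>

/* Hypothetical stand-in for UNITS_PER_WORD on rv32.  */
#define WORD_BYTES 4

/* Smallest sp adjustment for N_REGS saved registers: the save area rounded
   up to a multiple of 16.  This mirrors the rounding done in
   riscv_push_pop_base_sp_adjust in the quoted patch.  */
static unsigned
base_sp_adjust (unsigned n_regs)
{
  return (n_regs * WORD_BYTES + 15) & ~0xfu;
}

/* The case the quoted check rejects: the base adjustment is larger than
   the frame size, so using cm.push would enlarge the frame.  */
static bool
push_rejected_p (unsigned n_regs, unsigned frame_size)
{
  return base_sp_adjust (n_regs) > frame_size;
}
```

With two registers the base adjustment is 16, so a 12-byte frame is rejected
while a 16-byte frame is accepted.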

Hi Sinan,

Thanks for your reply.
I posted my code at https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg306921.html
In the cover letter, I did some comparison.
Could you please review it?

Thanks & BR, 
Fei

>------------------------------------------------------------------
>Sender:Fei Gao <gaofei@eswincomputing.com>
>Sent At:2023 Apr. 25 (Tue.) 18:12
>Recipient:jiawei <jiawei@iscas.ac.cn>
>Cc:gcc-patches <gcc-patches@gcc.gnu.org>
>Subject:[PATCH 4/5] RISC-V: Add Zcmp extension supports.
>hi Jiawei
>Please ignore my previous reply. I accidentally sent the email before I finished it.
>Sorry for that!
>I downloaded the series of patches from you and found in some cases
>it fails to generate zcmp push and pop insns.
>TC:
>char my_getchar();
>int test_s0()
>{
> int a = my_getchar();
> int b = my_getchar();
> return a+b;
>}
>cc1 -fno-shrink-wrap-separate -O2 -march=rv32e_zca_zcmp -mabi=ilp32e -mcmodel=medlow test.c
>-fno-shrink-wrap-separate is used here to avoid the impact of shrink-wrap-separate, which is enabled by
>default at -O2.
>As I'm also interested in Zc*, I made some changes, mainly in the prologue and epilogue pass, quite similar to
>what has been done for save and restore, except for the CFI directives, because zcmp pushes and pops
>ra and the s regs in the reverse order of what save and restore do.
>I will refine and share the code soon for your review.
>BR
>Fei
>On Thu Apr 6 06:21:17 GMT 2023 Jiawei jiawei@iscas.ac.cn wrote:
>>
>>Add Zcmp extension instruction support. Generate push/pop
>>with the following steps:
>>
>> 1. preprocessing:
>> 1.1. if there is no push rtx, then just return. e.g.
>> (note 5 1 22 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>> (insn/f 22 5 23 2 (set (reg/f:SI 2 sp)
>> (plus:SI (reg/f:SI 2 sp)
>> (const_int -32 [0xffffffffffffffe0])))
>> (nil))
>> (note 23 22 2 2 NOTE_INSN_PROLOGUE_END)
>> 1.2. if push rtx exists, then we compute the number of
>> pushed s-registers, n_sreg.
>>
>> push rtx should be found before NOTE_INSN_PROLOGUE_END tag
>>
>> [2 and 3 happen simultaneously]
>>
>> 2. find valid move pattern, mv sN, aN, where N < n_sreg,
>> and aN is not used the move pattern, and sN is not
>> defined before the move pattern (from prologue to the
>> position of move pattern).
>>
>> 3. analyze the use and reach of every instruction from prologue
>> to the position of move pattern.
>> if any sN is used, then we mark the corresponding argument list
>> candidate as invalid.
>> e.g.
>> push {ra,s0-s3}, {}, -32
>> sw s0,44(sp) # s0 is used, then argument list is invalid
>> mv a0,a5 # a0 is defined, then argument list is invalid
>> ...
>> mv s0,a0
>> mv s1,a1
>> mv s2,a2
>>
>> 4. if there is a valid argument list, then replace the pop
>> push parallel insn, and delete mv pattern.
>> if not, skip.
>>
>>All "zcmpe" means Zcmp with RVE extension.
>>The push/pop instruction implementation was mostly done by Sinan Lin.
>>
>>Co-Authored by: Sinan Lin <sinan....@linux.alibaba.com>
>>Co-Authored by: Simon Cook <simon.c...@embecosm.com>
>>Co-Authored by: Shihua Liao <shi...@iscas.ac.cn>
>>
>>gcc/ChangeLog:
>>
>> * config.gcc: New object.
>> * config/riscv/predicates.md (riscv_stack_push_operation):
>> New predicate.
>> (riscv_stack_pop_operation): Ditto.
>> (pop_return_value_constant): Ditto.
>> * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): New pass.
>> * config/riscv/riscv-protos.h (riscv_output_popret_p):
>> New routine.
>> (riscv_valid_stack_push_pop_p): Ditto.
>> (riscv_check_regno): Ditto.
>> (make_pass_zcmp_popret): New pass.
>> * config/riscv/riscv.cc (struct riscv_frame_info): New variable.
>> (riscv_output_popret_p): New function.
>> (riscv_print_pop_size): Ditto.
>> (riscv_print_reglist): Ditto.
>> (riscv_print_operand): New case symbols.
>> (riscv_save_push_pop_count): New function.
>> (riscv_push_pop_base_sp_adjust): Ditto.
>> (riscv_use_push_pop): Ditto.
>> (riscv_compute_frame_info): Adjust frame value.
>> (riscv_emit_pop_insn): New function.
>> (riscv_check_regno): Ditto.
>> (riscv_valid_stack_push_pop_p): Ditto.
>> (riscv_emit_push_insn): Ditto.
>> (riscv_expand_prologue): Modify frame pattern.
>> (riscv_expand_epilogue): Ditto.
>> * config/riscv/riscv.h (RETURN_VALUE_REGNUM):
>> (RISCV_ZCE_PUSH_POP_MASK): New mask.
>> (RISCV_ZCMPE_PUSH_POP_MASK): Ditto.
>> * config/riscv/riscv.md: Add new reg number and include info.
>> * config/riscv/t-riscv: New object rules.
>> * config/riscv/riscv-zcmp-popret.cc: New file.
>> * config/riscv/zc.md: New file.
>>---
>> gcc/config.gcc | 2 +-
>> gcc/config/riscv/predicates.md | 16 +
>> gcc/config/riscv/riscv-passes.def | 1 +
>> gcc/config/riscv/riscv-protos.h | 4 +
>> gcc/config/riscv/riscv-zcmp-popret.cc | 260 +++++++++++++++
>> gcc/config/riscv/riscv.cc | 437 +++++++++++++++++++++++++-
>> gcc/config/riscv/riscv.h | 4 +
>> gcc/config/riscv/riscv.md | 3 +
>> gcc/config/riscv/t-riscv | 4 +
>> gcc/config/riscv/zc.md | 47 +++
>> 10 files changed, 767 insertions(+), 11 deletions(-)
>> create mode 100644 gcc/config/riscv/riscv-zcmp-popret.cc
>> create mode 100644 gcc/config/riscv/zc.md
>>
>>diff --git a/gcc/config.gcc b/gcc/config.gcc
>>index 629d324b5ef..a991c5273f9 100644
>>--- a/gcc/config.gcc
>>+++ b/gcc/config.gcc
>>@@ -529,7 +529,7 @@ pru-*-*)
>> ;;
>> riscv*)
>> cpu_type=riscv
>>- extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o"
>>+ extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o riscv-zcmp-popret.o"
>> extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
>> extra_objs="${extra_objs} thead.o"
>> d_target_objs="riscv-d.o"
>>diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
>>index 0d9d7701c7e..6bff6cd047a 100644
>>--- a/gcc/config/riscv/predicates.md
>>+++ b/gcc/config/riscv/predicates.md
>>@@ -412,3 +412,19 @@
>> (and (match_code "const_int")
>> (ior (match_operand 0 "not_uimm_extra_bit_operand")
>> (match_operand 0 "const_nottwobits_operand"))))
>>+
>>+(define_special_predicate "riscv_stack_push_operation"
>>+ (match_code "parallel")
>>+{
>>+ return riscv_valid_stack_push_pop_p (op, true);
>>+})
>>+
>>+(define_special_predicate "riscv_stack_pop_operation"
>>+ (match_code "parallel")
>>+{
>>+ return riscv_valid_stack_push_pop_p (op, false);
>>+})
>>+
>>+(define_predicate "pop_return_value_constant"
>>+ (and (match_code "const_int")
>>+ (match_test "INTVAL (op) == 0")))
>>diff --git a/gcc/config/riscv/riscv-passes.def b/gcc/config/riscv/riscv-passes.def
>>index 4084122cf0a..25625b9af3e 100644
>>--- a/gcc/config/riscv/riscv-passes.def
>>+++ b/gcc/config/riscv/riscv-passes.def
>>@@ -19,3 +19,4 @@
>>
>> INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
>> INSERT_PASS_BEFORE (pass_fast_rtl_dce, 1, pass_vsetvl);
>>+INSERT_PASS_AFTER (pass_cprop_hardreg, 1, pass_zcmp_popret);
>>diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
>>index 4611447ddde..8f243cd5f44 100644
>>--- a/gcc/config/riscv/riscv-protos.h
>>+++ b/gcc/config/riscv/riscv-protos.h
>>@@ -54,6 +54,7 @@ extern bool riscv_split_64bit_move_p (rtx, rtx);
>> extern void riscv_split_doubleword_move (rtx, rtx);
>> extern const char *riscv_output_move (rtx, rtx);
>> extern const char *riscv_output_return ();
>>+extern bool riscv_output_popret_p (rtx);
>>
>> #ifdef RTX_CODE
>> extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx);
>>@@ -79,6 +80,8 @@ extern void riscv_reinit (void);
>> extern poly_uint64 riscv_regmode_natural_size (machine_mode);
>> extern bool riscv_v_ext_vector_mode_p (machine_mode);
>> extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT);
>>+extern bool riscv_valid_stack_push_pop_p (rtx, bool);
>>+extern bool riscv_check_regno(rtx, unsigned);
>>
>> /* Routines implemented in riscv-c.cc. */
>> void riscv_cpu_cpp_builtins (cpp_reader *);
>>@@ -99,6 +102,7 @@ extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
>>
>> rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
>> rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
>>+rtl_opt_pass * make_pass_zcmp_popret (gcc::context *ctxt);
>>
>> /* Information about one CPU we know about. */
>> struct riscv_cpu_info {
>>diff --git a/gcc/config/riscv/riscv-zcmp-popret.cc b/gcc/config/riscv/riscv-zcmp-popret.cc
>>new file mode 100644
>>index 00000000000..d7b40f6a3e2
>>--- /dev/null
>>+++ b/gcc/config/riscv/riscv-zcmp-popret.cc
>>@@ -0,0 +1,260 @@
>>+#include "config.h"
>>+#include "system.h"
>>+#include "coretypes.h"
>>+#include "tm.h"
>>+#include "rtl.h"
>>+#include "backend.h"
>>+#include "regs.h"
>>+#include "target.h"
>>+#include "memmodel.h"
>>+#include "emit-rtl.h"
>>+#include "df.h"
>>+#include "predict.h"
>>+#include "tree-pass.h"
>>+#include "tree.h"
>>+#include "tm_p.h"
>>+#include "optabs.h"
>>+#include "recog.h"
>>+#include "cfgrtl.h"
>>+
>>+#define IN_TARGET_CODE 1
>>+
>>+namespace {
>>+
>>+/*
>>+ 1. preprocessing:
>>+ 1.1. if there is no push rtx, then just return. e.g.
>>+ (note 5 1 22 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>>+ (insn/f 22 5 23 2 (set (reg/f:SI 2 sp)
>>+ (plus:SI (reg/f:SI 2 sp)
>>+ (const_int -32 [0xffffffffffffffe0])))
>>+ (nil))
>>+ (note 23 22 2 2 NOTE_INSN_PROLOGUE_END)
>>+ 1.2. if push rtx exists, then we compute the number of
>>+ pushed s-registers, n_sreg.
>>+
>>+ push rtx should be found before NOTE_INSN_PROLOGUE_END tag
>>+
>>+ [2 and 3 happen simultaneously]
>>+ 2. find valid move pattern, mv sN, aN, where N < n_sreg,
>>+ and aN is not used the move pattern, and sN is not
>>+ defined before the move pattern (from prologue to the
>>+ position of move pattern).
>>+ 3. analyze the use and reach of every instruction from prologue
>>+ to the position of move pattern.
>>+ if any sN is used, then we mark the corresponding argument list
>>+ candidate as invalid.
>>+ e.g.
>>+ push {ra,s0-s3}, {}, -32
>>+ sw s0,44(sp) # s0 is used, then argument list is invalid
>>+ mv a0,a5 # a0 is defined, then argument list is invalid
>>+ ...
>>+ mv s0,a0
>>+ mv s1,a1
>>+ mv s2,a2
>>+
>>+ 4. if there is a valid argument list, then replace the pop
>>+ push parallel insn, and delete mv pattern.
>>+ if not, skip.
>>+*/
>>+
>>+static void
>>+emit_zcmp_popret (rtx_insn *pop_rtx,
>>+ rtx_insn **candidates,
>>+ basic_block bb)
>>+{
>>+ bool gen_popretz_p = candidates [0];
>>+ bool gen_popret_p = candidates [2];
>>+
>>+ if (!(gen_popret_p || gen_popretz_p))
>>+ return;
>>+
>>+ gcc_assert ((gen_popret_p && !gen_popretz_p)
>>+ || (gen_popretz_p && gen_popret_p));
>>+
>>+ rtx pop_pat = PATTERN (pop_rtx);
>>+ unsigned pop_idx = 0, popret_idx = 0;
>>+ unsigned n_pop_par = XVECLEN (pop_pat, 0);
>>+ unsigned n_popret_par = n_pop_par
>>+ + (gen_popretz_p ? 2 : 0)
>>+ + (gen_popret_p ? 2 : 0);
>>+
>>+ rtx popret_par = gen_rtx_PARALLEL (VOIDmode,
>>+ rtvec_alloc (n_popret_par));
>>+
>>+ /* return zero pattern */
>>+ if (gen_popretz_p)
>>+ {
>>+ XVECEXP (popret_par, 0, 0) = PATTERN (candidates[0]);
>>+ XVECEXP (popret_par, 0, 1) = PATTERN (candidates[1]);
>>+ popret_idx += 2;
>>+ delete_insn (candidates[0]);
>>+ delete_insn (candidates[1]);
>>+ }
>>+
>>+ /* copy pop sequence. */
>>+ for (; pop_idx < n_pop_par;
>>+ pop_idx ++, popret_idx ++)
>>+ {
>>+ XVECEXP (popret_par, 0, popret_idx) =
>>+ XVECEXP (pop_pat, 0, pop_idx);
>>+ }
>>+
>>+ /* ret pattern. */
>>+ rtx ret_pat = PATTERN (candidates[2]);
>>+ gcc_assert (GET_CODE (ret_pat) == PARALLEL);
>>+
>>+ for (int i = 0; i < XVECLEN (ret_pat, 0);
>>+ i++, popret_idx++)
>>+ {
>>+ XVECEXP (popret_par, 0, popret_idx) =
>>+ XVECEXP (ret_pat, 0, i);
>>+ }
>>+
>>+ rtx_insn *insn = emit_jump_insn_after (
>>+ popret_par,
>>+ BB_END (bb));
>>+ JUMP_LABEL (insn) = simple_return_rtx;
>>+
>>+ REG_NOTES (insn) = REG_NOTES (pop_rtx);
>>+ RTX_FRAME_RELATED_P (insn) = 1;
>>+
>>+ if (dump_file)
>>+ {
>>+ fprintf(dump_file, "new insn:\n");
>>+ print_rtl (dump_file, insn);
>>+ }
>>+
>>+ delete_insn (candidates [2]);
>>+ delete_insn (pop_rtx);
>>+}
>>+
>>+static void
>>+zcmp_popret (void)
>>+{
>>+ basic_block bb;
>>+ rtx_insn *insn = NULL, *pop_rtx = NULL;
>>+ rtx_insn *pop_candidates[3] = {NULL, };
>>+ /*
>>+ find NOTE_INSN_EPILOGUE_BEG, but pop_rtx not found => return
>>+ find NOTE_INSN_EPILOGUE_BEG, and pop_rtx is found => looking for a0
>>+ */
>>+
>>+ FOR_EACH_BB_REVERSE_FN (bb, cfun)
>>+ {
>>+ FOR_BB_INSNS_REVERSE (bb, insn)
>>+ {
>>+ if (!pop_rtx
>>+ && NOTE_P (insn)
>>+ && NOTE_KIND (insn) == NOTE_INSN_EPILOGUE_BEG)
>>+ return;
>>+
>>+ if (NOTE_P (insn)
>>+ && NOTE_KIND (insn) == NOTE_INSN_FUNCTION_BEG)
>>+ {
>>+ if (pop_rtx)
>>+ emit_zcmp_popret (pop_rtx, pop_candidates, bb);
>>+ return;
>>+ };
>>+
>>+ if (!(NONDEBUG_INSN_P (insn)
>>+ || CALL_P (insn)))
>>+ continue;
>>+
>>+ rtx pop_pat = PATTERN (insn);
>>+
>>+ if (GET_CODE (pop_pat) == PARALLEL
>>+ && riscv_valid_stack_push_pop_p (pop_pat, false))
>>+ {
>>+ pop_rtx = insn;
>>+ continue;
>>+ }
>>+
>>+ /* pattern for `ret`. */
>>+ if (JUMP_P (insn)
>>+ && GET_CODE (pop_pat) == PARALLEL
>>+ && XVECLEN (pop_pat, 0) == 2
>>+ && GET_CODE (XVECEXP (pop_pat, 0, 0)) == SIMPLE_RETURN
>>+ && GET_CODE (XVECEXP (pop_pat, 0, 1)) == USE)
>>+ {
>>+ rtx use_reg = XEXP (XVECEXP (pop_pat, 0, 1), 0);
>>+ if (REG_P (use_reg)
>>+ && REGNO (use_reg) == RETURN_ADDR_REGNUM)
>>+ {
>>+ pop_candidates [2] = insn;
>>+ continue;
>>+ }
>>+ }
>>+
>>+ if (!pop_rtx)
>>+ continue;
>>+
>>+ /* pattern for return value. */
>>+ if (!pop_candidates [0]
>>+ && GET_CODE (pop_pat) == USE)
>>+ {
>>+ rtx_insn *set_insn = PREV_INSN (insn);
>>+ rtx pat_set = PATTERN (set_insn);
>>+
>>+ if (riscv_check_regno (XEXP (pop_pat, 0),
>>+ RETURN_VALUE_REGNUM)
>>+ && insn
>>+ && pat_set != NULL
>>+ && GET_CODE (pat_set) == SET
>>+ && riscv_check_regno (SET_DEST (pat_set),
>>+ RETURN_VALUE_REGNUM)
>>+ && CONST_INT_P (SET_SRC (pat_set))
>>+ && INTVAL (SET_SRC (pat_set)) == 0)
>>+ {
>>+ pop_candidates [0] = set_insn;
>>+ pop_candidates [1] = insn;
>>+ break;
>>+ }
>>+ }
>>+ }
>>+
>>+ if (pop_rtx)
>>+ {
>>+ emit_zcmp_popret (pop_rtx, pop_candidates, bb);
>>+ return;
>>+ }
>>+ }
>>+}
>>+
>>+const pass_data pass_data_zcmp_popret =
>>+{
>>+ RTL_PASS, /* type */
>>+ "zcmp-popret", /* name */
>>+ OPTGROUP_NONE, /* optinfo_flags */
>>+ TV_NONE, /* tv_id */
>>+ 0, /* properties_required */
>>+ 0, /* properties_provided */
>>+ 0, /* properties_destroyed */
>>+ 0, /* todo_flags_start */
>>+ 0, /* todo_flags_finish */
>>+};
>>+
>>+class pass_zcmp_popret : public rtl_opt_pass
>>+{
>>+public:
>>+ pass_zcmp_popret (gcc::context *ctxt)
>>+ : rtl_opt_pass (pass_data_zcmp_popret, ctxt)
>>+ {}
>>+
>>+ /* opt_pass methods: */
>>+ virtual bool gate (function *)
>>+ { return TARGET_ZCMP; }
>>+ virtual unsigned int execute (function *)
>>+ {
>>+ zcmp_popret ();
>>+ return 0;
>>+ }
>>+}; // class pass_zcmp_popret
>>+
>>+} // anon namespace
>>+
>>+rtl_opt_pass *
>>+make_pass_zcmp_popret (gcc::context *ctxt)
>>+{
>>+ return new pass_zcmp_popret (ctxt);
>>+}
>>diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>>index 5f8cbfc15ed..17df2f3f8cf 100644
>>--- a/gcc/config/riscv/riscv.cc
>>+++ b/gcc/config/riscv/riscv.cc
>>@@ -114,6 +114,9 @@ struct GTY(()) riscv_frame_info {
>> /* Likewise FPR X. */
>> unsigned int fmask;
>>
>>+ /* How much the push/pop routines adjust sp (or 0 if unused). */
>>+ unsigned push_pop_sp_adjust;
>>+
>> /* How much the GPR save/restore routines adjust sp (or 0 if unused). */
>> unsigned save_libcall_adjustment;
>>
>>@@ -401,6 +404,20 @@ static const unsigned gpr_save_reg_order[] = {
>> S10_REGNUM, S11_REGNUM
>> };
>>
>>+/* Order for the CLOBBERs/USEs of push/pop. */
>>+static const unsigned push_save_reg_order[] = {
>>+ INVALID_REGNUM, RETURN_ADDR_REGNUM, S0_REGNUM,
>>+ S1_REGNUM, S2_REGNUM, S3_REGNUM, S4_REGNUM,
>>+ S5_REGNUM, S6_REGNUM, S7_REGNUM, S8_REGNUM,
>>+ S9_REGNUM, S10_REGNUM, S11_REGNUM
>>+};
>>+
>>+/* Order for the CLOBBERs/USEs of push/pop in rve. */
>>+static const unsigned push_save_reg_order_zcmpe[] = {
>>+ INVALID_REGNUM, RETURN_ADDR_REGNUM, S0_REGNUM,
>>+ S1_REGNUM
>>+};
>>+
>> /* A table describing all the processors GCC knows about. */
>> static const struct riscv_tune_info riscv_tune_info_table[] = {
>> #define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO) \
>>@@ -2989,6 +3006,17 @@ riscv_output_return ()
>> return "ret";
>> }
>>
>>+bool
>>+riscv_output_popret_p (rtx op)
>>+{
>>+ unsigned n_rtx = XVECLEN (op, 0);
>>+ rtx use = XVECEXP (op, 0, n_rtx - 1);
>>+ rtx ret = XVECEXP (op, 0, n_rtx - 2);
>>+
>>+ return GET_CODE (ret) == SIMPLE_RETURN
>>+ && GET_CODE (use) == USE;
>>+}
>>+
>>
>>
>> /* Return true if CMP1 is a suitable second operand for integer ordering
>> test CODE. See also the *sCC patterns in riscv.md. */
>>@@ -4306,6 +4334,74 @@ riscv_memmodel_needs_amo_acquire (enum memmodel model)
>> }
>> }
>>
>>+/* Print Sp adjustment field of pop instruction. */
>>+
>>+static void
>>+riscv_print_pop_size (FILE *file, rtx op)
>>+{
>>+ unsigned sp_adjust_idx = XVECLEN (op, 0) - 1;
>>+ rtx sp_adjust_rtx = XVECEXP (op, 0, sp_adjust_idx);
>>+
>>+ /* Skip ret or pattern. */
>>+ while (GET_CODE (sp_adjust_rtx) != SET)
>>+ sp_adjust_rtx = XVECEXP (op, 0, --sp_adjust_idx);
>>+
>>+ rtx elt_plus = SET_SRC (sp_adjust_rtx);
>>+ fprintf (file, "%ld", INTVAL (XEXP (elt_plus, 1)));
>>+}
>>+
>>+/* Print push/pop register list. */
>>+
>>+static void
>>+riscv_print_reglist (FILE *file, rtx op)
>>+{
>>+ /* we only deal with three formats:
>>+ push {ra}
>>+ push {ra, s0}
>>+ push {ra, s0-sN}
>>+ or
>>+ pop {ra}
>>+ pop {ra, s0}
>>+ pop {ra, s0-sN}
>>+ registers except ra has to be continuous s-register,
>>+ and it is supposed to be checked before.
>>+ register list patterns in push:
>>+ (set/f (mem/c:SI
>>+ (plus:SI (reg/f:SI 2 sp)
>>+ (const_int 28 [0x1c])) [2 S4 A32])
>>+ (reg:SI 1 ra))
>>+ register list patterns in pop:
>>+ (set/f (reg:DI 1 ra)
>>+ (mem/c:DI (plus:DI (reg/f:DI 2 sp)
>>+ (const_int 8 [0x8])) [2 S8 A64]))
>>+ */
>>+ int total_count = XVECLEN (op, 0);
>>+ int n_regs = 0;
>>+ bool push_p = GET_CODE (XVECEXP (op, 0, 0)) == SET
>>+ && GET_CODE (SET_SRC (XVECEXP (op, 0, 0))) == PLUS;
>>+
>>+ for (int idx = 0; idx < total_count; ++idx)
>>+ {
>>+ rtx ele = XVECEXP (op, 0, idx);
>>+ if (GET_CODE (ele) != SET)
>>+ continue;
>>+
>>+ bool restore_save_p = push_p ?
>>+ MEM_P (SET_DEST (ele)) :
>>+ MEM_P (SET_SRC (ele));
>>+
>>+ if (restore_save_p)
>>+ n_regs ++;
>>+ }
>>+
>>+ if (n_regs > 2)
>>+ fprintf (file, "ra,s0-s%u", n_regs - 2);
>>+ else if (n_regs > 1)
>>+ fprintf (file, "ra,s0");
>>+ else
>>+ fputs("ra", file);
>>+}
>>+
>> /* Return true if a FENCE should be emitted to before a memory access to
>> implement the release portion of memory model MODEL. */
>>
>>@@ -4517,6 +4613,14 @@ riscv_print_operand (FILE *file, rtx op, int letter)
>> fputs (GET_RTX_NAME (code), file);
>> break;
>>
>>+ case 'L':
>>+ riscv_print_reglist (file, op);
>>+ break;
>>+
>>+ case 's':
>>+ riscv_print_pop_size (file, op);
>>+ break;
>>+
>> case 'S':
>> {
>> rtx newop = GEN_INT (ctz_hwi (INTVAL (op)));
>>@@ -4777,6 +4881,66 @@ riscv_use_save_libcall (const struct riscv_frame_info *frame)
>> return frame->save_libcall_adjustment != 0;
>> }
>>
>>+/* Determine how many instructions related to push/pop instructions. */
>>+
>>+static unsigned
>>+riscv_save_push_pop_count (unsigned mask)
>>+{
>>+ if (!BITSET_P (mask, GP_REG_FIRST + RETURN_ADDR_REGNUM))
>>+ return 0;
>>+ for (unsigned n = GP_REG_LAST; n > GP_REG_FIRST; n--)
>>+ if (BITSET_P (mask, n)
>>+ && !call_used_regs [n])
>>+ /* add ra saving and sp adjust. */
>>+ return CALLEE_SAVED_REG_NUMBER (n) + 1 + 2;
>>+ abort ();
>>+}
>>+
>>+/* Calculate the maximum sp adjustment of push/pop instruction. */
>>+
>>+static unsigned
>>+riscv_push_pop_base_sp_adjust (unsigned mask)
>>+{
>>+ unsigned n_regs = riscv_save_push_pop_count (mask) - 1;
>>+ return (n_regs * UNITS_PER_WORD + 15) & (~0xf);
>>+}
>>+
>>+/* Determine whether to call push/pop routines. */
>>+
>>+static bool
>>+riscv_use_push_pop (const struct riscv_frame_info *frame, const HOST_WIDE_INT frame_size)
>>+{
>>+ if (!TARGET_ZCMP)
>>+ return false;
>>+
>>+ /* We do not handle variable argument cases currently. */
>>+ if (cfun->machine->varargs_size != 0)
>>+ return false;
>>+
>>+ HOST_WIDE_INT base_size = riscv_push_pop_base_sp_adjust (frame->mask);
>>+ /*
>>+ Pr 960215-1.c in rv64 outputs
>>+
>>+ addi sp,sp,-32
>>+ sd ra,24(sp)
>>+ sd s0,16(sp)
>>+ sd s2,8(sp)
>>+ sd s3,0(sp)
>>+ it is a rare case that callee saved registers are non-contiguous,
>>+ which breaks the old push implementation, and we just reject this case
>>+ like save-restore does now.
>>+ */
>>+ if (base_size > frame_size)
>>+ return false;
>>+
>>+ /* {ra,s0-s10} is invalid. */
>>+ if (frame->mask & (1 << (S10_REGNUM - GP_REG_FIRST))
>>+ && !(frame->mask & (1 << (S11_REGNUM - GP_REG_FIRST))))
>>+ return false;
>>+
>>+ return frame->mask & (1 << (RETURN_ADDR_REGNUM - GP_REG_FIRST));
>>+}
>>+
>> /* Determine which GPR save/restore routine to call. */
>>
>> static unsigned
>>@@ -4934,6 +5098,8 @@ riscv_compute_frame_info (void)
>> /* Only use save/restore routines when the GPRs are atop the frame. */
>> if (known_ne (frame->hard_frame_pointer_offset, frame->total_size))
>> frame->save_libcall_adjustment = 0;
>>+
>>+ frame->push_pop_sp_adjust = 0;
>> }
>>
>> /* Make sure that we're not trying to eliminate to the wrong hard frame
>>@@ -5171,6 +5337,86 @@ riscv_for_each_saved_reg (poly_int64 sp_offset,
>>riscv_save_restore_fn fn,
>> }
>> }
>>
>>+static void
>>+riscv_emit_pop_insn (struct riscv_frame_info *frame, HOST_WIDE_INT offset,
>>HOST_WIDE_INT size)
>>+{
>>+ unsigned int veclen = riscv_save_push_pop_count (frame->mask);
>>+ unsigned int n_reg = veclen - 1;
>>+ rtvec vec = rtvec_alloc (veclen);
>>+ HOST_WIDE_INT sp_adjust;
>>+ rtx dwarf = NULL_RTX;
>>+
>>+ const unsigned *reg_order = (TARGET_ZCMP && TARGET_RVE)
>>+ ? push_save_reg_order_zcmpe
>>+ : push_save_reg_order;
>>+
>>+ gcc_assert (n_reg >= 1
>>+ && TARGET_ZCMP
>>+ && ((TARGET_RVE && (n_reg <= ARRAY_SIZE (push_save_reg_order_zcmpe)))
>>+ || (TARGET_ZCMP && (n_reg <= ARRAY_SIZE (push_save_reg_order)))));
>>+
>>+ /* sp adjust pattern */
>>+ int max_allow_sp_adjust = riscv_push_pop_base_sp_adjust (frame->mask) + 48;
>>+ int aligned_size = size;
>>+
>>+ /* if sp adjustment is too large, we should split it first. */
>>+ if (aligned_size > max_allow_sp_adjust)
>>+ {
>>+ rtx dwarf_pre_sp_adjust = NULL_RTX;
>>+ rtx pre_adjust_rtx = gen_add3_insn (stack_pointer_rtx,
>>+ stack_pointer_rtx,
>>+ GEN_INT (aligned_size - max_allow_sp_adjust));
>>+ rtx insn = emit_insn (pre_adjust_rtx);
>>+
>>+ rtx cfa_pre_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
>>+ GEN_INT (aligned_size - max_allow_sp_adjust));
>>+ dwarf_pre_sp_adjust = alloc_reg_note (REG_CFA_DEF_CFA,
>>+ cfa_pre_adjust_rtx,
>>+ dwarf_pre_sp_adjust);
>>+
>>+ RTX_FRAME_RELATED_P (insn) = 1;
>>+ REG_NOTES (insn) = dwarf_pre_sp_adjust;
>>+
>>+ sp_adjust = max_allow_sp_adjust;
>>+ }
>>+ else
>>+ sp_adjust = (aligned_size + 15) & (~0xf);
>>+
>>+ /* Register restore sequence. */
>>+ for (unsigned i = 1; i < veclen; ++i)
>>+ {
>>+ offset -= UNITS_PER_WORD;
>>+ unsigned regno = reg_order[i];
>>+ rtx reg = gen_rtx_REG (Pmode, regno);
>>+ rtx mem = gen_frame_mem (Pmode, plus_constant (Pmode,
>>+ stack_pointer_rtx,
>>+ offset));
>>+ rtx set = gen_rtx_SET (reg, mem);
>>+ RTVEC_ELT (vec, i - 1) = set;
>>+ RTX_FRAME_RELATED_P (set) = 1;
>>+ dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
>>+ }
>>+
>>+ /* sp adjust pattern */
>>+ rtx adjust_sp_rtx
>>+ = gen_rtx_SET (stack_pointer_rtx,
>>+ plus_constant (Pmode,
>>+ stack_pointer_rtx,
>>+ sp_adjust));
>>+ RTVEC_ELT (vec, veclen - 1) = adjust_sp_rtx;
>>+
>>+ rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
>>+ const0_rtx);
>>+ dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
>>+
>>+ frame->gp_sp_offset -= (veclen - 1) * UNITS_PER_WORD;
>>+ frame->push_pop_sp_adjust = sp_adjust;
>>+
>>+ rtx insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, vec));
>>+ RTX_FRAME_RELATED_P (insn) = 1;
>>+ REG_NOTES (insn) = dwarf;
>>+}
>>+
>> /* For stack frames that can't be allocated with a single ADDI instruction,
>> compute the best value to initially allocate. It must at a minimum
>> allocate enough space to spill the callee-saved registers. If TARGET_RVC,
>>@@ -5270,6 +5516,146 @@ riscv_emit_stack_tie (void)
>> emit_insn (gen_stack_tiedi (stack_pointer_rtx, hard_frame_pointer_rtx));
>> }
>>
>>+bool
>>+riscv_check_regno(rtx pat, unsigned regno)
>>+{
>>+ return REG_P (pat)
>>+ && REGNO (pat) == regno;
>>+}
>>+
>>+/* Function to check whether the OP is a valid stack push/pop operation.
>>+ This part is borrowed from nds32 nds32_valid_stack_push_pop_p */
>>+
>>+bool
>>+riscv_valid_stack_push_pop_p (rtx op, bool push_p)
>>+{
>>+ int index;
>>+ int total_count;
>>+ int sp_adjust_rtx_index;
>>+ rtx elt;
>>+ rtx elt_reg;
>>+ rtx elt_plus;
>>+
>>+ if (!TARGET_ZCMP)
>>+ return false;
>>+
>>+ total_count = XVECLEN (op, 0);
>>+ sp_adjust_rtx_index = push_p ? 0 : total_count - 1;
>>+
>>+ /* At least sp + one callee save/restore register rtx */
>>+ if (total_count < 2)
>>+ return false;
>>+
>>+ /* Quickly check that every element is a 'set';
>>+ for pop, it might also contain `ret` and `ret value` patterns. */
>>+ for (index = 0; index < total_count; index++)
>>+ {
>>+ elt = XVECEXP (op, 0, index);
>>+
>>+ /* skip pop return value rtx */
>>+ if (!push_p && GET_CODE (elt) == SET
>>+ && riscv_check_regno (SET_DEST (elt), RETURN_VALUE_REGNUM)
>>+ && total_count >= 4
>>+ && index + 1 < total_count
>>+ && GET_CODE (XVECEXP (op, 0, index + 1)) == USE)
>>+ {
>>+ rtx use_reg = XEXP (XVECEXP (op, 0, index + 1), 0);
>>+
>>+ if (!riscv_check_regno (use_reg, RETURN_VALUE_REGNUM))
>>+ return false;
>>+
>>+ index += 1;
>>+ continue;
>>+ }
>>+
>>+ /* skip ret rtx */
>>+ if (!push_p && GET_CODE (elt) == SIMPLE_RETURN
>>+ && total_count >= 4
>>+ && index + 1 < total_count
>>+ && GET_CODE (XVECEXP (op, 0, index + 1)) == USE)
>>+ {
>>+ rtx use_reg = XEXP (XVECEXP (op, 0, index + 1), 0);
>>+
>>+ if (!riscv_check_regno (use_reg, RETURN_ADDR_REGNUM))
>>+ return false;
>>+
>>+ index += 1;
>>+ sp_adjust_rtx_index -= 2;
>>+ continue;
>>+ }
>>+
>>+ if (GET_CODE (elt) != SET)
>>+ return false;
>>+ }
>>+
>>+ elt = XVECEXP (op, 0, sp_adjust_rtx_index);
>>+ elt_reg = SET_DEST (elt);
>>+ elt_plus = SET_SRC (elt);
>>+
>>+ /* Check this is (set (stack_reg) (plus stack_reg const)) pattern. */
>>+ if (GET_CODE (elt_plus) != PLUS
>>+ || !riscv_check_regno (elt_reg, STACK_POINTER_REGNUM))
>>+ return false;
>>+
>>+ /* All tests passed; this is a valid rtx. */
>>+ return true;
>>+}
>>+
>>+/* Generate push/pop rtx */
>>+
>>+static void
>>+riscv_emit_push_insn (struct riscv_frame_info *frame, HOST_WIDE_INT size)
>>+{
>>+ unsigned int veclen = riscv_save_push_pop_count (frame->mask);
>>+ unsigned int n_reg = veclen - 1;
>>+ rtvec vec = rtvec_alloc (veclen);
>>+
>>+ const unsigned *reg_order = (TARGET_ZCMP && TARGET_RVE)
>>+ ? push_save_reg_order_zcmpe
>>+ : push_save_reg_order;
>>+
>>+ int aligned_size = (size + 15) & (~0xf);
>>+
>>+ gcc_assert (n_reg >= 1
>>+ && TARGET_ZCMP
>>+ && ((TARGET_RVE && (n_reg <= ARRAY_SIZE (push_save_reg_order_zcmpe)))
>>+ || (TARGET_ZCMP && (n_reg <= ARRAY_SIZE (push_save_reg_order)))));
>>+
>>+ /* sp adjust pattern */
>>+ int max_allow_sp_adjust = riscv_push_pop_base_sp_adjust (frame->mask) + 48;
>>+ int sp_adjust = aligned_size > max_allow_sp_adjust ?
>>+ max_allow_sp_adjust
>>+ : aligned_size;
>>+
>>+ /*TODO: move this part to frame computation function. */
>>+ frame->gp_sp_offset = (veclen - 1) * UNITS_PER_WORD;
>>+ frame->push_pop_sp_adjust = sp_adjust;
>>+
>>+ rtx adjust_sp_rtx
>>+ = gen_rtx_SET (stack_pointer_rtx,
>>+ plus_constant (Pmode,
>>+ stack_pointer_rtx,
>>+ -sp_adjust));
>>+ RTVEC_ELT (vec, 0) = adjust_sp_rtx;
>>+
>>+ /* Register save sequence. */
>>+ for (unsigned i = 1; i < veclen; ++i)
>>+ {
>>+ sp_adjust -= UNITS_PER_WORD;
>>+ unsigned regno = reg_order[i];
>>+ rtx reg = gen_rtx_REG (Pmode, regno);
>>+ rtx mem = gen_frame_mem (Pmode, plus_constant (Pmode,
>>+ stack_pointer_rtx,
>>+ sp_adjust));
>>+ rtx set = gen_rtx_SET (mem, reg);
>>+ RTVEC_ELT (vec, i) = set;
>>+ RTX_FRAME_RELATED_P (set) = 1;
>>+ }
>>+
>>+ rtx insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, vec));
>>+ RTX_FRAME_RELATED_P (insn) = 1;
>>+}
>>+
>> /* Expand the "prologue" pattern. */
>>
>> void
>>@@ -5278,6 +5664,7 @@ riscv_expand_prologue (void)
>> struct riscv_frame_info *frame = &cfun->machine->frame;
>> poly_int64 size = frame->total_size;
>> unsigned mask = frame->mask;
>>+ HOST_WIDE_INT step1 = riscv_first_stack_step (frame);
>> rtx insn;
>>
>> if (flag_stack_usage_info)
>>@@ -5300,19 +5687,32 @@ riscv_expand_prologue (void)
>> REG_NOTES (insn) = dwarf;
>> }
>>
>>+ if (size.is_constant ())
>>+ step1 = MIN (size.to_constant(), step1);
>>+ if (riscv_use_push_pop (frame, step1))
>>+ {
>>+ riscv_emit_push_insn (frame, step1);
>>+
>>+ step1 = MAX (step1 - frame->push_pop_sp_adjust, 0);
>>+ size = MAX (size.to_constant() - frame->push_pop_sp_adjust, 0);
>>+ frame->mask &= ~ ((TARGET_ZCMP && TARGET_RVE) ?
>>+ RISCV_ZCMPE_PUSH_POP_MASK
>>+ : RISCV_ZCE_PUSH_POP_MASK);
>>+ }
>>+
>> /* Save the registers. */
>> if ((frame->mask | frame->fmask) != 0)
>> {
>>- HOST_WIDE_INT step1 = riscv_first_stack_step (frame);
>>- if (size.is_constant ())
>>- step1 = MIN (size.to_constant(), step1);
>>-
>>- insn = gen_add3_insn (stack_pointer_rtx,
>>- stack_pointer_rtx,
>>- GEN_INT (-step1));
>>- RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
>>- size -= step1;
>>- riscv_for_each_saved_reg (size, riscv_save_reg, false, false);
>>+ if (step1 > 0)
>>+ {
>>+ insn = gen_add3_insn (stack_pointer_rtx,
>>+ stack_pointer_rtx,
>>+ GEN_INT (-step1));
>>+ RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
>>+ size -= step1;
>>+ }
>>+ riscv_for_each_saved_reg (size, riscv_save_reg,
>>+ false /* bool epilogue */, false /* bool maybe_eh_return */);
>> }
>>
>> frame->mask = mask; /* Undo the above fib. */
>>@@ -5412,6 +5812,8 @@ riscv_expand_epilogue (int style)
>> rtx ra = gen_rtx_REG (Pmode, RETURN_ADDR_REGNUM);
>> rtx insn;
>>
>>+ bool use_zcmp_pop = !use_restore_libcall && !(crtl->calls_eh_return);
>>+
>> /* We need to add memory barrier to prevent read from deallocated stack. */
>> bool need_barrier_p = known_ne (get_frame_size ()
>> + cfun->machine->frame.arg_pointer_offset, 0);
>>@@ -5538,6 +5940,18 @@ riscv_expand_epilogue (int style)
>> if (use_restore_libcall)
>> frame->mask = 0; /* Temporarily fib that we need not save GPRs. */
>>
>>+ if (use_zcmp_pop && riscv_use_push_pop (frame, step2))
>>+ {
>>+ /* Emit a barrier to prevent loads from a deallocated stack. */
>>+ riscv_emit_stack_tie ();
>>+ need_barrier_p = false;
>>+ riscv_emit_pop_insn (frame, frame->total_size.to_constant(), step2);
>>+ frame->mask &= ~ ((TARGET_ZCMP && TARGET_RVE) ?
>>+ RISCV_ZCMPE_PUSH_POP_MASK
>>+ : RISCV_ZCE_PUSH_POP_MASK);
>>+ step2 = 0;
>>+ }
>>+
>> /* Restore the registers. */
>> riscv_for_each_saved_reg (frame->total_size - step2, riscv_restore_reg,
>> true, style == EXCEPTION_RETURN);
>>@@ -5552,6 +5966,9 @@ riscv_expand_epilogue (int style)
>> if (need_barrier_p)
>> riscv_emit_stack_tie ();
>>
>>+ if (use_zcmp_pop)
>>+ frame->mask = mask;
>>+
>> /* Deallocate the final bit of the frame. */
>> if (step2 > 0)
>> {
>>diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
>>index d05b1d59853..6e6e3ee2c25 100644
>>--- a/gcc/config/riscv/riscv.h
>>+++ b/gcc/config/riscv/riscv.h
>>@@ -383,6 +383,7 @@ ASM_MISA_SPEC
>> #define HARD_FRAME_POINTER_REGNUM 8
>> #define STACK_POINTER_REGNUM 2
>> #define THREAD_POINTER_REGNUM 4
>>+#define RETURN_VALUE_REGNUM 10
>>
>> /* These two registers don't really exist: they get eliminated to either
>> the stack or hard frame pointer. */
>>@@ -1097,4 +1098,7 @@ extern void riscv_remove_unneeded_save_restore_calls
>>(void);
>> #define DWARF_REG_TO_UNWIND_COLUMN(REGNO) \
>> ((REGNO == RISCV_DWARF_VLENB) ? (FIRST_PSEUDO_REGISTER + 1) : REGNO)
>>
>>+#define RISCV_ZCE_PUSH_POP_MASK 0x0ffc0302u
>>+#define RISCV_ZCMPE_PUSH_POP_MASK 0x302u
>>+
>> #endif /* ! GCC_RISCV_H */
>>diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
>>index bc384d9aedf..b9f2a426e48 100644
>>--- a/gcc/config/riscv/riscv.md
>>+++ b/gcc/config/riscv/riscv.md
>>@@ -108,12 +108,14 @@
>>
>> (define_constants
>> [(RETURN_ADDR_REGNUM 1)
>>+ (SP_REGNUM 2)
>> (GP_REGNUM 3)
>> (TP_REGNUM 4)
>> (T0_REGNUM 5)
>> (T1_REGNUM 6)
>> (S0_REGNUM 8)
>> (S1_REGNUM 9)
>>+ (A0_REGNUM 10)
>> (S2_REGNUM 18)
>> (S3_REGNUM 19)
>> (S4_REGNUM 20)
>>@@ -3147,3 +3149,4 @@
>> (include "sifive-7.md")
>> (include "thead.md")
>> (include "vector.md")
>>+(include "zc.md")
>>diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
>>index 6e326fc7e02..9ef522306a5 100644
>>--- a/gcc/config/riscv/t-riscv
>>+++ b/gcc/config/riscv/t-riscv
>>@@ -90,6 +90,10 @@ riscv-v.o: $(srcdir)/config/riscv/riscv-v.cc \
>> $(COMPILE) $<
>> $(POSTCOMPILE)
>>
>>+riscv-zcmp-popret.o: $(srcdir)/config/riscv/riscv-zcmp-popret.cc
>>+ $(COMPILE) $<
>>+ $(POSTCOMPILE)
>>+
>> thead.o: $(srcdir)/config/riscv/thead.cc \
>> $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TARGET_H) backend.h $(RTL_H) \
>> memmodel.h $(EMIT_RTL_H) poly-int.h output.h
>>diff --git a/gcc/config/riscv/zc.md b/gcc/config/riscv/zc.md
>>new file mode 100644
>>index 00000000000..3ad34dacd49
>>--- /dev/null
>>+++ b/gcc/config/riscv/zc.md
>>@@ -0,0 +1,47 @@
>>+;; Machine description for ZCE extension.
>>+;; Copyright (C) 2021 Free Software Foundation, Inc.
>>+
>>+;; This file is part of GCC.
>>+
>>+;; GCC is free software; you can redistribute it and/or modify
>>+;; it under the terms of the GNU General Public License as published by
>>+;; the Free Software Foundation; either version 3, or (at your option)
>>+;; any later version.
>>+
>>+;; GCC is distributed in the hope that it will be useful,
>>+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
>>+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>>+;; GNU General Public License for more details.
>>+
>>+;; You should have received a copy of the GNU General Public License
>>+;; along with GCC; see the file COPYING3. If not see
>>+;; <http://www.gnu.org/licenses/>.
>>+
>>+(define_insn "*stack_push<mode>"
>>+ [(match_parallel 0 "riscv_stack_push_operation"
>>+ [(set (reg:X SP_REGNUM) (plus:X (reg:X SP_REGNUM)
>>+ (match_operand:X 1 "const_int_operand" "")))])]
>>+ "TARGET_ZCMP"
>>+ "cm.push\t{%L0},%1")
>>+
>>+(define_insn "*stack_pop<mode>"
>>+ [(match_parallel 0 "riscv_stack_pop_operation"
>>+ [(set (match_operand:X 1 "register_operand" "")
>>+ (mem:X (plus:X (reg:X SP_REGNUM)
>>+ (match_operand:X 2 "const_int_operand" ""))))])]
>>+ "TARGET_ZCMP"
>>+ {
>>+ return riscv_output_popret_p (operands[0]) ?
>>+ "cm.popret\t{%L0},%s0" :
>>+ "cm.pop\t{%L0},%s0";
>>+ })
>>+
>>+(define_insn "*stack_pop_with_return_value<mode>"
>>+ [(match_parallel 0 "riscv_stack_pop_operation"
>>+ [(set (reg:ANYI A0_REGNUM)
>>+ (match_operand:ANYI 1 "pop_return_value_constant" ""))])]
>>+ "TARGET_ZCMP"
>>+ {
>>+ gcc_assert (riscv_output_popret_p (operands[0]));
>>+ return "cm.popretz\t{%L0},%s0";
>>+ })
>>--
>>2.25.1 




* Re: Re: [PATCH 4/5] RISC-V: Add Zcmp extension supports.
  2023-05-06  8:53   ` Fei Gao
@ 2023-05-12  8:12     ` Sinan
  2023-05-12  9:10       ` Fei Gao
  0 siblings, 1 reply; 9+ messages in thread
From: Sinan @ 2023-05-12  8:12 UTC (permalink / raw)
  To: Fei Gao; +Cc: gcc-patches, Jiawei

[-- Attachment #1: Type: text/plain, Size: 37407 bytes --]

Hi Fei,
Sorry for the late reply, I've been busy with moving these days :(.
Thanks for working on it. I would prefer removing the extra pass for popretz if possible ... I will test your patches ASAP. 
BR,
Sinan
------------------------------------------------------------------
Sender:Fei Gao <gaofei@eswincomputing.com>
Sent At:2023 May 6 (Sat.) 16:53
Recipient:Sinan <sinan.lin@linux.alibaba.com>
Cc:jiawei <jiawei@iscas.ac.cn>; gcc-patches <gcc-patches@gcc.gnu.org>
Subject:Re: Re: [PATCH 4/5] RISC-V: Add Zcmp extension supports.
On 2023-05-05 23:57 Sinan <sinan.lin@linux.alibaba.com> wrote:
>
>> hi Jiawei
>>
>> Please ignore my previous reply. I accidentally sent the email before I finished it.
>> Sorry for that!
>>
>> I downloaded the series of patches from you and found in some cases
>> it fails to generate zcmp push and pop insns.
>>
>> TC:
>>
>> char my_getchar();
>> int test_s0()
>> {
>>
>> int a = my_getchar();
>> int b = my_getchar();
>> return a+b;
>> }
>>
>> cc1 -fno-shrink-wrap-separate -O2 -march=rv32e_zca_zcmp -mabi=ilp32e -mcmodel=medlow test.c
>>
>> -fno-shrink-wrap-separate is used here to avoid the impact of shrink-wrap-separate, which is enabled by
>> default at -O2.
>>
>> As I'm also interested in Zc*, I made some changes, mainly in the prologue and epilogue pass, quite similar to
>> what has been done for save and restore. The exception is the CFI directives, because zcmp pushes and pops
>> ra and the s registers in the reverse order from save and restore.
>>
>> I will refine and share the code soon for your review.
>>
>> BR
>> Fei
>Hi Fei,
>In the current implementation, cm.push will not increase the original adjustment size of the stack pointer. cm.push uses a minimum adjustment size of 16, and in your example the adjustment size of sp is 12, so cm.push is not generated.
>You can find the check in riscv_use_push_pop:
>> > + */
>> > + if (base_size > frame_size)
>> > + return false;
>> > +
>If this check is removed, you get the output that you expect:
>```
> cm.push {ra,s0},-16
> call my_getchar
> mv s0,a0
> call my_getchar
> add a0,s0,a0
> cm.popret {ra,s0},16
>```
>As a result, cm.push cannot be generated in many rv32e scenarios. Perhaps we can remove this check? I haven't tested whether it is safe to remove it, so I am CCing Jiawei to help test it.
>BR,
>Sinan 
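To make the rejection above concrete, here is a minimal standalone sketch of the arithmetic behind the `base_size > frame_size` check. It is not the GCC code: `WORD_BYTES`, `base_sp_adjust` and `push_pop_allowed` are hypothetical names that only mirror the rounding in riscv_push_pop_base_sp_adjust and the comparison in riscv_use_push_pop, assuming rv32 (4-byte words).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for UNITS_PER_WORD on rv32.  */
enum { WORD_BYTES = 4 };

/* Smallest sp adjustment cm.push can use for N_REGS saved registers
   (ra counts as one): the register area rounded up to 16 bytes.  */
static unsigned
base_sp_adjust (unsigned n_regs)
{
  assert (n_regs >= 1);
  return (n_regs * WORD_BYTES + 15) & ~0xfu;
}

/* Mirror of the quoted check: push/pop is rejected when the minimum
   cm.push adjustment is larger than the frame's own sp adjustment.  */
static bool
push_pop_allowed (unsigned n_regs, unsigned frame_size)
{
  return base_sp_adjust (n_regs) <= frame_size;
}
```

For the test case in this thread, {ra,s0} gives base_sp_adjust (2) == 16, so push_pop_allowed (2, 12) is false, which matches the missing cm.push. With the check removed, the frame would simply grow to 16 bytes, as in the cm.push {ra,s0},-16 output above.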
Hi Sinan,
Thanks for your reply. 
I posted my code at https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg306921.html
In the cover letter, I did some comparisons. 
Could you please review it?
Thanks & BR, 
Fei
>------------------------------------------------------------------
>Sender:Fei Gao <gaofei@eswincomputing.com>
>Sent At:2023 Apr. 25 (Tue.) 18:12
>Recipient:jiawei <jiawei@iscas.ac.cn>
>Cc:gcc-patches <gcc-patches@gcc.gnu.org>
>Subject:[PATCH 4/5] RISC-V: Add Zcmp extension supports.
>hi Jiawei
>Please ignore my previous reply. I accidentally sent the email before I finished it.
>Sorry for that!
>I downloaded the series of patches from you and found in some cases
>it fails to generate zcmp push and pop insns.
>TC:
>char my_getchar();
>int test_s0()
>{
> int a = my_getchar();
> int b = my_getchar();
> return a+b;
>}
>cc1 -fno-shrink-wrap-separate -O2 -march=rv32e_zca_zcmp -mabi=ilp32e -mcmodel=medlow test.c
>-fno-shrink-wrap-separate is used here to avoid the impact of shrink-wrap-separate, which is enabled by
>default at -O2.
>As I'm also interested in Zc*, I made some changes, mainly in the prologue and epilogue pass, quite similar to
>what has been done for save and restore. The exception is the CFI directives, because zcmp pushes and pops
>ra and the s registers in the reverse order from save and restore.
>I will refine and share the code soon for your review.
>BR
>Fei
>On Thu Apr 6 06:21:17 GMT 2023 Jiawei jiawei@iscas.ac.cn wrote:
>>
>>Add support for the Zcmp extension instructions. Generate push/pop
>>with the following steps:
>>
>> 1. preprocessing:
>> 1.1. if there is no push rtx, then just return. e.g.
>> (note 5 1 22 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>> (insn/f 22 5 23 2 (set (reg/f:SI 2 sp)
>> (plus:SI (reg/f:SI 2 sp)
>> (const_int -32 [0xffffffffffffffe0])))
>> (nil))
>> (note 23 22 2 2 NOTE_INSN_PROLOGUE_END)
>> 1.2. if push rtx exists, then we compute the number of
>> pushed s-registers, n_sreg.
>>
>> push rtx should be found before NOTE_INSN_PROLOGUE_END tag
>>
>> [2 and 3 happen simultaneously]
>>
>> 2. find valid move pattern, mv sN, aN, where N < n_sreg,
>> and aN is not used the move pattern, and sN is not
>> defined before the move pattern (from prologue to the
>> position of move pattern).
>>
>> 3. analysis use and reach of every instruction from prologue
>> to the position of move pattern.
>> if any sN is used, then we mark the corresponding argument list
>> candidate as invalid.
>> e.g.
>> push {ra,s0-s3}, {}, -32
>> sw s0,44(sp) # s0 is used, then argument list is invalid
>> mv a0,a5 # a0 is defined, then argument list is invalid
>> ...
>> mv s0,a0
>> mv s1,a1
>> mv s2,a2
>>
>> 4. if there is a valid argument list, then replace the pop
>> push parallel insn, and delete mv pattern.
>> if not, skip.
>>
>>Every "zcmpe" refers to Zcmp with the RVE extension.
>>The push/pop instruction implementation was mostly done by Sinan Lin.
>>
>>Co-Authored by: Sinan Lin <sinan....@linux.alibaba.com>
>>Co-Authored by: Simon Cook <simon.c...@embecosm.com>
>>Co-Authored by: Shihua Liao <shi...@iscas.ac.cn>
>>
>>gcc/ChangeLog:
>>
>> * config.gcc: New object.
>> * config/riscv/predicates.md (riscv_stack_push_operation):
>> New predicate.
>> (riscv_stack_pop_operation): Ditto.
>> (pop_return_value_constant): Ditto.
>> * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): New pass.
>> * config/riscv/riscv-protos.h (riscv_output_popret_p):
>> New routine.
>> (riscv_valid_stack_push_pop_p): Ditto.
>> (riscv_check_regno): Ditto.
>> (make_pass_zcmp_popret): New pass.
>> * config/riscv/riscv.cc (struct riscv_frame_info): New variable.
>> (riscv_output_popret_p): New function.
>> (riscv_print_pop_size): Ditto.
>> (riscv_print_reglist): Ditto.
>> (riscv_print_operand): New case symbols.
>> (riscv_save_push_pop_count): New function.
>> (riscv_push_pop_base_sp_adjust): Ditto.
>> (riscv_use_push_pop): Ditto.
>> (riscv_compute_frame_info): Adjust frame value.
>> (riscv_emit_pop_insn): New function.
>> (riscv_check_regno): Ditto.
>> (riscv_valid_stack_push_pop_p): Ditto.
>> (riscv_emit_push_insn): Ditto.
>> (riscv_expand_prologue): Modify frame pattern.
>> (riscv_expand_epilogue): Ditto.
>> * config/riscv/riscv.h (RETURN_VALUE_REGNUM):
>> (RISCV_ZCE_PUSH_POP_MASK): New mask.
>> (RISCV_ZCMPE_PUSH_POP_MASK): Ditto.
>> * config/riscv/riscv.md: Add new reg number and include info.
>> * config/riscv/t-riscv: New object rules.
>> * config/riscv/riscv-zcmp-popret.cc: New file.
>> * config/riscv/zc.md: New file.
>>---
>> gcc/config.gcc | 2 +-
>> gcc/config/riscv/predicates.md | 16 +
>> gcc/config/riscv/riscv-passes.def | 1 +
>> gcc/config/riscv/riscv-protos.h | 4 +
>> gcc/config/riscv/riscv-zcmp-popret.cc | 260 +++++++++++++++
>> gcc/config/riscv/riscv.cc | 437 +++++++++++++++++++++++++-
>> gcc/config/riscv/riscv.h | 4 +
>> gcc/config/riscv/riscv.md | 3 +
>> gcc/config/riscv/t-riscv | 4 +
>> gcc/config/riscv/zc.md | 47 +++
>> 10 files changed, 767 insertions(+), 11 deletions(-)
>> create mode 100644 gcc/config/riscv/riscv-zcmp-popret.cc
>> create mode 100644 gcc/config/riscv/zc.md
>>
>>diff --git a/gcc/config.gcc b/gcc/config.gcc
>>index 629d324b5ef..a991c5273f9 100644
>>--- a/gcc/config.gcc
>>+++ b/gcc/config.gcc
>>@@ -529,7 +529,7 @@ pru-*-*)
>> ;;
>> riscv*)
>> cpu_type=riscv
>>- extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o
>>riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o"
>>+ extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o
>>riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o
>>riscv-zcmp-popret.o"
>> extra_objs="${extra_objs} riscv-vector-builtins.o
>>riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
>> extra_objs="${extra_objs} thead.o"
>> d_target_objs="riscv-d.o"
>>diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
>>index 0d9d7701c7e..6bff6cd047a 100644
>>--- a/gcc/config/riscv/predicates.md
>>+++ b/gcc/config/riscv/predicates.md
>>@@ -412,3 +412,19 @@
>> (and (match_code "const_int")
>> (ior (match_operand 0 "not_uimm_extra_bit_operand")
>> (match_operand 0 "const_nottwobits_operand"))))
>>+
>>+(define_special_predicate "riscv_stack_push_operation"
>>+ (match_code "parallel")
>>+{
>>+ return riscv_valid_stack_push_pop_p (op, true);
>>+})
>>+
>>+(define_special_predicate "riscv_stack_pop_operation"
>>+ (match_code "parallel")
>>+{
>>+ return riscv_valid_stack_push_pop_p (op, false);
>>+})
>>+
>>+(define_predicate "pop_return_value_constant"
>>+ (and (match_code "const_int")
>>+ (match_test "INTVAL (op) == 0")))
>>diff --git a/gcc/config/riscv/riscv-passes.def
>>b/gcc/config/riscv/riscv-passes.def
>>index 4084122cf0a..25625b9af3e 100644
>>--- a/gcc/config/riscv/riscv-passes.def
>>+++ b/gcc/config/riscv/riscv-passes.def
>>@@ -19,3 +19,4 @@
>>
>> INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
>> INSERT_PASS_BEFORE (pass_fast_rtl_dce, 1, pass_vsetvl);
>>+INSERT_PASS_AFTER (pass_cprop_hardreg, 1, pass_zcmp_popret);
>>diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
>>index 4611447ddde..8f243cd5f44 100644
>>--- a/gcc/config/riscv/riscv-protos.h
>>+++ b/gcc/config/riscv/riscv-protos.h
>>@@ -54,6 +54,7 @@ extern bool riscv_split_64bit_move_p (rtx, rtx);
>> extern void riscv_split_doubleword_move (rtx, rtx);
>> extern const char *riscv_output_move (rtx, rtx);
>> extern const char *riscv_output_return ();
>>+extern bool riscv_output_popret_p (rtx);
>>
>> #ifdef RTX_CODE
>> extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx);
>>@@ -79,6 +80,8 @@ extern void riscv_reinit (void);
>> extern poly_uint64 riscv_regmode_natural_size (machine_mode);
>> extern bool riscv_v_ext_vector_mode_p (machine_mode);
>> extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT);
>>+extern bool riscv_valid_stack_push_pop_p (rtx, bool);
>>+extern bool riscv_check_regno(rtx, unsigned);
>>
>> /* Routines implemented in riscv-c.cc. */
>> void riscv_cpu_cpp_builtins (cpp_reader *);
>>@@ -99,6 +102,7 @@ extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
>>
>> rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
>> rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
>>+rtl_opt_pass * make_pass_zcmp_popret (gcc::context *ctxt);
>>
>> /* Information about one CPU we know about. */
>> struct riscv_cpu_info {
>>diff --git a/gcc/config/riscv/riscv-zcmp-popret.cc
>>b/gcc/config/riscv/riscv-zcmp-popret.cc
>>new file mode 100644
>>index 00000000000..d7b40f6a3e2
>>--- /dev/null
>>+++ b/gcc/config/riscv/riscv-zcmp-popret.cc
>>@@ -0,0 +1,260 @@
>>+#include "config.h"
>>+#include "system.h"
>>+#include "coretypes.h"
>>+#include "tm.h"
>>+#include "rtl.h"
>>+#include "backend.h"
>>+#include "regs.h"
>>+#include "target.h"
>>+#include "memmodel.h"
>>+#include "emit-rtl.h"
>>+#include "df.h"
>>+#include "predict.h"
>>+#include "tree-pass.h"
>>+#include "tree.h"
>>+#include "tm_p.h"
>>+#include "optabs.h"
>>+#include "recog.h"
>>+#include "cfgrtl.h"
>>+
>>+#define IN_TARGET_CODE 1
>>+
>>+namespace {
>>+
>>+/*
>>+ 1. preprocessing:
>>+ 1.1. if there is no push rtx, then just return. e.g.
>>+ (note 5 1 22 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>>+ (insn/f 22 5 23 2 (set (reg/f:SI 2 sp)
>>+ (plus:SI (reg/f:SI 2 sp)
>>+ (const_int -32 [0xffffffffffffffe0])))
>>+ (nil))
>>+ (note 23 22 2 2 NOTE_INSN_PROLOGUE_END)
>>+ 1.2. if push rtx exists, then we compute the number of
>>+ pushed s-registers, n_sreg.
>>+
>>+ push rtx should be found before NOTE_INSN_PROLOGUE_END tag
>>+
>>+ [2 and 3 happen simultaneously]
>>+ 2. find valid move pattern, mv sN, aN, where N < n_sreg,
>>+ and aN is not used the move pattern, and sN is not
>>+ defined before the move pattern (from prologue to the
>>+ position of move pattern).
>>+ 3. analysis use and reach of every instruction from prologue
>>+ to the position of move pattern.
>>+ if any sN is used, then we mark the corresponding argument list
>>+ candidate as invalid.
>>+ e.g.
>>+ push {ra,s0-s3}, {}, -32
>>+ sw s0,44(sp) # s0 is used, then argument list is invalid
>>+ mv a0,a5 # a0 is defined, then argument list is invalid
>>+ ...
>>+ mv s0,a0
>>+ mv s1,a1
>>+ mv s2,a2
>>+
>>+ 4. if there is a valid argument list, then replace the pop
>>+ push parallel insn, and delete mv pattern.
>>+ if not, skip.
>>+*/
>>+
>>+static void
>>+emit_zcmp_popret (rtx_insn *pop_rtx,
>>+ rtx_insn **candidates,
>>+ basic_block bb)
>>+{
>>+ bool gen_popretz_p = candidates [0];
>>+ bool gen_popret_p = candidates [2];
>>+
>>+ if (!(gen_popret_p || gen_popretz_p))
>>+ return;
>>+
>>+ gcc_assert ((gen_popret_p && !gen_popretz_p)
>>+ || (gen_popretz_p && gen_popret_p));
>>+
>>+ rtx pop_pat = PATTERN (pop_rtx);
>>+ unsigned pop_idx = 0, popret_idx = 0;
>>+ unsigned n_pop_par = XVECLEN (pop_pat, 0);
>>+ unsigned n_popret_par = n_pop_par
>>+ + (gen_popretz_p ? 2 : 0)
>>+ + (gen_popret_p ? 2 : 0);
>>+
>>+ rtx popret_par = gen_rtx_PARALLEL (VOIDmode,
>>+ rtvec_alloc (n_popret_par));
>>+
>>+ /* return zero pattern */
>>+ if (gen_popretz_p)
>>+ {
>>+ XVECEXP (popret_par, 0, 0) = PATTERN (candidates[0]);
>>+ XVECEXP (popret_par, 0, 1) = PATTERN (candidates[1]);
>>+ popret_idx += 2;
>>+ delete_insn (candidates[0]);
>>+ delete_insn (candidates[1]);
>>+ }
>>+
>>+ /* copy pop sequence. */
>>+ for (; pop_idx < n_pop_par;
>>+ pop_idx ++, popret_idx ++)
>>+ {
>>+ XVECEXP (popret_par, 0, popret_idx) =
>>+ XVECEXP (pop_pat, 0, pop_idx);
>>+ }
>>+
>>+ /* ret pattern. */
>>+ rtx ret_pat = PATTERN (candidates[2]);
>>+ gcc_assert (GET_CODE (ret_pat) == PARALLEL);
>>+
>>+ for (int i = 0; i < XVECLEN (ret_pat, 0);
>>+ i++, popret_idx++)
>>+ {
>>+ XVECEXP (popret_par, 0, popret_idx) =
>>+ XVECEXP (ret_pat, 0, i);
>>+ }
>>+
>>+ rtx_insn *insn = emit_jump_insn_after (
>>+ popret_par,
>>+ BB_END (bb));
>>+ JUMP_LABEL (insn) = simple_return_rtx;
>>+
>>+ REG_NOTES (insn) = REG_NOTES (pop_rtx);
>>+ RTX_FRAME_RELATED_P (insn) = 1;
>>+
>>+ if (dump_file)
>>+ {
>>+ fprintf(dump_file, "new insn:\n");
>>+ print_rtl (dump_file, insn);
>>+ }
>>+
>>+ delete_insn (candidates [2]);
>>+ delete_insn (pop_rtx);
>>+}
>>+
>>+static void
>>+zcmp_popret (void)
>>+{
>>+ basic_block bb;
>>+ rtx_insn *insn = NULL, *pop_rtx = NULL;
>>+ rtx_insn *pop_candidates[3] = {NULL, };
>>+ /*
>>+ find NOTE_INSN_EPILOGUE_BEG, but pop_rtx not found => return
>>+ find NOTE_INSN_EPILOGUE_BEG, and pop_rtx is found => looking for a0
>>+ */
>>+
>>+ FOR_EACH_BB_REVERSE_FN (bb, cfun)
>>+ {
>>+ FOR_BB_INSNS_REVERSE (bb, insn)
>>+ {
>>+ if (!pop_rtx
>>+ && NOTE_P (insn)
>>+ && NOTE_KIND (insn) == NOTE_INSN_EPILOGUE_BEG)
>>+ return;
>>+
>>+ if (NOTE_P (insn)
>>+ && NOTE_KIND (insn) == NOTE_INSN_FUNCTION_BEG)
>>+ {
>>+ if (pop_rtx)
>>+ emit_zcmp_popret (pop_rtx, pop_candidates, bb);
>>+ return;
>>+ };
>>+
>>+ if (!(NONDEBUG_INSN_P (insn)
>>+ || CALL_P (insn)))
>>+ continue;
>>+
>>+ rtx pop_pat = PATTERN (insn);
>>+
>>+ if (GET_CODE (pop_pat) == PARALLEL
>>+ && riscv_valid_stack_push_pop_p (pop_pat, false))
>>+ {
>>+ pop_rtx = insn;
>>+ continue;
>>+ }
>>+
>>+ /* pattern for `ret`. */
>>+ if (JUMP_P (insn)
>>+ && GET_CODE (pop_pat) == PARALLEL
>>+ && XVECLEN (pop_pat, 0) == 2
>>+ && GET_CODE (XVECEXP (pop_pat, 0, 0)) == SIMPLE_RETURN
>>+ && GET_CODE (XVECEXP (pop_pat, 0, 1)) == USE)
>>+ {
>>+ rtx use_reg = XEXP (XVECEXP (pop_pat, 0, 1), 0);
>>+ if (REG_P (use_reg)
>>+ && REGNO (use_reg) == RETURN_ADDR_REGNUM)
>>+ {
>>+ pop_candidates [2] = insn;
>>+ continue;
>>+ }
>>+ }
>>+
>>+ if (!pop_rtx)
>>+ continue;
>>+
>>+ /* pattern for return value. */
>>+ if (!pop_candidates [0]
>>+ && GET_CODE (pop_pat) == USE)
>>+ {
>>+ rtx_insn *set_insn = PREV_INSN (insn);
>>+ rtx pat_set = PATTERN (set_insn);
>>+
>>+ if (riscv_check_regno (XEXP (pop_pat, 0),
>>+ RETURN_VALUE_REGNUM)
>>+ && insn
>>+ && pat_set != NULL
>>+ && GET_CODE (pat_set) == SET
>>+ && riscv_check_regno (SET_DEST (pat_set),
>>+ RETURN_VALUE_REGNUM)
>>+ && CONST_INT_P (SET_SRC (pat_set))
>>+ && INTVAL (SET_SRC (pat_set)) == 0)
>>+ {
>>+ pop_candidates [0] = set_insn;
>>+ pop_candidates [1] = insn;
>>+ break;
>>+ }
>>+ }
>>+ }
>>+
>>+ if (pop_rtx)
>>+ {
>>+ emit_zcmp_popret (pop_rtx, pop_candidates, bb);
>>+ return;
>>+ }
>>+ }
>>+}
>>+
>>+const pass_data pass_data_zcmp_popret =
>>+{
>>+ RTL_PASS, /* type */
>>+ "zcmp-popret", /* name */
>>+ OPTGROUP_NONE, /* optinfo_flags */
>>+ TV_NONE, /* tv_id */
>>+ 0, /* properties_required */
>>+ 0, /* properties_provided */
>>+ 0, /* properties_destroyed */
>>+ 0, /* todo_flags_start */
>>+ 0, /* todo_flags_finish */
>>+};
>>+
>>+class pass_zcmp_popret : public rtl_opt_pass
>>+{
>>+public:
>>+ pass_zcmp_popret (gcc::context *ctxt)
>>+ : rtl_opt_pass (pass_data_zcmp_popret, ctxt)
>>+ {}
>>+
>>+ /* opt_pass methods: */
>>+ virtual bool gate (function *)
>>+ { return TARGET_ZCMP; }
>>+ virtual unsigned int execute (function *)
>>+ {
>>+ zcmp_popret ();
>>+ return 0;
>>+ }
>>+}; // class pass_zcmp_popret
>>+
>>+} // anon namespace
>>+
>>+rtl_opt_pass *
>>+make_pass_zcmp_popret (gcc::context *ctxt)
>>+{
>>+ return new pass_zcmp_popret (ctxt);
>>+}
>>diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>>index 5f8cbfc15ed..17df2f3f8cf 100644
>>--- a/gcc/config/riscv/riscv.cc
>>+++ b/gcc/config/riscv/riscv.cc
>>@@ -114,6 +114,9 @@ struct GTY(()) riscv_frame_info {
>> /* Likewise FPR X. */
>> unsigned int fmask;
>>
>>+ /* How much the push/pop routines adjust sp (or 0 if unused). */
>>+ unsigned push_pop_sp_adjust;
>>+
>> /* How much the GPR save/restore routines adjust sp (or 0 if unused). */
>> unsigned save_libcall_adjustment;
>>
>>@@ -401,6 +404,20 @@ static const unsigned gpr_save_reg_order[] = {
>> S10_REGNUM, S11_REGNUM
>> };
>>
>>+/* Order for the CLOBBERs/USEs of push/pop. */
>>+static const unsigned push_save_reg_order[] = {
>>+ INVALID_REGNUM, RETURN_ADDR_REGNUM, S0_REGNUM,
>>+ S1_REGNUM, S2_REGNUM, S3_REGNUM, S4_REGNUM,
>>+ S5_REGNUM, S6_REGNUM, S7_REGNUM, S8_REGNUM,
>>+ S9_REGNUM, S10_REGNUM, S11_REGNUM
>>+};
>>+
>>+/* Order for the CLOBBERs/USEs of push/pop in rve. */
>>+static const unsigned push_save_reg_order_zcmpe[] = {
>>+ INVALID_REGNUM, RETURN_ADDR_REGNUM, S0_REGNUM,
>>+ S1_REGNUM
>>+};
>>+
>> /* A table describing all the processors GCC knows about. */
>> static const struct riscv_tune_info riscv_tune_info_table[] = {
>> #define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO) \
>>@@ -2989,6 +3006,17 @@ riscv_output_return ()
>> return "ret";
>> }
>>
>>+bool
>>+riscv_output_popret_p (rtx op)
>>+{
>>+ unsigned n_rtx = XVECLEN (op, 0);
>>+ rtx use = XVECEXP (op, 0, n_rtx - 1);
>>+ rtx ret = XVECEXP (op, 0, n_rtx - 2);
>>+
>>+ return GET_CODE (ret) == SIMPLE_RETURN
>>+ && GET_CODE (use) == USE;
>>+}
>>+
>>
>>
>> /* Return true if CMP1 is a suitable second operand for integer ordering
>> test CODE. See also the *sCC patterns in riscv.md. */
>>@@ -4306,6 +4334,74 @@ riscv_memmodel_needs_amo_acquire (enum memmodel model)
>> }
>> }
>>
>>+/* Print Sp adjustment field of pop instruction. */
>>+
>>+static void
>>+riscv_print_pop_size (FILE *file, rtx op)
>>+{
>>+ unsigned sp_adjust_idx = XVECLEN (op, 0) - 1;
>>+ rtx sp_adjust_rtx = XVECEXP (op, 0, sp_adjust_idx);
>>+
>>+ /* Skip ret or pattern. */
>>+ while (GET_CODE (sp_adjust_rtx) != SET)
>>+ sp_adjust_rtx = XVECEXP (op, 0, --sp_adjust_idx);
>>+
>>+ rtx elt_plus = SET_SRC (sp_adjust_rtx);
>>+ fprintf (file, HOST_WIDE_INT_PRINT_DEC, INTVAL (XEXP (elt_plus, 1)));
>>+}
>>+
>>+/* Print push/pop register list. */
>>+
>>+static void
>>+riscv_print_reglist (FILE *file, rtx op)
>>+{
>>+ /* we only deal with three formats:
>>+ push {ra}
>>+ push {ra, s0}
>>+ push {ra, s0-sN}
>>+ or
>>+ pop {ra}
>>+ pop {ra, s0}
>>+ pop {ra, s0-sN}
>>+ Registers other than ra must be contiguous s-registers;
>>+ this is assumed to have been checked earlier.
>>+ register list patterns in push:
>>+ (set/f (mem/c:SI
>>+ (plus:SI (reg/f:SI 2 sp)
>>+ (const_int 28 [0x1c])) [2 S4 A32])
>>+ (reg:SI 1 ra))
>>+ register list patterns in pop:
>>+ (set/f (reg:DI 1 ra)
>>+ (mem/c:DI (plus:DI (reg/f:DI 2 sp)
>>+ (const_int 8 [0x8])) [2 S8 A64]))
>>+ */
>>+ int total_count = XVECLEN (op, 0);
>>+ int n_regs = 0;
>>+ bool push_p = GET_CODE (XVECEXP (op, 0, 0)) == SET
>>+ && GET_CODE (SET_SRC (XVECEXP (op, 0, 0))) == PLUS;
>>+
>>+ for (int idx = 0; idx < total_count; ++idx)
>>+ {
>>+ rtx ele = XVECEXP (op, 0, idx);
>>+ if (GET_CODE (ele) != SET)
>>+ continue;
>>+
>>+ bool restore_save_p = push_p ?
>>+ MEM_P (SET_DEST (ele)) :
>>+ MEM_P (SET_SRC (ele));
>>+
>>+ if (restore_save_p)
>>+ n_regs ++;
>>+ }
>>+
>>+ if (n_regs > 2)
>>+ fprintf (file, "ra,s0-s%u", n_regs - 2);
>>+ else if (n_regs > 1)
>>+ fprintf (file, "ra,s0");
>>+ else
>>+ fputs("ra", file);
>>+}
>>+
>> /* Return true if a FENCE should be emitted to before a memory access to
>> implement the release portion of memory model MODEL. */
>>
>>@@ -4517,6 +4613,14 @@ riscv_print_operand (FILE *file, rtx op, int letter)
>> fputs (GET_RTX_NAME (code), file);
>> break;
>>
>>+ case 'L':
>>+ riscv_print_reglist (file, op);
>>+ break;
>>+
>>+ case 's':
>>+ riscv_print_pop_size (file, op);
>>+ break;
>>+
>> case 'S':
>> {
>> rtx newop = GEN_INT (ctz_hwi (INTVAL (op)));
>>@@ -4777,6 +4881,66 @@ riscv_use_save_libcall (const struct riscv_frame_info *frame)
>> return frame->save_libcall_adjustment != 0;
>> }
>>
>>+/* Return the number of elements in the push/pop PARALLEL: the saved registers plus the sp adjustment. */
>>+
>>+static unsigned
>>+riscv_save_push_pop_count (unsigned mask)
>>+{
>>+ if (!BITSET_P (mask, GP_REG_FIRST + RETURN_ADDR_REGNUM))
>>+ return 0;
>>+ for (unsigned n = GP_REG_LAST; n > GP_REG_FIRST; n--)
>>+ if (BITSET_P (mask, n)
>>+ && !call_used_regs [n])
>>+ /* add ra saving and sp adjust. */
>>+ return CALLEE_SAVED_REG_NUMBER (n) + 1 + 2;
>>+ abort ();
>>+}
>>+
>>+/* Calculate the maximum sp adjustment of push/pop instruction. */
>>+
>>+static unsigned
>>+riscv_push_pop_base_sp_adjust (unsigned mask)
>>+{
>>+ unsigned n_regs = riscv_save_push_pop_count (mask) - 1;
>>+ return (n_regs * UNITS_PER_WORD + 15) & (~0xf);
>>+}
>>+
>>+/* Determine whether to call push/pop routines. */
>>+
>>+static bool
>>+riscv_use_push_pop (const struct riscv_frame_info *frame, const HOST_WIDE_INT frame_size)
>>+{
>>+ if (!TARGET_ZCMP)
>>+ return false;
>>+
>>+ /* We do not handle variable argument cases currently. */
>>+ if (cfun->machine->varargs_size != 0)
>>+ return false;
>>+
>>+ HOST_WIDE_INT base_size = riscv_push_pop_base_sp_adjust (frame->mask);
>>+ /*
>>+ Pr 960215-1.c in rv64 outputs
>>+
>>+ addi sp,sp,-32
>>+ sd ra,24(sp)
>>+ sd s0,16(sp)
>>+ sd s2,8(sp)
>>+ sd s3,0(sp)
>>+ It is a rare case in which the callee-saved registers are not contiguous,
>>+ which breaks the old push implementation, so we just reject this case
>>+ as save-restore does now.
>>+ */
>>+ if (base_size > frame_size)
>>+ return false;
>>+
>>+ /* {ra,s0-s10} is invalid. */
>>+ if (frame->mask & (1 << (S10_REGNUM - GP_REG_FIRST))
>>+ && !(frame->mask & (1 << (S11_REGNUM - GP_REG_FIRST))))
>>+ return false;
>>+
>>+ return frame->mask & (1 << (RETURN_ADDR_REGNUM - GP_REG_FIRST));
>>+}
>>+
>> /* Determine which GPR save/restore routine to call. */
>>
>> static unsigned
>>@@ -4934,6 +5098,8 @@ riscv_compute_frame_info (void)
>> /* Only use save/restore routines when the GPRs are atop the frame. */
>> if (known_ne (frame->hard_frame_pointer_offset, frame->total_size))
>> frame->save_libcall_adjustment = 0;
>>+
>>+ frame->push_pop_sp_adjust = 0;
>> }
>>
>> /* Make sure that we're not trying to eliminate to the wrong hard frame
>>@@ -5171,6 +5337,86 @@ riscv_for_each_saved_reg (poly_int64 sp_offset, riscv_save_restore_fn fn,
>> }
>> }
>>
>>+static void
>>+riscv_emit_pop_insn (struct riscv_frame_info *frame, HOST_WIDE_INT offset, HOST_WIDE_INT size)
>>+{
>>+ unsigned int veclen = riscv_save_push_pop_count (frame->mask);
>>+ unsigned int n_reg = veclen - 1;
>>+ rtvec vec = rtvec_alloc (veclen);
>>+ HOST_WIDE_INT sp_adjust;
>>+ rtx dwarf = NULL_RTX;
>>+
>>+ const unsigned *reg_order = (TARGET_ZCMP && TARGET_RVE)
>>+ ? push_save_reg_order_zcmpe
>>+ : push_save_reg_order;
>>+
>>+ gcc_assert (n_reg >= 1
>>+ && TARGET_ZCMP
>>+ && ((TARGET_RVE && (n_reg <= ARRAY_SIZE (push_save_reg_order_zcmpe)))
>>+ || (TARGET_ZCMP && (n_reg <= ARRAY_SIZE (push_save_reg_order)))));
>>+
>>+ /* sp adjust pattern */
>>+ int max_allow_sp_adjust = riscv_push_pop_base_sp_adjust (frame->mask) + 48;
>>+ int aligned_size = size;
>>+
>>+ /* if sp adjustment is too large, we should split it first. */
>>+ if (aligned_size > max_allow_sp_adjust)
>>+ {
>>+ rtx dwarf_pre_sp_adjust = NULL_RTX;
>>+ rtx pre_adjust_rtx = gen_add3_insn (stack_pointer_rtx,
>>+ stack_pointer_rtx,
>>+ GEN_INT (aligned_size - max_allow_sp_adjust));
>>+ rtx insn = emit_insn (pre_adjust_rtx);
>>+
>>+ rtx cfa_pre_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
>>+ GEN_INT (aligned_size - max_allow_sp_adjust));
>>+ dwarf_pre_sp_adjust = alloc_reg_note (REG_CFA_DEF_CFA,
>>+ cfa_pre_adjust_rtx,
>>+ dwarf_pre_sp_adjust);
>>+
>>+ RTX_FRAME_RELATED_P (insn) = 1;
>>+ REG_NOTES (insn) = dwarf_pre_sp_adjust;
>>+
>>+ sp_adjust = max_allow_sp_adjust;
>>+ }
>>+ else
>>+ sp_adjust = (aligned_size + 15) & (~0xf);
>>+
>>+ /* Register restore sequence. */
>>+ for (unsigned i = 1; i < veclen; ++i)
>>+ {
>>+ offset -= UNITS_PER_WORD;
>>+ unsigned regno = reg_order[i];
>>+ rtx reg = gen_rtx_REG (Pmode, regno);
>>+ rtx mem = gen_frame_mem (Pmode, plus_constant (Pmode,
>>+ stack_pointer_rtx,
>>+ offset));
>>+ rtx set = gen_rtx_SET (reg, mem);
>>+ RTVEC_ELT (vec, i - 1) = set;
>>+ RTX_FRAME_RELATED_P (set) = 1;
>>+ dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
>>+ }
>>+
>>+ /* sp adjust pattern */
>>+ rtx adjust_sp_rtx
>>+ = gen_rtx_SET (stack_pointer_rtx,
>>+ plus_constant (Pmode,
>>+ stack_pointer_rtx,
>>+ sp_adjust));
>>+ RTVEC_ELT (vec, veclen - 1) = adjust_sp_rtx;
>>+
>>+ rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
>>+ const0_rtx);
>>+ dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
>>+
>>+ frame->gp_sp_offset -= (veclen - 1) * UNITS_PER_WORD;
>>+ frame->push_pop_sp_adjust = sp_adjust;
>>+
>>+ rtx insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, vec));
>>+ RTX_FRAME_RELATED_P (insn) = 1;
>>+ REG_NOTES (insn) = dwarf;
>>+}
>>+
>> /* For stack frames that can't be allocated with a single ADDI instruction,
>> compute the best value to initially allocate. It must at a minimum
>> allocate enough space to spill the callee-saved registers. If TARGET_RVC,
>>@@ -5270,6 +5516,146 @@ riscv_emit_stack_tie (void)
>> emit_insn (gen_stack_tiedi (stack_pointer_rtx, hard_frame_pointer_rtx));
>> }
>>
>>+bool
>>+riscv_check_regno(rtx pat, unsigned regno)
>>+{
>>+ return REG_P (pat)
>>+ && REGNO (pat) == regno;
>>+}
>>+
>>+/* Function to check whether the OP is a valid stack push/pop operation.
>>+ This part is borrowed from nds32 nds32_valid_stack_push_pop_p */
>>+
>>+bool
>>+riscv_valid_stack_push_pop_p (rtx op, bool push_p)
>>+{
>>+ int index;
>>+ int total_count;
>>+ int sp_adjust_rtx_index;
>>+ rtx elt;
>>+ rtx elt_reg;
>>+ rtx elt_plus;
>>+
>>+ if (!TARGET_ZCMP)
>>+ return false;
>>+
>>+ total_count = XVECLEN (op, 0);
>>+ sp_adjust_rtx_index = push_p ? 0 : total_count - 1;
>>+
>>+ /* At least sp + one callee save/restore register rtx */
>>+ if (total_count < 2)
>>+ return false;
>>+
>>+ /* Quickly check that every element is a SET;
>>+ for pop, the vector may also contain `ret` and return-value patterns. */
>>+ for (index = 0; index < total_count; index++)
>>+ {
>>+ elt = XVECEXP (op, 0, index);
>>+
>>+ /* skip pop return value rtx */
>>+ if (!push_p && GET_CODE (elt) == SET
>>+ && riscv_check_regno (SET_DEST (elt), RETURN_VALUE_REGNUM)
>>+ && total_count >= 4
>>+ && index + 1 < total_count
>>+ && GET_CODE (XVECEXP (op, 0, index + 1)) == USE)
>>+ {
>>+ rtx use_reg = XEXP (XVECEXP (op, 0, index + 1), 0);
>>+
>>+ if (!riscv_check_regno (use_reg, RETURN_VALUE_REGNUM))
>>+ return false;
>>+
>>+ index += 1;
>>+ continue;
>>+ }
>>+
>>+ /* skip ret rtx */
>>+ if (!push_p && GET_CODE (elt) == SIMPLE_RETURN
>>+ && total_count >= 4
>>+ && index + 1 < total_count
>>+ && GET_CODE (XVECEXP (op, 0, index + 1)) == USE)
>>+ {
>>+ rtx use_reg = XEXP (XVECEXP (op, 0, index + 1), 0);
>>+
>>+ if (!riscv_check_regno (use_reg, RETURN_ADDR_REGNUM))
>>+ return false;
>>+
>>+ index += 1;
>>+ sp_adjust_rtx_index -= 2;
>>+ continue;
>>+ }
>>+
>>+ if (GET_CODE (elt) != SET)
>>+ return false;
>>+ }
>>+
>>+ elt = XVECEXP (op, 0, sp_adjust_rtx_index);
>>+ elt_reg = SET_DEST (elt);
>>+ elt_plus = SET_SRC (elt);
>>+
>>+ /* Check this is (set (stack_reg) (plus stack_reg const)) pattern. */
>>+ if (GET_CODE (elt_plus) != PLUS
>>+ || !riscv_check_regno (elt_reg, STACK_POINTER_REGNUM))
>>+ return false;
>>+
>>+ /* All tests passed; this is a valid rtx. */
>>+ return true;
>>+}
>>+
>>+/* Generate push/pop rtx */
>>+
>>+static void
>>+riscv_emit_push_insn (struct riscv_frame_info *frame, HOST_WIDE_INT size)
>>+{
>>+ unsigned int veclen = riscv_save_push_pop_count (frame->mask);
>>+ unsigned int n_reg = veclen - 1;
>>+ rtvec vec = rtvec_alloc (veclen);
>>+
>>+ const unsigned *reg_order = (TARGET_ZCMP && TARGET_RVE)
>>+ ? push_save_reg_order_zcmpe
>>+ : push_save_reg_order;
>>+
>>+ int aligned_size = (size + 15) & (~0xf);
>>+
>>+ gcc_assert (n_reg >= 1
>>+ && TARGET_ZCMP
>>+ && ((TARGET_RVE && (n_reg <= ARRAY_SIZE (push_save_reg_order_zcmpe)))
>>+ || (TARGET_ZCMP && (n_reg <= ARRAY_SIZE (push_save_reg_order)))));
>>+
>>+ /* sp adjust pattern */
>>+ int max_allow_sp_adjust = riscv_push_pop_base_sp_adjust (frame->mask) + 48;
>>+ int sp_adjust = aligned_size > max_allow_sp_adjust ?
>>+ max_allow_sp_adjust
>>+ : aligned_size;
>>+
>>+ /*TODO: move this part to frame computation function. */
>>+ frame->gp_sp_offset = (veclen - 1) * UNITS_PER_WORD;
>>+ frame->push_pop_sp_adjust = sp_adjust;
>>+
>>+ rtx adjust_sp_rtx
>>+ = gen_rtx_SET (stack_pointer_rtx,
>>+ plus_constant (Pmode,
>>+ stack_pointer_rtx,
>>+ -sp_adjust));
>>+ RTVEC_ELT (vec, 0) = adjust_sp_rtx;
>>+
>>+ /* Register save sequence. */
>>+ for (unsigned i = 1; i < veclen; ++i)
>>+ {
>>+ sp_adjust -= UNITS_PER_WORD;
>>+ unsigned regno = reg_order[i];
>>+ rtx reg = gen_rtx_REG (Pmode, regno);
>>+ rtx mem = gen_frame_mem (Pmode, plus_constant (Pmode,
>>+ stack_pointer_rtx,
>>+ sp_adjust));
>>+ rtx set = gen_rtx_SET (mem, reg);
>>+ RTVEC_ELT (vec, i) = set;
>>+ RTX_FRAME_RELATED_P (set) = 1;
>>+ }
>>+
>>+ rtx insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, vec));
>>+ RTX_FRAME_RELATED_P (insn) = 1;
>>+}
>>+
>> /* Expand the "prologue" pattern. */
>>
>> void
>>@@ -5278,6 +5664,7 @@ riscv_expand_prologue (void)
>> struct riscv_frame_info *frame = &cfun->machine->frame;
>> poly_int64 size = frame->total_size;
>> unsigned mask = frame->mask;
>>+ HOST_WIDE_INT step1 = riscv_first_stack_step (frame);
>> rtx insn;
>>
>> if (flag_stack_usage_info)
>>@@ -5300,19 +5687,32 @@ riscv_expand_prologue (void)
>> REG_NOTES (insn) = dwarf;
>> }
>>
>>+ if (size.is_constant ())
>>+ step1 = MIN (size.to_constant(), step1);
>>+ if (riscv_use_push_pop (frame, step1))
>>+ {
>>+ riscv_emit_push_insn (frame, step1);
>>+
>>+ step1 = MAX (step1 - frame->push_pop_sp_adjust, 0);
>>+ size = MAX (size.to_constant() - frame->push_pop_sp_adjust, 0);
>>+ frame->mask &= ~ ((TARGET_ZCMP && TARGET_RVE) ?
>>+ RISCV_ZCMPE_PUSH_POP_MASK
>>+ : RISCV_ZCE_PUSH_POP_MASK);
>>+ }
>>+
>> /* Save the registers. */
>> if ((frame->mask | frame->fmask) != 0)
>> {
>>- HOST_WIDE_INT step1 = riscv_first_stack_step (frame);
>>- if (size.is_constant ())
>>- step1 = MIN (size.to_constant(), step1);
>>-
>>- insn = gen_add3_insn (stack_pointer_rtx,
>>- stack_pointer_rtx,
>>- GEN_INT (-step1));
>>- RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
>>- size -= step1;
>>- riscv_for_each_saved_reg (size, riscv_save_reg, false, false);
>>+ if (step1 > 0)
>>+ {
>>+ insn = gen_add3_insn (stack_pointer_rtx,
>>+ stack_pointer_rtx,
>>+ GEN_INT (-step1));
>>+ RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
>>+ size -= step1;
>>+ }
>>+ riscv_for_each_saved_reg (size, riscv_save_reg,
>>+ false /* bool epilogue */, false /* bool maybe_eh_return */);
>> }
>>
>> frame->mask = mask; /* Undo the above fib. */
>>@@ -5412,6 +5812,8 @@ riscv_expand_epilogue (int style)
>> rtx ra = gen_rtx_REG (Pmode, RETURN_ADDR_REGNUM);
>> rtx insn;
>>
>>+ bool use_zcmp_pop = !use_restore_libcall && !(crtl->calls_eh_return);
>>+
>> /* We need to add memory barrier to prevent read from deallocated stack. */
>> bool need_barrier_p = known_ne (get_frame_size ()
>> + cfun->machine->frame.arg_pointer_offset, 0);
>>@@ -5538,6 +5940,18 @@ riscv_expand_epilogue (int style)
>> if (use_restore_libcall)
>> frame->mask = 0; /* Temporarily fib that we need not save GPRs. */
>>
>>+ if (use_zcmp_pop && riscv_use_push_pop (frame, step2))
>>+ {
>>+ /* Emit a barrier to prevent loads from a deallocated stack. */
>>+ riscv_emit_stack_tie ();
>>+ need_barrier_p = false;
>>+ riscv_emit_pop_insn (frame, frame->total_size.to_constant(), step2);
>>+ frame->mask &= ~ ((TARGET_ZCMP && TARGET_RVE) ?
>>+ RISCV_ZCMPE_PUSH_POP_MASK
>>+ : RISCV_ZCE_PUSH_POP_MASK);
>>+ step2 = 0;
>>+ }
>>+
>> /* Restore the registers. */
>> riscv_for_each_saved_reg (frame->total_size - step2, riscv_restore_reg,
>> true, style == EXCEPTION_RETURN);
>>@@ -5552,6 +5966,9 @@ riscv_expand_epilogue (int style)
>> if (need_barrier_p)
>> riscv_emit_stack_tie ();
>>
>>+ if (use_zcmp_pop)
>>+ frame->mask = mask;
>>+
>> /* Deallocate the final bit of the frame. */
>> if (step2 > 0)
>> {
>>diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
>>index d05b1d59853..6e6e3ee2c25 100644
>>--- a/gcc/config/riscv/riscv.h
>>+++ b/gcc/config/riscv/riscv.h
>>@@ -383,6 +383,7 @@ ASM_MISA_SPEC
>> #define HARD_FRAME_POINTER_REGNUM 8
>> #define STACK_POINTER_REGNUM 2
>> #define THREAD_POINTER_REGNUM 4
>>+#define RETURN_VALUE_REGNUM 10
>>
>> /* These two registers don't really exist: they get eliminated to either
>> the stack or hard frame pointer. */
>>@@ -1097,4 +1098,7 @@ extern void riscv_remove_unneeded_save_restore_calls (void);
>> #define DWARF_REG_TO_UNWIND_COLUMN(REGNO) \
>> ((REGNO == RISCV_DWARF_VLENB) ? (FIRST_PSEUDO_REGISTER + 1) : REGNO)
>>
>>+#define RISCV_ZCE_PUSH_POP_MASK 0x0ffc0302u
>>+#define RISCV_ZCMPE_PUSH_POP_MASK 0x302u
>>+
>> #endif /* ! GCC_RISCV_H */
>>diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
>>index bc384d9aedf..b9f2a426e48 100644
>>--- a/gcc/config/riscv/riscv.md
>>+++ b/gcc/config/riscv/riscv.md
>>@@ -108,12 +108,14 @@
>>
>> (define_constants
>> [(RETURN_ADDR_REGNUM 1)
>>+ (SP_REGNUM 2)
>> (GP_REGNUM 3)
>> (TP_REGNUM 4)
>> (T0_REGNUM 5)
>> (T1_REGNUM 6)
>> (S0_REGNUM 8)
>> (S1_REGNUM 9)
>>+ (A0_REGNUM 10)
>> (S2_REGNUM 18)
>> (S3_REGNUM 19)
>> (S4_REGNUM 20)
>>@@ -3147,3 +3149,4 @@
>> (include "sifive-7.md")
>> (include "thead.md")
>> (include "vector.md")
>>+(include "zc.md")
>>diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
>>index 6e326fc7e02..9ef522306a5 100644
>>--- a/gcc/config/riscv/t-riscv
>>+++ b/gcc/config/riscv/t-riscv
>>@@ -90,6 +90,10 @@ riscv-v.o: $(srcdir)/config/riscv/riscv-v.cc \
>> $(COMPILE) $<
>> $(POSTCOMPILE)
>>
>>+riscv-zcmp-popret.o: $(srcdir)/config/riscv/riscv-zcmp-popret.cc
>>+ $(COMPILE) $<
>>+ $(POSTCOMPILE)
>>+
>> thead.o: $(srcdir)/config/riscv/thead.cc \
>> $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TARGET_H) backend.h $(RTL_H) \
>> memmodel.h $(EMIT_RTL_H) poly-int.h output.h
>>diff --git a/gcc/config/riscv/zc.md b/gcc/config/riscv/zc.md
>>new file mode 100644
>>index 00000000000..3ad34dacd49
>>--- /dev/null
>>+++ b/gcc/config/riscv/zc.md
>>@@ -0,0 +1,47 @@
>>+;; Machine description for ZCE extension.
>>+;; Copyright (C) 2021 Free Software Foundation, Inc.
>>+
>>+;; This file is part of GCC.
>>+
>>+;; GCC is free software; you can redistribute it and/or modify
>>+;; it under the terms of the GNU General Public License as published by
>>+;; the Free Software Foundation; either version 3, or (at your option)
>>+;; any later version.
>>+
>>+;; GCC is distributed in the hope that it will be useful,
>>+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
>>+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>>+;; GNU General Public License for more details.
>>+
>>+;; You should have received a copy of the GNU General Public License
>>+;; along with GCC; see the file COPYING3. If not see
>>+;; <http://www.gnu.org/licenses/>.
>>+
>>+(define_insn "*stack_push<mode>"
>>+ [(match_parallel 0 "riscv_stack_push_operation"
>>+ [(set (reg:X SP_REGNUM) (plus:X (reg:X SP_REGNUM)
>>+ (match_operand:X 1 "const_int_operand" "")))])]
>>+ "TARGET_ZCMP"
>>+ "cm.push\t{%L0},%1")
>>+
>>+(define_insn "*stack_pop<mode>"
>>+ [(match_parallel 0 "riscv_stack_pop_operation"
>>+ [(set (match_operand:X 1 "register_operand" "")
>>+ (mem:X (plus:X (reg:X SP_REGNUM)
>>+ (match_operand:X 2 "const_int_operand" ""))))])]
>>+ "TARGET_ZCMP"
>>+ {
>>+ return riscv_output_popret_p (operands[0]) ?
>>+ "cm.popret\t{%L0},%s0" :
>>+ "cm.pop\t{%L0},%s0";
>>+ })
>>+
>>+(define_insn "*stack_pop_with_return_value<mode>"
>>+ [(match_parallel 0 "riscv_stack_pop_operation"
>>+ [(set (reg:ANYI A0_REGNUM)
>>+ (match_operand:ANYI 1 "pop_return_value_constant" ""))])]
>>+ "TARGET_ZCMP"
>>+ {
>>+ gcc_assert (riscv_output_popret_p (operands[0]));
>>+ return "cm.popretz\t{%L0},%s0";
>>+ })
>>--
>>2.25.1 
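
As a reading aid for the two masks added to riscv.h in the patch above, here is a minimal sketch of how the constants decompose into register bits. It assumes the standard RISC-V x-register numbering (ra = x1, s0/s1 = x8/x9, s2-s11 = x18-x27); the function names are hypothetical and are not part of the patch. The last helper restates the "{ra,s0-s10} is invalid" check from riscv_use_push_pop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* RISC-V integer register numbers (x-register indices). */
enum { RA = 1, S0 = 8, S1 = 9, S2 = 18, S10 = 26, S11 = 27 };

/* Bitset of ra, s0 and s1: the registers cm.push can save on RVE. */
static uint32_t
zcmp_rve_mask (void)
{
  return (1u << RA) | (1u << S0) | (1u << S1);
}

/* Bitset of ra and s0-s11: the registers cm.push can save otherwise. */
static uint32_t
zcmp_full_mask (void)
{
  uint32_t mask = zcmp_rve_mask ();
  for (int regno = S2; regno <= S11; regno++)
    mask |= 1u << regno;
  return mask;
}

/* A register list ending at s10 is rejected by the patch, so a frame
   that saves s10 must also save s11 to qualify for push/pop. */
static bool
zcmp_reglist_encodable_p (uint32_t mask)
{
  bool has_s10 = (mask & (1u << S10)) != 0;
  bool has_s11 = (mask & (1u << S11)) != 0;
  return !has_s10 || has_s11;
}

/* Compare the computed bitsets with the constants quoted from riscv.h. */
static void
check_masks (void)
{
  assert (zcmp_full_mask () == 0x0ffc0302u);
  assert (zcmp_rve_mask () == 0x302u);
}
```

Calling check_masks () confirms that RISCV_ZCE_PUSH_POP_MASK covers ra and s0-s11, while RISCV_ZCMPE_PUSH_POP_MASK covers only ra, s0 and s1.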

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Re: [PATCH 4/5] RISC-V: Add Zcmp extension supports.
  2023-05-12  8:12     ` Sinan
@ 2023-05-12  9:10       ` Fei Gao
  0 siblings, 0 replies; 9+ messages in thread
From: Fei Gao @ 2023-05-12  9:10 UTC (permalink / raw)
  To: Sinan; +Cc: gcc-patches, jiawei

On 2023-05-12 16:12  Sinan <sinan.lin@linux.alibaba.com> wrote:
>
>Hi Fei,
>Sorry for the late reply, I've been busy with moving these days :(.
>Thanks for working on it. I would prefer removing the extra pass for popretz if possible ... I will test your patches ASAP.
>BR,
>Sinan 

hi Sinan

I posted V2 based on Kito's comment just now.
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg307507.html

For popretz, we can discuss further offline if that is convenient for you.

BR, 
Fei
>------------------------------------------------------------------
>Sender:Fei Gao <gaofei@eswincomputing.com>
>Sent At:2023 May 6 (Sat.) 16:53
>Recipient:Sinan <sinan.lin@linux.alibaba.com>
>Cc:jiawei <jiawei@iscas.ac.cn>; gcc-patches <gcc-patches@gcc.gnu.org>
>Subject:Re: Re: [PATCH 4/5] RISC-V: Add Zcmp extension supports.
>On 2023-05-05 23:57 Sinan <sinan.lin@linux.alibaba.com> wrote:
>>
>>> hi Jiawei
>>>
>>> Please ignore my previous reply. I accidentally sent the email before I finished it.
>>> Sorry for that!
>>>
>>> I downloaded the series of patches from you and found in some cases
>>> it fails to generate zcmp push and pop insns.
>>>
>>> TC:
>>>
>>> char my_getchar();
>>> int test_s0()
>>> {
>>>
>>> int a = my_getchar();
>>> int b = my_getchar();
>>> return a+b;
>>> }
>>>
>>> cc1 -fno-shrink-wrap-separate -O2 -march=rv32e_zca_zcmp -mabi=ilp32e -mcmodel=medlow test.c
>>>
>>> -fno-shrink-wrap-separate is used here to avoid the impact from shrink-wrap-separate that is by default
>>> enabled in O2.
>>>
>>> As I'm also interested in Zc*, I made some changes, mainly in the prologue and epilogue pass, quite similar to
>>> what has been done for save and restore, except for the CFI directives, because zcmp pushes and pops
>>> ra and the s registers in the reverse order from save and restore.
>>>
>>> I will refine and share the code soon for your review.
>>>
>>> BR
>>> Fei
>>Hi Fei,
>>In the current implementation, cm.push will not increase the original adjustment size of the stack pointer. cm.push uses a minimum adjustment size of 16, and in your example the adjustment size of sp is 12, so cm.push will not be generated.
>>You can find the check in riscv_use_push_pop:
>>> > + */
>>> > + if (base_size > frame_size)
>>> > + return false;
>>> > +
>>And if this check is removed, then you can get the output that you expect.
>>```
>> cm.push {ra,s0},-16
>> call my_getchar
>> mv s0,a0
>> call my_getchar
>> add a0,s0,a0
>> cm.popret {ra,s0},16
>>```
>>As a result, cm.push cannot be generated in many rv32e scenarios. Perhaps we can remove this check? I haven't tested whether it is OK to remove it, so I am CCing jiawei to help test it.
>>BR,
>>Sinan
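
The arithmetic behind the rejection described above can be sketched as follows. This is a standalone illustration rather than code from the patch: the function names are hypothetical, and it assumes the rounding used in riscv_push_pop_base_sp_adjust (save area rounded up to a multiple of 16) together with the 12-byte frame quoted for the rv32e test case.

```c
#include <assert.h>

/* Smallest stack adjustment cm.push can make for N_REGS saved registers
   of WORD_BYTES bytes each: the save area rounded up to 16 bytes. */
static unsigned
push_base_sp_adjust (unsigned n_regs, unsigned word_bytes)
{
  return (n_regs * word_bytes + 15u) & ~0xfu;
}

/* Mirror of the quoted check: push/pop is rejected when even the
   smallest adjustment is larger than the frame. */
static int
push_rejected_p (unsigned n_regs, unsigned word_bytes, unsigned frame_size)
{
  return push_base_sp_adjust (n_regs, word_bytes) > frame_size;
}

static void
check_example (void)
{
  /* rv32e, {ra,s0}: 2 * 4 = 8 bytes, rounded up to 16. */
  assert (push_base_sp_adjust (2, 4) == 16);
  /* A 12-byte frame is smaller than 16, so cm.push is not used. */
  assert (push_rejected_p (2, 4, 12));
  /* A 16-byte frame would pass the check. */
  assert (!push_rejected_p (2, 4, 16));
}
```

With the ilp32e ABI the frame only needs 4-byte alignment, which is why frames smaller than 16 bytes are common there and the check fires often.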
>hi Sinan
>Thanks for your reply.
>I posted my code at https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg306921.html
>In the cover letter, I did some comparisons.
>Could you please review?
>Thanks & BR,
>Fei
>>------------------------------------------------------------------
>>Sender:Fei Gao <gaofei@eswincomputing.com>
>>Sent At:2023 Apr. 25 (Tue.) 18:12
>>Recipient:jiawei <jiawei@iscas.ac.cn>
>>Cc:gcc-patches <gcc-patches@gcc.gnu.org>
>>Subject:[PATCH 4/5] RISC-V: Add Zcmp extension supports.
>>hi Jiawei
>>Please ignore my previous reply. I accidentally sent the email before I finished it.
>>Sorry for that!
>>I downloaded the series of patches from you and found in some cases
>>it fails to generate zcmp push and pop insns.
>>TC:
>>char my_getchar();
>>int test_s0()
>>{
>> int a = my_getchar();
>> int b = my_getchar();
>> return a+b;
>>}
>>cc1 -fno-shrink-wrap-separate -O2 -march=rv32e_zca_zcmp -mabi=ilp32e -mcmodel=medlow test.c
>>-fno-shrink-wrap-separate is used here to avoid the impact from shrink-wrap-separate that is by default
>>enabled in O2.
>>As I'm also interested in Zc*, I made some changes, mainly in the prologue and epilogue pass, quite similar to
>>what has been done for save and restore, except for the CFI directives, because zcmp pushes and pops
>>ra and the s registers in the reverse order from save and restore.
>>I will refine and share the code soon for your review.
>>BR
>>Fei
>>On Thu Apr 6 06:21:17 GMT 2023 Jiawei jiawei@iscas.ac.cn wrote:
>>>
>>>Add Zcmp extension instructions support. Generate push/pop
>>>with the following steps:
>>>
>>> 1. preprocessing:
>>> 1.1. if there is no push rtx, then just return. e.g.
>>> (note 5 1 22 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>>> (insn/f 22 5 23 2 (set (reg/f:SI 2 sp)
>>> (plus:SI (reg/f:SI 2 sp)
>>> (const_int -32 [0xffffffffffffffe0])))
>>> (nil))
>>> (note 23 22 2 2 NOTE_INSN_PROLOGUE_END)
>>> 1.2. if push rtx exists, then we compute the number of
>>> pushed s-registers, n_sreg.
>>>
>>> push rtx should be found before NOTE_INSN_PROLOGUE_END tag
>>>
>>> [2 and 3 happen simultaneously]
>>>
>>> 2. find valid move pattern, mv sN, aN, where N < n_sreg,
>>> and aN is not used the move pattern, and sN is not
>>> defined before the move pattern (from prologue to the
>>> position of move pattern).
>>>
>>> 3. analyze the use and reach of every instruction from prologue
>>> to the position of move pattern.
>>> if any sN is used, then we mark the corresponding argument list
>>> candidate as invalid.
>>> e.g.
>>> push {ra,s0-s3}, {}, -32
>>> sw s0,44(sp) # s0 is used, then argument list is invalid
>>> mv a0,a5 # a0 is defined, then argument list is invalid
>>> ...
>>> mv s0,a0
>>> mv s1,a1
>>> mv s2,a2
>>>
>>> 4. if there is a valid argument list, then replace the pop
>>> push parallel insn, and delete mv pattern.
>>> if not, skip.
>>>
>>>All "zcmpe" means Zcmp with RVE extension.
>>>The push/pop instruction implementation was mostly done by Sinan Lin.
>>>
>>>Co-Authored by: Sinan Lin <sinan....@linux.alibaba.com>
>>>Co-Authored by: Simon Cook <simon.c...@embecosm.com>
>>>Co-Authored by: Shihua Liao <shi...@iscas.ac.cn>
>>>
>>>gcc/ChangeLog:
>>>
>>> * config.gcc: New object.
>>> * config/riscv/predicates.md (riscv_stack_push_operation):
>>> New predicate.
>>> (riscv_stack_pop_operation): Ditto.
>>> (pop_return_value_constant): Ditto.
>>> * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): New pass.
>>> * config/riscv/riscv-protos.h (riscv_output_popret_p):
>>> New routine.
>>> (riscv_valid_stack_push_pop_p): Ditto.
>>> (riscv_check_regno): Ditto.
>>> (make_pass_zcmp_popret): New pass.
>>> * config/riscv/riscv.cc (struct riscv_frame_info): New variable.
>>> (riscv_output_popret_p): New function.
>>> (riscv_print_pop_size): Ditto.
>>> (riscv_print_reglist): Ditto.
>>> (riscv_print_operand): New case symbols.
>>> (riscv_save_push_pop_count): New function.
>>> (riscv_push_pop_base_sp_adjust): Ditto.
>>> (riscv_use_push_pop): Ditto.
>>> (riscv_compute_frame_info): Adjust frame value.
>>> (riscv_emit_pop_insn): New function.
>>> (riscv_check_regno): Ditto.
>>> (riscv_valid_stack_push_pop_p): Ditto.
>>> (riscv_emit_push_insn): Ditto.
>>> (riscv_expand_prologue): Modify frame pattern.
>>> (riscv_expand_epilogue): Ditto.
>>> * config/riscv/riscv.h (RETURN_VALUE_REGNUM):
>>> (RISCV_ZCE_PUSH_POP_MASK): New mask.
>>> (RISCV_ZCMPE_PUSH_POP_MASK): Ditto.
>>> * config/riscv/riscv.md: Add new reg number and include info.
>>> * config/riscv/t-riscv: New object rules.
>>> * config/riscv/riscv-zcmp-popret.cc: New file.
>>> * config/riscv/zc.md: New file.
>>>---
>>> gcc/config.gcc | 2 +-
>>> gcc/config/riscv/predicates.md | 16 +
>>> gcc/config/riscv/riscv-passes.def | 1 +
>>> gcc/config/riscv/riscv-protos.h | 4 +
>>> gcc/config/riscv/riscv-zcmp-popret.cc | 260 +++++++++++++++
>>> gcc/config/riscv/riscv.cc | 437 +++++++++++++++++++++++++-
>>> gcc/config/riscv/riscv.h | 4 +
>>> gcc/config/riscv/riscv.md | 3 +
>>> gcc/config/riscv/t-riscv | 4 +
>>> gcc/config/riscv/zc.md | 47 +++
>>> 10 files changed, 767 insertions(+), 11 deletions(-)
>>> create mode 100644 gcc/config/riscv/riscv-zcmp-popret.cc
>>> create mode 100644 gcc/config/riscv/zc.md
>>>
>>>diff --git a/gcc/config.gcc b/gcc/config.gcc
>>>index 629d324b5ef..a991c5273f9 100644
>>>--- a/gcc/config.gcc
>>>+++ b/gcc/config.gcc
>>>@@ -529,7 +529,7 @@ pru-*-*)
>>> ;;
>>> riscv*)
>>> cpu_type=riscv
>>>- extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o"
>>>+ extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o riscv-zcmp-popret.o"
>>> extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
>>> extra_objs="${extra_objs} thead.o"
>>> d_target_objs="riscv-d.o"
>>>diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
>>>index 0d9d7701c7e..6bff6cd047a 100644
>>>--- a/gcc/config/riscv/predicates.md
>>>+++ b/gcc/config/riscv/predicates.md
>>>@@ -412,3 +412,19 @@
>>> (and (match_code "const_int")
>>> (ior (match_operand 0 "not_uimm_extra_bit_operand")
>>> (match_operand 0 "const_nottwobits_operand"))))
>>>+
>>>+(define_special_predicate "riscv_stack_push_operation"
>>>+ (match_code "parallel")
>>>+{
>>>+ return riscv_valid_stack_push_pop_p (op, true);
>>>+})
>>>+
>>>+(define_special_predicate "riscv_stack_pop_operation"
>>>+ (match_code "parallel")
>>>+{
>>>+ return riscv_valid_stack_push_pop_p (op, false);
>>>+})
>>>+
>>>+(define_predicate "pop_return_value_constant"
>>>+ (and (match_code "const_int")
>>>+ (match_test "INTVAL (op) == 0")))
>>>diff --git a/gcc/config/riscv/riscv-passes.def b/gcc/config/riscv/riscv-passes.def
>>>index 4084122cf0a..25625b9af3e 100644
>>>--- a/gcc/config/riscv/riscv-passes.def
>>>+++ b/gcc/config/riscv/riscv-passes.def
>>>@@ -19,3 +19,4 @@
>>>
>>> INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
>>> INSERT_PASS_BEFORE (pass_fast_rtl_dce, 1, pass_vsetvl);
>>>+INSERT_PASS_AFTER (pass_cprop_hardreg, 1, pass_zcmp_popret);
>>>diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
>>>index 4611447ddde..8f243cd5f44 100644
>>>--- a/gcc/config/riscv/riscv-protos.h
>>>+++ b/gcc/config/riscv/riscv-protos.h
>>>@@ -54,6 +54,7 @@ extern bool riscv_split_64bit_move_p (rtx, rtx);
>>> extern void riscv_split_doubleword_move (rtx, rtx);
>>> extern const char *riscv_output_move (rtx, rtx);
>>> extern const char *riscv_output_return ();
>>>+extern bool riscv_output_popret_p (rtx);
>>>
>>> #ifdef RTX_CODE
>>> extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx);
>>>@@ -79,6 +80,8 @@ extern void riscv_reinit (void);
>>> extern poly_uint64 riscv_regmode_natural_size (machine_mode);
>>> extern bool riscv_v_ext_vector_mode_p (machine_mode);
>>> extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT);
>>>+extern bool riscv_valid_stack_push_pop_p (rtx, bool);
>>>+extern bool riscv_check_regno(rtx, unsigned);
>>>
>>> /* Routines implemented in riscv-c.cc. */
>>> void riscv_cpu_cpp_builtins (cpp_reader *);
>>>@@ -99,6 +102,7 @@ extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
>>>
>>> rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
>>> rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
>>>+rtl_opt_pass * make_pass_zcmp_popret (gcc::context *ctxt);
>>>
>>> /* Information about one CPU we know about. */
>>> struct riscv_cpu_info {
>>>diff --git a/gcc/config/riscv/riscv-zcmp-popret.cc b/gcc/config/riscv/riscv-zcmp-popret.cc
>>>new file mode 100644
>>>index 00000000000..d7b40f6a3e2
>>>--- /dev/null
>>>+++ b/gcc/config/riscv/riscv-zcmp-popret.cc
>>>@@ -0,0 +1,260 @@
>>>+#include "config.h"
>>>+#include "system.h"
>>>+#include "coretypes.h"
>>>+#include "tm.h"
>>>+#include "rtl.h"
>>>+#include "backend.h"
>>>+#include "regs.h"
>>>+#include "target.h"
>>>+#include "memmodel.h"
>>>+#include "emit-rtl.h"
>>>+#include "df.h"
>>>+#include "predict.h"
>>>+#include "tree-pass.h"
>>>+#include "tree.h"
>>>+#include "tm_p.h"
>>>+#include "optabs.h"
>>>+#include "recog.h"
>>>+#include "cfgrtl.h"
>>>+
>>>+#define IN_TARGET_CODE 1
>>>+
>>>+namespace {
>>>+
>>>+/*
>>>+ 1. preprocessing:
>>>+ 1.1. if there is no push rtx, then just return. e.g.
>>>+ (note 5 1 22 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>>>+ (insn/f 22 5 23 2 (set (reg/f:SI 2 sp)
>>>+ (plus:SI (reg/f:SI 2 sp)
>>>+ (const_int -32 [0xffffffffffffffe0])))
>>>+ (nil))
>>>+ (note 23 22 2 2 NOTE_INSN_PROLOGUE_END)
>>>+ 1.2. if push rtx exists, then we compute the number of
>>>+ pushed s-registers, n_sreg.
>>>+
>>>+ push rtx should be find before NOTE_INSN_PROLOGUE_END tag
>>>+
>>>+ [2 and 3 happend simultaneously]
>>>+ 2. find valid move pattern, mv sN, aN, where N < n_sreg,
>>>+ and aN is not used the move pattern, and sN is not
>>>+ defined before the move pattern (from prologue to the
>>>+ position of move pattern).
>>>+ 3. analysis use and reach of every instruction from prologue
>>>+ to the position of move pattern.
>>>+ if any sN is used, then we mark the corresponding argument list
>>>+ candidate as invalid.
>>>+ e.g.
>>>+ push {ra,s0-s3}, {}, -32
>>>+ sw s0,44(sp) # s0 is used, then argument list is invalid
>>>+ mv a0,a5 # a0 is defined, then argument list is invalid
>>>+ ...
>>>+ mv s0,a0
>>>+ mv s1,a1
>>>+ mv s2,a2
>>>+
>>>+ 4. if there is a valid argument list, then replace the pop
>>>+ push parallel insn, and delete mv pattern.
>>>+ if not, skip.
>>>+*/
>>>+
>>>+static void
>>>+emit_zcmp_popret (rtx_insn *pop_rtx,
>>>+ rtx_insn **candidates,
>>>+ basic_block bb)
>>>+{
>>>+ bool gen_popretz_p = candidates [0];
>>>+ bool gen_popret_p = candidates [2];
>>>+
>>>+ if (!(gen_popret_p || gen_popretz_p))
>>>+ return;
>>>+
>>>+ gcc_assert ((gen_popret_p && !gen_popretz_p)
>>>+ || (gen_popretz_p && gen_popret_p));
>>>+
>>>+ rtx pop_pat = PATTERN (pop_rtx);
>>>+ unsigned pop_idx = 0, popret_idx = 0;
>>>+ unsigned n_pop_par = XVECLEN (pop_pat, 0);
>>>+ unsigned n_popret_par = n_pop_par
>>>+ + (gen_popretz_p ? 2 : 0)
>>>+ + (gen_popret_p ? 2 : 0);
>>>+
>>>+ rtx popret_par = gen_rtx_PARALLEL (VOIDmode,
>>>+ rtvec_alloc (n_popret_par));
>>>+
>>>+ /* return zero pattern */
>>>+ if (gen_popretz_p)
>>>+ {
>>>+ XVECEXP (popret_par, 0, 0) = PATTERN (candidates[0]);
>>>+ XVECEXP (popret_par, 0, 1) = PATTERN (candidates[1]);
>>>+ popret_idx += 2;
>>>+ delete_insn (candidates[0]);
>>>+ delete_insn (candidates[1]);
>>>+ }
>>>+
>>>+ /* copy pop paruence. */
>>>+ for (; pop_idx < n_pop_par;
>>>+ pop_idx ++, popret_idx ++)
>>>+ {
>>>+ XVECEXP (popret_par, 0, popret_idx) =
>>>+ XVECEXP (pop_pat, 0, pop_idx);
>>>+ }
>>>+
>>>+ /* ret pattern. */
>>>+ rtx ret_pat = PATTERN (candidates[2]);
>>>+ gcc_assert (GET_CODE (ret_pat) == PARALLEL);
>>>+
>>>+ for (int i = 0; i < XVECLEN (ret_pat, 0);
>>>+ i++, popret_idx++)
>>>+ {
>>>+ XVECEXP (popret_par, 0, popret_idx) =
>>>+ XVECEXP (ret_pat, 0, i);
>>>+ }
>>>+
>>>+ rtx_insn *insn = emit_jump_insn_after (
>>>+ popret_par,
>>>+ BB_END (bb));
>>>+ JUMP_LABEL (insn) = simple_return_rtx;
>>>+
>>>+ REG_NOTES (insn) = REG_NOTES (pop_rtx);
>>>+ RTX_FRAME_RELATED_P (insn) = 1;
>>>+
>>>+ if (dump_file)
>>>+ {
>>>+ fprintf(dump_file, "new insn:\n");
>>>+ print_rtl (dump_file, insn);
>>>+ }
>>>+
>>>+ delete_insn (candidates [2]);
>>>+ delete_insn (pop_rtx);
>>>+}
>>>+
>>>+static void
>>>+zcmp_popret (void)
>>>+{
>>>+ basic_block bb;
>>>+ rtx_insn *insn = NULL, *pop_rtx = NULL;
>>>+ rtx_insn *pop_candidates[3] = {NULL, };
>>>+ /*
>>>+ find NOTE_INSN_EPILOGUE_BEG, but pop_rtx not found => return
>>>+ find NOTE_INSN_EPILOGUE_BEG, and pop_rtx is found => looking for a0
>>>+ */
>>>+
>>>+ FOR_EACH_BB_REVERSE_FN (bb, cfun)
>>>+ {
>>>+ FOR_BB_INSNS_REVERSE (bb, insn)
>>>+ {
>>>+ if (!pop_rtx
>>>+ && NOTE_P (insn)
>>>+ && NOTE_KIND (insn) == NOTE_INSN_EPILOGUE_BEG)
>>>+ return;
>>>+
>>>+ if (NOTE_P (insn)
>>>+ && NOTE_KIND (insn) == NOTE_INSN_FUNCTION_BEG)
>>>+ {
>>>+ if (pop_rtx)
>>>+ emit_zcmp_popret (pop_rtx, pop_candidates, bb);
>>>+ return;
>>>+ };
>>>+
>>>+ if (!(NONDEBUG_INSN_P (insn)
>>>+ || CALL_P (insn)))
>>>+ continue;
>>>+
>>>+ rtx pop_pat = PATTERN (insn);
>>>+
>>>+ if (GET_CODE (pop_pat) == PARALLEL
>>>+ && riscv_valid_stack_push_pop_p (pop_pat, false))
>>>+ {
>>>+ pop_rtx = insn;
>>>+ continue;
>>>+ }
>>>+
>>>+ /* pattern for `ret`. */
>>>+ if (JUMP_P (insn)
>>>+ && GET_CODE (pop_pat) == PARALLEL
>>>+ && XVECLEN (pop_pat, 0) == 2
>>>+ && GET_CODE (XVECEXP (pop_pat, 0, 0)) == SIMPLE_RETURN
>>>+ && GET_CODE (XVECEXP (pop_pat, 0, 1)) == USE)
>>>+ {
>>>+ rtx use_reg = XEXP (XVECEXP (pop_pat, 0, 1), 0);
>>>+ if (REG_P (use_reg)
>>>+ && REGNO (use_reg) == RETURN_ADDR_REGNUM)
>>>+ {
>>>+ pop_candidates [2] = insn;
>>>+ continue;
>>>+ }
>>>+ }
>>>+
>>>+ if (!pop_rtx)
>>>+ continue;
>>>+
>>>+ /* pattern for return value. */
>>>+ if (!pop_candidates [0]
>>>+ && GET_CODE (pop_pat) == USE)
>>>+ {
>>>+ rtx_insn *set_insn = PREV_INSN (insn);
>>>+ rtx pat_set = PATTERN (set_insn);
>>>+
>>>+ if (riscv_check_regno (XEXP (pop_pat, 0),
>>>+ RETURN_VALUE_REGNUM)
>>>+ && insn
>>>+ && pat_set != NULL
>>>+ && GET_CODE (pat_set) == SET
>>>+ && riscv_check_regno (SET_DEST (pat_set),
>>>+ RETURN_VALUE_REGNUM)
>>>+ && CONST_INT_P (SET_SRC (pat_set))
>>>+ && INTVAL (SET_SRC (pat_set)) == 0)
>>>+ {
>>>+ pop_candidates [0] = set_insn;
>>>+ pop_candidates [1] = insn;
>>>+ break;
>>>+ }
>>>+ }
>>>+ }
>>>+
>>>+ if (pop_rtx)
>>>+ {
>>>+ emit_zcmp_popret (pop_rtx, pop_candidates, bb);
>>>+ return;
>>>+ }
>>>+ }
>>>+}
>>>+
>>>+const pass_data pass_data_zcmp_popret =
>>>+{
>>>+ RTL_PASS, /* type */
>>>+ "zcmp-popret", /* name */
>>>+ OPTGROUP_NONE, /* optinfo_flags */
>>>+ TV_NONE, /* tv_id */
>>>+ 0, /* properties_required */
>>>+ 0, /* properties_provided */
>>>+ 0, /* properties_destroyed */
>>>+ 0, /* todo_flags_start */
>>>+ 0, /* todo_flags_finish */
>>>+};
>>>+
>>>+class pass_zcmp_popret : public rtl_opt_pass
>>>+{
>>>+public:
>>>+ pass_zcmp_popret (gcc::context *ctxt)
>>>+ : rtl_opt_pass (pass_data_zcmp_popret, ctxt)
>>>+ {}
>>>+
>>>+ /* opt_pass methods: */
>>>+ virtual bool gate (function *)
>>>+ { return TARGET_ZCMP; }
>>>+ virtual unsigned int execute (function *)
>>>+ {
>>>+ zcmp_popret ();
>>>+ return 0;
>>>+ }
>>>+}; // class pass_zcmp_popret
>>>+
>>>+} // anon namespace
>>>+
>>>+rtl_opt_pass *
>>>+make_pass_zcmp_popret (gcc::context *ctxt)
>>>+{
>>>+ return new pass_zcmp_popret (ctxt);
>>>+}
>>>diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>>>index 5f8cbfc15ed..17df2f3f8cf 100644
>>>--- a/gcc/config/riscv/riscv.cc
>>>+++ b/gcc/config/riscv/riscv.cc
>>>@@ -114,6 +114,9 @@ struct GTY(()) riscv_frame_info {
>>> /* Likewise FPR X. */
>>> unsigned int fmask;
>>>
>>>+ /* How much the push/pop routines adjust sp (or 0 if unused). */
>>>+ unsigned push_pop_sp_adjust;
>>>+
>>> /* How much the GPR save/restore routines adjust sp (or 0 if unused). */
>>> unsigned save_libcall_adjustment;
>>>
>>>@@ -401,6 +404,20 @@ static const unsigned gpr_save_reg_order[] = {
>>> S10_REGNUM, S11_REGNUM
>>> };
>>>
>>>+/* Order for the CLOBBERs/USEs of push/pop. */
>>>+static const unsigned push_save_reg_order[] = {
>>>+ INVALID_REGNUM, RETURN_ADDR_REGNUM, S0_REGNUM,
>>>+ S1_REGNUM, S2_REGNUM, S3_REGNUM, S4_REGNUM,
>>>+ S5_REGNUM, S6_REGNUM, S7_REGNUM, S8_REGNUM,
>>>+ S9_REGNUM, S10_REGNUM, S11_REGNUM
>>>+};
>>>+
>>>+/* Order for the CLOBBERs/USEs of push/pop in rve. */
>>>+static const unsigned push_save_reg_order_zcmpe[] = {
>>>+ INVALID_REGNUM, RETURN_ADDR_REGNUM, S0_REGNUM,
>>>+ S1_REGNUM
>>>+};
>>>+
>>> /* A table describing all the processors GCC knows about. */
>>> static const struct riscv_tune_info riscv_tune_info_table[] = {
>>> #define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO) \
>>>@@ -2989,6 +3006,17 @@ riscv_output_return ()
>>> return "ret";
>>> }
>>>
>>>+bool
>>>+riscv_output_popret_p (rtx op)
>>>+{
>>>+ unsigned n_rtx = XVECLEN (op, 0);
>>>+ rtx use = XVECEXP (op, 0, n_rtx - 1);
>>>+ rtx ret = XVECEXP (op, 0, n_rtx - 2);
>>>+
>>>+ return GET_CODE (ret) == SIMPLE_RETURN
>>>+ && GET_CODE (use) == USE;
>>>+}
>>>+
>>>
>>>
>>> /* Return true if CMP1 is a suitable second operand for integer ordering
>>> test CODE. See also the *sCC patterns in riscv.md. */
>>>@@ -4306,6 +4334,74 @@ riscv_memmodel_needs_amo_acquire (enum memmodel model)
>>> }
>>> }
>>>
>>>+/* Print Sp adjustment field of pop instruction. */
>>>+
>>>+static void
>>>+riscv_print_pop_size (FILE *file, rtx op)
>>>+{
>>>+ unsigned sp_adjust_idx = XVECLEN (op, 0) - 1;
>>>+ rtx sp_adjust_rtx = XVECEXP (op, 0, sp_adjust_idx);
>>>+
>>>+ /* Skip ret or pattern. */
>>>+ while (GET_CODE (sp_adjust_rtx) != SET)
>>>+ sp_adjust_rtx = XVECEXP (op, 0, --sp_adjust_idx);
>>>+
>>>+ rtx elt_plus = SET_SRC (sp_adjust_rtx);
>>>+ fprintf (file, "%ld", INTVAL (XEXP (elt_plus, 1)));
>>>+}
>>>+
>>>+/* Print push/pop register list. */
>>>+
>>>+static void
>>>+riscv_print_reglist (FILE *file, rtx op)
>>>+{
>>>+ /* we only deal with three formats:
>>>+ push {ra}
>>>+ push {ra, s0}
>>>+ push {ra, s0-sN}
>>>+ or
>>>+ pop {ra}
>>>+ pop {ra, s0}
>>>+ pop {ra, s0-sN}
>>>+ registers except ra has to be continuous s-register,
>>>+ and it is supposed to be checked before.
>>>+ register list patterns in push:
>>>+ (set/f (mem/c:SI
>>>+ (plus:SI (reg/f:SI 2 sp)
>>>+ (const_int 28 [0x1c])) [2 S4 A32])
>>>+ (reg:SI 1 ra))
>>>+ register list patterns in pop:
>>>+ (set/f (reg:DI 1 ra)
>>>+ (mem/c:DI (plus:DI (reg/f:DI 2 sp)
>>>+ (const_int 8 [0x8])) [2 S8 A64]))
>>>+ */
>>>+ int total_count = XVECLEN (op, 0);
>>>+ int n_regs = 0;
>>>+ bool push_p = GET_CODE (XVECEXP (op, 0, 0)) == SET
>>>+ && GET_CODE (SET_SRC (XVECEXP (op, 0, 0))) == PLUS;
>>>+
>>>+ for (int idx = 0; idx < total_count; ++idx)
>>>+ {
>>>+ rtx ele = XVECEXP (op, 0, idx);
>>>+ if (GET_CODE (ele) != SET)
>>>+ continue;
>>>+
>>>+ bool restore_save_p = push_p ?
>>>+ MEM_P (SET_DEST (ele)) :
>>>+ MEM_P (SET_SRC (ele));
>>>+
>>>+ if (restore_save_p)
>>>+ n_regs ++;
>>>+ }
>>>+
>>>+ if (n_regs > 2)
>>>+ fprintf (file, "ra,s0-s%u", n_regs - 2);
>>>+ else if (n_regs > 1)
>>>+ fprintf (file, "ra,s0");
>>>+ else
>>>+ fputs("ra", file);
>>>+}
>>>+
>>> /* Return true if a FENCE should be emitted to before a memory access to
>>> implement the release portion of memory model MODEL. */
>>>
>>>@@ -4517,6 +4613,14 @@ riscv_print_operand (FILE *file, rtx op, int letter)
>>> fputs (GET_RTX_NAME (code), file);
>>> break;
>>>
>>>+ case 'L':
>>>+ riscv_print_reglist (file, op);
>>>+ break;
>>>+
>>>+ case 's':
>>>+ riscv_print_pop_size (file, op);
>>>+ break;
>>>+
>>> case 'S':
>>> {
>>> rtx newop = GEN_INT (ctz_hwi (INTVAL (op)));
>>>@@ -4777,6 +4881,66 @@ riscv_use_save_libcall (const struct riscv_frame_info
>>>*frame)
>>> return frame->save_libcall_adjustment != 0;
>>> }
>>>
>>>+/* Determine how many instructions related to push/pop instructions. */
>>>+
>>>+static unsigned
>>>+riscv_save_push_pop_count (unsigned mask)
>>>+{
>>>+ if (!BITSET_P (mask, GP_REG_FIRST + RETURN_ADDR_REGNUM))
>>>+ return 0;
>>>+ for (unsigned n = GP_REG_LAST; n > GP_REG_FIRST; n--)
>>>+ if (BITSET_P (mask, n)
>>>+ && !call_used_regs [n])
>>>+ /* add ra saving and sp adjust. */
>>>+ return CALLEE_SAVED_REG_NUMBER (n) + 1 + 2;
>>>+ abort ();
>>>+}
>>>+
>>>+/* Calculate the maximum sp adjustment of push/pop instruction. */
>>>+
>>>+static unsigned
>>>+riscv_push_pop_base_sp_adjust (unsigned mask)
>>>+{
>>>+ unsigned n_regs = riscv_save_push_pop_count (mask) - 1;
>>>+ return (n_regs * UNITS_PER_WORD + 15) & (~0xf);
>>>+}
>>>+
>>>+/* Determine whether to call push/pop routines. */
>>>+
>>>+static bool
>>>+riscv_use_push_pop (const struct riscv_frame_info *frame, const HOST_WIDE_INT
>>>frame_size)
>>>+{
>>>+ if (!TARGET_ZCMP)
>>>+ return false;
>>>+
>>>+ /* We do not handler variable argument cases currently. */
>>>+ if (cfun->machine->varargs_size != 0)
>>>+ return false;
>>>+
>>>+ HOST_WIDE_INT base_size = riscv_push_pop_base_sp_adjust (frame->mask);
>>>+ /*
>>>+ Pr 960215-1.c in rv64 ouputs
>>>+
>>>+ addi sp,sp,-32
>>>+ sd ra,24(sp)
>>>+ sd s0,16(sp)
>>>+ sd s2,8(sp)
>>>+ sd s3,0(sp)
>>>+ it is a rare case that callee saved registers are not non-continous,
>>>+ which breaks the old push implementation, and we just reject this case
>>>+ like save-restore does now.
>>>+ */
>>>+ if (base_size > frame_size)
>>>+ return false;
>>>+
>>>+ /* {ra,s0-s10} is invalid. */
>>>+ if (frame->mask & (1 << (S10_REGNUM - GP_REG_FIRST))
>>>+ && !(frame->mask & (1 << (S11_REGNUM - GP_REG_FIRST))))
>>>+ return false;
>>>+
>>>+ return frame->mask & (1 << (RETURN_ADDR_REGNUM - GP_REG_FIRST));
>>>+}
>>>+
>>> /* Determine which GPR save/restore routine to call. */
>>>
>>> static unsigned
>>>@@ -4934,6 +5098,8 @@ riscv_compute_frame_info (void)
>>> /* Only use save/restore routines when the GPRs are atop the frame. */
>>> if (known_ne (frame->hard_frame_pointer_offset, frame->total_size))
>>> frame->save_libcall_adjustment = 0;
>>>+
>>>+ frame->push_pop_sp_adjust = 0;
>>> }
>>>
>>> /* Make sure that we're not trying to eliminate to the wrong hard frame
>>>@@ -5171,6 +5337,86 @@ riscv_for_each_saved_reg (poly_int64 sp_offset,
>>>riscv_save_restore_fn fn,
>>> }
>>> }
>>>
>>>+static void
>>>+riscv_emit_pop_insn (struct riscv_frame_info *frame, HOST_WIDE_INT offset,
>>>HOST_WIDE_INT size)
>>>+{
>>>+ unsigned int veclen = riscv_save_push_pop_count (frame->mask);
>>>+ unsigned int n_reg = veclen - 1;
>>>+ rtvec vec = rtvec_alloc (veclen);
>>>+ HOST_WIDE_INT sp_adjust;
>>>+ rtx dwarf = NULL_RTX;
>>>+
>>>+ const unsigned *reg_order = (TARGET_ZCMP && TARGET_RVE)
>>>+ ? push_save_reg_order_zcmpe
>>>+ : push_save_reg_order;
>>>+
>>>+ gcc_assert (n_reg >= 1
>>>+ && TARGET_ZCMP
>>>+ && ((TARGET_RVE && (n_reg <= ARRAY_SIZE (push_save_reg_order_zcmpe)))
>>>+ || (TARGET_ZCMP && (n_reg <= ARRAY_SIZE (push_save_reg_order)))));
>>>+
>>>+ /* sp adjust pattern */
>>>+ int max_allow_sp_adjust = riscv_push_pop_base_sp_adjust (frame->mask) + 48;
>>>+ int aligned_size = size;
>>>+
>>>+ /* if sp adjustment is too large, we should split it first. */
>>>+ if (aligned_size > max_allow_sp_adjust)
>>>+ {
>>>+ rtx dwarf_pre_sp_adjust = NULL_RTX;
>>>+ rtx pre_adjust_rtx = gen_add3_insn (stack_pointer_rtx,
>>>+ stack_pointer_rtx,
>>>+ GEN_INT (aligned_size - max_allow_sp_adjust));
>>>+ rtx insn = emit_insn (pre_adjust_rtx);
>>>+
>>>+ rtx cfa_pre_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
>>>+ GEN_INT (aligned_size - max_allow_sp_adjust));
>>>+ dwarf_pre_sp_adjust = alloc_reg_note (REG_CFA_DEF_CFA,
>>>+ cfa_pre_adjust_rtx,
>>>+ dwarf_pre_sp_adjust);
>>>+
>>>+ RTX_FRAME_RELATED_P (insn) = 1;
>>>+ REG_NOTES (insn) = dwarf_pre_sp_adjust;
>>>+
>>>+ sp_adjust = max_allow_sp_adjust;
>>>+ }
>>>+ else
>>>+ sp_adjust = (aligned_size + 15) & (~0xf);
>>>+
>>>+ /* register save sequence. */
>>>+ for (unsigned i = 1; i < veclen; ++i)
>>>+ {
>>>+ offset -= UNITS_PER_WORD;
>>>+ unsigned regno = reg_order[i];
>>>+ rtx reg = gen_rtx_REG (Pmode, regno);
>>>+ rtx mem = gen_frame_mem (Pmode, plus_constant (Pmode,
>>>+ stack_pointer_rtx,
>>>+ offset));
>>>+ rtx set = gen_rtx_SET (reg, mem);
>>>+ RTVEC_ELT (vec, i - 1) = set;
>>>+ RTX_FRAME_RELATED_P (set) = 1;
>>>+ dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
>>>+ }
>>>+
>>>+ /* sp adjust pattern */
>>>+ rtx adjust_sp_rtx
>>>+ = gen_rtx_SET (stack_pointer_rtx,
>>>+ plus_constant (Pmode,
>>>+ stack_pointer_rtx,
>>>+ sp_adjust));
>>>+ RTVEC_ELT (vec, veclen - 1) = adjust_sp_rtx;
>>>+
>>>+ rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
>>>+ const0_rtx);
>>>+ dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
>>>+
>>>+ frame->gp_sp_offset -= (veclen - 1) * UNITS_PER_WORD;
>>>+ frame->push_pop_sp_adjust = sp_adjust;
>>>+
>>>+ rtx insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, vec));
>>>+ RTX_FRAME_RELATED_P (insn) = 1;
>>>+ REG_NOTES (insn) = dwarf;
>>>+}
>>>+
>>> /* For stack frames that can't be allocated with a single ADDI instruction,
>>> compute the best value to initially allocate. It must at a minimum
>>> allocate enough space to spill the callee-saved registers. If TARGET_RVC,
>>>@@ -5270,6 +5516,146 @@ riscv_emit_stack_tie (void)
>>> emit_insn (gen_stack_tiedi (stack_pointer_rtx, hard_frame_pointer_rtx));
>>> }
>>>
>>>+bool
>>>+riscv_check_regno(rtx pat, unsigned regno)
>>>+{
>>>+ return REG_P (pat)
>>>+ && REGNO (pat) == regno;
>>>+}
>>>+
>>>+/* Function to check whether the OP is a valid stack push/pop operation.
>>>+ This part is borrowed from nds32 nds32_valid_stack_push_pop_p */
>>>+
>>>+bool
>>>+riscv_valid_stack_push_pop_p (rtx op, bool push_p)
>>>+{
>>>+ int index;
>>>+ int total_count;
>>>+ int sp_adjust_rtx_index;
>>>+ rtx elt;
>>>+ rtx elt_reg;
>>>+ rtx elt_plus;
>>>+
>>>+ if (!TARGET_ZCMP)
>>>+ return false;
>>>+
>>>+ total_count = XVECLEN (op, 0);
>>>+ sp_adjust_rtx_index = push_p ? 0 : total_count - 1;
>>>+
>>>+ /* At least sp + one callee save/restore register rtx */
>>>+ if (total_count < 2)
>>>+ return false;
>>>+
>>>+ /* Perform some quick check for that every element should be 'set',
>>>+ for pop, it might contain `ret` and `ret value` pattern. */
>>>+ for (index = 0; index < total_count; index++)
>>>+ {
>>>+ elt = XVECEXP (op, 0, index);
>>>+
>>>+ /* skip pop return value rtx */
>>>+ if (!push_p && GET_CODE (elt) == SET
>>>+ && riscv_check_regno (SET_DEST (elt), RETURN_VALUE_REGNUM)
>>>+ && total_count >= 4
>>>+ && index + 1 < total_count
>>>+ && GET_CODE (XVECEXP (op, 0, index + 1)) == USE)
>>>+ {
>>>+ rtx use_reg = XEXP (XVECEXP (op, 0, index + 1), 0);
>>>+
>>>+ if (!riscv_check_regno (use_reg, RETURN_VALUE_REGNUM))
>>>+ return false;
>>>+
>>>+ index += 1;
>>>+ continue;
>>>+ }
>>>+
>>>+ /* skip ret rtx */
>>>+ if (!push_p && GET_CODE (elt) == SIMPLE_RETURN
>>>+ && total_count >= 4
>>>+ && index + 1 < total_count
>>>+ && GET_CODE (XVECEXP (op, 0, index + 1)) == USE)
>>>+ {
>>>+ rtx use_reg = XEXP (XVECEXP (op, 0, index + 1), 0);
>>>+
>>>+ if (!riscv_check_regno (use_reg, RETURN_ADDR_REGNUM))
>>>+ return false;
>>>+
>>>+ index += 1;
>>>+ sp_adjust_rtx_index -= 2;
>>>+ continue;
>>>+ }
>>>+
>>>+ if (GET_CODE (elt) != SET)
>>>+ return false;
>>>+ }
>>>+
>>>+ elt = XVECEXP (op, 0, sp_adjust_rtx_index);
>>>+ elt_reg = SET_DEST (elt);
>>>+ elt_plus = SET_SRC (elt);
>>>+
>>>+ /* Check this is (set (stack_reg) (plus stack_reg const)) pattern. */
>>>+ if (GET_CODE (elt_plus) != PLUS
>>>+ || !riscv_check_regno (elt_reg, STACK_POINTER_REGNUM))
>>>+ return false;
>>>+
>>>+ /* Pass all test, this is a valid rtx. */
>>>+ return true;
>>>+}
>>>+
>>>+/* Generate push/pop rtx */
>>>+
>>>+static void
>>>+riscv_emit_push_insn (struct riscv_frame_info *frame, HOST_WIDE_INT size)
>>>+{
>>>+ unsigned int veclen = riscv_save_push_pop_count (frame->mask);
>>>+ unsigned int n_reg = veclen - 1;
>>>+ rtvec vec = rtvec_alloc (veclen);
>>>+
>>>+ const unsigned *reg_order = (TARGET_ZCMP && TARGET_RVE)
>>>+ ? push_save_reg_order_zcmpe
>>>+ : push_save_reg_order;
>>>+
>>>+ int aligned_size = (size + 15) & (~0xf);
>>>+
>>>+ gcc_assert (n_reg >= 1
>>>+ && TARGET_ZCMP
>>>+ && ((TARGET_RVE && (n_reg <= ARRAY_SIZE (push_save_reg_order_zcmpe)))
>>>+ || (TARGET_ZCMP && (n_reg <= ARRAY_SIZE (push_save_reg_order)))));
>>>+
>>>+ /* sp adjust pattern */
>>>+ int max_allow_sp_adjust = riscv_push_pop_base_sp_adjust (frame->mask) + 48;
>>>+ int sp_adjust = aligned_size > max_allow_sp_adjust ?
>>>+ max_allow_sp_adjust
>>>+ : aligned_size;
>>>+
>>>+ /*TODO: move this part to frame computation function. */
>>>+ frame->gp_sp_offset = (veclen - 1) * UNITS_PER_WORD;
>>>+ frame->push_pop_sp_adjust = sp_adjust;
>>>+
>>>+ rtx adjust_sp_rtx
>>>+ = gen_rtx_SET (stack_pointer_rtx,
>>>+ plus_constant (Pmode,
>>>+ stack_pointer_rtx,
>>>+ -sp_adjust));
>>>+ RTVEC_ELT (vec, 0) = adjust_sp_rtx;
>>>+
>>>+ /* Register save sequence. */
>>>+ for (unsigned i = 1; i < veclen; ++i)
>>>+ {
>>>+ sp_adjust -= UNITS_PER_WORD;
>>>+ unsigned regno = reg_order[i];
>>>+ rtx reg = gen_rtx_REG (Pmode, regno);
>>>+ rtx mem = gen_frame_mem (Pmode, plus_constant (Pmode,
>>>+ stack_pointer_rtx,
>>>+ sp_adjust));
>>>+ rtx set = gen_rtx_SET (mem, reg);
>>>+ RTVEC_ELT (vec, i) = set;
>>>+ RTX_FRAME_RELATED_P (set) = 1;
>>>+ }
>>>+
>>>+ rtx insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, vec));
>>>+ RTX_FRAME_RELATED_P (insn) = 1;
>>>+}
>>>+
>>> /* Expand the "prologue" pattern. */
>>>
>>> void
>>>@@ -5278,6 +5664,7 @@ riscv_expand_prologue (void)
>>> struct riscv_frame_info *frame = &cfun->machine->frame;
>>> poly_int64 size = frame->total_size;
>>> unsigned mask = frame->mask;
>>>+ HOST_WIDE_INT step1 = riscv_first_stack_step (frame);
>>> rtx insn;
>>>
>>> if (flag_stack_usage_info)
>>>@@ -5300,19 +5687,32 @@ riscv_expand_prologue (void)
>>> REG_NOTES (insn) = dwarf;
>>> }
>>>
>>>+ if (size.is_constant ())
>>>+ step1 = MIN (size.to_constant(), step1);
>>>+ if (riscv_use_push_pop (frame, step1))
>>>+ {
>>>+ riscv_emit_push_insn (frame, step1);
>>>+
>>>+ step1 = MAX (step1 - frame->push_pop_sp_adjust, 0);
>>>+ size = MAX (size.to_constant() - frame->push_pop_sp_adjust, 0);
>>>+ frame->mask &= ~ ((TARGET_ZCMP && TARGET_RVE) ?
>>>+ RISCV_ZCMPE_PUSH_POP_MASK
>>>+ : RISCV_ZCE_PUSH_POP_MASK);
>>>+ }
>>>+
>>> /* Save the registers. */
>>> if ((frame->mask | frame->fmask) != 0)
>>> {
>>>- HOST_WIDE_INT step1 = riscv_first_stack_step (frame);
>>>- if (size.is_constant ())
>>>- step1 = MIN (size.to_constant(), step1);
>>>-
>>>- insn = gen_add3_insn (stack_pointer_rtx,
>>>- stack_pointer_rtx,
>>>- GEN_INT (-step1));
>>>- RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
>>>- size -= step1;
>>>- riscv_for_each_saved_reg (size, riscv_save_reg, false, false);
>>>+ if (step1 > 0)
>>>+ {
>>>+ insn = gen_add3_insn (stack_pointer_rtx,
>>>+ stack_pointer_rtx,
>>>+ GEN_INT (-step1));
>>>+ RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
>>>+ size -= step1;
>>>+ }
>>>+ riscv_for_each_saved_reg (size, riscv_save_reg,
>>>+ false /* bool epilogue */, false /* bool maybe_eh_return */);
>>> }
>>>
>>> frame->mask = mask; /* Undo the above fib. */
>>>@@ -5412,6 +5812,8 @@ riscv_expand_epilogue (int style)
>>> rtx ra = gen_rtx_REG (Pmode, RETURN_ADDR_REGNUM);
>>> rtx insn;
>>>
>>>+ bool use_zcmp_pop = !use_restore_libcall && !(crtl->calls_eh_return);
>>>+
>>> /* We need to add memory barrier to prevent read from deallocated stack. */
>>> bool need_barrier_p = known_ne (get_frame_size ()
>>> + cfun->machine->frame.arg_pointer_offset, 0);
>>>@@ -5538,6 +5940,18 @@ riscv_expand_epilogue (int style)
>>> if (use_restore_libcall)
>>> frame->mask = 0; /* Temporarily fib that we need not save GPRs. */
>>>
>>>+ if (use_zcmp_pop && riscv_use_push_pop (frame, step2))
>>>+ {
>>>+ /* Emit a barrier to prevent loads from a deallocated stack. */
>>>+ riscv_emit_stack_tie ();
>>>+ need_barrier_p = false;
>>>+ riscv_emit_pop_insn (frame, frame->total_size.to_constant(), step2);
>>>+ frame->mask &= ~ ((TARGET_ZCMP && TARGET_RVE) ?
>>>+ RISCV_ZCMPE_PUSH_POP_MASK
>>>+ : RISCV_ZCE_PUSH_POP_MASK);
>>>+ step2 = 0;
>>>+ }
>>>+
>>> /* Restore the registers. */
>>> riscv_for_each_saved_reg (frame->total_size - step2, riscv_restore_reg,
>>> true, style == EXCEPTION_RETURN);
>>>@@ -5552,6 +5966,9 @@ riscv_expand_epilogue (int style)
>>> if (need_barrier_p)
>>> riscv_emit_stack_tie ();
>>>
>>>+ if (use_zcmp_pop)
>>>+ frame->mask = mask;
>>>+
>>> /* Deallocate the final bit of the frame. */
>>> if (step2 > 0)
>>> {
>>>diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
>>>index d05b1d59853..6e6e3ee2c25 100644
>>>--- a/gcc/config/riscv/riscv.h
>>>+++ b/gcc/config/riscv/riscv.h
>>>@@ -383,6 +383,7 @@ ASM_MISA_SPEC
>>> #define HARD_FRAME_POINTER_REGNUM 8
>>> #define STACK_POINTER_REGNUM 2
>>> #define THREAD_POINTER_REGNUM 4
>>>+#define RETURN_VALUE_REGNUM 10
>>>
>>> /* These two registers don't really exist: they get eliminated to either
>>> the stack or hard frame pointer. */
>>>@@ -1097,4 +1098,7 @@ extern void riscv_remove_unneeded_save_restore_calls
>>>(void);
>>> #define DWARF_REG_TO_UNWIND_COLUMN(REGNO) \
>>> ((REGNO == RISCV_DWARF_VLENB) ? (FIRST_PSEUDO_REGISTER + 1) : REGNO)
>>>
>>>+#define RISCV_ZCE_PUSH_POP_MASK 0x0ffc0302u
>>>+#define RISCV_ZCMPE_PUSH_POP_MASK 0x302u
>>>+
>>> #endif /* ! GCC_RISCV_H */
>>>diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
>>>index bc384d9aedf..b9f2a426e48 100644
>>>--- a/gcc/config/riscv/riscv.md
>>>+++ b/gcc/config/riscv/riscv.md
>>>@@ -108,12 +108,14 @@
>>>
>>> (define_constants
>>> [(RETURN_ADDR_REGNUM 1)
>>>+ (SP_REGNUM 2)
>>> (GP_REGNUM 3)
>>> (TP_REGNUM 4)
>>> (T0_REGNUM 5)
>>> (T1_REGNUM 6)
>>> (S0_REGNUM 8)
>>> (S1_REGNUM 9)
>>>+ (A0_REGNUM 10)
>>> (S2_REGNUM 18)
>>> (S3_REGNUM 19)
>>> (S4_REGNUM 20)
>>>@@ -3147,3 +3149,4 @@
>>> (include "sifive-7.md")
>>> (include "thead.md")
>>> (include "vector.md")
>>>+(include "zc.md")
>>>diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
>>>index 6e326fc7e02..9ef522306a5 100644
>>>--- a/gcc/config/riscv/t-riscv
>>>+++ b/gcc/config/riscv/t-riscv
>>>@@ -90,6 +90,10 @@ riscv-v.o: $(srcdir)/config/riscv/riscv-v.cc \
>>> $(COMPILE) $<
>>> $(POSTCOMPILE)
>>>
>>>+riscv-zcmp-popret.o: $(srcdir)/config/riscv/riscv-zcmp-popret.cc
>>>+ $(COMPILE) $<
>>>+ $(POSTCOMPILE)
>>>+
>>> thead.o: $(srcdir)/config/riscv/thead.cc \
>>> $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TARGET_H) backend.h $(RTL_H) \
>>> memmodel.h $(EMIT_RTL_H) poly-int.h output.h
>>>diff --git a/gcc/config/riscv/zc.md b/gcc/config/riscv/zc.md
>>>new file mode 100644
>>>index 00000000000..3ad34dacd49
>>>--- /dev/null
>>>+++ b/gcc/config/riscv/zc.md
>>>@@ -0,0 +1,47 @@
>>>+;; Machine description for ZCE extension.
>>>+;; Copyright (C) 2021 Free Software Foundation, Inc.
>>>+
>>>+;; This file is part of GCC.
>>>+
>>>+;; GCC is free software; you can redistribute it and/or modify
>>>+;; it under the terms of the GNU General Public License as published by
>>>+;; the Free Software Foundation; either version 3, or (at your option)
>>>+;; any later version.
>>>+
>>>+;; GCC is distributed in the hope that it will be useful,
>>>+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>>>+;; GNU General Public License for more details.
>>>+
>>>+;; You should have received a copy of the GNU General Public License
>>>+;; along with GCC; see the file COPYING3. If not see
>>>+;; <http://www.gnu.org/licenses/>.
>>>+
>>>+(define_insn "*stack_push<mode>"
>>>+ [(match_parallel 0 "riscv_stack_push_operation"
>>>+ [(set (reg:X SP_REGNUM) (plus:X (reg:X SP_REGNUM)
>>>+ (match_operand:X 1 "const_int_operand" "")))])]
>>>+ "TARGET_ZCMP"
>>>+ "cm.push\t{%L0},%1")
>>>+
>>>+(define_insn "*stack_pop<mode>"
>>>+ [(match_parallel 0 "riscv_stack_pop_operation"
>>>+ [(set (match_operand:X 1 "register_operand" "")
>>>+ (mem:X (plus:X (reg:X SP_REGNUM)
>>>+ (match_operand:X 2 "const_int_operand" ""))))])]
>>>+ "TARGET_ZCMP"
>>>+ {
>>>+ return riscv_output_popret_p (operands[0]) ?
>>>+ "cm.popret\t{%L0},%s0" :
>>>+ "cm.pop\t{%L0},%s0";
>>>+ })
>>>+
>>>+(define_insn "*stack_pop_with_return_value<mode>"
>>>+ [(match_parallel 0 "riscv_stack_pop_operation"
>>>+ [(set (reg:ANYI A0_REGNUM)
>>>+ (match_operand:ANYI 1 "pop_return_value_constant" ""))])]
>>>+ "TARGET_ZCMP"
>>>+ {
>>>+ gcc_assert (riscv_output_popret_p (operands[0]));
>>>+ return "cm.popretz\t{%L0},%s0";
>>>+ })
>>>--
>>>2.25.1


* Re: [PATCH 4/5] RISC-V: Add Zcmp extension supports.
       [not found]   ` <07720619-dd69-4816-987e-ff0e14d9a348.>
@ 2023-05-12  8:53     ` Sinan
  0 siblings, 0 replies; 9+ messages in thread
From: Sinan @ 2023-05-12  8:53 UTC (permalink / raw)
  To: Kito Cheng, Jiawei; +Cc: gcc-patches


Hi, Kito and Jiawei
I have noticed that several comments are inaccurate or no longer valid (e.g. they apply only to zc 0.5), and they need to be updated or improved.
> +
>> +namespace {
>> +
>> +/*
>> + 1. preprocessing:
>> + 1.1. if there is no push rtx, then just return. e.g.
>> + (note 5 1 22 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>> + (insn/f 22 5 23 2 (set (reg/f:SI 2 sp)
>> + (plus:SI (reg/f:SI 2 sp)
>> + (const_int -32 [0xffffffffffffffe0])))
>> + (nil))
>> + (note 23 22 2 2 NOTE_INSN_PROLOGUE_END)
>> + 1.2. if push rtx exists, then we compute the number of
>> + pushed s-registers, n_sreg.
>> +
>> + push rtx should be find before NOTE_INSN_PROLOGUE_END tag
>> +
>> + [2 and 3 happend simultaneously]
>> + 2. find valid move pattern, mv sN, aN, where N < n_sreg,
>> + and aN is not used the move pattern, and sN is not
>> + defined before the move pattern (from prologue to the
>> + position of move pattern).
>> + 3. analysis use and reach of every instruction from prologue
>> + to the position of move pattern.
>> + if any sN is used, then we mark the corresponding argument list
>> + candidate as invalid.
>> + e.g.
>> + push {ra,s0-s3}, {}, -32
>> + sw s0,44(sp) # s0 is used, then argument list is invalid
>> + mv a0,a5 # a0 is defined, then argument list is invalid
>> + ...
>> + mv s0,a0
>> + mv s1,a1
>> + mv s2,a2
>> +
>> + 4. if there is a valid argument list, then replace the pop
>> + push parallel insn, and delete mv pattern.
>> + if not, skip.
>> +*/
>
>I am not sure I understand this optimization pass correctly,
>could you give more example or indicate which testcase can demonstrate
>this pass?
>
>And I would prefer this pass split from this patch, let it become a separated
>patch including testcase.
This comment is incorrect.
The pass searches for `ret`, `cm.pop` and `li a0, 0` and tries to combine them into cm.popretz. You can find the relevant cm.popretz testcases at https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg304545.html
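To make the intent of the pass concrete, here is a toy model in C. It is only a sketch, not the GCC implementation: it works on a made-up opcode enum instead of RTL, and every name in it (toy_op, toy_fuse_popret) is hypothetical. It fuses the tail of an epilogue the way the pass is meant to, with the plain cm.popret case included for comparison.

```c
#include <stddef.h>

/* Hypothetical opcodes for a toy epilogue; the real pass walks RTL insns.  */
enum toy_op
{
  OP_OTHER,
  OP_LI_A0_ZERO,	/* li a0,0 */
  OP_CM_POP,		/* cm.pop {reglist},imm */
  OP_RET,		/* ret */
  OP_CM_POPRET,		/* cm.popret {reglist},imm */
  OP_CM_POPRETZ		/* cm.popretz {reglist},imm */
};

/* Fuse the tail of OPS (length N) in place and return the new length:
     li a0,0 ; cm.pop ; ret  ->  cm.popretz
               cm.pop ; ret  ->  cm.popret
   Any other tail is left untouched.  */
size_t
toy_fuse_popret (enum toy_op *ops, size_t n)
{
  if (n >= 3
      && ops[n - 3] == OP_LI_A0_ZERO
      && ops[n - 2] == OP_CM_POP
      && ops[n - 1] == OP_RET)
    {
      ops[n - 3] = OP_CM_POPRETZ;
      return n - 2;
    }

  if (n >= 2 && ops[n - 2] == OP_CM_POP && ops[n - 1] == OP_RET)
    {
      ops[n - 2] = OP_CM_POPRET;
      return n - 1;
    }

  return n;
}
```

The zeroing case is checked first because it is the longer match; a sequence without the `li a0,0` falls through to the plain cm.popret form.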
> @@ -4777,6 +4881,66 @@ riscv_use_save_libcall (const struct riscv_frame_info *frame)
>> return frame->save_libcall_adjustment != 0;
>> }
>>
>> +/* Determine how many instructions related to push/pop instructions. */
>> +
>> +static unsigned
>> +riscv_save_push_pop_count (unsigned mask)
>> +{
>> + if (!BITSET_P (mask, GP_REG_FIRST + RETURN_ADDR_REGNUM))
>> + return 0;
>> + for (unsigned n = GP_REG_LAST; n > GP_REG_FIRST; n--)
>> + if (BITSET_P (mask, n)
>> + && !call_used_regs [n])
>> + /* add ra saving and sp adjust. */
>> + return CALLEE_SAVED_REG_NUMBER (n) + 1 + 2;
>
>What the magic number of `+ 1 + 2`?
Well, it is really misleading here, and it would be better to make it clearer.
`riscv_save_push_pop_count` calculates the expected size of the push/pop parallel pattern (the number of saved/restored registers plus one sp adjustment pattern).
The number of s-registers saved/restored is CALLEE_SAVED_REG_NUMBER (n) + 1, and the remaining `+ 2` accounts for the ra and sp adjustment patterns.
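A minimal sketch of that arithmetic in C, outside of GCC. The function callee_saved_reg_number mirrors CALLEE_SAVED_REG_NUMBER as I understand the macro in riscv.h (x8-x9 are s0-s1, x18-x27 are s2-s11); push_pop_veclen is a hypothetical name used only for this illustration.

```c
/* Map a hard register number to its s-register index:
   x8-x9 (s0-s1) -> 0-1, x18-x27 (s2-s11) -> 2-11, anything else -> -1.
   Assumed to match CALLEE_SAVED_REG_NUMBER in riscv.h.  */
int
callee_saved_reg_number (unsigned regno)
{
  if (regno >= 8 && regno <= 9)
    return (int) regno - 8;
  if (regno >= 18 && regno <= 27)
    return (int) regno - 16;
  return -1;
}

/* Hypothetical helper: number of elements in the push/pop PARALLEL when
   HIGHEST_SREG is the highest-numbered s-register that is saved.
   HIGHEST_SREG must be a valid s-register number.  */
unsigned
push_pop_veclen (unsigned highest_sreg)
{
  /* s0..sN are N + 1 registers.  */
  unsigned n_sregs = (unsigned) callee_saved_reg_number (highest_sreg) + 1;

  /* One element for ra and one for the sp adjustment.  */
  return n_sregs + 1 + 1;
}
```

For example, with s3 (x19) as the highest saved register, the pattern holds s0-s3, ra and the sp adjustment, i.e. six elements.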
>> +riscv_emit_push_insn (struct riscv_frame_info *frame, HOST_WIDE_INT size)
>> +{
>> + unsigned int veclen = riscv_save_push_pop_count (frame->mask);
>> + unsigned int n_reg = veclen - 1;
>
> Need comment to explain why `- 1` here.
The count returned by `riscv_save_push_pop_count` includes the sp adjustment pattern, so subtracting 1 gives the number of registers saved/restored here.
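The same `- 1` shows up in riscv_push_pop_base_sp_adjust in the patch, which rounds the register save area up to 16 bytes. A small standalone sketch of that computation, assuming a hypothetical helper name and passing the word size explicitly instead of using UNITS_PER_WORD:

```c
/* Hypothetical helper: smallest 16-byte-aligned stack adjustment that
   holds the saved registers.  VECLEN counts the saved registers plus the
   sp adjustment element; WORD_BYTES is 4 on rv32 and 8 on rv64.  */
unsigned
push_pop_base_sp_adjust (unsigned veclen, unsigned word_bytes)
{
  /* Drop the sp adjustment element to get the register count.  */
  unsigned n_reg = veclen - 1;

  /* Round the save area up to the next multiple of 16.  */
  return (n_reg * word_bytes + 15) & ~0xfu;
}
```

With {ra,s0-s3} on rv32, veclen is 6, n_reg is 5, the save area is 20 bytes and the base adjustment is 32.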
BR,
Sinan
------------------------------------------------------------------
Sender:Kito Cheng <kito.cheng@gmail.com>
Sent At:2023 May 4 (Thu.) 17:04
Recipient:Jiawei <jiawei@iscas.ac.cn>
Cc:gcc-patches <gcc-patches@gcc.gnu.org>; kito.cheng <kito.cheng@sifive.com>; palmer <palmer@dabbelt.com>; christoph.muellner <christoph.muellner@vrull.eu>; jeremy.bennett <jeremy.bennett@embecosm.com>; mary.bennett <mary.bennett@embecosm.com>; nandni.jamnadas <nandni.jamnadas@embecosm.com>; charlie.keaney <charlie.keaney@embecosm.com>; simon.cook <simon.cook@embecosm.com>; tariq.kurd <tariq.kurd@codasip.com>; ibrahim.abu.kharmeh1 <ibrahim.abu.kharmeh1@huawei.com>; sinan.lin <sinan.lin@linux.alibaba.com>; wuwei2016 <wuwei2016@iscas.ac.cn>; shihua <shihua@iscas.ac.cn>; shiyulong <shiyulong@iscas.ac.cn>; chenyixuan <chenyixuan@iscas.ac.cn>
Subject:Re: [PATCH 4/5] RISC-V: Add Zcmp extension supports.
Could you rebase this patch, we have some changes on
> All "zcmpe" means Zcmp with RVE extension.
Use zcmp_rve instead, zcmpe seems like a new ext. name
> diff --git a/gcc/config/riscv/riscv-zcmp-popret.cc b/gcc/config/riscv/riscv-zcmp-popret.cc
> new file mode 100644
> index 00000000000..d7b40f6a3e2
> --- /dev/null
> +++ b/gcc/config/riscv/riscv-zcmp-popret.cc
> @@ -0,0 +1,260 @@
Need a header here like "^#$% for RISC-V Copyright (C) 2023 Free
Software Foundation, Inc."
> +#include "config.h"
...
> +#include "cfgrtl.h"
> +
> +#define IN_TARGET_CODE 1
This should appear before any #include.
> +
> +namespace {
> +
> +/*
> + 1. preprocessing:
> + 1.1. if there is no push rtx, then just return. e.g.
> + (note 5 1 22 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> + (insn/f 22 5 23 2 (set (reg/f:SI 2 sp)
> + (plus:SI (reg/f:SI 2 sp)
> + (const_int -32 [0xffffffffffffffe0])))
> + (nil))
> + (note 23 22 2 2 NOTE_INSN_PROLOGUE_END)
> + 1.2. if push rtx exists, then we compute the number of
> + pushed s-registers, n_sreg.
> +
> + push rtx should be find before NOTE_INSN_PROLOGUE_END tag
> +
> + [2 and 3 happend simultaneously]
> + 2. find valid move pattern, mv sN, aN, where N < n_sreg,
> + and aN is not used the move pattern, and sN is not
> + defined before the move pattern (from prologue to the
> + position of move pattern).
> + 3. analysis use and reach of every instruction from prologue
> + to the position of move pattern.
> + if any sN is used, then we mark the corresponding argument list
> + candidate as invalid.
> + e.g.
> + push {ra,s0-s3}, {}, -32
> + sw s0,44(sp) # s0 is used, then argument list is invalid
> + mv a0,a5 # a0 is defined, then argument list is invalid
> + ...
> + mv s0,a0
> + mv s1,a1
> + mv s2,a2
> +
> + 4. if there is a valid argument list, then replace the pop
> + push parallel insn, and delete mv pattern.
> + if not, skip.
> +*/
I am not sure I understand this optimization pass correctly.
Could you give more examples or point to a testcase that demonstrates
this pass?
I would also prefer that this pass be split out of this patch into a separate
patch that includes a testcase.
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 5f8cbfc15ed..17df2f3f8cf 100644
> +/* Order for the CLOBBERs/USEs of push/pop. */
> +static const unsigned push_save_reg_order[] = {
push_save_reg_order -> zcmp_push_save_reg_order
> + INVALID_REGNUM, RETURN_ADDR_REGNUM, S0_REGNUM,
> + S1_REGNUM, S2_REGNUM, S3_REGNUM, S4_REGNUM,
> + S5_REGNUM, S6_REGNUM, S7_REGNUM, S8_REGNUM,
> + S9_REGNUM, S10_REGNUM, S11_REGNUM
> +};
> +
> +/* Order for the CLOBBERs/USEs of push/pop in rve. */
> +static const unsigned push_save_reg_order_zcmpe[] = {
push_save_reg_order_zcmpe -> zcmp_rve_push_save_reg_order
> @@ -4777,6 +4881,66 @@ riscv_use_save_libcall (const struct riscv_frame_info *frame)
> return frame->save_libcall_adjustment != 0;
> }
>
> +/* Determine how many instructions related to push/pop instructions. */
> +
> +static unsigned
> +riscv_save_push_pop_count (unsigned mask)
> +{
> + if (!BITSET_P (mask, GP_REG_FIRST + RETURN_ADDR_REGNUM))
> + return 0;
> + for (unsigned n = GP_REG_LAST; n > GP_REG_FIRST; n--)
> + if (BITSET_P (mask, n)
> + && !call_used_regs [n])
> + /* add ra saving and sp adjust. */
> + return CALLEE_SAVED_REG_NUMBER (n) + 1 + 2;
What is the magic number `+ 1 + 2`?
> + abort ();
> +}
> +
> +/* Calculate the maximum sp adjustment of push/pop instruction. */
> +
> +static unsigned
> +riscv_push_pop_base_sp_adjust (unsigned mask)
> +{
> + unsigned n_regs = riscv_save_push_pop_count (mask) - 1;
> + return (n_regs * UNITS_PER_WORD + 15) & (~0xf);
Use ROUND_UP
> @@ -5171,6 +5337,86 @@ riscv_for_each_saved_reg (poly_int64 sp_offset, riscv_save_restore_fn fn,
> }
> }
>
> +static void
> +riscv_emit_pop_insn (struct riscv_frame_info *frame, HOST_WIDE_INT offset, HOST_WIDE_INT size)
> +{
> + unsigned int veclen = riscv_save_push_pop_count (frame->mask);
> + unsigned int n_reg = veclen - 1;
> + rtvec vec = rtvec_alloc (veclen);
> + HOST_WIDE_INT sp_adjust;
> + rtx dwarf = NULL_RTX;
> +
> + const unsigned *reg_order = (TARGET_ZCMP && TARGET_RVE)
> + ? push_save_reg_order_zcmpe
> + : push_save_reg_order;
> +
> + gcc_assert (n_reg >= 1
> + && TARGET_ZCMP
> + && ((TARGET_RVE && (n_reg <= ARRAY_SIZE (push_save_reg_order_zcmpe)))
> + || (TARGET_ZCMP && (n_reg <= ARRAY_SIZE (push_save_reg_order)))));
> +
> + /* sp adjust pattern */
> + int max_allow_sp_adjust = riscv_push_pop_base_sp_adjust (frame->mask) + 48;
> + int aligned_size = size;
> +
> + /* if sp adjustment is too large, we should split it first. */
> + if (aligned_size > max_allow_sp_adjust)
> + {
> + rtx dwarf_pre_sp_adjust = NULL_RTX;
> + rtx pre_adjust_rtx = gen_add3_insn (stack_pointer_rtx,
> + stack_pointer_rtx,
> + GEN_INT (aligned_size - max_allow_sp_adjust));
> + rtx insn = emit_insn (pre_adjust_rtx);
> +
> + rtx cfa_pre_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
> + GEN_INT (aligned_size - max_allow_sp_adjust));
> + dwarf_pre_sp_adjust = alloc_reg_note (REG_CFA_DEF_CFA,
> + cfa_pre_adjust_rtx,
> + dwarf_pre_sp_adjust);
> +
> + RTX_FRAME_RELATED_P (insn) = 1;
> + REG_NOTES (insn) = dwarf_pre_sp_adjust;
> +
> + sp_adjust = max_allow_sp_adjust;
> + }
> + else
> + sp_adjust = (aligned_size + 15) & (~0xf);
Use ROUND_UP
> @@ -5270,6 +5516,146 @@ riscv_emit_stack_tie (void)
> emit_insn (gen_stack_tiedi (stack_pointer_rtx, hard_frame_pointer_rtx));
> }
>
> +bool
> +riscv_check_regno(rtx pat, unsigned regno)
> +{
> + return REG_P (pat)
> + && REGNO (pat) == regno;
> +}
> +
> +/* Function to check whether the OP is a valid stack push/pop operation.
> + This part is borrowed from nds32 nds32_valid_stack_push_pop_p */
> +
> +bool
> +riscv_valid_stack_push_pop_p (rtx op, bool push_p)
> +{
> + int index;
> + int total_count;
> + int sp_adjust_rtx_index;
> + rtx elt;
> + rtx elt_reg;
> + rtx elt_plus;
> +
> + if (!TARGET_ZCMP)
> + return false;
> +
> + total_count = XVECLEN (op, 0);
> + sp_adjust_rtx_index = push_p ? 0 : total_count - 1;
> +
> + /* At least sp + one callee save/restore register rtx */
> + if (total_count < 2)
> + return false;
> +
> + /* Perform some quick check for that every element should be 'set',
> + for pop, it might contain `ret` and `ret value` pattern. */
> + for (index = 0; index < total_count; index++)
> + {
> + elt = XVECEXP (op, 0, index);
> +
> + /* skip pop return value rtx */
> + if (!push_p && GET_CODE (elt) == SET
> + && riscv_check_regno (SET_DEST (elt), RETURN_VALUE_REGNUM)
> + && total_count >= 4
> + && index + 1 < total_count
> + && GET_CODE (XVECEXP (op, 0, index + 1)) == USE)
> + {
> + rtx use_reg = XEXP (XVECEXP (op, 0, index + 1), 0);
> +
> + if (!riscv_check_regno (use_reg, RETURN_VALUE_REGNUM))
> + return false;
> +
> + index += 1;
> + continue;
> + }
> +
> + /* skip ret rtx */
> + if (!push_p && GET_CODE (elt) == SIMPLE_RETURN
> + && total_count >= 4
> + && index + 1 < total_count
> + && GET_CODE (XVECEXP (op, 0, index + 1)) == USE)
> + {
> + rtx use_reg = XEXP (XVECEXP (op, 0, index + 1), 0);
> +
> + if (!riscv_check_regno (use_reg, RETURN_ADDR_REGNUM))
> + return false;
> +
> + index += 1;
> + sp_adjust_rtx_index -= 2;
> + continue;
> + }
> +
> + if (GET_CODE (elt) != SET)
> + return false;
> + }
> +
> + elt = XVECEXP (op, 0, sp_adjust_rtx_index);
> + elt_reg = SET_DEST (elt);
> + elt_plus = SET_SRC (elt);
> +
> + /* Check this is (set (stack_reg) (plus stack_reg const)) pattern. */
> + if (GET_CODE (elt_plus) != PLUS
> + || !riscv_check_regno (elt_reg, STACK_POINTER_REGNUM))
> + return false;
> +
> + /* Pass all test, this is a valid rtx. */
> + return true;
> +}
> +
> +/* Generate push/pop rtx */
> +
> +static void
> +riscv_emit_push_insn (struct riscv_frame_info *frame, HOST_WIDE_INT size)
> +{
> + unsigned int veclen = riscv_save_push_pop_count (frame->mask);
> + unsigned int n_reg = veclen - 1;
Need comment to explain why `- 1` here.
> + rtvec vec = rtvec_alloc (veclen);
> +
> + const unsigned *reg_order = (TARGET_ZCMP && TARGET_RVE)
> + ? push_save_reg_order_zcmpe
> + : push_save_reg_order;
> +
> + int aligned_size = (size + 15) & (~0xf);
Use ROUND_UP
> +
> + gcc_assert (n_reg >= 1
> + && TARGET_ZCMP
> + && ((TARGET_RVE && (n_reg <= ARRAY_SIZE (push_save_reg_order_zcmpe)))
> + || (TARGET_ZCMP && (n_reg <= ARRAY_SIZE (push_save_reg_order)))));
> +
> + /* sp adjust pattern */
> + int max_allow_sp_adjust = riscv_push_pop_base_sp_adjust (frame->mask) + 48;
What's the magic number of 48?
> + int sp_adjust = aligned_size > max_allow_sp_adjust ?
> + max_allow_sp_adjust
> + : aligned_size;
> +
> + /*TODO: move this part to frame computation function. */
Is it possible to resolve this TODO?
> + frame->gp_sp_offset = (veclen - 1) * UNITS_PER_WORD;
> + frame->push_pop_sp_adjust = sp_adjust;
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index d05b1d59853..6e6e3ee2c25 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -383,6 +383,7 @@ ASM_MISA_SPEC
> #define HARD_FRAME_POINTER_REGNUM 8
> #define STACK_POINTER_REGNUM 2
> #define THREAD_POINTER_REGNUM 4
> +#define RETURN_VALUE_REGNUM 10
>
> /* These two registers don't really exist: they get eliminated to either
> the stack or hard frame pointer. */
> @@ -1097,4 +1098,7 @@ extern void riscv_remove_unneeded_save_restore_calls (void);
> #define DWARF_REG_TO_UNWIND_COLUMN(REGNO) \
> ((REGNO == RISCV_DWARF_VLENB) ? (FIRST_PSEUDO_REGISTER + 1) : REGNO)
>
> +#define RISCV_ZCE_PUSH_POP_MASK 0x0ffc0302u
RISCV_ZCE_PUSH_POP_MASK -> RISCV_ZCMP_PUSH_POP_MASK
> +#define RISCV_ZCMPE_PUSH_POP_MASK 0x302u
RISCV_ZCMPE_PUSH_POP_MASK -> RISCV_ZCMP_RVE_PUSH_POP_MASK
> diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
> index 6e326fc7e02..9ef522306a5 100644
> --- a/gcc/config/riscv/t-riscv
> +++ b/gcc/config/riscv/t-riscv
> @@ -90,6 +90,10 @@ riscv-v.o: $(srcdir)/config/riscv/riscv-v.cc \
> $(COMPILE) $<
> $(POSTCOMPILE)
>
> +riscv-zcmp-popret.o: $(srcdir)/config/riscv/riscv-zcmp-popret.cc
Please add the right dependencies here.
> diff --git a/gcc/config/riscv/zc.md b/gcc/config/riscv/zc.md
> new file mode 100644
> index 00000000000..3ad34dacd49
> --- /dev/null
> +++ b/gcc/config/riscv/zc.md
> @@ -0,0 +1,47 @@
> +;; Machine description for ZCE extension.
ZCE extension. -> Zc* extension

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 4/5] RISC-V: Add Zcmp extension supports.
  2023-04-06  6:21 ` [PATCH 4/5] RISC-V: Add Zcmp extension supports Jiawei
@ 2023-05-04  9:03   ` Kito Cheng
       [not found]   ` <07720619-dd69-4816-987e-ff0e14d9a348.>
  1 sibling, 0 replies; 9+ messages in thread
From: Kito Cheng @ 2023-05-04  9:03 UTC (permalink / raw)
  To: Jiawei
  Cc: gcc-patches, kito.cheng, palmer, christoph.muellner,
	jeremy.bennett, mary.bennett, nandni.jamnadas, charlie.keaney,
	simon.cook, tariq.kurd, ibrahim.abu.kharmeh1, sinan.lin,
	wuwei2016, shihua, shiyulong, chenyixuan

Could you rebase this patch, we have some changes on

> All "zcmpe" means Zcmp with RVE extension.

Use zcmp_rve instead, zcmpe seems like a new ext. name

> diff --git a/gcc/config/riscv/riscv-zcmp-popret.cc b/gcc/config/riscv/riscv-zcmp-popret.cc
> new file mode 100644
> index 00000000000..d7b40f6a3e2
> --- /dev/null
> +++ b/gcc/config/riscv/riscv-zcmp-popret.cc
> @@ -0,0 +1,260 @@

Need a header here like "^#$% for RISC-V Copyright (C) 2023 Free
Software Foundation, Inc."

> +#include "config.h"
...
> +#include "cfgrtl.h"
> +
> +#define IN_TARGET_CODE 1

This should appear before any #include.

> +
> +namespace {
> +
> +/*
> +  1. preprocessing:
> +    1.1. if there is no push rtx, then just return. e.g.
> +    (note 5 1 22 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> +    (insn/f 22 5 23 2 (set (reg/f:SI 2 sp)
> +      (plus:SI (reg/f:SI 2 sp)
> +       (const_int -32 [0xffffffffffffffe0])))
> +    (nil))
> +    (note 23 22 2 2 NOTE_INSN_PROLOGUE_END)
> +    1.2. if push rtx exists, then we compute the number of
> +    pushed s-registers, n_sreg.
> +
> +  push rtx should be find before NOTE_INSN_PROLOGUE_END tag
> +
> +  [2 and 3 happend simultaneously]
> +  2. find valid move pattern, mv sN, aN, where N < n_sreg,
> +    and aN is not used the move pattern, and sN is not
> +    defined before the move pattern (from prologue to the
> +    position of move pattern).
> +  3. analysis use and reach of every instruction from prologue
> +    to the position of move pattern.
> +    if any sN is used, then we mark the corresponding argument list
> +    candidate as invalid.
> +    e.g.
> +       push  {ra,s0-s3}, {}, -32
> +       sw      s0,44(sp) # s0 is used, then argument list is invalid
> +       mv      a0,a5     # a0 is defined, then argument list is invalid
> +       ...
> +       mv      s0,a0
> +       mv      s1,a1
> +       mv      s2,a2
> +
> +  4. if there is a valid argument list, then replace the pop
> +    push parallel insn, and delete mv pattern.
> +     if not, skip.
> +*/

I am not sure I understand this optimization pass correctly.
Could you give more examples or point to a testcase that demonstrates
this pass?

I would also prefer that this pass be split out of this patch into a separate
patch that includes a testcase.


> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 5f8cbfc15ed..17df2f3f8cf 100644
> +/* Order for the CLOBBERs/USEs of push/pop.  */
> +static const unsigned push_save_reg_order[] = {

push_save_reg_order -> zcmp_push_save_reg_order

> +  INVALID_REGNUM, RETURN_ADDR_REGNUM, S0_REGNUM,
> +  S1_REGNUM, S2_REGNUM, S3_REGNUM, S4_REGNUM,
> +  S5_REGNUM, S6_REGNUM, S7_REGNUM, S8_REGNUM,
> +  S9_REGNUM, S10_REGNUM, S11_REGNUM
> +};
> +
> +/* Order for the CLOBBERs/USEs of push/pop in rve.  */
> +static const unsigned push_save_reg_order_zcmpe[] = {

push_save_reg_order_zcmpe -> zcmp_rve_push_save_reg_order

> @@ -4777,6 +4881,66 @@ riscv_use_save_libcall (const struct riscv_frame_info *frame)
>    return frame->save_libcall_adjustment != 0;
>  }
>
> +/* Determine how many instructions related to push/pop instructions.  */
> +
> +static unsigned
> +riscv_save_push_pop_count (unsigned mask)
> +{
> +  if (!BITSET_P (mask, GP_REG_FIRST + RETURN_ADDR_REGNUM))
> +    return 0;
> +  for (unsigned n = GP_REG_LAST; n > GP_REG_FIRST; n--)
> +    if (BITSET_P (mask, n)
> +       && !call_used_regs [n])
> +      /* add ra saving and sp adjust. */
> +      return CALLEE_SAVED_REG_NUMBER (n) + 1 + 2;

What is the magic number `+ 1 + 2`?

> +  abort ();
> +}
> +
> +/* Calculate the maximum sp adjustment of push/pop instruction. */
> +
> +static unsigned
> +riscv_push_pop_base_sp_adjust (unsigned mask)
> +{
> +  unsigned n_regs = riscv_save_push_pop_count (mask) - 1;
> +  return (n_regs * UNITS_PER_WORD + 15) & (~0xf);

Use ROUND_UP
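
As an illustration of this suggestion, the sketch below compares the open-coded mask form from the patch with a ROUND_UP-style macro. The two macros are local copies written for this example and are assumed to mirror the CEIL and ROUND_UP helpers available in GCC; `align16_mask` and `align16_round_up` are hypothetical names.

```c
/* Local copies for this example; assumed to mirror GCC's helpers.  */
#define CEIL(x, y) (((x) + (y) - 1) / (y))
#define ROUND_UP(x, f) (CEIL (x, f) * (f))

/* Open-coded form used in the patch: round SIZE up to a multiple of 16.  */
static unsigned
align16_mask (unsigned size)
{
  return (size + 15) & ~0xfu;
}

/* The same computation expressed with ROUND_UP.  */
static unsigned
align16_round_up (unsigned size)
{
  return ROUND_UP (size, 16);
}
```

Both forms agree for power-of-two alignments; the macro states the intent directly and avoids the hand-written mask.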

> @@ -5171,6 +5337,86 @@ riscv_for_each_saved_reg (poly_int64 sp_offset, riscv_save_restore_fn fn,
>        }
>  }
>
> +static void
> +riscv_emit_pop_insn (struct riscv_frame_info *frame, HOST_WIDE_INT offset, HOST_WIDE_INT size)
> +{
> +  unsigned int veclen = riscv_save_push_pop_count (frame->mask);
> +  unsigned int n_reg = veclen - 1;
> +  rtvec vec = rtvec_alloc (veclen);
> +  HOST_WIDE_INT sp_adjust;
> +  rtx dwarf = NULL_RTX;
> +
> +  const unsigned *reg_order = (TARGET_ZCMP && TARGET_RVE)
> +       ? push_save_reg_order_zcmpe
> +       : push_save_reg_order;
> +
> +  gcc_assert (n_reg >= 1
> +       && TARGET_ZCMP
> +       && ((TARGET_RVE && (n_reg <= ARRAY_SIZE (push_save_reg_order_zcmpe)))
> +           || (TARGET_ZCMP && (n_reg <= ARRAY_SIZE (push_save_reg_order)))));
> +
> +  /* sp adjust pattern */
> +  int max_allow_sp_adjust = riscv_push_pop_base_sp_adjust (frame->mask) + 48;
> +  int aligned_size = size;
> +
> +  /* if sp adjustment is too large, we should split it first. */
> +  if (aligned_size > max_allow_sp_adjust)
> +    {
> +      rtx dwarf_pre_sp_adjust = NULL_RTX;
> +      rtx pre_adjust_rtx = gen_add3_insn (stack_pointer_rtx,
> +                       stack_pointer_rtx,
> +                       GEN_INT (aligned_size - max_allow_sp_adjust));
> +      rtx insn = emit_insn (pre_adjust_rtx);
> +
> +      rtx cfa_pre_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
> +                       GEN_INT (aligned_size - max_allow_sp_adjust));
> +      dwarf_pre_sp_adjust = alloc_reg_note (REG_CFA_DEF_CFA,
> +               cfa_pre_adjust_rtx,
> +               dwarf_pre_sp_adjust);
> +
> +      RTX_FRAME_RELATED_P (insn) = 1;
> +      REG_NOTES (insn) = dwarf_pre_sp_adjust;
> +
> +      sp_adjust = max_allow_sp_adjust;
> +    }
> +  else
> +    sp_adjust = (aligned_size + 15) & (~0xf);

Use ROUND_UP

> @@ -5270,6 +5516,146 @@ riscv_emit_stack_tie (void)
>      emit_insn (gen_stack_tiedi (stack_pointer_rtx, hard_frame_pointer_rtx));
>  }
>
> +bool
> +riscv_check_regno(rtx pat, unsigned regno)
> +{
> +  return REG_P (pat)
> +      && REGNO (pat) == regno;
> +}
> +
> +/* Function to check whether the OP is a valid stack push/pop operation.
> +   This part is borrowed from nds32 nds32_valid_stack_push_pop_p */
> +
> +bool
> +riscv_valid_stack_push_pop_p (rtx op, bool push_p)
> +{
> +  int index;
> +  int total_count;
> +  int sp_adjust_rtx_index;
> +  rtx elt;
> +  rtx elt_reg;
> +  rtx elt_plus;
> +
> +  if (!TARGET_ZCMP)
> +    return false;
> +
> +  total_count = XVECLEN (op, 0);
> +  sp_adjust_rtx_index = push_p ? 0 : total_count - 1;
> +
> +  /* At least sp + one callee save/restore register rtx */
> +  if (total_count < 2)
> +    return false;
> +
> +  /* Perform some quick check for that every element should be 'set',
> +     for pop, it might contain `ret` and `ret value` pattern.  */
> +  for (index = 0; index < total_count; index++)
> +    {
> +      elt = XVECEXP (op, 0, index);
> +
> +      /* skip pop return value rtx */
> +      if (!push_p && GET_CODE (elt) == SET
> +         && riscv_check_regno (SET_DEST (elt), RETURN_VALUE_REGNUM)
> +         && total_count >= 4
> +         && index + 1 < total_count
> +         && GET_CODE (XVECEXP (op, 0, index + 1)) == USE)
> +       {
> +         rtx use_reg = XEXP (XVECEXP (op, 0, index + 1), 0);
> +
> +         if (!riscv_check_regno (use_reg, RETURN_VALUE_REGNUM))
> +           return false;
> +
> +         index += 1;
> +         continue;
> +       }
> +
> +      /* skip ret rtx */
> +      if (!push_p && GET_CODE (elt) == SIMPLE_RETURN
> +         && total_count >= 4
> +         && index + 1 < total_count
> +         && GET_CODE (XVECEXP (op, 0, index + 1)) == USE)
> +       {
> +         rtx use_reg = XEXP (XVECEXP (op, 0, index + 1), 0);
> +
> +         if (!riscv_check_regno (use_reg, RETURN_ADDR_REGNUM))
> +           return false;
> +
> +         index += 1;
> +         sp_adjust_rtx_index -= 2;
> +         continue;
> +       }
> +
> +      if (GET_CODE (elt) != SET)
> +       return false;
> +    }
> +
> +  elt = XVECEXP (op, 0, sp_adjust_rtx_index);
> +  elt_reg  = SET_DEST (elt);
> +  elt_plus = SET_SRC (elt);
> +
> +  /* Check this is (set (stack_reg) (plus stack_reg const)) pattern.  */
> +  if (GET_CODE (elt_plus) != PLUS
> +      || !riscv_check_regno (elt_reg, STACK_POINTER_REGNUM))
> +    return false;
> +
> +  /* Pass all test, this is a valid rtx.  */
> +  return true;
> +}
> +
> +/* Generate push/pop rtx */
> +
> +static void
> +riscv_emit_push_insn (struct riscv_frame_info *frame, HOST_WIDE_INT size)
> +{
> +  unsigned int veclen = riscv_save_push_pop_count (frame->mask);
> +  unsigned int n_reg = veclen - 1;

Need comment to explain why `- 1` here.

> +  rtvec vec = rtvec_alloc (veclen);
> +
> +  const unsigned *reg_order = (TARGET_ZCMP && TARGET_RVE)
> +       ? push_save_reg_order_zcmpe
> +       : push_save_reg_order;
> +
> +  int aligned_size = (size + 15) & (~0xf);

Use ROUND_UP

> +
> +  gcc_assert (n_reg >= 1
> +       && TARGET_ZCMP
> +       && ((TARGET_RVE && (n_reg <= ARRAY_SIZE (push_save_reg_order_zcmpe)))
> +           || (TARGET_ZCMP && (n_reg <= ARRAY_SIZE (push_save_reg_order)))));
> +
> +  /* sp adjust pattern */
> +  int max_allow_sp_adjust = riscv_push_pop_base_sp_adjust (frame->mask) + 48;

What's the magic number of 48?
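
For context, the 48 presumably comes from the Zcmp encoding, where the stack adjustment of a push/pop is a register-list-dependent base plus an extra 0, 16, 32 or 48 bytes. The sketch below shows how a frame size would be split under that assumption; `ZCMP_MAX_SPIMM`, `struct sp_split` and `split_sp_adjust` are hypothetical names for illustration, not code from the patch.

```c
#include <assert.h>

/* Assumed largest extra adjustment on top of the base: a 2-bit field
   in units of 16 bytes gives 0, 16, 32 or 48.  Hypothetical name.  */
#define ZCMP_MAX_SPIMM 48

struct sp_split
{
  long push_adjust;  /* Folded into the push instruction itself.  */
  long remainder;    /* Left for a separate sp adjustment.  */
};

/* Split an already 16-byte-aligned frame SIZE, given the base
   adjustment BASE for the chosen register list.  */
static struct sp_split
split_sp_adjust (long size, long base)
{
  assert (size % 16 == 0 && base % 16 == 0 && size >= base);
  long max_adjust = base + ZCMP_MAX_SPIMM;
  struct sp_split s;
  s.push_adjust = size > max_adjust ? max_adjust : size;
  s.remainder = size - s.push_adjust;
  return s;
}
```

Under these assumptions, a 128-byte frame with a 16-byte base is split into a 64-byte adjustment in the push and a 64-byte remainder. A named constant of this kind would also replace the bare 48.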

> +  int sp_adjust = aligned_size > max_allow_sp_adjust ?
> +      max_allow_sp_adjust
> +      : aligned_size;
> +
> +  /*TODO: move this part to frame computation function. */

Is it possible to resolve this TODO?

> +  frame->gp_sp_offset = (veclen - 1) * UNITS_PER_WORD;
> +  frame->push_pop_sp_adjust = sp_adjust;

> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index d05b1d59853..6e6e3ee2c25 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -383,6 +383,7 @@ ASM_MISA_SPEC
>  #define HARD_FRAME_POINTER_REGNUM 8
>  #define STACK_POINTER_REGNUM 2
>  #define THREAD_POINTER_REGNUM 4
> +#define RETURN_VALUE_REGNUM 10
>
>  /* These two registers don't really exist: they get eliminated to either
>     the stack or hard frame pointer.  */
> @@ -1097,4 +1098,7 @@ extern void riscv_remove_unneeded_save_restore_calls (void);
>  #define DWARF_REG_TO_UNWIND_COLUMN(REGNO) \
>    ((REGNO == RISCV_DWARF_VLENB) ? (FIRST_PSEUDO_REGISTER + 1) : REGNO)
>
> +#define RISCV_ZCE_PUSH_POP_MASK 0x0ffc0302u

RISCV_ZCE_PUSH_POP_MASK -> RISCV_ZCMP_PUSH_POP_MASK


> +#define RISCV_ZCMPE_PUSH_POP_MASK 0x302u

RISCV_ZCMPE_PUSH_POP_MASK -> RISCV_ZCMP_RVE_PUSH_POP_MASK
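
Whatever the final names, it may help to spell out what the two constants encode. The sketch below assumes bit N of the mask stands for register xN, so the masks cover ra (x1), s0/s1 (x8/x9) and, for the non-RVE case, s2..s11 (x18..x27). `mask_has_reg` and `mask_reg_count` are hypothetical helpers for illustration.

```c
/* Values copied from the quoted riscv.h hunk.  */
#define RISCV_ZCE_PUSH_POP_MASK 0x0ffc0302u
#define RISCV_ZCMPE_PUSH_POP_MASK 0x302u

/* Return 1 if x<regno> is set in MASK, assuming bit N means xN.  */
static int
mask_has_reg (unsigned mask, unsigned regno)
{
  return (int) ((mask >> regno) & 1u);
}

/* Count the registers selected by MASK.  */
static unsigned
mask_reg_count (unsigned mask)
{
  unsigned count = 0;
  for (; mask != 0; mask >>= 1)
    count += mask & 1u;
  return count;
}
```

Read this way, the full mask selects 13 registers (ra plus s0-s11) and the RVE mask selects 3 (ra, s0, s1).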

> diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
> index 6e326fc7e02..9ef522306a5 100644
> --- a/gcc/config/riscv/t-riscv
> +++ b/gcc/config/riscv/t-riscv
> @@ -90,6 +90,10 @@ riscv-v.o: $(srcdir)/config/riscv/riscv-v.cc \
>         $(COMPILE) $<
>         $(POSTCOMPILE)
>
> +riscv-zcmp-popret.o: $(srcdir)/config/riscv/riscv-zcmp-popret.cc

Please add the right dependencies here.

> diff --git a/gcc/config/riscv/zc.md b/gcc/config/riscv/zc.md
> new file mode 100644
> index 00000000000..3ad34dacd49
> --- /dev/null
> +++ b/gcc/config/riscv/zc.md
> @@ -0,0 +1,47 @@
> +;; Machine description for ZCE extension.

ZCE extension. -> Zc* extension

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 4/5] RISC-V: Add Zcmp extension supports.
       [not found] <2023042517370879865929@eswincomputing.com>
@ 2023-04-25  9:52 ` Fei Gao
  0 siblings, 0 replies; 9+ messages in thread
From: Fei Gao @ 2023-04-25  9:52 UTC (permalink / raw)
  To: jiawei; +Cc: gcc-patches

hi Jiawei

I downloaded the series of patches from you and found in some cases
it fails to generate zcmp push and pop insns.

test.c

char my_getchar();
int test_s0()
{

        int a = my_getchar();
        int b = my_getchar();
        return a+b;
}




On Thu Apr 6 06:21:17 GMT 2023  Jiawei jiawei@iscas.ac.cn wrote:
>
>Add support for the Zcmp extension instructions. Generate push/pop
>with the following steps:
>
>  1. preprocessing:
>    1.1. if there is no push rtx, then just return. e.g.
>    (note 5 1 22 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>    (insn/f 22 5 23 2 (set (reg/f:SI 2 sp)
>      (plus:SI (reg/f:SI 2 sp)
>        (const_int -32 [0xffffffffffffffe0])))
>    (nil))
>    (note 23 22 2 2 NOTE_INSN_PROLOGUE_END)
>    1.2. if push rtx exists, then we compute the number of
>    pushed s-registers, n_sreg.
>
>  push rtx should be found before NOTE_INSN_PROLOGUE_END tag
>
>  [2 and 3 happen simultaneously]
>
>  2. find valid move pattern, mv sN, aN, where N < n_sreg,
>    and aN is not used the move pattern, and sN is not
>    defined before the move pattern (from prologue to the
>    position of move pattern).
>
>  3. analysis use and reach of every instruction from prologue
>    to the position of move pattern.
>    if any sN is used, then we mark the corresponding argument list
>    candidate as invalid.
>    e.g.
>        push  {ra,s0-s3}, {}, -32
>        sw      s0,44(sp) # s0 is used, then argument list is invalid
>        mv      a0,a5     # a0 is defined, then argument list is invalid
>        ...
>        mv      s0,a0
>        mv      s1,a1
>        mv      s2,a2
>
>  4. if there is a valid argument list, then replace the pop
>    push parallel insn, and delete mv pattern.
>     if not, skip.
>
>All "zcmpe" means Zcmp with RVE extension.
>The push/pop instruction implementation was mostly done by Sinan Lin.
>
>Co-Authored by: Sinan Lin <sinan....@linux.alibaba.com>
>Co-Authored by: Simon Cook <simon.c...@embecosm.com>
>Co-Authored by: Shihua Liao <shi...@iscas.ac.cn>
>
>gcc/ChangeLog:
>
>        * config.gcc: New object.
>        * config/riscv/predicates.md (riscv_stack_push_operation):
>          New predicate.
>        (riscv_stack_pop_operation): Ditto.
>        (pop_return_value_constant): Ditto.
>        * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): New pass.
>        * config/riscv/riscv-protos.h (riscv_output_popret_p):
>          New routine.
>        (riscv_valid_stack_push_pop_p): Ditto.
>        (riscv_check_regno): Ditto.
>        (make_pass_zcmp_popret): New pass.
>        * config/riscv/riscv.cc (struct riscv_frame_info): New variable.
>        (riscv_output_popret_p): New function.
>        (riscv_print_pop_size): Ditto.
>        (riscv_print_reglist): Ditto.
>        (riscv_print_operand): New case symbols.
>        (riscv_save_push_pop_count): New function.
>        (riscv_push_pop_base_sp_adjust): Ditto.
>        (riscv_use_push_pop): Ditto.
>        (riscv_compute_frame_info): Adjust frame value.
>        (riscv_emit_pop_insn): New function.
>        (riscv_check_regno): Ditto.
>        (riscv_valid_stack_push_pop_p): Ditto.
>        (riscv_emit_push_insn): Ditto.
>        (riscv_expand_prologue): Modify frame pattern.
>        (riscv_expand_epilogue): Ditto.
>        * config/riscv/riscv.h (RETURN_VALUE_REGNUM):
>        (RISCV_ZCE_PUSH_POP_MASK): New mask.
>        (RISCV_ZCMPE_PUSH_POP_MASK): Ditto.
>        * config/riscv/riscv.md: Add new reg number and include info.
>        * config/riscv/t-riscv: New object rules.
>        * config/riscv/riscv-zcmp-popret.cc: New file.
>        * config/riscv/zc.md: New file.
>---
> gcc/config.gcc                        |   2 +-
> gcc/config/riscv/predicates.md        |  16 +
> gcc/config/riscv/riscv-passes.def     |   1 +
> gcc/config/riscv/riscv-protos.h       |   4 +
> gcc/config/riscv/riscv-zcmp-popret.cc | 260 +++++++++++++++
> gcc/config/riscv/riscv.cc             | 437 +++++++++++++++++++++++++-
> gcc/config/riscv/riscv.h              |   4 +
> gcc/config/riscv/riscv.md             |   3 +
> gcc/config/riscv/t-riscv              |   4 +
> gcc/config/riscv/zc.md                |  47 +++
> 10 files changed, 767 insertions(+), 11 deletions(-)
> create mode 100644 gcc/config/riscv/riscv-zcmp-popret.cc
> create mode 100644 gcc/config/riscv/zc.md
>
>diff --git a/gcc/config.gcc b/gcc/config.gcc
>index 629d324b5ef..a991c5273f9 100644
>--- a/gcc/config.gcc
>+++ b/gcc/config.gcc
>@@ -529,7 +529,7 @@ pru-*-*)
>        ;;
> riscv*)
>        cpu_type=riscv
>-       extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o"
>+       extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o riscv-zcmp-popret.o"
>        extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
>        extra_objs="${extra_objs} thead.o"
>        d_target_objs="riscv-d.o"
>diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
>index 0d9d7701c7e..6bff6cd047a 100644
>--- a/gcc/config/riscv/predicates.md
>+++ b/gcc/config/riscv/predicates.md
>@@ -412,3 +412,19 @@
>   (and (match_code "const_int")
>        (ior (match_operand 0 "not_uimm_extra_bit_operand")
>            (match_operand 0 "const_nottwobits_operand"))))
>+
>+(define_special_predicate "riscv_stack_push_operation"
>+  (match_code "parallel")
>+{
>+  return riscv_valid_stack_push_pop_p (op, true);
>+})
>+
>+(define_special_predicate "riscv_stack_pop_operation"
>+  (match_code "parallel")
>+{
>+  return riscv_valid_stack_push_pop_p (op, false);
>+})
>+
>+(define_predicate "pop_return_value_constant"
>+  (and (match_code "const_int")
>+       (match_test "INTVAL (op) == 0")))
>diff --git a/gcc/config/riscv/riscv-passes.def b/gcc/config/riscv/riscv-passes.def
>index 4084122cf0a..25625b9af3e 100644
>--- a/gcc/config/riscv/riscv-passes.def
>+++ b/gcc/config/riscv/riscv-passes.def
>@@ -19,3 +19,4 @@
> 
> INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
> INSERT_PASS_BEFORE (pass_fast_rtl_dce, 1, pass_vsetvl);
>+INSERT_PASS_AFTER (pass_cprop_hardreg, 1, pass_zcmp_popret);
>diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
>index 4611447ddde..8f243cd5f44 100644
>--- a/gcc/config/riscv/riscv-protos.h
>+++ b/gcc/config/riscv/riscv-protos.h
>@@ -54,6 +54,7 @@ extern bool riscv_split_64bit_move_p (rtx, rtx);
> extern void riscv_split_doubleword_move (rtx, rtx);
> extern const char *riscv_output_move (rtx, rtx);
> extern const char *riscv_output_return ();
>+extern bool riscv_output_popret_p (rtx);
> 
> #ifdef RTX_CODE
> extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx);
>@@ -79,6 +80,8 @@ extern void riscv_reinit (void);
> extern poly_uint64 riscv_regmode_natural_size (machine_mode);
> extern bool riscv_v_ext_vector_mode_p (machine_mode);
> extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT);
>+extern bool riscv_valid_stack_push_pop_p (rtx, bool);
>+extern bool riscv_check_regno(rtx, unsigned);
> 
> /* Routines implemented in riscv-c.cc.  */
> void riscv_cpu_cpp_builtins (cpp_reader *);
>@@ -99,6 +102,7 @@ extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
> 
> rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
> rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
>+rtl_opt_pass * make_pass_zcmp_popret (gcc::context *ctxt);
> 
> /* Information about one CPU we know about.  */
> struct riscv_cpu_info {
>diff --git a/gcc/config/riscv/riscv-zcmp-popret.cc b/gcc/config/riscv/riscv-zcmp-popret.cc
>new file mode 100644
>index 00000000000..d7b40f6a3e2
>--- /dev/null
>+++ b/gcc/config/riscv/riscv-zcmp-popret.cc
>@@ -0,0 +1,260 @@
>+#include "config.h"
>+#include "system.h"
>+#include "coretypes.h"
>+#include "tm.h"
>+#include "rtl.h"
>+#include "backend.h"
>+#include "regs.h"
>+#include "target.h"
>+#include "memmodel.h"
>+#include "emit-rtl.h"
>+#include "df.h"
>+#include "predict.h"
>+#include "tree-pass.h"
>+#include "tree.h"
>+#include "tm_p.h"
>+#include "optabs.h"
>+#include "recog.h"
>+#include "cfgrtl.h"
>+
>+#define IN_TARGET_CODE 1
>+
>+namespace {
>+
>+/*
>+  1. preprocessing:
>+    1.1. if there is no push rtx, then just return. e.g.
>+    (note 5 1 22 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
>+    (insn/f 22 5 23 2 (set (reg/f:SI 2 sp)
>+      (plus:SI (reg/f:SI 2 sp)
>+       (const_int -32 [0xffffffffffffffe0])))
>+    (nil))
>+    (note 23 22 2 2 NOTE_INSN_PROLOGUE_END)
>+    1.2. if push rtx exists, then we compute the number of
>+    pushed s-registers, n_sreg.
>+
>+  push rtx should be find before NOTE_INSN_PROLOGUE_END tag
>+
>+  [2 and 3 happend simultaneously]
>+  2. find valid move pattern, mv sN, aN, where N < n_sreg,
>+    and aN is not used the move pattern, and sN is not
>+    defined before the move pattern (from prologue to the
>+    position of move pattern).
>+  3. analysis use and reach of every instruction from prologue
>+    to the position of move pattern.
>+    if any sN is used, then we mark the corresponding argument list
>+    candidate as invalid.
>+    e.g.
>+       push  {ra,s0-s3}, {}, -32
>+       sw      s0,44(sp) # s0 is used, then argument list is invalid
>+       mv      a0,a5     # a0 is defined, then argument list is invalid
>+       ...
>+       mv      s0,a0
>+       mv      s1,a1
>+       mv      s2,a2
>+
>+  4. if there is a valid argument list, then replace the pop
>+    push parallel insn, and delete mv pattern.
>+     if not, skip.
>+*/
>+
>+static void
>+emit_zcmp_popret (rtx_insn *pop_rtx,
>+                 rtx_insn **candidates,
>+                 basic_block bb)
>+{
>+  bool gen_popretz_p = candidates [0];
>+  bool gen_popret_p = candidates [2];
>+
>+  if (!(gen_popret_p || gen_popretz_p))
>+    return;
>+
>+  gcc_assert ((gen_popret_p && !gen_popretz_p)
>+      || (gen_popretz_p && gen_popret_p));
>+
>+  rtx pop_pat = PATTERN (pop_rtx);
>+  unsigned pop_idx = 0, popret_idx = 0;
>+  unsigned n_pop_par = XVECLEN (pop_pat, 0);
>+  unsigned n_popret_par = n_pop_par
>+       + (gen_popretz_p ? 2 : 0)
>+       + (gen_popret_p ? 2 : 0);
>+
>+  rtx popret_par = gen_rtx_PARALLEL (VOIDmode,
>+         rtvec_alloc (n_popret_par));
>+
>+  /* return zero pattern */
>+  if (gen_popretz_p)
>+    {
>+      XVECEXP (popret_par, 0, 0) = PATTERN (candidates[0]);
>+      XVECEXP (popret_par, 0, 1) = PATTERN (candidates[1]);
>+      popret_idx += 2;
>+      delete_insn (candidates[0]);
>+      delete_insn (candidates[1]);
>+    }
>+
>+  /* copy pop sequence.  */
>+  for (; pop_idx < n_pop_par;
>+      pop_idx ++, popret_idx ++)
>+    {
>+      XVECEXP (popret_par, 0, popret_idx) =
>+         XVECEXP (pop_pat, 0, pop_idx);
>+    }
>+
>+  /* ret pattern.  */
>+  rtx ret_pat = PATTERN (candidates[2]);
>+  gcc_assert (GET_CODE (ret_pat) == PARALLEL);
>+
>+  for (int i = 0; i < XVECLEN (ret_pat, 0);
>+      i++, popret_idx++)
>+  {
>+    XVECEXP (popret_par, 0, popret_idx) =
>+       XVECEXP (ret_pat, 0, i);
>+  }
>+
>+  rtx_insn *insn = emit_jump_insn_after (
>+         popret_par,
>+         BB_END (bb));
>+  JUMP_LABEL (insn) = simple_return_rtx;
>+
>+  REG_NOTES (insn) = REG_NOTES (pop_rtx);
>+  RTX_FRAME_RELATED_P (insn) = 1;
>+
>+  if (dump_file)
>+    {
>+      fprintf(dump_file, "new insn:\n");
>+      print_rtl (dump_file, insn);
>+    }
>+
>+  delete_insn (candidates [2]);
>+  delete_insn (pop_rtx);
>+}
>+
>+static void
>+zcmp_popret (void)
>+{
>+  basic_block bb;
>+  rtx_insn *insn = NULL, *pop_rtx = NULL;
>+  rtx_insn *pop_candidates[3] = {NULL, };
>+  /*
>+    find NOTE_INSN_EPILOGUE_BEG, but pop_rtx not found => return
>+    find NOTE_INSN_EPILOGUE_BEG, and pop_rtx is found => looking for a0
>+  */
>+
>+  FOR_EACH_BB_REVERSE_FN (bb, cfun)
>+  {
>+    FOR_BB_INSNS_REVERSE (bb, insn)
>+      {
>+       if (!pop_rtx
>+           && NOTE_P (insn)
>+           && NOTE_KIND (insn) == NOTE_INSN_EPILOGUE_BEG)
>+         return;
>+
>+       if (NOTE_P (insn)
>+           && NOTE_KIND (insn) == NOTE_INSN_FUNCTION_BEG)
>+         {
>+           if (pop_rtx)
>+             emit_zcmp_popret (pop_rtx, pop_candidates, bb);
>+           return;
>+         };
>+
>+       if (!(NONDEBUG_INSN_P (insn)
>+           || CALL_P (insn)))
>+         continue;
>+
>+       rtx pop_pat = PATTERN (insn);
>+
>+       if (GET_CODE (pop_pat) == PARALLEL
>+           && riscv_valid_stack_push_pop_p (pop_pat, false))
>+         {
>+           pop_rtx = insn;
>+           continue;
>+         }
>+
>+       /* pattern for `ret`.  */
>+       if (JUMP_P (insn)
>+           && GET_CODE (pop_pat) == PARALLEL
>+           && XVECLEN (pop_pat, 0) == 2
>+           && GET_CODE (XVECEXP (pop_pat, 0, 0)) == SIMPLE_RETURN
>+           && GET_CODE (XVECEXP (pop_pat, 0, 1)) == USE)
>+         {
>+           rtx use_reg = XEXP (XVECEXP (pop_pat, 0, 1), 0);
>+           if (REG_P (use_reg)
>+             && REGNO (use_reg) == RETURN_ADDR_REGNUM)
>+             {
>+               pop_candidates [2] = insn;
>+               continue;
>+             }
>+         }
>+
>+       if (!pop_rtx)
>+         continue;
>+
>+       /* pattern for return value.  */
>+       if (!pop_candidates [0]
>+           && GET_CODE (pop_pat) == USE)
>+         {
>+           rtx_insn *set_insn = PREV_INSN (insn);
>+           rtx pat_set = (set_insn && INSN_P (set_insn))
>+               ? PATTERN (set_insn) : NULL_RTX;
>+           if (riscv_check_regno (XEXP (pop_pat, 0),
>+                   RETURN_VALUE_REGNUM)
>+               && set_insn
>+               && pat_set != NULL
>+               && GET_CODE (pat_set) == SET
>+               && riscv_check_regno (SET_DEST (pat_set),
>+                      RETURN_VALUE_REGNUM)
>+               && CONST_INT_P (SET_SRC (pat_set))
>+               && INTVAL (SET_SRC (pat_set)) == 0)
>+             {
>+               pop_candidates [0] = set_insn;
>+               pop_candidates [1] = insn;
>+               break;
>+             }
>+         }
>+      }
>+
>+    if (pop_rtx)
>+      {
>+       emit_zcmp_popret (pop_rtx, pop_candidates, bb);
>+       return;
>+      }
>+  }
>+}
>+
>+const pass_data pass_data_zcmp_popret =
>+{
>+  RTL_PASS, /* type */
>+  "zcmp-popret", /* name */
>+  OPTGROUP_NONE, /* optinfo_flags */
>+  TV_NONE, /* tv_id */
>+  0, /* properties_required */
>+  0, /* properties_provided */
>+  0, /* properties_destroyed */
>+  0, /* todo_flags_start */
>+  0, /* todo_flags_finish */
>+};
>+
>+class pass_zcmp_popret : public rtl_opt_pass
>+{
>+public:
>+  pass_zcmp_popret (gcc::context *ctxt)
>+    : rtl_opt_pass (pass_data_zcmp_popret, ctxt)
>+  {}
>+
>+  /* opt_pass methods: */
>+  virtual bool gate (function *)
>+    { return TARGET_ZCMP; }
>+  virtual unsigned int execute (function *)
>+    {
>+      zcmp_popret ();
>+      return 0;
>+    }
>+}; // class pass_zcmp_popret
>+
>+} // anon namespace
>+
>+rtl_opt_pass *
>+make_pass_zcmp_popret (gcc::context *ctxt)
>+{
>+  return new pass_zcmp_popret (ctxt);
>+}
>diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>index 5f8cbfc15ed..17df2f3f8cf 100644
>--- a/gcc/config/riscv/riscv.cc
>+++ b/gcc/config/riscv/riscv.cc
>@@ -114,6 +114,9 @@ struct GTY(())  riscv_frame_info {
>   /* Likewise FPR X.  */
>   unsigned int fmask;
> 
>+  /* How much the push/pop routines adjust sp (or 0 if unused).  */
>+  unsigned push_pop_sp_adjust;
>+
>   /* How much the GPR save/restore routines adjust sp (or 0 if unused).  */
>   unsigned save_libcall_adjustment;
> 
>@@ -401,6 +404,20 @@ static const unsigned gpr_save_reg_order[] = {
>   S10_REGNUM, S11_REGNUM
> };
> 
>+/* Order for the CLOBBERs/USEs of push/pop.  */
>+static const unsigned push_save_reg_order[] = {
>+  INVALID_REGNUM, RETURN_ADDR_REGNUM, S0_REGNUM,
>+  S1_REGNUM, S2_REGNUM, S3_REGNUM, S4_REGNUM,
>+  S5_REGNUM, S6_REGNUM, S7_REGNUM, S8_REGNUM,
>+  S9_REGNUM, S10_REGNUM, S11_REGNUM
>+};
>+
>+/* Order for the CLOBBERs/USEs of push/pop in rve.  */
>+static const unsigned push_save_reg_order_zcmpe[] = {
>+  INVALID_REGNUM, RETURN_ADDR_REGNUM, S0_REGNUM,
>+  S1_REGNUM
>+};
>+
> /* A table describing all the processors GCC knows about.  */
> static const struct riscv_tune_info riscv_tune_info_table[] = {
> #define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO)       \
>@@ -2989,6 +3006,17 @@ riscv_output_return ()
>   return "ret";
> }
> 
>+bool
>+riscv_output_popret_p (rtx op)
>+{
>+  unsigned n_rtx = XVECLEN (op, 0);
>+  rtx use = XVECEXP (op, 0, n_rtx - 1);
>+  rtx ret = XVECEXP (op, 0, n_rtx - 2);
>+
>+    return GET_CODE (ret) == SIMPLE_RETURN
>+       &&  GET_CODE (use) == USE;
>+}
>+
> 
>
> /* Return true if CMP1 is a suitable second operand for integer ordering
>    test CODE.  See also the *sCC patterns in riscv.md.  */
>@@ -4306,6 +4334,74 @@ riscv_memmodel_needs_amo_acquire (enum memmodel model)
>     }
> }
> 
>+/* Print Sp adjustment field of pop instruction.  */
>+
>+static void
>+riscv_print_pop_size (FILE *file, rtx op)
>+{
>+  unsigned sp_adjust_idx = XVECLEN (op, 0) - 1;
>+  rtx sp_adjust_rtx = XVECEXP (op, 0, sp_adjust_idx);
>+
>+  /* Skip ret or pattern.  */
>+  while (GET_CODE (sp_adjust_rtx) != SET)
>+    sp_adjust_rtx = XVECEXP (op, 0, --sp_adjust_idx);
>+
>+  rtx elt_plus = SET_SRC (sp_adjust_rtx);
>+  fprintf (file, HOST_WIDE_INT_PRINT_DEC, INTVAL (XEXP (elt_plus, 1)));
>+}
>+
>+/* Print push/pop register list. */
>+
>+static void
>+riscv_print_reglist (FILE *file, rtx op)
>+{
>+  /* we only deal with three formats:
>+      push {ra}
>+      push {ra, s0}
>+      push {ra, s0-sN}
>+    or
>+      pop {ra}
>+      pop {ra, s0}
>+      pop {ra, s0-sN}
>+    registers other than ra have to be contiguous s-registers,
>+    which is supposed to have been checked beforehand.
>+    register list patterns in push:
>+    (set/f (mem/c:SI
>+      (plus:SI (reg/f:SI 2 sp)
>+       (const_int 28 [0x1c])) [2  S4 A32])
>+      (reg:SI 1 ra))
>+    register list patterns in pop:
>+    (set/f (reg:DI 1 ra)
>+      (mem/c:DI (plus:DI (reg/f:DI 2 sp)
>+       (const_int 8 [0x8])) [2  S8 A64]))
>+  */
>+  int total_count = XVECLEN (op, 0);
>+  int n_regs = 0;
>+  bool push_p = GET_CODE (XVECEXP (op, 0, 0)) == SET
>+      && GET_CODE (SET_SRC (XVECEXP (op, 0, 0))) == PLUS;
>+
>+  for (int idx = 0; idx < total_count; ++idx)
>+    {
>+      rtx ele = XVECEXP (op, 0, idx);
>+      if (GET_CODE (ele) != SET)
>+       continue;
>+
>+      bool restore_save_p = push_p ?
>+         MEM_P (SET_DEST (ele)) :
>+         MEM_P (SET_SRC (ele));
>+
>+      if (restore_save_p)
>+       n_regs ++;
>+    }
>+
>+  if (n_regs > 2)
>+    fprintf (file, "ra,s0-s%u", n_regs - 2);
>+  else if (n_regs > 1)
>+    fprintf (file, "ra,s0");
>+  else
>+    fputs("ra", file);
>+}
>+
> /* Return true if a FENCE should be emitted to before a memory access to
>    implement the release portion of memory model MODEL.  */
> 
>@@ -4517,6 +4613,14 @@ riscv_print_operand (FILE *file, rtx op, int letter)
>       fputs (GET_RTX_NAME (code), file);
>       break;
> 
>+    case 'L':
>+      riscv_print_reglist (file, op);
>+      break;
>+
>+    case 's':
>+      riscv_print_pop_size (file, op);
>+      break;
>+
>     case 'S':
>       {
>        rtx newop = GEN_INT (ctz_hwi (INTVAL (op)));
>@@ -4777,6 +4881,66 @@ riscv_use_save_libcall (const struct riscv_frame_info *frame)
>   return frame->save_libcall_adjustment != 0;
> }
> 
>+/* Determine how many rtx elements the push/pop instruction needs.  */
>+
>+static unsigned
>+riscv_save_push_pop_count (unsigned mask)
>+{
>+  if (!BITSET_P (mask, GP_REG_FIRST + RETURN_ADDR_REGNUM))
>+    return 0;
>+  for (unsigned n = GP_REG_LAST; n > GP_REG_FIRST; n--)
>+    if (BITSET_P (mask, n)
>+       && !call_used_regs [n])
>+      /* add ra saving and sp adjust. */
>+      return CALLEE_SAVED_REG_NUMBER (n) + 1 + 2;
>+  abort ();
>+}
>+
>+/* Calculate the base (minimum) sp adjustment of push/pop instruction. */
>+
>+static unsigned
>+riscv_push_pop_base_sp_adjust (unsigned mask)
>+{
>+  unsigned n_regs = riscv_save_push_pop_count (mask) - 1;
>+  return (n_regs * UNITS_PER_WORD + 15) & (~0xf);
>+}
>+
>+/* Determine whether to call push/pop routines.  */
>+
>+static bool
>+riscv_use_push_pop (const struct riscv_frame_info *frame, const HOST_WIDE_INT frame_size)
>+{
>+  if (!TARGET_ZCMP)
>+    return false;
>+
>+  /* We do not handle variable argument cases currently.  */
>+  if (cfun->machine->varargs_size != 0)
>+    return false;
>+
>+  HOST_WIDE_INT base_size = riscv_push_pop_base_sp_adjust (frame->mask);
>+  /*
>+     Pr 960215-1.c in rv64 outputs
>+
>+       addi    sp,sp,-32
>+       sd      ra,24(sp)
>+       sd      s0,16(sp)
>+       sd      s2,8(sp)
>+       sd      s3,0(sp)
>+     it is a rare case in which the callee-saved registers are non-contiguous,
>+     which breaks the old push implementation, and we just reject this case
>+     like save-restore does now.
>+  */
>+  if (base_size > frame_size)
>+    return false;
>+
>+  /* {ra,s0-s10} is invalid. */
>+  if (frame->mask & (1 << (S10_REGNUM - GP_REG_FIRST))
>+      && !(frame->mask & (1 << (S11_REGNUM - GP_REG_FIRST))))
>+    return false;
>+
>+  return frame->mask & (1 << (RETURN_ADDR_REGNUM - GP_REG_FIRST));
>+}
>+
> /* Determine which GPR save/restore routine to call.  */
> 
> static unsigned
>@@ -4934,6 +5098,8 @@ riscv_compute_frame_info (void)
>   /* Only use save/restore routines when the GPRs are atop the frame.  */
>   if (known_ne (frame->hard_frame_pointer_offset, frame->total_size))
>     frame->save_libcall_adjustment = 0;
>+
>+  frame->push_pop_sp_adjust = 0;
> }
> 
> /* Make sure that we're not trying to eliminate to the wrong hard frame
>@@ -5171,6 +5337,86 @@ riscv_for_each_saved_reg (poly_int64 sp_offset, riscv_save_restore_fn fn,
>       }
> }
> 
>+static void
>+riscv_emit_pop_insn (struct riscv_frame_info *frame, HOST_WIDE_INT offset, HOST_WIDE_INT size)
>+{
>+  unsigned int veclen = riscv_save_push_pop_count (frame->mask);
>+  unsigned int n_reg = veclen - 1;
>+  rtvec vec = rtvec_alloc (veclen);
>+  HOST_WIDE_INT sp_adjust;
>+  rtx dwarf = NULL_RTX;
>+
>+  const unsigned *reg_order = (TARGET_ZCMP && TARGET_RVE)
>+       ? push_save_reg_order_zcmpe
>+       : push_save_reg_order;
>+
>+  gcc_assert (n_reg >= 1
>+       && TARGET_ZCMP
>+       && ((TARGET_RVE && (n_reg <= ARRAY_SIZE (push_save_reg_order_zcmpe)))
>+           || (TARGET_ZCMP && (n_reg <= ARRAY_SIZE (push_save_reg_order)))));
>+
>+  /* sp adjust pattern */
>+  int max_allow_sp_adjust = riscv_push_pop_base_sp_adjust (frame->mask) + 48;
>+  int aligned_size = size;
>+
>+  /* if sp adjustment is too large, we should split it first. */
>+  if (aligned_size > max_allow_sp_adjust)
>+    {
>+      rtx dwarf_pre_sp_adjust = NULL_RTX;
>+      rtx pre_adjust_rtx = gen_add3_insn (stack_pointer_rtx,
>+                       stack_pointer_rtx,
>+                       GEN_INT (aligned_size - max_allow_sp_adjust));
>+      rtx insn = emit_insn (pre_adjust_rtx);
>+
>+      rtx cfa_pre_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
>+                       GEN_INT (aligned_size - max_allow_sp_adjust));
>+      dwarf_pre_sp_adjust = alloc_reg_note (REG_CFA_DEF_CFA,
>+               cfa_pre_adjust_rtx,
>+               dwarf_pre_sp_adjust);
>+
>+      RTX_FRAME_RELATED_P (insn) = 1;
>+      REG_NOTES (insn) = dwarf_pre_sp_adjust;
>+
>+      sp_adjust = max_allow_sp_adjust;
>+    }
>+  else
>+    sp_adjust = (aligned_size + 15) & (~0xf);
>+
>+  /* register save sequence. */
>+  for (unsigned i = 1; i < veclen; ++i)
>+    {
>+      offset -= UNITS_PER_WORD;
>+      unsigned regno = reg_order[i];
>+      rtx reg = gen_rtx_REG (Pmode, regno);
>+      rtx mem = gen_frame_mem (Pmode, plus_constant (Pmode,
>+             stack_pointer_rtx,
>+             offset));
>+      rtx set = gen_rtx_SET (reg, mem);
>+      RTVEC_ELT (vec, i - 1) = set;
>+      RTX_FRAME_RELATED_P (set) = 1;
>+      dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
>+    }
>+
>+  /* sp adjust pattern */
>+  rtx adjust_sp_rtx
>+      = gen_rtx_SET (stack_pointer_rtx,
>+           plus_constant (Pmode,
>+               stack_pointer_rtx,
>+               sp_adjust));
>+  RTVEC_ELT (vec, veclen - 1) = adjust_sp_rtx;
>+
>+  rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
>+       const0_rtx);
>+  dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
>+
>+  frame->gp_sp_offset -= (veclen - 1) * UNITS_PER_WORD;
>+  frame->push_pop_sp_adjust = sp_adjust;
>+
>+  rtx insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, vec));
>+  RTX_FRAME_RELATED_P (insn) = 1;
>+  REG_NOTES (insn) = dwarf;
>+}
>+
> /* For stack frames that can't be allocated with a single ADDI instruction,
>    compute the best value to initially allocate.  It must at a minimum
>    allocate enough space to spill the callee-saved registers.  If TARGET_RVC,
>@@ -5270,6 +5516,146 @@ riscv_emit_stack_tie (void)
>     emit_insn (gen_stack_tiedi (stack_pointer_rtx, hard_frame_pointer_rtx));
> }
> 
>+bool
>+riscv_check_regno(rtx pat, unsigned regno)
>+{
>+  return REG_P (pat)
>+      && REGNO (pat) == regno;
>+}
>+
>+/* Function to check whether the OP is a valid stack push/pop operation.
>+   This part is borrowed from nds32 nds32_valid_stack_push_pop_p */
>+
>+bool
>+riscv_valid_stack_push_pop_p (rtx op, bool push_p)
>+{
>+  int index;
>+  int total_count;
>+  int sp_adjust_rtx_index;
>+  rtx elt;
>+  rtx elt_reg;
>+  rtx elt_plus;
>+
>+  if (!TARGET_ZCMP)
>+    return false;
>+
>+  total_count = XVECLEN (op, 0);
>+  sp_adjust_rtx_index = push_p ? 0 : total_count - 1;
>+
>+  /* At least sp + one callee save/restore register rtx */
>+  if (total_count < 2)
>+    return false;
>+
>+  /* Perform a quick check that every element is a 'set';
>+     for pop, it might also contain `ret` and `ret value` patterns.  */
>+  for (index = 0; index < total_count; index++)
>+    {
>+      elt = XVECEXP (op, 0, index);
>+
>+      /* skip pop return value rtx */
>+      if (!push_p && GET_CODE (elt) == SET
>+         && riscv_check_regno (SET_DEST (elt), RETURN_VALUE_REGNUM)
>+         && total_count >= 4
>+         && index + 1 < total_count
>+         && GET_CODE (XVECEXP (op, 0, index + 1)) == USE)
>+       {
>+         rtx use_reg = XEXP (XVECEXP (op, 0, index + 1), 0);
>+
>+         if (!riscv_check_regno (use_reg, RETURN_VALUE_REGNUM))
>+           return false;
>+
>+         index += 1;
>+         continue;
>+       }
>+
>+      /* skip ret rtx */
>+      if (!push_p && GET_CODE (elt) == SIMPLE_RETURN
>+         && total_count >= 4
>+         && index + 1 < total_count
>+         && GET_CODE (XVECEXP (op, 0, index + 1)) == USE)
>+       {
>+         rtx use_reg = XEXP (XVECEXP (op, 0, index + 1), 0);
>+
>+         if (!riscv_check_regno (use_reg, RETURN_ADDR_REGNUM))
>+           return false;
>+
>+         index += 1;
>+         sp_adjust_rtx_index -= 2;
>+         continue;
>+       }
>+
>+      if (GET_CODE (elt) != SET)
>+       return false;
>+    }
>+
>+  elt = XVECEXP (op, 0, sp_adjust_rtx_index);
>+  elt_reg  = SET_DEST (elt);
>+  elt_plus = SET_SRC (elt);
>+
>+  /* Check this is (set (stack_reg) (plus stack_reg const)) pattern.  */
>+  if (GET_CODE (elt_plus) != PLUS
>+      || !riscv_check_regno (elt_reg, STACK_POINTER_REGNUM))
>+    return false;
>+
>+  /* All tests passed; this is a valid rtx.  */
>+  return true;
>+}
>+
>+/* Generate push/pop rtx */
>+
>+static void
>+riscv_emit_push_insn (struct riscv_frame_info *frame, HOST_WIDE_INT size)
>+{
>+  unsigned int veclen = riscv_save_push_pop_count (frame->mask);
>+  unsigned int n_reg = veclen - 1;
>+  rtvec vec = rtvec_alloc (veclen);
>+
>+  const unsigned *reg_order = (TARGET_ZCMP && TARGET_RVE)
>+       ? push_save_reg_order_zcmpe
>+       : push_save_reg_order;
>+
>+  int aligned_size = (size + 15) & (~0xf);
>+
>+  gcc_assert (n_reg >= 1
>+       && TARGET_ZCMP
>+       && ((TARGET_RVE && (n_reg <= ARRAY_SIZE (push_save_reg_order_zcmpe)))
>+           || (TARGET_ZCMP && (n_reg <= ARRAY_SIZE (push_save_reg_order)))));
>+
>+  /* sp adjust pattern */
>+  int max_allow_sp_adjust = riscv_push_pop_base_sp_adjust (frame->mask) + 48;
>+  int sp_adjust = aligned_size > max_allow_sp_adjust ?
>+      max_allow_sp_adjust
>+      : aligned_size;
>+
>+  /*TODO: move this part to frame computation function. */
>+  frame->gp_sp_offset = (veclen - 1) * UNITS_PER_WORD;
>+  frame->push_pop_sp_adjust = sp_adjust;
>+
>+  rtx adjust_sp_rtx
>+      = gen_rtx_SET (stack_pointer_rtx,
>+           plus_constant (Pmode,
>+           stack_pointer_rtx,
>+           -sp_adjust));
>+  RTVEC_ELT (vec, 0) = adjust_sp_rtx;
>+
>+  /* Register save sequence. */
>+  for (unsigned i = 1; i < veclen; ++i)
>+    {
>+      sp_adjust -= UNITS_PER_WORD;
>+      unsigned regno = reg_order[i];
>+      rtx reg = gen_rtx_REG (Pmode, regno);
>+      rtx mem = gen_frame_mem (Pmode, plus_constant (Pmode,
>+             stack_pointer_rtx,
>+             sp_adjust));
>+      rtx set = gen_rtx_SET (mem, reg);
>+      RTVEC_ELT (vec, i) = set;
>+      RTX_FRAME_RELATED_P (set) = 1;
>+    }
>+
>+  rtx insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, vec));
>+  RTX_FRAME_RELATED_P (insn) = 1;
>+}
>+
> /* Expand the "prologue" pattern.  */
> 
> void
>@@ -5278,6 +5664,7 @@ riscv_expand_prologue (void)
>   struct riscv_frame_info *frame = &cfun->machine->frame;
>   poly_int64 size = frame->total_size;
>   unsigned mask = frame->mask;
>+  HOST_WIDE_INT step1 = riscv_first_stack_step (frame);
>   rtx insn;
> 
>   if (flag_stack_usage_info)
>@@ -5300,19 +5687,32 @@ riscv_expand_prologue (void)
>       REG_NOTES (insn) = dwarf;
>     }
> 
>+  if (size.is_constant ())
>+    step1 = MIN (size.to_constant (), step1);
>+  if (riscv_use_push_pop (frame, step1))
>+    {
>+      riscv_emit_push_insn (frame, step1);
>+
>+      step1 = MAX (step1 - frame->push_pop_sp_adjust, 0);
>+      size = MAX (size.to_constant() - frame->push_pop_sp_adjust, 0);
>+      frame->mask &= ~ ((TARGET_ZCMP && TARGET_RVE) ?
>+                 RISCV_ZCMPE_PUSH_POP_MASK
>+               : RISCV_ZCE_PUSH_POP_MASK);
>+    }
>+
>   /* Save the registers.  */
>   if ((frame->mask | frame->fmask) != 0)
>     {
>-      HOST_WIDE_INT step1 = riscv_first_stack_step (frame);
>-      if (size.is_constant ())
>-       step1 = MIN (size.to_constant(), step1);
>-
>-      insn = gen_add3_insn (stack_pointer_rtx,
>-                           stack_pointer_rtx,
>-                           GEN_INT (-step1));
>-      RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
>-      size -= step1;
>-      riscv_for_each_saved_reg (size, riscv_save_reg, false, false);
>+       if (step1 > 0)
>+       {
>+         insn = gen_add3_insn (stack_pointer_rtx,
>+                       stack_pointer_rtx,
>+                       GEN_INT (-step1));
>+         RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
>+         size -= step1;
>+       }
>+     riscv_for_each_saved_reg (size, riscv_save_reg,
>+        false /* bool epilogue */, false /* bool maybe_eh_return */);
>     }
> 
>   frame->mask = mask; /* Undo the above fib.  */
>@@ -5412,6 +5812,8 @@ riscv_expand_epilogue (int style)
>   rtx ra = gen_rtx_REG (Pmode, RETURN_ADDR_REGNUM);
>   rtx insn;
> 
>+  bool use_zcmp_pop = !use_restore_libcall && !(crtl->calls_eh_return);
>+
>   /* We need to add memory barrier to prevent read from deallocated stack.  */
>   bool need_barrier_p = known_ne (get_frame_size ()
>                                  + cfun->machine->frame.arg_pointer_offset, 0);
>@@ -5538,6 +5940,18 @@ riscv_expand_epilogue (int style)
>   if (use_restore_libcall)
>     frame->mask = 0; /* Temporarily fib that we need not save GPRs.  */
> 
>+  if (use_zcmp_pop && riscv_use_push_pop (frame, step2))
>+    {
>+      /* Emit a barrier to prevent loads from a deallocated stack.  */
>+      riscv_emit_stack_tie ();
>+      need_barrier_p = false;
>+      riscv_emit_pop_insn (frame, frame->total_size.to_constant(), step2);
>+      frame->mask &= ~ ((TARGET_ZCMP && TARGET_RVE) ?
>+                 RISCV_ZCMPE_PUSH_POP_MASK
>+               : RISCV_ZCE_PUSH_POP_MASK);
>+      step2 = 0;
>+    }
>+
>   /* Restore the registers.  */
>   riscv_for_each_saved_reg (frame->total_size - step2, riscv_restore_reg,
>                            true, style == EXCEPTION_RETURN);
>@@ -5552,6 +5966,9 @@ riscv_expand_epilogue (int style)
>   if (need_barrier_p)
>     riscv_emit_stack_tie ();
> 
>+  if (use_zcmp_pop)
>+    frame->mask = mask;
>+
>   /* Deallocate the final bit of the frame.  */
>   if (step2 > 0)
>     {
>diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
>index d05b1d59853..6e6e3ee2c25 100644
>--- a/gcc/config/riscv/riscv.h
>+++ b/gcc/config/riscv/riscv.h
>@@ -383,6 +383,7 @@ ASM_MISA_SPEC
> #define HARD_FRAME_POINTER_REGNUM 8
> #define STACK_POINTER_REGNUM 2
> #define THREAD_POINTER_REGNUM 4
>+#define RETURN_VALUE_REGNUM 10
> 
> /* These two registers don't really exist: they get eliminated to either
>    the stack or hard frame pointer.  */
>@@ -1097,4 +1098,7 @@ extern void riscv_remove_unneeded_save_restore_calls (void);
> #define DWARF_REG_TO_UNWIND_COLUMN(REGNO) \
>   ((REGNO == RISCV_DWARF_VLENB) ? (FIRST_PSEUDO_REGISTER + 1) : REGNO)
> 
>+#define RISCV_ZCE_PUSH_POP_MASK 0x0ffc0302u
>+#define RISCV_ZCMPE_PUSH_POP_MASK 0x302u
>+
> #endif /* ! GCC_RISCV_H */
>diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
>index bc384d9aedf..b9f2a426e48 100644
>--- a/gcc/config/riscv/riscv.md
>+++ b/gcc/config/riscv/riscv.md
>@@ -108,12 +108,14 @@
> 
> (define_constants
>   [(RETURN_ADDR_REGNUM         1)
>+   (SP_REGNUM                  2)
>    (GP_REGNUM                  3)
>    (TP_REGNUM                  4)
>    (T0_REGNUM                  5)
>    (T1_REGNUM                  6)
>    (S0_REGNUM                  8)
>    (S1_REGNUM                  9)
>+   (A0_REGNUM                  10)
>    (S2_REGNUM                  18)
>    (S3_REGNUM                  19)
>    (S4_REGNUM                  20)
>@@ -3147,3 +3149,4 @@
> (include "sifive-7.md")
> (include "thead.md")
> (include "vector.md")
>+(include "zc.md")
>diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
>index 6e326fc7e02..9ef522306a5 100644
>--- a/gcc/config/riscv/t-riscv
>+++ b/gcc/config/riscv/t-riscv
>@@ -90,6 +90,10 @@ riscv-v.o: $(srcdir)/config/riscv/riscv-v.cc \
>        $(COMPILE) $<
>        $(POSTCOMPILE)
> 
>+riscv-zcmp-popret.o: $(srcdir)/config/riscv/riscv-zcmp-popret.cc
>+       $(COMPILE) $<
>+       $(POSTCOMPILE)
>+
> thead.o: $(srcdir)/config/riscv/thead.cc \
>   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TARGET_H) backend.h $(RTL_H) \
>   memmodel.h $(EMIT_RTL_H) poly-int.h output.h
>diff --git a/gcc/config/riscv/zc.md b/gcc/config/riscv/zc.md
>new file mode 100644
>index 00000000000..3ad34dacd49
>--- /dev/null
>+++ b/gcc/config/riscv/zc.md
>@@ -0,0 +1,47 @@
>+;; Machine description for ZCE extension.
>+;; Copyright (C) 2021 Free Software Foundation, Inc.
>+
>+;; This file is part of GCC.
>+
>+;; GCC is free software; you can redistribute it and/or modify
>+;; it under the terms of the GNU General Public License as published by
>+;; the Free Software Foundation; either version 3, or (at your option)
>+;; any later version.
>+
>+;; GCC is distributed in the hope that it will be useful,
>+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
>+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>+;; GNU General Public License for more details.
>+
>+;; You should have received a copy of the GNU General Public License
>+;; along with GCC; see the file COPYING3.  If not see
>+;; <http://www.gnu.org/licenses/>.
>+
>+(define_insn "*stack_push<mode>"
>+  [(match_parallel 0 "riscv_stack_push_operation"
>+    [(set (reg:X SP_REGNUM) (plus:X (reg:X SP_REGNUM)
>+      (match_operand:X 1 "const_int_operand" "")))])]
>+  "TARGET_ZCMP"
>+  "cm.push\t{%L0},%1")
>+
>+(define_insn "*stack_pop<mode>"
>+  [(match_parallel 0 "riscv_stack_pop_operation"
>+    [(set (match_operand:X 1 "register_operand" "")
>+      (mem:X (plus:X (reg:X SP_REGNUM)
>+       (match_operand:X 2 "const_int_operand" ""))))])]
>+  "TARGET_ZCMP"
>+  {
>+    return riscv_output_popret_p (operands[0]) ?
>+       "cm.popret\t{%L0},%s0" :
>+       "cm.pop\t{%L0},%s0";
>+  })
>+
>+(define_insn "*stack_pop_with_return_value<mode>"
>+  [(match_parallel 0 "riscv_stack_pop_operation"
>+    [(set (reg:ANYI A0_REGNUM)
>+      (match_operand:ANYI 1 "pop_return_value_constant" ""))])]
>+  "TARGET_ZCMP"
>+  {
>+    gcc_assert (riscv_output_popret_p (operands[0]));
>+    return "cm.popretz\t{%L0},%s0";
>+  })
>-- 
>2.25.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 4/5] RISC-V: Add Zcmp extension supports.
  2023-04-06  6:21 [PATCH 0/5] RISC-V: Support ZC* extensions Jiawei
@ 2023-04-06  6:21 ` Jiawei
  2023-05-04  9:03   ` Kito Cheng
       [not found]   ` <07720619-dd69-4816-987e-ff0e14d9a348.>
  0 siblings, 2 replies; 9+ messages in thread
From: Jiawei @ 2023-04-06  6:21 UTC (permalink / raw)
  To: gcc-patches
  Cc: kito.cheng, palmer, christoph.muellner, jeremy.bennett,
	mary.bennett, nandni.jamnadas, charlie.keaney, simon.cook,
	tariq.kurd, ibrahim.abu.kharmeh1, sinan.lin, wuwei2016, shihua,
	shiyulong, chenyixuan, Jiawei

Add support for the Zcmp extension instructions. Generate push/pop
with the following steps:

  1. preprocessing:
    1.1. if there is no push rtx, then just return. e.g.
    (note 5 1 22 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
    (insn/f 22 5 23 2 (set (reg/f:SI 2 sp)
      (plus:SI (reg/f:SI 2 sp)
        (const_int -32 [0xffffffffffffffe0])))
    (nil))
    (note 23 22 2 2 NOTE_INSN_PROLOGUE_END)
    1.2. if push rtx exists, then we compute the number of
    pushed s-registers, n_sreg.

  push rtx should be found before NOTE_INSN_PROLOGUE_END tag

  [2 and 3 happen simultaneously]

  2. find valid move pattern, mv sN, aN, where N < n_sreg,
    and aN is not used before the move pattern, and sN is not
    defined before the move pattern (from prologue to the
    position of move pattern).

  3. analyze the use and reach of every instruction from prologue
    to the position of move pattern.
    if any sN is used, then we mark the corresponding argument list
    candidate as invalid.
    e.g.
        push  {ra,s0-s3}, {}, -32
        sw      s0,44(sp) # s0 is used, then argument list is invalid
        mv      a0,a5     # a0 is defined, then argument list is invalid
        ...
        mv      s0,a0
        mv      s1,a1
        mv      s2,a2

  4. if there is a valid argument list, then replace the pop
    push parallel insn, and delete mv pattern.
     if not, skip.
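
Steps 2 and 3 above can be sketched with a toy model. This is only an
illustration under stated assumptions, not the code in this patch: the
types RegKind, Reg and ToyInsn and the function find_valid_arg_moves
are hypothetical stand-ins for RTL, and the validity rule is one
reading of the steps (a move "mv sN, aN" stays valid only if sN was
neither used nor defined earlier and aN was not redefined earlier).

```cpp
#include <vector>

// Hypothetical toy register model: s-registers, a-registers, anything else.
enum class RegKind { S, A, Other };

struct Reg {
  RegKind kind;
  int n;  // register index, e.g. 2 for s2 or a2
};

// Hypothetical toy instruction: at most one defined register plus used ones.
struct ToyInsn {
  bool has_def;
  Reg def;
  std::vector<Reg> uses;
};

// Scan the instructions that follow the push and report, for each N < n_sreg,
// whether a move "mv sN, aN" was found before sN was touched or aN redefined.
std::vector<bool> find_valid_arg_moves(const std::vector<ToyInsn>& insns,
                                       int n_sreg) {
  std::vector<bool> valid(n_sreg, false);
  std::vector<bool> s_touched(n_sreg, false);  // sN used or defined so far
  std::vector<bool> a_defined(n_sreg, false);  // aN redefined so far

  for (const ToyInsn& insn : insns) {
    bool is_arg_move = insn.has_def && insn.def.kind == RegKind::S
        && insn.def.n < n_sreg && insn.uses.size() == 1
        && insn.uses[0].kind == RegKind::A
        && insn.uses[0].n == insn.def.n;
    if (is_arg_move && !s_touched[insn.def.n] && !a_defined[insn.def.n])
      valid[insn.def.n] = true;

    // Record the effects of this instruction for everything that follows.
    for (const Reg& r : insn.uses)
      if (r.kind == RegKind::S && r.n < n_sreg)
        s_touched[r.n] = true;
    if (insn.has_def && insn.def.n < n_sreg) {
      if (insn.def.kind == RegKind::S)
        s_touched[insn.def.n] = true;
      if (insn.def.kind == RegKind::A)
        a_defined[insn.def.n] = true;
    }
  }
  return valid;
}
```

With the example sequence above (sw s0, then mv a0,a5, then the three
moves) and n_sreg = 4, this model rejects the s0 move and accepts the
s1 and s2 moves.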

Every occurrence of "zcmpe" means Zcmp with the RVE extension.
The push/pop instruction implementation was mostly done by Sinan Lin.
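
For reference, the sp adjustment and register-list operand that the
push code in this patch computes can be modelled outside of GCC
roughly as below. This is a simplified sketch, not the patch itself:
the helper names are hypothetical, word_bytes stands in for
UNITS_PER_WORD, and n_regs counts ra plus the saved s-registers. The
rounding, the "+ 48" limit and the three list formats follow
riscv_push_pop_base_sp_adjust, riscv_emit_push_insn and
riscv_print_reglist.

```cpp
#include <algorithm>
#include <string>

// Round up to a multiple of 16, as the patch does with (x + 15) & ~0xf.
unsigned align16(unsigned x) { return (x + 15u) & ~0xfu; }

// Space needed for the saved registers alone, rounded to 16 bytes.
unsigned base_sp_adjust(unsigned n_regs, unsigned word_bytes) {
  return align16(n_regs * word_bytes);
}

// Adjustment carried by the push itself: the aligned frame size, capped at
// base + 48; anything beyond the cap needs a separate sp update.
unsigned push_sp_adjust(unsigned frame_size, unsigned n_regs,
                        unsigned word_bytes) {
  unsigned max_allowed = base_sp_adjust(n_regs, word_bytes) + 48u;
  return std::min(align16(frame_size), max_allowed);
}

// Register-list operand: {ra}, {ra,s0} or {ra,s0-sN}.
std::string reglist(unsigned n_regs) {
  if (n_regs > 2)
    return "ra,s0-s" + std::to_string(n_regs - 2);
  if (n_regs > 1)
    return "ra,s0";
  return "ra";
}
```

For example, with ra, s0 and s1 saved on rv32 (n_regs = 3,
word_bytes = 4), the base adjustment is 16 bytes, so a single push can
cover a frame of at most 64 bytes.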

Co-Authored by: Sinan Lin <sinan.lin@linux.alibaba.com>
Co-Authored by: Simon Cook <simon.cook@embecosm.com>
Co-Authored by: Shihua Liao <shihua@iscas.ac.cn>

gcc/ChangeLog:

        * config.gcc: New object.
        * config/riscv/predicates.md (riscv_stack_push_operation):
	  New predicate.
        (riscv_stack_pop_operation): Ditto.
        (pop_return_value_constant): Ditto.
        * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): New pass.
        * config/riscv/riscv-protos.h (riscv_output_popret_p):
	  New routine.
        (riscv_valid_stack_push_pop_p): Ditto.
        (riscv_check_regno): Ditto.
        (make_pass_zcmp_popret): New pass.
        * config/riscv/riscv.cc (struct riscv_frame_info): New variable.
        (riscv_output_popret_p): New function.
        (riscv_print_pop_size): Ditto.
        (riscv_print_reglist): Ditto.
        (riscv_print_operand): New case symbols.
        (riscv_save_push_pop_count): New function.
        (riscv_push_pop_base_sp_adjust): Ditto.
        (riscv_use_push_pop): Ditto.
        (riscv_compute_frame_info): Adjust frame value.
        (riscv_emit_pop_insn): New function.
        (riscv_check_regno): Ditto.
        (riscv_valid_stack_push_pop_p): Ditto.
        (riscv_emit_push_insn): Ditto.
        (riscv_expand_prologue): Modify frame pattern.
        (riscv_expand_epilogue): Ditto.
        * config/riscv/riscv.h (RETURN_VALUE_REGNUM): New macro.
        (RISCV_ZCE_PUSH_POP_MASK): New mask.
        (RISCV_ZCMPE_PUSH_POP_MASK): Ditto.
        * config/riscv/riscv.md: Add new reg number and include info.
        * config/riscv/t-riscv: New object rules.
        * config/riscv/riscv-zcmp-popret.cc: New file.
        * config/riscv/zc.md: New file.
---
 gcc/config.gcc                        |   2 +-
 gcc/config/riscv/predicates.md        |  16 +
 gcc/config/riscv/riscv-passes.def     |   1 +
 gcc/config/riscv/riscv-protos.h       |   4 +
 gcc/config/riscv/riscv-zcmp-popret.cc | 260 +++++++++++++++
 gcc/config/riscv/riscv.cc             | 437 +++++++++++++++++++++++++-
 gcc/config/riscv/riscv.h              |   4 +
 gcc/config/riscv/riscv.md             |   3 +
 gcc/config/riscv/t-riscv              |   4 +
 gcc/config/riscv/zc.md                |  47 +++
 10 files changed, 767 insertions(+), 11 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-zcmp-popret.cc
 create mode 100644 gcc/config/riscv/zc.md

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 629d324b5ef..a991c5273f9 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -529,7 +529,7 @@ pru-*-*)
 	;;
 riscv*)
 	cpu_type=riscv
-	extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o"
+	extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o riscv-zcmp-popret.o"
 	extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
 	extra_objs="${extra_objs} thead.o"
 	d_target_objs="riscv-d.o"
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 0d9d7701c7e..6bff6cd047a 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -412,3 +412,19 @@
   (and (match_code "const_int")
        (ior (match_operand 0 "not_uimm_extra_bit_operand")
 	    (match_operand 0 "const_nottwobits_operand"))))
+
+(define_special_predicate "riscv_stack_push_operation"
+  (match_code "parallel")
+{
+  return riscv_valid_stack_push_pop_p (op, true);
+})
+
+(define_special_predicate "riscv_stack_pop_operation"
+  (match_code "parallel")
+{
+  return riscv_valid_stack_push_pop_p (op, false);
+})
+
+(define_predicate "pop_return_value_constant"
+  (and (match_code "const_int")
+       (match_test "INTVAL (op) == 0")))
diff --git a/gcc/config/riscv/riscv-passes.def b/gcc/config/riscv/riscv-passes.def
index 4084122cf0a..25625b9af3e 100644
--- a/gcc/config/riscv/riscv-passes.def
+++ b/gcc/config/riscv/riscv-passes.def
@@ -19,3 +19,4 @@
 
 INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
 INSERT_PASS_BEFORE (pass_fast_rtl_dce, 1, pass_vsetvl);
+INSERT_PASS_AFTER (pass_cprop_hardreg, 1, pass_zcmp_popret);
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 4611447ddde..8f243cd5f44 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -54,6 +54,7 @@ extern bool riscv_split_64bit_move_p (rtx, rtx);
 extern void riscv_split_doubleword_move (rtx, rtx);
 extern const char *riscv_output_move (rtx, rtx);
 extern const char *riscv_output_return ();
+extern bool riscv_output_popret_p (rtx);
 
 #ifdef RTX_CODE
 extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx);
@@ -79,6 +80,8 @@ extern void riscv_reinit (void);
 extern poly_uint64 riscv_regmode_natural_size (machine_mode);
 extern bool riscv_v_ext_vector_mode_p (machine_mode);
 extern bool riscv_shamt_matches_mask_p (int, HOST_WIDE_INT);
+extern bool riscv_valid_stack_push_pop_p (rtx, bool);
+extern bool riscv_check_regno (rtx, unsigned);
 
 /* Routines implemented in riscv-c.cc.  */
 void riscv_cpu_cpp_builtins (cpp_reader *);
@@ -99,6 +102,7 @@ extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
 
 rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
 rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
+rtl_opt_pass * make_pass_zcmp_popret (gcc::context *ctxt);
 
 /* Information about one CPU we know about.  */
 struct riscv_cpu_info {
diff --git a/gcc/config/riscv/riscv-zcmp-popret.cc b/gcc/config/riscv/riscv-zcmp-popret.cc
new file mode 100644
index 00000000000..d7b40f6a3e2
--- /dev/null
+++ b/gcc/config/riscv/riscv-zcmp-popret.cc
@@ -0,0 +1,260 @@
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "backend.h"
+#include "regs.h"
+#include "target.h"
+#include "memmodel.h"
+#include "emit-rtl.h"
+#include "df.h"
+#include "predict.h"
+#include "tree-pass.h"
+#include "tree.h"
+#include "tm_p.h"
+#include "optabs.h"
+#include "recog.h"
+#include "cfgrtl.h"
+
+#define IN_TARGET_CODE 1
+
+namespace {
+
+/*
+  1. preprocessing:
+    1.1. if there is no push rtx, then just return. e.g.
+    (note 5 1 22 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
+    (insn/f 22 5 23 2 (set (reg/f:SI 2 sp)
+      (plus:SI (reg/f:SI 2 sp)
+	(const_int -32 [0xffffffffffffffe0])))
+    (nil))
+    (note 23 22 2 2 NOTE_INSN_PROLOGUE_END)
+    1.2. if push rtx exists, then we compute the number of
+    pushed s-registers, n_sreg.
+
+  push rtx should be found before the NOTE_INSN_PROLOGUE_END tag
+
+  [2 and 3 happen simultaneously]
+  2. find a valid move pattern, mv sN, aN, where N < n_sreg,
+    and aN is not used before the move pattern, and sN is not
+    defined before the move pattern (from prologue to the
+    position of move pattern).
+  3. analyze the use and reach of every instruction from prologue
+    to the position of move pattern.
+    if any sN is used, then we mark the corresponding argument list
+    candidate as invalid.
+    e.g.
+	push  {ra,s0-s3}, {}, -32
+	sw	s0,44(sp) # s0 is used, then argument list is invalid
+	mv	a0,a5     # a0 is defined, then argument list is invalid
+	...
+	mv	s0,a0
+	mv	s1,a1
+	mv	s2,a2
+
+  4. if there is a valid argument list, then replace the pop
+    push parallel insn, and delete mv pattern.
+     if not, skip.
+*/
+
+static void
+emit_zcmp_popret (rtx_insn *pop_rtx,
+		  rtx_insn **candidates,
+		  basic_block bb)
+{
+  bool gen_popretz_p = candidates [0];
+  bool gen_popret_p = candidates [2];
+
+  if (!(gen_popret_p || gen_popretz_p))
+    return;
+
+  gcc_assert ((gen_popret_p && !gen_popretz_p)
+      || (gen_popretz_p && gen_popret_p));
+
+  rtx pop_pat = PATTERN (pop_rtx);
+  unsigned pop_idx = 0, popret_idx = 0;
+  unsigned n_pop_par = XVECLEN (pop_pat, 0);
+  unsigned n_popret_par = n_pop_par
+	+ (gen_popretz_p ? 2 : 0)
+	+ (gen_popret_p ? 2 : 0);
+
+  rtx popret_par = gen_rtx_PARALLEL (VOIDmode,
+	  rtvec_alloc (n_popret_par));
+
+  /* return zero pattern */
+  if (gen_popretz_p)
+    {
+      XVECEXP (popret_par, 0, 0) = PATTERN (candidates[0]);
+      XVECEXP (popret_par, 0, 1) = PATTERN (candidates[1]);
+      popret_idx += 2;
+      delete_insn (candidates[0]);
+      delete_insn (candidates[1]);
+    }
+
+  /* Copy the pop sequence.  */
+  for (; pop_idx < n_pop_par;
+      pop_idx ++, popret_idx ++)
+    {
+      XVECEXP (popret_par, 0, popret_idx) =
+	  XVECEXP (pop_pat, 0, pop_idx);
+    }
+
+  /* ret pattern.  */
+  rtx ret_pat = PATTERN (candidates[2]);
+  gcc_assert (GET_CODE (ret_pat) == PARALLEL);
+
+  for (int i = 0; i < XVECLEN (ret_pat, 0);
+      i++, popret_idx++)
+  {
+    XVECEXP (popret_par, 0, popret_idx) =
+	XVECEXP (ret_pat, 0, i);
+  }
+
+  rtx_insn *insn = emit_jump_insn_after (
+	  popret_par,
+	  BB_END (bb));
+  JUMP_LABEL (insn) = simple_return_rtx;
+
+  REG_NOTES (insn) = REG_NOTES (pop_rtx);
+  RTX_FRAME_RELATED_P (insn) = 1;
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "new insn:\n");
+      print_rtl (dump_file, insn);
+    }
+
+  delete_insn (candidates [2]);
+  delete_insn (pop_rtx);
+}
+
+static void
+zcmp_popret (void)
+{
+  basic_block bb;
+  rtx_insn *insn = NULL, *pop_rtx = NULL;
+  rtx_insn *pop_candidates[3] = {NULL, };
+  /*
+    find NOTE_INSN_EPILOGUE_BEG, but pop_rtx not found => return
+    find NOTE_INSN_EPILOGUE_BEG, and pop_rtx is found => looking for a0
+  */
+
+  FOR_EACH_BB_REVERSE_FN (bb, cfun)
+  {
+    FOR_BB_INSNS_REVERSE (bb, insn)
+      {
+	if (!pop_rtx
+	    && NOTE_P (insn)
+	    && NOTE_KIND (insn) == NOTE_INSN_EPILOGUE_BEG)
+	  return;
+
+	if (NOTE_P (insn)
+	    && NOTE_KIND (insn) == NOTE_INSN_FUNCTION_BEG)
+	  {
+	    if (pop_rtx)
+	      emit_zcmp_popret (pop_rtx, pop_candidates, bb);
+	    return;
+	  }
+
+	if (!(NONDEBUG_INSN_P (insn)
+	    || CALL_P (insn)))
+	  continue;
+
+	rtx pop_pat = PATTERN (insn);
+
+	if (GET_CODE (pop_pat) == PARALLEL
+	    && riscv_valid_stack_push_pop_p (pop_pat, false))
+	  {
+	    pop_rtx = insn;
+	    continue;
+	  }
+
+	/* pattern for `ret`.  */
+	if (JUMP_P (insn)
+	    && GET_CODE (pop_pat) == PARALLEL
+	    && XVECLEN (pop_pat, 0) == 2
+	    && GET_CODE (XVECEXP (pop_pat, 0, 0)) == SIMPLE_RETURN
+	    && GET_CODE (XVECEXP (pop_pat, 0, 1)) == USE)
+	  {
+	    rtx use_reg = XEXP (XVECEXP (pop_pat, 0, 1), 0);
+	    if (REG_P (use_reg)
+	      && REGNO (use_reg) == RETURN_ADDR_REGNUM)
+	      {
+		pop_candidates [2] = insn;
+		continue;
+	      }
+	  }
+
+	if (!pop_rtx)
+	  continue;
+
+	/* pattern for return value.  */
+	if (!pop_candidates [0]
+	    && GET_CODE (pop_pat) == USE)
+	  {
+	    rtx_insn *set_insn = PREV_INSN (insn);
+	    rtx pat_set = set_insn && INSN_P (set_insn)
+			  ? PATTERN (set_insn) : NULL_RTX;
+
+	    if (riscv_check_regno (XEXP (pop_pat, 0),
+		    RETURN_VALUE_REGNUM)
+		&& pat_set != NULL
+		&& GET_CODE (pat_set) == SET
+		&& riscv_check_regno (SET_DEST (pat_set),
+		       RETURN_VALUE_REGNUM)
+		&& CONST_INT_P (SET_SRC (pat_set))
+		&& INTVAL (SET_SRC (pat_set)) == 0)
+	      {
+		pop_candidates [0] = set_insn;
+		pop_candidates [1] = insn;
+		break;
+	      }
+	  }
+      }
+
+    if (pop_rtx)
+      {
+	emit_zcmp_popret (pop_rtx, pop_candidates, bb);
+	return;
+      }
+  }
+}
+
+const pass_data pass_data_zcmp_popret =
+{
+  RTL_PASS, /* type */
+  "zcmp-popret", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_zcmp_popret : public rtl_opt_pass
+{
+public:
+  pass_zcmp_popret (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_zcmp_popret, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+    { return TARGET_ZCMP; }
+  virtual unsigned int execute (function *)
+    {
+      zcmp_popret ();
+      return 0;
+    }
+}; // class pass_zcmp_popret
+
+} // anon namespace
+
+rtl_opt_pass *
+make_pass_zcmp_popret (gcc::context *ctxt)
+{
+  return new pass_zcmp_popret (ctxt);
+}
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 5f8cbfc15ed..17df2f3f8cf 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -114,6 +114,9 @@ struct GTY(())  riscv_frame_info {
   /* Likewise FPR X.  */
   unsigned int fmask;
 
+  /* How much the push/pop routines adjust sp (or 0 if unused).  */
+  unsigned push_pop_sp_adjust;
+
   /* How much the GPR save/restore routines adjust sp (or 0 if unused).  */
   unsigned save_libcall_adjustment;
 
@@ -401,6 +404,20 @@ static const unsigned gpr_save_reg_order[] = {
   S10_REGNUM, S11_REGNUM
 };
 
+/* Order for the CLOBBERs/USEs of push/pop.  */
+static const unsigned push_save_reg_order[] = {
+  INVALID_REGNUM, RETURN_ADDR_REGNUM, S0_REGNUM,
+  S1_REGNUM, S2_REGNUM, S3_REGNUM, S4_REGNUM,
+  S5_REGNUM, S6_REGNUM, S7_REGNUM, S8_REGNUM,
+  S9_REGNUM, S10_REGNUM, S11_REGNUM
+};
+
+/* Order for the CLOBBERs/USEs of push/pop in rve.  */
+static const unsigned push_save_reg_order_zcmpe[] = {
+  INVALID_REGNUM, RETURN_ADDR_REGNUM, S0_REGNUM,
+  S1_REGNUM
+};
+
 /* A table describing all the processors GCC knows about.  */
 static const struct riscv_tune_info riscv_tune_info_table[] = {
 #define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO)	\
@@ -2989,6 +3006,17 @@ riscv_output_return ()
   return "ret";
 }
 
+bool
+riscv_output_popret_p (rtx op)
+{
+  unsigned n_rtx = XVECLEN (op, 0);
+  rtx use = XVECEXP (op, 0, n_rtx - 1);
+  rtx ret = XVECEXP (op, 0, n_rtx - 2);
+
+  return GET_CODE (ret) == SIMPLE_RETURN
+	 && GET_CODE (use) == USE;
+}
+
 \f
 /* Return true if CMP1 is a suitable second operand for integer ordering
    test CODE.  See also the *sCC patterns in riscv.md.  */
@@ -4306,6 +4334,74 @@ riscv_memmodel_needs_amo_acquire (enum memmodel model)
     }
 }
 
+/* Print the sp adjustment field of a pop instruction.  */
+
+static void
+riscv_print_pop_size (FILE *file, rtx op)
+{
+  unsigned sp_adjust_idx = XVECLEN (op, 0) - 1;
+  rtx sp_adjust_rtx = XVECEXP (op, 0, sp_adjust_idx);
+
+  /* Skip the trailing ret and use patterns.  */
+  while (GET_CODE (sp_adjust_rtx) != SET)
+    sp_adjust_rtx = XVECEXP (op, 0, --sp_adjust_idx);
+
+  rtx elt_plus = SET_SRC (sp_adjust_rtx);
+  fprintf (file, HOST_WIDE_INT_PRINT_DEC, INTVAL (XEXP (elt_plus, 1)));
+}
+
+/* Print push/pop register list. */
+
+static void
+riscv_print_reglist (FILE *file, rtx op)
+{
+  /* we only deal with three formats:
+      push {ra}
+      push {ra, s0}
+      push {ra, s0-sN}
+    or
+      pop {ra}
+      pop {ra, s0}
+      pop {ra, s0-sN}
+    Registers other than ra must be contiguous s-registers;
+    this is assumed to have been checked beforehand.
+    register list patterns in push:
+    (set/f (mem/c:SI
+      (plus:SI (reg/f:SI 2 sp)
+	(const_int 28 [0x1c])) [2  S4 A32])
+      (reg:SI 1 ra))
+    register list patterns in pop:
+    (set/f (reg:DI 1 ra)
+      (mem/c:DI (plus:DI (reg/f:DI 2 sp)
+	(const_int 8 [0x8])) [2  S8 A64]))
+  */
+  int total_count = XVECLEN (op, 0);
+  int n_regs = 0;
+  bool push_p = GET_CODE (XVECEXP (op, 0, 0)) == SET
+      && GET_CODE (SET_SRC (XVECEXP (op, 0, 0))) == PLUS;
+
+  for (int idx = 0; idx < total_count; ++idx)
+    {
+      rtx ele = XVECEXP (op, 0, idx);
+      if (GET_CODE (ele) != SET)
+	continue;
+
+      bool restore_save_p = push_p ?
+	  MEM_P (SET_DEST (ele)) :
+	  MEM_P (SET_SRC (ele));
+
+      if (restore_save_p)
+	n_regs++;
+    }
+
+  if (n_regs > 2)
+    fprintf (file, "ra,s0-s%d", n_regs - 2);
+  else if (n_regs > 1)
+    fprintf (file, "ra,s0");
+  else
+    fputs ("ra", file);
+}
+
 /* Return true if a FENCE should be emitted to before a memory access to
    implement the release portion of memory model MODEL.  */
 
@@ -4517,6 +4613,14 @@ riscv_print_operand (FILE *file, rtx op, int letter)
       fputs (GET_RTX_NAME (code), file);
       break;
 
+    case 'L':
+      riscv_print_reglist (file, op);
+      break;
+
+    case 's':
+      riscv_print_pop_size (file, op);
+      break;
+
     case 'S':
       {
 	rtx newop = GEN_INT (ctz_hwi (INTVAL (op)));
@@ -4777,6 +4881,66 @@ riscv_use_save_libcall (const struct riscv_frame_info *frame)
   return frame->save_libcall_adjustment != 0;
 }
 
+/* Determine how many elements the push/pop parallel needs.  */
+
+static unsigned
+riscv_save_push_pop_count (unsigned mask)
+{
+  if (!BITSET_P (mask, GP_REG_FIRST + RETURN_ADDR_REGNUM))
+    return 0;
+  for (unsigned n = GP_REG_LAST; n > GP_REG_FIRST; n--)
+    if (BITSET_P (mask, n)
+	&& !call_used_regs [n])
+      /* add ra saving and sp adjust. */
+      return CALLEE_SAVED_REG_NUMBER (n) + 1 + 2;
+  abort ();
+}
+
+/* Calculate the base (minimum) sp adjustment of a push/pop instruction.  */
+
+static unsigned
+riscv_push_pop_base_sp_adjust (unsigned mask)
+{
+  unsigned n_regs = riscv_save_push_pop_count (mask) - 1;
+  return (n_regs * UNITS_PER_WORD + 15) & (~0xf);
+}
+
+/* Determine whether to call push/pop routines.  */
+
+static bool
+riscv_use_push_pop (const struct riscv_frame_info *frame, const HOST_WIDE_INT frame_size)
+{
+  if (!TARGET_ZCMP)
+    return false;
+
+  /* We do not handle variable argument cases currently.  */
+  if (cfun->machine->varargs_size != 0)
+    return false;
+
+  HOST_WIDE_INT base_size = riscv_push_pop_base_sp_adjust (frame->mask);
+  /*
+     Pr 960215-1.c in rv64 outputs
+
+	addi	sp,sp,-32
+	sd	ra,24(sp)
+	sd	s0,16(sp)
+	sd	s2,8(sp)
+	sd	s3,0(sp)
+     This is a rare case where the callee-saved registers are non-contiguous,
+     which breaks the old push implementation, so we just reject this case
+     like save-restore does now.
+  */
+  if (base_size > frame_size)
+    return false;
+
+  /* {ra,s0-s10} is invalid. */
+  if (frame->mask & (1 << (S10_REGNUM - GP_REG_FIRST))
+      && !(frame->mask & (1 << (S11_REGNUM - GP_REG_FIRST))))
+    return false;
+
+  return frame->mask & (1 << (RETURN_ADDR_REGNUM - GP_REG_FIRST));
+}
+
 /* Determine which GPR save/restore routine to call.  */
 
 static unsigned
@@ -4934,6 +5098,8 @@ riscv_compute_frame_info (void)
   /* Only use save/restore routines when the GPRs are atop the frame.  */
   if (known_ne (frame->hard_frame_pointer_offset, frame->total_size))
     frame->save_libcall_adjustment = 0;
+
+  frame->push_pop_sp_adjust = 0;
 }
 
 /* Make sure that we're not trying to eliminate to the wrong hard frame
@@ -5171,6 +5337,86 @@ riscv_for_each_saved_reg (poly_int64 sp_offset, riscv_save_restore_fn fn,
       }
 }
 
+static void
+riscv_emit_pop_insn (struct riscv_frame_info *frame, HOST_WIDE_INT offset, HOST_WIDE_INT size)
+{
+  unsigned int veclen = riscv_save_push_pop_count (frame->mask);
+  unsigned int n_reg = veclen - 1;
+  rtvec vec = rtvec_alloc (veclen);
+  HOST_WIDE_INT sp_adjust;
+  rtx dwarf = NULL_RTX;
+
+  const unsigned *reg_order = (TARGET_ZCMP && TARGET_RVE)
+	? push_save_reg_order_zcmpe
+	: push_save_reg_order;
+
+  gcc_assert (n_reg >= 1
+	&& TARGET_ZCMP
+	&& ((TARGET_RVE && (n_reg <= ARRAY_SIZE (push_save_reg_order_zcmpe)))
+	    || (TARGET_ZCMP && (n_reg <= ARRAY_SIZE (push_save_reg_order)))));
+
+  /* sp adjust pattern */
+  int max_allow_sp_adjust = riscv_push_pop_base_sp_adjust (frame->mask) + 48;
+  int aligned_size = size;
+
+  /* if sp adjustment is too large, we should split it first. */
+  if (aligned_size > max_allow_sp_adjust)
+    {
+      rtx dwarf_pre_sp_adjust = NULL_RTX;
+      rtx pre_adjust_rtx = gen_add3_insn (stack_pointer_rtx,
+			stack_pointer_rtx,
+			GEN_INT (aligned_size - max_allow_sp_adjust));
+      rtx insn = emit_insn (pre_adjust_rtx);
+
+      rtx cfa_pre_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
+			GEN_INT (aligned_size - max_allow_sp_adjust));
+      dwarf_pre_sp_adjust = alloc_reg_note (REG_CFA_DEF_CFA,
+		cfa_pre_adjust_rtx,
+		dwarf_pre_sp_adjust);
+
+      RTX_FRAME_RELATED_P (insn) = 1;
+      REG_NOTES (insn) = dwarf_pre_sp_adjust;
+
+      sp_adjust = max_allow_sp_adjust;
+    }
+  else
+    sp_adjust = (aligned_size + 15) & (~0xf);
+
+  /* Register restore sequence.  */
+  for (unsigned i = 1; i < veclen; ++i)
+    {
+      offset -= UNITS_PER_WORD;
+      unsigned regno = reg_order[i];
+      rtx reg = gen_rtx_REG (Pmode, regno);
+      rtx mem = gen_frame_mem (Pmode, plus_constant (Pmode,
+	      stack_pointer_rtx,
+	      offset));
+      rtx set = gen_rtx_SET (reg, mem);
+      RTVEC_ELT (vec, i - 1) = set;
+      RTX_FRAME_RELATED_P (set) = 1;
+      dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
+    }
+
+  /* sp adjust pattern */
+  rtx adjust_sp_rtx
+      = gen_rtx_SET (stack_pointer_rtx,
+	    plus_constant (Pmode,
+		stack_pointer_rtx,
+		sp_adjust));
+  RTVEC_ELT (vec, veclen - 1) = adjust_sp_rtx;
+
+  rtx cfa_adjust_rtx = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
+	const0_rtx);
+  dwarf = alloc_reg_note (REG_CFA_DEF_CFA, cfa_adjust_rtx, dwarf);
+
+  frame->gp_sp_offset -= (veclen - 1) * UNITS_PER_WORD;
+  frame->push_pop_sp_adjust = sp_adjust;
+
+  rtx insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, vec));
+  RTX_FRAME_RELATED_P (insn) = 1;
+  REG_NOTES (insn) = dwarf;
+}
+
 /* For stack frames that can't be allocated with a single ADDI instruction,
    compute the best value to initially allocate.  It must at a minimum
    allocate enough space to spill the callee-saved registers.  If TARGET_RVC,
@@ -5270,6 +5516,146 @@ riscv_emit_stack_tie (void)
     emit_insn (gen_stack_tiedi (stack_pointer_rtx, hard_frame_pointer_rtx));
 }
 
+bool
+riscv_check_regno (rtx pat, unsigned regno)
+{
+  return REG_P (pat)
+      && REGNO (pat) == regno;
+}
+
+/* Function to check whether the OP is a valid stack push/pop operation.
+   This part is borrowed from nds32 nds32_valid_stack_push_pop_p */
+
+bool
+riscv_valid_stack_push_pop_p (rtx op, bool push_p)
+{
+  int index;
+  int total_count;
+  int sp_adjust_rtx_index;
+  rtx elt;
+  rtx elt_reg;
+  rtx elt_plus;
+
+  if (!TARGET_ZCMP)
+    return false;
+
+  total_count = XVECLEN (op, 0);
+  sp_adjust_rtx_index = push_p ? 0 : total_count - 1;
+
+  /* At least sp + one callee save/restore register rtx */
+  if (total_count < 2)
+    return false;
+
+  /* Quickly check that every element is a 'set'; for pop, the
+     parallel may also contain the `ret` and `ret value` patterns.  */
+  for (index = 0; index < total_count; index++)
+    {
+      elt = XVECEXP (op, 0, index);
+
+      /* skip pop return value rtx */
+      if (!push_p && GET_CODE (elt) == SET
+	  && riscv_check_regno (SET_DEST (elt), RETURN_VALUE_REGNUM)
+	  && total_count >= 4
+	  && index + 1 < total_count
+	  && GET_CODE (XVECEXP (op, 0, index + 1)) == USE)
+	{
+	  rtx use_reg = XEXP (XVECEXP (op, 0, index + 1), 0);
+
+	  if (!riscv_check_regno (use_reg, RETURN_VALUE_REGNUM))
+	    return false;
+
+	  index += 1;
+	  continue;
+	}
+
+      /* skip ret rtx */
+      if (!push_p && GET_CODE (elt) == SIMPLE_RETURN
+	  && total_count >= 4
+	  && index + 1 < total_count
+	  && GET_CODE (XVECEXP (op, 0, index + 1)) == USE)
+	{
+	  rtx use_reg = XEXP (XVECEXP (op, 0, index + 1), 0);
+
+	  if (!riscv_check_regno (use_reg, RETURN_ADDR_REGNUM))
+	    return false;
+
+	  index += 1;
+	  sp_adjust_rtx_index -= 2;
+	  continue;
+	}
+
+      if (GET_CODE (elt) != SET)
+	return false;
+    }
+
+  elt = XVECEXP (op, 0, sp_adjust_rtx_index);
+  elt_reg  = SET_DEST (elt);
+  elt_plus = SET_SRC (elt);
+
+  /* Check this is (set (stack_reg) (plus stack_reg const)) pattern.  */
+  if (GET_CODE (elt_plus) != PLUS
+      || !riscv_check_regno (elt_reg, STACK_POINTER_REGNUM))
+    return false;
+
+  /* All tests passed; this is a valid rtx.  */
+  return true;
+}
+
+/* Generate the push rtx.  */
+
+static void
+riscv_emit_push_insn (struct riscv_frame_info *frame, HOST_WIDE_INT size)
+{
+  unsigned int veclen = riscv_save_push_pop_count (frame->mask);
+  unsigned int n_reg = veclen - 1;
+  rtvec vec = rtvec_alloc (veclen);
+
+  const unsigned *reg_order = (TARGET_ZCMP && TARGET_RVE)
+	? push_save_reg_order_zcmpe
+	: push_save_reg_order;
+
+  int aligned_size = (size + 15) & (~0xf);
+
+  gcc_assert (n_reg >= 1
+	&& TARGET_ZCMP
+	&& ((TARGET_RVE && (n_reg <= ARRAY_SIZE (push_save_reg_order_zcmpe)))
+	    || (TARGET_ZCMP && (n_reg <= ARRAY_SIZE (push_save_reg_order)))));
+
+  /* sp adjust pattern */
+  int max_allow_sp_adjust = riscv_push_pop_base_sp_adjust (frame->mask) + 48;
+  int sp_adjust = aligned_size > max_allow_sp_adjust ?
+      max_allow_sp_adjust
+      : aligned_size;
+
+  /* TODO: move this part to frame computation function.  */
+  frame->gp_sp_offset = (veclen - 1) * UNITS_PER_WORD;
+  frame->push_pop_sp_adjust = sp_adjust;
+
+  rtx adjust_sp_rtx
+      = gen_rtx_SET (stack_pointer_rtx,
+	    plus_constant (Pmode,
+	    stack_pointer_rtx,
+	    -sp_adjust));
+  RTVEC_ELT (vec, 0) = adjust_sp_rtx;
+
+  /* Register save sequence. */
+  for (unsigned i = 1; i < veclen; ++i)
+    {
+      sp_adjust -= UNITS_PER_WORD;
+      unsigned regno = reg_order[i];
+      rtx reg = gen_rtx_REG (Pmode, regno);
+      rtx mem = gen_frame_mem (Pmode, plus_constant (Pmode,
+	      stack_pointer_rtx,
+	      sp_adjust));
+      rtx set = gen_rtx_SET (mem, reg);
+      RTVEC_ELT (vec, i) = set;
+      RTX_FRAME_RELATED_P (set) = 1;
+    }
+
+  rtx insn = emit_insn (gen_rtx_PARALLEL (VOIDmode, vec));
+  RTX_FRAME_RELATED_P (insn) = 1;
+}
+
 /* Expand the "prologue" pattern.  */
 
 void
@@ -5278,6 +5664,7 @@ riscv_expand_prologue (void)
   struct riscv_frame_info *frame = &cfun->machine->frame;
   poly_int64 size = frame->total_size;
   unsigned mask = frame->mask;
+  HOST_WIDE_INT step1 = riscv_first_stack_step (frame);
   rtx insn;
 
   if (flag_stack_usage_info)
@@ -5300,19 +5687,32 @@ riscv_expand_prologue (void)
       REG_NOTES (insn) = dwarf;
     }
 
+  if (size.is_constant ())
+    step1 = MIN (size.to_constant(), step1);
+  if (riscv_use_push_pop (frame, step1))
+    {
+      riscv_emit_push_insn (frame, step1);
+
+      step1 = MAX (step1 - frame->push_pop_sp_adjust, 0);
+      size = MAX (size.to_constant() - frame->push_pop_sp_adjust, 0);
+      frame->mask &= ~ ((TARGET_ZCMP && TARGET_RVE) ?
+		  RISCV_ZCMPE_PUSH_POP_MASK
+		: RISCV_ZCE_PUSH_POP_MASK);
+    }
+
   /* Save the registers.  */
   if ((frame->mask | frame->fmask) != 0)
     {
-      HOST_WIDE_INT step1 = riscv_first_stack_step (frame);
-      if (size.is_constant ())
-	step1 = MIN (size.to_constant(), step1);
-
-      insn = gen_add3_insn (stack_pointer_rtx,
-			    stack_pointer_rtx,
-			    GEN_INT (-step1));
-      RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
-      size -= step1;
-      riscv_for_each_saved_reg (size, riscv_save_reg, false, false);
+      if (step1 > 0)
+	{
+	  insn = gen_add3_insn (stack_pointer_rtx,
+				stack_pointer_rtx,
+				GEN_INT (-step1));
+	  RTX_FRAME_RELATED_P (emit_insn (insn)) = 1;
+	  size -= step1;
+	}
+      riscv_for_each_saved_reg (size, riscv_save_reg,
+	 false /* bool epilogue */, false /* bool maybe_eh_return */);
     }
 
   frame->mask = mask; /* Undo the above fib.  */
@@ -5412,6 +5812,8 @@ riscv_expand_epilogue (int style)
   rtx ra = gen_rtx_REG (Pmode, RETURN_ADDR_REGNUM);
   rtx insn;
 
+  bool use_zcmp_pop = !use_restore_libcall && !(crtl->calls_eh_return);
+
   /* We need to add memory barrier to prevent read from deallocated stack.  */
   bool need_barrier_p = known_ne (get_frame_size ()
 				  + cfun->machine->frame.arg_pointer_offset, 0);
@@ -5538,6 +5940,18 @@ riscv_expand_epilogue (int style)
   if (use_restore_libcall)
     frame->mask = 0; /* Temporarily fib that we need not save GPRs.  */
 
+  if (use_zcmp_pop && riscv_use_push_pop (frame, step2))
+    {
+      /* Emit a barrier to prevent loads from a deallocated stack.  */
+      riscv_emit_stack_tie ();
+      need_barrier_p = false;
+      riscv_emit_pop_insn (frame, frame->total_size.to_constant(), step2);
+      frame->mask &= ~ ((TARGET_ZCMP && TARGET_RVE) ?
+		  RISCV_ZCMPE_PUSH_POP_MASK
+		: RISCV_ZCE_PUSH_POP_MASK);
+      step2 = 0;
+    }
+
   /* Restore the registers.  */
   riscv_for_each_saved_reg (frame->total_size - step2, riscv_restore_reg,
 			    true, style == EXCEPTION_RETURN);
@@ -5552,6 +5966,9 @@ riscv_expand_epilogue (int style)
   if (need_barrier_p)
     riscv_emit_stack_tie ();
 
+  if (use_zcmp_pop)
+    frame->mask = mask;
+
   /* Deallocate the final bit of the frame.  */
   if (step2 > 0)
     {
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index d05b1d59853..6e6e3ee2c25 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -383,6 +383,7 @@ ASM_MISA_SPEC
 #define HARD_FRAME_POINTER_REGNUM 8
 #define STACK_POINTER_REGNUM 2
 #define THREAD_POINTER_REGNUM 4
+#define RETURN_VALUE_REGNUM 10
 
 /* These two registers don't really exist: they get eliminated to either
    the stack or hard frame pointer.  */
@@ -1097,4 +1098,7 @@ extern void riscv_remove_unneeded_save_restore_calls (void);
 #define DWARF_REG_TO_UNWIND_COLUMN(REGNO) \
   ((REGNO == RISCV_DWARF_VLENB) ? (FIRST_PSEUDO_REGISTER + 1) : REGNO)
 
+#define RISCV_ZCE_PUSH_POP_MASK 0x0ffc0302u
+#define RISCV_ZCMPE_PUSH_POP_MASK 0x302u
+
 #endif /* ! GCC_RISCV_H */
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index bc384d9aedf..b9f2a426e48 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -108,12 +108,14 @@
 
 (define_constants
   [(RETURN_ADDR_REGNUM		1)
+   (SP_REGNUM 			2)
    (GP_REGNUM 			3)
    (TP_REGNUM			4)
    (T0_REGNUM			5)
    (T1_REGNUM			6)
    (S0_REGNUM			8)
    (S1_REGNUM			9)
+   (A0_REGNUM			10)
    (S2_REGNUM			18)
    (S3_REGNUM			19)
    (S4_REGNUM			20)
@@ -3147,3 +3149,4 @@
 (include "sifive-7.md")
 (include "thead.md")
 (include "vector.md")
+(include "zc.md")
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index 6e326fc7e02..9ef522306a5 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -90,6 +90,10 @@ riscv-v.o: $(srcdir)/config/riscv/riscv-v.cc \
 	$(COMPILE) $<
 	$(POSTCOMPILE)
 
+riscv-zcmp-popret.o: $(srcdir)/config/riscv/riscv-zcmp-popret.cc
+	$(COMPILE) $<
+	$(POSTCOMPILE)
+
 thead.o: $(srcdir)/config/riscv/thead.cc \
   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TARGET_H) backend.h $(RTL_H) \
   memmodel.h $(EMIT_RTL_H) poly-int.h output.h
diff --git a/gcc/config/riscv/zc.md b/gcc/config/riscv/zc.md
new file mode 100644
index 00000000000..3ad34dacd49
--- /dev/null
+++ b/gcc/config/riscv/zc.md
@@ -0,0 +1,47 @@
+;; Machine description for ZCE extension.
+;; Copyright (C) 2021 Free Software Foundation, Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_insn "*stack_push<mode>"
+  [(match_parallel 0 "riscv_stack_push_operation"
+    [(set (reg:X SP_REGNUM) (plus:X (reg:X SP_REGNUM)
+      (match_operand:X 1 "const_int_operand" "")))])]
+  "TARGET_ZCMP"
+  "cm.push\t{%L0},%1")
+
+(define_insn "*stack_pop<mode>"
+  [(match_parallel 0 "riscv_stack_pop_operation"
+    [(set (match_operand:X 1 "register_operand" "")
+      (mem:X (plus:X (reg:X SP_REGNUM)
+	(match_operand:X 2 "const_int_operand" ""))))])]
+  "TARGET_ZCMP"
+  {
+    return riscv_output_popret_p (operands[0]) ?
+	"cm.popret\t{%L0},%s0" :
+	"cm.pop\t{%L0},%s0";
+  })
+
+(define_insn "*stack_pop_with_return_value<mode>"
+  [(match_parallel 0 "riscv_stack_pop_operation"
+    [(set (reg:ANYI A0_REGNUM)
+      (match_operand:ANYI 1 "pop_return_value_constant" ""))])]
+  "TARGET_ZCMP"
+  {
+    gcc_assert (riscv_output_popret_p (operands[0]));
+    return "cm.popretz\t{%L0},%s0";
+  })
-- 
2.25.1
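
A note for readers checking the frame arithmetic in this patch: the
sketch below models, outside GCC, the rounding done by
riscv_push_pop_base_sp_adjust and the clamp to "base + 48" applied in
riscv_emit_push_insn. The names base_sp_adjust and push_sp_adjust, and
the word_bytes parameter standing in for UNITS_PER_WORD, are
illustrative assumptions and not part of the patch.

```c
#include <assert.h>

/* Hypothetical standalone model of riscv_push_pop_base_sp_adjust.
   n_regs counts ra plus the saved s-registers; word_bytes stands in
   for UNITS_PER_WORD (4 on rv32, 8 on rv64).  */
static unsigned
base_sp_adjust (unsigned n_regs, unsigned word_bytes)
{
  assert (word_bytes == 4 || word_bytes == 8);
  /* Round the register save area up to a multiple of 16 bytes.  */
  return (n_regs * word_bytes + 15) & ~0xfu;
}

/* Hypothetical model of the clamp in riscv_emit_push_insn: the push
   covers at most base + 48 bytes; anything beyond that is left for a
   separate sp adjustment.  */
static unsigned
push_sp_adjust (unsigned frame_size, unsigned n_regs, unsigned word_bytes)
{
  unsigned aligned = (frame_size + 15) & ~0xfu;
  unsigned max_adjust = base_sp_adjust (n_regs, word_bytes) + 48;
  return aligned > max_adjust ? max_adjust : aligned;
}
```

For example, with ra and s0-s3 on rv32 (5 registers of 4 bytes each)
the base adjustment is 32, so under this model a single push covers at
most 80 bytes of frame.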

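
The two mask constants added to riscv.h can be cross-checked against
the register numbers in riscv.md. The sketch below rebuilds the mask
for {ra, s0-sN} from those numbers; push_pop_mask is a hypothetical
helper written for this note, and the only inputs taken as given are
the x-register numbers (ra = x1, s0/s1 = x8/x9, s2-s11 = x18-x27).

```c
#include <assert.h>

/* x-register numbers, matching the define_constants in riscv.md.  */
enum { RA = 1, S0 = 8, S2 = 18 };

/* Hypothetical helper: build the GPR mask for {ra, s0-s<last_sreg>},
   with last_sreg in the range 0..11.  */
static unsigned
push_pop_mask (unsigned last_sreg)
{
  assert (last_sreg <= 11);
  unsigned mask = 1u << RA;
  for (unsigned i = 0; i <= last_sreg; i++)
    {
      /* s0/s1 are x8/x9; s2..s11 are x18..x27.  */
      unsigned regno = i < 2 ? S0 + i : S2 + (i - 2);
      mask |= 1u << regno;
    }
  return mask;
}
```

With last_sreg = 11 this gives 0x0ffc0302 (RISCV_ZCE_PUSH_POP_MASK),
and with last_sreg = 1 it gives 0x302 (RISCV_ZCMPE_PUSH_POP_MASK).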

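
The register-list operand printed by riscv_print_reglist, together
with the {ra,s0-s10} rejection in riscv_use_push_pop, can be
summarised in one small function. This is only a sketch under the
assumption that n_regs counts ra plus a contiguous run of s-registers
starting at s0; format_reglist is a made-up name and does not appear
in the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write the Zcmp register list for n_regs saved
   registers (ra plus s0..s<n_regs - 2>) into buf.  Returns the number
   of characters written, or -1 for a list that cannot be encoded.  */
static int
format_reglist (char *buf, size_t len, unsigned n_regs)
{
  assert (buf != NULL && len > 0);
  /* Valid lists run from {ra} up to {ra,s0-s11}; {ra,s0-s10}
     (12 registers) is rejected, as in riscv_use_push_pop.  */
  if (n_regs < 1 || n_regs > 13 || n_regs == 12)
    return -1;
  if (n_regs > 2)
    return snprintf (buf, len, "ra,s0-s%u", n_regs - 2);
  if (n_regs == 2)
    return snprintf (buf, len, "ra,s0");
  return snprintf (buf, len, "ra");
}
```

Five registers therefore print as "ra,s0-s3", which matches the
"push {ra,s0-s3}" example in the pass comment.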
end of thread, other threads:[~2023-05-12  9:10 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-25 10:11 [PATCH 4/5] RISC-V: Add Zcmp extension supports Fei Gao
2023-05-05 15:57 ` Sinan
2023-05-06  8:53   ` Fei Gao
2023-05-12  8:12     ` Sinan
2023-05-12  9:10       ` Fei Gao
     [not found] <2023042517370879865929@eswincomputing.com>
2023-04-25  9:52 ` Fei Gao
  -- strict thread matches above, loose matches on Subject: below --
2023-04-06  6:21 [PATCH 0/5] RISC-V: Support ZC* extensions Jiawei
2023-04-06  6:21 ` [PATCH 4/5] RISC-V: Add Zcmp extension supports Jiawei
2023-05-04  9:03   ` Kito Cheng
     [not found]   ` <07720619-dd69-4816-987e-ff0e14d9a348.>
2023-05-12  8:53     ` Sinan
