public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations.
@ 2023-05-25 12:35 Manolis Tsamis
  2023-05-25 12:35 ` [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets Manolis Tsamis
                   ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Manolis Tsamis @ 2023-05-25 12:35 UTC (permalink / raw)
  To: gcc-patches
  Cc: Richard Biener, Palmer Dabbelt, Philipp Tomsich, Kito Cheng,
	Manolis Tsamis


This pass tries to optimize memory offset calculations by moving them
from add immediate instructions to the memory loads/stores.
For example, it can transform this:

  addi t4,sp,16
  add  t2,a6,t4
  shl  t3,t2,1
  ld   a2,0(t3)
  addi a2,1
  sd   a2,8(t2)

into the following (one instruction less):

  add  t2,a6,sp
  shl  t3,t2,1
  ld   a2,32(t3)
  addi a2,1
  sd   a2,24(t2)
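
(A quick check of the arithmetic in this example: the ld address is
((a6 + sp + 16) << 1) + 0, which equals ((a6 + sp) << 1) + 32, so the folded
+16 reappears scaled by the shift as 32; the sd address is
(a6 + sp + 16) + 8 = (a6 + sp) + 24, hence the 24.)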

Although there are places where this is already done, this pass is more
powerful and can handle the more difficult cases that are currently not
optimized. It also runs late enough to optimize away unnecessary
stack pointer calculations.

The first patch in the series contains the implementation of this pass
while the second is a minor change that enables cprop_hardreg's
propagation of the stack pointer, because this pass depends on cprop
to do the propagation of optimized operations. If preferred I can split
this into two different patches (in which case some of the included
testcases will fail temporarily).



Manolis Tsamis (2):
  Implementation of new RISCV optimizations pass: fold-mem-offsets.
  cprop_hardreg: Enable propagation of the stack pointer if possible.

 gcc/config.gcc                                |   2 +-
 gcc/config/riscv/riscv-fold-mem-offsets.cc    | 637 ++++++++++++++++++
 gcc/config/riscv/riscv-passes.def             |   1 +
 gcc/config/riscv/riscv-protos.h               |   1 +
 gcc/config/riscv/riscv.opt                    |   4 +
 gcc/config/riscv/t-riscv                      |   4 +
 gcc/doc/invoke.texi                           |   8 +
 gcc/regcprop.cc                               |   7 +-
 .../gcc.target/riscv/fold-mem-offsets-1.c     |  16 +
 .../gcc.target/riscv/fold-mem-offsets-2.c     |  24 +
 .../gcc.target/riscv/fold-mem-offsets-3.c     |  17 +
 11 files changed, 719 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-fold-mem-offsets.cc
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c

-- 
2.34.1



* [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.
  2023-05-25 12:35 [PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations Manolis Tsamis
@ 2023-05-25 12:35 ` Manolis Tsamis
  2023-05-25 13:01   ` Richard Biener
                     ` (3 more replies)
  2023-05-25 12:35 ` [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible Manolis Tsamis
  2023-05-25 13:42 ` [PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations Jeff Law
  2 siblings, 4 replies; 45+ messages in thread
From: Manolis Tsamis @ 2023-05-25 12:35 UTC (permalink / raw)
  To: gcc-patches
  Cc: Richard Biener, Palmer Dabbelt, Philipp Tomsich, Kito Cheng,
	Manolis Tsamis

Implementation of the new RISC-V optimization pass for memory offset
calculations, documentation and testcases.

gcc/ChangeLog:

	* config.gcc: Add riscv-fold-mem-offsets.o to extra_objs.
	* config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Schedule a new
	pass.
	* config/riscv/riscv-protos.h (make_pass_fold_mem_offsets): Declare.
	* config/riscv/riscv.opt: New options.
	* config/riscv/t-riscv: New build rule.
	* doc/invoke.texi: Document new option.
	* config/riscv/riscv-fold-mem-offsets.cc: New file.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/fold-mem-offsets-1.c: New test.
	* gcc.target/riscv/fold-mem-offsets-2.c: New test.
	* gcc.target/riscv/fold-mem-offsets-3.c: New test.

Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
---

 gcc/config.gcc                                |   2 +-
 gcc/config/riscv/riscv-fold-mem-offsets.cc    | 637 ++++++++++++++++++
 gcc/config/riscv/riscv-passes.def             |   1 +
 gcc/config/riscv/riscv-protos.h               |   1 +
 gcc/config/riscv/riscv.opt                    |   4 +
 gcc/config/riscv/t-riscv                      |   4 +
 gcc/doc/invoke.texi                           |   8 +
 .../gcc.target/riscv/fold-mem-offsets-1.c     |  16 +
 .../gcc.target/riscv/fold-mem-offsets-2.c     |  24 +
 .../gcc.target/riscv/fold-mem-offsets-3.c     |  17 +
 10 files changed, 713 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/riscv/riscv-fold-mem-offsets.cc
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index d88071773c9..5dffd21b4c8 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -529,7 +529,7 @@ pru-*-*)
 	;;
 riscv*)
 	cpu_type=riscv
-	extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o"
+	extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-fold-mem-offsets.o riscv-selftests.o riscv-v.o riscv-vsetvl.o"
 	extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
 	extra_objs="${extra_objs} thead.o"
 	d_target_objs="riscv-d.o"
diff --git a/gcc/config/riscv/riscv-fold-mem-offsets.cc b/gcc/config/riscv/riscv-fold-mem-offsets.cc
new file mode 100644
index 00000000000..81325bb3beb
--- /dev/null
+++ b/gcc/config/riscv/riscv-fold-mem-offsets.cc
@@ -0,0 +1,637 @@
+/* Fold memory offsets pass for RISC-V.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "tree.h"
+#include "expr.h"
+#include "backend.h"
+#include "regs.h"
+#include "target.h"
+#include "memmodel.h"
+#include "emit-rtl.h"
+#include "insn-config.h"
+#include "recog.h"
+#include "predict.h"
+#include "df.h"
+#include "tree-pass.h"
+#include "cfgrtl.h"
+
+/* This pass tries to optimize memory offset calculations by moving them
+   from add immediate instructions to the memory loads/stores.
+   For example, it can transform this:
+
+     addi t4,sp,16
+     add  t2,a6,t4
+     shl  t3,t2,1
+     ld   a2,0(t3)
+     addi a2,1
+     sd   a2,8(t2)
+
+   into the following (one instruction less):
+
+     add  t2,a6,sp
+     shl  t3,t2,1
+     ld   a2,32(t3)
+     addi a2,1
+     sd   a2,24(t2)
+
+   Usually, the code generated by the previous passes tries to have the
+   offsets in the memory instructions, but this pass is still beneficial
+   because:
+
+    - There are cases where add instructions are introduced in a late RTL
+      pass and the rest of the pipeline cannot eliminate them.  Specifically,
+      arrays and structs allocated on the stack can result in multiple
+      unnecessary add instructions that cannot be eliminated easily
+      otherwise.
+
+    - The existing mechanisms that move offsets to memory instructions
+      usually apply only to specific patterns or have other limitations.
+      This pass is very generic and can fold offsets through complex
+      calculations with multiple memory uses and partially overlapping
+      calculations.  As a result it can eliminate more instructions than
+      what is possible otherwise.
+
+   This pass operates within single basic blocks and consists of 4 phases:
+
+    - Phase 1 (Analysis): Find "foldable" instructions.
+      Foldable instructions are those that we know how to propagate
+      a constant addition through (add, slli, mv, ...) and only have other
+      foldable instructions for uses.  In that phase a DFS traversal on the
+      definition tree is performed and foldable instructions are marked on
+      a bitmap.  The add immediate instructions that are reachable in this
+      DFS are candidates for removal since all the intermediate
+      calculations affected by them are also foldable.
+
+    - Phase 2 (Validity): Traverse again, this time calculating the
+      offsets that would result from folding all add immediate instructions
+      found.  Also keep track of which instructions will be folded for this
+      particular offset because folding can be partially or completely
+      shared across a number of different memory instructions.  At this point,
+      since we calculated the actual offset resulting from folding, we check
+      whether it is a valid 12-bit immediate and record the result.
+
+    - Phase 3 (Commit offsets): Traverse again.  This time it is known if
+      a particular fold is valid so actually fold the offset by changing
+      the RTL statement.  It's important that this phase is separate from the
+      previous because one instruction that is foldable with a valid offset
+      can result in an invalid offset for another instruction later on.
+
+    - Phase 4 (Commit instruction deletions): Scan all insns and delete
+      all add immediate instructions that were folded.  */
+
+namespace {
+
+const pass_data pass_data_fold_mem =
+{
+  RTL_PASS, /* type */
+  "fold_mem_offsets", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_df_finish, /* todo_flags_finish */
+};
+
+class pass_fold_mem_offsets : public rtl_opt_pass
+{
+public:
+  pass_fold_mem_offsets (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_fold_mem, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+    {
+      return riscv_mfold_mem_offsets
+	       && optimize >= 2;
+    }
+
+  virtual unsigned int execute (function *);
+}; // class pass_fold_mem_offsets
+
+/* Bitmap that tracks which instructions are reachable through sequences
+   of foldable instructions.  */
+static bitmap_head can_fold_insn;
+
+/* Bitmap with instructions marked for deletion due to folding.  */
+static bitmap_head pending_remove_insn;
+
+/* Bitmap with instructions that cannot be deleted because that would
+   require folding an offset that's invalid in some memory access.
+   An instruction can be in both PENDING_REMOVE_INSN and CANNOT_REMOVE_INSN
+   at the same time, in which case it cannot be safely deleted.  */
+static bitmap_head cannot_remove_insn;
+
+/* The number of folded addi instructions of the form "addi reg, sp, X".  */
+static int stats_folded_sp;
+
+/* The number of all other folded addi instructions.  */
+static int stats_folded_other;
+
+enum fold_mem_phase
+{
+  FM_PHASE_ANALYSIS,
+  FM_PHASE_VALIDITY,
+  FM_PHASE_COMMIT_OFFSETS,
+  FM_PHASE_COMMIT_INSNS
+};
+
+/* Helper function for fold_offsets.
+   Get the single reaching definition of an instruction inside a BB.
+   The definition is desired for REG used in INSN.
+   Return the definition insn or NULL if there's no definition that
+   satisfies the criteria.  */
+static rtx_insn*
+get_single_def_in_bb (rtx_insn *insn, rtx reg)
+{
+  df_ref use;
+  struct df_link *ref_chain, *ref_link;
+
+  FOR_EACH_INSN_USE (use, insn)
+    {
+      if (GET_CODE (DF_REF_REG (use)) == SUBREG)
+	return NULL;
+      if (REGNO (DF_REF_REG (use)) == REGNO (reg))
+	break;
+    }
+
+  if (!use)
+    return NULL;
+
+  ref_chain = DF_REF_CHAIN (use);
+
+  if (!ref_chain)
+    return NULL;
+
+  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
+    {
+      /* Problem getting some definition for this instruction.  */
+      if (ref_link->ref == NULL)
+	return NULL;
+      if (DF_REF_INSN_INFO (ref_link->ref) == NULL)
+	return NULL;
+      if (global_regs[REGNO (reg)]
+	  && !set_of (reg, DF_REF_INSN (ref_link->ref)))
+	return NULL;
+    }
+
+  if (ref_chain->next)
+    return NULL;
+
+  rtx_insn* def = DF_REF_INSN (ref_chain->ref);
+
+  if (BLOCK_FOR_INSN (def) != BLOCK_FOR_INSN (insn))
+    return NULL;
+
+  if (DF_INSN_LUID (def) > DF_INSN_LUID (insn))
+    return NULL;
+
+  return def;
+}
+
+/* Helper function for fold_offsets.
+   Get all the reaching uses of an instruction.  The uses are desired for REG
+   set in INSN.  Return the use list or NULL if a use is missing or irregular.
+   If SUCCESS is not NULL then its value is set to false if there are
+   missing or irregular uses and to true otherwise.  */
+static struct df_link*
+get_uses (rtx_insn *insn, rtx reg, bool* success)
+{
+  df_ref def;
+  struct df_link *ref_chain, *ref_link;
+
+  if (success != NULL)
+    *success = false;
+
+  FOR_EACH_INSN_DEF (def, insn)
+    if (REGNO (DF_REF_REG (def)) == REGNO (reg))
+      break;
+
+  if (!def)
+    return NULL;
+
+  ref_chain = DF_REF_CHAIN (def);
+
+  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
+    {
+      /* Problem getting some use for this instruction.  */
+      if (ref_link->ref == NULL)
+	return NULL;
+      if (DF_REF_CLASS (ref_link->ref) != DF_REF_REGULAR)
+	return NULL;
+    }
+
+  if (success != NULL)
+    *success = true;
+
+  return ref_chain;
+}
+
+/* Recursive function that computes the foldable offsets through the
+   definitions of REG in INSN given an integer scale factor SCALE.
+   Returns the offset that would have to be added if all instructions
+   in PENDING_DELETES were to be deleted.
+
+  - if ANALYZE is true then it recurses through definitions with the common
+    code and marks instructions eligible for folding in the bitmap
+    can_fold_insn.  An instruction is eligible if all its uses are also
+    eligible.  Initially can_fold_insn is true for memory accesses.
+
+  - if ANALYZE is false then it recurses through definitions with the common
+    code and computes and returns the offset that would result if the
+    instructions in PENDING_DELETES were deleted.  */
+static HOST_WIDE_INT
+fold_offsets (rtx_insn* insn, rtx reg, int scale, bool analyze,
+	      bitmap pending_deletes)
+{
+  rtx_insn* def = get_single_def_in_bb (insn, reg);
+
+  if (!def)
+    return 0;
+
+  rtx set = single_set (def);
+
+  if (!set)
+    return 0;
+
+  rtx src = SET_SRC (set);
+  rtx dest = SET_DEST (set);
+
+  enum rtx_code code = GET_CODE (src);
+
+  /* Return early for SRC codes that we don't know how to handle.  */
+  if (code != PLUS && code != ASHIFT && code != REG)
+    return 0;
+
+  unsigned int dest_regno = REGNO (dest);
+
+  /* We don't want to fold offsets from instructions that change some
+     particular registers with potentially global side effects.  */
+  if (!GP_REG_P (dest_regno)
+      || dest_regno == STACK_POINTER_REGNUM
+      || (frame_pointer_needed && dest_regno == HARD_FRAME_POINTER_REGNUM)
+      || dest_regno == GP_REGNUM
+      || dest_regno == THREAD_POINTER_REGNUM
+      || dest_regno == RETURN_ADDR_REGNUM)
+    return 0;
+
+  if (analyze)
+    {
+      /* We can only fold through instructions that are eventually used as
+	 memory addresses and do not have other uses.  Use the same logic
+	 from the offset calculation to visit instructions that can
+	 propagate offsets and keep track in can_fold_insn of those whose
+	 uses always end in memory instructions.  */
+
+      if (REG_P (dest))
+	{
+	  bool success;
+	  struct df_link *uses = get_uses (def, dest, &success), *ref_link;
+
+	  if (!success)
+	    return 0;
+
+	  for (ref_link = uses; ref_link; ref_link = ref_link->next)
+	    {
+	      rtx_insn* use = DF_REF_INSN (ref_link->ref);
+
+	      /* Ignore debug insns during analysis.  */
+	      if (DEBUG_INSN_P (use))
+		continue;
+
+	      if (!bitmap_bit_p (&can_fold_insn, INSN_UID (use)))
+		return 0;
+
+	      rtx use_set = single_set (use);
+
+	      /* Prevent folding when a memory store uses the dest register.  */
+	      if (use_set
+		  && MEM_P (SET_DEST (use_set))
+		  && REG_P (SET_SRC (use_set))
+		  && REGNO (SET_SRC (use_set)) == REGNO (dest))
+		return 0;
+	    }
+
+	  bitmap_set_bit (&can_fold_insn, INSN_UID (def));
+	}
+    }
+
+  if (!bitmap_bit_p (&can_fold_insn, INSN_UID (def)))
+    return 0;
+
+  switch (code)
+    {
+    case PLUS:
+      {
+	/* Propagate through add.  */
+	rtx arg1 = XEXP (src, 0);
+	rtx arg2 = XEXP (src, 1);
+
+	HOST_WIDE_INT offset = 0;
+
+	if (REG_P (arg1))
+	  offset += fold_offsets (def, arg1, 1, analyze, pending_deletes);
+	else if (GET_CODE (arg1) == ASHIFT && REG_P (XEXP (arg1, 0))
+		 && CONST_INT_P (XEXP (arg1, 1)))
+	  {
+	    /* Also handle shift-and-add from the Zba extension.  */
+	    int shift_scale = (1 << (int) INTVAL (XEXP (arg1, 1)));
+	    offset += fold_offsets (def, XEXP (arg1, 0), shift_scale, analyze,
+				    pending_deletes);
+	  }
+
+	if (REG_P (arg2))
+	  offset += fold_offsets (def, arg2, 1, analyze, pending_deletes);
+	else if (CONST_INT_P (arg2) && !analyze)
+	  {
+	    offset += INTVAL (arg2);
+	    bitmap_set_bit (pending_deletes, INSN_UID (def));
+	  }
+
+	return scale * offset;
+      }
+    case ASHIFT:
+      {
+	/* Propagate through sll.  */
+	rtx arg1 = XEXP (src, 0);
+	rtx arg2 = XEXP (src, 1);
+
+	if (REG_P (arg1) && CONST_INT_P (arg2))
+	  {
+	    int shift_scale = (1 << (int) INTVAL (arg2));
+	    return scale * fold_offsets (def, arg1, shift_scale, analyze,
+					 pending_deletes);
+	  }
+
+	return 0;
+      }
+    case REG:
+      /* Propagate through mv.  */
+      return scale * fold_offsets (def, src, 1, analyze, pending_deletes);
+    default:
+      /* Cannot propagate.  */
+      return 0;
+    }
+}
+
+/* Helper function for fold_offset_mem.
+   If INSN is a set rtx that loads from or stores to
+   some memory location that could have an offset folded
+   into it, return the rtx for the memory operand.  */
+static rtx
+get_foldable_mem_rtx (rtx_insn* insn)
+{
+  rtx set = single_set (insn);
+
+  if (set != NULL_RTX)
+    {
+      rtx src = SET_SRC (set);
+      rtx dest = SET_DEST (set);
+
+      /* We don't want folding if the memory has
+	 unspec/unspec_volatile in either src or dest.
+	 In particular this also prevents folding
+	 when atomics are involved.  */
+      if (GET_CODE (src) == UNSPEC
+	  || GET_CODE (src) == UNSPEC_VOLATILE
+	  || GET_CODE (dest) == UNSPEC
+	  || GET_CODE (dest) == UNSPEC_VOLATILE)
+	return NULL;
+
+      if (MEM_P (src))
+	return src;
+      else if (MEM_P (dest))
+	return dest;
+      else if ((GET_CODE (src) == SIGN_EXTEND
+		|| GET_CODE (src) == ZERO_EXTEND)
+	       && MEM_P (XEXP (src, 0)))
+	return XEXP (src, 0);
+    }
+
+  return NULL;
+}
+
+/* Driver function that performs the actions defined by PHASE for INSN.  */
+static void
+fold_offset_mem (rtx_insn* insn, int phase)
+{
+  if (phase == FM_PHASE_COMMIT_INSNS)
+    {
+      if (bitmap_bit_p (&pending_remove_insn, INSN_UID (insn))
+	  && !bitmap_bit_p (&cannot_remove_insn, INSN_UID (insn)))
+	{
+	  rtx set = single_set (insn);
+	  rtx src = SET_SRC (set);
+	  rtx dest = SET_DEST (set);
+	  rtx arg1 = XEXP (src, 0);
+
+	  /* INSN is an add immediate (addi DEST, SRC1, SRC2) that we
+	     must replace with addi DEST, SRC1, 0.  */
+	  if (XEXP (src, 0) == stack_pointer_rtx)
+	    stats_folded_sp++;
+	  else
+	    stats_folded_other++;
+
+	  if (dump_file)
+	    {
+	      fprintf (dump_file, "Instruction deleted from folding:");
+	      print_rtl_single (dump_file, insn);
+	    }
+
+	  if (REGNO (dest) != REGNO (arg1))
+	    {
+	      /* If the dest register is different from the first argument
+		 then the addition with constant 0 is equivalent to a move
+		 instruction.  We emit the move and let the subsequent
+		 pass cprop_hardreg eliminate it if possible.  */
+	      rtx arg1_reg_rtx = gen_rtx_REG (GET_MODE (dest), REGNO (arg1));
+	      rtx mov_rtx = gen_move_insn (dest, arg1_reg_rtx);
+	      df_insn_rescan (emit_insn_after (mov_rtx, insn));
+	    }
+
+	  /* If the dest register is the same as the first argument
+	     then the addition with constant 0 is a no-op.
+	     We can now delete the original add immediate instruction.  */
+	  delete_insn (insn);
+	}
+    }
+  else
+    {
+      rtx mem = get_foldable_mem_rtx (insn);
+
+      if (!mem)
+	return;
+
+      rtx mem_addr = XEXP (mem, 0);
+      rtx reg;
+      HOST_WIDE_INT cur_off;
+
+      if (REG_P (mem_addr))
+	{
+	  reg = mem_addr;
+	  cur_off = 0;
+	}
+      else if (GET_CODE (mem_addr) == PLUS
+	       && REG_P (XEXP (mem_addr, 0))
+	       && CONST_INT_P (XEXP (mem_addr, 1)))
+	{
+	  reg = XEXP (mem_addr, 0);
+	  cur_off = INTVAL (XEXP (mem_addr, 1));
+	}
+      else
+	return;
+
+      if (phase == FM_PHASE_ANALYSIS)
+	{
+	  bitmap_set_bit (&can_fold_insn, INSN_UID (insn));
+	  fold_offsets (insn, reg, 1, true, NULL);
+	}
+      else if (phase == FM_PHASE_VALIDITY)
+	{
+	  bitmap_head new_pending_deletes;
+	  bitmap_initialize (&new_pending_deletes, NULL);
+	  HOST_WIDE_INT offset = cur_off + fold_offsets (insn, reg, 1, false,
+							&new_pending_deletes);
+
+	  /* Temporarily change the offset in MEM to test whether
+	     it results in a valid instruction.  */
+	  machine_mode mode = GET_MODE (mem_addr);
+	  XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, GEN_INT (offset));
+
+	  bool valid_change = recog (PATTERN (insn), insn, 0) >= 0;
+
+	  /* Restore the instruction.  */
+	  XEXP (mem, 0) = mem_addr;
+
+	  if (valid_change)
+	    bitmap_ior_into (&pending_remove_insn, &new_pending_deletes);
+	  else
+	    bitmap_ior_into (&cannot_remove_insn, &new_pending_deletes);
+	  bitmap_release (&new_pending_deletes);
+	}
+      else if (phase == FM_PHASE_COMMIT_OFFSETS)
+	{
+	  bitmap_head required_deletes;
+	  bitmap_initialize (&required_deletes, NULL);
+	  HOST_WIDE_INT offset = cur_off + fold_offsets (insn, reg, 1, false,
+							 &required_deletes);
+	  bool illegal = bitmap_intersect_p (&required_deletes,
+					     &cannot_remove_insn);
+
+	  if (offset == cur_off)
+	    return;
+
+	  gcc_assert (!bitmap_empty_p (&required_deletes));
+
+	  /* We have to update CANNOT_REMOVE_INSN again if transforming
+	     this instruction is illegal.  */
+	  if (illegal)
+	    bitmap_ior_into (&cannot_remove_insn, &required_deletes);
+	  else
+	    {
+	      machine_mode mode = GET_MODE (mem_addr);
+	      XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, GEN_INT (offset));
+	      df_insn_rescan (insn);
+
+	      if (dump_file)
+		{
+		  fprintf (dump_file, "Memory offset changed from "
+				      HOST_WIDE_INT_PRINT_DEC
+				      " to "
+				      HOST_WIDE_INT_PRINT_DEC
+				      " for instruction:\n", cur_off, offset);
+		  print_rtl_single (dump_file, insn);
+		}
+	    }
+	  bitmap_release (&required_deletes);
+	}
+    }
+}
+
+unsigned int
+pass_fold_mem_offsets::execute (function *fn)
+{
+  basic_block bb;
+  rtx_insn *insn;
+
+  df_set_flags (DF_RD_PRUNE_DEAD_DEFS | DF_DEFER_INSN_RESCAN);
+  df_chain_add_problem (DF_UD_CHAIN + DF_DU_CHAIN);
+  df_analyze ();
+
+  bitmap_initialize (&can_fold_insn, NULL);
+  bitmap_initialize (&pending_remove_insn, NULL);
+  bitmap_initialize (&cannot_remove_insn, NULL);
+
+  stats_folded_sp = 0;
+  stats_folded_other = 0;
+
+  FOR_ALL_BB_FN (bb, fn)
+    {
+      /* The shorten-memrefs pass runs when a BB is optimized for size
+	 and moves offsets from multiple memory instructions to a common
+	 add instruction.  Disable folding if optimizing for size because
+	 this pass will cancel the effects of shorten-memrefs.  */
+      if (optimize_bb_for_size_p (bb))
+	continue;
+
+      bitmap_clear (&can_fold_insn);
+      bitmap_clear (&pending_remove_insn);
+      bitmap_clear (&cannot_remove_insn);
+
+      FOR_BB_INSNS (bb, insn)
+	fold_offset_mem (insn, FM_PHASE_ANALYSIS);
+
+      FOR_BB_INSNS (bb, insn)
+	fold_offset_mem (insn, FM_PHASE_VALIDITY);
+
+      FOR_BB_INSNS (bb, insn)
+	fold_offset_mem (insn, FM_PHASE_COMMIT_OFFSETS);
+
+      FOR_BB_INSNS (bb, insn)
+	fold_offset_mem (insn, FM_PHASE_COMMIT_INSNS);
+    }
+
+  statistics_counter_event (cfun, "addi with sp fold", stats_folded_sp);
+  statistics_counter_event (cfun, "other addi fold", stats_folded_other);
+
+  bitmap_release (&can_fold_insn);
+  bitmap_release (&pending_remove_insn);
+  bitmap_release (&cannot_remove_insn);
+
+  return 0;
+}
+
+} // anon namespace
+
+rtl_opt_pass *
+make_pass_fold_mem_offsets (gcc::context *ctxt)
+{
+  return new pass_fold_mem_offsets (ctxt);
+}
diff --git a/gcc/config/riscv/riscv-passes.def b/gcc/config/riscv/riscv-passes.def
index 4084122cf0a..dc08daadc66 100644
--- a/gcc/config/riscv/riscv-passes.def
+++ b/gcc/config/riscv/riscv-passes.def
@@ -18,4 +18,5 @@
    <http://www.gnu.org/licenses/>.  */
 
 INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
+INSERT_PASS_AFTER (pass_regrename, 1, pass_fold_mem_offsets);
 INSERT_PASS_BEFORE (pass_fast_rtl_dce, 1, pass_vsetvl);
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5f78fd579bb..b89a82adb0e 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -104,6 +104,7 @@ extern void riscv_parse_arch_string (const char *, struct gcc_options *, locatio
 extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
 
 rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
+rtl_opt_pass * make_pass_fold_mem_offsets (gcc::context *ctxt);
 rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
 
 /* Information about one CPU we know about.  */
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 63d4710cb15..5e1fbdbedcc 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -105,6 +105,10 @@ Convert BASE + LARGE_OFFSET addresses to NEW_BASE + SMALL_OFFSET to allow more
 memory accesses to be generated as compressed instructions.  Currently targets
 32-bit integer load/stores.
 
+mfold-mem-offsets
+Target Bool Var(riscv_mfold_mem_offsets) Init(1)
+Fold instructions calculating memory offsets into the memory access instructions if possible.
+
 mcmodel=
 Target RejectNegative Joined Enum(code_model) Var(riscv_cmodel) Init(TARGET_DEFAULT_CMODEL)
 Specify the code model.
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index 1252d6f851a..f29cf463867 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -76,6 +76,10 @@ riscv-shorten-memrefs.o: $(srcdir)/config/riscv/riscv-shorten-memrefs.cc \
 	$(COMPILE) $<
 	$(POSTCOMPILE)
 
+riscv-fold-mem-offsets.o: $(srcdir)/config/riscv/riscv-fold-mem-offsets.cc
+	$(COMPILE) $<
+	$(POSTCOMPILE)
+
 riscv-selftests.o: $(srcdir)/config/riscv/riscv-selftests.cc \
   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(TREE_H) output.h \
   $(C_COMMON_H) $(TARGET_H) $(OPTABS_H) $(EXPR_H) $(INSN_ATTR_H) $(EMIT_RTL_H)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ee78591c73e..39b57cab595 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1218,6 +1218,7 @@ See RS/6000 and PowerPC Options.
 -msmall-data-limit=@var{N-bytes}
 -msave-restore  -mno-save-restore
 -mshorten-memrefs  -mno-shorten-memrefs
+-mfold-mem-offsets  -mno-fold-mem-offsets
 -mstrict-align  -mno-strict-align
 -mcmodel=medlow  -mcmodel=medany
 -mexplicit-relocs  -mno-explicit-relocs
@@ -29048,6 +29049,13 @@ of 'new base + small offset'.  If the new base gets stored in a compressed
 register, then the new load/store can be compressed.  Currently targets 32-bit
 integer load/stores only.
 
+@opindex mfold-mem-offsets
+@item -mfold-mem-offsets
+@itemx -mno-fold-mem-offsets
+Do or do not attempt to move constant addition calculations used for memory
+offsets into the corresponding memory instructions.  The default is
+@option{-mfold-mem-offsets} at levels @option{-O2} and @option{-O3}.
+
 @opindex mstrict-align
 @item -mstrict-align
 @itemx -mno-strict-align
diff --git a/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
new file mode 100644
index 00000000000..574cc92b6ab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfold-mem-offsets" } */
+
+void sink(int arr[2]);
+
+void
+foo(int a, int b, int i)
+{
+  int arr[2] = {a, b};
+  arr[i]++;
+  sink(arr);
+}
+
+// Should compile without negative memory offsets when using -mfold-mem-offsets
+/* { dg-final { scan-assembler-not "lw\t.*,-.*\\(.*\\)" } } */
+/* { dg-final { scan-assembler-not "sw\t.*,-.*\\(.*\\)" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
new file mode 100644
index 00000000000..e6c251d3a3c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfold-mem-offsets" } */
+
+void sink(int arr[3]);
+
+void
+foo(int a, int b, int c, int i)
+{
+  int arr1[3] = {a, b, c};
+  int arr2[3] = {a, c, b};
+  int arr3[3] = {c, b, a};
+
+  arr1[i]++;
+  arr2[i]++;
+  arr3[i]++;
+  
+  sink(arr1);
+  sink(arr2);
+  sink(arr3);
+}
+
+// Should compile without negative memory offsets when using -mfold-mem-offsets
+/* { dg-final { scan-assembler-not "lw\t.*,-.*\\(.*\\)" } } */
+/* { dg-final { scan-assembler-not "sw\t.*,-.*\\(.*\\)" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c
new file mode 100644
index 00000000000..8586d3e3a29
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfold-mem-offsets" } */
+
+void load(int arr[2]);
+
+int
+foo(long unsigned int i)
+{
+  int arr[2];
+  load(arr);
+
+  return arr[3 * i + 77];
+}
+
+// Should compile without negative memory offsets when using -mfold-mem-offsets
+/* { dg-final { scan-assembler-not "lw\t.*,-.*\\(.*\\)" } } */
+/* { dg-final { scan-assembler-not "addi\t.*,.*,77" } } */
\ No newline at end of file
-- 
2.34.1



* [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.
  2023-05-25 12:35 [PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations Manolis Tsamis
  2023-05-25 12:35 ` [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets Manolis Tsamis
@ 2023-05-25 12:35 ` Manolis Tsamis
  2023-05-25 13:38   ` Jeff Law
  2023-06-07 22:18   ` Jeff Law
  2023-05-25 13:42 ` [PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations Jeff Law
  2 siblings, 2 replies; 45+ messages in thread
From: Manolis Tsamis @ 2023-05-25 12:35 UTC (permalink / raw)
  To: gcc-patches
  Cc: Richard Biener, Palmer Dabbelt, Philipp Tomsich, Kito Cheng,
	Manolis Tsamis

Propagation of the stack pointer in cprop_hardreg is currently forbidden
in all cases, due to maybe_mode_change returning NULL. Relax this
restriction and allow propagation when no mode change is requested.
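
To illustrate the interaction with the first patch (the register names below
are made up for the example, not taken from real compiler output): when
fold-mem-offsets replaces a folded addi with a plain copy of the stack
pointer, this change lets cprop_hardreg forward sp into the memory access
and delete the copy:

  mv  t0,sp
  ld  a0,24(t0)

becomes

  ld  a0,24(sp)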

gcc/ChangeLog:

        * regcprop.cc (maybe_mode_change): Enable stack pointer propagation.

Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
---

 gcc/regcprop.cc | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/regcprop.cc b/gcc/regcprop.cc
index f426f4fedcd..6cbfadb181f 100644
--- a/gcc/regcprop.cc
+++ b/gcc/regcprop.cc
@@ -422,7 +422,12 @@ maybe_mode_change (machine_mode orig_mode, machine_mode copy_mode,
 
      It's unclear if we need to do the same for other special registers.  */
   if (regno == STACK_POINTER_REGNUM)
-    return NULL_RTX;
+    {
+      if (orig_mode == new_mode)
+	return stack_pointer_rtx;
+      else
+	return NULL_RTX;
+    }
 
   if (orig_mode == new_mode)
     return gen_raw_REG (new_mode, regno);
-- 
2.34.1



* Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.
  2023-05-25 12:35 ` [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets Manolis Tsamis
@ 2023-05-25 13:01   ` Richard Biener
  2023-05-25 13:25     ` Manolis Tsamis
  2023-05-25 13:31     ` Jeff Law
  2023-06-08  5:37   ` Jeff Law
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 45+ messages in thread
From: Richard Biener @ 2023-05-25 13:01 UTC (permalink / raw)
  To: Manolis Tsamis; +Cc: gcc-patches, Palmer Dabbelt, Philipp Tomsich, Kito Cheng

On Thu, May 25, 2023 at 2:36 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
>
> Implementation of the new RISC-V optimization pass for memory offset
> calculations, documentation and testcases.

Why do fwprop or combine not do what you want?

> gcc/ChangeLog:
>
>         * config.gcc: Add riscv-fold-mem-offsets.o to extra_objs.
>         * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Schedule a new
>         pass.
>         * config/riscv/riscv-protos.h (make_pass_fold_mem_offsets): Declare.
>         * config/riscv/riscv.opt: New options.
>         * config/riscv/t-riscv: New build rule.
>         * doc/invoke.texi: Document new option.
>         * config/riscv/riscv-fold-mem-offsets.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/fold-mem-offsets-1.c: New test.
>         * gcc.target/riscv/fold-mem-offsets-2.c: New test.
>         * gcc.target/riscv/fold-mem-offsets-3.c: New test.
>
> Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
> ---
>
>  gcc/config.gcc                                |   2 +-
>  gcc/config/riscv/riscv-fold-mem-offsets.cc    | 637 ++++++++++++++++++
>  gcc/config/riscv/riscv-passes.def             |   1 +
>  gcc/config/riscv/riscv-protos.h               |   1 +
>  gcc/config/riscv/riscv.opt                    |   4 +
>  gcc/config/riscv/t-riscv                      |   4 +
>  gcc/doc/invoke.texi                           |   8 +
>  .../gcc.target/riscv/fold-mem-offsets-1.c     |  16 +
>  .../gcc.target/riscv/fold-mem-offsets-2.c     |  24 +
>  .../gcc.target/riscv/fold-mem-offsets-3.c     |  17 +
>  10 files changed, 713 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/config/riscv/riscv-fold-mem-offsets.cc
>  create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index d88071773c9..5dffd21b4c8 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -529,7 +529,7 @@ pru-*-*)
>         ;;
>  riscv*)
>         cpu_type=riscv
> -       extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o"
> +       extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-fold-mem-offsets.o riscv-selftests.o riscv-v.o riscv-vsetvl.o"
>         extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
>         extra_objs="${extra_objs} thead.o"
>         d_target_objs="riscv-d.o"
> diff --git a/gcc/config/riscv/riscv-fold-mem-offsets.cc b/gcc/config/riscv/riscv-fold-mem-offsets.cc
> new file mode 100644
> index 00000000000..81325bb3beb
> --- /dev/null
> +++ b/gcc/config/riscv/riscv-fold-mem-offsets.cc
> @@ -0,0 +1,637 @@
> +/* Fold memory offsets pass for RISC-V.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 3, or (at your option)
> +any later version.
> +
> +GCC is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +GNU General Public License for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#define IN_TARGET_CODE 1
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tm.h"
> +#include "rtl.h"
> +#include "tree.h"
> +#include "expr.h"
> +#include "backend.h"
> +#include "regs.h"
> +#include "target.h"
> +#include "memmodel.h"
> +#include "emit-rtl.h"
> +#include "insn-config.h"
> +#include "recog.h"
> +#include "predict.h"
> +#include "df.h"
> +#include "tree-pass.h"
> +#include "cfgrtl.h"
> +
> +/* This pass tries to optimize memory offset calculations by moving them
> +   from add immediate instructions to the memory loads/stores.
> +   For example it can transform this:
> +
> +     addi t4,sp,16
> +     add  t2,a6,t4
> +     shl  t3,t2,1
> +     ld   a2,0(t3)
> +     addi a2,1
> +     sd   a2,8(t2)
> +
> +   into the following (one instruction less):
> +
> +     add  t2,a6,sp
> +     shl  t3,t2,1
> +     ld   a2,32(t3)
> +     addi a2,1
> +     sd   a2,24(t2)
> +
> +   Usually, the code generated from the previous passes tries to have the
> +   offsets in the memory instructions but this pass is still beneficial
> +   because:
> +
> +    - There are cases where add instructions are added in a late rtl pass
> +      and the rest of the pipeline cannot eliminate them.  Specifically,
> +      arrays and structs allocated on the stack can result in multiple
> +      unnecessary add instructions that cannot be eliminated easily
> +      otherwise.
> +
> +    - The existing mechanisms that move offsets to memory instructions
> +      usually apply only to specific patterns or have other limitations.
> +      This pass is very generic and can fold offsets through complex
> +      calculations with multiple memory uses and partially overlapping
> +      calculations.  As a result it can eliminate more instructions than
> +      what is possible otherwise.
> +
> +   This pass runs inside a single basic blocks and consists of 4 phases:
> +
> +    - Phase 1 (Analysis): Find "foldable" instructions.
> +      Foldable instructions are those that we know how to propagate
> +      a constant addition through (add, slli, mv, ...) and only have other
> +      foldable instructions for uses.  In that phase a DFS traversal on the
> +      definition tree is performed and foldable instructions are marked on
> +      a bitmap.  The add immediate instructions that are reachable in this
> +      DFS are candidates for removal since all the intermediate
> +      calculations affected by them are also foldable.
> +
> +    - Phase 2 (Validity): Traverse again, this time calculating the
> +      offsets that would result from folding all add immediate instructions
> +      found.  Also keep track of which instructions will be folded for this
> +      particular offset because folding can be partially or completely
> +      shared across an number of different memory instructions.  At this point,
> +      since we calculated the actual offset resulting from folding, we check
> +      and keep track if it's a valid 12-bit immediate.
> +
> +    - Phase 3 (Commit offsets): Traverse again.  This time it is known if
> +      a particular fold is valid so actually fold the offset by changing
> +      the RTL statement.  It's important that this phase is separate from the
> +      previous because one instruction that is foldable with a valid offset
> +      can become result in an invalid offset for another instruction later on.
> +
> +    - Phase 4 (Commit instruction deletions): Scan all insns and delete
> +      all add immediate instructions that were folded.  */
> +
> +namespace {
> +
> +const pass_data pass_data_fold_mem =
> +{
> +  RTL_PASS, /* type */
> +  "fold_mem_offsets", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_NONE, /* tv_id */
> +  0, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  TODO_df_finish, /* todo_flags_finish */
> +};
> +
> +class pass_fold_mem_offsets : public rtl_opt_pass
> +{
> +public:
> +  pass_fold_mem_offsets (gcc::context *ctxt)
> +    : rtl_opt_pass (pass_data_fold_mem, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *)
> +    {
> +      return riscv_mfold_mem_offsets
> +              && optimize >= 2;
> +    }
> +
> +  virtual unsigned int execute (function *);
> +}; // class pass_fold_mem_offsets
> +
> +/* Bitmap that tracks which instructions are reachable through sequences
> +   of foldable instructions.  */
> +static bitmap_head can_fold_insn;
> +
> +/* Bitmap with instructions marked for deletion due to folding.  */
> +static bitmap_head pending_remove_insn;
> +
> +/* Bitmap with instructions that cannot be deleted because that would
> +   require folding an offset that's invalid in some memory access.
> +   An instruction can be in both PENDING_REMOVE_INSN and CANNOT_REMOVE_INSN
> +   at the same time, in which case it cannot be safely deleted.  */
> +static bitmap_head cannot_remove_insn;
> +
> +/* The number of folded addi instructions of the form "addi reg, sp, X".  */
> +static int stats_folded_sp;
> +
> +/* The number of the rest folded addi instructions.  */
> +static int stats_folded_other;
> +
> +enum fold_mem_phase
> +{
> +  FM_PHASE_ANALYSIS,
> +  FM_PHASE_VALIDITY,
> +  FM_PHASE_COMMIT_OFFSETS,
> +  FM_PHASE_COMMIT_INSNS
> +};
> +
> +/* Helper function for fold_offsets.
> +  Get the single reaching definition of an instruction inside a BB.
> +  The definition is desired for REG used in INSN.
> +  Return the definition insn or NULL if there's no definition with
> +  the desired criteria.  */
> +static rtx_insn*
> +get_single_def_in_bb (rtx_insn *insn, rtx reg)
> +{
> +  df_ref use;
> +  struct df_link *ref_chain, *ref_link;
> +
> +  FOR_EACH_INSN_USE (use, insn)
> +    {
> +      if (GET_CODE (DF_REF_REG (use)) == SUBREG)
> +       return NULL;
> +      if (REGNO (DF_REF_REG (use)) == REGNO (reg))
> +       break;
> +    }
> +
> +  if (!use)
> +    return NULL;
> +
> +  ref_chain = DF_REF_CHAIN (use);
> +
> +  if (!ref_chain)
> +    return NULL;
> +
> +  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
> +    {
> +      /* Problem getting some definition for this instruction.  */
> +      if (ref_link->ref == NULL)
> +       return NULL;
> +      if (DF_REF_INSN_INFO (ref_link->ref) == NULL)
> +       return NULL;
> +      if (global_regs[REGNO (reg)]
> +         && !set_of (reg, DF_REF_INSN (ref_link->ref)))
> +       return NULL;
> +    }
> +
> +  if (ref_chain->next)
> +    return NULL;
> +
> +  rtx_insn* def = DF_REF_INSN (ref_chain->ref);
> +
> +  if (BLOCK_FOR_INSN (def) != BLOCK_FOR_INSN (insn))
> +    return NULL;
> +
> +  if (DF_INSN_LUID (def) > DF_INSN_LUID (insn))
> +    return NULL;
> +
> +  return def;
> +}
> +
> +/* Helper function for fold_offsets.
> +   Get all the reaching uses of an instruction.  The uses are desired for REG
> +   set in INSN.  Return use list or NULL if a use is missing or irregular.
> +   If SUCCESS is not NULL then it's value is set to false if there are
> +   missing or irregular uses and to true otherwise.  */
> +static struct df_link*
> +get_uses (rtx_insn *insn, rtx reg, bool* success)
> +{
> +  df_ref def;
> +  struct df_link *ref_chain, *ref_link;
> +
> +  if (success != NULL)
> +    *success = false;
> +
> +  FOR_EACH_INSN_DEF (def, insn)
> +    if (REGNO (DF_REF_REG (def)) == REGNO (reg))
> +      break;
> +
> +  if (!def)
> +    return NULL;
> +
> +  ref_chain = DF_REF_CHAIN (def);
> +
> +  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
> +    {
> +      /* Problem getting some use for this instruction.  */
> +      if (ref_link->ref == NULL)
> +       return NULL;
> +      if (DF_REF_CLASS (ref_link->ref) != DF_REF_REGULAR)
> +       return NULL;
> +    }
> +
> +  if (success != NULL)
> +    *success = true;
> +
> +  return ref_chain;
> +}
> +
> +/* Recursive function that computes the foldable offsets through the
> +   definitions of REG in INSN given an integer scale factor SCALE.
> +   Returns the offset that would have to be added if all instructions
> +   in PENDING_DELETES were to be deleted.
> +
> +  - if ANALYZE is true then it recurses through definitions with the common
> +    code and marks eligible for folding instructions in the bitmap
> +    can_fold_insn.  An instruction is eligible if all it's uses are also
> +    eligible.  Initially can_fold_insn is true for memory accesses.
> +
> +  - if ANALYZE is false then it recurses through definitions with the common
> +    code and computes and returns the offset that would result from folding
> +    the instructions in PENDING_DELETES were to be deleted.  */
> +static HOST_WIDE_INT
> +fold_offsets (rtx_insn* insn, rtx reg, int scale, bool analyze,
> +             bitmap pending_deletes)
> +{
> +  rtx_insn* def = get_single_def_in_bb (insn, reg);
> +
> +  if (!def)
> +    return 0;
> +
> +  rtx set = single_set (def);
> +
> +  if (!set)
> +    return 0;
> +
> +  rtx src = SET_SRC (set);
> +  rtx dest = SET_DEST (set);
> +
> +  enum rtx_code code = GET_CODE (src);
> +
> +  /* Return early for SRC codes that we don't know how to handle.  */
> +  if (code != PLUS && code != ASHIFT && code != REG)
> +    return 0;
> +
> +  unsigned int dest_regno = REGNO (dest);
> +
> +  /* We don't want to fold offsets from instructions that change some
> +     particular registers with potentially global side effects.  */
> +  if (!GP_REG_P (dest_regno)
> +      || dest_regno == STACK_POINTER_REGNUM
> +      || (frame_pointer_needed && dest_regno == HARD_FRAME_POINTER_REGNUM)
> +      || dest_regno == GP_REGNUM
> +      || dest_regno == THREAD_POINTER_REGNUM
> +      || dest_regno == RETURN_ADDR_REGNUM)
> +    return 0;
> +
> +  if (analyze)
> +    {
> +      /* We can only fold through instructions that are eventually used as
> +        memory addresses and do not have other uses.  Use the same logic
> +        from the offset calculation to visit instructions that can
> +        propagate offsets and keep track in can_fold_insn which have uses
> +        that end always in memory instructions.  */
> +
> +      if (REG_P (dest))
> +       {
> +         bool success;
> +         struct df_link *uses = get_uses (def, dest, &success), *ref_link;
> +
> +         if (!success)
> +           return 0;
> +
> +         for (ref_link = uses; ref_link; ref_link = ref_link->next)
> +           {
> +             rtx_insn* use = DF_REF_INSN (ref_link->ref);
> +
> +             /* Ignore debug insns during analysis.  */
> +             if (DEBUG_INSN_P (use))
> +               continue;
> +
> +             if (!bitmap_bit_p (&can_fold_insn, INSN_UID (use)))
> +               return 0;
> +
> +             rtx use_set = single_set (use);
> +
> +             /* Prevent folding when a memory store uses the dest register.  */
> +             if (use_set
> +                 && MEM_P (SET_DEST (use_set))
> +                 && REG_P (SET_SRC (use_set))
> +                 && REGNO (SET_SRC (use_set)) == REGNO (dest))
> +               return 0;
> +           }
> +
> +         bitmap_set_bit (&can_fold_insn, INSN_UID (def));
> +       }
> +    }
> +
> +  if (!bitmap_bit_p (&can_fold_insn, INSN_UID (def)))
> +    return 0;
> +
> +  switch (code)
> +    {
> +    case PLUS:
> +      {
> +       /* Propagate through add.  */
> +       rtx arg1 = XEXP (src, 0);
> +       rtx arg2 = XEXP (src, 1);
> +
> +       HOST_WIDE_INT offset = 0;
> +
> +       if (REG_P (arg1))
> +         offset += fold_offsets (def, arg1, 1, analyze, pending_deletes);
> +       else if (GET_CODE (arg1) == ASHIFT && REG_P (XEXP (arg1, 0))
> +                && CONST_INT_P (XEXP (arg1, 1)))
> +         {
> +           /* Also handle shift-and-add from the zbb extension.  */
> +           int shift_scale = (1 << (int) INTVAL (XEXP (arg1, 1)));
> +           offset += fold_offsets (def, XEXP (arg1, 0), shift_scale, analyze,
> +                                   pending_deletes);
> +         }
> +
> +       if (REG_P (arg2))
> +         offset += fold_offsets (def, arg2, 1, analyze, pending_deletes);
> +       else if (CONST_INT_P (arg2) && !analyze)
> +         {
> +           offset += INTVAL (arg2);
> +           bitmap_set_bit (pending_deletes, INSN_UID (def));
> +         }
> +
> +       return scale * offset;
> +      }
> +    case ASHIFT:
> +      {
> +       /* Propagate through sll.  */
> +       rtx arg1 = XEXP (src, 0);
> +       rtx arg2 = XEXP (src, 1);
> +
> +       if (REG_P (arg1) && CONST_INT_P (arg2))
> +         {
> +           int shift_scale = (1 << (int) INTVAL (arg2));
> +           return scale * fold_offsets (def, arg1, shift_scale, analyze,
> +                                        pending_deletes);
> +         }
> +
> +       return 0;
> +      }
> +    case REG:
> +      /* Propagate through mv.  */
> +      return scale * fold_offsets (def, src, 1, analyze, pending_deletes);
> +    default:
> +      /* Cannot propagate.  */
> +      return 0;
> +    }
> +}
> +
> +/* Helper function for fold_offset_mem.
> +   If INSN is a set rtx that loads from or stores to
> +   some memory location that could have an offset folded
> +   to it, return the rtx for the memory operand.  */
> +static rtx
> +get_foldable_mem_rtx (rtx_insn* insn)
> +{
> +  rtx set = single_set (insn);
> +
> +  if (set != NULL_RTX)
> +    {
> +      rtx src = SET_SRC (set);
> +      rtx dest = SET_DEST (set);
> +
> +      /* We don't want folding if the memory has
> +        unspec/unspec volatile in either src or dest.
> +        In particular this also prevents folding
> +        when atomics are involved.  */
> +      if (GET_CODE (src) == UNSPEC
> +         || GET_CODE (src) == UNSPEC_VOLATILE
> +         || GET_CODE (dest) == UNSPEC
> +         || GET_CODE (dest) == UNSPEC_VOLATILE)
> +       return NULL;
> +
> +      if (MEM_P (src))
> +       return src;
> +      else if (MEM_P (dest))
> +       return dest;
> +      else if ((
> +               GET_CODE (src) == SIGN_EXTEND
> +               || GET_CODE (src) == ZERO_EXTEND
> +             )
> +             && MEM_P (XEXP (src, 0)))
> +       return XEXP (src, 0);
> +    }
> +
> +  return NULL;
> +}
> +
> +/* Driver function that performs the actions defined by PHASE for INSN.  */
> +static void
> +fold_offset_mem (rtx_insn* insn, int phase)
> +{
> +  if (phase == FM_PHASE_COMMIT_INSNS)
> +    {
> +      if (bitmap_bit_p (&pending_remove_insn, INSN_UID (insn))
> +         && !bitmap_bit_p (&cannot_remove_insn, INSN_UID (insn)))
> +       {
> +         rtx set = single_set (insn);
> +         rtx src = SET_SRC (set);
> +         rtx dest = SET_DEST (set);
> +         rtx arg1 = XEXP (src, 0);
> +
> +         /* INSN is an add immidiate addi DEST, SRC1, SRC2 that we
> +            must replace with addi DEST, SRC1, 0.  */
> +         if (XEXP (src, 0) == stack_pointer_rtx)
> +           stats_folded_sp++;
> +         else
> +           stats_folded_other++;
> +
> +         if (dump_file)
> +           {
> +             fprintf (dump_file, "Instruction deleted from folding:");
> +             print_rtl_single (dump_file, insn);
> +           }
> +
> +         if (REGNO (dest) != REGNO (arg1))
> +           {
> +             /* If the dest register is different than the fisrt argument
> +                then the addition with constant 0 is equivalent to a move
> +                instruction.  We emit the move and let the subsequent
> +                pass cprop_hardreg eliminate that if possible.  */
> +             rtx arg1_reg_rtx = gen_rtx_REG (GET_MODE (dest), REGNO (arg1));
> +             rtx mov_rtx = gen_move_insn (dest, arg1_reg_rtx);
> +             df_insn_rescan (emit_insn_after (mov_rtx, insn));
> +           }
> +
> +         /* If the dest register is the same with the first argument
> +            then the addition with constant 0 is a no-op.
> +            We can now delete the original add immidiate instruction.  */
> +         delete_insn (insn);
> +       }
> +    }
> +  else
> +    {
> +      rtx mem = get_foldable_mem_rtx (insn);
> +
> +      if (!mem)
> +       return;
> +
> +      rtx mem_addr = XEXP (mem, 0);
> +      rtx reg;
> +      HOST_WIDE_INT cur_off;
> +
> +      if (REG_P (mem_addr))
> +       {
> +         reg = mem_addr;
> +         cur_off = 0;
> +       }
> +      else if (GET_CODE (mem_addr) == PLUS
> +              && REG_P (XEXP (mem_addr, 0))
> +              && CONST_INT_P (XEXP (mem_addr, 1)))
> +       {
> +         reg = XEXP (mem_addr, 0);
> +         cur_off = INTVAL (XEXP (mem_addr, 1));
> +       }
> +      else
> +       return;
> +
> +      if (phase == FM_PHASE_ANALYSIS)
> +       {
> +         bitmap_set_bit (&can_fold_insn, INSN_UID (insn));
> +         fold_offsets (insn, reg, 1, true, NULL);
> +       }
> +      else if (phase == FM_PHASE_VALIDITY)
> +       {
> +         bitmap_head new_pending_deletes;
> +         bitmap_initialize (&new_pending_deletes, NULL);
> +         HOST_WIDE_INT offset = cur_off + fold_offsets (insn, reg, 1, false,
> +                                                       &new_pending_deletes);
> +
> +         /* Temporarily change the offset in MEM to test whether
> +            it results in a valid instruction.  */
> +         machine_mode mode = GET_MODE (mem_addr);
> +         XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, GEN_INT (offset));
> +
> +         bool valid_change = recog (PATTERN (insn), insn, 0) >= 0;
> +
> +         /* Restore the instruction.  */
> +         XEXP (mem, 0) = mem_addr;
> +
> +         if (valid_change)
> +           bitmap_ior_into (&pending_remove_insn, &new_pending_deletes);
> +         else
> +           bitmap_ior_into (&cannot_remove_insn, &new_pending_deletes);
> +         bitmap_release (&new_pending_deletes);
> +       }
> +      else if (phase == FM_PHASE_COMMIT_OFFSETS)
> +       {
> +         bitmap_head required_deletes;
> +         bitmap_initialize (&required_deletes, NULL);
> +         HOST_WIDE_INT offset = cur_off + fold_offsets (insn, reg, 1, false,
> +                                                        &required_deletes);
> +         bool illegal = bitmap_intersect_p (&required_deletes,
> +                                            &cannot_remove_insn);
> +
> +         if (offset == cur_off)
> +           return;
> +
> +         gcc_assert (!bitmap_empty_p (&required_deletes));
> +
> +         /* We have to update CANNOT_REMOVE_INSN again if transforming
> +            this instruction is illegal.  */
> +         if (illegal)
> +           bitmap_ior_into (&cannot_remove_insn, &required_deletes);
> +         else
> +           {
> +             machine_mode mode = GET_MODE (mem_addr);
> +             XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, GEN_INT (offset));
> +             df_insn_rescan (insn);
> +
> +             if (dump_file)
> +               {
> +                 fprintf (dump_file, "Memory offset changed from "
> +                                     HOST_WIDE_INT_PRINT_DEC
> +                                     " to "
> +                                     HOST_WIDE_INT_PRINT_DEC
> +                                     " for instruction:\n", cur_off, offset);
> +                       print_rtl_single (dump_file, insn);
> +               }
> +           }
> +         bitmap_release (&required_deletes);
> +       }
> +    }
> +}
> +
> +unsigned int
> +pass_fold_mem_offsets::execute (function *fn)
> +{
> +  basic_block bb;
> +  rtx_insn *insn;
> +
> +  df_set_flags (DF_RD_PRUNE_DEAD_DEFS | DF_DEFER_INSN_RESCAN);
> +  df_chain_add_problem (DF_UD_CHAIN + DF_DU_CHAIN);
> +  df_analyze ();
> +
> +  bitmap_initialize (&can_fold_insn, NULL);
> +  bitmap_initialize (&pending_remove_insn, NULL);
> +  bitmap_initialize (&cannot_remove_insn, NULL);
> +
> +  stats_folded_sp = 0;
> +  stats_folded_other = 0;
> +
> +  FOR_ALL_BB_FN (bb, fn)
> +    {
> +      /* The shorten-memrefs pass runs when a BB is optimized for size
> +        and moves offsets from multiple memory instructions to a common
> +        add instruction.  Disable folding if optimizing for size because
> +        this pass will cancel the effects of shorten-memrefs.  */
> +      if (optimize_bb_for_size_p (bb))
> +       continue;
> +
> +      bitmap_clear (&can_fold_insn);
> +      bitmap_clear (&pending_remove_insn);
> +      bitmap_clear (&cannot_remove_insn);
> +
> +      FOR_BB_INSNS (bb, insn)
> +       fold_offset_mem (insn, FM_PHASE_ANALYSIS);
> +
> +      FOR_BB_INSNS (bb, insn)
> +       fold_offset_mem (insn, FM_PHASE_VALIDITY);
> +
> +      FOR_BB_INSNS (bb, insn)
> +       fold_offset_mem (insn, FM_PHASE_COMMIT_OFFSETS);
> +
> +      FOR_BB_INSNS (bb, insn)
> +       fold_offset_mem (insn, FM_PHASE_COMMIT_INSNS);
> +    }
> +
> +  statistics_counter_event (cfun, "addi with sp fold", stats_folded_sp);
> +  statistics_counter_event (cfun, "other addi fold", stats_folded_other);
> +
> +  bitmap_release (&can_fold_insn);
> +  bitmap_release (&pending_remove_insn);
> +  bitmap_release (&cannot_remove_insn);
> +
> +  return 0;
> +}
> +
> +} // anon namespace
> +
> +rtl_opt_pass *
> +make_pass_fold_mem_offsets (gcc::context *ctxt)
> +{
> +  return new pass_fold_mem_offsets (ctxt);
> +}
> diff --git a/gcc/config/riscv/riscv-passes.def b/gcc/config/riscv/riscv-passes.def
> index 4084122cf0a..dc08daadc66 100644
> --- a/gcc/config/riscv/riscv-passes.def
> +++ b/gcc/config/riscv/riscv-passes.def
> @@ -18,4 +18,5 @@
>     <http://www.gnu.org/licenses/>.  */
>
>  INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
> +INSERT_PASS_AFTER (pass_regrename, 1, pass_fold_mem_offsets);
>  INSERT_PASS_BEFORE (pass_fast_rtl_dce, 1, pass_vsetvl);
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 5f78fd579bb..b89a82adb0e 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -104,6 +104,7 @@ extern void riscv_parse_arch_string (const char *, struct gcc_options *, locatio
>  extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
>
>  rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
> +rtl_opt_pass * make_pass_fold_mem_offsets (gcc::context *ctxt);
>  rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
>
>  /* Information about one CPU we know about.  */
> diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
> index 63d4710cb15..5e1fbdbedcc 100644
> --- a/gcc/config/riscv/riscv.opt
> +++ b/gcc/config/riscv/riscv.opt
> @@ -105,6 +105,10 @@ Convert BASE + LARGE_OFFSET addresses to NEW_BASE + SMALL_OFFSET to allow more
>  memory accesses to be generated as compressed instructions.  Currently targets
>  32-bit integer load/stores.
>
> +mfold-mem-offsets
> +Target Bool Var(riscv_mfold_mem_offsets) Init(1)
> +Fold instructions that calculate memory offsets into the memory access instructions when possible.
> +
>  mcmodel=
>  Target RejectNegative Joined Enum(code_model) Var(riscv_cmodel) Init(TARGET_DEFAULT_CMODEL)
>  Specify the code model.
> diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
> index 1252d6f851a..f29cf463867 100644
> --- a/gcc/config/riscv/t-riscv
> +++ b/gcc/config/riscv/t-riscv
> @@ -76,6 +76,10 @@ riscv-shorten-memrefs.o: $(srcdir)/config/riscv/riscv-shorten-memrefs.cc \
>         $(COMPILE) $<
>         $(POSTCOMPILE)
>
> +riscv-fold-mem-offsets.o: $(srcdir)/config/riscv/riscv-fold-mem-offsets.cc
> +       $(COMPILE) $<
> +       $(POSTCOMPILE)
> +
>  riscv-selftests.o: $(srcdir)/config/riscv/riscv-selftests.cc \
>    $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(TREE_H) output.h \
>    $(C_COMMON_H) $(TARGET_H) $(OPTABS_H) $(EXPR_H) $(INSN_ATTR_H) $(EMIT_RTL_H)
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index ee78591c73e..39b57cab595 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -1218,6 +1218,7 @@ See RS/6000 and PowerPC Options.
>  -msmall-data-limit=@var{N-bytes}
>  -msave-restore  -mno-save-restore
>  -mshorten-memrefs  -mno-shorten-memrefs
> +-mfold-mem-offsets  -mno-fold-mem-offsets
>  -mstrict-align  -mno-strict-align
>  -mcmodel=medlow  -mcmodel=medany
>  -mexplicit-relocs  -mno-explicit-relocs
> @@ -29048,6 +29049,13 @@ of 'new base + small offset'.  If the new base gets stored in a compressed
>  register, then the new load/store can be compressed.  Currently targets 32-bit
>  integer load/stores only.
>
> +@opindex mfold-mem-offsets
> +@item -mfold-mem-offsets
> +@itemx -mno-fold-mem-offsets
> +Do or do not attempt to move constant addition calculations used for memory
> +offsets into the corresponding memory instructions.  The default is
> +@option{-mfold-mem-offsets} at levels @option{-O2}, @option{-O3}.
> +
>  @opindex mstrict-align
>  @item -mstrict-align
>  @itemx -mno-strict-align
> diff --git a/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
> new file mode 100644
> index 00000000000..574cc92b6ab
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mfold-mem-offsets" } */
> +
> +void sink(int arr[2]);
> +
> +void
> +foo(int a, int b, int i)
> +{
> +  int arr[2] = {a, b};
> +  arr[i]++;
> +  sink(arr);
> +}
> +
> +// Should compile without negative memory offsets when using -mfold-mem-offsets
> +/* { dg-final { scan-assembler-not "lw\t.*,-.*\\(.*\\)" } } */
> +/* { dg-final { scan-assembler-not "sw\t.*,-.*\\(.*\\)" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
> new file mode 100644
> index 00000000000..e6c251d3a3c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mfold-mem-offsets" } */
> +
> +void sink(int arr[3]);
> +
> +void
> +foo(int a, int b, int c, int i)
> +{
> +  int arr1[3] = {a, b, c};
> +  int arr2[3] = {a, c, b};
> +  int arr3[3] = {c, b, a};
> +
> +  arr1[i]++;
> +  arr2[i]++;
> +  arr3[i]++;
> +
> +  sink(arr1);
> +  sink(arr2);
> +  sink(arr3);
> +}
> +
> +// Should compile without negative memory offsets when using -mfold-mem-offsets
> +/* { dg-final { scan-assembler-not "lw\t.*,-.*\\(.*\\)" } } */
> +/* { dg-final { scan-assembler-not "sw\t.*,-.*\\(.*\\)" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c
> new file mode 100644
> index 00000000000..8586d3e3a29
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mfold-mem-offsets" } */
> +
> +void load(int arr[2]);
> +
> +int
> +foo(long unsigned int i)
> +{
> +  int arr[2];
> +  load(arr);
> +
> +  return arr[3 * i + 77];
> +}
> +
> +// Should compile without negative memory offsets when using -mfold-mem-offsets
> +/* { dg-final { scan-assembler-not "lw\t.*,-.*\\(.*\\)" } } */
> +/* { dg-final { scan-assembler-not "addi\t.*,.*,77" } } */
> \ No newline at end of file
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.
  2023-05-25 13:01   ` Richard Biener
@ 2023-05-25 13:25     ` Manolis Tsamis
  2023-05-25 13:31     ` Jeff Law
  1 sibling, 0 replies; 45+ messages in thread
From: Manolis Tsamis @ 2023-05-25 13:25 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Palmer Dabbelt, Philipp Tomsich, Kito Cheng

On Thu, May 25, 2023 at 4:03 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Thu, May 25, 2023 at 2:36 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
> >
> > Implementation of the new RISC-V optimization pass for memory offset
> > calculations, documentation and testcases.
>
> Why do fwprop or combine not do what you want to do?
>

Hi Richard,

At least in my experiments, the existing mechanisms (fwprop,
combine, ...) cannot handle the more difficult cases for which this
pass was created.

As can be seen in the example presented in the cover letter, this
pass is designed to handle partially-overlapping offset calculations
and multiple memory operations that share intermediate calculations,
while also respecting the offset range restrictions.
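
For illustration, here is a simplified, hypothetical sketch (not
taken from the patch) in which a single add immediate feeds several
memory accesses:

  addi a5,sp,32     # single add immediate
  lw   a0,0(a5)
  lw   a1,4(a5)
  sw   a0,8(a5)

To delete the addi, all three offsets have to be rewritten together
(with cprop_hardreg then propagating sp into the uses), and each
resulting immediate must still fit in the 12-bit range:

  lw   a0,32(sp)
  lw   a1,36(sp)
  sw   a0,40(sp)

As far as I understand, combine matches a definition against a small
number of uses at a time, so it cannot perform this kind of
group-wide rewrite.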
In addition, some offset calculations are introduced late enough
(mostly those involving the stack pointer, local variables, etc.)
that I think fwprop cannot do anything about them.  Please correct me
if I am wrong.

Before implementing this I analyzed the code generated for
benchmarks and found that many of the harder cases are missed; they
require more powerful analysis and cannot be handled by combine.
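
For reference, the effect of the pass can be inspected in its RTL
dump.  Assuming the standard dump naming for target passes (and a
hypothetical cross-compiler name), something like:

  riscv64-unknown-elf-gcc -O2 -mfold-mem-offsets \
    -fdump-rtl-fold_mem_offsets test.c

will print a "Memory offset changed from X to Y" line for each folded
access and an "Instruction deleted from folding" line for each
removed add immediate.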

Thanks,
Manolis

> > gcc/ChangeLog:
> >
> >         * config.gcc: Add riscv-fold-mem-offsets.o to extra_objs.
> >         * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Schedule a new
> >         pass.
> >         * config/riscv/riscv-protos.h (make_pass_fold_mem_offsets): Declare.
> >         * config/riscv/riscv.opt: New options.
> >         * config/riscv/t-riscv: New build rule.
> >         * doc/invoke.texi: Document new option.
> >         * config/riscv/riscv-fold-mem-offsets.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/riscv/fold-mem-offsets-1.c: New test.
> >         * gcc.target/riscv/fold-mem-offsets-2.c: New test.
> >         * gcc.target/riscv/fold-mem-offsets-3.c: New test.
> >
> > Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
> > ---
> >
> >  gcc/config.gcc                                |   2 +-
> >  gcc/config/riscv/riscv-fold-mem-offsets.cc    | 637 ++++++++++++++++++
> >  gcc/config/riscv/riscv-passes.def             |   1 +
> >  gcc/config/riscv/riscv-protos.h               |   1 +
> >  gcc/config/riscv/riscv.opt                    |   4 +
> >  gcc/config/riscv/t-riscv                      |   4 +
> >  gcc/doc/invoke.texi                           |   8 +
> >  .../gcc.target/riscv/fold-mem-offsets-1.c     |  16 +
> >  .../gcc.target/riscv/fold-mem-offsets-2.c     |  24 +
> >  .../gcc.target/riscv/fold-mem-offsets-3.c     |  17 +
> >  10 files changed, 713 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/config/riscv/riscv-fold-mem-offsets.cc
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c
> >
> > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > index d88071773c9..5dffd21b4c8 100644
> > --- a/gcc/config.gcc
> > +++ b/gcc/config.gcc
> > @@ -529,7 +529,7 @@ pru-*-*)
> >         ;;
> >  riscv*)
> >         cpu_type=riscv
> > -       extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o"
> > +       extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-fold-mem-offsets.o riscv-selftests.o riscv-v.o riscv-vsetvl.o"
> >         extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
> >         extra_objs="${extra_objs} thead.o"
> >         d_target_objs="riscv-d.o"
> > diff --git a/gcc/config/riscv/riscv-fold-mem-offsets.cc b/gcc/config/riscv/riscv-fold-mem-offsets.cc
> > new file mode 100644
> > index 00000000000..81325bb3beb
> > --- /dev/null
> > +++ b/gcc/config/riscv/riscv-fold-mem-offsets.cc
> > @@ -0,0 +1,637 @@
> > +/* Fold memory offsets pass for RISC-V.
> > +   Copyright (C) 2022 Free Software Foundation, Inc.
> > +
> > +This file is part of GCC.
> > +
> > +GCC is free software; you can redistribute it and/or modify
> > +it under the terms of the GNU General Public License as published by
> > +the Free Software Foundation; either version 3, or (at your option)
> > +any later version.
> > +
> > +GCC is distributed in the hope that it will be useful,
> > +but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +GNU General Public License for more details.
> > +
> > +You should have received a copy of the GNU General Public License
> > +along with GCC; see the file COPYING3.  If not see
> > +<http://www.gnu.org/licenses/>.  */
> > +
> > +#define IN_TARGET_CODE 1
> > +
> > +#include "config.h"
> > +#include "system.h"
> > +#include "coretypes.h"
> > +#include "tm.h"
> > +#include "rtl.h"
> > +#include "tree.h"
> > +#include "expr.h"
> > +#include "backend.h"
> > +#include "regs.h"
> > +#include "target.h"
> > +#include "memmodel.h"
> > +#include "emit-rtl.h"
> > +#include "insn-config.h"
> > +#include "recog.h"
> > +#include "predict.h"
> > +#include "df.h"
> > +#include "tree-pass.h"
> > +#include "cfgrtl.h"
> > +
> > +/* This pass tries to optimize memory offset calculations by moving them
> > +   from add immediate instructions to the memory loads/stores.
> > +   For example it can transform this:
> > +
> > +     addi t4,sp,16
> > +     add  t2,a6,t4
> > +     shl  t3,t2,1
> > +     ld   a2,0(t3)
> > +     addi a2,1
> > +     sd   a2,8(t2)
> > +
> > +   into the following (one instruction less):
> > +
> > +     add  t2,a6,sp
> > +     shl  t3,t2,1
> > +     ld   a2,32(t3)
> > +     addi a2,1
> > +     sd   a2,24(t2)
> > +
> > +   Usually, the code generated from the previous passes tries to have the
> > +   offsets in the memory instructions but this pass is still beneficial
> > +   because:
> > +
> > +    - There are cases where add instructions are added in a late rtl pass
> > +      and the rest of the pipeline cannot eliminate them.  Specifically,
> > +      arrays and structs allocated on the stack can result in multiple
> > +      unnecessary add instructions that cannot be eliminated easily
> > +      otherwise.
> > +
> > +    - The existing mechanisms that move offsets to memory instructions
> > +      usually apply only to specific patterns or have other limitations.
> > +      This pass is very generic and can fold offsets through complex
> > +      calculations with multiple memory uses and partially overlapping
> > +      calculations.  As a result it can eliminate more instructions than
> > +      what is possible otherwise.
> > +
> > +   This pass runs inside a single basic blocks and consists of 4 phases:
> > +
> > +    - Phase 1 (Analysis): Find "foldable" instructions.
> > +      Foldable instructions are those that we know how to propagate
> > +      a constant addition through (add, slli, mv, ...) and only have other
> > +      foldable instructions for uses.  In that phase a DFS traversal on the
> > +      definition tree is performed and foldable instructions are marked on
> > +      a bitmap.  The add immediate instructions that are reachable in this
> > +      DFS are candidates for removal since all the intermediate
> > +      calculations affected by them are also foldable.
> > +
> > +    - Phase 2 (Validity): Traverse again, this time calculating the
> > +      offsets that would result from folding all add immediate instructions
> > +      found.  Also keep track of which instructions will be folded for this
> > +      particular offset because folding can be partially or completely
> > +      shared across an number of different memory instructions.  At this point,
> > +      since we calculated the actual offset resulting from folding, we check
> > +      and keep track if it's a valid 12-bit immediate.
> > +
> > +    - Phase 3 (Commit offsets): Traverse again.  This time it is known if
> > +      a particular fold is valid so actually fold the offset by changing
> > +      the RTL statement.  It's important that this phase is separate from the
> > +      previous because one instruction that is foldable with a valid offset
> > +      can become result in an invalid offset for another instruction later on.
> > +
> > +    - Phase 4 (Commit instruction deletions): Scan all insns and delete
> > +      all add immediate instructions that were folded.  */
> > +
> > +namespace {
> > +
> > +const pass_data pass_data_fold_mem =
> > +{
> > +  RTL_PASS, /* type */
> > +  "fold_mem_offsets", /* name */
> > +  OPTGROUP_NONE, /* optinfo_flags */
> > +  TV_NONE, /* tv_id */
> > +  0, /* properties_required */
> > +  0, /* properties_provided */
> > +  0, /* properties_destroyed */
> > +  0, /* todo_flags_start */
> > +  TODO_df_finish, /* todo_flags_finish */
> > +};
> > +
> > +class pass_fold_mem_offsets : public rtl_opt_pass
> > +{
> > +public:
> > +  pass_fold_mem_offsets (gcc::context *ctxt)
> > +    : rtl_opt_pass (pass_data_fold_mem, ctxt)
> > +  {}
> > +
> > +  /* opt_pass methods: */
> > +  virtual bool gate (function *)
> > +    {
> > +      return riscv_mfold_mem_offsets
> > +              && optimize >= 2;
> > +    }
> > +
> > +  virtual unsigned int execute (function *);
> > +}; // class pass_fold_mem_offsets
> > +
> > +/* Bitmap that tracks which instructions are reachable through sequences
> > +   of foldable instructions.  */
> > +static bitmap_head can_fold_insn;
> > +
> > +/* Bitmap with instructions marked for deletion due to folding.  */
> > +static bitmap_head pending_remove_insn;
> > +
> > +/* Bitmap with instructions that cannot be deleted because that would
> > +   require folding an offset that's invalid in some memory access.
> > +   An instruction can be in both PENDING_REMOVE_INSN and CANNOT_REMOVE_INSN
> > +   at the same time, in which case it cannot be safely deleted.  */
> > +static bitmap_head cannot_remove_insn;
> > +
> > +/* The number of folded addi instructions of the form "addi reg, sp, X".  */
> > +static int stats_folded_sp;
> > +
> > +/* The number of the rest folded addi instructions.  */
> > +static int stats_folded_other;
> > +
> > +enum fold_mem_phase
> > +{
> > +  FM_PHASE_ANALYSIS,
> > +  FM_PHASE_VALIDITY,
> > +  FM_PHASE_COMMIT_OFFSETS,
> > +  FM_PHASE_COMMIT_INSNS
> > +};
> > +
> > +/* Helper function for fold_offsets.
> > +  Get the single reaching definition of an instruction inside a BB.
> > +  The definition is desired for REG used in INSN.
> > +  Return the definition insn or NULL if there's no definition with
> > +  the desired criteria.  */
> > +static rtx_insn*
> > +get_single_def_in_bb (rtx_insn *insn, rtx reg)
> > +{
> > +  df_ref use;
> > +  struct df_link *ref_chain, *ref_link;
> > +
> > +  FOR_EACH_INSN_USE (use, insn)
> > +    {
> > +      if (GET_CODE (DF_REF_REG (use)) == SUBREG)
> > +       return NULL;
> > +      if (REGNO (DF_REF_REG (use)) == REGNO (reg))
> > +       break;
> > +    }
> > +
> > +  if (!use)
> > +    return NULL;
> > +
> > +  ref_chain = DF_REF_CHAIN (use);
> > +
> > +  if (!ref_chain)
> > +    return NULL;
> > +
> > +  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
> > +    {
> > +      /* Problem getting some definition for this instruction.  */
> > +      if (ref_link->ref == NULL)
> > +       return NULL;
> > +      if (DF_REF_INSN_INFO (ref_link->ref) == NULL)
> > +       return NULL;
> > +      if (global_regs[REGNO (reg)]
> > +         && !set_of (reg, DF_REF_INSN (ref_link->ref)))
> > +       return NULL;
> > +    }
> > +
> > +  if (ref_chain->next)
> > +    return NULL;
> > +
> > +  rtx_insn* def = DF_REF_INSN (ref_chain->ref);
> > +
> > +  if (BLOCK_FOR_INSN (def) != BLOCK_FOR_INSN (insn))
> > +    return NULL;
> > +
> > +  if (DF_INSN_LUID (def) > DF_INSN_LUID (insn))
> > +    return NULL;
> > +
> > +  return def;
> > +}
> > +
> > +/* Helper function for fold_offsets.
> > +   Get all the reaching uses of an instruction.  The uses are desired for REG
> > +   set in INSN.  Return use list or NULL if a use is missing or irregular.
> > +   If SUCCESS is not NULL then it's value is set to false if there are
> > +   missing or irregular uses and to true otherwise.  */
> > +static struct df_link*
> > +get_uses (rtx_insn *insn, rtx reg, bool* success)
> > +{
> > +  df_ref def;
> > +  struct df_link *ref_chain, *ref_link;
> > +
> > +  if (success != NULL)
> > +    *success = false;
> > +
> > +  FOR_EACH_INSN_DEF (def, insn)
> > +    if (REGNO (DF_REF_REG (def)) == REGNO (reg))
> > +      break;
> > +
> > +  if (!def)
> > +    return NULL;
> > +
> > +  ref_chain = DF_REF_CHAIN (def);
> > +
> > +  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
> > +    {
> > +      /* Problem getting some use for this instruction.  */
> > +      if (ref_link->ref == NULL)
> > +       return NULL;
> > +      if (DF_REF_CLASS (ref_link->ref) != DF_REF_REGULAR)
> > +       return NULL;
> > +    }
> > +
> > +  if (success != NULL)
> > +    *success = true;
> > +
> > +  return ref_chain;
> > +}
> > +
> > +/* Recursive function that computes the foldable offsets through the
> > +   definitions of REG in INSN given an integer scale factor SCALE.
> > +   Returns the offset that would have to be added if all instructions
> > +   in PENDING_DELETES were to be deleted.
> > +
> > +  - if ANALYZE is true then it recurses through definitions with the common
> > +    code and marks eligible for folding instructions in the bitmap
> > +    can_fold_insn.  An instruction is eligible if all it's uses are also
> > +    eligible.  Initially can_fold_insn is true for memory accesses.
> > +
> > +  - if ANALYZE is false then it recurses through definitions with the common
> > +    code and computes and returns the offset that would result from folding
> > +    the instructions in PENDING_DELETES were to be deleted.  */
> > +static HOST_WIDE_INT
> > +fold_offsets (rtx_insn* insn, rtx reg, int scale, bool analyze,
> > +             bitmap pending_deletes)
> > +{
> > +  rtx_insn* def = get_single_def_in_bb (insn, reg);
> > +
> > +  if (!def)
> > +    return 0;
> > +
> > +  rtx set = single_set (def);
> > +
> > +  if (!set)
> > +    return 0;
> > +
> > +  rtx src = SET_SRC (set);
> > +  rtx dest = SET_DEST (set);
> > +
> > +  enum rtx_code code = GET_CODE (src);
> > +
> > +  /* Return early for SRC codes that we don't know how to handle.  */
> > +  if (code != PLUS && code != ASHIFT && code != REG)
> > +    return 0;
> > +
> > +  unsigned int dest_regno = REGNO (dest);
> > +
> > +  /* We don't want to fold offsets from instructions that change some
> > +     particular registers with potentially global side effects.  */
> > +  if (!GP_REG_P (dest_regno)
> > +      || dest_regno == STACK_POINTER_REGNUM
> > +      || (frame_pointer_needed && dest_regno == HARD_FRAME_POINTER_REGNUM)
> > +      || dest_regno == GP_REGNUM
> > +      || dest_regno == THREAD_POINTER_REGNUM
> > +      || dest_regno == RETURN_ADDR_REGNUM)
> > +    return 0;
> > +
> > +  if (analyze)
> > +    {
> > +      /* We can only fold through instructions that are eventually used as
> > +        memory addresses and do not have other uses.  Use the same logic
> > +        from the offset calculation to visit instructions that can
> > +        propagate offsets and keep track in can_fold_insn which have uses
> > +        that end always in memory instructions.  */
> > +
> > +      if (REG_P (dest))
> > +       {
> > +         bool success;
> > +         struct df_link *uses = get_uses (def, dest, &success), *ref_link;
> > +
> > +         if (!success)
> > +           return 0;
> > +
> > +         for (ref_link = uses; ref_link; ref_link = ref_link->next)
> > +           {
> > +             rtx_insn* use = DF_REF_INSN (ref_link->ref);
> > +
> > +             /* Ignore debug insns during analysis.  */
> > +             if (DEBUG_INSN_P (use))
> > +               continue;
> > +
> > +             if (!bitmap_bit_p (&can_fold_insn, INSN_UID (use)))
> > +               return 0;
> > +
> > +             rtx use_set = single_set (use);
> > +
> > +             /* Prevent folding when a memory store uses the dest register.  */
> > +             if (use_set
> > +                 && MEM_P (SET_DEST (use_set))
> > +                 && REG_P (SET_SRC (use_set))
> > +                 && REGNO (SET_SRC (use_set)) == REGNO (dest))
> > +               return 0;
> > +           }
> > +
> > +         bitmap_set_bit (&can_fold_insn, INSN_UID (def));
> > +       }
> > +    }
> > +
> > +  if (!bitmap_bit_p (&can_fold_insn, INSN_UID (def)))
> > +    return 0;
> > +
> > +  switch (code)
> > +    {
> > +    case PLUS:
> > +      {
> > +       /* Propagate through add.  */
> > +       rtx arg1 = XEXP (src, 0);
> > +       rtx arg2 = XEXP (src, 1);
> > +
> > +       HOST_WIDE_INT offset = 0;
> > +
> > +       if (REG_P (arg1))
> > +         offset += fold_offsets (def, arg1, 1, analyze, pending_deletes);
> > +       else if (GET_CODE (arg1) == ASHIFT && REG_P (XEXP (arg1, 0))
> > +                && CONST_INT_P (XEXP (arg1, 1)))
> > +         {
> > +           /* Also handle shift-and-add from the zba extension.  */
> > +           int shift_scale = (1 << (int) INTVAL (XEXP (arg1, 1)));
> > +           offset += fold_offsets (def, XEXP (arg1, 0), shift_scale, analyze,
> > +                                   pending_deletes);
> > +         }
> > +
> > +       if (REG_P (arg2))
> > +         offset += fold_offsets (def, arg2, 1, analyze, pending_deletes);
> > +       else if (CONST_INT_P (arg2) && !analyze)
> > +         {
> > +           offset += INTVAL (arg2);
> > +           bitmap_set_bit (pending_deletes, INSN_UID (def));
> > +         }
> > +
> > +       return scale * offset;
> > +      }
> > +    case ASHIFT:
> > +      {
> > +       /* Propagate through sll.  */
> > +       rtx arg1 = XEXP (src, 0);
> > +       rtx arg2 = XEXP (src, 1);
> > +
> > +       if (REG_P (arg1) && CONST_INT_P (arg2))
> > +         {
> > +           int shift_scale = (1 << (int) INTVAL (arg2));
> > +           return scale * fold_offsets (def, arg1, shift_scale, analyze,
> > +                                        pending_deletes);
> > +         }
> > +
> > +       return 0;
> > +      }
> > +    case REG:
> > +      /* Propagate through mv.  */
> > +      return scale * fold_offsets (def, src, 1, analyze, pending_deletes);
> > +    default:
> > +      /* Cannot propagate.  */
> > +      return 0;
> > +    }
> > +}
> > +
> > +/* Helper function for fold_offset_mem.
> > +   If INSN is a set rtx that loads from or stores to
> > +   some memory location that could have an offset folded
> > +   to it, return the rtx for the memory operand.  */
> > +static rtx
> > +get_foldable_mem_rtx (rtx_insn* insn)
> > +{
> > +  rtx set = single_set (insn);
> > +
> > +  if (set != NULL_RTX)
> > +    {
> > +      rtx src = SET_SRC (set);
> > +      rtx dest = SET_DEST (set);
> > +
> > +      /* We don't want folding if the memory has
> > +        unspec/unspec volatile in either src or dest.
> > +        In particular this also prevents folding
> > +        when atomics are involved.  */
> > +      if (GET_CODE (src) == UNSPEC
> > +         || GET_CODE (src) == UNSPEC_VOLATILE
> > +         || GET_CODE (dest) == UNSPEC
> > +         || GET_CODE (dest) == UNSPEC_VOLATILE)
> > +       return NULL;
> > +
> > +      if (MEM_P (src))
> > +       return src;
> > +      else if (MEM_P (dest))
> > +       return dest;
> > +      else if ((
> > +               GET_CODE (src) == SIGN_EXTEND
> > +               || GET_CODE (src) == ZERO_EXTEND
> > +             )
> > +             && MEM_P (XEXP (src, 0)))
> > +       return XEXP (src, 0);
> > +    }
> > +
> > +  return NULL;
> > +}
> > +
> > +/* Driver function that performs the actions defined by PHASE for INSN.  */
> > +static void
> > +fold_offset_mem (rtx_insn* insn, int phase)
> > +{
> > +  if (phase == FM_PHASE_COMMIT_INSNS)
> > +    {
> > +      if (bitmap_bit_p (&pending_remove_insn, INSN_UID (insn))
> > +         && !bitmap_bit_p (&cannot_remove_insn, INSN_UID (insn)))
> > +       {
> > +         rtx set = single_set (insn);
> > +         rtx src = SET_SRC (set);
> > +         rtx dest = SET_DEST (set);
> > +         rtx arg1 = XEXP (src, 0);
> > +
> > +         /* INSN is an add immediate addi DEST, SRC1, SRC2 that we
> > +            must replace with addi DEST, SRC1, 0.  */
> > +         if (XEXP (src, 0) == stack_pointer_rtx)
> > +           stats_folded_sp++;
> > +         else
> > +           stats_folded_other++;
> > +
> > +         if (dump_file)
> > +           {
> > +             fprintf (dump_file, "Instruction deleted from folding:");
> > +             print_rtl_single (dump_file, insn);
> > +           }
> > +
> > +         if (REGNO (dest) != REGNO (arg1))
> > +           {
> > +             /* If the dest register is different from the first argument
> > +                then the addition with constant 0 is equivalent to a move
> > +                instruction.  We emit the move and let the subsequent
> > +                pass cprop_hardreg eliminate that if possible.  */
> > +             rtx arg1_reg_rtx = gen_rtx_REG (GET_MODE (dest), REGNO (arg1));
> > +             rtx mov_rtx = gen_move_insn (dest, arg1_reg_rtx);
> > +             df_insn_rescan (emit_insn_after (mov_rtx, insn));
> > +           }
> > +
> > +         /* If the dest register is the same as the first argument
> > +            then the addition with constant 0 is a no-op.
> > +            We can now delete the original add immediate instruction.  */
> > +         delete_insn (insn);
> > +       }
> > +    }
> > +  else
> > +    {
> > +      rtx mem = get_foldable_mem_rtx (insn);
> > +
> > +      if (!mem)
> > +       return;
> > +
> > +      rtx mem_addr = XEXP (mem, 0);
> > +      rtx reg;
> > +      HOST_WIDE_INT cur_off;
> > +
> > +      if (REG_P (mem_addr))
> > +       {
> > +         reg = mem_addr;
> > +         cur_off = 0;
> > +       }
> > +      else if (GET_CODE (mem_addr) == PLUS
> > +              && REG_P (XEXP (mem_addr, 0))
> > +              && CONST_INT_P (XEXP (mem_addr, 1)))
> > +       {
> > +         reg = XEXP (mem_addr, 0);
> > +         cur_off = INTVAL (XEXP (mem_addr, 1));
> > +       }
> > +      else
> > +       return;
> > +
> > +      if (phase == FM_PHASE_ANALYSIS)
> > +       {
> > +         bitmap_set_bit (&can_fold_insn, INSN_UID (insn));
> > +         fold_offsets (insn, reg, 1, true, NULL);
> > +       }
> > +      else if (phase == FM_PHASE_VALIDITY)
> > +       {
> > +         bitmap_head new_pending_deletes;
> > +         bitmap_initialize (&new_pending_deletes, NULL);
> > +         HOST_WIDE_INT offset = cur_off + fold_offsets (insn, reg, 1, false,
> > +                                                       &new_pending_deletes);
> > +
> > +         /* Temporarily change the offset in MEM to test whether
> > +            it results in a valid instruction.  */
> > +         machine_mode mode = GET_MODE (mem_addr);
> > +         XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, GEN_INT (offset));
> > +
> > +         bool valid_change = recog (PATTERN (insn), insn, 0) >= 0;
> > +
> > +         /* Restore the instruction.  */
> > +         XEXP (mem, 0) = mem_addr;
> > +
> > +         if (valid_change)
> > +           bitmap_ior_into (&pending_remove_insn, &new_pending_deletes);
> > +         else
> > +           bitmap_ior_into (&cannot_remove_insn, &new_pending_deletes);
> > +         bitmap_release (&new_pending_deletes);
> > +       }
> > +      else if (phase == FM_PHASE_COMMIT_OFFSETS)
> > +       {
> > +         bitmap_head required_deletes;
> > +         bitmap_initialize (&required_deletes, NULL);
> > +         HOST_WIDE_INT offset = cur_off + fold_offsets (insn, reg, 1, false,
> > +                                                        &required_deletes);
> > +         bool illegal = bitmap_intersect_p (&required_deletes,
> > +                                            &cannot_remove_insn);
> > +
> > +         if (offset == cur_off)
> > +           return;
> > +
> > +         gcc_assert (!bitmap_empty_p (&required_deletes));
> > +
> > +         /* We have to update CANNOT_REMOVE_INSN again if transforming
> > +            this instruction is illegal.  */
> > +         if (illegal)
> > +           bitmap_ior_into (&cannot_remove_insn, &required_deletes);
> > +         else
> > +           {
> > +             machine_mode mode = GET_MODE (mem_addr);
> > +             XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, GEN_INT (offset));
> > +             df_insn_rescan (insn);
> > +
> > +             if (dump_file)
> > +               {
> > +                 fprintf (dump_file, "Memory offset changed from "
> > +                                     HOST_WIDE_INT_PRINT_DEC
> > +                                     " to "
> > +                                     HOST_WIDE_INT_PRINT_DEC
> > +                                     " for instruction:\n", cur_off, offset);
> > +                       print_rtl_single (dump_file, insn);
> > +               }
> > +           }
> > +         bitmap_release (&required_deletes);
> > +       }
> > +    }
> > +}
> > +
> > +unsigned int
> > +pass_fold_mem_offsets::execute (function *fn)
> > +{
> > +  basic_block bb;
> > +  rtx_insn *insn;
> > +
> > +  df_set_flags (DF_RD_PRUNE_DEAD_DEFS | DF_DEFER_INSN_RESCAN);
> > +  df_chain_add_problem (DF_UD_CHAIN + DF_DU_CHAIN);
> > +  df_analyze ();
> > +
> > +  bitmap_initialize (&can_fold_insn, NULL);
> > +  bitmap_initialize (&pending_remove_insn, NULL);
> > +  bitmap_initialize (&cannot_remove_insn, NULL);
> > +
> > +  stats_folded_sp = 0;
> > +  stats_folded_other = 0;
> > +
> > +  FOR_ALL_BB_FN (bb, fn)
> > +    {
> > +      /* The shorten-memrefs pass runs when a BB is optimized for size
> > +        and moves offsets from multiple memory instructions to a common
> > +        add instruction.  Disable folding if optimizing for size because
> > +        this pass will cancel the effects of shorten-memrefs.  */
> > +      if (optimize_bb_for_size_p (bb))
> > +       continue;
> > +
> > +      bitmap_clear (&can_fold_insn);
> > +      bitmap_clear (&pending_remove_insn);
> > +      bitmap_clear (&cannot_remove_insn);
> > +
> > +      FOR_BB_INSNS (bb, insn)
> > +       fold_offset_mem (insn, FM_PHASE_ANALYSIS);
> > +
> > +      FOR_BB_INSNS (bb, insn)
> > +       fold_offset_mem (insn, FM_PHASE_VALIDITY);
> > +
> > +      FOR_BB_INSNS (bb, insn)
> > +       fold_offset_mem (insn, FM_PHASE_COMMIT_OFFSETS);
> > +
> > +      FOR_BB_INSNS (bb, insn)
> > +       fold_offset_mem (insn, FM_PHASE_COMMIT_INSNS);
> > +    }
> > +
> > +  statistics_counter_event (cfun, "addi with sp fold", stats_folded_sp);
> > +  statistics_counter_event (cfun, "other addi fold", stats_folded_other);
> > +
> > +  bitmap_release (&can_fold_insn);
> > +  bitmap_release (&pending_remove_insn);
> > +  bitmap_release (&cannot_remove_insn);
> > +
> > +  return 0;
> > +}
> > +
> > +} // anon namespace
> > +
> > +rtl_opt_pass *
> > +make_pass_fold_mem_offsets (gcc::context *ctxt)
> > +{
> > +  return new pass_fold_mem_offsets (ctxt);
> > +}
> > diff --git a/gcc/config/riscv/riscv-passes.def b/gcc/config/riscv/riscv-passes.def
> > index 4084122cf0a..dc08daadc66 100644
> > --- a/gcc/config/riscv/riscv-passes.def
> > +++ b/gcc/config/riscv/riscv-passes.def
> > @@ -18,4 +18,5 @@
> >     <http://www.gnu.org/licenses/>.  */
> >
> >  INSERT_PASS_AFTER (pass_rtl_store_motion, 1, pass_shorten_memrefs);
> > +INSERT_PASS_AFTER (pass_regrename, 1, pass_fold_mem_offsets);
> >  INSERT_PASS_BEFORE (pass_fast_rtl_dce, 1, pass_vsetvl);
> > diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> > index 5f78fd579bb..b89a82adb0e 100644
> > --- a/gcc/config/riscv/riscv-protos.h
> > +++ b/gcc/config/riscv/riscv-protos.h
> > @@ -104,6 +104,7 @@ extern void riscv_parse_arch_string (const char *, struct gcc_options *, locatio
> >  extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
> >
> >  rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
> > +rtl_opt_pass * make_pass_fold_mem_offsets (gcc::context *ctxt);
> >  rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
> >
> >  /* Information about one CPU we know about.  */
> > diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
> > index 63d4710cb15..5e1fbdbedcc 100644
> > --- a/gcc/config/riscv/riscv.opt
> > +++ b/gcc/config/riscv/riscv.opt
> > @@ -105,6 +105,10 @@ Convert BASE + LARGE_OFFSET addresses to NEW_BASE + SMALL_OFFSET to allow more
> >  memory accesses to be generated as compressed instructions.  Currently targets
> >  32-bit integer load/stores.
> >
> > +mfold-mem-offsets
> > +Target Bool Var(riscv_mfold_mem_offsets) Init(1)
> > +Fold instructions calculating memory offsets into the memory access instruction if possible.
> > +
> >  mcmodel=
> >  Target RejectNegative Joined Enum(code_model) Var(riscv_cmodel) Init(TARGET_DEFAULT_CMODEL)
> >  Specify the code model.
> > diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
> > index 1252d6f851a..f29cf463867 100644
> > --- a/gcc/config/riscv/t-riscv
> > +++ b/gcc/config/riscv/t-riscv
> > @@ -76,6 +76,10 @@ riscv-shorten-memrefs.o: $(srcdir)/config/riscv/riscv-shorten-memrefs.cc \
> >         $(COMPILE) $<
> >         $(POSTCOMPILE)
> >
> > +riscv-fold-mem-offsets.o: $(srcdir)/config/riscv/riscv-fold-mem-offsets.cc
> > +       $(COMPILE) $<
> > +       $(POSTCOMPILE)
> > +
> >  riscv-selftests.o: $(srcdir)/config/riscv/riscv-selftests.cc \
> >    $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(TREE_H) output.h \
> >    $(C_COMMON_H) $(TARGET_H) $(OPTABS_H) $(EXPR_H) $(INSN_ATTR_H) $(EMIT_RTL_H)
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index ee78591c73e..39b57cab595 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -1218,6 +1218,7 @@ See RS/6000 and PowerPC Options.
> >  -msmall-data-limit=@var{N-bytes}
> >  -msave-restore  -mno-save-restore
> >  -mshorten-memrefs  -mno-shorten-memrefs
> > +-mfold-mem-offsets  -mno-fold-mem-offsets
> >  -mstrict-align  -mno-strict-align
> >  -mcmodel=medlow  -mcmodel=medany
> >  -mexplicit-relocs  -mno-explicit-relocs
> > @@ -29048,6 +29049,13 @@ of 'new base + small offset'.  If the new base gets stored in a compressed
> >  register, then the new load/store can be compressed.  Currently targets 32-bit
> >  integer load/stores only.
> >
> > +@opindex mfold-mem-offsets
> > +@item -mfold-mem-offsets
> > +@itemx -mno-fold-mem-offsets
> > +Do or do not attempt to move constant addition calculations used for memory
> > +offsets to the corresponding memory instructions.  The default is
> > +@option{-mfold-mem-offsets} at levels @option{-O2}, @option{-O3}.
> > +
> >  @opindex mstrict-align
> >  @item -mstrict-align
> >  @itemx -mno-strict-align
> > diff --git a/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
> > new file mode 100644
> > index 00000000000..574cc92b6ab
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mfold-mem-offsets" } */
> > +
> > +void sink(int arr[2]);
> > +
> > +void
> > +foo(int a, int b, int i)
> > +{
> > +  int arr[2] = {a, b};
> > +  arr[i]++;
> > +  sink(arr);
> > +}
> > +
> > +// Should compile without negative memory offsets when using -mfold-mem-offsets
> > +/* { dg-final { scan-assembler-not "lw\t.*,-.*\\(.*\\)" } } */
> > +/* { dg-final { scan-assembler-not "sw\t.*,-.*\\(.*\\)" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
> > new file mode 100644
> > index 00000000000..e6c251d3a3c
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
> > @@ -0,0 +1,24 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mfold-mem-offsets" } */
> > +
> > +void sink(int arr[3]);
> > +
> > +void
> > +foo(int a, int b, int c, int i)
> > +{
> > +  int arr1[3] = {a, b, c};
> > +  int arr2[3] = {a, c, b};
> > +  int arr3[3] = {c, b, a};
> > +
> > +  arr1[i]++;
> > +  arr2[i]++;
> > +  arr3[i]++;
> > +
> > +  sink(arr1);
> > +  sink(arr2);
> > +  sink(arr3);
> > +}
> > +
> > +// Should compile without negative memory offsets when using -mfold-mem-offsets
> > +/* { dg-final { scan-assembler-not "lw\t.*,-.*\\(.*\\)" } } */
> > +/* { dg-final { scan-assembler-not "sw\t.*,-.*\\(.*\\)" } } */
> > \ No newline at end of file
> > diff --git a/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c
> > new file mode 100644
> > index 00000000000..8586d3e3a29
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c
> > @@ -0,0 +1,17 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mfold-mem-offsets" } */
> > +
> > +void load(int arr[2]);
> > +
> > +int
> > +foo(long unsigned int i)
> > +{
> > +  int arr[2];
> > +  load(arr);
> > +
> > +  return arr[3 * i + 77];
> > +}
> > +
> > +// Should compile without negative memory offsets when using -mfold-mem-offsets
> > +/* { dg-final { scan-assembler-not "lw\t.*,-.*\\(.*\\)" } } */
> > +/* { dg-final { scan-assembler-not "addi\t.*,.*,77" } } */
> > \ No newline at end of file
> > --
> > 2.34.1
> >

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.
  2023-05-25 13:01   ` Richard Biener
  2023-05-25 13:25     ` Manolis Tsamis
@ 2023-05-25 13:31     ` Jeff Law
  2023-05-25 13:50       ` Richard Biener
  1 sibling, 1 reply; 45+ messages in thread
From: Jeff Law @ 2023-05-25 13:31 UTC (permalink / raw)
  To: gcc-patches



On 5/25/23 07:01, Richard Biener via Gcc-patches wrote:
> On Thu, May 25, 2023 at 2:36 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
>>
>> Implementation of the new RISC-V optimization pass for memory offset
>> calculations, documentation and testcases.
> 
> Why do fwprop or combine not what you want to do?
I think a lot of them end up coming from register elimination.

jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.
  2023-05-25 12:35 ` [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible Manolis Tsamis
@ 2023-05-25 13:38   ` Jeff Law
  2023-05-31 12:15     ` Manolis Tsamis
  2023-06-07 22:18   ` Jeff Law
  1 sibling, 1 reply; 45+ messages in thread
From: Jeff Law @ 2023-05-25 13:38 UTC (permalink / raw)
  To: Manolis Tsamis, gcc-patches
  Cc: Richard Biener, Palmer Dabbelt, Philipp Tomsich, Kito Cheng



On 5/25/23 06:35, Manolis Tsamis wrote:
> Propagation of the stack pointer in cprop_hardreg is currently forbidden
> in all cases, due to maybe_mode_change returning NULL. Relax this
> restriction and allow propagation when no mode change is requested.
> 
> gcc/ChangeLog:
> 
>          * regcprop.cc (maybe_mode_change): Enable stack pointer propagation.
I can't see how this can be correct given the stack pointer equality 
tests elsewhere in the compiler, particularly the various targets.

The problem is if you change the mode then you end up with multiple REG 
expressions that reference the stack pointer.

See rev: d1446456c3fcaa7be628726c9de4a877729490ca and the thread around 
the change which introduced this code.


Jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations.
  2023-05-25 12:35 [PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations Manolis Tsamis
  2023-05-25 12:35 ` [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets Manolis Tsamis
  2023-05-25 12:35 ` [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible Manolis Tsamis
@ 2023-05-25 13:42 ` Jeff Law
  2023-05-25 13:57   ` Manolis Tsamis
  2023-06-15 15:04   ` Jeff Law
  2 siblings, 2 replies; 45+ messages in thread
From: Jeff Law @ 2023-05-25 13:42 UTC (permalink / raw)
  To: Manolis Tsamis, gcc-patches
  Cc: Richard Biener, Palmer Dabbelt, Philipp Tomsich, Kito Cheng



On 5/25/23 06:35, Manolis Tsamis wrote:
> 
> This pass tries to optimize memory offset calculations by moving them
> from add immediate instructions to the memory loads/stores.
> For example it can transform this:
> 
>    addi t4,sp,16
>    add  t2,a6,t4
>    shl  t3,t2,1
>    ld   a2,0(t3)
>    addi a2,1
>    sd   a2,8(t2)
> 
> into the following (one instruction less):
> 
>    add  t2,a6,sp
>    shl  t3,t2,1
>    ld   a2,32(t3)
>    addi a2,1
>    sd   a2,24(t2)
> 
> Although there are places where this is done already, this pass is more
> powerful and can handle the more difficult cases that are currently not
> optimized. Also, it runs late enough and can optimize away unnecessary
> stack pointer calculations.
> 
> The first patch in the series contains the implementation of this pass
> while the second is a minor change that enables cprop_hardreg's
> propagation of the stack pointer, because this pass depends on cprop
> to do the propagation of optimized operations. If preferred I can split
> this into two different patches (in which cases some of the testcases
> included will fail temporarily).
Thanks Manolis.  Do you happen to know if this includes the fixes I 
passed along to Philipp a few months back?  My recollection is one fixed 
stale DF data which prevented an ICE during bootstrapping, the other 
needed to ignore debug insns in one or two places so that the behavior 
didn't change based on the existence of debug insns.


Jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.
  2023-05-25 13:31     ` Jeff Law
@ 2023-05-25 13:50       ` Richard Biener
  2023-05-25 14:02         ` Manolis Tsamis
  2023-05-25 14:13         ` Jeff Law
  0 siblings, 2 replies; 45+ messages in thread
From: Richard Biener @ 2023-05-25 13:50 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

On Thu, May 25, 2023 at 3:32 PM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 5/25/23 07:01, Richard Biener via Gcc-patches wrote:
> > On Thu, May 25, 2023 at 2:36 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
> >>
> >> Implementation of the new RISC-V optimization pass for memory offset
> >> calculations, documentation and testcases.
> >
> > Why do fwprop or combine not what you want to do?
> I think a lot of them end up coming from register elimination.

Why isn't this a problem for other targets then?  Or maybe it is and this
shouldn't be a machine specific pass?  Maybe postreload-gcse should
perform strength reduction (I can't think of any other post reload pass
that would do something even remotely related).

Richard.

> jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations.
  2023-05-25 13:42 ` [PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations Jeff Law
@ 2023-05-25 13:57   ` Manolis Tsamis
  2023-06-15 15:04   ` Jeff Law
  1 sibling, 0 replies; 45+ messages in thread
From: Manolis Tsamis @ 2023-05-25 13:57 UTC (permalink / raw)
  To: Jeff Law
  Cc: gcc-patches, Richard Biener, Palmer Dabbelt, Philipp Tomsich, Kito Cheng

On Thu, May 25, 2023 at 4:42 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 5/25/23 06:35, Manolis Tsamis wrote:
> >
> > This pass tries to optimize memory offset calculations by moving them
> > from add immediate instructions to the memory loads/stores.
> > For example it can transform this:
> >
> >    addi t4,sp,16
> >    add  t2,a6,t4
> >    shl  t3,t2,1
> >    ld   a2,0(t3)
> >    addi a2,1
> >    sd   a2,8(t2)
> >
> > into the following (one instruction less):
> >
> >    add  t2,a6,sp
> >    shl  t3,t2,1
> >    ld   a2,32(t3)
> >    addi a2,1
> >    sd   a2,24(t2)
> >
> > Although there are places where this is done already, this pass is more
> > powerful and can handle the more difficult cases that are currently not
> > optimized. Also, it runs late enough and can optimize away unnecessary
> > stack pointer calculations.
> >
> > The first patch in the series contains the implementation of this pass
> > while the second is a minor change that enables cprop_hardreg's
> > propagation of the stack pointer, because this pass depends on cprop
> > to do the propagation of optimized operations. If preferred I can split
> > this into two different patches (in which cases some of the testcases
> > included will fail temporarily).
> Thanks Manolis.  Do you happen to know if this includes the fixes I
> passed along to Philipp a few months back?  My recollection is one fixed
> stale DF data which prevented an ICE during bootstrapping, the other
> needed to ignore debug insns in one or two places so that the behavior
> didn't change based on the existence of debug insns.
>

Hi Jeff,

Yes this does include your fixes for DF and debug insns, along with
some other minor improvements.
Also, thanks for catching these!

Manolis

>
> Jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.
  2023-05-25 13:50       ` Richard Biener
@ 2023-05-25 14:02         ` Manolis Tsamis
  2023-05-29 23:30           ` Jeff Law
  2023-05-25 14:13         ` Jeff Law
  1 sibling, 1 reply; 45+ messages in thread
From: Manolis Tsamis @ 2023-05-25 14:02 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, gcc-patches

On Thu, May 25, 2023 at 4:53 PM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, May 25, 2023 at 3:32 PM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> >
> >
> > On 5/25/23 07:01, Richard Biener via Gcc-patches wrote:
> > > On Thu, May 25, 2023 at 2:36 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
> > >>
> > >> Implementation of the new RISC-V optimization pass for memory offset
> > >> calculations, documentation and testcases.
> > >
> > > Why do fwprop or combine not what you want to do?
> > I think a lot of them end up coming from register elimination.
>
> Why isn't this a problem for other targets then?  Or maybe it is and this
> shouldn't be a machine specific pass?  Maybe postreload-gcse should
> perform strength reduction (I can't think of any other post reload pass
> that would do something even remotely related).
>
> Richard.
>

It should be a problem for other targets as well (especially RISC-style ISAs).

This can easily be seen by comparing the generated code for the
testcases; for example, testcase-2 on AArch64:
https://godbolt.org/z/GMT1K7Ebr
Although the test cases contain only the simple patterns (the complex
ones manifest in larger programs), the point still holds.
The code for this pass is quite generic and could work for most/all
targets if that would be interesting.

Manolis

> > jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.
  2023-05-25 13:50       ` Richard Biener
  2023-05-25 14:02         ` Manolis Tsamis
@ 2023-05-25 14:13         ` Jeff Law
  2023-05-25 14:18           ` Philipp Tomsich
  1 sibling, 1 reply; 45+ messages in thread
From: Jeff Law @ 2023-05-25 14:13 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches



On 5/25/23 07:50, Richard Biener wrote:
> On Thu, May 25, 2023 at 3:32 PM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>>
>>
>> On 5/25/23 07:01, Richard Biener via Gcc-patches wrote:
>>> On Thu, May 25, 2023 at 2:36 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
>>>>
>>>> Implementation of the new RISC-V optimization pass for memory offset
>>>> calculations, documentation and testcases.
>>>
>>> Why do fwprop or combine not what you want to do?
>> I think a lot of them end up coming from register elimination.
> 
> Why isn't this a problem for other targets then?  Or maybe it is and this
> shouldn't be a machine specific pass?  Maybe postreload-gcse should
> perform strength reduction (I can't think of any other post reload pass
> that would do something even remotely related).
It is to some degree.  I ran into similar problems at my prior employer. 
  We ended up working around it in the target files in a different way 
-- which didn't work when I quickly tried it on RISC-V.

Seems like it would be worth another investigative step as part of the 
evaluation of this patch.  I wasn't at 100% when I did that poking 
around many months ago.

Jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.
  2023-05-25 14:13         ` Jeff Law
@ 2023-05-25 14:18           ` Philipp Tomsich
  0 siblings, 0 replies; 45+ messages in thread
From: Philipp Tomsich @ 2023-05-25 14:18 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Biener, gcc-patches

On Thu, 25 May 2023 at 16:14, Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 5/25/23 07:50, Richard Biener wrote:
> > On Thu, May 25, 2023 at 3:32 PM Jeff Law via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >>
> >>
> >>
> >> On 5/25/23 07:01, Richard Biener via Gcc-patches wrote:
> >>> On Thu, May 25, 2023 at 2:36 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
> >>>>
> >>>> Implementation of the new RISC-V optimization pass for memory offset
> >>>> calculations, documentation and testcases.
> >>>
> >>> Why do fwprop or combine not what you want to do?

At least for stack variables, the virtual-stack-vars register is not
resolved until reload.
So combine will be running much too early to be of any use (and I
haven't recently looked at whether one of the propagation passes runs
after).

Philipp.

> >> I think a lot of them end up coming from register elimination.
> >
> > Why isn't this a problem for other targets then?  Or maybe it is and this
> > shouldn't be a machine specific pass?  Maybe postreload-gcse should
> > perform strength reduction (I can't think of any other post reload pass
> > that would do something even remotely related).
> It is to some degree.  I ran into similar problems at my prior employer.
>   We ended up working around it in the target files in a different way
> -- which didn't work when I quickly tried it on RISC-V.
>
> Seems like it would be worth another investigative step as part of the
> evaluation of this patch.  I wasn't at 100% when I did that poking
> around many months ago.
>
> Jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.
  2023-05-25 14:02         ` Manolis Tsamis
@ 2023-05-29 23:30           ` Jeff Law
  2023-05-31 12:19             ` Manolis Tsamis
  0 siblings, 1 reply; 45+ messages in thread
From: Jeff Law @ 2023-05-29 23:30 UTC (permalink / raw)
  To: Manolis Tsamis, Richard Biener; +Cc: gcc-patches



On 5/25/23 08:02, Manolis Tsamis wrote:
> On Thu, May 25, 2023 at 4:53 PM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> On Thu, May 25, 2023 at 3:32 PM Jeff Law via Gcc-patches
>> <gcc-patches@gcc.gnu.org> wrote:
>>>
>>>
>>>
>>> On 5/25/23 07:01, Richard Biener via Gcc-patches wrote:
>>>> On Thu, May 25, 2023 at 2:36 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
>>>>>
>>>>> Implementation of the new RISC-V optimization pass for memory offset
>>>>> calculations, documentation and testcases.
>>>>
>>>> Why do fwprop or combine not what you want to do?
>>> I think a lot of them end up coming from register elimination.
>>
>> Why isn't this a problem for other targets then?  Or maybe it is and this
>> shouldn't be a machine specific pass?  Maybe postreload-gcse should
>> perform strength reduction (I can't think of any other post reload pass
>> that would do something even remotely related).
>>
>> Richard.
>>
> 
> It should be a problem for other targets as well (especially RISC-style ISAs).
> 
> > This can easily be seen by comparing the generated code for the
> > testcases; for example, testcase-2 on AArch64:
> > https://godbolt.org/z/GMT1K7Ebr
> > Although the test cases contain only the simple patterns (the complex
> > ones manifest in larger programs), the point still holds.
> The code for this pass is quite generic and could work for most/all
> targets if that would be interesting.
Interestingly enough, fold-mem-offsets seems to interact strangely with the 
load/store pair support on aarch64.  Note how store2a uses 2 stp 
instructions on the trunk, but 4 str instructions with fold-mem-offsets. 
  Yet in load1r we're able to generate a load-quad rather than two load 
pairs.  Weird.

jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.
  2023-05-25 13:38   ` Jeff Law
@ 2023-05-31 12:15     ` Manolis Tsamis
  2023-06-07 22:16       ` Jeff Law
  0 siblings, 1 reply; 45+ messages in thread
From: Manolis Tsamis @ 2023-05-31 12:15 UTC (permalink / raw)
  To: Jeff Law
  Cc: gcc-patches, Richard Biener, Palmer Dabbelt, Philipp Tomsich, Kito Cheng

On Thu, May 25, 2023 at 4:38 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 5/25/23 06:35, Manolis Tsamis wrote:
> > Propagation of the stack pointer in cprop_hardreg is currently forbidden
> > in all cases, due to maybe_mode_change returning NULL. Relax this
> > restriction and allow propagation when no mode change is requested.
> >
> > gcc/ChangeLog:
> >
> >          * regcprop.cc (maybe_mode_change): Enable stack pointer propagation.
> I can't see how this can be correct given the stack pointer equality
> tests elsewhere in the compiler, particularly the various targets.
>
> The problem is if you change the mode then you end up with multiple REG
> expressions that reference the stack pointer.
>
> See rev: d1446456c3fcaa7be628726c9de4a877729490ca and the thread around
> the change which introduced this code.
>

Hi Jeff,

Isn't this fine for this case since:

  1) stack_pointer_rtx is used which won't cause issues with pointer
equalities (If I understand correctly).
  2) Propagation is guarded with `if (orig_mode == new_mode)` so only
when there is no mode change.
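
To make (2) concrete, the shape of the change is roughly the following
(a sketch of the idea only, not the verbatim hunk from the patch):

  /* In maybe_mode_change: the stack pointer is special-cased so that
     only the canonical stack_pointer_rtx is ever returned, and only
     when no mode change is requested.  */
  if (regno == STACK_POINTER_REGNUM)
    {
      if (orig_mode == new_mode)
	return stack_pointer_rtx;
      else
	return NULL_RTX;
    }

So pointer-equality tests against stack_pointer_rtx elsewhere in the
compiler should keep working, and any propagation that would require a
mode change still bails out as before.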

Thanks,
Manolis

>
> Jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.
  2023-05-29 23:30           ` Jeff Law
@ 2023-05-31 12:19             ` Manolis Tsamis
  2023-05-31 14:00               ` Jeff Law
  0 siblings, 1 reply; 45+ messages in thread
From: Manolis Tsamis @ 2023-05-31 12:19 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Biener, gcc-patches

On Tue, May 30, 2023 at 2:30 AM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 5/25/23 08:02, Manolis Tsamis wrote:
> > On Thu, May 25, 2023 at 4:53 PM Richard Biener via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> >>
> >> On Thu, May 25, 2023 at 3:32 PM Jeff Law via Gcc-patches
> >> <gcc-patches@gcc.gnu.org> wrote:
> >>>
> >>>
> >>>
> >>> On 5/25/23 07:01, Richard Biener via Gcc-patches wrote:
> >>>> On Thu, May 25, 2023 at 2:36 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
> >>>>>
> >>>>> Implementation of the new RISC-V optimization pass for memory offset
> >>>>> calculations, documentation and testcases.
> >>>>
> >>>> Why do fwprop or combine not what you want to do?
> >>> I think a lot of them end up coming from register elimination.
> >>
> >> Why isn't this a problem for other targets then?  Or maybe it is and this
> >> shouldn't be a machine specific pass?  Maybe postreload-gcse should
> >> perform strength reduction (I can't think of any other post reload pass
> >> that would do something even remotely related).
> >>
> >> Richard.
> >>
> >
> > It should be a problem for other targets as well (especially RISC-style ISAs).
> >
> > This can easily be seen by comparing the generated code for the
> > testcases; for example, testcase-2 on AArch64:
> > https://godbolt.org/z/GMT1K7Ebr
> > Although the test cases contain only the simple patterns (the complex
> > ones manifest in larger programs), the point still holds.
> > The code for this pass is quite generic and could work for most/all
> > targets if that would be interesting.
> Interestingly enough, fold-mem-offsets seems to interact strangely with the
> load/store pair support on aarch64.  Note how store2a uses 2 stp
> instructions on the trunk, but 4 str instructions with fold-mem-offsets.
>   Yet in load1r we're able to generate a load-quad rather than two load
> pairs.  Weird.
>

I'm confused, where is this comparison from?
The fold-mem-offsets pass is only run on RISCV and doesn't (shouldn't)
affect AArch64.

I only see the 2x stp / 4x str in the godbolt link, but that is gcc vs
clang, no fold-mem-offsets involved here.

Manolis

> jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.
  2023-05-31 12:19             ` Manolis Tsamis
@ 2023-05-31 14:00               ` Jeff Law
  0 siblings, 0 replies; 45+ messages in thread
From: Jeff Law @ 2023-05-31 14:00 UTC (permalink / raw)
  To: Manolis Tsamis; +Cc: Richard Biener, gcc-patches



On 5/31/23 06:19, Manolis Tsamis wrote:
> On Tue, May 30, 2023 at 2:30 AM Jeff Law <jeffreyalaw@gmail.com> wrote:
>>
>>
>>
>> On 5/25/23 08:02, Manolis Tsamis wrote:
>>> On Thu, May 25, 2023 at 4:53 PM Richard Biener via Gcc-patches
>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>
>>>> On Thu, May 25, 2023 at 3:32 PM Jeff Law via Gcc-patches
>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 5/25/23 07:01, Richard Biener via Gcc-patches wrote:
>>>>>> On Thu, May 25, 2023 at 2:36 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
>>>>>>>
>>>>>>> Implementation of the new RISC-V optimization pass for memory offset
>>>>>>> calculations, documentation and testcases.
>>>>>>
>>>>>> Why do fwprop or combine not what you want to do?
>>>>> I think a lot of them end up coming from register elimination.
>>>>
>>>> Why isn't this a problem for other targets then?  Or maybe it is and this
>>>> shouldn't be a machine specific pass?  Maybe postreload-gcse should
>>>> perform strength reduction (I can't think of any other post reload pass
>>>> that would do something even remotely related).
>>>>
>>>> Richard.
>>>>
>>>
>>> It should be a problem for other targets as well (especially RISC-style ISAs).
>>>
>>> This can easily be seen by comparing the generated code for the
>>> testcases; for example, testcase-2 on AArch64:
>>> https://godbolt.org/z/GMT1K7Ebr
>>> Although the test cases contain only the simple patterns (the complex
>>> ones manifest in larger programs), the point still holds.
>>> The code for this pass is quite generic and could work for most/all
>>> targets if that would be interesting.
>> Interestingly enough, fold-mem-offsets seems to interact strangely with the
>> load/store pair support on aarch64.  Note how store2a uses 2 stp
>> instructions on the trunk, but 4 str instructions with fold-mem-offsets.
>>    Yet in load1r we're able to generate a load-quad rather than two load
>> pairs.  Weird.
>>
> 
> I'm confused, where is this comparison from?
> The fold-mem-offsets pass is only run on RISCV and doesn't (shouldn't)
> affect AArch64.
> 
> I only see the 2x stp / 4x str in the godbolt link, but that is gcc vs
> clang, no fold-mem-offsets involved here.
My bad!  I should have looked at the headings more closely.  I thought 
you'd set up a with/without fold-mem-offsets comparison.

jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.
  2023-05-31 12:15     ` Manolis Tsamis
@ 2023-06-07 22:16       ` Jeff Law
  0 siblings, 0 replies; 45+ messages in thread
From: Jeff Law @ 2023-06-07 22:16 UTC (permalink / raw)
  To: Manolis Tsamis
  Cc: gcc-patches, Richard Biener, Palmer Dabbelt, Philipp Tomsich, Kito Cheng



On 5/31/23 06:15, Manolis Tsamis wrote:
> On Thu, May 25, 2023 at 4:38 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>>
>>
>>
>> On 5/25/23 06:35, Manolis Tsamis wrote:
>>> Propagation of the stack pointer in cprop_hardreg is currently forbidden
>>> in all cases, due to maybe_mode_change returning NULL. Relax this
>>> restriction and allow propagation when no mode change is requested.
>>>
>>> gcc/ChangeLog:
>>>
>>>           * regcprop.cc (maybe_mode_change): Enable stack pointer propagation.
>> I can't see how this can be correct given the stack pointer equality
>> tests elsewhere in the compiler, particularly the various targets.
>>
>> The problem is if you change the mode then you end up with multiple REG
>> expressions that reference the stack pointer.
>>
>> See rev: d1446456c3fcaa7be628726c9de4a877729490ca and the thread around
>> the change which introduced this code.
>>
> 
> Hi Jeff,
> 
> Isn't this fine for this case since:
> 
>    1) stack_pointer_rtx is used which won't cause issues with pointer
> equalities (If I understand correctly).
>    2) Propagation is guarded with `if (orig_mode == new_mode)` so only
> when there is no mode change.
I must have missed #2 -- is that something that changed since the first 
iteration for Ventana many months ago?

Anyway, hoping to make meaningful progress on these two patches over the 
next couple days.

jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.
  2023-05-25 12:35 ` [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible Manolis Tsamis
  2023-05-25 13:38   ` Jeff Law
@ 2023-06-07 22:18   ` Jeff Law
  2023-06-08  6:15     ` Manolis Tsamis
  2023-06-15 20:13     ` Philipp Tomsich
  1 sibling, 2 replies; 45+ messages in thread
From: Jeff Law @ 2023-06-07 22:18 UTC (permalink / raw)
  To: Manolis Tsamis, gcc-patches
  Cc: Richard Biener, Palmer Dabbelt, Philipp Tomsich, Kito Cheng



On 5/25/23 06:35, Manolis Tsamis wrote:
> Propagation of the stack pointer in cprop_hardreg is currently forbidden
> in all cases, due to maybe_mode_change returning NULL. Relax this
> restriction and allow propagation when no mode change is requested.
> 
> gcc/ChangeLog:
> 
>          * regcprop.cc (maybe_mode_change): Enable stack pointer propagation.
Thanks for the clarification.  This is OK for the trunk.  It looks 
generic enough to have value going forward now rather than waiting.

jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.
  2023-05-25 12:35 ` [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets Manolis Tsamis
  2023-05-25 13:01   ` Richard Biener
@ 2023-06-08  5:37   ` Jeff Law
  2023-06-12  7:36     ` Manolis Tsamis
  2023-06-09  0:57   ` Jeff Law
  2023-06-10 15:49   ` Jeff Law
  3 siblings, 1 reply; 45+ messages in thread
From: Jeff Law @ 2023-06-08  5:37 UTC (permalink / raw)
  To: Manolis Tsamis, gcc-patches
  Cc: Richard Biener, Palmer Dabbelt, Philipp Tomsich, Kito Cheng



On 5/25/23 06:35, Manolis Tsamis wrote:
> Implementation of the new RISC-V optimization pass for memory offset
> calculations, documentation and testcases.
> 
> gcc/ChangeLog:
> 
> 	* config.gcc: Add riscv-fold-mem-offsets.o to extra_objs.
> 	* config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Schedule a new
> 	pass.
> 	* config/riscv/riscv-protos.h (make_pass_fold_mem_offsets): Declare.
> 	* config/riscv/riscv.opt: New options.
> 	* config/riscv/t-riscv: New build rule.
> 	* doc/invoke.texi: Document new option.
> 	* config/riscv/riscv-fold-mem-offsets.cc: New file.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/fold-mem-offsets-1.c: New test.
> 	* gcc.target/riscv/fold-mem-offsets-2.c: New test.
> 	* gcc.target/riscv/fold-mem-offsets-3.c: New test.
So not going into the guts of the patch yet.

 From a benchmark standpoint the only two that get out of the +-0.05% 
range are mcf and deepsjeng (from a dynamic instruction standpoint).  So 
from an evaluation standpoint we can probably focus our efforts there. 
And as we know, mcf is actually memory bound, so while improving its 
dynamic instruction count is good, the end performance improvement may 
be marginal.

As I mentioned to Philipp many months ago this reminds me a lot of a 
problem I've seen before.  Basically register elimination emits code 
that can be terrible in some circumstances.  So I went and poked at this 
again.

I think the key difference between now and what I was dealing with 
before is for the cases that really matter for rv64 we have a shNadd 
insn in the sequence.  That private port I was working on before did not 
have shNadd (don't ask, I probably can't tell).  Our target also had 
reg+reg addressing modes.  What I can't remember was if we were trying 
harder to fold the constant terms into the memory reference or if we 
were more focused on the reg+reg.  Ultimately it's probably not that 
important to remember -- the key is there are very significant 
differences in the target's capabilities which impact how we should be 
generating code in this case.  Those differences affect the code we 
generate *and* the places where we can potentially get control and do 
some address rewriting.

A  key sequence in mcf looks something like this in IRA, others have 
similar structure:

> (insn 237 234 239 26 (set (reg:DI 377)
>         (plus:DI (ashift:DI (reg:DI 200 [ _173 ])
>                 (const_int 3 [0x3]))
>             (reg/f:DI 65 frame))) "pbeampp.c":139:15 333 {*shNadd}
>      (nil))
> (insn 239 237 235 26 (set (reg/f:DI 380)
>         (plus:DI (reg:DI 513)
>             (reg:DI 377))) "pbeampp.c":139:15 5 {adddi3}
>      (expr_list:REG_DEAD (reg:DI 377)
>         (expr_list:REG_EQUAL (plus:DI (reg:DI 377)
>                 (const_int -32768 [0xffffffffffff8000]))
>             (nil))))
[ ... ]
> (insn 240 235 255 26 (set (reg/f:DI 204 [ _177 ])
>         (mem/f:DI (plus:DI (reg/f:DI 380)
>                 (const_int 280 [0x118])) [7 *_176+0 S8 A64])) "pbeampp.c":139:15 179 {*movdi_64bit}
>      (expr_list:REG_DEAD (reg/f:DI 380)
>         (nil)))


The key here is insn 237.  It's generally going to be bad to have FP 
show up in a shadd insn because it's going to be eliminated into 
sp+offset.  That'll generate an input reload before insn 237 and we 
can't do any combination with the constant in insn 239.

After LRA it looks like this:

> (insn 1540 234 1541 26 (set (reg:DI 11 a1 [750])
>         (const_int 32768 [0x8000])) "pbeampp.c":139:15 179 {*movdi_64bit}
>      (nil))
> (insn 1541 1540 1611 26 (set (reg:DI 12 a2 [749])
>         (plus:DI (reg:DI 11 a1 [750])
>             (const_int -272 [0xfffffffffffffef0]))) "pbeampp.c":139:15 5 {adddi3}
>      (expr_list:REG_EQUAL (const_int 32496 [0x7ef0])
>         (nil))) 
> (insn 1611 1541 1542 26 (set (reg:DI 29 t4 [795])
>         (plus:DI (reg/f:DI 2 sp)
>             (const_int 64 [0x40]))) "pbeampp.c":139:15 5 {adddi3}
>      (nil))
> (insn 1542 1611 237 26 (set (reg:DI 12 a2 [749])
>         (plus:DI (reg:DI 12 a2 [749])
>             (reg:DI 29 t4 [795]))) "pbeampp.c":139:15 5 {adddi3}
>      (nil))
> (insn 237 1542 239 26 (set (reg:DI 12 a2 [377])
>         (plus:DI (ashift:DI (reg:DI 14 a4 [orig:200 _173 ] [200])
>                 (const_int 3 [0x3]))
>             (reg:DI 12 a2 [749]))) "pbeampp.c":139:15 333 {*shNadd}
>      (nil))
> (insn 239 237 235 26 (set (reg/f:DI 12 a2 [380])
>         (plus:DI (reg:DI 10 a0 [513])
>             (reg:DI 12 a2 [377]))) "pbeampp.c":139:15 5 {adddi3}
>      (expr_list:REG_EQUAL (plus:DI (reg:DI 12 a2 [377])
>             (const_int -32768 [0xffffffffffff8000]))
>         (nil))) 
[ ... ]
> (insn 240 235 255 26 (set (reg/f:DI 14 a4 [orig:204 _177 ] [204])
>         (mem/f:DI (plus:DI (reg/f:DI 12 a2 [380])
>                 (const_int 280 [0x118])) [7 *_176+0 S8 A64])) "pbeampp.c":139:15 179 {*movdi_64bit}
>      (nil))


Reload/LRA made an absolute mess of that code.

But before we add a new pass (target specific or generic), I think it 
may be in our best interest to experiment with a bit of creative rewriting to 
preserve the shadd, but without the frame pointer.  Perhaps also looking 
for a way to fold the constants, both the explicit ones and the implicit 
one from FP elimination.

This looks particularly promising:

> Trying 237, 239 -> 240:
>   237: r377:DI=r200:DI<<0x3+frame:DI
>       REG_DEAD r200:DI
>   239: r380:DI=r513:DI+r377:DI
>       REG_DEAD r377:DI
>       REG_EQUAL r377:DI-0x8000
>   240: r204:DI=[r380:DI+0x118]
>       REG_DEAD r380:DI
> Failed to match this instruction:
> (set (reg/f:DI 204 [ _177 ])
>     (mem/f:DI (plus:DI (plus:DI (plus:DI (mult:DI (reg:DI 200 [ _173 ])
>                         (const_int 8 [0x8]))
>                     (reg/f:DI 65 frame))
>                 (reg:DI 513))
>             (const_int 280 [0x118])) [7 *_176+0 S8 A64]))


We could reassociate this as

t1 = r200 * 8 + r513
t2 = frame + 280
t3 = t1 + t2
r204 = *t3

Which after elimination would be

t1 = r200 * 8 + r513
t2 = sp + C + 280
t3 = t1 + t2
r204 = *t3

C + 280 will simplify.  And we'll probably end up in the addptr3 case 
which I think gives us a chance to rewrite this a bit so that we end up with
t1 = r200 * 8 + r513
t2 = sp + t1
r204 = *(t2 + 280 + C)


Or at least I *think* we might be able to get there.  Anyway, as I said, 
I think this deserves a bit of playing around before we jump straight 
into adding a new pass.

jeff


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.
  2023-06-07 22:18   ` Jeff Law
@ 2023-06-08  6:15     ` Manolis Tsamis
  2023-06-15 20:13     ` Philipp Tomsich
  1 sibling, 0 replies; 45+ messages in thread
From: Manolis Tsamis @ 2023-06-08  6:15 UTC (permalink / raw)
  To: Jeff Law
  Cc: gcc-patches, Richard Biener, Palmer Dabbelt, Philipp Tomsich, Kito Cheng

Hi Jeff,

Yes that one has changed; I changed the implementation based on your feedback.

Thanks,
Manolis

On Thu, Jun 8, 2023 at 1:18 AM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 5/25/23 06:35, Manolis Tsamis wrote:
> > Propagation of the stack pointer in cprop_hardreg is currently forbidden
> > in all cases, due to maybe_mode_change returning NULL. Relax this
> > restriction and allow propagation when no mode change is requested.
> >
> > gcc/ChangeLog:
> >
> >          * regcprop.cc (maybe_mode_change): Enable stack pointer propagation.
> Thanks for the clarification.  This is OK for the trunk.  It looks
> generic enough to have value going forward now rather than waiting.
>
> jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.
  2023-05-25 12:35 ` [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets Manolis Tsamis
  2023-05-25 13:01   ` Richard Biener
  2023-06-08  5:37   ` Jeff Law
@ 2023-06-09  0:57   ` Jeff Law
  2023-06-12  7:32     ` Manolis Tsamis
  2023-06-10 15:49   ` Jeff Law
  3 siblings, 1 reply; 45+ messages in thread
From: Jeff Law @ 2023-06-09  0:57 UTC (permalink / raw)
  To: Manolis Tsamis, gcc-patches
  Cc: Richard Biener, Palmer Dabbelt, Philipp Tomsich, Kito Cheng



On 5/25/23 06:35, Manolis Tsamis wrote:
> Implementation of the new RISC-V optimization pass for memory offset
> calculations, documentation and testcases.
> 
> gcc/ChangeLog:
> 
> 	* config.gcc: Add riscv-fold-mem-offsets.o to extra_objs.
> 	* config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Schedule a new
> 	pass.
> 	* config/riscv/riscv-protos.h (make_pass_fold_mem_offsets): Declare.
> 	* config/riscv/riscv.opt: New options.
> 	* config/riscv/t-riscv: New build rule.
> 	* doc/invoke.texi: Document new option.
> 	* config/riscv/riscv-fold-mem-offsets.cc: New file.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/fold-mem-offsets-1.c: New test.
> 	* gcc.target/riscv/fold-mem-offsets-2.c: New test.
> 	* gcc.target/riscv/fold-mem-offsets-3.c: New test.
So a followup.

While I think we probably could create a variety of backend patterns 
(perhaps disallowing the frame pointer as the addend argument to a shadd 
pattern and the like) and capture the key cases from mcf and probably 
deepsjeng, it's probably not the best direction.

What I suspect would ultimately happen is we'd be presented with 
additional cases over time that would require an ever increasing number 
of patterns: sign vs zero extension, increasing depth of the search space 
to find reassociation opportunities, different variants with and without 
shadd/zbb, etc.

So with that space explored a bit the next big question is target 
specific or generic.  I'll poke in there a bit over the coming days.  In 
the meantime I do have some questions/comments on the code itself. 
There may be more over time..



> +static rtx_insn*
> +get_single_def_in_bb (rtx_insn *insn, rtx reg)
[ ... ]


> +  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
> +    {
> +      /* Problem getting some definition for this instruction.  */
> +      if (ref_link->ref == NULL)
> +	return NULL;
> +      if (DF_REF_INSN_INFO (ref_link->ref) == NULL)
> +	return NULL;
> +      if (global_regs[REGNO (reg)]
> +	  && !set_of (reg, DF_REF_INSN (ref_link->ref)))
> +	return NULL;
> +    }
That last condition feels a bit odd.  It would seem that you wanted an 
OR boolean rather than AND.


> +
> +  unsigned int dest_regno = REGNO (dest);
> +
> +  /* We don't want to fold offsets from instructions that change some
> +     particular registers with potentially global side effects.  */
> +  if (!GP_REG_P (dest_regno)
> +      || dest_regno == STACK_POINTER_REGNUM
> +      || (frame_pointer_needed && dest_regno == HARD_FRAME_POINTER_REGNUM)
> +      || dest_regno == GP_REGNUM
> +      || dest_regno == THREAD_POINTER_REGNUM
> +      || dest_regno == RETURN_ADDR_REGNUM)
> +    return 0;
I'd think most of this would be captured by testing fixed_registers 
rather than trying to list each register individually.  In fact, if we 
need to generalize this to work on other targets we almost certainly 
want a more general test.
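
Something along these lines, perhaps (untested sketch; whether
fixed_regs alone subsumes everything the explicit list was guarding
against, e.g. RETURN_ADDR_REGNUM, would need checking):

  /* Reject folding into anything that is not a general register or
     that is fixed; fixed_regs covers sp/gp/tp without hard-coding
     target-specific register numbers.  */
  if (!GP_REG_P (dest_regno) || fixed_regs[dest_regno])
    return 0;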


> +      else if ((
> +		GET_CODE (src) == SIGN_EXTEND
> +		|| GET_CODE (src) == ZERO_EXTEND
> +	      )
> +	      && MEM_P (XEXP (src, 0)))
Formatting is goofy above...



> +
> +	  if (dump_file)
> +	    {
> +	      fprintf (dump_file, "Instruction deleted from folding:");
> +	      print_rtl_single (dump_file, insn);
> +	    }
> +
> +	  if (REGNO (dest) != REGNO (arg1))
> +	    {
> +	      /* If the dest register is different from the first argument
> +		 then the addition with constant 0 is equivalent to a move
> +		 instruction.  We emit the move and let the subsequent
> +		 pass cprop_hardreg eliminate that if possible.  */
> +	      rtx arg1_reg_rtx = gen_rtx_REG (GET_MODE (dest), REGNO (arg1));
> +	      rtx mov_rtx = gen_move_insn (dest, arg1_reg_rtx);
> +	      df_insn_rescan (emit_insn_after (mov_rtx, insn));
> +	    }
> +
> +	  /* If the dest register is the same as the first argument
> +	     then the addition with constant 0 is a no-op.
> +	     We can now delete the original add immediate instruction.  */
> +	  delete_insn (insn);
The debugging message is a bit misleading.  Yea, we always delete 
something here, but in one case we end up emitting a copy.



> +
> +	  /* Temporarily change the offset in MEM to test whether
> +	     it results in a valid instruction.  */
> +	  machine_mode mode = GET_MODE (mem_addr);
> +	  XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, GEN_INT (offset));
> +
> +	  bool valid_change = recog (PATTERN (insn), insn, 0) >= 0;
> +
> +	  /* Restore the instruction.  */
> +	  XEXP (mem, 0) = mem_addr;
You need to reset the INSN_CODE after restoring the instruction.  That's 
generally a bad thing to do, but I've seen it done enough (and been 
guilty myself in the past) that we should just assume some ports are 
broken in this regard.
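
i.e. something like this when undoing the in-place change (sketch):

	  /* Restore the instruction and invalidate the cached recog
	     result, so later code doesn't trust the insn code that was
	     computed against the temporarily modified pattern.  */
	  XEXP (mem, 0) = mem_addr;
	  INSN_CODE (insn) = -1;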


Anyway, just wanted to get those issues raised so that you can address them.

jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.
  2023-05-25 12:35 ` [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets Manolis Tsamis
                     ` (2 preceding siblings ...)
  2023-06-09  0:57   ` Jeff Law
@ 2023-06-10 15:49   ` Jeff Law
  2023-06-12  7:41     ` Manolis Tsamis
  3 siblings, 1 reply; 45+ messages in thread
From: Jeff Law @ 2023-06-10 15:49 UTC (permalink / raw)
  To: Manolis Tsamis, gcc-patches
  Cc: Richard Biener, Palmer Dabbelt, Philipp Tomsich, Kito Cheng



On 5/25/23 06:35, Manolis Tsamis wrote:
> Implementation of the new RISC-V optimization pass for memory offset
> calculations, documentation and testcases.
> 
> gcc/ChangeLog:
> 
> 	* config.gcc: Add riscv-fold-mem-offsets.o to extra_objs.
> 	* config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Schedule a new
> 	pass.
> 	* config/riscv/riscv-protos.h (make_pass_fold_mem_offsets): Declare.
> 	* config/riscv/riscv.opt: New options.
> 	* config/riscv/t-riscv: New build rule.
> 	* doc/invoke.texi: Document new option.
> 	* config/riscv/riscv-fold-mem-offsets.cc: New file.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/fold-mem-offsets-1.c: New test.
> 	* gcc.target/riscv/fold-mem-offsets-2.c: New test.
> 	* gcc.target/riscv/fold-mem-offsets-3.c: New test.

So I made a small number of changes so that this could be run on other 
targets.


I had an hppa compiler handy, so it was trivial to do some light testing 
with that.  f-m-o didn't help at all on the included tests.  But I think 
that's more likely an artifact of the port supporting scaled indexed 
loads and doing fairly aggressive address rewriting to encourage that 
addressing mode.

Next I had an H8 compiler handy.  All three included tests showed 
improvement, both in terms of instruction count and size.  What was most 
interesting here is that f-m-o removed some redundant address 
calculations without needing to adjust the memory references, which was a 
pleasant surprise.

Given the fact that both ports worked and the H8 showed an improvement, 
the next step was to put the patch into my tester.  It tests 30+ 
distinct processor families.  The goal wasn't to evaluate effectiveness, 
but to validate that those targets could still build their target 
libraries and successfully run their testsuites.

That's run through the various crosses.  Things like the hppa, alpha, 
m68k bootstraps only run once a week as they take many hours each.  The 
result is quite encouraging.  None of the crosses had any build issues 
or regressions.

The net result I think is we should probably move this to a target 
independent optimization pass.  We only need to generalize a few things.

Most importantly we need to get a resolution on the conditional I asked 
about inside get_single_def_in_bb.   There's some other refactoring I 
think we should do, but I'd really like to get a resolution on the code 
in get_single_def_in_bb first, then we ought to be able to move forward 
pretty quickly on the refactoring and integration.

jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.
  2023-06-09  0:57   ` Jeff Law
@ 2023-06-12  7:32     ` Manolis Tsamis
  2023-06-12 21:58       ` Jeff Law
  0 siblings, 1 reply; 45+ messages in thread
From: Manolis Tsamis @ 2023-06-12  7:32 UTC (permalink / raw)
  To: Jeff Law
  Cc: gcc-patches, Richard Biener, Palmer Dabbelt, Philipp Tomsich, Kito Cheng

On Fri, Jun 9, 2023 at 3:57 AM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 5/25/23 06:35, Manolis Tsamis wrote:
> > Implementation of the new RISC-V optimization pass for memory offset
> > calculations, documentation and testcases.
> >
> > gcc/ChangeLog:
> >
> >       * config.gcc: Add riscv-fold-mem-offsets.o to extra_objs.
> >       * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Schedule a new
> >       pass.
> >       * config/riscv/riscv-protos.h (make_pass_fold_mem_offsets): Declare.
> >       * config/riscv/riscv.opt: New options.
> >       * config/riscv/t-riscv: New build rule.
> >       * doc/invoke.texi: Document new option.
> >       * config/riscv/riscv-fold-mem-offsets.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.target/riscv/fold-mem-offsets-1.c: New test.
> >       * gcc.target/riscv/fold-mem-offsets-2.c: New test.
> >       * gcc.target/riscv/fold-mem-offsets-3.c: New test.
> So a followup.
>
> While I think we probably could create a variety of backend patterns,
> perhaps disallow the frame pointer as the addend argument to a shadd
> pattern and the like and capture the key cases from mcf and probably
> deepsjeng it's probably not the best direction.
>
> What I suspect would ultimately happen is we'd be presented with
> additional cases over time that would require an ever increasing number
> of patterns.  sign vs zero extension, increasing depth of search space
> to find reassociation opportunities, different variants with and without
> shadd/zbb, etc etc.
>
> So with that space explored a bit the next big question is target
> specific or generic.  I'll poke in there a bit over the coming days.  In
> the mean time I do have some questions/comments on the code itself.
> There may be more over time..
>
>
>
> > +static rtx_insn*
> > +get_single_def_in_bb (rtx_insn *insn, rtx reg)
> [ ... ]
>
>
> > +  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
> > +    {
> > +      /* Problem getting some definition for this instruction.  */
> > +      if (ref_link->ref == NULL)
> > +     return NULL;
> > +      if (DF_REF_INSN_INFO (ref_link->ref) == NULL)
> > +     return NULL;
> > +      if (global_regs[REGNO (reg)]
> > +       && !set_of (reg, DF_REF_INSN (ref_link->ref)))
> > +     return NULL;
> > +    }
> That last condition feels a bit odd.  It would seem that you wanted an
> OR boolean rather than AND.
>

Most of this function I didn't write myself; I used existing code from
REE's get_defs to fetch the definitions.
In the original there's a comment about this line that explains it:

  As global regs are assumed to be defined at each function call
  dataflow can report a call_insn as being a definition of REG.
  But we can't do anything with that in this pass so proceed only
  if the instruction really sets REG in a way that can be deduced
  from the RTL structure.

This function is the only one I copied without changing much (because
I didn't quite understand it), so I don't know whether that condition
is of any use for f-m-o.
Also the code duplication here is a bit unfortunate; maybe it would be
better to create a generic version that can be used in both?

>
> > +
> > +  unsigned int dest_regno = REGNO (dest);
> > +
> > +  /* We don't want to fold offsets from instructions that change some
> > +     particular registers with potentially global side effects.  */
> > +  if (!GP_REG_P (dest_regno)
> > +      || dest_regno == STACK_POINTER_REGNUM
> > +      || (frame_pointer_needed && dest_regno == HARD_FRAME_POINTER_REGNUM)
> > +      || dest_regno == GP_REGNUM
> > +      || dest_regno == THREAD_POINTER_REGNUM
> > +      || dest_regno == RETURN_ADDR_REGNUM)
> > +    return 0;
> I'd think most of this would be captured by testing fixed_registers
> rather than trying to list each register individually.  In fact, if we
> need to generalize this to work on other targets we almost certainly
> want a more general test.
>

Thanks, I knew there would be some proper way to test this but wasn't
aware which is the correct one.
Should this look like the code below? Or is the GP_REG_P check
redundant, so that fixed_regs alone will do?

  if (!GP_REG_P (dest_regno) || fixed_regs[dest_regno])
    return 0;

>
> > +      else if ((
> > +             GET_CODE (src) == SIGN_EXTEND
> > +             || GET_CODE (src) == ZERO_EXTEND
> > +           )
> > +           && MEM_P (XEXP (src, 0)))
> Formatting is goofy above...
>

Noted.

>
>
> > +
> > +       if (dump_file)
> > +         {
> > +           fprintf (dump_file, "Instruction deleted from folding:");
> > +           print_rtl_single (dump_file, insn);
> > +         }
> > +
> > +       if (REGNO (dest) != REGNO (arg1))
> > +         {
> > +           /* If the dest register is different from the first argument
> > +              then the addition with constant 0 is equivalent to a move
> > +              instruction.  We emit the move and let the subsequent
> > +              pass cprop_hardreg eliminate that if possible.  */
> > +           rtx arg1_reg_rtx = gen_rtx_REG (GET_MODE (dest), REGNO (arg1));
> > +           rtx mov_rtx = gen_move_insn (dest, arg1_reg_rtx);
> > +           df_insn_rescan (emit_insn_after (mov_rtx, insn));
> > +         }
> > +
> > +       /* If the dest register is the same as the first argument
> > +          then the addition with constant 0 is a no-op.
> > +          We can now delete the original add immediate instruction.  */
> > +       delete_insn (insn);
> The debugging message is a bit misleading.  Yea, we always delete
> something here, but in one case we end up emitting a copy.
>

Indeed. Maybe "Instruction reduced to move: ..."?

>
>
> > +
> > +       /* Temporarily change the offset in MEM to test whether
> > +          it results in a valid instruction.  */
> > +       machine_mode mode = GET_MODE (mem_addr);
> > +       XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, GEN_INT (offset));
> > +
> > +       bool valid_change = recog (PATTERN (insn), insn, 0) >= 0;
> > +
> > +       /* Restore the instruction.  */
> > +       XEXP (mem, 0) = mem_addr;
> You need to reset the INSN_CODE after restoring the instruction.  That's
> generally a bad thing to do, but I've seen it done enough (and been
> guilty myself in the past) that we should just assume some ports are
> broken in this regard.
>

Ok thanks, I didn't know that one.

>
> Anyway, just wanted to get those issues raised so that you can address them.
>
> jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.
  2023-06-08  5:37   ` Jeff Law
@ 2023-06-12  7:36     ` Manolis Tsamis
  2023-06-12 14:37       ` Jeff Law
  0 siblings, 1 reply; 45+ messages in thread
From: Manolis Tsamis @ 2023-06-12  7:36 UTC (permalink / raw)
  To: Jeff Law
  Cc: gcc-patches, Richard Biener, Palmer Dabbelt, Philipp Tomsich, Kito Cheng

On Thu, Jun 8, 2023 at 8:37 AM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 5/25/23 06:35, Manolis Tsamis wrote:
> > Implementation of the new RISC-V optimization pass for memory offset
> > calculations, documentation and testcases.
> >
> > gcc/ChangeLog:
> >
> >       * config.gcc: Add riscv-fold-mem-offsets.o to extra_objs.
> >       * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Schedule a new
> >       pass.
> >       * config/riscv/riscv-protos.h (make_pass_fold_mem_offsets): Declare.
> >       * config/riscv/riscv.opt: New options.
> >       * config/riscv/t-riscv: New build rule.
> >       * doc/invoke.texi: Document new option.
> >       * config/riscv/riscv-fold-mem-offsets.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.target/riscv/fold-mem-offsets-1.c: New test.
> >       * gcc.target/riscv/fold-mem-offsets-2.c: New test.
> >       * gcc.target/riscv/fold-mem-offsets-3.c: New test.
> So not going into the guts of the patch yet.
>
> From a benchmark standpoint the only two that get out of the +-0.05%
> range are mcf and deepsjeng (from a dynamic instruction standpoint).  So
> from an evaluation standpoint we can probably focus our efforts there.
> And as we know, mcf is actually memory bound, so while improving its
> dynamic instruction count is good, the end performance improvement may
> be marginal.
>

Even if late, one question for the dynamic instruction numbers.
Was this measured just with f-m-o or with the stack pointer fold patch
applied too?
I remember I was getting better improvements in the past, but most of
the cases had to do with the stack pointer so the second patch is
necessary.

> As I mentioned to Philipp many months ago this reminds me a lot of a
> problem I've seen before.  Basically register elimination emits code
> that can be terrible in some circumstances.  So I went and poked at this
> again.
>
> I think the key difference between now and what I was dealing with
> before is for the cases that really matter for rv64 we have a shNadd
> insn in the sequence.  That private port I was working on before did not
> have shNadd (don't ask, I probably can't tell).  Our target also had
> reg+reg addressing modes.  What I can't remember was if we were trying
> harder to fold the constant terms into the memory reference or if we
> were more focused on the reg+reg.  Ultimately it's probably not that
> important to remember -- the key is there are very significant
> differences in the target's capabilities which impact how we should be
> generating code in this case.  Those differences affect the code we
> generate *and* the places where we can potentially get control and do
> some address rewriting.
>
> A  key sequence in mcf looks something like this in IRA, others have
> similar structure:
>
> > (insn 237 234 239 26 (set (reg:DI 377)
> >         (plus:DI (ashift:DI (reg:DI 200 [ _173 ])
> >                 (const_int 3 [0x3]))
> >             (reg/f:DI 65 frame))) "pbeampp.c":139:15 333 {*shNadd}
> >      (nil))
> > (insn 239 237 235 26 (set (reg/f:DI 380)
> >         (plus:DI (reg:DI 513)
> >             (reg:DI 377))) "pbeampp.c":139:15 5 {adddi3}
> >      (expr_list:REG_DEAD (reg:DI 377)
> >         (expr_list:REG_EQUAL (plus:DI (reg:DI 377)
> >                 (const_int -32768 [0xffffffffffff8000]))
> >             (nil))))
> [ ... ]
> > (insn 240 235 255 26 (set (reg/f:DI 204 [ _177 ])
> >         (mem/f:DI (plus:DI (reg/f:DI 380)
> >                 (const_int 280 [0x118])) [7 *_176+0 S8 A64])) "pbeampp.c":139:15 179 {*movdi_64bit}
> >      (expr_list:REG_DEAD (reg/f:DI 380)
> >         (nil)))
>
>
> The key here is insn 237.  It's generally going to be bad to have FP
> show up in a shadd insn because it's going to be eliminated into
> sp+offset.  That'll generate an input reload before insn 237 and we
> can't do any combination with the constant in insn 239.
>
> After LRA it looks like this:
>
> > (insn 1540 234 1541 26 (set (reg:DI 11 a1 [750])
> >         (const_int 32768 [0x8000])) "pbeampp.c":139:15 179 {*movdi_64bit}
> >      (nil))
> > (insn 1541 1540 1611 26 (set (reg:DI 12 a2 [749])
> >         (plus:DI (reg:DI 11 a1 [750])
> >             (const_int -272 [0xfffffffffffffef0]))) "pbeampp.c":139:15 5 {adddi3}
> >      (expr_list:REG_EQUAL (const_int 32496 [0x7ef0])
> >         (nil)))
> > (insn 1611 1541 1542 26 (set (reg:DI 29 t4 [795])
> >         (plus:DI (reg/f:DI 2 sp)
> >             (const_int 64 [0x40]))) "pbeampp.c":139:15 5 {adddi3}
> >      (nil))
> > (insn 1542 1611 237 26 (set (reg:DI 12 a2 [749])
> >         (plus:DI (reg:DI 12 a2 [749])
> >             (reg:DI 29 t4 [795]))) "pbeampp.c":139:15 5 {adddi3}
> >      (nil))
> > (insn 237 1542 239 26 (set (reg:DI 12 a2 [377])
> >         (plus:DI (ashift:DI (reg:DI 14 a4 [orig:200 _173 ] [200])
> >                 (const_int 3 [0x3]))
> >             (reg:DI 12 a2 [749]))) "pbeampp.c":139:15 333 {*shNadd}
> >      (nil))
> > (insn 239 237 235 26 (set (reg/f:DI 12 a2 [380])
> >         (plus:DI (reg:DI 10 a0 [513])
> >             (reg:DI 12 a2 [377]))) "pbeampp.c":139:15 5 {adddi3}
> >      (expr_list:REG_EQUAL (plus:DI (reg:DI 12 a2 [377])
> >             (const_int -32768 [0xffffffffffff8000]))
> >         (nil)))
> [ ... ]
> > (insn 240 235 255 26 (set (reg/f:DI 14 a4 [orig:204 _177 ] [204])
> >         (mem/f:DI (plus:DI (reg/f:DI 12 a2 [380])
> >                 (const_int 280 [0x118])) [7 *_176+0 S8 A64])) "pbeampp.c":139:15 179 {*movdi_64bit}
> >      (nil))
>
>
> Reload/LRA made an absolute mess of that code.
>
> But before we add a new pass (target specific or generic), I think it
> may be in our best interest experiment a bit of creative rewriting to
> preserve the shadd, but without the frame pointer.  Perhaps also looking
> for a way to fold the constants, both the explicit ones and the implicit
> one from FP elimination.
>
> This looks particularly promising:
>
> > Trying 237, 239 -> 240:
> >   237: r377:DI=r200:DI<<0x3+frame:DI
> >       REG_DEAD r200:DI
> >   239: r380:DI=r513:DI+r377:DI
> >       REG_DEAD r377:DI
> >       REG_EQUAL r377:DI-0x8000
> >   240: r204:DI=[r380:DI+0x118]
> >       REG_DEAD r380:DI
> > Failed to match this instruction:
> > (set (reg/f:DI 204 [ _177 ])
> >     (mem/f:DI (plus:DI (plus:DI (plus:DI (mult:DI (reg:DI 200 [ _173 ])
> >                         (const_int 8 [0x8]))
> >                     (reg/f:DI 65 frame))
> >                 (reg:DI 513))
> >             (const_int 280 [0x118])) [7 *_176+0 S8 A64]))
>
>
> We could reassociate this as
>
> t1 = r200 * 8 + r513
> t2 = frame + 280
> t3 = t1 + t2
> r204 = *t3
>
> Which after elimination would be
>
> t1 = r200 * 8 + r513
> t2 = sp + C + 280
> t3 = t1 + t2
> r204 = *t3
>
> C + 280 will simplify.  And we'll probably end up in the addptr3 case
> which I think gives us a chance to rewrite this a bit so that we end up with
> t1 = r200 * 8 + r513
> t2 = sp + t1
> r204 = *(t2 + 280 + C)
>
>
> Or at least I *think* we might be able to get there.  Anyway, as I said,
> I think this deserves a bit of playing around before we jump straight
> into adding a new pass.
>
> jeff
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.
  2023-06-10 15:49   ` Jeff Law
@ 2023-06-12  7:41     ` Manolis Tsamis
  2023-06-12 21:36       ` Jeff Law
  0 siblings, 1 reply; 45+ messages in thread
From: Manolis Tsamis @ 2023-06-12  7:41 UTC (permalink / raw)
  To: Jeff Law
  Cc: gcc-patches, Richard Biener, Palmer Dabbelt, Philipp Tomsich, Kito Cheng

On Sat, Jun 10, 2023 at 6:49 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 5/25/23 06:35, Manolis Tsamis wrote:
> > Implementation of the new RISC-V optimization pass for memory offset
> > calculations, documentation and testcases.
> >
> > gcc/ChangeLog:
> >
> >       * config.gcc: Add riscv-fold-mem-offsets.o to extra_objs.
> >       * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Schedule a new
> >       pass.
> >       * config/riscv/riscv-protos.h (make_pass_fold_mem_offsets): Declare.
> >       * config/riscv/riscv.opt: New options.
> >       * config/riscv/t-riscv: New build rule.
> >       * doc/invoke.texi: Document new option.
> >       * config/riscv/riscv-fold-mem-offsets.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> >       * gcc.target/riscv/fold-mem-offsets-1.c: New test.
> >       * gcc.target/riscv/fold-mem-offsets-2.c: New test.
> >       * gcc.target/riscv/fold-mem-offsets-3.c: New test.
>
> So I made a small number of changes so that this could be run on other
> targets.
>
>
> I had an hppa compiler handy, so it was trivial to do some light testing
> with that.  f-m-o didn't help at all on the included tests.  But I think
> that's more likely an artifact of the port supporting scaled indexed
> loads and doing fairly aggressive address rewriting to encourage that
> addressing mode.
>
> Next I had an H8 compiler handy.  All three included tests showed
> improvement, both in terms of instruction count and size.  What was most
> interesting here is that f-m-o removed some redundant address
> calculations without needing to adjust the memory references, which was a
> pleasant surprise.
>
> Given the fact that both ports worked and the H8 showed an improvement,
> the next step was to put the patch into my tester.  It tests 30+
> distinct processor families.  The goal wasn't to evaluate effectiveness,
> but to validate that those targets could still build their target
> libraries and successfully run their testsuites.
>
> That's run through the various crosses.  Things like the hppa, alpha,
> m68k bootstraps only run once a week as they take many hours each.  The
> result is quite encouraging.  None of the crosses had any build issues
> or regressions.
>

That's all great news!

> The net result I think is we should probably move this to a target
> independent optimization pass.  We only need to generalize a few things.
>

I also think that's where this should end up since most of the pass is
target independent anyway.
I just couldn't figure out what would be a proper way to model the
propagation rules for each target.
Is a target hook necessary for that?

> Most importantly we need to get a resolution on the conditional I asked
> about inside get_single_def_in_bb.   There's some other refactoring I
> think we should do, but I'd really like to get a resolution on the code
> in get_single_def_in_bb first, then we ought to be able to move forward
> pretty quickly on the refactoring and integration.
>

Just replied to that in my previous response :)

> jeff

Thanks,
Manolis

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.
  2023-06-12  7:36     ` Manolis Tsamis
@ 2023-06-12 14:37       ` Jeff Law
  0 siblings, 0 replies; 45+ messages in thread
From: Jeff Law @ 2023-06-12 14:37 UTC (permalink / raw)
  To: Manolis Tsamis
  Cc: gcc-patches, Richard Biener, Palmer Dabbelt, Philipp Tomsich, Kito Cheng



On 6/12/23 01:36, Manolis Tsamis wrote:

>>
> 
> Even if late, one question for the dynamic instruction numbers.
> Was this measured just with f-m-o or with the stack pointer fold patch
> applied too?
> I remember I was getting better improvements in the past, but most of
> the cases had to do with the stack pointer so the second patch is
> necessary.
It was just the main f-m-o patch, so there'll be additional benefits 
with the ability to cprop the stack pointer.   And even if we don't get 
measurable wins for something like mcf due to its memory bound nature, 
smaller, tighter code is always preferable.

Jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.
  2023-06-12  7:41     ` Manolis Tsamis
@ 2023-06-12 21:36       ` Jeff Law
  0 siblings, 0 replies; 45+ messages in thread
From: Jeff Law @ 2023-06-12 21:36 UTC (permalink / raw)
  To: Manolis Tsamis
  Cc: gcc-patches, Richard Biener, Palmer Dabbelt, Philipp Tomsich, Kito Cheng



On 6/12/23 01:41, Manolis Tsamis wrote:

> 
> I also think that's where this should end up since most of the pass is
> target independent anyway.
> I just couldn't figure out what would be a proper way to model the
> propagation rules for each target.
> Is a target hook necessary for that?
No hook should be necessary.  You're already checking that the result is 
recognized.  In theory you shouldn't have to, but checking the 
constraints seems advisable as well.

Costing is a different matter.  You might end up changing an offset in such 
a way as to create a longer instruction on targets that have variable 
length encoding.  If we see that we'll likely have to add some rtx cost 
calls and compare the before/after.
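
Something along these lines would do (sketch; assumes the insn has
been re-recognized after the offset change):

  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn));
  int old_cost = insn_cost (insn, speed);
  /* ... apply the offset change and re-recognize ...  */
  if (insn_cost (insn, speed) > old_cost)
    /* Back the change out.  */;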

But I suspect those cases are going to be limited in practice and in 
general if we're able to delete an earlier instruction we going to win 
even if the offset in the MEM changes and perhaps even results in a 
longer instruction.

Jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.
  2023-06-12  7:32     ` Manolis Tsamis
@ 2023-06-12 21:58       ` Jeff Law
  2023-06-15 17:34         ` Manolis Tsamis
  0 siblings, 1 reply; 45+ messages in thread
From: Jeff Law @ 2023-06-12 21:58 UTC (permalink / raw)
  To: Manolis Tsamis
  Cc: gcc-patches, Richard Biener, Palmer Dabbelt, Philipp Tomsich, Kito Cheng



On 6/12/23 01:32, Manolis Tsamis wrote:

>>
>>> +  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
>>> +    {
>>> +      /* Problem getting some definition for this instruction.  */
>>> +      if (ref_link->ref == NULL)
>>> +     return NULL;
>>> +      if (DF_REF_INSN_INFO (ref_link->ref) == NULL)
>>> +     return NULL;
>>> +      if (global_regs[REGNO (reg)]
>>> +       && !set_of (reg, DF_REF_INSN (ref_link->ref)))
>>> +     return NULL;
>>> +    }
>> That last condition feels a bit odd.  It would seem that you wanted an
>> OR boolean rather than AND.
>>
> 
> Most of this function I didn't write myself; I used existing code from
> REE's get_defs to fetch the definitions.
> In the original there's a comment about this line that explains it:
> 
>    As global regs are assumed to be defined at each function call
>    dataflow can report a call_insn as being a definition of REG.
>    But we can't do anything with that in this pass so proceed only
>    if the instruction really sets REG in a way that can be deduced
>    from the RTL structure.
> 
> This function is the only one I copied without changing much (because
> I didn't quite understand it), so I don't know whether that condition
> is of any use for f-m-o.
> Also the code duplication here is a bit unfortunate; maybe it would be
> better to create a generic version that can be used in both?
Ah.  So I think the code is meant to filter out things that DF will say 
are set vs those which are actually exposed explicitly in the RTL (and 
which REE might be able to modify).  So we're probably good.

Those routines are pretty close to each other in implementation.  I 
bet we could take everything up to the loop over the ref links and 
factor that into a common function.  Both your function and get_defs 
would be able to use that and then do a bit of processing afterwards.
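
Roughly (hypothetical shape, modelled on the prologue of REE's
get_defs; the helper name is made up):

  /* Return the DF def chain feeding the use of REG in INSN, or NULL.  */
  static struct df_link *
  def_chain_for_reg_use (rtx_insn *insn, rtx reg)
  {
    df_ref use;
    FOR_EACH_INSN_USE (use, insn)
      {
        if (GET_CODE (DF_REF_REG (use)) == SUBREG)
          return NULL;
        if (REGNO (DF_REF_REG (use)) == REGNO (reg))
          return DF_REF_CHAIN (use);
      }
    return NULL;
  }

get_defs and get_single_def_in_bb would then each walk the returned
chain and apply their own filtering.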


> 
>>
>>> +
>>> +  unsigned int dest_regno = REGNO (dest);
>>> +
>>> +  /* We don't want to fold offsets from instructions that change some
>>> +     particular registers with potentially global side effects.  */
>>> +  if (!GP_REG_P (dest_regno)
>>> +      || dest_regno == STACK_POINTER_REGNUM
>>> +      || (frame_pointer_needed && dest_regno == HARD_FRAME_POINTER_REGNUM)
>>> +      || dest_regno == GP_REGNUM
>>> +      || dest_regno == THREAD_POINTER_REGNUM
>>> +      || dest_regno == RETURN_ADDR_REGNUM)
>>> +    return 0;
>> I'd think most of this would be captured by testing fixed_registers
>> rather than trying to list each register individually.  In fact, if we
>> need to generalize this to work on other targets we almost certainly
>> want a more general test.
>>
> 
> Thanks, I knew there would be some proper way to test this but wasn't
> aware which is the correct one.
> Should this look like the code below? Or is the GP_REG_P check
> redundant, so that fixed_regs alone will do?
If you want to verify it's a general register, then you have to ask if 
the regno is in the GENERAL_REGS class.  Something like:

TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], dest_regno)

GP_REG_P is a risc-v specific macro, so we can't use it here.

So something like
   if (fixed_regs[dest_regno]
       || !TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], dest_regno))



>> The debugging message is a bit misleading.  Yea, we always delete
>> something here, but in one case we end up emitting a copy.
>>
> 
> Indeed. Maybe "Instruction reduced to move: ..."?
Works for me.

> 
>>
>>
>>> +
>>> +       /* Temporarily change the offset in MEM to test whether
>>> +          it results in a valid instruction.  */
>>> +       machine_mode mode = GET_MODE (mem_addr);
>>> +       XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, GEN_INT (offset));
>>> +
>>> +       bool valid_change = recog (PATTERN (insn), insn, 0) >= 0;
>>> +
>>> +       /* Restore the instruction.  */
>>> +       XEXP (mem, 0) = mem_addr;
>> You need to reset the INSN_CODE after restoring the instruction.  That's
>> generally a bad thing to do, but I've seen it done enough (and been
>> guilty myself in the past) that we should just assume some ports are
>> broken in this regard.
>>
> 
> Ok thanks, I didn't know that one.
It's pretty obscure and I probably would have missed it if I hadn't 
debugged a problem related to this just a few months back.  It shouldn't 
be necessary due to rules about how the movXX patterns are supposed to 
work.  But I've seen it mucked up enough that it's the right thing to do.

Essentially when you call into recog, if the pattern is recognized, then 
a cookie is cached so that we know what pattern was recognized within 
the backend.

As far as generalizing to a target independent pass.  You'll need to 
declare the new pass in tree-pass.h, add it to passes.def, wire up the 
option in common.opt, document it in invoke.texi and turn it on for -O2 
and above.  WRT the interaction with shorten-memrefs, I think that can 
be handled in override-options.  I think we want to have shorten-memrefs 
override.  So if shorten-memrefs is on, then we turn off f-m-o in the RV 
backend.  This should probably be documented in invoke.texi as well.
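
Most of that is boilerplate, roughly (option and pass names assumed,
not final):

  ; common.opt
  ffold-mem-offsets
  Common Var(flag_fold_mem_offsets) Optimization
  Fold instructions computing memory offsets into the memory accesses when possible.

  /* passes.def, placement assumed somewhere late in the RTL pipeline.  */
  NEXT_PASS (pass_fold_mem_offsets);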

It sounds like a lot, but I think each of those is a relatively simple 
change.  It'd be best if you could tackle those changes.  I think with 
that done and bootstrap/regression test round we'd be ready to integrate 
your work.

Thanks!

jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations.
  2023-05-25 13:42 ` [PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations Jeff Law
  2023-05-25 13:57   ` Manolis Tsamis
@ 2023-06-15 15:04   ` Jeff Law
  2023-06-15 15:30     ` Manolis Tsamis
  1 sibling, 1 reply; 45+ messages in thread
From: Jeff Law @ 2023-06-15 15:04 UTC (permalink / raw)
  To: Manolis Tsamis, gcc-patches
  Cc: Richard Biener, Palmer Dabbelt, Philipp Tomsich, Kito Cheng



On 5/25/23 07:42, Jeff Law wrote:

> Thanks Manolis.  Do you happen to know if this includes the fixes I 
> passed along to Philipp a few months back?  My recollection is one fixed 
> stale DF data which prevented an ICE during bootstrapping, the other 
> needed to ignore debug insns in one or two places so that the behavior 
> didn't change based on the existence of debug insns.
So we stumbled over another relatively minor issue in this code this 
week that I'm sure you'll want to fix for a V2.

Specifically fold_offset's "scale" argument needs to be a HOST_WIDE_INT 
rather than an "int".  Inside the ASHIFT handling you need to change the 
type of shift_scale to a HOST_WIDE_INT as well and potentially the 
actual computation of shift_scale.
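
The failure mode is plain C overflow, e.g. (illustrative only):

  int bad = 1 << 32;                        /* undefined: shifts off int */
  HOST_WIDE_INT ok = HOST_WIDE_INT_1 << 32; /* 0x100000000 as intended */

so the scale and shift_scale computation needs to happen entirely in
HOST_WIDE_INT.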

The problem is if you have a compile-time constant address on rv64, it 
might be constructed with code like this:




> (insn 282 47 283 6 (set (reg:DI 14 a4 [267])
>         (const_int 348160 [0x55000])) "test_dbmd_pucinterruptenable_rw.c":18:31 179 {*movdi_64bit}
>      (nil))
> (insn 283 282 284 6 (set (reg:DI 14 a4 [267])
>         (plus:DI (reg:DI 14 a4 [267])
>             (const_int 1365 [0x555]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
>      (expr_list:REG_EQUAL (const_int 349525 [0x55555])
>         (nil)))
> (insn 284 283 285 6 (set (reg:DI 13 a3 [268])
>         (const_int 1431662592 [0x55557000])) "test_dbmd_pucinterruptenable_rw.c":18:31 179 {*movdi_64bit}
>      (nil))
> (insn 285 284 215 6 (set (reg:DI 13 a3 [268])
>         (plus:DI (reg:DI 13 a3 [268])
>             (const_int 4 [0x4]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
>      (expr_list:REG_EQUAL (const_int 1431662596 [0x55557004])
>         (nil)))
> (insn 215 285 216 6 (set (reg:DI 14 a4 [271])
>         (ashift:DI (reg:DI 14 a4 [267]) 
>             (const_int 32 [0x20]))) "test_dbmd_pucinterruptenable_rw.c":18:31 204 {ashldi3}
>      (nil)) 
> (insn 216 215 42 6 (set (reg/f:DI 14 a4 [166])
>         (plus:DI (reg:DI 14 a4 [271]) 
>             (reg:DI 13 a3 [268]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
>      (expr_list:REG_DEAD (reg:DI 13 a3 [268])
>         (expr_list:REG_EQUIV (const_int 1501199875796996 [0x5555555557004])
>             (nil))))



Note that 32bit ASHIFT in insn 215.  If you're doing that computation in 
a 32bit integer type, then it's going to shift off the end of the type.


Jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations.
  2023-06-15 15:04   ` Jeff Law
@ 2023-06-15 15:30     ` Manolis Tsamis
  2023-06-15 15:56       ` Jeff Law
  2023-06-18 18:11       ` Jeff Law
  0 siblings, 2 replies; 45+ messages in thread
From: Manolis Tsamis @ 2023-06-15 15:30 UTC (permalink / raw)
  To: Jeff Law
  Cc: gcc-patches, Richard Biener, Palmer Dabbelt, Philipp Tomsich, Kito Cheng

On Thu, Jun 15, 2023 at 6:04 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 5/25/23 07:42, Jeff Law wrote:
>
> > Thanks Manolis.  Do you happen to know if this includes the fixes I
> > passed along to Philipp a few months back?  My recollection is one fixed
> > stale DF data which prevented an ICE during bootstrapping, the other
> > needed to ignore debug insns in one or two places so that the behavior
> > didn't change based on the existence of debug insns.
> So we stumbled over another relatively minor issue in this code this
> week that I'm sure you'll want to fix for a V2.
>
> Specifically fold_offset's "scale" argument needs to be a HOST_WIDE_INT
> rather than an "int".  Inside the ASHIFT handling you need to change the
> type of shift_scale to a HOST_WIDE_INT as well and potentially the
> actual computation of shift_scale.
>
> The problem is if you have a compile-time constant address on rv64, it
> might be constructed with code like this:
>
>
>
>
> > (insn 282 47 283 6 (set (reg:DI 14 a4 [267])
> >         (const_int 348160 [0x55000])) "test_dbmd_pucinterruptenable_rw.c":18:31 179 {*movdi_64bit}
> >      (nil))
> > (insn 283 282 284 6 (set (reg:DI 14 a4 [267])
> >         (plus:DI (reg:DI 14 a4 [267])
> >             (const_int 1365 [0x555]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
> >      (expr_list:REG_EQUAL (const_int 349525 [0x55555])
> >         (nil)))
> > (insn 284 283 285 6 (set (reg:DI 13 a3 [268])
> >         (const_int 1431662592 [0x55557000])) "test_dbmd_pucinterruptenable_rw.c":18:31 179 {*movdi_64bit}
> >      (nil))
> > (insn 285 284 215 6 (set (reg:DI 13 a3 [268])
> >         (plus:DI (reg:DI 13 a3 [268])
> >             (const_int 4 [0x4]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
> >      (expr_list:REG_EQUAL (const_int 1431662596 [0x55557004])
> >         (nil)))
> > (insn 215 285 216 6 (set (reg:DI 14 a4 [271])
> >         (ashift:DI (reg:DI 14 a4 [267])
> >             (const_int 32 [0x20]))) "test_dbmd_pucinterruptenable_rw.c":18:31 204 {ashldi3}
> >      (nil))
> > (insn 216 215 42 6 (set (reg/f:DI 14 a4 [166])
> >         (plus:DI (reg:DI 14 a4 [271])
> >             (reg:DI 13 a3 [268]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
> >      (expr_list:REG_DEAD (reg:DI 13 a3 [268])
> >         (expr_list:REG_EQUIV (const_int 1501199875796996 [0x5555555557004])
> >             (nil))))
>
>
>
> Note that 32bit ASHIFT in insn 215.  If you're doing that computation in
> a 32bit integer type, then it's going to shift off the end of the type.
>
Thanks for reporting. I also noticed this while reworking the
implementation for v2 and I have fixed it among other things.

But I'm still wondering about the type of the offset folding
calculation and whether it could overflow in a bad way:
Could there also be edge cases where HOST_WIDE_INT would be problematic as well?
Maybe unsigned HOST_WIDE_INT is more correct (due to potential overflow issues)?

Manolis

>
> Jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations.
  2023-06-15 15:30     ` Manolis Tsamis
@ 2023-06-15 15:56       ` Jeff Law
  2023-06-18 18:11       ` Jeff Law
  1 sibling, 0 replies; 45+ messages in thread
From: Jeff Law @ 2023-06-15 15:56 UTC (permalink / raw)
  To: Manolis Tsamis
  Cc: gcc-patches, Richard Biener, Palmer Dabbelt, Philipp Tomsich, Kito Cheng



On 6/15/23 09:30, Manolis Tsamis wrote:
> On Thu, Jun 15, 2023 at 6:04 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>>
>>
>>
>> On 5/25/23 07:42, Jeff Law wrote:
>>
>>> Thanks Manolis.  Do you happen to know if this includes the fixes I
>>> passed along to Philipp a few months back?  My recollection is one fixed
>>> stale DF data which prevented an ICE during bootstrapping, the other
>>> needed to ignore debug insns in one or two places so that the behavior
>>> didn't change based on the existence of debug insns.
>> So we stumbled over another relatively minor issue in this code this
>> week that I'm sure you'll want to fix for a V2.
>>
>> Specifically fold_offset's "scale" argument needs to be a HOST_WIDE_INT
>> rather than an "int".  Inside the ASHIFT handling you need to change the
>> type of shift_scale to a HOST_WIDE_INT as well and potentially the
>> actual computation of shift_scale.
>>
>> The problem is if you have a compile-time constant address on rv64, it
>> might be constructed with code like this:
>>
>>
>>
>>
>>> (insn 282 47 283 6 (set (reg:DI 14 a4 [267])
>>>          (const_int 348160 [0x55000])) "test_dbmd_pucinterruptenable_rw.c":18:31 179 {*movdi_64bit}
>>>       (nil))
>>> (insn 283 282 284 6 (set (reg:DI 14 a4 [267])
>>>          (plus:DI (reg:DI 14 a4 [267])
>>>              (const_int 1365 [0x555]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
>>>       (expr_list:REG_EQUAL (const_int 349525 [0x55555])
>>>          (nil)))
>>> (insn 284 283 285 6 (set (reg:DI 13 a3 [268])
>>>          (const_int 1431662592 [0x55557000])) "test_dbmd_pucinterruptenable_rw.c":18:31 179 {*movdi_64bit}
>>>       (nil))
>>> (insn 285 284 215 6 (set (reg:DI 13 a3 [268])
>>>          (plus:DI (reg:DI 13 a3 [268])
>>>              (const_int 4 [0x4]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
>>>       (expr_list:REG_EQUAL (const_int 1431662596 [0x55557004])
>>>          (nil)))
>>> (insn 215 285 216 6 (set (reg:DI 14 a4 [271])
>>>          (ashift:DI (reg:DI 14 a4 [267])
>>>              (const_int 32 [0x20]))) "test_dbmd_pucinterruptenable_rw.c":18:31 204 {ashldi3}
>>>       (nil))
>>> (insn 216 215 42 6 (set (reg/f:DI 14 a4 [166])
>>>          (plus:DI (reg:DI 14 a4 [271])
>>>              (reg:DI 13 a3 [268]))) "test_dbmd_pucinterruptenable_rw.c":18:31 5 {riscv_adddi3}
>>>       (expr_list:REG_DEAD (reg:DI 13 a3 [268])
>>>          (expr_list:REG_EQUIV (const_int 1501199875796996 [0x5555555557004])
>>>              (nil))))
>>
>>
>>
>> Note that 32bit ASHIFT in insn 215.  If you're doing that computation in
>> a 32bit integer type, then it's going to shift off the end of the type.
>>
> Thanks for reporting. I also noticed this while reworking the
> implementation for v2 and I have fixed it among other things.
> 
> But I'm still wondering about the type of the offset folding
> calculation and whether it could overflow in a bad way:
> Could there also be edge cases where HOST_WIDE_INT would be problematic as well?
> Maybe unsigned HOST_WIDE_INT is more correct (due to potential overflow issues)?
I think HOST_WIDE_INT is going to be OK.  If we overflow a H_W_I, then 
there's bigger problems elsewhere.

jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets.
  2023-06-12 21:58       ` Jeff Law
@ 2023-06-15 17:34         ` Manolis Tsamis
  0 siblings, 0 replies; 45+ messages in thread
From: Manolis Tsamis @ 2023-06-15 17:34 UTC (permalink / raw)
  To: Jeff Law
  Cc: gcc-patches, Richard Biener, Palmer Dabbelt, Philipp Tomsich, Kito Cheng

The new target-independent implementation of the fold-mem-offsets pass
can be found on the list (link:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/621920.html)

Aside from now being target independent, I have fixed a number of new
bugs that emerged when running this on other targets and a minor
memory leak.
I have also improved the propagation logic in fold_offsets to work
with more patterns found in other targets (e.g. LEA instructions in
x86).
Finally I improved the naming of things (e.g. replaced uses of
'delete'/'remove' with 'fold', made bitmap names more meaningful) and
reduced unnecessary verbosity in some comments.

Thanks,
Manolis

On Tue, Jun 13, 2023 at 12:58 AM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 6/12/23 01:32, Manolis Tsamis wrote:
>
> >>
> >>> +  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
> >>> +    {
> >>> +      /* Problem getting some definition for this instruction.  */
> >>> +      if (ref_link->ref == NULL)
> >>> +     return NULL;
> >>> +      if (DF_REF_INSN_INFO (ref_link->ref) == NULL)
> >>> +     return NULL;
> >>> +      if (global_regs[REGNO (reg)]
> >>> +       && !set_of (reg, DF_REF_INSN (ref_link->ref)))
> >>> +     return NULL;
> >>> +    }
> >> That last condition feels a bit odd.  It would seem that you wanted an
> >> OR boolean rather than AND.
> >>
> >
> > Most of this function I didn't write myself; I used existing code from
> > REE's get_defs to fetch the definitions.
> > In the original there's a comment about this line that explains it:
> >
> >    As global regs are assumed to be defined at each function call
> >    dataflow can report a call_insn as being a definition of REG.
> >    But we can't do anything with that in this pass so proceed only
> >    if the instruction really sets REG in a way that can be deduced
> >    from the RTL structure.
> >
> > This function is the only one I copied without changing much (because
> > I didn't quite understand it), so I don't know whether that condition
> > is of any use for f-m-o.
> > Also the code duplication here is a bit unfortunate; maybe it would be
> > better to create a generic version that can be used in both?
> Ah.  So I think the code is meant to filter out things that DF will say
> are set vs those which are actually exposed explicitly in the RTL (and
> which REE might be able to modify).  So we're probably good.
>
> Those routines are pretty close to each other in implementation.  I
> bet we could take everything up to the loop over the ref links and
> factor that into a common function.  Both your function and get_defs
> would be able to use that and then do a bit of processing afterwards.
>
>
> >
> >>
> >>> +
> >>> +  unsigned int dest_regno = REGNO (dest);
> >>> +
> >>> +  /* We don't want to fold offsets from instructions that change some
> >>> +     particular registers with potentially global side effects.  */
> >>> +  if (!GP_REG_P (dest_regno)
> >>> +      || dest_regno == STACK_POINTER_REGNUM
> >>> +      || (frame_pointer_needed && dest_regno == HARD_FRAME_POINTER_REGNUM)
> >>> +      || dest_regno == GP_REGNUM
> >>> +      || dest_regno == THREAD_POINTER_REGNUM
> >>> +      || dest_regno == RETURN_ADDR_REGNUM)
> >>> +    return 0;
> >> I'd think most of this would be captured by testing fixed_registers
> >> rather than trying to list each register individually.  In fact, if we
> >> need to generalize this to work on other targets we almost certainly
> >> want a more general test.
> >>
> >
> > Thanks, I knew there would be some proper way to test this but wasn't
> > aware which is the correct one.
> > Should this look like the code below? Or is the GP_REG_P check
> > redundant, so that fixed_regs alone will do?
> If you want to verify it's a general register, then you have to ask if
> the regno is in the GENERAL_REGS class.  Something like:
>
> TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], dest_regno)
>
> GP_REG_P is a risc-v specific macro, so we can't use it here.
>
> So something like
>    if (fixed_regs[dest_regno]
>        || !TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], dest_regno))
>
>
>
> >> The debugging message is a bit misleading.  Yea, we always delete
> >> something here, but in one case we end up emitting a copy.
> >>
> >
> > Indeed. Maybe "Instruction reduced to move: ..."?
> Works for me.
>
> >
> >>
> >>
> >>> +
> >>> +       /* Temporarily change the offset in MEM to test whether
> >>> +          it results in a valid instruction.  */
> >>> +       machine_mode mode = GET_MODE (mem_addr);
> >>> +       XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, GEN_INT (offset));
> >>> +
> >>> +       bool valid_change = recog (PATTERN (insn), insn, 0) >= 0;
> >>> +
> >>> +       /* Restore the instruction.  */
> >>> +       XEXP (mem, 0) = mem_addr;
> >> You need to reset the INSN_CODE after restoring the instruction.  That's
> >> generally a bad thing to do, but I've seen it done enough (and been
> >> guilty myself in the past) that we should just assume some ports are
> >> broken in this regard.
> >>
> >
> > Ok thanks, I didn't knew that one.
> It's pretty obscure and I probably would have missed it if I hadn't
> debugged a problem related to this just a few months back.  It shouldn't
> be necessary due to rules about how the movXX patterns are supposed to
> work.  But I've seen it mucked up enough that it's the right thing to do.
>
> Essentially when you call into recog, if the pattern is recognized, then
> a cookie is cached so that we know what pattern was recognized within
> the backend.
>
> As far as generalizing to a target independent pass.  You'll need to
> declare the new pass in tree-pass.h, add it to passes.def, wire up the
> option in common.opt, document it in invoke.texi and turn it on for -O2
> and above.  WRT the interaction with shorten-memrefs, I think that can
> be handled in override-options.  I think we want to have shorten-memrefs
> override.  So if shorten-memrefs is on, then we turn off f-o-m in the RV
> backend.  This should probably be documented in invoke.texi as well.
>
> It sounds like a lot, but I think each of those is a relatively simple
> change.  It'd be best if you could tackle those changes.  I think with
> that done and bootstrap/regression test round we'd be ready to integrate
> your work.
>
> Thanks!
>
> jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.
  2023-06-07 22:18   ` Jeff Law
  2023-06-08  6:15     ` Manolis Tsamis
@ 2023-06-15 20:13     ` Philipp Tomsich
  2023-06-19 16:57       ` Thiago Jung Bauermann
  1 sibling, 1 reply; 45+ messages in thread
From: Philipp Tomsich @ 2023-06-15 20:13 UTC (permalink / raw)
  To: Jeff Law
  Cc: Manolis Tsamis, gcc-patches, Richard Biener, Palmer Dabbelt, Kito Cheng

Rebased, retested, and applied to trunk.  Thanks!
--Philipp.


On Thu, 8 Jun 2023 at 00:18, Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 5/25/23 06:35, Manolis Tsamis wrote:
> > Propagation of the stack pointer in cprop_hardreg is currently forbidden
> > in all cases, due to maybe_mode_change returning NULL. Relax this
> > restriction and allow propagation when no mode change is requested.
> >
> > gcc/ChangeLog:
> >
> >          * regcprop.cc (maybe_mode_change): Enable stack pointer propagation.
> Thanks for the clarification.  This is OK for the trunk.  It looks
> generic enough to have value going forward now rather than waiting.
>
> jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations.
  2023-06-15 15:30     ` Manolis Tsamis
  2023-06-15 15:56       ` Jeff Law
@ 2023-06-18 18:11       ` Jeff Law
  1 sibling, 0 replies; 45+ messages in thread
From: Jeff Law @ 2023-06-18 18:11 UTC (permalink / raw)
  To: Manolis Tsamis
  Cc: gcc-patches, Richard Biener, Palmer Dabbelt, Philipp Tomsich, Kito Cheng



On 6/15/23 09:30, Manolis Tsamis wrote:

>>
> Thanks for reporting. I also noticed this while reworking the
> implementation for v2 and I have fixed it among other things.
Sounds good.  I stumbled across another problem while testing V2.

GEN_INT can create a non-canonical integer constant (and one might 
legitimately wonder if we should eliminate GEN_INT).  The specific case 
I ran into was something like 0xfffffff0 for an SImode value on a 64bit 
host.  That should have been 0xfffffffffffffff0 to be canonical.

The right way to handle this these days is with gen_int_mode.    You 
should replace the two calls to GEN_INT with gen_int_mode (new_offset, mode)
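
I.e. (sketch; the gen_rtx_PLUS context is taken from the v1 patch):

  /* Before: may yield a non-canonical CONST_INT for narrow modes.  */
  XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, GEN_INT (new_offset));

  /* After: gen_int_mode sign-extends the value according to MODE.  */
  XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, mode));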

Still testing the new variant...

jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.
  2023-06-15 20:13     ` Philipp Tomsich
@ 2023-06-19 16:57       ` Thiago Jung Bauermann
  2023-06-19 17:07         ` Manolis Tsamis
  2023-06-19 23:40         ` Andrew Pinski
  0 siblings, 2 replies; 45+ messages in thread
From: Thiago Jung Bauermann @ 2023-06-19 16:57 UTC (permalink / raw)
  To: Manolis Tsamis
  Cc: Jeff Law, Philipp Tomsich, Richard Biener, Palmer Dabbelt,
	Kito Cheng, gcc-patches


Hello Manolis,

Philipp Tomsich <philipp.tomsich@vrull.eu> writes:

> On Thu, 8 Jun 2023 at 00:18, Jeff Law <jeffreyalaw@gmail.com> wrote:
>>
>> On 5/25/23 06:35, Manolis Tsamis wrote:
>> > Propagation of the stack pointer in cprop_hardreg is currently forbidden
>> > in all cases, due to maybe_mode_change returning NULL. Relax this
>> > restriction and allow propagation when no mode change is requested.
>> >
>> > gcc/ChangeLog:
>> >
>> >          * regcprop.cc (maybe_mode_change): Enable stack pointer propagation.
>> Thanks for the clarification.  This is OK for the trunk.  It looks
>> generic enough to have value going forward now rather than waiting.
>
> Rebased, retested, and applied to trunk.  Thanks!

Our CI found a couple of tests that started failing on aarch64-linux
after this commit. I was able to confirm manually that they don't happen
in the commit immediately before this one, and also that these failures
are still present in today's trunk.

I have testsuite logs for last good commit, first bad commit and current
trunk here:

https://people.linaro.org/~thiago.bauermann/gcc-regression-6a2e8dcbbd4b/

Could you please check?

These are the new failures:

Running gcc:gcc.target/aarch64/aarch64.exp ...
FAIL: gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times mov\\tx11, sp 1

Running gcc:gcc.target/aarch64/sve/pcs/aarch64-sve-pcs.exp ...
FAIL: gcc.target/aarch64/sve/pcs/args_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_pred
FAIL: gcc.target/aarch64/sve/pcs/args_2.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_3.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_4.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tfmov\\t(z[0-9]+\\.h), #8\\.0.*\\tst1h\\t\\1, p[0-7], \\[x4\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_bf16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2b\\t{(z[0-9]+\\.b) - z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2b\\t{(z[0-9]+\\.b) - z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_le_bf16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+)\\.s - z[0-9]+\\.s}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+)\\.d - z[0-9]+\\.d}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_le_s16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_le_s32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+)\\.s - z[0-9]+\\.s}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_le_s64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+)\\.d - z[0-9]+\\.d}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_le_s8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2b\\t{(z[0-9]+)\\.b - z[0-9]+\\.b}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_le_u16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_le_u32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+)\\.s - z[0-9]+\\.s}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_le_u64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+)\\.d - z[0-9]+\\.d}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_5_le_u8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2b\\t{(z[0-9]+)\\.b - z[0-9]+\\.b}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_be_bf16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1h\\t(z[0-9]+\\.h), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1h\\t\\1, p[0-7], \\[x2\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_be_bf16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_be_f16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1h\\t(z[0-9]+\\.h), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1h\\t\\1, p[0-7], \\[x2\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_be_f16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_be_f32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1w\\t(z[0-9]+\\.s), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1w\\t\\1, p[0-7], \\[x2\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_be_f32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4w\\t{(z[0-9]+\\.s) - z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_be_f64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1d\\t(z[0-9]+\\.d), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1d\\t\\1, p[0-7], \\[x2\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_be_f64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4d\\t{(z[0-9]+\\.d) - z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_be_s16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1h\\t(z[0-9]+\\.h), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1h\\t\\1, p[0-7], \\[x2\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_be_s16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_be_s32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1w\\t(z[0-9]+\\.s), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1w\\t\\1, p[0-7], \\[x2\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_be_s32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4w\\t{(z[0-9]+\\.s) - z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_be_s64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1d\\t(z[0-9]+\\.d), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1d\\t\\1, p[0-7], \\[x2\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_be_s64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4d\\t{(z[0-9]+\\.d) - z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_be_s8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1b\\t(z[0-9]+\\.b), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1b\\t\\1, p[0-7], \\[x2\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_be_s8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4b\\t{(z[0-9]+\\.b) - z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_be_u16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1h\\t(z[0-9]+\\.h), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1h\\t\\1, p[0-7], \\[x2\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_be_u16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_be_u32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1w\\t(z[0-9]+\\.s), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1w\\t\\1, p[0-7], \\[x2\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_be_u32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4w\\t{(z[0-9]+\\.s) - z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_be_u64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1d\\t(z[0-9]+\\.d), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1d\\t\\1, p[0-7], \\[x2\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_be_u64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4d\\t{(z[0-9]+\\.d) - z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_be_u8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1b\\t(z[0-9]+\\.b), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1b\\t\\1, p[0-7], \\[x2\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_be_u8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4b\\t{(z[0-9]+\\.b) - z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_le_bf16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1h\\t(z[0-9]+\\.h), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1h\\t\\1, p[0-7], \\[x2\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_le_bf16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4h\\t{(z[0-9]+)\\.h - z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_le_f16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1h\\t(z[0-9]+\\.h), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1h\\t\\1, p[0-7], \\[x2\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_le_f16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4h\\t{(z[0-9]+)\\.h - z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_le_f32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1w\\t(z[0-9]+\\.s), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1w\\t\\1, p[0-7], \\[x2\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_le_f32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4w\\t{(z[0-9]+)\\.s - z[0-9]+\\.s}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_le_f64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1d\\t(z[0-9]+\\.d), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1d\\t\\1, p[0-7], \\[x2\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_le_f64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4d\\t{(z[0-9]+)\\.d - z[0-9]+\\.d}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_le_s16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1h\\t(z[0-9]+\\.h), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1h\\t\\1, p[0-7], \\[x2\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_le_s16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4h\\t{(z[0-9]+)\\.h - z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_le_s32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1w\\t(z[0-9]+\\.s), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1w\\t\\1, p[0-7], \\[x2\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_le_s32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4w\\t{(z[0-9]+)\\.s - z[0-9]+\\.s}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_le_s64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1d\\t(z[0-9]+\\.d), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1d\\t\\1, p[0-7], \\[x2\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_le_s64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4d\\t{(z[0-9]+)\\.d - z[0-9]+\\.d}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_le_s8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1b\\t(z[0-9]+\\.b), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1b\\t\\1, p[0-7], \\[x2\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_le_s8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4b\\t{(z[0-9]+)\\.b - z[0-9]+\\.b}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_le_u16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1h\\t(z[0-9]+\\.h), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1h\\t\\1, p[0-7], \\[x2\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_le_u16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4h\\t{(z[0-9]+)\\.h - z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_le_u32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1w\\t(z[0-9]+\\.s), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1w\\t\\1, p[0-7], \\[x2\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_le_u32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4w\\t{(z[0-9]+)\\.s - z[0-9]+\\.s}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_le_u64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1d\\t(z[0-9]+\\.d), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1d\\t\\1, p[0-7], \\[x2\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_le_u64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4d\\t{(z[0-9]+)\\.d - z[0-9]+\\.d}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_le_u8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1b\\t(z[0-9]+\\.b), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1b\\t\\1, p[0-7], \\[x2\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_6_le_u8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4b\\t{(z[0-9]+)\\.b - z[0-9]+\\.b}.*\\tstr\\t\\1, \\[x1\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/args_8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), #42\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
FAIL: gcc.target/aarch64/sve/pcs/stack_clash_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_1
FAIL: gcc.target/aarch64/sve/pcs/stack_clash_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_3
FAIL: gcc.target/aarch64/sve/pcs/stack_clash_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_4
FAIL: gcc.target/aarch64/sve/pcs/stack_clash_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_5
FAIL: gcc.target/aarch64/sve/pcs/stack_clash_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_6
FAIL: gcc.target/aarch64/sve/pcs/stack_clash_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_7
FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_1
FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_10
FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_11
FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_2
FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_3
FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_4
FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_5
FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_6
FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_7
FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_8
FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_9
FAIL: gcc.target/aarch64/sve/pcs/stack_clash_3.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_1
FAIL: gcc.target/aarch64/sve/pcs/stack_clash_3.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_2
FAIL: gcc.target/aarch64/sve/pcs/varargs_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
FAIL: gcc.target/aarch64/sve/pcs/varargs_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
FAIL: gcc.target/aarch64/sve/pcs/varargs_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_f16.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_f16.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_f16.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_f32.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_f32.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_f32.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_f64.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_f64.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_f64.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s16.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s16.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s16.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s32.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s32.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s32.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s64.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s64.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s64.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s8.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s8.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s8.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u16.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u16.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u16.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u32.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u32.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u32.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u64.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u64.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u64.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u8.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u8.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u8.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7

-- 
Thiago

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.
  2023-06-19 16:57       ` Thiago Jung Bauermann
@ 2023-06-19 17:07         ` Manolis Tsamis
  2023-06-19 23:40         ` Andrew Pinski
  1 sibling, 0 replies; 45+ messages in thread
From: Manolis Tsamis @ 2023-06-19 17:07 UTC (permalink / raw)
  To: Thiago Jung Bauermann
  Cc: Jeff Law, Philipp Tomsich, Richard Biener, Palmer Dabbelt,
	Kito Cheng, gcc-patches

On Mon, Jun 19, 2023 at 7:57 PM Thiago Jung Bauermann
<thiago.bauermann@linaro.org> wrote:
>
>
> Hello Manolis,
>
> Philipp Tomsich <philipp.tomsich@vrull.eu> writes:
>
> > On Thu, 8 Jun 2023 at 00:18, Jeff Law <jeffreyalaw@gmail.com> wrote:
> >>
> >> On 5/25/23 06:35, Manolis Tsamis wrote:
> >> > Propagation of the stack pointer in cprop_hardreg is currently forbidden
> >> > in all cases, due to maybe_mode_change returning NULL. Relax this
> >> > restriction and allow propagation when no mode change is requested.
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> >          * regcprop.cc (maybe_mode_change): Enable stack pointer propagation.
> >> Thanks for the clarification.  This is OK for the trunk.  It looks
> >> generic enough to have value going forward now rather than waiting.
> >
> > Rebased, retested, and applied to trunk.  Thanks!
>
> Our CI found a couple of tests that started failing on aarch64-linux
> after this commit. I was able to confirm manually that they don't happen
> in the commit immediately before this one, and also that these failures
> are still present in today's trunk.
>
> I have testsuite logs for last good commit, first bad commit and current
> trunk here:
>
> https://people.linaro.org/~thiago.bauermann/gcc-regression-6a2e8dcbbd4b/
>
> Could you please check?
>
> These are the new failures:
>
> Running gcc:gcc.target/aarch64/aarch64.exp ...
> FAIL: gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times mov\\tx11, sp 1
>
> Running gcc:gcc.target/aarch64/sve/pcs/aarch64-sve-pcs.exp ...
> [... full list of gcc.target/aarch64/sve/pcs failures snipped; identical to the list above ...]
>
Hi Thiago,

Thanks for the heads-up on this; I only tested this on x86 when I sent it.
I'll take a look and post an update in this thread ASAP.

Manolis

> --
> Thiago

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.
  2023-06-19 16:57       ` Thiago Jung Bauermann
  2023-06-19 17:07         ` Manolis Tsamis
@ 2023-06-19 23:40         ` Andrew Pinski
  2023-06-19 23:48           ` Andrew Pinski
  1 sibling, 1 reply; 45+ messages in thread
From: Andrew Pinski @ 2023-06-19 23:40 UTC (permalink / raw)
  To: Thiago Jung Bauermann
  Cc: Manolis Tsamis, Jeff Law, Philipp Tomsich, Richard Biener,
	Palmer Dabbelt, Kito Cheng, gcc-patches

On Mon, Jun 19, 2023 at 9:58 AM Thiago Jung Bauermann via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
> Hello Manolis,
>
> Philipp Tomsich <philipp.tomsich@vrull.eu> writes:
>
> > On Thu, 8 Jun 2023 at 00:18, Jeff Law <jeffreyalaw@gmail.com> wrote:
> >>
> >> On 5/25/23 06:35, Manolis Tsamis wrote:
> >> > Propagation of the stack pointer in cprop_hardreg is currently forbidden
> >> > in all cases, due to maybe_mode_change returning NULL. Relax this
> >> > restriction and allow propagation when no mode change is requested.
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> >          * regcprop.cc (maybe_mode_change): Enable stack pointer propagation.
> >> Thanks for the clarification.  This is OK for the trunk.  It looks
> >> generic enough to have value going forward now rather than waiting.
> >
> > Rebased, retested, and applied to trunk.  Thanks!
>
> Our CI found a couple of tests that started failing on aarch64-linux
> after this commit. I was able to confirm manually that they don't happen
> in the commit immediately before this one, and also that these failures
> are still present in today's trunk.
>
> I have testsuite logs for last good commit, first bad commit and current
> trunk here:
>
> https://people.linaro.org/~thiago.bauermann/gcc-regression-6a2e8dcbbd4b/
>
> Could you please check?
>
> These are the new failures:
>
> Running gcc:gcc.target/aarch64/aarch64.exp ...
> FAIL: gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times mov\\tx11, sp 1

So for the above, before this change we had:
```
(insn:TI 597 596 598 2 (set (reg:DI 11 x11)
        (reg/f:DI 31 sp)) "stack-check-prologue-16.c":16:1 65 {*movdi_aarch64}
     (nil))
(insn 598 597 599 2 (set (mem:BLK (scratch) [0  A8])
        (unspec:BLK [
                (reg:DI 11 x11)
                (reg/f:DI 31 sp)
            ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1 1169 {stack_tie}
     (expr_list:REG_DEAD (reg:DI 11 x11)
        (nil)))
```

After the change we get:
```
(insn 598 596 599 2 (set (mem:BLK (scratch) [0  A8])
        (unspec:BLK [
                (reg:DI 31 sp [11]) repeated x2
            ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1 1169 {stack_tie}
     (nil))
```
That seems to be OK, except we still have:
.cfi_def_cfa_register 11

That is because of this insn:
(insn/f 596 595 598 2 (set (reg:DI 12 x12)
        (plus:DI (reg:DI 12 x12)
            (const_int 272 [0x110]))) "stack-check-prologue-16.c":16:1 153 {*adddi3_aarch64}
     (expr_list:REG_CFA_DEF_CFA (reg:DI 11 x11)
        (nil)))

We record x11 in the REG_CFA_DEF_CFA note but never update it, even
though that insn comes before the mov of x11 ... So it seems like
cprop_hardreg had no idea it needed to update the note.
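
Just to illustrate where the stale reference lives, here is a rough,
untested sketch of the kind of note rewrite that would be needed when
the copy is eliminated (fixup_cfa_note is a made-up helper, nothing
like it exists in regcprop.cc today):
```
/* Untested sketch: if propagation eliminates the "x11 = sp" copy,
   a REG_CFA_DEF_CFA note that still names the copy register would
   have to be retargeted at the stack pointer as well.  Real notes
   can also hold a (plus (reg) (const_int)), which this ignores.  */
static void
fixup_cfa_note (rtx_insn *insn, unsigned int old_regno, rtx new_reg)
{
  rtx note = find_reg_note (insn, REG_CFA_DEF_CFA, NULL_RTX);
  if (note
      && REG_P (XEXP (note, 0))
      && REGNO (XEXP (note, 0)) == old_regno)
    XEXP (note, 0) = new_reg;
}
```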

I suspect the other testcases just see sp propagated into the stores
and such, and merely need their scan patterns updated. But the above
testcase seems to be getting broken CFI, though I don't know how to fix it.
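
For reference, the special case being relaxed here is the stack-pointer
check in regcprop.cc:maybe_mode_change. Paraphrasing the commit message,
a sketch of the shape of the change (not the literal diff):
```
/* Sketch only: maybe_mode_change used to return NULL_RTX
   unconditionally for the stack pointer; the patch allows
   propagation when no mode change is requested.  */
if (regno == STACK_POINTER_REGNUM)
  {
    if (orig_mode == new_mode)
      return stack_pointer_rtx;  /* same mode: propagate sp */
    return NULL_RTX;             /* mode change: still forbidden */
  }
```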

Thanks,
Andrew Pinski


>
> Running gcc:gcc.target/aarch64/sve/pcs/aarch64-sve-pcs.exp ...
> FAIL: gcc.target/aarch64/sve/pcs/args_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_pred
> FAIL: gcc.target/aarch64/sve/pcs/args_2.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_3.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_4.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tfmov\\t(z[0-9]+\\.h), #8\\.0.*\\tst1h\\t\\1, p[0-7], \\[x4\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_bf16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2b\\t{(z[0-9]+\\.b) - z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2b\\t{(z[0-9]+\\.b) - z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_le_bf16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+)\\.s - z[0-9]+\\.s}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+)\\.d - z[0-9]+\\.d}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_le_s16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_le_s32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+)\\.s - z[0-9]+\\.s}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_le_s64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+)\\.d - z[0-9]+\\.d}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_le_s8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2b\\t{(z[0-9]+)\\.b - z[0-9]+\\.b}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_le_u16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_le_u32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+)\\.s - z[0-9]+\\.s}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_le_u64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+)\\.d - z[0-9]+\\.d}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_5_le_u8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2b\\t{(z[0-9]+)\\.b - z[0-9]+\\.b}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_be_bf16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1h\\t(z[0-9]+\\.h), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1h\\t\\1, p[0-7], \\[x2\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_be_bf16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_be_f16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1h\\t(z[0-9]+\\.h), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1h\\t\\1, p[0-7], \\[x2\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_be_f16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_be_f32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1w\\t(z[0-9]+\\.s), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1w\\t\\1, p[0-7], \\[x2\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_be_f32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4w\\t{(z[0-9]+\\.s) - z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_be_f64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1d\\t(z[0-9]+\\.d), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1d\\t\\1, p[0-7], \\[x2\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_be_f64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4d\\t{(z[0-9]+\\.d) - z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_be_s16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1h\\t(z[0-9]+\\.h), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1h\\t\\1, p[0-7], \\[x2\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_be_s16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_be_s32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1w\\t(z[0-9]+\\.s), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1w\\t\\1, p[0-7], \\[x2\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_be_s32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4w\\t{(z[0-9]+\\.s) - z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_be_s64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1d\\t(z[0-9]+\\.d), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1d\\t\\1, p[0-7], \\[x2\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_be_s64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4d\\t{(z[0-9]+\\.d) - z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_be_s8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1b\\t(z[0-9]+\\.b), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1b\\t\\1, p[0-7], \\[x2\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_be_s8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4b\\t{(z[0-9]+\\.b) - z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_be_u16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1h\\t(z[0-9]+\\.h), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1h\\t\\1, p[0-7], \\[x2\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_be_u16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_be_u32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1w\\t(z[0-9]+\\.s), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1w\\t\\1, p[0-7], \\[x2\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_be_u32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4w\\t{(z[0-9]+\\.s) - z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_be_u64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1d\\t(z[0-9]+\\.d), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1d\\t\\1, p[0-7], \\[x2\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_be_u64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4d\\t{(z[0-9]+\\.d) - z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_be_u8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1b\\t(z[0-9]+\\.b), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1b\\t\\1, p[0-7], \\[x2\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_be_u8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4b\\t{(z[0-9]+\\.b) - z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_le_bf16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1h\\t(z[0-9]+\\.h), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1h\\t\\1, p[0-7], \\[x2\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_le_bf16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4h\\t{(z[0-9]+)\\.h - z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_le_f16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1h\\t(z[0-9]+\\.h), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1h\\t\\1, p[0-7], \\[x2\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_le_f16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4h\\t{(z[0-9]+)\\.h - z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_le_f32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1w\\t(z[0-9]+\\.s), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1w\\t\\1, p[0-7], \\[x2\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_le_f32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4w\\t{(z[0-9]+)\\.s - z[0-9]+\\.s}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_le_f64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1d\\t(z[0-9]+\\.d), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1d\\t\\1, p[0-7], \\[x2\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_le_f64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4d\\t{(z[0-9]+)\\.d - z[0-9]+\\.d}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_le_s16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1h\\t(z[0-9]+\\.h), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1h\\t\\1, p[0-7], \\[x2\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_le_s16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4h\\t{(z[0-9]+)\\.h - z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_le_s32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1w\\t(z[0-9]+\\.s), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1w\\t\\1, p[0-7], \\[x2\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_le_s32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4w\\t{(z[0-9]+)\\.s - z[0-9]+\\.s}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_le_s64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1d\\t(z[0-9]+\\.d), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1d\\t\\1, p[0-7], \\[x2\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_le_s64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4d\\t{(z[0-9]+)\\.d - z[0-9]+\\.d}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_le_s8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1b\\t(z[0-9]+\\.b), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1b\\t\\1, p[0-7], \\[x2\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_le_s8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4b\\t{(z[0-9]+)\\.b - z[0-9]+\\.b}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_le_u16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1h\\t(z[0-9]+\\.h), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1h\\t\\1, p[0-7], \\[x2\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_le_u16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4h\\t{(z[0-9]+)\\.h - z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_le_u32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1w\\t(z[0-9]+\\.s), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1w\\t\\1, p[0-7], \\[x2\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_le_u32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4w\\t{(z[0-9]+)\\.s - z[0-9]+\\.s}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_le_u64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1d\\t(z[0-9]+\\.d), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1d\\t\\1, p[0-7], \\[x2\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_le_u64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4d\\t{(z[0-9]+)\\.d - z[0-9]+\\.d}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_le_u8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1b\\t(z[0-9]+\\.b), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1b\\t\\1, p[0-7], \\[x2\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_6_le_u8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4b\\t{(z[0-9]+)\\.b - z[0-9]+\\.b}.*\\tstr\\t\\1, \\[x1\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/args_8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), #42\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
> FAIL: gcc.target/aarch64/sve/pcs/stack_clash_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_1
> FAIL: gcc.target/aarch64/sve/pcs/stack_clash_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_3
> FAIL: gcc.target/aarch64/sve/pcs/stack_clash_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_4
> FAIL: gcc.target/aarch64/sve/pcs/stack_clash_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_5
> FAIL: gcc.target/aarch64/sve/pcs/stack_clash_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_6
> FAIL: gcc.target/aarch64/sve/pcs/stack_clash_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_7
> FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_1
> FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_10
> FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_11
> FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_2
> FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_3
> FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_4
> FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_5
> FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_6
> FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_7
> FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_8
> FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_9
> FAIL: gcc.target/aarch64/sve/pcs/stack_clash_3.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_1
> FAIL: gcc.target/aarch64/sve/pcs/stack_clash_3.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_2
> FAIL: gcc.target/aarch64/sve/pcs/varargs_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
> FAIL: gcc.target/aarch64/sve/pcs/varargs_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
> FAIL: gcc.target/aarch64/sve/pcs/varargs_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_f16.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_f16.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_f16.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_f32.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_f32.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_f32.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_f64.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_f64.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_f64.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s16.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s16.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s16.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s32.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s32.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s32.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s64.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s64.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s64.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s8.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s8.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s8.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u16.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u16.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u16.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u32.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u32.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u32.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u64.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u64.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u64.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u8.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u8.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
> FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u8.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
>
> --
> Thiago

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.
  2023-06-19 23:40         ` Andrew Pinski
@ 2023-06-19 23:48           ` Andrew Pinski
  2023-06-20  2:16             ` Jeff Law
  0 siblings, 1 reply; 45+ messages in thread
From: Andrew Pinski @ 2023-06-19 23:48 UTC (permalink / raw)
  To: Thiago Jung Bauermann
  Cc: Manolis Tsamis, Jeff Law, Philipp Tomsich, Richard Biener,
	Palmer Dabbelt, Kito Cheng, gcc-patches, Tamar Christina

On Mon, Jun 19, 2023 at 4:40 PM Andrew Pinski <pinskia@gmail.com> wrote:
>
> On Mon, Jun 19, 2023 at 9:58 AM Thiago Jung Bauermann via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> >
> > Hello Manolis,
> >
> > Philipp Tomsich <philipp.tomsich@vrull.eu> writes:
> >
> > > On Thu, 8 Jun 2023 at 00:18, Jeff Law <jeffreyalaw@gmail.com> wrote:
> > >>
> > >> On 5/25/23 06:35, Manolis Tsamis wrote:
> > >> > Propagation of the stack pointer in cprop_hardreg is currently forbidden
> > >> > in all cases, due to maybe_mode_change returning NULL. Relax this
> > >> > restriction and allow propagation when no mode change is requested.
> > >> >
> > >> > gcc/ChangeLog:
> > >> >
> > >> >          * regcprop.cc (maybe_mode_change): Enable stack pointer propagation.
> > >> Thanks for the clarification.  This is OK for the trunk.  It looks
> > >> generic enough to have value going forward now rather than waiting.
> > >
> > > Rebased, retested, and applied to trunk.  Thanks!
> >
> > Our CI found a couple of tests that started failing on aarch64-linux
> > after this commit. I was able to confirm manually that they don't happen
> > in the commit immediately before this one, and also that these failures
> > are still present in today's trunk.
> >
> > I have testsuite logs for last good commit, first bad commit and current
> > trunk here:
> >
> > https://people.linaro.org/~thiago.bauermann/gcc-regression-6a2e8dcbbd4b/
> >
> > Could you please check?
> >
> > These are the new failures:
> >
> > Running gcc:gcc.target/aarch64/aarch64.exp ...
> > FAIL: gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times mov\\tx11, sp 1
>
> So for the above before this change we had:
> ```
> (insn:TI 597 596 598 2 (set (reg:DI 11 x11)
>         (reg/f:DI 31 sp)) "stack-check-prologue-16.c":16:1 65 {*movdi_aarch64}
>      (nil))
> (insn 598 597 599 2 (set (mem:BLK (scratch) [0  A8])
>         (unspec:BLK [
>                 (reg:DI 11 x11)
>                 (reg/f:DI 31 sp)
>             ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1 1169
> {stack_tie}
>      (expr_list:REG_DEAD (reg:DI 11 x11)
>         (nil)))
> ```
>
> After we get:
> ```
> (insn 598 596 599 2 (set (mem:BLK (scratch) [0  A8])
>         (unspec:BLK [
>                 (reg:DI 31 sp [11]) repeated x2
>             ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1 1169
> {stack_tie}
>      (nil))
> ```
> Which seems to be ok, except we still have:
> .cfi_def_cfa_register 11
>
> That is because on:
> (insn/f 596 595 598 2 (set (reg:DI 12 x12)
>         (plus:DI (reg:DI 12 x12)
>             (const_int 272 [0x110]))) "stack-check-prologue-16.c":16:1
> 153 {*adddi3_aarch64}
>      (expr_list:REG_CFA_DEF_CFA (reg:DI 11 x11)
>         (nil)))
>
> We record x11 but never update the note, even though that insn came
> before the mov for x11 ... So it seems like cprop_hardreg had no idea
> it needed to update it.
>
> I suspect the other testcases just show propagation of sp into the
> stores and such and merely need updating. But the above testcase seems
> to end up with broken CFI, though I don't know how to fix it.

The code from aarch64.cc:
```
          /* This is done to provide unwinding information for the stack
             adjustments we're about to do, however to prevent the optimizers
             from removing the R11 move and leaving the CFA note (which would be
             very wrong) we tie the old and new stack pointer together.
             The tie will expand to nothing but the optimizers will not touch
             the instruction.  */
          rtx stack_ptr_copy = gen_rtx_REG (Pmode, STACK_CLASH_SVE_CFA_REGNUM);
          emit_move_insn (stack_ptr_copy, stack_pointer_rtx);
          emit_insn (gen_stack_tie (stack_ptr_copy, stack_pointer_rtx));

          /* We want the CFA independent of the stack pointer for the
             duration of the loop.  */
          add_reg_note (insn, REG_CFA_DEF_CFA, stack_ptr_copy);
          RTX_FRAME_RELATED_P (insn) = 1;
```

Well, except that now, with this change, the optimizers do touch this
instruction. Maybe the move instruction should not be a plain move but
an unspec, so the optimizers don't know what the move was.
Adding Tamar, who originally added this code to aarch64, to the CC for
comments on the above understanding.
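
To illustrate what I mean (this is only a sketch of the idea, not a
tested patch: UNSPEC_SP_COPY is a made-up name, and a matching
define_insn would also have to be added to aarch64.md), the copy could
be emitted as something like:

```
  /* Hypothetical sketch: hide the sp read behind an unspec so that
     regcprop no longer sees a plain sp -> x11 copy and therefore
     cannot propagate sp into later uses of x11.  */
  rtx stack_ptr_copy = gen_rtx_REG (Pmode, STACK_CLASH_SVE_CFA_REGNUM);
  rtx src = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, stack_pointer_rtx),
                            UNSPEC_SP_COPY);
  emit_insn (gen_rtx_SET (stack_ptr_copy, src));
```

That way the insn is no longer a register copy as far as the RTL
optimizers are concerned, so the REG_CFA_DEF_CFA note naming x11 should
stay valid.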

Thanks,
Andrew


>
> Thanks,
> Andrew Pinski
>
>
> >
> > Running gcc:gcc.target/aarch64/sve/pcs/aarch64-sve-pcs.exp ...
> > FAIL: gcc.target/aarch64/sve/pcs/args_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_pred
> > FAIL: gcc.target/aarch64/sve/pcs/args_2.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_3.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), #8\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_4.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tfmov\\t(z[0-9]+\\.h), #8\\.0.*\\tst1h\\t\\1, p[0-7], \\[x4\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_bf16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_f64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_s8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2b\\t{(z[0-9]+\\.b) - z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+\\.s) - z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+\\.d) - z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_5_be_u8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2b\\t{(z[0-9]+\\.b) - z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_5_le_bf16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+)\\.s - z[0-9]+\\.s}.*\\tstr\\t\\1, \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_5_le_f64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+)\\.d - z[0-9]+\\.d}.*\\tstr\\t\\1, \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_5_le_s16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_5_le_s32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+)\\.s - z[0-9]+\\.s}.*\\tstr\\t\\1, \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_5_le_s64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+)\\.d - z[0-9]+\\.d}.*\\tstr\\t\\1, \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_5_le_s8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2b\\t{(z[0-9]+)\\.b - z[0-9]+\\.b}.*\\tstr\\t\\1, \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_5_le_u16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2h\\t{(z[0-9]+)\\.h - z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_5_le_u32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2w\\t{(z[0-9]+)\\.s - z[0-9]+\\.s}.*\\tstr\\t\\1, \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_5_le_u64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2d\\t{(z[0-9]+)\\.d - z[0-9]+\\.d}.*\\tstr\\t\\1, \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_5_le_u8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld2b\\t{(z[0-9]+)\\.b - z[0-9]+\\.b}.*\\tstr\\t\\1, \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_be_bf16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1h\\t(z[0-9]+\\.h), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1h\\t\\1, p[0-7], \\[x2\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_be_bf16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_be_f16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1h\\t(z[0-9]+\\.h), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1h\\t\\1, p[0-7], \\[x2\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_be_f16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_be_f32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1w\\t(z[0-9]+\\.s), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1w\\t\\1, p[0-7], \\[x2\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_be_f32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4w\\t{(z[0-9]+\\.s) - z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_be_f64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1d\\t(z[0-9]+\\.d), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1d\\t\\1, p[0-7], \\[x2\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_be_f64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4d\\t{(z[0-9]+\\.d) - z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_be_s16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1h\\t(z[0-9]+\\.h), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1h\\t\\1, p[0-7], \\[x2\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_be_s16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_be_s32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1w\\t(z[0-9]+\\.s), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1w\\t\\1, p[0-7], \\[x2\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_be_s32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4w\\t{(z[0-9]+\\.s) - z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_be_s64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1d\\t(z[0-9]+\\.d), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1d\\t\\1, p[0-7], \\[x2\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_be_s64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4d\\t{(z[0-9]+\\.d) - z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_be_s8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1b\\t(z[0-9]+\\.b), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1b\\t\\1, p[0-7], \\[x2\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_be_s8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4b\\t{(z[0-9]+\\.b) - z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_be_u16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1h\\t(z[0-9]+\\.h), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1h\\t\\1, p[0-7], \\[x2\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_be_u16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4h\\t{(z[0-9]+\\.h) - z[0-9]+\\.h}.*\\tst1h\\t\\1, p[0-7], \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_be_u32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1w\\t(z[0-9]+\\.s), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1w\\t\\1, p[0-7], \\[x2\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_be_u32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4w\\t{(z[0-9]+\\.s) - z[0-9]+\\.s}.*\\tst1w\\t\\1, p[0-7], \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_be_u64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1d\\t(z[0-9]+\\.d), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1d\\t\\1, p[0-7], \\[x2\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_be_u64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4d\\t{(z[0-9]+\\.d) - z[0-9]+\\.d}.*\\tst1d\\t\\1, p[0-7], \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_be_u8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1b\\t(z[0-9]+\\.b), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1b\\t\\1, p[0-7], \\[x2\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_be_u8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4b\\t{(z[0-9]+\\.b) - z[0-9]+\\.b}.*\\tst1b\\t\\1, p[0-7], \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_le_bf16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1h\\t(z[0-9]+\\.h), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1h\\t\\1, p[0-7], \\[x2\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_le_bf16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4h\\t{(z[0-9]+)\\.h - z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_le_f16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1h\\t(z[0-9]+\\.h), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1h\\t\\1, p[0-7], \\[x2\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_le_f16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4h\\t{(z[0-9]+)\\.h - z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_le_f32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1w\\t(z[0-9]+\\.s), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1w\\t\\1, p[0-7], \\[x2\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_le_f32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4w\\t{(z[0-9]+)\\.s - z[0-9]+\\.s}.*\\tstr\\t\\1, \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_le_f64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1d\\t(z[0-9]+\\.d), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1d\\t\\1, p[0-7], \\[x2\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_le_f64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4d\\t{(z[0-9]+)\\.d - z[0-9]+\\.d}.*\\tstr\\t\\1, \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_le_s16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1h\\t(z[0-9]+\\.h), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1h\\t\\1, p[0-7], \\[x2\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_le_s16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4h\\t{(z[0-9]+)\\.h - z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_le_s32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1w\\t(z[0-9]+\\.s), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1w\\t\\1, p[0-7], \\[x2\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_le_s32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4w\\t{(z[0-9]+)\\.s - z[0-9]+\\.s}.*\\tstr\\t\\1, \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_le_s64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1d\\t(z[0-9]+\\.d), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1d\\t\\1, p[0-7], \\[x2\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_le_s64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4d\\t{(z[0-9]+)\\.d - z[0-9]+\\.d}.*\\tstr\\t\\1, \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_le_s8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1b\\t(z[0-9]+\\.b), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1b\\t\\1, p[0-7], \\[x2\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_le_s8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4b\\t{(z[0-9]+)\\.b - z[0-9]+\\.b}.*\\tstr\\t\\1, \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_le_u16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1h\\t(z[0-9]+\\.h), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1h\\t\\1, p[0-7], \\[x2\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_le_u16.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4h\\t{(z[0-9]+)\\.h - z[0-9]+\\.h}.*\\tstr\\t\\1, \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_le_u32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1w\\t(z[0-9]+\\.s), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1w\\t\\1, p[0-7], \\[x2\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_le_u32.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4w\\t{(z[0-9]+)\\.s - z[0-9]+\\.s}.*\\tstr\\t\\1, \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_le_u64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1d\\t(z[0-9]+\\.d), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1d\\t\\1, p[0-7], \\[x2\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_le_u64.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4d\\t{(z[0-9]+)\\.d - z[0-9]+\\.d}.*\\tstr\\t\\1, \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_le_u8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld1b\\t(z[0-9]+\\.b), p[0-7]/z, \\[x0, #5, mul vl\\]\\n.*\\tst1b\\t\\1, p[0-7], \\[x2\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_6_le_u8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tld4b\\t{(z[0-9]+)\\.b - z[0-9]+\\.b}.*\\tstr\\t\\1, \\[x1\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/args_8.c -march=armv8.2-a+sve -fno-stack-protector  scan-assembler \\tmov\\t(z[0-9]+\\.b), #42\\n.*\\tst1b\\t\\1, p[0-7], \\[x4\\]\\n
> > FAIL: gcc.target/aarch64/sve/pcs/stack_clash_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_1
> > FAIL: gcc.target/aarch64/sve/pcs/stack_clash_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_3
> > FAIL: gcc.target/aarch64/sve/pcs/stack_clash_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_4
> > FAIL: gcc.target/aarch64/sve/pcs/stack_clash_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_5
> > FAIL: gcc.target/aarch64/sve/pcs/stack_clash_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_6
> > FAIL: gcc.target/aarch64/sve/pcs/stack_clash_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_7
> > FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_1
> > FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_10
> > FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_11
> > FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_2
> > FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_3
> > FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_4
> > FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_5
> > FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_6
> > FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_7
> > FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_8
> > FAIL: gcc.target/aarch64/sve/pcs/stack_clash_2.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_9
> > FAIL: gcc.target/aarch64/sve/pcs/stack_clash_3.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_1
> > FAIL: gcc.target/aarch64/sve/pcs/stack_clash_3.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies test_2
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_1.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_f16.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_f16.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_f16.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_f32.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_f32.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_f32.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_f64.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_f64.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_f64.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s16.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s16.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s16.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s32.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s32.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s32.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s64.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s64.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s64.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s8.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s8.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_s8.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u16.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u16.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u16.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u32.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u32.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u32.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u64.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u64.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u64.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u8.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_0
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u8.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_1
> > FAIL: gcc.target/aarch64/sve/pcs/varargs_2_u8.c -march=armv8.2-a+sve -fno-stack-protector  check-function-bodies caller_7
> >
> > --
> > Thiago

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.
  2023-06-19 23:48           ` Andrew Pinski
@ 2023-06-20  2:16             ` Jeff Law
  2023-06-20  4:52               ` Tamar Christina
  0 siblings, 1 reply; 45+ messages in thread
From: Jeff Law @ 2023-06-20  2:16 UTC (permalink / raw)
  To: Andrew Pinski, Thiago Jung Bauermann
  Cc: Manolis Tsamis, Philipp Tomsich, Richard Biener, Palmer Dabbelt,
	Kito Cheng, gcc-patches, Tamar Christina



On 6/19/23 17:48, Andrew Pinski wrote:
> On Mon, Jun 19, 2023 at 4:40 PM Andrew Pinski <pinskia@gmail.com> wrote:
>>
>> On Mon, Jun 19, 2023 at 9:58 AM Thiago Jung Bauermann via Gcc-patches
>> <gcc-patches@gcc.gnu.org> wrote:
>>>
>>>
>>> Hello Manolis,
>>>
>>> Philipp Tomsich <philipp.tomsich@vrull.eu> writes:
>>>
>>>> On Thu, 8 Jun 2023 at 00:18, Jeff Law <jeffreyalaw@gmail.com> wrote:
>>>>>
>>>>> On 5/25/23 06:35, Manolis Tsamis wrote:
> >>>>>> Propagation of the stack pointer in cprop_hardreg is currently forbidden
>>>>>> in all cases, due to maybe_mode_change returning NULL. Relax this
>>>>>> restriction and allow propagation when no mode change is requested.
>>>>>>
>>>>>> gcc/ChangeLog:
>>>>>>
>>>>>>           * regcprop.cc (maybe_mode_change): Enable stack pointer propagation.
>>>>> Thanks for the clarification.  This is OK for the trunk.  It looks
>>>>> generic enough to have value going forward now rather than waiting.
>>>>
>>>> Rebased, retested, and applied to trunk.  Thanks!
>>>
>>> Our CI found a couple of tests that started failing on aarch64-linux
>>> after this commit. I was able to confirm manually that they don't happen
>>> in the commit immediately before this one, and also that these failures
>>> are still present in today's trunk.
>>>
>>> I have testsuite logs for last good commit, first bad commit and current
>>> trunk here:
>>>
>>> https://people.linaro.org/~thiago.bauermann/gcc-regression-6a2e8dcbbd4b/
>>>
>>> Could you please check?
>>>
>>> These are the new failures:
>>>
>>> Running gcc:gcc.target/aarch64/aarch64.exp ...
>>> FAIL: gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times mov\\tx11, sp 1
>>
>> So for the above before this change we had:
>> ```
>> (insn:TI 597 596 598 2 (set (reg:DI 11 x11)
>>          (reg/f:DI 31 sp)) "stack-check-prologue-16.c":16:1 65 {*movdi_aarch64}
>>       (nil))
>> (insn 598 597 599 2 (set (mem:BLK (scratch) [0  A8])
>>          (unspec:BLK [
>>                  (reg:DI 11 x11)
>>                  (reg/f:DI 31 sp)
>>              ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1 1169
>> {stack_tie}
>>       (expr_list:REG_DEAD (reg:DI 11 x11)
>>          (nil)))
>> ```
>>
>> After we get:
>> ```
>> (insn 598 596 599 2 (set (mem:BLK (scratch) [0  A8])
>>          (unspec:BLK [
>>                  (reg:DI 31 sp [11]) repeated x2
>>              ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1 1169
>> {stack_tie}
>>       (nil))
>> ```
>> Which seems to be ok, except we still have:
>> .cfi_def_cfa_register 11
>>
>> That is because on:
>> (insn/f 596 595 598 2 (set (reg:DI 12 x12)
>>          (plus:DI (reg:DI 12 x12)
>>              (const_int 272 [0x110]))) "stack-check-prologue-16.c":16:1
>> 153 {*adddi3_aarch64}
>>       (expr_list:REG_CFA_DEF_CFA (reg:DI 11 x11)
>>          (nil)))
>>
>> We record x11 but never update the note, even though that insn came
>> before the mov for x11 ... So it seems like cprop_hardreg had no idea
>> it needed to update it.
>>
>> I suspect the other testcases just show propagation of sp into the
>> stores and such and merely need updating. But the above testcase seems
>> to end up with broken CFI, though I don't know how to fix it.
> 
> The code from aarch64.cc:
> ```
>            /* This is done to provide unwinding information for the stack
>               adjustments we're about to do, however to prevent the optimizers
>               from removing the R11 move and leaving the CFA note (which would be
>               very wrong) we tie the old and new stack pointer together.
>               The tie will expand to nothing but the optimizers will not touch
>               the instruction.  */
>            rtx stack_ptr_copy = gen_rtx_REG (Pmode, STACK_CLASH_SVE_CFA_REGNUM);
>            emit_move_insn (stack_ptr_copy, stack_pointer_rtx);
>            emit_insn (gen_stack_tie (stack_ptr_copy, stack_pointer_rtx));
> 
>            /* We want the CFA independent of the stack pointer for the
>               duration of the loop.  */
>            add_reg_note (insn, REG_CFA_DEF_CFA, stack_ptr_copy);
>            RTX_FRAME_RELATED_P (insn) = 1;
> ```
> 
> Well, except that now, with this change, the optimizers do touch this
> instruction. Maybe the move instruction should not be a plain move but
> an unspec, so the optimizers don't know what the move was.
> Adding Tamar, who originally added this code to aarch64, to the CC for
> comments on the above understanding.
It's a bit hackish, but could we reject the stack pointer for operand1 
in the stack-tie?  And if we do so, does it help?
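
Roughly something like the following, purely as a sketch (the predicate
name is made up, and I haven't verified that this makes recog reject the
propagation cleanly):

```
/* Hypothetical operand check for the aarch64 stack_tie pattern:
   accept any register except the stack pointer.  If the scratch
   operand can never be sp, then regcprop's validate_change should
   fail when it tries to replace x11 with sp inside the tie, and the
   propagation would be refused, keeping the x11 move alive.  */
static bool
aarch64_stack_tie_reg_ok (rtx op)
{
  return REG_P (op) && REGNO (op) != SP_REGNUM;
}
```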

jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* RE: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.
  2023-06-20  2:16             ` Jeff Law
@ 2023-06-20  4:52               ` Tamar Christina
  2023-06-20  5:00                 ` Jeff Law
  0 siblings, 1 reply; 45+ messages in thread
From: Tamar Christina @ 2023-06-20  4:52 UTC (permalink / raw)
  To: Jeff Law, Andrew Pinski, Thiago Jung Bauermann
  Cc: Manolis Tsamis, Philipp Tomsich, Richard Biener, Palmer Dabbelt,
	Kito Cheng, gcc-patches

> -----Original Message-----
> From: Jeff Law <jeffreyalaw@gmail.com>
> Sent: Tuesday, June 20, 2023 3:17 AM
> To: Andrew Pinski <pinskia@gmail.com>; Thiago Jung Bauermann
> <thiago.bauermann@linaro.org>
> Cc: Manolis Tsamis <manolis.tsamis@vrull.eu>; Philipp Tomsich
> <philipp.tomsich@vrull.eu>; Richard Biener <richard.guenther@gmail.com>;
> Palmer Dabbelt <palmer@rivosinc.com>; Kito Cheng <kito.cheng@gmail.com>;
> gcc-patches@gcc.gnu.org; Tamar Christina <Tamar.Christina@arm.com>
> Subject: Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack
> pointer if possible.
> 
> 
> 
> On 6/19/23 17:48, Andrew Pinski wrote:
> > On Mon, Jun 19, 2023 at 4:40 PM Andrew Pinski <pinskia@gmail.com>
> wrote:
> >>
> >> On Mon, Jun 19, 2023 at 9:58 AM Thiago Jung Bauermann via Gcc-patches
> >> <gcc-patches@gcc.gnu.org> wrote:
> >>>
> >>>
> >>> Hello Manolis,
> >>>
> >>> Philipp Tomsich <philipp.tomsich@vrull.eu> writes:
> >>>
> >>>> On Thu, 8 Jun 2023 at 00:18, Jeff Law <jeffreyalaw@gmail.com> wrote:
> >>>>>
> >>>>> On 5/25/23 06:35, Manolis Tsamis wrote:
> >>>>>> Propagation of the stack pointer in cprop_hardreg is currently
> >>>>>> forbidden in all cases, due to maybe_mode_change returning NULL.
> >>>>>> Relax this restriction and allow propagation when no mode change is
> requested.
> >>>>>>
> >>>>>> gcc/ChangeLog:
> >>>>>>
> >>>>>>           * regcprop.cc (maybe_mode_change): Enable stack pointer
> propagation.
> >>>>> Thanks for the clarification.  This is OK for the trunk.  It looks
> >>>>> generic enough to have value going forward now rather than waiting.
> >>>>
> >>>> Rebased, retested, and applied to trunk.  Thanks!
> >>>
> >>> Our CI found a couple of tests that started failing on aarch64-linux
> >>> after this commit. I was able to confirm manually that they don't
> >>> happen in the commit immediately before this one, and also that
> >>> these failures are still present in today's trunk.
> >>>
> >>> I have testsuite logs for last good commit, first bad commit and
> >>> current trunk here:
> >>>
> >>> https://people.linaro.org/~thiago.bauermann/gcc-regression-6a2e8dcbb
> >>> d4b/
> >>>
> >>> Could you please check?
> >>>
> >>> These are the new failures:
> >>>
> >>> Running gcc:gcc.target/aarch64/aarch64.exp ...
> >>> FAIL: gcc.target/aarch64/stack-check-cfa-3.c scan-assembler-times
> >>> mov\\tx11, sp 1
> >>
> >> So for the above before this change we had:
> >> ```
> >> (insn:TI 597 596 598 2 (set (reg:DI 11 x11)
> >>          (reg/f:DI 31 sp)) "stack-check-prologue-16.c":16:1 65
> {*movdi_aarch64}
> >>       (nil))
> >> (insn 598 597 599 2 (set (mem:BLK (scratch) [0  A8])
> >>          (unspec:BLK [
> >>                  (reg:DI 11 x11)
> >>                  (reg/f:DI 31 sp)
> >>              ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1
> >> 1169 {stack_tie}
> >>       (expr_list:REG_DEAD (reg:DI 11 x11)
> >>          (nil)))
> >> ```
> >>
> >> After we get:
> >> ```
> >> (insn 598 596 599 2 (set (mem:BLK (scratch) [0  A8])
> >>          (unspec:BLK [
> >>                  (reg:DI 31 sp [11]) repeated x2
> >>              ] UNSPEC_PRLG_STK)) "stack-check-prologue-16.c":16:1
> >> 1169 {stack_tie}
> >>       (nil))
> >> ```
> >> Which seems to be ok, except we still have:
> >> .cfi_def_cfa_register 11
> >>
> >> That is because on:
> >> (insn/f 596 595 598 2 (set (reg:DI 12 x12)
> >>          (plus:DI (reg:DI 12 x12)
> >>              (const_int 272 [0x110])))
> >> "stack-check-prologue-16.c":16:1
> >> 153 {*adddi3_aarch64}
> >>       (expr_list:REG_CFA_DEF_CFA (reg:DI 11 x11)
> >>          (nil)))
> >>
> >> We record x11 but never update the note, even though that insn came
> >> before the mov for x11 ... So it seems like cprop_hardreg had no idea
> >> it needed to update it.
> >>
> >> I suspect the other testcases just show propagation of sp into the
> >> stores and such and merely need updating. But the above testcase seems
> >> to end up with broken CFI, though I don't know how to fix it.

Yeah, we noticed the failures internally but left them broken, since we have
an upcoming AArch64 patch which requires them to be updated anyway, and we
are rolling the updates up into that patch.

> >
> > The code from aarch64.cc:
> > ```
> >            /* This is done to provide unwinding information for the stack
> >               adjustments we're about to do, however to prevent the optimizers
> >               from removing the R11 move and leaving the CFA note (which would
> be
> >               very wrong) we tie the old and new stack pointer together.
> >               The tie will expand to nothing but the optimizers will not touch
> >               the instruction.  */
> >            rtx stack_ptr_copy = gen_rtx_REG (Pmode,
> STACK_CLASH_SVE_CFA_REGNUM);
> >            emit_move_insn (stack_ptr_copy, stack_pointer_rtx);
> >            emit_insn (gen_stack_tie (stack_ptr_copy,
> > stack_pointer_rtx));
> >
> >            /* We want the CFA independent of the stack pointer for the
> >               duration of the loop.  */
> >            add_reg_note (insn, REG_CFA_DEF_CFA, stack_ptr_copy);
> >            RTX_FRAME_RELATED_P (insn) = 1; ```
> >
> > Well except now with this change, the optimizers touch this
> > instruction. Maybe the move instruction should not be a move but an
> > unspec so optimizers don't know what the move was.
> > Adding Tamar to the CC who added this code to aarch64 originally for
> > comments on the above understanding here.
> It's a bit hackish, but could we reject the stack pointer for operand1 in the
> stack-tie?  And if we do so, does it help?

Yeah, this one I had to defer until later this week to look at more closely,
because what I'm wondering about is whether the optimization should apply to
frame-related RTX as well.

Looking at the description of RTX_FRAME_RELATED_P, it seems this optimization
may end up de-optimizing RISC targets by creating an offset that is larger
than the offset which can be used from the SP, making reload have to spill.
i.e. sometimes the move was done explicitly.  So perhaps the pass should not
apply the propagation to RTX_FRAME_RELATED_P insns in find_oldest_value_reg
and copyprop_hardreg_forward_1?

Other parts of this pass already seem to bail out in similar situations.  So
I needed to write some testcases to check what would happen in these cases,
hence the deferral to later in the week.
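
The kind of bail-out I have in mind would be roughly the following
(again, only an untested sketch):

```
  /* Hypothetical guard in copyprop_hardreg_forward_1: do not propagate
     into frame-related insns, whose CFA reg notes may name the very
     register a propagation would rewrite away.  The insn's own sets
     would still have to be recorded in the value chains, so the bare
     'continue' here is only illustrative.  */
  if (RTX_FRAME_RELATED_P (insn))
    continue;
```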

Kind Regards,
Tamar

> 
> jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.
  2023-06-20  4:52               ` Tamar Christina
@ 2023-06-20  5:00                 ` Jeff Law
  2023-06-21 23:42                   ` Thiago Jung Bauermann
  0 siblings, 1 reply; 45+ messages in thread
From: Jeff Law @ 2023-06-20  5:00 UTC (permalink / raw)
  To: Tamar Christina, Andrew Pinski, Thiago Jung Bauermann
  Cc: Manolis Tsamis, Philipp Tomsich, Richard Biener, Palmer Dabbelt,
	Kito Cheng, gcc-patches



On 6/19/23 22:52, Tamar Christina wrote:

>> It's a bit hackish, but could we reject the stack pointer for operand1 in the
>> stack-tie?  And if we do so, does it help?
> 
> Yeah, this one I had to defer until later this week to look at more
> closely, because what I'm wondering about is whether the optimization
> should apply to frame-related RTX as well.
>
> Looking at the description of RTX_FRAME_RELATED_P, it seems this
> optimization may end up de-optimizing RISC targets by creating an offset
> that is larger than the offset which can be used from the SP, making
> reload have to spill.  i.e. sometimes the move was done explicitly.  So
> perhaps the pass should not apply the propagation to RTX_FRAME_RELATED_P
> insns in find_oldest_value_reg and copyprop_hardreg_forward_1?
>
> Other parts of this pass already seem to bail out in similar situations.
> So I needed to write some testcases to check what would happen in these
> cases, hence the deferral to later in the week.
Rejecting for RTX_FRAME_RELATED_P would seem reasonable and probably 
better in general to me.  The cases where we're looking to clean things 
up aren't really in the prologue/epilogue, but instead in the main body 
after register elimination has turned fp into sp + offset, thus making 
all kinds of things no longer valid.

jeff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.
  2023-06-20  5:00                 ` Jeff Law
@ 2023-06-21 23:42                   ` Thiago Jung Bauermann
  2023-06-22  7:37                     ` Richard Biener
  0 siblings, 1 reply; 45+ messages in thread
From: Thiago Jung Bauermann @ 2023-06-21 23:42 UTC (permalink / raw)
  To: Jeff Law
  Cc: Tamar Christina, Andrew Pinski, Manolis Tsamis, Philipp Tomsich,
	Richard Biener, Palmer Dabbelt, Kito Cheng, gcc-patches


Hello,

Jeff Law <jeffreyalaw@gmail.com> writes:

> On 6/19/23 22:52, Tamar Christina wrote:
>
>>> It's a bit hackish, but could we reject the stack pointer for operand1 in the
>>> stack-tie?  And if we do so, does it help?
>> Yeah, this one I had to defer until later this week to look at more
>> closely, because what I'm wondering about is whether the optimization
>> should apply to frame-related RTX as well.
>> Looking at the description of RTX_FRAME_RELATED_P, it seems this
>> optimization may end up de-optimizing RISC targets by creating an offset
>> that is larger than the offset which can be used from the SP, making
>> reload have to spill.  i.e. sometimes the move was done explicitly.  So
>> perhaps the pass should not apply the propagation to RTX_FRAME_RELATED_P
>> insns in find_oldest_value_reg and copyprop_hardreg_forward_1?
>> Other parts of this pass already seem to bail out in similar situations.
>> So I needed to write some testcases to check what would happen in these
>> cases, hence the deferral to later in the week.
> Rejecting for RTX_FRAME_RELATED_P would seem reasonable and probably better in general to
> me.  The cases where we're looking to clean things up aren't really in the
> prologue/epilogue, but instead in the main body after register elimination has turned fp
> into sp + offset, thus making all kinds of things no longer valid.

The problems I reported were fixed by commits:

580b74a79146 "aarch64: Robustify stack tie handling"
079f31c55318 "aarch64: Fix gcc.target/aarch64/sve/pcs failures"

Thanks!

But unfortunately I'm still seeing bootstrap failures (ICE segmentation
fault) in today's trunk with build config bootstrap-lto in both
armv8l-linux-gnueabihf and aarch64-linux-gnu.

If I revert commit 6a2e8dcbbd4b "cprop_hardreg: Enable propagation of
the stack pointer if possible" from trunk then both bootstraps succeed.

Here's the command I'm using to build on armv8l:

~/src/configure \
    SHELL=/bin/bash \
    --with-gnu-as \
    --with-gnu-ld \
    --disable-libmudflap \
    --enable-lto \
    --enable-shared \
    --without-included-gettext \
    --enable-nls \
    --with-system-zlib \
    --disable-sjlj-exceptions \
    --enable-gnu-unique-object \
    --enable-linker-build-id \
    --disable-libstdcxx-pch \
    --enable-c99 \
    --enable-clocale=gnu \
    --enable-libstdcxx-debug \
    --enable-long-long \
    --with-cloog=no \
    --with-ppl=no \
    --with-isl=no \
    --disable-multilib \
    --with-float=hard \
    --with-fpu=neon-fp-armv8 \
    --with-mode=thumb \
    --with-arch=armv8-a \
    --enable-threads=posix \
    --enable-multiarch \
    --enable-libstdcxx-time=yes \
    --enable-gnu-indirect-function \
    --disable-werror \
    --enable-checking=yes \
    --enable-bootstrap \
    --with-build-config=bootstrap-lto \
    --enable-languages=c,c++,fortran,lto \
    && make \
        profiledbootstrap \
        SHELL=/bin/bash \
        -w \
        -j 40 \
        CFLAGS_FOR_BUILD="-pipe -g -O2" \
        CXXFLAGS_FOR_BUILD="-pipe -g -O2" \
        LDFLAGS_FOR_BUILD="-static-libgcc" \
        MAKEINFOFLAGS=--force \
        BUILD_INFO="" \
        MAKEINFO=echo

And here's the slightly different one for aarch64-linux:

~/src/configure \
    SHELL=/bin/bash \
    --with-gnu-as \
    --with-gnu-ld \
    --disable-libmudflap \
    --enable-lto \
    --enable-shared \
    --without-included-gettext \
    --enable-nls \
    --with-system-zlib \
    --disable-sjlj-exceptions \
    --enable-gnu-unique-object \
    --enable-linker-build-id \
    --disable-libstdcxx-pch \
    --enable-c99 \
    --enable-clocale=gnu \
    --enable-libstdcxx-debug \
    --enable-long-long \
    --with-cloog=no \
    --with-ppl=no \
    --with-isl=no \
    --disable-multilib \
    --enable-fix-cortex-a53-835769 \
    --enable-fix-cortex-a53-843419 \
    --with-arch=armv8-a \
    --enable-threads=posix \
    --enable-multiarch \
    --enable-libstdcxx-time=yes \
    --enable-gnu-indirect-function \
    --disable-werror \
    --enable-checking=yes \
    --enable-bootstrap \
    --with-build-config=bootstrap-lto \
    --enable-languages=c,c++,fortran,lto \
    && make \
        profiledbootstrap \
        SHELL=/bin/bash \
        -w \
        -j 40 \
        LDFLAGS_FOR_TARGET="-Wl,-fix-cortex-a53-843419" \
        CFLAGS_FOR_BUILD="-pipe -g -O2" \
        CXXFLAGS_FOR_BUILD="-pipe -g -O2" \
        LDFLAGS_FOR_BUILD="-static-libgcc" \
        MAKEINFOFLAGS=--force \
        BUILD_INFO="" \
        MAKEINFO=echo

-- 
Thiago

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.
  2023-06-21 23:42                   ` Thiago Jung Bauermann
@ 2023-06-22  7:37                     ` Richard Biener
  2023-06-22  7:58                       ` Philipp Tomsich
  0 siblings, 1 reply; 45+ messages in thread
From: Richard Biener @ 2023-06-22  7:37 UTC (permalink / raw)
  To: Thiago Jung Bauermann
  Cc: Jeff Law, Tamar Christina, Andrew Pinski, Manolis Tsamis,
	Philipp Tomsich, Palmer Dabbelt, Kito Cheng, gcc-patches

On Thu, Jun 22, 2023 at 1:42 AM Thiago Jung Bauermann
<thiago.bauermann@linaro.org> wrote:
>
>
> Hello,
>
> Jeff Law <jeffreyalaw@gmail.com> writes:
>
> > On 6/19/23 22:52, Tamar Christina wrote:
> >
> >>> It's a bit hackish, but could we reject the stack pointer for operand1 in the
> >>> stack-tie?  And if we do so, does it help?
> >> Yeah, this one I had to defer until later this week to look at more
> >> closely, because what I'm wondering about is whether the optimization
> >> should apply to frame-related RTX as well.
> >> Looking at the description of RTX_FRAME_RELATED_P, it seems this
> >> optimization may end up de-optimizing RISC targets by creating an offset
> >> that is larger than the offset which can be used from the SP, making
> >> reload have to spill.  i.e. sometimes the move was done explicitly.  So
> >> perhaps the pass should not apply the propagation to RTX_FRAME_RELATED_P
> >> insns in find_oldest_value_reg and copyprop_hardreg_forward_1?
> >> Other parts of this pass already seem to bail out in similar situations.
> >> So I needed to write some testcases to check what would happen in these
> >> cases, hence the deferral to later in the week.
> > Rejecting for RTX_FRAME_RELATED_P would seem reasonable and probably better in general to
> > me.  The cases where we're looking to clean things up aren't really in the
> > prologue/epilogue, but instead in the main body after register elimination has turned fp
> > into sp + offset, thus making all kinds of things no longer valid.
>
> The problems I reported were fixed by commits:
>
> 580b74a79146 "aarch64: Robustify stack tie handling"
> 079f31c55318 "aarch64: Fix gcc.target/aarch64/sve/pcs failures"
>
> Thanks!
>
> But unfortunately I'm still seeing bootstrap failures (ICE segmentation
> fault) in today's trunk with build config bootstrap-lto in both
> armv8l-linux-gnueabihf and aarch64-linux-gnu.

If there's not yet a bug report for this, please make sure to open one
so this issue doesn't get lost.

> If I revert commit 6a2e8dcbbd4b "cprop_hardreg: Enable propagation of
> the stack pointer if possible" from trunk then both bootstraps succeed.
>
> Here's the command I'm using to build on armv8l:
>
> ~/src/configure \
>     SHELL=/bin/bash \
>     --with-gnu-as \
>     --with-gnu-ld \
>     --disable-libmudflap \
>     --enable-lto \
>     --enable-shared \
>     --without-included-gettext \
>     --enable-nls \
>     --with-system-zlib \
>     --disable-sjlj-exceptions \
>     --enable-gnu-unique-object \
>     --enable-linker-build-id \
>     --disable-libstdcxx-pch \
>     --enable-c99 \
>     --enable-clocale=gnu \
>     --enable-libstdcxx-debug \
>     --enable-long-long \
>     --with-cloog=no \
>     --with-ppl=no \
>     --with-isl=no \
>     --disable-multilib \
>     --with-float=hard \
>     --with-fpu=neon-fp-armv8 \
>     --with-mode=thumb \
>     --with-arch=armv8-a \
>     --enable-threads=posix \
>     --enable-multiarch \
>     --enable-libstdcxx-time=yes \
>     --enable-gnu-indirect-function \
>     --disable-werror \
>     --enable-checking=yes \
>     --enable-bootstrap \
>     --with-build-config=bootstrap-lto \
>     --enable-languages=c,c++,fortran,lto \
>     && make \
>         profiledbootstrap \
>         SHELL=/bin/bash \
>         -w \
>         -j 40 \
>         CFLAGS_FOR_BUILD="-pipe -g -O2" \
>         CXXFLAGS_FOR_BUILD="-pipe -g -O2" \
>         LDFLAGS_FOR_BUILD="-static-libgcc" \
>         MAKEINFOFLAGS=--force \
>         BUILD_INFO="" \
>         MAKEINFO=echo
>
> And here's the slightly different one for aarch64-linux:
>
> ~/src/configure \
>     SHELL=/bin/bash \
>     --with-gnu-as \
>     --with-gnu-ld \
>     --disable-libmudflap \
>     --enable-lto \
>     --enable-shared \
>     --without-included-gettext \
>     --enable-nls \
>     --with-system-zlib \
>     --disable-sjlj-exceptions \
>     --enable-gnu-unique-object \
>     --enable-linker-build-id \
>     --disable-libstdcxx-pch \
>     --enable-c99 \
>     --enable-clocale=gnu \
>     --enable-libstdcxx-debug \
>     --enable-long-long \
>     --with-cloog=no \
>     --with-ppl=no \
>     --with-isl=no \
>     --disable-multilib \
>     --enable-fix-cortex-a53-835769 \
>     --enable-fix-cortex-a53-843419 \
>     --with-arch=armv8-a \
>     --enable-threads=posix \
>     --enable-multiarch \
>     --enable-libstdcxx-time=yes \
>     --enable-gnu-indirect-function \
>     --disable-werror \
>     --enable-checking=yes \
>     --enable-bootstrap \
>     --with-build-config=bootstrap-lto \
>     --enable-languages=c,c++,fortran,lto \
>     && make \
>         profiledbootstrap \
>         SHELL=/bin/bash \
>         -w \
>         -j 40 \
>         LDFLAGS_FOR_TARGET="-Wl,-fix-cortex-a53-843419" \
>         CFLAGS_FOR_BUILD="-pipe -g -O2" \
>         CXXFLAGS_FOR_BUILD="-pipe -g -O2" \
>         LDFLAGS_FOR_BUILD="-static-libgcc" \
>         MAKEINFOFLAGS=--force \
>         BUILD_INFO="" \
>         MAKEINFO=echo
>
> --
> Thiago

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible.
  2023-06-22  7:37                     ` Richard Biener
@ 2023-06-22  7:58                       ` Philipp Tomsich
  0 siblings, 0 replies; 45+ messages in thread
From: Philipp Tomsich @ 2023-06-22  7:58 UTC (permalink / raw)
  To: Richard Biener
  Cc: Thiago Jung Bauermann, Jeff Law, Tamar Christina, Andrew Pinski,
	Manolis Tsamis, Palmer Dabbelt, Kito Cheng, gcc-patches

This should be covered by PR110308 (proposed fix attached there) and PR110313.
Our bootstrap runs are still in progress to confirm.


On Thu, 22 Jun 2023 at 09:40, Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Thu, Jun 22, 2023 at 1:42 AM Thiago Jung Bauermann
> <thiago.bauermann@linaro.org> wrote:
> >
> >
> > Hello,
> >
> > Jeff Law <jeffreyalaw@gmail.com> writes:
> >
> > > On 6/19/23 22:52, Tamar Christina wrote:
> > >
> > >>> It's a bit hackish, but could we reject the stack pointer for operand1 in the
> > >>> stack-tie?  And if we do so, does it help?
> > >> Yeah, this one I had to defer until later this week to look at more
> > >> closely, because what I'm wondering about is whether the optimization
> > >> should apply to frame-related RTX as well.
> > >> Looking at the description of RTX_FRAME_RELATED_P, it seems this
> > >> optimization may end up de-optimizing RISC targets by creating an offset
> > >> that is larger than the offset which can be used from the SP, making
> > >> reload have to spill.  i.e. sometimes the move was done explicitly.  So
> > >> perhaps the pass should not apply the propagation to RTX_FRAME_RELATED_P
> > >> insns in find_oldest_value_reg and copyprop_hardreg_forward_1?
> > >> Other parts of this pass already seem to bail out in similar situations.
> > >> So I needed to write some testcases to check what would happen in these
> > >> cases, hence the deferral to later in the week.
> > > Rejecting for RTX_FRAME_RELATED_P seems reasonable to me, and probably
> > > better in general.  The cases where we're looking to clean things up
> > > aren't really in the prologue/epilogue, but instead in the main body,
> > > after register elimination has turned fp into sp + offset, thus making
> > > all kinds of things no longer valid.
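
A minimal sketch of the check being discussed, assuming it sits at the
top of the per-insn loop in copyprop_hardreg_forward_1 in regcprop.cc;
RTX_FRAME_RELATED_P is the real RTL flag, but the surrounding shape is
an assumption, not the committed patch:

  /* Frame-related insns, typically prologue/epilogue RTL, must keep
     their exact form so that the CFI notes derived from them stay
     accurate; skip copy propagation for them entirely.  */
  if (RTX_FRAME_RELATED_P (insn))
    continue;

The main-body cleanups the pass targets, i.e. the sp + offset addresses
left behind by register elimination, would be unaffected by such a
check, since those insns are not frame-related.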
> >
> > The problems I reported were fixed by commits:
> >
> > 580b74a79146 "aarch64: Robustify stack tie handling"
> > 079f31c55318 "aarch64: Fix gcc.target/aarch64/sve/pcs failures"
> >
> > Thanks!
> >
> > But unfortunately I'm still seeing bootstrap failures (an ICE caused by
> > a segmentation fault) in today's trunk with the bootstrap-lto build
> > config on both armv8l-linux-gnueabihf and aarch64-linux-gnu.
>
> If there's not yet a bug report for this, please make sure to open one
> so this issue doesn't get lost.
>
> > If I revert commit 6a2e8dcbbd4b "cprop_hardreg: Enable propagation of
> > the stack pointer if possible" from trunk, then both bootstraps succeed.
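
A hedged sketch of confirming such a culprit locally; the commit hash
comes from the message above, and everything else is generic git usage:

  # In a clean checkout of trunk:
  git revert --no-edit 6a2e8dcbbd4b   # back out the suspect change
  # ...then rerun the configure and "make profiledbootstrap"
  # commands quoted below.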
> >
> > Here's the command I'm using to build on armv8l:
> >
> > ~/src/configure \
> >     SHELL=/bin/bash \
> >     --with-gnu-as \
> >     --with-gnu-ld \
> >     --disable-libmudflap \
> >     --enable-lto \
> >     --enable-shared \
> >     --without-included-gettext \
> >     --enable-nls \
> >     --with-system-zlib \
> >     --disable-sjlj-exceptions \
> >     --enable-gnu-unique-object \
> >     --enable-linker-build-id \
> >     --disable-libstdcxx-pch \
> >     --enable-c99 \
> >     --enable-clocale=gnu \
> >     --enable-libstdcxx-debug \
> >     --enable-long-long \
> >     --with-cloog=no \
> >     --with-ppl=no \
> >     --with-isl=no \
> >     --disable-multilib \
> >     --with-float=hard \
> >     --with-fpu=neon-fp-armv8 \
> >     --with-mode=thumb \
> >     --with-arch=armv8-a \
> >     --enable-threads=posix \
> >     --enable-multiarch \
> >     --enable-libstdcxx-time=yes \
> >     --enable-gnu-indirect-function \
> >     --disable-werror \
> >     --enable-checking=yes \
> >     --enable-bootstrap \
> >     --with-build-config=bootstrap-lto \
> >     --enable-languages=c,c++,fortran,lto \
> >     && make \
> >         profiledbootstrap \
> >         SHELL=/bin/bash \
> >         -w \
> >         -j 40 \
> >         CFLAGS_FOR_BUILD="-pipe -g -O2" \
> >         CXXFLAGS_FOR_BUILD="-pipe -g -O2" \
> >         LDFLAGS_FOR_BUILD="-static-libgcc" \
> >         MAKEINFOFLAGS=--force \
> >         BUILD_INFO="" \
> >         MAKEINFO=echo
> >
> > And here's the slightly different one for aarch64-linux:
> >
> > ~/src/configure \
> >     SHELL=/bin/bash \
> >     --with-gnu-as \
> >     --with-gnu-ld \
> >     --disable-libmudflap \
> >     --enable-lto \
> >     --enable-shared \
> >     --without-included-gettext \
> >     --enable-nls \
> >     --with-system-zlib \
> >     --disable-sjlj-exceptions \
> >     --enable-gnu-unique-object \
> >     --enable-linker-build-id \
> >     --disable-libstdcxx-pch \
> >     --enable-c99 \
> >     --enable-clocale=gnu \
> >     --enable-libstdcxx-debug \
> >     --enable-long-long \
> >     --with-cloog=no \
> >     --with-ppl=no \
> >     --with-isl=no \
> >     --disable-multilib \
> >     --enable-fix-cortex-a53-835769 \
> >     --enable-fix-cortex-a53-843419 \
> >     --with-arch=armv8-a \
> >     --enable-threads=posix \
> >     --enable-multiarch \
> >     --enable-libstdcxx-time=yes \
> >     --enable-gnu-indirect-function \
> >     --disable-werror \
> >     --enable-checking=yes \
> >     --enable-bootstrap \
> >     --with-build-config=bootstrap-lto \
> >     --enable-languages=c,c++,fortran,lto \
> >     && make \
> >         profiledbootstrap \
> >         SHELL=/bin/bash \
> >         -w \
> >         -j 40 \
> >         LDFLAGS_FOR_TARGET="-Wl,-fix-cortex-a53-843419" \
> >         CFLAGS_FOR_BUILD="-pipe -g -O2" \
> >         CXXFLAGS_FOR_BUILD="-pipe -g -O2" \
> >         LDFLAGS_FOR_BUILD="-static-libgcc" \
> >         MAKEINFOFLAGS=--force \
> >         BUILD_INFO="" \
> >         MAKEINFO=echo
> >
> > --
> > Thiago

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2023-06-22  7:58 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-25 12:35 [PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations Manolis Tsamis
2023-05-25 12:35 ` [PATCH 1/2] Implementation of new RISCV optimizations pass: fold-mem-offsets Manolis Tsamis
2023-05-25 13:01   ` Richard Biener
2023-05-25 13:25     ` Manolis Tsamis
2023-05-25 13:31     ` Jeff Law
2023-05-25 13:50       ` Richard Biener
2023-05-25 14:02         ` Manolis Tsamis
2023-05-29 23:30           ` Jeff Law
2023-05-31 12:19             ` Manolis Tsamis
2023-05-31 14:00               ` Jeff Law
2023-05-25 14:13         ` Jeff Law
2023-05-25 14:18           ` Philipp Tomsich
2023-06-08  5:37   ` Jeff Law
2023-06-12  7:36     ` Manolis Tsamis
2023-06-12 14:37       ` Jeff Law
2023-06-09  0:57   ` Jeff Law
2023-06-12  7:32     ` Manolis Tsamis
2023-06-12 21:58       ` Jeff Law
2023-06-15 17:34         ` Manolis Tsamis
2023-06-10 15:49   ` Jeff Law
2023-06-12  7:41     ` Manolis Tsamis
2023-06-12 21:36       ` Jeff Law
2023-05-25 12:35 ` [PATCH 2/2] cprop_hardreg: Enable propagation of the stack pointer if possible Manolis Tsamis
2023-05-25 13:38   ` Jeff Law
2023-05-31 12:15     ` Manolis Tsamis
2023-06-07 22:16       ` Jeff Law
2023-06-07 22:18   ` Jeff Law
2023-06-08  6:15     ` Manolis Tsamis
2023-06-15 20:13     ` Philipp Tomsich
2023-06-19 16:57       ` Thiago Jung Bauermann
2023-06-19 17:07         ` Manolis Tsamis
2023-06-19 23:40         ` Andrew Pinski
2023-06-19 23:48           ` Andrew Pinski
2023-06-20  2:16             ` Jeff Law
2023-06-20  4:52               ` Tamar Christina
2023-06-20  5:00                 ` Jeff Law
2023-06-21 23:42                   ` Thiago Jung Bauermann
2023-06-22  7:37                     ` Richard Biener
2023-06-22  7:58                       ` Philipp Tomsich
2023-05-25 13:42 ` [PATCH 0/2] RISC-V: New pass to optimize calculation of offsets for memory operations Jeff Law
2023-05-25 13:57   ` Manolis Tsamis
2023-06-15 15:04   ` Jeff Law
2023-06-15 15:30     ` Manolis Tsamis
2023-06-15 15:56       ` Jeff Law
2023-06-18 18:11       ` Jeff Law
