public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH v7] Implement new RTL optimizations pass: fold-mem-offsets.
@ 2023-10-16 18:01 Manolis Tsamis
  2023-10-16 18:14 ` Manolis Tsamis
  2023-10-16 19:11 ` Jeff Law
  0 siblings, 2 replies; 12+ messages in thread
From: Manolis Tsamis @ 2023-10-16 18:01 UTC (permalink / raw)
  To: gcc-patches
  Cc: Jeff Law, Richard Biener, Vineet Gupta, Philipp Tomsich, Manolis Tsamis

This is a new RTL pass that tries to optimize memory offset calculations
by moving them from add immediate instructions to the memory loads/stores.
For example it can transform this:

  addi t4,sp,16
  add  t2,a6,t4
  shl  t3,t2,1
  ld   a2,0(t3)
  addi a2,1
  sd   a2,8(t2)

into the following (one instruction less):

  add  t2,a6,sp
  shl  t3,t2,1
  ld   a2,32(t3)
  addi a2,1
  sd   a2,24(t2)

Although there are places where this is done already, this pass is more
powerful and can handle the more difficult cases that are currently not
optimized. Also, it runs late enough and can optimize away unnecessary
stack pointer calculations.

gcc/ChangeLog:

	* Makefile.in: Add fold-mem-offsets.o.
	* passes.def: Schedule a new pass.
	* tree-pass.h (make_pass_fold_mem_offsets): Declare.
	* common.opt: New options.
	* doc/invoke.texi: Document new option.
	* fold-mem-offsets.cc: New file.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/fold-mem-offsets-1.c: New test.
	* gcc.target/riscv/fold-mem-offsets-2.c: New test.
	* gcc.target/riscv/fold-mem-offsets-3.c: New test.

Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
---

Changes in v7:
        - Replace recognise with recognize.
        - Add INSN_CODE(insn) = -1 before calling insn_invalid_p.
        - Adjust scan regex in i386/pr52146.c.

Changes in v6:
        - Fix formatting issues.
        - Compute maximum validity iterations based on
          flag_expensive_optimizations.
        - Generate rtx plus only if offset is not zero.

Changes in v5:
        - Introduce new helper function fold_offsets_1.
        - Fix bug because constants could be partially propagated
          through instructions that weren't understood.
        - Introduce helper class fold_mem_info that stores f-m-o
          info for an instruction.
        - Calculate fold_offsets only once with do_fold_info_calculation.
        - Fix correctness issue by introducing compute_validity_closure.
        - Propagate in more cases for PLUS/MINUS with constant.

Changes in v4:
        - Add DF_EQ_NOTES flag to avoid incorrect state in notes.
        - Remove fold_mem_offsets_driver and enum fold_mem_phase.
        - Call recog when patching offsets in do_commit_offset.
        - Restore INSN_CODE after modifying insn in do_check_validity.

Changes in v3:
        - Added propagation for more codes:
          sub, neg, mul.
        - Added folding / elimination for sub and
          const int moves.
        - For the validity check of the generated addresses
          also test memory_address_addr_space_p.
        - Replaced GEN_INT with gen_int_mode.
        - Replaced some bitmap_head with auto_bitmap.
        - Refactor each phase into own function for readability.
        - Add dump details.
        - Replace rtx iteration with reg_mentioned_p.
        - Return early for codes that we can't propagate through.

Changes in v2:
        - Made the pass target-independant instead of RISCV specific.
        - Fixed a number of bugs.
        - Add code to handle more ADD patterns as found
          in other targets (x86, aarch64).
        - Improved naming and comments.
        - Fixed bitmap memory leak.

 gcc/Makefile.in                               |   1 +
 gcc/common.opt                                |   4 +
 gcc/doc/invoke.texi                           |   8 +
 gcc/fold-mem-offsets.cc                       | 901 ++++++++++++++++++
 gcc/passes.def                                |   1 +
 gcc/testsuite/gcc.target/i386/pr52146.c       |   2 +-
 .../gcc.target/riscv/fold-mem-offsets-1.c     |  16 +
 .../gcc.target/riscv/fold-mem-offsets-2.c     |  24 +
 .../gcc.target/riscv/fold-mem-offsets-3.c     |  17 +
 gcc/tree-pass.h                               |   1 +
 10 files changed, 974 insertions(+), 1 deletion(-)
 create mode 100644 gcc/fold-mem-offsets.cc
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 9cc16268abf..747f749538d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1443,6 +1443,7 @@ OBJS = \
 	fixed-value.o \
 	fold-const.o \
 	fold-const-call.o \
+	fold-mem-offsets.o \
 	function.o \
 	function-abi.o \
 	function-tests.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index f137a1f81ac..b103b8d28ed 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1252,6 +1252,10 @@ fcprop-registers
 Common Var(flag_cprop_registers) Optimization
 Perform a register copy-propagation optimization pass.
 
+ffold-mem-offsets
+Target Bool Var(flag_fold_mem_offsets) Init(1)
+Fold instructions calculating memory offsets to the memory access instruction if possible.
+
 fcrossjumping
 Common Var(flag_crossjumping) Optimization
 Perform cross-jumping optimization.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index eb714d18511..a26dc5aae69 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -543,6 +543,7 @@ Objective-C and Objective-C++ Dialects}.
 -fauto-inc-dec  -fbranch-probabilities
 -fcaller-saves
 -fcombine-stack-adjustments  -fconserve-stack
+-ffold-mem-offsets
 -fcompare-elim  -fcprop-registers  -fcrossjumping
 -fcse-follow-jumps  -fcse-skip-blocks  -fcx-fortran-rules
 -fcx-limited-range
@@ -14395,6 +14396,13 @@ the comparison operation before register allocation is complete.
 
 Enabled at levels @option{-O1}, @option{-O2}, @option{-O3}, @option{-Os}.
 
+@opindex ffold-mem-offsets
+@item -ffold-mem-offsets
+@itemx -fno-fold-mem-offsets
+Try to eliminate add instructions by folding them in memory loads/stores.
+
+Enabled at levels @option{-O2}, @option{-O3}.
+
 @opindex fcprop-registers
 @item -fcprop-registers
 After register allocation and post-register allocation instruction splitting,
diff --git a/gcc/fold-mem-offsets.cc b/gcc/fold-mem-offsets.cc
new file mode 100644
index 00000000000..6263fc7bd15
--- /dev/null
+++ b/gcc/fold-mem-offsets.cc
@@ -0,0 +1,901 @@
+/* Late RTL pass to fold memory offsets.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "tree.h"
+#include "expr.h"
+#include "backend.h"
+#include "regs.h"
+#include "target.h"
+#include "memmodel.h"
+#include "emit-rtl.h"
+#include "insn-config.h"
+#include "recog.h"
+#include "predict.h"
+#include "df.h"
+#include "tree-pass.h"
+#include "cfgrtl.h"
+
+/* This pass tries to optimize memory offset calculations by moving constants
+   from add instructions to the memory instructions (loads / stores).
+   For example it can transform code like this:
+
+     add  t4, sp, 16
+     add  t2, a6, t4
+     shl  t3, t2, 1
+     ld   a2, 0(t3)
+     add  a2, 1
+     sd   a2, 8(t2)
+
+   into the following (one instruction less):
+
+     add  t2, a6, sp
+     shl  t3, t2, 1
+     ld   a2, 32(t3)
+     add  a2, 1
+     sd   a2, 24(t2)
+
+   Although the previous passes try to emit efficient offset calculations
+   this pass is still beneficial because:
+
+    - The mechanisms that optimize memory offsets usually work with specific
+      patterns or have limitations.  This pass is designed to fold offsets
+      through complex calculations that affect multiple memory operations
+      and have partially overlapping calculations.
+
+    - There are cases where add instructions are introduced in late rtl passes
+      and the rest of the pipeline cannot eliminate them.  Arrays and structs
+      allocated on the stack can result in unwanted add instructions that
+      cannot be eliminated easily.
+
+   This pass works on a basic block level and consists of 4 phases:
+
+    - Phase 1 (Analysis): Find "foldable" instructions.
+      Foldable instructions are those that we know how to propagate
+      a constant addition through (add, shift, move, ...) and only have other
+      foldable instructions for uses.  In that phase a DFS traversal on the
+      definition tree is performed and foldable instructions are marked on
+      a bitmap.  The add immediate instructions that are reachable in this
+      DFS are candidates for folding since all the intermediate calculations
+      affected by them are also foldable.
+
+    - Phase 2 (Validity): Traverse and calculate the offsets that would result
+      from folding the add immediate instructions.  Check whether the
+      calculated offsets result in a valid instruction for the target.
+
+    - Phase 3 (Commit offsets): Traverse again.  It is now known which folds
+      are valid so at this point change the offsets in the memory instructions.
+
+    - Phase 4 (Commit instruction deletions): Scan all instructions and delete
+      or simplify (reduce to move) all add immediate instructions that were
+      folded.
+
+   This pass should run before hard register propagation because it creates
+   register moves that we expect to be eliminated.  */
+
+namespace {
+
+const pass_data pass_data_fold_mem =
+{
+  RTL_PASS, /* type */
+  "fold_mem_offsets", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_df_finish, /* todo_flags_finish */
+};
+
+class pass_fold_mem_offsets : public rtl_opt_pass
+{
+public:
+  pass_fold_mem_offsets (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_fold_mem, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+    {
+      return flag_fold_mem_offsets && optimize >= 2;
+    }
+
+  virtual unsigned int execute (function *);
+}; // class pass_fold_mem_offsets
+
+/* Class that holds in FOLD_INSNS the instructions that if folded the offset
+   of a memory instruction would increase by ADDED_OFFSET.  */
+class fold_mem_info {
+public:
+  auto_bitmap fold_insns;
+  HOST_WIDE_INT added_offset;
+};
+
+typedef hash_map<rtx_insn *, fold_mem_info *> fold_info_map;
+
+/* Tracks which instructions can be reached through instructions that can
+   propagate offsets for folding.  */
+static bitmap_head can_fold_insns;
+
+/* Marks instructions that are currently eligible for folding.  */
+static bitmap_head candidate_fold_insns;
+
+/* Tracks instructions that cannot be folded because it turned out that
+   folding will result in creating an invalid memory instruction.
+   An instruction can be in both CANDIDATE_FOLD_INSNS and CANNOT_FOLD_INSNS
+   at the same time, in which case it is not legal to fold.  */
+static bitmap_head cannot_fold_insns;
+
+/* The number of instructions that were simplified or eliminated.  */
+static int stats_fold_count;
+
+/* Get the single reaching definition of an instruction inside a BB.
+   The definition is desired for REG used in INSN.
+   Return the definition insn or NULL if there's no definition with
+   the desired criteria.  */
+static rtx_insn*
+get_single_def_in_bb (rtx_insn *insn, rtx reg)
+{
+  df_ref use;
+  struct df_link *ref_chain, *ref_link;
+
+  FOR_EACH_INSN_USE (use, insn)
+    {
+      if (GET_CODE (DF_REF_REG (use)) == SUBREG)
+	return NULL;
+      if (REGNO (DF_REF_REG (use)) == REGNO (reg))
+	break;
+    }
+
+  if (!use)
+    return NULL;
+
+  ref_chain = DF_REF_CHAIN (use);
+
+  if (!ref_chain)
+    return NULL;
+
+  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
+    {
+      /* Problem getting some definition for this instruction.  */
+      if (ref_link->ref == NULL)
+	return NULL;
+      if (DF_REF_INSN_INFO (ref_link->ref) == NULL)
+	return NULL;
+      if (global_regs[REGNO (reg)]
+	  && !set_of (reg, DF_REF_INSN (ref_link->ref)))
+	return NULL;
+    }
+
+  if (ref_chain->next)
+    return NULL;
+
+  rtx_insn *def = DF_REF_INSN (ref_chain->ref);
+
+  if (BLOCK_FOR_INSN (def) != BLOCK_FOR_INSN (insn))
+    return NULL;
+
+  if (DF_INSN_LUID (def) > DF_INSN_LUID (insn))
+    return NULL;
+
+  return def;
+}
+
+/* Get all uses of REG which is set in INSN.  Return the use list or NULL if a
+   use is missing / irregular.  If SUCCESS is not NULL then set it to false if
+   there are missing / irregular uses and true otherwise.  */
+static struct df_link*
+get_uses (rtx_insn *insn, rtx reg, bool *success)
+{
+  df_ref def;
+  struct df_link *ref_chain, *ref_link;
+
+  if (success)
+    *success = false;
+
+  FOR_EACH_INSN_DEF (def, insn)
+    if (REGNO (DF_REF_REG (def)) == REGNO (reg))
+      break;
+
+  if (!def)
+    return NULL;
+
+  ref_chain = DF_REF_CHAIN (def);
+
+  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
+    {
+      /* Problem getting a use for this instruction.  */
+      if (ref_link->ref == NULL)
+	return NULL;
+      if (DF_REF_CLASS (ref_link->ref) != DF_REF_REGULAR)
+	return NULL;
+      /* We do not handle REG_EQUIV/REG_EQ notes for now.  */
+      if (DF_REF_FLAGS (ref_link->ref) & DF_REF_IN_NOTE)
+	return NULL;
+    }
+
+  if (success)
+    *success = true;
+
+  return ref_chain;
+}
+
+static HOST_WIDE_INT
+fold_offsets (rtx_insn *insn, rtx reg, bool analyze, bitmap foldable_insns);
+
+/*  Helper function for fold_offsets.
+
+    If DO_RECURSION is false and ANALYZE is true this function returns true iff
+    it understands the structure of INSN and knows how to propagate constants
+    through it.  In this case OFFSET_OUT and FOLDABLE_INSNS are unused.
+
+    If DO_RECURSION is true then it also calls fold_offsets for each recognized
+    part of INSN with the appropriate arguments.
+
+    If DO_RECURSION is true and ANALYZE is false then offset that would result
+    from folding is computed and is returned through the pointer OFFSET_OUT.
+    The instructions that can be folded are recorded in FOLDABLE_INSNS.
+*/
+static bool
+fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
+		HOST_WIDE_INT *offset_out, bitmap foldable_insns)
+{
+  /* Doesn't make sense if both DO_RECURSION and ANALYZE are false.  */
+  gcc_checking_assert (do_recursion || analyze);
+  gcc_checking_assert (GET_CODE (PATTERN (insn)) == SET);
+
+  rtx src = SET_SRC (PATTERN (insn));
+  HOST_WIDE_INT offset = 0;
+
+  switch (GET_CODE (src))
+    {
+    case PLUS:
+      {
+	/* Propagate through add.  */
+	rtx arg1 = XEXP (src, 0);
+	rtx arg2 = XEXP (src, 1);
+
+	if (REG_P (arg1))
+	  {
+	    if (do_recursion)
+	      offset += fold_offsets (insn, arg1, analyze, foldable_insns);
+	  }
+	else if (GET_CODE (arg1) == ASHIFT
+		 && REG_P (XEXP (arg1, 0))
+		 && CONST_INT_P (XEXP (arg1, 1)))
+	  {
+	    /* Handle R1 = (R2 << C) + ...  */
+	    if (do_recursion)
+	      {
+		HOST_WIDE_INT scale
+		  = (HOST_WIDE_INT_1U << INTVAL (XEXP (arg1, 1)));
+		offset += scale * fold_offsets (insn, XEXP (arg1, 0), analyze,
+						foldable_insns);
+	      }
+	  }
+	else if (GET_CODE (arg1) == PLUS
+		 && REG_P (XEXP (arg1, 0))
+		 && REG_P (XEXP (arg1, 1)))
+	  {
+	    /* Handle R1 = (R2 + R3) + ...  */
+	    if (do_recursion)
+	      {
+		offset += fold_offsets (insn, XEXP (arg1, 0), analyze,
+					foldable_insns);
+		offset += fold_offsets (insn, XEXP (arg1, 1), analyze,
+					foldable_insns);
+	      }
+	  }
+	else if (GET_CODE (arg1) == PLUS
+		 && GET_CODE (XEXP (arg1, 0)) == ASHIFT
+		 && REG_P (XEXP (XEXP (arg1, 0), 0))
+		 && CONST_INT_P (XEXP (XEXP (arg1, 0), 1))
+		 && REG_P (XEXP (arg1, 1)))
+	  {
+	    /* Handle R1 = ((R2 << C) + R3) + ...  */
+	    if (do_recursion)
+	      {
+		HOST_WIDE_INT scale
+		  = (HOST_WIDE_INT_1U << INTVAL (XEXP (XEXP (arg1, 0), 1)));
+		offset += scale * fold_offsets (insn, XEXP (XEXP (arg1, 0), 0),
+						analyze, foldable_insns);
+		offset += fold_offsets (insn, XEXP (arg1, 1), analyze,
+					foldable_insns);
+	      }
+	  }
+	else
+	  return false;
+
+	if (REG_P (arg2))
+	  {
+	    if (do_recursion)
+	      offset += fold_offsets (insn, arg2, analyze, foldable_insns);
+	  }
+	else if (CONST_INT_P (arg2))
+	  {
+	    if (REG_P (arg1))
+	      {
+		offset += INTVAL (arg2);
+		/* This is a R1 = R2 + C instruction, candidate for folding.  */
+		if (!analyze)
+		  bitmap_set_bit (foldable_insns, INSN_UID (insn));
+	      }
+	  }
+	else
+	  return false;
+
+	/* Pattern recognized for folding.  */
+	break;
+      }
+    case MINUS:
+      {
+	/* Propagate through minus.  */
+	rtx arg1 = XEXP (src, 0);
+	rtx arg2 = XEXP (src, 1);
+
+	if (REG_P (arg1))
+	  {
+	    if (do_recursion)
+	      offset += fold_offsets (insn, arg1, analyze, foldable_insns);
+	  }
+	else
+	  return false;
+
+	if (REG_P (arg2))
+	  {
+	    if (do_recursion)
+	      offset -= fold_offsets (insn, arg2, analyze, foldable_insns);
+	  }
+	else if (CONST_INT_P (arg2))
+	  {
+	    if (REG_P (arg1))
+	      {
+		offset -= INTVAL (arg2);
+		/* This is a R1 = R2 - C instruction, candidate for folding.  */
+		if (!analyze)
+		  bitmap_set_bit (foldable_insns, INSN_UID (insn));
+	      }
+	  }
+	else
+	  return false;
+
+	/* Pattern recognized for folding.  */
+	break;
+      }
+    case NEG:
+      {
+	/* Propagate through negation.  */
+	rtx arg1 = XEXP (src, 0);
+	if (REG_P (arg1))
+	  {
+	    if (do_recursion)
+	      offset = -fold_offsets (insn, arg1, analyze, foldable_insns);
+	  }
+	else
+	  return false;
+
+	/* Pattern recognized for folding.  */
+	break;
+      }
+    case MULT:
+      {
+	/* Propagate through multiply by constant.  */
+	rtx arg1 = XEXP (src, 0);
+	rtx arg2 = XEXP (src, 1);
+
+	if (REG_P (arg1) && CONST_INT_P (arg2))
+	  {
+	    if (do_recursion)
+	      {
+		HOST_WIDE_INT scale = INTVAL (arg2);
+		offset = scale * fold_offsets (insn, arg1, analyze,
+					       foldable_insns);
+	      }
+	  }
+	else
+	  return false;
+
+	/* Pattern recognized for folding.  */
+	break;
+      }
+    case ASHIFT:
+      {
+	/* Propagate through shift left by constant.  */
+	rtx arg1 = XEXP (src, 0);
+	rtx arg2 = XEXP (src, 1);
+
+	if (REG_P (arg1) && CONST_INT_P (arg2))
+	  {
+	    if (do_recursion)
+	      {
+		HOST_WIDE_INT scale = (HOST_WIDE_INT_1U << INTVAL (arg2));
+		offset = scale * fold_offsets (insn, arg1, analyze,
+					       foldable_insns);
+	      }
+	  }
+	else
+	  return false;
+
+	/* Pattern recognized for folding.  */
+	break;
+      }
+    case REG:
+      {
+	/* Propagate through register move.  */
+	if (do_recursion)
+	  offset = fold_offsets (insn, src, analyze, foldable_insns);
+
+	/* Pattern recognized for folding.  */
+	break;
+      }
+    case CONST_INT:
+      {
+	offset = INTVAL (src);
+	/* R1 = C is candidate for folding.  */
+	if (!analyze)
+	  bitmap_set_bit (foldable_insns, INSN_UID (insn));
+
+	/* Pattern recognized for folding.  */
+	break;
+      }
+    default:
+      /* Cannot recognize.  */
+      return false;
+    }
+
+    if (do_recursion && !analyze)
+      *offset_out = offset;
+
+    return true;
+}
+
+/* Function that computes the offset that would have to be added to all uses
+   of REG if the instructions marked in FOLDABLE_INSNS were to be eliminated.
+
+   If ANALYZE is true then mark in CAN_FOLD_INSNS which instructions
+   transitively only affect other instructions found in CAN_FOLD_INSNS.
+   If ANALYZE is false then compute the offset required for folding.  */
+static HOST_WIDE_INT
+fold_offsets (rtx_insn *insn, rtx reg, bool analyze, bitmap foldable_insns)
+{
+  rtx_insn *def = get_single_def_in_bb (insn, reg);
+
+  if (!def || GET_CODE (PATTERN (def)) != SET)
+    return 0;
+
+  rtx dest = SET_DEST (PATTERN (def));
+
+  if (!REG_P (dest))
+    return 0;
+
+  /* We can only affect the values of GPR registers.  */
+  unsigned int dest_regno = REGNO (dest);
+  if (fixed_regs[dest_regno]
+      || !TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], dest_regno))
+    return 0;
+
+  if (analyze)
+    {
+      /* Check if we know how to handle DEF.  */
+      if (!fold_offsets_1 (def, true, false, NULL, NULL))
+	return 0;
+
+      /* We only fold through instructions that are transitively used as
+	 memory addresses and do not have other uses.  Use the same logic
+	 from offset calculation to visit instructions that can propagate
+	 offsets and keep track of them in CAN_FOLD_INSNS.  */
+      bool success;
+      struct df_link *uses = get_uses (def, dest, &success), *ref_link;
+
+      if (!success)
+	return 0;
+
+      for (ref_link = uses; ref_link; ref_link = ref_link->next)
+	{
+	  rtx_insn *use = DF_REF_INSN (ref_link->ref);
+
+	  if (DEBUG_INSN_P (use))
+	    continue;
+
+	  /* Punt if the use is anything more complicated than a set
+	     (clobber, use, etc).  */
+	  if (!NONJUMP_INSN_P (use) || GET_CODE (PATTERN (use)) != SET)
+	    return 0;
+
+	  /* This use affects instructions outside of CAN_FOLD_INSNS.  */
+	  if (!bitmap_bit_p (&can_fold_insns, INSN_UID (use)))
+	    return 0;
+
+	  rtx use_set = PATTERN (use);
+
+	  /* Special case: A foldable memory store is not foldable if it
+	     mentions DEST outside of the address calculation.  */
+	  if (use_set && MEM_P (SET_DEST (use_set))
+	      && reg_mentioned_p (dest, SET_SRC (use_set)))
+	    return 0;
+	}
+
+      bitmap_set_bit (&can_fold_insns, INSN_UID (def));
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Instruction marked for propagation: ");
+	  print_rtl_single (dump_file, def);
+	}
+    }
+  else
+    {
+      /* We cannot propagate through this instruction.  */
+      if (!bitmap_bit_p (&can_fold_insns, INSN_UID (def)))
+	return 0;
+    }
+
+  HOST_WIDE_INT offset = 0;
+  bool recognized = fold_offsets_1 (def, analyze, true, &offset,
+				    foldable_insns);
+
+  if (!recognized)
+    return 0;
+
+  return offset;
+}
+
+/* Test if INSN is a memory load / store that can have an offset folded to it.
+   Return true iff INSN is such an instruction and return through MEM_OUT,
+   REG_OUT and OFFSET_OUT the RTX that has a MEM code, the register that is
+   used as a base address and the offset accordingly.
+   All of the out pointers may be NULL in which case they will be ignored.  */
+bool
+get_fold_mem_root (rtx_insn *insn, rtx *mem_out, rtx *reg_out,
+		   HOST_WIDE_INT *offset_out)
+{
+  rtx set = single_set (insn);
+  rtx mem = NULL_RTX;
+
+  if (set != NULL_RTX)
+    {
+      rtx src = SET_SRC (set);
+      rtx dest = SET_DEST (set);
+
+      /* Don't fold when we have unspec / volatile.  */
+      if (GET_CODE (src) == UNSPEC
+	  || GET_CODE (src) == UNSPEC_VOLATILE
+	  || GET_CODE (dest) == UNSPEC
+	  || GET_CODE (dest) == UNSPEC_VOLATILE)
+	return false;
+
+      if (MEM_P (src))
+	mem = src;
+      else if (MEM_P (dest))
+	mem = dest;
+      else if ((GET_CODE (src) == SIGN_EXTEND
+		|| GET_CODE (src) == ZERO_EXTEND)
+	       && MEM_P (XEXP (src, 0)))
+	mem = XEXP (src, 0);
+    }
+
+  if (mem == NULL_RTX)
+    return false;
+
+  rtx mem_addr = XEXP (mem, 0);
+  rtx reg;
+  HOST_WIDE_INT offset;
+
+  if (REG_P (mem_addr))
+    {
+      reg = mem_addr;
+      offset = 0;
+    }
+  else if (GET_CODE (mem_addr) == PLUS
+	   && REG_P (XEXP (mem_addr, 0))
+	   && CONST_INT_P (XEXP (mem_addr, 1)))
+    {
+      reg = XEXP (mem_addr, 0);
+      offset = INTVAL (XEXP (mem_addr, 1));
+    }
+  else
+    return false;
+
+  if (mem_out)
+    *mem_out = mem;
+  if (reg_out)
+    *reg_out = reg;
+  if (offset_out)
+    *offset_out = offset;
+
+  return true;
+}
+
+/* If INSN is a root memory instruction then do a DFS traversal on its
+   definitions and find folding candidates.  */
+static void
+do_analysis (rtx_insn *insn)
+{
+  rtx reg;
+  if (!get_fold_mem_root (insn, NULL, &reg, NULL))
+    return;
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "Starting analysis from root: ");
+      print_rtl_single (dump_file, insn);
+    }
+
+  /* Analyse folding opportunities for this memory instruction.  */
+  bitmap_set_bit (&can_fold_insns, INSN_UID (insn));
+  fold_offsets (insn, reg, true, NULL);
+}
+
+static void
+do_fold_info_calculation (rtx_insn *insn, fold_info_map *fold_info)
+{
+  rtx mem, reg;
+  HOST_WIDE_INT cur_offset;
+  if (!get_fold_mem_root (insn, &mem, &reg, &cur_offset))
+    return;
+
+  fold_mem_info *info = new fold_mem_info;
+  info->added_offset = fold_offsets (insn, reg, false, info->fold_insns);
+
+  fold_info->put (insn, info);
+}
+
+/* If INSN is a root memory instruction then compute a potentially new offset
+   for it and test if the resulting instruction is valid.  */
+static void
+do_check_validity (rtx_insn *insn, fold_mem_info *info)
+{
+  rtx mem, reg;
+  HOST_WIDE_INT cur_offset;
+  if (!get_fold_mem_root (insn, &mem, &reg, &cur_offset))
+    return;
+
+  HOST_WIDE_INT new_offset = cur_offset + info->added_offset;
+
+  /* Test if it is valid to change MEM's address offset to NEW_OFFSET.  */
+  int icode = INSN_CODE (insn);
+  INSN_CODE (insn) = -1;
+  rtx mem_addr = XEXP (mem, 0);
+  machine_mode mode = GET_MODE (mem_addr);
+  if (new_offset != 0)
+    XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, mode));
+  else
+    XEXP (mem, 0) = reg;
+
+  bool illegal = insn_invalid_p (insn, false)
+		 || !memory_address_addr_space_p (mode, XEXP (mem, 0),
+						  MEM_ADDR_SPACE (mem));
+
+  /* Restore the instruction.  */
+  XEXP (mem, 0) = mem_addr;
+  INSN_CODE (insn) = icode;
+
+  if (illegal)
+    bitmap_ior_into (&cannot_fold_insns, info->fold_insns);
+  else
+    bitmap_ior_into (&candidate_fold_insns, info->fold_insns);
+}
+
+static bool
+compute_validity_closure (fold_info_map *fold_info)
+{
+  /* Let's say we have an arbitrary chain of foldable instructions xN = xN + C
+     and memory operations rN that use xN as shown below.  If folding x1 in r1
+     turns out to be invalid for whatever reason then it's also invalid to fold
+     any of the other xN into any rN.  That means that we need the transitive
+     closure of validity to determine whether we can fold a xN instruction.
+
+     +--------------+    +-------------------+    +-------------------+
+     | r1 = mem[x1] |    | r2 = mem[x1 + x2] |    | r3 = mem[x2 + x3] |   ...
+     +--------------+    +-------------------+    +-------------------+
+	    ^                ^       ^                ^       ^
+	    |               /        |               /        |           ...
+	    |              /         |              /         |
+     +-------------+      /   +-------------+      /   +-------------+
+     | x1 = x1 + 1 |-----+    | x2 = x2 + 1 |-----+    | x3 = x3 + 1 |--- ...
+     +-------------+          +-------------+          +-------------+
+	    ^                        ^                        ^
+	    |                        |                        |
+	   ...                      ...                      ...
+  */
+
+  /* In general three iterations should be enough for most cases, but allow up
+     to five when -fexpensive-optimizations is used.  */
+  int max_iters = 3 + 2 * flag_expensive_optimizations;
+  for (int pass = 0; pass < max_iters; pass++)
+    {
+      bool made_changes = false;
+      for (fold_info_map::iterator iter = fold_info->begin ();
+	   iter != fold_info->end (); ++iter)
+	{
+	  fold_mem_info *info = (*iter).second;
+	  if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
+	    made_changes |= bitmap_ior_into (&cannot_fold_insns,
+					     info->fold_insns);
+	}
+
+      if (!made_changes)
+	return true;
+    }
+
+  return false;
+}
+
+/* If INSN is a root memory instruction that was affected by any folding
+   then update its offset as necessary.  */
+static void
+do_commit_offset (rtx_insn *insn, fold_mem_info *info)
+{
+  rtx mem, reg;
+  HOST_WIDE_INT cur_offset;
+  if (!get_fold_mem_root (insn, &mem, &reg, &cur_offset))
+    return;
+
+  HOST_WIDE_INT new_offset = cur_offset + info->added_offset;
+
+  if (new_offset == cur_offset)
+    return;
+
+  gcc_assert (!bitmap_empty_p (info->fold_insns));
+
+  if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
+    return;
+
+  if (dump_file)
+    {
+      fprintf (dump_file, "Memory offset changed from "
+	       HOST_WIDE_INT_PRINT_DEC " to " HOST_WIDE_INT_PRINT_DEC
+	       " for instruction:\n", cur_offset, new_offset);
+      print_rtl_single (dump_file, insn);
+    }
+
+  machine_mode mode = GET_MODE (XEXP (mem, 0));
+  if (new_offset != 0)
+    XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, mode));
+  else
+    XEXP (mem, 0) = reg;
+  INSN_CODE (insn) = recog (PATTERN (insn), insn, 0);
+  df_insn_rescan (insn);
+}
+
+/* If INSN is a move / add instruction that was folded then replace its
+   constant part with zero.  */
+static void
+do_commit_insn (rtx_insn *insn)
+{
+  if (bitmap_bit_p (&candidate_fold_insns, INSN_UID (insn))
+      && !bitmap_bit_p (&cannot_fold_insns, INSN_UID (insn)))
+    {
+      if (dump_file)
+	{
+	  fprintf (dump_file, "Instruction folded:");
+	  print_rtl_single (dump_file, insn);
+	}
+
+      stats_fold_count++;
+
+      rtx set = single_set (insn);
+      rtx dest = SET_DEST (set);
+      rtx src = SET_SRC (set);
+
+      /* Emit a move and let subsequent passes eliminate it if possible.  */
+      if (GET_CODE (src) == CONST_INT)
+	{
+	  /* INSN is R1 = C.
+	     Replace it with R1 = 0 because C was folded.  */
+	  rtx mov_rtx
+	    = gen_move_insn (dest, gen_int_mode (0, GET_MODE (dest)));
+	  df_insn_rescan (emit_insn_after (mov_rtx, insn));
+	}
+      else
+	{
+	  /* INSN is R1 = R2 + C.
+	     Replace it with R1 = R2 because C was folded.  */
+	  rtx arg1 = XEXP (src, 0);
+
+	  /* If the DEST == ARG1 then the move is a no-op.  */
+	  if (REGNO (dest) != REGNO (arg1))
+	    {
+	      gcc_checking_assert (GET_MODE (dest) == GET_MODE (arg1));
+	      rtx mov_rtx = gen_move_insn (dest, arg1);
+	      df_insn_rescan (emit_insn_after (mov_rtx, insn));
+	    }
+	}
+
+      /* Delete the original move / add instruction.  */
+      delete_insn (insn);
+    }
+}
+
+unsigned int
+pass_fold_mem_offsets::execute (function *fn)
+{
+  df_set_flags (DF_EQ_NOTES + DF_RD_PRUNE_DEAD_DEFS + DF_DEFER_INSN_RESCAN);
+  df_chain_add_problem (DF_UD_CHAIN + DF_DU_CHAIN);
+  df_analyze ();
+
+  bitmap_initialize (&can_fold_insns, NULL);
+  bitmap_initialize (&candidate_fold_insns, NULL);
+  bitmap_initialize (&cannot_fold_insns, NULL);
+
+  stats_fold_count = 0;
+
+  basic_block bb;
+  rtx_insn *insn;
+  FOR_ALL_BB_FN (bb, fn)
+    {
+      /* There is a conflict between this pass and RISCV's shorten-memrefs
+	  pass.  For now disable folding if optimizing for size because
+	  otherwise this cancels the effects of shorten-memrefs.  */
+      if (optimize_bb_for_size_p (bb))
+	continue;
+
+      fold_info_map fold_info;
+
+      bitmap_clear (&can_fold_insns);
+      bitmap_clear (&candidate_fold_insns);
+      bitmap_clear (&cannot_fold_insns);
+
+      FOR_BB_INSNS (bb, insn)
+	do_analysis (insn);
+
+      FOR_BB_INSNS (bb, insn)
+	do_fold_info_calculation (insn, &fold_info);
+
+      FOR_BB_INSNS (bb, insn)
+	if (fold_mem_info **info = fold_info.get (insn))
+	  do_check_validity (insn, *info);
+
+      if (compute_validity_closure (&fold_info))
+	{
+	  FOR_BB_INSNS (bb, insn)
+	    if (fold_mem_info **info = fold_info.get (insn))
+	      do_commit_offset (insn, *info);
+
+	  FOR_BB_INSNS (bb, insn)
+	    do_commit_insn (insn);
+	}
+
+      for (fold_info_map::iterator iter = fold_info.begin ();
+	   iter != fold_info.end (); ++iter)
+	delete (*iter).second;
+    }
+
+  statistics_counter_event (cfun, "Number of folded instructions",
+			    stats_fold_count);
+
+  bitmap_release (&can_fold_insns);
+  bitmap_release (&candidate_fold_insns);
+  bitmap_release (&cannot_fold_insns);
+
+  return 0;
+}
+
+} // anon namespace
+
+rtl_opt_pass *
+make_pass_fold_mem_offsets (gcc::context *ctxt)
+{
+  return new pass_fold_mem_offsets (ctxt);
+}
diff --git a/gcc/passes.def b/gcc/passes.def
index 2bafd60bbfb..df7965dc50f 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -519,6 +519,7 @@ along with GCC; see the file COPYING3.  If not see
 	  NEXT_PASS (pass_peephole2);
 	  NEXT_PASS (pass_if_after_reload);
 	  NEXT_PASS (pass_regrename);
+	  NEXT_PASS (pass_fold_mem_offsets);
 	  NEXT_PASS (pass_cprop_hardreg);
 	  NEXT_PASS (pass_fast_rtl_dce);
 	  NEXT_PASS (pass_reorder_blocks);
diff --git a/gcc/testsuite/gcc.target/i386/pr52146.c b/gcc/testsuite/gcc.target/i386/pr52146.c
index 9bd81368bcd..03e3e95ec04 100644
--- a/gcc/testsuite/gcc.target/i386/pr52146.c
+++ b/gcc/testsuite/gcc.target/i386/pr52146.c
@@ -16,4 +16,4 @@ test2 (void)
   *apic_tpr_addr = 0;
 }
 
-/* { dg-final { scan-assembler-not "\[,\\t \]+-18874240" } } */
+/* { dg-final { scan-assembler-not "\[,\\t \]+-18874240\n" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
new file mode 100644
index 00000000000..ffb49936dc6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffold-mem-offsets" } */
+
+void sink(int arr[2]);
+
+void
+foo(int a, int b, int i)
+{
+  int arr[2] = {a, b};
+  arr[i]++;
+  sink(arr);
+}
+
+/* The should be no negative memory offsets when using -ffold-mem-offsets.  */
+/* { dg-final { scan-assembler-not "lw\t.*,-.*\\(.*\\)" } } */
+/* { dg-final { scan-assembler-not "sw\t.*,-.*\\(.*\\)" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
new file mode 100644
index 00000000000..ca96180470a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffold-mem-offsets" } */
+
+void sink(int arr[3]);
+
+void
+foo(int a, int b, int c, int i)
+{
+  int arr1[3] = {a, b, c};
+  int arr2[3] = {a, c, b};
+  int arr3[3] = {c, b, a};
+
+  arr1[i]++;
+  arr2[i]++;
+  arr3[i]++;
+  
+  sink(arr1);
+  sink(arr2);
+  sink(arr3);
+}
+
+/* The should be no negative memory offsets when using -ffold-mem-offsets.  */
+/* { dg-final { scan-assembler-not "lw\t.*,-.*\\(.*\\)" } } */
+/* { dg-final { scan-assembler-not "sw\t.*,-.*\\(.*\\)" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c
new file mode 100644
index 00000000000..83f82c02caa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffold-mem-offsets" } */
+
+void load(int arr[2]);
+
+int
+foo(long unsigned int i)
+{
+  int arr[2];
+  load(arr);
+
+  return arr[3 * i + 77];
+}
+
+/* The should be no negative memory offsets when using -ffold-mem-offsets.  */
+/* { dg-final { scan-assembler-not "lw\t.*,-.*\\(.*\\)" } } */
+/* { dg-final { scan-assembler-not "addi\t.*,.*,77" } } */
\ No newline at end of file
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 9c4b1e4185c..79a5f330274 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -622,6 +622,7 @@ extern rtl_opt_pass *make_pass_sched_fusion (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_peephole2 (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_if_after_reload (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_regrename (gcc::context *ctxt);
+extern rtl_opt_pass *make_pass_fold_mem_offsets (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_cprop_hardreg (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_reorder_blocks (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_leaf_regs (gcc::context *ctxt);
-- 
2.34.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v7] Implement new RTL optimizations pass: fold-mem-offsets.
  2023-10-16 18:01 [PATCH v7] Implement new RTL optimizations pass: fold-mem-offsets Manolis Tsamis
@ 2023-10-16 18:14 ` Manolis Tsamis
  2023-10-16 19:11 ` Jeff Law
  1 sibling, 0 replies; 12+ messages in thread
From: Manolis Tsamis @ 2023-10-16 18:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jeff Law, Richard Biener, Vineet Gupta, Philipp Tomsich

This version has been bootstrapped and tested to cause no regressions
in both x86 and Aarch64.

Manolis

On Mon, Oct 16, 2023 at 9:01 PM Manolis Tsamis <manolis.tsamis@vrull.eu> wrote:
>
> This is a new RTL pass that tries to optimize memory offset calculations
> by moving them from add immediate instructions to the memory loads/stores.
> For example it can transform this:
>
>   addi t4,sp,16
>   add  t2,a6,t4
>   shl  t3,t2,1
>   ld   a2,0(t3)
>   addi a2,1
>   sd   a2,8(t2)
>
> into the following (one instruction less):
>
>   add  t2,a6,sp
>   shl  t3,t2,1
>   ld   a2,32(t3)
>   addi a2,1
>   sd   a2,24(t2)
>
> Although there are places where this is done already, this pass is more
> powerful and can handle the more difficult cases that are currently not
> optimized. Also, it runs late enough and can optimize away unnecessary
> stack pointer calculations.
>
> gcc/ChangeLog:
>
>         * Makefile.in: Add fold-mem-offsets.o.
>         * passes.def: Schedule a new pass.
>         * tree-pass.h (make_pass_fold_mem_offsets): Declare.
>         * common.opt: New options.
>         * doc/invoke.texi: Document new option.
>         * fold-mem-offsets.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/fold-mem-offsets-1.c: New test.
>         * gcc.target/riscv/fold-mem-offsets-2.c: New test.
>         * gcc.target/riscv/fold-mem-offsets-3.c: New test.
>
> Signed-off-by: Manolis Tsamis <manolis.tsamis@vrull.eu>
> ---
>
> Changes in v7:
>         - Replace recognise with recognize.
>         - Add INSN_CODE(insn) = -1 before calling insn_invalid_p.
>         - Adjust scan regex in i386/pr52146.c.
>
> Changes in v6:
>         - Fix formatting issues.
>         - Compute maximum validity iterations based on
>           flag_expensive_optimizations.
>         - Generate rtx plus only if offset is not zero.
>
> Changes in v5:
>         - Introduce new helper function fold_offsets_1.
>         - Fix bug because constants could be partially propagated
>           through instructions that weren't understood.
>         - Introduce helper class fold_mem_info that stores f-m-o
>           info for an instruction.
>         - Calculate fold_offsets only once with do_fold_info_calculation.
>         - Fix correctness issue by introducing compute_validity_closure.
>         - Propagate in more cases for PLUS/MINUS with constant.
>
> Changes in v4:
>         - Add DF_EQ_NOTES flag to avoid incorrect state in notes.
>         - Remove fold_mem_offsets_driver and enum fold_mem_phase.
>         - Call recog when patching offsets in do_commit_offset.
>         - Restore INSN_CODE after modifying insn in do_check_validity.
>
> Changes in v3:
>         - Added propagation for more codes:
>           sub, neg, mul.
>         - Added folding / elimination for sub and
>           const int moves.
>         - For the validity check of the generated addresses
>           also test memory_address_addr_space_p.
>         - Replaced GEN_INT with gen_int_mode.
>         - Replaced some bitmap_head with auto_bitmap.
>         - Refactor each phase into own function for readability.
>         - Add dump details.
>         - Replace rtx iteration with reg_mentioned_p.
>         - Return early for codes that we can't propagate through.
>
> Changes in v2:
>         - Made the pass target-independant instead of RISCV specific.
>         - Fixed a number of bugs.
>         - Add code to handle more ADD patterns as found
>           in other targets (x86, aarch64).
>         - Improved naming and comments.
>         - Fixed bitmap memory leak.
>
>  gcc/Makefile.in                               |   1 +
>  gcc/common.opt                                |   4 +
>  gcc/doc/invoke.texi                           |   8 +
>  gcc/fold-mem-offsets.cc                       | 901 ++++++++++++++++++
>  gcc/passes.def                                |   1 +
>  gcc/testsuite/gcc.target/i386/pr52146.c       |   2 +-
>  .../gcc.target/riscv/fold-mem-offsets-1.c     |  16 +
>  .../gcc.target/riscv/fold-mem-offsets-2.c     |  24 +
>  .../gcc.target/riscv/fold-mem-offsets-3.c     |  17 +
>  gcc/tree-pass.h                               |   1 +
>  10 files changed, 974 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/fold-mem-offsets.cc
>  create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 9cc16268abf..747f749538d 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1443,6 +1443,7 @@ OBJS = \
>         fixed-value.o \
>         fold-const.o \
>         fold-const-call.o \
> +       fold-mem-offsets.o \
>         function.o \
>         function-abi.o \
>         function-tests.o \
> diff --git a/gcc/common.opt b/gcc/common.opt
> index f137a1f81ac..b103b8d28ed 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1252,6 +1252,10 @@ fcprop-registers
>  Common Var(flag_cprop_registers) Optimization
>  Perform a register copy-propagation optimization pass.
>
> +ffold-mem-offsets
> +Target Bool Var(flag_fold_mem_offsets) Init(1)
> +Fold instructions calculating memory offsets to the memory access instruction if possible.
> +
>  fcrossjumping
>  Common Var(flag_crossjumping) Optimization
>  Perform cross-jumping optimization.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index eb714d18511..a26dc5aae69 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -543,6 +543,7 @@ Objective-C and Objective-C++ Dialects}.
>  -fauto-inc-dec  -fbranch-probabilities
>  -fcaller-saves
>  -fcombine-stack-adjustments  -fconserve-stack
> +-ffold-mem-offsets
>  -fcompare-elim  -fcprop-registers  -fcrossjumping
>  -fcse-follow-jumps  -fcse-skip-blocks  -fcx-fortran-rules
>  -fcx-limited-range
> @@ -14395,6 +14396,13 @@ the comparison operation before register allocation is complete.
>
>  Enabled at levels @option{-O1}, @option{-O2}, @option{-O3}, @option{-Os}.
>
> +@opindex ffold-mem-offsets
> +@item -ffold-mem-offsets
> +@itemx -fno-fold-mem-offsets
> +Try to eliminate add instructions by folding them in memory loads/stores.
> +
> +Enabled at levels @option{-O2}, @option{-O3}.
> +
>  @opindex fcprop-registers
>  @item -fcprop-registers
>  After register allocation and post-register allocation instruction splitting,
> diff --git a/gcc/fold-mem-offsets.cc b/gcc/fold-mem-offsets.cc
> new file mode 100644
> index 00000000000..6263fc7bd15
> --- /dev/null
> +++ b/gcc/fold-mem-offsets.cc
> @@ -0,0 +1,901 @@
> +/* Late RTL pass to fold memory offsets.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 3, or (at your option)
> +any later version.
> +
> +GCC is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +GNU General Public License for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tm.h"
> +#include "rtl.h"
> +#include "tree.h"
> +#include "expr.h"
> +#include "backend.h"
> +#include "regs.h"
> +#include "target.h"
> +#include "memmodel.h"
> +#include "emit-rtl.h"
> +#include "insn-config.h"
> +#include "recog.h"
> +#include "predict.h"
> +#include "df.h"
> +#include "tree-pass.h"
> +#include "cfgrtl.h"
> +
> +/* This pass tries to optimize memory offset calculations by moving constants
> +   from add instructions to the memory instructions (loads / stores).
> +   For example it can transform code like this:
> +
> +     add  t4, sp, 16
> +     add  t2, a6, t4
> +     shl  t3, t2, 1
> +     ld   a2, 0(t3)
> +     add  a2, 1
> +     sd   a2, 8(t2)
> +
> +   into the following (one instruction less):
> +
> +     add  t2, a6, sp
> +     shl  t3, t2, 1
> +     ld   a2, 32(t3)
> +     add  a2, 1
> +     sd   a2, 24(t2)
> +
> +   Although the previous passes try to emit efficient offset calculations
> +   this pass is still beneficial because:
> +
> +    - The mechanisms that optimize memory offsets usually work with specific
> +      patterns or have limitations.  This pass is designed to fold offsets
> +      through complex calculations that affect multiple memory operations
> +      and have partially overlapping calculations.
> +
> +    - There are cases where add instructions are introduced in late rtl passes
> +      and the rest of the pipeline cannot eliminate them.  Arrays and structs
> +      allocated on the stack can result in unwanted add instructions that
> +      cannot be eliminated easily.
> +
> +   This pass works on a basic block level and consists of 4 phases:
> +
> +    - Phase 1 (Analysis): Find "foldable" instructions.
> +      Foldable instructions are those that we know how to propagate
> +      a constant addition through (add, shift, move, ...) and only have other
> +      foldable instructions for uses.  In that phase a DFS traversal on the
> +      definition tree is performed and foldable instructions are marked on
> +      a bitmap.  The add immediate instructions that are reachable in this
> +      DFS are candidates for folding since all the intermediate calculations
> +      affected by them are also foldable.
> +
> +    - Phase 2 (Validity): Traverse and calculate the offsets that would result
> +      from folding the add immediate instructions.  Check whether the
> +      calculated offsets result in a valid instruction for the target.
> +
> +    - Phase 3 (Commit offsets): Traverse again.  It is now known which folds
> +      are valid so at this point change the offsets in the memory instructions.
> +
> +    - Phase 4 (Commit instruction deletions): Scan all instructions and delete
> +      or simplify (reduce to move) all add immediate instructions that were
> +      folded.
> +
> +   This pass should run before hard register propagation because it creates
> +   register moves that we expect to be eliminated.  */
> +
> +namespace {
> +
> +const pass_data pass_data_fold_mem =
> +{
> +  RTL_PASS, /* type */
> +  "fold_mem_offsets", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_NONE, /* tv_id */
> +  0, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  TODO_df_finish, /* todo_flags_finish */
> +};
> +
> +class pass_fold_mem_offsets : public rtl_opt_pass
> +{
> +public:
> +  pass_fold_mem_offsets (gcc::context *ctxt)
> +    : rtl_opt_pass (pass_data_fold_mem, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *)
> +    {
> +      return flag_fold_mem_offsets && optimize >= 2;
> +    }
> +
> +  virtual unsigned int execute (function *);
> +}; // class pass_fold_mem_offsets
> +
> +/* Class that holds in FOLD_INSNS the instructions that if folded the offset
> +   of a memory instruction would increase by ADDED_OFFSET.  */
> +class fold_mem_info {
> +public:
> +  auto_bitmap fold_insns;
> +  HOST_WIDE_INT added_offset;
> +};
> +
> +typedef hash_map<rtx_insn *, fold_mem_info *> fold_info_map;
> +
> +/* Tracks which instructions can be reached through instructions that can
> +   propagate offsets for folding.  */
> +static bitmap_head can_fold_insns;
> +
> +/* Marks instructions that are currently eligible for folding.  */
> +static bitmap_head candidate_fold_insns;
> +
> +/* Tracks instructions that cannot be folded because it turned out that
> +   folding will result in creating an invalid memory instruction.
> +   An instruction can be in both CANDIDATE_FOLD_INSNS and CANNOT_FOLD_INSNS
> +   at the same time, in which case it is not legal to fold.  */
> +static bitmap_head cannot_fold_insns;
> +
> +/* The number of instructions that were simplified or eliminated.  */
> +static int stats_fold_count;
> +
> +/* Get the single reaching definition of an instruction inside a BB.
> +   The definition is desired for REG used in INSN.
> +   Return the definition insn or NULL if there's no definition with
> +   the desired criteria.  */
> +static rtx_insn*
> +get_single_def_in_bb (rtx_insn *insn, rtx reg)
> +{
> +  df_ref use;
> +  struct df_link *ref_chain, *ref_link;
> +
> +  FOR_EACH_INSN_USE (use, insn)
> +    {
> +      if (GET_CODE (DF_REF_REG (use)) == SUBREG)
> +       return NULL;
> +      if (REGNO (DF_REF_REG (use)) == REGNO (reg))
> +       break;
> +    }
> +
> +  if (!use)
> +    return NULL;
> +
> +  ref_chain = DF_REF_CHAIN (use);
> +
> +  if (!ref_chain)
> +    return NULL;
> +
> +  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
> +    {
> +      /* Problem getting some definition for this instruction.  */
> +      if (ref_link->ref == NULL)
> +       return NULL;
> +      if (DF_REF_INSN_INFO (ref_link->ref) == NULL)
> +       return NULL;
> +      if (global_regs[REGNO (reg)]
> +         && !set_of (reg, DF_REF_INSN (ref_link->ref)))
> +       return NULL;
> +    }
> +
> +  if (ref_chain->next)
> +    return NULL;
> +
> +  rtx_insn *def = DF_REF_INSN (ref_chain->ref);
> +
> +  if (BLOCK_FOR_INSN (def) != BLOCK_FOR_INSN (insn))
> +    return NULL;
> +
> +  if (DF_INSN_LUID (def) > DF_INSN_LUID (insn))
> +    return NULL;
> +
> +  return def;
> +}
> +
> +/* Get all uses of REG which is set in INSN.  Return the use list or NULL if a
> +   use is missing / irregular.  If SUCCESS is not NULL then set it to false if
> +   there are missing / irregular uses and true otherwise.  */
> +static struct df_link*
> +get_uses (rtx_insn *insn, rtx reg, bool *success)
> +{
> +  df_ref def;
> +  struct df_link *ref_chain, *ref_link;
> +
> +  if (success)
> +    *success = false;
> +
> +  FOR_EACH_INSN_DEF (def, insn)
> +    if (REGNO (DF_REF_REG (def)) == REGNO (reg))
> +      break;
> +
> +  if (!def)
> +    return NULL;
> +
> +  ref_chain = DF_REF_CHAIN (def);
> +
> +  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
> +    {
> +      /* Problem getting a use for this instruction.  */
> +      if (ref_link->ref == NULL)
> +       return NULL;
> +      if (DF_REF_CLASS (ref_link->ref) != DF_REF_REGULAR)
> +       return NULL;
> +      /* We do not handle REG_EQUIV/REG_EQ notes for now.  */
> +      if (DF_REF_FLAGS (ref_link->ref) & DF_REF_IN_NOTE)
> +       return NULL;
> +    }
> +
> +  if (success)
> +    *success = true;
> +
> +  return ref_chain;
> +}
> +
> +static HOST_WIDE_INT
> +fold_offsets (rtx_insn *insn, rtx reg, bool analyze, bitmap foldable_insns);
> +
> +/*  Helper function for fold_offsets.
> +
> +    If DO_RECURSION is false and ANALYZE is true this function returns true iff
> +    it understands the structure of INSN and knows how to propagate constants
> +    through it.  In this case OFFSET_OUT and FOLDABLE_INSNS are unused.
> +
> +    If DO_RECURSION is true then it also calls fold_offsets for each recognized
> +    part of INSN with the appropriate arguments.
> +
> +    If DO_RECURSION is true and ANALYZE is false then offset that would result
> +    from folding is computed and is returned through the pointer OFFSET_OUT.
> +    The instructions that can be folded are recorded in FOLDABLE_INSNS.
> +*/
> +static bool
> +fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
> +               HOST_WIDE_INT *offset_out, bitmap foldable_insns)
> +{
> +  /* Doesn't make sense if both DO_RECURSION and ANALYZE are false.  */
> +  gcc_checking_assert (do_recursion || analyze);
> +  gcc_checking_assert (GET_CODE (PATTERN (insn)) == SET);
> +
> +  rtx src = SET_SRC (PATTERN (insn));
> +  HOST_WIDE_INT offset = 0;
> +
> +  switch (GET_CODE (src))
> +    {
> +    case PLUS:
> +      {
> +       /* Propagate through add.  */
> +       rtx arg1 = XEXP (src, 0);
> +       rtx arg2 = XEXP (src, 1);
> +
> +       if (REG_P (arg1))
> +         {
> +           if (do_recursion)
> +             offset += fold_offsets (insn, arg1, analyze, foldable_insns);
> +         }
> +       else if (GET_CODE (arg1) == ASHIFT
> +                && REG_P (XEXP (arg1, 0))
> +                && CONST_INT_P (XEXP (arg1, 1)))
> +         {
> +           /* Handle R1 = (R2 << C) + ...  */
> +           if (do_recursion)
> +             {
> +               HOST_WIDE_INT scale
> +                 = (HOST_WIDE_INT_1U << INTVAL (XEXP (arg1, 1)));
> +               offset += scale * fold_offsets (insn, XEXP (arg1, 0), analyze,
> +                                               foldable_insns);
> +             }
> +         }
> +       else if (GET_CODE (arg1) == PLUS
> +                && REG_P (XEXP (arg1, 0))
> +                && REG_P (XEXP (arg1, 1)))
> +         {
> +           /* Handle R1 = (R2 + R3) + ...  */
> +           if (do_recursion)
> +             {
> +               offset += fold_offsets (insn, XEXP (arg1, 0), analyze,
> +                                       foldable_insns);
> +               offset += fold_offsets (insn, XEXP (arg1, 1), analyze,
> +                                       foldable_insns);
> +             }
> +         }
> +       else if (GET_CODE (arg1) == PLUS
> +                && GET_CODE (XEXP (arg1, 0)) == ASHIFT
> +                && REG_P (XEXP (XEXP (arg1, 0), 0))
> +                && CONST_INT_P (XEXP (XEXP (arg1, 0), 1))
> +                && REG_P (XEXP (arg1, 1)))
> +         {
> +           /* Handle R1 = ((R2 << C) + R3) + ...  */
> +           if (do_recursion)
> +             {
> +               HOST_WIDE_INT scale
> +                 = (HOST_WIDE_INT_1U << INTVAL (XEXP (XEXP (arg1, 0), 1)));
> +               offset += scale * fold_offsets (insn, XEXP (XEXP (arg1, 0), 0),
> +                                               analyze, foldable_insns);
> +               offset += fold_offsets (insn, XEXP (arg1, 1), analyze,
> +                                       foldable_insns);
> +             }
> +         }
> +       else
> +         return false;
> +
> +       if (REG_P (arg2))
> +         {
> +           if (do_recursion)
> +             offset += fold_offsets (insn, arg2, analyze, foldable_insns);
> +         }
> +       else if (CONST_INT_P (arg2))
> +         {
> +           if (REG_P (arg1))
> +             {
> +               offset += INTVAL (arg2);
> +               /* This is a R1 = R2 + C instruction, candidate for folding.  */
> +               if (!analyze)
> +                 bitmap_set_bit (foldable_insns, INSN_UID (insn));
> +             }
> +         }
> +       else
> +         return false;
> +
> +       /* Pattern recognized for folding.  */
> +       break;
> +      }
> +    case MINUS:
> +      {
> +       /* Propagate through minus.  */
> +       rtx arg1 = XEXP (src, 0);
> +       rtx arg2 = XEXP (src, 1);
> +
> +       if (REG_P (arg1))
> +         {
> +           if (do_recursion)
> +             offset += fold_offsets (insn, arg1, analyze, foldable_insns);
> +         }
> +       else
> +         return false;
> +
> +       if (REG_P (arg2))
> +         {
> +           if (do_recursion)
> +             offset -= fold_offsets (insn, arg2, analyze, foldable_insns);
> +         }
> +       else if (CONST_INT_P (arg2))
> +         {
> +           if (REG_P (arg1))
> +             {
> +               offset -= INTVAL (arg2);
> +               /* This is a R1 = R2 - C instruction, candidate for folding.  */
> +               if (!analyze)
> +                 bitmap_set_bit (foldable_insns, INSN_UID (insn));
> +             }
> +         }
> +       else
> +         return false;
> +
> +       /* Pattern recognized for folding.  */
> +       break;
> +      }
> +    case NEG:
> +      {
> +       /* Propagate through negation.  */
> +       rtx arg1 = XEXP (src, 0);
> +       if (REG_P (arg1))
> +         {
> +           if (do_recursion)
> +             offset = -fold_offsets (insn, arg1, analyze, foldable_insns);
> +         }
> +       else
> +         return false;
> +
> +       /* Pattern recognized for folding.  */
> +       break;
> +      }
> +    case MULT:
> +      {
> +       /* Propagate through multiply by constant.  */
> +       rtx arg1 = XEXP (src, 0);
> +       rtx arg2 = XEXP (src, 1);
> +
> +       if (REG_P (arg1) && CONST_INT_P (arg2))
> +         {
> +           if (do_recursion)
> +             {
> +               HOST_WIDE_INT scale = INTVAL (arg2);
> +               offset = scale * fold_offsets (insn, arg1, analyze,
> +                                              foldable_insns);
> +             }
> +         }
> +       else
> +         return false;
> +
> +       /* Pattern recognized for folding.  */
> +       break;
> +      }
> +    case ASHIFT:
> +      {
> +       /* Propagate through shift left by constant.  */
> +       rtx arg1 = XEXP (src, 0);
> +       rtx arg2 = XEXP (src, 1);
> +
> +       if (REG_P (arg1) && CONST_INT_P (arg2))
> +         {
> +           if (do_recursion)
> +             {
> +               HOST_WIDE_INT scale = (HOST_WIDE_INT_1U << INTVAL (arg2));
> +               offset = scale * fold_offsets (insn, arg1, analyze,
> +                                              foldable_insns);
> +             }
> +         }
> +       else
> +         return false;
> +
> +       /* Pattern recognized for folding.  */
> +       break;
> +      }
> +    case REG:
> +      {
> +       /* Propagate through register move.  */
> +       if (do_recursion)
> +         offset = fold_offsets (insn, src, analyze, foldable_insns);
> +
> +       /* Pattern recognized for folding.  */
> +       break;
> +      }
> +    case CONST_INT:
> +      {
> +       offset = INTVAL (src);
> +       /* R1 = C is candidate for folding.  */
> +       if (!analyze)
> +         bitmap_set_bit (foldable_insns, INSN_UID (insn));
> +
> +       /* Pattern recognized for folding.  */
> +       break;
> +      }
> +    default:
> +      /* Cannot recognize.  */
> +      return false;
> +    }
> +
> +    if (do_recursion && !analyze)
> +      *offset_out = offset;
> +
> +    return true;
> +}
> +
> +/* Function that computes the offset that would have to be added to all uses
> +   of REG if the instructions marked in FOLDABLE_INSNS were to be eliminated.
> +
> +   If ANALYZE is true then mark in CAN_FOLD_INSNS which instructions
> +   transitively only affect other instructions found in CAN_FOLD_INSNS.
> +   If ANALYZE is false then compute the offset required for folding.  */
> +static HOST_WIDE_INT
> +fold_offsets (rtx_insn *insn, rtx reg, bool analyze, bitmap foldable_insns)
> +{
> +  rtx_insn *def = get_single_def_in_bb (insn, reg);
> +
> +  if (!def || GET_CODE (PATTERN (def)) != SET)
> +    return 0;
> +
> +  rtx dest = SET_DEST (PATTERN (def));
> +
> +  if (!REG_P (dest))
> +    return 0;
> +
> +  /* We can only affect the values of GPR registers.  */
> +  unsigned int dest_regno = REGNO (dest);
> +  if (fixed_regs[dest_regno]
> +      || !TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], dest_regno))
> +    return 0;
> +
> +  if (analyze)
> +    {
> +      /* Check if we know how to handle DEF.  */
> +      if (!fold_offsets_1 (def, true, false, NULL, NULL))
> +       return 0;
> +
> +      /* We only fold through instructions that are transitively used as
> +        memory addresses and do not have other uses.  Use the same logic
> +        from offset calculation to visit instructions that can propagate
> +        offsets and keep track of them in CAN_FOLD_INSNS.  */
> +      bool success;
> +      struct df_link *uses = get_uses (def, dest, &success), *ref_link;
> +
> +      if (!success)
> +       return 0;
> +
> +      for (ref_link = uses; ref_link; ref_link = ref_link->next)
> +       {
> +         rtx_insn *use = DF_REF_INSN (ref_link->ref);
> +
> +         if (DEBUG_INSN_P (use))
> +           continue;
> +
> +         /* Punt if the use is anything more complicated than a set
> +            (clobber, use, etc).  */
> +         if (!NONJUMP_INSN_P (use) || GET_CODE (PATTERN (use)) != SET)
> +           return 0;
> +
> +         /* This use affects instructions outside of CAN_FOLD_INSNS.  */
> +         if (!bitmap_bit_p (&can_fold_insns, INSN_UID (use)))
> +           return 0;
> +
> +         rtx use_set = PATTERN (use);
> +
> +         /* Special case: A foldable memory store is not foldable if it
> +            mentions DEST outside of the address calculation.  */
> +         if (use_set && MEM_P (SET_DEST (use_set))
> +             && reg_mentioned_p (dest, SET_SRC (use_set)))
> +           return 0;
> +       }
> +
> +      bitmap_set_bit (&can_fold_insns, INSN_UID (def));
> +
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       {
> +         fprintf (dump_file, "Instruction marked for propagation: ");
> +         print_rtl_single (dump_file, def);
> +       }
> +    }
> +  else
> +    {
> +      /* We cannot propagate through this instruction.  */
> +      if (!bitmap_bit_p (&can_fold_insns, INSN_UID (def)))
> +       return 0;
> +    }
> +
> +  HOST_WIDE_INT offset = 0;
> +  bool recognized = fold_offsets_1 (def, analyze, true, &offset,
> +                                   foldable_insns);
> +
> +  if (!recognized)
> +    return 0;
> +
> +  return offset;
> +}
> +
> +/* Test if INSN is a memory load / store that can have an offset folded to it.
> +   Return true iff INSN is such an instruction and return through MEM_OUT,
> +   REG_OUT and OFFSET_OUT the RTX that has a MEM code, the register that is
> +   used as a base address and the offset accordingly.
> +   All of the out pointers may be NULL in which case they will be ignored.  */
> +bool
> +get_fold_mem_root (rtx_insn *insn, rtx *mem_out, rtx *reg_out,
> +                  HOST_WIDE_INT *offset_out)
> +{
> +  rtx set = single_set (insn);
> +  rtx mem = NULL_RTX;
> +
> +  if (set != NULL_RTX)
> +    {
> +      rtx src = SET_SRC (set);
> +      rtx dest = SET_DEST (set);
> +
> +      /* Don't fold when we have unspec / volatile.  */
> +      if (GET_CODE (src) == UNSPEC
> +         || GET_CODE (src) == UNSPEC_VOLATILE
> +         || GET_CODE (dest) == UNSPEC
> +         || GET_CODE (dest) == UNSPEC_VOLATILE)
> +       return false;
> +
> +      if (MEM_P (src))
> +       mem = src;
> +      else if (MEM_P (dest))
> +       mem = dest;
> +      else if ((GET_CODE (src) == SIGN_EXTEND
> +               || GET_CODE (src) == ZERO_EXTEND)
> +              && MEM_P (XEXP (src, 0)))
> +       mem = XEXP (src, 0);
> +    }
> +
> +  if (mem == NULL_RTX)
> +    return false;
> +
> +  rtx mem_addr = XEXP (mem, 0);
> +  rtx reg;
> +  HOST_WIDE_INT offset;
> +
> +  if (REG_P (mem_addr))
> +    {
> +      reg = mem_addr;
> +      offset = 0;
> +    }
> +  else if (GET_CODE (mem_addr) == PLUS
> +          && REG_P (XEXP (mem_addr, 0))
> +          && CONST_INT_P (XEXP (mem_addr, 1)))
> +    {
> +      reg = XEXP (mem_addr, 0);
> +      offset = INTVAL (XEXP (mem_addr, 1));
> +    }
> +  else
> +    return false;
> +
> +  if (mem_out)
> +    *mem_out = mem;
> +  if (reg_out)
> +    *reg_out = reg;
> +  if (offset_out)
> +    *offset_out = offset;
> +
> +  return true;
> +}
> +
> +/* If INSN is a root memory instruction then do a DFS traversal on its
> +   definitions and find folding candidates.  */
> +static void
> +do_analysis (rtx_insn *insn)
> +{
> +  rtx reg;
> +  if (!get_fold_mem_root (insn, NULL, &reg, NULL))
> +    return;
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      fprintf (dump_file, "Starting analysis from root: ");
> +      print_rtl_single (dump_file, insn);
> +    }
> +
> +  /* Analyse folding opportunities for this memory instruction.  */
> +  bitmap_set_bit (&can_fold_insns, INSN_UID (insn));
> +  fold_offsets (insn, reg, true, NULL);
> +}
> +
> +static void
> +do_fold_info_calculation (rtx_insn *insn, fold_info_map *fold_info)
> +{
> +  rtx mem, reg;
> +  HOST_WIDE_INT cur_offset;
> +  if (!get_fold_mem_root (insn, &mem, &reg, &cur_offset))
> +    return;
> +
> +  fold_mem_info *info = new fold_mem_info;
> +  info->added_offset = fold_offsets (insn, reg, false, info->fold_insns);
> +
> +  fold_info->put (insn, info);
> +}
> +
> +/* If INSN is a root memory instruction then compute a potentially new offset
> +   for it and test if the resulting instruction is valid.  */
> +static void
> +do_check_validity (rtx_insn *insn, fold_mem_info *info)
> +{
> +  rtx mem, reg;
> +  HOST_WIDE_INT cur_offset;
> +  if (!get_fold_mem_root (insn, &mem, &reg, &cur_offset))
> +    return;
> +
> +  HOST_WIDE_INT new_offset = cur_offset + info->added_offset;
> +
> +  /* Test if it is valid to change MEM's address offset to NEW_OFFSET.  */
> +  int icode = INSN_CODE (insn);
> +  INSN_CODE (insn) = -1;
> +  rtx mem_addr = XEXP (mem, 0);
> +  machine_mode mode = GET_MODE (mem_addr);
> +  if (new_offset != 0)
> +    XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, mode));
> +  else
> +    XEXP (mem, 0) = reg;
> +
> +  bool illegal = insn_invalid_p (insn, false)
> +                || !memory_address_addr_space_p (mode, XEXP (mem, 0),
> +                                                 MEM_ADDR_SPACE (mem));
> +
> +  /* Restore the instruction.  */
> +  XEXP (mem, 0) = mem_addr;
> +  INSN_CODE (insn) = icode;
> +
> +  if (illegal)
> +    bitmap_ior_into (&cannot_fold_insns, info->fold_insns);
> +  else
> +    bitmap_ior_into (&candidate_fold_insns, info->fold_insns);
> +}
> +
> +static bool
> +compute_validity_closure (fold_info_map *fold_info)
> +{
> +  /* Let's say we have an arbitrary chain of foldable instructions xN = xN + C
> +     and memory operations rN that use xN as shown below.  If folding x1 in r1
> +     turns out to be invalid for whatever reason then it's also invalid to fold
> +     any of the other xN into any rN.  That means that we need the transitive
> +     closure of validity to determine whether we can fold a xN instruction.
> +
> +     +--------------+    +-------------------+    +-------------------+
> +     | r1 = mem[x1] |    | r2 = mem[x1 + x2] |    | r3 = mem[x2 + x3] |   ...
> +     +--------------+    +-------------------+    +-------------------+
> +           ^                ^       ^                ^       ^
> +           |               /        |               /        |           ...
> +           |              /         |              /         |
> +     +-------------+      /   +-------------+      /   +-------------+
> +     | x1 = x1 + 1 |-----+    | x2 = x2 + 1 |-----+    | x3 = x3 + 1 |--- ...
> +     +-------------+          +-------------+          +-------------+
> +           ^                        ^                        ^
> +           |                        |                        |
> +          ...                      ...                      ...
> +  */
> +
> +  /* In general three iterations should be enough for most cases, but allow up
> +     to five when -fexpensive-optimizations is used.  */
> +  int max_iters = 3 + 2 * flag_expensive_optimizations;
> +  for (int pass = 0; pass < max_iters; pass++)
> +    {
> +      bool made_changes = false;
> +      for (fold_info_map::iterator iter = fold_info->begin ();
> +          iter != fold_info->end (); ++iter)
> +       {
> +         fold_mem_info *info = (*iter).second;
> +         if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
> +           made_changes |= bitmap_ior_into (&cannot_fold_insns,
> +                                            info->fold_insns);
> +       }
> +
> +      if (!made_changes)
> +       return true;
> +    }
> +
> +  return false;
> +}
> +
> +/* If INSN is a root memory instruction that was affected by any folding
> +   then update its offset as necessary.  */
> +static void
> +do_commit_offset (rtx_insn *insn, fold_mem_info *info)
> +{
> +  rtx mem, reg;
> +  HOST_WIDE_INT cur_offset;
> +  if (!get_fold_mem_root (insn, &mem, &reg, &cur_offset))
> +    return;
> +
> +  HOST_WIDE_INT new_offset = cur_offset + info->added_offset;
> +
> +  if (new_offset == cur_offset)
> +    return;
> +
> +  gcc_assert (!bitmap_empty_p (info->fold_insns));
> +
> +  if (bitmap_intersect_p (&cannot_fold_insns, info->fold_insns))
> +    return;
> +
> +  if (dump_file)
> +    {
> +      fprintf (dump_file, "Memory offset changed from "
> +              HOST_WIDE_INT_PRINT_DEC " to " HOST_WIDE_INT_PRINT_DEC
> +              " for instruction:\n", cur_offset, new_offset);
> +      print_rtl_single (dump_file, insn);
> +    }
> +
> +  machine_mode mode = GET_MODE (XEXP (mem, 0));
> +  if (new_offset != 0)
> +    XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, mode));
> +  else
> +    XEXP (mem, 0) = reg;
> +  INSN_CODE (insn) = recog (PATTERN (insn), insn, 0);
> +  df_insn_rescan (insn);
> +}
> +
> +/* If INSN is a move / add instruction that was folded then replace its
> +   constant part with zero.  */
> +static void
> +do_commit_insn (rtx_insn *insn)
> +{
> +  if (bitmap_bit_p (&candidate_fold_insns, INSN_UID (insn))
> +      && !bitmap_bit_p (&cannot_fold_insns, INSN_UID (insn)))
> +    {
> +      if (dump_file)
> +       {
> +         fprintf (dump_file, "Instruction folded:");
> +         print_rtl_single (dump_file, insn);
> +       }
> +
> +      stats_fold_count++;
> +
> +      rtx set = single_set (insn);
> +      rtx dest = SET_DEST (set);
> +      rtx src = SET_SRC (set);
> +
> +      /* Emit a move and let subsequent passes eliminate it if possible.  */
> +      if (GET_CODE (src) == CONST_INT)
> +       {
> +         /* INSN is R1 = C.
> +            Replace it with R1 = 0 because C was folded.  */
> +         rtx mov_rtx
> +           = gen_move_insn (dest, gen_int_mode (0, GET_MODE (dest)));
> +         df_insn_rescan (emit_insn_after (mov_rtx, insn));
> +       }
> +      else
> +       {
> +         /* INSN is R1 = R2 + C.
> +            Replace it with R1 = R2 because C was folded.  */
> +         rtx arg1 = XEXP (src, 0);
> +
> +         /* If the DEST == ARG1 then the move is a no-op.  */
> +         if (REGNO (dest) != REGNO (arg1))
> +           {
> +             gcc_checking_assert (GET_MODE (dest) == GET_MODE (arg1));
> +             rtx mov_rtx = gen_move_insn (dest, arg1);
> +             df_insn_rescan (emit_insn_after (mov_rtx, insn));
> +           }
> +       }
> +
> +      /* Delete the original move / add instruction.  */
> +      delete_insn (insn);
> +    }
> +}
> +
> +unsigned int
> +pass_fold_mem_offsets::execute (function *fn)
> +{
> +  df_set_flags (DF_EQ_NOTES + DF_RD_PRUNE_DEAD_DEFS + DF_DEFER_INSN_RESCAN);
> +  df_chain_add_problem (DF_UD_CHAIN + DF_DU_CHAIN);
> +  df_analyze ();
> +
> +  bitmap_initialize (&can_fold_insns, NULL);
> +  bitmap_initialize (&candidate_fold_insns, NULL);
> +  bitmap_initialize (&cannot_fold_insns, NULL);
> +
> +  stats_fold_count = 0;
> +
> +  basic_block bb;
> +  rtx_insn *insn;
> +  FOR_ALL_BB_FN (bb, fn)
> +    {
> +      /* There is a conflict between this pass and RISCV's shorten-memrefs
> +         pass.  For now disable folding if optimizing for size because
> +         otherwise this cancels the effects of shorten-memrefs.  */
> +      if (optimize_bb_for_size_p (bb))
> +       continue;
> +
> +      fold_info_map fold_info;
> +
> +      bitmap_clear (&can_fold_insns);
> +      bitmap_clear (&candidate_fold_insns);
> +      bitmap_clear (&cannot_fold_insns);
> +
> +      FOR_BB_INSNS (bb, insn)
> +       do_analysis (insn);
> +
> +      FOR_BB_INSNS (bb, insn)
> +       do_fold_info_calculation (insn, &fold_info);
> +
> +      FOR_BB_INSNS (bb, insn)
> +       if (fold_mem_info **info = fold_info.get (insn))
> +         do_check_validity (insn, *info);
> +
> +      if (compute_validity_closure (&fold_info))
> +       {
> +         FOR_BB_INSNS (bb, insn)
> +           if (fold_mem_info **info = fold_info.get (insn))
> +             do_commit_offset (insn, *info);
> +
> +         FOR_BB_INSNS (bb, insn)
> +           do_commit_insn (insn);
> +       }
> +
> +      for (fold_info_map::iterator iter = fold_info.begin ();
> +          iter != fold_info.end (); ++iter)
> +       delete (*iter).second;
> +    }
> +
> +  statistics_counter_event (cfun, "Number of folded instructions",
> +                           stats_fold_count);
> +
> +  bitmap_release (&can_fold_insns);
> +  bitmap_release (&candidate_fold_insns);
> +  bitmap_release (&cannot_fold_insns);
> +
> +  return 0;
> +}
> +
> +} // anon namespace
> +
> +rtl_opt_pass *
> +make_pass_fold_mem_offsets (gcc::context *ctxt)
> +{
> +  return new pass_fold_mem_offsets (ctxt);
> +}
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 2bafd60bbfb..df7965dc50f 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -519,6 +519,7 @@ along with GCC; see the file COPYING3.  If not see
>           NEXT_PASS (pass_peephole2);
>           NEXT_PASS (pass_if_after_reload);
>           NEXT_PASS (pass_regrename);
> +         NEXT_PASS (pass_fold_mem_offsets);
>           NEXT_PASS (pass_cprop_hardreg);
>           NEXT_PASS (pass_fast_rtl_dce);
>           NEXT_PASS (pass_reorder_blocks);
> diff --git a/gcc/testsuite/gcc.target/i386/pr52146.c b/gcc/testsuite/gcc.target/i386/pr52146.c
> index 9bd81368bcd..03e3e95ec04 100644
> --- a/gcc/testsuite/gcc.target/i386/pr52146.c
> +++ b/gcc/testsuite/gcc.target/i386/pr52146.c
> @@ -16,4 +16,4 @@ test2 (void)
>    *apic_tpr_addr = 0;
>  }
>
> -/* { dg-final { scan-assembler-not "\[,\\t \]+-18874240" } } */
> +/* { dg-final { scan-assembler-not "\[,\\t \]+-18874240\n" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
> new file mode 100644
> index 00000000000..ffb49936dc6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ffold-mem-offsets" } */
> +
> +void sink(int arr[2]);
> +
> +void
> +foo(int a, int b, int i)
> +{
> +  int arr[2] = {a, b};
> +  arr[i]++;
> +  sink(arr);
> +}
> +
> +/* The should be no negative memory offsets when using -ffold-mem-offsets.  */
> +/* { dg-final { scan-assembler-not "lw\t.*,-.*\\(.*\\)" } } */
> +/* { dg-final { scan-assembler-not "sw\t.*,-.*\\(.*\\)" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
> new file mode 100644
> index 00000000000..ca96180470a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ffold-mem-offsets" } */
> +
> +void sink(int arr[3]);
> +
> +void
> +foo(int a, int b, int c, int i)
> +{
> +  int arr1[3] = {a, b, c};
> +  int arr2[3] = {a, c, b};
> +  int arr3[3] = {c, b, a};
> +
> +  arr1[i]++;
> +  arr2[i]++;
> +  arr3[i]++;
> +
> +  sink(arr1);
> +  sink(arr2);
> +  sink(arr3);
> +}
> +
> +/* The should be no negative memory offsets when using -ffold-mem-offsets.  */
> +/* { dg-final { scan-assembler-not "lw\t.*,-.*\\(.*\\)" } } */
> +/* { dg-final { scan-assembler-not "sw\t.*,-.*\\(.*\\)" } } */
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c
> new file mode 100644
> index 00000000000..83f82c02caa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ffold-mem-offsets" } */
> +
> +void load(int arr[2]);
> +
> +int
> +foo(long unsigned int i)
> +{
> +  int arr[2];
> +  load(arr);
> +
> +  return arr[3 * i + 77];
> +}
> +
> +/* The should be no negative memory offsets when using -ffold-mem-offsets.  */
> +/* { dg-final { scan-assembler-not "lw\t.*,-.*\\(.*\\)" } } */
> +/* { dg-final { scan-assembler-not "addi\t.*,.*,77" } } */
> \ No newline at end of file
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index 9c4b1e4185c..79a5f330274 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -622,6 +622,7 @@ extern rtl_opt_pass *make_pass_sched_fusion (gcc::context *ctxt);
>  extern rtl_opt_pass *make_pass_peephole2 (gcc::context *ctxt);
>  extern rtl_opt_pass *make_pass_if_after_reload (gcc::context *ctxt);
>  extern rtl_opt_pass *make_pass_regrename (gcc::context *ctxt);
> +extern rtl_opt_pass *make_pass_fold_mem_offsets (gcc::context *ctxt);
>  extern rtl_opt_pass *make_pass_cprop_hardreg (gcc::context *ctxt);
>  extern rtl_opt_pass *make_pass_reorder_blocks (gcc::context *ctxt);
>  extern rtl_opt_pass *make_pass_leaf_regs (gcc::context *ctxt);
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v7] Implement new RTL optimizations pass: fold-mem-offsets.
  2023-10-16 18:01 [PATCH v7] Implement new RTL optimizations pass: fold-mem-offsets Manolis Tsamis
  2023-10-16 18:14 ` Manolis Tsamis
@ 2023-10-16 19:11 ` Jeff Law
  2023-11-27 20:52   ` Jakub Jelinek
  1 sibling, 1 reply; 12+ messages in thread
From: Jeff Law @ 2023-10-16 19:11 UTC (permalink / raw)
  To: Manolis Tsamis, gcc-patches; +Cc: Richard Biener, Vineet Gupta, Philipp Tomsich



On 10/16/23 12:01, Manolis Tsamis wrote:
> This is a new RTL pass that tries to optimize memory offset calculations
> by moving them from add immediate instructions to the memory loads/stores.
> For example it can transform this:
> 
>    addi t4,sp,16
>    add  t2,a6,t4
>    shl  t3,t2,1
>    ld   a2,0(t3)
>    addi a2,1
>    sd   a2,8(t2)
> 
> into the following (one instruction less):
> 
>    add  t2,a6,sp
>    shl  t3,t2,1
>    ld   a2,32(t3)
>    addi a2,1
>    sd   a2,24(t2)
> 
> Although there are places where this is done already, this pass is more
> powerful and can handle the more difficult cases that are currently not
> optimized. Also, it runs late enough and can optimize away unnecessary
> stack pointer calculations.
> 
> gcc/ChangeLog:
> 
> 	* Makefile.in: Add fold-mem-offsets.o.
> 	* passes.def: Schedule a new pass.
> 	* tree-pass.h (make_pass_fold_mem_offsets): Declare.
> 	* common.opt: New options.
> 	* doc/invoke.texi: Document new option.
> 	* fold-mem-offsets.cc: New file.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/fold-mem-offsets-1.c: New test.
> 	* gcc.target/riscv/fold-mem-offsets-2.c: New test.
> 	* gcc.target/riscv/fold-mem-offsets-3.c: New test.
Thanks, I've pushed this to the trunk.

jeff
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v7] Implement new RTL optimizations pass: fold-mem-offsets.
  2023-10-16 19:11 ` Jeff Law
@ 2023-11-27 20:52   ` Jakub Jelinek
  2023-11-27 23:51     ` [PATCH] fold-mem-offsets: Fix powerpc64le-linux profiledbootstrap [PR111601] Jakub Jelinek
                       ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Jakub Jelinek @ 2023-11-27 20:52 UTC (permalink / raw)
  To: Jeff Law, Manolis Tsamis
  Cc: gcc-patches, Richard Biener, Vineet Gupta, Philipp Tomsich

On Mon, Oct 16, 2023 at 01:11:01PM -0600, Jeff Law wrote:
> > gcc/ChangeLog:
> > 
> > 	* Makefile.in: Add fold-mem-offsets.o.
> > 	* passes.def: Schedule a new pass.
> > 	* tree-pass.h (make_pass_fold_mem_offsets): Declare.
> > 	* common.opt: New options.
> > 	* doc/invoke.texi: Document new option.
> > 	* fold-mem-offsets.cc: New file.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > 	* gcc.target/riscv/fold-mem-offsets-1.c: New test.
> > 	* gcc.target/riscv/fold-mem-offsets-2.c: New test.
> > 	* gcc.target/riscv/fold-mem-offsets-3.c: New test.
> Thanks, I've pushed this to the trunk.

This breaks profiledbootstrap on powerpc64le-linux.
From what I can see, the pass works one basic block at a time and
will punt on any non-DEBUG_INSN uses outside of the current block
(I believe because of the
          /* This use affects instructions outside of CAN_FOLD_INSNS.  */
          if (!bitmap_bit_p (&can_fold_insns, INSN_UID (use)))
            return 0;
test and can_fold_insns only set in do_analysis (when processing insns in
current bb, cleared at the end) or results of get_single_def_in_bb
(which are checked to be in the same bb).
But, while get_single_def_in_bb checks for
  if (DF_INSN_LUID (def) > DF_INSN_LUID (insn))
    return NULL;
(OT, why not DF_INSN_INFO_LUID (DF_REF_INSN_INFO (ref_chain->ref))
instead of DF_INSN_LUID (def), then it doesn't need to look up
DF_INSN_INFO_GET (def)?), nothing when walking all uses of def does such
luid check.
The basic block in the PR in question has:
...
(insn 212 210 215 25 (set (mem/f:DI (reg/v/f:DI 10 10 [orig:152 last_viable ] [152]) [2 *last_viable_336+0 S8 A64])
        (reg/f:DI 9 9 [orig:155 _342 ] [155])) "pr111601.ii":50:17 683 {*movdi_internal64}
     (expr_list:REG_DEAD (reg/v/f:DI 10 10 [orig:152 last_viable ] [152])
        (nil)))
(insn 215 212 484 25 (set (reg:DI 5 5 [226])
        (const_int 0 [0])) "pr111601.ii":52:12 683 {*movdi_internal64}
     (expr_list:REG_EQUIV (const_int 0 [0])
        (nil)))
(insn 484 215 218 25 (set (reg/v/f:DI 10 10 [orig:152 last_viable ] [152])
        (reg/f:DI 9 9 [orig:155 _342 ] [155])) "pr111601.ii":52:12 683 {*movdi_internal64}
     (nil))
...
(insn 564 214 216 25 (set (reg/v/f:DI 10 10 [orig:152 last_viable ] [152])
        (plus:DI (reg/v/f:DI 10 10 [orig:152 last_viable ] [152])
            (const_int 96 [0x60]))) "pr111601.ii":52:12 66 {*adddi3}
     (nil))
(insn 216 564 219 25 (set (mem/f:DI (reg/v/f:DI 10 10 [orig:152 last_viable ] [152]) [2 _343->next+0 S8 A64])
        (reg:DI 5 5 [226])) "pr111601.ii":52:12 683 {*movdi_internal64}
     (expr_list:REG_DEAD (reg:DI 5 5 [226])
        (nil)))
...
and when asking for all uses of %r10 from def 564, it will see uses
in 216 and 212; the former is after the += 96 addition and gets changed
to load from %r10+96 with the addition being dropped, but there is
the other store which is a use across the backedge and when reached
from other edges certainly doesn't have the + 96 addition anywhere,
so the pass doesn't actually change that location.

Haven't bootstrapped/regtested this yet, will start momentarily,
posting here just in case I'm missing something important.

2023-11-27  Jakub Jelinek  <jakub@redhat.com>

	PR bootstrap/111601
	* fold-mem-offsets.cc (fold_offsets): Punt if use appears before
	def in the basic block.

	* g++.dg/opt/pr111601.C: New test.

--- gcc/fold-mem-offsets.cc.jj	2023-11-02 07:49:17.060865772 +0100
+++ gcc/fold-mem-offsets.cc	2023-11-27 21:35:35.089007365 +0100
@@ -511,6 +511,7 @@ fold_offsets (rtx_insn *insn, rtx reg, b
       if (!success)
 	return 0;
 
+      unsigned luid = DF_INSN_LUID (def);
       for (ref_link = uses; ref_link; ref_link = ref_link->next)
 	{
 	  rtx_insn *use = DF_REF_INSN (ref_link->ref);
@@ -534,6 +535,11 @@ fold_offsets (rtx_insn *insn, rtx reg, b
 	  if (use_set && MEM_P (SET_DEST (use_set))
 	      && reg_mentioned_p (dest, SET_SRC (use_set)))
 	    return 0;
+
+	  /* Punt if use appears before def in the basic block.  See
+	     PR111601.  */
+	  if (DF_INSN_INFO_LUID (DF_REF_INSN_INFO (ref_link->ref)) < luid)
+	    return 0;
 	}
 
       bitmap_set_bit (&can_fold_insns, INSN_UID (def));
--- gcc/testsuite/g++.dg/opt/pr111601.C.jj	2023-11-27 21:33:12.605006881 +0100
+++ gcc/testsuite/g++.dg/opt/pr111601.C	2023-11-27 21:34:47.267678510 +0100
@@ -0,0 +1,86 @@
+// PR bootstrap/111601
+// { dg-do run { target c++11 } }
+// { dg-options "-O2 -fno-exceptions -fno-rtti -fprofile-generate" }
+// { dg-require-profiling "-fprofile-generate" }
+// { dg-final { cleanup-coverage-files } }
+
+struct tree_base
+{
+  int code:16;
+};
+struct saved_scope
+{
+  void *pad[14];
+  int x_processing_template_decl;
+};
+struct saved_scope *scope_chain;
+struct z_candidate
+{
+  tree_base *fn;
+  void *pad[11];
+  z_candidate *next;
+  int viable;
+  int flags;
+};
+
+__attribute__((noipa)) struct z_candidate *
+splice_viable (struct z_candidate *cands, bool strict_p, bool *any_viable_p)
+{
+  struct z_candidate *viable;
+  struct z_candidate **last_viable;
+  struct z_candidate **cand;
+  bool found_strictly_viable = false;
+  if (scope_chain->x_processing_template_decl)
+    strict_p = true;
+  viable = (z_candidate *) 0;
+  last_viable = &viable;
+  *any_viable_p = false;
+  cand = &cands;
+  while (*cand)
+    {
+      struct z_candidate *c = *cand;
+      if (!strict_p && (c->viable == 1 || ((int) (c->fn)->code) == 273))
+	{
+	  strict_p = true;
+	  if (viable && !found_strictly_viable)
+	    {
+	      *any_viable_p = false;
+	      *last_viable = cands;
+	      cands = viable;
+	      viable = (z_candidate *) 0;
+	      last_viable = &viable;
+	    }
+	}
+      if (strict_p ? c->viable == 1 : c->viable)
+	{
+	  *last_viable = c;
+	  *cand = c->next;
+	  c->next = (z_candidate *) 0;
+	  last_viable = &c->next;
+	  *any_viable_p = true;
+	  if (c->viable == 1)
+	    found_strictly_viable = true;
+	}
+      else
+	cand = &c->next;
+    }
+  return viable ? viable : cands;
+}
+
+int
+main ()
+{
+  saved_scope s{};
+  scope_chain = &s;
+  z_candidate z[4] = {};
+  z[0].next = &z[1];
+  z[1].viable = 1;
+  z[1].next = &z[2];
+  z[2].viable = 1;
+  z[2].next = &z[3];
+  bool b;
+  z_candidate *c = splice_viable (&z[0], true, &b);
+  if (c != &z[1] || z[1].next != &z[2] || z[2].next)
+    __builtin_abort ();
+  return 0;
+}


	Jakub


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] fold-mem-offsets: Fix powerpc64le-linux profiledbootstrap [PR111601]
  2023-11-27 20:52   ` Jakub Jelinek
@ 2023-11-27 23:51     ` Jakub Jelinek
  2023-11-27 23:55       ` Andrew Pinski
  2023-11-28  9:45     ` [PATCH v7] Implement new RTL optimizations pass: fold-mem-offsets Manolis Tsamis
  2023-11-28 17:50     ` [PATCH v7] Implement new RTL optimizations pass: fold-mem-offsets Jeff Law
  2 siblings, 1 reply; 12+ messages in thread
From: Jakub Jelinek @ 2023-11-27 23:51 UTC (permalink / raw)
  To: Jeff Law, Manolis Tsamis, gcc-patches, Richard Biener,
	Vineet Gupta, Philipp Tomsich

On Mon, Nov 27, 2023 at 09:52:14PM +0100, Jakub Jelinek wrote:
> On Mon, Oct 16, 2023 at 01:11:01PM -0600, Jeff Law wrote:
> > > gcc/ChangeLog:
> > > 
> > > 	* Makefile.in: Add fold-mem-offsets.o.
> > > 	* passes.def: Schedule a new pass.
> > > 	* tree-pass.h (make_pass_fold_mem_offsets): Declare.
> > > 	* common.opt: New options.
> > > 	* doc/invoke.texi: Document new option.
> > > 	* fold-mem-offsets.cc: New file.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > > 	* gcc.target/riscv/fold-mem-offsets-1.c: New test.
> > > 	* gcc.target/riscv/fold-mem-offsets-2.c: New test.
> > > 	* gcc.target/riscv/fold-mem-offsets-3.c: New test.
> > Thanks, I've pushed this to the trunk.
> 
> This breaks profiledbootstrap on powerpc64le-linux.
> >From what I can see, the pass works one basic block at a time and
> will punt on any non-DEBUG_INSN uses outside of the current block
> (I believe because of the
>           /* This use affects instructions outside of CAN_FOLD_INSNS.  */
>           if (!bitmap_bit_p (&can_fold_insns, INSN_UID (use)))
>             return 0;
> test and can_fold_insns only set in do_analysis (when processing insns in
> current bb, cleared at the end) or results of get_single_def_in_bb
> (which are checked to be in the same bb).
> But, while get_single_def_in_bb checks for
>   if (DF_INSN_LUID (def) > DF_INSN_LUID (insn))
>     return NULL;
> (OT, why not DF_INSN_INFO_LUID (DF_REF_INSN_INFO (ref_chain->ref))
> instead of DF_INSN_LUID (def), then it doesn't need to look up
> DF_INSN_INFO_GET (def)?), nothing when walking all uses of def does such
> luid check.
> The basic block in the PR in question has:
> ...
> (insn 212 210 215 25 (set (mem/f:DI (reg/v/f:DI 10 10 [orig:152 last_viable ] [152]) [2 *last_viable_336+0 S8 A64])
>         (reg/f:DI 9 9 [orig:155 _342 ] [155])) "pr111601.ii":50:17 683 {*movdi_internal64}
>      (expr_list:REG_DEAD (reg/v/f:DI 10 10 [orig:152 last_viable ] [152])
>         (nil)))
> (insn 215 212 484 25 (set (reg:DI 5 5 [226])
>         (const_int 0 [0])) "pr111601.ii":52:12 683 {*movdi_internal64}
>      (expr_list:REG_EQUIV (const_int 0 [0])
>         (nil)))
> (insn 484 215 218 25 (set (reg/v/f:DI 10 10 [orig:152 last_viable ] [152])
>         (reg/f:DI 9 9 [orig:155 _342 ] [155])) "pr111601.ii":52:12 683 {*movdi_internal64}
>      (nil))
> ...
> (insn 564 214 216 25 (set (reg/v/f:DI 10 10 [orig:152 last_viable ] [152])
>         (plus:DI (reg/v/f:DI 10 10 [orig:152 last_viable ] [152])
>             (const_int 96 [0x60]))) "pr111601.ii":52:12 66 {*adddi3}
>      (nil))
> (insn 216 564 219 25 (set (mem/f:DI (reg/v/f:DI 10 10 [orig:152 last_viable ] [152]) [2 _343->next+0 S8 A64])
>         (reg:DI 5 5 [226])) "pr111601.ii":52:12 683 {*movdi_internal64}
>      (expr_list:REG_DEAD (reg:DI 5 5 [226])
>         (nil)))
> ...
> and when asking for all uses of %r10 from def 564, it will see uses
> in 216 and 212; the former is after the += 96 addition and gets changed
> to load from %r10+96 with the addition being dropped, but there is
> the other store which is a use across the backedge and when reached
> from other edges certainly doesn't have the + 96 addition anywhere,
> so the pass doesn't actually change that location.
> 
> Haven't bootstrapped/regtested this yet, will start momentarily,
> posting here just in case I'm missing something important.

That version failed bootstrap because DF_INSN_LUID is signed int, not
unsigned.

Here is what so far passed on powerpc64le-linux
../configure --enable-languages=c,c++ --enable-checking=yes,rtl,extra --disable-libsanitizer --with-long-double-128
make -j160 profiledbootstrap
which has been failing for more than a month now.

I've noticed a couple of small formatting issues and fixed them too,
doing normal {powerpc64le,x86_64,i686}-linux bootstraps/regtests:

2023-11-27  Jakub Jelinek  <jakub@redhat.com>

	PR bootstrap/111601
	* fold-mem-offsets.cc (fold_offsets): Punt if use appears before
	def in the basic block.
	(get_single_def_in_bb, get_uses): Formatting fixes.
	(fold_offsets_1, pass_fold_mem_offsets::execute): Comment formatting
	fixes.

	* g++.dg/opt/pr111601.C: New test.

--- gcc/fold-mem-offsets.cc.jj	2023-11-02 07:49:17.060865772 +0100
+++ gcc/fold-mem-offsets.cc	2023-11-27 22:47:21.128591332 +0100
@@ -154,7 +154,7 @@ static int stats_fold_count;
    The definition is desired for REG used in INSN.
    Return the definition insn or NULL if there's no definition with
    the desired criteria.  */
-static rtx_insn*
+static rtx_insn *
 get_single_def_in_bb (rtx_insn *insn, rtx reg)
 {
   df_ref use;
@@ -205,7 +205,7 @@ get_single_def_in_bb (rtx_insn *insn, rt
 /* Get all uses of REG which is set in INSN.  Return the use list or NULL if a
    use is missing / irregular.  If SUCCESS is not NULL then set it to false if
    there are missing / irregular uses and true otherwise.  */
-static struct df_link*
+static struct df_link *
 get_uses (rtx_insn *insn, rtx reg, bool *success)
 {
   df_ref def;
@@ -255,8 +255,7 @@ fold_offsets (rtx_insn *insn, rtx reg, b
 
     If DO_RECURSION is true and ANALYZE is false then offset that would result
     from folding is computed and is returned through the pointer OFFSET_OUT.
-    The instructions that can be folded are recorded in FOLDABLE_INSNS.
-*/
+    The instructions that can be folded are recorded in FOLDABLE_INSNS.  */
 static bool
 fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
 		HOST_WIDE_INT *offset_out, bitmap foldable_insns)
@@ -511,6 +510,7 @@ fold_offsets (rtx_insn *insn, rtx reg, b
       if (!success)
 	return 0;
 
+      int luid = DF_INSN_LUID (def);
       for (ref_link = uses; ref_link; ref_link = ref_link->next)
 	{
 	  rtx_insn *use = DF_REF_INSN (ref_link->ref);
@@ -534,6 +534,11 @@ fold_offsets (rtx_insn *insn, rtx reg, b
 	  if (use_set && MEM_P (SET_DEST (use_set))
 	      && reg_mentioned_p (dest, SET_SRC (use_set)))
 	    return 0;
+
+	  /* Punt if use appears before def in the basic block.  See
+	     PR111601.  */
+	  if (DF_INSN_LUID (use) < luid)
+	    return 0;
 	}
 
       bitmap_set_bit (&can_fold_insns, INSN_UID (def));
@@ -846,8 +851,8 @@ pass_fold_mem_offsets::execute (function
   FOR_ALL_BB_FN (bb, fn)
     {
       /* There is a conflict between this pass and RISCV's shorten-memrefs
-	  pass.  For now disable folding if optimizing for size because
-	  otherwise this cancels the effects of shorten-memrefs.  */
+	 pass.  For now disable folding if optimizing for size because
+	 otherwise this cancels the effects of shorten-memrefs.  */
       if (optimize_bb_for_size_p (bb))
 	continue;
 
--- gcc/testsuite/g++.dg/opt/pr111601.C.jj	2023-11-27 21:33:12.605006881 +0100
+++ gcc/testsuite/g++.dg/opt/pr111601.C	2023-11-27 21:34:47.267678510 +0100
@@ -0,0 +1,86 @@
+// PR bootstrap/111601
+// { dg-do run { target c++11 } }
+// { dg-options "-O2 -fno-exceptions -fno-rtti -fprofile-generate" }
+// { dg-require-profiling "-fprofile-generate" }
+// { dg-final { cleanup-coverage-files } }
+
+struct tree_base
+{
+  int code:16;
+};
+struct saved_scope
+{
+  void *pad[14];
+  int x_processing_template_decl;
+};
+struct saved_scope *scope_chain;
+struct z_candidate
+{
+  tree_base *fn;
+  void *pad[11];
+  z_candidate *next;
+  int viable;
+  int flags;
+};
+
+__attribute__((noipa)) struct z_candidate *
+splice_viable (struct z_candidate *cands, bool strict_p, bool *any_viable_p)
+{
+  struct z_candidate *viable;
+  struct z_candidate **last_viable;
+  struct z_candidate **cand;
+  bool found_strictly_viable = false;
+  if (scope_chain->x_processing_template_decl)
+    strict_p = true;
+  viable = (z_candidate *) 0;
+  last_viable = &viable;
+  *any_viable_p = false;
+  cand = &cands;
+  while (*cand)
+    {
+      struct z_candidate *c = *cand;
+      if (!strict_p && (c->viable == 1 || ((int) (c->fn)->code) == 273))
+	{
+	  strict_p = true;
+	  if (viable && !found_strictly_viable)
+	    {
+	      *any_viable_p = false;
+	      *last_viable = cands;
+	      cands = viable;
+	      viable = (z_candidate *) 0;
+	      last_viable = &viable;
+	    }
+	}
+      if (strict_p ? c->viable == 1 : c->viable)
+	{
+	  *last_viable = c;
+	  *cand = c->next;
+	  c->next = (z_candidate *) 0;
+	  last_viable = &c->next;
+	  *any_viable_p = true;
+	  if (c->viable == 1)
+	    found_strictly_viable = true;
+	}
+      else
+	cand = &c->next;
+    }
+  return viable ? viable : cands;
+}
+
+int
+main ()
+{
+  saved_scope s{};
+  scope_chain = &s;
+  z_candidate z[4] = {};
+  z[0].next = &z[1];
+  z[1].viable = 1;
+  z[1].next = &z[2];
+  z[2].viable = 1;
+  z[2].next = &z[3];
+  bool b;
+  z_candidate *c = splice_viable (&z[0], true, &b);
+  if (c != &z[1] || z[1].next != &z[2] || z[2].next)
+    __builtin_abort ();
+  return 0;
+}


	Jakub


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fold-mem-offsets: Fix powerpc64le-linux profiledbootstrap [PR111601]
  2023-11-27 23:51     ` [PATCH] fold-mem-offsets: Fix powerpc64le-linux profiledbootstrap [PR111601] Jakub Jelinek
@ 2023-11-27 23:55       ` Andrew Pinski
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Pinski @ 2023-11-27 23:55 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Jeff Law, Manolis Tsamis, gcc-patches, Richard Biener,
	Vineet Gupta, Philipp Tomsich

On Mon, Nov 27, 2023 at 3:51 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Nov 27, 2023 at 09:52:14PM +0100, Jakub Jelinek wrote:
> > On Mon, Oct 16, 2023 at 01:11:01PM -0600, Jeff Law wrote:
> > > > gcc/ChangeLog:
> > > >
> > > >   * Makefile.in: Add fold-mem-offsets.o.
> > > >   * passes.def: Schedule a new pass.
> > > >   * tree-pass.h (make_pass_fold_mem_offsets): Declare.
> > > >   * common.opt: New options.
> > > >   * doc/invoke.texi: Document new option.
> > > >   * fold-mem-offsets.cc: New file.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >   * gcc.target/riscv/fold-mem-offsets-1.c: New test.
> > > >   * gcc.target/riscv/fold-mem-offsets-2.c: New test.
> > > >   * gcc.target/riscv/fold-mem-offsets-3.c: New test.
> > > Thanks, I've pushed this to the trunk.
> >
> > This breaks profiledbootstrap on powerpc64le-linux.
> > >From what I can see, the pass works one basic block at a time and
> > will punt on any non-DEBUG_INSN uses outside of the current block
> > (I believe because of the
> >           /* This use affects instructions outside of CAN_FOLD_INSNS.  */
> >           if (!bitmap_bit_p (&can_fold_insns, INSN_UID (use)))
> >             return 0;
> > test and can_fold_insns only set in do_analysis (when processing insns in
> > current bb, cleared at the end) or results of get_single_def_in_bb
> > (which are checked to be in the same bb).
> > But, while get_single_def_in_bb checks for
> >   if (DF_INSN_LUID (def) > DF_INSN_LUID (insn))
> >     return NULL;
> > (OT, why not DF_INSN_INFO_LUID (DF_REF_INSN_INFO (ref_chain->ref))
> > instead of DF_INSN_LUID (def), then it doesn't need to look up
> > DF_INSN_INFO_GET (def)?), nothing when walking all uses of def does such
> > luid check.
> > The basic block in the PR in question has:
> > ...
> > (insn 212 210 215 25 (set (mem/f:DI (reg/v/f:DI 10 10 [orig:152 last_viable ] [152]) [2 *last_viable_336+0 S8 A64])
> >         (reg/f:DI 9 9 [orig:155 _342 ] [155])) "pr111601.ii":50:17 683 {*movdi_internal64}
> >      (expr_list:REG_DEAD (reg/v/f:DI 10 10 [orig:152 last_viable ] [152])
> >         (nil)))
> > (insn 215 212 484 25 (set (reg:DI 5 5 [226])
> >         (const_int 0 [0])) "pr111601.ii":52:12 683 {*movdi_internal64}
> >      (expr_list:REG_EQUIV (const_int 0 [0])
> >         (nil)))
> > (insn 484 215 218 25 (set (reg/v/f:DI 10 10 [orig:152 last_viable ] [152])
> >         (reg/f:DI 9 9 [orig:155 _342 ] [155])) "pr111601.ii":52:12 683 {*movdi_internal64}
> >      (nil))
> > ...
> > (insn 564 214 216 25 (set (reg/v/f:DI 10 10 [orig:152 last_viable ] [152])
> >         (plus:DI (reg/v/f:DI 10 10 [orig:152 last_viable ] [152])
> >             (const_int 96 [0x60]))) "pr111601.ii":52:12 66 {*adddi3}
> >      (nil))
> > (insn 216 564 219 25 (set (mem/f:DI (reg/v/f:DI 10 10 [orig:152 last_viable ] [152]) [2 _343->next+0 S8 A64])
> >         (reg:DI 5 5 [226])) "pr111601.ii":52:12 683 {*movdi_internal64}
> >      (expr_list:REG_DEAD (reg:DI 5 5 [226])
> >         (nil)))
> > ...
> > and when asking for all uses of %r10 from def 564, it will see uses
> > in 216 and 212; the former is after the += 96 addition and gets changed
> > to load from %r10+96 with the addition being dropped, but there is
> > the other store which is a use across the backedge and when reached
> > from other edges certainly doesn't have the + 96 addition anywhere,
> > so the pass doesn't actually change that location.
> >
> > Haven't bootstrapped/regtested this yet, will start momentarily,
> > posting here just in case I'm missing something important.
>
> That version failed bootstrap because DF_INSN_LUID is signed int, not
> unsigned.
>
> Here is what so far passed on powerpc64le-linux
> ../configure --enable-languages=c,c++ --enable-checking=yes,rtl,extra --disable-libsanitizer --with-long-double-128
> make -j160 profiledbootstrap
> which has been failing for more than a month now.
>
> I've noticed a couple of small formatting issues and fixed them too,
> doing normal {powerpc64le,x86_64,i686}-linux bootstraps/regtests:
>
> 2023-11-27  Jakub Jelinek  <jakub@redhat.com>
>
>         PR bootstrap/111601
>         * fold-mem-offsets.cc (fold_offsets): Punt if use appears before
>         def in the basic block.
>         (get_single_def_in_bb, get_uses): Formatting fixes.
>         (fold_offsets_1, pass_fold_mem_offsets::execute): Comment formatting
>         fixes.
>
>         * g++.dg/opt/pr111601.C: New test.
>
> --- gcc/fold-mem-offsets.cc.jj  2023-11-02 07:49:17.060865772 +0100
> +++ gcc/fold-mem-offsets.cc     2023-11-27 22:47:21.128591332 +0100
> @@ -154,7 +154,7 @@ static int stats_fold_count;
>     The definition is desired for REG used in INSN.
>     Return the definition insn or NULL if there's no definition with
>     the desired criteria.  */
> -static rtx_insn*
> +static rtx_insn *
>  get_single_def_in_bb (rtx_insn *insn, rtx reg)
>  {
>    df_ref use;
> @@ -205,7 +205,7 @@ get_single_def_in_bb (rtx_insn *insn, rt
>  /* Get all uses of REG which is set in INSN.  Return the use list or NULL if a
>     use is missing / irregular.  If SUCCESS is not NULL then set it to false if
>     there are missing / irregular uses and true otherwise.  */
> -static struct df_link*
> +static struct df_link *

Since you are making style changes here, it might make sense to remove
the struct keyword too since this is C++ code after all.

Thanks,
Andrew Pinski


>  get_uses (rtx_insn *insn, rtx reg, bool *success)
>  {
>    df_ref def;
> @@ -255,8 +255,7 @@ fold_offsets (rtx_insn *insn, rtx reg, b
>
>      If DO_RECURSION is true and ANALYZE is false then offset that would result
>      from folding is computed and is returned through the pointer OFFSET_OUT.
> -    The instructions that can be folded are recorded in FOLDABLE_INSNS.
> -*/
> +    The instructions that can be folded are recorded in FOLDABLE_INSNS.  */
>  static bool
>  fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
>                 HOST_WIDE_INT *offset_out, bitmap foldable_insns)
> @@ -511,6 +510,7 @@ fold_offsets (rtx_insn *insn, rtx reg, b
>        if (!success)
>         return 0;
>
> +      int luid = DF_INSN_LUID (def);
>        for (ref_link = uses; ref_link; ref_link = ref_link->next)
>         {
>           rtx_insn *use = DF_REF_INSN (ref_link->ref);
> @@ -534,6 +534,11 @@ fold_offsets (rtx_insn *insn, rtx reg, b
>           if (use_set && MEM_P (SET_DEST (use_set))
>               && reg_mentioned_p (dest, SET_SRC (use_set)))
>             return 0;
> +
> +         /* Punt if use appears before def in the basic block.  See
> +            PR111601.  */
> +         if (DF_INSN_LUID (use) < luid)
> +           return 0;
>         }
>
>        bitmap_set_bit (&can_fold_insns, INSN_UID (def));
> @@ -846,8 +851,8 @@ pass_fold_mem_offsets::execute (function
>    FOR_ALL_BB_FN (bb, fn)
>      {
>        /* There is a conflict between this pass and RISCV's shorten-memrefs
> -         pass.  For now disable folding if optimizing for size because
> -         otherwise this cancels the effects of shorten-memrefs.  */
> +        pass.  For now disable folding if optimizing for size because
> +        otherwise this cancels the effects of shorten-memrefs.  */
>        if (optimize_bb_for_size_p (bb))
>         continue;
>
> --- gcc/testsuite/g++.dg/opt/pr111601.C.jj      2023-11-27 21:33:12.605006881 +0100
> +++ gcc/testsuite/g++.dg/opt/pr111601.C 2023-11-27 21:34:47.267678510 +0100
> @@ -0,0 +1,86 @@
> +// PR bootstrap/111601
> +// { dg-do run { target c++11 } }
> +// { dg-options "-O2 -fno-exceptions -fno-rtti -fprofile-generate" }
> +// { dg-require-profiling "-fprofile-generate" }
> +// { dg-final { cleanup-coverage-files } }
> +
> +struct tree_base
> +{
> +  int code:16;
> +};
> +struct saved_scope
> +{
> +  void *pad[14];
> +  int x_processing_template_decl;
> +};
> +struct saved_scope *scope_chain;
> +struct z_candidate
> +{
> +  tree_base *fn;
> +  void *pad[11];
> +  z_candidate *next;
> +  int viable;
> +  int flags;
> +};
> +
> +__attribute__((noipa)) struct z_candidate *
> +splice_viable (struct z_candidate *cands, bool strict_p, bool *any_viable_p)
> +{
> +  struct z_candidate *viable;
> +  struct z_candidate **last_viable;
> +  struct z_candidate **cand;
> +  bool found_strictly_viable = false;
> +  if (scope_chain->x_processing_template_decl)
> +    strict_p = true;
> +  viable = (z_candidate *) 0;
> +  last_viable = &viable;
> +  *any_viable_p = false;
> +  cand = &cands;
> +  while (*cand)
> +    {
> +      struct z_candidate *c = *cand;
> +      if (!strict_p && (c->viable == 1 || ((int) (c->fn)->code) == 273))
> +       {
> +         strict_p = true;
> +         if (viable && !found_strictly_viable)
> +           {
> +             *any_viable_p = false;
> +             *last_viable = cands;
> +             cands = viable;
> +             viable = (z_candidate *) 0;
> +             last_viable = &viable;
> +           }
> +       }
> +      if (strict_p ? c->viable == 1 : c->viable)
> +       {
> +         *last_viable = c;
> +         *cand = c->next;
> +         c->next = (z_candidate *) 0;
> +         last_viable = &c->next;
> +         *any_viable_p = true;
> +         if (c->viable == 1)
> +           found_strictly_viable = true;
> +       }
> +      else
> +       cand = &c->next;
> +    }
> +  return viable ? viable : cands;
> +}
> +
> +int
> +main ()
> +{
> +  saved_scope s{};
> +  scope_chain = &s;
> +  z_candidate z[4] = {};
> +  z[0].next = &z[1];
> +  z[1].viable = 1;
> +  z[1].next = &z[2];
> +  z[2].viable = 1;
> +  z[2].next = &z[3];
> +  bool b;
> +  z_candidate *c = splice_viable (&z[0], true, &b);
> +  if (c != &z[1] || z[1].next != &z[2] || z[2].next)
> +    __builtin_abort ();
> +  return 0;
> +}
>
>
>         Jakub
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v7] Implement new RTL optimizations pass: fold-mem-offsets.
  2023-11-27 20:52   ` Jakub Jelinek
  2023-11-27 23:51     ` [PATCH] fold-mem-offsets: Fix powerpc64le-linux profiledbootstrap [PR111601] Jakub Jelinek
@ 2023-11-28  9:45     ` Manolis Tsamis
  2023-11-28 10:53       ` Jakub Jelinek
  2023-11-28 17:50     ` [PATCH v7] Implement new RTL optimizations pass: fold-mem-offsets Jeff Law
  2 siblings, 1 reply; 12+ messages in thread
From: Manolis Tsamis @ 2023-11-28  9:45 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Jeff Law, gcc-patches, Richard Biener, Vineet Gupta, Philipp Tomsich

Hi Jakub,

Thanks a lot for tracking this down and providing both testcases and a
fix. Some thoughts below.

On Mon, Nov 27, 2023 at 10:52 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Oct 16, 2023 at 01:11:01PM -0600, Jeff Law wrote:
> > > gcc/ChangeLog:
> > >
> > >     * Makefile.in: Add fold-mem-offsets.o.
> > >     * passes.def: Schedule a new pass.
> > >     * tree-pass.h (make_pass_fold_mem_offsets): Declare.
> > >     * common.opt: New options.
> > >     * doc/invoke.texi: Document new option.
> > >     * fold-mem-offsets.cc: New file.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >     * gcc.target/riscv/fold-mem-offsets-1.c: New test.
> > >     * gcc.target/riscv/fold-mem-offsets-2.c: New test.
> > >     * gcc.target/riscv/fold-mem-offsets-3.c: New test.
> > Thanks, I've pushed this to the trunk.
>
> This breaks profiledbootstrap on powerpc64le-linux.
> From what I can see, the pass works one basic block at a time and
> will punt on any non-DEBUG_INSN uses outside of the current block
> (I believe because of the
>           /* This use affects instructions outside of CAN_FOLD_INSNS.  */
>           if (!bitmap_bit_p (&can_fold_insns, INSN_UID (use)))
>             return 0;
> test and can_fold_insns only set in do_analysis (when processing insns in
> current bb, cleared at the end) or results of get_single_def_in_bb
> (which are checked to be in the same bb).
> But, while get_single_def_in_bb checks for
>   if (DF_INSN_LUID (def) > DF_INSN_LUID (insn))
>     return NULL;
> (OT, why not DF_INSN_INFO_LUID (DF_REF_INSN_INFO (ref_chain->ref))
> instead of DF_INSN_LUID (def), then it doesn't need to look up
> DF_INSN_INFO_GET (def)?), nothing when walking all uses of def does such
> luid check.

I didn't actually know about the difference between DF_INSN_LUID (def)
and DF_INSN_INFO_LUID (DF_REF_INSN_INFO (ref_chain->ref)), so I just
used the more concise version.
Thanks for pointing this out, I could make this a separate change.

> The basic block in the PR in question has:
> ...
> (insn 212 210 215 25 (set (mem/f:DI (reg/v/f:DI 10 10 [orig:152 last_viable ] [152]) [2 *last_viable_336+0 S8 A64])
>         (reg/f:DI 9 9 [orig:155 _342 ] [155])) "pr111601.ii":50:17 683 {*movdi_internal64}
>      (expr_list:REG_DEAD (reg/v/f:DI 10 10 [orig:152 last_viable ] [152])
>         (nil)))
> (insn 215 212 484 25 (set (reg:DI 5 5 [226])
>         (const_int 0 [0])) "pr111601.ii":52:12 683 {*movdi_internal64}
>      (expr_list:REG_EQUIV (const_int 0 [0])
>         (nil)))
> (insn 484 215 218 25 (set (reg/v/f:DI 10 10 [orig:152 last_viable ] [152])
>         (reg/f:DI 9 9 [orig:155 _342 ] [155])) "pr111601.ii":52:12 683 {*movdi_internal64}
>      (nil))
> ...
> (insn 564 214 216 25 (set (reg/v/f:DI 10 10 [orig:152 last_viable ] [152])
>         (plus:DI (reg/v/f:DI 10 10 [orig:152 last_viable ] [152])
>             (const_int 96 [0x60]))) "pr111601.ii":52:12 66 {*adddi3}
>      (nil))
> (insn 216 564 219 25 (set (mem/f:DI (reg/v/f:DI 10 10 [orig:152 last_viable ] [152]) [2 _343->next+0 S8 A64])
>         (reg:DI 5 5 [226])) "pr111601.ii":52:12 683 {*movdi_internal64}
>      (expr_list:REG_DEAD (reg:DI 5 5 [226])
>         (nil)))
> ...
> and when asking for all uses of %r10 from def 564, it will see uses
> in 216 and 212; the former is after the += 96 addition and gets changed
> to load from %r10+96 with the addition being dropped, but there is
> the other store which is a use across the backedge and when reached
> from other edges certainly doesn't have the + 96 addition anywhere,
> so the pass doesn't actually change that location.
>
Indeed. Your observations got me thinking about this issue and I see
two main weaknesses with my current implementation:

 1) Callers of get_single_def_in_bb cannot differentiate between NULL
-> def is in another BB so we don't want to process vs NULL ->
something bad happened, don't touch this def.
This was originally an issue with get_uses, where callers couldn't
differentiate between NULL -> zero uses found and NULL -> error, which
is why the bool *success argument was added.
I didn't consider at the time that get_def had a similar issue with
NULL being overloaded with many different meanings.
 2) Since get_single_def_in_bb is used to traverse backwards and then
get_uses is used to validate forwards, it looks like the two functions
must be 'in sync'. If a def is rejected for a reason then the
corresponding use must also be rejected. This is essentially this bug
you found, where the LUID condition exists only in one of the two
functions and is not reflected on the other. This got me thinking
whether the `(global_regs[REGNO (reg)] && !set_of (reg, DF_REF_INSN
(ref_link->ref)))` that only exists in get_def could be an issue under
any circumstances.

In any case, your fix for the uses looks to be a needed addition to
the checks done. I will think about whether the robustness of the
def/use functions can be improved in other ways too.

> Haven't bootstrapped/regtested this yet, will start momentarily,
> posting here just in case I'm missing something important.
>
> 2023-11-27  Jakub Jelinek  <jakub@redhat.com>
>
>         PR bootstrap/111601
>         * fold-mem-offsets.cc (fold_offsets): Punt if use appears before
>         def in the basic block.
>
>         * g++.dg/opt/pr111601.C: New test.
>
> --- gcc/fold-mem-offsets.cc.jj  2023-11-02 07:49:17.060865772 +0100
> +++ gcc/fold-mem-offsets.cc     2023-11-27 21:35:35.089007365 +0100
> @@ -511,6 +511,7 @@ fold_offsets (rtx_insn *insn, rtx reg, b
>        if (!success)
>         return 0;
>
> +      unsigned luid = DF_INSN_LUID (def);
>        for (ref_link = uses; ref_link; ref_link = ref_link->next)
>         {
>           rtx_insn *use = DF_REF_INSN (ref_link->ref);
> @@ -534,6 +535,11 @@ fold_offsets (rtx_insn *insn, rtx reg, b
>           if (use_set && MEM_P (SET_DEST (use_set))
>               && reg_mentioned_p (dest, SET_SRC (use_set)))
>             return 0;
> +
> +         /* Punt if use appears before def in the basic block.  See
> +            PR111601.  */
> +         if (DF_INSN_INFO_LUID (DF_REF_INSN_INFO (ref_link->ref)) < luid)
> +           return 0;
I think it would be fitting to put this condition within the get_uses
function? This way it would reflect what exists in
get_single_def_in_bb.

>         }
>
>        bitmap_set_bit (&can_fold_insns, INSN_UID (def));
> --- gcc/testsuite/g++.dg/opt/pr111601.C.jj      2023-11-27 21:33:12.605006881 +0100
> +++ gcc/testsuite/g++.dg/opt/pr111601.C 2023-11-27 21:34:47.267678510 +0100
> @@ -0,0 +1,86 @@
> +// PR bootstrap/111601
> +// { dg-do run { target c++11 } }
> +// { dg-options "-O2 -fno-exceptions -fno-rtti -fprofile-generate" }
> +// { dg-require-profiling "-fprofile-generate" }
> +// { dg-final { cleanup-coverage-files } }
> +
> +struct tree_base
> +{
> +  int code:16;
> +};
> +struct saved_scope
> +{
> +  void *pad[14];
> +  int x_processing_template_decl;
> +};
> +struct saved_scope *scope_chain;
> +struct z_candidate
> +{
> +  tree_base *fn;
> +  void *pad[11];
> +  z_candidate *next;
> +  int viable;
> +  int flags;
> +};
> +
> +__attribute__((noipa)) struct z_candidate *
> +splice_viable (struct z_candidate *cands, bool strict_p, bool *any_viable_p)
> +{
> +  struct z_candidate *viable;
> +  struct z_candidate **last_viable;
> +  struct z_candidate **cand;
> +  bool found_strictly_viable = false;
> +  if (scope_chain->x_processing_template_decl)
> +    strict_p = true;
> +  viable = (z_candidate *) 0;
> +  last_viable = &viable;
> +  *any_viable_p = false;
> +  cand = &cands;
> +  while (*cand)
> +    {
> +      struct z_candidate *c = *cand;
> +      if (!strict_p && (c->viable == 1 || ((int) (c->fn)->code) == 273))
> +       {
> +         strict_p = true;
> +         if (viable && !found_strictly_viable)
> +           {
> +             *any_viable_p = false;
> +             *last_viable = cands;
> +             cands = viable;
> +             viable = (z_candidate *) 0;
> +             last_viable = &viable;
> +           }
> +       }
> +      if (strict_p ? c->viable == 1 : c->viable)
> +       {
> +         *last_viable = c;
> +         *cand = c->next;
> +         c->next = (z_candidate *) 0;
> +         last_viable = &c->next;
> +         *any_viable_p = true;
> +         if (c->viable == 1)
> +           found_strictly_viable = true;
> +       }
> +      else
> +       cand = &c->next;
> +    }
> +  return viable ? viable : cands;
> +}
> +
> +int
> +main ()
> +{
> +  saved_scope s{};
> +  scope_chain = &s;
> +  z_candidate z[4] = {};
> +  z[0].next = &z[1];
> +  z[1].viable = 1;
> +  z[1].next = &z[2];
> +  z[2].viable = 1;
> +  z[2].next = &z[3];
> +  bool b;
> +  z_candidate *c = splice_viable (&z[0], true, &b);
> +  if (c != &z[1] || z[1].next != &z[2] || z[2].next)
> +    __builtin_abort ();
> +  return 0;
> +}
>
>
>         Jakub
>

Manolis

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v7] Implement new RTL optimizations pass: fold-mem-offsets.
  2023-11-28  9:45     ` [PATCH v7] Implement new RTL optimizations pass: fold-mem-offsets Manolis Tsamis
@ 2023-11-28 10:53       ` Jakub Jelinek
  2023-11-28 12:19         ` Manolis Tsamis
  0 siblings, 1 reply; 12+ messages in thread
From: Jakub Jelinek @ 2023-11-28 10:53 UTC (permalink / raw)
  To: Manolis Tsamis
  Cc: Jeff Law, gcc-patches, Richard Biener, Vineet Gupta, Philipp Tomsich

On Tue, Nov 28, 2023 at 11:45:58AM +0200, Manolis Tsamis wrote:
> > But, while get_single_def_in_bb checks for
> >   if (DF_INSN_LUID (def) > DF_INSN_LUID (insn))
> >     return NULL;
> > (OT, why not DF_INSN_INFO_LUID (DF_REF_INSN_INFO (ref_chain->ref))
> > instead of DF_INSN_LUID (def), then it doesn't need to look up
> > DF_INSN_INFO_GET (def)?), nothing when walking all uses of def does such
> > luid check.
> 
> I didn't actually know about the difference between DF_INSN_LUID (def)
> and DF_INSN_INFO_LUID (DF_REF_INSN_INFO (ref_chain->ref)), so I just
> used the more concise version.
> Thanks for pointing this out, I could make this a separate change.

Note, in the end in the actually tested patch (see the follow-up)
I've just used DF_INSN_LUID, after looking up that DF_INSN_INFO_GET is
fairly cheap -
#define DF_INSN_INFO_GET(INSN) (df->insns[(INSN_UID (INSN))])
I was afraid it could be a hash table lookup.
Because, while using that DF_INSN_INFO_LUID (DF_REF_INSN_INFO (ref_chain->ref))
etc. is tiny bit faster (doesn't have to read the u2.insn_uid from ref
and look it up in the table), it is less readable, and because it is
just tiny bit, I guess readability/maintainability should win.

> Indeed. Your observations got me thinking about this issue and I see
> two main weaknesses with my current implementation:

> > --- gcc/fold-mem-offsets.cc.jj  2023-11-02 07:49:17.060865772 +0100
> > +++ gcc/fold-mem-offsets.cc     2023-11-27 21:35:35.089007365 +0100
> > @@ -511,6 +511,7 @@ fold_offsets (rtx_insn *insn, rtx reg, b
> >        if (!success)
> >         return 0;
> >
> > +      unsigned luid = DF_INSN_LUID (def);
> >        for (ref_link = uses; ref_link; ref_link = ref_link->next)
> >         {
> >           rtx_insn *use = DF_REF_INSN (ref_link->ref);
> > @@ -534,6 +535,11 @@ fold_offsets (rtx_insn *insn, rtx reg, b
> >           if (use_set && MEM_P (SET_DEST (use_set))
> >               && reg_mentioned_p (dest, SET_SRC (use_set)))
> >             return 0;
> > +
> > +         /* Punt if use appears before def in the basic block.  See
> > +            PR111601.  */
> > +         if (DF_INSN_INFO_LUID (DF_REF_INSN_INFO (ref_link->ref)) < luid)
> > +           return 0;
> I think it would be fitting to put this condition within the get_uses
> function? This way it would reflect what exists in
> get_single_def_in_bb.

get_uses is where I've initially added it to, but to do it there
one needs to also copy the DEBUG_INSN_P check in there (because we
shouldn't be doing such tests on debug insns).

If we put it into get_uses, maybe we should in sync with
get_single_def_in_bb also check it is a use in the same bb.

So, like this (so far untested)?

Note, the earlier posted patch passed bootstrap/regtest on
{powerpc64le,x86_64,i686}-linux.

2023-11-28  Jakub Jelinek  <jakub@redhat.com>

	PR bootstrap/111601
	* fold-mem-offsets.cc (get_uses): Ignore DEBUG_INSN uses.  Otherwise,
	punt if use is in a different basic block from INSN or appears before
	INSN in the same basic block.  Formatting fixes.
	(get_single_def_in_bb): Formatting fixes.
	(fold_offsets_1, pass_fold_mem_offsets::execute): Comment formatting
	fixes.

	* g++.dg/opt/pr111601.C: New test.

--- gcc/fold-mem-offsets.cc.jj	2023-11-02 07:49:17.060865772 +0100
+++ gcc/fold-mem-offsets.cc	2023-11-28 11:47:34.941679105 +0100
@@ -154,7 +154,7 @@ static int stats_fold_count;
    The definition is desired for REG used in INSN.
    Return the definition insn or NULL if there's no definition with
    the desired criteria.  */
-static rtx_insn*
+static rtx_insn *
 get_single_def_in_bb (rtx_insn *insn, rtx reg)
 {
   df_ref use;
@@ -205,11 +205,10 @@ get_single_def_in_bb (rtx_insn *insn, rt
 /* Get all uses of REG which is set in INSN.  Return the use list or NULL if a
    use is missing / irregular.  If SUCCESS is not NULL then set it to false if
    there are missing / irregular uses and true otherwise.  */
-static struct df_link*
+static df_link *
 get_uses (rtx_insn *insn, rtx reg, bool *success)
 {
   df_ref def;
-  struct df_link *ref_chain, *ref_link;
 
   if (success)
     *success = false;
@@ -221,18 +220,30 @@ get_uses (rtx_insn *insn, rtx reg, bool
   if (!def)
     return NULL;
 
-  ref_chain = DF_REF_CHAIN (def);
+  df_link *ref_chain = DF_REF_CHAIN (def);
+  int insn_luid = DF_INSN_LUID (insn);
+  basic_block insn_bb = BLOCK_FOR_INSN (insn);
 
-  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
+  for (df_link *ref_link = ref_chain; ref_link; ref_link = ref_link->next)
     {
       /* Problem getting a use for this instruction.  */
       if (ref_link->ref == NULL)
 	return NULL;
+
+      rtx_insn *use = DF_REF_INSN (ref_link->ref);
+      if (DEBUG_INSN_P (use))
+	continue;
+
       if (DF_REF_CLASS (ref_link->ref) != DF_REF_REGULAR)
 	return NULL;
       /* We do not handle REG_EQUIV/REG_EQ notes for now.  */
       if (DF_REF_FLAGS (ref_link->ref) & DF_REF_IN_NOTE)
 	return NULL;
+      if (BLOCK_FOR_INSN (use) != insn_bb)
+	return NULL;
+      /* Punt if use appears before def in the basic block.  See PR111601.  */
+      if (DF_INSN_LUID (use) < insn_luid)
+	return NULL;
     }
 
   if (success)
@@ -255,8 +266,7 @@ fold_offsets (rtx_insn *insn, rtx reg, b
 
     If DO_RECURSION is true and ANALYZE is false then offset that would result
     from folding is computed and is returned through the pointer OFFSET_OUT.
-    The instructions that can be folded are recorded in FOLDABLE_INSNS.
-*/
+    The instructions that can be folded are recorded in FOLDABLE_INSNS.  */
 static bool
 fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
 		HOST_WIDE_INT *offset_out, bitmap foldable_insns)
@@ -846,8 +856,8 @@ pass_fold_mem_offsets::execute (function
   FOR_ALL_BB_FN (bb, fn)
     {
       /* There is a conflict between this pass and RISCV's shorten-memrefs
-	  pass.  For now disable folding if optimizing for size because
-	  otherwise this cancels the effects of shorten-memrefs.  */
+	 pass.  For now disable folding if optimizing for size because
+	 otherwise this cancels the effects of shorten-memrefs.  */
       if (optimize_bb_for_size_p (bb))
 	continue;
 
--- gcc/testsuite/g++.dg/opt/pr111601.C.jj	2023-11-27 21:33:12.605006881 +0100
+++ gcc/testsuite/g++.dg/opt/pr111601.C	2023-11-27 21:34:47.267678510 +0100
@@ -0,0 +1,86 @@
+// PR bootstrap/111601
+// { dg-do run { target c++11 } }
+// { dg-options "-O2 -fno-exceptions -fno-rtti -fprofile-generate" }
+// { dg-require-profiling "-fprofile-generate" }
+// { dg-final { cleanup-coverage-files } }
+
+struct tree_base
+{
+  int code:16;
+};
+struct saved_scope
+{
+  void *pad[14];
+  int x_processing_template_decl;
+};
+struct saved_scope *scope_chain;
+struct z_candidate
+{
+  tree_base *fn;
+  void *pad[11];
+  z_candidate *next;
+  int viable;
+  int flags;
+};
+
+__attribute__((noipa)) struct z_candidate *
+splice_viable (struct z_candidate *cands, bool strict_p, bool *any_viable_p)
+{
+  struct z_candidate *viable;
+  struct z_candidate **last_viable;
+  struct z_candidate **cand;
+  bool found_strictly_viable = false;
+  if (scope_chain->x_processing_template_decl)
+    strict_p = true;
+  viable = (z_candidate *) 0;
+  last_viable = &viable;
+  *any_viable_p = false;
+  cand = &cands;
+  while (*cand)
+    {
+      struct z_candidate *c = *cand;
+      if (!strict_p && (c->viable == 1 || ((int) (c->fn)->code) == 273))
+	{
+	  strict_p = true;
+	  if (viable && !found_strictly_viable)
+	    {
+	      *any_viable_p = false;
+	      *last_viable = cands;
+	      cands = viable;
+	      viable = (z_candidate *) 0;
+	      last_viable = &viable;
+	    }
+	}
+      if (strict_p ? c->viable == 1 : c->viable)
+	{
+	  *last_viable = c;
+	  *cand = c->next;
+	  c->next = (z_candidate *) 0;
+	  last_viable = &c->next;
+	  *any_viable_p = true;
+	  if (c->viable == 1)
+	    found_strictly_viable = true;
+	}
+      else
+	cand = &c->next;
+    }
+  return viable ? viable : cands;
+}
+
+int
+main ()
+{
+  saved_scope s{};
+  scope_chain = &s;
+  z_candidate z[4] = {};
+  z[0].next = &z[1];
+  z[1].viable = 1;
+  z[1].next = &z[2];
+  z[2].viable = 1;
+  z[2].next = &z[3];
+  bool b;
+  z_candidate *c = splice_viable (&z[0], true, &b);
+  if (c != &z[1] || z[1].next != &z[2] || z[2].next)
+    __builtin_abort ();
+  return 0;
+}


	Jakub


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v7] Implement new RTL optimizations pass: fold-mem-offsets.
  2023-11-28 10:53       ` Jakub Jelinek
@ 2023-11-28 12:19         ` Manolis Tsamis
  2023-11-28 17:30           ` [PATCH] fold-mem-offsets, v2: Fix powerpc64le-linux profiledbootstrap [PR111601] Jakub Jelinek
  0 siblings, 1 reply; 12+ messages in thread
From: Manolis Tsamis @ 2023-11-28 12:19 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Jeff Law, gcc-patches, Richard Biener, Vineet Gupta, Philipp Tomsich

On Tue, Nov 28, 2023 at 12:53 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Tue, Nov 28, 2023 at 11:45:58AM +0200, Manolis Tsamis wrote:
> > > But, while get_single_def_in_bb checks for
> > >   if (DF_INSN_LUID (def) > DF_INSN_LUID (insn))
> > >     return NULL;
> > > (OT, why not DF_INSN_INFO_LUID (DF_REF_INSN_INFO (ref_chain->ref))
> > > instead of DF_INSN_LUID (def), then it doesn't need to look up
> > > DF_INSN_INFO_GET (def)?), nothing when walking all uses of def does such
> > > luid check.
> >
> > I didn't actually know about the difference between DF_INSN_LUID (def)
> > and DF_INSN_INFO_LUID (DF_REF_INSN_INFO (ref_chain->ref)), so I just
> > used the more concise version.
> > Thanks for pointing this out, I could make this a separate change.
>
> Note, in the end in the actually tested patch (see the follow-up)
> I've just used DF_INSN_LUID, after looking up that DF_INSN_INFO_GET is
> fairly cheap -
> #define DF_INSN_INFO_GET(INSN) (df->insns[(INSN_UID (INSN))])
> I was afraid it could be a hash table lookup.
> Because, while using that DF_INSN_INFO_LUID (DF_REF_INSN_INFO (ref_chain->ref))
> etc. is tiny bit faster (doesn't have to read the u2.insn_uid from ref
> and look it up in the table), it is less readable, and because it is
> just tiny bit, I guess readability/maintainability should win.
>
Ok, I agree it's more readable and the overhead is small.

> > Indeed. Your observations got me thinking about this issue and I see
> > two main weaknesses with my current implementation:
>
> > > --- gcc/fold-mem-offsets.cc.jj  2023-11-02 07:49:17.060865772 +0100
> > > +++ gcc/fold-mem-offsets.cc     2023-11-27 21:35:35.089007365 +0100
> > > @@ -511,6 +511,7 @@ fold_offsets (rtx_insn *insn, rtx reg, b
> > >        if (!success)
> > >         return 0;
> > >
> > > +      unsigned luid = DF_INSN_LUID (def);
> > >        for (ref_link = uses; ref_link; ref_link = ref_link->next)
> > >         {
> > >           rtx_insn *use = DF_REF_INSN (ref_link->ref);
> > > @@ -534,6 +535,11 @@ fold_offsets (rtx_insn *insn, rtx reg, b
> > >           if (use_set && MEM_P (SET_DEST (use_set))
> > >               && reg_mentioned_p (dest, SET_SRC (use_set)))
> > >             return 0;
> > > +
> > > +         /* Punt if use appears before def in the basic block.  See
> > > +            PR111601.  */
> > > +         if (DF_INSN_INFO_LUID (DF_REF_INSN_INFO (ref_link->ref)) < luid)
> > > +           return 0;
> > I think it would be fitting to put this condition within the get_uses
> > function? This way it would reflect what exists in
> > get_single_def_in_bb.
>
> get_uses is where I've initially added it to, but to do it there
> one needs to also copy the DEBUG_INSN_P check in there (because we
> shouldn't be doing such tests on debug insns).
>
> If we put it into get_uses, maybe we should in sync with
> get_single_def_in_bb also check it is a use in the same bb.
>
I have omitted the "use is in the same bb" check from get_uses because
we only mark instructions as foldable within a BB, so if there were
any uses outside they wouldn't be foldable anyway.
But this check is harmless and probably makes the intention more
clear. Also it could be a small performance improvement as it punts
earlier than otherwise.
So it looks good to add the check imo.

> So, like this (so far untested)?
>
> Note, the earlier posted patch passed bootstrap/regtest on
> {powerpc64le,x86_64,i686}-linux.
>
> 2023-11-28  Jakub Jelinek  <jakub@redhat.com>
>
>         PR bootstrap/111601
>         * fold-mem-offsets.cc (get_uses): Ignore DEBUG_INSN uses.  Otherwise,
>         punt if use is in a different basic block from INSN or appears before
>         INSN in the same basic block.  Formatting fixes.
>         (get_single_def_in_bb): Formatting fixes.
>         (fold_offsets_1, pass_fold_mem_offsets::execute): Comment formatting
>         fixes.
>
>         * g++.dg/opt/pr111601.C: New test.
>
> --- gcc/fold-mem-offsets.cc.jj  2023-11-02 07:49:17.060865772 +0100
> +++ gcc/fold-mem-offsets.cc     2023-11-28 11:47:34.941679105 +0100
> @@ -154,7 +154,7 @@ static int stats_fold_count;
>     The definition is desired for REG used in INSN.
>     Return the definition insn or NULL if there's no definition with
>     the desired criteria.  */
> -static rtx_insn*
> +static rtx_insn *
>  get_single_def_in_bb (rtx_insn *insn, rtx reg)
>  {
>    df_ref use;
> @@ -205,11 +205,10 @@ get_single_def_in_bb (rtx_insn *insn, rt
>  /* Get all uses of REG which is set in INSN.  Return the use list or NULL if a
>     use is missing / irregular.  If SUCCESS is not NULL then set it to false if
>     there are missing / irregular uses and true otherwise.  */
> -static struct df_link*
> +static df_link *
>  get_uses (rtx_insn *insn, rtx reg, bool *success)
>  {
>    df_ref def;
> -  struct df_link *ref_chain, *ref_link;
>
>    if (success)
>      *success = false;
> @@ -221,18 +220,30 @@ get_uses (rtx_insn *insn, rtx reg, bool
>    if (!def)
>      return NULL;
>
> -  ref_chain = DF_REF_CHAIN (def);
> +  df_link *ref_chain = DF_REF_CHAIN (def);
> +  int insn_luid = DF_INSN_LUID (insn);
> +  basic_block insn_bb = BLOCK_FOR_INSN (insn);
>
> -  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
> +  for (df_link *ref_link = ref_chain; ref_link; ref_link = ref_link->next)
>      {
>        /* Problem getting a use for this instruction.  */
>        if (ref_link->ref == NULL)
>         return NULL;
> +
> +      rtx_insn *use = DF_REF_INSN (ref_link->ref);
> +      if (DEBUG_INSN_P (use))
> +       continue;
> +
>        if (DF_REF_CLASS (ref_link->ref) != DF_REF_REGULAR)
>         return NULL;
>        /* We do not handle REG_EQUIV/REG_EQ notes for now.  */
>        if (DF_REF_FLAGS (ref_link->ref) & DF_REF_IN_NOTE)
>         return NULL;
> +      if (BLOCK_FOR_INSN (use) != insn_bb)
> +       return NULL;
> +      /* Punt if use appears before def in the basic block.  See PR111601.  */
> +      if (DF_INSN_LUID (use) < insn_luid)
> +       return NULL;
>      }
>
>    if (success)
> @@ -255,8 +266,7 @@ fold_offsets (rtx_insn *insn, rtx reg, b
>
>      If DO_RECURSION is true and ANALYZE is false then offset that would result
>      from folding is computed and is returned through the pointer OFFSET_OUT.
> -    The instructions that can be folded are recorded in FOLDABLE_INSNS.
> -*/
> +    The instructions that can be folded are recorded in FOLDABLE_INSNS.  */
>  static bool
>  fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
>                 HOST_WIDE_INT *offset_out, bitmap foldable_insns)
> @@ -846,8 +856,8 @@ pass_fold_mem_offsets::execute (function
>    FOR_ALL_BB_FN (bb, fn)
>      {
>        /* There is a conflict between this pass and RISCV's shorten-memrefs
> -         pass.  For now disable folding if optimizing for size because
> -         otherwise this cancels the effects of shorten-memrefs.  */
> +        pass.  For now disable folding if optimizing for size because
> +        otherwise this cancels the effects of shorten-memrefs.  */
>        if (optimize_bb_for_size_p (bb))
>         continue;
>
> --- gcc/testsuite/g++.dg/opt/pr111601.C.jj      2023-11-27 21:33:12.605006881 +0100
> +++ gcc/testsuite/g++.dg/opt/pr111601.C 2023-11-27 21:34:47.267678510 +0100
> @@ -0,0 +1,86 @@
> +// PR bootstrap/111601
> +// { dg-do run { target c++11 } }
> +// { dg-options "-O2 -fno-exceptions -fno-rtti -fprofile-generate" }
> +// { dg-require-profiling "-fprofile-generate" }
> +// { dg-final { cleanup-coverage-files } }
> +
> +struct tree_base
> +{
> +  int code:16;
> +};
> +struct saved_scope
> +{
> +  void *pad[14];
> +  int x_processing_template_decl;
> +};
> +struct saved_scope *scope_chain;
> +struct z_candidate
> +{
> +  tree_base *fn;
> +  void *pad[11];
> +  z_candidate *next;
> +  int viable;
> +  int flags;
> +};
> +
> +__attribute__((noipa)) struct z_candidate *
> +splice_viable (struct z_candidate *cands, bool strict_p, bool *any_viable_p)
> +{
> +  struct z_candidate *viable;
> +  struct z_candidate **last_viable;
> +  struct z_candidate **cand;
> +  bool found_strictly_viable = false;
> +  if (scope_chain->x_processing_template_decl)
> +    strict_p = true;
> +  viable = (z_candidate *) 0;
> +  last_viable = &viable;
> +  *any_viable_p = false;
> +  cand = &cands;
> +  while (*cand)
> +    {
> +      struct z_candidate *c = *cand;
> +      if (!strict_p && (c->viable == 1 || ((int) (c->fn)->code) == 273))
> +       {
> +         strict_p = true;
> +         if (viable && !found_strictly_viable)
> +           {
> +             *any_viable_p = false;
> +             *last_viable = cands;
> +             cands = viable;
> +             viable = (z_candidate *) 0;
> +             last_viable = &viable;
> +           }
> +       }
> +      if (strict_p ? c->viable == 1 : c->viable)
> +       {
> +         *last_viable = c;
> +         *cand = c->next;
> +         c->next = (z_candidate *) 0;
> +         last_viable = &c->next;
> +         *any_viable_p = true;
> +         if (c->viable == 1)
> +           found_strictly_viable = true;
> +       }
> +      else
> +       cand = &c->next;
> +    }
> +  return viable ? viable : cands;
> +}
> +
> +int
> +main ()
> +{
> +  saved_scope s{};
> +  scope_chain = &s;
> +  z_candidate z[4] = {};
> +  z[0].next = &z[1];
> +  z[1].viable = 1;
> +  z[1].next = &z[2];
> +  z[2].viable = 1;
> +  z[2].next = &z[3];
> +  bool b;
> +  z_candidate *c = splice_viable (&z[0], true, &b);
> +  if (c != &z[1] || z[1].next != &z[2] || z[2].next)
> +    __builtin_abort ();
> +  return 0;
> +}
>
>
>         Jakub
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] fold-mem-offsets, v2: Fix powerpc64le-linux profiledbootstrap [PR111601]
  2023-11-28 12:19         ` Manolis Tsamis
@ 2023-11-28 17:30           ` Jakub Jelinek
  2023-11-28 17:51             ` Jeff Law
  0 siblings, 1 reply; 12+ messages in thread
From: Jakub Jelinek @ 2023-11-28 17:30 UTC (permalink / raw)
  To: Manolis Tsamis, Jeff Law
  Cc: gcc-patches, Richard Biener, Vineet Gupta, Philipp Tomsich

On Tue, Nov 28, 2023 at 02:19:36PM +0200, Manolis Tsamis wrote:
> > So, like this (so far untested)?
> >
> > Note, the earlier posted patch passed bootstrap/regtest on
> > {powerpc64le,x86_64,i686}-linux.

Testing revealed this fails miserably, artificial uses don't really
have DF_REF_INSN, so I'm now retrying with this which first punts
on non-regular uses (hoping non-regular uses are always the same
between -g and -g0) and only continuing on DEBUG_INSNs for the regular
uses.

2023-11-28  Jakub Jelinek  <jakub@redhat.com>

	PR bootstrap/111601
	* fold-mem-offsets.cc (get_uses): Ignore DEBUG_INSN uses.  Otherwise,
	punt if use is in a different basic block from INSN or appears before
	INSN in the same basic block.  Formatting fixes.
	(get_single_def_in_bb): Formatting fixes.
	(fold_offsets_1, pass_fold_mem_offsets::execute): Comment formatting
	fixes.

	* g++.dg/opt/pr111601.C: New test.

--- gcc/fold-mem-offsets.cc.jj	2023-11-02 07:49:17.060865772 +0100
+++ gcc/fold-mem-offsets.cc	2023-11-28 11:47:34.941679105 +0100
@@ -154,7 +154,7 @@ static int stats_fold_count;
    The definition is desired for REG used in INSN.
    Return the definition insn or NULL if there's no definition with
    the desired criteria.  */
-static rtx_insn*
+static rtx_insn *
 get_single_def_in_bb (rtx_insn *insn, rtx reg)
 {
   df_ref use;
@@ -205,11 +205,10 @@ get_single_def_in_bb (rtx_insn *insn, rt
 /* Get all uses of REG which is set in INSN.  Return the use list or NULL if a
    use is missing / irregular.  If SUCCESS is not NULL then set it to false if
    there are missing / irregular uses and true otherwise.  */
-static struct df_link*
+static df_link *
 get_uses (rtx_insn *insn, rtx reg, bool *success)
 {
   df_ref def;
-  struct df_link *ref_chain, *ref_link;
 
   if (success)
     *success = false;
@@ -221,18 +220,30 @@ get_uses (rtx_insn *insn, rtx reg, bool
   if (!def)
     return NULL;
 
-  ref_chain = DF_REF_CHAIN (def);
+  df_link *ref_chain = DF_REF_CHAIN (def);
+  int insn_luid = DF_INSN_LUID (insn);
+  basic_block insn_bb = BLOCK_FOR_INSN (insn);
 
-  for (ref_link = ref_chain; ref_link; ref_link = ref_link->next)
+  for (df_link *ref_link = ref_chain; ref_link; ref_link = ref_link->next)
     {
       /* Problem getting a use for this instruction.  */
       if (ref_link->ref == NULL)
 	return NULL;
       if (DF_REF_CLASS (ref_link->ref) != DF_REF_REGULAR)
 	return NULL;
+
+      rtx_insn *use = DF_REF_INSN (ref_link->ref);
+      if (DEBUG_INSN_P (use))
+	continue;
+
       /* We do not handle REG_EQUIV/REG_EQ notes for now.  */
       if (DF_REF_FLAGS (ref_link->ref) & DF_REF_IN_NOTE)
 	return NULL;
+      if (BLOCK_FOR_INSN (use) != insn_bb)
+	return NULL;
+      /* Punt if use appears before def in the basic block.  See PR111601.  */
+      if (DF_INSN_LUID (use) < insn_luid)
+	return NULL;
     }
 
   if (success)
@@ -255,8 +266,7 @@ fold_offsets (rtx_insn *insn, rtx reg, b
 
     If DO_RECURSION is true and ANALYZE is false then offset that would result
     from folding is computed and is returned through the pointer OFFSET_OUT.
-    The instructions that can be folded are recorded in FOLDABLE_INSNS.
-*/
+    The instructions that can be folded are recorded in FOLDABLE_INSNS.  */
 static bool
 fold_offsets_1 (rtx_insn *insn, bool analyze, bool do_recursion,
 		HOST_WIDE_INT *offset_out, bitmap foldable_insns)
@@ -846,8 +856,8 @@ pass_fold_mem_offsets::execute (function
   FOR_ALL_BB_FN (bb, fn)
     {
       /* There is a conflict between this pass and RISCV's shorten-memrefs
-	  pass.  For now disable folding if optimizing for size because
-	  otherwise this cancels the effects of shorten-memrefs.  */
+	 pass.  For now disable folding if optimizing for size because
+	 otherwise this cancels the effects of shorten-memrefs.  */
       if (optimize_bb_for_size_p (bb))
 	continue;
 
--- gcc/testsuite/g++.dg/opt/pr111601.C.jj	2023-11-27 21:33:12.605006881 +0100
+++ gcc/testsuite/g++.dg/opt/pr111601.C	2023-11-27 21:34:47.267678510 +0100
@@ -0,0 +1,86 @@
+// PR bootstrap/111601
+// { dg-do run { target c++11 } }
+// { dg-options "-O2 -fno-exceptions -fno-rtti -fprofile-generate" }
+// { dg-require-profiling "-fprofile-generate" }
+// { dg-final { cleanup-coverage-files } }
+
+struct tree_base
+{
+  int code:16;
+};
+struct saved_scope
+{
+  void *pad[14];
+  int x_processing_template_decl;
+};
+struct saved_scope *scope_chain;
+struct z_candidate
+{
+  tree_base *fn;
+  void *pad[11];
+  z_candidate *next;
+  int viable;
+  int flags;
+};
+
+__attribute__((noipa)) struct z_candidate *
+splice_viable (struct z_candidate *cands, bool strict_p, bool *any_viable_p)
+{
+  struct z_candidate *viable;
+  struct z_candidate **last_viable;
+  struct z_candidate **cand;
+  bool found_strictly_viable = false;
+  if (scope_chain->x_processing_template_decl)
+    strict_p = true;
+  viable = (z_candidate *) 0;
+  last_viable = &viable;
+  *any_viable_p = false;
+  cand = &cands;
+  while (*cand)
+    {
+      struct z_candidate *c = *cand;
+      if (!strict_p && (c->viable == 1 || ((int) (c->fn)->code) == 273))
+	{
+	  strict_p = true;
+	  if (viable && !found_strictly_viable)
+	    {
+	      *any_viable_p = false;
+	      *last_viable = cands;
+	      cands = viable;
+	      viable = (z_candidate *) 0;
+	      last_viable = &viable;
+	    }
+	}
+      if (strict_p ? c->viable == 1 : c->viable)
+	{
+	  *last_viable = c;
+	  *cand = c->next;
+	  c->next = (z_candidate *) 0;
+	  last_viable = &c->next;
+	  *any_viable_p = true;
+	  if (c->viable == 1)
+	    found_strictly_viable = true;
+	}
+      else
+	cand = &c->next;
+    }
+  return viable ? viable : cands;
+}
+
+int
+main ()
+{
+  saved_scope s{};
+  scope_chain = &s;
+  z_candidate z[4] = {};
+  z[0].next = &z[1];
+  z[1].viable = 1;
+  z[1].next = &z[2];
+  z[2].viable = 1;
+  z[2].next = &z[3];
+  bool b;
+  z_candidate *c = splice_viable (&z[0], true, &b);
+  if (c != &z[1] || z[1].next != &z[2] || z[2].next)
+    __builtin_abort ();
+  return 0;
+}

	Jakub


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v7] Implement new RTL optimizations pass: fold-mem-offsets.
  2023-11-27 20:52   ` Jakub Jelinek
  2023-11-27 23:51     ` [PATCH] fold-mem-offsets: Fix powerpc64le-linux profiledbootstrap [PR111601] Jakub Jelinek
  2023-11-28  9:45     ` [PATCH v7] Implement new RTL optimizations pass: fold-mem-offsets Manolis Tsamis
@ 2023-11-28 17:50     ` Jeff Law
  2 siblings, 0 replies; 12+ messages in thread
From: Jeff Law @ 2023-11-28 17:50 UTC (permalink / raw)
  To: Jakub Jelinek, Manolis Tsamis
  Cc: gcc-patches, Richard Biener, Vineet Gupta, Philipp Tomsich



On 11/27/23 13:52, Jakub Jelinek wrote:
> On Mon, Oct 16, 2023 at 01:11:01PM -0600, Jeff Law wrote:
>>> gcc/ChangeLog:
>>>
>>> 	* Makefile.in: Add fold-mem-offsets.o.
>>> 	* passes.def: Schedule a new pass.
>>> 	* tree-pass.h (make_pass_fold_mem_offsets): Declare.
>>> 	* common.opt: New options.
>>> 	* doc/invoke.texi: Document new option.
>>> 	* fold-mem-offsets.cc: New file.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 	* gcc.target/riscv/fold-mem-offsets-1.c: New test.
>>> 	* gcc.target/riscv/fold-mem-offsets-2.c: New test.
>>> 	* gcc.target/riscv/fold-mem-offsets-3.c: New test.
>> Thanks, I've pushed this to the trunk.
> 
> This breaks profiledbootstrap on powerpc64le-linux.
>  From what I can see, the pass works one basic block at a time and
> will punt on any non-DEBUG_INSN uses outside of the current block
Correct.  If a non-debug use is seen outside the block, the entire chain 
should become unoptimizable as we don't know how to fixup uses outside 
the block.

> (I believe because of the
>            /* This use affects instructions outside of CAN_FOLD_INSNS.  */
>            if (!bitmap_bit_p (&can_fold_insns, INSN_UID (use)))
>              return 0;
> test and can_fold_insns only set in do_analysis (when processing insns in
> current bb, cleared at the end) or results of get_single_def_in_bb
> (which are checked to be in the same bb).
> But, while get_single_def_in_bb checks for
>    if (DF_INSN_LUID (def) > DF_INSN_LUID (insn))
>      return NULL;
> (OT, why not DF_INSN_INFO_LUID (DF_REF_INSN_INFO (ref_chain->ref))
> instead of DF_INSN_LUID (def), then it doesn't need to look up
> DF_INSN_INFO_GET (def)?), nothing when walking all uses of def does such
> luid check.
On the OT question, I don't think there's any reason why we couldn't use 
the other form.  We're just looking at the LUIDs to validate order 
within a block.


> The basic block in the PR in question has:
> ...
> (insn 212 210 215 25 (set (mem/f:DI (reg/v/f:DI 10 10 [orig:152 last_viable ] [152]) [2 *last_viable_336+0 S8 A64])
>          (reg/f:DI 9 9 [orig:155 _342 ] [155])) "pr111601.ii":50:17 683 {*movdi_internal64}
>       (expr_list:REG_DEAD (reg/v/f:DI 10 10 [orig:152 last_viable ] [152])
>          (nil)))
> (insn 215 212 484 25 (set (reg:DI 5 5 [226])
>          (const_int 0 [0])) "pr111601.ii":52:12 683 {*movdi_internal64}
>       (expr_list:REG_EQUIV (const_int 0 [0])
>          (nil)))
> (insn 484 215 218 25 (set (reg/v/f:DI 10 10 [orig:152 last_viable ] [152])
>          (reg/f:DI 9 9 [orig:155 _342 ] [155])) "pr111601.ii":52:12 683 {*movdi_internal64}
>       (nil))
> ...
> (insn 564 214 216 25 (set (reg/v/f:DI 10 10 [orig:152 last_viable ] [152])
>          (plus:DI (reg/v/f:DI 10 10 [orig:152 last_viable ] [152])
>              (const_int 96 [0x60]))) "pr111601.ii":52:12 66 {*adddi3}
>       (nil))
> (insn 216 564 219 25 (set (mem/f:DI (reg/v/f:DI 10 10 [orig:152 last_viable ] [152]) [2 _343->next+0 S8 A64])
>          (reg:DI 5 5 [226])) "pr111601.ii":52:12 683 {*movdi_internal64}
>       (expr_list:REG_DEAD (reg:DI 5 5 [226])
>          (nil)))
> ...
> and when asking for all uses of %r10 from def 564, it will see uses
> in 216 and 212; the former is after the += 96 addition and gets changed
> to load from %r10+96 with the addition being dropped, but there is
> the other store which is a use across the backedge and when reached
> from other edges certainly doesn't have the + 96 addition anywhere,
> so the pass doesn't actually change that location.

> 
> Haven't bootstrapped/regtested this yet, will start momentarily,
> posting here just in case I'm missing something important.
> 
> 2023-11-27  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR bootstrap/111601
> 	* fold-mem-offsets.cc (fold_offsets): Punt if use appears before
> 	def in the basic block.
> 
> 	* g++.dg/opt/pr111601.C: New test.
It looks sensible to me.  When the use is before the set we're either 
dealing with undefined scenarios or a backedge.  Either way I think 
punting is going to be appropriate.

As you noted downthread that didn't actually work due to the artificial 
uses problem.  But conceptually I think you're on the right track and we 
just need to sort out the implementation details.

jeff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] fold-mem-offsets, v2: Fix powerpc64le-linux profiledbootstrap [PR111601]
  2023-11-28 17:30           ` [PATCH] fold-mem-offsets, v2: Fix powerpc64le-linux profiledbootstrap [PR111601] Jakub Jelinek
@ 2023-11-28 17:51             ` Jeff Law
  0 siblings, 0 replies; 12+ messages in thread
From: Jeff Law @ 2023-11-28 17:51 UTC (permalink / raw)
  To: Jakub Jelinek, Manolis Tsamis
  Cc: gcc-patches, Richard Biener, Vineet Gupta, Philipp Tomsich



On 11/28/23 10:30, Jakub Jelinek wrote:
> On Tue, Nov 28, 2023 at 02:19:36PM +0200, Manolis Tsamis wrote:
>>> So, like this (so far untested)?
>>>
>>> Note, the earlier posted patch passed bootstrap/regtest on
>>> {powerpc64le,x86_64,i686}-linux.
> 
> Testing revealed this fails miserably, artificial uses don't really
> have DF_REF_INSN, so I'm now retrying with this which first punts
> on non-regular uses (hoping non-regular uses are always the same
> between -g and -g0) and only continuing on DEBUG_INSNs for the regular
> uses.
> 
> 2023-11-28  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR bootstrap/111601
> 	* fold-mem-offsets.cc (get_uses): Ignore DEBUG_INSN uses.  Otherwise,
> 	punt if use is in a different basic block from INSN or appears before
> 	INSN in the same basic block.  Formatting fixes.
> 	(get_single_def_in_bb): Formatting fixes.
> 	(fold_offsets_1, pass_fold_mem_offsets::execute): Comment formatting
> 	fixes.
> 
> 	* g++.dg/opt/pr111601.C: New test.
OK if it passes your testing.

jeff

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-11-28 17:51 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-16 18:01 [PATCH v7] Implement new RTL optimizations pass: fold-mem-offsets Manolis Tsamis
2023-10-16 18:14 ` Manolis Tsamis
2023-10-16 19:11 ` Jeff Law
2023-11-27 20:52   ` Jakub Jelinek
2023-11-27 23:51     ` [PATCH] fold-mem-offsets: Fix powerpc64le-linux profiledbootstrap [PR111601] Jakub Jelinek
2023-11-27 23:55       ` Andrew Pinski
2023-11-28  9:45     ` [PATCH v7] Implement new RTL optimizations pass: fold-mem-offsets Manolis Tsamis
2023-11-28 10:53       ` Jakub Jelinek
2023-11-28 12:19         ` Manolis Tsamis
2023-11-28 17:30           ` [PATCH] fold-mem-offsets, v2: Fix powerpc64le-linux profiledbootstrap [PR111601] Jakub Jelinek
2023-11-28 17:51             ` Jeff Law
2023-11-28 17:50     ` [PATCH v7] Implement new RTL optimizations pass: fold-mem-offsets Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).