public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH 0/3, RFC] fsra: Add final gimple sra before expander
@ 2024-02-27  7:04 Jiufu Guo
  2024-02-27  7:04 ` [PATCH 1/3, RFC] fsra: Add final gimple sra just " Jiufu Guo
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Jiufu Guo @ 2024-02-27  7:04 UTC (permalink / raw)
  To: gcc-patches
  Cc: rguenther, jeffreyalaw, richard.sandiford, segher, dje.gcc,
	linkw, bergner, guojiufu

Hi,

As known there are a few PRs (meta-bug PR101926) about
accessing aggregate param/returns which are passed through registers.

Given the suggestion from: 
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637935.html:
We could even use the actual SRA pass in a special mode right before
RTL expansion for the incoming/outgoing part.

Compared to other solutions (e.g. previous light-sra-in-expander), this
method could decouple the different parts (gimple-sra and rtl-expand),
and could leverage the current SRA maximum.

The following patches implements a prototype of this idea.

In this prototype, only "parameters and returns" are treated as 'sra'
candidates.  If a 'parameter' is scalarized, then an IFN_ARG_PART is 
generated for the access at the beginning of the function, and if an
access of a 'return' is scalarized, then IFN_SET_RET is generated 
for it.  Those IFNs are expanded according to the incoming/outgoing
registers for the accesses.

Bootstrapped/regtested on ppc64{,le} and x86_64.


In this prototype, there are still a few areas which can be enhanced,
like:
- Access multi-registers in one stmt,
- Arg access across function calls,
- More special target/ABI behavior,
...

I would like to ask for comments/suggestions before jump into depth,
to ensure this is in the correct direction, and to avoid missing some
important thing.

One thing/concern in this implementation:
For an aggregate parameter, if it is not passed through registers,
there is no need to scalarize in this sra.
For example like i386/pr101908-3.c,
Without sra, the stmts look like this:
 bar ();
 vect__1.5_10 = MEM <vector(2) double> [(double *)&x];

With sra, the stmts look like this:
 x_1 = .ARG_PART (x, 0, 128);
 bar ();
 vect__1.5_10 = x_1;

The issue is that there are no instructions before invoking
'bar ()' without the patch; with the patch, instructions may be generated
before 'bar ()' and those insns would not easy to be optimized by RTL
passes.
This would not be hard to fix (but maybe hacking):
- Let 'sra' pass know the information about if the access is in the
  register, then avoid generating 'ARG_PART'. This would introduce coupling
  before gimple sra and rtl.
- When expanding 'ARG_PART', if the access is in mem, then defer it.
  This would mean 'ARG_PART' may expand to nothing and make the dumped rtl is
  a little confused.
Any comments?

This prototype is splitted into three patches for review.
1/3: Add final gimple sra just before expander
2/3: Add support for ARG_PARTS
3/3: Add support for RET_PARTS

Thanks for your comments and suggestions!

BR,
Jeff (Jiufu Guo)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 1/3, RFC] fsra: Add final gimple sra just before expander
  2024-02-27  7:04 [PATCH 0/3, RFC] fsra: Add final gimple sra before expander Jiufu Guo
@ 2024-02-27  7:04 ` Jiufu Guo
  2024-02-27  7:04 ` [PATCH 2/3, RFC] fsra: support ARG_PARTS Jiufu Guo
  2024-02-27  7:04 ` [PATCH 3/3, RFC] fsra: support SET_RET_PART Jiufu Guo
  2 siblings, 0 replies; 4+ messages in thread
From: Jiufu Guo @ 2024-02-27  7:04 UTC (permalink / raw)
  To: gcc-patches
  Cc: rguenther, jeffreyalaw, richard.sandiford, segher, dje.gcc,
	linkw, bergner, guojiufu

This patch adds a new mode for sra pass: "fsra".
This 'fsra' pass handle function parameters and returns as candidates.
And run it at the end of GIMPLE passes sequences.

gcc/ChangeLog:

	* passes.def: Add pass pass_sra_final.
	* tree-pass.h (make_pass_sra_final): Declare make_pass_sra_final.
	* tree-sra.cc (enum sra_mode): New enum item SRA_MODE_FINAL_INTRA.
	(build_accesses_from_assign): Accept SRA_MODE_FINAL_INTRA.
	(find_var_candidates): Collect candidates for SRA_MODE_FINAL_INTRA.
	(final_intra_sra): New function.
	(class pass_sra_final): New pass class.
	(make_pass_sra_final): New function.

---
 gcc/passes.def  |  2 ++
 gcc/tree-pass.h |  1 +
 gcc/tree-sra.cc | 81 +++++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 78 insertions(+), 6 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index 1cbbd413097..183c1becd65 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -449,6 +449,8 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_harden_conditional_branches);
   NEXT_PASS (pass_harden_compares);
   NEXT_PASS (pass_warn_access, /*early=*/false);
+  NEXT_PASS (pass_sra_final);
+
   NEXT_PASS (pass_cleanup_cfg_post_optimizing);
   NEXT_PASS (pass_warn_function_noreturn);
 
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 29267589eeb..2d0e12bd1bb 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -366,6 +366,7 @@ extern gimple_opt_pass *make_pass_early_tree_profile (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_cleanup_eh (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_sra (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_sra_early (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_sra_final (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tail_recursion (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tail_calls (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_fix_loops (gcc::context *ctxt);
diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index f8e71ec48b9..aacc76f58b5 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -21,14 +21,16 @@ along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
 /* This file implements Scalar Reduction of Aggregates (SRA).  SRA is run
-   twice, once in the early stages of compilation (early SRA) and once in the
-   late stages (late SRA).  The aim of both is to turn references to scalar
-   parts of aggregates into uses of independent scalar variables.
+   three times, once in the early stages of compilation (early SRA) and once
+   in the late stages (late SRA).  The aim of them is to turn references to
+   scalar parts of aggregates into uses of independent scalar variables.
 
-   The two passes are nearly identical, the only difference is that early SRA
+   The three passes are nearly identical, the difference are that early SRA
    does not scalarize unions which are used as the result in a GIMPLE_RETURN
    statement because together with inlining this can lead to weird type
-   conversions.
+   conversions.  The third pass is more care about parameters and returns,
+   it would be helpful for the parameters and returns which are passed through
+   registers.
 
    Both passes operate in four stages:
 
@@ -104,6 +106,7 @@ along with GCC; see the file COPYING3.  If not see
 /* Enumeration of all aggregate reductions we can do.  */
 enum sra_mode { SRA_MODE_EARLY_IPA,   /* early call regularization */
 		SRA_MODE_EARLY_INTRA, /* early intraprocedural SRA */
+		SRA_MODE_FINAL_INTRA, /* final gimple intraprocedural SRA */
 		SRA_MODE_INTRA };     /* late intraprocedural SRA */
 
 /* Global variable describing which aggregate reduction we are performing at
@@ -1437,7 +1440,8 @@ build_accesses_from_assign (gimple *stmt)
     }
 
   if (lacc && racc
-      && (sra_mode == SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA)
+      && (sra_mode == SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA
+	  || sra_mode == SRA_MODE_FINAL_INTRA)
       && !lacc->grp_unscalarizable_region
       && !racc->grp_unscalarizable_region
       && AGGREGATE_TYPE_P (TREE_TYPE (lhs))
@@ -2149,6 +2153,24 @@ find_var_candidates (void)
        parm = DECL_CHAIN (parm))
     ret |= maybe_add_sra_candidate (parm);
 
+  /* fsra only care about parameters and returns */
+  if (sra_mode == SRA_MODE_FINAL_INTRA)
+    {
+      if (!DECL_RESULT (current_function_decl))
+	return ret;
+
+      edge_iterator ei;
+      edge e;
+      FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
+	if (greturn *r = safe_dyn_cast<greturn *> (*gsi_last_bb (e->src)))
+	  {
+	    tree val = gimple_return_retval (r);
+	    if (val && VAR_P (val))
+	      ret |= maybe_add_sra_candidate (val);
+	  }
+      return ret;
+    }
+
   FOR_EACH_LOCAL_DECL (cfun, i, var)
     {
       if (!VAR_P (var))
@@ -5017,6 +5039,14 @@ late_intra_sra (void)
   return perform_intra_sra ();
 }
 
+/* Perform "final sra" intraprocedural SRA just before expander.  */
+static unsigned int
+final_intra_sra (void)
+{
+  sra_mode = SRA_MODE_FINAL_INTRA;
+  return perform_intra_sra ();
+}
+
 
 static bool
 gate_intra_sra (void)
@@ -5099,3 +5129,42 @@ make_pass_sra (gcc::context *ctxt)
 {
   return new pass_sra (ctxt);
 }
+
+namespace
+{
+const pass_data pass_data_sra_final = {
+  GIMPLE_PASS,		 /* type */
+  "fsra",		 /* name */
+  OPTGROUP_NONE,	 /* optinfo_flags */
+  TV_TREE_SRA,		 /* tv_id */
+  (PROP_cfg | PROP_ssa), /* properties_required */
+  0,			 /* properties_provided */
+  0,			 /* properties_destroyed */
+  0,			 /* todo_flags_start */
+  TODO_update_ssa,	 /* todo_flags_finish */
+};
+
+class pass_sra_final : public gimple_opt_pass
+{
+public:
+  pass_sra_final (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_sra_final, ctxt)
+  {
+  }
+
+  /* opt_pass methods: */
+  bool gate (function *) final override { return gate_intra_sra (); }
+  unsigned int execute (function *) final override
+  {
+    return final_intra_sra ();
+  }
+
+}; // class pass_sra_final
+
+} // namespace
+
+gimple_opt_pass *
+make_pass_sra_final (gcc::context *ctxt)
+{
+  return new pass_sra_final (ctxt);
+}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 2/3, RFC] fsra: support ARG_PARTS
  2024-02-27  7:04 [PATCH 0/3, RFC] fsra: Add final gimple sra before expander Jiufu Guo
  2024-02-27  7:04 ` [PATCH 1/3, RFC] fsra: Add final gimple sra just " Jiufu Guo
@ 2024-02-27  7:04 ` Jiufu Guo
  2024-02-27  7:04 ` [PATCH 3/3, RFC] fsra: support SET_RET_PART Jiufu Guo
  2 siblings, 0 replies; 4+ messages in thread
From: Jiufu Guo @ 2024-02-27  7:04 UTC (permalink / raw)
  To: gcc-patches
  Cc: rguenther, jeffreyalaw, richard.sandiford, segher, dje.gcc,
	linkw, bergner, guojiufu

This patch adds IFN_ARG_PARTS, and generate this IFN for parameters access
in fsra pass.  And this IFN is expanded according to the incoming registers
of the parameter.  "fsra" is tunned for the access of parameters.

	PR target/108073

gcc/ChangeLog:

	* internal-fn.cc (query_position_in_parallel): New function.
	(construct_reg_seq): New function.
	(get_incoming_element): New function.
	(reference_alias_ptr_type): Extern declare.
	(expand_ARG_PARTS): New expand function.
	* internal-fn.def (ARG_PARTS): New IFN.
	* tree-sra.cc (scan_function): Update for fsra.
	(analyze_access_subtree): Enable reading ARG analyze for fsra.
	(generate_subtree_copies): Update to generate IFN_ARG_PARTS.

gcc/testsuite/ChangeLog:

	* g++.target/powerpc/pr102024.C: Update.
	* gcc.target/powerpc/pr108073-1.c: New test.
	* gcc.target/powerpc/pr108073.c: New test.

---
 gcc/internal-fn.cc                            | 164 ++++++++++++++++++
 gcc/internal-fn.def                           |   3 +
 gcc/tree-sra.cc                               |  43 ++++-
 gcc/testsuite/g++.target/powerpc/pr102024.C   |   2 +-
 gcc/testsuite/gcc.target/powerpc/pr108073-1.c |  76 ++++++++
 gcc/testsuite/gcc.target/powerpc/pr108073.c   |  74 ++++++++
 6 files changed, 354 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index a07f25f3aee..ee19e155628 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3393,6 +3393,170 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
     }
 }
 
+/* In the parallel rtx register series REGS, compute the register position for
+   given {BITPOS, BITSIZE}.  The results are stored into START_INDEX, END_INDEX,
+   LEFT_BITS and RIGHT_BITS.  */
+
+void
+query_position_in_parallel (HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize,
+			    rtx regs, int &start_index, int &end_index,
+			    HOST_WIDE_INT &left_bits, HOST_WIDE_INT &right_bits)
+{
+  int cur_index = XEXP (XVECEXP (regs, 0, 0), 0) ? 0 : 1;
+  for (; cur_index < XVECLEN (regs, 0); cur_index++)
+    {
+      rtx slot = XVECEXP (regs, 0, cur_index);
+      HOST_WIDE_INT off = UINTVAL (XEXP (slot, 1)) * BITS_PER_UNIT;
+      machine_mode mode = GET_MODE (XEXP (slot, 0));
+      HOST_WIDE_INT size = GET_MODE_BITSIZE (mode).to_constant ();
+      if (off <= bitpos && off + size > bitpos)
+	{
+	  start_index = cur_index;
+	  left_bits = bitpos - off;
+	}
+      if (off + size >= bitpos + bitsize)
+	{
+	  end_index = cur_index;
+	  right_bits = off + size - (bitpos + bitsize);
+	  break;
+	}
+    }
+}
+
+/* Create a serial registers which start at FIRST_REG,
+   and SIZE is the total size of those registers.  */
+static rtx
+construct_reg_seq (HOST_WIDE_INT size, rtx first_reg)
+{
+  int nregs = size / UNITS_PER_WORD + (((size % UNITS_PER_WORD) != 0) ? 1 : 0);
+  rtx *tmps = XALLOCAVEC (rtx, nregs);
+  int regno = REGNO (first_reg);
+  machine_mode mode = word_mode;
+  HOST_WIDE_INT word_size = GET_MODE_SIZE (mode).to_constant ();
+  for (int i = 0; i < nregs; i++)
+    {
+      rtx reg = gen_rtx_REG (mode, regno + i);
+      rtx off = GEN_INT (word_size * i);
+      tmps[i] = gen_rtx_EXPR_LIST (VOIDmode, reg, off);
+    }
+  return gen_rtx_PARALLEL (BLKmode, gen_rtvec_v (nregs, tmps));
+}
+
+static rtx
+get_incoming_element (tree arg, HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize,
+		      bool reversep, tree expr)
+{
+  rtx regs = DECL_INCOMING_RTL (arg);
+  bool has_padding = false;
+  if (REG_P (regs) && GET_MODE (regs) == BLKmode)
+    {
+      HOST_WIDE_INT size = int_size_in_bytes (TREE_TYPE (arg));
+      has_padding = (size % UNITS_PER_WORD) != 0;
+      regs = construct_reg_seq (size, regs);
+    }
+
+  if (GET_CODE (regs) != PARALLEL)
+    return NULL_RTX;
+
+  int start_index = -1;
+  int end_index = -1;
+  HOST_WIDE_INT left_bits = 0;
+  HOST_WIDE_INT right_bits = 0;
+  query_position_in_parallel (bitpos, bitsize, regs, start_index, end_index,
+			      left_bits, right_bits);
+
+  if (start_index < 0 || end_index < 0)
+    return NULL_RTX;
+
+  machine_mode expr_mode = TYPE_MODE (TREE_TYPE (expr));
+  /* Just need one reg for the access.  */
+  if (end_index != start_index)
+    return NULL_RTX;
+
+  rtx reg = XEXP (XVECEXP (regs, 0, start_index), 0);
+  /* Just need one reg for the access.  */
+  if (left_bits == 0 && right_bits == 0)
+    {
+      if (GET_MODE (reg) != expr_mode)
+	reg = gen_lowpart (expr_mode, reg);
+      return reg;
+    }
+
+  /* Need to extract bitfield part reg for the access.
+     left_bits != 0 or right_bits != 0 */
+  if (has_padding && end_index == XVECLEN (regs, 0) - 1)
+    return NULL_RTX;
+  scalar_int_mode imode;
+  if (!int_mode_for_mode (expr_mode).exists (&imode))
+    return NULL_RTX;
+
+  if (expr_mode != imode
+      && known_gt (GET_MODE_SIZE (GET_MODE (regs)), UNITS_PER_WORD))
+    return NULL_RTX;
+
+  machine_mode mode = GET_MODE (reg);
+  bool sgn = TYPE_UNSIGNED (TREE_TYPE (expr));
+  rtx bfld = extract_bit_field (reg, bitsize, left_bits, sgn, NULL_RTX, mode,
+				imode, reversep, NULL);
+
+  if (GET_MODE (bfld) != imode)
+    bfld = gen_lowpart (imode, bfld);
+
+  if (expr_mode == imode)
+    return bfld;
+
+  /* expr_mode != imode, e.g. SF != SI.  */
+  rtx result = gen_reg_rtx (imode);
+  emit_move_insn (result, bfld);
+  return gen_lowpart (expr_mode, result);
+}
+
+tree
+reference_alias_ptr_type (tree t);
+
+static void
+expand_ARG_PARTS (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree arg = gimple_call_arg (stmt, 0);
+  HOST_WIDE_INT offset = tree_to_shwi (gimple_call_arg (stmt, 1));
+  HOST_WIDE_INT size = tree_to_shwi (gimple_call_arg (stmt, 2));
+  int reversep = tree_to_shwi (gimple_call_arg (stmt, 3));
+  rtx sub_elem = get_incoming_element (arg, offset, size, reversep, lhs);
+  if (sub_elem)
+    {
+      rtx to_rtx = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+      if (to_rtx)
+	{
+	  gcc_assert (REG_P (to_rtx));
+	  emit_move_insn (to_rtx, sub_elem);
+	  return;
+	}
+    }
+  /* Fall back to normal expand method. */
+  if ((offset % BITS_PER_WORD == 0) && (size % BITS_PER_WORD == 0))
+    {
+      tree base = build_fold_addr_expr (arg);
+      tree type = reference_alias_ptr_type (arg);
+      tree off = build_int_cst (type, offset / BITS_PER_UNIT);
+      location_t loc = EXPR_LOCATION (arg);
+      tree rhs = fold_build2_loc (loc, MEM_REF, TREE_TYPE (lhs), base, off);
+      REF_REVERSE_STORAGE_ORDER (rhs) = reversep;
+      expand_assignment (lhs, rhs, false);
+    }
+  else
+    {
+      tree type = TREE_TYPE (lhs);
+      machine_mode mode = TYPE_MODE (type);
+      rtx op0
+	= expand_expr_real (arg, NULL, VOIDmode, EXPAND_NORMAL, NULL, true);
+      op0 = extract_bit_field (op0, size, offset, TYPE_UNSIGNED (type), NULL,
+			       mode, mode, reversep, NULL);
+      rtx dest = expand_expr (lhs, NULL, VOIDmode, EXPAND_WRITE);
+      emit_move_insn (dest, op0);
+    }
+}
+
 /* The size of an OpenACC compute dimension.  */
 
 static void
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index c14d30365c1..2bbf70dd6a1 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -510,6 +510,9 @@ DEF_INTERNAL_FN (PHI, 0, NULL)
    automatic variable.  */
 DEF_INTERNAL_FN (DEFERRED_INIT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 
+/* A function to extract elemet(s) from an aggregate argument in fsra. */
+DEF_INTERNAL_FN (ARG_PARTS, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
+
 /* DIM_SIZE and DIM_POS return the size of a particular compute
    dimension and the executing thread's position within that
    dimension.  DIM_POS is pure (and not const) so that it isn't
diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index aacc76f58b5..0bbb8940921 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -1508,7 +1508,8 @@ scan_function (void)
 	  tree t;
 	  unsigned i;
 
-	  if (gimple_code (stmt) != GIMPLE_CALL)
+	  if (gimple_code (stmt) != GIMPLE_CALL
+	      || sra_mode == SRA_MODE_FINAL_INTRA)
 	    walk_stmt_load_store_addr_ops (stmt, NULL, NULL, NULL,
 					   scan_visit_addr);
 
@@ -2767,12 +2768,22 @@ analyze_access_subtree (struct access *root, struct access *parent,
 	hole = true;
     }
 
+  auto check_rw = [] (struct access *root) -> bool {
+    if ((root->grp_scalar_read || root->grp_assignment_read)
+	&& (root->grp_scalar_write || root->grp_assignment_write))
+      return true;
+    if (sra_mode != SRA_MODE_FINAL_INTRA)
+      return false;
+    if ((root->grp_scalar_read || root->grp_assignment_read)
+	&& TREE_CODE (root->base) == PARM_DECL)
+      return true;
+    return false;
+  };
+
+  /* In fsra, parameter is scalarizable even no writing to it.  */
   if (allow_replacements && scalar && !root->first_child
       && (totally || !root->grp_total_scalarization)
-      && (totally
-	  || root->grp_hint
-	  || ((root->grp_scalar_read || root->grp_assignment_read)
-	      && (root->grp_scalar_write || root->grp_assignment_write))))
+      && (totally || root->grp_hint || check_rw (root)))
     {
       /* Always create access replacements that cover the whole access.
          For integral types this means the precision has to match.
@@ -2841,6 +2852,11 @@ analyze_access_subtree (struct access *root, struct access *parent,
     root->grp_covered = 1;
   else if (root->grp_write || comes_initialized_p (root->base))
     root->grp_unscalarized_data = 1; /* not covered and written to */
+
+  if (sra_mode == SRA_MODE_FINAL_INTRA && root->grp_write
+      && TREE_CODE (root->base) == PARM_DECL)
+    return false;
+
   return sth_created;
 }
 
@@ -3802,7 +3818,7 @@ generate_subtree_copies (struct access *access, tree agg,
 	      || access->offset + access->size > start_offset))
 	{
 	  tree expr, repl = get_access_replacement (access);
-	  gassign *stmt;
+	  gimple *stmt;
 
 	  expr = build_ref_for_model (loc, agg, access->offset - top_offset,
 				      access, gsi, insert_after);
@@ -3814,7 +3830,20 @@ generate_subtree_copies (struct access *access, tree agg,
 						 !insert_after,
 						 insert_after ? GSI_NEW_STMT
 						 : GSI_SAME_STMT);
-	      stmt = gimple_build_assign (repl, expr);
+	      if (sra_mode == SRA_MODE_FINAL_INTRA
+		  && TREE_CODE (access->base) == PARM_DECL
+		  && (access->grp_scalar_read || access->grp_assignment_read))
+		{
+		  gimple *call = gimple_build_call_internal (
+		    IFN_ARG_PARTS, 4, access->base,
+		    wide_int_to_tree (sizetype, access->offset),
+		    wide_int_to_tree (sizetype, access->size),
+		    wide_int_to_tree (sizetype, access->reverse));
+		  gimple_call_set_lhs (call, repl);
+		  stmt = call;
+		}
+	      else
+		stmt = gimple_build_assign (repl, expr);
 	    }
 	  else
 	    {
diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C b/gcc/testsuite/g++.target/powerpc/pr102024.C
index 769585052b5..c8995cae707 100644
--- a/gcc/testsuite/g++.target/powerpc/pr102024.C
+++ b/gcc/testsuite/g++.target/powerpc/pr102024.C
@@ -5,7 +5,7 @@
 // Test that a zero-width bit field in an otherwise homogeneous aggregate
 // generates a psabi warning and passes arguments in GPRs.
 
-// { dg-final { scan-assembler-times {\mstd\M} 4 } }
+// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 } }
 
 struct a_thing
 {
diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073-1.c b/gcc/testsuite/gcc.target/powerpc/pr108073-1.c
new file mode 100644
index 00000000000..4892716e85f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr108073-1.c
@@ -0,0 +1,76 @@
+/* { dg-do run } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-O2 -save-temps" } */
+
+typedef struct DF
+{
+  double a[4];
+  short s1;
+  short s2;
+  short s3;
+  short s4;
+} DF;
+typedef struct SF
+{
+  float a[4];
+  int i1;
+  int i2;
+} SF;
+
+/* { dg-final { scan-assembler-times {\mmtvsrd|mtvsrws\M} 3 {target { lp64 && has_arch_pwr8 } } } } */
+/* { dg-final { scan-assembler-not {\mlwz\M} {target { lp64 && has_arch_pwr8 } } } } */
+/* { dg-final { scan-assembler-not {\mlhz\M} {target { lp64 && has_arch_pwr8 } } } } */
+
+#define NOIPA __attribute__ ((noipa))
+
+short NOIPA
+foo_hi (DF a, int flag)
+{
+  if (flag == 2)
+    return a.s2 + a.s3;
+  return 0;
+}
+int NOIPA
+foo_si (SF a, int flag)
+{
+  if (flag == 2)
+    return a.i2 + a.i1;
+  return 0;
+}
+double NOIPA
+foo_df (DF arg, int flag)
+{
+  if (flag == 2)
+    return arg.a[3];
+  else
+    return 0.0;
+}
+float NOIPA
+foo_sf (SF arg, int flag)
+{
+  if (flag == 2)
+    return arg.a[2];
+  return 0;
+}
+float NOIPA
+foo_sf1 (SF arg, int flag)
+{
+  if (flag == 2)
+    return arg.a[1];
+  return 0;
+}
+
+DF gdf = {{1.0, 2.0, 3.0, 4.0}, 1, 2, 3, 4};
+SF gsf = {{1.0f, 2.0f, 3.0f, 4.0f}, 1, 2};
+
+int
+main ()
+{
+  if (!(foo_hi (gdf, 2) == 5 && foo_si (gsf, 2) == 3 && foo_df (gdf, 2) == 4.0
+	&& foo_sf (gsf, 2) == 3.0 && foo_sf1 (gsf, 2) == 2.0))
+    __builtin_abort ();
+  if (!(foo_hi (gdf, 1) == 0 && foo_si (gsf, 1) == 0 && foo_df (gdf, 1) == 0
+	&& foo_sf (gsf, 1) == 0 && foo_sf1 (gsf, 1) == 0))
+    __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073.c b/gcc/testsuite/gcc.target/powerpc/pr108073.c
new file mode 100644
index 00000000000..4e7feaa6810
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr108073.c
@@ -0,0 +1,74 @@
+/* { dg-do run } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-O2 -save-temps" } */
+
+/* { dg-final { scan-assembler-times {\mmtvsrd|mtvsrws\M} 5 {target { lp64 && { has_arch_pwr8 && be } } } } } */
+/* { dg-final { scan-assembler-times {\mxscvspdpn\M} 4 {target { lp64 && { has_arch_pwr8 && be } } } } } */
+/* { dg-final { scan-assembler-times {\mmtvsrd|mtvsrws\M} 3 {target { lp64 && { has_arch_pwr8 && le } } } } } */
+/* { dg-final { scan-assembler-times {\mxscvspdpn\M} 2 {target { lp64 && { has_arch_pwr8 && le } } } } } */
+/* { dg-final { scan-assembler-times {\mfadds\M} 2 {target { lp64 && has_arch_pwr8 } } } } */
+
+#define NOIPA __attribute__ ((noipa))
+typedef struct X
+{
+  float x;
+  float y;
+} X;
+
+float NOIPA
+fooX (X y)
+{
+  y.x += 1;
+  return y.x + y.y;
+}
+
+typedef struct Y
+{
+  double a[4];
+  long l;
+} Y;
+
+double NOIPA
+fooY (Y arg)
+{
+  return arg.a[3];
+}
+
+typedef struct Z
+{
+  float a[4];
+  short l;
+} Z;
+
+float NOIPA
+fooZ (Z arg)
+{
+  return arg.a[3];
+}
+
+float NOIPA
+fooZ2 (Z arg)
+{
+  return arg.a[2];
+}
+
+X x = {1.0f, 2.0f};
+Y y = {1.0, 2.0, 3.0, 4.0, 1};
+Z z = {1.0f, 2.0f, 3.0f, 4.0f, 1};
+int
+main ()
+{
+  if (fooX (x) != 4.0f)
+    __builtin_abort ();
+
+  if (fooY (y) != 4.0)
+    __builtin_abort ();
+
+  if (fooZ (z) != 4.0f)
+    __builtin_abort ();
+
+  if (fooZ2 (z) != 3.0f)
+    __builtin_abort ();
+
+  return 0;
+}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 3/3, RFC] fsra: support SET_RET_PART
  2024-02-27  7:04 [PATCH 0/3, RFC] fsra: Add final gimple sra before expander Jiufu Guo
  2024-02-27  7:04 ` [PATCH 1/3, RFC] fsra: Add final gimple sra just " Jiufu Guo
  2024-02-27  7:04 ` [PATCH 2/3, RFC] fsra: support ARG_PARTS Jiufu Guo
@ 2024-02-27  7:04 ` Jiufu Guo
  2 siblings, 0 replies; 4+ messages in thread
From: Jiufu Guo @ 2024-02-27  7:04 UTC (permalink / raw)
  To: gcc-patches
  Cc: rguenther, jeffreyalaw, richard.sandiford, segher, dje.gcc,
	linkw, bergner, guojiufu

This patch adds IFN_SET_RET_PARTS, and generate this IFN for the accesses of
the 'returns' in fsra pass.  And the IFN is expanded according to the outgoing
registers of the 'return'.  "fsra" is tunned for the access analyze for
'returns'.

'IFN_SET_RET_LAST_PARTS' is just for this prototype, it helps to
reuse the decl information of the 'return var'.  With enhancing the
implementation, this IFN may be removed.

	PR target/65421
	PR target/69143

gcc/ChangeLog:

	* cfgexpand.cc (expand_value_return): Update.
	(expand_return): Update for returns expand.
	* internal-fn.cc (store_outgoing_element): New function.
	(expand_SET_RET_PARTS): New IFN expand function.
	(expand_SET_RET_LAST_PARTS): New IFN expand function.
	* internal-fn.def (SET_RET_PARTS): New IFN.
	(SET_RET_LAST_PARTS): New IFN.
	* tree-sra.cc (analyze_access_subtree): Upate for returns in fsra.
	(generate_subtree_copies): Generate IFN for returns.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr65421.c: New test.
	* gcc.target/powerpc/pr69143.c: New test.

---
 gcc/cfgexpand.cc                           |  6 +-
 gcc/internal-fn.cc                         | 84 ++++++++++++++++++++++
 gcc/internal-fn.def                        |  6 ++
 gcc/tree-sra.cc                            | 39 ++++++++--
 gcc/testsuite/gcc.target/powerpc/pr65421.c | 10 +++
 gcc/testsuite/gcc.target/powerpc/pr69143.c | 23 ++++++
 6 files changed, 163 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr69143.c

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index eef565eddb5..1ec6c2d8102 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -3759,7 +3759,7 @@ expand_value_return (rtx val)
 
   tree decl = DECL_RESULT (current_function_decl);
   rtx return_reg = DECL_RTL (decl);
-  if (return_reg != val)
+  if (!rtx_equal_p (return_reg, val))
     {
       tree funtype = TREE_TYPE (current_function_decl);
       tree type = TREE_TYPE (decl);
@@ -3832,6 +3832,10 @@ expand_return (tree retval)
      been stored into it, so we don't have to do anything special.  */
   if (TREE_CODE (retval_rhs) == RESULT_DECL)
     expand_value_return (result_rtl);
+  /* return is scalarized by fsra: TODO use FLAG. */
+  else if (VAR_P (retval_rhs)
+	   && rtx_equal_p (result_rtl, DECL_RTL (retval_rhs)))
+    expand_null_return_1 ();
 
   /* If the result is an aggregate that is being returned in one (or more)
      registers, load the registers here.  */
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index ee19e155628..be06dc3a16c 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3557,6 +3557,90 @@ expand_ARG_PARTS (internal_fn, gcall *stmt)
     }
 }
 
+static bool
+store_outgoing_element (rtx regs, HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize,
+			tree rhs)
+{
+  if (GET_CODE (regs) != PARALLEL)
+    return false;
+
+  int start_index = -1;
+  int end_index = -1;
+  HOST_WIDE_INT left_bits = 0;
+  HOST_WIDE_INT right_bits = 0;
+  query_position_in_parallel (bitpos, bitsize, regs, start_index, end_index,
+			      left_bits, right_bits);
+
+  if (start_index < 0 || end_index < 0)
+    return false;
+
+  if (end_index != start_index)
+    return false;
+
+  if (!((left_bits == 0 && !BITS_BIG_ENDIAN)
+	|| (right_bits == 0 && BITS_BIG_ENDIAN)))
+    return false;
+
+  /* Just need one reg for the access.  */
+  rtx dest = XEXP (XVECEXP (regs, 0, start_index), 0);
+  machine_mode mode = GET_MODE (dest);
+
+  if (left_bits != 0 || right_bits != 0)
+    {
+      machine_mode small_mode;
+      if (!SCALAR_INT_MODE_P (mode)
+	  || !mode_for_size (bitsize, GET_MODE_CLASS (mode), 0)
+		.exists (&small_mode))
+	return false;
+
+      dest = gen_lowpart (small_mode, dest);
+      mode = small_mode;
+    }
+
+  rtx src = expand_expr (rhs, NULL_RTX, VOIDmode, EXPAND_NORMAL);
+  if (!src)
+    return false;
+
+  machine_mode src_mode = GET_MODE (src);
+  if (mode != src_mode)
+    src = gen_lowpart (mode, src);
+
+  emit_move_insn (dest, src);
+
+  return true;
+}
+
+static void
+expand_SET_RET_PARTS (internal_fn, gcall *stmt)
+{
+  HOST_WIDE_INT offset = tree_to_shwi (gimple_call_arg (stmt, 1));
+  HOST_WIDE_INT size = tree_to_shwi (gimple_call_arg (stmt, 2));
+  tree decl = DECL_RESULT (current_function_decl);
+  rtx dest_regs = decl->decl_with_rtl.rtl; // DECL_RTL (base);
+  tree rhs = gimple_call_arg (stmt, 3);
+  bool res = store_outgoing_element (dest_regs, offset, size, rhs);
+  if (!res)
+    {
+      tree base = gimple_call_arg (stmt, 0);
+      tree lhs = gimple_call_lhs (stmt);
+      expand_assignment (base, decl, false);
+      expand_assignment (lhs, rhs, false);
+      expand_assignment (decl, base, false);
+    }
+}
+
+static void
+expand_SET_RET_LAST_PARTS (internal_fn, gcall *stmt)
+{
+  expand_SET_RET_PARTS (IFN_SET_RET_PARTS, stmt);
+
+  tree decl = DECL_RESULT (current_function_decl);
+  rtx dest_regs = decl->decl_with_rtl.rtl; // DECL_RTL (base);
+  tree base = gimple_call_arg (stmt, 0);
+  base->decl_with_rtl.rtl = dest_regs; // SET_DECL_RTL
+}
+
+
 /* The size of an OpenACC compute dimension.  */
 
 static void
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 2bbf70dd6a1..e6fab4671d5 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -513,6 +513,12 @@ DEF_INTERNAL_FN (DEFERRED_INIT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 /* A function to extract elemet(s) from an aggregate argument in fsra. */
 DEF_INTERNAL_FN (ARG_PARTS, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 
+/* Functions to set/construct elemet(s) for an 'return' aggregate. */
+DEF_INTERNAL_FN (SET_RET_PARTS, ECF_LEAF | ECF_NOTHROW, NULL)
+/* Functions to set/construct elemet(s) for a 'return' aggregate just before
+return statement. */
+DEF_INTERNAL_FN (SET_RET_LAST_PARTS, ECF_LEAF | ECF_NOTHROW, NULL)
+
 /* DIM_SIZE and DIM_POS return the size of a particular compute
    dimension and the executing thread's position within that
    dimension.  DIM_POS is pure (and not const) so that it isn't
diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index 0bbb8940921..d78a2cc4b02 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -2777,6 +2777,15 @@ analyze_access_subtree (struct access *root, struct access *parent,
     if ((root->grp_scalar_read || root->grp_assignment_read)
 	&& TREE_CODE (root->base) == PARM_DECL)
       return true;
+    /* Now in fsra (SRA_MODE_FINAL_INTRA), only PARAM and RETURNS
+       are candidates, so if "VAR_P (root->base)", then it is used by
+       a return stmt.
+       TODO: add a flag to root->base to indicate it is used by return
+       stmt.*/
+    if ((root->grp_scalar_write || root->grp_assignment_write)
+	&& VAR_P (root->base))
+      return true;
+
     return false;
   };
 
@@ -2853,9 +2862,13 @@ analyze_access_subtree (struct access *root, struct access *parent,
   else if (root->grp_write || comes_initialized_p (root->base))
     root->grp_unscalarized_data = 1; /* not covered and written to */
 
-  if (sra_mode == SRA_MODE_FINAL_INTRA && root->grp_write
-      && TREE_CODE (root->base) == PARM_DECL)
-    return false;
+  if (sra_mode == SRA_MODE_FINAL_INTRA)
+    {/* Does not support writen to PARAM and partial-unscalarized RET yet.  */
+      if (root->grp_unscalarized_data && (VAR_P (root->base)))
+	return false;
+      if (root->grp_write && TREE_CODE (root->base) == PARM_DECL)
+	return false;
+    }
 
   return sth_created;
 }
@@ -3853,7 +3866,25 @@ generate_subtree_copies (struct access *access, tree agg,
 						 !insert_after,
 						 insert_after ? GSI_NEW_STMT
 						 : GSI_SAME_STMT);
-	      stmt = gimple_build_assign (expr, repl);
+	      if (sra_mode == SRA_MODE_FINAL_INTRA && VAR_P (access->base)
+		  && (access->grp_scalar_write || access->grp_assignment_write))
+		{
+		  enum internal_fn fcode;
+		  if (access->first_child == NULL
+		      && access->next_sibling == NULL)
+		    fcode = IFN_SET_RET_LAST_PARTS;
+		  else
+		    fcode = IFN_SET_RET_PARTS;
+
+		  gimple *call = gimple_build_call_internal (
+		    fcode, 4, access->base,
+		    wide_int_to_tree (sizetype, access->offset),
+		    wide_int_to_tree (sizetype, access->size), repl);
+		  gimple_call_set_lhs (call, expr);
+		  stmt = call;
+		}
+	      else
+		stmt = gimple_build_assign (expr, repl);
 	    }
 	  gimple_set_location (stmt, loc);
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421.c b/gcc/testsuite/gcc.target/powerpc/pr65421.c
new file mode 100644
index 00000000000..ea86b53afbb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr65421.c
@@ -0,0 +1,10 @@
+/* { dg-require-effective-target hard_float } */
+/* { dg-require-effective-target powerpc_elfv2 } */
+/* { dg-options "-O2" } */
+
+/* { dg-final { scan-assembler-times {\mlfd\M} 4 {target { lp64 && has_arch_pwr8 } } } } */
+/* { dg-final { scan-assembler-not {\mstd\M} {target { lp64 && has_arch_pwr8 } } } } */
+/* { dg-final { scan-assembler-not {\mld\M} {target { lp64 && has_arch_pwr8 } } } } */
+
+typedef struct { double a[4]; } A;
+A foo (const A *a) { return *a; }
diff --git a/gcc/testsuite/gcc.target/powerpc/pr69143.c b/gcc/testsuite/gcc.target/powerpc/pr69143.c
new file mode 100644
index 00000000000..216a270fb7b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr69143.c
@@ -0,0 +1,23 @@
+/* { dg-require-effective-target hard_float } */
+/* { dg-require-effective-target powerpc_elfv2 } */
+/* { dg-options "-O2" } */
+
+/* { dg-final { scan-assembler-times {\mfmr\M} 3 {target { lp64 && has_arch_pwr8 } } } } */
+/* { dg-final { scan-assembler-not {\mxscvspdpn\M} {target { lp64 && has_arch_pwr8 } } } } */
+
+struct foo1
+{
+  float x;
+  float y;
+};
+
+struct foo1
+blah1 (struct foo1 y)
+{
+  struct foo1 x;
+
+  x.x = y.y;
+  x.y = y.x;
+
+  return x;
+}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-02-27 15:41 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-27  7:04 [PATCH 0/3, RFC] fsra: Add final gimple sra before expander Jiufu Guo
2024-02-27  7:04 ` [PATCH 1/3, RFC] fsra: Add final gimple sra just " Jiufu Guo
2024-02-27  7:04 ` [PATCH 2/3, RFC] fsra: support ARG_PARTS Jiufu Guo
2024-02-27  7:04 ` [PATCH 3/3, RFC] fsra: support SET_RET_PART Jiufu Guo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).