* [PATCH V1] introduce light expander sra
@ 2023-10-10  2:18 Jiufu Guo
From: Jiufu Guo @ 2023-10-10  2:18 UTC (permalink / raw)
  To: gcc-patches
  Cc: rguenther, jeffreyalaw, richard.sandiford, segher, dje.gcc,
	linkw, bergner, guojiufu, amacleod, aldyh

Hi,

There are a few PRs (meta-bug PR101926) on various targets.
The root causes of them are similar: aggregate parameters/
returns are passed in multiple registers, but they are first
stored from the registers to the stack, and then accessed
through the stack slot.

A general idea to enhance this: accessing the aggregate
parameters/returns directly through registers.  This idea
would be a kind of SRA (using the scalar registers to
access the aggregate parameter/returns).
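
As a rough illustration of the kind of code this targets (the
example is not taken from the patch, and the names Floats,
second_field and make_floats are hypothetical), consider a small
aggregate passed and returned by value.  On ABIs that pass such a
struct in registers, the field read ideally needs no stack traffic:

```cpp
#include <cassert>

// Hypothetical example type: three doubles, small enough that some ABIs
// (for example ppc64le ELFv2) pass it in floating-point registers.
struct Floats
{
  double a[3];
};

// Reads one field of a by-value aggregate parameter.  Without
// scalarization at expand time, the incoming registers may first be
// stored to a stack slot and a[1] then reloaded from memory.
double
second_field (Floats f)
{
  return f.a[1];
}

// Returns an aggregate by value; the same idea applies to the registers
// that carry the return value.
Floats
make_floats (double x)
{
  Floats r = { { x, x + 1.0, x + 2.0 } };
  return r;
}
```

Whether the struct really travels in registers depends on the
target ABI; the sketch only shows the source-level shape.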

This experimental patch for light-expander-sra contains
the following parts:

a. Check if the parameters/returns are ok/profitable to
   scalarize, and set the scalar pseudos for the
   parameter/return.
  - This is done in "expand_function_start", after the
    incoming/outgoing hard registers are determined for the
    parameter(s)/return.
    The scalarized registers are recorded in DECL_RTL for
    the parameter/return in PARALLEL form.
  - When setting DECL_RTL, "scalarizable_aggregate" is called
    to check whether the accesses are ok/profitable to
    scalarize.
    We can continue to enhance this function, to support
    more cases.  For example:
    - 'reverse storage order'.
    - 'TImode/vector-mode from multi-regs'.
    - some cases on 'writing to parameter'/'overlap accesses'.
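
The position check behind part (a) can be sketched outside of GCC
as follows.  This is a simplified model, not the patch's code: Slot,
Position, query_position and read_is_scalarizable are hypothetical
names, a Slot stands in for one element of the PARALLEL, and the
mode-related checks of the real function are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for one element of a PARALLEL: a register that
// covers `bits` bits of the aggregate, starting at bit offset `off`.
struct Slot
{
  int64_t off;
  int64_t bits;
};

struct Position
{
  int start_index = -1;   // slot containing the first bit, or -1
  int end_index = -1;     // slot containing the last bit, or -1
  int64_t left_bits = 0;  // unused bits before the access in the start slot
  int64_t right_bits = 0; // unused bits after the access in the end slot
};

// Locate the access {bitpos, bitsize} within the ordered slots.
Position
query_position (const std::vector<Slot> &slots, int64_t bitpos,
                int64_t bitsize)
{
  Position p;
  for (int i = 0; i < (int) slots.size (); i++)
    {
      const Slot &s = slots[i];
      if (s.off <= bitpos && bitpos < s.off + s.bits)
        {
          p.start_index = i;
          p.left_bits = bitpos - s.off;
        }
      if (s.off + s.bits >= bitpos + bitsize)
        {
          p.end_index = i;
          p.right_bits = s.off + s.bits - (bitpos + bitsize);
          break;
        }
    }
  return p;
}

// A read spanning several registers must cover them exactly; a read that
// stays inside one register may be a whole-register or a partial read.
bool
read_is_scalarizable (const std::vector<Slot> &slots, int64_t bitpos,
                      int64_t bitsize)
{
  Position p = query_position (slots, bitpos, bitsize);
  if (p.start_index < 0 || p.end_index < 0)
    return false; // padding or out of bounds
  if (p.end_index > p.start_index)
    return p.left_bits == 0 && p.right_bits == 0;
  return true;
}
```

With three 64-bit slots at offsets 0, 64 and 128, a 64-bit read at
bit 64 or a 16-bit read at bit 32 is accepted in this model, while a
64-bit read at bit 32 straddles two registers and is rejected.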

b. When expanding the accesses of the parameters/returns,
   the scalars (pseudos) can be determined from the access
   info (e.g. bitpos, bitsize, mode) and used to expand the
   access.  This may happen when expanding the accesses below:
  - The component access of a parameter: "_1 = arg.f1".
    Or whole parameter access: rhs of "_2 = arg"
  - The assignment to a return val:
    "D.xx = yy; or D.xx.f = zz" where D.xx occurs on return
    stmt.
  - This is mainly done in expr.cc (expand_expr_real_1 and
    expand_assignment).  Function "extract_sub_member" is
    used to figure out the scalar rtxs (pseudos).
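
The single-register case of part (b) amounts to a shift and a mask.
Below is a minimal model of that step, under loud assumptions that
are not from the patch: registers are plain 64-bit integers, bit 0
is the least significant bit, and extract_field is a hypothetical
name.  The patch itself works on RTL via extract_bit_field and
gen_lowpart.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Extract BITSIZE bits starting at bit BITPOS of an aggregate whose
// contents were loaded into consecutive 64-bit registers.
// Assumptions: little-endian bit numbering, each register covers 64
// consecutive bits, and the access does not cross a register boundary.
uint64_t
extract_field (const std::vector<uint64_t> &regs, unsigned bitpos,
               unsigned bitsize)
{
  unsigned index = bitpos / 64;     // which register holds the field
  unsigned left_bits = bitpos % 64; // bits to skip inside that register
  assert (index < regs.size ());
  assert (bitsize >= 1 && left_bits + bitsize <= 64);

  uint64_t reg = regs[index];
  if (bitsize == 64)
    return reg; // whole register: no shift or mask needed

  uint64_t mask = (uint64_t (1) << bitsize) - 1;
  return (reg >> left_bits) & mask;
}
```

The whole-register branch corresponds to the "just need one reg"
case; the shift-and-mask branch corresponds to extracting part of a
register.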

Besides the above two parts, some work is done on the GIMPLE
tree: collecting SRA candidates for parameters/returns, and
collecting the SRA access info.
This is mainly done at the beginning of the expander pass by
the class expand_sra and its member functions.
Below are the two major items of this part.
 - Collect light-expand-sra candidates.
  Each parameter is checked for a suitable aggregate type.
  The return value (VAR_P) of each return stmt is collected if
  the function returns via registers.
  This is implemented in expand_sra::collect_sra_candidates.

 - Build/collect/manage all the accesses on the candidates.
  The function "scan_function" does this work: it goes
  through all basic blocks and checks all interesting stmts
  (phi, return, assign, call, asm).
  If there is an interesting expression (e.g. COMPONENT_REF
  or PARM_DECL), the required info for the access is recorded
  (e.g. pos, size, type, base).
  If it is risky to do SRA, the candidate may be removed,
  e.g. when its address is taken and it is accessed via memory:
  "foo(struct S arg) {bar (&arg);}"

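The bookkeeping for candidates and accesses can be pictured with
the standalone sketch below.  It only models the idea: bases are
named by strings instead of trees, and CandidateTable, Access and
the member names are hypothetical rather than the patch's
interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One recorded access: bit offset, bit size, and whether it is a write.
struct Access
{
  long offset;
  long size;
  bool write;
};

// Maps each candidate base to the accesses seen so far.  A base that is
// not in the map is not (or no longer) a candidate.
class CandidateTable
{
public:
  void add_candidate (const std::string &base)
  {
    table_.emplace (base, std::vector<Access> ());
  }

  // Record an access; returns false if BASE is not a candidate.
  bool record_access (const std::string &base, long offset, long size,
                      bool write)
  {
    auto it = table_.find (base);
    if (it == table_.end ())
      return false;
    it->second.push_back ({ offset, size, write });
    return true;
  }

  // Drop a candidate, e.g. when its address is taken.
  void disqualify (const std::string &base) { table_.erase (base); }

  bool is_candidate (const std::string &base) const
  {
    return table_.count (base) != 0;
  }

  std::size_t access_count (const std::string &base) const
  {
    auto it = table_.find (base);
    return it == table_.end () ? 0 : it->second.size ();
  }

private:
  std::map<std::string, std::vector<Access>> table_;
};
```

Once a base is disqualified, later accesses to it are simply
ignored, which is the behaviour the "bar (&arg)" example relies on.
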
This patch also tries to share common code between
light-expand-sra, tree-sra, and ipa-sra.
We can continue refactoring to share similar functionality.
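
The sharing is built on a templated scan with a default analyzer
whose hooks do nothing.  The pattern can be shown in isolation as
follows; this is a sketch with hypothetical names (StmtKind,
DefaultAnalyzer, scan, CountingAnalyzer) in which statements are
reduced to an enum, not the code in tree-sra.h.

```cpp
#include <cassert>
#include <vector>

enum class StmtKind { Assign, Call, Other };

// Default hooks do nothing; a concrete analyzer redefines only the
// ones it cares about.
struct DefaultAnalyzer
{
  void analyze_assign (int) {}
  void analyze_call (int) {}
  void analyze_default (int) {}
};

// Walk the statements and dispatch on their kind.  The analyzer type is
// a template parameter, so the hooks are resolved at compile time and
// no virtual calls are involved.
template <typename Analyzer>
void
scan (const std::vector<StmtKind> &stmts, Analyzer &a)
{
  for (int i = 0; i < (int) stmts.size (); i++)
    switch (stmts[i])
      {
      case StmtKind::Assign:
        a.analyze_assign (i);
        break;
      case StmtKind::Call:
        a.analyze_call (i);
        break;
      default:
        a.analyze_default (i);
        break;
      }
}

// Example analyzer: counts assignments and calls, ignores the rest.
struct CountingAnalyzer : DefaultAnalyzer
{
  int assigns = 0;
  int calls = 0;
  void analyze_assign (int) { assigns++; }
  void analyze_call (int) { calls++; }
};
```

Each pass can then supply its own analyzer type while reusing the
same walk over the function body.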

Compared with the previous version, this version avoids storing
the parameter to the stack if it is scalarized.
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631177.html

This patch is tested on ppc64{,le} and x86_64.
Is this ok for trunk?

BR,
Jeff (Jiufu Guo)

	PR target/65421

gcc/ChangeLog:

	* cfgexpand.cc (struct access): New class.
	(struct expand_sra): New class.
	(expand_sra::collect_sra_candidates): New member function.
	(expand_sra::add_sra_candidate): Likewise.
	(expand_sra::build_access): Likewise.
	(expand_sra::analyze_phi): Likewise.
	(expand_sra::analyze_assign): Likewise.
	(expand_sra::visit_base): Likewise.
	(expand_sra::protect_mem_access_in_stmt): Likewise.
	(expand_sra::expand_sra):  Class constructor.
	(expand_sra::~expand_sra): Class destructor.
	(expand_sra::scalarizable_access):  New member function.
	(expand_sra::scalarizable_accesses):  Likewise.
	(scalarizable_aggregate):  New function.
	(set_scalar_rtx_for_returns):  New function.
	(expand_value_return): Updated.
	(expand_debug_expr): Updated.
	(pass_expand::execute): Updated to use expand_sra.
	* cfgexpand.h (scalarizable_aggregate): New declare.
	(set_scalar_rtx_for_returns): New declare.
	* expr.cc (expand_assignment): Updated.
	(expand_constructor): Updated.
	(query_position_in_parallel): New function.
	(extract_sub_member): New function.
	(expand_expr_real_1): Updated.
	* expr.h (query_position_in_parallel): New declare.
	* function.cc (assign_parm_setup_block): Updated.
	(assign_parms): Updated.
	(expand_function_start): Updated.
	* tree-sra.h (struct sra_base_access): New class.
	(struct sra_default_analyzer): New class.
	(scan_function): New template function.
	* var-tracking.cc (track_loc_p): Updated.

gcc/testsuite/ChangeLog:

	* g++.target/powerpc/pr102024.C: Updated
	* gcc.target/powerpc/pr108073.c: New test.
	* gcc.target/powerpc/pr65421-1.c: New test.
	* gcc.target/powerpc/pr65421-2.c: New test.

---
 gcc/cfgexpand.cc                             | 350 ++++++++++++++++++-
 gcc/cfgexpand.h                              |   2 +
 gcc/expr.cc                                  | 135 ++++++-
 gcc/expr.h                                   |   3 +
 gcc/function.cc                              |  36 +-
 gcc/tree-sra.h                               |  76 ++++
 gcc/var-tracking.cc                          |   3 +-
 gcc/testsuite/g++.target/powerpc/pr102024.C  |   2 +-
 gcc/testsuite/gcc.target/powerpc/pr108073.c  |  29 ++
 gcc/testsuite/gcc.target/powerpc/pr65421-1.c |   6 +
 gcc/testsuite/gcc.target/powerpc/pr65421-2.c |  32 ++
 11 files changed, 667 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-2.c

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index 4262703a138..936d3b2b5ee 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -74,6 +74,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "output.h"
 #include "builtins.h"
 #include "opts.h"
+#include "tree-sra.h"
 
 /* Some systems use __main in a way incompatible with its use in gcc, in these
    cases use the macros NAME__MAIN to give a quoted symbol and SYMBOL__MAIN to
@@ -97,6 +98,341 @@ static bool defer_stack_allocation (tree, bool);
 
 static void record_alignment_for_reg_var (unsigned int);
 
+/* For light SRA in the expander for parameters and returns.  */
+struct access : public sra_base_access
+{
+};
+
+typedef struct access *access_p;
+
+struct expand_sra : public sra_default_analyzer
+{
+  /* Construct/destruct resources, e.g. sra candidates.  */
+  expand_sra ();
+  ~expand_sra ();
+
+  /* No actions for pre_analyze_stmt, analyze_return.  */
+
+  /* Override phi, call and asm analysis.  */
+  void analyze_phi (gphi *s);
+
+  /* TODO: Check accesses on call/asm.  */
+  void analyze_call (gcall *s) { protect_mem_access_in_stmt (s); };
+  void analyze_asm (gasm *s) { protect_mem_access_in_stmt (s); };
+
+  /* Check access of SRA on assignment.  */
+  void analyze_assign (gassign *);
+
+  /* Check if the accesses of BASE(parameter or return) are
+     scalarizable, according to the incoming/outgoing REGS. */
+  bool scalarizable_accesses (tree base, rtx regs);
+
+private:
+  /* Collect the parameter and returns to check if they are suitable for
+     scalarization.  */
+  bool collect_sra_candidates (void);
+
+  /* Return true if VAR is added as a candidate for SRA.  */
+  bool add_sra_candidate (tree var);
+
+  /* If EXPR has an interesting SRA access, create the access and
+     return it; return NULL otherwise.  */
+  access_p build_access (tree expr, bool write);
+
+  /* Check if the access ACC is scalarizable.  REGS is the incoming/outgoing
+     registers which the access is based on. */
+  bool scalarizable_access (access_p acc, rtx regs, bool is_parm);
+
+  /* If there is risk (stored/loaded or addr taken),
+     disqualify the sra candidates in the un-interesting STMT. */
+  void protect_mem_access_in_stmt (gimple *stmt);
+
+  /* Callback of walk_stmt_load_store_addr_ops, used to remove
+     unscalarizable accesses.  */
+  static bool visit_base (gimple *, tree op, tree, void *data);
+
+  /* Base (tree) -> Vector (vec<access_p> *) map.  */
+  hash_map<tree, auto_vec<access_p> > *base_access_vec;
+};
+
+bool
+expand_sra::collect_sra_candidates (void)
+{
+  bool ret = false;
+
+  /* Collect parameters.  */
+  for (tree parm = DECL_ARGUMENTS (current_function_decl); parm;
+       parm = DECL_CHAIN (parm))
+    ret |= add_sra_candidate (parm);
+
+  /* Collect VARs on returns.  */
+  if (DECL_RESULT (current_function_decl))
+    {
+      edge_iterator ei;
+      edge e;
+      FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
+	if (greturn *r = safe_dyn_cast<greturn *> (*gsi_last_bb (e->src)))
+	  {
+	    tree val = gimple_return_retval (r);
+	    /* To scalarize the return, the return value should only be
+	       written (except by this return stmt).  Use 'true (write)' to
+	       pretend the access is only a write.  */
+	    if (val && VAR_P (val))
+	      ret |= add_sra_candidate (val) && build_access (val, true);
+	  }
+    }
+
+  return ret;
+}
+
+bool
+expand_sra::add_sra_candidate (tree var)
+{
+  tree type = TREE_TYPE (var);
+
+  if (!AGGREGATE_TYPE_P (type) || !tree_fits_shwi_p (TYPE_SIZE (type))
+      || tree_to_shwi (TYPE_SIZE (type)) == 0 || TREE_THIS_VOLATILE (var)
+      || is_va_list_type (type))
+    return false;
+
+  base_access_vec->get_or_insert (var);
+
+  return true;
+}
+
+access_p
+expand_sra::build_access (tree expr, bool write)
+{
+  enum tree_code code = TREE_CODE (expr);
+  if (code != VAR_DECL && code != PARM_DECL && code != COMPONENT_REF
+      && code != ARRAY_REF && code != ARRAY_RANGE_REF)
+    return NULL;
+
+  HOST_WIDE_INT offset, size;
+  bool reverse;
+  tree base = get_ref_base_and_extent_hwi (expr, &offset, &size, &reverse);
+  if (!base || !DECL_P (base))
+    return NULL;
+
+  vec<access_p> *access_vec = base_access_vec->get (base);
+  if (!access_vec)
+    return NULL;
+
+  /* TODO: support reverse. */
+  if (reverse || size <= 0 || offset + size > tree_to_shwi (DECL_SIZE (base)))
+    {
+      base_access_vec->remove (base);
+      return NULL;
+    }
+
+  struct access *access = XNEWVEC (struct access, 1);
+
+  memset (access, 0, sizeof (struct access));
+  access->offset = offset;
+  access->size = size;
+  access->expr = expr;
+  access->write = write;
+  access->reverse = reverse;
+
+  access_vec->safe_push (access);
+
+  return access;
+}
+
+/* Function analyze_phi removes the SRA candidates if
+   there is addr-taken on the candidate in the PHI STMT.  */
+
+void
+expand_sra::analyze_phi (gphi *stmt)
+{
+  if (base_access_vec && !base_access_vec->is_empty ())
+    walk_stmt_load_store_addr_ops (stmt, this, NULL, NULL, visit_base);
+}
+
+void
+expand_sra::analyze_assign (gassign *stmt)
+{
+  if (!base_access_vec || base_access_vec->is_empty ())
+    return;
+
+  if (gimple_assign_single_p (stmt) && !gimple_clobber_p (stmt))
+    {
+      tree rhs = gimple_assign_rhs1 (stmt);
+      tree lhs = gimple_assign_lhs (stmt);
+      bool res_r = build_access (rhs, false);
+      bool res_l = build_access (lhs, true);
+
+      if (res_l || res_r)
+	return;
+    }
+
+  protect_mem_access_in_stmt (stmt);
+}
+
+/* Callback of walk_stmt_load_store_addr_ops, used to remove
+   unscalarizable accesses.  Called by protect_mem_access_in_stmt.  */
+
+bool
+expand_sra::visit_base (gimple *, tree op, tree, void *data)
+{
+  op = get_base_address (op);
+  if (op && DECL_P (op))
+    {
+      expand_sra *p = (expand_sra *) data;
+      p->base_access_vec->remove (op);
+    }
+  return false;
+}
+
+/* Function protect_mem_access_in_stmt removes the SRA candidates if
+   there is store/load/addr-taken on the candidate in the STMT.
+
+   For some statements, which SRA does not care about, if there are
+   possible memory operation on the SRA candidates, it would be risky
+   to scalarize it.  */
+
+void
+expand_sra::protect_mem_access_in_stmt (gimple *stmt)
+{
+  if (base_access_vec && !base_access_vec->is_empty ())
+    walk_stmt_load_store_addr_ops (stmt, this, visit_base, visit_base,
+				   visit_base);
+}
+
+expand_sra::expand_sra () : base_access_vec (NULL)
+{
+  if (optimize <= 0)
+    return;
+
+  base_access_vec = new hash_map<tree, auto_vec<access_p> >;
+  collect_sra_candidates ();
+}
+
+expand_sra::~expand_sra ()
+{
+  if (optimize <= 0)
+    return;
+
+  delete base_access_vec;
+}
+
+bool
+expand_sra::scalarizable_access (access_p acc, rtx regs, bool is_parm)
+{
+  /* Now only support reading from parms
+     or writing to returns.  */
+  if (is_parm && acc->write)
+    return false;
+  if (!is_parm && !acc->write)
+    return false;
+
+  machine_mode expr_mode = TYPE_MODE (TREE_TYPE (acc->expr));
+  /* TODO: Support non-blkmode of mult registers. */
+  if (expr_mode != BLKmode
+      && known_gt (acc->size, GET_MODE_BITSIZE (word_mode)))
+    return false;
+
+  /* Compute the position of the access in the parallel regs.  */
+  int start_index = -1;
+  int end_index = -1;
+  HOST_WIDE_INT left_bits = 0;
+  HOST_WIDE_INT right_bits = 0;
+  query_position_in_parallel (acc->offset, acc->size, regs, start_index,
+			      end_index, left_bits, right_bits);
+
+  /* Invalid access position: padding or out of bounds.  */
+  if (start_index < 0 || end_index < 0)
+    return false;
+
+  /* Need multi-registers in a parallel for the access.  */
+  if (expr_mode == BLKmode || end_index > start_index)
+    return !(left_bits || right_bits);
+
+  /* Just need one reg for the access.  */
+  if (end_index == start_index && left_bits == 0 && right_bits == 0)
+    return true;
+
+  /* Need to extract bits from the reg for the access.  */
+  if (!acc->write && end_index == start_index)
+    {
+      scalar_int_mode imode;
+      return int_mode_for_mode (expr_mode).exists (&imode);
+    }
+
+  return false;
+}
+
+/* Now, the base (parm/return) is scalarizable only if all
+   accesses of the BASE are scalarizable.
+
+   This function needs to be updated to support more complicated
+   cases, like:
+   - Some accesses are scalarizable, but some are not.
+   - Access is writing to a parameter.
+   - Writing accesses overlap with multiple accesses.  */
+
+bool
+expand_sra::scalarizable_accesses (tree base, rtx regs)
+{
+  if (!base_access_vec)
+    return false;
+  vec<access_p> *access_vec = base_access_vec->get (base);
+  if (!access_vec)
+    return false;
+  if (access_vec->is_empty ())
+    return false;
+
+  bool is_parm = TREE_CODE (base) == PARM_DECL;
+  int n = access_vec->length ();
+  int cur_access_index = 0;
+  for (; cur_access_index < n; cur_access_index++)
+    if (!scalarizable_access ((*access_vec)[cur_access_index], regs, is_parm))
+      break;
+
+  /* It is ok if all access are scalarizable.  */
+  if (cur_access_index == n)
+    return true;
+
+  base_access_vec->remove (base);
+  return false;
+}
+
+static expand_sra *current_sra = NULL;
+
+/* Check if the PARM (or return) is scalarizable.
+
+   This interface is used in expand_function_start
+   to check the SRA possibility for parameters.  */
+
+bool
+scalarizable_aggregate (tree parm, rtx regs)
+{
+  if (!current_sra)
+    return false;
+  return current_sra->scalarizable_accesses (parm, regs);
+}
+
+/* Check if interesting returns, and if they are scalarizable,
+   set DECL_RTL as scalar registers.
+
+   This interface is used in expand_function_start
+   when outgoing registers are determined for DECL_RESULT.  */
+
+void
+set_scalar_rtx_for_returns ()
+{
+  rtx res = DECL_RTL (DECL_RESULT (current_function_decl));
+  edge_iterator ei;
+  edge e;
+  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
+    if (greturn *r = safe_dyn_cast<greturn *> (*gsi_last_bb (e->src)))
+      {
+	tree val = gimple_return_retval (r);
+	if (val && VAR_P (val) && scalarizable_aggregate (val, res))
+	  SET_DECL_RTL (val, res);
+      }
+}
+
 /* Return an expression tree corresponding to the RHS of GIMPLE
    statement STMT.  */
 
@@ -3707,7 +4043,7 @@ expand_value_return (rtx val)
 
   tree decl = DECL_RESULT (current_function_decl);
   rtx return_reg = DECL_RTL (decl);
-  if (return_reg != val)
+  if (!rtx_equal_p (return_reg, val))
     {
       tree funtype = TREE_TYPE (current_function_decl);
       tree type = TREE_TYPE (decl);
@@ -4423,6 +4759,12 @@ expand_debug_expr (tree exp)
   addr_space_t as;
   scalar_int_mode op0_mode, op1_mode, addr_mode;
 
+  /* TODO: Enable to debug expand-sra optimized parm/returns.  */
+  tree base = get_base_address (exp);
+  if ((TREE_CODE (base) == PARM_DECL || (VAR_P (base) && DECL_RTL_SET_P (base)))
+      && GET_CODE (DECL_RTL (base)) == PARALLEL)
+    return NULL_RTX;
+
   switch (TREE_CODE_CLASS (TREE_CODE (exp)))
     {
     case tcc_expression:
@@ -6628,6 +6970,10 @@ pass_expand::execute (function *fun)
   auto_bitmap forced_stack_vars;
   discover_nonconstant_array_refs (forced_stack_vars);
 
+  /* Enable light-expander-sra.  */
+  current_sra = new expand_sra;
+  scan_function (cfun, *current_sra);
+
   /* Make sure all values used by the optimization passes have sane
      defaults.  */
   reg_renumber = 0;
@@ -7056,6 +7402,8 @@ pass_expand::execute (function *fun)
       loop_optimizer_finalize ();
     }
 
+  delete current_sra;
+  current_sra = NULL;
   timevar_pop (TV_POST_EXPAND);
 
   return 0;
diff --git a/gcc/cfgexpand.h b/gcc/cfgexpand.h
index 0e551f6cfd3..3415c217708 100644
--- a/gcc/cfgexpand.h
+++ b/gcc/cfgexpand.h
@@ -24,5 +24,7 @@ extern tree gimple_assign_rhs_to_tree (gimple *);
 extern HOST_WIDE_INT estimated_stack_frame_size (struct cgraph_node *);
 extern void set_parm_rtl (tree, rtx);
 
+extern bool scalarizable_aggregate (tree, rtx);
+extern void set_scalar_rtx_for_returns ();
 
 #endif /* GCC_CFGEXPAND_H */
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 4220cbd9f8f..20bc8eb0f16 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -5618,7 +5618,10 @@ expand_assignment (tree to, tree from, bool nontemporal)
      Assignment of an array element at a constant index, and assignment of
      an array element in an unaligned packed structure field, has the same
      problem.  Same for (partially) storing into a non-memory object.  */
-  if (handled_component_p (to)
+  if ((handled_component_p (to)
+       && !(VAR_P (get_base_address (to))
+	    && DECL_RTL_SET_P (get_base_address (to))
+	    && GET_CODE (DECL_RTL (get_base_address (to))) == PARALLEL))
       || (TREE_CODE (to) == MEM_REF
 	  && (REF_REVERSE_STORAGE_ORDER (to)
 	      || mem_ref_refers_to_non_mem_p (to)))
@@ -8909,6 +8912,19 @@ expand_constructor (tree exp, rtx target, enum expand_modifier modifier,
       && ! mostly_zeros_p (exp))
     return NULL_RTX;
 
+  if (target && GET_CODE (target) == PARALLEL && all_zeros_p (exp))
+    {
+      int length = XVECLEN (target, 0);
+      int start = XEXP (XVECEXP (target, 0, 0), 0) ? 0 : 1;
+      for (int i = start; i < length; i++)
+	{
+	  rtx dst = XEXP (XVECEXP (target, 0, i), 0);
+	  rtx zero = CONST0_RTX (GET_MODE (dst));
+	  emit_move_insn (dst, zero);
+	}
+      return target;
+    }
+
   /* Handle calls that pass values in multiple non-contiguous
      locations.  The Irix 6 ABI has examples of this.  */
   if (target == 0 || ! safe_from_p (target, exp, 1)
@@ -10628,6 +10644,113 @@ stmt_is_replaceable_p (gimple *stmt)
   return false;
 }
 
+/* In the parallel rtx register series REGS, compute the position for given
+   {BITPOS, BITSIZE}.
+   START_INDEX, END_INDEX, LEFT_BITS and RIGHT_BITS are computed outputs.  */
+
+void
+query_position_in_parallel (HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize,
+			    rtx regs, int &start_index, int &end_index,
+			    HOST_WIDE_INT &left_bits, HOST_WIDE_INT &right_bits)
+{
+  int cur_index = XEXP (XVECEXP (regs, 0, 0), 0) ? 0 : 1;
+  for (; cur_index < XVECLEN (regs, 0); cur_index++)
+    {
+      rtx slot = XVECEXP (regs, 0, cur_index);
+      HOST_WIDE_INT off = UINTVAL (XEXP (slot, 1)) * BITS_PER_UNIT;
+      machine_mode mode = GET_MODE (XEXP (slot, 0));
+      HOST_WIDE_INT size = GET_MODE_BITSIZE (mode).to_constant ();
+      if (off <= bitpos && off + size > bitpos)
+	{
+	  start_index = cur_index;
+	  left_bits = bitpos - off;
+	}
+      if (off + size >= bitpos + bitsize)
+	{
+	  end_index = cur_index;
+	  right_bits = off + size - (bitpos + bitsize);
+	  break;
+	}
+    }
+}
+
+static rtx
+extract_sub_member (rtx regs, HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize,
+		    tree expr)
+{
+  int start_index = -1;
+  int end_index = -1;
+  HOST_WIDE_INT left_bits = 0;
+  HOST_WIDE_INT right_bits = 0;
+  query_position_in_parallel (bitpos, bitsize, regs, start_index, end_index,
+			      left_bits, right_bits);
+
+  machine_mode expr_mode = TYPE_MODE (TREE_TYPE (expr));
+  /* Need multi-registers in a parallel for the access.  */
+  if (expr_mode == BLKmode || end_index > start_index)
+    {
+      int num_words = end_index - start_index + 1;
+      rtx *tmps = XALLOCAVEC (rtx, num_words);
+
+      int pos = 0;
+      HOST_WIDE_INT start;
+      start = UINTVAL (XEXP (XVECEXP (regs, 0, start_index), 1));
+      /* Extract whole registers.  */
+      for (; pos < num_words; pos++)
+	{
+	  int index = start_index + pos;
+	  rtx reg = XEXP (XVECEXP (regs, 0, index), 0);
+	  machine_mode mode = GET_MODE (reg);
+	  HOST_WIDE_INT off;
+	  off = UINTVAL (XEXP (XVECEXP (regs, 0, index), 1)) - start;
+	  tmps[pos] = gen_rtx_EXPR_LIST (mode, reg, GEN_INT (off));
+	}
+
+      rtx res = gen_rtx_PARALLEL (expr_mode, gen_rtvec_v (pos, tmps));
+      return res;
+    }
+
+  /* Just need one reg for the access.  */
+  if (end_index == start_index && left_bits == 0 && right_bits == 0)
+    {
+      rtx reg = XEXP (XVECEXP (regs, 0, start_index), 0);
+      if (GET_MODE (reg) != expr_mode)
+	reg = gen_lowpart (expr_mode, reg);
+      return reg;
+    }
+
+  /* Need to extract part reg for the access.  */
+  if (end_index == start_index)
+    {
+      rtx reg = XEXP (XVECEXP (regs, 0, start_index), 0);
+      bool sgn = TYPE_UNSIGNED (TREE_TYPE (expr));
+      scalar_int_mode imode;
+      if (!int_mode_for_mode (expr_mode).exists (&imode))
+	{
+	  gcc_assert (false);
+	  return NULL_RTX;
+	}
+
+      machine_mode mode = GET_MODE (reg);
+      bool reverse = false;
+      rtx bfld = extract_bit_field (reg, bitsize, left_bits, sgn, NULL_RTX,
+				    mode, imode, reverse, NULL);
+
+      if (GET_MODE (bfld) != imode)
+	bfld = gen_lowpart (imode, bfld);
+
+      if (expr_mode == imode)
+	return bfld;
+
+      /* expr_mode != imode, e.g. SF != SI.  */
+      rtx result = gen_reg_rtx (imode);
+      emit_move_insn (result, bfld);
+      return gen_lowpart (expr_mode, result);
+    }
+
+  return NULL_RTX;
+}
+
 rtx
 expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
 		    enum expand_modifier modifier, rtx *alt_rtl,
@@ -11505,6 +11628,16 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
 	  = expand_expr_real (tem, tem_target, VOIDmode, tem_modifier, NULL,
 			      true);
 
+	/* It is scalarizable access on param which is passed by registers.  */
+	if (GET_CODE (op0) == PARALLEL
+	    && (TREE_CODE (tem) == PARM_DECL || VAR_P (tem)))
+	  {
+	    HOST_WIDE_INT pos, size;
+	    size = bitsize.to_constant ();
+	    pos = bitpos.to_constant ();
+	    return extract_sub_member (op0, pos, size, exp);
+	  }
+
 	/* If the field has a mode, we want to access it in the
 	   field's mode, not the computed mode.
 	   If a MEM has VOIDmode (external with incomplete type),
diff --git a/gcc/expr.h b/gcc/expr.h
index 2a172867fdb..8a9332aaad6 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -362,5 +362,8 @@ extern rtx expr_size (tree);
 
 extern bool mem_ref_refers_to_non_mem_p (tree);
 extern bool non_mem_decl_p (tree);
+extern void query_position_in_parallel (HOST_WIDE_INT, HOST_WIDE_INT, rtx,
+					int &, int &, HOST_WIDE_INT &,
+					HOST_WIDE_INT &);
 
 #endif /* GCC_EXPR_H */
diff --git a/gcc/function.cc b/gcc/function.cc
index 336af28fb22..3f3d46e2a73 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -3107,8 +3107,29 @@ assign_parm_setup_block (struct assign_parm_data_all *all,
 	  emit_move_insn (mem, entry_parm);
 	}
       else
-	move_block_from_reg (REGNO (entry_parm), mem,
-			     size_stored / UNITS_PER_WORD);
+	{
+	  int regno = REGNO (entry_parm);
+	  int nregs = size_stored / UNITS_PER_WORD;
+	  rtx *tmps = XALLOCAVEC (rtx, nregs);
+	  machine_mode mode = word_mode;
+	  HOST_WIDE_INT word_size = GET_MODE_SIZE (mode).to_constant ();
+	  for (int i = 0; i < nregs; i++)
+	    {
+	      rtx reg = gen_rtx_REG (mode, regno + i);
+	      rtx off = GEN_INT (word_size * i);
+	      tmps[i] = gen_rtx_EXPR_LIST (VOIDmode, reg, off);
+	    }
+
+	  rtx regs = gen_rtx_PARALLEL (BLKmode, gen_rtvec_v (nregs, tmps));
+	  if (scalarizable_aggregate (parm, regs))
+	    {
+	      rtx pseudos = gen_group_rtx (regs);
+	      emit_group_move (pseudos, regs);
+	      stack_parm = pseudos;
+	    }
+	  else
+	    move_block_from_reg (regno, mem, nregs);
+	}
     }
   else if (data->stack_parm == 0 && !TYPE_EMPTY_P (data->arg.type))
     {
@@ -3710,7 +3731,15 @@ assign_parms (tree fndecl)
 
       assign_parm_adjust_stack_rtl (&data);
 
-      if (assign_parm_setup_block_p (&data))
+      rtx incoming = DECL_INCOMING_RTL (parm);
+      if (GET_CODE (incoming) == PARALLEL
+	  && scalarizable_aggregate (parm, incoming))
+	{
+	  rtx pseudos = gen_group_rtx (incoming);
+	  emit_group_move (pseudos, incoming);
+	  set_parm_rtl (parm, pseudos);
+	}
+      else if (assign_parm_setup_block_p (&data))
 	assign_parm_setup_block (&all, parm, &data);
       else if (data.arg.pass_by_reference || use_register_for_decl (parm))
 	assign_parm_setup_reg (&all, parm, &data);
@@ -5128,6 +5157,7 @@ expand_function_start (tree subr)
 	    {
 	      gcc_assert (GET_CODE (hard_reg) == PARALLEL);
 	      set_parm_rtl (res, gen_group_rtx (hard_reg));
+	      set_scalar_rtx_for_returns ();
 	    }
 	}
 
diff --git a/gcc/tree-sra.h b/gcc/tree-sra.h
index f20266c4622..df3071ccf6e 100644
--- a/gcc/tree-sra.h
+++ b/gcc/tree-sra.h
@@ -19,6 +19,82 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
+struct sra_base_access
+{
+  /* Values returned by get_ref_base_and_extent, indicates the
+     OFFSET, SIZE and BASE of the access.  */
+  HOST_WIDE_INT offset;
+  HOST_WIDE_INT size;
+
+  /* The context expression of this access.  */
+  tree expr;
+
+  /* Indicates this is a write access.  */
+  bool write : 1;
+
+  /* Indicates if this access is made in reverse storage order.  */
+  bool reverse : 1;
+};
+
+/* Default template for sra_scan_function.  */
+
+struct sra_default_analyzer
+{
+  /* Template analyze functions.  */
+  void analyze_phi (gphi *){};
+  void pre_analyze_stmt (gimple *){};
+  void analyze_return (greturn *){};
+  void analyze_assign (gassign *){};
+  void analyze_call (gcall *){};
+  void analyze_asm (gasm *){};
+  void analyze_default_stmt (gimple *){};
+};
+
+/* Scan function and look for interesting expressions.  */
+
+template <typename analyzer>
+void
+scan_function (struct function *fun, analyzer &a)
+{
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi);
+	   gsi_next (&gsi))
+	a.analyze_phi (gsi.phi ());
+
+      for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
+	   gsi_next (&gsi))
+	{
+	  gimple *stmt = gsi_stmt (gsi);
+	  a.pre_analyze_stmt (stmt);
+
+	  switch (gimple_code (stmt))
+	    {
+	    case GIMPLE_RETURN:
+	      a.analyze_return (as_a<greturn *> (stmt));
+	      break;
+
+	    case GIMPLE_ASSIGN:
+	      a.analyze_assign (as_a<gassign *> (stmt));
+	      break;
+
+	    case GIMPLE_CALL:
+	      a.analyze_call (as_a<gcall *> (stmt));
+	      break;
+
+	    case GIMPLE_ASM:
+	      a.analyze_asm (as_a<gasm *> (stmt));
+	      break;
+
+	    default:
+	      a.analyze_default_stmt (stmt);
+	      break;
+	    }
+	}
+    }
+}
+
 bool type_internals_preclude_sra_p (tree type, const char **msg);
 
 /* Return true iff TYPE is stdarg va_list type (which early SRA and IPA-SRA
diff --git a/gcc/var-tracking.cc b/gcc/var-tracking.cc
index d8dafa5481a..7fc801f2612 100644
--- a/gcc/var-tracking.cc
+++ b/gcc/var-tracking.cc
@@ -5352,7 +5352,8 @@ track_loc_p (rtx loc, tree expr, poly_int64 offset, bool store_reg_p,
      because the real and imaginary parts are represented as separate
      pseudo registers, even if the whole complex value fits into one
      hard register.  */
-  if ((paradoxical_subreg_p (mode, DECL_MODE (expr))
+  if (((DECL_MODE (expr) != BLKmode
+	&& paradoxical_subreg_p (mode, DECL_MODE (expr)))
        || (store_reg_p
 	   && !COMPLEX_MODE_P (DECL_MODE (expr))
 	   && hard_regno_nregs (REGNO (loc), DECL_MODE (expr)) == 1))
diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C b/gcc/testsuite/g++.target/powerpc/pr102024.C
index 769585052b5..c8995cae707 100644
--- a/gcc/testsuite/g++.target/powerpc/pr102024.C
+++ b/gcc/testsuite/g++.target/powerpc/pr102024.C
@@ -5,7 +5,7 @@
 // Test that a zero-width bit field in an otherwise homogeneous aggregate
 // generates a psabi warning and passes arguments in GPRs.
 
-// { dg-final { scan-assembler-times {\mstd\M} 4 } }
+// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 } }
 
 struct a_thing
 {
diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073.c b/gcc/testsuite/gcc.target/powerpc/pr108073.c
new file mode 100644
index 00000000000..7dd1a4a326a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr108073.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -save-temps" } */
+
+typedef struct DF {double a[4]; short s1; short s2; short s3; short s4; } DF;
+typedef struct SF {float a[4]; int i1; int i2; } SF;
+
+/* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 {target { has_arch_ppc64 && has_arch_pwr8 } } } } */
+/* { dg-final { scan-assembler-not {\mlwz\M} {target { has_arch_ppc64 && has_arch_pwr8 } } } } */
+/* { dg-final { scan-assembler-not {\mlhz\M} {target { has_arch_ppc64 && has_arch_pwr8 } } } } */
+short  __attribute__ ((noipa)) foo_hi (DF a, int flag){if (flag == 2)return a.s2+a.s3;return 0;}
+int  __attribute__ ((noipa)) foo_si (SF a, int flag){if (flag == 2)return a.i2+a.i1;return 0;}
+double __attribute__ ((noipa)) foo_df (DF arg, int flag){if (flag == 2)return arg.a[3];else return 0.0;}
+float  __attribute__ ((noipa)) foo_sf (SF arg, int flag){if (flag == 2)return arg.a[2]; return 0;}
+float  __attribute__ ((noipa)) foo_sf1 (SF arg, int flag){if (flag == 2)return arg.a[1];return 0;}
+
+DF gdf = {{1.0,2.0,3.0,4.0}, 1, 2, 3, 4};
+SF gsf = {{1.0f,2.0f,3.0f,4.0f}, 1, 2};
+
+int main()
+{
+  if (!(foo_hi (gdf, 2) == 5 && foo_si (gsf, 2) == 3 && foo_df (gdf, 2) == 4.0
+	&& foo_sf (gsf, 2) == 3.0 && foo_sf1 (gsf, 2) == 2.0))
+    __builtin_abort ();
+  if (!(foo_hi (gdf, 1) == 0 && foo_si (gsf, 1) == 0 && foo_df (gdf, 1) == 0
+	&& foo_sf (gsf, 1) == 0 && foo_sf1 (gsf, 1) == 0))
+    __builtin_abort ();
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-1.c b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
new file mode 100644
index 00000000000..4e1f87f7939
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
@@ -0,0 +1,6 @@
+/* PR target/65421 */
+/* { dg-options "-O2" } */
+
+typedef struct LARGE {double a[4]; int arr[32];} LARGE;
+LARGE foo (LARGE a){return a;}
+/* { dg-final { scan-assembler-times {\mmemcpy\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-2.c b/gcc/testsuite/gcc.target/powerpc/pr65421-2.c
new file mode 100644
index 00000000000..8a8e1a0e996
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr65421-2.c
@@ -0,0 +1,32 @@
+/* PR target/65421 */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target powerpc_elfv2 } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+typedef struct FLOATS
+{
+  double a[3];
+} FLOATS;
+
+/* 3 lfd after returns also optimized */
+/* FLOATS ret_arg_pt (FLOATS *a){return *a;} */
+
+/* 3 stfd */
+void st_arg (FLOATS a, FLOATS *p) {*p = a;}
+/* { dg-final { scan-assembler-times {\mstfd\M} 3 } } */
+
+/* blr */
+FLOATS ret_arg (FLOATS a) {return a;}
+
+typedef struct MIX
+{
+  double a[2];
+  long l;
+} MIX;
+
+/* std 3 param regs to return slot */
+MIX ret_arg1 (MIX a) {return a;}
+/* { dg-final { scan-assembler-times {\mstd\M} 3 } } */
+
+/* count insns */
+/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9 } } */
-- 
2.25.1

