public inbox for gcc-patches@gcc.gnu.org
* [PATCH V2] Use subscalar mode to move struct block for parameter
@ 2022-11-17  6:15 Jiufu Guo
From: Jiufu Guo @ 2022-11-17  6:15 UTC
  To: gcc-patches; +Cc: segher, dje.gcc, linkw, guojiufu, rguenther, jeffreyalaw

Hi,

As mentioned in the previous version of this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html
suboptimal code is generated when "assigning from a parameter" or
"assigning to the return value".
This patch improves assignments from parameters, as in the cases
below:
/////case1.c
typedef struct SA {double a[3];long l; } A;
A ret_arg (A a) {return a;}
void st_arg (A a, A *p) {*p = a;}

////case2.c
typedef struct SA {double a[3];} A;
A ret_arg (A a) {return a;}
void st_arg (A a, A *p) {*p = a;}
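Conceptually, the patch replaces the block copy in these assignments with a sequence of register-sized loads and stores. The following is only a source-level sketch, assuming 8-byte pieces (the patch itself takes the mode from the first element of the incoming PARALLEL, or uses word_mode for a REG); `copy_by_chunks` and `CHUNK` are hypothetical names, not part of GCC:

```c
#include <stdint.h>
#include <string.h>

typedef struct SA { double a[3]; long l; } A;

/* Assumed piece size: 8 bytes, like word_mode on a 64-bit target.  */
#define CHUNK sizeof (uint64_t)

/* Hypothetical helper: copy SIZE bytes from SRC to DST one 8-byte piece
   at a time, each piece passing through a scalar temporary (the analogue
   of gen_reg_rtx (mode) in the patch).  If SIZE is not a multiple of the
   piece size, fall back to a plain block copy, much as the patch falls
   back to store_expr.  */
static void
copy_by_chunks (void *dst, const void *src, size_t size)
{
  if (size % CHUNK != 0)
    {
      memcpy (dst, src, size);
      return;
    }
  for (size_t off = 0; off < size; off += CHUNK)
    {
      uint64_t temp;
      memcpy (&temp, (const char *) src + off, CHUNK);  /* load piece */
      memcpy ((char *) dst + off, &temp, CHUNK);        /* store piece */
    }
}

/* Same effect as "*p = a;" in case1.c.  */
void
st_arg (A a, A *p)
{
  copy_by_chunks (p, &a, sizeof (A));
}
```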

With this patch, bootstrap and regtest pass on ppc64{,le}
and x86_64.
* Besides asking for help reviewing this patch, I would like to
ask for comments on improving the "assigning to returns" case.

On some targets (such as ppc64), for the case below:
////case3.c
typedef struct SA {double a[3]; long l; } A;
A ret_arg_pt (A *a) {return *a;}

The optimized GIMPLE code looks like:
  <retval> = *a_2(D);
  return <retval>;
Here, <retval> (aka. the RESULT_DECL) is a MEM, and "aggregate_value_p"
returns true for <retval>.

* However, for the case below, the generated code is still suboptimal:
////case4.c
typedef struct SA {double a[3];} A;
A ret_arg_pt (A *a) {return *a;}

The optimized GIMPLE code looks like:
  D.3951 = *a_2(D);
  return D.3951;
The assignment and return stmts use D.3951 (a VAR_DECL) instead of
<retval> (the RESULT_DECL).  The mode of both D.3951 and <retval> is
BLKmode.  The RTL of D.3951 is a MEM, while the RTL of <retval> is a
PARALLEL, and aggregate_value_p returns false for a PARALLEL.

In expand_assignment, there is this check:
  if (TREE_CODE (to) == RESULT_DECL
      && (REG_P (to_rtx) || GET_CODE (to_rtx) == PARALLEL))
It handles assignments to "<retval>", but not to "D.3951".

One way I am considering to handle this is to rewrite the GIMPLE
sequence as: "<retval> = *a_2(D); return <retval>;".
Another is to collect the VARs that are used by return stmts and,
for assignments to those VARs, use sub-scalar modes for the block
move.
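The second idea can be pictured on a toy statement list: one pass collects the variables that appear in return statements, and a second pass flags assignments to those variables as candidates for the sub-scalar block move. This is a hypothetical illustration in plain C; `struct stmt`, `mark_return_var_assignments` and the integer variable ids are invented for the sketch and are not GCC's GIMPLE API:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy IR, invented for illustration only (not GIMPLE).  */
enum stmt_kind { STMT_ASSIGN, STMT_RETURN };

struct stmt
{
  enum stmt_kind kind;
  int lhs;                  /* variable assigned by STMT_ASSIGN; -1 otherwise */
  int rhs;                  /* variable read; for STMT_RETURN, the returned var */
  bool use_subscalar_move;  /* output: set on assignments to returned vars */
};

#define MAX_VARS 64

/* Pass 1: record every variable that is returned.
   Pass 2: flag each assignment whose LHS is one of those variables.  */
static void
mark_return_var_assignments (struct stmt *stmts, size_t n)
{
  bool returned[MAX_VARS] = { false };

  for (size_t i = 0; i < n; i++)
    if (stmts[i].kind == STMT_RETURN
        && stmts[i].rhs >= 0 && stmts[i].rhs < MAX_VARS)
      returned[stmts[i].rhs] = true;

  for (size_t i = 0; i < n; i++)
    stmts[i].use_subscalar_move
      = (stmts[i].kind == STMT_ASSIGN
         && stmts[i].lhs >= 0 && stmts[i].lhs < MAX_VARS
         && returned[stmts[i].lhs]);
}
```

Modelling case4 with var 0 as D.3951 and var 1 as *a_2(D), the statement "D.3951 = *a_2(D);" would be flagged, while assignments to variables that are never returned would not.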

Thanks for any comments and suggestions!


BR,
Jeff (Jiufu)

---
 gcc/expr.cc | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index d9407432ea5..420f9cf3662 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -6045,6 +6045,46 @@ expand_assignment (tree to, tree from, bool nontemporal)
       return;
     }
 
+  if (TREE_CODE (from) == PARM_DECL && DECL_INCOMING_RTL (from)
+      && TYPE_MODE (TREE_TYPE (from)) == BLKmode
+      && (GET_CODE (DECL_INCOMING_RTL (from)) == PARALLEL
+	  || REG_P (DECL_INCOMING_RTL (from))))
+    {
+      rtx parm = DECL_INCOMING_RTL (from);
+
+      push_temp_slots ();
+      machine_mode mode;
+      mode = GET_CODE (parm) == PARALLEL
+	       ? GET_MODE (XEXP (XVECEXP (parm, 0, 0), 0))
+	       : word_mode;
+      int mode_size = GET_MODE_SIZE (mode).to_constant ();
+      int size = INTVAL (expr_size (from));
+
+      /* Whether and how the parameter is moved in sub-modes depends on
+	 its size and position.  A heuristic limit is used here.  */
+      int hurstc_num = 8;
+      if (size < mode_size || (size % mode_size) != 0
+	  || size > (mode_size * hurstc_num))
+	result = store_expr (from, to_rtx, 0, nontemporal, false);
+      else
+	{
+	  rtx from_rtx
+	    = expand_expr (from, NULL_RTX, GET_MODE (to_rtx), EXPAND_NORMAL);
+	  for (int i = 0; i < size / mode_size; i++)
+	    {
+	      rtx temp = gen_reg_rtx (mode);
+	      rtx src = adjust_address (from_rtx, mode, mode_size * i);
+	      rtx dest = adjust_address (to_rtx, mode, mode_size * i);
+	      emit_move_insn (temp, src);
+	      emit_move_insn (dest, temp);
+	    }
+	  result = to_rtx;
+	}
+      preserve_temp_slots (result);
+      pop_temp_slots ();
+      return;
+    }
+
   /* Compute FROM and store the value in the rtx we got.  */
 
   push_temp_slots ();
-- 
2.17.1
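The size check in the patch chooses between the per-piece copy and the existing store_expr path. Restated as a standalone C function, as a sketch only: `use_submode_moves` is a hypothetical name, and SIZE and MODE_SIZE are assumed to be the byte counts the patch obtains from expr_size and GET_MODE_SIZE:

```c
#include <stdbool.h>

/* Mirrors the patch's gating condition.  Returns true when the block
   should be moved as SIZE / MODE_SIZE load/store pairs, false when the
   patch falls back to store_expr.  */
static bool
use_submode_moves (int size, int mode_size)
{
  const int hurstc_num = 8;  /* the patch's heuristic limit on moves */

  if (size < mode_size                      /* smaller than one piece */
      || size % mode_size != 0              /* not a whole number of pieces */
      || size > mode_size * hurstc_num)     /* too many pieces */
    return false;
  return true;
}
```

With 8-byte pieces, a 24- or 32-byte struct (the sizes of the structs in case2 and case1 on an LP64 target) passes the check, while a 12-byte or 72-byte one falls back to store_expr.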



Thread overview: 11+ messages
2022-11-17  6:15 [PATCH V2] Use subscalar mode to move struct block for parameter Jiufu Guo
2022-11-21  3:07 ` Jiufu Guo
2022-11-22 21:57   ` Jeff Law
2022-11-23  2:58     ` Jiufu Guo
2022-11-24  7:31       ` Richard Biener
2022-11-25  5:05         ` Jiufu Guo
2022-11-25 12:29           ` Jiufu Guo
2022-11-28 17:00       ` Jeff Law
2022-11-29  3:53         ` Jiufu Guo
2022-12-05 16:48           ` Jeff Law
2022-12-06  2:36             ` Jiufu Guo
