public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Jiufu Guo <guojiufu@linux.ibm.com>
To: Jeff Law <jeffreyalaw@gmail.com>
Cc: gcc-patches@gcc.gnu.org, rguenther@suse.de,
	segher@kernel.crashing.org, dje.gcc@gmail.com, linkw@gcc.gnu.org,
	meissner@linux.ibm.com
Subject: Re: [PATCH V5] Use reg mode to move sub blocks for parameters and returns
Date: Tue, 09 May 2023 21:43:45 +0800	[thread overview]
Message-ID: <7n3545d3hq.fsf@ltcden2-lp1.aus.stglabs.ibm.com> (raw)
In-Reply-To: <0473d90c-7ee0-2989-16b9-234f422e0e3c@gmail.com> (Jeff Law's message of "Fri, 5 May 2023 17:37:05 -0600")


Hi,

Jeff Law <jeffreyalaw@gmail.com> writes:

> On 5/3/23 23:49, guojiufu wrote:
>> Hi,
>>
>> On 2023-05-01 03:00, Jeff Law wrote:
>>> On 3/16/23 21:39, Jiufu Guo wrote:
>>>> Hi,
>>>>
>>>> When assigning a parameter to a variable, or assigning a variable to
>>>> return value with struct type, and the parameter/return is passed
>>>> through registers.
>>>> For this kind of case, it would be better to use the nature mode of
>>>> the registers to move the content for the assignment.
>>>>
>>>> As the example code (like code in PR65421):
>>>>
>>>> typedef struct SA {double a[3];} A;
>>>> A ret_arg_pt (A *a) {return *a;} // on ppc64le, expect only 3 lfd(s)
>>>> A ret_arg (A a) {return a;} // just empty fun body
>>>> void st_arg (A a, A *p) {*p = a;} //only 3 stfd(s)
>>>>
>>>> Comparing with previous version:
>>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609394.html
>>>> This version refine code to eliminated reductant code in  the sub
>>>> routine "move_sub_blocks".
>>>>
>>>> Bootstrap and regtest pass on ppc64{,le}.
>>>> Is this ok for trunk?
>>>>
>> ...
>>
>>>> diff --git a/gcc/expr.cc b/gcc/expr.cc
>>>> index 15be1c8db99..97a7be9542e 100644
>>>> --- a/gcc/expr.cc
>>>> +++ b/gcc/expr.cc
>>>> @@ -5559,6 +5559,41 @@ mem_ref_refers_to_non_mem_p (tree ref)
>>>>     return non_mem_decl_p (base);
>>>>   }
>>>>   +/* Sub routine of expand_assignment, invoked when assigning from a
>>>> +   parameter or assigning to a return val on struct type which may
>>>> +   be passed through registers.  The mode of register is used to
>>>> +   move the content for the assignment.
>>>> +
>>>> +   This routine generates code for expression FROM which is BLKmode,
>>>> +   and move the generated content to TO_RTX by su-blocks in SUB_MODE.  */
>>>> +
>>>> +static void
>>>> +move_sub_blocks (rtx to_rtx, tree from, machine_mode sub_mode)
>>>> +{
>>>> +  gcc_assert (MEM_P (to_rtx));
>>>> +
>>>> +  HOST_WIDE_INT size = MEM_SIZE (to_rtx).to_constant ();
>>> Consider the case of a BLKmode return value.  Isn't TO_RTX in this
>>> case a BLKmode object?
>>
>> Thanks for this question!
>>
>> Yes, the mode of TO_RTX is BLKmode.
>> As we know, when the function returns via registers, the mode of
>> the `return-rtx` could also be BLKmode.  This patch is going to
>> improve these kinds of cases.
>>
>> For example:
>> ```
>> typedef struct FLOATS
>> {
>>    double a[3];
>> } FLOATS;
>> FLOATS ret_arg_pt (FLOATS *a){return *a;}
>> ```
>>
>> D.3952 = *a_2(D); //this patch enhance this assignment
>> return D.3952;
>>
>> The mode is BLKmode for the rtx of `D.3952` is BLKmode, and the
>> rtx for "DECL_RESULT(current_function_decl)".  And the DECL_RESULT
>> represents the return registers.
> I didn't think MEM_SIZE worked for BLKmode.  BUt looking at its
> definition, it's pulling the size out of the attributes rather than
> from the mode.  SO I guess there's a reasonable chance it's going to
> work :-)

Thanks for point out this!  Yes, BLKmode rtx may not always be a MEM.
MEM_SIZE is only ok for MEM after the it's known size is computed.
Here MEM_SIZE is fine just because it is an stack rtx corresponding
to the type of parameter and returns which has been computed.

I updated the patch to resolve the conflicts with the trunk, and
retest bootstrap&testsuite, and then updated the patch a new version.

And this version pass bootstrap and regtest on ppc64{,le}, x86_64. 

The major change is 'move_sub_blocks' only handles the case when
the block size can be move by same submode, or say (size % sub_size)
is 0.  If no objection, I would committed the new version.

BR,
Jeff (Jiufu)

gcc/ChangeLog:

	* cfgexpand.cc (expand_used_vars): Update to mark DECL_USEDBY_RETURN_P
	for returns.
	* expr.cc (move_sub_blocks): New function.
	(expand_assignment): Update assignment code about returns/parameters.
	* function.cc (assign_parm_setup_block): Update to mark
	DECL_REGS_TO_STACK_P for parameter.
	* tree-core.h (struct tree_decl_common): Add comment.
	* tree.h (DECL_USEDBY_RETURN_P): New define.
	(DECL_REGS_TO_STACK_P): New define.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr65421-1.c: New test.
	* gcc.target/powerpc/pr65421-2.c: New test.
---
 gcc/cfgexpand.cc                             | 14 +++++
 gcc/expr.cc                                  | 62 ++++++++++++++++++++
 gcc/function.cc                              |  3 +
 gcc/tree-core.h                              |  4 +-
 gcc/tree.h                                   |  9 +++
 gcc/testsuite/gcc.target/powerpc/pr65421-1.c |  6 ++
 gcc/testsuite/gcc.target/powerpc/pr65421-2.c | 33 +++++++++++
 7 files changed, 130 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-2.c

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index 1a1b26b1c6c..7b6a2216492 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -2158,6 +2158,20 @@ expand_used_vars (bitmap forced_stack_vars)
     frame_phase = off ? align - off : 0;
   }
 
+  /* Collect VARs on returns.  */
+  if (DECL_RESULT (current_function_decl))
+    {
+      edge_iterator ei;
+      edge e;
+      FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
+	if (greturn *ret = safe_dyn_cast<greturn *> (*gsi_last_bb (e->src)))
+	  {
+	    tree val = gimple_return_retval (ret);
+	    if (val && VAR_P (val))
+	      DECL_USEDBY_RETURN_P (val) = 1;
+	  }
+    }
+
   /* Set TREE_USED on all variables in the local_decls.  */
   FOR_EACH_LOCAL_DECL (cfun, i, var)
     TREE_USED (var) = 1;
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 758dda9ec68..5ed98b477c3 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -5556,6 +5556,42 @@ mem_ref_refers_to_non_mem_p (tree ref)
   return non_mem_decl_p (base);
 }
 
+/* Sub routine of expand_assignment, invoked when assigning from a
+   parameter or assigning to a return val on struct type which may
+   be passed through registers.  The mode of register is used to
+   move the content for the assignment.
+
+   This routine generates code for expression FROM which is BLKmode,
+   and move the generated content to TO_RTX by su-blocks in SUB_MODE.  */
+
+static bool
+move_sub_blocks (rtx to_rtx, tree from, machine_mode sub_mode)
+{
+  gcc_assert (MEM_P (to_rtx));
+
+  HOST_WIDE_INT size = MEM_SIZE (to_rtx).to_constant ();
+  HOST_WIDE_INT sub_size = GET_MODE_SIZE (sub_mode).to_constant ();
+  HOST_WIDE_INT len = size / sub_size;
+  if (size % sub_size != 0)
+    return false;
+
+  push_temp_slots ();
+
+  rtx from_rtx = expand_expr (from, NULL_RTX, GET_MODE (to_rtx), EXPAND_NORMAL);
+  for (int i = 0; i < len; i++)
+    {
+      rtx temp = gen_reg_rtx (sub_mode);
+      rtx src = adjust_address (from_rtx, sub_mode, sub_size * i);
+      rtx dest = adjust_address (to_rtx, sub_mode, sub_size * i);
+      emit_move_insn (temp, src);
+      emit_move_insn (dest, temp);
+    }
+
+  preserve_temp_slots (to_rtx);
+  pop_temp_slots ();
+  return true;
+}
+
 /* Expand an assignment that stores the value of FROM into TO.  If NONTEMPORAL
    is true, try generating a nontemporal store.  */
 
@@ -6042,6 +6078,32 @@ expand_assignment (tree to, tree from, bool nontemporal)
       return;
     }
 
+  /* If it is assigning from a struct param which may be passed via registers,
+     it would be better to use the register's mode to move sub-blocks for the
+     assignment.  */
+  if (TREE_CODE (from) == PARM_DECL && mode == BLKmode
+      && DECL_REGS_TO_STACK_P (from))
+    {
+      rtx parm = DECL_INCOMING_RTL (from);
+      machine_mode sub_mode
+	= REG_P (parm) ? word_mode : GET_MODE (XEXP (XVECEXP (parm, 0, 0), 0));
+      if (move_sub_blocks (to_rtx, from, sub_mode))
+	return;
+    }
+
+  /* If it is assigning to a struct var which will be returned, and the
+     function is returning via registers, it would be better to use the
+     register's mode to move sub-blocks for the assignment.  */
+  if (VAR_P (to) && DECL_USEDBY_RETURN_P (to) && mode == BLKmode
+      && TREE_CODE (from) != CONSTRUCTOR
+      && GET_CODE (DECL_RTL (DECL_RESULT (current_function_decl))) == PARALLEL)
+    {
+      rtx ret = DECL_RTL (DECL_RESULT (current_function_decl));
+      machine_mode sub_mode = GET_MODE (XEXP (XVECEXP (ret, 0, 0), 0));
+      if (move_sub_blocks (to_rtx, from, sub_mode))
+	return;
+    }
+
   /* Compute FROM and store the value in the rtx we got.  */
 
   push_temp_slots ();
diff --git a/gcc/function.cc b/gcc/function.cc
index f0ae641512d..bc291a95b32 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -2990,6 +2990,9 @@ assign_parm_setup_block (struct assign_parm_data_all *all,
 
       mem = validize_mem (copy_rtx (stack_parm));
 
+      if (MEM_P (mem) && size != 0 && size % UNITS_PER_WORD == 0)
+	DECL_REGS_TO_STACK_P (parm) = 1;
+
       /* Handle values in multiple non-contiguous locations.  */
       if (GET_CODE (entry_parm) == PARALLEL && !MEM_P (mem))
 	emit_group_store (mem, entry_parm, data->arg.type, size);
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 847f0b1e994..7543f7a57b2 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1802,7 +1802,9 @@ struct GTY(()) tree_decl_common {
      In VAR_DECL, PARM_DECL and RESULT_DECL, this is
      DECL_HAS_VALUE_EXPR_P.  */
   unsigned decl_flag_2 : 1;
-  /* In FIELD_DECL, this is DECL_PADDING_P.  */
+  /* In FIELD_DECL, this is DECL_PADDING_P
+     In VAR_DECL, this is DECL_USEDBY_RETURN_P
+     In PARM_DECL, this is DECL_REGS_TO_STACK_P.  */
   unsigned decl_flag_3 : 1;
   /* Logically, these two would go in a theoretical base shared by var and
      parm decl. */
diff --git a/gcc/tree.h b/gcc/tree.h
index 0b72663e6a1..6f8e7cabed3 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -3019,6 +3019,15 @@ extern void decl_value_expr_insert (tree, tree);
 #define DECL_PADDING_P(NODE) \
   (FIELD_DECL_CHECK (NODE)->decl_common.decl_flag_3)
 
+/* Used in a VAR_DECL to indicate that it is used by a return stmt.  */
+#define DECL_USEDBY_RETURN_P(NODE) \
+  (VAR_DECL_CHECK (NODE)->decl_common.decl_flag_3)
+
+/* Used in a PARM_DECL to indicate that it is struct parameter passed
+   by registers totally and stored to stack during setup.  */
+#define DECL_REGS_TO_STACK_P(NODE) \
+  (PARM_DECL_CHECK (NODE)->decl_common.decl_flag_3)
+
 /* Used in a FIELD_DECL to indicate whether this field is not a flexible
    array member. This is only valid for the last array type field of a
    structure.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-1.c b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
new file mode 100644
index 00000000000..4e1f87f7939
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
@@ -0,0 +1,6 @@
+/* PR target/65421 */
+/* { dg-options "-O2" } */
+
+typedef struct LARGE {double a[4]; int arr[32];} LARGE;
+LARGE foo (LARGE a){return a;}
+/* { dg-final { scan-assembler-times {\mmemcpy\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-2.c b/gcc/testsuite/gcc.target/powerpc/pr65421-2.c
new file mode 100644
index 00000000000..fd5ad542c64
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr65421-2.c
@@ -0,0 +1,33 @@
+/* PR target/65421 */
+/* { dg-options "-O2" } */
+/* { dg-require-effective-target powerpc_elfv2 } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+typedef struct FLOATS
+{
+  double a[3];
+} FLOATS;
+
+/* 3 lfd */
+FLOATS ret_arg_pt (FLOATS *a){return *a;}
+/* { dg-final { scan-assembler-times {\mlfd\M} 3 } } */
+
+/* 3 stfd */
+void st_arg (FLOATS a, FLOATS *p) {*p = a;}
+/* { dg-final { scan-assembler-times {\mstfd\M} 3 } } */
+
+/* blr */
+FLOATS ret_arg (FLOATS a) {return a;}
+
+typedef struct MIX
+{
+  double a[2];
+  long l;
+} MIX;
+
+/* std 3 param regs to return slot */
+MIX ret_arg1 (MIX a) {return a;}
+/* { dg-final { scan-assembler-times {\mstd\M} 3 } } */
+
+/* count insns */
+/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 13 } } */
-- 
2.31.1


>
> OK for the trunk.
>
> jeff

  reply	other threads:[~2023-05-09 13:43 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-17  3:39 Jiufu Guo
2023-04-30 19:00 ` Jeff Law
2023-05-04  5:49   ` guojiufu
2023-05-05 23:37     ` Jeff Law
2023-05-09 13:43       ` Jiufu Guo [this message]
2023-06-04 16:59         ` Jeff Law
2023-06-07  1:21           ` guojiufu
2023-05-01 15:52 ` Segher Boessenkool
2023-05-04  5:55   ` guojiufu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7n3545d3hq.fsf@ltcden2-lp1.aus.stglabs.ibm.com \
    --to=guojiufu@linux.ibm.com \
    --cc=dje.gcc@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jeffreyalaw@gmail.com \
    --cc=linkw@gcc.gnu.org \
    --cc=meissner@linux.ibm.com \
    --cc=rguenther@suse.de \
    --cc=segher@kernel.crashing.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).