From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jiufu Guo
To: gcc-patches@gcc.gnu.org
Cc: segher@kernel.crashing.org, dje.gcc@gmail.com, linkw@gcc.gnu.org, rguenther@suse.de, jeffreyalaw@gmail.com
Subject: Ping: [PATCH V4] Use reg mode to move sub blocks for parameters and returns
References: <20230104124439.191858-1-guojiufu@linux.ibm.com>
Date: Thu, 02 Mar 2023 17:21:54 +0800
In-Reply-To: <20230104124439.191858-1-guojiufu@linux.ibm.com> (Jiufu Guo's message of "Wed, 4 Jan 2023 20:44:39 +0800")
Message-ID: <7nmt4vld8t.fsf@ltcden2-lp1.aus.stglabs.ibm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
Content-Type: text/plain
MIME-Version: 1.0
List-Id:

Hi,

Ping this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609394.html

Thanks for any comments and suggestions!

BR,
Jeff (Jiufu)

Jiufu Guo writes:

> Hi,
>
> When assigning a parameter to a variable, or assigning a variable to
> a return value of struct type, a "block move" may be used to expand
> the assignment if the parameter/return is passed through registers
> and has BLKmode.
> In this case, it would be better to move the blocks using the
> natural mode of the registers.
> This would open up more opportunities for other optimization passes
> (cse, dse, xprop).
>
> Take this example (similar to the code in PR65421):
>
> typedef struct SA {double a[3];} A;
> A ret_arg_pt (A *a) {return *a;} // on ppc64le, expect only 3 lfd(s)
> A ret_arg (A a) {return a;} // just empty fun body
> void st_arg (A a, A *p) {*p = a;} //only 3 stfd(s)
>
> This patch checks the "from" and "to" of an assignment in
> "expand_assignment"; if it involves a param/ret which may be passed
> via registers, then the register's natural mode is used to move
> sub-blocks for the assignment.
>
> This patch may still be useful even if we change the behavior of
> parameter setup or adopt SRA-like code in the expander.
>
> Compared with the previous version:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608081.html
> this patch updates the code slightly and merges/adds test cases.
> I also checked cases with large structs and non-homogeneous structs
> to confirm it does not degrade the code.
>
> Bootstrap and regtest pass on ppc64{,le} and x86_64.
> Is this ok for trunk?
>
> BR,
> Jeff (Jiufu)
>
>         PR target/65421
>
> gcc/ChangeLog:
>
>         * cfgexpand.cc (expand_used_vars): Update to mark DECL_USEDBY_RETURN_P
>         for returns.
>         * expr.cc (move_sub_blocks): New function.
>         (expand_assignment): Update to call move_sub_blocks for returns or
>         parameters.
>         * function.cc (assign_parm_setup_block): Update to mark
>         DECL_REGS_TO_STACK_P for parameter.
>         * tree-core.h (struct tree_decl_common): Add comment.
>         * tree.h (DECL_USEDBY_RETURN_P): New define.
>         (DECL_REGS_TO_STACK_P): New define.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/powerpc/pr65421-1.c: New test.
>         * gcc.target/powerpc/pr65421.c: New test.
>
> ---
>  gcc/cfgexpand.cc                             | 14 ++++
>  gcc/expr.cc                                  | 77 ++++++++++++++++++++
>  gcc/function.cc                              |  3 +
>  gcc/tree-core.h                              |  4 +-
>  gcc/tree.h                                   |  9 +++
>  gcc/testsuite/gcc.target/powerpc/pr65421-1.c |  6 ++
>  gcc/testsuite/gcc.target/powerpc/pr65421.c   | 33 +++++++++
>  7 files changed, 145 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421.c
>
> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> index dd29ffffc03..09b8ec64cea 100644
> --- a/gcc/cfgexpand.cc
> +++ b/gcc/cfgexpand.cc
> @@ -2158,6 +2158,20 @@ expand_used_vars (bitmap forced_stack_vars)
>        frame_phase = off ? align - off : 0;
>      }
>
> +  /* Collect VARs on returns.  */
> +  if (DECL_RESULT (current_function_decl))
> +    {
> +      edge_iterator ei;
> +      edge e;
> +      FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> +        if (greturn *ret = safe_dyn_cast<greturn *> (last_stmt (e->src)))
> +          {
> +            tree val = gimple_return_retval (ret);
> +            if (val && VAR_P (val))
> +              DECL_USEDBY_RETURN_P (val) = 1;
> +          }
> +    }
> +
>    /* Set TREE_USED on all variables in the local_decls.  */
>    FOR_EACH_LOCAL_DECL (cfun, i, var)
>      TREE_USED (var) = 1;
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index d9407432ea5..afcec6f3c10 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -5559,6 +5559,51 @@ mem_ref_refers_to_non_mem_p (tree ref)
>    return non_mem_decl_p (base);
>  }
>
> +/* Subroutine of expand_assignment, invoked when assigning from a
> +   parameter or assigning to a return val on struct type which may
> +   be passed through registers.  The mode of register is used to
> +   move the content for the assignment.
> +
> +   This routine generates code for expression FROM which is BLKmode,
> +   and moves the generated content to TO_RTX by sub-blocks in SUB_MODE.  */
> +
> +static void
> +move_sub_blocks (rtx to_rtx, tree from, machine_mode sub_mode, bool nontemporal)
> +{
> +  gcc_assert (MEM_P (to_rtx));
> +
> +  HOST_WIDE_INT size = MEM_SIZE (to_rtx).to_constant ();
> +  HOST_WIDE_INT sub_size = GET_MODE_SIZE (sub_mode).to_constant ();
> +  HOST_WIDE_INT len = size / sub_size;
> +
> +  /* It would not be profitable to move through sub-modes, if the size does
> +     not meet register mode.  */
> +  if ((size % sub_size) != 0)
> +    {
> +      push_temp_slots ();
> +      rtx result = store_expr (from, to_rtx, 0, nontemporal, false);
> +      preserve_temp_slots (result);
> +      pop_temp_slots ();
> +      return;
> +    }
> +
> +  push_temp_slots ();
> +
> +  rtx from_rtx = expand_expr (from, NULL_RTX, GET_MODE (to_rtx), EXPAND_NORMAL);
> +  for (int i = 0; i < len; i++)
> +    {
> +      rtx temp = gen_reg_rtx (sub_mode);
> +      rtx src = adjust_address (from_rtx, sub_mode, sub_size * i);
> +      rtx dest = adjust_address (to_rtx, sub_mode, sub_size * i);
> +      emit_move_insn (temp, src);
> +      emit_move_insn (dest, temp);
> +    }
> +
> +  preserve_temp_slots (to_rtx);
> +  pop_temp_slots ();
> +  return;
> +}
> +
>  /* Expand an assignment that stores the value of FROM into TO.  If NONTEMPORAL
>     is true, try generating a nontemporal store.  */
>
> @@ -6045,6 +6090,38 @@ expand_assignment (tree to, tree from, bool nontemporal)
>        return;
>      }
>
> +  /* If it is assigning from a struct param which may be passed via registers,
> +     it would be better to use the register's mode to move sub-blocks for the
> +     assignment.  */
> +  if (TREE_CODE (from) == PARM_DECL && mode == BLKmode
> +      && DECL_REGS_TO_STACK_P (from))
> +    {
> +      rtx parm = DECL_INCOMING_RTL (from);
> +      gcc_assert (REG_P (parm) || GET_CODE (parm) == PARALLEL);
> +
> +      machine_mode sub_mode;
> +      if (REG_P (parm))
> +        sub_mode = word_mode;
> +      else
> +        sub_mode = GET_MODE (XEXP (XVECEXP (parm, 0, 0), 0));
> +
> +      move_sub_blocks (to_rtx, from, sub_mode, nontemporal);
> +      return;
> +    }
> +
> +  /* If it is assigning to a struct var which will be returned, and the
> +     function is returning via registers, it would be better to use the
> +     register's mode to move sub-blocks for the assignment.  */
> +  if (VAR_P (to) && DECL_USEDBY_RETURN_P (to) && mode == BLKmode
> +      && TREE_CODE (from) != CONSTRUCTOR
> +      && GET_CODE (DECL_RTL (DECL_RESULT (current_function_decl))) == PARALLEL)
> +    {
> +      rtx ret = DECL_RTL (DECL_RESULT (current_function_decl));
> +      machine_mode sub_mode = GET_MODE (XEXP (XVECEXP (ret, 0, 0), 0));
> +      move_sub_blocks (to_rtx, from, sub_mode, nontemporal);
> +      return;
> +    }
> +
>    /* Compute FROM and store the value in the rtx we got.  */
>
>    push_temp_slots ();
> diff --git a/gcc/function.cc b/gcc/function.cc
> index dc333c27e92..0ff89d89365 100644
> --- a/gcc/function.cc
> +++ b/gcc/function.cc
> @@ -2991,6 +2991,9 @@ assign_parm_setup_block (struct assign_parm_data_all *all,
>
>    mem = validize_mem (copy_rtx (stack_parm));
>
> +  if (MEM_P (mem))
> +    DECL_REGS_TO_STACK_P (parm) = 1;
> +
>    /* Handle values in multiple non-contiguous locations.  */
>    if (GET_CODE (entry_parm) == PARALLEL && !MEM_P (mem))
>      emit_group_store (mem, entry_parm, data->arg.type, size);
> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index e146b133dbd..c76d08bd109 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -1808,7 +1808,9 @@ struct GTY(()) tree_decl_common {
>       In VAR_DECL, PARM_DECL and RESULT_DECL, this is
>       DECL_HAS_VALUE_EXPR_P.  */
>    unsigned decl_flag_2 : 1;
> -  /* In FIELD_DECL, this is DECL_PADDING_P.  */
> +  /* In FIELD_DECL, this is DECL_PADDING_P
> +     In VAR_DECL, this is DECL_USEDBY_RETURN_P
> +     In PARM_DECL, this is DECL_REGS_TO_STACK_P.  */
>    unsigned decl_flag_3 : 1;
>    /* Logically, these two would go in a theoretical base shared by var and
>       parm decl.  */
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 7e26e726bc5..3f732e3c00c 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -3009,6 +3009,15 @@ extern void decl_value_expr_insert (tree, tree);
>  #define DECL_PADDING_P(NODE) \
>    (FIELD_DECL_CHECK (NODE)->decl_common.decl_flag_3)
>
> +/* Used in a VAR_DECL to indicate that it is used by a return stmt.  */
> +#define DECL_USEDBY_RETURN_P(NODE) \
> +  (VAR_DECL_CHECK (NODE)->decl_common.decl_flag_3)
> +
> +/* Used in a PARM_DECL to indicate that it is struct parameter passed
> +   by registers totally and stored to stack during setup.  */
> +#define DECL_REGS_TO_STACK_P(NODE) \
> +  (PARM_DECL_CHECK (NODE)->decl_common.decl_flag_3)
> +
>  /* Used in a FIELD_DECL to indicate whether this field is not a flexible
>     array member. This is only valid for the last array type field of a
>     structure.  */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-1.c b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
> new file mode 100644
> index 00000000000..4e1f87f7939
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
> @@ -0,0 +1,6 @@
> +/* PR target/65421 */
> +/* { dg-options "-O2" } */
> +
> +typedef struct LARGE {double a[4]; int arr[32];} LARGE;
> +LARGE foo (LARGE a){return a;}
> +/* { dg-final { scan-assembler-times {\mmemcpy\M} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421.c b/gcc/testsuite/gcc.target/powerpc/pr65421.c
> new file mode 100644
> index 00000000000..fd5ad542c64
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr65421.c
> @@ -0,0 +1,33 @@
> +/* PR target/65421 */
> +/* { dg-options "-O2" } */
> +/* { dg-require-effective-target powerpc_elfv2 } */
> +/* { dg-require-effective-target has_arch_ppc64 } */
> +
> +typedef struct FLOATS
> +{
> +  double a[3];
> +} FLOATS;
> +
> +/* 3 lfd */
> +FLOATS ret_arg_pt (FLOATS *a){return *a;}
> +/* { dg-final { scan-assembler-times {\mlfd\M} 3 } } */
> +
> +/* 3 stfd */
> +void st_arg (FLOATS a, FLOATS *p) {*p = a;}
> +/* { dg-final { scan-assembler-times {\mstfd\M} 3 } } */
> +
> +/* blr */
> +FLOATS ret_arg (FLOATS a) {return a;}
> +
> +typedef struct MIX
> +{
> +  double a[2];
> +  long l;
> +} MIX;
> +
> +/* std 3 param regs to return slot */
> +MIX ret_arg1 (MIX a) {return a;}
> +/* { dg-final { scan-assembler-times {\mstd\M} 3 } } */
> +
> +/* count insns */
> +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 13 } } */
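
For readers who want to see the sub-block strategy outside of RTL, here is a
minimal stand-alone sketch in plain C. It is not part of the patch:
copy_by_sub_blocks and its return convention are hypothetical, made up for
illustration, and a uint64_t temporary merely stands in for the pseudo register
that move_sub_blocks allocates with gen_reg_rtx. Like the patch, it moves the
data piece by piece only when the block size is a multiple of the piece size,
and otherwise falls back to an ordinary block copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical analogue of the patch's move_sub_blocks (illustration only):
   copy SIZE bytes from SRC to DST in SUB_SIZE-byte pieces, each piece going
   through a temporary that stands in for a pseudo register.
   Returns the number of pieces moved, or 0 when SIZE is not a multiple of
   SUB_SIZE and the generic block copy was used instead.  */
static size_t
copy_by_sub_blocks (void *dst, const void *src, size_t size, size_t sub_size)
{
  /* The temporary below is 8 bytes wide, so pieces must fit in it.  */
  assert (sub_size > 0 && sub_size <= sizeof (uint64_t));

  /* Mirror the patch's bail-out: sizes that do not divide evenly are
     left to the ordinary block move.  */
  if (size % sub_size != 0)
    {
      memcpy (dst, src, size);
      return 0;
    }

  size_t len = size / sub_size;
  for (size_t i = 0; i < len; i++)
    {
      uint64_t temp = 0;  /* plays the role of gen_reg_rtx (sub_mode) */
      memcpy (&temp, (const char *) src + i * sub_size, sub_size);
      memcpy ((char *) dst + i * sub_size, &temp, sub_size);
    }
  return len;
}

/* Struct from the example in the patch description.  */
typedef struct SA { double a[3]; } A;
```

With the struct A from the patch description, and assuming double is 8 bytes,
copy_by_sub_blocks (&out, &in, sizeof (A), sizeof (double)) moves three pieces,
which corresponds to the three lfd/stfd pairs the test expects on ppc64le. A
5-byte block with 2-byte pieces takes the memcpy fallback instead.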