From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 54C9C3858D37; Tue, 9 May 2023 13:43:54 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 54C9C3858D37 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com Received: from pps.filterd (m0353726.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 349D8oda003467; Tue, 9 May 2023 13:43:53 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : references : date : in-reply-to : message-id : content-type : content-transfer-encoding : mime-version; s=pp1; bh=gvGNY+QgwXhV8Hc8i0iHAHLIn0hAQ8D57XZIfbj5goU=; b=tq8C2ZetPelcBUY/bSjl25r8a+hIK+k7b14F78LmgwqQudyW+PHIMDItzeHXxEmDz385 EUh6Xv00Ha9njVM8VnekRFt4uyAICv7T6GGoJAeM/g2LqT2ZUfdQ4JtyzpU9g/ajHFxh CFpTBTbSToGFcfOLyLgC5AL4+lCfV2dNyw37B1/suXKIX4ir7qosBxHKrcFRIF62XwaP XhyBaQW8wvlRlEhpw2MJ2/WdfIYYZ8dcZBuUtzlUhuPXXBFcpAWSClJ++v/6zTbppzvv K+dc8PxP7H87vxeADuLGiqhoDePgUpOuWDJZYAKe+YMHl3uzcO5Re8vHgJQeEtkxwZ3c Zg== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3qfjnhhsa3-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 09 May 2023 13:43:52 +0000 Received: from m0353726.ppops.net (m0353726.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 349D9j8W010159; Tue, 9 May 2023 13:43:52 GMT Received: from ppma03wdc.us.ibm.com (ba.79.3fa9.ip4.static.sl-reverse.com [169.63.121.186]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3qfjnhhs99-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 09 May 2023 13:43:51 +0000 Received: from pps.filterd (ppma03wdc.us.ibm.com [127.0.0.1]) by ppma03wdc.us.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 349DJ4G8026705; Tue, 9 May 2023 13:43:50 GMT Received: from smtprelay04.dal12v.mail.ibm.com ([9.208.130.102]) by ppma03wdc.us.ibm.com (PPS) with ESMTPS id 3qf7xd3krd-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 09 May 2023 13:43:50 +0000 Received: from smtpav01.dal12v.mail.ibm.com (smtpav01.dal12v.mail.ibm.com [10.241.53.100]) by smtprelay04.dal12v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 349DhnRE38928768 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 9 May 2023 13:43:49 GMT Received: from smtpav01.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id C88FD58057; Tue, 9 May 2023 13:43:49 +0000 (GMT) Received: from smtpav01.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 9EC455805D; Tue, 9 May 2023 13:43:49 +0000 (GMT) Received: from ltcden2-lp1.aus.stglabs.ibm.com (unknown [9.3.90.43]) by smtpav01.dal12v.mail.ibm.com (Postfix) with ESMTPS; Tue, 9 May 2023 13:43:49 +0000 (GMT) From: Jiufu Guo To: Jeff Law Cc: gcc-patches@gcc.gnu.org, rguenther@suse.de, segher@kernel.crashing.org, dje.gcc@gmail.com, linkw@gcc.gnu.org, meissner@linux.ibm.com Subject: Re: [PATCH V5] Use reg mode to move sub blocks for parameters and returns References: <20230317033952.1549050-1-guojiufu@linux.ibm.com> <0221bead-5d08-cb56-e620-642825c1abc3@gmail.com> <0473d90c-7ee0-2989-16b9-234f422e0e3c@gmail.com> Date: Tue, 09 May 2023 21:43:45 +0800 In-Reply-To: <0473d90c-7ee0-2989-16b9-234f422e0e3c@gmail.com> (Jeff Law's message of "Fri, 5 May 2023 17:37:05 -0600") Message-ID: <7n3545d3hq.fsf@ltcden2-lp1.aus.stglabs.ibm.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) Content-Type: text/plain; charset=utf-8 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: BbM0_3OhUXTvlTvIXHeYptVYfV1LDwZU X-Proofpoint-ORIG-GUID: du6T1qIlDBJshvInMqiqcucOs-RIboKI Content-Transfer-Encoding: quoted-printable X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.254,Aquarius:18.0.942,Hydra:6.0.573,FMLib:17.11.170.22 definitions=2023-05-09_08,2023-05-05_01,2023-02-09_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 bulkscore=0 malwarescore=0 clxscore=1015 adultscore=0 mlxlogscore=999 suspectscore=0 phishscore=0 priorityscore=1501 lowpriorityscore=0 mlxscore=0 spamscore=0 impostorscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2304280000 definitions=main-2305090111 X-Spam-Status: No, score=-11.5 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi, Jeff Law writes: > On 5/3/23 23:49, guojiufu wrote: >> Hi, >> >> On 2023-05-01 03:00, Jeff Law wrote: >>> On 3/16/23 21:39, Jiufu Guo wrote: >>>> Hi, >>>> >>>> When assigning a parameter to a variable, or assigning a variable to >>>> return value with struct type, and the parameter/return is passed >>>> through registers. >>>> For this kind of case, it would be better to use the nature mode of >>>> the registers to move the content for the assignment. >>>> >>>> As the example code (like code in PR65421): >>>> >>>> typedef struct SA {double a[3];} A; >>>> A ret_arg_pt (A *a) {return *a;} // on ppc64le, expect only 3 lfd(s) >>>> A ret_arg (A a) {return a;} // just empty fun body >>>> void st_arg (A a, A *p) {*p =3D a;} //only 3 stfd(s) >>>> >>>> Comparing with previous version: >>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609394.html >>>> This version refine code to eliminated reductant code in=C2=A0 the sub >>>> routine "move_sub_blocks". >>>> >>>> Bootstrap and regtest pass on ppc64{,le}. >>>> Is this ok for trunk? >>>> >> ... >> >>>> diff --git a/gcc/expr.cc b/gcc/expr.cc >>>> index 15be1c8db99..97a7be9542e 100644 >>>> --- a/gcc/expr.cc >>>> +++ b/gcc/expr.cc >>>> @@ -5559,6 +5559,41 @@ mem_ref_refers_to_non_mem_p (tree ref) >>>> =C2=A0=C2=A0=C2=A0 return non_mem_decl_p (base); >>>> =C2=A0 } >>>> =C2=A0 +/* Sub routine of expand_assignment, invoked when assigning fr= om a >>>> +=C2=A0=C2=A0 parameter or assigning to a return val on struct type wh= ich may >>>> +=C2=A0=C2=A0 be passed through registers.=C2=A0 The mode of register = is used to >>>> +=C2=A0=C2=A0 move the content for the assignment. >>>> + >>>> +=C2=A0=C2=A0 This routine generates code for expression FROM which is= BLKmode, >>>> +=C2=A0=C2=A0 and move the generated content to TO_RTX by su-blocks in= SUB_MODE. =C2=A0*/ >>>> + >>>> +static void >>>> +move_sub_blocks (rtx to_rtx, tree from, machine_mode sub_mode) >>>> +{ >>>> +=C2=A0 gcc_assert (MEM_P (to_rtx)); >>>> + >>>> +=C2=A0 HOST_WIDE_INT size =3D MEM_SIZE (to_rtx).to_constant (); >>> Consider the case of a BLKmode return value.=C2=A0 Isn't TO_RTX in this >>> case a BLKmode object? >> >> Thanks for this question! >> >> Yes, the mode of TO_RTX is BLKmode. >> As we know, when the function returns via registers, the mode of >> the `return-rtx` could also be BLKmode.=C2=A0 This patch is going to >> improve these kinds of cases. >> >> For example: >> ``` >> typedef struct FLOATS >> { >> =C2=A0 double a[3]; >> } FLOATS; >> FLOATS ret_arg_pt (FLOATS *a){return *a;} >> ``` >> >> D.3952 =3D *a_2(D); //this patch enhance this assignment >> return D.3952; >> >> The mode is BLKmode for the rtx of `D.3952` is BLKmode, and the >> rtx for "DECL_RESULT(current_function_decl)".=C2=A0 And the DECL_RESULT >> represents the return registers. > I didn't think MEM_SIZE worked for BLKmode. BUt looking at its > definition, it's pulling the size out of the attributes rather than > from the mode. SO I guess there's a reasonable chance it's going to > work :-) Thanks for point out this! Yes, BLKmode rtx may not always be a MEM. MEM_SIZE is only ok for MEM after the it's known size is computed. Here MEM_SIZE is fine just because it is an stack rtx corresponding to the type of parameter and returns which has been computed. I updated the patch to resolve the conflicts with the trunk, and retest bootstrap&testsuite, and then updated the patch a new version. And this version pass bootstrap and regtest on ppc64{,le}, x86_64.=20 The major change is 'move_sub_blocks' only handles the case when the block size can be move by same submode, or say (size % sub_size) is 0. If no objection, I would committed the new version. BR, Jeff (Jiufu) gcc/ChangeLog: * cfgexpand.cc (expand_used_vars): Update to mark DECL_USEDBY_RETURN_P for returns. * expr.cc (move_sub_blocks): New function. (expand_assignment): Update assignment code about returns/parameters. * function.cc (assign_parm_setup_block): Update to mark DECL_REGS_TO_STACK_P for parameter. * tree-core.h (struct tree_decl_common): Add comment. * tree.h (DECL_USEDBY_RETURN_P): New define. (DECL_REGS_TO_STACK_P): New define. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr65421-1.c: New test. * gcc.target/powerpc/pr65421-2.c: New test. --- gcc/cfgexpand.cc | 14 +++++ gcc/expr.cc | 62 ++++++++++++++++++++ gcc/function.cc | 3 + gcc/tree-core.h | 4 +- gcc/tree.h | 9 +++ gcc/testsuite/gcc.target/powerpc/pr65421-1.c | 6 ++ gcc/testsuite/gcc.target/powerpc/pr65421-2.c | 33 +++++++++++ 7 files changed, 130 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-2.c diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc index 1a1b26b1c6c..7b6a2216492 100644 --- a/gcc/cfgexpand.cc +++ b/gcc/cfgexpand.cc @@ -2158,6 +2158,20 @@ expand_used_vars (bitmap forced_stack_vars) frame_phase =3D off ? align - off : 0; } =20 + /* Collect VARs on returns. */ + if (DECL_RESULT (current_function_decl)) + { + edge_iterator ei; + edge e; + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) + if (greturn *ret =3D safe_dyn_cast (*gsi_last_bb (e->src))) + { + tree val =3D gimple_return_retval (ret); + if (val && VAR_P (val)) + DECL_USEDBY_RETURN_P (val) =3D 1; + } + } + /* Set TREE_USED on all variables in the local_decls. */ FOR_EACH_LOCAL_DECL (cfun, i, var) TREE_USED (var) =3D 1; diff --git a/gcc/expr.cc b/gcc/expr.cc index 758dda9ec68..5ed98b477c3 100644 --- a/gcc/expr.cc +++ b/gcc/expr.cc @@ -5556,6 +5556,42 @@ mem_ref_refers_to_non_mem_p (tree ref) return non_mem_decl_p (base); } =20 +/* Sub routine of expand_assignment, invoked when assigning from a + parameter or assigning to a return val on struct type which may + be passed through registers. The mode of register is used to + move the content for the assignment. + + This routine generates code for expression FROM which is BLKmode, + and move the generated content to TO_RTX by su-blocks in SUB_MODE. */ + +static bool +move_sub_blocks (rtx to_rtx, tree from, machine_mode sub_mode) +{ + gcc_assert (MEM_P (to_rtx)); + + HOST_WIDE_INT size =3D MEM_SIZE (to_rtx).to_constant (); + HOST_WIDE_INT sub_size =3D GET_MODE_SIZE (sub_mode).to_constant (); + HOST_WIDE_INT len =3D size / sub_size; + if (size % sub_size !=3D 0) + return false; + + push_temp_slots (); + + rtx from_rtx =3D expand_expr (from, NULL_RTX, GET_MODE (to_rtx), EXPAND_= NORMAL); + for (int i =3D 0; i < len; i++) + { + rtx temp =3D gen_reg_rtx (sub_mode); + rtx src =3D adjust_address (from_rtx, sub_mode, sub_size * i); + rtx dest =3D adjust_address (to_rtx, sub_mode, sub_size * i); + emit_move_insn (temp, src); + emit_move_insn (dest, temp); + } + + preserve_temp_slots (to_rtx); + pop_temp_slots (); + return true; +} + /* Expand an assignment that stores the value of FROM into TO. If NONTEMP= ORAL is true, try generating a nontemporal store. */ =20 @@ -6042,6 +6078,32 @@ expand_assignment (tree to, tree from, bool nontempo= ral) return; } =20 + /* If it is assigning from a struct param which may be passed via regist= ers, + it would be better to use the register's mode to move sub-blocks for = the + assignment. */ + if (TREE_CODE (from) =3D=3D PARM_DECL && mode =3D=3D BLKmode + && DECL_REGS_TO_STACK_P (from)) + { + rtx parm =3D DECL_INCOMING_RTL (from); + machine_mode sub_mode + =3D REG_P (parm) ? word_mode : GET_MODE (XEXP (XVECEXP (parm, 0, 0), 0)); + if (move_sub_blocks (to_rtx, from, sub_mode)) + return; + } + + /* If it is assigning to a struct var which will be returned, and the + function is returning via registers, it would be better to use the + register's mode to move sub-blocks for the assignment. */ + if (VAR_P (to) && DECL_USEDBY_RETURN_P (to) && mode =3D=3D BLKmode + && TREE_CODE (from) !=3D CONSTRUCTOR + && GET_CODE (DECL_RTL (DECL_RESULT (current_function_decl))) =3D=3D = PARALLEL) + { + rtx ret =3D DECL_RTL (DECL_RESULT (current_function_decl)); + machine_mode sub_mode =3D GET_MODE (XEXP (XVECEXP (ret, 0, 0), 0)); + if (move_sub_blocks (to_rtx, from, sub_mode)) + return; + } + /* Compute FROM and store the value in the rtx we got. */ =20 push_temp_slots (); diff --git a/gcc/function.cc b/gcc/function.cc index f0ae641512d..bc291a95b32 100644 --- a/gcc/function.cc +++ b/gcc/function.cc @@ -2990,6 +2990,9 @@ assign_parm_setup_block (struct assign_parm_data_all = *all, =20 mem =3D validize_mem (copy_rtx (stack_parm)); =20 + if (MEM_P (mem) && size !=3D 0 && size % UNITS_PER_WORD =3D=3D 0) + DECL_REGS_TO_STACK_P (parm) =3D 1; + /* Handle values in multiple non-contiguous locations. */ if (GET_CODE (entry_parm) =3D=3D PARALLEL && !MEM_P (mem)) emit_group_store (mem, entry_parm, data->arg.type, size); diff --git a/gcc/tree-core.h b/gcc/tree-core.h index 847f0b1e994..7543f7a57b2 100644 --- a/gcc/tree-core.h +++ b/gcc/tree-core.h @@ -1802,7 +1802,9 @@ struct GTY(()) tree_decl_common { In VAR_DECL, PARM_DECL and RESULT_DECL, this is DECL_HAS_VALUE_EXPR_P. */ unsigned decl_flag_2 : 1; - /* In FIELD_DECL, this is DECL_PADDING_P. */ + /* In FIELD_DECL, this is DECL_PADDING_P + In VAR_DECL, this is DECL_USEDBY_RETURN_P + In PARM_DECL, this is DECL_REGS_TO_STACK_P. */ unsigned decl_flag_3 : 1; /* Logically, these two would go in a theoretical base shared by var and parm decl. */ diff --git a/gcc/tree.h b/gcc/tree.h index 0b72663e6a1..6f8e7cabed3 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -3019,6 +3019,15 @@ extern void decl_value_expr_insert (tree, tree); #define DECL_PADDING_P(NODE) \ (FIELD_DECL_CHECK (NODE)->decl_common.decl_flag_3) =20 +/* Used in a VAR_DECL to indicate that it is used by a return stmt. */ +#define DECL_USEDBY_RETURN_P(NODE) \ + (VAR_DECL_CHECK (NODE)->decl_common.decl_flag_3) + +/* Used in a PARM_DECL to indicate that it is struct parameter passed + by registers totally and stored to stack during setup. */ +#define DECL_REGS_TO_STACK_P(NODE) \ + (PARM_DECL_CHECK (NODE)->decl_common.decl_flag_3) + /* Used in a FIELD_DECL to indicate whether this field is not a flexible array member. This is only valid for the last array type field of a structure. */ diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-1.c b/gcc/testsuite/g= cc.target/powerpc/pr65421-1.c new file mode 100644 index 00000000000..4e1f87f7939 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c @@ -0,0 +1,6 @@ +/* PR target/65421 */ +/* { dg-options "-O2" } */ + +typedef struct LARGE {double a[4]; int arr[32];} LARGE; +LARGE foo (LARGE a){return a;} +/* { dg-final { scan-assembler-times {\mmemcpy\M} 1 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-2.c b/gcc/testsuite/g= cc.target/powerpc/pr65421-2.c new file mode 100644 index 00000000000..fd5ad542c64 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr65421-2.c @@ -0,0 +1,33 @@ +/* PR target/65421 */ +/* { dg-options "-O2" } */ +/* { dg-require-effective-target powerpc_elfv2 } */ +/* { dg-require-effective-target has_arch_ppc64 } */ + +typedef struct FLOATS +{ + double a[3]; +} FLOATS; + +/* 3 lfd */ +FLOATS ret_arg_pt (FLOATS *a){return *a;} +/* { dg-final { scan-assembler-times {\mlfd\M} 3 } } */ + +/* 3 stfd */ +void st_arg (FLOATS a, FLOATS *p) {*p =3D a;} +/* { dg-final { scan-assembler-times {\mstfd\M} 3 } } */ + +/* blr */ +FLOATS ret_arg (FLOATS a) {return a;} + +typedef struct MIX +{ + double a[2]; + long l; +} MIX; + +/* std 3 param regs to return slot */ +MIX ret_arg1 (MIX a) {return a;} +/* { dg-final { scan-assembler-times {\mstd\M} 3 } } */ + +/* count insns */ +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 13 } } */ --=20 2.31.1 > > OK for the trunk. > > jeff