From: Jiufu Guo
To: Jeff Law
Cc: gcc-patches@gcc.gnu.org, segher@kernel.crashing.org, dje.gcc@gmail.com, linkw@gcc.gnu.org, rguenther@suse.de
Subject: Re: [PATCH V2] Use subscalar mode to move struct block for parameter
References: <20221117061549.178481-1-guojiufu@linux.ibm.com> <7ea64lroo6.fsf@pike.rch.stglabs.ibm.com> <9424d98e-a95f-58ae-9764-bcf8b4f503dc@gmail.com>
Date: Wed, 23 Nov 2022 10:58:58 +0800
In-Reply-To: <9424d98e-a95f-58ae-9764-bcf8b4f503dc@gmail.com> (Jeff Law's message of "Tue, 22 Nov 2022 14:57:08 -0700")
Message-ID: <7efseas7f1.fsf@pike.rch.stglabs.ibm.com>

Hi Jeff,

Thanks a lot for your comments!

Jeff Law writes:

> On 11/20/22 20:07, Jiufu Guo wrote:
>> Jiufu Guo writes:
>>
>>> Hi,
>>>
>>> As mentioned in the previous version patch:
>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html
>>> The suboptimal code is generated for "assigning from parameter" or
>>> "assigning to return value".
>>> This patch enhances the assignment from parameters like the below
>>> cases:
>>> /////case1.c
>>> typedef struct SA {double a[3];long l; } A;
>>> A ret_arg (A a) {return a;}
>>> void st_arg (A a, A *p) {*p = a;}
>>>
>>> ////case2.c
>>> typedef struct SA {double a[3];} A;
>>> A ret_arg (A a) {return a;}
>>> void st_arg (A a, A *p) {*p = a;}
>>>
>>> For this patch, bootstrap and regtest pass on ppc64{,le}
>>> and x86_64.
>>> * Besides asking for help reviewing this patch, I would like to
>>> consult comments about enhancing for "assigning to returns".
>> I updated the patch to fix the issue for returns. This patch
>> adds a flag DECL_USEDBY_RETURN_P to indicate if a var is used
>> by a return stmt. This patch fix the issue in expand pass only,
>> so, we would try to update the patch to avoid this flag.
>>
>> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
>> index dd29ffffc03..09b8ec64cea 100644
>> --- a/gcc/cfgexpand.cc
>> +++ b/gcc/cfgexpand.cc
>> @@ -2158,6 +2158,20 @@ expand_used_vars (bitmap forced_stack_vars)
>>        frame_phase = off ? align - off : 0;
>>      }
>> +  /* Collect VARs on returns. */
>> +  if (DECL_RESULT (current_function_decl))
>> +    {
>> +      edge_iterator ei;
>> +      edge e;
>> +      FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
>> +        if (greturn *ret = safe_dyn_cast <greturn *> (last_stmt (e->src)))
>> +          {
>> +            tree val = gimple_return_retval (ret);
>> +            if (val && VAR_P (val))
>> +              DECL_USEDBY_RETURN_P (val) = 1;
>> +          }
>> +    }
>> +
>>    /* Set TREE_USED on all variables in the local_decls. */
>>    FOR_EACH_LOCAL_DECL (cfun, i, var)
>>      TREE_USED (var) = 1;
>> diff --git a/gcc/expr.cc b/gcc/expr.cc
>> index d9407432ea5..20973649963 100644
>> --- a/gcc/expr.cc
>> +++ b/gcc/expr.cc
>> @@ -6045,6 +6045,52 @@ expand_assignment (tree to, tree from, bool nontemporal)
>>        return;
>>      }
>> +  if ((TREE_CODE (from) == PARM_DECL && DECL_INCOMING_RTL (from)
>> +       && TYPE_MODE (TREE_TYPE (from)) == BLKmode
>> +       && (GET_CODE (DECL_INCOMING_RTL (from)) == PARALLEL
>> +           || REG_P (DECL_INCOMING_RTL (from))))
>> +      || (VAR_P (to) && DECL_USEDBY_RETURN_P (to)
>> +          && TYPE_MODE (TREE_TYPE (to)) == BLKmode
>> +          && GET_CODE (DECL_RTL (DECL_RESULT (current_function_decl)))
>> +               == PARALLEL))
>> +    {
>> +      push_temp_slots ();
>> +      rtx par_ret;
>> +      machine_mode mode;
>> +      par_ret = TREE_CODE (from) == PARM_DECL
>> +                  ? DECL_INCOMING_RTL (from)
>> +                  : DECL_RTL (DECL_RESULT (current_function_decl));
>> +      mode = GET_CODE (par_ret) == PARALLEL
>> +               ? GET_MODE (XEXP (XVECEXP (par_ret, 0, 0), 0))
>> +               : word_mode;
>> +      int mode_size = GET_MODE_SIZE (mode).to_constant ();
>> +      int size = INTVAL (expr_size (from));
>> +
>> +      /* If/How the parameter using submode, it dependes on the size and
>> +         position of the parameter. Here using heurisitic number. */
>>       int hurstc_num = 8;
>
> Where did this come from and what does it mean?

Sorry for not making this clear. An aggregate argument may be passed
partially or entirely in registers; see assign_parm_adjust_entry_rtl.
For example, if a parameter occupies 12 words and the target/ABI allows
only 8 GPRs for arguments, then the parameter can use at most 8
registers, and the remaining part is passed on the stack.

>
>
> Note that BLKmode subword values passed in registers can be either
> right or left justified.  I think you also need to worry about
> endianness here.

The subword mode is used only to move the block: each piece is read
from the source memory and then stored to the destination memory in a
register mode. This keeps the same endianness in the register as
move_block_from_reg does, so the patch does not check the endianness.

If you have any concerns or suggestions, please point them out. Thanks!

BR,
Jeff (Jiufu)

>
>
> Jeff
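
To make the 12-word example above concrete, here is a minimal sketch in
plain C. It is not GCC code: split_aggregate and struct arg_split are
hypothetical names used only for illustration, and the model assumes
word-sized pieces and a fixed number of argument GPRs. Real targets decide
this in their ABI hooks and may differ in detail.

```c
#include <assert.h>

/* Hypothetical model of how an aggregate argument is divided between
   argument registers and the stack.  Not part of GCC.  */
struct arg_split
{
  int words_in_regs;   /* Words that land in argument GPRs.  */
  int words_on_stack;  /* Words that spill to the stack.  */
};

/* SIZE_IN_WORDS is the size of the aggregate, FIRST_FREE_REG is the index
   of the first argument GPR still unused, and MAX_ARG_REGS is the number
   of GPRs the (assumed) ABI provides for arguments.  */
static struct arg_split
split_aggregate (int size_in_words, int first_free_reg, int max_arg_regs)
{
  assert (size_in_words >= 0 && first_free_reg >= 0 && max_arg_regs >= 0);

  /* Registers still available for this argument.  */
  int free_regs = (max_arg_regs > first_free_reg
                   ? max_arg_regs - first_free_reg : 0);

  struct arg_split s;
  s.words_in_regs = size_in_words < free_regs ? size_in_words : free_regs;
  s.words_on_stack = size_in_words - s.words_in_regs;
  return s;
}
```

With these assumptions, split_aggregate (12, 0, 8) gives 8 words in
registers and 4 words on the stack, which matches the case described
above.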