From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 742AC392AC07; Fri, 25 Nov 2022 05:05:49 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 742AC392AC07 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com Received: from pps.filterd (m0098417.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 2AP4o7KI022771; Fri, 25 Nov 2022 05:05:49 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : references : date : in-reply-to : message-id : content-type : content-transfer-encoding : mime-version; s=pp1; bh=XA12UvGs83N4TAAGtCFZ8llzexDS5cqo3lJxxcyyTXw=; b=p5L0PSVwULazMm+I75Kt1CD4rzwks++bzT90/h7FGiWywoXD4g0iHJrLcls2UwuJwhW6 HkqPKErYbA5f5wjVpC2uulmo5j0kk1ObEfk8qzy2xK3VDGDBLCIF4jcv45EOZbryn8o7 sneva7otwIhRHxYJQBbjrS0b2E8z82PgLHfQLmqYLRzltkqKnuj5FGEQDkJSoPZjYgy9 Ok19wwQE5LOAPizzJbwkI74Gc1W1cjUDfffsXZIigULxswywrZfsNZaxTJJ4s2lMutTH gxs5G40g5YrNmLA8tn9V1h3AcFG40PP4fF7EWv7cs7XECyJolrLuREEvzoSIjh+q6kZI 7w== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3m2q1f099w-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 25 Nov 2022 05:05:48 +0000 Received: from m0098417.ppops.net (m0098417.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 2AP55mJV015486; Fri, 25 Nov 2022 05:05:48 GMT Received: from ppma02wdc.us.ibm.com (aa.5b.37a9.ip4.static.sl-reverse.com [169.55.91.170]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3m2q1f099m-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 25 Nov 2022 05:05:48 +0000 Received: from pps.filterd (ppma02wdc.us.ibm.com [127.0.0.1]) by ppma02wdc.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 2AP54lIS016374; Fri, 25 Nov 2022 05:05:47 GMT Received: from smtprelay06.wdc07v.mail.ibm.com ([9.208.129.118]) by ppma02wdc.us.ibm.com with ESMTP id 3kxpsadm9m-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 25 Nov 2022 05:05:47 +0000 Received: from smtpav01.wdc07v.mail.ibm.com (smtpav01.wdc07v.mail.ibm.com [10.39.53.228]) by smtprelay06.wdc07v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 2AP55kXi31064376 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 25 Nov 2022 05:05:46 GMT Received: from smtpav01.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 9C10D5804B; Fri, 25 Nov 2022 05:05:46 +0000 (GMT) Received: from smtpav01.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 14CB158059; Fri, 25 Nov 2022 05:05:46 +0000 (GMT) Received: from pike (unknown [9.5.12.127]) by smtpav01.wdc07v.mail.ibm.com (Postfix) with ESMTPS; Fri, 25 Nov 2022 05:05:45 +0000 (GMT) From: Jiufu Guo To: Richard Biener Cc: Jeff Law , gcc-patches@gcc.gnu.org, segher@kernel.crashing.org, dje.gcc@gmail.com, linkw@gcc.gnu.org Subject: Re: [PATCH V2] Use subscalar mode to move struct block for parameter References: <20221117061549.178481-1-guojiufu@linux.ibm.com> <7ea64lroo6.fsf@pike.rch.stglabs.ibm.com> <9424d98e-a95f-58ae-9764-bcf8b4f503dc@gmail.com> <7efseas7f1.fsf@pike.rch.stglabs.ibm.com> Date: Fri, 25 Nov 2022 13:05:43 +0800 In-Reply-To: (Richard Biener's message of "Thu, 24 Nov 2022 07:31:15 +0000 (UTC)") Message-ID: <7ewn7jr5co.fsf@pike.rch.stglabs.ibm.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux) Content-Type: text/plain; charset=utf-8 X-TM-AS-GCONF: 00 X-Proofpoint-GUID: sf8jpz2SVH2WjHUy5i3nf6ttYkqxNs0v X-Proofpoint-ORIG-GUID: 15YoWC2xsgpo9Djuj98HRVjedZ5WZB-e Content-Transfer-Encoding: quoted-printable X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.219,Aquarius:18.0.895,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-11-25_02,2022-11-24_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 impostorscore=0 suspectscore=0 bulkscore=0 phishscore=0 adultscore=0 lowpriorityscore=0 mlxscore=0 spamscore=0 malwarescore=0 priorityscore=1501 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2210170000 definitions=main-2211250039 X-Spam-Status: No, score=-11.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi Richard, Thanks a lot for your comments! Richard Biener writes: > On Wed, 23 Nov 2022, Jiufu Guo wrote: > >> Hi Jeff, >>=20 >> Thanks a lot for your comments! > > Sorry for the late response ... > >> Jeff Law writes: >>=20 >> > On 11/20/22 20:07, Jiufu Guo wrote: >> >> Jiufu Guo writes: >> >> >> >>> Hi, >> >>> >> >>> As mentioned in the previous version patch: >> >>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html >> >>> The suboptimal code is generated for "assigning from parameter" or >> >>> "assigning to return value". >> >>> This patch enhances the assignment from parameters like the below >> >>> cases: >> >>> /////case1.c >> >>> typedef struct SA {double a[3];long l; } A; >> >>> A ret_arg (A a) {return a;} >> >>> void st_arg (A a, A *p) {*p =3D a;} >> >>> >> >>> ////case2.c >> >>> typedef struct SA {double a[3];} A; >> >>> A ret_arg (A a) {return a;} >> >>> void st_arg (A a, A *p) {*p =3D a;} >> >>> >> >>> For this patch, bootstrap and regtest pass on ppc64{,le} >> >>> and x86_64. >> >>> * Besides asking for help reviewing this patch, I would like to >> >>> consult comments about enhancing for "assigning to returns". >> >> I updated the patch to fix the issue for returns. This patch >> >> adds a flag DECL_USEDBY_RETURN_P to indicate if a var is used >> >> by a return stmt. This patch fix the issue in expand pass only, >> >> so, we would try to update the patch to avoid this flag. >> >> >> >> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc >> >> index dd29ffffc03..09b8ec64cea 100644 >> >> --- a/gcc/cfgexpand.cc >> >> +++ b/gcc/cfgexpand.cc >> >> @@ -2158,6 +2158,20 @@ expand_used_vars (bitmap forced_stack_vars) >> >> frame_phase =3D off ? align - off : 0; >> >> } >> >> + /* Collect VARs on returns. */ >> >> + if (DECL_RESULT (current_function_decl)) >> >> + { >> >> + edge_iterator ei; >> >> + edge e; >> >> + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) >> >> + if (greturn *ret =3D safe_dyn_cast (last_stmt (e->src))) >> >> + { >> >> + tree val =3D gimple_return_retval (ret); >> >> + if (val && VAR_P (val)) >> >> + DECL_USEDBY_RETURN_P (val) =3D 1; > > you probably want to check && auto_var_in_fn (val, ...) since val > might be global? Since we are collecting the return vals, it would be built in function gimplify_return_expr. In this function, create_tmp_reg is used and a local temp. So it would not be a global var here. code piece in gimplify_return_expr: if (!result_decl) result =3D NULL_TREE; else if (aggregate_value_p (result_decl, TREE_TYPE (current_function_decl= ))) { .... result =3D result_decl; } else if (gimplify_ctxp->return_temp) result =3D gimplify_ctxp->return_temp; else { result =3D create_tmp_reg (TREE_TYPE (result_decl)); Here, for "typedef struct SA {double a[3];}", aggregate_value_p returns false for target like ppc64le, because result of "hard_function_value" is a "rtx with PARALLELL code". And then a DECL_VAR is built via "create_tmp_reg" (actually it is not a reg here. it built a DECL_VAR with RECORD type and BLK mode). I also tried the way to use RESULT_DECL for this kind of type, or let aggregate_value_p accept this kind of type. But it seems not easy, because " (RESULT_DECL with PARALLEL)" is not ok for address operations. > >> >> + } >> >> + } >> >> + >> >> /* Set TREE_USED on all variables in the local_decls. */ >> >> FOR_EACH_LOCAL_DECL (cfun, i, var) >> >> TREE_USED (var) =3D 1; >> >> diff --git a/gcc/expr.cc b/gcc/expr.cc >> >> index d9407432ea5..20973649963 100644 >> >> --- a/gcc/expr.cc >> >> +++ b/gcc/expr.cc >> >> @@ -6045,6 +6045,52 @@ expand_assignment (tree to, tree from, bool no= ntemporal) >> >> return; >> >> } > > I miss an explanatory comment here on that the following is heuristics > and its reasoning. > >> >> + if ((TREE_CODE (from) =3D=3D PARM_DECL && DECL_INCOMING_RTL (fro= m) >> >> + && TYPE_MODE (TREE_TYPE (from)) =3D=3D BLKmode > > Why check TYPE_MODE here? Do you want AGGREGATE_TYPE_P on the type > instead? Checking BLK, because I want make sure the param should occur on register and stack (saved from register). Actualy, my intention is checking: GET_MODE (DECL_INCOMING_RTL (from)) =3D=3D BLKmode For code: GET_MODE (DECL_INCOMING_RTL (from)) =3D=3D BLKmode && (GET_CODE (DECL_INCOMING_RTL (from)) =3D=3D PARALLEL || REG_P (DECL_INCOMING_RTL (from))) This checking indicates if the param may be passing via 2 or more registers. Using "AGGREGATE_TYPE_P && (PARALLEL || REG_P)" may be ok and more readable. I would have a test. > >> >> + && (GET_CODE (DECL_INCOMING_RTL (from)) =3D=3D PARALLEL >> >> + || REG_P (DECL_INCOMING_RTL (from)))) >> >> + || (VAR_P (to) && DECL_USEDBY_RETURN_P (to) >> >> + && TYPE_MODE (TREE_TYPE (to)) =3D=3D BLKmode > > Likewise. > >> >> + && GET_CODE (DECL_RTL (DECL_RESULT (current_function_decl))) >> >> + =3D=3D PARALLEL)) > > Not REG_P here? REG_P with BLK on return would means return in memory, instead return through registers, so, REG_P was not added here, and let it use orignal behavior. > >> >> + { >> >> + push_temp_slots (); >> >> + rtx par_ret; >> >> + machine_mode mode; >> >> + par_ret =3D TREE_CODE (from) =3D=3D PARM_DECL >> >> + ? DECL_INCOMING_RTL (from) >> >> + : DECL_RTL (DECL_RESULT (current_function_decl)); >> >> + mode =3D GET_CODE (par_ret) =3D=3D PARALLEL >> >> + ? GET_MODE (XEXP (XVECEXP (par_ret, 0, 0), 0)) >> >> + : word_mode; >> >> + int mode_size =3D GET_MODE_SIZE (mode).to_constant (); >> >> + int size =3D INTVAL (expr_size (from)); >> >> + >> >> + /* If/How the parameter using submode, it dependes on the size= and >> >> + position of the parameter. Here using heurisitic number. */ >> >> + int hurstc_num =3D 8; >> > >> > Where did this come from and what does it mean? >> Sorry for does not make this clear. We know that an aggregate arg may be >> on registers partially or totally, as assign_parm_adjust_entry_rtl. >> For an example, if a parameter with 12 words and the target/ABI only >> allow 8 gprs for arguments, then the parameter could use 8 regs at most >> and left part in stack. > > I also wonder about the exact semantics of the parallels we get here. > > + int size =3D INTVAL (expr_size (from)); > > esp. when you use sth as simple as this. Shouldn't you instead look > at to_rtx since that's already expanded? For returns that should Yes, I use "expr_size (from)" is just because the size should be same between "from", "to" and "to_rtx" for these cases. > be the desired layout to match 'from' to, no? Maybe it's better > to not try sharing the code for both incoming and return copies > for clarity? OK, thanks, good sugguestion. > > Also, what happens if there's a copy from a PARM_DECL to a > DECL_USEDBY_RETURN_P decl? Which heuristic takes precedent? I thinking below code reflects this senario: typedef struct SA {double a[3];} A; A ret_arg (A a) {return a;} The gimple seq: D.3951 =3D a; return D.3951; For this patch, since "PARM_DECL" is checked before "VAR_P", so, PARM_DECL is checked as true first. > > I think that at least the place of the copy improvement and the > way you compute DECL_USEDBY_RETURN_P is reasonable. I'm also thinking to mark this flag for return variables during it was built in gimplify_return_expr. This would be more straightforward and no need to query/walk return stmts. One possible concern: it is too far between "gimplify" phase and "expand". I will update the patch accordingly. Thanks for suggestions and comments! BR, Jeff (Jiufu) > > Thanks, > Richard. > >> > >> > >> > Note that BLKmode subword values passed in registers can be either >> > right or left justified.=C2=A0 I think you also need to worry about >> > endianness here. >> Since the subword is used to move block(read from source mem and then >> store to destination mem with register mode), and this would keep to use >> the same endianness on reg like move_block_from_reg. So, the patch does >> not check the endianness. >>=20 >> If any concerns and sugguestions, please point out, thanks! >>=20 >> BR, >> Jeff (Jiufu) >> > >> > >> > Jeff >>=20