From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id B1F9B3855171; Fri, 25 Nov 2022 12:30:04 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B1F9B3855171 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com Received: from pps.filterd (m0098420.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 2APCQCdP013207; Fri, 25 Nov 2022 12:30:04 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : references : content-type : date : in-reply-to : message-id : mime-version; s=pp1; bh=+ew/2kweiGl6Lqdo4i0FfAfVi4I9xCk9Qi1qX8AVsfE=; b=OVqZ6+LK0OzdfH4M61DGpf2KhIl9QPi6HR7pA1srF3+sljm1I7z1Mzqb+qfMbzOXQ26I 8eKdecWL9xETDhJoOUcRwkD6Jcw7bP458mrbgMUbtBZDPTQSqcCodMkAr23MIKMTtk3Q 6ZW9mmRO/U1qtKavstikYfu5VCtxpUPE03z+YUmmkKTU6ZqLdDzmBVD7rXQKGhRDA3D/ 1n2ZWjYggZv7kWbtRLLdYB0EWYfuxHNJlo0V3qos6j9GzfTIvKLqWZP7PPGzvYO5olVf 9/wEc/X7dvWaYmjFkKGkVMCDjtvgRNTLTcMSDyTnxDO69xXhlhG9/5MVRhHhplCNAU+U MQ== Received: from pps.reinject (localhost [127.0.0.1]) by mx0b-001b2d01.pphosted.com (PPS) with ESMTPS id 3m2wq1g1wf-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 25 Nov 2022 12:30:03 +0000 Received: from m0098420.ppops.net (m0098420.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 2APCQSKR013517; Fri, 25 Nov 2022 12:30:03 GMT Received: from ppma02dal.us.ibm.com (a.bd.3ea9.ip4.static.sl-reverse.com [169.62.189.10]) by mx0b-001b2d01.pphosted.com (PPS) with ESMTPS id 3m2wq1g1w5-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 25 Nov 2022 12:30:03 +0000 Received: from pps.filterd (ppma02dal.us.ibm.com [127.0.0.1]) by ppma02dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 2APCJr0h005943; Fri, 25 Nov 2022 12:30:02 GMT Received: from smtprelay05.dal12v.mail.ibm.com ([9.208.130.101]) by ppma02dal.us.ibm.com with ESMTP id 3kxpsb186c-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 25 Nov 2022 12:30:02 +0000 Received: from smtpav03.wdc07v.mail.ibm.com (smtpav03.wdc07v.mail.ibm.com [10.39.53.230]) by smtprelay05.dal12v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 2APCU1G534013568 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 25 Nov 2022 12:30:01 GMT Received: from smtpav03.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 2B4825805D; Fri, 25 Nov 2022 12:30:01 +0000 (GMT) Received: from smtpav03.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id AAFCD5805A; Fri, 25 Nov 2022 12:30:00 +0000 (GMT) Received: from pike (unknown [9.5.12.127]) by smtpav03.wdc07v.mail.ibm.com (Postfix) with ESMTPS; Fri, 25 Nov 2022 12:30:00 +0000 (GMT) From: Jiufu Guo To: Jiufu Guo via Gcc-patches Cc: Richard Biener , Jeff Law , segher@kernel.crashing.org, dje.gcc@gmail.com, linkw@gcc.gnu.org Subject: Re: [PATCH V2] Use subscalar mode to move struct block for parameter References: <20221117061549.178481-1-guojiufu@linux.ibm.com> <7ea64lroo6.fsf@pike.rch.stglabs.ibm.com> <9424d98e-a95f-58ae-9764-bcf8b4f503dc@gmail.com> <7efseas7f1.fsf@pike.rch.stglabs.ibm.com> <7ewn7jr5co.fsf@pike.rch.stglabs.ibm.com> Content-Type: text/plain Date: Fri, 25 Nov 2022 20:29:45 +0800 In-Reply-To: <7ewn7jr5co.fsf@pike.rch.stglabs.ibm.com> (Jiufu Guo via Gcc-patches's message of "Fri, 25 Nov 2022 13:05:43 +0800") Message-ID: <7eedtrqksm.fsf@pike.rch.stglabs.ibm.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux) X-TM-AS-GCONF: 00 X-Proofpoint-GUID: 3bikb-4uV1j8Hk-J-oHAFuZsTEYTz39Q X-Proofpoint-ORIG-GUID: mELh-R1fzBU4AFc-ip4tS5oYiZMd0hpJ X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.219,Aquarius:18.0.895,Hydra:6.0.545,FMLib:17.11.122.1 definitions=2022-11-25_04,2022-11-25_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxscore=0 mlxlogscore=999 lowpriorityscore=0 clxscore=1015 spamscore=0 phishscore=0 impostorscore=0 suspectscore=0 priorityscore=1501 adultscore=0 bulkscore=0 malwarescore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2210170000 definitions=main-2211250095 X-Spam-Status: No, score=-11.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Jiufu Guo via Gcc-patches writes: > Hi Richard, > > Thanks a lot for your comments! > > Richard Biener writes: > >> On Wed, 23 Nov 2022, Jiufu Guo wrote: >> >>> Hi Jeff, >>> >>> Thanks a lot for your comments! >> >> Sorry for the late response ... >> >>> Jeff Law writes: >>> >>> > On 11/20/22 20:07, Jiufu Guo wrote: >>> >> Jiufu Guo writes: >>> >> >>> >>> Hi, >>> >>> >>> >>> As mentioned in the previous version patch: >>> >>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html >>> >>> The suboptimal code is generated for "assigning from parameter" or >>> >>> "assigning to return value". >>> >>> This patch enhances the assignment from parameters like the below >>> >>> cases: >>> >>> /////case1.c cut... >>> >> + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) >>> >> + if (greturn *ret = safe_dyn_cast (last_stmt (e->src))) >>> >> + { >>> >> + tree val = gimple_return_retval (ret); >>> >> + if (val && VAR_P (val)) >>> >> + DECL_USEDBY_RETURN_P (val) = 1; >> >> you probably want to check && auto_var_in_fn (val, ...) since val >> might be global? > Since we are collecting the return vals, it would be built in function > gimplify_return_expr. In this function, create_tmp_reg is used and > a local temp. So it would not be a global var here. > > code piece in gimplify_return_expr: > if (!result_decl) > result = NULL_TREE; > else if (aggregate_value_p (result_decl, TREE_TYPE (current_function_decl))) > { > .... > result = result_decl; > } > else if (gimplify_ctxp->return_temp) > result = gimplify_ctxp->return_temp; > else > { > result = create_tmp_reg (TREE_TYPE (result_decl)); > > Here, for "typedef struct SA {double a[3];}", aggregate_value_p returns > false for target like ppc64le, because result of "hard_function_value" > is a "rtx with PARALLELL code". > And then a DECL_VAR is built via "create_tmp_reg" (actually it is not a > reg here. it built a DECL_VAR with RECORD type and BLK mode). > > I also tried the way to use RESULT_DECL for this kind of type, or > let aggregate_value_p accept this kind of type. But it seems not easy, > because " (RESULT_DECL with PARALLEL)" is not ok for address > operations. > > >> >>> >> + } >>> >> + } >>> >> + >>> >> /* Set TREE_USED on all variables in the local_decls. */ >>> >> FOR_EACH_LOCAL_DECL (cfun, i, var) >>> >> TREE_USED (var) = 1; >>> >> diff --git a/gcc/expr.cc b/gcc/expr.cc >>> >> index d9407432ea5..20973649963 100644 >>> >> --- a/gcc/expr.cc >>> >> +++ b/gcc/expr.cc >>> >> @@ -6045,6 +6045,52 @@ expand_assignment (tree to, tree from, bool nontemporal) >>> >> return; >>> >> } >> >> I miss an explanatory comment here on that the following is heuristics >> and its reasoning. >> >>> >> + if ((TREE_CODE (from) == PARM_DECL && DECL_INCOMING_RTL (from) >>> >> + && TYPE_MODE (TREE_TYPE (from)) == BLKmode >> >> Why check TYPE_MODE here? Do you want AGGREGATE_TYPE_P on the type >> instead? > Checking BLK, because I want make sure the param should occur on > register and stack (saved from register). > Actualy, my intention is checking: > GET_MODE (DECL_INCOMING_RTL (from)) == BLKmode > > For code: > GET_MODE (DECL_INCOMING_RTL (from)) == BLKmode > && (GET_CODE (DECL_INCOMING_RTL (from)) == PARALLEL > || REG_P (DECL_INCOMING_RTL (from))) > This checking indicates if the param may be passing via 2 or more > registers. > > Using "AGGREGATE_TYPE_P && (PARALLEL || REG_P)" may be ok and more > readable. I would have a test. Oh, AGGREGATE_TYPE_P seems not the intention of this patch, since if struct size can be represented by an INT size, the mode would be that INT mode, and no need to use block move for the assignment. BR, Jeff (Jiufu) > >> >>> >> + && (GET_CODE (DECL_INCOMING_RTL (from)) == PARALLEL >>> >> + || REG_P (DECL_INCOMING_RTL (from)))) >>> >> + || (VAR_P (to) && DECL_USEDBY_RETURN_P (to) >>> >> + && TYPE_MODE (TREE_TYPE (to)) == BLKmode >> >> Likewise. >> >>> >> + && GET_CODE (DECL_RTL (DECL_RESULT (current_function_decl))) >>> >> + == PARALLEL)) >> >> Not REG_P here? > REG_P with BLK on return would means return in memory, instead return > through registers, so, REG_P was not added here, and let it use > orignal behavior. >> >>> >> + { >>> >> + push_temp_slots (); >>> >> + rtx par_ret; >>> >> + machine_mode mode; >>> >> + par_ret = TREE_CODE (from) == PARM_DECL >>> >> + ? DECL_INCOMING_RTL (from) >>> >> + : DECL_RTL (DECL_RESULT (current_function_decl)); >>> >> + mode = GET_CODE (par_ret) == PARALLEL >>> >> + ? GET_MODE (XEXP (XVECEXP (par_ret, 0, 0), 0)) >>> >> + : word_mode; >>> >> + int mode_size = GET_MODE_SIZE (mode).to_constant (); >>> >> + int size = INTVAL (expr_size (from)); >>> >> + >>> >> + /* If/How the parameter using submode, it dependes on the size and >>> >> + position of the parameter. Here using heurisitic number. */ >>> >> + int hurstc_num = 8; >>> > >>> > Where did this come from and what does it mean? >>> Sorry for does not make this clear. We know that an aggregate arg may be >>> on registers partially or totally, as assign_parm_adjust_entry_rtl. >>> For an example, if a parameter with 12 words and the target/ABI only >>> allow 8 gprs for arguments, then the parameter could use 8 regs at most >>> and left part in stack. >> >> I also wonder about the exact semantics of the parallels we get here. >> >> + int size = INTVAL (expr_size (from)); >> >> esp. when you use sth as simple as this. Shouldn't you instead look >> at to_rtx since that's already expanded? For returns that should > Yes, I use "expr_size (from)" is just because the size should be same > between "from", "to" and "to_rtx" for these cases. >> be the desired layout to match 'from' to, no? Maybe it's better >> to not try sharing the code for both incoming and return copies >> for clarity? > OK, thanks, good sugguestion. >> cut... >>> Since the subword is used to move block(read from source mem and then >>> store to destination mem with register mode), and this would keep to use >>> the same endianness on reg like move_block_from_reg. So, the patch does >>> not check the endianness. >>> >>> If any concerns and sugguestions, please point out, thanks! >>> >>> BR, >>> Jeff (Jiufu) >>> > >>> > >>> > Jeff >>>