From: Jiufu Guo
To: Jeff Law
Cc: gcc-patches@gcc.gnu.org, segher@kernel.crashing.org, dje.gcc@gmail.com, linkw@gcc.gnu.org, rguenther@suse.de
Subject: Re: [PATCH V2] Use subscalar mode to move struct block for parameter
Date: Tue, 29 Nov 2022 11:53:07 +0800
In-Reply-To: <0e432b50-d500-ca2f-0db5-9e9cf099f26c@gmail.com> (Jeff Law's message of "Mon, 28 Nov 2022 10:00:57 -0700")
Message-ID: <7elenuzaak.fsf@pike.rch.stglabs.ibm.com>

Hi Jeff,

Thanks a lot for your comments!

Jeff Law writes:

> On 11/22/22 19:58, Jiufu Guo wrote:
>> Hi Jeff,
>>
>> Thanks a lot for your comments!
>>
>> Jeff Law writes:
>>
>>> On 11/20/22 20:07, Jiufu Guo wrote:
>>>> Jiufu Guo writes:
>>>>
>>>>> Hi,
>>>>>
>>>>> As mentioned in the previous version patch:
>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html
>>>>> The suboptimal code is generated for "assigning from parameter" or
>>>>> "assigning to return value".
>>>>> This patch enhances the assignment from parameters like the below
>>>>> cases:
>>>>> /////case1.c
>>>>> typedef struct SA {double a[3];long l; } A;
>>>>> A ret_arg (A a) {return a;}
>>>>> void st_arg (A a, A *p) {*p = a;}
>>>>>
>>>>> ////case2.c
>>>>> typedef struct SA {double a[3];} A;
>>>>> A ret_arg (A a) {return a;}
>>>>> void st_arg (A a, A *p) {*p = a;}
>>>>>
>>>>> For this patch, bootstrap and regtest pass on ppc64{,le}
>>>>> and x86_64.
cut...
>>>> +        : word_mode;
>>>> +      int mode_size = GET_MODE_SIZE (mode).to_constant ();
>>>> +      int size = INTVAL (expr_size (from));
>>>> +
>>>> +      /* If/How the parameter using submode, it dependes on the size and
>>>> +         position of the parameter.  Here using heurisitic number.  */
>>>> +      int hurstc_num = 8;
>>> Where did this come from and what does it mean?
>> Sorry for does not make this clear.  We know that an aggregate arg may be
>> on registers partially or totally, as assign_parm_adjust_entry_rtl.
>> For an example, if a parameter with 12 words and the target/ABI only
>> allow 8 gprs for arguments, then the parameter could use 8 regs at most
>> and left part in stack.
>
> Right, but the number of registers is target dependent, so I don't see
> how using "8" or any number of that matter is correct here.

I understand.  Even for the same struct type, the number of registers
used to pass a parameter also depends on the size of the parameter and
on how many parameters precede it.  So, as you said, "8" or any other
fixed number is not always accurate.

The enhancement in this patch only makes the "block move" friendlier to
the following optimizations (cse/dse/dce...) by moving through a
sub-mode, so I just selected an arbitrary number that is not too large
and seemed tolerable.  I also thought about querying the target for the
maximum number of registers used for param/ret passing, but that may
not be very accurate either.

Any suggestions are welcome, and thanks!

>
>
>>
>>>
>>> Note that BLKmode subword values passed in registers can be either
>>> right or left justified.  I think you also need to worry about
>>> endianness here.
>> Since the subword is used to move block(read from source mem and then
>> store to destination mem with register mode), and this would keep to use
>> the same endianness on reg like move_block_from_reg.  So, the patch does
>> not check the endianness.
>
> Hmm, I was clear enough here, particularly using the endianness term.
> Don't you need to query the ABI to ensure that you're not changing
> left vs right justification for a partially in register argument.
> On the PA we have:
>
> /* Specify padding for the last element of a block move between registers
>    and memory.
>
>    The 64-bit runtime specifies that objects need to be left justified
>    (i.e., the normal justification for a big endian target).  The 32-bit
>    runtime specifies right justification for objects smaller than 64 bits.
>    We use a DImode register in the parallel for 5 to 7 byte structures
>    so that there is only one element.  This allows the object to be
>    correctly padded.  */
> #define BLOCK_REG_PADDING(MODE, TYPE, FIRST) \
>   targetm.calls.function_arg_padding ((MODE), (TYPE))

Yes.  We should be careful when storing registers to the stack
(assign_parms/assign_parm_setup_xx/block/reg), or when loading into
return values.

This patch handles only simple cases like "D.xxx = param_1", where the
source and destination of the assignment are both in memory, i.e. the
DECL_RTL (of D.xx/param_xx) is a MEM_P in BLKmode.  To avoid
complications, this patch only handles the case where
"(size % mode_size) == 0".

If I have misunderstood anything, please point it out.  Thanks again
for the comments!

BR,
Jeff (Jiufu)

>
>
> Jeff
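
To illustrate the "(size % mode_size) == 0" restriction discussed above, here is a minimal sketch in plain C rather than GCC RTL.  It is not the code from the patch: the function name move_block_by_chunks and the constant CHUNK_SIZE (a stand-in for the sub-mode size, assumed here to be 8 bytes) are made up for this example.  The idea is that a block whose size is a whole multiple of the chunk size is copied one chunk at a time through a scalar temporary, and anything else falls back to an ordinary block copy.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the sub-mode size (e.g. an 8-byte word).  */
enum { CHUNK_SIZE = sizeof (uint64_t) };

/* Copy SIZE bytes from SRC to DST.
   Returns 1 if the copy went through chunk-sized scalar temporaries,
   0 if it fell back to a plain block copy.  */
static int
move_block_by_chunks (void *dst, const void *src, size_t size)
{
  /* Only handle sizes that are a whole number of chunks, so there is
     no partial last element and no padding question to answer.  */
  if (size == 0 || size % CHUNK_SIZE != 0)
    {
      memcpy (dst, src, size);
      return 0;
    }

  for (size_t off = 0; off < size; off += CHUNK_SIZE)
    {
      uint64_t tmp;
      /* Load one chunk into a scalar, then store it back out.  */
      memcpy (&tmp, (const char *) src + off, CHUNK_SIZE);
      memcpy ((char *) dst + off, &tmp, CHUNK_SIZE);
    }
  return 1;
}
```

With the case1.c struct (three doubles and a long, 32 bytes on an LP64 target) the chunked path is taken, while a 5-byte object would take the fallback, which is the kind of partial element that BLOCK_REG_PADDING exists to handle.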