From: Jiufu Guo
To: Segher Boessenkool
Cc: Jeff Law, gcc-patches@gcc.gnu.org, rguenth@gcc.gnu.org, pinskia@gcc.gnu.org, linkw@gcc.gnu.org, dje.gcc@gmail.com
Subject: Re: [RFC] propgation leap over memory copy for struct
References: <20221031024235.110995-1-guojiufu@linux.ibm.com> <20221101004956.GL25951@gate.crashing.org>
Date: Tue, 01 Nov 2022 12:30:21 +0800
In-Reply-To: <20221101004956.GL25951@gate.crashing.org> (Segher Boessenkool's message of "Mon, 31 Oct 2022 19:49:56 -0500")
Message-ID: <7e1qqnwb36.fsf@pike.rch.stglabs.ibm.com>

Segher Boessenkool writes:

> On Mon, Oct 31, 2022 at 04:13:38PM -0600, Jeff Law wrote:
>> On 10/30/22 20:42, Jiufu Guo via Gcc-patches wrote:
>> >We know that for struct variable assignment, memory copy may be used.
>> >And for memcpy, we may load and store more bytes as possible at one time.
>> >While it may be not best here:
>
>> So the first question in my mind is can we do better at the gimple
>> phase?  For the second case in particular can't we just "return a"
>> rather than copying a into then returning ?  This feels
>> a lot like the return value optimization from C++.  I'm not sure if it
>> applies to the first case or not, it's been a long time since I looked
>> at NRV optimizations, but it might be worth poking around in there a bit
>> (tree-nrv.cc).
>
> If it is a bigger struct you end up with quite a lot of stuff in
> registers.  GCC will eventually put that all in memory so it will work
> out fine in the end, but you are likely to get inefficient code.

Yes.  For a big struct, we may need to use memory to save registers.
For a small struct, it may be practical to use registers.  We may
leverage the fact that some types of small structs are already passed
to functions in registers.

>
> OTOH, 8 bytes isn't as big as we would want these days, is it?
> So it would be useful to put smaller temporaries, say 32 bytes and
> smaller, in registers instead of in memory.

I think you mean that we should try to use registers to avoid memory
accesses, and that using registers would be OK for larger copies
(e.g. a 32-byte memcpy).
Great suggestion, thanks a lot!

Something like the idea below:
  [r100:TI, r101:TI] = src;  // Or r100:OI/OO = src;
  dest = [r100:TI, r101:TI];

Currently, for an 8-byte structure, we use TImode, and the
subreg/fwprop/cse passes are able to optimize it as expected.
Two concerns here: larger integer modes (OI/OO/...) may not have been
introduced yet, and I'm not sure whether the current infrastructure
supports using two or more registers for one structure.

>
>> But even so, these kinds of things are still bound to happen, so it's
>> probably worth thinking about if we can do better in RTL as well.
>
> Always.  It is a mistake to think that having better high-level
> optimisations means that you don't need good low-level optimisations
> anymore: in fact deficiencies there become more glaringly apparent if
> the early pipeline opts become better :-)

Understood, thanks :)

>
>> The first thing that comes to my mind is to annotate memcpy calls that
>> are structure assignments.  The idea here is that we may want to expand
>> a memcpy differently in those cases.   Changing how we expand an opaque
>> memcpy call is unlikely to be beneficial in most cases.  But changing
>> how we expand a structure copy may be beneficial by exposing the
>> underlying field values.   This would roughly correspond to your method
>> #1.
>>
>> Or instead of changing how we expand, teach the optimizers about these
>> annotated memcpy calls -- they're just a copy of each field.   That's
>> how CSE and the propagators could treat them.
>> After some point we'd
>> lower them in the usual ways, but at least early in the RTL pipeline we
>> could keep them as annotated memcpy calls.  This roughly corresponds to
>> your second suggestion.
>
> Ideally this won't ever make it as far as RTL, if the structures do not
> need to go via memory.  All high-level optimisations should have been
> done earlier, and hopefully it was not expand itself that forced stuff
> into memory!  :-/

Currently, after the early gimple optimizations, struct member accesses
may still need to go through memory (if the mode of the struct is BLK).
For example:

_Bool foo (const A a) { return a.a[0] > 1.0; }

The optimized gimple would be:
  _1 = a.a[0];
  _3 = _1 > 1.0e+0;
  return _3;

During expansion to RTL, parm 'a' is first stored to memory from the
arg regs, and "a.a[0]" is then read back from memory.  It may be better
to use "f1" for "a.a[0]" here.

Maybe method 3 is similar to your idea: using
"parallel:BLK {DF;DF;DF; DF}" for the struct (BLK may be changed), and
using 4 DF registers to access the structure in the expand pass.

Thanks again for your kind and helpful comments!

BR,
Jeff(Jiufu)

>
>
> Segher
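
For anyone who wants to reproduce the foo example above, here is a
minimal self-contained sketch.  The definition of A is an assumption:
this mail never shows it, and four doubles is only a guess that matches
the "{DF;DF;DF; DF}" parallel.  copy_then_return is a hypothetical
helper, loosely modeled on the "copy into a temporary, then return it"
case from the quoted discussion, not code from the patch.

```c
#include <stdbool.h>

/* ASSUMPTION: A is not defined anywhere in this mail.  Four doubles is
   a guess that matches the "parallel:BLK {DF;DF;DF;DF}" remark.  */
typedef struct { double a[4]; } A;

/* The example from the mail: reads one field of a struct that is
   passed by value.  */
bool
foo (const A a)
{
  return a.a[0] > 1.0;
}

/* HYPOTHETICAL helper: a struct assignment followed by a return, the
   kind of copy that may be expanded as a block move.  */
A
copy_then_return (const A *src)
{
  A tmp = *src;   /* struct assignment; may become a memory copy */
  return tmp;
}
```

Compiling this with something like "gcc -O2 -S -fdump-tree-optimized
-fdump-rtl-expand" should make it possible to compare the gimple with
the expanded RTL; what shows up will depend on the target and on the
GCC version.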