From: Jiufu Guo
To: Jeff Law
Cc: gcc-patches@gcc.gnu.org, segher@kernel.crashing.org, rguenth@gcc.gnu.org, pinskia@gcc.gnu.org, linkw@gcc.gnu.org, dje.gcc@gmail.com
Subject: Re: [RFC] propgation leap over memory copy for struct
References: <20221031024235.110995-1-guojiufu@linux.ibm.com>
Date: Tue, 01 Nov 2022 11:30:16 +0800
In-Reply-To: (Jeff Law's message of "Mon, 31 Oct 2022 16:13:38 -0600")
Message-ID: <7ecza7wdvb.fsf@pike.rch.stglabs.ibm.com>

Jeff Law writes:

> On 10/30/22 20:42, Jiufu Guo via Gcc-patches wrote:
>> Hi,
>>
>> We know that for struct variable assignment, memory copy may be used.
>> And for memcpy, we may load and store as many bytes as possible at one time.
>> But this may not be the best choice here:
>> 1. Before/after a struct variable assignment, the variable may be operated on.
>> And it is hard for some optimizations to leap over memcpy.  Then some struct
>> operations may be sub-optimal.  Like the issue in PR65421.
>> 2. The size of a struct is mostly constant, so the memcpy would be expanded.  Using
>> a small size to load/store and executing in parallel may not be slower than using
>> a large size to load/store.  (Sure, more registers may be used for smaller sizes.)
>>
>>
>> In PR65421, for source code as below:
>> ////////t.c
>> #define FN 4
>> typedef struct { double a[FN]; } A;
>>
>> A foo (const A *a) { return *a; }
>> A bar (const A a) { return a; }
>
> So the first question in my mind is can we do better at the gimple
> phase?  For the second case in particular can't we just "return a"
> rather than copying a into <retval> then returning <retval>?  This
> feels a lot like the return value optimization from C++.  I'm not sure
> if it applies to the first case or not, it's been a long time since I
> looked at NRV optimizations, but it might be worth poking around in
> there a bit (tree-nrv.cc).

Thanks for pointing out this idea!
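To make the comparison concrete, here is a minimal sketch (not part of any patch) that spells out the "copy into a temporary, then return it" shape next to the direct returns.  The names bar_via_temp and check_copies are hypothetical helpers of mine; bar_via_temp only imitates that shape at the source level and does not claim to show what any pass generates.

```c
#include <assert.h>

#define FN 4
typedef struct { double a[FN]; } A;

/* The two functions from the PR65421 test case.  */
A foo (const A *a) { return *a; }
A bar (const A a) { return a; }

/* Hypothetical helper: copy the argument into a temporary first,
   then return the temporary.  */
A bar_via_temp (const A a)
{
  A tmp = a;
  return tmp;
}

/* Hypothetical helper: all three variants must yield the same values.  */
void check_copies (void)
{
  A x = { { 1.0, 2.0, 3.0, 4.0 } };
  A r1 = foo (&x);
  A r2 = bar (x);
  A r3 = bar_via_temp (x);

  for (int i = 0; i < FN; i++)
    {
      assert (r1.a[i] == x.a[i]);
      assert (r2.a[i] == x.a[i]);
      assert (r3.a[i] == x.a[i]);
    }
}
```

Calling check_copies () from a main function is enough to run it.  Since the results are identical, dropping the temporary only affects the generated code, not the semantics.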
Currently the optimized gimple looks like:

  D.3334 = a;
  return D.3334;

and

  D.3336 = *a_2(D);
  return D.3336;

It may be better to have "return a;" and "return *a;".

-----------------

If the code looks like:

typedef struct { double a[3]; long l;} A; //mix types
A foo (const A a) { return a; }
A bar (const A *a) { return *a; }

the current optimized gimple looks like:

  <retval> = a;
  return <retval>;

and

  <retval> = *a_2(D);
  return <retval>;

"return a;" and "return *a;" may work here too.

>
>
> But even so, these kinds of things are still bound to happen, so it's
> probably worth thinking about if we can do better in RTL as well.
>
Yeap, thanks!

>
> The first thing that comes to my mind is to annotate memcpy calls that
> are structure assignments.  The idea here is that we may want to
> expand a memcpy differently in those cases.  Changing how we expand
> an opaque memcpy call is unlikely to be beneficial in most cases.  But
> changing how we expand a structure copy may be beneficial by exposing
> the underlying field values.  This would roughly correspond to your
> method #1.

Right.  For a general memcpy, we would read/write as many bytes as possible
at one time.  Reading/writing small fields may only be beneficial for
structure assignment.

>
> Or instead of changing how we expand, teach the optimizers about these
> annotated memcpy calls -- they're just a copy of each field.
> That's how CSE and the propagators could treat them.  After some point
> we'd lower them in the usual ways, but at least early in the RTL
> pipeline we could keep them as annotated memcpy calls.  This roughly
> corresponds to your second suggestion.

Thanks for your insights about this idea!  The annotated memcpy would be
used by the early optimizations, and it would be treated as a general
memcpy in later passes.

Thanks again for your very helpful comments and suggestions!

BR,
Jeff (Jiufu)

>
>
> jeff
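
P.S. To illustrate the "just a copy of each field" view at the source level, here is a small sketch in plain C.  It is only an illustration under my own assumptions: copy_block, copy_fields, and check_field_copy are hypothetical names, and nothing here reflects an existing GCC interface or the way an annotated memcpy would actually be represented in RTL.

```c
#include <assert.h>
#include <string.h>

typedef struct { double a[3]; long l; } A;   /* mixed types */

/* Opaque view: one block copy, fields are not visible to the reader.  */
void copy_block (A *dst, const A *src)
{
  memcpy (dst, src, sizeof (A));
}

/* Field-wise view: each field is copied on its own, so a later use of
   dst->l can be traced straight back to src->l.  */
void copy_fields (A *dst, const A *src)
{
  for (int i = 0; i < 3; i++)
    dst->a[i] = src->a[i];
  dst->l = src->l;
}

/* Hypothetical self-check: both views must leave the same field values.  */
void check_field_copy (void)
{
  A s = { { 1.5, 2.5, 3.5 }, 7 };
  A d1, d2;

  copy_block (&d1, &s);
  copy_fields (&d2, &s);

  for (int i = 0; i < 3; i++)
    assert (d1.a[i] == d2.a[i]);
  assert (d1.l == d2.l);
}
```

The two copies are compared field by field rather than with memcmp, so any padding bytes in the struct do not matter.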