From: Jiufu Guo
To: Richard Biener
Cc: Jeff Law, gcc-patches@gcc.gnu.org, pinskia@gcc.gnu.org, dje.gcc@gmail.com, linkw@gcc.gnu.org, segher@kernel.crashing.org, rguenth@gcc.gnu.org
Subject: Re: [RFC] propgation leap over memory copy for struct
References: <20221031024235.110995-1-guojiufu@linux.ibm.com>
Date: Wed, 09 Nov 2022 17:21:28 +0800
In-Reply-To: (Richard Biener's message of "Sat, 5 Nov 2022 12:38:11 +0100")
Message-ID: <7e8rkktrdz.fsf@pike.rch.stglabs.ibm.com>

Hi,

Richard Biener writes:

> On Mon, Oct 31, 2022 at 11:14 PM Jeff Law via Gcc-patches wrote:
>>
>>
>> On 10/30/22 20:42, Jiufu Guo via Gcc-patches wrote:
>> > Hi,
>> >
>> > We know that for struct variable assignment, memory copy may be used.
>> > And for memcpy, we may load and store as many bytes as possible at one time.
>> > While it may not be the best here:
>> > 1. Before/after struct variable assignment, the variable may be operated on.
>> > And it is hard for some optimizations to leap over memcpy. Then some struct
>> > operations may be sub-optimal. Like the issue in PR65421.
>> > 2. The size of struct is constant mostly, the memcpy would be expanded. Using
>> > small size to load/store and executing in parallel may not be slower than using
>> > large size to load/store. (sure, more registers may be used for smaller bytes.)
>> >
>> >
>> > In PR65421, for source code as below:
>> > ////////t.c
>> > #define FN 4
>> > typedef struct { double a[FN]; } A;
>> >
>> > A foo (const A *a) { return *a; }
>> > A bar (const A a) { return a; }
>>
>> So the first question in my mind is can we do better at the gimple
>> phase? For the second case in particular can't we just "return a"
>> rather than copying a into then returning ? This feels
>> a lot like the return value optimization from C++. I'm not sure if it
>> applies to the first case or not, it's been a long time since I looked
>> at NRV optimizations, but it might be worth poking around in there a bit
>> (tree-nrv.cc).
>>
>>
>> But even so, these kinds of things are still bound to happen, so it's
>> probably worth thinking about if we can do better in RTL as well.
>>
>>
>> The first thing that comes to my mind is to annotate memcpy calls that
>> are structure assignments. The idea here is that we may want to expand
>> a memcpy differently in those cases. Changing how we expand an opaque
>> memcpy call is unlikely to be beneficial in most cases. But changing
>> how we expand a structure copy may be beneficial by exposing the
>> underlying field values. This would roughly correspond to your method #1.
>>
>> Or instead of changing how we expand, teach the optimizers about these
>> annotated memcpy calls -- they're just a copy of each field. That's
>> how CSE and the propagators could treat them. After some point we'd
>> lower them in the usual ways, but at least early in the RTL pipeline we
>> could keep them as annotated memcpy calls. This roughly corresponds to
>> your second suggestion.
>
> In the end it depends on the access patterns so some analysis like SRA
> performs would be nice. The issue with expanding memcpy on GIMPLE
> is that we currently cannot express 'rep; movb;' or other target specific
> sequences from the cpymem like optabs on GIMPLE and recovering those
> from piecewise copies on RTL is going to be difficult.

Actually, this is a special memcpy: it is generated while expanding the
struct assignment (expand_assignment/store_expr/emit_block_move).
We could introduce a function, block_move_for_record, for struct types,
and this function could be a target hook that generates specific
sequences. For example:

  r125:DF=[r112:DI+0x20]
  r126:DF=[r112:DI+0x28]
  [r112:DI]=r125:DF
  [r112:DI+0x8]=r126:DF

After expansion, the following passes (cse/prop/dse/..) could optimize
the insn sequences, e.g.

  "[r112:DI+0x20]=f1;r125:DF=[r112:DI+0x20];
   [r112:DI]=r125:DF;r129:DF=[r112:DI]"
  ==> "r129:DF=f1"

And if the small load/store insns still occur in late passes, e.g.
combine, we could recover the insns into a better sequence:

  r125:DF=[r112:DI+0x20];r126:DF=[r112:DI+0x28]
  ==> r155:V2DI=[r112:DI+0x20];

Any comments? Thanks!

BR,
Jeff (Jiufu)

>
>>
>> jeff
>>
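
For readers who want a source-level picture of the idea discussed in this thread, here is a minimal C sketch. It is only an illustration and not GCC code: the names copy_block, copy_fieldwise and store_copy_load are hypothetical, and whether any particular compiler actually forwards the stored value depends on the target and optimization level. It contrasts an opaque block copy with a per-element copy of the struct A from PR65421, and shows the store/copy/load pattern that per-field moves would expose to passes like cse and dse.

```c
#include <assert.h>
#include <string.h>

#define FN 4
typedef struct { double a[FN]; } A;

/* Opaque block copy: roughly what a struct assignment may expand to. */
void copy_block(A *dst, const A *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* Hypothetical field-wise copy: one load and one store per element,
   so each moved value is individually visible. */
void copy_fieldwise(A *dst, const A *src)
{
    for (int i = 0; i < FN; i++)
        dst->a[i] = src->a[i];
}

/* Store a value, copy the struct, then read the value back.
   With visible per-field moves, an optimizer can in principle
   forward x directly to the return value. */
double store_copy_load(double x)
{
    A src = { { 0.0, 0.0, 0.0, 0.0 } };
    A dst;
    src.a[0] = x;
    copy_fieldwise(&dst, &src);
    return dst.a[0];
}
```

Both copy functions produce the same bytes in the destination; the difference is only in how much of the data flow is visible to later passes.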