From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <guojiufu@linux.ibm.com>
Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5])
	by sourceware.org (Postfix) with ESMTPS id CE4633858D33;
	Wed,  9 Nov 2022 09:21:36 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org CE4633858D33
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com
Received: from pps.filterd (m0098420.ppops.net [127.0.0.1])
	by mx0b-001b2d01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 2A991cEJ005049;
	Wed, 9 Nov 2022 09:21:34 GMT
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject
 : references : date : in-reply-to : message-id : mime-version :
 content-type; s=pp1; bh=DlYv3O+8ftopzTVzP8nTaSYs3Ed46BxddghNzwwdcIA=;
 b=MKsiUSBu/hKLTg5shOWB3MEhXvttAWr+/yYXoAPCeozYS/SnYSNhLvjQIkxhu0jvs70m
 whzlvwENBwKDXLU68CNQq3IvrfpJePKTGz9urcY9KcJ7exYxE8KakeZLD9ZAx2TiikN/
 hyEhtYd/pqKMfjNLzXl7Z4ebdyGKHj73dgRRJdwQjxEgCTU3dKt3KTXJDqyzglWqi4Xl
 N40H/Y4lY8rvT6pqL7f9JGtSFeRTRYVjCSylfR9cygIHkkAHwdfOWkLWPD8q+vuS6YWw
 P668M85689Ejt6EgrOVlad83SNcouiH7KuVymIaZj3tb2gUWc4IBjxsQ7AE6t6ZpWOay jQ== 
Received: from pps.reinject (localhost [127.0.0.1])
	by mx0b-001b2d01.pphosted.com (PPS) with ESMTPS id 3kr7d1m0r3-1
	(version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
	Wed, 09 Nov 2022 09:21:33 +0000
Received: from m0098420.ppops.net (m0098420.ppops.net [127.0.0.1])
	by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 2A99JCRD028331;
	Wed, 9 Nov 2022 09:21:33 GMT
Received: from ppma03wdc.us.ibm.com (ba.79.3fa9.ip4.static.sl-reverse.com [169.63.121.186])
	by mx0b-001b2d01.pphosted.com (PPS) with ESMTPS id 3kr7d1m0qs-1
	(version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
	Wed, 09 Nov 2022 09:21:33 +0000
Received: from pps.filterd (ppma03wdc.us.ibm.com [127.0.0.1])
	by ppma03wdc.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 2A99Jm53029013;
	Wed, 9 Nov 2022 09:21:32 GMT
Received: from b01cxnp22035.gho.pok.ibm.com (b01cxnp22035.gho.pok.ibm.com [9.57.198.25])
	by ppma03wdc.us.ibm.com with ESMTP id 3kngs7cqxa-1
	(version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
	Wed, 09 Nov 2022 09:21:32 +0000
Received: from smtpav02.wdc07v.mail.ibm.com ([9.208.128.114])
	by b01cxnp22035.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 2A99LVOM4850218
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
	Wed, 9 Nov 2022 09:21:32 GMT
Received: from smtpav02.wdc07v.mail.ibm.com (unknown [127.0.0.1])
	by IMSVA (Postfix) with ESMTP id 434AB58067;
	Wed,  9 Nov 2022 09:21:31 +0000 (GMT)
Received: from smtpav02.wdc07v.mail.ibm.com (unknown [127.0.0.1])
	by IMSVA (Postfix) with ESMTP id A74F95805F;
	Wed,  9 Nov 2022 09:21:30 +0000 (GMT)
Received: from pike (unknown [9.5.12.127])
	by smtpav02.wdc07v.mail.ibm.com (Postfix) with ESMTPS;
	Wed,  9 Nov 2022 09:21:30 +0000 (GMT)
From: Jiufu Guo <guojiufu@linux.ibm.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Jeff Law <jeffreyalaw@gmail.com>, gcc-patches@gcc.gnu.org,
        pinskia@gcc.gnu.org, dje.gcc@gmail.com, linkw@gcc.gnu.org,
        segher@kernel.crashing.org, rguenth@gcc.gnu.org
Subject: Re: [RFC] propgation leap over memory copy for struct
References: <20221031024235.110995-1-guojiufu@linux.ibm.com>
	<daf54634-cb3e-a7f8-213d-c18ba781a3ef@gmail.com>
	<CAFiYyc1gtCSC5563LAWDGEn1EAcbpkcCqjj5JMEqnRyKMTmr6Q@mail.gmail.com>
Date: Wed, 09 Nov 2022 17:21:28 +0800
In-Reply-To: <CAFiYyc1gtCSC5563LAWDGEn1EAcbpkcCqjj5JMEqnRyKMTmr6Q@mail.gmail.com>
	(Richard Biener's message of "Sat, 5 Nov 2022 12:38:11 +0100")
Message-ID: <7e8rkktrdz.fsf@pike.rch.stglabs.ibm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-TM-AS-GCONF: 00
X-Proofpoint-GUID: hrY-fajxbxBA9X0XwUjdZnRoHT4zps1Y
X-Proofpoint-ORIG-GUID: lhWLbNrJrMrEh2Jo6tQSIZB6WtEAeW_-
X-Proofpoint-Virus-Version: vendor=baseguard
 engine=ICAP:2.0.219,Aquarius:18.0.895,Hydra:6.0.545,FMLib:17.11.122.1
 definitions=2022-11-09_03,2022-11-08_01,2022-06-22_01
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 mlxscore=0 adultscore=0
 priorityscore=1501 bulkscore=0 clxscore=1015 spamscore=0 malwarescore=0
 phishscore=0 lowpriorityscore=0 impostorscore=0 mlxlogscore=999
 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1
 engine=8.12.0-2210170000 definitions=main-2211090069
X-Spam-Status: No, score=-5.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

Hi,

Richard Biener <richard.guenther@gmail.com> writes:

> On Mon, Oct 31, 2022 at 11:14 PM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>>
>> On 10/30/22 20:42, Jiufu Guo via Gcc-patches wrote:
>> > Hi,
>> >
>> > We know that for struct variable assignment, memory copy may be used.
>> > And for memcpy, we may load and store as many bytes as possible at one time.
>> > But this may not be the best choice here:
>> > 1. Before/after a struct variable assignment, the variable may be operated on.
>> > And it is hard for some optimizations to leap over memcpy.  Then some struct
>> > operations may be sub-optimal.  Like the issue in PR65421.
>> > 2. The size of a struct is mostly constant, so the memcpy would be expanded.  Using
>> > small sizes to load/store and executing in parallel may not be slower than using
>> > large sizes to load/store. (Sure, more registers may be used for smaller bytes.)
>> >
>> >
>> > In PR65421, For source code as below:
>> > ////////t.c
>> > #define FN 4
>> > typedef struct { double a[FN]; } A;
>> >
>> > A foo (const A *a) { return *a; }
>> > A bar (const A a) { return a; }
>>
>> So the first question in my mind is can we do better at the gimple
>> phase?  For the second case in particular can't we just "return a"
>> rather than copying a into <retval> then returning <retval>?  This feels
>> a lot like the return value optimization from C++.  I'm not sure if it
>> applies to the first case or not, it's been a long time since I looked
>> at NRV optimizations, but it might be worth poking around in there a bit
>> (tree-nrv.cc).
>>
>>
>> But even so, these kinds of things are still bound to happen, so it's
>> probably worth thinking about if we can do better in RTL as well.
>>
>>
>> The first thing that comes to my mind is to annotate memcpy calls that
>> are structure assignments.  The idea here is that we may want to expand
>> a memcpy differently in those cases.   Changing how we expand an opaque
>> memcpy call is unlikely to be beneficial in most cases.  But changing
>> how we expand a structure copy may be beneficial by exposing the
>> underlying field values.   This would roughly correspond to your method #1.
>>
>> Or instead of changing how we expand, teach the optimizers about these
>> annotated memcpy calls -- they're just a copy of each field.   That's
>> how CSE and the propagators could treat them. After some point we'd
>> lower them in the usual ways, but at least early in the RTL pipeline we
>> could keep them as annotated memcpy calls.  This roughly corresponds to
>> your second suggestion.
>
> In the end it depends on the access patterns so some analysis like SRA
> performs would be nice.  The issue with expanding memcpy on GIMPLE
> is that we currently cannot express 'rep; movb;' or other target specific
> sequences from the cpymem like optabs on GIMPLE and recovering those
> from piecewise copies on RTL is going to be difficult.
Actually, it is a special memcpy: it is generated while expanding the
struct assignment (expand_assignment/store_expr/emit_block_move).
We may introduce a function block_move_for_record for struct types, and
this function could be a target hook that generates specific sequences.
For example:
r125:DF=[r112:DI+0x20]
r126:DF=[r112:DI+0x28]
[r112:DI]=r125:DF
[r112:DI+0x8]=r126:DF
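
At the source level, the piecewise sequence above corresponds roughly to
copying the struct one element at a time instead of as one opaque block.
A minimal sketch in C, assuming the struct A from PR65421; the names
copy_by_memcpy and copy_by_fields are hypothetical and only for
illustration, they are not GCC functions:

```c
#include <string.h>

#define FN 4
typedef struct { double a[FN]; } A;

/* Block copy: the whole struct is moved as one opaque chunk of bytes. */
static void copy_by_memcpy (A *dst, const A *src)
{
  memcpy (dst, src, sizeof (A));
}

/* Element-wise copy: every load and store is a separate double-sized
   access, which is the shape a per-field expansion would expose.  */
static void copy_by_fields (A *dst, const A *src)
{
  for (int i = 0; i < FN; i++)
    dst->a[i] = src->a[i];
}
```

Both functions leave the same bytes in dst; the difference is only in
how visible the individual elements are to later passes.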

After expanding, the following passes (cse/prop/dse/...) could optimize the
insn sequences, e.g. "[r112:DI+0x20]=f1;r125:DF=[r112:DI+0x20];
[r112:DI]=r125:DF;r129:DF=[r112:DI]" ==> "r129:DF=f1"
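
As a rough source-level analogue of this forwarding (a sketch only; the
function names are hypothetical, and base stands in for r112 pointing
at a buffer of at least five doubles):

```c
/* Mirrors the four insns: store f1, reload it, copy it, reload the copy. */
static double chain_through_memory (double *base, double f1)
{
  base[4] = f1;        /* [r112:DI+0x20] = f1        */
  double t = base[4];  /* r125:DF = [r112:DI+0x20]   */
  base[0] = t;         /* [r112:DI] = r125:DF        */
  return base[0];      /* r129:DF = [r112:DI]        */
}

/* What the final value reduces to once the loads are forwarded. */
static double forwarded (double f1)
{
  return f1;
}
```

The stores themselves can only go away if nothing else reads that
memory; the forwarding by itself just removes the reloads.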

And if the small load/store insns still remain in late passes,
e.g. combine, we could recover the insns into a better sequence:
r125:DF=[r112:DI+0x20];r126:DF=[r112:DI+0x28]
==>
r155:V2DI=[r112:DI+0x20];

Any comments? Thanks!

BR,
Jeff(Jiufu)


>
>>
>> jeff
>>
>>
>>