From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jiufu Guo <guojiufu@linux.ibm.com>
To: Segher Boessenkool <segher@kernel.crashing.org>
Cc: gcc-patches@gcc.gnu.org, dje.gcc@gmail.com, linkw@gcc.gnu.org,
        rguenth@gcc.gnu.org, pinskia@gcc.gnu.org
Subject: Re: [RFC] propgation leap over memory copy for struct
References: <20221031024235.110995-1-guojiufu@linux.ibm.com>
	<20221101003705.GK25951@gate.crashing.org>
Date: Tue, 01 Nov 2022 11:01:14 +0800
In-Reply-To: <20221101003705.GK25951@gate.crashing.org> (Segher Boessenkool's
	message of "Mon, 31 Oct 2022 19:37:05 -0500")
Message-ID: <7emt9bwf7p.fsf@pike.rch.stglabs.ibm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
List-Id: <gcc-patches.gcc.gnu.org>

Segher Boessenkool <segher@kernel.crashing.org> writes:

> Hi!
>
> On Mon, Oct 31, 2022 at 10:42:35AM +0800, Jiufu Guo wrote:
>> #define FN 4
>> typedef struct { double a[FN]; } A;
>> 
>> A foo (const A *a) { return *a; }
>> A bar (const A a) { return a; }
>> ///////
>> 
>> If FN<=2; the size of "A" fits into TImode, then this code can be optimized 
>> (by subreg/cse/fwprop/cprop) as:
>> -------
>> foo:
>> .LFB0:
>>         .cfi_startproc
>>         blr
>> 
>> bar:
>> .LFB1:
>>       	.cfi_startproc
>> 	lfd 2,8(3)
>> 	lfd 1,0(3)
>> 	blr
>> --------
>
> I think you swapped foo and bar here?
Oh, thanks!
>
>> If the size of "A" is larger than any INT mode size, RTL insns would be 
>> generated as:
>>    13: r125:V2DI=[r112:DI+0x20]
>>    14: r126:V2DI=[r112:DI+0x30]
>>    15: [r112:DI]=r125:V2DI
>>    16: [r112:DI+0x10]=r126:V2DI  /// memcpy for assignment: D.3338 = arg;
>>    17: r127:DF=[r112:DI]
>>    18: r128:DF=[r112:DI+0x8]
>>    19: r129:DF=[r112:DI+0x10]
>>    20: r130:DF=[r112:DI+0x18]
>> ------------
>> 
>> I'm thinking about ways to improve this.
>> Method1: One way may be changing the memory copy by referencing the type 
>> of the struct if the size of struct is not too big. And generate insns 
>> like the below:
>>    13: r125:DF=[r112:DI+0x20]
>>    15: r126:DF=[r112:DI+0x28]
>>    17: r127:DF=[r112:DI+0x30]
>>    19: r128:DF=[r112:DI+0x38]
>>    14: [r112:DI]=r125:DF
>>    16: [r112:DI+0x8]=r126:DF
>>    18: [r112:DI+0x10]=r127:DF
>>    20: [r112:DI+0x18]=r128:DF
>>    21: r129:DF=[r112:DI]
>>    22: r130:DF=[r112:DI+0x8]
>>    23: r131:DF=[r112:DI+0x10]
>>    24: r132:DF=[r112:DI+0x18]
>
> This is much worse though?  The expansion with memcpy used V2DI, which
> typically is close to 2x faster than DFmode accesses.
Yes, a V2DI access moves twice as many bytes at a time as a DF/DI access.
But since those loads can execute in parallel, using DF/DI should not be
too bad.
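
For concreteness, here is a rough count of the accesses involved.  This is
only a sketch: the helper accesses_needed and the two width constants are
hypothetical names, and it assumes an 8-byte double, so sizeof (A) is 32.

```c
#include <stddef.h>

#define FN 4
typedef struct { double a[FN]; } A;

/* Access widths in bytes (illustrative constants, not GCC names).  */
enum { V2DI_BYTES = 16, DF_BYTES = 8 };

/* Hypothetical helper: how many accesses of WIDTH bytes cover TOTAL bytes.  */
static size_t
accesses_needed (size_t total, size_t width)
{
  return (total + width - 1) / width;
}
```

Under those assumptions the copy needs 2 V2DI loads or 4 DF loads, which
matches the insn lists above.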

>
> Or are you trying to avoid small reads of large stores here?  Those
> aren't so bad, large reads of small stores is the nastiness we need to
> avoid.
Here I want to use 2 DF reads, because cse/fwprop/dse can eliminate
memory accesses when the store and the load have the same size.
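
A source-level sketch of the same-size point, for illustration only.  The
helpers copy_block, copy_fields and sum_fields are hypothetical names and
are not part of GCC or of the patch.  Both copies produce the same bytes,
but only the element-wise one stores each double with the same width that
the later reads use.

```c
#include <string.h>

#define FN 4
typedef struct { double a[FN]; } A;

/* Hypothetical helper: block copy, analogous to the V2DI expansion.  */
static void
copy_block (A *dst, const A *src)
{
  memcpy (dst, src, sizeof (A));
}

/* Hypothetical helper: element-wise copy, analogous to Method1, where
   each store is DF-sized.  */
static void
copy_fields (A *dst, const A *src)
{
  for (int i = 0; i < FN; i++)
    dst->a[i] = src->a[i];
}

/* DF-sized reads of the copy, like the loads for the return registers.  */
static double
sum_fields (const A *p)
{
  double s = 0.0;
  for (int i = 0; i < FN; i++)
    s += p->a[i];
  return s;
}
```

With copy_fields, every read in sum_fields has a store of the same size and
offset in front of it, which is the situation I expect the existing passes
to handle.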
>
> The code we have now does
>
>    15: [r112:DI]=r125:V2DI
> ...
>    17: r127:DF=[r112:DI]
>    18: r128:DF=[r112:DI+0x8]
>
> Can you make this optimised to not use a memory temporary at all, just
> immediately assign from r125 to r127 and r128?
r125 is not assigned directly to r127/r128, because 'insn 15' and 'insn
17/18' are expanded from different gimple statements:
  D.3331 = a;  ==> 'insn 15' is generated for the struct assignment here.
  return D.3331; ==> 'insn 17/18' set up the return registers.

I'm trying to eliminate that memory temporary, but have not found a good
way yet.  Method1-3 are the ideas I'm considering for removing such
temporaries.
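
To make the two statements easier to see, here is bar() with the temporary
written out by hand.  This is only a sketch: bar_explicit and tmp are
hypothetical names, and the comments map the lines to the gimple and insns
above by analogy; the gimple GCC actually produces for it may differ.

```c
#define FN 4
typedef struct { double a[FN]; } A;

/* Same behaviour as bar() from the test case, with an explicit
   temporary standing in for D.3331.  */
A
bar_explicit (const A a)
{
  A tmp = a;    /* like: D.3331 = a;     struct assignment (insn 15)  */
  return tmp;   /* like: return D.3331;  loads for return regs (insn 17/18)  */
}
```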

>
>> Method2: One way may be enhancing CSE to make it able to treat one large
>> memory slot as two(or more) combined slots: 
>>    13: r125:V2DI#0=[r112:DI+0x20]
>>    13': r125:V2DI#8=[r112:DI+0x28]
>>    15: [r112:DI]#0=r125:V2DI#0
>>    15': [r112:DI]#8=r125:V2DI#8
>> 
>> This may seems more hack in CSE.
>
> The current CSE pass we have is the pass most in need of a full rewrite
> we have, since many many years.  It does a lot of things, important
> things that we should not lose, but it does a pretty bad job of CSE.
>
>> Method3: For some record type, use "PARALLEL:BLK" instead "MEM:BLK".
>
> :BLK can never be optimised well.  It always has to live in memory, by
> definition.

Thanks for your suggestions!

BR,
Jeff (Jiufu)
>
>
> Segher