From: Jiufu Guo
To: Segher Boessenkool
Cc: gcc-patches@gcc.gnu.org, dje.gcc@gmail.com, linkw@gcc.gnu.org, rguenth@gcc.gnu.org, pinskia@gcc.gnu.org
Subject: Re: [RFC] propgation leap over memory copy for struct
References: <20221031024235.110995-1-guojiufu@linux.ibm.com> <20221101003705.GK25951@gate.crashing.org>
Date: Tue, 01 Nov 2022 11:01:14 +0800
In-Reply-To: <20221101003705.GK25951@gate.crashing.org> (Segher Boessenkool's message of "Mon, 31 Oct 2022 19:37:05 -0500")
Message-ID: <7emt9bwf7p.fsf@pike.rch.stglabs.ibm.com>

Segher Boessenkool writes:

> Hi!
>
> On Mon, Oct 31, 2022 at 10:42:35AM +0800, Jiufu Guo wrote:
>> #define FN 4
>> typedef struct { double a[FN]; } A;
>>
>> A foo (const A *a) { return *a; }
>> A bar (const A a) { return a; }
>> ///////
>>
>> If FN<=2; the size of "A" fits into TImode, then this code can be optimized
>> (by subreg/cse/fwprop/cprop) as:
>> -------
>> foo:
>> .LFB0:
>>         .cfi_startproc
>>         blr
>>
>> bar:
>> .LFB1:
>>         .cfi_startproc
>>         lfd 2,8(3)
>>         lfd 1,0(3)
>>         blr
>> --------
>
> I think you swapped foo and bar here?

Oh, thanks!

>
>> If the size of "A" is larger than any INT mode size, RTL insns would be
>> generated as:
>>    13: r125:V2DI=[r112:DI+0x20]
>>    14: r126:V2DI=[r112:DI+0x30]
>>    15: [r112:DI]=r125:V2DI
>>    16: [r112:DI+0x10]=r126:V2DI  /// memcpy for assignment: D.3338 = arg;
>>    17: r127:DF=[r112:DI]
>>    18: r128:DF=[r112:DI+0x8]
>>    19: r129:DF=[r112:DI+0x10]
>>    20: r130:DF=[r112:DI+0x18]
>> ------------
>>
>> I'm thinking about ways to improve this.
>> Metod1: One way may be changing the memory copy by referencing the type
>> of the struct if the size of struct is not too big.
>> And generate insns like the below:
>>    13: r125:DF=[r112:DI+0x20]
>>    15: r126:DF=[r112:DI+0x28]
>>    17: r127:DF=[r112:DI+0x30]
>>    19: r128:DF=[r112:DI+0x38]
>>    14: [r112:DI]=r125:DF
>>    16: [r112:DI+0x8]=r126:DF
>>    18: [r112:DI+0x10]=r127:DF
>>    20: [r112:DI+0x18]=r128:DF
>>    21: r129:DF=[r112:DI]
>>    22: r130:DF=[r112:DI+0x8]
>>    23: r131:DF=[r112:DI+0x10]
>>    24: r132:DF=[r112:DI+0x18]
>
> This is much worse though?  The expansion with memcpy used V2DI, which
> typically is close to 2x faster than DFmode accesses.

Using V2DI helps to access twice as many bytes at a time as DF/DI.
But since those reads can be executed in parallel, using DF/DI would
not be too bad.

>
> Or are you trying to avoid small reads of large stores here?  Those
> aren't so bad, large reads of small stores is the nastiness we need to
> avoid.

Here, I want to use two DF reads, because optimizations such as
cse/fwprop/dse can eliminate those memory accesses when the store and
the load have the same size.

>
> The code we have now does
>
>    15: [r112:DI]=r125:V2DI
> ...
>    17: r127:DF=[r112:DI]
>    18: r128:DF=[r112:DI+0x8]
>
> Can you make this optimised to not use a memory temporary at all, just
> immediately assign from r125 to r127 and r128?

r125 is not directly assigned to r127/r128, since 'insn 15' and
'insn 17/18' are expanded for different gimple stmts:
  D.3331 = a;      ==> 'insn 15' is generated for the struct assignment here.
  return D.3331;   ==> 'insn 17/18' are prepared for the return registers.

I'm trying to eliminate those memory temporaries, and have not found a
good way yet.  Method1-3 are the ideas I'm trying to use to delete
those temporaries.

>
>> Method2: One way may be enhancing CSE to make it able to treat one large
>> memory slot as two(or more) combined slots:
>>    13: r125:V2DI#0=[r112:DI+0x20]
>>    13': r125:V2DI#8=[r112:DI+0x28]
>>    15: [r112:DI]#0=r125:V2DI#0
>>    15': [r112:DI]#8=r125:V2DI#8
>>
>> This may seems more hack in CSE.
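
To make the Method2 idea concrete at the source level, here is a small
C sketch.  It is not GCC code: chunk16, store_chunk, chunk_half and
load_half are hypothetical names used only for illustration.  It models
one 16-byte store as two 8-byte sub-slots, so that a later 8-byte load
at offset 0 or 8 yields the same bits as the matching half of the
stored value.  That equivalence is what CSE would need to recognize.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one 16-byte (V2DI-sized) value. */
typedef struct { unsigned char bytes[16]; } chunk16;

/* One wide store of 16 bytes, like insn 15 in the RTL dump. */
void store_chunk(unsigned char *dst, const chunk16 *c)
{
  memcpy(dst, c->bytes, sizeof c->bytes);
}

/* The 8-byte sub-slot of the wide value at byte offset 0 or 8,
   taken from the value itself and not from memory. */
uint64_t chunk_half(const chunk16 *c, size_t off)
{
  uint64_t v;
  assert(off == 0 || off == 8);
  memcpy(&v, c->bytes + off, sizeof v);
  return v;
}

/* One narrow 8-byte load from memory, like insn 17 or insn 18. */
uint64_t load_half(const unsigned char *mem, size_t off)
{
  uint64_t v;
  memcpy(&v, mem + off, sizeof v);
  return v;
}
```

After store_chunk (mem, &c), load_half (mem, 0) equals chunk_half (&c, 0)
and load_half (mem, 8) equals chunk_half (&c, 8), so the narrow loads
would not need to touch memory if the pass tracked the two halves.
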
>
> The current CSE pass we have is the pass most in need of a full rewrite
> we have, since many many years.  It does a lot of things, important
> things that we should not lose, but it does a pretty bad job of CSE.
>
>> Method3: For some record type, use "PARALLEL:BLK" instead "MEM:BLK".
>
> :BLK can never be optimised well.  It always has to live in memory, by
> definition.

Thanks for your suggestions!

BR,
Jeff (Jiufu)

>
>
> Segher
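
For reference, the Method1 idea can be sketched at the source level as
below.  This is only an illustration, assuming "A" has no padding;
copy_block, copy_by_field and copies_agree are hypothetical helper
names and nothing in GCC.  copy_block corresponds to the block copy
emitted for the struct assignment, and copy_by_field to a copy that
follows the struct's field type (DF), so that each store and the later
load of the same field have the same size.

```c
#include <assert.h>
#include <string.h>

#define FN 4
typedef struct { double a[FN]; } A;

/* Block copy: what the struct assignment "D.3331 = a" amounts to. */
A copy_block(const A *src)
{
  A tmp;
  memcpy(&tmp, src, sizeof tmp);
  return tmp;
}

/* Element-wise copy: the source-level analogue of Method1, where each
   element is moved with its own (double) type. */
A copy_by_field(const A *src)
{
  A tmp;
  for (int i = 0; i < FN; i++)
    tmp.a[i] = src->a[i];
  return tmp;
}

/* Returns 1 when both ways of copying produce the same bytes. */
int copies_agree(const A *src)
{
  assert(src != NULL);
  A x = copy_block(src);
  A y = copy_by_field(src);
  return memcmp(&x, &y, sizeof x) == 0;
}
```

Both functions produce the same result; the difference is only in
which access sizes the expander sees, which is what matters for
cse/fwprop/dse afterwards.
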