From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: from gate.crashing.org (gate.crashing.org [63.228.1.57])
	by sourceware.org (Postfix) with ESMTP id AE80E3858401;
	Tue,  1 Nov 2022 00:38:06 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org AE80E3858401
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=kernel.crashing.org
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=kernel.crashing.org
Received: from gate.crashing.org (localhost.localdomain [127.0.0.1])
	by gate.crashing.org (8.14.1/8.14.1) with ESMTP id 2A10b52j012886;
	Mon, 31 Oct 2022 19:37:06 -0500
Received: (from segher@localhost)
	by gate.crashing.org (8.14.1/8.14.1/Submit) id 2A10b5CG012885;
	Mon, 31 Oct 2022 19:37:05 -0500
X-Authentication-Warning: gate.crashing.org: segher set sender to segher@kernel.crashing.org using -f
Date: Mon, 31 Oct 2022 19:37:05 -0500
From: Segher Boessenkool
To: Jiufu Guo
Cc: gcc-patches@gcc.gnu.org, dje.gcc@gmail.com, linkw@gcc.gnu.org,
	rguenth@gcc.gnu.org, pinskia@gcc.gnu.org
Subject: Re: [RFC] propgation leap over memory copy for struct
Message-ID: <20221101003705.GK25951@gate.crashing.org>
References: <20221031024235.110995-1-guojiufu@linux.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20221031024235.110995-1-guojiufu@linux.ibm.com>
User-Agent: Mutt/1.4.2.3i
X-Spam-Status: No, score=-3.0 required=5.0 tests=BAYES_00,JMQ_SPF_NEUTRAL,KAM_DMARC_STATUS,SPF_HELO_PASS,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id:

Hi!

On Mon, Oct 31, 2022 at 10:42:35AM +0800, Jiufu Guo wrote:
> #define FN 4
> typedef struct { double a[FN]; } A;
>
> A foo (const A *a) { return *a; }
> A bar (const A a) { return a; }
> ///////
>
> If FN<=2; the size of "A" fits into TImode, then this code can be optimized
> (by subreg/cse/fwprop/cprop) as:
> -------
> foo:
> .LFB0:
>         .cfi_startproc
>         blr
>
> bar:
> .LFB1:
>         .cfi_startproc
>         lfd 2,8(3)
>         lfd 1,0(3)
>         blr
> --------

I think you swapped foo and bar here?

> If the size of "A" is larger than any INT mode size, RTL insns would be
> generated as:
>    13: r125:V2DI=[r112:DI+0x20]
>    14: r126:V2DI=[r112:DI+0x30]
>    15: [r112:DI]=r125:V2DI
>    16: [r112:DI+0x10]=r126:V2DI  /// memcpy for assignment: D.3338 = arg;
>    17: r127:DF=[r112:DI]
>    18: r128:DF=[r112:DI+0x8]
>    19: r129:DF=[r112:DI+0x10]
>    20: r130:DF=[r112:DI+0x18]
> ------------
>
> I'm thinking about ways to improve this.
> Method1: One way may be changing the memory copy by referencing the type
> of the struct if the size of struct is not too big. And generate insns
> like the below:
>    13: r125:DF=[r112:DI+0x20]
>    15: r126:DF=[r112:DI+0x28]
>    17: r127:DF=[r112:DI+0x30]
>    19: r128:DF=[r112:DI+0x38]
>    14: [r112:DI]=r125:DF
>    16: [r112:DI+0x8]=r126:DF
>    18: [r112:DI+0x10]=r127:DF
>    20: [r112:DI+0x18]=r128:DF
>    21: r129:DF=[r112:DI]
>    22: r130:DF=[r112:DI+0x8]
>    23: r131:DF=[r112:DI+0x10]
>    24: r132:DF=[r112:DI+0x18]

This is much worse though?  The expansion with memcpy used V2DI, which
typically is close to 2x faster than DFmode accesses.

Or are you trying to avoid small reads of large stores here?  Those
aren't so bad; large reads of small stores are the nastiness we need to
avoid.

The code we have now does

   15: [r112:DI]=r125:V2DI
   ...
   17: r127:DF=[r112:DI]
   18: r128:DF=[r112:DI+0x8]

Can you make this optimised to not use a memory temporary at all, just
immediately assign from r125 to r127 and r128?

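To make that question concrete, here is a minimal C sketch of the two
shapes.  It is not GCC code and does not model RTL; it only imitates the
data flow.  The names chunk16, via_memory_temp, forwarded and
forwarding_matches are made up for illustration, and it assumes 8-byte
doubles so that one 16-byte chunk stands in for one V2DI register.

```c
#include <string.h>

#define FN 4
typedef struct { double a[FN]; } A;

/* Assumption: double is 8 bytes, so a 16-byte chunk holds two of them. */
_Static_assert(sizeof(double) == 8, "sketch assumes 8-byte double");

/* Hypothetical stand-in for one V2DI register (r125, r126). */
typedef struct { unsigned char bytes[16]; } chunk16;

/* Shape of the current expansion: copy the struct in 16-byte chunks
   into a memory temporary, then do scalar loads from that temporary. */
void via_memory_temp(const A *arg, double out[FN])
{
    A tmp;
    chunk16 r125, r126;

    memcpy(&r125, (const unsigned char *)arg, 16);       /* insn 13 */
    memcpy(&r126, (const unsigned char *)arg + 16, 16);  /* insn 14 */
    memcpy((unsigned char *)&tmp, &r125, 16);            /* insn 15 */
    memcpy((unsigned char *)&tmp + 16, &r126, 16);       /* insn 16 */

    for (int i = 0; i < FN; i++)                         /* insns 17..20 */
        out[i] = tmp.a[i];
}

/* Shape asked for above: no memory temporary, each scalar is taken
   straight from the half of the wide value that would have been stored. */
void forwarded(const A *arg, double out[FN])
{
    chunk16 r125, r126;

    memcpy(&r125, (const unsigned char *)arg, 16);
    memcpy(&r126, (const unsigned char *)arg + 16, 16);

    memcpy(&out[0], r125.bytes, 8);      /* low half of r125  */
    memcpy(&out[1], r125.bytes + 8, 8);  /* high half of r125 */
    memcpy(&out[2], r126.bytes, 8);      /* low half of r126  */
    memcpy(&out[3], r126.bytes + 8, 8);  /* high half of r126 */
}

/* Returns 1 when both shapes produce bit-identical results. */
int forwarding_matches(void)
{
    A in = { { 1.0, 2.5, -3.0, 4.25 } };
    double x[FN], y[FN];

    via_memory_temp(&in, x);
    forwarded(&in, y);
    return memcmp(x, y, sizeof x) == 0;
}
```

Both functions compute the same four doubles; the second one just never
goes through the temporary, which is the transformation being asked
about.  How the halves of a V2DI value would actually be extracted in
RTL is left open here.
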
> Method2: One way may be enhancing CSE to make it able to treat one large
> memory slot as two(or more) combined slots:
>    13: r125:V2DI#0=[r112:DI+0x20]
>    13': r125:V2DI#8=[r112:DI+0x28]
>    15: [r112:DI]#0=r125:V2DI#0
>    15': [r112:DI]#8=r125:V2DI#8
>
> This may seems more hack in CSE.

The current CSE pass is the pass most in need of a full rewrite, and has
been for many years.  It does a lot of things, important things that we
should not lose, but it does a pretty bad job of CSE.

> Method3: For some record type, use "PARALLEL:BLK" instead "MEM:BLK".

:BLK can never be optimised well.  It always has to live in memory, by
definition.


Segher