From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <segher@kernel.crashing.org>
Received: from gate.crashing.org (gate.crashing.org [63.228.1.57])
	by sourceware.org (Postfix) with ESMTP id AE80E3858401;
	Tue,  1 Nov 2022 00:38:06 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org AE80E3858401
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=kernel.crashing.org
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=kernel.crashing.org
Received: from gate.crashing.org (localhost.localdomain [127.0.0.1])
	by gate.crashing.org (8.14.1/8.14.1) with ESMTP id 2A10b52j012886;
	Mon, 31 Oct 2022 19:37:06 -0500
Received: (from segher@localhost)
	by gate.crashing.org (8.14.1/8.14.1/Submit) id 2A10b5CG012885;
	Mon, 31 Oct 2022 19:37:05 -0500
X-Authentication-Warning: gate.crashing.org: segher set sender to segher@kernel.crashing.org using -f
Date: Mon, 31 Oct 2022 19:37:05 -0500
From: Segher Boessenkool <segher@kernel.crashing.org>
To: Jiufu Guo <guojiufu@linux.ibm.com>
Cc: gcc-patches@gcc.gnu.org, dje.gcc@gmail.com, linkw@gcc.gnu.org,
        rguenth@gcc.gnu.org, pinskia@gcc.gnu.org
Subject: Re: [RFC] propgation leap over memory copy for struct
Message-ID: <20221101003705.GK25951@gate.crashing.org>
References: <20221031024235.110995-1-guojiufu@linux.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20221031024235.110995-1-guojiufu@linux.ibm.com>
User-Agent: Mutt/1.4.2.3i
X-Spam-Status: No, score=-3.0 required=5.0 tests=BAYES_00,JMQ_SPF_NEUTRAL,KAM_DMARC_STATUS,SPF_HELO_PASS,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

Hi!

On Mon, Oct 31, 2022 at 10:42:35AM +0800, Jiufu Guo wrote:
> #define FN 4
> typedef struct { double a[FN]; } A;
> 
> A foo (const A *a) { return *a; }
> A bar (const A a) { return a; }
> ///////
> 
> If FN<=2; the size of "A" fits into TImode, then this code can be optimized 
> (by subreg/cse/fwprop/cprop) as:
> -------
> foo:
> .LFB0:
>         .cfi_startproc
>         blr
> 
> bar:
> .LFB1:
>       	.cfi_startproc
> 	lfd 2,8(3)
> 	lfd 1,0(3)
> 	blr
> --------

I think you swapped foo and bar here?
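
(For reference, the FN<=2 threshold in the quoted text is plain size arithmetic: TImode is a 128-bit integer mode, so a struct of two doubles just fits and a struct of four does not. A minimal sketch, assuming 8-byte doubles; the names A2, A4, TIMODE_BYTES and fits_timode are made up for illustration and are not GCC internals.)

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: TImode is 128 bits wide, i.e. 16 bytes. */
#define TIMODE_BYTES 16

typedef struct { double a[2]; } A2;  /* the struct "A" with FN == 2 */
typedef struct { double a[4]; } A4;  /* the struct "A" with FN == 4 */

/* Returns 1 if an object of `size` bytes fits in a single TImode value. */
int fits_timode(size_t size)
{
    return size <= TIMODE_BYTES;
}

/* Hypothetical self-check of the two cases discussed in the thread. */
void check_sizes(void)
{
    assert(fits_timode(sizeof(A2)));   /* 16 bytes: fits */
    assert(!fits_timode(sizeof(A4)));  /* 32 bytes: does not fit */
}
```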

> If the size of "A" is larger than any INT mode size, RTL insns would be 
> generated as:
>    13: r125:V2DI=[r112:DI+0x20]
>    14: r126:V2DI=[r112:DI+0x30]
>    15: [r112:DI]=r125:V2DI
>    16: [r112:DI+0x10]=r126:V2DI  /// memcpy for assignment: D.3338 = arg;
>    17: r127:DF=[r112:DI]
>    18: r128:DF=[r112:DI+0x8]
>    19: r129:DF=[r112:DI+0x10]
>    20: r130:DF=[r112:DI+0x18]
> ------------
> 
> I'm thinking about ways to improve this.
> Method1: One way may be changing the memory copy by referencing the type 
> of the struct if the size of struct is not too big. And generate insns 
> like the below:
>    13: r125:DF=[r112:DI+0x20]
>    15: r126:DF=[r112:DI+0x28]
>    17: r127:DF=[r112:DI+0x30]
>    19: r128:DF=[r112:DI+0x38]
>    14: [r112:DI]=r125:DF
>    16: [r112:DI+0x8]=r126:DF
>    18: [r112:DI+0x10]=r127:DF
>    20: [r112:DI+0x18]=r128:DF
>    21: r129:DF=[r112:DI]
>    22: r130:DF=[r112:DI+0x8]
>    23: r131:DF=[r112:DI+0x10]
>    24: r132:DF=[r112:DI+0x18]

This is much worse though?  The expansion with memcpy used V2DI, which
typically is close to 2x faster than DFmode accesses.

Or are you trying to avoid small reads of large stores here?  Those
aren't so bad; large reads of small stores are the nastiness we need
to avoid.
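
(To make the two access patterns concrete, here is a rough C sketch. The function names are hypothetical, and memcpy only stands in for the loads and stores; an optimising compiler is free to remove the temporary entirely. My understanding, stated as an assumption, is that the second pattern is the expensive one on many cores because the wide load cannot be satisfied from a single earlier store.)

```c
#include <assert.h>
#include <string.h>

/* Small reads of a large store: one 16-byte store, then two 8-byte
   loads that each lie entirely inside that store. */
void small_reads_of_large_store(const double src[2], double out[2])
{
    unsigned char tmp[16];
    memcpy(tmp, src, 16);         /* one wide store */
    memcpy(&out[0], tmp, 8);      /* narrow load from the low half */
    memcpy(&out[1], tmp + 8, 8);  /* narrow load from the high half */
}

/* Large read of small stores: two 8-byte stores, then one 16-byte
   load that spans both of them. */
void large_read_of_small_stores(const double src[2], double out[2])
{
    unsigned char tmp[16];
    memcpy(tmp, &src[0], 8);      /* narrow store, low half */
    memcpy(tmp + 8, &src[1], 8);  /* narrow store, high half */
    memcpy(out, tmp, 16);         /* one wide load covering both stores */
}

/* Both patterns compute the same result; only the access widths differ. */
void check_same_result(void)
{
    const double src[2] = { 1.5, -2.25 };
    double a[2], b[2];
    small_reads_of_large_store(src, a);
    large_read_of_small_stores(src, b);
    assert(a[0] == b[0] && a[1] == b[1]);
}
```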

The code we have now does

   15: [r112:DI]=r125:V2DI
...
   17: r127:DF=[r112:DI]
   18: r128:DF=[r112:DI+0x8]

Can you get this optimised so that it does not use a memory temporary
at all, and just assigns from r125 to r127 and r128 directly?
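
(At the source level the idea looks roughly like the sketch below. It assumes the GCC/Clang vector_size extension, and it uses a vector of two doubles where the RTL above uses V2DI, purely to keep the example short. The function names are hypothetical.)

```c
#include <assert.h>
#include <string.h>

/* Assumption: GCC/Clang vector extension, 16 bytes holding two doubles. */
typedef double v2df __attribute__((vector_size(16)));

/* What the current code does: store the whole vector to a memory
   temporary, then load the two halves back out of it. */
void via_memory_temp(v2df v, double *lo, double *hi)
{
    double tmp[2];
    memcpy(tmp, &v, sizeof tmp);  /* like insn 15: wide store */
    *lo = tmp[0];                 /* like insn 17: DF load at offset 0 */
    *hi = tmp[1];                 /* like insn 18: DF load at offset 8 */
}

/* What is being asked for: take the two halves straight from the
   vector register, with no memory temporary involved. */
void direct(v2df v, double *lo, double *hi)
{
    *lo = v[0];
    *hi = v[1];
}
```

Both functions return the same pair of values; the point is only that the second one never needs the temporary.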

> Method2: One way may be enhancing CSE to make it able to treat one large
> memory slot as two(or more) combined slots: 
>    13: r125:V2DI#0=[r112:DI+0x20]
>    13': r125:V2DI#8=[r112:DI+0x28]
>    15: [r112:DI]#0=r125:V2DI#0
>    15': [r112:DI]#8=r125:V2DI#8
> 
> This may seem more of a hack in CSE.

The current CSE pass is the pass most in need of a full rewrite, and
has been for many, many years.  It does a lot of things, important
things that we should not lose, but it does a pretty bad job of CSE.

> Method3: For some record type, use "PARALLEL:BLK" instead "MEM:BLK".

:BLK can never be optimised well.  It always has to live in memory, by
definition.


Segher