Re: [RFC] propgation leap over memory copy for struct

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Jiufu Guo <guojiufu@linux.ibm.com>
To: Segher Boessenkool <segher@kernel.crashing.org>
Cc: gcc-patches@gcc.gnu.org, dje.gcc@gmail.com, linkw@gcc.gnu.org,
	rguenth@gcc.gnu.org, pinskia@gcc.gnu.org
Subject: Re: [RFC] propgation leap over memory copy for struct
Date: Tue, 01 Nov 2022 11:01:14 +0800	[thread overview]
Message-ID: <7emt9bwf7p.fsf@pike.rch.stglabs.ibm.com> (raw)
In-Reply-To: <20221101003705.GK25951@gate.crashing.org> (Segher Boessenkool's message of "Mon, 31 Oct 2022 19:37:05 -0500")

Segher Boessenkool <segher@kernel.crashing.org> writes:

> Hi!
>
> On Mon, Oct 31, 2022 at 10:42:35AM +0800, Jiufu Guo wrote:
>> #define FN 4
>> typedef struct { double a[FN]; } A;
>> 
>> A foo (const A *a) { return *a; }
>> A bar (const A a) { return a; }
>> ///////
>> 
>> If FN<=2; the size of "A" fits into TImode, then this code can be optimized 
>> (by subreg/cse/fwprop/cprop) as:
>> -------
>> foo:
>> .LFB0:
>>         .cfi_startproc
>>         blr
>> 
>> bar:
>> .LFB1:
>>       	.cfi_startproc
>> 	lfd 2,8(3)
>> 	lfd 1,0(3)
>> 	blr
>> --------
>
> I think you swapped foo and bar here?
Oh, thanks!
>
>> If the size of "A" is larger than any INT mode size, RTL insns would be 
>> generated as:
>>    13: r125:V2DI=[r112:DI+0x20]
>>    14: r126:V2DI=[r112:DI+0x30]
>>    15: [r112:DI]=r125:V2DI
>>    16: [r112:DI+0x10]=r126:V2DI  /// memcpy for assignment: D.3338 = arg;
>>    17: r127:DF=[r112:DI]
>>    18: r128:DF=[r112:DI+0x8]
>>    19: r129:DF=[r112:DI+0x10]
>>    20: r130:DF=[r112:DI+0x18]
>> ------------
>> 
>> I'm thinking about ways to improve this.
>> Metod1: One way may be changing the memory copy by referencing the type 
>> of the struct if the size of struct is not too big. And generate insns 
>> like the below:
>>    13: r125:DF=[r112:DI+0x20]
>>    15: r126:DF=[r112:DI+0x28]
>>    17: r127:DF=[r112:DI+0x30]
>>    19: r128:DF=[r112:DI+0x38]
>>    14: [r112:DI]=r125:DF
>>    16: [r112:DI+0x8]=r126:DF
>>    18: [r112:DI+0x10]=r127:DF
>>    20: [r112:DI+0x18]=r128:DF
>>    21: r129:DF=[r112:DI]
>>    22: r130:DF=[r112:DI+0x8]
>>    23: r131:DF=[r112:DI+0x10]
>>    24: r132:DF=[r112:DI+0x18]
>
> This is much worse though?  The expansion with memcpy used V2DI, which
> typically is close to 2x faster than DFmode accesses.
Using V2DI, it help to access 2x bytes at one time than DF/DI.
While since those readings can be executed in parallel, it would be not
too bad via using DF/DI.

>
> Or are you trying to avoid small reads of large stores here?  Those
> aren't so bad, large reads of small stores is the nastiness we need to
> avoid.
Here, I want to use 2 DF readings, because optimizations cse/fwprop/dse
could eleminate those memory accesses on same size.
>
> The code we have now does
>
>    15: [r112:DI]=r125:V2DI
> ...
>    17: r127:DF=[r112:DI]
>    18: r128:DF=[r112:DI+0x8]
>
> Can you make this optimised to not use a memory temporary at all, just
> immediately assign from r125 to r127 and r128?
r125 are not directly assinged to r127/r128, since 'insn 15' and 'insn
17/18' are expanded for different gimple stmt:
  D.3331 = a;  ==> 'insn 15' is generated for struct assignment here.
  return D.3331; ==> 'insn 17/18' are prepared for return registers.

I'm trying to eliminate thos  memory temporary, and did not find a good
way.  Method1-3 are the ideas which I'm trying to use to delete those
temporaries.

>
>> Method2: One way may be enhancing CSE to make it able to treat one large
>> memory slot as two(or more) combined slots: 
>>    13: r125:V2DI#0=[r112:DI+0x20]
>>    13': r125:V2DI#8=[r112:DI+0x28]
>>    15: [r112:DI]#0=r125:V2DI#0
>>    15': [r112:DI]#8=r125:V2DI#8
>> 
>> This may seems more hack in CSE.
>
> The current CSE pass we have is the pass most in need of a full rewrite
> we have, since many many years.  It does a lot of things, important
> things that we should not lose, but it does a pretty bad job of CSE.
>
>> Method3: For some record type, use "PARALLEL:BLK" instead "MEM:BLK".
>
> :BLK can never be optimised well.  It always has to live in memory, by
> definition.

Thanks for your sugguestions!

BR,
Jeff (Jiufu)
>
>
> Segher

     prev parent reply	other threads:[~2022-11-01  3:02 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-31  2:42 Jiufu Guo
2022-10-31 22:13 ` Jeff Law
2022-11-01  0:49   ` Segher Boessenkool
2022-11-01  4:30     ` Jiufu Guo
2022-11-05 14:13       ` Richard Biener
2022-11-08  4:05         ` Jiufu Guo
2022-11-09  7:51           ` Jiufu Guo
2022-11-09  8:50             ` Richard Biener
2022-11-01  3:30   ` Jiufu Guo
2022-11-05 11:38   ` Richard Biener
2022-11-09  9:21     ` Jiufu Guo
2022-11-09 12:56       ` Richard Biener
2022-11-01  0:37 ` Segher Boessenkool
2022-11-01  3:01   ` Jiufu Guo [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7emt9bwf7p.fsf@pike.rch.stglabs.ibm.com \
    --to=guojiufu@linux.ibm.com \
    --cc=dje.gcc@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=linkw@gcc.gnu.org \
    --cc=pinskia@gcc.gnu.org \
    --cc=rguenth@gcc.gnu.org \
    --cc=segher@kernel.crashing.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).