From: Jiufu Guo <guojiufu@linux.ibm.com>
To: Segher Boessenkool <segher@kernel.crashing.org>
Cc: gcc-patches@gcc.gnu.org, dje.gcc@gmail.com, linkw@gcc.gnu.org,
rguenth@gcc.gnu.org, pinskia@gcc.gnu.org
Subject: Re: [RFC] propgation leap over memory copy for struct
Date: Tue, 01 Nov 2022 11:01:14 +0800
Message-ID: <7emt9bwf7p.fsf@pike.rch.stglabs.ibm.com>
In-Reply-To: <20221101003705.GK25951@gate.crashing.org> (Segher Boessenkool's message of "Mon, 31 Oct 2022 19:37:05 -0500")
Segher Boessenkool <segher@kernel.crashing.org> writes:
> Hi!
>
> On Mon, Oct 31, 2022 at 10:42:35AM +0800, Jiufu Guo wrote:
>> #define FN 4
>> typedef struct { double a[FN]; } A;
>>
>> A foo (const A *a) { return *a; }
>> A bar (const A a) { return a; }
>> ///////
>>
>> If FN<=2, the size of "A" fits into TImode, and this code can be optimized
>> (by subreg/cse/fwprop/cprop) as:
>> -------
>> foo:
>> .LFB0:
>> .cfi_startproc
>> blr
>>
>> bar:
>> .LFB1:
>> .cfi_startproc
>> lfd 2,8(3)
>> lfd 1,0(3)
>> blr
>> --------
>
> I think you swapped foo and bar here?
Oh, thanks!
>
>> If the size of "A" is larger than any INT mode size, RTL insns would be
>> generated as:
>> 13: r125:V2DI=[r112:DI+0x20]
>> 14: r126:V2DI=[r112:DI+0x30]
>> 15: [r112:DI]=r125:V2DI
>> 16: [r112:DI+0x10]=r126:V2DI /// memcpy for assignment: D.3338 = arg;
>> 17: r127:DF=[r112:DI]
>> 18: r128:DF=[r112:DI+0x8]
>> 19: r129:DF=[r112:DI+0x10]
>> 20: r130:DF=[r112:DI+0x18]
>> ------------
>>
>> I'm thinking about ways to improve this.
>> Method1: One way may be to change the memory copy so that it follows the
>> type of the struct, if the struct is not too big, and generate insns
>> like the below:
>> 13: r125:DF=[r112:DI+0x20]
>> 15: r126:DF=[r112:DI+0x28]
>> 17: r127:DF=[r112:DI+0x30]
>> 19: r128:DF=[r112:DI+0x38]
>> 14: [r112:DI]=r125:DF
>> 16: [r112:DI+0x8]=r126:DF
>> 18: [r112:DI+0x10]=r127:DF
>> 20: [r112:DI+0x18]=r128:DF
>> 21: r129:DF=[r112:DI]
>> 22: r130:DF=[r112:DI+0x8]
>> 23: r131:DF=[r112:DI+0x10]
>> 24: r132:DF=[r112:DI+0x18]
>
> This is much worse though? The expansion with memcpy used V2DI, which
> typically is close to 2x faster than DFmode accesses.
Using V2DI does access twice as many bytes per insn as DF/DI.
But since those loads can be executed in parallel, using DF/DI would
not be too bad.
>
> Or are you trying to avoid small reads of large stores here? Those
> aren't so bad, large reads of small stores is the nastiness we need to
> avoid.
Here, I want to use two DF reads, because cse/fwprop/dse can eliminate
memory accesses when the store and the load have the same size.
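As a source-level analogy only (the function names below are made up for
illustration, and the real change would be in RTL expansion, not in C), the
difference between the block copy and the per-field copy of Method1 can be
sketched like this. Both produce the same bytes; only the access width of
the stores differs, and that width is what has to match the later DF loads
for cse/fwprop/dse to forward the values:

```c
#include <assert.h>
#include <string.h>

#define FN 4
typedef struct { double a[FN]; } A;

/* Copy as one opaque block, the way a memcpy-style expansion treats it. */
void copy_block(A *dst, const A *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* Copy field by field, so each store is as wide as a later DF load. */
void copy_by_field(A *dst, const A *src)
{
    for (int i = 0; i < FN; i++)
        dst->a[i] = src->a[i];
}

/* Both copies must leave identical bytes behind (A has no padding). */
void check_copies_agree(const A *src)
{
    A x, y;
    copy_block(&x, src);
    copy_by_field(&y, src);
    assert(memcmp(&x, &y, sizeof x) == 0);
}
```

Whether the compiler actually emits DF or V2DI accesses for either function
depends on the target and options; the sketch only shows that the two forms
are interchangeable in terms of the result.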
>
> The code we have now does
>
> 15: [r112:DI]=r125:V2DI
> ...
> 17: r127:DF=[r112:DI]
> 18: r128:DF=[r112:DI+0x8]
>
> Can you make this optimised to not use a memory temporary at all, just
> immediately assign from r125 to r127 and r128?
r125 is not directly assigned to r127/r128, since 'insn 15' and 'insn
17/18' are expanded from different gimple stmts:
D.3331 = a; ==> 'insn 15' is generated for the struct assignment here.
return D.3331; ==> 'insn 17/18' set up the return registers.
I'm trying to eliminate those memory temporaries, and have not found a
good way yet. Method1-3 are the ideas I'm considering for deleting those
temporaries.
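For illustration, the two gimple stmts roughly correspond to writing the
temporary out by hand in C. This is only a sketch: "bar_with_temp" and "tmp"
are hypothetical names, and the compiler creates D.3331 on its own for the
original "bar":

```c
#define FN 4
typedef struct { double a[FN]; } A;

/* Same result as "A bar (const A a) { return a; }", temporary spelled out. */
A bar_with_temp(const A a)
{
    A tmp = a;   /* plays the role of "D.3331 = a;" (struct assignment) */
    return tmp;  /* plays the role of "return D.3331;" (return registers) */
}
```

Each statement is expanded separately, which is why the copy into the
temporary and the loads for the return value end up as unrelated insns.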
>
>> Method2: One way may be enhancing CSE to make it able to treat one large
>> memory slot as two(or more) combined slots:
>> 13: r125:V2DI#0=[r112:DI+0x20]
>> 13': r125:V2DI#8=[r112:DI+0x28]
>> 15: [r112:DI]#0=r125:V2DI#0
>> 15': [r112:DI]#8=r125:V2DI#8
>>
>> This may seem like more of a hack in CSE.
>
> The current CSE pass we have is the pass most in need of a full rewrite
> we have, since many many years. It does a lot of things, important
> things that we should not lose, but it does a pretty bad job of CSE.
>
>> Method3: For some record types, use "PARALLEL:BLK" instead of "MEM:BLK".
>
> :BLK can never be optimised well. It always has to live in memory, by
> definition.
Thanks for your suggestions!
BR,
Jeff (Jiufu)
>
>
> Segher