From: Richard Biener <rguenther@suse.de>
To: Jiufu Guo <guojiufu@linux.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>,
Jeff Law <jeffreyalaw@gmail.com>,
gcc-patches@gcc.gnu.org, rguenth@gcc.gnu.org,
pinskia@gcc.gnu.org, linkw@gcc.gnu.org, dje.gcc@gmail.com
Subject: Re: [RFC] propagation leap over memory copy for struct
Date: Sat, 5 Nov 2022 15:13:55 +0100 (CET) [thread overview]
Message-ID: <381qr8s3-53n-pr61-7r1n-6q8q71nsqnq@fhfr.qr> (raw)
In-Reply-To: <7e1qqnwb36.fsf@pike.rch.stglabs.ibm.com>
On Tue, 1 Nov 2022, Jiufu Guo wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
>
> > On Mon, Oct 31, 2022 at 04:13:38PM -0600, Jeff Law wrote:
> >> On 10/30/22 20:42, Jiufu Guo via Gcc-patches wrote:
> >> >We know that for struct variable assignment, a memory copy may be used.
> >> >And for memcpy, we try to load and store as many bytes as possible at a time.
> >> >But that may not be best here:
> >
> >> So the first question in my mind is can we do better at the gimple
> >> phase? For the second case in particular can't we just "return a"
> >> rather than copying a into <retval> then returning <retval>? This feels
> >> a lot like the return value optimization from C++. I'm not sure if it
> >> applies to the first case or not, it's been a long time since I looked
> >> at NRV optimizations, but it might be worth poking around in there a bit
> >> (tree-nrv.cc).
> >
> > If it is a bigger struct you end up with quite a lot of stuff in
> > registers. GCC will eventually put that all in memory so it will work
> > out fine in the end, but you are likely to get inefficient code.
> Yes. We may need to use memory to save registers for big structs.
> For small structs it may be practical to use registers. We may
> leverage the idea that some kinds of small struct are passed to
> functions through registers.
>
> >
> > OTOH, 8 bytes isn't as big as we would want these days, is it? So it
> > would be useful to put smaller temporaries, say 32 bytes and smaller,
> > in registers instead of in memory.
> I think you mean: we should try to use registers to avoid memory
> accesses, and using registers would be OK for larger memcpys (32 bytes).
> Great suggestion, thanks a lot!
>
> Like below idea:
> [r100:TI, r101:TI] = src; //Or r100:OI/OO = src;
> dest = [r100:TI, r101:TI];
>
> Currently, for an 8-byte structure, we are using TImode for it.
> And the subreg/fwprop/cse passes are able to optimize it as expected.
> Two concerns here: larger integer modes (OI/OO/..) may not be
> introduced yet, and I'm not sure whether the current infrastructure
> supports using two or more registers for one structure.
>
> >
> >> But even so, these kinds of things are still bound to happen, so it's
> >> probably worth thinking about if we can do better in RTL as well.
> >
> > Always. It is a mistake to think that having better high-level
> > optimisations means that you don't need good low-level optimisations
> > anymore: in fact deficiencies there become more glaringly apparent if
> > the early pipeline opts become better :-)
> Understand, thanks :)
>
> >
> >> The first thing that comes to my mind is to annotate memcpy calls that
> >> are structure assignments. The idea here is that we may want to expand
> >> a memcpy differently in those cases. Changing how we expand an opaque
> >> memcpy call is unlikely to be beneficial in most cases. But changing
> >> how we expand a structure copy may be beneficial by exposing the
> >> underlying field values. This would roughly correspond to your method
> >> #1.
> >>
> >> Or instead of changing how we expand, teach the optimizers about these
> >> annotated memcpy calls -- they're just a copy of each field. That's
> >> how CSE and the propagators could treat them. After some point we'd
> >> lower them in the usual ways, but at least early in the RTL pipeline we
> >> could keep them as annotated memcpy calls. This roughly corresponds to
> >> your second suggestion.
> >
> > Ideally this won't ever make it as far as RTL, if the structures do not
> > need to go via memory. All high-level optimisations should have been
> > done earlier, and hopefully it was not expand itself that forced stuff
> > into memory! :-/
> Currently, after early gimple optimization, accesses to struct members
> may still need to go through memory (if the mode of the struct is BLK).
> For example:
>
> _Bool foo (const A a) { return a.a[0] > 1.0; }
>
> The optimized gimple would be:
> _1 = a.a[0];
> _3 = _1 > 1.0e+0;
> return _3;
>
> During expansion to RTL, parm 'a' is first stored to memory from the
> argument registers, and "a.a[0]" is then read back from memory. It may
> be better to use "f1" for "a.a[0]" here.
>
> Maybe method 3 is similar to your idea: using "parallel:BLK {DF;DF;DF;DF}"
> for the struct (BLK may be changed), and using 4 DF registers to access
> the structure in the expand pass.
I think for cases like this it might be a good idea to perform
SRA-like analysis at RTL expansion time when we know how parameters
arrive (in pieces) and take that knowledge into account when
assigning the RTL to a decl. The same applies for the return ABI.
Since we rely on RTL to elide copies to/from return/argument
registers/slots we have to assign "layout compatible" registers
to the corresponding auto vars.
>
> Thanks again for your kindly and helpful comments!
>
> BR,
> Jeff(Jiufu)
>
> >
> >
> > Segher
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
Thread overview: 14+ messages
2022-10-31 2:42 Jiufu Guo
2022-10-31 22:13 ` Jeff Law
2022-11-01 0:49 ` Segher Boessenkool
2022-11-01 4:30 ` Jiufu Guo
2022-11-05 14:13 ` Richard Biener [this message]
2022-11-08 4:05 ` Jiufu Guo
2022-11-09 7:51 ` Jiufu Guo
2022-11-09 8:50 ` Richard Biener
2022-11-01 3:30 ` Jiufu Guo
2022-11-05 11:38 ` Richard Biener
2022-11-09 9:21 ` Jiufu Guo
2022-11-09 12:56 ` Richard Biener
2022-11-01 0:37 ` Segher Boessenkool
2022-11-01 3:01 ` Jiufu Guo