Re: [RFC] ldist: Recognize rawmemchr loop patterns

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Richard Biener <richard.guenther@gmail.com>
To: Stefan Schulze Frielinghaus <stefansf@linux.ibm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [RFC] ldist: Recognize rawmemchr loop patterns
Date: Wed, 5 May 2021 12:03:55 +0200	[thread overview]
Message-ID: <CAFiYyc20YVeMzoThdx2ujbk1qmswjMaPO1ynGNwhU1jzJmGDQw@mail.gmail.com> (raw)
In-Reply-To: <CAFiYyc0Q3OZqbzGBukt=rvd6VM0EcdLKbv1d2uORbGCGg-Lbzg@mail.gmail.com>

On Wed, May 5, 2021 at 11:36 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Tue, Mar 16, 2021 at 6:13 PM Stefan Schulze Frielinghaus
> <stefansf@linux.ibm.com> wrote:
> >
> > [snip]
> >
> > Please find attached a new version of the patch.  A major change compared to
> > the previous patch is that I created a separate pass which hopefully makes
> > reviewing also easier since it is almost self-contained.  After realizing that
> > detecting loops which mimic the behavior of rawmemchr/strlen functions does not
> > really fit into the topic of loop distribution, I created a separate pass.
>
> It's true that these reduction-like patterns are more difficult than
> the existing
> memcpy/memset cases.
>
> >  Due
> > to this I was also able to play around a bit and schedule the pass at different
> > times.  Currently it is scheduled right before loop distribution where loop
> > header copying already took place which leads to the following effect.
>
> In fact I'd schedule it after loop distribution so there's the chance that loop
> distribution can expose a loop that fits the new pattern.
>
> >  Running
> > this setup over
> >
> > char *t (char *p)
> > {
> >   for (; *p; ++p);
> >   return p;
> > }
> >
> > the new pass transforms
> >
> > char * t (char * p)
> > {
> >   char _1;
> >   char _7;
> >
> >   <bb 2> [local count: 118111600]:
> >   _7 = *p_3(D);
> >   if (_7 != 0)
> >     goto <bb 5>; [89.00%]
> >   else
> >     goto <bb 7>; [11.00%]
> >
> >   <bb 5> [local count: 105119324]:
> >
> >   <bb 3> [local count: 955630225]:
> >   # p_8 = PHI <p_6(6), p_3(D)(5)>
> >   p_6 = p_8 + 1;
> >   _1 = *p_6;
> >   if (_1 != 0)
> >     goto <bb 6>; [89.00%]
> >   else
> >     goto <bb 8>; [11.00%]
> >
> >   <bb 8> [local count: 105119324]:
> >   # p_2 = PHI <p_6(3)>
> >   goto <bb 4>; [100.00%]
> >
> >   <bb 6> [local count: 850510901]:
> >   goto <bb 3>; [100.00%]
> >
> >   <bb 7> [local count: 12992276]:
> >
> >   <bb 4> [local count: 118111600]:
> >   # p_9 = PHI <p_2(8), p_3(D)(7)>
> >   return p_9;
> >
> > }
> >
> > into
> >
> > char * t (char * p)
> > {
> >   char * _5;
> >   char _7;
> >
> >   <bb 2> [local count: 118111600]:
> >   _7 = *p_3(D);
> >   if (_7 != 0)
> >     goto <bb 5>; [89.00%]
> >   else
> >     goto <bb 4>; [11.00%]
> >
> >   <bb 5> [local count: 105119324]:
> >   _5 = p_3(D) + 1;
> >   p_10 = .RAWMEMCHR (_5, 0);
> >
> >   <bb 4> [local count: 118111600]:
> >   # p_9 = PHI <p_10(5), p_3(D)(2)>
> >   return p_9;
> >
> > }
> >
> > which is fine so far.  However, I haven't made up my mind so far whether it is
> > worthwhile to spend more time in order to also eliminate the "first unrolling"
> > of the loop.
>
> Might be a phiopt transform ;)  Might apply to quite some set of
> builtins.  I wonder how the strlen case looks like though.
>
> > I gave it a shot by scheduling the pass prior pass copy header
> > and ended up with:
> >
> > char * t (char * p)
> > {
> >   <bb 2> [local count: 118111600]:
> >   p_5 = .RAWMEMCHR (p_3(D), 0);
> >   return p_5;
> >
> > }
> >
> > which seems optimal to me.  The downside of this is that I have to initialize
> > scalar evolution analysis which might be undesired that early.
> >
> > All this brings me to the question where do you see this peace of code running?
> > If in a separate pass when would you schedule it?  If in an existing pass,
> > which one would you choose?
>
> I think it still fits loop distribution.  If you manage to detect it
> with your pass
> standalone then you should be able to detect it in loop distribution.  Can you
> explain what part is "easier" as standalone pass?

Btw, another "fitting" pass would be final value replacement (pass_scev_cprop)
since what these patterns provide is a builtin call to compute the value of one
of the loop PHIs on exit.  Note this pass leaves removal of in-loop computations
to followup DCE which means that in some cases it does unprofitable transforms.
There's a bug somewhere where I worked on doing final value replacement
on-demand when DCE figures out the loop is otherwise dead but I never finished
this (loop distribution could also use such mechanism to get rid of
unwanted PHIs).

> > Another topic which came up is whether there exists a more elegant solution to
> > my current implementation in order to deal with stores (I'm speaking of the `if
> > (store_dr)` statement inside of function transform_loop_1).  For example,
> >
> > extern char *p;
> > char *t ()
> > {
> >   for (; *p; ++p);
> >   return p;
> > }
> >
> > ends up as
> >
> > char * t ()
> > {
> >   char * _1;
> >   char * _2;
> >   char _3;
> >   char * p.1_8;
> >   char _9;
> >   char * p.1_10;
> >   char * p.1_11;
> >
> >   <bb 2> [local count: 118111600]:
> >   p.1_8 = p;
> >   _9 = *p.1_8;
> >   if (_9 != 0)
> >     goto <bb 5>; [89.00%]
> >   else
> >     goto <bb 7>; [11.00%]
> >
> >   <bb 5> [local count: 105119324]:
> >
> >   <bb 3> [local count: 955630225]:
> >   # p.1_10 = PHI <_1(6), p.1_8(5)>
> >   _1 = p.1_10 + 1;
> >   p = _1;
> >   _3 = *_1;
> >   if (_3 != 0)
> >     goto <bb 6>; [89.00%]
> >   else
> >     goto <bb 8>; [11.00%]
> >
> >   <bb 8> [local count: 105119324]:
> >   # _2 = PHI <_1(3)>
> >   goto <bb 4>; [100.00%]
> >
> >   <bb 6> [local count: 850510901]:
> >   goto <bb 3>; [100.00%]
> >
> >   <bb 7> [local count: 12992276]:
> >
> >   <bb 4> [local count: 118111600]:
> >   # p.1_11 = PHI <_2(8), p.1_8(7)>
> >   return p.1_11;
> >
> > }
> >
> > where inside the loop a load and store occurs.  For a rawmemchr like loop I
> > have to show that we never load from a memory location to which we write.
> > Currently I solve this by hard coding those facts which are not generic at all.
> > I gave compute_data_dependences_for_loop a try which failed to determine the
> > fact that stores only happen to p[0] and loads from p[i] where i>0.  Maybe
> > there are more generic solutions to express this in contrast to my current one?
>
> So the example loop is not valid to be trasformed to rawmemchr since it's
> valid to call it with p = &p; - but sure, once you pass the first *p != 0 check
> things become fishy but GCC isn't able to turn that into a non-dependence.
>
> Why is the case of stores inside the loop important?  In fact that you
> think it is
> makes a case for integrating this with loop distribution since loop distribution
> would be able to prove (if possible) that the stores can be separated into
> a different loop.
>
> And sorry for the delay in answering ...
>
> Thanks,
> Richard.
>
> >
> > Thanks again for your input so far.  Really appreciated.
> >
> > Cheers,
> > Stefan

next prev parent reply	other threads:[~2021-05-05 10:04 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-08 12:47 Stefan Schulze Frielinghaus
2021-02-09  8:57 ` Richard Biener
2021-02-14 10:27   ` Stefan Schulze Frielinghaus
2021-02-25 17:01     ` Stefan Schulze Frielinghaus
2021-02-25 23:49       ` Jeff Law
2021-03-02 12:29     ` Richard Biener
2021-03-03 17:17       ` Stefan Schulze Frielinghaus
2021-03-16 17:13         ` Stefan Schulze Frielinghaus
2021-04-08  8:23           ` Stefan Schulze Frielinghaus
2021-05-04 17:25             ` Stefan Schulze Frielinghaus
2021-05-05  9:36           ` Richard Biener
2021-05-05 10:03             ` Richard Biener [this message]
2021-05-07 12:32             ` Stefan Schulze Frielinghaus
2021-05-20  9:24               ` Richard Biener
2021-05-20 18:37                 ` Stefan Schulze Frielinghaus
2021-06-14 17:26                   ` Stefan Schulze Frielinghaus
2021-06-16 14:22                     ` Richard Biener
2021-06-25 10:23                       ` Stefan Schulze Frielinghaus
2021-08-06 14:02                         ` Stefan Schulze Frielinghaus
2021-08-20 10:35                         ` Richard Biener
2021-09-03  8:00                           ` Stefan Schulze Frielinghaus
2021-09-06  9:56                             ` Richard Biener
2021-09-13 14:53                               ` Stefan Schulze Frielinghaus
2021-09-17  8:08                                 ` Richard Biener
2021-10-11 16:02                                   ` Stefan Schulze Frielinghaus
2022-01-31 13:16                                   ` Tom de Vries
2022-01-31 15:00                                     ` Richard Biener
2022-01-31 16:26                                       ` [PATCH][ldist] Don't add lib calls with -fno-tree-loop-distribute-patterns Tom de Vries
2022-02-01  7:04                                         ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAFiYyc20YVeMzoThdx2ujbk1qmswjMaPO1ynGNwhU1jzJmGDQw@mail.gmail.com \
    --to=richard.guenther@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=stefansf@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).