public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Biener <richard.guenther@gmail.com>
To: Stefan Schulze Frielinghaus <stefansf@linux.ibm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [RFC] ldist: Recognize rawmemchr loop patterns
Date: Wed, 5 May 2021 11:36:41 +0200	[thread overview]
Message-ID: <CAFiYyc0Q3OZqbzGBukt=rvd6VM0EcdLKbv1d2uORbGCGg-Lbzg@mail.gmail.com> (raw)
In-Reply-To: <YFDnMF3i6OsQa6L9@localhost.localdomain>

On Tue, Mar 16, 2021 at 6:13 PM Stefan Schulze Frielinghaus
<stefansf@linux.ibm.com> wrote:
>
> [snip]
>
> Please find attached a new version of the patch.  A major change compared to
> the previous patch is that I created a separate pass which hopefully makes
> reviewing also easier since it is almost self-contained.  After realizing that
> detecting loops which mimic the behavior of rawmemchr/strlen functions does not
> really fit into the topic of loop distribution, I created a separate pass.

It's true that these reduction-like patterns are more difficult than
the existing
memcpy/memset cases.

>  Due
> to this I was also able to play around a bit and schedule the pass at different
> times.  Currently it is scheduled right before loop distribution where loop
> header copying already took place which leads to the following effect.

In fact I'd schedule it after loop distribution so there's the chance that loop
distribution can expose a loop that fits the new pattern.

>  Running
> this setup over
>
> char *t (char *p)
> {
>   for (; *p; ++p);
>   return p;
> }
>
> the new pass transforms
>
> char * t (char * p)
> {
>   char _1;
>   char _7;
>
>   <bb 2> [local count: 118111600]:
>   _7 = *p_3(D);
>   if (_7 != 0)
>     goto <bb 5>; [89.00%]
>   else
>     goto <bb 7>; [11.00%]
>
>   <bb 5> [local count: 105119324]:
>
>   <bb 3> [local count: 955630225]:
>   # p_8 = PHI <p_6(6), p_3(D)(5)>
>   p_6 = p_8 + 1;
>   _1 = *p_6;
>   if (_1 != 0)
>     goto <bb 6>; [89.00%]
>   else
>     goto <bb 8>; [11.00%]
>
>   <bb 8> [local count: 105119324]:
>   # p_2 = PHI <p_6(3)>
>   goto <bb 4>; [100.00%]
>
>   <bb 6> [local count: 850510901]:
>   goto <bb 3>; [100.00%]
>
>   <bb 7> [local count: 12992276]:
>
>   <bb 4> [local count: 118111600]:
>   # p_9 = PHI <p_2(8), p_3(D)(7)>
>   return p_9;
>
> }
>
> into
>
> char * t (char * p)
> {
>   char * _5;
>   char _7;
>
>   <bb 2> [local count: 118111600]:
>   _7 = *p_3(D);
>   if (_7 != 0)
>     goto <bb 5>; [89.00%]
>   else
>     goto <bb 4>; [11.00%]
>
>   <bb 5> [local count: 105119324]:
>   _5 = p_3(D) + 1;
>   p_10 = .RAWMEMCHR (_5, 0);
>
>   <bb 4> [local count: 118111600]:
>   # p_9 = PHI <p_10(5), p_3(D)(2)>
>   return p_9;
>
> }
>
> which is fine so far.  However, I haven't made up my mind so far whether it is
> worthwhile to spend more time in order to also eliminate the "first unrolling"
> of the loop.

Might be a phiopt transform ;)  Might apply to quite some set of
builtins.  I wonder how the strlen case looks like though.

> I gave it a shot by scheduling the pass prior pass copy header
> and ended up with:
>
> char * t (char * p)
> {
>   <bb 2> [local count: 118111600]:
>   p_5 = .RAWMEMCHR (p_3(D), 0);
>   return p_5;
>
> }
>
> which seems optimal to me.  The downside of this is that I have to initialize
> scalar evolution analysis which might be undesired that early.
>
> All this brings me to the question where do you see this peace of code running?
> If in a separate pass when would you schedule it?  If in an existing pass,
> which one would you choose?

I think it still fits loop distribution.  If you manage to detect it
with your pass
standalone then you should be able to detect it in loop distribution.  Can you
explain what part is "easier" as standalone pass?

> Another topic which came up is whether there exists a more elegant solution to
> my current implementation in order to deal with stores (I'm speaking of the `if
> (store_dr)` statement inside of function transform_loop_1).  For example,
>
> extern char *p;
> char *t ()
> {
>   for (; *p; ++p);
>   return p;
> }
>
> ends up as
>
> char * t ()
> {
>   char * _1;
>   char * _2;
>   char _3;
>   char * p.1_8;
>   char _9;
>   char * p.1_10;
>   char * p.1_11;
>
>   <bb 2> [local count: 118111600]:
>   p.1_8 = p;
>   _9 = *p.1_8;
>   if (_9 != 0)
>     goto <bb 5>; [89.00%]
>   else
>     goto <bb 7>; [11.00%]
>
>   <bb 5> [local count: 105119324]:
>
>   <bb 3> [local count: 955630225]:
>   # p.1_10 = PHI <_1(6), p.1_8(5)>
>   _1 = p.1_10 + 1;
>   p = _1;
>   _3 = *_1;
>   if (_3 != 0)
>     goto <bb 6>; [89.00%]
>   else
>     goto <bb 8>; [11.00%]
>
>   <bb 8> [local count: 105119324]:
>   # _2 = PHI <_1(3)>
>   goto <bb 4>; [100.00%]
>
>   <bb 6> [local count: 850510901]:
>   goto <bb 3>; [100.00%]
>
>   <bb 7> [local count: 12992276]:
>
>   <bb 4> [local count: 118111600]:
>   # p.1_11 = PHI <_2(8), p.1_8(7)>
>   return p.1_11;
>
> }
>
> where inside the loop a load and store occurs.  For a rawmemchr like loop I
> have to show that we never load from a memory location to which we write.
> Currently I solve this by hard coding those facts which are not generic at all.
> I gave compute_data_dependences_for_loop a try which failed to determine the
> fact that stores only happen to p[0] and loads from p[i] where i>0.  Maybe
> there are more generic solutions to express this in contrast to my current one?

So the example loop is not valid to be trasformed to rawmemchr since it's
valid to call it with p = &p; - but sure, once you pass the first *p != 0 check
things become fishy but GCC isn't able to turn that into a non-dependence.

Why is the case of stores inside the loop important?  In fact that you
think it is
makes a case for integrating this with loop distribution since loop distribution
would be able to prove (if possible) that the stores can be separated into
a different loop.

And sorry for the delay in answering ...

Thanks,
Richard.

>
> Thanks again for your input so far.  Really appreciated.
>
> Cheers,
> Stefan

  parent reply	other threads:[~2021-05-05  9:36 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-08 12:47 Stefan Schulze Frielinghaus
2021-02-09  8:57 ` Richard Biener
2021-02-14 10:27   ` Stefan Schulze Frielinghaus
2021-02-25 17:01     ` Stefan Schulze Frielinghaus
2021-02-25 23:49       ` Jeff Law
2021-03-02 12:29     ` Richard Biener
2021-03-03 17:17       ` Stefan Schulze Frielinghaus
2021-03-16 17:13         ` Stefan Schulze Frielinghaus
2021-04-08  8:23           ` Stefan Schulze Frielinghaus
2021-05-04 17:25             ` Stefan Schulze Frielinghaus
2021-05-05  9:36           ` Richard Biener [this message]
2021-05-05 10:03             ` Richard Biener
2021-05-07 12:32             ` Stefan Schulze Frielinghaus
2021-05-20  9:24               ` Richard Biener
2021-05-20 18:37                 ` Stefan Schulze Frielinghaus
2021-06-14 17:26                   ` Stefan Schulze Frielinghaus
2021-06-16 14:22                     ` Richard Biener
2021-06-25 10:23                       ` Stefan Schulze Frielinghaus
2021-08-06 14:02                         ` Stefan Schulze Frielinghaus
2021-08-20 10:35                         ` Richard Biener
2021-09-03  8:00                           ` Stefan Schulze Frielinghaus
2021-09-06  9:56                             ` Richard Biener
2021-09-13 14:53                               ` Stefan Schulze Frielinghaus
2021-09-17  8:08                                 ` Richard Biener
2021-10-11 16:02                                   ` Stefan Schulze Frielinghaus
2022-01-31 13:16                                   ` Tom de Vries
2022-01-31 15:00                                     ` Richard Biener
2022-01-31 16:26                                       ` [PATCH][ldist] Don't add lib calls with -fno-tree-loop-distribute-patterns Tom de Vries
2022-02-01  7:04                                         ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAFiYyc0Q3OZqbzGBukt=rvd6VM0EcdLKbv1d2uORbGCGg-Lbzg@mail.gmail.com' \
    --to=richard.guenther@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=stefansf@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).