From: Richard Biener <richard.guenther@gmail.com>
To: Stefan Schulze Frielinghaus <stefansf@linux.ibm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [RFC] ldist: Recognize rawmemchr loop patterns
Date: Wed, 5 May 2021 12:03:55 +0200 [thread overview]
Message-ID: <CAFiYyc20YVeMzoThdx2ujbk1qmswjMaPO1ynGNwhU1jzJmGDQw@mail.gmail.com> (raw)
In-Reply-To: <CAFiYyc0Q3OZqbzGBukt=rvd6VM0EcdLKbv1d2uORbGCGg-Lbzg@mail.gmail.com>
On Wed, May 5, 2021 at 11:36 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Tue, Mar 16, 2021 at 6:13 PM Stefan Schulze Frielinghaus
> <stefansf@linux.ibm.com> wrote:
> >
> > [snip]
> >
> > Please find attached a new version of the patch. A major change compared to
> > the previous patch is that I created a separate pass which hopefully makes
> > reviewing also easier since it is almost self-contained. After realizing that
> > detecting loops which mimic the behavior of rawmemchr/strlen functions does not
> > really fit into the topic of loop distribution, I created a separate pass.
>
> It's true that these reduction-like patterns are more difficult than
> the existing
> memcpy/memset cases.
>
> > Due
> > to this I was also able to play around a bit and schedule the pass at different
> > times. Currently it is scheduled right before loop distribution where loop
> > header copying already took place which leads to the following effect.
>
> In fact I'd schedule it after loop distribution so there's the chance that loop
> distribution can expose a loop that fits the new pattern.
>
> > Running
> > this setup over
> >
> > char *t (char *p)
> > {
> > for (; *p; ++p);
> > return p;
> > }
> >
> > the new pass transforms
> >
> > char * t (char * p)
> > {
> > char _1;
> > char _7;
> >
> > <bb 2> [local count: 118111600]:
> > _7 = *p_3(D);
> > if (_7 != 0)
> > goto <bb 5>; [89.00%]
> > else
> > goto <bb 7>; [11.00%]
> >
> > <bb 5> [local count: 105119324]:
> >
> > <bb 3> [local count: 955630225]:
> > # p_8 = PHI <p_6(6), p_3(D)(5)>
> > p_6 = p_8 + 1;
> > _1 = *p_6;
> > if (_1 != 0)
> > goto <bb 6>; [89.00%]
> > else
> > goto <bb 8>; [11.00%]
> >
> > <bb 8> [local count: 105119324]:
> > # p_2 = PHI <p_6(3)>
> > goto <bb 4>; [100.00%]
> >
> > <bb 6> [local count: 850510901]:
> > goto <bb 3>; [100.00%]
> >
> > <bb 7> [local count: 12992276]:
> >
> > <bb 4> [local count: 118111600]:
> > # p_9 = PHI <p_2(8), p_3(D)(7)>
> > return p_9;
> >
> > }
> >
> > into
> >
> > char * t (char * p)
> > {
> > char * _5;
> > char _7;
> >
> > <bb 2> [local count: 118111600]:
> > _7 = *p_3(D);
> > if (_7 != 0)
> > goto <bb 5>; [89.00%]
> > else
> > goto <bb 4>; [11.00%]
> >
> > <bb 5> [local count: 105119324]:
> > _5 = p_3(D) + 1;
> > p_10 = .RAWMEMCHR (_5, 0);
> >
> > <bb 4> [local count: 118111600]:
> > # p_9 = PHI <p_10(5), p_3(D)(2)>
> > return p_9;
> >
> > }
> >
> > which is fine so far. However, I haven't made up my mind so far whether it is
> > worthwhile to spend more time in order to also eliminate the "first unrolling"
> > of the loop.
>
> Might be a phiopt transform ;) Might apply to quite some set of
> builtins. I wonder how the strlen case looks like though.
>
> > I gave it a shot by scheduling the pass prior pass copy header
> > and ended up with:
> >
> > char * t (char * p)
> > {
> > <bb 2> [local count: 118111600]:
> > p_5 = .RAWMEMCHR (p_3(D), 0);
> > return p_5;
> >
> > }
> >
> > which seems optimal to me. The downside of this is that I have to initialize
> > scalar evolution analysis which might be undesired that early.
> >
> > All this brings me to the question where do you see this peace of code running?
> > If in a separate pass when would you schedule it? If in an existing pass,
> > which one would you choose?
>
> I think it still fits loop distribution. If you manage to detect it
> with your pass
> standalone then you should be able to detect it in loop distribution. Can you
> explain what part is "easier" as standalone pass?
Btw, another "fitting" pass would be final value replacement (pass_scev_cprop)
since what these patterns provide is a builtin call to compute the value of one
of the loop PHIs on exit. Note this pass leaves removal of in-loop computations
to followup DCE which means that in some cases it does unprofitable transforms.
There's a bug somewhere where I worked on doing final value replacement
on-demand when DCE figures out the loop is otherwise dead but I never finished
this (loop distribution could also use such mechanism to get rid of
unwanted PHIs).
> > Another topic which came up is whether there exists a more elegant solution to
> > my current implementation in order to deal with stores (I'm speaking of the `if
> > (store_dr)` statement inside of function transform_loop_1). For example,
> >
> > extern char *p;
> > char *t ()
> > {
> > for (; *p; ++p);
> > return p;
> > }
> >
> > ends up as
> >
> > char * t ()
> > {
> > char * _1;
> > char * _2;
> > char _3;
> > char * p.1_8;
> > char _9;
> > char * p.1_10;
> > char * p.1_11;
> >
> > <bb 2> [local count: 118111600]:
> > p.1_8 = p;
> > _9 = *p.1_8;
> > if (_9 != 0)
> > goto <bb 5>; [89.00%]
> > else
> > goto <bb 7>; [11.00%]
> >
> > <bb 5> [local count: 105119324]:
> >
> > <bb 3> [local count: 955630225]:
> > # p.1_10 = PHI <_1(6), p.1_8(5)>
> > _1 = p.1_10 + 1;
> > p = _1;
> > _3 = *_1;
> > if (_3 != 0)
> > goto <bb 6>; [89.00%]
> > else
> > goto <bb 8>; [11.00%]
> >
> > <bb 8> [local count: 105119324]:
> > # _2 = PHI <_1(3)>
> > goto <bb 4>; [100.00%]
> >
> > <bb 6> [local count: 850510901]:
> > goto <bb 3>; [100.00%]
> >
> > <bb 7> [local count: 12992276]:
> >
> > <bb 4> [local count: 118111600]:
> > # p.1_11 = PHI <_2(8), p.1_8(7)>
> > return p.1_11;
> >
> > }
> >
> > where inside the loop a load and store occurs. For a rawmemchr like loop I
> > have to show that we never load from a memory location to which we write.
> > Currently I solve this by hard coding those facts which are not generic at all.
> > I gave compute_data_dependences_for_loop a try which failed to determine the
> > fact that stores only happen to p[0] and loads from p[i] where i>0. Maybe
> > there are more generic solutions to express this in contrast to my current one?
>
> So the example loop is not valid to be trasformed to rawmemchr since it's
> valid to call it with p = &p; - but sure, once you pass the first *p != 0 check
> things become fishy but GCC isn't able to turn that into a non-dependence.
>
> Why is the case of stores inside the loop important? In fact that you
> think it is
> makes a case for integrating this with loop distribution since loop distribution
> would be able to prove (if possible) that the stores can be separated into
> a different loop.
>
> And sorry for the delay in answering ...
>
> Thanks,
> Richard.
>
> >
> > Thanks again for your input so far. Really appreciated.
> >
> > Cheers,
> > Stefan
next prev parent reply other threads:[~2021-05-05 10:04 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-02-08 12:47 Stefan Schulze Frielinghaus
2021-02-09 8:57 ` Richard Biener
2021-02-14 10:27 ` Stefan Schulze Frielinghaus
2021-02-25 17:01 ` Stefan Schulze Frielinghaus
2021-02-25 23:49 ` Jeff Law
2021-03-02 12:29 ` Richard Biener
2021-03-03 17:17 ` Stefan Schulze Frielinghaus
2021-03-16 17:13 ` Stefan Schulze Frielinghaus
2021-04-08 8:23 ` Stefan Schulze Frielinghaus
2021-05-04 17:25 ` Stefan Schulze Frielinghaus
2021-05-05 9:36 ` Richard Biener
2021-05-05 10:03 ` Richard Biener [this message]
2021-05-07 12:32 ` Stefan Schulze Frielinghaus
2021-05-20 9:24 ` Richard Biener
2021-05-20 18:37 ` Stefan Schulze Frielinghaus
2021-06-14 17:26 ` Stefan Schulze Frielinghaus
2021-06-16 14:22 ` Richard Biener
2021-06-25 10:23 ` Stefan Schulze Frielinghaus
2021-08-06 14:02 ` Stefan Schulze Frielinghaus
2021-08-20 10:35 ` Richard Biener
2021-09-03 8:00 ` Stefan Schulze Frielinghaus
2021-09-06 9:56 ` Richard Biener
2021-09-13 14:53 ` Stefan Schulze Frielinghaus
2021-09-17 8:08 ` Richard Biener
2021-10-11 16:02 ` Stefan Schulze Frielinghaus
2022-01-31 13:16 ` Tom de Vries
2022-01-31 15:00 ` Richard Biener
2022-01-31 16:26 ` [PATCH][ldist] Don't add lib calls with -fno-tree-loop-distribute-patterns Tom de Vries
2022-02-01 7:04 ` Richard Biener
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAFiYyc20YVeMzoThdx2ujbk1qmswjMaPO1ynGNwhU1jzJmGDQw@mail.gmail.com \
--to=richard.guenther@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=stefansf@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).