public inbox for gcc-patches@gcc.gnu.org
From: "Bin.Cheng" <amker.cheng@gmail.com>
To: Jan Hubicka <hubicka@ucw.cz>
Cc: Richard Biener <richard.guenther@gmail.com>,
	Bin Cheng <Bin.Cheng@arm.com>,
	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	nd <nd@arm.com>,
	"pthaugen@linux.vnet.ibm.com" <pthaugen@linux.vnet.ibm.com>
Subject: Re: [PATCH PR77536]Generate correct profiling information for vectorized loop
Date: Mon, 20 Feb 2017 17:53:00 -0000
Message-ID: <CAHFci2-E8Wqze03r6AqfjiL31hnh7gEitT3Of-iOeAKPF4enyA@mail.gmail.com>
In-Reply-To: <20170220160509.GA2669@kam.mff.cuni.cz>

On Mon, Feb 20, 2017 at 4:05 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> BTW, if we use gcov_type in the calculation from expected_loop_iterations_unbounded,
>> how should we adjust the number when calling scale_loop_frequencies,
>> which takes int arguments?  In the extreme case, gcov_type could be out of int's
>> range, so we have to cap the value anyway.  But yes, 10000 in
>> expected_loop_iterations is too small.
>
> What I usually do is to use fixed point math in this case (based on REG_BR_PROB_BASE).
> Just pass REG_BR_PROB_BASE as den and calculate the adjustment in gcov_type, converting
> the result to int. Because you should only be decreasing the counts, it won't overflow, and
> because the decrease will be in a range of, say, 2...256 times, it should also be
> sufficiently precise.
>
> Be careful to avoid overflow of gcov_type - it is not safe to multiply two
> counts in 64bit math because each count can be more than 2^32.  (Next stage1 I
> plan to switch most of this to sreals, which will make this easier.)
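
A rough sketch of the fixed-point idea described above, not taken from the patch. The names sketch_count, sketch_scale_num and sketch_apply_scale are hypothetical, and SKETCH_PROB_BASE is assumed to mirror the value of REG_BR_PROB_BASE (10000). It computes an int numerator to pass alongside a denominator of REG_BR_PROB_BASE and never multiplies two counts together:

```c
#include <stdint.h>

/* Assumption: mirrors the value of GCC's REG_BR_PROB_BASE.  */
#define SKETCH_PROB_BASE 10000

/* Hypothetical stand-in for gcov_type.  */
typedef int64_t sketch_count;

/* Return num so that num / SKETCH_PROB_BASE approximates new_n / old_n.
   The result is clamped to [0, SKETCH_PROB_BASE], so counts only shrink.  */
static int
sketch_scale_num (sketch_count new_n, sketch_count old_n)
{
  if (old_n <= 0 || new_n >= old_n)
    return SKETCH_PROB_BASE;
  if (new_n <= 0)
    return 0;
  /* Drop low bits from both values until new_n * SKETCH_PROB_BASE
     is guaranteed to fit in 64 bits.  */
  while (new_n > INT64_MAX / SKETCH_PROB_BASE)
    {
      new_n >>= 1;
      old_n >>= 1;
    }
  return (int) (new_n * SKETCH_PROB_BASE / old_n);
}

/* Compute value * num / SKETCH_PROB_BASE for value >= 0 and
   0 <= num <= SKETCH_PROB_BASE without forming value * num.  */
static sketch_count
sketch_apply_scale (sketch_count value, int num)
{
  sketch_count q = value / SKETCH_PROB_BASE;
  sketch_count r = value % SKETCH_PROB_BASE;
  return q * num + r * num / SKETCH_PROB_BASE;
}
```

Splitting value into quotient and remainder keeps every intermediate product below the original count (or below SKETCH_PROB_BASE squared), which is one way to respect the warning about 64-bit overflow.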
>
>> >> > But I guess here it is sort of safe because vectorized loops are simple.
>> >> > You can't just scale down the existing counts/frequencies by vf, because the
>> >> > entry edge frequency was adjusted.
>> >> I don't 100% follow here; it looks like the code avoids changing the frequency
>> >> counter for the preheader/exit edge, otherwise we would need to change all
>> >> counters dominated by them?
>> >
>> > I was just wondering what is wrong with taking the existing frequencies/counts
>> > the loop body has and dividing them all by the unroll factor.  This is correct
>> > if you ignore the versioning. With versioning I guess you want to scale by
>> > the unroll factor and also subtract frequencies/counts that were accounted to
>> > the other versions of the loop, right?
>> IIUC, for the (vectorized) loop header, its frequency is calculated by:
>>           freq_header = freq_preheader + freq_latch
>> and freq_latch = (niter * freq_preheader).  Simply scaling it by VF gives:
>>           freq_header = (freq_preheader + freq_latch) / VF
>> which is wrong.  Especially if the loop is vectorized with a large VF
>> (=16) and we normally assume niter (=10) without profiling
>> information, it results not only in a mismatch, but also in
>> (loop->header->frequency < loop->preheader->frequency).  In fact, if
>> we have accurate niter information, the counter should be:
>>           freq_header = freq_preheader + niter * freq_preheader
>
> You are right. We need to compensate for the change of probability of the loop
> exit edge.
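
To put numbers on the formulas quoted above, here is a small illustrative sketch. It uses VF = 16 and niter = 10 from the message; the function names are hypothetical, and niter_vec stands for the latch iteration count assumed for the vectorized loop:

```c
/* Naive update: divide the old header frequency by VF.  */
static int
sketch_header_freq_naive (int freq_preheader, int niter, int vf)
{
  int freq_latch = niter * freq_preheader;
  return (freq_preheader + freq_latch) / vf;
}

/* Recomputed update: header = preheader + latch, with the latch
   frequency derived from the iteration count of the new loop.  */
static int
sketch_header_freq_recomputed (int freq_preheader, int niter_vec)
{
  return freq_preheader + niter_vec * freq_preheader;
}
```

With an assumed freq_preheader of 1000, the naive version yields 11000 / 16 = 687, which is below the preheader frequency, while the recomputed version can never drop below freq_preheader.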
>>
>> >> >
>> >> > Also niter_for_unrolled_loop depends on sanity of the profile, so perhaps you
>> >> > need to compute it before you start changing the CFG by peeling the prologue?
>> >> Peeling for prologue doesn't change the profiling information of
>> >> vect_loop; it is the skip edge from before the loop to the preferred epilogue
>> >> loop that will change profile counters.  I guess there is a dilemma here,
>> >> in that niter_for_unrolled_loop is for the loop after peeling for prologue?
>> >
>> > expected_loop_iterations_unbounded calculates the number of iterations by computing
>> > the sum of frequencies of edges entering the loop and comparing it to the frequency
>> > of the loop header.  While peeling the prologue, you split the preheader edge and
>> > adjust the frequency of the new preheader BB of the loop to be vectorized.  I think
>> > that will adjust the estimated number of iterations.
>> I think that's not the case now.  One motivation of the new vect_do_peeling
>> is to avoid niter checks between the prologue and the vector loop.  Once the
>> prologue loop is entered or checked, the vector loop must be executed
>> unconditionally.  So the preheader of the vector loop has consistent
>> frequency counters now.  The niter check on whether the vector loop should
>> be executed is now merged with the cost check before the prologue, and in the
>> future I think this can be further merged if loop versioning is
>> needed.
>
> Originally you have
>
>   loop_preheader
>        |
>        v
>    loop_header
>
> and the ratio of the two BB frequencies is the loop iteration count. Then you
> do something like:
>
>   orig_loop_preheader
>        |
>        v
>    loop_prologue -----> scalar_version_of_loop
>        |
>        v
>  new_loop_preheader
>        |
>        v
>    loop_header
It's now more like this, with the niter check ahead of the prologue:

   orig_loop_preheader
        |
        v
    check_on_niter -----> scalar_version_of_loop (i.e, epilog_loop)
        |
        v
    loop_prologue
        |
        v
  new_loop_preheader
        |
        v
    loop_header

Yes, the preheader and header need to be consistent here, and they are now.
Thanks for explaining.

> At some point, you need to update the new_loop_preheader frequency/count
> to reflect the fact that with some probability the loop_prologue avoids
> the vectorized loop.  Once you do that, if you don't also scale the frequency of
> loop_header, you will make expected_loop_iterations return a higher value
> than previously.
>
> So at the time you are calling it, you need to be sure that the frequencies of
> loop_header and its preheader were both adjusted by the same factor.  Or you need
> to call it early, before you start hacking on the CFG and its profile.
>
> Perhaps it is currently safe, because your peeling code is also scaling
> the loop profiles.
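
A simplified sketch of why both frequencies must be scaled together. This is not GCC's actual implementation; the function names are hypothetical, and the estimate is reduced to the ratio of header frequency to incoming frequency as described above:

```c
#include <stdint.h>

/* Simplified estimate: header frequency divided by the summed
   frequency of the edges entering the loop.  */
static int64_t
sketch_expected_iterations (int64_t freq_entry, int64_t freq_header)
{
  if (freq_entry <= 0)
    return 0;
  return freq_header / freq_entry;
}

/* Scale a frequency by num / den; assumes den > 0 and small operands.  */
static int64_t
sketch_scale_freq (int64_t freq, int num, int den)
{
  return freq * num / den;
}
```

For example, with an entry frequency of 100 and a header frequency of 1000 the estimate is 10. Halving only the entry frequency gives 1000 / 50 = 20, while halving both keeps the estimate at 500 / 50 = 10.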

I will revise the patch according to this discussion.

Thanks,
bin
>
> Honza
>>
>> Thanks,
>> bin
>> >
>> > Honza
>> >>
>> >> Thanks,
>> >> bin
>> >> >
>> >> > Finally, if freq_e is 0, all frequencies and counts will probably be dropped to
>> >> > 0.  What about determining the fraction by counts if they are available?
>> >> >
>> >> > Otherwise the patch looks good and thanks a lot for working on this!
>> >> >
>> >> > Honza
>> >> >
>> >> >> >
>> >> >> > gcc/testsuite/ChangeLog
>> >> >> > 2017-02-16  Bin Cheng  <bin.cheng@arm.com>
>> >> >> >
>> >> >> >         PR tree-optimization/77536
>> >> >> >         * gcc.dg/vect/pr79347.c: Revise testing string.

Thread overview: 13+ messages
2017-02-16 18:38 Bin Cheng
2017-02-17  1:39 ` Pat Haugen
2017-02-20 12:54 ` Richard Biener
2017-02-20 14:21   ` Jan Hubicka
2017-02-20 15:16     ` Bin.Cheng
2017-02-20 15:44       ` Jan Hubicka
2017-02-20 16:05         ` Bin.Cheng
2017-02-20 17:02           ` Jan Hubicka
2017-02-20 17:53             ` Bin.Cheng [this message]
2017-02-21 14:48             ` Bin.Cheng
2017-02-21 15:52               ` Jan Hubicka
2017-02-22 12:23                 ` Bin.Cheng
2017-02-22 14:59                   ` Jan Hubicka
