Re: [PATCH PR77536]Generate correct profiling information for vectorized loop

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: "Bin.Cheng" <amker.cheng@gmail.com>
To: Jan Hubicka <hubicka@ucw.cz>
Cc: Richard Biener <richard.guenther@gmail.com>,
	Bin Cheng <Bin.Cheng@arm.com>,
		"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	nd <nd@arm.com>,
		"pthaugen@linux.vnet.ibm.com" <pthaugen@linux.vnet.ibm.com>
Subject: Re: [PATCH PR77536]Generate correct profiling information for vectorized loop
Date: Mon, 20 Feb 2017 16:05:00 -0000	[thread overview]
Message-ID: <CAHFci294t7QDhjkKWp96pvyRXUm8gucfY6G6Lzhez1H+jb-kag@mail.gmail.com> (raw)
In-Reply-To: <20170220151705.GA29965@kam.mff.cuni.cz>

On Mon, Feb 20, 2017 at 3:17 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> > +  /* Without profile feedback, loops for which we do not know a better estimate
>> > +     are assumed to roll 10 times.  When we unroll such loop, it appears to
>> > +     roll too little, and it may even seem to be cold.  To avoid this, we
>> > +     ensure that the created loop appears to roll at least 5 times (but at
>> > +     most as many times as before unrolling).  */
>> > +  if (new_est_niter < 5)
>> > +    {
>> > +      if (est_niter < 5)
>> > +       new_est_niter = est_niter;
>> > +      else
>> > +       new_est_niter = 5;
>> > +    }
>> > +
>> > +  return new_est_niter;
>> > +}
>> >
>> > I see this code is pre-existing, but please extend it to test if
>> > loop->header->count is non-zero.  Even if we do not have idea about loop
>> > iteration count estimate we may end up predicting more than 10 iterations when
>> > predictors combine that way.
>> If I use expected_loop_iterations_unbounded, then do I need to handle
>> loop->header->count explicitly here?  I suppose not because it has
>> below code already:
>>
>>   /* If we have no profile at all, use AVG_LOOP_NITER.  */
>>   if (profile_status_for_fn (cfun) == PROFILE_ABSENT)
>>     expected = PARAM_VALUE (PARAM_AVG_LOOP_NITER);
>>   else if (loop->latch && (loop->latch->count || loop->header->count))
>>     {
>>       gcov_type count_in, count_latch;
>>       //...
>
> expected_loop_iterations_unbounded avoids capping the # of iterations comming
> from profile feedback (so if loop iterates 1000000 it retns 1000000 instead of
> 10000). Testing loop->header->count for non-zero tests if the loop was ever
> entered in the train run so it is easy test if you have any profile feedback
> for a given loop available.
BTW, if we use gcov_type in calculation from expected_loop_iterations_unbounded,
how should we adjust the number for calling scale_loop_frequencies
which has int type?  In extreme case, gcov_type could be out of int's
range, we have to cap the value anyway.  But yes, 10000 in
expect_loop_iterations is too small.
>
> I profile feedback is there, then making usre that new_est_niter is at least
> 5 will only make feedback less acurate.  I am not sure if we vectorize loop
> that will iterate only 1...4 times after unrolling, but I guess we could,
> and it would be desirable to peel it afterwards.
Right, I think we can discard this lower bound if profiling
information is present.
>>
>> The second question is: looks like it only takes latch->count into
>> consideration when PROFILE_ABSENT.  But according to your comments, we
>> could have nonzero count sometime?
>
> profile_status_for_fn (cfun) == PROFILE_ABSENT is set only when there is no
> profile esitmate at all - i.e. you compile with
> -fno-guess-branch-probabilities.  We don't really use that path very often
> but in that case you have no data to guess your profile from.
>
>>
>> >
>> > Perhaps testing estimated-loop_iterations would also make sense, but that
>> > could be dealt with incrementally.
>> >
>> > +static void
>> > +scale_profile_for_vect_loop (struct loop *loop, unsigned vf)
>> > +{
>> > +  unsigned freq_h = loop->header->frequency;
>> > +  unsigned freq_e = EDGE_FREQUENCY (loop_preheader_edge (loop));
>> > +  /* Reduce loop iterations by the vectorization factor.  */
>> > +  unsigned new_est_niter = niter_for_unrolled_loop (loop, vf);
>> > +
>> > +  if (freq_h != 0)
>> > +    scale_loop_frequencies (loop, freq_e * (new_est_niter + 1), freq_h);
>> > +
>> > I am always trying to avoid propagating small mistakes (i.e. frong freq_h or
>> > freq_h) into bigger mistakes (i.e. wrong profile of the whole loop) to avoid
>> > spreading mistakes across cfg.
>> >
>> > But I guess here it is sort of safe because vectorized loops are simple.
>> > You can't just scale down the existing counts/frequencies by vf, because the
>> > entry edge frequency was adjusted.
>> I am not 100% follow here, it looks the code avoids changing frequency
>> counter for preheader/exit edge, otherwise we would need to change all
>> counters dominated by them?
>
> I was just wondering what is wrong with taking the existing frequencies/counts
> the loop body has and dividing them all by the unroll factor.  This is correct
> if you ignore the versioning. With versioning I guess you want to scale by
> the unroll factor and also subtract frequencies/counts that was acocunted to
> the other versions of the loop, right?
IIUC, for (vectorized) loop header, it's frequency is calculated by:
          freq_header = freq_preheader + freq_latch
and freq_latch = (niter * freq_preheader).  Simply scaling it by VF gives:
          freq_header = (freq_preheader + freq_latch) / VF
which is wrong.  Especially if the loop is vectorized by large VF
(=16) and we normally assume niter (=10) without profiling
information, it results not only mismatch, but also
(loop->header->frequency < loop->preheader->frequency).  In fact, if
we have accurate niter information, the counter should be:
          freq_header = freq_preheader + niter * freq_preheader

>> >
>> > Also niter_for_unrolled_loop depends on sanity of the profile, so perhaps you
>> > need to compute it before you start chanigng the CFG by peeling proplogue?
>> Peeling for prologue doesn't change profiling information of
>> vect_loop, it is the skip edge from before loop to preferred epilogue
>> loop that will change profile counters.  I guess here exists a dilemma
>> that niter_for_unrolled_loop is for loop after peeling for prologue?
>
> expected_loop_iterations_unbounded calculates number of iteations by computing
> sum of frequencies of edges entering the loop and comparing it to the frequency
> of loop header.  While peeling the prologue, you split the preheader edge and
> adjust frequency of the new preheader BB of the loop to be vectorized.  I think
> that will adjust the #of iterations estimate.
It's not the case now I think.  one motivation of new vect_do_peeling
is to avoid niter checks between prologue and vector loop.  Once
prologue loop is entered or checked, the vector loop must be executed
unconditionally.  So the preheaderof vector loop has consistent
frequency counters now.  The niter check on whether vector loop should
be executed is now merged with cost check before prologue, and in the
future I think this can be further merged if loop versioning is
needed.

Thanks,
bin
>
> Honza
>>
>> Thanks,
>> bin
>> >
>> > Finally if freq_e is 0, all frequencies and counts will be probably dropped to
>> > 0.  What about determining fraction by counts if they are available?
>> >
>> > Otherwise the patch looks good and thanks a lot for working on this!
>> >
>> > Honza
>> >
>> >> >
>> >> > gcc/testsuite/ChangeLog
>> >> > 2017-02-16  Bin Cheng  <bin.cheng@arm.com>
>> >> >
>> >> >         PR tree-optimization/77536
>> >> >         * gcc.dg/vect/pr79347.c: Revise testing string.

next prev parent reply	other threads:[~2017-02-20 15:48 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-16 18:38 Bin Cheng
2017-02-17  1:39 ` Pat Haugen
2017-02-20 12:54 ` Richard Biener
2017-02-20 14:21   ` Jan Hubicka
2017-02-20 15:16     ` Bin.Cheng
2017-02-20 15:44       ` Jan Hubicka
2017-02-20 16:05         ` Bin.Cheng [this message]
2017-02-20 17:02           ` Jan Hubicka
2017-02-20 17:53             ` Bin.Cheng
2017-02-21 14:48             ` Bin.Cheng
2017-02-21 15:52               ` Jan Hubicka
2017-02-22 12:23                 ` Bin.Cheng
2017-02-22 14:59                   ` Jan Hubicka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAHFci294t7QDhjkKWp96pvyRXUm8gucfY6G6Lzhez1H+jb-kag@mail.gmail.com \
    --to=amker.cheng@gmail.com \
    --cc=Bin.Cheng@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=hubicka@ucw.cz \
    --cc=nd@arm.com \
    --cc=pthaugen@linux.vnet.ibm.com \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).