From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-448828-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 58252 invoked by alias); 20 Feb 2017 17:15:00 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 58242 invoked by uid 89); 20 Feb 2017 17:14:59 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.4 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,RCVD_IN_SORBS_SPAM,SPF_PASS autolearn=no version=3.3.2 spammy=hacking, H*f:sk:jb-kag@, sk:REG_BR_, sk:reg_br_
X-HELO: mail-ua0-f182.google.com
Received: from mail-ua0-f182.google.com (HELO mail-ua0-f182.google.com) (209.85.217.182) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 20 Feb 2017 17:14:57 +0000
Received: by mail-ua0-f182.google.com with SMTP id t17so60677380uae.3        for <gcc-patches@gcc.gnu.org>; Mon, 20 Feb 2017 09:14:57 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;        d=1e100.net; s=20161025;        h=x-gm-message-state:mime-version:in-reply-to:references:from:date         :message-id:subject:to:cc;        bh=0CkLhX4yi5ICu4P59phvBrvMW/w6ZRRX6hyB7CSzi5k=;        b=ksuZ6QeYum0OmPeXfIPhHGOhp2eNIhKTKdYm7ijhtoNVQVVNoPNrHqRgJjxwm4F0vc         Q+ZIQHwEd1LYc6nEiBHlRndizz3cyvk5IcGKPReDv0mE+qwXCj5ODFHqKBbKOfdWGVg5         VCKs98weAtrHaWGjaFUtH+z0ByZwCDcm6yv5kp2PKOzJKJTQeYbnM5bmdR9xbCwNc2Xp         WKe+tCNfwx12z8hqjaHtbwgC+8FnvVd/X2Q+2k357rgfdr1JbeKs4lj2h694WEZBr6SZ         XJESntH/XJHxgMdqIRbRe+zN8OHmrkb1NAy1BFgFdF6rRYlALeHlDCKVTDB6rxooBxct         nVjg==
X-Gm-Message-State: AMke39nalT2eiwZnFq4x892HRQ4Lfjm7/qaaq4z51GrvWDE33Zioqcl8D2K7KrxOEMekmpJj09eiaRh+wD3eMw==
X-Received: by 10.159.33.35 with SMTP id 32mr9482512uab.43.1487610896215; Mon, 20 Feb 2017 09:14:56 -0800 (PST)
MIME-Version: 1.0
Received: by 10.103.72.157 with HTTP; Mon, 20 Feb 2017 09:14:55 -0800 (PST)
In-Reply-To: <20170220160509.GA2669@kam.mff.cuni.cz>
References: <VI1PR0802MB217672F0D45BA9EA238CA7BDE75A0@VI1PR0802MB2176.eurprd08.prod.outlook.com> <CAFiYyc2C2jr8+aJgdN5s_WEbg6Hh_1B2jzJ3TAvLxJJBZ-i70Q@mail.gmail.com> <20170220140210.GA2932@kam.mff.cuni.cz> <CAHFci2_Mq7b_-7cwxdmwvGWojgyeTg2qApedRKTFMgPcJvU-hw@mail.gmail.com> <20170220151705.GA29965@kam.mff.cuni.cz> <CAHFci294t7QDhjkKWp96pvyRXUm8gucfY6G6Lzhez1H+jb-kag@mail.gmail.com> <20170220160509.GA2669@kam.mff.cuni.cz>
From: "Bin.Cheng" <amker.cheng@gmail.com>
Date: Mon, 20 Feb 2017 17:53:00 -0000
Message-ID: <CAHFci2-E8Wqze03r6AqfjiL31hnh7gEitT3Of-iOeAKPF4enyA@mail.gmail.com>
Subject: Re: [PATCH PR77536]Generate correct profiling information for vectorized loop
To: Jan Hubicka <hubicka@ucw.cz>
Cc: Richard Biener <richard.guenther@gmail.com>, Bin Cheng <Bin.Cheng@arm.com>, 	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>, nd <nd@arm.com>, 	"pthaugen@linux.vnet.ibm.com" <pthaugen@linux.vnet.ibm.com>
Content-Type: text/plain; charset=UTF-8
X-IsSubscribed: yes
X-SW-Source: 2017-02/txt/msg01238.txt.bz2

On Mon, Feb 20, 2017 at 4:05 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> BTW, if we use gcov_type in calculation from expected_loop_iterations_unbounded,
>> how should we adjust the number for calling scale_loop_frequencies
>> which has int type?  In extreme case, gcov_type could be out of int's
>> range, we have to cap the value anyway.  But yes, 10000 in
>> expected_loop_iterations is too small.
>
> What I usually do is to use fixed point math in this case (based on REG_BR_PROB_BASE).
> Just pass REG_BR_PROB_BASE as den and calculate the adjustment in gcov_type converting
> to int. Because you should be just decreasing the counts, it won't overflow and because
> the decrease will be in range, say 2...256 times, it should also be sufficiently
> precise.
>
> Be careful to avoid overflow of gcov type - it is not safe to multiply two
> counts in 64bit math because each count can be more than 2^32.  (next stage1 I
> plan to switch most of this to sreals that will make this easier)
>
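
For reference, the fixed-point approach described above might look
roughly like the sketch below.  This is not GCC code: gcov_type is
modelled as int64_t, REG_BR_PROB_BASE is assumed to be 10000, and the
helpers scale_as_prob_base and apply_scale are hypothetical names used
only for illustration.  It only handles scaling down, and it avoids
the 64-bit multiply when the numerator is too large.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for GCC's definitions; assumed here for illustration.  */
#define REG_BR_PROB_BASE 10000
typedef int64_t gcov_type;

/* Hypothetical helper: express NUM/DEN as an int fraction of
   REG_BR_PROB_BASE.  The result never exceeds REG_BR_PROB_BASE.  */
static int
scale_as_prob_base (gcov_type num, gcov_type den)
{
  assert (num >= 0 && den > 0);
  if (num >= den)
    return REG_BR_PROB_BASE;	/* Only decreasing is supported.  */
  if (num > INT64_MAX / REG_BR_PROB_BASE)
    /* num * REG_BR_PROB_BASE would overflow.  Since den > num, den is
       large as well, so shrink the denominator instead.  */
    return (int) (num / (den / REG_BR_PROB_BASE));
  return (int) (num * REG_BR_PROB_BASE / den);
}

/* Hypothetical helper: apply SCALE (a fraction of REG_BR_PROB_BASE)
   to an int frequency, widening for the intermediate product.  */
static int
apply_scale (int freq, int scale)
{
  return (int) ((int64_t) freq * scale / REG_BR_PROB_BASE);
}
```

With these assumptions a 4x decrease becomes scale 2500, and a
frequency of 1000 scaled by it becomes 250.
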
>> >> > But I guess here it is sort of safe because vectorized loops are simple.
>> >> > You can't just scale down the existing counts/frequencies by vf, because the
>> >> > entry edge frequency was adjusted.
>> >> I don't 100% follow here; it looks like the code avoids changing the
>> >> frequency counter for the preheader/exit edge, otherwise we would need
>> >> to change all counters dominated by them?
>> >
>> > I was just wondering what is wrong with taking the existing frequencies/counts
>> > the loop body has and dividing them all by the unroll factor.  This is correct
>> > if you ignore the versioning. With versioning I guess you want to scale by
>> > the unroll factor and also subtract frequencies/counts that were accounted to
>> > the other versions of the loop, right?
>> IIUC, for (vectorized) loop header, its frequency is calculated by:
>>           freq_header = freq_preheader + freq_latch
>> and freq_latch = (niter * freq_preheader).  Simply scaling it by VF gives:
>>           freq_header = (freq_preheader + freq_latch) / VF
>> which is wrong.  Especially if the loop is vectorized by large VF
>> (=16) and we normally assume niter (=10) without profiling
>> information, it results not only in a mismatch, but also in
>> (loop->header->frequency < loop->preheader->frequency).  In fact, if
>> we have accurate niter information, the counter should be:
>>           freq_header = freq_preheader + niter * freq_preheader
>
> You are right. We need to compensate for the change of probability of the loop
> exit edge.
>>
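
To make the arithmetic above concrete, here is a small sketch using
the numbers from the discussion (niter = 10, VF = 16) and an assumed
preheader frequency of 100.  The function names are hypothetical and
the model is simplified to the two formulas quoted above; it is not
what the patch itself does.

```c
#include <assert.h>

/* freq_header = freq_preheader + freq_latch, with
   freq_latch = niter * freq_preheader.  */
static int
header_freq (int freq_preheader, int niter)
{
  assert (freq_preheader >= 0 && niter >= 0);
  return freq_preheader + niter * freq_preheader;
}

/* The naive update: divide the old header frequency by VF.  */
static int
naive_scaled_header_freq (int freq_preheader, int niter, int vf)
{
  assert (vf > 0);
  return header_freq (freq_preheader, niter) / vf;
}
```

With freq_preheader = 100, header_freq gives 1100, and the naive
update gives 1100 / 16 = 68, which is below the preheader frequency of
100.  Recomputing with header_freq from the iteration count of the
vectorized loop never goes below the preheader frequency.
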
>> >> >
>> >> > Also niter_for_unrolled_loop depends on sanity of the profile, so perhaps you
>> >> > need to compute it before you start changing the CFG by peeling the prologue?
>> >> Peeling for prologue doesn't change profiling information of
>> >> vect_loop, it is the skip edge from before loop to preferred epilogue
>> >> loop that will change profile counters.  I guess here exists a dilemma
>> >> that niter_for_unrolled_loop is for loop after peeling for prologue?
>> >
>> > expected_loop_iterations_unbounded calculates number of iterations by computing
>> > sum of frequencies of edges entering the loop and comparing it to the frequency
>> > of loop header.  While peeling the prologue, you split the preheader edge and
>> > adjust frequency of the new preheader BB of the loop to be vectorized.  I think
>> > that will adjust the #of iterations estimate.
>> It's not the case now I think.  One motivation of the new vect_do_peeling
>> is to avoid niter checks between prologue and vector loop.  Once
>> prologue loop is entered or checked, the vector loop must be executed
>> unconditionally.  So the preheader of the vector loop has consistent
>> frequency counters now.  The niter check on whether vector loop should
>> be executed is now merged with cost check before prologue, and in the
>> future I think this can be further merged if loop versioning is
>> needed.
>
> Originally you have
>
>   loop_preheader
>        |
>        v
>    loop_header
>
> and the ratio of the two BB frequencies is the loop iteration count. Then you
> do something like:
>
>   orig_loop_preheader
>        |
>        v
>    loop_prologue -----> scalar_version_of_loop
>        |
>        v
>  new_loop_preheader
>        |
>        v
>    loop_header
It's like:

   orig_loop_preheader
        |
        v
    check_on_niter -----> scalar_version_of_loop (i.e, epilog_loop)
        |
        v
    loop_prologue
        |
        v
  new_loop_preheader
        |
        v
    loop_header

Yes, the preheader/header need to be consistent here, and they are now.
Thanks for explaining.

> At some point, you need to update new_loop_preheader frequency/count
> to reflect the fact that with some probability the loop_prologue avoids
> the vectorized loop.  Once you do it and if you don't scale frequency of
> loop_header you will make expected_loop_iterations return a higher value
> than previously.
>
> So at the time you are calling it, you need to be sure that the loop_header
> and its preheader frequencies were both adjusted by the same factor.  Or you need
> to call it early before you start hacking on the CFG and its profile.
>
> Perhaps currently it is safe, because your peeling code is also scaling
> the loop profiles.
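
A simplified model of the point above, assuming the estimate is just
the ratio of header frequency to entry frequency as described.  This
is not GCC's expected_loop_iterations_unbounded; est_iterations and
scale_freq are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified estimate: header frequency over entry frequency.  */
static int
est_iterations (int freq_header, int freq_entry)
{
  return freq_entry > 0 ? freq_header / freq_entry : 0;
}

/* Scale FREQ by NUM/DEN, widening for the intermediate product.  */
static int
scale_freq (int freq, int num, int den)
{
  assert (den > 0);
  return (int) ((int64_t) freq * num / den);
}
```

Starting from header 1100 and preheader 100 the estimate is 11.
Halving both keeps it at 11, while halving only the preheader raises
it to 22, which is the inconsistency to avoid before calling the
estimator.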

I will revise the patch according to this discussion.

Thanks,
bin
>
> Honza
>>
>> Thanks,
>> bin
>> >
>> > Honza
>> >>
>> >> Thanks,
>> >> bin
>> >> >
>> >> > Finally if freq_e is 0, all frequencies and counts will be probably dropped to
>> >> > 0.  What about determining fraction by counts if they are available?
>> >> >
>> >> > Otherwise the patch looks good and thanks a lot for working on this!
>> >> >
>> >> > Honza
>> >> >
>> >> >> >
>> >> >> > gcc/testsuite/ChangeLog
>> >> >> > 2017-02-16  Bin Cheng  <bin.cheng@arm.com>
>> >> >> >
>> >> >> >         PR tree-optimization/77536
>> >> >> >         * gcc.dg/vect/pr79347.c: Revise testing string.