From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org
MIME-Version: 1.0
In-Reply-To: <20170220160509.GA2669@kam.mff.cuni.cz>
References: <20170220140210.GA2932@kam.mff.cuni.cz>
 <20170220151705.GA29965@kam.mff.cuni.cz> <20170220160509.GA2669@kam.mff.cuni.cz>
From: "Bin.Cheng"
Date: Mon, 20 Feb 2017 17:53:00 -0000
Message-ID:
Subject: Re: [PATCH PR77536]Generate correct profiling information for vectorized loop
To: Jan Hubicka
Cc: Richard Biener, Bin Cheng, "gcc-patches@gcc.gnu.org", nd, "pthaugen@linux.vnet.ibm.com"
Content-Type: text/plain; charset=UTF-8
X-IsSubscribed: yes
X-SW-Source: 2017-02/txt/msg01238.txt.bz2

On Mon, Feb 20, 2017 at 4:05 PM, Jan Hubicka wrote:
>> BTW, if we use gcov_type in calculation from expected_loop_iterations_unbounded,
>> how should we adjust the number for calling scale_loop_frequencies
>> which has int type? In the extreme case, gcov_type could be out of int's
>> range, so we have to cap the value anyway. But yes, 10000 in
>> expected_loop_iterations is too small.
>
> What I usually do is to use fixed point math in this case (based on REG_BR_PROB_BASE).
> Just pass REG_BR_PROB_BASE as den and calculate the adjustment in gcov_type converting
> to int. Because you should be just decreasing the counts, it won't overflow and because
> the decrease will be in range, say 2...256 times, it should also be sufficiently
> precise.
>
> Be careful to avoid overflow of gcov type - it is not safe to multiply two
> counts in 64bit math because each count can be more than 2^32. (next stage1 I
> plan to switch most of this to sreals that will make this easier)
>
>>
>> >> > But I guess here it is sort of safe because vectorized loops are simple.
>> >> > You can't just scale down the existing counts/frequencies by vf, because the
>> >> > entry edge frequency was adjusted.
>> >> I don't 100% follow here; it looks like the code avoids changing the frequency
>> >> counter for preheader/exit edge, otherwise we would need to change all
>> >> counters dominated by them?
>> >
>> > I was just wondering what is wrong with taking the existing frequencies/counts
>> > the loop body has and dividing them all by the unroll factor.
>> > This is correct if you ignore the versioning. With versioning I guess you
>> > want to scale by the unroll factor and also subtract frequencies/counts that
>> > were accounted to the other versions of the loop, right?
>> IIUC, for (vectorized) loop header, its frequency is calculated by:
>>   freq_header = freq_preheader + freq_latch
>> and freq_latch = (niter * freq_preheader). Simply scaling it by VF gives:
>>   freq_header = (freq_preheader + freq_latch) / VF
>> which is wrong. Especially if the loop is vectorized by large VF
>> (=16) and we normally assume niter (=10) without profiling
>> information, it results not only in a mismatch, but also in
>> (loop->header->frequency < loop->preheader->frequency). In fact, if
>> we have accurate niter information, the counter should be:
>>   freq_header = freq_preheader + niter * freq_preheader
>
> You are right. We need to compensate for the change of probability of the loop
> exit edge.
>>
>> >> >
>> >> > Also niter_for_unrolled_loop depends on sanity of the profile, so perhaps you
>> >> > need to compute it before you start changing the CFG by peeling the prologue?
>> >> Peeling for prologue doesn't change profiling information of
>> >> vect_loop, it is the skip edge from before loop to preferred epilogue
>> >> loop that will change profile counters. I guess here exists a dilemma
>> >> that niter_for_unrolled_loop is for loop after peeling for prologue?
>> >
>> > expected_loop_iterations_unbounded calculates number of iterations by computing
>> > sum of frequencies of edges entering the loop and comparing it to the frequency
>> > of loop header. While peeling the prologue, you split the preheader edge and
>> > adjust frequency of the new preheader BB of the loop to be vectorized. I think
>> > that will adjust the # of iterations estimate.
>> It's not the case now I think. One motivation of new vect_do_peeling
>> is to avoid niter checks between prologue and vector loop.
>> Once prologue loop is entered or checked, the vector loop must be executed
>> unconditionally. So the preheader of vector loop has consistent
>> frequency counters now. The niter check on whether vector loop should
>> be executed is now merged with cost check before prologue, and in the
>> future I think this can be further merged if loop versioning is
>> needed.
>
> Originally you have
>
>   loop_preheader
>        |
>        v
>   loop_header
>
> and the ratio of the two BB frequencies is the loop iteration count. Then you
> do something like:
>
>   orig_loop_preheader
>        |
>        v
>   loop_prologue -----> scalar_version_of_loop
>        |
>        v
>   new_loop_preheader
>        |
>        v
>   loop_header

It's like:

  orig_loop_preheader
       |
       v
  check_on_niter -----> scalar_version_of_loop (i.e., epilog_loop)
       |
       v
  loop_prologue
       |
       v
  new_loop_preheader
       |
       v
  loop_header

Yes, the preheader/header need to be consistent here, and they are now.
Thanks for explaining.

> At some point, you need to update new_loop_preheader frequency/count
> to reflect the fact that with some probability the loop_prologue avoids
> the vectorized loop. Once you do it and if you don't scale frequency of
> loop_header you will make expected_loop_iterations return a higher value
> than previously.
>
> So at the time you are calling it, you need to be sure that the loop_header
> and its preheader frequencies were both adjusted by the same factor. Or you need
> to call it early before you start hacking on the CFG and its profile.
>
> Perhaps currently it is safe, because your peeling code is also scaling
> the loop profiles.

I will revise the patch according to this discussion.

Thanks,
bin
>
> Honza
>>
>> Thanks,
>> bin
>> >
>> > Honza
>> >>
>> >> Thanks,
>> >> bin
>> >> >
>> >> > Finally if freq_e is 0, all frequencies and counts will be probably dropped to
>> >> > 0. What about determining fraction by counts if they are available?
>> >> >
>> >> > Otherwise the patch looks good and thanks a lot for working on this!
>> >> >
>> >> > Honza
>> >> >
>> >> >> >
>> >> >> > gcc/testsuite/ChangeLog
>> >> >> > 2017-02-16  Bin Cheng
>> >> >> >
>> >> >> >     PR tree-optimization/77536
>> >> >> >     * gcc.dg/vect/pr79347.c: Revise testing string.
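
For reference, a minimal sketch of the arithmetic discussed in this thread, written as standalone C rather than GCC code. It shows (a) scaling a 64-bit count down by a fixed-point fraction without overflowing, and (b) the header-frequency mismatch you get when simply dividing by VF. Assumptions: PROB_BASE stands in for REG_BR_PROB_BASE (value 10000 assumed), int64_t stands in for gcov_type, and the helper names fixed_point_ratio, scale_count and header_frequency are hypothetical, not GCC functions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for GCC's REG_BR_PROB_BASE; the value 10000 is an assumption.  */
#define PROB_BASE 10000

/* Hypothetical helper: express new_iters / old_iters as NUM / PROB_BASE.
   Only scaling down is supported, so the result is in [0, PROB_BASE].  */
static int
fixed_point_ratio (int64_t new_iters, int64_t old_iters)
{
  if (old_iters <= 0 || new_iters >= old_iters)
    return PROB_BASE;
  if (new_iters < 0)
    return 0;
  /* Shrink both operands until new_iters * PROB_BASE cannot overflow.  */
  while (old_iters > INT64_MAX / PROB_BASE)
    {
      old_iters >>= 1;
      new_iters >>= 1;
    }
  return (int) (new_iters * PROB_BASE / old_iters);
}

/* Hypothetical helper: compute floor (count * num / PROB_BASE) without
   forming count * num, which could overflow for large counts.  */
static int64_t
scale_count (int64_t count, int num)
{
  assert (count >= 0);
  assert (num >= 0 && num <= PROB_BASE);
  int64_t quot = count / PROB_BASE;   /* Whole multiples of PROB_BASE.  */
  int64_t rem = count % PROB_BASE;    /* Remainder, below PROB_BASE.  */
  return quot * num + rem * num / PROB_BASE;
}

/* Header frequency as described in the thread:
   freq_header = freq_preheader + niter * freq_preheader.  */
static int64_t
header_frequency (int64_t freq_preheader, int64_t niter)
{
  return freq_preheader + niter * freq_preheader;
}
```

With an estimate dropping from 100 to 25 iterations, fixed_point_ratio gives 2500 and scale_count (1000000, 2500) gives 250000. header_frequency (100, 10) is 1100; dividing that by VF = 16 leaves 68, which is below the preheader's 100 and is the mismatch pointed out above.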