Date: Mon, 20 Feb 2017 14:21:00 -0000
From: Jan Hubicka
To: Richard Biener
Cc: Bin Cheng, Jan Hubicka, gcc-patches@gcc.gnu.org, nd, pthaugen@linux.vnet.ibm.com
Subject: Re: [PATCH PR77536]Generate correct profiling information for vectorized loop
Message-ID: <20170220140210.GA2932@kam.mff.cuni.cz>

> > 2017-02-16  Bin Cheng
> >
> >     PR tree-optimization/77536
> >     * tree-ssa-loop-manip.c (niter_for_unrolled_loop): New function.
> >     (tree_transform_and_unroll_loop): Use above function to compute the
> >     estimated niter of unrolled loop.
> >     * tree-ssa-loop-manip.h niter_for_unrolled_loop(): New declaration.
> >     * tree-vect-loop.c (scale_profile_for_vect_loop): New function.
> >     (vect_transform_loop): Call above function.

+/* Return estimated niter for LOOP after unrolling by FACTOR times.  */
+
+unsigned
+niter_for_unrolled_loop (struct loop *loop, unsigned factor)
+{
+  unsigned est_niter = expected_loop_iterations (loop);

What happens when you have a profile and the loop iterates very many times?
Perhaps we want to do all calculations in gcov_type and use
expected_loop_iterations_unbounded?  expected_loop_iterations caps its
result at 10000, which is easy to exceed.

+  gcc_assert (factor != 0);
+  unsigned new_est_niter = est_niter / factor;
+
+  /* Without profile feedback, loops for which we do not know a better estimate
+     are assumed to roll 10 times.  When we unroll such loop, it appears to
+     roll too little, and it may even seem to be cold.  To avoid this, we
+     ensure that the created loop appears to roll at least 5 times (but at
+     most as many times as before unrolling).  */
+  if (new_est_niter < 5)
+    {
+      if (est_niter < 5)
+        new_est_niter = est_niter;
+      else
+        new_est_niter = 5;
+    }
+
+  return new_est_niter;
+}

I see this code is pre-existing, but please extend it to test if
loop->header->count is non-zero.  Even if we do not have an idea about the
loop iteration count estimate, we may end up predicting more than 10
iterations when predictors combine that way.  Perhaps testing
estimated_loop_iterations would also make sense, but that could be dealt
with incrementally.

+static void
+scale_profile_for_vect_loop (struct loop *loop, unsigned vf)
+{
+  unsigned freq_h = loop->header->frequency;
+  unsigned freq_e = EDGE_FREQUENCY (loop_preheader_edge (loop));
+  /* Reduce loop iterations by the vectorization factor.  */
+  unsigned new_est_niter = niter_for_unrolled_loop (loop, vf);
+
+  if (freq_h != 0)
+    scale_loop_frequencies (loop, freq_e * (new_est_niter + 1), freq_h);
+

I am always trying to avoid propagating small mistakes (i.e. a wrong freq_h
or freq_e) into bigger mistakes (i.e. a wrong profile of the whole loop), so
that mistakes do not spread across the CFG.  But I guess here it is sort of
safe because vectorized loops are simple.
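
To make the two requests about niter_for_unrolled_loop concrete, here is a
rough standalone sketch, not the code from the patch.  It assumes the
estimate is carried in a 64-bit type (standing in for gcov_type) and that a
non-zero header count is reduced to a plain flag.  The function name and the
have_profile parameter are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for niter_for_unrolled_loop working on plain values
   instead of a struct loop.  EST_NITER is the uncapped iteration estimate
   in a 64-bit type (playing the role of gcov_type).  HAVE_PROFILE is an
   assumed flag meaning "the loop header count is non-zero".  */
static int64_t
sketch_niter_for_unrolled_loop (int64_t est_niter, unsigned factor,
                                bool have_profile)
{
  assert (factor != 0);
  int64_t new_est_niter = est_niter / factor;

  /* Apply the "rolls at least 5 times" guess only when there is no
     profile feedback; with real counts the division result is kept.  */
  if (!have_profile && new_est_niter < 5)
    new_est_niter = est_niter < 5 ? est_niter : 5;

  return new_est_niter;
}
```

With these assumptions a guessed estimate of 10 unrolled by 4 stays at 5,
while a profiled estimate of 1000000 unrolled by 4 becomes 250000 instead of
being derived from a value capped at 10000.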
You can't just scale down the existing counts/frequencies by vf, because the
entry edge frequency was adjusted.  Also niter_for_unrolled_loop depends on
the sanity of the profile, so perhaps you need to compute it before you
start changing the CFG by peeling the prologue?

Finally, if freq_e is 0, all frequencies and counts will probably be dropped
to 0.  What about determining the fraction by counts if they are available?

Otherwise the patch looks good and thanks a lot for working on this!

Honza

> > gcc/testsuite/ChangeLog
> > 2017-02-16  Bin Cheng
> >
> >     PR tree-optimization/77536
> >     * gcc.dg/vect/pr79347.c: Revise testing string.
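
For the last point about freq_e being 0 and using counts, one possible shape
of the fraction selection is sketched below.  This is only an illustration
on plain integers.  The struct, the function name, and the choice to leave
the profile untouched when both entry values are zero are assumptions, not
something taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: scale the loop body by NUM / DEN when VALID.  */
struct scale_fraction
{
  int64_t num;
  int64_t den;
  bool valid;
};

/* Pick the scaling fraction so that the header ends up at
   entry * (new_est_niter + 1).  Counts are preferred when they are
   available; frequencies are the fallback.  When the chosen entry value is
   zero no fraction is returned, so the caller can leave the profile alone
   instead of scaling everything down to zero (an assumed policy).  */
static struct scale_fraction
sketch_pick_scale (int64_t count_h, int64_t count_e,
                   unsigned freq_h, unsigned freq_e, int64_t new_est_niter)
{
  struct scale_fraction f = { 0, 1, false };

  if (count_h != 0 && count_e != 0)
    {
      f.num = count_e * (new_est_niter + 1);
      f.den = count_h;
      f.valid = true;
    }
  else if (freq_h != 0 && freq_e != 0)
    {
      f.num = (int64_t) freq_e * (new_est_niter + 1);
      f.den = freq_h;
      f.valid = true;
    }

  return f;
}
```

A real implementation would also have to guard the multiplication against
overflow for very large counts, which the sketch ignores.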