From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 15081 invoked by alias); 20 Feb 2017 15:04:02 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 15055 invoked by uid 89); 20 Feb 2017 15:04:01 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=1.0 required=5.0 tests=AWL,BAYES_50,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,RCVD_IN_SORBS_SPAM,SPF_PASS autolearn=no version=3.3.2 spammy=fraction, U*hubicka, D*ucw.cz, sk:treeve X-HELO: mail-vk0-f52.google.com Received: from mail-vk0-f52.google.com (HELO mail-vk0-f52.google.com) (209.85.213.52) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 20 Feb 2017 15:03:59 +0000 Received: by mail-vk0-f52.google.com with SMTP id x75so63119252vke.2 for ; Mon, 20 Feb 2017 07:03:59 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=MuhYZnHrOJpwfoU+U3DHmC+b/DPR7VbNBFnletC8tOA=; b=JzlvRVw22QVEJpBqexiCB1fz/Iq2CP7ERh6mom5SRoC90Skp5avWWSJ9hPgSw6JLfH 4GAzr+N2XMS/pCMZU7eHOD/41NJYtWJB/Rgatbwg4poZISwu31mADEW4dS8DwUML8Vdv hFdpVTMprhh6Bu6vTJv/yuEe3+xb0M82adDmw+Gt5qSfeOGAVeLi6iScdtEuchdd9ize ANEFKX76veGBLxuWV9dxNE06DZPhteh88lcFUn7/p/4X7+uIxG89ml/YD9RZl/Nvg3DW 1AvV4G8UUivHYx/9b2ZdMV9U4b/9f/y2iKIFOV7GcG1dzYsDWCz4iN84YVEf0F7iHcXC VoqA== X-Gm-Message-State: AMke39ldIWeOHp/R05ZodlTgayXYIicZOFY/TSO1ZjZDJS7aWkM3hUlmgI3SRqzdW3EFFfLCxqp4jHrasc/hyg== X-Received: by 10.31.93.66 with SMTP id r63mr10962911vkb.126.1487603038074; Mon, 20 Feb 2017 07:03:58 -0800 (PST) MIME-Version: 1.0 Received: by 10.103.72.157 with HTTP; Mon, 20 Feb 2017 07:03:57 -0800 (PST) In-Reply-To: <20170220140210.GA2932@kam.mff.cuni.cz> References: <20170220140210.GA2932@kam.mff.cuni.cz> From: "Bin.Cheng" Date: Mon, 20 Feb 2017 15:16:00 -0000 Message-ID: Subject: Re: [PATCH PR77536]Generate correct profiling information for vectorized loop To: Jan Hubicka Cc: Richard Biener , Bin Cheng , "gcc-patches@gcc.gnu.org" , nd , "pthaugen@linux.vnet.ibm.com" Content-Type: text/plain; charset=UTF-8 X-IsSubscribed: yes X-SW-Source: 2017-02/txt/msg01229.txt.bz2 On Mon, Feb 20, 2017 at 2:02 PM, Jan Hubicka wrote: >> > 2017-02-16 Bin Cheng >> > >> > PR tree-optimization/77536 >> > * tree-ssa-loop-manip.c (niter_for_unrolled_loop): New function. >> > (tree_transform_and_unroll_loop): Use above function to compute the >> > estimated niter of unrolled loop. >> > * tree-ssa-loop-manip.h niter_for_unrolled_loop(): New declaration. >> > * tree-vect-loop.c (scale_profile_for_vect_loop): New function. >> > (vect_transform_loop): Call above function. Thanks very much for your suggestions. I don't know profiling logic very well and have some questions embedded below before I start revising patch. > > +/* Return estimated niter for LOOP after unrolling by FACTOR times. */ > + > +unsigned > +niter_for_unrolled_loop (struct loop *loop, unsigned factor) > +{ > + unsigned est_niter = expected_loop_iterations (loop); > > What happens when you have profile and loop iterates very many times? > Perhaps we want to do all calculation in gcov_type and use > expected_loop_iterations_unbounded>? > > expected_loop_iterations is capping by 10000 that is easy to overflow. > > + gcc_assert (factor != 0); > + unsigned new_est_niter = est_niter / factor; > + > + /* Without profile feedback, loops for which we do not know a better estimate > + are assumed to roll 10 times. When we unroll such loop, it appears to > + roll too little, and it may even seem to be cold. To avoid this, we > + ensure that the created loop appears to roll at least 5 times (but at > + most as many times as before unrolling). */ > + if (new_est_niter < 5) > + { > + if (est_niter < 5) > + new_est_niter = est_niter; > + else > + new_est_niter = 5; > + } > + > + return new_est_niter; > +} > > I see this code is pre-existing, but please extend it to test if > loop->header->count is non-zero. Even if we do not have idea about loop > iteration count estimate we may end up predicting more than 10 iterations when > predictors combine that way. If I use expected_loop_iterations_unbounded, then do I need to handle loop->header->count explicitly here? I suppose not because it has below code already: /* If we have no profile at all, use AVG_LOOP_NITER. */ if (profile_status_for_fn (cfun) == PROFILE_ABSENT) expected = PARAM_VALUE (PARAM_AVG_LOOP_NITER); else if (loop->latch && (loop->latch->count || loop->header->count)) { gcov_type count_in, count_latch; //... The second question is: looks like it only takes latch->count into consideration when PROFILE_ABSENT. But according to your comments, we could have nonzero count sometime? > > Perhaps testing estimated-loop_iterations would also make sense, but that > could be dealt with incrementally. > > +static void > +scale_profile_for_vect_loop (struct loop *loop, unsigned vf) > +{ > + unsigned freq_h = loop->header->frequency; > + unsigned freq_e = EDGE_FREQUENCY (loop_preheader_edge (loop)); > + /* Reduce loop iterations by the vectorization factor. */ > + unsigned new_est_niter = niter_for_unrolled_loop (loop, vf); > + > + if (freq_h != 0) > + scale_loop_frequencies (loop, freq_e * (new_est_niter + 1), freq_h); > + > I am always trying to avoid propagating small mistakes (i.e. frong freq_h or > freq_h) into bigger mistakes (i.e. wrong profile of the whole loop) to avoid > spreading mistakes across cfg. > > But I guess here it is sort of safe because vectorized loops are simple. > You can't just scale down the existing counts/frequencies by vf, because the > entry edge frequency was adjusted. I am not 100% follow here, it looks the code avoids changing frequency counter for preheader/exit edge, otherwise we would need to change all counters dominated by them? > > Also niter_for_unrolled_loop depends on sanity of the profile, so perhaps you > need to compute it before you start chanigng the CFG by peeling proplogue? Peeling for prologue doesn't change profiling information of vect_loop, it is the skip edge from before loop to preferred epilogue loop that will change profile counters. I guess here exists a dilemma that niter_for_unrolled_loop is for loop after peeling for prologue? Thanks, bin > > Finally if freq_e is 0, all frequencies and counts will be probably dropped to > 0. What about determining fraction by counts if they are available? > > Otherwise the patch looks good and thanks a lot for working on this! > > Honza > >> > >> > gcc/testsuite/ChangeLog >> > 2017-02-16 Bin Cheng >> > >> > PR tree-optimization/77536 >> > * gcc.dg/vect/pr79347.c: Revise testing string.