From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-448814-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 34446 invoked by alias); 20 Feb 2017 14:02:18 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 34432 invoked by uid 89); 20 Feb 2017 14:02:17 -0000
Received: from nikam.ms.mff.cuni.cz (HELO nikam.ms.mff.cuni.cz) (195.113.20.16) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 20 Feb 2017 14:02:14 +0000
Received: by nikam.ms.mff.cuni.cz (Postfix, from userid 16202)	id 2690D542B60; Mon, 20 Feb 2017 15:02:11 +0100 (CET)
Date: Mon, 20 Feb 2017 14:21:00 -0000
From: Jan Hubicka <hubicka@ucw.cz>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Bin Cheng <Bin.Cheng@arm.com>, Jan Hubicka <hubicka@ucw.cz>,	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,	nd <nd@arm.com>,	"pthaugen@linux.vnet.ibm.com" <pthaugen@linux.vnet.ibm.com>
Subject: Re: [PATCH PR77536]Generate correct profiling information for vectorized loop
Message-ID: <20170220140210.GA2932@kam.mff.cuni.cz>
References: <VI1PR0802MB217672F0D45BA9EA238CA7BDE75A0@VI1PR0802MB2176.eurprd08.prod.outlook.com> <CAFiYyc2C2jr8+aJgdN5s_WEbg6Hh_1B2jzJ3TAvLxJJBZ-i70Q@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAFiYyc2C2jr8+aJgdN5s_WEbg6Hh_1B2jzJ3TAvLxJJBZ-i70Q@mail.gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-SW-Source: 2017-02/txt/msg01224.txt.bz2

> > 2017-02-16  Bin Cheng  <bin.cheng@arm.com>
> >
> >         PR tree-optimization/77536
> >         * tree-ssa-loop-manip.c (niter_for_unrolled_loop): New function.
> >         (tree_transform_and_unroll_loop): Use above function to compute the
> >         estimated niter of unrolled loop.
> >         * tree-ssa-loop-manip.h (niter_for_unrolled_loop): New declaration.
> >         * tree-vect-loop.c (scale_profile_for_vect_loop): New function.
> >         (vect_transform_loop): Call above function.

+/* Return estimated niter for LOOP after unrolling by FACTOR times.  */
+
+unsigned
+niter_for_unrolled_loop (struct loop *loop, unsigned factor)
+{
+  unsigned est_niter = expected_loop_iterations (loop);

What happens when you have a profile and the loop iterates very many times?
Perhaps we want to do all the calculation in gcov_type and use
expected_loop_iterations_unbounded?

expected_loop_iterations caps its result at 10000, which is easy to exceed.
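To make the overflow concern concrete, here is a minimal sketch in plain C. It is illustrative only and rests on several assumptions. The expected iteration count is taken to be the latch count divided by the entry count, rounded up. int64_t stands in for gcov_type. The capped variant clamps at 10000. The helper names are hypothetical and are not GCC's API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for gcov_type.  */
typedef int64_t gcov_t;

/* Cap applied by the bounded estimate (10000, as noted above).  */
#define EXPECTED_ITERS_CAP 10000

/* Hypothetical: expected iterations from profile counts, rounding
   latch/entry up, computed in 64-bit arithmetic and never capped.  */
static gcov_t
expected_iters_unbounded (gcov_t count_latch, gcov_t count_in)
{
  if (count_in == 0)
    return count_latch;  /* hypothetical fallback when the entry count is 0 */
  return (count_latch + count_in - 1) / count_in;
}

/* Hypothetical: the same estimate, clamped the way the bounded
   expected_loop_iterations is described above.  */
static unsigned
expected_iters_capped (gcov_t count_latch, gcov_t count_in)
{
  gcov_t expected = expected_iters_unbounded (count_latch, count_in);
  return expected > EXPECTED_ITERS_CAP ? EXPECTED_ITERS_CAP
                                       : (unsigned) expected;
}

/* Iterations left after unrolling/vectorizing by FACTOR.  */
static gcov_t
niter_after_unroll (gcov_t est_niter, unsigned factor)
{
  assert (factor != 0);
  return est_niter / factor;
}
```

For a loop entered once whose latch runs 1000000 times, the unbounded estimate gives 250000 iterations after vectorizing by 4. The capped one gives 2500, so a profile scaled from it would be off by a factor of 100.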

+  gcc_assert (factor != 0);
+  unsigned new_est_niter = est_niter / factor;
+
+  /* Without profile feedback, loops for which we do not know a better estimate
+     are assumed to roll 10 times.  When we unroll such loop, it appears to
+     roll too little, and it may even seem to be cold.  To avoid this, we
+     ensure that the created loop appears to roll at least 5 times (but at
+     most as many times as before unrolling).  */
+  if (new_est_niter < 5)
+    {
+      if (est_niter < 5)
+	new_est_niter = est_niter;
+      else
+	new_est_niter = 5;
+    }
+
+  return new_est_niter;
+}

I see this code is pre-existing, but please extend it to test whether
loop->header->count is non-zero (i.e. whether profile feedback is available).
Even if we have no idea about the loop's iteration count, we may end up
predicting more than 10 iterations when predictors combine that way.

Perhaps testing estimated_loop_iterations would also make sense, but that
could be dealt with incrementally.
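A minimal sketch of the requested change, assuming a non-zero header count means real profile feedback is present. The function name and parameters are hypothetical simplifications of niter_for_unrolled_loop. The header count is passed in directly instead of being read from a struct loop. The estimate is kept in 64 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t gcov_t;  /* hypothetical stand-in for gcov_type */

/* Hypothetical: estimated iterations after unrolling by FACTOR.
   HEADER_COUNT plays the role of loop->header->count; a non-zero value
   is assumed to mean profile feedback is available.  */
static gcov_t
niter_for_unrolled (gcov_t est_niter, gcov_t header_count, unsigned factor)
{
  assert (factor != 0);
  gcov_t new_est_niter = est_niter / factor;
  bool have_profile = header_count != 0;

  /* The "roll at least 5 times" floor only compensates for the default
     guess of 10 iterations used without profile feedback; with real
     counts, keep the measured estimate as is.  */
  if (!have_profile && new_est_niter < 5)
    new_est_niter = est_niter < 5 ? est_niter : 5;

  return new_est_niter;
}
```

With the default guess of 10 iterations and a factor of 4, the floor still lifts the result to 5. With a measured profile, the same inputs yield 2.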

+static void
+scale_profile_for_vect_loop (struct loop *loop, unsigned vf)
+{
+  unsigned freq_h = loop->header->frequency;
+  unsigned freq_e = EDGE_FREQUENCY (loop_preheader_edge (loop));
+  /* Reduce loop iterations by the vectorization factor.  */
+  unsigned new_est_niter = niter_for_unrolled_loop (loop, vf);
+
+  if (freq_h != 0)
+    scale_loop_frequencies (loop, freq_e * (new_est_niter + 1), freq_h);
+
I always try to avoid propagating small mistakes (e.g. a wrong freq_h or
freq_e) into bigger ones (e.g. a wrong profile for the whole loop), so that
mistakes do not spread across the CFG.

But I guess here it is sort of safe because vectorized loops are simple.
You can't just scale down the existing counts/frequencies by vf, because the
entry edge frequency was adjusted.
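The arithmetic behind this can be sketched as follows. The numbers are invented: a header frequency of 1100, a body frequency of 1000, an entry edge that drops from 100 to 80 once the prologue is peeled, a vectorization factor of 4, and an estimate of 2 iterations after vectorization. The helpers are hypothetical. They only mirror the freq_e * (new_est_niter + 1) / freq_h fraction used in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: scale a block frequency by NUM/DEN with rounding,
   using 64-bit intermediates to avoid overflow.  */
static int
scale_freq (int freq, int64_t num, int64_t den)
{
  assert (den > 0);
  return (int) ((freq * num + den / 2) / den);
}

/* Target header frequency for the vectorized loop: the header runs
   NEW_EST_NITER + 1 times per entry, measured against the entry edge
   frequency as it is now, after the CFG has been adjusted.  */
static int64_t
target_header_freq (int freq_e, int64_t new_est_niter)
{
  return (int64_t) freq_e * (new_est_niter + 1);
}
```

Scaling against the current entry edge gives a header frequency of 240 (and 218 for the body). Naively dividing the old header frequency by the factor would give 275, which no longer matches the 80 executions entering the loop.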

Also, niter_for_unrolled_loop depends on the sanity of the profile, so perhaps
you need to compute it before you start changing the CFG by peeling the
prologue?
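A small sketch of why the ordering matters. It assumes the estimate is the latch frequency divided by the entry frequency, rounded up. The helper name and the numbers are hypothetical: a latch frequency of 1000, and an entry frequency of 100 before peeling that becomes 80 afterwards while the latch still reflects the old profile.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: iteration estimate from latch and entry frequencies,
   rounding latch/entry up.  */
static int64_t
est_niter_from_freqs (int64_t freq_latch, int64_t freq_in)
{
  assert (freq_in > 0);
  return (freq_latch + freq_in - 1) / freq_in;
}
```

On the consistent profile, the estimate is 10 iterations. On the half-updated one, it is 13. Taking the estimate before peeling avoids feeding that inflated value into the scaling.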

Finally, if freq_e is 0, all frequencies and counts will probably be dropped
to 0.  What about determining the fraction from counts when they are available?

Otherwise the patch looks good and thanks a lot for working on this!

Honza

> >
> > gcc/testsuite/ChangeLog
> > 2017-02-16  Bin Cheng  <bin.cheng@arm.com>
> >
> >         PR tree-optimization/77536
> >         * gcc.dg/vect/pr79347.c: Revise testing string.