From mboxrd@z Thu Jan 1 00:00:00 1970
In-Reply-To: <20101215092220.GA9872@kam.mff.cuni.cz>
References: <20101214075629.GA10020@kam.mff.cuni.cz> <20101214210552.GA19633@kam.mff.cuni.cz> <20101215092220.GA9872@kam.mff.cuni.cz>
Date: Thu, 16 Dec 2010 12:09:00 -0000
Subject: Re: [PATCH, Loop optimizer]: Add logic to disable certain loop optimizations on pre-/post-loops
From: Richard Guenther
To: Zdenek Dvorak
Cc: Xinliang David Li, "Fang, Changpeng", "gcc-patches@gcc.gnu.org"
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
X-SW-Source: 2010-12/txt/msg01292.txt.bz2

2010/12/15 Zdenek Dvorak:
> Hi,
>
>> >>> >why not simply change the profile updating to correctly indicate that these loops do not roll?
>> >>> >That way, all the optimizations would profit, not just those aware of the new bb flag,
>> >>>
>> >>> Maybe my understanding is not correct. But I feel not comfortable using profile of trip count
>> >>> to guard loop optimizations.
>> >>
>> >> it is already used that way; i.e., you do not need to change anything in the optimizations, just
>> >> make sure that the edge probabilities are sensible.
>> >>
>> >>> For a given program, different data sizes will result in quite different
>> >>> loop trip counts.
>> >>
>> >> That should not be the case -- for the pre/post loops generated in vectorization, we know the
>> >> expected # of iterations, based on their purpose; e.g., for loops inserted so that the # of iterations
>> >> is divisible by 4, we know that the loop will iterate at most three times (and probably less), etc.
>> >>
>> >>> By the way, what optimizations else do you think will benefit from disabling for small trip count
>> >>> loops, significantly?
>> >>
>> >> Anything where we check whether we should optimize for speed or code size,
>> >
>> >>I agree with Zdenek (without having looked at the patch so far).
>> >
>> > I think my patch (adding a bb flag) provides a simple and yet effective solution for the unnecessary
>> > code expansion problem in prefetching, unswitching, and loop unrolling. However, I don't mind
>> > updating the profile information for the same purpose.
>> >
>> > Now, suppose we know a loop will roll at most 3 times at runtime. How should we update the profile
>> > information to let expected_loop_iterations know this value? (I got lost here about the
>> > edge probabilities issues)
>> >
>>
>>
>> In general, without FDO, gcc does not estimate loop iteration
>> according to the back-edge probability computed by static prediction
>> (predict.c).  This is less than ideal. For instance, when
>> builtin_expect is used to annotate the loop bound, the information
>> will be lost.
>
> hmmm.... I forgot about this.
> OK, I withdraw my objection against the patch, although
> I would suggest the following changes:
>
> -- rename BB_PRE_POST_LOOP_HEADER to something like BB_HEADER_OF_NONROLLING_LOOP,
> -- in estimate_numbers_of_iterations_loop, for loops with this flag use
>    record_niter_bound (loop, double_int_two, true, false)
>    to make tree-level loop optimizations know that the loop does not roll,
> -- the check for the flag in loop_prefetch_arrays should not be needed, then.

Btw, it would be nice if number-of-iteration analysis would figure out
an upper bound for niter for the typical prologue loops (which have
exit tests like i < niter & CST).  It's of course more difficult for
epilogues where we'd need to figure out the exit test and increment
of a preceding loop.

Btw, any reason why we do not use static profiles for number of iteration
estimates?  We after all _do_ use the static profile to guide the
maybe_hot/cold_bb tests.

Richard.

> Zdenek
>
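
To illustrate the prologue case mentioned above, here is a minimal sketch
in plain C (not GCC internals).  It assumes the exit test is read as
i < (niter & CST); under that reading the loop can never roll more than
CST times, whatever niter is.  The names prologue_trip_count and CST_MASK
are hypothetical and exist only for this illustration.

```c
/* Hypothetical mask: a prologue loop peeling iterations for a
   4-element vector factor would use CST = 3.  */
#define CST_MASK 3u

/* Count how often a loop with exit test `i < (niter & CST_MASK)` rolls.
   This models only the shape of the exit test; it is not GCC code.  */
unsigned
prologue_trip_count (unsigned niter)
{
  unsigned trips = 0;
  for (unsigned i = 0; i < (niter & CST_MASK); i++)
    trips++;
  return trips;
}
```

Calling prologue_trip_count with any niter returns a value between 0 and
CST_MASK, which is the kind of upper bound the analysis could derive
from the exit test alone.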