From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 27829 invoked by alias); 5 Jan 2011 14:11:07 -0000 Received: (qmail 27815 invoked by uid 22791); 5 Jan 2011 14:11:05 -0000 X-SWARE-Spam-Status: No, hits=-1.7 required=5.0 tests=AWL,BAYES_00,TW_DB,T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from bromo.med.uc.edu (HELO bromo.med.uc.edu) (129.137.3.146) by sourceware.org (qpsmtpd/0.43rc1) with SMTP; Wed, 05 Jan 2011 14:10:59 +0000 Received: from bromo.med.uc.edu (localhost.localdomain [127.0.0.1]) by bromo.med.uc.edu (Postfix) with ESMTP id CC3C3B0074; Wed, 5 Jan 2011 09:10:56 -0500 (EST) Received: (from howarth@localhost) by bromo.med.uc.edu (8.14.3/8.14.3/Submit) id p05EAtgL009199; Wed, 5 Jan 2011 09:10:55 -0500 Date: Wed, 05 Jan 2011 14:12:00 -0000 From: Jack Howarth To: "Fang, Changpeng" Cc: Zdenek Dvorak , Richard Guenther , Xinliang David Li , "gcc-patches@gcc.gnu.org" Subject: Re: [PATCH, Loop optimizer]: Add logic to disable certain loop optimizations on pre-/post-loops Message-ID: <20110105141055.GA9168@bromo.med.uc.edu> References: <20101214075629.GA10020@kam.mff.cuni.cz> <20101214210552.GA19633@kam.mff.cuni.cz> <20101215092220.GA9872@kam.mff.cuni.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.18 (2008-05-17) Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2011-01/txt/msg00222.txt.bz2 On Tue, Jan 04, 2011 at 04:30:38PM -0600, Fang, Changpeng wrote: > Hi, > > Thanks, Jack. Hopefully the 6% protein gain (for -m64) is not just noise. > > I updated the patch based on the current trunk (REV 168477). > > Is it OK to commit now? I can't approve it but the benchmarks now seem reasonable with the patch (although you might want to investigate why fatigue runs 3.9% slower at -m32). > > Thanks, > > Changpeng > > > > ________________________________________ > From: Jack Howarth [howarth@bromo.med.uc.edu] > Sent: Monday, January 03, 2011 9:04 PM > To: Fang, Changpeng > Cc: Zdenek Dvorak; Richard Guenther; Xinliang David Li; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH, Loop optimizer]: Add logic to disable certain loop optimizations on pre-/post-loops > > On Fri, Dec 17, 2010 at 01:14:49AM -0600, Fang, Changpeng wrote: > > Hi, Jack: > > > > Thanks for the testing. > > > > This patch is not supposed to slow down a program by 10% (rnflow and test_fpu). > > It would be helpful if you can provide analysis why they are slowed down. > > Changpeng, > The corrected merge against gcc trunk of... > > Index: gcc/basic-block.h > =================================================================== > --- gcc/basic-block.h (revision 168437) > +++ gcc/basic-block.h (working copy) > @@ -247,11 +247,14 @@ > Only used in cfgcleanup.c. */ > BB_NONTHREADABLE_BLOCK = 1 << 11, > > + /* Set on blocks that are headers of non-rolling loops. */ > + BB_HEADER_OF_NONROLLING_LOOP = 1 << 12, > + > /* Set on blocks that were modified in some way. This bit is set in > df_set_bb_dirty, but not cleared by df_analyze, so it can be used > to test whether a block has been modified prior to a df_analyze > call. */ > - BB_MODIFIED = 1 << 12 > + BB_MODIFIED = 1 << 13 > }; > > /* Dummy flag for convenience in the hot/cold partitioning code. */ > > for the proposed patch from http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01344.html > eliminated the performance regressions on x86_64-apple-darwin10. I now get... > > Compile Command : gfortran -ffast-math -funroll-loops -O3 %n.f90 -o %n > > Execution Time > -m32 > stock patched %increase > ac 10.59 10.59 0.0 > aermod 19.49 19.13 -1.8 > air 6.07 6.07 0.0 > capacita 44.60 44.61 0.0 > channel 1.98 1.98 0.0 > doduc 31.19 31.31 0.4 > fatigue 9.90 10.29 3.9 > gas_dyn 4.72 4.71 -0.2 > induct 13.93 13.93 0.0 > linpk 15.50 15.49 -0.1 > mdbx 11.28 11.26 -0.2 > nf 27.62 27.58 -0.1 > protein 38.70 38.60 -0.3 > rnflow 24.68 24.68 0.0 > test_fpu 10.13 10.13 0.0 > tfft 1.92 1.92 0.0 > > Geometric Mean 12.06 12.08 0.2 > Execution Time > > -m64 > stock patched %increase > ac 8.80 8.80 0.0 > aermod 17.34 17.17 -1.0 > air 5.48 5.52 0.7 > capacita 32.38 32.50 0.4 > channel 1.84 1.84 0.0 > doduc 26.50 26.52 0.1 > fatigue 8.35 8.33 -0.2 > gas_dyn 4.30 4.29 -0.2 > induct 12.83 12.83 0.0 > linpk 15.49 15.49 0.0 > mdbx 11.23 11.22 -0.1 > nf 30.21 30.16 -0.2 > protein 34.13 32.07 -6.0 > rnflow 23.18 23.19 0.0 > test_fpu 8.04 8.02 -0.2 > tfft 1.87 1.86 -0.5 > > Geometric Mean 10.87 10.82 -0.5 > Execution Time Content-Description: 0001-Consider-a-loop-not-hot-if-it-rolls-only-a-few-times.patch > From 7d10fcfe379e3fe4387cf192fbbdbea7daff0637 Mon Sep 17 00:00:00 2001 > From: Changpeng Fang > Date: Tue, 4 Jan 2011 14:36:50 -0800 > Subject: [PATCH] Consider a loop not hot if it rolls only a few times (non-rolling) > > * basic-block.h (bb_flags): Add a new flag BB_HEADER_OF_NONROLLING_LOOP. > * cfg.c (clear_bb_flags): Keep BB_HEADER_OF_NONROLLING marker. > * cfgloop.h (mark_non_rolling_loop): New function declaration. > (non_rolling_loop_p): New function declaration. > * predict.c (optimize_loop_for_size_p): Return true if the loop was marked > NON-ROLLING. (optimize_loop_for_speed_p): Return false if the loop was > marked NON-ROLLING. > * tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Mark the > non-rolling loop. > * tree-ssa-loop-niter.c (mark_non_rolling_loop): Implement the new > function. (non_rolling_loop_p): Implement the new function. > * tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Mark the > non-rolling loop. (vect_do_peeling_for_alignment): Mark the non-rolling > loop. > --- > gcc/basic-block.h | 5 ++++- > gcc/cfg.c | 7 ++++--- > gcc/cfgloop.h | 2 ++ > gcc/predict.c | 6 ++++++ > gcc/tree-ssa-loop-manip.c | 3 +++ > gcc/tree-ssa-loop-niter.c | 20 ++++++++++++++++++++ > gcc/tree-vect-loop-manip.c | 8 ++++++++ > 7 files changed, 47 insertions(+), 4 deletions(-) > > diff --git a/gcc/basic-block.h b/gcc/basic-block.h > index 3594eea..081c175 100644 > --- a/gcc/basic-block.h > +++ b/gcc/basic-block.h > @@ -251,7 +251,10 @@ enum bb_flags > df_set_bb_dirty, but not cleared by df_analyze, so it can be used > to test whether a block has been modified prior to a df_analyze > call. */ > - BB_MODIFIED = 1 << 12 > + BB_MODIFIED = 1 << 12, > + > + /* Set on blocks that are headers of non-rolling loops. */ > + BB_HEADER_OF_NONROLLING_LOOP = 1 << 13 > }; > > /* Dummy flag for convenience in the hot/cold partitioning code. */ > diff --git a/gcc/cfg.c b/gcc/cfg.c > index c8ef799..e59a637 100644 > --- a/gcc/cfg.c > +++ b/gcc/cfg.c > @@ -425,8 +425,8 @@ redirect_edge_pred (edge e, basic_block new_pred) > connect_src (e); > } > > -/* Clear all basic block flags, with the exception of partitioning and > - setjmp_target. */ > +/* Clear all basic block flags, with the exception of partitioning, > + setjmp_target, and the non-rolling loop marker. */ > void > clear_bb_flags (void) > { > @@ -434,7 +434,8 @@ clear_bb_flags (void) > > FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb) > bb->flags = (BB_PARTITION (bb) > - | (bb->flags & (BB_DISABLE_SCHEDULE + BB_RTL + BB_NON_LOCAL_GOTO_TARGET))); > + | (bb->flags & (BB_DISABLE_SCHEDULE + BB_RTL + BB_NON_LOCAL_GOTO_TARGET > + + BB_HEADER_OF_NONROLLING_LOOP))); > } > > /* Check the consistency of profile information. We can't do that > diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h > index f7bb134..0f48115 100644 > --- a/gcc/cfgloop.h > +++ b/gcc/cfgloop.h > @@ -279,6 +279,8 @@ extern rtx doloop_condition_get (rtx); > void estimate_numbers_of_iterations_loop (struct loop *, bool); > HOST_WIDE_INT estimated_loop_iterations_int (struct loop *, bool); > bool estimated_loop_iterations (struct loop *, bool, double_int *); > +void mark_non_rolling_loop (struct loop *); > +bool non_rolling_loop_p (struct loop *); > > /* Loop manipulation. */ > extern bool can_duplicate_loop_p (const struct loop *loop); > diff --git a/gcc/predict.c b/gcc/predict.c > index a86708a..34d7dff 100644 > --- a/gcc/predict.c > +++ b/gcc/predict.c > @@ -280,6 +280,9 @@ optimize_insn_for_speed_p (void) > bool > optimize_loop_for_size_p (struct loop *loop) > { > + /* Loops marked NON-ROLLING are not likely to be hot. */ > + if (non_rolling_loop_p (loop)) > + return true; > return optimize_bb_for_size_p (loop->header); > } > > @@ -288,6 +291,9 @@ optimize_loop_for_size_p (struct loop *loop) > bool > optimize_loop_for_speed_p (struct loop *loop) > { > + /* Loops marked NON-ROLLING are not likely to be hot. */ > + if (non_rolling_loop_p (loop)) > + return false; > return optimize_bb_for_speed_p (loop->header); > } > > diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c > index 8176ed8..3ff0790 100644 > --- a/gcc/tree-ssa-loop-manip.c > +++ b/gcc/tree-ssa-loop-manip.c > @@ -931,6 +931,9 @@ tree_transform_and_unroll_loop (struct loop *loop, unsigned factor, > gcc_assert (new_loop != NULL); > update_ssa (TODO_update_ssa); > > + /* NEW_LOOP is a non-rolling loop. */ > + mark_non_rolling_loop (new_loop); > + > /* Determine the probability of the exit edge of the unrolled loop. */ > new_est_niter = est_niter / factor; > > diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c > index ee85f6f..ec108c2 100644 > --- a/gcc/tree-ssa-loop-niter.c > +++ b/gcc/tree-ssa-loop-niter.c > @@ -3011,6 +3011,26 @@ estimate_numbers_of_iterations (bool use_undefined_p) > fold_undefer_and_ignore_overflow_warnings (); > } > > +/* Mark LOOP as a non-rolling loop. */ > + > +void > +mark_non_rolling_loop (struct loop *loop) > +{ > + gcc_assert (loop && loop->header); > + loop->header->flags |= BB_HEADER_OF_NONROLLING_LOOP; > +} > + > +/* Return true if LOOP is a non-rolling loop. */ > + > +bool > +non_rolling_loop_p (struct loop *loop) > +{ > + int masked_flags; > + gcc_assert (loop && loop->header); > + masked_flags = (loop->header->flags & BB_HEADER_OF_NONROLLING_LOOP); > + return (masked_flags != 0); > +} > + > /* Returns true if statement S1 dominates statement S2. */ > > bool > diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c > index 28b75f1..9bbe68b 100644 > --- a/gcc/tree-vect-loop-manip.c > +++ b/gcc/tree-vect-loop-manip.c > @@ -1938,6 +1938,10 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, tree *ratio, > cond_expr, cond_expr_stmt_list); > gcc_assert (new_loop); > gcc_assert (loop_num == loop->num); > + > + /* NEW_LOOP is a non-rolling loop. */ > + mark_non_rolling_loop (new_loop); > + > #ifdef ENABLE_CHECKING > slpeel_verify_cfg_after_peeling (loop, new_loop); > #endif > @@ -2191,6 +2195,10 @@ vect_do_peeling_for_alignment (loop_vec_info loop_vinfo) > th, true, NULL_TREE, NULL); > > gcc_assert (new_loop); > + > + /* NEW_LOOP is a non-rolling loop. */ > + mark_non_rolling_loop (new_loop); > + > #ifdef ENABLE_CHECKING > slpeel_verify_cfg_after_peeling (new_loop, loop); > #endif > -- > 1.6.3.3 >