From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 11397 invoked by alias); 14 Dec 2010 14:27:38 -0000 Received: (qmail 10783 invoked by uid 22791); 14 Dec 2010 14:27:34 -0000 X-SWARE-Spam-Status: No, hits=-1.7 required=5.0 tests=AWL,BAYES_00,TW_DB,T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from bromo.med.uc.edu (HELO bromo.med.uc.edu) (129.137.3.146) by sourceware.org (qpsmtpd/0.43rc1) with SMTP; Tue, 14 Dec 2010 14:27:15 +0000 Received: from bromo.med.uc.edu (localhost.localdomain [127.0.0.1]) by bromo.med.uc.edu (Postfix) with ESMTP id 52AD7B2DBB; Tue, 14 Dec 2010 09:27:12 -0500 (EST) Received: (from howarth@localhost) by bromo.med.uc.edu (8.14.3/8.14.3/Submit) id oBEERCMw024226; Tue, 14 Dec 2010 09:27:12 -0500 Date: Tue, 14 Dec 2010 16:13:00 -0000 From: Jack Howarth To: "Fang, Changpeng" Cc: "gcc-patches@gcc.gnu.org" Subject: Re: [PATCH, Loop optimizer]: Add logic to disable certain loop optimizations on pre-/post-loops Message-ID: <20101214142712.GA24218@bromo.med.uc.edu> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.18 (2008-05-17) Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2010-12/txt/msg01102.txt.bz2 On Mon, Dec 13, 2010 at 02:35:35PM -0600, Fang, Changpeng wrote: > Hi, > > The attached patch adds the logic to disable certain loop optimizations on pre-/post-loops. > > Some loop optimizations (auto-vectorization, loop unrolling, etc) may peel a few iterations > of a loop to form pre- and/or post-loops for various purposes (alignment, loop bounds, etc). > Currently, GCC loop optimizer is unable to recognize that such loops will roll only a few > iterations and still perform optimizations on them. While this does not hurt the performance in general, > it may significantly increase the compilation time and code size without performance benefit. > > This patch adds such logic for the loop optimizer to recognize pre- and/or post loops, and disable > prefetch, unswitch and loop unrolling on them. On polyhedron with -Ofast -funroll-loops -march=amdfam10, > the patch could reduce the compilation time by 28% on average, the reduce the binary size by 20% on > average (see the atached data). Note that the small improvement (0.5%) could have been noise, the > code size reduction could possibly improve the performance in some cases (I_cache iprovement?). > > The patch passed bootstrap and gcc regression tests on x86_64-unknown-linux-gnu. > > Is it OK to commit to trunk? > > Thanks, > > Changpeng Changpeng, On x86_64-apple-darwin10, this patch produces some regressions in the gcc testsuite. In particular at both -m32 and -m64... XPASS: gcc.dg/pr30957-1.c execution test FAIL: gcc.dg/pr30957-1.c scan-rtl-dump loop2_unroll "Expanding Accumulator" and FAIL: gcc.dg/var-expand1.c scan-rtl-dump loop2_unroll "Expanding Accumulator" Do you see those as well on linux? Jack > Content-Description: polyhedron.txt > Effact of the pre-/post-loop patch on polyhedron > option: gfortran -Ofast -funroll-loops -march=amdfam10 > > compilation code size speed > time reduction reduction improvement > (%) (%) (%) > ----------------------------------------------------------- > ac -20.54 -17.15 0 > aermod -15.93 -10.15 2.51 > air -5.74 -5.45 -0.09 > capacita -31.35 -18.27 0.08 > channel -11.32 -10.24 1.22 > doduc -4.52 -6.12 0.82 > fatigue -34.51 -15.94 0 > gas_dyn -45.56 -28.66 2.31 > induct -3.1 -1.91 0.05 > linpk -25.55 -27.5 0.26 > mdbx -24.06 -19.74 1.27 > nf -60.85 -48.92 -0.77 > protein -44.73 -24.02 -0.19 > rnflow -50.55 -36.69 0.47 > test_fpu -52.49 -41.35 1.18 > tfft -24.83 -18.29 0.39 > ----------------------------------------------------------- > average -28.48 -20.65 0.59 > Content-Description: 0001-Don-t-perform-certain-loop-optimizations-on-pre-post.patch > From e8636e80de4d6de8ba2dbc8f08bd2daddd02edc3 Mon Sep 17 00:00:00 2001 > From: Changpeng Fang > Date: Mon, 13 Dec 2010 12:01:49 -0800 > Subject: [PATCH] Don't perform certain loop optimizations on pre/post loops > > * basic-block.h (bb_flags): Add a new flag BB_PRE_POST_LOOP_HEADER. > * cfg.c (clear_bb_flags): Keep BB_PRE_POST_LOOP_HEADER marker. > * cfgloop.h (mark_pre_or_post_loop): New function declaration. > (pre_or_post_loop_p): New function declaration. > * loop-unroll.c (decide_unroll_runtime_iterations): Do not unroll a > pre- or post-loop. > * loop-unswitch.c (unswitch_single_loop): Do not unswitch a pre- or > post-loop. > * tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Mark the > post-loop. > * tree-ssa-loop-niter.c (mark_pre_or_post_loop): Implement the new > function. (pre_or_post_loop_p): Implement the new function. > * tree-ssa-loop-prefetch.c (loop_prefetch_arrays): Don't prefetch > a pre- or post-loop. > * tree-ssa-loop-unswitch.c (tree_ssa_unswitch_loops): Do not unswitch > a pre- or post-loop. > * tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Mark the > post-loop. (vect_do_peeling_for_alignment): Mark the pre-loop. > --- > gcc/basic-block.h | 6 +++++- > gcc/cfg.c | 7 ++++--- > gcc/cfgloop.h | 2 ++ > gcc/loop-unroll.c | 7 +++++++ > gcc/loop-unswitch.c | 8 ++++++++ > gcc/tree-ssa-loop-manip.c | 3 +++ > gcc/tree-ssa-loop-niter.c | 20 ++++++++++++++++++++ > gcc/tree-ssa-loop-prefetch.c | 7 +++++++ > gcc/tree-ssa-loop-unswitch.c | 8 ++++++++ > gcc/tree-vect-loop-manip.c | 8 ++++++++ > 10 files changed, 72 insertions(+), 4 deletions(-) > > diff --git a/gcc/basic-block.h b/gcc/basic-block.h > index be0a1d1..78552fd 100644 > --- a/gcc/basic-block.h > +++ b/gcc/basic-block.h > @@ -245,7 +245,11 @@ enum bb_flags > > /* Set on blocks that cannot be threaded through. > Only used in cfgcleanup.c. */ > - BB_NONTHREADABLE_BLOCK = 1 << 11 > + BB_NONTHREADABLE_BLOCK = 1 << 11, > + > + /* Set on blocks that are headers of pre- or post-loops. */ > + BB_PRE_POST_LOOP_HEADER = 1 << 12 > + > }; > > /* Dummy flag for convenience in the hot/cold partitioning code. */ > diff --git a/gcc/cfg.c b/gcc/cfg.c > index c8ef799..e9b394a 100644 > --- a/gcc/cfg.c > +++ b/gcc/cfg.c > @@ -425,8 +425,8 @@ redirect_edge_pred (edge e, basic_block new_pred) > connect_src (e); > } > > -/* Clear all basic block flags, with the exception of partitioning and > - setjmp_target. */ > +/* Clear all basic block flags, with the exception of partitioning, > + setjmp_target, and the pre/post loop marker. */ > void > clear_bb_flags (void) > { > @@ -434,7 +434,8 @@ clear_bb_flags (void) > > FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb) > bb->flags = (BB_PARTITION (bb) > - | (bb->flags & (BB_DISABLE_SCHEDULE + BB_RTL + BB_NON_LOCAL_GOTO_TARGET))); > + | (bb->flags & (BB_DISABLE_SCHEDULE + BB_RTL + BB_NON_LOCAL_GOTO_TARGET > + + BB_PRE_POST_LOOP_HEADER))); > } > > /* Check the consistency of profile information. We can't do that > diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h > index bf2614e..ce848cc 100644 > --- a/gcc/cfgloop.h > +++ b/gcc/cfgloop.h > @@ -279,6 +279,8 @@ extern rtx doloop_condition_get (rtx); > void estimate_numbers_of_iterations_loop (struct loop *, bool); > HOST_WIDE_INT estimated_loop_iterations_int (struct loop *, bool); > bool estimated_loop_iterations (struct loop *, bool, double_int *); > +void mark_pre_or_post_loop (struct loop *); > +bool pre_or_post_loop_p (struct loop *); > > /* Loop manipulation. */ > extern bool can_duplicate_loop_p (const struct loop *loop); > diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c > index 67d6ea0..6f095f6 100644 > --- a/gcc/loop-unroll.c > +++ b/gcc/loop-unroll.c > @@ -857,6 +857,13 @@ decide_unroll_runtime_iterations (struct loop *loop, int flags) > fprintf (dump_file, ";; Loop iterates constant times\n"); > return; > } > + > + if (pre_or_post_loop_p (loop)) > + { > + if (dump_file) > + fprintf (dump_file, ";; Not unrolling, a pre- or post-loop\n"); > + return; > + } > > /* If we have profile feedback, check whether the loop rolls. */ > if (loop->header->count && expected_loop_iterations (loop) < 2 * nunroll) > diff --git a/gcc/loop-unswitch.c b/gcc/loop-unswitch.c > index 77524d8..59373bf 100644 > --- a/gcc/loop-unswitch.c > +++ b/gcc/loop-unswitch.c > @@ -276,6 +276,14 @@ unswitch_single_loop (struct loop *loop, rtx cond_checked, int num) > return; > } > > + /* Pre- or post loop usually just roll a few iterations. */ > + if (pre_or_post_loop_p (loop)) > + { > + if (dump_file) > + fprintf (dump_file, ";; Not unswitching, a pre- or post loop\n"); > + return; > + } > + > /* We must be able to duplicate loop body. */ > if (!can_duplicate_loop_p (loop)) > { > diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c > index 87b2c0d..f8ddbab 100644 > --- a/gcc/tree-ssa-loop-manip.c > +++ b/gcc/tree-ssa-loop-manip.c > @@ -931,6 +931,9 @@ tree_transform_and_unroll_loop (struct loop *loop, unsigned factor, > gcc_assert (new_loop != NULL); > update_ssa (TODO_update_ssa); > > + /* NEW_LOOP is a post-loop. */ > + mark_pre_or_post_loop (new_loop); > + > /* Determine the probability of the exit edge of the unrolled loop. */ > new_est_niter = est_niter / factor; > > diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c > index ee85f6f..33e8cc3 100644 > --- a/gcc/tree-ssa-loop-niter.c > +++ b/gcc/tree-ssa-loop-niter.c > @@ -3011,6 +3011,26 @@ estimate_numbers_of_iterations (bool use_undefined_p) > fold_undefer_and_ignore_overflow_warnings (); > } > > +/* Mark LOOP as a pre- or post loop. */ > + > +void > +mark_pre_or_post_loop (struct loop *loop) > +{ > + gcc_assert (loop && loop->header); > + loop->header->flags |= BB_PRE_POST_LOOP_HEADER; > +} > + > +/* Return true if LOOP is a pre- or post loop. */ > + > +bool > +pre_or_post_loop_p (struct loop *loop) > +{ > + int masked_flags; > + gcc_assert (loop && loop->header); > + masked_flags = (loop->header->flags & BB_PRE_POST_LOOP_HEADER); > + return (masked_flags != 0); > +} > + > /* Returns true if statement S1 dominates statement S2. */ > > bool > diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c > index 59c65d3..5c9f640 100644 > --- a/gcc/tree-ssa-loop-prefetch.c > +++ b/gcc/tree-ssa-loop-prefetch.c > @@ -1793,6 +1793,13 @@ loop_prefetch_arrays (struct loop *loop) > return false; > } > > + if (pre_or_post_loop_p (loop)) > + { > + if (dump_file && (dump_flags & TDF_DETAILS)) > + fprintf (dump_file, " Not Prefetching -- pre- or post loop\n"); > + return false; > + } > + > /* FIXME: the time should be weighted by the probabilities of the blocks in > the loop body. */ > time = tree_num_loop_insns (loop, &eni_time_weights); > diff --git a/gcc/tree-ssa-loop-unswitch.c b/gcc/tree-ssa-loop-unswitch.c > index b6b32dc..f3b8108 100644 > --- a/gcc/tree-ssa-loop-unswitch.c > +++ b/gcc/tree-ssa-loop-unswitch.c > @@ -88,6 +88,14 @@ tree_ssa_unswitch_loops (void) > if (dump_file && (dump_flags & TDF_DETAILS)) > fprintf (dump_file, ";; Considering loop %d\n", loop->num); > > + /* Do not unswitch a pre- or post loop. */ > + if (pre_or_post_loop_p (loop)) > + { > + if (dump_file && (dump_flags & TDF_DETAILS)) > + fprintf (dump_file, ";; Not unswitching, a pre- or post loop\n"); > + continue; > + } > + > /* Do not unswitch in cold regions. */ > if (optimize_loop_for_size_p (loop)) > { > diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c > index 6ecd304..9a63f7e 100644 > --- a/gcc/tree-vect-loop-manip.c > +++ b/gcc/tree-vect-loop-manip.c > @@ -1938,6 +1938,10 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, tree *ratio, > cond_expr, cond_expr_stmt_list); > gcc_assert (new_loop); > gcc_assert (loop_num == loop->num); > + > + /* NEW_LOOP is a post loop. */ > + mark_pre_or_post_loop (new_loop); > + > #ifdef ENABLE_CHECKING > slpeel_verify_cfg_after_peeling (loop, new_loop); > #endif > @@ -2191,6 +2195,10 @@ vect_do_peeling_for_alignment (loop_vec_info loop_vinfo) > th, true, NULL_TREE, NULL); > > gcc_assert (new_loop); > + > + /* NEW_LOOP is a pre-loop. */ > + mark_pre_or_post_loop (new_loop); > + > #ifdef ENABLE_CHECKING > slpeel_verify_cfg_after_peeling (new_loop, loop); > #endif > -- > 1.6.3.3 >