From: "Fang, Changpeng" <Changpeng.Fang@amd.com>
To: Jack Howarth <howarth@bromo.med.uc.edu>
Cc: Zdenek Dvorak <rakdver@kam.mff.cuni.cz>,
Richard Guenther <richard.guenther@gmail.com>,
Xinliang David Li <davidxl@google.com>,
"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: RE: [PATCH, Loop optimizer]: Add logic to disable certain loop optimizations on pre-/post-loops
Date: Fri, 17 Dec 2010 09:55:00 -0000 [thread overview]
Message-ID: <D4C76825A6780047854A11E93CDE84D004C1768412@SAUSEXMBP01.amd.com> (raw)
In-Reply-To: <20101217053115.GA14379@bromo.med.uc.edu>
Hi, Jack:
Thanks for the testing.
This patch is not supposed to slow down a program by 10% (rnflow and test_fpu).
It would be helpful if you can provide analysis why they are slowed down.
We did see a significant compilation time reduction for most pb05 programs.
(I don't know why you do not have executable size data).
>I would note that intel darwin now defaults
>to -mtune=core2 and always has defaulted to -fPIC.
I could not understand these default for darwin. My understanding is that,
for x86_64, the default should be -mtune=generic.
Thanks,
Changpeng
________________________________________
From: Jack Howarth [howarth@bromo.med.uc.edu]
Sent: Thursday, December 16, 2010 11:31 PM
To: Fang, Changpeng
Cc: Zdenek Dvorak; Richard Guenther; Xinliang David Li; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH, Loop optimizer]: Add logic to disable certain loop optimizations on pre-/post-loops
On Thu, Dec 16, 2010 at 06:13:52PM -0600, Fang, Changpeng wrote:
> Hi,
>
> Based on previous discussions, I modified the patch as such.
>
> If a loop is marked as non-rolling, optimize_loop_for_size_p returns TRUE
> and optimize_loop_for_speed_p returns FALSE. All users of these two
> functions will be affected.
>
> After applying the modified patch, pb05 compilation time decreases 29%, binary
> size decreases 20%, while a small (0.5%) performance increase was found which may
> be just noise.
>
> Modified patch passed bootstrapping and gcc regression tests on x86_64-unknown-linux-gnu.
>
> Is it OK to commit to trunk?
Changpeng,
On x86_64-apple-darwin10, I am finding a more severe penalty for this patch with
the pb05 benchmarks. Using a profiledbootstrap BOOT_CFLAGS="-g -O3" build with...
Configured with: ../gcc-4.6-20101216/configure --prefix=/sw --prefix=/sw/lib/gcc4.6 --mandir=/sw/share/man --infodir=/sw/lib/gcc4.6/info --enable-languages=c,c++,fortran,objc,obj-c++,java --with-gmp=/sw --with-libiconv-prefix=/sw --with-ppl=/sw --with-cloog=/sw --with-mpc=/sw --with-system-zlib --x-includes=/usr/X11R6/include --x-libraries=/usr/X11R6/lib --program-suffix=-fsf-4.6 --enable-checking=yes --enable-cloog-backend=isl --enable-build-with-cxx
I get without the patch...
================================================================================
Date & Time : 16 Dec 2010 23:36:27
Test Name : gfortran_lin_O3
Compile Command : gfortran -ffast-math -funroll-loops -O3 %n.f90 -o %n
Benchmarks : ac aermod air capacita channel doduc fatigue gas_dyn induct linpk mdbx nf protein rnflow test_fpu tfft
Maximum Times : 2000.0
Target Error % : 0.100
Minimum Repeats : 10
Maximum Repeats : 100
Benchmark Compile Executable Ave Run Number Estim
Name (secs) (bytes) (secs) Repeats Err %
--------- ------- ---------- ------- ------- ------
ac 1.76 10000 8.78 12 0.0081
aermod 54.94 10000 17.28 10 0.0307
air 3.44 10000 5.53 13 0.0734
capacita 2.64 10000 32.65 10 0.0096
channel 0.89 10000 1.84 20 0.0977
doduc 8.12 10000 27.00 10 0.0132
fatigue 3.06 10000 8.36 10 0.0104
gas_dyn 5.00 10000 4.30 17 0.0915
induct 6.24 10000 12.42 10 0.0100
linpk 1.02 10000 15.50 12 0.0542
mdbx 2.55 10000 11.24 10 0.0256
nf 2.90 10000 30.16 20 0.0989
protein 7.98 10000 33.72 10 0.0070
rnflow 9.34 10000 23.21 10 0.0551
test_fpu 6.72 10000 8.05 10 0.0426
tfft 0.76 10000 1.87 10 0.0597
Geometric Mean Execution Time = 10.87 seconds
and with the patch...
================================================================================
Date & Time : 16 Dec 2010 21:31:06
Test Name : gfortran_lin_O3
Compile Command : gfortran -ffast-math -funroll-loops -O3 %n.f90 -o %n
Benchmarks : ac aermod air capacita channel doduc fatigue gas_dyn induct linpk mdbx nf protein rnflow test_fpu tfft
Maximum Times : 2000.0
Target Error % : 0.100
Minimum Repeats : 10
Maximum Repeats : 100
Benchmark Compile Executable Ave Run Number Estim
Name (secs) (bytes) (secs) Repeats Err %
--------- ------- ---------- ------- ------- ------
ac 1.19 10000 8.78 10 0.0099
aermod 47.91 10000 16.95 10 0.0123
air 2.85 10000 5.34 12 0.0715
capacita 1.63 10000 33.10 10 0.0361
channel 0.67 10000 1.87 10 0.0884
doduc 6.42 10000 27.35 10 0.0206
fatigue 2.10 10000 8.32 10 0.0194
gas_dyn 2.07 10000 4.30 17 0.0843
induct 5.38 10000 12.58 10 0.0088
linpk 0.71 10000 15.69 18 0.0796
mdbx 1.95 10000 11.41 10 0.0238
nf 1.24 10000 31.34 12 0.0991
protein 3.88 10000 35.13 10 0.0659
rnflow 4.73 10000 25.97 10 0.0629
test_fpu 3.66 10000 8.88 11 0.0989
tfft 0.52 10000 1.89 10 0.0403
Geometric Mean Execution Time = 11.09 seconds
This shows about a 2.0% performance reduction in the Geometric
Mean Execution Time. I would note that intel darwin now defaults
to -mtune=core2 and always has defaulted to -fPIC.
Jack
>
> Thanks,
>
> Changpeng
>
>
>
>
>
>
>
>
>
> ________________________________________
> From: Zdenek Dvorak [rakdver@kam.mff.cuni.cz]
> Sent: Thursday, December 16, 2010 12:47 PM
> To: Fang, Changpeng
> Cc: Richard Guenther; Xinliang David Li; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH, Loop optimizer]: Add logic to disable certain loop optimizations on pre-/post-loops
>
> Hi,
>
> > For prefetching of prologue or epilogue loops, we have two choices (1) prefetching not
> > not unrolling, (2) not prefetching. Which one do you prefer?
>
> it is better not to prefetch (the current placement of prefetches is not good for non-rolling
> loops),
>
> Zdenek
>
Content-Description: polyhedron1.txt
> Comparison of Polyhedron (2005) Compile Time, Binary Size and Performance After Applying the NON-ROLLING Marking Patch
>
> gfortran -Ofast -funroll-loops -march=amdfam10 %n.f90 -o %n
> ============================================================================================================================
> || | Before Patch | After Patch | Changes ||
> ||========================================================================================================================||
> || Benchmark | Compile Binary Size Run Time | Compile Binary Size Run Time | Compile Binary Performance ||
> || Name | Time (s) (bytes) (s) | Time(s) (bytes) (secs) | Time (%) Size(%) (%) ||
> ||========================================================================================================================||
> || ac | 3.36 41976 13.26 | 2.52 34424 13.21 | -25.00 -17.99 0.38 ||
> || aermod | 103.45 1412221 44.36 | 86.55 1268861 43.08 | -16.34 -10.15 2.97 ||
> || air | 6.11 75186 11.73 | 5.72 71090 11.56 | -6.38 -5.45 1.47 ||
> || capacita | 6.83 91257 88.40 | 4.70 74585 88.01 | -31.19 -18.27 0.44 ||
> || channel | 2.14 39984 6.65 | 1.84 35888 6.69 | -14.02 -10.24 -0.60 ||
> || doduc | 12.78 198624 38.59 | 12.20 186336 38.18 | -4.54 -6.19 1.07 ||
> || fatigue | 9.11 110008 10.15 | 5.93 92472 10.12 | -34.91 -15.94 0.30 ||
> || gas_dyn | 15.69 149726 7.14 | 8.45 109342 6.96 | -46.14 -26.97 2.59 ||
> || induct | 10.98 191800 20.66 | 10.62 188168 20.61 | -3.28 -1.89 0.24 ||
> || linpk | 2.27 46073 19.03 | 1.68 33401 19.03 | -25.99 -27.50 0.00 ||
> || mdbx | 5.63 103731 21.41 | 4.24 83251 21.33 | -24.69 -19.74 0.38 ||
> || nf | 14.18 118451 22.88 | 5.55 60499 23.14 | -60.86 -48.92 -1.12 ||
> || protein | 34.20 177700 47.04 | 19.16 135012 46.94 | -43.98 -24.02 0.21 ||
> || rnflow | 42.13 283645 40.30 | 20.92 178477 40.65 | -50.34 -37.08 -0.86 ||
> || test_fpu | 30.17 252080 14.46 | 14.44 149136 14.32 | -52.14 -40.84 0.98 ||
> || tfft | 1.50 32450 7.71 | 1.12 26546 7.67 | -25.33 -18.19 0.52 ||
> ||========================================================================================================================||
> || average | 19.57 | 19.46 | -29.07 -20.59 0.56 ||
> ============================================================================================================================
Content-Description: 0001-Don-t-perform-certain-loop-optimizations-on-pre-post.patch
> From cd8b85bba1b39e108235f44d9d07918179ff3d79 Mon Sep 17 00:00:00 2001
> From: Changpeng Fang <chfang@houghton.(none)>
> Date: Mon, 13 Dec 2010 12:01:49 -0800
> Subject: [PATCH] Don't perform certain loop optimizations on pre/post loops
>
> * basic-block.h (bb_flags): Add a new flag BB_HEADER_OF_NONROLLING_LOOP.
> * cfg.c (clear_bb_flags): Keep BB_HEADER_OF_NONROLLING marker.
> * cfgloop.h (mark_non_rolling_loop): New function declaration.
> (non_rolling_loop_p): New function declaration.
> * predict.c (optimize_loop_for_size_p): Return true if the loop was marked
> NON-ROLLING. (optimize_loop_for_speed_p): Return false if the loop was
> marked NON-ROLLING.
> * tree-ssa-loop-manip.c (tree_transform_and_unroll_loop): Mark the
> non-rolling loop.
> * tree-ssa-loop-niter.c (mark_non_rolling_loop): Implement the new
> function. (non_rolling_loop_p): Implement the new function.
> * tree-vect-loop-manip.c (vect_do_peeling_for_loop_bound): Mark the
> non-rolling loop. (vect_do_peeling_for_alignment): Mark the non-rolling
> loop.
> ---
> gcc/basic-block.h | 6 +++++-
> gcc/cfg.c | 7 ++++---
> gcc/cfgloop.h | 2 ++
> gcc/predict.c | 6 ++++++
> gcc/tree-ssa-loop-manip.c | 3 +++
> gcc/tree-ssa-loop-niter.c | 20 ++++++++++++++++++++
> gcc/tree-vect-loop-manip.c | 8 ++++++++
> 7 files changed, 48 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/basic-block.h b/gcc/basic-block.h
> index be0a1d1..850472d 100644
> --- a/gcc/basic-block.h
> +++ b/gcc/basic-block.h
> @@ -245,7 +245,11 @@ enum bb_flags
>
> /* Set on blocks that cannot be threaded through.
> Only used in cfgcleanup.c. */
> - BB_NONTHREADABLE_BLOCK = 1 << 11
> + BB_NONTHREADABLE_BLOCK = 1 << 11,
> +
> + /* Set on blocks that are headers of non-rolling loops. */
> + BB_HEADER_OF_NONROLLING_LOOP = 1 << 12
> +
> };
>
> /* Dummy flag for convenience in the hot/cold partitioning code. */
> diff --git a/gcc/cfg.c b/gcc/cfg.c
> index c8ef799..e59a637 100644
> --- a/gcc/cfg.c
> +++ b/gcc/cfg.c
> @@ -425,8 +425,8 @@ redirect_edge_pred (edge e, basic_block new_pred)
> connect_src (e);
> }
>
> -/* Clear all basic block flags, with the exception of partitioning and
> - setjmp_target. */
> +/* Clear all basic block flags, with the exception of partitioning,
> + setjmp_target, and the non-rolling loop marker. */
> void
> clear_bb_flags (void)
> {
> @@ -434,7 +434,8 @@ clear_bb_flags (void)
>
> FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR, NULL, next_bb)
> bb->flags = (BB_PARTITION (bb)
> - | (bb->flags & (BB_DISABLE_SCHEDULE + BB_RTL + BB_NON_LOCAL_GOTO_TARGET)));
> + | (bb->flags & (BB_DISABLE_SCHEDULE + BB_RTL + BB_NON_LOCAL_GOTO_TARGET
> + + BB_HEADER_OF_NONROLLING_LOOP)));
> }
>
> /* Check the consistency of profile information. We can't do that
> diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
> index bf2614e..e856a78 100644
> --- a/gcc/cfgloop.h
> +++ b/gcc/cfgloop.h
> @@ -279,6 +279,8 @@ extern rtx doloop_condition_get (rtx);
> void estimate_numbers_of_iterations_loop (struct loop *, bool);
> HOST_WIDE_INT estimated_loop_iterations_int (struct loop *, bool);
> bool estimated_loop_iterations (struct loop *, bool, double_int *);
> +void mark_non_rolling_loop (struct loop *);
> +bool non_rolling_loop_p (struct loop *);
>
> /* Loop manipulation. */
> extern bool can_duplicate_loop_p (const struct loop *loop);
> diff --git a/gcc/predict.c b/gcc/predict.c
> index c691990..bf729f8 100644
> --- a/gcc/predict.c
> +++ b/gcc/predict.c
> @@ -279,6 +279,9 @@ optimize_insn_for_speed_p (void)
> bool
> optimize_loop_for_size_p (struct loop *loop)
> {
> + /* Loops marked NON-ROLLING are not likely to be hot. */
> + if (non_rolling_loop_p (loop))
> + return true;
> return optimize_bb_for_size_p (loop->header);
> }
>
> @@ -287,6 +290,9 @@ optimize_loop_for_size_p (struct loop *loop)
> bool
> optimize_loop_for_speed_p (struct loop *loop)
> {
> + /* Loops marked NON-ROLLING are not likely to be hot. */
> + if (non_rolling_loop_p (loop))
> + return false;
> return optimize_bb_for_speed_p (loop->header);
> }
>
> diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
> index 87b2c0d..bc977bb 100644
> --- a/gcc/tree-ssa-loop-manip.c
> +++ b/gcc/tree-ssa-loop-manip.c
> @@ -931,6 +931,9 @@ tree_transform_and_unroll_loop (struct loop *loop, unsigned factor,
> gcc_assert (new_loop != NULL);
> update_ssa (TODO_update_ssa);
>
> + /* NEW_LOOP is a non-rolling loop. */
> + mark_non_rolling_loop (new_loop);
> +
> /* Determine the probability of the exit edge of the unrolled loop. */
> new_est_niter = est_niter / factor;
>
> diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
> index ee85f6f..1e2e4b2 100644
> --- a/gcc/tree-ssa-loop-niter.c
> +++ b/gcc/tree-ssa-loop-niter.c
> @@ -3011,6 +3011,26 @@ estimate_numbers_of_iterations (bool use_undefined_p)
> fold_undefer_and_ignore_overflow_warnings ();
> }
>
> +/* Mark LOOP as a non-rolling loop. */
> +
> +void
> +mark_non_rolling_loop (struct loop *loop)
> +{
> + gcc_assert (loop && loop->header);
> + loop->header->flags |= BB_HEADER_OF_NONROLLING_LOOP;
> +}
> +
> +/* Return true if LOOP is a non-rolling loop. */
> +
> +bool
> +non_rolling_loop_p (struct loop *loop)
> +{
> + int masked_flags;
> + gcc_assert (loop && loop->header);
> + masked_flags = (loop->header->flags & BB_HEADER_OF_NONROLLING_LOOP);
> + return (masked_flags != 0);
> +}
> +
> /* Returns true if statement S1 dominates statement S2. */
>
> bool
> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
> index 6ecd304..216de78 100644
> --- a/gcc/tree-vect-loop-manip.c
> +++ b/gcc/tree-vect-loop-manip.c
> @@ -1938,6 +1938,10 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo, tree *ratio,
> cond_expr, cond_expr_stmt_list);
> gcc_assert (new_loop);
> gcc_assert (loop_num == loop->num);
> +
> + /* NEW_LOOP is a non-rolling loop. */
> + mark_non_rolling_loop (new_loop);
> +
> #ifdef ENABLE_CHECKING
> slpeel_verify_cfg_after_peeling (loop, new_loop);
> #endif
> @@ -2191,6 +2195,10 @@ vect_do_peeling_for_alignment (loop_vec_info loop_vinfo)
> th, true, NULL_TREE, NULL);
>
> gcc_assert (new_loop);
> +
> + /* NEW_LOOP is a non-rolling loop. */
> + mark_non_rolling_loop (new_loop);
> +
> #ifdef ENABLE_CHECKING
> slpeel_verify_cfg_after_peeling (new_loop, loop);
> #endif
> --
> 1.6.3.3
>
next prev parent reply other threads:[~2010-12-17 7:15 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-12-13 21:57 Fang, Changpeng
2010-12-14 0:56 ` Sebastian Pop
2010-12-14 8:25 ` Zdenek Dvorak
2010-12-14 20:02 ` Fang, Changpeng
2010-12-14 21:55 ` Zdenek Dvorak
2010-12-15 6:16 ` Richard Guenther
2010-12-15 8:34 ` Fang, Changpeng
2010-12-15 9:22 ` Xinliang David Li
2010-12-15 10:00 ` Zdenek Dvorak
2010-12-15 16:46 ` Fang, Changpeng
2010-12-15 16:47 ` Zdenek Dvorak
2010-12-15 17:08 ` Xinliang David Li
2010-12-16 12:09 ` Richard Guenther
2010-12-16 12:41 ` Zdenek Dvorak
2010-12-16 18:26 ` Fang, Changpeng
2010-12-16 20:06 ` Zdenek Dvorak
2010-12-17 3:53 ` Fang, Changpeng
2010-12-17 6:36 ` Jack Howarth
2010-12-17 9:55 ` Fang, Changpeng [this message]
2010-12-17 16:13 ` Jack Howarth
2010-12-17 16:48 ` Fang, Changpeng
2010-12-17 17:20 ` Jack Howarth
2010-12-17 18:01 ` Jack Howarth
2010-12-17 18:31 ` Fang, Changpeng
2011-01-04 3:33 ` Jack Howarth
2011-01-04 22:40 ` Fang, Changpeng
2011-01-05 12:07 ` Richard Guenther
2011-01-05 14:12 ` Jack Howarth
2010-12-17 21:45 ` Jack Howarth
2010-12-17 22:35 ` Fang, Changpeng
2010-12-17 22:54 ` [PATCH, Loop optimizer]: Add logic to disable certain loop optimizations on pre-/post-loopsj Jack Howarth
2010-12-18 11:35 ` [PATCH, Loop optimizer]: Add logic to disable certain loop optimizations on pre-/post-loops Jack Howarth
2010-12-19 0:29 ` Richard Guenther
2010-12-21 22:45 ` Fang, Changpeng
2010-12-14 16:13 ` Jack Howarth
2010-12-14 17:33 ` Fang, Changpeng
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=D4C76825A6780047854A11E93CDE84D004C1768412@SAUSEXMBP01.amd.com \
--to=changpeng.fang@amd.com \
--cc=davidxl@google.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=howarth@bromo.med.uc.edu \
--cc=rakdver@kam.mff.cuni.cz \
--cc=richard.guenther@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).