public inbox for gcc-patches@gcc.gnu.org
* Reduce complete unrolling & peeling limits
@ 2012-11-14 23:34 Jan Hubicka
  2012-11-15 13:19 ` Jakub Jelinek
  2012-11-18  8:09 ` Eric Botcazou
  0 siblings, 2 replies; 19+ messages in thread
From: Jan Hubicka @ 2012-11-14 23:34 UTC
  To: gcc-patches, jakub

Hi,
this patch reduces max-peeled-insns and max-completely-peeled-insns from 400 to
100.  The reason why I am doing this is that I want to reduce code bloat caused
by my cunroll work, which enabled a lot more unrolling than previously, causing
a considerable code size regression at -O3.
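
For anyone wanting to observe the size/speed trade-off locally, the limit can
be overridden per compilation with --param.  The file below is only a minimal
sketch: dot8 is a hypothetical function, not taken from any of the benchmarks
mentioned here, and whether its loop is actually unrolled depends on the
compiler revision and target.

```c
/* example.c: a loop with a compile-time-constant trip count, i.e. a
   candidate for complete unrolling.

   Possible invocations (the file name is arbitrary):
     gcc -O3 --param max-completely-peeled-insns=100 -fdump-tree-cunroll-details -c example.c
     gcc -O3 --param max-completely-peeled-insns=400 -fdump-tree-cunroll-details -c example.c
   The resulting cunroll dump files can then be compared to see whether
   the unrolling decision changed.  */

int
dot8 (const int *a, const int *b)
{
  int sum = 0;
  for (int i = 0; i < 8; i++)   /* trip count known at compile time */
    sum += a[i] * b[i];
  return sum;
}
```

Comparing the object sizes of the two builds (for example with the size
utility) gives a rough picture of the code growth that the lower limit avoids.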

I do not think those params were ever seriously tuned, or re-tuned after the
introduction of tree-ssa peeling.  I bootstrapped/regtested x86_64 with a few
values - 4000, 200, 100, 50 - on spec2000, spec2k6, C++ benchmarks and polyhedron.

I also did partial tests on ia-64 (that is broken quite a lot now, but I wanted
to have some sanity check that these values are not too x86 specific).

With 4000 (and also bumped up max-peel-times/max-completely-peel-times) there
are improvements on
  ammp 1360->1460
  equake 1800->1840
  applu 1450->1500
but I guess those need to be handled by a better heuristic.

Otherwise there are no performance regressions when going 400->100.  With 50
there are tiny performance drops on swim and applu.

I plan to follow up by testing the max-peel-times parameters and then doing
inliner tests.

Bootstrapped/regtested x86_64-linux, OK?

Honza

	* params.def (max-peeled-insns, max-completely-peeled-insns): Reduce to 100.
Index: params.def
===================================================================
--- params.def	(revision 193505)
+++ params.def	(working copy)
@@ -290,7 +290,7 @@ DEFPARAM(PARAM_MAX_UNROLL_TIMES,
 DEFPARAM(PARAM_MAX_PEELED_INSNS,
 	"max-peeled-insns",
 	"The maximum number of insns of a peeled loop",
-	400, 0, 0)
+	100, 0, 0)
 /* The maximum number of peelings of a single loop.  */
 DEFPARAM(PARAM_MAX_PEEL_TIMES,
 	"max-peel-times",
@@ -305,7 +305,7 @@ DEFPARAM(PARAM_MAX_PEEL_BRANCHES,
 DEFPARAM(PARAM_MAX_COMPLETELY_PEELED_INSNS,
 	"max-completely-peeled-insns",
 	"The maximum number of insns of a completely peeled loop",
-	400, 0, 0)
+	100, 0, 0)
 /* The maximum number of peelings of a single loop that is peeled completely.  */
 DEFPARAM(PARAM_MAX_COMPLETELY_PEEL_TIMES,
 	"max-completely-peel-times",

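The DEFPARAM entries touched by the diff above follow a table-of-macros
pattern: one definition file is expanded several times to produce both an
enumeration and a table of defaults.  The miniature below is a sketch of that
general technique only.  Every DEMO_/demo_ name is hypothetical, and the real
params.def machinery in GCC differs in detail.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of a params.def-style table.  Each entry gives
   an enumerator, an option name, a help string, a default, a minimum
   and a maximum, mirroring the argument order seen in the diff.  */
#define DEMO_PARAMS(X) \
  X (DEMO_MAX_PEELED_INSNS, "max-peeled-insns", \
     "The maximum number of insns of a peeled loop", 100, 0, 0) \
  X (DEMO_MAX_COMPLETELY_PEELED_INSNS, "max-completely-peeled-insns", \
     "The maximum number of insns of a completely peeled loop", 100, 0, 0)

/* First expansion: one enumerator per parameter.  */
enum demo_param
{
#define X(id, name, help, def, min, max) id,
  DEMO_PARAMS (X)
#undef X
  DEMO_PARAM_COUNT
};

struct demo_param_info
{
  const char *name;
  const char *help;
  int default_value;
  int min_value;
  int max_value;
};

/* Second expansion: the table of names, help strings and defaults.  */
static const struct demo_param_info demo_param_table[] = {
#define X(id, name, help, def, min, max) { name, help, def, min, max },
  DEMO_PARAMS (X)
#undef X
};

/* Return the default for NAME, or -1 if no such parameter exists.  */
int
demo_param_default (const char *name)
{
  size_t n = sizeof demo_param_table / sizeof demo_param_table[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (demo_param_table[i].name, name) == 0)
      return demo_param_table[i].default_value;
  return -1;
}
```

With this layout, changing a default is a one-line edit to the table, which is
why the patch itself is so small.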
* Re: Reduce complete unrolling & peeling limits
@ 2012-11-18 13:35 Dominique Dhumieres
  2012-11-21 16:27 ` Jan Hubicka
  0 siblings, 1 reply; 19+ messages in thread
From: Dominique Dhumieres @ 2012-11-18 13:35 UTC
  To: gcc-patches; +Cc: ebotcazou, hubicka

> Did you notice that gcc.c-torture/compile/pr43186.c regressed?  It now again
> takes a while to compile, so times out on slow machines:
> ...

On a 2.5GHz Core2Duo, compiling the test with revision 192891 (2012-10-28)
takes a small fraction of a second, while with revision 193270 (2012-11-06)
it takes ~25s.

However, this patch makes gfortran.dg/reassoc_4.f fail:

FAIL: gfortran.dg/reassoc_4.f  -O   scan-tree-dump-times reassoc1 "[0-9] \\\\* " 22

With it, 22 should be replaced with 16 (threshold max-completely-peeled-insns=138
gives 16, =139 gives 22).
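
The 138/139 boundary reported above is consistent with a simple size check
against the limit.  As a rough sketch only: the function name below is
hypothetical, the real decision in tree-ssa-loop-ivcanon.c involves more than
this single comparison, and the figure of 139 estimated insns for the loop in
reassoc_4.f is inferred from the numbers above rather than measured.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the size limit: a loop is eligible
   for complete unrolling only if its estimated unrolled size does not
   exceed the value of max-completely-peeled-insns.  */
bool
fits_complete_peel_limit (unsigned estimated_unrolled_insns,
                          unsigned max_completely_peeled_insns)
{
  return estimated_unrolled_insns <= max_completely_peeled_insns;
}
```

Under that assumption, a loop estimated at 139 insns is rejected at 138 and at
the new default of 100, but accepted at 139 and at the old default of 400,
which would explain why the expected count in the test changes with the patch.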

Dominique

* Re: Reduce complete unrolling & peeling limits
@ 2012-11-21 13:47 Dominique Dhumieres
  2012-11-21 14:16 ` Jan Hubicka
  0 siblings, 1 reply; 19+ messages in thread
From: Dominique Dhumieres @ 2012-11-21 13:47 UTC
  To: gcc-patches; +Cc: ebotcazou, hubicka

Hi Jan,

> this is the patch I will try to test once I have a chance :)
> It simply prevents the unroller from analyzing loops when they are already too large.
> ...

This patch breaks bootstrap with

...
/opt/gcc/p_build/./prev-gcc/g++ -B/opt/gcc/p_build/./prev-gcc/ -B/opt/gcc/gcc4.8p-193652p3/x86_64-apple-darwin10.8.0/bin/ -nostdinc++ -B/opt/gcc/p_build/prev-x86_64-apple-darwin10.8.0/libstdc++-v3/src/.libs -B/opt/gcc/p_build/prev-x86_64-apple-darwin10.8.0/libstdc++-v3/libsupc++/.libs -I/opt/gcc/p_build/prev-x86_64-apple-darwin10.8.0/libstdc++-v3/include/x86_64-apple-darwin10.8.0 -I/opt/gcc/p_build/prev-x86_64-apple-darwin10.8.0/libstdc++-v3/include -I/opt/gcc/p_work/libstdc++-v3/libsupc++ -L/opt/gcc/p_build/prev-x86_64-apple-darwin10.8.0/libstdc++-v3/src/.libs -L/opt/gcc/p_build/prev-x86_64-apple-darwin10.8.0/libstdc++-v3/libsupc++/.libs -c   -g -O2 -mdynamic-no-pic -gtoggle -DIN_GCC   -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror   -DHAVE_CONFIG_H -I. -I. -I../../p_work/gcc -I../../p_work/gcc/. -I../../p_work/gcc/../include -I./../intl -I../../p_work/gcc/../libcpp/include -I/opt/mp/include  -I../../p_work/gcc/../libdecnumber -I../../p_work/gcc/../libdecnumber/dpd -I../libdecnumber -I../../p_work/gcc/../libbacktrace -DCLOOG_INT_GMP  -I/opt/mp/include  ../../p_work/gcc/tree-ssa-loop-ivopts.c -o tree-ssa-loop-ivopts.o
../../p_work/gcc/tree-ssa-loop-ivcanon.c: In function 'bool canonicalize_loop_induction_variables(loop*, bool, unroll_level, bool)':
../../p_work/gcc/tree-ssa-loop-ivcanon.c:690:62: error: 'n_unroll' may be used uninitialized in this function [-Werror=maybe-uninitialized]
       && (!n_unroll_found || (unsigned HOST_WIDE_INT)maxiter < n_unroll))
                                                              ^
../../p_work/gcc/tree-ssa-loop-ivcanon.c:656:26: note: 'n_unroll' was declared here
   unsigned HOST_WIDE_INT n_unroll, ninsns, max_unroll, unr_insns;
                          ^
cc1plus: all warnings being treated as errors
...

I have completed bootstrap with the following change

--- ../_clean/gcc/tree-ssa-loop-ivcanon.c	2012-11-18 11:27:28.000000000 +0100
+++ gcc/tree-ssa-loop-ivcanon.c	2012-11-20 16:27:07.000000000 +0100
@@ -641,9 +641,10 @@ try_unroll_loop_completely (struct loop 
 			    enum unroll_level ul,
 			    HOST_WIDE_INT maxiter)
 {
-  unsigned HOST_WIDE_INT n_unroll, ninsns, max_unroll, unr_insns;
+  unsigned HOST_WIDE_INT ninsns, max_unroll, unr_insns;
   gimple cond;
   struct loop_size size;
+  unsigned HOST_WIDE_INT n_unroll = 0;
   bool n_unroll_found = false;
   edge edge_to_cancel = NULL;
   int num = loop->num;
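
The warning arises because n_unroll is written only on some paths while its
validity is tracked by a separate flag, a correlation the compiler cannot
always prove.  A reduced sketch of the pattern follows; pick_unroll_count and
its parameters are hypothetical stand-ins, not the code from
tree-ssa-loop-ivcanon.c.

```c
#include <stdbool.h>

/* Hypothetical reduction of the flag-guarded variable pattern.  */
unsigned long
pick_unroll_count (bool have_exact, unsigned long exact, long maxiter)
{
  /* Initialised up front, as in the change above.  Without "= 0" the
     read of n_unroll below is safe only because of n_unroll_found,
     which -Wmaybe-uninitialized may fail to see through.  */
  unsigned long n_unroll = 0;
  bool n_unroll_found = false;

  if (have_exact)
    {
      n_unroll = exact;
      n_unroll_found = true;
    }

  /* Prefer the upper bound when nothing was found or it is tighter.  */
  if (maxiter >= 0
      && (!n_unroll_found || (unsigned long) maxiter < n_unroll))
    {
      n_unroll = (unsigned long) maxiter;
      n_unroll_found = true;
    }

  return n_unroll_found ? n_unroll : 0;
}
```

Initialising the variable silences the diagnostic without changing behaviour,
since every read of n_unroll is still guarded by the flag.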

After that the compilation of gcc.c-torture/compile/pr43186.c is back to
a fraction of a second, but I see the following regressions:

FAIL: gcc.dg/graphite/interchange-8.c scan-tree-dump-times graphite "will be interchanged" 2
FAIL: gcc.dg/graphite/pr42530.c (internal compiler error)
FAIL: gcc.dg/graphite/pr42530.c (test for excess errors)
FAIL: gcc.dg/tree-ssa/cunroll-1.c scan-tree-dump cunrolli "Unrolled loop 1 completely .duplicated 2 times.."
FAIL: gcc.dg/tree-ssa/cunroll-1.c scan-tree-dump cunrolli "Last iteration exit edge was proved true."
FAIL: gcc.dg/tree-ssa/cunroll-3.c scan-tree-dump cunrolli "Unrolled loop 1 completely .duplicated 1 times.."
FAIL: gcc.dg/tree-ssa/loop-36.c scan-tree-dump-not dce2 "c.array"
FAIL: gcc.dg/tree-ssa/loop-37.c scan-tree-dump-not optimized "my_array"
FAIL: gcc.dg/tree-ssa/pr21829.c scan-tree-dump-not optimized "if \\("
FAIL: libgomp.fortran/reduction2.f90  -O3 -fomit-frame-pointer  execution test
FAIL: libgomp.fortran/reduction2.f90  -O3 -fomit-frame-pointer -funroll-loops  execution test
FAIL: libgomp.fortran/reduction2.f90  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  execution test
FAIL: libgomp.fortran/reduction2.f90  -O3 -g  execution test
FAIL: libgomp.fortran/reduction2.f90  -O3 -fomit-frame-pointer  execution test
FAIL: libgomp.fortran/reduction2.f90  -O3 -fomit-frame-pointer -funroll-loops  execution test
FAIL: libgomp.fortran/reduction2.f90  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  execution test
FAIL: libgomp.fortran/reduction2.f90  -O3 -g  execution test

for both -m32 and -m64 +

FAIL: gcc.dg/tree-ssa/loadpre6.c scan-tree-dump-times pre "Insertions: 2" 1

with -m32.

Dominique
