* [PATCH][0/n] Merge from match-and-simplify @ 2014-10-15 13:20 Richard Biener 2014-10-15 16:30 ` Kyrill Tkachov ` (2 more replies) 0 siblings, 3 replies; 15+ messages in thread From: Richard Biener @ 2014-10-15 13:20 UTC (permalink / raw) To: gcc-patches I have posted 5 patches as part of a larger series to merge (parts) from the match-and-simplify branch. While I think there was overall consensus that the idea behind the project is sound there are technical questions left for how the thing should look in the end. I've raised them in 3/n which is the only patch of the series that contains any patterns sofar. To re-iterate here (as I expect most people will only look at [0/n] patches ;)), the question is whether we are fine with making fold-const (thus fold_{unary,binary,ternary}) not handle some cases it handles currently. Some cases may result from the fact that the autogenerated code from genmatch does not apply selective STRIP_NOPS/STRIP_SIGN_NOPS to the outermost expression operands as fold does. Some cases may result from the fact that the autogenerated code bails out on operands with side-effects (patterns really target GIMPLE where operands never have side-effects). In 10 years (well, maybe earlier) we shouldn't do any expression simplification from the frontends (and thus GENERIC) if not explicitely required by language standards. Thus I expect fold-const.c to "vanish" anyway. Generally it is of course impossible to prove that adding a pattern and removing existing fold-const.c and gimple-fold.c/tree-ssa-forwprop.c code will not regress. But I suppose that's expected and passing the testsuite is fine (together with possibly amending it with testcases for foldings that now should apply on GIMPLE). For stage1 I'd like to merge [1/n] to [5/n] as one revision (well, excluding 3/n) and add patterns in smaller chunks to be able to bisect to issues that show up. Any comments and reviews welcome (I don't think that my maintainership covers enough to simply check this in without approval). Thanks, Richard. -- Richard Biener <rguenther@suse.de> SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH][0/n] Merge from match-and-simplify 2014-10-15 13:20 [PATCH][0/n] Merge from match-and-simplify Richard Biener @ 2014-10-15 16:30 ` Kyrill Tkachov 2014-10-17 7:39 ` Ramana Radhakrishnan 2014-10-15 17:13 ` Jakub Jelinek 2014-10-16 20:43 ` Sebastian Pop 2 siblings, 1 reply; 15+ messages in thread From: Kyrill Tkachov @ 2014-10-15 16:30 UTC (permalink / raw) To: Richard Biener, gcc-patches On 15/10/14 14:00, Richard Biener wrote: > > Any comments and reviews welcome (I don't think that > my maintainership covers enough to simply check this in > without approval). > Hi Richard, The match-and-simplify branch bootstrapped successfully on aarch64-none-linux-gnu FWIW. Thanks, Kyrill ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH][0/n] Merge from match-and-simplify 2014-10-15 16:30 ` Kyrill Tkachov @ 2014-10-17 7:39 ` Ramana Radhakrishnan 2014-10-17 8:24 ` Richard Biener 0 siblings, 1 reply; 15+ messages in thread From: Ramana Radhakrishnan @ 2014-10-17 7:39 UTC (permalink / raw) To: Kyrill Tkachov; +Cc: Richard Biener, gcc-patches On Wed, Oct 15, 2014 at 5:29 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote: > > On 15/10/14 14:00, Richard Biener wrote: >> >> >> Any comments and reviews welcome (I don't think that >> my maintainership covers enough to simply check this in >> without approval). >> > Hi Richard, > > The match-and-simplify branch bootstrapped successfully on > aarch64-none-linux-gnu FWIW. > What about regression tests ? > Thanks, > Kyrill > > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH][0/n] Merge from match-and-simplify 2014-10-17 7:39 ` Ramana Radhakrishnan @ 2014-10-17 8:24 ` Richard Biener 2014-10-17 11:58 ` Richard Biener 0 siblings, 1 reply; 15+ messages in thread From: Richard Biener @ 2014-10-17 8:24 UTC (permalink / raw) To: ramrad01; +Cc: Kyrill Tkachov, gcc-patches On Fri, 17 Oct 2014, Ramana Radhakrishnan wrote: > On Wed, Oct 15, 2014 at 5:29 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote: > > > > On 15/10/14 14:00, Richard Biener wrote: > >> > >> > >> Any comments and reviews welcome (I don't think that > >> my maintainership covers enough to simply check this in > >> without approval). > >> > > Hi Richard, > > > > The match-and-simplify branch bootstrapped successfully on > > aarch64-none-linux-gnu FWIW. > > > > What about regression tests ? Note the branch isn't regression free on x86_64 either. The branch does more than I want to merge to trunk (and it also retains all folding code I added patterns for). I've gone farther there to explore whether it will end up working in the end and what kind of features the IL and the APIs need. I've pasted testsuite results on x86_64 below for rev. 216324 which is based on trunk rev. 216315 which unfortunately has lots of regressions on its own. This is why I want to restrict the effect of the machinery to fold (), fold_stmt () and tree-ssa-forwprop.c for the moment and merge individual patterns (well, maybe in small groups) separately to allow for easy bi-section. I suppose I should push the most visible change to trunk first, namely tree-ssa-forwprop.c folding all statements via fold_stmt after the merge. I suspect this alone can have some odd effects like the sub + cmp fusing. That would be sth like the patch attached below. Richard. Index: gcc/tree-ssa-forwprop.c =================================================================== --- gcc/tree-ssa-forwprop.c (revision 216258) +++ gcc/tree-ssa-forwprop.c (working copy) @@ -54,6 +54,8 @@ along with GCC; see the file COPYING3. #include "tree-ssa-propagate.h" #include "tree-ssa-dom.h" #include "builtins.h" +#include "tree-cfgcleanup.h" +#include "tree-into-ssa.h" /* This pass propagates the RHS of assignment statements into use sites of the LHS of the assignment. It's basically a specialized @@ -3586,6 +3588,8 @@ simplify_mult (gimple_stmt_iterator *gsi return false; } + + /* Main entry point for the forward propagation and statement combine optimizer. */ @@ -3626,6 +3630,40 @@ pass_forwprop::execute (function *fun) cfg_changed = false; + /* Combine stmts with the stmts defining their operands. Do that + in an order that guarantees visiting SSA defs before SSA uses. */ + int *postorder = XNEWVEC (int, n_basic_blocks_for_fn (fun)); + int postorder_num = inverted_post_order_compute (postorder); + for (int i = 0; i < postorder_num; ++i) + { + bb = BASIC_BLOCK_FOR_FN (fun, postorder[i]); + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); + !gsi_end_p (gsi); gsi_next (&gsi)) + { + gimple stmt = gsi_stmt (gsi); + gimple orig_stmt = stmt; + + if (fold_stmt (&gsi)) + { + stmt = gsi_stmt (gsi); + if (maybe_clean_or_replace_eh_stmt (orig_stmt, stmt) + && gimple_purge_dead_eh_edges (bb)) + cfg_changed = true; + update_stmt (stmt); + } + } + } + free (postorder); + + /* ??? Code below doesn't expect non-renamed VOPs and the above + doesn't keep virtual operand form up-to-date. */ + if (cfg_changed) + { + cleanup_tree_cfg (); + cfg_changed = false; + } + update_ssa (TODO_update_ssa_only_virtuals); + FOR_EACH_BB_FN (bb, fun) { gimple_stmt_iterator gsi; ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH][0/n] Merge from match-and-simplify 2014-10-17 8:24 ` Richard Biener @ 2014-10-17 11:58 ` Richard Biener 0 siblings, 0 replies; 15+ messages in thread From: Richard Biener @ 2014-10-17 11:58 UTC (permalink / raw) To: ramrad01; +Cc: Kyrill Tkachov, gcc-patches On Fri, 17 Oct 2014, Richard Biener wrote: > On Fri, 17 Oct 2014, Ramana Radhakrishnan wrote: > > > On Wed, Oct 15, 2014 at 5:29 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote: > > > > > > On 15/10/14 14:00, Richard Biener wrote: > > >> > > >> > > >> Any comments and reviews welcome (I don't think that > > >> my maintainership covers enough to simply check this in > > >> without approval). > > >> > > > Hi Richard, > > > > > > The match-and-simplify branch bootstrapped successfully on > > > aarch64-none-linux-gnu FWIW. > > > > > > > What about regression tests ? > > Note the branch isn't regression free on x86_64 either. The branch > does more than I want to merge to trunk (and it also retains all > folding code I added patterns for). I've gone farther there to > explore whether it will end up working in the end and what kind > of features the IL and the APIs need. > > I've pasted testsuite results on x86_64 below for rev. 216324 > which is based on trunk rev. 216315 which unfortunately has > lots of regressions on its own. > > This is why I want to restrict the effect of the machinery to > fold (), fold_stmt () and tree-ssa-forwprop.c for the moment > and merge individual patterns (well, maybe in small groups) > separately to allow for easy bi-section. > > I suppose I should push the most visible change to trunk first, > namely tree-ssa-forwprop.c folding all statements via fold_stmt > after the merge. I suspect this alone can have some odd effects > like the sub + cmp fusing. That would be sth like the patch > attached below. Just finished testing this (with -m32 on x86_64), showing regressions in the testsuite like FAIL: gcc.dg/tree-ssa/slsr-19.c scan-tree-dump-times optimized " \\\\* y" 1 FAIL: gcc.dg/vect/bb-slp-27.c -flto -ffat-lto-objects scan-tree-dump-times slp2 "basic block vectorized" 1 FAIL: gcc.dg/vect/bb-slp-27.c scan-tree-dump-times slp2 "basic block vectorized" 1 FAIL: gcc.dg/vect/bb-slp-8b.c -flto -ffat-lto-objects scan-tree-dump-times slp2 "basic block vectorized" 1 FAIL: gcc.dg/vect/bb-slp-8b.c scan-tree-dump-times slp2 "basic block vectorized" 1 FAIL: gcc.dg/vect/slp-cond-3.c -flto -ffat-lto-objects scan-tree-dump-times vec t "vectorizing stmts using SLP" 1 FAIL: gcc.dg/vect/slp-cond-3.c scan-tree-dump-times vect "vectorizing stmts usin g SLP" 1 Bah. I suppose I need to investigate this (simply folding a stmt shouldn't cause any of the above... - with SLP it is probably operand canonicalization, but not sure). Richard. > Richard. > > Index: gcc/tree-ssa-forwprop.c > =================================================================== > --- gcc/tree-ssa-forwprop.c (revision 216258) > +++ gcc/tree-ssa-forwprop.c (working copy) > @@ -54,6 +54,8 @@ along with GCC; see the file COPYING3. > #include "tree-ssa-propagate.h" > #include "tree-ssa-dom.h" > #include "builtins.h" > +#include "tree-cfgcleanup.h" > +#include "tree-into-ssa.h" > > /* This pass propagates the RHS of assignment statements into use > sites of the LHS of the assignment. It's basically a specialized > @@ -3586,6 +3588,8 @@ simplify_mult (gimple_stmt_iterator *gsi > > return false; > } > + > + > /* Main entry point for the forward propagation and statement combine > optimizer. */ > > @@ -3626,6 +3630,40 @@ pass_forwprop::execute (function *fun) > > cfg_changed = false; > > + /* Combine stmts with the stmts defining their operands. Do that > + in an order that guarantees visiting SSA defs before SSA uses. */ > + int *postorder = XNEWVEC (int, n_basic_blocks_for_fn (fun)); > + int postorder_num = inverted_post_order_compute (postorder); > + for (int i = 0; i < postorder_num; ++i) > + { > + bb = BASIC_BLOCK_FOR_FN (fun, postorder[i]); > + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); > + !gsi_end_p (gsi); gsi_next (&gsi)) > + { > + gimple stmt = gsi_stmt (gsi); > + gimple orig_stmt = stmt; > + > + if (fold_stmt (&gsi)) > + { > + stmt = gsi_stmt (gsi); > + if (maybe_clean_or_replace_eh_stmt (orig_stmt, stmt) > + && gimple_purge_dead_eh_edges (bb)) > + cfg_changed = true; > + update_stmt (stmt); > + } > + } > + } > + free (postorder); > + > + /* ??? Code below doesn't expect non-renamed VOPs and the above > + doesn't keep virtual operand form up-to-date. */ > + if (cfg_changed) > + { > + cleanup_tree_cfg (); > + cfg_changed = false; > + } > + update_ssa (TODO_update_ssa_only_virtuals); > + > FOR_EACH_BB_FN (bb, fun) > { > gimple_stmt_iterator gsi; > -- Richard Biener <rguenther@suse.de> SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH][0/n] Merge from match-and-simplify 2014-10-15 13:20 [PATCH][0/n] Merge from match-and-simplify Richard Biener 2014-10-15 16:30 ` Kyrill Tkachov @ 2014-10-15 17:13 ` Jakub Jelinek 2014-10-16 20:43 ` Sebastian Pop 2 siblings, 0 replies; 15+ messages in thread From: Jakub Jelinek @ 2014-10-15 17:13 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches On Wed, Oct 15, 2014 at 03:00:57PM +0200, Richard Biener wrote: > To re-iterate here (as I expect most people will only look > at [0/n] patches ;)), the question is whether we are fine > with making fold-const (thus fold_{unary,binary,ternary}) > not handle some cases it handles currently. I'd say not dealing with side-effects should be smaller problem than looking through casts etc., expressions with side-effects generally shouldn't be folding into constants, where I'd expect the highest risks (something previously folded into constants and used in initializers not working any longer, or __builtin_constant_p, etc.). I'd say just trying hard not to regress, and adding testsuite coverage where not already covered, should be good enough, if something important regresses, hopefully mass rebuilds will reveal it or users report it to us. Jakub ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH][0/n] Merge from match-and-simplify 2014-10-15 13:20 [PATCH][0/n] Merge from match-and-simplify Richard Biener 2014-10-15 16:30 ` Kyrill Tkachov 2014-10-15 17:13 ` Jakub Jelinek @ 2014-10-16 20:43 ` Sebastian Pop 2014-10-16 20:50 ` Andrew Pinski 2014-10-17 8:00 ` Richard Biener 2 siblings, 2 replies; 15+ messages in thread From: Sebastian Pop @ 2014-10-16 20:43 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches Richard Biener wrote: > > I have posted 5 patches as part of a larger series to merge > (parts) from the match-and-simplify branch. While I think > there was overall consensus that the idea behind the project > is sound there are technical questions left for how the > thing should look in the end. I've raised them in 3/n > which is the only patch of the series that contains any > patterns sofar. > > To re-iterate here (as I expect most people will only look > at [0/n] patches ;)), the question is whether we are fine > with making fold-const (thus fold_{unary,binary,ternary}) > not handle some cases it handles currently. I have tested on aarch64 all the code in the match-and-simplify against trunk as of the last merge at r216315: 2014-10-16 Richard Biener <rguenther@suse.de> Merge from trunk r216235 through r216315. Overall, I see a lot of perf regressions (about 2/3 of the tests) than improvements (1/3 of the tests). I will try to reduce tests. For instance, saxpy regresses at -O3 on aarch64: void saxpy(double* x, double* y, double* z) { int i=0; for (i = 0 ; i < ARRAY_SIZE; i++) { z[i] = x[i] + scalar*y[i]; } } $ diff -u base.s mas.s --- base.s 2014-10-16 15:30:15.351430000 -0500 +++ mas.s 2014-10-16 15:30:16.183035000 -0500 @@ -2,12 +2,14 @@ add x1, x2, 800 ldr q0, [x0, x2] add x3, x2, 1600 + cmp x0, 784 ldr q1, [x0, x1] + add x1, x0, 16 fmla v0.2d, v1.2d, v2.2d str q0, [x0, x3] - add x0, x0, 16 - cmp x0, 800 + mov x0, x1 bne .L140 .LBE179: - subs w4, w4, #1 + cmp w4, 1 + sub w4, w4, #1 bne .L139 Thanks, Sebastian ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH][0/n] Merge from match-and-simplify 2014-10-16 20:43 ` Sebastian Pop @ 2014-10-16 20:50 ` Andrew Pinski 2014-10-17 7:29 ` Ramana Radhakrishnan 2014-10-17 8:00 ` Richard Biener 1 sibling, 1 reply; 15+ messages in thread From: Andrew Pinski @ 2014-10-16 20:50 UTC (permalink / raw) To: Sebastian Pop; +Cc: Richard Biener, GCC Patches On Thu, Oct 16, 2014 at 1:38 PM, Sebastian Pop <sebpop@gmail.com> wrote: > Richard Biener wrote: >> >> I have posted 5 patches as part of a larger series to merge >> (parts) from the match-and-simplify branch. While I think >> there was overall consensus that the idea behind the project >> is sound there are technical questions left for how the >> thing should look in the end. I've raised them in 3/n >> which is the only patch of the series that contains any >> patterns sofar. >> >> To re-iterate here (as I expect most people will only look >> at [0/n] patches ;)), the question is whether we are fine >> with making fold-const (thus fold_{unary,binary,ternary}) >> not handle some cases it handles currently. > > I have tested on aarch64 all the code in the match-and-simplify against trunk as > of the last merge at r216315: > > 2014-10-16 Richard Biener <rguenther@suse.de> > > Merge from trunk r216235 through r216315. > > Overall, I see a lot of perf regressions (about 2/3 of the tests) than > improvements (1/3 of the tests). I will try to reduce tests. > > For instance, saxpy regresses at -O3 on aarch64: > > void saxpy(double* x, double* y, double* z) { > int i=0; > for (i = 0 ; i < ARRAY_SIZE; i++) { > z[i] = x[i] + scalar*y[i]; > } > } This looks like a scheduling issue rather than anything else. The scheduler for a57 is not complete and does not model some things like the fusion of the compares and branch which is most likely what you are seeing. Thanks, Andrew Pinski > > $ diff -u base.s mas.s > --- base.s 2014-10-16 15:30:15.351430000 -0500 > +++ mas.s 2014-10-16 15:30:16.183035000 -0500 > @@ -2,12 +2,14 @@ > add x1, x2, 800 > ldr q0, [x0, x2] > add x3, x2, 1600 > + cmp x0, 784 > ldr q1, [x0, x1] > + add x1, x0, 16 > fmla v0.2d, v1.2d, v2.2d > str q0, [x0, x3] > - add x0, x0, 16 > - cmp x0, 800 > + mov x0, x1 > bne .L140 > .LBE179: > - subs w4, w4, #1 > + cmp w4, 1 > + sub w4, w4, #1 > bne .L139 > > > > Thanks, > Sebastian ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH][0/n] Merge from match-and-simplify 2014-10-16 20:50 ` Andrew Pinski @ 2014-10-17 7:29 ` Ramana Radhakrishnan 0 siblings, 0 replies; 15+ messages in thread From: Ramana Radhakrishnan @ 2014-10-17 7:29 UTC (permalink / raw) To: Andrew Pinski, Sebastian Pop, Richard Biener; +Cc: GCC Patches On 16/10/14 21:43, Andrew Pinski wrote: > On Thu, Oct 16, 2014 at 1:38 PM, Sebastian Pop <sebpop@gmail.com> wrote: >> Richard Biener wrote: >>> >>> I have posted 5 patches as part of a larger series to merge >>> (parts) from the match-and-simplify branch. While I think >>> there was overall consensus that the idea behind the project >>> is sound there are technical questions left for how the >>> thing should look in the end. I've raised them in 3/n >>> which is the only patch of the series that contains any >>> patterns sofar. >>> >>> To re-iterate here (as I expect most people will only look >>> at [0/n] patches ;)), the question is whether we are fine >>> with making fold-const (thus fold_{unary,binary,ternary}) >>> not handle some cases it handles currently. >> >> I have tested on aarch64 all the code in the match-and-simplify against trunk as >> of the last merge at r216315: >> >> 2014-10-16 Richard Biener <rguenther@suse.de> >> >> Merge from trunk r216235 through r216315. >> >> Overall, I see a lot of perf regressions (about 2/3 of the tests) than >> improvements (1/3 of the tests). I will try to reduce tests. >> >> For instance, saxpy regresses at -O3 on aarch64: >> >> void saxpy(double* x, double* y, double* z) { >> int i=0; >> for (i = 0 ; i < ARRAY_SIZE; i++) { >> z[i] = x[i] + scalar*y[i]; >> } >> } > > This looks like a scheduling issue rather than anything else. The > scheduler for a57 is not complete and does not model some things like > the fusion of the compares and branch which is most likely what you > are seeing. Huh !! how is that related to the code generation shown by Seb ? See the replacement of subs by cmp and sub. Folding cmp into other flag setting instructions is a very useful optimization on ARM and AArch64 and that's what appears missing in fold-const. That maybe what's causing the slowdown. I've never known that to be caused by any scheduler vagaries ! regards Ramana > > Thanks, > Andrew Pinski > >> >> $ diff -u base.s mas.s >> --- base.s 2014-10-16 15:30:15.351430000 -0500 >> +++ mas.s 2014-10-16 15:30:16.183035000 -0500 >> @@ -2,12 +2,14 @@ >> add x1, x2, 800 >> ldr q0, [x0, x2] >> add x3, x2, 1600 >> + cmp x0, 784 >> ldr q1, [x0, x1] >> + add x1, x0, 16 >> fmla v0.2d, v1.2d, v2.2d >> str q0, [x0, x3] >> - add x0, x0, 16 >> - cmp x0, 800 >> + mov x0, x1 >> bne .L140 >> .LBE179: >> - subs w4, w4, #1 >> + cmp w4, 1 >> + sub w4, w4, #1 >> bne .L139 >> >> >> >> Thanks, >> Sebastian > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH][0/n] Merge from match-and-simplify 2014-10-16 20:43 ` Sebastian Pop 2014-10-16 20:50 ` Andrew Pinski @ 2014-10-17 8:00 ` Richard Biener 2014-10-17 16:44 ` Sebastian Pop 1 sibling, 1 reply; 15+ messages in thread From: Richard Biener @ 2014-10-17 8:00 UTC (permalink / raw) To: Sebastian Pop; +Cc: gcc-patches On Thu, 16 Oct 2014, Sebastian Pop wrote: > Richard Biener wrote: > > > > I have posted 5 patches as part of a larger series to merge > > (parts) from the match-and-simplify branch. While I think > > there was overall consensus that the idea behind the project > > is sound there are technical questions left for how the > > thing should look in the end. I've raised them in 3/n > > which is the only patch of the series that contains any > > patterns sofar. > > > > To re-iterate here (as I expect most people will only look > > at [0/n] patches ;)), the question is whether we are fine > > with making fold-const (thus fold_{unary,binary,ternary}) > > not handle some cases it handles currently. > > I have tested on aarch64 all the code in the match-and-simplify against trunk as > of the last merge at r216315: > > 2014-10-16 Richard Biener <rguenther@suse.de> > > Merge from trunk r216235 through r216315. > > Overall, I see a lot of perf regressions (about 2/3 of the tests) than > improvements (1/3 of the tests). I will try to reduce tests. Note that the branch goes much further in exercising the machinery than I want to merge at this point (that applies mostly to all passes using the SSA propagator such as CCP and VRP and passes exercising value-numbering - FRE and PRE). It may also simply show the effect of now folding all statements from tree-ssa-forwprop.c. I have yet to investigate the testsuite fallout of [1/n] to [5/n] - testresults have been very noisy lately due to the C11 change and now ICF. > For instance, saxpy regresses at -O3 on aarch64: > > void saxpy(double* x, double* y, double* z) { > int i=0; > for (i = 0 ; i < ARRAY_SIZE; i++) { > z[i] = x[i] + scalar*y[i]; > } > } > > $ diff -u base.s mas.s > --- base.s 2014-10-16 15:30:15.351430000 -0500 > +++ mas.s 2014-10-16 15:30:16.183035000 -0500 > @@ -2,12 +2,14 @@ > add x1, x2, 800 > ldr q0, [x0, x2] > add x3, x2, 1600 > + cmp x0, 784 > ldr q1, [x0, x1] > + add x1, x0, 16 > fmla v0.2d, v1.2d, v2.2d > str q0, [x0, x3] > - add x0, x0, 16 > - cmp x0, 800 > + mov x0, x1 > bne .L140 > .LBE179: > - subs w4, w4, #1 > + cmp w4, 1 > + sub w4, w4, #1 > bne .L139 I don't understand AARCH64 assembly very well but the above looks like RTL issues and/or IVOPTs issues? Thanks for doing performance measurements. Richard. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH][0/n] Merge from match-and-simplify 2014-10-17 8:00 ` Richard Biener @ 2014-10-17 16:44 ` Sebastian Pop 2014-10-17 17:37 ` Richard Biener 2014-10-17 18:32 ` Sebastian Pop 0 siblings, 2 replies; 15+ messages in thread From: Sebastian Pop @ 2014-10-17 16:44 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches Richard Biener wrote: > On Thu, 16 Oct 2014, Sebastian Pop wrote: > > > Richard Biener wrote: > > > > > > I have posted 5 patches as part of a larger series to merge > > > (parts) from the match-and-simplify branch. While I think > > > there was overall consensus that the idea behind the project > > > is sound there are technical questions left for how the > > > thing should look in the end. I've raised them in 3/n > > > which is the only patch of the series that contains any > > > patterns sofar. > > > > > > To re-iterate here (as I expect most people will only look > > > at [0/n] patches ;)), the question is whether we are fine > > > with making fold-const (thus fold_{unary,binary,ternary}) > > > not handle some cases it handles currently. > > > > I have tested on aarch64 all the code in the match-and-simplify against trunk as > > of the last merge at r216315: > > > > 2014-10-16 Richard Biener <rguenther@suse.de> > > > > Merge from trunk r216235 through r216315. > > > > Overall, I see a lot of perf regressions (about 2/3 of the tests) than > > improvements (1/3 of the tests). I will try to reduce tests. > > Note that the branch goes much further in exercising the machinery > than I want to merge at this point (that applies mostly to all > passes using the SSA propagator such as CCP and VRP and passes > exercising value-numbering - FRE and PRE). I see. Should I run benchmarks only with the patches that you submitted for trunk? > I don't understand AARCH64 assembly very well but the above looks like > RTL issues and/or IVOPTs issues? I should have posted the first diff between the compilers with -fdump-tree-all: that would expose the problem at its root. I have seen that there is a way to dump the folded expressions from the new functionality, is there a flag to print the folded expressions in current trunk? It would be interesting to have the same kind of output, such that we could run a diff between. Thanks, Sebastian ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH][0/n] Merge from match-and-simplify 2014-10-17 16:44 ` Sebastian Pop @ 2014-10-17 17:37 ` Richard Biener 2014-10-17 18:32 ` Sebastian Pop 1 sibling, 0 replies; 15+ messages in thread From: Richard Biener @ 2014-10-17 17:37 UTC (permalink / raw) To: Sebastian Pop; +Cc: gcc-patches On October 17, 2014 6:35:58 PM CEST, Sebastian Pop <sebpop@gmail.com> wrote: >Richard Biener wrote: >> On Thu, 16 Oct 2014, Sebastian Pop wrote: >> >> > Richard Biener wrote: >> > > >> > > I have posted 5 patches as part of a larger series to merge >> > > (parts) from the match-and-simplify branch. While I think >> > > there was overall consensus that the idea behind the project >> > > is sound there are technical questions left for how the >> > > thing should look in the end. I've raised them in 3/n >> > > which is the only patch of the series that contains any >> > > patterns sofar. >> > > >> > > To re-iterate here (as I expect most people will only look >> > > at [0/n] patches ;)), the question is whether we are fine >> > > with making fold-const (thus fold_{unary,binary,ternary}) >> > > not handle some cases it handles currently. >> > >> > I have tested on aarch64 all the code in the match-and-simplify >against trunk as >> > of the last merge at r216315: >> > >> > 2014-10-16 Richard Biener <rguenther@suse.de> >> > >> > Merge from trunk r216235 through r216315. >> > >> > Overall, I see a lot of perf regressions (about 2/3 of the tests) >than >> > improvements (1/3 of the tests). I will try to reduce tests. >> >> Note that the branch goes much further in exercising the machinery >> than I want to merge at this point (that applies mostly to all >> passes using the SSA propagator such as CCP and VRP and passes >> exercising value-numbering - FRE and PRE). > >I see. Should I run benchmarks only with the patches that you >submitted for >trunk? Yes. What I posted sofar should be a no-op performance wise. Benchmarks on the branch are still useful though as eventually all changes should get merged to trunk if enough patterns are implemented. Richard. >> I don't understand AARCH64 assembly very well but the above looks >like >> RTL issues and/or IVOPTs issues? > >I should have posted the first diff between the compilers with >-fdump-tree-all: >that would expose the problem at its root. > >I have seen that there is a way to dump the folded expressions from the >new >functionality, is there a flag to print the folded expressions in >current trunk? >It would be interesting to have the same kind of output, such that we >could run >a diff between. > >Thanks, >Sebastian ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH][0/n] Merge from match-and-simplify 2014-10-17 16:44 ` Sebastian Pop 2014-10-17 17:37 ` Richard Biener @ 2014-10-17 18:32 ` Sebastian Pop 2014-10-20 11:47 ` Richard Biener 1 sibling, 1 reply; 15+ messages in thread From: Sebastian Pop @ 2014-10-17 18:32 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches Sebastian Pop wrote: > Richard Biener wrote: > > looks like > > RTL issues and/or IVOPTs issues? > > I should have posted the first diff between the compilers with -fdump-tree-all: > that would expose the problem at its root. Looks like this is caused by the fwprop pass: diff -u -r ./foo.i.087t.forwprop3 ../mas/foo.i.087t.forwprop3 --- ./foo.i.087t.forwprop3 2014-10-17 13:17:29.985327000 -0500 +++ ../mas/foo.i.087t.forwprop3 2014-10-17 13:17:29.308814000 -0500 @@ -5,6 +5,8 @@ Pass statistics: ---------------- +Applying pattern match-comparison.pd:43, gimple-match.c:11747 +gimple_simplified to if (i_20 != 99) Pass statistics: ---------------- @@ -60,7 +62,7 @@ i_17 = i_20 + 1; # DEBUG iD.2450 => i_17 # DEBUG iD.2450 => i_17 - if (i_17 != 100) + if (i_20 != 99) goto <bb 3>; else goto <bb 4>; [...] diff -u -r ./foo.i.089t.ccp3 ../mas/foo.i.089t.ccp3 --- ./foo.i.089t.ccp3 2014-10-17 13:17:29.991734000 -0500 +++ ../mas/foo.i.089t.ccp3 2014-10-17 13:17:29.316140000 -0500 @@ -53,13 +53,13 @@ # VUSE <.MEM_16> return; -i_17 : -->2 uses. +i_17 : --> single use. i_20 = PHI <i_17(3), 0(2)> # DEBUG i => i_17 -if (i_17 != 100) # DEBUG i => i_17 -i_20 : -->2 uses. +i_20 : -->3 uses. +if (i_20 != 99) i_17 = i_20 + 1; _4 = (long unsigned int) i_20; # DEBUG i => i_20 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH][0/n] Merge from match-and-simplify 2014-10-17 18:32 ` Sebastian Pop @ 2014-10-20 11:47 ` Richard Biener 2014-10-22 21:06 ` Jeff Law 0 siblings, 1 reply; 15+ messages in thread From: Richard Biener @ 2014-10-20 11:47 UTC (permalink / raw) To: Sebastian Pop; +Cc: gcc-patches, marc.glisse On Fri, 17 Oct 2014, Sebastian Pop wrote: > Sebastian Pop wrote: > > Richard Biener wrote: > > > looks like > > > RTL issues and/or IVOPTs issues? > > > > I should have posted the first diff between the compilers with -fdump-tree-all: > > that would expose the problem at its root. > > Looks like this is caused by the fwprop pass: > > diff -u -r ./foo.i.087t.forwprop3 ../mas/foo.i.087t.forwprop3 > --- ./foo.i.087t.forwprop3 2014-10-17 13:17:29.985327000 -0500 > +++ ../mas/foo.i.087t.forwprop3 2014-10-17 13:17:29.308814000 -0500 > @@ -5,6 +5,8 @@ > Pass statistics: > ---------------- > > +Applying pattern match-comparison.pd:43, gimple-match.c:11747 > +gimple_simplified to if (i_20 != 99) > > Pass statistics: > ---------------- > @@ -60,7 +62,7 @@ > i_17 = i_20 + 1; > # DEBUG iD.2450 => i_17 > # DEBUG iD.2450 => i_17 > - if (i_17 != 100) > + if (i_20 != 99) > goto <bb 3>; > else > goto <bb 4>; Ok, so this is one effect on the thing Marc pointed out - currently no patterns (well, no but one) guards itself with has_single_use predicates. That was a conscious decision and the idea was that the caller should do this via its lattice valueization function which could look like tree valueize (tree t) { if (TREE_CODE (t) == SSA_NAME && !has_single_use (t)) return NULL_TREE; return t; } But of course doing that unconditionally would also pessimize code. Generally we'd like to avoid un-CSEing stuff in a way that cannot be CSEd again. That's a more complex condition than what can be implemented with has_single_use. You might also consider a stmt doing a_1 + a_1 where a_1 has two uses now. For Sebastians case above the issue is that we are appearantly bad at optimizing post-increment exit tests. But if you'd consider code like i_2 = i_1 + 1; b1_3 = i_2 < 100; b2_4 = i_2 > 50; if (b1_3 && b2_4) ... then it is profitable to remove i_2 by changing the two comparisons to i_2 <= 98 and i_2 > 49. I thought about doing all simplifications first without committing any simplified sequence to the IL, then scanning over the result, pruning out cases that end up pessimizing code (how exactly isn't yet clear to me). So I'm not sure what we want to do here now. I don't very much like doing things explicitely in the pattern description (nor using the "has_single_use" predicate). I suppose for the gimple_build () stuff we could restrict simplifications to the expression we are building (not simplifying with SSA defs in the IL), more exactly mimicing fold_buildN behavior. I suppose for forwprop we could use the above valueize hook (but then regress because not all patterns as implemented in forwprop guard their def stmt lookup with has_single_use...). Any opinion on this? Any idea of a "simple" cost function if you have the functions IL before and after simplifications (but without any DCE/CSE applied)? Thanks, Richard. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH][0/n] Merge from match-and-simplify 2014-10-20 11:47 ` Richard Biener @ 2014-10-22 21:06 ` Jeff Law 0 siblings, 0 replies; 15+ messages in thread From: Jeff Law @ 2014-10-22 21:06 UTC (permalink / raw) To: Richard Biener, Sebastian Pop; +Cc: gcc-patches, marc.glisse On 10/20/14 05:42, Richard Biener wrote: > That was a conscious decision and the idea was that the caller should > do this via its lattice valueization function which could look like > > tree > valueize (tree t) > { > if (TREE_CODE (t) == SSA_NAME > && !has_single_use (t)) > return NULL_TREE; > return t; > } > > But of course doing that unconditionally would also pessimize code. > Generally we'd like to avoid un-CSEing stuff in a way that cannot > be CSEd again. That's a more complex condition than what can be > implemented with has_single_use. You might also consider a > stmt doing a_1 + a_1 where a_1 has two uses now. FWIW, I wouldn't worry much about the two uses in a single statement case. I looked at that in RTL eons ago it just doesn't happen enough to bother trying to detect and treat as a single use. > > I thought about doing all simplifications first without committing > any simplified sequence to the IL, then scanning over the result, > pruning out cases that end up pessimizing code (how exactly isn't > yet clear to me). > > So I'm not sure what we want to do here now. I don't very much like > doing things explicitely in the pattern description (nor using the > "has_single_use" predicate). > I suppose for the gimple_build () stuff we could restrict simplifications > to the expression we are building (not simplifying with SSA defs in the > IL), more exactly mimicing fold_buildN behavior. > I suppose for forwprop we could use the above valueize hook (but then > regress because not all patterns as implemented in forwprop guard > their def stmt lookup with has_single_use...). > > Any opinion on this? Any idea of a "simple" cost function if > you have the functions IL before and after simplifications (but > without any DCE/CSE applied)? It's certainly ideal to be able to be able to CSE/un-CSE depending on final context and it's a design goal I've heard other compiler developers making. ie, every transformation early which may be somewhat speculative must be "un-doable" later. But the infrastructure for that is, umm, hard. The concept of simplify on the side, then prune out stuff that isn't profitable is nice, but as you state, that's nontrivial as well. In general, the has_single_use case is profitable. So we want to aggressively go after those and I think we can commit those immediately and use the valueize function shown above. Maybe you then look at the more speculative cases... jeff ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2014-10-22 20:40 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-10-15 13:20 [PATCH][0/n] Merge from match-and-simplify Richard Biener 2014-10-15 16:30 ` Kyrill Tkachov 2014-10-17 7:39 ` Ramana Radhakrishnan 2014-10-17 8:24 ` Richard Biener 2014-10-17 11:58 ` Richard Biener 2014-10-15 17:13 ` Jakub Jelinek 2014-10-16 20:43 ` Sebastian Pop 2014-10-16 20:50 ` Andrew Pinski 2014-10-17 7:29 ` Ramana Radhakrishnan 2014-10-17 8:00 ` Richard Biener 2014-10-17 16:44 ` Sebastian Pop 2014-10-17 17:37 ` Richard Biener 2014-10-17 18:32 ` Sebastian Pop 2014-10-20 11:47 ` Richard Biener 2014-10-22 21:06 ` Jeff Law
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).