public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/40979] New: induct benchmark 60% slower when compiled with -fgraphite-identity
@ 2009-08-06 0:19 howarth at nitro dot med dot uc dot edu
2009-08-06 0:23 ` [Bug middle-end/40979] " spop at gcc dot gnu dot org
` (8 more replies)
0 siblings, 9 replies; 25+ messages in thread
From: howarth at nitro dot med dot uc dot edu @ 2009-08-06 0:19 UTC (permalink / raw)
To: gcc-bugs
The Polyhedron 2005 induct benchmark averages 12.44 seconds run-time when
compiled with...
gfortran -ffast-math -funroll-loops -msse3 -O3 induct.f90 -o induct
but averages 20.2 seconds when compiled with -fgraphite-identity added to the
compilation flags.
This issue remains after...
http://gcc.gnu.org/ml/gcc-patches/2009-08/msg00220.html
http://gcc.gnu.org/ml/gcc-patches/2009-08/msg00294.html
are applied to r150500.
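As a quick check of the "60% slower" figure in the summary, the two average run times quoted above can be turned into a relative slowdown. This is only a minimal sketch; the function name `slowdown_percent` is illustrative and not part of the Polyhedron tooling.

```python
def slowdown_percent(baseline_secs, candidate_secs):
    """Return how much slower the candidate is than the baseline, in percent."""
    return (candidate_secs - baseline_secs) / baseline_secs * 100.0


# Average run times reported above, in seconds.
without_identity = 12.44
with_identity = 20.2

print(f"{slowdown_percent(without_identity, with_identity):.1f}% slower")
```

With these two numbers the slowdown comes out at a little over 62%, consistent with the rounded figure in the bug title.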
--
Summary: induct benchmark 60% slower when compiled with -fgraphite-identity
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: howarth at nitro dot med dot uc dot edu
GCC build triplet: x86_64-apple-darwin10
GCC host triplet: x86_64-apple-darwin10
GCC target triplet: x86_64-apple-darwin10
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979
^ permalink raw reply [flat|nested] 25+ messages in thread
* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
2009-08-06 0:19 [Bug middle-end/40979] New: induct benchmark 60% slower when compiled with -fgraphite-identity howarth at nitro dot med dot uc dot edu
@ 2009-08-06 0:23 ` spop at gcc dot gnu dot org
2009-08-12 14:58 ` spop at gcc dot gnu dot org
` (7 subsequent siblings)
8 siblings, 0 replies; 25+ messages in thread
From: spop at gcc dot gnu dot org @ 2009-08-06 0:23 UTC (permalink / raw)
To: gcc-bugs
--
spop at gcc dot gnu dot org changed:
What                  |Removed                          |Added
----------------------------------------------------------------------------
AssignedTo            |unassigned at gcc dot gnu dot org|spop at gcc dot gnu dot org
Status                |UNCONFIRMED                      |ASSIGNED
Ever Confirmed        |0                                |1
Last reconfirmed date |0000-00-00 00:00:00              |2009-08-06 00:23:48
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979
^ permalink raw reply [flat|nested] 25+ messages in thread
* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
2009-08-06 0:19 [Bug middle-end/40979] New: induct benchmark 60% slower when compiled with -fgraphite-identity howarth at nitro dot med dot uc dot edu
2009-08-06 0:23 ` [Bug middle-end/40979] " spop at gcc dot gnu dot org
@ 2009-08-12 14:58 ` spop at gcc dot gnu dot org
2009-08-13 2:26 ` howarth at nitro dot med dot uc dot edu
` (6 subsequent siblings)
8 siblings, 0 replies; 25+ messages in thread
From: spop at gcc dot gnu dot org @ 2009-08-12 14:58 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from spop at gcc dot gnu dot org 2009-08-12 14:58 -------
Still fails on my machine, on rev150694.
~/gcc/svn/trunk/usr/bin/gfortran -ffast-math -funroll-loops -msse3 -O3 induct.f90 -o induct
time ./induct
real 0m16.596s
user 0m16.393s
sys 0m0.076s
~/gcc/svn/trunk/usr/bin/gfortran -fgraphite-identity -ffast-math -funroll-loops -msse3 -O3 induct.f90 -o induct
time ./induct
real 0m25.740s
user 0m25.634s
sys 0m0.084s
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979
^ permalink raw reply [flat|nested] 25+ messages in thread
* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
2009-08-06 0:19 [Bug middle-end/40979] New: induct benchmark 60% slower when compiled with -fgraphite-identity howarth at nitro dot med dot uc dot edu
2009-08-06 0:23 ` [Bug middle-end/40979] " spop at gcc dot gnu dot org
2009-08-12 14:58 ` spop at gcc dot gnu dot org
@ 2009-08-13 2:26 ` howarth at nitro dot med dot uc dot edu
2009-08-14 11:59 ` dominiq at lps dot ens dot fr
` (5 subsequent siblings)
8 siblings, 0 replies; 25+ messages in thread
From: howarth at nitro dot med dot uc dot edu @ 2009-08-13 2:26 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from howarth at nitro dot med dot uc dot edu 2009-08-13 02:25 -------
Interestingly, this benchmark is also the one that shows the best improvement
from -floop-interchange...
Compile Command : gfortran -ffast-math -funroll-loops -msse3 -O3 %n.f90 -o %n
Benchmarks : induct
Maximum Times : 2000.0
Benchmark   Compile  Executable  Ave Run  Number   Estim
Name        (secs)   (bytes)     (secs)   Repeats  Err %
---------   -------  ----------  -------  -------  ------
induct         6.83       10000    12.44       10  0.0153
Compile Command : gfortran -ffast-math -funroll-loops -msse3 -O3 -fgraphite-identity %n.f90 -o %n
Benchmarks : induct
Benchmark   Compile  Executable  Ave Run  Number   Estim
Name        (secs)   (bytes)     (secs)   Repeats  Err %
---------   -------  ----------  -------  -------  ------
induct        25.09       10000    20.19       10  0.0113
Compile Command : gfortran -ffast-math -funroll-loops -msse3 -O3 -fgraphite-identity -floop-interchange %n.f90 -o %n
Benchmarks : induct
Benchmark   Compile  Executable  Ave Run  Number   Estim
Name        (secs)   (bytes)     (secs)   Repeats  Err %
---------   -------  ----------  -------  -------  ------
induct        26.48       10000     7.43       10  0.0045
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979
^ permalink raw reply [flat|nested] 25+ messages in thread
* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
2009-08-06 0:19 [Bug middle-end/40979] New: induct benchmark 60% slower when compiled with -fgraphite-identity howarth at nitro dot med dot uc dot edu
` (2 preceding siblings ...)
2009-08-13 2:26 ` howarth at nitro dot med dot uc dot edu
@ 2009-08-14 11:59 ` dominiq at lps dot ens dot fr
2009-12-14 19:24 ` spop at gcc dot gnu dot org
` (4 subsequent siblings)
8 siblings, 0 replies; 25+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-08-14 11:59 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from dominiq at lps dot ens dot fr 2009-08-14 11:59 -------
> Interestingly, this benchmark is also the one that shows the best improvement
> from -floop-interchange...
I see that too (~20s versus ~34s). However, comparing the outputs:
Maximum wand/quad abs rel mutual inductance = 5.95379428444659242E-002
(without)
Maximum wand/quad abs rel mutual inductance = 5.37795458094567566E-002
(with)
I suspect that gfortran generates wrong code with -floop-interchange.
Could this be checked before I file a new PR?
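To put a number on the discrepancy, the two printed values can be compared by relative difference. The sketch below is only an illustration: the helper name and the tolerance are assumptions, since the benchmark itself defines no acceptance threshold.

```python
def relative_difference(reference, other):
    """Relative difference of `other` with respect to `reference`."""
    return abs(other - reference) / abs(reference)


# "Maximum wand/quad abs rel mutual inductance" printed by the two builds.
without_interchange = 5.95379428444659242e-02
with_interchange = 5.37795458094567566e-02

# Hypothetical tolerance chosen for this illustration only.
TOLERANCE = 1e-6

diff = relative_difference(without_interchange, with_interchange)
print(f"relative difference: {diff:.3f}")
print("outputs agree" if diff <= TOLERANCE else "outputs differ")
```

The difference is close to 10%, far larger than anything reordering under -ffast-math would normally explain, which is why a miscompilation is the first suspect.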
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979
^ permalink raw reply [flat|nested] 25+ messages in thread
* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
2009-08-06 0:19 [Bug middle-end/40979] New: induct benchmark 60% slower when compiled with -fgraphite-identity howarth at nitro dot med dot uc dot edu
` (3 preceding siblings ...)
2009-08-14 11:59 ` dominiq at lps dot ens dot fr
@ 2009-12-14 19:24 ` spop at gcc dot gnu dot org
2010-02-25 15:23 ` dominiq at lps dot ens dot fr
` (3 subsequent siblings)
8 siblings, 0 replies; 25+ messages in thread
From: spop at gcc dot gnu dot org @ 2009-12-14 19:24 UTC (permalink / raw)
To: gcc-bugs
--
spop at gcc dot gnu dot org changed:
What       |Removed                    |Added
----------------------------------------------------------------------------
AssignedTo |spop at gcc dot gnu dot org|unassigned at gcc dot gnu dot org
Status     |ASSIGNED                   |NEW
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979
^ permalink raw reply [flat|nested] 25+ messages in thread
* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
2009-08-06 0:19 [Bug middle-end/40979] New: induct benchmark 60% slower when compiled with -fgraphite-identity howarth at nitro dot med dot uc dot edu
` (4 preceding siblings ...)
2009-12-14 19:24 ` spop at gcc dot gnu dot org
@ 2010-02-25 15:23 ` dominiq at lps dot ens dot fr
2010-02-25 17:26 ` dominiq at lps dot ens dot fr
` (2 subsequent siblings)
8 siblings, 0 replies; 25+ messages in thread
From: dominiq at lps dot ens dot fr @ 2010-02-25 15:23 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from dominiq at lps dot ens dot fr 2010-02-25 15:23 -------
At revision 156693 or higher, the miscompilation with -floop-interchange
reported in comment #3 is gone. As a consequence the corresponding execution
time is now the same as when compiled with -fgraphite-identity.
The timings with/without the options correspond to a missed vectorization of the
critical loops.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979
^ permalink raw reply [flat|nested] 25+ messages in thread
* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
2009-08-06 0:19 [Bug middle-end/40979] New: induct benchmark 60% slower when compiled with -fgraphite-identity howarth at nitro dot med dot uc dot edu
` (5 preceding siblings ...)
2010-02-25 15:23 ` dominiq at lps dot ens dot fr
@ 2010-02-25 17:26 ` dominiq at lps dot ens dot fr
2010-03-10 1:57 ` howarth at nitro dot med dot uc dot edu
2010-03-15 14:21 ` dominiq at lps dot ens dot fr
8 siblings, 0 replies; 25+ messages in thread
From: dominiq at lps dot ens dot fr @ 2010-02-25 17:26 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from dominiq at lps dot ens dot fr 2010-02-25 17:26 -------
This problem may be related to pr34265, pr36099 and the linked ones.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979
^ permalink raw reply [flat|nested] 25+ messages in thread
* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
2009-08-06 0:19 [Bug middle-end/40979] New: induct benchmark 60% slower when compiled with -fgraphite-identity howarth at nitro dot med dot uc dot edu
` (6 preceding siblings ...)
2010-02-25 17:26 ` dominiq at lps dot ens dot fr
@ 2010-03-10 1:57 ` howarth at nitro dot med dot uc dot edu
2010-03-15 14:21 ` dominiq at lps dot ens dot fr
8 siblings, 0 replies; 25+ messages in thread
From: howarth at nitro dot med dot uc dot edu @ 2010-03-10 1:57 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from howarth at nitro dot med dot uc dot edu 2010-03-10 01:57 -------
The code being degraded by -fgraphite-identity (when using -ffast-math
-funroll-loops -O3) is in the mqr_m and mqc_m modules. The exact distribution
of performance loss in execution time for the induct benchmark is...
no use of -fgraphite-identity                      12.695 sec
-fgraphite-identity for all                        20.177 sec
-fgraphite-identity for all but mqc_m              14.293 sec
-fgraphite-identity for all but mqr_m              18.598 sec
-fgraphite-identity for all but mqc_m and mqr_m    12.677 sec
as benchmarked on x86_64-apple-darwin10.
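The timings above can be used to estimate how much of the slowdown each module is responsible for. This is a rough sketch that assumes the per-module losses are additive; the variable and function names are illustrative.

```python
# Run times in seconds from the table above.
baseline = 12.695          # no -fgraphite-identity
all_modules = 20.177       # -fgraphite-identity everywhere
all_but_mqc_m = 14.293
all_but_mqr_m = 18.598

total_loss = all_modules - baseline


def share_of_loss(time_with_module_excluded):
    """Fraction of the total slowdown recovered by excluding one module."""
    return (all_modules - time_with_module_excluded) / total_loss


mqc_share = share_of_loss(all_but_mqc_m)
mqr_share = share_of_loss(all_but_mqr_m)
print(f"mqc_m: {mqc_share:.0%}, mqr_m: {mqr_share:.0%}")
```

Under that assumption mqc_m accounts for roughly 79% of the lost time and mqr_m for roughly 21%, and the two shares add up to almost exactly the full loss, which matches the last row of the table.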
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979
^ permalink raw reply [flat|nested] 25+ messages in thread
* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
2009-08-06 0:19 [Bug middle-end/40979] New: induct benchmark 60% slower when compiled with -fgraphite-identity howarth at nitro dot med dot uc dot edu
` (7 preceding siblings ...)
2010-03-10 1:57 ` howarth at nitro dot med dot uc dot edu
@ 2010-03-15 14:21 ` dominiq at lps dot ens dot fr
8 siblings, 0 replies; 25+ messages in thread
From: dominiq at lps dot ens dot fr @ 2010-03-15 14:21 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from dominiq at lps dot ens dot fr 2010-03-15 14:21 -------
See also pr43359.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979
^ permalink raw reply [flat|nested] 25+ messages in thread
* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
[not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
` (13 preceding siblings ...)
2011-02-02 15:53 ` spop at gcc dot gnu.org
@ 2011-02-02 15:59 ` spop at gcc dot gnu.org
14 siblings, 0 replies; 25+ messages in thread
From: spop at gcc dot gnu.org @ 2011-02-02 15:59 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979
Sebastian Pop <spop at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
--- Comment #23 from Sebastian Pop <spop at gcc dot gnu.org> 2011-02-02 15:59:20 UTC ---
Fixed.
^ permalink raw reply [flat|nested] 25+ messages in thread
* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
[not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
` (12 preceding siblings ...)
2011-02-01 21:22 ` spop at gcc dot gnu.org
@ 2011-02-02 15:53 ` spop at gcc dot gnu.org
2011-02-02 15:59 ` spop at gcc dot gnu.org
14 siblings, 0 replies; 25+ messages in thread
From: spop at gcc dot gnu.org @ 2011-02-02 15:53 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979
--- Comment #22 from Sebastian Pop <spop at gcc dot gnu.org> 2011-02-02 15:52:26 UTC ---
Author: spop
Date: Wed Feb 2 15:52:21 2011
New Revision: 169531
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169531
Log:
Fix PR40979 and PR47044: after LIM call copy_prop and DCE to clean up.
2011-02-02 Sebastian Pop <sebastian.pop@amd.com>
Richard Guenther <rguenther@suse.de>
PR tree-optimization/40979
PR bootstrap/47044
* passes.c (init_optimization_passes): After LIM call copy_prop
and DCE to clean up.
* tree-ssa-loop.c (pass_graphite_transforms): Add TODO_dump_func.
* gcc.dg/graphite/graphite.exp (DEFAULT_VECTCFLAGS): Add -ffast-math.
* gcc.dg/graphite/pr35356-2.c: Adjust pattern.
* gfortran.dg/graphite/graphite.exp: Run vect_files conditionally to
check_vect_support_and_set_flags.
* gfortran.dg/graphite/vect-pr40979.f90: New.
Added:
trunk/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90
Modified:
trunk/gcc/ChangeLog
trunk/gcc/passes.c
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/graphite/graphite.exp
trunk/gcc/testsuite/gcc.dg/graphite/pr35356-2.c
trunk/gcc/testsuite/gfortran.dg/graphite/graphite.exp
trunk/gcc/tree-ssa-loop.c
^ permalink raw reply [flat|nested] 25+ messages in thread
* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
[not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
` (11 preceding siblings ...)
2011-02-01 21:19 ` howarth at nitro dot med.uc.edu
@ 2011-02-01 21:22 ` spop at gcc dot gnu.org
2011-02-02 15:53 ` spop at gcc dot gnu.org
2011-02-02 15:59 ` spop at gcc dot gnu.org
14 siblings, 0 replies; 25+ messages in thread
From: spop at gcc dot gnu.org @ 2011-02-01 21:22 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979
--- Comment #21 from Sebastian Pop <spop at gcc dot gnu.org> 2011-02-01 20:51:31 UTC ---
Patch here:
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00070.html
^ permalink raw reply [flat|nested] 25+ messages in thread
* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
[not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
` (10 preceding siblings ...)
2011-02-01 18:22 ` dominiq at lps dot ens.fr
@ 2011-02-01 21:19 ` howarth at nitro dot med.uc.edu
2011-02-01 21:22 ` spop at gcc dot gnu.org
` (2 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: howarth at nitro dot med.uc.edu @ 2011-02-01 21:19 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979
--- Comment #20 from Jack Howarth <howarth at nitro dot med.uc.edu> 2011-02-01 20:15:49 UTC ---
FYI, the patches in Comment 14 and 17 when also used with the patch...
Index: opts.c
===================================================================
--- opts.c (revision 167318)
+++ opts.c (working copy)
@@ -462,6 +462,9 @@
{ OPT_LEVELS_1_PLUS, OPT_fcombine_stack_adjustments, NULL, 1 },
/* -O2 optimizations. */
+#ifdef HAVE_cloog
+ { OPT_LEVELS_2_PLUS, OPT_fgraphite_identity, NULL, 1 },
+#endif
{ OPT_LEVELS_2_PLUS, OPT_finline_small_functions, NULL, 1 },
{ OPT_LEVELS_2_PLUS, OPT_findirect_inlining, NULL, 1 },
{ OPT_LEVELS_2_PLUS, OPT_fpartial_inlining, NULL, 1 },
shows that the vect.exp failures (PR 47048) for -fgraphite-identity at -O2 are
reduced from
the previous 129 at -m32 to only 24! So close to perfection...
^ permalink raw reply [flat|nested] 25+ messages in thread
* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
[not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
` (9 preceding siblings ...)
2011-02-01 17:51 ` sebpop at gmail dot com
@ 2011-02-01 18:22 ` dominiq at lps dot ens.fr
2011-02-01 21:19 ` howarth at nitro dot med.uc.edu
` (3 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: dominiq at lps dot ens.fr @ 2011-02-01 18:22 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979
--- Comment #19 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2011-02-01 17:40:17 UTC ---
> That made the loop vectorizable.
Confirmed on top of the patch in comment #14.
^ permalink raw reply [flat|nested] 25+ messages in thread
* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
[not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
` (8 preceding siblings ...)
2011-02-01 17:23 ` sebpop at gmail dot com
@ 2011-02-01 17:51 ` sebpop at gmail dot com
2011-02-01 18:22 ` dominiq at lps dot ens.fr
` (4 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: sebpop at gmail dot com @ 2011-02-01 17:51 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979
--- Comment #18 from sebpop at gmail dot com <sebpop at gmail dot com> 2011-02-01 17:22:06 UTC ---
On Tue, Feb 1, 2011 at 11:15, rguenth at gcc dot gnu.org
<gcc-bugzilla@gcc.gnu.org> wrote:
> I'd suggest
>
> NEXT_PASS (pass_graphite);
> {
> struct opt_pass **p = &pass_graphite.pass.sub;
> NEXT_PASS (pass_graphite_transforms);
> NEXT_PASS (pass_lim);
> NEXT_PASS (pass_copy_prop);
> NEXT_PASS (pass_dce_loop);
> }
>
That made the loop vectorizable.
Thanks Richi!
^ permalink raw reply [flat|nested] 25+ messages in thread
* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
[not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
` (7 preceding siblings ...)
2011-02-01 17:18 ` rguenth at gcc dot gnu.org
@ 2011-02-01 17:23 ` sebpop at gmail dot com
2011-02-01 17:51 ` sebpop at gmail dot com
` (5 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: sebpop at gmail dot com @ 2011-02-01 17:23 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979
--- Comment #16 from sebpop at gmail dot com <sebpop at gmail dot com> 2011-02-01 16:59:03 UTC ---
> It's unfortunate that graphite inserts arrays of size 1 instead of scalar
> (memory) vars.
That could be easily fixed.
graphite can also use the original data reference to write the reduction in,
and that cannot be replaced by a scalar memory variable.
^ permalink raw reply [flat|nested] 25+ messages in thread
* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
[not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
` (6 preceding siblings ...)
2011-02-01 16:47 ` spop at gcc dot gnu.org
@ 2011-02-01 17:18 ` rguenth at gcc dot gnu.org
2011-02-01 17:23 ` sebpop at gmail dot com
` (6 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-02-01 17:18 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979
--- Comment #17 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-02-01 17:04:38 UTC ---
(In reply to comment #15)
> The vectorizer does not apply because it does not match the canonical
> form of a reduction: here is the reduction after graphite-identity:
>
> # l12__lsm.18_179 = PHI <l12__lsm.18_183(5), l12__lsm.18_154(7)>
> S1: l12_lower_188 = l12__lsm.18_179;
> l12_lower_184 = D.1589_34 + l12_lower_188;
> S2: l12__lsm.18_154 = l12_lower_184;
>
> Without S1 and S2, this would be recognized as a reduction by the
> vectorizer.
>
> Why do we end up with the two extra copies?
> Here is the original code:
>
> # l12_lower_5 = PHI <l12_lower_4(4), l12_lower_36(6)>
> l12_lower_36 = D.1589_321 + l12_lower_5;
>
> Graphite does the following:
>
> l12_lower_5 = *l12_43(D);
> l12_lower_36 = D.1589_321 + l12_lower_5;
> *l12_43(D) = l12_lower_36;
>
> Note that at this point we cannot construct this code because we use
> data references and we are in Gimple form:
>
> *l12_43(D) = D.1589_321 + *l12_43(D);
>
> So I think that the code produced by Graphite is fine, and the problem
> is in the cleanups that we're doing after: for instance loop invariant
> motion could be improved to avoid the extra two statements S1 and S2:
>
> # l12__lsm.18_179 = PHI <l12__lsm.18_183(5), l12__lsm.18_154(7)>
> S1: l12_lower_188 = l12__lsm.18_179;
> l12_lower_184 = D.1589_34 + l12_lower_188;
> S2: l12__lsm.18_154 = l12_lower_184;
Well, LIM needs a copyprop to cleanup after it - but the cleanups
after graphite are in a strange order. LIM is also not really the
pass that is supposed to do scalarization of the memory temporary.
> I also have tried to run pass_rename_ssa_copies but that would just
> rename the base variable l12__lsm.18 into l12_lower and wait for the
> out-of-SSA to remove the extra copies. Constant propagation does not
> help either... any other suggestions?
I'd suggest
NEXT_PASS (pass_graphite);
{
struct opt_pass **p = &pass_graphite.pass.sub;
NEXT_PASS (pass_graphite_transforms);
NEXT_PASS (pass_lim);
NEXT_PASS (pass_copy_prop);
NEXT_PASS (pass_dce_loop);
}
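To illustrate why running copy propagation and DCE after LIM helps, here is a toy model of the loop body quoted above. It is not GCC code: the tuple-based IR, the function names and the simplified SSA names (dots dropped, `D.1589_34` written as `D_34`) are all assumptions made for this sketch.

```python
def resolve(name, copies):
    """Follow a chain of copies back to the original SSA name."""
    while name in copies:
        name = copies[name]
    return name


def copy_prop_and_dce(stmts, live_out=()):
    """Toy copy propagation followed by dead code elimination.

    Each statement is (dest, op, operands), op in {"phi", "copy", "add"}.
    """
    # 1. Record every plain copy "dest = src".
    copies = {dest: ops[0] for dest, op, ops in stmts if op == "copy"}

    # 2. Rewrite all operands so that they bypass the copies.
    rewritten = [
        (dest, op, tuple(resolve(o, copies) for o in ops))
        for dest, op, ops in stmts
    ]

    # 3. Drop copies whose destination is no longer used anywhere.
    used = {o for _, _, ops in rewritten for o in ops} | set(live_out)
    return [s for s in rewritten if not (s[1] == "copy" and s[0] not in used)]


loop_body = [
    ("l12__lsm_179", "phi", ("l12__lsm_183", "l12__lsm_154")),
    ("l12_lower_188", "copy", ("l12__lsm_179",)),          # S1
    ("l12_lower_184", "add", ("D_34", "l12_lower_188")),
    ("l12__lsm_154", "copy", ("l12_lower_184",)),          # S2
]

for stmt in copy_prop_and_dce(loop_body):
    print(stmt)
```

In this model S1 and S2 disappear and only the PHI and the addition remain, which is the two-statement shape of the original reduction.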
^ permalink raw reply [flat|nested] 25+ messages in thread
* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
[not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
` (5 preceding siblings ...)
2011-02-01 11:45 ` rguenth at gcc dot gnu.org
@ 2011-02-01 16:47 ` spop at gcc dot gnu.org
2011-02-01 17:18 ` rguenth at gcc dot gnu.org
` (7 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: spop at gcc dot gnu.org @ 2011-02-01 16:47 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979
--- Comment #15 from Sebastian Pop <spop at gcc dot gnu.org> 2011-02-01 16:46:54 UTC ---
The vectorizer does not apply because it does not match the canonical
form of a reduction: here is the reduction after graphite-identity:
# l12__lsm.18_179 = PHI <l12__lsm.18_183(5), l12__lsm.18_154(7)>
S1: l12_lower_188 = l12__lsm.18_179;
l12_lower_184 = D.1589_34 + l12_lower_188;
S2: l12__lsm.18_154 = l12_lower_184;
Without S1 and S2, this would be recognized as a reduction by the
vectorizer.
Why do we end up with the two extra copies?
Here is the original code:
# l12_lower_5 = PHI <l12_lower_4(4), l12_lower_36(6)>
l12_lower_36 = D.1589_321 + l12_lower_5;
Graphite does the following:
l12_lower_5 = *l12_43(D);
l12_lower_36 = D.1589_321 + l12_lower_5;
*l12_43(D) = l12_lower_36;
Note that at this point we cannot construct this code because we use
data references and we are in Gimple form:
*l12_43(D) = D.1589_321 + *l12_43(D);
So I think that the code produced by Graphite is fine, and the problem
is in the cleanups that we're doing after: for instance loop invariant
motion could be improved to avoid the extra two statements S1 and S2:
# l12__lsm.18_179 = PHI <l12__lsm.18_183(5), l12__lsm.18_154(7)>
S1: l12_lower_188 = l12__lsm.18_179;
l12_lower_184 = D.1589_34 + l12_lower_188;
S2: l12__lsm.18_154 = l12_lower_184;
I also have tried to run pass_rename_ssa_copies but that would just
rename the base variable l12__lsm.18 into l12_lower and wait for the
out-of-SSA to remove the extra copies. Constant propagation does not
help either... any other suggestions?
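The shape mismatch can be shown with a toy matcher. This is only a sketch of the idea of a "canonical" reduction as described in this comment, not the vectorizer's real analysis; the tuple-based IR, the function name and the simplified SSA names are assumptions.

```python
def is_canonical_reduction(stmts):
    """Toy check: one PHI whose result feeds an add that feeds the PHI back.

    Each statement is (dest, op, operands). This models only the shape
    discussed here, not GCC's actual reduction detection.
    """
    phis = [s for s in stmts if s[1] == "phi"]
    adds = [s for s in stmts if s[1] == "add"]
    if len(phis) != 1 or len(adds) != 1:
        return False
    phi_dest, _, phi_args = phis[0]
    add_dest, _, add_args = adds[0]
    # The add must read the PHI result directly, and the PHI must read the
    # add result directly, with no copies in between.
    return phi_dest in add_args and add_dest in phi_args


original = [
    ("l12_lower_5", "phi", ("l12_lower_4", "l12_lower_36")),
    ("l12_lower_36", "add", ("D_321", "l12_lower_5")),
]

after_graphite = [
    ("l12__lsm_179", "phi", ("l12__lsm_183", "l12__lsm_154")),
    ("l12_lower_188", "copy", ("l12__lsm_179",)),          # S1
    ("l12_lower_184", "add", ("D_34", "l12_lower_188")),
    ("l12__lsm_154", "copy", ("l12_lower_184",)),          # S2
]

print(is_canonical_reduction(original))
print(is_canonical_reduction(after_graphite))
```

The original loop matches the pattern, while the version with S1 and S2 does not, because the copies sit between the PHI and the addition on both sides.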
^ permalink raw reply [flat|nested] 25+ messages in thread
* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
[not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
` (4 preceding siblings ...)
2011-02-01 11:35 ` rguenth at gcc dot gnu.org
@ 2011-02-01 11:45 ` rguenth at gcc dot gnu.org
2011-02-01 16:47 ` spop at gcc dot gnu.org
` (8 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-02-01 11:45 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979
--- Comment #14 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-02-01 11:45:44 UTC ---
Noting that pass_graphite_transforms lacks any verifier calls, the following
would enable the cleanup (in case scalar vars would have been used).
Index: gcc/tree-ssa-loop.c
===================================================================
--- gcc/tree-ssa-loop.c (revision 169434)
+++ gcc/tree-ssa-loop.c (working copy)
@@ -314,7 +314,8 @@ struct gimple_opt_pass pass_graphite_tra
0, /* properties_provided */
0, /* properties_destroyed */
0, /* todo_flags_start */
- 0 /* todo_flags_finish */
+ TODO_update_address_taken
+ | TODO_dump_func /* todo_flags_finish */
}
};
^ permalink raw reply [flat|nested] 25+ messages in thread
* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
[not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
` (3 preceding siblings ...)
2011-01-31 19:36 ` dominiq at lps dot ens.fr
@ 2011-02-01 11:35 ` rguenth at gcc dot gnu.org
2011-02-01 11:45 ` rguenth at gcc dot gnu.org
` (9 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-02-01 11:35 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979
--- Comment #13 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-02-01 11:35:08 UTC ---
It's unfortunate that graphite inserts arrays of size 1 instead of scalar
(memory) vars. Otherwise update-address-taken would just re-write those
into SSA after going out-of-graphite (if run, of course). It can probably
be taught to rewrite single-element arrays into SSA form as well.
^ permalink raw reply [flat|nested] 25+ messages in thread
* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
[not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
` (2 preceding siblings ...)
2011-01-31 18:53 ` spop at gcc dot gnu.org
@ 2011-01-31 19:36 ` dominiq at lps dot ens.fr
2011-02-01 11:35 ` rguenth at gcc dot gnu.org
` (10 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: dominiq at lps dot ens.fr @ 2011-01-31 19:36 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979
--- Comment #12 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2011-01-31 18:46:23 UTC ---
> I looked at how to improve translate_scalar_reduction_to_array in
> order to avoid the creation of the temporary array, but it seems to be
> difficult as the result is written to memory under a different type
> than the reduction itself: l12_lower is a real whereas l12 is an
> integer:
In the original code l12 is real (kind = longreal) as l12_lower, but making the
change does not help the vectorization.
^ permalink raw reply [flat|nested] 25+ messages in thread
* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
[not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
2011-01-26 10:45 ` dominiq at lps dot ens.fr
2011-01-26 14:43 ` howarth at nitro dot med.uc.edu
@ 2011-01-31 18:53 ` spop at gcc dot gnu.org
2011-01-31 19:36 ` dominiq at lps dot ens.fr
` (11 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: spop at gcc dot gnu.org @ 2011-01-31 18:53 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979
--- Comment #11 from Sebastian Pop <spop at gcc dot gnu.org> 2011-01-31 18:12:38 UTC ---
Here is a reduced testcase from induct.f90 for the first loop
not vectorized with -fgraphite-identity:
module mqc_m
  integer, parameter, private :: longreal = selected_real_kind(15,90)
contains
  subroutine mutual_ind_quad_cir_coil (m, l12)
    real (kind = longreal), dimension(9), save :: w2gauss, w1gauss
    real (kind = longreal) :: l12_lower, numerator
    real (kind = longreal), dimension(3) :: current_vector, coil_current_vec
    w2gauss(1) = 16.0_longreal/81.0_longreal
    w1gauss(5) = 0.3302393550_longreal
    do i = 1, 2*m
      do j = 1, 9
        do k = 1, 9
          numerator = w1gauss(j) * w2gauss(k) * &
                      dot_product(coil_current_vec,current_vector)
          l12_lower = l12_lower + numerator
        end do
      end do
    end do
    l12 = l12_lower
  end subroutine mutual_ind_quad_cir_coil
end module mqc_m
The problem seems to be that graphite introduces a
Commutative_Associative_Reduction array that confuses the vectorizer.
I looked at how to improve translate_scalar_reduction_to_array in
order to avoid the creation of the temporary array, but it seems to be
difficult as the result is written to memory under a different type
than the reduction itself: l12_lower is a real whereas l12 is an
integer:
l12_lower_200 = some_computation;
# l12_lower_9 = PHI <l12_lower_16(D)(2), l12_lower_200(9)>
D.1585_43 = (integer(kind=4)) l12_lower_9;
# .MEM_48 = VDEF <.MEM_47>
*l12_44(D) = D.1585_43;
so we cannot use *l12_44(D) as a data reference in the loop to perform
the reduction as it does not have the same precision as l12_lower: it
seems to me that we cannot avoid creating the temporary array.
The solution could be to clean up the temporary arrays after graphite.
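The precision argument can be illustrated outside the compiler. The sketch below contrasts accumulating in floating point and converting once with accumulating through an integer-typed location on every iteration. It assumes truncation on real-to-integer conversion; the function names and the sample terms are made up for this example.

```python
def reduce_in_real(terms):
    """Accumulate in floating point, convert to integer once at the end."""
    acc = 0.0
    for t in terms:
        acc += t
    return int(acc)  # like the single (integer(kind=4)) conversion


def reduce_through_integer(terms):
    """Accumulate through an integer-typed location on every iteration."""
    stored = 0
    for t in terms:
        stored = int(stored + t)  # truncates the fraction each time
    return stored


terms = [0.75] * 8  # 0.75 is exactly representable, so the sum is exactly 6.0
print(reduce_in_real(terms), reduce_through_integer(terms))
```

The first variant yields 6 and the second yields 0, so the integer location cannot stand in for the reduction variable without changing the result.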
^ permalink raw reply [flat|nested] 25+ messages in thread
* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
[not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
2011-01-26 10:45 ` dominiq at lps dot ens.fr
@ 2011-01-26 14:43 ` howarth at nitro dot med.uc.edu
2011-01-31 18:53 ` spop at gcc dot gnu.org
` (12 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: howarth at nitro dot med.uc.edu @ 2011-01-26 14:43 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979
--- Comment #10 from Jack Howarth <howarth at nitro dot med.uc.edu> 2011-01-26 14:20:18 UTC ---
(In reply to comment #9)
> This pr is not fixed at revision 169261 (gfc). AFAIU -ftree-loop-linear is now
> implemented through graphite. This leads to a sort of regression with respect
> to revision 169227(gfc6):
>
> [macbook] lin/test% gfc -Ofast -ftree-loop-linear induct.f90
> [macbook] lin/test% time a.out > /dev/null
> 22.380u 0.023s 0:22.40 100.0% 0+0k 0+0io 0pf+0w
> [macbook] lin/test% gfc6 -Ofast -ftree-loop-linear induct.f90
> [macbook] lin/test% time a.out > /dev/null
> 13.978u 0.019s 0:13.99 99.9% 0+0k 0+0io 0pf+0w
Note that -fgraphite-identity still triggers a large number of failures in the
vect.exp testsuite when defaulted on at -O2...
http://gcc.gnu.org/ml/gcc-testresults/2011-01/msg02005.html
so the regression in induct.f90 isn't unique.
^ permalink raw reply [flat|nested] 25+ messages in thread
* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
[not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
@ 2011-01-26 10:45 ` dominiq at lps dot ens.fr
2011-01-26 14:43 ` howarth at nitro dot med.uc.edu
` (13 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: dominiq at lps dot ens.fr @ 2011-01-26 10:45 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979
--- Comment #9 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2011-01-26 10:23:12 UTC ---
This pr is not fixed at revision 169261 (gfc). AFAIU -ftree-loop-linear is now
implemented through graphite. This leads to a sort of regression with respect
to revision 169227(gfc6):
[macbook] lin/test% gfc -Ofast -ftree-loop-linear induct.f90
[macbook] lin/test% time a.out > /dev/null
22.380u 0.023s 0:22.40 100.0% 0+0k 0+0io 0pf+0w
[macbook] lin/test% gfc6 -Ofast -ftree-loop-linear induct.f90
[macbook] lin/test% time a.out > /dev/null
13.978u 0.019s 0:13.99 99.9% 0+0k 0+0io 0pf+0w
^ permalink raw reply [flat|nested] 25+ messages in thread