[Bug middle-end/40979] New: induct benchmark 60% slower when compiled with -fgraphite-identity

public inbox for gcc-bugs@sourceware.org

* [Bug middle-end/40979]  New: induct benchmark 60% slower when compiled with -fgraphite-identity
@ 2009-08-06  0:19 howarth at nitro dot med dot uc dot edu
  2009-08-06  0:23 ` [Bug middle-end/40979] " spop at gcc dot gnu dot org
                   ` (8 more replies)
  0 siblings, 9 replies; 25+ messages in thread
From: howarth at nitro dot med dot uc dot edu @ 2009-08-06  0:19 UTC (permalink / raw)
  To: gcc-bugs

The Polyhedron 2005 induct benchmark averages 12.44 seconds run-time when
compiled with...

gfortran -ffast-math -funroll-loops -msse3 -O3 induct.f90 -o induct

but averages 20.2 seconds when compiled with -fgraphite-identity added to the
compilation flags.
This issue remains after...

http://gcc.gnu.org/ml/gcc-patches/2009-08/msg00220.html
http://gcc.gnu.org/ml/gcc-patches/2009-08/msg00294.html

are applied to r150500.
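
As a quick check on the "60%" in the subject line, the relative slowdown can
be computed from the two averages quoted above. This is only a minimal C
sketch; the function name slowdown_percent is made up for illustration and
is not part of any benchmark harness.

```c
#include <assert.h>

/* Relative slowdown in percent: how much longer `slow` takes than `base`.
   Hypothetical helper, for illustration only. */
static double slowdown_percent(double base, double slow)
{
    assert(base > 0.0);              /* a zero baseline makes the ratio meaningless */
    return (slow - base) / base * 100.0;
}
```

With the averages reported here, slowdown_percent(12.44, 20.2) comes out at a
little over 62, which matches the roughly 60% in the summary.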


-- 
           Summary: induct benchmark 60% slower when compiled with
                    -fgraphite-identity
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: howarth at nitro dot med dot uc dot edu
 GCC build triplet: x86_64-apple-darwin10
  GCC host triplet: x86_64-apple-darwin10
GCC target triplet: x86_64-apple-darwin10


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
  2009-08-06  0:19 [Bug middle-end/40979] New: induct benchmark 60% slower when compiled with -fgraphite-identity howarth at nitro dot med dot uc dot edu
@ 2009-08-06  0:23 ` spop at gcc dot gnu dot org
  2009-08-12 14:58 ` spop at gcc dot gnu dot org
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: spop at gcc dot gnu dot org @ 2009-08-06  0:23 UTC (permalink / raw)
  To: gcc-bugs



-- 

spop at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |spop at gcc dot gnu dot org
                   |dot org                     |
             Status|UNCONFIRMED                 |ASSIGNED
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2009-08-06 00:23:48
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
  2009-08-06  0:19 [Bug middle-end/40979] New: induct benchmark 60% slower when compiled with -fgraphite-identity howarth at nitro dot med dot uc dot edu
  2009-08-06  0:23 ` [Bug middle-end/40979] " spop at gcc dot gnu dot org
@ 2009-08-12 14:58 ` spop at gcc dot gnu dot org
  2009-08-13  2:26 ` howarth at nitro dot med dot uc dot edu
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: spop at gcc dot gnu dot org @ 2009-08-12 14:58 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from spop at gcc dot gnu dot org  2009-08-12 14:58 -------
Still fails on my machine, on rev150694.

~/gcc/svn/trunk/usr/bin/gfortran -ffast-math -funroll-loops -msse3 -O3 induct.f90 -o induct
time ./induct
real    0m16.596s
user    0m16.393s
sys     0m0.076s

~/gcc/svn/trunk/usr/bin/gfortran -fgraphite-identity -ffast-math -funroll-loops -msse3 -O3 induct.f90 -o induct
time ./induct
real    0m25.740s
user    0m25.634s
sys     0m0.084s


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
  2009-08-06  0:19 [Bug middle-end/40979] New: induct benchmark 60% slower when compiled with -fgraphite-identity howarth at nitro dot med dot uc dot edu
  2009-08-06  0:23 ` [Bug middle-end/40979] " spop at gcc dot gnu dot org
  2009-08-12 14:58 ` spop at gcc dot gnu dot org
@ 2009-08-13  2:26 ` howarth at nitro dot med dot uc dot edu
  2009-08-14 11:59 ` dominiq at lps dot ens dot fr
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: howarth at nitro dot med dot uc dot edu @ 2009-08-13  2:26 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from howarth at nitro dot med dot uc dot edu  2009-08-13 02:25 -------
Interestingly, this benchmark is also the one that shows the best improvement
from -floop-interchange...

Compile Command : gfortran -ffast-math -funroll-loops -msse3 -O3 %n.f90 -o %n
Benchmarks      : induct
Maximum Times   :     2000.0

   Benchmark   Compile  Executable   Ave Run  Number   Estim
        Name    (secs)     (bytes)    (secs) Repeats   Err %
   ---------   -------  ----------   ------- -------  ------
      induct      6.83       10000     12.44      10  0.0153

Compile Command : gfortran -ffast-math -funroll-loops -msse3 -O3
-fgraphite-identity  %n.f90 -o %n
Benchmarks      : induct

   Benchmark   Compile  Executable   Ave Run  Number   Estim
        Name    (secs)     (bytes)    (secs) Repeats   Err %
   ---------   -------  ----------   ------- -------  ------
      induct     25.09       10000     20.19      10  0.0113

Compile Command : gfortran -ffast-math -funroll-loops -msse3 -O3
-fgraphite-identity -floop-interchange %n.f90 -o %n
Benchmarks      : induct

   Benchmark   Compile  Executable   Ave Run  Number   Estim
        Name    (secs)     (bytes)    (secs) Repeats   Err %
   ---------   -------  ----------   ------- -------  ------
      induct     26.48       10000      7.43      10  0.0045


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
  2009-08-06  0:19 [Bug middle-end/40979] New: induct benchmark 60% slower when compiled with -fgraphite-identity howarth at nitro dot med dot uc dot edu
                   ` (2 preceding siblings ...)
  2009-08-13  2:26 ` howarth at nitro dot med dot uc dot edu
@ 2009-08-14 11:59 ` dominiq at lps dot ens dot fr
  2009-12-14 19:24 ` spop at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: dominiq at lps dot ens dot fr @ 2009-08-14 11:59 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from dominiq at lps dot ens dot fr  2009-08-14 11:59 -------
> Interestingly, this benchmark is also the one that shows the best improvement
> from -floop-interchange...

I see that as well (~20s versus ~34s); however, compare the outputs:

 Maximum wand/quad abs rel mutual inductance =   5.95379428444659242E-002   (without)

 Maximum wand/quad abs rel mutual inductance =   5.37795458094567566E-002   (with)

I suspect that gfortran generates wrong code with -floop-interchange.

Could this be checked before I file a new PR?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
  2009-08-06  0:19 [Bug middle-end/40979] New: induct benchmark 60% slower when compiled with -fgraphite-identity howarth at nitro dot med dot uc dot edu
                   ` (3 preceding siblings ...)
  2009-08-14 11:59 ` dominiq at lps dot ens dot fr
@ 2009-12-14 19:24 ` spop at gcc dot gnu dot org
  2010-02-25 15:23 ` dominiq at lps dot ens dot fr
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: spop at gcc dot gnu dot org @ 2009-12-14 19:24 UTC (permalink / raw)
  To: gcc-bugs



-- 

spop at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|spop at gcc dot gnu dot org |unassigned at gcc dot gnu
                   |                            |dot org
             Status|ASSIGNED                    |NEW


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
  2009-08-06  0:19 [Bug middle-end/40979] New: induct benchmark 60% slower when compiled with -fgraphite-identity howarth at nitro dot med dot uc dot edu
                   ` (4 preceding siblings ...)
  2009-12-14 19:24 ` spop at gcc dot gnu dot org
@ 2010-02-25 15:23 ` dominiq at lps dot ens dot fr
  2010-02-25 17:26 ` dominiq at lps dot ens dot fr
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: dominiq at lps dot ens dot fr @ 2010-02-25 15:23 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from dominiq at lps dot ens dot fr  2010-02-25 15:23 -------
At revision 156693 or higher, the miscompilation with -floop-interchange
reported in comment #3 is gone. As a consequence the corresponding execution
time is now the same as when compiled with -fgraphite-identity. 

The timings with/without the options correspond to a missed vectorization of the
critical loops.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
  2009-08-06  0:19 [Bug middle-end/40979] New: induct benchmark 60% slower when compiled with -fgraphite-identity howarth at nitro dot med dot uc dot edu
                   ` (5 preceding siblings ...)
  2010-02-25 15:23 ` dominiq at lps dot ens dot fr
@ 2010-02-25 17:26 ` dominiq at lps dot ens dot fr
  2010-03-10  1:57 ` howarth at nitro dot med dot uc dot edu
  2010-03-15 14:21 ` dominiq at lps dot ens dot fr
  8 siblings, 0 replies; 25+ messages in thread
From: dominiq at lps dot ens dot fr @ 2010-02-25 17:26 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from dominiq at lps dot ens dot fr  2010-02-25 17:26 -------
This problem may be related to pr34265, pr36099 and the linked ones.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
  2009-08-06  0:19 [Bug middle-end/40979] New: induct benchmark 60% slower when compiled with -fgraphite-identity howarth at nitro dot med dot uc dot edu
                   ` (6 preceding siblings ...)
  2010-02-25 17:26 ` dominiq at lps dot ens dot fr
@ 2010-03-10  1:57 ` howarth at nitro dot med dot uc dot edu
  2010-03-15 14:21 ` dominiq at lps dot ens dot fr
  8 siblings, 0 replies; 25+ messages in thread
From: howarth at nitro dot med dot uc dot edu @ 2010-03-10  1:57 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from howarth at nitro dot med dot uc dot edu  2010-03-10 01:57 -------
The code being degraded by -fgraphite-identity (when using -ffast-math
-funroll-loops -O3) is in the mqr_m and mqc_m modules. The exact distribution
of performance loss in execution time for the induct benchmark is...

no use of -fgraphite-identity                      12.695 sec
-fgraphite-identity for all                        20.177 sec
-fgraphite-identity for all but mqc_m              14.293 sec
-fgraphite-identity for all but mqr_m              18.598 sec
-fgraphite-identity for all but mqc_m and mqr_m    12.677 sec

as benchmarked on x86_64-apple-darwin10.
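
To read the timings above as a per-module attribution, one can ask what share
of the total loss is recovered when a single module is built without
-fgraphite-identity. The following C sketch does that arithmetic; the helper
name recovered_share is hypothetical, and the approach assumes the two
modules' contributions are roughly independent.

```c
#include <assert.h>

/* Share of the total slowdown that disappears when one module is built
   without the flag.  `baseline` is the time with the flag nowhere,
   `all` the time with the flag everywhere, and `all_but_module` the time
   with the flag everywhere except the module of interest.
   Hypothetical helper, for illustration only. */
static double recovered_share(double baseline, double all, double all_but_module)
{
    double total_loss = all - baseline;
    assert(total_loss > 0.0);        /* only meaningful if the flag costs time */
    return (all - all_but_module) / total_loss;
}
```

With the numbers from this comment, recovered_share(12.695, 20.177, 14.293)
is roughly 0.79 for mqc_m and recovered_share(12.695, 20.177, 18.598) is
roughly 0.21 for mqr_m, so the two modules together account for essentially
all of the loss.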


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
  2009-08-06  0:19 [Bug middle-end/40979] New: induct benchmark 60% slower when compiled with -fgraphite-identity howarth at nitro dot med dot uc dot edu
                   ` (7 preceding siblings ...)
  2010-03-10  1:57 ` howarth at nitro dot med dot uc dot edu
@ 2010-03-15 14:21 ` dominiq at lps dot ens dot fr
  8 siblings, 0 replies; 25+ messages in thread
From: dominiq at lps dot ens dot fr @ 2010-03-15 14:21 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from dominiq at lps dot ens dot fr  2010-03-15 14:21 -------
See also pr43359.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
       [not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
                   ` (13 preceding siblings ...)
  2011-02-02 15:53 ` spop at gcc dot gnu.org
@ 2011-02-02 15:59 ` spop at gcc dot gnu.org
  14 siblings, 0 replies; 25+ messages in thread
From: spop at gcc dot gnu.org @ 2011-02-02 15:59 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979

Sebastian Pop <spop at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

--- Comment #23 from Sebastian Pop <spop at gcc dot gnu.org> 2011-02-02 15:59:20 UTC ---
Fixed.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
       [not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
                   ` (12 preceding siblings ...)
  2011-02-01 21:22 ` spop at gcc dot gnu.org
@ 2011-02-02 15:53 ` spop at gcc dot gnu.org
  2011-02-02 15:59 ` spop at gcc dot gnu.org
  14 siblings, 0 replies; 25+ messages in thread
From: spop at gcc dot gnu.org @ 2011-02-02 15:53 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979

--- Comment #22 from Sebastian Pop <spop at gcc dot gnu.org> 2011-02-02 15:52:26 UTC ---
Author: spop
Date: Wed Feb  2 15:52:21 2011
New Revision: 169531

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=169531
Log:
Fix PR40979 and PR47044: after LIM call copy_prop and DCE to clean up.

2011-02-02  Sebastian Pop  <sebastian.pop@amd.com>
        Richard Guenther  <rguenther@suse.de>

    PR tree-optimization/40979
    PR bootstrap/47044
    * passes.c (init_optimization_passes): After LIM call copy_prop
    and DCE to clean up.
    * tree-ssa-loop.c (pass_graphite_transforms): Add TODO_dump_func.

    * gcc.dg/graphite/graphite.exp (DEFAULT_VECTCFLAGS): Add -ffast-math.
    * gcc.dg/graphite/pr35356-2.c: Adjust pattern.
    * gfortran.dg/graphite/graphite.exp: Run vect_files conditionally to
    check_vect_support_and_set_flags.
    * gfortran.dg/graphite/vect-pr40979.f90: New.

Added:
    trunk/gcc/testsuite/gfortran.dg/graphite/vect-pr40979.f90
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/passes.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/graphite/graphite.exp
    trunk/gcc/testsuite/gcc.dg/graphite/pr35356-2.c
    trunk/gcc/testsuite/gfortran.dg/graphite/graphite.exp
    trunk/gcc/tree-ssa-loop.c


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
       [not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
                   ` (11 preceding siblings ...)
  2011-02-01 21:19 ` howarth at nitro dot med.uc.edu
@ 2011-02-01 21:22 ` spop at gcc dot gnu.org
  2011-02-02 15:53 ` spop at gcc dot gnu.org
  2011-02-02 15:59 ` spop at gcc dot gnu.org
  14 siblings, 0 replies; 25+ messages in thread
From: spop at gcc dot gnu.org @ 2011-02-01 21:22 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979

--- Comment #21 from Sebastian Pop <spop at gcc dot gnu.org> 2011-02-01 20:51:31 UTC ---
Patch here:
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg00070.html


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
       [not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
                   ` (10 preceding siblings ...)
  2011-02-01 18:22 ` dominiq at lps dot ens.fr
@ 2011-02-01 21:19 ` howarth at nitro dot med.uc.edu
  2011-02-01 21:22 ` spop at gcc dot gnu.org
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: howarth at nitro dot med.uc.edu @ 2011-02-01 21:19 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979

--- Comment #20 from Jack Howarth <howarth at nitro dot med.uc.edu> 2011-02-01 20:15:49 UTC ---
FYI, the patches in Comment 14 and 17 when also used with the patch...

Index: opts.c
===================================================================
--- opts.c      (revision 167318)
+++ opts.c      (working copy)
@@ -462,6 +462,9 @@
     { OPT_LEVELS_1_PLUS, OPT_fcombine_stack_adjustments, NULL, 1 },

     /* -O2 optimizations.  */
+#ifdef HAVE_cloog
+    { OPT_LEVELS_2_PLUS, OPT_fgraphite_identity, NULL, 1 },
+#endif
     { OPT_LEVELS_2_PLUS, OPT_finline_small_functions, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_findirect_inlining, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_fpartial_inlining, NULL, 1 },

shows that the vect.exp failures (PR 47048) for -fgraphite-identity at -O2 are
reduced from the previous 129 at -m32 to only 24! So close to perfection...


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
       [not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2011-02-01 17:51 ` sebpop at gmail dot com
@ 2011-02-01 18:22 ` dominiq at lps dot ens.fr
  2011-02-01 21:19 ` howarth at nitro dot med.uc.edu
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: dominiq at lps dot ens.fr @ 2011-02-01 18:22 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979

--- Comment #19 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2011-02-01 17:40:17 UTC ---
> That made the loop vectorizable.

Confirmed on top of the patch in comment #14.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
       [not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2011-02-01 17:23 ` sebpop at gmail dot com
@ 2011-02-01 17:51 ` sebpop at gmail dot com
  2011-02-01 18:22 ` dominiq at lps dot ens.fr
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: sebpop at gmail dot com @ 2011-02-01 17:51 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979

--- Comment #18 from sebpop at gmail dot com <sebpop at gmail dot com> 2011-02-01 17:22:06 UTC ---
On Tue, Feb 1, 2011 at 11:15, rguenth at gcc dot gnu.org
<gcc-bugzilla@gcc.gnu.org> wrote:
> I'd suggest
>
>          NEXT_PASS (pass_graphite);
>            {
>              struct opt_pass **p = &pass_graphite.pass.sub;
>              NEXT_PASS (pass_graphite_transforms);
>              NEXT_PASS (pass_lim);
>              NEXT_PASS (pass_copy_prop);
>              NEXT_PASS (pass_dce_loop);
>            }
>

That made the loop vectorizable.
Thanks Richi!


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
       [not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2011-02-01 17:18 ` rguenth at gcc dot gnu.org
@ 2011-02-01 17:23 ` sebpop at gmail dot com
  2011-02-01 17:51 ` sebpop at gmail dot com
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: sebpop at gmail dot com @ 2011-02-01 17:23 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979

--- Comment #16 from sebpop at gmail dot com <sebpop at gmail dot com> 2011-02-01 16:59:03 UTC ---
> It's unfortunate that graphite inserts arrays of size 1 instead of scalar
> (memory) vars.

That could be easily fixed.

graphite can also use the original data reference to write the reduction in,
and that cannot be replaced by a scalar memory variable.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
       [not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2011-02-01 16:47 ` spop at gcc dot gnu.org
@ 2011-02-01 17:18 ` rguenth at gcc dot gnu.org
  2011-02-01 17:23 ` sebpop at gmail dot com
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-02-01 17:18 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979

--- Comment #17 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-02-01 17:04:38 UTC ---
(In reply to comment #15)
> The vectorizer does not apply because it does not match the canonical
> form of a reduction: here is the reduction after graphite-identity:
> 
>         # l12__lsm.18_179 = PHI <l12__lsm.18_183(5), l12__lsm.18_154(7)>
> S1:        l12_lower_188 = l12__lsm.18_179;
>         l12_lower_184 = D.1589_34 + l12_lower_188;
> S2:        l12__lsm.18_154 = l12_lower_184;
> 
> Without S1 and S2, this would be recognized as a reduction by the
> vectorizer.
> 
> Why do we end up with the two extra copies?
> Here is the original code:
> 
>         # l12_lower_5 = PHI <l12_lower_4(4), l12_lower_36(6)>
>         l12_lower_36 = D.1589_321 + l12_lower_5;
> 
> Graphite does the following:
> 
>         l12_lower_5 = *l12_43(D);
>         l12_lower_36 = D.1589_321 + l12_lower_5;
>         *l12_43(D) = l12_lower_36;
> 
> Note that at this point we cannot construct this code because we use
> data references and we are in Gimple form:
> 
>         *l12_43(D) = D.1589_321 + *l12_43(D);
> 
> So I think that the code produced by Graphite is fine, and the problem
> is in the cleanups that we're doing after: for instance loop invariant
> motion could be improved to avoid the extra two statements S1 and S2:
> 
>         # l12__lsm.18_179 = PHI <l12__lsm.18_183(5), l12__lsm.18_154(7)>
> S1:        l12_lower_188 = l12__lsm.18_179;
>         l12_lower_184 = D.1589_34 + l12_lower_188;
> S2:        l12__lsm.18_154 = l12_lower_184;

Well, LIM needs a copyprop to clean up after it - but the cleanups
after graphite are in a strange order.  LIM is also not really the
pass that is supposed to do scalarization of the memory temporary.

> I also have tried to run pass_rename_ssa_copies but that would just
> rename the base variable l12__lsm.18 into l12_lower and wait for the
> out-of-SSA to remove the extra copies.  Constant propagation does not
> help either... any other suggestions?

I'd suggest

          NEXT_PASS (pass_graphite);
            {
              struct opt_pass **p = &pass_graphite.pass.sub;
              NEXT_PASS (pass_graphite_transforms);
              NEXT_PASS (pass_lim);
              NEXT_PASS (pass_copy_prop);
              NEXT_PASS (pass_dce_loop);
            }


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
       [not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2011-02-01 11:45 ` rguenth at gcc dot gnu.org
@ 2011-02-01 16:47 ` spop at gcc dot gnu.org
  2011-02-01 17:18 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: spop at gcc dot gnu.org @ 2011-02-01 16:47 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979

--- Comment #15 from Sebastian Pop <spop at gcc dot gnu.org> 2011-02-01 16:46:54 UTC ---
The vectorizer does not apply because it does not match the canonical
form of a reduction: here is the reduction after graphite-identity:

        # l12__lsm.18_179 = PHI <l12__lsm.18_183(5), l12__lsm.18_154(7)>
S1:        l12_lower_188 = l12__lsm.18_179;
        l12_lower_184 = D.1589_34 + l12_lower_188;
S2:        l12__lsm.18_154 = l12_lower_184;

Without S1 and S2, this would be recognized as a reduction by the
vectorizer.

Why do we end up with the two extra copies?
Here is the original code:

        # l12_lower_5 = PHI <l12_lower_4(4), l12_lower_36(6)>
        l12_lower_36 = D.1589_321 + l12_lower_5;

Graphite does the following:

        l12_lower_5 = *l12_43(D);
        l12_lower_36 = D.1589_321 + l12_lower_5;
        *l12_43(D) = l12_lower_36;

Note that at this point we cannot construct this code because we use
data references and we are in Gimple form:

        *l12_43(D) = D.1589_321 + *l12_43(D);

So I think that the code produced by Graphite is fine, and the problem
is in the cleanups that we're doing after: for instance loop invariant
motion could be improved to avoid the extra two statements S1 and S2:

        # l12__lsm.18_179 = PHI <l12__lsm.18_183(5), l12__lsm.18_154(7)>
S1:        l12_lower_188 = l12__lsm.18_179;
        l12_lower_184 = D.1589_34 + l12_lower_188;
S2:        l12__lsm.18_154 = l12_lower_184;

I also have tried to run pass_rename_ssa_copies but that would just
rename the base variable l12__lsm.18 into l12_lower and wait for the
out-of-SSA to remove the extra copies.  Constant propagation does not
help either... any other suggestions?
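
For readers less used to GIMPLE dumps, the three shapes discussed above can be
mimicked in plain C. This is only a rough analogue, not what GCC emits; the
function names are hypothetical and the input array x stands in for the
D.1589 values.

```c
#include <assert.h>
#include <stddef.h>

/* Canonical reduction: the accumulator is a scalar carried around the
   loop, like the PHI node on l12_lower in the original code. */
static double reduce_scalar(const double *x, size_t n, double init)
{
    double acc = init;
    for (size_t i = 0; i < n; i++)
        acc = x[i] + acc;
    return acc;
}

/* Shape after the scalar is rewritten into a memory cell:
   load, add, store on every iteration. */
static void reduce_through_memory(const double *x, size_t n, double *cell)
{
    assert(cell != NULL);
    for (size_t i = 0; i < n; i++) {
        double t = *cell;
        t = x[i] + t;
        *cell = t;
    }
}

/* Shape after store motion: the cell is hoisted out of the loop, but the
   two copies labelled S1 and S2 above are left behind. */
static void reduce_with_copies(const double *x, size_t n, double *cell)
{
    assert(cell != NULL);
    double lsm = *cell;
    for (size_t i = 0; i < n; i++) {
        double lower = lsm;          /* S1 */
        double sum = x[i] + lower;
        lsm = sum;                   /* S2 */
    }
    *cell = lsm;
}
```

All three compute the same value; propagating the copies S1 and S2 in the
third form yields the first form again, which is the shape the comment says
the vectorizer recognizes as a reduction.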


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
       [not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2011-02-01 11:35 ` rguenth at gcc dot gnu.org
@ 2011-02-01 11:45 ` rguenth at gcc dot gnu.org
  2011-02-01 16:47 ` spop at gcc dot gnu.org
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-02-01 11:45 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979

--- Comment #14 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-02-01 11:45:44 UTC ---
Noting that pass_graphite_transforms lacks any verifier calls, the following
would enable the cleanup (had scalar vars been used).

Index: gcc/tree-ssa-loop.c
===================================================================
--- gcc/tree-ssa-loop.c (revision 169434)
+++ gcc/tree-ssa-loop.c (working copy)
@@ -314,7 +314,8 @@ struct gimple_opt_pass pass_graphite_tra
   0,                                   /* properties_provided */
   0,                                   /* properties_destroyed */
   0,                                   /* todo_flags_start */
-  0                                    /* todo_flags_finish */
+  TODO_update_address_taken
+  | TODO_dump_func                     /* todo_flags_finish */
  }
 };


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
       [not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2011-01-31 19:36 ` dominiq at lps dot ens.fr
@ 2011-02-01 11:35 ` rguenth at gcc dot gnu.org
  2011-02-01 11:45 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-02-01 11:35 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979

--- Comment #13 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-02-01 11:35:08 UTC ---
It's unfortunate that graphite inserts arrays of size 1 instead of scalar
(memory) vars.  Otherwise update-address-taken would just re-write those
into SSA after going out-of-graphite (if run, of course).  It can probably
be taught to rewrite single-element arrays into SSA form as well.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
       [not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2011-01-31 18:53 ` spop at gcc dot gnu.org
@ 2011-01-31 19:36 ` dominiq at lps dot ens.fr
  2011-02-01 11:35 ` rguenth at gcc dot gnu.org
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: dominiq at lps dot ens.fr @ 2011-01-31 19:36 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979

--- Comment #12 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2011-01-31 18:46:23 UTC ---
> I looked at how to improve translate_scalar_reduction_to_array in
> order to avoid the creation of the temporary array, but it seems to be
> difficult as the result is written to memory under a different type
> than the reduction itself: l12_lower is a real whereas l12 is an
> integer:

In the original code l12 is real (kind = longreal), like l12_lower, but making
that change does not help the vectorization.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
       [not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
  2011-01-26 10:45 ` dominiq at lps dot ens.fr
  2011-01-26 14:43 ` howarth at nitro dot med.uc.edu
@ 2011-01-31 18:53 ` spop at gcc dot gnu.org
  2011-01-31 19:36 ` dominiq at lps dot ens.fr
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: spop at gcc dot gnu.org @ 2011-01-31 18:53 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979

--- Comment #11 from Sebastian Pop <spop at gcc dot gnu.org> 2011-01-31 18:12:38 UTC ---
Here is a reduced testcase from induct.f90 for the first loop
not vectorized with -fgraphite-identity:

module mqc_m
integer, parameter, private :: longreal = selected_real_kind(15,90)
contains
      subroutine mutual_ind_quad_cir_coil (m, l12)
      real (kind = longreal), dimension(9), save :: w2gauss, w1gauss
      real (kind = longreal) :: l12_lower, numerator
      real (kind = longreal), dimension(3) :: current_vector, coil_current_vec
      w2gauss(1) = 16.0_longreal/81.0_longreal
      w1gauss(5) = 0.3302393550_longreal
      do i = 1, 2*m
          do j = 1, 9
              do k = 1, 9
                  numerator = w1gauss(j) * w2gauss(k) *                &
                      dot_product(coil_current_vec,current_vector)
                  l12_lower = l12_lower + numerator
              end do
          end do
      end do
      l12 = l12_lower
      end subroutine mutual_ind_quad_cir_coil
end module mqc_m

The problem seems to be that graphite introduces a
Commutative_Associative_Reduction array that confuses the vectorizer.

I looked at how to improve translate_scalar_reduction_to_array in
order to avoid the creation of the temporary array, but it seems to be
difficult as the result is written to memory under a different type
than the reduction itself: l12_lower is a real whereas l12 is an
integer:

    l12_lower_200 = some_computation;
    # l12_lower_9 = PHI <l12_lower_16(D)(2), l12_lower_200(9)>
    D.1585_43 = (integer(kind=4)) l12_lower_9;
    # .MEM_48 = VDEF <.MEM_47>
    *l12_44(D) = D.1585_43;

so we cannot use *l12_44(D) as a data reference in the loop to perform
the reduction as it does not have the same precision as l12_lower: it
seems to me that we cannot avoid creating the temporary array.
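
As a rough illustration of the point above (a sketch in C, not graphite's
actual output), the three functions below contrast the possible shapes of
the reduction. The function names, the one-element array red, and the zero
initial value of the accumulator are assumptions made for the example; in
the testcase l12_lower is uninitialized and the real temporary is created
by translate_scalar_reduction_to_array.

```c
/* Shape 1: the reduction is kept in a scalar of the reduction's own type.
   The conversion to int happens once, after the loop nest. */
void reduce_scalar(int m, const double *w1, const double *w2,
                   double dot, int *l12)
{
    double acc = 0.0;               /* assumed initial value */
    for (int i = 0; i < 2 * m; i++)
        for (int j = 0; j < 9; j++)
            for (int k = 0; k < 9; k++)
                acc += w1[j] * w2[k] * dot;
    *l12 = (int)acc;
}

/* Shape 2: hypothetical picture of the same reduction routed through a
   one-element temporary array, so each iteration is a load and a store. */
void reduce_via_array(int m, const double *w1, const double *w2,
                      double dot, int *l12)
{
    double red[1] = { 0.0 };        /* stand-in for the temporary array */
    for (int i = 0; i < 2 * m; i++)
        for (int j = 0; j < 9; j++)
            for (int k = 0; k < 9; k++)
                red[0] = red[0] + w1[j] * w2[k] * dot;
    *l12 = (int)red[0];
}

/* Shape 3 (not equivalent): accumulating directly in *l12 converts every
   partial sum to int, so the fractional contributions are lost. */
void reduce_in_output(int m, const double *w1, const double *w2,
                      double dot, int *l12)
{
    *l12 = 0;
    for (int i = 0; i < 2 * m; i++)
        for (int j = 0; j < 9; j++)
            for (int k = 0; k < 9; k++)
                *l12 = (int)(*l12 + w1[j] * w2[k] * dot);
}
```

With m = 1, every weight set to 0.5 and dot = 0.3, the first two shapes
store 12 while the third stores 0, which is why the output location cannot
simply replace the temporary when its type differs from the reduction's.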

The solution could be to clean up the temporary arrays after graphite.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
       [not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
  2011-01-26 10:45 ` dominiq at lps dot ens.fr
@ 2011-01-26 14:43 ` howarth at nitro dot med.uc.edu
  2011-01-31 18:53 ` spop at gcc dot gnu.org
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: howarth at nitro dot med.uc.edu @ 2011-01-26 14:43 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979

--- Comment #10 from Jack Howarth <howarth at nitro dot med.uc.edu> 2011-01-26 14:20:18 UTC ---
(In reply to comment #9)
> This pr is not fixed at revision 169261 (gfc). AFAIU -ftree-loop-linear is now
> implemented through graphite. This leads to a sort of regression with respect
> to revision 169227(gfc6):
> 
> [macbook] lin/test% gfc -Ofast -ftree-loop-linear induct.f90
> [macbook] lin/test% time a.out > /dev/null
> 22.380u 0.023s 0:22.40 100.0%    0+0k 0+0io 0pf+0w
> [macbook] lin/test% gfc6 -Ofast -ftree-loop-linear induct.f90
> [macbook] lin/test% time a.out > /dev/null
> 13.978u 0.019s 0:13.99 99.9%    0+0k 0+0io 0pf+0w

Note that -fgraphite-identity still triggers a large number of failures in the
vect.exp testsuite when defaulted on at -O2...

http://gcc.gnu.org/ml/gcc-testresults/2011-01/msg02005.html

so the regression in induct.f90 isn't unique.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [Bug middle-end/40979] induct benchmark 60% slower when compiled with -fgraphite-identity
       [not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
@ 2011-01-26 10:45 ` dominiq at lps dot ens.fr
  2011-01-26 14:43 ` howarth at nitro dot med.uc.edu
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 25+ messages in thread
From: dominiq at lps dot ens.fr @ 2011-01-26 10:45 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40979

--- Comment #9 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2011-01-26 10:23:12 UTC ---
This pr is not fixed at revision 169261 (gfc). AFAIU -ftree-loop-linear is now
implemented through graphite. This leads to a sort of regression with respect
to revision 169227(gfc6):

[macbook] lin/test% gfc -Ofast -ftree-loop-linear induct.f90
[macbook] lin/test% time a.out > /dev/null
22.380u 0.023s 0:22.40 100.0%    0+0k 0+0io 0pf+0w
[macbook] lin/test% gfc6 -Ofast -ftree-loop-linear induct.f90
[macbook] lin/test% time a.out > /dev/null
13.978u 0.019s 0:13.99 99.9%    0+0k 0+0io 0pf+0w


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2011-02-02 15:59 UTC | newest]

Thread overview: 25+ messages
2009-08-06  0:19 [Bug middle-end/40979] New: induct benchmark 60% slower when compiled with -fgraphite-identity howarth at nitro dot med dot uc dot edu
2009-08-06  0:23 ` [Bug middle-end/40979] " spop at gcc dot gnu dot org
2009-08-12 14:58 ` spop at gcc dot gnu dot org
2009-08-13  2:26 ` howarth at nitro dot med dot uc dot edu
2009-08-14 11:59 ` dominiq at lps dot ens dot fr
2009-12-14 19:24 ` spop at gcc dot gnu dot org
2010-02-25 15:23 ` dominiq at lps dot ens dot fr
2010-02-25 17:26 ` dominiq at lps dot ens dot fr
2010-03-10  1:57 ` howarth at nitro dot med dot uc dot edu
2010-03-15 14:21 ` dominiq at lps dot ens dot fr
     [not found] <bug-40979-4@http.gcc.gnu.org/bugzilla/>
2011-01-26 10:45 ` dominiq at lps dot ens.fr
2011-01-26 14:43 ` howarth at nitro dot med.uc.edu
2011-01-31 18:53 ` spop at gcc dot gnu.org
2011-01-31 19:36 ` dominiq at lps dot ens.fr
2011-02-01 11:35 ` rguenth at gcc dot gnu.org
2011-02-01 11:45 ` rguenth at gcc dot gnu.org
2011-02-01 16:47 ` spop at gcc dot gnu.org
2011-02-01 17:18 ` rguenth at gcc dot gnu.org
2011-02-01 17:23 ` sebpop at gmail dot com
2011-02-01 17:51 ` sebpop at gmail dot com
2011-02-01 18:22 ` dominiq at lps dot ens.fr
2011-02-01 21:19 ` howarth at nitro dot med.uc.edu
2011-02-01 21:22 ` spop at gcc dot gnu.org
2011-02-02 15:53 ` spop at gcc dot gnu.org
2011-02-02 15:59 ` spop at gcc dot gnu.org
