public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/54717] New: Runtime regression: polyhedron test "rnflow" degraded
@ 2012-09-26 12:54 sergos.gnu at gmail dot com
  2012-09-26 13:17 ` [Bug tree-optimization/54717] " ubizjak at gmail dot com
                   ` (18 more replies)
  0 siblings, 19 replies; 21+ messages in thread
From: sergos.gnu at gmail dot com @ 2012-09-26 12:54 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54717

             Bug #: 54717
           Summary: Runtime regression: polyhedron test "rnflow" degraded
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: sergos.gnu@gmail.com


commit 024fee2c369096e6fe6cde620243df5843893004
Author: rguenth <rguenth@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Thu Sep 13 12:43:58 2012 +0000

    2012-09-13  Richard Guenther  <rguenther@suse.de>

        * tree-ssa-sccvn.h (enum vn_kind): New.
        (vn_get_stmt_kind): Likewise.
        * tree-ssa-sccvn.c (vn_get_stmt_kind): New function, adjust
        ADDR_EXPR handling.
        (visit_use): Use it.
        * tree-ssa-pre.c (compute_avail): Likewise, simplify further.

        * gcc.dg/tree-ssa/ssa-fre-37.c: New testcase.


    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@191253 138bc75d-0d04-0410-961f-82ee72b054a4

caused a 20% degradation on polyhedron's "rnflow"

commit 780bedc1ccae5ae85fb99afed8a1ac1cc598121b
Geometric Mean Execution Time =      18.28 seconds

commit 024fee2c369096e6fe6cde620243df5843893004
Geometric Mean Execution Time =      24.82 seconds


compilation options used:
gfortran -march=native -ffast-math -funroll-loops -O3 -ftree-vectorize %n.f90 -static -o %n



* [Bug tree-optimization/54717] Runtime regression: polyhedron test "rnflow" degraded
  2012-09-26 12:54 [Bug tree-optimization/54717] New: Runtime regression: polyhedron test "rnflow" degraded sergos.gnu at gmail dot com
@ 2012-09-26 13:17 ` ubizjak at gmail dot com
  2012-09-26 14:17 ` [Bug tree-optimization/54717] [4.8 Regression] " rguenth at gcc dot gnu.org
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: ubizjak at gmail dot com @ 2012-09-26 13:17 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54717

Uros Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org,
                   |                            |ubizjak at gmail dot com

--- Comment #1 from Uros Bizjak <ubizjak at gmail dot com> 2012-09-26 13:17:15 UTC ---
Adding CCs.



* [Bug tree-optimization/54717] [4.8 Regression] Runtime regression: polyhedron test "rnflow" degraded
  2012-09-26 12:54 [Bug tree-optimization/54717] New: Runtime regression: polyhedron test "rnflow" degraded sergos.gnu at gmail dot com
  2012-09-26 13:17 ` [Bug tree-optimization/54717] " ubizjak at gmail dot com
@ 2012-09-26 14:17 ` rguenth at gcc dot gnu.org
  2012-09-26 15:12 ` sergos.gnu at gmail dot com
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-09-26 14:17 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54717

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-09-26
   Target Milestone|---                         |4.8.0
            Summary|Runtime regression:         |[4.8 Regression] Runtime
                   |polyhedron test "rnflow"    |regression: polyhedron test
                   |degraded                    |"rnflow" degraded
     Ever Confirmed|0                           |1

--- Comment #2 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-09-26 14:17:04 UTC ---
What's "-march=native" to you?  Any help in reduction appreciated.



* [Bug tree-optimization/54717] [4.8 Regression] Runtime regression: polyhedron test "rnflow" degraded
  2012-09-26 12:54 [Bug tree-optimization/54717] New: Runtime regression: polyhedron test "rnflow" degraded sergos.gnu at gmail dot com
  2012-09-26 13:17 ` [Bug tree-optimization/54717] " ubizjak at gmail dot com
  2012-09-26 14:17 ` [Bug tree-optimization/54717] [4.8 Regression] " rguenth at gcc dot gnu.org
@ 2012-09-26 15:12 ` sergos.gnu at gmail dot com
  2012-09-26 15:41 ` dominiq at lps dot ens.fr
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: sergos.gnu at gmail dot com @ 2012-09-26 15:12 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54717

--- Comment #3 from Sergey Ostanevich <sergos.gnu at gmail dot com> 2012-09-26 15:11:38 UTC ---
Adding -### gives (options part only):


/export/users/syostane/pb11/gcc120914/libexec/gcc/x86_64-unknown-linux-gnu/4.8.0/f951
air.f90 "-march=corei7" -mcx16 -msahf -mno-movbe -maes -mpclmul -mpopcnt
-mno-abm -mno-lwp -mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm
-mno-avx -mno-avx2 -msse4.2 -msse4.1 -mno-lzcnt -mno-rtm -mno-hle -mno-rdrnd
-mno-f16c -mno-fsgsbase -mno-rdseed -mno-prfchw -mno-adx --param
"l1-cache-size=32" --param "l1-cache-line-size=64" --param
"l2-cache-size=12288" "-mtune=corei7" -quiet -dumpbase air.f90 -auxbase air
-fintrinsic-modules-path
/export/users/syostane/pb11/gcc120914/lib/gcc/x86_64-unknown-linux-gnu/4.8.0/finclude
-o /tmp/ccmW82c1.s



* [Bug tree-optimization/54717] [4.8 Regression] Runtime regression: polyhedron test "rnflow" degraded
  2012-09-26 12:54 [Bug tree-optimization/54717] New: Runtime regression: polyhedron test "rnflow" degraded sergos.gnu at gmail dot com
                   ` (2 preceding siblings ...)
  2012-09-26 15:12 ` sergos.gnu at gmail dot com
@ 2012-09-26 15:41 ` dominiq at lps dot ens.fr
  2012-09-26 20:07 ` sergos.gnu at gmail dot com
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: dominiq at lps dot ens.fr @ 2012-09-26 15:41 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54717

--- Comment #4 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2012-09-26 15:41:05 UTC ---
The slowdown is mostly hidden by  -fno-tree-loop-if-convert.



* [Bug tree-optimization/54717] [4.8 Regression] Runtime regression: polyhedron test "rnflow" degraded
  2012-09-26 12:54 [Bug tree-optimization/54717] New: Runtime regression: polyhedron test "rnflow" degraded sergos.gnu at gmail dot com
                   ` (3 preceding siblings ...)
  2012-09-26 15:41 ` dominiq at lps dot ens.fr
@ 2012-09-26 20:07 ` sergos.gnu at gmail dot com
  2012-09-27  9:28 ` rguenth at gcc dot gnu.org
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: sergos.gnu at gmail dot com @ 2012-09-26 20:07 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54717

--- Comment #5 from Sergey Ostanevich <sergos.gnu at gmail dot com> 2012-09-26 20:07:26 UTC ---
In 093t.pre, the following is missing in the cptrf2 function (the first
version is good, the second is degraded):

***************
*** 8947,8966 ****
    goto <bb 35>;

    <bb 93>:
-   pretmp_325 = (integer(kind=8)) ival2_80;
-   pretmp_326 = pretmp_325 + -1;
-   pretmp_327 = *xxtrt_25(D)[pretmp_326];

    <bb 27>:
    # ival2_136 = PHI <ival2_62(93), ival2_144(97)>
    # ival2_140 = PHI <ival2_80(93), ival2_146(97)>
-   # prephitmp_328 = PHI <pretmp_327(93), prephitmp_290(97)>
    _137 = (integer(kind=8)) ival2_136;
    _138 = _137 + -1;
    _139 = *xxtrt_25(D)[_138];
    _141 = (integer(kind=8)) ival2_140;
    _142 = _141 + -1;
!   _143 = prephitmp_328;
    if (_139 < _143)
      goto <bb 28>;
    else
--- 8838,8853 ----
    goto <bb 35>;

    <bb 93>:

    <bb 27>:
    # ival2_136 = PHI <ival2_62(93), ival2_144(97)>
    # ival2_140 = PHI <ival2_80(93), ival2_146(97)>
    _137 = (integer(kind=8)) ival2_136;
    _138 = _137 + -1;
    _139 = *xxtrt_25(D)[_138];
    _141 = (integer(kind=8)) ival2_140;
    _142 = _141 + -1;
!   _143 = *xxtrt_25(D)[_142];
    if (_139 < _143)
      goto <bb 28>;
    else
***************

More surprising to me is that the first difference already appears in
020t.inline_param1:

***************
*** 16790,16794 ****
    calls:
      dtrti2/26 function not considered for inlining
!       loop depth: 0 freq:1000 size: 9 time: 18 callee size:82 stack:28
      dtrsm/21 function not considered for inlining
        loop depth: 0 freq:1000 size:16 time: 25 callee size:324 stack: 4
--- 16790,16794 ----
    calls:
      dtrti2/26 function not considered for inlining
!       loop depth: 0 freq:1000 size: 9 time: 18 callee size:81 stack:28
      dtrsm/21 function not considered for inlining
        loop depth: 0 freq:1000 size:16 time: 25 callee size:324 stack: 4
***************



* [Bug tree-optimization/54717] [4.8 Regression] Runtime regression: polyhedron test "rnflow" degraded
  2012-09-26 12:54 [Bug tree-optimization/54717] New: Runtime regression: polyhedron test "rnflow" degraded sergos.gnu at gmail dot com
                   ` (4 preceding siblings ...)
  2012-09-26 20:07 ` sergos.gnu at gmail dot com
@ 2012-09-27  9:28 ` rguenth at gcc dot gnu.org
  2012-09-27 10:43 ` rguenth at gcc dot gnu.org
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-09-27  9:28 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54717

--- Comment #6 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-09-27 09:28:04 UTC ---
(In reply to comment #4)
> The slowdown is mostly hidden by  -fno-tree-loop-if-convert.

I would say this means we have more vectorization opportunities after the
patch.  Opportunities that might end up being not profitable.

Sergey, are those differences you quote the only differences?



* [Bug tree-optimization/54717] [4.8 Regression] Runtime regression: polyhedron test "rnflow" degraded
  2012-09-26 12:54 [Bug tree-optimization/54717] New: Runtime regression: polyhedron test "rnflow" degraded sergos.gnu at gmail dot com
                   ` (5 preceding siblings ...)
  2012-09-27  9:28 ` rguenth at gcc dot gnu.org
@ 2012-09-27 10:43 ` rguenth at gcc dot gnu.org
  2012-10-02 20:24 ` dominiq at lps dot ens.fr
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-09-27 10:43 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54717

--- Comment #7 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-09-27 10:43:00 UTC ---
I can reproduce the slowdown.  Code differences appear first in early FRE,
good ones like:

-  _84 = &*a_56(D)[_83];
+  _84 = _75;

which was the intention of the patch (and that is also likely the
reason for the inliner code size/time estimate changes).

It would be nice to get a smaller testcase for the PRE change you quote.

Unfortunately the big slowdown does not reproduce with -fno-inline which makes
it harder to track down.

The real differences do appear in PRE, some of the kind you quote and
some where we perform more PRE like:

@@ -19695,11 +19720,13 @@
   <bb 289>:
   pretmp_ = stride.258_ * _;
   pretmp_ = offset.259_ + pretmp_;
+  pretmp_ = stride.258_ * _;
+  pretmp_ = offset.259_ + pretmp_;

   <bb 123>:
   # i_ = PHI <1(289), i_(292)>
-  _ = stride.258_ * _;
-  _ = _ + offset.259_;
+  _ = pretmp_;
+  _ = pretmp_;

Aside from that the differences you quote result in less if-conversion
applied:

   # ival2_ = PHI <ival2_(39), ival2_(41)>
   # ival2_ = PHI <ival2_(39), ival2_(41)>
-  # prephitmp_ = PHI <pretmp_(39), prephitmp_(41)>
   _ = (integer(kind=8)) ival2_;
   _ = _ + -1;
   _ = *xxtrt_(D)[_];
-  ival2_ = _ < prephitmp_ ? ival2_ : ival2_;
-  prephitmp_ = MIN_EXPR <_, prephitmp_>;
+  _ = (integer(kind=8)) ival2_;
+  _ = _ + -1;
+  _ = *xxtrt_(D)[_];
+  ival2_ = _ < _ ? ival2_ : ival2_;

but that does not result in any extra or missed vectorization.

Btw, dropping to -O2 also fixes the regression.

So, it's not at all clear what we are chasing here (the PRE seems to be
a partial antic expression).



* [Bug tree-optimization/54717] [4.8 Regression] Runtime regression: polyhedron test "rnflow" degraded
  2012-09-26 12:54 [Bug tree-optimization/54717] New: Runtime regression: polyhedron test "rnflow" degraded sergos.gnu at gmail dot com
                   ` (6 preceding siblings ...)
  2012-09-27 10:43 ` rguenth at gcc dot gnu.org
@ 2012-10-02 20:24 ` dominiq at lps dot ens.fr
  2012-10-08  8:55 ` sergos.gnu at gmail dot com
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: dominiq at lps dot ens.fr @ 2012-10-02 20:24 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54717

--- Comment #8 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2012-10-02 20:23:42 UTC ---
Created attachment 28333
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28333
bzipped tar archive of a reduced test

The tar archive contains the files
cptrf2_inl_1.f90  rnflow.in  rnflow_red.f90  rnfprm.h
and can be used as in

[macbook] dbg_rnflow/pr54717% gfc -c -Ofast -funroll-loops rnflow_red.f90
[macbook] dbg_rnflow/pr54717% gfc -c -O2 cptrf2_inl_1.f90
[macbook] dbg_rnflow/pr54717% gfc rnflow_red.o cptrf2_inl_1.o
[macbook] dbg_rnflow/pr54717% time a.out > /dev/null
21.036u 0.051s 0:21.09 99.9%    0+0k 0+0io 0pf+0w
[macbook] dbg_rnflow/pr54717% gfc -c -O2 -ftree-loop-if-convert cptrf2_inl_1.f90
[macbook] dbg_rnflow/pr54717% gfc rnflow_red.o cptrf2_inl_1.o
[macbook] dbg_rnflow/pr54717% time a.out > /dev/null
27.150u 0.051s 0:27.20 100.0%    0+0k 0+0io 0pf+0w

This shows that the file cptrf2_inl_1.f90 compiled with -ftree-loop-if-convert
gives a slow executable without involving inlining or vectorization.
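
For scale, the two user-CPU times printed by the `time a.out` runs above can
be turned into a slowdown factor. This is only arithmetic on the numbers
already shown; the variable names are made up for illustration.

```python
# User-CPU times (seconds) copied from the two `time a.out` runs above.
baseline = 21.036      # cptrf2_inl_1.f90 built with -O2
if_converted = 27.150  # same file built with -O2 -ftree-loop-if-convert

ratio = if_converted / baseline
extra_percent = (ratio - 1) * 100
print(f"slowdown: {ratio:.2f}x ({extra_percent:.0f}% more CPU time)")
```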



* [Bug tree-optimization/54717] [4.8 Regression] Runtime regression: polyhedron test "rnflow" degraded
  2012-09-26 12:54 [Bug tree-optimization/54717] New: Runtime regression: polyhedron test "rnflow" degraded sergos.gnu at gmail dot com
                   ` (7 preceding siblings ...)
  2012-10-02 20:24 ` dominiq at lps dot ens.fr
@ 2012-10-08  8:55 ` sergos.gnu at gmail dot com
  2012-11-13 18:40 ` ubizjak at gmail dot com
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: sergos.gnu at gmail dot com @ 2012-10-08  8:55 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54717

--- Comment #9 from Sergey Ostanevich <sergos.gnu at gmail dot com> 2012-10-08 08:55:25 UTC ---
Thanks for the reduced test, Dominique!

I see that the vectorizer did not manage to generate MIN after the change.
Also, it looks pretty similar to what I posted at first: no prephitmp was
created for the xxtrt_[] access.


> ival2_15 = _85 < prephitmp_266 ? ival2_10 : iva
> prephitmp_237 = MIN_EXPR <_85, prephitmp_266>;
-----------------------
< _86 = (integer(kind=8)) ival2_14;
< _87 = _86 + -1;
< _88 = *xxtrt_46(D)[_87];
< ival2_15 = _85 < _88 ? ival2_10 : ival2_14;

I suspect that one of the iterators you removed, possibly VEC_iterate,
traversed more than the one you added?

I also double-checked that for the reduced test MIN is not generated and does
not appear in the assembly. PMU measurements (VTune) confirm that the basic
blocks missing MIN account for the difference in clocks.



* [Bug tree-optimization/54717] [4.8 Regression] Runtime regression: polyhedron test "rnflow" degraded
  2012-09-26 12:54 [Bug tree-optimization/54717] New: Runtime regression: polyhedron test "rnflow" degraded sergos.gnu at gmail dot com
                   ` (8 preceding siblings ...)
  2012-10-08  8:55 ` sergos.gnu at gmail dot com
@ 2012-11-13 18:40 ` ubizjak at gmail dot com
  2012-11-13 18:54 ` dominiq at lps dot ens.fr
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: ubizjak at gmail dot com @ 2012-11-13 18:40 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54717

--- Comment #10 from Uros Bizjak <ubizjak at gmail dot com> 2012-11-13 18:39:28 UTC ---
(In reply to comment #8)

> This shows that the file cptrf2_inl_1.f90 compiled with -ftree-loop-if-convert
> gives a slow executable without involving inlining or vectorization.

Dup of PR53346 ?



* [Bug tree-optimization/54717] [4.8 Regression] Runtime regression: polyhedron test "rnflow" degraded
  2012-09-26 12:54 [Bug tree-optimization/54717] New: Runtime regression: polyhedron test "rnflow" degraded sergos.gnu at gmail dot com
                   ` (9 preceding siblings ...)
  2012-11-13 18:40 ` ubizjak at gmail dot com
@ 2012-11-13 18:54 ` dominiq at lps dot ens.fr
  2012-11-14 18:56 ` sergos.gnu at gmail dot com
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: dominiq at lps dot ens.fr @ 2012-11-13 18:54 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54717

--- Comment #11 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2012-11-13 18:54:40 UTC ---
> Dup of PR53346 ?

Maybe! Both PRs also seem related to PR54073.



* [Bug tree-optimization/54717] [4.8 Regression] Runtime regression: polyhedron test "rnflow" degraded
  2012-09-26 12:54 [Bug tree-optimization/54717] New: Runtime regression: polyhedron test "rnflow" degraded sergos.gnu at gmail dot com
                   ` (10 preceding siblings ...)
  2012-11-13 18:54 ` dominiq at lps dot ens.fr
@ 2012-11-14 18:56 ` sergos.gnu at gmail dot com
  2012-11-14 19:42   ` Jan Hubicka
  2012-11-14 19:43 ` hubicka at ucw dot cz
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 21+ messages in thread
From: sergos.gnu at gmail dot com @ 2012-11-14 18:56 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54717

--- Comment #12 from Sergey Ostanevich <sergos.gnu at gmail dot com> 2012-11-14 18:56:22 UTC ---
Actually, it is not.
I found that PRE did not collect a memory access within the loop, which later
caused the missed vectorization. Here are the dumps before (good) and after
(bad) the commit:

    <bb 88>:
    pretmp_263 = (integer(kind=8)) ival2_82;
    pretmp_264 = pretmp_263 + -1;
    pretmp_265 = *xxtrt_46(D)[pretmp_264];

    <bb 28>:
    # ival2_10 = PHI <ival2_63(88), ival2_89(92)>
    # ival2_14 = PHI <ival2_82(88), ival2_15(92)>
    # prephitmp_266 = PHI <pretmp_265(88), prephitmp_237(92)>
    _83 = (integer(kind=8)) ival2_10;
    _84 = _83 + -1;
    _85 = *xxtrt_46(D)[_84];
    _86 = (integer(kind=8)) ival2_14;
    _87 = _86 + -1;
    _88 = prephitmp_266;
    if (_85 < _88)
      goto <bb 29>;
    else
      goto <bb 90>;

    <bb 90>:
    goto <bb 30>;

    <bb 29>:

    <bb 30>:
    # ival2_15 = PHI <ival2_14(90), ival2_10(29)>
    # prephitmp_237 = PHI <_88(90), _85(29)>
    ival2_89 = ival2_10 + -1;
    if (ival2_10 == ipos1_12)
      goto <bb 91>;
    else
      goto <bb 92>;

   <bb 92>:
   goto <bb 28>;
---------------------------------
    <bb 88>:

    <bb 28>:
    # ival2_10 = PHI <ival2_63(88), ival2_89(92)>
   # ival2_14 = PHI <ival2_82(88), ival2_15(92)>
    _83 = (integer(kind=8)) ival2_10;
    _84 = _83 + -1;
    _85 = *xxtrt_46(D)[_84];
    _86 = (integer(kind=8)) ival2_14;
    _87 = _86 + -1;
    _88 = *xxtrt_46(D)[_87];
    if (_85 < _88)
      goto <bb 29>;
    else
      goto <bb 90>;

    <bb 90>:
    goto <bb 30>;

    <bb 29>:

    <bb 30>:
    # ival2_15 = PHI <ival2_14(90), ival2_10(29)>
    ival2_89 = ival2_10 + -1;
    if (ival2_10 == ipos1_12)
      goto <bb 91>;
    else
      goto <bb 92>;

   <bb 92>:
   goto <bb 28>;
-------------------------

So for the loop starting at bb 28 you can see that the xxtrt_46 access was not
put into a pretmp. A possible reason is exactly what Richard mentioned: extra
candidates were collected and this one became less anticipatable.

Skipping partial partial redundancy for expression                    
{array_ref<pretmp_8,0,4>,mem_ref<0B>,xxtrt_46(D)}@.MEM_30(D) (0165)   
   not partially anticipated on any to be optimized for speed edges    
  -----------------------------------------------------------------------
Found partial partial redundancy for expression
 {array_ref<pretmp_8,0,4>,mem_ref<0B>,xxtrt_46(D)}@.MEM_30(D) (0165)
Created phi prephitmp_237 = PHI <_88(90), _85(29)>
 in block 30
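
The difference between the two dumps can be modelled outside the compiler. The
sketch below is a hypothetical Python rendering of the bb 28 loop, not code
from rnflow: `argmin_reload` mirrors the bad dump (xxtrt is loaded twice per
iteration) and `argmin_carried` mirrors the good dump (the running minimum is
kept in a scalar, as prephitmp is). Function names, 0-based indexing and the
sample data are assumptions made for illustration.

```python
def argmin_reload(xxtrt, start, stop, best):
    """Bad dump: reload xxtrt[best] on every iteration."""
    loads = 0
    for i in range(start, stop - 1, -1):
        cur = xxtrt[i]       # _85 = *xxtrt[_84]
        ref = xxtrt[best]    # _88 = *xxtrt[_87], the load PRE no longer removes
        loads += 2
        if cur < ref:
            best = i
    return best, loads


def argmin_carried(xxtrt, start, stop, best):
    """Good dump: carry the minimum in a scalar instead of reloading it."""
    running_min = xxtrt[best]  # pretmp load before the loop (bb 88)
    loads = 1
    for i in range(start, stop - 1, -1):
        cur = xxtrt[i]
        loads += 1
        if cur < running_min:
            best = i
        running_min = min(cur, running_min)  # becomes MIN_EXPR once if-converted
    return best, loads


xxtrt = [5.0, 3.0, 8.0, 1.0, 4.0]
print(argmin_reload(xxtrt, 3, 0, 4))   # (3, 8)
print(argmin_carried(xxtrt, 3, 0, 4))  # (3, 5)
```

Both variants pick the same index; only the number of memory loads differs,
which is the effect the missing prephitmp has on the hot loop.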



* [Bug tree-optimization/54717] [4.8 Regression] Runtime regression: polyhedron test "rnflow" degraded
  2012-09-26 12:54 [Bug tree-optimization/54717] New: Runtime regression: polyhedron test "rnflow" degraded sergos.gnu at gmail dot com
                   ` (11 preceding siblings ...)
  2012-11-14 18:56 ` sergos.gnu at gmail dot com
@ 2012-11-14 19:43 ` hubicka at ucw dot cz
  2012-11-14 20:11 ` hubicka at gcc dot gnu.org
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: hubicka at ucw dot cz @ 2012-11-14 19:43 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54717

--- Comment #13 from Jan Hubicka <hubicka at ucw dot cz> 2012-11-14 19:43:00 UTC ---
> So for the loop that starting at bb 28 you can see the xxtrt_46 access was not
> put into pretemp. Possible reason is exactly as it was mentioned by Richard -
> there were extra candidates collected and this one become less anticipatable
> 
> Skipping partial partial redundancy for expression                    
> {array_ref<pretmp_8,0,4>,mem_ref<0B>,xxtrt_46(D)}@.MEM_30(D) (0165)   
>    not partially anticipated on any to be optimized for speed edges    
>   -----------------------------------------------------------------------
> Found partial partial redundancy for expression
>  {array_ref<pretmp_8,0,4>,mem_ref<0B>,xxtrt_46(D)}@.MEM_30(D) (0165)
> Created phi prephitmp_237 = PHI <_88(90), _85(29)>
>  in block 30

Hmm, interesting, what is the edge responsible?
I would expect it to be the loopback edge and its frequency is:
;;   basic block 28, loop depth 0, count 0, freq 1998, maybe hot
;;    prev block 92, next block 94, flags: (NEW, REACHABLE)
;;    pred:       92 [100.0%, 180]  (FALLTHRU)
;;                96 [100.0%, 1818]  (FALLTHRU,DFS_BACK)
  # ival2_136 = PHI <ival2_62(92), ival2_144(96)>
  # ival2_140 = PHI <ival2_80(92), ival2_146(96)>
  _137 = (integer(kind=8)) ival2_136;
  _138 = _137 + -1;
  _139 = *xxtrt_25(D)[_138];
  _141 = (integer(kind=8)) ival2_140;
  _142 = _141 + -1;
  _143 = *xxtrt_25(D)[_142];
  if (_139 < _143)
    goto <bb 29>; 
  else            
    goto <bb 94>;

1818 should still be hot.  Or isn't the heuristic backwards?  I.e. I would
expect the partial anticipance to sit on edge 92->28 (with freq 180), where we
need to insert the computation to get the other path hot.
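
As a rough check of the numbers in the dump above, the two incoming edge
frequencies of bb 28 can be compared directly. This is a sketch that assumes
the usual reading of profile frequencies (the header frequency is the sum of
its incoming edges); the variable names are illustrative.

```python
# Edge frequencies copied from the bb 28 dump above.
entry_freq = 180    # edge 92 -> 28, taken once per loop entry
back_freq = 1818    # edge 96 -> 28, the DFS_BACK (loopback) edge

header_freq = entry_freq + back_freq     # matches "freq 1998" on bb 28
back_share = back_freq / header_freq     # share of arrivals via the back edge
est_iters = header_freq / entry_freq     # estimated header runs per loop entry

print(f"header freq {header_freq}, back edge {back_share:.0%}, "
      f"about {est_iters:.1f} header executions per entry")
```

Under that reading the back edge accounts for roughly nine out of ten arrivals
at the header, so a load inserted on the 92->28 entry edge is paid once while
the saving applies on every iteration.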

Honza



* [Bug tree-optimization/54717] [4.8 Regression] Runtime regression: polyhedron test "rnflow" degraded
  2012-09-26 12:54 [Bug tree-optimization/54717] New: Runtime regression: polyhedron test "rnflow" degraded sergos.gnu at gmail dot com
                   ` (12 preceding siblings ...)
  2012-11-14 19:43 ` hubicka at ucw dot cz
@ 2012-11-14 20:11 ` hubicka at gcc dot gnu.org
  2012-11-15 10:28 ` hubicka at gcc dot gnu.org
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: hubicka at gcc dot gnu.org @ 2012-11-14 20:11 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54717

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #14 from Jan Hubicka <hubicka at gcc dot gnu.org> 2012-11-14 20:11:17 UTC ---
Hmm, optimize_edge_for_speed_p never returns false here. The problem is that
the patch assumes that the interesting successors of a block with partial
anticipance are blocks with partial anticipance. The anticipance could however
be full, and it seems that full anticipance does not imply partial anticipance:
Index: tree-ssa-pre.c
===================================================================
*** tree-ssa-pre.c      (revision 193503)
--- tree-ssa-pre.c      (working copy)
*************** do_partial_partial_insertion (basic_bloc
*** 3525,3531 ****
                 may cause regressions on the speed path.  */
              FOR_EACH_EDGE (succ, ei, block->succs)
                {
!                 if (bitmap_set_contains_value (PA_IN (succ->dest), val))
                    {
                      if (optimize_edge_for_speed_p (succ))
                        do_insertion = true;
--- 3525,3532 ----
                 may cause regressions on the speed path.  */
              FOR_EACH_EDGE (succ, ei, block->succs)
                {
!                 if (bitmap_set_contains_value (PA_IN (succ->dest), val)
!                     || bitmap_set_contains_value (ANTIC_IN (succ->dest), val))
                    {
                      if (optimize_edge_for_speed_p (succ))
                        do_insertion = true;
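
The effect of the one-line change can be illustrated with a small model. The
sketch below is not GCC code: `Block`, `Edge` and `should_insert` are
hypothetical stand-ins for the basic block, the successor edge and the
successor loop shown in the diff, and Python sets replace the PA_IN and
ANTIC_IN bitmap sets.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    pa_in: set = field(default_factory=set)      # models PA_IN (block)
    antic_in: set = field(default_factory=set)   # models ANTIC_IN (block)


@dataclass
class Edge:
    dest: Block
    for_speed: bool = True   # models optimize_edge_for_speed_p (edge)


def should_insert(succs, val, consider_antic):
    """Return True if some speed-optimized successor edge anticipates val."""
    for edge in succs:
        anticipated = val in edge.dest.pa_in
        if consider_antic:
            anticipated = anticipated or val in edge.dest.antic_in
        if anticipated and edge.for_speed:
            return True
    return False


# Value 165 is fully anticipated in the successor, so it is in ANTIC_IN only.
loop_header = Block(antic_in={165})
succs = [Edge(dest=loop_header)]

print(should_insert(succs, 165, consider_antic=False))  # False: old check skips it
print(should_insert(succs, 165, consider_antic=True))   # True: patched check inserts
```

In this model the edge is always optimized for speed, yet the old check still
refuses the insertion, which matches the "not partially anticipated on any to
be optimized for speed edges" message quoted earlier in the thread.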



* [Bug tree-optimization/54717] [4.8 Regression] Runtime regression: polyhedron test "rnflow" degraded
  2012-09-26 12:54 [Bug tree-optimization/54717] New: Runtime regression: polyhedron test "rnflow" degraded sergos.gnu at gmail dot com
                   ` (13 preceding siblings ...)
  2012-11-14 20:11 ` hubicka at gcc dot gnu.org
@ 2012-11-15 10:28 ` hubicka at gcc dot gnu.org
  2012-11-15 10:52 ` hubicka at gcc dot gnu.org
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: hubicka at gcc dot gnu.org @ 2012-11-15 10:28 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54717

--- Comment #15 from Jan Hubicka <hubicka at gcc dot gnu.org> 2012-11-15 10:27:49 UTC ---
Patch posted at http://gcc.gnu.org/ml/gcc-patches/2012-11/msg01222.html
Can we figure out why the vectorization still does not happen?



* [Bug tree-optimization/54717] [4.8 Regression] Runtime regression: polyhedron test "rnflow" degraded
  2012-09-26 12:54 [Bug tree-optimization/54717] New: Runtime regression: polyhedron test "rnflow" degraded sergos.gnu at gmail dot com
                   ` (14 preceding siblings ...)
  2012-11-15 10:28 ` hubicka at gcc dot gnu.org
@ 2012-11-15 10:52 ` hubicka at gcc dot gnu.org
  2012-11-15 15:07 ` dominiq at lps dot ens.fr
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 21+ messages in thread
From: hubicka at gcc dot gnu.org @ 2012-11-15 10:52 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54717

--- Comment #16 from Jan Hubicka <hubicka at gcc dot gnu.org> 2012-11-15 10:52:13 UTC ---
OK, 4.7 vectorizes two loops in the function cptrf2:

loop at ../a.f90:3538

      if (nxtr < 4) then
         kerr = 1
         do ixtr = 1, nxtr - 1
           ixtrt (ixtr) = ixtr + 1
         enddo
         goto 9000
      endif


and 

loop at ../a.f90:3530


         ixtrt = 0


The second loop is recognized as memset by mainline, so it remains to figure
out what is wrong with the first loop.  It is unrolled:

Analyzing # of iterations of loop 9
  exit condition [1, + , 1](no_overflow) != ival2_27 + -1
  bounds on difference of bases: 0 ... 1
  result:
    # of iterations (unsigned int) ival2_27 + 4294967294, bounded by 1
Loop 9 iterates at most 1 times.
Estimating sizes for loop 9
 BB: 8, after_exit: 0
  size:   0 _38 = (integer(kind=8)) ixtr_12;
   Induction variable computation will be folded away.
  size:   1 _39 = _38 + -1;
   Induction variable computation will be folded away.
  size:   1 ixtr_40 = ixtr_12 + 1;
   Induction variable computation will be folded away.
  size:   1 *ixtrt_33(D)[_39] = ixtr_40;
  size:   2 if (ixtr_12 == _37)
   Exit condition will be eliminated in last copy.
 BB: 79, after_exit: 1
size: 5-2, last_iteration: 5-4
  Loop size: 5
  Estimated size after unrolling: 2
Unrolled loop 9 completely (duplicated 1 times).

I do not quite see why it iterates at most once, but it seems to work. So I
would say that it is a good idea to unroll rather than vectorize.
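
One possible reading of the "at most 1 times" bound is sketched below. It
rests on two assumptions that the dump does not confirm: that ival2_27 holds
nxtr, and that the reported number of iterations counts executions of the loop
back edge (one fewer than executions of the body). With the nxtr < 4 guard and
the 0 ... 1 bound on the difference of bases, nxtr would be 2 or 3.

```python
MODULUS = 2**32  # width of "unsigned int" in the dump


def latch_count(nxtr):
    """The dump's formula: (unsigned int) ival2_27 + 4294967294."""
    return (nxtr + 4294967294) % MODULUS


def body_executions(nxtr):
    """Trip count of the Fortran loop `do ixtr = 1, nxtr - 1`."""
    return len(range(1, nxtr))  # ixtr takes the values 1 .. nxtr - 1


for nxtr in (2, 3):
    # Adding 4294967294 modulo 2**32 is the same as subtracting 2.
    assert latch_count(nxtr) == nxtr - 2 == body_executions(nxtr) - 1
    print(nxtr, latch_count(nxtr), body_executions(nxtr))
```

Under those assumptions the body runs at most twice, which would also agree
with the "duplicated 1 times" note in the unrolling dump.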

Is the slowdown still reproducing with my patch?



* [Bug tree-optimization/54717] [4.8 Regression] Runtime regression: polyhedron test "rnflow" degraded
  2012-09-26 12:54 [Bug tree-optimization/54717] New: Runtime regression: polyhedron test "rnflow" degraded sergos.gnu at gmail dot com
                   ` (15 preceding siblings ...)
  2012-11-15 10:52 ` hubicka at gcc dot gnu.org
@ 2012-11-15 15:07 ` dominiq at lps dot ens.fr
  2012-11-16 10:37 ` hubicka at gcc dot gnu.org
  2012-12-06 16:51 ` rguenth at gcc dot gnu.org
  18 siblings, 0 replies; 21+ messages in thread
From: dominiq at lps dot ens.fr @ 2012-11-15 15:07 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54717

--- Comment #17 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2012-11-15 15:07:33 UTC ---
> Is the slowdown still reproducing with my patch?

Most of it (if not all) is gone with the patch: 
23.96s with '-fprotect-parens -Ofast -funroll-loops -ftree-loop-linear
-fomit-frame-pointer -fwhole-program -flto' compared to 
23.37s with '-fprotect-parens -Ofast -funroll-loops -ftree-loop-linear
-fomit-frame-pointer -fwhole-program -flto -fno-tree-loop-if-convert'.
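
To put a number on "most of it", the remaining gap between the two timings above can be computed directly. This is only arithmetic on the two figures quoted in this comment; the function name relative_slowdown is chosen here for illustration.

```python
def relative_slowdown(t, t_ref):
    """Fractional slowdown of time t relative to the reference time t_ref."""
    return (t - t_ref) / t_ref


if __name__ == "__main__":
    # 23.96s with the patch, 23.37s with -fno-tree-loop-if-convert added.
    gap = relative_slowdown(23.96, 23.37)
    print(f"{100 * gap:.1f}%")  # prints 2.5%
```

A gap of this size may be within run-to-run noise, which would be consistent with the "(if not all)" caveat.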



* [Bug tree-optimization/54717] [4.8 Regression] Runtime regression: polyhedron test "rnflow" degraded
  2012-09-26 12:54 [Bug tree-optimization/54717] New: Runtime regression: polyhedron test "rnflow" degraded sergos.gnu at gmail dot com
                   ` (16 preceding siblings ...)
  2012-11-15 15:07 ` dominiq at lps dot ens.fr
@ 2012-11-16 10:37 ` hubicka at gcc dot gnu.org
  2012-12-06 16:51 ` rguenth at gcc dot gnu.org
  18 siblings, 0 replies; 21+ messages in thread
From: hubicka at gcc dot gnu.org @ 2012-11-16 10:37 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54717

--- Comment #18 from Jan Hubicka <hubicka at gcc dot gnu.org> 2012-11-16 10:37:30 UTC ---
Author: hubicka
Date: Fri Nov 16 10:37:25 2012
New Revision: 193553

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=193553
Log:
    PR tree-optimization/54717
    * tree-ssa-pre.c (do_partial_partial_insertion): Consider also edges
    with ANTIC_IN.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-pre.c



* [Bug tree-optimization/54717] [4.8 Regression] Runtime regression: polyhedron test "rnflow" degraded
  2012-09-26 12:54 [Bug tree-optimization/54717] New: Runtime regression: polyhedron test "rnflow" degraded sergos.gnu at gmail dot com
                   ` (17 preceding siblings ...)
  2012-11-16 10:37 ` hubicka at gcc dot gnu.org
@ 2012-12-06 16:51 ` rguenth at gcc dot gnu.org
  18 siblings, 0 replies; 21+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-12-06 16:51 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54717

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #19 from Richard Biener <rguenth at gcc dot gnu.org> 2012-12-06 16:51:11 UTC ---
Fixed.


