public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/53346] New: [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
@ 2012-05-14 15:44 dominiq at lps dot ens.fr
From: dominiq at lps dot ens.fr @ 2012-05-14 15:44 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

             Bug #: 53346
           Summary: [4.6/4.7/4.8 Regression] Bad vectorization in the proc
                    cptrf2 of rnflow.f90
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: dominiq@lps.ens.fr
                CC: rguenth@gcc.gnu.org, ubizjak@gmail.com


At revision 187457 (i.e., with pr53340 fixed) on x86_64-apple-darwin10, after

[macbook] test/dbg_rnflow% gfc -c -O3 -ffast-math -funroll-loops timctr.f90
cmpcpt.f90 cptrf2.f90 dger.f90 dgetri.f90 dswap.f90 dtrsm.f90 evlrnf.f90
idamax.f90 main.f90 mattrs.f90 cmpmat.f90 dgemm.f90 dgetf2.f90 dlaswp.f90
dtrmm.f90 dtrti2.f90 extpic.f90 ilaenv.f90 matcnt.f90 reaseq.f90 xerbla.f90
cptrf1.f90 dgemv.f90 dgetrf.f90 dscal.f90 dtrmv.f90 dtrtri.f90 gentrs.f90
lsame.f90 matsim.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
23.872u 0.349s 0:24.22 99.9%    0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187339/bin/gfortran -c -O3 -ffast-math -funroll-loops evlrnf.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
22.259u 0.346s 0:22.61 99.9%    0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187291/bin/gfortran -c -O3 -ffast-math -funroll-loops idamax.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
22.252u 0.345s 0:22.60 99.9%    0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187102/bin/gfortran -c -O3 -ffast-math -funroll-loops idamax.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
22.121u 0.346s 0:22.47 99.9%    0+0k 0+0io 0pf+0w

(i.e., working around pr53342 and a regression for idamax.f90, see
below), compiling cptrf2.f90 (source attached to pr53340) with the
following flags yields these execution times (in seconds):

optimization level      4.4.6   4.5.3   4.6.3   4.7.0   r187457

-O2                      27.8    28.2    28.2    21.8    21.8
-O2 -ftree-vectorize     27.8    28.2    28.2    27.9    27.9
-O3                      22.0    21.3    25.1    25.3    25.3
-O3 -fno-tree-vectorize  22.1    21.3    21.4    21.4    21.4

Note that 4.5/4.6/4.7 vectorize two loops (lines 21 and 29), while 4.8
vectorizes only the loop at line 21 (for line 29 it reports "not vectorized:
iteration count too small.").
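To put the first table in relative terms, the short Python sketch below divides the -O3 time by the -O3 -fno-tree-vectorize time for each compiler; a result above 0% means enabling the vectorizer makes the program slower when cptrf2.f90 is compiled this way. The timings are copied from the table above; the helper name vectorization_penalty is hypothetical and not part of the report.

```python
# Execution times (s) copied from the cptrf2.f90 table; keys are compiler versions.
O3 = {"4.4.6": 22.0, "4.5.3": 21.3, "4.6.3": 25.1, "4.7.0": 25.3, "r187457": 25.3}
O3_NO_VECT = {"4.4.6": 22.1, "4.5.3": 21.3, "4.6.3": 21.4, "4.7.0": 21.4, "r187457": 21.4}


def vectorization_penalty(vect_s: float, novect_s: float) -> float:
    """Relative slowdown (in %) of the vectorized build versus the
    -fno-tree-vectorize build; negative means vectorization helped."""
    return (vect_s / novect_s - 1.0) * 100.0


for version in O3:
    pct = vectorization_penalty(O3[version], O3_NO_VECT[version])
    print(f"{version:>8}: {pct:+.1f}%")
```

With these figures the vectorized -O3 build is roughly 17-18% slower than -O3 -fno-tree-vectorize for 4.6.3, 4.7.0 and r187457, while 4.4.6 and 4.5.3 show no penalty.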

Looking at my archives, I found that a first regression appeared
between revisions 162456 and 164728:

optimization level      4.6-162456 4.6p-164728

-O2                             28.2    28.3
-O2 -ftree-vectorize            28.1    28.3
-O3                             21.4    29.4
-O3 -fno-tree-vectorize         21.3    21.4
-O3 -ffast-math                 21.4    22.3
-O3 -ffast-math -funroll-loops  21.9    22.4

For the record, as mentioned above, the compilation of idamax.f90 regressed
between revisions 187102 and 187291:

[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187291/bin/gfortran -c -O3 -ffast-math -funroll-loops idamax.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
22.252u 0.345s 0:22.60 99.9%    0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187102/bin/gfortran -c -O3 -ffast-math -funroll-loops idamax.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
22.121u 0.346s 0:22.47 99.9%    0+0k 0+0io 0pf+0w
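The size of the idamax.f90 regression can be expressed from the two user times above (22.121 s with r187102, 22.252 s with r187291). A minimal Python sketch, with a hypothetical helper name:

```python
def relative_slowdown(before_s: float, after_s: float) -> float:
    """Percentage increase in run time going from before_s to after_s."""
    return (after_s - before_s) / before_s * 100.0


# User times (s) of a.out with idamax.f90 compiled by r187102 and r187291.
r187102, r187291 = 22.121, 22.252
delta = r187291 - r187102
print(f"delta = {delta:.3f} s, slowdown = {relative_slowdown(r187102, r187291):.2f}%")
```

This gives a difference of 0.131 s, i.e., roughly 0.6% of the total run time of rnflow.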

Although the regression is only slightly above the noise margin at the level
of rnflow.f90, it could be worth investigating because:
(1) it is a BLAS routine used by LAPACK (possibly slightly modified),
(2) there are equivalent intrinsics in F90,
(3) the slowdown may be quite significant at the level of the procedure itself.
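For point (2), the Python sketch below illustrates what IDAMAX computes. It is not the Fortran in rnflow (whose copy may be modified), and the function names idamax and maxloc_abs are hypothetical stand-ins. Assuming reference-BLAS semantics, IDAMAX returns the 1-based index of the first element with the largest absolute value, or 0 when n < 1 or incx <= 0. For unit stride this matches the F90 expression MAXLOC(ABS(DX(1:N))), which also picks the first maximum in array element order.

```python
from typing import Sequence


def idamax(n: int, dx: Sequence[float], incx: int) -> int:
    """Sketch of reference BLAS IDAMAX: 1-based index of the first of n
    elements of dx (stride incx) with the largest absolute value."""
    if n < 1 or incx <= 0:
        return 0
    best_i, best_val = 1, abs(dx[0])
    for i in range(1, n):
        val = abs(dx[i * incx])
        if val > best_val:  # strict '>' keeps the first maximum on ties
            best_i, best_val = i + 1, val
    return best_i


def maxloc_abs(x: Sequence[float]) -> int:
    """Python analogue of MAXLOC(ABS(X)) for a 1-D array: 1-based index
    of the first maximum of |x|; 0 for an empty array."""
    if len(x) == 0:
        return 0
    # max() returns the first maximal item encountered, like MAXLOC.
    return max(range(len(x)), key=lambda i: abs(x[i])) + 1
```

For example, with x = [1.0, -3.0, 2.0, 3.0] both functions return 2: the tie between |-3.0| and |3.0| goes to the first occurrence.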




Thread overview: 27+ messages
2012-05-14 15:44 [Bug tree-optimization/53346] New: [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90 dominiq at lps dot ens.fr
2012-05-15  9:54 ` [Bug tree-optimization/53346] " rguenth at gcc dot gnu.org
2012-05-15 12:55 ` dominiq at lps dot ens.fr
2012-05-17 18:35 ` ubizjak at gmail dot com
2012-05-17 20:47 ` ubizjak at gmail dot com
2012-05-18 11:49 ` rguenth at gcc dot gnu.org
2012-05-18 14:28 ` rguenth at gcc dot gnu.org
2012-05-18 14:32 ` rguenth at gcc dot gnu.org
2012-05-18 14:49 ` ubizjak at gmail dot com
2012-05-18 14:52 ` dominiq at lps dot ens.fr
2012-05-18 15:13 ` ubizjak at gmail dot com
2012-05-18 17:32 ` ubizjak at gmail dot com
2012-05-18 17:34 ` ubizjak at gmail dot com
2012-05-18 17:46 ` ubizjak at gmail dot com
2012-05-18 17:48 ` ubizjak at gmail dot com
2012-05-18 17:56 ` pinskia at gcc dot gnu.org
2012-05-18 18:27 ` hjl.tools at gmail dot com
2012-05-18 18:27 ` ubizjak at gmail dot com
2012-05-18 19:45 ` dominiq at lps dot ens.fr
2012-05-19 23:50 ` dominiq at lps dot ens.fr
2012-09-07 11:59 ` [Bug target/53346] " rguenth at gcc dot gnu.org
2012-11-14 22:19 ` hubicka at gcc dot gnu.org
2012-11-14 22:38 ` hubicka at gcc dot gnu.org
2012-12-31  9:20 ` [Bug target/53346] [4.6/4.7/4.8 Regression] Bad if conversion in " pinskia at gcc dot gnu.org
2012-12-31  9:41 ` pinskia at gcc dot gnu.org
2022-09-26  3:22 ` cvs-commit at gcc dot gnu.org
2022-09-26  3:24 ` crazylht at gmail dot com
