public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/53346] New: [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: dominiq at lps dot ens.fr @ 2012-05-14 15:44 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

             Bug #: 53346
           Summary: [4.6/4.7/4.8 Regression] Bad vectorization in the proc
                    cptrf2 of rnflow.f90
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: dominiq@lps.ens.fr
                CC: rguenth@gcc.gnu.org, ubizjak@gmail.com


At revision 187457 (i.e., with pr53340 fixed) on x86_64-apple-darwin10, after

[macbook] test/dbg_rnflow% gfc -c -O3 -ffast-math -funroll-loops timctr.f90
cmpcpt.f90 cptrf2.f90 dger.f90 dgetri.f90 dswap.f90 dtrsm.f90 evlrnf.f90
idamax.f90 main.f90 mattrs.f90 cmpmat.f90 dgemm.f90 dgetf2.f90 dlaswp.f90
dtrmm.f90 dtrti2.f90 extpic.f90 ilaenv.f90 matcnt.f90 reaseq.f90 xerbla.f90
cptrf1.f90 dgemv.f90 dgetrf.f90 dscal.f90 dtrmv.f90 dtrtri.f90 gentrs.f90
lsame.f90 matsim.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
23.872u 0.349s 0:24.22 99.9%    0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187339/bin/gfortran -c -O3 -ffast-math -funroll-loops evlrnf.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
22.259u 0.346s 0:22.61 99.9%    0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187291/bin/gfortran -c -O3 -ffast-math -funroll-loops idamax.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
22.252u 0.345s 0:22.60 99.9%    0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187102/bin/gfortran -c -O3 -ffast-math -funroll-loops idamax.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
22.121u 0.346s 0:22.47 99.9%    0+0k 0+0io 0pf+0w

(i.e., working around pr53342 and a regression for idamax.f90, see
below), compiling cptrf2.f90 (source attached to pr53340) with the
following flags gives these run times (in seconds):

optimization level      4.4.6   4.5.3   4.6.3   4.7.0   r187457

-O2                      27.8    28.2    28.2    21.8    21.8
-O2 -ftree-vectorize     27.8    28.2    28.2    27.9    27.9
-O3                      22.0    21.3    25.1    25.3    25.3
-O3 -fno-tree-vectorize  22.1    21.3    21.4    21.4    21.4

Note that 4.5/4.6/4.7 vectorize two loops (lines 21 and 29), while 4.8
vectorizes only the loop at line 21 (line 29: "not vectorized: iteration
count too small.").

Looking through my archives, I found that a first regression appeared
between revisions 162456 and 164728:

optimization level      4.6-162456 4.6p-164728

-O2                             28.2    28.3
-O2 -ftree-vectorize            28.1    28.3
-O3                             21.4    29.4
-O3 -fno-tree-vectorize         21.3    21.4
-O3 -ffast-math                 21.4    22.3
-O3 -ffast-math -funroll-loops  21.9    22.4

For the record, as mentioned above, the compilation of idamax regressed
between revisions 187102 and 187291:

[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187291/bin/gfortran -c -O3 -ffast-math -funroll-loops idamax.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
22.252u 0.345s 0:22.60 99.9%    0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187102/bin/gfortran -c -O3 -ffast-math -funroll-loops idamax.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
22.121u 0.346s 0:22.47 99.9%    0+0k 0+0io 0pf+0w

Although the regression is only slightly above the noise margin at the level
of rnflow.f90, it could be worth investigating because:
(1) it is a LAPACK routine (possibly slightly modified),
(2) there are equivalent intrinsics in F90,
(3) the slowdown may be quite significant at the level of the proc itself.
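For readers unfamiliar with idamax, here is a minimal C sketch of what the reference BLAS IDAMAX returns, assuming unit stride (incx = 1) to keep it short; the name idamax_sketch is hypothetical. It returns the 1-based index of the first element of largest absolute value, or 0 when n < 1. For n >= 1 this is what the F90 expression MAXLOC(ABS(dx), 1) computes, which is point (2) above.

```c
/* Sketch of reference-BLAS IDAMAX semantics, unit stride only.
   Returns the 1-based index of the first element with the largest
   absolute value, or 0 when n < 1. */
int idamax_sketch(int n, const double *dx)
{
    if (n < 1)
        return 0;

    int best = 1;                         /* 1-based, as in Fortran */
    double dmax = dx[0] < 0 ? -dx[0] : dx[0];

    for (int i = 1; i < n; i++) {
        double v = dx[i] < 0 ? -dx[i] : dx[i];
        if (v > dmax) {                   /* strict '>' keeps the first maximum */
            best = i + 1;
            dmax = v;
        }
    }
    return best;
}
```

For example, dx = {1.0, -3.0, 2.0, 3.0} gives 2: both -3.0 and 3.0 have the largest magnitude, and the first one wins.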


* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: rguenth at gcc dot gnu.org @ 2012-05-15  9:54 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
   Target Milestone|---                         |4.8.0

--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-05-15 09:43:31 UTC ---
Do you possibly have a testcase?


* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: dominiq at lps dot ens.fr @ 2012-05-15 12:55 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

--- Comment #2 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2012-05-15 12:39:50 UTC ---
> Do you possibly have a testcase?

I am not sure I understand what you are asking for.

The source for cptrf2.f90 has been attached to pr53340. I can provide a version
of rnflow without the proc cptrf2, or an archive with the rnflow.f90 source
split into one file per proc.

If you are asking for a reduced test, that is much more difficult:
(1) the code is not mine and I don't know it well,
(2) optimizations may change with tiny details of the source layout.


* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: ubizjak at gmail dot com @ 2012-05-17 18:35 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

Uros Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-05-17
     Ever Confirmed|0                           |1

--- Comment #3 from Uros Bizjak <ubizjak at gmail dot com> 2012-05-17 18:29:12 UTC ---
Confirmed, -O2 vs. -O2 -ftree-vectorize on x86_64:

-O2 -ftree-vectorize:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 43.83      9.73     9.73       64     0.15     0.15  cptrf2_
 40.68     18.76     9.03     6685     0.00     0.00  trs2a2.2054
  7.70     20.47     1.71       64     0.03     0.03  gentrs_
  1.49     20.80     0.33       64     0.01     0.01  cptrf1_
  1.40     21.11     0.31        1     0.31    12.33  matsim_
  1.40     21.42     0.31     6685     0.00     0.00  invima.2045
  1.13     21.67     0.25       64     0.00     0.00  cmpcpt_

-O2:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 55.20      9.20     9.20     6685     0.00     0.00  trs2a2.2054
 23.40     13.10     3.90       64     0.06     0.06  cptrf2_
 10.38     14.83     1.73       64     0.03     0.03  gentrs_
  2.58     15.26     0.43       64     0.01     0.01  cptrf1_
  2.34     15.65     0.39     6685     0.00     0.00  invima.2045
  1.98     15.98     0.33        1     0.33     6.58  matsim_
  1.14     16.17     0.19       64     0.00     0.00  cmpcpt_

cptrf2_ runtime increased by almost 6 seconds!

The only vectorization is in:

3530: LOOP VECTORIZED.
rnflow.f90:3510: note: vectorized 1 loops in function.

Which corresponds to:

! ______________________________________________________________________
      real, dimension (1:nxtr), intent (in)     :: xxtrt ! extrema
      integer, intent (in)                      :: nxtr  ! their number
      integer, dimension (1:nxtr), intent (out) :: ixtrt ! indices
      integer, intent (out)                     :: kerr  ! error code
! ______________________________________________________________________
!
      kerr = 0
      ixtrt = 0                  <<<<<<<<<<<<<< HERE

The vectorized code zeroes the ixtrt array:

    pxor    %xmm0, %xmm0
    leaq    (%rdx,%r8,4), %r8
    xorl    %esi, %esi
    .p2align 4,,10
    .p2align 3
.L183:
    addq    $1, %rsi
    movdqa    %xmm0, (%r8)
    addq    $16, %r8
    cmpq    %rsi, %r11
    ja    .L183

And this causes a 6-second difference?!


* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: ubizjak at gmail dot com @ 2012-05-17 20:47 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

--- Comment #4 from Uros Bizjak <ubizjak at gmail dot com> 2012-05-17 20:09:42 UTC ---
Instead of this:

.L228:
    movl    $0, -4(%rdx,%rax,4)
    addq    $1, %rax
    cmpq    %rax, %rsi
    jge    .L228

vectorization generates the following:

    movq    %rdx, %rax
    movq    %r9, %r8
    andl    $15, %eax
    shrq    $2, %rax
    negq    %rax
    andl    $3, %eax
    cmpq    %r9, %rax
    cmovbe    %rax, %r8
    cmpq    $6, %r9
    cmovbe    %r9, %r8
    testq    %r8, %r8
    je    .L233
    leaq    1(%r8), %rsi
    movl    $1, %eax
    .p2align 4,,10
    .p2align 3
.L176:
    movl    $0, -4(%rdx,%rax,4)
    addq    $1, %rax
    cmpq    %rsi, %rax
    jne    .L176
    cmpq    %r9, %r8
    je    .L182
.L174:
    movq    %r9, %rbp
    subq    %r8, %rbp
    movq    %rbp, %r11
    shrq    $2, %r11
    leaq    0(,%r11,4), %rbx
    testq    %rbx, %rbx
    je    .L181
    pxor    %xmm0, %xmm0
    leaq    (%rdx,%r8,4), %r8
    xorl    %esi, %esi
    .p2align 4,,10
    .p2align 3
.L183:
    addq    $1, %rsi
    movdqa    %xmm0, (%r8)
    addq    $16, %r8
    cmpq    %rsi, %r11
    ja    .L183
    addq    %rbx, %rax
    cmpq    %rbx, %rbp
    je    .L182
    .p2align 4,,10
    .p2align 3
.L181:
    movl    $0, -4(%rdx,%rax,4)
    addq    $1, %rax
    cmpq    %rax, %r9
    jge    .L181

Whoa.


* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: rguenth at gcc dot gnu.org @ 2012-05-18 11:49 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|unassigned at gcc dot       |rguenth at gcc dot gnu.org
                   |gnu.org                     |

--- Comment #5 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-05-18 11:02:53 UTC ---
Yeah, this is sort-of related to what is observed in PR53355.  I suppose at
runtime nxtr is comparatively small.

Reduced testcase:

      subroutine cptrf2 (nxtr, ixtrt)
      integer, dimension (1:nxtr), intent (out) :: ixtrt
      ixtrt = 0
      end subroutine

we peel the loop to possibly align the stores, and we peel the loop
to possibly take care of a remaining store at the end of the array.
And of course we compute that we need at least 6 scalar iterations
to make executing the vectorized loop profitable.
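To make the peeling concrete, here is a hand-written C rendering of the control structure in the comment #4 assembly: a scalar prologue that runs until the store address is 16-byte aligned (the same (-((addr & 15) >> 2)) & 3 computation as the andl/shrq/negq/andl sequence), a body that zeroes four ints per iteration (standing in for one movdqa of a zeroed %xmm0), and a scalar epilogue. As in the assembly, when n <= 6 everything goes through the scalar prologue. This is an illustrative sketch, not GCC's output; struct peel_stats and zero_peeled are hypothetical names, and a is assumed to be at least 4-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters showing which part of the loop did the work. */
struct peel_stats {
    size_t prologue;  /* scalar stores before the vector loop */
    size_t vector;    /* 4-int (16-byte) chunks in the vector loop */
    size_t epilogue;  /* scalar stores after the vector loop */
};

/* Zero a[0..n-1] with the same shape as the vectorized code:
   alignment prologue, 4-wide body, scalar epilogue. */
void zero_peeled(int *a, size_t n, struct peel_stats *st)
{
    st->prologue = st->vector = st->epilogue = 0;

    /* Ints until the next 16-byte boundary: (-((addr & 15) >> 2)) & 3 */
    uintptr_t mis = ((uintptr_t)a & 15) >> 2;
    size_t peel = (size_t)((0 - mis) & 3);

    if (peel > n)
        peel = n;       /* never peel more than n iterations */
    if (n <= 6)
        peel = n;       /* cost model: too short, stay scalar */

    size_t i = 0;
    for (; i < peel; i++) {
        a[i] = 0;
        st->prologue++;
    }

    size_t chunks = (n - i) / 4;
    for (size_t c = 0; c < chunks; c++, i += 4) {
        /* stands in for one movdqa store of a zeroed %xmm0 */
        a[i] = a[i + 1] = a[i + 2] = a[i + 3] = 0;
        st->vector++;
    }

    for (; i < n; i++) {
        a[i] = 0;
        st->epilogue++;
    }
}
```

With a 16-byte-aligned buffer offset by one int and n = 20, this does 3 prologue stores, 4 vector chunks and 1 epilogue store; with n = 5 all 5 stores happen in the prologue. For small nxtr most of the work is the setup, not the vector loop.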

And apart from all that we should have recognized the loop as memset.
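At the source level, recognizing the loop as memset amounts to the equivalence sketched below: the reduced testcase written as a C loop, and the single library call it can be replaced by, which then handles alignment and short lengths itself. This only illustrates the transformation; it is not the GIMPLE that -ftree-loop-distribute-patterns produces, and both function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* C analogue of the reduced testcase: ixtrt = 0 for 1..nxtr. */
void cptrf2_loop(int nxtr, int *ixtrt)
{
    for (int i = 0; i < nxtr; i++)
        ixtrt[i] = 0;
}

/* The loop replaced by one memset call. The guard mirrors the
   empty-loop case (nxtr <= 0), where nothing is stored. */
void cptrf2_memset(int nxtr, int *ixtrt)
{
    if (nxtr > 0)
        memset(ixtrt, 0, (size_t)nxtr * sizeof *ixtrt);
}
```

Both functions leave the array in the same state; the difference is only in who pays for the alignment and remainder handling.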

Mine.


* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: rguenth at gcc dot gnu.org @ 2012-05-18 14:28 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

--- Comment #7 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-05-18 13:10:28 UTC ---
Fixed.


* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: rguenth at gcc dot gnu.org @ 2012-05-18 14:32 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

--- Comment #6 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-05-18 13:10:11 UTC ---
Author: rguenth
Date: Fri May 18 13:10:01 2012
New Revision: 187655

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=187655
Log:
2012-05-18  Richard Guenther  <rguenther@suse.de>

    PR tree-optimization/53346
    * tree-loop-distribution.c (ldist_gen): Make sure to apply
    builtin transform even when only a single partition with
    all reads/writes exists.

    * gcc.dg/tree-ssa/ldist-18.c: New testcase.
    * gcc.target/i386/incoming-10.c: Adjust.
    * gcc.target/i386/incoming-11.c: Likewise.
    * gcc.target/i386/pr46295.c: Likewise.

Added:
    trunk/gcc/testsuite/gcc.dg/tree-ssa/ldist-18.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.target/i386/incoming-10.c
    trunk/gcc/testsuite/gcc.target/i386/incoming-11.c
    trunk/gcc/testsuite/gcc.target/i386/pr46295.c
    trunk/gcc/tree-loop-distribution.c


* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: ubizjak at gmail dot com @ 2012-05-18 14:49 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

Uros Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |NEW
         Resolution|FIXED                       |

--- Comment #8 from Uros Bizjak <ubizjak at gmail dot com> 2012-05-18 14:46:01 UTC ---
(In reply to comment #7)
> Fixed.

Unfortunately, the loop in original rnflow test still gets vectorized, with no
change in the runtime:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 43.46      9.69     9.69       64     0.15     0.15  cptrf2_
 40.63     18.75     9.06     6685     0.00     0.00  trs2a2.2054
  7.89     20.51     1.76       64     0.03     0.03  gentrs_
  2.02     20.96     0.45     6685     0.00     0.00  invima.2045
  1.93     21.39     0.43       64     0.01     0.01  cptrf1_
  1.17     21.65     0.26        1     0.26    12.36  matsim_
  0.99     21.87     0.22       64     0.00     0.00  cmpcpt_

GNU Fortran (GCC) version 4.8.0 20120518 (experimental) [trunk revision 187655]


* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: dominiq at lps dot ens.fr @ 2012-05-18 14:52 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

--- Comment #9 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2012-05-18 14:49:22 UTC ---
> Unfortunately, the loop in original rnflow test still gets vectorized, with no
> change in the runtime:

Confirmed; at revision 187655 I still get:

-O2                      21.8
-O2 -ftree-vectorize     27.9
-O3                      25.2
-O3 -fno-tree-vectorize  21.4

Uneducated guess: is it possible that failed attempts to vectorize may mess up
further optimizations?


* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: ubizjak at gmail dot com @ 2012-05-18 15:13 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

--- Comment #10 from Uros Bizjak <ubizjak at gmail dot com> 2012-05-18 15:11:53 UTC ---
(In reply to comment #8)
> (In reply to comment #7)
> > Fixed.
> 
> Unfortunately, the loop in original rnflow test still gets vectorized, with no
> change in the runtime:

With -O2 -ftree-loop-distribute-patterns -ftree-vectorize, the runtime is still
the same:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 43.76      9.70     9.70       64     0.15     0.15  cptrf2_
 40.69     18.72     9.02     6685     0.00     0.00  trs2a2.2054
  7.35     20.35     1.63       64     0.03     0.03  gentrs_
  2.21     20.84     0.49       64     0.01     0.01  cptrf1_
  1.44     21.16     0.32        1     0.32    12.32  matsim_
  1.17     21.42     0.26     6685     0.00     0.00  invima.2045
  0.81     21.60     0.18       64     0.00     0.00  cmpcpt_
  0.54     21.72     0.12        1     0.12     9.85  evlrnf_


* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: ubizjak at gmail dot com @ 2012-05-18 17:32 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

--- Comment #11 from Uros Bizjak <ubizjak at gmail dot com> 2012-05-18 16:04:46 UTC ---
(In reply to comment #9)

> Uneducated guess: is it possible that failed attempts to vectorize may mess up
> further optimizations?

You are right. -ftree-vectorize implies -ftree-loop-if-convert and this option
makes all the difference!

-O2 -ftree-vectorize:

real    0m24.061s
user    0m23.789s
sys     0m0.225s

-O2 -ftree-vectorize -fno-tree-loop-if-convert

real    0m18.029s
user    0m17.761s
sys     0m0.220s

We were barking up the wrong tree. ;)
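For context, here is a C sketch of the kind of loop involved, reconstructed from the cmova loops in the perf listing of comment #16 rather than from the cptrf2 source, which is not quoted in this thread: a backward search for the index of the smallest element, written once with an explicit branch and once as an unconditional select. The second shape is roughly what if-conversion makes of the loop body, and the select is what ended up as cmova. A compiler may emit either instruction sequence for either function, so the source form is only suggestive; the function names are hypothetical.

```c
/* Index of the smallest x[i] for lo <= i <= hi, scanning downward.
   The comparison is strict, so ties keep the higher index (cmova
   likewise does not move when the operands compare equal). */

/* Branch form: the update is guarded by a conditional jump. */
int minloc_branch(const float *x, int hi, int lo)
{
    int best = hi;
    for (int i = hi - 1; i >= lo; i--) {
        if (x[best] > x[i])
            best = i;
    }
    return best;
}

/* Select form: roughly the shape if-conversion gives the loop body. */
int minloc_select(const float *x, int hi, int lo)
{
    int best = hi;
    for (int i = hi - 1; i >= lo; i--)
        best = (x[best] > x[i]) ? i : best;
    return best;
}
```

Both return the same index; for x = {3, 1, 4, 1, 5, 9, 2, 6} over 0..7 the answer is 3, because the strict comparison keeps the later of the two 1s.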


* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: ubizjak at gmail dot com @ 2012-05-18 17:34 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

--- Comment #12 from Uros Bizjak <ubizjak at gmail dot com> 2012-05-18 16:07:45 UTC ---
(In reply to comment #11)

> You are right. -ftree-vectorize implies -ftree-loop-if-convert and this option
> makes all the difference!
> 
> -O2 -ftree-vectorize:
> 
> real    0m24.061s
> user    0m23.789s
> sys     0m0.225s
> 
> -O2 -ftree-vectorize -fno-tree-loop-if-convert
> 
> real    0m18.029s
> user    0m17.761s
> sys     0m0.220s

-O2 -ftree-loop-if-convert:

real    0m24.034s
user    0m23.770s
sys     0m0.218s

-O2

real    0m18.163s
user    0m17.892s
sys     0m0.233s


* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: ubizjak at gmail dot com @ 2012-05-18 17:46 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

Uros Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl.tools at gmail dot com

--- Comment #14 from Uros Bizjak <ubizjak at gmail dot com> 2012-05-18 17:17:45 UTC ---
Compile and execute the slow assembly:

gfortran rnflow.s && time ./a.out

real    0m24.454s
user    0m24.167s
sys     0m0.231s

Apply the following patch, which changes cmov to jumps in the very fast loops (cptrf2):

--cut here--
--- rnflow.s    2012-05-18 19:00:22.314102061 +0200
+++ rnflow1.s   2012-05-18 19:10:59.363428625 +0200
@@ -1305,7 +1305,9 @@
        movslq  %edx, %rbx
        movss   -4(%rdi,%rbx,4), %xmm0
        ucomiss (%r9), %xmm0
-       cmova   %ecx, %edx
+       jbe     .L183x
+       movl    %ecx, %edx
+.L183x:
        subl    $1, %ecx
        subq    $4, %r9
        cmpl    %r10d, %ecx
@@ -1329,7 +1331,9 @@
        movslq  %ecx, %r10
        movss   -4(%rdi,%r10,4), %xmm0
        ucomiss (%r9), %xmm0
-       cmova   %r11d, %ecx
+       jbe     .L192x
+       movl    %r11d, %ecx
+.L192x:
        subl    $1, %r11d
        subq    $4, %r9
        cmpl    %eax, %r11d
@@ -1485,7 +1489,9 @@
        movslq  %edx, %r10
        movss   -4(%rdi,%r10,4), %xmm0
        ucomiss (%r9), %xmm0
-       cmova   %ecx, %edx
+       jbe     .L179x
+       movl    %ecx, %edx
+.L179x:
        subq    $4, %r9
        subl    $1, %ecx
        jne     .L179
--cut here--

gfortran rnflow.s && time ./a.out

real    0m18.170s
user    0m17.907s
sys     0m0.223s

WTF happened here?!

Relevant part of my /proc/cpuinfo:

vendor_id       : GenuineIntel
cpu family      : 6
model           : 42

Adding CC.


* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: ubizjak at gmail dot com @ 2012-05-18 17:48 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

--- Comment #13 from Uros Bizjak <ubizjak at gmail dot com> 2012-05-18 17:08:08 UTC ---
Created attachment 27435
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27435
slow x86_64 assembly, obtained with -O2 -ftree-loop-if-convert

This is the slow assembly, stay tuned for the WTF part.


* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: pinskia at gcc dot gnu.org @ 2012-05-18 17:56 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

--- Comment #15 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-05-18 17:54:16 UTC ---
(In reply to comment #14)
> Compile and execute slow assembly:
> real    0m18.170s
> user    0m17.907s
> sys     0m0.223s
> 
> WTF happened here?!

Are conditional moves that bad on x86?  The change which uses them more for
COND_EXPR was mine, but really I think this was a latent bug or a way to say
choose conditional move over jumps for some targets.


* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: ubizjak at gmail dot com @ 2012-05-18 18:27 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

--- Comment #16 from Uros Bizjak <ubizjak at gmail dot com> 2012-05-18 18:24:43 UTC ---
Perf confirms these findings. The first loop:

    0.02 :          401e10:       movslq %edx,%rbx
    5.04 :          401e13:       movss  -0x4(%rdi,%rbx,4),%xmm0
   24.97 :          401e19:       ucomiss (%r9),%xmm0
   14.66 :          401e1d:       cmova  %ecx,%edx
   15.37 :          401e20:       sub    $0x1,%ecx
    0.00 :          401e23:       sub    $0x4,%r9
    0.00 :          401e27:       cmp    %r10d,%ecx
    0.00 :          401e2a:       jne    401e10 <cptrf2_+0x230>

the second:

    0.00 :          401e60:       movslq %ecx,%r10
    1.69 :          401e63:       movss  -0x4(%rdi,%r10,4),%xmm0
    7.78 :          401e6a:       ucomiss (%r9),%xmm0
    4.75 :          401e6e:       cmova  %r11d,%ecx
    4.52 :          401e72:       sub    $0x1,%r11d
    0.00 :          401e76:       sub    $0x4,%r9
    0.05 :          401e7a:       cmp    %eax,%r11d
    0.00 :          401e7d:       jne    401e60 <cptrf2_+0x280>

the third:

    0.00 :          401ff8:       movslq %edx,%r10
    0.78 :          401ffb:       movss  -0x4(%rdi,%r10,4),%xmm0
    3.14 :          402002:       ucomiss (%r9),%xmm0
    2.04 :          402006:       cmova  %ecx,%edx
    1.89 :          402009:       sub    $0x4,%r9
    0.00 :          40200d:       sub    $0x1,%ecx
    0.00 :          402010:       jne    401ff8 <cptrf2_+0x418>


* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: hjl.tools at gmail dot com @ 2012-05-18 18:27 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

--- Comment #17 from H.J. Lu <hjl.tools at gmail dot com> 2012-05-18 18:27:21 UTC ---
I was told that cmov wins if the branch is mispredicted; otherwise
cmov loses.  We will investigate whether we can improve cmov in GCC.
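To make "mispredicted" concrete, here is a deliberately simplified model. It is an assumption for illustration, not how any real x86 predictor works: a 1-bit predictor that always guesses the previous outcome of `x[i] < y[i]`. The helper name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper, not part of GCC: models a 1-bit predictor that
   always guesses "same outcome as last time" for the condition
   x[i] < y[i], and returns how many of its guesses were wrong. */
static size_t
mispredicts_1bit (const float *x, const float *y, size_t n)
{
  size_t miss = 0;
  int last = 0;                   /* predictor state: previous outcome */
  for (size_t i = 0; i < n; i++)
    {
      int taken = x[i] < y[i];    /* actual outcome of the branch */
      if (taken != last)          /* the guess (== last) was wrong */
        miss++;
      last = taken;
    }
  return miss;
}
```

With a steady input, the model misses once and then never again. With alternating outcomes, it misses on every iteration. Per the remark above, only inputs of the second kind are where a cmov would be expected to pay off.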



* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
  2012-05-14 15:44 [Bug tree-optimization/53346] New: [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90 dominiq at lps dot ens.fr
                   ` (16 preceding siblings ...)
  2012-05-18 18:27 ` hjl.tools at gmail dot com
@ 2012-05-18 19:45 ` dominiq at lps dot ens.fr
  2012-05-19 23:50 ` dominiq at lps dot ens.fr
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: dominiq at lps dot ens.fr @ 2012-05-18 19:45 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

--- Comment #18 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2012-05-18 18:29:06 UTC ---
> Are conditional moves that bad on x86?  The change which uses them more for
> COND_EXPR was mine but really I think this was a latent bug or a way to say
> choose conditional move over jumps for some targets.

As said in comment #0, the first regression appeared between revisions 162456
(2010-07-23) and 164728 (2010-09-29), so the problem is fairly old:

[macbook] test/dbg_rnflow% /opt/gcc/gcc4.6p-162456/bin/gfortran -c -O3
cptrf2.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
20.904u 0.345s 0:21.26 99.9%    0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.6p-162456/bin/gfortran -c -O3
-fno-tree-loop-if-convert cptrf2.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
20.898u 0.341s 0:21.24 99.9%    0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.6p-164728/bin/gfortran -c -O3
cptrf2.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
28.607u 0.346s 0:28.96 99.9%    0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.6p-164728/bin/gfortran -c -O3
-fno-tree-loop-if-convert cptrf2.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
21.153u 0.342s 0:21.50 99.9%    0+0k 0+0io 0pf+0w



* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
  2012-05-14 15:44 [Bug tree-optimization/53346] New: [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90 dominiq at lps dot ens.fr
                   ` (17 preceding siblings ...)
  2012-05-18 19:45 ` dominiq at lps dot ens.fr
@ 2012-05-19 23:50 ` dominiq at lps dot ens.fr
  2012-09-07 11:59 ` [Bug target/53346] " rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: dominiq at lps dot ens.fr @ 2012-05-19 23:50 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

Dominique d'Humieres <dominiq at lps dot ens.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |matz at gcc dot gnu.org

--- Comment #19 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2012-05-19 22:19:02 UTC ---
The change in timing occurred at revision 163998:

Author:    matz
Date:    Wed Sep 8 12:34:52 2010 UTC
Changed paths:    4
Log Message:    
    PR tree-optimization/33244
    * tree-ssa-sink.c (statement_sink_location): Don't sink into
    empty loop latches.

testsuite/
    PR tree-optimization/33244
    * gfortran.dg/vect/fast-math-vect-8.f90: New test.

[macbook] test/dbg_rnflow% /opt/gcc/gcc4.6p-163997/bin/gfortran -c -O3
cptrf2.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
20.881u 0.345s 0:21.37 99.2%    0+0k 3+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.6p-163998/bin/gfortran -c -O3
cptrf2.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
28.545u 0.351s 0:29.06 99.4%    0+0k 3+0io 0pf+0w



* [Bug target/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
  2012-05-14 15:44 [Bug tree-optimization/53346] New: [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90 dominiq at lps dot ens.fr
                   ` (18 preceding siblings ...)
  2012-05-19 23:50 ` dominiq at lps dot ens.fr
@ 2012-09-07 11:59 ` rguenth at gcc dot gnu.org
  2012-11-14 22:19 ` hubicka at gcc dot gnu.org
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-09-07 11:59 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*, i?86-*-*
           Priority|P3                          |P2
          Component|tree-optimization           |target
         AssignedTo|rguenth at gcc dot gnu.org  |unassigned at gcc dot
                   |                            |gnu.org

--- Comment #20 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-09-07 11:58:31 UTC ---
This turned into a target bug about cmov.



* [Bug target/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
  2012-05-14 15:44 [Bug tree-optimization/53346] New: [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90 dominiq at lps dot ens.fr
                   ` (19 preceding siblings ...)
  2012-09-07 11:59 ` [Bug target/53346] " rguenth at gcc dot gnu.org
@ 2012-11-14 22:19 ` hubicka at gcc dot gnu.org
  2012-11-14 22:38 ` hubicka at gcc dot gnu.org
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: hubicka at gcc dot gnu.org @ 2012-11-14 22:19 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #21 from Jan Hubicka <hubicka at gcc dot gnu.org> 2012-11-14 22:18:53 UTC ---
Well, as I wrote in the other PR, the main problem with cmov is that it extends
the dependency chain.  For a well-predicted sequence with a conditional jump,
there is no update of rbx, so the loop executes faster because the
loads/stores/comparisons execute "in parallel". The load in the next iteration
can then happen speculatively before the condition from the previous iteration
is resolved. With cmov in it, all the other computations in the loop depend on
rbx.
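The dependency-chain argument can be sketched in C. The loop shape below is an assumption read off the perf listings earlier in this thread: there, the movss load is indexed by the same register that cmova writes, so each iteration compares against the element at the current best index. The direction is simplified to a forward scan, and the function names are hypothetical. A compiler is free to turn either form into a branch or a cmov. The point is only where the data dependencies lie.

```c
/* Branchy form: if the branch is predicted "not taken", the CPU can
   speculatively load x[best] for later iterations without waiting for
   earlier comparisons to resolve. */
static int
argmin_branch (const float *x, int n)
{
  int best = 0;
  for (int i = 1; i < n; i++)
    if (x[best] > x[i])
      best = i;
  return best;
}

/* Select form, the shape a cmov gives: best is rewritten on every
   iteration, so the next load of x[best] must wait for this compare and
   select, and the whole loop becomes one serial chain through best. */
static int
argmin_select (const float *x, int n)
{
  int best = 0;
  for (int i = 1; i < n; i++)
    {
      int mask = -(x[best] > x[i]);        /* all ones if smaller, else 0 */
      best = (i & mask) | (best & ~mask);  /* pick i or keep best */
    }
  return best;
}
```

Both functions return the same index for every input with n >= 1. They differ only in whether the next load of `x[best]` has to wait on the current comparison.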

I guess there is no locally available information suggesting that the
particular branch is easily predictable, at least without profile feedback
(where we won't disable the conversion anyway).

I wonder
 1) why the conversion to cmov does not happen in the RTL if-conversion pass, and
 2) whether we can do something to detect similar patterns and possibly disable
cmovs on them...



* [Bug target/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
  2012-05-14 15:44 [Bug tree-optimization/53346] New: [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90 dominiq at lps dot ens.fr
                   ` (20 preceding siblings ...)
  2012-11-14 22:19 ` hubicka at gcc dot gnu.org
@ 2012-11-14 22:38 ` hubicka at gcc dot gnu.org
  2012-12-31  9:20 ` [Bug target/53346] [4.6/4.7/4.8 Regression] Bad if conversion in " pinskia at gcc dot gnu.org
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: hubicka at gcc dot gnu.org @ 2012-11-14 22:38 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

--- Comment #22 from Jan Hubicka <hubicka at gcc dot gnu.org> 2012-11-14 22:38:19 UTC ---
OK, a similar loop in C looks like:
float a[10000];
float b[10000];
int t (void)
{
  int mi = 0,i;
  for (i=0;i<1000;i++)
    if (a[i]<b[i])
      mi = i;
  return mi;
}
The reason we do not if-convert at the RTL level is that the condition is UNLE,
which does not pass ordered_comparison_operator.  Jakub noticed this in another
PR: we do not really need to test for unorderedness here, since the expander
knows how to handle it.  So this was more by chance than by design.  I am
testing:
===================================================================
--- config/i386/i386.md (revision 193503)
+++ config/i386/i386.md (working copy)
@@ -964,7 +964,7 @@
        (compare:CC (match_operand:SDWIM 1 "nonimmediate_operand")
                    (match_operand:SDWIM 2 "<general_operand>")))
    (set (pc) (if_then_else
-              (match_operator 0 "ordered_comparison_operator"
+              (match_operator 0 "comparison_operator"
                [(reg:CC FLAGS_REG) (const_int 0)])
               (label_ref (match_operand 3))
               (pc)))]
@@ -982,7 +982,7 @@
        (compare:CC (match_operand:SWIM 2 "nonimmediate_operand")
                    (match_operand:SWIM 3 "<general_operand>")))
    (set (match_operand:QI 0 "register_operand")
-       (match_operator 1 "ordered_comparison_operator"
+       (match_operator 1 "comparison_operator"
          [(reg:CC FLAGS_REG) (const_int 0)]))]
   ""
 {
@@ -16120,7 +16120,7 @@

 (define_expand "mov<mode>cc"
   [(set (match_operand:SWIM 0 "register_operand")
-       (if_then_else:SWIM (match_operand 1 "ordered_comparison_operator")
+       (if_then_else:SWIM (match_operand 1 "comparison_operator")
                           (match_operand:SWIM 2 "<general_operand>")
                           (match_operand:SWIM 3 "<general_operand>")))]
   ""
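Some readers may not know the RTL comparison codes. UNLE is true when the operands are unordered (at least one is a NaN) or when the first is less than or equal to the second, so it is exactly the negation of an ordered GT. On that reading, skipping `mi = i` when `a[i] < b[i]` is false amounts to an UNLE test with the operands swapped. This is one plausible route to the UNLE condition mentioned above, but it is our assumption and is not stated in the comment. The sketch below models it with the C99 `<math.h>` macros `isunordered` and `islessequal`. The helper names are hypothetical.

```c
#include <math.h>
#include <stdbool.h>

/* UNLE: true if x and y are unordered (a NaN is involved) or x <= y.
   Equivalent to !(x > y) under IEEE rules. */
static bool
unle (float x, float y)
{
  return isunordered (x, y) || islessequal (x, y);
}

/* Ordered LE: false whenever either operand is a NaN. */
static bool
ordered_le (float x, float y)
{
  return islessequal (x, y);
}
```

The two helpers agree on ordinary numbers and disagree only when a NaN is involved. That NaN case is why an "ordered only" predicate rejects UNLE.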



* [Bug target/53346] [4.6/4.7/4.8 Regression] Bad if conversion in cptrf2 of rnflow.f90
  2012-05-14 15:44 [Bug tree-optimization/53346] New: [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90 dominiq at lps dot ens.fr
                   ` (21 preceding siblings ...)
  2012-11-14 22:38 ` hubicka at gcc dot gnu.org
@ 2012-12-31  9:20 ` pinskia at gcc dot gnu.org
  2012-12-31  9:41 ` pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: pinskia at gcc dot gnu.org @ 2012-12-31  9:20 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

--- Comment #23 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-12-31 09:19:50 UTC ---
(In reply to comment #22)
If the patch referenced in comment #22 fixes this bug, then it is a dup of bug
54073.  Can someone confirm if this has been fixed on the trunk now?



* [Bug target/53346] [4.6/4.7/4.8 Regression] Bad if conversion in cptrf2 of rnflow.f90
  2012-05-14 15:44 [Bug tree-optimization/53346] New: [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90 dominiq at lps dot ens.fr
                   ` (22 preceding siblings ...)
  2012-12-31  9:20 ` [Bug target/53346] [4.6/4.7/4.8 Regression] Bad if conversion in " pinskia at gcc dot gnu.org
@ 2012-12-31  9:41 ` pinskia at gcc dot gnu.org
  2022-09-26  3:22 ` cvs-commit at gcc dot gnu.org
  2022-09-26  3:24 ` crazylht at gmail dot com
  25 siblings, 0 replies; 27+ messages in thread
From: pinskia at gcc dot gnu.org @ 2012-12-31  9:41 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE

--- Comment #24 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-12-31 09:40:29 UTC ---
Fixed aka a dup of bug 54073.

*** This bug has been marked as a duplicate of bug 54073 ***



* [Bug target/53346] [4.6/4.7/4.8 Regression] Bad if conversion in cptrf2 of rnflow.f90
  2012-05-14 15:44 [Bug tree-optimization/53346] New: [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90 dominiq at lps dot ens.fr
                   ` (23 preceding siblings ...)
  2012-12-31  9:41 ` pinskia at gcc dot gnu.org
@ 2022-09-26  3:22 ` cvs-commit at gcc dot gnu.org
  2022-09-26  3:24 ` crazylht at gmail dot com
  25 siblings, 0 replies; 27+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-09-26  3:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

--- Comment #25 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:3db8e9c2422d924a958336fd0871b24cce3e65d1

commit r13-2843-g3db8e9c2422d924a958336fd0871b24cce3e65d1
Author: liuhongt <hongtao.liu@intel.com>
Date:   Wed Sep 21 14:56:08 2022 +0800

    Support 2-instruction vector shuffle for V4SI/V4SF in
ix86_expand_vec_perm_const_1.

    2022-09-23  Hongtao Liu  <hongtao.liu@intel.com>
                Liwei Xu  <liwei.xu@intel.com>

    gcc/ChangeLog:

            PR target/53346
            * config/i386/i386-expand.cc (expand_vec_perm_shufps_shufps):
            New function.
            (ix86_expand_vec_perm_const_1): Insert
            expand_vec_perm_shufps_shufps at the end of 2-instruction
            expand sequence.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/pr53346-1.c: New test.
            * gcc.target/i386/pr53346-2.c: New test.
            * gcc.target/i386/pr53346-3.c: New test.
            * gcc.target/i386/pr53346-4.c: New test.
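According to the follow-up comment, this 2022 commit was meant for PR54346 rather than this bug. For context only, here is a scalar C model of the instruction pair that the new function's name (expand_vec_perm_shufps_shufps) refers to. Two SHUFPS instructions compose a two-source V4SF permutation that a single SHUFPS cannot express. The `v4sf` type, the helper names and the 0xD8 selector are illustrative assumptions. The commit's actual matching logic is not reproduced.

```c
/* Hypothetical stand-in for a 4 x float vector. */
typedef struct { float v[4]; } v4sf;

/* Scalar model of SHUFPS: lanes 0-1 come from a and lanes 2-3 from b,
   each picked by a 2-bit field of imm (same encoding as _MM_SHUFFLE). */
static v4sf
shufps (v4sf a, v4sf b, unsigned imm)
{
  v4sf r;
  r.v[0] = a.v[imm & 3];
  r.v[1] = a.v[(imm >> 2) & 3];
  r.v[2] = b.v[(imm >> 4) & 3];
  r.v[3] = b.v[(imm >> 6) & 3];
  return r;
}

/* Example permutation {a0, b1, a2, b3}: one SHUFPS cannot produce it,
   because its low half must come entirely from the first operand. */
static v4sf
perm_a0_b1_a2_b3 (v4sf a, v4sf b)
{
  /* 0xD8 selects fields (0, 2, 1, 3): t = {a0, a2, b1, b3} */
  v4sf t = shufps (a, b, 0xD8);
  /* same selector on t alone: {t0, t2, t1, t3} = {a0, b1, a2, b3} */
  return shufps (t, t, 0xD8);
}
```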


* [Bug target/53346] [4.6/4.7/4.8 Regression] Bad if conversion in cptrf2 of rnflow.f90
  2012-05-14 15:44 [Bug tree-optimization/53346] New: [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90 dominiq at lps dot ens.fr
                   ` (24 preceding siblings ...)
  2022-09-26  3:22 ` cvs-commit at gcc dot gnu.org
@ 2022-09-26  3:24 ` crazylht at gmail dot com
  25 siblings, 0 replies; 27+ messages in thread
From: crazylht at gmail dot com @ 2022-09-26  3:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

Hongtao.liu <crazylht at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #26 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to CVS Commits from comment #25)

Sorry, it should be PR54346



Thread overview: 27+ messages
2012-05-14 15:44 [Bug tree-optimization/53346] New: [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90 dominiq at lps dot ens.fr
2012-05-15  9:54 ` [Bug tree-optimization/53346] " rguenth at gcc dot gnu.org
2012-05-15 12:55 ` dominiq at lps dot ens.fr
2012-05-17 18:35 ` ubizjak at gmail dot com
2012-05-17 20:47 ` ubizjak at gmail dot com
2012-05-18 11:49 ` rguenth at gcc dot gnu.org
2012-05-18 14:28 ` rguenth at gcc dot gnu.org
2012-05-18 14:32 ` rguenth at gcc dot gnu.org
2012-05-18 14:49 ` ubizjak at gmail dot com
2012-05-18 14:52 ` dominiq at lps dot ens.fr
2012-05-18 15:13 ` ubizjak at gmail dot com
2012-05-18 17:32 ` ubizjak at gmail dot com
2012-05-18 17:34 ` ubizjak at gmail dot com
2012-05-18 17:46 ` ubizjak at gmail dot com
2012-05-18 17:48 ` ubizjak at gmail dot com
2012-05-18 17:56 ` pinskia at gcc dot gnu.org
2012-05-18 18:27 ` ubizjak at gmail dot com
2012-05-18 18:27 ` hjl.tools at gmail dot com
2012-05-18 19:45 ` dominiq at lps dot ens.fr
2012-05-19 23:50 ` dominiq at lps dot ens.fr
2012-09-07 11:59 ` [Bug target/53346] " rguenth at gcc dot gnu.org
2012-11-14 22:19 ` hubicka at gcc dot gnu.org
2012-11-14 22:38 ` hubicka at gcc dot gnu.org
2012-12-31  9:20 ` [Bug target/53346] [4.6/4.7/4.8 Regression] Bad if conversion in " pinskia at gcc dot gnu.org
2012-12-31  9:41 ` pinskia at gcc dot gnu.org
2022-09-26  3:22 ` cvs-commit at gcc dot gnu.org
2022-09-26  3:24 ` crazylht at gmail dot com
