public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/53346] New: [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
@ 2012-05-14 15:44 dominiq at lps dot ens.fr
2012-05-15 9:54 ` [Bug tree-optimization/53346] " rguenth at gcc dot gnu.org
` (25 more replies)
0 siblings, 26 replies; 27+ messages in thread
From: dominiq at lps dot ens.fr @ 2012-05-14 15:44 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
Bug #: 53346
Summary: [4.6/4.7/4.8 Regression] Bad vectorization in the proc
cptrf2 of rnflow.f90
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: dominiq@lps.ens.fr
CC: rguenth@gcc.gnu.org, ubizjak@gmail.com
At revision 187457 (i.e., with pr53340 fixed) on x86_64-apple-darwin10, after
[macbook] test/dbg_rnflow% gfc -c -O3 -ffast-math -funroll-loops timctr.f90
cmpcpt.f90 cptrf2.f90 dger.f90 dgetri.f90 dswap.f90 dtrsm.f90 evlrnf.f90
idamax.f90 main.f90 mattrs.f90 cmpmat.f90 dgemm.f90 dgetf2.f90 dlaswp.f90
dtrmm.f90 dtrti2.f90 extpic.f90 ilaenv.f90 matcnt.f90 reaseq.f90 xerbla.f90
cptrf1.f90 dgemv.f90 dgetrf.f90 dscal.f90 dtrmv.f90 dtrtri.f90 gentrs.f90
lsame.f90 matsim.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
23.872u 0.349s 0:24.22 99.9% 0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187339/bin/gfortran -c -O3
-ffast-math -funroll-loops evlrnf.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
22.259u 0.346s 0:22.61 99.9% 0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187291/bin/gfortran -c -O3
-ffast-math -funroll-loops idamax.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
22.252u 0.345s 0:22.60 99.9% 0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187102/bin/gfortran -c -O3
-ffast-math -funroll-loops idamax.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
22.121u 0.346s 0:22.47 99.9% 0+0k 0+0io 0pf+0w
(i.e., working around pr53342 and a regression for idamax.f90, see
below), the compilation of cptrf2.f90 (source attached to pr53340) with the
following flags yields
optimization level          4.4.6   4.5.3   4.6.3   4.7.0   r187457
-O2                         27.8    28.2    28.2    21.8    21.8
-O2 -ftree-vectorize        27.8    28.2    28.2    27.9    27.9
-O3                         22.0    21.3    25.1    25.3    25.3
-O3 -fno-tree-vectorize     22.1    21.3    21.4    21.4    21.4
Note that 4.5/4.6/4.7 vectorize two loops (lines 21 and 29), while 4.8
vectorizes only the loop at line 21 (29: not vectorized: iteration count too
small.).
Looking at my archives I have found that a first regression appeared
between revisions 162456 and 164728
optimization level                 4.6-162456   4.6p-164728
-O2                                28.2         28.3
-O2 -ftree-vectorize               28.1         28.3
-O3                                21.4         29.4
-O3 -fno-tree-vectorize            21.3         21.4
-O3 -ffast-math                    21.4         22.3
-O3 -ffast-math -funroll-loops     21.9         22.4
For the record, as said above the compilation of idamax regressed between
revisions 187102 and 187291
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187291/bin/gfortran -c -O3
-ffast-math -funroll-loops idamax.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
22.252u 0.345s 0:22.60 99.9% 0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.8p-187102/bin/gfortran -c -O3
-ffast-math -funroll-loops idamax.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
22.121u 0.346s 0:22.47 99.9% 0+0k 0+0io 0pf+0w
Although the regression is only slightly above the noise margin at the level of
rnflow.f90, it could be worth investigating because:
(1) it is a LAPACK routine (possibly slightly modified),
(2) there are equivalent intrinsics in F90,
(3) the slowdown may be quite significant at the level of the proc itself.
* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: rguenth at gcc dot gnu.org @ 2012-05-15 9:54 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization
Target Milestone|--- |4.8.0
--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-05-15 09:43:31 UTC ---
Do you possibly have a testcase?
* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: dominiq at lps dot ens.fr @ 2012-05-15 12:55 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
--- Comment #2 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2012-05-15 12:39:50 UTC ---
> Do you possibly have a testcase?
I am not sure I understand what you are asking for.
The source for cptrf2.f90 has been attached to pr53340. I can provide a version
of rnflow without the proc cptrf2 or an archive with the rnflow.f90 source
split to one file per proc.
If you ask for a reduced test, it is much more difficult:
(1) the code is not mine and I don't know it well,
(2) optimizations may change for tiny details of the source layout.
* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: ubizjak at gmail dot com @ 2012-05-17 18:35 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
Uros Bizjak <ubizjak at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2012-05-17
Ever Confirmed|0 |1
--- Comment #3 from Uros Bizjak <ubizjak at gmail dot com> 2012-05-17 18:29:12 UTC ---
Confirmed, -O2 vs. -O2 -ftree-vectorize on x86_64:
-O2 -ftree-vectorize:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
43.83 9.73 9.73 64 0.15 0.15 cptrf2_
40.68 18.76 9.03 6685 0.00 0.00 trs2a2.2054
7.70 20.47 1.71 64 0.03 0.03 gentrs_
1.49 20.80 0.33 64 0.01 0.01 cptrf1_
1.40 21.11 0.31 1 0.31 12.33 matsim_
1.40 21.42 0.31 6685 0.00 0.00 invima.2045
1.13 21.67 0.25 64 0.00 0.00 cmpcpt_
-O2:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
55.20 9.20 9.20 6685 0.00 0.00 trs2a2.2054
23.40 13.10 3.90 64 0.06 0.06 cptrf2_
10.38 14.83 1.73 64 0.03 0.03 gentrs_
2.58 15.26 0.43 64 0.01 0.01 cptrf1_
2.34 15.65 0.39 6685 0.00 0.00 invima.2045
1.98 15.98 0.33 1 0.33 6.58 matsim_
1.14 16.17 0.19 64 0.00 0.00 cmpcpt_
cptrf2_ runtime increased by almost 6 seconds!
The only vectorization is in:
3530: LOOP VECTORIZED.
rnflow.f90:3510: note: vectorized 1 loops in function.
Which corresponds to:
! ______________________________________________________________________
real, dimension (1:nxtr), intent (in) :: xxtrt ! extrema
integer, intent (in) :: nxtr ! leur nombre
integer, dimension (1:nxtr), intent (out) :: ixtrt ! indices
integer, intent (out) :: kerr ! code d'erreur
! ______________________________________________________________________
!
kerr = 0
ixtrt = 0 <<<<<<<<<<<<<< HERE
This vectorization results in the zeroing of a memory area:
pxor %xmm0, %xmm0
leaq (%rdx,%r8,4), %r8
xorl %esi, %esi
.p2align 4,,10
.p2align 3
.L183:
addq $1, %rsi
movdqa %xmm0, (%r8)
addq $16, %r8
cmpq %rsi, %r11
ja .L183
And this causes a 6-second difference?!
* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: ubizjak at gmail dot com @ 2012-05-17 20:47 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
--- Comment #4 from Uros Bizjak <ubizjak at gmail dot com> 2012-05-17 20:09:42 UTC ---
Instead of this:
.L228:
movl $0, -4(%rdx,%rax,4)
addq $1, %rax
cmpq %rax, %rsi
jge .L228
vectorization generates the following:
movq %rdx, %rax
movq %r9, %r8
andl $15, %eax
shrq $2, %rax
negq %rax
andl $3, %eax
cmpq %r9, %rax
cmovbe %rax, %r8
cmpq $6, %r9
cmovbe %r9, %r8
testq %r8, %r8
je .L233
leaq 1(%r8), %rsi
movl $1, %eax
.p2align 4,,10
.p2align 3
.L176:
movl $0, -4(%rdx,%rax,4)
addq $1, %rax
cmpq %rsi, %rax
jne .L176
cmpq %r9, %r8
je .L182
.L174:
movq %r9, %rbp
subq %r8, %rbp
movq %rbp, %r11
shrq $2, %r11
leaq 0(,%r11,4), %rbx
testq %rbx, %rbx
je .L181
pxor %xmm0, %xmm0
leaq (%rdx,%r8,4), %r8
xorl %esi, %esi
.p2align 4,,10
.p2align 3
.L183:
addq $1, %rsi
movdqa %xmm0, (%r8)
addq $16, %r8
cmpq %rsi, %r11
ja .L183
addq %rbx, %rax
cmpq %rbx, %rbp
je .L182
.p2align 4,,10
.p2align 3
.L181:
movl $0, -4(%rdx,%rax,4)
addq $1, %rax
cmpq %rax, %r9
jge .L181
Whoa.
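For readers who do not want to trace the assembly, the control flow above can be sketched in C. This is only an illustration of the three-part shape (scalar prologue to reach 16-byte alignment, 4-wide main loop, scalar epilogue); the function name zero_peeled is hypothetical, memset of 16 bytes stands in for the movdqa store, and the threshold of 6 is taken from the cmpq $6 in the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: zero n 32-bit integers using the same three-part
   layout as the vectorized assembly (prologue, vector body, epilogue). */
void zero_peeled(int32_t *a, size_t n)
{
    assert(a != NULL);

    size_t i = 0;

    /* Prologue: number of scalar stores needed to reach a 16-byte boundary
       (0..3 for a 4-byte-aligned pointer). */
    size_t peel = ((16 - ((uintptr_t)a & 15)) & 15) / sizeof(int32_t);
    if (peel > n)
        peel = n;
    if (n <= 6)          /* short arrays are handled entirely by scalar stores */
        peel = n;
    for (; i < peel; i++)
        a[i] = 0;

    /* Main loop: four integers (16 bytes) per iteration. */
    size_t nvec = (n - i) / 4;
    for (size_t v = 0; v < nvec; v++, i += 4)
        memset(&a[i], 0, 16);

    /* Epilogue: at most three remaining scalar stores. */
    for (; i < n; i++)
        a[i] = 0;
}
```

The point of the sketch is how much bookkeeping surrounds a store loop that was four instructions long before vectorization.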
* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: rguenth at gcc dot gnu.org @ 2012-05-18 11:49 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
AssignedTo|unassigned at gcc dot |rguenth at gcc dot gnu.org
|gnu.org |
--- Comment #5 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-05-18 11:02:53 UTC ---
Yeah, this is sort-of related to what is observed in PR53355. I suppose at
runtime nxtr is comparatively small.
Reduced testcase:
subroutine cptrf2 (nxtr, ixtrt)
integer, dimension (1:nxtr), intent (out) :: ixtrt
ixtrt = 0
end subroutine
we peel the loop to possibly align the stores, and we peel the loop
to possibly take care of a remaining store at the end of the array.
And of course we compute that we need at least 6 scalar iterations
to make executing the vectorized loop profitable.
And apart from all that we should have recognized the loop as memset.
Mine.
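As a rough C analogue of the reduced testcase (an illustration, not the testcase added to the testsuite), the two functions below are meant to be equivalent: the first is the store loop the vectorizer sees, the second is the memset call that pattern recognition would ideally substitute for it. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Scalar form of the Fortran statement "ixtrt = 0". */
void zero_loop(int *ixtrt, size_t nxtr)
{
    assert(ixtrt != NULL);
    for (size_t i = 0; i < nxtr; i++)
        ixtrt[i] = 0;
}

/* The form the loop should be recognized as: a single memset call,
   with no peeling or profitability checks in the caller. */
void zero_memset(int *ixtrt, size_t nxtr)
{
    assert(ixtrt != NULL);
    memset(ixtrt, 0, nxtr * sizeof *ixtrt);
}
```

Depending on the compiler version and flags, the first function may already be compiled to a memset call, so comparing the generated assembly of both is the way to check what actually happens.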
* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: rguenth at gcc dot gnu.org @ 2012-05-18 14:28 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
--- Comment #7 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-05-18 13:10:28 UTC ---
Fixed.
* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: rguenth at gcc dot gnu.org @ 2012-05-18 14:32 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
--- Comment #6 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-05-18 13:10:11 UTC ---
Author: rguenth
Date: Fri May 18 13:10:01 2012
New Revision: 187655
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=187655
Log:
2012-05-18 Richard Guenther <rguenther@suse.de>
PR tree-optimization/53346
* tree-loop-distribution.c (ldist_gen): Make sure to apply
builtin transform even when only a single partition with
all reads/writes exists.
* gcc.dg/tree-ssa/ldist-18.c: New testcase.
* gcc.target/i386/incoming-10.c: Adjust.
* gcc.target/i386/incoming-11.c: Likewise.
* gcc.target/i386/pr46295.c: Likewise.
Added:
trunk/gcc/testsuite/gcc.dg/tree-ssa/ldist-18.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/i386/incoming-10.c
trunk/gcc/testsuite/gcc.target/i386/incoming-11.c
trunk/gcc/testsuite/gcc.target/i386/pr46295.c
trunk/gcc/tree-loop-distribution.c
* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: ubizjak at gmail dot com @ 2012-05-18 14:49 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
Uros Bizjak <ubizjak at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |NEW
Resolution|FIXED |
--- Comment #8 from Uros Bizjak <ubizjak at gmail dot com> 2012-05-18 14:46:01 UTC ---
(In reply to comment #7)
> Fixed.
Unfortunately, the loop in original rnflow test still gets vectorized, with no
change in the runtime:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
43.46 9.69 9.69 64 0.15 0.15 cptrf2_
40.63 18.75 9.06 6685 0.00 0.00 trs2a2.2054
7.89 20.51 1.76 64 0.03 0.03 gentrs_
2.02 20.96 0.45 6685 0.00 0.00 invima.2045
1.93 21.39 0.43 64 0.01 0.01 cptrf1_
1.17 21.65 0.26 1 0.26 12.36 matsim_
0.99 21.87 0.22 64 0.00 0.00 cmpcpt_
GNU Fortran (GCC) version 4.8.0 20120518 (experimental) [trunk revision 187655]
* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: dominiq at lps dot ens.fr @ 2012-05-18 14:52 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
--- Comment #9 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2012-05-18 14:49:22 UTC ---
> Unfortunately, the loop in original rnflow test still gets vectorized, with no
> change in the runtime:
Confirmed, at revision 187655 I still get
-O2 21.8
-O2 -ftree-vectorize 27.9
-O3 25.2
-O3 -fno-tree-vectorize 21.4
Uneducated guess: is it possible that failed attempts to vectorize may mess up
further optimizations?
* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: ubizjak at gmail dot com @ 2012-05-18 15:13 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
--- Comment #10 from Uros Bizjak <ubizjak at gmail dot com> 2012-05-18 15:11:53 UTC ---
(In reply to comment #8)
> (In reply to comment #7)
> > Fixed.
>
> Unfortunately, the loop in original rnflow test still gets vectorized, with no
> change in the runtime:
With -O2 -ftree-loop-distribute-patterns -ftree-vectorize, the runtime is still
the same:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
43.76 9.70 9.70 64 0.15 0.15 cptrf2_
40.69 18.72 9.02 6685 0.00 0.00 trs2a2.2054
7.35 20.35 1.63 64 0.03 0.03 gentrs_
2.21 20.84 0.49 64 0.01 0.01 cptrf1_
1.44 21.16 0.32 1 0.32 12.32 matsim_
1.17 21.42 0.26 6685 0.00 0.00 invima.2045
0.81 21.60 0.18 64 0.00 0.00 cmpcpt_
0.54 21.72 0.12 1 0.12 9.85 evlrnf_
* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: ubizjak at gmail dot com @ 2012-05-18 17:32 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
--- Comment #11 from Uros Bizjak <ubizjak at gmail dot com> 2012-05-18 16:04:46 UTC ---
(In reply to comment #9)
> Uneducated guess: is it possible that failed attempts to vectorize may mess up
> further optimizations?
You are right. -ftree-vectorize implies -ftree-loop-if-convert and this option
makes all the difference!
-O2 -ftree-vectorize:
real 0m24.061s
user 0m23.789s
sys 0m0.225s
-O2 -ftree-vectorize -fno-tree-loop-if-convert
real 0m18.029s
user 0m17.761s
sys 0m0.220s
We were barking up the wrong tree. ;)
* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: ubizjak at gmail dot com @ 2012-05-18 17:34 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
--- Comment #12 from Uros Bizjak <ubizjak at gmail dot com> 2012-05-18 16:07:45 UTC ---
(In reply to comment #11)
> You are right. -ftree-vectorize implies -ftree-loop-if-convert and this option
> makes all the difference!
>
> -O2 -ftree-vectorize:
>
> real 0m24.061s
> user 0m23.789s
> sys 0m0.225s
>
> -O2 -ftree-vectorize -fno-tree-loop-if-convert
>
> real 0m18.029s
> user 0m17.761s
> sys 0m0.220s
-O2 -ftree-loop-if-convert:
real 0m24.034s
user 0m23.770s
sys 0m0.218s
-O2
real 0m18.163s
user 0m17.892s
sys 0m0.233s
* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: ubizjak at gmail dot com @ 2012-05-18 17:46 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
Uros Bizjak <ubizjak at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |hjl.tools at gmail dot com
--- Comment #14 from Uros Bizjak <ubizjak at gmail dot com> 2012-05-18 17:17:45 UTC ---
Compile and execute slow assembly:
gfortran rnflow.s && time ./a.out
real 0m24.454s
user 0m24.167s
sys 0m0.231s
Apply the following patch, which changes cmova in very fast loops (cptrf2) to jumps:
--cut here--
--- rnflow.s 2012-05-18 19:00:22.314102061 +0200
+++ rnflow1.s 2012-05-18 19:10:59.363428625 +0200
@@ -1305,7 +1305,9 @@
movslq %edx, %rbx
movss -4(%rdi,%rbx,4), %xmm0
ucomiss (%r9), %xmm0
- cmova %ecx, %edx
+ jbe .L183x
+ movl %ecx, %edx
+.L183x:
subl $1, %ecx
subq $4, %r9
cmpl %r10d, %ecx
@@ -1329,7 +1331,9 @@
movslq %ecx, %r10
movss -4(%rdi,%r10,4), %xmm0
ucomiss (%r9), %xmm0
- cmova %r11d, %ecx
+ jbe .L192x
+ movl %r11d, %ecx
+.L192x:
subl $1, %r11d
subq $4, %r9
cmpl %eax, %r11d
@@ -1485,7 +1489,9 @@
movslq %edx, %r10
movss -4(%rdi,%r10,4), %xmm0
ucomiss (%r9), %xmm0
- cmova %ecx, %edx
+ jbe .L179x
+ movl %ecx, %edx
+.L179x:
subq $4, %r9
subl $1, %ecx
jne .L179
--cut here--
gfortran rnflow.s && time ./a.out
real 0m18.170s
user 0m17.907s
sys 0m0.223s
WTF happened here?!
Relevant part of my /proc/cpuinfo:
vendor_id : GenuineIntel
cpu family : 6
model : 42
Adding CC.
* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: ubizjak at gmail dot com @ 2012-05-18 17:48 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
--- Comment #13 from Uros Bizjak <ubizjak at gmail dot com> 2012-05-18 17:08:08 UTC ---
Created attachment 27435
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27435
slow x86_64 assembly, obtained with -O2 -ftree-loop-if-convert
This is the slow assembly, stay tuned for the WTF part.
* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: pinskia at gcc dot gnu.org @ 2012-05-18 17:56 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
--- Comment #15 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-05-18 17:54:16 UTC ---
(In reply to comment #14)
> Compile and execute slow assembly:
> real 0m18.170s
> user 0m17.907s
> sys 0m0.223s
>
> WTF happened here?!
Are conditional moves that bad on x86? The change which uses them more for
COND_EXPR was mine but really I think this was a latent bug or a way to say
choose conditional moves over jumps for some targets.
* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: ubizjak at gmail dot com @ 2012-05-18 18:27 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
--- Comment #16 from Uros Bizjak <ubizjak at gmail dot com> 2012-05-18 18:24:43 UTC ---
Perf confirms these findings, the first loop:
0.02 : 401e10: movslq %edx,%rbx
5.04 : 401e13: movss -0x4(%rdi,%rbx,4),%xmm0
24.97 : 401e19: ucomiss (%r9),%xmm0
14.66 : 401e1d: cmova %ecx,%edx
15.37 : 401e20: sub $0x1,%ecx
0.00 : 401e23: sub $0x4,%r9
0.00 : 401e27: cmp %r10d,%ecx
0.00 : 401e2a: jne 401e10 <cptrf2_+0x230>
the second:
0.00 : 401e60: movslq %ecx,%r10
1.69 : 401e63: movss -0x4(%rdi,%r10,4),%xmm0
7.78 : 401e6a: ucomiss (%r9),%xmm0
4.75 : 401e6e: cmova %r11d,%ecx
4.52 : 401e72: sub $0x1,%r11d
0.00 : 401e76: sub $0x4,%r9
0.05 : 401e7a: cmp %eax,%r11d
0.00 : 401e7d: jne 401e60 <cptrf2_+0x280>
the third:
0.00 : 401ff8: movslq %edx,%r10
0.78 : 401ffb: movss -0x4(%rdi,%r10,4),%xmm0
3.14 : 402002: ucomiss (%r9),%xmm0
2.04 : 402006: cmova %ecx,%edx
1.89 : 402009: sub $0x4,%r9
0.00 : 40200d: sub $0x1,%ecx
0.00 : 402010: jne 401ff8 <cptrf2_+0x418>
* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: hjl.tools at gmail dot com @ 2012-05-18 18:27 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
--- Comment #17 from H.J. Lu <hjl.tools at gmail dot com> 2012-05-18 18:27:21 UTC ---
I was told that cmov wins if branch is mispredicted, otherwise
cmov loses. We will investigate if we can improve cmov in GCC.
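To make the trade-off concrete, here is a minimal C sketch of a loop with the same shape as the hot loops in the perf listings: a backward scan that keeps the index of the smallest element seen so far. It is not the rnflow source; the function names are hypothetical, and a compiler is free to emit either a jump or a cmov for either form, so the generated assembly has to be inspected to know which one is actually used.

```c
#include <assert.h>
#include <stddef.h>

/* Written with an explicit branch. If the branch is well predicted, the
   load of x[best] in the next iteration can start speculatively. */
size_t argmin_branch(const float *x, size_t n)
{
    assert(n > 0);
    size_t best = n - 1;
    for (size_t i = n - 1; i-- > 0; )
        if (x[best] > x[i])
            best = i;
    return best;
}

/* Written as a select, the natural source form for a conditional move.
   Here every iteration's load of x[best] depends on the result of the
   previous select, which lengthens the loop-carried dependency chain. */
size_t argmin_select(const float *x, size_t n)
{
    assert(n > 0);
    size_t best = n - 1;
    for (size_t i = n - 1; i-- > 0; )
        best = (x[best] > x[i]) ? i : best;
    return best;
}
```

Both functions return the same index; only the expected instruction selection differs. On data where the minimum changes rarely, the branch is almost always predicted correctly, which is the case where cmov is said to lose.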
* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: dominiq at lps dot ens.fr @ 2012-05-18 19:45 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
--- Comment #18 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2012-05-18 18:29:06 UTC ---
> Are conditional moves that bad on x86? The change which uses them more for
> COND_EXPR was mine but really I think this was a latent bug or a way to say
> choose conditional moves over jumps for some targets.
As said in comment #0 the first regression appeared between revisions 162456
(2010-07-23) and 164728 (2010-09-29), so the problem is fairly old
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.6p-162456/bin/gfortran -c -O3
cptrf2.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
20.904u 0.345s 0:21.26 99.9% 0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.6p-162456/bin/gfortran -c -O3
-fno-tree-loop-if-convert cptrf2.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
20.898u 0.341s 0:21.24 99.9% 0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.6p-164728/bin/gfortran -c -O3
cptrf2.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
28.607u 0.346s 0:28.96 99.9% 0+0k 0+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.6p-164728/bin/gfortran -c -O3
-fno-tree-loop-if-convert cptrf2.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
21.153u 0.342s 0:21.50 99.9% 0+0k 0+0io 0pf+0w
* [Bug tree-optimization/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: dominiq at lps dot ens.fr @ 2012-05-19 23:50 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
Dominique d'Humieres <dominiq at lps dot ens.fr> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |matz at gcc dot gnu.org
--- Comment #19 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2012-05-19 22:19:02 UTC ---
The change in timing occurred at revision 163998
Author: matz
Date: Wed Sep 8 12:34:52 2010 UTC (20 months, 1 week ago)
Changed paths: 4
Log Message:
PR tree-optimization/33244
* tree-ssa-sink.c (statement_sink_location): Don't sink into
empty loop latches.
testsuite/
PR tree-optimization/33244
* gfortran.dg/vect/fast-math-vect-8.f90: New test.
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.6p-163997/bin/gfortran -c -O3
cptrf2.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
20.881u 0.345s 0:21.37 99.2% 0+0k 3+0io 0pf+0w
[macbook] test/dbg_rnflow% /opt/gcc/gcc4.6p-163998/bin/gfortran -c -O3
cptrf2.f90
[macbook] test/dbg_rnflow% makeo ; time a.out > /dev/null
28.545u 0.351s 0:29.06 99.4% 0+0k 3+0io 0pf+0w
* [Bug target/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: rguenth at gcc dot gnu.org @ 2012-09-07 11:59 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
Richard Guenther <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*, i?86-*-*
           Priority|P3                          |P2
          Component|tree-optimization           |target
         AssignedTo|rguenth at gcc dot gnu.org  |unassigned at gcc dot
                   |                            |gnu.org
--- Comment #20 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-09-07 11:58:31 UTC ---
This turned into a target bug about cmov.
* [Bug target/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: hubicka at gcc dot gnu.org @ 2012-11-14 22:19 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
Jan Hubicka <hubicka at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org
--- Comment #21 from Jan Hubicka <hubicka at gcc dot gnu.org> 2012-11-14 22:18:53 UTC ---
Well, as I wrote in the other PR, the main problem with cmov is that it extends
the dependency chain. For a well-predicted sequence with a conditional jump there
is no update of rbx, so the loop executes faster because the
loads/stores/comparisons execute "in parallel". The load in the next iteration
can then happen speculatively before the condition from the previous iteration is
resolved. With cmov in it, there is a dependence on rbx for all the other
computations in the loop.
I guess there is no locally available information suggesting that the
particular branch is well predictable, at least without profile feedback (where
we won't disable the conversion anyway).
I wonder
1) why the conversion to cmov does not happen in the RTL if-conversion pass
2) whether we can do something to detect similar patterns and possibly disable
cmovs on them...
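To make the dependency-chain argument concrete, here is a minimal sketch of the two source shapes being contrasted. The function names `last_less_branch` and `last_less_select` are made up for illustration. Whether a given compiler emits a conditional jump or a cmov for either form depends on the compiler version, flags and target, so this shows only the shapes, not guaranteed code generation.

```c
#include <stddef.h>

/* Branchy shape: mi is only written when the comparison is true, so with a
   well-predicted jump the next iteration's loads need not wait on mi. */
int last_less_branch(const float *a, const float *b, size_t n)
{
    int mi = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] < b[i])
            mi = (int)i;
    }
    return mi;
}

/* Select shape: mi is rewritten on every iteration from its previous value,
   which is the loop-carried dependence a cmov would introduce. */
int last_less_select(const float *a, const float *b, size_t n)
{
    int mi = 0;
    for (size_t i = 0; i < n; i++)
        mi = (a[i] < b[i]) ? (int)i : mi;
    return mi;
}
```

Both functions return the same index; only the data dependence on `mi` differs between them.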
* [Bug target/53346] [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90
From: hubicka at gcc dot gnu.org @ 2012-11-14 22:38 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
--- Comment #22 from Jan Hubicka <hubicka at gcc dot gnu.org> 2012-11-14 22:38:19 UTC ---
OK, a similar loop in C looks like:
float a[10000];
float b[10000];
int t(void)
{
  int mi = 0, i;
  for (i = 0; i < 1000; i++)
    if (a[i] < b[i])
      mi = i;
  return mi;
}
and the reason we do not if-convert at the RTL level is that the condition is
UNLE, which does not pass ordered_comparison_operator. This was noticed by Jakub
in the other PR; we do not really need to test unorderedness here since the
expander knows how to handle it. So this was more by chance than by design. I am
testing
Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md (revision 193503)
+++ config/i386/i386.md (working copy)
@@ -964,7 +964,7 @@
(compare:CC (match_operand:SDWIM 1 "nonimmediate_operand")
(match_operand:SDWIM 2 "<general_operand>")))
(set (pc) (if_then_else
- (match_operator 0 "ordered_comparison_operator"
+ (match_operator 0 "comparison_operator"
[(reg:CC FLAGS_REG) (const_int 0)])
(label_ref (match_operand 3))
(pc)))]
@@ -982,7 +982,7 @@
(compare:CC (match_operand:SWIM 2 "nonimmediate_operand")
(match_operand:SWIM 3 "<general_operand>")))
(set (match_operand:QI 0 "register_operand")
- (match_operator 1 "ordered_comparison_operator"
+ (match_operator 1 "comparison_operator"
[(reg:CC FLAGS_REG) (const_int 0)]))]
""
{
@@ -16120,7 +16120,7 @@
(define_expand "mov<mode>cc"
[(set (match_operand:SWIM 0 "register_operand")
- (if_then_else:SWIM (match_operand 1 "ordered_comparison_operator")
+ (if_then_else:SWIM (match_operand 1 "comparison_operator")
(match_operand:SWIM 2 "<general_operand>")
(match_operand:SWIM 3 "<general_operand>")))]
""
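As a side note on the UNLE condition mentioned above, the following sketch spells out its meaning using the standard C99 comparison macros from `<math.h>`. The helper names `unle` and `not_greater` are hypothetical. It only illustrates why negating an ordered floating-point comparison yields an unordered one; it says nothing about how GCC represents the condition internally.

```c
#include <math.h>
#include <stdbool.h>

/* UNLE: true when x <= y, or when either operand is a NaN (unordered). */
bool unle(float x, float y)
{
    return isunordered(x, y) || islessequal(x, y);
}

/* Negating the ordered test "x > y" gives the same truth table as UNLE,
   because "x > y" is false whenever an operand is NaN. */
bool not_greater(float x, float y)
{
    return !isgreater(x, y);
}
```

The plain `x <= y` test differs from both helpers only when a NaN is involved, which is the case an ordered-only predicate filters out.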
* [Bug target/53346] [4.6/4.7/4.8 Regression] Bad if conversion in cptrf2 of rnflow.f90
From: pinskia at gcc dot gnu.org @ 2012-12-31 9:20 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
--- Comment #23 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-12-31 09:19:50 UTC ---
(In reply to comment #22)
If the patch referenced in comment #22 fixes this bug, then it is a dup of bug
54073. Can someone confirm if this has been fixed on the trunk now?
* [Bug target/53346] [4.6/4.7/4.8 Regression] Bad if conversion in cptrf2 of rnflow.f90
From: pinskia at gcc dot gnu.org @ 2012-12-31 9:41 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE
--- Comment #24 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-12-31 09:40:29 UTC ---
Fixed, aka a dup of bug 54073.
*** This bug has been marked as a duplicate of bug 54073 ***
* [Bug target/53346] [4.6/4.7/4.8 Regression] Bad if conversion in cptrf2 of rnflow.f90
From: cvs-commit at gcc dot gnu.org @ 2022-09-26 3:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
--- Comment #25 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:
https://gcc.gnu.org/g:3db8e9c2422d924a958336fd0871b24cce3e65d1
commit r13-2843-g3db8e9c2422d924a958336fd0871b24cce3e65d1
Author: liuhongt <hongtao.liu@intel.com>
Date: Wed Sep 21 14:56:08 2022 +0800
Support 2-instruction vector shuffle for V4SI/V4SF in
ix86_expand_vec_perm_const_1.
2022-09-23 Hongtao Liu <hongtao.liu@intel.com>
Liwei Xu <liwei.xu@intel.com>
gcc/ChangeLog:
PR target/53346
* config/i386/i386-expand.cc (expand_vec_perm_shufps_shufps):
New function.
(ix86_expand_vec_perm_const_1): Insert
expand_vec_perm_shufps_shufps at the end of 2-instruction
expand sequence.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr53346-1.c: New test.
* gcc.target/i386/pr53346-2.c: New test.
* gcc.target/i386/pr53346-3.c: New test.
* gcc.target/i386/pr53346-4.c: New test.
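For readers unfamiliar with what a "2-instruction vector shuffle" means, here is a rough scalar model of the x86 SHUFPS lane selection, together with one permutation that two such shuffles can produce. The type `v4sf` and the functions `shufps_model` and `interleave_even_odd` are invented for this sketch. Whether `expand_vec_perm_shufps_shufps` handles this particular permutation is not stated in the commit message, so treat it purely as an illustration.

```c
#include <stdint.h>

typedef struct { float f[4]; } v4sf;

/* Scalar model of SHUFPS: the two low result lanes are picked from a, the two
   high result lanes from b, each by a 2-bit field of the immediate. */
v4sf shufps_model(v4sf a, v4sf b, uint8_t imm)
{
    v4sf r;
    r.f[0] = a.f[imm & 3];
    r.f[1] = a.f[(imm >> 2) & 3];
    r.f[2] = b.f[(imm >> 4) & 3];
    r.f[3] = b.f[(imm >> 6) & 3];
    return r;
}

/* Build {a0, b1, a2, b3} with two shuffles, using selector 0xD8 (lanes 0,2,1,3). */
v4sf interleave_even_odd(v4sf a, v4sf b)
{
    v4sf t = shufps_model(a, b, 0xD8); /* {a0, a2, b1, b3} */
    return shufps_model(t, t, 0xD8);   /* {t0, t2, t1, t3} = {a0, b1, a2, b3} */
}
```

The first shuffle gathers the wanted lanes from both inputs; the second reorders them within a single register.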
* [Bug target/53346] [4.6/4.7/4.8 Regression] Bad if conversion in cptrf2 of rnflow.f90
From: crazylht at gmail dot com @ 2022-09-26 3:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346
Hongtao.liu <crazylht at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com
--- Comment #26 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to CVS Commits from comment #25)
> The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:
>
> https://gcc.gnu.org/g:3db8e9c2422d924a958336fd0871b24cce3e65d1
>
> commit r13-2843-g3db8e9c2422d924a958336fd0871b24cce3e65d1
> Author: liuhongt <hongtao.liu@intel.com>
> Date: Wed Sep 21 14:56:08 2022 +0800
>
> Support 2-instruction vector shuffle for V4SI/V4SF in
> ix86_expand_vec_perm_const_1.
>
> 2022-09-23 Hongtao Liu <hongtao.liu@intel.com>
> Liwei Xu <liwei.xu@intel.com>
>
> gcc/ChangeLog:
>
> PR target/53346
> * config/i386/i386-expand.cc (expand_vec_perm_shufps_shufps):
> New function.
> (ix86_expand_vec_perm_const_1): Insert
> expand_vec_perm_shufps_shufps at the end of 2-instruction
> expand sequence.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr53346-1.c: New test.
> * gcc.target/i386/pr53346-2.c: New test.
> * gcc.target/i386/pr53346-3.c: New test.
> * gcc.target/i386/pr53346-4.c: New test.
Sorry, it should be PR54346
Thread overview: 27+ messages
2012-05-14 15:44 [Bug tree-optimization/53346] New: [4.6/4.7/4.8 Regression] Bad vectorization in the proc cptrf2 of rnflow.f90 dominiq at lps dot ens.fr
2012-05-15 9:54 ` [Bug tree-optimization/53346] " rguenth at gcc dot gnu.org
2012-05-15 12:55 ` dominiq at lps dot ens.fr
2012-05-17 18:35 ` ubizjak at gmail dot com
2012-05-17 20:47 ` ubizjak at gmail dot com
2012-05-18 11:49 ` rguenth at gcc dot gnu.org
2012-05-18 14:28 ` rguenth at gcc dot gnu.org
2012-05-18 14:32 ` rguenth at gcc dot gnu.org
2012-05-18 14:49 ` ubizjak at gmail dot com
2012-05-18 14:52 ` dominiq at lps dot ens.fr
2012-05-18 15:13 ` ubizjak at gmail dot com
2012-05-18 17:32 ` ubizjak at gmail dot com
2012-05-18 17:34 ` ubizjak at gmail dot com
2012-05-18 17:46 ` ubizjak at gmail dot com
2012-05-18 17:48 ` ubizjak at gmail dot com
2012-05-18 17:56 ` pinskia at gcc dot gnu.org
2012-05-18 18:27 ` ubizjak at gmail dot com
2012-05-18 18:27 ` hjl.tools at gmail dot com
2012-05-18 19:45 ` dominiq at lps dot ens.fr
2012-05-19 23:50 ` dominiq at lps dot ens.fr
2012-09-07 11:59 ` [Bug target/53346] " rguenth at gcc dot gnu.org
2012-11-14 22:19 ` hubicka at gcc dot gnu.org
2012-11-14 22:38 ` hubicka at gcc dot gnu.org
2012-12-31 9:20 ` [Bug target/53346] [4.6/4.7/4.8 Regression] Bad if conversion in " pinskia at gcc dot gnu.org
2012-12-31 9:41 ` pinskia at gcc dot gnu.org
2022-09-26 3:22 ` cvs-commit at gcc dot gnu.org
2022-09-26 3:24 ` crazylht at gmail dot com