public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
[not found] <bug-31079-4@http.gcc.gnu.org/bugzilla/>
@ 2012-07-18 13:28 ` rguenth at gcc dot gnu.org
2013-03-27 11:45 ` rguenth at gcc dot gnu.org
1 sibling, 0 replies; 11+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-07-18 13:28 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079
--- Comment #14 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-07-18 13:28:41 UTC ---
Smart again - with stock trunk I get everything optimized away ;)
* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
From: rguenth at gcc dot gnu.org @ 2013-03-27 11:45 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
      Known to work|                            |4.8.0
         Resolution|                            |FIXED
   Target Milestone|---                         |4.8.0
--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> 2013-03-27 11:44:56 UTC ---
We vectorize the new testcase now (the USE function has to be moved to a
separate translation unit, otherwise everything is optimized away).
At -O3 -ffast-math I see
4.6: 4.25s
4.7: 4.25s
4.8/trunk: 2.7s
ifort 12.1 and -fast: 3.6s
I conclude - fixed for 4.8.
* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
From: jv244 at cam dot ac dot uk @ 2008-08-19 13:33 UTC (permalink / raw)
To: gcc-bugs
------- Comment #13 from jv244 at cam dot ac dot uk 2008-08-19 13:31 -------
(In reply to comment #11)
> This (PR31079_11.f90) should be a replacement for comment #4, and illustrates
> the vectorizer issue.
The patch Richard posted in PR37150 also improves this PR31079_11.f90 testcase
a lot:
ifort : 2.54
gfortran (unpatched): 4.00
gfortran (patched) : 2.96
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079
* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
From: jv244 at cam dot ac dot uk @ 2008-08-19 6:12 UTC (permalink / raw)
To: gcc-bugs
------- Comment #12 from jv244 at cam dot ac dot uk 2008-08-19 06:11 -------
Created an attachment (id=16096)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16096&action=view)
ifort's asm for PR31079_11.f90 at -O3 -xT -S
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079
* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
From: jv244 at cam dot ac dot uk @ 2008-08-19 6:11 UTC (permalink / raw)
To: gcc-bugs
------- Comment #11 from jv244 at cam dot ac dot uk 2008-08-19 06:09 -------
Created an attachment (id=16095)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16095&action=view)
new testcase
This (PR31079_11.f90) should be a replacement for comment #4, and illustrates
the vectorizer issue.
> gfortran -O3 -ftree-vectorize -ffast-math -march=native PR31079_11.f90
> ./a.out
4.0282512
> ifort -O3 -xT PR31079_11.f90
PR31079_11.f90(52): (col. 13) remark: LOOP WAS VECTORIZED.
PR31079_11.f90(52): (col. 13) remark: BLOCK WAS VECTORIZED.
PR31079_11.f90(52): (col. 13) remark: LOOP WAS VECTORIZED.
PR31079_11.f90(52): (col. 13) remark: LOOP WAS VECTORIZED.
PR31079_11.f90(17): (col. 8) remark: LOOP WAS VECTORIZED.
PR31079_11.f90(24): (col. 5) remark: BLOCK WAS VECTORIZED.
PR31079_11.f90(30): (col. 7) remark: LOOP WAS VECTORIZED.
PR31079_11.f90(31): (col. 7) remark: LOOP WAS VECTORIZED.
> ./a.out
2.640165
The inner loop looks like:
DO i=1,N
s(1:2)=s(1:2)+pxy(i)%a(:)*dpy(i)%a(1)
s(3:4)=s(3:4)+pxy(i)%a(:)*dpy(i)%a(2)
ENDDO
which ifort vectorizes (I will attach the full asm):
..B3.4: # Preds ..B3.4 ..B3.3
movddup collocate_core_2_2_0_0_$DPY.0.1(%rax), %xmm2 #30.33
movddup 8+collocate_core_2_2_0_0_$DPY.0.1(%rax), %xmm4 #31.33
movaps collocate_core_2_2_0_0_$PXY.0.1(%rax), %xmm3 #30.7
mulpd %xmm3, %xmm2 #30.32
incq %rdx #29.5
addq $16, %rax #29.5
addpd %xmm2, %xmm1 #30.7
cmpq $1000, %rdx #29.5
mulpd %xmm3, %xmm4 #31.32
addpd %xmm4, %xmm0 #31.7
jl ..B3.4 # Prob 99% #29.5
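For illustration, a minimal self-contained sketch of the loop shape shown above. It is not the PR31079_11.f90 attachment: the derived type name vec2, the program name, and the fill values are assumptions made for the example. Only the loop body and the trip count of 1000 (the "cmpq $1000" in the asm) come from this comment.

```fortran
! Minimal sketch of the inner loop discussed above; NOT the PR31079_11.f90
! attachment. The type name vec2 and the fill values are assumptions.
program inner_loop_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: N = 1000          ! trip count seen in the asm (cmpq $1000)

  type vec2
     real(dp) :: a(2)                     ! two doubles = one 16-byte SSE vector
  end type vec2

  type(vec2) :: pxy(N), dpy(N)
  real(dp)   :: s(4)
  integer    :: i

  ! Fill the inputs with simple, reproducible values (assumed data).
  do i = 1, N
     pxy(i)%a = (/ 1.0_dp, 2.0_dp /)
     dpy(i)%a = (/ 0.5_dp, 0.25_dp /)
  end do

  s = 0.0_dp
  do i = 1, N
     ! In ifort's asm: movaps loads pxy(i)%a(1:2), movddup broadcasts
     ! dpy(i)%a(1), then mulpd/addpd update s(1:2) as one packed operation.
     s(1:2) = s(1:2) + pxy(i)%a(:) * dpy(i)%a(1)
     ! Same pattern with dpy(i)%a(2) broadcast, accumulating into s(3:4).
     s(3:4) = s(3:4) + pxy(i)%a(:) * dpy(i)%a(2)
  end do

  print *, s
end program inner_loop_sketch
```

Building this with the gfortran command line from this comment should give a small reproducer of the same loop shape. Whether a given GCC version vectorizes it has to be checked in the generated assembly.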
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079
* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
From: jv244 at cam dot ac dot uk @ 2008-08-19 5:46 UTC (permalink / raw)
To: gcc-bugs
------- Comment #10 from jv244 at cam dot ac dot uk 2008-08-19 05:45 -------
Created an attachment (id=16094)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16094&action=view)
comment #0 intel's assembly (ifort 9.1 at -O2 -xT)
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079
* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
From: jv244 at cam dot ac dot uk @ 2008-08-19 5:45 UTC (permalink / raw)
To: gcc-bugs
------- Comment #8 from jv244 at cam dot ac dot uk 2008-08-19 05:43 -------
(In reply to comment #7)
> That is, GCC's inner loop is
>
> .L6:
> addl $1, %eax
> addsd %xmm12, %xmm11
> cmpl $100000000, %eax
> addsd %xmm14, %xmm3
> addsd %xmm15, %xmm2
> addsd %xmm13, %xmm1
> jne .L6
>
> which doesn't necessarily look slower than ICC's.
>
Right... I checked trunk, and it now does something very smart with the testcase
from comment #4: it is now about 10 times faster than ifort (9.1 / 11.0)
> gfortran -O3 -ftree-vectorize -ffast-math -march=native -S PR31079_4.f90
> ./a.out
0.25201499
> ifort -xT -O2 PR31079_4.f90
> ./a.out
2.040127
I'll see if there is a way to make the testcase somewhat smarter. I checked the
very first program (comment #0), and this is still slower with gfortran
(timings: ifort 3.51 vs. gfortran 4.1). For completeness, I attach the Fortran
source and the Intel assembly.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079
* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
From: jv244 at cam dot ac dot uk @ 2008-08-19 5:45 UTC (permalink / raw)
To: gcc-bugs
------- Comment #9 from jv244 at cam dot ac dot uk 2008-08-19 05:44 -------
Created an attachment (id=16093)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16093&action=view)
comment #0 source
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079
* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
From: rguenth at gcc dot gnu dot org @ 2008-08-18 15:23 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from rguenth at gcc dot gnu dot org 2008-08-18 15:22 -------
That is, GCC's inner loop is
.L6:
addl $1, %eax
addsd %xmm12, %xmm11
cmpl $100000000, %eax
addsd %xmm14, %xmm3
addsd %xmm15, %xmm2
addsd %xmm13, %xmm1
jne .L6
which doesn't necessarily look slower than ICC's.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079
* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
From: rguenth at gcc dot gnu dot org @ 2008-08-18 15:21 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from rguenth at gcc dot gnu dot org 2008-08-18 15:20 -------
The problem for the GCC vectorizer is that there are no loads or stores left
in the loop and it doesn't handle vectorizing "registers" only. This is a
case where real vectorization of straight-line code would be necessary.
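The situation described here can be sketched as a loop whose body only updates scalars held in registers. The program below is a hypothetical reconstruction based on the shape of GCC's inner loop in comment #7 (four independent addsd instructions, trip count 100000000). It is not the comment #4 testcase, and all names in it are assumptions.

```fortran
! Hypothetical sketch of a "registers only" loop: no array loads or stores
! inside the loop body. NOT the testcase from comment #4; names are assumed.
program registers_only_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: niter = 100000000   ! trip count seen in the asm
  real(dp) :: s1, s2, s3, s4
  real(dp) :: a, b, c, d
  integer  :: i

  ! Use run-time values so the increments are not compile-time constants.
  call random_number(a)
  call random_number(b)
  call random_number(c)
  call random_number(d)

  s1 = 0.0_dp
  s2 = 0.0_dp
  s3 = 0.0_dp
  s4 = 0.0_dp

  do i = 1, niter
     ! Four independent scalar accumulations; each maps to one addsd.
     ! Pairing (s1,s2) and (s3,s4) into two packed adds would require
     ! vectorizing straight-line code rather than memory accesses.
     s1 = s1 + a
     s2 = s2 + b
     s3 = s3 + c
     s4 = s4 + d
  end do

  ! Print the results so the loop is not trivially dead code.
  print *, s1, s2, s3, s4
end program registers_only_sketch
```

With -ffast-math a compiler may simplify such a loop considerably, so the generated assembly should be inspected before drawing timing conclusions from it.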
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079
* [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization
From: jv244 at cam dot ac dot uk @ 2008-01-08 10:22 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from jv244 at cam dot ac dot uk 2008-01-08 09:52 -------
Updated the summary after the analysis in comment #4, and CCed Dorit for
the vectorization issue.
--
jv244 at cam dot ac dot uk changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dorit at il dot ibm dot com
            Summary|300% difference between     |20% difference between
                   |ifort/gfortran              |ifort/gfortran, missed
                   |                            |vectorization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31079
Thread overview: 11+ messages
[not found] <bug-31079-4@http.gcc.gnu.org/bugzilla/>
2012-07-18 13:28 ` [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization rguenth at gcc dot gnu.org
2013-03-27 11:45 ` rguenth at gcc dot gnu.org
2007-03-08 9:46 [Bug tree-optimization/31079] New: 300% difference between ifort/gfortran jv244 at cam dot ac dot uk
2008-01-08 10:22 ` [Bug tree-optimization/31079] 20% difference between ifort/gfortran, missed vectorization jv244 at cam dot ac dot uk
2008-08-18 15:21 ` rguenth at gcc dot gnu dot org
2008-08-18 15:23 ` rguenth at gcc dot gnu dot org
2008-08-19 5:45 ` jv244 at cam dot ac dot uk
2008-08-19 5:45 ` jv244 at cam dot ac dot uk
2008-08-19 5:46 ` jv244 at cam dot ac dot uk
2008-08-19 6:11 ` jv244 at cam dot ac dot uk
2008-08-19 6:12 ` jv244 at cam dot ac dot uk
2008-08-19 13:33 ` jv244 at cam dot ac dot uk