public inbox for gcc-bugs@sourceware.org
* [Bug fortran/31139] New: sum(w_re(1:nn,1)*fi(i(1:nn, ii))) up to 3.5x slower than C version
@ 2007-03-11 22:38 burnus at gcc dot gnu dot org
2007-03-11 22:45 ` [Bug fortran/31139] " burnus at gcc dot gnu dot org
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: burnus at gcc dot gnu dot org @ 2007-03-11 22:38 UTC
To: gcc-bugs
The problem came up with octopus, http://www.tddft.org/programs/octopus/
http://www.tddft.org/pipermail/octopus-devel/2007-March/003398.html
(Somehow, all my messages are not archived?)
The problem is that "sum(w_re(1:nn,1)*fi(i(1:nn, ii)))" can be much slower. For
the original program, one finds the following timings:
Core 2 Duo + gfortran + gcc (v. 4.1.x), total CPU time:
SSE2 120 s
PLAIN C 160 s
FORTRAN 331 s
(Fortran = assumed shape arrays)
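For readers who do not speak Fortran, here is a rough C counterpart of the
expression in the subject line. It is only a sketch: the function name
stencil_sum and the parameter names are made up for illustration and are not
taken from octopus or from the attached test case.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical C counterpart of sum(w_re(1:nn,1) * fi(i(1:nn, ii))).
 *   w   : nn real weights, i.e. w_re(1:nn,1)
 *   fi  : complex field values
 *   idx : nn 1-based indices into fi for one output point, i.e. i(1:nn, ii)
 */
double complex stencil_sum(size_t nn, const double *w,
                           const double complex *fi, const int *idx)
{
    double complex acc = 0.0;
    for (size_t j = 0; j < nn; j++) {
        /* Fortran indices start at 1, hence the -1. */
        acc += w[j] * fi[idx[j] - 1];
    }
    return acc;
}
```

The point is that the whole expression is a gather through an index array
followed by a weighted reduction, with real weights and complex data.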
* * *
I tried to reproduce this with a smaller test case (see attachment) - and with
explicit shape arrays. Here, SSE and non-SSE version made little difference.
Result for gcc/gfortran 4.3.0 20070311 on an Athlon 64 X2 4800+.
-O3 -march=opteron -funroll-loops -msse3 -ftree-vectorize -m64:
Fortran: 0.8240519, real 0m7.661s, user 0m7.232s
Fortran: 0.8240528, real 0m7.654s, user 0m7.232s
c_nosse: 0.2320137, real 0m7.071s, user 0m6.652s
c_nosse: 0.2320151, real 0m7.062s, user 0m6.672s
-O3 -march=opteron -msse3 -ftree-vectorize -m32:
Fortran: 0.3840241, real 0m7.714s, user 0m7.280s
Fortran: 0.3840246, real 0m7.701s, user 0m7.328s
c_nosse: 0.3480220, real 0m7.687s, user 0m7.256s
c_nosse: 0.3400207, real 0m7.670s, user 0m7.236s
And with ifort/x86-64:
gcc -std=c99 -O3 -funroll-loops -ftree-vectorize -march=opteron -msse3 -m64
ifort -xW -O3
Fortran: 0.3280210, real 0m0.855s, user 0m0.624s
Fortran: 0.3280210, real 0m0.856s, user 0m0.624s
c_nosse: 0.2320140, real 0m0.753s, user 0m0.492s
c_nosse: 0.2280150, real 0m0.756s, user 0m0.464s
and with ifort/ia32:
Fortran: 0.3000200, real 0m0.818s, user 0m0.516s
Fortran: 0.2960190, real 0m0.826s, user 0m0.528s
c_nosse: 0.3760230, real 0m0.904s, user 0m0.652s
c_nosse: 0.3800240, real 0m0.902s, user 0m0.624s
I have not yet checked which of the problems are Fortran front-end, back-end,
or target problems.
Summary:
- The C version built with gcc -m32 is slower than with -m64 (about 0.34 vs.
  0.23)
- gfortran is slower (-m32) / much slower (-m64) than the C version
- ifort is faster than gfortran and similarly fast on both -m32 and -m64.
--
Summary: sum(w_re(1:nn,1)*fi(i(1:nn, ii))) up to 3.5x slower
than C version
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: fortran
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: burnus at gcc dot gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139
* [Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii))) up to 3.5x slower than C version
From: burnus at gcc dot gnu dot org @ 2007-03-11 22:45 UTC
To: gcc-bugs
------- Comment #1 from burnus at gcc dot gnu dot org 2007-03-11 22:45 -------
Contains the test case. The hand-made SSE version (USE_VECTORS) crashes here
for -m32, but as it is C vs. Fortran, one can completely ignore that test case
(for -m64 USE_VECTORS is about as fast as the other C versions anyhow).
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139
* [Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii))) up to 3.5x slower than C version
From: burnus at gcc dot gnu dot org @ 2007-03-11 22:50 UTC
To: gcc-bugs
------- Comment #2 from burnus at gcc dot gnu dot org 2007-03-11 22:50 -------
Created an attachment (id=13191)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13191&action=view)
test.tar.gz
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139
* [Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii))) up to 3.5x slower than C version
From: pinskia at gcc dot gnu dot org @ 2007-03-12 5:33 UTC
To: gcc-bugs
------- Comment #3 from pinskia at gcc dot gnu dot org 2007-03-12 05:33 -------
The problem here is obvious: in the Fortran case a temporary array is created,
while in the C case there is none. Also, in the optimized C case, the
multiplication of the complex numbers is incorrect unless you add -ffast-math.
Actually, I think it is incorrect in both C cases. Can someone try with
-ffast-math for both the C and Fortran cases?
complex * complex is not a simple cross product in the FP world.
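To make the temporary-array point concrete, here is a sketch of the two code
shapes being contrasted. The thread does not show what gfortran actually
generates, so the first variant is only an assumption about what such a
temporary could look like; the function names and the 0-based index array are
made up for illustration.

```c
#include <complex.h>
#include <stddef.h>
#include <stdlib.h>

/* Variant A (assumed shape of the Fortran code): the gathered section
 * fi(i(1:nn, ii)) is first copied into a temporary, then reduced. */
double complex sum_with_temp(size_t nn, const double *w,
                             const double complex *fi, const int *idx)
{
    double complex acc = 0.0;
    double complex *tmp = malloc(nn * sizeof *tmp);
    if (tmp == NULL)
        return acc;                 /* sketch only: no real error handling */
    for (size_t j = 0; j < nn; j++)
        tmp[j] = fi[idx[j]];        /* gather pass writes the temporary */
    for (size_t j = 0; j < nn; j++)
        acc += w[j] * tmp[j];       /* reduce pass reads it back */
    free(tmp);
    return acc;
}

/* Variant B (what the hand-written C does): gather and reduce in one loop,
 * with no temporary storage. */
double complex sum_fused(size_t nn, const double *w,
                         const double complex *fi, const int *idx)
{
    double complex acc = 0.0;
    for (size_t j = 0; j < nn; j++)
        acc += w[j] * fi[idx[j]];
    return acc;
}
```

Both variants return the same value; the first one simply pays for an extra
pass over memory (and here, for an allocation) per output point.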
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139
* [Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii))) up to 3.5x slower than C version
From: pinskia at gcc dot gnu dot org @ 2007-03-12 5:38 UTC
To: gcc-bugs
------- Comment #4 from pinskia at gcc dot gnu dot org 2007-03-12 05:37 -------
> The hand-made SSE version (USE_VECTORS) crashes here for -m32
That is because the alignment of complex(8) is the same as that of double,
which means it is only 8-byte aligned and not 16-byte aligned; it is pure luck
that the SSE case works for -m64 as well.
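The alignment argument can be checked directly in C. This is a minimal sketch
assuming a C11 compiler; the helper name is_16_byte_aligned is made up for
illustration. The C standard gives a complex type the same alignment
requirement as an array of two elements of its real type, so the static
assertion below should hold on any conforming implementation.

```c
#include <complex.h>
#include <stdalign.h>
#include <stdint.h>

/* A complex double is laid out and aligned like double[2], so it is not
 * guaranteed to sit on a 16-byte boundary. */
_Static_assert(alignof(double complex) == alignof(double),
               "complex double has the alignment of double");

/* Returns nonzero if p is suitable for an aligned 16-byte SSE load. */
int is_16_byte_aligned(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}
```

Hand-written SSE code that uses aligned loads would need either such a check
or storage that is explicitly over-aligned, for example with alignas(16).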
I am thinking about declaring this bug as invalid as right now the C testcase
is not even closely related to the Fortran case.
Can someone try, instead of doing "__real__ a += w[j] * __real__ mfi[*index];",
using "a += xxx * yyy", and also use -std=c99 to get the correct multiplication?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139
* [Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii))) up to 3.5x slower than C version
From: burnus at gcc dot gnu dot org @ 2007-03-12 7:58 UTC
To: gcc-bugs
------- Comment #5 from burnus at gcc dot gnu dot org 2007-03-12 07:58 -------
> complex * complex is not a simple cross product in FP world.
Well, the program calculates: real * complex
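The distinction matters for the cost of the operation. As far as I understand
C99 Annex G, a full complex * complex product is expected to recover
infinities when the textbook formula yields NaN, which is why it is more than
four multiplications unless something like -ffast-math relaxes that. A
real * complex product just scales both parts. The sketch below uses made-up
function names and is not code from the test case.

```c
#include <complex.h>

/* real * complex: two real multiplications, one per part. */
double complex scale(double w, double complex z)
{
    return w * z;
}

/* Textbook complex * complex formula, with no special handling of
 * infinities or NaNs; shown only for comparison. */
double complex mul_naive(double complex a, double complex b)
{
    double re = creal(a) * creal(b) - cimag(a) * cimag(b);
    double im = creal(a) * cimag(b) + cimag(a) * creal(b);
    return re + im * I;
}
```

For finite inputs such as (1 + 2i) * (3 + 4i) the textbook formula and the
built-in operator agree; they can differ only in the overflow and NaN corner
cases.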
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139
* [Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii))) up to 3.5x slower than C version
From: burnus at gcc dot gnu dot org @ 2007-03-12 8:16 UTC
To: gcc-bugs
------- Comment #6 from burnus at gcc dot gnu dot org 2007-03-12 08:16 -------
> Can someone try instead of doing "__real__ a += w[j] *__real__ mfi[*index];"
> Use "a+= xxx* yyy" and also use -std=c99 to get the correct multiplication?
Well, -std=c99 was used already, and the "real(!) * complex" calculation was
already correct. "c_cmplx" below now uses:
a += w[j ] * mfi[*index++];
Compiled with:
gcc -std=c99 -O3 -funroll-loops -ftree-vectorize -march=opteron -msse3
-ffast-math -m64
gfortran -O3 -funroll-loops -ftree-vectorize -march=opteron -msse3 -ffast-math
-m64
Fortran: 0.4360271
Fortran: 0.4280267
c_nosse: 0.2440166
c_nosse: 0.2320151
c_sse: 0.2320137
c_sse: 0.2400150
c_struct: 0.2320151
c_struct: 0.2320147
c_cmplx: 0.2360163
c_cmplx: 0.2320147
And using a non-manually unrolled version: 0.3760242, 0.3760242
    for(i = 0; i < np ; i++) {
        for(j = 1; j < n; j++)
            a += w[j ] * mfi[*index++];
        fo[i] = a;
    }
Thus the manual unrolling seems to account for most of the speed-up. With
-funroll-all-loops, the timings of the Fortran and the non-unrolled C version
remain the same.
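The manually unrolled code itself lives in the attachment and is not quoted in
this thread, so the following is only a guess at its general shape: a 4-way
unroll with independent accumulators, next to the plain loop. The function
names and the unroll factor are assumptions made for illustration.

```c
#include <complex.h>
#include <stddef.h>

/* Plain loop with a single accumulator. */
double complex dot_simple(size_t n, const double *w,
                          const double complex *mfi, const int *index)
{
    double complex a = 0.0;
    for (size_t j = 0; j < n; j++)
        a += w[j] * mfi[index[j]];
    return a;
}

/* Hypothetical 4-way manual unroll: four independent accumulators shorten
 * the dependency chain on the running sum. */
double complex dot_unrolled4(size_t n, const double *w,
                             const double complex *mfi, const int *index)
{
    double complex a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0;
    size_t j = 0;
    for (; j + 4 <= n; j += 4) {
        a0 += w[j]     * mfi[index[j]];
        a1 += w[j + 1] * mfi[index[j + 1]];
        a2 += w[j + 2] * mfi[index[j + 2]];
        a3 += w[j + 3] * mfi[index[j + 3]];
    }
    for (; j < n; j++)              /* remainder when n is not a multiple of 4 */
        a0 += w[j] * mfi[index[j]];
    return (a0 + a1) + (a2 + a3);
}
```

Note that splitting the sum over several accumulators changes the order of
the floating-point additions, so the two functions can differ in the last
bits for general data. That reordering is something the compiler may not do
on its own without -ffast-math.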
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139
* [Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii))) up to 3.5x slower than C version
From: fxcoudert at gcc dot gnu dot org @ 2007-04-18 5:56 UTC
To: gcc-bugs
--
fxcoudert at gcc dot gnu dot org changed:
                What|Removed                     |Added
----------------------------------------------------------------------------
              Status|UNCONFIRMED                 |NEW
      Ever Confirmed|0                           |1
Last reconfirmed date|0000-00-00 00:00:00         |2007-04-18 06:56:03
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139
* [Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii))) up to 3.5x slower than C version
From: jvdelisle at gcc dot gnu dot org @ 2010-09-12 15:45 UTC
To: gcc-bugs
------- Comment #7 from jvdelisle at gcc dot gnu dot org 2010-09-12 15:45 -------
I have not tested this with latest trunk, but I wonder if any of the recent
optimization work has improved this. Can it be closed yet?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139
Thread overview: 9+ messages
2007-03-11 22:38 [Bug fortran/31139] New: sum(w_re(1:nn,1)*fi(i(1:nn, ii))) up to 3.5x slower than C version burnus at gcc dot gnu dot org
2007-03-11 22:45 ` [Bug fortran/31139] " burnus at gcc dot gnu dot org
2007-03-11 22:50 ` burnus at gcc dot gnu dot org
2007-03-12 5:33 ` pinskia at gcc dot gnu dot org
2007-03-12 5:38 ` pinskia at gcc dot gnu dot org
2007-03-12 7:58 ` burnus at gcc dot gnu dot org
2007-03-12 8:16 ` burnus at gcc dot gnu dot org
2007-04-18 5:56 ` fxcoudert at gcc dot gnu dot org
2010-09-12 15:45 ` jvdelisle at gcc dot gnu dot org