public inbox for gcc-bugs@sourceware.org
* [Bug target/36241] New: Executable compiled with -m64 almost three times faster than the one compiled with -m32 on Core2Duo
@ 2008-05-15 8:05 dominiq at lps dot ens dot fr
2008-05-15 9:07 ` [Bug target/36241] " rguenth at gcc dot gnu dot org
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: dominiq at lps dot ens dot fr @ 2008-05-15 8:05 UTC (permalink / raw)
To: gcc-bugs
The following code (borrowed from
http://gcc.gnu.org/ml/gcc/2008-05/msg00134.html):
integer(8), parameter :: l = z'5fe6eb3be0000000'
integer, parameter :: ni = 3
integer :: i, j, n
integer(8) :: k
real(8) :: a, b, e, m, s
equivalence (b, k)
a = 1.0d0
e = epsilon(1.0)/2.0d0**4
m = 0.0d0
s = 0.0d0
n = 0
do
n = n + 1
b = a
k = l - ishft(k, -1_8)
do i = 1, ni
b = b*(1.5-(0.5*a)*b*b)
end do
b = b + b*(0.5-(0.5*a)*b*b)
! b = 1.0d0/sqrt(a)
m = max(m, abs(a*b*b - 1.0d0))
s = s + abs(a*b*b - 1.0d0)
a = a + e
if (a == 2.0d0) exit
end do
print *, n, m/epsilon(a), s/(n*epsilon(a))
end
gives the following timings:
[ibook-dhum] bug/timing% gfc -m64 -O3 rsqrt_8_nr_v1_s.f90
[ibook-dhum] bug/timing% time a.out
134217728 2.0000000000000000 0.36966567113995552
2.662u 0.008s 0:02.67 99.6% 0+0k 0+1io 0pf+0w
[ibook-dhum] bug/timing% gfc -m32 -O3 rsqrt_8_nr_v1_s.f90
[ibook-dhum] bug/timing% time a.out
134217728 2.0000000000000000 0.36966567113995552
7.401u 0.023s 0:07.42 100.0% 0+0k 0+0io 0pf+0w
For comparison the following code:
integer :: n
real(8) :: a, b, e, m, s
a = 1.0d0
e = epsilon(1.0)/2.0d0**4
s = 0.0d0
m = 0.0d0
n = 0
do
n = n + 1
b = 1.0d0/sqrt(a)
s = s + abs(a*b*b - 1.0d0)
m = max(m, abs(a*b*b - 1.0d0))
a = a + e
if (a == 2.0d0) exit
end do
print *, n, m/epsilon(a), s/(n*epsilon(a))
end
gives
[ibook-dhum] bug/timing% gfc -m64 -O3 rsqrt_8_s.f90
[ibook-dhum] bug/timing% time a.out
134217728 1.00000000000000000 0.49419290572404861
5.469u 0.002s 0:05.47 99.8% 0+0k 0+0io 0pf+0w
[ibook-dhum] bug/timing% gfc -m32 -O3 rsqrt_8_s.f90
[ibook-dhum] bug/timing% time a.out
134217728 1.00000000000000000 0.49419290572404861
5.475u 0.020s 0:05.49 100.0% 0+0k 0+0io 0pf+0w
Note that the latter code is vectorized, while the former is not.
--
Summary: Executable compiled with -m64 almost three times faster
than the one compiled with -m32 on Core2Duo
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dominiq at lps dot ens dot fr
GCC build triplet: i686-apple-darwin9
GCC host triplet: i686-apple-darwin9
GCC target triplet: i686-apple-darwin9
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36241
* [Bug target/36241] Executable compiled with -m64 almost three times faster than the one compiled with -m32 on Core2Duo
From: rguenth at gcc dot gnu dot org @ 2008-05-15 9:07 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from rguenth at gcc dot gnu dot org 2008-05-15 09:06 -------
First, without -ffast-math phiopt doesn't recognize the MAX_EXPR (see
PR36190). Second, the vectorizer reports
t.f90:24: note: not vectorized: number of iterations cannot be computed.
t.f90:24: note: bad loop form.
which is the problem for both testcases (so I don't see either one being
vectorized). But I can confirm that -m32 is more than two times slower
(also with SSE math), likely due to the use of integer(8).
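[Editorial sketch, not part of the original comment.] The "number of iterations cannot be computed" note refers to the open-ended `do ... if (a == 2.0d0) exit` loop form. As an illustration only, the second test case could be rewritten with an explicit trip count. The sketch below assumes the 134217728 iterations printed by the original runs; the program name and `nmax` are hypothetical, and whether GCC actually vectorizes this form has not been checked here.

```fortran
program rsqrt_counted
  implicit none
  ! Assumption: 134217728 = 2**27 iterations, the count printed by the
  ! original test case (a runs from 1.0 up to 2.0 in steps of e = 2**(-27)).
  integer, parameter :: nmax = 134217728
  integer :: n
  real(8) :: a, b, e, m, s
  e = epsilon(1.0)/2.0d0**4
  m = 0.0d0
  s = 0.0d0
  do n = 1, nmax
     ! Same value as the running sum a = a + e in the original loop;
     ! every intermediate result is exactly representable here.
     a = 1.0d0 + real(n - 1, 8)*e
     b = 1.0d0/sqrt(a)
     s = s + abs(a*b*b - 1.0d0)
     m = max(m, abs(a*b*b - 1.0d0))
  end do
  print *, nmax, m/epsilon(a), s/(nmax*epsilon(a))
end program rsqrt_counted
```

With a counted `do` loop the trip count is known on entry, which is the property the vectorizer message complains about; the floating-point exit test in the original hides it.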
--
rguenth at gcc dot gnu dot org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2008-05-15 09:06:21
               date|                            |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36241
* [Bug target/36241] Executable compiled with -m64 almost three times faster than the one compiled with -m32 on Core2Duo
From: ubizjak at gmail dot com @ 2008-05-15 9:22 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from ubizjak at gmail dot com 2008-05-15 09:21 -------
This regression is due to a store-forwarding penalty:
...
movl %esi, -408(%ebp)
movl %edi, -404(%ebp)
fldl -408(%ebp)
...
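[Editorial sketch, not part of the original comment.] In the sequence above, two 32-bit integer stores are followed at once by a single 64-bit floating-point load from the same address. A load that spans two narrower pending stores generally cannot be forwarded from the store buffer, so it has to wait until both stores have completed. The minimal program below shows the kind of source pattern that leads to such a sequence on a 32-bit target, assuming the 64-bit integer is held in a pair of 32-bit registers. The program name is hypothetical, and because the operands are constants an optimizing compiler may fold the whole computation away, so it illustrates the pattern rather than reproducing the slowdown.

```fortran
program image_roundtrip
  implicit none
  integer(8) :: k
  real(8) :: b
  b = 1.0d0
  ! Reinterpret the 64 bits of b as an integer (no value conversion).
  k = transfer(b, k)
  ! Integer arithmetic on the bit image: adding 1 steps to the next double.
  k = k + 1_8
  ! Reinterpret back. On a 32-bit target k occupies two 32-bit halves, so
  ! this step can become two integer stores followed by one 64-bit FP load.
  b = transfer(k, b)
  ! Prints T: the result is the next representable value above 1.0d0.
  print *, b == nearest(1.0d0, 1.0d0)
end program image_roundtrip
```

With -m64 the integer fits in one 64-bit register, so the value can go back to the floating-point side through a single 64-bit store or move, which is consistent with the faster timings reported in this thread.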
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36241
* [Bug target/36241] Executable compiled with -m64 almost three times faster than the one compiled with -m32 on Core2Duo
From: fxcoudert at gcc dot gnu dot org @ 2009-06-15 18:32 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from fxcoudert at gcc dot gnu dot org 2009-06-15 18:32 -------
This is not darwin-specific, I also see it happening on x86_64-linux.
And what's more, the output changes between -m32 and -m64.
$ cat u.f90
integer(8), parameter :: l = z'5fe6eb3be0000000'
integer, parameter :: ni = 3
integer :: i, j, n
integer(8) :: k
real(8) :: a, b, e, m, s
equivalence (b, k)
a = 1.0d0
e = epsilon(1.0)/2.0d0**4
m = 0.0d0
s = 0.0d0
n = 0
do
n = n + 1
b = a
k = l - ishft(k, -1_8)
do i = 1, ni
b = b*(1.5-(0.5*a)*b*b)
end do
b = b + b*(0.5-(0.5*a)*b*b)
! b = 1.0d0/sqrt(a)
m = max(m, abs(a*b*b - 1.0d0))
s = s + abs(a*b*b - 1.0d0)
a = a + e
if (a == 2.0d0) exit
end do
print *, n, m/epsilon(a), s/(n*epsilon(a))
end
$ gfortran -m64 -O3 u.f90 && time ./a.out
134217728 2.0000000000000000 0.36966567113995552
./a.out 3.05s user 0.00s system 100% cpu 3.049 total
$ gfortran -m32 -O3 u.f90 && time ./a.out
134217728 9.76562500000000000E-004 1.82069155926001258E-004
./a.out 6.80s user 0.00s system 99% cpu 6.854 total
$ gfortran -m32 -O3 u.f90 -ffast-math && time ./a.out
134217728 1.46484375000000000E-003 2.97074087939108722E-004
./a.out 6.74s user 0.00s system 99% cpu 6.743 total
$ gfortran -m64 -O3 u.f90 -ffast-math && time ./a.out
134217728 3.0000000000000000 0.59681034088134766
./a.out 3.18s user 0.00s system 99% cpu 3.178 total
--
fxcoudert at gcc dot gnu dot org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
  GCC build triplet|i686-apple-darwin9          |
   GCC host triplet|i686-apple-darwin9          |
 GCC target triplet|i686-apple-darwin9          |i686
      Known to fail|                            |4.4.0 4.5.0
   Last reconfirmed|2008-05-15 09:06:21         |2009-06-15 18:32:39
               date|                            |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36241
* [Bug target/36241] Executable compiled with -m64 almost three times faster than the one compiled with -m32 on Core2Duo
From: kargl at gcc dot gnu dot org @ 2009-06-15 18:44 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from kargl at gcc dot gnu dot org 2009-06-15 18:44 -------
(In reply to comment #3)
> This is not darwin-specific, I also see it happening on x86_64-linux.
>
> And what's more, the output changes between -m32 and -m64.
The code is invalid Fortran, so gfortran is not required to give
any sensible output.
> $ cat u.f90
> integer(8), parameter :: l = z'5fe6eb3be0000000'
> integer, parameter :: ni = 3
> integer :: i, j, n
> integer(8) :: k
> real(8) :: a, b, e, m, s
> equivalence (b, k)
> a = 1.0d0
> e = epsilon(1.0)/2.0d0**4
> m = 0.0d0
> s = 0.0d0
> n = 0
> do
> n = n + 1
> b = a
When you do the assignment to b, k is/becomes undefined.
> k = l - ishft(k, -1_8)
When you do the assignment to k, b becomes undefined.
Not to mention that the RHS uses k, which is undefined.
See Section 14.7.6 (1) and (9).
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36241
* [Bug target/36241] Executable compiled with -m64 almost three times faster than the one compiled with -m32 on Core2Duo
From: dominiq at lps dot ens dot fr @ 2009-06-16 20:30 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from dominiq at lps dot ens dot fr 2009-06-16 20:30 -------
I have forgotten this one!
> This is not darwin-specific, I also see it happening on x86_64-linux.
>
> And what's more, the output changes between -m32 and -m64.
This is probably related to the extra precision for some floating-point
computations (disabled by default on darwin and I refuse to learn what to do to
change it!-).
> The code is invalid Fortran, so gfortran is not required to give
> any sensible output.
You know that it is not relevant for this pr!-( would the following make you
happier?)
integer(8), parameter :: l = z'5fe6eb3be0000000'
integer, parameter :: ni = 3
integer :: i, j, n
integer(8) :: k
real(8) :: a, b, e, m, s
!equivalence (b, k)
a = 1.0d0
e = epsilon(1.0)/2.0d0**4
m = 0.0d0
s = 0.0d0
n = 0
do
n = n + 1
b = a
k = transfer(b,k)
k = l - ishft(k, -1_8)
b = transfer(k,b)
do i = 1, ni
b = b*(1.5-(0.5*a)*b*b)
end do
b = b + b*(0.5-(0.5*a)*b*b)
! b = 1.0d0/sqrt(a)
m = max(m, abs(a*b*b - 1.0d0))
s = s + abs(a*b*b - 1.0d0)
a = a + e
if (a == 2.0d0) exit
end do
print *, n, m/epsilon(a), s/(n*epsilon(a))
end
The timings are:
[ibook-dhum] f90/bug% gfc -O3 pr36241_db.f90
[ibook-dhum] f90/bug% time a.out
134217728 2.0000000000000000 0.36966567113995552
7.832u 0.010s 0:07.85 99.8% 0+0k 0+0io 0pf+0w
[ibook-dhum] f90/bug% gfc -m64 -O3 pr36241_db.f90
[ibook-dhum] f90/bug% time a.out
134217728 2.0000000000000000 0.36966567113995552
3.327u 0.011s 0:03.35 99.4% 0+0k 0+0io 0pf+0w
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36241
* [Bug target/36241] Executable compiled with -m64 almost three times faster than the one compiled with -m32 on Core2Duo
From: kargl at gcc dot gnu dot org @ 2009-06-16 21:02 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from kargl at gcc dot gnu dot org 2009-06-16 21:02 -------
(In reply to comment #5)
> > The code is invalid Fortran, so gfortran is not required to give
> > any sensible output.
>
> You know that it is not relevant for this pr!-( would the following make you
> happier?)
It most certainly is relevant to this PR and any other
PR with invalid Fortran. A Fortran processor can do
anything it wants with invalid code, including meeting
your expectations.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36241
* [Bug target/36241] Executable compiled with -m64 almost three times faster than the one compiled with -m32 on Core2Duo
From: ubizjak at gmail dot com @ 2009-06-17 9:19 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from ubizjak at gmail dot com 2009-06-17 09:18 -------
See Comment #2!
I tried to enhance the ix86_secondary_reload target macro to return an XMM
intermediate reg with the movdi_to_sse handler for DImode -> DFmode moves. However,
the handling of this macro has plenty of FIXMEs, and I was not able to get it to work.
OTOH, doing integer arithmetics on 64bit _image_ of FP value has questionable
usability, so the motivation to fix this PR is proportionally low...
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36241
* [Bug target/36241] Executable compiled with -m64 almost three times faster than the one compiled with -m32 on Core2Duo
From: ubizjak at gmail dot com @ 2009-06-29 8:47 UTC (permalink / raw)
To: gcc-bugs
------- Comment #8 from ubizjak at gmail dot com 2009-06-29 08:47 -------
(In reply to comment #7)
> OTOH, doing integer arithmetics on 64bit _image_ of FP value has questionable
> usability, so the motivation to fix this PR is proportionally low...
So, wontfix.
--
ubizjak at gmail dot com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36241