[Bug target/36241] New: Executable compiled with -m64 almost three times faster than the one compiled with -m32 on Core2Duo

From: dominiq at lps dot ens dot fr @ 2008-05-15  8:05 UTC (permalink / raw)
  To: gcc-bugs

The following code (borrowed from
http://gcc.gnu.org/ml/gcc/2008-05/msg00134.html):

integer(8), parameter :: l = z'5fe6eb3be0000000'
integer, parameter :: ni = 3
integer :: i, j, n
integer(8) :: k
real(8) :: a, b, e, m, s
equivalence (b, k)
a = 1.0d0
e = epsilon(1.0)/2.0d0**4
m = 0.0d0
s = 0.0d0
n = 0
do
  n = n + 1
  b = a
  k = l - ishft(k, -1_8)
  do i = 1, ni
    b = b*(1.5-(0.5*a)*b*b)
  end do
  b = b + b*(0.5-(0.5*a)*b*b)
!   b = 1.0d0/sqrt(a)
  m = max(m, abs(a*b*b - 1.0d0))
  s = s + abs(a*b*b - 1.0d0)
  a = a + e
  if (a == 2.0d0) exit
end do
print *, n, m/epsilon(a), s/(n*epsilon(a))
end

gives the following timings:

[ibook-dhum] bug/timing% gfc -m64 -O3 rsqrt_8_nr_v1_s.f90
[ibook-dhum] bug/timing% time a.out
   134217728   2.0000000000000000       0.36966567113995552     
2.662u 0.008s 0:02.67 99.6%     0+0k 0+1io 0pf+0w

[ibook-dhum] bug/timing% gfc -m32 -O3 rsqrt_8_nr_v1_s.f90
[ibook-dhum] bug/timing% time a.out
   134217728   2.0000000000000000       0.36966567113995552     
7.401u 0.023s 0:07.42 100.0%    0+0k 0+0io 0pf+0w
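
For readers unfamiliar with the trick used in the first program: it treats the bits of the double a as a 64-bit integer, forms an initial guess for 1/sqrt(a) as l - (bits >> 1), and then refines that guess with Newton-Raphson steps. The sketch below is a Python model of that inner computation only, not of the benchmark. The function names are made up for illustration, and struct stands in for the Fortran EQUIVALENCE as the means of reinterpreting the bits.

```python
import struct

MAGIC = 0x5FE6EB3BE0000000  # the constant `l` from the program above
NI = 3                      # number of Newton-Raphson iterations (`ni`)

# Step used by the benchmark loop: epsilon(1.0) / 2.0d0**4, where
# epsilon(1.0) is the single-precision epsilon 2**-23.
STEP = 2.0 ** -23 / 2.0 ** 4


def double_to_bits(x: float) -> int:
    """Reinterpret an IEEE-754 double as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_double(k: int) -> float:
    """Reinterpret an unsigned 64-bit integer as an IEEE-754 double."""
    return struct.unpack("<d", struct.pack("<Q", k))[0]


def rsqrt_nr(a: float) -> float:
    """Approximate 1/sqrt(a) with the same operations as the Fortran loop body."""
    # Initial guess: k = l - ishft(k, -1), where k holds the bits of a.
    b = bits_to_double(MAGIC - (double_to_bits(a) >> 1))
    # Newton-Raphson refinement, written exactly as in the program.
    for _ in range(NI):
        b = b * (1.5 - (0.5 * a) * b * b)
    # Final correction step.
    return b + b * (0.5 - (0.5 * a) * b * b)
```

The step STEP equals 2**-27, so walking a from 1 to 2 takes 2**27 = 134217728 iterations, which is the value of n printed in the timings above.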

For comparison the following code:

integer :: n
real(8) :: a, b, e, m, s
a = 1.0d0
e = epsilon(1.0)/2.0d0**4
s = 0.0d0
m = 0.0d0
n = 0
do
  n = n + 1
  b = 1.0d0/sqrt(a)
  s = s + abs(a*b*b - 1.0d0)
  m = max(m, abs(a*b*b - 1.0d0))
  a = a + e
  if (a == 2.0d0) exit
end do
print *, n, m/epsilon(a), s/(n*epsilon(a))
end

gives

[ibook-dhum] bug/timing% gfc -m64 -O3 rsqrt_8_s.f90
[ibook-dhum] bug/timing% time a.out
   134217728  1.00000000000000000       0.49419290572404861     
5.469u 0.002s 0:05.47 99.8%     0+0k 0+0io 0pf+0w
[ibook-dhum] bug/timing% gfc -m32 -O3 rsqrt_8_s.f90
[ibook-dhum] bug/timing% time a.out
   134217728  1.00000000000000000       0.49419290572404861     
5.475u 0.020s 0:05.49 100.0%    0+0k 0+0io 0pf+0w

Note that the latter code is vectorized, while the former is not.
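
The ratios implied by the user times reported above can be worked out as follows. The numbers are copied from the timings in this message; the dictionary layout and the function name are only illustrative.

```python
# User times in seconds, copied from the `time a.out` output above.
USER_SECONDS = {
    "rsqrt_8_nr_v1_s.f90": {"m64": 2.662, "m32": 7.401},
    "rsqrt_8_s.f90": {"m64": 5.469, "m32": 5.475},
}


def m32_over_m64(times: dict) -> float:
    """Return how many times slower the -m32 build is than the -m64 build."""
    return times["m32"] / times["m64"]


if __name__ == "__main__":
    for name, times in USER_SECONDS.items():
        print(f"{name}: -m32 / -m64 = {m32_over_m64(times):.2f}")
```

The Newton-Raphson version comes out at a ratio of roughly 2.8, while the sqrt version is essentially the same speed in both modes.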


-- 
           Summary: Executable compiled with -m64 almost three times faster
                    than the one compiled with -m32 on Core2Duo
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dominiq at lps dot ens dot fr
 GCC build triplet: i686-apple-darwin9
  GCC host triplet: i686-apple-darwin9
GCC target triplet: i686-apple-darwin9


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36241


From: rguenth at gcc dot gnu dot org @ 2008-05-15  9:07 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from rguenth at gcc dot gnu dot org  2008-05-15 09:06 -------
First, without -ffast-math, phiopt doesn't recognize the MAX_EXPR (see
PR36190); second,

t.f90:24: note: not vectorized: number of iterations cannot be computed.
t.f90:24: note: bad loop form.

which is the problem for both testcases (so I don't see either one being
vectorized).  But I can confirm that -m32 is more than two times slower
(also with SSE math), likely due to the use of integer(8).


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2008-05-15 09:06:21
               date|                            |


From: ubizjak at gmail dot com @ 2008-05-15  9:22 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from ubizjak at gmail dot com  2008-05-15 09:21 -------
This regression is due to a store forwarding penalty:

        ...
        movl    %esi, -408(%ebp)
        movl    %edi, -404(%ebp)
        fldl    -408(%ebp)
        ...
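
The excerpt above stores the two 32-bit halves of the integer separately and then immediately reloads the same eight bytes as one 64-bit floating-point value. On many x86 cores a load that is wider than the preceding stores to the same location cannot be served from the store buffer and has to wait for the stores to complete, which is the usual explanation for a penalty of this kind. The sketch below is only a Python model of the data movement, with a hypothetical function name; it cannot reproduce the timing effect itself.

```python
import struct


def store_halves_then_load(lo: int, hi: int) -> float:
    """Model two 32-bit stores followed by one 64-bit floating-point load."""
    slot = bytearray(8)                    # stands in for the stack slot at -408(%ebp)
    struct.pack_into("<I", slot, 0, lo)    # movl %esi, -408(%ebp)  (low half)
    struct.pack_into("<I", slot, 4, hi)    # movl %edi, -404(%ebp)  (high half)
    return struct.unpack_from("<d", slot, 0)[0]  # fldl -408(%ebp)


if __name__ == "__main__":
    # 0x3FF00000_00000000 is the bit pattern of the double 1.0.
    print(store_halves_then_load(0x00000000, 0x3FF00000))
```

With -m64 the 64-bit integer fits in a single register, so the value does not have to be split into two halves in this way.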


From: fxcoudert at gcc dot gnu dot org @ 2009-06-15 18:32 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from fxcoudert at gcc dot gnu dot org  2009-06-15 18:32 -------
This is not darwin-specific, I also see it happening on x86_64-linux.

And what's more, the output changes between -m32 and -m64.

$ cat u.f90
integer(8), parameter :: l = z'5fe6eb3be0000000'
integer, parameter :: ni = 3
integer :: i, j, n
integer(8) :: k
real(8) :: a, b, e, m, s
equivalence (b, k)
a = 1.0d0
e = epsilon(1.0)/2.0d0**4
m = 0.0d0
s = 0.0d0
n = 0
do
  n = n + 1
  b = a
  k = l - ishft(k, -1_8)
  do i = 1, ni
    b = b*(1.5-(0.5*a)*b*b)
  end do
  b = b + b*(0.5-(0.5*a)*b*b)
!   b = 1.0d0/sqrt(a)
  m = max(m, abs(a*b*b - 1.0d0))
  s = s + abs(a*b*b - 1.0d0)
  a = a + e
  if (a == 2.0d0) exit
end do
print *, n, m/epsilon(a), s/(n*epsilon(a))
end

$ gfortran -m64 -O3 u.f90 && time ./a.out                                     
   134217728   2.0000000000000000       0.36966567113995552     
./a.out  3.05s user 0.00s system 100% cpu 3.049 total

$ gfortran -m32 -O3 u.f90 && time ./a.out                                     
   134217728  9.76562500000000000E-004  1.82069155926001258E-004
./a.out  6.80s user 0.00s system 99% cpu 6.854 total

$ gfortran -m32 -O3 u.f90 -ffast-math && time ./a.out
   134217728  1.46484375000000000E-003  2.97074087939108722E-004
./a.out  6.74s user 0.00s system 99% cpu 6.743 total

$ gfortran -m64 -O3 u.f90 -ffast-math && time ./a.out
   134217728   3.0000000000000000       0.59681034088134766     
./a.out  3.18s user 0.00s system 99% cpu 3.178 total


-- 

fxcoudert at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  GCC build triplet|i686-apple-darwin9          |
   GCC host triplet|i686-apple-darwin9          |
 GCC target triplet|i686-apple-darwin9          |i686
      Known to fail|                            |4.4.0 4.5.0
   Last reconfirmed|2008-05-15 09:06:21         |2009-06-15 18:32:39
               date|                            |


From: kargl at gcc dot gnu dot org @ 2009-06-15 18:44 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from kargl at gcc dot gnu dot org  2009-06-15 18:44 -------
(In reply to comment #3)
> This is not darwin-specific, I also see it happening on x86_64-linux.
> 
> And what's more, the output changes between -m32 and -m64.

The code is invalid Fortran, so gfortran is not required to give
any sensible output.

> $ cat u.f90
> integer(8), parameter :: l = z'5fe6eb3be0000000'
> integer, parameter :: ni = 3
> integer :: i, j, n
> integer(8) :: k
> real(8) :: a, b, e, m, s
> equivalence (b, k)
> a = 1.0d0
> e = epsilon(1.0)/2.0d0**4
> m = 0.0d0
> s = 0.0d0
> n = 0
> do
>   n = n + 1
>   b = a

When you do the assignment to b, k is/becomes undefined.

>   k = l - ishft(k, -1_8)

When you do the assignment to k, b becomes undefined.
Not to mention that the RHS uses k, which is undefined.

See Section 14.7.6 (1) and (9).


From: dominiq at lps dot ens dot fr @ 2009-06-16 20:30 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from dominiq at lps dot ens dot fr  2009-06-16 20:30 -------
I had forgotten about this one!

> This is not darwin-specific, I also see it happening on x86_64-linux.
> 
> And what's more, the output changes between -m32 and -m64.

This is probably related to the extra precision for some floating-point
computations (disabled by default on darwin and I refuse to learn what to do to
change it!-).

> The code is invalid Fortran, so gfortran is not required to give
> any sensible output.

You know that it is not relevant for this pr!-( would the following make you
happier?)

integer(8), parameter :: l = z'5fe6eb3be0000000'
integer, parameter :: ni = 3
integer :: i, j, n
integer(8) :: k
real(8) :: a, b, e, m, s
!equivalence (b, k)
a = 1.0d0
e = epsilon(1.0)/2.0d0**4
m = 0.0d0
s = 0.0d0
n = 0
do
  n = n + 1
  b = a
  k = transfer(b,k)
  k = l - ishft(k, -1_8)
  b = transfer(k,b)
  do i = 1, ni
    b = b*(1.5-(0.5*a)*b*b)
  end do
  b = b + b*(0.5-(0.5*a)*b*b)
!   b = 1.0d0/sqrt(a)
  m = max(m, abs(a*b*b - 1.0d0))
  s = s + abs(a*b*b - 1.0d0)
  a = a + e
  if (a == 2.0d0) exit
end do
print *, n, m/epsilon(a), s/(n*epsilon(a))
end

The timings are:

[ibook-dhum] f90/bug% gfc -O3 pr36241_db.f90
[ibook-dhum] f90/bug% time a.out
   134217728   2.0000000000000000       0.36966567113995552     
7.832u 0.010s 0:07.85 99.8%     0+0k 0+0io 0pf+0w
[ibook-dhum] f90/bug% gfc -m64 -O3 pr36241_db.f90
[ibook-dhum] f90/bug% time a.out
   134217728   2.0000000000000000       0.36966567113995552     
3.327u 0.011s 0:03.35 99.4%     0+0k 0+0io 0pf+0w


From: kargl at gcc dot gnu dot org @ 2009-06-16 21:02 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from kargl at gcc dot gnu dot org  2009-06-16 21:02 -------
(In reply to comment #5)

> > The code is invalid Fortran, so gfortran is not required to give
> > any sensible output.
> 
> You know that it is not relevant for this pr!-( would the following make you
> happier?)

It most certainly is relevant to this PR and any other
PR with invalid Fortran.  A Fortran processor can do
anything it wants with invalid code, including meeting
your expectations.


From: ubizjak at gmail dot com @ 2009-06-17  9:19 UTC (permalink / raw)
  To: gcc-bugs

------- Comment #7 from ubizjak at gmail dot com  2009-06-17 09:18 -------
See Comment #2!

I tried to enhance the ix86_secondary_reload target macro to return an XMM
intermediate reg with a movdi_to_sse handler for DImode -> DFmode moves. However,
the handling of this macro has plenty of FIXMEs, and I was not able to get it to work.

OTOH, doing integer arithmetic on the 64-bit _image_ of an FP value has questionable
usability, so the motivation to fix this PR is proportionally low...

From: ubizjak at gmail dot com @ 2009-06-29  8:47 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from ubizjak at gmail dot com  2009-06-29 08:47 -------
(In reply to comment #7)

> OTOH, doing integer arithmetic on the 64-bit _image_ of an FP value has questionable
> usability, so the motivation to fix this PR is proportionally low...

So, wontfix.


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX

