public inbox for gcc-bugs@sourceware.org
* [Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii)))  up to 3.5x slower than C version
From: dominiq at lps dot ens.fr @ 2013-06-22 16:55 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139

Dominique d'Humieres <dominiq at lps dot ens.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |WAITING

--- Comment #8 from Dominique d'Humieres <dominiq at lps dot ens.fr> ---
> I have not tested this with latest trunk, but I wonder if any of the recent
> optimization work has improved this.  Can it be closed yet?

A quick test on a 2.5 GHz Core 2 Duo at revision 200321 with -Ofast gives:

 Fortran:   0.330040932    
 c_sse:     0.225150943    
 c_struct:  0.227035046    

and with -Ofast -funroll-loops

 Fortran:   0.213014960    
 c_sse:     0.223238945    
 c_struct:  0.209081888    

The change occurred between 4.5 and 4.6 (note that 4.6 and 4.7 give 0.263675928
without -funroll-loops). Is this still an issue?


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii)))  up to 3.5x slower than C version
From: dominiq at lps dot ens.fr @ 2015-10-10  9:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139

Dominique d'Humieres <dominiq at lps dot ens.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|---                         |FIXED

--- Comment #9 from Dominique d'Humieres <dominiq at lps dot ens.fr> ---
> The change occurred between 4.5 and 4.6 (note that 4.6 and 4.7 give
> 0.263675928 without -funroll-loops). Is this still an issue?

No feedback for over two years. Closing as FIXED.



* [Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii)))  up to 3.5x slower than C version
From: jvdelisle at gcc dot gnu dot org @ 2010-09-12 15:45 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from jvdelisle at gcc dot gnu dot org  2010-09-12 15:45 -------
I have not tested this with the latest trunk, but I wonder whether any of the
recent optimization work has improved this. Can it be closed yet?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139



* [Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii)))  up to 3.5x slower than C version
From: fxcoudert at gcc dot gnu dot org @ 2007-04-18  5:56 UTC (permalink / raw)
  To: gcc-bugs



-- 

fxcoudert at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2007-04-18 06:56:03
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139



* [Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii)))  up to 3.5x slower than C version
From: burnus at gcc dot gnu dot org @ 2007-03-12  8:16 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from burnus at gcc dot gnu dot org  2007-03-12 08:16 -------
> Can someone try instead of doing "__real__ a += w[j] *__real__ mfi[*index];"
> Use "a+= xxx* yyy" and also use -std=c99 to get the correct multiplication?

Well, -std=c99 was already used, and the "real(!) * complex" calculation was
already correct. "c_cmplx" below now uses:
      a += w[j    ] * mfi[*index++];

Compiled with:
gcc -std=c99 -O3 -funroll-loops -ftree-vectorize -march=opteron -msse3 -ffast-math -m64
gfortran -O3 -funroll-loops -ftree-vectorize -march=opteron -msse3 -ffast-math -m64

 Fortran:   0.4360271
 Fortran:   0.4280267
 c_nosse:   0.2440166
 c_nosse:   0.2320151
 c_sse:     0.2320137
 c_sse:     0.2400150
 c_struct:  0.2320151
 c_struct:  0.2320147
 c_cmplx:   0.2360163
 c_cmplx:   0.2320147
A version that is not manually unrolled gives 0.3760242, 0.3760242:
  for(i = 0; i < np ; i++) {
    for(j = 1; j < n; j++)
      a += w[j    ] * mfi[*index++];
    fo[i] = a;
  }
Thus the manual unrolling seems to account for most of the speed-up. With
-funroll-all-loops, the timings of the Fortran and the non-unrolled C versions
remain the same.
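For readers without the attachment, manual unrolling of this gather-multiply-reduce loop might look roughly like the sketch below. The function names `gather_sum_simple` and `gather_sum_unroll4` are hypothetical, and the use of four independent partial sums is an assumption about a common unrolling style; it is not the code from the attached test case.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Plain gather-multiply-reduce: a += w[j] * fi[idx[j]], one term per iteration. */
static double complex gather_sum_simple(const double *w, const double complex *fi,
                                        const int *idx, size_t n)
{
    double complex a = 0.0;
    for (size_t j = 0; j < n; j++)
        a += w[j] * fi[idx[j]];
    return a;
}

/* Hypothetical 4-way manual unrolling with independent partial sums, so the
   four multiply-adds per iteration do not wait on one another. Splitting the
   sum changes the order of floating-point additions, so results may differ
   from the plain loop in the last bits; the compiler itself may only reorder
   the sum this way under -ffast-math (-fassociative-math). */
static double complex gather_sum_unroll4(const double *w, const double complex *fi,
                                         const int *idx, size_t n)
{
    double complex a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0;
    size_t j = 0;
    for (; j + 4 <= n; j += 4) {
        a0 += w[j]     * fi[idx[j]];
        a1 += w[j + 1] * fi[idx[j + 1]];
        a2 += w[j + 2] * fi[idx[j + 2]];
        a3 += w[j + 3] * fi[idx[j + 3]];
    }
    for (; j < n; j++)   /* leftover terms when n is not a multiple of 4 */
        a0 += w[j] * fi[idx[j]];
    return (a0 + a1) + (a2 + a3);
}
```

The partial sums are the usual reason manual unrolling helps a reduction: a single accumulator serializes every addition on the previous one, whereas four accumulators expose independent work.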


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139



* [Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii)))  up to 3.5x slower than C version
From: burnus at gcc dot gnu dot org @ 2007-03-12  7:58 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from burnus at gcc dot gnu dot org  2007-03-12 07:58 -------
> complex * complex is not a simple cross product in FP world.
Well, the program calculates real * complex.
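The distinction matters because the two products differ in cost. A minimal sketch, with hypothetical helper names and assuming C11 for the `CMPLX` macro: a real times a complex needs two multiplications and no cross terms, while the naive complex product needs four multiplications plus an addition and a subtraction. Without -ffast-math (or -fcx-limited-range), GCC's C99 complex multiplication additionally checks for NaN results and may fall back to a libgcc routine (`__muldc3`) to recover Annex G infinity semantics.

```c
#include <assert.h>
#include <complex.h>

/* real * complex: w(x + iy) = wx + i*wy, two multiplications, no cross terms. */
static double complex real_times_complex(double w, double complex z)
{
    return CMPLX(w * creal(z), w * cimag(z));
}

/* Naive complex product: (a + ib)(c + id) = (ac - bd) + i(ad + bc).
   This is the formula -ffast-math / -fcx-limited-range permit; strict C99
   Annex G semantics also require recovering infinities from NaN results. */
static double complex naive_complex_mul(double complex x, double complex y)
{
    double a = creal(x), b = cimag(x), c = creal(y), d = cimag(y);
    return CMPLX(a * c - b * d, a * d + b * c);
}
```

For finite inputs the naive formula agrees with the built-in `*`; the two only differ when infinities or NaNs are involved.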


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139



* [Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii)))  up to 3.5x slower than C version
From: pinskia at gcc dot gnu dot org @ 2007-03-12  5:38 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from pinskia at gcc dot gnu dot org  2007-03-12 05:37 -------
> The hand-made SSE version (USE_VECTORS) crashes here for -m32

Because complex(8) has the same alignment as double, it is only 8-byte
aligned, not 16-byte aligned; it is just luck that the SSE case works
with -m64.
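The alignment argument can be checked directly. A minimal sketch, assuming a C11 compiler on a common x86 or x86-64 ABI: `alignof(double complex)` equals `alignof(double)` (8 bytes on x86-64), so complex arrays are not guaranteed 16-byte alignment, while aligned SSE loads such as `movapd` fault unless the address is a multiple of 16. The struct `padded_z` and the helper names are hypothetical, used only to place a `double complex` 8 bytes past a 16-byte boundary.

```c
#include <assert.h>
#include <complex.h>
#include <stdalign.h>
#include <stdint.h>

/* Nonzero if p is suitable for a 16-byte aligned SSE load (movapd). */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}

/* Hypothetical layout: z follows an 8-byte double, so when the struct starts
   on a 16-byte boundary and alignof(double complex) is 8 or less, z starts
   8 bytes past that boundary. */
struct padded_z {
    double pad;
    double complex z;
};

/* Nonzero if a double complex can sit at an address that is valid for its
   type but not 16-byte aligned. */
static int complex_can_be_misaligned_for_sse(void)
{
    static alignas(16) struct padded_z s;
    return !is_aligned16(&s.z);
}
```

One general remedy, not one proposed in the thread, would be to use unaligned loads such as `_mm_loadu_pd` in a hand-written SSE path, or to allocate the arrays with explicit 16-byte alignment.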

I am thinking about closing this bug as invalid, since right now the C testcase
is not even closely related to the Fortran case.

Can someone try, instead of "__real__ a += w[j    ] * __real__ mfi[*index];",
using "a += xxx * yyy", and also use -std=c99 to get the correct multiplication?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139



* [Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii)))  up to 3.5x slower than C version
From: pinskia at gcc dot gnu dot org @ 2007-03-12  5:33 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from pinskia at gcc dot gnu dot org  2007-03-12 05:33 -------
The problem here is obvious: in the Fortran case a temporary array is created,
while in the C case it is not. Also, in the optimized C case the
multiplication of the complex numbers is incorrect unless you add -ffast-math.
Actually, I think it is incorrect in both C cases. Can someone try with
-ffast-math for both the C and Fortran cases?
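To make the temporary-array point concrete, here is a hedged C analogue of the two evaluation strategies. `sum_with_temp` mimics evaluating `w_re(1:nn,1)*fi(i(1:nn,ii))` into a temporary array and then reducing it; `sum_fused` multiplies and accumulates in one pass, as the C versions do. Both names and the caller-supplied scratch buffer are illustrative assumptions, not what gfortran actually generated.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Two passes, like an array expression evaluated through a temporary:
   first tmp[j] = w[j] * fi[idx[j]], then sum(tmp). The caller supplies
   tmp with room for n elements. */
static double complex sum_with_temp(const double *w, const double complex *fi,
                                    const int *idx, double complex *tmp, size_t n)
{
    assert(tmp != NULL || n == 0);
    for (size_t j = 0; j < n; j++)
        tmp[j] = w[j] * fi[idx[j]];
    double complex s = 0.0;
    for (size_t j = 0; j < n; j++)
        s += tmp[j];
    return s;
}

/* One pass: multiply and accumulate without storing the intermediates. */
static double complex sum_fused(const double *w, const double complex *fi,
                                const int *idx, size_t n)
{
    double complex s = 0.0;
    for (size_t j = 0; j < n; j++)
        s += w[j] * fi[idx[j]];
    return s;
}
```

Both compute the same sum in the same order; the two-pass version just writes and rereads n complex values, which costs extra memory traffic on every call.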


complex * complex is not a simple cross product in the FP world.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139



* [Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii)))  up to 3.5x slower than C version
From: burnus at gcc dot gnu dot org @ 2007-03-11 22:50 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from burnus at gcc dot gnu dot org  2007-03-11 22:50 -------
Created an attachment (id=13191)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13191&action=view)
test.tar.gz


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139



* [Bug fortran/31139] sum(w_re(1:nn,1)*fi(i(1:nn, ii)))  up to 3.5x slower than C version
From: burnus at gcc dot gnu dot org @ 2007-03-11 22:45 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from burnus at gcc dot gnu dot org  2007-03-11 22:45 -------
The attachment contains the test case. The hand-made SSE version (USE_VECTORS)
crashes here with -m32, but since the comparison is C vs. Fortran, that variant
can be ignored entirely (with -m64, USE_VECTORS is about as fast as the other
C versions anyway).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31139


