[Bug c++/42194] New: performance degradation with STL complex convolution operation

public inbox for gcc-bugs@sourceware.org

* [Bug c++/42194]  New: performance degradation with STL complex convolution operation
@ 2009-11-27 12:02 jagjeet dot nain at gmail dot com
  2009-11-27 15:02 ` [Bug c++/42194] " rguenth at gcc dot gnu dot org
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: jagjeet dot nain at gmail dot com @ 2009-11-27 12:02 UTC
  To: gcc-bugs

I have a very simple program that performs a complex matrix convolution.
It runs about 3 times slower when compiled with version 4.3.2 than when
compiled with 4.0.2. I compile it with -O3 and no additional optimization
flags. One more interesting thing to note: the slowdown appears only with the
complex data type. If I use the plain float data type, the timings are better
with version 4.3.2.
Please help me.


#include <complex>
#include <iostream>
#include <stdio.h>
#include <time.h>

float procTimeInSeconds()
{
    return clock()/static_cast<float>(CLOCKS_PER_SEC);
}


using namespace std;

int main(int argc , char** arg )
{

    const int Nc = 32;  // total matrix
    const int Nx = 512; // columns
    const int Nn = 16;  //typical value
    const int Ns = 10;
    const int Nw = Nc * Nn;

    complex<float>* all_weights = new complex<float>[Nx*Nw*Nc];
    complex<float>* input = (complex<float>*)new complex<float>[Nx*Nw*Ns];
    complex<float>* output = (complex<float>*)new complex<float>[Nx*Nc*Ns];
    int weights_stride_c = Nx * Nw;
    int weights_stride_w = Nx;
    int weights_stride_x = 1;
    int input_stride_s = Nx * Nw;
    int input_stride_w = Nx;
    int input_stride_x = 1;
    int output_stride_s = Nx * Nc;
    int output_stride_c = Nx;
    int output_stride_x = 1;

    // ================================================================
    //                      Round 1
    //    Do array reductions as we descend into the loop nesting,
    //    keeping temporary pointers for each result.
    //    Results: Faster for unoptimized compilation, but slower with
    //             compiler optimization on.
    // ================================================================
    int count = 0;
    float startTime = procTimeInSeconds();
    complex<float>* input_s;
    complex<float>* output_s;
    complex<float>* curr_weight_c;
    complex<float>* output_sc;

    complex<float>* curr_weight_cw;
    complex<float>* input_sw;
    for(int is = 0; is < Ns; ++is )
    {
        input_s = &input[is*input_stride_s];
        output_s = &output[is*output_stride_s];

        for (int ic=0; ic<Nc; ++ic)
        {
            curr_weight_c = &all_weights[ic * weights_stride_c];
            output_sc = &output_s[ic*output_stride_c];
            // for that matrix, loop through w
            for (int iw=0; iw<Nw; ++iw)
            {
                curr_weight_cw = &curr_weight_c[weights_stride_w * iw];
                input_sw = &input_s[iw*input_stride_w];
                for (int ix=0; ix<Nx; ++ix)
                {
                    output_sc[ix*output_stride_x] +=
                        curr_weight_cw[ix*weights_stride_x] * input_sw[ix*input_stride_x];
                    ++count;
                }
            }
        }
    }

    //delete [] all_weights;
    float netTime = procTimeInSeconds() - startTime;
    cout << count << " in " << netTime << " seconds, round 1" << std::endl;

    return 0;
}


-- 
           Summary: performance degradation with STL complex convolution
                    operation
           Product: gcc
           Version: 4.3.3
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: c++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: jagjeet dot nain at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42194



* [Bug c++/42194] performance degradation with STL complex convolution operation
From: rguenth at gcc dot gnu dot org @ 2009-11-27 15:02 UTC
  To: gcc-bugs



------- Comment #1 from rguenth at gcc dot gnu dot org  2009-11-27 15:02 -------
This is because with GCC 4.3 we properly implement complex arithmetic.
Use -fcx-fortran-rules or -fcx-limited-range for speed.
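
To illustrate what these flags trade away, here is a minimal sketch. It is
not GCC's actual implementation, and the helper name `mul_limited_range` is
hypothetical. It spells out the textbook product that, as far as the GCC
documentation describes, the compiler may use once it is allowed to skip the
check for a NaN + i*NaN result and the recovery step that follows it. Without
the flags, every complex multiply carries that extra check, which is one
plausible source of the slowdown in the inner loop.

```cpp
#include <complex>

// Hypothetical helper (not part of GCC or libstdc++): the textbook complex
// product with no NaN/infinity recovery step.
// (a + bi)(c + di) = (ac - bd) + (ad + bc)i
std::complex<float> mul_limited_range(std::complex<float> x,
                                      std::complex<float> y)
{
    const float a = x.real(), b = x.imag();
    const float c = y.real(), d = y.imag();
    return std::complex<float>(a * c - b * d, a * d + b * c);
}
```

For ordinary finite inputs this is expected to agree with `operator*`.
Differences are expected mainly in corner cases involving infinities, NaNs,
or overflow in the intermediate products.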


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |WORKSFORME


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42194



* [Bug c++/42194] performance degradation with STL complex convolution operation
From: jagjeet dot nain at gmail dot com @ 2009-11-30  9:57 UTC
  To: gcc-bugs



------- Comment #2 from jagjeet dot nain at gmail dot com  2009-11-30 09:57 -------
Will -fcx-limited-range or -fcx-fortran-rules change the results compared to
the same code compiled with 4.0.2 without these flags?

In other words, will the results of a complex division program differ when it
is compiled with and without the -fcx-limited-range flag of gcc 4.3.3?
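
One way to probe this question is sketched below. It is an illustration under
assumptions, not GCC's code: `div_naive` and `div_scaled` are hypothetical
helpers. The first applies the textbook formula with no range reduction,
which is roughly what the GCC documentation says -fcx-limited-range permits.
The second rescales by the larger component of the divisor (Smith's method)
before dividing. When the divisor's components are large enough that
c*c + d*d overflows float, the two can give different answers, so matching
results are not guaranteed in such corner cases.

```cpp
#include <cmath>
#include <complex>

typedef std::complex<float> cfloat;

// Hypothetical helper: textbook division, no range reduction.
// (a + bi)/(c + di) = ((ac + bd) + (bc - ad)i) / (c*c + d*d)
cfloat div_naive(cfloat x, cfloat y)
{
    const float a = x.real(), b = x.imag();
    const float c = y.real(), d = y.imag();
    const float den = c * c + d * d;   // may overflow for large c, d
    return cfloat((a * c + b * d) / den, (b * c - a * d) / den);
}

// Hypothetical helper: Smith's method, which divides through by the larger
// of |c| and |d| so that the intermediate values stay in range.
cfloat div_scaled(cfloat x, cfloat y)
{
    const float a = x.real(), b = x.imag();
    const float c = y.real(), d = y.imag();
    if (std::fabs(c) >= std::fabs(d)) {
        const float r = d / c;
        const float den = c + d * r;
        return cfloat((a + b * r) / den, (b - a * r) / den);
    }
    const float r = c / d;
    const float den = c * r + d;
    return cfloat((a * r + b) / den, (b * r - a) / den);
}
```

For example, dividing cfloat(1e30f, 1e30f) by itself should give 1. In
`div_naive` the squared terms exceed the float range on a typical IEEE 754
target, while `div_scaled` keeps every intermediate value in range.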

with regards
J. S. Nain


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42194



* [Bug c++/42194] performance degradation with STL complex convolution operation
From: jagjeet dot nain at gmail dot com @ 2010-01-10  5:54 UTC
  To: gcc-bugs



------- Comment #3 from jagjeet dot nain at gmail dot com  2010-01-10 05:54 -------
I got the speedup with the -fcx-limited-range flag, but ran into another
problem.
When run on a Sun X4100 server (with openSUSE 10.1), this code shows unusual
behavior: the runtime differs from run to run, even though I made sure there
was no other load on the server.
About one run in three takes twice as long as a normal run.
The processor is a dual-core AMD Opteron.
Any help in this matter is appreciated.
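
A small harness can help tell steady run-to-run variation from a one-off
outlier. The sketch below is an assumption about how one might measure it,
not something taken from the test program: the hypothetical `time_runs`
helper repeats a workload several times, using the same clock()-based timing
as the program above, and returns the CPU time of each run so that the spread
between the fastest and slowest run can be inspected within one process.

```cpp
#include <cstddef>
#include <ctime>
#include <vector>

// Hypothetical helper: call work() `runs` times and record the CPU time of
// each call in seconds, using clock() as the original program does.
template <typename Work>
std::vector<double> time_runs(Work work, std::size_t runs)
{
    std::vector<double> seconds;
    seconds.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const std::clock_t start = std::clock();
        work();
        const std::clock_t stop = std::clock();
        seconds.push_back(static_cast<double>(stop - start) / CLOCKS_PER_SEC);
    }
    return seconds;
}
```

Wrapping the convolution loop in a function and passing it as `work` would
give one timing per repetition. Note that clock() reports processor time, so
it may not reflect time the process spends waiting rather than computing.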


-- 

jagjeet dot nain at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|WORKSFORME                  |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42194



* [Bug c++/42194] performance degradation with STL complex convolution operation
From: paolo.carlini at oracle dot com @ 2011-09-23 21:54 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42194

Paolo Carlini <paolo.carlini at oracle dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |INVALID
           Severity|major                       |normal

--- Comment #4 from Paolo Carlini <paolo.carlini at oracle dot com> 2011-09-23 21:47:26 UTC ---
Closing.

