From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 10794 invoked by alias); 27 Nov 2009 12:02:08 -0000
Received: (qmail 7562 invoked by uid 48); 27 Nov 2009 12:01:56 -0000
Date: Fri, 27 Nov 2009 12:02:00 -0000
Subject: [Bug c++/42194] New: performance degradation with STL complex convolution operation
X-Bugzilla-Reason: CC
Message-ID:
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "jagjeet dot nain at gmail dot com"
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2009-11/txt/msg02353.txt.bz2

I have a very simple program that performs a complex matrix convolution. The program runs 3 times slower when compiled with GCC 4.3.2 than when compiled with 4.0.2. I am compiling with -O3 and no additional optimization flags. One more interesting thing to note is that this behavior is seen only with the complex data type; if I use the plain float data type, timings are better with 4.3.2. Please help me.

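To isolate the float-versus-complex difference described above, a reduced timing test can help. The following is only a sketch and not the attached program: the helper names multiply_accumulate and time_kernel, the array contents, and the repetition count are all assumptions made for illustration. It runs the same multiply-accumulate loop for any element type, so it can be built with both compiler versions at -O3 and the timings compared.

```cpp
#include <complex>
#include <cstddef>
#include <ctime>
#include <vector>

// Hypothetical helper (not from the report): sum of element-wise products.
template <typename T>
T multiply_accumulate(const std::vector<T>& a, const std::vector<T>& b)
{
    T sum = T();
    for (std::size_t i = 0; i < a.size(); ++i)
        sum += a[i] * b[i];
    return sum;
}

// Hypothetical helper: CPU seconds spent on `reps` runs of the kernel over
// arrays of length `n`. The accumulated result is written through `sink`
// so that the computation has an observable effect.
template <typename T>
double time_kernel(std::size_t n, int reps, T* sink)
{
    std::vector<T> a(n, T(1));
    std::vector<T> b(n, T(2));
    T total = T();

    std::clock_t start = std::clock();
    for (int r = 0; r < reps; ++r)
        total += multiply_accumulate(a, b);
    std::clock_t stop = std::clock();

    *sink = total;
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}
```

Calling time_kernel<float> and time_kernel<std::complex<float> > with the same arguments under each compiler gives four numbers to compare. Because the inputs never change, an optimizer may hoist the repeated kernel call out of the loop, so a real measurement should vary the inputs between repetitions.
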
#include <complex>
#include <ctime>
#include
#include

float procTimeInSeconds()
{
    return clock()/static_cast<float>(CLOCKS_PER_SEC);
}

using namespace std;

int main(int argc , char** arg )
{
    const int Nc = 32;  // total matrix
    const int Nx = 512; // columns
    const int Nn = 16;  // typical value
    const int Ns = 10;
    const int Nw = Nc * Nn;

    complex<float>* all_weights = new complex<float>[Nx*Nw*Nc];
    complex<float>* input = (complex<float>*)new complex<float>[Nx*Nw*Ns];
    complex<float>* output = (complex<float>*)new complex<float>[Nx*Nc*Ns];

    int weights_stride_c = Nx * Nw;
    int weights_stride_w = Nx;
    int weights_stride_x = 1;

    int input_stride_s = Nx * Nw;
    int input_stride_w = Nx;
    int input_stride_x = 1;

    int output_stride_s = Nx * Nc;
    int output_stride_c = Nx;
    int output_stride_x = 1;

    // ================================================================
    // Round 1
    // Do array reductions as we descend into the loop nesting,
    // keeping temporary pointers for each result.
    // Results: Faster for unoptimized compilation, but slower for
    // compiler optimization on.
    // ================================================================
    int count = 0;
    float startTime = procTimeInSeconds();

    complex<float>* input_s;
    complex<float>* output_s ;
    complex<float>* curr_weight_c;
    complex<float>* output_sc;
    complex<float>* curr_weight_cw;
    complex<float>* input_sw;

    for(int is = 0; is < Ns; ++is )
    {
        input_s = &input[is*input_stride_s];
        output_s = &output[is*output_stride_s];
        for (int ic=0; ic