From: bobk
To: gcc-help@gcc.gnu.org
Subject: HELP With Slow SSE Code
Date: Tue, 06 Jun 2006 00:19:00 -0000
Message-ID: <4724748.post@talk.nabble.com>

I am new to the world of SSE. In trying to speed up some C code I have run
into a wall that is both perplexing and frustrating, since I can't find a
solution. I am hoping someone here can provide the help I seek. Thank you
for all your assistance!

A watered-down version of my code follows. It runs on a Pentium 4 based
machine and is compiled with gcc 4.0.2 using the options
-O3 -Wall -march=pentium4 -msse2 -mfpmath=sse:

// standard C #include files are put here
#include            // I will actually eventually be using sse2 and
                    // sse instructions
#include

void main()
{
  float *ptr1,*ptr2,*ptr3,*tptr1,*tptr2;
  __m128 m1,m2,m3,*sptr1,*sptr2,*sptr3;
  int i,j,arraysize=1000,loopcount=10;

  // allocate space for dynamic arrays that are aligned to a 16-byte
  // boundary (note that arraysize will actually be read into this
  // program in the final version).
  ptr1=(float *) _mm_malloc(arraysize*sizeof(float),16);
  ptr2=(float *) _mm_malloc(arraysize*sizeof(float),16);
  ptr3=(float *) _mm_malloc(arraysize*sizeof(float),16);
  tptr1=ptr1;
  tptr2=ptr2;

  // fill in two of the arrays with some numbers
  for(i=0;i