From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <47D9C2EF.9070004@gmail.com>
Date: Fri, 14 Mar 2008 00:13:00 -0000
From: JP Fournier <jape41@gmail.com>
User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.11) Gecko/20071128 SeaMonkey/1.1.7
MIME-Version: 1.0
To: gcc-help@gcc.gnu.org
Subject: Re: is -O2 breaking sse2 alignment?
References: <47D8679C.4090606@gmail.com> <47D8751F.722F7196@dessent.net>
In-Reply-To: <47D8751F.722F7196@dessent.net>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Mailing-List: contact gcc-help-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-help.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-help/>
List-Post: <mailto:gcc-help@gcc.gnu.org>
List-Help: <mailto:gcc-help-help@gcc.gnu.org>
Sender: gcc-help-owner@gcc.gnu.org
X-SW-Source: 2008-03/txt/msg00125.txt.bz2

Brian Dessent wrote:
> 
> You're violating the C aliasing rules.  You can't store through a cast
> pointer like that.  You also don't have to do the load/store, the
> compiler knows what you want when you use a union instead:
> 
>   union { __m128i v; long l[2]; } a, b, c;
> 
>    a.l[0] = a.l[1] = 1;
>    b.l[0] = b.l[1] = 1;
> 
>    c.v = _mm_add_epi8 (a.v, b.v);
>    printf("c0=%ld c1=%ld\n", c.l[0], c.l[1]);

Many thanks, Brian.  My little program now behaves better:

bash-3.1$ gcc -O2 -msse2 -o sse2 sse2-1.c
bash-3.1$ ./sse2
c0=2 c1=2

#include <stdio.h>
#include <emmintrin.h>

void test_int() {
        union { __m128i v; long l[2]; } a, b, c;

        a.l[0] = a.l[1] = 1;
        b.l[0] = b.l[1] = 1;
        c.l[0] = c.l[1] = 0;

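        /* long is 64 bits here, so add 64-bit lanes; _mm_add_epi8 would
           drop carries between bytes for larger values */
        c.v = _mm_add_epi64( a.v, b.v );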
        printf("c0=%ld c1=%ld\n", c.l[0], c.l[1] );
}
int main( int count, char ** args ) {
     test_int();
     return 0;
}
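
One thing worth checking with a test like this: _mm_add_epi8 adds sixteen
independent 8-bit lanes, while _mm_add_epi64 adds two 64-bit lanes.  With
inputs of 1 both give 2, so the output above cannot tell them apart.  A
minimal sketch of the difference follows; the vec2 type and both helper
names are hypothetical, and long long is assumed to be 64 bits:

```c
#include <assert.h>
#include <emmintrin.h>

/* Hypothetical union, same idea as above but with an explicit 64-bit type. */
typedef union { __m128i v; long long l[2]; } vec2;

/* Adds x and y in the low half using 8-bit lanes (no carry between bytes). */
long long add_low_epi8(long long x, long long y) {
    vec2 a, b, c;
    assert(sizeof(vec2) == 16);   /* two 64-bit values fill the register */
    a.l[0] = x; a.l[1] = 0;
    b.l[0] = y; b.l[1] = 0;
    c.v = _mm_add_epi8(a.v, b.v);
    return c.l[0];
}

/* Adds x and y in the low half using 64-bit lanes (a true 64-bit add). */
long long add_low_epi64(long long x, long long y) {
    vec2 a, b, c;
    assert(sizeof(vec2) == 16);
    a.l[0] = x; a.l[1] = 0;
    b.l[0] = y; b.l[1] = 0;
    c.v = _mm_add_epi64(a.v, b.v);
    return c.l[0];
}
```

With 255 + 1 the low byte wraps to 0 in the 8-bit version and the carry is
lost, whereas the 64-bit version returns 256.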


> 
> There's an even more natural way to do this though using gcc's built-in
> vector extensions without any of the Intel mmintrin.h stuff.  This way
> will result in code that will vectorize to altivec, sse2, spu, whatever
> the machine supports, it's not hardware specific:
> 
>   typedef int v4si __attribute__ ((vector_size (16)));
> 
>   v4si a = { 1, 2, 3, 4 }, b = { 5, 6, 7, 8 }, c;
> 
>   c = a + b;
> 
> You can use all the normal C operators like + and * as if they were
> scalars but they will be compiled using the corresponding SIMD
> instructions.  See
> <http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html> for more.  If
> you want access to the individual parts you can again use the union,

My thinking is that I'd like to stay compiler independent. By using the 
Intel intrinsics I figure I should be able to get at least gcc and the 
Intel compiler to work as a start.
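
For comparison, here is a rough sketch of the vector-extension approach
from the quoted text, including the union for element access.  The
v4si_u type and the add4 helper are hypothetical names, and the
vector_size attribute is a gcc extension, so this does not meet the
compiler-independence goal:

```c
#include <assert.h>

/* Four ints in one 16-byte vector, as in the quoted example. */
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical union giving access to the individual elements. */
typedef union { v4si v; int i[4]; } v4si_u;

/* Adds x[k] + y[k] for k = 0..3 with a single vector addition. */
void add4(int out[4], const int x[4], const int y[4]) {
    v4si_u a, b, c;
    int k;

    assert(sizeof(v4si) == 4 * sizeof(int));   /* assumes 32-bit int */
    for (k = 0; k < 4; k++) {
        a.i[k] = x[k];
        b.i[k] = y[k];
    }
    c.v = a.v + b.v;            /* plain C operator on vector operands */
    for (k = 0; k < 4; k++)
        out[k] = c.i[k];
}
```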

What I am _really_ trying to do is implement the element-wise addition 
of two arrays.

Is there a more efficient way of doing this than the following?


#include <stdio.h>
#include <emmintrin.h>

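/* Adds a[i] + b[i] for i in [0, size); assumes long is 64 bits (x86_64). */
void test_add_long(long * result, long * a, long * b, long size) {
        union { __m128i v; long l[2]; } temp1, temp2, temp3;
        long index;

        /* Two elements per iteration; stop before a lone trailing element. */
        for( index=0; index + 1 < size; index+=2  ) {
            temp1.l[0] = a[index];
            temp1.l[1] = a[index+1];
            temp2.l[0] = b[index];
            temp2.l[1] = b[index + 1];

            /* 64-bit lanes: _mm_add_epi8 would drop carries between bytes. */
            temp3.v = _mm_add_epi64( temp1.v, temp2.v );
            result[index] = temp3.l[0];
            result[index+1] = temp3.l[1];

            printf("c0=%ld c1=%ld\n", result[index], result[index+1] );
        }

        /* Odd size: add the last element without SSE2. */
        if( index < size )
            result[index] = a[index] + b[index];
}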

int main( int count, char ** args ) {
     // arrays of four 8-byte ints
     long a[]  = { 1, 2, 3, 4};
     long b[]  = { 1, 2, 3, 4};
     long result[]  = {0,0,0,0};

     test_add_long(result, a, b, 4);

     return 0;
}
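
One possible direction, offered as a sketch rather than a measured
answer: load straight from the arrays with the unaligned load/store
intrinsics instead of copying each element through a union.  The
function name add_i64_arrays is hypothetical, and long long is used
instead of long on the assumption that it is 64 bits on both compilers.
A plain scalar loop may also be worth comparing, since gcc can
auto-vectorize simple loops like this at -O3.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>

/* Element-wise result[i] = a[i] + b[i] for 64-bit integers.
   Arrays need not be 16-byte aligned; size may be odd. */
void add_i64_arrays(long long *result, const long long *a,
                    const long long *b, size_t size) {
    size_t i = 0;

    assert(result != NULL && a != NULL && b != NULL);

    /* Main loop: two 64-bit elements per 128-bit register. */
    for (; i + 2 <= size; i += 2) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(result + i), _mm_add_epi64(va, vb));
    }

    /* Scalar tail for a trailing element when size is odd. */
    for (; i < size; i++)
        result[i] = a[i] + b[i];
}
```

If the arrays are known to be 16-byte aligned, _mm_load_si128 and
_mm_store_si128 could replace the unaligned variants.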