From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 28230 invoked by alias); 14 Mar 2008 00:13:10 -0000
Received: (qmail 28190 invoked by uid 22791); 14 Mar 2008 00:13:03 -0000
X-Spam-Check-By: sourceware.org
Received: from smtp101.rog.mail.re2.yahoo.com (HELO smtp101.rog.mail.re2.yahoo.com) (206.190.36.79)
    by sourceware.org (qpsmtpd/0.31) with SMTP; Fri, 14 Mar 2008 00:12:34 +0000
Received: (qmail 26561 invoked from network); 14 Mar 2008 00:12:31 -0000
Received: from unknown (HELO ?192.168.2.102?) (jfournier121@rogers.com@206.248.132.9 with plain)
    by smtp101.rog.mail.re2.yahoo.com with SMTP; 14 Mar 2008 00:12:31 -0000
X-YMail-OSG: pOTTZUoVM1mILUfsn6a30C_X0wYYb_9Bk9BHyOf4_KjsC67detpMcfytxUx0Op1wdw--
X-Yahoo-Newman-Property: ymail-3
Message-ID: <47D9C2EF.9070004@gmail.com>
Date: Fri, 14 Mar 2008 00:13:00 -0000
From: JP Fournier
User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.11) Gecko/20071128 SeaMonkey/1.1.7
MIME-Version: 1.0
To: gcc-help@gcc.gnu.org
Subject: Re: is -O2 breaking sse2 alignment?
References: <47D8679C.4090606@gmail.com> <47D8751F.722F7196@dessent.net>
In-Reply-To: <47D8751F.722F7196@dessent.net>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Mailing-List: contact gcc-help-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-help-owner@gcc.gnu.org
X-SW-Source: 2008-03/txt/msg00125.txt.bz2

Brian Dessent wrote:
>
> You're violating the C aliasing rules.  You can't store through a casted
> pointer like that.  You also don't have to do the load/store, the
> compiler knows what you want when you use a union instead:
>
> union { __m128i v; long l[2]; } a, b, c;
>
> a.l[0] = a.l[1] = 1;
> b.l[0] = b.l[1] = 1;
>
> c.v = _mm_add_epi8 (a.v, b.v);
> printf("c0=%ld c1=%ld\n", c.l[0], c.l[1]);

Many Thanks Brian.
My little program now behaves better:

bash-3.1$ gcc -O2 -msse2 -o sse2 sse2-1.c
bash-3.1$ ./sse2
c0=2 c1=2

#include <stdio.h>
#include <emmintrin.h>

void test_int()
{
    union { __m128i v; long l[2]; } a, b, c;

    a.l[0] = a.l[1] = 1;
    b.l[0] = b.l[1] = 1;
    c.l[0] = c.l[1] = 0;

    c.v = _mm_add_epi8( a.v, b.v );
    printf("c0=%ld c1=%ld\n", c.l[0], c.l[1] );
}

int main( int count, char ** args )
{
    test_int();
    return 0;
}

>
> There's an even more natural way to do this though using gcc's built-in
> vector extensions without any of the Intel mmintrin.h stuff.  This way
> will result in code that will vectorize to altivec, sse2, spu, whatever
> the machine supports, it's not hardware specific:
>
> typedef int v4si __attribute__ ((vector_size (16)));
>
> v4si a = { 1, 2, 3, 4 }, b = { 5, 6, 7, 8 }, c;
>
> c = a + b;
>
> You can use all the normal C operators like + and * as if they were
> scalars but they will be compiled using the corresponding SIMD
> instructions.  See
> for more.  If
> you want access to the individual parts you can again use the union,

My thinking is that I'd like to try to be compiler independent, so by
using the Intel intrinsics I figure I should be able to get gcc and the
Intel compiler to work as a start.

What I am _really_ trying to do is implement the addition of the
elements of two arrays.  Is there a more efficient way of doing this
than the following?

#include <stdio.h>
#include <emmintrin.h>

void test_add_long(long * result, long * a, long * b, long size)
{
    union { __m128i v; long l[2]; } temp1, temp2, temp3;
    int index=0;

    for( index=0; index < size; index+=2 )
    {
        temp1.l[0] = a[index];
        temp1.l[1] = a[index+1];
        temp2.l[0] = b[index];
        temp2.l[1] = b[index + 1];

        temp3.v = _mm_add_epi8( temp1.v, temp2.v );

        result[index] = temp3.l[0];
        result[index+1] = temp3.l[1];

        printf("c0=%ld c1=%ld\n", result[index], result[index+1] );
    }
}

int main( int count, char ** args )
{
    // array of 4 8 byte ints
    long a[] = { 1, 2, 3, 4};
    long b[] = { 1, 2, 3, 4};
    long result[] = {0,0,0,0};

    test_add_long(result, a, b, 4);
    return 0;
}
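
For comparison, here is a minimal sketch of one way the array addition might be written without the per-element copies through a union. It is not from the thread and rests on a few assumptions: the elements are 64 bits wide (so it uses int64_t rather than long), the arrays are not guaranteed to be 16-byte aligned (hence the unaligned load/store intrinsics), and the function name add_i64 is hypothetical. It also uses _mm_add_epi64 instead of _mm_add_epi8, because _mm_add_epi8 adds sixteen independent 8-bit lanes and does not carry between bytes, so it only matches a 64-bit add for small values such as 1 + 1.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128i, _mm_add_epi64, ... */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: result[i] = a[i] + b[i] for n 64-bit elements. */
void add_i64(int64_t *result, const int64_t *a, const int64_t *b, size_t n)
{
    size_t i = 0;

    /* Each 128-bit register holds two 64-bit lanes, so step by 2. */
    for (; i + 2 <= n; i += 2) {
        /* Unaligned loads: no 16-byte alignment is assumed for a and b. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));

        /* Lane-wise 64-bit add, then an unaligned store to result. */
        _mm_storeu_si128((__m128i *)(result + i), _mm_add_epi64(va, vb));
    }

    /* Scalar tail handles the last element when n is odd. */
    for (; i < n; i++)
        result[i] = a[i] + b[i];
}
```

Built with gcc -O2 -msse2 as above, calling add_i64(r, a, b, 5) with a[0] = 300 and b[0] = 500 should leave 800 in r[0], a case where a byte-wise add would give a different answer. Whether this is actually faster than a plain scalar loop is something to measure, since the compiler may vectorize the simple loop on its own at higher optimization levels.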