public inbox for gcc-help@gcc.gnu.org
From: Brian Dessent <brian@dessent.net>
To: JP Fournier <jape41@gmail.com>
Cc: gcc-help@gcc.gnu.org
Subject: Re: is -O2 breaking sse2 alignment?
Date: Thu, 13 Mar 2008 00:28:00 -0000
Message-ID: <47D8751F.722F7196@dessent.net>
In-Reply-To: <47D8679C.4090606@gmail.com>

JP Fournier wrote:

> In the example below, compiling with -O2 results in incorrect output
> from the program.  -O seems OK.  Am I missing something alignment wise
> (or otherwise) or is -O2 breaking my alignment?

If it were an alignment problem, you'd most likely be getting a
segmentation fault.  The __m128i type already carries the proper 16-byte
alignment, so you don't need the __attribute__((aligned (16))) stuff.

>         // array of 2 8 byte ints
>         long int *a  = _mm_malloc(16, 16);
>         long int *b  = _mm_malloc(16, 16);
>         long int *c  = _mm_malloc(16, 16);
> 
>         __m128i ai __attribute__ ((aligned (16)));
>         __m128i bi __attribute__ ((aligned (16)));
>         __m128i ci __attribute__ ((aligned (16)));
> 
>         a[0] = a[1] = 1;
>         b[0] = b[1] = 1;
>         c[0] = c[1] = 0;
> 
>         ai = _mm_load_si128( (__m128i *) (void*)a );
>         bi = _mm_load_si128( (__m128i *) (void*)b );
> 
>         ci = _mm_add_epi8( ai, bi );
>         _mm_store_si128( (__m128i *) (void*)c, ci );
>         printf("c0=%ld c1=%ld\n", c[0], c[1] );
> }

You're violating the C aliasing rules: you can't access the data through
a cast pointer like that.  You also don't need the explicit load/store;
the compiler knows what you want when you use a union instead:

  union { __m128i v; long l[2]; } a, b, c;

  a.l[0] = a.l[1] = 1;
  b.l[0] = b.l[1] = 1;

  c.v = _mm_add_epi8 (a.v, b.v);
  printf("c0=%ld c1=%ld\n", c.l[0], c.l[1]);

There's an even more natural way to do this, though: gcc's built-in
vector extensions, without any of the Intel intrinsics headers
(emmintrin.h and friends).  Code written this way is not hardware
specific; gcc compiles it to AltiVec, SSE2, SPU, or whatever SIMD
instructions the target machine supports:

  typedef int v4si __attribute__ ((vector_size (16)));

  v4si a = { 1, 2, 3, 4 }, b = { 5, 6, 7, 8 }, c;

  c = a + b;

You can use the usual C arithmetic and bitwise operators, such as + and
*, on these vectors as if they were scalars, and gcc compiles them to
the corresponding SIMD instructions.  See
<http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html> for more.  If
you want access to the individual elements, you can again use a union,
e.g.

  union { v4si v; int i[4]; } u;

  u.v = a + b;

  printf ("%d,%d,%d,%d\n", u.i[0], u.i[1], u.i[2], u.i[3]);

Brian

Thread overview: 7+ messages
2008-03-12 23:31 JP Fournier
2008-03-13  0:28 ` Brian Dessent [this message]
2008-03-14  0:13   ` JP Fournier
2008-03-14 19:31     ` Brian Budge
2008-03-15  9:49       ` Andrew Haley
2008-03-15  0:10     ` Maximillian Murphy
2008-03-15 15:10       ` Maximillian Murphy