public inbox for gcc-help@gcc.gnu.org
* Correct way to make a 16-byte aligned double* for SSE vectorization?
From: Benjamin Redelings I @ 2009-12-30 21:39 UTC
  To: gcc-help

Hi,

     I am trying to figure out how to make a double* that is 16-byte 
aligned in the way that SSE instructions want.  Hopefully this would 
allow GCC to auto-vectorize loops in a better way.  The problem that I 
am having is that I want a pointer to an aligned double, not an aligned 
pointer to a double.

     I am compiling with these options:
% gcc -c test.C -O3 -ftree-vectorizer-verbose=3 -ffast-math

According to the output of the vectorizer, none of the three ways 
(below) of declaring an aligned pointer actually work.  They are treated 
as unaligned accesses, so presumably the location of the pointer itself 
is being aligned, but it does not point to an aligned location.  In 
contrast, if I define an aligned double, and then define a pointer to 
it, this works.  Is this recommended?

I ask, because gcc-4.5 complains about declaring a 16-byte aligned 
double, if the double is an instantiation of a template parameter. (See 
PR42555.)

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42555

Thanks for any help!

-BenRI

P.S. Here is the example code.  As is, the pointers are recognized as 
aligned.  However, if you comment out the definition of SSE_PTR and 
replace it with any of the three other approaches, they do not work.

typedef double real;

// these two lines work (together)
typedef real aligned_real __attribute__((aligned(16)));
typedef const aligned_real* SSE_PTR;

// none of these three approaches works to define an aligned pointer
// in a single line.
//typedef const real *SSE_PTR __attribute__((aligned(16)));
//typedef const real __attribute__((aligned(16))) *SSE_PTR;
//typedef const __attribute__((aligned(16))) real *SSE_PTR;

real f(SSE_PTR __restrict__ p, SSE_PTR __restrict__ q,int n)
{
   real sum = 0;
   for(int i=0; i<n;i++)
     sum += p[i] * q[i];

   return sum;
}

-BenRI


* Re: Correct way to make a 16-byte aligned double* for SSE vectorization?
From: Jie Zhang @ 2009-12-31  9:36 UTC
  To: Benjamin Redelings I; +Cc: gcc-help

Hi,

On 12/31/2009 03:30 AM, Benjamin Redelings I wrote:
> Hi,
>
> I am trying to figure out how to make a double* that is 16-byte aligned
> in the way that SSE instructions want. Hopefully this would allow GCC to
> auto-vectorize loops in a better way. The problem that I am having is
> that I want a pointer to an aligned double, not an aligned pointer to a
> double.
>
> I am compiling with these options:
> % gcc -c test.C -O3 -ftree-vectorizer-verbose=3 -ffast-math
>
> According to the output of the vectorizer, none of the three ways
> (below) of declaring an aligned pointer actually work. They are treated
> as unaligned accesses, so presumably the location of the pointer itself
> is being aligned, but it does not point to an aligned location. In
> contrast, if I define an aligned double, and then define a pointer to
> it, this works. Is this recommended?
>
Below is just taken from the GCC Manual:
[quote]
As another example,

      char *__attribute__((aligned(8))) *f;

specifies the type “pointer to 8-byte-aligned pointer to char”. Note 
again that this does not work with most attributes; for example, the 
usage of `aligned' and `noreturn' attributes given above is not yet 
supported.
[/quote]

If it had been supported, you could use

 > //typedef const real __attribute__((aligned(16))) *SSE_PTR;

But since it is not yet supported now, you have to use

> typedef double real;
>
> // these two lines work (together)
> typedef real aligned_real __attribute__((aligned(16)));
> typedef const aligned_real* SSE_PTR;
>

Jie
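
To make the distinction above concrete, here is a minimal sketch, assuming
GCC (or a compatible compiler) and C++11 for static_assert.  The name
ALIGNED_PTR_OBJECT is made up for this illustration; it is the first of the
three single-line attempts from the original post.  The checks only show
which type carries the alignment; they say nothing about what the
vectorizer does with it.

```cpp
typedef double real;

// Alignment attached to the pointee type:
// SSE_PTR is a "pointer to 16-byte-aligned const real".
typedef real aligned_real __attribute__((aligned(16)));
typedef const aligned_real* SSE_PTR;

// Alignment attached to the pointer type itself (hypothetical name):
// a 16-byte-aligned pointer object that points to an ordinary const real.
typedef const real* ALIGNED_PTR_OBJECT __attribute__((aligned(16)));

// The typedef'd pointee type carries the requested alignment.
static_assert(__alignof__(aligned_real) == 16,
              "pointee type is 16-byte aligned");

// In the single-line form, the attribute lands on the pointer type.
static_assert(__alignof__(ALIGNED_PTR_OBJECT) == 16,
              "pointer object is 16-byte aligned");
```

Only the first pair of typedefs says anything about the address stored in
the pointer, which is what the vectorizer needs to know.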


* Re: Correct way to make a 16-byte aligned double* for SSE vectorization?
From: Brian Budge @ 2009-12-31 20:28 UTC
  To: Jie Zhang; +Cc: Benjamin Redelings I, gcc-help

The reason it won't work is that you're saying the pointer itself
needs to be 16 (or 8) byte aligned.  What you need is for the address
that the pointer holds to be aligned.

On the stack:

__attribute__((aligned(16))) real myArray[32];

On the heap (*nix):
real *myArray;
posix_memalign((void**)&myArray, 16, 32 * sizeof(real));

or for more portability you could use _mm_malloc (paired with _mm_free),
which is declared alongside the SSE intrinsics.

To know why the one version you posted works, we'd need to see the
calling code of f.   In general, it shouldn't work if malloc or new
are used to allocate the memory passed in, but it might be that the
memory is allocated on the stack?

  Brian
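
A minimal sketch of the heap case above, assuming a POSIX system.  The
helper names is_aligned16 and alloc_aligned16 are made up for this
example.  Passing a real void* to posix_memalign avoids the
(void**)&myArray cast.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>   // posix_memalign, free (POSIX)

typedef double real;

// True when p sits on a 16-byte boundary.
inline bool is_aligned16(const void* p)
{
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}

// Allocate room for n reals on a 16-byte boundary.
// Returns a null pointer on failure; release the memory with free().
inline real* alloc_aligned16(std::size_t n)
{
    void* raw = nullptr;
    if (posix_memalign(&raw, 16, n * sizeof(real)) != 0)
        return nullptr;
    assert(is_aligned16(raw));   // sanity check in debug builds
    return static_cast<real*>(raw);
}
```

Usage would look like real* a = alloc_aligned16(32); ... free(a);.  This
only makes the memory aligned; it does not by itself tell the compiler
that a pointer passed to another function is aligned.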


* Re: Correct way to make a 16-byte aligned double* for SSE vectorization?
From: Brian Budge @ 2010-01-01  0:14 UTC
  To: Benjamin Redelings I; +Cc: gcc-help

I see, so you want the function to be compiled as though the pointers
are guaranteed to point to 16-byte-aligned addresses.  This is an
interesting question.  I'll be following this too :)

  Brian
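
One possible way to express such a guarantee, sketched under the
assumption of a newer compiler than the ones discussed in this thread:
GCC 4.7 and later (and Clang) provide __builtin_assume_aligned, which
returns its argument and lets the optimizer assume the result has the
stated alignment.  The function name dot_aligned is made up for this
example, and the behaviour is undefined if the caller's pointers are not
actually aligned.

```cpp
#include <cassert>
#include <cstdint>

typedef double real;

// Dot product; the caller promises that p and q are 16-byte aligned.
real dot_aligned(const real* __restrict__ p, const real* __restrict__ q, int n)
{
    // Debug-build check of the caller's promise.
    assert((reinterpret_cast<std::uintptr_t>(p) & 15u) == 0);
    assert((reinterpret_cast<std::uintptr_t>(q) & 15u) == 0);

    // Tell the optimizer that both addresses are multiples of 16.
    const real* pa = static_cast<const real*>(__builtin_assume_aligned(p, 16));
    const real* qa = static_cast<const real*>(__builtin_assume_aligned(q, 16));

    real sum = 0;
    for (int i = 0; i < n; i++)
        sum += pa[i] * qa[i];
    return sum;
}
```

Whether the vectorizer then treats the accesses as aligned still depends
on the compiler version and flags, so the vectorizer's diagnostic output
is worth checking.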

On Thu, Dec 31, 2009 at 12:27 PM, Benjamin Redelings I
<benjamin_redelings@ncsu.edu> wrote:
> On 12/31/2009 08:41 AM, Brian Budge wrote:
>>
>> The reason it won't work is that you're saying the pointer itself
>> needs to be 16 (or 8) byte aligned.  You need the address that the
>> pointer points to to be aligned.
>>
>> On the stack:
>>
>> __attribute__((aligned(16))) real myArray[32];
>>
>> On the heap (*nix):
>> real *myArray;
>> posix_memalign((void**)&myArray, 16, 32 * sizeof(real));
>>
>> or for more portability you could use the SSE intrinsic _mm_malloc.
>>
>> To know why the one version you posted works, we'd need to see the
>> calling code of f.   In general, it shouldn't work if malloc or new
>> are used to allocate the memory passed in, but it might be that the
>> memory is allocated on the stack?
>>
>>   Brian
>
> Hi Brian,
>
>    I think there are two different issues:
>
> 1. First, how to actually allocate memory that is 16-byte aligned.
> 2. Second, how to inform the compiler that a pointer to that memory in
> fact has the property (p & 15L) == 0L
>
> I am interested in the second question, whereas I think you are answering
> the first one.
>
>> To know why the one version you posted works, we'd need to see the
>> calling code of f.   In general, it shouldn't work if malloc or new
>> are used to allocate the memory passed in, but it might be that the
>> memory is allocated on the stack?
>>
>
> There is no calling code.  That is, I'm not saying that it works when I run
> it.  I am saying that it works (that is, the compiler makes use of the
> 16-byte alignment of the pointer target) when I compile it.
>
> -BenRI
>
