public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* sparse overlapping structs for vectorization
@ 2014-02-12  6:21 Albert Cahalan
  2014-02-12 11:14 ` Richard Biener
  0 siblings, 1 reply; 2+ messages in thread
From: Albert Cahalan @ 2014-02-12  6:21 UTC (permalink / raw)
  To: gcc

I had a problem that got solved in an ugly way. I think gcc ought
to provide a few ways to make a nicer solution.

There was an array of structs roughly like so:

struct{int w;float x;char y[4];short z[2];}foo[512][4];

The types within the struct are 4 bytes each; I don't actually
remember anything else and it doesn't matter except that they
are distinct. I think it was bitfields actually, neatly grouped
into groups of 32 bits. In other words, like 4 4-byte values
but with more-or-less incompatible types.

Note that 4 of the structs neatly fill a 64-byte cache line.
An alignment attribute was used to ensure 64-byte alignment.

The most common operation needed on this array is to compare
the first struct member of 4 of the structs against a given
value, looking to see if there is a match. SSE would be good.
This would then be followed by using the matching entry if
there is one, else picking one of the 4 to recycle and thus use.

First bad solution:

One could load up 4 SSE registers, shuffle things around... NO.

Second bad solution:

One could simply have 4 distinct arrays. This is bad because
there are different cache lines for w, x, y, and z.

Third bad solution:

The array can be viewed as "int foo[512][4][4]" instead, with
the struct forming the third array index. Note that the last two
array indexes are both 4, so you can kind of swap them around.
This groups 4 fields of each type together, allowing SSE. The
problem here is loss of type safety; one must use array indexes
instead of struct field names. Like so: foo[idx][WHERE_W_IS][i]

Fourth bad solution:

We lay things out as in the third solution, but we cast pointers
to effectively lay sparse structs over each other like shingles.
{
int w;
int pad_wx[3];
float x;
int pad_xy[3];
char y[4];
int pad_yz[3];
short z[2];
}
Performance is hurt by the need for __may_alias__ and of course
the result is painful to look at. We went with this anyway, using
SSE intrinsics, and performance was great. Maintainability... not
so much.

BTW, an array of 512 structs containing 4-entry arrays was not used
because we wanted to have a simple normal pointer to indicate the
item being operated on. We didn't want to need a pointer,index pair.

Can something be done to help out here? The first thing that pops
into mind is the ability to tell gcc that the struct-to-struct
byte offset for array indexing is a user-specified value instead
of simply the struct size.

It's possible we could have safely ignored the warning about aliasing.
I don't know. Perhaps that would give even better performance, but
the casting would still be very ugly.

Solutions that that be defined away for non-gcc compilers are better.

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: sparse overlapping structs for vectorization
  2014-02-12  6:21 sparse overlapping structs for vectorization Albert Cahalan
@ 2014-02-12 11:14 ` Richard Biener
  0 siblings, 0 replies; 2+ messages in thread
From: Richard Biener @ 2014-02-12 11:14 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: GCC Development

On Wed, Feb 12, 2014 at 7:21 AM, Albert Cahalan <acahalan@gmail.com> wrote:
> I had a problem that got solved in an ugly way. I think gcc ought
> to provide a few ways to make a nicer solution.
>
> There was an array of structs roughly like so:
>
> struct{int w;float x;char y[4];short z[2];}foo[512][4];
>
> The types within the struct are 4 bytes each; I don't actually
> remember anything else and it doesn't matter except that they
> are distinct. I think it was bitfields actually, neatly grouped
> into groups of 32 bits. In other words, like 4 4-byte values
> but with more-or-less incompatible types.
>
> Note that 4 of the structs neatly fill a 64-byte cache line.
> An alignment attribute was used to ensure 64-byte alignment.
>
> The most common operation needed on this array is to compare
> the first struct member of 4 of the structs against a given
> value, looking to see if there is a match. SSE would be good.
> This would then be followed by using the matching entry if
> there is one, else picking one of the 4 to recycle and thus use.
>
> First bad solution:
>
> One could load up 4 SSE registers, shuffle things around... NO.
>
> Second bad solution:
>
> One could simply have 4 distinct arrays. This is bad because
> there are different cache lines for w, x, y, and z.
>
> Third bad solution:
>
> The array can be viewed as "int foo[512][4][4]" instead, with
> the struct forming the third array index. Note that the last two
> array indexes are both 4, so you can kind of swap them around.
> This groups 4 fields of each type together, allowing SSE. The
> problem here is loss of type safety; one must use array indexes
> instead of struct field names. Like so: foo[idx][WHERE_W_IS][i]
>
> Fourth bad solution:
>
> We lay things out as in the third solution, but we cast pointers
> to effectively lay sparse structs over each other like shingles.
> {
> int w;
> int pad_wx[3];
> float x;
> int pad_xy[3];
> char y[4];
> int pad_yz[3];
> short z[2];
> }
> Performance is hurt by the need for __may_alias__ and of course
> the result is painful to look at. We went with this anyway, using
> SSE intrinsics, and performance was great. Maintainability... not
> so much.
>
> BTW, an array of 512 structs containing 4-entry arrays was not used
> because we wanted to have a simple normal pointer to indicate the
> item being operated on. We didn't want to need a pointer,index pair.
>
> Can something be done to help out here? The first thing that pops
> into mind is the ability to tell gcc that the struct-to-struct
> byte offset for array indexing is a user-specified value instead
> of simply the struct size.
>
> It's possible we could have safely ignored the warning about aliasing.
> I don't know. Perhaps that would give even better performance, but
> the casting would still be very ugly.
>
> Solutions that that be defined away for non-gcc compilers are better.

Do the overlay but use an overlay of type char[<large enough>] and
load from that.  Should be more maintainable than using may_alias
and also work with other compilers.

Richard.

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2014-02-12 11:14 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-12  6:21 sparse overlapping structs for vectorization Albert Cahalan
2014-02-12 11:14 ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).