question on gcc vector extensions for vector sizes

public inbox for gcc-help@gcc.gnu.org

* question on gcc vector extensions for vector sizes > native SIMD width
From: Barragy, Edward @ 2011-11-30 10:31 UTC
  To: gcc-help

Hi -
I'm hoping to get some help or direction with the vector extensions in gcc 4.6, running on AMD Interlagos.
I've been using them as an alternative to compiler-generated loop vectorization.
A typedef such as  typedef float v4sf __attribute__ ((vector_size (16)));  has allowed
a fairly painless mapping of floats to 4 packed floats in parts of my finite element code.
Gcc then reliably maps the usual operators (*, +, etc.) onto packed SSE3 instructions.  It also works well with AVX.
This in turn gives roughly a 3x or better improvement over the original scalar code, whereas the loop vectorizer
more or less fails completely.
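
A minimal sketch of the technique described above, assuming a compiler that supports the GCC vector
extensions (vector subscripting in C needs gcc 4.6 or later).  The helper name madd4 is hypothetical
and is not taken from the finite element code:

```c
/* Four packed floats in one 16-byte vector (GCC vector extension). */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Hypothetical helper: element-wise a*x + y on four floats at once.
 * The ordinary * and + operators apply lane by lane. */
static v4sf madd4(v4sf a, v4sf x, v4sf y)
{
    return a * x + y;
}
```

On an SSE-capable x86 target the multiply and add would normally each become one packed instruction,
though the exact code depends on the compiler version and flags.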

What I'd like to try next is something like  typedef float v16sf __attribute__ ((vector_size (64))); .
Gcc accepts that construct, but emits 16 scalar instructions per operation rather than 4 packed SSE instructions.
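
A small test case for the wider type might look like the sketch below.  The function name madd16 is
hypothetical, and the vectors are passed by pointer only to sidestep ABI warnings for 64-byte vector
arguments on targets without registers that wide:

```c
/* Sixteen floats in one 64-byte vector; wider than a 16-byte SSE register. */
typedef float v16sf __attribute__ ((vector_size (64)));

/* Hypothetical helper: element-wise a*x + y on sixteen floats.
 * Whether this compiles to packed or scalar instructions is up to the compiler. */
static void madd16(v16sf *r, const v16sf *a, const v16sf *x, const v16sf *y)
{
    *r = *a * *x + *y;
}
```

Compiling a file like this with  gcc -O2 -S  and reading the assembly output is one way to see which
instructions are actually generated for a given gcc version and target.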

How difficult would it be to modify gcc to emit 4 packed SSE instructions instead, and where in the source would I look to get started?
Or am I simply missing some flags or directives that would give the packed SSE mapping?

There are a couple of reasons for wanting this.  On the CPU side, it gives something like a depth-4 unrolling.
That in turn gives the CPU's out-of-order execution engine lots of independent instructions to chew on, at least for
my data structure (an array of structures, each struct processed independently of the others).  On the
GPU / APU side, where the SIMD width is nominally 64, the wider typedef transparently changes the layout of structs
in memory, which can be very important for performance there.  On that hardware it would be something like
typedef float v64sf __attribute__ ((vector_size (256))); .  With a little luck, this typedef technique
should give good performance on CPU SIMD as well as GPU and APU SIMD without any surgery on the code's data structures.
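
A minimal sketch of how one typedef could drive the struct layout, assuming the GCC vector extensions.
The names SIMD_LANES, vreal, elem_block and scale_block are hypothetical and only illustrate the idea;
they are not taken from the actual finite element code:

```c
#include <stddef.h>

/* Hypothetical build-time knob: 4 for SSE, 16 or 64 for wider targets. */
#ifndef SIMD_LANES
#define SIMD_LANES 16
#endif

/* One vector type whose width follows SIMD_LANES. */
typedef float vreal __attribute__ ((vector_size (SIMD_LANES * sizeof (float))));

/* Hypothetical record: each member carries SIMD_LANES values, so changing
 * SIMD_LANES changes the memory layout without touching this definition. */
struct elem_block {
    vreal x;
    vreal y;
    vreal det;
};

/* Scale two members by s; the scalar is broadcast by hand so the code
 * does not rely on scalar-vector operators from newer gcc versions. */
static void scale_block(struct elem_block *b, float s)
{
    vreal vs;
    int i;
    for (i = 0; i < SIMD_LANES; i++)
        vs[i] = s;
    b->x = b->x * vs;
    b->y = b->y * vs;
}
```

Rebuilding with a different -DSIMD_LANES=... value changes sizeof (struct elem_block) and the stride
between consecutive values of a member, while the code that uses the struct stays the same.
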
Thanks -
Ted
