public inbox for gcc-help@gcc.gnu.org
* sse vector extensions, unions and inlining
@ 2008-08-19 23:20 Scott Dillard
From: Scott Dillard @ 2008-08-19 23:20 UTC (permalink / raw)
  To: gcc-help

Hi,

I've been messing around with the gcc vector extensions (SSE) and the
assembly produced seems somewhat suboptimal. I'm not sure what
"optimal" is, so I'm inquiring here first, before filing a bug report.

This concerns inlined functions that return vectors using the
struct/union return convention, that is, the address where the result
is to be stored is passed as a hidden first argument to the callee.
When the function returns a 'raw' vector type (such as "double foo
__attribute__((vector_size(16)))") that fits in a single SSE (xmm)
register, the code generated for the call to the inlined function is
the same as for manual inlining. However, if a union is returned (such
as "union { double a[2]; double v __attribute__((vector_size(16))); }")
or if the vector type is too big for a register (such as "double foo
__attribute__((vector_size(32)))"), then excessive stack shuffling
occurs, relative to manual inlining.
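
To make the three cases concrete, here is a minimal sketch. The type and
function names (v2d, u2d, v4d, add_raw, add_union, add_wide) are made up
for illustration and are not taken from the attached test file. On a
target without 32-byte vector registers, gcc may also print an ABI note
for the 32-byte case.

```c
/* Hypothetical names throughout; none come from the attached test file. */

/* Case 1: 16-byte vector of two doubles, fits in one xmm register. */
typedef double v2d __attribute__((vector_size(16)));

/* Case 2: union wrapping the same vector so elements can be indexed. */
typedef union {
    double a[2];
    v2d    v;
} u2d;

/* Case 3: 32-byte vector of four doubles, wider than one xmm register. */
typedef double v4d __attribute__((vector_size(32)));

/* Returns a raw 16-byte vector. */
static inline v2d add_raw(v2d x, v2d y)
{
    return x + y;
}

/* Returns the union wrapper by value. */
static inline u2d add_union(u2d x, u2d y)
{
    u2d r;
    r.v = x.v + y.v;
    return r;
}

/* Returns a raw 32-byte vector. */
static inline v4d add_wide(v4d x, v4d y)
{
    return x + y;
}
```

All three functions compute an element-wise sum; only the return type
differs, which is the variable I am asking about.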

This is C, btw, so I understand that, in general, stack space has to
be reserved for the arguments (as opposed to passing by const& in C++),
but I would expect that after inlining, the optimizer could see that
the arguments are not modified and keep them in registers rather than
bouncing them through the stack, as it does for things like int and
double. Let's say there are functions f(a,b) = a+b, g(a,b) = a*b, and
h(a,b) = g(f(a,a),f(a,b)). Functions f and g are inlined into h, but
the body of h looks like this:

  1. reserve stack space for what would have been calls to f and g
  2. copy arguments into that space
  3. load from that space into xmm registers
  4. operate
  5. copy from xmm registers into stack space
  6. copy from stack space into the space pointed to by the hidden
     "return here" argument

If h is defined directly as (a+a)*(a+b) then this stack shuffling does not happen.
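
A minimal sketch of the two variants being compared, assuming the union
return type described earlier. The names vec2d, vec2d_u, h_nested and
h_manual are hypothetical and need not match the attached test file;
h_manual is g(f(a,a), f(a,b)) expanded by hand under f = a+b and
g = a*b.

```c
/* Hypothetical reconstruction; names need not match the attached file. */
typedef double vec2d __attribute__((vector_size(16)));

typedef union {
    double a[2];
    vec2d  v;
} vec2d_u;

/* f(a,b) = a+b, element-wise. */
static inline vec2d_u f(vec2d_u a, vec2d_u b)
{
    vec2d_u r;
    r.v = a.v + b.v;
    return r;
}

/* g(a,b) = a*b, element-wise. */
static inline vec2d_u g(vec2d_u a, vec2d_u b)
{
    vec2d_u r;
    r.v = a.v * b.v;
    return r;
}

/* h written in terms of the inlined helpers. */
vec2d_u h_nested(vec2d_u a, vec2d_u b)
{
    return g(f(a, a), f(a, b));
}

/* h with f and g expanded by hand. */
vec2d_u h_manual(vec2d_u a, vec2d_u b)
{
    vec2d_u r;
    r.v = (a.v + a.v) * (a.v + b.v);
    return r;
}
```

Compiling this with something like "gcc -O2 -msse2 -S" and comparing the
bodies of h_nested and h_manual should show whether the extra stack
traffic appears with a given compiler version.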

Is this asking too much? Is there some fundamental reason why the
arguments to the inlined function need to be bounced through the
stack? This is with gcc 4.1.3. I've attached a test file and the
resulting assembly. The difference is pretty striking, though I've not
benchmarked it. I'm also aware this is not really the best way to use
SSE (better to put each vector component in a separate array and
vectorize the loop), but I think the issue may be with inlined
functions that return structs/unions in general.
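
For reference, the "separate array per component" layout mentioned above
might look roughly like this. It is only a sketch: soa_h and its
parameter names are hypothetical, and it computes g(f(a,a), f(a,b)) with
f = a+b and g = a*b for n two-component vectors.

```c
#include <stddef.h>

/* Hypothetical structure-of-arrays version: the x and y components of
 * each operand live in separate arrays, and the loop runs over elements. */
void soa_h(size_t n,
           const double *restrict ax, const double *restrict ay,
           const double *restrict bx, const double *restrict by,
           double *restrict rx, double *restrict ry)
{
    for (size_t i = 0; i < n; ++i) {
        /* (a + a) * (a + b), applied to each component independently */
        rx[i] = (ax[i] + ax[i]) * (ax[i] + bx[i]);
        ry[i] = (ay[i] + ay[i]) * (ay[i] + by[i]);
    }
}
```

The restrict qualifiers promise the compiler that the arrays do not
overlap, which is usually needed before a loop like this can be
auto-vectorized (for example with -ftree-vectorize).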

Thanks,
Scott

[-- Attachment #2: vec_test.tar.gz --]
[-- Type: application/x-gzip, Size: 833 bytes --]
