public inbox for gcc-help@gcc.gnu.org
From: "Scott Dillard" <sedillard@ucdavis.edu>
To: gcc-help@gcc.gnu.org
Subject: sse vector extensions, unions and inlining
Date: Tue, 19 Aug 2008 23:20:00 -0000
Message-ID: <d9b657cd0808191536i2718634dh343dee70b76aca6a@mail.gmail.com>

Hi,

I've been messing around with the gcc vector extensions (SSE) and the
assembly produced seems somewhat suboptimal. I'm not sure what
"optimal" is so I'm inquiring here first, before filing a bug report.

This concerns inlined functions that return vectors using the
struct/union return convention, that is, the address where the result
is to be stored is passed as a hidden first argument to the callee.
When the function returns a 'raw' vector type (such as "double foo
__attribute__((vector_size(16)))") that fits in a single XMM register
then the code generated for the inlined call is the same as for manual
inlining. However if a union is returned (such as "union { double
a[2]; double v __attribute__((vector_size(16))); }") or if the vector
type is too big for a register (such as "double foo
__attribute__((vector_size(32)))") then excessive stack shuffling
occurs, relative to manual inlining.
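
To make the three cases concrete, here is a minimal sketch of the return types in question. The names (v2d, v4d, u2d, add_raw, add_union, add_wide) are made up for illustration and are not taken from the attached test file.

```c
/* Hypothetical names, for illustration only. */

/* 16-byte vector of two doubles: fits in one XMM register. */
typedef double v2d __attribute__((vector_size(16)));

/* 32-byte vector of four doubles: wider than one XMM register. */
typedef double v4d __attribute__((vector_size(32)));

/* Union giving element access to the 16-byte vector. */
typedef union {
    double a[2];
    v2d    v;
} u2d;

/* Case 1: returns a raw 16-byte vector. */
static inline v2d add_raw(v2d x, v2d y)
{
    return x + y;
}

/* Case 2: returns a union wrapping the same vector. */
static inline u2d add_union(u2d x, u2d y)
{
    u2d r;
    r.v = x.v + y.v;
    return r;
}

/* Case 3: returns a raw vector too big for one XMM register. */
static inline v4d add_wide(v4d x, v4d y)
{
    return x + y;
}
```

All three compute the same elementwise sum; only the return type differs. When building without AVX enabled, GCC may print an ABI note or warning for the 32-byte case.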

This is C, btw, so I understand that, in general, stack space has to
be reserved for the arguments (as opposed to const&) but I would
expect that after inlining, the optimizer could see that the arguments
are not modified and not bounce them through the stack, as it does for
things like int and double. Let's say there are functions f(a,b) = a+b,
g(a,b) = a*b, and h(a,b) = g(f(a,a),f(a,b)). Functions f and g are
inlined into h, but the body of h looks like this:

reserve stack space for what would have been calls to f and g
copy arguments into that space
load from that space into XMM registers
operate
copy from XMM registers into stack space
copy from stack space into the space pointed to by the hidden "return
here" argument.

If h is defined directly as (a+a)*(a+b), that is, with f and g expanded
by hand, then this stack shuffling does not happen.
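
For reference, the comparison can be sketched as below. This is a reconstruction from the description in this message, not the attached file; the names vec2, h_composed and h_manual are hypothetical. Expanding h by the definitions of f and g gives (a + a) * (a + b), which is what the manual version computes.

```c
/* Hypothetical reconstruction; names are assumptions. */
typedef double v2d __attribute__((vector_size(16)));

typedef union {
    double a[2];
    v2d    v;
} vec2;

/* f(a,b) = a + b, elementwise */
static inline vec2 f(vec2 a, vec2 b)
{
    vec2 r;
    r.v = a.v + b.v;
    return r;
}

/* g(a,b) = a * b, elementwise */
static inline vec2 g(vec2 a, vec2 b)
{
    vec2 r;
    r.v = a.v * b.v;
    return r;
}

/* h built from calls to f and g, relying on the compiler to inline them. */
vec2 h_composed(vec2 a, vec2 b)
{
    return g(f(a, a), f(a, b));
}

/* The same computation with f and g expanded by hand. */
vec2 h_manual(vec2 a, vec2 b)
{
    vec2 r;
    r.v = (a.v + a.v) * (a.v + b.v);
    return r;
}
```

Compiling with something like "gcc -O2 -msse2 -S" and comparing the bodies of h_composed and h_manual should show whether the extra stack traffic is present for a given compiler version.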

Is this asking too much? Is there some fundamental reason why the
arguments to the inlined function need to be bounced through the
stack? This is with gcc 4.1.3. I've attached a test file and resulting
assembly. The difference is pretty striking, though I've not
benchmarked it. I'm also aware this is not really the best way to use
SSE (better to put each vector component in a separate array and
vectorize the loop) but I think maybe the issue is with inlined
functions that return structs/unions in general.
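
By "separate array" I mean roughly the layout sketched below, where each component lives in its own array and the loop runs over elements. The function name h_soa and the parameter names are made up for illustration, and whether the loop actually gets vectorized depends on the compiler version and flags.

```c
#include <stddef.h>

/* Structure-of-arrays sketch (hypothetical names): the x and y
 * components of a, b and the result r are stored in separate arrays.
 * Each iteration computes h(a,b) = (a + a) * (a + b) per component. */
void h_soa(size_t n,
           const double *ax, const double *ay,
           const double *bx, const double *by,
           double *rx, double *ry)
{
    for (size_t i = 0; i < n; ++i) {
        rx[i] = (ax[i] + ax[i]) * (ax[i] + bx[i]);
        ry[i] = (ay[i] + ay[i]) * (ay[i] + by[i]);
    }
}
```

No vector types appear in the source here; the idea is that the compiler can pack consecutive elements of each array into registers on its own.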

Thanks,
Scott

[-- Attachment #2: vec_test.tar.gz --]
[-- Type: application/x-gzip, Size: 833 bytes --]
