Re: RFC: Extend x86-64 psABI for 256bit AVX register


From: Jan Hubicka <hubicka@ucw.cz>
To: "H.J. Lu" <hjl.tools@gmail.com>
Cc: Jan Hubicka <jh@suse.cz>, Jan Hubicka <hubicka@ucw.cz>,
		discuss@x86-64.org, GCC <gcc@gcc.gnu.org>,
		"Girkar, Milind" <milind.girkar@intel.com>,
		"Dmitriev, Serguei N" <serguei.n.dmitriev@intel.com>,
		"Kreitzer, David L" <david.l.kreitzer@intel.com>
Subject: Re: RFC: Extend x86-64 psABI for 256bit AVX register
Date: Mon, 09 Jun 2008 14:41:00 -0000
Message-ID: <20080609144054.GA13869@atrey.karlin.mff.cuni.cz>
In-Reply-To: <20080606142813.GA18621@lucon.org>

> On Fri, Jun 06, 2008 at 06:50:26AM -0700, H.J. Lu wrote:
> > On Fri, Jun 06, 2008 at 10:28:34AM +0200, Jan Hubicka wrote:
> > > > 
> > > > ymm0 and xmm0 are the same register. xmm0 is the lower 128bit
> > > > of ymm0. I am not sure if we need separate XMM registers from
> > > > YMM registers.
> > > 
> > > 
> > > Yes, I know that xmm0 is lower part of ymm0.  I still think we ought to
> > > be able to support varargs that do save ymm0 registers only when ymm
> > > values are passed same way as we touch SSE only when SSE values are
> > > passed via EAX hint.
> > 
> > Which register do you propose for hint? The current psABI uses RAX
> > for XMM registers. We can't change it to AL and AH for YMM without
> > breaking backward compatibility.
> > 
> > > This way we will be able to support e.g. printf that has YMM printing %
> > > construct but don't need YMM enabled hardware when those are not used.
> > > 
> > > This is why I think extending EAX to contain information about amount of
> > > XMM values to save and in addition YMM values to save is sane.  Then old
> > > non-YMM aware varargs prologues will crash when YMM values are passed,
> > > but all other combinations will work.
> > 
> > I don't think it is necessary since -mavx will enable AVX code
> > generation for all SSE codes. Unless the function only uses integer,
> > it will crash on non-YMM aware hardware.  That is, if even one
> > SSE register is used, which is hinted in RAX, the varargs prologue will
> > use AVX instructions to save it. We don't need another hint for AVX
> > instructions.
> > 
> > > > 
> > > > >
> > > > > I personally don't have much preferences over 1. or 2.. 1. seems
> > > > > relatively easy to implement too, or is packaging two 128bit values to
> > > > > single 256bit difficult in va_arg expansion?
> > > > >
> > > > 
> > > > Access to 256bit register as lower and upper 128bits needs 2
> > > > instructions. For store
> > > > 
> > > > vmovaps   %xmm7, -143(%rax)
> > > > vextractf128 $1, %ymm7, -15(%rax)
> > > > 
> > > > For load
> > > > 
> > > > vmovaps  -143(%rax),%xmm7
> > > > vinsertf128 $1, -15(%rax),%ymm7,%ymm7
> > > > 
> > > > If we go beyond 256bit, we need more instructions to access
> > > > the full register. For 512bit, it will be split into lower 128bit,
> > > > middle 128bit and upper 256bit. 1024bit will have 4 parts.
> > > > 
> > > > For #2, only one instruction will be needed for 256bit and
> > > > beyond.
> > > 
> > > Yes, but we will still save half of stack space.  Well, I don't have
> > > much preferences here.  If it seems saner to simply save whole thing
> > > saving lower part twice, I am fine with that.
> > 
> > I was told that it wasn't very easy to get decent performance with
> > split access. I extended my proposal to include a 16bit bitmask to
> > indicate which YMM registers should be saved. If the bit is 0,
> > we should only save the lower 128bit in the original register
> > save area. Otherwise, we should save only the whole YMM register.
> > 
> 
> My second thought. How useful is such a bitmask? Do we really
> need it? Is it acceptable to save the lower 128bit twice?

I don't see much benefit in a bitmask.  I think we should only try to
enforce:
  1) that the AVX prologue will not crash on non-AVX hardware for
     functions not using AVX va_arg constructs.
  2) backward compatibility with current va_lists.  That is, make
     calling an AVX function from non-AVX code work as well as calling
     a non-AVX function from AVX code.

I don't think unconditionally saving the AVX registers, or guarding them
the same way as we do for SSE, is good, because it breaks 1).

We can't use a new register to hint the number of AVX operands, because
the register would be uninitialized in non-AVX code.

Still, it seems to me that we can extend the current eax convention.
Currently the value must be in the range 0...8, as it specifies the number
of SSE registers.  We can pack both numbers into it.  This way we
unfortunately get a wild jump in the case of AVX code calling a non-AVX
function and passing in AVX arguments, but this seems less important than
1) and 2) to me, and I don't see how to get all three cases working.
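
A minimal sketch of the packing idea, under assumptions that are not in
this thread: the low nibble of AL keeps the SSE register count (0..8) and
the high nibble holds the YMM register count.  The names pack_hint,
hint_sse, hint_ymm, old_prologue_ok and new_prologue_uses_avx are
hypothetical, and the old prologue is modeled only as "safe for 0..8,
wild jump otherwise".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packing, NOT taken from the psABI:
   low nibble  = number of SSE registers used (0..8),
   high nibble = number of YMM registers used (0..8). */
static uint8_t pack_hint(unsigned sse_count, unsigned ymm_count)
{
    assert(sse_count <= 8 && ymm_count <= 8);
    return (uint8_t)(sse_count | (ymm_count << 4));
}

static unsigned hint_sse(uint8_t al) { return al & 0x0fu; }
static unsigned hint_ymm(uint8_t al) { return al >> 4; }

/* Model of a non-AVX-aware varargs prologue: it only copes with
   values 0..8, so anything larger stands for the wild jump. */
static int old_prologue_ok(uint8_t al) { return al <= 8; }

/* Model of an AVX-aware prologue: it executes AVX save
   instructions only when the caller announced YMM values. */
static int new_prologue_uses_avx(uint8_t al) { return hint_ymm(al) != 0; }
```

With this layout pack_hint(3, 0) is 3, so an old prologue still accepts
it and a new prologue stays away from AVX instructions, while
pack_hint(1, 2) is 0x21 and falls outside the range an old prologue
handles.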

Duplicating the value seems OK to me if it simplifies the implementation
significantly.
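
For a rough idea of the stack cost, here is a small sketch that assumes
eight vector argument registers (the 0...8 range of the eax hint) and an
extra save area added next to the existing 128bit slots.  The function
names are hypothetical.

```c
#include <stddef.h>

enum {
    VECTOR_ARG_REGS = 8,   /* assumed: matches the 0..8 eax range */
    XMM_BYTES       = 16,  /* 128bit */
    YMM_BYTES       = 32   /* 256bit */
};

/* Split layout: the existing slots keep the lower halves and the
   new area stores only the upper 128 bits of each register. */
static size_t extra_bytes_split(void)
{
    return (size_t)VECTOR_ARG_REGS * (YMM_BYTES - XMM_BYTES);
}

/* Duplicating layout: the new area stores whole YMM registers,
   so the lower 128 bits of each register are saved twice. */
static size_t extra_bytes_duplicate(void)
{
    return (size_t)VECTOR_ARG_REGS * YMM_BYTES;
}
```

Under these assumptions the split layout adds 128 bytes and the
duplicating layout adds 256 bytes.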

Honza
> 
> Thanks.
> 
> 
> H.J.
