public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Sandiford <richard.sandiford@arm.com>
To: Michael Matz via Gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Jan Hubicka <hubicka@ucw.cz>,  Michael Matz <matz@suse.de>
Subject: Re: [x86-64] RFC: Add nosse abi attribute
Date: Tue, 18 Jul 2023 00:00:09 +0100	[thread overview]
Message-ID: <mptjzuym9dy.fsf@arm.com> (raw)
In-Reply-To: <alpine.LSU.2.20.2307101549210.13548@wotan.suse.de> (Michael Matz via Gcc-patches's message of "Mon, 10 Jul 2023 15:55:27 +0000 (UTC)")

Michael Matz via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hello,
>
> the ELF psABI for x86-64 doesn't have any callee-saved SSE
> registers (there were actual reasons for that, but those don't
> matter anymore).  This starts to hurt some uses, as it means that
> as soon as you have a call (say to memmove/memcpy, even if
> implicit as libcall) in a loop that manipulates floating point
> or vector data you get saves/restores around those calls.
>
> But in reality many functions can be written such that they only need
> to clobber a subset of the 16 XMM registers (or do the save/restore
> themself in the codepaths that needs them, hello memcpy again).
> So we want to introduce a way to specify this, via an ABI attribute
> that basically says "doesn't clobber the high XMM regs".
>
> I've opted to do only the obvious: do something special only for
> xmm8 to xmm15, without a way to specify the clobber set in more detail.
> I think such half/half split is reasonable, and as I don't want to
> change the argument passing anyway (whose regs are always clobbered)
> there isn't that much wiggle room anyway.
>
> I chose to make it possible to write function definitions with that
> attribute with GCC adding the necessary callee save/restore code in
> the xlogue itself.  Carefully note that this is only possible for
> the SSE2 registers, as other parts of them would need instructions
> that are only optional.  When a function doesn't contain calls to
> unknown functions we can be a bit more lenient: we can make it so that
> GCC simply doesn't touch xmm8-15 at all, then no save/restore is
> necessary.  If a function contains calls then GCC can't know which
> parts of the XMM regset is clobbered by that, it may be parts
> which don't even exist yet (say until avx2048 comes out), so we must
> restrict ourself to only save/restore the SSE2 parts and then of course
> can only claim to not clobber those parts.
>
> To that end I introduce actually two related attributes (for naming
> see below):
> * nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered
> * noanysseclobber: claims (and ensures) that nothing of any of the
>   registers overlapping xmm8-15 is clobbered (not even future, as of
>   yet unknown, parts)
>
> Ensuring the first is simple: potentially add saves/restore in xlogue
> (e.g. when xmm8 is either used explicitely or implicitely by a call).
> Ensuring the second comes with more: we must also ensure that no
> functions are called that don't guarantee the same thing (in addition
> to just removing all xmm8-15 parts alltogether from the available
> regsters).
>
> See also the added testcases for what I intended to support.
>
> I chose to use the new target independend function-abi facility for
> this.  I need some adjustments in generic code:
> * the "default_abi" is actually more like a "current" abi: it happily
>   changes its contents according to conditional_register_usage,
>   and other code assumes that such changes do propagate.
>   But if that conditonal_reg_usage is actually done because the current
>   function is of a different ABI, then we must not change default_abi.
> * in insn_callee_abi we do look at a potential fndecl for a call
>   insn (only set when -fipa-ra), but doesn't work for calls through
>   pointers and (as said) is optional.  So, also always look at the
>   called functions type (it's always recorded in the MEM_EXPR for
>   non-libcalls), before asking the target.
>   (The function-abi accessors working on trees were already doing that,
>   its just the RTL accessor that missed this)
> [...]
> diff --git a/gcc/function-abi.cc b/gcc/function-abi.cc
> index 2ab9b2c5649..efbe114218c 100644
> --- a/gcc/function-abi.cc
> +++ b/gcc/function-abi.cc
> @@ -42,6 +42,26 @@ void
>  predefined_function_abi::initialize (unsigned int id,
>  				     const_hard_reg_set full_reg_clobbers)
>  {
> +  /* Don't reinitialize an ABI struct.  We might be called from reinit_regs
> +     from the targets conditional_register_usage hook which might depend
> +     on cfun and might have changed the global register sets according
> +     to that functions ABI already.  That's not the default ABI anymore.
> +
> +     XXX only avoid this if we're reinitializing the default ABI, and the
> +     current function is _not_ of the default ABI.  That's for
> +     backward compatibility where some backends modify the regsets with
> +     the exception that those changes are then reflected also in the default
> +     ABI (which rather is then the "current" ABI).  E.g. x86_64 with the
> +     ms_abi vs sysv attribute.  They aren't reflected by separate ABI
> +     structs, but handled different.  The "default" ABI hence changes
> +     back and forth (and is expected to!) between a ms_abi and a sysv
> +     function.  */

The default ABI is also the eh_edge_abi, and so describes the set of
registers that are preserved or clobbered across EH edges.  If changing
between ms_abi and sysv changes the "default" ABI's clobber set, I assume
it also (intentionally?) changes the EH edge clobber set, but how does
that work in practice?

Richard

> +  if (m_initialized
> +      && id == 0
> +      && cfun
> +      && fndecl_abi (cfun->decl).base_abi ().id() != 0)
> +    return;
> +
>    m_id = id;
>    m_initialized = true;
>    m_full_reg_clobbers = full_reg_clobbers;
> @@ -224,6 +244,13 @@ insn_callee_abi (const rtx_insn *insn)
>      if (tree fndecl = get_call_fndecl (insn))
>        return fndecl_abi (fndecl);
>  
> +  if (rtx call = get_call_rtx_from (insn))
> +    {
> +      tree memexp = MEM_EXPR (XEXP (call, 0));
> +      if (memexp)
> +	return fntype_abi (TREE_TYPE (memexp));
> +    }
> +
>    if (targetm.calls.insn_callee_abi)
>      return targetm.calls.insn_callee_abi (insn);
>  
> diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-1.c b/gcc/testsuite/gcc.target/i386/sseclobber-1.c
> new file mode 100644
> index 00000000000..8758e2d3109
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sseclobber-1.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target sse2 } */
> +/* { dg-options "-O1" } */
> +/* { dg-final { scan-assembler-times {mm[89], [0-9]*\(%rsp\)} 2 } } */
> +/* { dg-final { scan-assembler-times {mm1[0-5], [0-9]*\(%rsp\)} 6 } } */
> +
> +extern int nonsse (int) __attribute__((nosseclobber));
> +extern int normalfunc (int);
> +
> +/* Demonstrate that all regs potentially clobbered by normal psABI
> +   functions are saved/restored by otherabi functions.  */
> +__attribute__((nosseclobber)) int nonsse (int i)
> +{
> +  return normalfunc (i + 2) + 3;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-2.c b/gcc/testsuite/gcc.target/i386/sseclobber-2.c
> new file mode 100644
> index 00000000000..9abafa0a9ba
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sseclobber-2.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target sse2 } */
> +/* { dg-options "-O1" } */
> +/* { dg-final { scan-assembler-not {mm[0-9], [0-9]*\(%rsp\)} } } */
> +
> +extern int nonsse (int) __attribute__((nosseclobber));
> +extern int othernonsse (int) __attribute__((nosseclobber));
> +
> +/* Demonstrate that calling a nosseclobber function from a nosseclobber
> +   function does _not_ need to save all the regs (unlike in nonsse).  */
> +__attribute__((nosseclobber)) int nonsse (int i)
> +{
> +  return othernonsse (i + 2) + 3;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-3.c b/gcc/testsuite/gcc.target/i386/sseclobber-3.c
> new file mode 100644
> index 00000000000..276c7fd926b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sseclobber-3.c
> @@ -0,0 +1,54 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target sse2 } */
> +/* { dg-options "-O1" } */
> +/* for docalc2 we should use the high xmm regs */
> +/* { dg-final { scan-assembler {xmm[89]} } } */
> +/* do docalc4_notany we should use the high ymm regs */
> +/* { dg-final { scan-assembler {ymm[89]} } } */
> +/* for docalc4 (and nowhere else) we should save/restore exactly
> +   one reg to stack around the inner-loop call */
> +/* { dg-final { scan-assembler-times {ymm[0-9]*, [0-9]*\(%rsp\)} 1 } } */
> +
> +typedef double dbl2 __attribute__((vector_size(16)));
> +typedef double dbl4 __attribute__((vector_size(32)));
> +typedef double dbl8 __attribute__((vector_size(64)));
> +extern __attribute__((nosseclobber,const)) double nonsse (int);
> +
> +/* Demonstrate that some values can be kept in a register over calls
> +   to otherabi functions.  nonsse saves the XMM register, so those
> +   are usable, hence docalc2 should be able to keep values in registers
> +   over the nonsse call.  */
> +void docalc2 (dbl2 *d, dbl2 *a, dbl2 *b, int n)
> +{
> +  long i;
> +  for (i = 0; i < n; i++)
> +    {
> +      d[i] = a[i] * b[i] * nonsse(i);
> +    }
> +}
> +
> +/* Here we're using YMM registers (four doubles) and those are _not_
> +   saved by nonsse() (only the XMM parts) so docalc4 should not keep
> +   the value in a register over the call to nonsse.  */
> +void __attribute__((target("avx2"))) docalc4 (dbl4 *d, dbl4 *a, dbl4 *b, int n)
> +{
> +  long i;
> +  for (i = 0; i < n; i++)
> +    {
> +      d[i] = a[i] * b[i] * nonsse(i);
> +    }
> +}
> +
> +/* And here we're also using YMM registers, but have a call to a
> +   noanysseclobber function, which _does_ save all [XYZ]MM regs except
> +   arguments, so docalc4_notany should again be able to keep the value
> +   in a register.  */
> +extern __attribute__((noanysseclobber,const)) double notanysse (int);
> +void __attribute__((target("avx2"))) docalc4_notany (dbl4 *d, dbl4 *a, dbl4 *b, int n)
> +{
> +  long i;
> +  for (i = 0; i < n; i++)
> +    {
> +      d[i] = a[i] * b[i] * notanysse(i);
> +    }
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-4.c b/gcc/testsuite/gcc.target/i386/sseclobber-4.c
> new file mode 100644
> index 00000000000..734f25068f0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sseclobber-4.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target sse2 } */
> +/* { dg-options "-O1" } */
> +/* { dg-final { scan-assembler-not {mm[0-9], [0-9]*\(%rsp\)} } } */
> +
> +extern __attribute__((nosseclobber)) int (*nonsse_ptr) (int);
> +
> +/* Demonstrate that some values can be kept in a register over calls
> +   to otherabi functions when called via function pointer.  */
> +double docalc (double d)
> +{
> +  double ret = d;
> +  int i = 0;
> +  while (1) {
> +      int j = nonsse_ptr (i++);
> +      if (!j)
> +        break;
> +      ret += j;
> +  }
> +  return ret;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-5.c b/gcc/testsuite/gcc.target/i386/sseclobber-5.c
> new file mode 100644
> index 00000000000..1869ae06148
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sseclobber-5.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target sse2 } */
> +/* { dg-options "-O1" } */
> +/* { dg-final { scan-assembler-not {mm[89]} } } */
> +/* { dg-final { scan-assembler-not {mm1[0-5]} } } */
> +
> +extern int noanysse (int) __attribute__((noanysseclobber));
> +extern int noanysse2 (int) __attribute__((noanysseclobber));
> +extern __attribute__((noanysseclobber)) double calcstuff (double, double);
> +
> +/* Demonstrate that none of the clobbered SSE (or wider) regs are
> +   used by a noanysse function.  */
> +__attribute__((noanysseclobber)) double calcstuff (double d, double e)
> +{
> +  double s1, s2, s3, s4, s5, s6, s7, s8;
> +  s1 = s2 = s3 = s4 = s5 = s6 = s7 = s8 = 0.0;
> +  while (d > 0.1)
> +    {
> +      s1 += s2 * 2 + d;
> +      s2 += s3 * 3 + e;
> +      s3 += s4 * 5 + d * e;
> +      s4 += e / d;
> +      s5 += s2 * 7 + d - e;
> +      s5 += 2 * d + e;
> +      s6 += 5 * e + d;
> +      s7 += 7 * e * (d+1);
> +      d -= e;
> +    }
> +  return s1 + s2 + s3 + s4 + s5 + s6 + s7;
> +}
> +
> +/* Demonstrate that we can call noanysse functions from noannysse
> +   functions.  */
> +__attribute__((noanysseclobber)) int noanysse2 (int i)
> +{
> +  return noanysse (i + 2) + 3;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-6.c b/gcc/testsuite/gcc.target/i386/sseclobber-6.c
> new file mode 100644
> index 00000000000..89ece11c9f2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sseclobber-6.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target sse2 } */
> +/* { dg-options "-O1" } */
> +
> +/* Various ways of invalid usage of the nosse attributes.  */
> +extern __attribute__((nosseclobber)) int nonfndecl; /* { dg-warning "only applies to function types" } */
> +
> +extern int normalfunc (int);
> +__attribute__((nosseclobber)) int (*nonsse_ptr) (int) = normalfunc; /* { dg-warning "from incompatible pointer type" } */
> +
> +extern int noanysse (int) __attribute__((noanysseclobber));
> +/* Demonstrate that it's not allowed to call any functions that
> +   aren't noanysse from noanysse functions.  */
> +__attribute__((noanysseclobber)) int noanysse (int i)
> +{
> +  return normalfunc (i + 2) + 3; /* { dg-error "cannot be called from function" } */
> +}

      parent reply	other threads:[~2023-07-17 23:00 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-10 15:55 Michael Matz
2023-07-10 17:28 ` Richard Biener
2023-07-10 19:07 ` Alexander Monakov
2023-07-10 20:33   ` Alexander Monakov
2023-07-11  6:42   ` Richard Biener
2023-07-11  8:53     ` Jan Hubicka
2023-07-11  9:07       ` Richard Biener
2023-07-11 13:00   ` Richard Biener
2023-07-11 13:21     ` Jan Hubicka
2023-07-11 14:00       ` Michael Matz
2023-07-11 13:59     ` Alexander Monakov
2023-07-11 14:57   ` Michael Matz
2023-07-11 15:17     ` Alexander Monakov
2023-07-11 15:34       ` Michael Matz
2023-07-11 16:53         ` Alexander Monakov
2023-07-17 23:00 ` Richard Sandiford [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mptjzuym9dy.fsf@arm.com \
    --to=richard.sandiford@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=hubicka@ucw.cz \
    --cc=matz@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).