public inbox for gcc-patches@gcc.gnu.org
* [x86-64] RFC: Add nosse abi attribute
@ 2023-07-10 15:55 Michael Matz
  2023-07-10 17:28 ` Richard Biener
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Michael Matz @ 2023-07-10 15:55 UTC
  To: gcc-patches, Jan Hubicka

Hello,

the ELF psABI for x86-64 doesn't have any callee-saved SSE
registers (there were actual reasons for that, but those don't
matter anymore).  This starts to hurt some uses, as it means that
as soon as you have a call (say to memmove/memcpy, even if
implicit as libcall) in a loop that manipulates floating point
or vector data you get saves/restores around those calls.

But in reality many functions can be written such that they only need
to clobber a subset of the 16 XMM registers (or do the save/restore
themselves in the code paths that need them, hello memcpy again).
So we want to introduce a way to specify this, via an ABI attribute
that basically says "doesn't clobber the high XMM regs".

I've opted to do only the obvious: do something special only for
xmm8 to xmm15, without a way to specify the clobber set in more detail.
I think such a half/half split is reasonable, and as I don't want to
change the argument passing (whose regs are always clobbered)
there isn't that much wiggle room anyway.

I chose to make it possible to write function definitions with that
attribute with GCC adding the necessary callee save/restore code in
the xlogue itself.  Carefully note that this is only possible for
the SSE2 parts of the registers (the low 128 bits), as saving the other
parts would need instructions that are only optionally available.
When a function doesn't contain calls to unknown functions we can be a
bit more lenient: we can make it so that GCC simply doesn't touch
xmm8-15 at all, then no save/restore is necessary.  If a function
contains calls then GCC can't know which parts of the XMM regset are
clobbered by them; they may be parts which don't even exist yet (say
until avx2048 comes out), so we must restrict ourselves to only
save/restore the SSE2 parts and then of course can only claim to not
clobber those parts.

To that end I actually introduce two related attributes (for naming
see below):
* nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered
* noanysseclobber: claims (and ensures) that nothing of any of the
  registers overlapping xmm8-15 is clobbered (not even future, as of
  yet unknown, parts)
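
As an illustration of how the two attributes are meant to be written in
user code, here is a minimal sketch.  The macros NOSSECLOBBER and
NOANYSSECLOBBER, and the functions scale, add3 and sum_scaled, are
hypothetical names made up for this example.  The macros expand to
nothing on compilers that don't know the attributes, so the file builds
with or without this patch; only a patched compiler would actually
change the register handling.

```c
#include <assert.h>

/* Hypothetical helper macros: use the proposed attributes only when the
   compiler reports that it knows them, and expand to nothing otherwise.  */
#if defined(__has_attribute)
# if __has_attribute(nosseclobber)
#  define NOSSECLOBBER __attribute__((nosseclobber))
# endif
# if __has_attribute(noanysseclobber)
#  define NOANYSSECLOBBER __attribute__((noanysseclobber))
# endif
#endif
#ifndef NOSSECLOBBER
# define NOSSECLOBBER
#endif
#ifndef NOANYSSECLOBBER
# define NOANYSSECLOBBER
#endif

/* Promises that the low 16 bytes of xmm8-15 survive a call.  The body
   may still contain arbitrary floating point code.  */
NOSSECLOBBER double scale (double x)
{
  return x * 2.0 + 1.0;
}

/* Promises that no part of the registers overlapping xmm8-15 is
   clobbered.  Kept as a leaf here, because such a function may only
   call functions that give the same promise.  */
NOANYSSECLOBBER int add3 (int i)
{
  return i + 3;
}

/* An ordinary caller: with the attributes honoured, values such as ACC
   could stay in a high XMM register across both calls.  */
double sum_scaled (const double *v, int n)
{
  assert (n >= 0);
  double acc = 0.0;
  for (int i = 0; i < n; i++)
    acc += scale (v[i]) + add3 (i);
  return acc;
}
```

The computed values are the same either way; the attributes only affect
which registers need spilling around the calls.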

Ensuring the first is simple: potentially add saves/restores in the xlogue
(e.g. when xmm8 is used either explicitly or implicitly by a call).
Ensuring the second comes with more: we must also ensure that no
functions are called that don't guarantee the same thing (in addition
to just removing all xmm8-15 parts altogether from the available
registers).
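
The two rules can be modelled outside the compiler with plain bit sets.
What follows is only a toy sketch under that assumption, not the code
from the patch: the enum abi_kind and the helpers call_clobbered_xmm,
call_allowed, xlogue_saves and count_regs are all invented for
illustration, and bit i of a set stands for the low 128 bits of xmm<i>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative ABI kinds, local to this sketch.  */
enum abi_kind { KIND_DEFAULT, KIND_LESS_SSE, KIND_NO_SSE };

/* Set of XMM registers whose low 128 bits a call to a function of the
   given kind may clobber.  */
static uint16_t
call_clobbered_xmm (enum abi_kind callee)
{
  return callee == KIND_DEFAULT ? 0xffff : 0x00ff;
}

/* A noanysseclobber function may only call functions giving the same
   guarantee; every other caller is unrestricted.  */
static bool
call_allowed (enum abi_kind caller, enum abi_kind callee)
{
  return caller != KIND_NO_SSE || callee == KIND_NO_SSE;
}

/* Registers a function of kind SELF has to save/restore in its xlogue:
   those of xmm8-15 that it uses itself or that one of its callees may
   clobber.  Only the nosseclobber kind ever saves anything.  */
static uint16_t
xlogue_saves (enum abi_kind self, uint16_t used_directly,
              const enum abi_kind *callees, int ncallees)
{
  assert (ncallees >= 0);
  if (self != KIND_LESS_SSE)
    return 0;
  uint16_t touched = used_directly;
  for (int i = 0; i < ncallees; i++)
    touched |= call_clobbered_xmm (callees[i]);
  return touched & 0xff00;
}

/* Number of registers in SET.  */
static int
count_regs (uint16_t set)
{
  int n = 0;
  for (; set; set &= set - 1)
    n++;
  return n;
}
```

In this model a nosseclobber function calling a normal psABI function
saves all eight high registers, and one calling only another
nosseclobber function saves none, which is the behaviour the
sseclobber-1.c and sseclobber-2.c testcases scan for.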

See also the added testcases for what I intended to support.

I chose to use the new target-independent function-abi facility for
this.  I need some adjustments in generic code:
* the "default_abi" is actually more like a "current" abi: it happily
  changes its contents according to conditional_register_usage,
  and other code assumes that such changes do propagate.
  But if that conditional_register_usage is actually done because the
  current function is of a different ABI, then we must not change
  default_abi.
* in insn_callee_abi we do look at a potential fndecl for a call
  insn (only set when -fipa-ra), but that doesn't work for calls through
  pointers and (as said) is optional.  So, also always look at the
  called function's type (it's always recorded in the MEM_EXPR for
  non-libcalls), before asking the target.
  (The function-abi accessors working on trees were already doing that,
  it's just the RTL accessor that missed this)
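
The lookup order described in the second bullet can be sketched as a
small stand-alone model.  This is an assumption-laden toy, not GCC
code: struct call_info, its fields, DEFAULT_ABI_ID and callee_abi_id
are hypothetical names, and plain integer ids stand in for the ABI
descriptors.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for one call insn.  A NULL field means that this source
   of information is not available for the call.  */
struct call_info
{
  const int *decl_abi;   /* ABI id from a known fndecl (optional).  */
  const int *type_abi;   /* ABI id from the called function's type.  */
  const int *hook_abi;   /* ABI id the target hook would report.  */
};

enum { DEFAULT_ABI_ID = 0 };

/* Pick the callee ABI in the order described above: the fndecl if there
   is one, then the function type, then the target, then the default.  */
static int
callee_abi_id (const struct call_info *call)
{
  assert (call != NULL);
  if (call->decl_abi)
    return *call->decl_abi;
  if (call->type_abi)
    return *call->type_abi;
  if (call->hook_abi)
    return *call->hook_abi;
  return DEFAULT_ABI_ID;
}
```

The point of the middle step is the call through a pointer: there is no
fndecl, but the type of the called function still carries the
attribute, so it is consulted before falling back to the target.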

Accordingly I also implement some more target hooks for function-abi.
With that it's possible to also move the other ABI-influencing code
of i386 to function-abi (ms_abi and friends).  I have not done so for
this patch.

Regarding the names of the attributes: gah!  I've left them at
my mediocre attempts at names in order to hopefully get input on better
names :-)

I would welcome any comments, about the names, the approach, the attempt
at documenting the intricacies of these attributes and anything else.

FWIW, this particular patch was regstrapped on x86-64-linux
with trunk from a week ago (and sniff-tested on current trunk).


Ciao,
Michael.

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 37cb5a0dcc4..92358f4ac41 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -3244,6 +3244,16 @@ ix86_set_indirect_branch_type (tree fndecl)
     }
 }
 
+unsigned
+ix86_fntype_to_abi_id (const_tree fntype)
+{
+  if (lookup_attribute ("nosseclobber", TYPE_ATTRIBUTES (fntype)))
+    return ABI_LESS_SSE;
+  if (lookup_attribute ("noanysseclobber", TYPE_ATTRIBUTES (fntype)))
+    return ABI_NO_SSE;
+  return ABI_DEFAULT;
+}
+
 /* Establish appropriate back-end context for processing the function
    FNDECL.  The argument might be NULL to indicate processing at top
    level, outside of any function scope.  */
@@ -3311,6 +3321,12 @@ ix86_set_current_function (tree fndecl)
       else
 	TREE_TARGET_GLOBALS (new_tree) = save_target_globals_default_opts ();
     }
+
+  unsigned prev_abi_id = 0;
+  if (ix86_previous_fndecl)
+    prev_abi_id = ix86_fntype_to_abi_id (TREE_TYPE (ix86_previous_fndecl));
+  unsigned this_abi_id = ix86_fntype_to_abi_id (TREE_TYPE (fndecl));
+
   ix86_previous_fndecl = fndecl;
 
   static bool prev_no_caller_saved_registers;
@@ -3327,6 +3343,8 @@ ix86_set_current_function (tree fndecl)
   else if (prev_no_caller_saved_registers
 	   != cfun->machine->no_caller_saved_registers)
     reinit_regs ();
+  else if (prev_abi_id != this_abi_id)
+    reinit_regs ();
 
   if (cfun->machine->func_type != TYPE_NORMAL
       || cfun->machine->no_caller_saved_registers)
@@ -3940,6 +3958,10 @@ const struct attribute_spec ix86_attribute_table[] =
     ix86_handle_fndecl_attribute, NULL },
   { "nodirect_extern_access", 0, 0, true, false, false, false,
     handle_nodirect_extern_access_attribute, NULL },
+  { "nosseclobber",	      0, 0, false, true, true, true,
+    NULL, NULL },
+  { "noanysseclobber",	      0, 0, false, true, true, true,
+    NULL, NULL },
 
   /* End element.  */
   { NULL, 0, 0, false, false, false, false, NULL, NULL }
diff --git a/gcc/config/i386/i386-options.h b/gcc/config/i386/i386-options.h
index 68666067fea..ad39661d852 100644
--- a/gcc/config/i386/i386-options.h
+++ b/gcc/config/i386/i386-options.h
@@ -53,6 +53,7 @@ extern unsigned int ix86_incoming_stack_boundary;
 extern char *ix86_offload_options (void);
 extern void ix86_option_override (void);
 extern void ix86_override_options_after_change (void);
+unsigned ix86_fntype_to_abi_id (const_tree fntype);
 void ix86_set_current_function (tree fndecl);
 bool ix86_function_naked (const_tree fn);
 void ix86_simd_clone_adjust (struct cgraph_node *node);
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index f0d6167e667..01387a3c38b 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -487,6 +487,20 @@ ix86_conditional_register_usage (void)
   
   CLEAR_HARD_REG_SET (reg_class_contents[(int)CLOBBERED_REGS]);
 
+  /* If this function is one of the non-SSE-clobber variants, remove
+     those from the call_used_regs.  */
+  if (cfun && ix86_fntype_to_abi_id (TREE_TYPE (cfun->decl)) != ABI_DEFAULT)
+    {
+      for (i = XMM8_REG; i < XMM16_REG; i++)
+	call_used_regs[i] = 0;
+      if (ix86_fntype_to_abi_id (TREE_TYPE (cfun->decl)) == ABI_NO_SSE)
+	{
+	  /* And from any accessible regs if this is ABI_NO_SSE.  */
+	  for (i = XMM8_REG; i < XMM16_REG; i++)
+	    CLEAR_HARD_REG_BIT (accessible_reg_set, i);
+	}
+    }
+
   for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
     {
       /* Set/reset conditionally defined registers from
@@ -1119,6 +1133,8 @@ ix86_comp_type_attributes (const_tree type1, const_tree type2)
   if (ix86_function_regparm (type1, NULL)
       != ix86_function_regparm (type2, NULL))
     return 0;
+  if (ix86_fntype_to_abi_id (type1) != ix86_fntype_to_abi_id (type2))
+    return 0;
 
   return 1;
 }
@@ -1791,6 +1807,21 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* Argument info to initialize */
   cum->warn_sse = true;
   cum->warn_mmx = true;
 
+  if (ix86_fntype_to_abi_id (TREE_TYPE (cfun->decl)) == ABI_NO_SSE
+      && (!fntype
+	  || ix86_fntype_to_abi_id (fntype) != ABI_NO_SSE))
+    {
+      if (fndecl)
+	error ("%qD without attribute noanysseclobber cannot be "
+	       "called from functions with that attribute", fndecl);
+      else if (fntype)
+	error ("%qT without attribute noanysseclobber cannot be "
+	       "called from functions with that attribute", fntype);
+      else
+	error ("functions without attribute noanysseclobber cannot be "
+	       "called from functions with that attribute");
+    }
+
   /* Because type might mismatch in between caller and callee, we need to
      use actual type of function for local calls.
      FIXME: cgraph_analyze can be told to actually record if function uses
@@ -6514,7 +6545,7 @@ ix86_nsaved_sseregs (void)
   int nregs = 0;
   int regno;
 
-  if (!TARGET_64BIT_MS_ABI)
+  if (!TARGET_64BIT_MS_ABI && crtl->abi->id() == ABI_DEFAULT)
     return 0;
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
     if (SSE_REGNO_P (regno) && ix86_save_reg (regno, true, true))
@@ -20285,6 +20316,34 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
   return false;
 }
 
+/* Return the descriptor of an nosseclobber ABI_ID.  */
+
+static const predefined_function_abi &
+i386_less_sse_abi (unsigned abi_id)
+{
+  predefined_function_abi &myabi = function_abis[abi_id];
+  if (!myabi.initialized_p ())
+    {
+      HARD_REG_SET full_reg_clobbers
+	= default_function_abi.full_reg_clobbers ();
+      for (int regno = XMM8_REG; regno < XMM16_REG; regno++)
+	CLEAR_HARD_REG_BIT (full_reg_clobbers, regno);
+      myabi.initialize (abi_id, full_reg_clobbers);
+    }
+  return myabi;
+}
+
+/* Implement TARGET_FNTYPE_ABI.  */
+
+static const predefined_function_abi &
+i386_fntype_abi (const_tree fntype)
+{
+  unsigned abi_id = ix86_fntype_to_abi_id (fntype);
+  if (abi_id != ABI_DEFAULT)
+    return i386_less_sse_abi (abi_id);
+  return default_function_abi;
+}
+
 /* Implement TARGET_INSN_CALLEE_ABI.  */
 
 const predefined_function_abi &
@@ -20341,6 +20400,9 @@ ix86_hard_regno_call_part_clobbered (unsigned int abi_id, unsigned int regno,
 	      && ((TARGET_64BIT && REX_SSE_REGNO_P (regno))
 		  || LEGACY_SSE_REGNO_P (regno)));
 
+  if (abi_id == ABI_NO_SSE)
+    return false;
+
   return SSE_REGNO_P (regno) && GET_MODE_SIZE (mode) > 16;
 }
 
@@ -25594,6 +25656,9 @@ ix86_libgcc_floating_mode_supported_p
 #define TARGET_HARD_REGNO_CALL_PART_CLOBBERED \
   ix86_hard_regno_call_part_clobbered
 
+#undef TARGET_FNTYPE_ABI
+#define TARGET_FNTYPE_ABI i386_fntype_abi
+
 #undef TARGET_INSN_CALLEE_ABI
 #define TARGET_INSN_CALLEE_ABI ix86_insn_callee_abi
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 844deeae6cb..44d32ec2e4f 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -471,7 +471,9 @@
 (define_constants
   [(ABI_DEFAULT		0)
    (ABI_VZEROUPPER	1)
-   (ABI_UNKNOWN		2)])
+   (ABI_LESS_SSE    2)
+   (ABI_NO_SSE      3)
+   (ABI_UNKNOWN		4)])
 
 ;; Insns whose names begin with "x86_" are emitted by gen_FOO calls
 ;; from i386.cc.
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index d88fd75e06e..3adbbc75b1c 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -6680,6 +6680,41 @@ Exception handlers should only be used for exceptions that push an error
 code; you should use an interrupt handler in other cases.  The system
 will crash if the wrong kind of handler is used.
 
+@cindex @code{nosseclobber} function attribute, x86
+@cindex @code{noanysseclobber} function attribute, x86
+@item nosseclobber
+@itemx noanysseclobber
+
+On 32-bit and 64-bit x86 targets, you can use these attributes to indicate that
+a so-marked function doesn't clobber a subset of the SSE2 and AVX registers.
+The @code{nosseclobber} attribute specifies that registers @code{%xmm8} through
+@code{%xmm15} are not clobbered by a function.  This includes the low 16 bytes
+of the corresponding AVX2 and AVX512 registers.  You can't make assumptions
+about the higher parts of these registers, or other registers: those are
+assumed to be clobbered (or not) according to the base ABI.
+
+The @code{noanysseclobber} attribute specifies that the function doesn't
+clobber @emph{any} parts of the SSE2/AVX2/AVX512 registers @code{%zmm8}
+through @code{%zmm15}, not even the high parts.
+
+Functions marked with @code{nosseclobber} can be defined
+without restrictions: they can contain arbitrary floating point or vector
+code, and they can call functions not marked with this attribute (i.e. those
+that must be assumed to clobber parts of these registers).
+GCC will insert register saves and restores in the pro- and epilogue in
+those cases (only the low 16 bytes of the used registers will be
+saved/restored, like the attribute implies).
+
+In comparison, functions defined with @code{noanysseclobber} are severely
+restricted: they can't call functions not marked with that attribute.
+They also can't write to any of the @code{%xmm8} through @code{%xmm15}
+registers (or their extended variants with other ISAs).  GCC does not
+emit any saves or restores for them.
+
+Calls to such functions (other than above) are unrestricted.  The effect
+is simply that some values can be kept in registers over calls to
+such marked functions.
+
 @cindex @code{target} function attribute
 @item target (@var{options})
 As discussed in @ref{Common Function Attributes}, this attribute 
diff --git a/gcc/function-abi.cc b/gcc/function-abi.cc
index 2ab9b2c5649..efbe114218c 100644
--- a/gcc/function-abi.cc
+++ b/gcc/function-abi.cc
@@ -42,6 +42,26 @@ void
 predefined_function_abi::initialize (unsigned int id,
 				     const_hard_reg_set full_reg_clobbers)
 {
+  /* Don't reinitialize an ABI struct.  We might be called from reinit_regs
+     from the target's conditional_register_usage hook which might depend
+     on cfun and might have changed the global register sets according
+     to that function's ABI already.  That's not the default ABI anymore.
+
+     XXX only avoid this if we're reinitializing the default ABI, and the
+     current function is _not_ of the default ABI.  That's for
+     backward compatibility where some backends modify the regsets with
+     the exception that those changes are then reflected also in the default
+     ABI (which rather is then the "current" ABI).  E.g. x86_64 with the
+     ms_abi vs sysv attribute.  They aren't reflected by separate ABI
+     structs, but handled differently.  The "default" ABI hence changes
+     back and forth (and is expected to!) between a ms_abi and a sysv
+     function.  */
+  if (m_initialized
+      && id == 0
+      && cfun
+      && fndecl_abi (cfun->decl).base_abi ().id() != 0)
+    return;
+
   m_id = id;
   m_initialized = true;
   m_full_reg_clobbers = full_reg_clobbers;
@@ -224,6 +244,13 @@ insn_callee_abi (const rtx_insn *insn)
     if (tree fndecl = get_call_fndecl (insn))
       return fndecl_abi (fndecl);
 
+  if (rtx call = get_call_rtx_from (insn))
+    {
+      tree memexp = MEM_EXPR (XEXP (call, 0));
+      if (memexp)
+	return fntype_abi (TREE_TYPE (memexp));
+    }
+
   if (targetm.calls.insn_callee_abi)
     return targetm.calls.insn_callee_abi (insn);
 
diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-1.c b/gcc/testsuite/gcc.target/i386/sseclobber-1.c
new file mode 100644
index 00000000000..8758e2d3109
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sseclobber-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target sse2 } */
+/* { dg-options "-O1" } */
+/* { dg-final { scan-assembler-times {mm[89], [0-9]*\(%rsp\)} 2 } } */
+/* { dg-final { scan-assembler-times {mm1[0-5], [0-9]*\(%rsp\)} 6 } } */
+
+extern int nonsse (int) __attribute__((nosseclobber));
+extern int normalfunc (int);
+
+/* Demonstrate that all regs potentially clobbered by normal psABI
+   functions are saved/restored by otherabi functions.  */
+__attribute__((nosseclobber)) int nonsse (int i)
+{
+  return normalfunc (i + 2) + 3;
+}
diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-2.c b/gcc/testsuite/gcc.target/i386/sseclobber-2.c
new file mode 100644
index 00000000000..9abafa0a9ba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sseclobber-2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target sse2 } */
+/* { dg-options "-O1" } */
+/* { dg-final { scan-assembler-not {mm[0-9], [0-9]*\(%rsp\)} } } */
+
+extern int nonsse (int) __attribute__((nosseclobber));
+extern int othernonsse (int) __attribute__((nosseclobber));
+
+/* Demonstrate that calling a nosseclobber function from a nosseclobber
+   function does _not_ need to save all the regs (unlike in nonsse).  */
+__attribute__((nosseclobber)) int nonsse (int i)
+{
+  return othernonsse (i + 2) + 3;
+}
diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-3.c b/gcc/testsuite/gcc.target/i386/sseclobber-3.c
new file mode 100644
index 00000000000..276c7fd926b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sseclobber-3.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target sse2 } */
+/* { dg-options "-O1" } */
+/* for docalc2 we should use the high xmm regs */
+/* { dg-final { scan-assembler {xmm[89]} } } */
+/* for docalc4_notany we should use the high ymm regs */
+/* { dg-final { scan-assembler {ymm[89]} } } */
+/* for docalc4 (and nowhere else) we should save/restore exactly
+   one reg to stack around the inner-loop call */
+/* { dg-final { scan-assembler-times {ymm[0-9]*, [0-9]*\(%rsp\)} 1 } } */
+
+typedef double dbl2 __attribute__((vector_size(16)));
+typedef double dbl4 __attribute__((vector_size(32)));
+typedef double dbl8 __attribute__((vector_size(64)));
+extern __attribute__((nosseclobber,const)) double nonsse (int);
+
+/* Demonstrate that some values can be kept in a register over calls
+   to otherabi functions.  nonsse saves the XMM register, so those
+   are usable, hence docalc2 should be able to keep values in registers
+   over the nonsse call.  */
+void docalc2 (dbl2 *d, dbl2 *a, dbl2 *b, int n)
+{
+  long i;
+  for (i = 0; i < n; i++)
+    {
+      d[i] = a[i] * b[i] * nonsse(i);
+    }
+}
+
+/* Here we're using YMM registers (four doubles) and those are _not_
+   saved by nonsse() (only the XMM parts) so docalc4 should not keep
+   the value in a register over the call to nonsse.  */
+void __attribute__((target("avx2"))) docalc4 (dbl4 *d, dbl4 *a, dbl4 *b, int n)
+{
+  long i;
+  for (i = 0; i < n; i++)
+    {
+      d[i] = a[i] * b[i] * nonsse(i);
+    }
+}
+
+/* And here we're also using YMM registers, but have a call to a
+   noanysseclobber function, which _does_ save all [XYZ]MM regs except
+   arguments, so docalc4_notany should again be able to keep the value
+   in a register.  */
+extern __attribute__((noanysseclobber,const)) double notanysse (int);
+void __attribute__((target("avx2"))) docalc4_notany (dbl4 *d, dbl4 *a, dbl4 *b, int n)
+{
+  long i;
+  for (i = 0; i < n; i++)
+    {
+      d[i] = a[i] * b[i] * notanysse(i);
+    }
+}
diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-4.c b/gcc/testsuite/gcc.target/i386/sseclobber-4.c
new file mode 100644
index 00000000000..734f25068f0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sseclobber-4.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target sse2 } */
+/* { dg-options "-O1" } */
+/* { dg-final { scan-assembler-not {mm[0-9], [0-9]*\(%rsp\)} } } */
+
+extern __attribute__((nosseclobber)) int (*nonsse_ptr) (int);
+
+/* Demonstrate that some values can be kept in a register over calls
+   to otherabi functions when called via function pointer.  */
+double docalc (double d)
+{
+  double ret = d;
+  int i = 0;
+  while (1) {
+      int j = nonsse_ptr (i++);
+      if (!j)
+        break;
+      ret += j;
+  }
+  return ret;
+}
diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-5.c b/gcc/testsuite/gcc.target/i386/sseclobber-5.c
new file mode 100644
index 00000000000..1869ae06148
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sseclobber-5.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target sse2 } */
+/* { dg-options "-O1" } */
+/* { dg-final { scan-assembler-not {mm[89]} } } */
+/* { dg-final { scan-assembler-not {mm1[0-5]} } } */
+
+extern int noanysse (int) __attribute__((noanysseclobber));
+extern int noanysse2 (int) __attribute__((noanysseclobber));
+extern __attribute__((noanysseclobber)) double calcstuff (double, double);
+
+/* Demonstrate that none of the clobbered SSE (or wider) regs are
+   used by a noanysse function.  */
+__attribute__((noanysseclobber)) double calcstuff (double d, double e)
+{
+  double s1, s2, s3, s4, s5, s6, s7, s8;
+  s1 = s2 = s3 = s4 = s5 = s6 = s7 = s8 = 0.0;
+  while (d > 0.1)
+    {
+      s1 += s2 * 2 + d;
+      s2 += s3 * 3 + e;
+      s3 += s4 * 5 + d * e;
+      s4 += e / d;
+      s5 += s2 * 7 + d - e;
+      s5 += 2 * d + e;
+      s6 += 5 * e + d;
+      s7 += 7 * e * (d+1);
+      d -= e;
+    }
+  return s1 + s2 + s3 + s4 + s5 + s6 + s7;
+}
+
+/* Demonstrate that we can call noanysse functions from noanysse
+   functions.  */
+__attribute__((noanysseclobber)) int noanysse2 (int i)
+{
+  return noanysse (i + 2) + 3;
+}
diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-6.c b/gcc/testsuite/gcc.target/i386/sseclobber-6.c
new file mode 100644
index 00000000000..89ece11c9f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sseclobber-6.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target sse2 } */
+/* { dg-options "-O1" } */
+
+/* Various ways of invalid usage of the nosse attributes.  */
+extern __attribute__((nosseclobber)) int nonfndecl; /* { dg-warning "only applies to function types" } */
+
+extern int normalfunc (int);
+__attribute__((nosseclobber)) int (*nonsse_ptr) (int) = normalfunc; /* { dg-warning "from incompatible pointer type" } */
+
+extern int noanysse (int) __attribute__((noanysseclobber));
+/* Demonstrate that it's not allowed to call any functions that
+   aren't noanysse from noanysse functions.  */
+__attribute__((noanysseclobber)) int noanysse (int i)
+{
+  return normalfunc (i + 2) + 3; /* { dg-error "cannot be called from function" } */
+}


* Re: [x86-64] RFC: Add nosse abi attribute
  2023-07-10 15:55 [x86-64] RFC: Add nosse abi attribute Michael Matz
@ 2023-07-10 17:28 ` Richard Biener
  2023-07-10 19:07 ` Alexander Monakov
  2023-07-17 23:00 ` Richard Sandiford
  2 siblings, 0 replies; 16+ messages in thread
From: Richard Biener @ 2023-07-10 17:28 UTC
  To: Michael Matz; +Cc: gcc-patches, Jan Hubicka



> On 10.07.2023 at 17:56, Michael Matz via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> Hello,
> 
> the ELF psABI for x86-64 doesn't have any callee-saved SSE
> registers (there were actual reasons for that, but those don't
> matter anymore).  This starts to hurt some uses, as it means that
> as soon as you have a call (say to memmove/memcpy, even if
> implicit as libcall) in a loop that manipulates floating point
> or vector data you get saves/restores around those calls.
> 
> But in reality many functions can be written such that they only need
> to clobber a subset of the 16 XMM registers (or do the save/restore
> themselves in the code paths that need them, hello memcpy again).
> So we want to introduce a way to specify this, via an ABI attribute
> that basically says "doesn't clobber the high XMM regs".
> 
> I've opted to do only the obvious: do something special only for
> xmm8 to xmm15, without a way to specify the clobber set in more detail.
> I think such a half/half split is reasonable, and as I don't want to
> change the argument passing (whose regs are always clobbered)
> there isn't that much wiggle room anyway.

What about xmm16 to xmm31 which AVX512 adds and any possible future additions to the register file?  (I suppose the any variant also covers zmm - and also future widened variants?). What about AVX512 mask registers?

> I chose to make it possible to write function definitions with that
> attribute with GCC adding the necessary callee save/restore code in
> the xlogue itself.  Carefully note that this is only possible for
> the SSE2 parts of the registers (the low 128 bits), as saving the other
> parts would need instructions that are only optionally available.
> When a function doesn't contain calls to unknown functions we can be a
> bit more lenient: we can make it so that GCC simply doesn't touch
> xmm8-15 at all, then no save/restore is necessary.  If a function
> contains calls then GCC can't know which parts of the XMM regset are
> clobbered by them; they may be parts which don't even exist yet (say
> until avx2048 comes out), so we must restrict ourselves to only
> save/restore the SSE2 parts and then of course can only claim to not
> clobber those parts.
> 
> To that end I actually introduce two related attributes (for naming
> see below):
> * nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered
> * noanysseclobber: claims (and ensures) that nothing of any of the
>  registers overlapping xmm8-15 is clobbered (not even future, as of
>  yet unknown, parts)
> 
> Ensuring the first is simple: potentially add saves/restores in the xlogue
> (e.g. when xmm8 is used either explicitly or implicitly by a call).
> Ensuring the second comes with more: we must also ensure that no
> functions are called that don't guarantee the same thing (in addition
> to just removing all xmm8-15 parts altogether from the available
> registers).
> 
> See also the added testcases for what I intended to support.
> 
> I chose to use the new target-independent function-abi facility for
> this.  I need some adjustments in generic code:
> * the "default_abi" is actually more like a "current" abi: it happily
>  changes its contents according to conditional_register_usage,
>  and other code assumes that such changes do propagate.
>  But if that conditional_register_usage is actually done because the
>  current function is of a different ABI, then we must not change
>  default_abi.
> * in insn_callee_abi we do look at a potential fndecl for a call
>  insn (only set when -fipa-ra), but that doesn't work for calls through
>  pointers and (as said) is optional.  So, also always look at the
>  called function's type (it's always recorded in the MEM_EXPR for
>  non-libcalls), before asking the target.
>  (The function-abi accessors working on trees were already doing that,
>  it's just the RTL accessor that missed this)
> 
> Accordingly I also implement some more target hooks for function-abi.
> With that it's possible to also move the other ABI-influencing code
> of i386 to function-abi (ms_abi and friends).  I have not done so for
> this patch.
> 
> Regarding the names of the attributes: gah!  I've left them at
> my mediocre attempts at names in order to hopefully get input on better
> names :-)
> 
> I would welcome any comments, about the names, the approach, the attempt
> at documenting the intricacies of these attributes and anything else.
> 
> FWIW, this particular patch was regstrapped on x86-64-linux
> with trunk from a week ago (and sniff-tested on current trunk).
> 
> 
> Ciao,
> Michael.
> 
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index 37cb5a0dcc4..92358f4ac41 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -3244,6 +3244,16 @@ ix86_set_indirect_branch_type (tree fndecl)
>     }
> }
> 
> +unsigned
> +ix86_fntype_to_abi_id (const_tree fntype)
> +{
> +  if (lookup_attribute ("nosseclobber", TYPE_ATTRIBUTES (fntype)))
> +    return ABI_LESS_SSE;
> +  if (lookup_attribute ("noanysseclobber", TYPE_ATTRIBUTES (fntype)))
> +    return ABI_NO_SSE;
> +  return ABI_DEFAULT;
> +}
> +
> /* Establish appropriate back-end context for processing the function
>    FNDECL.  The argument might be NULL to indicate processing at top
>    level, outside of any function scope.  */
> @@ -3311,6 +3321,12 @@ ix86_set_current_function (tree fndecl)
>       else
>    TREE_TARGET_GLOBALS (new_tree) = save_target_globals_default_opts ();
>     }
> +
> +  unsigned prev_abi_id = 0;
> +  if (ix86_previous_fndecl)
> +    prev_abi_id = ix86_fntype_to_abi_id (TREE_TYPE (ix86_previous_fndecl));
> +  unsigned this_abi_id = ix86_fntype_to_abi_id (TREE_TYPE (fndecl));
> +
>   ix86_previous_fndecl = fndecl;
> 
>   static bool prev_no_caller_saved_registers;
> @@ -3327,6 +3343,8 @@ ix86_set_current_function (tree fndecl)
>   else if (prev_no_caller_saved_registers
>       != cfun->machine->no_caller_saved_registers)
>     reinit_regs ();
> +  else if (prev_abi_id != this_abi_id)
> +    reinit_regs ();
> 
>   if (cfun->machine->func_type != TYPE_NORMAL
>       || cfun->machine->no_caller_saved_registers)
> @@ -3940,6 +3958,10 @@ const struct attribute_spec ix86_attribute_table[] =
>     ix86_handle_fndecl_attribute, NULL },
>   { "nodirect_extern_access", 0, 0, true, false, false, false,
>     handle_nodirect_extern_access_attribute, NULL },
> +  { "nosseclobber",          0, 0, false, true, true, true,
> +    NULL, NULL },
> +  { "noanysseclobber",          0, 0, false, true, true, true,
> +    NULL, NULL },
> 
>   /* End element.  */
>   { NULL, 0, 0, false, false, false, false, NULL, NULL }
> diff --git a/gcc/config/i386/i386-options.h b/gcc/config/i386/i386-options.h
> index 68666067fea..ad39661d852 100644
> --- a/gcc/config/i386/i386-options.h
> +++ b/gcc/config/i386/i386-options.h
> @@ -53,6 +53,7 @@ extern unsigned int ix86_incoming_stack_boundary;
> extern char *ix86_offload_options (void);
> extern void ix86_option_override (void);
> extern void ix86_override_options_after_change (void);
> +unsigned ix86_fntype_to_abi_id (const_tree fntype);
> void ix86_set_current_function (tree fndecl);
> bool ix86_function_naked (const_tree fn);
> void ix86_simd_clone_adjust (struct cgraph_node *node);
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index f0d6167e667..01387a3c38b 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -487,6 +487,20 @@ ix86_conditional_register_usage (void)
> 
>   CLEAR_HARD_REG_SET (reg_class_contents[(int)CLOBBERED_REGS]);
> 
> +  /* If this function is one of the non-SSE-clobber variants, remove
> +     those from the call_used_regs.  */
> +  if (cfun && ix86_fntype_to_abi_id (TREE_TYPE (cfun->decl)) != ABI_DEFAULT)
> +    {
> +      for (i = XMM8_REG; i < XMM16_REG; i++)
> +    call_used_regs[i] = 0;
> +      if (ix86_fntype_to_abi_id (TREE_TYPE (cfun->decl)) == ABI_NO_SSE)
> +    {
> +      /* And from any accessible regs if this is ABI_NO_SSE.  */
> +      for (i = XMM8_REG; i < XMM16_REG; i++)
> +        CLEAR_HARD_REG_BIT (accessible_reg_set, i);
> +    }
> +    }
> +
>   for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
>     {
>       /* Set/reset conditionally defined registers from
> @@ -1119,6 +1133,8 @@ ix86_comp_type_attributes (const_tree type1, const_tree type2)
>   if (ix86_function_regparm (type1, NULL)
>       != ix86_function_regparm (type2, NULL))
>     return 0;
> +  if (ix86_fntype_to_abi_id (type1) != ix86_fntype_to_abi_id (type2))
> +    return 0;
> 
>   return 1;
> }
> @@ -1791,6 +1807,21 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* Argument info to initialize */
>   cum->warn_sse = true;
>   cum->warn_mmx = true;
> 
> +  if (ix86_fntype_to_abi_id (TREE_TYPE (cfun->decl)) == ABI_NO_SSE
> +      && (!fntype
> +      || ix86_fntype_to_abi_id (fntype) != ABI_NO_SSE))
> +    {
> +      if (fndecl)
> +    error ("%qD without attribute noanysseclobber cannot be "
> +           "called from functions with that attribute", fndecl);
> +      else if (fntype)
> +    error ("%qT without attribute noanysseclobber cannot be "
> +           "called from functions with that attribute", fntype);
> +      else
> +    error ("functions without attribute noanysseclobber cannot be "
> +           "called from functions with that attribute");
> +    }
> +
>   /* Because type might mismatch in between caller and callee, we need to
>      use actual type of function for local calls.
>      FIXME: cgraph_analyze can be told to actually record if function uses
> @@ -6514,7 +6545,7 @@ ix86_nsaved_sseregs (void)
>   int nregs = 0;
>   int regno;
> 
> -  if (!TARGET_64BIT_MS_ABI)
> +  if (!TARGET_64BIT_MS_ABI && crtl->abi->id() == ABI_DEFAULT)
>     return 0;
>   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>     if (SSE_REGNO_P (regno) && ix86_save_reg (regno, true, true))
> @@ -20285,6 +20316,34 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
>   return false;
> }
> 
> +/* Return the descriptor of a nosseclobber ABI_ID.  */
> +
> +static const predefined_function_abi &
> +i386_less_sse_abi (unsigned abi_id)
> +{
> +  predefined_function_abi &myabi = function_abis[abi_id];
> +  if (!myabi.initialized_p ())
> +    {
> +      HARD_REG_SET full_reg_clobbers
> +    = default_function_abi.full_reg_clobbers ();
> +      for (int regno = XMM8_REG; regno < XMM16_REG; regno++)
> +    CLEAR_HARD_REG_BIT (full_reg_clobbers, regno);
> +      myabi.initialize (abi_id, full_reg_clobbers);
> +    }
> +  return myabi;
> +}
> +
> +/* Implement TARGET_FNTYPE_ABI.  */
> +
> +static const predefined_function_abi &
> +i386_fntype_abi (const_tree fntype)
> +{
> +  unsigned abi_id = ix86_fntype_to_abi_id (fntype);
> +  if (abi_id != ABI_DEFAULT)
> +    return i386_less_sse_abi (abi_id);
> +  return default_function_abi;
> +}
> +
> /* Implement TARGET_INSN_CALLEE_ABI.  */
> 
> const predefined_function_abi &
> @@ -20341,6 +20400,9 @@ ix86_hard_regno_call_part_clobbered (unsigned int abi_id, unsigned int regno,
>          && ((TARGET_64BIT && REX_SSE_REGNO_P (regno))
>          || LEGACY_SSE_REGNO_P (regno)));
> 
> +  if (abi_id == ABI_NO_SSE)
> +    return false;
> +
>   return SSE_REGNO_P (regno) && GET_MODE_SIZE (mode) > 16;
> }
> 
> @@ -25594,6 +25656,9 @@ ix86_libgcc_floating_mode_supported_p
> #define TARGET_HARD_REGNO_CALL_PART_CLOBBERED \
>   ix86_hard_regno_call_part_clobbered
> 
> +#undef TARGET_FNTYPE_ABI
> +#define TARGET_FNTYPE_ABI i386_fntype_abi
> +
> #undef TARGET_INSN_CALLEE_ABI
> #define TARGET_INSN_CALLEE_ABI ix86_insn_callee_abi
> 
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 844deeae6cb..44d32ec2e4f 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -471,7 +471,9 @@
> (define_constants
>   [(ABI_DEFAULT        0)
>    (ABI_VZEROUPPER    1)
> -   (ABI_UNKNOWN        2)])
> +   (ABI_LESS_SSE    2)
> +   (ABI_NO_SSE      3)
> +   (ABI_UNKNOWN        4)])
> 
> ;; Insns whose names begin with "x86_" are emitted by gen_FOO calls
> ;; from i386.cc.
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index d88fd75e06e..3adbbc75b1c 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -6680,6 +6680,41 @@ Exception handlers should only be used for exceptions that push an error
> code; you should use an interrupt handler in other cases.  The system
> will crash if the wrong kind of handler is used.
> 
> +@cindex @code{nosseclobber} function attribute, x86
> +@cindex @code{noanysseclobber} function attribute, x86
> +@item nosseclobber
> +@itemx noanysseclobber
> +
> +On 32-bit and 64-bit x86 targets, you can use these attributes to indicate that
> +a so-marked function doesn't clobber a subset of the SSE2 and AVX registers.
> +The @code{nosseclobber} attribute specifies that registers @code{%xmm8} through
> +@code{%xmm15} are not clobbered by a function.  This includes the low 16 bytes
> +of the corresponding AVX2 and AVX512 registers.  You can't make assumptions
> +about the higher parts of these registers, or other registers: those are
> +assumed to be clobbered (or not) according to the base ABI.
> +
> +The @code{noanysseclobber} attribute specifies that the function doesn't
> +clobber @emph{any} parts of the SSE2/AVX2/AVX512 registers @code{%zmm8}
> +through @code{%zmm15}, not even the high parts.
> +
> +Functions marked with @code{nosseclobber} can be defined
> +without restrictions: they can contain arbitrary floating point or vector
> +code, and they can call functions not marked with this attribute (i.e. those
> +that must be assumed to clobber parts of these registers).
> +GCC will insert register saves and restores in the pro- and epilogue in
> +those cases (only the low 16 bytes of the used registers will be
> +saved/restored, like the attribute implies).
> +
> +In comparison, functions defined with @code{noanysseclobber} are severely
> +restricted: they can't call functions not marked with that attribute.
> +They also can't write to any of the @code{%xmm8} through @code{%xmm15}
> +registers (or their extended variants with other ISAs).  GCC does not
> +emit any saves or restores for them.
> +
> +Calls to such functions (other than above) are unrestricted.  The effect
> +is simply that some values can be kept in registers over calls to
> +such marked functions.
> +
> @cindex @code{target} function attribute
> @item target (@var{options})
> As discussed in @ref{Common Function Attributes}, this attribute 
> diff --git a/gcc/function-abi.cc b/gcc/function-abi.cc
> index 2ab9b2c5649..efbe114218c 100644
> --- a/gcc/function-abi.cc
> +++ b/gcc/function-abi.cc
> @@ -42,6 +42,26 @@ void
> predefined_function_abi::initialize (unsigned int id,
>                     const_hard_reg_set full_reg_clobbers)
> {
> +  /* Don't reinitialize an ABI struct.  We might be called from reinit_regs
> +     from the target's conditional_register_usage hook which might depend
> +     on cfun and might have changed the global register sets according
> +     to that function's ABI already.  That's not the default ABI anymore.
> +
> +     XXX only avoid this if we're reinitializing the default ABI, and the
> +     current function is _not_ of the default ABI.  That's for
> +     backward compatibility where some backends modify the regsets with
> +     the exception that those changes are then reflected also in the default
> +     ABI (which rather is then the "current" ABI).  E.g. x86_64 with the
> +     ms_abi vs sysv attribute.  They aren't reflected by separate ABI
> +     structs, but handled differently.  The "default" ABI hence changes
> +     back and forth (and is expected to!) between a ms_abi and a sysv
> +     function.  */
> +  if (m_initialized
> +      && id == 0
> +      && cfun
> +      && fndecl_abi (cfun->decl).base_abi ().id() != 0)
> +    return;
> +
>   m_id = id;
>   m_initialized = true;
>   m_full_reg_clobbers = full_reg_clobbers;
> @@ -224,6 +244,13 @@ insn_callee_abi (const rtx_insn *insn)
>     if (tree fndecl = get_call_fndecl (insn))
>       return fndecl_abi (fndecl);
> 
> +  if (rtx call = get_call_rtx_from (insn))
> +    {
> +      tree memexp = MEM_EXPR (XEXP (call, 0));
> +      if (memexp)
> +    return fntype_abi (TREE_TYPE (memexp));
> +    }
> +
>   if (targetm.calls.insn_callee_abi)
>     return targetm.calls.insn_callee_abi (insn);
> 
> diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-1.c b/gcc/testsuite/gcc.target/i386/sseclobber-1.c
> new file mode 100644
> index 00000000000..8758e2d3109
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sseclobber-1.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target sse2 } */
> +/* { dg-options "-O1" } */
> +/* { dg-final { scan-assembler-times {mm[89], [0-9]*\(%rsp\)} 2 } } */
> +/* { dg-final { scan-assembler-times {mm1[0-5], [0-9]*\(%rsp\)} 6 } } */
> +
> +extern int nonsse (int) __attribute__((nosseclobber));
> +extern int normalfunc (int);
> +
> +/* Demonstrate that all regs potentially clobbered by normal psABI
> +   functions are saved/restored by otherabi functions.  */
> +__attribute__((nosseclobber)) int nonsse (int i)
> +{
> +  return normalfunc (i + 2) + 3;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-2.c b/gcc/testsuite/gcc.target/i386/sseclobber-2.c
> new file mode 100644
> index 00000000000..9abafa0a9ba
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sseclobber-2.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target sse2 } */
> +/* { dg-options "-O1" } */
> +/* { dg-final { scan-assembler-not {mm[0-9], [0-9]*\(%rsp\)} } } */
> +
> +extern int nonsse (int) __attribute__((nosseclobber));
> +extern int othernonsse (int) __attribute__((nosseclobber));
> +
> +/* Demonstrate that calling a nosseclobber function from a nosseclobber
> +   function does _not_ need to save all the regs (unlike in nonsse).  */
> +__attribute__((nosseclobber)) int nonsse (int i)
> +{
> +  return othernonsse (i + 2) + 3;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-3.c b/gcc/testsuite/gcc.target/i386/sseclobber-3.c
> new file mode 100644
> index 00000000000..276c7fd926b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sseclobber-3.c
> @@ -0,0 +1,54 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target sse2 } */
> +/* { dg-options "-O1" } */
> +/* for docalc2 we should use the high xmm regs */
> +/* { dg-final { scan-assembler {xmm[89]} } } */
> +/* for docalc4_notany we should use the high ymm regs */
> +/* { dg-final { scan-assembler {ymm[89]} } } */
> +/* for docalc4 (and nowhere else) we should save/restore exactly
> +   one reg to stack around the inner-loop call */
> +/* { dg-final { scan-assembler-times {ymm[0-9]*, [0-9]*\(%rsp\)} 1 } } */
> +
> +typedef double dbl2 __attribute__((vector_size(16)));
> +typedef double dbl4 __attribute__((vector_size(32)));
> +typedef double dbl8 __attribute__((vector_size(64)));
> +extern __attribute__((nosseclobber,const)) double nonsse (int);
> +
> +/* Demonstrate that some values can be kept in a register over calls
> +   to otherabi functions.  nonsse saves the XMM register, so those
> +   are usable, hence docalc2 should be able to keep values in registers
> +   over the nonsse call.  */
> +void docalc2 (dbl2 *d, dbl2 *a, dbl2 *b, int n)
> +{
> +  long i;
> +  for (i = 0; i < n; i++)
> +    {
> +      d[i] = a[i] * b[i] * nonsse(i);
> +    }
> +}
> +
> +/* Here we're using YMM registers (four doubles) and those are _not_
> +   saved by nonsse() (only the XMM parts) so docalc4 should not keep
> +   the value in a register over the call to nonsse.  */
> +void __attribute__((target("avx2"))) docalc4 (dbl4 *d, dbl4 *a, dbl4 *b, int n)
> +{
> +  long i;
> +  for (i = 0; i < n; i++)
> +    {
> +      d[i] = a[i] * b[i] * nonsse(i);
> +    }
> +}
> +
> +/* And here we're also using YMM registers, but have a call to a
> +   noanysseclobber function, which _does_ save all [XYZ]MM regs except
> +   arguments, so docalc4_notany should again be able to keep the value
> +   in a register.  */
> +extern __attribute__((noanysseclobber,const)) double notanysse (int);
> +void __attribute__((target("avx2"))) docalc4_notany (dbl4 *d, dbl4 *a, dbl4 *b, int n)
> +{
> +  long i;
> +  for (i = 0; i < n; i++)
> +    {
> +      d[i] = a[i] * b[i] * notanysse(i);
> +    }
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-4.c b/gcc/testsuite/gcc.target/i386/sseclobber-4.c
> new file mode 100644
> index 00000000000..734f25068f0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sseclobber-4.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target sse2 } */
> +/* { dg-options "-O1" } */
> +/* { dg-final { scan-assembler-not {mm[0-9], [0-9]*\(%rsp\)} } } */
> +
> +extern __attribute__((nosseclobber)) int (*nonsse_ptr) (int);
> +
> +/* Demonstrate that some values can be kept in a register over calls
> +   to otherabi functions when called via function pointer.  */
> +double docalc (double d)
> +{
> +  double ret = d;
> +  int i = 0;
> +  while (1) {
> +      int j = nonsse_ptr (i++);
> +      if (!j)
> +        break;
> +      ret += j;
> +  }
> +  return ret;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-5.c b/gcc/testsuite/gcc.target/i386/sseclobber-5.c
> new file mode 100644
> index 00000000000..1869ae06148
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sseclobber-5.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target sse2 } */
> +/* { dg-options "-O1" } */
> +/* { dg-final { scan-assembler-not {mm[89]} } } */
> +/* { dg-final { scan-assembler-not {mm1[0-5]} } } */
> +
> +extern int noanysse (int) __attribute__((noanysseclobber));
> +extern int noanysse2 (int) __attribute__((noanysseclobber));
> +extern __attribute__((noanysseclobber)) double calcstuff (double, double);
> +
> +/* Demonstrate that none of the clobbered SSE (or wider) regs are
> +   used by a noanysse function.  */
> +__attribute__((noanysseclobber)) double calcstuff (double d, double e)
> +{
> +  double s1, s2, s3, s4, s5, s6, s7, s8;
> +  s1 = s2 = s3 = s4 = s5 = s6 = s7 = s8 = 0.0;
> +  while (d > 0.1)
> +    {
> +      s1 += s2 * 2 + d;
> +      s2 += s3 * 3 + e;
> +      s3 += s4 * 5 + d * e;
> +      s4 += e / d;
> +      s5 += s2 * 7 + d - e;
> +      s5 += 2 * d + e;
> +      s6 += 5 * e + d;
> +      s7 += 7 * e * (d+1);
> +      d -= e;
> +    }
> +  return s1 + s2 + s3 + s4 + s5 + s6 + s7;
> +}
> +
> +/* Demonstrate that we can call noanysse functions from noanysse
> +   functions.  */
> +__attribute__((noanysseclobber)) int noanysse2 (int i)
> +{
> +  return noanysse (i + 2) + 3;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-6.c b/gcc/testsuite/gcc.target/i386/sseclobber-6.c
> new file mode 100644
> index 00000000000..89ece11c9f2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sseclobber-6.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target sse2 } */
> +/* { dg-options "-O1" } */
> +
> +/* Various ways of invalid usage of the nosse attributes.  */
> +extern __attribute__((nosseclobber)) int nonfndecl; /* { dg-warning "only applies to function types" } */
> +
> +extern int normalfunc (int);
> +__attribute__((nosseclobber)) int (*nonsse_ptr) (int) = normalfunc; /* { dg-warning "from incompatible pointer type" } */
> +
> +extern int noanysse (int) __attribute__((noanysseclobber));
> +/* Demonstrate that it's not allowed to call any functions that
> +   aren't noanysse from noanysse functions.  */
> +__attribute__((noanysseclobber)) int noanysse (int i)
> +{
> +  return normalfunc (i + 2) + 3; /* { dg-error "cannot be called from function" } */
> +}

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [x86-64] RFC: Add nosse abi attribute
  2023-07-10 15:55 [x86-64] RFC: Add nosse abi attribute Michael Matz
  2023-07-10 17:28 ` Richard Biener
@ 2023-07-10 19:07 ` Alexander Monakov
  2023-07-10 20:33   ` Alexander Monakov
                     ` (3 more replies)
  2023-07-17 23:00 ` Richard Sandiford
  2 siblings, 4 replies; 16+ messages in thread
From: Alexander Monakov @ 2023-07-10 19:07 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc-patches, Jan Hubicka



On Mon, 10 Jul 2023, Michael Matz via Gcc-patches wrote:

> Hello,
> 
> the ELF psABI for x86-64 doesn't have any callee-saved SSE
> registers (there were actual reasons for that, but those don't
> matter anymore).  This starts to hurt some uses, as it means that
> as soon as you have a call (say to memmove/memcpy, even if
> implicit as libcall) in a loop that manipulates floating point
> or vector data you get saves/restores around those calls.
> 
> But in reality many functions can be written such that they only need
> to clobber a subset of the 16 XMM registers (or do the save/restore
> themselves in the code paths that need them, hello memcpy again).
> So we want to introduce a way to specify this, via an ABI attribute
> that basically says "doesn't clobber the high XMM regs".

I think the main question is why you're going with this (weak) form
instead of the (strong) form "may only clobber the low XMM regs":
as Richi noted, surely for libcalls we'd like to know they preserve
AVX-512 mask registers as well?

(I realize this is partially answered later)

Note this interacts with anything that interposes between the caller
and the callee, like the Glibc lazy binding stub (which used to
zero out high halves of 512-bit arguments in ZMM registers).
Not an immediate problem for the patch, just something to mind perhaps.

> I've opted to do only the obvious: do something special only for
> xmm8 to xmm15, without a way to specify the clobber set in more detail.
> I think such half/half split is reasonable, and as I don't want to
> change the argument passing anyway (whose regs are always clobbered)
> there isn't that much wiggle room anyway.
> 
> I chose to make it possible to write function definitions with that
> attribute with GCC adding the necessary callee save/restore code in
> the xlogue itself.

But you can't trivially restore if the callee is sibcalling — what
happens then (a testcase might be nice)?

> Carefully note that this is only possible for
> the SSE2 registers, as other parts of them would need instructions
> that are only optional.

What is supposed to happen on 32-bit x86 with -msse -mno-sse2?

> When a function doesn't contain calls to
> unknown functions we can be a bit more lenient: we can make it so that
> GCC simply doesn't touch xmm8-15 at all, then no save/restore is
> necessary.

What if the source code has a local register variable bound to xmm15,
i.e. register double x asm("xmm15"); asm("..." : "+x"(x)); ?
Probably "don't do that", i.e. disallow that in the documentation?
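
To make the case concrete, here is a minimal sketch of the pattern I mean
(my own illustration, not code from the patch; the names bump and
bump_selfcheck are made up).  It pins a value to xmm15 with a local
register variable and modifies it in inline asm.  The attribute itself is
deliberately left off, since it exists only in this patch; the point is
that under noanysseclobber such a function would write a register it
promises never to touch.  On targets other than x86-64 the sketch falls
back to plain C so that it still compiles.

```c
#include <assert.h>

/* Hypothetical illustration; `bump` is not part of the patch.  */
#if defined(__x86_64__) && defined(__GNUC__)
double bump(double v)
{
    /* Pin x to xmm15, one of the registers the attributes protect.  */
    register double x __asm__("xmm15") = v;
    /* x = x + x, which forces an explicit write to xmm15.  */
    __asm__("addsd %0, %0" : "+x"(x));
    return x;
}
#else
/* Fallback so the sketch compiles on other targets.  */
double bump(double v)
{
    return v + v;
}
#endif

/* Usage: doubling 1.5 is exact in binary floating point.  */
void bump_selfcheck(void)
{
    assert(bump(1.5) == 3.0);
}
```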

> If a function contains calls then GCC can't know which
> parts of the XMM regset are clobbered by that, it may be parts
> which don't even exist yet (say until avx2048 comes out), so we must
> restrict ourselves to only save/restore the SSE2 parts and then of course
> can only claim to not clobber those parts.

Hm, I guess this is kinda the reason a "weak" form is needed. But this
highlights the difference between the two: the "weak" form will actively
preserve some state (so it cannot preserve future extensions), while
the "strong" form may just passively not touch any state, preserving
any state it doesn't know about.
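
The difference can be written down as a toy model (a sketch under my own
assumptions, not GCC code; toy_abi, low_part_clobbers and value_clobbered
are invented names).  It follows the shape of the patch's
ix86_hard_regno_call_part_clobbered, which returns false for ABI_NO_SSE
and otherwise treats SSE values wider than 16 bytes as part-clobbered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one bit per register, bit n stands for xmm<n>.  */
enum { NUM_XMM = 16 };

enum toy_abi { TOY_DEFAULT, TOY_WEAK, TOY_STRONG };

/* Registers whose low 16 bytes a call may clobber under each ABI.  */
uint16_t low_part_clobbers(enum toy_abi abi)
{
    const uint16_t all = 0xFFFF;
    const uint16_t high = 0xFF00;   /* xmm8..xmm15 */
    return abi == TOY_DEFAULT ? all : (uint16_t)(all & ~high);
}

/* May a value of `bytes` width held in xmm<regno> be lost over a call?  */
bool value_clobbered(enum toy_abi abi, int regno, int bytes)
{
    assert(regno >= 0 && regno < NUM_XMM);
    if (low_part_clobbers(abi) & (1u << regno))
        return true;        /* the whole register is call-clobbered */
    if (abi == TOY_STRONG)
        return false;       /* passive: the register is never touched */
    return bytes > 16;      /* active: only the low 16 bytes are saved */
}
```

So a 32-byte value in ymm8 survives a call only under the strong form,
while a 16-byte value in xmm8 survives under both.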

> To that end I introduce actually two related attributes (for naming
> see below):
> * nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered

This is the weak/active form; I'd suggest "preserve_high_sse".

> * noanysseclobber: claims (and ensures) that nothing of any of the
>   registers overlapping xmm8-15 is clobbered (not even future, as of
>   yet unknown, parts)

This is the strong/passive form; I'd suggest "only_low_sse".

> Ensuring the first is simple: potentially add saves/restore in xlogue
> (e.g. when xmm8 is either used explicitly or implicitly by a call).
> Ensuring the second comes with more: we must also ensure that no
> functions are called that don't guarantee the same thing (in addition
> to just removing all xmm8-15 parts altogether from the available
> registers).
> 
> See also the added testcases for what I intended to support.
> 
> I chose to use the new target-independent function-abi facility for
> this.  I need some adjustments in generic code:
> * the "default_abi" is actually more like a "current" abi: it happily
>   changes its contents according to conditional_register_usage,
>   and other code assumes that such changes do propagate.
>   But if that conditional_register_usage is actually done because the
>   current function is of a different ABI, then we must not change
>   default_abi.
> * in insn_callee_abi we do look at a potential fndecl for a call
>   insn (only set when -fipa-ra), but that doesn't work for calls through
>   pointers and (as said) is optional.  So, also always look at the
>   called function's type (it's always recorded in the MEM_EXPR for
>   non-libcalls), before asking the target.
>   (The function-abi accessors working on trees were already doing that,
>   it's just the RTL accessor that missed this)
> 
> Accordingly I also implement some more target hooks for function-abi.
> With that it's possible to also move the other ABI-influencing code
> of i386 to function-abi (ms_abi and friends).  I have not done so for
> this patch.
> 
> Regarding the names of the attributes: gah!  I've left them at
> my mediocre attempts of names in order to hopefully get input on better
> names :-)
> 
> I would welcome any comments, about the names, the approach, the attempt
> at documenting the intricacies of these attributes and anything.

I hope the new attributes are supposed to be usable with function pointers?
From the code it looks that way, but the documentation doesn't promise that.
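
For reference, the patch's sseclobber-4.c does exercise a call through an
attributed pointer.  Below is a compilable variant of that testcase as a
sketch.  The NOSSECLOBBER macro, the countdown callee and docalc_selfcheck
are my additions; the macro is empty here because the attribute exists
only with the patch applied, where it would presumably expand to
__attribute__((nosseclobber)).

```c
#include <assert.h>

/* Assumption: with the patch this would be __attribute__((nosseclobber));
   it is left empty so the sketch builds with an unpatched compiler.  */
#define NOSSECLOBBER

/* Hypothetical callee: returns 3, 2, 1 and then 0 for i = 0, 1, 2, 3.  */
NOSSECLOBBER int countdown(int i)
{
    return i < 3 ? 3 - i : 0;
}

/* The attribute belongs to the pointed-to function type, so the
   initializer needs a function carrying the same attribute.  */
NOSSECLOBBER int (*nonsse_ptr)(int) = countdown;

/* Same loop as in sseclobber-4.c: `ret` lives in an SSE register and
   could stay there over the indirect call if xmm8-15 are preserved.  */
double docalc(double d)
{
    double ret = d;
    int i = 0;
    for (;;) {
        int j = nonsse_ptr(i++);
        if (!j)
            break;
        ret += j;
    }
    return ret;
}

/* Usage: 1.0 + 3 + 2 + 1.  */
void docalc_selfcheck(void)
{
    assert(docalc(1.0) == 7.0);
}
```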

> FWIW, this particular patch was regstrapped on x86-64-linux
> with trunk from a week ago (and sniff-tested on current trunk).

This looks really cool.

Thanks.
Alexander

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [x86-64] RFC: Add nosse abi attribute
  2023-07-10 19:07 ` Alexander Monakov
@ 2023-07-10 20:33   ` Alexander Monakov
  2023-07-11  6:42   ` Richard Biener
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Alexander Monakov @ 2023-07-10 20:33 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc-patches, Jan Hubicka


On Mon, 10 Jul 2023, Alexander Monakov wrote:

> > I chose to make it possible to write function definitions with that
> > attribute with GCC adding the necessary callee save/restore code in
> > the xlogue itself.
> 
> But you can't trivially restore if the callee is sibcalling — what
> happens then (a testcase might be nice)?

Sorry, when the caller is doing the sibcall, not the callee.
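
To spell out the concern: if an attributed caller tail-calls a function
that may clobber xmm8-15, there is no place left to restore them after the
callee returns.  One plausible rule, written as a toy predicate (my
assumption of what would be safe, not what the patch implements; toy_abi
and toy_sibcall_ok are invented names), is to allow the sibcall only when
the callee promises at least as much as the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy ordering, not GCC code: a larger value promises to preserve more
   of xmm8..xmm15 (nothing, the low 16 bytes, everything).  */
enum toy_abi { TOY_DEFAULT = 0, TOY_WEAK = 1, TOY_STRONG = 2 };

/* A tail call leaves no epilogue after the callee, so the callee must
   itself uphold every guarantee the caller gives to its own caller.  */
bool toy_sibcall_ok(enum toy_abi caller, enum toy_abi callee)
{
    return callee >= caller;
}
```

Under this rule a nosseclobber function could still sibcall another
nosseclobber function, but a call to a normal function would have to stay
a regular call followed by the restores.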

Alexander

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [x86-64] RFC: Add nosse abi attribute
  2023-07-10 19:07 ` Alexander Monakov
  2023-07-10 20:33   ` Alexander Monakov
@ 2023-07-11  6:42   ` Richard Biener
  2023-07-11  8:53     ` Jan Hubicka
  2023-07-11 13:00   ` Richard Biener
  2023-07-11 14:57   ` Michael Matz
  3 siblings, 1 reply; 16+ messages in thread
From: Richard Biener @ 2023-07-11  6:42 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: Michael Matz, gcc-patches, Jan Hubicka

On Mon, Jul 10, 2023 at 9:08 PM Alexander Monakov via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
> On Mon, 10 Jul 2023, Michael Matz via Gcc-patches wrote:
>
> > Hello,
> >
> > the ELF psABI for x86-64 doesn't have any callee-saved SSE
> > registers (there were actual reasons for that, but those don't
> > matter anymore).  This starts to hurt some uses, as it means that
> > as soon as you have a call (say to memmove/memcpy, even if
> > implicit as libcall) in a loop that manipulates floating point
> > or vector data you get saves/restores around those calls.
> >
> > But in reality many functions can be written such that they only need
> > to clobber a subset of the 16 XMM registers (or do the save/restore
> > themselves in the code paths that need them, hello memcpy again).
> > So we want to introduce a way to specify this, via an ABI attribute
> > that basically says "doesn't clobber the high XMM regs".
>
> I think the main question is why you're going with this (weak) form
> instead of the (strong) form "may only clobber the low XMM regs":
> as Richi noted, surely for libcalls we'd like to know they preserve
> AVX-512 mask registers as well?
>
> (I realize this is partially answered later)
>
> Note this interacts with anything that interposes between the caller
> and the callee, like the Glibc lazy binding stub (which used to
> zero out high halves of 512-bit arguments in ZMM registers).
> Not an immediate problem for the patch, just something to mind perhaps.
>
> > I've opted to do only the obvious: do something special only for
> > xmm8 to xmm15, without a way to specify the clobber set in more detail.
> > I think such half/half split is reasonable, and as I don't want to
> > change the argument passing anyway (whose regs are always clobbered)
> > there isn't that much wiggle room anyway.
> >
> > I chose to make it possible to write function definitions with that
> > attribute with GCC adding the necessary callee save/restore code in
> > the xlogue itself.
>
> But you can't trivially restore if the callee is sibcalling — what
> happens then (a testcase might be nice)?
>
> > Carefully note that this is only possible for
> > the SSE2 registers, as other parts of them would need instructions
> > that are only optional.
>
> What is supposed to happen on 32-bit x86 with -msse -mno-sse2?
>
> > When a function doesn't contain calls to
> > unknown functions we can be a bit more lenient: we can make it so that
> > GCC simply doesn't touch xmm8-15 at all, then no save/restore is
> > necessary.
>
> What if the source code has a local register variable bound to xmm15,
> i.e. register double x asm("xmm15"); asm("..." : "+x"(x)); ?
> Probably "don't do that", i.e. disallow that in the documentation?
>
> > If a function contains calls then GCC can't know which
> > parts of the XMM regset are clobbered by that, it may be parts
> > which don't even exist yet (say until avx2048 comes out), so we must
> > restrict ourselves to only save/restore the SSE2 parts and then of course
> > can only claim to not clobber those parts.
>
> Hm, I guess this is kinda the reason a "weak" form is needed. But this
> highlights the difference between the two: the "weak" form will actively
> preserve some state (so it cannot preserve future extensions), while
> the "strong" form may just passively not touch any state, preserving
> any state it doesn't know about.
>
> > To that end I introduce actually two related attributes (for naming
> > see below):
> > * nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered
>
> This is the weak/active form; I'd suggest "preserve_high_sse".
>
> > * noanysseclobber: claims (and ensures) that nothing of any of the
> >   registers overlapping xmm8-15 is clobbered (not even future, as of
> >   yet unknown, parts)
>
> This is the strong/passive form; I'd suggest "only_low_sse".
>
> > Ensuring the first is simple: potentially add saves/restore in xlogue
> > (e.g. when xmm8 is either used explicitly or implicitly by a call).
> > Ensuring the second comes with more: we must also ensure that no
> > functions are called that don't guarantee the same thing (in addition
> > to just removing all xmm8-15 parts altogether from the available
> > registers).
> >
> > See also the added testcases for what I intended to support.
> >
> > I chose to use the new target-independent function-abi facility for
> > this.  I need some adjustments in generic code:
> > * the "default_abi" is actually more like a "current" abi: it happily
> >   changes its contents according to conditional_register_usage,
> >   and other code assumes that such changes do propagate.
> >   But if that conditional_register_usage is actually done because the
> >   current function is of a different ABI, then we must not change
> >   default_abi.
> > * in insn_callee_abi we do look at a potential fndecl for a call
> >   insn (only set when -fipa-ra), but that doesn't work for calls through
> >   pointers and (as said) is optional.  So, also always look at the
> >   called function's type (it's always recorded in the MEM_EXPR for
> >   non-libcalls), before asking the target.
> >   (The function-abi accessors working on trees were already doing that,
> >   it's just the RTL accessor that missed this)
> >
> > Accordingly I also implement some more target hooks for function-abi.
> > With that it's possible to also move the other ABI-influencing code
> > of i386 to function-abi (ms_abi and friends).  I have not done so for
> > this patch.
> >
> > Regarding the names of the attributes: gah!  I've left them at
> > my mediocre attempts of names in order to hopefully get input on better
> > names :-)
> >
> > I would welcome any comments, about the names, the approach, the attempt
> > at documenting the intricacies of these attributes and anything.
>
> I hope the new attributes are supposed to be usable with function pointers?
> From the code it looks that way, but the documentation doesn't promise that.
>
> > FWIW, this particular patch was regstrapped on x86-64-linux
> > with trunk from a week ago (and sniff-tested on current trunk).
>
> This looks really cool.

The biggest benefit might be from IPA with LTO where we'd carefully place those
attributes at WPA time (at that time tying our hands for later).

For manual use it would be nice to diagnose calls to
non-{nosse,noanysse}clobber functions in such annotated functions, because
when we have to conservatively handle unknown calls that's hardly going to
be better than saving exactly the set of SSE regs that need to be preserved
in the ultimate caller we want to optimize.

I wonder whether the linker could come to the rescue here if we introduce
special aliases with nosse/noanysse clobber ABI that would generate stubs
when entry points with such an ABI guarantee are not available (those stubs
could also specify the sub-ISA used and thus "solve" the "future" thing as
long as the dynamic loader(?) can handle it).

I'll note that with AVX512 one of the advantages is that vzero{upper,all}
does not modify xmm16-xmm31, so an alternate ABI where all mask registers
and xmm16-xmm31 are callee saved would not be impacted by AVX2 using code.

What's the plan for using those attributes?  As Alex says glibc is using
vzero{upper,all} in the AVX+ specific routines.  Any future changes there
and placing attributes would be an ABI break requiring new entry points?

Richard.

> Thanks.
> Alexander

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [x86-64] RFC: Add nosse abi attribute
  2023-07-11  6:42   ` Richard Biener
@ 2023-07-11  8:53     ` Jan Hubicka
  2023-07-11  9:07       ` Richard Biener
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Hubicka @ 2023-07-11  8:53 UTC (permalink / raw)
  To: Richard Biener; +Cc: Alexander Monakov, Michael Matz, gcc-patches

> > > FWIW, this particular patch was regstrapped on x86-64-linux
> > > with trunk from a week ago (and sniff-tested on current trunk).
> >
> > This looks really cool.
> 
> The biggest benefit might be from IPA with LTO where we'd carefully place those
> attributes at WPA time (at that time tying our hands for later).

Within a single partition, IRA already propagates the knowledge about
callee-clobbered registers.

Across partitions we already automatically enable regparm with -m32;
see ix86_function_regparm and its tests for target->local and
can_change_attribute.

Enabling SSE at the same spot should be easy.

Honza

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [x86-64] RFC: Add nosse abi attribute
  2023-07-11  8:53     ` Jan Hubicka
@ 2023-07-11  9:07       ` Richard Biener
  0 siblings, 0 replies; 16+ messages in thread
From: Richard Biener @ 2023-07-11  9:07 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Alexander Monakov, Michael Matz, gcc-patches

On Tue, Jul 11, 2023 at 10:53 AM Jan Hubicka <hubicka@ucw.cz> wrote:
>
> > > > FWIW, this particular patch was regstrapped on x86-64-linux
> > > > with trunk from a week ago (and sniff-tested on current trunk).
> > >
> > > This looks really cool.
> >
> > The biggest benefit might be from IPA with LTO where we'd carefully place those
> > attributes at WPA time (at that time tying our hands for later).
>
> Within single partition IRA already propagates the knowledge about
> callee-clobbered registers.
>
> Across partition we already automatically enable regparm with -m32
> see ix86_function_regparm and tests for target->local and
> can_change_attribute
>
> Enabling SSE at the same spot should be easy.

It's probably slightly different since we want to enable it for a "leaf"
sub-callgraph (or where edges to extern have the appropriate ABI
by means of attributes) irrespective of whether the functions are exported
(we're adding to the callee save set, which is ABI compatible
with the default ABI).  But yes, that place would be appropriate.
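
To make the compatibility argument concrete, here is a minimal sketch (not
GCC code; the mask layout and all names are invented for illustration) that
models clobber sets as 16-bit masks over xmm0-15.  A callee is safe for a
caller as long as it clobbers nothing outside the set the caller already
assumes to be clobbered, which is why shrinking the callee's clobber set
stays compatible with default-ABI callers:

```c
#include <stdint.h>

/* Bit n set means "xmmN may be clobbered by a call".  This layout is an
   assumption of this sketch, not something GCC defines.  */
typedef uint16_t xmm_mask;

#define DEFAULT_ABI_CLOBBERS ((xmm_mask) 0xFFFF) /* psABI: all of xmm0-15 */
#define LOW_ONLY_CLOBBERS    ((xmm_mask) 0x00FF) /* xmm0-7 only */

/* A callee is compatible with what the caller assumes if it clobbers
   nothing beyond the caller's assumed clobber set.  */
static int
callee_compatible (xmm_mask caller_assumes, xmm_mask callee_clobbers)
{
  return (callee_clobbers & (xmm_mask) ~caller_assumes) == 0;
}
```

The reverse direction fails: a caller that assumes only xmm0-7 are clobbered
cannot safely call a default-ABI function.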

Richard.

>
> Honza

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [x86-64] RFC: Add nosse abi attribute
  2023-07-10 19:07 ` Alexander Monakov
  2023-07-10 20:33   ` Alexander Monakov
  2023-07-11  6:42   ` Richard Biener
@ 2023-07-11 13:00   ` Richard Biener
  2023-07-11 13:21     ` Jan Hubicka
  2023-07-11 13:59     ` Alexander Monakov
  2023-07-11 14:57   ` Michael Matz
  3 siblings, 2 replies; 16+ messages in thread
From: Richard Biener @ 2023-07-11 13:00 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: Michael Matz, gcc-patches, Jan Hubicka

On Mon, Jul 10, 2023 at 9:08 PM Alexander Monakov via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
> On Mon, 10 Jul 2023, Michael Matz via Gcc-patches wrote:
>
> > Hello,
> >
> > the ELF psABI for x86-64 doesn't have any callee-saved SSE
> > registers (there were actual reasons for that, but those don't
> > matter anymore).  This starts to hurt some uses, as it means that
> > as soon as you have a call (say to memmove/memcpy, even if
> > implicit as libcall) in a loop that manipulates floating point
> > or vector data you get saves/restores around those calls.
> >
> > But in reality many functions can be written such that they only need
> > to clobber a subset of the 16 XMM registers (or do the save/restore
> > themself in the codepaths that needs them, hello memcpy again).
> > So we want to introduce a way to specify this, via an ABI attribute
> > that basically says "doesn't clobber the high XMM regs".
>
> I think the main question is why you're going with this (weak) form
> instead of the (strong) form "may only clobber the low XMM regs":
> as Richi noted, surely for libcalls we'd like to know they preserve
> AVX-512 mask registers as well?
>
> (I realize this is partially answered later)
>
> Note this interacts with anything that interposes between the caller
> and the callee, like the Glibc lazy binding stub (which used to
> zero out high halves of 512-bit arguments in ZMM registers).
> Not an immediate problem for the patch, just something to mind perhaps.
>
> > I've opted to do only the obvious: do something special only for
> > xmm8 to xmm15, without a way to specify the clobber set in more detail.
> > I think such half/half split is reasonable, and as I don't want to
> > change the argument passing anyway (whose regs are always clobbered)
> > there isn't that much wiggle room anyway.
> >
> > I chose to make it possible to write function definitions with that
> > attribute with GCC adding the necessary callee save/restore code in
> > the xlogue itself.
>
> But you can't trivially restore if the callee is sibcalling — what
> happens then (a testcase might be nice)?
>
> > Carefully note that this is only possible for
> > the SSE2 registers, as other parts of them would need instructions
> > that are only optional.
>
> What is supposed to happen on 32-bit x86 with -msse -mno-sse2?
>
> > When a function doesn't contain calls to
> > unknown functions we can be a bit more lenient: we can make it so that
> > GCC simply doesn't touch xmm8-15 at all, then no save/restore is
> > necessary.
>
> What if the source code has a local register variable bound to xmm15,
> i.e. register double x asm("xmm15"); asm("..." : "+x"(x)); ?
> Probably "don't do that", i.e. disallow that in the documentation?
>
> > If a function contains calls then GCC can't know which
> > parts of the XMM regset is clobbered by that, it may be parts
> > which don't even exist yet (say until avx2048 comes out), so we must
> > restrict ourself to only save/restore the SSE2 parts and then of course
> > can only claim to not clobber those parts.
>
> Hm, I guess this is kinda the reason a "weak" form is needed. But this
> highlights the difference between the two: the "weak" form will actively
> preserve some state (so it cannot preserve future extensions), while
> the "strong" form may just passively not touch any state, preserving
> any state it doesn't know about.
>
> > To that end I introduce actually two related attributes (for naming
> > see below):
> > * nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered
>
> This is the weak/active form; I'd suggest "preserve_high_sse".

Isn't it the opposite?  "preserves_low_sse", unless you suggest
the name applies to the caller which has to preserve high parts
when calling nosseclobber.

> > * noanysseclobber: claims (and ensures) that nothing of any of the
> >   registers overlapping xmm8-15 is clobbered (not even future, as of
> >   yet unknown, parts)
>
> This is the strong/passive form; I'd suggest "only_low_sse".

Likewise.

As for mask registers I understand we'd have to split the 8 register
set into two halves to make the same approach work, otherwise
we'd have no registers left to allocate from.

> > Ensuring the first is simple: potentially add saves/restore in xlogue
> > (e.g. when xmm8 is either used explicitly or implicitly by a call).
> > Ensuring the second comes with more: we must also ensure that no
> > functions are called that don't guarantee the same thing (in addition
> > to just removing all xmm8-15 parts altogether from the available
> > registers).
> >
> > See also the added testcases for what I intended to support.
> >
> > I chose to use the new target-independent function-abi facility for
> > this.  I need some adjustments in generic code:
> > * the "default_abi" is actually more like a "current" abi: it happily
> >   changes its contents according to conditional_register_usage,
> >   and other code assumes that such changes do propagate.
> >   But if that conditional_reg_usage is actually done because the current
> >   function is of a different ABI, then we must not change default_abi.
> > * in insn_callee_abi we do look at a potential fndecl for a call
> >   insn (only set when -fipa-ra), but that doesn't work for calls through
> >   pointers and (as said) is optional.  So, also always look at the
> >   called function's type (it's always recorded in the MEM_EXPR for
> >   non-libcalls), before asking the target.
> >   (The function-abi accessors working on trees were already doing that,
> >   it's just the RTL accessor that missed this)
> >
> > Accordingly I also implement some more target hooks for function-abi.
> > With that it's possible to also move the other ABI-influencing code
> > of i386 to function-abi (ms_abi and friends).  I have not done so for
> > this patch.
> >
> > Regarding the names of the attributes: gah!  I've left them at
> > my mediocre attempts of names in order to hopefully get input on better
> > names :-)
> >
> > I would welcome any comments, about the names, the approach, the attempt
> > at documenting the intricacies of these attributes and anything.
>
> I hope the new attributes are supposed to be usable with function pointers?
> From the code it looks that way, but the documentation doesn't promise that.
>
> > FWIW, this particular patch was regstrapped on x86-64-linux
> > with trunk from a week ago (and sniff-tested on current trunk).
>
> This looks really cool.

I think it's indeed nice for scalar code, but since noanysseclobber will
be a pain to use manually, vector code is unlikely to benefit on modern
archs.  There might be the chance to auto-enable these (maybe even more
fine-grained) with LTO, but in the end 8 registers is not much and maybe
we want to extend this to noavxclobber and noavx512clobber, esp.
considering glibc vectorized math routines (maybe the OMP SIMD ABI could
have been extended instead).

Otherwise after thinking about this more I don't see how to do better - I'm
still curious about actual use cases (for manual annotation, that is).

Richard.



> Thanks.
> Alexander

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [x86-64] RFC: Add nosse abi attribute
  2023-07-11 13:00   ` Richard Biener
@ 2023-07-11 13:21     ` Jan Hubicka
  2023-07-11 14:00       ` Michael Matz
  2023-07-11 13:59     ` Alexander Monakov
  1 sibling, 1 reply; 16+ messages in thread
From: Jan Hubicka @ 2023-07-11 13:21 UTC (permalink / raw)
  To: Richard Biener; +Cc: Alexander Monakov, Michael Matz, gcc-patches

> > > When a function doesn't contain calls to
> > > unknown functions we can be a bit more lenient: we can make it so that
> > > GCC simply doesn't touch xmm8-15 at all, then no save/restore is
> > > necessary.

One may also take into account that first 8 registers are cheaper to
encode than the later 8, so perhaps we may want to choose range that
contains both.
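
For reference, in legacy (non-VEX) SSE code the difference is one byte per
instruction: any operand in xmm8-15 requires a REX prefix.  A rough sketch,
where the helper name is invented and only the register-to-register form of
movaps is modelled:

```c
/* Length in bytes of a legacy-SSE "movaps %xmmSRC, %xmmDST"
   (register-to-register form).  The base encoding is 0F 28 /r, i.e.
   three bytes; a REX prefix is required as soon as either operand is
   xmm8-15.  Hypothetical helper, for illustration only.  */
static int
movaps_rr_length (int dst, int src)
{
  int needs_rex = dst >= 8 || src >= 8;
  return 3 + (needs_rex ? 1 : 0);
}
```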

Honza

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [x86-64] RFC: Add nosse abi attribute
  2023-07-11 13:00   ` Richard Biener
  2023-07-11 13:21     ` Jan Hubicka
@ 2023-07-11 13:59     ` Alexander Monakov
  1 sibling, 0 replies; 16+ messages in thread
From: Alexander Monakov @ 2023-07-11 13:59 UTC (permalink / raw)
  To: Richard Biener; +Cc: Michael Matz, gcc-patches, Jan Hubicka


On Tue, 11 Jul 2023, Richard Biener wrote:

> > > If a function contains calls then GCC can't know which
> > > parts of the XMM regset is clobbered by that, it may be parts
> > > which don't even exist yet (say until avx2048 comes out), so we must
> > > restrict ourself to only save/restore the SSE2 parts and then of course
> > > can only claim to not clobber those parts.
> >
> > Hm, I guess this is kinda the reason a "weak" form is needed. But this
> > highlights the difference between the two: the "weak" form will actively
> > preserve some state (so it cannot preserve future extensions), while
> > the "strong" form may just passively not touch any state, preserving
> > any state it doesn't know about.
> >
> > > To that end I introduce actually two related attributes (for naming
> > > see below):
> > > * nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered
> >
> > This is the weak/active form; I'd suggest "preserve_high_sse".
> 
> Isn't it the opposite?  "preserves_low_sse", unless you suggest
> the name applies to the caller which has to preserve high parts
> when calling nosseclobber.

This is the form where the function annotated with this attribute
consumes 128 bytes on the stack to "blindly" save/restore xmm8-15
if it calls anything with a vanilla ABI.
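
The 128 bytes are simply eight registers times 16 bytes each.  As a trivial
sketch (the helper name is made up, and it assumes every register is spilled
as a full 128-bit value with no extra alignment padding):

```c
#include <stddef.h>

/* Stack bytes needed to spill a contiguous range of XMM registers,
   assuming each one is saved as a full 128-bit (16-byte) value.  */
static size_t
xmm_save_area_bytes (int first_reg, int last_reg)
{
  const size_t bytes_per_xmm = 16;
  return (size_t) (last_reg - first_reg + 1) * bytes_per_xmm;
}
```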

(actually thinking about it more, I'd like to suggest shelving this part
and only implement the zero-cost variant, noanysseclobber)

> > > * noanysseclobber: claims (and ensures) that nothing of any of the
> > >   registers overlapping xmm8-15 is clobbered (not even future, as of
> > >   yet unknown, parts)
> >
> > This is the strong/passive form; I'd suggest "only_low_sse".
> 
> Likewise.

Sorry if I managed to sow confusion here. In my mind, this is the form where
only xmm0-xmm7 can be written in the function annotated with the attribute,
including its callees. I was thinking that writing to zmm16-31 would be
disallowed too. The initial example was memcpy, where eight vector registers
are sufficient for the job.

> As for mask registers I understand we'd have to split the 8 register
> set into two halves to make the same approach work, otherwise
> we'd have no registers left to allocate from.

I'd suggest looking at how many mask registers OpenMP SIMD AVX-512 clones
can receive as implicit arguments, as one data point.

Alexander

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [x86-64] RFC: Add nosse abi attribute
  2023-07-11 13:21     ` Jan Hubicka
@ 2023-07-11 14:00       ` Michael Matz
  0 siblings, 0 replies; 16+ messages in thread
From: Michael Matz @ 2023-07-11 14:00 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Richard Biener, Alexander Monakov, gcc-patches

Hello,

On Tue, 11 Jul 2023, Jan Hubicka wrote:

> > > > When a function doesn't contain calls to
> > > > unknown functions we can be a bit more lenient: we can make it so that
> > > > GCC simply doesn't touch xmm8-15 at all, then no save/restore is
> > > > necessary.
> 
> One may also take into account that first 8 registers are cheaper to
> encode than the later 8, so perhaps we may want to choose range that
> contains both.

There is actually none in the low range that's usable.  xmm0/1 are used 
for return values and xmm2-7 are used for argument passing.  Arguments are 
by default callee clobbered, and we do not want to change this (or limit 
the number of register arguments for the alternate ABI).
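
Spelled out as a bitmask exercise (a sketch only; the macro and function
names are invented here and are not part of the patch): once the return and
argument registers are taken out of xmm0-15, exactly xmm8-15 remain as
candidates for being preserved.

```c
#include <stdint.h>

/* Mask with bits a..b set; bit n stands for xmmN.  */
#define XMM_RANGE(a, b) ((uint16_t) (((1u << ((b) - (a) + 1)) - 1u) << (a)))

/* Registers that could be made callee-preserved without touching
   argument passing or return values.  */
static uint16_t
xmm_preservable (void)
{
  uint16_t all = XMM_RANGE (0, 15);
  uint16_t ret_regs = XMM_RANGE (0, 1);   /* return values */
  uint16_t arg_regs = XMM_RANGE (2, 7);   /* remaining argument registers */
  return (uint16_t) (all & ~(ret_regs | arg_regs));
}
```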


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [x86-64] RFC: Add nosse abi attribute
  2023-07-10 19:07 ` Alexander Monakov
                     ` (2 preceding siblings ...)
  2023-07-11 13:00   ` Richard Biener
@ 2023-07-11 14:57   ` Michael Matz
  2023-07-11 15:17     ` Alexander Monakov
  3 siblings, 1 reply; 16+ messages in thread
From: Michael Matz @ 2023-07-11 14:57 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc-patches, Jan Hubicka

Hello,

On Mon, 10 Jul 2023, Alexander Monakov wrote:

> I think the main question is why you're going with this (weak) form
> instead of the (strong) form "may only clobber the low XMM regs":

I want to provide both.  One of them allows more arbitrary function 
definitions, the other allows more register (parts) to be preserved.  I 
feel both have their place.

> as Richi noted, surely for libcalls we'd like to know they preserve
> AVX-512 mask registers as well?

Yeah, mask registers.  I'm still pondering this.  We would need to split 
the 8 maskregs into two parts.  Hmm.

> Note this interacts with anything that interposes between the caller
> and the callee, like the Glibc lazy binding stub (which used to
> zero out high halves of 512-bit arguments in ZMM registers).
> Not an immediate problem for the patch, just something to mind perhaps.

Yeah, needs to be kept in mind indeed.  Anything coming in between the 
caller and a so-marked callee needs to preserve things.

> > I chose to make it possible to write function definitions with that
> > attribute with GCC adding the necessary callee save/restore code in
> > the xlogue itself.
> 
> But you can't trivially restore if the callee is sibcalling — what
> happens then (a testcase might be nice)?

I hoped early on that the generic code that prohibits sibcalls between 
call sites of too "different" ABIs would deal with this, and then forgot 
to check.  Turns out you had a good hunch here, it actually does a 
sibcall, destroying the guarantees.  Thanks! :)
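
The shape of the problem, sketched in plain C (the attribute itself is left
out, since it only exists in this patch; the function names are placeholders
and the comments describe what would go wrong if tail_caller carried it):

```c
/* Stand-in for an ordinary psABI function that may clobber xmm8-15.  */
int helper (int i) { return i * 2; }

/* The call to helper is in tail position.  With sibling-call
   optimization (enabled at -O2) GCC may emit it as "jmp helper"
   instead of "call helper; ret".  After the jump no epilogue code of
   tail_caller runs anymore, so any register restore would have to
   happen before the jump, and helper is then free to clobber those
   registers again before returning to tail_caller's caller.  */
int tail_caller (int i) { return helper (i + 2); }
```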

> > Carefully note that this is only possible for the SSE2 registers, as 
> > other parts of them would need instructions that are only optional.
> 
> What is supposed to happen on 32-bit x86 with -msse -mno-sse2?

Hmm.  I feel the best answer here is "that should error out".  I'll add a 
test and adjust patch if necessary.

> > When a function doesn't contain calls to
> > unknown functions we can be a bit more lenient: we can make it so that
> > GCC simply doesn't touch xmm8-15 at all, then no save/restore is
> > necessary.
> 
> What if the source code has a local register variable bound to xmm15,
> i.e. register double x asm("xmm15"); asm("..." : "+x"(x)); ?

Makes a good testcase as well.  My take: it's acceptable with the 
only-sse2-preserved attribute (xmm15 will in this case be saved/restored), 
and should be an error with the everything-preserved attribute (maybe we 
can make an exception as here we only specify an XMM reg, instead of 
larger parts).
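
For the record, the construct in question looks roughly like this as a
complete function.  The function name, the empty asm body and the
x86-64/GNU C guard with its fallback are additions for this sketch:

```c
/* Forces a value through xmm15 via a local register variable.  On
   other targets or compilers the sketch degrades to a plain copy.  */
double
through_xmm15 (double v)
{
#if defined(__x86_64__) && defined(__GNUC__)
  register double x __asm__ ("xmm15") = v;
  /* The "+x" constraint makes the asm read and write x, i.e. xmm15.  */
  __asm__ volatile ("" : "+x" (x));
  return x;
#else
  return v;
#endif
}
```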

> > To that end I introduce actually two related attributes (for naming
> > see below):
> > * nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered
> 
> This is the weak/active form; I'd suggest "preserve_high_sse".

But it preserves only the low parts :-)  You swapped the two in your 
mind when writing the reply?

> > I would welcome any comments, about the names, the approach, the attempt
> > at documenting the intricacies of these attributes and anything.
> 
> I hope the new attributes are supposed to be usable with function 
> pointers? From the code it looks that way, but the documentation doesn't 
> promise that.

Yes, like all ABI-influencing attributes they _have_ to be part of the 
function's type (and hence transfer to function pointers), with appropriate 
incompatible-conversion errors and warnings at the appropriate places.  (I 
know that this isn't always the way we're dealing with ABI-influencing 
attributes, and often refer to a decl only.  All those are actual bugs.)

And yes, I will adjust the docu to be explicit about this.
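
As an analogy with an attribute that exists today, this is how ms_abi
already travels with a function pointer type.  This is a sketch only: the
ALT_ABI macro and the function names are made up, and the new attributes
are merely expected to behave the same way.

```c
#if defined(__x86_64__) && defined(__GNUC__)
# define ALT_ABI __attribute__((ms_abi))
#else
# define ALT_ABI /* no alternate ABI here; plain calls */
#endif

/* A function defined with the alternate ABI ...  */
ALT_ABI int add_three (int i) { return i + 3; }

/* ... and a pointer type carrying the same attribute, so calls through
   it use the matching calling convention.  */
typedef int (ALT_ABI *alt_abi_fn) (int);

int call_via_pointer (alt_abi_fn fn, int i) { return fn (i); }
```

Initializing an alt_abi_fn from a function without the attribute is the
kind of conversion that should be diagnosed.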


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [x86-64] RFC: Add nosse abi attribute
  2023-07-11 14:57   ` Michael Matz
@ 2023-07-11 15:17     ` Alexander Monakov
  2023-07-11 15:34       ` Michael Matz
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Monakov @ 2023-07-11 15:17 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc-patches, Jan Hubicka


On Tue, 11 Jul 2023, Michael Matz wrote:

> > > To that end I introduce actually two related attributes (for naming
> > > see below):
> > > * nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered
> > 
> > This is the weak/active form; I'd suggest "preserve_high_sse".
> 
> But it preserves only the low parts :-)  You swapped the two in your 
> mind when writing the reply?

Ahhh. By "high SSE" I mean the high-numbered SSE regs, i.e. xmm8-15, not
the higher halves of (unspecified subset of) SSE regs.

If you look from AVX viewpoint, yes, it preserves lower 128 bits of the
high-numbered vector registers.
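
Put as a small model (the enum and function names are invented for this
sketch; "weak" stands for nosseclobber and "strong" for noanysseclobber as
described in the proposal):

```c
enum callee_kind { CALLEE_DEFAULT, CALLEE_WEAK, CALLEE_STRONG };

/* How many low bits of a value living in one of xmm8-15 (or in the
   ymm/zmm register overlapping it) the caller may assume survive a
   call.  live_bits is 128, 256 or 512.  */
static int
preserved_bits (enum callee_kind kind, int live_bits)
{
  switch (kind)
    {
    case CALLEE_WEAK:   return live_bits < 128 ? live_bits : 128;
    case CALLEE_STRONG: return live_bits;
    default:            return 0;
    }
}
```

This matches the docalc4 testcase in the patch: a 256-bit value does not
survive a call to a nosseclobber function as a whole, only its low 128 bits
do.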

Alexander

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [x86-64] RFC: Add nosse abi attribute
  2023-07-11 15:17     ` Alexander Monakov
@ 2023-07-11 15:34       ` Michael Matz
  2023-07-11 16:53         ` Alexander Monakov
  0 siblings, 1 reply; 16+ messages in thread
From: Michael Matz @ 2023-07-11 15:34 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc-patches, Jan Hubicka

Hey,

On Tue, 11 Jul 2023, Alexander Monakov via Gcc-patches wrote:

> > > > * nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered
> > > 
> > > This is the weak/active form; I'd suggest "preserve_high_sse".
> > 
> > But it preserves only the low parts :-)  You swapped the two in your 
> > mind when writing the reply?
> 
> Ahhh. By "high SSE" I mean the high-numbered SSE regs, i.e. xmm8-15, not
> the higher halves of (unspecified subset of) SSE regs.

Ah, gotcha :-)  It just shows that all these names are confusing.  Maybe 
I'll just go with "attribute1" and "attribute2" and rely on docu.  (SCNR)


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [x86-64] RFC: Add nosse abi attribute
  2023-07-11 15:34       ` Michael Matz
@ 2023-07-11 16:53         ` Alexander Monakov
  0 siblings, 0 replies; 16+ messages in thread
From: Alexander Monakov @ 2023-07-11 16:53 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc-patches, Jan Hubicka



On Tue, 11 Jul 2023, Michael Matz wrote:

> Hey,
> 
> On Tue, 11 Jul 2023, Alexander Monakov via Gcc-patches wrote:
> 
> > > > > * nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered
> > > > 
> > > > This is the weak/active form; I'd suggest "preserve_high_sse".
> > > 
> > > But it preserves only the low parts :-)  You swapped the two in your 
> > > mind when writing the reply?
> > 
> > Ahhh. By "high SSE" I mean the high-numbered SSE regs, i.e. xmm8-15, not
> > the higher halves of (unspecified subset of) SSE regs.
> 
> Ah, gotcha :-)  It just shows that all these names are confusing.  Maybe 
> I'll just go with "attribute1" and "attribute2" and rely on docu.  (SCNR)

Heh, that reminds me that decimal digits are allowed in attribute names.
Let me offer "preserve_xmm_8_15" and "only_xmm_0_7" then.

One more thing to keep in mind is interaction with SSE-AVX transition.
If the function with a new attribute is using classic non-VEX-encoded SSE,
but its caller is using 256-bit ymm0-15, it will incur a substantial penalty
on Intel CPUs. There's no penalty on AMD (afaik) and no penalty for zmm16-31,
since those are inaccessible in non-EVEX code.

Alexander

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [x86-64] RFC: Add nosse abi attribute
  2023-07-10 15:55 [x86-64] RFC: Add nosse abi attribute Michael Matz
  2023-07-10 17:28 ` Richard Biener
  2023-07-10 19:07 ` Alexander Monakov
@ 2023-07-17 23:00 ` Richard Sandiford
  2 siblings, 0 replies; 16+ messages in thread
From: Richard Sandiford @ 2023-07-17 23:00 UTC (permalink / raw)
  To: Michael Matz via Gcc-patches; +Cc: Jan Hubicka, Michael Matz

Michael Matz via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> Hello,
>
> the ELF psABI for x86-64 doesn't have any callee-saved SSE
> registers (there were actual reasons for that, but those don't
> matter anymore).  This starts to hurt some uses, as it means that
> as soon as you have a call (say to memmove/memcpy, even if
> implicit as libcall) in a loop that manipulates floating point
> or vector data you get saves/restores around those calls.
>
> But in reality many functions can be written such that they only need
> to clobber a subset of the 16 XMM registers (or do the save/restore
> themself in the codepaths that needs them, hello memcpy again).
> So we want to introduce a way to specify this, via an ABI attribute
> that basically says "doesn't clobber the high XMM regs".
>
> I've opted to do only the obvious: do something special only for
> xmm8 to xmm15, without a way to specify the clobber set in more detail.
> I think such half/half split is reasonable, and as I don't want to
> change the argument passing anyway (whose regs are always clobbered)
> there isn't that much wiggle room anyway.
>
> I chose to make it possible to write function definitions with that
> attribute with GCC adding the necessary callee save/restore code in
> the xlogue itself.  Carefully note that this is only possible for
> the SSE2 registers, as other parts of them would need instructions
> that are only optional.  When a function doesn't contain calls to
> unknown functions we can be a bit more lenient: we can make it so that
> GCC simply doesn't touch xmm8-15 at all, then no save/restore is
> necessary.  If a function contains calls then GCC can't know which
> parts of the XMM regset is clobbered by that, it may be parts
> which don't even exist yet (say until avx2048 comes out), so we must
> restrict ourself to only save/restore the SSE2 parts and then of course
> can only claim to not clobber those parts.
>
> To that end I introduce actually two related attributes (for naming
> see below):
> * nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered
> * noanysseclobber: claims (and ensures) that nothing of any of the
>   registers overlapping xmm8-15 is clobbered (not even future, as of
>   yet unknown, parts)
>
> Ensuring the first is simple: potentially add saves/restore in xlogue
> (e.g. when xmm8 is either used explicitly or implicitly by a call).
> Ensuring the second comes with more: we must also ensure that no
> functions are called that don't guarantee the same thing (in addition
> to just removing all xmm8-15 parts altogether from the available
> registers).
>
> See also the added testcases for what I intended to support.
>
> I chose to use the new target-independent function-abi facility for
> this.  I need some adjustments in generic code:
> * the "default_abi" is actually more like a "current" abi: it happily
>   changes its contents according to conditional_register_usage,
>   and other code assumes that such changes do propagate.
>   But if that conditional_reg_usage is actually done because the current
>   function is of a different ABI, then we must not change default_abi.
> * in insn_callee_abi we do look at a potential fndecl for a call
>   insn (only set when -fipa-ra), but that doesn't work for calls through
>   pointers and (as said) is optional.  So, also always look at the
>   called function's type (it's always recorded in the MEM_EXPR for
>   non-libcalls), before asking the target.
>   (The function-abi accessors working on trees were already doing that,
>   it's just the RTL accessor that missed this)
> [...]
> diff --git a/gcc/function-abi.cc b/gcc/function-abi.cc
> index 2ab9b2c5649..efbe114218c 100644
> --- a/gcc/function-abi.cc
> +++ b/gcc/function-abi.cc
> @@ -42,6 +42,26 @@ void
>  predefined_function_abi::initialize (unsigned int id,
>  				     const_hard_reg_set full_reg_clobbers)
>  {
> +  /* Don't reinitialize an ABI struct.  We might be called from reinit_regs
> +     from the targets conditional_register_usage hook which might depend
> +     on cfun and might have changed the global register sets according
> +     to that functions ABI already.  That's not the default ABI anymore.
> +
> +     XXX only avoid this if we're reinitializing the default ABI, and the
> +     current function is _not_ of the default ABI.  That's for
> +     backward compatibility where some backends modify the regsets with
> +     the exception that those changes are then reflected also in the default
> +     ABI (which rather is then the "current" ABI).  E.g. x86_64 with the
> +     ms_abi vs sysv attribute.  They aren't reflected by separate ABI
> +     structs, but handled different.  The "default" ABI hence changes
> +     back and forth (and is expected to!) between a ms_abi and a sysv
> +     function.  */

The default ABI is also the eh_edge_abi, and so describes the set of
registers that are preserved or clobbered across EH edges.  If changing
between ms_abi and sysv changes the "default" ABI's clobber set, I assume
it also (intentionally?) changes the EH edge clobber set, but how does
that work in practice?

Richard

> +  if (m_initialized
> +      && id == 0
> +      && cfun
> +      && fndecl_abi (cfun->decl).base_abi ().id() != 0)
> +    return;
> +
>    m_id = id;
>    m_initialized = true;
>    m_full_reg_clobbers = full_reg_clobbers;
> @@ -224,6 +244,13 @@ insn_callee_abi (const rtx_insn *insn)
>      if (tree fndecl = get_call_fndecl (insn))
>        return fndecl_abi (fndecl);
>  
> +  if (rtx call = get_call_rtx_from (insn))
> +    {
> +      tree memexp = MEM_EXPR (XEXP (call, 0));
> +      if (memexp)
> +	return fntype_abi (TREE_TYPE (memexp));
> +    }
> +
>    if (targetm.calls.insn_callee_abi)
>      return targetm.calls.insn_callee_abi (insn);
>  
> diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-1.c b/gcc/testsuite/gcc.target/i386/sseclobber-1.c
> new file mode 100644
> index 00000000000..8758e2d3109
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sseclobber-1.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target sse2 } */
> +/* { dg-options "-O1" } */
> +/* { dg-final { scan-assembler-times {mm[89], [0-9]*\(%rsp\)} 2 } } */
> +/* { dg-final { scan-assembler-times {mm1[0-5], [0-9]*\(%rsp\)} 6 } } */
> +
> +extern int nonsse (int) __attribute__((nosseclobber));
> +extern int normalfunc (int);
> +
> +/* Demonstrate that all regs potentially clobbered by normal psABI
> +   functions are saved/restored by otherabi functions.  */
> +__attribute__((nosseclobber)) int nonsse (int i)
> +{
> +  return normalfunc (i + 2) + 3;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-2.c b/gcc/testsuite/gcc.target/i386/sseclobber-2.c
> new file mode 100644
> index 00000000000..9abafa0a9ba
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sseclobber-2.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target sse2 } */
> +/* { dg-options "-O1" } */
> +/* { dg-final { scan-assembler-not {mm[0-9], [0-9]*\(%rsp\)} } } */
> +
> +extern int nonsse (int) __attribute__((nosseclobber));
> +extern int othernonsse (int) __attribute__((nosseclobber));
> +
> +/* Demonstrate that calling a nosseclobber function from a nosseclobber
> +   function does _not_ need to save all the regs (unlike in nonsse).  */
> +__attribute__((nosseclobber)) int nonsse (int i)
> +{
> +  return othernonsse (i + 2) + 3;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-3.c b/gcc/testsuite/gcc.target/i386/sseclobber-3.c
> new file mode 100644
> index 00000000000..276c7fd926b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sseclobber-3.c
> @@ -0,0 +1,54 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target sse2 } */
> +/* { dg-options "-O1" } */
> +/* for docalc2 we should use the high xmm regs */
> +/* { dg-final { scan-assembler {xmm[89]} } } */
> +/* do docalc4_notany we should use the high ymm regs */
> +/* { dg-final { scan-assembler {ymm[89]} } } */
> +/* for docalc4 (and nowhere else) we should save/restore exactly
> +   one reg to stack around the inner-loop call */
> +/* { dg-final { scan-assembler-times {ymm[0-9]*, [0-9]*\(%rsp\)} 1 } } */
> +
> +typedef double dbl2 __attribute__((vector_size(16)));
> +typedef double dbl4 __attribute__((vector_size(32)));
> +typedef double dbl8 __attribute__((vector_size(64)));
> +extern __attribute__((nosseclobber,const)) double nonsse (int);
> +
> +/* Demonstrate that some values can be kept in a register over calls
> +   to otherabi functions.  nonsse saves the XMM register, so those
> +   are usable, hence docalc2 should be able to keep values in registers
> +   over the nonsse call.  */
> +void docalc2 (dbl2 *d, dbl2 *a, dbl2 *b, int n)
> +{
> +  long i;
> +  for (i = 0; i < n; i++)
> +    {
> +      d[i] = a[i] * b[i] * nonsse(i);
> +    }
> +}
> +
> +/* Here we're using YMM registers (four doubles) and those are _not_
> +   saved by nonsse() (only the XMM parts) so docalc4 should not keep
> +   the value in a register over the call to nonsse.  */
> +void __attribute__((target("avx2"))) docalc4 (dbl4 *d, dbl4 *a, dbl4 *b, int n)
> +{
> +  long i;
> +  for (i = 0; i < n; i++)
> +    {
> +      d[i] = a[i] * b[i] * nonsse(i);
> +    }
> +}
> +
> +/* And here we're also using YMM registers, but have a call to a
> +   noanysseclobber function, which _does_ save all [XYZ]MM regs except
> +   arguments, so docalc4_notany should again be able to keep the value
> +   in a register.  */
> +extern __attribute__((noanysseclobber,const)) double notanysse (int);
> +void __attribute__((target("avx2"))) docalc4_notany (dbl4 *d, dbl4 *a, dbl4 *b, int n)
> +{
> +  long i;
> +  for (i = 0; i < n; i++)
> +    {
> +      d[i] = a[i] * b[i] * notanysse(i);
> +    }
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-4.c b/gcc/testsuite/gcc.target/i386/sseclobber-4.c
> new file mode 100644
> index 00000000000..734f25068f0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sseclobber-4.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target sse2 } */
> +/* { dg-options "-O1" } */
> +/* { dg-final { scan-assembler-not {mm[0-9], [0-9]*\(%rsp\)} } } */
> +
> +extern __attribute__((nosseclobber)) int (*nonsse_ptr) (int);
> +
> +/* Demonstrate that some values can be kept in a register over calls
> +   to otherabi functions when called via function pointer.  */
> +double docalc (double d)
> +{
> +  double ret = d;
> +  int i = 0;
> +  while (1) {
> +      int j = nonsse_ptr (i++);
> +      if (!j)
> +        break;
> +      ret += j;
> +  }
> +  return ret;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-5.c b/gcc/testsuite/gcc.target/i386/sseclobber-5.c
> new file mode 100644
> index 00000000000..1869ae06148
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sseclobber-5.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target sse2 } */
> +/* { dg-options "-O1" } */
> +/* { dg-final { scan-assembler-not {mm[89]} } } */
> +/* { dg-final { scan-assembler-not {mm1[0-5]} } } */
> +
> +extern int noanysse (int) __attribute__((noanysseclobber));
> +extern int noanysse2 (int) __attribute__((noanysseclobber));
> +extern __attribute__((noanysseclobber)) double calcstuff (double, double);
> +
> +/* Demonstrate that none of the clobbered SSE (or wider) regs are
> +   used by a noanysse function.  */
> +__attribute__((noanysseclobber)) double calcstuff (double d, double e)
> +{
> +  double s1, s2, s3, s4, s5, s6, s7, s8;
> +  s1 = s2 = s3 = s4 = s5 = s6 = s7 = s8 = 0.0;
> +  while (d > 0.1)
> +    {
> +      s1 += s2 * 2 + d;
> +      s2 += s3 * 3 + e;
> +      s3 += s4 * 5 + d * e;
> +      s4 += e / d;
> +      s5 += s2 * 7 + d - e;
> +      s5 += 2 * d + e;
> +      s6 += 5 * e + d;
> +      s7 += 7 * e * (d+1);
> +      d -= e;
> +    }
> +  return s1 + s2 + s3 + s4 + s5 + s6 + s7;
> +}
> +
> +/* Demonstrate that we can call noanysse functions from noanysse
> +   functions.  */
> +__attribute__((noanysseclobber)) int noanysse2 (int i)
> +{
> +  return noanysse (i + 2) + 3;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-6.c b/gcc/testsuite/gcc.target/i386/sseclobber-6.c
> new file mode 100644
> index 00000000000..89ece11c9f2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sseclobber-6.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target sse2 } */
> +/* { dg-options "-O1" } */
> +
> +/* Various invalid uses of the nosse attributes.  */
> +extern __attribute__((nosseclobber)) int nonfndecl; /* { dg-warning "only applies to function types" } */
> +
> +extern int normalfunc (int);
> +__attribute__((nosseclobber)) int (*nonsse_ptr) (int) = normalfunc; /* { dg-warning "from incompatible pointer type" } */
> +
> +extern int noanysse (int) __attribute__((noanysseclobber));
> +/* Demonstrate that it's not allowed to call any functions that
> +   aren't noanysse from noanysse functions.  */
> +__attribute__((noanysseclobber)) int noanysse (int i)
> +{
> +  return normalfunc (i + 2) + 3; /* { dg-error "cannot be called from function" } */
> +}
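
For readers unfamiliar with the dg-final lines above: each scan-assembler-not directive fails the test if its regular expression matches anywhere in the generated assembly. Below is a minimal sketch of what those patterns accept and reject, using POSIX extended regexes as a stand-in for the Tcl regexes DejaGnu actually uses (for these particular patterns the syntax coincides). The helper names asm_line_matches and mentions_high_sse_reg are hypothetical and not part of the patch or of GCC.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the POSIX extended regex PATTERN matches anywhere
   in LINE.  A pattern that fails to compile counts as "no match".  */
static bool asm_line_matches(const char *pattern, const char *line)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    bool hit = regexec(&re, line, 0, NULL, 0) == 0;
    regfree(&re);
    return hit;
}

/* Mirror of the two patterns in sseclobber-5.c: true if LINE names
   one of the registers numbered 8 to 15 ([xyz]mm8 .. [xyz]mm15).  */
static bool mentions_high_sse_reg(const char *line)
{
    return asm_line_matches("mm[89]", line)
           || asm_line_matches("mm1[0-5]", line);
}
```

With this, mentions_high_sse_reg applied to a line naming %xmm8 or %ymm12 reports a match, while a line that only uses %xmm0 and %xmm1 does not. The sseclobber-4.c pattern, written as the C string "mm[0-9], [0-9]*\\(%rsp\\)", matches a store of a single-digit vector register to a stack slot but not the corresponding reload.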


Thread overview: 16+ messages
2023-07-10 15:55 [x86-64] RFC: Add nosse abi attribute Michael Matz
2023-07-10 17:28 ` Richard Biener
2023-07-10 19:07 ` Alexander Monakov
2023-07-10 20:33   ` Alexander Monakov
2023-07-11  6:42   ` Richard Biener
2023-07-11  8:53     ` Jan Hubicka
2023-07-11  9:07       ` Richard Biener
2023-07-11 13:00   ` Richard Biener
2023-07-11 13:21     ` Jan Hubicka
2023-07-11 14:00       ` Michael Matz
2023-07-11 13:59     ` Alexander Monakov
2023-07-11 14:57   ` Michael Matz
2023-07-11 15:17     ` Alexander Monakov
2023-07-11 15:34       ` Michael Matz
2023-07-11 16:53         ` Alexander Monakov
2023-07-17 23:00 ` Richard Sandiford
