From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id F3E303858D33 for ; Mon, 17 Jul 2023 23:00:11 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org F3E303858D33 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 0ED17C15; Mon, 17 Jul 2023 16:00:55 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id EC53D3F738; Mon, 17 Jul 2023 16:00:10 -0700 (PDT) From: Richard Sandiford To: Michael Matz via Gcc-patches Mail-Followup-To: Michael Matz via Gcc-patches ,Jan Hubicka , Michael Matz , richard.sandiford@arm.com Cc: Jan Hubicka , Michael Matz Subject: Re: [x86-64] RFC: Add nosse abi attribute References: Date: Tue, 18 Jul 2023 00:00:09 +0100 In-Reply-To: (Michael Matz via Gcc-patches's message of "Mon, 10 Jul 2023 15:55:27 +0000 (UTC)") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-26.7 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,KAM_SHORT,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Michael Matz via Gcc-patches writes: > Hello, > > the ELF psABI for x86-64 doesn't have any callee-saved SSE > registers (there were actual reasons for that, but those don't > matter anymore). This starts to hurt some uses, as it means that > as soon as you have a call (say to memmove/memcpy, even if > implicit as libcall) in a loop that manipulates floating point > or vector data you get saves/restores around those calls. > > But in reality many functions can be written such that they only need > to clobber a subset of the 16 XMM registers (or do the save/restore > themself in the codepaths that needs them, hello memcpy again). > So we want to introduce a way to specify this, via an ABI attribute > that basically says "doesn't clobber the high XMM regs". > > I've opted to do only the obvious: do something special only for > xmm8 to xmm15, without a way to specify the clobber set in more detail. > I think such half/half split is reasonable, and as I don't want to > change the argument passing anyway (whose regs are always clobbered) > there isn't that much wiggle room anyway. > > I chose to make it possible to write function definitions with that > attribute with GCC adding the necessary callee save/restore code in > the xlogue itself. Carefully note that this is only possible for > the SSE2 registers, as other parts of them would need instructions > that are only optional. When a function doesn't contain calls to > unknown functions we can be a bit more lenient: we can make it so that > GCC simply doesn't touch xmm8-15 at all, then no save/restore is > necessary. If a function contains calls then GCC can't know which > parts of the XMM regset is clobbered by that, it may be parts > which don't even exist yet (say until avx2048 comes out), so we must > restrict ourself to only save/restore the SSE2 parts and then of course > can only claim to not clobber those parts. > > To that end I introduce actually two related attributes (for naming > see below): > * nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered > * noanysseclobber: claims (and ensures) that nothing of any of the > registers overlapping xmm8-15 is clobbered (not even future, as of > yet unknown, parts) > > Ensuring the first is simple: potentially add saves/restore in xlogue > (e.g. when xmm8 is either used explicitely or implicitely by a call). > Ensuring the second comes with more: we must also ensure that no > functions are called that don't guarantee the same thing (in addition > to just removing all xmm8-15 parts alltogether from the available > regsters). > > See also the added testcases for what I intended to support. > > I chose to use the new target independend function-abi facility for > this. I need some adjustments in generic code: > * the "default_abi" is actually more like a "current" abi: it happily > changes its contents according to conditional_register_usage, > and other code assumes that such changes do propagate. > But if that conditonal_reg_usage is actually done because the current > function is of a different ABI, then we must not change default_abi. > * in insn_callee_abi we do look at a potential fndecl for a call > insn (only set when -fipa-ra), but doesn't work for calls through > pointers and (as said) is optional. So, also always look at the > called functions type (it's always recorded in the MEM_EXPR for > non-libcalls), before asking the target. > (The function-abi accessors working on trees were already doing that, > its just the RTL accessor that missed this) > [...] > diff --git a/gcc/function-abi.cc b/gcc/function-abi.cc > index 2ab9b2c5649..efbe114218c 100644 > --- a/gcc/function-abi.cc > +++ b/gcc/function-abi.cc > @@ -42,6 +42,26 @@ void > predefined_function_abi::initialize (unsigned int id, > const_hard_reg_set full_reg_clobbers) > { > + /* Don't reinitialize an ABI struct. We might be called from reinit_regs > + from the targets conditional_register_usage hook which might depend > + on cfun and might have changed the global register sets according > + to that functions ABI already. That's not the default ABI anymore. > + > + XXX only avoid this if we're reinitializing the default ABI, and the > + current function is _not_ of the default ABI. That's for > + backward compatibility where some backends modify the regsets with > + the exception that those changes are then reflected also in the default > + ABI (which rather is then the "current" ABI). E.g. x86_64 with the > + ms_abi vs sysv attribute. They aren't reflected by separate ABI > + structs, but handled different. The "default" ABI hence changes > + back and forth (and is expected to!) between a ms_abi and a sysv > + function. */ The default ABI is also the eh_edge_abi, and so describes the set of registers that are preserved or clobbered across EH edges. If changing between ms_abi and sysv changes the "default" ABI's clobber set, I assume it also (intentionally?) changes the EH edge clobber set, but how does that work in practice? Richard > + if (m_initialized > + && id == 0 > + && cfun > + && fndecl_abi (cfun->decl).base_abi ().id() != 0) > + return; > + > m_id = id; > m_initialized = true; > m_full_reg_clobbers = full_reg_clobbers; > @@ -224,6 +244,13 @@ insn_callee_abi (const rtx_insn *insn) > if (tree fndecl = get_call_fndecl (insn)) > return fndecl_abi (fndecl); > > + if (rtx call = get_call_rtx_from (insn)) > + { > + tree memexp = MEM_EXPR (XEXP (call, 0)); > + if (memexp) > + return fntype_abi (TREE_TYPE (memexp)); > + } > + > if (targetm.calls.insn_callee_abi) > return targetm.calls.insn_callee_abi (insn); > > diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-1.c b/gcc/testsuite/gcc.target/i386/sseclobber-1.c > new file mode 100644 > index 00000000000..8758e2d3109 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/sseclobber-1.c > @@ -0,0 +1,15 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target sse2 } */ > +/* { dg-options "-O1" } */ > +/* { dg-final { scan-assembler-times {mm[89], [0-9]*\(%rsp\)} 2 } } */ > +/* { dg-final { scan-assembler-times {mm1[0-5], [0-9]*\(%rsp\)} 6 } } */ > + > +extern int nonsse (int) __attribute__((nosseclobber)); > +extern int normalfunc (int); > + > +/* Demonstrate that all regs potentially clobbered by normal psABI > + functions are saved/restored by otherabi functions. */ > +__attribute__((nosseclobber)) int nonsse (int i) > +{ > + return normalfunc (i + 2) + 3; > +} > diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-2.c b/gcc/testsuite/gcc.target/i386/sseclobber-2.c > new file mode 100644 > index 00000000000..9abafa0a9ba > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/sseclobber-2.c > @@ -0,0 +1,14 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target sse2 } */ > +/* { dg-options "-O1" } */ > +/* { dg-final { scan-assembler-not {mm[0-9], [0-9]*\(%rsp\)} } } */ > + > +extern int nonsse (int) __attribute__((nosseclobber)); > +extern int othernonsse (int) __attribute__((nosseclobber)); > + > +/* Demonstrate that calling a nosseclobber function from a nosseclobber > + function does _not_ need to save all the regs (unlike in nonsse). */ > +__attribute__((nosseclobber)) int nonsse (int i) > +{ > + return othernonsse (i + 2) + 3; > +} > diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-3.c b/gcc/testsuite/gcc.target/i386/sseclobber-3.c > new file mode 100644 > index 00000000000..276c7fd926b > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/sseclobber-3.c > @@ -0,0 +1,54 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target sse2 } */ > +/* { dg-options "-O1" } */ > +/* for docalc2 we should use the high xmm regs */ > +/* { dg-final { scan-assembler {xmm[89]} } } */ > +/* do docalc4_notany we should use the high ymm regs */ > +/* { dg-final { scan-assembler {ymm[89]} } } */ > +/* for docalc4 (and nowhere else) we should save/restore exactly > + one reg to stack around the inner-loop call */ > +/* { dg-final { scan-assembler-times {ymm[0-9]*, [0-9]*\(%rsp\)} 1 } } */ > + > +typedef double dbl2 __attribute__((vector_size(16))); > +typedef double dbl4 __attribute__((vector_size(32))); > +typedef double dbl8 __attribute__((vector_size(64))); > +extern __attribute__((nosseclobber,const)) double nonsse (int); > + > +/* Demonstrate that some values can be kept in a register over calls > + to otherabi functions. nonsse saves the XMM register, so those > + are usable, hence docalc2 should be able to keep values in registers > + over the nonsse call. */ > +void docalc2 (dbl2 *d, dbl2 *a, dbl2 *b, int n) > +{ > + long i; > + for (i = 0; i < n; i++) > + { > + d[i] = a[i] * b[i] * nonsse(i); > + } > +} > + > +/* Here we're using YMM registers (four doubles) and those are _not_ > + saved by nonsse() (only the XMM parts) so docalc4 should not keep > + the value in a register over the call to nonsse. */ > +void __attribute__((target("avx2"))) docalc4 (dbl4 *d, dbl4 *a, dbl4 *b, int n) > +{ > + long i; > + for (i = 0; i < n; i++) > + { > + d[i] = a[i] * b[i] * nonsse(i); > + } > +} > + > +/* And here we're also using YMM registers, but have a call to a > + noanysseclobber function, which _does_ save all [XYZ]MM regs except > + arguments, so docalc4_notany should again be able to keep the value > + in a register. */ > +extern __attribute__((noanysseclobber,const)) double notanysse (int); > +void __attribute__((target("avx2"))) docalc4_notany (dbl4 *d, dbl4 *a, dbl4 *b, int n) > +{ > + long i; > + for (i = 0; i < n; i++) > + { > + d[i] = a[i] * b[i] * notanysse(i); > + } > +} > diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-4.c b/gcc/testsuite/gcc.target/i386/sseclobber-4.c > new file mode 100644 > index 00000000000..734f25068f0 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/sseclobber-4.c > @@ -0,0 +1,21 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target sse2 } */ > +/* { dg-options "-O1" } */ > +/* { dg-final { scan-assembler-not {mm[0-9], [0-9]*\(%rsp\)} } } */ > + > +extern __attribute__((nosseclobber)) int (*nonsse_ptr) (int); > + > +/* Demonstrate that some values can be kept in a register over calls > + to otherabi functions when called via function pointer. */ > +double docalc (double d) > +{ > + double ret = d; > + int i = 0; > + while (1) { > + int j = nonsse_ptr (i++); > + if (!j) > + break; > + ret += j; > + } > + return ret; > +} > diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-5.c b/gcc/testsuite/gcc.target/i386/sseclobber-5.c > new file mode 100644 > index 00000000000..1869ae06148 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/sseclobber-5.c > @@ -0,0 +1,37 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target sse2 } */ > +/* { dg-options "-O1" } */ > +/* { dg-final { scan-assembler-not {mm[89]} } } */ > +/* { dg-final { scan-assembler-not {mm1[0-5]} } } */ > + > +extern int noanysse (int) __attribute__((noanysseclobber)); > +extern int noanysse2 (int) __attribute__((noanysseclobber)); > +extern __attribute__((noanysseclobber)) double calcstuff (double, double); > + > +/* Demonstrate that none of the clobbered SSE (or wider) regs are > + used by a noanysse function. */ > +__attribute__((noanysseclobber)) double calcstuff (double d, double e) > +{ > + double s1, s2, s3, s4, s5, s6, s7, s8; > + s1 = s2 = s3 = s4 = s5 = s6 = s7 = s8 = 0.0; > + while (d > 0.1) > + { > + s1 += s2 * 2 + d; > + s2 += s3 * 3 + e; > + s3 += s4 * 5 + d * e; > + s4 += e / d; > + s5 += s2 * 7 + d - e; > + s5 += 2 * d + e; > + s6 += 5 * e + d; > + s7 += 7 * e * (d+1); > + d -= e; > + } > + return s1 + s2 + s3 + s4 + s5 + s6 + s7; > +} > + > +/* Demonstrate that we can call noanysse functions from noannysse > + functions. */ > +__attribute__((noanysseclobber)) int noanysse2 (int i) > +{ > + return noanysse (i + 2) + 3; > +} > diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-6.c b/gcc/testsuite/gcc.target/i386/sseclobber-6.c > new file mode 100644 > index 00000000000..89ece11c9f2 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/sseclobber-6.c > @@ -0,0 +1,17 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target sse2 } */ > +/* { dg-options "-O1" } */ > + > +/* Various ways of invalid usage of the nosse attributes. */ > +extern __attribute__((nosseclobber)) int nonfndecl; /* { dg-warning "only applies to function types" } */ > + > +extern int normalfunc (int); > +__attribute__((nosseclobber)) int (*nonsse_ptr) (int) = normalfunc; /* { dg-warning "from incompatible pointer type" } */ > + > +extern int noanysse (int) __attribute__((noanysseclobber)); > +/* Demonstrate that it's not allowed to call any functions that > + aren't noanysse from noanysse functions. */ > +__attribute__((noanysseclobber)) int noanysse (int i) > +{ > + return normalfunc (i + 2) + 3; /* { dg-error "cannot be called from function" } */ > +}