public inbox for gcc@gcc.gnu.org
* Calling convention for Intel APX extension
@ 2023-07-27  8:38 Thomas Koenig
  2023-07-27 13:10 ` Florian Weimer
  2023-07-27 13:43 ` Michael Matz
  0 siblings, 2 replies; 5+ messages in thread
From: Thomas Koenig @ 2023-07-27  8:38 UTC (permalink / raw)
  To: gcc mailing list

With the upcoming Intel APX extension, Intel processors will
finally gain 32 general-purpose registers and three-operand
arithmetic, see

https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html

Intel recommends making the new registers caller-saved for
compatibility with current calling conventions.  If I understand this
correctly, this is required for exception unwinding, but not if the
function called is __attribute__((nothrow)).

Since Fortran tends to use a lot of registers for its array descriptors,
and also tends to call nothrow functions (all Fortran functions, and
all Fortran intrinsics, such as sin/cos/etc) a lot, it could profit from
making some of the new registers callee-saved, to save some spills
at function calls.

What are your thoughts on this? Is a modification to the psABI already in
the works, and how would it treat this case?

Best regards

	Thomas




* Re: Calling convention for Intel APX extension
  2023-07-27  8:38 Calling convention for Intel APX extension Thomas Koenig
@ 2023-07-27 13:10 ` Florian Weimer
  2023-07-27 13:43 ` Michael Matz
  1 sibling, 0 replies; 5+ messages in thread
From: Florian Weimer @ 2023-07-27 13:10 UTC (permalink / raw)
  To: Thomas Koenig via Gcc; +Cc: Thomas Koenig

* Thomas Koenig via Gcc:

> Intel recommends making the new registers caller-saved for
> compatibility with current calling conventions.  If I understand this
> correctly, this is required for exception unwinding, but not if the
> function called is __attribute__((nothrow)).

Nothrow functions still can call longjmp, so that's probably not the
right discriminator.
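
A minimal sketch of why nothrow is not sufficient, assuming GCC or Clang
attribute syntax; the names env, bail_out and call_and_catch are made up
for illustration.  The callee is declared nothrow, yet control leaves it
through longjmp rather than a normal return, so a register the callee
was supposed to restore on return would keep its clobbered value unless
jmp_buf had room for it:

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf env;   /* hypothetical jump target for this sketch */

/* nothrow promises that no exception propagates out of the function.
   It says nothing about leaving via longjmp.  */
__attribute__((nothrow)) void bail_out(void)
{
    longjmp(env, 1);
}

/* Returns 1 if control came back through longjmp.  */
int call_and_catch(void)
{
    if (setjmp(env) == 0) {
        bail_out();
        assert(!"bail_out returned normally");   /* not reached */
        return 0;
    }
    return 1;
}
```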

For glibc on Linux, we have some extra space in jmp_buf in the signal
mask (we have 1024 bits, but the kernel can use just 64, some of those
have already been repurposed), but it's going to be tough for
cancellation support because of a historic micro-optimization there.

Thanks,
Florian



* Re: Calling convention for Intel APX extension
  2023-07-27  8:38 Calling convention for Intel APX extension Thomas Koenig
  2023-07-27 13:10 ` Florian Weimer
@ 2023-07-27 13:43 ` Michael Matz
  2023-07-30 15:33   ` Thomas Koenig
  1 sibling, 1 reply; 5+ messages in thread
From: Michael Matz @ 2023-07-27 13:43 UTC (permalink / raw)
  To: Thomas Koenig; +Cc: gcc mailing list

Hey,

On Thu, 27 Jul 2023, Thomas Koenig via Gcc wrote:

> Intel recommends making the new registers caller-saved for
> compatibility with current calling conventions.  If I understand this
> correctly, this is required for exception unwinding, but not if the
> function called is __attribute__((nothrow)).

That's not the full truth.  It's not (only) exception handling but also 
context switching via setjmp/longjmp and make/get/setcontext.

The data structures for that are unfortunately part of the ABI and can't
be assumed to be extensible (as Florian says, for glibc there may be
hacks (or maybe not) on x86-64; some other archs implemented
extensibility from the outset).  So all registers (and register parts!)
added after the initial psABI was defined usually _have_ to be
call-clobbered.
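
To see why such a structure cannot simply grow, consider a sketch in
which an application embeds jmp_buf in its own type (struct task and
offset_of_id are hypothetical names).  sizeof(jmp_buf) and every offset
behind it are compiled into the application, so a newer libc that wanted
room for r16-r31 could not enlarge it without breaking existing
binaries:

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical application type that stores a saved context inline.  */
struct task {
    jmp_buf saved;
    int id;
};

/* The offset of id is fixed when the application is compiled; it
   depends on the size jmp_buf had at that time.  */
size_t offset_of_id(void)
{
    size_t off = offsetof(struct task, id);
    assert(off >= sizeof(jmp_buf));   /* id sits behind the whole jmp_buf */
    return off;
}
```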

> Since Fortran tends to use a lot of registers for its array descriptors,
> and also tends to call nothrow functions (all Fortran functions, and
> all Fortran intrinsics, such as sin/cos/etc) a lot, it could profit from
> making some of the new registers callee-saved, to save some spills
> at function calls.

I've recently submitted a patch that adds some attributes that basically 
say "these-and-those regs aren't clobbered by this function" (I did them 
for not clobbered xmm8-15).  Something similar could be used for the new 
GPRs as well.  Then it would be a matter of ensuring that the interesting 
functions are marked with those attributes (and then of course do the
necessary call-save/restore).


Ciao,
Michael.


* Re: Calling convention for Intel APX extension
  2023-07-27 13:43 ` Michael Matz
@ 2023-07-30 15:33   ` Thomas Koenig
  2023-07-31 12:43     ` Michael Matz
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Koenig @ 2023-07-30 15:33 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc mailing list

Am 27.07.23 um 15:43 schrieb Michael Matz:

> I've recently submitted a patch that adds some attributes that basically
> say "these-and-those regs aren't clobbered by this function" (I did them
> for not clobbered xmm8-15).  Something similar could be used for the new
> GPRs as well.  Then it would be a matter of ensuring that the interesting
> functions are marked with those attributes (and then of course do the
> necessary call-save/restore).

Interesting.

Taking this a bit further: The compiler knows which registers it used
(and which ones might get clobbered by called functions) and could
generate such information automatically and embed it in the assembly
file, and the assembler could, in turn, put it into the object file.

A linker (or LTO) could then check this and elide save/restore pairs
where they are not needed.
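
The per-function information could be as small as a clobber mask.  A toy
model, assuming one bit per general-purpose register and hypothetical
names (regmask_t, reg, must_save, count_regs); it is not how GCC or any
object format actually encodes this:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t regmask_t;   /* bit n set = register rn */

/* Mask with only register rn set.  */
regmask_t reg(unsigned n)
{
    assert(n < 32);
    return (regmask_t)1 << n;
}

/* Registers that need a save/restore pair around one call: those that
   hold a live value across the call and that the callee may clobber.  */
regmask_t must_save(regmask_t live_across_call, regmask_t callee_clobbers)
{
    return live_across_call & callee_clobbers;
}

/* Number of registers in a mask.  */
int count_regs(regmask_t m)
{
    int n = 0;
    while (m) {
        m &= m - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

With r16, r17 and r18 live across a call, a callee assumed to clobber
all of r16-r31 (mask 0xFFFF0000) costs three save/restore pairs; a
callee known to touch only r16 costs one.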

Now, I know that removing instructions during linking is a dangerous
business, and is a source of hard-to-find and rare bugs (the worst kind)
if not done right; a bullet-proof algorithm would be needed for that.

It would probably be impossible for calls into shared libraries, since
the saved registers might change from version to version.  It also would
probably not work for virtual member functions which are not found by
devirtualization.

Still, potential gains could be substantial, and it could have an
effect which could come close to inlining, while actually saving space
instead of using extra.

Comments?


* Re: Calling convention for Intel APX extension
  2023-07-30 15:33   ` Thomas Koenig
@ 2023-07-31 12:43     ` Michael Matz
  0 siblings, 0 replies; 5+ messages in thread
From: Michael Matz @ 2023-07-31 12:43 UTC (permalink / raw)
  To: Thomas Koenig; +Cc: gcc mailing list

Hello,

On Sun, 30 Jul 2023, Thomas Koenig wrote:

> > I've recently submitted a patch that adds some attributes that basically
> > say "these-and-those regs aren't clobbered by this function" (I did them
> > for not clobbered xmm8-15).  Something similar could be used for the new
> > GPRs as well.  Then it would be a matter of ensuring that the interesting
> > functions are marked with those attributes (and then of course do the
> > necessary call-save/restore).
> 
> Interesting.
> 
> Taking this a bit further: The compiler knows which registers it used
> (and which ones might get clobbered by called functions) and could
> generate such information automatically and embed it in the assembly
> file, and the assembler could, in turn, put it into the object file.
> 
> A linker (or LTO) could then check this and elide save/restore pairs
> where they are not needed.

LTO with interprocedural register allocation (-fipa-ra) already does this.  

Doing it without LTO is possible to implement in the way you suggest, but
it is very hard to make effective: the saving/restoring of registers
might be scheduled in non-trivial ways, and getting rid of instruction
bytes within function bodies at link time is difficult.  It needs
excessive meta-information to be effective (e.g. all jumps that
potentially cross the removed bytes must get relocations).
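
The relocation requirement can be illustrated with a toy model of a
PC-relative jump, assuming offsets are byte positions within one
function and that the bytes [start, start + len) are deleted.  remap and
new_disp are hypothetical helpers, not part of any linker:

```c
#include <assert.h>
#include <stdint.h>

/* Position of an old offset after the bytes [start, start + len) are
   removed.  Offsets inside the removed range collapse onto start.  */
static uint32_t remap(uint32_t off, uint32_t start, uint32_t len)
{
    assert(len > 0);   /* the caller removes at least one byte */
    if (off <= start)
        return off;
    if (off >= start + len)
        return off - len;
    return start;
}

/* New displacement of a relative jump.  next_insn is the offset of the
   instruction after the jump; disp is relative to next_insn.  */
int32_t new_disp(uint32_t next_insn, int32_t disp,
                 uint32_t start, uint32_t len)
{
    uint32_t target = next_insn + (uint32_t)disp;
    uint32_t moved_next = remap(next_insn, start, len);
    uint32_t moved_target = remap(target, start, len);
    return (int32_t)(moved_target - moved_next);
}
```

A jump whose source and target lie on the same side of the removed
range keeps its displacement.  Only the crossing ones change, and the
linker can find those only if a relocation was recorded for each.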

So you either limit the ways that prologues and epilogues are emitted to
help the linker (thereby limiting effectiveness of unchanged xlogues) or
you emit more meta-info than the instruction bytes themselves, bloating
object files for dubious outcomes.

> It would probably be impossible for calls into shared libraries, since
> the saved registers might change from version to version.

The above scheme could be extended to also allow introducing stubs 
(wrappers) for shared lib functions, handled by the dynamic loader.  But 
then you would get hard problems to solve related to function addresses 
and their uniqueness.
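
The uniqueness problem in a nutshell: C guarantees that all pointers to
the same function compare equal, so a stub must never become visible as
a second address for it.  A sketch with hypothetical names, where
real_func stands in for the shared-library function and stub_func for a
loader-generated wrapper:

```c
#include <assert.h>

typedef int (*fn_t)(int);

/* Stands in for a function exported by a shared library.  */
int real_func(int x) { return x + 1; }

/* Stands in for a wrapper that would save and restore extra registers
   around the real call.  */
int stub_func(int x) { return real_func(x); }

/* Address comparison as user code might do it, e.g. for callbacks.  */
int same_function(fn_t a, fn_t b)
{
    assert(a && b);
    return a == b;
}
```

The stub behaves like the real function when called, but
same_function(stub_func, real_func) is 0.  If one module took the
address of the stub and another the address of the real function, code
comparing the two pointers would misbehave.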

> Still, potential gains could be substantial, and it could have an
> effect which could come close to inlining, while actually saving space
> instead of using extra.
> 
> Comments?

I think it would be an interesting experiment to implement such a scheme
fully just to see how effective it would be in practice.  But it's very
non-trivial to do, and my guess is that it won't be super effective.  So
it could be a typical research paper topic :-)

At least outside of extreme cases like the SSE regs, where none are 
callee-saved, and which can be handled in a different way like the 
explicit attributes.


Ciao,
Michael.

