public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/116497] New: Need no_caller_saved_registers with SSE support
@ 2024-08-27  5:42 andi-gcc at firstfloor dot org
  2024-08-27  5:49 ` [Bug target/116497] " andi-gcc at firstfloor dot org
                   ` (22 more replies)
  0 siblings, 23 replies; 24+ messages in thread
From: andi-gcc at firstfloor dot org @ 2024-08-27  5:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

            Bug ID: 116497
           Summary: Need no_caller_saved_registers with SSE support
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andi-gcc at firstfloor dot org
                CC: hjl.tools at gmail dot com
  Target Milestone: ---
            Target: x86_64-linux

When writing threaded code interpreters by chaining functions with musttail the
normal ABI behavior of some caller saved registers can cause unnecessary spills
and fills compared to using indirect goto.

In principle this could be avoided by using no_caller_saved_registers on the
musttail called function, and perhaps no_callee_saved_registers on the function
that starts the interpretation chain to maintain the ABI on the interpreter
entry point.

But these attributes were designed for interrupt handlers and require disabling
SSE because an interrupt handler needs to really preserve all registers. While
for the interpreter case which interacts with the normal ABI it is fine to
clobber the SSE registers, as specified by the x86_64 SYSV ABI

So disabling SSE can be done (and it is done in some real code today, see [1]
below) it is very inconvenient for an interpreter that may want to use SSE for
floating point etc.

So what we really need for the efficient musttail interpreters is
no_caller_saved_registers, but allow using SSE.

clang has a special ABI for this case (preserve_most[2]), but that seems
overkill.

There are two ways around this:

- We just remove the code that enforces no SSE for (see below patch). The
interrupt handlers would need to disable SSE without error and trust that 
it doesn't happen by mistake.

I'm not sure there is much code that uses this (I couldn't find any).
Presumably they would rather use the interrupt attribute anyways.

- We define a new attribute like no_caller_saved_registers except that it
allows using SSE.

[1] 
https://github.com/swoole/swoole-cli/blob/94ab97fbcfe39be8f5a985da82575bfb4c2319db/Zend/zend_string.c#L376

[2]
https://clang.llvm.org/docs/AttributeReference.html#preserve-most

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/116497] Need no_caller_saved_registers with SSE support
  2024-08-27  5:42 [Bug target/116497] New: Need no_caller_saved_registers with SSE support andi-gcc at firstfloor dot org
@ 2024-08-27  5:49 ` andi-gcc at firstfloor dot org
  2024-08-27  5:50 ` pinskia at gcc dot gnu.org
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: andi-gcc at firstfloor dot org @ 2024-08-27  5:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

--- Comment #1 from Andi Kleen <andi-gcc at firstfloor dot org> ---
Disable check for no_caller_saved_registers enforcing non FP.

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index f79257cc764..cec652cc9e6 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -3639,8 +3639,8 @@ ix86_set_current_function (tree fndecl)
     reinit_regs ();

   if (cfun->machine->func_type != TYPE_NORMAL
-      || (cfun->machine->call_saved_registers
-         == TYPE_NO_CALLER_SAVED_REGISTERS))
+      /* || (cfun->machine->call_saved_registers
+        == TYPE_NO_CALLER_SAVED_REGISTERS) */)
     {
       /* Don't allow SSE, MMX nor x87 instructions since they
         may change processor state.  */

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/116497] Need no_caller_saved_registers with SSE support
  2024-08-27  5:42 [Bug target/116497] New: Need no_caller_saved_registers with SSE support andi-gcc at firstfloor dot org
  2024-08-27  5:49 ` [Bug target/116497] " andi-gcc at firstfloor dot org
@ 2024-08-27  5:50 ` pinskia at gcc dot gnu.org
  2024-08-27  5:58 ` [Bug target/116497] static functions ABI should be improved for SSE caller saved registers pinskia at gcc dot gnu.org
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-08-27  5:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Sounds more like the attribute is not needed but gcc should figure out how to
improve the abi for static functions instead, like what is done already for
32bit x86.

I think even musttail is also a bogus way of doing this.

Attributes should not be used but rather improving gcc in more generic way.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/116497] static functions ABI should be improved for SSE caller saved registers
  2024-08-27  5:42 [Bug target/116497] New: Need no_caller_saved_registers with SSE support andi-gcc at firstfloor dot org
  2024-08-27  5:49 ` [Bug target/116497] " andi-gcc at firstfloor dot org
  2024-08-27  5:50 ` pinskia at gcc dot gnu.org
@ 2024-08-27  5:58 ` pinskia at gcc dot gnu.org
  2024-08-27  6:03 ` pinskia at gcc dot gnu.org
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-08-27  5:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/116497] static functions ABI should be improved for SSE caller saved registers
  2024-08-27  5:42 [Bug target/116497] New: Need no_caller_saved_registers with SSE support andi-gcc at firstfloor dot org
                   ` (2 preceding siblings ...)
  2024-08-27  5:58 ` [Bug target/116497] static functions ABI should be improved for SSE caller saved registers pinskia at gcc dot gnu.org
@ 2024-08-27  6:03 ` pinskia at gcc dot gnu.org
  2024-08-27  6:17 ` andi at firstfloor dot org
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-08-27  6:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
>When writing threaded code interpreters by chaining functions with musttail the normal ABI behavior of some caller saved registers can cause unnecessary spills and fills compared to using indirect goto.

THIS is why I call all of this attribute usage a hack since it means you will
always need to keep on changing the sources of the application rather than ever
doing improvements to GCC that would help code that didn't even know about the
attributes. The same is true of this whole musttail attribute. It does nothing
except provide an error message. There are better ways of implementing that
inside GCC really than the attribute that was added. GCC has -fopt-info which
should have been used instead.

Here is another place where the attribute is just a way to hack around instead
of improving GCC for ABI for static functions.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/116497] static functions ABI should be improved for SSE caller saved registers
  2024-08-27  5:42 [Bug target/116497] New: Need no_caller_saved_registers with SSE support andi-gcc at firstfloor dot org
                   ` (3 preceding siblings ...)
  2024-08-27  6:03 ` pinskia at gcc dot gnu.org
@ 2024-08-27  6:17 ` andi at firstfloor dot org
  2024-08-27  6:25 ` [Bug target/116497] Need no_caller_saved_registers with SSE support pinskia at gcc dot gnu.org
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: andi at firstfloor dot org @ 2024-08-27  6:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

--- Comment #4 from andi at firstfloor dot org ---
The change of the subject is incorrect. The transformation
has nothing to do with static function: consider LTO or someone
might write an interpreter spread over multiple files.

> always need to keep on changing the sources of the application rather than ever
> doing improvements to GCC that would help code that didn't even know about the
> attributes. The same is true of this whole musttail attribute. It does nothing
> except provide an error message. There are better ways of implementing that
> inside GCC really than the attribute that was added. GCC has -fopt-info which
> should have been used instead.

-fopt-info is not a useful interface for checking for an transformation
that is needed for correctness.

> Here is another place where the attribute is just a way to hack around instead
> of improving GCC for ABI for static functions.

So yes the compiler might figure this out on its own for some 
cases, but it would need this proposed attribute semantic change anyways to
do this.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/116497] Need no_caller_saved_registers with SSE support
  2024-08-27  5:42 [Bug target/116497] New: Need no_caller_saved_registers with SSE support andi-gcc at firstfloor dot org
                   ` (4 preceding siblings ...)
  2024-08-27  6:17 ` andi at firstfloor dot org
@ 2024-08-27  6:25 ` pinskia at gcc dot gnu.org
  2024-08-27  7:58 ` liuhongt at gcc dot gnu.org
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-08-27  6:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Musttail can never be used for correctness.

Also lto deals deals just fine with localizing a function.

But again you are making hacks what for? Code which is specific to one
application rather than making improvements for gcc for all. I am sorry but
that seems like the wrong. Improve optimizations instead of thinking of adding
attributes.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/116497] Need no_caller_saved_registers with SSE support
  2024-08-27  5:42 [Bug target/116497] New: Need no_caller_saved_registers with SSE support andi-gcc at firstfloor dot org
                   ` (5 preceding siblings ...)
  2024-08-27  6:25 ` [Bug target/116497] Need no_caller_saved_registers with SSE support pinskia at gcc dot gnu.org
@ 2024-08-27  7:58 ` liuhongt at gcc dot gnu.org
  2024-08-27  8:02 ` rguenth at gcc dot gnu.org
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-08-27  7:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

Hongtao Liu <liuhongt at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |liuhongt at gcc dot gnu.org

--- Comment #6 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Andi Kleen from comment #1)
> Disable check for no_caller_saved_registers enforcing non FP.
> 
> diff --git a/gcc/config/i386/i386-options.cc
> b/gcc/config/i386/i386-options.cc
> index f79257cc764..cec652cc9e6 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -3639,8 +3639,8 @@ ix86_set_current_function (tree fndecl)
>      reinit_regs ();
>  
>    if (cfun->machine->func_type != TYPE_NORMAL
> -      || (cfun->machine->call_saved_registers
> -         == TYPE_NO_CALLER_SAVED_REGISTERS))
> +      /* || (cfun->machine->call_saved_registers
> +        == TYPE_NO_CALLER_SAVED_REGISTERS) */)
>      {
>        /* Don't allow SSE, MMX nor x87 instructions since they
>          may change processor state.  */

I think RA is smart enough to save and restore SSE,MMX or x87 registers, and we
can remove TYPE_NO_CALLER_SAVED_REGISTERS part from this.
Or are there any other concerns here regarding the comments? @hj

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/116497] Need no_caller_saved_registers with SSE support
  2024-08-27  5:42 [Bug target/116497] New: Need no_caller_saved_registers with SSE support andi-gcc at firstfloor dot org
                   ` (6 preceding siblings ...)
  2024-08-27  7:58 ` liuhongt at gcc dot gnu.org
@ 2024-08-27  8:02 ` rguenth at gcc dot gnu.org
  2024-08-27  8:10 ` andi at firstfloor dot org
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-08-27  8:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
Hmm, why would a tail call need to save extra regs over what the callers caller
already saved?  We're returning to that after all.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/116497] Need no_caller_saved_registers with SSE support
  2024-08-27  5:42 [Bug target/116497] New: Need no_caller_saved_registers with SSE support andi-gcc at firstfloor dot org
                   ` (7 preceding siblings ...)
  2024-08-27  8:02 ` rguenth at gcc dot gnu.org
@ 2024-08-27  8:10 ` andi at firstfloor dot org
  2024-08-27  8:14 ` andi at firstfloor dot org
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: andi at firstfloor dot org @ 2024-08-27  8:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

--- Comment #8 from andi at firstfloor dot org ---
On Tue, Aug 27, 2024 at 07:58:30AM +0000, liuhongt at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497
> 
> Hongtao Liu <liuhongt at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |liuhongt at gcc dot gnu.org
> 
> --- Comment #6 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> (In reply to Andi Kleen from comment #1)
> > Disable check for no_caller_saved_registers enforcing non FP.
> > 
> > diff --git a/gcc/config/i386/i386-options.cc
> > b/gcc/config/i386/i386-options.cc
> > index f79257cc764..cec652cc9e6 100644
> > --- a/gcc/config/i386/i386-options.cc
> > +++ b/gcc/config/i386/i386-options.cc
> > @@ -3639,8 +3639,8 @@ ix86_set_current_function (tree fndecl)
> >      reinit_regs ();
> >  
> >    if (cfun->machine->func_type != TYPE_NORMAL
> > -      || (cfun->machine->call_saved_registers
> > -         == TYPE_NO_CALLER_SAVED_REGISTERS))
> > +      /* || (cfun->machine->call_saved_registers
> > +        == TYPE_NO_CALLER_SAVED_REGISTERS) */)
> >      {
> >        /* Don't allow SSE, MMX nor x87 instructions since they
> >          may change processor state.  */
> 
> I think RA is smart enough to save and restore SSE,MMX or x87 registers, and we
> can remove TYPE_NO_CALLER_SAVED_REGISTERS part from this.
> Or are there any other concerns here regarding the comments? @hj

While that would work, it would add unnecessary overhead
to the interpreter entry case which doesn't need to save these registers.

On the other hand maybe it is cheap enough.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/116497] Need no_caller_saved_registers with SSE support
  2024-08-27  5:42 [Bug target/116497] New: Need no_caller_saved_registers with SSE support andi-gcc at firstfloor dot org
                   ` (8 preceding siblings ...)
  2024-08-27  8:10 ` andi at firstfloor dot org
@ 2024-08-27  8:14 ` andi at firstfloor dot org
  2024-08-27  8:26 ` xry111 at gcc dot gnu.org
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: andi at firstfloor dot org @ 2024-08-27  8:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

--- Comment #9 from andi at firstfloor dot org ---
On Tue, Aug 27, 2024 at 08:02:53AM +0000, rguenth at gcc dot gnu.org wrote:
> Hmm, why would a tail call need to save extra regs over what the callers caller
> already saved?  We're returning to that after all.

The tail call doesn't need to save anything extra, but the original
function entering the tail call loop needs to, otherwise the tail
calls clobbering stuff would violate assumptions of its caller.

So the transformation is

original_caller -> add no_caller_saved_registers
tailcall loop functions -> add no_callee_saved_registers

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/116497] Need no_caller_saved_registers with SSE support
  2024-08-27  5:42 [Bug target/116497] New: Need no_caller_saved_registers with SSE support andi-gcc at firstfloor dot org
                   ` (9 preceding siblings ...)
  2024-08-27  8:14 ` andi at firstfloor dot org
@ 2024-08-27  8:26 ` xry111 at gcc dot gnu.org
  2024-08-27 13:27 ` hjl.tools at gmail dot com
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: xry111 at gcc dot gnu.org @ 2024-08-27  8:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

Xi Ruoyao <xry111 at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |xry111 at gcc dot gnu.org

--- Comment #10 from Xi Ruoyao <xry111 at gcc dot gnu.org> ---
no_call{er,ee}_saved_registers are i386-specific so how do we handle other
ports?  Are we going to require implementing them for all ports?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/116497] Need no_caller_saved_registers with SSE support
  2024-08-27  5:42 [Bug target/116497] New: Need no_caller_saved_registers with SSE support andi-gcc at firstfloor dot org
                   ` (10 preceding siblings ...)
  2024-08-27  8:26 ` xry111 at gcc dot gnu.org
@ 2024-08-27 13:27 ` hjl.tools at gmail dot com
  2024-08-27 14:21 ` andi at firstfloor dot org
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: hjl.tools at gmail dot com @ 2024-08-27 13:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2024-08-27

--- Comment #11 from H.J. Lu <hjl.tools at gmail dot com> ---
Please provide a small testcase to show the issue.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/116497] Need no_caller_saved_registers with SSE support
  2024-08-27  5:42 [Bug target/116497] New: Need no_caller_saved_registers with SSE support andi-gcc at firstfloor dot org
                   ` (11 preceding siblings ...)
  2024-08-27 13:27 ` hjl.tools at gmail dot com
@ 2024-08-27 14:21 ` andi at firstfloor dot org
  2024-08-27 14:33 ` andi at firstfloor dot org
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: andi at firstfloor dot org @ 2024-08-27 14:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

--- Comment #12 from andi at firstfloor dot org ---
> no_call{er,ee}_saved_registers are i386-specific so how do we handle other
> ports?  Are we going to require implementing them for all ports?

It's an optimization, so nothing is required. But yes if the other
targets want the more efficient interpreters they would need an
equivalent. clang's preserve_none/most currently is supported on aarch64
and x86-64. Right now I'm just trying to get it to work for x86.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/116497] Need no_caller_saved_registers with SSE support
  2024-08-27  5:42 [Bug target/116497] New: Need no_caller_saved_registers with SSE support andi-gcc at firstfloor dot org
                   ` (12 preceding siblings ...)
  2024-08-27 14:21 ` andi at firstfloor dot org
@ 2024-08-27 14:33 ` andi at firstfloor dot org
  2024-08-27 14:47 ` pinskia at gcc dot gnu.org
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: andi at firstfloor dot org @ 2024-08-27 14:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

--- Comment #13 from andi at firstfloor dot org ---
> --- Comment #11 from H.J. Lu <hjl.tools at gmail dot com> ---
> Please provide a small testcase to show the issue.

You mean a test case for no_caller_saved_registers failing with SSE?

It's just 

__attribute__((no_caller_saved_registers)) void foo(void) {} 

(if you compile without any special options, or use the
target dance as seen in the PHP github link above)

Or a test case for the intended register allocation benefits?
That's more complicated and won't be small.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/116497] Need no_caller_saved_registers with SSE support
  2024-08-27  5:42 [Bug target/116497] New: Need no_caller_saved_registers with SSE support andi-gcc at firstfloor dot org
                   ` (13 preceding siblings ...)
  2024-08-27 14:33 ` andi at firstfloor dot org
@ 2024-08-27 14:47 ` pinskia at gcc dot gnu.org
  2024-08-27 15:19 ` hjl.tools at gmail dot com
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-08-27 14:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

--- Comment #14 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to andi from comment #13)
> Or a test case for the intended register allocation benefits?
> That's more complicated and won't be small.

So what if it won't be small but it will be understanding the overall benefit
that is if it is a good idea or not.

Note you can fake high register pressure by using inline-asm and clobbers which
should make the testcase small really :).

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/116497] Need no_caller_saved_registers with SSE support
  2024-08-27  5:42 [Bug target/116497] New: Need no_caller_saved_registers with SSE support andi-gcc at firstfloor dot org
                   ` (14 preceding siblings ...)
  2024-08-27 14:47 ` pinskia at gcc dot gnu.org
@ 2024-08-27 15:19 ` hjl.tools at gmail dot com
  2024-08-27 15:53 ` andi-gcc at firstfloor dot org
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: hjl.tools at gmail dot com @ 2024-08-27 15:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

--- Comment #15 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to andi from comment #13)
> > --- Comment #11 from H.J. Lu <hjl.tools at gmail dot com> ---
> > Please provide a small testcase to show the issue.
> 
> You mean a test case for no_caller_saved_registers failing with SSE?

No.  We need a testcase to show the missed optimization without
no_caller_saved_registers.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/116497] Need no_caller_saved_registers with SSE support
  2024-08-27  5:42 [Bug target/116497] New: Need no_caller_saved_registers with SSE support andi-gcc at firstfloor dot org
                   ` (15 preceding siblings ...)
  2024-08-27 15:19 ` hjl.tools at gmail dot com
@ 2024-08-27 15:53 ` andi-gcc at firstfloor dot org
  2024-08-27 16:25 ` hjl.tools at gmail dot com
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: andi-gcc at firstfloor dot org @ 2024-08-27 15:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

--- Comment #16 from Andi Kleen <andi-gcc at firstfloor dot org> ---
Created attachment 59013
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59013&action=edit
test case

This test case using Pinski's clobber trick shows the benefit.

If you compile with -O2 -mgeneral-regs-only the inc/dec opcodes don't save any
extra registers and generate nearly optimal code. If you make the
SAVE_REGS/DONT_SAVE_REGS macros empty they have a lot of extra push/pop, which
would ruin the interpreter loop.

-mgeneral-regs-only works for this case, but breaks SSE.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/116497] Need no_caller_saved_registers with SSE support
  2024-08-27  5:42 [Bug target/116497] New: Need no_caller_saved_registers with SSE support andi-gcc at firstfloor dot org
                   ` (16 preceding siblings ...)
  2024-08-27 15:53 ` andi-gcc at firstfloor dot org
@ 2024-08-27 16:25 ` hjl.tools at gmail dot com
  2024-08-27 17:06 ` andi at firstfloor dot org
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: hjl.tools at gmail dot com @ 2024-08-27 16:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

--- Comment #17 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Andi Kleen from comment #16)
> Created attachment 59013 [details]
> test case
> 
> This test case using Pinski's clobber trick shows the benefit.
> 
> If you compile with -O2 -mgeneral-regs-only the inc/dec opcodes don't save
> any extra registers and generate nearly optimal code. If you make the
> SAVE_REGS/DONT_SAVE_REGS macros empty they have a lot of extra push/pop,
> which would ruin the interpreter loop.
> 
> -mgeneral-regs-only works for this case, but breaks SSE.

Why is __attribute__((no_caller_saved_registers)) needed on start?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/116497] Need no_caller_saved_registers with SSE support
  2024-08-27  5:42 [Bug target/116497] New: Need no_caller_saved_registers with SSE support andi-gcc at firstfloor dot org
                   ` (17 preceding siblings ...)
  2024-08-27 16:25 ` hjl.tools at gmail dot com
@ 2024-08-27 17:06 ` andi at firstfloor dot org
  2024-08-27 17:12 ` hjl.tools at gmail dot com
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: andi at firstfloor dot org @ 2024-08-27 17:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

--- Comment #18 from andi at firstfloor dot org ---
> > -mgeneral-regs-only works for this case, but breaks SSE.
> 
> Why is __attribute__((no_caller_saved_registers)) needed on start?

To maintain the standard ABI to its caller. Otherwise the final
return could clobber caller state.

Basically the optimization is to move all the save/returns to only
one wrapper function, not every interpreter op function.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/116497] Need no_caller_saved_registers with SSE support
  2024-08-27  5:42 [Bug target/116497] New: Need no_caller_saved_registers with SSE support andi-gcc at firstfloor dot org
                   ` (18 preceding siblings ...)
  2024-08-27 17:06 ` andi at firstfloor dot org
@ 2024-08-27 17:12 ` hjl.tools at gmail dot com
  2024-08-27 17:49 ` andi at firstfloor dot org
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: hjl.tools at gmail dot com @ 2024-08-27 17:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

--- Comment #19 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to andi from comment #18)
> > > -mgeneral-regs-only works for this case, but breaks SSE.
> > 
> > Why is __attribute__((no_caller_saved_registers)) needed on start?
> 
> To maintain the standard ABI to its caller. Otherwise the final
> return could clobber caller state.

GCC should do the right thing without no_caller_saved_registers. If not,
it is a GCC bug.  Do you have a testcase to show such GCC bug?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/116497] Need no_caller_saved_registers with SSE support
  2024-08-27  5:42 [Bug target/116497] New: Need no_caller_saved_registers with SSE support andi-gcc at firstfloor dot org
                   ` (19 preceding siblings ...)
  2024-08-27 17:12 ` hjl.tools at gmail dot com
@ 2024-08-27 17:49 ` andi at firstfloor dot org
  2024-08-27 17:50 ` andi-gcc at firstfloor dot org
  2024-08-27 17:50 ` andi-gcc at firstfloor dot org
  22 siblings, 0 replies; 24+ messages in thread
From: andi at firstfloor dot org @ 2024-08-27 17:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

--- Comment #20 from andi at firstfloor dot org ---
On Tue, Aug 27, 2024 at 05:12:41PM +0000, hjl.tools at gmail dot com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497
> 
> --- Comment #19 from H.J. Lu <hjl.tools at gmail dot com> ---
> (In reply to andi from comment #18)
> > > > -mgeneral-regs-only works for this case, but breaks SSE.
> > > 
> > > Why is __attribute__((no_caller_saved_registers)) needed on start?
> > 
> > To maintain the standard ABI to its caller. Otherwise the final
> > return could clobber caller state.
> 
> GCC should do the right thing without no_caller_saved_registers. If not,
> it is a GCC bug.  Do you have a testcase to show such GCC bug?

The test case is the same, just commenting out SAVE_REGS.

You're right. It seems gcc does the right thing based on the callee
ABIs. I hadn't realized that.

So yes the attribute and the change are not really needed. Good news.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/116497] Need no_caller_saved_registers with SSE support
  2024-08-27  5:42 [Bug target/116497] New: Need no_caller_saved_registers with SSE support andi-gcc at firstfloor dot org
                   ` (20 preceding siblings ...)
  2024-08-27 17:49 ` andi at firstfloor dot org
@ 2024-08-27 17:50 ` andi-gcc at firstfloor dot org
  2024-08-27 17:50 ` andi-gcc at firstfloor dot org
  22 siblings, 0 replies; 24+ messages in thread
From: andi-gcc at firstfloor dot org @ 2024-08-27 17:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

--- Comment #21 from Andi Kleen <andi-gcc at firstfloor dot org> ---
As HJ pointed out the change is not needed, the compiler DTRT with
no_callee_saved_registers on the callees.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug target/116497] Need no_caller_saved_registers with SSE support
  2024-08-27  5:42 [Bug target/116497] New: Need no_caller_saved_registers with SSE support andi-gcc at firstfloor dot org
                   ` (21 preceding siblings ...)
  2024-08-27 17:50 ` andi-gcc at firstfloor dot org
@ 2024-08-27 17:50 ` andi-gcc at firstfloor dot org
  22 siblings, 0 replies; 24+ messages in thread
From: andi-gcc at firstfloor dot org @ 2024-08-27 17:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116497

Andi Kleen <andi-gcc at firstfloor dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|WAITING                     |RESOLVED

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2024-08-27 17:50 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-08-27  5:42 [Bug target/116497] New: Need no_caller_saved_registers with SSE support andi-gcc at firstfloor dot org
2024-08-27  5:49 ` [Bug target/116497] " andi-gcc at firstfloor dot org
2024-08-27  5:50 ` pinskia at gcc dot gnu.org
2024-08-27  5:58 ` [Bug target/116497] static functions ABI should be improved for SSE caller saved registers pinskia at gcc dot gnu.org
2024-08-27  6:03 ` pinskia at gcc dot gnu.org
2024-08-27  6:17 ` andi at firstfloor dot org
2024-08-27  6:25 ` [Bug target/116497] Need no_caller_saved_registers with SSE support pinskia at gcc dot gnu.org
2024-08-27  7:58 ` liuhongt at gcc dot gnu.org
2024-08-27  8:02 ` rguenth at gcc dot gnu.org
2024-08-27  8:10 ` andi at firstfloor dot org
2024-08-27  8:14 ` andi at firstfloor dot org
2024-08-27  8:26 ` xry111 at gcc dot gnu.org
2024-08-27 13:27 ` hjl.tools at gmail dot com
2024-08-27 14:21 ` andi at firstfloor dot org
2024-08-27 14:33 ` andi at firstfloor dot org
2024-08-27 14:47 ` pinskia at gcc dot gnu.org
2024-08-27 15:19 ` hjl.tools at gmail dot com
2024-08-27 15:53 ` andi-gcc at firstfloor dot org
2024-08-27 16:25 ` hjl.tools at gmail dot com
2024-08-27 17:06 ` andi at firstfloor dot org
2024-08-27 17:12 ` hjl.tools at gmail dot com
2024-08-27 17:49 ` andi at firstfloor dot org
2024-08-27 17:50 ` andi-gcc at firstfloor dot org
2024-08-27 17:50 ` andi-gcc at firstfloor dot org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).