Re: optimizing calling conventions for function returns

public inbox for gcc@gcc.gnu.org

* Re: optimizing calling conventions for function returns
@ 2006-05-24  9:57 Etienne Lorrain
  2006-05-24 10:04 ` Andrew Pinski
  0 siblings, 1 reply; 16+ messages in thread
From: Etienne Lorrain @ 2006-05-24  9:57 UTC (permalink / raw)
  To: gcc; +Cc: jonsmirl

> Looking at assembly listings of the Linux kernel I see thousands of
> places where function returns are checked to be non-zero to indicate
> errors. For example something like this:
> 
>     mov ebx, 0
> .L1:
>     call foo
>     test eax, eax
>     jnz .Lerror

 Another calling convention could be to return not only the "return value"
in %eax (or %edx:%eax for long long returns) but also the result of its
comparison to zero in the flags, so that you get:
    call foo
    jg  .Lwarning
    jnz .Lerror

 The test is done in the called function, but it is often done there anyway:
for instance, when an internal function fails there is a chain of returns,
and the same %eax value is tested over and over again in each function
body, each time followed by a return.

  Etienne.

___________________________________________________________________________ 
Yahoo! Mail reinvents email! Discover the new Yahoo! Mail and its revolutionary interface.
http://fr.mail.yahoo.com


* Re: optimizing calling conventions for function returns
  2006-05-24  9:57 optimizing calling conventions for function returns Etienne Lorrain
@ 2006-05-24 10:04 ` Andrew Pinski
  2006-05-24 11:13   ` Etienne Lorrain
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Pinski @ 2006-05-24 10:04 UTC (permalink / raw)
  To: Etienne Lorrain; +Cc: gcc, jonsmirl


On May 24, 2006, at 2:54 AM, Etienne Lorrain wrote:
>  Another calling convention could be to not only return the "return value"
> in %eax (or %edx:%eax for long long returns) but also its comparison to
> zero in the flags, so that you get:
>     call foo
>     jg  .Lwarning
>     jnz .Lerror

And you think this will help?  It will save at most 1-10 cycles depending
on the processor.  And if you have a call in the hot loop, you are screwed
anyway because you will have to deal with the overhead of the call.
So it will end up being about even.

-- Pinski   


* Re: optimizing calling conventions for function returns
  2006-05-24 10:04 ` Andrew Pinski
@ 2006-05-24 11:13   ` Etienne Lorrain
  0 siblings, 0 replies; 16+ messages in thread
From: Etienne Lorrain @ 2006-05-24 11:13 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: gcc, jonsmirl

--- Andrew Pinski wrote:
> On May 24, 2006, at 2:54 AM, Etienne Lorrain wrote:
> >  Another calling convention could be to not only return the "return  
> > value" in %eax (or %edx:%eax for long long returns) but also its  
> > comparison to zero in the flags, so that you get:
> >     call foo
> >     jg  .Lwarning
> >     jnz .Lerror
> 
> And you think this will help?  It will save at most 1-10 cycles depending
> on the processor.

  The same can be said of passing arguments in registers: at least on ia32,
 passing a parameter in %edx/%eax instead of on the stack saves one or
 two loads, so a few cycles, and by increasing register pressure you
 often get a slower and (a lot) bigger function (from what I have seen).
 But in some cases, a function attribute placed in the right position
 (usually on very small functions) can help.

> And if you have a call in the hot loop, you are screwed
> anyways because you will have to deal with the overhead of the call.

  I am not sure what you are referring to, but there are plenty of
 places where you are screwed; for instance, the stack readjustment is
 better done by "mov %ebp,%esp" than by "add $16,%esp".

> So it will end up being about even.

  I was thinking of very small functions, kind of one instruction,
 something like:

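asm (".globl atomic_dec \n atomic_dec: \n lock decl (%eax) \n ret ");
  /* regparm(1): counter arrives in %eax; return_flags is the proposed attribute, it does not exist */
  extern unsigned atomic_dec (unsigned *counter) __attribute__((regparm(1), return_flags));
  extern void wait (void);   /* hypothetical wait function */
  static unsigned counter;
  void fct (void) { while (atomic_dec(&counter)) wait(); }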

  But it was an intermediate solution to the problem of returning two
 values from a function; I am not sure it is worth the time to implement.
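As a point of comparison, here is a portable C11 sketch that returns both results explicitly in a small struct. The `dec_result` type and `atomic_dec_pair` function are hypothetical; `atomic_fetch_sub` is the standard `<stdatomic.h>` operation and returns the value held before the subtraction. Whether such a struct comes back in registers or through memory depends on the ABI (on i386, GCC's `-freg-struct-return` affects this), so this does not give the flags-in-EFLAGS effect the proposed `return_flags` attribute is after.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Two results from one call: the counter value after the decrement and
 * whether it is still nonzero (what ZF after "lock decl" would tell you,
 * inverted). */
struct dec_result {
    unsigned value;
    bool nonzero;
};

static struct dec_result atomic_dec_pair(atomic_uint *counter) {
    /* atomic_fetch_sub returns the value *before* the subtraction. */
    unsigned old = atomic_fetch_sub(counter, 1u);
    struct dec_result r = { old - 1u, old - 1u != 0 };
    return r;
}
```

A caller mirroring `fct` would loop with `while (atomic_dec_pair(&counter).nonzero) wait();`, still paying for a test on the returned field.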

  Etienne.



* Re: optimizing calling conventions for function returns
  2006-05-25 17:21     ` Jon Smirl
@ 2006-05-25 18:56       ` Geert Bosch
  0 siblings, 0 replies; 16+ messages in thread
From: Geert Bosch @ 2006-05-25 18:56 UTC (permalink / raw)
  To: Jon Smirl; +Cc: gcc

On May 25, 2006, at 13:21, Jon Smirl wrote:
>            jmp   *4(%esp)
>
> This is slightly faster than addl, ret.

The point is that this is only executed in the error case.

> But my micro scale benchmarks are extremely influenced by changes in
> branch prediction. I still wonder how this would perform in large
> programs.

The jmp *4(%esp) doesn't confuse the branch predictors. Basically,
the assumption is that call and ret instructions match up. Your
addl, ret sequence breaks that assumption, which means the return
predictions will all be wrong.
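To make the call/ret matching argument concrete, here is a toy model of a return-stack predictor in C. It is a simplification under stated assumptions: a fixed depth of 16 entries, overflow drops the oldest entry, and addresses are plain integers; real predictors vary by microarchitecture. A call followed by a normal ret is always predicted, while the `addl $4, %esp` + `ret` error exit goes to an address that no call pushed, so the model counts a mispredict.

```c
#include <stddef.h>

#define RSB_DEPTH 16   /* assumed depth, for illustration only */

struct rsb {
    unsigned long entries[RSB_DEPTH];
    size_t top;            /* number of valid entries */
    unsigned mispredicts;
};

/* "call": remember the address of the instruction after the call. */
static void rsb_call(struct rsb *p, unsigned long return_addr) {
    if (p->top == RSB_DEPTH) {            /* full: drop the oldest entry */
        for (size_t i = 1; i < RSB_DEPTH; i++)
            p->entries[i - 1] = p->entries[i];
        p->top--;
    }
    p->entries[p->top++] = return_addr;
}

/* "ret": actual_target is the address really popped from the program
 * stack; the prediction is whatever the predictor has on top. */
static void rsb_ret(struct rsb *p, unsigned long actual_target) {
    unsigned long predicted = p->top ? p->entries[--p->top] : 0;
    if (predicted != actual_target)
        p->mispredicts++;
}
```

In this model a normal return (`rsb_call(&p, 0x100); rsb_ret(&p, 0x100);`) costs nothing, while an error exit that lands on a pushed-as-data address (`rsb_ret(&p, 0x200)` after the same call) is always counted as a mispredict.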

Maybe future link-time optimizations will be able to handle this
kind of error-exit code automatically, but for now I think your
best bet is to handle this explicitly or just not worry about the
minor inefficiency.
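One way to handle this explicitly with current GCC is to tell the compiler which way the status test usually goes. This sketch uses the real GCC builtin `__builtin_expect`, wrapped in an `unlikely()` macro in the style of the Linux kernel's; `foo` and `run` are hypothetical. It only influences block layout and static prediction: the test/jnz after each call remains, but the error handling is moved off the fall-through path.

```c
/* Hypothetical stand-in: 0 on success, nonzero on error. */
static int foo(int i, int fail_at) { return i == fail_at ? -1 : 0; }

/* Mark a condition as rarely true (GCC/Clang builtin). */
#define unlikely(x) __builtin_expect(!!(x), 0)

static int run(int fail_at) {
    for (int i = 0; i < 10; i++) {
        if (unlikely(foo(i, fail_at)))
            goto error;      /* laid out out of line */
    }
    return 0;
error:
    /* process error */
    return -1;
}
```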

   -Geert


* Re: optimizing calling conventions for function returns
  2006-05-25 15:21   ` Jon Smirl
@ 2006-05-25 17:21     ` Jon Smirl
  2006-05-25 18:56       ` Geert Bosch
  0 siblings, 1 reply; 16+ messages in thread
From: Jon Smirl @ 2006-05-25 17:21 UTC (permalink / raw)
  To: Geert Bosch; +Cc: gcc

On 5/25/06, Jon Smirl <jonsmirl@gmail.com> wrote:
> I ran into another snag that taking the alternative return on a P4 has
> really bad performance impacts since it messes up prefetch. This
> sequence is the killer.
>
>        addl    $4, %esp
>        ret                                     /* Return to error return */
>
> I can try coding this as a parameter and see how the compiler
> generates code differently.

            jmp   *4(%esp)

This is slightly faster than addl, ret.

But my micro scale benchmarks are extremely influenced by changes in
branch prediction. I still wonder how this would perform in large
programs.

It seems that the sequence

   ret
   test
   jne

is very fast compared to

   jmp   *4(%esp)

even when both end up at the same place. It looks to me like the
call stack predictor is controlling everything.  The only way to make
this work would be to figure out some way to get the alternative
return address into the call stack predictor.

-- 
Jon Smirl
jonsmirl@gmail.com


* Re: optimizing calling conventions for function returns
  2006-05-25 14:16 ` Geert Bosch
@ 2006-05-25 15:21   ` Jon Smirl
  2006-05-25 17:21     ` Jon Smirl
  0 siblings, 1 reply; 16+ messages in thread
From: Jon Smirl @ 2006-05-25 15:21 UTC (permalink / raw)
  To: Geert Bosch; +Cc: gcc

On 5/25/06, Geert Bosch <bosch@adacore.com> wrote:
>
> On May 23, 2006, at 11:21, Jon Smirl wrote:
>
> > A new calling convention could push two return addresses for functions
> > that return their status in EAX. On EAX=0 you take the first return,
> > EAX != 0 you take the second.
>
> This seems the same as passing an extra function pointer
> argument and calling that instead of doing a regular return.
> Tail-call optimization should turn the call into a jump.
>
> Why do you think a custom ABI is necessary?

The new ABI may not be necessary, but adding an extra parameter would
require changing the source everywhere. The ABI scheme is transparent to
the source and lets the compiler locate the places where it would be
a win. It would also let the alternative return address be pushed onto
the stack once no matter how many calls are made, whereas a parameter
has to be pushed for each call.

I ran into another snag that taking the alternative return on a P4 has
really bad performance impacts since it messes up prefetch. This
sequence is the killer.

       addl    $4, %esp
       ret                                     /* Return to error return */

I can try coding this as a parameter and see how the compiler
generates code differently.

The call, test, jne sequence (or slight variations of it) occurs in
thousands of places, so if a better alternative can be found there could
be significant performance gains. I haven't found a good solution yet;
any help would be appreciated.

>
>    -Geert
>

-- 
Jon Smirl
jonsmirl@gmail.com


* Re: optimizing calling conventions for function returns
  2006-05-23 15:21 Jon Smirl
                   ` (2 preceding siblings ...)
  2006-05-23 21:09 ` Gabriel Paubert
@ 2006-05-25 14:16 ` Geert Bosch
  2006-05-25 15:21   ` Jon Smirl
  3 siblings, 1 reply; 16+ messages in thread
From: Geert Bosch @ 2006-05-25 14:16 UTC (permalink / raw)
  To: Jon Smirl; +Cc: gcc


On May 23, 2006, at 11:21, Jon Smirl wrote:

> A new calling convention could push two return addresses for functions
> that return their status in EAX. On EAX=0 you take the first return,
> EAX != 0 you take the second.

This seems the same as passing an extra function pointer
argument and calling that instead of doing a regular return.
Tail-call optimization should turn the call into a jump.

Why do you think a custom ABI is necessary?

   -Geert
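A minimal C sketch of the alternative described here, with hypothetical names: the callee receives the error path as a function pointer and calls it in tail position instead of returning a status. With optimization enabled GCC can often, though not always, compile such a tail call as a jump. One limitation worth noting: in plain C the continuation still returns to `foo`'s caller at the usual return point, so the caller does not get a separate resume address the way the two-return-address scheme would.

```c
/* Hypothetical error continuation type. */
typedef int (*error_cont)(int code);

static int handle_error(int code) {
    /* process error */
    return code;
}

static int foo(int arg, error_cont on_error) {
    if (arg < 0)
        return on_error(arg);   /* tail call: candidate for call -> jmp */
    return 0;                   /* normal return */
}
```

The extra argument has to be passed on every call, which is the per-call cost the reply below contrasts with pushing the error address once.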


* Re: optimizing calling conventions for function returns
  2006-05-23 16:14 ` Paul Brook
  2006-05-23 16:37   ` Jon Smirl
@ 2006-05-24 19:09   ` Jon Smirl
  1 sibling, 0 replies; 16+ messages in thread
From: Jon Smirl @ 2006-05-24 19:09 UTC (permalink / raw)
  To: Paul Brook; +Cc: gcc

On 5/23/06, Paul Brook <paul@codesourcery.com> wrote:
> > Has work been done to evaluate a calling convention that takes error
> > checks like this into account? Are there size/performance wins? Or am
> > I just reinventing a variation on exception handling?
>
> This introduces an extra stack push and will confuse a call-stack branch
> predictor. If both the call stack and the test are normally predicted
> correctly I'd guess this would be a performance loss on modern cpus.

I just finished writing a bunch of test cases to explore the idea. My
conclusion is that if the error returns are very infrequent (<<1%)
then this is a win. But if there are a significant number of error
returns this is a major loss.

These two instructions on the error return path are the killer:
	addl	$4, %esp
	ret					/* Return to error return */

Apparently the CPU has zero expectation that the address being jumped
to is code. In the calling routine I pushed the error return as data.

	pushl	$.L11	/* push return address */

So for the non-error path there is a win by removing the error
test/jmp on the function return. But taking the error path is very
expensive.
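The trade-off measured here can be put into a back-of-envelope model. The cycle counts are hypothetical parameters, not measurements: let s be the cycles saved on each successful call (no test/jnz), m the extra cycles paid on each error return (the return mispredict), and p the probability of an error return. The expected gain per call is (1 - p)*s - p*m, which is positive only while p < s/(s + m).

```c
/* Expected cycles gained per call by the two-return-address scheme,
 * under the simplified model above (all inputs hypothetical). */
static double expected_gain(double p, double s, double m) {
    return (1.0 - p) * s - p * m;
}

/* Error probability at which the gain drops to zero:
 * (1 - p)*s = p*m  =>  p = s / (s + m). */
static double break_even(double s, double m) {
    return s / (s + m);
}
```

With made-up values s = 1 and m = 19, break_even gives 5%; the larger the mispredict penalty relative to the saved test, the rarer errors must be for the scheme to win.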

I'm experimenting with 50-line assembly programs on a P4. I do wonder
whether these micro results would apply in a macro program. My test is
losing because the return destination had been predicted and the
introduction of the addl messed up the prediction. But in a large
program with many levels of calls would the return always be predicted
on the error path?

-- 
Jon Smirl
jonsmirl@gmail.com


* Re: optimizing calling conventions for function returns
  2006-05-23 21:09 ` Gabriel Paubert
@ 2006-05-23 21:46   ` Jon Smirl
  0 siblings, 0 replies; 16+ messages in thread
From: Jon Smirl @ 2006-05-23 21:46 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: gcc

On 5/23/06, Gabriel Paubert <paubert@iram.es> wrote:
> On Tue, May 23, 2006 at 11:21:46AM -0400, Jon Smirl wrote:
> > Has work been done to evaluate a calling convention that takes error
> > checks like this into account? Are there size/performance wins? Or am
> > I just reinventing a variation on exception handling?
>
> It's fairly close to Fortran alternate return labels, which
> were standard in Fortran 77 but have been declared obsolescent
> in later revisions of the standard.

I like this method since it can be implemented transparently in C
code. That means the Linux kernel could use it without rewriting
everything.

>
>         Regards,
>         Gabriel
>


-- 
Jon Smirl
jonsmirl@gmail.com


* Re: optimizing calling conventions for function returns
  2006-05-23 15:21 Jon Smirl
  2006-05-23 16:14 ` Paul Brook
  2006-05-23 19:06 ` Mike Stump
@ 2006-05-23 21:09 ` Gabriel Paubert
  2006-05-23 21:46   ` Jon Smirl
  2006-05-25 14:16 ` Geert Bosch
  3 siblings, 1 reply; 16+ messages in thread
From: Gabriel Paubert @ 2006-05-23 21:09 UTC (permalink / raw)
  To: Jon Smirl; +Cc: gcc

On Tue, May 23, 2006 at 11:21:46AM -0400, Jon Smirl wrote:
> Has work been done to evaluate a calling convention that takes error
> checks like this into account? Are there size/performance wins? Or am
> I just reinventing a variation on exception handling?

It's fairly close to Fortran alternate return labels, which 
were standard in Fortran 77 but have been declared obsolescent
in later revisions of the standard.
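For readers unfamiliar with the Fortran feature: a call such as `CALL SUB(X, *10, *20)` passes statement labels, and `RETURN 1` or `RETURN 2` inside the subroutine resumes at the first or second label. One common way to lower this, sketched here in C with hypothetical names, is to have the subroutine return a small index and let the caller dispatch on it, so the caller still performs a test after the call rather than receiving a second return address.

```c
/* Lowered subroutine: returns which label to resume at
 * (0 = plain RETURN, k = "RETURN k"). */
static int sub(int x) {
    if (x < 0)
        return 1;    /* RETURN 1 -> first alternate label (*10)  */
    if (x == 0)
        return 2;    /* RETURN 2 -> second alternate label (*20) */
    return 0;        /* RETURN   -> statement after the CALL     */
}

/* Lowered call site: returns the Fortran label reached (0 = next stmt). */
static int call_sub(int x) {
    switch (sub(x)) {
    case 1:  return 10;
    case 2:  return 20;
    default: return 0;
    }
}
```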

	Regards,
	Gabriel


* Re: optimizing calling conventions for function returns
  2006-05-23 20:29     ` Florian Weimer
@ 2006-05-23 20:54       ` Jon Smirl
  0 siblings, 0 replies; 16+ messages in thread
From: Jon Smirl @ 2006-05-23 20:54 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Paul Brook, gcc

On 5/23/06, Florian Weimer <fw@deneb.enyo.de> wrote:
> Yes, but the test/jump now happens in the callee, and you need to
> maintain an additional stack slot.  I wouldn't be surprised if the

The callee already had to implement the test/jmp in order to decide to
return the error, so this shouldn't introduce another one.

> change isn't a win.  Some form of exception handling for truly
> exceptional situations would probably be better (and might have helped
> to avoid quite a few of the last CVEs 8-).
>


-- 
Jon Smirl
jonsmirl@gmail.com


* Re: optimizing calling conventions for function returns
  2006-05-23 16:37   ` Jon Smirl
@ 2006-05-23 20:29     ` Florian Weimer
  2006-05-23 20:54       ` Jon Smirl
  0 siblings, 1 reply; 16+ messages in thread
From: Florian Weimer @ 2006-05-23 20:29 UTC (permalink / raw)
  To: Jon Smirl; +Cc: Paul Brook, gcc

* Jon Smirl:

> Is the callstack branch correctly predicted if the routine being
> called is complex?

At least the AMD CPUs have implemented a special return stack cache,
so the answer is probably "yes".

> This does eliminate the test/jmp after every function call.

Yes, but the test/jump now happens in the callee, and you need to
maintain an additional stack slot.  I wouldn't be surprised if the
change isn't a win.  Some form of exception handling for truly
exceptional situations would probably be better (and might have helped
to avoid quite a few of the last CVEs 8-).
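A minimal C sketch of that idea using the standard `setjmp`/`longjmp` pair, with hypothetical names: the success path performs no status test after each call, and an error jumps straight back to the handler. This is not how table-driven C++ exceptions are implemented, and `setjmp` itself costs something on every entry to the protected region, so it only illustrates moving the test off the success path.

```c
#include <setjmp.h>

static jmp_buf error_handler;
static int error_code;        /* set by the "thrower" before longjmp */

/* Hypothetical step that fails at iteration fail_at. */
static void step(int i, int fail_at) {
    if (i == fail_at) {
        error_code = -1;
        longjmp(error_handler, 1);
    }
}

static int process(int fail_at) {
    if (setjmp(error_handler) != 0)
        return error_code;     /* reached only via longjmp */
    for (int i = 0; i < 10; i++)
        step(i, fail_at);      /* no status check after each call */
    return 0;
}
```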


* Re: optimizing calling conventions for function returns
  2006-05-23 15:21 Jon Smirl
  2006-05-23 16:14 ` Paul Brook
@ 2006-05-23 19:06 ` Mike Stump
  2006-05-23 21:09 ` Gabriel Paubert
  2006-05-25 14:16 ` Geert Bosch
  3 siblings, 0 replies; 16+ messages in thread
From: Mike Stump @ 2006-05-23 19:06 UTC (permalink / raw)
  To: Jon Smirl; +Cc: gcc

On May 23, 2006, at 8:21 AM, Jon Smirl wrote:
> Or am I just reinventing a variation on exception handling?

:-)


* Re: optimizing calling conventions for function returns
  2006-05-23 16:14 ` Paul Brook
@ 2006-05-23 16:37   ` Jon Smirl
  2006-05-23 20:29     ` Florian Weimer
  2006-05-24 19:09   ` Jon Smirl
  1 sibling, 1 reply; 16+ messages in thread
From: Jon Smirl @ 2006-05-23 16:37 UTC (permalink / raw)
  To: Paul Brook; +Cc: gcc

On 5/23/06, Paul Brook <paul@codesourcery.com> wrote:
> > Has work been done to evaluate a calling convention that takes error
> > checks like this into account? Are there size/performance wins? Or am
> > I just reinventing a variation on exception handling?
>
> This introduces an extra stack push and will confuse a call-stack branch
> predictor. If both the call stack and the test are normally predicted
> correctly I'd guess this would be a performance loss on modern cpus.

Note that the error return address sits above the normal return
address and is not placed there by a call, so it should look like data
to the predictor. The normal return address is placed on the stack by a
call and should continue to be correctly predicted. I would expect the
error return path to be mispredicted, but it is supposed to be the
unlikely case. Is the call-stack branch correctly predicted if the
routine being called is complex?

This does eliminate the test/jmp after every function call.

Further branches could be eliminated by having multiple returns from
the called function at the expense of increasing code size.

>
> Paul
>

-- 
Jon Smirl
jonsmirl@gmail.com


* Re: optimizing calling conventions for function returns
  2006-05-23 15:21 Jon Smirl
@ 2006-05-23 16:14 ` Paul Brook
  2006-05-23 16:37   ` Jon Smirl
  2006-05-24 19:09   ` Jon Smirl
  2006-05-23 19:06 ` Mike Stump
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 16+ messages in thread
From: Paul Brook @ 2006-05-23 16:14 UTC (permalink / raw)
  To: gcc; +Cc: Jon Smirl

> Has work been done to evaluate a calling convention that takes error
> checks like this into account? Are there size/performance wins? Or am
> I just reinventing a variation on exception handling?

This introduces an extra stack push and will confuse a call-stack branch 
predictor. If both the call stack and the test are normally predicted 
correctly I'd guess this would be a performance loss on modern cpus.

Paul


* optimizing calling conventions for function returns
@ 2006-05-23 15:21 Jon Smirl
  2006-05-23 16:14 ` Paul Brook
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Jon Smirl @ 2006-05-23 15:21 UTC (permalink / raw)
  To: gcc

Looking at assembly listings of the Linux kernel I see thousands of
places where function returns are checked to be non-zero to indicate
errors. For example something like this:

    mov ebx, 0
.L1:
    call foo
    test eax, eax
    jnz .Lerror
    inc ebx
    cmp ebx, 10
    jne .L1
    ....

.Lerror:
    ; process error
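For reference, a C loop of roughly this shape, with a hypothetical `foo` that returns 0 on success, is the kind of source that compiles to the call/test/jnz sequence above. The two counters are hypothetical and exist only so the behavior can be checked.

```c
static int foo_calls;            /* hypothetical: counts calls to foo */
static int foo_fail_at = -1;     /* hypothetical: which call fails (-1 = none) */

/* Returns 0 on success, nonzero on error. */
static int foo(void) {
    return foo_calls++ == foo_fail_at ? -1 : 0;
}

static int caller(void) {
    for (int bx = 0; bx < 10; bx++) {   /* inc ebx; cmp ebx, 10; jne .L1 */
        if (foo() != 0)                 /* call foo; test eax, eax; jnz .Lerror */
            goto error;
    }
    return 0;
error:
    /* process error */
    return -1;
}
```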

A new calling convention could push two return addresses for functions
that return their status in EAX. On EAX=0 you take the first return,
EAX != 0 you take the second.

So the above code becomes:

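    push .Lerror      ; error return address, pushed once
    mov ebx, 0
.L1:
    call foo          ; pushes the normal return address
    inc ebx
    cmp ebx, 10
    jne .L1
    add esp, 4        ; discard the unused error return address
    ....

.Lerror:
    ; process error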

The called function then does a plain 'ret' when EAX == 0, or
'add esp, 4' followed by 'ret' when EAX != 0; the add discards the
normal return address, so the ret lands on .Lerror.

Of course there are many further optimizations that can be done, but
this illustrates the concept.
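To check that the proposed convention keeps the stack balanced on both paths, here is a toy C model; nothing in it is real x86 code. The stack holds small integers standing in for the `.Lerror` and normal return labels, the caller pushes the error address once, and on error the callee drops the normal return address before returning (the `addl $4, %esp; ret` sequence used later in the thread), otherwise it does a plain `ret`.

```c
#include <assert.h>

enum { L_NORMAL = 1, L_ERROR = 2 };   /* stand-ins for return labels */

struct machine {
    int stack[8];
    int sp;        /* index of the top element; the stack grows downward */
};

static void push(struct machine *m, int v) { assert(m->sp > 0); m->stack[--m->sp] = v; }
static int pop(struct machine *m) { assert(m->sp < 8); return m->stack[m->sp++]; }

/* Callee epilogue: on error, "add esp, 4" skips the normal return address,
 * then "ret" pops whatever is now on top. */
static int callee_return(struct machine *m, int eax) {
    if (eax != 0)
        m->sp++;           /* add esp, 4 */
    return pop(m);         /* ret */
}

/* Runs the loop from the listing; fail_at picks the failing iteration
 * (-1 = none). Returns the label reached and reports the final sp. */
static int run_loop(int fail_at, int *sp_out) {
    struct machine m = { {0}, 8 };
    push(&m, L_ERROR);                        /* push .Lerror */
    for (int bx = 0; bx < 10; bx++) {
        push(&m, L_NORMAL);                   /* call foo */
        int eax = (bx == fail_at);            /* foo's status */
        if (callee_return(&m, eax) == L_ERROR) {
            *sp_out = m.sp;                   /* .Lerror already popped */
            return L_ERROR;
        }
    }
    m.sp++;                                   /* add esp, 4 */
    *sp_out = m.sp;
    return L_NORMAL;
}
```

Both paths end with `sp` back at its starting value of 8, which is the property the convention relies on.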

Has work been done to evaluate a calling convention that takes error
checks like this into account? Are there size/performance wins? Or am
I just reinventing a variation on exception handling?

-- 
Jon Smirl
jonsmirl@gmail.com



Thread overview: 16+ messages
2006-05-24  9:57 optimizing calling conventions for function returns Etienne Lorrain
2006-05-24 10:04 ` Andrew Pinski
2006-05-24 11:13   ` Etienne Lorrain
  -- strict thread matches above, loose matches on Subject: below --
2006-05-23 15:21 Jon Smirl
2006-05-23 16:14 ` Paul Brook
2006-05-23 16:37   ` Jon Smirl
2006-05-23 20:29     ` Florian Weimer
2006-05-23 20:54       ` Jon Smirl
2006-05-24 19:09   ` Jon Smirl
2006-05-23 19:06 ` Mike Stump
2006-05-23 21:09 ` Gabriel Paubert
2006-05-23 21:46   ` Jon Smirl
2006-05-25 14:16 ` Geert Bosch
2006-05-25 15:21   ` Jon Smirl
2006-05-25 17:21     ` Jon Smirl
2006-05-25 18:56       ` Geert Bosch
