public inbox for gcc-help@gcc.gnu.org
 help / color / mirror / Atom feed
* CALL_EXPR_MUST_TAIL_CALL and LLVM's musttail
@ 2021-12-07 22:38 Bradley Lucier
  2021-12-08 23:17 ` Jeff Law
  0 siblings, 1 reply; 22+ messages in thread
From: Bradley Lucier @ 2021-12-07 22:38 UTC (permalink / raw)
  To: gcc-help; +Cc: lucier, Marc Feeley

I use GCC mainly to compile Scheme code that has been translated to 
large, heavily macrofied C routines by the Gambit Scheme compiler:

https://github.com/gambit/gambit

Scheme semantics require that all procedure calls in "tail call 
position" are, in fact, tail called.

For calls between Scheme procedures defined in the same file, Gambit's 
compiler generates a C goto; intermodule calls use a trampoline.

Recently, the Gambit-generated C code was modified to take advantage of 
LLVM's musttail attribute, when available.

So I've been investigating whether gcc's -foptimize-sibling-calls option 
(for which I've found very little documentation) might produce similar 
results.

This commit to GCC's source tree defines must_tail_call_plugin.c, which 
seems (a) to be used only in some tests, and (b) to have had no patches 
since first committed:

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=9a385c2d3d74ffed78f2ed9ad47b844d2f294105

I found this similar, more recent plugin:

https://github.com/pietro/gcc-musttail-plugin/

This seems to set up maybe_complain_about_tail_call in calls.c to fail 
when a call is not compiled to a tail call.

So, my understanding of what I need to do is:

1.  Built and dynamically link a plugin to set CALL_EXPR_MUST_TAIL_CALL 
on all calls.

2.  Use -foptimize-sibling-calls.

3.  gcc fails with an appropriate message on the first call it can't 
"tail call" in each file, or it succeeds.

So some questions:

1.  Is this how you recommend I proceed?

2.  Which plugin do you recommend?

3.  I think it would be better to have a flag named something like 
-Wcant-tail-call that would generate a warning in 
maybe_complain_about_tail_call, so that (a) one could catch all such 
cases at once and (b) one can throw an error if needed by specifying 
-Werror, and (c) I wouldn't need to fiddle with a plugin.

Do you agree with this idea?  I added a few flags to gcc about 20 years 
ago, I guess I could look at it again to see how to do such a thing now.

Brad Lucier

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: CALL_EXPR_MUST_TAIL_CALL and LLVM's musttail
  2021-12-07 22:38 CALL_EXPR_MUST_TAIL_CALL and LLVM's musttail Bradley Lucier
@ 2021-12-08 23:17 ` Jeff Law
  2021-12-09  0:07   ` Segher Boessenkool
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff Law @ 2021-12-08 23:17 UTC (permalink / raw)
  To: Bradley Lucier, gcc-help; +Cc: Marc Feeley, lucier



On 12/7/2021 3:38 PM, Bradley Lucier via Gcc-help wrote:
> I use GCC mainly to compile Scheme code that has been translated to 
> large, heavily macrofied C routines by the Gambit Scheme compiler:
[ ... ]
Hi Bradley, haven't heard from you in quite some time...  Your testcases 
continue to be legendary for the GCC project :-)
>
> So I've been investigating whether gcc's -foptimize-sibling-calls 
> option (for which I've found very little documentation) might produce 
> similar results.
The option has been around for probably 20+ years at this point, you 
just might not have been aware of it.  They try, but do not guarantee 
tail call optimization.   They have all kinds of checks that might cause 
any particular call site to be rejected for tail call elimination.  It's 
enabled by default at O2 or higher.



>
> This commit to GCC's source tree defines must_tail_call_plugin.c, 
> which seems (a) to be used only in some tests, and (b) to have had no 
> patches since first committed:
Yea.  I don't think this has really gone anywhere, though I haven't 
followed the state of the plugin since its introduction.

jeff


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: CALL_EXPR_MUST_TAIL_CALL and LLVM's musttail
  2021-12-08 23:17 ` Jeff Law
@ 2021-12-09  0:07   ` Segher Boessenkool
  2021-12-09  3:31     ` Bradley Lucier
  2021-12-09 10:57     ` Florian Weimer
  0 siblings, 2 replies; 22+ messages in thread
From: Segher Boessenkool @ 2021-12-09  0:07 UTC (permalink / raw)
  To: Jeff Law; +Cc: Bradley Lucier, gcc-help, Marc Feeley, lucier

On Wed, Dec 08, 2021 at 04:17:33PM -0700, Jeff Law via Gcc-help wrote:
> On 12/7/2021 3:38 PM, Bradley Lucier via Gcc-help wrote:
> >So I've been investigating whether gcc's -foptimize-sibling-calls 
> >option (for which I've found very little documentation) might produce 
> >similar results.
> The option has been around for probably 20+ years at this point, you 
> just might not have been aware of it.  They try, but do not guarantee 
> tail call optimization.   They have all kinds of checks that might cause 
> any particular call site to be rejected for tail call elimination.  It's 
> enabled by default at O2 or higher.

(Including -Os.)

GCC will never not do a tail call if it knows any way to do it.  A very
low-tech way of checking if all your threaded code words got proper
tail calls is to simply look if there are any "return" instructions in
the generated machine code :-)


Segher

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: CALL_EXPR_MUST_TAIL_CALL and LLVM's musttail
  2021-12-09  0:07   ` Segher Boessenkool
@ 2021-12-09  3:31     ` Bradley Lucier
  2021-12-09 10:56       ` Florian Weimer
  2021-12-09 23:15       ` Segher Boessenkool
  2021-12-09 10:57     ` Florian Weimer
  1 sibling, 2 replies; 22+ messages in thread
From: Bradley Lucier @ 2021-12-09  3:31 UTC (permalink / raw)
  To: Segher Boessenkool, Jeff Law; +Cc: lucier, gcc-help, Marc Feeley

Here's the kind of call I'd like turned into a tail call:

return (*((___host*)((((long*)((___pc)-(1))) + (1 +((-2)))))))(___ps);

and it appears that gcc doesn't want to convert it to a tail call.   The 
calling function has the same argument sequence.

What should I search for in the .s file to see if this was tail called? 
  It appears that the assembly code that this is translated into is:

<stuff removed>
	movq	-864(%rbp), %rax
	subq	$9, %rax
	.loc 1 95837 157
	movq	(%rax), %rdx
	movq	-1032(%rbp), %rax
	movq	%rax, %rdi
	call	*%rdx
.LVL0:
.L1:
	.loc 1 95837 220
	movq	-24(%rbp), %rax
	subq	%fs:40, %rax
	je	.L17743
	call	__stack_chk_fail@PLT
.L17743:
	movq	-8(%rbp), %rbx
	leave
	.cfi_def_cfa 7, 8
	ret

This plugin claims to define a musttail attribute, but I'm not sure what 
that means, really:

https://github.com/pietro/gcc-musttail-plugin/

Thanks for your help.

Brad

PS:  LLVM seems to have a musttail attribute that you use at the call site:

-#define ___PROPER_TAIL_CALL(call) __attribute__((musttail)) return call

Does gcc have something like that?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: CALL_EXPR_MUST_TAIL_CALL and LLVM's musttail
  2021-12-09  3:31     ` Bradley Lucier
@ 2021-12-09 10:56       ` Florian Weimer
  2021-12-09 23:15       ` Segher Boessenkool
  1 sibling, 0 replies; 22+ messages in thread
From: Florian Weimer @ 2021-12-09 10:56 UTC (permalink / raw)
  To: Bradley Lucier via Gcc-help
  Cc: Segher Boessenkool, Jeff Law, Bradley Lucier, Marc Feeley, lucier

* Bradley Lucier via Gcc-help:

> Here's the kind of call I'd like turned into a tail call:
>
> return (*((___host*)((((long*)((___pc)-(1))) + (1 +((-2)))))))(___ps);
>
> and it appears that gcc doesn't want to convert it to a tail call.
> The calling function has the same argument sequence.
>
> What should I search for in the .s file to see if this was tail
> called?   It appears that the assembly code that this is translated
> into is:
>
> <stuff removed>
> 	movq	-864(%rbp), %rax
> 	subq	$9, %rax
> 	.loc 1 95837 157
> 	movq	(%rax), %rdx
> 	movq	-1032(%rbp), %rax
> 	movq	%rax, %rdi
> 	call	*%rdx
> .LVL0:
> .L1:
> 	.loc 1 95837 220
> 	movq	-24(%rbp), %rax
> 	subq	%fs:40, %rax
> 	je	.L17743
> 	call	__stack_chk_fail@PLT
> .L17743:
> 	movq	-8(%rbp), %rbx
> 	leave
> 	.cfi_def_cfa 7, 8
> 	ret

The stack guard suggests that there addressable variables on the stack.
Maybe GCC cannot prove that the arguments in the tail call cannot be
used to access those stack variables?

> PS:  LLVM seems to have a musttail attribute that you use at the call site:
>
> -#define ___PROPER_TAIL_CALL(call) __attribute__((musttail)) return call
>
> Does gcc have something like that?

There was a discussion about this in April:

  "musttail" statement attribute for GCC?
  <https://gcc.gnu.org/pipermail/gcc/2021-April/235882.html>

Some ABIs make tail-calls hard, and they are not always an optimization.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: CALL_EXPR_MUST_TAIL_CALL and LLVM's musttail
  2021-12-09  0:07   ` Segher Boessenkool
  2021-12-09  3:31     ` Bradley Lucier
@ 2021-12-09 10:57     ` Florian Weimer
  2021-12-09 13:30       ` Marc Feeley
  2021-12-09 22:00       ` Segher Boessenkool
  1 sibling, 2 replies; 22+ messages in thread
From: Florian Weimer @ 2021-12-09 10:57 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Jeff Law, gcc-help, Marc Feeley, lucier, Bradley Lucier

* Segher Boessenkool:

> On Wed, Dec 08, 2021 at 04:17:33PM -0700, Jeff Law via Gcc-help wrote:
>> On 12/7/2021 3:38 PM, Bradley Lucier via Gcc-help wrote:
>> >So I've been investigating whether gcc's -foptimize-sibling-calls 
>> >option (for which I've found very little documentation) might produce 
>> >similar results.
>> The option has been around for probably 20+ years at this point, you 
>> just might not have been aware of it.  They try, but do not guarantee 
>> tail call optimization.   They have all kinds of checks that might cause 
>> any particular call site to be rejected for tail call elimination.  It's 
>> enabled by default at O2 or higher.
>
> (Including -Os.)
>
> GCC will never not do a tail call if it knows any way to do it.

GCC will never do a tail call if the target function is known not to
return normally, to preserve the stack backtrace.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: CALL_EXPR_MUST_TAIL_CALL and LLVM's musttail
  2021-12-09 10:57     ` Florian Weimer
@ 2021-12-09 13:30       ` Marc Feeley
  2021-12-09 14:04         ` Florian Weimer
  2021-12-09 23:36         ` Segher Boessenkool
  2021-12-09 22:00       ` Segher Boessenkool
  1 sibling, 2 replies; 22+ messages in thread
From: Marc Feeley @ 2021-12-09 13:30 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Segher Boessenkool, Jeff Law, gcc-help, Lucier, Bradley J,
	Bradley Lucier

[Marc Feeley here, the author of the Gambit Scheme compiler that generates the code which would benefit from tail-calls]

What is needed (for Gambit) is a way for the programmer to express in the source code that the tail-call is known to be optimizable, i.e. it does not violate any of the tail-calling constraints, such as all local variables are dead (including those whose address was taken with “&var”) at the moment of the call.

The code generated by the Gambit compiler is sufficiently complex that it would be impossible (in the halting problem sense) to check that the tail-calling constraints are not violated. However, I know the constraints are not violated because the code was designed that way. And if I’m mistaken in this belief, then I am the one that will bare the blame of the ensuing problems.

So having a way to check during or after the compilation that tail-calls were optimized is not sufficient. A Scheme compiler that sometimes fails to compile is not very useful.  What is needed is a guarantee that specific C calls are optimized.

FYI the code generated by Gambit is put in “host” functions that look like this:

  typedef struct ___processor_state_struct { … };
  typedef struct ___processor_state_struct *___processor_state;
  typedef void (*___host)(___processor_state ___ps);

  static void host1(___processor_state ___ps) {
    …
    ___host h = …;
    h(___ps);
  }

So the last statement in the host function is a call to the next host function.  The parameter is the same type, and indeed the same value (a pointer to a structure containing the state of the virtual machine such as registers), and there is no result.

I would think this is the easiest situation to handle for a tail-call optimization.

Marc



> On Dec 9, 2021, at 5:57 AM, Florian Weimer <fweimer@redhat.com> wrote:
> 
> * Segher Boessenkool:
> 
>> On Wed, Dec 08, 2021 at 04:17:33PM -0700, Jeff Law via Gcc-help wrote:
>>> On 12/7/2021 3:38 PM, Bradley Lucier via Gcc-help wrote:
>>>> So I've been investigating whether gcc's -foptimize-sibling-calls 
>>>> option (for which I've found very little documentation) might produce 
>>>> similar results.
>>> The option has been around for probably 20+ years at this point, you 
>>> just might not have been aware of it.  They try, but do not guarantee 
>>> tail call optimization.   They have all kinds of checks that might cause 
>>> any particular call site to be rejected for tail call elimination.  It's 
>>> enabled by default at O2 or higher.
>> 
>> (Including -Os.)
>> 
>> GCC will never not do a tail call if it knows any way to do it.
> 
> GCC will never do a tail call if the target function is known not to
> return normally, to preserve the stack backtrace.
> 
> Thanks,
> Florian



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: CALL_EXPR_MUST_TAIL_CALL and LLVM's musttail
  2021-12-09 13:30       ` Marc Feeley
@ 2021-12-09 14:04         ` Florian Weimer
  2021-12-09 14:17           ` Marc Feeley
  2021-12-09 23:36         ` Segher Boessenkool
  1 sibling, 1 reply; 22+ messages in thread
From: Florian Weimer @ 2021-12-09 14:04 UTC (permalink / raw)
  To: Marc Feeley
  Cc: Segher Boessenkool, Jeff Law, gcc-help, Lucier, Bradley J,
	Bradley Lucier

* Marc Feeley:

> [Marc Feeley here, the author of the Gambit Scheme compiler that generates the code which would benefit from tail-calls]
>
> What is needed (for Gambit) is a way for the programmer to express in
> the source code that the tail-call is known to be optimizable, i.e. it
> does not violate any of the tail-calling constraints, such as all
> local variables are dead (including those whose address was taken with
> “&var”) at the moment of the call.

Are you presently using scopes for that?

See the different between f3 and f3_tail on x86-64:

void f1 (int *);
int f2 (int *);

int
f3 (void)
{
  int i;
  f1 (&i);
  return f2 (i);
}

int
f3_tail (void)
{
  int i;
  {
    int i0;
    f1 (&i0);
    i = i0;
  }
  return f2 (i);
}

If the life-time of the variables whose address has been taken has ended
at the time of the call in a tail position, I expect GCC to turn it into
a tail call (subject to the other constraints mentioned).

Thanks,
Florian


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: CALL_EXPR_MUST_TAIL_CALL and LLVM's musttail
  2021-12-09 14:04         ` Florian Weimer
@ 2021-12-09 14:17           ` Marc Feeley
  2021-12-10  4:31             ` Florian Weimer
  0 siblings, 1 reply; 22+ messages in thread
From: Marc Feeley @ 2021-12-09 14:17 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Segher Boessenkool, Jeff Law, gcc-help, Lucier, Bradley J,
	Bradley Lucier


> On Dec 9, 2021, at 9:04 AM, Florian Weimer <fweimer@redhat.com> wrote:
> 
> If the life-time of the variables whose address has been taken has ended
> at the time of the call in a tail position, I expect GCC to turn it into
> a tail call (subject to the other constraints mentioned).

That’s a very interesting point an I will give it a try.

What worries me however is that you say “I expect GCC to turn it into a tail call” and I really need “I guarantee that GCC will turn it into a tail call”.  In other words I’m not looking for an optimization that will make the code faster or less memory hungry in certain cases, but rather it is a requirement for the code to be correct (otherwise stack overflows will happen).

Marc



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: CALL_EXPR_MUST_TAIL_CALL and LLVM's musttail
  2021-12-09 10:57     ` Florian Weimer
  2021-12-09 13:30       ` Marc Feeley
@ 2021-12-09 22:00       ` Segher Boessenkool
  1 sibling, 0 replies; 22+ messages in thread
From: Segher Boessenkool @ 2021-12-09 22:00 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Jeff Law, gcc-help, Marc Feeley, lucier, Bradley Lucier

On Thu, Dec 09, 2021 at 11:57:19AM +0100, Florian Weimer wrote:
> * Segher Boessenkool:
> 
> > On Wed, Dec 08, 2021 at 04:17:33PM -0700, Jeff Law via Gcc-help wrote:
> >> On 12/7/2021 3:38 PM, Bradley Lucier via Gcc-help wrote:
> >> >So I've been investigating whether gcc's -foptimize-sibling-calls 
> >> >option (for which I've found very little documentation) might produce 
> >> >similar results.
> >> The option has been around for probably 20+ years at this point, you 
> >> just might not have been aware of it.  They try, but do not guarantee 
> >> tail call optimization.   They have all kinds of checks that might cause 
> >> any particular call site to be rejected for tail call elimination.  It's 
> >> enabled by default at O2 or higher.
> >
> > (Including -Os.)
> >
> > GCC will never not do a tail call if it knows any way to do it.
> 
> GCC will never do a tail call if the target function is known not to
> return normally, to preserve the stack backtrace.

Yes, and that is arguably a bug.  We usually can generate better code
without this artificial limitation, and there are many other cases where
we do similar optimisations that make non-DWARF backtraces harder.  Such
backtraces are of questionable value these days since we have inlining
as well as outlining (or partial inlining at least).

I think we have a PR open for this already?

This usually does not matter much at all, since such paths are called at
most once per program run.

Some targets will never tail call, and some have further limitations,
but on most targets (more mainstream targets anyway) you get tail calls
wherever you could.


Segher

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: CALL_EXPR_MUST_TAIL_CALL and LLVM's musttail
  2021-12-09  3:31     ` Bradley Lucier
  2021-12-09 10:56       ` Florian Weimer
@ 2021-12-09 23:15       ` Segher Boessenkool
  1 sibling, 0 replies; 22+ messages in thread
From: Segher Boessenkool @ 2021-12-09 23:15 UTC (permalink / raw)
  To: Bradley Lucier; +Cc: Jeff Law, lucier, gcc-help, Marc Feeley

On Wed, Dec 08, 2021 at 10:31:53PM -0500, Bradley Lucier wrote:
> Here's the kind of call I'd like turned into a tail call:
> 
> return (*((___host*)((((long*)((___pc)-(1))) + (1 +((-2)))))))(___ps);

A simple quite standard indirect call.

> and it appears that gcc doesn't want to convert it to a tail call.   The 
> calling function has the same argument sequence.

There are more reasons.  Many are target-specific, and many are to do
with *other* things in the calling function.

> What should I search for in the .s file to see if this was tail called? 
>  It appears that the assembly code that this is translated into is:
> 
> <stuff removed>
> 	movq	-864(%rbp), %rax
> 	subq	$9, %rax
> 	.loc 1 95837 157
> 	movq	(%rax), %rdx
> 	movq	-1032(%rbp), %rax
> 	movq	%rax, %rdi
> 	call	*%rdx
> .LVL0:
> .L1:
> 	.loc 1 95837 220
> 	movq	-24(%rbp), %rax
> 	subq	%fs:40, %rax
> 	je	.L17743
> 	call	__stack_chk_fail@PLT
> .L17743:
> 	movq	-8(%rbp), %rbx
> 	leave
> 	.cfi_def_cfa 7, 8
> 	ret

The "ret" says it is not a tail call.  Perhaps because it needs the
leave insn.  Oh, much more likely because __stack_chk_fail is noreturn.
We really need to fix that :-)

If you use -fdump-rtl-sibling-all the dump file might tell you more?

> PS:  LLVM seems to have a musttail attribute that you use at the call site:
> 
> -#define ___PROPER_TAIL_CALL(call) __attribute__((musttail)) return call
> 
> Does gcc have something like that?

I don't think so, no.

Note that this will fail unpredictably in the same way as what you
complained about before, and more so if you try to build for another
architecture than what the original author did.


Segher

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: CALL_EXPR_MUST_TAIL_CALL and LLVM's musttail
  2021-12-09 13:30       ` Marc Feeley
  2021-12-09 14:04         ` Florian Weimer
@ 2021-12-09 23:36         ` Segher Boessenkool
  2021-12-10  1:06           ` Marc Feeley
  1 sibling, 1 reply; 22+ messages in thread
From: Segher Boessenkool @ 2021-12-09 23:36 UTC (permalink / raw)
  To: Marc Feeley
  Cc: Florian Weimer, Jeff Law, gcc-help, Lucier, Bradley J, Bradley Lucier

On Thu, Dec 09, 2021 at 08:30:56AM -0500, Marc Feeley wrote:
> [Marc Feeley here, the author of the Gambit Scheme compiler that generates the code which would benefit from tail-calls]

Hi :-)

> What is needed (for Gambit) is a way for the programmer to express in the source code that the tail-call is known to be optimizable, i.e. it does not violate any of the tail-calling constraints, such as all local variables are dead (including those whose address was taken with “&var”) at the moment of the call.

How can you express that in the source code?  The compiler might not
agree, especially on architectures the code write did not consider.

And why do you need it?  Is it just to not have unlimited poterntial
stack use, like, from threaded code?

I think the best the compiler can do is say "sorry, no can do" when it
does not know how to elide a return after some specific call, on
whatever cpu+os combo you are targeting.

> The code generated by the Gambit compiler is sufficiently complex that it would be impossible (in the halting problem sense) to check that the tail-calling constraints are not violated. However, I know the constraints are not violated because the code was designed that way. And if I’m mistaken in this belief, then I am the one that will bare the blame of the ensuing problems.

On most architectures you can simply check if there are any "return"
instructions generated, no?  Or do you generate any actual calls (to
user code) as well?

> So having a way to check during or after the compilation that tail-calls were optimized is not sufficient. A Scheme compiler that sometimes fails to compile is not very useful.  What is needed is a guarantee that specific C calls are optimized.

Then you need to write code that works as you want on all configurations
you support, state what those configs are, and test that it compiles on
all such configs, before every release.

> FYI the code generated by Gambit is put in “host” functions that look like this:
> 
>   typedef struct ___processor_state_struct { … };
>   typedef struct ___processor_state_struct *___processor_state;
>   typedef void (*___host)(___processor_state ___ps);
> 
>   static void host1(___processor_state ___ps) {
>     …
>     ___host h = …;
>     h(___ps);
>   }
> 
> So the last statement in the host function is a call to the next host function.

Classical threaded code.  Good :-)


Segher

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: CALL_EXPR_MUST_TAIL_CALL and LLVM's musttail
  2021-12-09 23:36         ` Segher Boessenkool
@ 2021-12-10  1:06           ` Marc Feeley
  2021-12-10 22:40             ` Jeff Law
                               ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Marc Feeley @ 2021-12-10  1:06 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Florian Weimer, Jeff Law, gcc-help, Lucier, Bradley J, Bradley Lucier


> On Dec 9, 2021, at 6:36 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
>> 
>> The code generated by the Gambit compiler is sufficiently complex that it would be impossible (in the halting problem sense) to check that the tail-calling constraints are not violated. However, I know the constraints are not violated because the code was designed that way. And if I’m mistaken in this belief, then I am the one that will bare the blame of the ensuing problems.
> 
> On most architectures you can simply check if there are any "return"
> instructions generated, no?  Or do you generate any actual calls (to
> user code) as well?
> 

You mean a “return” at the assembly code level?  Seems very brittle… on x86 a “pop %eax; jmp *%eax” will do the same as a “ret” and gcc might generate that sequence (from my point of view).  In any case, as I said previously, I don’t want to check if gcc did not optimize the tail calls… it is too late to do anything other than essentially tell the programmer “sorry I couldn’t compile that program correctly”.

>> So having a way to check during or after the compilation that tail-calls were optimized is not sufficient. A Scheme compiler that sometimes fails to compile is not very useful.  What is needed is a guarantee that specific C calls are optimized.
> 
> Then you need to write code that works as you want on all configurations
> you support, state what those configs are, and test that it compiles on
> all such configs, before every release.

This does not solve my problem either. No testing will cover all possible situations because for a particular architecture the tail call optimization may work on all the tests I can come up with, but some programmer using Gambit may write a program for which the tail calls in the C code generated by the Gambit compiler are not optimized.

I’m not looking for a guarantee that tail calls are supported transparently on all architectures.  Currently Gambit uses a trampoline to implement Scheme tail calls, and this can be used as a fallback for architectures where C TCO is not guaranteed.

So what I need is some way to test at compile time if for the target architecture and compile options there is a guarantee that tail calls will be optimized (if the C code obeys the usual set of constraints).  Something like

  #if __has_attribute(tail_call_supported_on_this_architecture)
  /* use tail calls for jumping around... */
  #else
  /* use trampoline fallback... */
  #endif

Marc



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: CALL_EXPR_MUST_TAIL_CALL and LLVM's musttail
  2021-12-09 14:17           ` Marc Feeley
@ 2021-12-10  4:31             ` Florian Weimer
  2021-12-11  2:44               ` Segher Boessenkool
  0 siblings, 1 reply; 22+ messages in thread
From: Florian Weimer @ 2021-12-10  4:31 UTC (permalink / raw)
  To: Marc Feeley
  Cc: Segher Boessenkool, Jeff Law, gcc-help, Lucier, Bradley J,
	Bradley Lucier

* Marc Feeley:

>> On Dec 9, 2021, at 9:04 AM, Florian Weimer <fweimer@redhat.com> wrote:
>> 
>> If the life-time of the variables whose address has been taken has ended
>> at the time of the call in a tail position, I expect GCC to turn it into
>> a tail call (subject to the other constraints mentioned).
>
> That’s a very interesting point an I will give it a try.
>
> What worries me however is that you say “I expect GCC to turn it into
> a tail call” and I really need “I guarantee that GCC will turn it into
> a tail call”.

I'm just a GCC user.  And GCC is sufficiently complex and has many
targets that maybe no GCC developer wants to make such a guarantee.

If you don't want to patch glibc, you can perhaps inspect the generated
assembler code.  If the only indirect calls are the tail calls, it might
work.  Kind of like the Evil Mangler, but with less mangling.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: CALL_EXPR_MUST_TAIL_CALL and LLVM's musttail
  2021-12-10  1:06           ` Marc Feeley
@ 2021-12-10 22:40             ` Jeff Law
  2021-12-12 17:32               ` Segher Boessenkool
  2021-12-11 23:02             ` Segher Boessenkool
  2021-12-13 17:48             ` Avi Kivity
  2 siblings, 1 reply; 22+ messages in thread
From: Jeff Law @ 2021-12-10 22:40 UTC (permalink / raw)
  To: Marc Feeley, Segher Boessenkool
  Cc: Florian Weimer, gcc-help, Lucier, Bradley J, Bradley Lucier



On 12/9/2021 6:06 PM, Marc Feeley wrote:
>> On Dec 9, 2021, at 6:36 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>
>>> The code generated by the Gambit compiler is sufficiently complex that it would be impossible (in the halting problem sense) to check that the tail-calling constraints are not violated. However, I know the constraints are not violated because the code was designed that way. And if I’m mistaken in this belief, then I am the one that will bare the blame of the ensuing problems.
>> On most architectures you can simply check if there are any "return"
>> instructions generated, no?  Or do you generate any actual calls (to
>> user code) as well?
>>
> You mean a “return” at the assembly code level?  Seems very brittle… on x86 a “pop %eax; jmp *%eax” will do the same as a “ret” and gcc might generate that sequence (from my point of view).
It might generate something horrible like that for retpolines, but for 
normal code that kind of sequence would be awful for performance (it'll 
mess up the call/return predictor stack) and terrible for security (it's 
a ROP gadget).





>   In any case, as I said previously, I don’t want to check if gcc did not optimize the tail calls… it is too late to do anything other than essentially tell the programmer “sorry I couldn’t compile that program correctly”.
We don't generally like that kind of diagnostic particularly from the 
code generator.  If it's valid C code, then we need to generate correct 
target code.
>
> So what I need is some way to test at compile time if for the target architecture and compile options there is a guarantee that tail calls will be optimized (if the C code obeys the usual set of constraints).  Something like
>
>    #if __has_attribute(tail_call_supported_on_this_architecture)
>    /* use tail calls for jumping around... */
>    #else
>    /* use trampoline fallback... */
>    #endif
Which is the fundamental problem.  GCC simply doesn't have the concept 
that "this must be implemented as a tail call".

You might want to touch base with Iain Sandoe though.  I vaguely 
remember this coming up in the context of coroutines.

jeff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: CALL_EXPR_MUST_TAIL_CALL and LLVM's musttail
  2021-12-10  4:31             ` Florian Weimer
@ 2021-12-11  2:44               ` Segher Boessenkool
  0 siblings, 0 replies; 22+ messages in thread
From: Segher Boessenkool @ 2021-12-11  2:44 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Marc Feeley, Jeff Law, gcc-help, Lucier, Bradley J, Bradley Lucier

On Fri, Dec 10, 2021 at 05:31:39AM +0100, Florian Weimer wrote:
> * Marc Feeley:
> >> On Dec 9, 2021, at 9:04 AM, Florian Weimer <fweimer@redhat.com> wrote:
> > What worries me however is that you say “I expect GCC to turn it into
> > a tail call” and I really need “I guarantee that GCC will turn it into
> > a tail call”.
> 
> I'm just a GCC user.  And GCC is sufficiently complex and has many
> targets that maybe no GCC developer wants to make such a guarantee.

It is impossible to make that guarantee (with most ABIs).  What we can
(and imho should) do is add something like that musttail attribute, to
make the compiler loudly complain if it cannot do it, instead of
silently generate code that does not do what the user needs.

But that will only work with new compilers, after we implemented such
a feature.  For all older compilers (going back 21 years, anyway) you
can just ask for it (-O2 or -Os is enough), and check if you got what
you needed.

> If you don't want to patch glibc, you can perhaps inspect the generated
> assembler code.  If the only indirect calls are the tail calls, it might
> work.  Kind of like the Evil Mangler, but with less mangling.

Yup :-)


Segher

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: CALL_EXPR_MUST_TAIL_CALL and LLVM's musttail
  2021-12-10  1:06           ` Marc Feeley
  2021-12-10 22:40             ` Jeff Law
@ 2021-12-11 23:02             ` Segher Boessenkool
  2021-12-13 17:48             ` Avi Kivity
  2 siblings, 0 replies; 22+ messages in thread
From: Segher Boessenkool @ 2021-12-11 23:02 UTC (permalink / raw)
  To: Marc Feeley
  Cc: Florian Weimer, Jeff Law, gcc-help, Lucier, Bradley J, Bradley Lucier

On Thu, Dec 09, 2021 at 08:06:51PM -0500, Marc Feeley wrote:
> > On Dec 9, 2021, at 6:36 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >> The code generated by the Gambit compiler is sufficiently complex that it would be impossible (in the halting problem sense) to check that the tail-calling constraints are not violated. However, I know the constraints are not violated because the code was designed that way. And if I’m mistaken in this belief, then I am the one that will bare the blame of the ensuing problems.
> > 
> > On most architectures you can simply check if there are any "return"
> > instructions generated, no?  Or do you generate any actual calls (to
> > user code) as well?
> 
> You mean a “return” at the assembly code level?  Seems very brittle… on x86 a “pop %eax; jmp *%eax” will do the same as a “ret” and gcc might generate that sequence (from my point of view).  In any case, as I said previously, I don’t want to check if gcc did not optimize the tail calls… it is too late to do anything other than essentially tell the programmer “sorry I couldn’t compile that program correctly”.

I know of no single compiler that does not have very predictable insns
as return sequence.  There could be more than one sequence you need to
check for, sure, but in reality compilers generate only one or a few
code sequences to do returns: by its very nature a return can not be
combined with some other operation and optimised that way.

 >> So having a way to check during or after the compilation that tail-calls were optimized is not sufficient. A Scheme compiler that sometimes fails to compile is not very useful.  What is needed is a guarantee that specific C calls are optimized.
> > 
> > Then you need to write code that works as you want on all configurations
> > you support, state what those configs are, and test that it compiles on
> > all such configs, before every release.
> 
> This does not solve my problem either. No testing will cover all possible situations because for a particular architecture the tail call optimization may work on all the tests I can come up with, but some programmer using Gambit may write a program for which the tail calls in the C code generated by the Gambit compiler are not optimized.

You need to know what a particular compiler (for a particular target, in
a particular configuration, invoked with particular flags) will do with
epilogues, yes.  This is *unavoidable* if the compiler will not tell you
itself if every tail position call is optimised to an actual tail call,
and no existing GCC version will tell you.

> I’m not looking for a guarantee that tail calls are supported transparently on all architectures.  Currently Gambit uses a trampoline to implement Scheme tail calls, and this can be used as a fallback for architectures where C TCO is not guaranteed.
> 
> So what I need is some way to test at compile time if for the target architecture and compile options there is a guarantee that tail calls will be optimized (if the C code obeys the usual set of constraints).  Something like
> 
>   #if __has_attribute(tail_call_supported_on_this_architecture)
>   /* use tail calls for jumping around... */
>   #else
>   /* use trampoline fallback... */
>   #endif

This is impossible to do.  Tail calls are supported on *all* GCC
architectures, but not every (tail position) function call will be
optimised to a tail call, and not every such function call *can* (except
on architectures with a separate return stack of course, but
unfortunately there are very few of those).


Segher

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: CALL_EXPR_MUST_TAIL_CALL and LLVM's musttail
  2021-12-10 22:40             ` Jeff Law
@ 2021-12-12 17:32               ` Segher Boessenkool
  2021-12-12 18:08                 ` Jonathan Wakely
  0 siblings, 1 reply; 22+ messages in thread
From: Segher Boessenkool @ 2021-12-12 17:32 UTC (permalink / raw)
  To: Jeff Law
  Cc: Marc Feeley, Florian Weimer, gcc-help, Lucier, Bradley J, Bradley Lucier

On Fri, Dec 10, 2021 at 03:40:30PM -0700, Jeff Law wrote:
> On 12/9/2021 6:06 PM, Marc Feeley wrote:
> >  In any case, as I said previously, I don’t want to check if gcc did 
> >  not optimize the tail calls… it is too late to do anything other than 
> >  essentially tell the programmer “sorry I couldn’t compile that 
> >  program correctly”.
> We don't generally like that kind of diagnostic particularly from the 
> code generator.  If it's valid C code, then we need to generate correct 
> target code.

Yes.  But we *do* have sorry()s for some other cases where it is in
principle possible to generate correct code, but we just don't have that
implemented.

> >So what I need is some way to test at compile time if for the target 
> >architecture and compile options there is a guarantee that tail calls will 
> >be optimized (if the C code obeys the usual set of constraints).  
> >Something like
> >
> >   #if __has_attribute(tail_call_supported_on_this_architecture)
> >   /* use tail calls for jumping around... */
> >   #else
> >   /* use trampoline fallback... */
> >   #endif
> Which is the fundamental problem.  GCC simply doesn't have the concept 
> that "this must be implemented as a tail call".

All architecture support tail calls, and all architectures do not
support them everywhere: for some code GCC will not do a tail call on
any architecture.

An attribute that means "sorry() if you cannot do a tail call here" (or
"on any call to this function") can be implemented, and will work fine
in practice for all users, in my estimation.

> You might want to touch base with Iain Sandoe though.  I vaguely 
> remember this coming up in the context of coroutines.

Yes, it did.  I don't remember the exact considerations there.  It is
done from GCC itself, so it has more introspection than if it would be
done from user code.  Not sure if that matters for the actual
implementation though?


Segher

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: CALL_EXPR_MUST_TAIL_CALL and LLVM's musttail
  2021-12-12 17:32               ` Segher Boessenkool
@ 2021-12-12 18:08                 ` Jonathan Wakely
  2021-12-12 21:41                   ` Segher Boessenkool
  0 siblings, 1 reply; 22+ messages in thread
From: Jonathan Wakely @ 2021-12-12 18:08 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Jeff Law, Marc Feeley, Florian Weimer, Lucier, Bradley J,
	Bradley Lucier, gcc-help

On Sun, 12 Dec 2021, 17:34 Segher Boessenkool, <segher@kernel.crashing.org>
wrote:

> On Fri, Dec 10, 2021 at 03:40:30PM -0700, Jeff Law wrote:
> > On 12/9/2021 6:06 PM, Marc Feeley wrote:
> > >  In any case, as I said previously, I don’t want to check if gcc did
> > >  not optimize the tail calls… it is too late to do anything other than
> > >  essentially tell the programmer “sorry I couldn’t compile that
> > >  program correctly”.
> > We don't generally like that kind of diagnostic particularly from the
> > code generator.  If it's valid C code, then we need to generate correct
> > target code.
>
> Yes.  But we *do* have sorry()s for some other cases where it is in
> principle possible to generate correct code, but we just don't have that
> implemented.
>
> > >So what I need is some way to test at compile time if for the target
> > >architecture and compile options there is a guarantee that tail calls
> will
> > >be optimized (if the C code obeys the usual set of constraints).
> > >Something like
> > >
> > >   #if __has_attribute(tail_call_supported_on_this_architecture)
> > >   /* use tail calls for jumping around... */
> > >   #else
> > >   /* use trampoline fallback... */
> > >   #endif
> > Which is the fundamental problem.  GCC simply doesn't have the concept
> > that "this must be implemented as a tail call".
>
> All architecture support tail calls, and all architectures do not
> support them everywhere: for some code GCC will not do a tail call on
> any architecture.
>
> An attribute that means "sorry() if you cannot do a tail call here" (or
> "on any call to this function") can be implemented, and will work fine
> in practice for all users, in my estimation.
>

That seems roughly analogous to always_inline.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: CALL_EXPR_MUST_TAIL_CALL and LLVM's musttail
  2021-12-12 18:08                 ` Jonathan Wakely
@ 2021-12-12 21:41                   ` Segher Boessenkool
  0 siblings, 0 replies; 22+ messages in thread
From: Segher Boessenkool @ 2021-12-12 21:41 UTC (permalink / raw)
  To: Jonathan Wakely
  Cc: Jeff Law, Marc Feeley, Florian Weimer, Lucier, Bradley J,
	Bradley Lucier, gcc-help

On Sun, Dec 12, 2021 at 06:08:06PM +0000, Jonathan Wakely wrote:
> On Sun, 12 Dec 2021, 17:34 Segher Boessenkool, <segher@kernel.crashing.org>
> wrote:
> > On Fri, Dec 10, 2021 at 03:40:30PM -0700, Jeff Law wrote:
> > > On 12/9/2021 6:06 PM, Marc Feeley wrote:
> > > >   #if __has_attribute(tail_call_supported_on_this_architecture)
> > > >   /* use tail calls for jumping around... */
> > > >   #else
> > > >   /* use trampoline fallback... */
> > > >   #endif
> > > Which is the fundamental problem.  GCC simply doesn't have the concept
> > > that "this must be implemented as a tail call".
> >
> > All architecture support tail calls, and all architectures do not
> > support them everywhere: for some code GCC will not do a tail call on
> > any architecture.
> >
> > An attribute that means "sorry() if you cannot do a tail call here" (or
> > "on any call to this function") can be implemented, and will work fine
> > in practice for all users, in my estimation.
> 
> That seems roughly analogous to always_inline.

Yes indeed, good example :-)  This will be implemented very late in the
pass pipeline instead of very early, but that does not matter for what
the user sees.


Segher

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: CALL_EXPR_MUST_TAIL_CALL and LLVM's musttail
  2021-12-10  1:06           ` Marc Feeley
  2021-12-10 22:40             ` Jeff Law
  2021-12-11 23:02             ` Segher Boessenkool
@ 2021-12-13 17:48             ` Avi Kivity
  2021-12-15 18:29               ` Bradley Lucier
  2 siblings, 1 reply; 22+ messages in thread
From: Avi Kivity @ 2021-12-13 17:48 UTC (permalink / raw)
  To: Marc Feeley, Segher Boessenkool
  Cc: Florian Weimer, Lucier, Bradley J, Bradley Lucier, gcc-help


On 10/12/2021 03.06, Marc Feeley via Gcc-help wrote:
>> On Dec 9, 2021, at 6:36 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>>
>>> The code generated by the Gambit compiler is sufficiently complex that it would be impossible (in the halting problem sense) to check that the tail-calling constraints are not violated. However, I know the constraints are not violated because the code was designed that way. And if I’m mistaken in this belief, then I am the one that will bare the blame of the ensuing problems.
>> On most architectures you can simply check if there are any "return"
>> instructions generated, no?  Or do you generate any actual calls (to
>> user code) as well?
>>
> You mean a “return” at the assembly code level?  Seems very brittle… on x86 a “pop %eax; jmp *%eax” will do the same as a “ret” and gcc might generate that sequence (from my point of view).  In any case, as I said previously, I don’t want to check if gcc did not optimize the tail calls… it is too late to do anything other than essentially tell the programmer “sorry I couldn’t compile that program correctly”.
>
>>> So having a way to check during or after the compilation that tail-calls were optimized is not sufficient. A Scheme compiler that sometimes fails to compile is not very useful.  What is needed is a guarantee that specific C calls are optimized.
>> Then you need to write code that works as you want on all configurations
>> you support, state what those configs are, and test that it compiles on
>> all such configs, before every release.
> This does not solve my problem either. No testing will cover all possible situations because for a particular architecture the tail call optimization may work on all the tests I can come up with, but some programmer using Gambit may write a program for which the tail calls in the C code generated by the Gambit compiler are not optimized.
>
> I’m not looking for a guarantee that tail calls are supported transparently on all architectures.  Currently Gambit uses a trampoline to implement Scheme tail calls, and this can be used as a fallback for architectures where C TCO is not guaranteed.
>
> So what I need is some way to test at compile time if for the target architecture and compile options there is a guarantee that tail calls will be optimized (if the C code obeys the usual set of constraints).  Something like
>
>    #if __has_attribute(tail_call_supported_on_this_architecture)
>    /* use tail calls for jumping around... */
>    #else
>    /* use trampoline fallback... */
>    #endif
>
> Marc
>
>

Perhaps you should generate one function with a huge switch instead of 
separate functions. Each original function translates to a case, and a 
tail call translates to a goto. Calling one of the functions from 
outside translates to using the switch to select the initial case. This 
should work well if only a minority of the calls are from outside.




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: CALL_EXPR_MUST_TAIL_CALL and LLVM's musttail
  2021-12-13 17:48             ` Avi Kivity
@ 2021-12-15 18:29               ` Bradley Lucier
  0 siblings, 0 replies; 22+ messages in thread
From: Bradley Lucier @ 2021-12-15 18:29 UTC (permalink / raw)
  To: Avi Kivity, Marc Feeley, Segher Boessenkool
  Cc: lucier, Florian Weimer, Bradley Lucier, gcc-help

On 12/13/21 12:48 PM, Avi Kivity wrote:
> Perhaps you should generate one function with a huge switch instead of 
> separate functions. Each original function translates to a case, and a 
> tail call translates to a goto. Calling one of the functions from 
> outside translates to using the switch to select the initial case. This 
> should work well if only a minority of the calls are from outside.

That's how Gambit's C code has been generated for a while now.

One problem is in (Scheme) code like

(map abs list-of-numbers)

where, even if the implicit loop in map is expanded inline, each call to 
abs may be an intermodule call to the runtime library.  With some other 
tricks (e.g., eta expansion 
https://funcall.blogspot.com/2021/04/and-tail-recursion.html) the fast 
path for the abs might also be inlined, in any case intermodule calls 
are not as infrequent as one might guess.

Brad

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2021-12-15 18:29 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-07 22:38 CALL_EXPR_MUST_TAIL_CALL and LLVM's musttail Bradley Lucier
2021-12-08 23:17 ` Jeff Law
2021-12-09  0:07   ` Segher Boessenkool
2021-12-09  3:31     ` Bradley Lucier
2021-12-09 10:56       ` Florian Weimer
2021-12-09 23:15       ` Segher Boessenkool
2021-12-09 10:57     ` Florian Weimer
2021-12-09 13:30       ` Marc Feeley
2021-12-09 14:04         ` Florian Weimer
2021-12-09 14:17           ` Marc Feeley
2021-12-10  4:31             ` Florian Weimer
2021-12-11  2:44               ` Segher Boessenkool
2021-12-09 23:36         ` Segher Boessenkool
2021-12-10  1:06           ` Marc Feeley
2021-12-10 22:40             ` Jeff Law
2021-12-12 17:32               ` Segher Boessenkool
2021-12-12 18:08                 ` Jonathan Wakely
2021-12-12 21:41                   ` Segher Boessenkool
2021-12-11 23:02             ` Segher Boessenkool
2021-12-13 17:48             ` Avi Kivity
2021-12-15 18:29               ` Bradley Lucier
2021-12-09 22:00       ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).