Work in progress: "Super Sib Calls"; opinions sought

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* Work in progress: "Super Sib Calls"; opinions sought
@ 2002-09-08 19:52 Andreas Bauer
  2002-09-08 23:23 ` Richard Henderson
  0 siblings, 1 reply; 29+ messages in thread
From: Andreas Bauer @ 2002-09-08 19:52 UTC (permalink / raw)
  To: gcc; +Cc: pizka, jason.ozolins

[-- Attachment #1: Type: text/plain, Size: 3806 bytes --]

Dear gcc hackers,

I'm currently trying to develop and implement a more general tail call
optimisation for gcc 3.2.  My focus is on enhancing the compilation of
functional languages that use GNU C as target platform, such as Haskell
or Mercury.

APPROACH(ES):

I'm planning to go ahead with two separate approaches:  a) implementation
of what I call "super sib calls" and b) implementation of a "fully"
general tail call optimisation that can be applied to all the cases where
(super) sib calls normally fail.

I'm treating super sib calls as an independend call sequence on the RTL
level to not mess with the existing sibling call stuff.  But the
technique I'm applying is very similar to what's already there in gcc.
So, I'm merely removing some of the common constraints, e.g.

 - super sib calls may have a positive stack frame

 - super sib calls must not necessarily match in function signature

 - super sib call arguments are not concerned by overlapping arguments

 - etc.

This is achieved by creating "almost a normal call", but right before the
actual call instruction would be used, I'm copying all the outgoing
arguments back into the incoming argument space and jump to the subroutine
instead.  The result is that I can reuse my incoming argument space and
prevent stack growth for such cases.

Unlike broad and general approaches, this technique is binary compatible
with gcc compiled programs and doesn't require extra precautions.  It
does have some flaws though, but I'm mailing to discuss them with you.

Please, find attached a first patch that implements super sib calls for
gcc 3.2.  I have only tested it on x86, so far with a small sample code
that I have also included with this email (in case you want to try it on
a small demo program).  And no, the current patch does not yet solve all
the restrictions of sibcalls, but in the long run it should also support

 - indirect calls

 - more architectures

 - struct-returns / -args(?).

I'd be very grateful, if you pointed out problems in my code and
implications that I may not be aware of yet.  As I said, it's only a
start and it's work in progress, but an early discussion on the list may
prevent me and others from doing obvious mistakes when messing with call
chains and the like.

EXAMPLE:

The following little program computes the factorial of a number, but if I
use the "ordinary" gcc 3.2 as is, then on my machine I'm not getting
beyond fac(6) without stack overflow.  Using --foptimize-tail-calls and
my patch, you will have stack reusage and therefore no overflow at all.

#include <stdio.h>

int get_num (void);
int factorial (long long);
int fact_help (int, int);

int recursion;

int main ()
{
  int value;
  printf ("Enter 0 to exit.\n");
  value = get_num();
  while (value != 0)
    {
      printf ("Factorial: %d\n", factorial(value));
      printf ("Calls: %d\n", recursion);
      value = get_num();
    }
  return 0;
}

int get_num (void)
{
  int num;
  recursion = 0;
  printf ("Number: ");
  scanf ("%d", &num);
  return num;
}

int factorial (long long x)
{
  if (x != 1)
    {
      recursion++;
      /* This is being optimized as tail call */
      return fact_help (x, x - 1);
    }
  else
    return 1;
}

int fact_help (int total, int x)
{
  int dummy_array[500000];
  
  if (x == 1)
    return total;
  else 
    {
      recursion++;
      /* Another tail call */
      return fact_help (total * x, x - 1);
    }
}

Excuse the length of this mail and the patch, but hopefully it helped
making clear what I am working on and what I am trying to achieve here.
However, I did leave out the discussion of a fully general
implementation, as this is not relevant just yet.  Hope, that's ok.

Thank you in advance.
Cheers,
Andi.

P.S. the patch is from my current local cvs against the original gcc 3.2.

[-- Attachment #2: super_sib_call_0_1.diff --]
[-- Type: text/plain, Size: 55137 bytes --]

Only in gcc-3.2-cvs/gcc: .#calls.c.1.2
Only in gcc-3.2-cvs/gcc: .#cfgbuild.c.1.1.1.1
Only in gcc-3.2-cvs/gcc: .#except.c.1.1.1.1
Only in gcc-3.2-cvs/gcc: .#function.c.1.1.1.1
Only in gcc-3.2-cvs/gcc: .#integrate.c.1.2
Only in gcc-3.2-cvs/gcc: .#jump.c.1.1.1.1
Only in gcc-3.2-cvs/gcc: .#rtl.h.1.2
Only in gcc-3.2-cvs/gcc: .#sibcall.c.1.1.1.1
Only in gcc-3.2-cvs/gcc: .#toplev.c.1.2
Only in gcc-3.2-cvs/gcc: CVS
Only in gcc-3.2-cvs/gcc: TAGS
Common subdirectories: gcc-3.2/gcc/ada and gcc-3.2-cvs/gcc/ada
diff -b -w -B -d -p -c gcc-3.2/gcc/calls.c gcc-3.2-cvs/gcc/calls.c
*** gcc-3.2/gcc/calls.c	Fri Apr  5 09:28:47 2002
--- gcc-3.2-cvs/gcc/calls.c	Mon Sep  9 11:31:10 2002
*************** static int calls_function_1	PARAMS ((tre
*** 167,172 ****
--- 167,173 ----
  /* Nonzero if this is a syscall that makes a new process in the image of
     the current one.  */
  #define ECF_FORK_OR_EXEC	128
+ /* Nonzero if this is a sib call.  */
  #define ECF_SIBCALL		256
  /* Nonzero if this is a call to "pure" function (like const function,
     but may read memory.  */
*************** static int calls_function_1	PARAMS ((tre
*** 178,183 ****
--- 179,195 ----
  #define ECF_ALWAYS_RETURN	2048
  /* Create libcall block around the call.  */
  #define ECF_LIBCALL_BLOCK	4096
+ /* Nonzero if this call is a more general tail call, i.e.
+    (a super sib call).  */
+ #define ECF_TAILCALL            8192
+ 
+ /* Definitions for the different call instruction sequences
+    generated by expand_call.  */
+ enum call_seq {
+   sib_call_seq = 0,
+   normal_call_seq,
+   tail_call_seq
+ };
  
  static void emit_call_1		PARAMS ((rtx, tree, tree, HOST_WIDE_INT,
  					 HOST_WIDE_INT, HOST_WIDE_INT, rtx,
*************** emit_call_1 (funexp, fndecl, funtype, st
*** 627,633 ****
        current_function_calls_setjmp = 1;
      }
  
!   SIBLING_CALL_P (call_insn) = ((ecf_flags & ECF_SIBCALL) != 0);
  
    /* Restore this now, so that we do defer pops for this call's args
       if the context of the call as a whole permits.  */
--- 639,654 ----
        current_function_calls_setjmp = 1;
      }
  
!   /* Annotate sib- and supersib-calls, i.e. calls in tail position.  */
!   if ((ecf_flags & ECF_SIBCALL) || (ecf_flags & ECF_TAILCALL))
!     SIBLING_CALL_P (call_insn) = 1;
! 
!   /* This flag is important for determining in final.c, whether
!      the current function is a leaf or not.  */
!   if (ecf_flags & ECF_TAILCALL)
!     TAIL_CALL_P (call_insn) = 1;
!   else
!     TAIL_CALL_P (call_insn) = 0;
  
    /* Restore this now, so that we do defer pops for this call's args
       if the context of the call as a whole permits.  */
*************** expand_call (exp, target, ignore)
*** 2085,2091 ****
    rtx tail_recursion_insns = NULL_RTX;
    /* Sequence of insns to perform a normal "call".  */
    rtx normal_call_insns = NULL_RTX;
!   /* Sequence of insns to perform a tail recursive "call".  */
    rtx tail_call_insns = NULL_RTX;
    /* Data type of the function.  */
    tree funtype;
--- 2106,2114 ----
    rtx tail_recursion_insns = NULL_RTX;
    /* Sequence of insns to perform a normal "call".  */
    rtx normal_call_insns = NULL_RTX;
!   /* Sequence of insns to perform a sib call.  */
!   rtx sib_call_insns = NULL_RTX;
!   /* Sequence of insns to perform a tail call.  */
    rtx tail_call_insns = NULL_RTX;
    /* Data type of the function.  */
    tree funtype;
*************** expand_call (exp, target, ignore)
*** 2094,2101 ****
    tree fndecl = 0;
    rtx insn;
    int try_tail_call = 1;
    int try_tail_recursion = 1;
!   int pass;
  
    /* Register in which non-BLKmode value will be returned,
       or 0 if no value or if value is BLKmode.  */
--- 2117,2125 ----
    tree fndecl = 0;
    rtx insn;
    int try_tail_call = 1;
+   int try_sib_call = 1;
    int try_tail_recursion = 1;
!   int pass = 0;
  
    /* Register in which non-BLKmode value will be returned,
       or 0 if no value or if value is BLKmode.  */
*************** expand_call (exp, target, ignore)
*** 2124,2130 ****
    /* Vector of information about each argument.
       Arguments are numbered in the order they will be pushed,
       not the order they are written.  */
!   struct arg_data *args;
  
    /* Total size in bytes of all the stack-parms scanned so far.  */
    struct args_size args_size;
--- 2148,2154 ----
    /* Vector of information about each argument.
       Arguments are numbered in the order they will be pushed,
       not the order they are written.  */
!   struct arg_data *args, *args_tmp;
  
    /* Total size in bytes of all the stack-parms scanned so far.  */
    struct args_size args_size;
*************** expand_call (exp, target, ignore)
*** 2375,2380 ****
--- 2399,2409 ----
    args = (struct arg_data *) alloca (num_actuals * sizeof (struct arg_data));
    memset ((char *) args, 0, num_actuals * sizeof (struct arg_data));
  
+   /* Make space for a copy of args, needed when iterating through the
+      call chain generation.  */
+   args_tmp = (struct arg_data *) alloca (num_actuals * sizeof (struct arg_data));
+   memset ((char *) args_tmp, 0, num_actuals * sizeof (struct arg_data));
+   
    /* Build up entries in the ARGS array, compute the size of the
       arguments into ARGS_SIZE, etc.  */
    initialize_argument_information (num_actuals, args, &args_size,
*************** expand_call (exp, target, ignore)
*** 2422,2433 ****
       This is most often true of sjlj-exceptions, which we couldn't
       tail-call to anyway.  */
  
!   if (currently_expanding_call++ != 0
        || !flag_optimize_sibling_calls
        || !rtx_equal_function_value_matters
        || any_pending_cleanups (1)
        || args_size.var)
!     try_tail_call = try_tail_recursion = 0;
  
    /* Tail recursion fails, when we are not dealing with recursive calls.  */
    if (!try_tail_recursion
--- 2451,2476 ----
       This is most often true of sjlj-exceptions, which we couldn't
       tail-call to anyway.  */
  
!   if (currently_expanding_call != 0
        || !flag_optimize_sibling_calls
        || !rtx_equal_function_value_matters
        || any_pending_cleanups (1)
        || args_size.var)
!     try_sib_call = try_tail_recursion = 0;
! 
!   /* This check is somewhat redundant, but for the sake of
!      expandability and convenient editing, it makes sense
!      to keep it for now.  Later on, we try to cut down the 
!      conditions bit by bit.  */
!   
!   if (currently_expanding_call != 0
!       || !flag_optimize_tail_calls
!       || !rtx_equal_function_value_matters
!       || any_pending_cleanups (1)
!       || args_size.var)
!     try_tail_call = 0;
! 
!   currently_expanding_call++;
    
    /* Tail recursion fails, when we are not dealing with recursive calls.  */
    if (!try_tail_recursion
*************** expand_call (exp, target, ignore)
*** 2442,2448 ****
  #else
        1
  #endif
-       || !try_tail_call
        /* Doing sibling call optimization needs some work, since
  	 structure_value_addr can be allocated on the stack.
  	 It does not seem worth the effort since few optimizable
--- 2485,2490 ----
*************** expand_call (exp, target, ignore)
*** 2468,2476 ****
  	 != RETURN_POPS_ARGS (current_function_decl,
  			      TREE_TYPE (current_function_decl),
  			      current_function_args_size))
!   try_tail_call = 0;
  
!   if (try_tail_call || try_tail_recursion)
      {
        int end, inc;
        actparms = NULL_TREE;
--- 2510,2518 ----
  	 != RETURN_POPS_ARGS (current_function_decl,
  			      TREE_TYPE (current_function_decl),
  			      current_function_args_size))
!     try_tail_call = try_sib_call = 0;
      
!   if (try_sib_call || try_tail_recursion)
      {
        int end, inc;
        actparms = NULL_TREE;
*************** expand_call (exp, target, ignore)
*** 2536,2542 ****
        /* Expanding one of those dangerous arguments could have added
  	 cleanups, but otherwise give it a whirl.  */
        if (any_pending_cleanups (1))
! 	try_tail_call = try_tail_recursion = 0;
      }
  
    /* Generate a tail recursion sequence when calling ourselves.  */
--- 2578,2584 ----
        /* Expanding one of those dangerous arguments could have added
  	 cleanups, but otherwise give it a whirl.  */
        if (any_pending_cleanups (1))
! 	try_sib_call = try_tail_recursion = 0;
      }
  
    /* Generate a tail recursion sequence when calling ourselves.  */
*************** expand_call (exp, target, ignore)
*** 2568,2574 ****
        if (optimize_tail_recursion (actparms, get_last_insn ()))
  	{
  	  if (any_pending_cleanups (1))
! 	    try_tail_call = try_tail_recursion = 0;
  	  else
  	    tail_recursion_insns = get_insns ();
  	}
--- 2610,2616 ----
        if (optimize_tail_recursion (actparms, get_last_insn ()))
  	{
  	  if (any_pending_cleanups (1))
! 	    try_sib_call = try_tail_recursion = 0;
  	  else
  	    tail_recursion_insns = get_insns ();
  	}
*************** expand_call (exp, target, ignore)
*** 2606,2617 ****
  
    function_call_count++;
  
!   /* We want to make two insn chains; one for a sibling call, the other
!      for a normal call.  We will select one of the two chains after
!      initial RTL generation is complete.  */
!   for (pass = 0; pass < 2; pass++)
      {
        int sibcall_failure = 0;
        /* We want to emit any pending stack adjustments before the tail
  	 recursion "call".  That way we know any adjustment after the tail
  	 recursion call can be ignored if we indeed use the tail recursion
--- 2648,2664 ----
  
    function_call_count++;
  
!   /* Save the vector which holds function arguments and
!      corresponding addresses.  */
!   memcpy (args_tmp, args, num_actuals * sizeof (struct arg_data));
! 
!   /* We want to make three insn chains; one for a sibling call, the
!      other for a normal call and the super sib call.  We will select
!      one of three chains after initial RTL generation is complete.  */
!   for (pass = 0; pass < 3; pass++)
      {
        int sibcall_failure = 0;
+       int tailcall_failure = 0;
        /* We want to emit any pending stack adjustments before the tail
  	 recursion "call".  That way we know any adjustment after the tail
  	 recursion call can be ignored if we indeed use the tail recursion
*************** expand_call (exp, target, ignore)
*** 2621,2644 ****
        rtx insns;
        rtx before_call, next_arg_reg;
  
!       if (pass == 0)
  	{
- 	  if (! try_tail_call)
- 	    continue;
- 
  	  /* Emit any queued insns now; otherwise they would end up in
               only one of the alternates.  */
  	  emit_queue ();
  
  	  /* State variables we need to save and restore between
! 	     iterations.  */
  	  save_pending_stack_adjust = pending_stack_adjust;
  	  save_stack_pointer_delta = stack_pointer_delta;
  	}
!       if (pass)
! 	flags &= ~ECF_SIBCALL;
        else
  	flags |= ECF_SIBCALL;
  
        /* Other state variables that we must reinitialize each time
  	 through the loop (that are not initialized by the loop itself).  */
--- 2668,2714 ----
        rtx insns;
        rtx before_call, next_arg_reg;
  
!       if (pass == sib_call_seq
! 	  && (try_tail_call || try_sib_call))
  	{
  	  /* Emit any queued insns now; otherwise they would end up in
               only one of the alternates.  */
  	  emit_queue ();
+ 	}
+       
+       if (pass == sib_call_seq)
+ 	{
+ 	  if (! try_sib_call)
+ 	    continue;
  	  
  	  /* State variables we need to save and restore between
! 	     iterations; only for sibcalls and tail recursion.  */
  	  save_pending_stack_adjust = pending_stack_adjust;
  	  save_stack_pointer_delta = stack_pointer_delta;
  	}
!       
!       if (pass == tail_call_seq)
! 	{
! 	  if (! try_tail_call)
! 	    continue;
  	  else
+ 	    {
+ 	      /* Restore arguments for reevaluation.  This can surely
+ 		 be done in a nicer way, but for the purpose of getting
+ 		 things running, it does the job just fine.  */
+ 	      memcpy (args, args_tmp, num_actuals * sizeof (struct arg_data));
+ 	    }
+ 	}
+ 
+       if (pass == sib_call_seq)
  	flags |= ECF_SIBCALL;
+       else
+ 	flags &= ~ECF_SIBCALL;
+ 
+       if (pass == tail_call_seq)
+ 	flags |= ECF_TAILCALL;
+       else
+ 	flags &= ~ECF_TAILCALL;
  
        /* Other state variables that we must reinitialize each time
  	 through the loop (that are not initialized by the loop itself).  */
*************** expand_call (exp, target, ignore)
*** 2651,2657 ****
  	 sibcall_failure instead of continuing the loop.  */
        start_sequence ();
  
!       if (pass == 0)
  	{
  	  /* We know at this point that there are not currently any
  	     pending cleanups.  If, however, in the process of evaluating
--- 2721,2727 ----
  	 sibcall_failure instead of continuing the loop.  */
        start_sequence ();
  
!       if (pass == sib_call_seq)
  	{
  	  /* We know at this point that there are not currently any
  	     pending cleanups.  If, however, in the process of evaluating
*************** expand_call (exp, target, ignore)
*** 2668,2679 ****
        if (pending_stack_adjust >= 32
  	  || (pending_stack_adjust > 0
  	      && (flags & (ECF_MAY_BE_ALLOCA | ECF_SP_DEPRESSED)))
! 	  || pass == 0)
  	do_pending_stack_adjust ();
  
        /* When calling a const function, we must pop the stack args right away,
  	 so that the pop is deleted or moved with the call.  */
!       if (pass && (flags & ECF_LIBCALL_BLOCK))
  	NO_DEFER_POP;
  
  #ifdef FINAL_REG_PARM_STACK_SPACE
--- 2738,2750 ----
        if (pending_stack_adjust >= 32
  	  || (pending_stack_adjust > 0
  	      && (flags & (ECF_MAY_BE_ALLOCA | ECF_SP_DEPRESSED)))
! 	  || pass == sib_call_seq)
  	do_pending_stack_adjust ();
  
        /* When calling a const function, we must pop the stack args right away,
  	 so that the pop is deleted or moved with the call.  */
!       if ((pass == normal_call_seq || pass == tail_call_seq)
! 	  && (flags & ECF_LIBCALL_BLOCK))
  	NO_DEFER_POP;
  
  #ifdef FINAL_REG_PARM_STACK_SPACE
*************** expand_call (exp, target, ignore)
*** 2681,2692 ****
  							 args_size.var);
  #endif
        /* Precompute any arguments as needed.  */
!       if (pass)
  	precompute_arguments (flags, num_actuals, args);
  
        /* Now we are about to start emitting insns that can be deleted
  	 if a libcall is deleted.  */
!       if (pass && (flags & (ECF_LIBCALL_BLOCK | ECF_MALLOC)))
  	start_sequence ();
  
        adjusted_args_size = args_size;
--- 2752,2764 ----
  							 args_size.var);
  #endif
        /* Precompute any arguments as needed.  */
!       if (pass == normal_call_seq || pass == tail_call_seq)
  	precompute_arguments (flags, num_actuals, args);
  
        /* Now we are about to start emitting insns that can be deleted
  	 if a libcall is deleted.  */
!       if ((pass  == normal_call_seq || pass == tail_call_seq) 
! 	  && (flags & (ECF_LIBCALL_BLOCK | ECF_MALLOC)))
  	start_sequence ();
  
        adjusted_args_size = args_size;
*************** expand_call (exp, target, ignore)
*** 2695,2711 ****
  	 and there may be a minimum required size.  When generating a sibcall
  	 pattern, do not round up, since we'll be re-using whatever space our
  	 caller provided.  */
        unadjusted_args_size
  	= compute_argument_block_size (reg_parm_stack_space,
  				       &adjusted_args_size,
! 				       (pass == 0 ? 0
  					: preferred_stack_boundary));
  
        old_stack_allocated = stack_pointer_delta - pending_stack_adjust;
  
        /* The argument block when performing a sibling call is the
           incoming argument block.  */
!       if (pass == 0)
  	{
  	  argblock = virtual_incoming_args_rtx;
  	  stored_args_map = sbitmap_alloc (args_size.constant);
--- 2767,2789 ----
  	 and there may be a minimum required size.  When generating a sibcall
  	 pattern, do not round up, since we'll be re-using whatever space our
  	 caller provided.  */
+       /* ??? Should this also apply to general tail calls.  */
        unadjusted_args_size
  	= compute_argument_block_size (reg_parm_stack_space,
  				       &adjusted_args_size,
! 				       (pass == sib_call_seq ? 0
  					: preferred_stack_boundary));
  
        old_stack_allocated = stack_pointer_delta - pending_stack_adjust;
  
        /* The argument block when performing a sibling call is the
           incoming argument block.  */
!       
!       /* The same thing holds for general tail calls, but we'll
! 	 shift the arguments right before we jump to the callee,
! 	 not earlier.  */
!      
!       if (pass == sib_call_seq)
  	{
  	  argblock = virtual_incoming_args_rtx;
  	  stored_args_map = sbitmap_alloc (args_size.constant);
*************** expand_call (exp, target, ignore)
*** 2929,2937 ****
  	{
  	  if (pcc_struct_value)
  	    valreg = hard_function_value (build_pointer_type (TREE_TYPE (exp)),
! 					  fndecl, (pass == 0));
  	  else
! 	    valreg = hard_function_value (TREE_TYPE (exp), fndecl, (pass == 0));
  	}
  
        /* Precompute all register parameters.  It isn't safe to compute anything
--- 3007,3016 ----
  	{
  	  if (pcc_struct_value)
  	    valreg = hard_function_value (build_pointer_type (TREE_TYPE (exp)),
! 					  fndecl, (pass == sib_call_seq));
  	  else
! 	    valreg = hard_function_value (TREE_TYPE (exp), fndecl, 
! 					  (pass == sib_call_seq));
  	}
  
        /* Precompute all register parameters.  It isn't safe to compute anything
*************** expand_call (exp, target, ignore)
*** 2941,2947 ****
  #ifdef REG_PARM_STACK_SPACE
        /* Save the fixed argument area if it's part of the caller's frame and
  	 is clobbered by argument setup for this call.  */
!       if (ACCUMULATE_OUTGOING_ARGS && pass)
  	save_area = save_fixed_argument_area (reg_parm_stack_space, argblock,
  					      &low_to_save, &high_to_save);
  #endif
--- 3020,3027 ----
  #ifdef REG_PARM_STACK_SPACE
        /* Save the fixed argument area if it's part of the caller's frame and
  	 is clobbered by argument setup for this call.  */
!       if (ACCUMULATE_OUTGOING_ARGS
! 	  && (pass == normal_call_seq || pass == tail_call_seq))
  	save_area = save_fixed_argument_area (reg_parm_stack_space, argblock,
  					      &low_to_save, &high_to_save);
  #endif
*************** expand_call (exp, target, ignore)
*** 2960,2970 ****
  	    if (store_one_arg (&args[i], argblock, flags,
  			       adjusted_args_size.var != 0,
  			       reg_parm_stack_space)
! 		|| (pass == 0
  		    && check_sibcall_argument_overlap (before_arg,
  						       &args[i])))
  	      sibcall_failure = 1;
  	  }
  
        /* If we have a parm that is passed in registers but not in memory
  	 and whose alignment does not permit a direct copy into registers,
--- 3040,3054 ----
  	    if (store_one_arg (&args[i], argblock, flags,
  			       adjusted_args_size.var != 0,
  			       reg_parm_stack_space)
! 		|| (pass == sib_call_seq
  		    && check_sibcall_argument_overlap (before_arg,
  						       &args[i])))
+ 	      {
+ 		/* Debugging stuff; remove, if you please. */
+ 		warning ("sib call arguments overlap");
  		sibcall_failure = 1;
  	      }
+ 	  }
  
        /* If we have a parm that is passed in registers but not in memory
  	 and whose alignment does not permit a direct copy into registers,
*************** expand_call (exp, target, ignore)
*** 2984,2994 ****
  	      if (store_one_arg (&args[i], argblock, flags,
  				 adjusted_args_size.var != 0,
  				 reg_parm_stack_space)
! 		  || (pass == 0
  		      && check_sibcall_argument_overlap (before_arg,
  							 &args[i])))
  		sibcall_failure = 1;
  	    }
  
        /* If we pushed args in forward order, perform stack alignment
  	 after pushing the last arg.  */
--- 3068,3082 ----
  	      if (store_one_arg (&args[i], argblock, flags,
  				 adjusted_args_size.var != 0,
  				 reg_parm_stack_space)
! 		  || (pass == sib_call_seq
  		      && check_sibcall_argument_overlap (before_arg,
  							 &args[i])))
+ 		{
+ 		  /* Debugging stuff; remove, if you please. */
+ 		  warning ("sib call arguments overlap");
  		  sibcall_failure = 1;
  		}
+ 	    }
  
        /* If we pushed args in forward order, perform stack alignment
  	 after pushing the last arg.  */
*************** expand_call (exp, target, ignore)
*** 3007,3013 ****
  
        /* Pass the function the address in which to return a
  	 structure value.  */
!       if (pass != 0 && structure_value_addr && ! structure_value_addr_parm)
  	{
  	  emit_move_insn (struct_value_rtx,
  			  force_reg (Pmode,
--- 3095,3102 ----
  
        /* Pass the function the address in which to return a
  	 structure value.  */
!       if ((pass == normal_call_seq || pass == tail_call_seq)
! 	  && structure_value_addr && ! structure_value_addr_parm)
  	{
  	  emit_move_insn (struct_value_rtx,
  			  force_reg (Pmode,
*************** expand_call (exp, target, ignore)
*** 3019,3025 ****
  	}
  
        funexp = prepare_call_address (funexp, fndecl, &call_fusage,
! 				     reg_parm_seen, pass == 0);
  
        load_register_parameters (args, num_actuals, &call_fusage, flags);
  
--- 3108,3114 ----
  	}
  
        funexp = prepare_call_address (funexp, fndecl, &call_fusage,
! 				     reg_parm_seen, pass == sib_call_seq);
  
        load_register_parameters (args, num_actuals, &call_fusage, flags);
  
*************** expand_call (exp, target, ignore)
*** 3033,3039 ****
        /* Set up next argument register.  For sibling calls on machines
  	 with register windows this should be the incoming register.  */
  #ifdef FUNCTION_INCOMING_ARG
!       if (pass == 0)
  	next_arg_reg = FUNCTION_INCOMING_ARG (args_so_far, VOIDmode,
  					      void_type_node, 1);
        else
--- 3122,3128 ----
        /* Set up next argument register.  For sibling calls on machines
  	 with register windows this should be the incoming register.  */
  #ifdef FUNCTION_INCOMING_ARG
!       if (pass == sib_call_seq)
  	next_arg_reg = FUNCTION_INCOMING_ARG (args_so_far, VOIDmode,
  					      void_type_node, 1);
        else
*************** expand_call (exp, target, ignore)
*** 3045,3053 ****
  	 now!  */
  
        /* Stack must be properly aligned now.  */
!       if (pass && stack_pointer_delta % preferred_unit_stack_boundary)
  	abort ();
  
        /* Generate the actual call instruction.  */
        emit_call_1 (funexp, fndecl, funtype, unadjusted_args_size,
  		   adjusted_args_size.constant, struct_value_size,
--- 3134,3169 ----
  	 now!  */
  
        /* Stack must be properly aligned now.  */
!       if ((pass == normal_call_seq || pass == tail_call_seq) 
! 	  && stack_pointer_delta % preferred_unit_stack_boundary)
  	abort ();
  
+       /* Once all the arguments are in their respective place, we're
+ 	 ready to shift them up into the incoming argument space.  */
+       if (pass == tail_call_seq)
+ 	{
+ 	  for (i = 0; i < num_actuals; i++)
+ 	    {
+ 	      /* Determine whether to pass arg on stack or in register
+ 		 and the according address to access.  */
+ 	      rtx src = (args[i].reg != 0) ? args[i].reg : args[i].stack;
+ 
+ 	      /* This code only handles variables that are being passed
+ 		 via the stack.  That is, extra precautions are required
+ 		 to pass other types.  */
+ 	      
+ 	      /* Determine the destination for each argument.  */
+ 	      rtx slot_offset = ARGS_SIZE_RTX (args[i].slot_offset);
+ 	      rtx addr = plus_constant (virtual_incoming_args_rtx,
+ 					INTVAL(slot_offset));
+ 	      rtx dest = gen_rtx_MEM (args[i].mode, addr);
+ 
+ 	      /* A block move might be more appropriate, when the args
+ 		 are actual structs; not C basic types.  */
+ 	      emit_move_insn (dest, src);
+ 	    }
+ 	}
+ 
        /* Generate the actual call instruction.  */
        emit_call_1 (funexp, fndecl, funtype, unadjusted_args_size,
  		   adjusted_args_size.constant, struct_value_size,
*************** expand_call (exp, target, ignore)
*** 3055,3061 ****
  		   flags, & args_so_far);
  
        /* Verify that we've deallocated all the stack we used.  */
!       if (pass
  	  && old_stack_allocated != stack_pointer_delta - pending_stack_adjust)
  	abort ();
  
--- 3171,3177 ----
  		   flags, & args_so_far);
  
        /* Verify that we've deallocated all the stack we used.  */
!       if ((pass == normal_call_seq || pass == tail_call_seq)
  	  && old_stack_allocated != stack_pointer_delta - pending_stack_adjust)
  	abort ();
  
*************** expand_call (exp, target, ignore)
*** 3063,3069 ****
  	 Test valreg so we don't crash; may safely ignore `const'
  	 if return type is void.  Disable for PARALLEL return values, because
  	 we have no way to move such values into a pseudo register.  */
!       if (pass && (flags & ECF_LIBCALL_BLOCK))
  	{
  	  rtx insns;
  
--- 3179,3186 ----
  	 Test valreg so we don't crash; may safely ignore `const'
  	 if return type is void.  Disable for PARALLEL return values, because
  	 we have no way to move such values into a pseudo register.  */
!       if ((pass == normal_call_seq || pass == tail_call_seq)
! 	  && (flags & ECF_LIBCALL_BLOCK))
  	{
  	  rtx insns;
  
*************** expand_call (exp, target, ignore)
*** 3105,3111 ****
  	      valreg = temp;
  	    }
  	}
!       else if (pass && (flags & ECF_MALLOC))
  	{
  	  rtx temp = gen_reg_rtx (GET_MODE (valreg));
  	  rtx last, insns;
--- 3222,3229 ----
  	      valreg = temp;
  	    }
  	}
!       else if ((pass == normal_call_seq || pass == tail_call_seq)
! 	       && (flags & ECF_MALLOC))
  	{
  	  rtx temp = gen_reg_rtx (GET_MODE (valreg));
  	  rtx last, insns;
*************** expand_call (exp, target, ignore)
*** 3132,3139 ****
        /* For calls to `setjmp', etc., inform flow.c it should complain
  	 if nonvolatile values are live.  For functions that cannot return,
  	 inform flow that control does not fall through.  */
! 
!       if ((flags & (ECF_NORETURN | ECF_LONGJMP)) || pass == 0)
  	{
  	  /* The barrier must be emitted
  	     immediately after the CALL_INSN.  Some ports emit more
--- 3250,3258 ----
        /* For calls to `setjmp', etc., inform flow.c it should complain
  	 if nonvolatile values are live.  For functions that cannot return,
  	 inform flow that control does not fall through.  */
!       if ((flags & (ECF_NORETURN | ECF_LONGJMP)) 
! 	  || pass == sib_call_seq
! 	  || pass == tail_call_seq)
  	{
  	  /* The barrier must be emitted
  	     immediately after the CALL_INSN.  Some ports emit more
*************** expand_call (exp, target, ignore)
*** 3286,3292 ****
  	  stack_usage_map = initial_stack_usage_map;
  	  sibcall_failure = 1;
  	}
!       else if (ACCUMULATE_OUTGOING_ARGS && pass)
  	{
  #ifdef REG_PARM_STACK_SPACE
  	  if (save_area)
--- 3405,3412 ----
  	  stack_usage_map = initial_stack_usage_map;
  	  sibcall_failure = 1;
  	}
!       else if (ACCUMULATE_OUTGOING_ARGS 
! 	       && (pass == normal_call_seq || pass == tail_call_seq))
  	{
  #ifdef REG_PARM_STACK_SPACE
  	  if (save_area)
*************** expand_call (exp, target, ignore)
*** 3330,3336 ****
  	if (args[i].aligned_regs)
  	  free (args[i].aligned_regs);
  
!       if (pass == 0)
  	{
  	  /* Undo the fake expand_start_target_temps we did earlier.  If
  	     there had been any cleanups created, we've already set
--- 3450,3456 ----
  	if (args[i].aligned_regs)
  	  free (args[i].aligned_regs);
  
!       if (pass == sib_call_seq)
  	{
  	  /* Undo the fake expand_start_target_temps we did earlier.  If
  	     there had been any cleanups created, we've already set
*************** expand_call (exp, target, ignore)
*** 3341,3349 ****
        insns = get_insns ();
        end_sequence ();
  
!       if (pass == 0)
  	{
! 	  tail_call_insns = insns;
  
  	  /* Restore the pending stack adjustment now that we have
  	     finished generating the sibling call sequence.  */
--- 3461,3469 ----
        insns = get_insns ();
        end_sequence ();
  
!       if (pass == sib_call_seq)
  	{
! 	  sib_call_insns = insns;
  
  	  /* Restore the pending stack adjustment now that we have
  	     finished generating the sibling call sequence.  */
*************** expand_call (exp, target, ignore)
*** 3361,3372 ****
  
  	  sbitmap_free (stored_args_map);
  	}
!       else
  	normal_call_insns = insns;
  
        /* If something prevents making this a sibling call,
  	 zero out the sequence.  */
        if (sibcall_failure)
  	tail_call_insns = NULL_RTX;
      }
  
--- 3481,3499 ----
  
  	  sbitmap_free (stored_args_map);
  	}
!       else if (pass == normal_call_seq)
  	normal_call_insns = insns;
+       else if (pass == tail_call_seq)
+ 	tail_call_insns = insns;
+       else
+ 	abort ();
  
        /* If something prevents making this a sibling call,
  	 zero out the sequence.  */
        if (sibcall_failure)
+ 	sib_call_insns = NULL_RTX;
+ 
+       if (tailcall_failure)
  	tail_call_insns = NULL_RTX;
      }
  
*************** expand_call (exp, target, ignore)
*** 3393,3402 ****
  	&& GET_CODE (PATTERN (insn)) == CALL_PLACEHOLDER)
        replace_call_placeholder (insn, sibcall_use_normal);
  
!   /* If this was a potential tail recursion site, then emit a
!      CALL_PLACEHOLDER with the normal and the tail recursion streams.
!      One of them will be selected later.  */
!   if (tail_recursion_insns || tail_call_insns)
      {
        /* The tail recursion label must be kept around.  We could expose
  	 its use in the CALL_PLACEHOLDER, but that creates unwanted edges
--- 3520,3534 ----
  	&& GET_CODE (PATTERN (insn)) == CALL_PLACEHOLDER)
        replace_call_placeholder (insn, sibcall_use_normal);
  
!   for (insn = sib_call_insns; insn; insn = NEXT_INSN (insn))
!     if (GET_CODE (insn) == CALL_INSN
! 	&& GET_CODE (PATTERN (insn)) == CALL_PLACEHOLDER)
!       replace_call_placeholder (insn, sibcall_use_normal);
! 
!   /* If this was a potential tail call/recursion site, then emit a
!      CALL_PLACEHOLDER with the different streams.  One of them will
!      be selected later.  */
!   if (sib_call_insns || tail_recursion_insns || tail_call_insns)
      {
        /* The tail recursion label must be kept around.  We could expose
  	 its use in the CALL_PLACEHOLDER, but that creates unwanted edges
*************** expand_call (exp, target, ignore)
*** 3407,3414 ****
        if (tail_recursion_insns)
  	LABEL_PRESERVE_P (tail_recursion_label) = 1;
        emit_call_insn (gen_rtx_CALL_PLACEHOLDER (VOIDmode, normal_call_insns,
! 						tail_call_insns,
  						tail_recursion_insns,
  						tail_recursion_label));
      }
    else
--- 3539,3547 ----
        if (tail_recursion_insns)
  	LABEL_PRESERVE_P (tail_recursion_label) = 1;
        emit_call_insn (gen_rtx_CALL_PLACEHOLDER (VOIDmode, normal_call_insns,
! 						sib_call_insns,
  						tail_recursion_insns,
+ 						tail_call_insns,
  						tail_recursion_label));
      }
    else
diff -b -w -B -d -p -c gcc-3.2/gcc/cfgbuild.c gcc-3.2-cvs/gcc/cfgbuild.c
*** gcc-3.2/gcc/cfgbuild.c	Sun Dec 23 02:51:07 2001
--- gcc-3.2-cvs/gcc/cfgbuild.c	Mon Sep  2 13:53:03 2002
*************** find_basic_blocks_1 (f)
*** 544,552 ****
  	      lvl = find_label_refs (XEXP (PATTERN (insn), 0), lvl);
  	      lvl = find_label_refs (XEXP (PATTERN (insn), 1), lvl);
  	      lvl = find_label_refs (XEXP (PATTERN (insn), 2), lvl);
  	      /* Record its tail recursion label, if any.  */
  	      if (XEXP (PATTERN (insn), 3) != NULL_RTX)
! 		trll = alloc_EXPR_LIST (0, XEXP (PATTERN (insn), 3), trll);
  	    }
  	  break;
  
--- 544,553 ----
  	      lvl = find_label_refs (XEXP (PATTERN (insn), 0), lvl);
  	      lvl = find_label_refs (XEXP (PATTERN (insn), 1), lvl);
  	      lvl = find_label_refs (XEXP (PATTERN (insn), 2), lvl);
+ 	      lvl = find_label_refs (XEXP (PATTERN (insn), 3), lvl);
  	      /* Record its tail recursion label, if any.  */
  	      if (XEXP (PATTERN (insn), 3) != NULL_RTX)
! 		trll = alloc_EXPR_LIST (0, XEXP (PATTERN (insn), 4), trll);
  	    }
  	  break;
  
Common subdirectories: gcc-3.2/gcc/config and gcc-3.2-cvs/gcc/config
Common subdirectories: gcc-3.2/gcc/cp and gcc-3.2-cvs/gcc/cp
Common subdirectories: gcc-3.2/gcc/doc and gcc-3.2-cvs/gcc/doc
diff -b -w -B -d -p -c gcc-3.2/gcc/except.c gcc-3.2-cvs/gcc/except.c
*** gcc-3.2/gcc/except.c	Wed Apr 17 11:48:17 2002
--- gcc-3.2-cvs/gcc/except.c	Mon Sep  2 13:53:08 2002
*************** convert_from_eh_region_ranges_1 (pinsns,
*** 1396,1401 ****
--- 1396,1403 ----
  					       sp, cur);
  	      convert_from_eh_region_ranges_1 (&XEXP (PATTERN (insn), 2),
  					       sp, cur);
+ 	      convert_from_eh_region_ranges_1 (&XEXP (PATTERN (insn), 3),
+ 					       sp, cur);
  	    }
  	}
      }
*************** can_throw_internal (insn)
*** 3082,3088 ****
        && GET_CODE (PATTERN (insn)) == CALL_PLACEHOLDER)
      {
        int i;
!       for (i = 0; i < 3; ++i)
  	{
  	  rtx sub = XEXP (PATTERN (insn), i);
  	  for (; sub ; sub = NEXT_INSN (sub))
--- 3084,3090 ----
        && GET_CODE (PATTERN (insn)) == CALL_PLACEHOLDER)
      {
        int i;
!       for (i = 0; i < 4; ++i)
  	{
  	  rtx sub = XEXP (PATTERN (insn), i);
  	  for (; sub ; sub = NEXT_INSN (sub))
*************** can_throw_external (insn)
*** 3143,3149 ****
        && GET_CODE (PATTERN (insn)) == CALL_PLACEHOLDER)
      {
        int i;
!       for (i = 0; i < 3; ++i)
  	{
  	  rtx sub = XEXP (PATTERN (insn), i);
  	  for (; sub ; sub = NEXT_INSN (sub))
--- 3145,3151 ----
        && GET_CODE (PATTERN (insn)) == CALL_PLACEHOLDER)
      {
        int i;
!       for (i = 0; i < 4; ++i)
  	{
  	  rtx sub = XEXP (PATTERN (insn), i);
  	  for (; sub ; sub = NEXT_INSN (sub))
Common subdirectories: gcc-3.2/gcc/f and gcc-3.2-cvs/gcc/f
diff -b -w -B -d -p -c gcc-3.2/gcc/final.c gcc-3.2-cvs/gcc/final.c
*** gcc-3.2/gcc/final.c	Sat May 25 07:26:50 2002
--- gcc-3.2-cvs/gcc/final.c	Sun Sep  1 18:26:20 2002
*************** leaf_function_p ()
*** 3838,3849 ****
    for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
      {
        if (GET_CODE (insn) == CALL_INSN
! 	  && ! SIBLING_CALL_P (insn))
  	return 0;
        if (GET_CODE (insn) == INSN
  	  && GET_CODE (PATTERN (insn)) == SEQUENCE
  	  && GET_CODE (XVECEXP (PATTERN (insn), 0, 0)) == CALL_INSN
! 	  && ! SIBLING_CALL_P (XVECEXP (PATTERN (insn), 0, 0)))
  	return 0;
      }
    for (link = current_function_epilogue_delay_list;
--- 3838,3850 ----
    for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
      {
        if (GET_CODE (insn) == CALL_INSN
! 	  && (! SIBLING_CALL_P (insn) || TAIL_CALL_P (insn)))
  	return 0;
        if (GET_CODE (insn) == INSN
  	  && GET_CODE (PATTERN (insn)) == SEQUENCE
  	  && GET_CODE (XVECEXP (PATTERN (insn), 0, 0)) == CALL_INSN
! 	  && (! SIBLING_CALL_P (XVECEXP (PATTERN (insn), 0, 0)) 
! 	      || TAIL_CALL_P (XVECEXP (PATTERN (insn), 0, 0))))
  	return 0;
      }
    for (link = current_function_epilogue_delay_list;
*************** leaf_function_p ()
*** 3858,3864 ****
        if (GET_CODE (insn) == INSN
  	  && GET_CODE (PATTERN (insn)) == SEQUENCE
  	  && GET_CODE (XVECEXP (PATTERN (insn), 0, 0)) == CALL_INSN
! 	  && ! SIBLING_CALL_P (XVECEXP (PATTERN (insn), 0, 0)))
  	return 0;
      }
  
--- 3859,3866 ----
        if (GET_CODE (insn) == INSN
  	  && GET_CODE (PATTERN (insn)) == SEQUENCE
  	  && GET_CODE (XVECEXP (PATTERN (insn), 0, 0)) == CALL_INSN
! 	  && (! SIBLING_CALL_P (XVECEXP (PATTERN (insn), 0, 0))
! 	      || TAIL_CALL_P (XVECEXP (PATTERN (insn), 0, 0))))
  	return 0;
      }
  
Common subdirectories: gcc-3.2/gcc/fixinc and gcc-3.2-cvs/gcc/fixinc
diff -b -w -B -d -p -c gcc-3.2/gcc/flags.h gcc-3.2-cvs/gcc/flags.h
*** gcc-3.2/gcc/flags.h	Fri Mar 22 10:12:21 2002
--- gcc-3.2-cvs/gcc/flags.h	Sun Sep  1 18:26:21 2002
*************** extern int flag_volatile_static;
*** 333,338 ****
--- 333,342 ----
  
  extern int flag_optimize_sibling_calls;
  
+ /* Nonzero allows GCC to optimize general tail calls.  */
+ 
+ extern int flag_optimize_tail_calls;
+ 
  /* Nonzero means the front end generally wants `errno' maintained by math
     operations, like built-in SQRT.  */
  
diff -b -w -B -d -p -c gcc-3.2/gcc/function.c gcc-3.2-cvs/gcc/function.c
*** gcc-3.2/gcc/function.c	Sun Jun 23 00:29:26 2002
--- gcc-3.2-cvs/gcc/function.c	Mon Sep  2 13:53:09 2002
*************** fixup_var_refs_insns (insn, var, promote
*** 1673,1679 ****
        rtx next = NEXT_INSN (insn);
  
        /* CALL_PLACEHOLDERs are special; we have to switch into each of
! 	 the three sequences they (potentially) contain, and process
  	 them recursively.  The CALL_INSN itself is not interesting.  */
  
        if (GET_CODE (insn) == CALL_INSN
--- 1673,1679 ----
        rtx next = NEXT_INSN (insn);
  
        /* CALL_PLACEHOLDERs are special; we have to switch into each of
! 	 the four sequences they (potentially) contain, and process
  	 them recursively.  The CALL_INSN itself is not interesting.  */
  
        if (GET_CODE (insn) == CALL_INSN
*************** fixup_var_refs_insns (insn, var, promote
*** 1683,1689 ****
  
  	  /* Look at the Normal call, sibling call and tail recursion
  	     sequences attached to the CALL_PLACEHOLDER.  */
! 	  for (i = 0; i < 3; i++)
  	    {
  	      rtx seq = XEXP (PATTERN (insn), i);
  	      if (seq)
--- 1683,1689 ----
  
  	  /* Look at the Normal call, sibling call and tail recursion
  	     sequences attached to the CALL_PLACEHOLDER.  */
! 	  for (i = 0; i < 4; i++)
  	    {
  	      rtx seq = XEXP (PATTERN (insn), i);
  	      if (seq)
*************** identify_blocks_1 (insns, block_vector, 
*** 5902,5907 ****
--- 5902,5910 ----
  	  if (XEXP (cp, 2))
  	    block_vector = identify_blocks_1 (XEXP (cp, 2), block_vector,
  					      end_block_vector, block_stack);
+ 	  if (XEXP (cp, 3))
+ 	    block_vector = identify_blocks_1 (XEXP (cp, 3), block_vector,
+ 					      end_block_vector, block_stack);
  	}
      }
  
*************** reorder_blocks_1 (insns, current_block, 
*** 6022,6027 ****
--- 6025,6032 ----
  	    reorder_blocks_1 (XEXP (cp, 1), current_block, p_block_stack);
  	  if (XEXP (cp, 2))
  	    reorder_blocks_1 (XEXP (cp, 2), current_block, p_block_stack);
+ 	  if (XEXP (cp, 3))
+ 	    reorder_blocks_1 (XEXP (cp, 3), current_block, p_block_stack);
  	}
      }
  }
Common subdirectories: gcc-3.2/gcc/ginclude and gcc-3.2-cvs/gcc/ginclude
diff -b -w -B -d -p -c gcc-3.2/gcc/integrate.c gcc-3.2-cvs/gcc/integrate.c
*** gcc-3.2/gcc/integrate.c	Sat Apr 13 05:17:54 2002
--- gcc-3.2-cvs/gcc/integrate.c	Mon Sep  2 13:53:11 2002
*************** save_parm_insns (insn, first_nonparm_ins
*** 523,536 ****
  	  note_stores (PATTERN (insn), note_modified_parmregs, NULL);
  
  	  /* If this is a CALL_PLACEHOLDER insn then we need to look into the
! 	     three attached sequences: normal call, sibling call and tail
! 	     recursion.  */
  	  if (GET_CODE (insn) == CALL_INSN
  	      && GET_CODE (PATTERN (insn)) == CALL_PLACEHOLDER)
  	    {
  	      int i;
  
! 	      for (i = 0; i < 3; i++)
  		save_parm_insns (XEXP (PATTERN (insn), i),
  				 first_nonparm_insn);
  	    }
--- 523,536 ----
  	  note_stores (PATTERN (insn), note_modified_parmregs, NULL);
  
  	  /* If this is a CALL_PLACEHOLDER insn then we need to look into the
! 	     four attached sequences: normal call, sibling call, tail
! 	     recursion and tail call.  */
  	  if (GET_CODE (insn) == CALL_INSN
  	      && GET_CODE (PATTERN (insn)) == CALL_PLACEHOLDER)
  	    {
  	      int i;
  
! 	      for (i = 0; i < 4; i++)
  		save_parm_insns (XEXP (PATTERN (insn), i),
  				 first_nonparm_insn);
  	    }
*************** copy_insn_list (insns, map, static_chain
*** 1559,1572 ****
  
  	case CALL_INSN:
  	  /* If this is a CALL_PLACEHOLDER insn then we need to copy the
! 	     three attached sequences: normal call, sibling call and tail
! 	     recursion.  */
  	  if (GET_CODE (PATTERN (insn)) == CALL_PLACEHOLDER)
  	    {
! 	      rtx sequence[3];
  	      rtx tail_label;
  
! 	      for (i = 0; i < 3; i++)
  		{
  		  rtx seq;
  
--- 1559,1572 ----
  
  	case CALL_INSN:
  	  /* If this is a CALL_PLACEHOLDER insn then we need to copy the
! 	     four attached sequences: normal call, sibling call, tail
! 	     recursion and tail call.  */
  	  if (GET_CODE (PATTERN (insn)) == CALL_PLACEHOLDER)
  	    {
! 	      rtx sequence[4];
  	      rtx tail_label;
  
! 	      for (i = 0; i < 4; i++)
  		{
  		  rtx seq;
  
*************** copy_insn_list (insns, map, static_chain
*** 1583,1595 ****
  
  	      /* Find the new tail recursion label.
  	         It will already be substituted into sequence[2].  */
! 	      tail_label = copy_rtx_and_substitute (XEXP (PATTERN (insn), 3),
  						    map, 0);
  
  	      copy = emit_call_insn (gen_rtx_CALL_PLACEHOLDER (VOIDmode,
  							       sequence[0],
  							       sequence[1],
  							       sequence[2],
  							       tail_label));
  	      break;
  	    }
--- 1583,1600 ----
  
  	      /* Find the new tail recursion label.
  	         It will already be substituted into sequence[2].  */
! 	      tail_label = copy_rtx_and_substitute (XEXP (PATTERN (insn), 4),
  						    map, 0);
  
+ 	      /* The last call sequence contains the general tail call
+ 		 optimisation.  */
+ 	      /* FIXME: squeeze sequence[3] between sequence[2] and
+ 		 tail_label.  */
  	      copy = emit_call_insn (gen_rtx_CALL_PLACEHOLDER (VOIDmode,
  							       sequence[0],
  							       sequence[1],
  							       sequence[2],
+ 							       sequence[3],
  							       tail_label));
  	      break;
  	    }
*************** copy_insn_list (insns, map, static_chain
*** 1598,1603 ****
--- 1603,1609 ----
  	  copy = emit_call_insn (pattern);
  
  	  SIBLING_CALL_P (copy) = SIBLING_CALL_P (insn);
+ 	  TAIL_CALL_P (copy) = TAIL_CALL_P (insn);
  	  CONST_OR_PURE_CALL_P (copy) = CONST_OR_PURE_CALL_P (insn);
  
  	  /* Because the USAGE information potentially contains objects other
*************** copy_insn_notes (insns, map, eh_region_o
*** 1742,1748 ****
  	  && GET_CODE (PATTERN (insn)) == CALL_PLACEHOLDER)
  	{
  	  int i;
! 	  for (i = 0; i < 3; i++)
  	    copy_insn_notes (XEXP (PATTERN (insn), i), map, eh_region_offset);
  	}
  
--- 1748,1754 ----
  	  && GET_CODE (PATTERN (insn)) == CALL_PLACEHOLDER)
  	{
  	  int i;
! 	  for (i = 0; i < 4; i++)
  	    copy_insn_notes (XEXP (PATTERN (insn), i), map, eh_region_offset);
  	}
  
Common subdirectories: gcc-3.2/gcc/intl and gcc-3.2-cvs/gcc/intl
Common subdirectories: gcc-3.2/gcc/java and gcc-3.2-cvs/gcc/java
diff -b -w -B -d -p -c gcc-3.2/gcc/jump.c gcc-3.2-cvs/gcc/jump.c
*** gcc-3.2/gcc/jump.c	Wed Apr 10 06:38:58 2002
--- gcc-3.2-cvs/gcc/jump.c	Mon Sep  2 13:53:11 2002
*************** mark_all_labels (f)
*** 236,250 ****
  	    mark_all_labels (XEXP (PATTERN (insn), 0));
  	    mark_all_labels (XEXP (PATTERN (insn), 1));
  	    mark_all_labels (XEXP (PATTERN (insn), 2));
  
  	    /* Canonicalize the tail recursion label attached to the
  	       CALL_PLACEHOLDER insn.  */
! 	    if (XEXP (PATTERN (insn), 3))
  	      {
  		rtx label_ref = gen_rtx_LABEL_REF (VOIDmode,
  						   XEXP (PATTERN (insn), 3));
  		mark_jump_label (label_ref, insn, 0);
! 		XEXP (PATTERN (insn), 3) = XEXP (label_ref, 0);
  	      }
  
  	    continue;
--- 236,251 ----
  	    mark_all_labels (XEXP (PATTERN (insn), 0));
  	    mark_all_labels (XEXP (PATTERN (insn), 1));
  	    mark_all_labels (XEXP (PATTERN (insn), 2));
+ 	    mark_all_labels (XEXP (PATTERN (insn), 3));
  
  	    /* Canonicalize the tail recursion label attached to the
  	       CALL_PLACEHOLDER insn.  */
! 	    if (XEXP (PATTERN (insn), 4))
  	      {
  		rtx label_ref = gen_rtx_LABEL_REF (VOIDmode,
  						   XEXP (PATTERN (insn), 3));
  		mark_jump_label (label_ref, insn, 0);
! 		XEXP (PATTERN (insn), 4) = XEXP (label_ref, 0);
  	      }
  
  	    continue;
diff -b -w -B -d -p -c gcc-3.2/gcc/mkinstalldirs gcc-3.2-cvs/gcc/mkinstalldirs
*** gcc-3.2/gcc/mkinstalldirs	Sun Sep  5 01:08:55 1999
--- gcc-3.2-cvs/gcc/mkinstalldirs	Thu Aug 22 16:25:59 2002
***************
*** 4,10 ****
  # Created: 1993-05-16
  # Public domain
  
! # $Id: mkinstalldirs,v 1.2 1999/09/04 15:08:55 law Exp $
  
  errstatus=0
  
--- 4,10 ----
  # Created: 1993-05-16
  # Public domain
  
! # $Id: mkinstalldirs,v 1.1.1.1 2002/08/22 06:25:59 baueran Exp $
  
  errstatus=0
  
Common subdirectories: gcc-3.2/gcc/objc and gcc-3.2-cvs/gcc/objc
Common subdirectories: gcc-3.2/gcc/po and gcc-3.2-cvs/gcc/po
diff -b -w -B -d -p -c gcc-3.2/gcc/rtl.def gcc-3.2-cvs/gcc/rtl.def
*** gcc-3.2/gcc/rtl.def	Tue Feb 19 13:53:26 2002
--- gcc-3.2-cvs/gcc/rtl.def	Fri Aug 23 15:11:46 2002
*************** DEF_RTL_EXPR(CONSTANT_P_RTX, "constant_p
*** 955,961 ****
  
     This method of tail-call elimination is intended to be replaced by
     tree-based optimizations once front-end conversions are complete.  */
! DEF_RTL_EXPR(CALL_PLACEHOLDER, "call_placeholder", "uuuu", 'x')
  
  /* Describes a merge operation between two vector values.
     Operands 0 and 1 are the vectors to be merged, operand 2 is a bitmask
--- 955,961 ----
  
     This method of tail-call elimination is intended to be replaced by
     tree-based optimizations once front-end conversions are complete.  */
! DEF_RTL_EXPR(CALL_PLACEHOLDER, "call_placeholder", "uuuuu", 'x')
  
  /* Describes a merge operation between two vector values.
     Operands 0 and 1 are the vectors to be merged, operand 2 is a bitmask
diff -b -w -B -d -p -c gcc-3.2/gcc/rtl.h gcc-3.2-cvs/gcc/rtl.h
*** gcc-3.2/gcc/rtl.h	Sun May 19 19:05:27 2002
--- gcc-3.2-cvs/gcc/rtl.h	Mon Sep  2 14:10:50 2002
*************** struct rtx_def
*** 133,138 ****
--- 133,139 ----
       LINK_COST_ZERO in an INSN_LIST.
       SET_IS_RETURN_P in a SET.  */
    unsigned int jump : 1;
+   unsigned int tailcall : 1;
    /* 1 in an INSN if it can call another function.
       LINK_COST_FREE in an INSN_LIST.  */
    unsigned int call : 1;
*************** extern void rtvec_check_failed_bounds PA
*** 419,425 ****
  /* 1 if insn is a call to a const or pure function.  */
  #define CONST_OR_PURE_CALL_P(INSN) ((INSN)->unchanging)
  
! /* 1 if insn (assumed to be a CALL_INSN) is a sibling call.  */
  #define SIBLING_CALL_P(INSN) ((INSN)->jump)
  
  /* 1 if insn is a branch that should not unconditionally execute its
--- 420,428 ----
  /* 1 if insn is a call to a const or pure function.  */
  #define CONST_OR_PURE_CALL_P(INSN) ((INSN)->unchanging)
  
! /* 1 if insn (assumed to be a CALL_INSN) is a call in tail
!    position (sibcall or even a more general tail call).  */
! #define TAIL_CALL_P(INSN) ((INSN)->tailcall)
  #define SIBLING_CALL_P(INSN) ((INSN)->jump)
  
  /* 1 if insn is a branch that should not unconditionally execute its
*************** extern rtx addr_side_effect_eval	PARAMS 
*** 2106,2112 ****
  typedef enum {
    sibcall_use_normal = 1,
    sibcall_use_tail_recursion,
!   sibcall_use_sibcall
  } sibcall_use_t;
  
  extern void optimize_sibling_and_tail_recursive_calls PARAMS ((void));
--- 2109,2116 ----
  typedef enum {
    sibcall_use_normal = 1,
    sibcall_use_tail_recursion,
!   sibcall_use_sibcall,
!   sibcall_use_tailcall
  } sibcall_use_t;
  
  extern void optimize_sibling_and_tail_recursive_calls PARAMS ((void));
diff -b -w -B -d -p -c gcc-3.2/gcc/sibcall.c gcc-3.2-cvs/gcc/sibcall.c
*** gcc-3.2/gcc/sibcall.c	Sun Apr  7 05:37:38 2002
--- gcc-3.2-cvs/gcc/sibcall.c	Mon Sep  9 11:35:18 2002
*************** sequence_uses_addressof (seq)
*** 461,466 ****
--- 461,469 ----
  	    if (XEXP (PATTERN (insn), 2) != NULL_RTX
  		&& sequence_uses_addressof (XEXP (PATTERN (insn), 2)))
  	      return 1;
+ 	    if (XEXP (PATTERN (insn), 3) != NULL_RTX
+ 		&& sequence_uses_addressof (XEXP (PATTERN (insn), 3)))
+ 	      return 1;
  	  }
  	else if (uses_addressof (PATTERN (insn))
  		 || (REG_NOTES (insn) && uses_addressof (REG_NOTES (insn))))
*************** replace_call_placeholder (insn, use)
*** 540,546 ****
       rtx insn;
       sibcall_use_t use;
  {
!   if (use == sibcall_use_tail_recursion)
      emit_insns_before (XEXP (PATTERN (insn), 2), insn);
    else if (use == sibcall_use_sibcall)
      emit_insns_before (XEXP (PATTERN (insn), 1), insn);
--- 543,551 ----
       rtx insn;
       sibcall_use_t use;
  {
!   if (use == sibcall_use_tailcall)
!     emit_insns_before (XEXP (PATTERN (insn), 3), insn);
!   else if (use == sibcall_use_tail_recursion)
      emit_insns_before (XEXP (PATTERN (insn), 2), insn);
    else if (use == sibcall_use_sibcall)
      emit_insns_before (XEXP (PATTERN (insn), 1), insn);
*************** optimize_sibling_and_tail_recursive_call
*** 575,580 ****
--- 580,586 ----
    basic_block alternate_exit = EXIT_BLOCK_PTR;
    bool no_sibcalls_this_function = false;
    int successful_sibling_call = 0;
+   int successful_tail_call = 0;
    int replaced_call_placeholder = 0;
    edge e;
  
*************** optimize_sibling_and_tail_recursive_call
*** 680,685 ****
--- 686,692 ----
  	{
  	  int sibcall = (XEXP (PATTERN (insn), 1) != NULL_RTX);
  	  int tailrecursion = (XEXP (PATTERN (insn), 2) != NULL_RTX);
+ 	  int tailcall = (XEXP (PATTERN (insn), 3) != NULL_RTX);
  	  basic_block call_block = BLOCK_FOR_INSN (insn);
  
  	  /* alloca (until we have stack slot life analysis) inhibits
*************** optimize_sibling_and_tail_recursive_call
*** 688,701 ****
  	     may take the address of an argument.  */
   	  if (current_function_calls_alloca
  	      || current_function_varargs || current_function_stdarg)
! 	    sibcall = 0;
  
  	  /* See if there are any reasons we can't perform either sibling or
  	     tail call optimizations.  We must be careful with stack slots
  	     which are live at potential optimization sites.  */
  	  if (no_sibcalls_this_function
- 	      /* ??? Overly conservative.  */
- 	      || frame_offset
  	      /* Any function that calls setjmp might have longjmp called from
  		 any called function.  ??? We really should represent this
  		 properly in the CFG so that this needn't be special cased.  */
--- 695,706 ----
  	     may take the address of an argument.  */
   	  if (current_function_calls_alloca
  	      || current_function_varargs || current_function_stdarg)
! 	    sibcall = 0, tailcall = 0;
  
  	  /* See if there are any reasons we can't perform either sibling or
  	     tail call optimizations.  We must be careful with stack slots
  	     which are live at potential optimization sites.  */
  	  if (no_sibcalls_this_function
  	      /* Any function that calls setjmp might have longjmp called from
  		 any called function.  ??? We really should represent this
  		 properly in the CFG so that this needn't be special cased.  */
*************** optimize_sibling_and_tail_recursive_call
*** 710,715 ****
--- 715,728 ----
  	      /* If this call doesn't end the block, there are operations at
  		 the end of the block which we must execute after returning.  */
  	      || ! call_ends_block_p (insn, call_block->end))
+ 	    sibcall = 0, tailrecursion = 0, tailcall = 0;
+ 	  
+ 	  /* Another reason why sibling calls fail: if the stack frame size is
+ 	     positive.  This can be overcome by letting the front-end do live
+ 	     analysis and tell the back-end it's safe to do tail call
+ 	     optimization.  For now, we just ignore the restriction though and
+ 	     keep optimizing the super sib calls anyway.  */
+ 	  if (frame_offset)
  	    sibcall = 0, tailrecursion = 0;
  	  
  	  /* Select a set of insns to implement the call and emit them.
*************** optimize_sibling_and_tail_recursive_call
*** 718,730 ****
  	  if (sibcall)
  	    successful_sibling_call = 1;
  
  	  replaced_call_placeholder = 1;
! 	  replace_call_placeholder (insn, 
! 				    tailrecursion != 0 
! 				      ? sibcall_use_tail_recursion
! 				      : sibcall != 0
! 					 ? sibcall_use_sibcall
! 					 : sibcall_use_normal);
  	}
      }
  
--- 731,752 ----
  	  if (sibcall)
  	    successful_sibling_call = 1;
  
+ 	  if (tailcall)
+ 	    successful_tail_call = 1;
+ 
  	  replaced_call_placeholder = 1;
! 	  
! 	  /* For the sake of readability, the different call sequences
! 	     are put into if-conditions with the most restrictive choice
! 	     at the top and the most general call method at the bottom.  */
! 	  if (tailrecursion)
! 	    replace_call_placeholder (insn, sibcall_use_tail_recursion);
! 	  else if (sibcall)
! 	    replace_call_placeholder (insn, sibcall_use_sibcall);
! 	  else if (tailcall)
! 	    replace_call_placeholder (insn, sibcall_use_tailcall);
! 	  else
! 	    replace_call_placeholder (insn, sibcall_use_normal);
  	}
      }
  
Common subdirectories: gcc-3.2/gcc/testsuite and gcc-3.2-cvs/gcc/testsuite
diff -b -w -B -d -p -c gcc-3.2/gcc/toplev.c gcc-3.2-cvs/gcc/toplev.c
*** gcc-3.2/gcc/toplev.c	Mon May 27 15:48:15 2002
--- gcc-3.2-cvs/gcc/toplev.c	Mon Sep  9 11:36:52 2002
*************** int flag_no_peephole = 0;
*** 570,575 ****
--- 570,579 ----
  
  int flag_optimize_sibling_calls = 0;
  
+ /* Nonzero allows GCC to optimize general tail calls.  */
+ 
+ int flag_optimize_tail_calls = 0;
+ 
  /* Nonzero means the front end generally wants `errno' maintained by math
     operations, like built-in SQRT.  */
  
*************** static const lang_independent_options f_
*** 973,978 ****
--- 977,984 ----
     N_("When possible do not generate stack frames") },
    {"optimize-sibling-calls", &flag_optimize_sibling_calls, 1,
     N_("Optimize sibling and tail recursive calls") },
+   {"optimize-tail-calls", &flag_optimize_tail_calls, 1,
+    N_("Optimize general tail calls") },
    {"cse-follow-jumps", &flag_cse_follow_jumps, 1,
     N_("When running CSE, follow jumps to their targets") },
    {"cse-skip-blocks", &flag_cse_skip_blocks, 1,
*************** rest_of_compilation (decl)
*** 2565,2571 ****
  
    /* We may have potential sibling or tail recursion sites.  Select one
       (of possibly multiple) methods of performing the call.  */
!   if (flag_optimize_sibling_calls)
      {
        timevar_push (TV_JUMP);
        open_dump_file (DFI_sibling, decl);
--- 2571,2577 ----
  
    /* We may have potential sibling or tail recursion sites.  Select one
       (of possibly multiple) methods of performing the call.  */
!   if (flag_optimize_sibling_calls || flag_optimize_tail_calls)
      {
        timevar_push (TV_JUMP);
        open_dump_file (DFI_sibling, decl);
diff -b -w -B -d -p -c gcc-3.2/gcc/tree.c gcc-3.2-cvs/gcc/tree.c
*** gcc-3.2/gcc/tree.c	Sat Apr 27 09:46:01 2002
--- gcc-3.2-cvs/gcc/tree.c	Mon Sep  2 16:04:01 2002
*************** get_callee_fndecl (call)
*** 4449,4454 ****
--- 4449,4455 ----
      return TREE_OPERAND (addr, 0);
  
    /* We couldn't figure out what was being called.  */
+   warning ("indirect call, not recognized");
    return NULL_TREE;
  }
  
diff -b -w -B -d -p -c gcc-3.2/gcc/unroll.c gcc-3.2-cvs/gcc/unroll.c
*** gcc-3.2/gcc/unroll.c	Sat Jun 15 11:12:06 2002
--- gcc-3.2-cvs/gcc/unroll.c	Sun Sep  1 18:26:54 2002
*************** copy_loop_body (loop, copy_start, copy_e
*** 2222,2227 ****
--- 2222,2228 ----
  	  copy = emit_call_insn (pattern);
  	  REG_NOTES (copy) = initial_reg_note_copy (REG_NOTES (insn), map);
  	  SIBLING_CALL_P (copy) = SIBLING_CALL_P (insn);
+ 	  TAIL_CALL_P (copy) = TAIL_CALL_P (insn);
  
  	  /* Because the USAGE information potentially contains objects other
  	     than hard registers, we need to copy it.  */

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-08 19:52 Work in progress: "Super Sib Calls"; opinions sought Andreas Bauer
@ 2002-09-08 23:23 ` Richard Henderson
  2002-09-09  2:48   ` Andreas Bauer
  0 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2002-09-08 23:23 UTC (permalink / raw)
  To: Andreas Bauer; +Cc: gcc, pizka, jason.ozolins

On Mon, Sep 09, 2002 at 12:51:55PM +1000, Andreas Bauer wrote:
> I'm treating super sib calls as an independend call sequence on the RTL
> level to not mess with the existing sibling call stuff.  But the
> technique I'm applying is very similar to what's already there in gcc.

Rather than wade through your patch, perhaps you can explain
_how_ you intend to attack the current problems with regular
sibcalls?

>  - super sib calls may have a positive stack frame

This isn't a problem at present.  What _is_ currently the
restriction is that the caller can't have local variables
whose address is taken.

The ideal restriction here is that the caller can't have
any local variables whose address is somehow leaked to
the callee.  But we currently don't do the data flow
analysis to find out what happens to the addreses we take.

I don't see how you can possibly address the problem posed
by the ideal restriction.

>  - super sib calls must not necessarily match in function signature

This also is not a restriction at present.  The restriction is that
the callee cannot use _more_ argument space than the caller did
(since this space is allocated by the caller's caller).

>  - super sib call arguments are not concerned by overlapping arguments

Better handling of this issue would be a good thing.

> This is achieved by creating "almost a normal call", but right before the
> actual call instruction would be used, I'm copying all the outgoing
> arguments back into the incoming argument space and jump to the subroutine
> instead.  The result is that I can reuse my incoming argument space and
> prevent stack growth for such cases.

How is this different from "normal" sibcalls?

r~

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-08 23:23 ` Richard Henderson
@ 2002-09-09  2:48   ` Andreas Bauer
  2002-09-09  3:12     ` Richard Henderson
  2002-09-09  3:34     ` Lars Brinkhoff
  0 siblings, 2 replies; 29+ messages in thread
From: Andreas Bauer @ 2002-09-09  2:48 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc, pizka, jason.ozolins

> Rather than wade through your patch, perhaps you can explain
> _how_ you intend to attack the current problems with regular
> sibcalls?

First off, the super sib calls will not be a replacement for the current
sib call stuff, but rather an addition.  I figure that's necessary, in
case I'm running into certain portability issues during the process.

So, what I'm trying to do is offer a more general stack re-usage for calls
in tail position, even if the sib call optimisation fails.  That is, I'm
generating RTL code similar to an ordinary call for each tail call, but
I'm moving the args back into the incoming arg space, _after_ they're all
evaluated and have been mangled via the outgoing arg space.  In a way
that prevents the worrying about overlapping arguments (also see below).

And how did I proceed so far?  I have added a new sequence to the
CALL_PLACEHOLDER object that's being filled during calls.c.  Hence, a new
independent call chain will be passed on and later evaluated during
rest_of_compilation in sibcalls.c.  If, for example, the tail call con-
tains var args, then super sib calls (as well as other tail call
optimisations) fail.  However, in cases where the callee requires as much
(or less) arg space as the caller, super sib calls actually optimise the
call, _even_ if the current function has allocated huge amounts of stack
space (for locals).  Note that sibcalls currently fail in such cases!

Other restrictions, like "no indirect calls", "-fpic", etc are currently
also part of super sib calls, but I'm working on removing some of them,
too (especially indirect calls are important, even if specific platforms
like ARM may not benefit from these changes).

To sum this up a bit: my patch mainly touches calls.c and sibcalls.c.  I
did need to modify rtl.def and rtl.h to extend the placeholder; just like
I had to promote/pass the new call sequence through some other files as
well.  So, in calls.c I'm checking whether there's a super sib call
candidate and if so, emit RTL for it; and in sibcall.c I'm picking an
appropriate sequence.  The actual jump instruction and resetting of the
stack pointer (to achieve stack re-usage) is handled via the current sib
call epilogue, as this is the same for super sib calls and "ordinary" sib
calls.  On x86 it usually looks like this:

           ...
           leave
           jmp foo

No need to reinvent the wheel, I guess.

> >  - super sib calls may have a positive stack frame
> 
> This isn't a problem at present.  What _is_ currently the
> restriction is that the caller can't have local variables
> whose address is taken.

That's interesting.  In sibcall.c I found something along the lines of
this:

              /* ??? Overly conservative.  */
	      || frame_offset
              ...
	      sibcall = 0, tailrecursion = 0;

To me that means you're not allowed to have any locals at all and my test
programs seem to verify that assumption.  So, what's correct?

> >  - super sib calls must not necessarily match in function signature
> 
> This also is not a restriction at present.  The restriction is that
> the callee cannot use _more_ argument space than the caller did
> (since this space is allocated by the caller's caller).

Oops, my bad.  Super sib calls will be working just like that, for the
sake of binary compatibility.  Everything more advanced will be attacked
in a second step where I'm actually using an independent calling con-
vention.

> >  - super sib call arguments are not concerned by overlapping arguments
> 
> Better handling of this issue would be a good thing.

I think I have addressed this one already.  I use the outgoing arg space
to mangle things and store the evaluated arguments, then copy the final
results to the incoming arg space, _afterwards_.

I guess that means there won't be any overlapping args at all anymore.
My solution is simply a bit slower.

> > This is achieved by creating "almost a normal call", but right before the
> > actual call instruction would be used, I'm copying all the outgoing
> > arguments back into the incoming argument space and jump to the subroutine
> > instead.  The result is that I can reuse my incoming argument space and
> > prevent stack growth for such cases.
> 
> How is this different from "normal" sibcalls?

This is _very_ different, because the normal sibcalls don't have space on
the stack to mangle the arguments.  Instead everything is efficiently
pushed straight into the incoming arg space.  That's when you get problems
with overlapping arguments, especially when passing on your own args, etc.

I'm proposing a solution that's maybe a bit slower, but covers more cases
of tail calls.

So, thanks very much for the reply and the comments and I hope this mail
puts my approach into perspective a bit more.

Cheers,
Andi.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-09  2:48   ` Andreas Bauer
@ 2002-09-09  3:12     ` Richard Henderson
  2002-09-09  3:34       ` Jakub Jelinek
                         ` (4 more replies)
  2002-09-09  3:34     ` Lars Brinkhoff
  1 sibling, 5 replies; 29+ messages in thread
From: Richard Henderson @ 2002-09-09  3:12 UTC (permalink / raw)
  To: Andreas Bauer; +Cc: gcc, pizka, jason.ozolins

On Mon, Sep 09, 2002 at 07:49:22PM +1000, Andreas Bauer wrote:
> So, what I'm trying to do is offer a more general stack re-usage for calls
> in tail position, even if the sib call optimisation fails.  That is, I'm
> generating RTL code similar to an ordinary call for each tail call, but
> I'm moving the args back into the incoming arg space, _after_ they're all
> evaluated and have been mangled via the outgoing arg space.

Personally, I see this as the _easy_ part.  This could be done
within the existing framework for sibcalls.  And, IMO, if you're
not doing this within the existing framework for sibcalls you 
are making a mistake.

The hard part is distinguishing 

	void foo()
	{
	  int x[100];

	  // something local that uses x

	  bar();  // legal to tail-call
	}

	void baz()
	{
	  int x[100];

	  global = x;

	  bar();  // *not* legal to tail-call, since bar may reference x
	}

> However, in cases where the callee requires as much
> (or less) arg space as the caller, super sib calls actually optimise the
> call, _even_ if the current function has allocated huge amounts of stack
> space (for locals).  Note that sibcalls currently fail in such cases!

For *exactly* the reason above!  You absolutely positively *must*
guarantee that no stack frame addresses escape from the function.
We don't currently do that, and thus

              /* ??? Overly conservative.  */
              || frame_offset

simply notes that if there is no local stack frame, none of it
could have possibly escaped.  (My previous message was confused on
this issue -- I thought we did minimal scanning; apparently we do
nothing at all.)

If you do not plan to address this data flow issue in some way,
you might as well quit now.

r~

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-09  3:12     ` Richard Henderson
@ 2002-09-09  3:34       ` Jakub Jelinek
  2002-09-09  6:49       ` Fergus Henderson
                         ` (3 subsequent siblings)
  4 siblings, 0 replies; 29+ messages in thread
From: Jakub Jelinek @ 2002-09-09  3:34 UTC (permalink / raw)
  To: Richard Henderson, Andreas Bauer, gcc, pizka, jason.ozolins

On Mon, Sep 09, 2002 at 03:12:18AM -0700, Richard Henderson wrote:
> On Mon, Sep 09, 2002 at 07:49:22PM +1000, Andreas Bauer wrote:
> > So, what I'm trying to do is offer a more general stack re-usage for calls
> > in tail position, even if the sib call optimisation fails.  That is, I'm
> > generating RTL code similar to an ordinary call for each tail call, but
> > I'm moving the args back into the incoming arg space, _after_ they're all
> > evaluated and have been mangled via the outgoing arg space.
> 
> Personally, I see this as the _easy_ part.  This could be done
> within the existing framework for sibcalls.  And, IMO, if you're
> not doing this within the existing framework for sibcalls you 
> are making a mistake.
> 
> The hard part is distinguishing 
> 
> 	void foo()
> 	{
> 	  int x[100];
> 
> 	  // something local that uses x
> 
> 	  bar();  // legal to tail-call
> 	}
> 
> 	void baz()
> 	{
> 	  int x[100];
> 
> 	  global = x;
> 
> 	  bar();  // *not* legal to tail-call, since bar may reference x
> 	}

And also if we want to utilize the knowledge -fhosted gives about various
standard functions:

  void foo(char *p)
  {
    char x[100];
    char *q = strcpy (x, p);
    // something local that uses q and x
    bar(p);
  }

  void baz(char *p)
  {
    char x[100];
    char *q = strcpy (x, p);
    global = q + 10;
    bar (p); // *not* legal to tail-call
  }

  void baz2(char *p)
  {
    char x[100];
    char *q = strcpy (x, p);
    bar (q); // *not* legal to tail-call, as q will
	     // (it is sufficient that it might though) contain a pointer
	     // to local stack area to be freed before tail call
  }

etc., ie. unless you know everything about a particular external function,
you have to assume worst case it will store the pointers given to it into some
global variable, while for known functions you know that they will resp.
will not store them into global variables and know whether e.g. the return
value etc. might contain pointer derived from the one passed to it.

	Jakub

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-09  2:48   ` Andreas Bauer
  2002-09-09  3:12     ` Richard Henderson
@ 2002-09-09  3:34     ` Lars Brinkhoff
  2002-09-09  6:53       ` Fergus Henderson
  1 sibling, 1 reply; 29+ messages in thread
From: Lars Brinkhoff @ 2002-09-09  3:34 UTC (permalink / raw)
  To: Andreas Bauer; +Cc: gcc

Andreas Bauer <baueran@in.tum.de> writes:
> Everything more advanced will be attacked in a second step where I'm
> actually using an independent calling convention.

What convention?  One where the callee pops its frame?

-- 
Lars Brinkhoff          http://lars.nocrew.org/     Linux, GCC, PDP-10,
Brinkhoff Consulting    http://www.brinkhoff.se/    HTTP programming

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-09  3:12     ` Richard Henderson
  2002-09-09  3:34       ` Jakub Jelinek
@ 2002-09-09  6:49       ` Fergus Henderson
  2002-09-09  7:26         ` Simon Marlow
                           ` (3 more replies)
  2002-09-09  9:44       ` Jeff Law
                         ` (2 subsequent siblings)
  4 siblings, 4 replies; 29+ messages in thread
From: Fergus Henderson @ 2002-09-09  6:49 UTC (permalink / raw)
  To: Richard Henderson, Andreas Bauer, gcc, pizka, jason.ozolins

On 09-Sep-2002, Richard Henderson <rth@redhat.com> wrote:
> On Mon, Sep 09, 2002 at 07:49:22PM +1000, Andreas Bauer wrote:
> > So, what I'm trying to do is offer a more general stack re-usage for calls
> > in tail position, even if the sib call optimisation fails.  That is, I'm
> > generating RTL code similar to an ordinary call for each tail call, but
> > I'm moving the args back into the incoming arg space, _after_ they're all
> > evaluated and have been mangled via the outgoing arg space.
> 
> Personally, I see this as the _easy_ part.

GOOD!  Let's tackle the easy parts first.

Break the task down into small parts, and when the easy
ones are done, then the hard ones will be easier...

> This could be done
> within the existing framework for sibcalls.  And, IMO, if you're
> not doing this within the existing framework for sibcalls you 
> are making a mistake.

In other words, the decision between whether to use an ordinary sibcall
(construct arguments in the incoming args area) and a "super sibcall"
(construct arguments in the outgoing args area and then copy them to
the incoming args area) should be made in calls.c, before constructing
the sibcall CALL_PLACEHOLDER chain.  OK, that sounds reasonable.

Andreas, just to expand on the rationale for this a bit more:
the reason for delaying the choice between ordinary calls and sibling
calls is that this depends on the context (whether the call is actually
in a tail position), and in general we don't know what code will follow
when generating the call.  Hence we need to generate RTL for both
kinds, and make the decision later, once we know what code follows.
However, the choice between sibling calls and "super" sibcalls depends
only on factors such as whether the arguments overlap, which can be
determined at the time the call is generated.  It is therefore better
(for compilation time) to make the decision when generating the call,
avoiding the need to generate another RTL chain.

> The hard part is distinguishing 
> 
> 	void foo()
> 	{
> 	  int x[100];
> 	  // something local that uses x
> 	  bar();  // legal to tail-call
> 	}
> 
> 	void baz()
> 	{
> 	  int x[100];
> 	  global = x;
> 	  bar();  // *not* legal to tail-call, since bar may reference x
> 	}

Right.  I think Andreas is not trying to distinguish between these two,
but is planning to use the explicit annotation approach that I
posted some time ago.  In other words, we'd distinguish between those and

 	void quux()
 	{
 	  int x[100];
 	  global = x;
 	  __tailcall bar(); /* legal to tail-call;
	  	the annotation implies that
		bar is not allowed to reference x. */
 	}

But this can be dealt with by a separate patch.
It doesn't need to be part of Andreas' change.

> (My previous message was confused on
> this issue -- I thought we did minimal scanning; apparently we do
> nothing at all.)

Actually there *is* code in sibcall.c to do such scanning (see
uses_addressof() and sequence_uses_addressof() in sibcall.c).  However,
the test for frame_size in optimize_sibling_and_tail_recursive_calls()
in sibcall.c makes this code currently redundant, I think.
I don't understand why both are needed.

> If you do not plan to address this data flow issue in some way,
> you might as well quit now.

Hey, we don't want to unnecessarily discourage volunteers, do we? ;-)
The data flow issue does need to be addressed eventually.
But I don't think it needs to be addressed right away;
improving sibcall to better handle overlapping arguments
would already be a useful improvement.
The data flow issues can be address by the explicit annotation
approach that I suggested earlier.

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-09  3:34     ` Lars Brinkhoff
@ 2002-09-09  6:53       ` Fergus Henderson
  2002-09-09  7:11         ` Richard Earnshaw
  0 siblings, 1 reply; 29+ messages in thread
From: Fergus Henderson @ 2002-09-09  6:53 UTC (permalink / raw)
  To: Lars Brinkhoff; +Cc: Andreas Bauer, gcc

On 09-Sep-2002, Lars Brinkhoff <lars.spam@nocrew.org> wrote:
> Andreas Bauer <baueran@in.tum.de> writes:
> > Everything more advanced will be attacked in a second step where I'm
> > actually using an independent calling convention.
> 
> What convention?  One where the callee pops its frame?

Yes.

The calling convention

	(1) must be callee-pops,
	      to enable callee argument size to 
	      exceed caller argument size

	(2) needs to ensure that at least one
	      call-clobbered register is not
	      used for argument passing,
	      to handle indirect calls
	      (apparently some existing calling conventions,
	      such as the one for ARM, don't have this property)

	(3) it might be desirable to use
	      not very many callee-save registers,
	      because these need to be restored
	      at every optimized last call

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-09  6:53       ` Fergus Henderson
@ 2002-09-09  7:11         ` Richard Earnshaw
  0 siblings, 0 replies; 29+ messages in thread
From: Richard Earnshaw @ 2002-09-09  7:11 UTC (permalink / raw)
  To: Fergus Henderson; +Cc: Lars Brinkhoff, Andreas Bauer, gcc, Richard.Earnshaw


> 	(2) needs to ensure that at least one
> 	      call-clobbered register is not
> 	      used for argument passing,
> 	      to handle indirect calls
> 	      (apparently some existing calling conventions,
> 	      such as the one for ARM, don't have this property)
> 

In this case I think we can use IP (r12) (though not in Thumb mode), since 
the register is dead by the time we hit any intra-link stub which might 
clobber the register.   Note however, that IP is also used for nested 
function static-chain address...

R.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-09  6:49       ` Fergus Henderson
@ 2002-09-09  7:26         ` Simon Marlow
  2002-09-09 19:40           ` Jason Ozolins
  2002-09-09  9:35         ` Markus Pizka
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 29+ messages in thread
From: Simon Marlow @ 2002-09-09  7:26 UTC (permalink / raw)
  To: 'Fergus Henderson',
	Richard Henderson, Andreas Bauer, gcc, pizka, jason.ozolins

> > This could be done
> > within the existing framework for sibcalls.  And, IMO, if you're
> > not doing this within the existing framework for sibcalls you 
> > are making a mistake.
> 
> In other words, the decision between whether to use an 
> ordinary sibcall
> (construct arguments in the incoming args area) and a "super sibcall"
> (construct arguments in the outgoing args area and then copy them to
> the incoming args area) should be made in calls.c, before constructing
> the sibcall CALL_PLACEHOLDER chain.  OK, that sounds reasonable.

Wouldn't it be better to always contruct the new arguments in place,
using a dependency-analysis pass to decide in which order to calculate
the arguments and whether any temporaries are needed?  In most cases
you'd be able to do a tail-call as fast or faster than a normal call.  

I realise this is much harder than doing the copying, but it's a big win
because it eliminates the performance pessimisation of copying the
arguments, meaning that having tail-calls turned on all the time would
probably be a good idea from a performace perspective, not just stack
space.

> Right.  I think Andreas is not trying to distinguish between 
> these two,
> but is planning to use the explicit annotation approach that I
> posted some time ago.  In other words, we'd distinguish 
> between those and
> 
>  	void quux()
>  	{
>  	  int x[100];
>  	  global = x;
>  	  __tailcall bar(); /* legal to tail-call;
> 	  	the annotation implies that
> 		bar is not allowed to reference x. */
>  	}
> 
> But this can be dealt with by a separate patch.
> It doesn't need to be part of Andreas' change.

Speaking as someone who works on a compiler which generates this kind of
code, I'd be happy with an explicit annotation.  Happier, even: the
compiler can warn me if the call can't be compiled as a tail-call.

Cheers,
	Simon

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-09  6:49       ` Fergus Henderson
  2002-09-09  7:26         ` Simon Marlow
@ 2002-09-09  9:35         ` Markus Pizka
  2002-09-09 10:08         ` Jeff Law
  2002-09-09 10:17         ` Richard Henderson
  3 siblings, 0 replies; 29+ messages in thread
From: Markus Pizka @ 2002-09-09  9:35 UTC (permalink / raw)
  To: fjh; +Cc: rth, baueran, gcc, jason.ozolins

Fergus Henderson writes (9. September 2002):
 > On 09-Sep-2002, Richard Henderson <rth@redhat.com> wrote:
 > > On Mon, Sep 09, 2002 at 07:49:22PM +1000, Andreas Bauer wrote:
 > > > So, what I'm trying to do is offer a more general stack re-usage for calls
 > > > in tail position, even if the sib call optimisation fails.  That is, I'm
 > > > generating RTL code similar to an ordinary call for each tail call, but
 > > > I'm moving the args back into the incoming arg space, _after_ they're all
 > > > evaluated and have been mangled via the outgoing arg space.
 > > 
 > > Personally, I see this as the _easy_ part.
 > 
 > GOOD!  Let's tackle the easy parts first.

completely agreed
we should tackle this challenge step by step instead of capitulating
the effort is worth it

 > Break the task down into small parts, and when the easy
 > ones are done, then the hard ones will be easier...

exactly

 > > The hard part is distinguishing 
 > > 
 > > 	void foo()
 > > 	{
 > > 	  int x[100];
 > > 	  // something local that uses x
 > > 	  bar();  // legal to tail-call
 > > 	}
 > > 
 > > 	void baz()
 > > 	{
 > > 	  int x[100];
 > > 	  global = x;
 > > 	  bar();  // *not* legal to tail-call, since bar may reference x
 > > 	}
 > 
 > Right.  I think Andreas is not trying to distinguish between these two,
 > but is planning to use the explicit annotation approach that I
 > posted some time ago.  In other words, we'd distinguish between those and
 > 
 >  	void quux()
 >  	{
 >  	  int x[100];
 >  	  global = x;
 >  	  __tailcall bar(); /* legal to tail-call;
 > 	  	the annotation implies that
 > 		bar is not allowed to reference x. */
 >  	}
 >
 > But this can be dealt with by a separate patch.
 > It doesn't need to be part of Andreas' change.

exactly,
I think the explicit annotation is a reasonable starting point to tackle
the tailcall problem

 > > If you do not plan to address this data flow issue in some way,
 > > you might as well quit now.
 > 
 > Hey, we don't want to unnecessarily discourage volunteers, do we? ;-)

No, we definetely don't.
Andreas work is valuable and important to further propagate gcc as the most
suitable or even unique solution to a wide range of applications 

 > The data flow issue does need to be addressed eventually.
 > But I don't think it needs to be addressed right away;

agreed

 > improving sibcall to better handle overlapping arguments
 > would already be a useful improvement.
 > The data flow issues can be address by the explicit annotation
 > approach that I suggested earlier.


--
   Markus Pizka

 -------------------------------------------------------------
 Dr. Markus Pizka                              pizka@in.tum.de
 Technische Universitaet Muenchen, Institut für Informatik I4
 room: 01.11.053, Boltzmannstr. 3, D - 85748 Garching
 Tel: +49 89 289-17334                   Fax: +49 89 289-17307
 -------------------------------------------------------------

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-09  3:12     ` Richard Henderson
  2002-09-09  3:34       ` Jakub Jelinek
  2002-09-09  6:49       ` Fergus Henderson
@ 2002-09-09  9:44       ` Jeff Law
  2002-09-09 23:09         ` Andreas Bauer
  2002-09-09 10:38       ` Geert Bosch
  2002-09-09 22:46       ` Andreas Bauer
  4 siblings, 1 reply; 29+ messages in thread
From: Jeff Law @ 2002-09-09  9:44 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Andreas Bauer, gcc, pizka, jason.ozolins

In message <20020909101218.GA22341@redhat.com>, Richard Henderson writes:
 >On Mon, Sep 09, 2002 at 07:49:22PM +1000, Andreas Bauer wrote:
 >> So, what I'm trying to do is offer a more general stack re-usage for calls
 >> in tail position, even if the sib call optimisation fails.  That is, I'm
 >> generating RTL code similar to an ordinary call for each tail call, but
 >> I'm moving the args back into the incoming arg space, _after_ they're all
 >> evaluated and have been mangled via the outgoing arg space.
 >
 >Personally, I see this as the _easy_ part.  This could be done
 >within the existing framework for sibcalls.  And, IMO, if you're
 >not doing this within the existing framework for sibcalls you 
 >are making a mistake.
Also note that simply constructing arguments in the outgoing space is
not necessarily going to avoid the overlapping problems.  Consider
targets where the incoming and outgoing space is not disjoint
(as is the case when passing arguments in registers).


For memory arguments, clearly the incoming and outgoing spaces are
disjoint.  However, you've got those (*&#@$ memory copies from the
outgoing space back to the incoming space; those are bloody expensive.

As others have pointed out, the way to go is to build a dependency graph,
break the cycles and use the resulting graph to construct the arguments in
the incoming space whenever possible.  Note this can (and should) happen
within the current context of tail/sibling call optimizations.  In fact,
building a dependency graph for argument setup would be useful in other
ways (for example all the silly REG_PARM_STACK_SPACE copying that we do
when we encounter nested calls).

 >For *exactly* the reason above!  You absolutely positively *must*
 >guarantee that no stack frame addresses escape from the function.
 >We don't currently do that, and thus
 >
 >              /* ??? Overly conservative.  */
 >              || frame_offset
 >
 >simply notes that if there is no local stack frame, none of it
 >could have possibly escaped.  (My previous message was confused on
 >this issue -- I thought we did minimal scanning; apparently we do
 >nothing at all.)
We ran out of time on that contract to implement the dataflow analysis
necessary to track live memory slots, so we had to stick with the 
overly conservative "no local variables" scheme.   I don't know why
we didn't refine that to be "no locally addressable variables".

jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-09  6:49       ` Fergus Henderson
  2002-09-09  7:26         ` Simon Marlow
  2002-09-09  9:35         ` Markus Pizka
@ 2002-09-09 10:08         ` Jeff Law
  2002-09-09 10:17         ` Richard Henderson
  3 siblings, 0 replies; 29+ messages in thread
From: Jeff Law @ 2002-09-09 10:08 UTC (permalink / raw)
  To: Fergus Henderson
  Cc: Richard Henderson, Andreas Bauer, gcc, pizka, jason.ozolins

 In message <20020909134920.GA26479@ceres.cs.mu.oz.au>, Fergus Henderson 
writes:
 >Actually there *is* code in sibcall.c to do such scanning (see
 >uses_addressof() and sequence_uses_addressof() in sibcall.c).  However,
 >the test for frame_size in optimize_sibling_and_tail_recursive_calls()
 >in sibcall.c makes this code currently redundant, I think.
 >I don't understand why both are needed.
ADDRESSOF doesn't generate a stack slot until after CSE has completed.  ie,
the existence of an ADDRESSOF expression will have zero effect on frame_size
until after CSE has completed and we purge ADDRESSOF expressions.

I'm pretty sure both checks are still necessary.

jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-09  6:49       ` Fergus Henderson
                           ` (2 preceding siblings ...)
  2002-09-09 10:08         ` Jeff Law
@ 2002-09-09 10:17         ` Richard Henderson
  3 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2002-09-09 10:17 UTC (permalink / raw)
  To: Fergus Henderson; +Cc: Andreas Bauer, gcc, pizka, jason.ozolins

On Mon, Sep 09, 2002 at 11:49:20PM +1000, Fergus Henderson wrote:
> Right.  I think Andreas is not trying to distinguish between these two,
> but is planning to use the explicit annotation approach that I
> posted some time ago.

Ah, right.  I'd forgotten about that.

> Hey, we don't want to unnecessarily discourage volunteers, do we? ;-)

No, we don't.  I certainly didn't mean to disparage the real
improvements already extant in his patch.


r~

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-09  3:12     ` Richard Henderson
                         ` (2 preceding siblings ...)
  2002-09-09  9:44       ` Jeff Law
@ 2002-09-09 10:38       ` Geert Bosch
  2002-09-09 10:41         ` Andrew Haley
  2002-09-09 22:46       ` Andreas Bauer
  4 siblings, 1 reply; 29+ messages in thread
From: Geert Bosch @ 2002-09-09 10:38 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Andreas Bauer, gcc, pizka, jason.ozolins

On Monday, Sep 9, 2002, at 06:12 America/New_York, Richard Henderson 
wrote:

> If you do not plan to address this data flow issue in some way,
> you might as well quit now.
>

Note that Ada semantics prevent such leaking of general pointer types.
I'm not sure what the situation is with other languages such as Fortran,
but it seems a bit strong to say that better optimizing tail calls for
languages that do not suffer from this particular C problem would be 
useless.

   -Geert

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-09 10:38       ` Geert Bosch
@ 2002-09-09 10:41         ` Andrew Haley
  0 siblings, 0 replies; 29+ messages in thread
From: Andrew Haley @ 2002-09-09 10:41 UTC (permalink / raw)
  To: Geert Bosch; +Cc: Richard Henderson, Andreas Bauer, gcc, pizka, jason.ozolins

Geert Bosch writes:
 > 
 > On Monday, Sep 9, 2002, at 06:12 America/New_York, Richard Henderson 
 > wrote:
 > 
 > > If you do not plan to address this data flow issue in some way,
 > > you might as well quit now.
 > >
 > 
 > 
 > Note that Ada semantics prevent such leaking of general pointer types.

Java too.

 > I'm not sure what the situation is with other languages such as Fortran,
 > but it seems a bit strong to say that better optimizing tail calls for
 > languages that do not suffer from this particular C problem would be 
 > useless.

Yup.

Andrew.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-09  7:26         ` Simon Marlow
@ 2002-09-09 19:40           ` Jason Ozolins
  2002-09-10  5:59             ` Simon Marlow
  2002-09-10 10:05             ` Jeff Law
  0 siblings, 2 replies; 29+ messages in thread
From: Jason Ozolins @ 2002-09-09 19:40 UTC (permalink / raw)
  To: Simon Marlow; +Cc: gcc, pizka

Simon Marlow wrote:

> > In other words, the decision between whether to use an
> > ordinary sibcall
> > (construct arguments in the incoming args area) and a "super sibcall"
> > (construct arguments in the outgoing args area and then copy them to
> > the incoming args area) should be made in calls.c, before constructing
> > the sibcall CALL_PLACEHOLDER chain.  OK, that sounds reasonable.
> 
> Wouldn't it be better to always contruct the new arguments in place,
> using a dependency-analysis pass to decide in which order to calculate
> the arguments and whether any temporaries are needed?  In most cases
> you'd be able to do a tail-call as fast or faster than a normal call.
> 
> I realise this is much harder than doing the copying, but it's a big win
> because it eliminates the performance pessimisation of copying the
> arguments, meaning that having tail-calls turned on all the time would
> probably be a good idea from a performace perspective, not just stack
> space.

Doing the dependency analysis would be great, but it strikes me that if
it were easy, someone would probably already have done it.  :-)  If it
is the case that it's not done at all because it's hard to make it
perfect, what's wrong with an effort to make tail-calls work with
overlapping arguments in a way that is easier to verify correctness,
leaving open the option to do the dependency analysis as an optimization
later on?  If the tail-call must be annotated in the source to be a
candidate for super sibcalling, as was mentioned by Markus Pizka, then
the only people who pay the price for argument rearrangement are people
who explicitly request a tailcall; they can decide whether to pay the
time penalty for argument rearrangement in return for constraining stack
usage.

In the worst case you may well need N temporaries anyway, so you do need
to decide where to put the "argument marshalling area" for the
super-sibcall; consider

int foo (int a, b, c, d)
{
	return bar (a*2+b*3+c*5+d*7, a*3+b*5+c*7+d*11,
			a*5+b*7+c*11+d*13, a*7+b*11+c*13+d*17);
}

It may be possible to do this with less than 4 temporaries, but I don't
think it's easy.  If the multipliers aren't literal values but globals,
then it gets close to impossible, no?  In light of this, I reckon that
getting overlapping arguments working at all would be better than
waiting for the perfect solution to argument rearrangement.
-- 
Jason Ozolins
Technical Support Group                 Local: x55449
Department of Computer Science         Global: +61 2 6125 5449          
The Australian National University      Email: jason.ozolins@anu.edu.au

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-09  3:12     ` Richard Henderson
                         ` (3 preceding siblings ...)
  2002-09-09 10:38       ` Geert Bosch
@ 2002-09-09 22:46       ` Andreas Bauer
  2002-09-09 23:34         ` Richard Henderson
                           ` (2 more replies)
  4 siblings, 3 replies; 29+ messages in thread
From: Andreas Bauer @ 2002-09-09 22:46 UTC (permalink / raw)
  To: Richard Henderson, gcc, pizka, jason.ozolins

> Personally, I see this as the _easy_ part.  This could be done
> within the existing framework for sibcalls.  And, IMO, if you're
> not doing this within the existing framework for sibcalls you 
> are making a mistake.

The reason why I chose a new and independent call sequence is that I am
aiming to optimise indirect calls as well (and maybe few other
constraints I encounter during this process).  Indirect calls, as you
have pointed out in an earlier email, can not easily be optimised on
certain platforms as the call clobbers all registers on register poor
machines.  Super sib calls will therefore not work on these architectures
and it may be useful to have the original sibcall framework as a fall
back, IMO.

As I have explained in my first email, the goal of these optimisations is
to enable certain front-ends to do tail call optimisation.  And, for
example, the GHC generated intermediate C-code has got indirect calls all
over the place.

So, maybe it would be more appropriate redesigning the sibcall stuff and
simply add my argument marshalling mechanism to it, but I do not see how
I could possibly address indirect calls without making ARM maintainers
(and others) unhappy.

Andi.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-09  9:44       ` Jeff Law
@ 2002-09-09 23:09         ` Andreas Bauer
  2002-09-10 10:00           ` Jeff Law
  0 siblings, 1 reply; 29+ messages in thread
From: Andreas Bauer @ 2002-09-09 23:09 UTC (permalink / raw)
  To: law; +Cc: gcc, pizka, jason.ozolins

> Also note that simply constructing arguments in the outgoing space is
> not necessarily going to avoid the overlapping problems.  Consider
> targets where the incoming and outgoing space is not disjoint
> (as is the case when passing arguments in registers).
> 
> For memory arguments, clearly the incoming and outgoing spaces are
> disjoint.  However, you've got those (*&#@$ memory copies from the
> outgoing space back to the incoming space; those are bloody expensive.

I wasn't aware of these sorts of problems and architectures.  I assume
ARM is again the platform which would cause difficulties, since its got
such a limited number of registers?  I'd be glad if you could point out
some examples for me to have a look at.

Andi.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-09 22:46       ` Andreas Bauer
@ 2002-09-09 23:34         ` Richard Henderson
  2002-09-10 10:08           ` Jeff Law
  2002-09-10 10:10           ` Jeff Law
  2002-09-10  0:06         ` Fergus Henderson
  2002-09-10  2:26         ` Richard Earnshaw
  2 siblings, 2 replies; 29+ messages in thread
From: Richard Henderson @ 2002-09-09 23:34 UTC (permalink / raw)
  To: Andreas Bauer, gcc, pizka, jason.ozolins

On Tue, Sep 10, 2002 at 03:46:07PM +1000, Andreas Bauer wrote:
> ... but I do not see how
> I could possibly address indirect calls without making ARM maintainers
> (and others) unhappy.

The way you can make this work with the existing mechanism is to push
the "no indirect" check into every target's FUNCTION_OK_FOR_SIBCALL hook.
This, by itself, preserves the status quo.

Then, on a case-by-case basis, modify each target's sibcall patterns
(and if it doesn't have one, add it) such that there is a register
class available that is call-clobbered.

E.g. on i386, 

(define_insn "*sibcall_1"
  [(call (mem:QI (match_operand:SI 0 "call_insn_operand" "s,c,d,a"))
	 (match_operand 1 "" ""))]
  "SIBLING_CALL_P (insn) && !TARGET_64BIT"
 ...

The four alternatives are "immediate", "ecx", "edx", and "eax";
the later three are the call clobbered registers.  This isn't
quite as good as having a ACD_REGS register class, but adding
too many register classes slows down the register allocator.

At which point you can remove the "no indirect" check from
FUNCTION_OK_FOR_SIBCALL.

r~

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-09 22:46       ` Andreas Bauer
  2002-09-09 23:34         ` Richard Henderson
@ 2002-09-10  0:06         ` Fergus Henderson
  2002-09-10  2:26         ` Richard Earnshaw
  2 siblings, 0 replies; 29+ messages in thread
From: Fergus Henderson @ 2002-09-10  0:06 UTC (permalink / raw)
  To: Andreas Bauer, Richard Henderson, gcc, pizka, jason.ozolins

On 10-Sep-2002, Andreas Bauer <baueran@in.tum.de> wrote:
> The reason why I chose a new and independent call sequence is that I am
> aiming to optimise indirect calls as well (and maybe few other
> constraints I encounter during this process).  Indirect calls, as you
> have pointed out in an earlier email, can not easily be optimised on
> certain platforms as the call clobbers all registers on register poor
> machines.  Super sib calls will therefore not work on these architectures

I don't see why you would want to disable *all* super sib calls on such
platforms; it would be better to disable the use of super sib calls
only for those cases that won't work.  In particular it should be
possible to use super sib calls for direct calls.

> and it may be useful to have the original sibcall framework as a fall
> back, IMO.

By all means keep the original sib call code as a fall back.
But the decision as to whether to use the original sib call code
or the new super sib call code can be done in calls.c, when
generating the call, rather than in sibcall.c.

(Actually the new super sib call code should be the fall back, and the
original sib call code should be tried first, since in cases where
both are applicable, the original sib call code will be more efficient.)

> So, maybe it would be more appropriate redesigning the sibcall stuff and
> simply add my argument marshalling mechanism to it, but I do not see how
> I could possibly address indirect calls without making ARM maintainers
> (and others) unhappy.

This could be achieved by only optimizing indirect calls in cases
where it is safe to do so.

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-09 22:46       ` Andreas Bauer
  2002-09-09 23:34         ` Richard Henderson
  2002-09-10  0:06         ` Fergus Henderson
@ 2002-09-10  2:26         ` Richard Earnshaw
  2 siblings, 0 replies; 29+ messages in thread
From: Richard Earnshaw @ 2002-09-10  2:26 UTC (permalink / raw)
  To: Andreas Bauer
  Cc: Richard Henderson, gcc, pizka, jason.ozolins, Richard.Earnshaw

> So, maybe it would be more appropriate redesigning the sibcall stuff and
> simply add my argument marshalling mechanism to it, but I do not see how
> I could possibly address indirect calls without making ARM maintainers
> (and others) unhappy.

As long as it doesn't mean that we loose the existing level of tail-call 
support I doubt there will be any complaint.  As for ARM code, I've 
already pointed out that I think we can use IP in the majority of 
circumstances (though untill I see details I can't be sure of that; for 
Thumb code, tail-calling is pretty close to impossible anyway for anything 
except the most basic of cases.

R.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-09 19:40           ` Jason Ozolins
@ 2002-09-10  5:59             ` Simon Marlow
  2002-09-10 10:05             ` Jeff Law
  1 sibling, 0 replies; 29+ messages in thread
From: Simon Marlow @ 2002-09-10  5:59 UTC (permalink / raw)
  To: 'Jason Ozolins'; +Cc: gcc, pizka

> From: Jason Ozolins [mailto:jason.ozolins@anu.edu.au] 
>
> Doing the dependency analysis would be great, but it strikes 
> me that if
> it were easy, someone would probably already have done it.  :-)  If it
> is the case that it's not done at all because it's hard to make it
> perfect, what's wrong with an effort to make tail-calls work with
> overlapping arguments in a way that is easier to verify correctness,
> leaving open the option to do the dependency analysis as an 
> optimization later on?

No problem with that at all.

> In the worst case you may well need N temporaries anyway, so 
> you do need
> to decide where to put the "argument marshalling area" for the
> super-sibcall; consider
> 
> int foo (int a, b, c, d)
> {
> 	return bar (a*2+b*3+c*5+d*7, a*3+b*5+c*7+d*11,
> 			a*5+b*7+c*11+d*13, a*7+b*11+c*13+d*17);
> }

The argument marshalling area can be on the stack after the argument
list and any stack variables; in the worst case, this just degrades to
the copying implementation.  But it degrades *smoothly*, and the bad
cases happen rarely (I conjecture).  But, as I said, I'm aware that this
is not trivial to implement.

Cheers,
	Simon

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-09 23:09         ` Andreas Bauer
@ 2002-09-10 10:00           ` Jeff Law
  2002-09-10 11:06             ` Fergus Henderson
  0 siblings, 1 reply; 29+ messages in thread
From: Jeff Law @ 2002-09-10 10:00 UTC (permalink / raw)
  To: Andreas Bauer; +Cc: gcc, pizka, jason.ozolins

In message <20020910160854.A4234@royal.anu.edu.au>, Andreas Bauer writes:
 >> Also note that simply constructing arguments in the outgoing space is
 >> not necessarily going to avoid the overlapping problems.  Consider
 >> targets where the incoming and outgoing space is not disjoint
 >> (as is the case when passing arguments in registers).
 >> 
 >> For memory arguments, clearly the incoming and outgoing spaces are
 >> disjoint.  However, you've got those (*&#@$ memory copies from the
 >> outgoing space back to the incoming space; those are bloody expensive.
 >
 >I wasn't aware of these sorts of problems and architectures.  I assume
 >ARM is again the platform which would cause difficulties, since its got
 >such a limited number of registers?  I'd be glad if you could point out
 >some examples for me to have a look at.
Alpha, HP-PA, mn103, mn102, h8300 and many others.  Most targets which pass
parameters in registers use the same registers for incoming and outgoing
arguments.

The only register passing targets I'm aware of which do not have an overlap
in the incoming/outgoing argument registers would would be the sparc, ia64
and xtensa (?).  Not surprisingly those are architctures with register
windows.

It's not a question of a limited number of registers, but the case that
there isn't any mechanism to shift "views" at the call point on most
architectures.  

Think about it for a while.  If the caller puts arguments in some registers,
say %o0, %o1, %o2, etc, then the callee is going to have to expect to find
them in the *same* registers.  ie, the incoming and outgoing register spaces
overlap.  

The exception to that is targets with register windows -- on such targets
the register file is "shifted" at call points which results in the caller
and callee accessing arguments via different registers.  For example, on
the sparc, outgoing arguments are placed in %o0, %o1, etc.  We perform a
call and shift the register window.  As a result the callee sees the
incoming arguments in %i0, %i1,etc.  

Jeff

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-09 19:40           ` Jason Ozolins
  2002-09-10  5:59             ` Simon Marlow
@ 2002-09-10 10:05             ` Jeff Law
  1 sibling, 0 replies; 29+ messages in thread
From: Jeff Law @ 2002-09-10 10:05 UTC (permalink / raw)
  To: Jason Ozolins; +Cc: Simon Marlow, gcc, pizka

 In message <3D7D5B94.63674DC3@anu.edu.au>, Jason Ozolins writes:
 >Simon Marlow wrote:

 >Doing the dependency analysis would be great, but it strikes me that if
 >it were easy, someone would probably already have done it.  :-) 
I believe a number of compilers have done this -- mostly to minimize
argument register copying for functions which shuffle their incoming
arguments and pass them down to another function.  

 >perfect, what's wrong with an effort to make tail-calls work with
 >overlapping arguments in a way that is easier to verify correctness,
 >leaving open the option to do the dependency analysis as an optimization
 >later on?
Because it's the dependency analysis that you use to actually verify
correctness.  That's really the best way to solve the correctness problem.
Once you have the dependency analysis to solve the correctness problem,
optimizing away the unwanted copies is relatively easy.

I strongly suspect if you sit down and try to write code to get the
overlapping argument problem correct that you'll find that dependency
analysis is the natural answer.


jeff


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-09 23:34         ` Richard Henderson
@ 2002-09-10 10:08           ` Jeff Law
  2002-09-10 10:10           ` Jeff Law
  1 sibling, 0 replies; 29+ messages in thread
From: Jeff Law @ 2002-09-10 10:08 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Andreas Bauer, gcc, pizka, jason.ozolins

In message <20020910063439.GA18565@redhat.com>, Richard Henderson writes:
 >On Tue, Sep 10, 2002 at 03:46:07PM +1000, Andreas Bauer wrote:
 >> ... but I do not see how
 >> I could possibly address indirect calls without making ARM maintainers
 >> (and others) unhappy.
 >
 >The way you can make this work with the existing mechanism is to push
 >the "no indirect" check into every target's FUNCTION_OK_FOR_SIBCALL hook.
 >This, by itself, preserves the status quo.
 >
 >Then, on a case-by-case basis, modify each target's sibcall patterns
 >(and if it doesn't have one, add it) such that there is a register
 >class available that is call-clobbered.
Right.  This (like so many other things) fell to the wayside due to 
time constraints, it's not a particularly hard problem to solve.

jeff



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-09 23:34         ` Richard Henderson
  2002-09-10 10:08           ` Jeff Law
@ 2002-09-10 10:10           ` Jeff Law
  2002-09-10 11:04             ` Richard Henderson
  1 sibling, 1 reply; 29+ messages in thread
From: Jeff Law @ 2002-09-10 10:10 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Andreas Bauer, gcc, pizka, jason.ozolins

 In message <20020910063439.GA18565@redhat.com>, Richard Henderson writes:
 >Then, on a case-by-case basis, modify each target's sibcall patterns
 >(and if it doesn't have one, add it) such that there is a register
 >class available that is call-clobbered.
BTW, we are making the assumption that the register allocator will
realize that call-clobbered registers are available at the sibcall
site.  It should probably already do that and failure to get a reg
would indicate a serious problem in the allocator (with the exception
of targets which completely exhaust the call-clobbered registers 
passing arguments such as the mn102 & mn103, but those we'd just
disallow indirect sibcalls in the target checks).

jeff


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-10 10:10           ` Jeff Law
@ 2002-09-10 11:04             ` Richard Henderson
  0 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2002-09-10 11:04 UTC (permalink / raw)
  To: Jeff Law; +Cc: Andreas Bauer, gcc, pizka, jason.ozolins

On Tue, Sep 10, 2002 at 11:15:12AM -0600, Jeff Law wrote:
> BTW, we are making the assumption that the register allocator will
> realize that call-clobbered registers are available at the sibcall
> site.  It should probably already do that and failure to get a reg
> would indicate a serious problem in the allocator ...

Nope.  Since we've not yet emitted the sibcall_epilogue 
(and how could we before register allocation), there's
nothing in the viscinity of the sibcall pattern that would
indicate that a call-saved register wasn't usable.


r~

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Work in progress: "Super Sib Calls"; opinions sought
  2002-09-10 10:00           ` Jeff Law
@ 2002-09-10 11:06             ` Fergus Henderson
  0 siblings, 0 replies; 29+ messages in thread
From: Fergus Henderson @ 2002-09-10 11:06 UTC (permalink / raw)
  To: law; +Cc: Andreas Bauer, gcc, pizka, jason.ozolins

On 10-Sep-2002, Jeff Law <law@porcupine.slc.redhat.com> wrote:
> Alpha, HP-PA, mn103, mn102, h8300 and many others.  Most targets which pass
> parameters in registers use the same registers for incoming and outgoing
> arguments.

Also x86, if you use __attribute__((regparm(...))).
Which is often a good idea, if you can, since it improves performance;
I got about a 3-4% speedup for the Mercury compiler by using regparm(3).

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2002-09-10 18:06 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-09-08 19:52 Work in progress: "Super Sib Calls"; opinions sought Andreas Bauer
2002-09-08 23:23 ` Richard Henderson
2002-09-09  2:48   ` Andreas Bauer
2002-09-09  3:12     ` Richard Henderson
2002-09-09  3:34       ` Jakub Jelinek
2002-09-09  6:49       ` Fergus Henderson
2002-09-09  7:26         ` Simon Marlow
2002-09-09 19:40           ` Jason Ozolins
2002-09-10  5:59             ` Simon Marlow
2002-09-10 10:05             ` Jeff Law
2002-09-09  9:35         ` Markus Pizka
2002-09-09 10:08         ` Jeff Law
2002-09-09 10:17         ` Richard Henderson
2002-09-09  9:44       ` Jeff Law
2002-09-09 23:09         ` Andreas Bauer
2002-09-10 10:00           ` Jeff Law
2002-09-10 11:06             ` Fergus Henderson
2002-09-09 10:38       ` Geert Bosch
2002-09-09 10:41         ` Andrew Haley
2002-09-09 22:46       ` Andreas Bauer
2002-09-09 23:34         ` Richard Henderson
2002-09-10 10:08           ` Jeff Law
2002-09-10 10:10           ` Jeff Law
2002-09-10 11:04             ` Richard Henderson
2002-09-10  0:06         ` Fergus Henderson
2002-09-10  2:26         ` Richard Earnshaw
2002-09-09  3:34     ` Lars Brinkhoff
2002-09-09  6:53       ` Fergus Henderson
2002-09-09  7:11         ` Richard Earnshaw

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).