From: Matthew Fortune
To: Steve Ellcey
CC: GCC Patches, clm, Joseph Myers
Subject: RE: [Patch, MIPS] MIPS specific optimization for o32 ABI
Date: Thu, 13 Aug 2015 09:15:00 -0000
Message-ID: <6D39441BF12EF246A7ABCE6654B0235321225FC7@LEMAIL01.le.imgtec.org>
References: <1438101523.19674.219.camel@ubuntu-sellcey> <1438361507.19674.252.camel@ubuntu-sellcey>
In-Reply-To: <1438361507.19674.252.camel@ubuntu-sellcey>

Hi Steve,

Overall, I don't think these optimizations are ready to include.
In principle the idea looks good but it is done at the wrong point in the
compiler in my opinion. The biggest concern I have is that the analysis
should be possible at (or prior to) the point where the prologue/epilogue
are expanded. I don't think it is safe enough to post-process the code and
delete the stack allocation.

There is at least one other optimization idea that competes with this one
which is to allow LRA to use the argument save area for arbitrary spills when
it is not used for spilling arguments or to prepare varargs. I think we need
to at least consider how the frame header removal will interact with such an
optimization.

I'd also like to see this kind of optimization be on by default and the fact
it is off by default in this patch suggests you/whoever originally wrote it
is not confident enough about its safety.

I've been through the code in detail anyway as there are a couple of things
that should be addressed if you use this in its current form elsewhere.

Thanks,
Matthew

>diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
>index c3cd52d..7cdef89 100644
>--- a/gcc/config/mips/mips.c
>+++ b/gcc/config/mips/mips.c
>@@ -77,6 +77,7 @@ along with GCC; see the file COPYING3. If not see
> #include "cgraph.h"
> #include "builtins.h"
> #include "rtl-iter.h"
>+#include "dumpfile.h"
>
> /* This file should be included last. */
> #include "target-def.h"
>@@ -380,6 +381,9 @@ struct GTY(()) mips_frame_info {
>
>   /* The offset of hard_frame_pointer_rtx from the bottom of the frame. */
>   HOST_WIDE_INT hard_frame_pointer_offset;
>+
>+  /* Skip stack frame allocation if possible. */
>+  bool skip_stack_frame_allocation_p;
> };
>
> /* Enumeration for masked vectored (VI) and non-masked (EIC) interrupts. */
>@@ -472,6 +476,15 @@ struct GTY(()) machine_function {
>   /* True if this is an interrupt handler that should use DERET
>      instead of ERET. */
>   bool use_debug_exception_return_p;
>+
>+  /* True if some of the callees uses its frame header.
>+     */

Is this supposed to say 'use their frame header'?

>+  bool callees_use_frame_header_p;
>+
>+  /* True if current function uses its frame header. */
>+  bool uses_frame_header_p;
>+
>+  /* Frame size before updated by optimizations. */
>+  HOST_WIDE_INT initial_total_size;
> };
>
> /* Information about a single argument. */
>@@ -574,6 +587,8 @@ struct mips_rtx_cost_data
>
> /* Global variables for machine-dependent things. */
>
>+static hash_map *mips_frame_header_usage;
>+
> /* The -G setting, or the configuration's default small-data limit if
>    no -G option is given. */
> static unsigned int mips_small_data_threshold;
>@@ -1296,6 +1311,7 @@ static const struct mips_rtx_cost_data
>   }
> };
>
>+static void mips_rest_of_frame_header_opt (void);
> static rtx mips_find_pic_call_symbol (rtx_insn *, rtx, bool);
> static int mips_register_move_cost (machine_mode, reg_class_t,
>                                     reg_class_t);
>@@ -10358,6 +10374,114 @@ mips_save_reg_p (unsigned int regno)
>   return false;
> }
>
>+/* Try to find if function may use its incoming frame header. */
>+
>+static bool
>+mips_find_if_frame_header_is_used (tree fndecl)

Name this mips_frame_header_used_p.

>+{
>+  bool *frame_header_unused;

What is the pointer doing? This should just be a bool.

>+
>+  if (mips_frame_header_usage)
>+    frame_header_unused = mips_frame_header_usage->get (fndecl);
>+  else
>+    frame_header_unused = false;
>+
>+  return !frame_header_unused;

Is this just supposed to be (or maybe inverted, the logic looks weird):

  return !(mips_frame_header_usage && mips_frame_header_usage->get (fndecl));

This should be inlined in the function below; it only has one caller.

>+}
>+
>+/* Return true if the instruction is a call and the called function may use its
>+   incoming frame header. */
>+
>+static bool
>+mips_callee_use_frame_header (rtx_insn *insn)
>+{
>+  rtx call_insn;
>+  tree fndecl;
>+
>+  if (insn == NULL_RTX || !USEFUL_INSN_P (insn))
>+    return false;
>+
>+  /* Handle sequence of instructions.
>+     */
>+  if (GET_CODE (PATTERN (insn)) == SEQUENCE)
>+    {
>+      rtx_insn *subinsn;
>+      FOR_EACH_SUBINSN (subinsn, insn)
>+        if (INSN_P (subinsn) && mips_callee_use_frame_header (subinsn))
>+          return true;
>+    }
>+
>+  if (GET_CODE (insn) != CALL_INSN)
>+    return false;
>+
>+  if (GET_CODE (PATTERN (insn)) != PARALLEL
>+      || GET_CODE (XVECEXP (PATTERN (insn), 0, 0)) != SET)
>+    return true;

I don't understand this. It says there is a callee frame header usage if the
instruction does not have this pattern (parallel [(set...)...]). Why?
(I'm probably just being dumb here)

>+
>+  call_insn = SET_SRC (XVECEXP (PATTERN (insn), 0, 0));
>+
>+  if (GET_CODE (call_insn) != CALL
>+      || GET_CODE (XEXP (call_insn, 0)) != MEM
>+      || GET_CODE (XEXP (XEXP (call_insn, 0), 0)) != SYMBOL_REF)
>+    return true;

Use MEM_P, SYMBOL_REF_P etc.

Weak symbols? We must assume the worst for preemptable symbols.

>+
>+  fndecl = SYMBOL_REF_DECL (XEXP (XEXP (call_insn, 0), 0));
>+
>+  if (fndecl == current_function_decl)
>+    return true;
>+
>+  return mips_find_if_frame_header_is_used (fndecl);
>+}
>+
>+/* Return true if any of the callee functions may use its incoming frame

their incoming frame...

>+   header. */
>+
>+static bool
>+mips_callees_use_frame_header_p (void)
>+{
>+  rtx_insn *insn;
>+
>+  /* Iterate through all instructions in the current function and check whether
>+     only already seen functions may be called. Assume that any unseen function
>+     may use its incoming frame header. */
>+  for (insn = get_insns (); insn != NULL_RTX; insn = NEXT_INSN (insn))
>+    if (mips_callee_use_frame_header (insn))
>+      return true;

Let's at least pull all the basic instruction iterating code into here. I.e.
this code at least (it doesn't need to check for a sequence as
FOR_EACH_SUBINSN is safe on a single insn). It may simply be worth inlining
the whole thing as the separation doesn't seem to make things any clearer.
  rtx_insn *subinsn;
  FOR_EACH_SUBINSN (subinsn, insn)
    if (INSN_P (subinsn) && mips_callee_use_frame_header (subinsn))
      return true;

>+
>+  return false;
>+}
>+
>+/* Return true if the current function may use its incoming frame header.
>+   If destination of memory store in format sp + offset and offset is greater
>+   or equal than frame->total_size than this function returns true.
>+   */

So is this code really just looking for spills/reloads? The restrictions on
applying this optimisation seem like they are basically saying the function
must have no need for a frame. I don't see any reason to have such a large
number of restrictions; the only thing that matters is the use of the
incoming argument area.

>+
>+static bool
>+mips_cfun_use_frame_header_p (void)
>+{
>+  rtx_insn *insn;
>+
>+  for (insn = get_insns (); insn != NULL_RTX; insn = NEXT_INSN (insn))
>+    {
>+      if (insn != NULL_RTX && INSN_P (insn)

Remove insn != NULL_RTX; it is redundant.

>+          && GET_CODE (PATTERN (insn)) == SET
>+          && MEM_P (XEXP (PATTERN (insn), 0)))
>+        {
>+          rtx mem_dst = XEXP (XEXP (PATTERN (insn), 0), 0);
>+          if (GET_CODE (mem_dst) == PLUS
>+              && CONST_INT_P (XEXP (mem_dst, 1))
>+              && REG_P (XEXP (mem_dst, 0))
>+              && REGNO (XEXP (mem_dst, 0)) == STACK_POINTER_REGNUM)

I'm uneasy about how simplistic the test is for using the frame header. It
probably works today but other optimisations could create things like anchors
into the frame and offset from there.

>+            {
>+              int offset = INTVAL (XEXP (mem_dst, 1));

Use HOST_WIDE_INT offset.

>+              if (offset >= cfun->machine->initial_total_size)
>+                return true;
>+            }
>+        }
>+    }
>+
>+  return false;
>+}
>+
> /* Populate the current function's mips_frame_info structure.
>
>    MIPS stack frames look like:
>@@ -10429,9 +10553,8 @@ mips_save_reg_p (unsigned int regno)
>    hard_frame_pointer_rtx unchanged.
>    */
>
> static void
>-mips_compute_frame_info (void)
>+mips_compute_frame_info (bool recalculate, struct mips_frame_info *frame)
> {
>-  struct mips_frame_info *frame;
>   HOST_WIDE_INT offset, size;
>   unsigned int regno, i;
>
>@@ -10457,11 +10580,11 @@ mips_compute_frame_info (void)
>        }
>    }
>
>-  frame = &cfun->machine->frame;
>   memset (frame, 0, sizeof (*frame));
>   size = get_frame_size ();
>
>   cfun->machine->global_pointer = mips_global_pointer ();
>+  frame->cprestore_size = 0;

This has just been zero'd 3 lines earlier.

>
>   /* The first two blocks contain the outgoing argument area and the $gp save
>      slot. This area isn't needed in leaf functions, but if the
>@@ -10477,12 +10600,18 @@ mips_compute_frame_info (void)
>         frame->args_size = REG_PARM_STACK_SPACE (cfun->decl);
>       else
>         frame->args_size = 0;
>-      frame->cprestore_size = 0;

If you are going to delete this then delete the 'else' and args_size = 0 as
well.

>     }
>   else
>     {
>-      frame->args_size = crtl->outgoing_args_size;
>-      frame->cprestore_size = MIPS_GP_SAVE_AREA_SIZE;
>+      /* If recalculate do not take args_size into account. */

This comment needs to say why this happens. I don't follow why from the code.

>+      if (recalculate)
>+        frame->args_size = 0;
>+      else

No need to zero args_size; just leave it as it has been memzero'd:

  if (!recalculate)

>+        frame->args_size = crtl->outgoing_args_size;
>+
>+      /* Check if space allocated on stack for gp will be used. */
>+      if (!recalculate || mips_must_initialize_gp_p ())
>+        frame->cprestore_size = MIPS_GP_SAVE_AREA_SIZE;

Why is this dependent on !recalculate?

Please split out the fix for not allocating space for cprestore as that can
be applied independently; it looks like a simple bug fix.

>     }
>   offset = frame->args_size + frame->cprestore_size;
>
>@@ -10606,6 +10735,9 @@ mips_compute_frame_info (void)
>      instructions for local variables and incoming arguments.
>      */
>   if (TARGET_MIPS16)
>     frame->hard_frame_pointer_offset = frame->args_size;
>+
>+  if (!recalculate)
>+    cfun->machine->initial_total_size = frame->total_size;
> }
>
> /* Return the style of GP load sequence that is being used for the
>@@ -10642,7 +10774,7 @@ mips_frame_pointer_required (void)
>      without using a second temporary register. */
>   if (TARGET_MIPS16)
>     {
>-      mips_compute_frame_info ();
>+      mips_compute_frame_info (false, &cfun->machine->frame);
>       if (!SMALL_OPERAND (cfun->machine->frame.total_size))
>         return true;
>     }
>@@ -10668,7 +10800,7 @@ mips_initial_elimination_offset (int from, int to)
> {
>   HOST_WIDE_INT offset;
>
>-  mips_compute_frame_info ();
>+  mips_compute_frame_info (false, &cfun->machine->frame);
>
>   /* Set OFFSET to the offset from the end-of-prologue stack pointer. */
>   switch (from)
>@@ -16838,12 +16970,8 @@ mips_has_long_branch_p (void)
>   return false;
> }
>
>-/* If we are using a GOT, but have not decided to use a global pointer yet,
>-   see whether we need one to implement long branches. Convert the ghost
>-   global-pointer instructions into real ones if so. */
>-
> static bool
>-mips_expand_ghost_gp_insns (void)
>+mips_gp_expand_needed_p (void)
> {
>   /* Quick exit if we already know that we will or won't need a
>      global pointer. */
>@@ -16857,12 +16985,28 @@ mips_expand_ghost_gp_insns (void)
>     return false;
>
>   /* We've now established that we need $gp. */
>-  cfun->machine->must_initialize_gp_p = true;
>-  split_all_insns_noflow ();
>-
>   return true;
> }
>
>+
>+/* If we are using a GOT, but have not decided to use a global pointer yet,
>+   see whether we need one to implement long branches. Convert the ghost
>+   global-pointer instructions into real ones if so. */
>+
>+static bool
>+mips_expand_ghost_gp_insns (void)
>+{
>+
>+  if (mips_gp_expand_needed_p ())
>+    {
>+      /* We've now established that we need $gp.
>+         */
>+      cfun->machine->must_initialize_gp_p = true;
>+      split_all_insns_noflow ();
>+      return true;
>+    }
>+  return false;
>+}
>+
> /* Subroutine of mips_reorg to manage passes that require DF. */
>
> static void
>@@ -17004,6 +17148,9 @@ mips_reorg (void)
>       mips_df_reorg ();
>       free_bb_for_insn ();
>     }
>+
>+  if (flag_frame_header_optimization)
>+    mips_rest_of_frame_header_opt ();
> }
>
> /* We use a machine specific pass to do a second machine dependent reorg
>@@ -18802,6 +18949,164 @@ mips_prepare_pch_save (void)
>   mips_set_compression_mode (0);
>   mips16_globals = 0;
> }
>+
>+/* Return new offset for stack load/store operations. */
>+
>+static int
>+mips_get_updated_offset (int old_offset)
>+{
>+  struct mips_frame_info *frame = &cfun->machine->frame;
>+  int res = old_offset;
>+  int initial_total_size = cfun->machine->initial_total_size;
>+
>+  if (old_offset > 0 && old_offset <= frame->gp_sp_offset)
>+    /* It should be only gp. */
>+    res = old_offset - (initial_total_size
>+                        - REG_PARM_STACK_SPACE (cfun->decl));
>+  else if (old_offset >= frame->gp_sp_offset
>+           && old_offset <= initial_total_size)
>+    /* gp registers, accumulators. */
>+    res = old_offset - (initial_total_size
>+                        - REG_PARM_STACK_SPACE (cfun->decl));

This reduces to old_offset > 0 && old_offset <= initial_total_size. I don't
understand the reason for the original separation nor the comment about
'it should be only gp'. Will it only ever be gp or is there some chance that
it can be something else?

>+  else if (old_offset > initial_total_size)
>+    /* Incoming args. */
>+    res = old_offset - initial_total_size;
>+
>+  return res;
>+}
>+
>+/* Test whether to skip frame header allocation. TODO: Try to do stack
>+   frame allocation removal even if local variables are used.
>+   */
>+
>+static bool
>+mips_skip_stack_frame_alloc (void)
>+{
>+  struct mips_frame_info *frame = &cfun->machine->frame;
>+  struct mips_frame_info opt_frame;
>+
>+  if (!flag_frame_header_optimization)
>+    return false;
>+
>+  if (cfun->calls_setjmp != 0
>+      || cfun->calls_alloca != 0
>+      || cfun->stdarg != 0
>+      || crtl->shrink_wrapped
>+      || frame->var_size != 0
>+      || frame->args_size > REG_PARM_STACK_SPACE (cfun->decl)
>+      || mips_abi != ABI_32
>+      || TARGET_MIPS16
>+      || TARGET_MICROMIPS
>+      || frame_pointer_needed != 0
>+      || mips_gp_expand_needed_p ())
>+    return false;

Ugh! This is fairly tightly constrained. I'm not sure all of this is
necessary.

>+
>+  if (mips_callees_use_frame_header_p ())
>+    return false;
>+
>+  mips_compute_frame_info (true, &opt_frame);
>+
>+  if (opt_frame.total_size > REG_PARM_STACK_SPACE (cfun->decl)
>+      || cfun->machine->uses_frame_header_p)
>+    return false;

I'm probably being dumb again but I'm struggling to understand why this is
the key condition for detecting when the optimisation can be done. Since it
would need a comment anyway can you explain?

>+
>+  return true;
>+}
>+
>+/* Update stack related instructions. */
>+
>+static void
>+mips_frame_header_update_insn (rtx_insn *insn)
>+{
>+  rtx set_insn, src, dst;
>+
>+  if (insn == NULL_RTX || !USEFUL_INSN_P (insn))
>+    return;
>+
>+  set_insn = single_set (insn);
>+  if (set_insn == NULL_RTX)
>+    return;
>+
>+  src = SET_SRC (set_insn);
>+  dst = SET_DEST (set_insn);
>+
>+  if (GET_CODE (src) == REG && GET_CODE (dst) == MEM
>+      && GET_CODE (XEXP (dst, 0)) == PLUS
>+      && GET_CODE (XEXP (XEXP (dst, 0), 0)) == REG
>+      && CONST_INT_P (XEXP (XEXP (dst, 0), 1))
>+      && (REGNO (XEXP (XEXP (dst, 0), 0))
>+          == STACK_POINTER_REGNUM))

Use MEM_P and REG_P.

>+    {
>+      /* It is a store through sp - update offset.
>+         */
>+      XEXP (XEXP (dst, 0), 1)
>+        = GEN_INT (mips_get_updated_offset (INTVAL (XEXP (XEXP (dst, 0), 1))));
>+      return;
>+    }
>+
>+  if (GET_CODE (src) == MEM && GET_CODE (dst) == REG
>+      && GET_CODE (XEXP (src, 0)) == PLUS
>+      && GET_CODE (XEXP (XEXP (src, 0), 0)) == REG
>+      && CONST_INT_P (XEXP (XEXP (src, 0), 1))
>+      && (REGNO (XEXP (XEXP (src, 0), 0))
>+          == STACK_POINTER_REGNUM))
>+    {
>+      /* It is a load through sp - update offset. */
>+      XEXP (XEXP (src, 0), 1)
>+        = GEN_INT (mips_get_updated_offset (INTVAL (XEXP (XEXP (src, 0), 1))));
>+      return;
>+    }
>+
>+  if (GET_CODE (src) == PLUS
>+      && GET_CODE (XEXP (src, 0)) == REG
>+      && CONST_INT_P (XEXP (src, 1))
>+      && REGNO (XEXP (src, 0)) == STACK_POINTER_REGNUM
>+      && REGNO (SET_DEST (set_insn)) == STACK_POINTER_REGNUM)
>+    delete_insn (insn);
>+}

As in my summary, I don't like this. It feels dangerous and unnecessary to
apply the optimization so late.

>+
>+/* Entry function for the frame header optimization. */
>+
>+static void
>+mips_rest_of_frame_header_opt (void)
>+{
>+  rtx_insn *insn;
>+  bool skip_stack_frame_alloc;
>+  struct mips_frame_info *frame = &cfun->machine->frame;
>+
>+  cfun->machine->uses_frame_header_p = mips_cfun_use_frame_header_p ();
>+  skip_stack_frame_alloc = mips_skip_stack_frame_alloc ();
>+
>+  /* Check if it is needed to recalculate stack frame info. */
>+  if (skip_stack_frame_alloc)
>+    mips_compute_frame_info (true, frame);
>+
>+  if ((skip_stack_frame_alloc && frame->total_size == 0)
>+      || (!skip_stack_frame_alloc && !cfun->machine->uses_frame_header_p
>+          && !cfun->stdarg))
>+    {
>+      /* Function does not use its incoming frame header.
>+         */
>+
>+      if (!mips_frame_header_usage)
>+        mips_frame_header_usage = new hash_map;
>+
>+      tree fndecl = current_function_decl;
>+      bool existed;
>+      bool &frame_hdr_unused = mips_frame_header_usage->get_or_insert (fndecl, &existed);
>+      if (!existed)
>+        frame_hdr_unused = true;
>+    }
>+
>+  if (skip_stack_frame_alloc)
>+    {
>+      if (dump_file && cfun->machine->initial_total_size > frame->total_size)
>+        fprintf (dump_file, "Frame size reduced by frame header optimization"
>+                 " from %ld to %ld.\n", cfun->machine->initial_total_size,
>+                 frame->total_size);
>+
>+      /* Update instructions. */
>+      for (insn = get_insns (); insn != NULL_RTX; insn = NEXT_INSN (insn))
>+        mips_frame_header_update_insn (insn);
>+    }
>+}
>
> /* Generate or test for an insn that supports a constant permutation. */
>
>diff --git a/gcc/config/mips/mips.opt b/gcc/config/mips/mips.opt
>index 348c6e0..3e72936 100644
>--- a/gcc/config/mips/mips.opt
>+++ b/gcc/config/mips/mips.opt
>@@ -412,6 +412,10 @@ modd-spreg
> Target Report Mask(ODD_SPREG)
> Enable use of odd-numbered single-precision registers
>
>+mframe-header-opt
>+Target Report Var(flag_frame_header_optimization) Optimization
>+Optimize frame header
>+
> noasmopt
> Driver
>
>diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>index 413ac16..3b7b1b6 100644
>--- a/gcc/doc/invoke.texi
>+++ b/gcc/doc/invoke.texi
>@@ -813,7 +813,8 @@ Objective-C and Objective-C++ Dialects}.
> -mbranch-cost=@var{num} -mbranch-likely -mno-branch-likely @gol
> -mfp-exceptions -mno-fp-exceptions @gol
> -mvr4130-align -mno-vr4130-align -msynci -mno-synci @gol
>--mrelax-pic-calls -mno-relax-pic-calls -mmcount-ra-address}
>+-mrelax-pic-calls -mno-relax-pic-calls -mmcount-ra-address @gol
>+-mframe-header-opt}

Add -mno-frame-header-opt.

>
> @emph{MMIX Options}
> @gccoptlist{-mlibfuncs -mno-libfuncs -mepsilon -mno-epsilon -mabi=gnu @gol
>@@ -18013,6 +18014,18 @@ if @var{ra-address} is nonnull.
>
> The default is @option{-mno-mcount-ra-address}.
>
>+@item -mframe-header-opt
>+@itemx -mno-frame-header-opt
>+@opindex mframe-header-opt
>+Enable (disable) frame header optimization in the O32 ABI. When using
>+the O32 ABI, calling functions allocate 16 bytes on the stack in case
>+the called function needs to write out register arguments to memory so
>+that their address can be taken. When enabled, this optimization allows
>+the called function to use those 16 bytes for other purposes if the
>+arguments do not need to be written to memory.
>+
>+This optimization is off by default at all optimization levels.
>+
> @end table
>
> @node MMIX Options
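
For reference, a minimal standalone sketch of the mips_get_updated_offset
comment above, showing that the first two branches can be merged into the
single test old_offset > 0 && old_offset <= initial_total_size. Nothing here
is GCC code: FrameModel, updated_offset_posted and updated_offset_merged are
hypothetical names, and the three fields only stand in for
frame->gp_sp_offset, cfun->machine->initial_total_size and
REG_PARM_STACK_SPACE (cfun->decl). The equivalence is assumed to hold only
when 0 < gp_sp_offset <= initial_total_size.

```cpp
// Hypothetical stand-in for the values the posted function reads from
// cfun->machine and REG_PARM_STACK_SPACE (cfun->decl).  Not GCC code.
struct FrameModel {
  long gp_sp_offset;        // stand-in for frame->gp_sp_offset
  long initial_total_size;  // frame size before the optimization
  long reg_parm_space;      // 16 bytes for o32
};

// The three-way split as it appears in the posted patch.
long updated_offset_posted(const FrameModel &f, long old_offset) {
  long res = old_offset;
  if (old_offset > 0 && old_offset <= f.gp_sp_offset)
    res = old_offset - (f.initial_total_size - f.reg_parm_space);
  else if (old_offset >= f.gp_sp_offset
           && old_offset <= f.initial_total_size)
    res = old_offset - (f.initial_total_size - f.reg_parm_space);
  else if (old_offset > f.initial_total_size)
    res = old_offset - f.initial_total_size;
  return res;
}

// The merged form: one test covers both of the identical branches.
// Assumes 0 < gp_sp_offset <= initial_total_size.
long updated_offset_merged(const FrameModel &f, long old_offset) {
  long res = old_offset;
  if (old_offset > 0 && old_offset <= f.initial_total_size)
    res = old_offset - (f.initial_total_size - f.reg_parm_space);
  else if (old_offset > f.initial_total_size)
    res = old_offset - f.initial_total_size;
  return res;
}
```

With a model such as FrameModel{20, 32, 16} the two functions agree for every
offset. If gp_sp_offset is zero they differ at old_offset == 0 (the posted
version rewrites it, the merged one leaves it alone), so the stated
assumption matters.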