From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 17166 invoked by alias); 23 Aug 2007 14:22:41 -0000
Received: (qmail 16537 invoked by uid 22791); 23 Aug 2007 14:22:37 -0000
X-Spam-Check-By: sourceware.org
Received: from mail.codesourcery.com (HELO mail.codesourcery.com) (65.74.133.4) by sourceware.org (qpsmtpd/0.31) with ESMTP; Thu, 23 Aug 2007 14:22:31 +0000
Received: (qmail 6814 invoked from network); 23 Aug 2007 14:22:28 -0000
Received: from unknown (HELO bullfrog.localdomain) (sandra@127.0.0.2) by mail.codesourcery.com with ESMTPA; 23 Aug 2007 14:22:28 -0000
Message-ID: <46CD9828.3040305@codesourcery.com>
Date: Thu, 23 Aug 2007 14:35:00 -0000
From: Sandra Loosemore
User-Agent: Thunderbird 2.0.0.4 (X11/20070604)
MIME-Version: 1.0
To: GCC Patches, Nigel Stephens, Guy Morrogh, David Ung, Thiemo Seufer, Mark Mitchell, richard@codesourcery.com
Subject: Re: PATCH: fine-tuning for can_store_by_pieces
References: <46C3343A.5080407@codesourcery.com> <87ps1nop2x.fsf@firetop.home>
 <46C778D6.5060808@codesourcery.com> <87y7g6r50c.fsf@firetop.home>
 <46CA222D.2050107@codesourcery.com> <87ps1h5mda.fsf@firetop.home>
 <46CAEBCE.3050807@codesourcery.com> <87r6lx3r9p.fsf@firetop.home>
 <46CB4B99.5010501@codesourcery.com> <87zm0k39gj.fsf@firetop.home>
In-Reply-To: <87zm0k39gj.fsf@firetop.home>
Content-Type: multipart/mixed; boundary="------------000108000701060004050209"
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2007-08/txt/msg01554.txt.bz2

This is a multi-part message in MIME format.
--------------000108000701060004050209
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Content-length: 725

I think this version of the patch addresses all the issues Richard raised with the last version, unless I've gotten confused again and let something slip through the cracks.

I experimented with tweaking all of MOVE_RATIO/CLEAR_RATIO/SET_RATIO and came up with values that produce good -Os results in CSiBE as well as making sense in terms of the base MIPS_CALL_RATIO versus how many instructions would be required for each bytewise move/clear/set. Overall I am getting just over 0.5% improvement with -Os from this patch now.

Mark, could you take another look at this as well, since I included Richard's suggested change to remove the optimize_size check from the target-independent code in builtins.c?

-Sandra

--------------000108000701060004050209
Content-Type: text/x-log; name="31b-frob-by-pieces.log"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="31b-frob-by-pieces.log"
Content-length: 1326

2007-08-22  Sandra Loosemore
	    Nigel Stephens

	PR target/11787

	gcc/
	* doc/tm.texi (SET_RATIO, SET_BY_PIECES_P): Document new macros.
	(STORE_BY_PIECES_P): No longer applies to __builtin_memset.
	* expr.c (SET_BY_PIECES_P): Define.
	(can_store_by_pieces, store_by_pieces): Add MEMSETP argument; use
	it to decide whether to use SET_BY_PIECES_P or STORE_BY_PIECES_P.
	* expr.h (SET_RATIO): Define.
	(can_store_by_pieces, store_by_pieces): Update prototypes.
	* builtins.c (expand_builtin_memcpy): Pass MEMSETP argument to
	can_store_by_pieces/store_by_pieces.
	(expand_builtin_mempcpy_args): Likewise.
	(expand_builtin_strncpy): Likewise.
	(expand_builtin_memset_args): Likewise.  Also remove special case
	for optimize_size so that can_store_by_pieces/SET_BY_PIECES_P can
	decide what to do instead.
	* value-prof.c (tree_stringops_transform): Pass MEMSETP argument
	to can_store_by_pieces.
	* config/sh/sh.h (SET_BY_PIECES_P): Clone from STORE_BY_PIECES_P.
	* config/s390/s390.h (SET_BY_PIECES_P): Likewise.
	* config/mips/mips.opt (mmemcpy): Change from Var to Mask.
	* config/mips/mips.c (override_options): Make -Os default to
	-mmemcpy.
	* config/mips/mips.h (MIPS_CALL_RATIO): Define.
	(MOVE_RATIO, CLEAR_RATIO, SET_RATIO): Define.
	(STORE_BY_PIECES_P): Define.
--------------000108000701060004050209 Content-Type: text/x-patch; name="31b-frob-by-pieces.patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="31b-frob-by-pieces.patch" Content-length: 22805 Index: gcc/doc/tm.texi =================================================================== *** gcc/doc/tm.texi (revision 127324) --- gcc/doc/tm.texi (working copy) *************** will be used. Defaults to 1 if @code{mo *** 5893,5904 **** than @code{CLEAR_RATIO}. @end defmac @defmac STORE_BY_PIECES_P (@var{size}, @var{alignment}) A C expression used to determine whether @code{store_by_pieces} will be ! used to set a chunk of memory to a constant value, or whether some other ! mechanism will be used. Used by @code{__builtin_memset} when storing ! values other than constant zero and by @code{__builtin_strcpy} when ! when called with a constant source string. Defaults to 1 if @code{move_by_pieces_ninsns} returns less than @code{MOVE_RATIO}. @end defmac --- 5893,5922 ---- than @code{CLEAR_RATIO}. @end defmac + @defmac SET_RATIO + The threshold of number of scalar move insns, @emph{below} which a sequence + of insns should be generated to set memory to a constant value, instead of + a block set insn or a library call. + Increasing the value will always make code faster, but + eventually incurs high cost in increased code size. + + If you don't define this, it defaults to the value of @code{MOVE_RATIO}. + @end defmac + + @defmac SET_BY_PIECES_P (@var{size}, @var{alignment}) + A C expression used to determine whether @code{store_by_pieces} will be + used to set a chunk of memory to a constant value, or whether some + other mechanism will be used. Used by @code{__builtin_memset} when + storing values other than constant zero. + Defaults to 1 if @code{move_by_pieces_ninsns} returns less + than @code{SET_RATIO}. + @end defmac + @defmac STORE_BY_PIECES_P (@var{size}, @var{alignment}) A C expression used to determine whether @code{store_by_pieces} will be ! 
used to set a chunk of memory to a constant string value, or whether some ! other mechanism will be used. Used by @code{__builtin_strcpy} when ! called with a constant source string. Defaults to 1 if @code{move_by_pieces_ninsns} returns less than @code{MOVE_RATIO}. @end defmac Index: gcc/expr.c =================================================================== *** gcc/expr.c (revision 127324) --- gcc/expr.c (working copy) *************** static bool float_extend_from_mem[NUM_MA *** 186,193 **** #endif /* This macro is used to determine whether store_by_pieces should be ! called to "memset" storage with byte values other than zero, or ! to "memcpy" storage when the source is a constant string. */ #ifndef STORE_BY_PIECES_P #define STORE_BY_PIECES_P(SIZE, ALIGN) \ (move_by_pieces_ninsns (SIZE, ALIGN, STORE_MAX_PIECES + 1) \ --- 186,200 ---- #endif /* This macro is used to determine whether store_by_pieces should be ! called to "memset" storage with byte values other than zero. */ ! #ifndef SET_BY_PIECES_P ! #define SET_BY_PIECES_P(SIZE, ALIGN) \ ! (move_by_pieces_ninsns (SIZE, ALIGN, STORE_MAX_PIECES + 1) \ ! < (unsigned int) SET_RATIO) ! #endif ! ! /* This macro is used to determine whether store_by_pieces should be ! called to "memcpy" storage when the source is a constant string. */ #ifndef STORE_BY_PIECES_P #define STORE_BY_PIECES_P(SIZE, ALIGN) \ (move_by_pieces_ninsns (SIZE, ALIGN, STORE_MAX_PIECES + 1) \ *************** use_group_regs (rtx *call_fusage, rtx re *** 2191,2203 **** /* Determine whether the LEN bytes generated by CONSTFUN can be stored to memory using several move instructions. CONSTFUNDATA is a pointer which will be passed as argument in every CONSTFUN call. ! ALIGN is maximum alignment we can assume. Return nonzero if a ! call to store_by_pieces should succeed. */ int can_store_by_pieces (unsigned HOST_WIDE_INT len, rtx (*constfun) (void *, HOST_WIDE_INT, enum machine_mode), ! 
void *constfundata, unsigned int align) { unsigned HOST_WIDE_INT l; unsigned int max_size; --- 2198,2211 ---- /* Determine whether the LEN bytes generated by CONSTFUN can be stored to memory using several move instructions. CONSTFUNDATA is a pointer which will be passed as argument in every CONSTFUN call. ! ALIGN is maximum alignment we can assume. MEMSETP is true if this is ! a memset operation and false if it's a copy of a constant string. ! Return nonzero if a call to store_by_pieces should succeed. */ int can_store_by_pieces (unsigned HOST_WIDE_INT len, rtx (*constfun) (void *, HOST_WIDE_INT, enum machine_mode), ! void *constfundata, unsigned int align, bool memsetp) { unsigned HOST_WIDE_INT l; unsigned int max_size; *************** can_store_by_pieces (unsigned HOST_WIDE_ *** 2210,2216 **** if (len == 0) return 1; ! if (! STORE_BY_PIECES_P (len, align)) return 0; tmode = mode_for_size (STORE_MAX_PIECES * BITS_PER_UNIT, MODE_INT, 1); --- 2218,2226 ---- if (len == 0) return 1; ! if (! (memsetp ! ? SET_BY_PIECES_P (len, align) ! : STORE_BY_PIECES_P (len, align))) return 0; tmode = mode_for_size (STORE_MAX_PIECES * BITS_PER_UNIT, MODE_INT, 1); *************** can_store_by_pieces (unsigned HOST_WIDE_ *** 2285,2291 **** /* Generate several move instructions to store LEN bytes generated by CONSTFUN to block TO. (A MEM rtx with BLKmode). CONSTFUNDATA is a pointer which will be passed as argument in every CONSTFUN call. ! ALIGN is maximum alignment we can assume. If ENDP is 0 return to, if ENDP is 1 return memory at the end ala mempcpy, and if ENDP is 2 return memory the end minus one byte ala stpcpy. */ --- 2295,2302 ---- /* Generate several move instructions to store LEN bytes generated by CONSTFUN to block TO. (A MEM rtx with BLKmode). CONSTFUNDATA is a pointer which will be passed as argument in every CONSTFUN call. ! ALIGN is maximum alignment we can assume. MEMSETP is true if this is ! a memset operation and false if it's a copy of a constant string. 
If ENDP is 0 return to, if ENDP is 1 return memory at the end ala mempcpy, and if ENDP is 2 return memory the end minus one byte ala stpcpy. */ *************** can_store_by_pieces (unsigned HOST_WIDE_ *** 2293,2299 **** rtx store_by_pieces (rtx to, unsigned HOST_WIDE_INT len, rtx (*constfun) (void *, HOST_WIDE_INT, enum machine_mode), ! void *constfundata, unsigned int align, int endp) { struct store_by_pieces data; --- 2304,2310 ---- rtx store_by_pieces (rtx to, unsigned HOST_WIDE_INT len, rtx (*constfun) (void *, HOST_WIDE_INT, enum machine_mode), ! void *constfundata, unsigned int align, bool memsetp, int endp) { struct store_by_pieces data; *************** store_by_pieces (rtx to, unsigned HOST_W *** 2303,2309 **** return to; } ! gcc_assert (STORE_BY_PIECES_P (len, align)); data.constfun = constfun; data.constfundata = constfundata; data.len = len; --- 2314,2322 ---- return to; } ! gcc_assert (memsetp ! ? SET_BY_PIECES_P (len, align) ! : STORE_BY_PIECES_P (len, align)); data.constfun = constfun; data.constfundata = constfundata; data.len = len; Index: gcc/expr.h =================================================================== *** gcc/expr.h (revision 127324) --- gcc/expr.h (working copy) *************** enum expand_modifier {EXPAND_NORMAL = 0, *** 84,89 **** --- 84,96 ---- #define CLEAR_RATIO (optimize_size ? 3 : 15) #endif #endif + + /* If a memory set (to value other than zero) operation would take + SET_RATIO or more simple move-instruction sequences, we will do a movmem + or libcall instead. */ + #ifndef SET_RATIO + #define SET_RATIO MOVE_RATIO + #endif enum direction {none, upward, downward}; *************** extern int can_move_by_pieces (unsigned *** 443,462 **** CONSTFUN with several move instructions by store_by_pieces function. CONSTFUNDATA is a pointer which will be passed as argument in every CONSTFUN call. ! ALIGN is maximum alignment we can assume. 
*/ extern int can_store_by_pieces (unsigned HOST_WIDE_INT, rtx (*) (void *, HOST_WIDE_INT, enum machine_mode), ! void *, unsigned int); /* Generate several move instructions to store LEN bytes generated by CONSTFUN to block TO. (A MEM rtx with BLKmode). CONSTFUNDATA is a pointer which will be passed as argument in every CONSTFUN call. ALIGN is maximum alignment we can assume. Returns TO + LEN. */ extern rtx store_by_pieces (rtx, unsigned HOST_WIDE_INT, rtx (*) (void *, HOST_WIDE_INT, enum machine_mode), ! void *, unsigned int, int); /* Emit insns to set X from Y. */ extern rtx emit_move_insn (rtx, rtx); --- 450,472 ---- CONSTFUN with several move instructions by store_by_pieces function. CONSTFUNDATA is a pointer which will be passed as argument in every CONSTFUN call. ! ALIGN is maximum alignment we can assume. ! MEMSETP is true if this is a real memset/bzero, not a copy ! of a const string. */ extern int can_store_by_pieces (unsigned HOST_WIDE_INT, rtx (*) (void *, HOST_WIDE_INT, enum machine_mode), ! void *, unsigned int, bool); /* Generate several move instructions to store LEN bytes generated by CONSTFUN to block TO. (A MEM rtx with BLKmode). CONSTFUNDATA is a pointer which will be passed as argument in every CONSTFUN call. ALIGN is maximum alignment we can assume. + MEMSETP is true if this is a real memset/bzero, not a copy. Returns TO + LEN. */ extern rtx store_by_pieces (rtx, unsigned HOST_WIDE_INT, rtx (*) (void *, HOST_WIDE_INT, enum machine_mode), ! void *, unsigned int, bool, int); /* Emit insns to set X from Y. 
*/ extern rtx emit_move_insn (rtx, rtx); Index: gcc/builtins.c =================================================================== *** gcc/builtins.c (revision 127324) --- gcc/builtins.c (working copy) *************** expand_builtin_memcpy (tree exp, rtx tar *** 3371,3381 **** && GET_CODE (len_rtx) == CONST_INT && (unsigned HOST_WIDE_INT) INTVAL (len_rtx) <= strlen (src_str) + 1 && can_store_by_pieces (INTVAL (len_rtx), builtin_memcpy_read_str, ! (void *) src_str, dest_align)) { dest_mem = store_by_pieces (dest_mem, INTVAL (len_rtx), builtin_memcpy_read_str, ! (void *) src_str, dest_align, 0); dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX); dest_mem = convert_memory_address (ptr_mode, dest_mem); return dest_mem; --- 3371,3381 ---- && GET_CODE (len_rtx) == CONST_INT && (unsigned HOST_WIDE_INT) INTVAL (len_rtx) <= strlen (src_str) + 1 && can_store_by_pieces (INTVAL (len_rtx), builtin_memcpy_read_str, ! (void *) src_str, dest_align, false)) { dest_mem = store_by_pieces (dest_mem, INTVAL (len_rtx), builtin_memcpy_read_str, ! (void *) src_str, dest_align, false, 0); dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX); dest_mem = convert_memory_address (ptr_mode, dest_mem); return dest_mem; *************** expand_builtin_mempcpy_args (tree dest, *** 3484,3496 **** && GET_CODE (len_rtx) == CONST_INT && (unsigned HOST_WIDE_INT) INTVAL (len_rtx) <= strlen (src_str) + 1 && can_store_by_pieces (INTVAL (len_rtx), builtin_memcpy_read_str, ! (void *) src_str, dest_align)) { dest_mem = get_memory_rtx (dest, len); set_mem_align (dest_mem, dest_align); dest_mem = store_by_pieces (dest_mem, INTVAL (len_rtx), builtin_memcpy_read_str, ! 
(void *) src_str, dest_align, endp); dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX); dest_mem = convert_memory_address (ptr_mode, dest_mem); return dest_mem; --- 3484,3497 ---- && GET_CODE (len_rtx) == CONST_INT && (unsigned HOST_WIDE_INT) INTVAL (len_rtx) <= strlen (src_str) + 1 && can_store_by_pieces (INTVAL (len_rtx), builtin_memcpy_read_str, ! (void *) src_str, dest_align, false)) { dest_mem = get_memory_rtx (dest, len); set_mem_align (dest_mem, dest_align); dest_mem = store_by_pieces (dest_mem, INTVAL (len_rtx), builtin_memcpy_read_str, ! (void *) src_str, dest_align, ! false, endp); dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX); dest_mem = convert_memory_address (ptr_mode, dest_mem); return dest_mem; *************** expand_builtin_strncpy (tree exp, rtx ta *** 3832,3844 **** if (!p || dest_align == 0 || !host_integerp (len, 1) || !can_store_by_pieces (tree_low_cst (len, 1), builtin_strncpy_read_str, ! (void *) p, dest_align)) return NULL_RTX; dest_mem = get_memory_rtx (dest, len); store_by_pieces (dest_mem, tree_low_cst (len, 1), builtin_strncpy_read_str, ! (void *) p, dest_align, 0); dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX); dest_mem = convert_memory_address (ptr_mode, dest_mem); return dest_mem; --- 3833,3845 ---- if (!p || dest_align == 0 || !host_integerp (len, 1) || !can_store_by_pieces (tree_low_cst (len, 1), builtin_strncpy_read_str, ! (void *) p, dest_align, false)) return NULL_RTX; dest_mem = get_memory_rtx (dest, len); store_by_pieces (dest_mem, tree_low_cst (len, 1), builtin_strncpy_read_str, ! (void *) p, dest_align, false, 0); dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX); dest_mem = convert_memory_address (ptr_mode, dest_mem); return dest_mem; *************** expand_builtin_memset_args (tree dest, t *** 3966,3979 **** * We can't pass builtin_memset_gen_str as that emits RTL. 
*/ c = 1; if (host_integerp (len, 1) - && !(optimize_size && tree_low_cst (len, 1) > 1) && can_store_by_pieces (tree_low_cst (len, 1), ! builtin_memset_read_str, &c, dest_align)) { val_rtx = force_reg (TYPE_MODE (unsigned_char_type_node), val_rtx); store_by_pieces (dest_mem, tree_low_cst (len, 1), ! builtin_memset_gen_str, val_rtx, dest_align, 0); } else if (!set_storage_via_setmem (dest_mem, len_rtx, val_rtx, dest_align, expected_align, --- 3967,3981 ---- * We can't pass builtin_memset_gen_str as that emits RTL. */ c = 1; if (host_integerp (len, 1) && can_store_by_pieces (tree_low_cst (len, 1), ! builtin_memset_read_str, &c, dest_align, ! true)) { val_rtx = force_reg (TYPE_MODE (unsigned_char_type_node), val_rtx); store_by_pieces (dest_mem, tree_low_cst (len, 1), ! builtin_memset_gen_str, val_rtx, dest_align, ! true, 0); } else if (!set_storage_via_setmem (dest_mem, len_rtx, val_rtx, dest_align, expected_align, *************** expand_builtin_memset_args (tree dest, t *** 3991,4001 **** if (c) { if (host_integerp (len, 1) - && !(optimize_size && tree_low_cst (len, 1) > 1) && can_store_by_pieces (tree_low_cst (len, 1), ! builtin_memset_read_str, &c, dest_align)) store_by_pieces (dest_mem, tree_low_cst (len, 1), ! builtin_memset_read_str, &c, dest_align, 0); else if (!set_storage_via_setmem (dest_mem, len_rtx, GEN_INT (c), dest_align, expected_align, expected_size)) --- 3993,4003 ---- if (c) { if (host_integerp (len, 1) && can_store_by_pieces (tree_low_cst (len, 1), ! builtin_memset_read_str, &c, dest_align, ! true)) store_by_pieces (dest_mem, tree_low_cst (len, 1), ! 
builtin_memset_read_str, &c, dest_align, true, 0); else if (!set_storage_via_setmem (dest_mem, len_rtx, GEN_INT (c), dest_align, expected_align, expected_size)) Index: gcc/value-prof.c =================================================================== *** gcc/value-prof.c (revision 127324) --- gcc/value-prof.c (working copy) *************** tree_stringops_transform (block_stmt_ite *** 1392,1404 **** case BUILT_IN_MEMSET: if (!can_store_by_pieces (val, builtin_memset_read_str, CALL_EXPR_ARG (call, 1), ! dest_align)) return false; break; case BUILT_IN_BZERO: if (!can_store_by_pieces (val, builtin_memset_read_str, integer_zero_node, ! dest_align)) return false; break; default: --- 1392,1404 ---- case BUILT_IN_MEMSET: if (!can_store_by_pieces (val, builtin_memset_read_str, CALL_EXPR_ARG (call, 1), ! dest_align, true)) return false; break; case BUILT_IN_BZERO: if (!can_store_by_pieces (val, builtin_memset_read_str, integer_zero_node, ! dest_align, true)) return false; break; default: Index: gcc/config/sh/sh.h =================================================================== *** gcc/config/sh/sh.h (revision 127324) --- gcc/config/sh/sh.h (working copy) *************** struct sh_args { *** 2184,2189 **** --- 2184,2191 ---- (move_by_pieces_ninsns (SIZE, ALIGN, STORE_MAX_PIECES + 1) \ < (TARGET_SMALLCODE ? 2 : ((ALIGN >= 32) ? 16 : 2))) + #define SET_BY_PIECES_P(SIZE, ALIGN) STORE_BY_PIECES_P(SIZE, ALIGN) + /* Macros to check register numbers against specific register classes. */ /* These assume that REGNO is a hard or pseudo reg number. Index: gcc/config/s390/s390.h =================================================================== *** gcc/config/s390/s390.h (revision 127324) --- gcc/config/s390/s390.h (working copy) *************** extern struct rtx_def *s390_compare_op0, *** 803,812 **** || (TARGET_64BIT && (SIZE) == 8) ) /* This macro is used to determine whether store_by_pieces should be ! called to "memset" storage with byte values other than zero, or ! 
to "memcpy" storage when the source is a constant string. */ #define STORE_BY_PIECES_P(SIZE, ALIGN) MOVE_BY_PIECES_P (SIZE, ALIGN) /* Don't perform CSE on function addresses. */ #define NO_FUNCTION_CSE --- 803,815 ---- || (TARGET_64BIT && (SIZE) == 8) ) /* This macro is used to determine whether store_by_pieces should be ! called to "memcpy" storage when the source is a constant string. */ #define STORE_BY_PIECES_P(SIZE, ALIGN) MOVE_BY_PIECES_P (SIZE, ALIGN) + /* Likewise to decide whether to "memset" storage with byte values + other than zero. */ + #define SET_BY_PIECES_P(SIZE, ALIGN) STORE_BY_PIECES_P (SIZE, ALIGN) + /* Don't perform CSE on function addresses. */ #define NO_FUNCTION_CSE Index: gcc/config/mips/mips.opt =================================================================== *** gcc/config/mips/mips.opt (revision 127325) --- gcc/config/mips/mips.opt (working copy) *************** Target Report RejectNegative Mask(LONG64 *** 173,179 **** Use a 64-bit long type mmemcpy ! Target Report Var(TARGET_MEMCPY) Don't optimize block moves mmips-tfile --- 173,179 ---- Use a 64-bit long type mmemcpy ! Target Report Mask(MEMCPY) Don't optimize block moves mmips-tfile Index: gcc/config/mips/mips.c =================================================================== *** gcc/config/mips/mips.c (revision 127325) --- gcc/config/mips/mips.c (working copy) *************** override_options (void) *** 5299,5304 **** --- 5299,5309 ---- flag_delayed_branch = 0; } + /* Prefer a call to memcpy over inline code when optimizing for size, + though see MOVE_RATIO in mips.h. 
*/ + if (optimize_size && (target_flags_explicit & MASK_MEMCPY) == 0) + target_flags |= MASK_MEMCPY; + #ifdef MIPS_TFMODE_FORMAT REAL_MODE_FORMAT (TFmode) = &MIPS_TFMODE_FORMAT; #endif Index: gcc/config/mips/mips.h =================================================================== *** gcc/config/mips/mips.h (revision 127325) --- gcc/config/mips/mips.h (working copy) *************** while (0) *** 2780,2785 **** --- 2780,2836 ---- #undef PTRDIFF_TYPE #define PTRDIFF_TYPE (POINTER_SIZE == 64 ? "long int" : "int") + + /* The base cost of a memcpy call, for MOVE_RATIO and friends. These + values were determined experimentally by benchmarking with CSiBE. + In theory, the call overhead is higher for TARGET_ABICALLS (especially + for o32 where we have to restore $gp afterwards as well as make an + indirect call), but in practice, bumping this up higher for + TARGET_ABICALLS doesn't make much difference to code size. */ + + #define MIPS_CALL_RATIO 8 + + /* Define MOVE_RATIO to encourage use of movmemsi when enabled, + since it should always generate code at least as good as + move_by_pieces(). But when inline movmemsi pattern is disabled + (i.e., with -mips16 or -mmemcpy), instead use a value approximating + the length of a memcpy call sequence, so that move_by_pieces will + generate inline code if it is shorter than a function call. + Since move_by_pieces_ninsns() counts memory-to-memory moves, but + we'll have to generate a load/store pair for each, halve the value of + MIPS_CALL_RATIO to take that into account. + The default value for MOVE_RATIO when HAVE_movmemsi is true is 2. + There is no point to setting it to less than this to try to disable + move_by_pieces entirely, because that also disables some desirable + tree-level optimizations, specifically related to optimizing a + one-byte string copy into a simple move byte operation. */ + + #define MOVE_RATIO \ + ((TARGET_MIPS16 || TARGET_MEMCPY) ? 
MIPS_CALL_RATIO / 2 : 2) + + /* For CLEAR_RATIO, when optimizing for size, give a better estimate + of the length of a memset call, but use the default otherwise. */ + + #define CLEAR_RATIO \ + (optimize_size ? MIPS_CALL_RATIO : 15) + + /* This is similar to CLEAR_RATIO, but for a non-zero constant, so when + optimizing for size adjust the ratio to account for the overhead of + loading the constant and replicating it across the word. */ + + #define SET_RATIO \ + (optimize_size ? MIPS_CALL_RATIO - 2 : 15) + + /* STORE_BY_PIECES_P can be used when copying a constant string, but + in that case each word takes 3 insns (lui, ori, sw), or more in + 64-bit mode, instead of 2 (lw, sw). For now we always fail this + and let the move_by_pieces code copy the string from read-only + memory. In the future, this could be tuned further for multi-issue + CPUs that can issue stores down one pipe and arithmetic instructions + down another; in that case, the lui/ori/sw combination would be a + win for long enough strings. */ + + #define STORE_BY_PIECES_P(SIZE, ALIGN) 0 #ifndef __mips16 /* Since the bits of the _init and _fini function is spread across --------------000108000701060004050209--
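
For readers following the ratio arithmetic in the mips.h hunk, here is a minimal standalone sketch in C. It is not GCC code. Only MIPS_CALL_RATIO = 8 and the three ratio expressions are taken from the patch. The names struct opts, move_ratio, clear_ratio, set_ratio, count_pieces and set_by_pieces_p are hypothetical, and count_pieces is a simplified greedy stand-in for move_by_pieces_ninsns (alignment given in bytes, power-of-two piece widths), not the real function.

```c
#include <assert.h>
#include <stdbool.h>

/* Value taken from the mips.h hunk of the patch.  */
#define MIPS_CALL_RATIO 8

/* Hypothetical stand-in for the compiler's option state.  */
struct opts
{
  bool optimize_size;	/* models optimize_size (-Os) */
  bool mips16;		/* models TARGET_MIPS16 */
  bool mmemcpy;		/* models TARGET_MEMCPY (-mmemcpy) */
};

/* The three ratio expressions, transcribed from the patch.  */
unsigned move_ratio (struct opts o)
{
  return (o.mips16 || o.mmemcpy) ? MIPS_CALL_RATIO / 2 : 2;
}

unsigned clear_ratio (struct opts o)
{
  return o.optimize_size ? MIPS_CALL_RATIO : 15;
}

unsigned set_ratio (struct opts o)
{
  return o.optimize_size ? MIPS_CALL_RATIO - 2 : 15;
}

/* Simplified model (an assumption, not GCC's algorithm): count the
   stores needed for SIZE bytes, widest piece first, never using a
   piece wider than ALIGN_BYTES.  MAX_PIECE must be a power of two.  */
unsigned count_pieces (unsigned size, unsigned align_bytes,
		       unsigned max_piece)
{
  unsigned n = 0;

  assert (max_piece != 0);
  for (unsigned w = max_piece; w >= 1; w /= 2)
    {
      if (w > align_bytes)
	continue;
      n += size / w;
      size %= w;
    }
  return n;
}

/* Shape of the default SET_BY_PIECES_P test: expand inline only if
   the piece count is strictly below SET_RATIO.  Assumes 4-byte words.  */
bool set_by_pieces_p (struct opts o, unsigned size, unsigned align_bytes)
{
  return count_pieces (size, align_bytes, 4) < set_ratio (o);
}
```

Under these assumptions, with -Os (which the patch makes default to -mmemcpy) the ratios come out as MOVE_RATIO 4, CLEAR_RATIO 8 and SET_RATIO 6. A word-aligned 16-byte nonzero memset needs 4 stores in this model and would be expanded inline, while a 24-byte one needs 6 and would fall through to the setmem pattern or a library call.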