public inbox for gcc-patches@gcc.gnu.org
* PATCH: fine-tuning for can_store_by_pieces
@ 2007-08-15 17:15 Sandra Loosemore
  2007-08-15 17:22 ` Andrew Pinski
  2007-08-16  8:34 ` Richard Sandiford
  0 siblings, 2 replies; 30+ messages in thread
From: Sandra Loosemore @ 2007-08-15 17:15 UTC (permalink / raw)
  To: GCC Patches
  Cc: Nigel Stephens, Guy Morrogh, David Ung, Thiemo Seufer, Richard Sandiford

[-- Attachment #1: Type: text/plain, Size: 835 bytes --]

On MIPS we can get better code by having can_store_by_pieces differentiate 
between the cases where it's used for memset operations, and those where it's 
used to copy a string constant.  This patch introduces new SET_RATIO and 
SET_BY_PIECES_P macros, with appropriate defaults to preserve the existing 
behavior.  I checked other targets and made the ones that override the default 
STORE_BY_PIECES_P clone the same definition for SET_BY_PIECES_P.
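
To make the distinction concrete, the two kinds of calls the new macros 
separate look like this (illustrative fragments only; the function names are 
just for exposition, and the real heuristics are in the attached patch):

void set_case (char *buf)
{
   /* Non-zero memset: decided by SET_BY_PIECES_P, which defaults to a
      test against SET_RATIO.  */
   __builtin_memset (buf, 'f', 16);
}

void store_case (char *buf)
{
   /* Copy of a constant string: decided by STORE_BY_PIECES_P, which
      still defaults to a test against MOVE_RATIO.  */
   __builtin_strcpy (buf, "a constant");
}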

I used CSiBE to run some code size measurements with -Os.  Together, these bits 
add up to about 0.5% total space savings for both MIPS32R2 and MIPS64.  With 
-mips16 there was just a tiny improvement.

Look OK?  I haven't yet tested on other platforms besides MIPS but will do a 
bootstrap on i686 before checking it in, if there are no objections to the patch 
otherwise.

-Sandra


[-- Attachment #2: 31-frob-by-pieces.log --]
[-- Type: text/x-log, Size: 1127 bytes --]

2007-08-15  Sandra Loosemore  <sandra@codesourcery.com>
            Nigel Stephens <nigel@mips.com>

	gcc/

	* doc/tm.texi (SET_RATIO, SET_BY_PIECES_P): Document new macros.
	(STORE_BY_PIECES_P): No longer applies to __builtin_memset.
	* expr.c (SET_BY_PIECES_P): Define.
	(can_store_by_pieces, store_by_pieces): Add MEMSETP argument; use
	it to decide whether to use SET_BY_PIECES_P or STORE_BY_PIECES_P.
	* expr.h (SET_RATIO): Define.
	(can_store_by_pieces, store_by_pieces):	Update prototypes.
	* builtins.c (expand_builtin_memcpy): Pass MEMSETP argument to
	can_store_by_pieces/store_by_pieces.
	(expand_builtin_mempcpy_args): Likewise.
	(expand_builtin_strncpy): Likewise.
	(expand_builtin_memset_args): Likewise.
	* value-prof.c (tree_stringops_transform): Likewise.

	* config/sh/sh.h (SET_BY_PIECES_P): Clone from STORE_BY_PIECES_P.
	* config/s390/s390.h (SET_BY_PIECES_P): Likewise.

	* config/mips/mips.opt (mmemcpy): Change from Var to Mask.
	* config/mips/mips.c (override_options): Make -Os default to -mmemcpy.
	* config/mips/mips.h (MOVE_RATIO, CLEAR_RATIO, SET_RATIO): Define.
	(STORE_BY_PIECES_P): Define to 0.

[-- Attachment #3: 31-frob-by-pieces.patch --]
[-- Type: text/x-patch, Size: 21632 bytes --]

Index: gcc/doc/tm.texi
===================================================================
*** gcc/doc/tm.texi	(revision 127324)
--- gcc/doc/tm.texi	(working copy)
*************** will be used.  Defaults to 1 if @code{mo
*** 5893,5904 ****
  than @code{CLEAR_RATIO}.
  @end defmac
  
  @defmac STORE_BY_PIECES_P (@var{size}, @var{alignment})
  A C expression used to determine whether @code{store_by_pieces} will be
! used to set a chunk of memory to a constant value, or whether some other
! mechanism will be used.  Used by @code{__builtin_memset} when storing
! values other than constant zero and by @code{__builtin_strcpy} when
! when called with a constant source string.
  Defaults to 1 if @code{move_by_pieces_ninsns} returns less
  than @code{MOVE_RATIO}.
  @end defmac
--- 5893,5922 ----
  than @code{CLEAR_RATIO}.
  @end defmac
  
+ @defmac SET_RATIO
+ The threshold of number of scalar move insns, @emph{below} which a sequence
+ of insns should be generated to set memory to a constant value, instead of
+ a block set insn or a library call.  
+ Increasing the value will always make code faster, but
+ eventually incurs high cost in increased code size.
+ 
+ If you don't define this, it defaults to the value of @code{MOVE_RATIO}.
+ @end defmac
+ 
+ @defmac SET_BY_PIECES_P (@var{size}, @var{alignment})
+ A C expression used to determine whether @code{store_by_pieces} will be
+ used to set a chunk of memory to a constant value, or whether some 
+ other mechanism will be used.  Used by @code{__builtin_memset} when 
+ storing values other than constant zero.
+ Defaults to 1 if @code{move_by_pieces_ninsns} returns less
+ than @code{SET_RATIO}.
+ @end defmac
+ 
  @defmac STORE_BY_PIECES_P (@var{size}, @var{alignment})
  A C expression used to determine whether @code{store_by_pieces} will be
! used to set a chunk of memory to a constant string value, or whether some 
! other mechanism will be used.  Used by @code{__builtin_strcpy} when
! called with a constant source string.
  Defaults to 1 if @code{move_by_pieces_ninsns} returns less
  than @code{MOVE_RATIO}.
  @end defmac
Index: gcc/expr.c
===================================================================
*** gcc/expr.c	(revision 127324)
--- gcc/expr.c	(working copy)
*************** static bool float_extend_from_mem[NUM_MA
*** 186,193 ****
  #endif
  
  /* This macro is used to determine whether store_by_pieces should be
!    called to "memset" storage with byte values other than zero, or
!    to "memcpy" storage when the source is a constant string.  */
  #ifndef STORE_BY_PIECES_P
  #define STORE_BY_PIECES_P(SIZE, ALIGN) \
    (move_by_pieces_ninsns (SIZE, ALIGN, STORE_MAX_PIECES + 1) \
--- 186,200 ----
  #endif
  
  /* This macro is used to determine whether store_by_pieces should be
!    called to "memset" storage with byte values other than zero.  */
! #ifndef SET_BY_PIECES_P
! #define SET_BY_PIECES_P(SIZE, ALIGN) \
!   (move_by_pieces_ninsns (SIZE, ALIGN, STORE_MAX_PIECES + 1) \
!    < (unsigned int) SET_RATIO)
! #endif
! 
! /* This macro is used to determine whether store_by_pieces should be
!    called to "memcpy" storage when the source is a constant string.  */
  #ifndef STORE_BY_PIECES_P
  #define STORE_BY_PIECES_P(SIZE, ALIGN) \
    (move_by_pieces_ninsns (SIZE, ALIGN, STORE_MAX_PIECES + 1) \
*************** use_group_regs (rtx *call_fusage, rtx re
*** 2191,2203 ****
  /* Determine whether the LEN bytes generated by CONSTFUN can be
     stored to memory using several move instructions.  CONSTFUNDATA is
     a pointer which will be passed as argument in every CONSTFUN call.
!    ALIGN is maximum alignment we can assume.  Return nonzero if a
!    call to store_by_pieces should succeed.  */
  
  int
  can_store_by_pieces (unsigned HOST_WIDE_INT len,
  		     rtx (*constfun) (void *, HOST_WIDE_INT, enum machine_mode),
! 		     void *constfundata, unsigned int align)
  {
    unsigned HOST_WIDE_INT l;
    unsigned int max_size;
--- 2198,2211 ----
  /* Determine whether the LEN bytes generated by CONSTFUN can be
     stored to memory using several move instructions.  CONSTFUNDATA is
     a pointer which will be passed as argument in every CONSTFUN call.
!    ALIGN is maximum alignment we can assume.  MEMSETP is true if this is
!    a memset operation and false if it's a copy of a constant string.
!    Return nonzero if a call to store_by_pieces should succeed.  */
  
  int
  can_store_by_pieces (unsigned HOST_WIDE_INT len,
  		     rtx (*constfun) (void *, HOST_WIDE_INT, enum machine_mode),
! 		     void *constfundata, unsigned int align, int memsetp)
  {
    unsigned HOST_WIDE_INT l;
    unsigned int max_size;
*************** can_store_by_pieces (unsigned HOST_WIDE_
*** 2210,2216 ****
    if (len == 0)
      return 1;
  
!   if (! STORE_BY_PIECES_P (len, align))
      return 0;
  
    tmode = mode_for_size (STORE_MAX_PIECES * BITS_PER_UNIT, MODE_INT, 1);
--- 2218,2226 ----
    if (len == 0)
      return 1;
  
!   if (! (memsetp 
! 	 ? SET_BY_PIECES_P (len, align)
! 	 : STORE_BY_PIECES_P (len, align)))
      return 0;
  
    tmode = mode_for_size (STORE_MAX_PIECES * BITS_PER_UNIT, MODE_INT, 1);
*************** can_store_by_pieces (unsigned HOST_WIDE_
*** 2285,2291 ****
  /* Generate several move instructions to store LEN bytes generated by
     CONSTFUN to block TO.  (A MEM rtx with BLKmode).  CONSTFUNDATA is a
     pointer which will be passed as argument in every CONSTFUN call.
!    ALIGN is maximum alignment we can assume.
     If ENDP is 0 return to, if ENDP is 1 return memory at the end ala
     mempcpy, and if ENDP is 2 return memory the end minus one byte ala
     stpcpy.  */
--- 2295,2302 ----
  /* Generate several move instructions to store LEN bytes generated by
     CONSTFUN to block TO.  (A MEM rtx with BLKmode).  CONSTFUNDATA is a
     pointer which will be passed as argument in every CONSTFUN call.
!    ALIGN is maximum alignment we can assume.  MEMSETP is true if this is
!    a memset operation and false if it's a copy of a constant string.
     If ENDP is 0 return to, if ENDP is 1 return memory at the end ala
     mempcpy, and if ENDP is 2 return memory the end minus one byte ala
     stpcpy.  */
*************** can_store_by_pieces (unsigned HOST_WIDE_
*** 2293,2299 ****
  rtx
  store_by_pieces (rtx to, unsigned HOST_WIDE_INT len,
  		 rtx (*constfun) (void *, HOST_WIDE_INT, enum machine_mode),
! 		 void *constfundata, unsigned int align, int endp)
  {
    struct store_by_pieces data;
  
--- 2304,2310 ----
  rtx
  store_by_pieces (rtx to, unsigned HOST_WIDE_INT len,
  		 rtx (*constfun) (void *, HOST_WIDE_INT, enum machine_mode),
! 		 void *constfundata, unsigned int align, int memsetp, int endp)
  {
    struct store_by_pieces data;
  
*************** store_by_pieces (rtx to, unsigned HOST_W
*** 2303,2309 ****
        return to;
      }
  
!   gcc_assert (STORE_BY_PIECES_P (len, align));
    data.constfun = constfun;
    data.constfundata = constfundata;
    data.len = len;
--- 2314,2322 ----
        return to;
      }
  
!   gcc_assert (memsetp
! 	      ? SET_BY_PIECES_P (len, align)
! 	      : STORE_BY_PIECES_P (len, align));
    data.constfun = constfun;
    data.constfundata = constfundata;
    data.len = len;
Index: gcc/expr.h
===================================================================
*** gcc/expr.h	(revision 127324)
--- gcc/expr.h	(working copy)
*************** enum expand_modifier {EXPAND_NORMAL = 0,
*** 84,89 ****
--- 84,96 ----
  #define CLEAR_RATIO (optimize_size ? 3 : 15)
  #endif
  #endif
+ 
+ /* If a memory set (to value other than zero) operation would take
+    SET_RATIO or more simple move-instruction sequences, we will do a movmem
+    or libcall instead.  */
+ #ifndef SET_RATIO
+ #define SET_RATIO MOVE_RATIO
+ #endif
  \f
  enum direction {none, upward, downward};
  
*************** extern int can_move_by_pieces (unsigned 
*** 443,462 ****
     CONSTFUN with several move instructions by store_by_pieces
     function.  CONSTFUNDATA is a pointer which will be passed as argument
     in every CONSTFUN call.
!    ALIGN is maximum alignment we can assume.  */
  extern int can_store_by_pieces (unsigned HOST_WIDE_INT,
  				rtx (*) (void *, HOST_WIDE_INT,
  					 enum machine_mode),
! 				void *, unsigned int);
  
  /* Generate several move instructions to store LEN bytes generated by
     CONSTFUN to block TO.  (A MEM rtx with BLKmode).  CONSTFUNDATA is a
     pointer which will be passed as argument in every CONSTFUN call.
     ALIGN is maximum alignment we can assume.
     Returns TO + LEN.  */
  extern rtx store_by_pieces (rtx, unsigned HOST_WIDE_INT,
  			    rtx (*) (void *, HOST_WIDE_INT, enum machine_mode),
! 			    void *, unsigned int, int);
  
  /* Emit insns to set X from Y.  */
  extern rtx emit_move_insn (rtx, rtx);
--- 450,472 ----
     CONSTFUN with several move instructions by store_by_pieces
     function.  CONSTFUNDATA is a pointer which will be passed as argument
     in every CONSTFUN call.
!    ALIGN is maximum alignment we can assume.
!    MEMSETP is true if this is a real memset/bzero, not a copy
!    of a const string.  */
  extern int can_store_by_pieces (unsigned HOST_WIDE_INT,
  				rtx (*) (void *, HOST_WIDE_INT,
  					 enum machine_mode),
! 				void *, unsigned int, int);
  
  /* Generate several move instructions to store LEN bytes generated by
     CONSTFUN to block TO.  (A MEM rtx with BLKmode).  CONSTFUNDATA is a
     pointer which will be passed as argument in every CONSTFUN call.
     ALIGN is maximum alignment we can assume.
+    MEMSETP is true if this is a real memset/bzero, not a copy.
     Returns TO + LEN.  */
  extern rtx store_by_pieces (rtx, unsigned HOST_WIDE_INT,
  			    rtx (*) (void *, HOST_WIDE_INT, enum machine_mode),
! 			    void *, unsigned int, int, int);
  
  /* Emit insns to set X from Y.  */
  extern rtx emit_move_insn (rtx, rtx);
Index: gcc/builtins.c
===================================================================
*** gcc/builtins.c	(revision 127324)
--- gcc/builtins.c	(working copy)
*************** expand_builtin_memcpy (tree exp, rtx tar
*** 3371,3381 ****
  	  && GET_CODE (len_rtx) == CONST_INT
  	  && (unsigned HOST_WIDE_INT) INTVAL (len_rtx) <= strlen (src_str) + 1
  	  && can_store_by_pieces (INTVAL (len_rtx), builtin_memcpy_read_str,
! 				  (void *) src_str, dest_align))
  	{
  	  dest_mem = store_by_pieces (dest_mem, INTVAL (len_rtx),
  				      builtin_memcpy_read_str,
! 				      (void *) src_str, dest_align, 0);
  	  dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX);
  	  dest_mem = convert_memory_address (ptr_mode, dest_mem);
  	  return dest_mem;
--- 3371,3381 ----
  	  && GET_CODE (len_rtx) == CONST_INT
  	  && (unsigned HOST_WIDE_INT) INTVAL (len_rtx) <= strlen (src_str) + 1
  	  && can_store_by_pieces (INTVAL (len_rtx), builtin_memcpy_read_str,
! 				  (void *) src_str, dest_align, 0))
  	{
  	  dest_mem = store_by_pieces (dest_mem, INTVAL (len_rtx),
  				      builtin_memcpy_read_str,
! 				      (void *) src_str, dest_align, 0, 0);
  	  dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX);
  	  dest_mem = convert_memory_address (ptr_mode, dest_mem);
  	  return dest_mem;
*************** expand_builtin_mempcpy_args (tree dest, 
*** 3484,3496 ****
  	  && GET_CODE (len_rtx) == CONST_INT
  	  && (unsigned HOST_WIDE_INT) INTVAL (len_rtx) <= strlen (src_str) + 1
  	  && can_store_by_pieces (INTVAL (len_rtx), builtin_memcpy_read_str,
! 				  (void *) src_str, dest_align))
  	{
  	  dest_mem = get_memory_rtx (dest, len);
  	  set_mem_align (dest_mem, dest_align);
  	  dest_mem = store_by_pieces (dest_mem, INTVAL (len_rtx),
  				      builtin_memcpy_read_str,
! 				      (void *) src_str, dest_align, endp);
  	  dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX);
  	  dest_mem = convert_memory_address (ptr_mode, dest_mem);
  	  return dest_mem;
--- 3484,3496 ----
  	  && GET_CODE (len_rtx) == CONST_INT
  	  && (unsigned HOST_WIDE_INT) INTVAL (len_rtx) <= strlen (src_str) + 1
  	  && can_store_by_pieces (INTVAL (len_rtx), builtin_memcpy_read_str,
! 				  (void *) src_str, dest_align, 0))
  	{
  	  dest_mem = get_memory_rtx (dest, len);
  	  set_mem_align (dest_mem, dest_align);
  	  dest_mem = store_by_pieces (dest_mem, INTVAL (len_rtx),
  				      builtin_memcpy_read_str,
! 				      (void *) src_str, dest_align, 0, endp);
  	  dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX);
  	  dest_mem = convert_memory_address (ptr_mode, dest_mem);
  	  return dest_mem;
*************** expand_builtin_strncpy (tree exp, rtx ta
*** 3832,3844 ****
  	  if (!p || dest_align == 0 || !host_integerp (len, 1)
  	      || !can_store_by_pieces (tree_low_cst (len, 1),
  				       builtin_strncpy_read_str,
! 				       (void *) p, dest_align))
  	    return NULL_RTX;
  
  	  dest_mem = get_memory_rtx (dest, len);
  	  store_by_pieces (dest_mem, tree_low_cst (len, 1),
  			   builtin_strncpy_read_str,
! 			   (void *) p, dest_align, 0);
  	  dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX);
  	  dest_mem = convert_memory_address (ptr_mode, dest_mem);
  	  return dest_mem;
--- 3832,3844 ----
  	  if (!p || dest_align == 0 || !host_integerp (len, 1)
  	      || !can_store_by_pieces (tree_low_cst (len, 1),
  				       builtin_strncpy_read_str,
! 				       (void *) p, dest_align, 0))
  	    return NULL_RTX;
  
  	  dest_mem = get_memory_rtx (dest, len);
  	  store_by_pieces (dest_mem, tree_low_cst (len, 1),
  			   builtin_strncpy_read_str,
! 			   (void *) p, dest_align, 0, 0);
  	  dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX);
  	  dest_mem = convert_memory_address (ptr_mode, dest_mem);
  	  return dest_mem;
*************** expand_builtin_memset_args (tree dest, t
*** 3968,3979 ****
        if (host_integerp (len, 1)
  	  && !(optimize_size && tree_low_cst (len, 1) > 1)
  	  && can_store_by_pieces (tree_low_cst (len, 1),
! 				  builtin_memset_read_str, &c, dest_align))
  	{
  	  val_rtx = force_reg (TYPE_MODE (unsigned_char_type_node),
  			       val_rtx);
  	  store_by_pieces (dest_mem, tree_low_cst (len, 1),
! 			   builtin_memset_gen_str, val_rtx, dest_align, 0);
  	}
        else if (!set_storage_via_setmem (dest_mem, len_rtx, val_rtx,
  					dest_align, expected_align,
--- 3968,3979 ----
        if (host_integerp (len, 1)
  	  && !(optimize_size && tree_low_cst (len, 1) > 1)
  	  && can_store_by_pieces (tree_low_cst (len, 1),
! 				  builtin_memset_read_str, &c, dest_align, 1))
  	{
  	  val_rtx = force_reg (TYPE_MODE (unsigned_char_type_node),
  			       val_rtx);
  	  store_by_pieces (dest_mem, tree_low_cst (len, 1),
! 			   builtin_memset_gen_str, val_rtx, dest_align, 1, 0);
  	}
        else if (!set_storage_via_setmem (dest_mem, len_rtx, val_rtx,
  					dest_align, expected_align,
*************** expand_builtin_memset_args (tree dest, t
*** 3993,4001 ****
        if (host_integerp (len, 1)
  	  && !(optimize_size && tree_low_cst (len, 1) > 1)
  	  && can_store_by_pieces (tree_low_cst (len, 1),
! 				  builtin_memset_read_str, &c, dest_align))
  	store_by_pieces (dest_mem, tree_low_cst (len, 1),
! 			 builtin_memset_read_str, &c, dest_align, 0);
        else if (!set_storage_via_setmem (dest_mem, len_rtx, GEN_INT (c),
  					dest_align, expected_align,
  					expected_size))
--- 3993,4001 ----
        if (host_integerp (len, 1)
  	  && !(optimize_size && tree_low_cst (len, 1) > 1)
  	  && can_store_by_pieces (tree_low_cst (len, 1),
! 				  builtin_memset_read_str, &c, dest_align, 1))
  	store_by_pieces (dest_mem, tree_low_cst (len, 1),
! 			 builtin_memset_read_str, &c, dest_align, 1, 0);
        else if (!set_storage_via_setmem (dest_mem, len_rtx, GEN_INT (c),
  					dest_align, expected_align,
  					expected_size))
Index: gcc/value-prof.c
===================================================================
*** gcc/value-prof.c	(revision 127324)
--- gcc/value-prof.c	(working copy)
*************** tree_stringops_transform (block_stmt_ite
*** 1392,1404 ****
      case BUILT_IN_MEMSET:
        if (!can_store_by_pieces (val, builtin_memset_read_str,
  				CALL_EXPR_ARG (call, 1),
! 				dest_align))
  	return false;
        break;
      case BUILT_IN_BZERO:
        if (!can_store_by_pieces (val, builtin_memset_read_str,
  				integer_zero_node,
! 				dest_align))
  	return false;
        break;
      default:
--- 1392,1404 ----
      case BUILT_IN_MEMSET:
        if (!can_store_by_pieces (val, builtin_memset_read_str,
  				CALL_EXPR_ARG (call, 1),
! 				dest_align, 1))
  	return false;
        break;
      case BUILT_IN_BZERO:
        if (!can_store_by_pieces (val, builtin_memset_read_str,
  				integer_zero_node,
! 				dest_align, 1))
  	return false;
        break;
      default:
Index: gcc/config/sh/sh.h
===================================================================
*** gcc/config/sh/sh.h	(revision 127324)
--- gcc/config/sh/sh.h	(working copy)
*************** struct sh_args {
*** 2180,2185 ****
--- 2180,2189 ----
    (move_by_pieces_ninsns (SIZE, ALIGN, MOVE_MAX_PIECES + 1) \
     < (TARGET_SMALLCODE ? 2 : ((ALIGN >= 32) ? 16 : 2)))
  
+ #define SET_BY_PIECES_P(SIZE, ALIGN) \
+   (move_by_pieces_ninsns (SIZE, ALIGN, STORE_MAX_PIECES + 1) \
+    < (TARGET_SMALLCODE ? 2 : ((ALIGN >= 32) ? 16 : 2)))
+ 
  #define STORE_BY_PIECES_P(SIZE, ALIGN) \
    (move_by_pieces_ninsns (SIZE, ALIGN, STORE_MAX_PIECES + 1) \
     < (TARGET_SMALLCODE ? 2 : ((ALIGN >= 32) ? 16 : 2)))
Index: gcc/config/s390/s390.h
===================================================================
*** gcc/config/s390/s390.h	(revision 127324)
--- gcc/config/s390/s390.h	(working copy)
*************** extern struct rtx_def *s390_compare_op0,
*** 803,810 ****
      || (TARGET_64BIT && (SIZE) == 8) )
  
  /* This macro is used to determine whether store_by_pieces should be
!    called to "memset" storage with byte values other than zero, or
!    to "memcpy" storage when the source is a constant string.  */
  #define STORE_BY_PIECES_P(SIZE, ALIGN) MOVE_BY_PIECES_P (SIZE, ALIGN)
  
  /* Don't perform CSE on function addresses.  */
--- 803,813 ----
      || (TARGET_64BIT && (SIZE) == 8) )
  
  /* This macro is used to determine whether store_by_pieces should be
!    called to "memset" storage with byte values other than zero.  */
! #define SET_BY_PIECES_P(SIZE, ALIGN) MOVE_BY_PIECES_P (SIZE, ALIGN)
! 
! /* Likewise to decide whether to "memcpy" storage when the source is a
!    constant string.  */
  #define STORE_BY_PIECES_P(SIZE, ALIGN) MOVE_BY_PIECES_P (SIZE, ALIGN)
  
  /* Don't perform CSE on function addresses.  */
Index: gcc/config/mips/mips.opt
===================================================================
*** gcc/config/mips/mips.opt	(revision 127325)
--- gcc/config/mips/mips.opt	(working copy)
*************** Target Report RejectNegative Mask(LONG64
*** 173,179 ****
  Use a 64-bit long type
  
  mmemcpy
! Target Report Var(TARGET_MEMCPY)
  Don't optimize block moves
  
  mmips-tfile
--- 173,179 ----
  Use a 64-bit long type
  
  mmemcpy
! Target Report Mask(MEMCPY)
  Don't optimize block moves
  
  mmips-tfile
Index: gcc/config/mips/mips.c
===================================================================
*** gcc/config/mips/mips.c	(revision 127325)
--- gcc/config/mips/mips.c	(working copy)
*************** override_options (void)
*** 5299,5304 ****
--- 5299,5309 ----
        flag_delayed_branch = 0;
      }
  
+   /* Prefer a call to memcpy over inline code when optimizing for size,
+      though see MOVE_RATIO in mips.h.  */
+   if (optimize_size && (target_flags_explicit & MASK_MEMCPY) == 0)
+     target_flags |= MASK_MEMCPY;
+ 
  #ifdef MIPS_TFMODE_FORMAT
    REAL_MODE_FORMAT (TFmode) = &MIPS_TFMODE_FORMAT;
  #endif
Index: gcc/config/mips/mips.h
===================================================================
*** gcc/config/mips/mips.h	(revision 127325)
--- gcc/config/mips/mips.h	(working copy)
*************** while (0)
*** 2780,2785 ****
--- 2780,2818 ----
  
  #undef PTRDIFF_TYPE
  #define PTRDIFF_TYPE (POINTER_SIZE == 64 ? "long int" : "int")
+ 
+ /* Define MOVE_RATIO to encourage use of movmemsi when enabled,
+    since it should always generate code at least as good as
+    move_by_pieces().  But when inline movmemsi pattern is disabled
+    (i.e., with -mips16 or -mmemcpy), instead represent the length of
+    a memcpy call sequence (~6 insns), so that move_by_pieces will
+    generate inline code if it is shorter than a function call.
+    Note that move_by_pieces_ninsns() counts memory-to-memory moves,
+    but we'll have to generate a load/store pair for each, so further
+    halve the value to take that into account.  */
+ 
+ #define MOVE_RATIO ((TARGET_MIPS16 || TARGET_MEMCPY) ? 3 : 2)
+ 
+ /* For CLEAR_RATIO use a better estimate of the length of a memset
+    call when optimizing for size.  */
+ 
+ #define CLEAR_RATIO	(optimize_size ? 5 : 15)
+ 
+ /* This is similar to CLEAR_RATIO, but for a non-zero constant, so when
+    optimizing for size adjust the ratio to account for the overhead of
+    loading the constant and replicating it across the word.  In fact in
+    that case it is never worth inlining, since calling memset should
+    always be smaller.  */
+ 
+ #define SET_RATIO	(optimize_size ? 3 : 15)
+ 
+ /* STORE_BY_PIECES_P can be used when copying a constant string, but
+    in that case each word takes 3 insns (lui, ori, sw), or more in
+    64-bit mode, instead of 2 (lw, sw). So better to always fail this
+    and let the move_by_pieces code copy the string from read-only
+    memory.  */
+ 
+ #define STORE_BY_PIECES_P(SIZE, ALIGN) 0
  \f
  #ifndef __mips16
  /* Since the bits of the _init and _fini function is spread across


* Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-15 17:15 PATCH: fine-tuning for can_store_by_pieces Sandra Loosemore
@ 2007-08-15 17:22 ` Andrew Pinski
  2007-08-15 18:32   ` Sandra Loosemore
                     ` (2 more replies)
  2007-08-16  8:34 ` Richard Sandiford
  1 sibling, 3 replies; 30+ messages in thread
From: Andrew Pinski @ 2007-08-15 17:22 UTC (permalink / raw)
  To: Sandra Loosemore
  Cc: GCC Patches, Nigel Stephens, Guy Morrogh, David Ung,
	Thiemo Seufer, Richard Sandiford

On 8/15/07, Sandra Loosemore <sandra@codesourcery.com> wrote:
> On MIPS we can get better code by having can_store_by_pieces differentiate
> between the cases where it's used for memset operations, and those where it's
> used to copy a string constant.  This patch introduces new SET_RATIO and
> SET_BY_PIECES_P macros, with appropriate defaults to preserve the existing
> behavior.  I checked other targets and made the ones that override the default
> STORE_BY_PIECES_P clone the same definition for SET_BY_PIECES_P.

Are you sure that the cause of the real issue here is not really PR
31150?  I don't think copying string constants and memcpy/memset
should be different in terms of heuristics.

It seems if you gave a testcase where this is profitable, it would be
better to judge this patch (and maybe a testcase for the testsuite
also).

Thanks,
Andrew Pinski


* Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-15 17:22 ` Andrew Pinski
@ 2007-08-15 18:32   ` Sandra Loosemore
  2007-08-15 19:53     ` Nigel Stephens
  2007-08-15 19:58   ` Sandra Loosemore
  2007-08-17  4:50   ` Mark Mitchell
  2 siblings, 1 reply; 30+ messages in thread
From: Sandra Loosemore @ 2007-08-15 18:32 UTC (permalink / raw)
  To: Andrew Pinski
  Cc: GCC Patches, Nigel Stephens, Guy Morrogh, David Ung,
	Thiemo Seufer, Richard Sandiford

Andrew Pinski wrote:
> On 8/15/07, Sandra Loosemore <sandra@codesourcery.com> wrote:
>> On MIPS we can get better code by having can_store_by_pieces differentiate
>> between the cases where it's used for memset operations, and those where it's
>> used to copy a string constant.  This patch introduces new SET_RATIO and
>> SET_BY_PIECES_P macros, with appropriate defaults to preserve the existing
>> behavior.  I checked other targets and made the ones that override the default
>> STORE_BY_PIECES_P clone the same definition for SET_BY_PIECES_P.
> 
> Are you sure that the cause of the real issue here is not really PR
> 31150?  

I don't think so.  Nigel originally developed this patch against gcc 3.4; I just 
verified that it still does something useful for current mainline.

> I don't think copying string constants and memcpy/memset
> should be different in terms of heuristics.
> 
> It seems if you gave a testcase where this is profitable, it would be
> better to judge this patch (and maybe a testcase for the testsuite
> also).

I'll see if I can come up with something specific, or perhaps Nigel has a test case.

-Sandra


* Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-15 18:32   ` Sandra Loosemore
@ 2007-08-15 19:53     ` Nigel Stephens
  0 siblings, 0 replies; 30+ messages in thread
From: Nigel Stephens @ 2007-08-15 19:53 UTC (permalink / raw)
  To: Sandra Loosemore
  Cc: Andrew Pinski, GCC Patches, Guy Morrogh, David Ung,
	Thiemo Seufer, Richard Sandiford



Sandra Loosemore wrote:
> Andrew Pinski wrote:
>> On 8/15/07, Sandra Loosemore <sandra@codesourcery.com> wrote:
>>> On MIPS we can get better code by having can_store_by_pieces 
>>> differentiate
>>> between the cases where it's used for memset operations, and those 
>>> where it's
>>> used to copy a string constant.  This patch introduces new SET_RATIO 
>>> and
>>> SET_BY_PIECES_P macros, with appropriate defaults to preserve the 
>>> existing
>>> behavior.  I checked other targets and made the ones that override 
>>> the default
>>> STORE_BY_PIECES_P clone the same definition for SET_BY_PIECES_P.
>>
>> Are you sure that the cause of the real issue here is not really PR
>> 31150?  
>
> I don't think so.  Nigel originally developed this patch against gcc 
> 3.4; I just verified that it still does something useful for current 
> mainline.
>
>> I don't think copying string constants and memcpy/memset
>> should be different in terms of heuristics.
>>
>> It seems if you gave a testcase where this is profitable, it would be
>> better to judge this patch (and maybe a testcase for the testsuite
>> also).
>
> I'll see if I can come up with something specific, or perhaps Nigel 
> has a test case.

No, I don't have a specific test case: it's not a failure case, but an 
optimisation.

Note that store_by_pieces is used for two different purposes: to copy 
string constants to memory (e.g.  __builtin_strcpy) or initialise local 
arrays, and also to set a memory buffer to a single constant value 
(__builtin_memset). IMHO the benefit computation of using a movstr 
versus expanding it inline does need to be different for each case, 
since on a 32-bit MIPS processor the strcpy gets expanded into a 
repeated sequence of three instructions (lui; ori; sw) where each 
immediate value is the next 4-byte chunk of the string; whereas in the 
second case the immediate value is computed only once and is then stored by 
a sequence of "sw" instructions.
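
Very roughly, at the source level (a hand-written illustration only: the 
function and the word values are made up, and I'm assuming 32-bit code):

void illustrate (char *buf)
{
   int v;

   /* Constant-string copy via store_by_pieces: every word needs its own
      immediate, so each word costs about three instructions (lui; ori; sw).  */
   ((int *) buf)[0] = 0x666f6f62;
   ((int *) buf)[1] = 0x61724242;

   /* memset via store_by_pieces: the replicated value is built only once,
      and each word then costs a single sw.  */
   v = 0x66666666;		/* 'f' replicated across the word */
   ((int *) buf)[2] = v;
   ((int *) buf)[3] = v;
}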

Nigel


* Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-15 17:22 ` Andrew Pinski
  2007-08-15 18:32   ` Sandra Loosemore
@ 2007-08-15 19:58   ` Sandra Loosemore
  2007-08-17  4:50   ` Mark Mitchell
  2 siblings, 0 replies; 30+ messages in thread
From: Sandra Loosemore @ 2007-08-15 19:58 UTC (permalink / raw)
  To: Andrew Pinski
  Cc: GCC Patches, Nigel Stephens, Guy Morrogh, David Ung,
	Thiemo Seufer, Richard Sandiford

Andrew Pinski wrote:

> It seems if you gave a testcase where this is profitable, it would be
> better to judge this patch (and maybe a testcase for the testsuite
> also).

OK, here's a test case.

void f (char *buf)
{
   __builtin_memcpy (buf, "foo", 3);
}

void g (char *buf)
{
   __builtin_memset (buf, 'f', 3);
}

The MIPS back end has its own custom block move expander, so HAVE_movmemsi is 
true, and the current behavior makes MOVE_RATIO default to 2.  But this only 
handles memcpy, not memset; so with -O2 we end up with f using the custom 
expansion and g using an out-of-line call.

The patch makes STORE_BY_PIECES_P zero so it will always go straight to the 
custom code for memcpy.  The SET_BY_PIECES_P test is now independent of 
MOVE_RATIO and HAVE_movmemsi, so with the patch can_store_by_pieces returns true 
for the memset and it uses the generic expansion instead of a library call.
Before:

g:
         .frame  $sp,0,$31               # vars= 0, regs= 0/0, args= 0, gp= 0
         .mask   0x00000000,0
         .fmask  0x00000000,0
         .set    noreorder
         .set    nomacro

         li      $5,102                  # 0x66
         li      $6,3                    # 0x3
         j       memset
         nop

With the patch:

g:
         .frame  $sp,0,$31               # vars= 0, regs= 0/0, args= 0, gp= 0
         .mask   0x00000000,0
         .fmask  0x00000000,0
         .set    noreorder
         .set    nomacro

         li      $2,102
         sb      $2,2($4)
         sb      $2,0($4)
         j       $31
         sb      $2,1($4)

The code for f, using the custom movmemsi expansion, is unchanged.

-Sandra


* Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-15 17:15 PATCH: fine-tuning for can_store_by_pieces Sandra Loosemore
  2007-08-15 17:22 ` Andrew Pinski
@ 2007-08-16  8:34 ` Richard Sandiford
  2007-08-16 19:41   ` Sandra Loosemore
  2007-08-19  0:03   ` Sandra Loosemore
  1 sibling, 2 replies; 30+ messages in thread
From: Richard Sandiford @ 2007-08-16  8:34 UTC (permalink / raw)
  To: Sandra Loosemore
  Cc: GCC Patches, Nigel Stephens, Guy Morrogh, David Ung, Thiemo Seufer

I'll leave the target-independent bits to the appropriate maintainers.
As far as the MIPS bits go:

Sandra Loosemore <sandra@codesourcery.com> writes:
> + /* Define MOVE_RATIO to encourage use of movmemsi when enabled,
> +    since it should always generate code at least as good as
> +    move_by_pieces().  But when inline movmemsi pattern is disabled
> +    (i.e., with -mips16 or -mmemcpy), instead represent the length of
> +    a memcpy call sequence (~6 insns), so that move_by_pieces will
> +    generate inline code if it is shorter than a function call.
> +    Note that move_by_pieces_ninsns() counts memory-to-memory moves,
> +    but we'll have to generate a load/store pair for each, so further
> +    halve the value to take that into account.  */
> + 
> + #define MOVE_RATIO ((TARGET_MIPS16 || TARGET_MEMCPY) ? 3 : 2)

The TARGET_MIPS16 || TARGET_MEMCPY case is fine, and nicely described,
but it would be good to have a comment explaining why 2 encourages
"use of movmemsi when enabled".

> + /* For CLEAR_RATIO use a better estimate of the length of a memset
> +    call when optimizing for size.  */
> + 
> + #define CLEAR_RATIO	(optimize_size ? 5 : 15)

Might be worth emphasising that 15 is the default for !optimize_size,
as it otherwise isn't clear where that number came from.

> + /* This is similar to CLEAR_RATIO, but for a non-zero constant, so when
> +    optimizing for size adjust the ratio to account for the overhead of
> +    loading the constant and replicating it across the word.  In fact in
> +    that case it is never worth inlining, since calling memset should
> +    always be smaller.  */
> + 
> + #define SET_RATIO	(optimize_size ? 3 : 15)

Is there a lower limit to the size of memsets for which SET_RATIO is
used?  I'm not convinced that memset is smaller for power-of-2 stores up
to word or (if aligned) doubleword size.  (I'm assuming here that we use
left/right stores for unaligned 4-byte and (on 64-bit targets) 8-byte
stores.  If we don't, we should.)

Perhaps one of these days we should add a setmem pattern...

> + /* STORE_BY_PIECES_P can be used when copying a constant string, but
> +    in that case each word takes 3 insns (lui, ori, sw), or more in
> +    64-bit mode, instead of 2 (lw, sw). So better to always fail this
> +    and let the move_by_pieces code copy the string from read-only
> +    memory.  */
> + 
> + #define STORE_BY_PIECES_P(SIZE, ALIGN) 0

However, lui/ori/sw could easily be better speedwise in some situations,
so should this be dependent on optimize_size too?

Finally, a general comment: the call costs seem to be based on *-elf
targets, where a direct call is allowed.  The cost of a call is higher
sizewise for n32 and n64 abicalls.  It is higher still for o32 abicalls,
where we have to restore $gp afterwards.  (The cost for all three is even
higher if the function didn't otherwise need $gp, but I don't think we
can really account for that.)

Please check whether this patch fixes PR 11787.  (It should, if the
costs are good.)  If it does, add "PR target/11787" to the changelog.

Richard


* Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-16  8:34 ` Richard Sandiford
@ 2007-08-16 19:41   ` Sandra Loosemore
  2007-08-19  0:03   ` Sandra Loosemore
  1 sibling, 0 replies; 30+ messages in thread
From: Sandra Loosemore @ 2007-08-16 19:41 UTC (permalink / raw)
  To: GCC Patches, Nigel Stephens, Guy Morrogh, David Ung,
	Thiemo Seufer, richard

Richard Sandiford wrote:

>> + /* This is similar to CLEAR_RATIO, but for a non-zero constant, so when
>> +    optimizing for size adjust the ratio to account for the overhead of
>> +    loading the constant and replicating it across the word.  In fact in
>> +    that case it is never worth inlining, since calling memset should
>> +    always be smaller.  */
>> + 
>> + #define SET_RATIO	(optimize_size ? 3 : 15)
> 
> Is there a lower limit to the size of memsets for which SET_RATIO is
> used?  I'm not convinced that memset is smaller for power-of-2 stores up
> to word or (if aligned) doubleword size.  (I'm assuming here that we use
> left/right stores for unaligned 4-byte and (on 64-bit targets) 8-byte
> stores.  If we don't, we should.)

This is being handled by the non-target-specific code in builtins.c.  If 
optimize_size is true, it won't even consider can_store_by_pieces if the byte 
count is greater than 1, and the next alternative it tries is setmem.
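
For reference, the guard in expand_builtin_memset_args looks like this 
(visible in the patch context above):

      if (host_integerp (len, 1)
	  && !(optimize_size && tree_low_cst (len, 1) > 1)
	  && can_store_by_pieces (tree_low_cst (len, 1),
				  builtin_memset_read_str, &c, dest_align, 1))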

> Perhaps one of these days we should add a setmem pattern...

Uh, yeah.  :-)  For optimize_size, it looks like it would be a win over the 
libcall to memset in the case where the buffer is properly aligned for an 
integer move and the value is a constant.  For a non-constant value, there's the 
extra overhead of replicating it across the word, as noted in the comment above.

>> + /* STORE_BY_PIECES_P can be used when copying a constant string, but
>> +    in that case each word takes 3 insns (lui, ori, sw), or more in
>> +    64-bit mode, instead of 2 (lw, sw). So better to always fail this
>> +    and let the move_by_pieces code copy the string from read-only
>> +    memory.  */
>> + 
>> + #define STORE_BY_PIECES_P(SIZE, ALIGN) 0
> 
> However, lui/ori/sw could easily be better speedwise in some situations,
> so should this be dependent on optimize_size too?

Ermmm.  In what "some situations" would it be better?

> Finally, a general comment: the call costs seem to be based on *-elf
> targets, where a direct call is allowed.  The cost of a call is higher
> sizewise for n32 and n64 abicalls.  It is higher still for o32 abicalls,
> where we have to restore $gp afterwards.  (The cost for all three is even
> higher if the function didn't otherwise need $gp, but I don't think we
> can really account for that.)

I can figure this out by trial and error, of course, but do you (or Nigel, etc) 
know offhand what the cost adjustments should be for each ABI?

> Please check whether this patch fixes PR 11787.  (It should, if the
> cost are good.)  If it does, add "PR target/11787" to the changelog.

Yes, it looks like it does fix that issue.  I will mark the ChangeLog for the 
next version of the patch.

-Sandra


* Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-15 17:22 ` Andrew Pinski
  2007-08-15 18:32   ` Sandra Loosemore
  2007-08-15 19:58   ` Sandra Loosemore
@ 2007-08-17  4:50   ` Mark Mitchell
  2007-08-17 13:24     ` Sandra Loosemore
  2 siblings, 1 reply; 30+ messages in thread
From: Mark Mitchell @ 2007-08-17  4:50 UTC (permalink / raw)
  To: Andrew Pinski
  Cc: Sandra Loosemore, GCC Patches, Nigel Stephens, Guy Morrogh,
	David Ung, Thiemo Seufer, Richard Sandiford

Andrew Pinski wrote:
> On 8/15/07, Sandra Loosemore <sandra@codesourcery.com> wrote:
>> On MIPS we can get better code by having can_store_by_pieces
>> differentiate between the cases where it's used for memset
>> operations, and those where it's used to copy a string constant.
>> This patch introduces new SET_RATIO and SET_BY_PIECES_P macros,
>> with appropriate defaults to preserve the existing behavior.  I
>> checked other targets and made the ones that override the default 
>> STORE_BY_PIECES_P clone the same definition for SET_BY_PIECES_P.

> It seems if you gave a testcase where this is profitable, it would be
>  better to judge this patch (and maybe a testcase for the testsuite 
> also).

Sandra's now posted such a testcase, and it looks compelling to me.
Certainly, setting all bytes to a single value is different from copying
one series of bytes to another, so it doesn't surprise me that one wants
different rules for the different cases.

Sandra, I think the generic parts of the patch are OK, with a few nits:

> ! 		     void *constfundata, unsigned int align, int memsetp)

MEMSETP should be a bool, not an int.  And, when literal values are
passed to it, they should be "true" and "false", not "1" and "0".

> * config/sh/sh.h (SET_BY_PIECES_P): Clone from STORE_BY_PIECES_P.

In the various places where you clone the macro, I think you should just
 do:

#define SET_BY_PIECES_P(SIZE, ALIGN) STORE_BY_PIECES_P(SIZE, ALIGN)

In fact, if you do that in expr.c, under #ifndef SET_BY_PIECES_P, I
think you can avoid changing the other backends at all, as you'll
automatically pick up their STORE_BY_PIECES_P definitions.

If that all works, then the target-independent parts of the patch are OK
with those changes.

Thanks,

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713


* Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-17  4:50   ` Mark Mitchell
@ 2007-08-17 13:24     ` Sandra Loosemore
  2007-08-17 18:55       ` Mark Mitchell
  0 siblings, 1 reply; 30+ messages in thread
From: Sandra Loosemore @ 2007-08-17 13:24 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Andrew Pinski, GCC Patches, Nigel Stephens, Guy Morrogh,
	David Ung, Thiemo Seufer, Richard Sandiford

Mark Mitchell wrote:

> In the various places where you clone the macro, I think you should just
>  do:
> 
> #define SET_BY_PIECES_P(SIZE, ALIGN) STORE_BY_PIECES_P(SIZE, ALIGN)
> 
> In fact, if you do that in expr.c, under #ifndef SET_BY_PIECES_P, I
> think you can avoid changing the other backends at all, as you'll
> automatically pick up their STORE_BY_PIECES_P definitions.

I previously considered that idea, but that means that just defining SET_RATIO 
to be different than MOVE_RATIO won't do anything useful.  Making the default 
definition of SET_BY_PIECES_P use SET_RATIO and making the latter default to 
MOVE_RATIO seemed the best way to avoid breaking current backend code while 
still making it easy to tweak in the future.  If I do it your way, 
there's no point in having SET_RATIO in the target-independent code at all, 
because nothing would refer to it.
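
For reference, the default definition in expr.c in my patch is

#ifndef SET_BY_PIECES_P
#define SET_BY_PIECES_P(SIZE, ALIGN) \
  (move_by_pieces_ninsns (SIZE, ALIGN, STORE_MAX_PIECES + 1) \
   < (unsigned int) SET_RATIO)
#endif

so a target only has to define SET_RATIO to get useful behavior; defaulting 
SET_BY_PIECES_P to STORE_BY_PIECES_P would test MOVE_RATIO instead and leave 
SET_RATIO unreferenced.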

-Sandra


* Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-17 13:24     ` Sandra Loosemore
@ 2007-08-17 18:55       ` Mark Mitchell
  0 siblings, 0 replies; 30+ messages in thread
From: Mark Mitchell @ 2007-08-17 18:55 UTC (permalink / raw)
  To: Sandra Loosemore
  Cc: Andrew Pinski, GCC Patches, Nigel Stephens, Guy Morrogh,
	David Ung, Thiemo Seufer, Richard Sandiford

Sandra Loosemore wrote:
> Mark Mitchell wrote:
> 
>> In the various places where you clone the macro, I think you should just
>>  do:
>>
>> #define SET_BY_PIECES_P(SIZE, ALIGN) STORE_BY_PIECES_P(SIZE, ALIGN)
>>
>> In fact, if you do that in expr.c, under #ifndef SET_BY_PIECES_P, I
>> think you can avoid changing the other backends at all, as you'll
>> automatically pick up their STORE_BY_PIECES_P definitions.
> 
> I previously considered that idea, but that means that just defining
> SET_RATIO to be different than MOVE_RATIO won't do anything useful.

Good point; ignore that suggestion then.

Thanks,

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713


* Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-16  8:34 ` Richard Sandiford
  2007-08-16 19:41   ` Sandra Loosemore
@ 2007-08-19  0:03   ` Sandra Loosemore
  2007-08-20  8:22     ` Richard Sandiford
  1 sibling, 1 reply; 30+ messages in thread
From: Sandra Loosemore @ 2007-08-19  0:03 UTC (permalink / raw)
  To: GCC Patches, Nigel Stephens, Guy Morrogh, David Ung,
	Thiemo Seufer, richard, Mark Mitchell

[-- Attachment #1: Type: text/plain, Size: 797 bytes --]

Here's a revised version of the patch, incorporating comments from Richard and 
Mark.

I did some fiddling around with CSiBE to further improve the cost estimate for a 
call.  Setting the base cost to 4, rather than 3 as in the original patch, 
produced smaller code both with and without -mabicalls.  With o32 abicalls, 
which Richard indicated would be the worst case scenario for call overhead, base 
cost of 5 gave similar size results to 4 (overall a tiny decrease, but it looked 
like just as many individual tests went up as down), but code size went up for 
everything else.  So I think 4 is probably a reasonable value without having to 
conditionalize it for -mabicalls.

OK to commit this time?  I did the i686 bootstrap on this version of the patch 
as well as MIPS testing.

-Sandra


[-- Attachment #2: 31a-frob-by-pieces.log --]
[-- Type: text/x-log, Size: 1168 bytes --]

2007-08-15  Sandra Loosemore  <sandra@codesourcery.com>
            Nigel Stephens <nigel@mips.com>

	PR target/11787

	gcc/

	* doc/tm.texi (SET_RATIO, SET_BY_PIECES_P): Document new macros.
	(STORE_BY_PIECES_P): No longer applies to __builtin_memset.
	* expr.c (SET_BY_PIECES_P): Define.
	(can_store_by_pieces, store_by_pieces): Add MEMSETP argument; use
	it to decide whether to use SET_BY_PIECES_P or STORE_BY_PIECES_P.
	* expr.h (SET_RATIO): Define.
	(can_store_by_pieces, store_by_pieces):	Update prototypes.
	* builtins.c (expand_builtin_memcpy): Pass MEMSETP argument to
	can_store_by_pieces/store_by_pieces.
	(expand_builtin_mempcpy_args): Likewise.
	(expand_builtin_strncpy): Likewise.
	(expand_builtin_memset_args): Likewise.
	* value-prof.c (tree_stringops_transform): Likewise.

	* config/sh/sh.h (SET_BY_PIECES_P): Clone from STORE_BY_PIECES_P.
	* config/s390/s390.h (SET_BY_PIECES_P): Likewise.

	* config/mips/mips.opt (mmemcpy): Change from Var to Mask.
	* config/mips/mips.c (override_options): Make -Os default to -mmemcpy.
	* config/mips/mips.h (MIPS_CALL_RATIO): Define.
	(MOVE_RATIO, CLEAR_RATIO, SET_RATIO): Define.
	(STORE_BY_PIECES_P): Define.

[-- Attachment #3: 31a-frob-by-pieces.patch --]
[-- Type: text/x-patch, Size: 22348 bytes --]

Index: gcc/doc/tm.texi
===================================================================
*** gcc/doc/tm.texi	(revision 127324)
--- gcc/doc/tm.texi	(working copy)
*************** will be used.  Defaults to 1 if @code{mo
*** 5893,5904 ****
  than @code{CLEAR_RATIO}.
  @end defmac
  
  @defmac STORE_BY_PIECES_P (@var{size}, @var{alignment})
  A C expression used to determine whether @code{store_by_pieces} will be
! used to set a chunk of memory to a constant value, or whether some other
! mechanism will be used.  Used by @code{__builtin_memset} when storing
! values other than constant zero and by @code{__builtin_strcpy} when
! when called with a constant source string.
  Defaults to 1 if @code{move_by_pieces_ninsns} returns less
  than @code{MOVE_RATIO}.
  @end defmac
--- 5893,5922 ----
  than @code{CLEAR_RATIO}.
  @end defmac
  
+ @defmac SET_RATIO
+ The threshold of number of scalar move insns, @emph{below} which a sequence
+ of insns should be generated to set memory to a constant value, instead of
+ a block set insn or a library call.  
+ Increasing the value will always make code faster, but
+ eventually incurs high cost in increased code size.
+ 
+ If you don't define this, it defaults to the value of @code{MOVE_RATIO}.
+ @end defmac
+ 
+ @defmac SET_BY_PIECES_P (@var{size}, @var{alignment})
+ A C expression used to determine whether @code{store_by_pieces} will be
+ used to set a chunk of memory to a constant value, or whether some 
+ other mechanism will be used.  Used by @code{__builtin_memset} when 
+ storing values other than constant zero.
+ Defaults to 1 if @code{move_by_pieces_ninsns} returns less
+ than @code{SET_RATIO}.
+ @end defmac
+ 
  @defmac STORE_BY_PIECES_P (@var{size}, @var{alignment})
  A C expression used to determine whether @code{store_by_pieces} will be
! used to set a chunk of memory to a constant string value, or whether some 
! other mechanism will be used.  Used by @code{__builtin_strcpy} when
! called with a constant source string.
  Defaults to 1 if @code{move_by_pieces_ninsns} returns less
  than @code{MOVE_RATIO}.
  @end defmac
Index: gcc/expr.c
===================================================================
*** gcc/expr.c	(revision 127324)
--- gcc/expr.c	(working copy)
*************** static bool float_extend_from_mem[NUM_MA
*** 186,193 ****
  #endif
  
  /* This macro is used to determine whether store_by_pieces should be
!    called to "memset" storage with byte values other than zero, or
!    to "memcpy" storage when the source is a constant string.  */
  #ifndef STORE_BY_PIECES_P
  #define STORE_BY_PIECES_P(SIZE, ALIGN) \
    (move_by_pieces_ninsns (SIZE, ALIGN, STORE_MAX_PIECES + 1) \
--- 186,200 ----
  #endif
  
  /* This macro is used to determine whether store_by_pieces should be
!    called to "memset" storage with byte values other than zero.  */
! #ifndef SET_BY_PIECES_P
! #define SET_BY_PIECES_P(SIZE, ALIGN) \
!   (move_by_pieces_ninsns (SIZE, ALIGN, STORE_MAX_PIECES + 1) \
!    < (unsigned int) SET_RATIO)
! #endif
! 
! /* This macro is used to determine whether store_by_pieces should be
!    called to "memcpy" storage when the source is a constant string.  */
  #ifndef STORE_BY_PIECES_P
  #define STORE_BY_PIECES_P(SIZE, ALIGN) \
    (move_by_pieces_ninsns (SIZE, ALIGN, STORE_MAX_PIECES + 1) \
*************** use_group_regs (rtx *call_fusage, rtx re
*** 2191,2203 ****
  /* Determine whether the LEN bytes generated by CONSTFUN can be
     stored to memory using several move instructions.  CONSTFUNDATA is
     a pointer which will be passed as argument in every CONSTFUN call.
!    ALIGN is maximum alignment we can assume.  Return nonzero if a
!    call to store_by_pieces should succeed.  */
  
  int
  can_store_by_pieces (unsigned HOST_WIDE_INT len,
  		     rtx (*constfun) (void *, HOST_WIDE_INT, enum machine_mode),
! 		     void *constfundata, unsigned int align)
  {
    unsigned HOST_WIDE_INT l;
    unsigned int max_size;
--- 2198,2211 ----
  /* Determine whether the LEN bytes generated by CONSTFUN can be
     stored to memory using several move instructions.  CONSTFUNDATA is
     a pointer which will be passed as argument in every CONSTFUN call.
!    ALIGN is maximum alignment we can assume.  MEMSETP is true if this is
!    a memset operation and false if it's a copy of a constant string.
!    Return nonzero if a call to store_by_pieces should succeed.  */
  
  int
  can_store_by_pieces (unsigned HOST_WIDE_INT len,
  		     rtx (*constfun) (void *, HOST_WIDE_INT, enum machine_mode),
! 		     void *constfundata, unsigned int align, bool memsetp)
  {
    unsigned HOST_WIDE_INT l;
    unsigned int max_size;
*************** can_store_by_pieces (unsigned HOST_WIDE_
*** 2210,2216 ****
    if (len == 0)
      return 1;
  
!   if (! STORE_BY_PIECES_P (len, align))
      return 0;
  
    tmode = mode_for_size (STORE_MAX_PIECES * BITS_PER_UNIT, MODE_INT, 1);
--- 2218,2226 ----
    if (len == 0)
      return 1;
  
!   if (! (memsetp 
! 	 ? SET_BY_PIECES_P (len, align)
! 	 : STORE_BY_PIECES_P (len, align)))
      return 0;
  
    tmode = mode_for_size (STORE_MAX_PIECES * BITS_PER_UNIT, MODE_INT, 1);
*************** can_store_by_pieces (unsigned HOST_WIDE_
*** 2285,2291 ****
  /* Generate several move instructions to store LEN bytes generated by
     CONSTFUN to block TO.  (A MEM rtx with BLKmode).  CONSTFUNDATA is a
     pointer which will be passed as argument in every CONSTFUN call.
!    ALIGN is maximum alignment we can assume.
     If ENDP is 0 return to, if ENDP is 1 return memory at the end ala
     mempcpy, and if ENDP is 2 return memory the end minus one byte ala
     stpcpy.  */
--- 2295,2302 ----
  /* Generate several move instructions to store LEN bytes generated by
     CONSTFUN to block TO.  (A MEM rtx with BLKmode).  CONSTFUNDATA is a
     pointer which will be passed as argument in every CONSTFUN call.
!    ALIGN is maximum alignment we can assume.  MEMSETP is true if this is
!    a memset operation and false if it's a copy of a constant string.
     If ENDP is 0 return to, if ENDP is 1 return memory at the end ala
     mempcpy, and if ENDP is 2 return memory the end minus one byte ala
     stpcpy.  */
*************** can_store_by_pieces (unsigned HOST_WIDE_
*** 2293,2299 ****
  rtx
  store_by_pieces (rtx to, unsigned HOST_WIDE_INT len,
  		 rtx (*constfun) (void *, HOST_WIDE_INT, enum machine_mode),
! 		 void *constfundata, unsigned int align, int endp)
  {
    struct store_by_pieces data;
  
--- 2304,2310 ----
  rtx
  store_by_pieces (rtx to, unsigned HOST_WIDE_INT len,
  		 rtx (*constfun) (void *, HOST_WIDE_INT, enum machine_mode),
! 		 void *constfundata, unsigned int align, bool memsetp, int endp)
  {
    struct store_by_pieces data;
  
*************** store_by_pieces (rtx to, unsigned HOST_W
*** 2303,2309 ****
        return to;
      }
  
!   gcc_assert (STORE_BY_PIECES_P (len, align));
    data.constfun = constfun;
    data.constfundata = constfundata;
    data.len = len;
--- 2314,2322 ----
        return to;
      }
  
!   gcc_assert (memsetp
! 	      ? SET_BY_PIECES_P (len, align)
! 	      : STORE_BY_PIECES_P (len, align));
    data.constfun = constfun;
    data.constfundata = constfundata;
    data.len = len;
Index: gcc/expr.h
===================================================================
*** gcc/expr.h	(revision 127324)
--- gcc/expr.h	(working copy)
*************** enum expand_modifier {EXPAND_NORMAL = 0,
*** 84,89 ****
--- 84,96 ----
  #define CLEAR_RATIO (optimize_size ? 3 : 15)
  #endif
  #endif
+ 
+ /* If a memory set (to value other than zero) operation would take
+    SET_RATIO or more simple move-instruction sequences, we will do a movmem
+    or libcall instead.  */
+ #ifndef SET_RATIO
+ #define SET_RATIO MOVE_RATIO
+ #endif
  \f
  enum direction {none, upward, downward};
  
*************** extern int can_move_by_pieces (unsigned 
*** 443,462 ****
     CONSTFUN with several move instructions by store_by_pieces
     function.  CONSTFUNDATA is a pointer which will be passed as argument
     in every CONSTFUN call.
!    ALIGN is maximum alignment we can assume.  */
  extern int can_store_by_pieces (unsigned HOST_WIDE_INT,
  				rtx (*) (void *, HOST_WIDE_INT,
  					 enum machine_mode),
! 				void *, unsigned int);
  
  /* Generate several move instructions to store LEN bytes generated by
     CONSTFUN to block TO.  (A MEM rtx with BLKmode).  CONSTFUNDATA is a
     pointer which will be passed as argument in every CONSTFUN call.
     ALIGN is maximum alignment we can assume.
     Returns TO + LEN.  */
  extern rtx store_by_pieces (rtx, unsigned HOST_WIDE_INT,
  			    rtx (*) (void *, HOST_WIDE_INT, enum machine_mode),
! 			    void *, unsigned int, int);
  
  /* Emit insns to set X from Y.  */
  extern rtx emit_move_insn (rtx, rtx);
--- 450,472 ----
     CONSTFUN with several move instructions by store_by_pieces
     function.  CONSTFUNDATA is a pointer which will be passed as argument
     in every CONSTFUN call.
!    ALIGN is maximum alignment we can assume.
!    MEMSETP is true if this is a real memset/bzero, not a copy
!    of a const string.  */
  extern int can_store_by_pieces (unsigned HOST_WIDE_INT,
  				rtx (*) (void *, HOST_WIDE_INT,
  					 enum machine_mode),
! 				void *, unsigned int, bool);
  
  /* Generate several move instructions to store LEN bytes generated by
     CONSTFUN to block TO.  (A MEM rtx with BLKmode).  CONSTFUNDATA is a
     pointer which will be passed as argument in every CONSTFUN call.
     ALIGN is maximum alignment we can assume.
+    MEMSETP is true if this is a real memset/bzero, not a copy.
     Returns TO + LEN.  */
  extern rtx store_by_pieces (rtx, unsigned HOST_WIDE_INT,
  			    rtx (*) (void *, HOST_WIDE_INT, enum machine_mode),
! 			    void *, unsigned int, bool, int);
  
  /* Emit insns to set X from Y.  */
  extern rtx emit_move_insn (rtx, rtx);
Index: gcc/builtins.c
===================================================================
*** gcc/builtins.c	(revision 127324)
--- gcc/builtins.c	(working copy)
*************** expand_builtin_memcpy (tree exp, rtx tar
*** 3371,3381 ****
  	  && GET_CODE (len_rtx) == CONST_INT
  	  && (unsigned HOST_WIDE_INT) INTVAL (len_rtx) <= strlen (src_str) + 1
  	  && can_store_by_pieces (INTVAL (len_rtx), builtin_memcpy_read_str,
! 				  (void *) src_str, dest_align))
  	{
  	  dest_mem = store_by_pieces (dest_mem, INTVAL (len_rtx),
  				      builtin_memcpy_read_str,
! 				      (void *) src_str, dest_align, 0);
  	  dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX);
  	  dest_mem = convert_memory_address (ptr_mode, dest_mem);
  	  return dest_mem;
--- 3371,3381 ----
  	  && GET_CODE (len_rtx) == CONST_INT
  	  && (unsigned HOST_WIDE_INT) INTVAL (len_rtx) <= strlen (src_str) + 1
  	  && can_store_by_pieces (INTVAL (len_rtx), builtin_memcpy_read_str,
! 				  (void *) src_str, dest_align, false))
  	{
  	  dest_mem = store_by_pieces (dest_mem, INTVAL (len_rtx),
  				      builtin_memcpy_read_str,
! 				      (void *) src_str, dest_align, false, 0);
  	  dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX);
  	  dest_mem = convert_memory_address (ptr_mode, dest_mem);
  	  return dest_mem;
*************** expand_builtin_mempcpy_args (tree dest, 
*** 3484,3496 ****
  	  && GET_CODE (len_rtx) == CONST_INT
  	  && (unsigned HOST_WIDE_INT) INTVAL (len_rtx) <= strlen (src_str) + 1
  	  && can_store_by_pieces (INTVAL (len_rtx), builtin_memcpy_read_str,
! 				  (void *) src_str, dest_align))
  	{
  	  dest_mem = get_memory_rtx (dest, len);
  	  set_mem_align (dest_mem, dest_align);
  	  dest_mem = store_by_pieces (dest_mem, INTVAL (len_rtx),
  				      builtin_memcpy_read_str,
! 				      (void *) src_str, dest_align, endp);
  	  dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX);
  	  dest_mem = convert_memory_address (ptr_mode, dest_mem);
  	  return dest_mem;
--- 3484,3497 ----
  	  && GET_CODE (len_rtx) == CONST_INT
  	  && (unsigned HOST_WIDE_INT) INTVAL (len_rtx) <= strlen (src_str) + 1
  	  && can_store_by_pieces (INTVAL (len_rtx), builtin_memcpy_read_str,
! 				  (void *) src_str, dest_align, false))
  	{
  	  dest_mem = get_memory_rtx (dest, len);
  	  set_mem_align (dest_mem, dest_align);
  	  dest_mem = store_by_pieces (dest_mem, INTVAL (len_rtx),
  				      builtin_memcpy_read_str,
! 				      (void *) src_str, dest_align,
! 				      false, endp);
  	  dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX);
  	  dest_mem = convert_memory_address (ptr_mode, dest_mem);
  	  return dest_mem;
*************** expand_builtin_strncpy (tree exp, rtx ta
*** 3832,3844 ****
  	  if (!p || dest_align == 0 || !host_integerp (len, 1)
  	      || !can_store_by_pieces (tree_low_cst (len, 1),
  				       builtin_strncpy_read_str,
! 				       (void *) p, dest_align))
  	    return NULL_RTX;
  
  	  dest_mem = get_memory_rtx (dest, len);
  	  store_by_pieces (dest_mem, tree_low_cst (len, 1),
  			   builtin_strncpy_read_str,
! 			   (void *) p, dest_align, 0);
  	  dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX);
  	  dest_mem = convert_memory_address (ptr_mode, dest_mem);
  	  return dest_mem;
--- 3833,3845 ----
  	  if (!p || dest_align == 0 || !host_integerp (len, 1)
  	      || !can_store_by_pieces (tree_low_cst (len, 1),
  				       builtin_strncpy_read_str,
! 				       (void *) p, dest_align, false))
  	    return NULL_RTX;
  
  	  dest_mem = get_memory_rtx (dest, len);
  	  store_by_pieces (dest_mem, tree_low_cst (len, 1),
  			   builtin_strncpy_read_str,
! 			   (void *) p, dest_align, false, 0);
  	  dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX);
  	  dest_mem = convert_memory_address (ptr_mode, dest_mem);
  	  return dest_mem;
*************** expand_builtin_memset_args (tree dest, t
*** 3968,3979 ****
        if (host_integerp (len, 1)
  	  && !(optimize_size && tree_low_cst (len, 1) > 1)
  	  && can_store_by_pieces (tree_low_cst (len, 1),
! 				  builtin_memset_read_str, &c, dest_align))
  	{
  	  val_rtx = force_reg (TYPE_MODE (unsigned_char_type_node),
  			       val_rtx);
  	  store_by_pieces (dest_mem, tree_low_cst (len, 1),
! 			   builtin_memset_gen_str, val_rtx, dest_align, 0);
  	}
        else if (!set_storage_via_setmem (dest_mem, len_rtx, val_rtx,
  					dest_align, expected_align,
--- 3969,3982 ----
        if (host_integerp (len, 1)
  	  && !(optimize_size && tree_low_cst (len, 1) > 1)
  	  && can_store_by_pieces (tree_low_cst (len, 1),
! 				  builtin_memset_read_str, &c, dest_align,
! 				  true))
  	{
  	  val_rtx = force_reg (TYPE_MODE (unsigned_char_type_node),
  			       val_rtx);
  	  store_by_pieces (dest_mem, tree_low_cst (len, 1),
! 			   builtin_memset_gen_str, val_rtx, dest_align,
! 			   true, 0);
  	}
        else if (!set_storage_via_setmem (dest_mem, len_rtx, val_rtx,
  					dest_align, expected_align,
*************** expand_builtin_memset_args (tree dest, t
*** 3993,4001 ****
        if (host_integerp (len, 1)
  	  && !(optimize_size && tree_low_cst (len, 1) > 1)
  	  && can_store_by_pieces (tree_low_cst (len, 1),
! 				  builtin_memset_read_str, &c, dest_align))
  	store_by_pieces (dest_mem, tree_low_cst (len, 1),
! 			 builtin_memset_read_str, &c, dest_align, 0);
        else if (!set_storage_via_setmem (dest_mem, len_rtx, GEN_INT (c),
  					dest_align, expected_align,
  					expected_size))
--- 3996,4005 ----
        if (host_integerp (len, 1)
  	  && !(optimize_size && tree_low_cst (len, 1) > 1)
  	  && can_store_by_pieces (tree_low_cst (len, 1),
! 				  builtin_memset_read_str, &c, dest_align,
! 				  true))
  	store_by_pieces (dest_mem, tree_low_cst (len, 1),
! 			 builtin_memset_read_str, &c, dest_align, true, 0);
        else if (!set_storage_via_setmem (dest_mem, len_rtx, GEN_INT (c),
  					dest_align, expected_align,
  					expected_size))
Index: gcc/value-prof.c
===================================================================
*** gcc/value-prof.c	(revision 127324)
--- gcc/value-prof.c	(working copy)
*************** tree_stringops_transform (block_stmt_ite
*** 1392,1404 ****
      case BUILT_IN_MEMSET:
        if (!can_store_by_pieces (val, builtin_memset_read_str,
  				CALL_EXPR_ARG (call, 1),
! 				dest_align))
  	return false;
        break;
      case BUILT_IN_BZERO:
        if (!can_store_by_pieces (val, builtin_memset_read_str,
  				integer_zero_node,
! 				dest_align))
  	return false;
        break;
      default:
--- 1392,1404 ----
      case BUILT_IN_MEMSET:
        if (!can_store_by_pieces (val, builtin_memset_read_str,
  				CALL_EXPR_ARG (call, 1),
! 				dest_align, true))
  	return false;
        break;
      case BUILT_IN_BZERO:
        if (!can_store_by_pieces (val, builtin_memset_read_str,
  				integer_zero_node,
! 				dest_align, true))
  	return false;
        break;
      default:
Index: gcc/config/sh/sh.h
===================================================================
*** gcc/config/sh/sh.h	(revision 127324)
--- gcc/config/sh/sh.h	(working copy)
*************** struct sh_args {
*** 2184,2189 ****
--- 2184,2191 ----
    (move_by_pieces_ninsns (SIZE, ALIGN, STORE_MAX_PIECES + 1) \
     < (TARGET_SMALLCODE ? 2 : ((ALIGN >= 32) ? 16 : 2)))
  
+ #define SET_BY_PIECES_P(SIZE, ALIGN) STORE_BY_PIECES_P(SIZE, ALIGN)
+ 
  /* Macros to check register numbers against specific register classes.  */
  
  /* These assume that REGNO is a hard or pseudo reg number.
Index: gcc/config/s390/s390.h
===================================================================
*** gcc/config/s390/s390.h	(revision 127324)
--- gcc/config/s390/s390.h	(working copy)
*************** extern struct rtx_def *s390_compare_op0,
*** 803,812 ****
      || (TARGET_64BIT && (SIZE) == 8) )
  
  /* This macro is used to determine whether store_by_pieces should be
!    called to "memset" storage with byte values other than zero, or
!    to "memcpy" storage when the source is a constant string.  */
  #define STORE_BY_PIECES_P(SIZE, ALIGN) MOVE_BY_PIECES_P (SIZE, ALIGN)
  
  /* Don't perform CSE on function addresses.  */
  #define NO_FUNCTION_CSE
  
--- 803,815 ----
      || (TARGET_64BIT && (SIZE) == 8) )
  
  /* This macro is used to determine whether store_by_pieces should be
!    called to "memcpy" storage when the source is a constant string.  */
  #define STORE_BY_PIECES_P(SIZE, ALIGN) MOVE_BY_PIECES_P (SIZE, ALIGN)
  
+ /* Likewise to decide whether to "memset" storage with byte values
+    other than zero.  */
+ #define SET_BY_PIECES_P(SIZE, ALIGN) STORE_BY_PIECES_P (SIZE, ALIGN)
+ 
  /* Don't perform CSE on function addresses.  */
  #define NO_FUNCTION_CSE
  
Index: gcc/config/mips/mips.opt
===================================================================
*** gcc/config/mips/mips.opt	(revision 127325)
--- gcc/config/mips/mips.opt	(working copy)
*************** Target Report RejectNegative Mask(LONG64
*** 173,179 ****
  Use a 64-bit long type
  
  mmemcpy
! Target Report Var(TARGET_MEMCPY)
  Don't optimize block moves
  
  mmips-tfile
--- 173,179 ----
  Use a 64-bit long type
  
  mmemcpy
! Target Report Mask(MEMCPY)
  Don't optimize block moves
  
  mmips-tfile
Index: gcc/config/mips/mips.c
===================================================================
*** gcc/config/mips/mips.c	(revision 127325)
--- gcc/config/mips/mips.c	(working copy)
*************** override_options (void)
*** 5299,5304 ****
--- 5299,5309 ----
        flag_delayed_branch = 0;
      }
  
+   /* Prefer a call to memcpy over inline code when optimizing for size,
+      though see MOVE_RATIO in mips.h.  */
+   if (optimize_size && (target_flags_explicit & MASK_MEMCPY) == 0)
+     target_flags |= MASK_MEMCPY;
+ 
  #ifdef MIPS_TFMODE_FORMAT
    REAL_MODE_FORMAT (TFmode) = &MIPS_TFMODE_FORMAT;
  #endif
Index: gcc/config/mips/mips.h
===================================================================
*** gcc/config/mips/mips.h	(revision 127325)
--- gcc/config/mips/mips.h	(working copy)
*************** while (0)
*** 2780,2785 ****
--- 2780,2829 ----
  
  #undef PTRDIFF_TYPE
  #define PTRDIFF_TYPE (POINTER_SIZE == 64 ? "long int" : "int")
+ 
+ /* The base cost of a memcpy call, for MOVE_RATIO and friends.  This
+    was determined experimentally by benchmarking with CSiBE.  In theory,
+    the call overhead is higher for TARGET_ABICALLS (especially for o32
+    where we have to restore $gp afterwards as well as make an indirect
+    call), but in practice, bumping this up higher for TARGET_ABICALLS 
+    doesn't make much difference to code size.  */
+ 
+ #define MIPS_CALL_RATIO 4
+ 
+ /* Define MOVE_RATIO to encourage use of movmemsi when enabled,
+    since it should always generate code at least as good as
+    move_by_pieces().  But when inline movmemsi pattern is disabled
+    (i.e., with -mips16 or -mmemcpy), instead use a value approximating
+    the length of a memcpy call sequence, so that move_by_pieces will
+    generate inline code if it is shorter than a function call.
+    The default value for MOVE_RATIO when HAVE_movmemsi is true is 2.
+    There is no point to setting it to less than this to try to disable
+    move_by_pieces entirely, because that also disables some desirable 
+    tree-level optimizations, specifically related to optimizing a
+    one-byte string copy into a simple move byte operation.  */
+ 
+ #define MOVE_RATIO ((TARGET_MIPS16 || TARGET_MEMCPY) ? MIPS_CALL_RATIO : 2)
+ 
+ /* For CLEAR_RATIO, when optimizing for size, give a better estimate
+    of the length of a memset call, but use the default otherwise.  */
+ 
+ #define CLEAR_RATIO	(optimize_size ? MIPS_CALL_RATIO + 2 : 15)
+ 
+ /* This is similar to CLEAR_RATIO, but for a non-zero constant, so when
+    optimizing for size adjust the ratio to account for the overhead of
+    loading the constant and replicating it across the word.  In fact in
+    that case it is never worth inlining, since calling memset should
+    always be smaller.  */
+ 
+ #define SET_RATIO	(optimize_size ? MIPS_CALL_RATIO : 15)
+ 
+ /* STORE_BY_PIECES_P can be used when copying a constant string, but
+    in that case each word takes 3 insns (lui, ori, sw), or more in
+    64-bit mode, instead of 2 (lw, sw). So better to always fail this
+    and let the move_by_pieces code copy the string from read-only
+    memory.  */
+ 
+ #define STORE_BY_PIECES_P(SIZE, ALIGN) 0
  \f
  #ifndef __mips16
  /* Since the bits of the _init and _fini function is spread across


* Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-19  0:03   ` Sandra Loosemore
@ 2007-08-20  8:22     ` Richard Sandiford
  2007-08-20 23:38       ` Sandra Loosemore
  0 siblings, 1 reply; 30+ messages in thread
From: Richard Sandiford @ 2007-08-20  8:22 UTC (permalink / raw)
  To: Sandra Loosemore
  Cc: GCC Patches, Nigel Stephens, Guy Morrogh, David Ung,
	Thiemo Seufer, Mark Mitchell

Sandra Loosemore <sandra@codesourcery.com> writes:
> I did some fiddling around with CSiBE to further improve the cost
> estimate for a call.  Setting the base cost to 4, rather than 3 as in
> the original patch, produced smaller code both with and without
> -mabicalls.  With o32 abicalls, which Richard indicated would be the
> worst case scenario for call overhead, base cost of 5 gave similar
> size results to 4 (overall a tiny decrease, but it looked like just as
> many individual tests went up as down), but code size went up for
> everything else.  So I think 4 is probably a reasonable value without
> having to conditionalize it for -mabicalls.

Thanks for the testing.  In that case, I agree 4 is fine for everything.
If you still have the results, could you post the totals?  I'm curious
what kind of figures we're talking about here.

> + /* The base cost of a memcpy call, for MOVE_RATIO and friends.  This
> +    was determined experimentally by benchmarking with CSiBE.  In theory,
> +    the call overhead is higher for TARGET_ABICALLS (especially for o32
> +    where we have to restore $gp afterwards as well as make an indirect
> +    call), but in practice, bumping this up higher for TARGET_ABICALLS 
> +    doesn't make much difference to code size.  */
> + 
> + #define MIPS_CALL_RATIO 4

I think the number you use in CLEAR_RATIO (MIPS_CALL_RATIO + 2)
is effectively estimating the number of instructions for a call.
ISTM CLEAR_RATIO is basically being compared against an estimate of
the number of zero stores, and zero stores are 1 instruction on MIPS.
(Also, nothing really explained why CLEAR_RATIO adds a magic 2 to the
ratio.)

So I think this should really be 6 and that CLEAR_RATIO should be:

#define CLEAR_RATIO	(optimize_size ? MIPS_CALL_RATIO : 15)

Then...

> + #define MOVE_RATIO ((TARGET_MIPS16 || TARGET_MEMCPY) ? MIPS_CALL_RATIO : 2)

...a comment in the original patch said that MOVE_RATIO effectively
counted memory-to-memory moves.  I think that was a useful comment,
and that the use of the old MIPS_CALL_RATIO above should be the new
MIPS_CALL_RATIO / 2.  Conveniently, that gives us the 3 that you had
in the original patch.  (You didn't say whether you'd benchmarked
-mips16 or -mmemcpy; if so, did you see any difference between a
MOVE_RATIO of 3 and a MOVE_RATIO of 4?)

> + /* This is similar to CLEAR_RATIO, but for a non-zero constant, so when
> +    optimizing for size adjust the ratio to account for the overhead of
> +    loading the constant and replicating it across the word.  In fact in
> +    that case it is never worth inlining, since calling memset should
> +    always be smaller.  */
> + 
> + #define SET_RATIO	(optimize_size ? MIPS_CALL_RATIO : 15)

I still find this comment a little confusing.  If this macro were
ever used for anything other than 1-byte copies when optimize_size,
MIPS_CALL_RATIO _would_ be big enough to cause SET_BY_PIECES_P to be
true for some of them.  And, as already discussed, I think "always"
is too strong anyway.

How about:

/* This is similar to CLEAR_RATIO, but for a non-zero constant.
   At the time this definition was added, optimize_size compilations
   only used this macro for single-byte stores.  Although it might be
   worthwhile doing larger stores inline too -- if the target-independent
   code ever considered it -- the boundaries are not very clear.
   Just use a value of 2 for now so that all single-byte copies are
   done inline.  */

#define SET_RATIO	(optimize_size ? 2 : 15)

But I don't like having to make excuses like this.  I wonder if we
should instead remove the optimize_size checks above the calls to
store_by_pieces_p and define the default SET_RATIO to something like:

#define SET_RATIO (optimize_size ? 2 : MOVE_RATIO)

?  Then the target would be able to control the otherwise arbitrary
optimize_size limit, just like it already can for CLEAR_RATIO.
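
(To spell out what that would mean under -Os, assuming the default
SET_BY_PIECES_P from your patch is kept: the check would reduce to

    move_by_pieces_ninsns (SIZE, ALIGN, STORE_MAX_PIECES + 1) < 2

so by default only memsets that need a single store insn would still be
done by pieces, which keeps today's one-byte case inline while letting a
target raise the limit through SET_RATIO.)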

> + /* STORE_BY_PIECES_P can be used when copying a constant string, but
> +    in that case each word takes 3 insns (lui, ori, sw), or more in
> +    64-bit mode, instead of 2 (lw, sw). So better to always fail this
> +    and let the move_by_pieces code copy the string from read-only
> +    memory.  */
> + 
> + #define STORE_BY_PIECES_P(SIZE, ALIGN) 0

You asked when lui/ori/sw might be faster.  Consider a three-word
store on a typical 2-way superscalar target:

  Cycle 1:    lui     lui
        2:    ori     ori
        3:    sw             lui
        4:            sw     ori
        5:                   sw

That's 5 cycles.  The equivalent lw/sw version is at least 6 cycles
(more if the read-only string is not in cache).

Richard


* Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-20  8:22     ` Richard Sandiford
@ 2007-08-20 23:38       ` Sandra Loosemore
  2007-08-21  8:21         ` Richard Sandiford
  0 siblings, 1 reply; 30+ messages in thread
From: Sandra Loosemore @ 2007-08-20 23:38 UTC (permalink / raw)
  To: GCC Patches, Nigel Stephens, Guy Morrogh, David Ung,
	Thiemo Seufer, Mark Mitchell, richard

Richard Sandiford wrote:

> Thanks for the testing.  In that case, I agree 4 is fine for everything.
> If you still have the results, could you post the totals?  I'm curious
> what kind of figures we're talking about here.

Here's what I have.  Except for measuring the original version of the patch on a 
mips64-elfoabi build, everything else was done with mips32r2-elfoabi; the 
numbers are total sizes from CSiBE.

                default      -mips16      -mabicalls    mips64
baseline        3583977      2860177                    3558373
call ratio 3    3566997      2859401      4039960       3541493
call ratio 4    3565961      2858881      4037876
call ratio 5    3566857      2859901      4037172
call ratio 6                              4037332

>> + #define MIPS_CALL_RATIO 4
> 
> I think the number you use in CLEAR_RATIO (MIPS_CALL_RATIO + 2)
> is effectively estimating the number of instruction for a call.
> ISTM CLEAR_RATIO is basically being compared against an estimate of
> the number of zero stores, and zero stores are 1 instruction on MIPS.
> (Also, nothing really explained why CLEAR_RATIO adds a magic 2 to the
> ratio.)
> 
> So I think this should really be 6 and that CLEAR_RATIO should be:
> 
> #define CLEAR_RATIO	(optimize_size ? MIPS_CALL_RATIO : 15)
> 
> Then...
> 
>> + #define MOVE_RATIO ((TARGET_MIPS16 || TARGET_MEMCPY) ? MIPS_CALL_RATIO : 2)
> 
> ...a comment in the original patch said that MOVE_RATIO effectively
> counted memory-to-memory moves.  I think that was a useful comment,
> and that the use of the old MIPS_CALL_RATIO above should be the new
> MIPS_CALL_RATIO / 2.  Conveniently, that gives us the 3 that you had
> in the original patch.  

Except that 4 seems to be a better number, and that number doesn't fall out of 
this theory.  I guess I could run some tests with different values for 
CLEAR_RATIO too, and just document both numbers as being experimentally determined?

> (You didn't say whether you'd benchmarked
> -mips16 or -mmemcpy; if so, did you see any difference between a
> MOVE_RATIO of 3 and a MOVE_RATIO of 4?)

I tried -mips16 but not -mmemcpy.  See table above.

>> + /* STORE_BY_PIECES_P can be used when copying a constant string, but
>> +    in that case each word takes 3 insns (lui, ori, sw), or more in
>> +    64-bit mode, instead of 2 (lw, sw). So better to always fail this
>> +    and let the move_by_pieces code copy the string from read-only
>> +    memory.  */
>> + 
>> + #define STORE_BY_PIECES_P(SIZE, ALIGN) 0
> 
> You asked when lui/ori/sw might be faster.  Consider a three-word
> store on a typical 2-way superscalar target:
> 
>   Cycle 1:    lui     lui
>         2:    ori     ori
>         3:    sw             lui
>         4:            sw     ori
>         5:                   sw
> 
> That's 5 cycles.  The equivalent lw/sw version is at least 6 cycles
> (more if the read-only string is not in cache).

OK, but what I was really asking was, is there a way to *test* for situations 
where we should generate the lui/ori/sw sequences instead of the lw/sw?  Some 
combination of TARGET_foo flags and/or the size of the string?

-Sandra the clueless


* Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-20 23:38       ` Sandra Loosemore
@ 2007-08-21  8:21         ` Richard Sandiford
  2007-08-21 10:34           ` Nigel Stephens
  2007-08-21 13:54           ` Sandra Loosemore
  0 siblings, 2 replies; 30+ messages in thread
From: Richard Sandiford @ 2007-08-21  8:21 UTC (permalink / raw)
  To: Sandra Loosemore
  Cc: GCC Patches, Nigel Stephens, Guy Morrogh, David Ung,
	Thiemo Seufer, Mark Mitchell

Sandra Loosemore <sandra@codesourcery.com> writes:
> Richard Sandiford wrote:
>>> + #define MOVE_RATIO ((TARGET_MIPS16 || TARGET_MEMCPY) ? MIPS_CALL_RATIO : 2)
>> 
>> ...a comment in the original patch said that MOVE_RATIO effectively
>> counted memory-to-memory moves.  I think that was a useful comment,
>> and that the use of the old MIPS_CALL_RATIO above should be the new
>> MIPS_CALL_RATIO / 2.  Conveniently, that gives us the 3 that you had
>> in the original patch.  
>
> Except that 4 seems to be a better number, and that number doesn't
> fall out of this theory.

OK.  (Hence the question that completed that paragraph.  It was far from
obvious whether the MOVE_RATIO change from 3 to 4 had been determined
experimentally or not.  You just said "with and without -mabicalls",
which implied two sets of testing flags.)

> I guess I could run some tests with different values for CLEAR_RATIO
> too, and just document both numbers as being experimentally
> determined?

That does sound better, thanks.

>>> + /* STORE_BY_PIECES_P can be used when copying a constant string, but
>>> +    in that case each word takes 3 insns (lui, ori, sw), or more in
>>> +    64-bit mode, instead of 2 (lw, sw). So better to always fail this
>>> +    and let the move_by_pieces code copy the string from read-only
>>> +    memory.  */
>>> + 
>>> + #define STORE_BY_PIECES_P(SIZE, ALIGN) 0
>> 
>> You asked when lui/ori/sw might be faster.  Consider a three-word
>> store on a typical 2-way superscalar target:
>> 
>>   Cycle 1:    lui     lui
>>         2:    ori     ori
>>         3:    sw             lui
>>         4:            sw     ori
>>         5:                   sw
>> 
>> That's 5 cycles.  The equivalent lw/sw version is at least 6 cycles
>> (more if the read-only string is not in cache).
>
> OK, but what I was really asking was, is there a way to *test* for
> situations where we should generate the lui/ori/sw sequences instead
> of the lw/sw?  Some combination of TARGET_foo flags and/or the size of
> the string?

Well, I suppose:

    !optimize_size && !TARGET_MIPS16 && mips_issue_rate () > 1

is the condition under which the concerns above apply.  But, as with the
other !optimize_size bounds in your patch, the bound would need to be
determined experimentally.  Since we don't have an easy way of doing that,
I'm happy to preserve mainline's current behaviour there.
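
If we ever did want to experiment with it, a minimal sketch of the shape
I mean would be something like the following -- illustration only, not
benchmarked; it reuses the generic move_by_pieces_ninsns cut-off as a
placeholder bound and assumes mips_issue_rate is made visible outside
mips.c:

/* Sketch only: the bound is a placeholder and mips_issue_rate would
   need to be exported from mips.c.  */
#define STORE_BY_PIECES_P(SIZE, ALIGN)                                  \
  (!optimize_size && !TARGET_MIPS16 && mips_issue_rate () > 1           \
   && move_by_pieces_ninsns (SIZE, ALIGN, STORE_MAX_PIECES + 1)         \
      < (unsigned int) MOVE_RATIO)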

What did you think about the other suggestion: moving the magic
"1 instruction" bound for optimize_size from builtins.c to SET_RATIO?

Richard


* Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-21  8:21         ` Richard Sandiford
@ 2007-08-21 10:34           ` Nigel Stephens
  2007-08-21 11:53             ` Richard Sandiford
  2007-08-21 13:54           ` Sandra Loosemore
  1 sibling, 1 reply; 30+ messages in thread
From: Nigel Stephens @ 2007-08-21 10:34 UTC (permalink / raw)
  To: Sandra Loosemore, GCC Patches, Nigel Stephens, Guy Morrogh,
	David Ung, Thiemo Seufer, Mark Mitchell, richard



Richard Sandiford wrote:
> Sandra Loosemore <sandra@codesourcery.com> writes:
>   
>> OK, but what I was really asking was, is there a way to *test* for
>> situations where we should generate the lui/ori/sw sequences instead
>> of the lw/sw?  Some combination of TARGET_foo flags and/or the size of
>> the string?
>>     
>
> Well, I suppose:
>
>     !optimize_size && !TARGET_MIPS16 && mips_issue_rate () > 1
>
>   

Many MIPS dual-issue processors are asymmetric, and may be able to issue 
lui, ori and sw down only one pipe or the other -- not both in parallel. 
So I don't think that testing mips_issue_rate() is sufficient.

Nigel



* Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-21 10:34           ` Nigel Stephens
@ 2007-08-21 11:53             ` Richard Sandiford
  2007-08-21 12:14               ` Nigel Stephens
  0 siblings, 1 reply; 30+ messages in thread
From: Richard Sandiford @ 2007-08-21 11:53 UTC (permalink / raw)
  To: Nigel Stephens
  Cc: Sandra Loosemore, GCC Patches, Guy Morrogh, David Ung,
	Thiemo Seufer, Mark Mitchell

Nigel Stephens <nigel@mips.com> writes:
> Richard Sandiford wrote:
>> Sandra Loosemore <sandra@codesourcery.com> writes:
>>   
>>> OK, but what I was really asking was, is there a way to *test* for
>>> situations where we should generate the lui/ori/sw sequences instead
>>> of the lw/sw?  Some combination of TARGET_foo flags and/or the size of
>>> the string?
>>>     
>>
>> Well, I suppose:
>>
>>     !optimize_size && !TARGET_MIPS16 && mips_issue_rate () > 1
>
> Many MIPS dual-issue processors are asymmetric, and may be able to issue 
> lui, ori and sw down only one pipe or the other -- not both in parallel. 
> So I don't think that testing mips_issue_rate() is sufficient.

True.  I suppose I'm biased because all the processors whose schedulers
I've worked on could issue stores down one pipe and arithmetic
instructions down the other.

I assume you think we should stick the unconditional zero here?

Richard


* Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-21 11:53             ` Richard Sandiford
@ 2007-08-21 12:14               ` Nigel Stephens
  2007-08-21 12:35                 ` Richard Sandiford
  0 siblings, 1 reply; 30+ messages in thread
From: Nigel Stephens @ 2007-08-21 12:14 UTC (permalink / raw)
  To: Nigel Stephens, Sandra Loosemore, GCC Patches, Guy Morrogh,
	David Ung, Thiemo Seufer, Mark Mitchell, richard



Richard Sandiford wrote:
> Nigel Stephens <nigel@mips.com> writes:
>   
>> Richard Sandiford wrote:
>>     
>>> Sandra Loosemore <sandra@codesourcery.com> writes:
>>>   
>>>       
>>>> OK, but what I was really asking was, is there a way to *test* for
>>>> situations where we should generate the lui/ori/sw sequences instead
>>>> of the lw/sw?  Some combination of TARGET_foo flags and/or the size of
>>>> the string?
>>>>     
>>>>         
>>> Well, I suppose:
>>>
>>>     !optimize_size && !TARGET_MIPS16 && mips_issue_rate () > 1
>>>       
>> Many MIPS dual-issue processors are asymmetric, and may be able to issue 
>> lui, ori and sw down only one pipe or the other -- not both in parallel. 
>> So I don't think that testing mips_issue_rate() is sufficient.
>>     
>
> True.  I suppose I'm biased because all the processors whose schedulers
> I've worked on could issue stores down one pipe and arithmetic
> instructions down the other.
>
>   

Your assumption is probably correct; however, your schedule for the first 
two cycles shows two lui and two ori issuing in parallel, which would 
not be possible in such an architecture.  For a long enough string, 
though, it could be a win.

> I assume you think we should stick the unconditional zero here?
>   

We could make the STORE_BY_PIECES_P macro dependent on -mtune, and use 
the size and alignment to estimate the total cycle count. But zero is 
reasonable, and we could add a comment suggesting that there's some 
opportunity for future tuning on multi-issue CPUs.
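
For instance (a hypothetical sketch only -- mips_store_by_pieces_p would
be a new helper in mips.c, and the per-processor cycle estimates behind
it don't exist yet):

#define STORE_BY_PIECES_P(SIZE, ALIGN) mips_store_by_pieces_p (SIZE, ALIGN)

with the helper comparing a rough lui/ori/sw cycle estimate for the
current -mtune, derived from SIZE and ALIGN, against the equivalent
lw/sw copy.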

Nigel


* Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-21 12:14               ` Nigel Stephens
@ 2007-08-21 12:35                 ` Richard Sandiford
  0 siblings, 0 replies; 30+ messages in thread
From: Richard Sandiford @ 2007-08-21 12:35 UTC (permalink / raw)
  To: Nigel Stephens
  Cc: Sandra Loosemore, GCC Patches, Guy Morrogh, David Ung,
	Thiemo Seufer, Mark Mitchell

Nigel Stephens <nigel@mips.com> writes:
> Richard Sandiford wrote:
>> Nigel Stephens <nigel@mips.com> writes:
>>   
>>> Richard Sandiford wrote:
>>>     
>>>> Sandra Loosemore <sandra@codesourcery.com> writes:
>>>>   
>>>>       
>>>>> OK, but what I was really asking was, is there a way to *test* for
>>>>> situations where we should generate the lui/ori/sw sequences instead
>>>>> of the lw/sw?  Some combination of TARGET_foo flags and/or the size of
>>>>> the string?
>>>>>     
>>>>>         
>>>> Well, I suppose:
>>>>
>>>>     !optimize_size && !TARGET_MIPS16 && mips_issue_rate () > 1
>>>>       
>>> Many MIPS dual-issue processors are asymmetric, and may be able to issue 
>>> lui, ori and sw down only one pipe or the other -- not both in parallel. 
>>> So I don't think that testing mips_issue_rate() is sufficient.
>>>     
>>
>> True.  I suppose I'm biased because all the processors whose schedulers
>> I've worked on could issue stores down one pipe and arithmetic
>> instructions down the other.
>
> Your assumption is probably correct, however your schedule for the first 
> two cycles shows two lui and two ori issuing in parallel, which would 
> not be possible in such an architecture. For a long enough string though 
> it could be a win.

Well, all the ones I worked on were also able to issue two
arithmetic instructions together.

>> I assume you think we should stick the unconditional zero here?
>
> We could make the STORE_BY_PIECES_P macro dependent on -mtune, and use 
> the size and alignment to estimate the total cycle count. But zero is 
> reasonable  and we could add a comment suggesting that there's some 
> opportunity for future tuning on multi-issue CPUs.

OK.  I'm not entirely happy that we'll be regressing on those processors,
but I'll let it pass.

Richard


* Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-21  8:21         ` Richard Sandiford
  2007-08-21 10:34           ` Nigel Stephens
@ 2007-08-21 13:54           ` Sandra Loosemore
  2007-08-21 14:22             ` Richard Sandiford
  1 sibling, 1 reply; 30+ messages in thread
From: Sandra Loosemore @ 2007-08-21 13:54 UTC (permalink / raw)
  To: GCC Patches, Nigel Stephens, Guy Morrogh, David Ung,
	Thiemo Seufer, Mark Mitchell, richard

Richard Sandiford wrote:

> What did you think about the other suggestion: moving the magic
> "1 instruction" bound for optimize_size from builtins.c to SET_RATIO?

Perhaps other maintainers can jump in and say something here, but my gut feeling 
is that it doesn't make sense to remove that.  Doing a one-byte store inline 
always has to be cheaper than a function call, and doing the optimization early 
makes more sense than relying on a target-specific expansion, because it might 
allow recognition of other optimization patterns along the way.  I found during 
my earlier testing that setting MOVE_RATIO too low to catch that case had the 
side-effect of causing one of the profile-guided optimization test cases to 
fail, for instance.
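
(Concretely -- the exact register choice will vary, but with the o32 ABI
a memset (p, c, 1) can expand inline to a single store byte,

    sb      $a1,0($a0)

whereas the library call needs at least the argument setup, the call and
its delay slot, and with -mabicalls the $gp restore as well.)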

-Sandra


* Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-21 13:54           ` Sandra Loosemore
@ 2007-08-21 14:22             ` Richard Sandiford
  2007-08-21 20:39               ` Sandra Loosemore
  0 siblings, 1 reply; 30+ messages in thread
From: Richard Sandiford @ 2007-08-21 14:22 UTC (permalink / raw)
  To: Sandra Loosemore
  Cc: GCC Patches, Nigel Stephens, Guy Morrogh, David Ung,
	Thiemo Seufer, Mark Mitchell

Sandra Loosemore <sandra@codesourcery.com> writes:
> Richard Sandiford wrote:
>> What did you think about the other suggestion: moving the magic
>> "1 instruction" bound for optimize_size from builtins.c to SET_RATIO?
>
> Perhaps other maintainers can jump in and say something here, but my
> gut feeling is that it doesn't make sense to remove that.  Doing a
> one-byte store inline always has to be cheaper than a function call,
> and doing the optimization early makes more sense than relying on a
> target-specific expansion, because it might allow recognition of other
> optimization patterns along the way.  I found during my earlier
> testing that setting MOVE_RATIO too low to catch that case had the
> side-effect of causing one of the profile-guided optimization test
> cases to fail, for instance.

Hmm, I'm not sure I follow.  You seem to be implying that 1-byte stores
are always done "by pieces" when optimize_size, but I don't think that's
true.  I was referring to the 1-instruction bound in code like this:

      if (host_integerp (len, 1)
--->	  && !(optimize_size && tree_low_cst (len, 1) > 1)
	  && can_store_by_pieces (tree_low_cst (len, 1),
				  builtin_memset_read_str, &c, dest_align))
	{
	  val_rtx = force_reg (TYPE_MODE (unsigned_char_type_node),
			       val_rtx);
	  store_by_pieces (dest_mem, tree_low_cst (len, 1),
			   builtin_memset_gen_str, val_rtx, dest_align, 0);
	}
      else if (!set_storage_via_setmem (dest_mem, len_rtx, val_rtx,
					dest_align, expected_align,
					expected_size))
	goto do_libcall;
       
This code still uses can_store_by_pieces for single-byte stores when
optimize_size (and can still fall back to setmem or libcalls for that
case if can_store_by_pieces returns false, although I agree that's an
odd thing to do for single-byte stores).  What I was objecting to was
that the target doesn't get any chance to say that _2-byte stores_ (or
bigger) are better implemented "by pieces" than via a setmem or libcall
pattern.

You referred to this limit yourself when I queried the MIPS
optimize_size value of SET_RATIO.  You said that the value only really
matters for 1-byte stores, and looking at the patch, I thought I could
see why.  All calls to can_store_by_pieces with a "true" argument seemed
to be guarded by a check like the above.  So the suggestion to move the
check was really following on from that.  As far as I could tell,
CLEAR_RATIO and CLEAR_BY_PIECES_P have no single-byte limit for
optimize_size, so I was thinking it would be better if SET_RATIO and
SET_BY_PIECES_P didn't either.

Richard


* Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-21 14:22             ` Richard Sandiford
@ 2007-08-21 20:39               ` Sandra Loosemore
  2007-08-21 20:56                 ` Richard Sandiford
  0 siblings, 1 reply; 30+ messages in thread
From: Sandra Loosemore @ 2007-08-21 20:39 UTC (permalink / raw)
  To: GCC Patches, Nigel Stephens, Guy Morrogh, David Ung,
	Thiemo Seufer, Mark Mitchell, richard

Richard Sandiford wrote:

> Hmm, I'm not sure I follow.  You seem to be implying that 1-byte stores
> are always done "by pieces" when optimize_size, but I don't think that's
> true.  I was referring the 1-instruction bound in code like this:
> 
>       if (host_integerp (len, 1)
> --->	  && !(optimize_size && tree_low_cst (len, 1) > 1)
> 	  && can_store_by_pieces (tree_low_cst (len, 1),
> 				  builtin_memset_read_str, &c, dest_align))
> 	{
> 	  val_rtx = force_reg (TYPE_MODE (unsigned_char_type_node),
> 			       val_rtx);
> 	  store_by_pieces (dest_mem, tree_low_cst (len, 1),
> 			   builtin_memset_gen_str, val_rtx, dest_align, 0);
> 	}
>       else if (!set_storage_via_setmem (dest_mem, len_rtx, val_rtx,
> 					dest_align, expected_align,
> 					expected_size))
> 	goto do_libcall;
>        
> This code still uses can_store_by_pieces for single-byte stores when
> optimize_size (and can still fall back to setmem or libcalls for that
> case if can_store_by_pieces returns false, although I agree that's an
> odd thing to do for single-byte stores).  What I was objecting to was
> that the target doesn't get any chance to say that _2-byte stores_ (or
> bigger) are better implemented "by pieces" than via a setmem or libcall
> pattern.
> 
> You referred to this limit yourself when I queried the MIPS
> optimize_size value of SET_RATIO.  You said that the value only really
> matters for 1-byte stores, and looking at the patch, I thought I could
> see why.  All calls to can_store_by_pieces with a "true" argument seemed
> to be guarded by a check like the above.  So the suggestion to move the
> check was really following on from that.  As far as I could tell,
> CLEAR_RATIO and CLEAR_BY_PIECES_P have no single-byte limit for
> optimize_size, so I was thinking it would be better if SET_RATIO and
> SET_BY_PIECES_P didn't either.

Oh, OK.  I misunderstood what you were unhappy about there.  I tried 
just removing the offending clause of the if statement, and got a small 
decrease in size across the board for CSiBE tests (just -Os, -Os 
-mips16, and -Os -mabicalls) on mips32r2.

I'm not sure it's necessary to move the optimize_size test to the 
default definition of SET_RATIO.  We currently have:

#ifndef MOVE_RATIO
#if defined (HAVE_movmemqi) || defined (HAVE_movmemhi) || defined (HAVE_movmemsi) || defined (HAVE_movmemdi) || defined (HAVE_movmemti)
#define MOVE_RATIO 2
#else
/* If we are optimizing for space (-Os), cut down the default move ratio.  */
#define MOVE_RATIO (optimize_size ? 3 : 15)
#endif

#endif
/* If a memory set (to value other than zero) operation would take
    SET_RATIO or more simple move-instruction sequences, we will do a movmem
    or libcall instead.  */
#ifndef SET_RATIO
#define SET_RATIO MOVE_RATIO
#endif

...so SET_RATIO is already guaranteed to default to a small value if 
optimize_size is true.  Only problem is if folks are concerned that this 
change might be a Bad Thing for other back ends; all my other changes 
have tried not to change the behavior on anything other than MIPS.

-Sandra


* Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-21 20:39               ` Sandra Loosemore
@ 2007-08-21 20:56                 ` Richard Sandiford
  2007-08-23 14:35                   ` Sandra Loosemore
  0 siblings, 1 reply; 30+ messages in thread
From: Richard Sandiford @ 2007-08-21 20:56 UTC (permalink / raw)
  To: Sandra Loosemore
  Cc: GCC Patches, Nigel Stephens, Guy Morrogh, David Ung,
	Thiemo Seufer, Mark Mitchell

Sandra Loosemore <sandra@codesourcery.com> writes:
> Richard Sandiford wrote:
>> Hmm, I'm not sure I follow.  You seem to be implying that 1-byte stores
>> are always done "by pieces" when optimize_size, but I don't think that's
>> true.  I was referring the 1-instruction bound in code like this:
>> 
>>       if (host_integerp (len, 1)
>> --->	  && !(optimize_size && tree_low_cst (len, 1) > 1)
>> 	  && can_store_by_pieces (tree_low_cst (len, 1),
>> 				  builtin_memset_read_str, &c, dest_align))
>> 	{
>> 	  val_rtx = force_reg (TYPE_MODE (unsigned_char_type_node),
>> 			       val_rtx);
>> 	  store_by_pieces (dest_mem, tree_low_cst (len, 1),
>> 			   builtin_memset_gen_str, val_rtx, dest_align, 0);
>> 	}
>>       else if (!set_storage_via_setmem (dest_mem, len_rtx, val_rtx,
>> 					dest_align, expected_align,
>> 					expected_size))
>> 	goto do_libcall;
>>        
>> This code still uses can_store_by_pieces for single-byte stores when
>> optimize_size (and can still fall back to setmem or libcalls for that
>> case if can_store_by_pieces returns false, although I agree that's an
>> odd thing to do for single-byte stores).  What I was objecting to was
>> that the target doesn't get any chance to say that _2-byte stores_ (or
>> bigger) are better implemented "by pieces" than via a setmem or libcall
>> pattern.
>> 
>> You referred to this limit yourself when I queried the MIPS
>> optimize_size value of SET_RATIO.  You said that the value only really
>> matters for 1-byte stores, and looking at the patch, I thought I could
>> see why.  All calls to can_store_by_pieces with a "true" argument seemed
>> to be guarded by a check like the above.  So the suggestion to move the
>> check was really following on from that.  As far as I could tell,
>> CLEAR_RATIO and CLEAR_BY_PIECES_P have no single-byte limit for
>> optimize_size, so I was thinking it would be better if SET_RATIO and
>> SET_BY_PIECES_P didn't either.
>
> Oh, OK.  I misunderstood what you were unhappy about there.  I tried 
> just removing the offending clause of the if statement, and got a small 
> decrease in size across the board for CSiBE tests (just -Os, -Os 
> -mips16, and -Os -mabicalls) on mips32r2.
>
> I'm not sure it's necessary to move the optimize_size test to the 
> default definition of SET_RATIO.  We currently have:
>
> #ifndef MOVE_RATIO
> #if defined (HAVE_movmemqi) || defined (HAVE_movmemhi) || defined 
> (HAVE_movmemsi) || defined (HAVE_movmemdi) || defined (HAVE_movmemti)
> #define MOVE_RATIO 2
> #else
> /* If we are optimizing for space (-Os), cut down the default move 
> ratio.  */
> #define MOVE_RATIO (optimize_size ? 3 : 15)
> #endif
>
> #endif
> /* If a memory set (to value other than zero) operation would take
>     SET_RATIO or more simple move-instruction sequences, we will do a movmem
>     or libcall instead.  */
> #ifndef SET_RATIO
> #define SET_RATIO MOVE_RATIO
> #endif
>
> ...so SET_RATIO is already guaranteed to default to a small value if 
> optimize_size is true.  Only problem is if folks are concerned that this 
> change might be a Bad Thing for other back ends; all my other changes 
> have tried not to change the behavior on anything other than MIPS.

Ah, good point.  FWIW, as far as the MIPS port goes, just removing the
test would certainly be fine by me.  It's good to hear that it does
indeed bring about a size improvement.

Richard


* Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-21 20:56                 ` Richard Sandiford
@ 2007-08-23 14:35                   ` Sandra Loosemore
  2007-08-23 14:44                     ` Richard Sandiford
  2007-08-24 22:06                     ` Mark Mitchell
  0 siblings, 2 replies; 30+ messages in thread
From: Sandra Loosemore @ 2007-08-23 14:35 UTC (permalink / raw)
  To: GCC Patches, Nigel Stephens, Guy Morrogh, David Ung,
	Thiemo Seufer, Mark Mitchell, richard

[-- Attachment #1: Type: text/plain, Size: 725 bytes --]

I think this version of the patch addresses all the issues Richard 
raised with the last version, unless I've gotten confused again and let 
something slip through the cracks.  I experimented with tweaking all of 
MOVE_RATIO/CLEAR_RATIO/SET_RATIO and came up with values that produce 
good -Os results in CSiBE as well as making sense in terms of the base 
MIPS_CALL_RATIO versus how many instructions would be required for each 
bytewise move/clear/set.  Overall I am getting just over 0.5% 
improvement with -Os from this patch now.

Mark, could you take another look at this as well, since I included 
Richard's suggested change to remove the optimize_size check from the 
target-independent code in builtins.c?

-Sandra


[-- Attachment #2: 31b-frob-by-pieces.log --]
[-- Type: text/x-log, Size: 1326 bytes --]

2007-08-22  Sandra Loosemore  <sandra@codesourcery.com>
            Nigel Stephens <nigel@mips.com>

	PR target/11787

	gcc/

	* doc/tm.texi (SET_RATIO, SET_BY_PIECES_P): Document new macros.
	(STORE_BY_PIECES_P): No longer applies to __builtin_memset.
	* expr.c (SET_BY_PIECES_P): Define.
	(can_store_by_pieces, store_by_pieces): Add MEMSETP argument; use
	it to decide whether to use SET_BY_PIECES_P or STORE_BY_PIECES_P.
	* expr.h (SET_RATIO): Define.
	(can_store_by_pieces, store_by_pieces):	Update prototypes.
	* builtins.c (expand_builtin_memcpy): Pass MEMSETP argument to
	can_store_by_pieces/store_by_pieces.
	(expand_builtin_memcpy_args): Likewise.
	(expand_builtin_strncpy): Likewise.
	(expand_builtin_memset_args): Likewise.  Also remove special case
	for optimize_size so that can_store_by_pieces/SET_BY_PIECES_P can
	decide what to do instead.
	* value-prof.c (tree_stringops_transform): Pass MEMSETP argument
	to can_store_by_pieces.

	* config/sh/sh.h (SET_BY_PIECES_P): Clone from STORE_BY_PIECES_P.
	* config/s390/s390.h (SET_BY_PIECES_P): Likewise.

	* config/mips/mips.opt (mmemcpy): Change from Var to Mask.
	* config/mips/mips.c (override_options): Make -Os default to -mmemcpy.
	* config/mips/mips.h (MIPS_CALL_RATIO): Define.
	(MOVE_RATIO, CLEAR_RATIO, SET_RATIO): Define.
	(STORE_BY_PIECES_P): Define.

[-- Attachment #3: 31b-frob-by-pieces.patch --]
[-- Type: text/x-patch, Size: 22805 bytes --]

Index: gcc/doc/tm.texi
===================================================================
*** gcc/doc/tm.texi	(revision 127324)
--- gcc/doc/tm.texi	(working copy)
*************** will be used.  Defaults to 1 if @code{mo
*** 5893,5904 ****
  than @code{CLEAR_RATIO}.
  @end defmac
  
  @defmac STORE_BY_PIECES_P (@var{size}, @var{alignment})
  A C expression used to determine whether @code{store_by_pieces} will be
! used to set a chunk of memory to a constant value, or whether some other
! mechanism will be used.  Used by @code{__builtin_memset} when storing
! values other than constant zero and by @code{__builtin_strcpy} when
! when called with a constant source string.
  Defaults to 1 if @code{move_by_pieces_ninsns} returns less
  than @code{MOVE_RATIO}.
  @end defmac
--- 5893,5922 ----
  than @code{CLEAR_RATIO}.
  @end defmac
  
+ @defmac SET_RATIO
+ The threshold of number of scalar move insns, @emph{below} which a sequence
+ of insns should be generated to set memory to a constant value, instead of
+ a block set insn or a library call.  
+ Increasing the value will always make code faster, but
+ eventually incurs high cost in increased code size.
+ 
+ If you don't define this, it defaults to the value of @code{MOVE_RATIO}.
+ @end defmac
+ 
+ @defmac SET_BY_PIECES_P (@var{size}, @var{alignment})
+ A C expression used to determine whether @code{store_by_pieces} will be
+ used to set a chunk of memory to a constant value, or whether some 
+ other mechanism will be used.  Used by @code{__builtin_memset} when 
+ storing values other than constant zero.
+ Defaults to 1 if @code{move_by_pieces_ninsns} returns less
+ than @code{SET_RATIO}.
+ @end defmac
+ 
  @defmac STORE_BY_PIECES_P (@var{size}, @var{alignment})
  A C expression used to determine whether @code{store_by_pieces} will be
! used to set a chunk of memory to a constant string value, or whether some 
! other mechanism will be used.  Used by @code{__builtin_strcpy} when
! called with a constant source string.
  Defaults to 1 if @code{move_by_pieces_ninsns} returns less
  than @code{MOVE_RATIO}.
  @end defmac
Index: gcc/expr.c
===================================================================
*** gcc/expr.c	(revision 127324)
--- gcc/expr.c	(working copy)
*************** static bool float_extend_from_mem[NUM_MA
*** 186,193 ****
  #endif
  
  /* This macro is used to determine whether store_by_pieces should be
!    called to "memset" storage with byte values other than zero, or
!    to "memcpy" storage when the source is a constant string.  */
  #ifndef STORE_BY_PIECES_P
  #define STORE_BY_PIECES_P(SIZE, ALIGN) \
    (move_by_pieces_ninsns (SIZE, ALIGN, STORE_MAX_PIECES + 1) \
--- 186,200 ----
  #endif
  
  /* This macro is used to determine whether store_by_pieces should be
!    called to "memset" storage with byte values other than zero.  */
! #ifndef SET_BY_PIECES_P
! #define SET_BY_PIECES_P(SIZE, ALIGN) \
!   (move_by_pieces_ninsns (SIZE, ALIGN, STORE_MAX_PIECES + 1) \
!    < (unsigned int) SET_RATIO)
! #endif
! 
! /* This macro is used to determine whether store_by_pieces should be
!    called to "memcpy" storage when the source is a constant string.  */
  #ifndef STORE_BY_PIECES_P
  #define STORE_BY_PIECES_P(SIZE, ALIGN) \
    (move_by_pieces_ninsns (SIZE, ALIGN, STORE_MAX_PIECES + 1) \
*************** use_group_regs (rtx *call_fusage, rtx re
*** 2191,2203 ****
  /* Determine whether the LEN bytes generated by CONSTFUN can be
     stored to memory using several move instructions.  CONSTFUNDATA is
     a pointer which will be passed as argument in every CONSTFUN call.
!    ALIGN is maximum alignment we can assume.  Return nonzero if a
!    call to store_by_pieces should succeed.  */
  
  int
  can_store_by_pieces (unsigned HOST_WIDE_INT len,
  		     rtx (*constfun) (void *, HOST_WIDE_INT, enum machine_mode),
! 		     void *constfundata, unsigned int align)
  {
    unsigned HOST_WIDE_INT l;
    unsigned int max_size;
--- 2198,2211 ----
  /* Determine whether the LEN bytes generated by CONSTFUN can be
     stored to memory using several move instructions.  CONSTFUNDATA is
     a pointer which will be passed as argument in every CONSTFUN call.
!    ALIGN is maximum alignment we can assume.  MEMSETP is true if this is
!    a memset operation and false if it's a copy of a constant string.
!    Return nonzero if a call to store_by_pieces should succeed.  */
  
  int
  can_store_by_pieces (unsigned HOST_WIDE_INT len,
  		     rtx (*constfun) (void *, HOST_WIDE_INT, enum machine_mode),
! 		     void *constfundata, unsigned int align, bool memsetp)
  {
    unsigned HOST_WIDE_INT l;
    unsigned int max_size;
*************** can_store_by_pieces (unsigned HOST_WIDE_
*** 2210,2216 ****
    if (len == 0)
      return 1;
  
!   if (! STORE_BY_PIECES_P (len, align))
      return 0;
  
    tmode = mode_for_size (STORE_MAX_PIECES * BITS_PER_UNIT, MODE_INT, 1);
--- 2218,2226 ----
    if (len == 0)
      return 1;
  
!   if (! (memsetp 
! 	 ? SET_BY_PIECES_P (len, align)
! 	 : STORE_BY_PIECES_P (len, align)))
      return 0;
  
    tmode = mode_for_size (STORE_MAX_PIECES * BITS_PER_UNIT, MODE_INT, 1);
*************** can_store_by_pieces (unsigned HOST_WIDE_
*** 2285,2291 ****
  /* Generate several move instructions to store LEN bytes generated by
     CONSTFUN to block TO.  (A MEM rtx with BLKmode).  CONSTFUNDATA is a
     pointer which will be passed as argument in every CONSTFUN call.
!    ALIGN is maximum alignment we can assume.
     If ENDP is 0 return to, if ENDP is 1 return memory at the end ala
     mempcpy, and if ENDP is 2 return memory the end minus one byte ala
     stpcpy.  */
--- 2295,2302 ----
  /* Generate several move instructions to store LEN bytes generated by
     CONSTFUN to block TO.  (A MEM rtx with BLKmode).  CONSTFUNDATA is a
     pointer which will be passed as argument in every CONSTFUN call.
!    ALIGN is maximum alignment we can assume.  MEMSETP is true if this is
!    a memset operation and false if it's a copy of a constant string.
     If ENDP is 0 return to, if ENDP is 1 return memory at the end ala
     mempcpy, and if ENDP is 2 return memory the end minus one byte ala
     stpcpy.  */
*************** can_store_by_pieces (unsigned HOST_WIDE_
*** 2293,2299 ****
  rtx
  store_by_pieces (rtx to, unsigned HOST_WIDE_INT len,
  		 rtx (*constfun) (void *, HOST_WIDE_INT, enum machine_mode),
! 		 void *constfundata, unsigned int align, int endp)
  {
    struct store_by_pieces data;
  
--- 2304,2310 ----
  rtx
  store_by_pieces (rtx to, unsigned HOST_WIDE_INT len,
  		 rtx (*constfun) (void *, HOST_WIDE_INT, enum machine_mode),
! 		 void *constfundata, unsigned int align, bool memsetp, int endp)
  {
    struct store_by_pieces data;
  
*************** store_by_pieces (rtx to, unsigned HOST_W
*** 2303,2309 ****
        return to;
      }
  
!   gcc_assert (STORE_BY_PIECES_P (len, align));
    data.constfun = constfun;
    data.constfundata = constfundata;
    data.len = len;
--- 2314,2322 ----
        return to;
      }
  
!   gcc_assert (memsetp
! 	      ? SET_BY_PIECES_P (len, align)
! 	      : STORE_BY_PIECES_P (len, align));
    data.constfun = constfun;
    data.constfundata = constfundata;
    data.len = len;
Index: gcc/expr.h
===================================================================
*** gcc/expr.h	(revision 127324)
--- gcc/expr.h	(working copy)
*************** enum expand_modifier {EXPAND_NORMAL = 0,
*** 84,89 ****
--- 84,96 ----
  #define CLEAR_RATIO (optimize_size ? 3 : 15)
  #endif
  #endif
+ 
+ /* If a memory set (to value other than zero) operation would take
+    SET_RATIO or more simple move-instruction sequences, we will do a movmem
+    or libcall instead.  */
+ #ifndef SET_RATIO
+ #define SET_RATIO MOVE_RATIO
+ #endif
  \f
  enum direction {none, upward, downward};
  
*************** extern int can_move_by_pieces (unsigned 
*** 443,462 ****
     CONSTFUN with several move instructions by store_by_pieces
     function.  CONSTFUNDATA is a pointer which will be passed as argument
     in every CONSTFUN call.
!    ALIGN is maximum alignment we can assume.  */
  extern int can_store_by_pieces (unsigned HOST_WIDE_INT,
  				rtx (*) (void *, HOST_WIDE_INT,
  					 enum machine_mode),
! 				void *, unsigned int);
  
  /* Generate several move instructions to store LEN bytes generated by
     CONSTFUN to block TO.  (A MEM rtx with BLKmode).  CONSTFUNDATA is a
     pointer which will be passed as argument in every CONSTFUN call.
     ALIGN is maximum alignment we can assume.
     Returns TO + LEN.  */
  extern rtx store_by_pieces (rtx, unsigned HOST_WIDE_INT,
  			    rtx (*) (void *, HOST_WIDE_INT, enum machine_mode),
! 			    void *, unsigned int, int);
  
  /* Emit insns to set X from Y.  */
  extern rtx emit_move_insn (rtx, rtx);
--- 450,472 ----
     CONSTFUN with several move instructions by store_by_pieces
     function.  CONSTFUNDATA is a pointer which will be passed as argument
     in every CONSTFUN call.
!    ALIGN is maximum alignment we can assume.
!    MEMSETP is true if this is a real memset/bzero, not a copy
!    of a const string.  */
  extern int can_store_by_pieces (unsigned HOST_WIDE_INT,
  				rtx (*) (void *, HOST_WIDE_INT,
  					 enum machine_mode),
! 				void *, unsigned int, bool);
  
  /* Generate several move instructions to store LEN bytes generated by
     CONSTFUN to block TO.  (A MEM rtx with BLKmode).  CONSTFUNDATA is a
     pointer which will be passed as argument in every CONSTFUN call.
     ALIGN is maximum alignment we can assume.
+    MEMSETP is true if this is a real memset/bzero, not a copy.
     Returns TO + LEN.  */
  extern rtx store_by_pieces (rtx, unsigned HOST_WIDE_INT,
  			    rtx (*) (void *, HOST_WIDE_INT, enum machine_mode),
! 			    void *, unsigned int, bool, int);
  
  /* Emit insns to set X from Y.  */
  extern rtx emit_move_insn (rtx, rtx);
Index: gcc/builtins.c
===================================================================
*** gcc/builtins.c	(revision 127324)
--- gcc/builtins.c	(working copy)
*************** expand_builtin_memcpy (tree exp, rtx tar
*** 3371,3381 ****
  	  && GET_CODE (len_rtx) == CONST_INT
  	  && (unsigned HOST_WIDE_INT) INTVAL (len_rtx) <= strlen (src_str) + 1
  	  && can_store_by_pieces (INTVAL (len_rtx), builtin_memcpy_read_str,
! 				  (void *) src_str, dest_align))
  	{
  	  dest_mem = store_by_pieces (dest_mem, INTVAL (len_rtx),
  				      builtin_memcpy_read_str,
! 				      (void *) src_str, dest_align, 0);
  	  dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX);
  	  dest_mem = convert_memory_address (ptr_mode, dest_mem);
  	  return dest_mem;
--- 3371,3381 ----
  	  && GET_CODE (len_rtx) == CONST_INT
  	  && (unsigned HOST_WIDE_INT) INTVAL (len_rtx) <= strlen (src_str) + 1
  	  && can_store_by_pieces (INTVAL (len_rtx), builtin_memcpy_read_str,
! 				  (void *) src_str, dest_align, false))
  	{
  	  dest_mem = store_by_pieces (dest_mem, INTVAL (len_rtx),
  				      builtin_memcpy_read_str,
! 				      (void *) src_str, dest_align, false, 0);
  	  dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX);
  	  dest_mem = convert_memory_address (ptr_mode, dest_mem);
  	  return dest_mem;
*************** expand_builtin_mempcpy_args (tree dest, 
*** 3484,3496 ****
  	  && GET_CODE (len_rtx) == CONST_INT
  	  && (unsigned HOST_WIDE_INT) INTVAL (len_rtx) <= strlen (src_str) + 1
  	  && can_store_by_pieces (INTVAL (len_rtx), builtin_memcpy_read_str,
! 				  (void *) src_str, dest_align))
  	{
  	  dest_mem = get_memory_rtx (dest, len);
  	  set_mem_align (dest_mem, dest_align);
  	  dest_mem = store_by_pieces (dest_mem, INTVAL (len_rtx),
  				      builtin_memcpy_read_str,
! 				      (void *) src_str, dest_align, endp);
  	  dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX);
  	  dest_mem = convert_memory_address (ptr_mode, dest_mem);
  	  return dest_mem;
--- 3484,3497 ----
  	  && GET_CODE (len_rtx) == CONST_INT
  	  && (unsigned HOST_WIDE_INT) INTVAL (len_rtx) <= strlen (src_str) + 1
  	  && can_store_by_pieces (INTVAL (len_rtx), builtin_memcpy_read_str,
! 				  (void *) src_str, dest_align, false))
  	{
  	  dest_mem = get_memory_rtx (dest, len);
  	  set_mem_align (dest_mem, dest_align);
  	  dest_mem = store_by_pieces (dest_mem, INTVAL (len_rtx),
  				      builtin_memcpy_read_str,
! 				      (void *) src_str, dest_align,
! 				      false, endp);
  	  dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX);
  	  dest_mem = convert_memory_address (ptr_mode, dest_mem);
  	  return dest_mem;
*************** expand_builtin_strncpy (tree exp, rtx ta
*** 3832,3844 ****
  	  if (!p || dest_align == 0 || !host_integerp (len, 1)
  	      || !can_store_by_pieces (tree_low_cst (len, 1),
  				       builtin_strncpy_read_str,
! 				       (void *) p, dest_align))
  	    return NULL_RTX;
  
  	  dest_mem = get_memory_rtx (dest, len);
  	  store_by_pieces (dest_mem, tree_low_cst (len, 1),
  			   builtin_strncpy_read_str,
! 			   (void *) p, dest_align, 0);
  	  dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX);
  	  dest_mem = convert_memory_address (ptr_mode, dest_mem);
  	  return dest_mem;
--- 3833,3845 ----
  	  if (!p || dest_align == 0 || !host_integerp (len, 1)
  	      || !can_store_by_pieces (tree_low_cst (len, 1),
  				       builtin_strncpy_read_str,
! 				       (void *) p, dest_align, false))
  	    return NULL_RTX;
  
  	  dest_mem = get_memory_rtx (dest, len);
  	  store_by_pieces (dest_mem, tree_low_cst (len, 1),
  			   builtin_strncpy_read_str,
! 			   (void *) p, dest_align, false, 0);
  	  dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX);
  	  dest_mem = convert_memory_address (ptr_mode, dest_mem);
  	  return dest_mem;
*************** expand_builtin_memset_args (tree dest, t
*** 3966,3979 ****
         * We can't pass builtin_memset_gen_str as that emits RTL.  */
        c = 1;
        if (host_integerp (len, 1)
- 	  && !(optimize_size && tree_low_cst (len, 1) > 1)
  	  && can_store_by_pieces (tree_low_cst (len, 1),
! 				  builtin_memset_read_str, &c, dest_align))
  	{
  	  val_rtx = force_reg (TYPE_MODE (unsigned_char_type_node),
  			       val_rtx);
  	  store_by_pieces (dest_mem, tree_low_cst (len, 1),
! 			   builtin_memset_gen_str, val_rtx, dest_align, 0);
  	}
        else if (!set_storage_via_setmem (dest_mem, len_rtx, val_rtx,
  					dest_align, expected_align,
--- 3967,3981 ----
         * We can't pass builtin_memset_gen_str as that emits RTL.  */
        c = 1;
        if (host_integerp (len, 1)
  	  && can_store_by_pieces (tree_low_cst (len, 1),
! 				  builtin_memset_read_str, &c, dest_align,
! 				  true))
  	{
  	  val_rtx = force_reg (TYPE_MODE (unsigned_char_type_node),
  			       val_rtx);
  	  store_by_pieces (dest_mem, tree_low_cst (len, 1),
! 			   builtin_memset_gen_str, val_rtx, dest_align,
! 			   true, 0);
  	}
        else if (!set_storage_via_setmem (dest_mem, len_rtx, val_rtx,
  					dest_align, expected_align,
*************** expand_builtin_memset_args (tree dest, t
*** 3991,4001 ****
    if (c)
      {
        if (host_integerp (len, 1)
- 	  && !(optimize_size && tree_low_cst (len, 1) > 1)
  	  && can_store_by_pieces (tree_low_cst (len, 1),
! 				  builtin_memset_read_str, &c, dest_align))
  	store_by_pieces (dest_mem, tree_low_cst (len, 1),
! 			 builtin_memset_read_str, &c, dest_align, 0);
        else if (!set_storage_via_setmem (dest_mem, len_rtx, GEN_INT (c),
  					dest_align, expected_align,
  					expected_size))
--- 3993,4003 ----
    if (c)
      {
        if (host_integerp (len, 1)
  	  && can_store_by_pieces (tree_low_cst (len, 1),
! 				  builtin_memset_read_str, &c, dest_align,
! 				  true))
  	store_by_pieces (dest_mem, tree_low_cst (len, 1),
! 			 builtin_memset_read_str, &c, dest_align, true, 0);
        else if (!set_storage_via_setmem (dest_mem, len_rtx, GEN_INT (c),
  					dest_align, expected_align,
  					expected_size))
Index: gcc/value-prof.c
===================================================================
*** gcc/value-prof.c	(revision 127324)
--- gcc/value-prof.c	(working copy)
*************** tree_stringops_transform (block_stmt_ite
*** 1392,1404 ****
      case BUILT_IN_MEMSET:
        if (!can_store_by_pieces (val, builtin_memset_read_str,
  				CALL_EXPR_ARG (call, 1),
! 				dest_align))
  	return false;
        break;
      case BUILT_IN_BZERO:
        if (!can_store_by_pieces (val, builtin_memset_read_str,
  				integer_zero_node,
! 				dest_align))
  	return false;
        break;
      default:
--- 1392,1404 ----
      case BUILT_IN_MEMSET:
        if (!can_store_by_pieces (val, builtin_memset_read_str,
  				CALL_EXPR_ARG (call, 1),
! 				dest_align, true))
  	return false;
        break;
      case BUILT_IN_BZERO:
        if (!can_store_by_pieces (val, builtin_memset_read_str,
  				integer_zero_node,
! 				dest_align, true))
  	return false;
        break;
      default:
Index: gcc/config/sh/sh.h
===================================================================
*** gcc/config/sh/sh.h	(revision 127324)
--- gcc/config/sh/sh.h	(working copy)
*************** struct sh_args {
*** 2184,2189 ****
--- 2184,2191 ----
    (move_by_pieces_ninsns (SIZE, ALIGN, STORE_MAX_PIECES + 1) \
     < (TARGET_SMALLCODE ? 2 : ((ALIGN >= 32) ? 16 : 2)))
  
+ #define SET_BY_PIECES_P(SIZE, ALIGN) STORE_BY_PIECES_P(SIZE, ALIGN)
+ 
  /* Macros to check register numbers against specific register classes.  */
  
  /* These assume that REGNO is a hard or pseudo reg number.
Index: gcc/config/s390/s390.h
===================================================================
*** gcc/config/s390/s390.h	(revision 127324)
--- gcc/config/s390/s390.h	(working copy)
*************** extern struct rtx_def *s390_compare_op0,
*** 803,812 ****
      || (TARGET_64BIT && (SIZE) == 8) )
  
  /* This macro is used to determine whether store_by_pieces should be
!    called to "memset" storage with byte values other than zero, or
!    to "memcpy" storage when the source is a constant string.  */
  #define STORE_BY_PIECES_P(SIZE, ALIGN) MOVE_BY_PIECES_P (SIZE, ALIGN)
  
  /* Don't perform CSE on function addresses.  */
  #define NO_FUNCTION_CSE
  
--- 803,815 ----
      || (TARGET_64BIT && (SIZE) == 8) )
  
  /* This macro is used to determine whether store_by_pieces should be
!    called to "memcpy" storage when the source is a constant string.  */
  #define STORE_BY_PIECES_P(SIZE, ALIGN) MOVE_BY_PIECES_P (SIZE, ALIGN)
  
+ /* Likewise to decide whether to "memset" storage with byte values
+    other than zero.  */
+ #define SET_BY_PIECES_P(SIZE, ALIGN) STORE_BY_PIECES_P (SIZE, ALIGN)
+ 
  /* Don't perform CSE on function addresses.  */
  #define NO_FUNCTION_CSE
  
Index: gcc/config/mips/mips.opt
===================================================================
*** gcc/config/mips/mips.opt	(revision 127325)
--- gcc/config/mips/mips.opt	(working copy)
*************** Target Report RejectNegative Mask(LONG64
*** 173,179 ****
  Use a 64-bit long type
  
  mmemcpy
! Target Report Var(TARGET_MEMCPY)
  Don't optimize block moves
  
  mmips-tfile
--- 173,179 ----
  Use a 64-bit long type
  
  mmemcpy
! Target Report Mask(MEMCPY)
  Don't optimize block moves
  
  mmips-tfile
Index: gcc/config/mips/mips.c
===================================================================
*** gcc/config/mips/mips.c	(revision 127325)
--- gcc/config/mips/mips.c	(working copy)
*************** override_options (void)
*** 5299,5304 ****
--- 5299,5309 ----
        flag_delayed_branch = 0;
      }
  
+   /* Prefer a call to memcpy over inline code when optimizing for size,
+      though see MOVE_RATIO in mips.h.  */
+   if (optimize_size && (target_flags_explicit & MASK_MEMCPY) == 0)
+     target_flags |= MASK_MEMCPY;
+ 
  #ifdef MIPS_TFMODE_FORMAT
    REAL_MODE_FORMAT (TFmode) = &MIPS_TFMODE_FORMAT;
  #endif
Index: gcc/config/mips/mips.h
===================================================================
*** gcc/config/mips/mips.h	(revision 127325)
--- gcc/config/mips/mips.h	(working copy)
*************** while (0)
*** 2780,2785 ****
--- 2780,2836 ----
  
  #undef PTRDIFF_TYPE
  #define PTRDIFF_TYPE (POINTER_SIZE == 64 ? "long int" : "int")
+ 
+ /* The base cost of a memcpy call, for MOVE_RATIO and friends.  These
+    values were determined experimentally by benchmarking with CSiBE.
+    In theory, the call overhead is higher for TARGET_ABICALLS (especially
+    for o32 where we have to restore $gp afterwards as well as make an
+    indirect  call), but in practice, bumping this up higher for
+    TARGET_ABICALLS doesn't make much difference to code size.  */
+ 
+ #define MIPS_CALL_RATIO 8
+ 
+ /* Define MOVE_RATIO to encourage use of movmemsi when enabled,
+    since it should always generate code at least as good as
+    move_by_pieces().  But when inline movmemsi pattern is disabled
+    (i.e., with -mips16 or -mmemcpy), instead use a value approximating
+    the length of a memcpy call sequence, so that move_by_pieces will
+    generate inline code if it is shorter than a function call.
+    Since move_by_pieces_ninsns() counts memory-to-memory moves, but
+    we'll have to generate a load/store pair for each, halve the value of 
+    MIPS_CALL_RATIO to take that into account.
+    The default value for MOVE_RATIO when HAVE_movmemsi is true is 2.
+    There is no point to setting it to less than this to try to disable
+    move_by_pieces entirely, because that also disables some desirable 
+    tree-level optimizations, specifically related to optimizing a
+    one-byte string copy into a simple move byte operation.  */
+ 
+ #define MOVE_RATIO \
+   ((TARGET_MIPS16 || TARGET_MEMCPY) ? MIPS_CALL_RATIO / 2 : 2)
+ 
+ /* For CLEAR_RATIO, when optimizing for size, give a better estimate
+    of the length of a memset call, but use the default otherwise.  */
+ 
+ #define CLEAR_RATIO \
+   (optimize_size ? MIPS_CALL_RATIO : 15)
+ 
+ /* This is similar to CLEAR_RATIO, but for a non-zero constant, so when
+    optimizing for size adjust the ratio to account for the overhead of
+    loading the constant and replicating it across the word.  */
+ 
+ #define SET_RATIO \
+   (optimize_size ? MIPS_CALL_RATIO - 2 : 15)
+ 
+ /* STORE_BY_PIECES_P can be used when copying a constant string, but
+    in that case each word takes 3 insns (lui, ori, sw), or more in
+    64-bit mode, instead of 2 (lw, sw).  For now we always fail this
+    and let the move_by_pieces code copy the string from read-only
+    memory.  In the future, this could be tuned further for multi-issue
+    CPUs that can issue stores down one pipe and arithmetic instructions
+    down another; in that case, the lui/ori/sw combination would be a
+    win for long enough strings.  */
+ 
+ #define STORE_BY_PIECES_P(SIZE, ALIGN) 0
  \f
  #ifndef __mips16
  /* Since the bits of the _init and _fini function is spread across

* Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-23 14:35                   ` Sandra Loosemore
@ 2007-08-23 14:44                     ` Richard Sandiford
  2007-08-25  5:35                       ` [committed] " Sandra Loosemore
  2007-08-24 22:06                     ` Mark Mitchell
  1 sibling, 1 reply; 30+ messages in thread
From: Richard Sandiford @ 2007-08-23 14:44 UTC (permalink / raw)
  To: Sandra Loosemore
  Cc: GCC Patches, Nigel Stephens, Guy Morrogh, David Ung,
	Thiemo Seufer, Mark Mitchell

Thanks for your patience and all the iterations and testing.
This version looks good to me.  (And thanks for the work on
the commentary.  I think it really will help the next person
who works on this.)

The MIPS parts are OK with one trivial fix:

Sandra Loosemore <sandra@codesourcery.com> writes:
> + /* The base cost of a memcpy call, for MOVE_RATIO and friends.  These
> +    values were determined experimentally by benchmarking with CSiBE.
> +    In theory, the call overhead is higher for TARGET_ABICALLS (especially
> +    for o32 where we have to restore $gp afterwards as well as make an
> +    indirect  call), but in practice, bumping this up higher for
               ^^
double space

Richard

* Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-23 14:35                   ` Sandra Loosemore
  2007-08-23 14:44                     ` Richard Sandiford
@ 2007-08-24 22:06                     ` Mark Mitchell
  1 sibling, 0 replies; 30+ messages in thread
From: Mark Mitchell @ 2007-08-24 22:06 UTC (permalink / raw)
  To: Sandra Loosemore
  Cc: GCC Patches, Nigel Stephens, Guy Morrogh, David Ung,
	Thiemo Seufer, richard

Sandra Loosemore wrote:

> Mark, could you take another look at this as well, since I included
> Richard's suggested change to remove the optimize_size check from the
> target-independent code in builtins.c?

The target-independent changes look fine.

Thanks,

-- 
Mark Mitchell
CodeSourcery
mark@codesourcery.com
(650) 331-3385 x713

* [committed] Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-23 14:44                     ` Richard Sandiford
@ 2007-08-25  5:35                       ` Sandra Loosemore
  2007-08-25  9:18                         ` Jakub Jelinek
  0 siblings, 1 reply; 30+ messages in thread
From: Sandra Loosemore @ 2007-08-25  5:35 UTC (permalink / raw)
  To: GCC Patches, Nigel Stephens, Guy Morrogh, David Ung,
	Thiemo Seufer, Mark Mitchell, richard, jakub

[-- Attachment #1: Type: text/plain, Size: 533 bytes --]

Richard Sandiford wrote:
> Thanks for your patience and all the iterations and testing.
> This version looks good to me.

Mark Mitchell wrote:
> The target-independent changes look fine.

OK, I've committed the patch.  I found that it collided with this one 
from Jakub:

http://gcc.gnu.org/ml/gcc-patches/2007-08/msg01641.html

but made the obvious correction, and verified that it still builds and 
that CSiBE results on MIPS do not regress from my previous version. 
I've attached the final version of the patch.

-Sandra

[-- Attachment #2: 31c-frob-by-pieces.log --]
[-- Type: text/x-log, Size: 1409 bytes --]

2007-08-24  Sandra Loosemore  <sandra@codesourcery.com>
            Nigel Stephens <nigel@mips.com>

	PR target/11787

	gcc/

	* doc/tm.texi (SET_RATIO, SET_BY_PIECES_P): Document new macros.
	(STORE_BY_PIECES_P): No longer applies to __builtin_memset.
	* expr.c (SET_BY_PIECES_P): Define.
	(can_store_by_pieces, store_by_pieces): Add MEMSETP argument; use
	it to decide whether to use SET_BY_PIECES_P or STORE_BY_PIECES_P.
	(store_expr):  Pass MEMSETP argument to can_store_by_pieces and
	store_by_pieces.
	* expr.h (SET_RATIO): Define.
	(can_store_by_pieces, store_by_pieces):	Update prototypes.
	* builtins.c (expand_builtin_memcpy): Pass MEMSETP argument to
	can_store_by_pieces/store_by_pieces.
	(expand_builtin_memcpy_args): Likewise.
	(expand_builtin_strncpy): Likewise.
	(expand_builtin_memset_args): Likewise.  Also remove special case
	for optimize_size so that can_store_by_pieces/SET_BY_PIECES_P can
	decide what to do instead.
	* value-prof.c (tree_stringops_transform): Pass MEMSETP argument
	to can_store_by_pieces.

	* config/sh/sh.h (SET_BY_PIECES_P): Clone from STORE_BY_PIECES_P.
	* config/s390/s390.h (SET_BY_PIECES_P): Likewise.

	* config/mips/mips.opt (mmemcpy): Change from Var to Mask.
	* config/mips/mips.c (override_options): Make -Os default to -mmemcpy.
	* config/mips/mips.h (MIPS_CALL_RATIO): Define.
	(MOVE_RATIO, CLEAR_RATIO, SET_RATIO): Define.
	(STORE_BY_PIECES_P): Define.

[-- Attachment #3: 31c-frob-by-pieces.patch --]
[-- Type: text/x-patch, Size: 24084 bytes --]

Index: gcc/doc/tm.texi
===================================================================
*** gcc/doc/tm.texi	(revision 127789)
--- gcc/doc/tm.texi	(working copy)
*************** will be used.  Defaults to 1 if @code{mo
*** 5897,5908 ****
  than @code{CLEAR_RATIO}.
  @end defmac
  
  @defmac STORE_BY_PIECES_P (@var{size}, @var{alignment})
  A C expression used to determine whether @code{store_by_pieces} will be
! used to set a chunk of memory to a constant value, or whether some other
! mechanism will be used.  Used by @code{__builtin_memset} when storing
! values other than constant zero and by @code{__builtin_strcpy} when
! when called with a constant source string.
  Defaults to 1 if @code{move_by_pieces_ninsns} returns less
  than @code{MOVE_RATIO}.
  @end defmac
--- 5897,5926 ----
  than @code{CLEAR_RATIO}.
  @end defmac
  
+ @defmac SET_RATIO
+ The threshold of number of scalar move insns, @emph{below} which a sequence
+ of insns should be generated to set memory to a constant value, instead of
+ a block set insn or a library call.  
+ Increasing the value will always make code faster, but
+ eventually incurs high cost in increased code size.
+ 
+ If you don't define this, it defaults to the value of @code{MOVE_RATIO}.
+ @end defmac
+ 
+ @defmac SET_BY_PIECES_P (@var{size}, @var{alignment})
+ A C expression used to determine whether @code{store_by_pieces} will be
+ used to set a chunk of memory to a constant value, or whether some 
+ other mechanism will be used.  Used by @code{__builtin_memset} when 
+ storing values other than constant zero.
+ Defaults to 1 if @code{move_by_pieces_ninsns} returns less
+ than @code{SET_RATIO}.
+ @end defmac
+ 
  @defmac STORE_BY_PIECES_P (@var{size}, @var{alignment})
  A C expression used to determine whether @code{store_by_pieces} will be
! used to set a chunk of memory to a constant string value, or whether some 
! other mechanism will be used.  Used by @code{__builtin_strcpy} when
! called with a constant source string.
  Defaults to 1 if @code{move_by_pieces_ninsns} returns less
  than @code{MOVE_RATIO}.
  @end defmac
Index: gcc/expr.c
===================================================================
*** gcc/expr.c	(revision 127789)
--- gcc/expr.c	(working copy)
*************** static bool float_extend_from_mem[NUM_MA
*** 186,193 ****
  #endif
  
  /* This macro is used to determine whether store_by_pieces should be
!    called to "memset" storage with byte values other than zero, or
!    to "memcpy" storage when the source is a constant string.  */
  #ifndef STORE_BY_PIECES_P
  #define STORE_BY_PIECES_P(SIZE, ALIGN) \
    (move_by_pieces_ninsns (SIZE, ALIGN, STORE_MAX_PIECES + 1) \
--- 186,200 ----
  #endif
  
  /* This macro is used to determine whether store_by_pieces should be
!    called to "memset" storage with byte values other than zero.  */
! #ifndef SET_BY_PIECES_P
! #define SET_BY_PIECES_P(SIZE, ALIGN) \
!   (move_by_pieces_ninsns (SIZE, ALIGN, STORE_MAX_PIECES + 1) \
!    < (unsigned int) SET_RATIO)
! #endif
! 
! /* This macro is used to determine whether store_by_pieces should be
!    called to "memcpy" storage when the source is a constant string.  */
  #ifndef STORE_BY_PIECES_P
  #define STORE_BY_PIECES_P(SIZE, ALIGN) \
    (move_by_pieces_ninsns (SIZE, ALIGN, STORE_MAX_PIECES + 1) \
*************** use_group_regs (rtx *call_fusage, rtx re
*** 2191,2203 ****
  /* Determine whether the LEN bytes generated by CONSTFUN can be
     stored to memory using several move instructions.  CONSTFUNDATA is
     a pointer which will be passed as argument in every CONSTFUN call.
!    ALIGN is maximum alignment we can assume.  Return nonzero if a
!    call to store_by_pieces should succeed.  */
  
  int
  can_store_by_pieces (unsigned HOST_WIDE_INT len,
  		     rtx (*constfun) (void *, HOST_WIDE_INT, enum machine_mode),
! 		     void *constfundata, unsigned int align)
  {
    unsigned HOST_WIDE_INT l;
    unsigned int max_size;
--- 2198,2211 ----
  /* Determine whether the LEN bytes generated by CONSTFUN can be
     stored to memory using several move instructions.  CONSTFUNDATA is
     a pointer which will be passed as argument in every CONSTFUN call.
!    ALIGN is maximum alignment we can assume.  MEMSETP is true if this is
!    a memset operation and false if it's a copy of a constant string.
!    Return nonzero if a call to store_by_pieces should succeed.  */
  
  int
  can_store_by_pieces (unsigned HOST_WIDE_INT len,
  		     rtx (*constfun) (void *, HOST_WIDE_INT, enum machine_mode),
! 		     void *constfundata, unsigned int align, bool memsetp)
  {
    unsigned HOST_WIDE_INT l;
    unsigned int max_size;
*************** can_store_by_pieces (unsigned HOST_WIDE_
*** 2210,2216 ****
    if (len == 0)
      return 1;
  
!   if (! STORE_BY_PIECES_P (len, align))
      return 0;
  
    tmode = mode_for_size (STORE_MAX_PIECES * BITS_PER_UNIT, MODE_INT, 1);
--- 2218,2226 ----
    if (len == 0)
      return 1;
  
!   if (! (memsetp 
! 	 ? SET_BY_PIECES_P (len, align)
! 	 : STORE_BY_PIECES_P (len, align)))
      return 0;
  
    tmode = mode_for_size (STORE_MAX_PIECES * BITS_PER_UNIT, MODE_INT, 1);
*************** can_store_by_pieces (unsigned HOST_WIDE_
*** 2285,2291 ****
  /* Generate several move instructions to store LEN bytes generated by
     CONSTFUN to block TO.  (A MEM rtx with BLKmode).  CONSTFUNDATA is a
     pointer which will be passed as argument in every CONSTFUN call.
!    ALIGN is maximum alignment we can assume.
     If ENDP is 0 return to, if ENDP is 1 return memory at the end ala
     mempcpy, and if ENDP is 2 return memory the end minus one byte ala
     stpcpy.  */
--- 2295,2302 ----
  /* Generate several move instructions to store LEN bytes generated by
     CONSTFUN to block TO.  (A MEM rtx with BLKmode).  CONSTFUNDATA is a
     pointer which will be passed as argument in every CONSTFUN call.
!    ALIGN is maximum alignment we can assume.  MEMSETP is true if this is
!    a memset operation and false if it's a copy of a constant string.
     If ENDP is 0 return to, if ENDP is 1 return memory at the end ala
     mempcpy, and if ENDP is 2 return memory the end minus one byte ala
     stpcpy.  */
*************** can_store_by_pieces (unsigned HOST_WIDE_
*** 2293,2299 ****
  rtx
  store_by_pieces (rtx to, unsigned HOST_WIDE_INT len,
  		 rtx (*constfun) (void *, HOST_WIDE_INT, enum machine_mode),
! 		 void *constfundata, unsigned int align, int endp)
  {
    struct store_by_pieces data;
  
--- 2304,2310 ----
  rtx
  store_by_pieces (rtx to, unsigned HOST_WIDE_INT len,
  		 rtx (*constfun) (void *, HOST_WIDE_INT, enum machine_mode),
! 		 void *constfundata, unsigned int align, bool memsetp, int endp)
  {
    struct store_by_pieces data;
  
*************** store_by_pieces (rtx to, unsigned HOST_W
*** 2303,2309 ****
        return to;
      }
  
!   gcc_assert (STORE_BY_PIECES_P (len, align));
    data.constfun = constfun;
    data.constfundata = constfundata;
    data.len = len;
--- 2314,2322 ----
        return to;
      }
  
!   gcc_assert (memsetp
! 	      ? SET_BY_PIECES_P (len, align)
! 	      : STORE_BY_PIECES_P (len, align));
    data.constfun = constfun;
    data.constfundata = constfundata;
    data.len = len;
*************** store_expr (tree exp, rtx target, int ca
*** 4498,4504 ****
        str_copy_len = MIN (str_copy_len, exp_len);
        if (!can_store_by_pieces (str_copy_len, builtin_strncpy_read_str,
  				(void *) TREE_STRING_POINTER (exp),
! 				MEM_ALIGN (target)))
  	goto normal_expr;
  
        dest_mem = target;
--- 4511,4517 ----
        str_copy_len = MIN (str_copy_len, exp_len);
        if (!can_store_by_pieces (str_copy_len, builtin_strncpy_read_str,
  				(void *) TREE_STRING_POINTER (exp),
! 				MEM_ALIGN (target), false))
  	goto normal_expr;
  
        dest_mem = target;
*************** store_expr (tree exp, rtx target, int ca
*** 4507,4513 ****
  				  str_copy_len, builtin_strncpy_read_str,
  				  (void *) TREE_STRING_POINTER (exp),
  				  MEM_ALIGN (target),
! 				  exp_len > str_copy_len ? 1 : 0);
        if (exp_len > str_copy_len)
  	clear_storage (dest_mem, GEN_INT (exp_len - str_copy_len),
  		       BLOCK_OP_NORMAL);
--- 4520,4527 ----
  				  str_copy_len, builtin_strncpy_read_str,
  				  (void *) TREE_STRING_POINTER (exp),
  				  MEM_ALIGN (target),
! 				  exp_len > str_copy_len ? 1 : 0,
! 				  false);
        if (exp_len > str_copy_len)
  	clear_storage (dest_mem, GEN_INT (exp_len - str_copy_len),
  		       BLOCK_OP_NORMAL);
Index: gcc/expr.h
===================================================================
*** gcc/expr.h	(revision 127789)
--- gcc/expr.h	(working copy)
*************** enum expand_modifier {EXPAND_NORMAL = 0,
*** 84,89 ****
--- 84,96 ----
  #define CLEAR_RATIO (optimize_size ? 3 : 15)
  #endif
  #endif
+ 
+ /* If a memory set (to value other than zero) operation would take
+    SET_RATIO or more simple move-instruction sequences, we will do a movmem
+    or libcall instead.  */
+ #ifndef SET_RATIO
+ #define SET_RATIO MOVE_RATIO
+ #endif
  \f
  enum direction {none, upward, downward};
  
*************** extern int can_move_by_pieces (unsigned 
*** 444,463 ****
     CONSTFUN with several move instructions by store_by_pieces
     function.  CONSTFUNDATA is a pointer which will be passed as argument
     in every CONSTFUN call.
!    ALIGN is maximum alignment we can assume.  */
  extern int can_store_by_pieces (unsigned HOST_WIDE_INT,
  				rtx (*) (void *, HOST_WIDE_INT,
  					 enum machine_mode),
! 				void *, unsigned int);
  
  /* Generate several move instructions to store LEN bytes generated by
     CONSTFUN to block TO.  (A MEM rtx with BLKmode).  CONSTFUNDATA is a
     pointer which will be passed as argument in every CONSTFUN call.
     ALIGN is maximum alignment we can assume.
     Returns TO + LEN.  */
  extern rtx store_by_pieces (rtx, unsigned HOST_WIDE_INT,
  			    rtx (*) (void *, HOST_WIDE_INT, enum machine_mode),
! 			    void *, unsigned int, int);
  
  /* Emit insns to set X from Y.  */
  extern rtx emit_move_insn (rtx, rtx);
--- 451,473 ----
     CONSTFUN with several move instructions by store_by_pieces
     function.  CONSTFUNDATA is a pointer which will be passed as argument
     in every CONSTFUN call.
!    ALIGN is maximum alignment we can assume.
!    MEMSETP is true if this is a real memset/bzero, not a copy
!    of a const string.  */
  extern int can_store_by_pieces (unsigned HOST_WIDE_INT,
  				rtx (*) (void *, HOST_WIDE_INT,
  					 enum machine_mode),
! 				void *, unsigned int, bool);
  
  /* Generate several move instructions to store LEN bytes generated by
     CONSTFUN to block TO.  (A MEM rtx with BLKmode).  CONSTFUNDATA is a
     pointer which will be passed as argument in every CONSTFUN call.
     ALIGN is maximum alignment we can assume.
+    MEMSETP is true if this is a real memset/bzero, not a copy.
     Returns TO + LEN.  */
  extern rtx store_by_pieces (rtx, unsigned HOST_WIDE_INT,
  			    rtx (*) (void *, HOST_WIDE_INT, enum machine_mode),
! 			    void *, unsigned int, bool, int);
  
  /* Emit insns to set X from Y.  */
  extern rtx emit_move_insn (rtx, rtx);
Index: gcc/builtins.c
===================================================================
*** gcc/builtins.c	(revision 127789)
--- gcc/builtins.c	(working copy)
*************** expand_builtin_memcpy (tree exp, rtx tar
*** 3331,3341 ****
  	  && GET_CODE (len_rtx) == CONST_INT
  	  && (unsigned HOST_WIDE_INT) INTVAL (len_rtx) <= strlen (src_str) + 1
  	  && can_store_by_pieces (INTVAL (len_rtx), builtin_memcpy_read_str,
! 				  (void *) src_str, dest_align))
  	{
  	  dest_mem = store_by_pieces (dest_mem, INTVAL (len_rtx),
  				      builtin_memcpy_read_str,
! 				      (void *) src_str, dest_align, 0);
  	  dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX);
  	  dest_mem = convert_memory_address (ptr_mode, dest_mem);
  	  return dest_mem;
--- 3331,3341 ----
  	  && GET_CODE (len_rtx) == CONST_INT
  	  && (unsigned HOST_WIDE_INT) INTVAL (len_rtx) <= strlen (src_str) + 1
  	  && can_store_by_pieces (INTVAL (len_rtx), builtin_memcpy_read_str,
! 				  (void *) src_str, dest_align, false))
  	{
  	  dest_mem = store_by_pieces (dest_mem, INTVAL (len_rtx),
  				      builtin_memcpy_read_str,
! 				      (void *) src_str, dest_align, false, 0);
  	  dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX);
  	  dest_mem = convert_memory_address (ptr_mode, dest_mem);
  	  return dest_mem;
*************** expand_builtin_mempcpy_args (tree dest, 
*** 3444,3456 ****
  	  && GET_CODE (len_rtx) == CONST_INT
  	  && (unsigned HOST_WIDE_INT) INTVAL (len_rtx) <= strlen (src_str) + 1
  	  && can_store_by_pieces (INTVAL (len_rtx), builtin_memcpy_read_str,
! 				  (void *) src_str, dest_align))
  	{
  	  dest_mem = get_memory_rtx (dest, len);
  	  set_mem_align (dest_mem, dest_align);
  	  dest_mem = store_by_pieces (dest_mem, INTVAL (len_rtx),
  				      builtin_memcpy_read_str,
! 				      (void *) src_str, dest_align, endp);
  	  dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX);
  	  dest_mem = convert_memory_address (ptr_mode, dest_mem);
  	  return dest_mem;
--- 3444,3457 ----
  	  && GET_CODE (len_rtx) == CONST_INT
  	  && (unsigned HOST_WIDE_INT) INTVAL (len_rtx) <= strlen (src_str) + 1
  	  && can_store_by_pieces (INTVAL (len_rtx), builtin_memcpy_read_str,
! 				  (void *) src_str, dest_align, false))
  	{
  	  dest_mem = get_memory_rtx (dest, len);
  	  set_mem_align (dest_mem, dest_align);
  	  dest_mem = store_by_pieces (dest_mem, INTVAL (len_rtx),
  				      builtin_memcpy_read_str,
! 				      (void *) src_str, dest_align,
! 				      false, endp);
  	  dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX);
  	  dest_mem = convert_memory_address (ptr_mode, dest_mem);
  	  return dest_mem;
*************** expand_builtin_strncpy (tree exp, rtx ta
*** 3792,3804 ****
  	  if (!p || dest_align == 0 || !host_integerp (len, 1)
  	      || !can_store_by_pieces (tree_low_cst (len, 1),
  				       builtin_strncpy_read_str,
! 				       (void *) p, dest_align))
  	    return NULL_RTX;
  
  	  dest_mem = get_memory_rtx (dest, len);
  	  store_by_pieces (dest_mem, tree_low_cst (len, 1),
  			   builtin_strncpy_read_str,
! 			   (void *) p, dest_align, 0);
  	  dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX);
  	  dest_mem = convert_memory_address (ptr_mode, dest_mem);
  	  return dest_mem;
--- 3793,3805 ----
  	  if (!p || dest_align == 0 || !host_integerp (len, 1)
  	      || !can_store_by_pieces (tree_low_cst (len, 1),
  				       builtin_strncpy_read_str,
! 				       (void *) p, dest_align, false))
  	    return NULL_RTX;
  
  	  dest_mem = get_memory_rtx (dest, len);
  	  store_by_pieces (dest_mem, tree_low_cst (len, 1),
  			   builtin_strncpy_read_str,
! 			   (void *) p, dest_align, false, 0);
  	  dest_mem = force_operand (XEXP (dest_mem, 0), NULL_RTX);
  	  dest_mem = convert_memory_address (ptr_mode, dest_mem);
  	  return dest_mem;
*************** expand_builtin_memset_args (tree dest, t
*** 3926,3939 ****
         * We can't pass builtin_memset_gen_str as that emits RTL.  */
        c = 1;
        if (host_integerp (len, 1)
- 	  && !(optimize_size && tree_low_cst (len, 1) > 1)
  	  && can_store_by_pieces (tree_low_cst (len, 1),
! 				  builtin_memset_read_str, &c, dest_align))
  	{
  	  val_rtx = force_reg (TYPE_MODE (unsigned_char_type_node),
  			       val_rtx);
  	  store_by_pieces (dest_mem, tree_low_cst (len, 1),
! 			   builtin_memset_gen_str, val_rtx, dest_align, 0);
  	}
        else if (!set_storage_via_setmem (dest_mem, len_rtx, val_rtx,
  					dest_align, expected_align,
--- 3927,3941 ----
         * We can't pass builtin_memset_gen_str as that emits RTL.  */
        c = 1;
        if (host_integerp (len, 1)
  	  && can_store_by_pieces (tree_low_cst (len, 1),
! 				  builtin_memset_read_str, &c, dest_align,
! 				  true))
  	{
  	  val_rtx = force_reg (TYPE_MODE (unsigned_char_type_node),
  			       val_rtx);
  	  store_by_pieces (dest_mem, tree_low_cst (len, 1),
! 			   builtin_memset_gen_str, val_rtx, dest_align,
! 			   true, 0);
  	}
        else if (!set_storage_via_setmem (dest_mem, len_rtx, val_rtx,
  					dest_align, expected_align,
*************** expand_builtin_memset_args (tree dest, t
*** 3951,3961 ****
    if (c)
      {
        if (host_integerp (len, 1)
- 	  && !(optimize_size && tree_low_cst (len, 1) > 1)
  	  && can_store_by_pieces (tree_low_cst (len, 1),
! 				  builtin_memset_read_str, &c, dest_align))
  	store_by_pieces (dest_mem, tree_low_cst (len, 1),
! 			 builtin_memset_read_str, &c, dest_align, 0);
        else if (!set_storage_via_setmem (dest_mem, len_rtx, GEN_INT (c),
  					dest_align, expected_align,
  					expected_size))
--- 3953,3963 ----
    if (c)
      {
        if (host_integerp (len, 1)
  	  && can_store_by_pieces (tree_low_cst (len, 1),
! 				  builtin_memset_read_str, &c, dest_align,
! 				  true))
  	store_by_pieces (dest_mem, tree_low_cst (len, 1),
! 			 builtin_memset_read_str, &c, dest_align, true, 0);
        else if (!set_storage_via_setmem (dest_mem, len_rtx, GEN_INT (c),
  					dest_align, expected_align,
  					expected_size))
Index: gcc/value-prof.c
===================================================================
*** gcc/value-prof.c	(revision 127789)
--- gcc/value-prof.c	(working copy)
*************** tree_stringops_transform (block_stmt_ite
*** 1392,1404 ****
      case BUILT_IN_MEMSET:
        if (!can_store_by_pieces (val, builtin_memset_read_str,
  				CALL_EXPR_ARG (call, 1),
! 				dest_align))
  	return false;
        break;
      case BUILT_IN_BZERO:
        if (!can_store_by_pieces (val, builtin_memset_read_str,
  				integer_zero_node,
! 				dest_align))
  	return false;
        break;
      default:
--- 1392,1404 ----
      case BUILT_IN_MEMSET:
        if (!can_store_by_pieces (val, builtin_memset_read_str,
  				CALL_EXPR_ARG (call, 1),
! 				dest_align, true))
  	return false;
        break;
      case BUILT_IN_BZERO:
        if (!can_store_by_pieces (val, builtin_memset_read_str,
  				integer_zero_node,
! 				dest_align, true))
  	return false;
        break;
      default:
Index: gcc/config/sh/sh.h
===================================================================
*** gcc/config/sh/sh.h	(revision 127789)
--- gcc/config/sh/sh.h	(working copy)
*************** struct sh_args {
*** 2184,2189 ****
--- 2184,2191 ----
    (move_by_pieces_ninsns (SIZE, ALIGN, STORE_MAX_PIECES + 1) \
     < (TARGET_SMALLCODE ? 2 : ((ALIGN >= 32) ? 16 : 2)))
  
+ #define SET_BY_PIECES_P(SIZE, ALIGN) STORE_BY_PIECES_P(SIZE, ALIGN)
+ 
  /* Macros to check register numbers against specific register classes.  */
  
  /* These assume that REGNO is a hard or pseudo reg number.
Index: gcc/config/s390/s390.h
===================================================================
*** gcc/config/s390/s390.h	(revision 127789)
--- gcc/config/s390/s390.h	(working copy)
*************** extern struct rtx_def *s390_compare_op0,
*** 803,812 ****
      || (TARGET_64BIT && (SIZE) == 8) )
  
  /* This macro is used to determine whether store_by_pieces should be
!    called to "memset" storage with byte values other than zero, or
!    to "memcpy" storage when the source is a constant string.  */
  #define STORE_BY_PIECES_P(SIZE, ALIGN) MOVE_BY_PIECES_P (SIZE, ALIGN)
  
  /* Don't perform CSE on function addresses.  */
  #define NO_FUNCTION_CSE
  
--- 803,815 ----
      || (TARGET_64BIT && (SIZE) == 8) )
  
  /* This macro is used to determine whether store_by_pieces should be
!    called to "memcpy" storage when the source is a constant string.  */
  #define STORE_BY_PIECES_P(SIZE, ALIGN) MOVE_BY_PIECES_P (SIZE, ALIGN)
  
+ /* Likewise to decide whether to "memset" storage with byte values
+    other than zero.  */
+ #define SET_BY_PIECES_P(SIZE, ALIGN) STORE_BY_PIECES_P (SIZE, ALIGN)
+ 
  /* Don't perform CSE on function addresses.  */
  #define NO_FUNCTION_CSE
  
Index: gcc/config/mips/mips.opt
===================================================================
*** gcc/config/mips/mips.opt	(revision 127789)
--- gcc/config/mips/mips.opt	(working copy)
*************** Target Report RejectNegative Mask(LONG64
*** 173,179 ****
  Use a 64-bit long type
  
  mmemcpy
! Target Report Var(TARGET_MEMCPY)
  Don't optimize block moves
  
  mmips-tfile
--- 173,179 ----
  Use a 64-bit long type
  
  mmemcpy
! Target Report Mask(MEMCPY)
  Don't optimize block moves
  
  mmips-tfile
Index: gcc/config/mips/mips.c
===================================================================
*** gcc/config/mips/mips.c	(revision 127789)
--- gcc/config/mips/mips.c	(working copy)
*************** override_options (void)
*** 5323,5328 ****
--- 5323,5333 ----
        flag_delayed_branch = 0;
      }
  
+   /* Prefer a call to memcpy over inline code when optimizing for size,
+      though see MOVE_RATIO in mips.h.  */
+   if (optimize_size && (target_flags_explicit & MASK_MEMCPY) == 0)
+     target_flags |= MASK_MEMCPY;
+ 
  #ifdef MIPS_TFMODE_FORMAT
    REAL_MODE_FORMAT (TFmode) = &MIPS_TFMODE_FORMAT;
  #endif
Index: gcc/config/mips/mips.h
===================================================================
*** gcc/config/mips/mips.h	(revision 127789)
--- gcc/config/mips/mips.h	(working copy)
*************** while (0)
*** 2785,2790 ****
--- 2785,2841 ----
  
  #undef PTRDIFF_TYPE
  #define PTRDIFF_TYPE (POINTER_SIZE == 64 ? "long int" : "int")
+ 
+ /* The base cost of a memcpy call, for MOVE_RATIO and friends.  These
+    values were determined experimentally by benchmarking with CSiBE.
+    In theory, the call overhead is higher for TARGET_ABICALLS (especially
+    for o32 where we have to restore $gp afterwards as well as make an
+    indirect call), but in practice, bumping this up higher for
+    TARGET_ABICALLS doesn't make much difference to code size.  */
+ 
+ #define MIPS_CALL_RATIO 8
+ 
+ /* Define MOVE_RATIO to encourage use of movmemsi when enabled,
+    since it should always generate code at least as good as
+    move_by_pieces().  But when inline movmemsi pattern is disabled
+    (i.e., with -mips16 or -mmemcpy), instead use a value approximating
+    the length of a memcpy call sequence, so that move_by_pieces will
+    generate inline code if it is shorter than a function call.
+    Since move_by_pieces_ninsns() counts memory-to-memory moves, but
+    we'll have to generate a load/store pair for each, halve the value of 
+    MIPS_CALL_RATIO to take that into account.
+    The default value for MOVE_RATIO when HAVE_movmemsi is true is 2.
+    There is no point to setting it to less than this to try to disable
+    move_by_pieces entirely, because that also disables some desirable 
+    tree-level optimizations, specifically related to optimizing a
+    one-byte string copy into a simple move byte operation.  */
+ 
+ #define MOVE_RATIO \
+   ((TARGET_MIPS16 || TARGET_MEMCPY) ? MIPS_CALL_RATIO / 2 : 2)
+ 
+ /* For CLEAR_RATIO, when optimizing for size, give a better estimate
+    of the length of a memset call, but use the default otherwise.  */
+ 
+ #define CLEAR_RATIO \
+   (optimize_size ? MIPS_CALL_RATIO : 15)
+ 
+ /* This is similar to CLEAR_RATIO, but for a non-zero constant, so when
+    optimizing for size adjust the ratio to account for the overhead of
+    loading the constant and replicating it across the word.  */
+ 
+ #define SET_RATIO \
+   (optimize_size ? MIPS_CALL_RATIO - 2 : 15)
+ 
+ /* STORE_BY_PIECES_P can be used when copying a constant string, but
+    in that case each word takes 3 insns (lui, ori, sw), or more in
+    64-bit mode, instead of 2 (lw, sw).  For now we always fail this
+    and let the move_by_pieces code copy the string from read-only
+    memory.  In the future, this could be tuned further for multi-issue
+    CPUs that can issue stores down one pipe and arithmetic instructions
+    down another; in that case, the lui/ori/sw combination would be a
+    win for long enough strings.  */
+ 
+ #define STORE_BY_PIECES_P(SIZE, ALIGN) 0
  \f
  #ifndef __mips16
  /* Since the bits of the _init and _fini function is spread across

* Re: [committed] Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-25  5:35                       ` [committed] " Sandra Loosemore
@ 2007-08-25  9:18                         ` Jakub Jelinek
  2007-08-25  9:58                           ` Jakub Jelinek
                                             ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Jakub Jelinek @ 2007-08-25  9:18 UTC (permalink / raw)
  To: Sandra Loosemore
  Cc: GCC Patches, Nigel Stephens, Guy Morrogh, David Ung,
	Thiemo Seufer, Mark Mitchell, richard

On Fri, Aug 24, 2007 at 08:09:22PM -0400, Sandra Loosemore wrote:
> Richard Sandiford wrote:
> >Thanks for your patience and all the iterations and testing.
> >This version looks good to me.
> 
> Mark Mitchell wrote:
> >The target-independent changes look fine.
> 
> OK, I've committed the patch.  I found that it collided with this one 
> from Jakub:
> 
> http://gcc.gnu.org/ml/gcc-patches/2007-08/msg01641.html
> 
> but made the obvious correction, and verified that it still builds and 
> that CSiBE results on MIPS do not regress from my previous version. 
> I've attached the final version of the patch.

*************** store_expr (tree exp, rtx target, int ca                                                                         
*** 4507,4513 ****                                                                                                               
                                  str_copy_len, builtin_strncpy_read_str,                                                        
                                  (void *) TREE_STRING_POINTER (exp),                                                            
                                  MEM_ALIGN (target),                                                                            
!                                 exp_len > str_copy_len ? 1 : 0);                                                               
        if (exp_len > str_copy_len)                                                                                              
        clear_storage (dest_mem, GEN_INT (exp_len - str_copy_len),                                                               
                       BLOCK_OP_NORMAL);                                                                                         
--- 4520,4527 ----                                                                                                               
                                  str_copy_len, builtin_strncpy_read_str,                                                        
                                  (void *) TREE_STRING_POINTER (exp),                                                            
                                  MEM_ALIGN (target),                                                                            
!                                 exp_len > str_copy_len ? 1 : 0,                                                                
!                                 false);                                                                                        
        if (exp_len > str_copy_len)                                                                                              
        clear_storage (dest_mem, GEN_INT (exp_len - str_copy_len),                                                               
                       BLOCK_OP_NORMAL);                                                                                         

This is wrong.  You added the memsetp argument to store_by_pieces
as the 6th, before the endp argument, but you are passing
exp_len > str_copy_len ? 1 : 0 as memsetp and false as endp.
This will certainly screw up say
struct A { char a[20]; };
void foo (void)
{
  struct A a = { "abc" };
  bar (&a);
}
because in that case exp_len (20) is bigger than str_copy_len
and so clear_storage is needed for the rest of the array.
If we pass false == 0 as endp, it means the returned endp
will point to the beginning of the array (&a.a[0]), rather
than where store_by_pieces stopped.
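
For illustration only, here is a minimal stand-alone sketch (not the GCC
code itself; the function, alignment and lengths are made up) of why the
swapped call compiles without any diagnostic: bool and int convert to
each other implicitly in C, so the call still type-checks and only the
behavior changes.

#include <stdbool.h>
#include <stdio.h>

/* Stand-in for store_by_pieces' trailing parameters: ALIGN, then
   MEMSETP, then ENDP, in that order.  */
static void
sketch_store_by_pieces (unsigned int align, bool memsetp, int endp)
{
  printf ("align=%u memsetp=%d endp=%d\n", align, memsetp, endp);
}

int
main (void)
{
  int exp_len = 20, str_copy_len = 4;

  /* Intended order: MEMSETP is false, ENDP is 1 because a tail is left
     over for clear_storage.  */
  sketch_store_by_pieces (4, false, exp_len > str_copy_len ? 1 : 0);

  /* The committed order: the int lands in MEMSETP and false lands in
     ENDP, yet the compiler has nothing to warn about because of the
     implicit bool/int conversions.  */
  sketch_store_by_pieces (4, exp_len > str_copy_len ? 1 : 0, false);
  return 0;
}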

	Jakub

* Re: [committed] Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-25  9:18                         ` Jakub Jelinek
@ 2007-08-25  9:58                           ` Jakub Jelinek
  2007-08-25 14:30                           ` gcc.c-torture/execute/20030221-1.c regressed with "fine-tuning for can_store_by_pieces" Hans-Peter Nilsson
  2007-08-25 14:40                           ` [committed] Re: PATCH: fine-tuning for can_store_by_pieces Sandra Loosemore
  2 siblings, 0 replies; 30+ messages in thread
From: Jakub Jelinek @ 2007-08-25  9:58 UTC (permalink / raw)
  To: Sandra Loosemore
  Cc: GCC Patches, Nigel Stephens, Guy Morrogh, David Ung,
	Thiemo Seufer, Mark Mitchell, richard

[-- Attachment #1: Type: text/plain, Size: 3273 bytes --]

On Sat, Aug 25, 2007 at 02:59:54AM -0400, Jakub Jelinek wrote:
> *************** store_expr (tree exp, rtx target, int ca                                                                         
> *** 4507,4513 ****                                                                                                               
>                                   str_copy_len, builtin_strncpy_read_str,                                                        
>                                   (void *) TREE_STRING_POINTER (exp),                                                            
>                                   MEM_ALIGN (target),                                                                            
> !                                 exp_len > str_copy_len ? 1 : 0);                                                               
>         if (exp_len > str_copy_len)                                                                                              
>         clear_storage (dest_mem, GEN_INT (exp_len - str_copy_len),                                                               
>                        BLOCK_OP_NORMAL);                                                                                         
> --- 4520,4527 ----                                                                                                               
>                                   str_copy_len, builtin_strncpy_read_str,                                                        
>                                   (void *) TREE_STRING_POINTER (exp),                                                            
>                                   MEM_ALIGN (target),                                                                            
> !                                 exp_len > str_copy_len ? 1 : 0,                                                                
> !                                 false);                                                                                        
>         if (exp_len > str_copy_len)                                                                                              
>         clear_storage (dest_mem, GEN_INT (exp_len - str_copy_len),                                                               
>                        BLOCK_OP_NORMAL);                                                                                         
> 
> This is wrong.  You added the memsetp argument to store_by_pieces
> as the 6th, before the endp argument, but you are passing
> exp_len > str_copy_len ? 1 : 0 as memsetp and false as endp.
> This will certainly screw up say
> struct A { char a[20]; };
> void foo (void)
> {
>   struct A a = { "abc" };
>   bar (&a);
> }
> because in that case exp_len (20) is bigger than str_copy_len
> and so clear_storage is needed for the rest of the array.
> If we pass false == 0 as endp, it means the returned endp
> will point to the beginning of the array (&a.a[0]), rather
> than where store_by_pieces stopped.

I'm surprised this wasn't caught by make check.  I have committed the
following as obvious.  The new testcase fails after your patch, at least
on x86_64-linux and i686-linux, and no longer does after swapping the
arguments.

	Jakub

[-- Attachment #2: O --]
[-- Type: text/plain, Size: 1815 bytes --]

2007-08-25  Jakub Jelinek  <jakub@redhat.com>

	* expr.c (store_expr): Fix order of store_by_pieces arguments.

	* gcc.dg/array-init-2.c: New test.

--- gcc/expr.c.jj	2007-08-25 09:06:46.000000000 +0200
+++ gcc/expr.c	2007-08-25 09:24:07.000000000 +0200
@@ -4519,9 +4519,8 @@ store_expr (tree exp, rtx target, int ca
       dest_mem = store_by_pieces (dest_mem,
 				  str_copy_len, builtin_strncpy_read_str,
 				  (void *) TREE_STRING_POINTER (exp),
-				  MEM_ALIGN (target),
-				  exp_len > str_copy_len ? 1 : 0,
-				  false);
+				  MEM_ALIGN (target), false,
+				  exp_len > str_copy_len ? 1 : 0);
       if (exp_len > str_copy_len)
 	clear_storage (dest_mem, GEN_INT (exp_len - str_copy_len),
 		       BLOCK_OP_NORMAL);
--- gcc/testsuite/gcc.dg/array-init-2.c.jj	2007-08-25 09:23:19.000000000 +0200
+++ gcc/testsuite/gcc.dg/array-init-2.c	2007-08-25 09:17:39.000000000 +0200
@@ -0,0 +1,51 @@
+/* Test array initializion by store_by_pieces.  */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+struct A { char c[10]; };
+extern void abort (void);
+
+void
+__attribute__((noinline))
+check (struct A * a, int b)
+{
+  const char *p;
+  switch (b)
+    {
+    case 0:
+      p = "abcdefghi";
+      break;
+    case 1:
+      p = "j\0\0\0\0\0\0\0\0";
+      break;
+    case 2:
+      p = "kl\0\0\0\0\0\0\0";
+      break;
+    case 3:
+      p = "mnop\0\0\0\0\0";
+      break;
+    case 4:
+      p = "qrstuvwx\0";
+      break;
+    default:
+      abort ();
+    }
+  if (__builtin_memcmp (a->c, p, 10) != 0)
+    abort ();
+}
+
+int
+main (void)
+{
+  struct A a = { "abcdefghi" };
+  check (&a, 0);
+  struct A b = { "j" };
+  check (&b, 1);
+  struct A c = { "kl" };
+  check (&c, 2);
+  struct A d = { "mnop" };
+  check (&d, 3);
+  struct A e = { "qrstuvwx" };
+  check (&e, 4);
+  return 0;
+}

* gcc.c-torture/execute/20030221-1.c regressed with "fine-tuning for  can_store_by_pieces"
  2007-08-25  9:18                         ` Jakub Jelinek
  2007-08-25  9:58                           ` Jakub Jelinek
@ 2007-08-25 14:30                           ` Hans-Peter Nilsson
  2007-08-25 14:40                           ` [committed] Re: PATCH: fine-tuning for can_store_by_pieces Sandra Loosemore
  2 siblings, 0 replies; 30+ messages in thread
From: Hans-Peter Nilsson @ 2007-08-25 14:30 UTC (permalink / raw)
  To: Sandra Loosemore; +Cc: Jakub Jelinek, GCC Patches, Nigel Stephens

On Sat, 25 Aug 2007, Jakub Jelinek wrote:
> On Fri, Aug 24, 2007 at 08:09:22PM -0400, Sandra Loosemore wrote:
> *************** store_expr (tree exp, rtx target, int ca
> *** 4507,4513 ****
>                                   str_copy_len, builtin_strncpy_read_str,
>                                   (void *) TREE_STRING_POINTER (exp),
>                                   MEM_ALIGN (target),
> !                                 exp_len > str_copy_len ? 1 : 0);
>         if (exp_len > str_copy_len)
>         clear_storage (dest_mem, GEN_INT (exp_len - str_copy_len),
>                        BLOCK_OP_NORMAL);
> --- 4520,4527 ----
>                                   str_copy_len, builtin_strncpy_read_str,
>                                   (void *) TREE_STRING_POINTER (exp),
>                                   MEM_ALIGN (target),
> !                                 exp_len > str_copy_len ? 1 : 0,
> !                                 false);
>         if (exp_len > str_copy_len)
>         clear_storage (dest_mem, GEN_INT (exp_len - str_copy_len),
>                        BLOCK_OP_NORMAL);
>
> This is wrong.
> ...

Likely, that's also the reason for this regression with r127790
(from r127789) on cris-axis-elf:

Running /home/hp/comb2/combined/gcc/testsuite/gcc.c-torture/execute/execute.exp ...
FAIL: gcc.c-torture/execute/20030221-1.c execution,  -O0
FAIL: gcc.c-torture/execute/20030221-1.c execution,  -O1
FAIL: gcc.c-torture/execute/20030221-1.c execution,  -O2
FAIL: gcc.c-torture/execute/20030221-1.c execution,  -O3 -fomit-frame-pointer
FAIL: gcc.c-torture/execute/20030221-1.c execution,  -O3 -g
FAIL: gcc.c-torture/execute/20030221-1.c execution,  -Os

(Just a "program stopped with signal 6." in gcc.log)

brgds, H-P

* Re: [committed] Re: PATCH: fine-tuning for can_store_by_pieces
  2007-08-25  9:18                         ` Jakub Jelinek
  2007-08-25  9:58                           ` Jakub Jelinek
  2007-08-25 14:30                           ` gcc.c-torture/execute/20030221-1.c regressed with "fine-tuning for can_store_by_pieces" Hans-Peter Nilsson
@ 2007-08-25 14:40                           ` Sandra Loosemore
  2 siblings, 0 replies; 30+ messages in thread
From: Sandra Loosemore @ 2007-08-25 14:40 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: GCC Patches, Nigel Stephens, Guy Morrogh, David Ung,
	Thiemo Seufer, Mark Mitchell, richard

Jakub Jelinek wrote:

> This is wrong.  You added the memsetp argument to store_by_pieces
> as the 6th, before the endp argument, but you are passing
> exp_len > str_copy_len ? 1 : 0 as memsetp and false as endp.

Whoops.  Too much late-night hacking and not enough caffeine, I guess. 
:-(

I see you've beaten me to committing a fix -- thanks!

-Sandra

end of thread, other threads:[~2007-08-25 13:13 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-08-15 17:15 PATCH: fine-tuning for can_store_by_pieces Sandra Loosemore
2007-08-15 17:22 ` Andrew Pinski
2007-08-15 18:32   ` Sandra Loosemore
2007-08-15 19:53     ` Nigel Stephens
2007-08-15 19:58   ` Sandra Loosemore
2007-08-17  4:50   ` Mark Mitchell
2007-08-17 13:24     ` Sandra Loosemore
2007-08-17 18:55       ` Mark Mitchell
2007-08-16  8:34 ` Richard Sandiford
2007-08-16 19:41   ` Sandra Loosemore
2007-08-19  0:03   ` Sandra Loosemore
2007-08-20  8:22     ` Richard Sandiford
2007-08-20 23:38       ` Sandra Loosemore
2007-08-21  8:21         ` Richard Sandiford
2007-08-21 10:34           ` Nigel Stephens
2007-08-21 11:53             ` Richard Sandiford
2007-08-21 12:14               ` Nigel Stephens
2007-08-21 12:35                 ` Richard Sandiford
2007-08-21 13:54           ` Sandra Loosemore
2007-08-21 14:22             ` Richard Sandiford
2007-08-21 20:39               ` Sandra Loosemore
2007-08-21 20:56                 ` Richard Sandiford
2007-08-23 14:35                   ` Sandra Loosemore
2007-08-23 14:44                     ` Richard Sandiford
2007-08-25  5:35                       ` [committed] " Sandra Loosemore
2007-08-25  9:18                         ` Jakub Jelinek
2007-08-25  9:58                           ` Jakub Jelinek
2007-08-25 14:30                           ` gcc.c-torture/execute/20030221-1.c regressed with "fine-tuning for can_store_by_pieces" Hans-Peter Nilsson
2007-08-25 14:40                           ` [committed] Re: PATCH: fine-tuning for can_store_by_pieces Sandra Loosemore
2007-08-24 22:06                     ` Mark Mitchell
