public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] builtin expansion of strncmp for rs6000
@ 2016-12-15 16:34 Aaron Sawdey
  2016-12-16 23:57 ` Segher Boessenkool
  2017-01-12 17:53 ` Joseph Myers
  0 siblings, 2 replies; 8+ messages in thread
From: Aaron Sawdey @ 2016-12-15 16:34 UTC (permalink / raw)
  To: gcc-patches; +Cc: Segher Boessenkool, David Edelsohn, Bill Schmidt

[-- Attachment #1: Type: text/plain, Size: 1105 bytes --]

This patch adds a cmpstrnsi pattern for rs6000 target to provide
builtin expansion of strncmp(). Perf tests on a power8 system show that
it is 3-10x faster than the glibc strncmp on RHEL7 for lengths under 64
bytes.

Bootstrap/regtest has passed on powerpc64le, in progress for powerpc64,
ok for trunk if no new regressions?

2016-11-17  Aaron Sawdey  <acsawdey@linux.vnet.ibm.com>

	* config/rs6000/rs6000-protos.h (expand_strn_compare): Declare.
	* config/rs6000/rs6000.md (UNSPEC_CMPB): New unspec.
	(cmpb<mode>3): pattern for generating cmpb.
	(cmpstrnsi): pattern to expand strncmp ().
	* config/rs6000/rs6000.opt (mstring-compare-inline-limit): Add a new
	target option for controlling how much code inline expansion of
	strncmp() will be allowed to generate.
	* config/rs6000/rs6000.c (expand_strncmp_align_check): generate code
	for runtime page crossing check of strncmp () args.
	(expand_strn_compare): Function to do builtin expansion of strncmp ().


-- 
Aaron Sawdey, Ph.D.  acsawdey@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain

[-- Attachment #2: strncmp.20161215.patch --]
[-- Type: text/x-patch, Size: 16742 bytes --]

Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 243658)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -78,6 +78,7 @@
 extern int expand_block_clear (rtx[]);
 extern int expand_block_move (rtx[]);
 extern bool expand_block_compare (rtx[]);
+extern bool expand_strn_compare (rtx[]);
 extern const char * rs6000_output_load_multiple (rtx[]);
 extern bool rs6000_is_valid_mask (rtx, int *, int *, machine_mode);
 extern bool rs6000_is_valid_and_mask (rtx, machine_mode);
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 243658)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -19382,7 +19382,387 @@
   return true;
 }
 
+/* Generate alignment check and branch code to set up for
+   strncmp when we don't have DI alignment.
+   STRNCMP_LABEL is the label to branch if there is a page crossing.
+   SRC is the string pointer to be examined.
+   BYTES is the max number of bytes to compare.  */
+static void
+expand_strncmp_align_check (rtx strncmp_label, rtx src, HOST_WIDE_INT bytes)
+{
+  rtx lab_ref = gen_rtx_LABEL_REF (VOIDmode, strncmp_label);
+  rtx src_check = copy_addr_to_reg (XEXP (src, 0));
+  if (GET_MODE (src_check) == SImode)
+    emit_insn (gen_andsi3_mask (src_check, src_check, GEN_INT(0xfff)));
+  else
+    emit_insn (gen_anddi3_mask (src_check, src_check, GEN_INT(0xfff)));
+  rtx cond = gen_reg_rtx (CCmode);
+  emit_move_insn (cond, gen_rtx_COMPARE (CCmode, src_check,
+					 GEN_INT (4096-bytes)));
 \f
+  rtx cmp_rtx = gen_rtx_LT (VOIDmode, cond, const0_rtx);
+
+  rtx ifelse = gen_rtx_IF_THEN_ELSE (VOIDmode, cmp_rtx,
+				     pc_rtx, lab_ref);
+  rtx j = emit_jump_insn (gen_rtx_SET (pc_rtx, ifelse));
+  JUMP_LABEL (j) = strncmp_label;
+  LABEL_NUSES (strncmp_label) += 1;
+}
+
+/* Expand a string compare operation with length, and return 
+   true if successful. Return false if we should let the 
+   compiler generate normal code, probably a strncmp call.
+
+   OPERANDS[0] is the target (result).
+   OPERANDS[1] is the first source.
+   OPERANDS[2] is the second source.
+   OPERANDS[3] is the length.
+   OPERANDS[4] is the alignment in bytes.  */
+bool
+expand_strn_compare (rtx operands[])
+{
+  rtx target = operands[0];
+  rtx orig_src1 = operands[1];
+  rtx orig_src2 = operands[2];
+  rtx bytes_rtx = operands[3];
+  rtx align_rtx = operands[4];
+  HOST_WIDE_INT cmp_bytes = 0;
+  rtx src1 = orig_src1;
+  rtx src2 = orig_src2;
+
+  /* If this is not a fixed size compare, just call strncmp.  */
+  if (!CONST_INT_P (bytes_rtx))
+    return false;
+
+  /* This must be a fixed size alignment.  */
+  if (!CONST_INT_P (align_rtx))
+    return false;
+
+  int base_align = INTVAL (align_rtx);
+  int align1 = MEM_ALIGN (orig_src1) / BITS_PER_UNIT;
+  int align2 = MEM_ALIGN (orig_src2) / BITS_PER_UNIT;
+
+  /* SLOW_UNALIGNED_ACCESS -- don't do unaligned stuff.  */
+  if (SLOW_UNALIGNED_ACCESS (word_mode, align1)
+      || SLOW_UNALIGNED_ACCESS (word_mode, align2))
+    return false;
+
+  gcc_assert (GET_MODE (target) == SImode);
+
+  HOST_WIDE_INT bytes = INTVAL (bytes_rtx);
+
+  /* If we have an LE target without ldbrx and word_mode is DImode,
+     then we must avoid using word_mode.  */
+  int word_mode_ok = !(!BYTES_BIG_ENDIAN && !TARGET_LDBRX
+		       && word_mode == DImode);
+
+  int word_mode_size = GET_MODE_SIZE (word_mode);
+
+  int offset = 0;
+  machine_mode load_mode =
+    select_block_compare_mode (offset, bytes, base_align, word_mode_ok);
+  int load_mode_size = GET_MODE_SIZE (load_mode);
+
+  /* We don't want to generate too much code.  */
+  if (ROUND_UP (bytes, load_mode_size) / load_mode_size
+      > rs6000_string_compare_inline_limit)
+    return false;
+
+  rtx result_reg = gen_reg_rtx (word_mode);
+  rtx final_move_label = gen_label_rtx ();
+  rtx final_label = gen_label_rtx ();
+  rtx begin_compare_label = NULL;
+
+  if (base_align < 8)
+    {
+      /* Generate code that checks distance to 4k boundary for this case.  */
+      begin_compare_label = gen_label_rtx ();
+      rtx strncmp_label = gen_label_rtx ();
+      rtx jmp;
+
+      /* Strncmp for power8 in glibc does this:
+	 rldicl	r8,r3,0,52
+	 cmpldi	cr7,r8,4096-16
+	 bgt	cr7,L(pagecross) */
+
+      if (align1 < 8)
+	expand_strncmp_align_check (strncmp_label, src1, bytes);
+      if (align2 < 8)
+	expand_strncmp_align_check (strncmp_label, src2, bytes);
+
+      /* Now generate the following sequence:
+	 - branch to begin_compare
+	 - strncmp_label
+	 - call to strncmp
+	 - branch to final_label 
+         - begin_compare_label */
+
+      rtx cmp_ref = gen_rtx_LABEL_REF (VOIDmode, begin_compare_label);
+      jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, cmp_ref));
+      JUMP_LABEL(jmp) = begin_compare_label;
+      LABEL_NUSES (begin_compare_label) += 1;
+      emit_barrier ();
+
+      emit_label (strncmp_label);
+
+      if (!REG_P (XEXP (src1, 0)))
+	{
+	  rtx src1_reg = copy_addr_to_reg (XEXP (src1, 0));
+	  src1 = replace_equiv_address (src1, src1_reg);
+	}
+
+      if (!REG_P (XEXP (src2, 0)))
+	{
+	  rtx src2_reg = copy_addr_to_reg (XEXP (src2, 0));
+	  src2 = replace_equiv_address (src2, src2_reg);
+	}
+
+      /* -m32 -mpowerpc64 results in word_mode being DImode even
+	 though otherwise it is 32-bit. The length arg to strncmp
+	 is a size_t which will be the same size as pointers.  */
+      rtx len_rtx;
+      if (TARGET_64BIT)
+	len_rtx = gen_reg_rtx(DImode);
+      else
+	len_rtx = gen_reg_rtx(SImode);
+	
+      emit_move_insn (len_rtx, bytes_rtx);
+
+      emit_library_call_value (gen_rtx_SYMBOL_REF (Pmode, "strncmp"),
+			       target, LCT_NORMAL, GET_MODE (target), 3,
+			       force_reg (Pmode, XEXP (src1, 0)), Pmode,
+			       force_reg (Pmode, XEXP (src2, 0)), Pmode,
+			       len_rtx, GET_MODE (len_rtx));
+
+      rtx fin_ref = gen_rtx_LABEL_REF (VOIDmode, final_label);
+      jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, fin_ref));
+      JUMP_LABEL (jmp) = final_label;
+      LABEL_NUSES (final_label) += 1;
+      emit_barrier ();
+      emit_label (begin_compare_label);
+    }
+
+  rtx cleanup_label = NULL;
+  rtx tmp_reg_src1 = gen_reg_rtx (word_mode);
+  rtx tmp_reg_src2 = gen_reg_rtx (word_mode);
+  
+  /* Generate sequence of ld/ldbrx, cmpb to compare out 
+     to the length specified.  */
+  while (bytes > 0)
+    {
+      /* Compare sequence:
+	 check each 8B with: ld/ld cmpd bne 
+	 cleanup code at end:
+	 cmpb          get byte that differs
+	 cmpb          look for zero byte
+	 orc           combine
+	 cntlzd        get bit of first zero/diff byte
+         subfic        convert for rldcl use
+         rldcl rldcl   extract diff/zero byte
+         subf          subtract for final result
+	 
+	 The last compare can branch around the cleanup code if the 
+	 result is zero because the strings are exactly equal.  */
+      int align = compute_current_alignment (base_align, offset);
+      if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
+	load_mode = select_block_compare_mode (offset, bytes, align,
+					       word_mode_ok);
+      else
+	load_mode = select_block_compare_mode (0, bytes, align, word_mode_ok);
+      load_mode_size = GET_MODE_SIZE (load_mode);
+      if (bytes >= load_mode_size)
+	cmp_bytes = load_mode_size;
+      else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
+	{
+	  /* Move this load back so it doesn't go past the end.
+	     P8/P9 can do this efficiently.  */
+	  int extra_bytes = load_mode_size - bytes;
+	  cmp_bytes = bytes;
+	  if (extra_bytes < offset)
+	    {
+	      offset -= extra_bytes;
+	      cmp_bytes = load_mode_size;
+	      bytes = cmp_bytes;
+	    }
+	}
+      else
+	/* P7 and earlier can't do the overlapping load trick fast,
+	   so this forces a non-overlapping load and a shift to get
+	   rid of the extra bytes.  */
+	cmp_bytes = bytes;
+      
+      src1 = adjust_address (orig_src1, load_mode, offset);
+      src2 = adjust_address (orig_src2, load_mode, offset);
+
+      if (!REG_P (XEXP (src1, 0)))
+	{
+	  rtx src1_reg = copy_addr_to_reg (XEXP (src1, 0));
+	  src1 = replace_equiv_address (src1, src1_reg);
+	}
+      set_mem_size (src1, cmp_bytes);
+
+      if (!REG_P (XEXP (src2, 0)))
+	{
+	  rtx src2_reg = copy_addr_to_reg (XEXP (src2, 0));
+	  src2 = replace_equiv_address (src2, src2_reg);
+	}
+      set_mem_size (src2, cmp_bytes);
+
+      do_load_for_compare (tmp_reg_src1, src1, load_mode);
+      do_load_for_compare (tmp_reg_src2, src2, load_mode);
+
+      /* We must always left-align the data we read, and
+	 clear any bytes to the right that are beyond the string.
+	 Otherwise the cmpb sequence won't produce the correct
+	 results.  The beginning of the compare will be done
+	 with word_mode so will not have any extra shifts or
+	 clear rights.  */
+
+      if ( load_mode_size < word_mode_size )
+	{
+	  /* Rotate left first. */
+	  rtx sh = GEN_INT (BITS_PER_UNIT * (word_mode_size - load_mode_size));
+	  if (word_mode == DImode)
+	    {
+	      emit_insn (gen_rotldi3 (tmp_reg_src1, tmp_reg_src1, sh));
+	      emit_insn (gen_rotldi3 (tmp_reg_src2, tmp_reg_src2, sh));
+	    }
+	  else
+	    {
+	      emit_insn (gen_rotlsi3 (tmp_reg_src1, tmp_reg_src1, sh));
+	      emit_insn (gen_rotlsi3 (tmp_reg_src2, tmp_reg_src2, sh));
+	    }
+	}
+
+      if (cmp_bytes < word_mode_size)
+	{
+	  /* Now clear right.  This plus the rotate can be 
+	     turned into a rldicr instruction. */
+	  HOST_WIDE_INT mb = BITS_PER_UNIT * (word_mode_size - cmp_bytes);
+	  rtx mask = GEN_INT (~(((HOST_WIDE_INT)1 << mb) - 1));
+	  if (word_mode == DImode)
+	    {
+	      emit_insn (gen_anddi3_mask (tmp_reg_src1, tmp_reg_src1, mask));
+	      emit_insn (gen_anddi3_mask (tmp_reg_src2, tmp_reg_src2, mask));
+	    }
+	  else
+	    {
+	      emit_insn (gen_andsi3_mask (tmp_reg_src1, tmp_reg_src1, mask));
+	      emit_insn (gen_andsi3_mask (tmp_reg_src2, tmp_reg_src2, mask));
+	    }
+	}
+
+      int remain = bytes - cmp_bytes;
+
+      rtx dst_label;
+      if (remain > 0)
+	{
+	  if (!cleanup_label)
+	    cleanup_label = gen_label_rtx ();
+	  dst_label = cleanup_label;
+	}
+      else
+	dst_label = final_move_label;
+
+      rtx lab_ref = gen_rtx_LABEL_REF (VOIDmode, dst_label);
+      rtx cond = gen_reg_rtx (CCmode);
+      
+      if (remain == 0)
+	{
+	  /* For the last chunk, subf. also 
+	     generates the zero result we need.  */
+	  rtx tmp = gen_rtx_MINUS (word_mode, tmp_reg_src1, tmp_reg_src2);
+	  rs6000_emit_dot_insn (result_reg, tmp, 1, cond);
+	}
+      else
+	emit_move_insn (cond, gen_rtx_COMPARE (CCmode,
+					       tmp_reg_src1, tmp_reg_src2));
+
+      rtx cmp_rtx;
+      if (remain > 0)
+	cmp_rtx = gen_rtx_NE (VOIDmode, cond, const0_rtx);
+      else
+	cmp_rtx = gen_rtx_EQ (VOIDmode, cond, const0_rtx);
+
+      rtx ifelse = gen_rtx_IF_THEN_ELSE (VOIDmode, cmp_rtx,
+					 lab_ref, pc_rtx);
+      rtx j = emit_jump_insn (gen_rtx_SET (pc_rtx, ifelse));
+      JUMP_LABEL (j) = dst_label;
+      LABEL_NUSES (dst_label) += 1;
+
+      offset += cmp_bytes;
+      bytes -= cmp_bytes;
+    }
+
+  if (cleanup_label)
+    emit_label (cleanup_label);
+
+  /* Generate the final sequence that identifies the differing
+     byte and generates the final result, taking into account
+     zero bytes:
+    
+	 cmpb          cmpb_result1, src1, src2
+	 cmpb          cmpb_result2, src1, zero
+	 orc           cmpb_result1, cmp_result1, cmpb_result2
+	 cntlzd        get bit of first zero/diff byte
+         addi          convert for rldcl use
+         rldcl rldcl   extract diff/zero byte
+         subf          subtract for final result
+  */
+
+  rtx cmpb_diff = gen_reg_rtx (word_mode);
+  rtx cmpb_zero = gen_reg_rtx (word_mode);
+  rtx rot_amt = gen_reg_rtx (word_mode);
+  rtx zero_reg = gen_reg_rtx (word_mode);
+
+  rtx rot1_1 = gen_reg_rtx(word_mode);
+  rtx rot1_2 = gen_reg_rtx(word_mode);
+  rtx rot2_1 = gen_reg_rtx(word_mode);
+  rtx rot2_2 = gen_reg_rtx(word_mode);
+
+  if (word_mode == SImode)
+    {
+      emit_insn (gen_cmpbsi3 (cmpb_diff, tmp_reg_src1, tmp_reg_src2));
+      emit_insn (gen_movsi (zero_reg, GEN_INT(0)));
+      emit_insn (gen_cmpbsi3 (cmpb_zero, tmp_reg_src1, zero_reg));
+      emit_insn (gen_one_cmplsi2 (cmpb_diff,cmpb_diff));
+      emit_insn (gen_iorsi3 (cmpb_diff, cmpb_diff, cmpb_zero));
+      emit_insn (gen_clzsi2 (rot_amt, cmpb_diff));
+      emit_insn (gen_addsi3 (rot_amt, rot_amt, GEN_INT (8)));
+      emit_insn (gen_rotlsi3 (rot1_1, tmp_reg_src1,
+			      gen_lowpart (SImode, rot_amt)));
+      emit_insn (gen_andsi3_mask (rot1_2, rot1_1, GEN_INT(0xff)));
+      emit_insn (gen_rotlsi3 (rot2_1, tmp_reg_src2,
+			      gen_lowpart (SImode, rot_amt)));
+      emit_insn (gen_andsi3_mask (rot2_2, rot2_1, GEN_INT(0xff)));
+      emit_insn (gen_subsi3 (result_reg, rot1_2, rot2_2));
+    }
+  else
+    {
+      emit_insn (gen_cmpbdi3 (cmpb_diff, tmp_reg_src1, tmp_reg_src2));
+      emit_insn (gen_movdi (zero_reg, GEN_INT(0)));
+      emit_insn (gen_cmpbdi3 (cmpb_zero, tmp_reg_src1, zero_reg));
+      emit_insn (gen_one_cmpldi2 (cmpb_diff,cmpb_diff));
+      emit_insn (gen_iordi3 (cmpb_diff, cmpb_diff, cmpb_zero));
+      emit_insn (gen_clzdi2 (rot_amt, cmpb_diff));
+      emit_insn (gen_adddi3 (rot_amt, rot_amt, GEN_INT (8)));
+      emit_insn (gen_rotldi3 (rot1_1, tmp_reg_src1,
+			      gen_lowpart (SImode, rot_amt)));
+      emit_insn (gen_anddi3_mask (rot1_2, rot1_1, GEN_INT(0xff)));
+      emit_insn (gen_rotldi3 (rot2_1, tmp_reg_src2,
+			      gen_lowpart (SImode, rot_amt)));
+      emit_insn (gen_anddi3_mask (rot2_2, rot2_1, GEN_INT(0xff)));
+      emit_insn (gen_subdi3 (result_reg, rot1_2, rot2_2));
+    }
+      
+  emit_label (final_move_label);
+  emit_insn (gen_movsi (target,
+			gen_lowpart (SImode, result_reg)));
+  emit_label (final_label);
+  return true;
+}
+
+\f
 /* Expand a block move operation, and return 1 if successful.  Return 0
    if we should let the compiler generate normal code.
 
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 243658)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -117,6 +117,7 @@
    UNSPEC_BPERM
    UNSPEC_COPYSIGN
    UNSPEC_PARITY
+   UNSPEC_CMPB
    UNSPEC_FCTIW
    UNSPEC_FCTID
    UNSPEC_LFIWAX
@@ -2316,6 +2317,13 @@
   "prty<wd> %0,%1"
   [(set_attr "type" "popcnt")])
 
+(define_insn "cmpb<mode>3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
+	(unspec:GPR [(match_operand:GPR 1 "gpc_reg_operand" "r")
+		     (match_operand:GPR 2 "gpc_reg_operand" "r")] UNSPEC_CMPB))]
+  "TARGET_CMPB"
+  "cmpb %0,%1,%2"
+  [(set_attr "type" "cmp")])
 
 ;; Since the hardware zeros the upper part of the register, save generating the
 ;; AND immediate if we are converting to unsigned
@@ -8844,7 +8852,7 @@
     FAIL;
 }")
 
-;; String/block compare insn.
+;; String compare N insn.
 ;; Argument 0 is the target (result)
 ;; Argument 1 is the destination
 ;; Argument 2 is the source
@@ -8851,6 +8859,27 @@
 ;; Argument 3 is the length
 ;; Argument 4 is the alignment
 
+(define_expand "cmpstrnsi"
+  [(parallel [(set (match_operand:SI 0)
+               (compare:SI (match_operand:BLK 1)
+                           (match_operand:BLK 2)))
+	      (use (match_operand:SI 3))
+	      (use (match_operand:SI 4))])]
+  "TARGET_CMPB && (BYTES_BIG_ENDIAN || TARGET_LDBRX)"
+{
+  if (expand_strn_compare (operands))
+    DONE;
+  else	
+    FAIL;
+})
+
+;; Block compare insn.
+;; Argument 0 is the target (result)
+;; Argument 1 is the destination
+;; Argument 2 is the source
+;; Argument 3 is the length
+;; Argument 4 is the alignment
+
 (define_expand "cmpmemsi"
   [(parallel [(set (match_operand:SI 0)
                (compare:SI (match_operand:BLK 1)
Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt	(revision 243658)
+++ gcc/config/rs6000/rs6000.opt	(working copy)
@@ -337,6 +337,10 @@
 Target Report Var(rs6000_block_compare_inline_limit) Init(5) RejectNegative Joined UInteger Save
 Specify the maximum number pairs of load instructions that should be generated inline for the compare.  If the number needed exceeds the limit, a call to memcmp will be generated instead.
 
+mstring-compare-inline-limit=
+Target Report Var(rs6000_string_compare_inline_limit) Init(8) RejectNegative Joined UInteger Save
+Specify the maximum number pairs of load instructions that should be generated inline for the compare.  If the number needed exceeds the limit, a call to strncmp will be generated instead.
+
 misel
 Target Report Mask(ISEL) Var(rs6000_isa_flags)
 Generate isel instructions.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] builtin expansion of strncmp for rs6000
  2016-12-15 16:34 [PATCH] builtin expansion of strncmp for rs6000 Aaron Sawdey
@ 2016-12-16 23:57 ` Segher Boessenkool
  2016-12-19 15:58   ` Aaron Sawdey
  2017-01-12 17:53 ` Joseph Myers
  1 sibling, 1 reply; 8+ messages in thread
From: Segher Boessenkool @ 2016-12-16 23:57 UTC (permalink / raw)
  To: Aaron Sawdey; +Cc: gcc-patches, David Edelsohn, Bill Schmidt

Hi Aaron,

On Thu, Dec 15, 2016 at 10:31:05AM -0600, Aaron Sawdey wrote:
> This patch adds a cmpstrnsi pattern for rs6000 target to provide
> builtin expansion of strncmp(). Perf tests on a power8 system show that
> it is 3-10x faster than the glibc strncmp on RHEL7 for lengths under 64
> bytes.
> 
> Bootstrap/regtest has passed on powerpc64le, in progress for powerpc64,
> ok for trunk if no new regressions?

Some trivial things...

> +static void
> +expand_strncmp_align_check (rtx strncmp_label, rtx src, HOST_WIDE_INT bytes)
> +{
> +  rtx lab_ref = gen_rtx_LABEL_REF (VOIDmode, strncmp_label);
> +  rtx src_check = copy_addr_to_reg (XEXP (src, 0));
> +  if (GET_MODE (src_check) == SImode)
> +    emit_insn (gen_andsi3_mask (src_check, src_check, GEN_INT(0xfff)));

Space before (, also for macros.

> +  else
> +    emit_insn (gen_anddi3_mask (src_check, src_check, GEN_INT(0xfff)));

You can also just do gen_anddi3 (..); sometimes it can optimise better that
way, you can easier be sure you match all constraints, and it requires less
thinking.  Just for expanders of course, code that runs later needs to do
more by hand to get good results.  (There's no need to change it here, just
a FYI).

> +  rtx cond = gen_reg_rtx (CCmode);
> +  emit_move_insn (cond, gen_rtx_COMPARE (CCmode, src_check,
> +					 GEN_INT (4096-bytes)));

Spaces around -.

>  \f

You put the page break in the middle of a function.  Let's not :-)

> +/* Expand a string compare operation with length, and return 
> +   true if successful. Return false if we should let the 

These two lines have trailing spaces. There is more of this; please
check everything.

> +      if (align1 < 8)
> +	expand_strncmp_align_check (strncmp_label, src1, bytes);
> +      if (align2 < 8)
> +	expand_strncmp_align_check (strncmp_label, src2, bytes);

Does this work out if (say) bytes >= 4096?  It doesn't have to be very
sane code, but it should work correctly.  Or you need to limit the maximum
size some other way.

> +      /* Now generate the following sequence:
> +	 - branch to begin_compare
> +	 - strncmp_label
> +	 - call to strncmp
> +	 - branch to final_label 
> +         - begin_compare_label */

Wrong indent on that last line.

> +      /* -m32 -mpowerpc64 results in word_mode being DImode even
> +	 though otherwise it is 32-bit. The length arg to strncmp
> +	 is a size_t which will be the same size as pointers.  */
> +      rtx len_rtx;
> +      if (TARGET_64BIT)
> +	len_rtx = gen_reg_rtx(DImode);
> +      else
> +	len_rtx = gen_reg_rtx(SImode);
> +	

Wrong tab.  Some more tab vs. spaces indent (and white space at the end
of line) follows, please fix all.

> +      /* We must always left-align the data we read, and
> +	 clear any bytes to the right that are beyond the string.
> +	 Otherwise the cmpb sequence won't produce the correct
> +	 results.  The beginning of the compare will be done
> +	 with word_mode so will not have any extra shifts or
> +	 clear rights.  */
> +
> +      if ( load_mode_size < word_mode_size )

No space after ( or before ).

> +	  rtx mask = GEN_INT (~(((HOST_WIDE_INT)1 << mb) - 1));

	  rtx mask = GEN_INT (HOST_WIDE_INT_M1U << mb);

Please repost.  Thanks,


Segher

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] builtin expansion of strncmp for rs6000
  2016-12-16 23:57 ` Segher Boessenkool
@ 2016-12-19 15:58   ` Aaron Sawdey
  2016-12-19 17:32     ` Segher Boessenkool
  0 siblings, 1 reply; 8+ messages in thread
From: Aaron Sawdey @ 2016-12-19 15:58 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, David Edelsohn, Bill Schmidt

[-- Attachment #1: Type: text/plain, Size: 454 bytes --]

On Fri, 2016-12-16 at 17:14 -0600, Segher Boessenkool wrote:
> Please repost.  Thanks,

Hi Segher,
  Thanks for the review. Attached is an updated patch that should
address the issues you noted. 

Bootstrap/regtest in progress on ppc64le -mcpu=power8, ok for trunk if
results are clean?

Thanks,
   Aaron

-- 
Aaron Sawdey, Ph.D.  acsawdey@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain

[-- Attachment #2: strncmp.20161219.patch --]
[-- Type: text/x-patch, Size: 16757 bytes --]

Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 243799)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -78,6 +78,7 @@
 extern int expand_block_clear (rtx[]);
 extern int expand_block_move (rtx[]);
 extern bool expand_block_compare (rtx[]);
+extern bool expand_strn_compare (rtx[]);
 extern const char * rs6000_output_load_multiple (rtx[]);
 extern bool rs6000_is_valid_mask (rtx, int *, int *, machine_mode);
 extern bool rs6000_is_valid_and_mask (rtx, machine_mode);
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 243799)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -19382,7 +19382,388 @@
   return true;
 }
 
+/* Generate alignment check and branch code to set up for
+   strncmp when we don't have DI alignment.
+   STRNCMP_LABEL is the label to branch if there is a page crossing.
+   SRC is the string pointer to be examined.
+   BYTES is the max number of bytes to compare.	 */
+static void
+expand_strncmp_align_check (rtx strncmp_label, rtx src, HOST_WIDE_INT bytes)
+{
+  rtx lab_ref = gen_rtx_LABEL_REF (VOIDmode, strncmp_label);
+  rtx src_check = copy_addr_to_reg (XEXP (src, 0));
+  if (GET_MODE (src_check) == SImode)
+    emit_insn (gen_andsi3 (src_check, src_check, GEN_INT (0xfff)));
+  else
+    emit_insn (gen_anddi3 (src_check, src_check, GEN_INT (0xfff)));
+  rtx cond = gen_reg_rtx (CCmode);
+  emit_move_insn (cond, gen_rtx_COMPARE (CCmode, src_check,
+					 GEN_INT (4096 - bytes)));
 \f
+  rtx cmp_rtx = gen_rtx_LT (VOIDmode, cond, const0_rtx);
+
+  rtx ifelse = gen_rtx_IF_THEN_ELSE (VOIDmode, cmp_rtx,
+				     pc_rtx, lab_ref);
+  rtx j = emit_jump_insn (gen_rtx_SET (pc_rtx, ifelse));
+  JUMP_LABEL (j) = strncmp_label;
+  LABEL_NUSES (strncmp_label) += 1;
+}
+
+/* Expand a string compare operation with length, and return
+   true if successful. Return false if we should let the
+   compiler generate normal code, probably a strncmp call.
+
+   OPERANDS[0] is the target (result).
+   OPERANDS[1] is the first source.
+   OPERANDS[2] is the second source.
+   OPERANDS[3] is the length.
+   OPERANDS[4] is the alignment in bytes.  */
+bool
+expand_strn_compare (rtx operands[])
+{
+  rtx target = operands[0];
+  rtx orig_src1 = operands[1];
+  rtx orig_src2 = operands[2];
+  rtx bytes_rtx = operands[3];
+  rtx align_rtx = operands[4];
+  HOST_WIDE_INT cmp_bytes = 0;
+  rtx src1 = orig_src1;
+  rtx src2 = orig_src2;
+
+  /* If this is not a fixed size compare, just call strncmp.  */
+  if (!CONST_INT_P (bytes_rtx))
+    return false;
+
+  /* This must be a fixed size alignment.  */
+  if (!CONST_INT_P (align_rtx))
+    return false;
+
+  int base_align = INTVAL (align_rtx);
+  int align1 = MEM_ALIGN (orig_src1) / BITS_PER_UNIT;
+  int align2 = MEM_ALIGN (orig_src2) / BITS_PER_UNIT;
+
+  /* SLOW_UNALIGNED_ACCESS -- don't do unaligned stuff.	 */
+  if (SLOW_UNALIGNED_ACCESS (word_mode, align1)
+      || SLOW_UNALIGNED_ACCESS (word_mode, align2))
+    return false;
+
+  gcc_assert (GET_MODE (target) == SImode);
+
+  HOST_WIDE_INT bytes = INTVAL (bytes_rtx);
+
+  /* If we have an LE target without ldbrx and word_mode is DImode,
+     then we must avoid using word_mode.  */
+  int word_mode_ok = !(!BYTES_BIG_ENDIAN && !TARGET_LDBRX
+		       && word_mode == DImode);
+
+  int word_mode_size = GET_MODE_SIZE (word_mode);
+
+  int offset = 0;
+  machine_mode load_mode =
+    select_block_compare_mode (offset, bytes, base_align, word_mode_ok);
+  int load_mode_size = GET_MODE_SIZE (load_mode);
+
+  /* We don't want to generate too much code.  Also if bytes is
+     4096 or larger we always want the library strncmp anyway.  */
+  int groups = ROUND_UP (bytes, load_mode_size) / load_mode_size;
+  if (bytes >= 4096 || groups > rs6000_string_compare_inline_limit)
+    return false;
+
+  rtx result_reg = gen_reg_rtx (word_mode);
+  rtx final_move_label = gen_label_rtx ();
+  rtx final_label = gen_label_rtx ();
+  rtx begin_compare_label = NULL;
+
+  if (base_align < 8)
+    {
+      /* Generate code that checks distance to 4k boundary for this case.  */
+      begin_compare_label = gen_label_rtx ();
+      rtx strncmp_label = gen_label_rtx ();
+      rtx jmp;
+
+      /* Strncmp for power8 in glibc does this:
+	 rldicl	r8,r3,0,52
+	 cmpldi	cr7,r8,4096-16
+	 bgt	cr7,L(pagecross) */
+
+      if (align1 < 8)
+	expand_strncmp_align_check (strncmp_label, src1, bytes);
+      if (align2 < 8)
+	expand_strncmp_align_check (strncmp_label, src2, bytes);
+
+      /* Now generate the following sequence:
+	 - branch to begin_compare
+	 - strncmp_label
+	 - call to strncmp
+	 - branch to final_label
+	 - begin_compare_label */
+
+      rtx cmp_ref = gen_rtx_LABEL_REF (VOIDmode, begin_compare_label);
+      jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, cmp_ref));
+      JUMP_LABEL(jmp) = begin_compare_label;
+      LABEL_NUSES (begin_compare_label) += 1;
+      emit_barrier ();
+
+      emit_label (strncmp_label);
+
+      if (!REG_P (XEXP (src1, 0)))
+	{
+	  rtx src1_reg = copy_addr_to_reg (XEXP (src1, 0));
+	  src1 = replace_equiv_address (src1, src1_reg);
+	}
+
+      if (!REG_P (XEXP (src2, 0)))
+	{
+	  rtx src2_reg = copy_addr_to_reg (XEXP (src2, 0));
+	  src2 = replace_equiv_address (src2, src2_reg);
+	}
+
+      /* -m32 -mpowerpc64 results in word_mode being DImode even
+	 though otherwise it is 32-bit. The length arg to strncmp
+	 is a size_t which will be the same size as pointers.  */
+      rtx len_rtx;
+      if (TARGET_64BIT)
+	len_rtx = gen_reg_rtx(DImode);
+      else
+	len_rtx = gen_reg_rtx(SImode);
+
+      emit_move_insn (len_rtx, bytes_rtx);
+
+      emit_library_call_value (gen_rtx_SYMBOL_REF (Pmode, "strncmp"),
+			       target, LCT_NORMAL, GET_MODE (target), 3,
+			       force_reg (Pmode, XEXP (src1, 0)), Pmode,
+			       force_reg (Pmode, XEXP (src2, 0)), Pmode,
+			       len_rtx, GET_MODE (len_rtx));
+
+      rtx fin_ref = gen_rtx_LABEL_REF (VOIDmode, final_label);
+      jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, fin_ref));
+      JUMP_LABEL (jmp) = final_label;
+      LABEL_NUSES (final_label) += 1;
+      emit_barrier ();
+      emit_label (begin_compare_label);
+    }
+
+  rtx cleanup_label = NULL;
+  rtx tmp_reg_src1 = gen_reg_rtx (word_mode);
+  rtx tmp_reg_src2 = gen_reg_rtx (word_mode);
+
+  /* Generate sequence of ld/ldbrx, cmpb to compare out
+     to the length specified.  */
+  while (bytes > 0)
+    {
+      /* Compare sequence:
+	 check each 8B with: ld/ld cmpd bne
+	 cleanup code at end:
+	 cmpb	       get byte that differs
+	 cmpb	       look for zero byte
+	 orc	       combine
+	 cntlzd	       get bit of first zero/diff byte
+	 subfic	       convert for rldcl use
+	 rldcl rldcl   extract diff/zero byte
+	 subf	       subtract for final result
+
+	 The last compare can branch around the cleanup code if the
+	 result is zero because the strings are exactly equal.	*/
+      int align = compute_current_alignment (base_align, offset);
+      if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
+	load_mode = select_block_compare_mode (offset, bytes, align,
+					       word_mode_ok);
+      else
+	load_mode = select_block_compare_mode (0, bytes, align, word_mode_ok);
+      load_mode_size = GET_MODE_SIZE (load_mode);
+      if (bytes >= load_mode_size)
+	cmp_bytes = load_mode_size;
+      else if (TARGET_EFFICIENT_OVERLAPPING_UNALIGNED)
+	{
+	  /* Move this load back so it doesn't go past the end.
+	     P8/P9 can do this efficiently.  */
+	  int extra_bytes = load_mode_size - bytes;
+	  cmp_bytes = bytes;
+	  if (extra_bytes < offset)
+	    {
+	      offset -= extra_bytes;
+	      cmp_bytes = load_mode_size;
+	      bytes = cmp_bytes;
+	    }
+	}
+      else
+	/* P7 and earlier can't do the overlapping load trick fast,
+	   so this forces a non-overlapping load and a shift to get
+	   rid of the extra bytes.  */
+	cmp_bytes = bytes;
+
+      src1 = adjust_address (orig_src1, load_mode, offset);
+      src2 = adjust_address (orig_src2, load_mode, offset);
+
+      if (!REG_P (XEXP (src1, 0)))
+	{
+	  rtx src1_reg = copy_addr_to_reg (XEXP (src1, 0));
+	  src1 = replace_equiv_address (src1, src1_reg);
+	}
+      set_mem_size (src1, cmp_bytes);
+
+      if (!REG_P (XEXP (src2, 0)))
+	{
+	  rtx src2_reg = copy_addr_to_reg (XEXP (src2, 0));
+	  src2 = replace_equiv_address (src2, src2_reg);
+	}
+      set_mem_size (src2, cmp_bytes);
+
+      do_load_for_compare (tmp_reg_src1, src1, load_mode);
+      do_load_for_compare (tmp_reg_src2, src2, load_mode);
+
+      /* We must always left-align the data we read, and
+	 clear any bytes to the right that are beyond the string.
+	 Otherwise the cmpb sequence won't produce the correct
+	 results.  The beginning of the compare will be done
+	 with word_mode so will not have any extra shifts or
+	 clear rights.	*/
+
+      if (load_mode_size < word_mode_size)
+	{
+	  /* Rotate left first. */
+	  rtx sh = GEN_INT (BITS_PER_UNIT * (word_mode_size - load_mode_size));
+	  if (word_mode == DImode)
+	    {
+	      emit_insn (gen_rotldi3 (tmp_reg_src1, tmp_reg_src1, sh));
+	      emit_insn (gen_rotldi3 (tmp_reg_src2, tmp_reg_src2, sh));
+	    }
+	  else
+	    {
+	      emit_insn (gen_rotlsi3 (tmp_reg_src1, tmp_reg_src1, sh));
+	      emit_insn (gen_rotlsi3 (tmp_reg_src2, tmp_reg_src2, sh));
+	    }
+	}
+
+      if (cmp_bytes < word_mode_size)
+	{
+	  /* Now clear right.  This plus the rotate can be
+	     turned into a rldicr instruction. */
+	  HOST_WIDE_INT mb = BITS_PER_UNIT * (word_mode_size - cmp_bytes);
+	  rtx mask = GEN_INT (HOST_WIDE_INT_M1U << mb);
+	  if (word_mode == DImode)
+	    {
+	      emit_insn (gen_anddi3_mask (tmp_reg_src1, tmp_reg_src1, mask));
+	      emit_insn (gen_anddi3_mask (tmp_reg_src2, tmp_reg_src2, mask));
+	    }
+	  else
+	    {
+	      emit_insn (gen_andsi3_mask (tmp_reg_src1, tmp_reg_src1, mask));
+	      emit_insn (gen_andsi3_mask (tmp_reg_src2, tmp_reg_src2, mask));
+	    }
+	}
+
+      int remain = bytes - cmp_bytes;
+
+      rtx dst_label;
+      if (remain > 0)
+	{
+	  if (!cleanup_label)
+	    cleanup_label = gen_label_rtx ();
+	  dst_label = cleanup_label;
+	}
+      else
+	dst_label = final_move_label;
+
+      rtx lab_ref = gen_rtx_LABEL_REF (VOIDmode, dst_label);
+      rtx cond = gen_reg_rtx (CCmode);
+
+      if (remain == 0)
+	{
+	  /* For the last chunk, subf. also
+	     generates the zero result we need.	 */
+	  rtx tmp = gen_rtx_MINUS (word_mode, tmp_reg_src1, tmp_reg_src2);
+	  rs6000_emit_dot_insn (result_reg, tmp, 1, cond);
+	}
+      else
+	emit_move_insn (cond, gen_rtx_COMPARE (CCmode,
+					       tmp_reg_src1, tmp_reg_src2));
+
+      rtx cmp_rtx;
+      if (remain > 0)
+	cmp_rtx = gen_rtx_NE (VOIDmode, cond, const0_rtx);
+      else
+	cmp_rtx = gen_rtx_EQ (VOIDmode, cond, const0_rtx);
+
+      rtx ifelse = gen_rtx_IF_THEN_ELSE (VOIDmode, cmp_rtx,
+					 lab_ref, pc_rtx);
+      rtx j = emit_jump_insn (gen_rtx_SET (pc_rtx, ifelse));
+      JUMP_LABEL (j) = dst_label;
+      LABEL_NUSES (dst_label) += 1;
+
+      offset += cmp_bytes;
+      bytes -= cmp_bytes;
+    }
+
+  if (cleanup_label)
+    emit_label (cleanup_label);
+
+  /* Generate the final sequence that identifies the differing
+     byte and generates the final result, taking into account
+     zero bytes:
+
+     cmpb	       cmpb_result1, src1, src2
+     cmpb	       cmpb_result2, src1, zero
+     orc	       cmpb_result1, cmp_result1, cmpb_result2
+     cntlzd	       get bit of first zero/diff byte
+     addi	       convert for rldcl use
+     rldcl rldcl   extract diff/zero byte
+     subf	       subtract for final result
+  */
+
+  rtx cmpb_diff = gen_reg_rtx (word_mode);
+  rtx cmpb_zero = gen_reg_rtx (word_mode);
+  rtx rot_amt = gen_reg_rtx (word_mode);
+  rtx zero_reg = gen_reg_rtx (word_mode);
+
+  rtx rot1_1 = gen_reg_rtx(word_mode);
+  rtx rot1_2 = gen_reg_rtx(word_mode);
+  rtx rot2_1 = gen_reg_rtx(word_mode);
+  rtx rot2_2 = gen_reg_rtx(word_mode);
+
+  if (word_mode == SImode)
+    {
+      emit_insn (gen_cmpbsi3 (cmpb_diff, tmp_reg_src1, tmp_reg_src2));
+      emit_insn (gen_movsi (zero_reg, GEN_INT(0)));
+      emit_insn (gen_cmpbsi3 (cmpb_zero, tmp_reg_src1, zero_reg));
+      emit_insn (gen_one_cmplsi2 (cmpb_diff,cmpb_diff));
+      emit_insn (gen_iorsi3 (cmpb_diff, cmpb_diff, cmpb_zero));
+      emit_insn (gen_clzsi2 (rot_amt, cmpb_diff));
+      emit_insn (gen_addsi3 (rot_amt, rot_amt, GEN_INT (8)));
+      emit_insn (gen_rotlsi3 (rot1_1, tmp_reg_src1,
+			      gen_lowpart (SImode, rot_amt)));
+      emit_insn (gen_andsi3_mask (rot1_2, rot1_1, GEN_INT(0xff)));
+      emit_insn (gen_rotlsi3 (rot2_1, tmp_reg_src2,
+			      gen_lowpart (SImode, rot_amt)));
+      emit_insn (gen_andsi3_mask (rot2_2, rot2_1, GEN_INT(0xff)));
+      emit_insn (gen_subsi3 (result_reg, rot1_2, rot2_2));
+    }
+  else
+    {
+      emit_insn (gen_cmpbdi3 (cmpb_diff, tmp_reg_src1, tmp_reg_src2));
+      emit_insn (gen_movdi (zero_reg, GEN_INT(0)));
+      emit_insn (gen_cmpbdi3 (cmpb_zero, tmp_reg_src1, zero_reg));
+      emit_insn (gen_one_cmpldi2 (cmpb_diff,cmpb_diff));
+      emit_insn (gen_iordi3 (cmpb_diff, cmpb_diff, cmpb_zero));
+      emit_insn (gen_clzdi2 (rot_amt, cmpb_diff));
+      emit_insn (gen_adddi3 (rot_amt, rot_amt, GEN_INT (8)));
+      emit_insn (gen_rotldi3 (rot1_1, tmp_reg_src1,
+			      gen_lowpart (SImode, rot_amt)));
+      emit_insn (gen_anddi3_mask (rot1_2, rot1_1, GEN_INT(0xff)));
+      emit_insn (gen_rotldi3 (rot2_1, tmp_reg_src2,
+			      gen_lowpart (SImode, rot_amt)));
+      emit_insn (gen_anddi3_mask (rot2_2, rot2_1, GEN_INT(0xff)));
+      emit_insn (gen_subdi3 (result_reg, rot1_2, rot2_2));
+    }
+
+  emit_label (final_move_label);
+  emit_insn (gen_movsi (target,
+			gen_lowpart (SImode, result_reg)));
+  emit_label (final_label);
+  return true;
+}
+
+\f
 /* Expand a block move operation, and return 1 if successful.  Return 0
    if we should let the compiler generate normal code.
 
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 243799)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -117,6 +117,7 @@
    UNSPEC_BPERM
    UNSPEC_COPYSIGN
    UNSPEC_PARITY
+   UNSPEC_CMPB
    UNSPEC_FCTIW
    UNSPEC_FCTID
    UNSPEC_LFIWAX
@@ -2316,6 +2317,13 @@
   "prty<wd> %0,%1"
   [(set_attr "type" "popcnt")])
 
+(define_insn "cmpb<mode>3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
+	(unspec:GPR [(match_operand:GPR 1 "gpc_reg_operand" "r")
+		     (match_operand:GPR 2 "gpc_reg_operand" "r")] UNSPEC_CMPB))]
+  "TARGET_CMPB"
+  "cmpb %0,%1,%2"
+  [(set_attr "type" "cmp")])
 
 ;; Since the hardware zeros the upper part of the register, save generating the
 ;; AND immediate if we are converting to unsigned
@@ -8844,7 +8852,7 @@
     FAIL;
 }")
 
-;; String/block compare insn.
+;; String compare N insn.
 ;; Argument 0 is the target (result)
 ;; Argument 1 is the destination
 ;; Argument 2 is the source
@@ -8851,6 +8859,27 @@
 ;; Argument 3 is the length
 ;; Argument 4 is the alignment
 
+(define_expand "cmpstrnsi"
+  [(parallel [(set (match_operand:SI 0)
+               (compare:SI (match_operand:BLK 1)
+                           (match_operand:BLK 2)))
+	      (use (match_operand:SI 3))
+	      (use (match_operand:SI 4))])]
+  "TARGET_CMPB && (BYTES_BIG_ENDIAN || TARGET_LDBRX)"
+{
+  if (expand_strn_compare (operands))
+    DONE;
+  else	
+    FAIL;
+})
+
+;; Block compare insn.
+;; Argument 0 is the target (result)
+;; Argument 1 is the destination
+;; Argument 2 is the source
+;; Argument 3 is the length
+;; Argument 4 is the alignment
+
 (define_expand "cmpmemsi"
   [(parallel [(set (match_operand:SI 0)
                (compare:SI (match_operand:BLK 1)
Index: gcc/config/rs6000/rs6000.opt
===================================================================
--- gcc/config/rs6000/rs6000.opt	(revision 243799)
+++ gcc/config/rs6000/rs6000.opt	(working copy)
@@ -337,6 +337,10 @@
 Target Report Var(rs6000_block_compare_inline_limit) Init(5) RejectNegative Joined UInteger Save
 Specify the maximum number pairs of load instructions that should be generated inline for the compare.  If the number needed exceeds the limit, a call to memcmp will be generated instead.
 
+mstring-compare-inline-limit=
+Target Report Var(rs6000_string_compare_inline_limit) Init(8) RejectNegative Joined UInteger Save
+Specify the maximum number pairs of load instructions that should be generated inline for the compare.  If the number needed exceeds the limit, a call to strncmp will be generated instead.
+
 misel
 Target Report Mask(ISEL) Var(rs6000_isa_flags)
 Generate isel instructions.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] builtin expansion of strncmp for rs6000
  2016-12-19 15:58   ` Aaron Sawdey
@ 2016-12-19 17:32     ` Segher Boessenkool
  0 siblings, 0 replies; 8+ messages in thread
From: Segher Boessenkool @ 2016-12-19 17:32 UTC (permalink / raw)
  To: Aaron Sawdey; +Cc: gcc-patches, David Edelsohn, Bill Schmidt

Hi Aaron,

On Mon, Dec 19, 2016 at 09:57:07AM -0600, Aaron Sawdey wrote:
> Bootstrap/regtest in progress on ppc64le -mcpu=power8, ok for trunk if
> results are clean?


> +/* Generate alignment check and branch code to set up for
> +   strncmp when we don't have DI alignment.
> +   STRNCMP_LABEL is the label to branch if there is a page crossing.
> +   SRC is the string pointer to be examined.
> +   BYTES is the max number of bytes to compare.	 */

You have "tab, space" before the */ .

> +static void
> +expand_strncmp_align_check (rtx strncmp_label, rtx src, HOST_WIDE_INT bytes)
> +{
> +  rtx lab_ref = gen_rtx_LABEL_REF (VOIDmode, strncmp_label);
> +  rtx src_check = copy_addr_to_reg (XEXP (src, 0));
> +  if (GET_MODE (src_check) == SImode)
> +    emit_insn (gen_andsi3 (src_check, src_check, GEN_INT (0xfff)));
> +  else
> +    emit_insn (gen_anddi3 (src_check, src_check, GEN_INT (0xfff)));
> +  rtx cond = gen_reg_rtx (CCmode);
> +  emit_move_insn (cond, gen_rtx_COMPARE (CCmode, src_check,
> +					 GEN_INT (4096 - bytes)));
>  \f

And the page break here still.

> +  /* SLOW_UNALIGNED_ACCESS -- don't do unaligned stuff.	 */

Tab space.

> +    {
> +      /* Compare sequence:
> +	 check each 8B with: ld/ld cmpd bne
> +	 cleanup code at end:
> +	 cmpb	       get byte that differs
> +	 cmpb	       look for zero byte
> +	 orc	       combine
> +	 cntlzd	       get bit of first zero/diff byte
> +	 subfic	       convert for rldcl use
> +	 rldcl rldcl   extract diff/zero byte
> +	 subf	       subtract for final result

Mixed tabs and spaces here.

> +      /* We must always left-align the data we read, and
> +	 clear any bytes to the right that are beyond the string.
> +	 Otherwise the cmpb sequence won't produce the correct
> +	 results.  The beginning of the compare will be done
> +	 with word_mode so will not have any extra shifts or
> +	 clear rights.	*/

You have just a tab before */ here.

> +  /* Generate the final sequence that identifies the differing
> +     byte and generates the final result, taking into account
> +     zero bytes:
> +
> +     cmpb	       cmpb_result1, src1, src2
> +     cmpb	       cmpb_result2, src1, zero
> +     orc	       cmpb_result1, cmp_result1, cmpb_result2
> +     cntlzd	       get bit of first zero/diff byte
> +     addi	       convert for rldcl use
> +     rldcl rldcl   extract diff/zero byte
> +     subf	       subtract for final result
> +  */

Mixed tabs and spaces.

The rest looks good.  Please fix those remaining whitespace errors, and
it is okay for trunk.  Thanks,


Segher

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] builtin expansion of strncmp for rs6000
  2016-12-15 16:34 [PATCH] builtin expansion of strncmp for rs6000 Aaron Sawdey
  2016-12-16 23:57 ` Segher Boessenkool
@ 2017-01-12 17:53 ` Joseph Myers
  2017-01-12 21:54   ` Segher Boessenkool
  1 sibling, 1 reply; 8+ messages in thread
From: Joseph Myers @ 2017-01-12 17:53 UTC (permalink / raw)
  To: Aaron Sawdey
  Cc: gcc-patches, Segher Boessenkool, David Edelsohn, Bill Schmidt,
	Tulio Magno Quites Machado Filho

On Thu, 15 Dec 2016, Aaron Sawdey wrote:

> +      emit_library_call_value (gen_rtx_SYMBOL_REF (Pmode, "strncmp"),
> +			       target, LCT_NORMAL, GET_MODE (target), 3,
> +			       force_reg (Pmode, XEXP (src1, 0)), Pmode,
> +			       force_reg (Pmode, XEXP (src2, 0)), Pmode,
> +			       len_rtx, GET_MODE (len_rtx));

Building glibc with GCC mainline for powerpc64le-linux-gnu (from my bot 
doing such builds and compilation parts of the testsuite for all GNU/Linux 
glibc ABIs daily), I'm seeing a failure of the elf/check-localplt test, 
"Extra PLT reference: libc.so: strncmp".

Without having bisected, I suspect this patch of being a likely cause of 
that failure.  glibc redirects internal calls to use hidden aliases such 
as __GI_strncmp to avoid them going through the PLT.  By using 
gen_rtx_SYMBOL_REF there with a hardcoded function name "strncmp", this 
code looks like it would render a declaration of strncmp with asm 
("__GI_strncmp") ineffective, generating a direct call to strncmp when 
glibc expects __GI_strncmp to be called instead.

I don't know how readily this code can be made to respect asm renaming of 
strncmp.  If it's hard to fix in GCC, I suppose glibc needs a powerpc 
version of symbol-hacks.h to handle this redirection.

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] builtin expansion of strncmp for rs6000
  2017-01-12 17:53 ` Joseph Myers
@ 2017-01-12 21:54   ` Segher Boessenkool
  2017-01-13 11:51     ` Tulio Magno Quites Machado Filho
  0 siblings, 1 reply; 8+ messages in thread
From: Segher Boessenkool @ 2017-01-12 21:54 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Aaron Sawdey, gcc-patches, David Edelsohn, Bill Schmidt,
	Tulio Magno Quites Machado Filho

On Thu, Jan 12, 2017 at 05:53:06PM +0000, Joseph Myers wrote:
> On Thu, 15 Dec 2016, Aaron Sawdey wrote:
> 
> > +      emit_library_call_value (gen_rtx_SYMBOL_REF (Pmode, "strncmp"),
> > +			       target, LCT_NORMAL, GET_MODE (target), 3,
> > +			       force_reg (Pmode, XEXP (src1, 0)), Pmode,
> > +			       force_reg (Pmode, XEXP (src2, 0)), Pmode,
> > +			       len_rtx, GET_MODE (len_rtx));
> 
> Building glibc with GCC mainline for powerpc64le-linux-gnu (from my bot 
> doing such builds and compilation parts of the testsuite for all GNU/Linux 
> glibc ABIs daily), I'm seeing a failure of the elf/check-localplt test, 
> "Extra PLT reference: libc.so: strncmp".
> 
> Without having bisected, I suspect this patch of being a likely cause of 
> that failure.  glibc redirects internal calls to use hidden aliases such 
> as __GI_strncmp to avoid them going through the PLT.  By using 
> gen_rtx_SYMBOL_REF there with a hardcoded function name "strncmp", this 
> code looks like it would render a declaration of strncmp with asm 
> ("__GI_strncmp") ineffective, generating a direct call to strncmp when 
> glibc expects __GI_strncmp to be called instead.
> 
> I don't know how readily this code can be made to respect asm renaming of 
> strncmp.  If it's hard to fix in GCC, I suppose glibc needs a powerpc 
> version of symbol-hacks.h to handle this redirection.

Tulio will look at fixing it in glibc tomorrow.  Thanks for the report,



Segher

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] builtin expansion of strncmp for rs6000
  2017-01-12 21:54   ` Segher Boessenkool
@ 2017-01-13 11:51     ` Tulio Magno Quites Machado Filho
  2017-01-13 13:01       ` Joseph Myers
  0 siblings, 1 reply; 8+ messages in thread
From: Tulio Magno Quites Machado Filho @ 2017-01-13 11:51 UTC (permalink / raw)
  To: Segher Boessenkool, Aaron Sawdey
  Cc: gcc-patches, David Edelsohn, Bill Schmidt, Joseph Myers

Segher Boessenkool <segher@kernel.crashing.org> writes:

> On Thu, Jan 12, 2017 at 05:53:06PM +0000, Joseph Myers wrote:
>> On Thu, 15 Dec 2016, Aaron Sawdey wrote:
>> 
>> > +      emit_library_call_value (gen_rtx_SYMBOL_REF (Pmode, "strncmp"),
>> > +			       target, LCT_NORMAL, GET_MODE (target), 3,
>> > +			       force_reg (Pmode, XEXP (src1, 0)), Pmode,
>> > +			       force_reg (Pmode, XEXP (src2, 0)), Pmode,
>> > +			       len_rtx, GET_MODE (len_rtx));
>> 
>> Building glibc with GCC mainline for powerpc64le-linux-gnu (from my bot 
>> doing such builds and compilation parts of the testsuite for all GNU/Linux 
>> glibc ABIs daily), I'm seeing a failure of the elf/check-localplt test, 
>> "Extra PLT reference: libc.so: strncmp".
>> 
>> Without having bisected, I suspect this patch of being a likely cause of 
>> that failure.  glibc redirects internal calls to use hidden aliases such 
>> as __GI_strncmp to avoid them going through the PLT.  By using 
>> gen_rtx_SYMBOL_REF there with a hardcoded function name "strncmp", this 
>> code looks like it would render a declaration of strncmp with asm 
>> ("__GI_strncmp") ineffective, generating a direct call to strncmp when 
>> glibc expects __GI_strncmp to be called instead.
>> 
>> I don't know how readily this code can be made to respect asm renaming of 
>> strncmp.  If it's hard to fix in GCC, I suppose glibc needs a powerpc 
>> version of symbol-hacks.h to handle this redirection.
>
> Tulio will look at fixing it in glibc tomorrow.  Thanks for the report,

Does this code affects a specific rs6000 build? i.e. powerpc64le -mcpu=power8.
Or is it generic?

-- 
Tulio Magno

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] builtin expansion of strncmp for rs6000
  2017-01-13 11:51     ` Tulio Magno Quites Machado Filho
@ 2017-01-13 13:01       ` Joseph Myers
  0 siblings, 0 replies; 8+ messages in thread
From: Joseph Myers @ 2017-01-13 13:01 UTC (permalink / raw)
  To: Tulio Magno Quites Machado Filho
  Cc: Segher Boessenkool, Aaron Sawdey, gcc-patches, David Edelsohn,
	Bill Schmidt

On Fri, 13 Jan 2017, Tulio Magno Quites Machado Filho wrote:

> > Tulio will look at fixing it in glibc tomorrow.  Thanks for the report,
> 
> Does this code affects a specific rs6000 build? i.e. powerpc64le -mcpu=power8.
> Or is it generic?

build-many-glibcs.py does not use any special configure options in GCC or 
glibc to select a particular processor (which I think means a POWER8 
default in GCC; I don't think powerpc glibc yet determines which sysdeps 
directories to use based on compiler defaults, however, though we've 
discussed that this approach, as used by the ARM port, is to be preferred 
over --with-cpu).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-01-13 13:01 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-15 16:34 [PATCH] builtin expansion of strncmp for rs6000 Aaron Sawdey
2016-12-16 23:57 ` Segher Boessenkool
2016-12-19 15:58   ` Aaron Sawdey
2016-12-19 17:32     ` Segher Boessenkool
2017-01-12 17:53 ` Joseph Myers
2017-01-12 21:54   ` Segher Boessenkool
2017-01-13 11:51     ` Tulio Magno Quites Machado Filho
2017-01-13 13:01       ` Joseph Myers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).