public inbox for gcc-patches@gcc.gnu.org
* [Patchv2, rs6000] Clean up pre-checks of expand_block_compare
@ 2023-12-18  2:44 HAO CHEN GUI
  2023-12-19  6:49 ` Kewen.Lin
  0 siblings, 1 reply; 2+ messages in thread
From: HAO CHEN GUI @ 2023-12-18  2:44 UTC (permalink / raw)
  To: gcc-patches; +Cc: Segher Boessenkool, David, Kewen.Lin, Peter Bergner

Hi,
  This patch cleans up the pre-checks in expand_block_compare. It does
1. Assert that only P7 and above can enter this function, as that is
already guarded by the expander.
2. Return false when optimizing for size.
3. Remove the P7 processor test, as only P7 and above can enter this
function and P7 LE is excluded by targetm.slow_unaligned_access. On P7
BE, the expanded code outperforms the library call when the length is
long.
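
As a small illustration (mine, not part of the patch), the kind of
fixed-length compare affected looks like the following; with this patch
it is expanded inline only when not optimizing for size:

  /* Illustrative sketch only.  A fixed-length comparison that
     expand_block_compare may expand inline; under -Os the patch makes
     it fall back to a library call to memcmp instead.  */
  int
  cmp4 (const char *a, const char *b)
  {
    return __builtin_memcmp (a, b, 4);
  }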

  Compared to the last version,
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640082.html
the main changes are to add some comments and move the variable
definitions closer to their uses.

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Clean up the pre-checks of expand_block_compare

gcc/
	* config/rs6000/rs6000-string.cc (expand_block_compare): Assert
	that only P7 and above can enter this function.  Return false
	(call the library) when optimizing for size.  Remove the P7 CPU
	test, as only P7 and above can enter this function and P7 LE is
	excluded by the check of targetm.slow_unaligned_access on
	word_mode.  Also, performance tests show that expanding block
	compares of 16 to 64 bytes beats the library call on P7 BE.

gcc/testsuite/
	* gcc.target/powerpc/block-cmp-3.c: New.


patch.diff
diff --git a/gcc/config/rs6000/rs6000-string.cc b/gcc/config/rs6000/rs6000-string.cc
index cb9eeef05d8..49670cef4d7 100644
--- a/gcc/config/rs6000/rs6000-string.cc
+++ b/gcc/config/rs6000/rs6000-string.cc
@@ -1946,36 +1946,32 @@ expand_block_compare_gpr(unsigned HOST_WIDE_INT bytes, unsigned int base_align,
 bool
 expand_block_compare (rtx operands[])
 {
-  rtx target = operands[0];
-  rtx orig_src1 = operands[1];
-  rtx orig_src2 = operands[2];
-  rtx bytes_rtx = operands[3];
-  rtx align_rtx = operands[4];
+  /* TARGET_POPCNTD is already checked by the cmpmemsi expander.  */
+  gcc_assert (TARGET_POPCNTD);

-  /* This case is complicated to handle because the subtract
-     with carry instructions do not generate the 64-bit
-     carry and so we must emit code to calculate it ourselves.
-     We choose not to implement this yet.  */
-  if (TARGET_32BIT && TARGET_POWERPC64)
+  if (optimize_insn_for_size_p ())
     return false;

-  bool isP7 = (rs6000_tune == PROCESSOR_POWER7);
-
   /* Allow this param to shut off all expansion.  */
   if (rs6000_block_compare_inline_limit == 0)
     return false;

-  /* targetm.slow_unaligned_access -- don't do unaligned stuff.
-     However slow_unaligned_access returns true on P7 even though the
-     performance of this code is good there.  */
-  if (!isP7
-      && (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
-	  || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2))))
+  /* This case is complicated to handle because the subtract
+     with carry instructions do not generate the 64-bit
+     carry and so we must emit code to calculate it ourselves.
+     We choose not to implement this yet.  */
+  if (TARGET_32BIT && TARGET_POWERPC64)
     return false;

-  /* Unaligned l*brx traps on P7 so don't do this.  However this should
-     not affect much because LE isn't really supported on P7 anyway.  */
-  if (isP7 && !BYTES_BIG_ENDIAN)
+  rtx target = operands[0];
+  rtx orig_src1 = operands[1];
+  rtx orig_src2 = operands[2];
+  rtx bytes_rtx = operands[3];
+  rtx align_rtx = operands[4];
+
+  /* targetm.slow_unaligned_access -- don't do unaligned stuff.  */
+  if (targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src1))
+      || targetm.slow_unaligned_access (word_mode, MEM_ALIGN (orig_src2)))
     return false;

   /* If this is not a fixed size compare, try generating loop code and
@@ -2023,14 +2019,6 @@ expand_block_compare (rtx operands[])
   if (!IN_RANGE (bytes, 1, max_bytes))
     return expand_compare_loop (operands);

-  /* The code generated for p7 and older is not faster than glibc
-     memcmp if alignment is small and length is not short, so bail
-     out to avoid those conditions.  */
-  if (targetm.slow_unaligned_access (word_mode, UINTVAL (align_rtx))
-      && ((base_align == 1 && bytes > 16)
-	  || (base_align == 2 && bytes > 32)))
-    return false;
-
   rtx final_label = NULL;

   if (use_vec)
diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c
new file mode 100644
index 00000000000..c7e853ad593
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-3.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+/* { dg-final { scan-assembler-times {\mb[l]? memcmp\M} 1 } }  */
+
+int foo (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 4);
+}
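
A hypothetical companion test (my sketch, not included in the patch)
could check the complementary direction, i.e. that the compare is still
expanded inline when not optimizing for size, so no memcmp call is
emitted:

/* Hypothetical companion test, not part of this patch.  */
/* { dg-do compile } */
/* { dg-options "-O2" } */
/* { dg-final { scan-assembler-not {\mb[l]? memcmp\M} } }  */

int foo (const char *s1, const char *s2)
{
  return __builtin_memcmp (s1, s2, 4);
}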


* Re: [Patchv2, rs6000] Clean up pre-checks of expand_block_compare
  2023-12-18  2:44 [Patchv2, rs6000] Clean up pre-checks of expand_block_compare HAO CHEN GUI
@ 2023-12-19  6:49 ` Kewen.Lin
  0 siblings, 0 replies; 2+ messages in thread
From: Kewen.Lin @ 2023-12-19  6:49 UTC (permalink / raw)
  To: HAO CHEN GUI; +Cc: Segher Boessenkool, David, Peter Bergner, gcc-patches

Hi Haochen,

on 2023/12/18 10:44, HAO CHEN GUI wrote:
> Hi,
>   This patch cleans up the pre-checks in expand_block_compare. It does
> 1. Assert that only P7 and above can enter this function, as that is
> already guarded by the expander.
> 2. Return false when optimizing for size.
> 3. Remove the P7 processor test, as only P7 and above can enter this
> function and P7 LE is excluded by targetm.slow_unaligned_access. On P7
> BE, the expanded code outperforms the library call when the length is
> long.

Maybe it's better to split the handling for optimizing for size out
into a separate patch, since it's not actually a cleanup.  Sorry, I
should have suggested this in the previous review.  For point 3, as you
have evaluated the performance on Power7, I think it's safe to make
this change now, so this patch is OK for trunk, thanks!

> 
>   Compared to the last version,
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640082.html
> the main changes are to add some comments and move the variable
> definitions closer to their uses.
> 
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions. Is this OK for trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Clean up the pre-checks of expand_block_compare
> 
> gcc/
> 	* config/rs6000/rs6000-string.cc (expand_block_compare): Assert
> 	that only P7 and above can enter this function.  Return false
> 	(call the library) when optimizing for size.  Remove the P7 CPU
> 	test, as only P7 and above can enter this function and P7 LE is
> 	excluded by the check of targetm.slow_unaligned_access on
> 	word_mode.  Also, performance tests show that expanding block
> 	compares of 16 to 64 bytes beats the library call on P7 BE.

Nit: You can just describe "what's done" but not "why" here, and put "why"
into the commit log instead.

BR,
Kewen

