[PATCH-2, rs6000] Enable vector mode for memory equality compare [PR111449]

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH-2, rs6000] Enable vector mode for memory equality compare [PR111449]
@ 2023-10-09  2:30 HAO CHEN GUI
  2023-10-09 15:42 ` David Edelsohn
  0 siblings, 1 reply; 7+ messages in thread
From: HAO CHEN GUI @ 2023-10-09  2:30 UTC (permalink / raw)
  To: gcc-patches; +Cc: Segher Boessenkool, David, Kewen.Lin, Peter Bergner

Hi,
  This patch enables vector mode for memory equality compare by adding
a new expand cbranchv16qi4 and implementing it. Also the corresponding
CC reg and compare code is set in rs6000_generate_compare. With the
patch, 16-byte equality compare can be implemented by one vector compare
instructions other than 2 8-byte compares with branches.

  The test case is in the second patch which is rs6000 specific.

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions.

Thanks
Gui Haochen

ChangeLog
rs6000: Enable vector compare for memory equality compare

gcc/
	PR target/111449
	* config/rs6000/altivec.md (cbranchv16qi4): New expand pattern.
	* config/rs6000/rs6000.cc (rs6000_generate_compare): Generate insn
	sequence for V16QImode equality compare.
	* config/rs6000/rs6000.h (MOVE_MAX_PIECES): Define.
	(COMPARE_MAX_PIECES): Define.

gcc/testsuite/
	PR target/111449
	* gcc.target/powerpc/pr111449.c: New.

patch.diff
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index e8a596fb7e9..c69bf266402 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2605,6 +2605,39 @@ (define_insn "altivec_vupklpx"
 }
   [(set_attr "type" "vecperm")])

+(define_expand "cbranchv16qi4"
+  [(use (match_operator 0 "equality_operator"
+	[(match_operand:V16QI 1 "gpc_reg_operand")
+	 (match_operand:V16QI 2 "gpc_reg_operand")]))
+   (use (match_operand 3))]
+  "VECTOR_UNIT_ALTIVEC_P (V16QImode)"
+{
+  if (!TARGET_P9_VECTOR
+      && MEM_P (operands[1])
+      && !altivec_indexed_or_indirect_operand (operands[1], V16QImode)
+      && MEM_P (operands[2])
+      && !altivec_indexed_or_indirect_operand (operands[2], V16QImode))
+    {
+      /* Use direct move as the byte order doesn't matter for equality
+	 compare.  */
+      rtx reg_op1 = gen_reg_rtx (V16QImode);
+      rtx reg_op2 = gen_reg_rtx (V16QImode);
+      rs6000_emit_le_vsx_permute (reg_op1, operands[1], V16QImode);
+      rs6000_emit_le_vsx_permute (reg_op2, operands[2], V16QImode);
+      operands[1] = reg_op1;
+      operands[2] = reg_op2;
+    }
+  else
+    {
+      operands[1] = force_reg (V16QImode, operands[1]);
+      operands[2] = force_reg (V16QImode, operands[2]);
+    }
+  rtx_code code = GET_CODE (operands[0]);
+  operands[0] = gen_rtx_fmt_ee (code, V16QImode, operands[1], operands[2]);
+  rs6000_emit_cbranch (V16QImode, operands);
+  DONE;
+})
+
 ;; Compare vectors producing a vector result and a predicate, setting CR6 to
 ;; indicate a combined status
 (define_insn "altivec_vcmpequ<VI_char>_p"
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index efe9adce1f8..0087d786840 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15264,6 +15264,15 @@ rs6000_generate_compare (rtx cmp, machine_mode mode)
 	  else
 	    emit_insn (gen_stack_protect_testsi (compare_result, op0, op1b));
 	}
+      else if (mode == V16QImode)
+	{
+	  gcc_assert (code == EQ || code == NE);
+
+	  rtx result_vector = gen_reg_rtx (V16QImode);
+	  compare_result = gen_rtx_REG (CCmode, CR6_REGNO);
+	  emit_insn (gen_altivec_vcmpequb_p (result_vector, op0, op1));
+	  code = (code == NE) ? GE : LT;
+	}
       else
 	emit_insn (gen_rtx_SET (compare_result,
 				gen_rtx_COMPARE (comp_mode, op0, op1)));
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..dc33bca0802 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -1730,6 +1730,8 @@ typedef struct rs6000_args
    in one reasonably fast instruction.  */
 #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8)
 #define MAX_MOVE_MAX 8
+#define MOVE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
+#define COMPARE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)

 /* Nonzero if access to memory by bytes is no faster than for words.
    Also nonzero if doing byte operations (specifically shifts) in registers
diff --git a/gcc/testsuite/gcc.target/powerpc/pr111449.c b/gcc/testsuite/gcc.target/powerpc/pr111449.c
new file mode 100644
index 00000000000..a8c30b92a41
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr111449.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-maltivec -O2" } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+/* Ensure vector comparison is used for 16-byte memory equality compare.  */
+
+int compare1 (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 16) == 0;
+}
+
+int compare2 (const char* s1)
+{
+  return __builtin_memcmp (s1, "0123456789012345", 16) == 0;
+}
+
+/* { dg-final { scan-assembler-times {\mvcmpequb\.} 2 } } */
+/* { dg-final { scan-assembler-not {\mcmpd\M} } } */

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH-2, rs6000] Enable vector mode for memory equality compare [PR111449]
  2023-10-09  2:30 [PATCH-2, rs6000] Enable vector mode for memory equality compare [PR111449] HAO CHEN GUI
@ 2023-10-09 15:42 ` David Edelsohn
  2023-10-10  8:18   ` HAO CHEN GUI
  0 siblings, 1 reply; 7+ messages in thread
From: David Edelsohn @ 2023-10-09 15:42 UTC (permalink / raw)
  To: HAO CHEN GUI; +Cc: gcc-patches, Segher Boessenkool, Kewen.Lin, Peter Bergner

[-- Attachment #1: Type: text/plain, Size: 5678 bytes --]

On Sun, Oct 8, 2023 at 10:30 PM HAO CHEN GUI <guihaoc@linux.ibm.com> wrote:

> Hi,
>   This patch enables vector mode for memory equality compare by adding
> a new expand cbranchv16qi4 and implementing it. Also the corresponding
> CC reg and compare code is set in rs6000_generate_compare. With the
> patch, 16-byte equality compare can be implemented by one vector compare
> instructions other than 2 8-byte compares with branches.
>
>   The test case is in the second patch which is rs6000 specific.
>
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions.
>

Thanks for working on this.



>
> Thanks
> Gui Haochen
>
> ChangeLog
> rs6000: Enable vector compare for memory equality compare
>
> gcc/
>         PR target/111449
>         * config/rs6000/altivec.md (cbranchv16qi4): New expand pattern.
>         * config/rs6000/rs6000.cc (rs6000_generate_compare): Generate insn
>         sequence for V16QImode equality compare.
>         * config/rs6000/rs6000.h (MOVE_MAX_PIECES): Define.
>         (COMPARE_MAX_PIECES): Define.
>
> gcc/testsuite/
>         PR target/111449
>         * gcc.target/powerpc/pr111449.c: New.
>
> patch.diff
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index e8a596fb7e9..c69bf266402 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -2605,6 +2605,39 @@ (define_insn "altivec_vupklpx"
>  }
>    [(set_attr "type" "vecperm")])
>
> +(define_expand "cbranchv16qi4"
> +  [(use (match_operator 0 "equality_operator"
> +       [(match_operand:V16QI 1 "gpc_reg_operand")
> +        (match_operand:V16QI 2 "gpc_reg_operand")]))
> +   (use (match_operand 3))]
> +  "VECTOR_UNIT_ALTIVEC_P (V16QImode)"
> +{
> +  if (!TARGET_P9_VECTOR
> +      && MEM_P (operands[1])
> +      && !altivec_indexed_or_indirect_operand (operands[1], V16QImode)
> +      && MEM_P (operands[2])
> +      && !altivec_indexed_or_indirect_operand (operands[2], V16QImode))
> +    {
> +      /* Use direct move as the byte order doesn't matter for equality
> +        compare.  */
> +      rtx reg_op1 = gen_reg_rtx (V16QImode);
> +      rtx reg_op2 = gen_reg_rtx (V16QImode);
> +      rs6000_emit_le_vsx_permute (reg_op1, operands[1], V16QImode);
> +      rs6000_emit_le_vsx_permute (reg_op2, operands[2], V16QImode);
> +      operands[1] = reg_op1;
> +      operands[2] = reg_op2;
> +    }
> +  else
> +    {
> +      operands[1] = force_reg (V16QImode, operands[1]);
> +      operands[2] = force_reg (V16QImode, operands[2]);
> +    }
> +  rtx_code code = GET_CODE (operands[0]);
> +  operands[0] = gen_rtx_fmt_ee (code, V16QImode, operands[1],
> operands[2]);
> +  rs6000_emit_cbranch (V16QImode, operands);
> +  DONE;
> +})
> +
>  ;; Compare vectors producing a vector result and a predicate, setting CR6
> to
>  ;; indicate a combined status
>  (define_insn "altivec_vcmpequ<VI_char>_p"
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index efe9adce1f8..0087d786840 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -15264,6 +15264,15 @@ rs6000_generate_compare (rtx cmp, machine_mode
> mode)
>           else
>             emit_insn (gen_stack_protect_testsi (compare_result, op0,
> op1b));
>         }
> +      else if (mode == V16QImode)
> +       {
> +         gcc_assert (code == EQ || code == NE);
> +
> +         rtx result_vector = gen_reg_rtx (V16QImode);
> +         compare_result = gen_rtx_REG (CCmode, CR6_REGNO);
> +         emit_insn (gen_altivec_vcmpequb_p (result_vector, op0, op1));
> +         code = (code == NE) ? GE : LT;
> +       }
>        else
>         emit_insn (gen_rtx_SET (compare_result,
>                                 gen_rtx_COMPARE (comp_mode, op0, op1)));
> diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
> index 3503614efbd..dc33bca0802 100644
> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -1730,6 +1730,8 @@ typedef struct rs6000_args
>     in one reasonably fast instruction.  */
>  #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8)
>  #define MAX_MOVE_MAX 8
> +#define MOVE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
> +#define COMPARE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
>

How are the definitions of MOVE_MAX_PIECES and COMPARE_MAX_PIECES
determined?  The email does not provide any explanation for the
implementation.  The rest of the patch is related to vector support, but
vector support is not dependent on TARGET_POWERPC64.

Thanks, David


>
>  /* Nonzero if access to memory by bytes is no faster than for words.
>     Also nonzero if doing byte operations (specifically shifts) in
> registers
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr111449.c
> b/gcc/testsuite/gcc.target/powerpc/pr111449.c
> new file mode 100644
> index 00000000000..a8c30b92a41
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr111449.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-maltivec -O2" } */
> +/* { dg-require-effective-target has_arch_ppc64 } */
> +
> +/* Ensure vector comparison is used for 16-byte memory equality compare.
> */
> +
> +int compare1 (const char* s1, const char* s2)
> +{
> +  return __builtin_memcmp (s1, s2, 16) == 0;
> +}
> +
> +int compare2 (const char* s1)
> +{
> +  return __builtin_memcmp (s1, "0123456789012345", 16) == 0;
> +}
> +
> +/* { dg-final { scan-assembler-times {\mvcmpequb\.} 2 } } */
> +/* { dg-final { scan-assembler-not {\mcmpd\M} } } */
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH-2, rs6000] Enable vector mode for memory equality compare [PR111449]
  2023-10-09 15:42 ` David Edelsohn
@ 2023-10-10  8:18   ` HAO CHEN GUI
  2023-10-10 12:44     ` David Edelsohn
  2023-10-17  2:19     ` Kewen.Lin
  0 siblings, 2 replies; 7+ messages in thread
From: HAO CHEN GUI @ 2023-10-10  8:18 UTC (permalink / raw)
  To: David Edelsohn; +Cc: gcc-patches, Segher Boessenkool, Kewen.Lin, Peter Bergner

Hi David,

  Thanks for your review comments.

在 2023/10/9 23:42, David Edelsohn 写道:
>      #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8)
>      #define MAX_MOVE_MAX 8
>     +#define MOVE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
>     +#define COMPARE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
> 
> 
> How are the definitions of MOVE_MAX_PIECES and COMPARE_MAX_PIECES determined?  The email does not provide any explanation for the implementation.  The rest of the patch is related to vector support, but vector support is not dependent on TARGET_POWERPC64.

By default, MOVE_MAX_PIECES and COMPARE_MAX_PIECES is set the same value
as MOVE_MAX. The move and compare instructions are required in
compare_by_pieces, those macros are set to 16 byte when supporting
vector mode (V16QImode). The problem is rs6000 hasn't supported TImode
for "-m32". We discussed it in issue 1307. TImode will be used for
move when MOVE_MAX_PIECES is set to 16. But TImode isn't supported
with "-m32" which might cause ICE.

So MOVE_MAX_PIECES and COMPARE_MAX_PIECES is set to 4 for 32 bit
target in this patch. They could be changed to 16 after rs6000
supports TImode with "-m32".

Thanks
Gui Haochen

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH-2, rs6000] Enable vector mode for memory equality compare [PR111449]
  2023-10-10  8:18   ` HAO CHEN GUI
@ 2023-10-10 12:44     ` David Edelsohn
  2023-10-11  9:10       ` HAO CHEN GUI
  2023-10-17  2:19     ` Kewen.Lin
  1 sibling, 1 reply; 7+ messages in thread
From: David Edelsohn @ 2023-10-10 12:44 UTC (permalink / raw)
  To: HAO CHEN GUI; +Cc: gcc-patches, Segher Boessenkool, Kewen.Lin, Peter Bergner

[-- Attachment #1: Type: text/plain, Size: 1531 bytes --]

On Tue, Oct 10, 2023 at 4:23 AM HAO CHEN GUI <guihaoc@linux.ibm.com> wrote:

> Hi David,
>
>   Thanks for your review comments.
>
> 在 2023/10/9 23:42, David Edelsohn 写道:
> >      #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8)
> >      #define MAX_MOVE_MAX 8
> >     +#define MOVE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
> >     +#define COMPARE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
> >
> >
> > How are the definitions of MOVE_MAX_PIECES and COMPARE_MAX_PIECES
> determined?  The email does not provide any explanation for the
> implementation.  The rest of the patch is related to vector support, but
> vector support is not dependent on TARGET_POWERPC64.
>
> By default, MOVE_MAX_PIECES and COMPARE_MAX_PIECES is set the same value
> as MOVE_MAX. The move and compare instructions are required in
> compare_by_pieces, those macros are set to 16 byte when supporting
> vector mode (V16QImode). The problem is rs6000 hasn't supported TImode
> for "-m32". We discussed it in issue 1307. TImode will be used for
> move when MOVE_MAX_PIECES is set to 16. But TImode isn't supported
> with "-m32" which might cause ICE.
>
> So MOVE_MAX_PIECES and COMPARE_MAX_PIECES is set to 4 for 32 bit
> target in this patch. They could be changed to 16 after rs6000
> supports TImode with "-m32".
>

Hi, Hao

Thanks for the explanation.

Are you stating that although PPC32 supports V16QImode in VSX, the
move_by_pieces support also requires TImode, which is not available on
PPC32?

Thanks, David

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH-2, rs6000] Enable vector mode for memory equality compare [PR111449]
  2023-10-10 12:44     ` David Edelsohn
@ 2023-10-11  9:10       ` HAO CHEN GUI
  0 siblings, 0 replies; 7+ messages in thread
From: HAO CHEN GUI @ 2023-10-11  9:10 UTC (permalink / raw)
  To: David Edelsohn; +Cc: gcc-patches, Segher Boessenkool, Kewen.Lin, Peter Bergner

Hi David,

在 2023/10/10 20:44, David Edelsohn 写道:
> Are you stating that although PPC32 supports V16QImode in VSX, the move_by_pieces support also requires TImode, which is not available on PPC32?
> 

Yes. By setting MOVE_MAX_PIECES to 16, TImode compare
might be generated as it checks vector mode first then
uses scalar mode by default.

Thanks
Gui Haochen

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH-2, rs6000] Enable vector mode for memory equality compare [PR111449]
  2023-10-10  8:18   ` HAO CHEN GUI
  2023-10-10 12:44     ` David Edelsohn
@ 2023-10-17  2:19     ` Kewen.Lin
  2023-10-19  8:39       ` HAO CHEN GUI
  1 sibling, 1 reply; 7+ messages in thread
From: Kewen.Lin @ 2023-10-17  2:19 UTC (permalink / raw)
  To: HAO CHEN GUI, David Edelsohn
  Cc: gcc-patches, Segher Boessenkool, Peter Bergner

Hi,

on 2023/10/10 16:18, HAO CHEN GUI wrote:
> Hi David,
> 
>   Thanks for your review comments.
> 
> 在 2023/10/9 23:42, David Edelsohn 写道:
>>      #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8)
>>      #define MAX_MOVE_MAX 8
>>     +#define MOVE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
>>     +#define COMPARE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
>>
>>
>> How are the definitions of MOVE_MAX_PIECES and COMPARE_MAX_PIECES determined?  The email does not provide any explanation for the implementation.  The rest of the patch is related to vector support, but vector support is not dependent on TARGET_POWERPC64.
> 
> By default, MOVE_MAX_PIECES and COMPARE_MAX_PIECES is set the same value
> as MOVE_MAX. The move and compare instructions are required in
> compare_by_pieces, those macros are set to 16 byte when supporting
> vector mode (V16QImode). The problem is rs6000 hasn't supported TImode
> for "-m32". We discussed it in issue 1307. TImode will be used for
> move when MOVE_MAX_PIECES is set to 16. But TImode isn't supported
> with "-m32" which might cause ICE.

I think David raised a good question, it sounds to me that the current
handling simply consider that if MOVE_MAX_PIECES is set to 16, the
required operations for this optimization on TImode are always available,
but unfortunately on rs6000 the assumption doesn't hold, so could we
teach generic code instead?

BR,
Kewen

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH-2, rs6000] Enable vector mode for memory equality compare [PR111449]
  2023-10-17  2:19     ` Kewen.Lin
@ 2023-10-19  8:39       ` HAO CHEN GUI
  0 siblings, 0 replies; 7+ messages in thread
From: HAO CHEN GUI @ 2023-10-19  8:39 UTC (permalink / raw)
  To: Kewen.Lin, David Edelsohn; +Cc: gcc-patches, Segher Boessenkool, Peter Bergner

Kewen & David,
  Thanks for your comments.

在 2023/10/17 10:19, Kewen.Lin 写道:
> I think David raised a good question, it sounds to me that the current
> handling simply consider that if MOVE_MAX_PIECES is set to 16, the
> required operations for this optimization on TImode are always available,
> but unfortunately on rs6000 the assumption doesn't hold, so could we
> teach generic code instead?

Finally I found that it doesn't check if the scalar mode used in by pieces
operations is enabled by the target. The TImode is not enabled on ppc. It
should be checked before taking TImode to do by pieces operations. I made
a patch for the generic code and testing it. With the patch, 16-byte
comparison could be enabled on both ppc64 and ppc.

Thanks
Gui Haochen

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-10-19  8:39 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-09  2:30 [PATCH-2, rs6000] Enable vector mode for memory equality compare [PR111449] HAO CHEN GUI
2023-10-09 15:42 ` David Edelsohn
2023-10-10  8:18   ` HAO CHEN GUI
2023-10-10 12:44     ` David Edelsohn
2023-10-11  9:10       ` HAO CHEN GUI
2023-10-17  2:19     ` Kewen.Lin
2023-10-19  8:39       ` HAO CHEN GUI

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).