[PATCH, rs6000] Enable vector compare for 16-byte memory equality compare [PR111449]

public inbox for gcc-patches@gcc.gnu.org

* [PATCH, rs6000] Enable vector compare for 16-byte memory equality compare [PR111449]
From: HAO CHEN GUI @ 2023-09-20  8:49 UTC
  To: gcc-patches; +Cc: Segher Boessenkool, David, Kewen.Lin, Peter Bergner

Hi,
  This patch enables vector compare for 16-byte memory equality compares.
A 16-byte memory equality compare can be implemented efficiently with the
"vcmpequb." instruction.  Compared with a sequence of two 8-byte compares,
it saves one branch and one compare.
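
  As a rough illustration of that saving, here is a portable C sketch (not
part of the patch; both function names are hypothetical).  The first
function has the shape of the two 8-byte compare sequence, with two
compares and two branch points.  The second models the vector shape: a
single "are all 16 bytes equal" predicate followed by one test.

```c
#include <stdint.h>
#include <string.h>

/* Scalar shape: two 8-byte loads per operand, two compares, two branches.  */
static int
equal16_two_dwords (const void *a, const void *b)
{
  uint64_t a0, a1, b0, b1;

  memcpy (&a0, a, 8);
  memcpy (&b0, b, 8);
  if (a0 != b0)                 /* first compare and branch */
    return 0;

  memcpy (&a1, (const char *) a + 8, 8);
  memcpy (&b1, (const char *) b + 8, 8);
  return a1 == b1;              /* second compare and branch */
}

/* Vector shape, modelled byte by byte: accumulate the differences of all
   16 lanes and test the combined result once.  */
static int
equal16_one_predicate (const void *a, const void *b)
{
  const unsigned char *p = a;
  const unsigned char *q = b;
  unsigned char diff = 0;

  for (int i = 0; i < 16; i++)
    diff |= (unsigned char) (p[i] ^ q[i]);
  return diff == 0;             /* single test on the combined status */
}
```

Both functions return the same answer for any pair of 16-byte buffers; only
the number of compare-and-branch steps differs.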

  The 16-byte vector compare is not enabled on 32-bit sub-targets, as TImode
is not yet well supported on them.

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.

Thanks
Gui Haochen

ChangeLog
rs6000: Enable vector compare for 16-byte memory equality compare

gcc/
	PR target/111449
	* config/rs6000/altivec.md (cbranchti4): New expand pattern.
	* config/rs6000/rs6000.cc (rs6000_generate_compare): Generate insn
	sequence for TImode vector equality compare.
	* config/rs6000/rs6000.h (MOVE_MAX_PIECES): Define.
	(COMPARE_MAX_PIECES): Define.

gcc/testsuite/
	PR target/111449
	* gcc.target/powerpc/pr111449.c: New.

patch.diff
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index e8a596fb7e9..99264235cbe 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2605,6 +2605,24 @@ (define_insn "altivec_vupklpx"
 }
   [(set_attr "type" "vecperm")])

+(define_expand "cbranchti4"
+  [(use (match_operator 0 "equality_operator"
+	[(match_operand:TI 1 "memory_operand")
+	 (match_operand:TI 2 "memory_operand")]))
+   (use (match_operand 3))]
+  "VECTOR_UNIT_ALTIVEC_P (V16QImode)"
+{
+  rtx op1 = simplify_subreg (V16QImode, operands[1], TImode, 0);
+  rtx op2 = simplify_subreg (V16QImode, operands[2], TImode, 0);
+  operands[1] = force_reg (V16QImode, op1);
+  operands[2] = force_reg (V16QImode, op2);
+  rtx_code code = GET_CODE (operands[0]);
+  operands[0] = gen_rtx_fmt_ee (code, V16QImode, operands[1],
+				operands[2]);
+  rs6000_emit_cbranch (TImode, operands);
+  DONE;
+})
+
 ;; Compare vectors producing a vector result and a predicate, setting CR6 to
 ;; indicate a combined status
 (define_insn "altivec_vcmpequ<VI_char>_p"
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index efe9adce1f8..c6b935a64e7 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15264,6 +15264,15 @@ rs6000_generate_compare (rtx cmp, machine_mode mode)
 	  else
 	    emit_insn (gen_stack_protect_testsi (compare_result, op0, op1b));
 	}
+      else if (mode == TImode)
+	{
+	  gcc_assert (code == EQ || code == NE);
+
+	  rtx result_vector = gen_reg_rtx (V16QImode);
+	  compare_result = gen_rtx_REG (CCmode, CR6_REGNO);
+	  emit_insn (gen_altivec_vcmpequb_p (result_vector, op0, op1));
+	  code = (code == NE) ? GE : LT;
+	}
       else
 	emit_insn (gen_rtx_SET (compare_result,
 				gen_rtx_COMPARE (comp_mode, op0, op1)));
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..dc33bca0802 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -1730,6 +1730,8 @@ typedef struct rs6000_args
    in one reasonably fast instruction.  */
 #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8)
 #define MAX_MOVE_MAX 8
+#define MOVE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
+#define COMPARE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)

 /* Nonzero if access to memory by bytes is no faster than for words.
    Also nonzero if doing byte operations (specifically shifts) in registers
diff --git a/gcc/testsuite/gcc.target/powerpc/pr111449.c b/gcc/testsuite/gcc.target/powerpc/pr111449.c
new file mode 100644
index 00000000000..ab9583f47bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr111449.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-maltivec -O2" } */
+/* { dg-require-effective-target has_arch_ppc64 } */
+
+/* Ensure vector comparison is used for 16-byte memory equality compare.  */
+
+int compare (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 16) == 0;
+}
+
+/* { dg-final { scan-assembler-times {\mvcmpequb\M} 1 } } */
+/* { dg-final { scan-assembler-not {\mcmpd\M} } } */
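
For readers unfamiliar with the CR6 encoding that the rs6000_generate_compare
hunk relies on: the record form "vcmpequb." sets the CR6 LT bit when all 16
byte lanes compare equal and the CR6 EQ bit when no lane compares equal.
That is why the hunk tests EQ as LT and NE as GE (LT not set).  The following
is a small C model of that mapping, not part of the patch; the struct and
function names are hypothetical.

```c
#include <stdbool.h>

/* Model of the two CR6 bits written by "vcmpequb.".  */
struct cr6_model
{
  bool lt;   /* set when all 16 byte lanes are equal */
  bool eq;   /* set when no byte lane is equal */
};

static struct cr6_model
model_vcmpequb_dot (const unsigned char a[16], const unsigned char b[16])
{
  int matches = 0;

  for (int i = 0; i < 16; i++)
    matches += (a[i] == b[i]);

  return (struct cr6_model) { matches == 16, matches == 0 };
}

/* An equality test branches on LT; an inequality test branches on GE,
   i.e. on LT being clear.  */
static bool
branch_taken (bool want_equal, struct cr6_model cr6)
{
  return want_equal ? cr6.lt : !cr6.lt;
}
```

Note that the EQ bit is not the right bit for a memory equality test: it
only says that every lane differs, so buffers that differ in a single byte
leave both bits clear.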


* Re: [PATCH, rs6000] Enable vector compare for 16-byte memory equality compare [PR111449]
From: Kewen.Lin @ 2023-09-25  6:09 UTC
  To: HAO CHEN GUI
  Cc: Segher Boessenkool, David, Peter Bergner, gcc-patches,
	Richard Biener, Richard Sandiford, Jeff Law, Jakub Jelinek

Hi,

on 2023/9/20 16:49, HAO CHEN GUI wrote:
> Hi,
>   This patch enables vector compare for 16-byte memory equality compares.
> A 16-byte memory equality compare can be implemented efficiently with the
> "vcmpequb." instruction.  Compared with a sequence of two 8-byte compares,
> it saves one branch and one compare.

It looks nice to exploit vcmpequb. for this comparison.

> 
>   The 16-byte vector compare is not enabled on 32-bit sub-targets, as TImode
> is not yet well supported on them.

But it sounds odd to express this with TImode when the underlying instruction
operates on V16QImode.  It does NOT necessarily depend on TImode, so if it
were coded with V16QImode it would not suffer from this lack of support.

It seems the reason you hacked this with TImode is that the generic code
only considers scalar modes?  I wonder if we can extend the generic code
to consider vector modes as well.  That would also help if we have wider
vector modes one day.

I guess there is no blocker or limitation that prevents considering vector
modes?  CC some experts.

BR,
Kewen


* Re: [PATCH, rs6000] Enable vector compare for 16-byte memory equality compare [PR111449]
From: Richard Sandiford @ 2023-09-27 11:10 UTC
  To: Kewen.Lin
  Cc: HAO CHEN GUI, Segher Boessenkool, David, Peter Bergner,
	gcc-patches, Richard Biener, Jeff Law, Jakub Jelinek

"Kewen.Lin" <linkw@linux.ibm.com> writes:
> Hi,
>
> on 2023/9/20 16:49, HAO CHEN GUI wrote:
>> Hi,
>>   This patch enables vector compare for 16-byte memory equality compares.
>> A 16-byte memory equality compare can be implemented efficiently with the
>> "vcmpequb." instruction.  Compared with a sequence of two 8-byte compares,
>> it saves one branch and one compare.
>
> It looks nice to exploit vcmpequb. for this comparison.
>
>> 
>>   The 16-byte vector compare is not enabled on 32-bit sub-targets, as TImode
>> is not yet well supported on them.
>
> But it sounds odd to express this with TImode when the underlying instruction
> operates on V16QImode.  It does NOT necessarily depend on TImode, so if it
> were coded with V16QImode it would not suffer from this lack of support.
>
> It seems the reason you hacked this with TImode is that the generic code
> only considers scalar modes?  I wonder if we can extend the generic code
> to consider vector modes as well.  That would also help if we have wider
> vector modes one day.
>
> I guess there is no blocker or limitation that prevents considering vector
> modes?

Yeah, I agree there doesn't seem to be a good reason to exclude vectors.
Sorry to dive straight into details, but maybe we should have something
called bitwise_mode_for_size that tries to use integer modes where possible,
but falls back to vector modes otherwise.  That mode could then be used
for copying, storing, bitwise ops, and equality comparisons (if there
is appropriate optabs support).
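
To make the idea concrete, here is a toy C model of such a selection
function.  It is only a sketch under stated assumptions: the mode enum, the
target description struct, and the function name are hypothetical stand-ins
and are not GCC's machine_mode API.  The model prefers an integer mode of
the requested size and falls back to a 16-byte vector mode only when no
suitable integer mode exists.

```c
#include <stddef.h>

/* Hypothetical stand-ins for machine modes.  */
enum toy_mode { TOY_NONE, TOY_QI, TOY_HI, TOY_SI, TOY_DI, TOY_V16QI };

/* Hypothetical target description.  */
struct toy_target
{
  size_t max_int_bytes;   /* widest integer mode that is handled well */
  int has_v16qi;          /* nonzero if a 16-byte vector mode is usable */
};

static enum toy_mode
toy_bitwise_mode_for_size (const struct toy_target *t, size_t bytes)
{
  /* First choice: an integer mode of exactly this size.  */
  if (bytes <= t->max_int_bytes)
    switch (bytes)
      {
      case 1: return TOY_QI;
      case 2: return TOY_HI;
      case 4: return TOY_SI;
      case 8: return TOY_DI;
      default: break;
      }

  /* Fallback: a vector mode, if the target provides one of this size.  */
  if (bytes == 16 && t->has_v16qi)
    return TOY_V16QI;

  return TOY_NONE;
}
```

With a target described as { 8, 1 }, a request for 8 bytes yields the
integer mode and a request for 16 bytes yields the vector mode; a target
described as { 4, 0 } gets no mode for either size.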

Thanks,
Richard

> CC some experts.
>
> BR,
> Kewen

* Re: [PATCH, rs6000] Enable vector compare for 16-byte memory equality compare [PR111449]
From: HAO CHEN GUI @ 2023-09-28  8:10 UTC
  To: Kewen.Lin, richard.sandiford
  Cc: Segher Boessenkool, David, Peter Bergner, gcc-patches, Jeff Law,
	Jakub Jelinek, Richard Biener

Kewen and Richard,
  Thanks for your comments.  Please let me clarify.

On 2023/9/27 19:10, Richard Sandiford wrote:
> Yeah, I agree there doesn't seem to be a good reason to exclude vectors.
> Sorry to dive straight into details, but maybe we should have something
> called bitwise_mode_for_size that tries to use integer modes where possible,
> but falls back to vector modes otherwise.  That mode could then be used
> for copying, storing, bitwise ops, and equality comparisons (if there
> is appropriate optabs support).

  Vector modes are not supported by compare_by_pieces and move_by_pieces,
but they are supported by set_by_pieces and clear_by_pieces.  The helper
function widest_fixed_size_mode_for_size returns a vector mode when
qi_vector is set to true.

static fixed_size_mode
widest_fixed_size_mode_for_size (unsigned int size, bool qi_vector)

I tried to enable qi_vector for compare_by_pieces.  It can pick a vector
mode (e.g. V16QImode) and works in some cases, but it fails on a
constant-string case.

int compare (const char* s1)
{
  return __builtin_memcmp_eq (s1, "__GCC_HAVE_DWARF2_CFI_ASM", 16);
}

As the second operand is a constant string, it calls builtin_memcpy_read_str
to build the string.  Unfortunately, the function it calls internally doesn't
support vector modes.

  /* The by-pieces infrastructure does not try to pick a vector mode
     for memcpy expansion.  */
  return c_readstr (rep + offset, as_a <scalar_int_mode> (mode),
                    /*nul_terminated=*/false);

It seems the by-pieces infrastructure itself supports vector modes, but the
low-level functions do not.
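
What the vector path would need for the constant operand can be modelled in
portable C: the 16-byte piece of the string has to be materialised as 16
separate bytes rather than as one scalar integer.  The sketch below is only
an illustration; toy_rep, toy_read_piece_v16qi and toy_compare are
hypothetical names and do not correspond to GCC internals.

```c
#include <string.h>

/* The constant string from the failing case; 25 characters plus NUL.  */
static const char toy_rep[] = "__GCC_HAVE_DWARF2_CFI_ASM";

/* Hypothetical helper: read one 16-byte piece of the constant string,
   starting at OFFSET, as a vector of 16 bytes.  */
static void
toy_read_piece_v16qi (unsigned char out[16], const char *rep, size_t offset)
{
  memcpy (out, rep + offset, 16);
}

/* Portable equivalent of the failing test case: only the first 16 bytes
   of the constant take part in the comparison.  */
static int
toy_compare (const char *s1)
{
  unsigned char piece[16];

  toy_read_piece_v16qi (piece, toy_rep, 0);
  return memcmp (s1, piece, 16) == 0;
}
```

Any buffer whose first 16 bytes are "__GCC_HAVE_DWARF" compares equal,
whatever follows them.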

I think there are two ways to enable vector modes for compare_by_pieces.
One is to modify the by-pieces infrastructure.  The other is to enable it
through the cmpmem expander, which is target-specific and flexible.

What's your opinion?

Thanks
Gui Haochen


* Re: [PATCH, rs6000] Enable vector compare for 16-byte memory equality compare [PR111449]
From: Richard Sandiford @ 2023-09-28 13:39 UTC
  To: HAO CHEN GUI
  Cc: Kewen.Lin, Segher Boessenkool, David, Peter Bergner, gcc-patches,
	Jeff Law, Jakub Jelinek, Richard Biener

HAO CHEN GUI <guihaoc@linux.ibm.com> writes:
> Kewen and Richard,
>   Thanks for your comments.  Please let me clarify.
>
> On 2023/9/27 19:10, Richard Sandiford wrote:
>> Yeah, I agree there doesn't seem to be a good reason to exclude vectors.
>> Sorry to dive straight into details, but maybe we should have something
>> called bitwise_mode_for_size that tries to use integer modes where possible,
>> but falls back to vector modes otherwise.  That mode could then be used
>> for copying, storing, bitwise ops, and equality comparisons (if there
>> is appropriate optabs support).
>
>   Vector modes are not supported by compare_by_pieces and move_by_pieces,
> but they are supported by set_by_pieces and clear_by_pieces.  The helper
> function widest_fixed_size_mode_for_size returns a vector mode when
> qi_vector is set to true.
>
> static fixed_size_mode
> widest_fixed_size_mode_for_size (unsigned int size, bool qi_vector)

Ah, had forgotten about that function.

>
> I tried to enable qi_vector for compare_by_pieces.  It can pick a vector
> mode (e.g. V16QImode) and works in some cases, but it fails on a
> constant-string case.
>
> int compare (const char* s1)
> {
>   return __builtin_memcmp_eq (s1, "__GCC_HAVE_DWARF2_CFI_ASM", 16);
> }
>
> As the second operand is a constant string, it calls builtin_memcpy_read_str
> to build the string.  Unfortunately, the function it calls internally doesn't
> support vector modes.
>
>   /* The by-pieces infrastructure does not try to pick a vector mode
>      for memcpy expansion.  */
>   return c_readstr (rep + offset, as_a <scalar_int_mode> (mode),
>                     /*nul_terminated=*/false);
>
> It seems the by-pieces infrastructure itself supports vector modes, but the
> low-level functions do not.

That looks easily solvable though.  I've posted a potential fix as:

   https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631595.html

Is that the only blocker to doing this in generic code?

Thanks,
Richard

>
> I think there are two ways to enable vector modes for compare_by_pieces.
> One is to modify the by-pieces infrastructure.  The other is to enable it
> through the cmpmem expander, which is target-specific and flexible.
>
> What's your opinion?
>
> Thanks
> Gui Haochen


* Re: [PATCH, rs6000] Enable vector compare for 16-byte memory equality compare [PR111449]
From: HAO CHEN GUI @ 2023-09-29  7:51 UTC
  To: richard.sandiford; +Cc: gcc-patches

Richard,

On 2023/9/28 21:39, Richard Sandiford wrote:
> That looks easily solvable though.  I've posted a potential fix as:
> 
>    https://gcc.gnu.org/pipermail/gcc-patches/2023-September/631595.html
> 
> Is that the only blocker to doing this in generic code?

Thanks so much for your patch.  It works, and I haven't found any other
blockers.  I will run regression tests after I am back from holiday.

Thanks
Gui Haochen


Thread overview: 6+ messages
2023-09-20  8:49 [PATCH, rs6000] Enable vector compare for 16-byte memory equality compare [PR111449] HAO CHEN GUI
2023-09-25  6:09 ` Kewen.Lin
2023-09-27 11:10   ` Richard Sandiford
2023-09-28  8:10     ` HAO CHEN GUI
2023-09-28 13:39       ` Richard Sandiford
2023-09-29  7:51         ` HAO CHEN GUI
