public inbox for gcc-patches@gcc.gnu.org
* [PATCH] rs6000: Enable overlapped by-pieces operations
@ 2024-05-08  6:47 HAO CHEN GUI
  2024-05-09  5:44 ` Kewen.Lin
  0 siblings, 1 reply; 5+ messages in thread
From: HAO CHEN GUI @ 2024-05-08  6:47 UTC (permalink / raw)
  To: gcc-patches; +Cc: Segher Boessenkool, David, Kewen.Lin, Peter Bergner

Hi,
  This patch enables overlapped by-pieces operations. On rs6000, the
default move/set/clear ratio is 2, so overlap is only enabled for compare
by-pieces.
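
For illustration only (not part of the patch): a minimal C sketch of what
an overlapped by-pieces equality compare of 7 bytes amounts to. Instead of
a word, a halfword and a byte compare, two word compares are used, the
second starting at offset 3 so that byte 3 is covered twice. The helper
names load_u32 and equal7_overlap are hypothetical, and the code GCC
actually emits depends on the sub-target and options.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 4 bytes from a possibly unaligned address.
   memcpy keeps the access well-defined in C; compilers typically turn it
   into a single word load where unaligned access is cheap.  */
static uint32_t load_u32 (const char *p)
{
  uint32_t v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* Equality-only compare of 7 bytes with two overlapping word loads:
   the first covers bytes 0..3, the second covers bytes 3..6.
   Byte 3 is compared twice, which is harmless for an equality test.  */
static int equal7_overlap (const char *s1, const char *s2)
{
  return load_u32 (s1) == load_u32 (s2)
	 && load_u32 (s1 + 3) == load_u32 (s2 + 3);
}
```

The overlap trick only applies to equality tests like the one in the new
test case; an ordered memcmp result would also need the position of the
first differing byte.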

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
rs6000: Enable overlapped by-pieces operations

This patch enables overlapped by-pieces operations by defining
TARGET_OVERLAP_OP_BY_PIECES_P to true.  On rs6000, the default
move/set/clear ratio is 2, so overlap is only enabled for compare by-pieces.

gcc/
	* config/rs6000/rs6000.cc (TARGET_OVERLAP_OP_BY_PIECES_P): Define.

gcc/testsuite/
	* gcc.target/powerpc/block-cmp-9.c: New.


patch.diff
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 6b9a40fcc66..2b5f5cf1d86 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1774,6 +1774,9 @@ static const scoped_attribute_specs *const rs6000_attribute_table[] =
 #undef TARGET_CONST_ANCHOR
 #define TARGET_CONST_ANCHOR 0x8000

+#undef TARGET_OVERLAP_OP_BY_PIECES_P
+#define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true
+
 \f

 /* Processor table.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
new file mode 100644
index 00000000000..b5f51affbb7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
+/* { dg-final { scan-assembler-not {\ml[hb]z\M} } } */
+
+/* Test if by-piece overlap compare is enabled and following case is
+   implemented by two overlap word loads and compares.  */
+
+int foo (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 7) == 0;
+}


* Re: [PATCH] rs6000: Enable overlapped by-pieces operations
  2024-05-08  6:47 [PATCH] rs6000: Enable overlapped by-pieces operations HAO CHEN GUI
@ 2024-05-09  5:44 ` Kewen.Lin
  2024-05-09  7:35   ` HAO CHEN GUI
  2024-05-09  7:59   ` HAO CHEN GUI
  0 siblings, 2 replies; 5+ messages in thread
From: Kewen.Lin @ 2024-05-09  5:44 UTC (permalink / raw)
  To: HAO CHEN GUI; +Cc: Segher Boessenkool, David, Peter Bergner, gcc-patches

Hi,

on 2024/5/8 14:47, HAO CHEN GUI wrote:
> Hi,
>   This patch enables overlapped by-piece operations. On rs6000, default
> move/set/clear ratio is 2. So the overlap is only enabled with compare
> by-pieces.

Thanks for enabling this.  Did you evaluate whether it helps any benchmark?

> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is it OK for the trunk?
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Enable overlapped by-pieces operations
> 
> This patch enables overlapped by-piece operations by defining
> TARGET_OVERLAP_OP_BY_PIECES_P to true.  On rs6000, default move/set/clear
> ratio is 2.  So the overlap is only enabled with compare by-pieces.
> 
> gcc/
> 	* config/rs6000/rs6000.cc (TARGET_OVERLAP_OP_BY_PIECES_P): Define.
> 
> gcc/testsuite/
> 	* gcc.target/powerpc/block-cmp-9.c: New.
> 
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 6b9a40fcc66..2b5f5cf1d86 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -1774,6 +1774,9 @@ static const scoped_attribute_specs *const rs6000_attribute_table[] =
>  #undef TARGET_CONST_ANCHOR
>  #define TARGET_CONST_ANCHOR 0x8000
> 
> +#undef TARGET_OVERLAP_OP_BY_PIECES_P
> +#define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true
> +
>  \f
> 
>  /* Processor table.  */
> diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
> new file mode 100644
> index 00000000000..b5f51affbb7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */

Why does it need power8 forced here?

BR,
Kewen

> +/* { dg-final { scan-assembler-not {\ml[hb]z\M} } } */
> +
> +/* Test if by-piece overlap compare is enabled and following case is
> +   implemented by two overlap word loads and compares.  */
> +
> +int foo (const char* s1, const char* s2)
> +{
> +  return __builtin_memcmp (s1, s2, 7) == 0;
> +}



* Re: [PATCH] rs6000: Enable overlapped by-pieces operations
  2024-05-09  5:44 ` Kewen.Lin
@ 2024-05-09  7:35   ` HAO CHEN GUI
  2024-05-13  1:40     ` Kewen.Lin
  2024-05-09  7:59   ` HAO CHEN GUI
  1 sibling, 1 reply; 5+ messages in thread
From: HAO CHEN GUI @ 2024-05-09  7:35 UTC (permalink / raw)
  To: Kewen.Lin; +Cc: Segher Boessenkool, David, Peter Bergner, gcc-patches

Hi Kewen,
  Thanks for your comments.

On 2024/5/9 13:44, Kewen.Lin wrote:
> Hi,
> 
> on 2024/5/8 14:47, HAO CHEN GUI wrote:
>> Hi,
>>   This patch enables overlapped by-piece operations. On rs6000, default
>> move/set/clear ratio is 2. So the overlap is only enabled with compare
>> by-pieces.
> 
> Thanks for enabling this, did you evaluate if it can help some benchmark?

I tested it with SPEC2017 and saw no obvious performance impact. I think
memory compare might not be hot enough there.

I also tested it with my micro benchmark, which shows a 5-10% performance
gain when the compare length is 7.

> 
>>
>>   Bootstrapped and tested on powerpc64-linux BE and LE with no
>> regressions. Is it OK for the trunk?
>>
>> Thanks
>> Gui Haochen
>>
>> ChangeLog
>> rs6000: Enable overlapped by-pieces operations
>>
>> This patch enables overlapped by-piece operations by defining
>> TARGET_OVERLAP_OP_BY_PIECES_P to true.  On rs6000, default move/set/clear
>> ratio is 2.  So the overlap is only enabled with compare by-pieces.
>>
>> gcc/
>> 	* config/rs6000/rs6000.cc (TARGET_OVERLAP_OP_BY_PIECES_P): Define.
>>
>> gcc/testsuite/
>> 	* gcc.target/powerpc/block-cmp-9.c: New.
>>
>>
>> patch.diff
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index 6b9a40fcc66..2b5f5cf1d86 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -1774,6 +1774,9 @@ static const scoped_attribute_specs *const rs6000_attribute_table[] =
>>  #undef TARGET_CONST_ANCHOR
>>  #define TARGET_CONST_ANCHOR 0x8000
>>
>> +#undef TARGET_OVERLAP_OP_BY_PIECES_P
>> +#define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true
>> +
>>  \f
>>
>>  /* Processor table.  */
>> diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
>> new file mode 100644
>> index 00000000000..b5f51affbb7
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
>> @@ -0,0 +1,11 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> 
> Why does it need power8 forced here?

I just wanted to exclude P7 LE, as targetm.slow_unaligned_access returns
false for it and the cmpmemsi expander won't be invoked.

> 
> BR,
> Kewen
> 
>> +/* { dg-final { scan-assembler-not {\ml[hb]z\M} } } */
>> +
>> +/* Test if by-piece overlap compare is enabled and following case is
>> +   implemented by two overlap word loads and compares.  */
>> +
>> +int foo (const char* s1, const char* s2)
>> +{
>> +  return __builtin_memcmp (s1, s2, 7) == 0;
>> +}
> 

Thanks
Gui Haochen


* Re: [PATCH] rs6000: Enable overlapped by-pieces operations
  2024-05-09  5:44 ` Kewen.Lin
  2024-05-09  7:35   ` HAO CHEN GUI
@ 2024-05-09  7:59   ` HAO CHEN GUI
  1 sibling, 0 replies; 5+ messages in thread
From: HAO CHEN GUI @ 2024-05-09  7:59 UTC (permalink / raw)
  To: Kewen.Lin; +Cc: Segher Boessenkool, David, Peter Bergner, gcc-patches

Hi Kewen,

On 2024/5/9 13:44, Kewen.Lin wrote:
> Why does it need power8 forced here?

I thought it over and it isn't needed. For sub-targets where the library
is called, l[hb]z won't be generated either.

Thanks
Gui Haochen


* Re: [PATCH] rs6000: Enable overlapped by-pieces operations
  2024-05-09  7:35   ` HAO CHEN GUI
@ 2024-05-13  1:40     ` Kewen.Lin
  0 siblings, 0 replies; 5+ messages in thread
From: Kewen.Lin @ 2024-05-13  1:40 UTC (permalink / raw)
  To: HAO CHEN GUI; +Cc: Segher Boessenkool, David, Peter Bergner, gcc-patches

Hi,

on 2024/5/9 15:35, HAO CHEN GUI wrote:
> Hi Kewen,
>   Thanks for your comments.
> 
> On 2024/5/9 13:44, Kewen.Lin wrote:
>> Hi,
>>
>> on 2024/5/8 14:47, HAO CHEN GUI wrote:
>>> Hi,
>>>   This patch enables overlapped by-piece operations. On rs6000, default
>>> move/set/clear ratio is 2. So the overlap is only enabled with compare
>>> by-pieces.
>>
>> Thanks for enabling this, did you evaluate if it can help some benchmark?
> 
> Tested it with SPEC2017. No obvious performance impact. I think memory
> compare might not be hot enough.
> 
> Tested it with my micro benchmark. 5-10% performance gain when compare
> length is 7.

Nice!

> 
>>
>>>
>>>   Bootstrapped and tested on powerpc64-linux BE and LE with no
>>> regressions. Is it OK for the trunk?
>>>
>>> Thanks
>>> Gui Haochen
>>>
>>> ChangeLog
>>> rs6000: Enable overlapped by-pieces operations
>>>
>>> This patch enables overlapped by-piece operations by defining
>>> TARGET_OVERLAP_OP_BY_PIECES_P to true.  On rs6000, default move/set/clear
>>> ratio is 2.  So the overlap is only enabled with compare by-pieces.
>>>
>>> gcc/
>>> 	* config/rs6000/rs6000.cc (TARGET_OVERLAP_OP_BY_PIECES_P): Define.
>>>
>>> gcc/testsuite/
>>> 	* gcc.target/powerpc/block-cmp-9.c: New.
>>>
>>>
>>> patch.diff
>>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>>> index 6b9a40fcc66..2b5f5cf1d86 100644
>>> --- a/gcc/config/rs6000/rs6000.cc
>>> +++ b/gcc/config/rs6000/rs6000.cc
>>> @@ -1774,6 +1774,9 @@ static const scoped_attribute_specs *const rs6000_attribute_table[] =
>>>  #undef TARGET_CONST_ANCHOR
>>>  #define TARGET_CONST_ANCHOR 0x8000
>>>
>>> +#undef TARGET_OVERLAP_OP_BY_PIECES_P
>>> +#define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true
>>> +
>>>  \f
>>>
>>>  /* Processor table.  */
>>> diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
>>> new file mode 100644
>>> index 00000000000..b5f51affbb7
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
>>> @@ -0,0 +1,11 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
>>
>> Why does it need power8 forced here?
> 
> I just want to exclude P7 LE as targetm.slow_unaligned_access return false
> for it and the expand cmpmemsi won't be invoked.

> I think it over. It's no need. For the sub-targets which library is
> called, l[hb]z won't be generated too.

Thanks for checking.  OK with this forced power8 dropped.

BR,
Kewen

> 
>>
>> BR,
>> Kewen
>>
>>> +/* { dg-final { scan-assembler-not {\ml[hb]z\M} } } */
>>> +
>>> +/* Test if by-piece overlap compare is enabled and following case is
>>> +   implemented by two overlap word loads and compares.  */
>>> +
>>> +int foo (const char* s1, const char* s2)
>>> +{
>>> +  return __builtin_memcmp (s1, s2, 7) == 0;
>>> +}
>>
> 
> Thanks
> Gui Haochen

