public inbox for gcc-patches@gcc.gnu.org
* [PATCH v2] [AARCH64] Fix PR target/103100 -mstrict-align and memset on not aligned buffers
@ 2021-11-19  1:53 apinski
  2021-12-02 22:14 ` Andrew Pinski
  0 siblings, 1 reply; 3+ messages in thread
From: apinski @ 2021-11-19  1:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: Andrew Pinski

From: Andrew Pinski <apinski@marvell.com>

The problem here is that aarch64_expand_setmem does not take the alignment
into account for the strict-alignment case.  This is a simplified patch from
what I had previously: constraining copy_limit to the alignment of the mem
in the strict-alignment case fixes the issue without touching too many other
parts of the code.
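
(For illustration only, not part of the patch: the clamping can be modelled
as a small standalone function.  TI_BITS standing in for
GET_MODE_BITSIZE (TImode) and the plain int flag arguments are hypothetical
simplifications of the GCC internals, not code from aarch64.c.)

```c
#include <assert.h>

/* Hypothetical stand-in for GET_MODE_BITSIZE (TImode).  */
#define TI_BITS 128

/* Model of the patch: pick the usual copy limit in bits, then, under
   strict alignment, clamp it to the known alignment of the MEM so that
   no single store is wider than the buffer's alignment.  */
static int
copy_limit_bits (int no_ldp_stp_qregs, int strict_align, int mem_align_bits)
{
  int copy_limit = no_ldp_stp_qregs ? TI_BITS : 256;
  if (strict_align && mem_align_bits < copy_limit)
    copy_limit = mem_align_bits;
  return copy_limit;
}
```

E.g. a buffer with only byte alignment (8 bits) under -mstrict-align drops
the limit to 8 bits, i.e. byte stores only.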

OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

gcc/ChangeLog:

	* config/aarch64/aarch64.c (aarch64_expand_setmem): Constrain
	copy_limit to the alignment of the mem if STRICT_ALIGNMENT is
	true.
---
 gcc/config/aarch64/aarch64.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 7389b5953dc..e9c2e89d8ce 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -23744,9 +23744,16 @@ aarch64_expand_setmem (rtx *operands)
   /* Maximum amount to copy in one go.  We allow 256-bit chunks based on the
      AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter.  setmem expand
      pattern is only turned on for TARGET_SIMD.  */
-  const int copy_limit = (aarch64_tune_params.extra_tuning_flags
-			  & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)
-			  ? GET_MODE_BITSIZE (TImode) : 256;
+  int copy_limit;
+
+  if (aarch64_tune_params.extra_tuning_flags
+      & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)
+    copy_limit = GET_MODE_BITSIZE (TImode);
+  else
+    copy_limit = 256;
+
+  if (STRICT_ALIGNMENT)
+    copy_limit = MIN (copy_limit, (int)MEM_ALIGN (dst));
 
   while (n > 0)
     {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] [AARCH64] Fix PR target/103100 -mstrict-align and memset on not aligned buffers
  2021-11-19  1:53 [PATCH v2] [AARCH64] Fix PR target/103100 -mstrict-align and memset on not aligned buffers apinski
@ 2021-12-02 22:14 ` Andrew Pinski
  2021-12-03 12:40   ` Richard Sandiford
  0 siblings, 1 reply; 3+ messages in thread
From: Andrew Pinski @ 2021-12-02 22:14 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: GCC Patches

On Thu, Nov 18, 2021 at 5:55 PM apinski--- via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> From: Andrew Pinski <apinski@marvell.com>
>
> The problem here is that aarch64_expand_setmem does not take the alignment
> into account for the strict-alignment case.  This is a simplified patch from
> what I had previously: constraining copy_limit to the alignment of the mem
> in the strict-alignment case fixes the issue without touching too many other
> parts of the code.
>
> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

Ping?

>
> gcc/ChangeLog:
>
>         * config/aarch64/aarch64.c (aarch64_expand_setmem): Constrain
>         copy_limit to the alignment of the mem if STRICT_ALIGNMENT is
>         true.
> ---
>  gcc/config/aarch64/aarch64.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 7389b5953dc..e9c2e89d8ce 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -23744,9 +23744,16 @@ aarch64_expand_setmem (rtx *operands)
>    /* Maximum amount to copy in one go.  We allow 256-bit chunks based on the
>       AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter.  setmem expand
>       pattern is only turned on for TARGET_SIMD.  */
> -  const int copy_limit = (aarch64_tune_params.extra_tuning_flags
> -                         & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)
> -                         ? GET_MODE_BITSIZE (TImode) : 256;
> +  int copy_limit;
> +
> +  if (aarch64_tune_params.extra_tuning_flags
> +      & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)
> +    copy_limit = GET_MODE_BITSIZE (TImode);
> +  else
> +    copy_limit = 256;
> +
> +  if (STRICT_ALIGNMENT)
> +    copy_limit = MIN (copy_limit, (int)MEM_ALIGN (dst));
>
>    while (n > 0)
>      {
> --
> 2.17.1
>


* Re: [PATCH v2] [AARCH64] Fix PR target/103100 -mstrict-align and memset on not aligned buffers
  2021-12-02 22:14 ` Andrew Pinski
@ 2021-12-03 12:40   ` Richard Sandiford
  0 siblings, 0 replies; 3+ messages in thread
From: Richard Sandiford @ 2021-12-03 12:40 UTC (permalink / raw)
  To: Andrew Pinski via Gcc-patches; +Cc: Andrew Pinski, Andrew Pinski

Andrew Pinski via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Thu, Nov 18, 2021 at 5:55 PM apinski--- via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> From: Andrew Pinski <apinski@marvell.com>
>>
>> The problem here is that aarch64_expand_setmem does not take the alignment
>> into account for the strict-alignment case.  This is a simplified patch from
>> what I had previously: constraining copy_limit to the alignment of the mem
>> in the strict-alignment case fixes the issue without touching too many other
>> parts of the code.
>>
>> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>
> Ping?
>
>>
>> gcc/ChangeLog:
>>
>>         * config/aarch64/aarch64.c (aarch64_expand_setmem): Constrain
>>         copy_limit to the alignment of the mem if STRICT_ALIGNMENT is
>>         true.

This looks correct, but for a modified version of the testcase in the PR:

void f(char *x) { __builtin_memset (x, 0, 216); }

we'll now emit 216 STRB instructions, which seems a bit excessive.

I think in practice the code has only been tuned on targets that
support LDP/STP Q, so how about moving the copy_limit calculation
further up and doing:

  unsigned max_set_size = (copy_limit * 8) / BITS_PER_UNIT;

?
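
(A rough standalone model of that suggestion, read off the snippet above;
a sketch of the idea, not the committed code.  With copy_limit clamped to
the alignment, the inline-expansion cap shrinks with it, so an unaligned
216-byte memset under -mstrict-align would become a libcall rather than
216 STRBs.)

```c
#include <assert.h>

#define BITS_PER_UNIT 8

/* Largest memset expanded inline: eight chunks of the (possibly
   alignment-clamped) copy limit, converted from bits to bytes.  */
static unsigned
max_set_size (int copy_limit)
{
  return (copy_limit * 8) / BITS_PER_UNIT;
}
```

With the default 256-bit limit this keeps the existing 256-byte cap; with
an 8-bit limit (a byte-aligned buffer under strict alignment) only memsets
of up to 8 bytes are expanded inline.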

It would be good to have a scan-assembler-not testcase for the testsuite.

Thanks,
Richard


>> ---
>>  gcc/config/aarch64/aarch64.c | 13 ++++++++++---
>>  1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 7389b5953dc..e9c2e89d8ce 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -23744,9 +23744,16 @@ aarch64_expand_setmem (rtx *operands)
>>    /* Maximum amount to copy in one go.  We allow 256-bit chunks based on the
>>       AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter.  setmem expand
>>       pattern is only turned on for TARGET_SIMD.  */
>> -  const int copy_limit = (aarch64_tune_params.extra_tuning_flags
>> -                         & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)
>> -                         ? GET_MODE_BITSIZE (TImode) : 256;
>> +  int copy_limit;
>> +
>> +  if (aarch64_tune_params.extra_tuning_flags
>> +      & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)
>> +    copy_limit = GET_MODE_BITSIZE (TImode);
>> +  else
>> +    copy_limit = 256;
>> +
>> +  if (STRICT_ALIGNMENT)
>> +    copy_limit = MIN (copy_limit, (int)MEM_ALIGN (dst));
>>
>>    while (n > 0)
>>      {
>> --
>> 2.17.1
>>

