public inbox for gcc-patches@gcc.gnu.org
* [PATCH][rs6000] better use of unaligned vsx in memset() expansion
@ 2018-11-26 21:08 Aaron Sawdey
  2018-11-26 22:29 ` Segher Boessenkool
  0 siblings, 1 reply; 4+ messages in thread
From: Aaron Sawdey @ 2018-11-26 21:08 UTC
  To: gcc-patches; +Cc: Segher Boessenkool, Bill Schmidt, David Edelsohn

When I previously added the use of unaligned vsx stores to inline expansion
of memset, I didn't do a good job of managing boundary conditions. The intention
was to only use unaligned vsx if the block being cleared was at least 32 bytes.
What it actually did was to prevent the use of unaligned vsx for the last 32
bytes of any block being cleared. So this change puts the test up front so it
is not affected by the decrement of bytes.
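
The boundary problem described above can be illustrated with a small
stand-alone model. This is only a sketch, not the code in rs6000-string.c:
count_vector_stores and its hoist_test parameter are hypothetical names, the
block is assumed not to be 16-byte aligned, and the scalar fallback is reduced
to a plain 8/4/2/1 byte chain. The hoisted variant keeps a bytes >= 16 guard
so that the model never stores past the end of the block.

```c
#include <stdbool.h>

/* Simplified model (an assumption, not the GCC source) of how the
   clearing loop picks a store size.  Returns how many 16-byte vector
   stores are used to clear BYTES bytes of an unaligned block.  */
int
count_vector_stores (int bytes, bool efficient_unaligned_vsx,
		     bool hoist_test)
{
  /* Hoisted form: decide once, before the loop decrements bytes.  */
  bool unaligned_vsx_ok = bytes >= 32 && efficient_unaligned_vsx;
  int vector_stores = 0;

  while (bytes > 0)
    {
      int clear_bytes;
      /* Per-iteration form: tested against the remaining byte count,
	 so it turns false once fewer than 32 bytes are left.  */
      bool per_iter_ok = bytes >= 32 && efficient_unaligned_vsx;
      bool ok = hoist_test ? unaligned_vsx_ok : per_iter_ok;

      if (bytes >= 16 && ok)
	{
	  clear_bytes = 16;
	  vector_stores++;
	}
      else if (bytes >= 8)
	clear_bytes = 8;
      else if (bytes >= 4)
	clear_bytes = 4;
      else if (bytes >= 2)
	clear_bytes = 2;
      else
	clear_bytes = 1;

      bytes -= clear_bytes;
    }

  return vector_stores;
}
```

In this model a 48-byte block gets two vector stores with the per-iteration
test (the final 16 bytes fall back to scalar stores) and three with the
hoisted test.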

OK for trunk if regstrap passes?

Thanks!
   Aaron



2018-11-26  Aaron Sawdey  <acsawdey@linux.ibm.com>

	* config/rs6000/rs6000-string.c (expand_block_clear): Change how
	we determine if unaligned vsx is ok.


Index: gcc/config/rs6000/rs6000-string.c
===================================================================
--- gcc/config/rs6000/rs6000-string.c	(revision 266219)
+++ gcc/config/rs6000/rs6000-string.c	(working copy)
@@ -85,14 +85,14 @@
   if (! optimize_size && bytes > 8 * clear_step)
     return 0;

+  bool unaligned_vsx_ok = (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX);
+
   for (offset = 0; bytes > 0; offset += clear_bytes, bytes -= clear_bytes)
     {
       machine_mode mode = BLKmode;
       rtx dest;

-      if (TARGET_ALTIVEC
-	  && ((bytes >= 16 && align >= 128)
-	      || (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX)))
+      if (TARGET_ALTIVEC && ((bytes >= 16 && align >= 128) || unaligned_vsx_ok))
 	{
 	  clear_bytes = 16;
 	  mode = V4SImode;

-- 
Aaron Sawdey, Ph.D.  acsawdey@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain


* Re: [PATCH][rs6000] better use of unaligned vsx in memset() expansion
  2018-11-26 21:08 [PATCH][rs6000] better use of unaligned vsx in memset() expansion Aaron Sawdey
@ 2018-11-26 22:29 ` Segher Boessenkool
  2018-11-28 19:25   ` Aaron Sawdey
  0 siblings, 1 reply; 4+ messages in thread
From: Segher Boessenkool @ 2018-11-26 22:29 UTC
  To: Aaron Sawdey; +Cc: gcc-patches, Bill Schmidt, David Edelsohn

On Mon, Nov 26, 2018 at 03:08:32PM -0600, Aaron Sawdey wrote:
> When I previously added the use of unaligned vsx stores to inline expansion
> of memset, I didn't do a good job of managing boundary conditions. The intention
> was to only use unaligned vsx if the block being cleared was at least 32 bytes.
> What it actually did was to prevent the use of unaligned vsx for the last 32
> bytes of any block being cleared. So this change puts the test up front so it
> is not affected by the decrement of bytes.

Oh wow.  Yes, that isn't so great.  Okay for trunk (and whatever backports).
Thanks,


Segher


> 2018-11-26  Aaron Sawdey  <acsawdey@linux.ibm.com>
> 
> 	* config/rs6000/rs6000-string.c (expand_block_clear): Change how
> 	we determine if unaligned vsx is ok.


* Re: [PATCH][rs6000] better use of unaligned vsx in memset() expansion
  2018-11-26 22:29 ` Segher Boessenkool
@ 2018-11-28 19:25   ` Aaron Sawdey
  2018-11-28 20:01     ` Segher Boessenkool
  0 siblings, 1 reply; 4+ messages in thread
From: Aaron Sawdey @ 2018-11-28 19:25 UTC
  To: Segher Boessenkool; +Cc: gcc-patches, Bill Schmidt, David Edelsohn

The first version of this had a big bug and cleared past the requested bytes.
This version passes regstrap on ppc64le(power7/8/9), ppc64be(power6/7/8),
and ppc32(power8).
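
A small model may help show why a hoisted flag without a per-store size check
can clear past the requested bytes. This is a sketch under stated assumptions:
bytes_written and guard_each_store are hypothetical names,
TARGET_EFFICIENT_UNALIGNED_VSX is assumed to be true, the block is assumed to
be unaligned, and the scalar fallback is reduced to an 8/4/2/1 byte chain.

```c
#include <stdbool.h>

/* Simplified model (an assumption, not the GCC source).  Returns the
   total number of bytes stored when asked to clear BYTES bytes.  */
int
bytes_written (int bytes, bool guard_each_store)
{
  /* Decided once, up front, from the full block size.  */
  bool unaligned_vsx_ok = bytes >= 32;
  int written = 0;

  while (bytes > 0)
    {
      int clear_bytes;
      /* Without the bytes >= 16 guard the flag stays true for the
	 tail, so a 16-byte store is issued even when fewer than
	 16 bytes remain.  */
      bool use_vector = guard_each_store
			? (bytes >= 16 && unaligned_vsx_ok)
			: unaligned_vsx_ok;

      if (use_vector)
	clear_bytes = 16;
      else if (bytes >= 8)
	clear_bytes = 8;
      else if (bytes >= 4)
	clear_bytes = 4;
      else if (bytes >= 2)
	clear_bytes = 2;
      else
	clear_bytes = 1;

      written += clear_bytes;
      bytes -= clear_bytes;
    }

  return written;
}
```

For a 40-byte request this model stores 48 bytes without the guard and
exactly 40 bytes with it.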

OK for trunk (and 8 backport after a week)?

Thanks!
   Aaron

Index: gcc/config/rs6000/rs6000-string.c
===================================================================
--- gcc/config/rs6000/rs6000-string.c	(revision 266524)
+++ gcc/config/rs6000/rs6000-string.c	(working copy)
@@ -85,6 +85,8 @@
   if (! optimize_size && bytes > 8 * clear_step)
     return 0;

+  bool unaligned_vsx_ok = (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX);
+
   for (offset = 0; bytes > 0; offset += clear_bytes, bytes -= clear_bytes)
     {
       machine_mode mode = BLKmode;
@@ -91,8 +93,7 @@
       rtx dest;

       if (TARGET_ALTIVEC
-	  && ((bytes >= 16 && align >= 128)
-	      || (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX)))
+	  && (bytes >= 16 && ( align >= 128 || unaligned_vsx_ok)))
 	{
 	  clear_bytes = 16;
 	  mode = V4SImode;


-- 
Aaron Sawdey, Ph.D.  acsawdey@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain


* Re: [PATCH][rs6000] better use of unaligned vsx in memset() expansion
  2018-11-28 19:25   ` Aaron Sawdey
@ 2018-11-28 20:01     ` Segher Boessenkool
  0 siblings, 0 replies; 4+ messages in thread
From: Segher Boessenkool @ 2018-11-28 20:01 UTC
  To: Aaron Sawdey; +Cc: gcc-patches, Bill Schmidt, David Edelsohn

On Wed, Nov 28, 2018 at 01:24:01PM -0600, Aaron Sawdey wrote:
> The first version of this had a big bug and cleared past the requested bytes.
> This version passes regstrap on ppc64le(power7/8/9), ppc64be(power6/7/8),
> and ppc32(power8).
> 
> OK for trunk (and 8 backport after a week)?

> @@ -91,8 +93,7 @@
>        rtx dest;
> 
>        if (TARGET_ALTIVEC
> -	  && ((bytes >= 16 && align >= 128)
> -	      || (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX)))
> +	  && (bytes >= 16 && ( align >= 128 || unaligned_vsx_ok)))

Please remove the stray space?  Okay for trunk and later for 8, thanks!


Segher

