* [PATCH][rs6000] avoid using unaligned vsx or lxvd2x/stxvd2x for memcpy/memmove inline expansion
@ 2018-12-19 19:53 Aaron Sawdey
From: Aaron Sawdey @ 2018-12-19 19:53 UTC (permalink / raw)
To: GCC Patches, Segher Boessenkool, David Edelsohn, Bill Schmidt
Because of POWER9 dd2.1 issues with certain unaligned vsx instructions
to cache-inhibited memory, here is a patch that keeps memmove (and memcpy)
inline expansion from using unaligned vector instructions, or any vector
load/store other than lvx/stvx. More description of the issue is here:
https://patchwork.ozlabs.org/patch/814059/
OK for trunk if bootstrap/regtest ok?
Thanks!
Aaron
2018-12-19 Aaron Sawdey <acsawdey@linux.ibm.com>
* config/rs6000/rs6000-string.c (expand_block_move): Don't use
unaligned vsx and avoid lxvd2x/stxvd2x.
(gen_lvx_v4si_move): New function.
Index: gcc/config/rs6000/rs6000-string.c
===================================================================
--- gcc/config/rs6000/rs6000-string.c (revision 267055)
+++ gcc/config/rs6000/rs6000-string.c (working copy)
@@ -2669,6 +2669,35 @@
return true;
}
+/* Generate loads and stores for a move of v4si mode using lvx/stvx.
+ This uses altivec_{l,st}vx_<mode>_internal which use unspecs to
+ keep combine from changing what instruction gets used.
+
+ DEST is the destination for the data.
+ SRC is the source of the data for the move. */
+
+static rtx
+gen_lvx_v4si_move (rtx dest, rtx src)
+{
+ rtx rv = NULL;
+ if (MEM_P (dest))
+ {
+ gcc_assert (!MEM_P (src));
+ gcc_assert (GET_MODE (src) == V4SImode);
+ rv = gen_altivec_stvx_v4si_internal (dest, src);
+ }
+ else if (MEM_P (src))
+ {
+ gcc_assert (!MEM_P (dest));
+ gcc_assert (GET_MODE (dest) == V4SImode);
+ rv = gen_altivec_lvx_v4si_internal (dest, src);
+ }
+ else
+ gcc_unreachable ();
+
+ return rv;
+}
+
/* Expand a block move operation, and return 1 if successful. Return 0
if we should let the compiler generate normal code.
@@ -2721,11 +2750,11 @@
/* Altivec first, since it will be faster than a string move
when it applies, and usually not significantly larger. */
- if (TARGET_ALTIVEC && bytes >= 16 && (TARGET_EFFICIENT_UNALIGNED_VSX || align >= 128))
+ if (TARGET_ALTIVEC && bytes >= 16 && align >= 128)
{
move_bytes = 16;
mode = V4SImode;
- gen_func.mov = gen_movv4si;
+ gen_func.mov = gen_lvx_v4si_move;
}
else if (bytes >= 8 && TARGET_POWERPC64
&& (align >= 64 || !STRICT_ALIGNMENT))
--
Aaron Sawdey, Ph.D. acsawdey@linux.vnet.ibm.com
050-2/C113 (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain
* Re: [PATCH][rs6000] avoid using unaligned vsx or lxvd2x/stxvd2x for memcpy/memmove inline expansion
@ 2018-12-20 9:51 ` Segher Boessenkool
From: Segher Boessenkool @ 2018-12-20 9:51 UTC (permalink / raw)
To: Aaron Sawdey; +Cc: GCC Patches, David Edelsohn, Bill Schmidt
On Wed, Dec 19, 2018 at 01:53:05PM -0600, Aaron Sawdey wrote:
> Because of POWER9 dd2.1 issues with certain unaligned vsx instructions
> to cache inhibited memory, here is a patch that keeps memmove (and memcpy)
> inline expansion from doing unaligned vector or using vector load/store
> other than lvx/stvx. More description of the issue is here:
>
> https://patchwork.ozlabs.org/patch/814059/
>
> OK for trunk if bootstrap/regtest ok?
Okay, but see below.
> 2018-12-19 Aaron Sawdey <acsawdey@linux.ibm.com>
>
> * config/rs6000/rs6000-string.c (expand_block_move): Don't use
> unaligned vsx and avoid lxvd2x/stxvd2x.
> (gen_lvx_v4si_move): New function.
> +static rtx
> +gen_lvx_v4si_move (rtx dest, rtx src)
> +{
> + rtx rv = NULL;
> + if (MEM_P (dest))
> + {
> + gcc_assert (!MEM_P (src));
> + gcc_assert (GET_MODE (src) == V4SImode);
> + rv = gen_altivec_stvx_v4si_internal (dest, src);
> + }
> + else if (MEM_P (src))
> + {
> + gcc_assert (!MEM_P (dest));
> + gcc_assert (GET_MODE (dest) == V4SImode);
> + rv = gen_altivec_lvx_v4si_internal (dest, src);
> + }
> + else
> + gcc_unreachable ();
> +
> + return rv;
> +}
This is extraordinarily clumsy :-) Maybe something like:
static rtx
gen_lvx_v4si_move (rtx dest, rtx src)
{
gcc_assert (!(MEM_P (dest) && MEM_P (src)));
gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
if (MEM_P (dest))
return gen_altivec_stvx_v4si_internal (dest, src);
else if (MEM_P (src))
return gen_altivec_lvx_v4si_internal (dest, src);
else
gcc_unreachable ();
}
(Or do you allow VOIDmode for src as well?) Anyway, at least get rid of
the useless extra variable.
Thanks!
Segher
* Re: [PATCH][rs6000] avoid using unaligned vsx or lxvd2x/stxvd2x for memcpy/memmove inline expansion
@ 2018-12-20 23:44 ` Aaron Sawdey
From: Aaron Sawdey @ 2018-12-20 23:44 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: GCC Patches, David Edelsohn, Bill Schmidt
On 12/20/18 3:51 AM, Segher Boessenkool wrote:
> On Wed, Dec 19, 2018 at 01:53:05PM -0600, Aaron Sawdey wrote:
>> Because of POWER9 dd2.1 issues with certain unaligned vsx instructions
>> to cache inhibited memory, here is a patch that keeps memmove (and memcpy)
>> inline expansion from doing unaligned vector or using vector load/store
>> other than lvx/stvx. More description of the issue is here:
>>
>> https://patchwork.ozlabs.org/patch/814059/
>>
>> OK for trunk if bootstrap/regtest ok?
>
> Okay, but see below.
>
[snip]
>
> This is extraordinarily clumsy :-) Maybe something like:
>
> static rtx
> gen_lvx_v4si_move (rtx dest, rtx src)
> {
> > gcc_assert (!(MEM_P (dest) && MEM_P (src)));
> gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
> if (MEM_P (dest))
> return gen_altivec_stvx_v4si_internal (dest, src);
> else if (MEM_P (src))
> return gen_altivec_lvx_v4si_internal (dest, src);
> else
> gcc_unreachable ();
> }
>
> (Or do you allow VOIDmode for src as well?) Anyway, at least get rid of
> the useless extra variable.
I think this should be better:
static rtx
gen_lvx_v4si_move (rtx dest, rtx src)
{
gcc_assert ((MEM_P (dest) && !MEM_P (src)) || (MEM_P (src) && !MEM_P(dest)));
gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
if (MEM_P (dest))
return gen_altivec_stvx_v4si_internal (dest, src);
else if (MEM_P (src))
return gen_altivec_lvx_v4si_internal (dest, src);
gcc_unreachable ();
}
I'll commit after I re-regstrap.
Thanks!
Aaron
* Re: [PATCH][rs6000] avoid using unaligned vsx or lxvd2x/stxvd2x for memcpy/memmove inline expansion
@ 2018-12-20 23:46 ` Segher Boessenkool
From: Segher Boessenkool @ 2018-12-20 23:46 UTC (permalink / raw)
To: Aaron Sawdey; +Cc: GCC Patches, David Edelsohn, Bill Schmidt
On Thu, Dec 20, 2018 at 05:34:54PM -0600, Aaron Sawdey wrote:
> On 12/20/18 3:51 AM, Segher Boessenkool wrote:
> > On Wed, Dec 19, 2018 at 01:53:05PM -0600, Aaron Sawdey wrote:
> >> Because of POWER9 dd2.1 issues with certain unaligned vsx instructions
> >> to cache inhibited memory, here is a patch that keeps memmove (and memcpy)
> >> inline expansion from doing unaligned vector or using vector load/store
> >> other than lvx/stvx. More description of the issue is here:
> >>
> >> https://patchwork.ozlabs.org/patch/814059/
> >>
> >> OK for trunk if bootstrap/regtest ok?
> >
> > Okay, but see below.
> >
> [snip]
> >
> > This is extraordinarily clumsy :-) Maybe something like:
> >
> > static rtx
> > gen_lvx_v4si_move (rtx dest, rtx src)
> > {
> > > gcc_assert (!(MEM_P (dest) && MEM_P (src)));
> > gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
> > if (MEM_P (dest))
> > return gen_altivec_stvx_v4si_internal (dest, src);
> > else if (MEM_P (src))
> > return gen_altivec_lvx_v4si_internal (dest, src);
> > else
> > gcc_unreachable ();
> > }
> >
> > (Or do you allow VOIDmode for src as well?) Anyway, at least get rid of
> > the useless extra variable.
>
> I think this should be better:
The gcc_unreachable at the end catches the non-mem to non-mem case.
> static rtx
> gen_lvx_v4si_move (rtx dest, rtx src)
> {
> gcc_assert ((MEM_P (dest) && !MEM_P (src)) || (MEM_P (src) && !MEM_P(dest)));
But if you prefer this, how about
{
gcc_assert (MEM_P (dest) ^ MEM_P (src));
gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
if (MEM_P (dest))
return gen_altivec_stvx_v4si_internal (dest, src);
else
return gen_altivec_lvx_v4si_internal (dest, src);
}
:-)
Segher
* Re: [PATCH][rs6000] avoid using unaligned vsx or lxvd2x/stxvd2x for memcpy/memmove inline expansion
@ 2018-12-21 1:00 ` Aaron Sawdey
From: Aaron Sawdey @ 2018-12-21 1:00 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: GCC Patches, David Edelsohn, Bill Schmidt
On 12/20/18 5:44 PM, Segher Boessenkool wrote:
> On Thu, Dec 20, 2018 at 05:34:54PM -0600, Aaron Sawdey wrote:
>> On 12/20/18 3:51 AM, Segher Boessenkool wrote:
>>> On Wed, Dec 19, 2018 at 01:53:05PM -0600, Aaron Sawdey wrote:
>>>> Because of POWER9 dd2.1 issues with certain unaligned vsx instructions
>>>> to cache inhibited memory, here is a patch that keeps memmove (and memcpy)
>>>> inline expansion from doing unaligned vector or using vector load/store
>>>> other than lvx/stvx. More description of the issue is here:
>>>>
>>>> https://patchwork.ozlabs.org/patch/814059/
>>>>
>>>> OK for trunk if bootstrap/regtest ok?
>>>
>>> Okay, but see below.
>>>
>> [snip]
>>>
>>> This is extraordinarily clumsy :-) Maybe something like:
>>>
>>> static rtx
>>> gen_lvx_v4si_move (rtx dest, rtx src)
>>> {
>>> gcc_assert (!(MEM_P (dest) && MEM_P (src)));
>>> gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
>>> if (MEM_P (dest))
>>> return gen_altivec_stvx_v4si_internal (dest, src);
>>> else if (MEM_P (src))
>>> return gen_altivec_lvx_v4si_internal (dest, src);
>>> else
>>> gcc_unreachable ();
>>> }
>>>
>>> (Or do you allow VOIDmode for src as well?) Anyway, at least get rid of
>>> the useless extra variable.
>>
>> I think this should be better:
>
> The gcc_unreachable at the end catches the non-mem to non-mem case.
>
>> static rtx
>> gen_lvx_v4si_move (rtx dest, rtx src)
>> {
>> gcc_assert ((MEM_P (dest) && !MEM_P (src)) || (MEM_P (src) && !MEM_P(dest)));
>
> But if you prefer this, how about
>
> {
> gcc_assert (MEM_P (dest) ^ MEM_P (src));
> gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
>
> if (MEM_P (dest))
> return gen_altivec_stvx_v4si_internal (dest, src);
> else
> return gen_altivec_lvx_v4si_internal (dest, src);
> }
>
> :-)
>
>
> Segher
>
I like that even better, thanks!
* Re: [PATCH][rs6000] avoid using unaligned vsx or lxvd2x/stxvd2x for memcpy/memmove inline expansion
@ 2019-01-14 18:49 ` Aaron Sawdey
From: Aaron Sawdey @ 2019-01-14 18:49 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: GCC Patches, David Edelsohn, Bill Schmidt
The patch for this was committed to trunk as 267562 (see below). Is this also ok for backport to 8?
Thanks,
Aaron
On 12/20/18 5:44 PM, Segher Boessenkool wrote:
> On Thu, Dec 20, 2018 at 05:34:54PM -0600, Aaron Sawdey wrote:
>> On 12/20/18 3:51 AM, Segher Boessenkool wrote:
>>> On Wed, Dec 19, 2018 at 01:53:05PM -0600, Aaron Sawdey wrote:
>>>> Because of POWER9 dd2.1 issues with certain unaligned vsx instructions
>>>> to cache inhibited memory, here is a patch that keeps memmove (and memcpy)
>>>> inline expansion from doing unaligned vector or using vector load/store
>>>> other than lvx/stvx. More description of the issue is here:
>>>>
>>>> https://patchwork.ozlabs.org/patch/814059/
>>>>
>>>> OK for trunk if bootstrap/regtest ok?
>>>
>>> Okay, but see below.
>>>
>> [snip]
>>>
>>> This is extraordinarily clumsy :-) Maybe something like:
>>>
>>> static rtx
>>> gen_lvx_v4si_move (rtx dest, rtx src)
>>> {
>>> gcc_assert (!(MEM_P (dest) && MEM_P (src)));
>>> gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
>>> if (MEM_P (dest))
>>> return gen_altivec_stvx_v4si_internal (dest, src);
>>> else if (MEM_P (src))
>>> return gen_altivec_lvx_v4si_internal (dest, src);
>>> else
>>> gcc_unreachable ();
>>> }
>>>
>>> (Or do you allow VOIDmode for src as well?) Anyway, at least get rid of
>>> the useless extra variable.
>>
>> I think this should be better:
>
> The gcc_unreachable at the end catches the non-mem to non-mem case.
>
>> static rtx
>> gen_lvx_v4si_move (rtx dest, rtx src)
>> {
>> gcc_assert ((MEM_P (dest) && !MEM_P (src)) || (MEM_P (src) && !MEM_P(dest)));
>
> But if you prefer this, how about
>
> {
> gcc_assert (MEM_P (dest) ^ MEM_P (src));
> gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
>
> if (MEM_P (dest))
> return gen_altivec_stvx_v4si_internal (dest, src);
> else
> return gen_altivec_lvx_v4si_internal (dest, src);
> }
>
> :-)
>
>
> Segher
>
2019-01-03 Aaron Sawdey <acsawdey@linux.ibm.com>
* config/rs6000/rs6000-string.c (expand_block_move): Don't use
unaligned vsx and avoid lxvd2x/stxvd2x.
(gen_lvx_v4si_move): New function.
Index: gcc/config/rs6000/rs6000-string.c
===================================================================
--- gcc/config/rs6000/rs6000-string.c (revision 267299)
+++ gcc/config/rs6000/rs6000-string.c (working copy)
@@ -2669,6 +2669,25 @@
return true;
}
+/* Generate loads and stores for a move of v4si mode using lvx/stvx.
+ This uses altivec_{l,st}vx_<mode>_internal which use unspecs to
+ keep combine from changing what instruction gets used.
+
+ DEST is the destination for the data.
+ SRC is the source of the data for the move. */
+
+static rtx
+gen_lvx_v4si_move (rtx dest, rtx src)
+{
+ gcc_assert (MEM_P (dest) ^ MEM_P (src));
+ gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
+
+ if (MEM_P (dest))
+ return gen_altivec_stvx_v4si_internal (dest, src);
+ else
+ return gen_altivec_lvx_v4si_internal (dest, src);
+}
+
/* Expand a block move operation, and return 1 if successful. Return 0
if we should let the compiler generate normal code.
@@ -2721,11 +2740,11 @@
/* Altivec first, since it will be faster than a string move
when it applies, and usually not significantly larger. */
- if (TARGET_ALTIVEC && bytes >= 16 && (TARGET_EFFICIENT_UNALIGNED_VSX || align >= 128))
+ if (TARGET_ALTIVEC && bytes >= 16 && align >= 128)
{
move_bytes = 16;
mode = V4SImode;
- gen_func.mov = gen_movv4si;
+ gen_func.mov = gen_lvx_v4si_move;
}
else if (bytes >= 8 && TARGET_POWERPC64
&& (align >= 64 || !STRICT_ALIGNMENT))
* Re: [PATCH][rs6000] avoid using unaligned vsx or lxvd2x/stxvd2x for memcpy/memmove inline expansion
@ 2019-01-16 16:28 ` Segher Boessenkool
From: Segher Boessenkool @ 2019-01-16 16:28 UTC (permalink / raw)
To: Aaron Sawdey; +Cc: GCC Patches, David Edelsohn, Bill Schmidt
On Mon, Jan 14, 2019 at 12:49:33PM -0600, Aaron Sawdey wrote:
> The patch for this was committed to trunk as 267562 (see below). Is this also ok for backport to 8?
Yes please. Thanks!
Segher
> On 12/20/18 5:44 PM, Segher Boessenkool wrote:
> > On Thu, Dec 20, 2018 at 05:34:54PM -0600, Aaron Sawdey wrote:
> >> On 12/20/18 3:51 AM, Segher Boessenkool wrote:
> >>> On Wed, Dec 19, 2018 at 01:53:05PM -0600, Aaron Sawdey wrote:
> >>>> Because of POWER9 dd2.1 issues with certain unaligned vsx instructions
> >>>> to cache inhibited memory, here is a patch that keeps memmove (and memcpy)
> >>>> inline expansion from doing unaligned vector or using vector load/store
> >>>> other than lvx/stvx. More description of the issue is here:
> >>>>
> >>>> https://patchwork.ozlabs.org/patch/814059/
> >>>>
> >>>> OK for trunk if bootstrap/regtest ok?
> >>>
> >>> Okay, but see below.
> >>>
> >> [snip]
> >>>
> >>> This is extraordinarily clumsy :-) Maybe something like:
> >>>
> >>> static rtx
> >>> gen_lvx_v4si_move (rtx dest, rtx src)
> >>> {
> >>> gcc_assert (!(MEM_P (dest) && MEM_P (src)));
> >>> gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
> >>> if (MEM_P (dest))
> >>> return gen_altivec_stvx_v4si_internal (dest, src);
> >>> else if (MEM_P (src))
> >>> return gen_altivec_lvx_v4si_internal (dest, src);
> >>> else
> >>> gcc_unreachable ();
> >>> }
> >>>
> >>> (Or do you allow VOIDmode for src as well?) Anyway, at least get rid of
> >>> the useless extra variable.
> >>
> >> I think this should be better:
> >
> > The gcc_unreachable at the end catches the non-mem to non-mem case.
> >
> >> static rtx
> >> gen_lvx_v4si_move (rtx dest, rtx src)
> >> {
> >> gcc_assert ((MEM_P (dest) && !MEM_P (src)) || (MEM_P (src) && !MEM_P(dest)));
> >
> > But if you prefer this, how about
> >
> > {
> > gcc_assert (MEM_P (dest) ^ MEM_P (src));
> > gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
> >
> > if (MEM_P (dest))
> > return gen_altivec_stvx_v4si_internal (dest, src);
> > else
> > return gen_altivec_lvx_v4si_internal (dest, src);
> > }
> >
> > :-)
> >
> >
> > Segher
> >
>
> 2019-01-03 Aaron Sawdey <acsawdey@linux.ibm.com>
>
> * config/rs6000/rs6000-string.c (expand_block_move): Don't use
> unaligned vsx and avoid lxvd2x/stxvd2x.
> (gen_lvx_v4si_move): New function.
>
>
> Index: gcc/config/rs6000/rs6000-string.c
> ===================================================================
> --- gcc/config/rs6000/rs6000-string.c (revision 267299)
> +++ gcc/config/rs6000/rs6000-string.c (working copy)
> @@ -2669,6 +2669,25 @@
> return true;
> }
>
> +/* Generate loads and stores for a move of v4si mode using lvx/stvx.
> + This uses altivec_{l,st}vx_<mode>_internal which use unspecs to
> + keep combine from changing what instruction gets used.
> +
> + DEST is the destination for the data.
> + SRC is the source of the data for the move. */
> +
> +static rtx
> +gen_lvx_v4si_move (rtx dest, rtx src)
> +{
> + gcc_assert (MEM_P (dest) ^ MEM_P (src));
> + gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
> +
> + if (MEM_P (dest))
> + return gen_altivec_stvx_v4si_internal (dest, src);
> + else
> + return gen_altivec_lvx_v4si_internal (dest, src);
> +}
> +
> /* Expand a block move operation, and return 1 if successful. Return 0
> if we should let the compiler generate normal code.
>
> @@ -2721,11 +2740,11 @@
>
> /* Altivec first, since it will be faster than a string move
> when it applies, and usually not significantly larger. */
> - if (TARGET_ALTIVEC && bytes >= 16 && (TARGET_EFFICIENT_UNALIGNED_VSX || align >= 128))
> + if (TARGET_ALTIVEC && bytes >= 16 && align >= 128)
> {
> move_bytes = 16;
> mode = V4SImode;
> - gen_func.mov = gen_movv4si;
> + gen_func.mov = gen_lvx_v4si_move;
> }
> else if (bytes >= 8 && TARGET_POWERPC64
> && (align >= 64 || !STRICT_ALIGNMENT))
>
>
>