From: Aaron Sawdey <acsawdey@linux.ibm.com>
To: Segher Boessenkool <segher@kernel.crashing.org>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
David Edelsohn <dje.gcc@gmail.com>,
Bill Schmidt <wschmidt@linux.ibm.com>
Subject: Re: [PATCH][rs6000] avoid using unaligned vsx or lxvd2x/stxvd2x for memcpy/memmove inline expansion
Date: Mon, 14 Jan 2019 18:49:00 -0000 [thread overview]
Message-ID: <578b94c0-d1d4-46d5-25d5-7077c306c3ea@linux.ibm.com> (raw)
In-Reply-To: <20181220234402.GX3803@gate.crashing.org>
The patch for this was committed to trunk as 267562 (see below). Is this also ok for backport to 8?
Thanks,
Aaron
On 12/20/18 5:44 PM, Segher Boessenkool wrote:
> On Thu, Dec 20, 2018 at 05:34:54PM -0600, Aaron Sawdey wrote:
>> On 12/20/18 3:51 AM, Segher Boessenkool wrote:
>>> On Wed, Dec 19, 2018 at 01:53:05PM -0600, Aaron Sawdey wrote:
>>>> Because of POWER9 dd2.1 issues with certain unaligned vsx instructions
>>>> to cache inhibited memory, here is a patch that keeps memmove (and memcpy)
>>>> inline expansion from doing unaligned vector or using vector load/store
>>>> other than lvx/stvx. More description of the issue is here:
>>>>
>>>> https://patchwork.ozlabs.org/patch/814059/
>>>>
>>>> OK for trunk if bootstrap/regtest ok?
>>>
>>> Okay, but see below.
>>>
>> [snip]
>>>
>>> This is extraordinarily clumsy :-) Maybe something like:
>>>
>>> static rtx
>>> gen_lvx_v4si_move (rtx dest, rtx src)
>>> {
>>> gcc_assert (!(MEM_P (dest) && MEM_P (src));
>>> gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
>>> if (MEM_P (dest))
>>> return gen_altivec_stvx_v4si_internal (dest, src);
>>> else if (MEM_P (src))
>>> return gen_altivec_lvx_v4si_internal (dest, src);
>>> else
>>> gcc_unreachable ();
>>> }
>>>
>>> (Or do you allow VOIDmode for src as well?) Anyway, at least get rid of
>>> the useless extra variable.
>>
>> I think this should be better:
>
> The gcc_unreachable at the end catches the non-mem to non-mem case.
>
>> static rtx
>> gen_lvx_v4si_move (rtx dest, rtx src)
>> {
>> gcc_assert ((MEM_P (dest) && !MEM_P (src)) || (MEM_P (src) && !MEM_P(dest)));
>
> But if you prefer this, how about
>
> {
> gcc_assert (MEM_P (dest) ^ MEM_P (src));
> gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
>
> if (MEM_P (dest))
> return gen_altivec_stvx_v4si_internal (dest, src);
> else
> return gen_altivec_lvx_v4si_internal (dest, src);
> }
>
> :-)
>
>
> Segher
>
2019-01-03 Aaron Sawdey <acsawdey@linux.ibm.com>
* config/rs6000/rs6000-string.c (expand_block_move): Don't use
unaligned vsx and avoid lxvd2x/stxvd2x.
(gen_lvx_v4si_move): New function.
Index: gcc/config/rs6000/rs6000-string.c
===================================================================
--- gcc/config/rs6000/rs6000-string.c (revision 267299)
+++ gcc/config/rs6000/rs6000-string.c (working copy)
@@ -2669,6 +2669,25 @@
return true;
}
+/* Generate loads and stores for a move of v4si mode using lvx/stvx.
+ This uses altivec_{l,st}vx_<mode>_internal which use unspecs to
+ keep combine from changing what instruction gets used.
+
+ DEST is the destination for the data.
+ SRC is the source of the data for the move. */
+
+static rtx
+gen_lvx_v4si_move (rtx dest, rtx src)
+{
+ gcc_assert (MEM_P (dest) ^ MEM_P (src));
+ gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
+
+ if (MEM_P (dest))
+ return gen_altivec_stvx_v4si_internal (dest, src);
+ else
+ return gen_altivec_lvx_v4si_internal (dest, src);
+}
+
/* Expand a block move operation, and return 1 if successful. Return 0
if we should let the compiler generate normal code.
@@ -2721,11 +2740,11 @@
/* Altivec first, since it will be faster than a string move
when it applies, and usually not significantly larger. */
- if (TARGET_ALTIVEC && bytes >= 16 && (TARGET_EFFICIENT_UNALIGNED_VSX || align >= 128))
+ if (TARGET_ALTIVEC && bytes >= 16 && align >= 128)
{
move_bytes = 16;
mode = V4SImode;
- gen_func.mov = gen_movv4si;
+ gen_func.mov = gen_lvx_v4si_move;
}
else if (bytes >= 8 && TARGET_POWERPC64
&& (align >= 64 || !STRICT_ALIGNMENT))
--
Aaron Sawdey, Ph.D. acsawdey@linux.vnet.ibm.com
050-2/C113 (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain
next prev parent reply other threads:[~2019-01-14 18:49 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-12-19 19:53 Aaron Sawdey
2018-12-20 9:51 ` Segher Boessenkool
2018-12-20 11:16 ` Segher Boessenkool
2018-12-20 23:44 ` Aaron Sawdey
2018-12-20 23:46 ` Segher Boessenkool
2018-12-21 1:00 ` Aaron Sawdey
2019-01-14 18:49 ` Aaron Sawdey [this message]
2019-01-16 16:28 ` Segher Boessenkool
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=578b94c0-d1d4-46d5-25d5-7077c306c3ea@linux.ibm.com \
--to=acsawdey@linux.ibm.com \
--cc=dje.gcc@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=segher@kernel.crashing.org \
--cc=wschmidt@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).