Date: Wed, 16 Jan 2019 16:28:00 -0000
From: Segher Boessenkool
To: Aaron Sawdey
Cc: GCC Patches, David Edelsohn, Bill Schmidt
Subject: Re: [PATCH][rs6000] avoid using unaligned vsx or lxvd2x/stxvd2x for memcpy/memmove inline expansion
Message-ID: <20190116162753.GN14180@gate.crashing.org>
In-Reply-To: <578b94c0-d1d4-46d5-25d5-7077c306c3ea@linux.ibm.com>

On Mon, Jan 14, 2019 at 12:49:33PM -0600, Aaron Sawdey wrote:
> The patch for this was committed to trunk as 267562 (see below).
> Is this also ok for backport to 8?

Yes please.  Thanks!


Segher


> On 12/20/18 5:44 PM, Segher Boessenkool wrote:
> > On Thu, Dec 20, 2018 at 05:34:54PM -0600, Aaron Sawdey wrote:
> >> On 12/20/18 3:51 AM, Segher Boessenkool wrote:
> >>> On Wed, Dec 19, 2018 at 01:53:05PM -0600, Aaron Sawdey wrote:
> >>>> Because of POWER9 dd2.1 issues with certain unaligned vsx instructions
> >>>> to cache inhibited memory, here is a patch that keeps memmove (and memcpy)
> >>>> inline expansion from doing unaligned vector or using vector load/store
> >>>> other than lvx/stvx.  More description of the issue is here:
> >>>>
> >>>> https://patchwork.ozlabs.org/patch/814059/
> >>>>
> >>>> OK for trunk if bootstrap/regtest ok?
> >>>
> >>> Okay, but see below.
> >>>
> >> [snip]
> >>>
> >>> This is extraordinarily clumsy :-)  Maybe something like:
> >>>
> >>> static rtx
> >>> gen_lvx_v4si_move (rtx dest, rtx src)
> >>> {
> >>>   gcc_assert (!(MEM_P (dest) && MEM_P (src)));
> >>>   gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
> >>>
> >>>   if (MEM_P (dest))
> >>>     return gen_altivec_stvx_v4si_internal (dest, src);
> >>>   else if (MEM_P (src))
> >>>     return gen_altivec_lvx_v4si_internal (dest, src);
> >>>   else
> >>>     gcc_unreachable ();
> >>> }
> >>>
> >>> (Or do you allow VOIDmode for src as well?)  Anyway, at least get rid of
> >>> the useless extra variable.
> >>
> >> I think this should be better:
> >
> > The gcc_unreachable at the end catches the non-mem to non-mem case.
> >
> >> static rtx
> >> gen_lvx_v4si_move (rtx dest, rtx src)
> >> {
> >>   gcc_assert ((MEM_P (dest) && !MEM_P (src)) || (MEM_P (src) && !MEM_P (dest)));
> >
> > But if you prefer this, how about
> >
> > {
> >   gcc_assert (MEM_P (dest) ^ MEM_P (src));
> >   gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
> >
> >   if (MEM_P (dest))
> >     return gen_altivec_stvx_v4si_internal (dest, src);
> >   else
> >     return gen_altivec_lvx_v4si_internal (dest, src);
> > }
> >
> > :-)
> >
> >
> > Segher
> >
>
> 2019-01-03  Aaron Sawdey
>
> 	* config/rs6000/rs6000-string.c (expand_block_move): Don't use
> 	unaligned vsx and avoid lxvd2x/stxvd2x.
> 	(gen_lvx_v4si_move): New function.
>
>
> Index: gcc/config/rs6000/rs6000-string.c
> ===================================================================
> --- gcc/config/rs6000/rs6000-string.c	(revision 267299)
> +++ gcc/config/rs6000/rs6000-string.c	(working copy)
> @@ -2669,6 +2669,25 @@
>    return true;
>  }
>
> +/* Generate loads and stores for a move of v4si mode using lvx/stvx.
> +   This uses altivec_{l,st}vx_<mode>_internal which use unspecs to
> +   keep combine from changing what instruction gets used.
> +
> +   DEST is the destination for the data.
> +   SRC is the source of the data for the move.  */
> +
> +static rtx
> +gen_lvx_v4si_move (rtx dest, rtx src)
> +{
> +  gcc_assert (MEM_P (dest) ^ MEM_P (src));
> +  gcc_assert (GET_MODE (dest) == V4SImode && GET_MODE (src) == V4SImode);
> +
> +  if (MEM_P (dest))
> +    return gen_altivec_stvx_v4si_internal (dest, src);
> +  else
> +    return gen_altivec_lvx_v4si_internal (dest, src);
> +}
> +
>  /* Expand a block move operation, and return 1 if successful.  Return 0
>     if we should let the compiler generate normal code.
>
> @@ -2721,11 +2740,11 @@
>
>        /* Altivec first, since it will be faster than a string move
> 	 when it applies, and usually not significantly larger.  */
> -      if (TARGET_ALTIVEC && bytes >= 16 && (TARGET_EFFICIENT_UNALIGNED_VSX || align >= 128))
> +      if (TARGET_ALTIVEC && bytes >= 16 && align >= 128)
> 	{
> 	  move_bytes = 16;
> 	  mode = V4SImode;
> -	  gen_func.mov = gen_movv4si;
> +	  gen_func.mov = gen_lvx_v4si_move;
> 	}
>        else if (bytes >= 8 && TARGET_POWERPC64
> 	       && (align >= 64 || !STRICT_ALIGNMENT))
>
>
>
> --
> Aaron Sawdey, Ph.D.  acsawdey@linux.vnet.ibm.com
> 050-2/C113  (507) 253-7520 home: 507/263-0782
> IBM Linux Technology Center - PPC Toolchain