From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-406571-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 91152 invoked by alias); 2 Sep 2015 21:14:11 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 91132 invoked by uid 89); 2 Sep 2015 21:14:08 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-0.1 required=5.0 tests=AWL,BAYES_50,KAM_ASCII_DIVIDERS,KAM_LAZY_DOMAIN_SECURITY,T_RP_MATCHES_RCVD autolearn=no version=3.3.2
X-HELO: e18.ny.us.ibm.com
Received: from e18.ny.us.ibm.com (HELO e18.ny.us.ibm.com) (129.33.205.208) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (CAMELLIA256-SHA encrypted) ESMTPS; Wed, 02 Sep 2015 21:14:07 +0000
Received: from /spool/local	by e18.ny.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted	for <gcc-patches@gcc.gnu.org> from <wschmidt@linux.vnet.ibm.com>;	Wed, 2 Sep 2015 17:14:04 -0400
Received: from d01dlp03.pok.ibm.com (9.56.250.168)	by e18.ny.us.ibm.com (146.89.104.205) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted;	Wed, 2 Sep 2015 17:14:02 -0400
X-MailFrom: wschmidt@linux.vnet.ibm.com
X-RcptTo: gcc-patches@gcc.gnu.org
Received: from b01cxnp22035.gho.pok.ibm.com (b01cxnp22035.gho.pok.ibm.com [9.57.198.25])	by d01dlp03.pok.ibm.com (Postfix) with ESMTP id 25852C90048	for <gcc-patches@gcc.gnu.org>; Wed,  2 Sep 2015 17:05:06 -0400 (EDT)
Received: from d01av04.pok.ibm.com (d01av04.pok.ibm.com [9.56.224.64])	by b01cxnp22035.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id t82LE1qZ55443572	for <gcc-patches@gcc.gnu.org>; Wed, 2 Sep 2015 21:14:01 GMT
Received: from d01av04.pok.ibm.com (localhost [127.0.0.1])	by d01av04.pok.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id t82LE1D7021808	for <gcc-patches@gcc.gnu.org>; Wed, 2 Sep 2015 17:14:01 -0400
Received: from [9.65.139.205] (sig-9-65-139-205.ibm.com [9.65.139.205])	by d01av04.pok.ibm.com (8.14.4/8.14.4/NCO v10.0 AVin) with ESMTP id t82LE09r021650;	Wed, 2 Sep 2015 17:14:01 -0400
Message-ID: <1441228440.9007.0.camel@gnopaine>
Subject: Re: [PATCH] Fix ICE when generating a vector shift by scalar
From: Bill Schmidt <wschmidt@linux.vnet.ibm.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Date: Wed, 02 Sep 2015 21:53:00 -0000
In-Reply-To: <CAFiYyc0MCgfUQXmwa=R9oyXvk2eJBP_s+2SFbRMUee74+H=_0A@mail.gmail.com>
References: <1441052882.4779.3.camel@oc8801110288.ibm.com>	 <CAFiYyc2LDEXmRURfet6G691AON4UPjt6KEJRZy4Szz=HKfHESg@mail.gmail.com>	 <1441122782.4925.6.camel@oc8801110288.ibm.com>	 <CAFiYyc0MCgfUQXmwa=R9oyXvk2eJBP_s+2SFbRMUee74+H=_0A@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 15090221-0045-0000-0000-0000014322EB
X-IsSubscribed: yes
X-SW-Source: 2015-09/txt/msg00196.txt.bz2


On Wed, 2015-09-02 at 14:44 +0200, Richard Biener wrote:
> On Tue, Sep 1, 2015 at 5:53 PM, Bill Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:
> > On Tue, 2015-09-01 at 11:01 +0200, Richard Biener wrote:
> >> On Mon, Aug 31, 2015 at 10:28 PM, Bill Schmidt
> >> <wschmidt@linux.vnet.ibm.com> wrote:
> >> > Hi,
> >> >
> >> > The following simple test fails when attempting to convert a vector
> >> > shift-by-scalar into a vector shift-by-vector.
> >> >
> >> >   typedef unsigned char v16ui __attribute__((vector_size(16)));
> >> >
> >> >   v16ui vslb(v16ui v, unsigned char i)
> >> >   {
> >> >     return v << i;
> >> >   }
> >> >
> >> > When this code is gimplified, the shift amount gets expanded to an
> >> > unsigned int:
> >> >
> >> >   vslb (v16ui v, unsigned char i)
> >> >   {
> >> >     v16ui D.2300;
> >> >     unsigned int D.2301;
> >> >
> >> >     D.2301 = (unsigned int) i;
> >> >     D.2300 = v << D.2301;
> >> >     return D.2300;
> >> >   }
> >> >
> >> > In expand_binop, the shift-by-scalar is converted into a shift-by-vector
> >> > using expand_vector_broadcast, which produces the following rtx to be
> >> > used to initialize a V16QI vector:
> >> >
> >> > (parallel:V16QI [
> >> >         (subreg/s/v:SI (reg:DI 155) 0)
> >> >         (subreg/s/v:SI (reg:DI 155) 0)
> >> >         (subreg/s/v:SI (reg:DI 155) 0)
> >> >         (subreg/s/v:SI (reg:DI 155) 0)
> >> >         (subreg/s/v:SI (reg:DI 155) 0)
> >> >         (subreg/s/v:SI (reg:DI 155) 0)
> >> >         (subreg/s/v:SI (reg:DI 155) 0)
> >> >         (subreg/s/v:SI (reg:DI 155) 0)
> >> >         (subreg/s/v:SI (reg:DI 155) 0)
> >> >         (subreg/s/v:SI (reg:DI 155) 0)
> >> >         (subreg/s/v:SI (reg:DI 155) 0)
> >> >         (subreg/s/v:SI (reg:DI 155) 0)
> >> >         (subreg/s/v:SI (reg:DI 155) 0)
> >> >         (subreg/s/v:SI (reg:DI 155) 0)
> >> >         (subreg/s/v:SI (reg:DI 155) 0)
> >> >         (subreg/s/v:SI (reg:DI 155) 0)
> >> >     ])
> >> >
> >> > The back end eventually chokes trying to generate a copy of the SImode
> >> > expression into a QImode memory slot.
> >> >
> >> > This patch fixes this problem by ensuring that the shift amount is
> >> > truncated to the inner mode of the vector when necessary.  I've added a
> >> > test case verifying correct PowerPC code generation in this case.
> >> >
> >> > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> >> > regressions.  Is this ok for trunk?
> >> >
> >> > Thanks,
> >> > Bill
> >> >
> >> >
> >> > [gcc]
> >> >
> >> > 2015-08-31  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> >> >
> >> >         * optabs.c (expand_binop): Don't create a broadcast vector with a
> >> >         source element wider than the inner mode.
> >> >
> >> > [gcc/testsuite]
> >> >
> >> > 2015-08-31  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> >> >
> >> >         * gcc.target/powerpc/vec-shift.c: New test.
> >> >
> >> >
> >> > Index: gcc/optabs.c
> >> > ===================================================================
> >> > --- gcc/optabs.c        (revision 227353)
> >> > +++ gcc/optabs.c        (working copy)
> >> > @@ -1608,6 +1608,13 @@ expand_binop (machine_mode mode, optab binoptab, r
> >> >
> >> >        if (otheroptab && optab_handler (otheroptab, mode) != CODE_FOR_nothing)
> >> >         {
> >> > +         /* The scalar may have been extended to be too wide.  Truncate
> >> > +            it back to the proper size to fit in the broadcast vector.  */
> >> > +         machine_mode inner_mode = GET_MODE_INNER (mode);
> >> > +         if (GET_MODE_BITSIZE (inner_mode)
> >> > +             < GET_MODE_BITSIZE (GET_MODE (op1)))
> >>
> >> Does that work for modeless constants?  Btw, what do other targets do
> >> here?  Do they
> >> also choke or do they cope with the wide operand?
> >
> > Good question.  This works by serendipity more than by design.  Because
> > a constant has a mode of VOIDmode, its bitsize is 0 and the TRUNCATE
> > won't be generated.  It would be better for me to put in an explicit
> > check for CONST_INT rather than relying on this, though.  I'll fix that.
> >
> > I am not sure what other targets do here; I can check.  However, do you
> > think that's relevant?  I'm concerned that
> >
> > (parallel:V16QI [
> >         (subreg/s/v:SI (reg:DI 155) 0)
> >         (subreg/s/v:SI (reg:DI 155) 0)
> >         (subreg/s/v:SI (reg:DI 155) 0)
> >         (subreg/s/v:SI (reg:DI 155) 0)
> >         (subreg/s/v:SI (reg:DI 155) 0)
> >         (subreg/s/v:SI (reg:DI 155) 0)
> >         (subreg/s/v:SI (reg:DI 155) 0)
> >         (subreg/s/v:SI (reg:DI 155) 0)
> >         (subreg/s/v:SI (reg:DI 155) 0)
> >         (subreg/s/v:SI (reg:DI 155) 0)
> >         (subreg/s/v:SI (reg:DI 155) 0)
> >         (subreg/s/v:SI (reg:DI 155) 0)
> >         (subreg/s/v:SI (reg:DI 155) 0)
> >         (subreg/s/v:SI (reg:DI 155) 0)
> >         (subreg/s/v:SI (reg:DI 155) 0)
> >         (subreg/s/v:SI (reg:DI 155) 0)
> >     ])
> >
> > is a nonsensical expression and shouldn't be produced by common code, in
> > my view.  It seems best to make this explicitly correct.  Please let me
> > know if that's off-base.
> 
> No, the above indeed looks fishy though other backends vec_init_optab might
> have just handle it fine.
> 
> OTOH if a conversion is required it would be nice to CSE it, thus
> force the result to a register (not sure if the targets handle invalid
> RTL sharing in vec_init_optab).

Agreed.  I've fixed the modeless constant issue and added a force_reg on
the conversion.  New patch below, bootstrapped and tested on
powerpc64le-unknown-linux-gnu with no regressions.  Is this version ok?

Thanks!
Bill

> 
> > Thanks,
> > Bill
> >
> >>
> >> > +           op1 = simplify_gen_unary (TRUNCATE, inner_mode, op1,
> >> > +                                     GET_MODE (op1));
> >> >           rtx vop1 = expand_vector_broadcast (mode, op1);
> >> >           if (vop1)
> >> >             {
> >> > Index: gcc/testsuite/gcc.target/powerpc/vec-shift.c
> >> > ===================================================================
> >> > --- gcc/testsuite/gcc.target/powerpc/vec-shift.c        (revision 0)
> >> > +++ gcc/testsuite/gcc.target/powerpc/vec-shift.c        (working copy)
> >> > @@ -0,0 +1,20 @@
> >> > +/* { dg-do compile { target { powerpc*-*-* } } } */
> >> > +/* { dg-require-effective-target powerpc_altivec_ok } */
> >> > +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
> >> > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
> >> > +/* { dg-options "-mcpu=power7 -O2" } */
> >> > +
> >> > +/* This used to ICE.  During gimplification, "i" is widened to an unsigned
> >> > +   int.  We used to fail at expand time as we tried to cram an SImode item
> >> > +   into a QImode memory slot.  This has been fixed to properly truncate the
> >> > +   shift amount when splatting it into a vector.  */
> >> > +
> >> > +typedef unsigned char v16ui __attribute__((vector_size(16)));
> >> > +
> >> > +v16ui vslb(v16ui v, unsigned char i)
> >> > +{
> >> > +       return v << i;
> >> > +}
> >> > +
> >> > +/* { dg-final { scan-assembler "vspltb" } } */
> >> > +/* { dg-final { scan-assembler "vslb" } } */
> >> >


New patch below:

[gcc]

2015-09-02  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* optabs.c (expand_binop): Don't create a broadcast vector with a
	source element wider than the inner mode.

[gcc/testsuite]

2015-09-02  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* gcc.target/powerpc/vec-shift.c: New test.


Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	(revision 227416)
+++ gcc/optabs.c	(working copy)
@@ -1608,6 +1608,15 @@ expand_binop (machine_mode mode, optab binoptab, r
 
       if (otheroptab && optab_handler (otheroptab, mode) != CODE_FOR_nothing)
 	{
+	  /* The scalar may have been extended to be too wide.  Truncate
+	     it back to the proper size to fit in the broadcast vector.  */
+	  machine_mode inner_mode = GET_MODE_INNER (mode);
+	  if (!CONST_INT_P (op1)
+	      && (GET_MODE_BITSIZE (inner_mode)
+		  < GET_MODE_BITSIZE (GET_MODE (op1))))
+	    op1 = force_reg (inner_mode,
+			     simplify_gen_unary (TRUNCATE, inner_mode, op1,
+						 GET_MODE (op1)));
 	  rtx vop1 = expand_vector_broadcast (mode, op1);
 	  if (vop1)
 	    {
Index: gcc/testsuite/gcc.target/powerpc/vec-shift.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-shift.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-shift.c	(working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
+/* { dg-options "-mcpu=power7 -O2" } */
+
+/* This used to ICE.  During gimplification, "i" is widened to an unsigned
+   int.  We used to fail at expand time as we tried to cram an SImode item
+   into a QImode memory slot.  This has been fixed to properly truncate the
+   shift amount when splatting it into a vector.  */
+
+typedef unsigned char v16ui __attribute__((vector_size(16)));
+
+v16ui vslb(v16ui v, unsigned char i)
+{
+	return v << i;
+}
+
+/* { dg-final { scan-assembler "vspltb" } } */
+/* { dg-final { scan-assembler "vslb" } } */