From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-406605-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 84291 invoked by alias); 3 Sep 2015 09:40:44 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 84281 invoked by uid 89); 3 Sep 2015 09:40:43 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-0.2 required=5.0 tests=AWL,BAYES_50,FREEMAIL_FROM,KAM_ASCII_DIVIDERS,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=no version=3.3.2
X-HELO: mail-yk0-f182.google.com
Received: from mail-yk0-f182.google.com (HELO mail-yk0-f182.google.com) (209.85.160.182) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Thu, 03 Sep 2015 09:40:41 +0000
Received: by ykei199 with SMTP id i199so37526451yke.0        for <gcc-patches@gcc.gnu.org>; Thu, 03 Sep 2015 02:40:39 -0700 (PDT)
MIME-Version: 1.0
X-Received: by 10.170.81.5 with SMTP id x5mr9373010ykx.82.1441273239569; Thu, 03 Sep 2015 02:40:39 -0700 (PDT)
Received: by 10.37.93.136 with HTTP; Thu, 3 Sep 2015 02:40:39 -0700 (PDT)
In-Reply-To: <1441228440.9007.0.camel@gnopaine>
References: <1441052882.4779.3.camel@oc8801110288.ibm.com>	<CAFiYyc2LDEXmRURfet6G691AON4UPjt6KEJRZy4Szz=HKfHESg@mail.gmail.com>	<1441122782.4925.6.camel@oc8801110288.ibm.com>	<CAFiYyc0MCgfUQXmwa=R9oyXvk2eJBP_s+2SFbRMUee74+H=_0A@mail.gmail.com>	<1441228440.9007.0.camel@gnopaine>
Date: Thu, 03 Sep 2015 09:42:00 -0000
Message-ID: <CAFiYyc3gndz_JpZ6qmfbnhPHuuJGrF-fUh7d1xHwSqQwi576EA@mail.gmail.com>
Subject: Re: [PATCH] Fix ICE when generating a vector shift by scalar
From: Richard Biener <richard.guenther@gmail.com>
To: Bill Schmidt <wschmidt@linux.vnet.ibm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset=UTF-8
X-IsSubscribed: yes
X-SW-Source: 2015-09/txt/msg00230.txt.bz2

On Wed, Sep 2, 2015 at 11:14 PM, Bill Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
>
> On Wed, 2015-09-02 at 14:44 +0200, Richard Biener wrote:
>> On Tue, Sep 1, 2015 at 5:53 PM, Bill Schmidt
>> <wschmidt@linux.vnet.ibm.com> wrote:
>> > On Tue, 2015-09-01 at 11:01 +0200, Richard Biener wrote:
>> >> On Mon, Aug 31, 2015 at 10:28 PM, Bill Schmidt
>> >> <wschmidt@linux.vnet.ibm.com> wrote:
>> >> > Hi,
>> >> >
>> >> > The following simple test fails when attempting to convert a vector
>> >> > shift-by-scalar into a vector shift-by-vector.
>> >> >
>> >> >   typedef unsigned char v16ui __attribute__((vector_size(16)));
>> >> >
>> >> >   v16ui vslb(v16ui v, unsigned char i)
>> >> >   {
>> >> >     return v << i;
>> >> >   }
>> >> >
>> >> > When this code is gimplified, the shift amount gets expanded to an
>> >> > unsigned int:
>> >> >
>> >> >   vslb (v16ui v, unsigned char i)
>> >> >   {
>> >> >     v16ui D.2300;
>> >> >     unsigned int D.2301;
>> >> >
>> >> >     D.2301 = (unsigned int) i;
>> >> >     D.2300 = v << D.2301;
>> >> >     return D.2300;
>> >> >   }
>> >> >
>> >> > In expand_binop, the shift-by-scalar is converted into a shift-by-vector
>> >> > using expand_vector_broadcast, which produces the following rtx to be
>> >> > used to initialize a V16QI vector:
>> >> >
>> >> > (parallel:V16QI [
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >     ])
>> >> >
>> >> > The back end eventually chokes trying to generate a copy of the SImode
>> >> > expression into a QImode memory slot.
>> >> >
>> >> > This patch fixes this problem by ensuring that the shift amount is
>> >> > truncated to the inner mode of the vector when necessary.  I've added a
>> >> > test case verifying correct PowerPC code generation in this case.
>> >> >
>> >> > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
>> >> > regressions.  Is this ok for trunk?
>> >> >
>> >> > Thanks,
>> >> > Bill
>> >> >
>> >> >
>> >> > [gcc]
>> >> >
>> >> > 2015-08-31  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>> >> >
>> >> >         * optabs.c (expand_binop): Don't create a broadcast vector with a
>> >> >         source element wider than the inner mode.
>> >> >
>> >> > [gcc/testsuite]
>> >> >
>> >> > 2015-08-31  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>> >> >
>> >> >         * gcc.target/powerpc/vec-shift.c: New test.
>> >> >
>> >> >
>> >> > Index: gcc/optabs.c
>> >> > ===================================================================
>> >> > --- gcc/optabs.c        (revision 227353)
>> >> > +++ gcc/optabs.c        (working copy)
>> >> > @@ -1608,6 +1608,13 @@ expand_binop (machine_mode mode, optab binoptab, r
>> >> >
>> >> >        if (otheroptab && optab_handler (otheroptab, mode) != CODE_FOR_nothing)
>> >> >         {
>> >> > +         /* The scalar may have been extended to be too wide.  Truncate
>> >> > +            it back to the proper size to fit in the broadcast vector.  */
>> >> > +         machine_mode inner_mode = GET_MODE_INNER (mode);
>> >> > +         if (GET_MODE_BITSIZE (inner_mode)
>> >> > +             < GET_MODE_BITSIZE (GET_MODE (op1)))
>> >>
>> >> Does that work for modeless constants?  Btw, what do other targets do
>> >> here?  Do they
>> >> also choke or do they cope with the wide operand?
>> >
>> > Good question.  This works by serendipity more than by design.  Because
>> > a constant has a mode of VOIDmode, its bitsize is 0 and the TRUNCATE
>> > won't be generated.  It would be better for me to put in an explicit
>> > check for CONST_INT rather than relying on this, though.  I'll fix that.
>> >
>> > I am not sure what other targets do here; I can check.  However, do you
>> > think that's relevant?  I'm concerned that
>> >
>> > (parallel:V16QI [
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >     ])
>> >
>> > is a nonsensical expression and shouldn't be produced by common code, in
>> > my view.  It seems best to make this explicitly correct.  Please let me
>> > know if that's off-base.
>>
>> No, the above indeed looks fishy, though other backends' vec_init_optab might
>> have just handled it fine.
>>
>> OTOH if a conversion is required it would be nice to CSE it, thus
>> force the result to a register (not sure if the targets handle invalid
>> RTL sharing in vec_init_optab).
>
> Agreed.  I've fixed the modeless constant issue and added a force_reg on
> the conversion.  New patch below, bootstrapped and tested on
> powerpc64le-unknown-linux-gnu with no regressions.  Is this version ok?

Looks good to me.

Thanks,
Richard.

> Thanks!
> Bill
>
>>
>> > Thanks,
>> > Bill
>> >
>> >>
>> >> > +           op1 = simplify_gen_unary (TRUNCATE, inner_mode, op1,
>> >> > +                                     GET_MODE (op1));
>> >> >           rtx vop1 = expand_vector_broadcast (mode, op1);
>> >> >           if (vop1)
>> >> >             {
>> >> > Index: gcc/testsuite/gcc.target/powerpc/vec-shift.c
>> >> > ===================================================================
>> >> > --- gcc/testsuite/gcc.target/powerpc/vec-shift.c        (revision 0)
>> >> > +++ gcc/testsuite/gcc.target/powerpc/vec-shift.c        (working copy)
>> >> > @@ -0,0 +1,20 @@
>> >> > +/* { dg-do compile { target { powerpc*-*-* } } } */
>> >> > +/* { dg-require-effective-target powerpc_altivec_ok } */
>> >> > +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
>> >> > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
>> >> > +/* { dg-options "-mcpu=power7 -O2" } */
>> >> > +
>> >> > +/* This used to ICE.  During gimplification, "i" is widened to an unsigned
>> >> > +   int.  We used to fail at expand time as we tried to cram an SImode item
>> >> > +   into a QImode memory slot.  This has been fixed to properly truncate the
>> >> > +   shift amount when splatting it into a vector.  */
>> >> > +
>> >> > +typedef unsigned char v16ui __attribute__((vector_size(16)));
>> >> > +
>> >> > +v16ui vslb(v16ui v, unsigned char i)
>> >> > +{
>> >> > +       return v << i;
>> >> > +}
>> >> > +
>> >> > +/* { dg-final { scan-assembler "vspltb" } } */
>> >> > +/* { dg-final { scan-assembler "vslb" } } */
>> >> >
>
>
> New patch below:
>
> [gcc]
>
> 2015-09-02  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * optabs.c (expand_binop): Don't create a broadcast vector with a
>         source element wider than the inner mode.
>
> [gcc/testsuite]
>
> 2015-09-02  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * gcc.target/powerpc/vec-shift.c: New test.
>
>
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c        (revision 227416)
> +++ gcc/optabs.c        (working copy)
> @@ -1608,6 +1608,15 @@ expand_binop (machine_mode mode, optab binoptab, r
>
>        if (otheroptab && optab_handler (otheroptab, mode) != CODE_FOR_nothing)
>         {
> +         /* The scalar may have been extended to be too wide.  Truncate
> +            it back to the proper size to fit in the broadcast vector.  */
> +         machine_mode inner_mode = GET_MODE_INNER (mode);
> +         if (!CONST_INT_P (op1)
> +             && (GET_MODE_BITSIZE (inner_mode)
> +                 < GET_MODE_BITSIZE (GET_MODE (op1))))
> +           op1 = force_reg (inner_mode,
> +                            simplify_gen_unary (TRUNCATE, inner_mode, op1,
> +                                                GET_MODE (op1)));
>           rtx vop1 = expand_vector_broadcast (mode, op1);
>           if (vop1)
>             {
> Index: gcc/testsuite/gcc.target/powerpc/vec-shift.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/vec-shift.c        (revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/vec-shift.c        (working copy)
> @@ -0,0 +1,20 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-require-effective-target powerpc_altivec_ok } */
> +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
> +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
> +/* { dg-options "-mcpu=power7 -O2" } */
> +
> +/* This used to ICE.  During gimplification, "i" is widened to an unsigned
> +   int.  We used to fail at expand time as we tried to cram an SImode item
> +   into a QImode memory slot.  This has been fixed to properly truncate the
> +   shift amount when splatting it into a vector.  */
> +
> +typedef unsigned char v16ui __attribute__((vector_size(16)));
> +
> +v16ui vslb(v16ui v, unsigned char i)
> +{
> +       return v << i;
> +}
> +
> +/* { dg-final { scan-assembler "vspltb" } } */
> +/* { dg-final { scan-assembler "vslb" } } */
>
>
>
>
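
For readers following the thread, here is a minimal, standalone C sketch of the decision the revised patch makes before calling expand_vector_broadcast. It is not GCC code. The struct operand type, its fields, and all three function names are hypothetical stand-ins for an rtx, its machine mode, and the expand_binop logic. A mode width of 0 models a modeless (VOIDmode) constant, and masking the shift count to 3 bits is an assumption made only to keep the C shift well-defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LANES 16

/* Hypothetical stand-in for an rtx operand: a value, the bit width of its
   mode, and whether it is a constant.  mode_bits == 0 models VOIDmode.  */
struct operand {
    uint32_t value;
    unsigned mode_bits;
    bool is_const_int;
};

/* Mirrors the condition in the revised patch: truncate only when the
   operand is not a constant and is wider than the vector element.  The
   explicit constant check replaces the earlier reliance on a modeless
   constant having a width of 0.  */
static bool needs_truncate(struct operand op, unsigned inner_bits)
{
    return !op.is_const_int && inner_bits < op.mode_bits;
}

/* Narrow the operand to the element width when required; otherwise
   return it unchanged.  */
static struct operand narrow_for_broadcast(struct operand op,
                                           unsigned inner_bits)
{
    assert(inner_bits > 0 && inner_bits < 32);
    if (needs_truncate(op, inner_bits)) {
        op.value &= (1u << inner_bits) - 1u;
        op.mode_bits = inner_bits;
    }
    return op;
}

/* Shift every 8-bit lane left by a scalar amount: narrow the amount once,
   broadcast it into all lanes, then shift lane by lane.  */
static void shift_left_by_scalar(uint8_t out[LANES], const uint8_t in[LANES],
                                 struct operand amount)
{
    struct operand narrowed = narrow_for_broadcast(amount, 8);
    uint8_t splat[LANES];

    for (int i = 0; i < LANES; i++)
        splat[i] = (uint8_t) narrowed.value;

    /* Assumption of this sketch: only the low 3 bits of the count are used.  */
    for (int i = 0; i < LANES; i++)
        out[i] = (uint8_t) (in[i] << (splat[i] & 7));
}
```

Narrowing once and reusing the result for every lane corresponds loosely to forcing the truncated value into a register so that the sixteen vector elements share one conversion rather than repeating it.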