From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 91152 invoked by alias); 2 Sep 2015 21:14:11 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 91132 invoked by uid 89); 2 Sep 2015 21:14:08 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.1 required=5.0 tests=AWL,BAYES_50,KAM_ASCII_DIVIDERS,KAM_LAZY_DOMAIN_SECURITY,T_RP_MATCHES_RCVD autolearn=no version=3.3.2 X-HELO: e18.ny.us.ibm.com Received: from e18.ny.us.ibm.com (HELO e18.ny.us.ibm.com) (129.33.205.208) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (CAMELLIA256-SHA encrypted) ESMTPS; Wed, 02 Sep 2015 21:14:07 +0000 Received: from /spool/local by e18.ny.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Wed, 2 Sep 2015 17:14:04 -0400 Received: from d01dlp03.pok.ibm.com (9.56.250.168) by e18.ny.us.ibm.com (146.89.104.205) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Wed, 2 Sep 2015 17:14:02 -0400 X-MailFrom: wschmidt@linux.vnet.ibm.com X-RcptTo: gcc-patches@gcc.gnu.org Received: from b01cxnp22035.gho.pok.ibm.com (b01cxnp22035.gho.pok.ibm.com [9.57.198.25]) by d01dlp03.pok.ibm.com (Postfix) with ESMTP id 25852C90048 for ; Wed, 2 Sep 2015 17:05:06 -0400 (EDT) Received: from d01av04.pok.ibm.com (d01av04.pok.ibm.com [9.56.224.64]) by b01cxnp22035.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id t82LE1qZ55443572 for ; Wed, 2 Sep 2015 21:14:01 GMT Received: from d01av04.pok.ibm.com (localhost [127.0.0.1]) by d01av04.pok.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id t82LE1D7021808 for ; Wed, 2 Sep 2015 17:14:01 -0400 Received: from [9.65.139.205] (sig-9-65-139-205.ibm.com [9.65.139.205]) by d01av04.pok.ibm.com (8.14.4/8.14.4/NCO v10.0 AVin) with ESMTP id t82LE09r021650; Wed, 2 Sep 2015 17:14:01 -0400 Message-ID: <1441228440.9007.0.camel@gnopaine> Subject: Re: [PATCH] Fix ICE when generating a vector shift by scalar From: Bill Schmidt To: Richard Biener Cc: GCC Patches Date: Wed, 02 Sep 2015 21:53:00 -0000 In-Reply-To: References: <1441052882.4779.3.camel@oc8801110288.ibm.com> <1441122782.4925.6.camel@oc8801110288.ibm.com> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 7bit X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 15090221-0045-0000-0000-0000014322EB X-IsSubscribed: yes X-SW-Source: 2015-09/txt/msg00196.txt.bz2 On Wed, 2015-09-02 at 14:44 +0200, Richard Biener wrote: > On Tue, Sep 1, 2015 at 5:53 PM, Bill Schmidt > wrote: > > On Tue, 2015-09-01 at 11:01 +0200, Richard Biener wrote: > >> On Mon, Aug 31, 2015 at 10:28 PM, Bill Schmidt > >> wrote: > >> > Hi, > >> > > >> > The following simple test fails when attempting to convert a vector > >> > shift-by-scalar into a vector shift-by-vector. > >> > > >> > typedef unsigned char v16ui __attribute__((vector_size(16))); > >> > > >> > v16ui vslb(v16ui v, unsigned char i) > >> > { > >> > return v << i; > >> > } > >> > > >> > When this code is gimplified, the shift amount gets expanded to an > >> > unsigned int: > >> > > >> > vslb (v16ui v, unsigned char i) > >> > { > >> > v16ui D.2300; > >> > unsigned int D.2301; > >> > > >> > D.2301 = (unsigned int) i; > >> > D.2300 = v << D.2301; > >> > return D.2300; > >> > } > >> > > >> > In expand_binop, the shift-by-scalar is converted into a shift-by-vector > >> > using expand_vector_broadcast, which produces the following rtx to be > >> > used to initialize a V16QI vector: > >> > > >> > (parallel:V16QI [ > >> > (subreg/s/v:SI (reg:DI 155) 0) > >> > (subreg/s/v:SI (reg:DI 155) 0) > >> > (subreg/s/v:SI (reg:DI 155) 0) > >> > (subreg/s/v:SI (reg:DI 155) 0) > >> > (subreg/s/v:SI (reg:DI 155) 0) > >> > (subreg/s/v:SI (reg:DI 155) 0) > >> > (subreg/s/v:SI (reg:DI 155) 0) > >> > (subreg/s/v:SI (reg:DI 155) 0) > >> > (subreg/s/v:SI (reg:DI 155) 0) > >> > (subreg/s/v:SI (reg:DI 155) 0) > >> > (subreg/s/v:SI (reg:DI 155) 0) > >> > (subreg/s/v:SI (reg:DI 155) 0) > >> > (subreg/s/v:SI (reg:DI 155) 0) > >> > (subreg/s/v:SI (reg:DI 155) 0) > >> > (subreg/s/v:SI (reg:DI 155) 0) > >> > (subreg/s/v:SI (reg:DI 155) 0) > >> > ]) > >> > > >> > The back end eventually chokes trying to generate a copy of the SImode > >> > expression into a QImode memory slot. > >> > > >> > This patch fixes this problem by ensuring that the shift amount is > >> > truncated to the inner mode of the vector when necessary. I've added a > >> > test case verifying correct PowerPC code generation in this case. > >> > > >> > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no > >> > regressions. Is this ok for trunk? > >> > > >> > Thanks, > >> > Bill > >> > > >> > > >> > [gcc] > >> > > >> > 2015-08-31 Bill Schmidt > >> > > >> > * optabs.c (expand_binop): Don't create a broadcast vector with a > >> > source element wider than the inner mode. > >> > > >> > [gcc/testsuite] > >> > > >> > 2015-08-31 Bill Schmidt > >> > > >> > * gcc.target/powerpc/vec-shift.c: New test. > >> > > >> > > >> > Index: gcc/optabs.c > >> > =================================================================== > >> > --- gcc/optabs.c (revision 227353) > >> > +++ gcc/optabs.c (working copy) > >> > @@ -1608,6 +1608,13 @@ expand_binop (machine_mode mode, optab binoptab, r > >> > > >> > if (otheroptab && optab_handler (otheroptab, mode) != CODE_FOR_nothing) > >> > { > >> > + /* The scalar may have been extended to be too wide. Truncate > >> > + it back to the proper size to fit in the broadcast vector. */ > >> > + machine_mode inner_mode = GET_MODE_INNER (mode); > >> > + if (GET_MODE_BITSIZE (inner_mode) > >> > + < GET_MODE_BITSIZE (GET_MODE (op1))) > >> > >> Does that work for modeless constants? Btw, what do other targets do > >> here? Do they > >> also choke or do they cope with the wide operand? > > > > Good question. This works by serendipity more than by design. Because > > a constant has a mode of VOIDmode, its bitsize is 0 and the TRUNCATE > > won't be generated. It would be better for me to put in an explicit > > check for CONST_INT rather than relying on this, though. I'll fix that. > > > > I am not sure what other targets do here; I can check. However, do you > > think that's relevant? I'm concerned that > > > > (parallel:V16QI [ > > (subreg/s/v:SI (reg:DI 155) 0) > > (subreg/s/v:SI (reg:DI 155) 0) > > (subreg/s/v:SI (reg:DI 155) 0) > > (subreg/s/v:SI (reg:DI 155) 0) > > (subreg/s/v:SI (reg:DI 155) 0) > > (subreg/s/v:SI (reg:DI 155) 0) > > (subreg/s/v:SI (reg:DI 155) 0) > > (subreg/s/v:SI (reg:DI 155) 0) > > (subreg/s/v:SI (reg:DI 155) 0) > > (subreg/s/v:SI (reg:DI 155) 0) > > (subreg/s/v:SI (reg:DI 155) 0) > > (subreg/s/v:SI (reg:DI 155) 0) > > (subreg/s/v:SI (reg:DI 155) 0) > > (subreg/s/v:SI (reg:DI 155) 0) > > (subreg/s/v:SI (reg:DI 155) 0) > > (subreg/s/v:SI (reg:DI 155) 0) > > ]) > > > > is a nonsensical expression and shouldn't be produced by common code, in > > my view. It seems best to make this explicitly correct. Please let me > > know if that's off-base. > > No, the above indeed looks fishy though other backends vec_init_optab might > have just handle it fine. > > OTOH if a conversion is required it would be nice to CSE it, thus > force the result to a register (not sure if the targets handle invalid > RTL sharing in vec_init_optab). Agreed. I've fixed the modeless constant issue and added a force_reg on the conversion. New patch below, bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. Is this version ok? Thanks! Bill > > > Thanks, > > Bill > > > >> > >> > + op1 = simplify_gen_unary (TRUNCATE, inner_mode, op1, > >> > + GET_MODE (op1)); > >> > rtx vop1 = expand_vector_broadcast (mode, op1); > >> > if (vop1) > >> > { > >> > Index: gcc/testsuite/gcc.target/powerpc/vec-shift.c > >> > =================================================================== > >> > --- gcc/testsuite/gcc.target/powerpc/vec-shift.c (revision 0) > >> > +++ gcc/testsuite/gcc.target/powerpc/vec-shift.c (working copy) > >> > @@ -0,0 +1,20 @@ > >> > +/* { dg-do compile { target { powerpc*-*-* } } } */ > >> > +/* { dg-require-effective-target powerpc_altivec_ok } */ > >> > +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ > >> > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */ > >> > +/* { dg-options "-mcpu=power7 -O2" } */ > >> > + > >> > +/* This used to ICE. During gimplification, "i" is widened to an unsigned > >> > + int. We used to fail at expand time as we tried to cram an SImode item > >> > + into a QImode memory slot. This has been fixed to properly truncate the > >> > + shift amount when splatting it into a vector. */ > >> > + > >> > +typedef unsigned char v16ui __attribute__((vector_size(16))); > >> > + > >> > +v16ui vslb(v16ui v, unsigned char i) > >> > +{ > >> > + return v << i; > >> > +} > >> > + > >> > +/* { dg-final { scan-assembler "vspltb" } } */ > >> > +/* { dg-final { scan-assembler "vslb" } } */ > >> > New patch below: [gcc] 2015-09-02 Bill Schmidt * optabs.c (expand_binop): Don't create a broadcast vector with a source element wider than the inner mode. [gcc/testsuite] 2015-09-02 Bill Schmidt * gcc.target/powerpc/vec-shift.c: New test. Index: gcc/optabs.c =================================================================== --- gcc/optabs.c (revision 227416) +++ gcc/optabs.c (working copy) @@ -1608,6 +1608,15 @@ expand_binop (machine_mode mode, optab binoptab, r if (otheroptab && optab_handler (otheroptab, mode) != CODE_FOR_nothing) { + /* The scalar may have been extended to be too wide. Truncate + it back to the proper size to fit in the broadcast vector. */ + machine_mode inner_mode = GET_MODE_INNER (mode); + if (!CONST_INT_P (op1) + && (GET_MODE_BITSIZE (inner_mode) + < GET_MODE_BITSIZE (GET_MODE (op1)))) + op1 = force_reg (inner_mode, + simplify_gen_unary (TRUNCATE, inner_mode, op1, + GET_MODE (op1))); rtx vop1 = expand_vector_broadcast (mode, op1); if (vop1) { Index: gcc/testsuite/gcc.target/powerpc/vec-shift.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/vec-shift.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/vec-shift.c (working copy) @@ -0,0 +1,20 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ +/* { dg-require-effective-target powerpc_altivec_ok } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */ +/* { dg-options "-mcpu=power7 -O2" } */ + +/* This used to ICE. During gimplification, "i" is widened to an unsigned + int. We used to fail at expand time as we tried to cram an SImode item + into a QImode memory slot. This has been fixed to properly truncate the + shift amount when splatting it into a vector. */ + +typedef unsigned char v16ui __attribute__((vector_size(16))); + +v16ui vslb(v16ui v, unsigned char i) +{ + return v << i; +} + +/* { dg-final { scan-assembler "vspltb" } } */ +/* { dg-final { scan-assembler "vslb" } } */