From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 84291 invoked by alias); 3 Sep 2015 09:40:44 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 84281 invoked by uid 89); 3 Sep 2015 09:40:43 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.2 required=5.0 tests=AWL,BAYES_50,FREEMAIL_FROM,KAM_ASCII_DIVIDERS,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=no version=3.3.2 X-HELO: mail-yk0-f182.google.com Received: from mail-yk0-f182.google.com (HELO mail-yk0-f182.google.com) (209.85.160.182) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Thu, 03 Sep 2015 09:40:41 +0000 Received: by ykei199 with SMTP id i199so37526451yke.0 for ; Thu, 03 Sep 2015 02:40:39 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.170.81.5 with SMTP id x5mr9373010ykx.82.1441273239569; Thu, 03 Sep 2015 02:40:39 -0700 (PDT) Received: by 10.37.93.136 with HTTP; Thu, 3 Sep 2015 02:40:39 -0700 (PDT) In-Reply-To: <1441228440.9007.0.camel@gnopaine> References: <1441052882.4779.3.camel@oc8801110288.ibm.com> <1441122782.4925.6.camel@oc8801110288.ibm.com> <1441228440.9007.0.camel@gnopaine> Date: Thu, 03 Sep 2015 09:42:00 -0000 Message-ID: Subject: Re: [PATCH] Fix ICE when generating a vector shift by scalar From: Richard Biener To: Bill Schmidt Cc: GCC Patches Content-Type: text/plain; charset=UTF-8 X-IsSubscribed: yes X-SW-Source: 2015-09/txt/msg00230.txt.bz2 On Wed, Sep 2, 2015 at 11:14 PM, Bill Schmidt wrote: > > On Wed, 2015-09-02 at 14:44 +0200, Richard Biener wrote: >> On Tue, Sep 1, 2015 at 5:53 PM, Bill Schmidt >> wrote: >> > On Tue, 2015-09-01 at 11:01 +0200, Richard Biener wrote: >> >> On Mon, Aug 31, 2015 at 10:28 PM, Bill Schmidt >> >> wrote: >> >> > Hi, >> >> > >> >> > The following simple test fails when attempting to convert a vector >> >> > shift-by-scalar into a vector shift-by-vector. >> >> > >> >> > typedef unsigned char v16ui __attribute__((vector_size(16))); >> >> > >> >> > v16ui vslb(v16ui v, unsigned char i) >> >> > { >> >> > return v << i; >> >> > } >> >> > >> >> > When this code is gimplified, the shift amount gets expanded to an >> >> > unsigned int: >> >> > >> >> > vslb (v16ui v, unsigned char i) >> >> > { >> >> > v16ui D.2300; >> >> > unsigned int D.2301; >> >> > >> >> > D.2301 = (unsigned int) i; >> >> > D.2300 = v << D.2301; >> >> > return D.2300; >> >> > } >> >> > >> >> > In expand_binop, the shift-by-scalar is converted into a shift-by-vector >> >> > using expand_vector_broadcast, which produces the following rtx to be >> >> > used to initialize a V16QI vector: >> >> > >> >> > (parallel:V16QI [ >> >> > (subreg/s/v:SI (reg:DI 155) 0) >> >> > (subreg/s/v:SI (reg:DI 155) 0) >> >> > (subreg/s/v:SI (reg:DI 155) 0) >> >> > (subreg/s/v:SI (reg:DI 155) 0) >> >> > (subreg/s/v:SI (reg:DI 155) 0) >> >> > (subreg/s/v:SI (reg:DI 155) 0) >> >> > (subreg/s/v:SI (reg:DI 155) 0) >> >> > (subreg/s/v:SI (reg:DI 155) 0) >> >> > (subreg/s/v:SI (reg:DI 155) 0) >> >> > (subreg/s/v:SI (reg:DI 155) 0) >> >> > (subreg/s/v:SI (reg:DI 155) 0) >> >> > (subreg/s/v:SI (reg:DI 155) 0) >> >> > (subreg/s/v:SI (reg:DI 155) 0) >> >> > (subreg/s/v:SI (reg:DI 155) 0) >> >> > (subreg/s/v:SI (reg:DI 155) 0) >> >> > (subreg/s/v:SI (reg:DI 155) 0) >> >> > ]) >> >> > >> >> > The back end eventually chokes trying to generate a copy of the SImode >> >> > expression into a QImode memory slot. >> >> > >> >> > This patch fixes this problem by ensuring that the shift amount is >> >> > truncated to the inner mode of the vector when necessary. I've added a >> >> > test case verifying correct PowerPC code generation in this case. >> >> > >> >> > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no >> >> > regressions. Is this ok for trunk? >> >> > >> >> > Thanks, >> >> > Bill >> >> > >> >> > >> >> > [gcc] >> >> > >> >> > 2015-08-31 Bill Schmidt >> >> > >> >> > * optabs.c (expand_binop): Don't create a broadcast vector with a >> >> > source element wider than the inner mode. >> >> > >> >> > [gcc/testsuite] >> >> > >> >> > 2015-08-31 Bill Schmidt >> >> > >> >> > * gcc.target/powerpc/vec-shift.c: New test. >> >> > >> >> > >> >> > Index: gcc/optabs.c >> >> > =================================================================== >> >> > --- gcc/optabs.c (revision 227353) >> >> > +++ gcc/optabs.c (working copy) >> >> > @@ -1608,6 +1608,13 @@ expand_binop (machine_mode mode, optab binoptab, r >> >> > >> >> > if (otheroptab && optab_handler (otheroptab, mode) != CODE_FOR_nothing) >> >> > { >> >> > + /* The scalar may have been extended to be too wide. Truncate >> >> > + it back to the proper size to fit in the broadcast vector. */ >> >> > + machine_mode inner_mode = GET_MODE_INNER (mode); >> >> > + if (GET_MODE_BITSIZE (inner_mode) >> >> > + < GET_MODE_BITSIZE (GET_MODE (op1))) >> >> >> >> Does that work for modeless constants? Btw, what do other targets do >> >> here? Do they >> >> also choke or do they cope with the wide operand? >> > >> > Good question. This works by serendipity more than by design. Because >> > a constant has a mode of VOIDmode, its bitsize is 0 and the TRUNCATE >> > won't be generated. It would be better for me to put in an explicit >> > check for CONST_INT rather than relying on this, though. I'll fix that. >> > >> > I am not sure what other targets do here; I can check. However, do you >> > think that's relevant? I'm concerned that >> > >> > (parallel:V16QI [ >> > (subreg/s/v:SI (reg:DI 155) 0) >> > (subreg/s/v:SI (reg:DI 155) 0) >> > (subreg/s/v:SI (reg:DI 155) 0) >> > (subreg/s/v:SI (reg:DI 155) 0) >> > (subreg/s/v:SI (reg:DI 155) 0) >> > (subreg/s/v:SI (reg:DI 155) 0) >> > (subreg/s/v:SI (reg:DI 155) 0) >> > (subreg/s/v:SI (reg:DI 155) 0) >> > (subreg/s/v:SI (reg:DI 155) 0) >> > (subreg/s/v:SI (reg:DI 155) 0) >> > (subreg/s/v:SI (reg:DI 155) 0) >> > (subreg/s/v:SI (reg:DI 155) 0) >> > (subreg/s/v:SI (reg:DI 155) 0) >> > (subreg/s/v:SI (reg:DI 155) 0) >> > (subreg/s/v:SI (reg:DI 155) 0) >> > (subreg/s/v:SI (reg:DI 155) 0) >> > ]) >> > >> > is a nonsensical expression and shouldn't be produced by common code, in >> > my view. It seems best to make this explicitly correct. Please let me >> > know if that's off-base. >> >> No, the above indeed looks fishy though other backends vec_init_optab might >> have just handle it fine. >> >> OTOH if a conversion is required it would be nice to CSE it, thus >> force the result to a register (not sure if the targets handle invalid >> RTL sharing in vec_init_optab). > > Agreed. I've fixed the modeless constant issue and added a force_reg on > the conversion. New patch below, bootstrapped and tested on > powerpc64le-unknown-linux-gnu with no regressions. Is this version ok? Looks good to me. Thanks, Richard. > Thanks! > Bill > >> >> > Thanks, >> > Bill >> > >> >> >> >> > + op1 = simplify_gen_unary (TRUNCATE, inner_mode, op1, >> >> > + GET_MODE (op1)); >> >> > rtx vop1 = expand_vector_broadcast (mode, op1); >> >> > if (vop1) >> >> > { >> >> > Index: gcc/testsuite/gcc.target/powerpc/vec-shift.c >> >> > =================================================================== >> >> > --- gcc/testsuite/gcc.target/powerpc/vec-shift.c (revision 0) >> >> > +++ gcc/testsuite/gcc.target/powerpc/vec-shift.c (working copy) >> >> > @@ -0,0 +1,20 @@ >> >> > +/* { dg-do compile { target { powerpc*-*-* } } } */ >> >> > +/* { dg-require-effective-target powerpc_altivec_ok } */ >> >> > +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ >> >> > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */ >> >> > +/* { dg-options "-mcpu=power7 -O2" } */ >> >> > + >> >> > +/* This used to ICE. During gimplification, "i" is widened to an unsigned >> >> > + int. We used to fail at expand time as we tried to cram an SImode item >> >> > + into a QImode memory slot. This has been fixed to properly truncate the >> >> > + shift amount when splatting it into a vector. */ >> >> > + >> >> > +typedef unsigned char v16ui __attribute__((vector_size(16))); >> >> > + >> >> > +v16ui vslb(v16ui v, unsigned char i) >> >> > +{ >> >> > + return v << i; >> >> > +} >> >> > + >> >> > +/* { dg-final { scan-assembler "vspltb" } } */ >> >> > +/* { dg-final { scan-assembler "vslb" } } */ >> >> > > > > New patch below: > > [gcc] > > 2015-09-02 Bill Schmidt > > * optabs.c (expand_binop): Don't create a broadcast vector with a > source element wider than the inner mode. > > [gcc/testsuite] > > 2015-09-02 Bill Schmidt > > * gcc.target/powerpc/vec-shift.c: New test. > > > Index: gcc/optabs.c > =================================================================== > --- gcc/optabs.c (revision 227416) > +++ gcc/optabs.c (working copy) > @@ -1608,6 +1608,15 @@ expand_binop (machine_mode mode, optab binoptab, r > > if (otheroptab && optab_handler (otheroptab, mode) != CODE_FOR_nothing) > { > + /* The scalar may have been extended to be too wide. Truncate > + it back to the proper size to fit in the broadcast vector. */ > + machine_mode inner_mode = GET_MODE_INNER (mode); > + if (!CONST_INT_P (op1) > + && (GET_MODE_BITSIZE (inner_mode) > + < GET_MODE_BITSIZE (GET_MODE (op1)))) > + op1 = force_reg (inner_mode, > + simplify_gen_unary (TRUNCATE, inner_mode, op1, > + GET_MODE (op1))); > rtx vop1 = expand_vector_broadcast (mode, op1); > if (vop1) > { > Index: gcc/testsuite/gcc.target/powerpc/vec-shift.c > =================================================================== > --- gcc/testsuite/gcc.target/powerpc/vec-shift.c (revision 0) > +++ gcc/testsuite/gcc.target/powerpc/vec-shift.c (working copy) > @@ -0,0 +1,20 @@ > +/* { dg-do compile { target { powerpc*-*-* } } } */ > +/* { dg-require-effective-target powerpc_altivec_ok } */ > +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */ > +/* { dg-options "-mcpu=power7 -O2" } */ > + > +/* This used to ICE. During gimplification, "i" is widened to an unsigned > + int. We used to fail at expand time as we tried to cram an SImode item > + into a QImode memory slot. This has been fixed to properly truncate the > + shift amount when splatting it into a vector. */ > + > +typedef unsigned char v16ui __attribute__((vector_size(16))); > + > +v16ui vslb(v16ui v, unsigned char i) > +{ > + return v << i; > +} > + > +/* { dg-final { scan-assembler "vspltb" } } */ > +/* { dg-final { scan-assembler "vslb" } } */ > > > >