* [PATCH] Fix ICE when generating a vector shift by scalar
From: Bill Schmidt @ 2015-08-31 21:23 UTC
To: gcc-patches

Hi,

The following simple test fails when attempting to convert a vector
shift-by-scalar into a vector shift-by-vector.

typedef unsigned char v16ui __attribute__((vector_size(16)));

v16ui vslb(v16ui v, unsigned char i)
{
  return v << i;
}

When this code is gimplified, the shift amount gets expanded to an
unsigned int:

vslb (v16ui v, unsigned char i)
{
  v16ui D.2300;
  unsigned int D.2301;

  D.2301 = (unsigned int) i;
  D.2300 = v << D.2301;
  return D.2300;
}

In expand_binop, the shift-by-scalar is converted into a shift-by-vector
using expand_vector_broadcast, which produces the following rtx to be
used to initialize a V16QI vector:

(parallel:V16QI [
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
    ])

The back end eventually chokes trying to generate a copy of the SImode
expression into a QImode memory slot.

This patch fixes this problem by ensuring that the shift amount is
truncated to the inner mode of the vector when necessary. I've added a
test case verifying correct PowerPC code generation in this case.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions. Is this ok for trunk?

Thanks,
Bill


[gcc]

2015-08-31  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* optabs.c (expand_binop): Don't create a broadcast vector with a
	source element wider than the inner mode.

[gcc/testsuite]

2015-08-31  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* gcc.target/powerpc/vec-shift.c: New test.


Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	(revision 227353)
+++ gcc/optabs.c	(working copy)
@@ -1608,6 +1608,13 @@ expand_binop (machine_mode mode, optab binoptab, r
 
       if (otheroptab && optab_handler (otheroptab, mode) != CODE_FOR_nothing)
         {
+          /* The scalar may have been extended to be too wide. Truncate
+             it back to the proper size to fit in the broadcast vector. */
+          machine_mode inner_mode = GET_MODE_INNER (mode);
+          if (GET_MODE_BITSIZE (inner_mode)
+              < GET_MODE_BITSIZE (GET_MODE (op1)))
+            op1 = simplify_gen_unary (TRUNCATE, inner_mode, op1,
+                                      GET_MODE (op1));
           rtx vop1 = expand_vector_broadcast (mode, op1);
           if (vop1)
             {
Index: gcc/testsuite/gcc.target/powerpc/vec-shift.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-shift.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-shift.c	(working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
+/* { dg-options "-mcpu=power7 -O2" } */
+
+/* This used to ICE. During gimplification, "i" is widened to an unsigned
+   int. We used to fail at expand time as we tried to cram an SImode item
+   into a QImode memory slot. This has been fixed to properly truncate the
+   shift amount when splatting it into a vector. */
+
+typedef unsigned char v16ui __attribute__((vector_size(16)));
+
+v16ui vslb(v16ui v, unsigned char i)
+{
+  return v << i;
+}
+
+/* { dg-final { scan-assembler "vspltb" } } */
+/* { dg-final { scan-assembler "vslb" } } */
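The following C sketch is a scalar model, not GCC code: it mimics the zero-extension introduced during gimplification (D.2301 = (unsigned int) i), the TRUNCATE back to the vector's inner mode that the patch adds, and a 16-lane splat followed by a per-lane shift. All function names are hypothetical. It illustrates why the truncation is safe: the widened amount started life as an unsigned char, so dropping the high bits recovers it exactly, and a shift by the splatted value matches a shift by the scalar.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of gimplification widening the amount: QImode -> SImode. */
static uint32_t widen_shift_amount(uint8_t i)
{
    return (uint32_t) i; /* zero-extension */
}

/* Hypothetical model of TRUNCATE to the inner mode (QImode): keep the low 8 bits. */
static uint8_t truncate_to_inner(uint32_t wide)
{
    return (uint8_t) wide;
}

/* Model of splatting the amount into 16 byte lanes, then shifting lane by lane.
   The caller keeps amount in 0..7, the range meaningful for 8-bit lanes. */
static void shift_by_splat(uint8_t out[16], const uint8_t v[16], uint8_t amount)
{
    uint8_t splat[16];
    for (size_t k = 0; k < 16; k++)
        splat[k] = amount; /* every lane holds the same QImode value */
    for (size_t k = 0; k < 16; k++)
        out[k] = (uint8_t) (v[k] << splat[k]);
}

/* Reference: shift every lane by the scalar amount directly. */
static void shift_by_scalar(uint8_t out[16], const uint8_t v[16], uint8_t amount)
{
    for (size_t k = 0; k < 16; k++)
        out[k] = (uint8_t) (v[k] << amount);
}

/* Returns 1 if widen-then-truncate is the identity for every unsigned char. */
static int roundtrip_is_lossless(void)
{
    for (unsigned i = 0; i <= UINT8_MAX; i++)
        if (truncate_to_inner(widen_shift_amount((uint8_t) i)) != i)
            return 0;
    return 1;
}
```

Only amounts 0 through 7 are compared, since larger shift counts are not meaningful for 8-bit lanes; the model makes no claim about how any particular target treats them.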
* Re: [PATCH] Fix ICE when generating a vector shift by scalar
From: Richard Biener @ 2015-09-01 9:01 UTC
To: Bill Schmidt; +Cc: GCC Patches

On Mon, Aug 31, 2015 at 10:28 PM, Bill Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c	(revision 227353)
> +++ gcc/optabs.c	(working copy)
> @@ -1608,6 +1608,13 @@ expand_binop (machine_mode mode, optab binoptab, r
>
>        if (otheroptab && optab_handler (otheroptab, mode) != CODE_FOR_nothing)
>          {
> +          /* The scalar may have been extended to be too wide. Truncate
> +             it back to the proper size to fit in the broadcast vector. */
> +          machine_mode inner_mode = GET_MODE_INNER (mode);
> +          if (GET_MODE_BITSIZE (inner_mode)
> +              < GET_MODE_BITSIZE (GET_MODE (op1)))

Does that work for modeless constants? Btw, what do other targets do
here? Do they also choke or do they cope with the wide operand?

> +            op1 = simplify_gen_unary (TRUNCATE, inner_mode, op1,
> +                                      GET_MODE (op1));
>            rtx vop1 = expand_vector_broadcast (mode, op1);
>            if (vop1)
>              {
* Re: [PATCH] Fix ICE when generating a vector shift by scalar
From: Bill Schmidt @ 2015-09-01 15:54 UTC
To: Richard Biener; +Cc: GCC Patches

On Tue, 2015-09-01 at 11:01 +0200, Richard Biener wrote:
> On Mon, Aug 31, 2015 at 10:28 PM, Bill Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:
> > +          machine_mode inner_mode = GET_MODE_INNER (mode);
> > +          if (GET_MODE_BITSIZE (inner_mode)
> > +              < GET_MODE_BITSIZE (GET_MODE (op1)))
>
> Does that work for modeless constants? Btw, what do other targets do
> here? Do they also choke or do they cope with the wide operand?

Good question. This works by serendipity more than by design. Because
a constant has a mode of VOIDmode, its bitsize is 0 and the TRUNCATE
won't be generated. It would be better for me to put in an explicit
check for CONST_INT rather than relying on this, though. I'll fix that.

I am not sure what other targets do here; I can check. However, do you
think that's relevant? I'm concerned that

(parallel:V16QI [
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
        (subreg/s/v:SI (reg:DI 155) 0)
    ])

is a nonsensical expression and shouldn't be produced by common code, in
my view. It seems best to make this explicitly correct. Please let me
know if that's off-base.

Thanks,
Bill
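The "serendipity" Bill describes can be modeled in a few lines of C. This sketch assumes, as he states, that a CONST_INT carries VOIDmode and that VOIDmode's bitsize is 0; the enum and function names are hypothetical stand-ins, not GCC's mode machinery. It shows that the first version of the guard never fires for constants only because 0 is never larger than a real inner-mode width.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GET_MODE_BITSIZE results; VOIDmode reports 0. */
enum { VOID_BITS = 0, QI_BITS = 8, SI_BITS = 32, DI_BITS = 64 };

/* Model of the first guard: truncate only when the operand's mode is
   wider than the vector's inner mode.  There is no explicit constant test. */
static bool first_guard_truncates(unsigned inner_bits, unsigned op_bits)
{
    return inner_bits < op_bits;
}
```

A modeless constant (op_bits == VOID_BITS) is skipped implicitly, which is exactly the reliance on an accident that the explicit CONST_INT check is meant to remove.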
* Re: [PATCH] Fix ICE when generating a vector shift by scalar
From: Richard Biener @ 2015-09-02 12:44 UTC
To: Bill Schmidt; +Cc: GCC Patches

On Tue, Sep 1, 2015 at 5:53 PM, Bill Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> On Tue, 2015-09-01 at 11:01 +0200, Richard Biener wrote:
>> Does that work for modeless constants? Btw, what do other targets do
>> here? Do they also choke or do they cope with the wide operand?
>
> Good question. This works by serendipity more than by design. Because
> a constant has a mode of VOIDmode, its bitsize is 0 and the TRUNCATE
> won't be generated. It would be better for me to put in an explicit
> check for CONST_INT rather than relying on this, though. I'll fix that.
>
> I am not sure what other targets do here; I can check. However, do you
> think that's relevant? I'm concerned that
>
> (parallel:V16QI [
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>     ])
>
> is a nonsensical expression and shouldn't be produced by common code, in
> my view. It seems best to make this explicitly correct. Please let me
> know if that's off-base.

No, the above indeed looks fishy, though other backends' vec_init_optab
might just handle it fine.

OTOH if a conversion is required it would be nice to CSE it, thus
force the result to a register (not sure if the targets handle invalid
RTL sharing in vec_init_optab).
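Richard's CSE suggestion can be illustrated with a loose scalar analogy in C (this is not RTL, and it does not model the RTL-sharing question at all). If the conversion stays inside each element of the broadcast, it is conceptually evaluated once per lane; forcing it into a "register" first evaluates it once. The counter and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter of how many times the truncation is evaluated. */
static unsigned truncations;

static uint8_t truncate_counted(uint32_t wide)
{
    truncations++;
    return (uint8_t) wide;
}

/* Analogy for 16 vec_init elements that each contain the conversion. */
static void splat_per_lane(uint8_t out[16], uint32_t wide)
{
    for (size_t k = 0; k < 16; k++)
        out[k] = truncate_counted(wide);
}

/* Analogy for force_reg: convert once into a temporary, then copy it. */
static void splat_once(uint8_t out[16], uint32_t wide)
{
    uint8_t reg = truncate_counted(wide);
    for (size_t k = 0; k < 16; k++)
        out[k] = reg;
}
```

Both forms produce identical lanes; the difference is only how often the conversion is computed, which is the redundancy that forcing the value into a register avoids.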
* Re: [PATCH] Fix ICE when generating a vector shift by scalar
From: Bill Schmidt @ 2015-09-02 21:53 UTC
To: Richard Biener; +Cc: GCC Patches

On Wed, 2015-09-02 at 14:44 +0200, Richard Biener wrote:
> On Tue, Sep 1, 2015 at 5:53 PM, Bill Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:
> > Good question. This works by serendipity more than by design. Because
> > a constant has a mode of VOIDmode, its bitsize is 0 and the TRUNCATE
> > won't be generated. It would be better for me to put in an explicit
> > check for CONST_INT rather than relying on this, though. I'll fix that.
>
> No, the above indeed looks fishy, though other backends' vec_init_optab
> might just handle it fine.
>
> OTOH if a conversion is required it would be nice to CSE it, thus
> force the result to a register (not sure if the targets handle invalid
> RTL sharing in vec_init_optab).

Agreed. I've fixed the modeless constant issue and added a force_reg on
the conversion. New patch below, bootstrapped and tested on
powerpc64le-unknown-linux-gnu with no regressions. Is this version ok?

Thanks!
Bill


New patch below:

[gcc]

2015-09-02  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* optabs.c (expand_binop): Don't create a broadcast vector with a
	source element wider than the inner mode.

[gcc/testsuite]

2015-09-02  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* gcc.target/powerpc/vec-shift.c: New test.


Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	(revision 227416)
+++ gcc/optabs.c	(working copy)
@@ -1608,6 +1608,15 @@ expand_binop (machine_mode mode, optab binoptab, r
 
       if (otheroptab && optab_handler (otheroptab, mode) != CODE_FOR_nothing)
         {
+          /* The scalar may have been extended to be too wide. Truncate
+             it back to the proper size to fit in the broadcast vector. */
+          machine_mode inner_mode = GET_MODE_INNER (mode);
+          if (!CONST_INT_P (op1)
+              && (GET_MODE_BITSIZE (inner_mode)
+                  < GET_MODE_BITSIZE (GET_MODE (op1))))
+            op1 = force_reg (inner_mode,
+                             simplify_gen_unary (TRUNCATE, inner_mode, op1,
+                                                 GET_MODE (op1)));
           rtx vop1 = expand_vector_broadcast (mode, op1);
           if (vop1)
             {
Index: gcc/testsuite/gcc.target/powerpc/vec-shift.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/vec-shift.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/vec-shift.c	(working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
+/* { dg-options "-mcpu=power7 -O2" } */
+
+/* This used to ICE. During gimplification, "i" is widened to an unsigned
+   int. We used to fail at expand time as we tried to cram an SImode item
+   into a QImode memory slot. This has been fixed to properly truncate the
+   shift amount when splatting it into a vector. */
+
+typedef unsigned char v16ui __attribute__((vector_size(16)));
+
+v16ui vslb(v16ui v, unsigned char i)
+{
+  return v << i;
+}
+
+/* { dg-final { scan-assembler "vspltb" } } */
+/* { dg-final { scan-assembler "vslb" } } */
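The revised condition can be modeled the same way. In this sketch the struct fields are hypothetical stand-ins for CONST_INT_P (op1) and GET_MODE_BITSIZE (GET_MODE (op1)); it shows only that constants are now excluded by an explicit test rather than by VOIDmode happening to have size 0. The force_reg step that follows in the real patch is not modeled.

```c
#include <stdbool.h>

/* Hypothetical description of the shift-amount operand. */
struct operand {
    bool is_const_int;  /* stands in for CONST_INT_P (op1) */
    unsigned mode_bits; /* stands in for GET_MODE_BITSIZE (GET_MODE (op1)) */
};

/* Model of the revised guard: never truncate a CONST_INT, and truncate
   other operands only when they are wider than the inner mode. */
static bool revised_guard_truncates(unsigned inner_bits, struct operand op)
{
    return !op.is_const_int && inner_bits < op.mode_bits;
}
```

For the test case in the patch, op1 is an SImode subreg and the inner mode is QImode, so the model returns true and a truncation is generated; a constant shift amount returns false regardless of what width it reports.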
* Re: [PATCH] Fix ICE when generating a vector shift by scalar
  2015-09-02 21:53 ` Bill Schmidt
@ 2015-09-03 9:42 ` Richard Biener
  0 siblings, 0 replies; 6+ messages in thread
From: Richard Biener @ 2015-09-03 9:42 UTC
  To: Bill Schmidt; +Cc: GCC Patches

On Wed, Sep 2, 2015 at 11:14 PM, Bill Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
>
> On Wed, 2015-09-02 at 14:44 +0200, Richard Biener wrote:
>> On Tue, Sep 1, 2015 at 5:53 PM, Bill Schmidt
>> <wschmidt@linux.vnet.ibm.com> wrote:
>> > On Tue, 2015-09-01 at 11:01 +0200, Richard Biener wrote:
>> >> On Mon, Aug 31, 2015 at 10:28 PM, Bill Schmidt
>> >> <wschmidt@linux.vnet.ibm.com> wrote:
>> >> > Hi,
>> >> >
>> >> > The following simple test fails when attempting to convert a vector
>> >> > shift-by-scalar into a vector shift-by-vector.
>> >> >
>> >> >   typedef unsigned char v16ui __attribute__((vector_size(16)));
>> >> >
>> >> >   v16ui vslb(v16ui v, unsigned char i)
>> >> >   {
>> >> >     return v << i;
>> >> >   }
>> >> >
>> >> > When this code is gimplified, the shift amount gets expanded to an
>> >> > unsigned int:
>> >> >
>> >> > vslb (v16ui v, unsigned char i)
>> >> > {
>> >> >   v16ui D.2300;
>> >> >   unsigned int D.2301;
>> >> >
>> >> >   D.2301 = (unsigned int) i;
>> >> >   D.2300 = v << D.2301;
>> >> >   return D.2300;
>> >> > }
>> >> >
>> >> > In expand_binop, the shift-by-scalar is converted into a shift-by-vector
>> >> > using expand_vector_broadcast, which produces the following rtx to be
>> >> > used to initialize a V16QI vector:
>> >> >
>> >> > (parallel:V16QI [
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >> >     ])
>> >> >
>> >> > The back end eventually chokes trying to generate a copy of the SImode
>> >> > expression into a QImode memory slot.
>> >> >
>> >> > This patch fixes this problem by ensuring that the shift amount is
>> >> > truncated to the inner mode of the vector when necessary.  I've added a
>> >> > test case verifying correct PowerPC code generation in this case.
>> >> >
>> >> > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
>> >> > regressions.  Is this ok for trunk?
>> >> >
>> >> > Thanks,
>> >> > Bill
>> >> >
>> >> >
>> >> > [gcc]
>> >> >
>> >> > 2015-08-31  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>> >> >
>> >> >         * optabs.c (expand_binop): Don't create a broadcast vector with a
>> >> >         source element wider than the inner mode.
>> >> >
>> >> > [gcc/testsuite]
>> >> >
>> >> > 2015-08-31  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>> >> >
>> >> >         * gcc.target/powerpc/vec-shift.c: New test.
>> >> >
>> >> >
>> >> > Index: gcc/optabs.c
>> >> > ===================================================================
>> >> > --- gcc/optabs.c        (revision 227353)
>> >> > +++ gcc/optabs.c        (working copy)
>> >> > @@ -1608,6 +1608,13 @@ expand_binop (machine_mode mode, optab binoptab, r
>> >> >
>> >> >        if (otheroptab && optab_handler (otheroptab, mode) != CODE_FOR_nothing)
>> >> >          {
>> >> > +          /* The scalar may have been extended to be too wide.  Truncate
>> >> > +             it back to the proper size to fit in the broadcast vector.  */
>> >> > +          machine_mode inner_mode = GET_MODE_INNER (mode);
>> >> > +          if (GET_MODE_BITSIZE (inner_mode)
>> >> > +              < GET_MODE_BITSIZE (GET_MODE (op1)))
>> >>
>> >> Does that work for modeless constants?  Btw, what do other targets do
>> >> here?  Do they also choke or do they cope with the wide operand?
>> >
>> > Good question.  This works by serendipity more than by design.  Because
>> > a constant has a mode of VOIDmode, its bitsize is 0 and the TRUNCATE
>> > won't be generated.  It would be better for me to put in an explicit
>> > check for CONST_INT rather than relying on this, though.  I'll fix that.
>> >
>> > I am not sure what other targets do here; I can check.  However, do you
>> > think that's relevant?  I'm concerned that
>> >
>> > (parallel:V16QI [
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >         (subreg/s/v:SI (reg:DI 155) 0)
>> >     ])
>> >
>> > is a nonsensical expression and shouldn't be produced by common code, in
>> > my view.  It seems best to make this explicitly correct.  Please let me
>> > know if that's off-base.
>>
>> No, the above indeed looks fishy, though other backends' vec_init_optab
>> might just handle it fine.
>>
>> OTOH if a conversion is required it would be nice to CSE it, thus
>> force the result to a register (not sure if the targets handle invalid
>> RTL sharing in vec_init_optab).
>
> Agreed.  I've fixed the modeless constant issue and added a force_reg on
> the conversion.  New patch below, bootstrapped and tested on
> powerpc64le-unknown-linux-gnu with no regressions.  Is this version ok?

Looks good to me.

Thanks,
Richard.

> Thanks!
> Bill
>
>>
>> > Thanks,
>> > Bill
>> >
>> >>
>> >> > +            op1 = simplify_gen_unary (TRUNCATE, inner_mode, op1,
>> >> > +                                      GET_MODE (op1));
>> >> >            rtx vop1 = expand_vector_broadcast (mode, op1);
>> >> >            if (vop1)
>> >> >              {
>> >> > Index: gcc/testsuite/gcc.target/powerpc/vec-shift.c
>> >> > ===================================================================
>> >> > --- gcc/testsuite/gcc.target/powerpc/vec-shift.c        (revision 0)
>> >> > +++ gcc/testsuite/gcc.target/powerpc/vec-shift.c        (working copy)
>> >> > @@ -0,0 +1,20 @@
>> >> > +/* { dg-do compile { target { powerpc*-*-* } } } */
>> >> > +/* { dg-require-effective-target powerpc_altivec_ok } */
>> >> > +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
>> >> > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
>> >> > +/* { dg-options "-mcpu=power7 -O2" } */
>> >> > +
>> >> > +/* This used to ICE.  During gimplification, "i" is widened to an unsigned
>> >> > +   int.  We used to fail at expand time as we tried to cram an SImode item
>> >> > +   into a QImode memory slot.  This has been fixed to properly truncate the
>> >> > +   shift amount when splatting it into a vector.  */
>> >> > +
>> >> > +typedef unsigned char v16ui __attribute__((vector_size(16)));
>> >> > +
>> >> > +v16ui vslb(v16ui v, unsigned char i)
>> >> > +{
>> >> > +  return v << i;
>> >> > +}
>> >> > +
>> >> > +/* { dg-final { scan-assembler "vspltb" } } */
>> >> > +/* { dg-final { scan-assembler "vslb" } } */
>
>
>
> New patch below:
>
> [gcc]
>
> 2015-09-02  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * optabs.c (expand_binop): Don't create a broadcast vector with a
>         source element wider than the inner mode.
>
> [gcc/testsuite]
>
> 2015-09-02  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * gcc.target/powerpc/vec-shift.c: New test.
>
>
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c        (revision 227416)
> +++ gcc/optabs.c        (working copy)
> @@ -1608,6 +1608,15 @@ expand_binop (machine_mode mode, optab binoptab, r
>
>        if (otheroptab && optab_handler (otheroptab, mode) != CODE_FOR_nothing)
>          {
> +          /* The scalar may have been extended to be too wide.  Truncate
> +             it back to the proper size to fit in the broadcast vector.  */
> +          machine_mode inner_mode = GET_MODE_INNER (mode);
> +          if (!CONST_INT_P (op1)
> +              && (GET_MODE_BITSIZE (inner_mode)
> +                  < GET_MODE_BITSIZE (GET_MODE (op1))))
> +            op1 = force_reg (inner_mode,
> +                             simplify_gen_unary (TRUNCATE, inner_mode, op1,
> +                                                 GET_MODE (op1)));
>            rtx vop1 = expand_vector_broadcast (mode, op1);
>            if (vop1)
>              {
> Index: gcc/testsuite/gcc.target/powerpc/vec-shift.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/vec-shift.c        (revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/vec-shift.c        (working copy)
> @@ -0,0 +1,20 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-require-effective-target powerpc_altivec_ok } */
> +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
> +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
> +/* { dg-options "-mcpu=power7 -O2" } */
> +
> +/* This used to ICE.  During gimplification, "i" is widened to an unsigned
> +   int.  We used to fail at expand time as we tried to cram an SImode item
> +   into a QImode memory slot.  This has been fixed to properly truncate the
> +   shift amount when splatting it into a vector.  */
> +
> +typedef unsigned char v16ui __attribute__((vector_size(16)));
> +
> +v16ui vslb(v16ui v, unsigned char i)
> +{
> +  return v << i;
> +}
> +
> +/* { dg-final { scan-assembler "vspltb" } } */
> +/* { dg-final { scan-assembler "vslb" } } */
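The fix is meant to leave the meaning of the source unchanged: shifting every lane by the scalar `i` should give the same result as shifting by a vector whose lanes all hold the widened count truncated back to `unsigned char`. Below is a minimal C-level sketch of that equivalence using GCC's generic vector extension; it is not code from the patch, and the helper names `shift_by_scalar`, `shift_by_splat` and `shift_forms_agree` are hypothetical. It only exercises counts 0 through 7, the counts that fit the 8-bit lane width.

```c
#include <string.h>

/* Same element type and width as the test case in the thread
   (GCC/Clang generic vector extension).  */
typedef unsigned char v16ui __attribute__((vector_size(16)));

/* Shift by a scalar, written exactly as in the test case.  */
v16ui shift_by_scalar(v16ui v, unsigned char i)
{
  return v << i;
}

/* Shift by a vector.  The count arrives widened to unsigned int (as in
   the GIMPLE dump), is narrowed back to the element type, and is then
   broadcast ("splatted") into all 16 lanes.  This is only a C-level
   analogue of the TRUNCATE + broadcast that the patched expand_binop
   emits, not GCC's internal code.  */
v16ui shift_by_splat(v16ui v, unsigned int widened_count)
{
  unsigned char amt = (unsigned char) widened_count; /* narrow to lane width */
  v16ui splat = { amt, amt, amt, amt, amt, amt, amt, amt,
                  amt, amt, amt, amt, amt, amt, amt, amt };
  return v << splat;
}

/* Returns 1 if both forms agree for every count in 0..7, else 0.  */
int shift_forms_agree(void)
{
  v16ui v;
  for (int k = 0; k < 16; k++)
    v[k] = (unsigned char) (k * 17 + 1);

  for (unsigned int i = 0; i < 8; i++)
    {
      v16ui a = shift_by_scalar(v, (unsigned char) i);
      v16ui b = shift_by_splat(v, i);
      if (memcmp(&a, &b, sizeof a) != 0)
        return 0;
    }
  return 1;
}
```

A caller could, for example, check `shift_forms_agree() == 1` from `main`. The narrowing in `shift_by_splat` is value-preserving only while the count fits in the element type (a widened count of 256 would become 0); that is harmless here because the count started out as an `unsigned char` before gimplification widened it.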
Thread overview: 6+ messages
2015-08-31 21:23 [PATCH] Fix ICE when generating a vector shift by scalar Bill Schmidt
2015-09-01  9:01 ` Richard Biener
2015-09-01 15:54   ` Bill Schmidt
2015-09-02 12:44     ` Richard Biener
2015-09-02 21:53       ` Bill Schmidt
2015-09-03  9:42         ` Richard Biener