[PATCH] Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER_UNIT
@ 2024-06-28  1:14 liuhongt
  2024-06-28  6:01 ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: liuhongt @ 2024-06-28  1:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: crazylht, hjl.tools

for the testcase in the PR115406, here is part of the dump.

  char D.4882;
  vector(1) <signed-boolean:8> _1;
  vector(1) signed char _2;
  char _5;

  <bb 2> :
  _1 = { -1 };

When assign { -1 } to vector(1} {signed-boolean:8},
Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of dest
with each vector elemnet. But i think the bit setting should only apply for
TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1).
<signed-boolean:16>, it will be assigned as -1, instead of 1.
Is there any specific reason vector(1) <signed-boolean:8> is handled
differently from vector<1> <signed-boolean:16>?

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

	PR middle-end/115406
	* fold-const.cc (native_encode_vector_part): Don't set each
	bit to the dest when TYPE_PRECISION (itype) == BITS_PER_UNIT.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr115406.c: New test.
---
 gcc/fold-const.cc                        |  2 +-
 gcc/testsuite/gcc.target/i386/pr115406.c | 23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr115406.c

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 710d697c021..0f045f851d1 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -8077,7 +8077,7 @@ native_encode_vector_part (const_tree expr, unsigned char *ptr, int len,
 {
   tree itype = TREE_TYPE (TREE_TYPE (expr));
   if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
-      && TYPE_PRECISION (itype) <= BITS_PER_UNIT)
+      && TYPE_PRECISION (itype) < BITS_PER_UNIT)
     {
       /* This is the only case in which elements can be smaller than a byte.
 	 Element 0 is always in the lsb of the containing byte.  */
diff --git a/gcc/testsuite/gcc.target/i386/pr115406.c b/gcc/testsuite/gcc.target/i386/pr115406.c
new file mode 100644
index 00000000000..623dff06fc3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr115406.c
@@ -0,0 +1,23 @@
+/* { dg-do run } */
+/* { dg-options "-O0 -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+typedef __attribute__((__vector_size__ (1))) char V;
+
+char
+foo (V v)
+{
+  return ((V) v == v)[0];
+}
+
+int
+main ()
+{
+  if (!__builtin_cpu_supports ("avx512f"))
+    return 0;
+
+  char x = foo ((V) { });
+  if (x != -1)
+    __builtin_abort ();
+  return 0;
+}
-- 
2.31.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER_UNIT
  2024-06-28  1:14 [PATCH] Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER_UNIT liuhongt
@ 2024-06-28  6:01 ` Richard Biener
  2024-06-28  6:08   ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2024-06-28  6:01 UTC (permalink / raw)
  To: liuhongt, Richard Sandiford; +Cc: gcc-patches, crazylht, hjl.tools

On Fri, Jun 28, 2024 at 3:15 AM liuhongt <hongtao.liu@intel.com> wrote:
>
> for the testcase in the PR115406, here is part of the dump.
>
>   char D.4882;
>   vector(1) <signed-boolean:8> _1;
>   vector(1) signed char _2;
>   char _5;
>
>   <bb 2> :
>   _1 = { -1 };
>
> When assign { -1 } to vector(1} {signed-boolean:8},
> Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of dest
> with each vector elemnet. But i think the bit setting should only apply for
> TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1).
> <signed-boolean:16>, it will be assigned as -1, instead of 1.
> Is there any specific reason vector(1) <signed-boolean:8> is handled
> differently from vector<1> <signed-boolean:16>?
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?

I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision
code should work for 8 bit
entities as well, it seems we only set the LSB of each element in the
"mask".  ISTR that SVE
masks can have up to 8 bit elements (for 8 byte data elements), so
maybe that's why
<= BITS_PER_UNIT.  So maybe instead of just setting one bit in

              ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT);

we should set elt_bits bits, aka (without testing)

              ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit
% BITS_PER_UNIT);

?

> gcc/ChangeLog:
>
>         PR middle-end/115406
>         * fold-const.cc (native_encode_vector_part): Don't set each
>         bit to the dest when TYPE_PRECISION (itype) == BITS_PER_UNIT.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/pr115406.c: New test.
> ---
>  gcc/fold-const.cc                        |  2 +-
>  gcc/testsuite/gcc.target/i386/pr115406.c | 23 +++++++++++++++++++++++
>  2 files changed, 24 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr115406.c
>
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 710d697c021..0f045f851d1 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -8077,7 +8077,7 @@ native_encode_vector_part (const_tree expr, unsigned char *ptr, int len,
>  {
>    tree itype = TREE_TYPE (TREE_TYPE (expr));
>    if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
> -      && TYPE_PRECISION (itype) <= BITS_PER_UNIT)
> +      && TYPE_PRECISION (itype) < BITS_PER_UNIT)
>      {
>        /* This is the only case in which elements can be smaller than a byte.
>          Element 0 is always in the lsb of the containing byte.  */
> diff --git a/gcc/testsuite/gcc.target/i386/pr115406.c b/gcc/testsuite/gcc.target/i386/pr115406.c
> new file mode 100644
> index 00000000000..623dff06fc3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr115406.c
> @@ -0,0 +1,23 @@
> +/* { dg-do run } */
> +/* { dg-options "-O0 -mavx512f" } */
> +/* { dg-require-effective-target avx512f } */
> +
> +typedef __attribute__((__vector_size__ (1))) char V;
> +
> +char
> +foo (V v)
> +{
> +  return ((V) v == v)[0];
> +}
> +
> +int
> +main ()
> +{
> +  if (!__builtin_cpu_supports ("avx512f"))
> +    return 0;
> +
> +  char x = foo ((V) { });
> +  if (x != -1)
> +    __builtin_abort ();
> +  return 0;
> +}
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER_UNIT
  2024-06-28  6:01 ` Richard Biener
@ 2024-06-28  6:08   ` Richard Biener
  2024-06-28  8:27     ` Richard Sandiford
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2024-06-28  6:08 UTC (permalink / raw)
  To: liuhongt, Richard Sandiford; +Cc: gcc-patches, crazylht, hjl.tools

On Fri, Jun 28, 2024 at 8:01 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Fri, Jun 28, 2024 at 3:15 AM liuhongt <hongtao.liu@intel.com> wrote:
> >
> > for the testcase in the PR115406, here is part of the dump.
> >
> >   char D.4882;
> >   vector(1) <signed-boolean:8> _1;
> >   vector(1) signed char _2;
> >   char _5;
> >
> >   <bb 2> :
> >   _1 = { -1 };
> >
> > When assign { -1 } to vector(1} {signed-boolean:8},
> > Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of dest
> > with each vector elemnet. But i think the bit setting should only apply for
> > TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1).
> > <signed-boolean:16>, it will be assigned as -1, instead of 1.
> > Is there any specific reason vector(1) <signed-boolean:8> is handled
> > differently from vector<1> <signed-boolean:16>?
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk?
>
> I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision
> code should work for 8 bit
> entities as well, it seems we only set the LSB of each element in the
> "mask".  ISTR that SVE
> masks can have up to 8 bit elements (for 8 byte data elements), so
> maybe that's why
> <= BITS_PER_UNIT.  So maybe instead of just setting one bit in
>
>               ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT);
>
> we should set elt_bits bits, aka (without testing)
>
>               ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit
> % BITS_PER_UNIT);
>
> ?

Alternatively

  if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
      && TYPE_PRECISION (itype) <= BITS_PER_UNIT)

should be amended with

   && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT

maybe.  Still for the possibility of vector(n) <signed-boolean:16>
mask for a int128 element vector
we'd have 16bit mask elements, encoding that differently would be
inconsistent as well
(but of course 16bit elements are not handled by the code right now).

Richard.

> > gcc/ChangeLog:
> >
> >         PR middle-end/115406
> >         * fold-const.cc (native_encode_vector_part): Don't set each
> >         bit to the dest when TYPE_PRECISION (itype) == BITS_PER_UNIT.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/i386/pr115406.c: New test.
> > ---
> >  gcc/fold-const.cc                        |  2 +-
> >  gcc/testsuite/gcc.target/i386/pr115406.c | 23 +++++++++++++++++++++++
> >  2 files changed, 24 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr115406.c
> >
> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > index 710d697c021..0f045f851d1 100644
> > --- a/gcc/fold-const.cc
> > +++ b/gcc/fold-const.cc
> > @@ -8077,7 +8077,7 @@ native_encode_vector_part (const_tree expr, unsigned char *ptr, int len,
> >  {
> >    tree itype = TREE_TYPE (TREE_TYPE (expr));
> >    if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
> > -      && TYPE_PRECISION (itype) <= BITS_PER_UNIT)
> > +      && TYPE_PRECISION (itype) < BITS_PER_UNIT)
> >      {
> >        /* This is the only case in which elements can be smaller than a byte.
> >          Element 0 is always in the lsb of the containing byte.  */
> > diff --git a/gcc/testsuite/gcc.target/i386/pr115406.c b/gcc/testsuite/gcc.target/i386/pr115406.c
> > new file mode 100644
> > index 00000000000..623dff06fc3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr115406.c
> > @@ -0,0 +1,23 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-O0 -mavx512f" } */
> > +/* { dg-require-effective-target avx512f } */
> > +
> > +typedef __attribute__((__vector_size__ (1))) char V;
> > +
> > +char
> > +foo (V v)
> > +{
> > +  return ((V) v == v)[0];
> > +}
> > +
> > +int
> > +main ()
> > +{
> > +  if (!__builtin_cpu_supports ("avx512f"))
> > +    return 0;
> > +
> > +  char x = foo ((V) { });
> > +  if (x != -1)
> > +    __builtin_abort ();
> > +  return 0;
> > +}
> > --
> > 2.31.1
> >

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER_UNIT
  2024-06-28  6:08   ` Richard Biener
@ 2024-06-28  8:27     ` Richard Sandiford
  2024-06-28  9:06       ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Sandiford @ 2024-06-28  8:27 UTC (permalink / raw)
  To: Richard Biener; +Cc: liuhongt, gcc-patches, crazylht, hjl.tools

Richard Biener <richard.guenther@gmail.com> writes:
> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
>>
>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt <hongtao.liu@intel.com> wrote:
>> >
>> > for the testcase in the PR115406, here is part of the dump.
>> >
>> >   char D.4882;
>> >   vector(1) <signed-boolean:8> _1;
>> >   vector(1) signed char _2;
>> >   char _5;
>> >
>> >   <bb 2> :
>> >   _1 = { -1 };
>> >
>> > When assign { -1 } to vector(1} {signed-boolean:8},
>> > Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of dest
>> > with each vector elemnet. But i think the bit setting should only apply for
>> > TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1).
>> > <signed-boolean:16>, it will be assigned as -1, instead of 1.
>> > Is there any specific reason vector(1) <signed-boolean:8> is handled
>> > differently from vector<1> <signed-boolean:16>?
>> >
>> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>> > Ok for trunk?
>>
>> I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision
>> code should work for 8 bit
>> entities as well, it seems we only set the LSB of each element in the
>> "mask".  ISTR that SVE
>> masks can have up to 8 bit elements (for 8 byte data elements), so
>> maybe that's why
>> <= BITS_PER_UNIT.

Yeah.

>>  So maybe instead of just setting one bit in
>>
>>               ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT);
>>
>> we should set elt_bits bits, aka (without testing)
>>
>>               ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit
>> % BITS_PER_UNIT);
>>
>> ?
>
> Alternatively
>
>   if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
>       && TYPE_PRECISION (itype) <= BITS_PER_UNIT)
>
> should be amended with
>
>    && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT

How about:

  if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) == MODE_VECTOR_BOOL)
    {
      gcc_assert (TYPE_PRECISION (itype) <= BITS_PER_UNIT);

?

Is it OK for TYPE_MODE to affect tree-level semantics though, especially
since it can change with the target attribute?  (At least TYPE_MODE_RAW
would be stable.)

> maybe.  Still for the possibility of vector(n) <signed-boolean:16>
> mask for a int128 element vector
> we'd have 16bit mask elements, encoding that differently would be
> inconsistent as well
> (but of course 16bit elements are not handled by the code right now).

Yeah, 16-bit predicate elements aren't a thing for SVE, so we've not
had to add support for them.

Richard

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER_UNIT
  2024-06-28  8:27     ` Richard Sandiford
@ 2024-06-28  9:06       ` Richard Biener
  2024-06-28 12:16         ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2024-06-28  9:06 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: liuhongt, gcc-patches, crazylht, hjl.tools



> Am 28.06.2024 um 10:27 schrieb Richard Sandiford <richard.sandiford@arm.com>:
> 
> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>> 
>>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt <hongtao.liu@intel.com> wrote:
>>>> 
>>>> for the testcase in the PR115406, here is part of the dump.
>>>> 
>>>>  char D.4882;
>>>>  vector(1) <signed-boolean:8> _1;
>>>>  vector(1) signed char _2;
>>>>  char _5;
>>>> 
>>>>  <bb 2> :
>>>>  _1 = { -1 };
>>>> 
>>>> When assign { -1 } to vector(1} {signed-boolean:8},
>>>> Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of dest
>>>> with each vector elemnet. But i think the bit setting should only apply for
>>>> TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1).
>>>> <signed-boolean:16>, it will be assigned as -1, instead of 1.
>>>> Is there any specific reason vector(1) <signed-boolean:8> is handled
>>>> differently from vector<1> <signed-boolean:16>?
>>>> 
>>>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>>>> Ok for trunk?
>>> 
>>> I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision
>>> code should work for 8 bit
>>> entities as well, it seems we only set the LSB of each element in the
>>> "mask".  ISTR that SVE
>>> masks can have up to 8 bit elements (for 8 byte data elements), so
>>> maybe that's why
>>> <= BITS_PER_UNIT.
> 
> Yeah.

So is it necessary that only one bit is set for SVE?

>>> So maybe instead of just setting one bit in
>>> 
>>>              ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT);
>>> 
>>> we should set elt_bits bits, aka (without testing)
>>> 
>>>              ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit
>>> % BITS_PER_UNIT);
>>> 
>>> ?
>> 
>> Alternatively
>> 
>>  if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
>>      && TYPE_PRECISION (itype) <= BITS_PER_UNIT)
>> 
>> should be amended with
>> 
>>   && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT
> 
> How about:
> 
>  if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) == MODE_VECTOR_BOOL)
>    {
>      gcc_assert (TYPE_PRECISION (itype) <= BITS_PER_UNIT);
> 
> ?

Note the path is also necessary for avx512 and gcn mask modes which are integer modes.

> Is it OK for TYPE_MODE to affect tree-level semantics though, especially
> since it can change with the target attribute?  (At least TYPE_MODE_RAW
> would be stable.)

That’s a good question and also related to GCC vector extension which can result in both BLKmode and integer modes to be used.  But I’m not sure how we expose masks to the middle end here.  A too large vector bool could be lowered to AVX512 mode.  Maybe we should simply reject interpret/encode of BLKmode vectors and make sure to never assign integer modes to vector bools (if the target didn’t specify that mode)?

I guess some test coverage would be nice here.

>> maybe.  Still for the possibility of vector(n) <signed-boolean:16>
>> mask for a int128 element vector
>> we'd have 16bit mask elements, encoding that differently would be
>> inconsistent as well
>> (but of course 16bit elements are not handled by the code right now).
> 
> Yeah, 16-bit predicate elements aren't a thing for SVE, so we've not
> had to add support for them.
> 
> Richard

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER_UNIT
  2024-06-28  9:06       ` Richard Biener
@ 2024-06-28 12:16         ` Richard Biener
  2024-06-28 12:22           ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2024-06-28 12:16 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: liuhongt, gcc-patches, crazylht, hjl.tools

On Fri, Jun 28, 2024 at 11:06 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
>
>
> > Am 28.06.2024 um 10:27 schrieb Richard Sandiford <richard.sandiford@arm.com>:
> >
> > Richard Biener <richard.guenther@gmail.com> writes:
> >>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener
> >>> <richard.guenther@gmail.com> wrote:
> >>>
> >>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt <hongtao.liu@intel.com> wrote:
> >>>>
> >>>> for the testcase in the PR115406, here is part of the dump.
> >>>>
> >>>>  char D.4882;
> >>>>  vector(1) <signed-boolean:8> _1;
> >>>>  vector(1) signed char _2;
> >>>>  char _5;
> >>>>
> >>>>  <bb 2> :
> >>>>  _1 = { -1 };
> >>>>
> >>>> When assign { -1 } to vector(1} {signed-boolean:8},
> >>>> Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of dest
> >>>> with each vector elemnet. But i think the bit setting should only apply for
> >>>> TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1).
> >>>> <signed-boolean:16>, it will be assigned as -1, instead of 1.
> >>>> Is there any specific reason vector(1) <signed-boolean:8> is handled
> >>>> differently from vector<1> <signed-boolean:16>?
> >>>>
> >>>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> >>>> Ok for trunk?
> >>>
> >>> I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision
> >>> code should work for 8 bit
> >>> entities as well, it seems we only set the LSB of each element in the
> >>> "mask".  ISTR that SVE
> >>> masks can have up to 8 bit elements (for 8 byte data elements), so
> >>> maybe that's why
> >>> <= BITS_PER_UNIT.
> >
> > Yeah.
>
> So is it necessary that only one bit is set for SVE?
>
> >>> So maybe instead of just setting one bit in
> >>>
> >>>              ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT);
> >>>
> >>> we should set elt_bits bits, aka (without testing)
> >>>
> >>>              ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit
> >>> % BITS_PER_UNIT);
> >>>
> >>> ?
> >>
> >> Alternatively
> >>
> >>  if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
> >>      && TYPE_PRECISION (itype) <= BITS_PER_UNIT)
> >>
> >> should be amended with
> >>
> >>   && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT
> >
> > How about:
> >
> >  if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) == MODE_VECTOR_BOOL)
> >    {
> >      gcc_assert (TYPE_PRECISION (itype) <= BITS_PER_UNIT);
> >
> > ?
>
> Note the path is also necessary for avx512 and gcn mask modes which are integer modes.
>
> > Is it OK for TYPE_MODE to affect tree-level semantics though, especially
> > since it can change with the target attribute?  (At least TYPE_MODE_RAW
> > would be stable.)
>
> That’s a good question and also related to GCC vector extension which can result in both BLKmode and integer modes to be used.  But I’m not sure how we expose masks to the middle end here.  A too large vector bool could be lowered to AVX512 mode.  Maybe we should simply reject interpret/encode of BLKmode vectors and make sure to never assign integer modes to vector bools (if the target didn’t specify that mode)?
>
> I guess some test coverage would be nice here.

To continue on that, we do not currently have a way to capture a
vector comparison output
but the C++ frontend has vector ?:

typedef int v8si __attribute__((vector_size(32)));

void foo (v8si *a, v8si *b, v8si *c)
{
  *c = *a < *b ? (v8si){-1,-1,-1,-1,-1,-1,-1,-1 } : (v8si){0,0,0,0,0,0,0,0};
}

with SSE2 we get a <signed-boolean:32> temporary, with AVX512 enabled
that becomes <singed-boolean:1>.  When we enlarge the vector to size 128
then even with AVX512 enabled I see <signed-boolean:32> here and
vector lowering decomposes that to scalar (also with AVX or SSE, so maybe
just a missed optimization).  But note that to decompose this into two
AVX512 vectors the temporary would have to change from <signed-boolean:32>
elements to <signed-boolean:1>.

The not supported vector bool types have BLKmode sofar.

But for example on i?86-linux with -mno-sse (like -march=i586) for

typedef short v2hi __attribute__((vector_size(4)));

void foo (v2hi *a, v2hi *b, v2hi *c)
{
  *c = *a < *b ? (v2hi){-1,-1} : (v2hi){0,0};
}

we get a SImode vector <signed-boolean:16> as I feared.  That means
<signed-boolean:8> (the BITS_PER_UNIT case) can be ambiguous
between SVE (bool for a 8byte data vector) and emulated vectors
("word-mode" vectors; for 1byte data vectors).

And without knowing that SVE would have used VnBImode given that
AVX512 uses an integer mode.

Aside from the too large vector and AVX512 issue above I think we can use
MODE_VECTOR_BOOL || TYPE_PRECISION == 1 and for the latter we
can assert the mode is a scalar integer mode (AVX512 or GCN)?

Richard.


> >> maybe.  Still for the possibility of vector(n) <signed-boolean:16>
> >> mask for a int128 element vector
> >> we'd have 16bit mask elements, encoding that differently would be
> >> inconsistent as well
> >> (but of course 16bit elements are not handled by the code right now).
> >
> > Yeah, 16-bit predicate elements aren't a thing for SVE, so we've not
> > had to add support for them.
> >
> > Richard

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER_UNIT
  2024-06-28 12:16         ` Richard Biener
@ 2024-06-28 12:22           ` Richard Biener
  2024-06-28 12:28             ` Richard Sandiford
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2024-06-28 12:22 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: liuhongt, gcc-patches, crazylht, hjl.tools

[-- Attachment #1: Type: text/plain, Size: 5195 bytes --]

On Fri, Jun 28, 2024 at 2:16 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Fri, Jun 28, 2024 at 11:06 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> >
> >
> > > Am 28.06.2024 um 10:27 schrieb Richard Sandiford <richard.sandiford@arm.com>:
> > >
> > > Richard Biener <richard.guenther@gmail.com> writes:
> > >>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener
> > >>> <richard.guenther@gmail.com> wrote:
> > >>>
> > >>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt <hongtao.liu@intel.com> wrote:
> > >>>>
> > >>>> for the testcase in the PR115406, here is part of the dump.
> > >>>>
> > >>>>  char D.4882;
> > >>>>  vector(1) <signed-boolean:8> _1;
> > >>>>  vector(1) signed char _2;
> > >>>>  char _5;
> > >>>>
> > >>>>  <bb 2> :
> > >>>>  _1 = { -1 };
> > >>>>
> > >>>> When assign { -1 } to vector(1} {signed-boolean:8},
> > >>>> Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of dest
> > >>>> with each vector elemnet. But i think the bit setting should only apply for
> > >>>> TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1).
> > >>>> <signed-boolean:16>, it will be assigned as -1, instead of 1.
> > >>>> Is there any specific reason vector(1) <signed-boolean:8> is handled
> > >>>> differently from vector<1> <signed-boolean:16>?
> > >>>>
> > >>>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > >>>> Ok for trunk?
> > >>>
> > >>> I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision
> > >>> code should work for 8 bit
> > >>> entities as well, it seems we only set the LSB of each element in the
> > >>> "mask".  ISTR that SVE
> > >>> masks can have up to 8 bit elements (for 8 byte data elements), so
> > >>> maybe that's why
> > >>> <= BITS_PER_UNIT.
> > >
> > > Yeah.
> >
> > So is it necessary that only one bit is set for SVE?
> >
> > >>> So maybe instead of just setting one bit in
> > >>>
> > >>>              ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT);
> > >>>
> > >>> we should set elt_bits bits, aka (without testing)
> > >>>
> > >>>              ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit
> > >>> % BITS_PER_UNIT);
> > >>>
> > >>> ?
> > >>
> > >> Alternatively
> > >>
> > >>  if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
> > >>      && TYPE_PRECISION (itype) <= BITS_PER_UNIT)
> > >>
> > >> should be amended with
> > >>
> > >>   && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT
> > >
> > > How about:
> > >
> > >  if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) == MODE_VECTOR_BOOL)
> > >    {
> > >      gcc_assert (TYPE_PRECISION (itype) <= BITS_PER_UNIT);
> > >
> > > ?
> >
> > Note the path is also necessary for avx512 and gcn mask modes which are integer modes.
> >
> > > Is it OK for TYPE_MODE to affect tree-level semantics though, especially
> > > since it can change with the target attribute?  (At least TYPE_MODE_RAW
> > > would be stable.)
> >
> > That’s a good question and also related to GCC vector extension which can result in both BLKmode and integer modes to be used.  But I’m not sure how we expose masks to the middle end here.  A too large vector bool could be lowered to AVX512 mode.  Maybe we should simply reject interpret/encode of BLKmode vectors and make sure to never assign integer modes to vector bools (if the target didn’t specify that mode)?
> >
> > I guess some test coverage would be nice here.
>
> To continue on that, we do not currently have a way to capture a
> vector comparison output
> but the C++ frontend has vector ?:
>
> typedef int v8si __attribute__((vector_size(32)));
>
> void foo (v8si *a, v8si *b, v8si *c)
> {
>   *c = *a < *b ? (v8si){-1,-1,-1,-1,-1,-1,-1,-1 } : (v8si){0,0,0,0,0,0,0,0};
> }
>
> with SSE2 we get a <signed-boolean:32> temporary, with AVX512 enabled
> that becomes <singed-boolean:1>.  When we enlarge the vector to size 128
> then even with AVX512 enabled I see <signed-boolean:32> here and
> vector lowering decomposes that to scalar (also with AVX or SSE, so maybe
> just a missed optimization).  But note that to decompose this into two
> AVX512 vectors the temporary would have to change from <signed-boolean:32>
> elements to <signed-boolean:1>.
>
> The not supported vector bool types have BLKmode sofar.
>
> But for example on i?86-linux with -mno-sse (like -march=i586) for
>
> typedef short v2hi __attribute__((vector_size(4)));
>
> void foo (v2hi *a, v2hi *b, v2hi *c)
> {
>   *c = *a < *b ? (v2hi){-1,-1} : (v2hi){0,0};
> }
>
> we get a SImode vector <signed-boolean:16> as I feared.  That means
> <signed-boolean:8> (the BITS_PER_UNIT case) can be ambiguous
> between SVE (bool for a 8byte data vector) and emulated vectors
> ("word-mode" vectors; for 1byte data vectors).
>
> And without knowing that SVE would have used VnBImode given that
> AVX512 uses an integer mode.
>
> Aside from the too large vector and AVX512 issue above I think we can use
> MODE_VECTOR_BOOL || TYPE_PRECISION == 1 and for the latter we
> can assert the mode is a scalar integer mode (AVX512 or GCN)?

So like the attached?

Richard.

[-- Attachment #2: p --]
[-- Type: application/octet-stream, Size: 1566 bytes --]

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 8ff42806497..1aeb4c8b467 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -8077,11 +8077,13 @@ native_encode_vector_part (const_tree expr, unsigned char *ptr, int len,
 {
   tree itype = TREE_TYPE (TREE_TYPE (expr));
   if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
-      && TYPE_PRECISION (itype) <= BITS_PER_UNIT)
+      && (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) == MODE_VECTOR_BOOL
+	  || TYPE_PRECISION (itype) == 1))
     {
       /* This is the only case in which elements can be smaller than a byte.
 	 Element 0 is always in the lsb of the containing byte.  */
       unsigned int elt_bits = TYPE_PRECISION (itype);
+      gcc_assert (elt_bits <= BITS_PER_UNIT);
       int total_bytes = CEIL (elt_bits * count, BITS_PER_UNIT);
       if ((off == -1 && total_bytes > len) || off >= total_bytes)
 	return 0;
@@ -8966,11 +8968,13 @@ native_interpret_vector_part (tree type, const unsigned char *bytes,
 {
   tree elt_type = TREE_TYPE (type);
   if (VECTOR_BOOLEAN_TYPE_P (type)
-      && TYPE_PRECISION (elt_type) <= BITS_PER_UNIT)
+      && (GET_MODE_CLASS (TYPE_MODE (type)) == MODE_VECTOR_BOOL
+	  || TYPE_PRECISION (elt_type) == 1))
     {
       /* This is the only case in which elements can be smaller than a byte.
 	 Element 0 is always in the lsb of the containing byte.  */
       unsigned int elt_bits = TYPE_PRECISION (elt_type);
+      gcc_assert (elt_bits <= BITS_PER_UNIT);
       if (elt_bits * npatterns * nelts_per_pattern > len * BITS_PER_UNIT)
 	return NULL_TREE;
 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER_UNIT
  2024-06-28 12:22           ` Richard Biener
@ 2024-06-28 12:28             ` Richard Sandiford
  0 siblings, 0 replies; 8+ messages in thread
From: Richard Sandiford @ 2024-06-28 12:28 UTC (permalink / raw)
  To: Richard Biener; +Cc: liuhongt, gcc-patches, crazylht, hjl.tools

Richard Biener <richard.guenther@gmail.com> writes:
> On Fri, Jun 28, 2024 at 2:16 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
>>
>> On Fri, Jun 28, 2024 at 11:06 AM Richard Biener
>> <richard.guenther@gmail.com> wrote:
>> >
>> >
>> >
>> > > Am 28.06.2024 um 10:27 schrieb Richard Sandiford <richard.sandiford@arm.com>:
>> > >
>> > > Richard Biener <richard.guenther@gmail.com> writes:
>> > >>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener
>> > >>> <richard.guenther@gmail.com> wrote:
>> > >>>
>> > >>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt <hongtao.liu@intel.com> wrote:
>> > >>>>
>> > >>>> for the testcase in the PR115406, here is part of the dump.
>> > >>>>
>> > >>>>  char D.4882;
>> > >>>>  vector(1) <signed-boolean:8> _1;
>> > >>>>  vector(1) signed char _2;
>> > >>>>  char _5;
>> > >>>>
>> > >>>>  <bb 2> :
>> > >>>>  _1 = { -1 };
>> > >>>>
>> > >>>> When assign { -1 } to vector(1} {signed-boolean:8},
>> > >>>> Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of dest
>> > >>>> with each vector elemnet. But i think the bit setting should only apply for
>> > >>>> TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1).
>> > >>>> <signed-boolean:16>, it will be assigned as -1, instead of 1.
>> > >>>> Is there any specific reason vector(1) <signed-boolean:8> is handled
>> > >>>> differently from vector<1> <signed-boolean:16>?
>> > >>>>
>> > >>>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>> > >>>> Ok for trunk?
>> > >>>
>> > >>> I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision
>> > >>> code should work for 8 bit
>> > >>> entities as well, it seems we only set the LSB of each element in the
>> > >>> "mask".  ISTR that SVE
>> > >>> masks can have up to 8 bit elements (for 8 byte data elements), so
>> > >>> maybe that's why
>> > >>> <= BITS_PER_UNIT.
>> > >
>> > > Yeah.
>> >
>> > So is it necessary that only one bit is set for SVE?

TBH I can't remember now.  It matches what SVE instructions produce, and
lines up with the associated RTL code (which at the time was SVE-specific).
But when dealing with multibyte elements, upper predicate element bits
are ignored on read, so matching the instructions might not matter.

>> > >>> So maybe instead of just setting one bit in
>> > >>>
>> > >>>              ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT);
>> > >>>
>> > >>> we should set elt_bits bits, aka (without testing)
>> > >>>
>> > >>>              ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit
>> > >>> % BITS_PER_UNIT);
>> > >>>
>> > >>> ?
>> > >>
>> > >> Alternatively
>> > >>
>> > >>  if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
>> > >>      && TYPE_PRECISION (itype) <= BITS_PER_UNIT)
>> > >>
>> > >> should be amended with
>> > >>
>> > >>   && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT
>> > >
>> > > How about:
>> > >
>> > >  if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) == MODE_VECTOR_BOOL)
>> > >    {
>> > >      gcc_assert (TYPE_PRECISION (itype) <= BITS_PER_UNIT);
>> > >
>> > > ?
>> >
>> > Note the path is also necessary for avx512 and gcn mask modes which are integer modes.
>> >
>> > > Is it OK for TYPE_MODE to affect tree-level semantics though, especially
>> > > since it can change with the target attribute?  (At least TYPE_MODE_RAW
>> > > would be stable.)
>> >
>> > That’s a good question and also related to GCC vector extension which can result in both BLKmode and integer modes to be used.  But I’m not sure how we expose masks to the middle end here.  A too large vector bool could be lowered to AVX512 mode.  Maybe we should simply reject interpret/encode of BLKmode vectors and make sure to never assign integer modes to vector bools (if the target didn’t specify that mode)?
>> >
>> > I guess some test coverage would be nice here.
>>
>> To continue on that, we do not currently have a way to capture a
>> vector comparison output
>> but the C++ frontend has vector ?:
>>
>> typedef int v8si __attribute__((vector_size(32)));
>>
>> void foo (v8si *a, v8si *b, v8si *c)
>> {
>>   *c = *a < *b ? (v8si){-1,-1,-1,-1,-1,-1,-1,-1 } : (v8si){0,0,0,0,0,0,0,0};
>> }
>>
>> with SSE2 we get a <signed-boolean:32> temporary, with AVX512 enabled
>> that becomes <singed-boolean:1>.  When we enlarge the vector to size 128
>> then even with AVX512 enabled I see <signed-boolean:32> here and
>> vector lowering decomposes that to scalar (also with AVX or SSE, so maybe
>> just a missed optimization).  But note that to decompose this into two
>> AVX512 vectors the temporary would have to change from <signed-boolean:32>
>> elements to <signed-boolean:1>.
>>
>> The not supported vector bool types have BLKmode sofar.
>>
>> But for example on i?86-linux with -mno-sse (like -march=i586) for
>>
>> typedef short v2hi __attribute__((vector_size(4)));
>>
>> void foo (v2hi *a, v2hi *b, v2hi *c)
>> {
>>   *c = *a < *b ? (v2hi){-1,-1} : (v2hi){0,0};
>> }
>>
>> we get a SImode vector <signed-boolean:16> as I feared.  That means
>> <signed-boolean:8> (the BITS_PER_UNIT case) can be ambiguous
>> between SVE (bool for a 8byte data vector) and emulated vectors
>> ("word-mode" vectors; for 1byte data vectors).
>>
>> And without knowing that SVE would have used VnBImode given that
>> AVX512 uses an integer mode.
>>
>> Aside from the too large vector and AVX512 issue above I think we can use
>> MODE_VECTOR_BOOL || TYPE_PRECISION == 1 and for the latter we
>> can assert the mode is a scalar integer mode (AVX512 or GCN)?
>
> So like the attached?

Can you give me a few days to see whether swapping to -1 : 0 masks
works for SVE?  I guess it would make things easier in the long run.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-06-28 12:28 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-28  1:14 [PATCH] Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER_UNIT liuhongt
2024-06-28  6:01 ` Richard Biener
2024-06-28  6:08   ` Richard Biener
2024-06-28  8:27     ` Richard Sandiford
2024-06-28  9:06       ` Richard Biener
2024-06-28 12:16         ` Richard Biener
2024-06-28 12:22           ` Richard Biener
2024-06-28 12:28             ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).