Re: [PATCH] doc: clarify semantics of vector bitwise shifts

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* Re: [PATCH] doc: clarify semantics of vector bitwise shifts
@ 2023-05-24 16:28 Richard Biener
  2023-05-24 18:36 ` Alexander Monakov
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Biener @ 2023-05-24 16:28 UTC (permalink / raw)
  To: Alexander Monakov, gcc-patches



> Am 24.05.2023 um 16:21 schrieb Alexander Monakov <amonakov@ispras.ru>:
> 
> 
>> On Wed, 24 May 2023, Richard Biener wrote:
>>> On Wed, May 24, 2023 at 2:54 PM Alexander Monakov via Gcc-patches
>>> <gcc-patches@gcc.gnu.org> wrote:
>>> Explicitly say that bitwise shifts for narrow types work similar to
>>> element-wise C shifts with integer promotions, which coincides with
>>> OpenCL semantics.
>> Do we need to clarify that v << w with v being a vector of shorts
>> still yields a vector of shorts and not a vector of ints?
> 
> I don't think so, but if necessary we could add "and the result was
> truncated back to the base type":
> 
>   When the base type is narrower than @code{int}, element-wise shifts
>   are performed as if operands underwent C integer promotions, and
>   the result was truncated back to the base type, like in OpenCL. 
> 
>> Btw, I don't see this promotion reflected in the IL.  For
>> typedef short v8hi __attribute__((vector_size(16)));
>> v8hi foo (v8hi a, v8hi b)
>> {
>> return a << b;
>> }
>> I get no masking of 'b' and vector lowering if the target doens't handle it
>> yields
>> short int _5;
>> short int _6;
>> _5 = BIT_FIELD_REF <a_1(D), 16, 0>;
>> _6 = BIT_FIELD_REF <b_2(D), 16, 0>;
>> _7 = _5 << _6;
>> which we could derive ranges from for _6 (apparantly we don't yet).
> 
> Here it depends on how we define the GIMPLE-level semantics of bit-shift
> operators for narrow types. To avoid changing lowering we could say that
> shifting by up to 31 bits is well-defined for narrow types.
> 
> RTL-level semantics are also undocumented, unfortunately.
> 
>> Even
>> typedef int v8hi __attribute__((vector_size(16)));
>> v8hi x;
>> int foo (v8hi a, v8hi b)
>> {
>> x = a << b;
>> return (b[0] > 33);
>> }
>> isn't optimized currently (but could - note I've used 'int' elements here).
> 
> Yeah. But let's constrain the optimizations first.
> 
>> So, I don't see us making sure the hardware does the right thing for
>> out-of bound values.
> 
> I think in practice it worked out even if GCC did not pay attention to it,
> because SIMD instructions had to facilitate autovectorization for C with
> corresponding shift semantics.

I’d have to check the ISAs what they actually do here - it of course depends on RTL semantics as well but as you say those are not strictly defined here either.

I agree we can go with smaller types than int behave as if promoted (also for scalars for consistency).  Those operations do not exist in the C standard after all (maybe with _BitInt it’s now a thing)

Richard.

> Alexander
> 
>> Richard.
>>> gcc/ChangeLog:
>>>       * doc/extend.texi (Vector Extensions): Clarify bitwise shift
>>>       semantics.
>>> ---
>>> gcc/doc/extend.texi | 7 ++++++-
>>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>>> index e426a2eb7d..6b4e94b6a1 100644
>>> --- a/gcc/doc/extend.texi
>>> +++ b/gcc/doc/extend.texi
>>> @@ -12026,7 +12026,12 @@ elements in the operand.
>>> It is possible to use shifting operators @code{<<}, @code{>>} on
>>> integer-type vectors. The operation is defined as following: @code{@{a0,
>>> a1, @dots{}, an@} >> @{b0, b1, @dots{}, bn@} == @{a0 >> b0, a1 >> b1,
>>> -@dots{}, an >> bn@}}@. Vector operands must have the same number of
>>> +@dots{}, an >> bn@}}@.  When the base type is narrower than @code{int},
>>> +element-wise shifts are performed as if operands underwent C integer
>>> +promotions, like in OpenCL.  This makes vector shifts by up to 31 bits
>>> +well-defined for vectors with @code{char} and @code{short} base types.
>>> +
>>> +Operands of binary vector operations must have the same number of
>>> elements.
>>> For convenience, it is allowed to use a binary vector operation
>>> --
>>> 2.39.2

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] doc: clarify semantics of vector bitwise shifts
  2023-05-24 16:28 [PATCH] doc: clarify semantics of vector bitwise shifts Richard Biener
@ 2023-05-24 18:36 ` Alexander Monakov
  2023-05-25  6:50   ` Richard Biener
  0 siblings, 1 reply; 17+ messages in thread
From: Alexander Monakov @ 2023-05-24 18:36 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1960 bytes --]


On Wed, 24 May 2023, Richard Biener via Gcc-patches wrote:

> I’d have to check the ISAs what they actually do here - it of course depends
> on RTL semantics as well but as you say those are not strictly defined here
> either.

Plus, we can add the following executable test to the testsuite:

#include <stdint.h>

#define CHECK(TYPE, WIDTH, OP, COUNT, INVERT)         \
{                                                     \
typedef TYPE vec __attribute__((vector_size(WIDTH))); \
                                                      \
	static volatile vec zero;                     \
	vec tmp = (zero-2) OP (COUNT);                \
	vec ref = INVERT zero;                        \
	if (__builtin_memcmp(&tmp, &ref, sizeof tmp)) \
		__builtin_abort();                    \
}

int main(void)
{
	CHECK( uint8_t, 16, <<, 8,  )
	CHECK( uint8_t, 16, <<, 31, )
	CHECK( uint8_t, 16, >>, 8,  )
	CHECK( uint8_t, 16, >>, 31, )
	CHECK(  int8_t, 16, <<, 8,  )
	CHECK(  int8_t, 16, <<, 31, )
	CHECK(  int8_t, 16, >>, 8,  ~)
	CHECK(  int8_t, 16, >>, 31, ~)
	CHECK(uint16_t, 16, <<, 16, )
	CHECK(uint16_t, 16, <<, 31, )
	CHECK(uint16_t, 16, >>, 16, )
	CHECK(uint16_t, 16, >>, 31, )
	CHECK( int16_t, 16, <<, 16, )
	CHECK( int16_t, 16, <<, 31, )
	CHECK( int16_t, 16, >>, 16, ~)
	CHECK( int16_t, 16, >>, 31, ~)
	// Per-lane-variable shifts:
	CHECK( uint8_t, 16, <<, zero+8,  )
	CHECK( uint8_t, 16, <<, zero+31, )
	CHECK( uint8_t, 16, >>, zero+8,  )
	CHECK( uint8_t, 16, >>, zero+31, )
	CHECK(  int8_t, 16, <<, zero+8,  )
	CHECK(  int8_t, 16, <<, zero+31, )
	CHECK(  int8_t, 16, >>, zero+8,  ~)
	CHECK(  int8_t, 16, >>, zero+31, ~)
	CHECK(uint16_t, 16, <<, zero+16, )
	CHECK(uint16_t, 16, <<, zero+31, )
	CHECK(uint16_t, 16, >>, zero+16, )
	CHECK(uint16_t, 16, >>, zero+31, )
	CHECK( int16_t, 16, <<, zero+16, )
	CHECK( int16_t, 16, <<, zero+31, )
	CHECK( int16_t, 16, >>, zero+16, ~)
	CHECK( int16_t, 16, >>, zero+31, ~)

	// Repeat for WIDTH=32 and WIDTH=64
}

Alexander

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] doc: clarify semantics of vector bitwise shifts
  2023-05-24 18:36 ` Alexander Monakov
@ 2023-05-25  6:50   ` Richard Biener
  2023-05-25 10:46     ` Richard Biener
  2023-05-30 14:49     ` Alexander Monakov
  0 siblings, 2 replies; 17+ messages in thread
From: Richard Biener @ 2023-05-25  6:50 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc-patches

On Wed, May 24, 2023 at 8:36 PM Alexander Monakov <amonakov@ispras.ru> wrote:
>
>
> On Wed, 24 May 2023, Richard Biener via Gcc-patches wrote:
>
> > I’d have to check the ISAs what they actually do here - it of course depends
> > on RTL semantics as well but as you say those are not strictly defined here
> > either.
>
> Plus, we can add the following executable test to the testsuite:

Yeah, that's probably a good idea.  I think your documentation change
with the added sentence about the truncation is OK.  Note we have

/* Shift operations for shift and rotate.
   Shift means logical shift if done on an
   unsigned type, arithmetic shift if done on a signed type.
   The second operand is the number of bits to
   shift by; it need not be the same type as the first operand and result.
   Note that the result is undefined if the second operand is larger
   than or equal to the first operand's type size.

   The first operand of a shift can have either an integer or a
   (non-integer) fixed-point type.  We follow the ISO/IEC TR 18037:2004
   semantics for the latter.

   Rotates are defined for integer types only.  */
DEFTREECODE (LSHIFT_EXPR, "lshift_expr", tcc_binary, 2)

in tree.def which implies short << 24 is undefined behavior (similar
wording in generic.texi).  The rtl docs say nothing about behavior
but I think the semantics should carry over.  That works for x86
even for scalar instructions working on GPRs (masking is applied
but fixed to 5 or 6 bits even for QImode or HImode shifts).

Note that when we make these shifts well-defined there's
also arithmetic on signed types smaller than int (which again
doesn't exist in C) where overflow invokes undefined behavior
in the middle-end.  Unless we want to change that as well
this is somewhat inconsistent then.

There's also the issue that C 'int' is defined by INT_TYPE_SIZE
and thus target dependent which makes what is undefined and
what not target dependent.

Richard.

> #include <stdint.h>
>
> #define CHECK(TYPE, WIDTH, OP, COUNT, INVERT)         \
> {                                                     \
> typedef TYPE vec __attribute__((vector_size(WIDTH))); \
>                                                       \
>         static volatile vec zero;                     \
>         vec tmp = (zero-2) OP (COUNT);                \
>         vec ref = INVERT zero;                        \
>         if (__builtin_memcmp(&tmp, &ref, sizeof tmp)) \
>                 __builtin_abort();                    \
> }
>
> int main(void)
> {
>         CHECK( uint8_t, 16, <<, 8,  )
>         CHECK( uint8_t, 16, <<, 31, )
>         CHECK( uint8_t, 16, >>, 8,  )
>         CHECK( uint8_t, 16, >>, 31, )
>         CHECK(  int8_t, 16, <<, 8,  )
>         CHECK(  int8_t, 16, <<, 31, )
>         CHECK(  int8_t, 16, >>, 8,  ~)
>         CHECK(  int8_t, 16, >>, 31, ~)
>         CHECK(uint16_t, 16, <<, 16, )
>         CHECK(uint16_t, 16, <<, 31, )
>         CHECK(uint16_t, 16, >>, 16, )
>         CHECK(uint16_t, 16, >>, 31, )
>         CHECK( int16_t, 16, <<, 16, )
>         CHECK( int16_t, 16, <<, 31, )
>         CHECK( int16_t, 16, >>, 16, ~)
>         CHECK( int16_t, 16, >>, 31, ~)
>         // Per-lane-variable shifts:
>         CHECK( uint8_t, 16, <<, zero+8,  )
>         CHECK( uint8_t, 16, <<, zero+31, )
>         CHECK( uint8_t, 16, >>, zero+8,  )
>         CHECK( uint8_t, 16, >>, zero+31, )
>         CHECK(  int8_t, 16, <<, zero+8,  )
>         CHECK(  int8_t, 16, <<, zero+31, )
>         CHECK(  int8_t, 16, >>, zero+8,  ~)
>         CHECK(  int8_t, 16, >>, zero+31, ~)
>         CHECK(uint16_t, 16, <<, zero+16, )
>         CHECK(uint16_t, 16, <<, zero+31, )
>         CHECK(uint16_t, 16, >>, zero+16, )
>         CHECK(uint16_t, 16, >>, zero+31, )
>         CHECK( int16_t, 16, <<, zero+16, )
>         CHECK( int16_t, 16, <<, zero+31, )
>         CHECK( int16_t, 16, >>, zero+16, ~)
>         CHECK( int16_t, 16, >>, zero+31, ~)
>
>         // Repeat for WIDTH=32 and WIDTH=64
> }
>
> Alexander

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] doc: clarify semantics of vector bitwise shifts
  2023-05-25  6:50   ` Richard Biener
@ 2023-05-25 10:46     ` Richard Biener
  2023-05-30 14:49     ` Alexander Monakov
  1 sibling, 0 replies; 17+ messages in thread
From: Richard Biener @ 2023-05-25 10:46 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc-patches

On Thu, May 25, 2023 at 8:50 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Wed, May 24, 2023 at 8:36 PM Alexander Monakov <amonakov@ispras.ru> wrote:
> >
> >
> > On Wed, 24 May 2023, Richard Biener via Gcc-patches wrote:
> >
> > > I’d have to check the ISAs what they actually do here - it of course depends
> > > on RTL semantics as well but as you say those are not strictly defined here
> > > either.

Btw, it was just noted on IRC that VSX (and maybe altivec as well)
does not adhere to this and use
just 3 bits from the shift operand for bytes and 4 for half-words.

> > Plus, we can add the following executable test to the testsuite:
>
> Yeah, that's probably a good idea.  I think your documentation change
> with the added sentence about the truncation is OK.  Note we have
>
> /* Shift operations for shift and rotate.
>    Shift means logical shift if done on an
>    unsigned type, arithmetic shift if done on a signed type.
>    The second operand is the number of bits to
>    shift by; it need not be the same type as the first operand and result.
>    Note that the result is undefined if the second operand is larger
>    than or equal to the first operand's type size.
>
>    The first operand of a shift can have either an integer or a
>    (non-integer) fixed-point type.  We follow the ISO/IEC TR 18037:2004
>    semantics for the latter.
>
>    Rotates are defined for integer types only.  */
> DEFTREECODE (LSHIFT_EXPR, "lshift_expr", tcc_binary, 2)
>
> in tree.def which implies short << 24 is undefined behavior (similar
> wording in generic.texi).  The rtl docs say nothing about behavior
> but I think the semantics should carry over.  That works for x86
> even for scalar instructions working on GPRs (masking is applied
> but fixed to 5 or 6 bits even for QImode or HImode shifts).
>
> Note that when we make these shifts well-defined there's
> also arithmetic on signed types smaller than int (which again
> doesn't exist in C) where overflow invokes undefined behavior
> in the middle-end.  Unless we want to change that as well
> this is somewhat inconsistent then.
>
> There's also the issue that C 'int' is defined by INT_TYPE_SIZE
> and thus target dependent which makes what is undefined and
> what not target dependent.
>
> Richard.
>
> > #include <stdint.h>
> >
> > #define CHECK(TYPE, WIDTH, OP, COUNT, INVERT)         \
> > {                                                     \
> > typedef TYPE vec __attribute__((vector_size(WIDTH))); \
> >                                                       \
> >         static volatile vec zero;                     \
> >         vec tmp = (zero-2) OP (COUNT);                \
> >         vec ref = INVERT zero;                        \
> >         if (__builtin_memcmp(&tmp, &ref, sizeof tmp)) \
> >                 __builtin_abort();                    \
> > }
> >
> > int main(void)
> > {
> >         CHECK( uint8_t, 16, <<, 8,  )
> >         CHECK( uint8_t, 16, <<, 31, )
> >         CHECK( uint8_t, 16, >>, 8,  )
> >         CHECK( uint8_t, 16, >>, 31, )
> >         CHECK(  int8_t, 16, <<, 8,  )
> >         CHECK(  int8_t, 16, <<, 31, )
> >         CHECK(  int8_t, 16, >>, 8,  ~)
> >         CHECK(  int8_t, 16, >>, 31, ~)
> >         CHECK(uint16_t, 16, <<, 16, )
> >         CHECK(uint16_t, 16, <<, 31, )
> >         CHECK(uint16_t, 16, >>, 16, )
> >         CHECK(uint16_t, 16, >>, 31, )
> >         CHECK( int16_t, 16, <<, 16, )
> >         CHECK( int16_t, 16, <<, 31, )
> >         CHECK( int16_t, 16, >>, 16, ~)
> >         CHECK( int16_t, 16, >>, 31, ~)
> >         // Per-lane-variable shifts:
> >         CHECK( uint8_t, 16, <<, zero+8,  )
> >         CHECK( uint8_t, 16, <<, zero+31, )
> >         CHECK( uint8_t, 16, >>, zero+8,  )
> >         CHECK( uint8_t, 16, >>, zero+31, )
> >         CHECK(  int8_t, 16, <<, zero+8,  )
> >         CHECK(  int8_t, 16, <<, zero+31, )
> >         CHECK(  int8_t, 16, >>, zero+8,  ~)
> >         CHECK(  int8_t, 16, >>, zero+31, ~)
> >         CHECK(uint16_t, 16, <<, zero+16, )
> >         CHECK(uint16_t, 16, <<, zero+31, )
> >         CHECK(uint16_t, 16, >>, zero+16, )
> >         CHECK(uint16_t, 16, >>, zero+31, )
> >         CHECK( int16_t, 16, <<, zero+16, )
> >         CHECK( int16_t, 16, <<, zero+31, )
> >         CHECK( int16_t, 16, >>, zero+16, ~)
> >         CHECK( int16_t, 16, >>, zero+31, ~)
> >
> >         // Repeat for WIDTH=32 and WIDTH=64
> > }
> >
> > Alexander

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] doc: clarify semantics of vector bitwise shifts
  2023-05-25  6:50   ` Richard Biener
  2023-05-25 10:46     ` Richard Biener
@ 2023-05-30 14:49     ` Alexander Monakov
  2023-05-31  7:12       ` Richard Biener
  1 sibling, 1 reply; 17+ messages in thread
From: Alexander Monakov @ 2023-05-30 14:49 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 5308 bytes --]


On Thu, 25 May 2023, Richard Biener wrote:

> On Wed, May 24, 2023 at 8:36 PM Alexander Monakov <amonakov@ispras.ru> wrote:
> >
> >
> > On Wed, 24 May 2023, Richard Biener via Gcc-patches wrote:
> >
> > > I’d have to check the ISAs what they actually do here - it of course depends
> > > on RTL semantics as well but as you say those are not strictly defined here
> > > either.
> >
> > Plus, we can add the following executable test to the testsuite:
> 
> Yeah, that's probably a good idea.  I think your documentation change
> with the added sentence about the truncation is OK.

I am no longer confident in my patch, sorry.

My claim about vector shift semantics in OpenCL was wrong. In fact it specifies
that RHS of a vector shift is masked to the exact bitwidth of the element type.

So, to collect various angles:

1. OpenCL semantics would need an 'AND' before a shift (except VSX/Altivec).

2. From user side we had a request to follow C integer promotion semantics
   in https://gcc.gnu.org/PR91838 but I now doubt we can do that.

3. LLVM makes oversized vector shifts UB both for 'vector_size' and
   'ext_vector_type'.

4. Vector lowering does not emit promotions, and starting from gcc-12
   ranger treats oversized shifts according to the documentation you
   cite below, and optimizes (e.g. with '-O2 -mno-sse')

	typedef short v8hi __attribute__((vector_size(16)));

	void f(v8hi *p)
	{
	    *p >>= 16;
	}

   to zeroing '*p'. If this looks unintended, I can file a bug.

I still think we need to clarify semantics of vector shifts, but probably
not in the way I proposed initially. What do you think?

Thanks.
Alexander

> Note we have
> 
> /* Shift operations for shift and rotate.
>    Shift means logical shift if done on an
>    unsigned type, arithmetic shift if done on a signed type.
>    The second operand is the number of bits to
>    shift by; it need not be the same type as the first operand and result.
>    Note that the result is undefined if the second operand is larger
>    than or equal to the first operand's type size.
> 
>    The first operand of a shift can have either an integer or a
>    (non-integer) fixed-point type.  We follow the ISO/IEC TR 18037:2004
>    semantics for the latter.
> 
>    Rotates are defined for integer types only.  */
> DEFTREECODE (LSHIFT_EXPR, "lshift_expr", tcc_binary, 2)
> 
> in tree.def which implies short << 24 is undefined behavior (similar
> wording in generic.texi).  The rtl docs say nothing about behavior
> but I think the semantics should carry over.  That works for x86
> even for scalar instructions working on GPRs (masking is applied
> but fixed to 5 or 6 bits even for QImode or HImode shifts).
> 
> Note that when we make these shifts well-defined there's
> also arithmetic on signed types smaller than int (which again
> doesn't exist in C) where overflow invokes undefined behavior
> in the middle-end.  Unless we want to change that as well
> this is somewhat inconsistent then.
> 
> There's also the issue that C 'int' is defined by INT_TYPE_SIZE
> and thus target dependent which makes what is undefined and
> what not target dependent.
> 
> Richard.
> 
> > #include <stdint.h>
> >
> > #define CHECK(TYPE, WIDTH, OP, COUNT, INVERT)         \
> > {                                                     \
> > typedef TYPE vec __attribute__((vector_size(WIDTH))); \
> >                                                       \
> >         static volatile vec zero;                     \
> >         vec tmp = (zero-2) OP (COUNT);                \
> >         vec ref = INVERT zero;                        \
> >         if (__builtin_memcmp(&tmp, &ref, sizeof tmp)) \
> >                 __builtin_abort();                    \
> > }
> >
> > int main(void)
> > {
> >         CHECK( uint8_t, 16, <<, 8,  )
> >         CHECK( uint8_t, 16, <<, 31, )
> >         CHECK( uint8_t, 16, >>, 8,  )
> >         CHECK( uint8_t, 16, >>, 31, )
> >         CHECK(  int8_t, 16, <<, 8,  )
> >         CHECK(  int8_t, 16, <<, 31, )
> >         CHECK(  int8_t, 16, >>, 8,  ~)
> >         CHECK(  int8_t, 16, >>, 31, ~)
> >         CHECK(uint16_t, 16, <<, 16, )
> >         CHECK(uint16_t, 16, <<, 31, )
> >         CHECK(uint16_t, 16, >>, 16, )
> >         CHECK(uint16_t, 16, >>, 31, )
> >         CHECK( int16_t, 16, <<, 16, )
> >         CHECK( int16_t, 16, <<, 31, )
> >         CHECK( int16_t, 16, >>, 16, ~)
> >         CHECK( int16_t, 16, >>, 31, ~)
> >         // Per-lane-variable shifts:
> >         CHECK( uint8_t, 16, <<, zero+8,  )
> >         CHECK( uint8_t, 16, <<, zero+31, )
> >         CHECK( uint8_t, 16, >>, zero+8,  )
> >         CHECK( uint8_t, 16, >>, zero+31, )
> >         CHECK(  int8_t, 16, <<, zero+8,  )
> >         CHECK(  int8_t, 16, <<, zero+31, )
> >         CHECK(  int8_t, 16, >>, zero+8,  ~)
> >         CHECK(  int8_t, 16, >>, zero+31, ~)
> >         CHECK(uint16_t, 16, <<, zero+16, )
> >         CHECK(uint16_t, 16, <<, zero+31, )
> >         CHECK(uint16_t, 16, >>, zero+16, )
> >         CHECK(uint16_t, 16, >>, zero+31, )
> >         CHECK( int16_t, 16, <<, zero+16, )
> >         CHECK( int16_t, 16, <<, zero+31, )
> >         CHECK( int16_t, 16, >>, zero+16, ~)
> >         CHECK( int16_t, 16, >>, zero+31, ~)
> >
> >         // Repeat for WIDTH=32 and WIDTH=64
> > }
> >
> > Alexander
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] doc: clarify semantics of vector bitwise shifts
  2023-05-30 14:49     ` Alexander Monakov
@ 2023-05-31  7:12       ` Richard Biener
  2023-06-01 18:25         ` Alexander Monakov
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Biener @ 2023-05-31  7:12 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc-patches

On Tue, May 30, 2023 at 4:49 PM Alexander Monakov <amonakov@ispras.ru> wrote:
>
>
> On Thu, 25 May 2023, Richard Biener wrote:
>
> > On Wed, May 24, 2023 at 8:36 PM Alexander Monakov <amonakov@ispras.ru> wrote:
> > >
> > >
> > > On Wed, 24 May 2023, Richard Biener via Gcc-patches wrote:
> > >
> > > > I’d have to check the ISAs what they actually do here - it of course depends
> > > > on RTL semantics as well but as you say those are not strictly defined here
> > > > either.
> > >
> > > Plus, we can add the following executable test to the testsuite:
> >
> > Yeah, that's probably a good idea.  I think your documentation change
> > with the added sentence about the truncation is OK.
>
> I am no longer confident in my patch, sorry.
>
> My claim about vector shift semantics in OpenCL was wrong. In fact it specifies
> that RHS of a vector shift is masked to the exact bitwidth of the element type.
>
> So, to collect various angles:
>
> 1. OpenCL semantics would need an 'AND' before a shift (except VSX/Altivec).
>
> 2. From user side we had a request to follow C integer promotion semantics
>    in https://gcc.gnu.org/PR91838 but I now doubt we can do that.
>
> 3. LLVM makes oversized vector shifts UB both for 'vector_size' and
>    'ext_vector_type'.

I had the impression GCC desired to do 3. as well, matching what we do
for scalar shifts.

> 4. Vector lowering does not emit promotions, and starting from gcc-12
>    ranger treats oversized shifts according to the documentation you
>    cite below, and optimizes (e.g. with '-O2 -mno-sse')
>
>         typedef short v8hi __attribute__((vector_size(16)));
>
>         void f(v8hi *p)
>         {
>             *p >>= 16;
>         }
>
>    to zeroing '*p'. If this looks unintended, I can file a bug.
>
> I still think we need to clarify semantics of vector shifts, but probably
> not in the way I proposed initially. What do you think?

I think the intent at some point was to adhere to the OpenCL spec
for the GCC vector extension (because that's a written spec while
GCCs vector extension docs are lacking).  Originally the powerpc
altivec 'vector' keyword spurred most of the development IIRC
so it might be useful to see how they specify shifts.

So yes, we probably should clarify the semantics to match the
implementation (since we have two targets doing things differently
since forever we can only document it as UB) and also note the
difference from OpenCL (in case OpenCL is still relevant these
days we might want to offer a -fopencl-vectors to emit the required
AND).

It would be also good to amend the RTL documentation.

It would be very nice to start an internals documentation section
around collecting what the middle-end considers undefined
or implementation defined (aka target defined) behavior in the
GENERIC, GIMPLE and RTL ILs and what predicates eventually
control that (like TYPE_OVERFLOW_UNDEFINED).  Maybe spread it over
{gimple,generic,rtl}.texi, though gimple.texi is only about the representation
and all semantics are shared and documented in generic.texi.

Thanks,
Richard.

> Thanks.
> Alexander
>
> > Note we have
> >
> > /* Shift operations for shift and rotate.
> >    Shift means logical shift if done on an
> >    unsigned type, arithmetic shift if done on a signed type.
> >    The second operand is the number of bits to
> >    shift by; it need not be the same type as the first operand and result.
> >    Note that the result is undefined if the second operand is larger
> >    than or equal to the first operand's type size.
> >
> >    The first operand of a shift can have either an integer or a
> >    (non-integer) fixed-point type.  We follow the ISO/IEC TR 18037:2004
> >    semantics for the latter.
> >
> >    Rotates are defined for integer types only.  */
> > DEFTREECODE (LSHIFT_EXPR, "lshift_expr", tcc_binary, 2)
> >
> > in tree.def which implies short << 24 is undefined behavior (similar
> > wording in generic.texi).  The rtl docs say nothing about behavior
> > but I think the semantics should carry over.  That works for x86
> > even for scalar instructions working on GPRs (masking is applied
> > but fixed to 5 or 6 bits even for QImode or HImode shifts).
> >
> > Note that when we make these shifts well-defined there's
> > also arithmetic on signed types smaller than int (which again
> > doesn't exist in C) where overflow invokes undefined behavior
> > in the middle-end.  Unless we want to change that as well
> > this is somewhat inconsistent then.
> >
> > There's also the issue that C 'int' is defined by INT_TYPE_SIZE
> > and thus target dependent which makes what is undefined and
> > what not target dependent.
> >
> > Richard.
> >
> > > #include <stdint.h>
> > >
> > > #define CHECK(TYPE, WIDTH, OP, COUNT, INVERT)         \
> > > {                                                     \
> > > typedef TYPE vec __attribute__((vector_size(WIDTH))); \
> > >                                                       \
> > >         static volatile vec zero;                     \
> > >         vec tmp = (zero-2) OP (COUNT);                \
> > >         vec ref = INVERT zero;                        \
> > >         if (__builtin_memcmp(&tmp, &ref, sizeof tmp)) \
> > >                 __builtin_abort();                    \
> > > }
> > >
> > > int main(void)
> > > {
> > >         CHECK( uint8_t, 16, <<, 8,  )
> > >         CHECK( uint8_t, 16, <<, 31, )
> > >         CHECK( uint8_t, 16, >>, 8,  )
> > >         CHECK( uint8_t, 16, >>, 31, )
> > >         CHECK(  int8_t, 16, <<, 8,  )
> > >         CHECK(  int8_t, 16, <<, 31, )
> > >         CHECK(  int8_t, 16, >>, 8,  ~)
> > >         CHECK(  int8_t, 16, >>, 31, ~)
> > >         CHECK(uint16_t, 16, <<, 16, )
> > >         CHECK(uint16_t, 16, <<, 31, )
> > >         CHECK(uint16_t, 16, >>, 16, )
> > >         CHECK(uint16_t, 16, >>, 31, )
> > >         CHECK( int16_t, 16, <<, 16, )
> > >         CHECK( int16_t, 16, <<, 31, )
> > >         CHECK( int16_t, 16, >>, 16, ~)
> > >         CHECK( int16_t, 16, >>, 31, ~)
> > >         // Per-lane-variable shifts:
> > >         CHECK( uint8_t, 16, <<, zero+8,  )
> > >         CHECK( uint8_t, 16, <<, zero+31, )
> > >         CHECK( uint8_t, 16, >>, zero+8,  )
> > >         CHECK( uint8_t, 16, >>, zero+31, )
> > >         CHECK(  int8_t, 16, <<, zero+8,  )
> > >         CHECK(  int8_t, 16, <<, zero+31, )
> > >         CHECK(  int8_t, 16, >>, zero+8,  ~)
> > >         CHECK(  int8_t, 16, >>, zero+31, ~)
> > >         CHECK(uint16_t, 16, <<, zero+16, )
> > >         CHECK(uint16_t, 16, <<, zero+31, )
> > >         CHECK(uint16_t, 16, >>, zero+16, )
> > >         CHECK(uint16_t, 16, >>, zero+31, )
> > >         CHECK( int16_t, 16, <<, zero+16, )
> > >         CHECK( int16_t, 16, <<, zero+31, )
> > >         CHECK( int16_t, 16, >>, zero+16, ~)
> > >         CHECK( int16_t, 16, >>, zero+31, ~)
> > >
> > >         // Repeat for WIDTH=32 and WIDTH=64
> > > }
> > >
> > > Alexander
> >

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] doc: clarify semantics of vector bitwise shifts
  2023-05-31  7:12       ` Richard Biener
@ 2023-06-01 18:25         ` Alexander Monakov
  2023-06-02  7:07           ` Matthias Kretz
  2023-06-02  9:39           ` Richard Biener
  0 siblings, 2 replies; 17+ messages in thread
From: Alexander Monakov @ 2023-06-01 18:25 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 5246 bytes --]


On Wed, 31 May 2023, Richard Biener wrote:

> On Tue, May 30, 2023 at 4:49 PM Alexander Monakov <amonakov@ispras.ru> wrote:
> >
> >
> > On Thu, 25 May 2023, Richard Biener wrote:
> >
> > > On Wed, May 24, 2023 at 8:36 PM Alexander Monakov <amonakov@ispras.ru> wrote:
> > > >
> > > >
> > > > On Wed, 24 May 2023, Richard Biener via Gcc-patches wrote:
> > > >
> > > > > I’d have to check the ISAs what they actually do here - it of course depends
> > > > > on RTL semantics as well but as you say those are not strictly defined here
> > > > > either.
> > > >
> > > > Plus, we can add the following executable test to the testsuite:
> > >
> > > Yeah, that's probably a good idea.  I think your documentation change
> > > with the added sentence about the truncation is OK.
> >
> > I am no longer confident in my patch, sorry.
> >
> > My claim about vector shift semantics in OpenCL was wrong. In fact it specifies
> > that RHS of a vector shift is masked to the exact bitwidth of the element type.
> >
> > So, to collect various angles:
> >
> > 1. OpenCL semantics would need an 'AND' before a shift (except VSX/Altivec).
> >
> > 2. From user side we had a request to follow C integer promotion semantics
> >    in https://gcc.gnu.org/PR91838 but I now doubt we can do that.
> >
> > 3. LLVM makes oversized vector shifts UB both for 'vector_size' and
> >    'ext_vector_type'.
> 
> I had the impression GCC desired to do 3. as well, matching what we do
> for scalar shifts.
> 
> > 4. Vector lowering does not emit promotions, and starting from gcc-12
> >    ranger treats oversized shifts according to the documentation you
> >    cite below, and optimizes (e.g. with '-O2 -mno-sse')
> >
> >         typedef short v8hi __attribute__((vector_size(16)));
> >
> >         void f(v8hi *p)
> >         {
> >             *p >>= 16;
> >         }
> >
> >    to zeroing '*p'. If this looks unintended, I can file a bug.
> >
> > I still think we need to clarify semantics of vector shifts, but probably
> > not in the way I proposed initially. What do you think?
> 
> I think the intent at some point was to adhere to the OpenCL spec
> for the GCC vector extension (because that's a written spec while
> GCCs vector extension docs are lacking).  Originally the powerpc
> altivec 'vector' keyword spurred most of the development IIRC
> so it might be useful to see how they specify shifts.

It doesn't look like they document the semantics of '<<' and '>>'
operators for vector types.

> So yes, we probably should clarify the semantics to match the
> implementation (since we have two targets doing things differently
> since forever we can only document it as UB) and also note the
> difference from OpenCL (in case OpenCL is still relevant these
> days we might want to offer a -fopencl-vectors to emit the required
> AND).

It doesn't have to be UB, in principle we could say that shift amount
is taken modulo some power of two depending on the target without UB.
But since LLVM already treats that as UB, we might as well follow.

I think for addition/multiplication of signed vectors everybody
expects them to have wrapping semantics without UB on overflow though?

Revised patch below.

> It would be also good to amend the RTL documentation.
> 
> It would be very nice to start an internals documentation section
> around collecting what the middle-end considers undefined
> or implementation defined (aka target defined) behavior in the
> GENERIC, GIMPLE and RTL ILs and what predicates eventually
> control that (like TYPE_OVERFLOW_UNDEFINED).  Maybe spread it over
> {gimple,generic,rtl}.texi, though gimple.texi is only about the representation
> and all semantics are shared and documented in generic.texi.

Hm, noted. Thanks.

---8<---

From e4e8d9e262f2f8dbc91a94291cf7accb74d27e7c Mon Sep 17 00:00:00 2001
From: Alexander Monakov <amonakov@ispras.ru>
Date: Wed, 24 May 2023 15:48:29 +0300
Subject: [PATCH] doc: clarify semantics of vector bitwise shifts

Explicitly say that attempted shift past element bit width is UB for
vector types.  Mention that integer promotions do not happen.

gcc/ChangeLog:

	* doc/extend.texi (Vector Extensions): Clarify bitwise shift
	semantics.
---
 gcc/doc/extend.texi | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e426a2eb7d..3723cfe467 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -12026,7 +12026,14 @@ elements in the operand.
 It is possible to use shifting operators @code{<<}, @code{>>} on
 integer-type vectors. The operation is defined as following: @code{@{a0,
 a1, @dots{}, an@} >> @{b0, b1, @dots{}, bn@} == @{a0 >> b0, a1 >> b1,
-@dots{}, an >> bn@}}@. Vector operands must have the same number of
+@dots{}, an >> bn@}}@.  Unlike OpenCL, values of @code{b} are not
+implicitly taken modulo bit width of the base type @code{B}, and the behavior
+is undefined if any @code{bi} is greater than or equal to @code{B}.
+
+In contrast to scalar operations in C and C++, operands of integer vector
+operations do not undergo integer promotions.
+
+Operands of binary vector operations must have the same number of
 elements. 
 
 For convenience, it is allowed to use a binary vector operation
-- 
2.39.2

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] doc: clarify semantics of vector bitwise shifts
  2023-06-01 18:25         ` Alexander Monakov
@ 2023-06-02  7:07           ` Matthias Kretz
  2023-06-02  7:49             ` Alexander Monakov
  2023-06-02  9:39           ` Richard Biener
  1 sibling, 1 reply; 17+ messages in thread
From: Matthias Kretz @ 2023-06-02  7:07 UTC (permalink / raw)
  To: Richard Biener, gcc-patches; +Cc: gcc-patches, Alexander Monakov

On Thursday, 1 June 2023 20:25:14 CEST Alexander Monakov wrote:
> On Wed, 31 May 2023, Richard Biener wrote:
> > So yes, we probably should clarify the semantics to match the
> > implementation (since we have two targets doing things differently
> > since forever we can only document it as UB) and also note the
> > difference from OpenCL (in case OpenCL is still relevant these
> > days we might want to offer a -fopencl-vectors to emit the required
> > AND).
> 
> It doesn't have to be UB, in principle we could say that shift amount
> is taken modulo some power of two depending on the target without UB.
> But since LLVM already treats that as UB, we might as well follow.

I prefer UB (as your patch states 👍). If a user requires the AND, let them 
state it explicitly. Don't let everybody pay in performance.

> I think for addition/multiplication of signed vectors everybody
> expects them to have wrapping semantics without UB on overflow though?

  simd<int> x = ...;
  bool t = all_of(x < x + 1); // unconditionally true or not?

I'd expect t to be unconditionally true. Because simd<int> simply is a data-
parallel version of int.

> Revised patch below.

This can be considered a breaking change. Does it need a mention in the 
release notes?

- Matthias


-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 stdₓ::simd
──────────────────────────────────────────────────────────────────────────

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] doc: clarify semantics of vector bitwise shifts
  2023-06-02  7:07           ` Matthias Kretz
@ 2023-06-02  7:49             ` Alexander Monakov
  2023-06-02  9:03               ` Matthias Kretz
  0 siblings, 1 reply; 17+ messages in thread
From: Alexander Monakov @ 2023-06-02  7:49 UTC (permalink / raw)
  To: Matthias Kretz; +Cc: Richard Biener, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2338 bytes --]

On Fri, 2 Jun 2023, Matthias Kretz wrote:

> On Thursday, 1 June 2023 20:25:14 CEST Alexander Monakov wrote:
> > On Wed, 31 May 2023, Richard Biener wrote:
> > > So yes, we probably should clarify the semantics to match the
> > > implementation (since we have two targets doing things differently
> > > since forever we can only document it as UB) and also note the
> > > difference from OpenCL (in case OpenCL is still relevant these
> > > days we might want to offer a -fopencl-vectors to emit the required
> > > AND).
> > 
> > It doesn't have to be UB, in principle we could say that shift amount
> > is taken modulo some power of two depending on the target without UB.
> > But since LLVM already treats that as UB, we might as well follow.
> 
> I prefer UB (as your patch states 👍). If a user requires the AND, let them 
> state it explicitly. Don't let everybody pay in performance.

What I suggested does not imply a performance cost. All targets take some
lower bits of the shift amount anyway. It's only OpenCL's exact masking
that would imply a performance cost (and I agree it's inappropriate for
GCC's generic vectors).

> > I think for addition/multiplication of signed vectors everybody
> > expects them to have wrapping semantics without UB on overflow though?
> 
>   simd<int> x = ...;
>   bool t = all_of(x < x + 1); // unconditionally true or not?
> 
> I'd expect t to be unconditionally true. Because simd<int> simply is a data-
> parallel version of int.

Okay, I see opinions will vary here. I was thinking about our immintrin.h
which is partially implemented in terms of generic vectors. Imagine we
extend UBSan to trap on signed overflow for vector types. I expect that
will blow up on existing code that uses Intel intrinsics. But use of
generic vectors in immintrin.h is our implementation detail, and people
might have expected intrinsics to be overflow-safe, like for aliasing
(where we use __attribute__((may_alias)) in immintrin.h). Although, we
can solve that by inventing overflow-wraps attribute for types, maybe?

> > Revised patch below.
> 
> This can be considered a breaking change. Does it need a mention in the 
> release notes?

I'm not sure what you consider a breaking change here. Is that the implied
threat to use undefinedness for range deduction and other optimizations?

Thanks.
Alexander

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] doc: clarify semantics of vector bitwise shifts
  2023-06-02  7:49             ` Alexander Monakov
@ 2023-06-02  9:03               ` Matthias Kretz
  2023-06-02  9:24                 ` Alexander Monakov
  0 siblings, 1 reply; 17+ messages in thread
From: Matthias Kretz @ 2023-06-02  9:03 UTC (permalink / raw)
  To: gcc-patches, Alexander Monakov; +Cc: Richard Biener

On Friday, 2 June 2023 09:49:26 CEST Alexander Monakov wrote:
> > simd<int> x = ...;
> > bool t = all_of(x < x + 1); // unconditionally true or not?
> > 
> > I'd expect t to be unconditionally true. Because simd<int> simply is a
> > data- parallel version of int.
> 
> Okay, I see opinions will vary here. I was thinking about our immintrin.h
> which is partially implemented in terms of generic vectors. Imagine we
> extend UBSan to trap on signed overflow for vector types. I expect that
> will blow up on existing code that uses Intel intrinsics.

_mm_add_epi32 is already implemented via __v4su addition (i.e. unsigned). So 
the intrinsic would continue to wrap on signed overflow.

> > > Revised patch below.
> > 
> > This can be considered a breaking change. Does it need a mention in the
> > release notes?
> 
> I'm not sure what you consider a breaking change here. Is that the implied
> threat to use undefinedness for range deduction and other optimizations?

Consider the stdx::simd implementation. It currently follows semantics of the 
builtin types. So simd<char> can be shifted by 30 without UB. The 
implementation of the shift operator depends on the current behavior, even if 
it is target-dependent. For PPC the simd implementation adds extra code to 
avoid the "UB". With nailing down shifts > sizeof(T) as UB this extra code now 
needs to be added for all targets.

- Matthias

-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 stdₓ::simd
──────────────────────────────────────────────────────────────────────────

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] doc: clarify semantics of vector bitwise shifts
  2023-06-02  9:03               ` Matthias Kretz
@ 2023-06-02  9:24                 ` Alexander Monakov
  2023-06-02  9:34                   ` Matthias Kretz
  2023-06-02  9:36                   ` Richard Biener
  0 siblings, 2 replies; 17+ messages in thread
From: Alexander Monakov @ 2023-06-02  9:24 UTC (permalink / raw)
  To: Matthias Kretz; +Cc: gcc-patches, Richard Biener


On Fri, 2 Jun 2023, Matthias Kretz wrote:

> > Okay, I see opinions will vary here. I was thinking about our immintrin.h
> > which is partially implemented in terms of generic vectors. Imagine we
> > extend UBSan to trap on signed overflow for vector types. I expect that
> > will blow up on existing code that uses Intel intrinsics.
> 
> _mm_add_epi32 is already implemented via __v4su addition (i.e. unsigned). So 
> the intrinsic would continue to wrap on signed overflow.

Ah, if our intrinsics take care of it, that alleviates my concern.

> > I'm not sure what you consider a breaking change here. Is that the implied
> > threat to use undefinedness for range deduction and other optimizations?
> 
> Consider the stdx::simd implementation. It currently follows semantics of the 
> builtin types. So simd<char> can be shifted by 30 without UB. The 
> implementation of the shift operator depends on the current behavior, even if 
> it is target-dependent. For PPC the simd implementation adds extra code to 
> avoid the "UB". With nailing down shifts > sizeof(T) as UB this extra code now 
> needs to be added for all targets.

What does stdx::simd do on LLVM, where that has always been UB even on x86?

Alexander

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] doc: clarify semantics of vector bitwise shifts
  2023-06-02  9:24                 ` Alexander Monakov
@ 2023-06-02  9:34                   ` Matthias Kretz
  2023-06-02  9:36                   ` Richard Biener
  1 sibling, 0 replies; 17+ messages in thread
From: Matthias Kretz @ 2023-06-02  9:34 UTC (permalink / raw)
  To: gcc-patches, Richard Biener; +Cc: Alexander Monakov

On Friday, 2 June 2023 11:24:23 CEST Alexander Monakov wrote:
> > > I'm not sure what you consider a breaking change here. Is that the
> > > implied
> > > threat to use undefinedness for range deduction and other optimizations?
> > 
> > Consider the stdx::simd implementation. It currently follows semantics of
> > the builtin types. So simd<char> can be shifted by 30 without UB. The
> > implementation of the shift operator depends on the current behavior, even
> > if it is target-dependent. For PPC the simd implementation adds extra
> > code to avoid the "UB". With nailing down shifts > sizeof(T) as UB this
> > extra code now needs to be added for all targets.
> 
> What does stdx::simd do on LLVM, where that has always been UB even on x86?

At this point Clang/LLVM support is best effort. I did not know before that 
LLVM nailed this down as UB. Also my test suite didn't show any failures on 
shifts IIRC (but that doesn't say anything about UB, I know).

FWIW, I'm okay with saying nothing in the release notes. It might just be that 
some codes have become dependent on the existing (under-specified) behavior. 🤷

- Matthias
-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 stdₓ::simd
──────────────────────────────────────────────────────────────────────────

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] doc: clarify semantics of vector bitwise shifts
  2023-06-02  9:24                 ` Alexander Monakov
  2023-06-02  9:34                   ` Matthias Kretz
@ 2023-06-02  9:36                   ` Richard Biener
  1 sibling, 0 replies; 17+ messages in thread
From: Richard Biener @ 2023-06-02  9:36 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: Matthias Kretz, gcc-patches

On Fri, Jun 2, 2023 at 11:24 AM Alexander Monakov <amonakov@ispras.ru> wrote:
>
>
> On Fri, 2 Jun 2023, Matthias Kretz wrote:
>
> > > Okay, I see opinions will vary here. I was thinking about our immintrin.h
> > > which is partially implemented in terms of generic vectors. Imagine we
> > > extend UBSan to trap on signed overflow for vector types. I expect that
> > > will blow up on existing code that uses Intel intrinsics.
> >
> > _mm_add_epi32 is already implemented via __v4su addition (i.e. unsigned). So
> > the intrinsic would continue to wrap on signed overflow.
>
> Ah, if our intrinsics take care of it, that alleviates my concern.

Just to add when generic vectors are lowered to scalar operations then
signed vector ops become signed scalar ops which means followup
optimizations will assume undefined behavior on overflow.

> > > I'm not sure what you consider a breaking change here. Is that the implied
> > > threat to use undefinedness for range deduction and other optimizations?
> >
> > Consider the stdx::simd implementation. It currently follows semantics of the
> > builtin types. So simd<char> can be shifted by 30 without UB. The
> > implementation of the shift operator depends on the current behavior, even if
> > it is target-dependent. For PPC the simd implementation adds extra code to
> > avoid the "UB". With nailing down shifts > sizeof(T) as UB this extra code now
> > needs to be added for all targets.
>
> What does stdx::simd do on LLVM, where that has always been UB even on x86?
>
> Alexander

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] doc: clarify semantics of vector bitwise shifts
  2023-06-01 18:25         ` Alexander Monakov
  2023-06-02  7:07           ` Matthias Kretz
@ 2023-06-02  9:39           ` Richard Biener
  1 sibling, 0 replies; 17+ messages in thread
From: Richard Biener @ 2023-06-02  9:39 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc-patches

On Thu, Jun 1, 2023 at 8:25 PM Alexander Monakov <amonakov@ispras.ru> wrote:
>
>
> On Wed, 31 May 2023, Richard Biener wrote:
>
> > On Tue, May 30, 2023 at 4:49 PM Alexander Monakov <amonakov@ispras.ru> wrote:
> > >
> > >
> > > On Thu, 25 May 2023, Richard Biener wrote:
> > >
> > > > On Wed, May 24, 2023 at 8:36 PM Alexander Monakov <amonakov@ispras.ru> wrote:
> > > > >
> > > > >
> > > > > On Wed, 24 May 2023, Richard Biener via Gcc-patches wrote:
> > > > >
> > > > > > I’d have to check the ISAs what they actually do here - it of course depends
> > > > > > on RTL semantics as well but as you say those are not strictly defined here
> > > > > > either.
> > > > >
> > > > > Plus, we can add the following executable test to the testsuite:
> > > >
> > > > Yeah, that's probably a good idea.  I think your documentation change
> > > > with the added sentence about the truncation is OK.
> > >
> > > I am no longer confident in my patch, sorry.
> > >
> > > My claim about vector shift semantics in OpenCL was wrong. In fact it specifies
> > > that RHS of a vector shift is masked to the exact bitwidth of the element type.
> > >
> > > So, to collect various angles:
> > >
> > > 1. OpenCL semantics would need an 'AND' before a shift (except VSX/Altivec).
> > >
> > > 2. From user side we had a request to follow C integer promotion semantics
> > >    in https://gcc.gnu.org/PR91838 but I now doubt we can do that.
> > >
> > > 3. LLVM makes oversized vector shifts UB both for 'vector_size' and
> > >    'ext_vector_type'.
> >
> > I had the impression GCC desired to do 3. as well, matching what we do
> > for scalar shifts.
> >
> > > 4. Vector lowering does not emit promotions, and starting from gcc-12
> > >    ranger treats oversized shifts according to the documentation you
> > >    cite below, and optimizes (e.g. with '-O2 -mno-sse')
> > >
> > >         typedef short v8hi __attribute__((vector_size(16)));
> > >
> > >         void f(v8hi *p)
> > >         {
> > >             *p >>= 16;
> > >         }
> > >
> > >    to zeroing '*p'. If this looks unintended, I can file a bug.
> > >
> > > I still think we need to clarify semantics of vector shifts, but probably
> > > not in the way I proposed initially. What do you think?
> >
> > I think the intent at some point was to adhere to the OpenCL spec
> > for the GCC vector extension (because that's a written spec while
> > GCCs vector extension docs are lacking).  Originally the powerpc
> > altivec 'vector' keyword spurred most of the development IIRC
> > so it might be useful to see how they specify shifts.
>
> It doesn't look like they document the semantics of '<<' and '>>'
> operators for vector types.
>
> > So yes, we probably should clarify the semantics to match the
> > implementation (since we have two targets doing things differently
> > since forever we can only document it as UB) and also note the
> > difference from OpenCL (in case OpenCL is still relevant these
> > days we might want to offer a -fopencl-vectors to emit the required
> > AND).
>
> It doesn't have to be UB, in principle we could say that shift amount
> is taken modulo some power of two depending on the target without UB.
> But since LLVM already treats that as UB, we might as well follow.
>
> I think for addition/multiplication of signed vectors everybody
> expects them to have wrapping semantics without UB on overflow though?

Actually GCC already treats them as UB on overflow by means of
vector lowering eventually turning them into scalar operations and
quite some patterns in match.pd applying to ANY_INTEGRAL_TYPE_P.

> Revised patch below.

The revised patch is OK.

Thanks,
Richard.

> > It would be also good to amend the RTL documentation.
> >
> > It would be very nice to start an internals documentation section
> > around collecting what the middle-end considers undefined
> > or implementation defined (aka target defined) behavior in the
> > GENERIC, GIMPLE and RTL ILs and what predicates eventually
> > control that (like TYPE_OVERFLOW_UNDEFINED).  Maybe spread it over
> > {gimple,generic,rtl}.texi, though gimple.texi is only about the representation
> > and all semantics are shared and documented in generic.texi.
>
> Hm, noted. Thanks.
>
> ---8<---
>
> From e4e8d9e262f2f8dbc91a94291cf7accb74d27e7c Mon Sep 17 00:00:00 2001
> From: Alexander Monakov <amonakov@ispras.ru>
> Date: Wed, 24 May 2023 15:48:29 +0300
> Subject: [PATCH] doc: clarify semantics of vector bitwise shifts
>
> Explicitly say that attempted shift past element bit width is UB for
> vector types.  Mention that integer promotions do not happen.
>
> gcc/ChangeLog:
>
>         * doc/extend.texi (Vector Extensions): Clarify bitwise shift
>         semantics.
> ---
>  gcc/doc/extend.texi | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index e426a2eb7d..3723cfe467 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -12026,7 +12026,14 @@ elements in the operand.
>  It is possible to use shifting operators @code{<<}, @code{>>} on
>  integer-type vectors. The operation is defined as following: @code{@{a0,
>  a1, @dots{}, an@} >> @{b0, b1, @dots{}, bn@} == @{a0 >> b0, a1 >> b1,
> -@dots{}, an >> bn@}}@. Vector operands must have the same number of
> +@dots{}, an >> bn@}}@.  Unlike OpenCL, values of @code{b} are not
> +implicitly taken modulo bit width of the base type @code{B}, and the behavior
> +is undefined if any @code{bi} is greater than or equal to @code{B}.
> +
> +In contrast to scalar operations in C and C++, operands of integer vector
> +operations do not undergo integer promotions.
> +
> +Operands of binary vector operations must have the same number of
>  elements.
>
>  For convenience, it is allowed to use a binary vector operation
> --
> 2.39.2

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] doc: clarify semantics of vector bitwise shifts
  2023-05-24 13:21 ` Richard Biener
@ 2023-05-24 14:21   ` Alexander Monakov
  0 siblings, 0 replies; 17+ messages in thread
From: Alexander Monakov @ 2023-05-24 14:21 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3322 bytes --]


On Wed, 24 May 2023, Richard Biener wrote:

> On Wed, May 24, 2023 at 2:54 PM Alexander Monakov via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Explicitly say that bitwise shifts for narrow types work similar to
> > element-wise C shifts with integer promotions, which coincides with
> > OpenCL semantics.
> 
> Do we need to clarify that v << w with v being a vector of shorts
> still yields a vector of shorts and not a vector of ints?

I don't think so, but if necessary we could add "and the result was
truncated back to the base type":

    When the base type is narrower than @code{int}, element-wise shifts
    are performed as if operands underwent C integer promotions, and
    the result was truncated back to the base type, like in OpenCL. 

> Btw, I don't see this promotion reflected in the IL.  For
> 
> typedef short v8hi __attribute__((vector_size(16)));
> 
> v8hi foo (v8hi a, v8hi b)
> {
>   return a << b;
> }
> 
> I get no masking of 'b' and vector lowering if the target doens't handle it
> yields
> 
>   short int _5;
>   short int _6;
> 
>   _5 = BIT_FIELD_REF <a_1(D), 16, 0>;
>   _6 = BIT_FIELD_REF <b_2(D), 16, 0>;
>   _7 = _5 << _6;
> 
> which we could derive ranges from for _6 (apparantly we don't yet).

Here it depends on how we define the GIMPLE-level semantics of bit-shift
operators for narrow types. To avoid changing lowering we could say that
shifting by up to 31 bits is well-defined for narrow types.

RTL-level semantics are also undocumented, unfortunately.

> Even
> 
> typedef int v8hi __attribute__((vector_size(16)));
> 
> v8hi x;
> int foo (v8hi a, v8hi b)
> {
>   x = a << b;
>   return (b[0] > 33);
> }
> 
> isn't optimized currently (but could - note I've used 'int' elements here).

Yeah. But let's constrain the optimizations first.

> So, I don't see us making sure the hardware does the right thing for
> out-of bound values.

I think in practice it worked out even if GCC did not pay attention to it,
because SIMD instructions had to facilitate autovectorization for C with
corresponding shift semantics.

Alexander

> 
> Richard.
> 
> > gcc/ChangeLog:
> >
> >         * doc/extend.texi (Vector Extensions): Clarify bitwise shift
> >         semantics.
> > ---
> >  gcc/doc/extend.texi | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > index e426a2eb7d..6b4e94b6a1 100644
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -12026,7 +12026,12 @@ elements in the operand.
> >  It is possible to use shifting operators @code{<<}, @code{>>} on
> >  integer-type vectors. The operation is defined as following: @code{@{a0,
> >  a1, @dots{}, an@} >> @{b0, b1, @dots{}, bn@} == @{a0 >> b0, a1 >> b1,
> > -@dots{}, an >> bn@}}@. Vector operands must have the same number of
> > +@dots{}, an >> bn@}}@.  When the base type is narrower than @code{int},
> > +element-wise shifts are performed as if operands underwent C integer
> > +promotions, like in OpenCL.  This makes vector shifts by up to 31 bits
> > +well-defined for vectors with @code{char} and @code{short} base types.
> > +
> > +Operands of binary vector operations must have the same number of
> >  elements.
> >
> >  For convenience, it is allowed to use a binary vector operation
> > --
> > 2.39.2
> >
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] doc: clarify semantics of vector bitwise shifts
  2023-05-24 12:53 Alexander Monakov
@ 2023-05-24 13:21 ` Richard Biener
  2023-05-24 14:21   ` Alexander Monakov
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Biener @ 2023-05-24 13:21 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc-patches

On Wed, May 24, 2023 at 2:54 PM Alexander Monakov via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Explicitly say that bitwise shifts for narrow types work similar to
> element-wise C shifts with integer promotions, which coincides with
> OpenCL semantics.

Do we need to clarify that v << w with v being a vector of shorts
still yields a vector of shorts and not a vector of ints?

Btw, I don't see this promotion reflected in the IL.  For

typedef short v8hi __attribute__((vector_size(16)));

v8hi foo (v8hi a, v8hi b)
{
  return a << b;
}

I get no masking of 'b' and vector lowering if the target doens't handle it
yields

  short int _5;
  short int _6;

  _5 = BIT_FIELD_REF <a_1(D), 16, 0>;
  _6 = BIT_FIELD_REF <b_2(D), 16, 0>;
  _7 = _5 << _6;

which we could derive ranges from for _6 (apparantly we don't yet).  Even

typedef int v8hi __attribute__((vector_size(16)));

v8hi x;
int foo (v8hi a, v8hi b)
{
  x = a << b;
  return (b[0] > 33);
}

isn't optimized currently (but could - note I've used 'int' elements here).

So, I don't see us making sure the hardware does the right thing for
out-of bound values.

Richard.

> gcc/ChangeLog:
>
>         * doc/extend.texi (Vector Extensions): Clarify bitwise shift
>         semantics.
> ---
>  gcc/doc/extend.texi | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index e426a2eb7d..6b4e94b6a1 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -12026,7 +12026,12 @@ elements in the operand.
>  It is possible to use shifting operators @code{<<}, @code{>>} on
>  integer-type vectors. The operation is defined as following: @code{@{a0,
>  a1, @dots{}, an@} >> @{b0, b1, @dots{}, bn@} == @{a0 >> b0, a1 >> b1,
> -@dots{}, an >> bn@}}@. Vector operands must have the same number of
> +@dots{}, an >> bn@}}@.  When the base type is narrower than @code{int},
> +element-wise shifts are performed as if operands underwent C integer
> +promotions, like in OpenCL.  This makes vector shifts by up to 31 bits
> +well-defined for vectors with @code{char} and @code{short} base types.
> +
> +Operands of binary vector operations must have the same number of
>  elements.
>
>  For convenience, it is allowed to use a binary vector operation
> --
> 2.39.2
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] doc: clarify semantics of vector bitwise shifts
@ 2023-05-24 12:53 Alexander Monakov
  2023-05-24 13:21 ` Richard Biener
  0 siblings, 1 reply; 17+ messages in thread
From: Alexander Monakov @ 2023-05-24 12:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: Alexander Monakov

Explicitly say that bitwise shifts for narrow types work similar to
element-wise C shifts with integer promotions, which coincides with
OpenCL semantics.

gcc/ChangeLog:

	* doc/extend.texi (Vector Extensions): Clarify bitwise shift
	semantics.
---
 gcc/doc/extend.texi | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e426a2eb7d..6b4e94b6a1 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -12026,7 +12026,12 @@ elements in the operand.
 It is possible to use shifting operators @code{<<}, @code{>>} on
 integer-type vectors. The operation is defined as following: @code{@{a0,
 a1, @dots{}, an@} >> @{b0, b1, @dots{}, bn@} == @{a0 >> b0, a1 >> b1,
-@dots{}, an >> bn@}}@. Vector operands must have the same number of
+@dots{}, an >> bn@}}@.  When the base type is narrower than @code{int},
+element-wise shifts are performed as if operands underwent C integer
+promotions, like in OpenCL.  This makes vector shifts by up to 31 bits
+well-defined for vectors with @code{char} and @code{short} base types.
+
+Operands of binary vector operations must have the same number of
 elements. 
 
 For convenience, it is allowed to use a binary vector operation
-- 
2.39.2


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-06-02  9:40 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-24 16:28 [PATCH] doc: clarify semantics of vector bitwise shifts Richard Biener
2023-05-24 18:36 ` Alexander Monakov
2023-05-25  6:50   ` Richard Biener
2023-05-25 10:46     ` Richard Biener
2023-05-30 14:49     ` Alexander Monakov
2023-05-31  7:12       ` Richard Biener
2023-06-01 18:25         ` Alexander Monakov
2023-06-02  7:07           ` Matthias Kretz
2023-06-02  7:49             ` Alexander Monakov
2023-06-02  9:03               ` Matthias Kretz
2023-06-02  9:24                 ` Alexander Monakov
2023-06-02  9:34                   ` Matthias Kretz
2023-06-02  9:36                   ` Richard Biener
2023-06-02  9:39           ` Richard Biener
  -- strict thread matches above, loose matches on Subject: below --
2023-05-24 12:53 Alexander Monakov
2023-05-24 13:21 ` Richard Biener
2023-05-24 14:21   ` Alexander Monakov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).