On Thu, 25 May 2023, Richard Biener wrote:

> On Wed, May 24, 2023 at 8:36 PM Alexander Monakov <amonakov@ispras.ru> wrote:
> >
> >
> > On Wed, 24 May 2023, Richard Biener via Gcc-patches wrote:
> >
> > > I’d have to check the ISAs what they actually do here - it of course depends
> > > on RTL semantics as well but as you say those are not strictly defined here
> > > either.
> >
> > Plus, we can add the following executable test to the testsuite:
> 
> Yeah, that's probably a good idea.  I think your documentation change
> with the added sentence about the truncation is OK.

I am no longer confident in my patch, sorry.

My claim about vector shift semantics in OpenCL was wrong. In fact it specifies
that RHS of a vector shift is masked to the exact bitwidth of the element type.

So, to collect various angles:

1. OpenCL semantics would need an 'AND' before a shift (except VSX/Altivec).

2. From user side we had a request to follow C integer promotion semantics
   in https://gcc.gnu.org/PR91838 but I now doubt we can do that.

3. LLVM makes oversized vector shifts UB both for 'vector_size' and
   'ext_vector_type'.

4. Vector lowering does not emit promotions, and starting from gcc-12
   ranger treats oversized shifts according to the documentation you
   cite below, and optimizes (e.g. with '-O2 -mno-sse')

	typedef short v8hi __attribute__((vector_size(16)));

	void f(v8hi *p)
	{
	    *p >>= 16;
	}

   to zeroing '*p'. If this looks unintended, I can file a bug.

I still think we need to clarify semantics of vector shifts, but probably
not in the way I proposed initially. What do you think?

Thanks.
Alexander

> Note we have
> 
> /* Shift operations for shift and rotate.
>    Shift means logical shift if done on an
>    unsigned type, arithmetic shift if done on a signed type.
>    The second operand is the number of bits to
>    shift by; it need not be the same type as the first operand and result.
>    Note that the result is undefined if the second operand is larger
>    than or equal to the first operand's type size.
> 
>    The first operand of a shift can have either an integer or a
>    (non-integer) fixed-point type.  We follow the ISO/IEC TR 18037:2004
>    semantics for the latter.
> 
>    Rotates are defined for integer types only.  */
> DEFTREECODE (LSHIFT_EXPR, "lshift_expr", tcc_binary, 2)
> 
> in tree.def which implies short << 24 is undefined behavior (similar
> wording in generic.texi).  The rtl docs say nothing about behavior
> but I think the semantics should carry over.  That works for x86
> even for scalar instructions working on GPRs (masking is applied
> but fixed to 5 or 6 bits even for QImode or HImode shifts).
> 
> Note that when we make these shifts well-defined there's
> also arithmetic on signed types smaller than int (which again
> doesn't exist in C) where overflow invokes undefined behavior
> in the middle-end.  Unless we want to change that as well
> this is somewhat inconsistent then.
> 
> There's also the issue that C 'int' is defined by INT_TYPE_SIZE
> and thus target dependent which makes what is undefined and
> what not target dependent.
> 
> Richard.
> 
> > #include <stdint.h>
> >
> > #define CHECK(TYPE, WIDTH, OP, COUNT, INVERT)         \
> > {                                                     \
> > typedef TYPE vec __attribute__((vector_size(WIDTH))); \
> >                                                       \
> >         static volatile vec zero;                     \
> >         vec tmp = (zero-2) OP (COUNT);                \
> >         vec ref = INVERT zero;                        \
> >         if (__builtin_memcmp(&tmp, &ref, sizeof tmp)) \
> >                 __builtin_abort();                    \
> > }
> >
> > int main(void)
> > {
> >         CHECK( uint8_t, 16, <<, 8,  )
> >         CHECK( uint8_t, 16, <<, 31, )
> >         CHECK( uint8_t, 16, >>, 8,  )
> >         CHECK( uint8_t, 16, >>, 31, )
> >         CHECK(  int8_t, 16, <<, 8,  )
> >         CHECK(  int8_t, 16, <<, 31, )
> >         CHECK(  int8_t, 16, >>, 8,  ~)
> >         CHECK(  int8_t, 16, >>, 31, ~)
> >         CHECK(uint16_t, 16, <<, 16, )
> >         CHECK(uint16_t, 16, <<, 31, )
> >         CHECK(uint16_t, 16, >>, 16, )
> >         CHECK(uint16_t, 16, >>, 31, )
> >         CHECK( int16_t, 16, <<, 16, )
> >         CHECK( int16_t, 16, <<, 31, )
> >         CHECK( int16_t, 16, >>, 16, ~)
> >         CHECK( int16_t, 16, >>, 31, ~)
> >         // Per-lane-variable shifts:
> >         CHECK( uint8_t, 16, <<, zero+8,  )
> >         CHECK( uint8_t, 16, <<, zero+31, )
> >         CHECK( uint8_t, 16, >>, zero+8,  )
> >         CHECK( uint8_t, 16, >>, zero+31, )
> >         CHECK(  int8_t, 16, <<, zero+8,  )
> >         CHECK(  int8_t, 16, <<, zero+31, )
> >         CHECK(  int8_t, 16, >>, zero+8,  ~)
> >         CHECK(  int8_t, 16, >>, zero+31, ~)
> >         CHECK(uint16_t, 16, <<, zero+16, )
> >         CHECK(uint16_t, 16, <<, zero+31, )
> >         CHECK(uint16_t, 16, >>, zero+16, )
> >         CHECK(uint16_t, 16, >>, zero+31, )
> >         CHECK( int16_t, 16, <<, zero+16, )
> >         CHECK( int16_t, 16, <<, zero+31, )
> >         CHECK( int16_t, 16, >>, zero+16, ~)
> >         CHECK( int16_t, 16, >>, zero+31, ~)
> >
> >         // Repeat for WIDTH=32 and WIDTH=64
> > }
> >
> > Alexander
>