public inbox for gcc-patches@gcc.gnu.org
From: Bill Schmidt <wschmidt@linux.ibm.com>
To: "Paul A. Clarke" <pc@us.ibm.com>, gcc-patches@gcc.gnu.org
Cc: segher@kernel.crashing.org
Subject: Re: [PATCH 4/6] rs6000: Support SSE4.1 "cvt" intrinsics
Date: Wed, 18 Aug 2021 14:19:09 -0500
Message-ID: <e5c38c62-8a47-24eb-c0a5-22320b54c4d2@linux.ibm.com>
In-Reply-To: <20210809202355.568303-5-pc@us.ibm.com>

Hi Paul,

On 8/9/21 3:23 PM, Paul A. Clarke via Gcc-patches wrote:
> Also, copy tests for:
> - _mm_cvtepi8_epi16, _mm_cvtepi8_epi32, _mm_cvtepi8_epi64
> - _mm_cvtepi16_epi32, _mm_cvtepi16_epi64
> - _mm_cvtepi32_epi64
> - _mm_cvtepu8_epi16, _mm_cvtepu8_epi32, _mm_cvtepu8_epi64
> - _mm_cvtepu16_epi32, _mm_cvtepu16_epi64
> - _mm_cvtepu32_epi64
>
> from gcc/testsuite/gcc.target/i386.
>
> sse4_1-pmovsxbd.c, sse4_1-pmovsxbq.c, and sse4_1-pmovsxbw.c were
> modified to use "signed char" instead of "char", because plain
> "char" is unsigned by default on powerpc.
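
For anyone following along, here is a minimal sketch of why the type
change matters. It is plain C, not part of the patch, and the helper
names are made up for illustration. It assumes 8-bit chars. Widening
through "signed char" preserves a negative value on every target;
widening through plain "char" does so only where plain "char" is signed.

```c
#include <assert.h>
#include <limits.h>

/* Nonzero when plain "char" is signed on the target being compiled for.  */
static int
plain_char_is_signed (void)
{
  return CHAR_MIN < 0;
}

/* Widen a value in [-128, 127] through "signed char": always preserved.  */
static int
widen_signed_char (int v)
{
  signed char c = (signed char) v;
  return c;
}

/* Widen the same value through plain "char": preserved only when plain
   "char" is signed; otherwise a negative input comes back as v + 256
   (assuming 8-bit chars).  */
static int
widen_plain_char (int v)
{
  char c = (char) v;
  return c;
}

/* Hypothetical self-check tying the three helpers together.  */
static void
char_signedness_check (void)
{
  assert (widen_signed_char (-1) == -1);
  assert (widen_plain_char (-1) == (plain_char_is_signed () ? -1 : 255));
}
```

With plain "char", the comparison of a source byte against the
sign-extended result in the pmovsx tests would fail for negative
inputs on a target where "char" is unsigned.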


Please say how this was tested, whether backports are planned, etc.

This patch LGTM with the usual comment about documenting -Wno-psabi.

Thanks!
Bill

>
> 2021-08-09  Paul A. Clarke  <pc@us.ibm.com>
>
> gcc
> 	* config/rs6000/smmintrin.h (_mm_cvtepi8_epi16, _mm_cvtepi8_epi32,
> 	_mm_cvtepi8_epi64, _mm_cvtepi16_epi32, _mm_cvtepi16_epi64,
> 	_mm_cvtepi32_epi64, _mm_cvtepu8_epi16, _mm_cvtepu8_epi32,
> 	_mm_cvtepu8_epi64, _mm_cvtepu16_epi32, _mm_cvtepu16_epi64,
> 	_mm_cvtepu32_epi64): New.
>
> gcc/testsuite
> 	* gcc.target/powerpc/sse4_1-pmovsxbd.c: Copy from gcc.target/i386,
> 	adjust dg directives to suit.
> 	* gcc.target/powerpc/sse4_1-pmovsxbq.c: Same.
> 	* gcc.target/powerpc/sse4_1-pmovsxbw.c: Same.
> 	* gcc.target/powerpc/sse4_1-pmovsxdq.c: Same.
> 	* gcc.target/powerpc/sse4_1-pmovsxwd.c: Same.
> 	* gcc.target/powerpc/sse4_1-pmovsxwq.c: Same.
> 	* gcc.target/powerpc/sse4_1-pmovzxbd.c: Same.
> 	* gcc.target/powerpc/sse4_1-pmovzxbq.c: Same.
> 	* gcc.target/powerpc/sse4_1-pmovzxbw.c: Same.
> 	* gcc.target/powerpc/sse4_1-pmovzxdq.c: Same.
> 	* gcc.target/powerpc/sse4_1-pmovzxwd.c: Same.
> 	* gcc.target/powerpc/sse4_1-pmovzxwq.c: Same.
> ---
>   gcc/config/rs6000/smmintrin.h                 | 136 ++++++++++++++++++
>   .../gcc.target/powerpc/sse4_1-pmovsxbd.c      |  42 ++++++
>   .../gcc.target/powerpc/sse4_1-pmovsxbq.c      |  42 ++++++
>   .../gcc.target/powerpc/sse4_1-pmovsxbw.c      |  42 ++++++
>   .../gcc.target/powerpc/sse4_1-pmovsxdq.c      |  42 ++++++
>   .../gcc.target/powerpc/sse4_1-pmovsxwd.c      |  42 ++++++
>   .../gcc.target/powerpc/sse4_1-pmovsxwq.c      |  42 ++++++
>   .../gcc.target/powerpc/sse4_1-pmovzxbd.c      |  43 ++++++
>   .../gcc.target/powerpc/sse4_1-pmovzxbq.c      |  43 ++++++
>   .../gcc.target/powerpc/sse4_1-pmovzxbw.c      |  43 ++++++
>   .../gcc.target/powerpc/sse4_1-pmovzxdq.c      |  43 ++++++
>   .../gcc.target/powerpc/sse4_1-pmovzxwd.c      |  43 ++++++
>   .../gcc.target/powerpc/sse4_1-pmovzxwq.c      |  43 ++++++
>   13 files changed, 646 insertions(+)
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbd.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbq.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbw.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxdq.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwd.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwq.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbd.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbq.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbw.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxdq.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwd.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwq.c
>
> diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
> index 5d345e3fd56b..7f6ff7baff50 100644
> --- a/gcc/config/rs6000/smmintrin.h
> +++ b/gcc/config/rs6000/smmintrin.h
> @@ -448,6 +448,142 @@ _mm_max_epu32 (__m128i __X, __m128i __Y)
>     return (__m128i) vec_max ((__v4su)__X, (__v4su)__Y);
>   }
>
> +__inline __m128i
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvtepi8_epi16 (__m128i __A)
> +{
> +  return (__m128i) vec_unpackh ((__v16qi)__A);
> +}
> +
> +__inline __m128i
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvtepi8_epi32 (__m128i __A)
> +{
> +  __A = (__m128i) vec_unpackh ((__v16qi)__A);
> +  return (__m128i) vec_unpackh ((__v8hi)__A);
> +}
> +
> +#ifdef _ARCH_PWR8
> +__inline __m128i
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvtepi8_epi64 (__m128i __A)
> +{
> +  __A = (__m128i) vec_unpackh ((__v16qi)__A);
> +  __A = (__m128i) vec_unpackh ((__v8hi)__A);
> +  return (__m128i) vec_unpackh ((__v4si)__A);
> +}
> +#endif
> +
> +__inline __m128i
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvtepi16_epi32 (__m128i __A)
> +{
> +  return (__m128i) vec_unpackh ((__v8hi)__A);
> +}
> +
> +#ifdef _ARCH_PWR8
> +__inline __m128i
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvtepi16_epi64 (__m128i __A)
> +{
> +  __A = (__m128i) vec_unpackh ((__v8hi)__A);
> +  return (__m128i) vec_unpackh ((__v4si)__A);
> +}
> +
> +__inline __m128i
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvtepi32_epi64 (__m128i __A)
> +{
> +  return (__m128i) vec_unpackh ((__v4si)__A);
> +}
> +#endif
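
As an aside for readers, the sign-extending variants above widen in
steps (8 to 16 to 32 to 64 bits) by chaining unpacks. The sketch below
is a scalar model only, in plain C with hypothetical ref_* names that
do not appear in the patch. It shows that stepwise widening yields the
same values as widening directly; it says nothing about which half of
the register vec_unpackh selects on a given endianness.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model: sign-extend the first 8 bytes to 8 halfwords.  */
static void
ref_cvtepi8_epi16 (const int8_t src[16], int16_t dst[8])
{
  for (int k = 0; k < 8; k++)
    dst[k] = src[k];
}

/* Scalar model: sign-extend the first 4 halfwords to 4 words.  */
static void
ref_cvtepi16_epi32 (const int16_t src[8], int32_t dst[4])
{
  for (int k = 0; k < 4; k++)
    dst[k] = src[k];
}

/* Two-step widening, mirroring the chained unpacks in the patch.  */
static void
ref_cvtepi8_epi32 (const int8_t src[16], int32_t dst[4])
{
  int16_t tmp[8];
  ref_cvtepi8_epi16 (src, tmp);
  ref_cvtepi16_epi32 (tmp, dst);
}

/* Hypothetical self-check: stepwise widening preserves each value.  */
static void
ref_widen_check (void)
{
  const int8_t src[16] = { -1, 2, -128, 127 };
  int32_t dst[4];
  ref_cvtepi8_epi32 (src, dst);
  assert (dst[0] == -1 && dst[1] == 2);
  assert (dst[2] == -128 && dst[3] == 127);
}
```
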
> +
> +__inline __m128i
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvtepu8_epi16 (__m128i __A)
> +{
> +  const __v16qu __zero = {0};
> +#ifdef __LITTLE_ENDIAN__
> +  __A = (__m128i) vec_mergeh ((__v16qu)__A, __zero);
> +#else /* __BIG_ENDIAN__.  */
> +  __A = (__m128i) vec_mergeh (__zero, (__v16qu)__A);
> +#endif /* __BIG_ENDIAN__.  */
> +  return __A;
> +}
> +
> +__inline __m128i
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvtepu8_epi32 (__m128i __A)
> +{
> +  const __v16qu __zero = {0};
> +#ifdef __LITTLE_ENDIAN__
> +  __A = (__m128i) vec_mergeh ((__v16qu)__A, __zero);
> +  __A = (__m128i) vec_mergeh ((__v8hu)__A, (__v8hu)__zero);
> +#else /* __BIG_ENDIAN__.  */
> +  __A = (__m128i) vec_mergeh (__zero, (__v16qu)__A);
> +  __A = (__m128i) vec_mergeh ((__v8hu)__zero, (__v8hu)__A);
> +#endif /* __BIG_ENDIAN__.  */
> +  return __A;
> +}
> +
> +__inline __m128i
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvtepu8_epi64 (__m128i __A)
> +{
> +  const __v16qu __zero = {0};
> +#ifdef __LITTLE_ENDIAN__
> +  __A = (__m128i) vec_mergeh ((__v16qu)__A, __zero);
> +  __A = (__m128i) vec_mergeh ((__v8hu)__A, (__v8hu)__zero);
> +  __A = (__m128i) vec_mergeh ((__v4su)__A, (__v4su)__zero);
> +#else /* __BIG_ENDIAN__.  */
> +  __A = (__m128i) vec_mergeh (__zero, (__v16qu)__A);
> +  __A = (__m128i) vec_mergeh ((__v8hu)__zero, (__v8hu)__A);
> +  __A = (__m128i) vec_mergeh ((__v4su)__zero, (__v4su)__A);
> +#endif /* __BIG_ENDIAN__.  */
> +  return __A;
> +}
> +
> +__inline __m128i
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvtepu16_epi32 (__m128i __A)
> +{
> +  const __v8hu __zero = {0};
> +#ifdef __LITTLE_ENDIAN__
> +  __A = (__m128i) vec_mergeh ((__v8hu)__A, __zero);
> +#else /* __BIG_ENDIAN__.  */
> +  __A = (__m128i) vec_mergeh (__zero, (__v8hu)__A);
> +#endif /* __BIG_ENDIAN__.  */
> +  return __A;
> +}
> +
> +__inline __m128i
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvtepu16_epi64 (__m128i __A)
> +{
> +  const __v8hu __zero = {0};
> +#ifdef __LITTLE_ENDIAN__
> +  __A = (__m128i) vec_mergeh ((__v8hu)__A, __zero);
> +  __A = (__m128i) vec_mergeh ((__v4su)__A, (__v4su)__zero);
> +#else /* __BIG_ENDIAN__.  */
> +  __A = (__m128i) vec_mergeh (__zero, (__v8hu)__A);
> +  __A = (__m128i) vec_mergeh ((__v4su)__zero, (__v4su)__A);
> +#endif /* __BIG_ENDIAN__.  */
> +  return __A;
> +}
> +
> +__inline __m128i
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cvtepu32_epi64 (__m128i __A)
> +{
> +  const __v4su __zero = {0};
> +#ifdef __LITTLE_ENDIAN__
> +  __A = (__m128i) vec_mergeh ((__v4su)__A, __zero);
> +#else /* __BIG_ENDIAN__.  */
> +  __A = (__m128i) vec_mergeh (__zero, (__v4su)__A);
> +#endif /* __BIG_ENDIAN__.  */
> +  return __A;
> +}
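
For readers wondering about the swapped operands in the #ifdef
branches above: zero extension is done by interleaving the data with
zeros, and which operand comes first decides whether the zero byte
ends up in the high or low half of each widened element. The sketch
below models that in memory order with plain C. merge_bytes,
host_is_little_endian and ref_cvtepu8_epi16 are hypothetical names,
not the AltiVec built-ins, and the model is only meant to illustrate
the operand order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar stand-in for a byte-wise merge: interleave the first 8 bytes
   of a and b in memory order as a0 b0 a1 b1 ...  */
static void
merge_bytes (const uint8_t a[16], const uint8_t b[16], uint8_t out[16])
{
  for (int k = 0; k < 8; k++)
    {
      out[2 * k] = a[k];
      out[2 * k + 1] = b[k];
    }
}

/* Detect the byte order of the machine running this code.  */
static int
host_is_little_endian (void)
{
  const uint16_t probe = 1;
  uint8_t first;
  memcpy (&first, &probe, 1);
  return first == 1;
}

/* Zero-extend the first 8 bytes to 8 halfwords by merging with zeros.
   The data byte must land in the low-order half of each halfword, so
   the operand order depends on endianness.  */
static void
ref_cvtepu8_epi16 (const uint8_t src[16], uint16_t dst[8])
{
  const uint8_t zero[16] = { 0 };
  uint8_t bytes[16];
  if (host_is_little_endian ())
    merge_bytes (src, zero, bytes);	/* data at the lower address */
  else
    merge_bytes (zero, src, bytes);	/* zero at the lower address */
  memcpy (dst, bytes, sizeof bytes);
}

/* Hypothetical self-check: bytes with the top bit set stay positive.  */
static void
ref_zext_check (void)
{
  const uint8_t src[16] = { 0x80, 0xff, 0x01 };
  uint16_t dst[8];
  ref_cvtepu8_epi16 (src, dst);
  assert (dst[0] == 0x80 && dst[1] == 0xff && dst[2] == 0x01);
}
```
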
> +
>   /* Return horizontal packed word minimum and its index in bits [15:0]
>      and bits [18:16] respectively.  */
>   __inline __m128i
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbd.c
> new file mode 100644
> index 000000000000..ba8627489cfa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbd.c
> @@ -0,0 +1,42 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx -Wno-psabi" } */
> +
> +#ifndef CHECK_H
> +#define CHECK_H "sse4_1-check.h"
> +#endif
> +
> +#ifndef TEST
> +#define TEST sse4_1_test
> +#endif
> +
> +#include CHECK_H
> +
> +#include <smmintrin.h>
> +
> +#define NUM 128
> +
> +static void
> +TEST (void)
> +{
> +  union
> +    {
> +      __m128i x[NUM / 4];
> +      int i[NUM];
> +      signed char c[NUM * 4];
> +    } dst, src;
> +  int i, sign = 1;
> +
> +  for (i = 0; i < NUM; i++)
> +    {
> +      src.c[(i % 4) + (i / 4) * 16] = i * i * sign;
> +      sign = -sign;
> +    }
> +
> +  for (i = 0; i < NUM; i += 4)
> +    dst.x [i / 4] = _mm_cvtepi8_epi32 (src.x [i / 4]);
> +
> +  for (i = 0; i < NUM; i++)
> +    if (src.c[(i % 4) + (i / 4) * 16] != dst.i[i])
> +      abort ();
> +}
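
A note on the index expression used throughout these tests, since it
is easy to misread. Each conversion consumes only the first few
elements of a 16-byte source vector, so result element i is fed by
source element (i % used) + (i / used) * per_vec, where "used" is the
number of elements converted per vector and "per_vec" is the number of
source elements a vector holds. The helper below is a hypothetical
illustration in plain C, not code from the testsuite.

```c
#include <assert.h>

/* Index of the source element that feeds result element i, when each
   vector holds per_vec source elements and only the first `used` of
   them are converted.  */
static int
test_src_index (int i, int used, int per_vec)
{
  return (i % used) + (i / used) * per_vec;
}

/* Hypothetical self-check for the pmovsxbd case: 4 bytes used out of
   16 per vector.  */
static void
test_src_index_check (void)
{
  assert (test_src_index (0, 4, 16) == 0);
  assert (test_src_index (3, 4, 16) == 3);
  assert (test_src_index (4, 4, 16) == 16);	/* first byte of vector 1 */
}
```

For pmovsxbd the largest index touched is test_src_index (127, 4, 16),
which is 499 and so stays inside the NUM * 4 = 512 bytes of the union.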
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbq.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbq.c
> new file mode 100644
> index 000000000000..57c61dddd13f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbq.c
> @@ -0,0 +1,42 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target p8vector_hw } */
> +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> +
> +#ifndef CHECK_H
> +#define CHECK_H "sse4_1-check.h"
> +#endif
> +
> +#ifndef TEST
> +#define TEST sse4_1_test
> +#endif
> +
> +#include CHECK_H
> +
> +#include <smmintrin.h>
> +
> +#define NUM 128
> +
> +static void
> +TEST (void)
> +{
> +  union
> +    {
> +      __m128i x[NUM / 2];
> +      long long ll[NUM];
> +      signed char c[NUM * 8];
> +    } dst, src;
> +  int i, sign = 1;
> +
> +  for (i = 0; i < NUM; i++)
> +    {
> +      src.c[(i % 2) + (i / 2) * 16] = i * i * sign;
> +      sign = -sign;
> +    }
> +
> +  for (i = 0; i < NUM; i += 2)
> +    dst.x [i / 2] = _mm_cvtepi8_epi64 (src.x [i / 2]);
> +
> +  for (i = 0; i < NUM; i++)
> +    if (src.c[(i % 2) + (i / 2) * 16] != dst.ll[i])
> +      abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbw.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbw.c
> new file mode 100644
> index 000000000000..510b2e2ca03a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxbw.c
> @@ -0,0 +1,42 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx -Wno-psabi" } */
> +
> +#ifndef CHECK_H
> +#define CHECK_H "sse4_1-check.h"
> +#endif
> +
> +#ifndef TEST
> +#define TEST sse4_1_test
> +#endif
> +
> +#include CHECK_H
> +
> +#include <smmintrin.h>
> +
> +#define NUM 128
> +
> +static void
> +TEST (void)
> +{
> +  union
> +    {
> +      __m128i x[NUM / 8];
> +      short s[NUM];
> +      signed char c[NUM * 2];
> +    } dst, src;
> +  int i, sign = 1;
> +
> +  for (i = 0; i < NUM; i++)
> +    {
> +      src.c[(i % 8) + (i / 8) * 16] = i * i * sign;
> +      sign = -sign;
> +    }
> +
> +  for (i = 0; i < NUM; i += 8)
> +    dst.x [i / 8] = _mm_cvtepi8_epi16 (src.x [i / 8]);
> +
> +  for (i = 0; i < NUM; i++)
> +    if (src.c[(i % 8) + (i / 8) * 16] != dst.s[i])
> +      abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxdq.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxdq.c
> new file mode 100644
> index 000000000000..0126883b4368
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxdq.c
> @@ -0,0 +1,42 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target p8vector_hw } */
> +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> +
> +#ifndef CHECK_H
> +#define CHECK_H "sse4_1-check.h"
> +#endif
> +
> +#ifndef TEST
> +#define TEST sse4_1_test
> +#endif
> +
> +#include CHECK_H
> +
> +#include <smmintrin.h>
> +
> +#define NUM 128
> +
> +static void
> +TEST (void)
> +{
> +  union
> +    {
> +      __m128i x[NUM / 2];
> +      long long ll[NUM];
> +      int i[NUM * 2];
> +    } dst, src;
> +  int i, sign = 1;
> +
> +  for (i = 0; i < NUM; i++)
> +    {
> +      src.i[(i % 2) + (i / 2) * 4] = i * i * sign;
> +      sign = -sign;
> +    }
> +
> +  for (i = 0; i < NUM; i += 2)
> +    dst.x [i / 2] = _mm_cvtepi32_epi64 (src.x [i / 2]);
> +
> +  for (i = 0; i < NUM; i++)
> +    if (src.i[(i % 2) + (i / 2) * 4] != dst.ll[i])
> +      abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwd.c
> new file mode 100644
> index 000000000000..8018d331be72
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwd.c
> @@ -0,0 +1,42 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx -Wno-psabi" } */
> +
> +#ifndef CHECK_H
> +#define CHECK_H "sse4_1-check.h"
> +#endif
> +
> +#ifndef TEST
> +#define TEST sse4_1_test
> +#endif
> +
> +#include CHECK_H
> +
> +#include <smmintrin.h>
> +
> +#define NUM 128
> +
> +static void
> +TEST (void)
> +{
> +  union
> +    {
> +      __m128i x[NUM / 4];
> +      int i[NUM];
> +      short s[NUM * 2];
> +    } dst, src;
> +  int i, sign = 1;
> +
> +  for (i = 0; i < NUM; i++)
> +    {
> +      src.s[(i % 4) + (i / 4) * 8] = i * i * sign;
> +      sign = -sign;
> +    }
> +
> +  for (i = 0; i < NUM; i += 4)
> +    dst.x [i / 4] = _mm_cvtepi16_epi32 (src.x [i / 4]);
> +
> +  for (i = 0; i < NUM; i++)
> +    if (src.s[(i % 4) + (i / 4) * 8] != dst.i[i])
> +      abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwq.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwq.c
> new file mode 100644
> index 000000000000..c513b095fe55
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovsxwq.c
> @@ -0,0 +1,42 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target p8vector_hw } */
> +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> +
> +#ifndef CHECK_H
> +#define CHECK_H "sse4_1-check.h"
> +#endif
> +
> +#ifndef TEST
> +#define TEST sse4_1_test
> +#endif
> +
> +#include CHECK_H
> +
> +#include <smmintrin.h>
> +
> +#define NUM 128
> +
> +static void
> +TEST (void)
> +{
> +  union
> +    {
> +      __m128i x[NUM / 2];
> +      long long ll[NUM];
> +      short s[NUM * 4];
> +    } dst, src;
> +  int i, sign = 1;
> +
> +  for (i = 0; i < NUM; i++)
> +    {
> +      src.s[(i % 2) + (i / 2) * 8] = i * i * sign;
> +      sign = -sign;
> +    }
> +
> +  for (i = 0; i < NUM; i += 2)
> +    dst.x [i / 2] = _mm_cvtepi16_epi64 (src.x [i / 2]);
> +
> +  for (i = 0; i < NUM; i++)
> +    if (src.s[(i % 2) + (i / 2) * 8] != dst.ll[i])
> +      abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbd.c
> new file mode 100644
> index 000000000000..65c42e58f8ef
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbd.c
> @@ -0,0 +1,43 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx -Wno-psabi" } */
> +
> +#ifndef CHECK_H
> +#define CHECK_H "sse4_1-check.h"
> +#endif
> +
> +#ifndef TEST
> +#define TEST sse4_1_test
> +#endif
> +
> +#include CHECK_H
> +
> +#include <smmintrin.h>
> +
> +#define NUM 128
> +
> +static void
> +TEST (void)
> +{
> +  union
> +    {
> +      __m128i x[NUM / 4];
> +      unsigned int i[NUM];
> +      unsigned char c[NUM * 4];
> +    } dst, src;
> +  int i;
> +
> +  for (i = 0; i < NUM; i++)
> +    {
> +      src.c[(i % 4) + (i / 4) * 16] = i * i;
> +      if ((i % 4))
> +	src.c[(i % 4) + (i / 4) * 16] |= 0x80;
> +    }
> +
> +  for (i = 0; i < NUM; i += 4)
> +    dst.x [i / 4] = _mm_cvtepu8_epi32 (src.x [i / 4]);
> +
> +  for (i = 0; i < NUM; i++)
> +    if (src.c[(i % 4) + (i / 4) * 16] != dst.i[i])
> +      abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbq.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbq.c
> new file mode 100644
> index 000000000000..7d59300f820b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbq.c
> @@ -0,0 +1,43 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx -Wno-psabi" } */
> +
> +#ifndef CHECK_H
> +#define CHECK_H "sse4_1-check.h"
> +#endif
> +
> +#ifndef TEST
> +#define TEST sse4_1_test
> +#endif
> +
> +#include CHECK_H
> +
> +#include <smmintrin.h>
> +
> +#define NUM 128
> +
> +static void
> +TEST (void)
> +{
> +  union
> +    {
> +      __m128i x[NUM / 2];
> +      unsigned long long ll[NUM];
> +      unsigned char c[NUM * 8];
> +    } dst, src;
> +  int i;
> +
> +  for (i = 0; i < NUM; i++)
> +    {
> +      src.c[(i % 2) + (i / 2) * 16] = i * i;
> +      if ((i % 2))
> +	src.c[(i % 2) + (i / 2) * 16] |= 0x80;
> +    }
> +
> +  for (i = 0; i < NUM; i += 2)
> +    dst.x [i / 2] = _mm_cvtepu8_epi64 (src.x [i / 2]);
> +
> +  for (i = 0; i < NUM; i++)
> +    if (src.c[(i % 2) + (i / 2) * 16] != dst.ll[i])
> +      abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbw.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbw.c
> new file mode 100644
> index 000000000000..c3963698db0b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxbw.c
> @@ -0,0 +1,43 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx -Wno-psabi" } */
> +
> +#ifndef CHECK_H
> +#define CHECK_H "sse4_1-check.h"
> +#endif
> +
> +#ifndef TEST
> +#define TEST sse4_1_test
> +#endif
> +
> +#include CHECK_H
> +
> +#include <smmintrin.h>
> +
> +#define NUM 128
> +
> +static void
> +TEST (void)
> +{
> +  union
> +    {
> +      __m128i x[NUM / 8];
> +      unsigned short s[NUM];
> +      unsigned char c[NUM * 2];
> +    } dst, src;
> +  int i;
> +
> +  for (i = 0; i < NUM; i++)
> +    {
> +      src.c[(i % 8) + (i / 8) * 16] = i * i;
> +      if ((i % 4))
> +	src.c[(i % 8) + (i / 8) * 16] |= 0x80;
> +    }
> +
> +  for (i = 0; i < NUM; i += 8)
> +    dst.x [i / 8] = _mm_cvtepu8_epi16 (src.x [i / 8]);
> +
> +  for (i = 0; i < NUM; i++)
> +    if (src.c[(i % 8) + (i / 8) * 16] != dst.s[i])
> +      abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxdq.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxdq.c
> new file mode 100644
> index 000000000000..bc05089a7e1a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxdq.c
> @@ -0,0 +1,43 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx -Wno-psabi" } */
> +
> +#ifndef CHECK_H
> +#define CHECK_H "sse4_1-check.h"
> +#endif
> +
> +#ifndef TEST
> +#define TEST sse4_1_test
> +#endif
> +
> +#include CHECK_H
> +
> +#include <smmintrin.h>
> +
> +#define NUM 128
> +
> +static void
> +TEST (void)
> +{
> +  union
> +    {
> +      __m128i x[NUM / 2];
> +      unsigned long long ll[NUM];
> +      unsigned int i[NUM * 2];
> +    } dst, src;
> +  int i;
> +
> +  for (i = 0; i < NUM; i++)
> +    {
> +      src.i[(i % 2) + (i / 2) * 4] = i * i;
> +      if ((i % 2))
> +        src.i[(i % 2) + (i / 2) * 4] |= 0x80000000;
> +    }
> +
> +  for (i = 0; i < NUM; i += 2)
> +    dst.x [i / 2] = _mm_cvtepu32_epi64 (src.x [i / 2]);
> +
> +  for (i = 0; i < NUM; i++)
> +    if (src.i[(i % 2) + (i / 2) * 4] != dst.ll[i])
> +      abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwd.c
> new file mode 100644
> index 000000000000..a952d028e1e0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwd.c
> @@ -0,0 +1,43 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx -Wno-psabi" } */
> +
> +#ifndef CHECK_H
> +#define CHECK_H "sse4_1-check.h"
> +#endif
> +
> +#ifndef TEST
> +#define TEST sse4_1_test
> +#endif
> +
> +#include CHECK_H
> +
> +#include <smmintrin.h>
> +
> +#define NUM 128
> +
> +static void
> +TEST (void)
> +{
> +  union
> +    {
> +      __m128i x[NUM / 4];
> +      unsigned int i[NUM];
> +      unsigned short s[NUM * 2];
> +    } dst, src;
> +  int i;
> +
> +  for (i = 0; i < NUM; i++)
> +    {
> +      src.s[(i % 4) + (i / 4) * 8] = i * i;
> +      if ((i % 4))
> +	src.s[(i % 4) + (i / 4) * 8] |= 0x8000;
> +    }
> +
> +  for (i = 0; i < NUM; i += 4)
> +    dst.x [i / 4] = _mm_cvtepu16_epi32 (src.x [i / 4]);
> +
> +  for (i = 0; i < NUM; i++)
> +    if (src.s[(i % 4) + (i / 4) * 8] != dst.i[i])
> +      abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwq.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwq.c
> new file mode 100644
> index 000000000000..1ae5857fe0ad
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmovzxwq.c
> @@ -0,0 +1,43 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx -Wno-psabi" } */
> +
> +#ifndef CHECK_H
> +#define CHECK_H "sse4_1-check.h"
> +#endif
> +
> +#ifndef TEST
> +#define TEST sse4_1_test
> +#endif
> +
> +#include CHECK_H
> +
> +#include <smmintrin.h>
> +
> +#define NUM 128
> +
> +static void
> +TEST (void)
> +{
> +  union
> +    {
> +      __m128i x[NUM / 2];
> +      unsigned long long ll[NUM];
> +      unsigned short s[NUM * 4];
> +    } dst, src;
> +  int i;
> +
> +  for (i = 0; i < NUM; i++)
> +    {
> +      src.s[(i % 2) + (i / 2) * 8] = i * i;
> +      if ((i % 2))
> +	src.s[(i % 2) + (i / 2) * 8] |= 0x8000;
> +    }
> +
> +  for (i = 0; i < NUM; i += 2)
> +    dst.x [i / 2] = _mm_cvtepu16_epi64 (src.x [i / 2]);
> +
> +  for (i = 0; i < NUM; i++)
> +    if (src.s[(i % 2) + (i / 2) * 8] != dst.ll[i])
> +      abort ();
> +}

