public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/4] rs6000: Add SSE4.1 "test" and "blend" intrinsics
@ 2021-06-29 18:08 Paul A. Clarke
  2021-06-29 18:08 ` [PATCH 1/4] rs6000: Add support for SSE4.1 "test" intrinsics Paul A. Clarke
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Paul A. Clarke @ 2021-06-29 18:08 UTC
  To: gcc-patches; +Cc: segher

Paul A. Clarke (4):
  rs6000: Add support for SSE4.1 "test" intrinsics
  rs6000: Add tests for SSE4.1 "test" intrinsics
  rs6000: Add support for SSE4.1 "blend" intrinsics
  rs6000: Add tests for SSE4.1 "blend" intrinsics

 gcc/config/rs6000/smmintrin.h                 |  96 ++++++++++++++
 .../gcc.target/powerpc/sse4_1-blendpd.c       |  89 +++++++++++++
 .../gcc.target/powerpc/sse4_1-blendps-2.c     |  81 ++++++++++++
 .../gcc.target/powerpc/sse4_1-blendps.c       |  90 ++++++++++++++
 .../gcc.target/powerpc/sse4_1-blendvpd.c      |  65 ++++++++++
 .../gcc.target/powerpc/sse4_1-ptest-1.c       | 117 ++++++++++++++++++
 6 files changed, 538 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ptest-1.c

-- 
2.27.0



* [PATCH 1/4] rs6000: Add support for SSE4.1 "test" intrinsics
  2021-06-29 18:08 [PATCH 0/4] rs6000: Add SSE4.1 "test" and "blend" intrinsics Paul A. Clarke
@ 2021-06-29 18:08 ` Paul A. Clarke
  2021-07-11 15:45   ` Bill Schmidt
  2021-06-29 18:08 ` [PATCH 2/4] rs6000: Add tests " Paul A. Clarke
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Paul A. Clarke @ 2021-06-29 18:08 UTC
  To: gcc-patches; +Cc: segher

2021-06-29  Paul A. Clarke  <pc@us.ibm.com>

gcc/ChangeLog:
        * config/rs6000/smmintrin.h (_mm_testz_si128, _mm_testc_si128,
	_mm_testnzc_si128, _mm_test_all_ones, _mm_test_all_zeros,
	_mm_test_mix_ones_zeros): New.
---
 gcc/config/rs6000/smmintrin.h | 50 +++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index bdf6eb365d88..1b8cad135ed0 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -116,4 +116,54 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
   return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
 }
 
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_testz_si128 (__m128i __A, __m128i __B)
+{
+  /* Note: This implementation does NOT set "zero" or "carry" flags.  */
+  const __v16qu __zero = {0};
+  return vec_all_eq (vec_and ((__v16qu) __A, (__v16qu) __B), __zero);
+}
+
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_testc_si128 (__m128i __A, __m128i __B)
+{
+  /* Note: This implementation does NOT set "zero" or "carry" flags.  */
+  const __v16qu __zero = {0};
+  const __v16qu __notA = vec_nor ((__v16qu) __A, (__v16qu) __A);
+  return vec_all_eq (vec_and ((__v16qu) __notA, (__v16qu) __B), __zero);
+}
+
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_testnzc_si128 (__m128i __A, __m128i __B)
+{
+  /* Note: This implementation does NOT set "zero" or "carry" flags.  */
+  return _mm_testz_si128 (__A, __B) == 0 && _mm_testc_si128 (__A, __B) == 0;
+}
+
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_test_all_zeros (__m128i __A, __m128i __mask)
+{
+  const __v16qu __zero = {0};
+  return vec_all_eq (vec_and ((__v16qu) __A, (__v16qu) __mask), __zero);
+}
+
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_test_all_ones (__m128i __A)
+{
+  const __v16qu __ones = vec_splats ((unsigned char) 0xff);
+  return vec_all_eq ((__v16qu) __A, __ones);
+}
+
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
+{
+  const __v16qu __zero = {0};
+  const __v16qu __Amasked = vec_and ((__v16qu) __A, (__v16qu) __mask);
+  const int any_ones = vec_any_ne (__Amasked, __zero);
+  const __v16qu __notA = vec_nor ((__v16qu) __A, (__v16qu) __A);
+  const __v16qu __notAmasked = vec_and ((__v16qu) __notA, (__v16qu) __mask);
+  const int any_zeros = vec_any_ne (__notAmasked, __zero);
+  return any_ones * any_zeros;
+}
+
 #endif
-- 
2.27.0



* [PATCH 2/4] rs6000: Add tests for SSE4.1 "test" intrinsics
  2021-06-29 18:08 [PATCH 0/4] rs6000: Add SSE4.1 "test" and "blend" intrinsics Paul A. Clarke
  2021-06-29 18:08 ` [PATCH 1/4] rs6000: Add support for SSE4.1 "test" intrinsics Paul A. Clarke
@ 2021-06-29 18:08 ` Paul A. Clarke
  2021-07-11 15:49   ` Bill Schmidt
  2021-06-29 18:08 ` [PATCH 3/4] rs6000: Add support for SSE4.1 "blend" intrinsics Paul A. Clarke
  2021-06-29 18:08 ` [PATCH 4/4] rs6000: Add tests " Paul A. Clarke
  3 siblings, 1 reply; 16+ messages in thread
From: Paul A. Clarke @ 2021-06-29 18:08 UTC
  To: gcc-patches; +Cc: segher

Copy the test for _mm_testz_si128, _mm_testc_si128,
_mm_testnzc_si128, _mm_test_all_ones, _mm_test_all_zeros,
_mm_test_mix_ones_zeros from gcc/testsuite/gcc.target/i386.

2021-06-29  Paul A. Clarke  <pc@us.ibm.com>

gcc/testsuite/ChangeLog:
        * gcc.target/powerpc/sse4_1-ptest-1.c: Copy from
	gcc/testsuite/gcc.target/i386.
---
 .../gcc.target/powerpc/sse4_1-ptest-1.c       | 117 ++++++++++++++++++
 1 file changed, 117 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ptest-1.c

diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-ptest-1.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-ptest-1.c
new file mode 100644
index 000000000000..69d13d57770d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-ptest-1.c
@@ -0,0 +1,117 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse4_1_test
+#endif
+
+#include CHECK_H
+
+#include <smmintrin.h>
+
+static int
+make_ptestz (__m128i m, __m128i v)
+{
+  union
+    {
+      __m128i x;
+      unsigned char c[16];
+    } val, mask;
+  int i, z;
+
+  mask.x = m;
+  val.x = v;
+
+  z = 1;
+  for (i = 0; i < 16; i++)
+    if ((mask.c[i] & val.c[i]))
+      {
+	z = 0;
+	break;
+      }
+  return z;
+}
+
+static int
+make_ptestc (__m128i m, __m128i v)
+{
+  union
+    {
+      __m128i x;
+      unsigned char c[16];
+    } val, mask;
+  int i, c;
+
+  mask.x = m;
+  val.x = v;
+
+  c = 1;
+  for (i = 0; i < 16; i++)
+    if ((val.c[i] & ~mask.c[i]))
+      {
+	c = 0;
+	break;
+      }
+  return c;
+}
+
+static void
+TEST (void)
+{
+  union
+    {
+      __m128i x;
+      unsigned int i[4];
+    } val[4];
+  int i, j, l;
+  int res[32];
+
+  val[0].i[0] = 0x11111111;
+  val[0].i[1] = 0x00000000;
+  val[0].i[2] = 0x00000000;
+  val[0].i[3] = 0x11111111;
+    
+  val[1].i[0] = 0x00000000;
+  val[1].i[1] = 0x11111111;
+  val[1].i[2] = 0x11111111;
+  val[1].i[3] = 0x00000000;
+
+  val[2].i[0] = 0;
+  val[2].i[1] = 0;
+  val[2].i[2] = 0;
+  val[2].i[3] = 0;
+
+  val[3].i[0] = 0xffffffff;
+  val[3].i[1] = 0xffffffff;
+  val[3].i[2] = 0xffffffff;
+  val[3].i[3] = 0xffffffff;
+
+  l = 0;
+  for(i = 0; i < 4; i++)
+    for(j = 0; j < 4; j++)
+      {
+	res[l++] = _mm_testz_si128 (val[j].x, val[i].x);
+	res[l++] = _mm_testc_si128 (val[j].x, val[i].x);
+      }
+
+  l = 0;
+  for(i = 0; i < 4; i++)
+    for(j = 0; j < 4; j++)
+      {
+	if (res[l++] != make_ptestz (val[j].x, val[i].x))
+	  abort ();
+	if (res[l++] != make_ptestc (val[j].x, val[i].x))
+	  abort ();
+      }
+
+  if (res[2] != _mm_testz_si128 (val[1].x, val[0].x))
+    abort ();
+
+  if (res[3] != _mm_testc_si128 (val[1].x, val[0].x))
+    abort ();
+}
-- 
2.27.0



* [PATCH 3/4] rs6000: Add support for SSE4.1 "blend" intrinsics
  2021-06-29 18:08 [PATCH 0/4] rs6000: Add SSE4.1 "test" and "blend" intrinsics Paul A. Clarke
  2021-06-29 18:08 ` [PATCH 1/4] rs6000: Add support for SSE4.1 "test" intrinsics Paul A. Clarke
  2021-06-29 18:08 ` [PATCH 2/4] rs6000: Add tests " Paul A. Clarke
@ 2021-06-29 18:08 ` Paul A. Clarke
  2021-07-11 16:17   ` Bill Schmidt
  2021-06-29 18:08 ` [PATCH 4/4] rs6000: Add tests " Paul A. Clarke
  3 siblings, 1 reply; 16+ messages in thread
From: Paul A. Clarke @ 2021-06-29 18:08 UTC
  To: gcc-patches; +Cc: segher

_mm_blend_epi16 and _mm_blendv_epi8 were added earlier.
Add these four to complete the set.

2021-06-29  Paul A. Clarke  <pc@us.ibm.com>

gcc/ChangeLog:
	* config/rs6000/smmintrin.h (_mm_blend_pd, _mm_blendv_pd,
	_mm_blend_ps, _mm_blendv_ps): New.
---
 gcc/config/rs6000/smmintrin.h | 46 +++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 1b8cad135ed0..fa17a8b2f478 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -116,6 +116,52 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
   return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
 }
 
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_blend_pd (__m128d __A, __m128d __B, const int __imm8)
+{
+  const signed char __tmp = (__imm8 & 0b10) * 0b01111000 |
+			    (__imm8 & 0b01) * 0b00001111;
+  __v16qi __charmask = vec_splats ((signed char) __tmp);
+  __charmask = vec_gb (__charmask);
+  __v8hu __shortmask = (__v8hu) vec_unpackh (__charmask);
+  #ifdef __BIG_ENDIAN__
+  __shortmask = vec_reve (__shortmask);
+  #endif
+  return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __shortmask);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
+{
+  const __v2di __zero = {0};
+  const vector __bool long long __boolmask = vec_cmplt ((__v2di) __mask, __zero);
+  return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __boolmask);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_blend_ps (__m128 __A, __m128 __B, const int __imm8)
+{
+  const signed char __mask = (__imm8 & 0b1000) * 0b00011000 |
+			     (__imm8 & 0b0100) * 0b00001100 |
+			     (__imm8 & 0b0010) * 0b00000110 |
+			     (__imm8 & 0b0001) * 0b00000011;
+  __v16qi __charmask = vec_splats ( __mask);
+  __charmask = vec_gb (__charmask);
+  __v8hu __shortmask = (__v8hu) vec_unpackh (__charmask);
+  #ifdef __BIG_ENDIAN__
+  __shortmask = vec_reve (__shortmask);
+  #endif
+  return (__m128) vec_sel ((__v8hu) __A, (__v8hu) __B, __shortmask);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask)
+{
+  const __v4si __zero = {0};
+  const vector __bool int __boolmask = vec_cmplt ((__v4si) __mask, __zero);
+  return (__m128) vec_sel ((__v4su) __A, (__v4su) __B, (__v4su) __boolmask);
+}
+
 extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_testz_si128 (__m128i __A, __m128i __B)
 {
-- 
2.27.0



* [PATCH 4/4] rs6000: Add tests for SSE4.1 "blend" intrinsics
  2021-06-29 18:08 [PATCH 0/4] rs6000: Add SSE4.1 "test" and "blend" intrinsics Paul A. Clarke
                   ` (2 preceding siblings ...)
  2021-06-29 18:08 ` [PATCH 3/4] rs6000: Add support for SSE4.1 "blend" intrinsics Paul A. Clarke
@ 2021-06-29 18:08 ` Paul A. Clarke
  2021-07-11 16:19   ` Bill Schmidt
  3 siblings, 1 reply; 16+ messages in thread
From: Paul A. Clarke @ 2021-06-29 18:08 UTC
  To: gcc-patches; +Cc: segher

Copy the tests for _mm_blend_pd, _mm_blendv_pd, _mm_blend_ps,
_mm_blendv_ps from gcc/testsuite/gcc.target/i386.

2021-06-29  Paul A. Clarke  <pc@us.ibm.com>

gcc/testsuite/ChangeLog:
	* gcc.target/powerpc/sse4_1-blendpd.c: Copy
	from gcc/testsuite/gcc.target/i386.
	* gcc.target/powerpc/sse4_1-blendps-2.c: Likewise.
	* gcc.target/powerpc/sse4_1-blendps.c: Likewise.
	* gcc.target/powerpc/sse4_1-blendvpd.c: Likewise.
---
 .../gcc.target/powerpc/sse4_1-blendpd.c       | 89 ++++++++++++++++++
 .../gcc.target/powerpc/sse4_1-blendps-2.c     | 81 +++++++++++++++++
 .../gcc.target/powerpc/sse4_1-blendps.c       | 90 +++++++++++++++++++
 .../gcc.target/powerpc/sse4_1-blendvpd.c      | 65 ++++++++++++++
 4 files changed, 325 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c

diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
new file mode 100644
index 000000000000..ca1780471fa2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
@@ -0,0 +1,89 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse4_1_test
+#endif
+
+#include CHECK_H
+
+#include <smmintrin.h>
+#include <string.h>
+
+#define NUM 20
+
+#ifndef MASK
+#define MASK 0x03
+#endif
+
+static void
+init_blendpd (double *src1, double *src2)
+{
+  int i, sign = 1;
+
+  for (i = 0; i < NUM * 2; i++)
+    {
+      src1[i] = i * i * sign;
+      src2[i] = (i + 20) * sign;
+      sign = -sign;
+    }
+}
+
+static int
+check_blendpd (__m128d *dst, double *src1, double *src2)
+{
+  double tmp[2];
+  int j;
+
+  memcpy (&tmp[0], src1, sizeof (tmp));
+
+  for(j = 0; j < 2; j++)
+    if ((MASK & (1 << j)))
+      tmp[j] = src2[j];
+
+  return memcmp (dst, &tmp[0], sizeof (tmp));
+}
+
+static void
+TEST (void)
+{
+  __m128d x, y;
+  union
+    {
+      __m128d x[NUM];
+      double d[NUM * 2];
+    } dst, src1, src2;
+  union
+    {
+      __m128d x;
+      double d[2];
+    } src3;
+  int i;
+
+  init_blendpd (src1.d, src2.d);
+
+  /* Check blendpd imm8, m128, xmm */
+  for (i = 0; i < NUM; i++)
+    {
+      dst.x[i] = _mm_blend_pd (src1.x[i], src2.x[i], MASK);
+      if (check_blendpd (&dst.x[i], &src1.d[i * 2], &src2.d[i * 2]))
+	abort ();
+    }
+    
+  /* Check blendpd imm8, xmm, xmm */
+  src3.x = _mm_setzero_pd ();
+
+  x = _mm_blend_pd (dst.x[2], src3.x, MASK);
+  y = _mm_blend_pd (src3.x, dst.x[2], MASK);
+
+  if (check_blendpd (&x, &dst.d[4], &src3.d[0]))
+    abort ();
+
+  if (check_blendpd (&y, &src3.d[0], &dst.d[4]))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
new file mode 100644
index 000000000000..768b6e64bbae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
@@ -0,0 +1,81 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#include "sse4_1-check.h"
+
+#include <smmintrin.h>
+#include <string.h>
+#include <stdlib.h>
+
+#define NUM 20
+
+#undef MASK
+#define MASK 0xe
+
+static void
+init_blendps (float *src1, float *src2)
+{
+  int i, sign = 1;
+
+  for (i = 0; i < NUM * 4; i++)
+    {
+      src1[i] = i * i * sign;
+      src2[i] = (i + 20) * sign;
+      sign = -sign;
+    }
+}
+
+static int
+check_blendps (__m128 *dst, float *src1, float *src2)
+{
+  float tmp[4];
+  int j;
+
+  memcpy (&tmp[0], src1, sizeof (tmp));
+  for (j = 0; j < 4; j++)
+    if ((MASK & (1 << j)))
+      tmp[j] = src2[j];
+
+  return memcmp (dst, &tmp[0], sizeof (tmp));
+}
+
+static void
+sse4_1_test (void)
+{
+  __m128 x, y;
+  union
+    {
+      __m128 x[NUM];
+      float f[NUM * 4];
+    } dst, src1, src2;
+  union
+    {
+      __m128 x;
+      float f[4];
+    } src3;
+  int i;
+
+  init_blendps (src1.f, src2.f);
+
+  for (i = 0; i < 4; i++)
+    src3.f[i] = (int) rand ();
+
+  /* Check blendps imm8, m128, xmm */
+  for (i = 0; i < NUM; i++)
+    {
+      dst.x[i] = _mm_blend_ps (src1.x[i], src2.x[i], MASK); 
+      if (check_blendps (&dst.x[i], &src1.f[i * 4], &src2.f[i * 4]))
+	abort ();
+    }
+    
+   /* Check blendps imm8, xmm, xmm */
+  x = _mm_blend_ps (dst.x[2], src3.x, MASK);
+  y = _mm_blend_ps (src3.x, dst.x[2], MASK);
+
+  if (check_blendps (&x, &dst.f[8], &src3.f[0]))
+    abort ();
+
+  if (check_blendps (&y, &src3.f[0], &dst.f[8]))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c
new file mode 100644
index 000000000000..2f114b69a84b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c
@@ -0,0 +1,90 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse4_1_test
+#endif
+
+#include CHECK_H
+
+#include <smmintrin.h>
+#include <string.h>
+#include <stdlib.h>
+
+#define NUM 20
+
+#ifndef MASK
+#define MASK 0x0f
+#endif
+
+static void
+init_blendps (float *src1, float *src2)
+{
+  int i, sign = 1;
+
+  for (i = 0; i < NUM * 4; i++)
+    {
+      src1[i] = i * i * sign;
+      src2[i] = (i + 20) * sign;
+      sign = -sign;
+    }
+}
+
+static int
+check_blendps (__m128 *dst, float *src1, float *src2)
+{
+  float tmp[4];
+  int j;
+
+  memcpy (&tmp[0], src1, sizeof (tmp));
+  for (j = 0; j < 4; j++)
+    if ((MASK & (1 << j)))
+      tmp[j] = src2[j];
+
+  return memcmp (dst, &tmp[0], sizeof (tmp));
+}
+
+static void
+TEST (void)
+{
+  __m128 x, y;
+  union
+    {
+      __m128 x[NUM];
+      float f[NUM * 4];
+    } dst, src1, src2;
+  union
+    {
+      __m128 x;
+      float f[4];
+    } src3;
+  int i;
+
+  init_blendps (src1.f, src2.f);
+
+  for (i = 0; i < 4; i++)
+    src3.f[i] = (int) rand ();
+
+  /* Check blendps imm8, m128, xmm */
+  for (i = 0; i < NUM; i++)
+    {
+      dst.x[i] = _mm_blend_ps (src1.x[i], src2.x[i], MASK); 
+      if (check_blendps (&dst.x[i], &src1.f[i * 4], &src2.f[i * 4]))
+	abort ();
+    }
+    
+   /* Check blendps imm8, xmm, xmm */
+  x = _mm_blend_ps (dst.x[2], src3.x, MASK);
+  y = _mm_blend_ps (src3.x, dst.x[2], MASK);
+
+  if (check_blendps (&x, &dst.f[8], &src3.f[0]))
+    abort ();
+
+  if (check_blendps (&y, &src3.f[0], &dst.f[8]))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c
new file mode 100644
index 000000000000..b82cd28848a6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c
@@ -0,0 +1,65 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#include "sse4_1-check.h"
+
+#include <smmintrin.h>
+#include <string.h>
+
+#define NUM 20
+
+static void
+init_blendvpd (double *src1, double *src2, double *mask)
+{
+  int i, msk, sign = 1; 
+
+  msk = -1;
+  for (i = 0; i < NUM * 2; i++)
+    {
+      if((i % 2) == 0)
+	msk++;
+      src1[i] = i* (i + 1) * sign;
+      src2[i] = (i + 20) * sign;
+      mask[i] = (i + 120) * i;
+      if( (msk & (1 << (i % 2))))
+	mask[i] = -mask[i];
+      sign = -sign;
+    }
+}
+
+static int
+check_blendvpd (__m128d *dst, double *src1, double *src2,
+		double *mask)
+{
+  double tmp[2];
+  int j;
+
+  memcpy (&tmp[0], src1, sizeof (tmp));
+  for (j = 0; j < 2; j++)
+    if (mask [j] < 0.0)
+      tmp[j] = src2[j];
+
+  return memcmp (dst, &tmp[0], sizeof (tmp));
+}
+
+static void
+sse4_1_test (void)
+{
+  union
+    {
+      __m128d x[NUM];
+      double d[NUM * 2];
+    } dst, src1, src2, mask;
+  int i;
+
+  init_blendvpd (src1.d, src2.d, mask.d);
+
+  for (i = 0; i < NUM; i++)
+    {
+      dst.x[i] = _mm_blendv_pd (src1.x[i], src2.x[i], mask.x[i]);
+      if (check_blendvpd (&dst.x[i], &src1.d[i * 2], &src2.d[i * 2],
+			  &mask.d[i * 2]))
+	abort ();
+    }
+}
-- 
2.27.0



* Re: [PATCH 1/4] rs6000: Add support for SSE4.1 "test" intrinsics
  2021-06-29 18:08 ` [PATCH 1/4] rs6000: Add support for SSE4.1 "test" intrinsics Paul A. Clarke
@ 2021-07-11 15:45   ` Bill Schmidt
  2021-07-12 22:24     ` Segher Boessenkool
  0 siblings, 1 reply; 16+ messages in thread
From: Bill Schmidt @ 2021-07-11 15:45 UTC
  To: Paul A. Clarke, gcc-patches; +Cc: segher

Hi Paul,

On 6/29/21 1:08 PM, Paul A. Clarke via Gcc-patches wrote:
> 2021-06-29  Paul A. Clarke  <pc@us.ibm.com>
>
> gcc/ChangeLog:
>          * config/rs6000/smmintrin.h (_mm_testz_si128, _mm_testc_si128,
> 	_mm_testnzc_si128, _mm_test_all_ones, _mm_test_all_zeros,
> 	_mm_test_mix_ones_zeros): New.
> ---
>   gcc/config/rs6000/smmintrin.h | 50 +++++++++++++++++++++++++++++++++++
>   1 file changed, 50 insertions(+)
>
> diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
> index bdf6eb365d88..1b8cad135ed0 100644
> --- a/gcc/config/rs6000/smmintrin.h
> +++ b/gcc/config/rs6000/smmintrin.h
> @@ -116,4 +116,54 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
>     return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
>   }
>
> +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))

Line too long, please fix here and below.  (Existing cases can be left.)

> +_mm_testz_si128 (__m128i __A, __m128i __B)
> +{
> +  /* Note: This implementation does NOT set "zero" or "carry" flags.  */

This is reasonable; thanks for documenting.

LGTM; I can't approve, but recommend approval with line lengths fixed.  
Thanks!
Bill

> +  const __v16qu __zero = {0};
> +  return vec_all_eq (vec_and ((__v16qu) __A, (__v16qu) __B), __zero);
> +}
> +
> +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_testc_si128 (__m128i __A, __m128i __B)
> +{
> +  /* Note: This implementation does NOT set "zero" or "carry" flags.  */
> +  const __v16qu __zero = {0};
> +  const __v16qu __notA = vec_nor ((__v16qu) __A, (__v16qu) __A);
> +  return vec_all_eq (vec_and ((__v16qu) __notA, (__v16qu) __B), __zero);
> +}
> +
> +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_testnzc_si128 (__m128i __A, __m128i __B)
> +{
> +  /* Note: This implementation does NOT set "zero" or "carry" flags.  */
> +  return _mm_testz_si128 (__A, __B) == 0 && _mm_testc_si128 (__A, __B) == 0;
> +}
> +
> +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_test_all_zeros (__m128i __A, __m128i __mask)
> +{
> +  const __v16qu __zero = {0};
> +  return vec_all_eq (vec_and ((__v16qu) __A, (__v16qu) __mask), __zero);
> +}
> +
> +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_test_all_ones (__m128i __A)
> +{
> +  const __v16qu __ones = vec_splats ((unsigned char) 0xff);
> +  return vec_all_eq ((__v16qu) __A, __ones);
> +}
> +
> +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
> +{
> +  const __v16qu __zero = {0};
> +  const __v16qu __Amasked = vec_and ((__v16qu) __A, (__v16qu) __mask);
> +  const int any_ones = vec_any_ne (__Amasked, __zero);
> +  const __v16qu __notA = vec_nor ((__v16qu) __A, (__v16qu) __A);
> +  const __v16qu __notAmasked = vec_and ((__v16qu) __notA, (__v16qu) __mask);
> +  const int any_zeros = vec_any_ne (__notAmasked, __zero);
> +  return any_ones * any_zeros;
> +}
> +
>   #endif
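
As a reading aid for the review above, here is a minimal scalar sketch of
what the "test" intrinsics are expected to return, assuming the usual x86
PTEST definitions (ZF is set when A & B is all zeros, CF is set when
~A & B is all zeros).  The u128 type and the ref_* names are hypothetical
and exist only for this illustration; this is not the code under review.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for __m128i: a 128-bit value as two 64-bit halves.  */
typedef struct { uint64_t lo, hi; } u128;

/* Models the PTEST "zero" result: 1 when (a & b) has no bit set.  */
static int ref_testz (u128 a, u128 b)
{
  return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

/* Models the PTEST "carry" result: 1 when (~a & b) has no bit set.  */
static int ref_testc (u128 a, u128 b)
{
  return ((~a.lo & b.lo) | (~a.hi & b.hi)) == 0;
}

/* 1 only when both of the results above are 0.  */
static int ref_testnzc (u128 a, u128 b)
{
  return !ref_testz (a, b) && !ref_testc (a, b);
}

/* A few spot checks using patterns similar to those in the test case.  */
static void ref_test_selfcheck (void)
{
  const u128 zero = { 0, 0 };
  const u128 ones = { UINT64_MAX, UINT64_MAX };
  const u128 mixed = { 0x11111111u, 0 };

  assert (ref_testz (zero, ones) == 1);
  assert (ref_testz (mixed, ones) == 0);
  assert (ref_testc (ones, mixed) == 1);
  assert (ref_testc (zero, mixed) == 0);
  assert (ref_testnzc (mixed, ones) == 1);
  assert (ref_testnzc (zero, ones) == 0);
}
```

On x86, _mm_test_all_zeros, _mm_test_all_ones and _mm_test_mix_ones_zeros
are thin wrappers over testz, testc (against an all-ones vector) and
testnzc respectively, which is why three helpers are enough here.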


* Re: [PATCH 2/4] rs6000: Add tests for SSE4.1 "test" intrinsics
  2021-06-29 18:08 ` [PATCH 2/4] rs6000: Add tests " Paul A. Clarke
@ 2021-07-11 15:49   ` Bill Schmidt
  2021-07-12 22:39     ` Segher Boessenkool
  0 siblings, 1 reply; 16+ messages in thread
From: Bill Schmidt @ 2021-07-11 15:49 UTC
  To: Paul A. Clarke, gcc-patches; +Cc: segher

Hi Paul,

LGTM.  I can't approve, but recommend approval as is.

Thanks,
Bill

On 6/29/21 1:08 PM, Paul A. Clarke via Gcc-patches wrote:
> Copy the test for _mm_testz_si128, _mm_testc_si128,
> _mm_testnzc_si128, _mm_test_all_ones, _mm_test_all_zeros,
> _mm_test_mix_ones_zeros from gcc/testsuite/gcc.target/i386.
>
> 2021-06-29  Paul A. Clarke  <pc@us.ibm.com>
>
> gcc/testsuite/ChangeLog:
>          * gcc.target/powerpc/sse4_1-ptest-1.c: Copy from
> 	gcc/testsuite/gcc.target/i386.
> ---
>   .../gcc.target/powerpc/sse4_1-ptest-1.c       | 117 ++++++++++++++++++
>   1 file changed, 117 insertions(+)
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ptest-1.c
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-ptest-1.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-ptest-1.c
> new file mode 100644
> index 000000000000..69d13d57770d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-ptest-1.c
> @@ -0,0 +1,117 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target p8vector_hw } */
> +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> +
> +#ifndef CHECK_H
> +#define CHECK_H "sse4_1-check.h"
> +#endif
> +
> +#ifndef TEST
> +#define TEST sse4_1_test
> +#endif
> +
> +#include CHECK_H
> +
> +#include <smmintrin.h>
> +
> +static int
> +make_ptestz (__m128i m, __m128i v)
> +{
> +  union
> +    {
> +      __m128i x;
> +      unsigned char c[16];
> +    } val, mask;
> +  int i, z;
> +
> +  mask.x = m;
> +  val.x = v;
> +
> +  z = 1;
> +  for (i = 0; i < 16; i++)
> +    if ((mask.c[i] & val.c[i]))
> +      {
> +	z = 0;
> +	break;
> +      }
> +  return z;
> +}
> +
> +static int
> +make_ptestc (__m128i m, __m128i v)
> +{
> +  union
> +    {
> +      __m128i x;
> +      unsigned char c[16];
> +    } val, mask;
> +  int i, c;
> +
> +  mask.x = m;
> +  val.x = v;
> +
> +  c = 1;
> +  for (i = 0; i < 16; i++)
> +    if ((val.c[i] & ~mask.c[i]))
> +      {
> +	c = 0;
> +	break;
> +      }
> +  return c;
> +}
> +
> +static void
> +TEST (void)
> +{
> +  union
> +    {
> +      __m128i x;
> +      unsigned int i[4];
> +    } val[4];
> +  int i, j, l;
> +  int res[32];
> +
> +  val[0].i[0] = 0x11111111;
> +  val[0].i[1] = 0x00000000;
> +  val[0].i[2] = 0x00000000;
> +  val[0].i[3] = 0x11111111;
> +
> +  val[1].i[0] = 0x00000000;
> +  val[1].i[1] = 0x11111111;
> +  val[1].i[2] = 0x11111111;
> +  val[1].i[3] = 0x00000000;
> +
> +  val[2].i[0] = 0;
> +  val[2].i[1] = 0;
> +  val[2].i[2] = 0;
> +  val[2].i[3] = 0;
> +
> +  val[3].i[0] = 0xffffffff;
> +  val[3].i[1] = 0xffffffff;
> +  val[3].i[2] = 0xffffffff;
> +  val[3].i[3] = 0xffffffff;
> +
> +  l = 0;
> +  for(i = 0; i < 4; i++)
> +    for(j = 0; j < 4; j++)
> +      {
> +	res[l++] = _mm_testz_si128 (val[j].x, val[i].x);
> +	res[l++] = _mm_testc_si128 (val[j].x, val[i].x);
> +      }
> +
> +  l = 0;
> +  for(i = 0; i < 4; i++)
> +    for(j = 0; j < 4; j++)
> +      {
> +	if (res[l++] != make_ptestz (val[j].x, val[i].x))
> +	  abort ();
> +	if (res[l++] != make_ptestc (val[j].x, val[i].x))
> +	  abort ();
> +      }
> +
> +  if (res[2] != _mm_testz_si128 (val[1].x, val[0].x))
> +    abort ();
> +
> +  if (res[3] != _mm_testc_si128 (val[1].x, val[0].x))
> +    abort ();
> +}


* Re: [PATCH 3/4] rs6000: Add support for SSE4.1 "blend" intrinsics
  2021-06-29 18:08 ` [PATCH 3/4] rs6000: Add support for SSE4.1 "blend" intrinsics Paul A. Clarke
@ 2021-07-11 16:17   ` Bill Schmidt
  2021-07-11 16:29     ` Bill Schmidt
  0 siblings, 1 reply; 16+ messages in thread
From: Bill Schmidt @ 2021-07-11 16:17 UTC
  To: Paul A. Clarke, gcc-patches; +Cc: segher

Hi Paul,

On 6/29/21 1:08 PM, Paul A. Clarke via Gcc-patches wrote:
> _mm_blend_epi16 and _mm_blendv_epi8 were added earlier.
> Add these four to complete the set.
>
> 2021-06-29  Paul A. Clarke  <pc@us.ibm.com>
>
> gcc/ChangeLog:
> 	* config/rs6000/smmintrin.h (_mm_blend_pd, _mm_blendv_pd,
> 	_mm_blend_ps, _mm_blendv_ps): New.
> ---
>   gcc/config/rs6000/smmintrin.h | 46 +++++++++++++++++++++++++++++++++++
>   1 file changed, 46 insertions(+)
>
> diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
> index 1b8cad135ed0..fa17a8b2f478 100644
> --- a/gcc/config/rs6000/smmintrin.h
> +++ b/gcc/config/rs6000/smmintrin.h
> @@ -116,6 +116,52 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
>     return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
>   }
>
> +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))

Usual line length complaint. :)  Here and below...

> +_mm_blend_pd (__m128d __A, __m128d __B, const int __imm8)
> +{
> +  const signed char __tmp = (__imm8 & 0b10) * 0b01111000 |
> +			    (__imm8 & 0b01) * 0b00001111;
> +  __v16qi __charmask = vec_splats ((signed char) __tmp);
> +  __charmask = vec_gb (__charmask);
> +  __v8hu __shortmask = (__v8hu) vec_unpackh (__charmask);
> +  #ifdef __BIG_ENDIAN__
> +  __shortmask = vec_reve (__shortmask);
> +  #endif
> +  return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __shortmask);

This seems way too complex, and needs commentary to explain what you're 
doing.  Doesn't this instruction just translate into some form of 
xxpermdi?  Different ones for BE and LE, but still just xxpermdi, I think.
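
To make the question concrete, here is a minimal scalar sketch of the
lane selection that _mm_blend_pd is expected to perform; it mirrors the
check_blendpd helper in the accompanying test.  The f64x2 type and the
ref_blend_pd name are hypothetical, and lane 0 is assumed to be the
element that the test accesses as d[0].  With only two lanes, each of
the four imm8 values picks a fixed pair of doublewords, which is what
makes a single permute look plausible.

```c
#include <assert.h>

/* Hypothetical stand-in for __m128d: two double lanes, lane 0 first.  */
typedef struct { double lane[2]; } f64x2;

/* Bit j of imm8 set: take lane j from b; otherwise take it from a.  */
static f64x2 ref_blend_pd (f64x2 a, f64x2 b, int imm8)
{
  f64x2 r;
  for (int j = 0; j < 2; j++)
    r.lane[j] = ((imm8 >> j) & 1) ? b.lane[j] : a.lane[j];
  return r;
}

/* Walk through all four immediates.  */
static void ref_blend_pd_selfcheck (void)
{
  const f64x2 a = { { 1.0, 2.0 } };
  const f64x2 b = { { 3.0, 4.0 } };

  for (int imm8 = 0; imm8 < 4; imm8++)
    {
      f64x2 r = ref_blend_pd (a, b, imm8);
      assert (r.lane[0] == ((imm8 & 1) ? 3.0 : 1.0));
      assert (r.lane[1] == ((imm8 & 2) ? 4.0 : 2.0));
    }
}
```

Only the low two bits of imm8 matter in this model; higher bits are
ignored.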

> +}
> +
> +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
> +{
> +  const __v2di __zero = {0};
> +  const vector __bool long long __boolmask = vec_cmplt ((__v2di) __mask, __zero);
> +  return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __boolmask);
> +}

Okay.

> +
> +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_blend_ps (__m128 __A, __m128 __B, const int __imm8)
> +{
> +  const signed char __mask = (__imm8 & 0b1000) * 0b00011000 |
> +			     (__imm8 & 0b0100) * 0b00001100 |
> +			     (__imm8 & 0b0010) * 0b00000110 |
> +			     (__imm8 & 0b0001) * 0b00000011;
> +  __v16qi __charmask = vec_splats ( __mask);
> +  __charmask = vec_gb (__charmask);
> +  __v8hu __shortmask = (__v8hu) vec_unpackh (__charmask);

This is a good trick, but you need comments to explain what you're 
doing, including how you build __mask.  I recommend you include 
alternate code for P10, where you can just use vec_genwm to expand from 
__mask to a mask of word elements.

I don't understand how you're getting away with a v8hu mask for word 
elements.  This seems wrong to me.  Adequate testing?

> +  #ifdef __BIG_ENDIAN__
> +  __shortmask = vec_reve (__shortmask);
> +  #endif
> +  return (__m128) vec_sel ((__v8hu) __A, (__v8hu) __B, __shortmask);
> +}
> +
> +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask)
> +{
> +  const __v4si __zero = {0};
> +  const vector __bool int __boolmask = vec_cmplt ((__v4si) __mask, __zero);
> +  return (__m128) vec_sel ((__v4su) __A, (__v4su) __B, (__v4su) __boolmask);
> +}
> +

Okay.

Please have a look at the above issues and resubmit.  Thanks!
Bill

>   extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>   _mm_testz_si128 (__m128i __A, __m128i __B)
>   {
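
On the request to explain how the masks are built: the following is a
small, hedged sketch that reproduces only the integer arithmetic from
the patch (written in hex rather than binary literals) and checks it
against a straightforward bit-replication loop.  The function names are
hypothetical.  As far as the patch can be read, each bit of the 8-bit
result is later widened to one 16-bit lane, and since vec_sel selects
bit by bit, two adjacent all-ones halfwords would act as one word mask;
that part is an interpretation and is not modelled here.

```c
#include <assert.h>

/* Same arithmetic as __mask in the proposed _mm_blend_ps:
   bit j of imm8 ends up in bits 2j and 2j+1.  */
static unsigned char blend_ps_bytemask (int imm8)
{
  return (unsigned char) ((imm8 & 0x8) * 0x18
                          | (imm8 & 0x4) * 0x0c
                          | (imm8 & 0x2) * 0x06
                          | (imm8 & 0x1) * 0x03);
}

/* Same arithmetic as __tmp in the proposed _mm_blend_pd:
   bit j of imm8 ends up in bits 4j through 4j+3.  */
static unsigned char blend_pd_bytemask (int imm8)
{
  return (unsigned char) ((imm8 & 0x2) * 0x78 | (imm8 & 0x1) * 0x0f);
}

/* Reference: copy each of the low NBITS bits of imm8 REP times.  */
static unsigned char replicate_bits (int imm8, int nbits, int rep)
{
  unsigned int out = 0;
  for (int j = 0; j < nbits; j++)
    if ((imm8 >> j) & 1)
      for (int k = 0; k < rep; k++)
        out |= 1u << (j * rep + k);
  return (unsigned char) out;
}

/* Exhaustively compare the multiply trick with the reference.  */
static void bytemask_selfcheck (void)
{
  for (int imm8 = 0; imm8 < 16; imm8++)
    assert (blend_ps_bytemask (imm8) == replicate_bits (imm8, 4, 2));
  for (int imm8 = 0; imm8 < 4; imm8++)
    assert (blend_pd_bytemask (imm8) == replicate_bits (imm8, 2, 4));
}
```

The multiplications work because each masked term is either zero or a
single power of two, so multiplying by a run of ones shifts that run
into place.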


* Re: [PATCH 4/4] rs6000: Add tests for SSE4.1 "blend" intrinsics
  2021-06-29 18:08 ` [PATCH 4/4] rs6000: Add tests " Paul A. Clarke
@ 2021-07-11 16:19   ` Bill Schmidt
  2021-07-12 22:52     ` Segher Boessenkool
  0 siblings, 1 reply; 16+ messages in thread
From: Bill Schmidt @ 2021-07-11 16:19 UTC (permalink / raw)
  To: Paul A. Clarke, gcc-patches; +Cc: segher

Hi Paul,

Please resubmit this when you resubmit 3/4, in case any adjustments are 
needed.

Thanks!
Bill

On 6/29/21 1:08 PM, Paul A. Clarke via Gcc-patches wrote:
> Copy the tests for _mm_blend_pd, _mm_blendv_pd, _mm_blend_ps,
> _mm_blendv_ps from gcc/testsuite/gcc.target/i386.
>
> 2021-06-29  Paul A. Clarke  <pc@us.ibm.com>
>
> gcc/testsuite/ChangeLog:
> 	* gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c: Copy
> 	from gcc/testsuite/gcc.target/i386.
> 	* gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c: Likewise.
> 	* gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c: Likewise.
> 	* gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c: Likewise.
> ---
>   .../gcc.target/powerpc/sse4_1-blendpd.c       | 89 ++++++++++++++++++
>   .../gcc.target/powerpc/sse4_1-blendps-2.c     | 81 +++++++++++++++++
>   .../gcc.target/powerpc/sse4_1-blendps.c       | 90 +++++++++++++++++++
>   .../gcc.target/powerpc/sse4_1-blendvpd.c      | 65 ++++++++++++++
>   4 files changed, 325 insertions(+)
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
> new file mode 100644
> index 000000000000..ca1780471fa2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
> @@ -0,0 +1,89 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target p8vector_hw } */
> +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> +
> +#ifndef CHECK_H
> +#define CHECK_H "sse4_1-check.h"
> +#endif
> +
> +#ifndef TEST
> +#define TEST sse4_1_test
> +#endif
> +
> +#include CHECK_H
> +
> +#include <smmintrin.h>
> +#include <string.h>
> +
> +#define NUM 20
> +
> +#ifndef MASK
> +#define MASK 0x03
> +#endif
> +
> +static void
> +init_blendpd (double *src1, double *src2)
> +{
> +  int i, sign = 1;
> +
> +  for (i = 0; i < NUM * 2; i++)
> +    {
> +      src1[i] = i * i * sign;
> +      src2[i] = (i + 20) * sign;
> +      sign = -sign;
> +    }
> +}
> +
> +static int
> +check_blendpd (__m128d *dst, double *src1, double *src2)
> +{
> +  double tmp[2];
> +  int j;
> +
> +  memcpy (&tmp[0], src1, sizeof (tmp));
> +
> +  for(j = 0; j < 2; j++)
> +    if ((MASK & (1 << j)))
> +      tmp[j] = src2[j];
> +
> +  return memcmp (dst, &tmp[0], sizeof (tmp));
> +}
> +
> +static void
> +TEST (void)
> +{
> +  __m128d x, y;
> +  union
> +    {
> +      __m128d x[NUM];
> +      double d[NUM * 2];
> +    } dst, src1, src2;
> +  union
> +    {
> +      __m128d x;
> +      double d[2];
> +    } src3;
> +  int i;
> +
> +  init_blendpd (src1.d, src2.d);
> +
> +  /* Check blendpd imm8, m128, xmm */
> +  for (i = 0; i < NUM; i++)
> +    {
> +      dst.x[i] = _mm_blend_pd (src1.x[i], src2.x[i], MASK);
> +      if (check_blendpd (&dst.x[i], &src1.d[i * 2], &src2.d[i * 2]))
> +	abort ();
> +    }
> +
> +  /* Check blendpd imm8, xmm, xmm */
> +  src3.x = _mm_setzero_pd ();
> +
> +  x = _mm_blend_pd (dst.x[2], src3.x, MASK);
> +  y = _mm_blend_pd (src3.x, dst.x[2], MASK);
> +
> +  if (check_blendpd (&x, &dst.d[4], &src3.d[0]))
> +    abort ();
> +
> +  if (check_blendpd (&y, &src3.d[0], &dst.d[4]))
> +    abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
> new file mode 100644
> index 000000000000..768b6e64bbae
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
> @@ -0,0 +1,81 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target p8vector_hw } */
> +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> +
> +#include "sse4_1-check.h"
> +
> +#include <smmintrin.h>
> +#include <string.h>
> +#include <stdlib.h>
> +
> +#define NUM 20
> +
> +#undef MASK
> +#define MASK 0xe
> +
> +static void
> +init_blendps (float *src1, float *src2)
> +{
> +  int i, sign = 1;
> +
> +  for (i = 0; i < NUM * 4; i++)
> +    {
> +      src1[i] = i * i * sign;
> +      src2[i] = (i + 20) * sign;
> +      sign = -sign;
> +    }
> +}
> +
> +static int
> +check_blendps (__m128 *dst, float *src1, float *src2)
> +{
> +  float tmp[4];
> +  int j;
> +
> +  memcpy (&tmp[0], src1, sizeof (tmp));
> +  for (j = 0; j < 4; j++)
> +    if ((MASK & (1 << j)))
> +      tmp[j] = src2[j];
> +
> +  return memcmp (dst, &tmp[0], sizeof (tmp));
> +}
> +
> +static void
> +sse4_1_test (void)
> +{
> +  __m128 x, y;
> +  union
> +    {
> +      __m128 x[NUM];
> +      float f[NUM * 4];
> +    } dst, src1, src2;
> +  union
> +    {
> +      __m128 x;
> +      float f[4];
> +    } src3;
> +  int i;
> +
> +  init_blendps (src1.f, src2.f);
> +
> +  for (i = 0; i < 4; i++)
> +    src3.f[i] = (int) rand ();
> +
> +  /* Check blendps imm8, m128, xmm */
> +  for (i = 0; i < NUM; i++)
> +    {
> +      dst.x[i] = _mm_blend_ps (src1.x[i], src2.x[i], MASK);
> +      if (check_blendps (&dst.x[i], &src1.f[i * 4], &src2.f[i * 4]))
> +	abort ();
> +    }
> +
> +   /* Check blendps imm8, xmm, xmm */
> +  x = _mm_blend_ps (dst.x[2], src3.x, MASK);
> +  y = _mm_blend_ps (src3.x, dst.x[2], MASK);
> +
> +  if (check_blendps (&x, &dst.f[8], &src3.f[0]))
> +    abort ();
> +
> +  if (check_blendps (&y, &src3.f[0], &dst.f[8]))
> +    abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c
> new file mode 100644
> index 000000000000..2f114b69a84b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c
> @@ -0,0 +1,90 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target p8vector_hw } */
> +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> +
> +#ifndef CHECK_H
> +#define CHECK_H "sse4_1-check.h"
> +#endif
> +
> +#ifndef TEST
> +#define TEST sse4_1_test
> +#endif
> +
> +#include CHECK_H
> +
> +#include <smmintrin.h>
> +#include <string.h>
> +#include <stdlib.h>
> +
> +#define NUM 20
> +
> +#ifndef MASK
> +#define MASK 0x0f
> +#endif
> +
> +static void
> +init_blendps (float *src1, float *src2)
> +{
> +  int i, sign = 1;
> +
> +  for (i = 0; i < NUM * 4; i++)
> +    {
> +      src1[i] = i * i * sign;
> +      src2[i] = (i + 20) * sign;
> +      sign = -sign;
> +    }
> +}
> +
> +static int
> +check_blendps (__m128 *dst, float *src1, float *src2)
> +{
> +  float tmp[4];
> +  int j;
> +
> +  memcpy (&tmp[0], src1, sizeof (tmp));
> +  for (j = 0; j < 4; j++)
> +    if ((MASK & (1 << j)))
> +      tmp[j] = src2[j];
> +
> +  return memcmp (dst, &tmp[0], sizeof (tmp));
> +}
> +
> +static void
> +TEST (void)
> +{
> +  __m128 x, y;
> +  union
> +    {
> +      __m128 x[NUM];
> +      float f[NUM * 4];
> +    } dst, src1, src2;
> +  union
> +    {
> +      __m128 x;
> +      float f[4];
> +    } src3;
> +  int i;
> +
> +  init_blendps (src1.f, src2.f);
> +
> +  for (i = 0; i < 4; i++)
> +    src3.f[i] = (int) rand ();
> +
> +  /* Check blendps imm8, m128, xmm */
> +  for (i = 0; i < NUM; i++)
> +    {
> +      dst.x[i] = _mm_blend_ps (src1.x[i], src2.x[i], MASK);
> +      if (check_blendps (&dst.x[i], &src1.f[i * 4], &src2.f[i * 4]))
> +	abort ();
> +    }
> +
> +   /* Check blendps imm8, xmm, xmm */
> +  x = _mm_blend_ps (dst.x[2], src3.x, MASK);
> +  y = _mm_blend_ps (src3.x, dst.x[2], MASK);
> +
> +  if (check_blendps (&x, &dst.f[8], &src3.f[0]))
> +    abort ();
> +
> +  if (check_blendps (&y, &src3.f[0], &dst.f[8]))
> +    abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c
> new file mode 100644
> index 000000000000..b82cd28848a6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c
> @@ -0,0 +1,65 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target p8vector_hw } */
> +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> +
> +#include "sse4_1-check.h"
> +
> +#include <smmintrin.h>
> +#include <string.h>
> +
> +#define NUM 20
> +
> +static void
> +init_blendvpd (double *src1, double *src2, double *mask)
> +{
> +  int i, msk, sign = 1;
> +
> +  msk = -1;
> +  for (i = 0; i < NUM * 2; i++)
> +    {
> +      if((i % 2) == 0)
> +	msk++;
> +      src1[i] = i* (i + 1) * sign;
> +      src2[i] = (i + 20) * sign;
> +      mask[i] = (i + 120) * i;
> +      if( (msk & (1 << (i % 2))))
> +	mask[i] = -mask[i];
> +      sign = -sign;
> +    }
> +}
> +
> +static int
> +check_blendvpd (__m128d *dst, double *src1, double *src2,
> +		double *mask)
> +{
> +  double tmp[2];
> +  int j;
> +
> +  memcpy (&tmp[0], src1, sizeof (tmp));
> +  for (j = 0; j < 2; j++)
> +    if (mask [j] < 0.0)
> +      tmp[j] = src2[j];
> +
> +  return memcmp (dst, &tmp[0], sizeof (tmp));
> +}
> +
> +static void
> +sse4_1_test (void)
> +{
> +  union
> +    {
> +      __m128d x[NUM];
> +      double d[NUM * 2];
> +    } dst, src1, src2, mask;
> +  int i;
> +
> +  init_blendvpd (src1.d, src2.d, mask.d);
> +
> +  for (i = 0; i < NUM; i++)
> +    {
> +      dst.x[i] = _mm_blendv_pd (src1.x[i], src2.x[i], mask.x[i]);
> +      if (check_blendvpd (&dst.x[i], &src1.d[i * 2], &src2.d[i * 2],
> +			  &mask.d[i * 2]))
> +	abort ();
> +    }
> +}


* Re: [PATCH 3/4] rs6000: Add support for SSE4.1 "blend" intrinsics
  2021-07-11 16:17   ` Bill Schmidt
@ 2021-07-11 16:29     ` Bill Schmidt
  2021-07-12 21:29       ` Paul A. Clarke
  0 siblings, 1 reply; 16+ messages in thread
From: Bill Schmidt @ 2021-07-11 16:29 UTC (permalink / raw)
  To: Paul A. Clarke, gcc-patches; +Cc: segher

On 7/11/21 11:17 AM, Bill Schmidt wrote:
> Hi Paul,
>
> On 6/29/21 1:08 PM, Paul A. Clarke via Gcc-patches wrote:
>> _mm_blend_epi16 and _mm_blendv_epi8 were added earlier.
>> Add these four to complete the set.
>>
>> 2021-06-29  Paul A. Clarke  <pc@us.ibm.com>
>>
>> gcc/ChangeLog:
>>     * config/rs6000/smmintrin.h (_mm_blend_pd, _mm_blendv_pd,
>>     _mm_blend_ps, _mm_blendv_ps): New.
>> ---
>>   gcc/config/rs6000/smmintrin.h | 46 +++++++++++++++++++++++++++++++++++
>>   1 file changed, 46 insertions(+)
>>
>> diff --git a/gcc/config/rs6000/smmintrin.h 
>> b/gcc/config/rs6000/smmintrin.h
>> index 1b8cad135ed0..fa17a8b2f478 100644
>> --- a/gcc/config/rs6000/smmintrin.h
>> +++ b/gcc/config/rs6000/smmintrin.h
>> @@ -116,6 +116,52 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, 
>> __m128i __mask)
>>     return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
>>   }
>>
>> +extern __inline __m128d __attribute__((__gnu_inline__, 
>> __always_inline__, __artificial__))
> Usual line length complaint. :)  Here and below...
>> +_mm_blend_pd (__m128d __A, __m128d __B, const int __imm8)
>> +{
>> +  const signed char __tmp = (__imm8 & 0b10) * 0b01111000 |
>> +                (__imm8 & 0b01) * 0b00001111;
>> +  __v16qi __charmask = vec_splats ((signed char) __tmp);
>> +  __charmask = vec_gb (__charmask);
>> +  __v8hu __shortmask = (__v8hu) vec_unpackh (__charmask);
>> +  #ifdef __BIG_ENDIAN__
>> +  __shortmask = vec_reve (__shortmask);
>> +  #endif
>> +  return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) 
>> __shortmask);
>
> This seems way too complex, and needs commentary to explain what 
> you're doing.  Doesn't this instruction just translate into some form 
> of xxpermdi?  Different ones for BE and LE, but still just xxpermdi, I 
> think.
>
>> +}
>> +
>> +extern __inline __m128d __attribute__((__gnu_inline__, 
>> __always_inline__, __artificial__))
>> +_mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
>> +{
>> +  const __v2di __zero = {0};
>> +  const vector __bool long long __boolmask = vec_cmplt ((__v2di) 
>> __mask, __zero);
>> +  return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) 
>> __boolmask);
>> +}
>
> Okay.
>
>> +
>> +extern __inline __m128 __attribute__((__gnu_inline__, 
>> __always_inline__, __artificial__))
>> +_mm_blend_ps (__m128 __A, __m128 __B, const int __imm8)
>> +{
>> +  const signed char __mask = (__imm8 & 0b1000) * 0b00011000 |
>> +                 (__imm8 & 0b0100) * 0b00001100 |
>> +                 (__imm8 & 0b0010) * 0b00000110 |
>> +                 (__imm8 & 0b0001) * 0b00000011;
>> +  __v16qi __charmask = vec_splats ( __mask);
>> +  __charmask = vec_gb (__charmask);
>> +  __v8hu __shortmask = (__v8hu) vec_unpackh (__charmask);
>
> This is a good trick, but you need comments to explain what you're 
> doing, including how you build __mask.  I recommend you include 
> alternate code for P10, where you can just use vec_genwm to expand 
> from __mask to a mask of word elements.
>
> I don't understand how you're getting away with a v8hu mask for word 
> elements.  This seems wrong to me.  Adequate testing?

As an alternate approach, I suppose you could use vec_perm / vec_permr 
with one of sixteen possible masks, which would seem faster than the 
splat/gather/unpack/select approach.  Something to consider.

Bill

>
>> +  #ifdef __BIG_ENDIAN__
>> +  __shortmask = vec_reve (__shortmask);
>> +  #endif
>> +  return (__m128) vec_sel ((__v8hu) __A, (__v8hu) __B, __shortmask);
>> +}
>> +
>> +extern __inline __m128 __attribute__((__gnu_inline__, 
>> __always_inline__, __artificial__))
>> +_mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask)
>> +{
>> +  const __v4si __zero = {0};
>> +  const vector __bool int __boolmask = vec_cmplt ((__v4si) __mask, 
>> __zero);
>> +  return (__m128) vec_sel ((__v4su) __A, (__v4su) __B, (__v4su) 
>> __boolmask);
>> +}
>> +
>
> Okay.
>
> Please have a look at the above issues and resubmit.  Thanks!
> Bill
>
>>   extern __inline int __attribute__((__gnu_inline__, 
>> __always_inline__, __artificial__))
>>   _mm_testz_si128 (__m128i __A, __m128i __B)
>>   {


* Re: [PATCH 3/4] rs6000: Add support for SSE4.1 "blend" intrinsics
  2021-07-11 16:29     ` Bill Schmidt
@ 2021-07-12 21:29       ` Paul A. Clarke
  0 siblings, 0 replies; 16+ messages in thread
From: Paul A. Clarke @ 2021-07-12 21:29 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: gcc-patches, segher

On Sun, Jul 11, 2021 at 11:29:24AM -0500, Bill Schmidt wrote:
> On 7/11/21 11:17 AM, Bill Schmidt wrote:
> > On 6/29/21 1:08 PM, Paul A. Clarke via Gcc-patches wrote:
> > > _mm_blend_epi16 and _mm_blendv_epi8 were added earlier.
> > > Add these four to complete the set.
> > > 
> > > 2021-06-29  Paul A. Clarke  <pc@us.ibm.com>
> > > 
> > > gcc/ChangeLog:
> > >     * config/rs6000/smmintrin.h (_mm_blend_pd, _mm_blendv_pd,
> > >     _mm_blend_ps, _mm_blendv_ps): New.
> > > ---
> > >   gcc/config/rs6000/smmintrin.h | 46 +++++++++++++++++++++++++++++++++++
> > >   1 file changed, 46 insertions(+)
> > > 
> > > diff --git a/gcc/config/rs6000/smmintrin.h
> > > b/gcc/config/rs6000/smmintrin.h
> > > index 1b8cad135ed0..fa17a8b2f478 100644
> > > --- a/gcc/config/rs6000/smmintrin.h
> > > +++ b/gcc/config/rs6000/smmintrin.h
> > > @@ -116,6 +116,52 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B,
> > > __m128i __mask)
> > >     return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
> > >   }
> > > 
> > > +extern __inline __m128d __attribute__((__gnu_inline__,
> > > __always_inline__, __artificial__))
> > Usual line length complaint. :)  Here and below...
> > > +_mm_blend_pd (__m128d __A, __m128d __B, const int __imm8)
> > > +{
> > > +  const signed char __tmp = (__imm8 & 0b10) * 0b01111000 |
> > > +                (__imm8 & 0b01) * 0b00001111;
> > > +  __v16qi __charmask = vec_splats ((signed char) __tmp);
> > > +  __charmask = vec_gb (__charmask);
> > > +  __v8hu __shortmask = (__v8hu) vec_unpackh (__charmask);
> > > +  #ifdef __BIG_ENDIAN__
> > > +  __shortmask = vec_reve (__shortmask);
> > > +  #endif
> > > +  return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du)
> > > __shortmask);
> > 
> > This seems way too complex, and needs commentary to explain what you're
> > doing.  Doesn't this instruction just translate into some form of
> > xxpermdi?  Different ones for BE and LE, but still just xxpermdi, I
> > think.

xxpermdi won't work because the operation here is different.
blend_pd is "for each element, select a value from A or B"
(more like a "select" operation than a permute, such that the result may be
entirely identical to an input), whereas xxpermdi always takes one value from
each input.

Your suggestion, below, to use vperm can also be used here.
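
For illustration only, the distinction can be sketched in portable scalar C.
This is not the header code: model_blend_pd and model_xxpermdi are
hypothetical names, the xxpermdi model assumes the operands are passed in
the fixed order (a, b), and element numbering is simplified.

```c
#include <assert.h>

/* Scalar model of _mm_blend_pd: bit j of imm8 picks element j from b,
   otherwise element j comes from a.  Illustrative name, not from
   smmintrin.h.  */
static void
model_blend_pd (const double a[2], const double b[2], int imm8,
		double out[2])
{
  for (int j = 0; j < 2; j++)
    out[j] = (imm8 & (1 << j)) ? b[j] : a[j];
}

/* Simplified model of xxpermdi with operands (a, b): result element 0
   always comes from a and result element 1 always comes from b; dm only
   chooses which doubleword of each source is used.  */
static void
model_xxpermdi (const double a[2], const double b[2], int dm,
		double out[2])
{
  assert (dm >= 0 && dm < 4);
  out[0] = a[(dm >> 1) & 1];
  out[1] = b[dm & 1];
}
```

With imm8 == 0 the blend model returns a unchanged, and with imm8 == 3 it
returns b unchanged; no dm value makes the permute model do either.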

> > > +}
> > > +
> > > +extern __inline __m128d __attribute__((__gnu_inline__,
> > > __always_inline__, __artificial__))
> > > +_mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
> > > +{
> > > +  const __v2di __zero = {0};
> > > +  const vector __bool long long __boolmask = vec_cmplt ((__v2di)
> > > __mask, __zero);
> > > +  return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du)
> > > __boolmask);
> > > +}
> > 
> > Okay.
> > 
> > > +
> > > +extern __inline __m128 __attribute__((__gnu_inline__,
> > > __always_inline__, __artificial__))
> > > +_mm_blend_ps (__m128 __A, __m128 __B, const int __imm8)
> > > +{
> > > +  const signed char __mask = (__imm8 & 0b1000) * 0b00011000 |
> > > +                 (__imm8 & 0b0100) * 0b00001100 |
> > > +                 (__imm8 & 0b0010) * 0b00000110 |
> > > +                 (__imm8 & 0b0001) * 0b00000011;
> > > +  __v16qi __charmask = vec_splats ( __mask);
> > > +  __charmask = vec_gb (__charmask);
> > > +  __v8hu __shortmask = (__v8hu) vec_unpackh (__charmask);
> > 
> > This is a good trick, but you need comments to explain what you're
> > doing, including how you build __mask.  I recommend you include
> > alternate code for P10, where you can just use vec_genwm to expand from
> > __mask to a mask of word elements.
> > 
> > I don't understand how you're getting away with a v8hu mask for word
> > elements.  This seems wrong to me.  Adequate testing?

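Chars unpack to shorts (sign-extended), so each all-ones or all-zeros byte
becomes an all-ones or all-zeros halfword.  If you construct the char vector
carefully, adjacent halfwords pair up into word-sized masks and it all just
works.  vec_sel is bitwise, so in the end it's just a long bitmask.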

Regardless, I'll go with your alternate approach, below...
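
For readers following the mask construction in the quoted _mm_blend_ps, here
is a portable scalar sketch of the three steps.  The function names are
hypothetical, and the element order is simplified: element j corresponds to
bit j here, whereas the real order depends on endianness (the patch applies
vec_reve under __BIG_ENDIAN__ for that).

```c
#include <assert.h>
#include <stdint.h>

/* Step 1: duplicate each of the four imm8 bits into a pair of adjacent
   bits, as the multiplies in the patch do.  */
static uint8_t
model_pair_bits (int imm8)
{
  return (uint8_t) ((imm8 & 0x8) * 0x18 | (imm8 & 0x4) * 0x0c
		    | (imm8 & 0x2) * 0x06 | (imm8 & 0x1) * 0x03);
}

/* Steps 2 and 3, simplified: expand each of the 8 bits to a byte of all
   ones or all zeros (the effect of vec_gb on a splatted byte), then
   sign-extend each byte to a halfword (the effect of vec_unpackh on
   signed chars).  Two adjacent halfwords then cover one 32-bit word.  */
static void
model_halfword_mask (int imm8, uint16_t mask[8])
{
  assert (imm8 >= 0 && imm8 < 16);
  uint8_t bits = model_pair_bits (imm8);
  for (int j = 0; j < 8; j++)
    {
      int8_t byte = ((bits >> j) & 1) ? -1 : 0;
      mask[j] = (uint16_t) (int16_t) byte;
    }
}
```

For imm8 == 0xe the paired bits are 0xfc, so halfwords 0 and 1 are zero and
halfwords 2 through 7 are 0xffff: word 0 is taken from A and words 1 to 3
from B.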

> As an alternate approach, I suppose you could use vec_perm / vec_permr with
> one of sixteen possible masks, which would seem faster than the
> splat/gather/unpack/select approach.  Something to consider.

vperm will work, and is a bit more straightforward, if just a bit more code.
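
A rough scalar sketch of the permute idea, for illustration only; it is not
the resubmitted patch.  The names are hypothetical, little-endian element
numbering is assumed, and a real header would presumably use vec_perm with a
constant table of the sixteen controls rather than compute them at run time.

```c
#include <assert.h>
#include <stdint.h>

/* Build a 16-byte permute control for a 4 x 32-bit blend.  Index values
   0..15 select bytes of a and 16..31 select bytes of b, mirroring how a
   two-input byte permute addresses the concatenation of its inputs.  */
static void
model_blend_ps_control (int imm8, uint8_t ctl[16])
{
  assert (imm8 >= 0 && imm8 < 16);
  for (int i = 0; i < 16; i++)
    ctl[i] = (uint8_t) (((imm8 >> (i / 4)) & 1) ? 16 + i : i);
}

/* Scalar model of a two-input byte permute.  */
static void
model_perm (const uint8_t a[16], const uint8_t b[16],
	    const uint8_t ctl[16], uint8_t out[16])
{
  for (int i = 0; i < 16; i++)
    out[i] = (ctl[i] & 0x10) ? b[ctl[i] & 0x0f] : a[ctl[i] & 0x0f];
}
```

Each control keeps every byte in its own position and only switches the
source, which is what makes the permute behave like a per-word select.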

> > > +  #ifdef __BIG_ENDIAN__
> > > +  __shortmask = vec_reve (__shortmask);
> > > +  #endif
> > > +  return (__m128) vec_sel ((__v8hu) __A, (__v8hu) __B, __shortmask);
> > > +}
> > > +
> > > +extern __inline __m128 __attribute__((__gnu_inline__,
> > > __always_inline__, __artificial__))
> > > +_mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask)
> > > +{
> > > +  const __v4si __zero = {0};
> > > +  const vector __bool int __boolmask = vec_cmplt ((__v4si) __mask,
> > > __zero);
> > > +  return (__m128) vec_sel ((__v4su) __A, (__v4su) __B, (__v4su)
> > > __boolmask);
> > > +}
> > > +
> > 
> > Okay.
> > 
> > Please have a look at the above issues and resubmit.  Thanks!

Will do. Thanks for the reviews!

PC


* Re: [PATCH 1/4] rs6000: Add support for SSE4.1 "test" intrinsics
  2021-07-11 15:45   ` Bill Schmidt
@ 2021-07-12 22:24     ` Segher Boessenkool
  2021-07-13 19:01       ` [PATCH 1/4 committed] " Paul A. Clarke
  0 siblings, 1 reply; 16+ messages in thread
From: Segher Boessenkool @ 2021-07-12 22:24 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: Paul A. Clarke, gcc-patches

Hi!

On Sun, Jul 11, 2021 at 10:45:45AM -0500, Bill Schmidt wrote:
> On 6/29/21 1:08 PM, Paul A. Clarke via Gcc-patches wrote:
> >--- a/gcc/config/rs6000/smmintrin.h
> >+++ b/gcc/config/rs6000/smmintrin.h
> >@@ -116,4 +116,54 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i 
> >__mask)
> >    return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
> >  }
> >
> >+extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
> >__artificial__))
> Line too long, please fix here and below.  (Existing cases can be left.)

I wouldn't bother in this case.  There is no way to write these
attribute lines in a reasonable way, it doesn't overflow 80 char by that
much, and there isn't anything interesting at the end of line.

You could put it on a line by itself, which helps for now because it
won't get too long until you add another attribute ;-)

There should be a space before (( though, and "extern" on definitions is
superfluous.  But I do not care much about that either -- this isn't a
part of the compiler proper anyway :-)

> LGTM; I can't approve, but recommend approval with line lengths fixed.  

It is okay for trunk with whatever changes you want to do.  Thanks!


Segher


* Re: [PATCH 2/4] rs6000: Add tests for SSE4.1 "test" intrinsics
  2021-07-11 15:49   ` Bill Schmidt
@ 2021-07-12 22:39     ` Segher Boessenkool
  0 siblings, 0 replies; 16+ messages in thread
From: Segher Boessenkool @ 2021-07-12 22:39 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: Paul A. Clarke, gcc-patches

On Sun, Jul 11, 2021 at 10:49:27AM -0500, Bill Schmidt wrote:
> LGTM.  I can't approve, but recommend approval as is.

Okay for trunk.  Thanks!


Segher


> >2021-06-29  Paul A. Clarke  <pc@us.ibm.com>
> >
> >gcc/testsuite/ChangeLog:
> >         * gcc.target/powerpc/sse4_1-ptest.c: Copy from
> >	gcc/testsuite/gcc.target/i386.


* Re: [PATCH 4/4] rs6000: Add tests for SSE4.1 "blend" intrinsics
  2021-07-11 16:19   ` Bill Schmidt
@ 2021-07-12 22:52     ` Segher Boessenkool
  0 siblings, 0 replies; 16+ messages in thread
From: Segher Boessenkool @ 2021-07-12 22:52 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: Paul A. Clarke, gcc-patches

On Sun, Jul 11, 2021 at 11:19:56AM -0500, Bill Schmidt wrote:
> Please resubmit this when you resubmit 3/4, in case any adjustments are 
> needed.

It is testing whether functions defined elsewhere work according to their
specification -- let's hope that doesn't change ;-)


Segher


* Re: [PATCH 1/4 committed] rs6000: Add support for SSE4.1 "test" intrinsics
  2021-07-12 22:24     ` Segher Boessenkool
@ 2021-07-13 19:01       ` Paul A. Clarke
  2021-07-13 23:12         ` Segher Boessenkool
  0 siblings, 1 reply; 16+ messages in thread
From: Paul A. Clarke @ 2021-07-13 19:01 UTC (permalink / raw)
  To: gcc-patches; +Cc: Bill Schmidt, Segher Boessenkool

On Mon, Jul 12, 2021 at 05:24:07PM -0500, Segher Boessenkool wrote:
> On Sun, Jul 11, 2021 at 10:45:45AM -0500, Bill Schmidt wrote:
> > On 6/29/21 1:08 PM, Paul A. Clarke via Gcc-patches wrote:
> > >--- a/gcc/config/rs6000/smmintrin.h
> > >+++ b/gcc/config/rs6000/smmintrin.h
> > >@@ -116,4 +116,54 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i 
> > >__mask)
> > >    return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
> > >  }
> > >
> > >+extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
> > >__artificial__))
> > Line too long, please fix here and below.  (Existing cases can be left.)
> 
> I wouldn't bother in this case.  There is no way to write these
> attribute lines in a reasonable way, it doesn't overflow 80 char by that
> much, and there isn't anything interesting at the end of line.

I bothered. ;-)

> You could put it on a line by itself, which helps for now because it
> won't get too long until you add another attribute ;-)

OK

> There should be a space before (( though, and "extern" on definitions is
> superfluous.  But I do not care much about that either -- this isn't a
> part of the compiler proper anyway :-)

OK

> It is okay for trunk with whatever changes you want to do.  Thanks!

This is what I committed:

2021-07-13  Paul A. Clarke  <pc@us.ibm.com>

gcc
	* config/rs6000/smmintrin.h (_mm_testz_si128, _mm_testc_si128,
	_mm_testnzc_si128, _mm_test_all_ones, _mm_test_all_zeros,
	_mm_test_mix_ones_zeros): New.
---
 gcc/config/rs6000/smmintrin.h | 56 +++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index bdf6eb365d88..16fd34d836ff 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -116,4 +116,60 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
   return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
 }
 
+__inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_testz_si128 (__m128i __A, __m128i __B)
+{
+  /* Note: This implementation does NOT set "zero" or "carry" flags.  */
+  const __v16qu __zero = {0};
+  return vec_all_eq (vec_and ((__v16qu) __A, (__v16qu) __B), __zero);
+}
+
+__inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_testc_si128 (__m128i __A, __m128i __B)
+{
+  /* Note: This implementation does NOT set "zero" or "carry" flags.  */
+  const __v16qu __zero = {0};
+  const __v16qu __notA = vec_nor ((__v16qu) __A, (__v16qu) __A);
+  return vec_all_eq (vec_and ((__v16qu) __notA, (__v16qu) __B), __zero);
+}
+
+__inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_testnzc_si128 (__m128i __A, __m128i __B)
+{
+  /* Note: This implementation does NOT set "zero" or "carry" flags.  */
+  return _mm_testz_si128 (__A, __B) == 0 && _mm_testc_si128 (__A, __B) == 0;
+}
+
+__inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_test_all_zeros (__m128i __A, __m128i __mask)
+{
+  const __v16qu __zero = {0};
+  return vec_all_eq (vec_and ((__v16qu) __A, (__v16qu) __mask), __zero);
+}
+
+__inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_test_all_ones (__m128i __A)
+{
+  const __v16qu __ones = vec_splats ((unsigned char) 0xff);
+  return vec_all_eq ((__v16qu) __A, __ones);
+}
+
+__inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
+{
+  const __v16qu __zero = {0};
+  const __v16qu __Amasked = vec_and ((__v16qu) __A, (__v16qu) __mask);
+  const int any_ones = vec_any_ne (__Amasked, __zero);
+  const __v16qu __notA = vec_nor ((__v16qu) __A, (__v16qu) __A);
+  const __v16qu __notAmasked = vec_and ((__v16qu) __notA, (__v16qu) __mask);
+  const int any_zeros = vec_any_ne (__notAmasked, __zero);
+  return any_ones * any_zeros;
+}
+
 #endif
-- 
2.27.0

PC


* Re: [PATCH 1/4 committed] rs6000: Add support for SSE4.1 "test" intrinsics
  2021-07-13 19:01       ` [PATCH 1/4 committed] " Paul A. Clarke
@ 2021-07-13 23:12         ` Segher Boessenkool
  0 siblings, 0 replies; 16+ messages in thread
From: Segher Boessenkool @ 2021-07-13 23:12 UTC (permalink / raw)
  To: Paul A. Clarke; +Cc: gcc-patches, Bill Schmidt

On Tue, Jul 13, 2021 at 02:01:18PM -0500, Paul A. Clarke wrote:
> > > >+extern __inline int __attribute__((__gnu_inline__, __always_inline__, 
> > > >__artificial__))
> > > Line too long, please fix here and below.  (Existing cases can be left.)
> > 
> > I wouldn't bother in this case.  There is no way to write these
> > attribute lines in a reasonable way, it doesn't overflow 80 char by that
> > much, and there isn't anything interesting at the end of line.
> 
> I bothered. ;-)

Ha :-)

Btw, Bill suggested to me offline making a preprocessor macro for this
long attribute line.  Which is a fine suggestion!  Something for the
future, maybe?
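
One possible shape for such a macro, as a sketch only: the macro name
COMPAT_INLINE_ATTRS and the function model_all_zeros are hypothetical, and
nothing in this thread settles the spelling or where the macro would live.

```c
#include <assert.h>

/* Hypothetical macro bundling the attribute list used throughout the
   header, so each definition needs only one short token.  */
#define COMPAT_INLINE_ATTRS \
  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))

/* Stand-in function written in the layout of the committed intrinsics;
   it reports whether a 128-bit value, given as two halves, is zero.  */
__inline int
COMPAT_INLINE_ATTRS
model_all_zeros (unsigned long long lo, unsigned long long hi)
{
  return (lo | hi) == 0;
}
```

The attribute line then stays well under 80 columns no matter how many
attributes are added to the macro later.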


Segher


Thread overview: 16+ messages
2021-06-29 18:08 [PATCH 0/4] rs6000: Add SSE4.1 "test" and "blend" intrinsics Paul A. Clarke
2021-06-29 18:08 ` [PATCH 1/4] rs6000: Add support for SSE4.1 "test" intrinsics Paul A. Clarke
2021-07-11 15:45   ` Bill Schmidt
2021-07-12 22:24     ` Segher Boessenkool
2021-07-13 19:01       ` [PATCH 1/4 committed] " Paul A. Clarke
2021-07-13 23:12         ` Segher Boessenkool
2021-06-29 18:08 ` [PATCH 2/4] rs6000: Add tests " Paul A. Clarke
2021-07-11 15:49   ` Bill Schmidt
2021-07-12 22:39     ` Segher Boessenkool
2021-06-29 18:08 ` [PATCH 3/4] rs6000: Add support for SSE4.1 "blend" intrinsics Paul A. Clarke
2021-07-11 16:17   ` Bill Schmidt
2021-07-11 16:29     ` Bill Schmidt
2021-07-12 21:29       ` Paul A. Clarke
2021-06-29 18:08 ` [PATCH 4/4] rs6000: Add tests " Paul A. Clarke
2021-07-11 16:19   ` Bill Schmidt
2021-07-12 22:52     ` Segher Boessenkool
