public inbox for gcc-patches@gcc.gnu.org
* [PATCH v2 0/6] rs6000: Add SSE4.1 "blend", "ceil", "floor"
@ 2021-07-16 13:50 Paul A. Clarke
  2021-07-16 13:50 ` [PATCH v2 1/6] rs6000: Add support for SSE4.1 "blend" intrinsics Paul A. Clarke
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Paul A. Clarke @ 2021-07-16 13:50 UTC
  To: gcc-patches; +Cc: segher, wschmidt

This set combines three independent "v1" patchsets.  The "blend"
patches were originally posted together with the "test" patches,
which have since been merged.

Instead of copying some tests from gcc/testsuite/gcc.target/i386,
I created new tests.  The i386 tests in question used rand() to
generate the input data and assembly to compute the rounded values.
Using rand() for testing seems wrong, and the assembly is obviously
not portable.  I use static data, primarily exercising the edges of
dynamic ranges (where fractions start to be unrepresentable).
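
As a rough illustration of those edges (not part of the patches), the
sketch below measures the gap between adjacent doubles.  It assumes
IEEE 754 binary64 and a positive, finite argument; ulp_above is a
hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Gap between x and the next representable double above it.
   Assumes IEEE 754 binary64 and a positive, finite x.  */
static double
ulp_above (double x)
{
  uint64_t bits;
  double next;

  memcpy (&bits, &x, sizeof bits);
  bits++;			/* Next bit pattern is the next double up.  */
  memcpy (&next, &bits, sizeof next);
  return next - x;
}
```

From 2^50 the gap is 0.25, from 2^51 it is 0.5, and from 2^52 it is
1.0, so no fractional part survives at or above 2^52.  The float data
is centred on 2^21 through 2^23 in the same way.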

Tested on ppc64le, ppc64, ppc.

v2:
- Rewrite blends to use vec_perm.
- Improve formatting.

Paul A. Clarke (6):
  rs6000: Add support for SSE4.1 "blend" intrinsics
  rs6000: Add tests for SSE4.1 "blend" intrinsics
  rs6000: Add support for SSE4.1 "ceil" intrinsics
  rs6000: Add tests for SSE4.1 "ceil" intrinsics
  rs6000: Add support for SSE4.1 "floor" intrinsics
  rs6000: Add tests for SSE4.1 "floor" intrinsics

 gcc/config/rs6000/smmintrin.h                 | 124 ++++++++++++++++++
 .../gcc.target/powerpc/sse4_1-blendpd.c       |  89 +++++++++++++
 .../gcc.target/powerpc/sse4_1-blendps-2.c     |  81 ++++++++++++
 .../gcc.target/powerpc/sse4_1-blendps.c       |  90 +++++++++++++
 .../gcc.target/powerpc/sse4_1-blendvpd.c      |  65 +++++++++
 .../gcc.target/powerpc/sse4_1-ceilpd.c        |  51 +++++++
 .../gcc.target/powerpc/sse4_1-ceilps.c        |  41 ++++++
 .../gcc.target/powerpc/sse4_1-ceilsd.c        | 119 +++++++++++++++++
 .../gcc.target/powerpc/sse4_1-ceilss.c        |  95 ++++++++++++++
 .../gcc.target/powerpc/sse4_1-check.h         |   4 +
 .../gcc.target/powerpc/sse4_1-floorpd.c       |  51 +++++++
 .../gcc.target/powerpc/sse4_1-floorps.c       |  41 ++++++
 .../gcc.target/powerpc/sse4_1-floorsd.c       | 119 +++++++++++++++++
 .../gcc.target/powerpc/sse4_1-floorss.c       |  95 ++++++++++++++
 .../gcc.target/powerpc/sse4_1-round-data.h    |  20 +++
 .../gcc.target/powerpc/sse4_1-round.h         |  27 ++++
 .../gcc.target/powerpc/sse4_1-round2.h        |  27 ++++
 .../gcc.target/powerpc/sse4_1-roundpd-2.c     |  36 +++++
 .../gcc.target/powerpc/sse4_1-roundpd-3.c     |  36 +++++
 19 files changed, 1211 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilss.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorss.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round-data.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round2.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-3.c

-- 
2.27.0



* [PATCH v2 1/6] rs6000: Add support for SSE4.1 "blend" intrinsics
  2021-07-16 13:50 [PATCH v2 0/6] rs6000: Add SSE4.1 "blend", "ceil", "floor" Paul A. Clarke
@ 2021-07-16 13:50 ` Paul A. Clarke
  2021-07-16 18:13   ` Bill Schmidt
  2021-07-28 21:30   ` Segher Boessenkool
  2021-07-16 13:50 ` [PATCH v2 2/6] rs6000: Add tests " Paul A. Clarke
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 20+ messages in thread
From: Paul A. Clarke @ 2021-07-16 13:50 UTC
  To: gcc-patches; +Cc: segher, wschmidt

_mm_blend_epi16 and _mm_blendv_epi8 were added earlier.
Add these four to complete the set.

2021-07-16  Paul A. Clarke  <pc@us.ibm.com>

gcc
	* config/rs6000/smmintrin.h (_mm_blend_pd, _mm_blendv_pd,
	_mm_blend_ps, _mm_blendv_ps): New.
---
v2:
- Per review from Bill, rewrote _mm_blend_pd and _mm_blendv_pd to use
  vec_perm instead of gather/unpack/select.
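
For readers less familiar with permute control vectors, here is a
scalar sketch of how each row of the __pcv table can be derived from
the immediate.  It is a model only: it indexes bytes in element
(memory) order and ignores any endian handling the real vec_perm
lowering performs.  The names blend_pd_control, model_perm and
model_blend_pd are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <string.h>

/* Build a 16-byte permute control for blending two 8-byte lanes.
   Indices 0..15 name bytes of A, 16..31 name bytes of B.  */
static void
blend_pd_control (unsigned imm8, unsigned char pcv[16])
{
  for (int i = 0; i < 16; i++)
    {
      int lane = i / 8;
      pcv[i] = (unsigned char) (((imm8 >> lane) & 1) ? 16 + i : i);
    }
}

/* Scalar model of a byte permute over the concatenation A:B.  */
static void
model_perm (const unsigned char a[16], const unsigned char b[16],
            const unsigned char pcv[16], unsigned char out[16])
{
  for (int i = 0; i < 16; i++)
    {
      unsigned idx = pcv[i] & 31;
      out[i] = idx < 16 ? a[idx] : b[idx - 16];
    }
}

/* Blend two pairs of doubles: bit N of imm8 set picks lane N of B.  */
static void
model_blend_pd (const double a[2], const double b[2], unsigned imm8,
                double out[2])
{
  unsigned char pcv[16], ab[16], bb[16], rb[16];

  blend_pd_control (imm8, pcv);
  memcpy (ab, a, sizeof ab);
  memcpy (bb, b, sizeof bb);
  model_perm (ab, bb, pcv, rb);
  memcpy (out, rb, sizeof rb);
}
```

With imm8 = 1 the control row is { 16..23, 8..15 }, matching the second
row of the table in _mm_blend_pd.  The same construction with 4-byte
lanes would give the sixteen rows used by _mm_blend_ps.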

 gcc/config/rs6000/smmintrin.h | 60 +++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 6a010fdbb96f..69e54702a877 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -116,6 +116,66 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
   return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
 }
 
+__inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_blend_pd (__m128d __A, __m128d __B, const int __imm8)
+{
+  __v16qu __pcv[] =
+    {
+      {  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15 },
+      { 16, 17, 18, 19, 20, 21, 22, 23,  8,  9, 10, 11, 12, 13, 14, 15 },
+      {  0,  1,  2,  3,  4,  5,  6,  7, 24, 25, 26, 27, 28, 29, 30, 31 },
+      { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 }
+    };
+  __v16qu __r = vec_perm ((__v16qu) __A, (__v16qu) __B, __pcv[__imm8]);
+  return (__m128d) __r;
+}
+
+__inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
+{
+  const __v2di __zero = {0};
+  const __vector __bool long long __boolmask = vec_cmplt ((__v2di) __mask, __zero);
+  return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __boolmask);
+}
+
+__inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_blend_ps (__m128 __A, __m128 __B, const int __imm8)
+{
+  __v16qu __pcv[] =
+    {
+      {  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15 },
+      { 16, 17, 18, 19,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15 },
+      {  0,  1,  2,  3, 20, 21, 22, 23,  8,  9, 10, 11, 12, 13, 14, 15 },
+      { 16, 17, 18, 19, 20, 21, 22, 23,  8,  9, 10, 11, 12, 13, 14, 15 },
+      {  0,  1,  2,  3,  4,  5,  6,  7, 24, 25, 26, 27, 12, 13, 14, 15 },
+      { 16, 17, 18, 19,  4,  5,  6,  7, 24, 25, 26, 27, 12, 13, 14, 15 },
+      {  0,  1,  2,  3, 20, 21, 22, 23, 24, 25, 26, 27, 12, 13, 14, 15 },
+      { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 12, 13, 14, 15 },
+      {  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 28, 29, 30, 31 },
+      { 16, 17, 18, 19,  4,  5,  6,  7,  8,  9, 10, 11, 28, 29, 30, 31 },
+      {  0,  1,  2,  3, 20, 21, 22, 23,  8,  9, 10, 11, 28, 29, 30, 31 },
+      { 16, 17, 18, 19, 20, 21, 22, 23,  8,  9, 10, 11, 28, 29, 30, 31 },
+      {  0,  1,  2,  3,  4,  5,  6,  7, 24, 25, 26, 27, 28, 29, 30, 31 },
+      { 16, 17, 18, 19,  4,  5,  6,  7, 24, 25, 26, 27, 28, 29, 30, 31 },
+      {  0,  1,  2,  3, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 },
+      { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 },
+    };
+  __v16qu __r = vec_perm ((__v16qu) __A, (__v16qu) __B, __pcv[__imm8]);
+  return (__m128) __r;
+}
+
+__inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask)
+{
+  const __v4si __zero = {0};
+  const __vector __bool int __boolmask = vec_cmplt ((__v4si) __mask, __zero);
+  return (__m128) vec_sel ((__v4su) __A, (__v4su) __B, (__v4su) __boolmask);
+}
+
 __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_testz_si128 (__m128i __A, __m128i __B)
-- 
2.27.0



* [PATCH v2 2/6] rs6000: Add tests for SSE4.1 "blend" intrinsics
  2021-07-16 13:50 [PATCH v2 0/6] rs6000: Add SSE4.1 "blend", "ceil", "floor" Paul A. Clarke
  2021-07-16 13:50 ` [PATCH v2 1/6] rs6000: Add support for SSE4.1 "blend" intrinsics Paul A. Clarke
@ 2021-07-16 13:50 ` Paul A. Clarke
  2021-07-16 18:16   ` Bill Schmidt
  2021-07-28 21:51   ` Segher Boessenkool
  2021-07-16 13:50 ` [PATCH v2 3/6] rs6000: Add support for SSE4.1 "ceil" intrinsics Paul A. Clarke
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 20+ messages in thread
From: Paul A. Clarke @ 2021-07-16 13:50 UTC
  To: gcc-patches; +Cc: segher, wschmidt

Copy the tests for _mm_blend_pd, _mm_blendv_pd, _mm_blend_ps,
_mm_blendv_ps from gcc/testsuite/gcc.target/i386.

2021-07-16  Paul A. Clarke  <pc@us.ibm.com>

gcc/testsuite
	* gcc.target/powerpc/sse4_1-blendpd.c: Copy from gcc.target/i386.
	* gcc.target/powerpc/sse4_1-blendps-2.c: Likewise.
	* gcc.target/powerpc/sse4_1-blendps.c: Likewise.
	* gcc.target/powerpc/sse4_1-blendvpd.c: Likewise.
---
v2: Improve formatting per review from Bill.
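
One point that may be worth keeping in mind when reading the blendvpd
test: its reference check compares the mask with "< 0.0", whereas the
intrinsic in patch 1/6 selects on the sign bit (a signed integer
compare against zero).  The two agree for the data generated by
init_blendvpd, but would differ for a mask of -0.0.  A scalar sketch
of the sign-bit rule, assuming IEEE 754 binary64, with hypothetical
helper names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Nonzero when the IEEE 754 sign bit of x is set.  */
static int
sign_bit_set (double x)
{
  uint64_t bits;

  memcpy (&bits, &x, sizeof bits);
  return (int) (bits >> 63);
}

/* Scalar model of a sign-bit blend on a single lane.  */
static double
model_blendv_lane (double a, double b, double mask)
{
  return sign_bit_set (mask) ? b : a;
}
```

Here a mask of -0.0 selects b even though -0.0 < 0.0 is false.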

 .../gcc.target/powerpc/sse4_1-blendpd.c       | 89 ++++++++++++++++++
 .../gcc.target/powerpc/sse4_1-blendps-2.c     | 81 +++++++++++++++++
 .../gcc.target/powerpc/sse4_1-blendps.c       | 90 +++++++++++++++++++
 .../gcc.target/powerpc/sse4_1-blendvpd.c      | 65 ++++++++++++++
 4 files changed, 325 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c

diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
new file mode 100644
index 000000000000..ca1780471fa2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
@@ -0,0 +1,89 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse4_1_test
+#endif
+
+#include CHECK_H
+
+#include <smmintrin.h>
+#include <string.h>
+
+#define NUM 20
+
+#ifndef MASK
+#define MASK 0x03
+#endif
+
+static void
+init_blendpd (double *src1, double *src2)
+{
+  int i, sign = 1;
+
+  for (i = 0; i < NUM * 2; i++)
+    {
+      src1[i] = i * i * sign;
+      src2[i] = (i + 20) * sign;
+      sign = -sign;
+    }
+}
+
+static int
+check_blendpd (__m128d *dst, double *src1, double *src2)
+{
+  double tmp[2];
+  int j;
+
+  memcpy (&tmp[0], src1, sizeof (tmp));
+
+  for (j = 0; j < 2; j++)
+    if ((MASK & (1 << j)))
+      tmp[j] = src2[j];
+
+  return memcmp (dst, &tmp[0], sizeof (tmp));
+}
+
+static void
+TEST (void)
+{
+  __m128d x, y;
+  union
+    {
+      __m128d x[NUM];
+      double d[NUM * 2];
+    } dst, src1, src2;
+  union
+    {
+      __m128d x;
+      double d[2];
+    } src3;
+  int i;
+
+  init_blendpd (src1.d, src2.d);
+
+  /* Check blendpd imm8, m128, xmm */
+  for (i = 0; i < NUM; i++)
+    {
+      dst.x[i] = _mm_blend_pd (src1.x[i], src2.x[i], MASK);
+      if (check_blendpd (&dst.x[i], &src1.d[i * 2], &src2.d[i * 2]))
+	abort ();
+    }
+
+  /* Check blendpd imm8, xmm, xmm */
+  src3.x = _mm_setzero_pd ();
+
+  x = _mm_blend_pd (dst.x[2], src3.x, MASK);
+  y = _mm_blend_pd (src3.x, dst.x[2], MASK);
+
+  if (check_blendpd (&x, &dst.d[4], &src3.d[0]))
+    abort ();
+
+  if (check_blendpd (&y, &src3.d[0], &dst.d[4]))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
new file mode 100644
index 000000000000..768b6e64bbae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
@@ -0,0 +1,81 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#include "sse4_1-check.h"
+
+#include <smmintrin.h>
+#include <string.h>
+#include <stdlib.h>
+
+#define NUM 20
+
+#undef MASK
+#define MASK 0xe
+
+static void
+init_blendps (float *src1, float *src2)
+{
+  int i, sign = 1;
+
+  for (i = 0; i < NUM * 4; i++)
+    {
+      src1[i] = i * i * sign;
+      src2[i] = (i + 20) * sign;
+      sign = -sign;
+    }
+}
+
+static int
+check_blendps (__m128 *dst, float *src1, float *src2)
+{
+  float tmp[4];
+  int j;
+
+  memcpy (&tmp[0], src1, sizeof (tmp));
+  for (j = 0; j < 4; j++)
+    if ((MASK & (1 << j)))
+      tmp[j] = src2[j];
+
+  return memcmp (dst, &tmp[0], sizeof (tmp));
+}
+
+static void
+sse4_1_test (void)
+{
+  __m128 x, y;
+  union
+    {
+      __m128 x[NUM];
+      float f[NUM * 4];
+    } dst, src1, src2;
+  union
+    {
+      __m128 x;
+      float f[4];
+    } src3;
+  int i;
+
+  init_blendps (src1.f, src2.f);
+
+  for (i = 0; i < 4; i++)
+    src3.f[i] = (int) rand ();
+
+  /* Check blendps imm8, m128, xmm */
+  for (i = 0; i < NUM; i++)
+    {
+      dst.x[i] = _mm_blend_ps (src1.x[i], src2.x[i], MASK);
+      if (check_blendps (&dst.x[i], &src1.f[i * 4], &src2.f[i * 4]))
+	abort ();
+    }
+
+  /* Check blendps imm8, xmm, xmm */
+  x = _mm_blend_ps (dst.x[2], src3.x, MASK);
+  y = _mm_blend_ps (src3.x, dst.x[2], MASK);
+
+  if (check_blendps (&x, &dst.f[8], &src3.f[0]))
+    abort ();
+
+  if (check_blendps (&y, &src3.f[0], &dst.f[8]))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c
new file mode 100644
index 000000000000..2f114b69a84b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c
@@ -0,0 +1,90 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse4_1_test
+#endif
+
+#include CHECK_H
+
+#include <smmintrin.h>
+#include <string.h>
+#include <stdlib.h>
+
+#define NUM 20
+
+#ifndef MASK
+#define MASK 0x0f
+#endif
+
+static void
+init_blendps (float *src1, float *src2)
+{
+  int i, sign = 1;
+
+  for (i = 0; i < NUM * 4; i++)
+    {
+      src1[i] = i * i * sign;
+      src2[i] = (i + 20) * sign;
+      sign = -sign;
+    }
+}
+
+static int
+check_blendps (__m128 *dst, float *src1, float *src2)
+{
+  float tmp[4];
+  int j;
+
+  memcpy (&tmp[0], src1, sizeof (tmp));
+  for (j = 0; j < 4; j++)
+    if ((MASK & (1 << j)))
+      tmp[j] = src2[j];
+
+  return memcmp (dst, &tmp[0], sizeof (tmp));
+}
+
+static void
+TEST (void)
+{
+  __m128 x, y;
+  union
+    {
+      __m128 x[NUM];
+      float f[NUM * 4];
+    } dst, src1, src2;
+  union
+    {
+      __m128 x;
+      float f[4];
+    } src3;
+  int i;
+
+  init_blendps (src1.f, src2.f);
+
+  for (i = 0; i < 4; i++)
+    src3.f[i] = (int) rand ();
+
+  /* Check blendps imm8, m128, xmm */
+  for (i = 0; i < NUM; i++)
+    {
+      dst.x[i] = _mm_blend_ps (src1.x[i], src2.x[i], MASK);
+      if (check_blendps (&dst.x[i], &src1.f[i * 4], &src2.f[i * 4]))
+	abort ();
+    }
+
+  /* Check blendps imm8, xmm, xmm */
+  x = _mm_blend_ps (dst.x[2], src3.x, MASK);
+  y = _mm_blend_ps (src3.x, dst.x[2], MASK);
+
+  if (check_blendps (&x, &dst.f[8], &src3.f[0]))
+    abort ();
+
+  if (check_blendps (&y, &src3.f[0], &dst.f[8]))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c
new file mode 100644
index 000000000000..b82cd28848a6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c
@@ -0,0 +1,65 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#include "sse4_1-check.h"
+
+#include <smmintrin.h>
+#include <string.h>
+
+#define NUM 20
+
+static void
+init_blendvpd (double *src1, double *src2, double *mask)
+{
+  int i, msk, sign = 1;
+
+  msk = -1;
+  for (i = 0; i < NUM * 2; i++)
+    {
+      if ((i % 2) == 0)
+	msk++;
+      src1[i] = i * (i + 1) * sign;
+      src2[i] = (i + 20) * sign;
+      mask[i] = (i + 120) * i;
+      if ((msk & (1 << (i % 2))))
+	mask[i] = -mask[i];
+      sign = -sign;
+    }
+}
+
+static int
+check_blendvpd (__m128d *dst, double *src1, double *src2,
+		double *mask)
+{
+  double tmp[2];
+  int j;
+
+  memcpy (&tmp[0], src1, sizeof (tmp));
+  for (j = 0; j < 2; j++)
+    if (mask [j] < 0.0)
+      tmp[j] = src2[j];
+
+  return memcmp (dst, &tmp[0], sizeof (tmp));
+}
+
+static void
+sse4_1_test (void)
+{
+  union
+    {
+      __m128d x[NUM];
+      double d[NUM * 2];
+    } dst, src1, src2, mask;
+  int i;
+
+  init_blendvpd (src1.d, src2.d, mask.d);
+
+  for (i = 0; i < NUM; i++)
+    {
+      dst.x[i] = _mm_blendv_pd (src1.x[i], src2.x[i], mask.x[i]);
+      if (check_blendvpd (&dst.x[i], &src1.d[i * 2], &src2.d[i * 2],
+			  &mask.d[i * 2]))
+	abort ();
+    }
+}
-- 
2.27.0



* [PATCH v2 3/6] rs6000: Add support for SSE4.1 "ceil" intrinsics
  2021-07-16 13:50 [PATCH v2 0/6] rs6000: Add SSE4.1 "blend", "ceil", "floor" Paul A. Clarke
  2021-07-16 13:50 ` [PATCH v2 1/6] rs6000: Add support for SSE4.1 "blend" intrinsics Paul A. Clarke
  2021-07-16 13:50 ` [PATCH v2 2/6] rs6000: Add tests " Paul A. Clarke
@ 2021-07-16 13:50 ` Paul A. Clarke
  2021-07-16 18:20   ` Bill Schmidt
  2021-07-28 22:01   ` Segher Boessenkool
  2021-07-16 13:50 ` [PATCH v2 4/6] rs6000: Add tests " Paul A. Clarke
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 20+ messages in thread
From: Paul A. Clarke @ 2021-07-16 13:50 UTC
  To: gcc-patches; +Cc: segher, wschmidt

2021-07-16  Paul A. Clarke  <pc@us.ibm.com>

gcc
	* config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps,
	_mm_ceil_sd, _mm_ceil_ss): New.
---
v2: Improve formatting per review from Bill.

 gcc/config/rs6000/smmintrin.h | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 69e54702a877..cad770a67631 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -232,6 +232,38 @@ _mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
   return any_ones * any_zeros;
 }
 
+__inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_ceil_pd (__m128d __A)
+{
+  return (__m128d) vec_ceil ((__v2df) __A);
+}
+
+__inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_ceil_ps (__m128 __A)
+{
+  return (__m128) vec_ceil ((__v4sf) __A);
+}
+
+__inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_ceil_sd (__m128d __A, __m128d __B)
+{
+  __v2df r = vec_ceil ((__v2df) __B);
+  r[1] = ((__v2df) __A)[1];
+  return (__m128d) r;
+}
+
+__inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_ceil_ss (__m128 __A, __m128 __B)
+{
+  __v4sf r = (__v4sf) __A;
+  r[0] = __builtin_ceil (((__v4sf) __B)[0]);
+  return r;
+}
+
 /* Return horizontal packed word minimum and its index in bits [15:0]
    and bits [18:16] respectively.  */
 __inline __m128i
-- 
2.27.0



* [PATCH v2 4/6] rs6000: Add tests for SSE4.1 "ceil" intrinsics
  2021-07-16 13:50 [PATCH v2 0/6] rs6000: Add SSE4.1 "blend", "ceil", "floor" Paul A. Clarke
                   ` (2 preceding siblings ...)
  2021-07-16 13:50 ` [PATCH v2 3/6] rs6000: Add support for SSE4.1 "ceil" intrinsics Paul A. Clarke
@ 2021-07-16 13:50 ` Paul A. Clarke
  2021-07-16 18:22   ` Bill Schmidt
  2021-07-28 22:16   ` Segher Boessenkool
  2021-07-16 13:50 ` [PATCH v2 5/6] rs6000: Add support for SSE4.1 "floor" intrinsics Paul A. Clarke
  2021-07-16 13:50 ` [PATCH v2 6/6] rs6000: Add tests " Paul A. Clarke
  5 siblings, 2 replies; 20+ messages in thread
From: Paul A. Clarke @ 2021-07-16 13:50 UTC
  To: gcc-patches; +Cc: segher, wschmidt

Add the tests for _mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd, _mm_ceil_ss.

Copy a test for _mm_ceil_pd and _mm_ceil_ps from
gcc/testsuite/gcc.target/i386.

Define __VSX_SSE2__ to pick up some union definitions in
m128-check.h.

2021-07-16  Paul A. Clarke  <pc@us.ibm.com>

gcc/testsuite
	* gcc.target/powerpc/sse4_1-ceilpd.c: New.
	* gcc.target/powerpc/sse4_1-ceilps.c: New.
	* gcc.target/powerpc/sse4_1-ceilsd.c: New.
	* gcc.target/powerpc/sse4_1-ceilss.c: New.
	* gcc.target/powerpc/sse4_1-round-data.h: New.
	* gcc.target/powerpc/sse4_1-round.h: New.
	* gcc.target/powerpc/sse4_1-round2.h: New.
	* gcc.target/powerpc/sse4_1-roundpd-3.c: Copy from gcc.target/i386.
	* gcc.target/powerpc/sse4_1-check.h (__VSX_SSE2__): Define.
---
v2: Improve formatting per review from Bill.

 .../gcc.target/powerpc/sse4_1-ceilpd.c        |  51 ++++++++
 .../gcc.target/powerpc/sse4_1-ceilps.c        |  41 ++++++
 .../gcc.target/powerpc/sse4_1-ceilsd.c        | 119 ++++++++++++++++++
 .../gcc.target/powerpc/sse4_1-ceilss.c        |  95 ++++++++++++++
 .../gcc.target/powerpc/sse4_1-check.h         |   4 +
 .../gcc.target/powerpc/sse4_1-round-data.h    |  20 +++
 .../gcc.target/powerpc/sse4_1-round.h         |  27 ++++
 .../gcc.target/powerpc/sse4_1-round2.h        |  27 ++++
 .../gcc.target/powerpc/sse4_1-roundpd-3.c     |  36 ++++++
 9 files changed, 420 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilss.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round-data.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round2.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-3.c

diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c
new file mode 100644
index 000000000000..f532fdb9c285
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <smmintrin.h>
+
+#define VEC_T __m128d
+#define FP_T double
+
+#define ROUND_INTRIN(x, mode) _mm_ceil_pd (x)
+
+#include "sse4_1-round-data.h"
+
+static struct data data[] = {
+  { .value = { .f = {  0.00,  0.25 } }, .answer = {  0.0,  1.0 } },
+  { .value = { .f = {  0.50,  0.75 } }, .answer = {  1.0,  1.0 } },
+
+  { { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffdp+50 } },
+           {  0x1.ffffffffffffcp+50,  0x1.0000000000000p+51 } },
+  { { .f = {  0x1.ffffffffffffep+50,  0x1.fffffffffffffp+50 } },
+           {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
+  { { .f = {  0x1.0000000000000p+51,  0x1.0000000000001p+51 } },
+           {  0x1.0000000000000p+51,  0x1.0000000000002p+51 } },
+  { { .f = {  0x1.0000000000002p+51,  0x1.0000000000003p+51 } },
+           {  0x1.0000000000002p+51,  0x1.0000000000004p+51 } },
+
+  { { .f = {  0x1.ffffffffffffep+51,  0x1.fffffffffffffp+51 } },
+           {  0x1.ffffffffffffep+51,  0x1.0000000000000p+52 } },
+  { { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
+           {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
+
+  { { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
+           { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
+  { { .f = { -0x1.fffffffffffffp+51, -0x1.ffffffffffffep+51 } },
+           { -0x1.ffffffffffffep+51, -0x1.ffffffffffffep+51 } },
+
+  { { .f = { -0x1.0000000000003p+51, -0x1.0000000000002p+51 } },
+           { -0x1.0000000000002p+51, -0x1.0000000000002p+51 } },
+  { { .f = { -0x1.0000000000001p+51, -0x1.0000000000000p+51 } },
+           { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } },
+  { { .f = { -0x1.fffffffffffffp+50, -0x1.ffffffffffffep+50 } },
+           { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
+  { { .f = { -0x1.ffffffffffffdp+50, -0x1.ffffffffffffcp+50 } },
+           { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
+
+  { { .f = { -1.00, -0.75 } }, { -1.0,  0.0 } },
+  { { .f = { -0.50, -0.25 } }, {  0.0,  0.0 } }
+};
+
+#include "sse4_1-round.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c
new file mode 100644
index 000000000000..1e29999a57d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <smmintrin.h>
+
+#define VEC_T __m128
+#define FP_T float
+
+#define ROUND_INTRIN(x, mode) _mm_ceil_ps (x)
+
+#include "sse4_1-round-data.h"
+
+static struct data data[] = {
+  { { .f = {  0.00,  0.25,  0.50,  0.75 } }, {  0.0,  1.0,  1.0,  1.0 } },
+
+  { { .f = {  0x1.fffff8p+21,  0x1.fffffap+21,
+	      0x1.fffffcp+21,  0x1.fffffep+21 } },
+           {  0x1.fffff8p+21,  0x1.000000p+22,
+	      0x1.000000p+22,  0x1.000000p+22 } },
+
+  { { .f = {  0x1.fffffap+22,  0x1.fffffcp+22,
+	      0x1.fffffep+22,  0x1.fffffep+23 } },
+           {  0x1.fffffcp+22,  0x1.fffffcp+22,
+	      0x1.000000p+23,  0x1.fffffep+23 } },
+
+  { { .f = { -0x1.fffffep+23, -0x1.fffffep+22,
+	     -0x1.fffffcp+22, -0x1.fffffap+22 } },
+           { -0x1.fffffep+23, -0x1.fffffcp+22,
+	     -0x1.fffffcp+22, -0x1.fffff8p+22 } },
+
+  { { .f = { -0x1.fffffep+21, -0x1.fffffcp+21,
+	     -0x1.fffffap+21, -0x1.fffff8p+21 } },
+           { -0x1.fffff8p+21, -0x1.fffff8p+21,
+	     -0x1.fffff8p+21, -0x1.fffff8p+21 } },
+
+  { { .f = { -1.00, -0.75, -0.50, -0.25 } }, { -1.0,  0.0,  0.0,  0.0 } }
+};
+
+#include "sse4_1-round.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilsd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilsd.c
new file mode 100644
index 000000000000..cc0d9c1d0afe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilsd.c
@@ -0,0 +1,119 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <smmintrin.h>
+
+#define VEC_T __m128d
+#define FP_T double
+
+#define ROUND_INTRIN(x, y) _mm_ceil_sd (x, y)
+
+#include "sse4_1-round-data.h"
+
+static struct data2 data[] = {
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0.00, IGNORED } },
+    .answer = {  0.0, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0.25, IGNORED } },
+    .answer = {  1.0, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0.50, IGNORED } },
+    .answer = {  1.0, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0.75, IGNORED } },
+    .answer = {  1.0, PASSTHROUGH } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.ffffffffffffcp+50, IGNORED } },
+    .answer = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.ffffffffffffdp+50, IGNORED } },
+    .answer = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.ffffffffffffep+50, IGNORED } },
+    .answer = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffffffffffp+50, IGNORED } },
+    .answer = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.0000000000000p+51, IGNORED } },
+    .answer = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.0000000000001p+51, IGNORED } },
+    .answer = {  0x1.0000000000002p+51, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.0000000000002p+51, IGNORED } },
+    .answer = {  0x1.0000000000002p+51, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.0000000000003p+51, IGNORED } },
+    .answer = {  0x1.0000000000004p+51, PASSTHROUGH } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.ffffffffffffep+51, IGNORED } },
+    .answer = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffffffffffp+51, IGNORED } },
+    .answer = {  0x1.0000000000000p+52, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.0000000000000p+52, IGNORED } },
+    .answer = {  0x1.0000000000000p+52, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.0000000000001p+52, IGNORED } },
+    .answer = {  0x1.0000000000001p+52, PASSTHROUGH } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.0000000000001p+52, IGNORED } },
+    .answer = { -0x1.0000000000001p+52, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.0000000000000p+52, IGNORED } },
+    .answer = { -0x1.0000000000000p+52, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffffffffffp+51, IGNORED } },
+    .answer = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.ffffffffffffep+51, IGNORED } },
+    .answer = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.0000000000003p+51, IGNORED } },
+    .answer = { -0x1.0000000000002p+51, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.0000000000002p+51, IGNORED } },
+    .answer = { -0x1.0000000000002p+51, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.0000000000001p+51, IGNORED } },
+    .answer = { -0x1.0000000000000p+51, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.0000000000000p+51, IGNORED } },
+    .answer = { -0x1.0000000000000p+51, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.ffffffffffffcp+50, IGNORED } },
+    .answer = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.ffffffffffffep+50, IGNORED } },
+    .answer = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.ffffffffffffdp+50, IGNORED } },
+    .answer = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.ffffffffffffcp+50, IGNORED } },
+    .answer = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -1.00, IGNORED } },
+    .answer = { -1.0, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0.75, IGNORED } },
+    .answer = { -0.0, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0.50, IGNORED } },
+    .answer = { -0.0, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0.25, IGNORED } },
+    .answer = { -0.0, PASSTHROUGH } }
+};
+
+#include "sse4_1-round2.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilss.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilss.c
new file mode 100644
index 000000000000..cf1a0392990e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilss.c
@@ -0,0 +1,95 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <smmintrin.h>
+
+#define VEC_T __m128
+#define FP_T float
+
+#define ROUND_INTRIN(x, y) _mm_ceil_ss (x, y)
+
+#include "sse4_1-round-data.h"
+
+static struct data2 data[] = {
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0.00,  IGNORED, IGNORED, IGNORED } },
+    .answer = {  0.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0.25, IGNORED, IGNORED, IGNORED } },
+    .answer = {  1.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0.50, IGNORED, IGNORED, IGNORED } },
+    .answer = {  1.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0.75, IGNORED, IGNORED, IGNORED } },
+    .answer = {  1.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffff8p+21, IGNORED, IGNORED, IGNORED } },
+    .answer = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffap+21, IGNORED, IGNORED, IGNORED } },
+    .answer = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffcp+21, IGNORED, IGNORED, IGNORED } },
+    .answer = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffep+21, IGNORED, IGNORED, IGNORED } },
+    .answer = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffap+22, IGNORED, IGNORED, IGNORED } },
+    .answer = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffcp+22, IGNORED, IGNORED, IGNORED } },
+    .answer = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffep+22, IGNORED, IGNORED, IGNORED } },
+    .answer = {  0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffep+23, IGNORED, IGNORED, IGNORED } },
+    .answer = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffep+23, IGNORED, IGNORED, IGNORED } },
+    .answer = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffep+22, IGNORED, IGNORED, IGNORED } },
+    .answer = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffcp+22, IGNORED, IGNORED, IGNORED } },
+    .answer = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffap+22, IGNORED, IGNORED, IGNORED } },
+    .answer = { -0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffep+21, IGNORED, IGNORED, IGNORED } },
+    .answer = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffcp+21, IGNORED, IGNORED, IGNORED } },
+    .answer = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffap+21, IGNORED, IGNORED, IGNORED } },
+    .answer = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffff8p+21, IGNORED, IGNORED, IGNORED } },
+    .answer = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -1.00, IGNORED, IGNORED, IGNORED } },
+    .answer = { -1.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0.75, IGNORED, IGNORED, IGNORED } },
+    .answer = {  0.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0.50, IGNORED, IGNORED, IGNORED } },
+    .answer = {  0.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0.25, IGNORED, IGNORED, IGNORED } },
+    .answer = {  0.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
+};
+
+#include "sse4_1-round2.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-check.h b/gcc/testsuite/gcc.target/powerpc/sse4_1-check.h
index 5f855b9fd53a..16330533e50a 100644
--- a/gcc/testsuite/gcc.target/powerpc/sse4_1-check.h
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-check.h
@@ -1,6 +1,10 @@
 #include <stdio.h>
 #include <stdlib.h>
 
+/* Define this to enable the combination of VSX vector double and
+   SSE2 data types.  */
+#define __VSX_SSE2__ 1
+
 #include "m128-check.h"
 
 //#define DEBUG 1
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-round-data.h b/gcc/testsuite/gcc.target/powerpc/sse4_1-round-data.h
new file mode 100644
index 000000000000..543f5bc2181b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-round-data.h
@@ -0,0 +1,20 @@
+/* Pick a few numbers at random which are not in the input data and
+   unlikely to show up naturally.  */
+#define PASSTHROUGH -29.5
+#define IGNORED -61.5
+
+union value {
+  VEC_T x;
+  FP_T f[sizeof (VEC_T) / sizeof (FP_T)];
+};
+
+struct data {
+  union value value;
+  double answer[sizeof (VEC_T) / sizeof (FP_T)];
+};
+
+struct data2 {
+  union value value1;
+  union value value2;
+  double answer[sizeof (VEC_T) / sizeof (FP_T)];
+};
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-round.h b/gcc/testsuite/gcc.target/powerpc/sse4_1-round.h
new file mode 100644
index 000000000000..6acf8da8b766
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-round.h
@@ -0,0 +1,27 @@
+#include <fenv.h>
+#include <smmintrin.h>
+#include "sse4_1-check.h"
+
+#define DIM(a) (sizeof (a) / sizeof ((a)[0]))
+
+static int modes[] = { FE_TONEAREST, FE_UPWARD, FE_DOWNWARD, FE_TOWARDZERO };
+
+static void
+TEST (void)
+{
+  int i, j, ri, round_save;
+
+  round_save = fegetround ();
+  for (ri = 0; ri < DIM (modes); ri++) {
+    (void) fesetround (modes[ri]);
+    for (i = 0; i < DIM (data); i++) {
+      union value guess;
+      guess.x = ROUND_INTRIN (data[i].value.x, /* Ignored.  */);
+      for (j = 0; j < DIM (data[i].value.f); j++) {
+        if (guess.f[j] != data[i].answer[j])
+          abort ();
+      }
+    }
+  }
+  (void) fesetround (round_save);
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-round2.h b/gcc/testsuite/gcc.target/powerpc/sse4_1-round2.h
new file mode 100644
index 000000000000..859574e11d9a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-round2.h
@@ -0,0 +1,27 @@
+#include <fenv.h>
+#include <smmintrin.h>
+#include "sse4_1-check.h"
+
+#define DIM(a) (sizeof (a) / sizeof ((a)[0]))
+
+static int modes[] = { FE_TONEAREST, FE_UPWARD, FE_DOWNWARD, FE_TOWARDZERO };
+
+static void
+TEST (void)
+{
+  int i, j, ri, round_save;
+
+  round_save = fegetround ();
+  for (ri = 0; ri < DIM (modes); ri++) {
+    (void) fesetround (modes[ri]);
+    for (i = 0; i < DIM (data); i++) {
+      union value guess;
+      guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x);
+      for (j = 0; j < DIM (data[i].value1.f); j++) {
+        if (guess.f[j] != data[i].answer[j])
+          abort ();
+      }
+    }
+  }
+  (void) fesetround (round_save);
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-3.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-3.c
new file mode 100644
index 000000000000..88a5f0718ebb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-3.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse4_1_test
+#endif
+
+#include CHECK_H
+
+#include <smmintrin.h>
+
+static void
+TEST (void)
+{
+  union128d u, s;
+  double e[2] = {0.0};
+  int i;
+
+  s.x = _mm_set_pd (1.1234, -2.3478);
+  u.x = _mm_ceil_pd (s.x);
+
+  for (i = 0; i < 2; i++)
+    {
+      __m128d tmp = _mm_load_sd (&s.a[i]);
+      tmp = _mm_ceil_sd (tmp, tmp);
+      _mm_store_sd (&e[i], tmp);
+    }
+
+  if (check_union128d (u, e))
+    abort ();
+}
-- 
2.27.0


* [PATCH v2 5/6] rs6000: Add support for SSE4.1 "floor" intrinsics
  2021-07-16 13:50 [PATCH v2 0/6] rs6000: Add SSE4.1 "blend", "ceil", "floor" Paul A. Clarke
                   ` (3 preceding siblings ...)
  2021-07-16 13:50 ` [PATCH v2 4/6] rs6000: Add tests " Paul A. Clarke
@ 2021-07-16 13:50 ` Paul A. Clarke
  2021-07-16 18:30   ` Bill Schmidt
  2021-07-28 22:25   ` Segher Boessenkool
  2021-07-16 13:50 ` [PATCH v2 6/6] rs6000: Add tests " Paul A. Clarke
  5 siblings, 2 replies; 20+ messages in thread
From: Paul A. Clarke @ 2021-07-16 13:50 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, wschmidt

2021-07-16  Paul A. Clarke  <pc@us.ibm.com>

gcc
	* config/rs6000/smmintrin.h (_mm_floor_pd, _mm_floor_ps,
	_mm_floor_sd, _mm_floor_ss): New.
---
v2: Improve formatting per review from Bill.

 gcc/config/rs6000/smmintrin.h | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index cad770a67631..5960991e0af7 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -264,6 +264,38 @@ _mm_ceil_ss (__m128 __A, __m128 __B)
   return r;
 }
 
+__inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_floor_pd (__m128d __A)
+{
+  return (__m128d) vec_floor ((__v2df) __A);
+}
+
+__inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_floor_ps (__m128 __A)
+{
+  return (__m128) vec_floor ((__v4sf) __A);
+}
+
+__inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_floor_sd (__m128d __A, __m128d __B)
+{
+  __v2df r = vec_floor ((__v2df) __B);
+  r[1] = ((__v2df) __A)[1];
+  return (__m128d) r;
+}
+
+__inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_floor_ss (__m128 __A, __m128 __B)
+{
+  __v4sf r = (__v4sf) __A;
+  r[0] = __builtin_floor (((__v4sf) __B)[0]);
+  return r;
+}
+
 /* Return horizontal packed word minimum and its index in bits [15:0]
    and bits [18:16] respectively.  */
 __inline __m128i
-- 
2.27.0


* [PATCH v2 6/6] rs6000: Add tests for SSE4.1 "floor" intrinsics
  2021-07-16 13:50 [PATCH v2 0/6] rs6000: Add SSE4.1 "blend", "ceil", "floor" Paul A. Clarke
                   ` (4 preceding siblings ...)
  2021-07-16 13:50 ` [PATCH v2 5/6] rs6000: Add support for SSE4.1 "floor" intrinsics Paul A. Clarke
@ 2021-07-16 13:50 ` Paul A. Clarke
  2021-07-16 18:31   ` Bill Schmidt
  2021-07-28 22:26   ` Segher Boessenkool
  5 siblings, 2 replies; 20+ messages in thread
From: Paul A. Clarke @ 2021-07-16 13:50 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, wschmidt

Add tests for _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, and _mm_floor_ss.
These are modelled after the tests for the _mm_ceil intrinsics in patch
4/6 of this series, and reuse its helper headers (sse4_1-round-data.h,
sse4_1-round.h, sse4_1-round2.h).

Copy sse4_1-roundpd-2.c, which checks _mm_floor_pd against _mm_floor_sd,
from gcc/testsuite/gcc.target/i386.

2021-07-16  Paul A. Clarke  <pc@us.ibm.com>

gcc/testsuite
	* gcc.target/powerpc/sse4_1-floorpd.c: New.
	* gcc.target/powerpc/sse4_1-floorps.c: New.
	* gcc.target/powerpc/sse4_1-floorsd.c: New.
	* gcc.target/powerpc/sse4_1-floorss.c: New.
	* gcc.target/powerpc/sse4_1-roundpd-2.c: Copy from
	gcc/testsuite/gcc.target/i386.
---
v2: Improve formatting per review from Bill.

 .../gcc.target/powerpc/sse4_1-floorpd.c       |  51 ++++++++
 .../gcc.target/powerpc/sse4_1-floorps.c       |  41 ++++++
 .../gcc.target/powerpc/sse4_1-floorsd.c       | 119 ++++++++++++++++++
 .../gcc.target/powerpc/sse4_1-floorss.c       |  95 ++++++++++++++
 .../gcc.target/powerpc/sse4_1-roundpd-2.c     |  36 ++++++
 5 files changed, 342 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorss.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-2.c

diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c
new file mode 100644
index 000000000000..ad21644f50c4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c
@@ -0,0 +1,51 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <smmintrin.h>
+
+#define VEC_T __m128d
+#define FP_T double
+
+#define ROUND_INTRIN(x, mode) _mm_floor_pd (x)
+
+#include "sse4_1-round-data.h"
+
+static struct data data[] = {
+  { .value = { .f = {  0.00,  0.25 } }, .answer = {  0.0,  0.0 } },
+  { .value = { .f = {  0.50,  0.75 } }, .answer = {  0.0,  0.0 } },
+
+  { { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffdp+50 } },
+           {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
+  { { .f = {  0x1.ffffffffffffep+50,  0x1.0000000000000p+51 } },
+           {  0x1.ffffffffffffcp+50,  0x1.0000000000000p+51 } },
+  { { .f = {  0x1.0000000000000p+51,  0x1.0000000000001p+51 } },
+           {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
+  { { .f = {  0x1.0000000000002p+51,  0x1.0000000000003p+51 } },
+           {  0x1.0000000000002p+51,  0x1.0000000000002p+51 } },
+
+  { { .f = {  0x1.ffffffffffffep+51,  0x1.fffffffffffffp+51 } },
+           {  0x1.ffffffffffffep+51,  0x1.ffffffffffffep+51 } },
+  { { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
+           {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
+
+  { { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
+           { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
+  { { .f = { -0x1.fffffffffffffp+51, -0x1.ffffffffffffep+52 } },
+           { -0x1.0000000000000p+52, -0x1.ffffffffffffep+52 } },
+
+  { { .f = { -0x1.0000000000003p+51, -0x1.0000000000002p+51 } },
+           { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
+  { { .f = { -0x1.0000000000001p+51, -0x1.0000000000000p+51 } },
+           { -0x1.0000000000002p+51, -0x1.0000000000000p+51 } },
+  { { .f = { -0x1.fffffffffffffp+50, -0x1.ffffffffffffep+50 } },
+           { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } },
+  { { .f = { -0x1.ffffffffffffdp+50, -0x1.ffffffffffffcp+50 } },
+           { -0x1.0000000000000p+51, -0x1.ffffffffffffcp+50 } },
+
+  { { .f = { -1.00, -0.75 } }, { -1.0, -1.0 } },
+  { { .f = { -0.50, -0.25 } }, { -1.0, -1.0 } }
+};
+
+#include "sse4_1-round.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c
new file mode 100644
index 000000000000..a53ef9aa9e8b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <smmintrin.h>
+
+#define VEC_T __m128
+#define FP_T float
+
+#define ROUND_INTRIN(x, mode) _mm_floor_ps (x)
+
+#include "sse4_1-round-data.h"
+
+static struct data data[] = {
+  { { .f = {  0.00,  0.25,  0.50,  0.75 } }, {  0.0,  0.0,  0.0,  0.0 } },
+
+  { { .f = {  0x1.fffff8p+21,  0x1.fffffap+21,
+	      0x1.fffffcp+21,  0x1.fffffep+21 } },
+           {  0x1.fffff8p+21,  0x1.fffff8p+21,
+	      0x1.fffff8p+21,  0x1.fffff8p+21 } },
+
+  { { .f = {  0x1.fffffap+22,  0x1.fffffcp+22,
+	      0x1.fffffep+22,  0x1.fffffep+23 } },
+           {  0x1.fffff8p+22,  0x1.fffffcp+22,
+	      0x1.fffffcp+22,  0x1.fffffep+23 } },
+
+  { { .f = { -0x1.fffffep+23, -0x1.fffffep+22,
+	     -0x1.fffffcp+22, -0x1.fffffap+22 } },
+           { -0x1.fffffep+23, -0x1.000000p+23,
+	     -0x1.fffffcp+22, -0x1.fffffcp+22 } },
+
+  { { .f = { -0x1.fffffep+21, -0x1.fffffcp+21,
+	     -0x1.fffffap+21, -0x1.fffff8p+21 } },
+           { -0x1.000000p+22, -0x1.000000p+22,
+	     -0x1.000000p+22, -0x1.fffff8p+21 } },
+
+  { { .f = { -1.00, -0.75, -0.50, -0.25 } }, { -1.0, -1.0, -1.0, -1.0 } }
+};
+
+#include "sse4_1-round.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-floorsd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorsd.c
new file mode 100644
index 000000000000..e4ebc550556f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorsd.c
@@ -0,0 +1,119 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <smmintrin.h>
+
+#define VEC_T __m128d
+#define FP_T double
+
+#define ROUND_INTRIN(x, y) _mm_floor_sd (x, y)
+
+#include "sse4_1-round-data.h"
+
+static struct data2 data[] = {
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0.00, IGNORED } },
+    .answer = {  0.0, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0.25, IGNORED } },
+    .answer = {  0.0, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0.50, IGNORED } },
+    .answer = {  0.0, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0.75, IGNORED } },
+    .answer = {  0.0, PASSTHROUGH } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.ffffffffffffcp+50, IGNORED } },
+    .answer = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.ffffffffffffdp+50, IGNORED } },
+    .answer = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.ffffffffffffep+50, IGNORED } },
+    .answer = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffffffffffp+50, IGNORED } },
+    .answer = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.0000000000000p+51, IGNORED } },
+    .answer = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.0000000000001p+51, IGNORED } },
+    .answer = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.0000000000002p+51, IGNORED } },
+    .answer = {  0x1.0000000000002p+51, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.0000000000003p+51, IGNORED } },
+    .answer = {  0x1.0000000000002p+51, PASSTHROUGH } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.ffffffffffffep+51, IGNORED } },
+    .answer = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffffffffffp+51, IGNORED } },
+    .answer = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.0000000000000p+52, IGNORED } },
+    .answer = {  0x1.0000000000000p+52, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.0000000000001p+52, IGNORED } },
+    .answer = {  0x1.0000000000001p+52, PASSTHROUGH } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.0000000000001p+52, IGNORED } },
+    .answer = { -0x1.0000000000001p+52, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.0000000000000p+52, IGNORED } },
+    .answer = { -0x1.0000000000000p+52, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffffffffffp+51, IGNORED } },
+    .answer = { -0x1.0000000000000p+52, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.ffffffffffffep+51, IGNORED } },
+    .answer = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.0000000000003p+51, IGNORED } },
+    .answer = { -0x1.0000000000004p+51, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.0000000000002p+51, IGNORED } },
+    .answer = { -0x1.0000000000002p+51, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.0000000000001p+51, IGNORED } },
+    .answer = { -0x1.0000000000002p+51, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.0000000000000p+51, IGNORED } },
+    .answer = { -0x1.0000000000000p+51, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffffffffffp+50, IGNORED } },
+    .answer = { -0x1.0000000000000p+51, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.ffffffffffffep+50, IGNORED } },
+    .answer = { -0x1.0000000000000p+51, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.ffffffffffffdp+50, IGNORED } },
+    .answer = { -0x1.0000000000000p+51, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.ffffffffffffcp+50, IGNORED } },
+    .answer = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -1.00, IGNORED } },
+    .answer = { -1.0, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0.75, IGNORED } },
+    .answer = { -1.0, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0.50, IGNORED } },
+    .answer = { -1.0, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0.25, IGNORED } },
+    .answer = { -1.0, PASSTHROUGH } }
+};
+
+#include "sse4_1-round2.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-floorss.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorss.c
new file mode 100644
index 000000000000..cfbfe2b1eba7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorss.c
@@ -0,0 +1,95 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <smmintrin.h>
+
+#define VEC_T __m128
+#define FP_T float
+
+#define ROUND_INTRIN(x, y) _mm_floor_ss (x, y)
+
+#include "sse4_1-round-data.h"
+
+static struct data2 data[] = {
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0.00,  IGNORED, IGNORED, IGNORED } },
+    .answer = {  0.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0.25, IGNORED, IGNORED, IGNORED } },
+    .answer = {  0.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0.50, IGNORED, IGNORED, IGNORED } },
+    .answer = {  0.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0.75, IGNORED, IGNORED, IGNORED } },
+    .answer = {  0.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffff8p+21, IGNORED, IGNORED, IGNORED } },
+    .answer = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffap+21, IGNORED, IGNORED, IGNORED } },
+    .answer = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffcp+21, IGNORED, IGNORED, IGNORED } },
+    .answer = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffep+21, IGNORED, IGNORED, IGNORED } },
+    .answer = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffap+22, IGNORED, IGNORED, IGNORED } },
+    .answer = {  0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffcp+22, IGNORED, IGNORED, IGNORED } },
+    .answer = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffep+22, IGNORED, IGNORED, IGNORED } },
+    .answer = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffep+23, IGNORED, IGNORED, IGNORED } },
+    .answer = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffep+23, IGNORED, IGNORED, IGNORED } },
+    .answer = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffep+22, IGNORED, IGNORED, IGNORED } },
+    .answer = { -0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffcp+22, IGNORED, IGNORED, IGNORED } },
+    .answer = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffap+22, IGNORED, IGNORED, IGNORED } },
+    .answer = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffep+21, IGNORED, IGNORED, IGNORED } },
+    .answer = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffcp+21, IGNORED, IGNORED, IGNORED } },
+    .answer = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffap+21, IGNORED, IGNORED, IGNORED } },
+    .answer = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffff8p+21, IGNORED, IGNORED, IGNORED } },
+    .answer = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -1.00, IGNORED, IGNORED, IGNORED } },
+    .answer = { -1.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0.75, IGNORED, IGNORED, IGNORED } },
+    .answer = { -1.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0.50, IGNORED, IGNORED, IGNORED } },
+    .answer = { -1.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0.25, IGNORED, IGNORED, IGNORED } },
+    .answer = { -1.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
+};
+
+#include "sse4_1-round2.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-2.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-2.c
new file mode 100644
index 000000000000..cec16175473f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-2.c
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse4_1_test
+#endif
+
+#include CHECK_H
+
+#include <smmintrin.h>
+
+static void
+TEST (void)
+{
+  union128d u, s;
+  double e[2] = {0.0};
+  int i;
+
+  s.x = _mm_set_pd (1.1234, -2.3478);
+  u.x = _mm_floor_pd (s.x);
+
+  for (i = 0; i < 2; i++)
+    {
+      __m128d tmp = _mm_load_sd (&s.a[i]);
+      tmp = _mm_floor_sd (tmp, tmp);
+      _mm_store_sd (&e[i], tmp);
+    }
+
+  if (check_union128d (u, e))
+    abort ();
+}
-- 
2.27.0


* Re: [PATCH v2 1/6] rs6000: Add support for SSE4.1 "blend" intrinsics
  2021-07-16 13:50 ` [PATCH v2 1/6] rs6000: Add support for SSE4.1 "blend" intrinsics Paul A. Clarke
@ 2021-07-16 18:13   ` Bill Schmidt
  2021-07-28 21:30   ` Segher Boessenkool
  1 sibling, 0 replies; 20+ messages in thread
From: Bill Schmidt @ 2021-07-16 18:13 UTC (permalink / raw)
  To: Paul A. Clarke, gcc-patches; +Cc: segher

Hi Paul,

Thanks!  LGTM.  Recommend that maintainers approve.

Bill

On 7/16/21 8:50 AM, Paul A. Clarke wrote:
> _mm_blend_epi16 and _mm_blendv_epi8 were added earlier.
> Add these four to complete the set.
>
> 2021-07-16  Paul A. Clarke  <pc@us.ibm.com>
>
> gcc
> 	* config/rs6000/smmintrin.h (_mm_blend_pd, _mm_blendv_pd,
> 	_mm_blend_ps, _mm_blendv_ps): New.
> ---
> v2:
> - Per review from Bill, rewrote _mm_blend_pd and _mm_blendv_pd to use
>    vec_perm instead of gather/unpack/select.
>
>   gcc/config/rs6000/smmintrin.h | 60 +++++++++++++++++++++++++++++++++++
>   1 file changed, 60 insertions(+)
>
> diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
> index 6a010fdbb96f..69e54702a877 100644
> --- a/gcc/config/rs6000/smmintrin.h
> +++ b/gcc/config/rs6000/smmintrin.h
> @@ -116,6 +116,66 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
>     return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
>   }
>   
> +__inline __m128d
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_blend_pd (__m128d __A, __m128d __B, const int __imm8)
> +{
> +  __v16qu __pcv[] =
> +    {
> +      {  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15 },
> +      { 16, 17, 18, 19, 20, 21, 22, 23,  8,  9, 10, 11, 12, 13, 14, 15 },
> +      {  0,  1,  2,  3,  4,  5,  6,  7, 24, 25, 26, 27, 28, 29, 30, 31 },
> +      { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 }
> +    };
> +  __v16qu __r = vec_perm ((__v16qu) __A, (__v16qu) __B, __pcv[__imm8]);
> +  return (__m128d) __r;
> +}
> +
> +__inline __m128d
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
> +{
> +  const __v2di __zero = {0};
> +  const __vector __bool long long __boolmask = vec_cmplt ((__v2di) __mask, __zero);
> +  return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __boolmask);
> +}
> +
> +__inline __m128
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_blend_ps (__m128 __A, __m128 __B, const int __imm8)
> +{
> +  __v16qu __pcv[] =
> +    {
> +      {  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15 },
> +      { 16, 17, 18, 19,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15 },
> +      {  0,  1,  2,  3, 20, 21, 22, 23,  8,  9, 10, 11, 12, 13, 14, 15 },
> +      { 16, 17, 18, 19, 20, 21, 22, 23,  8,  9, 10, 11, 12, 13, 14, 15 },
> +      {  0,  1,  2,  3,  4,  5,  6,  7, 24, 25, 26, 27, 12, 13, 14, 15 },
> +      { 16, 17, 18, 19,  4,  5,  6,  7, 24, 25, 26, 27, 12, 13, 14, 15 },
> +      {  0,  1,  2,  3, 20, 21, 22, 23, 24, 25, 26, 27, 12, 13, 14, 15 },
> +      { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 12, 13, 14, 15 },
> +      {  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 28, 29, 30, 31 },
> +      { 16, 17, 18, 19,  4,  5,  6,  7,  8,  9, 10, 11, 28, 29, 30, 31 },
> +      {  0,  1,  2,  3, 20, 21, 22, 23,  8,  9, 10, 11, 28, 29, 30, 31 },
> +      { 16, 17, 18, 19, 20, 21, 22, 23,  8,  9, 10, 11, 28, 29, 30, 31 },
> +      {  0,  1,  2,  3,  4,  5,  6,  7, 24, 25, 26, 27, 28, 29, 30, 31 },
> +      { 16, 17, 18, 19,  4,  5,  6,  7, 24, 25, 26, 27, 28, 29, 30, 31 },
> +      {  0,  1,  2,  3, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 },
> +      { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 },
> +    };
> +  __v16qu __r = vec_perm ((__v16qu) __A, (__v16qu) __B, __pcv[__imm8]);
> +  return (__m128) __r;
> +}
> +
> +__inline __m128
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask)
> +{
> +  const __v4si __zero = {0};
> +  const __vector __bool int __boolmask = vec_cmplt ((__v4si) __mask, __zero);
> +  return (__m128) vec_sel ((__v4su) __A, (__v4su) __B, (__v4su) __boolmask);
> +}
> +
>   __inline int
>   __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>   _mm_testz_si128 (__m128i __A, __m128i __B)
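
For readers following the permute tables above: each row appears to follow
one rule, namely that bit j of the immediate picks 32-bit lane j from __B
(byte indices 16..31) instead of __A (byte indices 0..15).  The sketch
below is a portable scalar model of that rule, not code from the patch.
The helper names (blend_ps_pcv, perm_model, blend_ps_model, check_row5)
are hypothetical, the permute is modelled in plain array order without
regard to target endianness, and sizeof (float) == 4 is assumed.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not in the patch): build the 16-byte permute
   control vector for a 4-lane blend.  Bit j of imm8 set means lane j
   comes from B, whose bytes are numbered 16..31.  */
static void
blend_ps_pcv (int imm8, unsigned char pcv[16])
{
  int lane, byte;

  for (lane = 0; lane < 4; lane++)
    {
      int base = ((imm8 >> lane) & 1) ? 16 : 0;
      for (byte = 0; byte < 4; byte++)
        pcv[lane * 4 + byte] = base + lane * 4 + byte;
    }
}

/* Scalar model of a byte permute over the concatenation A || B,
   indexed in array order (an assumption of this sketch).  */
static void
perm_model (const unsigned char a[16], const unsigned char b[16],
            const unsigned char pcv[16], unsigned char r[16])
{
  int i;

  for (i = 0; i < 16; i++)
    r[i] = (pcv[i] & 0x10) ? b[pcv[i] & 0x0f] : a[pcv[i] & 0x0f];
}

/* Blend four floats by way of the two helpers above.  */
static void
blend_ps_model (const float a[4], const float b[4], int imm8, float r[4])
{
  unsigned char ab[16], bb[16], rb[16], pcv[16];

  memcpy (ab, a, sizeof ab);
  memcpy (bb, b, sizeof bb);
  blend_ps_pcv (imm8 & 0x0f, pcv);
  perm_model (ab, bb, pcv, rb);
  memcpy (r, rb, sizeof rb);
}

/* Compare the generated control vector against row 5 of the
   _mm_blend_ps table quoted above.  */
static void
check_row5 (void)
{
  static const unsigned char row5[16] =
    { 16, 17, 18, 19, 4, 5, 6, 7, 24, 25, 26, 27, 12, 13, 14, 15 };
  unsigned char pcv[16];

  blend_ps_pcv (5, pcv);
  assert (memcmp (pcv, row5, sizeof row5) == 0);
}
```

Calling check_row5 () from a test driver exercises the comparison; the
same generator with two 8-byte lanes would give the four rows used for
_mm_blend_pd.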

* Re: [PATCH v2 2/6] rs6000: Add tests for SSE4.1 "blend" intrinsics
  2021-07-16 13:50 ` [PATCH v2 2/6] rs6000: Add tests " Paul A. Clarke
@ 2021-07-16 18:16   ` Bill Schmidt
  2021-07-28 21:51   ` Segher Boessenkool
  1 sibling, 0 replies; 20+ messages in thread
From: Bill Schmidt @ 2021-07-16 18:16 UTC (permalink / raw)
  To: Paul A. Clarke, gcc-patches; +Cc: segher

Hi Paul,

Thanks for the cleanup, LGTM!  Recommend maintainers approve.

Bill

On 7/16/21 8:50 AM, Paul A. Clarke wrote:
> Copy the tests for _mm_blend_pd, _mm_blendv_pd, _mm_blend_ps,
> _mm_blendv_ps from gcc/testsuite/gcc.target/i386.
>
> 2021-07-16  Paul A. Clarke  <pc@us.ibm.com>
>
> gcc/testsuite
> 	* gcc.target/powerpc/sse4_1-blendpd.c: Copy from gcc.target/i386.
> 	* gcc.target/powerpc/sse4_1-blendps-2.c: Likewise.
> 	* gcc.target/powerpc/sse4_1-blendps.c: Likewise.
> 	* gcc.target/powerpc/sse4_1-blendvpd.c: Likewise.
> ---
> v2: Improve formatting per review from Bill.
>
>   .../gcc.target/powerpc/sse4_1-blendpd.c       | 89 ++++++++++++++++++
>   .../gcc.target/powerpc/sse4_1-blendps-2.c     | 81 +++++++++++++++++
>   .../gcc.target/powerpc/sse4_1-blendps.c       | 90 +++++++++++++++++++
>   .../gcc.target/powerpc/sse4_1-blendvpd.c      | 65 ++++++++++++++
>   4 files changed, 325 insertions(+)
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
> new file mode 100644
> index 000000000000..ca1780471fa2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendpd.c
> @@ -0,0 +1,89 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target p8vector_hw } */
> +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> +
> +#ifndef CHECK_H
> +#define CHECK_H "sse4_1-check.h"
> +#endif
> +
> +#ifndef TEST
> +#define TEST sse4_1_test
> +#endif
> +
> +#include CHECK_H
> +
> +#include <smmintrin.h>
> +#include <string.h>
> +
> +#define NUM 20
> +
> +#ifndef MASK
> +#define MASK 0x03
> +#endif
> +
> +static void
> +init_blendpd (double *src1, double *src2)
> +{
> +  int i, sign = 1;
> +
> +  for (i = 0; i < NUM * 2; i++)
> +    {
> +      src1[i] = i * i * sign;
> +      src2[i] = (i + 20) * sign;
> +      sign = -sign;
> +    }
> +}
> +
> +static int
> +check_blendpd (__m128d *dst, double *src1, double *src2)
> +{
> +  double tmp[2];
> +  int j;
> +
> +  memcpy (&tmp[0], src1, sizeof (tmp));
> +
> +  for (j = 0; j < 2; j++)
> +    if ((MASK & (1 << j)))
> +      tmp[j] = src2[j];
> +
> +  return memcmp (dst, &tmp[0], sizeof (tmp));
> +}
> +
> +static void
> +TEST (void)
> +{
> +  __m128d x, y;
> +  union
> +    {
> +      __m128d x[NUM];
> +      double d[NUM * 2];
> +    } dst, src1, src2;
> +  union
> +    {
> +      __m128d x;
> +      double d[2];
> +    } src3;
> +  int i;
> +
> +  init_blendpd (src1.d, src2.d);
> +
> +  /* Check blendpd imm8, m128, xmm */
> +  for (i = 0; i < NUM; i++)
> +    {
> +      dst.x[i] = _mm_blend_pd (src1.x[i], src2.x[i], MASK);
> +      if (check_blendpd (&dst.x[i], &src1.d[i * 2], &src2.d[i * 2]))
> +	abort ();
> +    }
> +
> +  /* Check blendpd imm8, xmm, xmm */
> +  src3.x = _mm_setzero_pd ();
> +
> +  x = _mm_blend_pd (dst.x[2], src3.x, MASK);
> +  y = _mm_blend_pd (src3.x, dst.x[2], MASK);
> +
> +  if (check_blendpd (&x, &dst.d[4], &src3.d[0]))
> +    abort ();
> +
> +  if (check_blendpd (&y, &src3.d[0], &dst.d[4]))
> +    abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
> new file mode 100644
> index 000000000000..768b6e64bbae
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps-2.c
> @@ -0,0 +1,81 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target p8vector_hw } */
> +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> +
> +#include "sse4_1-check.h"
> +
> +#include <smmintrin.h>
> +#include <string.h>
> +#include <stdlib.h>
> +
> +#define NUM 20
> +
> +#undef MASK
> +#define MASK 0xe
> +
> +static void
> +init_blendps (float *src1, float *src2)
> +{
> +  int i, sign = 1;
> +
> +  for (i = 0; i < NUM * 4; i++)
> +    {
> +      src1[i] = i * i * sign;
> +      src2[i] = (i + 20) * sign;
> +      sign = -sign;
> +    }
> +}
> +
> +static int
> +check_blendps (__m128 *dst, float *src1, float *src2)
> +{
> +  float tmp[4];
> +  int j;
> +
> +  memcpy (&tmp[0], src1, sizeof (tmp));
> +  for (j = 0; j < 4; j++)
> +    if ((MASK & (1 << j)))
> +      tmp[j] = src2[j];
> +
> +  return memcmp (dst, &tmp[0], sizeof (tmp));
> +}
> +
> +static void
> +sse4_1_test (void)
> +{
> +  __m128 x, y;
> +  union
> +    {
> +      __m128 x[NUM];
> +      float f[NUM * 4];
> +    } dst, src1, src2;
> +  union
> +    {
> +      __m128 x;
> +      float f[4];
> +    } src3;
> +  int i;
> +
> +  init_blendps (src1.f, src2.f);
> +
> +  for (i = 0; i < 4; i++)
> +    src3.f[i] = (int) rand ();
> +
> +  /* Check blendps imm8, m128, xmm */
> +  for (i = 0; i < NUM; i++)
> +    {
> +      dst.x[i] = _mm_blend_ps (src1.x[i], src2.x[i], MASK);
> +      if (check_blendps (&dst.x[i], &src1.f[i * 4], &src2.f[i * 4]))
> +	abort ();
> +    }
> +
> +  /* Check blendps imm8, xmm, xmm */
> +  x = _mm_blend_ps (dst.x[2], src3.x, MASK);
> +  y = _mm_blend_ps (src3.x, dst.x[2], MASK);
> +
> +  if (check_blendps (&x, &dst.f[8], &src3.f[0]))
> +    abort ();
> +
> +  if (check_blendps (&y, &src3.f[0], &dst.f[8]))
> +    abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c
> new file mode 100644
> index 000000000000..2f114b69a84b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendps.c
> @@ -0,0 +1,90 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target p8vector_hw } */
> +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> +
> +#ifndef CHECK_H
> +#define CHECK_H "sse4_1-check.h"
> +#endif
> +
> +#ifndef TEST
> +#define TEST sse4_1_test
> +#endif
> +
> +#include CHECK_H
> +
> +#include <smmintrin.h>
> +#include <string.h>
> +#include <stdlib.h>
> +
> +#define NUM 20
> +
> +#ifndef MASK
> +#define MASK 0x0f
> +#endif
> +
> +static void
> +init_blendps (float *src1, float *src2)
> +{
> +  int i, sign = 1;
> +
> +  for (i = 0; i < NUM * 4; i++)
> +    {
> +      src1[i] = i * i * sign;
> +      src2[i] = (i + 20) * sign;
> +      sign = -sign;
> +    }
> +}
> +
> +static int
> +check_blendps (__m128 *dst, float *src1, float *src2)
> +{
> +  float tmp[4];
> +  int j;
> +
> +  memcpy (&tmp[0], src1, sizeof (tmp));
> +  for (j = 0; j < 4; j++)
> +    if ((MASK & (1 << j)))
> +      tmp[j] = src2[j];
> +
> +  return memcmp (dst, &tmp[0], sizeof (tmp));
> +}
> +
> +static void
> +TEST (void)
> +{
> +  __m128 x, y;
> +  union
> +    {
> +      __m128 x[NUM];
> +      float f[NUM * 4];
> +    } dst, src1, src2;
> +  union
> +    {
> +      __m128 x;
> +      float f[4];
> +    } src3;
> +  int i;
> +
> +  init_blendps (src1.f, src2.f);
> +
> +  for (i = 0; i < 4; i++)
> +    src3.f[i] = (int) rand ();
> +
> +  /* Check blendps imm8, m128, xmm */
> +  for (i = 0; i < NUM; i++)
> +    {
> +      dst.x[i] = _mm_blend_ps (src1.x[i], src2.x[i], MASK);
> +      if (check_blendps (&dst.x[i], &src1.f[i * 4], &src2.f[i * 4]))
> +	abort ();
> +    }
> +
> +  /* Check blendps imm8, xmm, xmm */
> +  x = _mm_blend_ps (dst.x[2], src3.x, MASK);
> +  y = _mm_blend_ps (src3.x, dst.x[2], MASK);
> +
> +  if (check_blendps (&x, &dst.f[8], &src3.f[0]))
> +    abort ();
> +
> +  if (check_blendps (&y, &src3.f[0], &dst.f[8]))
> +    abort ();
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c
> new file mode 100644
> index 000000000000..b82cd28848a6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvpd.c
> @@ -0,0 +1,65 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target p8vector_hw } */
> +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> +
> +#include "sse4_1-check.h"
> +
> +#include <smmintrin.h>
> +#include <string.h>
> +
> +#define NUM 20
> +
> +static void
> +init_blendvpd (double *src1, double *src2, double *mask)
> +{
> +  int i, msk, sign = 1;
> +
> +  msk = -1;
> +  for (i = 0; i < NUM * 2; i++)
> +    {
> +      if ((i % 2) == 0)
> +	msk++;
> +      src1[i] = i * (i + 1) * sign;
> +      src2[i] = (i + 20) * sign;
> +      mask[i] = (i + 120) * i;
> +      if ((msk & (1 << (i % 2))))
> +	mask[i] = -mask[i];
> +      sign = -sign;
> +    }
> +}
> +
> +static int
> +check_blendvpd (__m128d *dst, double *src1, double *src2,
> +		double *mask)
> +{
> +  double tmp[2];
> +  int j;
> +
> +  memcpy (&tmp[0], src1, sizeof (tmp));
> +  for (j = 0; j < 2; j++)
> +    if (mask[j] < 0.0)
> +      tmp[j] = src2[j];
> +
> +  return memcmp (dst, &tmp[0], sizeof (tmp));
> +}
> +
> +static void
> +sse4_1_test (void)
> +{
> +  union
> +    {
> +      __m128d x[NUM];
> +      double d[NUM * 2];
> +    } dst, src1, src2, mask;
> +  int i;
> +
> +  init_blendvpd (src1.d, src2.d, mask.d);
> +
> +  for (i = 0; i < NUM; i++)
> +    {
> +      dst.x[i] = _mm_blendv_pd (src1.x[i], src2.x[i], mask.x[i]);
> +      if (check_blendvpd (&dst.x[i], &src1.d[i * 2], &src2.d[i * 2],
> +			  &mask.d[i * 2]))
> +	abort ();
> +    }
> +}
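
A note on the reference check in sse4_1-blendvpd.c: _mm_blendv_pd in
patch 1/6 compares the mask lanes as signed integers against zero, so it
effectively looks at the sign bit only, while check_blendvpd uses
"mask[j] < 0.0".  The two agree for ordinary negative and positive
values but would differ for -0.0 (and for NaNs with the sign bit set).
As far as can be seen, init_blendvpd never produces such masks, so the
test is unaffected.  The scalar model below is only a sketch of the
sign-bit rule; blendv_pd_model and check_sign_bit_select are
hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model: take b[j] when the sign bit of mask[j]
   is set, otherwise a[j].  Only the top bit of each 64-bit lane is
   examined, so -0.0 selects b[j].  */
static void
blendv_pd_model (const double a[2], const double b[2],
                 const double mask[2], double r[2])
{
  int j;

  for (j = 0; j < 2; j++)
    {
      uint64_t bits;

      memcpy (&bits, &mask[j], sizeof bits);
      r[j] = (bits >> 63) ? b[j] : a[j];
    }
}

/* Show the one case where a "< 0.0" comparison would disagree.  */
static void
check_sign_bit_select (void)
{
  const double a[2] = { 1.0, 2.0 }, b[2] = { 10.0, 20.0 };
  const double mask[2] = { -0.0, 0.0 };
  double r[2];

  blendv_pd_model (a, b, mask, r);
  assert (r[0] == 10.0);	/* -0.0 has its sign bit set.  */
  assert (r[1] == 2.0);		/* +0.0 does not.  */
}
```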

* Re: [PATCH v2 3/6] rs6000: Add support for SSE4.1 "ceil" intrinsics
  2021-07-16 13:50 ` [PATCH v2 3/6] rs6000: Add support for SSE4.1 "ceil" intrinsics Paul A. Clarke
@ 2021-07-16 18:20   ` Bill Schmidt
  2021-07-28 22:01   ` Segher Boessenkool
  1 sibling, 0 replies; 20+ messages in thread
From: Bill Schmidt @ 2021-07-16 18:20 UTC (permalink / raw)
  To: Paul A. Clarke, gcc-patches; +Cc: segher

Hi Paul,

Thanks for the cleanup, LGTM!  Recommend maintainers approve.

Bill

On 7/16/21 8:50 AM, Paul A. Clarke wrote:
> 2021-07-16  Paul A. Clarke  <pc@us.ibm.com>
>
> gcc
> 	* config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps,
> 	_mm_ceil_sd, _mm_ceil_ss): New.
> ---
> v2: Improve formatting per review from Bill.
>
>   gcc/config/rs6000/smmintrin.h | 32 ++++++++++++++++++++++++++++++++
>   1 file changed, 32 insertions(+)
>
> diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
> index 69e54702a877..cad770a67631 100644
> --- a/gcc/config/rs6000/smmintrin.h
> +++ b/gcc/config/rs6000/smmintrin.h
> @@ -232,6 +232,38 @@ _mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
>     return any_ones * any_zeros;
>   }
>   
> +__inline __m128d
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_ceil_pd (__m128d __A)
> +{
> +  return (__m128d) vec_ceil ((__v2df) __A);
> +}
> +
> +__inline __m128
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_ceil_ps (__m128 __A)
> +{
> +  return (__m128) vec_ceil ((__v4sf) __A);
> +}
> +
> +__inline __m128d
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_ceil_sd (__m128d __A, __m128d __B)
> +{
> +  __v2df r = vec_ceil ((__v2df) __B);
> +  r[1] = ((__v2df) __A)[1];
> +  return (__m128d) r;
> +}
> +
> +__inline __m128
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_ceil_ss (__m128 __A, __m128 __B)
> +{
> +  __v4sf r = (__v4sf) __A;
> +  r[0] = __builtin_ceil (((__v4sf) __B)[0]);
> +  return r;
> +}
> +
>   /* Return horizontal packed word minimum and its index in bits [15:0]
>      and bits [18:16] respectively.  */
>   __inline __m128i
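
To illustrate the element handling in _mm_ceil_sd above (element 0 is
the ceiling of __B[0], element 1 is passed through from __A[1]), here is
a small scalar model.  It is a sketch under stated assumptions, not the
patch's method: ceil_model, ceil_sd_model and check_edges are
hypothetical names, the ceiling is computed by integer truncation rather
than by vec_ceil or libm, it is valid only for |x| < 2^53, and it
returns +0.0 where a true ceiling would return -0.0.  The sample values
sit near 2^51 and 2^52, where the spacing between doubles becomes 0.5
and then 1.0.

```c
#include <assert.h>

/* Hypothetical scalar ceiling for |x| < 2^53.  The cast truncates
   toward zero, so only positive non-integers need the +1 fix-up.
   Unlike ceil (), the result for inputs in (-1, 0) is +0.0.  */
static double
ceil_model (double x)
{
  long long t = (long long) x;
  double d = (double) t;	/* Exact, since |t| < 2^53.  */

  return (d < x) ? d + 1.0 : d;
}

/* Model of the two-operand form: round b[0], pass a[1] through.  */
static void
ceil_sd_model (const double a[2], const double b[2], double r[2])
{
  r[0] = ceil_model (b[0]);
  r[1] = a[1];
}

/* A few values at the edges where fractions stop being representable.  */
static void
check_edges (void)
{
  assert (ceil_model (0.25) == 1.0);
  assert (ceil_model (0x1.ffffffffffffdp+50) == 0x1.0p+51);
  assert (ceil_model (0x1.0000000000001p+51) == 0x1.0000000000002p+51);
  assert (ceil_model (-0x1.fffffffffffffp+51) == -0x1.ffffffffffffep+51);
}
```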

* Re: [PATCH v2 4/6] rs6000: Add tests for SSE4.1 "ceil" intrinsics
  2021-07-16 13:50 ` [PATCH v2 4/6] rs6000: Add tests " Paul A. Clarke
@ 2021-07-16 18:22   ` Bill Schmidt
  2021-07-28 22:16   ` Segher Boessenkool
  1 sibling, 0 replies; 20+ messages in thread
From: Bill Schmidt @ 2021-07-16 18:22 UTC (permalink / raw)
  To: Paul A. Clarke, gcc-patches; +Cc: segher

Hi Paul,

Thanks for the cleanup, LGTM!  Recommend maintainers approve.

Bill

On 7/16/21 8:50 AM, Paul A. Clarke wrote:
> Add the tests for _mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd, _mm_ceil_ss.
>
> Copy a test for _mm_ceil_pd and _mm_ceil_ps from
> gcc/testsuite/gcc.target/i386.
>
> Define __VSX_SSE2__ to pick up some union definitions in
> m128-check.h.
>
> 2021-07-16  Paul A. Clarke  <pc@us.ibm.com>
>
> gcc/testsuite
> 	* gcc.target/powerpc/sse4_1-ceilpd.c: New.
> 	* gcc.target/powerpc/sse4_1-ceilps.c: New.
> 	* gcc.target/powerpc/sse4_1-ceilsd.c: New.
> 	* gcc.target/powerpc/sse4_1-ceilss.c: New.
> 	* gcc.target/powerpc/sse4_1-round-data.h: New.
> 	* gcc.target/powerpc/sse4_1-round.h: New.
> 	* gcc.target/powerpc/sse4_1-round2.h: New.
> 	* gcc.target/powerpc/sse4_1-roundpd-3.c: Copy from gcc.target/i386.
> 	* gcc.target/powerpc/sse4_1-check.h (__VSX_SSE2__): Define.
> ---
> v2: Improve formatting per review from Bill.
>
>   .../gcc.target/powerpc/sse4_1-ceilpd.c        |  51 ++++++++
>   .../gcc.target/powerpc/sse4_1-ceilps.c        |  41 ++++++
>   .../gcc.target/powerpc/sse4_1-ceilsd.c        | 119 ++++++++++++++++++
>   .../gcc.target/powerpc/sse4_1-ceilss.c        |  95 ++++++++++++++
>   .../gcc.target/powerpc/sse4_1-check.h         |   4 +
>   .../gcc.target/powerpc/sse4_1-round-data.h    |  20 +++
>   .../gcc.target/powerpc/sse4_1-round.h         |  27 ++++
>   .../gcc.target/powerpc/sse4_1-round2.h        |  27 ++++
>   .../gcc.target/powerpc/sse4_1-roundpd-3.c     |  36 ++++++
>   9 files changed, 420 insertions(+)
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilsd.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-ceilss.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round-data.h
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round.h
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round2.h
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-3.c
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c
> new file mode 100644
> index 000000000000..f532fdb9c285
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilpd.c
> @@ -0,0 +1,51 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target p8vector_hw } */
> +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> +
> +#define NO_WARN_X86_INTRINSICS 1
> +#include <smmintrin.h>
> +
> +#define VEC_T __m128d
> +#define FP_T double
> +
> +#define ROUND_INTRIN(x, mode) _mm_ceil_pd (x)
> +
> +#include "sse4_1-round-data.h"
> +
> +static struct data data[] = {
> +  { .value = { .f = {  0.00,  0.25 } }, .answer = {  0.0,  1.0 } },
> +  { .value = { .f = {  0.50,  0.75 } }, .answer = {  1.0,  1.0 } },
> +
> +  { { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffdp+50 } },
> +           {  0x1.ffffffffffffcp+50,  0x1.0000000000000p+51 } },
> +  { { .f = {  0x1.ffffffffffffep+50,  0x1.fffffffffffffp+50 } },
> +           {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
> +  { { .f = {  0x1.0000000000000p+51,  0x1.0000000000001p+51 } },
> +           {  0x1.0000000000000p+51,  0x1.0000000000002p+51 } },
> +  { { .f = {  0x1.0000000000002p+51,  0x1.0000000000003p+51 } },
> +           {  0x1.0000000000002p+51,  0x1.0000000000004p+51 } },
> +
> +  { { .f = {  0x1.ffffffffffffep+51,  0x1.fffffffffffffp+51 } },
> +           {  0x1.ffffffffffffep+51,  0x1.0000000000000p+52 } },
> +  { { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
> +           {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
> +
> +  { { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
> +           { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
> +  { { .f = { -0x1.fffffffffffffp+51, -0x1.ffffffffffffep+51 } },
> +           { -0x1.ffffffffffffep+51, -0x1.ffffffffffffep+51 } },
> +
> +  { { .f = { -0x1.0000000000003p+51, -0x1.0000000000002p+51 } },
> +           { -0x1.0000000000002p+51, -0x1.0000000000002p+51 } },
> +  { { .f = { -0x1.0000000000001p+51, -0x1.0000000000000p+51 } },
> +           { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } },
> +  { { .f = { -0x1.fffffffffffffp+50, -0x1.ffffffffffffep+50 } },
> +           { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
> +  { { .f = { -0x1.ffffffffffffdp+50, -0x1.ffffffffffffcp+50 } },
> +           { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
> +
> +  { { .f = { -1.00, -0.75 } }, { -1.0,  0.0 } },
> +  { { .f = { -0.50, -0.25 } }, {  0.0,  0.0 } }
> +};
> +
> +#include "sse4_1-round.h"
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c
> new file mode 100644
> index 000000000000..1e29999a57d8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilps.c
> @@ -0,0 +1,41 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target p8vector_hw } */
> +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> +
> +#define NO_WARN_X86_INTRINSICS 1
> +#include <smmintrin.h>
> +
> +#define VEC_T __m128
> +#define FP_T float
> +
> +#define ROUND_INTRIN(x, mode) _mm_ceil_ps (x)
> +
> +#include "sse4_1-round-data.h"
> +
> +static struct data data[] = {
> +  { { .f = {  0.00,  0.25,  0.50,  0.75 } }, {  0.0,  1.0,  1.0,  1.0 } },
> +
> +  { { .f = {  0x1.fffff8p+21,  0x1.fffffap+21,
> +	      0x1.fffffcp+21,  0x1.fffffep+21 } },
> +           {  0x1.fffff8p+21,  0x1.000000p+22,
> +	      0x1.000000p+22,  0x1.000000p+22 } },
> +
> +  { { .f = {  0x1.fffffap+22,  0x1.fffffcp+22,
> +	      0x1.fffffep+22,  0x1.fffffep+23 } },
> +           {  0x1.fffffcp+22,  0x1.fffffcp+22,
> +	      0x1.000000p+23,  0x1.fffffep+23 } },
> +
> +  { { .f = { -0x1.fffffep+23, -0x1.fffffep+22,
> +	     -0x1.fffffcp+22, -0x1.fffffap+22 } },
> +           { -0x1.fffffep+23, -0x1.fffffcp+22,
> +	     -0x1.fffffcp+22, -0x1.fffff8p+22 } },
> +
> +  { { .f = { -0x1.fffffep+21, -0x1.fffffcp+21,
> +	     -0x1.fffffap+21, -0x1.fffff8p+21 } },
> +           { -0x1.fffff8p+21, -0x1.fffff8p+21,
> +	     -0x1.fffff8p+21, -0x1.fffff8p+21 } },
> +
> +  { { .f = { -1.00, -0.75, -0.50, -0.25 } }, { -1.0,  0.0,  0.0,  0.0 } }
> +};
> +
> +#include "sse4_1-round.h"
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilsd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilsd.c
> new file mode 100644
> index 000000000000..cc0d9c1d0afe
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilsd.c
> @@ -0,0 +1,119 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target p8vector_hw } */
> +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> +
> +#define NO_WARN_X86_INTRINSICS 1
> +#include <smmintrin.h>
> +
> +#define VEC_T __m128d
> +#define FP_T double
> +
> +#define ROUND_INTRIN(x, y) _mm_ceil_sd (x, y)
> +
> +#include "sse4_1-round-data.h"
> +
> +static struct data2 data[] = {
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0.00, IGNORED } },
> +    .answer = {  0.0, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0.25, IGNORED } },
> +    .answer = {  1.0, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0.50, IGNORED } },
> +    .answer = {  1.0, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0.75, IGNORED } },
> +    .answer = {  1.0, PASSTHROUGH } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.ffffffffffffcp+50, IGNORED } },
> +    .answer = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.ffffffffffffdp+50, IGNORED } },
> +    .answer = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.ffffffffffffep+50, IGNORED } },
> +    .answer = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffffffffffp+50, IGNORED } },
> +    .answer = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.0000000000000p+51, IGNORED } },
> +    .answer = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.0000000000001p+51, IGNORED } },
> +    .answer = {  0x1.0000000000002p+51, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.0000000000002p+51, IGNORED } },
> +    .answer = {  0x1.0000000000002p+51, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.0000000000003p+51, IGNORED } },
> +    .answer = {  0x1.0000000000004p+51, PASSTHROUGH } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.ffffffffffffep+51, IGNORED } },
> +    .answer = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffffffffffp+51, IGNORED } },
> +    .answer = {  0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.0000000000000p+52, IGNORED } },
> +    .answer = {  0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.0000000000001p+52, IGNORED } },
> +    .answer = {  0x1.0000000000001p+52, PASSTHROUGH } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.0000000000001p+52, IGNORED } },
> +    .answer = { -0x1.0000000000001p+52, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.0000000000000p+52, IGNORED } },
> +    .answer = { -0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffffffffffp+51, IGNORED } },
> +    .answer = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.ffffffffffffep+51, IGNORED } },
> +    .answer = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.0000000000003p+51, IGNORED } },
> +    .answer = { -0x1.0000000000002p+51, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.0000000000002p+51, IGNORED } },
> +    .answer = { -0x1.0000000000002p+51, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.0000000000001p+51, IGNORED } },
> +    .answer = { -0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.0000000000000p+51, IGNORED } },
> +    .answer = { -0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffffffffffp+50, IGNORED } },
> +    .answer = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.ffffffffffffep+50, IGNORED } },
> +    .answer = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.ffffffffffffdp+50, IGNORED } },
> +    .answer = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.ffffffffffffcp+50, IGNORED } },
> +    .answer = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -1.00, IGNORED } },
> +    .answer = { -1.0, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0.75, IGNORED } },
> +    .answer = { -0.0, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0.50, IGNORED } },
> +    .answer = { -0.0, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0.25, IGNORED } },
> +    .answer = { -0.0, PASSTHROUGH } }
> +};
> +
> +#include "sse4_1-round2.h"
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilss.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilss.c
> new file mode 100644
> index 000000000000..cf1a0392990e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-ceilss.c
> @@ -0,0 +1,95 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target p8vector_hw } */
> +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> +
> +#define NO_WARN_X86_INTRINSICS 1
> +#include <smmintrin.h>
> +
> +#define VEC_T __m128
> +#define FP_T float
> +
> +#define ROUND_INTRIN(x, y) _mm_ceil_ss (x, y)
> +
> +#include "sse4_1-round-data.h"
> +
> +static struct data2 data[] = {
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0.00,  IGNORED, IGNORED, IGNORED } },
> +    .answer = {  0.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0.25, IGNORED, IGNORED, IGNORED } },
> +    .answer = {  1.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0.50, IGNORED, IGNORED, IGNORED } },
> +    .answer = {  1.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0.75, IGNORED, IGNORED, IGNORED } },
> +    .answer = {  1.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffff8p+21, IGNORED, IGNORED, IGNORED } },
> +    .answer = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffap+21, IGNORED, IGNORED, IGNORED } },
> +    .answer = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffcp+21, IGNORED, IGNORED, IGNORED } },
> +    .answer = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffep+21, IGNORED, IGNORED, IGNORED } },
> +    .answer = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffap+22, IGNORED, IGNORED, IGNORED } },
> +    .answer = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffcp+22, IGNORED, IGNORED, IGNORED } },
> +    .answer = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffep+22, IGNORED, IGNORED, IGNORED } },
> +    .answer = {  0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffep+23, IGNORED, IGNORED, IGNORED } },
> +    .answer = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffep+23, IGNORED, IGNORED, IGNORED } },
> +    .answer = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffep+22, IGNORED, IGNORED, IGNORED } },
> +    .answer = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffcp+22, IGNORED, IGNORED, IGNORED } },
> +    .answer = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffap+22, IGNORED, IGNORED, IGNORED } },
> +    .answer = { -0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffep+21, IGNORED, IGNORED, IGNORED } },
> +    .answer = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffcp+21, IGNORED, IGNORED, IGNORED } },
> +    .answer = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffap+21, IGNORED, IGNORED, IGNORED } },
> +    .answer = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffff8p+21, IGNORED, IGNORED, IGNORED } },
> +    .answer = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -1.00, IGNORED, IGNORED, IGNORED } },
> +    .answer = { -1.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0.75, IGNORED, IGNORED, IGNORED } },
> +    .answer = {  0.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0.50, IGNORED, IGNORED, IGNORED } },
> +    .answer = {  0.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0.25, IGNORED, IGNORED, IGNORED } },
> +    .answer = {  0.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
> +};
> +
> +#include "sse4_1-round2.h"
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-check.h b/gcc/testsuite/gcc.target/powerpc/sse4_1-check.h
> index 5f855b9fd53a..16330533e50a 100644
> --- a/gcc/testsuite/gcc.target/powerpc/sse4_1-check.h
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-check.h
> @@ -1,6 +1,10 @@
>   #include <stdio.h>
>   #include <stdlib.h>
>   
> +/* Define this to enable the combination of VSX vector double and
> +   SSE2 data types.  */
> +#define __VSX_SSE2__ 1
> +
>   #include "m128-check.h"
>   
>   //#define DEBUG 1
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-round-data.h b/gcc/testsuite/gcc.target/powerpc/sse4_1-round-data.h
> new file mode 100644
> index 000000000000..543f5bc2181b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-round-data.h
> @@ -0,0 +1,20 @@
> +/* Pick a few numbers at random which are not in the input data and
> +   unlikely to show up naturally.  */
> +#define PASSTHROUGH -29.5
> +#define IGNORED -61.5
> +
> +union value {
> +  VEC_T x;
> +  FP_T f[sizeof (VEC_T) / sizeof (FP_T)];
> +};
> +
> +struct data {
> +  union value value;
> +  double answer[sizeof (VEC_T) / sizeof (FP_T)];
> +};
> +
> +struct data2 {
> +  union value value1;
> +  union value value2;
> +  double answer[sizeof (VEC_T) / sizeof (FP_T)];
> +};
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-round.h b/gcc/testsuite/gcc.target/powerpc/sse4_1-round.h
> new file mode 100644
> index 000000000000..6acf8da8b766
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-round.h
> @@ -0,0 +1,27 @@
> +#include <fenv.h>
> +#include <smmintrin.h>
> +#include "sse4_1-check.h"
> +
> +#define DIM(a) (sizeof (a) / sizeof ((a)[0]))
> +
> +static int modes[] = { FE_TONEAREST, FE_UPWARD, FE_DOWNWARD, FE_TOWARDZERO };
> +
> +static void
> +TEST (void)
> +{
> +  int i, j, ri, round_save;
> +
> +  round_save = fegetround ();
> +  for (ri = 0; ri < DIM (modes); ri++) {
> +    (void) fesetround (modes[ri]);
> +    for (i = 0; i < DIM (data); i++) {
> +      union value guess;
> +      guess.x = ROUND_INTRIN (data[i].value.x, /* Ignored.  */);
> +      for (j = 0; j < DIM (data[i].value.f); j++) {
> +        if (guess.f[j] != data[i].answer[j])
> +          abort ();
> +      }
> +    }
> +  }
> +  (void) fesetround (round_save);
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-round2.h b/gcc/testsuite/gcc.target/powerpc/sse4_1-round2.h
> new file mode 100644
> index 000000000000..859574e11d9a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-round2.h
> @@ -0,0 +1,27 @@
> +#include <fenv.h>
> +#include <smmintrin.h>
> +#include "sse4_1-check.h"
> +
> +#define DIM(a) (sizeof (a) / sizeof ((a)[0]))
> +
> +static int modes[] = { FE_TONEAREST, FE_UPWARD, FE_DOWNWARD, FE_TOWARDZERO };
> +
> +static void
> +TEST (void)
> +{
> +  int i, j, ri, round_save;
> +
> +  round_save = fegetround ();
> +  for (ri = 0; ri < DIM (modes); ri++) {
> +    (void) fesetround (modes[ri]);
> +    for (i = 0; i < DIM (data); i++) {
> +      union value guess;
> +      guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x);
> +      for (j = 0; j < DIM (data[i].value1.f); j++) {
> +        if (guess.f[j] != data[i].answer[j])
> +          abort ();
> +      }
> +    }
> +  }
> +  (void) fesetround (round_save);
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-3.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-3.c
> new file mode 100644
> index 000000000000..88a5f0718ebb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-3.c
> @@ -0,0 +1,36 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target p8vector_hw } */
> +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> +
> +#ifndef CHECK_H
> +#define CHECK_H "sse4_1-check.h"
> +#endif
> +
> +#ifndef TEST
> +#define TEST sse4_1_test
> +#endif
> +
> +#include CHECK_H
> +
> +#include <smmintrin.h>
> +
> +static void
> +TEST (void)
> +{
> +  union128d u, s;
> +  double e[2] = {0.0};
> +  int i;
> +
> +  s.x = _mm_set_pd (1.1234, -2.3478);
> +  u.x = _mm_ceil_pd (s.x);
> +
> +  for (i = 0; i < 2; i++)
> +    {
> +      __m128d tmp = _mm_load_sd (&s.a[i]);
> +      tmp = _mm_ceil_sd (tmp, tmp);
> +      _mm_store_sd (&e[i], tmp);
> +    }
> +
> +  if (check_union128d (u, e))
> +    abort ();
> +}

* Re: [PATCH v2 5/6] rs6000: Add support for SSE4.1 "floor" intrinsics
  2021-07-16 13:50 ` [PATCH v2 5/6] rs6000: Add support for SSE4.1 "floor" intrinsics Paul A. Clarke
@ 2021-07-16 18:30   ` Bill Schmidt
  2021-07-28 22:25   ` Segher Boessenkool
  1 sibling, 0 replies; 20+ messages in thread
From: Bill Schmidt @ 2021-07-16 18:30 UTC (permalink / raw)
  To: Paul A. Clarke, gcc-patches; +Cc: segher

Hi Paul,

LGTM!  Recommend maintainers approve.

Bill

On 7/16/21 8:50 AM, Paul A. Clarke wrote:
> 2021-07-16  Paul A. Clarke  <pc@us.ibm.com>
>
> gcc
> 	* config/rs6000/smmintrin.h (_mm_floor_pd, _mm_floor_ps,
> 	_mm_floor_sd, _mm_floor_ss): New.
> ---
> v2: Improve formatting per review from Bill.
>
>   gcc/config/rs6000/smmintrin.h | 32 ++++++++++++++++++++++++++++++++
>   1 file changed, 32 insertions(+)
>
> diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
> index cad770a67631..5960991e0af7 100644
> --- a/gcc/config/rs6000/smmintrin.h
> +++ b/gcc/config/rs6000/smmintrin.h
> @@ -264,6 +264,38 @@ _mm_ceil_ss (__m128 __A, __m128 __B)
>     return r;
>   }
>   
> +__inline __m128d
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_floor_pd (__m128d __A)
> +{
> +  return (__m128d) vec_floor ((__v2df) __A);
> +}
> +
> +__inline __m128
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_floor_ps (__m128 __A)
> +{
> +  return (__m128) vec_floor ((__v4sf) __A);
> +}
> +
> +__inline __m128d
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_floor_sd (__m128d __A, __m128d __B)
> +{
> +  __v2df r = vec_floor ((__v2df) __B);
> +  r[1] = ((__v2df) __A)[1];
> +  return (__m128d) r;
> +}
> +
> +__inline __m128
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_floor_ss (__m128 __A, __m128 __B)
> +{
> +  __v4sf r = (__v4sf) __A;
> +  r[0] = __builtin_floor (((__v4sf) __B)[0]);
> +  return r;
> +}
> +
>   /* Return horizontal packed word minimum and its index in bits [15:0]
>      and bits [18:16] respectively.  */
>   __inline __m128i

* Re: [PATCH v2 6/6] rs6000: Add tests for SSE4.1 "floor" intrinsics
  2021-07-16 13:50 ` [PATCH v2 6/6] rs6000: Add tests " Paul A. Clarke
@ 2021-07-16 18:31   ` Bill Schmidt
  2021-07-28 22:26   ` Segher Boessenkool
  1 sibling, 0 replies; 20+ messages in thread
From: Bill Schmidt @ 2021-07-16 18:31 UTC (permalink / raw)
  To: Paul A. Clarke, gcc-patches; +Cc: segher

Hi Paul,

LGTM!  Recommend maintainers approve.

Bill

On 7/16/21 8:50 AM, Paul A. Clarke wrote:
> Add the tests for _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss.
> These are modelled after (and depend upon parts of) the tests for
> _mm_ceil intrinsics, recently posted.
>
> Copy a test for _mm_floor_sd from gcc/testsuite/gcc.target/i386.
>
> 2021-07-16  Paul A. Clarke  <pc@us.ibm.com>
>
> gcc/testsuite
> 	* gcc.target/powerpc/sse4_1-floorpd.c: New.
> 	* gcc.target/powerpc/sse4_1-floorps.c: New.
> 	* gcc.target/powerpc/sse4_1-floorsd.c: New.
> 	* gcc.target/powerpc/sse4_1-floorss.c: New.
> 	* gcc.target/powerpc/sse4_1-roundpd-2.c: Copy from
> 	gcc/testsuite/gcc.target/i386.
> ---
> v2: Improve formatting per review from Bill.
>
>   .../gcc.target/powerpc/sse4_1-floorpd.c       |  51 ++++++++
>   .../gcc.target/powerpc/sse4_1-floorps.c       |  41 ++++++
>   .../gcc.target/powerpc/sse4_1-floorsd.c       | 119 ++++++++++++++++++
>   .../gcc.target/powerpc/sse4_1-floorss.c       |  95 ++++++++++++++
>   .../gcc.target/powerpc/sse4_1-roundpd-2.c     |  36 ++++++
>   5 files changed, 342 insertions(+)
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorsd.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-floorss.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-2.c
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c
> new file mode 100644
> index 000000000000..ad21644f50c4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorpd.c
> @@ -0,0 +1,51 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target p8vector_hw } */
> +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> +
> +#define NO_WARN_X86_INTRINSICS 1
> +#include <smmintrin.h>
> +
> +#define VEC_T __m128d
> +#define FP_T double
> +
> +#define ROUND_INTRIN(x, mode) _mm_floor_pd (x)
> +
> +#include "sse4_1-round-data.h"
> +
> +static struct data data[] = {
> +  { .value = { .f = {  0.00,  0.25 } }, .answer = {  0.0,  0.0 } },
> +  { .value = { .f = {  0.50,  0.75 } }, .answer = {  0.0,  0.0 } },
> +
> +  { { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffdp+50 } },
> +           {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
> +  { { .f = {  0x1.ffffffffffffep+50,  0x1.0000000000000p+51 } },
> +           {  0x1.ffffffffffffcp+50,  0x1.0000000000000p+51 } },
> +  { { .f = {  0x1.0000000000000p+51,  0x1.0000000000001p+51 } },
> +           {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
> +  { { .f = {  0x1.0000000000002p+51,  0x1.0000000000003p+51 } },
> +           {  0x1.0000000000002p+51,  0x1.0000000000002p+51 } },
> +
> +  { { .f = {  0x1.ffffffffffffep+51,  0x1.fffffffffffffp+51 } },
> +           {  0x1.ffffffffffffep+51,  0x1.ffffffffffffep+51 } },
> +  { { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
> +           {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
> +
> +  { { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
> +           { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
> +  { { .f = { -0x1.fffffffffffffp+51, -0x1.ffffffffffffep+52 } },
> +           { -0x1.0000000000000p+52, -0x1.ffffffffffffep+52 } },
> +
> +  { { .f = { -0x1.0000000000003p+51, -0x1.0000000000002p+51 } },
> +           { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
> +  { { .f = { -0x1.0000000000001p+51, -0x1.0000000000000p+51 } },
> +           { -0x1.0000000000002p+51, -0x1.0000000000000p+51 } },
> +  { { .f = { -0x1.fffffffffffffp+50, -0x1.ffffffffffffep+50 } },
> +           { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } },
> +  { { .f = { -0x1.ffffffffffffdp+50, -0x1.ffffffffffffcp+50 } },
> +           { -0x1.0000000000000p+51, -0x1.ffffffffffffcp+50 } },
> +
> +  { { .f = { -1.00, -0.75 } }, { -1.0, -1.0 } },
> +  { { .f = { -0.50, -0.25 } }, { -1.0, -1.0 } }
> +};
> +
> +#include "sse4_1-round.h"
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c
> new file mode 100644
> index 000000000000..a53ef9aa9e8b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorps.c
> @@ -0,0 +1,41 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target p8vector_hw } */
> +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> +
> +#define NO_WARN_X86_INTRINSICS 1
> +#include <smmintrin.h>
> +
> +#define VEC_T __m128
> +#define FP_T float
> +
> +#define ROUND_INTRIN(x, mode) _mm_floor_ps (x)
> +
> +#include "sse4_1-round-data.h"
> +
> +static struct data data[] = {
> +  { { .f = {  0.00,  0.25,  0.50,  0.75 } }, {  0.0,  0.0,  0.0,  0.0 } },
> +
> +  { { .f = {  0x1.fffff8p+21,  0x1.fffffap+21,
> +	      0x1.fffffcp+21,  0x1.fffffep+21 } },
> +           {  0x1.fffff8p+21,  0x1.fffff8p+21,
> +	      0x1.fffff8p+21,  0x1.fffff8p+21 } },
> +
> +  { { .f = {  0x1.fffffap+22,  0x1.fffffcp+22,
> +	      0x1.fffffep+22,  0x1.fffffep+23 } },
> +           {  0x1.fffff8p+22,  0x1.fffffcp+22,
> +	      0x1.fffffcp+22,  0x1.fffffep+23 } },
> +
> +  { { .f = { -0x1.fffffep+23, -0x1.fffffep+22,
> +	     -0x1.fffffcp+22, -0x1.fffffap+22 } },
> +           { -0x1.fffffep+23, -0x1.000000p+23,
> +	     -0x1.fffffcp+22, -0x1.fffffcp+22 } },
> +
> +  { { .f = { -0x1.fffffep+21, -0x1.fffffcp+21,
> +	     -0x1.fffffap+21, -0x1.fffff8p+21 } },
> +           { -0x1.000000p+22, -0x1.000000p+22,
> +	     -0x1.000000p+22, -0x1.fffff8p+21 } },
> +
> +  { { .f = { -1.00, -0.75, -0.50, -0.25 } }, { -1.0, -1.0, -1.0, -1.0 } }
> +};
> +
> +#include "sse4_1-round.h"
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-floorsd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorsd.c
> new file mode 100644
> index 000000000000..e4ebc550556f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorsd.c
> @@ -0,0 +1,119 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target p8vector_hw } */
> +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> +
> +#define NO_WARN_X86_INTRINSICS 1
> +#include <smmintrin.h>
> +
> +#define VEC_T __m128d
> +#define FP_T double
> +
> +#define ROUND_INTRIN(x, y) _mm_floor_sd (x, y)
> +
> +#include "sse4_1-round-data.h"
> +
> +static struct data2 data[] = {
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0.00, IGNORED } },
> +    .answer = {  0.0, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0.25, IGNORED } },
> +    .answer = {  0.0, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0.50, IGNORED } },
> +    .answer = {  0.0, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0.75, IGNORED } },
> +    .answer = {  0.0, PASSTHROUGH } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.ffffffffffffcp+50, IGNORED } },
> +    .answer = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.ffffffffffffdp+50, IGNORED } },
> +    .answer = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.ffffffffffffep+50, IGNORED } },
> +    .answer = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffffffffffp+50, IGNORED } },
> +    .answer = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.0000000000000p+51, IGNORED } },
> +    .answer = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.0000000000001p+51, IGNORED } },
> +    .answer = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.0000000000002p+51, IGNORED } },
> +    .answer = {  0x1.0000000000002p+51, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.0000000000003p+51, IGNORED } },
> +    .answer = {  0x1.0000000000002p+51, PASSTHROUGH } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.ffffffffffffep+51, IGNORED } },
> +    .answer = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffffffffffp+51, IGNORED } },
> +    .answer = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.0000000000000p+52, IGNORED } },
> +    .answer = {  0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.0000000000001p+52, IGNORED } },
> +    .answer = {  0x1.0000000000001p+52, PASSTHROUGH } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.0000000000001p+52, IGNORED } },
> +    .answer = { -0x1.0000000000001p+52, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.0000000000000p+52, IGNORED } },
> +    .answer = { -0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffffffffffp+51, IGNORED } },
> +    .answer = { -0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.ffffffffffffep+51, IGNORED } },
> +    .answer = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.0000000000003p+51, IGNORED } },
> +    .answer = { -0x1.0000000000004p+51, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.0000000000002p+51, IGNORED } },
> +    .answer = { -0x1.0000000000002p+51, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.0000000000001p+51, IGNORED } },
> +    .answer = { -0x1.0000000000002p+51, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.0000000000000p+51, IGNORED } },
> +    .answer = { -0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffffffffffp+50, IGNORED } },
> +    .answer = { -0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.ffffffffffffep+50, IGNORED } },
> +    .answer = { -0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.ffffffffffffdp+50, IGNORED } },
> +    .answer = { -0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.ffffffffffffcp+50, IGNORED } },
> +    .answer = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -1.00, IGNORED } },
> +    .answer = { -1.0, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0.75, IGNORED } },
> +    .answer = { -1.0, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0.50, IGNORED } },
> +    .answer = { -1.0, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0.25, IGNORED } },
> +    .answer = { -1.0, PASSTHROUGH } }
> +};
> +
> +#include "sse4_1-round2.h"
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-floorss.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorss.c
> new file mode 100644
> index 000000000000..cfbfe2b1eba7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-floorss.c
> @@ -0,0 +1,95 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target p8vector_hw } */
> +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> +
> +#define NO_WARN_X86_INTRINSICS 1
> +#include <smmintrin.h>
> +
> +#define VEC_T __m128
> +#define FP_T float
> +
> +#define ROUND_INTRIN(x, y) _mm_floor_ss (x, y)
> +
> +#include "sse4_1-round-data.h"
> +
> +static struct data2 data[] = {
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0.00,  IGNORED, IGNORED, IGNORED } },
> +    .answer = {  0.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0.25, IGNORED, IGNORED, IGNORED } },
> +    .answer = {  0.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0.50, IGNORED, IGNORED, IGNORED } },
> +    .answer = {  0.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0.75, IGNORED, IGNORED, IGNORED } },
> +    .answer = {  0.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffff8p+21, IGNORED, IGNORED, IGNORED } },
> +    .answer = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffap+21, IGNORED, IGNORED, IGNORED } },
> +    .answer = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffcp+21, IGNORED, IGNORED, IGNORED } },
> +    .answer = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffep+21, IGNORED, IGNORED, IGNORED } },
> +    .answer = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffap+22, IGNORED, IGNORED, IGNORED } },
> +    .answer = {  0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffcp+22, IGNORED, IGNORED, IGNORED } },
> +    .answer = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffep+22, IGNORED, IGNORED, IGNORED } },
> +    .answer = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffep+23, IGNORED, IGNORED, IGNORED } },
> +    .answer = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffep+23, IGNORED, IGNORED, IGNORED } },
> +    .answer = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffep+22, IGNORED, IGNORED, IGNORED } },
> +    .answer = { -0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffcp+22, IGNORED, IGNORED, IGNORED } },
> +    .answer = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffap+22, IGNORED, IGNORED, IGNORED } },
> +    .answer = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffep+21, IGNORED, IGNORED, IGNORED } },
> +    .answer = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffcp+21, IGNORED, IGNORED, IGNORED } },
> +    .answer = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffap+21, IGNORED, IGNORED, IGNORED } },
> +    .answer = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffff8p+21, IGNORED, IGNORED, IGNORED } },
> +    .answer = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -1.00, IGNORED, IGNORED, IGNORED } },
> +    .answer = { -1.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0.75, IGNORED, IGNORED, IGNORED } },
> +    .answer = { -1.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0.50, IGNORED, IGNORED, IGNORED } },
> +    .answer = { -1.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0.25, IGNORED, IGNORED, IGNORED } },
> +    .answer = { -1.0, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
> +};
> +
> +#include "sse4_1-round2.h"
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-2.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-2.c
> new file mode 100644
> index 000000000000..cec16175473f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd-2.c
> @@ -0,0 +1,36 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target p8vector_hw } */
> +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> +
> +#ifndef CHECK_H
> +#define CHECK_H "sse4_1-check.h"
> +#endif
> +
> +#ifndef TEST
> +#define TEST sse4_1_test
> +#endif
> +
> +#include CHECK_H
> +
> +#include <smmintrin.h>
> +
> +static void
> +TEST (void)
> +{
> +  union128d u, s;
> +  double e[2] = {0.0};
> +  int i;
> +
> +  s.x = _mm_set_pd (1.1234, -2.3478);
> +  u.x = _mm_floor_pd (s.x);
> +
> +  for (i = 0; i < 2; i++)
> +    {
> +      __m128d tmp = _mm_load_sd (&s.a[i]);
> +      tmp = _mm_floor_sd (tmp, tmp);
> +      _mm_store_sd (&e[i], tmp);
> +    }
> +
> +  if (check_union128d (u, e))
> +    abort ();
> +}

* Re: [PATCH v2 1/6] rs6000: Add support for SSE4.1 "blend" intrinsics
  2021-07-16 13:50 ` [PATCH v2 1/6] rs6000: Add support for SSE4.1 "blend" intrinsics Paul A. Clarke
  2021-07-16 18:13   ` Bill Schmidt
@ 2021-07-28 21:30   ` Segher Boessenkool
  1 sibling, 0 replies; 20+ messages in thread
From: Segher Boessenkool @ 2021-07-28 21:30 UTC (permalink / raw)
  To: Paul A. Clarke; +Cc: gcc-patches, wschmidt

Hi!

On Fri, Jul 16, 2021 at 08:50:17AM -0500, Paul A. Clarke wrote:
> _mm_blend_epi16 and _mm_blendv_epi8 were added earlier.
> Add these four to complete the set.
> 
> 2021-07-16  Paul A. Clarke  <pc@us.ibm.com>
> 
> gcc
> 	* config/rs6000/smmintrin.h (_mm_blend_pd, _mm_blendv_pd,
> 	_mm_blend_ps, _mm_blendv_ps): New.

I'm not sure if this is allowed like this in changelogs?  In either case
it is more obvious / aesthetically pleasing / etc. to write "gcc/".  But
also, it is fine to leave out this one, it being the default :-)

The patch is fine for trunk.  Thank you!


Segher

* Re: [PATCH v2 2/6] rs6000: Add tests for SSE4.1 "blend" intrinsics
  2021-07-16 13:50 ` [PATCH v2 2/6] rs6000: Add tests " Paul A. Clarke
  2021-07-16 18:16   ` Bill Schmidt
@ 2021-07-28 21:51   ` Segher Boessenkool
  1 sibling, 0 replies; 20+ messages in thread
From: Segher Boessenkool @ 2021-07-28 21:51 UTC (permalink / raw)
  To: Paul A. Clarke; +Cc: gcc-patches, wschmidt

Hi!

On Fri, Jul 16, 2021 at 08:50:18AM -0500, Paul A. Clarke wrote:
> Copy the tests for _mm_blend_pd, _mm_blendv_pd, _mm_blend_ps,
> _mm_blendv_ps from gcc/testsuite/gcc.target/i386.

You get less messy series in cases like this if you just put the tests
in the same patch as the code it tests (which works fine with Git by
default, it sorts everything in gcc/testsuite/ after everything in
gcc/config/ after all, so the important stuff is first in your patch).

> gcc/testsuite
> 	* gcc.target/powerpc/sse4_1-blendpd.c: Copy from gcc.target/i386.
> 	* gcc.target/powerpc/sse4_1-blendps-2.c: Likewise.
> 	* gcc.target/powerpc/sse4_1-blendps.c: Likewise.
> 	* gcc.target/powerpc/sse4_1-blendvpd.c: Likewise.

Well, they aren't exact copies, the dg-* statements are different (to
make it run only on a p8 or up, and enabling generating p8 code).  So
maybe say that?

Okay for trunk.  Thanks!


Segher

* Re: [PATCH v2 3/6] rs6000: Add support for SSE4.1 "ceil" intrinsics
  2021-07-16 13:50 ` [PATCH v2 3/6] rs6000: Add support for SSE4.1 "ceil" intrinsics Paul A. Clarke
  2021-07-16 18:20   ` Bill Schmidt
@ 2021-07-28 22:01   ` Segher Boessenkool
  1 sibling, 0 replies; 20+ messages in thread
From: Segher Boessenkool @ 2021-07-28 22:01 UTC (permalink / raw)
  To: Paul A. Clarke; +Cc: gcc-patches, wschmidt

Hi!

On Fri, Jul 16, 2021 at 08:50:19AM -0500, Paul A. Clarke wrote:
> 	* config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps,
> 	_mm_ceil_sd, _mm_ceil_ss): New.

This is fine.  Thanks!


Segher

* Re: [PATCH v2 4/6] rs6000: Add tests for SSE4.1 "ceil" intrinsics
  2021-07-16 13:50 ` [PATCH v2 4/6] rs6000: Add tests " Paul A. Clarke
  2021-07-16 18:22   ` Bill Schmidt
@ 2021-07-28 22:16   ` Segher Boessenkool
  2021-07-30 22:13     ` Paul A. Clarke
  1 sibling, 1 reply; 20+ messages in thread
From: Segher Boessenkool @ 2021-07-28 22:16 UTC (permalink / raw)
  To: Paul A. Clarke; +Cc: gcc-patches, wschmidt

Hi!

On Fri, Jul 16, 2021 at 08:50:20AM -0500, Paul A. Clarke wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-round.h
> @@ -0,0 +1,27 @@
> +#include <fenv.h>
> +#include <smmintrin.h>
> +#include "sse4_1-check.h"
> +
> +#define DIM(a) (sizeof (a) / sizeof ((a)[0]))

Pet peeve: sizeof is an operator, not a function, so even if you want to
protect the macro parameter this just is
  #define DIM(a) (sizeof (a) / sizeof (a)[0])
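
To illustrate the point, here is a small sketch comparing both spellings.
The macro names DIM_PAREN and DIM_PLAIN, the arrays and check_dim are made
up for this example.  Because "(a)" is not a type name, "sizeof (a)[0]" is
parsed as sizeof applied to the postfix expression "(a)[0]", so the single
pair of parentheses already protects the macro parameter.

```c
#include <assert.h>

/* Spelling used in the patch.  */
#define DIM_PAREN(a) (sizeof (a) / sizeof ((a)[0]))
/* Spelling suggested in the review.  */
#define DIM_PLAIN(a) (sizeof (a) / sizeof (a)[0])

static int sample_modes[4];
static double sample_table[3][5];

static void
check_dim (void)
{
  assert (DIM_PAREN (sample_modes) == 4);
  assert (DIM_PLAIN (sample_modes) == 4);

  /* Outer dimension, then inner dimension.  */
  assert (DIM_PLAIN (sample_table) == 3);
  assert (DIM_PLAIN (sample_table[1]) == 5);

  /* An argument containing an operator still works, since the
     parameter is parenthesized before the subscript is applied.  */
  assert (DIM_PLAIN (*sample_table) == 5);
  assert (DIM_PAREN (*sample_table) == 5);
}
```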

> +  (void) fesetround (round_save);

Please don't cast to (void).  That never does *anything*.

Okay for trunk (these are all testsuite files after all, and we should
test horrible style as well! :-P )

Thanks,


Segher

* Re: [PATCH v2 5/6] rs6000: Add support for SSE4.1 "floor" intrinsics
  2021-07-16 13:50 ` [PATCH v2 5/6] rs6000: Add support for SSE4.1 "floor" intrinsics Paul A. Clarke
  2021-07-16 18:30   ` Bill Schmidt
@ 2021-07-28 22:25   ` Segher Boessenkool
  1 sibling, 0 replies; 20+ messages in thread
From: Segher Boessenkool @ 2021-07-28 22:25 UTC (permalink / raw)
  To: Paul A. Clarke; +Cc: gcc-patches, wschmidt

On Fri, Jul 16, 2021 at 08:50:21AM -0500, Paul A. Clarke wrote:
> 	* config/rs6000/smmintrin.h (_mm_floor_pd, _mm_floor_ps,
> 	_mm_floor_sd, _mm_floor_ss): New.

Okay for trunk.  Thanks!


Segher


* Re: [PATCH v2 6/6] rs6000: Add tests for SSE4.1 "floor" intrinsics
  2021-07-16 13:50 ` [PATCH v2 6/6] rs6000: Add tests " Paul A. Clarke
  2021-07-16 18:31   ` Bill Schmidt
@ 2021-07-28 22:26   ` Segher Boessenkool
  1 sibling, 0 replies; 20+ messages in thread
From: Segher Boessenkool @ 2021-07-28 22:26 UTC (permalink / raw)
  To: Paul A. Clarke; +Cc: gcc-patches, wschmidt

On Fri, Jul 16, 2021 at 08:50:22AM -0500, Paul A. Clarke wrote:
> gcc/testsuite
> 	* gcc.target/powerpc/sse4_1-floorpd.c: New.
> 	* gcc.target/powerpc/sse4_1-floorps.c: New.
> 	* gcc.target/powerpc/sse4_1-floorsd.c: New.
> 	* gcc.target/powerpc/sse4_1-floorss.c: New.
> 	* gcc.target/powerpc/sse4_1-roundpd-2.c: Copy from
> 	gcc/testsuite/gcc.target/i386.

Okido.  Thanks!


Segher


* Re: [PATCH v2 4/6] rs6000: Add tests for SSE4.1 "ceil" intrinsics
  2021-07-28 22:16   ` Segher Boessenkool
@ 2021-07-30 22:13     ` Paul A. Clarke
  0 siblings, 0 replies; 20+ messages in thread
From: Paul A. Clarke @ 2021-07-30 22:13 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

On Wed, Jul 28, 2021 at 05:16:32PM -0500, Segher Boessenkool wrote:
> On Fri, Jul 16, 2021 at 08:50:20AM -0500, Paul A. Clarke wrote:
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-round.h
> > @@ -0,0 +1,27 @@
> > +#include <fenv.h>
> > +#include <smmintrin.h>
> > +#include "sse4_1-check.h"
> > +
> > +#define DIM(a) (sizeof (a) / sizeof ((a)[0]))
> 
> Pet peeve: sizeof is an operator, not a function, so even if you want to
> protect the macro parameter this just is
>   #define DIM(a) (sizeof (a) / sizeof (a)[0])
> 
> > +  (void) fesetround (round_save);
> 
> Please don't cast to (void).  That never does *anything*.
> 
> > Okay for trunk (these are all testsuite files after all, and we should
> > test horrible style as well! :-P )

I didn't want to be responsible for promulgating horrible style, so
I incorporated the above changes and pushed as
d656a3d3ce88d402a14e8c120f1b0e78a3979deb.  :-)

PC


end of thread, other threads:[~2021-07-30 22:13 UTC | newest]

Thread overview: 20+ messages
2021-07-16 13:50 [PATCH v2 0/6] rs6000: Add SSE4.1 "blend", "ceil", "floor" Paul A. Clarke
2021-07-16 13:50 ` [PATCH v2 1/6] rs6000: Add support for SSE4.1 "blend" intrinsics Paul A. Clarke
2021-07-16 18:13   ` Bill Schmidt
2021-07-28 21:30   ` Segher Boessenkool
2021-07-16 13:50 ` [PATCH v2 2/6] rs6000: Add tests " Paul A. Clarke
2021-07-16 18:16   ` Bill Schmidt
2021-07-28 21:51   ` Segher Boessenkool
2021-07-16 13:50 ` [PATCH v2 3/6] rs6000: Add support for SSE4.1 "ceil" intrinsics Paul A. Clarke
2021-07-16 18:20   ` Bill Schmidt
2021-07-28 22:01   ` Segher Boessenkool
2021-07-16 13:50 ` [PATCH v2 4/6] rs6000: Add tests " Paul A. Clarke
2021-07-16 18:22   ` Bill Schmidt
2021-07-28 22:16   ` Segher Boessenkool
2021-07-30 22:13     ` Paul A. Clarke
2021-07-16 13:50 ` [PATCH v2 5/6] rs6000: Add support for SSE4.1 "floor" intrinsics Paul A. Clarke
2021-07-16 18:30   ` Bill Schmidt
2021-07-28 22:25   ` Segher Boessenkool
2021-07-16 13:50 ` [PATCH v2 6/6] rs6000: Add tests " Paul A. Clarke
2021-07-16 18:31   ` Bill Schmidt
2021-07-28 22:26   ` Segher Boessenkool
