public inbox for gcc-patches@gcc.gnu.org
* [PATCH v4 0/3] rs6000: Support more SSE4 intrinsics
@ 2021-10-19  1:15 Paul A. Clarke
  2021-10-19  1:15 ` [PATCH v4 1/3] rs6000: Add nmmintrin.h to extra_headers Paul A. Clarke
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Paul A. Clarke @ 2021-10-19  1:15 UTC
  To: segher; +Cc: gcc-patches, wschmidt

v4:
- Of the original 6 patches in this series, I committed patches 2-5.
- Found an issue from v3. New file "nmmintrin.h" also needs to be added
to gcc/config.gcc "extra_headers".  Unfortunately, I discovered this
after committing the patch which added "nmmintrin.h", so I've added a
new patch here.
- Added scheduling "barriers" to patch 2 after review from Segher.
- Noted additional PR fixed by patch 3.

v3: Add "nmmintrin.h". _mm_cmpgt_epi64 is part of SSE4.2
and users will expect to be able to include "nmmintrin.h",
even though "nmmintrin.h" just includes "smmintrin.h"
where all of the SSE4.2 implementations actually appear.
Only patch 5/6 changed from v2.

Tested ppc64le (POWER9) and ppc64/32 (POWER7).

OK for trunk?

Paul A. Clarke (3):
  rs6000: Add nmmintrin.h to extra_headers
  rs6000: Support SSE4.1 "round" intrinsics
  rs6000: Guard some x86 intrinsics implementations

 gcc/config.gcc                                |   1 +
 gcc/config/rs6000/emmintrin.h                 |  12 +-
 gcc/config/rs6000/pmmintrin.h                 |   4 +
 gcc/config/rs6000/smmintrin.h                 | 296 ++++++++++++++----
 gcc/config/rs6000/tmmintrin.h                 |  12 +
 .../gcc.target/powerpc/sse4_1-round3.h        |  81 +++++
 .../gcc.target/powerpc/sse4_1-roundpd.c       | 143 +++++++++
 .../gcc.target/powerpc/sse4_1-roundps.c       |  98 ++++++
 .../gcc.target/powerpc/sse4_1-roundsd.c       | 256 +++++++++++++++
 .../gcc.target/powerpc/sse4_1-roundss.c       | 208 ++++++++++++
 .../gcc.target/powerpc/sse4_2-pcmpgtq.c       |   4 +-
 11 files changed, 1039 insertions(+), 76 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c

-- 
2.27.0



* [PATCH v4 1/3] rs6000: Add nmmintrin.h to extra_headers
  2021-10-19  1:15 [PATCH v4 0/3] rs6000: Support more SSE4 intrinsics Paul A. Clarke
@ 2021-10-19  1:15 ` Paul A. Clarke
  2021-10-19 13:10   ` Bill Schmidt
  2021-10-19  1:15 ` [PATCH v4 2/3] rs6000: Support SSE4.1 "round" intrinsics Paul A. Clarke
  2021-10-19  1:15 ` [PATCH v4 3/3] rs6000: Guard some x86 intrinsics implementations Paul A. Clarke
  2 siblings, 1 reply; 13+ messages in thread
From: Paul A. Clarke @ 2021-10-19  1:15 UTC
  To: segher; +Cc: gcc-patches, wschmidt

Fix an omission in commit 29fb1e831bf1c25e4574bf2f98a9f534e5c67665.

2021-10-18  Paul A. Clarke  <pc@us.ibm.com>

gcc
	* config.gcc (extra_headers): Add nmmintrin.h.
---
 gcc/config.gcc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index aa5bd5d14590..1cb9303b3a85 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -490,6 +490,7 @@ powerpc*-*-*)
 	extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
 	extra_headers="${extra_headers} mmintrin.h x86intrin.h"
 	extra_headers="${extra_headers} pmmintrin.h tmmintrin.h smmintrin.h"
+	extra_headers="${extra_headers} nmmintrin.h"
 	extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h si2vmx.h"
 	extra_headers="${extra_headers} amo.h"
 	case x$with_cpu in
-- 
2.27.0



* [PATCH v4 2/3] rs6000: Support SSE4.1 "round" intrinsics
  2021-10-19  1:15 [PATCH v4 0/3] rs6000: Support more SSE4 intrinsics Paul A. Clarke
  2021-10-19  1:15 ` [PATCH v4 1/3] rs6000: Add nmmintrin.h to extra_headers Paul A. Clarke
@ 2021-10-19  1:15 ` Paul A. Clarke
  2021-10-26 20:00   ` [PING PATCH " Paul A. Clarke
  2021-10-19  1:15 ` [PATCH v4 3/3] rs6000: Guard some x86 intrinsics implementations Paul A. Clarke
  2 siblings, 1 reply; 13+ messages in thread
From: Paul A. Clarke @ 2021-10-19  1:15 UTC
  To: segher; +Cc: gcc-patches, wschmidt

Suppress exceptions (when specified), by saving, manipulating, and
restoring the FPSCR.  Similarly, save, set, and restore the floating-point
rounding mode when required.
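
The enable-bit handling can be followed on any host with a small model.
This is only a sketch: a plain integer stands in for the FPSCR image, and
the names sketch_disable_exceptions and sketch_restore_exceptions are
hypothetical, not part of the patch.  The 0xf8 mask covers the five
exception-enable bits; the low two bits hold the rounding mode.

```c
#include <stdint.h>

/* Assumption: a plain integer stands in for the FPSCR image so the
   masking can be followed (and run) on any host.  */
#define SKETCH_ENABLE_MASK 0xf8ULL  /* VE, OE, UE, ZE, XE enable bits.  */

/* Clear the exception-enable bits and return their previous setting.
   The rounding-mode bits (low two bits) are left untouched.  */
static uint64_t
sketch_disable_exceptions (uint64_t *fpscr)
{
  uint64_t enables_save = *fpscr & SKETCH_ENABLE_MASK;
  *fpscr &= ~SKETCH_ENABLE_MASK;
  return enables_save;
}

/* OR the saved enable bits back into the (possibly updated) image.  */
static void
sketch_restore_exceptions (uint64_t *fpscr, uint64_t enables_save)
{
  *fpscr |= enables_save;
}
```

Restoring with an OR rather than writing back the whole saved image keeps
any rounding-mode change made in between.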

No attempt is made to optimize writing the FPSCR (by checking if the new
value would be the same), other than using lighter weight instructions
when possible. Note that explicit instruction scheduling "barriers" are
added to prevent floating-point computations from being moved before or
after the explicit FPSCR manipulations.  (That these are required has
been reported as an issue in GCC: PR102783.)

The scalar versions naively use the parallel versions to compute the
single scalar result and then construct the remainder of the result.
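
As a rough, host-independent illustration of that pattern (not the code in
the patch): the typedef and function names below are hypothetical, and a
simple truncation stands in for the real vector rounding operation.

```c
/* GCC/Clang vector extension; the typedef name is hypothetical.  */
typedef double sketch_v2df __attribute__ ((vector_size (16)));

/* Hypothetical stand-in for a "parallel" rounding operation: truncate
   both lanes toward zero.  Only valid while each lane fits in a
   long long.  */
static sketch_v2df
sketch_trunc_pd (sketch_v2df a)
{
  sketch_v2df r = { (double) (long long) a[0], (double) (long long) a[1] };
  return r;
}

/* Scalar form: round all of B, keep lane 0 of that result, and pass
   lane 1 of A through unchanged.  */
static sketch_v2df
sketch_trunc_sd (sketch_v2df a, sketch_v2df b)
{
  b = sketch_trunc_pd (b);
  sketch_v2df r = { b[0], a[1] };
  return r;
}
```

The unused lane of B is rounded as well and then discarded, which is the
"naive" part: it costs nothing extra in a vector instruction.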

Of minor note, the values of _MM_FROUND_TO_NEG_INF and _MM_FROUND_TO_ZERO
are swapped from the corresponding values on x86 so as to match the
corresponding rounding mode values in the Power ISA.
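
Code that uses the symbolic names is unaffected by the swap; only code
that passes raw numeric literals would notice.  A sketch of the mapping
follows, with hypothetical enum and function names; the X86_* values are
the ones in config/i386/smmintrin.h and the PPC_* values are the ones
defined in this patch.

```c
/* Rounding-direction encodings.  The X86_* values are those used by
   config/i386/smmintrin.h; the PPC_* values are the ones this patch
   defines (matching the RN field of the FPSCR).  The names themselves
   are hypothetical.  */
enum { X86_NEAREST = 0, X86_NEG_INF = 1, X86_POS_INF = 2, X86_ZERO = 3 };
enum { PPC_NEAREST = 0, PPC_ZERO = 1, PPC_POS_INF = 2, PPC_NEG_INF = 3 };

/* Translate a raw x86 rounding-direction value to the value this
   patch uses for the same direction.  */
static int
sketch_x86_to_ppc_rounding (int x86_mode)
{
  switch (x86_mode & 0x3)
    {
    case X86_NEAREST: return PPC_NEAREST;
    case X86_NEG_INF: return PPC_NEG_INF;
    case X86_POS_INF: return PPC_POS_INF;
    default:          return PPC_ZERO;
    }
}
```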

Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
convert _mm_ceil* and _mm_floor* into macros. This matches the current
analogous implementations in config/i386/smmintrin.h.
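
The shape of that conversion, reduced to a scalar sketch that runs on any
host: all names and mode values here are hypothetical, and the rounding
itself is a simple stand-in rather than the vector code in the patch.

```c
/* Hypothetical mode values and names, for illustration only.  */
#define SKETCH_FLOOR 0
#define SKETCH_CEIL  1

/* One general entry point taking the rounding direction.  Valid only
   while X fits in a long long.  */
static double
sketch_round (double x, int mode)
{
  double t = (double) (long long) x;	/* Truncate toward zero.  */
  if (mode == SKETCH_FLOOR)
    return t > x ? t - 1.0 : t;
  return t < x ? t + 1.0 : t;
}

/* The specific operations become thin macros over the general one.  */
#define sketch_ceil(V)  sketch_round ((V), SKETCH_CEIL)
#define sketch_floor(V) sketch_round ((V), SKETCH_FLOOR)
```

With the mode passed as a constant and the general function always
inlined, the compiler can be expected to drop the unused branches, so the
macro form should cost nothing over a dedicated function.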

Function signatures match the analogous functions in config/i386/smmintrin.h.

Add tests for _mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss,
modeled after the very similar "floor" and "ceil" tests.

Include basic tests, plus tests at the boundaries of the floating-point
representation, both positive and negative.  Test all of the parameterized
rounding modes as well as the C99 rounding modes, and the interactions
between the two.

Exceptions are not explicitly tested.

2021-10-18  Paul A. Clarke  <pc@us.ibm.com>

gcc
	* config/rs6000/smmintrin.h (_mm_round_pd, _mm_round_ps,
	_mm_round_sd, _mm_round_ss, _MM_FROUND_TO_NEAREST_INT,
	_MM_FROUND_TO_ZERO, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_NEG_INF,
	_MM_FROUND_CUR_DIRECTION, _MM_FROUND_RAISE_EXC, _MM_FROUND_NO_EXC,
	_MM_FROUND_NINT, _MM_FROUND_FLOOR, _MM_FROUND_CEIL, _MM_FROUND_TRUNC,
	_MM_FROUND_RINT, _MM_FROUND_NEARBYINT): New.
	* config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd,
	_mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss):
	Convert from function to macro.

gcc/testsuite
	* gcc.target/powerpc/sse4_1-round3.h: New.
	* gcc.target/powerpc/sse4_1-roundpd.c: New.
	* gcc.target/powerpc/sse4_1-roundps.c: New.
	* gcc.target/powerpc/sse4_1-roundsd.c: New.
	* gcc.target/powerpc/sse4_1-roundss.c: New.
---
 gcc/config/rs6000/smmintrin.h                 | 292 ++++++++++++++----
 .../gcc.target/powerpc/sse4_1-round3.h        |  81 +++++
 .../gcc.target/powerpc/sse4_1-roundpd.c       | 143 +++++++++
 .../gcc.target/powerpc/sse4_1-roundps.c       |  98 ++++++
 .../gcc.target/powerpc/sse4_1-roundsd.c       | 256 +++++++++++++++
 .../gcc.target/powerpc/sse4_1-roundss.c       | 208 +++++++++++++
 6 files changed, 1014 insertions(+), 64 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 90ce03d22709..6bb03e6e20ac 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -42,6 +42,234 @@
 #include <altivec.h>
 #include <tmmintrin.h>
 
+/* Rounding mode macros. */
+#define _MM_FROUND_TO_NEAREST_INT       0x00
+#define _MM_FROUND_TO_ZERO              0x01
+#define _MM_FROUND_TO_POS_INF           0x02
+#define _MM_FROUND_TO_NEG_INF           0x03
+#define _MM_FROUND_CUR_DIRECTION        0x04
+
+#define _MM_FROUND_NINT		\
+  (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_FLOOR	\
+  (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_CEIL		\
+  (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_TRUNC	\
+  (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_RINT		\
+  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_NEARBYINT	\
+  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC)
+
+#define _MM_FROUND_RAISE_EXC            0x00
+#define _MM_FROUND_NO_EXC               0x08
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_round_pd (__m128d __A, int __rounding)
+{
+  __v2df __r;
+  union {
+    double __fr;
+    long long __fpscr;
+  } __enables_save, __fpscr_save;
+
+  if (__rounding & _MM_FROUND_NO_EXC)
+    {
+      /* Save enabled exceptions, disable all exceptions,
+	 and preserve the rounding mode.  */
+#ifdef _ARCH_PWR9
+      __asm__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
+      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
+#else
+      __fpscr_save.__fr = __builtin_mffs ();
+      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
+      __fpscr_save.__fpscr &= ~0xf8;
+      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
+#endif
+      /* Insert an artificial "read/write" reference to the variable
+	 read below, to ensure the compiler does not schedule
+	 a read/use of the variable before the FPSCR is modified, above.
+	 This can be removed if and when GCC PR102783 is fixed.
+       */
+      __asm__ ("" : "+wa" (__A));
+    }
+
+  switch (__rounding)
+    {
+      case _MM_FROUND_TO_NEAREST_INT:
+	__fpscr_save.__fr = __builtin_mffsl ();
+	__attribute__ ((fallthrough));
+      case _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC:
+	__builtin_set_fpscr_rn (0b00);
+	/* Insert an artificial "read/write" reference to the variable
+	   read below, to ensure the compiler does not schedule
+	   a read/use of the variable before the FPSCR is modified, above.
+	   This can be removed if and when GCC PR102783 is fixed.
+	 */
+	__asm__ ("" : "+wa" (__A));
+
+	__r = vec_rint ((__v2df) __A);
+
+	/* Insert an artificial "read" reference to the variable written
+	   above, to ensure the compiler does not schedule the computation
+	   of the value after the manipulation of the FPSCR, below.
+	   This can be removed if and when GCC PR102783 is fixed.
+	 */
+	__asm__ ("" : : "wa" (__r));
+	__builtin_set_fpscr_rn (__fpscr_save.__fpscr);
+	break;
+      case _MM_FROUND_TO_NEG_INF:
+      case _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC:
+	__r = vec_floor ((__v2df) __A);
+	break;
+      case _MM_FROUND_TO_POS_INF:
+      case _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC:
+	__r = vec_ceil ((__v2df) __A);
+	break;
+      case _MM_FROUND_TO_ZERO:
+      case _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC:
+	__r = vec_trunc ((__v2df) __A);
+	break;
+      case _MM_FROUND_CUR_DIRECTION:
+	__r = vec_rint ((__v2df) __A);
+	break;
+    }
+  if (__rounding & _MM_FROUND_NO_EXC)
+    {
+      /* Insert an artificial "read" reference to the variable written
+	 above, to ensure the compiler does not schedule the computation
+	 of the value after the manipulation of the FPSCR, below.
+	 This can be removed if and when GCC PR102783 is fixed.
+       */
+      __asm__ ("" : : "wa" (__r));
+      /* Restore enabled exceptions.  */
+      __fpscr_save.__fr = __builtin_mffsl ();
+      __fpscr_save.__fpscr |= __enables_save.__fpscr;
+      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
+    }
+  return (__m128d) __r;
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_round_sd (__m128d __A, __m128d __B, int __rounding)
+{
+  __B = _mm_round_pd (__B, __rounding);
+  __v2df __r = { ((__v2df) __B)[0], ((__v2df) __A)[1] };
+  return (__m128d) __r;
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_round_ps (__m128 __A, int __rounding)
+{
+  __v4sf __r;
+  union {
+    double __fr;
+    long long __fpscr;
+  } __enables_save, __fpscr_save;
+
+  if (__rounding & _MM_FROUND_NO_EXC)
+    {
+      /* Save enabled exceptions, disable all exceptions,
+	 and preserve the rounding mode.  */
+#ifdef _ARCH_PWR9
+      __asm__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
+      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
+#else
+      __fpscr_save.__fr = __builtin_mffs ();
+      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
+      __fpscr_save.__fpscr &= ~0xf8;
+      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
+#endif
+      /* Insert an artificial "read/write" reference to the variable
+	 read below, to ensure the compiler does not schedule
+	 a read/use of the variable before the FPSCR is modified, above.
+	 This can be removed if and when GCC PR102783 is fixed.
+       */
+      __asm__ ("" : "+wa" (__A));
+    }
+
+  switch (__rounding)
+    {
+      case _MM_FROUND_TO_NEAREST_INT:
+	__fpscr_save.__fr = __builtin_mffsl ();
+	__attribute__ ((fallthrough));
+      case _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC:
+	__builtin_set_fpscr_rn (0b00);
+	/* Insert an artificial "read/write" reference to the variable
+	   read below, to ensure the compiler does not schedule
+	   a read/use of the variable before the FPSCR is modified, above.
+	   This can be removed if and when GCC PR102783 is fixed.
+	 */
+	__asm__ ("" : "+wa" (__A));
+
+	__r = vec_rint ((__v4sf) __A);
+
+	/* Insert an artificial "read" reference to the variable written
+	   above, to ensure the compiler does not schedule the computation
+	   of the value after the manipulation of the FPSCR, below.
+	   This can be removed if and when GCC PR102783 is fixed.
+	 */
+	__asm__ ("" : : "wa" (__r));
+	__builtin_set_fpscr_rn (__fpscr_save.__fpscr);
+	break;
+      case _MM_FROUND_TO_NEG_INF:
+      case _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC:
+	__r = vec_floor ((__v4sf) __A);
+	break;
+      case _MM_FROUND_TO_POS_INF:
+      case _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC:
+	__r = vec_ceil ((__v4sf) __A);
+	break;
+      case _MM_FROUND_TO_ZERO:
+      case _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC:
+	__r = vec_trunc ((__v4sf) __A);
+	break;
+      case _MM_FROUND_CUR_DIRECTION:
+	__r = vec_rint ((__v4sf) __A);
+	break;
+    }
+  if (__rounding & _MM_FROUND_NO_EXC)
+    {
+      /* Insert an artificial "read" reference to the variable written
+	 above, to ensure the compiler does not schedule the computation
+	 of the value after the manipulation of the FPSCR, below.
+	 This can be removed if and when GCC PR102783 is fixed.
+       */
+      __asm__ ("" : : "wa" (__r));
+      /* Restore enabled exceptions.  */
+      __fpscr_save.__fr = __builtin_mffsl ();
+      __fpscr_save.__fpscr |= __enables_save.__fpscr;
+      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
+    }
+  return (__m128) __r;
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_round_ss (__m128 __A, __m128 __B, int __rounding)
+{
+  __B = _mm_round_ps (__B, __rounding);
+  __v4sf __r = (__v4sf) __A;
+  __r[0] = ((__v4sf) __B)[0];
+  return (__m128) __r;
+}
+
+#define _mm_ceil_pd(V)	   _mm_round_pd ((V), _MM_FROUND_CEIL)
+#define _mm_ceil_sd(D, V)  _mm_round_sd ((D), (V), _MM_FROUND_CEIL)
+
+#define _mm_floor_pd(V)	   _mm_round_pd ((V), _MM_FROUND_FLOOR)
+#define _mm_floor_sd(D, V) _mm_round_sd ((D), (V), _MM_FROUND_FLOOR)
+
+#define _mm_ceil_ps(V)	   _mm_round_ps ((V), _MM_FROUND_CEIL)
+#define _mm_ceil_ss(D, V)  _mm_round_ss ((D), (V), _MM_FROUND_CEIL)
+
+#define _mm_floor_ps(V)	   _mm_round_ps ((V), _MM_FROUND_FLOOR)
+#define _mm_floor_ss(D, V) _mm_round_ss ((D), (V), _MM_FROUND_FLOOR)
+
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_insert_epi8 (__m128i const __A, int const __D, int const __N)
 {
@@ -210,70 +438,6 @@ _mm_testnzc_si128 (__m128i __A, __m128i __B)
 
 #define _mm_test_mix_ones_zeros(M, V) _mm_testnzc_si128 ((M), (V))
 
-__inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_ceil_pd (__m128d __A)
-{
-  return (__m128d) vec_ceil ((__v2df) __A);
-}
-
-__inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_ceil_sd (__m128d __A, __m128d __B)
-{
-  __v2df __r = vec_ceil ((__v2df) __B);
-  __r[1] = ((__v2df) __A)[1];
-  return (__m128d) __r;
-}
-
-__inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_floor_pd (__m128d __A)
-{
-  return (__m128d) vec_floor ((__v2df) __A);
-}
-
-__inline __m128d
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_floor_sd (__m128d __A, __m128d __B)
-{
-  __v2df __r = vec_floor ((__v2df) __B);
-  __r[1] = ((__v2df) __A)[1];
-  return (__m128d) __r;
-}
-
-__inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_ceil_ps (__m128 __A)
-{
-  return (__m128) vec_ceil ((__v4sf) __A);
-}
-
-__inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_ceil_ss (__m128 __A, __m128 __B)
-{
-  __v4sf __r = (__v4sf) __A;
-  __r[0] = __builtin_ceil (((__v4sf) __B)[0]);
-  return __r;
-}
-
-__inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_floor_ps (__m128 __A)
-{
-  return (__m128) vec_floor ((__v4sf) __A);
-}
-
-__inline __m128
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_floor_ss (__m128 __A, __m128 __B)
-{
-  __v4sf __r = (__v4sf) __A;
-  __r[0] = __builtin_floor (((__v4sf) __B)[0]);
-  return __r;
-}
-
 #ifdef _ARCH_PWR8
 extern __inline __m128i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h b/gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
new file mode 100644
index 000000000000..de6cbf7be438
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
@@ -0,0 +1,81 @@
+#include <smmintrin.h>
+#include <fenv.h>
+#include "sse4_1-check.h"
+
+#define DIM(a) (sizeof (a) / sizeof (a)[0])
+
+static int roundings[] =
+  {
+    _MM_FROUND_TO_NEAREST_INT,
+    _MM_FROUND_TO_NEG_INF,
+    _MM_FROUND_TO_POS_INF,
+    _MM_FROUND_TO_ZERO,
+    _MM_FROUND_CUR_DIRECTION
+  };
+
+static int modes[] =
+  {
+    FE_TONEAREST,
+    FE_UPWARD,
+    FE_DOWNWARD,
+    FE_TOWARDZERO
+  };
+
+static void
+TEST (void)
+{
+  int i, j, ri, mi, round_save;
+
+  round_save = fegetround ();
+  for (mi = 0; mi < DIM (modes); mi++) {
+    fesetround (modes[mi]);
+    for (i = 0; i < DIM (data); i++) {
+      for (ri = 0; ri < DIM (roundings); ri++) {
+	union value guess;
+	union value *current_answers = answers[ri];
+	switch ( roundings[ri] ) {
+	  case _MM_FROUND_TO_NEAREST_INT:
+	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
+				    _MM_FROUND_TO_NEAREST_INT);
+	    break;
+	  case _MM_FROUND_TO_NEG_INF:
+	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
+				    _MM_FROUND_TO_NEG_INF);
+	    break;
+	  case _MM_FROUND_TO_POS_INF:
+	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
+				    _MM_FROUND_TO_POS_INF);
+	    break;
+	  case _MM_FROUND_TO_ZERO:
+	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
+				    _MM_FROUND_TO_ZERO);
+	    break;
+	  case _MM_FROUND_CUR_DIRECTION:
+	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
+				    _MM_FROUND_CUR_DIRECTION);
+	    switch ( modes[mi] ) {
+	      case FE_TONEAREST:
+		current_answers = answers_NEAREST_INT;
+		break;
+	      case FE_UPWARD:
+		current_answers = answers_POS_INF;
+		break;
+	      case FE_DOWNWARD:
+		current_answers = answers_NEG_INF;
+		break;
+	      case FE_TOWARDZERO:
+		current_answers = answers_ZERO;
+		break;
+	    }
+	    break;
+	  default:
+	    abort ();
+	}
+	for (j = 0; j < DIM (guess.f); j++)
+	  if (guess.f[j] != current_answers[i].f[j])
+	    abort ();
+      }
+    }
+  }
+  fesetround (round_save);
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
new file mode 100644
index 000000000000..58d9cc524167
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
@@ -0,0 +1,143 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <smmintrin.h>
+
+#define VEC_T __m128d
+#define FP_T double
+
+#define ROUND_INTRIN(x, ignored, mode) _mm_round_pd (x, mode)
+
+#include "sse4_1-round-data.h"
+
+struct data2 data[] = {
+  { .value1 = { .f = {  0.00,  0.25 } } },
+  { .value1 = { .f = {  0.50,  0.75 } } },
+
+  { .value1 = { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffdp+50 } } },
+  { .value1 = { .f = {  0x1.ffffffffffffep+50,  0x1.fffffffffffffp+50 } } },
+  { .value1 = { .f = {  0x1.0000000000000p+51,  0x1.0000000000001p+51 } } },
+  { .value1 = { .f = {  0x1.0000000000002p+51,  0x1.0000000000003p+51 } } },
+
+  { .value1 = { .f = {  0x1.ffffffffffffep+51,  0x1.fffffffffffffp+51 } } },
+  { .value1 = { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } } },
+
+  { .value1 = { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } } },
+  { .value1 = { .f = { -0x1.fffffffffffffp+51, -0x1.ffffffffffffep+51 } } },
+
+  { .value1 = { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } } },
+  { .value1 = { .f = { -0x1.0000000000001p+51, -0x1.0000000000000p+51 } } },
+  { .value1 = { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffep+50 } } },
+  { .value1 = { .f = { -0x1.ffffffffffffdp+50, -0x1.ffffffffffffcp+50 } } },
+
+  { .value1 = { .f = { -1.00, -0.75 } } },
+  { .value1 = { .f = { -0.50, -0.25 } } }
+};
+
+union value answers_NEAREST_INT[] = {
+  { .f = {  0.00,  0.00 } },
+  { .f = {  0.00,  1.00 } },
+
+  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
+  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
+  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
+  { .f = {  0x1.0000000000002p+51,  0x1.0000000000004p+51 } },
+
+  { .f = {  0x1.ffffffffffffep+51,  0x1.0000000000000p+52 } },
+  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
+
+  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
+  { .f = { -0x1.0000000000000p+52, -0x1.ffffffffffffep+51 } },
+
+  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
+  { .f = { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } },
+  { .f = { -0x1.ffffffffffffcp+50, -0x1.0000000000000p+51 } },
+  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
+
+  { .f = { -1.00, -1.00 } },
+  { .f = {  0.00,  0.00 } }
+};
+
+union value answers_NEG_INF[] = {
+  { .f = {  0.00,  0.00 } },
+  { .f = {  0.00,  0.00 } },
+
+  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
+  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
+  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
+  { .f = {  0x1.0000000000002p+51,  0x1.0000000000002p+51 } },
+
+  { .f = {  0x1.ffffffffffffep+51,  0x1.ffffffffffffep+51 } },
+  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
+
+  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
+  { .f = { -0x1.0000000000000p+52, -0x1.ffffffffffffep+51 } },
+
+  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
+  { .f = { -0x1.0000000000002p+51, -0x1.0000000000000p+51 } },
+  { .f = { -0x1.ffffffffffffcp+50, -0x1.0000000000000p+51 } },
+  { .f = { -0x1.0000000000000p+51, -0x1.ffffffffffffcp+50 } },
+
+  { .f = { -1.00, -1.00 } },
+  { .f = { -1.00, -1.00 } }
+};
+
+union value answers_POS_INF[] = {
+  { .f = {  0.00,  1.00 } },
+  { .f = {  1.00,  1.00 } },
+
+  { .f = {  0x1.ffffffffffffcp+50,  0x1.0000000000000p+51 } },
+  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
+  { .f = {  0x1.0000000000000p+51,  0x1.0000000000002p+51 } },
+  { .f = {  0x1.0000000000002p+51,  0x1.0000000000004p+51 } },
+
+  { .f = {  0x1.ffffffffffffep+51,  0x1.0000000000000p+52 } },
+  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
+
+  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
+  { .f = { -0x1.ffffffffffffep+51, -0x1.ffffffffffffep+51 } },
+
+  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
+  { .f = { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } },
+  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
+  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
+
+  { .f = { -1.00,  0.00 } },
+  { .f = {  0.00,  0.00 } }
+};
+
+union value answers_ZERO[] = {
+  { .f = {  0.00,  0.00 } },
+  { .f = {  0.00,  0.00 } },
+
+  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
+  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
+  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
+  { .f = {  0x1.0000000000002p+51,  0x1.0000000000002p+51 } },
+
+  { .f = {  0x1.ffffffffffffep+51,  0x1.ffffffffffffep+51 } },
+  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
+
+  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
+  { .f = { -0x1.ffffffffffffep+51, -0x1.ffffffffffffep+51 } },
+
+  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
+  { .f = { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } },
+  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
+  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
+
+  { .f = { -1.00,  0.00 } },
+  { .f = {  0.00,  0.00 } }
+};
+
+union value *answers[] = {
+  answers_NEAREST_INT,
+  answers_NEG_INF,
+  answers_POS_INF,
+  answers_ZERO,
+  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
+};
+
+#include "sse4_1-round3.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
new file mode 100644
index 000000000000..4b0366dfddf3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
@@ -0,0 +1,98 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <smmintrin.h>
+
+#define VEC_T __m128
+#define FP_T float
+
+#define ROUND_INTRIN(x, ignored, mode) _mm_round_ps (x, mode)
+
+#include "sse4_1-round-data.h"
+
+struct data2 data[] = {
+  { .value1 = { .f = {  0.00,  0.25,  0.50,  0.75 } } },
+
+  { .value1 = { .f = {  0x1.fffff8p+21,  0x1.fffffap+21,
+			0x1.fffffcp+21,  0x1.fffffep+21 } } },
+  { .value1 = { .f = {  0x1.fffffap+22,  0x1.fffffcp+22,
+			0x1.fffffep+22,  0x1.fffffep+23 } } },
+  { .value1 = { .f = { -0x1.fffffep+23, -0x1.fffffep+22,
+		       -0x1.fffffcp+22, -0x1.fffffap+22 } } },
+  { .value1 = { .f = { -0x1.fffffep+21, -0x1.fffffcp+21,
+		       -0x1.fffffap+21, -0x1.fffff8p+21 } } },
+
+  { .value1 = { .f = { -1.00, -0.75, -0.50, -0.25 } } }
+};
+
+union value answers_NEAREST_INT[] = {
+  { .f = {  0.00,  0.00,  0.00,  1.00 } },
+
+  { .f = {  0x1.fffff8p+21,  0x1.fffff8p+21,
+            0x1.000000p+22,  0x1.000000p+22 } },
+  { .f = {  0x1.fffff8p+22,  0x1.fffffcp+22,
+            0x1.000000p+23,  0x1.fffffep+23 } },
+  { .f = { -0x1.fffffep+23, -0x1.000000p+23,
+           -0x1.fffffcp+22, -0x1.fffff8p+22 } },
+  { .f = { -0x1.000000p+22, -0x1.000000p+22,
+           -0x1.fffff8p+21, -0x1.fffff8p+21 } },
+
+  { .f = { -1.00, -1.00,  0.00,  0.00 } }
+};
+
+union value answers_NEG_INF[] = {
+  { .f = {  0.00,  0.00,  0.00,  0.00 } },
+
+  { .f = {  0x1.fffff8p+21,  0x1.fffff8p+21,
+            0x1.fffff8p+21,  0x1.fffff8p+21 } },
+  { .f = {  0x1.fffff8p+22,  0x1.fffffcp+22,
+            0x1.fffffcp+22,  0x1.fffffep+23 } },
+  { .f = { -0x1.fffffep+23, -0x1.000000p+23,
+           -0x1.fffffcp+22, -0x1.fffffcp+22 } },
+  { .f = { -0x1.000000p+22, -0x1.000000p+22,
+           -0x1.000000p+22, -0x1.fffff8p+21 } },
+
+  { .f = { -1.00, -1.00, -1.00, -1.00 } }
+};
+
+union value answers_POS_INF[] = {
+  { .f = {  0.00,  1.00,  1.00,  1.00 } },
+
+  { .f = {  0x1.fffff8p+21,  0x1.000000p+22,
+            0x1.000000p+22,  0x1.000000p+22 } },
+  { .f = {  0x1.fffffcp+22,  0x1.fffffcp+22,
+            0x1.000000p+23,  0x1.fffffep+23 } },
+  { .f = { -0x1.fffffep+23, -0x1.fffffcp+22,
+           -0x1.fffffcp+22, -0x1.fffff8p+22 } },
+  { .f = { -0x1.fffff8p+21, -0x1.fffff8p+21,
+           -0x1.fffff8p+21, -0x1.fffff8p+21 } },
+
+  { .f = { -1.00,  0.00,  0.00,  0.00 } }
+};
+
+union value answers_ZERO[] = {
+  { .f = {  0.00,  0.00,  0.00,  0.00 } },
+
+  { .f = {  0x1.fffff8p+21,  0x1.fffff8p+21,
+            0x1.fffff8p+21,  0x1.fffff8p+21 } },
+  { .f = {  0x1.fffff8p+22,  0x1.fffffcp+22,
+            0x1.fffffcp+22,  0x1.fffffep+23 } },
+  { .f = { -0x1.fffffep+23, -0x1.fffffcp+22,
+           -0x1.fffffcp+22, -0x1.fffff8p+22 } },
+  { .f = { -0x1.fffff8p+21, -0x1.fffff8p+21,
+           -0x1.fffff8p+21, -0x1.fffff8p+21 } },
+
+  { .f = { -1.00,  0.00,  0.00,  0.00 } }
+};
+
+union value *answers[] = {
+  answers_NEAREST_INT,
+  answers_NEG_INF,
+  answers_POS_INF,
+  answers_ZERO,
+  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
+};
+
+#include "sse4_1-round3.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
new file mode 100644
index 000000000000..4f8d9e08c93d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
@@ -0,0 +1,256 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+#include <stdio.h>
+#define NO_WARN_X86_INTRINSICS 1
+#include <smmintrin.h>
+
+#define VEC_T __m128d
+#define FP_T double
+
+#define ROUND_INTRIN(x, y, mode) _mm_round_sd (x, y, mode)
+
+#include "sse4_1-round-data.h"
+
+static struct data2 data[] = {
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0.00, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0.25, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0.50, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0.75, IGNORED } } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.ffffffffffffcp+50, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.ffffffffffffdp+50, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.ffffffffffffep+50, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffffffffffp+50, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.0000000000000p+51, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.0000000000001p+51, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.0000000000002p+51, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.0000000000003p+51, IGNORED } } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.ffffffffffffep+51, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffffffffffp+51, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.0000000000000p+52, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.0000000000001p+52, IGNORED } } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.0000000000001p+52, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.0000000000000p+52, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffffffffffp+51, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.ffffffffffffep+51, IGNORED } } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.0000000000004p+51, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.0000000000002p+51, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.0000000000001p+51, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.0000000000000p+51, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.ffffffffffffcp+50, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.ffffffffffffep+50, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.ffffffffffffdp+50, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.ffffffffffffcp+50, IGNORED } } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -1.00, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0.75, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0.50, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
+    .value2 = { .f = { -0.25, IGNORED } } }
+};
+
+static union value answers_NEAREST_INT[] = {
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  1.00, PASSTHROUGH } },
+
+  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000004p+51, PASSTHROUGH } },
+
+  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
+  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
+
+  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
+
+  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
+  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+
+  { .f = { -1.00, PASSTHROUGH } },
+  { .f = { -1.00, PASSTHROUGH } },
+  { .f = { -0.00, PASSTHROUGH } },
+  { .f = { -0.00, PASSTHROUGH } }
+};
+
+static union value answers_NEG_INF[] = {
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } },
+
+  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
+
+  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
+  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
+  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
+
+  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
+
+  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
+  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
+  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+
+  { .f = { -1.00, PASSTHROUGH } },
+  { .f = { -1.00, PASSTHROUGH } },
+  { .f = { -1.00, PASSTHROUGH } },
+  { .f = { -1.00, PASSTHROUGH } }
+};
+
+static union value answers_POS_INF[] = {
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  1.00, PASSTHROUGH } },
+  { .f = {  1.00, PASSTHROUGH } },
+  { .f = {  1.00, PASSTHROUGH } },
+
+  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000004p+51, PASSTHROUGH } },
+
+  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
+  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
+
+  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
+
+  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
+  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+
+  { .f = { -1.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } }
+};
+
+static union value answers_ZERO[] = {
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } },
+
+  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
+
+  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
+  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
+  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
+  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
+
+  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
+
+  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
+  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
+
+  { .f = { -1.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH } }
+};
+
+union value *answers[] = {
+  answers_NEAREST_INT,
+  answers_NEG_INF,
+  answers_POS_INF,
+  answers_ZERO,
+  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
+};
+
+#include "sse4_1-round3.h"
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c
new file mode 100644
index 000000000000..d788ebda64dd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c
@@ -0,0 +1,208 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+#include <stdio.h>
+#define NO_WARN_X86_INTRINSICS 1
+#include <smmintrin.h>
+
+#define VEC_T __m128
+#define FP_T float
+
+#define ROUND_INTRIN(x, y, mode) _mm_round_ss (x, y, mode)
+
+#include "sse4_1-round-data.h"
+
+static struct data2 data[] = {
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0.00, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0.25, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0.50, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0.75, IGNORED, IGNORED, IGNORED } } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffff8p+21, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffap+21, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffcp+21, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffep+21, IGNORED, IGNORED, IGNORED } } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffap+22, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffcp+22, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffep+22, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = {  0x1.fffffep+23, IGNORED, IGNORED, IGNORED } } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffep+23, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffep+22, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffcp+22, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffap+22, IGNORED, IGNORED, IGNORED } } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffep+21, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffcp+21, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffffap+21, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0x1.fffff8p+21, IGNORED, IGNORED, IGNORED } } },
+
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -1.00, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0.75, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0.50, IGNORED, IGNORED, IGNORED } } },
+  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+    .value2 = { .f = { -0.25, IGNORED, IGNORED, IGNORED } } }
+};
+
+static union value answers_NEAREST_INT[] = {
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = {  0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
+};
+
+static union value answers_NEG_INF[] = {
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = {  0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
+};
+
+static union value answers_POS_INF[] = {
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
+};
+
+static union value answers_ZERO[] = {
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = {  0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+
+  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
+  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
+};
+
+union value *answers[] = {
+  answers_NEAREST_INT,
+  answers_NEG_INF,
+  answers_POS_INF,
+  answers_ZERO,
+  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
+};
+
+#include "sse4_1-round3.h"
-- 
2.27.0



* [PATCH v4 3/3] rs6000: Guard some x86 intrinsics implementations
  2021-10-19  1:15 [PATCH v4 0/3] rs6000: Support more SSE4 intrinsics Paul A. Clarke
  2021-10-19  1:15 ` [PATCH v4 1/3] rs6000: Add nmmintrin.h to extra_headers Paul A. Clarke
  2021-10-19  1:15 ` [PATCH v4 2/3] rs6000: Support SSE4.1 "round" intrinsics Paul A. Clarke
@ 2021-10-19  1:15 ` Paul A. Clarke
  2021-10-19 14:32   ` Segher Boessenkool
  2 siblings, 1 reply; 13+ messages in thread
From: Paul A. Clarke @ 2021-10-19  1:15 UTC (permalink / raw)
  To: segher; +Cc: gcc-patches, wschmidt

Some compatibility implementations of x86 intrinsics include
Power intrinsics which require POWER8.  Guard them.
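
For readers of the thread, here is a minimal sketch of what such a guard
means for calling code.  It is not part of the patch: mul_lo32 is a
hypothetical helper, and the scalar fallback is only an assumption about
how a caller might cope when _mm_mul_epu32 is not defined.  GCC
predefines _ARCH_PWR8 when compiling for POWER8 or later, which is the
macro the guards test.

```c
#include <assert.h>
#include <stdint.h>

#ifdef _ARCH_PWR8
/* Power target, POWER8 or later: the compatibility header defines
   _mm_mul_epu32.  NO_WARN_X86_INTRINSICS acknowledges the port, as in
   the testcases of this series.  */
#define NO_WARN_X86_INTRINSICS 1
#include <emmintrin.h>
#endif

/* Hypothetical helper: 32x32->64 unsigned multiply, which is what
   _mm_mul_epu32 computes for the low half of each 64-bit lane.  */
uint64_t
mul_lo32 (uint32_t a, uint32_t b)
{
#ifdef _ARCH_PWR8
  __m128i p = _mm_mul_epu32 (_mm_cvtsi32_si128 ((int) a),
                             _mm_cvtsi32_si128 ((int) b));
  return (uint64_t) _mm_cvtsi128_si64 (p);
#else
  /* The guarded intrinsic is not available here: scalar fallback.  */
  return (uint64_t) a * (uint64_t) b;
#endif
}

/* Usage example.  */
void
mul_lo32_demo (void)
{
  assert (mul_lo32 (3u, 5u) == 15u);
  assert (mul_lo32 (0xffffffffu, 0xffffffffu) == 0xfffffffe00000001ull);
}
```

Without the guard, a pre-POWER8 build that merely included the header
could fail inside the header itself; with it, only code that actually
calls the intrinsic fails to compile.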

emmintrin.h:
- _mm_cmpord_pd: Remove code which was ostensibly for pre-POWER8,
  but which in fact depended on POWER8 (vec_cmpgt(v2du)/vcmpgtud).
  The "POWER8" version works fine on pre-POWER8.
- _mm_mul_epu32: vec_mule(v4su) uses vmuleuw.
pmmintrin.h:
- _mm_movehdup_ps: vec_mergeo(v4su) uses vmrgow.
- _mm_moveldup_ps: vec_mergee(v4su) uses vmrgew.
smmintrin.h:
- _mm_cmpeq_epi64: vec_cmpeq(v2di) uses vcmpequd.
- _mm_mul_epi32: vec_mule(v4si) uses vmulesw.
- _mm_cmpgt_epi64: vec_cmpgt(v2di) uses vcmpgtsd.
tmmintrin.h:
- _mm_sign_epi8: vec_neg(v16qi) uses vsububm.
- _mm_sign_epi16: vec_neg(v8hi) uses vsubuhm.
- _mm_sign_epi32: vec_neg(v4si) uses vsubuwm.
  Note that the above three could actually be supported pre-POWER8,
  but current GCC does not support them before POWER8.
- _mm_sign_pi8: depends on _mm_sign_epi8.
- _mm_sign_pi16: depends on _mm_sign_epi16.
- _mm_sign_pi32: depends on _mm_sign_epi32.
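
The _mm_cmpord_pd entry above keeps only the self-comparison form.  As a
rough scalar illustration of why that form needs nothing from POWER8,
the sketch below models one lane of the result.  The function names are
made up for this example, and plain C doubles stand in for the vector
lanes.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* All-ones mask when x is not a NaN, zero otherwise.  Comparing a value
   against itself is false only for NaN.  */
uint64_t
lane_not_nan (double x)
{
  return (x == x) ? UINT64_MAX : 0;
}

/* One lane of an ordered compare: both inputs must be non-NaN.  */
uint64_t
cmpord_lane (double a, double b)
{
  return lane_not_nan (a) & lane_not_nan (b);
}

/* Usage example.  */
void
cmpord_lane_demo (void)
{
  assert (cmpord_lane (1.0, 2.0) == UINT64_MAX);
  assert (cmpord_lane (NAN, 2.0) == 0);
  assert (cmpord_lane (INFINITY, -INFINITY) == UINT64_MAX);
}
```

Note that the self-comparison relies on IEEE NaN semantics, so options
such as -ffast-math may fold it away.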

2021-10-18  Paul A. Clarke  <pc@us.ibm.com>

gcc
	PR target/101893
	PR target/102719
	* config/rs6000/emmintrin.h: Guard POWER8 intrinsics.
	* config/rs6000/pmmintrin.h: Same.
	* config/rs6000/smmintrin.h: Same.
	* config/rs6000/tmmintrin.h: Same.
---
 gcc/config/rs6000/emmintrin.h                     | 12 ++----------
 gcc/config/rs6000/pmmintrin.h                     |  4 ++++
 gcc/config/rs6000/smmintrin.h                     |  4 ++++
 gcc/config/rs6000/tmmintrin.h                     | 12 ++++++++++++
 gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c |  4 ++--
 5 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/gcc/config/rs6000/emmintrin.h b/gcc/config/rs6000/emmintrin.h
index ce1287edf782..32ad72b4cc35 100644
--- a/gcc/config/rs6000/emmintrin.h
+++ b/gcc/config/rs6000/emmintrin.h
@@ -430,20 +430,10 @@ _mm_cmpnge_pd (__m128d __A, __m128d __B)
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_cmpord_pd (__m128d __A, __m128d __B)
 {
-#if _ARCH_PWR8
   __v2du c, d;
   /* Compare against self will return false (0's) if NAN.  */
   c = (__v2du)vec_cmpeq (__A, __A);
   d = (__v2du)vec_cmpeq (__B, __B);
-#else
-  __v2du a, b;
-  __v2du c, d;
-  const __v2du double_exp_mask  = {0x7ff0000000000000, 0x7ff0000000000000};
-  a = (__v2du)vec_abs ((__v2df)__A);
-  b = (__v2du)vec_abs ((__v2df)__B);
-  c = (__v2du)vec_cmpgt (double_exp_mask, a);
-  d = (__v2du)vec_cmpgt (double_exp_mask, b);
-#endif
   /* A != NAN and B != NAN.  */
   return ((__m128d)vec_and(c, d));
 }
@@ -1472,6 +1462,7 @@ _mm_mul_su32 (__m64 __A, __m64 __B)
   return ((__m64)a * (__m64)b);
 }
 
+#ifdef _ARCH_PWR8
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_mul_epu32 (__m128i __A, __m128i __B)
 {
@@ -1498,6 +1489,7 @@ _mm_mul_epu32 (__m128i __A, __m128i __B)
   return (__m128i) vec_mule ((__v4su)__A, (__v4su)__B);
 #endif
 }
+#endif
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_slli_epi16 (__m128i __A, int __B)
diff --git a/gcc/config/rs6000/pmmintrin.h b/gcc/config/rs6000/pmmintrin.h
index eab712fdfa66..83dff1d85666 100644
--- a/gcc/config/rs6000/pmmintrin.h
+++ b/gcc/config/rs6000/pmmintrin.h
@@ -123,17 +123,21 @@ _mm_hsub_pd (__m128d __X, __m128d __Y)
 			    vec_mergel ((__v2df) __X, (__v2df)__Y));
 }
 
+#ifdef _ARCH_PWR8
 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_movehdup_ps (__m128 __X)
 {
   return (__m128)vec_mergeo ((__v4su)__X, (__v4su)__X);
 }
+#endif
 
+#ifdef _ARCH_PWR8
 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_moveldup_ps (__m128 __X)
 {
   return (__m128)vec_mergee ((__v4su)__X, (__v4su)__X);
 }
+#endif
 
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_loaddup_pd (double const *__P)
diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 6bb03e6e20ac..24adc95589ad 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -324,6 +324,7 @@ _mm_extract_ps (__m128 __X, const int __N)
   return ((__v4si)__X)[__N & 3];
 }
 
+#ifdef _ARCH_PWR8
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_blend_epi16 (__m128i __A, __m128i __B, const int __imm8)
 {
@@ -335,6 +336,7 @@ _mm_blend_epi16 (__m128i __A, __m128i __B, const int __imm8)
   #endif
   return (__m128i) vec_sel ((__v8hu) __A, (__v8hu) __B, __shortmask);
 }
+#endif
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
@@ -395,6 +397,7 @@ _mm_blend_pd (__m128d __A, __m128d __B, const int __imm8)
   return (__m128d) __r;
 }
 
+#ifdef _ARCH_PWR8
 __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
@@ -403,6 +406,7 @@ _mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
   const __vector __bool long long __boolmask = vec_cmplt ((__v2di) __mask, __zero);
   return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __boolmask);
 }
+#endif
 
 __inline int
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
diff --git a/gcc/config/rs6000/tmmintrin.h b/gcc/config/rs6000/tmmintrin.h
index 971511260b78..a67d88c8079a 100644
--- a/gcc/config/rs6000/tmmintrin.h
+++ b/gcc/config/rs6000/tmmintrin.h
@@ -350,6 +350,7 @@ _mm_shuffle_pi8 (__m64 __A, __m64 __B)
   return (__m64) ((__v2du) (__C))[0];
 }
 
+#ifdef _ARCH_PWR8
 extern __inline __m128i
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_sign_epi8 (__m128i __A, __m128i __B)
@@ -361,7 +362,9 @@ _mm_sign_epi8 (__m128i __A, __m128i __B)
   __v16qi __conv = vec_add (__selectneg, __selectpos);
   return (__m128i) vec_mul ((__v16qi) __A, (__v16qi) __conv);
 }
+#endif
 
+#ifdef _ARCH_PWR8
 extern __inline __m128i
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_sign_epi16 (__m128i __A, __m128i __B)
@@ -373,7 +376,9 @@ _mm_sign_epi16 (__m128i __A, __m128i __B)
   __v8hi __conv = vec_add (__selectneg, __selectpos);
   return (__m128i) vec_mul ((__v8hi) __A, (__v8hi) __conv);
 }
+#endif
 
+#ifdef _ARCH_PWR8
 extern __inline __m128i
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_sign_epi32 (__m128i __A, __m128i __B)
@@ -385,7 +390,9 @@ _mm_sign_epi32 (__m128i __A, __m128i __B)
   __v4si __conv = vec_add (__selectneg, __selectpos);
   return (__m128i) vec_mul ((__v4si) __A, (__v4si) __conv);
 }
+#endif
 
+#ifdef _ARCH_PWR8
 extern __inline __m64
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_sign_pi8 (__m64 __A, __m64 __B)
@@ -396,7 +403,9 @@ _mm_sign_pi8 (__m64 __A, __m64 __B)
   __C = (__v16qi) _mm_sign_epi8 ((__m128i) __C, (__m128i) __D);
   return (__m64) ((__v2du) (__C))[0];
 }
+#endif
 
+#ifdef _ARCH_PWR8
 extern __inline __m64
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_sign_pi16 (__m64 __A, __m64 __B)
@@ -407,7 +416,9 @@ _mm_sign_pi16 (__m64 __A, __m64 __B)
   __C = (__v8hi) _mm_sign_epi16 ((__m128i) __C, (__m128i) __D);
   return (__m64) ((__v2du) (__C))[0];
 }
+#endif
 
+#ifdef _ARCH_PWR8
 extern __inline __m64
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_sign_pi32 (__m64 __A, __m64 __B)
@@ -418,6 +429,7 @@ _mm_sign_pi32 (__m64 __A, __m64 __B)
   __C = (__v4si) _mm_sign_epi32 ((__m128i) __C, (__m128i) __D);
   return (__m64) ((__v2du) (__C))[0];
 }
+#endif
 
 extern __inline __m128i
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c b/gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c
index e8ecd9c43c25..36b9bd7f9f4a 100644
--- a/gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c
@@ -1,6 +1,6 @@
 /* { dg-do run } */
-/* { dg-options "-O2 -mvsx" } */
-/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mpower8-vector" } */
+/* { dg-require-effective-target p8vector_hw } */
 
 #ifndef CHECK_H
 #define CHECK_H "sse4_2-check.h"
-- 
2.27.0



* Re: [PATCH v4 1/3] rs6000: Add nmmintrin.h to extra_headers
  2021-10-19  1:15 ` [PATCH v4 1/3] rs6000: Add nmmintrin.h to extra_headers Paul A. Clarke
@ 2021-10-19 13:10   ` Bill Schmidt
  2021-10-19 14:27     ` Segher Boessenkool
  0 siblings, 1 reply; 13+ messages in thread
From: Bill Schmidt @ 2021-10-19 13:10 UTC (permalink / raw)
  To: Paul A. Clarke, segher; +Cc: gcc-patches

Hi Paul,

On 10/18/21 8:15 PM, Paul A. Clarke wrote:
> Fix an ommission in commit 29fb1e831bf1c25e4574bf2f98a9f534e5c67665.
>
> 2021-10-18  Paul A. Clarke  <pc@us.ibm.com>
>
> gcc
> 	* config/config.gcc (extra_headers): Add nmmintrin.h.
> ---
>  gcc/config.gcc | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index aa5bd5d14590..1cb9303b3a85 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -490,6 +490,7 @@ powerpc*-*-*)
>  	extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
>  	extra_headers="${extra_headers} mmintrin.h x86intrin.h"
>  	extra_headers="${extra_headers} pmmintrin.h tmmintrin.h smmintrin.h"
> +	extra_headers="${extra_headers} nmmintrin.h"
>  	extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h si2vmx.h"
>  	extra_headers="${extra_headers} amo.h"
>  	case x$with_cpu in

In my opinion, you can commit this one as obvious.

Thanks,
Bill



* Re: [PATCH v4 1/3] rs6000: Add nmmintrin.h to extra_headers
  2021-10-19 13:10   ` Bill Schmidt
@ 2021-10-19 14:27     ` Segher Boessenkool
  0 siblings, 0 replies; 13+ messages in thread
From: Segher Boessenkool @ 2021-10-19 14:27 UTC (permalink / raw)
  To: wschmidt; +Cc: Paul A. Clarke, gcc-patches

On Tue, Oct 19, 2021 at 08:10:06AM -0500, Bill Schmidt via Gcc-patches wrote:
> Hi Paul,
> 
> On 10/18/21 8:15 PM, Paul A. Clarke wrote:
> > Fix an ommission in commit 29fb1e831bf1c25e4574bf2f98a9f534e5c67665.

(Typo, s/mm/m/)

> > 2021-10-18  Paul A. Clarke  <pc@us.ibm.com>
> >
> > gcc
> > 	* config/config.gcc (extra_headers): Add nmmintrin.h.
> > ---
> >  gcc/config.gcc | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/gcc/config.gcc b/gcc/config.gcc
> > index aa5bd5d14590..1cb9303b3a85 100644
> > --- a/gcc/config.gcc
> > +++ b/gcc/config.gcc
> > @@ -490,6 +490,7 @@ powerpc*-*-*)
> >  	extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
> >  	extra_headers="${extra_headers} mmintrin.h x86intrin.h"
> >  	extra_headers="${extra_headers} pmmintrin.h tmmintrin.h smmintrin.h"
> > +	extra_headers="${extra_headers} nmmintrin.h"
> >  	extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h si2vmx.h"
> >  	extra_headers="${extra_headers} amo.h"
> >  	case x$with_cpu in
> 
> In my opinion, you can commit this one as obvious.

Or as trivial.  Or as obvious and trivial :-)


Segher


* Re: [PATCH v4 3/3] rs6000: Guard some x86 intrinsics implementations
  2021-10-19  1:15 ` [PATCH v4 3/3] rs6000: Guard some x86 intrinsics implementations Paul A. Clarke
@ 2021-10-19 14:32   ` Segher Boessenkool
  2021-10-19 15:23     ` Paul A. Clarke
  0 siblings, 1 reply; 13+ messages in thread
From: Segher Boessenkool @ 2021-10-19 14:32 UTC (permalink / raw)
  To: Paul A. Clarke; +Cc: wschmidt, gcc-patches

On Mon, Oct 18, 2021 at 08:15:12PM -0500, Paul A. Clarke via Gcc-patches wrote:
> Some compatibility implementations of x86 intrinsics include
> Power intrinsics which require POWER8.  Guard them.

I assume this improves on all previous commented things (you don't say
if it does).

> gcc
> 	PR target/101893
> 	PR target/102719
> 	* config/rs6000/emmintrin.h: Guard POWER8 intrinsics.
> 	* config/rs6000/pmmintrin.h: Same.
> 	* config/rs6000/smmintrin.h: Same.
> 	* config/rs6000/tmmintrin.h: Same.

Okay for trunk.  Thanks!


Segher

* Re: [PATCH v4 3/3] rs6000: Guard some x86 intrinsics implementations
  2021-10-19 14:32   ` Segher Boessenkool
@ 2021-10-19 15:23     ` Paul A. Clarke
  0 siblings, 0 replies; 13+ messages in thread
From: Paul A. Clarke @ 2021-10-19 15:23 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

On Tue, Oct 19, 2021 at 09:32:20AM -0500, Segher Boessenkool wrote:
> On Mon, Oct 18, 2021 at 08:15:12PM -0500, Paul A. Clarke via Gcc-patches wrote:
> > Some compatibility implementations of x86 intrinsics include
> > Power intrinsics which require POWER8.  Guard them.
> 
> I assume this improves on all previous commented things (you don't say
> if it does).

Sorry, I summarized the changes in the v4 cover letter. This patch
required no changes other than adding a new PR addressed by it.
The reasons for making no changes were in my reply to your review of v3.

> > gcc
> > 	PR target/101893
> > 	PR target/102719
> > 	* config/rs6000/emmintrin.h: Guard POWER8 intrinsics.
> > 	* config/rs6000/pmmintrin.h: Same.
> > 	* config/rs6000/smmintrin.h: Same.
> > 	* config/rs6000/tmmintrin.h: Same.
> 
> Okay for trunk.  Thanks!

Thanks!

PC

* Re: [PING PATCH v4 2/3] rs6000: Support SSE4.1 "round" intrinsics
  2021-10-19  1:15 ` [PATCH v4 2/3] rs6000: Support SSE4.1 "round" intrinsics Paul A. Clarke
@ 2021-10-26 20:00   ` Paul A. Clarke
  2021-11-08 17:40     ` [PING^2 " Paul A. Clarke
  0 siblings, 1 reply; 13+ messages in thread
From: Paul A. Clarke @ 2021-10-26 20:00 UTC (permalink / raw)
  To: segher; +Cc: gcc-patches, wschmidt

Patches 1/3 and 3/3 have been committed.
This is only a ping for 2/3.

On Mon, Oct 18, 2021 at 08:15:11PM -0500, Paul A. Clarke via Gcc-patches wrote:
> Suppress exceptions (when specified), by saving, manipulating, and
> restoring the FPSCR.  Similarly, save, set, and restore the floating-point
> rounding mode when required.
> 
> No attempt is made to optimize writing the FPSCR (by checking if the new
> value would be the same), other than using lighter weight instructions
> when possible. Note that explicit instruction scheduling "barriers" are
> added to prevent floating-point computations from being moved before or
> after the explicit FPSCR manipulations.  (That these are required has
> been reported as an issue in GCC: PR102783.)
> 
> The scalar versions naively use the parallel versions to compute the
> single scalar result and then construct the remainder of the result.
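
A small sketch of that "compute in parallel, then reassemble" shape, using GCC/Clang vector extensions so it builds on any target. The typedef and function names here are hypothetical, and the stand-in parallel operation is a simple truncation (valid for magnitudes below 2^63) rather than the patch's _mm_round_pd.

```c
#include <assert.h>

typedef double v2df __attribute__ ((vector_size (16)));

/* Stand-in for the "parallel" operation: truncate both lanes
   toward zero.  */
static v2df
trunc_both_lanes (v2df a)
{
  v2df r = { (double) (long long) a[0], (double) (long long) a[1] };
  return r;
}

/* Scalar form: lane 0 comes from the rounded B, lane 1 is passed
   through unchanged from A.  */
static v2df
trunc_low_lane (v2df a, v2df b)
{
  v2df rounded = trunc_both_lanes (b);
  v2df r = { rounded[0], a[1] };
  return r;
}
```

The upper lane of B is rounded too and then discarded, which is the "naive" cost the description mentions.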
> 
> Of minor note, the values of _MM_FROUND_TO_NEG_INF and _MM_FROUND_TO_ZERO
> are swapped from the corresponding values on x86 so as to match the
> corresponding rounding mode values in the Power ISA.
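
To make the consequence concrete: code that uses the symbolic names is unaffected, but code that passes x86 numeric literals would select a different mode. The sketch below shows the two encodings side by side. The enum and function names are hypothetical. The x86 values are those of the x86 smmintrin.h, and the Power values are the ones defined in this patch.

```c
#include <assert.h>

/* Rounding-direction encodings on x86.  */
enum { X86_NEAREST = 0x00, X86_NEG_INF = 0x01,
       X86_POS_INF = 0x02, X86_ZERO = 0x03 };

/* Encodings used by this patch, matching the FPSCR RN field.  */
enum { PPC_NEAREST = 0x00, PPC_ZERO = 0x01,
       PPC_POS_INF = 0x02, PPC_NEG_INF = 0x03 };

/* Hypothetical translation for code that hard-coded x86 literals.  */
static int
x86_literal_to_power (int x86_mode)
{
  static const int map[4] =
    { PPC_NEAREST, PPC_NEG_INF, PPC_POS_INF, PPC_ZERO };
  return map[x86_mode & 0x3];
}
```

Only the two middle entries differ in meaning: the literal 0x01 means "toward negative infinity" on x86 but "toward zero" here, and 0x03 the reverse.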
> 
> Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
> convert _mm_ceil* and _mm_floor* into macros. This matches the current
> analogous implementations in config/i386/smmintrin.h.
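
The macro-over-a-general-function arrangement looks roughly like this in miniature. Everything named here (round_small, MODE_FLOOR, MODE_CEIL, my_ceil, my_floor) is hypothetical, and round_small is a toy scalar stand-in that only handles magnitudes below 2^63.

```c
#include <assert.h>

enum { MODE_FLOOR, MODE_CEIL };

/* Toy general-purpose rounding function selected by MODE.  */
static double
round_small (double x, int mode)
{
  double t = (double) (long long) x;	/* Truncate toward zero.  */

  switch (mode)
    {
    case MODE_FLOOR:
      return t > x ? t - 1.0 : t;
    case MODE_CEIL:
      return t < x ? t + 1.0 : t;
    }
  return t;
}

/* The specific operations become thin macros over the general one.  */
#define my_ceil(V)  round_small ((V), MODE_CEIL)
#define my_floor(V) round_small ((V), MODE_FLOOR)
```

One visible difference from functions is that a macro has no address, so code that takes a pointer to the ceil or floor intrinsic would no longer compile.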
> 
> Function signatures match the analogous functions in config/i386/smmintrin.h.
> 
> Add tests for _mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss,
> modeled after the very similar "floor" and "ceil" tests.
> 
> Include basic tests, plus tests at the boundaries for floating-point
> representation, positive and negative, test all of the parameterized
> rounding modes as well as the C99 rounding modes and interactions
> between the two.
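
As a rough illustration of why those boundary values are interesting: for IEEE binary64, adjacent values are 0.5 apart just above 2^51 and 1.0 apart from 2^52 on, so inputs near these powers are the last ones that can still carry a fractional part. The sketch below assumes IEEE 754 doubles and the default round-to-nearest-even mode. The helper name is hypothetical.

```c
#include <assert.h>

/* Round X (0 <= X < 2^52) to an integer in the current rounding
   mode by adding and subtracting 2^52.  The volatile temporary
   keeps the compiler from simplifying the expression.  */
static double
round_via_magic (double x)
{
  volatile double t = x + 0x1p52;
  return t - 0x1p52;
}
```

For instance, 0x1.fffffffffffffp+51 (that is, 2^52 - 0.5) is a tie and rounds to the even neighbor 0x1p+52, while 0x1.0000000000001p+51 (2^51 + 0.5) rounds down to 0x1p+51, consistent with the NEAREST_INT answers in the tests.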
> 
> Exceptions are not explicitly tested.
> 
> 2021-10-18  Paul A. Clarke  <pc@us.ibm.com>
> 
> gcc
> 	* config/rs6000/smmintrin.h (_mm_round_pd, _mm_round_ps,
> 	_mm_round_sd, _mm_round_ss, _MM_FROUND_TO_NEAREST_INT,
> 	_MM_FROUND_TO_ZERO, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_NEG_INF,
> 	_MM_FROUND_CUR_DIRECTION, _MM_FROUND_RAISE_EXC, _MM_FROUND_NO_EXC,
> 	_MM_FROUND_NINT, _MM_FROUND_FLOOR, _MM_FROUND_CEIL, _MM_FROUND_TRUNC,
> 	_MM_FROUND_RINT, _MM_FROUND_NEARBYINT): New.
> 	* config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd,
> 	_mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss):
> 	Convert from function to macro.
> 
> gcc/testsuite
> 	* gcc.target/powerpc/sse4_1-round3.h: New.
> 	* gcc.target/powerpc/sse4_1-roundpd.c: New.
> 	* gcc.target/powerpc/sse4_1-roundps.c: New.
> 	* gcc.target/powerpc/sse4_1-roundsd.c: New.
> 	* gcc.target/powerpc/sse4_1-roundss.c: New.
> ---
>  gcc/config/rs6000/smmintrin.h                 | 292 ++++++++++++++----
>  .../gcc.target/powerpc/sse4_1-round3.h        |  81 +++++
>  .../gcc.target/powerpc/sse4_1-roundpd.c       | 143 +++++++++
>  .../gcc.target/powerpc/sse4_1-roundps.c       |  98 ++++++
>  .../gcc.target/powerpc/sse4_1-roundsd.c       | 256 +++++++++++++++
>  .../gcc.target/powerpc/sse4_1-roundss.c       | 208 +++++++++++++
>  6 files changed, 1014 insertions(+), 64 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c
> 
> diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
> index 90ce03d22709..6bb03e6e20ac 100644
> --- a/gcc/config/rs6000/smmintrin.h
> +++ b/gcc/config/rs6000/smmintrin.h
> @@ -42,6 +42,234 @@
>  #include <altivec.h>
>  #include <tmmintrin.h>
>  
> +/* Rounding mode macros. */
> +#define _MM_FROUND_TO_NEAREST_INT       0x00
> +#define _MM_FROUND_TO_ZERO              0x01
> +#define _MM_FROUND_TO_POS_INF           0x02
> +#define _MM_FROUND_TO_NEG_INF           0x03
> +#define _MM_FROUND_CUR_DIRECTION        0x04
> +
> +#define _MM_FROUND_NINT		\
> +  (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)
> +#define _MM_FROUND_FLOOR	\
> +  (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC)
> +#define _MM_FROUND_CEIL		\
> +  (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC)
> +#define _MM_FROUND_TRUNC	\
> +  (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC)
> +#define _MM_FROUND_RINT		\
> +  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC)
> +#define _MM_FROUND_NEARBYINT	\
> +  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC)
> +
> +#define _MM_FROUND_RAISE_EXC            0x00
> +#define _MM_FROUND_NO_EXC               0x08
> +
> +extern __inline __m128d
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_round_pd (__m128d __A, int __rounding)
> +{
> +  __v2df __r;
> +  union {
> +    double __fr;
> +    long long __fpscr;
> +  } __enables_save, __fpscr_save;
> +
> +  if (__rounding & _MM_FROUND_NO_EXC)
> +    {
> +      /* Save enabled exceptions, disable all exceptions,
> +	 and preserve the rounding mode.  */
> +#ifdef _ARCH_PWR9
> +      __asm__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
> +      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
> +#else
> +      __fpscr_save.__fr = __builtin_mffs ();
> +      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
> +      __fpscr_save.__fpscr &= ~0xf8;
> +      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
> +#endif
> +      /* Insert an artificial "read/write" reference to the variable
> +	 read below, to ensure the compiler does not schedule
> +	 a read/use of the variable before the FPSCR is modified, above.
> +	 This can be removed if and when GCC PR102783 is fixed.
> +       */
> +      __asm__ ("" : "+wa" (__A));
> +    }
> +
> +  switch (__rounding)
> +    {
> +      case _MM_FROUND_TO_NEAREST_INT:
> +	__fpscr_save.__fr = __builtin_mffsl ();
> +	__attribute__ ((fallthrough));
> +      case _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC:
> +	__builtin_set_fpscr_rn (0b00);
> +	/* Insert an artificial "read/write" reference to the variable
> +	   read below, to ensure the compiler does not schedule
> +	   a read/use of the variable before the FPSCR is modified, above.
> +	   This can be removed if and when GCC PR102783 is fixed.
> +	 */
> +	__asm__ ("" : "+wa" (__A));
> +
> +	__r = vec_rint ((__v2df) __A);
> +
> +	/* Insert an artificial "read" reference to the variable written
> +	   above, to ensure the compiler does not schedule the computation
> +	   of the value after the manipulation of the FPSCR, below.
> +	   This can be removed if and when GCC PR102783 is fixed.
> +	 */
> +	__asm__ ("" : : "wa" (__r));
> +	__builtin_set_fpscr_rn (__fpscr_save.__fpscr);
> +	break;
> +      case _MM_FROUND_TO_NEG_INF:
> +      case _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC:
> +	__r = vec_floor ((__v2df) __A);
> +	break;
> +      case _MM_FROUND_TO_POS_INF:
> +      case _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC:
> +	__r = vec_ceil ((__v2df) __A);
> +	break;
> +      case _MM_FROUND_TO_ZERO:
> +      case _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC:
> +	__r = vec_trunc ((__v2df) __A);
> +	break;
> +      case _MM_FROUND_CUR_DIRECTION:
> +	__r = vec_rint ((__v2df) __A);
> +	break;
> +    }
> +  if (__rounding & _MM_FROUND_NO_EXC)
> +    {
> +      /* Insert an artificial "read" reference to the variable written
> +	 above, to ensure the compiler does not schedule the computation
> +	 of the value after the manipulation of the FPSCR, below.
> +	 This can be removed if and when GCC PR102783 is fixed.
> +       */
> +      __asm__ ("" : : "wa" (__r));
> +      /* Restore enabled exceptions.  */
> +      __fpscr_save.__fr = __builtin_mffsl ();
> +      __fpscr_save.__fpscr |= __enables_save.__fpscr;
> +      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
> +    }
> +  return (__m128d) __r;
> +}
> +
> +extern __inline __m128d
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_round_sd (__m128d __A, __m128d __B, int __rounding)
> +{
> +  __B = _mm_round_pd (__B, __rounding);
> +  __v2df __r = { ((__v2df) __B)[0], ((__v2df) __A)[1] };
> +  return (__m128d) __r;
> +}
> +
> +extern __inline __m128
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_round_ps (__m128 __A, int __rounding)
> +{
> +  __v4sf __r;
> +  union {
> +    double __fr;
> +    long long __fpscr;
> +  } __enables_save, __fpscr_save;
> +
> +  if (__rounding & _MM_FROUND_NO_EXC)
> +    {
> +      /* Save enabled exceptions, disable all exceptions,
> +	 and preserve the rounding mode.  */
> +#ifdef _ARCH_PWR9
> +      __asm__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
> +      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
> +#else
> +      __fpscr_save.__fr = __builtin_mffs ();
> +      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
> +      __fpscr_save.__fpscr &= ~0xf8;
> +      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
> +#endif
> +      /* Insert an artificial "read/write" reference to the variable
> +	 read below, to ensure the compiler does not schedule
> +	 a read/use of the variable before the FPSCR is modified, above.
> +	 This can be removed if and when GCC PR102783 is fixed.
> +       */
> +      __asm__ ("" : "+wa" (__A));
> +    }
> +
> +  switch (__rounding)
> +    {
> +      case _MM_FROUND_TO_NEAREST_INT:
> +	__fpscr_save.__fr = __builtin_mffsl ();
> +	__attribute__ ((fallthrough));
> +      case _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC:
> +	__builtin_set_fpscr_rn (0b00);
> +	/* Insert an artificial "read/write" reference to the variable
> +	   read below, to ensure the compiler does not schedule
> +	   a read/use of the variable before the FPSCR is modified, above.
> +	   This can be removed if and when GCC PR102783 is fixed.
> +	 */
> +	__asm__ ("" : "+wa" (__A));
> +
> +	__r = vec_rint ((__v4sf) __A);
> +
> +	/* Insert an artificial "read" reference to the variable written
> +	   above, to ensure the compiler does not schedule the computation
> +	   of the value after the manipulation of the FPSCR, below.
> +	   This can be removed if and when GCC PR102783 is fixed.
> +	 */
> +	__asm__ ("" : : "wa" (__r));
> +	__builtin_set_fpscr_rn (__fpscr_save.__fpscr);
> +	break;
> +      case _MM_FROUND_TO_NEG_INF:
> +      case _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC:
> +	__r = vec_floor ((__v4sf) __A);
> +	break;
> +      case _MM_FROUND_TO_POS_INF:
> +      case _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC:
> +	__r = vec_ceil ((__v4sf) __A);
> +	break;
> +      case _MM_FROUND_TO_ZERO:
> +      case _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC:
> +	__r = vec_trunc ((__v4sf) __A);
> +	break;
> +      case _MM_FROUND_CUR_DIRECTION:
> +	__r = vec_rint ((__v4sf) __A);
> +	break;
> +    }
> +  if (__rounding & _MM_FROUND_NO_EXC)
> +    {
> +      /* Insert an artificial "read" reference to the variable written
> +	 above, to ensure the compiler does not schedule the computation
> +	 of the value after the manipulation of the FPSCR, below.
> +	 This can be removed if and when GCC PR102783 is fixed.
> +       */
> +      __asm__ ("" : : "wa" (__r));
> +      /* Restore enabled exceptions.  */
> +      __fpscr_save.__fr = __builtin_mffsl ();
> +      __fpscr_save.__fpscr |= __enables_save.__fpscr;
> +      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
> +    }
> +  return (__m128) __r;
> +}
> +
> +extern __inline __m128
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_round_ss (__m128 __A, __m128 __B, int __rounding)
> +{
> +  __B = _mm_round_ps (__B, __rounding);
> +  __v4sf __r = (__v4sf) __A;
> +  __r[0] = ((__v4sf) __B)[0];
> +  return (__m128) __r;
> +}
> +
> +#define _mm_ceil_pd(V)	   _mm_round_pd ((V), _MM_FROUND_CEIL)
> +#define _mm_ceil_sd(D, V)  _mm_round_sd ((D), (V), _MM_FROUND_CEIL)
> +
> +#define _mm_floor_pd(V)	   _mm_round_pd ((V), _MM_FROUND_FLOOR)
> +#define _mm_floor_sd(D, V) _mm_round_sd ((D), (V), _MM_FROUND_FLOOR)
> +
> +#define _mm_ceil_ps(V)	   _mm_round_ps ((V), _MM_FROUND_CEIL)
> +#define _mm_ceil_ss(D, V)  _mm_round_ss ((D), (V), _MM_FROUND_CEIL)
> +
> +#define _mm_floor_ps(V)	   _mm_round_ps ((V), _MM_FROUND_FLOOR)
> +#define _mm_floor_ss(D, V) _mm_round_ss ((D), (V), _MM_FROUND_FLOOR)
> +
>  extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_insert_epi8 (__m128i const __A, int const __D, int const __N)
>  {
> @@ -210,70 +438,6 @@ _mm_testnzc_si128 (__m128i __A, __m128i __B)
>  
>  #define _mm_test_mix_ones_zeros(M, V) _mm_testnzc_si128 ((M), (V))
>  
> -__inline __m128d
> -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm_ceil_pd (__m128d __A)
> -{
> -  return (__m128d) vec_ceil ((__v2df) __A);
> -}
> -
> -__inline __m128d
> -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm_ceil_sd (__m128d __A, __m128d __B)
> -{
> -  __v2df __r = vec_ceil ((__v2df) __B);
> -  __r[1] = ((__v2df) __A)[1];
> -  return (__m128d) __r;
> -}
> -
> -__inline __m128d
> -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm_floor_pd (__m128d __A)
> -{
> -  return (__m128d) vec_floor ((__v2df) __A);
> -}
> -
> -__inline __m128d
> -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm_floor_sd (__m128d __A, __m128d __B)
> -{
> -  __v2df __r = vec_floor ((__v2df) __B);
> -  __r[1] = ((__v2df) __A)[1];
> -  return (__m128d) __r;
> -}
> -
> -__inline __m128
> -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm_ceil_ps (__m128 __A)
> -{
> -  return (__m128) vec_ceil ((__v4sf) __A);
> -}
> -
> -__inline __m128
> -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm_ceil_ss (__m128 __A, __m128 __B)
> -{
> -  __v4sf __r = (__v4sf) __A;
> -  __r[0] = __builtin_ceil (((__v4sf) __B)[0]);
> -  return __r;
> -}
> -
> -__inline __m128
> -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm_floor_ps (__m128 __A)
> -{
> -  return (__m128) vec_floor ((__v4sf) __A);
> -}
> -
> -__inline __m128
> -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm_floor_ss (__m128 __A, __m128 __B)
> -{
> -  __v4sf __r = (__v4sf) __A;
> -  __r[0] = __builtin_floor (((__v4sf) __B)[0]);
> -  return __r;
> -}
> -
>  #ifdef _ARCH_PWR8
>  extern __inline __m128i
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h b/gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
> new file mode 100644
> index 000000000000..de6cbf7be438
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
> @@ -0,0 +1,81 @@
> +#include <smmintrin.h>
> +#include <fenv.h>
> +#include "sse4_1-check.h"
> +
> +#define DIM(a) (sizeof (a) / sizeof (a)[0])
> +
> +static int roundings[] =
> +  {
> +    _MM_FROUND_TO_NEAREST_INT,
> +    _MM_FROUND_TO_NEG_INF,
> +    _MM_FROUND_TO_POS_INF,
> +    _MM_FROUND_TO_ZERO,
> +    _MM_FROUND_CUR_DIRECTION
> +  };
> +
> +static int modes[] =
> +  {
> +    FE_TONEAREST,
> +    FE_UPWARD,
> +    FE_DOWNWARD,
> +    FE_TOWARDZERO
> +  };
> +
> +static void
> +TEST (void)
> +{
> +  int i, j, ri, mi, round_save;
> +
> +  round_save = fegetround ();
> +  for (mi = 0; mi < DIM (modes); mi++) {
> +    fesetround (modes[mi]);
> +    for (i = 0; i < DIM (data); i++) {
> +      for (ri = 0; ri < DIM (roundings); ri++) {
> +	union value guess;
> +	union value *current_answers = answers[ri];
> +	switch ( roundings[ri] ) {
> +	  case _MM_FROUND_TO_NEAREST_INT:
> +	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
> +				    _MM_FROUND_TO_NEAREST_INT);
> +	    break;
> +	  case _MM_FROUND_TO_NEG_INF:
> +	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
> +				    _MM_FROUND_TO_NEG_INF);
> +	    break;
> +	  case _MM_FROUND_TO_POS_INF:
> +	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
> +				    _MM_FROUND_TO_POS_INF);
> +	    break;
> +	  case _MM_FROUND_TO_ZERO:
> +	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
> +				    _MM_FROUND_TO_ZERO);
> +	    break;
> +	  case _MM_FROUND_CUR_DIRECTION:
> +	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
> +				    _MM_FROUND_CUR_DIRECTION);
> +	    switch ( modes[mi] ) {
> +	      case FE_TONEAREST:
> +		current_answers = answers_NEAREST_INT;
> +		break;
> +	      case FE_UPWARD:
> +		current_answers = answers_POS_INF;
> +		break;
> +	      case FE_DOWNWARD:
> +		current_answers = answers_NEG_INF;
> +		break;
> +	      case FE_TOWARDZERO:
> +		current_answers = answers_ZERO;
> +		break;
> +	    }
> +	    break;
> +	  default:
> +	    abort ();
> +	}
> +	for (j = 0; j < DIM (guess.f); j++)
> +	  if (guess.f[j] != current_answers[i].f[j])
> +	    abort ();
> +      }
> +    }
> +  }
> +  fesetround (round_save);
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
> new file mode 100644
> index 000000000000..58d9cc524167
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
> @@ -0,0 +1,143 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vsx_hw } */
> +/* { dg-options "-O2 -mvsx" } */
> +
> +#define NO_WARN_X86_INTRINSICS 1
> +#include <smmintrin.h>
> +
> +#define VEC_T __m128d
> +#define FP_T double
> +
> +#define ROUND_INTRIN(x, ignored, mode) _mm_round_pd (x, mode)
> +
> +#include "sse4_1-round-data.h"
> +
> +struct data2 data[] = {
> +  { .value1 = { .f = {  0.00,  0.25 } } },
> +  { .value1 = { .f = {  0.50,  0.75 } } },
> +
> +  { .value1 = { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffdp+50 } } },
> +  { .value1 = { .f = {  0x1.ffffffffffffep+50,  0x1.fffffffffffffp+50 } } },
> +  { .value1 = { .f = {  0x1.0000000000000p+51,  0x1.0000000000001p+51 } } },
> +  { .value1 = { .f = {  0x1.0000000000002p+51,  0x1.0000000000003p+51 } } },
> +
> +  { .value1 = { .f = {  0x1.ffffffffffffep+51,  0x1.fffffffffffffp+51 } } },
> +  { .value1 = { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } } },
> +
> +  { .value1 = { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } } },
> +  { .value1 = { .f = { -0x1.fffffffffffffp+51, -0x1.ffffffffffffep+51 } } },
> +
> +  { .value1 = { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } } },
> +  { .value1 = { .f = { -0x1.0000000000001p+51, -0x1.0000000000000p+51 } } },
> +  { .value1 = { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffep+50 } } },
> +  { .value1 = { .f = { -0x1.ffffffffffffdp+50, -0x1.ffffffffffffcp+50 } } },
> +
> +  { .value1 = { .f = { -1.00, -0.75 } } },
> +  { .value1 = { .f = { -0.50, -0.25 } } }
> +};
> +
> +union value answers_NEAREST_INT[] = {
> +  { .f = {  0.00,  0.00 } },
> +  { .f = {  0.00,  1.00 } },
> +
> +  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
> +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
> +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
> +  { .f = {  0x1.0000000000002p+51,  0x1.0000000000004p+51 } },
> +
> +  { .f = {  0x1.ffffffffffffep+51,  0x1.0000000000000p+52 } },
> +  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
> +
> +  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
> +  { .f = { -0x1.0000000000000p+52, -0x1.ffffffffffffep+51 } },
> +
> +  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
> +  { .f = { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } },
> +  { .f = { -0x1.ffffffffffffcp+50, -0x1.0000000000000p+51 } },
> +  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
> +
> +  { .f = { -1.00, -1.00 } },
> +  { .f = {  0.00,  0.00 } }
> +};
> +
> +union value answers_NEG_INF[] = {
> +  { .f = {  0.00,  0.00 } },
> +  { .f = {  0.00,  0.00 } },
> +
> +  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
> +  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
> +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
> +  { .f = {  0x1.0000000000002p+51,  0x1.0000000000002p+51 } },
> +
> +  { .f = {  0x1.ffffffffffffep+51,  0x1.ffffffffffffep+51 } },
> +  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
> +
> +  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
> +  { .f = { -0x1.0000000000000p+52, -0x1.ffffffffffffep+51 } },
> +
> +  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
> +  { .f = { -0x1.0000000000002p+51, -0x1.0000000000000p+51 } },
> +  { .f = { -0x1.ffffffffffffcp+50, -0x1.0000000000000p+51 } },
> +  { .f = { -0x1.0000000000000p+51, -0x1.ffffffffffffcp+50 } },
> +
> +  { .f = { -1.00, -1.00 } },
> +  { .f = { -1.00, -1.00 } }
> +};
> +
> +union value answers_POS_INF[] = {
> +  { .f = {  0.00,  1.00 } },
> +  { .f = {  1.00,  1.00 } },
> +
> +  { .f = {  0x1.ffffffffffffcp+50,  0x1.0000000000000p+51 } },
> +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
> +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000002p+51 } },
> +  { .f = {  0x1.0000000000002p+51,  0x1.0000000000004p+51 } },
> +
> +  { .f = {  0x1.ffffffffffffep+51,  0x1.0000000000000p+52 } },
> +  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
> +
> +  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
> +  { .f = { -0x1.ffffffffffffep+51, -0x1.ffffffffffffep+51 } },
> +
> +  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
> +  { .f = { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } },
> +  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
> +  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
> +
> +  { .f = { -1.00,  0.00 } },
> +  { .f = {  0.00,  0.00 } }
> +};
> +
> +union value answers_ZERO[] = {
> +  { .f = {  0.00,  0.00 } },
> +  { .f = {  0.00,  0.00 } },
> +
> +  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
> +  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
> +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
> +  { .f = {  0x1.0000000000002p+51,  0x1.0000000000002p+51 } },
> +
> +  { .f = {  0x1.ffffffffffffep+51,  0x1.ffffffffffffep+51 } },
> +  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
> +
> +  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
> +  { .f = { -0x1.ffffffffffffep+51, -0x1.ffffffffffffep+51 } },
> +
> +  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
> +  { .f = { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } },
> +  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
> +  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
> +
> +  { .f = { -1.00,  0.00 } },
> +  { .f = {  0.00,  0.00 } }
> +};
> +
> +union value *answers[] = {
> +  answers_NEAREST_INT,
> +  answers_NEG_INF,
> +  answers_POS_INF,
> +  answers_ZERO,
> +  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
> +};
> +
> +#include "sse4_1-round3.h"
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
> new file mode 100644
> index 000000000000..4b0366dfddf3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
> @@ -0,0 +1,98 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vsx_hw } */
> +/* { dg-options "-O2 -mvsx" } */
> +
> +#define NO_WARN_X86_INTRINSICS 1
> +#include <smmintrin.h>
> +
> +#define VEC_T __m128
> +#define FP_T float
> +
> +#define ROUND_INTRIN(x, ignored, mode) _mm_round_ps (x, mode)
> +
> +#include "sse4_1-round-data.h"
> +
> +struct data2 data[] = {
> +  { .value1 = { .f = {  0.00,  0.25,  0.50,  0.75 } } },
> +
> +  { .value1 = { .f = {  0x1.fffff8p+21,  0x1.fffffap+21,
> +			0x1.fffffcp+21,  0x1.fffffep+21 } } },
> +  { .value1 = { .f = {  0x1.fffffap+22,  0x1.fffffcp+22,
> +			0x1.fffffep+22,  0x1.fffffep+23 } } },
> +  { .value1 = { .f = { -0x1.fffffep+23, -0x1.fffffep+22,
> +		       -0x1.fffffcp+22, -0x1.fffffap+22 } } },
> +  { .value1 = { .f = { -0x1.fffffep+21, -0x1.fffffcp+21,
> +		       -0x1.fffffap+21, -0x1.fffff8p+21 } } },
> +
> +  { .value1 = { .f = { -1.00, -0.75, -0.50, -0.25 } } }
> +};
> +
> +union value answers_NEAREST_INT[] = {
> +  { .f = {  0.00,  0.00,  0.00,  1.00 } },
> +
> +  { .f = {  0x1.fffff8p+21,  0x1.fffff8p+21,
> +            0x1.000000p+22,  0x1.000000p+22 } },
> +  { .f = {  0x1.fffff8p+22,  0x1.fffffcp+22,
> +            0x1.000000p+23,  0x1.fffffep+23 } },
> +  { .f = { -0x1.fffffep+23, -0x1.000000p+23,
> +           -0x1.fffffcp+22, -0x1.fffff8p+22 } },
> +  { .f = { -0x1.000000p+22, -0x1.000000p+22,
> +           -0x1.fffff8p+21, -0x1.fffff8p+21 } },
> +
> +  { .f = { -1.00, -1.00,  0.00,  0.00 } }
> +};
> +
> +union value answers_NEG_INF[] = {
> +  { .f = {  0.00,  0.00,  0.00,  0.00 } },
> +
> +  { .f = {  0x1.fffff8p+21,  0x1.fffff8p+21,
> +            0x1.fffff8p+21,  0x1.fffff8p+21 } },
> +  { .f = {  0x1.fffff8p+22,  0x1.fffffcp+22,
> +            0x1.fffffcp+22,  0x1.fffffep+23 } },
> +  { .f = { -0x1.fffffep+23, -0x1.000000p+23,
> +           -0x1.fffffcp+22, -0x1.fffffcp+22 } },
> +  { .f = { -0x1.000000p+22, -0x1.000000p+22,
> +           -0x1.000000p+22, -0x1.fffff8p+21 } },
> +
> +  { .f = { -1.00, -1.00, -1.00, -1.00 } }
> +};
> +
> +union value answers_POS_INF[] = {
> +  { .f = {  0.00,  1.00,  1.00,  1.00 } },
> +
> +  { .f = {  0x1.fffff8p+21,  0x1.000000p+22,
> +            0x1.000000p+22,  0x1.000000p+22 } },
> +  { .f = {  0x1.fffffcp+22,  0x1.fffffcp+22,
> +            0x1.000000p+23,  0x1.fffffep+23 } },
> +  { .f = { -0x1.fffffep+23, -0x1.fffffcp+22,
> +           -0x1.fffffcp+22, -0x1.fffff8p+22 } },
> +  { .f = { -0x1.fffff8p+21, -0x1.fffff8p+21,
> +           -0x1.fffff8p+21, -0x1.fffff8p+21 } },
> +
> +  { .f = { -1.00,  0.00,  0.00,  0.00 } }
> +};
> +
> +union value answers_ZERO[] = {
> +  { .f = {  0.00,  0.00,  0.00,  0.00 } },
> +
> +  { .f = {  0x1.fffff8p+21,  0x1.fffff8p+21,
> +            0x1.fffff8p+21,  0x1.fffff8p+21 } },
> +  { .f = {  0x1.fffff8p+22,  0x1.fffffcp+22,
> +            0x1.fffffcp+22,  0x1.fffffep+23 } },
> +  { .f = { -0x1.fffffep+23, -0x1.fffffcp+22,
> +           -0x1.fffffcp+22, -0x1.fffff8p+22 } },
> +  { .f = { -0x1.fffff8p+21, -0x1.fffff8p+21,
> +           -0x1.fffff8p+21, -0x1.fffff8p+21 } },
> +
> +  { .f = { -1.00,  0.00,  0.00,  0.00 } }
> +};
> +
> +union value *answers[] = {
> +  answers_NEAREST_INT,
> +  answers_NEG_INF,
> +  answers_POS_INF,
> +  answers_ZERO,
> +  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
> +};
> +
> +#include "sse4_1-round3.h"
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
> new file mode 100644
> index 000000000000..4f8d9e08c93d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
> @@ -0,0 +1,256 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vsx_hw } */
> +/* { dg-options "-O2 -mvsx" } */
> +
> +#include <stdio.h>
> +#define NO_WARN_X86_INTRINSICS 1
> +#include <smmintrin.h>
> +
> +#define VEC_T __m128d
> +#define FP_T double
> +
> +#define ROUND_INTRIN(x, y, mode) _mm_round_sd (x, y, mode)
> +
> +#include "sse4_1-round-data.h"
> +
> +static struct data2 data[] = {
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0.00, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0.25, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0.50, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0.75, IGNORED } } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.ffffffffffffcp+50, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.ffffffffffffdp+50, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.ffffffffffffep+50, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffffffffffp+50, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.0000000000000p+51, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.0000000000001p+51, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.0000000000002p+51, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.0000000000003p+51, IGNORED } } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.ffffffffffffep+51, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffffffffffp+51, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.0000000000000p+52, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.0000000000001p+52, IGNORED } } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.0000000000001p+52, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.0000000000000p+52, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffffffffffp+51, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.ffffffffffffep+51, IGNORED } } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.0000000000004p+51, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.0000000000002p+51, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.0000000000001p+51, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.0000000000000p+51, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.ffffffffffffcp+50, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.ffffffffffffep+50, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.ffffffffffffdp+50, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.ffffffffffffcp+50, IGNORED } } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -1.00, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0.75, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0.50, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> +    .value2 = { .f = { -0.25, IGNORED } } }
> +};
> +
> +static union value answers_NEAREST_INT[] = {
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  1.00, PASSTHROUGH } },
> +
> +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000004p+51, PASSTHROUGH } },
> +
> +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
> +
> +  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> +
> +  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +
> +  { .f = { -1.00, PASSTHROUGH } },
> +  { .f = { -1.00, PASSTHROUGH } },
> +  { .f = { -0.00, PASSTHROUGH } },
> +  { .f = { -0.00, PASSTHROUGH } }
> +};
> +
> +static union value answers_NEG_INF[] = {
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } },
> +
> +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> +
> +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
> +
> +  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> +
> +  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +
> +  { .f = { -1.00, PASSTHROUGH } },
> +  { .f = { -1.00, PASSTHROUGH } },
> +  { .f = { -1.00, PASSTHROUGH } },
> +  { .f = { -1.00, PASSTHROUGH } }
> +};
> +
> +static union value answers_POS_INF[] = {
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  1.00, PASSTHROUGH } },
> +  { .f = {  1.00, PASSTHROUGH } },
> +  { .f = {  1.00, PASSTHROUGH } },
> +
> +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000004p+51, PASSTHROUGH } },
> +
> +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
> +
> +  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> +
> +  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +
> +  { .f = { -1.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } }
> +};
> +
> +static union value answers_ZERO[] = {
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } },
> +
> +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> +
> +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
> +
> +  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> +
> +  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> +
> +  { .f = { -1.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH } }
> +};
> +
> +union value *answers[] = {
> +  answers_NEAREST_INT,
> +  answers_NEG_INF,
> +  answers_POS_INF,
> +  answers_ZERO,
> +  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
> +};
> +
> +#include "sse4_1-round3.h"
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c
> new file mode 100644
> index 000000000000..d788ebda64dd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c
> @@ -0,0 +1,208 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vsx_hw } */
> +/* { dg-options "-O2 -mvsx" } */
> +
> +#include <stdio.h>
> +#define NO_WARN_X86_INTRINSICS 1
> +#include <smmintrin.h>
> +
> +#define VEC_T __m128
> +#define FP_T float
> +
> +#define ROUND_INTRIN(x, y, mode) _mm_round_ss (x, y, mode)
> +
> +#include "sse4_1-round-data.h"
> +
> +static struct data2 data[] = {
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0.00, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0.25, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0.50, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0.75, IGNORED, IGNORED, IGNORED } } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffff8p+21, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffap+21, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffcp+21, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffep+21, IGNORED, IGNORED, IGNORED } } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffap+22, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffcp+22, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffep+22, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = {  0x1.fffffep+23, IGNORED, IGNORED, IGNORED } } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffep+23, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffep+22, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffcp+22, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffap+22, IGNORED, IGNORED, IGNORED } } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffep+21, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffcp+21, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffffap+21, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0x1.fffff8p+21, IGNORED, IGNORED, IGNORED } } },
> +
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -1.00, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0.75, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0.50, IGNORED, IGNORED, IGNORED } } },
> +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +    .value2 = { .f = { -0.25, IGNORED, IGNORED, IGNORED } } }
> +};
> +
> +static union value answers_NEAREST_INT[] = {
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = {  0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
> +};
> +
> +static union value answers_NEG_INF[] = {
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = {  0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
> +};
> +
> +static union value answers_POS_INF[] = {
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
> +};
> +
> +static union value answers_ZERO[] = {
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = {  0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +
> +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
> +};
> +
> +union value *answers[] = {
> +  answers_NEAREST_INT,
> +  answers_NEG_INF,
> +  answers_POS_INF,
> +  answers_ZERO,
> +  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
> +};
> +
> +#include "sse4_1-round3.h"
> -- 
> 2.27.0
> 


* Re: [PING^2 PATCH v4 2/3] rs6000: Support SSE4.1 "round" intrinsics
  2021-10-26 20:00   ` [PING PATCH " Paul A. Clarke
@ 2021-11-08 17:40     ` Paul A. Clarke
  2021-11-19  2:24       ` [PING^3 " Paul A. Clarke
  0 siblings, 1 reply; 13+ messages in thread
From: Paul A. Clarke @ 2021-11-08 17:40 UTC (permalink / raw)
  To: segher, wschmidt, gcc-patches

On Tue, Oct 26, 2021 at 03:00:11PM -0500, Paul A. Clarke via Gcc-patches wrote:
> Patches 1/3 and 3/3 have been committed.
> This is only a ping for 2/3.

Gentle re-ping.

> On Mon, Oct 18, 2021 at 08:15:11PM -0500, Paul A. Clarke via Gcc-patches wrote:
> > Suppress exceptions (when specified), by saving, manipulating, and
> > restoring the FPSCR.  Similarly, save, set, and restore the floating-point
> > rounding mode when required.
> > 
> > No attempt is made to optimize writing the FPSCR (by checking if the new
> > value would be the same), other than using lighter weight instructions
> > when possible. Note that explicit instruction scheduling "barriers" are
> > added to prevent floating-point computations from being moved before or
> > after the explicit FPSCR manipulations.  (That these are required has
> > been reported as an issue in GCC: PR102783.)
> > 
> > The scalar versions naively use the parallel versions to compute the
> > single scalar result and then construct the remainder of the result.
> > 
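
To illustrate the lane handling described above, here is a rough, portable sketch. It is not the code in the patch: `vec4f`, `trunc_all_lanes` and `trunc_scalar` are hypothetical names, and the cast-based truncation only stands in for the real vector rounding (it assumes each value fits in an `int`).

```c
#include <stddef.h>

/* Hypothetical stand-in for a 4-lane float vector such as __m128.  */
typedef struct { float lane[4]; } vec4f;

/* Stand-in for the "parallel" operation: truncate every lane toward zero.
   Assumption: each value fits in an int, so the cast is well defined.  */
static vec4f
trunc_all_lanes (vec4f v)
{
  vec4f r;
  for (size_t i = 0; i < 4; i++)
    r.lane[i] = (float) (int) v.lane[i];
  return r;
}

/* Scalar form: round all of B, keep only lane 0 of that result,
   and take the remaining lanes from A unchanged.  */
static vec4f
trunc_scalar (vec4f a, vec4f b)
{
  vec4f rounded = trunc_all_lanes (b);
  vec4f r = a;
  r.lane[0] = rounded.lane[0];
  return r;
}
```

The three upper lanes of B are rounded and then discarded, which is the "naive" part: simple to write, at the cost of some wasted work.
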
> > Of minor note, the values of _MM_FROUND_TO_NEG_INF and _MM_FROUND_TO_ZERO
> > are swapped from the corresponding values on x86 so as to match the
> > corresponding rounding mode values in the Power ISA.
> > 
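
The practical consequence is that callers should pass the macro names rather than raw numbers. For code that already has a raw x86 immediate, a translation along these lines would be needed. This is only a sketch: the helper `x86_round_imm_to_ppc` and the `X86_`/`PPC_` enumerator names are made up for illustration and are not part of the patch. The numeric values are those from this patch and from config/i386/smmintrin.h.

```c
#include <assert.h>

/* Values defined by this patch (matching the Power ISA RN encoding).  */
enum
{
  PPC_FROUND_TO_NEAREST_INT = 0x00,
  PPC_FROUND_TO_ZERO = 0x01,
  PPC_FROUND_TO_POS_INF = 0x02,
  PPC_FROUND_TO_NEG_INF = 0x03
};

/* Values defined by config/i386/smmintrin.h.  */
enum
{
  X86_FROUND_TO_NEAREST_INT = 0x00,
  X86_FROUND_TO_NEG_INF = 0x01,
  X86_FROUND_TO_POS_INF = 0x02,
  X86_FROUND_TO_ZERO = 0x03
};

/* Hypothetical helper: translate a raw x86 rounding immediate into the
   encoding used here.  CUR_DIRECTION (0x04) and NO_EXC (0x08) have the
   same value on both targets, so those bits pass through unchanged.  */
static int
x86_round_imm_to_ppc (int imm)
{
  int mode = imm & 0x03;
  int flags = imm & 0x0c;

  assert ((imm & ~0x0f) == 0);

  if (mode == X86_FROUND_TO_NEG_INF)
    mode = PPC_FROUND_TO_NEG_INF;
  else if (mode == X86_FROUND_TO_ZERO)
    mode = PPC_FROUND_TO_ZERO;
  return flags | mode;
}
```

Source that uses only the `_MM_FROUND_*` names needs no such translation and is unaffected by the swap.
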
> > Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
> > convert _mm_ceil* and _mm_floor* into macros. This matches the current
> > analogous implementations in config/i386/smmintrin.h.
> > 
> > Function signatures match the analogous functions in config/i386/smmintrin.h.
> > 
> > Add tests for _mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss,
> > modeled after the very similar "floor" and "ceil" tests.
> > 
> > Include basic tests, plus tests at the boundaries for floating-point
> > representation, positive and negative.  Test all of the parameterized
> > rounding modes as well as the C99 rounding modes and interactions
> > between the two.
> > 
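
For readers wondering why the single-precision test data clusters around exponents +21 to +23: with a 24-bit significand, floats at exponent +21 are spaced 0.25 apart, at +22 they are spaced 0.5 apart, and at +23 they are spaced 1 apart, so that is where fractional parts stop being representable. A small sketch, where `has_fraction` is a hypothetical helper (not part of the tests) that assumes the value fits in an `int`:

```c
/* Hypothetical helper: nonzero if X has a fractional part.
   Assumption: |x| is small enough that the int cast is well defined.  */
static int
has_fraction (float x)
{
  return x != (float) (int) x;
}

/* Sample values taken from the _mm_round_ss test data:
     0x1.fffffep+21f  is 2^22 - 0.25  (fractional)
     0x1.fffffap+22f  is 2^23 - 1.5   (fractional, a tie case)
     0x1.fffffcp+22f  is 2^23 - 1     (already an integer)
     0x1.fffffep+23f  is 2^24 - 1     (already an integer)  */
```

Inputs just below each power of two therefore exercise the last values where each rounding mode can still give a different answer.
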
> > Exceptions are not explicitly tested.
> > 
> > 2021-10-18  Paul A. Clarke  <pc@us.ibm.com>
> > 
> > gcc
> > 	* config/rs6000/smmintrin.h (_mm_round_pd, _mm_round_ps,
> > 	_mm_round_sd, _mm_round_ss, _MM_FROUND_TO_NEAREST_INT,
> > 	_MM_FROUND_TO_ZERO, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_NEG_INF,
> > 	_MM_FROUND_CUR_DIRECTION, _MM_FROUND_RAISE_EXC, _MM_FROUND_NO_EXC,
> > 	_MM_FROUND_NINT, _MM_FROUND_FLOOR, _MM_FROUND_CEIL, _MM_FROUND_TRUNC,
> > 	_MM_FROUND_RINT, _MM_FROUND_NEARBYINT): New.
> > 	* config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd,
> > 	_mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss):
> > 	Convert from function to macro.
> > 
> > gcc/testsuite
> > 	* gcc.target/powerpc/sse4_1-round3.h: New.
> > 	* gcc.target/powerpc/sse4_1-roundpd.c: New.
> > 	* gcc.target/powerpc/sse4_1-roundps.c: New.
> > 	* gcc.target/powerpc/sse4_1-roundsd.c: New.
> > 	* gcc.target/powerpc/sse4_1-roundss.c: New.
> > ---
> >  gcc/config/rs6000/smmintrin.h                 | 292 ++++++++++++++----
> >  .../gcc.target/powerpc/sse4_1-round3.h        |  81 +++++
> >  .../gcc.target/powerpc/sse4_1-roundpd.c       | 143 +++++++++
> >  .../gcc.target/powerpc/sse4_1-roundps.c       |  98 ++++++
> >  .../gcc.target/powerpc/sse4_1-roundsd.c       | 256 +++++++++++++++
> >  .../gcc.target/powerpc/sse4_1-roundss.c       | 208 +++++++++++++
> >  6 files changed, 1014 insertions(+), 64 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c
> > 
> > diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
> > index 90ce03d22709..6bb03e6e20ac 100644
> > --- a/gcc/config/rs6000/smmintrin.h
> > +++ b/gcc/config/rs6000/smmintrin.h
> > @@ -42,6 +42,234 @@
> >  #include <altivec.h>
> >  #include <tmmintrin.h>
> >  
> > +/* Rounding mode macros. */
> > +#define _MM_FROUND_TO_NEAREST_INT       0x00
> > +#define _MM_FROUND_TO_ZERO              0x01
> > +#define _MM_FROUND_TO_POS_INF           0x02
> > +#define _MM_FROUND_TO_NEG_INF           0x03
> > +#define _MM_FROUND_CUR_DIRECTION        0x04
> > +
> > +#define _MM_FROUND_NINT		\
> > +  (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)
> > +#define _MM_FROUND_FLOOR	\
> > +  (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC)
> > +#define _MM_FROUND_CEIL		\
> > +  (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC)
> > +#define _MM_FROUND_TRUNC	\
> > +  (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC)
> > +#define _MM_FROUND_RINT		\
> > +  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC)
> > +#define _MM_FROUND_NEARBYINT	\
> > +  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC)
> > +
> > +#define _MM_FROUND_RAISE_EXC            0x00
> > +#define _MM_FROUND_NO_EXC               0x08
> > +
> > +extern __inline __m128d
> > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > +_mm_round_pd (__m128d __A, int __rounding)
> > +{
> > +  __v2df __r;
> > +  union {
> > +    double __fr;
> > +    long long __fpscr;
> > +  } __enables_save, __fpscr_save;
> > +
> > +  if (__rounding & _MM_FROUND_NO_EXC)
> > +    {
> > +      /* Save enabled exceptions, disable all exceptions,
> > +	 and preserve the rounding mode.  */
> > +#ifdef _ARCH_PWR9
> > +      __asm__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
> > +      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
> > +#else
> > +      __fpscr_save.__fr = __builtin_mffs ();
> > +      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
> > +      __fpscr_save.__fpscr &= ~0xf8;
> > +      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
> > +#endif
> > +      /* Insert an artificial "read/write" reference to the variable
> > +	 read below, to ensure the compiler does not schedule
> > +	 a read/use of the variable before the FPSCR is modified, above.
> > +	 This can be removed if and when GCC PR102783 is fixed.
> > +       */
> > +      __asm__ ("" : "+wa" (__A));
> > +    }
> > +
> > +  switch (__rounding)
> > +    {
> > +      case _MM_FROUND_TO_NEAREST_INT:
> > +	__fpscr_save.__fr = __builtin_mffsl ();
> > +	__attribute__ ((fallthrough));
> > +      case _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC:
> > +	__builtin_set_fpscr_rn (0b00);
> > +	/* Insert an artificial "read/write" reference to the variable
> > +	   read below, to ensure the compiler does not schedule
> > +	   a read/use of the variable before the FPSCR is modified, above.
> > +	   This can be removed if and when GCC PR102783 is fixed.
> > +	 */
> > +	__asm__ ("" : "+wa" (__A));
> > +
> > +	__r = vec_rint ((__v2df) __A);
> > +
> > +	/* Insert an artificial "read" reference to the variable written
> > +	   above, to ensure the compiler does not schedule the computation
> > +	   of the value after the manipulation of the FPSCR, below.
> > +	   This can be removed if and when GCC PR102783 is fixed.
> > +	 */
> > +	__asm__ ("" : : "wa" (__r));
> > +	__builtin_set_fpscr_rn (__fpscr_save.__fpscr);
> > +	break;
> > +      case _MM_FROUND_TO_NEG_INF:
> > +      case _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC:
> > +	__r = vec_floor ((__v2df) __A);
> > +	break;
> > +      case _MM_FROUND_TO_POS_INF:
> > +      case _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC:
> > +	__r = vec_ceil ((__v2df) __A);
> > +	break;
> > +      case _MM_FROUND_TO_ZERO:
> > +      case _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC:
> > +	__r = vec_trunc ((__v2df) __A);
> > +	break;
> > +      case _MM_FROUND_CUR_DIRECTION:
> > +	__r = vec_rint ((__v2df) __A);
> > +	break;
> > +    }
> > +  if (__rounding & _MM_FROUND_NO_EXC)
> > +    {
> > +      /* Insert an artificial "read" reference to the variable written
> > +	 above, to ensure the compiler does not schedule the computation
> > +	 of the value after the manipulation of the FPSCR, below.
> > +	 This can be removed if and when GCC PR102783 is fixed.
> > +       */
> > +      __asm__ ("" : : "wa" (__r));
> > +      /* Restore enabled exceptions.  */
> > +      __fpscr_save.__fr = __builtin_mffsl ();
> > +      __fpscr_save.__fpscr |= __enables_save.__fpscr;
> > +      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
> > +    }
> > +  return (__m128d) __r;
> > +}
> > +
> > +extern __inline __m128d
> > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > +_mm_round_sd (__m128d __A, __m128d __B, int __rounding)
> > +{
> > +  __B = _mm_round_pd (__B, __rounding);
> > +  __v2df __r = { ((__v2df) __B)[0], ((__v2df) __A)[1] };
> > +  return (__m128d) __r;
> > +}
> > +
> > +extern __inline __m128
> > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > +_mm_round_ps (__m128 __A, int __rounding)
> > +{
> > +  __v4sf __r;
> > +  union {
> > +    double __fr;
> > +    long long __fpscr;
> > +  } __enables_save, __fpscr_save;
> > +
> > +  if (__rounding & _MM_FROUND_NO_EXC)
> > +    {
> > +      /* Save enabled exceptions, disable all exceptions,
> > +	 and preserve the rounding mode.  */
> > +#ifdef _ARCH_PWR9
> > +      __asm__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
> > +      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
> > +#else
> > +      __fpscr_save.__fr = __builtin_mffs ();
> > +      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
> > +      __fpscr_save.__fpscr &= ~0xf8;
> > +      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
> > +#endif
> > +      /* Insert an artificial "read/write" reference to the variable
> > +	 read below, to ensure the compiler does not schedule
> > +	 a read/use of the variable before the FPSCR is modified, above.
> > +	 This can be removed if and when GCC PR102783 is fixed.
> > +       */
> > +      __asm__ ("" : "+wa" (__A));
> > +    }
> > +
> > +  switch (__rounding)
> > +    {
> > +      case _MM_FROUND_TO_NEAREST_INT:
> > +	__fpscr_save.__fr = __builtin_mffsl ();
> > +	__attribute__ ((fallthrough));
> > +      case _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC:
> > +	__builtin_set_fpscr_rn (0b00);
> > +	/* Insert an artificial "read/write" reference to the variable
> > +	   read below, to ensure the compiler does not schedule
> > +	   a read/use of the variable before the FPSCR is modified, above.
> > +	   This can be removed if and when GCC PR102783 is fixed.
> > +	 */
> > +	__asm__ ("" : "+wa" (__A));
> > +
> > +	__r = vec_rint ((__v4sf) __A);
> > +
> > +	/* Insert an artificial "read" reference to the variable written
> > +	   above, to ensure the compiler does not schedule the computation
> > +	   of the value after the manipulation of the FPSCR, below.
> > +	   This can be removed if and when GCC PR102783 is fixed.
> > +	 */
> > +	__asm__ ("" : : "wa" (__r));
> > +	__builtin_set_fpscr_rn (__fpscr_save.__fpscr);
> > +	break;
> > +      case _MM_FROUND_TO_NEG_INF:
> > +      case _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC:
> > +	__r = vec_floor ((__v4sf) __A);
> > +	break;
> > +      case _MM_FROUND_TO_POS_INF:
> > +      case _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC:
> > +	__r = vec_ceil ((__v4sf) __A);
> > +	break;
> > +      case _MM_FROUND_TO_ZERO:
> > +      case _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC:
> > +	__r = vec_trunc ((__v4sf) __A);
> > +	break;
> > +      case _MM_FROUND_CUR_DIRECTION:
> > +	__r = vec_rint ((__v4sf) __A);
> > +	break;
> > +    }
> > +  if (__rounding & _MM_FROUND_NO_EXC)
> > +    {
> > +      /* Insert an artificial "read" reference to the variable written
> > +	 above, to ensure the compiler does not schedule the computation
> > +	 of the value after the manipulation of the FPSCR, below.
> > +	 This can be removed if and when GCC PR102783 is fixed.
> > +       */
> > +      __asm__ ("" : : "wa" (__r));
> > +      /* Restore enabled exceptions.  */
> > +      __fpscr_save.__fr = __builtin_mffsl ();
> > +      __fpscr_save.__fpscr |= __enables_save.__fpscr;
> > +      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
> > +    }
> > +  return (__m128) __r;
> > +}
> > +
> > +extern __inline __m128
> > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > +_mm_round_ss (__m128 __A, __m128 __B, int __rounding)
> > +{
> > +  __B = _mm_round_ps (__B, __rounding);
> > +  __v4sf __r = (__v4sf) __A;
> > +  __r[0] = ((__v4sf) __B)[0];
> > +  return (__m128) __r;
> > +}
> > +
> > +#define _mm_ceil_pd(V)	   _mm_round_pd ((V), _MM_FROUND_CEIL)
> > +#define _mm_ceil_sd(D, V)  _mm_round_sd ((D), (V), _MM_FROUND_CEIL)
> > +
> > +#define _mm_floor_pd(V)	   _mm_round_pd ((V), _MM_FROUND_FLOOR)
> > +#define _mm_floor_sd(D, V) _mm_round_sd ((D), (V), _MM_FROUND_FLOOR)
> > +
> > +#define _mm_ceil_ps(V)	   _mm_round_ps ((V), _MM_FROUND_CEIL)
> > +#define _mm_ceil_ss(D, V)  _mm_round_ss ((D), (V), _MM_FROUND_CEIL)
> > +
> > +#define _mm_floor_ps(V)	   _mm_round_ps ((V), _MM_FROUND_FLOOR)
> > +#define _mm_floor_ss(D, V) _mm_round_ss ((D), (V), _MM_FROUND_FLOOR)
> > +
> >  extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> >  _mm_insert_epi8 (__m128i const __A, int const __D, int const __N)
> >  {
> > @@ -210,70 +438,6 @@ _mm_testnzc_si128 (__m128i __A, __m128i __B)
> >  
> >  #define _mm_test_mix_ones_zeros(M, V) _mm_testnzc_si128 ((M), (V))
> >  
> > -__inline __m128d
> > -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > -_mm_ceil_pd (__m128d __A)
> > -{
> > -  return (__m128d) vec_ceil ((__v2df) __A);
> > -}
> > -
> > -__inline __m128d
> > -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > -_mm_ceil_sd (__m128d __A, __m128d __B)
> > -{
> > -  __v2df __r = vec_ceil ((__v2df) __B);
> > -  __r[1] = ((__v2df) __A)[1];
> > -  return (__m128d) __r;
> > -}
> > -
> > -__inline __m128d
> > -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > -_mm_floor_pd (__m128d __A)
> > -{
> > -  return (__m128d) vec_floor ((__v2df) __A);
> > -}
> > -
> > -__inline __m128d
> > -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > -_mm_floor_sd (__m128d __A, __m128d __B)
> > -{
> > -  __v2df __r = vec_floor ((__v2df) __B);
> > -  __r[1] = ((__v2df) __A)[1];
> > -  return (__m128d) __r;
> > -}
> > -
> > -__inline __m128
> > -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > -_mm_ceil_ps (__m128 __A)
> > -{
> > -  return (__m128) vec_ceil ((__v4sf) __A);
> > -}
> > -
> > -__inline __m128
> > -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > -_mm_ceil_ss (__m128 __A, __m128 __B)
> > -{
> > -  __v4sf __r = (__v4sf) __A;
> > -  __r[0] = __builtin_ceil (((__v4sf) __B)[0]);
> > -  return __r;
> > -}
> > -
> > -__inline __m128
> > -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > -_mm_floor_ps (__m128 __A)
> > -{
> > -  return (__m128) vec_floor ((__v4sf) __A);
> > -}
> > -
> > -__inline __m128
> > -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > -_mm_floor_ss (__m128 __A, __m128 __B)
> > -{
> > -  __v4sf __r = (__v4sf) __A;
> > -  __r[0] = __builtin_floor (((__v4sf) __B)[0]);
> > -  return __r;
> > -}
> > -
> >  #ifdef _ARCH_PWR8
> >  extern __inline __m128i
> >  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h b/gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
> > new file mode 100644
> > index 000000000000..de6cbf7be438
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
> > @@ -0,0 +1,81 @@
> > +#include <smmintrin.h>
> > +#include <fenv.h>
> > +#include "sse4_1-check.h"
> > +
> > +#define DIM(a) (sizeof (a) / sizeof (a)[0])
> > +
> > +static int roundings[] =
> > +  {
> > +    _MM_FROUND_TO_NEAREST_INT,
> > +    _MM_FROUND_TO_NEG_INF,
> > +    _MM_FROUND_TO_POS_INF,
> > +    _MM_FROUND_TO_ZERO,
> > +    _MM_FROUND_CUR_DIRECTION
> > +  };
> > +
> > +static int modes[] =
> > +  {
> > +    FE_TONEAREST,
> > +    FE_UPWARD,
> > +    FE_DOWNWARD,
> > +    FE_TOWARDZERO
> > +  };
> > +
> > +static void
> > +TEST (void)
> > +{
> > +  int i, j, ri, mi, round_save;
> > +
> > +  round_save = fegetround ();
> > +  for (mi = 0; mi < DIM (modes); mi++) {
> > +    fesetround (modes[mi]);
> > +    for (i = 0; i < DIM (data); i++) {
> > +      for (ri = 0; ri < DIM (roundings); ri++) {
> > +	union value guess;
> > +	union value *current_answers = answers[ri];
> > +	switch (roundings[ri]) {
> > +	  case _MM_FROUND_TO_NEAREST_INT:
> > +	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
> > +				    _MM_FROUND_TO_NEAREST_INT);
> > +	    break;
> > +	  case _MM_FROUND_TO_NEG_INF:
> > +	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
> > +				    _MM_FROUND_TO_NEG_INF);
> > +	    break;
> > +	  case _MM_FROUND_TO_POS_INF:
> > +	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
> > +				    _MM_FROUND_TO_POS_INF);
> > +	    break;
> > +	  case _MM_FROUND_TO_ZERO:
> > +	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
> > +				    _MM_FROUND_TO_ZERO);
> > +	    break;
> > +	  case _MM_FROUND_CUR_DIRECTION:
> > +	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
> > +				    _MM_FROUND_CUR_DIRECTION);
> > +	    switch (modes[mi]) {
> > +	      case FE_TONEAREST:
> > +		current_answers = answers_NEAREST_INT;
> > +		break;
> > +	      case FE_UPWARD:
> > +		current_answers = answers_POS_INF;
> > +		break;
> > +	      case FE_DOWNWARD:
> > +		current_answers = answers_NEG_INF;
> > +		break;
> > +	      case FE_TOWARDZERO:
> > +		current_answers = answers_ZERO;
> > +		break;
> > +	    }
> > +	    break;
> > +	  default:
> > +	    abort ();
> > +	}
> > +	for (j = 0; j < DIM (guess.f); j++)
> > +	  if (guess.f[j] != current_answers[i].f[j])
> > +	    abort ();
> > +      }
> > +    }
> > +  }
> > +  fesetround (round_save);
> > +}
> > diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
> > new file mode 100644
> > index 000000000000..58d9cc524167
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
> > @@ -0,0 +1,143 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vsx_hw } */
> > +/* { dg-options "-O2 -mvsx" } */
> > +
> > +#define NO_WARN_X86_INTRINSICS 1
> > +#include <smmintrin.h>
> > +
> > +#define VEC_T __m128d
> > +#define FP_T double
> > +
> > +#define ROUND_INTRIN(x, ignored, mode) _mm_round_pd (x, mode)
> > +
> > +#include "sse4_1-round-data.h"
> > +
> > +struct data2 data[] = {
> > +  { .value1 = { .f = {  0.00,  0.25 } } },
> > +  { .value1 = { .f = {  0.50,  0.75 } } },
> > +
> > +  { .value1 = { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffdp+50 } } },
> > +  { .value1 = { .f = {  0x1.ffffffffffffep+50,  0x1.fffffffffffffp+50 } } },
> > +  { .value1 = { .f = {  0x1.0000000000000p+51,  0x1.0000000000001p+51 } } },
> > +  { .value1 = { .f = {  0x1.0000000000002p+51,  0x1.0000000000003p+51 } } },
> > +
> > +  { .value1 = { .f = {  0x1.ffffffffffffep+51,  0x1.fffffffffffffp+51 } } },
> > +  { .value1 = { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } } },
> > +
> > +  { .value1 = { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } } },
> > +  { .value1 = { .f = { -0x1.fffffffffffffp+51, -0x1.ffffffffffffep+51 } } },
> > +
> > +  { .value1 = { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } } },
> > +  { .value1 = { .f = { -0x1.0000000000001p+51, -0x1.0000000000000p+51 } } },
> > +  { .value1 = { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffep+50 } } },
> > +  { .value1 = { .f = { -0x1.ffffffffffffdp+50, -0x1.ffffffffffffcp+50 } } },
> > +
> > +  { .value1 = { .f = { -1.00, -0.75 } } },
> > +  { .value1 = { .f = { -0.50, -0.25 } } }
> > +};
> > +
> > +union value answers_NEAREST_INT[] = {
> > +  { .f = {  0.00,  0.00 } },
> > +  { .f = {  0.00,  1.00 } },
> > +
> > +  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
> > +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
> > +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
> > +  { .f = {  0x1.0000000000002p+51,  0x1.0000000000004p+51 } },
> > +
> > +  { .f = {  0x1.ffffffffffffep+51,  0x1.0000000000000p+52 } },
> > +  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
> > +
> > +  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
> > +  { .f = { -0x1.0000000000000p+52, -0x1.ffffffffffffep+51 } },
> > +
> > +  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
> > +  { .f = { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } },
> > +  { .f = { -0x1.ffffffffffffcp+50, -0x1.0000000000000p+51 } },
> > +  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
> > +
> > +  { .f = { -1.00, -1.00 } },
> > +  { .f = {  0.00,  0.00 } }
> > +};
> > +
> > +union value answers_NEG_INF[] = {
> > +  { .f = {  0.00,  0.00 } },
> > +  { .f = {  0.00,  0.00 } },
> > +
> > +  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
> > +  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
> > +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
> > +  { .f = {  0x1.0000000000002p+51,  0x1.0000000000002p+51 } },
> > +
> > +  { .f = {  0x1.ffffffffffffep+51,  0x1.ffffffffffffep+51 } },
> > +  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
> > +
> > +  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
> > +  { .f = { -0x1.0000000000000p+52, -0x1.ffffffffffffep+51 } },
> > +
> > +  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
> > +  { .f = { -0x1.0000000000002p+51, -0x1.0000000000000p+51 } },
> > +  { .f = { -0x1.ffffffffffffcp+50, -0x1.0000000000000p+51 } },
> > +  { .f = { -0x1.0000000000000p+51, -0x1.ffffffffffffcp+50 } },
> > +
> > +  { .f = { -1.00, -1.00 } },
> > +  { .f = { -1.00, -1.00 } }
> > +};
> > +
> > +union value answers_POS_INF[] = {
> > +  { .f = {  0.00,  1.00 } },
> > +  { .f = {  1.00,  1.00 } },
> > +
> > +  { .f = {  0x1.ffffffffffffcp+50,  0x1.0000000000000p+51 } },
> > +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
> > +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000002p+51 } },
> > +  { .f = {  0x1.0000000000002p+51,  0x1.0000000000004p+51 } },
> > +
> > +  { .f = {  0x1.ffffffffffffep+51,  0x1.0000000000000p+52 } },
> > +  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
> > +
> > +  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
> > +  { .f = { -0x1.ffffffffffffep+51, -0x1.ffffffffffffep+51 } },
> > +
> > +  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
> > +  { .f = { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } },
> > +  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
> > +  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
> > +
> > +  { .f = { -1.00,  0.00 } },
> > +  { .f = {  0.00,  0.00 } }
> > +};
> > +
> > +union value answers_ZERO[] = {
> > +  { .f = {  0.00,  0.00 } },
> > +  { .f = {  0.00,  0.00 } },
> > +
> > +  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
> > +  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
> > +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
> > +  { .f = {  0x1.0000000000002p+51,  0x1.0000000000002p+51 } },
> > +
> > +  { .f = {  0x1.ffffffffffffep+51,  0x1.ffffffffffffep+51 } },
> > +  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
> > +
> > +  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
> > +  { .f = { -0x1.ffffffffffffep+51, -0x1.ffffffffffffep+51 } },
> > +
> > +  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
> > +  { .f = { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } },
> > +  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
> > +  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
> > +
> > +  { .f = { -1.00,  0.00 } },
> > +  { .f = {  0.00,  0.00 } }
> > +};
> > +
> > +union value *answers[] = {
> > +  answers_NEAREST_INT,
> > +  answers_NEG_INF,
> > +  answers_POS_INF,
> > +  answers_ZERO,
> > +  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
> > +};
> > +
> > +#include "sse4_1-round3.h"
> > diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
> > new file mode 100644
> > index 000000000000..4b0366dfddf3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
> > @@ -0,0 +1,98 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vsx_hw } */
> > +/* { dg-options "-O2 -mvsx" } */
> > +
> > +#define NO_WARN_X86_INTRINSICS 1
> > +#include <smmintrin.h>
> > +
> > +#define VEC_T __m128
> > +#define FP_T float
> > +
> > +#define ROUND_INTRIN(x, ignored, mode) _mm_round_ps (x, mode)
> > +
> > +#include "sse4_1-round-data.h"
> > +
> > +struct data2 data[] = {
> > +  { .value1 = { .f = {  0.00,  0.25,  0.50,  0.75 } } },
> > +
> > +  { .value1 = { .f = {  0x1.fffff8p+21,  0x1.fffffap+21,
> > +			0x1.fffffcp+21,  0x1.fffffep+21 } } },
> > +  { .value1 = { .f = {  0x1.fffffap+22,  0x1.fffffcp+22,
> > +			0x1.fffffep+22,  0x1.fffffep+23 } } },
> > +  { .value1 = { .f = { -0x1.fffffep+23, -0x1.fffffep+22,
> > +		       -0x1.fffffcp+22, -0x1.fffffap+22 } } },
> > +  { .value1 = { .f = { -0x1.fffffep+21, -0x1.fffffcp+21,
> > +		       -0x1.fffffap+21, -0x1.fffff8p+21 } } },
> > +
> > +  { .value1 = { .f = { -1.00, -0.75, -0.50, -0.25 } } }
> > +};
> > +
> > +union value answers_NEAREST_INT[] = {
> > +  { .f = {  0.00,  0.00,  0.00,  1.00 } },
> > +
> > +  { .f = {  0x1.fffff8p+21,  0x1.fffff8p+21,
> > +            0x1.000000p+22,  0x1.000000p+22 } },
> > +  { .f = {  0x1.fffff8p+22,  0x1.fffffcp+22,
> > +            0x1.000000p+23,  0x1.fffffep+23 } },
> > +  { .f = { -0x1.fffffep+23, -0x1.000000p+23,
> > +           -0x1.fffffcp+22, -0x1.fffff8p+22 } },
> > +  { .f = { -0x1.000000p+22, -0x1.000000p+22,
> > +           -0x1.fffff8p+21, -0x1.fffff8p+21 } },
> > +
> > +  { .f = { -1.00, -1.00,  0.00,  0.00 } }
> > +};
> > +
> > +union value answers_NEG_INF[] = {
> > +  { .f = {  0.00,  0.00,  0.00,  0.00 } },
> > +
> > +  { .f = {  0x1.fffff8p+21,  0x1.fffff8p+21,
> > +            0x1.fffff8p+21,  0x1.fffff8p+21 } },
> > +  { .f = {  0x1.fffff8p+22,  0x1.fffffcp+22,
> > +            0x1.fffffcp+22,  0x1.fffffep+23 } },
> > +  { .f = { -0x1.fffffep+23, -0x1.000000p+23,
> > +           -0x1.fffffcp+22, -0x1.fffffcp+22 } },
> > +  { .f = { -0x1.000000p+22, -0x1.000000p+22,
> > +           -0x1.000000p+22, -0x1.fffff8p+21 } },
> > +
> > +  { .f = { -1.00, -1.00, -1.00, -1.00 } }
> > +};
> > +
> > +union value answers_POS_INF[] = {
> > +  { .f = {  0.00,  1.00,  1.00,  1.00 } },
> > +
> > +  { .f = {  0x1.fffff8p+21,  0x1.000000p+22,
> > +            0x1.000000p+22,  0x1.000000p+22 } },
> > +  { .f = {  0x1.fffffcp+22,  0x1.fffffcp+22,
> > +            0x1.000000p+23,  0x1.fffffep+23 } },
> > +  { .f = { -0x1.fffffep+23, -0x1.fffffcp+22,
> > +           -0x1.fffffcp+22, -0x1.fffff8p+22 } },
> > +  { .f = { -0x1.fffff8p+21, -0x1.fffff8p+21,
> > +           -0x1.fffff8p+21, -0x1.fffff8p+21 } },
> > +
> > +  { .f = { -1.00,  0.00,  0.00,  0.00 } }
> > +};
> > +
> > +union value answers_ZERO[] = {
> > +  { .f = {  0.00,  0.00,  0.00,  0.00 } },
> > +
> > +  { .f = {  0x1.fffff8p+21,  0x1.fffff8p+21,
> > +            0x1.fffff8p+21,  0x1.fffff8p+21 } },
> > +  { .f = {  0x1.fffff8p+22,  0x1.fffffcp+22,
> > +            0x1.fffffcp+22,  0x1.fffffep+23 } },
> > +  { .f = { -0x1.fffffep+23, -0x1.fffffcp+22,
> > +           -0x1.fffffcp+22, -0x1.fffff8p+22 } },
> > +  { .f = { -0x1.fffff8p+21, -0x1.fffff8p+21,
> > +           -0x1.fffff8p+21, -0x1.fffff8p+21 } },
> > +
> > +  { .f = { -1.00,  0.00,  0.00,  0.00 } }
> > +};
> > +
> > +union value *answers[] = {
> > +  answers_NEAREST_INT,
> > +  answers_NEG_INF,
> > +  answers_POS_INF,
> > +  answers_ZERO,
> > +  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
> > +};
> > +
> > +#include "sse4_1-round3.h"
> > diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
> > new file mode 100644
> > index 000000000000..4f8d9e08c93d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
> > @@ -0,0 +1,256 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vsx_hw } */
> > +/* { dg-options "-O2 -mvsx" } */
> > +
> > +#include <stdio.h>
> > +#define NO_WARN_X86_INTRINSICS 1
> > +#include <smmintrin.h>
> > +
> > +#define VEC_T __m128d
> > +#define FP_T double
> > +
> > +#define ROUND_INTRIN(x, y, mode) _mm_round_sd (x, y, mode)
> > +
> > +#include "sse4_1-round-data.h"
> > +
> > +static struct data2 data[] = {
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = {  0.00, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = {  0.25, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = {  0.50, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = {  0.75, IGNORED } } },
> > +
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = {  0x1.ffffffffffffcp+50, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = {  0x1.ffffffffffffdp+50, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = {  0x1.ffffffffffffep+50, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = {  0x1.fffffffffffffp+50, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = {  0x1.0000000000000p+51, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = {  0x1.0000000000001p+51, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = {  0x1.0000000000002p+51, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = {  0x1.0000000000003p+51, IGNORED } } },
> > +
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = {  0x1.ffffffffffffep+51, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = {  0x1.fffffffffffffp+51, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = {  0x1.0000000000000p+52, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = {  0x1.0000000000001p+52, IGNORED } } },
> > +
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = { -0x1.0000000000001p+52, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = { -0x1.0000000000000p+52, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = { -0x1.fffffffffffffp+51, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = { -0x1.ffffffffffffep+51, IGNORED } } },
> > +
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = { -0x1.0000000000004p+51, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = { -0x1.0000000000002p+51, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = { -0x1.0000000000001p+51, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = { -0x1.0000000000000p+51, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = { -0x1.ffffffffffffcp+50, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = { -0x1.ffffffffffffep+50, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = { -0x1.ffffffffffffdp+50, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = { -0x1.ffffffffffffcp+50, IGNORED } } },
> > +
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = { -1.00, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = { -0.75, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = { -0.50, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > +    .value2 = { .f = { -0.25, IGNORED } } }
> > +};
> > +
> > +static union value answers_NEAREST_INT[] = {
> > +  { .f = {  0.00, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH } },
> > +  { .f = {  1.00, PASSTHROUGH } },
> > +
> > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000004p+51, PASSTHROUGH } },
> > +
> > +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
> > +
> > +  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
> > +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> > +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> > +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> > +
> > +  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
> > +  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
> > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > +
> > +  { .f = { -1.00, PASSTHROUGH } },
> > +  { .f = { -1.00, PASSTHROUGH } },
> > +  { .f = { -0.00, PASSTHROUGH } },
> > +  { .f = { -0.00, PASSTHROUGH } }
> > +};
> > +
> > +static union value answers_NEG_INF[] = {
> > +  { .f = {  0.00, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH } },
> > +
> > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> > +
> > +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> > +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
> > +
> > +  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
> > +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> > +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> > +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> > +
> > +  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
> > +  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
> > +  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
> > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > +
> > +  { .f = { -1.00, PASSTHROUGH } },
> > +  { .f = { -1.00, PASSTHROUGH } },
> > +  { .f = { -1.00, PASSTHROUGH } },
> > +  { .f = { -1.00, PASSTHROUGH } }
> > +};
> > +
> > +static union value answers_POS_INF[] = {
> > +  { .f = {  0.00, PASSTHROUGH } },
> > +  { .f = {  1.00, PASSTHROUGH } },
> > +  { .f = {  1.00, PASSTHROUGH } },
> > +  { .f = {  1.00, PASSTHROUGH } },
> > +
> > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000004p+51, PASSTHROUGH } },
> > +
> > +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
> > +
> > +  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
> > +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> > +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> > +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> > +
> > +  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
> > +  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
> > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > +
> > +  { .f = { -1.00, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH } }
> > +};
> > +
> > +static union value answers_ZERO[] = {
> > +  { .f = {  0.00, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH } },
> > +
> > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> > +
> > +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> > +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> > +  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
> > +
> > +  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
> > +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> > +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> > +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> > +
> > +  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
> > +  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
> > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > +
> > +  { .f = { -1.00, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH } }
> > +};
> > +
> > +union value *answers[] = {
> > +  answers_NEAREST_INT,
> > +  answers_NEG_INF,
> > +  answers_POS_INF,
> > +  answers_ZERO,
> > +  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
> > +};
> > +
> > +#include "sse4_1-round3.h"
> > diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c
> > new file mode 100644
> > index 000000000000..d788ebda64dd
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c
> > @@ -0,0 +1,208 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target vsx_hw } */
> > +/* { dg-options "-O2 -mvsx" } */
> > +
> > +#include <stdio.h>
> > +#define NO_WARN_X86_INTRINSICS 1
> > +#include <smmintrin.h>
> > +
> > +#define VEC_T __m128
> > +#define FP_T float
> > +
> > +#define ROUND_INTRIN(x, y, mode) _mm_round_ss (x, y, mode)
> > +
> > +#include "sse4_1-round-data.h"
> > +
> > +static struct data2 data[] = {
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +    .value2 = { .f = {  0.00, IGNORED, IGNORED, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +    .value2 = { .f = {  0.25, IGNORED, IGNORED, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +    .value2 = { .f = {  0.50, IGNORED, IGNORED, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +    .value2 = { .f = {  0.75, IGNORED, IGNORED, IGNORED } } },
> > +
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +    .value2 = { .f = {  0x1.fffff8p+21, IGNORED, IGNORED, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +    .value2 = { .f = {  0x1.fffffap+21, IGNORED, IGNORED, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +    .value2 = { .f = {  0x1.fffffcp+21, IGNORED, IGNORED, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +    .value2 = { .f = {  0x1.fffffep+21, IGNORED, IGNORED, IGNORED } } },
> > +
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +    .value2 = { .f = {  0x1.fffffap+22, IGNORED, IGNORED, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +    .value2 = { .f = {  0x1.fffffcp+22, IGNORED, IGNORED, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +    .value2 = { .f = {  0x1.fffffep+22, IGNORED, IGNORED, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +    .value2 = { .f = {  0x1.fffffep+23, IGNORED, IGNORED, IGNORED } } },
> > +
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +    .value2 = { .f = { -0x1.fffffep+23, IGNORED, IGNORED, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +    .value2 = { .f = { -0x1.fffffep+22, IGNORED, IGNORED, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +    .value2 = { .f = { -0x1.fffffcp+22, IGNORED, IGNORED, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +    .value2 = { .f = { -0x1.fffffap+22, IGNORED, IGNORED, IGNORED } } },
> > +
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +    .value2 = { .f = { -0x1.fffffep+21, IGNORED, IGNORED, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +    .value2 = { .f = { -0x1.fffffcp+21, IGNORED, IGNORED, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +    .value2 = { .f = { -0x1.fffffap+21, IGNORED, IGNORED, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +    .value2 = { .f = { -0x1.fffff8p+21, IGNORED, IGNORED, IGNORED } } },
> > +
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +    .value2 = { .f = { -1.00, IGNORED, IGNORED, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +    .value2 = { .f = { -0.75, IGNORED, IGNORED, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +    .value2 = { .f = { -0.50, IGNORED, IGNORED, IGNORED } } },
> > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +    .value2 = { .f = { -0.25, IGNORED, IGNORED, IGNORED } } }
> > +};
> > +
> > +static union value answers_NEAREST_INT[] = {
> > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +
> > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +
> > +  { .f = {  0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +
> > +  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +
> > +  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +
> > +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
> > +};
> > +
> > +static union value answers_NEG_INF[] = {
> > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +
> > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +
> > +  { .f = {  0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +
> > +  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +
> > +  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +
> > +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
> > +};
> > +
> > +static union value answers_POS_INF[] = {
> > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +
> > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +
> > +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +
> > +  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +
> > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +
> > +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
> > +};
> > +
> > +static union value answers_ZERO[] = {
> > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +
> > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +
> > +  { .f = {  0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +
> > +  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +
> > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +
> > +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
> > +};
> > +
> > +union value *answers[] = {
> > +  answers_NEAREST_INT,
> > +  answers_NEG_INF,
> > +  answers_POS_INF,
> > +  answers_ZERO,
> > +  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
> > +};
> > +
> > +#include "sse4_1-round3.h"
> > -- 
> > 2.27.0
> > 


* Re: [PING^3 PATCH v4 2/3] rs6000: Support SSE4.1 "round" intrinsics
  2021-11-08 17:40     ` [PING^2 " Paul A. Clarke
@ 2021-11-19  2:24       ` Paul A. Clarke
  2022-01-03 16:48         ` [PING^4 " Paul A. Clarke
  0 siblings, 1 reply; 13+ messages in thread
From: Paul A. Clarke @ 2021-11-19  2:24 UTC (permalink / raw)
  To: segher, wschmidt, gcc-patches

On Mon, Nov 08, 2021 at 11:40:42AM -0600, Paul A. Clarke via Gcc-patches wrote:
> On Tue, Oct 26, 2021 at 03:00:11PM -0500, Paul A. Clarke via Gcc-patches wrote:
> > Patches 1/3 and 3/3 have been committed.
> > This is only a ping for 2/3.
> 
> Gentle re-ping.

Gentle re-re-ping.

> > On Mon, Oct 18, 2021 at 08:15:11PM -0500, Paul A. Clarke via Gcc-patches wrote:
> > > Suppress exceptions (when specified), by saving, manipulating, and
> > > restoring the FPSCR.  Similarly, save, set, and restore the floating-point
> > > rounding mode when required.
> > > 
> > > No attempt is made to optimize writing the FPSCR (by checking if the new
> > > > value would be the same), other than using lighter-weight instructions
> > > when possible. Note that explicit instruction scheduling "barriers" are
> > > added to prevent floating-point computations from being moved before or
> > > after the explicit FPSCR manipulations.  (That these are required has
> > > been reported as an issue in GCC: PR102783.)
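
(Not part of the patch.)  For readers unfamiliar with the idiom, the
"barriers" described above are empty inline-asm statements that claim to
read and/or write a value, which is intended to keep uses of that value on
the expected side of the statement.  Below is a minimal, portable sketch.
The helper names pin_before, pin_after and guarded_halve are hypothetical,
and the generic "m" constraint stands in for the Power-specific "wa"
constraint used in the patch:

```c
/* Hypothetical helpers illustrating the empty-asm "barrier" idiom
   (a GCC/Clang extension).  The patch uses the Power-specific "wa"
   constraint; the generic "m" constraint is an assumption made here
   only so that the sketch builds on any target.  */

/* Claim to read and write x, so later uses of x depend on this
   statement rather than on the value x had earlier.  */
static inline double
pin_before (double x)
{
  __asm__ ("" : "+m" (x));
  return x;
}

/* Claim to read r, so the computation of r must be complete here.
   An asm with no outputs is implicitly volatile.  */
static inline double
pin_after (double r)
{
  __asm__ ("" : : "m" (r));
  return r;
}

/* Shape of the sequence in _mm_round_pd, with the FPSCR updates
   elided and a multiplication standing in for the rounding step.  */
static double
guarded_halve (double a)
{
  /* ... change floating-point state here ... */
  a = pin_before (a);
  double r = a * 0.5;
  r = pin_after (r);
  /* ... restore floating-point state here ... */
  return r;
}
```

The asm bodies are empty, so no instructions are emitted for them; only
the constraints matter.
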
> > > 
> > > The scalar versions naively use the parallel versions to compute the
> > > single scalar result and then construct the remainder of the result.
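
(Not part of the patch.)  The "round everything, then merge" approach for
the scalar variants can be sketched with GCC vector extensions.  The names
floor_both and floor_low are hypothetical, and the hand-written floor is
only a stand-in for the real vector rounding step (it assumes values that
fit in a long long):

```c
typedef double v2df __attribute__ ((vector_size (16)));

/* Stand-in for the two-lane rounding step.  Only valid for values
   that fit in a long long; used here to avoid depending on libm.  */
static v2df
floor_both (v2df v)
{
  v2df r = v;
  for (int i = 0; i < 2; i++)
    {
      long long t = (long long) v[i];	/* Truncates toward zero.  */
      /* Step down by one when truncation moved the value upward.  */
      r[i] = (double) t - (v[i] < (double) t);
    }
  return r;
}

/* Scalar variant: lane 0 comes from the rounded b, lane 1 is passed
   through from a, mirroring the shape of _mm_round_sd.  */
static v2df
floor_low (v2df a, v2df b)
{
  v2df rounded = floor_both (b);
  v2df r = { rounded[0], a[1] };
  return r;
}
```

The cost of this approach is that the unused lane of b is rounded and then
discarded, which is what "naively" refers to above.
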
> > > 
> > > Of minor note, the values of _MM_FROUND_TO_NEG_INF and _MM_FROUND_TO_ZERO
> > > are swapped from the corresponding values on x86 so as to match the
> > > corresponding rounding mode values in the Power ISA.
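
(Not part of the patch.)  Code that uses the macro names is unaffected by
the swap; only code that hard-codes the x86 numeric values would see a
difference.  A small sketch of the correspondence follows.  The enumerator
names and x86_to_ppc_rounding are hypothetical and exist only for
illustration:

```c
/* x86 encodings of the rounding-mode field.  */
enum
{
  X86_NEAREST = 0x00,
  X86_NEG_INF = 0x01,
  X86_POS_INF = 0x02,
  X86_ZERO    = 0x03
};

/* Encodings defined by this patch, chosen to match the Power ISA
   rounding mode values.  */
enum
{
  PPC_NEAREST = 0x00,
  PPC_ZERO    = 0x01,
  PPC_POS_INF = 0x02,
  PPC_NEG_INF = 0x03
};

/* Translate a hard-coded x86 rounding value to the value that the
   patch would expect.  Only the low two bits are examined.  */
static int
x86_to_ppc_rounding (int x86_mode)
{
  static const int map[4] =
    { PPC_NEAREST, PPC_NEG_INF, PPC_POS_INF, PPC_ZERO };
  return map[x86_mode & 0x3];
}
```

Only the "toward negative infinity" and "toward zero" entries differ
between the two encodings.
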
> > > 
> > > Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
> > > convert _mm_ceil* and _mm_floor* into macros. This matches the current
> > > analogous implementations in config/i386/smmintrin.h.
> > > 
> > > Function signatures match the analogous functions in config/i386/smmintrin.h.
> > > 
> > > Add tests for _mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss,
> > > modeled after the very similar "floor" and "ceil" tests.
> > > 
> > > > Include basic tests, plus tests at the positive and negative boundaries
> > > > of the floating-point representation.  Test all of the parameterized
> > > > rounding modes as well as the C99 rounding modes, and interactions
> > > > between the two.
> > > 
> > > Exceptions are not explicitly tested.
> > > 
> > > 2021-10-18  Paul A. Clarke  <pc@us.ibm.com>
> > > 
> > > gcc
> > > 	* config/rs6000/smmintrin.h (_mm_round_pd, _mm_round_ps,
> > > 	_mm_round_sd, _mm_round_ss, _MM_FROUND_TO_NEAREST_INT,
> > > 	_MM_FROUND_TO_ZERO, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_NEG_INF,
> > > 	_MM_FROUND_CUR_DIRECTION, _MM_FROUND_RAISE_EXC, _MM_FROUND_NO_EXC,
> > > 	_MM_FROUND_NINT, _MM_FROUND_FLOOR, _MM_FROUND_CEIL, _MM_FROUND_TRUNC,
> > > 	_MM_FROUND_RINT, _MM_FROUND_NEARBYINT): New.
> > > 	* config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd,
> > > 	_mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss):
> > > 	Convert from function to macro.
> > > 
> > > gcc/testsuite
> > > 	* gcc.target/powerpc/sse4_1-round3.h: New.
> > > 	* gcc.target/powerpc/sse4_1-roundpd.c: New.
> > > 	* gcc.target/powerpc/sse4_1-roundps.c: New.
> > > 	* gcc.target/powerpc/sse4_1-roundsd.c: New.
> > > 	* gcc.target/powerpc/sse4_1-roundss.c: New.
> > > ---
> > >  gcc/config/rs6000/smmintrin.h                 | 292 ++++++++++++++----
> > >  .../gcc.target/powerpc/sse4_1-round3.h        |  81 +++++
> > >  .../gcc.target/powerpc/sse4_1-roundpd.c       | 143 +++++++++
> > >  .../gcc.target/powerpc/sse4_1-roundps.c       |  98 ++++++
> > >  .../gcc.target/powerpc/sse4_1-roundsd.c       | 256 +++++++++++++++
> > >  .../gcc.target/powerpc/sse4_1-roundss.c       | 208 +++++++++++++
> > >  6 files changed, 1014 insertions(+), 64 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
> > >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
> > >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
> > >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
> > >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c
> > > 
> > > diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
> > > index 90ce03d22709..6bb03e6e20ac 100644
> > > --- a/gcc/config/rs6000/smmintrin.h
> > > +++ b/gcc/config/rs6000/smmintrin.h
> > > @@ -42,6 +42,234 @@
> > >  #include <altivec.h>
> > >  #include <tmmintrin.h>
> > >  
> > > +/* Rounding mode macros. */
> > > +#define _MM_FROUND_TO_NEAREST_INT       0x00
> > > +#define _MM_FROUND_TO_ZERO              0x01
> > > +#define _MM_FROUND_TO_POS_INF           0x02
> > > +#define _MM_FROUND_TO_NEG_INF           0x03
> > > +#define _MM_FROUND_CUR_DIRECTION        0x04
> > > +
> > > +#define _MM_FROUND_NINT		\
> > > +  (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)
> > > +#define _MM_FROUND_FLOOR	\
> > > +  (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC)
> > > +#define _MM_FROUND_CEIL		\
> > > +  (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC)
> > > +#define _MM_FROUND_TRUNC	\
> > > +  (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC)
> > > +#define _MM_FROUND_RINT		\
> > > +  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC)
> > > +#define _MM_FROUND_NEARBYINT	\
> > > +  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC)
> > > +
> > > +#define _MM_FROUND_RAISE_EXC            0x00
> > > +#define _MM_FROUND_NO_EXC               0x08
> > > +
> > > +extern __inline __m128d
> > > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > +_mm_round_pd (__m128d __A, int __rounding)
> > > +{
> > > +  __v2df __r;
> > > +  union {
> > > +    double __fr;
> > > +    long long __fpscr;
> > > +  } __enables_save, __fpscr_save;
> > > +
> > > +  if (__rounding & _MM_FROUND_NO_EXC)
> > > +    {
> > > +      /* Save enabled exceptions, disable all exceptions,
> > > +	 and preserve the rounding mode.  */
> > > +#ifdef _ARCH_PWR9
> > > +      __asm__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
> > > +      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
> > > +#else
> > > +      __fpscr_save.__fr = __builtin_mffs ();
> > > +      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
> > > +      __fpscr_save.__fpscr &= ~0xf8;
> > > +      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
> > > +#endif
> > > +      /* Insert an artificial "read/write" reference to the variable
> > > +	 read below, to ensure the compiler does not schedule
> > > +	 a read/use of the variable before the FPSCR is modified, above.
> > > +	 This can be removed if and when GCC PR102783 is fixed.
> > > +       */
> > > +      __asm__ ("" : "+wa" (__A));
> > > +    }
> > > +
> > > +  switch (__rounding)
> > > +    {
> > > +      case _MM_FROUND_TO_NEAREST_INT:
> > > +	__fpscr_save.__fr = __builtin_mffsl ();
> > > +	__attribute__ ((fallthrough));
> > > +      case _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC:
> > > +	__builtin_set_fpscr_rn (0b00);
> > > +	/* Insert an artificial "read/write" reference to the variable
> > > +	   read below, to ensure the compiler does not schedule
> > > +	   a read/use of the variable before the FPSCR is modified, above.
> > > +	   This can be removed if and when GCC PR102783 is fixed.
> > > +	 */
> > > +	__asm__ ("" : "+wa" (__A));
> > > +
> > > +	__r = vec_rint ((__v2df) __A);
> > > +
> > > +	/* Insert an artificial "read" reference to the variable written
> > > +	   above, to ensure the compiler does not schedule the computation
> > > +	   of the value after the manipulation of the FPSCR, below.
> > > +	   This can be removed if and when GCC PR102783 is fixed.
> > > +	 */
> > > +	__asm__ ("" : : "wa" (__r));
> > > +	__builtin_set_fpscr_rn (__fpscr_save.__fpscr);
> > > +	break;
> > > +      case _MM_FROUND_TO_NEG_INF:
> > > +      case _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC:
> > > +	__r = vec_floor ((__v2df) __A);
> > > +	break;
> > > +      case _MM_FROUND_TO_POS_INF:
> > > +      case _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC:
> > > +	__r = vec_ceil ((__v2df) __A);
> > > +	break;
> > > +      case _MM_FROUND_TO_ZERO:
> > > +      case _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC:
> > > +	__r = vec_trunc ((__v2df) __A);
> > > +	break;
> > > +      case _MM_FROUND_CUR_DIRECTION:
> > > +	__r = vec_rint ((__v2df) __A);
> > > +	break;
> > > +    }
> > > +  if (__rounding & _MM_FROUND_NO_EXC)
> > > +    {
> > > +      /* Insert an artificial "read" reference to the variable written
> > > +	 above, to ensure the compiler does not schedule the computation
> > > +	 of the value after the manipulation of the FPSCR, below.
> > > +	 This can be removed if and when GCC PR102783 is fixed.
> > > +       */
> > > +      __asm__ ("" : : "wa" (__r));
> > > +      /* Restore enabled exceptions.  */
> > > +      __fpscr_save.__fr = __builtin_mffsl ();
> > > +      __fpscr_save.__fpscr |= __enables_save.__fpscr;
> > > +      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
> > > +    }
> > > +  return (__m128d) __r;
> > > +}
> > > +
> > > +extern __inline __m128d
> > > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > +_mm_round_sd (__m128d __A, __m128d __B, int __rounding)
> > > +{
> > > +  __B = _mm_round_pd (__B, __rounding);
> > > +  __v2df __r = { ((__v2df) __B)[0], ((__v2df) __A)[1] };
> > > +  return (__m128d) __r;
> > > +}
> > > +
> > > +extern __inline __m128
> > > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > +_mm_round_ps (__m128 __A, int __rounding)
> > > +{
> > > +  __v4sf __r;
> > > +  union {
> > > +    double __fr;
> > > +    long long __fpscr;
> > > +  } __enables_save, __fpscr_save;
> > > +
> > > +  if (__rounding & _MM_FROUND_NO_EXC)
> > > +    {
> > > +      /* Save enabled exceptions, disable all exceptions,
> > > +	 and preserve the rounding mode.  */
> > > +#ifdef _ARCH_PWR9
> > > +      __asm__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
> > > +      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
> > > +#else
> > > +      __fpscr_save.__fr = __builtin_mffs ();
> > > +      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
> > > +      __fpscr_save.__fpscr &= ~0xf8;
> > > +      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
> > > +#endif
> > > +      /* Insert an artificial "read/write" reference to the variable
> > > +	 read below, to ensure the compiler does not schedule
> > > +	 a read/use of the variable before the FPSCR is modified, above.
> > > +	 This can be removed if and when GCC PR102783 is fixed.
> > > +       */
> > > +      __asm__ ("" : "+wa" (__A));
> > > +    }
> > > +
> > > +  switch (__rounding)
> > > +    {
> > > +      case _MM_FROUND_TO_NEAREST_INT:
> > > +	__fpscr_save.__fr = __builtin_mffsl ();
> > > +	__attribute__ ((fallthrough));
> > > +      case _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC:
> > > +	__builtin_set_fpscr_rn (0b00);
> > > +	/* Insert an artificial "read/write" reference to the variable
> > > +	   read below, to ensure the compiler does not schedule
> > > +	   a read/use of the variable before the FPSCR is modified, above.
> > > +	   This can be removed if and when GCC PR102783 is fixed.
> > > +	 */
> > > +	__asm__ ("" : "+wa" (__A));
> > > +
> > > +	__r = vec_rint ((__v4sf) __A);
> > > +
> > > +	/* Insert an artificial "read" reference to the variable written
> > > +	   above, to ensure the compiler does not schedule the computation
> > > +	   of the value after the manipulation of the FPSCR, below.
> > > +	   This can be removed if and when GCC PR102783 is fixed.
> > > +	 */
> > > +	__asm__ ("" : : "wa" (__r));
> > > +	__builtin_set_fpscr_rn (__fpscr_save.__fpscr);
> > > +	break;
> > > +      case _MM_FROUND_TO_NEG_INF:
> > > +      case _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC:
> > > +	__r = vec_floor ((__v4sf) __A);
> > > +	break;
> > > +      case _MM_FROUND_TO_POS_INF:
> > > +      case _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC:
> > > +	__r = vec_ceil ((__v4sf) __A);
> > > +	break;
> > > +      case _MM_FROUND_TO_ZERO:
> > > +      case _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC:
> > > +	__r = vec_trunc ((__v4sf) __A);
> > > +	break;
> > > +      case _MM_FROUND_CUR_DIRECTION:
> > > +	__r = vec_rint ((__v4sf) __A);
> > > +	break;
> > > +    }
> > > +  if (__rounding & _MM_FROUND_NO_EXC)
> > > +    {
> > > +      /* Insert an artificial "read" reference to the variable written
> > > +	 above, to ensure the compiler does not schedule the computation
> > > +	 of the value after the manipulation of the FPSCR, below.
> > > +	 This can be removed if and when GCC PR102783 is fixed.
> > > +       */
> > > +      __asm__ ("" : : "wa" (__r));
> > > +      /* Restore enabled exceptions.  */
> > > +      __fpscr_save.__fr = __builtin_mffsl ();
> > > +      __fpscr_save.__fpscr |= __enables_save.__fpscr;
> > > +      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
> > > +    }
> > > +  return (__m128) __r;
> > > +}
> > > +
> > > +extern __inline __m128
> > > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > +_mm_round_ss (__m128 __A, __m128 __B, int __rounding)
> > > +{
> > > +  __B = _mm_round_ps (__B, __rounding);
> > > +  __v4sf __r = (__v4sf) __A;
> > > +  __r[0] = ((__v4sf) __B)[0];
> > > +  return (__m128) __r;
> > > +}
> > > +
> > > +#define _mm_ceil_pd(V)	   _mm_round_pd ((V), _MM_FROUND_CEIL)
> > > +#define _mm_ceil_sd(D, V)  _mm_round_sd ((D), (V), _MM_FROUND_CEIL)
> > > +
> > > +#define _mm_floor_pd(V)	   _mm_round_pd((V), _MM_FROUND_FLOOR)
> > > +#define _mm_floor_sd(D, V) _mm_round_sd ((D), (V), _MM_FROUND_FLOOR)
> > > +
> > > +#define _mm_ceil_ps(V)	   _mm_round_ps ((V), _MM_FROUND_CEIL)
> > > +#define _mm_ceil_ss(D, V)  _mm_round_ss ((D), (V), _MM_FROUND_CEIL)
> > > +
> > > +#define _mm_floor_ps(V)	   _mm_round_ps ((V), _MM_FROUND_FLOOR)
> > > +#define _mm_floor_ss(D, V) _mm_round_ss ((D), (V), _MM_FROUND_FLOOR)
> > > +
> > >  extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > >  _mm_insert_epi8 (__m128i const __A, int const __D, int const __N)
> > >  {
> > > @@ -210,70 +438,6 @@ _mm_testnzc_si128 (__m128i __A, __m128i __B)
> > >  
> > >  #define _mm_test_mix_ones_zeros(M, V) _mm_testnzc_si128 ((M), (V))
> > >  
> > > -__inline __m128d
> > > -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > -_mm_ceil_pd (__m128d __A)
> > > -{
> > > -  return (__m128d) vec_ceil ((__v2df) __A);
> > > -}
> > > -
> > > -__inline __m128d
> > > -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > -_mm_ceil_sd (__m128d __A, __m128d __B)
> > > -{
> > > -  __v2df __r = vec_ceil ((__v2df) __B);
> > > -  __r[1] = ((__v2df) __A)[1];
> > > -  return (__m128d) __r;
> > > -}
> > > -
> > > -__inline __m128d
> > > -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > -_mm_floor_pd (__m128d __A)
> > > -{
> > > -  return (__m128d) vec_floor ((__v2df) __A);
> > > -}
> > > -
> > > -__inline __m128d
> > > -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > -_mm_floor_sd (__m128d __A, __m128d __B)
> > > -{
> > > -  __v2df __r = vec_floor ((__v2df) __B);
> > > -  __r[1] = ((__v2df) __A)[1];
> > > -  return (__m128d) __r;
> > > -}
> > > -
> > > -__inline __m128
> > > -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > -_mm_ceil_ps (__m128 __A)
> > > -{
> > > -  return (__m128) vec_ceil ((__v4sf) __A);
> > > -}
> > > -
> > > -__inline __m128
> > > -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > -_mm_ceil_ss (__m128 __A, __m128 __B)
> > > -{
> > > -  __v4sf __r = (__v4sf) __A;
> > > -  __r[0] = __builtin_ceil (((__v4sf) __B)[0]);
> > > -  return __r;
> > > -}
> > > -
> > > -__inline __m128
> > > -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > -_mm_floor_ps (__m128 __A)
> > > -{
> > > -  return (__m128) vec_floor ((__v4sf) __A);
> > > -}
> > > -
> > > -__inline __m128
> > > -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > -_mm_floor_ss (__m128 __A, __m128 __B)
> > > -{
> > > -  __v4sf __r = (__v4sf) __A;
> > > -  __r[0] = __builtin_floor (((__v4sf) __B)[0]);
> > > -  return __r;
> > > -}
> > > -
> > >  #ifdef _ARCH_PWR8
> > >  extern __inline __m128i
> > >  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h b/gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
> > > new file mode 100644
> > > index 000000000000..de6cbf7be438
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
> > > @@ -0,0 +1,81 @@
> > > +#include <smmintrin.h>
> > > +#include <fenv.h>
> > > +#include "sse4_1-check.h"
> > > +
> > > +#define DIM(a) (sizeof (a) / sizeof (a)[0])
> > > +
> > > +static int roundings[] =
> > > +  {
> > > +    _MM_FROUND_TO_NEAREST_INT,
> > > +    _MM_FROUND_TO_NEG_INF,
> > > +    _MM_FROUND_TO_POS_INF,
> > > +    _MM_FROUND_TO_ZERO,
> > > +    _MM_FROUND_CUR_DIRECTION
> > > +  };
> > > +
> > > +static int modes[] =
> > > +  {
> > > +    FE_TONEAREST,
> > > +    FE_UPWARD,
> > > +    FE_DOWNWARD,
> > > +    FE_TOWARDZERO
> > > +  };
> > > +
> > > +static void
> > > +TEST (void)
> > > +{
> > > +  int i, j, ri, mi, round_save;
> > > +
> > > +  round_save = fegetround ();
> > > +  for (mi = 0; mi < DIM (modes); mi++) {
> > > +    fesetround (modes[mi]);
> > > +    for (i = 0; i < DIM (data); i++) {
> > > +      for (ri = 0; ri < DIM (roundings); ri++) {
> > > +	union value guess;
> > > +	union value *current_answers = answers[ri];
> > > +	switch ( roundings[ri] ) {
> > > +	  case _MM_FROUND_TO_NEAREST_INT:
> > > +	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
> > > +				    _MM_FROUND_TO_NEAREST_INT);
> > > +	    break;
> > > +	  case _MM_FROUND_TO_NEG_INF:
> > > +	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
> > > +				    _MM_FROUND_TO_NEG_INF);
> > > +	    break;
> > > +	  case _MM_FROUND_TO_POS_INF:
> > > +	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
> > > +				    _MM_FROUND_TO_POS_INF);
> > > +	    break;
> > > +	  case _MM_FROUND_TO_ZERO:
> > > +	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
> > > +				    _MM_FROUND_TO_ZERO);
> > > +	    break;
> > > +	  case _MM_FROUND_CUR_DIRECTION:
> > > +	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
> > > +				    _MM_FROUND_CUR_DIRECTION);
> > > +	    switch ( modes[mi] ) {
> > > +	      case FE_TONEAREST:
> > > +		current_answers = answers_NEAREST_INT;
> > > +		break;
> > > +	      case FE_UPWARD:
> > > +		current_answers = answers_POS_INF;
> > > +		break;
> > > +	      case FE_DOWNWARD:
> > > +		current_answers = answers_NEG_INF;
> > > +		break;
> > > +	      case FE_TOWARDZERO:
> > > +		current_answers = answers_ZERO;
> > > +		break;
> > > +	    }
> > > +	    break;
> > > +	  default:
> > > +	    abort ();
> > > +	}
> > > +	for (j = 0; j < DIM (guess.f); j++)
> > > +	  if (guess.f[j] != current_answers[i].f[j])
> > > +	    abort ();
> > > +      }
> > > +    }
> > > +  }
> > > +  fesetround (round_save);
> > > +}
> > > diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
> > > new file mode 100644
> > > index 000000000000..58d9cc524167
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
> > > @@ -0,0 +1,143 @@
> > > +/* { dg-do run } */
> > > +/* { dg-require-effective-target vsx_hw } */
> > > +/* { dg-options "-O2 -mvsx" } */
> > > +
> > > +#define NO_WARN_X86_INTRINSICS 1
> > > +#include <smmintrin.h>
> > > +
> > > +#define VEC_T __m128d
> > > +#define FP_T double
> > > +
> > > +#define ROUND_INTRIN(x, ignored, mode) _mm_round_pd (x, mode)
> > > +
> > > +#include "sse4_1-round-data.h"
> > > +
> > > +struct data2 data[] = {
> > > +  { .value1 = { .f = {  0.00,  0.25 } } },
> > > +  { .value1 = { .f = {  0.50,  0.75 } } },
> > > +
> > > +  { .value1 = { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffdp+50 } } },
> > > +  { .value1 = { .f = {  0x1.ffffffffffffep+50,  0x1.fffffffffffffp+50 } } },
> > > +  { .value1 = { .f = {  0x1.0000000000000p+51,  0x1.0000000000001p+51 } } },
> > > +  { .value1 = { .f = {  0x1.0000000000002p+51,  0x1.0000000000003p+51 } } },
> > > +
> > > +  { .value1 = { .f = {  0x1.ffffffffffffep+51,  0x1.fffffffffffffp+51 } } },
> > > +  { .value1 = { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } } },
> > > +
> > > +  { .value1 = { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } } },
> > > +  { .value1 = { .f = { -0x1.fffffffffffffp+51, -0x1.ffffffffffffep+51 } } },
> > > +
> > > +  { .value1 = { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } } },
> > > +  { .value1 = { .f = { -0x1.0000000000001p+51, -0x1.0000000000000p+51 } } },
> > > +  { .value1 = { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffep+50 } } },
> > > +  { .value1 = { .f = { -0x1.ffffffffffffdp+50, -0x1.ffffffffffffcp+50 } } },
> > > +
> > > +  { .value1 = { .f = { -1.00, -0.75 } } },
> > > +  { .value1 = { .f = { -0.50, -0.25 } } }
> > > +};
> > > +
> > > +union value answers_NEAREST_INT[] = {
> > > +  { .f = {  0.00,  0.00 } },
> > > +  { .f = {  0.00,  1.00 } },
> > > +
> > > +  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
> > > +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
> > > +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
> > > +  { .f = {  0x1.0000000000002p+51,  0x1.0000000000004p+51 } },
> > > +
> > > +  { .f = {  0x1.ffffffffffffep+51,  0x1.0000000000000p+52 } },
> > > +  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
> > > +
> > > +  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
> > > +  { .f = { -0x1.0000000000000p+52, -0x1.ffffffffffffep+51 } },
> > > +
> > > +  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
> > > +  { .f = { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } },
> > > +  { .f = { -0x1.ffffffffffffcp+50, -0x1.0000000000000p+51 } },
> > > +  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
> > > +
> > > +  { .f = { -1.00, -1.00 } },
> > > +  { .f = {  0.00,  0.00 } }
> > > +};
> > > +
> > > +union value answers_NEG_INF[] = {
> > > +  { .f = {  0.00,  0.00 } },
> > > +  { .f = {  0.00,  0.00 } },
> > > +
> > > +  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
> > > +  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
> > > +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
> > > +  { .f = {  0x1.0000000000002p+51,  0x1.0000000000002p+51 } },
> > > +
> > > +  { .f = {  0x1.ffffffffffffep+51,  0x1.ffffffffffffep+51 } },
> > > +  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
> > > +
> > > +  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
> > > +  { .f = { -0x1.0000000000000p+52, -0x1.ffffffffffffep+51 } },
> > > +
> > > +  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
> > > +  { .f = { -0x1.0000000000002p+51, -0x1.0000000000000p+51 } },
> > > +  { .f = { -0x1.ffffffffffffcp+50, -0x1.0000000000000p+51 } },
> > > +  { .f = { -0x1.0000000000000p+51, -0x1.ffffffffffffcp+50 } },
> > > +
> > > +  { .f = { -1.00, -1.00 } },
> > > +  { .f = { -1.00, -1.00 } }
> > > +};
> > > +
> > > +union value answers_POS_INF[] = {
> > > +  { .f = {  0.00,  1.00 } },
> > > +  { .f = {  1.00,  1.00 } },
> > > +
> > > +  { .f = {  0x1.ffffffffffffcp+50,  0x1.0000000000000p+51 } },
> > > +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
> > > +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000002p+51 } },
> > > +  { .f = {  0x1.0000000000002p+51,  0x1.0000000000004p+51 } },
> > > +
> > > +  { .f = {  0x1.ffffffffffffep+51,  0x1.0000000000000p+52 } },
> > > +  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
> > > +
> > > +  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
> > > +  { .f = { -0x1.ffffffffffffep+51, -0x1.ffffffffffffep+51 } },
> > > +
> > > +  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
> > > +  { .f = { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } },
> > > +  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
> > > +  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
> > > +
> > > +  { .f = { -1.00,  0.00 } },
> > > +  { .f = {  0.00,  0.00 } }
> > > +};
> > > +
> > > +union value answers_ZERO[] = {
> > > +  { .f = {  0.00,  0.00 } },
> > > +  { .f = {  0.00,  0.00 } },
> > > +
> > > +  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
> > > +  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
> > > +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
> > > +  { .f = {  0x1.0000000000002p+51,  0x1.0000000000002p+51 } },
> > > +
> > > +  { .f = {  0x1.ffffffffffffep+51,  0x1.ffffffffffffep+51 } },
> > > +  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
> > > +
> > > +  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
> > > +  { .f = { -0x1.ffffffffffffep+51, -0x1.ffffffffffffep+51 } },
> > > +
> > > +  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
> > > +  { .f = { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } },
> > > +  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
> > > +  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
> > > +
> > > +  { .f = { -1.00,  0.00 } },
> > > +  { .f = {  0.00,  0.00 } }
> > > +};
> > > +
> > > +union value *answers[] = {
> > > +  answers_NEAREST_INT,
> > > +  answers_NEG_INF,
> > > +  answers_POS_INF,
> > > +  answers_ZERO,
> > > +  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
> > > +};
> > > +
> > > +#include "sse4_1-round3.h"
> > > diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
> > > new file mode 100644
> > > index 000000000000..4b0366dfddf3
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
> > > @@ -0,0 +1,98 @@
> > > +/* { dg-do run } */
> > > +/* { dg-require-effective-target vsx_hw } */
> > > +/* { dg-options "-O2 -mvsx" } */
> > > +
> > > +#define NO_WARN_X86_INTRINSICS 1
> > > +#include <smmintrin.h>
> > > +
> > > +#define VEC_T __m128
> > > +#define FP_T float
> > > +
> > > +#define ROUND_INTRIN(x, ignored, mode) _mm_round_ps (x, mode)
> > > +
> > > +#include "sse4_1-round-data.h"
> > > +
> > > +struct data2 data[] = {
> > > +  { .value1 = { .f = {  0.00,  0.25,  0.50,  0.75 } } },
> > > +
> > > +  { .value1 = { .f = {  0x1.fffff8p+21,  0x1.fffffap+21,
> > > +			0x1.fffffcp+21,  0x1.fffffep+21 } } },
> > > +  { .value1 = { .f = {  0x1.fffffap+22,  0x1.fffffcp+22,
> > > +			0x1.fffffep+22,  0x1.fffffep+23 } } },
> > > +  { .value1 = { .f = { -0x1.fffffep+23, -0x1.fffffep+22,
> > > +		       -0x1.fffffcp+22, -0x1.fffffap+22 } } },
> > > +  { .value1 = { .f = { -0x1.fffffep+21, -0x1.fffffcp+21,
> > > +		       -0x1.fffffap+21, -0x1.fffff8p+21 } } },
> > > +
> > > +  { .value1 = { .f = { -1.00, -0.75, -0.50, -0.25 } } }
> > > +};
> > > +
> > > +union value answers_NEAREST_INT[] = {
> > > +  { .f = {  0.00,  0.00,  0.00,  1.00 } },
> > > +
> > > +  { .f = {  0x1.fffff8p+21,  0x1.fffff8p+21,
> > > +            0x1.000000p+22,  0x1.000000p+22 } },
> > > +  { .f = {  0x1.fffff8p+22,  0x1.fffffcp+22,
> > > +            0x1.000000p+23,  0x1.fffffep+23 } },
> > > +  { .f = { -0x1.fffffep+23, -0x1.000000p+23,
> > > +           -0x1.fffffcp+22, -0x1.fffff8p+22 } },
> > > +  { .f = { -0x1.000000p+22, -0x1.000000p+22,
> > > +           -0x1.fffff8p+21, -0x1.fffff8p+21 } },
> > > +
> > > +  { .f = { -1.00, -1.00,  0.00,  0.00 } }
> > > +};
> > > +
> > > +union value answers_NEG_INF[] = {
> > > +  { .f = {  0.00,  0.00,  0.00,  0.00 } },
> > > +
> > > +  { .f = {  0x1.fffff8p+21,  0x1.fffff8p+21,
> > > +            0x1.fffff8p+21,  0x1.fffff8p+21 } },
> > > +  { .f = {  0x1.fffff8p+22,  0x1.fffffcp+22,
> > > +            0x1.fffffcp+22,  0x1.fffffep+23 } },
> > > +  { .f = { -0x1.fffffep+23, -0x1.000000p+23,
> > > +           -0x1.fffffcp+22, -0x1.fffffcp+22 } },
> > > +  { .f = { -0x1.000000p+22, -0x1.000000p+22,
> > > +           -0x1.000000p+22, -0x1.fffff8p+21 } },
> > > +
> > > +  { .f = { -1.00, -1.00, -1.00, -1.00 } }
> > > +};
> > > +
> > > +union value answers_POS_INF[] = {
> > > +  { .f = {  0.00,  1.00,  1.00,  1.00 } },
> > > +
> > > +  { .f = {  0x1.fffff8p+21,  0x1.000000p+22,
> > > +            0x1.000000p+22,  0x1.000000p+22 } },
> > > +  { .f = {  0x1.fffffcp+22,  0x1.fffffcp+22,
> > > +            0x1.000000p+23,  0x1.fffffep+23 } },
> > > +  { .f = { -0x1.fffffep+23, -0x1.fffffcp+22,
> > > +           -0x1.fffffcp+22, -0x1.fffff8p+22 } },
> > > +  { .f = { -0x1.fffff8p+21, -0x1.fffff8p+21,
> > > +           -0x1.fffff8p+21, -0x1.fffff8p+21 } },
> > > +
> > > +  { .f = { -1.00,  0.00,  0.00,  0.00 } }
> > > +};
> > > +
> > > +union value answers_ZERO[] = {
> > > +  { .f = {  0.00,  0.00,  0.00,  0.00 } },
> > > +
> > > +  { .f = {  0x1.fffff8p+21,  0x1.fffff8p+21,
> > > +            0x1.fffff8p+21,  0x1.fffff8p+21 } },
> > > +  { .f = {  0x1.fffff8p+22,  0x1.fffffcp+22,
> > > +            0x1.fffffcp+22,  0x1.fffffep+23 } },
> > > +  { .f = { -0x1.fffffep+23, -0x1.fffffcp+22,
> > > +           -0x1.fffffcp+22, -0x1.fffff8p+22 } },
> > > +  { .f = { -0x1.fffff8p+21, -0x1.fffff8p+21,
> > > +           -0x1.fffff8p+21, -0x1.fffff8p+21 } },
> > > +
> > > +  { .f = { -1.00,  0.00,  0.00,  0.00 } }
> > > +};
> > > +
> > > +union value *answers[] = {
> > > +  answers_NEAREST_INT,
> > > +  answers_NEG_INF,
> > > +  answers_POS_INF,
> > > +  answers_ZERO,
> > > +  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
> > > +};
> > > +
> > > +#include "sse4_1-round3.h"
> > > diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
> > > new file mode 100644
> > > index 000000000000..4f8d9e08c93d
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
> > > @@ -0,0 +1,256 @@
> > > +/* { dg-do run } */
> > > +/* { dg-require-effective-target vsx_hw } */
> > > +/* { dg-options "-O2 -mvsx" } */
> > > +
> > > +#include <stdio.h>
> > > +#define NO_WARN_X86_INTRINSICS 1
> > > +#include <smmintrin.h>
> > > +
> > > +#define VEC_T __m128d
> > > +#define FP_T double
> > > +
> > > +#define ROUND_INTRIN(x, y, mode) _mm_round_sd (x, y, mode)
> > > +
> > > +#include "sse4_1-round-data.h"
> > > +
> > > +static struct data2 data[] = {
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0.00, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0.25, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0.50, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0.75, IGNORED } } },
> > > +
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0x1.ffffffffffffcp+50, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0x1.ffffffffffffdp+50, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0x1.ffffffffffffep+50, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0x1.fffffffffffffp+50, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0x1.0000000000000p+51, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0x1.0000000000001p+51, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0x1.0000000000002p+51, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0x1.0000000000003p+51, IGNORED } } },
> > > +
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0x1.ffffffffffffep+51, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0x1.fffffffffffffp+51, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0x1.0000000000000p+52, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0x1.0000000000001p+52, IGNORED } } },
> > > +
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = { -0x1.0000000000001p+52, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = { -0x1.0000000000000p+52, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = { -0x1.fffffffffffffp+51, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = { -0x1.ffffffffffffep+51, IGNORED } } },
> > > +
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = { -0x1.0000000000004p+51, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = { -0x1.0000000000002p+51, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = { -0x1.0000000000001p+51, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = { -0x1.0000000000000p+51, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = { -0x1.ffffffffffffcp+50, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = { -0x1.ffffffffffffep+50, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = { -0x1.ffffffffffffdp+50, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = { -0x1.ffffffffffffcp+50, IGNORED } } },
> > > +
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = { -1.00, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = { -0.75, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = { -0.50, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > +    .value2 = { .f = { -0.25, IGNORED } } }
> > > +};
> > > +
> > > +static union value answers_NEAREST_INT[] = {
> > > +  { .f = {  0.00, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH } },
> > > +  { .f = {  1.00, PASSTHROUGH } },
> > > +
> > > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000004p+51, PASSTHROUGH } },
> > > +
> > > +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
> > > +
> > > +  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
> > > +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> > > +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> > > +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> > > +
> > > +  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
> > > +  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
> > > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > +
> > > +  { .f = { -1.00, PASSTHROUGH } },
> > > +  { .f = { -1.00, PASSTHROUGH } },
> > > +  { .f = { -0.00, PASSTHROUGH } },
> > > +  { .f = { -0.00, PASSTHROUGH } }
> > > +};
> > > +
> > > +static union value answers_NEG_INF[] = {
> > > +  { .f = {  0.00, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH } },
> > > +
> > > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> > > +
> > > +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> > > +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
> > > +
> > > +  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
> > > +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> > > +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> > > +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> > > +
> > > +  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
> > > +  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
> > > +  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
> > > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > +
> > > +  { .f = { -1.00, PASSTHROUGH } },
> > > +  { .f = { -1.00, PASSTHROUGH } },
> > > +  { .f = { -1.00, PASSTHROUGH } },
> > > +  { .f = { -1.00, PASSTHROUGH } }
> > > +};
> > > +
> > > +static union value answers_POS_INF[] = {
> > > +  { .f = {  0.00, PASSTHROUGH } },
> > > +  { .f = {  1.00, PASSTHROUGH } },
> > > +  { .f = {  1.00, PASSTHROUGH } },
> > > +  { .f = {  1.00, PASSTHROUGH } },
> > > +
> > > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000004p+51, PASSTHROUGH } },
> > > +
> > > +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
> > > +
> > > +  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
> > > +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> > > +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> > > +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> > > +
> > > +  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
> > > +  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
> > > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > +
> > > +  { .f = { -1.00, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH } }
> > > +};
> > > +
> > > +static union value answers_ZERO[] = {
> > > +  { .f = {  0.00, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH } },
> > > +
> > > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> > > +
> > > +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> > > +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> > > +  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
> > > +
> > > +  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
> > > +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> > > +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> > > +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> > > +
> > > +  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
> > > +  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
> > > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > +
> > > +  { .f = { -1.00, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH } }
> > > +};
> > > +
> > > +union value *answers[] = {
> > > +  answers_NEAREST_INT,
> > > +  answers_NEG_INF,
> > > +  answers_POS_INF,
> > > +  answers_ZERO,
> > > +  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
> > > +};
> > > +
> > > +#include "sse4_1-round3.h"
> > > diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c
> > > new file mode 100644
> > > index 000000000000..d788ebda64dd
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c
> > > @@ -0,0 +1,208 @@
> > > +/* { dg-do run } */
> > > +/* { dg-require-effective-target vsx_hw } */
> > > +/* { dg-options "-O2 -mvsx" } */
> > > +
> > > +#include <stdio.h>
> > > +#define NO_WARN_X86_INTRINSICS 1
> > > +#include <smmintrin.h>
> > > +
> > > +#define VEC_T __m128
> > > +#define FP_T float
> > > +
> > > +#define ROUND_INTRIN(x, y, mode) _mm_round_ss (x, y, mode)
> > > +
> > > +#include "sse4_1-round-data.h"
> > > +
> > > +static struct data2 data[] = {
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0.00, IGNORED, IGNORED, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0.25, IGNORED, IGNORED, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0.50, IGNORED, IGNORED, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0.75, IGNORED, IGNORED, IGNORED } } },
> > > +
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0x1.fffff8p+21, IGNORED, IGNORED, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0x1.fffffap+21, IGNORED, IGNORED, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0x1.fffffcp+21, IGNORED, IGNORED, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0x1.fffffep+21, IGNORED, IGNORED, IGNORED } } },
> > > +
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0x1.fffffap+22, IGNORED, IGNORED, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0x1.fffffcp+22, IGNORED, IGNORED, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0x1.fffffep+22, IGNORED, IGNORED, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +    .value2 = { .f = {  0x1.fffffep+23, IGNORED, IGNORED, IGNORED } } },
> > > +
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +    .value2 = { .f = { -0x1.fffffep+23, IGNORED, IGNORED, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +    .value2 = { .f = { -0x1.fffffep+22, IGNORED, IGNORED, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +    .value2 = { .f = { -0x1.fffffcp+22, IGNORED, IGNORED, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +    .value2 = { .f = { -0x1.fffffap+22, IGNORED, IGNORED, IGNORED } } },
> > > +
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +    .value2 = { .f = { -0x1.fffffep+21, IGNORED, IGNORED, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +    .value2 = { .f = { -0x1.fffffcp+21, IGNORED, IGNORED, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +    .value2 = { .f = { -0x1.fffffap+21, IGNORED, IGNORED, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +    .value2 = { .f = { -0x1.fffff8p+21, IGNORED, IGNORED, IGNORED } } },
> > > +
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +    .value2 = { .f = { -1.00, IGNORED, IGNORED, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +    .value2 = { .f = { -0.75, IGNORED, IGNORED, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +    .value2 = { .f = { -0.50, IGNORED, IGNORED, IGNORED } } },
> > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +    .value2 = { .f = { -0.25, IGNORED, IGNORED, IGNORED } } }
> > > +};
> > > +
> > > +static union value answers_NEAREST_INT[] = {
> > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +
> > > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +
> > > +  { .f = {  0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +
> > > +  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +
> > > +  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +
> > > +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
> > > +};
> > > +
> > > +static union value answers_NEG_INF[] = {
> > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +
> > > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +
> > > +  { .f = {  0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +
> > > +  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +
> > > +  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +
> > > +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
> > > +};
> > > +
> > > +static union value answers_POS_INF[] = {
> > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +
> > > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +
> > > +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +
> > > +  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +
> > > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +
> > > +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
> > > +};
> > > +
> > > +static union value answers_ZERO[] = {
> > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +
> > > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +
> > > +  { .f = {  0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +
> > > +  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +
> > > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +
> > > +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
> > > +};
> > > +
> > > +union value *answers[] = {
> > > +  answers_NEAREST_INT,
> > > +  answers_NEG_INF,
> > > +  answers_POS_INF,
> > > +  answers_ZERO,
> > > +  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
> > > +};
> > > +
> > > +#include "sse4_1-round3.h"
> > > -- 
> > > 2.27.0
> > > 


* Re: [PING^4 PATCH v4 2/3] rs6000: Support SSE4.1 "round" intrinsics
  2021-11-19  2:24       ` [PING^3 " Paul A. Clarke
@ 2022-01-03 16:48         ` Paul A. Clarke
From: Paul A. Clarke @ 2022-01-03 16:48 UTC (permalink / raw)
  To: segher, wschmidt, gcc-patches

On Thu, Nov 18, 2021 at 08:24:52PM -0600, Paul A. Clarke via Gcc-patches wrote:
> On Mon, Nov 08, 2021 at 11:40:42AM -0600, Paul A. Clarke via Gcc-patches wrote:
> > On Tue, Oct 26, 2021 at 03:00:11PM -0500, Paul A. Clarke via Gcc-patches wrote:
> > > Patches 1/3 and 3/3 have been committed.
> > > This is only a ping for 2/3.
> > 
> > Gentle re-ping.
> 
> Gentle re-re-ping.

and once more. :-)

> > > On Mon, Oct 18, 2021 at 08:15:11PM -0500, Paul A. Clarke via Gcc-patches wrote:
> > > > Suppress exceptions (when specified) by saving, manipulating, and
> > > > restoring the FPSCR.  Similarly, save, set, and restore the floating-point
> > > > rounding mode when required.
> > > > 
> > > > No attempt is made to optimize writing the FPSCR (by checking if the new
> > > > value would be the same), other than using lighter-weight instructions
> > > > when possible. Note that explicit instruction scheduling "barriers" are
> > > > added to prevent floating-point computations from being moved before or
> > > > after the explicit FPSCR manipulations.  (That these are required has
> > > > been reported as an issue in GCC: PR102783.)
> > > > 
> > > > The scalar versions naively use the parallel versions to compute the
> > > > single scalar result and then construct the remainder of the result.
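
As a rough illustration of the paragraph above (not the code from the
patch), the "compute with the parallel version, then rebuild the rest"
approach can be sketched in portable C.  The type vec2d, the helpers
round_pd_sketch and round_sd_sketch, and the use of truncation as the
stand-in rounding are all assumptions made for this sketch.  Element 0
comes from the second operand and element 1 passes through from the
first, which is how the roundsd test data in this patch is laid out.

```c
#include <assert.h>

/* Hypothetical two-lane vector type standing in for __m128d.  */
typedef struct { double f[2]; } vec2d;

/* Stand-in for a "parallel" round: truncates every lane toward zero.
   Only meaningful for values that fit in a long long; real code would
   use a vector instruction instead.  */
static vec2d
round_pd_sketch (vec2d v)
{
  vec2d r;
  for (int i = 0; i < 2; i++)
    r.f[i] = (double) (long long) v.f[i];
  return r;
}

/* Scalar form: round all lanes of B, keep only lane 0 of that result,
   and take lane 1 unchanged from A.  */
static vec2d
round_sd_sketch (vec2d a, vec2d b)
{
  vec2d r = round_pd_sketch (b);
  r.f[1] = a.f[1];
  return r;
}

/* Small self-check mirroring the IGNORED/PASSTHROUGH layout.  */
static void
check_round_sd_sketch (void)
{
  vec2d a = { { 123.0, 42.5 } };   /* lane 0 ignored, lane 1 passes through */
  vec2d b = { { -0.75, 999.9 } };  /* lane 0 rounded, lane 1 ignored */
  vec2d r = round_sd_sketch (a, b);
  assert (r.f[0] == 0.0);
  assert (r.f[1] == 42.5);
}
```

The extra work on the discarded lane is what makes this "naive": it
trades a little wasted computation for reuse of the vector path.
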
> > > > 
> > > > Of minor note, the values of _MM_FROUND_TO_NEG_INF and _MM_FROUND_TO_ZERO
> > > > are swapped from the corresponding values on x86 so as to match the
> > > > corresponding rounding mode values in the Power ISA.
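
To make the swap concrete, here is a small sketch comparing the two
encodings.  The x86 values are those used by config/i386/smmintrin.h;
the Power values follow the FPSCR RN field encoding in the Power ISA
(0 = nearest, 1 = toward zero, 2 = toward +infinity, 3 = toward
-infinity).  The enum and function names are invented for illustration
and are not taken from the patch.

```c
#include <assert.h>

/* x86 rounding-mode encodings (as in config/i386/smmintrin.h).  */
enum x86_fround
{
  X86_FROUND_TO_NEAREST_INT = 0x00,
  X86_FROUND_TO_NEG_INF = 0x01,
  X86_FROUND_TO_POS_INF = 0x02,
  X86_FROUND_TO_ZERO = 0x03
};

/* Power ISA FPSCR[RN] encodings.  */
enum ppc_rn
{
  PPC_RN_NEAREST = 0,
  PPC_RN_ZERO = 1,
  PPC_RN_POS_INF = 2,
  PPC_RN_NEG_INF = 3
};

/* Hypothetical translation that would be needed if the x86 values
   were kept as-is.  Only the "1" and "3" entries differ.  */
static enum ppc_rn
x86_to_ppc_rn (enum x86_fround mode)
{
  switch (mode)
    {
    case X86_FROUND_TO_NEG_INF:
      return PPC_RN_NEG_INF;
    case X86_FROUND_TO_ZERO:
      return PPC_RN_ZERO;
    case X86_FROUND_TO_POS_INF:
      return PPC_RN_POS_INF;
    default:
      return PPC_RN_NEAREST;
    }
}

/* Self-check: nearest and +inf already agree, the other two cross.  */
static void
check_rn_mapping (void)
{
  assert ((int) x86_to_ppc_rn (X86_FROUND_TO_NEAREST_INT) == 0);
  assert ((int) x86_to_ppc_rn (X86_FROUND_TO_POS_INF) == 2);
  assert ((int) x86_to_ppc_rn (X86_FROUND_TO_NEG_INF) == 3);
  assert ((int) x86_to_ppc_rn (X86_FROUND_TO_ZERO) == 1);
}
```

Defining the two macros with the Power values presumably lets the mode
bits be used as the RN value directly, so no table like the one above
is needed.  Code that uses the symbolic names is unaffected; code that
hard-codes the x86 numbers would not be.
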
> > > > 
> > > > Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
> > > > convert _mm_ceil* and _mm_floor* into macros. This matches the current
> > > > analogous implementations in config/i386/smmintrin.h.
> > > > 
> > > > Function signatures match the analogous functions in config/i386/smmintrin.h.
> > > > 
> > > > Add tests for _mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss,
> > > > modeled after the very similar "floor" and "ceil" tests.
> > > > 
> > > > Include basic tests, plus tests at the boundaries of floating-point
> > > > representation, both positive and negative.  Test all of the
> > > > parameterized rounding modes as well as the C99 rounding modes and
> > > > the interactions between the two.
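
On why the double-precision test data clusters around p+50, p+51 and p+52: a double has 52 fraction bits, so the spacing between neighbors is 0.25 just below 2^51, 0.5 just below 2^52, and 1 from 2^52 upward, where no fractional part can be represented at all. The sketch below is not how the tests compute their answers. It uses the well-known add-and-subtract-2^52 trick, assumes the default round-to-nearest-even mode is in effect, and reproduces a few entries of answers_NEAREST_INT. The demo_* names are hypothetical.

```c
#include <assert.h>

/* Round to an integral value under round-to-nearest-even by adding and
   subtracting 2^52.  Valid for |x| < 2^52.  The volatile temporary keeps
   the compiler from simplifying the two operations away.  */
static double
demo_rint (double x)
{
  volatile double t;
  if (x < 0.0)
    {
      t = -x + 0x1p52;
      return -(t - 0x1p52);
    }
  t = x + 0x1p52;
  return t - 0x1p52;
}

static void
demo_boundary_check (void)
{
  assert (demo_rint (0.50) == 0.0);	/* tie goes to the even neighbor */
  assert (demo_rint (0.75) == 1.0);
  /* 2^51 - 0.5: a tie between an odd integer and 2^51.  */
  assert (demo_rint (0x1.ffffffffffffep+50) == 0x1p51);
  /* 2^52 - 0.5: the largest double with a fractional part.  */
  assert (demo_rint (0x1.fffffffffffffp+51) == 0x1p52);
  assert (demo_rint (-0x1.fffffffffffffp+51) == -0x1p52);
}
```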
> > > > 
> > > > Exceptions are not explicitly tested.
> > > > 
> > > > 2021-10-18  Paul A. Clarke  <pc@us.ibm.com>
> > > > 
> > > > gcc
> > > > 	* config/rs6000/smmintrin.h (_mm_round_pd, _mm_round_ps,
> > > > 	_mm_round_sd, _mm_round_ss, _MM_FROUND_TO_NEAREST_INT,
> > > > 	_MM_FROUND_TO_ZERO, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_NEG_INF,
> > > > 	_MM_FROUND_CUR_DIRECTION, _MM_FROUND_RAISE_EXC, _MM_FROUND_NO_EXC,
> > > > 	_MM_FROUND_NINT, _MM_FROUND_FLOOR, _MM_FROUND_CEIL, _MM_FROUND_TRUNC,
> > > > 	_MM_FROUND_RINT, _MM_FROUND_NEARBYINT): New.
> > > > 	* config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd,
> > > > 	_mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss):
> > > > 	Convert from function to macro.
> > > > 
> > > > gcc/testsuite
> > > > 	* gcc.target/powerpc/sse4_1-round3.h: New.
> > > > 	* gcc.target/powerpc/sse4_1-roundpd.c: New.
> > > > 	* gcc.target/powerpc/sse4_1-roundps.c: New.
> > > > 	* gcc.target/powerpc/sse4_1-roundsd.c: New.
> > > > 	* gcc.target/powerpc/sse4_1-roundss.c: New.
> > > > ---
> > > >  gcc/config/rs6000/smmintrin.h                 | 292 ++++++++++++++----
> > > >  .../gcc.target/powerpc/sse4_1-round3.h        |  81 +++++
> > > >  .../gcc.target/powerpc/sse4_1-roundpd.c       | 143 +++++++++
> > > >  .../gcc.target/powerpc/sse4_1-roundps.c       |  98 ++++++
> > > >  .../gcc.target/powerpc/sse4_1-roundsd.c       | 256 +++++++++++++++
> > > >  .../gcc.target/powerpc/sse4_1-roundss.c       | 208 +++++++++++++
> > > >  6 files changed, 1014 insertions(+), 64 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
> > > >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
> > > >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c
> > > > 
> > > > diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
> > > > index 90ce03d22709..6bb03e6e20ac 100644
> > > > --- a/gcc/config/rs6000/smmintrin.h
> > > > +++ b/gcc/config/rs6000/smmintrin.h
> > > > @@ -42,6 +42,234 @@
> > > >  #include <altivec.h>
> > > >  #include <tmmintrin.h>
> > > >  
> > > > +/* Rounding mode macros. */
> > > > +#define _MM_FROUND_TO_NEAREST_INT       0x00
> > > > +#define _MM_FROUND_TO_ZERO              0x01
> > > > +#define _MM_FROUND_TO_POS_INF           0x02
> > > > +#define _MM_FROUND_TO_NEG_INF           0x03
> > > > +#define _MM_FROUND_CUR_DIRECTION        0x04
> > > > +
> > > > +#define _MM_FROUND_NINT		\
> > > > +  (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)
> > > > +#define _MM_FROUND_FLOOR	\
> > > > +  (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC)
> > > > +#define _MM_FROUND_CEIL		\
> > > > +  (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC)
> > > > +#define _MM_FROUND_TRUNC	\
> > > > +  (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC)
> > > > +#define _MM_FROUND_RINT		\
> > > > +  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC)
> > > > +#define _MM_FROUND_NEARBYINT	\
> > > > +  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC)
> > > > +
> > > > +#define _MM_FROUND_RAISE_EXC            0x00
> > > > +#define _MM_FROUND_NO_EXC               0x08
> > > > +
> > > > +extern __inline __m128d
> > > > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > > +_mm_round_pd (__m128d __A, int __rounding)
> > > > +{
> > > > +  __v2df __r;
> > > > +  union {
> > > > +    double __fr;
> > > > +    long long __fpscr;
> > > > +  } __enables_save, __fpscr_save;
> > > > +
> > > > +  if (__rounding & _MM_FROUND_NO_EXC)
> > > > +    {
> > > > +      /* Save enabled exceptions, disable all exceptions,
> > > > +	 and preserve the rounding mode.  */
> > > > +#ifdef _ARCH_PWR9
> > > > +      __asm__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
> > > > +      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
> > > > +#else
> > > > +      __fpscr_save.__fr = __builtin_mffs ();
> > > > +      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
> > > > +      __fpscr_save.__fpscr &= ~0xf8;
> > > > +      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
> > > > +#endif
> > > > +      /* Insert an artificial "read/write" reference to the variable
> > > > +	 read below, to ensure the compiler does not schedule
> > > > +	 a read/use of the variable before the FPSCR is modified, above.
> > > > +	 This can be removed if and when GCC PR102783 is fixed.
> > > > +       */
> > > > +      __asm__ ("" : "+wa" (__A));
> > > > +    }
> > > > +
> > > > +  switch (__rounding)
> > > > +    {
> > > > +      case _MM_FROUND_TO_NEAREST_INT:
> > > > +	__fpscr_save.__fr = __builtin_mffsl ();
> > > > +	__attribute__ ((fallthrough));
> > > > +      case _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC:
> > > > +	__builtin_set_fpscr_rn (0b00);
> > > > +	/* Insert an artificial "read/write" reference to the variable
> > > > +	   read below, to ensure the compiler does not schedule
> > > > +	   a read/use of the variable before the FPSCR is modified, above.
> > > > +	   This can be removed if and when GCC PR102783 is fixed.
> > > > +	 */
> > > > +	__asm__ ("" : "+wa" (__A));
> > > > +
> > > > +	__r = vec_rint ((__v2df) __A);
> > > > +
> > > > +	/* Insert an artificial "read" reference to the variable written
> > > > +	   above, to ensure the compiler does not schedule the computation
> > > > +	   of the value after the manipulation of the FPSCR, below.
> > > > +	   This can be removed if and when GCC PR102783 is fixed.
> > > > +	 */
> > > > +	__asm__ ("" : : "wa" (__r));
> > > > +	__builtin_set_fpscr_rn (__fpscr_save.__fpscr);
> > > > +	break;
> > > > +      case _MM_FROUND_TO_NEG_INF:
> > > > +      case _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC:
> > > > +	__r = vec_floor ((__v2df) __A);
> > > > +	break;
> > > > +      case _MM_FROUND_TO_POS_INF:
> > > > +      case _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC:
> > > > +	__r = vec_ceil ((__v2df) __A);
> > > > +	break;
> > > > +      case _MM_FROUND_TO_ZERO:
> > > > +      case _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC:
> > > > +	__r = vec_trunc ((__v2df) __A);
> > > > +	break;
> > > > +      case _MM_FROUND_CUR_DIRECTION:
> > > > +	__r = vec_rint ((__v2df) __A);
> > > > +	break;
> > > > +    }
> > > > +  if (__rounding & _MM_FROUND_NO_EXC)
> > > > +    {
> > > > +      /* Insert an artificial "read" reference to the variable written
> > > > +	 above, to ensure the compiler does not schedule the computation
> > > > +	 of the value after the manipulation of the FPSCR, below.
> > > > +	 This can be removed if and when GCC PR102783 is fixed.
> > > > +       */
> > > > +      __asm__ ("" : : "wa" (__r));
> > > > +      /* Restore enabled exceptions.  */
> > > > +      __fpscr_save.__fr = __builtin_mffsl ();
> > > > +      __fpscr_save.__fpscr |= __enables_save.__fpscr;
> > > > +      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
> > > > +    }
> > > > +  return (__m128d) __r;
> > > > +}
> > > > +
> > > > +extern __inline __m128d
> > > > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > > +_mm_round_sd (__m128d __A, __m128d __B, int __rounding)
> > > > +{
> > > > +  __B = _mm_round_pd (__B, __rounding);
> > > > +  __v2df __r = { ((__v2df) __B)[0], ((__v2df) __A)[1] };
> > > > +  return (__m128d) __r;
> > > > +}
> > > > +
> > > > +extern __inline __m128
> > > > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > > +_mm_round_ps (__m128 __A, int __rounding)
> > > > +{
> > > > +  __v4sf __r;
> > > > +  union {
> > > > +    double __fr;
> > > > +    long long __fpscr;
> > > > +  } __enables_save, __fpscr_save;
> > > > +
> > > > +  if (__rounding & _MM_FROUND_NO_EXC)
> > > > +    {
> > > > +      /* Save enabled exceptions, disable all exceptions,
> > > > +	 and preserve the rounding mode.  */
> > > > +#ifdef _ARCH_PWR9
> > > > +      __asm__ ("mffsce %0" : "=f" (__fpscr_save.__fr));
> > > > +      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
> > > > +#else
> > > > +      __fpscr_save.__fr = __builtin_mffs ();
> > > > +      __enables_save.__fpscr = __fpscr_save.__fpscr & 0xf8;
> > > > +      __fpscr_save.__fpscr &= ~0xf8;
> > > > +      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
> > > > +#endif
> > > > +      /* Insert an artificial "read/write" reference to the variable
> > > > +	 read below, to ensure the compiler does not schedule
> > > > +	 a read/use of the variable before the FPSCR is modified, above.
> > > > +	 This can be removed if and when GCC PR102783 is fixed.
> > > > +       */
> > > > +      __asm__ ("" : "+wa" (__A));
> > > > +    }
> > > > +
> > > > +  switch (__rounding)
> > > > +    {
> > > > +      case _MM_FROUND_TO_NEAREST_INT:
> > > > +	__fpscr_save.__fr = __builtin_mffsl ();
> > > > +	__attribute__ ((fallthrough));
> > > > +      case _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC:
> > > > +	__builtin_set_fpscr_rn (0b00);
> > > > +	/* Insert an artificial "read/write" reference to the variable
> > > > +	   read below, to ensure the compiler does not schedule
> > > > +	   a read/use of the variable before the FPSCR is modified, above.
> > > > +	   This can be removed if and when GCC PR102783 is fixed.
> > > > +	 */
> > > > +	__asm__ ("" : "+wa" (__A));
> > > > +
> > > > +	__r = vec_rint ((__v4sf) __A);
> > > > +
> > > > +	/* Insert an artificial "read" reference to the variable written
> > > > +	   above, to ensure the compiler does not schedule the computation
> > > > +	   of the value after the manipulation of the FPSCR, below.
> > > > +	   This can be removed if and when GCC PR102783 is fixed.
> > > > +	 */
> > > > +	__asm__ ("" : : "wa" (__r));
> > > > +	__builtin_set_fpscr_rn (__fpscr_save.__fpscr);
> > > > +	break;
> > > > +      case _MM_FROUND_TO_NEG_INF:
> > > > +      case _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC:
> > > > +	__r = vec_floor ((__v4sf) __A);
> > > > +	break;
> > > > +      case _MM_FROUND_TO_POS_INF:
> > > > +      case _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC:
> > > > +	__r = vec_ceil ((__v4sf) __A);
> > > > +	break;
> > > > +      case _MM_FROUND_TO_ZERO:
> > > > +      case _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC:
> > > > +	__r = vec_trunc ((__v4sf) __A);
> > > > +	break;
> > > > +      case _MM_FROUND_CUR_DIRECTION:
> > > > +	__r = vec_rint ((__v4sf) __A);
> > > > +	break;
> > > > +    }
> > > > +  if (__rounding & _MM_FROUND_NO_EXC)
> > > > +    {
> > > > +      /* Insert an artificial "read" reference to the variable written
> > > > +	 above, to ensure the compiler does not schedule the computation
> > > > +	 of the value after the manipulation of the FPSCR, below.
> > > > +	 This can be removed if and when GCC PR102783 is fixed.
> > > > +       */
> > > > +      __asm__ ("" : : "wa" (__r));
> > > > +      /* Restore enabled exceptions.  */
> > > > +      __fpscr_save.__fr = __builtin_mffsl ();
> > > > +      __fpscr_save.__fpscr |= __enables_save.__fpscr;
> > > > +      __builtin_mtfsf (0b00000011, __fpscr_save.__fr);
> > > > +    }
> > > > +  return (__m128) __r;
> > > > +}
> > > > +
> > > > +extern __inline __m128
> > > > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > > +_mm_round_ss (__m128 __A, __m128 __B, int __rounding)
> > > > +{
> > > > +  __B = _mm_round_ps (__B, __rounding);
> > > > +  __v4sf __r = (__v4sf) __A;
> > > > +  __r[0] = ((__v4sf) __B)[0];
> > > > +  return (__m128) __r;
> > > > +}
> > > > +
> > > > +#define _mm_ceil_pd(V)	   _mm_round_pd ((V), _MM_FROUND_CEIL)
> > > > +#define _mm_ceil_sd(D, V)  _mm_round_sd ((D), (V), _MM_FROUND_CEIL)
> > > > +
> > > > +#define _mm_floor_pd(V)	   _mm_round_pd ((V), _MM_FROUND_FLOOR)
> > > > +#define _mm_floor_sd(D, V) _mm_round_sd ((D), (V), _MM_FROUND_FLOOR)
> > > > +
> > > > +#define _mm_ceil_ps(V)	   _mm_round_ps ((V), _MM_FROUND_CEIL)
> > > > +#define _mm_ceil_ss(D, V)  _mm_round_ss ((D), (V), _MM_FROUND_CEIL)
> > > > +
> > > > +#define _mm_floor_ps(V)	   _mm_round_ps ((V), _MM_FROUND_FLOOR)
> > > > +#define _mm_floor_ss(D, V) _mm_round_ss ((D), (V), _MM_FROUND_FLOOR)
> > > > +
> > > >  extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > > >  _mm_insert_epi8 (__m128i const __A, int const __D, int const __N)
> > > >  {
> > > > @@ -210,70 +438,6 @@ _mm_testnzc_si128 (__m128i __A, __m128i __B)
> > > >  
> > > >  #define _mm_test_mix_ones_zeros(M, V) _mm_testnzc_si128 ((M), (V))
> > > >  
> > > > -__inline __m128d
> > > > -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > > -_mm_ceil_pd (__m128d __A)
> > > > -{
> > > > -  return (__m128d) vec_ceil ((__v2df) __A);
> > > > -}
> > > > -
> > > > -__inline __m128d
> > > > -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > > -_mm_ceil_sd (__m128d __A, __m128d __B)
> > > > -{
> > > > -  __v2df __r = vec_ceil ((__v2df) __B);
> > > > -  __r[1] = ((__v2df) __A)[1];
> > > > -  return (__m128d) __r;
> > > > -}
> > > > -
> > > > -__inline __m128d
> > > > -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > > -_mm_floor_pd (__m128d __A)
> > > > -{
> > > > -  return (__m128d) vec_floor ((__v2df) __A);
> > > > -}
> > > > -
> > > > -__inline __m128d
> > > > -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > > -_mm_floor_sd (__m128d __A, __m128d __B)
> > > > -{
> > > > -  __v2df __r = vec_floor ((__v2df) __B);
> > > > -  __r[1] = ((__v2df) __A)[1];
> > > > -  return (__m128d) __r;
> > > > -}
> > > > -
> > > > -__inline __m128
> > > > -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > > -_mm_ceil_ps (__m128 __A)
> > > > -{
> > > > -  return (__m128) vec_ceil ((__v4sf) __A);
> > > > -}
> > > > -
> > > > -__inline __m128
> > > > -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > > -_mm_ceil_ss (__m128 __A, __m128 __B)
> > > > -{
> > > > -  __v4sf __r = (__v4sf) __A;
> > > > -  __r[0] = __builtin_ceil (((__v4sf) __B)[0]);
> > > > -  return __r;
> > > > -}
> > > > -
> > > > -__inline __m128
> > > > -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > > -_mm_floor_ps (__m128 __A)
> > > > -{
> > > > -  return (__m128) vec_floor ((__v4sf) __A);
> > > > -}
> > > > -
> > > > -__inline __m128
> > > > -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > > -_mm_floor_ss (__m128 __A, __m128 __B)
> > > > -{
> > > > -  __v4sf __r = (__v4sf) __A;
> > > > -  __r[0] = __builtin_floor (((__v4sf) __B)[0]);
> > > > -  return __r;
> > > > -}
> > > > -
> > > >  #ifdef _ARCH_PWR8
> > > >  extern __inline __m128i
> > > >  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > > diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h b/gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
> > > > new file mode 100644
> > > > index 000000000000..de6cbf7be438
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-round3.h
> > > > @@ -0,0 +1,81 @@
> > > > +#include <smmintrin.h>
> > > > +#include <fenv.h>
> > > > +#include "sse4_1-check.h"
> > > > +
> > > > +#define DIM(a) (sizeof (a) / sizeof (a)[0])
> > > > +
> > > > +static int roundings[] =
> > > > +  {
> > > > +    _MM_FROUND_TO_NEAREST_INT,
> > > > +    _MM_FROUND_TO_NEG_INF,
> > > > +    _MM_FROUND_TO_POS_INF,
> > > > +    _MM_FROUND_TO_ZERO,
> > > > +    _MM_FROUND_CUR_DIRECTION
> > > > +  };
> > > > +
> > > > +static int modes[] =
> > > > +  {
> > > > +    FE_TONEAREST,
> > > > +    FE_UPWARD,
> > > > +    FE_DOWNWARD,
> > > > +    FE_TOWARDZERO
> > > > +  };
> > > > +
> > > > +static void
> > > > +TEST (void)
> > > > +{
> > > > +  int i, j, ri, mi, round_save;
> > > > +
> > > > +  round_save = fegetround ();
> > > > +  for (mi = 0; mi < DIM (modes); mi++) {
> > > > +    fesetround (modes[mi]);
> > > > +    for (i = 0; i < DIM (data); i++) {
> > > > +      for (ri = 0; ri < DIM (roundings); ri++) {
> > > > +	union value guess;
> > > > +	union value *current_answers = answers[ri];
> > > > +	switch ( roundings[ri] ) {
> > > > +	  case _MM_FROUND_TO_NEAREST_INT:
> > > > +	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
> > > > +				    _MM_FROUND_TO_NEAREST_INT);
> > > > +	    break;
> > > > +	  case _MM_FROUND_TO_NEG_INF:
> > > > +	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
> > > > +				    _MM_FROUND_TO_NEG_INF);
> > > > +	    break;
> > > > +	  case _MM_FROUND_TO_POS_INF:
> > > > +	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
> > > > +				    _MM_FROUND_TO_POS_INF);
> > > > +	    break;
> > > > +	  case _MM_FROUND_TO_ZERO:
> > > > +	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
> > > > +				    _MM_FROUND_TO_ZERO);
> > > > +	    break;
> > > > +	  case _MM_FROUND_CUR_DIRECTION:
> > > > +	    guess.x = ROUND_INTRIN (data[i].value1.x, data[i].value2.x,
> > > > +				    _MM_FROUND_CUR_DIRECTION);
> > > > +	    switch ( modes[mi] ) {
> > > > +	      case FE_TONEAREST:
> > > > +		current_answers = answers_NEAREST_INT;
> > > > +		break;
> > > > +	      case FE_UPWARD:
> > > > +		current_answers = answers_POS_INF;
> > > > +		break;
> > > > +	      case FE_DOWNWARD:
> > > > +		current_answers = answers_NEG_INF;
> > > > +		break;
> > > > +	      case FE_TOWARDZERO:
> > > > +		current_answers = answers_ZERO;
> > > > +		break;
> > > > +	    }
> > > > +	    break;
> > > > +	  default:
> > > > +	    abort ();
> > > > +	}
> > > > +	for (j = 0; j < DIM (guess.f); j++)
> > > > +	  if (guess.f[j] != current_answers[i].f[j])
> > > > +	    abort ();
> > > > +      }
> > > > +    }
> > > > +  }
> > > > +  fesetround (round_save);
> > > > +}
> > > > diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
> > > > new file mode 100644
> > > > index 000000000000..58d9cc524167
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundpd.c
> > > > @@ -0,0 +1,143 @@
> > > > +/* { dg-do run } */
> > > > +/* { dg-require-effective-target vsx_hw } */
> > > > +/* { dg-options "-O2 -mvsx" } */
> > > > +
> > > > +#define NO_WARN_X86_INTRINSICS 1
> > > > +#include <smmintrin.h>
> > > > +
> > > > +#define VEC_T __m128d
> > > > +#define FP_T double
> > > > +
> > > > +#define ROUND_INTRIN(x, ignored, mode) _mm_round_pd (x, mode)
> > > > +
> > > > +#include "sse4_1-round-data.h"
> > > > +
> > > > +struct data2 data[] = {
> > > > +  { .value1 = { .f = {  0.00,  0.25 } } },
> > > > +  { .value1 = { .f = {  0.50,  0.75 } } },
> > > > +
> > > > +  { .value1 = { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffdp+50 } } },
> > > > +  { .value1 = { .f = {  0x1.ffffffffffffep+50,  0x1.fffffffffffffp+50 } } },
> > > > +  { .value1 = { .f = {  0x1.0000000000000p+51,  0x1.0000000000001p+51 } } },
> > > > +  { .value1 = { .f = {  0x1.0000000000002p+51,  0x1.0000000000003p+51 } } },
> > > > +
> > > > +  { .value1 = { .f = {  0x1.ffffffffffffep+51,  0x1.fffffffffffffp+51 } } },
> > > > +  { .value1 = { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } } },
> > > > +
> > > > +  { .value1 = { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } } },
> > > > +  { .value1 = { .f = { -0x1.fffffffffffffp+51, -0x1.ffffffffffffep+51 } } },
> > > > +
> > > > +  { .value1 = { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } } },
> > > > +  { .value1 = { .f = { -0x1.0000000000001p+51, -0x1.0000000000000p+51 } } },
> > > > +  { .value1 = { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffep+50 } } },
> > > > +  { .value1 = { .f = { -0x1.ffffffffffffdp+50, -0x1.ffffffffffffcp+50 } } },
> > > > +
> > > > +  { .value1 = { .f = { -1.00, -0.75 } } },
> > > > +  { .value1 = { .f = { -0.50, -0.25 } } }
> > > > +};
> > > > +
> > > > +union value answers_NEAREST_INT[] = {
> > > > +  { .f = {  0.00,  0.00 } },
> > > > +  { .f = {  0.00,  1.00 } },
> > > > +
> > > > +  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
> > > > +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
> > > > +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
> > > > +  { .f = {  0x1.0000000000002p+51,  0x1.0000000000004p+51 } },
> > > > +
> > > > +  { .f = {  0x1.ffffffffffffep+51,  0x1.0000000000000p+52 } },
> > > > +  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
> > > > +
> > > > +  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
> > > > +  { .f = { -0x1.0000000000000p+52, -0x1.ffffffffffffep+51 } },
> > > > +
> > > > +  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
> > > > +  { .f = { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } },
> > > > +  { .f = { -0x1.ffffffffffffcp+50, -0x1.0000000000000p+51 } },
> > > > +  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
> > > > +
> > > > +  { .f = { -1.00, -1.00 } },
> > > > +  { .f = {  0.00,  0.00 } }
> > > > +};
> > > > +
> > > > +union value answers_NEG_INF[] = {
> > > > +  { .f = {  0.00,  0.00 } },
> > > > +  { .f = {  0.00,  0.00 } },
> > > > +
> > > > +  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
> > > > +  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
> > > > +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
> > > > +  { .f = {  0x1.0000000000002p+51,  0x1.0000000000002p+51 } },
> > > > +
> > > > +  { .f = {  0x1.ffffffffffffep+51,  0x1.ffffffffffffep+51 } },
> > > > +  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
> > > > +
> > > > +  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
> > > > +  { .f = { -0x1.0000000000000p+52, -0x1.ffffffffffffep+51 } },
> > > > +
> > > > +  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
> > > > +  { .f = { -0x1.0000000000002p+51, -0x1.0000000000000p+51 } },
> > > > +  { .f = { -0x1.ffffffffffffcp+50, -0x1.0000000000000p+51 } },
> > > > +  { .f = { -0x1.0000000000000p+51, -0x1.ffffffffffffcp+50 } },
> > > > +
> > > > +  { .f = { -1.00, -1.00 } },
> > > > +  { .f = { -1.00, -1.00 } }
> > > > +};
> > > > +
> > > > +union value answers_POS_INF[] = {
> > > > +  { .f = {  0.00,  1.00 } },
> > > > +  { .f = {  1.00,  1.00 } },
> > > > +
> > > > +  { .f = {  0x1.ffffffffffffcp+50,  0x1.0000000000000p+51 } },
> > > > +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
> > > > +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000002p+51 } },
> > > > +  { .f = {  0x1.0000000000002p+51,  0x1.0000000000004p+51 } },
> > > > +
> > > > +  { .f = {  0x1.ffffffffffffep+51,  0x1.0000000000000p+52 } },
> > > > +  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
> > > > +
> > > > +  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
> > > > +  { .f = { -0x1.ffffffffffffep+51, -0x1.ffffffffffffep+51 } },
> > > > +
> > > > +  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
> > > > +  { .f = { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } },
> > > > +  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
> > > > +  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
> > > > +
> > > > +  { .f = { -1.00,  0.00 } },
> > > > +  { .f = {  0.00,  0.00 } }
> > > > +};
> > > > +
> > > > +union value answers_ZERO[] = {
> > > > +  { .f = {  0.00,  0.00 } },
> > > > +  { .f = {  0.00,  0.00 } },
> > > > +
> > > > +  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
> > > > +  { .f = {  0x1.ffffffffffffcp+50,  0x1.ffffffffffffcp+50 } },
> > > > +  { .f = {  0x1.0000000000000p+51,  0x1.0000000000000p+51 } },
> > > > +  { .f = {  0x1.0000000000002p+51,  0x1.0000000000002p+51 } },
> > > > +
> > > > +  { .f = {  0x1.ffffffffffffep+51,  0x1.ffffffffffffep+51 } },
> > > > +  { .f = {  0x1.0000000000000p+52,  0x1.0000000000001p+52 } },
> > > > +
> > > > +  { .f = { -0x1.0000000000001p+52, -0x1.0000000000000p+52 } },
> > > > +  { .f = { -0x1.ffffffffffffep+51, -0x1.ffffffffffffep+51 } },
> > > > +
> > > > +  { .f = { -0x1.0000000000004p+51, -0x1.0000000000002p+51 } },
> > > > +  { .f = { -0x1.0000000000000p+51, -0x1.0000000000000p+51 } },
> > > > +  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
> > > > +  { .f = { -0x1.ffffffffffffcp+50, -0x1.ffffffffffffcp+50 } },
> > > > +
> > > > +  { .f = { -1.00,  0.00 } },
> > > > +  { .f = {  0.00,  0.00 } }
> > > > +};
> > > > +
> > > > +union value *answers[] = {
> > > > +  answers_NEAREST_INT,
> > > > +  answers_NEG_INF,
> > > > +  answers_POS_INF,
> > > > +  answers_ZERO,
> > > > +  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
> > > > +};
> > > > +
> > > > +#include "sse4_1-round3.h"
> > > > diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
> > > > new file mode 100644
> > > > index 000000000000..4b0366dfddf3
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundps.c
> > > > @@ -0,0 +1,98 @@
> > > > +/* { dg-do run } */
> > > > +/* { dg-require-effective-target vsx_hw } */
> > > > +/* { dg-options "-O2 -mvsx" } */
> > > > +
> > > > +#define NO_WARN_X86_INTRINSICS 1
> > > > +#include <smmintrin.h>
> > > > +
> > > > +#define VEC_T __m128
> > > > +#define FP_T float
> > > > +
> > > > +#define ROUND_INTRIN(x, ignored, mode) _mm_round_ps (x, mode)
> > > > +
> > > > +#include "sse4_1-round-data.h"
> > > > +
> > > > +struct data2 data[] = {
> > > > +  { .value1 = { .f = {  0.00,  0.25,  0.50,  0.75 } } },
> > > > +
> > > > +  { .value1 = { .f = {  0x1.fffff8p+21,  0x1.fffffap+21,
> > > > +			0x1.fffffcp+21,  0x1.fffffep+21 } } },
> > > > +  { .value1 = { .f = {  0x1.fffffap+22,  0x1.fffffcp+22,
> > > > +			0x1.fffffep+22,  0x1.fffffep+23 } } },
> > > > +  { .value1 = { .f = { -0x1.fffffep+23, -0x1.fffffep+22,
> > > > +		       -0x1.fffffcp+22, -0x1.fffffap+22 } } },
> > > > +  { .value1 = { .f = { -0x1.fffffep+21, -0x1.fffffcp+21,
> > > > +		       -0x1.fffffap+21, -0x1.fffff8p+21 } } },
> > > > +
> > > > +  { .value1 = { .f = { -1.00, -0.75, -0.50, -0.25 } } }
> > > > +};
> > > > +
> > > > +union value answers_NEAREST_INT[] = {
> > > > +  { .f = {  0.00,  0.00,  0.00,  1.00 } },
> > > > +
> > > > +  { .f = {  0x1.fffff8p+21,  0x1.fffff8p+21,
> > > > +            0x1.000000p+22,  0x1.000000p+22 } },
> > > > +  { .f = {  0x1.fffff8p+22,  0x1.fffffcp+22,
> > > > +            0x1.000000p+23,  0x1.fffffep+23 } },
> > > > +  { .f = { -0x1.fffffep+23, -0x1.000000p+23,
> > > > +           -0x1.fffffcp+22, -0x1.fffff8p+22 } },
> > > > +  { .f = { -0x1.000000p+22, -0x1.000000p+22,
> > > > +           -0x1.fffff8p+21, -0x1.fffff8p+21 } },
> > > > +
> > > > +  { .f = { -1.00, -1.00,  0.00,  0.00 } }
> > > > +};
> > > > +
> > > > +union value answers_NEG_INF[] = {
> > > > +  { .f = {  0.00,  0.00,  0.00,  0.00 } },
> > > > +
> > > > +  { .f = {  0x1.fffff8p+21,  0x1.fffff8p+21,
> > > > +            0x1.fffff8p+21,  0x1.fffff8p+21 } },
> > > > +  { .f = {  0x1.fffff8p+22,  0x1.fffffcp+22,
> > > > +            0x1.fffffcp+22,  0x1.fffffep+23 } },
> > > > +  { .f = { -0x1.fffffep+23, -0x1.000000p+23,
> > > > +           -0x1.fffffcp+22, -0x1.fffffcp+22 } },
> > > > +  { .f = { -0x1.000000p+22, -0x1.000000p+22,
> > > > +           -0x1.000000p+22, -0x1.fffff8p+21 } },
> > > > +
> > > > +  { .f = { -1.00, -1.00, -1.00, -1.00 } }
> > > > +};
> > > > +
> > > > +union value answers_POS_INF[] = {
> > > > +  { .f = {  0.00,  1.00,  1.00,  1.00 } },
> > > > +
> > > > +  { .f = {  0x1.fffff8p+21,  0x1.000000p+22,
> > > > +            0x1.000000p+22,  0x1.000000p+22 } },
> > > > +  { .f = {  0x1.fffffcp+22,  0x1.fffffcp+22,
> > > > +            0x1.000000p+23,  0x1.fffffep+23 } },
> > > > +  { .f = { -0x1.fffffep+23, -0x1.fffffcp+22,
> > > > +           -0x1.fffffcp+22, -0x1.fffff8p+22 } },
> > > > +  { .f = { -0x1.fffff8p+21, -0x1.fffff8p+21,
> > > > +           -0x1.fffff8p+21, -0x1.fffff8p+21 } },
> > > > +
> > > > +  { .f = { -1.00,  0.00,  0.00,  0.00 } }
> > > > +};
> > > > +
> > > > +union value answers_ZERO[] = {
> > > > +  { .f = {  0.00,  0.00,  0.00,  0.00 } },
> > > > +
> > > > +  { .f = {  0x1.fffff8p+21,  0x1.fffff8p+21,
> > > > +            0x1.fffff8p+21,  0x1.fffff8p+21 } },
> > > > +  { .f = {  0x1.fffff8p+22,  0x1.fffffcp+22,
> > > > +            0x1.fffffcp+22,  0x1.fffffep+23 } },
> > > > +  { .f = { -0x1.fffffep+23, -0x1.fffffcp+22,
> > > > +           -0x1.fffffcp+22, -0x1.fffff8p+22 } },
> > > > +  { .f = { -0x1.fffff8p+21, -0x1.fffff8p+21,
> > > > +           -0x1.fffff8p+21, -0x1.fffff8p+21 } },
> > > > +
> > > > +  { .f = { -1.00,  0.00,  0.00,  0.00 } }
> > > > +};
> > > > +
> > > > +union value *answers[] = {
> > > > +  answers_NEAREST_INT,
> > > > +  answers_NEG_INF,
> > > > +  answers_POS_INF,
> > > > +  answers_ZERO,
> > > > +  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
> > > > +};
> > > > +
> > > > +#include "sse4_1-round3.h"
> > > > diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
> > > > new file mode 100644
> > > > index 000000000000..4f8d9e08c93d
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundsd.c
> > > > @@ -0,0 +1,256 @@
> > > > +/* { dg-do run } */
> > > > +/* { dg-require-effective-target vsx_hw } */
> > > > +/* { dg-options "-O2 -mvsx" } */
> > > > +
> > > > +#include <stdio.h>
> > > > +#define NO_WARN_X86_INTRINSICS 1
> > > > +#include <smmintrin.h>
> > > > +
> > > > +#define VEC_T __m128d
> > > > +#define FP_T double
> > > > +
> > > > +#define ROUND_INTRIN(x, y, mode) _mm_round_sd (x, y, mode)
> > > > +
> > > > +#include "sse4_1-round-data.h"
> > > > +
> > > > +static struct data2 data[] = {
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0.00, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0.25, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0.50, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0.75, IGNORED } } },
> > > > +
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0x1.ffffffffffffcp+50, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0x1.ffffffffffffdp+50, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0x1.ffffffffffffep+50, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0x1.fffffffffffffp+50, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0x1.0000000000000p+51, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0x1.0000000000001p+51, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0x1.0000000000002p+51, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0x1.0000000000003p+51, IGNORED } } },
> > > > +
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0x1.ffffffffffffep+51, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0x1.fffffffffffffp+51, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0x1.0000000000000p+52, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0x1.0000000000001p+52, IGNORED } } },
> > > > +
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -0x1.0000000000001p+52, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -0x1.0000000000000p+52, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -0x1.fffffffffffffp+51, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -0x1.ffffffffffffep+51, IGNORED } } },
> > > > +
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -0x1.0000000000004p+51, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -0x1.0000000000002p+51, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -0x1.0000000000001p+51, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -0x1.0000000000000p+51, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -0x1.ffffffffffffcp+50, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -0x1.ffffffffffffep+50, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -0x1.ffffffffffffdp+50, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -0x1.ffffffffffffcp+50, IGNORED } } },
> > > > +
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -1.00, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -0.75, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -0.50, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -0.25, IGNORED } } }
> > > > +};
> > > > +
> > > > +static union value answers_NEAREST_INT[] = {
> > > > +  { .f = {  0.00, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH } },
> > > > +  { .f = {  1.00, PASSTHROUGH } },
> > > > +
> > > > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000004p+51, PASSTHROUGH } },
> > > > +
> > > > +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
> > > > +
> > > > +  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
> > > > +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> > > > +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> > > > +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> > > > +
> > > > +  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
> > > > +  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
> > > > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > > > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > > > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > > > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > > +
> > > > +  { .f = { -1.00, PASSTHROUGH } },
> > > > +  { .f = { -1.00, PASSTHROUGH } },
> > > > +  { .f = { -0.00, PASSTHROUGH } },
> > > > +  { .f = { -0.00, PASSTHROUGH } }
> > > > +};
> > > > +
> > > > +static union value answers_NEG_INF[] = {
> > > > +  { .f = {  0.00, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH } },
> > > > +
> > > > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> > > > +
> > > > +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> > > > +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
> > > > +
> > > > +  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
> > > > +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> > > > +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> > > > +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> > > > +
> > > > +  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
> > > > +  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
> > > > +  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
> > > > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > > > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > > > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > > > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > > +
> > > > +  { .f = { -1.00, PASSTHROUGH } },
> > > > +  { .f = { -1.00, PASSTHROUGH } },
> > > > +  { .f = { -1.00, PASSTHROUGH } },
> > > > +  { .f = { -1.00, PASSTHROUGH } }
> > > > +};
> > > > +
> > > > +static union value answers_POS_INF[] = {
> > > > +  { .f = {  0.00, PASSTHROUGH } },
> > > > +  { .f = {  1.00, PASSTHROUGH } },
> > > > +  { .f = {  1.00, PASSTHROUGH } },
> > > > +  { .f = {  1.00, PASSTHROUGH } },
> > > > +
> > > > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000004p+51, PASSTHROUGH } },
> > > > +
> > > > +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
> > > > +
> > > > +  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
> > > > +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> > > > +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> > > > +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> > > > +
> > > > +  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
> > > > +  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
> > > > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > > > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > > > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > > +
> > > > +  { .f = { -1.00, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH } }
> > > > +};
> > > > +
> > > > +static union value answers_ZERO[] = {
> > > > +  { .f = {  0.00, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH } },
> > > > +
> > > > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > > +  { .f = {  0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000000p+51, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000002p+51, PASSTHROUGH } },
> > > > +
> > > > +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> > > > +  { .f = {  0x1.ffffffffffffep+51, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000000p+52, PASSTHROUGH } },
> > > > +  { .f = {  0x1.0000000000001p+52, PASSTHROUGH } },
> > > > +
> > > > +  { .f = { -0x1.0000000000001p+52, PASSTHROUGH } },
> > > > +  { .f = { -0x1.0000000000000p+52, PASSTHROUGH } },
> > > > +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> > > > +  { .f = { -0x1.ffffffffffffep+51, PASSTHROUGH } },
> > > > +
> > > > +  { .f = { -0x1.0000000000004p+51, PASSTHROUGH } },
> > > > +  { .f = { -0x1.0000000000002p+51, PASSTHROUGH } },
> > > > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > > > +  { .f = { -0x1.0000000000000p+51, PASSTHROUGH } },
> > > > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > > +  { .f = { -0x1.ffffffffffffcp+50, PASSTHROUGH } },
> > > > +
> > > > +  { .f = { -1.00, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH } }
> > > > +};
> > > > +
> > > > +union value *answers[] = {
> > > > +  answers_NEAREST_INT,
> > > > +  answers_NEG_INF,
> > > > +  answers_POS_INF,
> > > > +  answers_ZERO,
> > > > +  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
> > > > +};
> > > > +
> > > > +#include "sse4_1-round3.h"
> > > > diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c
> > > > new file mode 100644
> > > > index 000000000000..d788ebda64dd
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-roundss.c
> > > > @@ -0,0 +1,208 @@
> > > > +/* { dg-do run } */
> > > > +/* { dg-require-effective-target vsx_hw } */
> > > > +/* { dg-options "-O2 -mvsx" } */
> > > > +
> > > > +#include <stdio.h>
> > > > +#define NO_WARN_X86_INTRINSICS 1
> > > > +#include <smmintrin.h>
> > > > +
> > > > +#define VEC_T __m128
> > > > +#define FP_T float
> > > > +
> > > > +#define ROUND_INTRIN(x, y, mode) _mm_round_ss (x, y, mode)
> > > > +
> > > > +#include "sse4_1-round-data.h"
> > > > +
> > > > +static struct data2 data[] = {
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0.00, IGNORED, IGNORED, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0.25, IGNORED, IGNORED, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0.50, IGNORED, IGNORED, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0.75, IGNORED, IGNORED, IGNORED } } },
> > > > +
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0x1.fffff8p+21, IGNORED, IGNORED, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0x1.fffffap+21, IGNORED, IGNORED, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0x1.fffffcp+21, IGNORED, IGNORED, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0x1.fffffep+21, IGNORED, IGNORED, IGNORED } } },
> > > > +
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0x1.fffffap+22, IGNORED, IGNORED, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0x1.fffffcp+22, IGNORED, IGNORED, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0x1.fffffep+22, IGNORED, IGNORED, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +    .value2 = { .f = {  0x1.fffffep+23, IGNORED, IGNORED, IGNORED } } },
> > > > +
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -0x1.fffffep+23, IGNORED, IGNORED, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -0x1.fffffep+22, IGNORED, IGNORED, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -0x1.fffffcp+22, IGNORED, IGNORED, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -0x1.fffffap+22, IGNORED, IGNORED, IGNORED } } },
> > > > +
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -0x1.fffffep+21, IGNORED, IGNORED, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -0x1.fffffcp+21, IGNORED, IGNORED, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -0x1.fffffap+21, IGNORED, IGNORED, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -0x1.fffff8p+21, IGNORED, IGNORED, IGNORED } } },
> > > > +
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -1.00, IGNORED, IGNORED, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -0.75, IGNORED, IGNORED, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -0.50, IGNORED, IGNORED, IGNORED } } },
> > > > +  { .value1 = { .f = { IGNORED, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +    .value2 = { .f = { -0.25, IGNORED, IGNORED, IGNORED } } }
> > > > +};
> > > > +
> > > > +static union value answers_NEAREST_INT[] = {
> > > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +
> > > > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +
> > > > +  { .f = {  0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +
> > > > +  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +
> > > > +  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +
> > > > +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
> > > > +};
> > > > +
> > > > +static union value answers_NEG_INF[] = {
> > > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +
> > > > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +
> > > > +  { .f = {  0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +
> > > > +  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +
> > > > +  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +
> > > > +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
> > > > +};
> > > > +
> > > > +static union value answers_POS_INF[] = {
> > > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +
> > > > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0x1.000000p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +
> > > > +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0x1.000000p+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +
> > > > +  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +
> > > > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +
> > > > +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
> > > > +};
> > > > +
> > > > +static union value answers_ZERO[] = {
> > > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +
> > > > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +
> > > > +  { .f = {  0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +
> > > > +  { .f = { -0x1.fffffep+23, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -0x1.fffffcp+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -0x1.fffff8p+22, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +
> > > > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = { -0x1.fffff8p+21, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +
> > > > +  { .f = { -1.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } },
> > > > +  { .f = {  0.00, PASSTHROUGH, PASSTHROUGH, PASSTHROUGH } }
> > > > +};
> > > > +
> > > > +union value *answers[] = {
> > > > +  answers_NEAREST_INT,
> > > > +  answers_NEG_INF,
> > > > +  answers_POS_INF,
> > > > +  answers_ZERO,
> > > > +  0 /* CUR_DIRECTION answers depend on current rounding mode.  */
> > > > +};
> > > > +
> > > > +#include "sse4_1-round3.h"
> > > > -- 
> > > > 2.27.0
> > > > 

* Re: [PATCH v4 2/3] rs6000: Support SSE4.1 "round" intrinsics
@ 2022-01-11 19:25 David Edelsohn
  0 siblings, 0 replies; 13+ messages in thread
From: David Edelsohn @ 2022-01-11 19:25 UTC (permalink / raw)
  To: Paul A. Clarke, Segher Boessenkool, Bill Schmidt; +Cc: GCC Patches

Suppress exceptions (when specified) by saving, manipulating, and
restoring the FPSCR.  Similarly, save, set, and restore the floating-point
rounding mode when required.

No attempt is made to optimize writing the FPSCR (by checking if the new
value would be the same), other than using lighter-weight instructions
when possible. Note that explicit instruction scheduling "barriers" are
added to prevent floating-point computations from being moved before or
after the explicit FPSCR manipulations.  (That these are required has
been reported as an issue in GCC: PR102783.)
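
As a rough illustration of the save/modify/restore pattern described
above, here is a small C sketch that uses a plain variable in place of the
real FPSCR.  Everything in it is hypothetical: model_ctrl, the two masks,
with_saved_ctrl and probe are made up for the example.  It does not touch
hardware state and does not show the scheduling barriers.

```c
/* Hypothetical software model of a floating-point control word.  This is
   NOT the real FPSCR.  Layout assumed for this sketch only: rounding
   mode in the low two bits, exception-enable bits under mask 0xf8.  */
#define MODEL_RN_MASK     0x03u
#define MODEL_ENABLE_MASK 0xf8u

/* Start with every enable bit set and round-to-nearest selected.  */
static unsigned model_ctrl = 0xf8u;

/* Run OP on X with exceptions disabled and rounding mode RN, then put
   the control word back exactly as it was found.  */
static double
with_saved_ctrl (double (*op) (double), double x, unsigned rn)
{
  unsigned saved = model_ctrl;                  /* save */
  model_ctrl &= ~MODEL_ENABLE_MASK;             /* suppress exceptions */
  model_ctrl = (model_ctrl & ~MODEL_RN_MASK)
	       | (rn & MODEL_RN_MASK);          /* set rounding mode */
  double result = op (x);                       /* compute */
  model_ctrl = saved;                           /* restore */
  return result;
}

/* Helper that records what the control word looked like during OP.  */
static unsigned seen_ctrl;

static double
probe (double x)
{
  seen_ctrl = model_ctrl;
  return x;
}
```

Calling with_saved_ctrl (probe, 1.5, 0x01u) leaves model_ctrl at its
original value, while probe observes the temporary one.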

The scalar versions naively use the parallel versions to compute the
single scalar result and then construct the remainder of the result.
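
The "compute with the parallel version, then rebuild the rest" approach
can be sketched in portable C as below.  The type sk_v2df and both
functions are hypothetical stand-ins, not the header's code, and
truncation is used only because a C cast provides it without library
support.  The lane layout follows the test data in this patch: lane 0 of
the second operand is rounded, lane 1 of the first operand passes through.

```c
/* Hypothetical two-lane vector of doubles.  */
typedef struct { double lane[2]; } sk_v2df;

/* "Parallel" version: round every lane toward zero.  The cast is only
   defined for |x| < 2^63, which is an assumption of this sketch.  */
static sk_v2df
sk_trunc_pd (sk_v2df v)
{
  sk_v2df r;
  for (int i = 0; i < 2; i++)
    r.lane[i] = (double) (long long) v.lane[i];
  return r;
}

/* "Scalar" version: reuse the parallel routine on B, keep its lane 0,
   and take the remaining lane from A unchanged.  */
static sk_v2df
sk_trunc_sd (sk_v2df a, sk_v2df b)
{
  sk_v2df r = sk_trunc_pd (b);
  r.lane[1] = a.lane[1];
  return r;
}
```

The extra work done on lane 1 of B is simply discarded, which is the
"naive" part of the approach.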

Of minor note, the values of _MM_FROUND_TO_NEG_INF and _MM_FROUND_TO_ZERO
are swapped from the corresponding values on x86 so as to match the
corresponding rounding mode values in the Power ISA.
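
To make the consequence concrete, here is a small C sketch of the two
numberings.  The enum names and the mapping function are hypothetical, and
the numeric values are stated as assumptions: x86 is taken to use
nearest=0, down=1, up=2, zero=3, and the Power rounding-mode field is
taken to use nearest=0, zero=1, up=2, down=3.

```c
/* Assumed x86 numbering of the four directed rounding modes.  */
enum sk_x86_mode
{
  SK_X86_NEAREST = 0,
  SK_X86_NEG_INF = 1,
  SK_X86_POS_INF = 2,
  SK_X86_ZERO    = 3
};

/* Assumed numbering of the Power rounding-mode field.  */
enum sk_power_rn
{
  SK_RN_NEAREST = 0,
  SK_RN_ZERO    = 1,
  SK_RN_POS_INF = 2,
  SK_RN_NEG_INF = 3
};

/* The translation that would be needed if the x86 numbers were kept.
   Only entries 1 and 3 differ, which is the swap described above.  */
static int
x86_mode_to_power_rn (int mode)
{
  static const int map[4] =
    { SK_RN_NEAREST, SK_RN_NEG_INF, SK_RN_POS_INF, SK_RN_ZERO };
  return map[mode & 3];
}
```

Because the macros themselves carry the swapped values, no such table is
needed at run time.  The flip side is that code which hard-codes the
numbers instead of using the macro names would not be portable.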

Move implementations of _mm_ceil* and _mm_floor* into _mm_round*, and
convert _mm_ceil* and _mm_floor* into macros. This matches the current
analogous implementations in config/i386/smmintrin.h.
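
The shape of that refactoring, with one mode-parameterized routine and
thin macros on top, might look like the following scalar sketch.  The
names sk_mode, sk_round, sk_floor and sk_ceil are hypothetical, and the
rounding itself is done with an integer cast purely to keep the example
free of library calls.

```c
/* Hypothetical stand-ins for the rounding-mode arguments.  */
enum sk_mode { SK_FLOOR, SK_CEIL };

/* One routine does the work for every mode.  The cast truncates toward
   zero, so this sketch assumes |x| < 2^63 and does not preserve the
   sign of a zero result.  */
static double
sk_round (double x, enum sk_mode mode)
{
  double t = (double) (long long) x;
  if (mode == SK_FLOOR && t > x)
    t -= 1.0;                   /* truncation went up: step back down */
  if (mode == SK_CEIL && t < x)
    t += 1.0;                   /* truncation went down: step back up */
  return t;
}

/* floor and ceil become macros over the shared routine.  */
#define sk_floor(x) sk_round ((x), SK_FLOOR)
#define sk_ceil(x)  sk_round ((x), SK_CEIL)
```

With this layout, adding another rounding mode only means extending
sk_round, and the convenience names stay one line each.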

Function signatures match the analogous functions in config/i386/smmintrin.h.

Add tests for _mm_round_pd, _mm_round_ps, _mm_round_sd, _mm_round_ss,
modeled after the very similar "floor" and "ceil" tests.

Include basic tests, plus tests at the boundaries of the floating-point
representation, both positive and negative.  Test all of the parameterized
rounding modes as well as the C99 rounding modes, and the interactions
between the two.
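
For readers decoding the boundary inputs: a double has 52 fraction bits,
so representable values are 0.25 apart just below 2^51, 0.5 apart from
2^51 up to 2^52, and 1 apart from 2^52 onward.  The two helper functions
below are hypothetical and only make the hexadecimal test inputs easier
to read as offsets from those powers of two.

```c
/* Distance of X from 2^51.  The subtraction is exact for the boundary
   inputs used in the tests.  */
static double
offset_from_2_51 (double x)
{
  return x - 0x1p51;
}

/* Distance of X from 2^52.  */
static double
offset_from_2_52 (double x)
{
  return x - 0x1p52;
}
```

For example, 0x1.ffffffffffffdp+50 is 2^51 - 0.75 and
0x1.0000000000001p+51 is 2^51 + 0.5, so both have a fractional part for
the rounding modes to act on, whereas 0x1.0000000000001p+52 is 2^52 + 1
and is already an integer.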

Exceptions are not explicitly tested.

2021-10-18  Paul A. Clarke  <pc@us.ibm.com>

gcc
* config/rs6000/smmintrin.h (_mm_round_pd, _mm_round_ps,
_mm_round_sd, _mm_round_ss, _MM_FROUND_TO_NEAREST_INT,
_MM_FROUND_TO_ZERO, _MM_FROUND_TO_POS_INF, _MM_FROUND_TO_NEG_INF,
_MM_FROUND_CUR_DIRECTION, _MM_FROUND_RAISE_EXC, _MM_FROUND_NO_EXC,
_MM_FROUND_NINT, _MM_FROUND_FLOOR, _MM_FROUND_CEIL, _MM_FROUND_TRUNC,
_MM_FROUND_RINT, _MM_FROUND_NEARBYINT): New.
* config/rs6000/smmintrin.h (_mm_ceil_pd, _mm_ceil_ps, _mm_ceil_sd,
_mm_ceil_ss, _mm_floor_pd, _mm_floor_ps, _mm_floor_sd, _mm_floor_ss):
Convert from function to macro.

gcc/testsuite
* gcc.target/powerpc/sse4_1-round3.h: New.
* gcc.target/powerpc/sse4_1-roundpd.c: New.
* gcc.target/powerpc/sse4_1-roundps.c: New.
* gcc.target/powerpc/sse4_1-roundsd.c: New.
* gcc.target/powerpc/sse4_1-roundss.c: New.

Okay.

Thanks, David

end of thread, other threads:[~2022-01-11 19:26 UTC | newest]

Thread overview: 13+ messages
2021-10-19  1:15 [PATCH v4 0/3] rs6000: Support more SSE4 intrinsics Paul A. Clarke
2021-10-19  1:15 ` [PATCH v4 1/3] rs6000: Add nmmintrin.h to extra_headers Paul A. Clarke
2021-10-19 13:10   ` Bill Schmidt
2021-10-19 14:27     ` Segher Boessenkool
2021-10-19  1:15 ` [PATCH v4 2/3] rs6000: Support SSE4.1 "round" intrinsics Paul A. Clarke
2021-10-26 20:00   ` [PING PATCH " Paul A. Clarke
2021-11-08 17:40     ` [PING^2 " Paul A. Clarke
2021-11-19  2:24       ` [PING^3 " Paul A. Clarke
2022-01-03 16:48         ` [PING^4 " Paul A. Clarke
2021-10-19  1:15 ` [PATCH v4 3/3] rs6000: Guard some x86 intrinsics implementations Paul A. Clarke
2021-10-19 14:32   ` Segher Boessenkool
2021-10-19 15:23     ` Paul A. Clarke
2022-01-11 19:25 [PATCH v4 2/3] rs6000: Support SSE4.1 "round" intrinsics David Edelsohn
