public inbox for gcc-patches@gcc.gnu.org
* [patch][i386, AVX] Adding missing mask[z]_range[_round]_s[d,s] intrinsics
@ 2017-12-04  9:45 Makhotina, Olga
  2018-02-05  7:09 ` Kirill Yukhin
From: Makhotina, Olga @ 2017-12-04  9:45 UTC
  To: 'gcc-patches@gcc.gnu.org'
  Cc: 'Kirill Yukhin', Makhotina, Olga, Peryt, Sebastian

[-- Attachment #1: Type: text/plain, Size: 2562 bytes --]

Hi,

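This patch adds the missing masked scalar range intrinsics,
_mm_mask[z]_range[_round]_s[d,s].  To support them, the
__builtin_ia32_ranges[d,s]128_round builtins are replaced by
__builtin_ia32_ranges[d,s]128_mask_round, which take an additional
pass-through vector and mask.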
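
For readers unfamiliar with the mask/maskz naming, here is a rough
portable sketch of the intended low-lane behaviour. It is not part of
the patch. It models only the plain-minimum case that the new run tests
compute in CALC. The model_* names and the model_v2df type are
hypothetical stand-ins for __m128d and the intrinsics. NaN handling and
the other immediate settings are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Two-lane stand-in for __m128d; lane 0 is the "scalar" low element.  */
typedef struct { double lane[2]; } model_v2df;

/* Unmasked model: minimum of the low lanes, upper lane copied from the
   first source (the same reference the run tests build in CALC).  */
static model_v2df
model_range_sd_min (model_v2df a, model_v2df b)
{
  model_v2df r = a;
  /* NaN handling of the real instruction is not modelled here.  */
  assert (a.lane[0] == a.lane[0] && b.lane[0] == b.lane[0]);
  r.lane[0] = (a.lane[0] <= b.lane[0]) ? a.lane[0] : b.lane[0];
  return r;
}

/* Merge masking: when bit 0 of U is clear, the low lane is taken from
   the pass-through vector W instead of the computed result.  */
static model_v2df
model_mask_range_sd_min (model_v2df w, uint8_t u, model_v2df a, model_v2df b)
{
  model_v2df r = model_range_sd_min (a, b);
  if (!(u & 1))
    r.lane[0] = w.lane[0];
  return r;
}

/* Zero masking: when bit 0 of U is clear, the low lane is zeroed.  */
static model_v2df
model_maskz_range_sd_min (uint8_t u, model_v2df a, model_v2df b)
{
  model_v2df r = model_range_sd_min (a, b);
  if (!(u & 1))
    r.lane[0] = 0.0;
  return r;
}
```

In both masked forms only bit 0 of the mask matters, and the upper lane
always comes from the first source operand. The run tests check the same
thing by applying MASK_MERGE and MASK_ZERO to a single element.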

04.12.2017 Olga Makhotina  <olga.makhotina@intel.com>

gcc/
	* config/i386/avx512dqintrin.h (_mm_mask_range_sd, _mm_maskz_range_sd,
	_mm_mask_range_round_sd, _mm_maskz_range_round_sd, _mm_mask_range_ss,
	_mm_maskz_range_ss, _mm_mask_range_round_ss,
	_mm_maskz_range_round_ss): New intrinsics.
	(__builtin_ia32_rangesd128_round, __builtin_ia32_rangess128_round): Remove.
	(__builtin_ia32_rangesd128_mask_round,
	__builtin_ia32_rangess128_mask_round): New builtins.
	* config/i386/i386-builtin.def (__builtin_ia32_rangesd128_round,
	__builtin_ia32_rangess128_round): Remove.
	(__builtin_ia32_rangesd128_mask_round,
	__builtin_ia32_rangess128_mask_round): New builtins.
	* config/i386/sse.md (ranges<mode><round_saeonly_name>): Renamed to ...
	(ranges<mode><mask_scalar_name><round_saeonly_scalar_name>): ... this.
	((match_operand:VF_128 2 "<round_saeonly_nimm_predicate>"
	"<round_saeonly_constraint>")): Changed to ...
	((match_operand:VF_128 2 "<round_saeonly_scalar_nimm_predicate>"
	"<round_saeonly_scalar_constraint>")): ... this.
	("vrange<ssescalarmodesuffix>\t{%3, <round_saeonly_op4>%2, %1, %0|
	%0, %1, %2<round_saeonly_op4>, %3}"): Changed to ...
	("vrange<ssescalarmodesuffix>\t{%3, <round_saeonly_scalar_mask_op4>%2, %1,
	%0<mask_scalar_operand4>|%0<mask_scalar_operand4>, %1,
	%2<round_saeonly_scalar_mask_op4>, %3}"): ... this.

04.12.2017 Olga Makhotina  <olga.makhotina@intel.com>

gcc/testsuite/
	* gcc.target/i386/avx512dq-vrangesd-1.c (_mm_mask_range_sd,
	_mm_maskz_range_sd, _mm_mask_range_round_sd,
	_mm_maskz_range_round_sd): Test new intrinsics.
	* gcc.target/i386/avx512dq-vrangesd-2.c (_mm_range_sd, _mm_mask_range_sd,
	_mm_maskz_range_sd, _mm_range_round_sd, _mm_mask_range_round_sd,
	_mm_maskz_range_round_sd): Test new intrinsics.
	* gcc.target/i386/avx512dq-vrangess-1.c (_mm_mask_range_ss,
	_mm_maskz_range_ss, _mm_mask_range_round_ss,
	_mm_maskz_range_round_ss): Test new intrinsics.
	* gcc.target/i386/avx512dq-vrangess-2.c (_mm_range_ss, _mm_mask_range_ss,
	_mm_maskz_range_ss, _mm_range_round_ss, _mm_mask_range_round_ss,
	_mm_maskz_range_round_ss): Test new intrinsics.
	* gcc.target/i386/avx-1.c (__builtin_ia32_rangesd128_round,
	__builtin_ia32_rangess128_round): Remove builtins.
	(__builtin_ia32_rangesd128_mask_round,
	__builtin_ia32_rangess128_mask_round): Test new builtins.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
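
The tests pass small immediates (3, 1 and 0x04) as the last integer
argument. The patch itself does not spell out the encoding. Going by
Intel's published description of VRANGESD/VRANGESS, bits [1:0] select
the operation and bits [3:2] select the sign control. The sketch below
decodes the immediate under that assumption. The enum and function names
are hypothetical and exist only for illustration.

```c
#include <assert.h>

/* Assumed meaning of imm8[1:0].  */
enum model_range_op
{
  MODEL_RANGE_MIN = 0,
  MODEL_RANGE_MAX = 1,
  MODEL_RANGE_ABS_MIN = 2,
  MODEL_RANGE_ABS_MAX = 3
};

/* Assumed meaning of imm8[3:2].  */
enum model_range_sign
{
  MODEL_SIGN_FROM_SRC1 = 0,
  MODEL_SIGN_FROM_RESULT = 1,
  MODEL_SIGN_CLEAR = 2,
  MODEL_SIGN_SET = 3
};

struct model_range_ctl
{
  enum model_range_op op;
  enum model_range_sign sign;
};

/* Split the 4-bit immediate into its two 2-bit fields.  */
static struct model_range_ctl
model_decode_range_imm (int imm8)
{
  struct model_range_ctl c;
  /* Same range as the const_0_to_15_operand predicate in sse.md.  */
  assert (imm8 >= 0 && imm8 <= 15);
  c.op = (enum model_range_op) (imm8 & 3);
  c.sign = (enum model_range_sign) ((imm8 >> 2) & 3);
  return c;
}
```

Under this reading, the IMM value 0x04 used by the run tests selects a
minimum whose sign comes from the comparison result. That is consistent
with CALC computing a plain minimum of the low elements.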

Is it ok for trunk?

Thanks,
Olga


[-- Attachment #2: 0001-range.patch --]
[-- Type: application/octet-stream, Size: 23466 bytes --]

From fdacf0127ab3f3e919bde6bb86d9cee57476b02e Mon Sep 17 00:00:00 2001
From: Olga Makhotina <olga.makhotina@intel.com>
Date: Mon, 20 Nov 2017 13:18:13 +0100
Subject: [PATCH] range

---
 gcc/config/i386/avx512dqintrin.h                   | 180 ++++++++++++++++++---
 gcc/config/i386/i386-builtin.def                   |   4 +-
 gcc/config/i386/sse.md                             |   6 +-
 gcc/testsuite/gcc.target/i386/avx-1.c              |   6 +-
 .../gcc.target/i386/avx512dq-vrangesd-1.c          |  10 ++
 .../gcc.target/i386/avx512dq-vrangesd-2.c          |  66 ++++++++
 .../gcc.target/i386/avx512dq-vrangess-1.c          |   9 ++
 .../gcc.target/i386/avx512dq-vrangess-2.c          |  66 ++++++++
 gcc/testsuite/gcc.target/i386/sse-13.c             |   6 +-
 gcc/testsuite/gcc.target/i386/sse-23.c             |   6 +-
 10 files changed, 330 insertions(+), 29 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512dq-vrangesd-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512dq-vrangess-2.c

diff --git a/gcc/config/i386/avx512dqintrin.h b/gcc/config/i386/avx512dqintrin.h
index 8e887d8..83b9637 100644
--- a/gcc/config/i386/avx512dqintrin.h
+++ b/gcc/config/i386/avx512dqintrin.h
@@ -1223,18 +1223,70 @@ extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_range_sd (__m128d __A, __m128d __B, int __C)
 {
-  return (__m128d) __builtin_ia32_rangesd128_round ((__v2df) __A,
+  return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
 						   (__v2df) __B, __C,
+						   (__v2df)
+						   _mm_setzero_pd (),
+						   (__mmask8) -1,
+						   _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_range_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B, int __C)
+{
+  return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
+						   (__v2df) __B, __C,
+						   (__v2df) __W,
+						   (__mmask8) __U,
 						   _MM_FROUND_CUR_DIRECTION);
 }
 
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_range_sd (__mmask8 __U, __m128d __A, __m128d __B, int __C)
+{
+  return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
+						   (__v2df) __B, __C,
+						   (__v2df)
+						   _mm_setzero_pd (),
+						   (__mmask8) __U,
+						   _MM_FROUND_CUR_DIRECTION);
+}
 
 extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_range_ss (__m128 __A, __m128 __B, int __C)
 {
-  return (__m128) __builtin_ia32_rangess128_round ((__v4sf) __A,
+  return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
 						  (__v4sf) __B, __C,
+						  (__v4sf)
+						  _mm_setzero_ps (),
+						  (__mmask8) -1,
+						  _MM_FROUND_CUR_DIRECTION);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_range_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B, int __C)
+{
+  return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
+						  (__v4sf) __B, __C,
+						  (__v4sf) __W,
+						  (__mmask8) __U,
+						  _MM_FROUND_CUR_DIRECTION);
+}
+
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_range_ss (__mmask8 __U, __m128 __A, __m128 __B, int __C)
+{
+  return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
+						  (__v4sf) __B, __C,
+						  (__v4sf)
+						  _mm_setzero_ps (),
+						  (__mmask8) __U,
 						  _MM_FROUND_CUR_DIRECTION);
 }
 
@@ -1242,18 +1294,68 @@ extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_range_round_sd (__m128d __A, __m128d __B, int __C, const int __R)
 {
-  return (__m128d) __builtin_ia32_rangesd128_round ((__v2df) __A,
+  return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
 						   (__v2df) __B, __C,
-						   __R);
+						   (__v2df)
+						   _mm_setzero_pd (),
+						   (__mmask8) -1, __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_range_round_sd (__m128d __W, __mmask8 __U, __m128d __A, __m128d __B,
+			 int __C, const int __R)
+{
+  return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
+						   (__v2df) __B, __C,
+						   (__v2df) __W,
+						   (__mmask8) __U, __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_range_round_sd (__mmask8 __U, __m128d __A, __m128d __B, int __C,
+			  const int __R)
+{
+  return (__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df) __A,
+						   (__v2df) __B, __C,
+						   (__v2df)
+						   _mm_setzero_pd (),
+						   (__mmask8) __U, __R);
 }
 
 extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_range_round_ss (__m128 __A, __m128 __B, int __C, const int __R)
 {
-  return (__m128) __builtin_ia32_rangess128_round ((__v4sf) __A,
+  return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
 						  (__v4sf) __B, __C,
-						  __R);
+						  (__v4sf)
+						  _mm_setzero_ps (),
+						  (__mmask8) -1, __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_range_round_ss (__m128 __W, __mmask8 __U, __m128 __A, __m128 __B,
+			 int __C, const int __R)
+{
+  return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
+						  (__v4sf) __B, __C,
+						  (__v4sf) __W,
+						  (__mmask8) __U, __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_maskz_range_round_ss (__mmask8 __U, __m128 __A, __m128 __B, int __C,
+			  const int __R)
+{
+  return (__m128) __builtin_ia32_rangess128_mask_round ((__v4sf) __A,
+						  (__v4sf) __B, __C,
+						  (__v4sf)
+						  _mm_setzero_ps (),
+						  (__mmask8) __U, __R);
 }
 
 extern __inline __mmask8
@@ -2148,23 +2250,65 @@ _mm512_fpclass_ps_mask (__m512 __A, const int __imm)
 #define _kshiftri_mask8(X, Y)						\
   ((__mmask8) __builtin_ia32_kshiftriqi ((__mmask8)(X), (__mmask8)(Y)))
 
-#define _mm_range_sd(A, B, C)						\
-  ((__m128d) __builtin_ia32_rangesd128_round ((__v2df)(__m128d)(A),	\
-    (__v2df)(__m128d)(B), (int)(C),					\
-    _MM_FROUND_CUR_DIRECTION))
+#define _mm_range_sd(A, B, C)						 \
+  ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
+    (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), 	 \
+    (__mmask8) -1, _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_mask_range_sd(W, U, A, B, C)				 \
+  ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
+    (__v2df)(__m128d)(B), (int)(C), (__v2df)(__m128d)(W), 		 \
+    (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_maskz_range_sd(U, A, B, C)					 \
+  ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
+    (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (), 	 \
+    (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
 
 #define _mm_range_ss(A, B, C)						\
-  ((__m128) __builtin_ia32_rangess128_round ((__v4sf)(__m128)(A),	\
-    (__v4sf)(__m128)(B), (int)(C),					\
-    _MM_FROUND_CUR_DIRECTION))
+  ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A),	\
+    (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (),		\
+    (__mmask8) -1, _MM_FROUND_CUR_DIRECTION))
 
-#define _mm_range_round_sd(A, B, C, R)					\
-  ((__m128d) __builtin_ia32_rangesd128_round ((__v2df)(__m128d)(A),	\
-    (__v2df)(__m128d)(B), (int)(C), (R)))
+#define _mm_mask_range_ss(W, U, A, B, C)				\
+  ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A),	\
+    (__v4sf)(__m128)(B), (int)(C), (__v4sf)(__m128)(W),			\
+    (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_maskz_range_ss(U, A, B, C)					\
+  ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A),	\
+    (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (),		\
+    (__mmask8)(U), _MM_FROUND_CUR_DIRECTION))
+
+#define _mm_range_round_sd(A, B, C, R)					 \
+  ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
+    (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (),		 \
+    (__mmask8) -1, (R)))
+
+#define _mm_mask_range_round_sd(W, U, A, B, C, R)			 \
+  ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
+    (__v2df)(__m128d)(B), (int)(C), (__v2df)(__m128d)(W),		 \
+    (__mmask8)(U), (R)))
+
+#define _mm_maskz_range_round_sd(U, A, B, C, R)				 \
+  ((__m128d) __builtin_ia32_rangesd128_mask_round ((__v2df)(__m128d)(A), \
+    (__v2df)(__m128d)(B), (int)(C), (__v2df) _mm_setzero_pd (),		 \
+    (__mmask8)(U), (R)))
 
 #define _mm_range_round_ss(A, B, C, R)					\
-  ((__m128) __builtin_ia32_rangess128_round ((__v4sf)(__m128)(A),	\
-    (__v4sf)(__m128)(B), (int)(C), (R)))
+  ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A),	\
+    (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (),		\
+    (__mmask8) -1, (R)))
+
+#define _mm_mask_range_round_ss(W, U, A, B, C, R)			\
+  ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A),	\
+    (__v4sf)(__m128)(B), (int)(C), (__v4sf)(__m128)(W),			\
+    (__mmask8)(U), (R)))
+
+#define _mm_maskz_range_round_ss(U, A, B, C, R)				\
+  ((__m128) __builtin_ia32_rangess128_mask_round ((__v4sf)(__m128)(A),	\
+    (__v4sf)(__m128)(B), (int)(C), (__v4sf) _mm_setzero_ps (),		\
+    (__mmask8)(U), (R)))
 
 #define _mm512_cvtt_roundpd_epi64(A, B)		    \
   ((__m512i)__builtin_ia32_cvttpd2qq512_mask ((A), (__v8di)		\
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index e46a6ab..c9b6de6 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2562,8 +2562,8 @@ BDESC (OPTION_MASK_ISA_AVX512ER, CODE_FOR_avx512er_vmrsqrt28v2df_round, "__built
 BDESC (OPTION_MASK_ISA_AVX512ER, CODE_FOR_avx512er_vmrsqrt28v4sf_round, "__builtin_ia32_rsqrt28ss_round", IX86_BUILTIN_RSQRT28SS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT)
 
 /* AVX512DQ.  */
-BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_avx512dq_rangesv2df_round, "__builtin_ia32_rangesd128_round", IX86_BUILTIN_RANGESD128, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_INT)
-BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_avx512dq_rangesv4sf_round, "__builtin_ia32_rangess128_round", IX86_BUILTIN_RANGESS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_avx512dq_rangesv2df_mask_round, "__builtin_ia32_rangesd128_mask_round", IX86_BUILTIN_RANGESD128, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT_V2DF_UQI_INT)
+BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_avx512dq_rangesv4sf_mask_round, "__builtin_ia32_rangess128_mask_round", IX86_BUILTIN_RANGESS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT_V4SF_UQI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_fix_notruncv8dfv8di2_mask_round, "__builtin_ia32_cvtpd2qq512_mask", IX86_BUILTIN_CVTPD2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_avx512dq_cvtps2qqv8di_mask_round, "__builtin_ia32_cvtps2qq512_mask", IX86_BUILTIN_CVTPS2QQ512, UNKNOWN, (int) V8DI_FTYPE_V8SF_V8DI_QI_INT)
 BDESC (OPTION_MASK_ISA_AVX512DQ, CODE_FOR_ufix_notruncv8dfv8di2_mask_round, "__builtin_ia32_cvtpd2uqq512_mask", IX86_BUILTIN_CVTPD2UQQ512, UNKNOWN, (int) V8DI_FTYPE_V8DF_V8DI_QI_INT)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 7f17231..8d527ad 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -19348,18 +19348,18 @@
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
 
-(define_insn "avx512dq_ranges<mode><round_saeonly_name>"
+(define_insn "avx512dq_ranges<mode><mask_scalar_name><round_saeonly_scalar_name>"
   [(set (match_operand:VF_128 0 "register_operand" "=v")
 	(vec_merge:VF_128
 	  (unspec:VF_128
 	    [(match_operand:VF_128 1 "register_operand" "v")
-	     (match_operand:VF_128 2 "<round_saeonly_nimm_predicate>" "<round_saeonly_constraint>")
+	     (match_operand:VF_128 2 "<round_saeonly_scalar_nimm_predicate>" "<round_saeonly_scalar_constraint>")
 	     (match_operand:SI 3 "const_0_to_15_operand")]
 	    UNSPEC_RANGE)
 	  (match_dup 1)
 	  (const_int 1)))]
   "TARGET_AVX512DQ"
-  "vrange<ssescalarmodesuffix>\t{%3, <round_saeonly_op4>%2, %1, %0|%0, %1, %2<round_saeonly_op4>, %3}"
+  "vrange<ssescalarmodesuffix>\t{%3, <round_saeonly_scalar_mask_op4>%2, %1, %0<mask_scalar_operand4>|%0<mask_scalar_operand4>, %1, %2<round_saeonly_scalar_mask_op4>, %3}"
   [(set_attr "type" "sse")
    (set_attr "prefix" "evex")
    (set_attr "mode" "<MODE>")])
diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c
index 1133a83..394ed33 100644
--- a/gcc/testsuite/gcc.target/i386/avx-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx-1.c
@@ -416,8 +416,10 @@
 #define __builtin_ia32_reducesd_mask(A, B, F, W, U) __builtin_ia32_reducesd_mask(A, B, 1, W, U)
 #define __builtin_ia32_reduceps512_mask(A, E, C, D) __builtin_ia32_reduceps512_mask(A, 1, C, D)
 #define __builtin_ia32_reducepd512_mask(A, E, C, D) __builtin_ia32_reducepd512_mask(A, 1, C, D)
-#define __builtin_ia32_rangess128_round(A, B, I, F) __builtin_ia32_rangess128_round(A, B, 1, 8)
-#define __builtin_ia32_rangesd128_round(A, B, I, F) __builtin_ia32_rangesd128_round(A, B, 1, 8)
+#define __builtin_ia32_rangess128_mask_round(A, B, I, D, E, F) \
+    __builtin_ia32_rangess128_mask_round(A, B, 1, D, E, 8)
+#define __builtin_ia32_rangesd128_mask_round(A, B, I, D, E, F) \
+    __builtin_ia32_rangesd128_mask_round(A, B, 1, D, E, 8)
 #define __builtin_ia32_rangeps512_mask(A, B, I, D, E, F) __builtin_ia32_rangeps512_mask(A, B, 1, D, E, 8)
 #define __builtin_ia32_rangepd512_mask(A, B, I, D, E, F) __builtin_ia32_rangepd512_mask(A, B, 1, D, E, 8)
 #define __builtin_ia32_inserti64x2_512_mask(A, B, F, D, E) __builtin_ia32_inserti64x2_512_mask(A, B, 1, D, E)
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-vrangesd-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-vrangesd-1.c
index 4f7d635..aa2124e 100644
--- a/gcc/testsuite/gcc.target/i386/avx512dq-vrangesd-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-vrangesd-1.c
@@ -2,6 +2,11 @@
 /* { dg-options "-mavx512dq -O2" } */
 /* { dg-final { scan-assembler-times "vrangesd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vrangesd\[ \\t\]+\[^\$\n\]*\\$\[^\{\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrangesd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrangesd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrangesd\[ \\t\]+\[^\{\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrangesd\[ \\t\]+\[^\{\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+
 
 #include <immintrin.h>
 
@@ -12,5 +17,10 @@ void extern
 avx512dq_test (void)
 {
   x1 = _mm_range_sd (x1, x2, 3);
+  x1 = _mm_mask_range_sd (x1, m, x1, x2, 3);
+  x1 = _mm_maskz_range_sd (m, x1, x2, 3);
+
   x1 = _mm_range_round_sd (x1, x2, 3, _MM_FROUND_NO_EXC);
+  x1 = _mm_mask_range_round_sd (x1, m, x1, x2, 3, _MM_FROUND_NO_EXC);
+  x1 = _mm_maskz_range_round_sd (m, x1, x2, 3, _MM_FROUND_NO_EXC);
 }
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-vrangesd-2.c b/gcc/testsuite/gcc.target/i386/avx512dq-vrangesd-2.c
new file mode 100644
index 0000000..91c346c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-vrangesd-2.c
@@ -0,0 +1,66 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512dq" } */
+/* { dg-require-effective-target avx512dq } */
+
+#define AVX512DQ
+#include "avx512f-helper.h"
+
+#define SIZE (128 / 64)
+#include "avx512f-mask-type.h"
+#define IMM 0x04
+
+void
+CALC (double *s1, double *s2, double *r)
+{
+  int i = 0;
+  r[i] = (s1[i] <= s2[i])? s1[i]: s2[i];
+  for (i = 1; i < SIZE; i++)
+    {
+       r[i] = s1[i];
+    }
+}
+
+void
+TEST (void)
+{
+  union128d s1, s2, res1, res2, res3;
+  union128d res1r, res2r, res3r;
+  MASK_TYPE mask = MASK_VALUE;
+  double res_ref[SIZE];
+  int i, sign = 1;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      s1.a[i] = 234.567 * i * sign;
+      s2.a[i] = 100 * (i + 1);
+      res1.a[i] = DEFAULT_VALUE;
+      res2.a[i] = DEFAULT_VALUE;
+      res3.a[i] = DEFAULT_VALUE;
+      res1r.a[i] = DEFAULT_VALUE;
+      res2r.a[i] = DEFAULT_VALUE;
+      res3r.a[i] = DEFAULT_VALUE;
+      sign = -sign;
+    }
+
+  res1.x = _mm_range_sd (s1.x, s2.x, IMM);
+  res2.x = _mm_mask_range_sd (res2.x, mask, s1.x, s2.x, IMM);
+  res3.x = _mm_maskz_range_sd (mask, s1.x, s2.x, IMM);
+  res1r.x = _mm_range_round_sd (s1.x, s2.x, IMM, _MM_FROUND_NO_EXC);
+  res2r.x = _mm_mask_range_round_sd (res2.x, mask, s1.x, s2.x, IMM,
+				     _MM_FROUND_NO_EXC);
+  res3r.x = _mm_maskz_range_round_sd (mask, s1.x, s2.x, IMM,
+				      _MM_FROUND_NO_EXC);
+
+  CALC (s1.a, s2.a, res_ref);
+
+  if (check_union128d (res1, res_ref) || check_union128d (res1r, res_ref))
+    abort ();
+
+  MASK_MERGE (d) (res_ref, mask, 1);
+  if (check_union128d (res2, res_ref) || check_union128d (res2r, res_ref))
+    abort ();
+
+  MASK_ZERO (d) (res_ref, mask, 1);
+  if (check_union128d (res3, res_ref) || check_union128d (res3r, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-vrangess-1.c b/gcc/testsuite/gcc.target/i386/avx512dq-vrangess-1.c
index b0ed86d..3b401df 100644
--- a/gcc/testsuite/gcc.target/i386/avx512dq-vrangess-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-vrangess-1.c
@@ -2,6 +2,10 @@
 /* { dg-options "-mavx512dq -O2" } */
 /* { dg-final { scan-assembler-times "vrangess\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vrangess\[ \\t\]+\[^\$\n\]*\\$\[^\{\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrangess\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrangess\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrangess\[ \\t\]+\[^\{\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vrangess\[ \\t\]+\[^\{\n\]*\{sae\}\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */
 
 #include <immintrin.h>
 
@@ -12,5 +16,10 @@ void extern
 avx512dq_test (void)
 {
   x1 = _mm_range_ss (x1, x2, 1);
+  x1 = _mm_mask_range_ss (x1, m, x1, x2, 1);
+  x1 = _mm_maskz_range_ss (m, x1, x2, 1);
+
   x1 = _mm_range_round_ss (x1, x2, 1, _MM_FROUND_NO_EXC);
+  x1 = _mm_mask_range_round_ss (x1, m, x1, x2, 1, _MM_FROUND_NO_EXC);
+  x1 = _mm_maskz_range_round_ss (m, x1, x2, 1, _MM_FROUND_NO_EXC);
 }
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-vrangess-2.c b/gcc/testsuite/gcc.target/i386/avx512dq-vrangess-2.c
new file mode 100644
index 0000000..ba6561d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512dq-vrangess-2.c
@@ -0,0 +1,66 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512dq" } */
+/* { dg-require-effective-target avx512dq } */
+
+#define AVX512DQ
+#include "avx512f-helper.h"
+
+#define SIZE (128 / 32)
+#include "avx512f-mask-type.h"
+#define IMM 0x04
+
+void
+CALC (float *s1, float *s2, float *r)
+{
+  int i = 0;
+  r[i] = (s1[i] <= s2[i])? s1[i]: s2[i];
+  for (i = 1; i < SIZE; i++)
+    {
+       r[i] = s1[i];
+    }
+}
+
+void
+TEST (void)
+{
+  union128 s1, s2, res1, res2, res3;
+  union128 res1r, res2r, res3r;
+  MASK_TYPE mask = MASK_VALUE;
+  float res_ref[SIZE];
+  int i, sign = 1;
+
+  for (i = 0; i < SIZE; i++)
+    {
+      s1.a[i] = 234.567 * i * sign;
+      s2.a[i] = 100 * (i + 1);
+      res1.a[i] = DEFAULT_VALUE;
+      res2.a[i] = DEFAULT_VALUE;
+      res3.a[i] = DEFAULT_VALUE;
+      res1r.a[i] = DEFAULT_VALUE;
+      res2r.a[i] = DEFAULT_VALUE;
+      res3r.a[i] = DEFAULT_VALUE;
+      sign = -sign;
+    }
+
+  res1.x = _mm_range_ss (s1.x, s2.x, IMM);
+  res2.x = _mm_mask_range_ss (res2.x, mask, s1.x, s2.x, IMM);
+  res3.x = _mm_maskz_range_ss (mask, s1.x, s2.x, IMM);
+  res1r.x = _mm_range_round_ss (s1.x, s2.x, IMM, _MM_FROUND_NO_EXC);
+  res2r.x = _mm_mask_range_round_ss (res2.x, mask, s1.x, s2.x, IMM,
+				     _MM_FROUND_NO_EXC);
+  res3r.x = _mm_maskz_range_round_ss (mask, s1.x, s2.x, IMM,
+				      _MM_FROUND_NO_EXC);
+
+  CALC (s1.a, s2.a, res_ref);
+
+  if (check_union128 (res1, res_ref) || check_union128 (res1r, res_ref))
+    abort ();
+
+  MASK_MERGE () (res_ref, mask, 1);
+  if (check_union128 (res2, res_ref) || check_union128 (res2r, res_ref))
+    abort ();
+
+  MASK_ZERO () (res_ref, mask, 1);
+  if (check_union128 (res3, res_ref) || check_union128 (res3r, res_ref))
+    abort ();
+}
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 9bdc73f..7e4b1dc 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -433,8 +433,10 @@
 #define __builtin_ia32_reducesd_mask(A, B, F, W, U) __builtin_ia32_reducesd_mask(A, B, 1, W, U)
 #define __builtin_ia32_reduceps512_mask(A, E, C, D) __builtin_ia32_reduceps512_mask(A, 1, C, D)
 #define __builtin_ia32_reducepd512_mask(A, E, C, D) __builtin_ia32_reducepd512_mask(A, 1, C, D)
-#define __builtin_ia32_rangess128_round(A, B, I, F) __builtin_ia32_rangess128_round(A, B, 1, 8)
-#define __builtin_ia32_rangesd128_round(A, B, I, F) __builtin_ia32_rangesd128_round(A, B, 1, 8)
+#define __builtin_ia32_rangess128_mask_round(A, B, I, D, E, F) \
+    __builtin_ia32_rangess128_mask_round(A, B, 1, D, E, 8)
+#define __builtin_ia32_rangesd128_mask_round(A, B, I, D, E, F) \
+    __builtin_ia32_rangesd128_mask_round(A, B, 1, D, E, 8)
 #define __builtin_ia32_rangeps512_mask(A, B, I, D, E, F) __builtin_ia32_rangeps512_mask(A, B, 1, D, E, 8)
 #define __builtin_ia32_rangepd512_mask(A, B, I, D, E, F) __builtin_ia32_rangepd512_mask(A, B, 1, D, E, 8)
 #define __builtin_ia32_inserti64x2_512_mask(A, B, F, D, E) __builtin_ia32_inserti64x2_512_mask(A, B, 1, D, E)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 66c25c7..a8bc23b 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -432,8 +432,10 @@
 #define __builtin_ia32_reducesd_mask(A, B, F, W, U) __builtin_ia32_reducesd_mask(A, B, 1, W, U)
 #define __builtin_ia32_reduceps512_mask(A, E, C, D) __builtin_ia32_reduceps512_mask(A, 1, C, D)
 #define __builtin_ia32_reducepd512_mask(A, E, C, D) __builtin_ia32_reducepd512_mask(A, 1, C, D)
-#define __builtin_ia32_rangess128_round(A, B, I, F) __builtin_ia32_rangess128_round(A, B, 1, 8)
-#define __builtin_ia32_rangesd128_round(A, B, I, F) __builtin_ia32_rangesd128_round(A, B, 1, 8)
+#define __builtin_ia32_rangess128_mask_round(A, B, I, D, E, F) \
+    __builtin_ia32_rangess128_mask_round(A, B, 1, D, E, 8)
+#define __builtin_ia32_rangesd128_mask_round(A, B, I, D, E, F) \
+    __builtin_ia32_rangesd128_mask_round(A, B, 1, D, E, 8)
 #define __builtin_ia32_rangeps512_mask(A, B, I, D, E, F) __builtin_ia32_rangeps512_mask(A, B, 1, D, E, 8)
 #define __builtin_ia32_rangepd512_mask(A, B, I, D, E, F) __builtin_ia32_rangepd512_mask(A, B, 1, D, E, 8)
 #define __builtin_ia32_inserti64x2_512_mask(A, B, F, D, E) __builtin_ia32_inserti64x2_512_mask(A, B, 1, D, E)
-- 
1.8.3.1



* Re: [patch][i386, AVX] Adding missing mask[z]_range[_round]_s[d,s] intrinsics
  2017-12-04  9:45 [patch][i386, AVX] Adding missing mask[z]_range[_round]_s[d,s] intrinsics Makhotina, Olga
@ 2018-02-05  7:09 ` Kirill Yukhin
From: Kirill Yukhin @ 2018-02-05  7:09 UTC
  To: Makhotina, Olga; +Cc: 'gcc-patches@gcc.gnu.org', Peryt, Sebastian

Hello Olga,
On 04 Dec 09:44, Makhotina, Olga wrote:
> Hi,
> 
> This patch adds missing intrinsics for _mm_mask[z]_range[_round]_[sd,ss].
> 
> 04.12.2017 Olga Makhotina  <olga.makhotina@intel.com>
> 
> gcc/
> 	* config/i386/avx512dqintrin.h (_mm_mask_range_sd, _mm_maskz_range_sd,
> 	_mm_mask_range_round_sd, _mm_maskz_range_round_sd, _mm_mask_range_ss,
> 	_mm_maskz_range_ss, _mm_mask_range_round_ss,
> 	_mm_maskz_range_round_ss): New intrinsics.
> 	(__builtin_ia32_rangesd128_round, __builtin_ia32_rangess128_round): Remove.
> 	(__builtin_ia32_rangesd128_mask_round,
> 	__builtin_ia32_rangess128_mask_round): New builtins.
> 	* config/i386/i386-builtin.def (__builtin_ia32_rangesd128_round,
> 	__builtin_ia32_rangess128_round): Remove.
> 	(__builtin_ia32_rangesd128_mask_round,
> 	__builtin_ia32_rangess128_mask_round): New builtins.
> 	* config/i386/sse.md (ranges<mode><round_saeonly_name>): Renamed to ...
> 	(ranges<mode><mask_scalar_name><round_saeonly_scalar_name>): ... this.
> 	((match_operand:VF_128 2 "<round_saeonly_nimm_predicate>"
> 	"<round_saeonly_constraint>")): Changed to ...
> 	((match_operand:VF_128 2 "<round_saeonly_scalar_nimm_predicate>"
> 	"<round_saeonly_scalar_constraint>")): ... this.
> 	("vrange<ssescalarmodesuffix>\t{%3, <round_saeonly_op4>%2, %1, %0|
> 	%0, %1, %2<round_saeonly_op4>, %3}"): Changed to ...
> 	("vrange<ssescalarmodesuffix>\t{%3, <round_saeonly_scalar_mask_op4>%2, %1,
> 	%0<mask_scalar_operand4>|%0<mask_scalar_operand4>, %1,
> 	%2<round_saeonly_scalar_mask_op4>, %3}"): ... this.
> 
> 04.12.2017 Olga Makhotina  <olga.makhotina@intel.com>
> 
> gcc/testsuite/
> 	* gcc.target/i386/avx512dq-vrangesd-1.c (_mm_mask_range_sd,
> 	_mm_maskz_range_sd, _mm_mask_range_round_sd,
> 	_mm_maskz_range_round_sd): Test new intrinsics.
> 	* gcc.target/i386/avx512dq-vrangesd-2.c (_mm_range_sd, _mm_mask_range_sd,
> 	_mm_maskz_range_sd, _mm_range_round_sd, _mm_mask_range_round_sd,
> 	_mm_maskz_range_round_sd): Test new intrinsics.
> 	* gcc.target/i386/avx512dq-vrangess-1.c (_mm_mask_range_ss,
> 	_mm_maskz_range_ss, _mm_mask_range_round_ss,
> 	_mm_maskz_range_round_ss): Test new intrinsics.
> 	* gcc.target/i386/avx512dq-vrangess-2.c (_mm_range_ss, _mm_mask_range_ss,
> 	_mm_maskz_range_ss, _mm_range_round_ss, _mm_mask_range_round_ss,
> 	_mm_maskz_range_round_ss): Test new intrinsics.
> 	* gcc.target/i386/avx-1.c (__builtin_ia32_rangesd128_round,
> 	__builtin_ia32_rangess128_round): Remove builtins.
> 	(__builtin_ia32_rangesd128_mask_round,
> 	__builtin_ia32_rangess128_mask_round): Test new builtins.
> 	* gcc.target/i386/sse-13.c: Ditto.
> 	* gcc.target/i386/sse-23.c: Ditto.
> 
> Is it ok for trunk?
Your patch is OK for trunk. I've checked it in.
> 
> Thanks,
> Olga
>

--
Thanks, K

