From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id 506253858022 for ; Mon, 23 Aug 2021 19:03:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 506253858022 Received: from pps.filterd (m0098416.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 17NIXeSH122783; Mon, 23 Aug 2021 15:03:21 -0400 Received: from ppma03dal.us.ibm.com (b.bd.3ea9.ip4.static.sl-reverse.com [169.62.189.11]) by mx0b-001b2d01.pphosted.com with ESMTP id 3am73kc2tf-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 23 Aug 2021 15:03:21 -0400 Received: from pps.filterd (ppma03dal.us.ibm.com [127.0.0.1]) by ppma03dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 17NIvDJX007022; Mon, 23 Aug 2021 19:03:20 GMT Received: from b03cxnp07029.gho.boulder.ibm.com (b03cxnp07029.gho.boulder.ibm.com [9.17.130.16]) by ppma03dal.us.ibm.com with ESMTP id 3ajs4c0u7c-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 23 Aug 2021 19:03:20 +0000 Received: from b03ledav003.gho.boulder.ibm.com (b03ledav003.gho.boulder.ibm.com [9.17.130.234]) by b03cxnp07029.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 17NJ3JeT45679086 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 23 Aug 2021 19:03:19 GMT Received: from b03ledav003.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 39C346A057; Mon, 23 Aug 2021 19:03:19 +0000 (GMT) Received: from b03ledav003.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id DAACE6A051; Mon, 23 Aug 2021 19:03:18 +0000 (GMT) Received: from localhost (unknown [9.160.136.65]) by b03ledav003.gho.boulder.ibm.com (Postfix) with ESMTP; Mon, 23 Aug 2021 19:03:18 +0000 (GMT) From: "Paul A. Clarke" To: gcc-patches@gcc.gnu.org Cc: segher@kernel.crashing.org, wschmidt@linux.ibm.com Subject: [PATCH v3 6/6] rs6000: Guard some x86 intrinsics implementations Date: Mon, 23 Aug 2021 14:03:10 -0500 Message-Id: <20210823190310.1679905-7-pc@us.ibm.com> X-Mailer: git-send-email 2.27.0 In-Reply-To: <20210823190310.1679905-1-pc@us.ibm.com> References: <20210823190310.1679905-1-pc@us.ibm.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: yg_X8PA3NOm_j9sXXQ6bXXor7H4-1S5s X-Proofpoint-GUID: yg_X8PA3NOm_j9sXXQ6bXXor7H4-1S5s X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-08-23_04:2021-08-23, 2021-08-23 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 adultscore=0 clxscore=1015 malwarescore=0 suspectscore=0 bulkscore=0 impostorscore=0 mlxlogscore=999 phishscore=0 spamscore=0 lowpriorityscore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2107140000 definitions=main-2108230126 X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 23 Aug 2021 19:03:26 -0000 Some compatibility implementations of x86 intrinsics include Power intrinsics which require POWER8. Guard them. emmintrin.h: - _mm_cmpord_pd: Remove code which was ostensibly for pre-POWER8, but which indeed depended on POWER8 (vec_cmpgt(v2du)/vcmpgtud). The "POWER8" version works fine on pre-POWER8. - _mm_mul_epu32: vec_mule(v4su) uses vmuleuw. pmmintrin.h: - _mm_movehdup_ps: vec_mergeo(v4su) uses vmrgow. - _mm_moveldup_ps: vec_mergee(v4su) uses vmrgew. smmintrin.h: - _mm_cmpeq_epi64: vec_cmpeq(v2di) uses vcmpequd. - _mm_mul_epi32: vec_mule(v4si) uses vmuluwm. - _mm_cmpgt_epi64: vec_cmpgt(v2di) uses vcmpgtsd. tmmintrin.h: - _mm_sign_epi8: vec_neg(v4si) uses vsububm. - _mm_sign_epi16: vec_neg(v4si) uses vsubuhm. - _mm_sign_epi32: vec_neg(v4si) uses vsubuwm. Note that the above three could actually be supported pre-POWER8, but current GCC does not support them before POWER8. - _mm_sign_pi8: depends on _mm_sign_epi8. - _mm_sign_pi16: depends on _mm_sign_epi16. - _mm_sign_pi32: depends on _mm_sign_epi32. 2021-08-20 Paul A. Clarke gcc PR target/101893 * config/rs6000/emmintrin.h: Guard POWER8 intrinsics. * config/rs6000/pmmintrin.h: Same. * config/rs6000/smmintrin.h: Same. * config/rs6000/tmmintrin.h: Same. --- v3: No change. v2: - Ensured that new "#ifdef _ARCH_PWR8" bracket each function so impacted, rather than groups of functions, per v1 review. - Noted testing in patch series cover letter. - Added PR number to commit message. gcc/config/rs6000/emmintrin.h | 12 ++---------- gcc/config/rs6000/pmmintrin.h | 4 ++++ gcc/config/rs6000/smmintrin.h | 4 ++++ gcc/config/rs6000/tmmintrin.h | 12 ++++++++++++ 4 files changed, 22 insertions(+), 10 deletions(-) diff --git a/gcc/config/rs6000/emmintrin.h b/gcc/config/rs6000/emmintrin.h index ce1287edf782..32ad72b4cc35 100644 --- a/gcc/config/rs6000/emmintrin.h +++ b/gcc/config/rs6000/emmintrin.h @@ -430,20 +430,10 @@ _mm_cmpnge_pd (__m128d __A, __m128d __B) extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_cmpord_pd (__m128d __A, __m128d __B) { -#if _ARCH_PWR8 __v2du c, d; /* Compare against self will return false (0's) if NAN. */ c = (__v2du)vec_cmpeq (__A, __A); d = (__v2du)vec_cmpeq (__B, __B); -#else - __v2du a, b; - __v2du c, d; - const __v2du double_exp_mask = {0x7ff0000000000000, 0x7ff0000000000000}; - a = (__v2du)vec_abs ((__v2df)__A); - b = (__v2du)vec_abs ((__v2df)__B); - c = (__v2du)vec_cmpgt (double_exp_mask, a); - d = (__v2du)vec_cmpgt (double_exp_mask, b); -#endif /* A != NAN and B != NAN. */ return ((__m128d)vec_and(c, d)); } @@ -1472,6 +1462,7 @@ _mm_mul_su32 (__m64 __A, __m64 __B) return ((__m64)a * (__m64)b); } +#ifdef _ARCH_PWR8 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_mul_epu32 (__m128i __A, __m128i __B) { @@ -1498,6 +1489,7 @@ _mm_mul_epu32 (__m128i __A, __m128i __B) return (__m128i) vec_mule ((__v4su)__A, (__v4su)__B); #endif } +#endif extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_slli_epi16 (__m128i __A, int __B) diff --git a/gcc/config/rs6000/pmmintrin.h b/gcc/config/rs6000/pmmintrin.h index eab712fdfa66..83dff1d85666 100644 --- a/gcc/config/rs6000/pmmintrin.h +++ b/gcc/config/rs6000/pmmintrin.h @@ -123,17 +123,21 @@ _mm_hsub_pd (__m128d __X, __m128d __Y) vec_mergel ((__v2df) __X, (__v2df)__Y)); } +#ifdef _ARCH_PWR8 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_movehdup_ps (__m128 __X) { return (__m128)vec_mergeo ((__v4su)__X, (__v4su)__X); } +#endif +#ifdef _ARCH_PWR8 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_moveldup_ps (__m128 __X) { return (__m128)vec_mergee ((__v4su)__X, (__v4su)__X); } +#endif extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_loaddup_pd (double const *__P) diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h index c04d2bb5b6d3..29719367e205 100644 --- a/gcc/config/rs6000/smmintrin.h +++ b/gcc/config/rs6000/smmintrin.h @@ -272,6 +272,7 @@ _mm_extract_ps (__m128 __X, const int __N) return ((__v4si)__X)[__N & 3]; } +#ifdef _ARCH_PWR8 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_blend_epi16 (__m128i __A, __m128i __B, const int __imm8) { @@ -283,6 +284,7 @@ _mm_blend_epi16 (__m128i __A, __m128i __B, const int __imm8) #endif return (__m128i) vec_sel ((__v8hu) __A, (__v8hu) __B, __shortmask); } +#endif extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask) @@ -343,6 +345,7 @@ _mm_blend_pd (__m128d __A, __m128d __B, const int __imm8) return (__m128d) __r; } +#ifdef _ARCH_PWR8 __inline __m128d __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask) @@ -351,6 +354,7 @@ _mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask) const __vector __bool long long __boolmask = vec_cmplt ((__v2di) __mask, __zero); return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __boolmask); } +#endif __inline int __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) diff --git a/gcc/config/rs6000/tmmintrin.h b/gcc/config/rs6000/tmmintrin.h index 971511260b78..a67d88c8079a 100644 --- a/gcc/config/rs6000/tmmintrin.h +++ b/gcc/config/rs6000/tmmintrin.h @@ -350,6 +350,7 @@ _mm_shuffle_pi8 (__m64 __A, __m64 __B) return (__m64) ((__v2du) (__C))[0]; } +#ifdef _ARCH_PWR8 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_sign_epi8 (__m128i __A, __m128i __B) @@ -361,7 +362,9 @@ _mm_sign_epi8 (__m128i __A, __m128i __B) __v16qi __conv = vec_add (__selectneg, __selectpos); return (__m128i) vec_mul ((__v16qi) __A, (__v16qi) __conv); } +#endif +#ifdef _ARCH_PWR8 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_sign_epi16 (__m128i __A, __m128i __B) @@ -373,7 +376,9 @@ _mm_sign_epi16 (__m128i __A, __m128i __B) __v8hi __conv = vec_add (__selectneg, __selectpos); return (__m128i) vec_mul ((__v8hi) __A, (__v8hi) __conv); } +#endif +#ifdef _ARCH_PWR8 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_sign_epi32 (__m128i __A, __m128i __B) @@ -385,7 +390,9 @@ _mm_sign_epi32 (__m128i __A, __m128i __B) __v4si __conv = vec_add (__selectneg, __selectpos); return (__m128i) vec_mul ((__v4si) __A, (__v4si) __conv); } +#endif +#ifdef _ARCH_PWR8 extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_sign_pi8 (__m64 __A, __m64 __B) @@ -396,7 +403,9 @@ _mm_sign_pi8 (__m64 __A, __m64 __B) __C = (__v16qi) _mm_sign_epi8 ((__m128i) __C, (__m128i) __D); return (__m64) ((__v2du) (__C))[0]; } +#endif +#ifdef _ARCH_PWR8 extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_sign_pi16 (__m64 __A, __m64 __B) @@ -407,7 +416,9 @@ _mm_sign_pi16 (__m64 __A, __m64 __B) __C = (__v8hi) _mm_sign_epi16 ((__m128i) __C, (__m128i) __D); return (__m64) ((__v2du) (__C))[0]; } +#endif +#ifdef _ARCH_PWR8 extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm_sign_pi32 (__m64 __A, __m64 __B) @@ -418,6 +429,7 @@ _mm_sign_pi32 (__m64 __A, __m64 __B) __C = (__v4si) _mm_sign_epi32 ((__m128i) __C, (__m128i) __D); return (__m64) ((__v2du) (__C))[0]; } +#endif extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) -- 2.27.0