From mboxrd@z Thu Jan 1 00:00:00 1970
Reply-To: wschmidt@linux.ibm.com
Subject: Re: [PATCH v3 6/6] rs6000: Guard some x86 intrinsics implementations
To: "Paul A. Clarke", gcc-patches@gcc.gnu.org
Clarke" , gcc-patches@gcc.gnu.org Cc: segher@kernel.crashing.org References: <20210823190310.1679905-1-pc@us.ibm.com> <20210823190310.1679905-7-pc@us.ibm.com> From: Bill Schmidt Message-ID: <146dae52-215d-995d-a783-419b6d208a46@linux.ibm.com> Date: Fri, 27 Aug 2021 10:25:16 -0500 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.13.0 MIME-Version: 1.0 In-Reply-To: <20210823190310.1679905-7-pc@us.ibm.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-GB X-TM-AS-GCONF: 00 X-Proofpoint-GUID: eTCHJ0K8DEok-pV-HU21cbIJiOXh27zs X-Proofpoint-ORIG-GUID: eTCHJ0K8DEok-pV-HU21cbIJiOXh27zs X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-08-27_04:2021-08-26, 2021-08-27 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 mlxscore=0 impostorscore=0 bulkscore=0 spamscore=0 mlxlogscore=999 lowpriorityscore=0 suspectscore=0 clxscore=1015 adultscore=0 malwarescore=0 phishscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2107140000 definitions=main-2108270092 X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, NICE_REPLY_A, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 27 Aug 2021 15:25:21 -0000 Hi Paul, Thanks for the changes!  This looks fine to me, recommend approval. Thanks, Bill On 8/23/21 2:03 PM, Paul A. Clarke wrote: > Some compatibility implementations of x86 intrinsics include > Power intrinsics which require POWER8. Guard them. > > emmintrin.h: > - _mm_cmpord_pd: Remove code which was ostensibly for pre-POWER8, > but which indeed depended on POWER8 (vec_cmpgt(v2du)/vcmpgtud). > The "POWER8" version works fine on pre-POWER8. > - _mm_mul_epu32: vec_mule(v4su) uses vmuleuw. > pmmintrin.h: > - _mm_movehdup_ps: vec_mergeo(v4su) uses vmrgow. > - _mm_moveldup_ps: vec_mergee(v4su) uses vmrgew. > smmintrin.h: > - _mm_cmpeq_epi64: vec_cmpeq(v2di) uses vcmpequd. > - _mm_mul_epi32: vec_mule(v4si) uses vmuluwm. > - _mm_cmpgt_epi64: vec_cmpgt(v2di) uses vcmpgtsd. > tmmintrin.h: > - _mm_sign_epi8: vec_neg(v4si) uses vsububm. > - _mm_sign_epi16: vec_neg(v4si) uses vsubuhm. > - _mm_sign_epi32: vec_neg(v4si) uses vsubuwm. > Note that the above three could actually be supported pre-POWER8, > but current GCC does not support them before POWER8. > - _mm_sign_pi8: depends on _mm_sign_epi8. > - _mm_sign_pi16: depends on _mm_sign_epi16. > - _mm_sign_pi32: depends on _mm_sign_epi32. > > 2021-08-20 Paul A. Clarke > > gcc > PR target/101893 > * config/rs6000/emmintrin.h: Guard POWER8 intrinsics. > * config/rs6000/pmmintrin.h: Same. > * config/rs6000/smmintrin.h: Same. > * config/rs6000/tmmintrin.h: Same. > --- > v3: No change. > v2: > - Ensured that new "#ifdef _ARCH_PWR8" bracket each function so > impacted, rather than groups of functions, per v1 review. > - Noted testing in patch series cover letter. > - Added PR number to commit message. 
>
>  gcc/config/rs6000/emmintrin.h | 12 ++----------
>  gcc/config/rs6000/pmmintrin.h |  4 ++++
>  gcc/config/rs6000/smmintrin.h |  4 ++++
>  gcc/config/rs6000/tmmintrin.h | 12 ++++++++++++
>  4 files changed, 22 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/config/rs6000/emmintrin.h b/gcc/config/rs6000/emmintrin.h
> index ce1287edf782..32ad72b4cc35 100644
> --- a/gcc/config/rs6000/emmintrin.h
> +++ b/gcc/config/rs6000/emmintrin.h
> @@ -430,20 +430,10 @@ _mm_cmpnge_pd (__m128d __A, __m128d __B)
>  extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_cmpord_pd (__m128d __A, __m128d __B)
>  {
> -#if _ARCH_PWR8
>    __v2du c, d;
>    /* Compare against self will return false (0's) if NAN.  */
>    c = (__v2du)vec_cmpeq (__A, __A);
>    d = (__v2du)vec_cmpeq (__B, __B);
> -#else
> -  __v2du a, b;
> -  __v2du c, d;
> -  const __v2du double_exp_mask = {0x7ff0000000000000, 0x7ff0000000000000};
> -  a = (__v2du)vec_abs ((__v2df)__A);
> -  b = (__v2du)vec_abs ((__v2df)__B);
> -  c = (__v2du)vec_cmpgt (double_exp_mask, a);
> -  d = (__v2du)vec_cmpgt (double_exp_mask, b);
> -#endif
>    /* A != NAN and B != NAN.  */
>    return ((__m128d)vec_and(c, d));
>  }
> @@ -1472,6 +1462,7 @@ _mm_mul_su32 (__m64 __A, __m64 __B)
>    return ((__m64)a * (__m64)b);
>  }
>
> +#ifdef _ARCH_PWR8
>  extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_mul_epu32 (__m128i __A, __m128i __B)
>  {
> @@ -1498,6 +1489,7 @@ _mm_mul_epu32 (__m128i __A, __m128i __B)
>    return (__m128i) vec_mule ((__v4su)__A, (__v4su)__B);
>  #endif
>  }
> +#endif
>
>  extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_slli_epi16 (__m128i __A, int __B)
> diff --git a/gcc/config/rs6000/pmmintrin.h b/gcc/config/rs6000/pmmintrin.h
> index eab712fdfa66..83dff1d85666 100644
> --- a/gcc/config/rs6000/pmmintrin.h
> +++ b/gcc/config/rs6000/pmmintrin.h
> @@ -123,17 +123,21 @@ _mm_hsub_pd (__m128d __X, __m128d __Y)
>  			  vec_mergel ((__v2df) __X, (__v2df)__Y));
>  }
>
> +#ifdef _ARCH_PWR8
>  extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_movehdup_ps (__m128 __X)
>  {
>    return (__m128)vec_mergeo ((__v4su)__X, (__v4su)__X);
>  }
> +#endif
>
> +#ifdef _ARCH_PWR8
>  extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_moveldup_ps (__m128 __X)
>  {
>    return (__m128)vec_mergee ((__v4su)__X, (__v4su)__X);
>  }
> +#endif
>
>  extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_loaddup_pd (double const *__P)
> diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
> index c04d2bb5b6d3..29719367e205 100644
> --- a/gcc/config/rs6000/smmintrin.h
> +++ b/gcc/config/rs6000/smmintrin.h
> @@ -272,6 +272,7 @@ _mm_extract_ps (__m128 __X, const int __N)
>    return ((__v4si)__X)[__N & 3];
>  }
>
> +#ifdef _ARCH_PWR8
>  extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_blend_epi16 (__m128i __A, __m128i __B, const int __imm8)
>  {
> @@ -283,6 +284,7 @@ _mm_blend_epi16 (__m128i __A, __m128i __B, const int __imm8)
>  #endif
>    return (__m128i) vec_sel ((__v8hu) __A, (__v8hu) __B, __shortmask);
>  }
> +#endif
>
>  extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
> @@ -343,6 +345,7 @@ _mm_blend_pd (__m128d __A, __m128d __B, const int __imm8)
>    return (__m128d) __r;
>  }
>
> +#ifdef _ARCH_PWR8
>  __inline __m128d
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
> @@ -351,6 +354,7 @@ _mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
>    const __vector __bool long long __boolmask = vec_cmplt ((__v2di) __mask, __zero);
>    return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __boolmask);
>  }
> +#endif
>
>  __inline int
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> diff --git a/gcc/config/rs6000/tmmintrin.h b/gcc/config/rs6000/tmmintrin.h
> index 971511260b78..a67d88c8079a 100644
> --- a/gcc/config/rs6000/tmmintrin.h
> +++ b/gcc/config/rs6000/tmmintrin.h
> @@ -350,6 +350,7 @@ _mm_shuffle_pi8 (__m64 __A, __m64 __B)
>    return (__m64) ((__v2du) (__C))[0];
>  }
>
> +#ifdef _ARCH_PWR8
>  extern __inline __m128i
>  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_sign_epi8 (__m128i __A, __m128i __B)
> @@ -361,7 +362,9 @@ _mm_sign_epi8 (__m128i __A, __m128i __B)
>    __v16qi __conv = vec_add (__selectneg, __selectpos);
>    return (__m128i) vec_mul ((__v16qi) __A, (__v16qi) __conv);
>  }
> +#endif
>
> +#ifdef _ARCH_PWR8
>  extern __inline __m128i
>  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_sign_epi16 (__m128i __A, __m128i __B)
> @@ -373,7 +376,9 @@ _mm_sign_epi16 (__m128i __A, __m128i __B)
>    __v8hi __conv = vec_add (__selectneg, __selectpos);
>    return (__m128i) vec_mul ((__v8hi) __A, (__v8hi) __conv);
>  }
> +#endif
>
> +#ifdef _ARCH_PWR8
>  extern __inline __m128i
>  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_sign_epi32 (__m128i __A, __m128i __B)
> @@ -385,7 +390,9 @@ _mm_sign_epi32 (__m128i __A, __m128i __B)
>    __v4si __conv = vec_add (__selectneg, __selectpos);
>    return (__m128i) vec_mul ((__v4si) __A, (__v4si) __conv);
>  }
> +#endif
>
> +#ifdef _ARCH_PWR8
>  extern __inline __m64
>  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_sign_pi8 (__m64 __A, __m64 __B)
> @@ -396,7 +403,9 @@ _mm_sign_pi8 (__m64 __A, __m64 __B)
>    __C = (__v16qi) _mm_sign_epi8 ((__m128i) __C, (__m128i) __D);
>    return (__m64) ((__v2du) (__C))[0];
>  }
> +#endif
>
> +#ifdef _ARCH_PWR8
>  extern __inline __m64
>  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_sign_pi16 (__m64 __A, __m64 __B)
> @@ -407,7 +416,9 @@ _mm_sign_pi16 (__m64 __A, __m64 __B)
>    __C = (__v8hi) _mm_sign_epi16 ((__m128i) __C, (__m128i) __D);
>    return (__m64) ((__v2du) (__C))[0];
>  }
> +#endif
>
> +#ifdef _ARCH_PWR8
>  extern __inline __m64
>  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_sign_pi32 (__m64 __A, __m64 __B)
> @@ -418,6 +429,7 @@ _mm_sign_pi32 (__m64 __A, __m64 __B)
>    __C = (__v4si) _mm_sign_epi32 ((__m128i) __C, (__m128i) __D);
>    return (__m64) ((__v2du) (__C))[0];
>  }
> +#endif
>
>  extern __inline __m128i
>  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
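
For readers less familiar with the rs6000 compatibility headers, here is a minimal
standalone sketch of the guard pattern the patch applies. It is not taken from the
patch; the helper name example_cmpeq_epi64 and its types are invented for
illustration. The idea is that each function whose implementation needs a
POWER8-only instruction (here vcmpequd, reached through vec_cmpeq on 64-bit
elements) is bracketed individually with #ifdef _ARCH_PWR8, so the rest of the
header still compiles when targeting older CPUs.

  /* Illustrative only -- not part of the patch.  Build for powerpc64le
     with something like -mcpu=power8 to get the guarded path.  */
  #include <altivec.h>

  #ifdef _ARCH_PWR8
  /* vec_cmpeq on 64-bit integer elements maps to vcmpequd, a POWER8
     instruction, so this helper is compiled out for older targets.  */
  static inline __vector unsigned long long
  example_cmpeq_epi64 (__vector unsigned long long a,
                       __vector unsigned long long b)
  {
    return (__vector unsigned long long) vec_cmpeq (a, b);
  }
  #endif

Without the guard, the same helper would be rejected when compiling for a
pre-POWER8 target, which is the failure mode the patch avoids for the affected
intrinsics by bracketing each one individually rather than whole groups.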