From: Bill Schmidt
Reply-To: wschmidt@linux.ibm.com
To: "Paul A. Clarke", gcc-patches@gcc.gnu.org
Cc: segher@kernel.crashing.org
Subject: Re: [PATCH v2 1/6] rs6000: Add support for SSE4.1 "blend" intrinsics
Date: Fri, 16 Jul 2021 13:13:39 -0500
Message-ID: <7c1492ef-b19e-c9a4-ed90-daa84b4e8cce@linux.ibm.com>
In-Reply-To: <20210716135022.489455-2-pc@us.ibm.com>
References: <20210716135022.489455-1-pc@us.ibm.com> <20210716135022.489455-2-pc@us.ibm.com>

Hi Paul,

Thanks!  LGTM.  Recommend that maintainers approve.

Bill

On 7/16/21 8:50 AM, Paul A. Clarke wrote:
> _mm_blend_epi16 and _mm_blendv_epi8 were added earlier.
> Add these four to complete the set.
>
> 2021-07-16 Paul A. Clarke
>
> gcc
>     * config/rs6000/smmintrin.h (_mm_blend_pd, _mm_blendv_pd,
>     _mm_blend_ps, _mm_blendv_ps): New.
> ---
> v2:
> - Per review from Bill, rewrote _mm_blend_pd and _mm_blendv_pd to use
>   vec_perm instead of gather/unpack/select.
>
>  gcc/config/rs6000/smmintrin.h | 60 +++++++++++++++++++++++++++++++++++
>  1 file changed, 60 insertions(+)
>
> diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
> index 6a010fdbb96f..69e54702a877 100644
> --- a/gcc/config/rs6000/smmintrin.h
> +++ b/gcc/config/rs6000/smmintrin.h
> @@ -116,6 +116,66 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
>    return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
>  }
>
> +__inline __m128d
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_blend_pd (__m128d __A, __m128d __B, const int __imm8)
> +{
> +  __v16qu __pcv[] =
> +    {
> +      { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
> +      { 16, 17, 18, 19, 20, 21, 22, 23, 8, 9, 10, 11, 12, 13, 14, 15 },
> +      { 0, 1, 2, 3, 4, 5, 6, 7, 24, 25, 26, 27, 28, 29, 30, 31 },
> +      { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 }
> +    };
> +  __v16qu __r = vec_perm ((__v16qu) __A, (__v16qu)__B, __pcv[__imm8]);
> +  return (__m128d) __r;
> +}
> +
> +__inline __m128d
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
> +{
> +  const __v2di __zero = {0};
> +  const __vector __bool long long __boolmask = vec_cmplt ((__v2di) __mask, __zero);
> +  return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __boolmask);
> +}
> +
> +__inline __m128
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_blend_ps (__m128 __A, __m128 __B, const int __imm8)
> +{
> +  __v16qu __pcv[] =
> +    {
> +      { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
> +      { 16, 17, 18, 19, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 },
> +      { 0, 1, 2, 3, 20, 21, 22, 23, 8, 9, 10, 11, 12, 13, 14, 15 },
> +      { 16, 17, 18, 19, 20, 21, 22, 23, 8, 9, 10, 11, 12, 13, 14, 15 },
> +      { 0, 1, 2, 3, 4, 5, 6, 7, 24, 25, 26, 27, 12, 13, 14, 15 },
> +      { 16, 17, 18, 19, 4, 5, 6, 7, 24, 25, 26, 27, 12, 13, 14, 15 },
> +      { 0, 1, 2, 3, 20, 21, 22, 23, 24, 25, 26, 27, 12, 13, 14, 15 },
> +      { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 12, 13, 14, 15 },
> +      { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 28, 29, 30, 31 },
> +      { 16, 17, 18, 19, 4, 5, 6, 7, 8, 9, 10, 11, 28, 29, 30, 31 },
> +      { 0, 1, 2, 3, 20, 21, 22, 23, 8, 9, 10, 11, 28, 29, 30, 31 },
> +      { 16, 17, 18, 19, 20, 21, 22, 23, 8, 9, 10, 11, 28, 29, 30, 31 },
> +      { 0, 1, 2, 3, 4, 5, 6, 7, 24, 25, 26, 27, 28, 29, 30, 31 },
> +      { 16, 17, 18, 19, 4, 5, 6, 7, 24, 25, 26, 27, 28, 29, 30, 31 },
> +      { 0, 1, 2, 3, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 },
> +      { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 },
> +    };
> +  __v16qu __r = vec_perm ((__v16qu) __A, (__v16qu)__B, __pcv[__imm8]);
> +  return (__m128) __r;
> +}
> +
> +__inline __m128
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask)
> +{
> +  const __v4si __zero = {0};
> +  const __vector __bool int __boolmask = vec_cmplt ((__v4si) __mask, __zero);
> +  return (__m128) vec_sel ((__v4su) __A, (__v4su) __B, (__v4su) __boolmask);
> +}
> +
>  __inline int
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_testz_si128 (__m128i __A, __m128i __B)
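
The rows of the __pcv table in the quoted _mm_blend_ps appear to follow one rule: if bit k of __imm8 is set, the four bytes of element k come from __B (byte numbers 16..31 in the permute control), otherwise from __A (0..15). The following is a minimal portable C sketch of that rule, not part of the patch. The names blend_ps_pcv, perm_bytes, blend_ps_model and check_against_patch_rows are hypothetical, and perm_bytes is only a scalar stand-in that assumes the permute indexes bytes of the two inputs in memory order; it is not the vec_perm built-in.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not in the patch): build the permute control for a
   blend of four 32-bit elements.  If bit k of imm8 is set, element k is
   taken from the second operand, whose bytes are numbered 16..31.  */
static void
blend_ps_pcv (unsigned imm8, uint8_t pcv[16])
{
  for (int k = 0; k < 4; k++)
    {
      unsigned base = ((imm8 >> k) & 1) ? 16u : 0u;
      for (int j = 0; j < 4; j++)
        pcv[4 * k + j] = (uint8_t) (base + 4 * k + j);
    }
}

/* Scalar stand-in for a byte permute (an assumption, not vec_perm itself):
   control values 0..15 pick from a, 16..31 pick from b.  */
static void
perm_bytes (const uint8_t a[16], const uint8_t b[16],
            const uint8_t pcv[16], uint8_t r[16])
{
  for (int i = 0; i < 16; i++)
    r[i] = (pcv[i] & 0x10) ? b[pcv[i] & 0x0f] : a[pcv[i] & 0x0f];
}

/* Model of the blend: generate the control, then permute the raw bytes.
   Assumes sizeof (float) == 4.  */
static void
blend_ps_model (const float a[4], const float b[4], unsigned imm8, float r[4])
{
  uint8_t ab[16], bb[16], rb[16], pcv[16];
  memcpy (ab, a, sizeof ab);
  memcpy (bb, b, sizeof bb);
  blend_ps_pcv (imm8 & 0xf, pcv);
  perm_bytes (ab, bb, pcv, rb);
  memcpy (r, rb, sizeof rb);
}

/* Compare generated rows with rows 5 and 10 copied from the quoted table.  */
static void
check_against_patch_rows (void)
{
  static const uint8_t row5[16] =
    { 16, 17, 18, 19, 4, 5, 6, 7, 24, 25, 26, 27, 12, 13, 14, 15 };
  static const uint8_t row10[16] =
    { 0, 1, 2, 3, 20, 21, 22, 23, 8, 9, 10, 11, 28, 29, 30, 31 };
  uint8_t pcv[16];

  blend_ps_pcv (5, pcv);
  assert (memcmp (pcv, row5, sizeof row5) == 0);
  blend_ps_pcv (10, pcv);
  assert (memcmp (pcv, row10, sizeof row10) == 0);
}

#ifdef BLEND_DEMO_MAIN
int
main (void)
{
  const float a[4] = { 1, 2, 3, 4 };
  const float b[4] = { 10, 20, 30, 40 };
  float r[4];

  check_against_patch_rows ();
  blend_ps_model (a, b, 5, r);   /* bits 0 and 2 set: elements 0 and 2 from b */
  printf ("%g %g %g %g\n", r[0], r[1], r[2], r[3]);
  return 0;
}
#endif
```

Building with -DBLEND_DEMO_MAIN gives a standalone check. The same rule with two 64-bit elements (eight bytes per element, two selector bits) would give the four rows of the _mm_blend_pd table. The sketch masks the selector to four bits, whereas the quoted code indexes __pcv[__imm8] directly.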