From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <wschmidt@linux.ibm.com>
Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com
 [148.163.156.1])
 by sourceware.org (Postfix) with ESMTPS id 4E3A33861C5F
 for <gcc-patches@gcc.gnu.org>; Fri, 16 Jul 2021 18:13:44 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 4E3A33861C5F
Received: from pps.filterd (m0098396.ppops.net [127.0.0.1])
 by mx0a-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id
 16GI3H5C161249; Fri, 16 Jul 2021 14:13:43 -0400
Received: from ppma03dal.us.ibm.com (b.bd.3ea9.ip4.static.sl-reverse.com
 [169.62.189.11])
 by mx0a-001b2d01.pphosted.com with ESMTP id 39tw6bn0w4-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Fri, 16 Jul 2021 14:13:42 -0400
Received: from pps.filterd (ppma03dal.us.ibm.com [127.0.0.1])
 by ppma03dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 16GI81jg028684;
 Fri, 16 Jul 2021 18:13:41 GMT
Received: from b01cxnp22035.gho.pok.ibm.com (b01cxnp22035.gho.pok.ibm.com
 [9.57.198.25]) by ppma03dal.us.ibm.com with ESMTP id 39rkgy6js9-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT);
 Fri, 16 Jul 2021 18:13:41 +0000
Received: from b01ledav003.gho.pok.ibm.com (b01ledav003.gho.pok.ibm.com
 [9.57.199.108])
 by b01cxnp22035.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id
 16GIDf5i24051998
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
 Fri, 16 Jul 2021 18:13:41 GMT
Received: from b01ledav003.gho.pok.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id DDE13B206B;
 Fri, 16 Jul 2021 18:13:40 +0000 (GMT)
Received: from b01ledav003.gho.pok.ibm.com (unknown [127.0.0.1])
 by IMSVA (Postfix) with ESMTP id 8A961B2064;
 Fri, 16 Jul 2021 18:13:40 +0000 (GMT)
Received: from Bills-MacBook-Pro.local (unknown [9.211.124.44])
 by b01ledav003.gho.pok.ibm.com (Postfix) with ESMTP;
 Fri, 16 Jul 2021 18:13:40 +0000 (GMT)
Reply-To: wschmidt@linux.ibm.com
Subject: Re: [PATCH v2 1/6] rs6000: Add support for SSE4.1 "blend" intrinsics
To: "Paul A. Clarke" <pc@us.ibm.com>, gcc-patches@gcc.gnu.org
Cc: segher@kernel.crashing.org
References: <20210716135022.489455-1-pc@us.ibm.com>
 <20210716135022.489455-2-pc@us.ibm.com>
From: Bill Schmidt <wschmidt@linux.ibm.com>
Message-ID: <7c1492ef-b19e-c9a4-ed90-daa84b4e8cce@linux.ibm.com>
Date: Fri, 16 Jul 2021 13:13:39 -0500
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0)
 Gecko/20100101 Thunderbird/78.11.0
MIME-Version: 1.0
In-Reply-To: <20210716135022.489455-2-pc@us.ibm.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Content-Language: en-GB
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Fri, 16 Jul 2021 18:13:45 -0000

Hi Paul,

Thanks!  LGTM.  Recommend that maintainers approve.

Bill

On 7/16/21 8:50 AM, Paul A. Clarke wrote:
> _mm_blend_epi16 and _mm_blendv_epi8 were added earlier.
> Add these four to complete the set.
>
> 2021-07-16  Paul A. Clarke  <pc@us.ibm.com>
>
> gcc
> 	* config/rs6000/smmintrin.h (_mm_blend_pd, _mm_blendv_pd,
> 	_mm_blend_ps, _mm_blendv_ps): New.
> ---
> v2:
> - Per review from Bill, rewrote _mm_blend_pd and _mm_blendv_pd to use
>    vec_perm instead of gather/unpack/select.
>
>   gcc/config/rs6000/smmintrin.h | 60 +++++++++++++++++++++++++++++++++++
>   1 file changed, 60 insertions(+)
>
> diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
> index 6a010fdbb96f..69e54702a877 100644
> --- a/gcc/config/rs6000/smmintrin.h
> +++ b/gcc/config/rs6000/smmintrin.h
> @@ -116,6 +116,66 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
>     return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
>   }
>   
> +__inline __m128d
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_blend_pd (__m128d __A, __m128d __B, const int __imm8)
> +{
> +  __v16qu __pcv[] =
> +    {
> +      {  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15 },
> +      { 16, 17, 18, 19, 20, 21, 22, 23,  8,  9, 10, 11, 12, 13, 14, 15 },
> +      {  0,  1,  2,  3,  4,  5,  6,  7, 24, 25, 26, 27, 28, 29, 30, 31 },
> +      { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 }
> +    };
> +  __v16qu __r = vec_perm ((__v16qu) __A, (__v16qu)__B, __pcv[__imm8]);
> +  return (__m128d) __r;
> +}
> +
> +__inline __m128d
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
> +{
> +  const __v2di __zero = {0};
> +  const __vector __bool long long __boolmask = vec_cmplt ((__v2di) __mask, __zero);
> +  return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __boolmask);
> +}
> +
> +__inline __m128
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_blend_ps (__m128 __A, __m128 __B, const int __imm8)
> +{
> +  __v16qu __pcv[] =
> +    {
> +      {  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15 },
> +      { 16, 17, 18, 19,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15 },
> +      {  0,  1,  2,  3, 20, 21, 22, 23,  8,  9, 10, 11, 12, 13, 14, 15 },
> +      { 16, 17, 18, 19, 20, 21, 22, 23,  8,  9, 10, 11, 12, 13, 14, 15 },
> +      {  0,  1,  2,  3,  4,  5,  6,  7, 24, 25, 26, 27, 12, 13, 14, 15 },
> +      { 16, 17, 18, 19,  4,  5,  6,  7, 24, 25, 26, 27, 12, 13, 14, 15 },
> +      {  0,  1,  2,  3, 20, 21, 22, 23, 24, 25, 26, 27, 12, 13, 14, 15 },
> +      { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 12, 13, 14, 15 },
> +      {  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 28, 29, 30, 31 },
> +      { 16, 17, 18, 19,  4,  5,  6,  7,  8,  9, 10, 11, 28, 29, 30, 31 },
> +      {  0,  1,  2,  3, 20, 21, 22, 23,  8,  9, 10, 11, 28, 29, 30, 31 },
> +      { 16, 17, 18, 19, 20, 21, 22, 23,  8,  9, 10, 11, 28, 29, 30, 31 },
> +      {  0,  1,  2,  3,  4,  5,  6,  7, 24, 25, 26, 27, 28, 29, 30, 31 },
> +      { 16, 17, 18, 19,  4,  5,  6,  7, 24, 25, 26, 27, 28, 29, 30, 31 },
> +      {  0,  1,  2,  3, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 },
> +      { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 },
> +    };
> +  __v16qu __r = vec_perm ((__v16qu) __A, (__v16qu)__B, __pcv[__imm8]);
> +  return (__m128) __r;
> +}
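
For anyone cross-checking the sixteen rows above by hand: each row appears to follow directly from the bits of __imm8. When bit i is set, the four bytes of element i are taken from __B (byte indices 16 to 31 in the vec_perm numbering); otherwise they come from __A (indices 0 to 15). A minimal sketch in plain C that regenerates one row is below. The helper name blend_ps_pcv_row is hypothetical and not part of the patch, and the sketch only reproduces the index values in the table; it does not call vec_perm or model endianness.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the patch: fill ROW with the 16 byte
   indices that the __pcv table holds for a given 4-bit IMM8.  */
static void
blend_ps_pcv_row (unsigned imm8, uint8_t row[16])
{
  for (unsigned elem = 0; elem < 4; elem++)
    {
      /* Bit ELEM set: take this element from the second operand, whose
         bytes are numbered 16..31; otherwise from the first (0..15).  */
      unsigned base = ((imm8 >> elem) & 1) ? 16 : 0;
      for (unsigned byte = 0; byte < 4; byte++)
        row[elem * 4 + byte] = (uint8_t) (base + elem * 4 + byte);
    }
}
```

Calling this for imm8 values 0 through 15 and comparing against the literal rows is a quick way to catch a transcription slip in the table. The same rule with two 8-byte elements instead of four 4-byte ones matches the four rows in _mm_blend_pd.
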
> +
> +__inline __m128
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask)
> +{
> +  const __v4si __zero = {0};
> +  const __vector __bool int __boolmask = vec_cmplt ((__v4si) __mask, __zero);
> +  return (__m128) vec_sel ((__v4su) __A, (__v4su) __B, (__v4su) __boolmask);
> +}
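
To spell out what the vec_cmplt/vec_sel pair is doing here: the mask is reinterpreted as signed integers, so the compare against zero yields all ones exactly where the sign bit of a mask element is set, and vec_sel then takes the bits of __B in those lanes and the bits of __A elsewhere. A scalar model, as a sketch only, is below. The function name blendv_ps_model is hypothetical, and it works on the raw 32-bit images of the lanes rather than on vector types.

```c
#include <stdint.h>

/* Hypothetical scalar model, not part of the patch: A, B, MASK and R hold
   the raw 32-bit images of the four float lanes.  */
static void
blendv_ps_model (const uint32_t a[4], const uint32_t b[4],
                 const uint32_t mask[4], uint32_t r[4])
{
  for (int i = 0; i < 4; i++)
    {
      /* Like the signed compare against zero: all ones when the sign
         bit of the mask lane is set, all zeros otherwise.  */
      uint32_t sel = (mask[i] & 0x80000000u) ? 0xFFFFFFFFu : 0u;
      /* Like vec_sel: bits of b where sel is 1, bits of a elsewhere.  */
      r[i] = (a[i] & ~sel) | (b[i] & sel);
    }
}
```

One consequence of comparing as integers is that a mask lane of -0.0 (image 0x80000000) selects __B, which is what the x86 instruction does since it looks only at the sign bit. A floating-point compare against 0.0 would not give that result.
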
> +
>   __inline int
>   __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>   _mm_testz_si128 (__m128i __A, __m128i __B)