From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 8FCCB3858C3A for ; Tue, 12 Oct 2021 01:55:08 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8FCCB3858C3A Received: from pps.filterd (m0098393.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 19BNVS1R012889; Mon, 11 Oct 2021 21:55:07 -0400 Received: from ppma01dal.us.ibm.com (83.d6.3fa9.ip4.static.sl-reverse.com [169.63.214.131]) by mx0a-001b2d01.pphosted.com with ESMTP id 3bmy11syqr-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 11 Oct 2021 21:55:07 -0400 Received: from pps.filterd (ppma01dal.us.ibm.com [127.0.0.1]) by ppma01dal.us.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 19C1aiJl004342; Tue, 12 Oct 2021 01:55:06 GMT Received: from b01cxnp22035.gho.pok.ibm.com (b01cxnp22035.gho.pok.ibm.com [9.57.198.25]) by ppma01dal.us.ibm.com with ESMTP id 3bk2qbbne0-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 12 Oct 2021 01:55:06 +0000 Received: from b01ledav002.gho.pok.ibm.com (b01ledav002.gho.pok.ibm.com [9.57.199.107]) by b01cxnp22035.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 19C1t51o39190824 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 12 Oct 2021 01:55:05 GMT Received: from b01ledav002.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 1B92C124069; Tue, 12 Oct 2021 01:55:05 +0000 (GMT) Received: from b01ledav002.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id B9688124052; Tue, 12 Oct 2021 01:55:04 +0000 (GMT) Received: from li-24c3614c-2adc-11b2-a85c-85f334518bdb.ibm.com (unknown [9.160.31.192]) by b01ledav002.gho.pok.ibm.com (Postfix) with ESMTPS; Tue, 12 Oct 2021 01:55:04 +0000 (GMT) Date: Mon, 11 Oct 2021 20:55:02 -0500 From: "Paul A. Clarke" To: gcc-patches@gcc.gnu.org Cc: segher@kernel.crashing.org, wschmidt@linux.ibm.com Subject: Re: [COMMITTED v4 5/6] rs6000: Support more SSE4 "cmp", "mul", "pack" intrinsics Message-ID: <20211012015502.GL243632@li-24c3614c-2adc-11b2-a85c-85f334518bdb.ibm.com> References: <20210823190310.1679905-1-pc@us.ibm.com> <20210823190310.1679905-6-pc@us.ibm.com> <20211011230735.GE10333@gate.crashing.org> Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20211011230735.GE10333@gate.crashing.org> User-Agent: Mutt/1.10.1 (2018-07-13) X-TM-AS-GCONF: 00 X-Proofpoint-GUID: VQ2m70vfG2npPzx4GmwLJ86rDVeA1kwV X-Proofpoint-ORIG-GUID: VQ2m70vfG2npPzx4GmwLJ86rDVeA1kwV X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.182.1,Aquarius:18.0.790,Hydra:6.0.425,FMLib:17.0.607.475 definitions=2021-10-11_11,2021-10-11_01,2020-04-07_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 lowpriorityscore=0 priorityscore=1501 impostorscore=0 adultscore=0 suspectscore=0 malwarescore=0 clxscore=1015 spamscore=0 phishscore=0 mlxlogscore=999 bulkscore=0 mlxscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2109230001 definitions=main-2110120004 X-Spam-Status: No, score=-11.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Oct 2021 01:55:10 -0000 On Mon, Oct 11, 2021 at 06:07:35PM -0500, Segher Boessenkool wrote: > On Mon, Aug 23, 2021 at 02:03:09PM -0500, Paul A. Clarke wrote: > > gcc > > * config/rs6000/smmintrin.h (_mm_cmpeq_epi64, _mm_cmpgt_epi64, > > _mm_mullo_epi32, _mm_mul_epi32, _mm_packus_epi32): New. > > * config/rs6000/nmmintrin.h: Copy from i386, tweak to suit. > > > > gcc/testsuite > > * gcc.target/powerpc/pr78102.c: Copy from gcc.target/i386, > > adjust dg directives to suit. > > * gcc.target/powerpc/sse4_1-packusdw.c: Same. > > * gcc.target/powerpc/sse4_1-pcmpeqq.c: Same. > > * gcc.target/powerpc/sse4_1-pmuldq.c: Same. > > * gcc.target/powerpc/sse4_1-pmulld.c: Same. > > * gcc.target/powerpc/sse4_2-pcmpgtq.c: Same. > > * gcc.target/powerpc/sse4_2-check.h: Copy from gcc.target/i386, > > tweak to suit. > > Okay for trunk (with the vsx_hw thing). Thanks! This was committed: rs6000: Support more SSE4 "cmp", "mul", "pack" intrinsics Function signatures and decorations match gcc/config/i386/smmintrin.h. Also, copy tests for: - _mm_cmpeq_epi64 - _mm_mullo_epi32, _mm_mul_epi32 - _mm_packus_epi32 - _mm_cmpgt_epi64 (SSE4.2) from gcc/testsuite/gcc.target/i386. 2021-10-11 Paul A. Clarke gcc * config/rs6000/smmintrin.h (_mm_cmpeq_epi64, _mm_cmpgt_epi64, _mm_mullo_epi32, _mm_mul_epi32, _mm_packus_epi32): New. * config/rs6000/nmmintrin.h: Copy from i386, tweak to suit. gcc/testsuite * gcc.target/powerpc/pr78102.c: Copy from gcc.target/i386, adjust dg directives to suit. * gcc.target/powerpc/sse4_1-packusdw.c: Same. * gcc.target/powerpc/sse4_1-pcmpeqq.c: Same. * gcc.target/powerpc/sse4_1-pmuldq.c: Same. * gcc.target/powerpc/sse4_1-pmulld.c: Same. * gcc.target/powerpc/sse4_2-pcmpgtq.c: Same. * gcc.target/powerpc/sse4_2-check.h: Copy from gcc.target/i386, tweak to suit. --- v4: Fix "space after cast" and "vsx_hw" issues, per Segher review. diff --git a/gcc/config/rs6000/nmmintrin.h b/gcc/config/rs6000/nmmintrin.h new file mode 100644 index 000000000000..20a70bee3776 --- /dev/null +++ b/gcc/config/rs6000/nmmintrin.h @@ -0,0 +1,40 @@ +/* Copyright (C) 2021 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#ifndef NO_WARN_X86_INTRINSICS +/* This header is distributed to simplify porting x86_64 code that + makes explicit use of Intel intrinsics to powerpc64le. + It is the user's responsibility to determine if the results are + acceptable and make additional changes as necessary. + Note that much code that uses Intel intrinsics can be rewritten in + standard C or GNU C extensions, which are more portable and better + optimized across multiple targets. */ +#endif + +#ifndef _NMMINTRIN_H_INCLUDED +#define _NMMINTRIN_H_INCLUDED + +/* We just include SSE4.1 header file. */ +#include + +#endif /* _NMMINTRIN_H_INCLUDED */ diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h index ad6b68e13cce..90ce03d22709 100644 --- a/gcc/config/rs6000/smmintrin.h +++ b/gcc/config/rs6000/smmintrin.h @@ -274,6 +274,15 @@ _mm_floor_ss (__m128 __A, __m128 __B) return __r; } +#ifdef _ARCH_PWR8 +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpeq_epi64 (__m128i __X, __m128i __Y) +{ + return (__m128i) vec_cmpeq ((__v2di) __X, (__v2di) __Y); +} +#endif + extern __inline __m128i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm_min_epi8 (__m128i __X, __m128i __Y) @@ -332,6 +341,22 @@ _mm_max_epu32 (__m128i __X, __m128i __Y) extern __inline __m128i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mullo_epi32 (__m128i __X, __m128i __Y) +{ + return (__m128i) vec_mul ((__v4su) __X, (__v4su) __Y); +} + +#ifdef _ARCH_PWR8 +__inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mul_epi32 (__m128i __X, __m128i __Y) +{ + return (__m128i) vec_mule ((__v4si) __X, (__v4si) __Y); +} +#endif + +__inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm_cvtepi8_epi16 (__m128i __A) { return (__m128i) vec_unpackh ((__v16qi) __A); @@ -495,4 +520,20 @@ _mm_minpos_epu16 (__m128i __A) return __r.__m; } +__inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_packus_epi32 (__m128i __X, __m128i __Y) +{ + return (__m128i) vec_packsu ((__v4si) __X, (__v4si) __Y); +} + +#ifdef _ARCH_PWR8 +__inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpgt_epi64 (__m128i __X, __m128i __Y) +{ + return (__m128i) vec_cmpgt ((__v2di) __X, (__v2di) __Y); +} +#endif + #endif diff --git a/gcc/testsuite/gcc.target/powerpc/pr78102.c b/gcc/testsuite/gcc.target/powerpc/pr78102.c new file mode 100644 index 000000000000..68898c7f9428 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr78102.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mvsx" } */ +/* { dg-require-effective-target powerpc_vsx_hw } */ + +#include + +__m128i +foo (const __m128i x, const __m128i y) +{ + return _mm_cmpeq_epi64 (x, y); +} + +__v2di +bar (const __v2di x, const __v2di y) +{ + return x == y; +} + +__v2di +baz (const __v2di x, const __v2di y) +{ + return x != y; +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c new file mode 100644 index 000000000000..8b757a267468 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c @@ -0,0 +1,73 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mvsx" } */ +/* { dg-require-effective-target powerpc_vsx_hw } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 64 + +static unsigned short +int_to_ushort (int iVal) +{ + unsigned short sVal; + + if (iVal < 0) + sVal = 0; + else if (iVal > 0xffff) + sVal = 0xffff; + else sVal = iVal; + + return sVal; +} + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 4]; + int i[NUM]; + } src1, src2; + union + { + __m128i x[NUM / 4]; + unsigned short s[NUM * 2]; + } dst; + int i, sign = 1; + + for (i = 0; i < NUM; i++) + { + src1.i[i] = i * i * sign; + src2.i[i] = (i + 20) * sign; + sign = -sign; + } + + for (i = 0; i < NUM; i += 4) + dst.x[i / 4] = _mm_packus_epi32 (src1.x [i / 4], src2.x [i / 4]); + + for (i = 0; i < NUM; i ++) + { + int dstIndex; + unsigned short sVal; + + sVal = int_to_ushort (src1.i[i]); + dstIndex = (i % 4) + (i / 4) * 8; + if (sVal != dst.s[dstIndex]) + abort (); + + sVal = int_to_ushort (src2.i[i]); + dstIndex += 4; + if (sVal != dst.s[dstIndex]) + abort (); + } +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pcmpeqq.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pcmpeqq.c new file mode 100644 index 000000000000..39b9f01d64a4 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pcmpeqq.c @@ -0,0 +1,46 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mpower8-vector" } */ +/* { dg-require-effective-target p8vector_hw } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 64 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 2]; + long long ll[NUM]; + } dst, src1, src2; + int i, sign=1; + long long is_eq; + + for (i = 0; i < NUM; i++) + { + src1.ll[i] = i * i * sign; + src2.ll[i] = (i + 20) * sign; + sign = -sign; + } + + for (i = 0; i < NUM; i += 2) + dst.x [i / 2] = _mm_cmpeq_epi64(src1.x [i / 2], src2.x [i / 2]); + + for (i = 0; i < NUM; i++) + { + is_eq = src1.ll[i] == src2.ll[i] ? 0xffffffffffffffffLL : 0LL; + if (is_eq != dst.ll[i]) + abort (); + } +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmuldq.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmuldq.c new file mode 100644 index 000000000000..6a884f46235f --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmuldq.c @@ -0,0 +1,51 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mpower8-vector" } */ +/* { dg-require-effective-target p8vector_hw } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 64 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 2]; + long long ll[NUM]; + } dst; + union + { + __m128i x[NUM / 2]; + int i[NUM * 2]; + } src1, src2; + int i, sign = 1; + long long value; + + for (i = 0; i < NUM * 2; i += 2) + { + src1.i[i] = i * i * sign; + src2.i[i] = (i + 20) * sign; + sign = -sign; + } + + for (i = 0; i < NUM; i += 2) + dst.x[i / 2] = _mm_mul_epi32 (src1.x[i / 2], src2.x[i / 2]); + + for (i = 0; i < NUM; i++) + { + value = (long long) src1.i[i * 2] * (long long) src2.i[i * 2]; + if (value != dst.ll[i]) + abort (); + } +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmulld.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmulld.c new file mode 100644 index 000000000000..730334366426 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmulld.c @@ -0,0 +1,46 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mvsx" } */ +/* { dg-require-effective-target powerpc_vsx_hw } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include + +#define NUM 64 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 4]; + int i[NUM]; + } dst, src1, src2; + int i, sign = 1; + int value; + + for (i = 0; i < NUM; i++) + { + src1.i[i] = i * i * sign; + src2.i[i] = (i + 20) * sign; + sign = -sign; + } + + for (i = 0; i < NUM; i += 4) + dst.x[i / 4] = _mm_mullo_epi32 (src1.x[i / 4], src2.x[i / 4]); + + for (i = 0; i < NUM; i++) + { + value = src1.i[i] * src2.i[i]; + if (value != dst.i[i]) + abort (); + } +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_2-check.h b/gcc/testsuite/gcc.target/powerpc/sse4_2-check.h new file mode 100644 index 000000000000..f6264e5a1083 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_2-check.h @@ -0,0 +1,18 @@ +#define NO_WARN_X86_INTRINSICS 1 + +static void sse4_2_test (void); + +static void +__attribute__ ((noinline)) +do_test (void) +{ + sse4_2_test (); +} + +int +main () +{ + do_test (); + + return 0; +} diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c b/gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c new file mode 100644 index 000000000000..a8a6a2010f45 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c @@ -0,0 +1,46 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mvsx" } */ +/* { dg-require-effective-target powerpc_vsx_hw } */ + +#ifndef CHECK_H +#define CHECK_H "sse4_2-check.h" +#endif + +#ifndef TEST +#define TEST sse4_2_test +#endif + +#include CHECK_H + +#include + +#define NUM 64 + +static void +TEST (void) +{ + union + { + __m128i x[NUM / 2]; + long long ll[NUM]; + } dst, src1, src2; + int i, sign = 1; + long long is_eq; + + for (i = 0; i < NUM; i++) + { + src1.ll[i] = i * i * sign; + src2.ll[i] = (i + 20) * sign; + sign = -sign; + } + + for (i = 0; i < NUM; i += 2) + dst.x[i / 2] = _mm_cmpgt_epi64 (src1.x[i / 2], src2.x[i / 2]); + + for (i = 0; i < NUM; i++) + { + is_eq = src1.ll[i] > src2.ll[i] ? 0xFFFFFFFFFFFFFFFFLL : 0LL; + if (is_eq != dst.ll[i]) + abort (); + } +}