From: Bill Schmidt <wschmidt@linux.ibm.com>
To: "Paul A. Clarke" <pc@us.ibm.com>, gcc-patches@gcc.gnu.org
Cc: segher@kernel.crashing.org
Subject: Re: [PATCH v3 5/6] rs6000: Support more SSE4 "cmp", "mul", "pack" intrinsics
Date: Fri, 27 Aug 2021 10:21:35 -0500
Message-ID: <3c073ea1-6d25-e204-6458-9b042f66032a@linux.ibm.com>
In-Reply-To: <20210823190310.1679905-6-pc@us.ibm.com>

Hi Paul,

On 8/23/21 2:03 PM, Paul A. Clarke wrote:
> Function signatures and decorations match gcc/config/i386/smmintrin.h.
>
> Also, copy tests for:
> - _mm_cmpeq_epi64
> - _mm_mullo_epi32, _mm_mul_epi32
> - _mm_packus_epi32
> - _mm_cmpgt_epi64 (SSE4.2)
>
> from gcc/testsuite/gcc.target/i386.
>
> 2021-08-23 Paul A. Clarke <pc@us.ibm.com>
>
> gcc
> * config/rs6000/smmintrin.h (_mm_cmpeq_epi64, _mm_cmpgt_epi64,
> _mm_mullo_epi32, _mm_mul_epi32, _mm_packus_epi32): New.
> * config/rs6000/nmmintrin.h: Copy from i386, tweak to suit.
>
> gcc/testsuite
> * gcc.target/powerpc/pr78102.c: Copy from gcc.target/i386,
> adjust dg directives to suit.
> * gcc.target/powerpc/sse4_1-packusdw.c: Same.
> * gcc.target/powerpc/sse4_1-pcmpeqq.c: Same.
> * gcc.target/powerpc/sse4_1-pmuldq.c: Same.
> * gcc.target/powerpc/sse4_1-pmulld.c: Same.
> * gcc.target/powerpc/sse4_2-pcmpgtq.c: Same.
> * gcc.target/powerpc/sse4_2-check.h: Copy from gcc.target/i386,
> tweak to suit.
> ---
> v3:
> - Add nmmintrin.h. _mm_cmpgt_epi64 is part of SSE4.2, which is
> ostensibly defined in nmmintrin.h. Following the i386 implementation,
> however, nmmintrin.h only includes smmintrin.h, and the actual
> implementations appear there.
> - Add sse4_2-check.h, required by sse4_2-pcmpgtq.c. My testing was
> obviously inadequate.
> v2:
> - Added "extern" to functions to maintain compatible decorations with
> like implementations in gcc/config/i386.
> - Removed "-Wno-psabi" from tests as unnecessary, per v1 review.
> - Noted testing in patch series cover letter.
>
> gcc/config/rs6000/nmmintrin.h | 40 ++++++++++
> gcc/config/rs6000/smmintrin.h | 41 +++++++++++
> gcc/testsuite/gcc.target/powerpc/pr78102.c | 23 ++++++
> .../gcc.target/powerpc/sse4_1-packusdw.c | 73 +++++++++++++++++++
> .../gcc.target/powerpc/sse4_1-pcmpeqq.c | 46 ++++++++++++
> .../gcc.target/powerpc/sse4_1-pmuldq.c | 51 +++++++++++++
> .../gcc.target/powerpc/sse4_1-pmulld.c | 46 ++++++++++++
> .../gcc.target/powerpc/sse4_2-check.h | 18 +++++
> .../gcc.target/powerpc/sse4_2-pcmpgtq.c | 46 ++++++++++++
> 9 files changed, 384 insertions(+)
> create mode 100644 gcc/config/rs6000/nmmintrin.h
> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr78102.c
> create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c
> create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pcmpeqq.c
> create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmuldq.c
> create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmulld.c
> create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_2-check.h
> create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c
>
> diff --git a/gcc/config/rs6000/nmmintrin.h b/gcc/config/rs6000/nmmintrin.h
> new file mode 100644
> index 000000000000..20a70bee3776
> --- /dev/null
> +++ b/gcc/config/rs6000/nmmintrin.h
> @@ -0,0 +1,40 @@
> +/* Copyright (C) 2021 Free Software Foundation, Inc.
> +
> + This file is part of GCC.
> +
> + GCC is free software; you can redistribute it and/or modify
> + it under the terms of the GNU General Public License as published by
> + the Free Software Foundation; either version 3, or (at your option)
> + any later version.
> +
> + GCC is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + GNU General Public License for more details.
> +
> + Under Section 7 of GPL version 3, you are granted additional
> + permissions described in the GCC Runtime Library Exception, version
> + 3.1, as published by the Free Software Foundation.
> +
> + You should have received a copy of the GNU General Public License and
> + a copy of the GCC Runtime Library Exception along with this program;
> + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
> + <http://www.gnu.org/licenses/>. */
> +
> +#ifndef NO_WARN_X86_INTRINSICS
> +/* This header is distributed to simplify porting x86_64 code that
> + makes explicit use of Intel intrinsics to powerpc64le.
> + It is the user's responsibility to determine if the results are
> + acceptable and make additional changes as necessary.
> + Note that much code that uses Intel intrinsics can be rewritten in
> + standard C or GNU C extensions, which are more portable and better
> + optimized across multiple targets. */
> +#endif
> +
> +#ifndef _NMMINTRIN_H_INCLUDED
> +#define _NMMINTRIN_H_INCLUDED
> +
> +/* We just include SSE4.1 header file. */
> +#include <smmintrin.h>
> +
> +#endif /* _NMMINTRIN_H_INCLUDED */

Should there be a comment in here indicating that nmmintrin.h is the
SSE4.2 header?  Otherwise it's a bit of a head-scratcher for a newcomer
wondering why this file exists.  No big deal either way.
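
For example, something along these lines would do.  This is only a sketch
of possible wording, not text taken from the i386 header:

```c
#ifndef _NMMINTRIN_H_INCLUDED
#define _NMMINTRIN_H_INCLUDED

/* This is the SSE4.2 header.  The SSE4.2 intrinsics supported here,
   such as _mm_cmpgt_epi64, are implemented in smmintrin.h, so we just
   include the SSE4.1 header file.  (Suggested wording only.)  */
#include <smmintrin.h>

#endif /* _NMMINTRIN_H_INCLUDED */
```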

This looks fine to me with or without that.  Recommend approval.

Thanks!
Bill

> diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
> index fdef6674d16c..c04d2bb5b6d3 100644
> --- a/gcc/config/rs6000/smmintrin.h
> +++ b/gcc/config/rs6000/smmintrin.h
> @@ -386,6 +386,15 @@ _mm_testnzc_si128 (__m128i __A, __m128i __B)
>
> #define _mm_test_mix_ones_zeros(M, V) _mm_testnzc_si128 ((M), (V))
>
> +#ifdef _ARCH_PWR8
> +extern __inline __m128i
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cmpeq_epi64 (__m128i __X, __m128i __Y)
> +{
> + return (__m128i) vec_cmpeq ((__v2di)__X, (__v2di)__Y);
> +}
> +#endif
> +
> extern __inline __m128i
> __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> _mm_min_epi8 (__m128i __X, __m128i __Y)
> @@ -444,6 +453,22 @@ _mm_max_epu32 (__m128i __X, __m128i __Y)
>
> extern __inline __m128i
> __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mullo_epi32 (__m128i __X, __m128i __Y)
> +{
> + return (__m128i) vec_mul ((__v4su)__X, (__v4su)__Y);
> +}
> +
> +#ifdef _ARCH_PWR8
> +__inline __m128i
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_mul_epi32 (__m128i __X, __m128i __Y)
> +{
> + return (__m128i) vec_mule ((__v4si)__X, (__v4si)__Y);
> +}
> +#endif
> +
> +__inline __m128i
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> _mm_cvtepi8_epi16 (__m128i __A)
> {
> return (__m128i) vec_unpackh ((__v16qi)__A);
> @@ -607,4 +632,20 @@ _mm_minpos_epu16 (__m128i __A)
> return __r.__m;
> }
>
> +__inline __m128i
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_packus_epi32 (__m128i __X, __m128i __Y)
> +{
> + return (__m128i) vec_packsu ((__v4si)__X, (__v4si)__Y);
> +}
> +
> +#ifdef _ARCH_PWR8
> +__inline __m128i
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_cmpgt_epi64 (__m128i __X, __m128i __Y)
> +{
> + return (__m128i) vec_cmpgt ((__v2di)__X, (__v2di)__Y);
> +}
> +#endif
> +
> #endif
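
As a reading aid for the smmintrin.h hunks above, here is a scalar sketch
of what I understand the x86 compare and mullo intrinsics to compute per
lane.  The helper names are made up for illustration and are not part of
the patch:

```c
#include <stdint.h>

/* Hypothetical per-lane model of _mm_cmpeq_epi64: each 64-bit lane of
   the result is all ones when the inputs are equal, else zero.  */
static int64_t
ref_cmpeq_lane (int64_t a, int64_t b)
{
  return a == b ? -1 : 0;
}

/* Hypothetical per-lane model of _mm_cmpgt_epi64: signed comparison,
   all ones when a > b, else zero.  */
static int64_t
ref_cmpgt_lane (int64_t a, int64_t b)
{
  return a > b ? -1 : 0;
}

/* Hypothetical per-lane model of _mm_mullo_epi32: keep the low 32 bits
   of the product.  The multiply is done unsigned to avoid signed
   overflow; the conversion back to int32_t wraps modulo 2^32 with GCC.  */
static int32_t
ref_mullo_lane (int32_t a, int32_t b)
{
  return (int32_t) ((uint32_t) a * (uint32_t) b);
}
```

The run tests in this patch compare against the same masks
(0xffffffffffffffffLL or 0LL), so the model should agree with them.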
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr78102.c b/gcc/testsuite/gcc.target/powerpc/pr78102.c
> new file mode 100644
> index 000000000000..56a2d497bbff
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr78102.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mvsx" } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +
> +#include <x86intrin.h>
> +
> +__m128i
> +foo (const __m128i x, const __m128i y)
> +{
> + return _mm_cmpeq_epi64 (x, y);
> +}
> +
> +__v2di
> +bar (const __v2di x, const __v2di y)
> +{
> + return x == y;
> +}
> +
> +__v2di
> +baz (const __v2di x, const __v2di y)
> +{
> + return x != y;
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c
> new file mode 100644
> index 000000000000..15b8ca418f54
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c
> @@ -0,0 +1,73 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mvsx" } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +
> +#ifndef CHECK_H
> +#define CHECK_H "sse4_1-check.h"
> +#endif
> +
> +#ifndef TEST
> +#define TEST sse4_1_test
> +#endif
> +
> +#include CHECK_H
> +
> +#include <smmintrin.h>
> +
> +#define NUM 64
> +
> +static unsigned short
> +int_to_ushort (int iVal)
> +{
> + unsigned short sVal;
> +
> + if (iVal < 0)
> + sVal = 0;
> + else if (iVal > 0xffff)
> + sVal = 0xffff;
> + else sVal = iVal;
> +
> + return sVal;
> +}
> +
> +static void
> +TEST (void)
> +{
> + union
> + {
> + __m128i x[NUM / 4];
> + int i[NUM];
> + } src1, src2;
> + union
> + {
> + __m128i x[NUM / 4];
> + unsigned short s[NUM * 2];
> + } dst;
> + int i, sign = 1;
> +
> + for (i = 0; i < NUM; i++)
> + {
> + src1.i[i] = i * i * sign;
> + src2.i[i] = (i + 20) * sign;
> + sign = -sign;
> + }
> +
> + for (i = 0; i < NUM; i += 4)
> + dst.x[i / 4] = _mm_packus_epi32 (src1.x [i / 4], src2.x [i / 4]);
> +
> + for (i = 0; i < NUM; i ++)
> + {
> + int dstIndex;
> + unsigned short sVal;
> +
> + sVal = int_to_ushort (src1.i[i]);
> + dstIndex = (i % 4) + (i / 4) * 8;
> + if (sVal != dst.s[dstIndex])
> + abort ();
> +
> + sVal = int_to_ushort (src2.i[i]);
> + dstIndex += 4;
> + if (sVal != dst.s[dstIndex])
> + abort ();
> + }
> +}
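
The dstIndex arithmetic in this test encodes the result layout of
_mm_packus_epi32.  A scalar sketch of one 128-bit operation, assuming the
x86 element numbering the test relies on (function names here are
hypothetical, not from the patch):

```c
#include <stdint.h>

/* Saturate a signed 32-bit value to the unsigned 16-bit range, in the
   same way as int_to_ushort in the test.  */
static uint16_t
sat_u16 (int32_t v)
{
  if (v < 0)
    return 0;
  if (v > 0xffff)
    return 0xffff;
  return (uint16_t) v;
}

/* Hypothetical scalar model of _mm_packus_epi32 on one vector pair:
   the four lanes of A fill out[0..3], the four lanes of B fill
   out[4..7], each saturated to 16 bits.  */
static void
ref_packus_epi32 (const int32_t a[4], const int32_t b[4], uint16_t out[8])
{
  for (int i = 0; i < 4; i++)
    {
      out[i] = sat_u16 (a[i]);
      out[i + 4] = sat_u16 (b[i]);
    }
}
```

This matches the check above: element i of src1 lands at
(i % 4) + (i / 4) * 8, and the matching element of src2 four slots later.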
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pcmpeqq.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pcmpeqq.c
> new file mode 100644
> index 000000000000..39b9f01d64a4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pcmpeqq.c
> @@ -0,0 +1,46 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mpower8-vector" } */
> +/* { dg-require-effective-target p8vector_hw } */
> +
> +#ifndef CHECK_H
> +#define CHECK_H "sse4_1-check.h"
> +#endif
> +
> +#ifndef TEST
> +#define TEST sse4_1_test
> +#endif
> +
> +#include CHECK_H
> +
> +#include <smmintrin.h>
> +
> +#define NUM 64
> +
> +static void
> +TEST (void)
> +{
> + union
> + {
> + __m128i x[NUM / 2];
> + long long ll[NUM];
> + } dst, src1, src2;
> + int i, sign=1;
> + long long is_eq;
> +
> + for (i = 0; i < NUM; i++)
> + {
> + src1.ll[i] = i * i * sign;
> + src2.ll[i] = (i + 20) * sign;
> + sign = -sign;
> + }
> +
> + for (i = 0; i < NUM; i += 2)
> + dst.x [i / 2] = _mm_cmpeq_epi64(src1.x [i / 2], src2.x [i / 2]);
> +
> + for (i = 0; i < NUM; i++)
> + {
> + is_eq = src1.ll[i] == src2.ll[i] ? 0xffffffffffffffffLL : 0LL;
> + if (is_eq != dst.ll[i])
> + abort ();
> + }
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmuldq.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmuldq.c
> new file mode 100644
> index 000000000000..6a884f46235f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmuldq.c
> @@ -0,0 +1,51 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mpower8-vector" } */
> +/* { dg-require-effective-target p8vector_hw } */
> +
> +#ifndef CHECK_H
> +#define CHECK_H "sse4_1-check.h"
> +#endif
> +
> +#ifndef TEST
> +#define TEST sse4_1_test
> +#endif
> +
> +#include CHECK_H
> +
> +#include <smmintrin.h>
> +
> +#define NUM 64
> +
> +static void
> +TEST (void)
> +{
> + union
> + {
> + __m128i x[NUM / 2];
> + long long ll[NUM];
> + } dst;
> + union
> + {
> + __m128i x[NUM / 2];
> + int i[NUM * 2];
> + } src1, src2;
> + int i, sign = 1;
> + long long value;
> +
> + for (i = 0; i < NUM * 2; i += 2)
> + {
> + src1.i[i] = i * i * sign;
> + src2.i[i] = (i + 20) * sign;
> + sign = -sign;
> + }
> +
> + for (i = 0; i < NUM; i += 2)
> + dst.x[i / 2] = _mm_mul_epi32 (src1.x[i / 2], src2.x[i / 2]);
> +
> + for (i = 0; i < NUM; i++)
> + {
> + value = (long long) src1.i[i * 2] * (long long) src2.i[i * 2];
> + if (value != dst.ll[i])
> + abort ();
> + }
> +}
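
Note that this test only initializes and checks the even-numbered 32-bit
elements, which is what _mm_mul_epi32 consumes.  A scalar sketch of one
128-bit operation, again with a hypothetical function name and assuming
x86 element numbering:

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm_mul_epi32 on one vector pair:
   elements 0 and 2 of each input are sign-extended and multiplied,
   giving two full 64-bit products.  Elements 1 and 3 are ignored.  */
static void
ref_mul_epi32 (const int32_t a[4], const int32_t b[4], int64_t out[2])
{
  out[0] = (int64_t) a[0] * (int64_t) b[0];
  out[1] = (int64_t) a[2] * (int64_t) b[2];
}
```

This corresponds to the check above, where dst.ll[i] is compared against
the product of src1.i[i * 2] and src2.i[i * 2].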
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-pmulld.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmulld.c
> new file mode 100644
> index 000000000000..150832915911
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-pmulld.c
> @@ -0,0 +1,46 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mvsx" } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +
> +#ifndef CHECK_H
> +#define CHECK_H "sse4_1-check.h"
> +#endif
> +
> +#ifndef TEST
> +#define TEST sse4_1_test
> +#endif
> +
> +#include CHECK_H
> +
> +#include <smmintrin.h>
> +
> +#define NUM 64
> +
> +static void
> +TEST (void)
> +{
> + union
> + {
> + __m128i x[NUM / 4];
> + int i[NUM];
> + } dst, src1, src2;
> + int i, sign = 1;
> + int value;
> +
> + for (i = 0; i < NUM; i++)
> + {
> + src1.i[i] = i * i * sign;
> + src2.i[i] = (i + 20) * sign;
> + sign = -sign;
> + }
> +
> + for (i = 0; i < NUM; i += 4)
> + dst.x[i / 4] = _mm_mullo_epi32 (src1.x[i / 4], src2.x[i / 4]);
> +
> + for (i = 0; i < NUM; i++)
> + {
> + value = src1.i[i] * src2.i[i];
> + if (value != dst.i[i])
> + abort ();
> + }
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_2-check.h b/gcc/testsuite/gcc.target/powerpc/sse4_2-check.h
> new file mode 100644
> index 000000000000..f6264e5a1083
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_2-check.h
> @@ -0,0 +1,18 @@
> +#define NO_WARN_X86_INTRINSICS 1
> +
> +static void sse4_2_test (void);
> +
> +static void
> +__attribute__ ((noinline))
> +do_test (void)
> +{
> + sse4_2_test ();
> +}
> +
> +int
> +main ()
> +{
> + do_test ();
> +
> + return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c b/gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c
> new file mode 100644
> index 000000000000..4bfbad885b30
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c
> @@ -0,0 +1,46 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mvsx" } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +
> +#ifndef CHECK_H
> +#define CHECK_H "sse4_2-check.h"
> +#endif
> +
> +#ifndef TEST
> +#define TEST sse4_2_test
> +#endif
> +
> +#include CHECK_H
> +
> +#include <nmmintrin.h>
> +
> +#define NUM 64
> +
> +static void
> +TEST (void)
> +{
> + union
> + {
> + __m128i x[NUM / 2];
> + long long ll[NUM];
> + } dst, src1, src2;
> + int i, sign = 1;
> + long long is_eq;
> +
> + for (i = 0; i < NUM; i++)
> + {
> + src1.ll[i] = i * i * sign;
> + src2.ll[i] = (i + 20) * sign;
> + sign = -sign;
> + }
> +
> + for (i = 0; i < NUM; i += 2)
> + dst.x[i / 2] = _mm_cmpgt_epi64 (src1.x[i / 2], src2.x[i / 2]);
> +
> + for (i = 0; i < NUM; i++)
> + {
> + is_eq = src1.ll[i] > src2.ll[i] ? 0xFFFFFFFFFFFFFFFFLL : 0LL;
> + if (is_eq != dst.ll[i])
> + abort ();
> + }
> +}