* [PATCH v3 0/2] rs6000: Add support for _mm_minpos_epu16 @ 2021-07-15 23:29 Paul A. Clarke 2021-07-15 23:29 ` [PATCH v3 1/2] " Paul A. Clarke 2021-07-15 23:29 ` [PATCH v3 2/2] rs6000: Add test " Paul A. Clarke 0 siblings, 2 replies; 10+ messages in thread From: Paul A. Clarke @ 2021-07-15 23:29 UTC (permalink / raw) To: gcc-patches; +Cc: segher, wschmidt Added compatible implementation of _mm_minpos_epu16 for powerpc. Copied, improved, and fixed testcase from i386. Tested on BE, LE (32 and 64bit). Paul A. Clarke (2): rs6000: Add support for _mm_minpos_epu16 - v3: Changes per Bill's review. rs6000: Add test for _mm_minpos_epu16 - v3: Changes per Bill's review. gcc/config/rs6000/smmintrin.h | 27 ++++++++ .../gcc.target/powerpc/sse4_1-phminposuw.c | 68 +++++++++++++++++++ 2 files changed, 95 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c -- 2.27.0 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3 1/2] rs6000: Add support for _mm_minpos_epu16 2021-07-15 23:29 [PATCH v3 0/2] rs6000: Add support for _mm_minpos_epu16 Paul A. Clarke @ 2021-07-15 23:29 ` Paul A. Clarke 2021-07-16 20:06 ` Bill Schmidt 2021-08-02 22:29 ` Segher Boessenkool 2021-07-15 23:29 ` [PATCH v3 2/2] rs6000: Add test " Paul A. Clarke 1 sibling, 2 replies; 10+ messages in thread From: Paul A. Clarke @ 2021-07-15 23:29 UTC (permalink / raw) To: gcc-patches; +Cc: segher, wschmidt Add a naive implementation of the subject x86 intrinsic to ease porting. 2021-07-15 Paul A. Clarke <pc@us.ibm.com> gcc * config/rs6000/smmintrin.h (_mm_minpos_epu16): New. --- v3: Minor formatting changes per review from Bill. v2: Minor formatting changes per review from Segher. gcc/config/rs6000/smmintrin.h | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h index 16fd34d836ff..6a010fdbb96f 100644 --- a/gcc/config/rs6000/smmintrin.h +++ b/gcc/config/rs6000/smmintrin.h @@ -172,4 +172,31 @@ _mm_test_mix_ones_zeros (__m128i __A, __m128i __mask) return any_ones * any_zeros; } +/* Return horizontal packed word minimum and its index in bits [15:0] + and bits [18:16] respectively. */ +__inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_minpos_epu16 (__m128i __A) +{ + union __u + { + __m128i __m; + __v8hu __uh; + }; + union __u __u = { .__m = __A }, __r = { .__m = {0} }; + unsigned short __ridx = 0; + unsigned short __rmin = __u.__uh[__ridx]; + for (unsigned long __i = 1; __i < 8; __i++) + { + if (__u.__uh[__i] < __rmin) + { + __rmin = __u.__uh[__i]; + __ridx = __i; + } + } + __r.__uh[0] = __rmin; + __r.__uh[1] = __ridx; + return __r.__m; +} + #endif -- 2.27.0 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 1/2] rs6000: Add support for _mm_minpos_epu16 2021-07-15 23:29 ` [PATCH v3 1/2] " Paul A. Clarke @ 2021-07-16 20:06 ` Bill Schmidt 2021-08-02 22:29 ` Segher Boessenkool 1 sibling, 0 replies; 10+ messages in thread From: Bill Schmidt @ 2021-07-16 20:06 UTC (permalink / raw) To: Paul A. Clarke, gcc-patches; +Cc: segher Hi Paul, LGTM. Recommend maintainers approve. Thanks for the cleanups, Bill On 7/15/21 6:29 PM, Paul A. Clarke wrote: > Add a naive implementation of the subject x86 intrinsic to > ease porting. > > 2021-07-15 Paul A. Clarke <pc@us.ibm.com> > > gcc > * config/rs6000/smmintrin.h (_mm_minpos_epu16): New. > --- > v3: Minor formatting changes per review from Bill. > v2: Minor formatting changes per review from Segher. > > gcc/config/rs6000/smmintrin.h | 27 +++++++++++++++++++++++++++ > 1 file changed, 27 insertions(+) > > diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h > index 16fd34d836ff..6a010fdbb96f 100644 > --- a/gcc/config/rs6000/smmintrin.h > +++ b/gcc/config/rs6000/smmintrin.h > @@ -172,4 +172,31 @@ _mm_test_mix_ones_zeros (__m128i __A, __m128i __mask) > return any_ones * any_zeros; > } > > +/* Return horizontal packed word minimum and its index in bits [15:0] > + and bits [18:16] respectively. */ > +__inline __m128i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm_minpos_epu16 (__m128i __A) > +{ > + union __u > + { > + __m128i __m; > + __v8hu __uh; > + }; > + union __u __u = { .__m = __A }, __r = { .__m = {0} }; > + unsigned short __ridx = 0; > + unsigned short __rmin = __u.__uh[__ridx]; > + for (unsigned long __i = 1; __i < 8; __i++) > + { > + if (__u.__uh[__i] < __rmin) > + { > + __rmin = __u.__uh[__i]; > + __ridx = __i; > + } > + } > + __r.__uh[0] = __rmin; > + __r.__uh[1] = __ridx; > + return __r.__m; > +} > + > #endif ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 1/2] rs6000: Add support for _mm_minpos_epu16 2021-07-15 23:29 ` [PATCH v3 1/2] " Paul A. Clarke 2021-07-16 20:06 ` Bill Schmidt @ 2021-08-02 22:29 ` Segher Boessenkool 2021-08-03 13:01 ` Paul A. Clarke 1 sibling, 1 reply; 10+ messages in thread From: Segher Boessenkool @ 2021-08-02 22:29 UTC (permalink / raw) To: Paul A. Clarke; +Cc: gcc-patches, wschmidt Hi! On Thu, Jul 15, 2021 at 06:29:17PM -0500, Paul A. Clarke wrote: > Add a naive implementation of the subject x86 intrinsic to > ease porting. > --- a/gcc/config/rs6000/smmintrin.h > +++ b/gcc/config/rs6000/smmintrin.h > @@ -172,4 +172,31 @@ _mm_test_mix_ones_zeros (__m128i __A, __m128i __mask) > return any_ones * any_zeros; > } > > +/* Return horizontal packed word minimum and its index in bits [15:0] > + and bits [18:16] respectively. */ > +__inline __m128i > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > +_mm_minpos_epu16 (__m128i __A) > +{ > + union __u > + { > + __m128i __m; > + __v8hu __uh; > + }; > + union __u __u = { .__m = __A }, __r = { .__m = {0} }; > + unsigned short __ridx = 0; > + unsigned short __rmin = __u.__uh[__ridx]; > + for (unsigned long __i = 1; __i < 8; __i++) > + { > + if (__u.__uh[__i] < __rmin) > + { > + __rmin = __u.__uh[__i]; > + __ridx = __i; > + } > + } > + __r.__uh[0] = __rmin; > + __r.__uh[1] = __ridx; > + return __r.__m; > +} As before: does this work correctly on BE? Was it tested there? Okay for trunk if so. Thanks! Segher ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 1/2] rs6000: Add support for _mm_minpos_epu16 2021-08-02 22:29 ` Segher Boessenkool @ 2021-08-03 13:01 ` Paul A. Clarke 0 siblings, 0 replies; 10+ messages in thread From: Paul A. Clarke @ 2021-08-03 13:01 UTC (permalink / raw) To: Segher Boessenkool; +Cc: wschmidt, gcc-patches On Mon, Aug 02, 2021 at 05:29:08PM -0500, Segher Boessenkool wrote: > On Thu, Jul 15, 2021 at 06:29:17PM -0500, Paul A. Clarke wrote: > > Add a naive implementation of the subject x86 intrinsic to > > ease porting. > > > --- a/gcc/config/rs6000/smmintrin.h > > +++ b/gcc/config/rs6000/smmintrin.h > > @@ -172,4 +172,31 @@ _mm_test_mix_ones_zeros (__m128i __A, __m128i __mask) > > return any_ones * any_zeros; > > } > > > > +/* Return horizontal packed word minimum and its index in bits [15:0] > > + and bits [18:16] respectively. */ > > +__inline __m128i > > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) > > +_mm_minpos_epu16 (__m128i __A) > > +{ > > + union __u > > + { > > + __m128i __m; > > + __v8hu __uh; > > + }; > > + union __u __u = { .__m = __A }, __r = { .__m = {0} }; > > + unsigned short __ridx = 0; > > + unsigned short __rmin = __u.__uh[__ridx]; > > + for (unsigned long __i = 1; __i < 8; __i++) > > + { > > + if (__u.__uh[__i] < __rmin) > > + { > > + __rmin = __u.__uh[__i]; > > + __ridx = __i; > > + } > > + } > > + __r.__uh[0] = __rmin; > > + __r.__uh[1] = __ridx; > > + return __r.__m; > > +} > > As before: does this work correctly on BE? Was it tested there? Per the "cover letter": | Tested on BE, LE (32 and 64bit). > Okay for trunk if so. Thanks! Thanks! I'll push this shortly. PC ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3 2/2] rs6000: Add test for _mm_minpos_epu16 2021-07-15 23:29 [PATCH v3 0/2] rs6000: Add support for _mm_minpos_epu16 Paul A. Clarke 2021-07-15 23:29 ` [PATCH v3 1/2] " Paul A. Clarke @ 2021-07-15 23:29 ` Paul A. Clarke 2021-07-16 20:19 ` Bill Schmidt 2021-08-02 22:54 ` Segher Boessenkool 1 sibling, 2 replies; 10+ messages in thread From: Paul A. Clarke @ 2021-07-15 23:29 UTC (permalink / raw) To: gcc-patches; +Cc: segher, wschmidt Copy the test for _mm_minpos_epu16 from gcc/testsuite/gcc.target/i386/sse4_1-phminposuw.c, with a few adjustments: - Adjust the dejagnu directives for powerpc platform. - Make the data not be monotonically increasing, such that some of the returned values are not always the first value (index 0). - Create a list of input data testing various scenarios including more than one minimum value and different orders and indices of the minimum value. - Fix a masking issue where the index was being truncated to 2 bits instead of 3 bits, which wasn't found because all of the returned indices were 0 with the original generated data. - Support big-endian. 2021-07-15 Paul A. Clarke <pc@us.ibm.com> gcc/testsuite * gcc.target/powerpc/sse4_1-phminposuw.c: Copy from gcc/testsuite/gcc.target/i386, make more robust. --- v3: Minor formatting changes per Bill's review. v2: Rewrote to utilize much more interesting input data afer Segher's review. .../gcc.target/powerpc/sse4_1-phminposuw.c | 68 +++++++++++++++++++ 1 file changed, 68 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c new file mode 100644 index 000000000000..88d9b43c431c --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c @@ -0,0 +1,68 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */ +/* { dg-require-effective-target p8vector_hw } */ + +#define NO_WARN_X86_INTRINSICS 1 +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include <smmintrin.h> + +#define DIM(a) (sizeof (a) / sizeof ((a)[0])) + +static void +TEST (void) +{ + union + { + __m128i x; + unsigned short s[8]; + } src[] = + { + { .s = { 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000 } }, + { .s = { 0x0000, 0xffff, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000 } }, + { .s = { 0xffff, 0xffff, 0x0000, 0xffff, 0xffff, 0xffff, 0x0000, 0xffff } }, + { .s = { 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007, 0x0008 } }, + { .s = { 0x0008, 0x0007, 0x0006, 0x0005, 0x0004, 0x0003, 0x0002, 0x0001 } }, + { .s = { 0xfff4, 0xfff3, 0xfff2, 0xfff1, 0xfff3, 0xfff1, 0xfff2, 0xfff3 } } + }; + unsigned short minVal[DIM (src)]; + int minInd[DIM (src)]; + unsigned short minValScalar, minIndScalar; + int i, j; + union + { + int si; + unsigned short s[2]; + } res; + + for (i = 0; i < DIM (src); i++) + { + res.si = _mm_cvtsi128_si32 (_mm_minpos_epu16 (src[i].x)); + minVal[i] = res.s[0]; + minInd[i] = res.s[1] & 0b111; + } + + for (i = 0; i < DIM (src); i++) + { + minValScalar = src[i].s[0]; + minIndScalar = 0; + + for (j = 1; j < 8; j++) + if (minValScalar > src[i].s[j]) + { + minValScalar = src[i].s[j]; + minIndScalar = j; + } + + if (minValScalar != minVal[i] && minIndScalar != minInd[i]) + abort (); + } +} -- 2.27.0 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 2/2] rs6000: Add test for _mm_minpos_epu16 2021-07-15 23:29 ` [PATCH v3 2/2] rs6000: Add test " Paul A. Clarke @ 2021-07-16 20:19 ` Bill Schmidt 2021-08-02 22:54 ` Segher Boessenkool 1 sibling, 0 replies; 10+ messages in thread From: Bill Schmidt @ 2021-07-16 20:19 UTC (permalink / raw) To: Paul A. Clarke, gcc-patches; +Cc: segher Hi Paul, Thanks for the cleanups, LGTM! Recommend maintainers approve. Bill On 7/15/21 6:29 PM, Paul A. Clarke wrote: > Copy the test for _mm_minpos_epu16 from > gcc/testsuite/gcc.target/i386/sse4_1-phminposuw.c, with > a few adjustments: > > - Adjust the dejagnu directives for powerpc platform. > - Make the data not be monotonically increasing, > such that some of the returned values are not > always the first value (index 0). > - Create a list of input data testing various scenarios > including more than one minimum value and different > orders and indices of the minimum value. > - Fix a masking issue where the index was being truncated > to 2 bits instead of 3 bits, which wasn't found because > all of the returned indices were 0 with the original > generated data. > - Support big-endian. > > 2021-07-15 Paul A. Clarke <pc@us.ibm.com> > > gcc/testsuite > * gcc.target/powerpc/sse4_1-phminposuw.c: Copy from > gcc/testsuite/gcc.target/i386, make more robust. > --- > v3: Minor formatting changes per Bill's review. > v2: Rewrote to utilize much more interesting input data afer Segher's > review. > > .../gcc.target/powerpc/sse4_1-phminposuw.c | 68 +++++++++++++++++++ > 1 file changed, 68 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c > > diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c > new file mode 100644 > index 000000000000..88d9b43c431c > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c > @@ -0,0 +1,68 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */ > +/* { dg-require-effective-target p8vector_hw } */ > + > +#define NO_WARN_X86_INTRINSICS 1 > +#ifndef CHECK_H > +#define CHECK_H "sse4_1-check.h" > +#endif > + > +#ifndef TEST > +#define TEST sse4_1_test > +#endif > + > +#include CHECK_H > + > +#include <smmintrin.h> > + > +#define DIM(a) (sizeof (a) / sizeof ((a)[0])) > + > +static void > +TEST (void) > +{ > + union > + { > + __m128i x; > + unsigned short s[8]; > + } src[] = > + { > + { .s = { 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000 } }, > + { .s = { 0x0000, 0xffff, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000 } }, > + { .s = { 0xffff, 0xffff, 0x0000, 0xffff, 0xffff, 0xffff, 0x0000, 0xffff } }, > + { .s = { 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007, 0x0008 } }, > + { .s = { 0x0008, 0x0007, 0x0006, 0x0005, 0x0004, 0x0003, 0x0002, 0x0001 } }, > + { .s = { 0xfff4, 0xfff3, 0xfff2, 0xfff1, 0xfff3, 0xfff1, 0xfff2, 0xfff3 } } > + }; > + unsigned short minVal[DIM (src)]; > + int minInd[DIM (src)]; > + unsigned short minValScalar, minIndScalar; > + int i, j; > + union > + { > + int si; > + unsigned short s[2]; > + } res; > + > + for (i = 0; i < DIM (src); i++) > + { > + res.si = _mm_cvtsi128_si32 (_mm_minpos_epu16 (src[i].x)); > + minVal[i] = res.s[0]; > + minInd[i] = res.s[1] & 0b111; > + } > + > + for (i = 0; i < DIM (src); i++) > + { > + minValScalar = src[i].s[0]; > + minIndScalar = 0; > + > + for (j = 1; j < 8; j++) > + if (minValScalar > src[i].s[j]) > + { > + minValScalar = src[i].s[j]; > + minIndScalar = j; > + } > + > + if (minValScalar != minVal[i] && minIndScalar != minInd[i]) > + abort (); > + } > +} ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 2/2] rs6000: Add test for _mm_minpos_epu16 2021-07-15 23:29 ` [PATCH v3 2/2] rs6000: Add test " Paul A. Clarke 2021-07-16 20:19 ` Bill Schmidt @ 2021-08-02 22:54 ` Segher Boessenkool 1 sibling, 0 replies; 10+ messages in thread From: Segher Boessenkool @ 2021-08-02 22:54 UTC (permalink / raw) To: Paul A. Clarke; +Cc: gcc-patches, wschmidt Hi! On Thu, Jul 15, 2021 at 06:29:18PM -0500, Paul A. Clarke wrote: > Copy the test for _mm_minpos_epu16 from > gcc/testsuite/gcc.target/i386/sse4_1-phminposuw.c, with > a few adjustments: > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c > @@ -0,0 +1,68 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */ > +/* { dg-require-effective-target p8vector_hw } */ What does this need P8 for? Please test for just what you need, and don't use -mpower8-vector at all, it is never needed? Okay for trunk with that fixed. Thanks! Segher ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 1/2] rs6000: Add support for _mm_minpos_epu16
@ 2021-07-28 2:29 David Edelsohn
2021-07-29 22:03 ` Paul A. Clarke
0 siblings, 1 reply; 10+ messages in thread
From: David Edelsohn @ 2021-07-28 2:29 UTC (permalink / raw)
To: Paul A. Clarke; +Cc: GCC Patches, Bill Schmidt
> Add a naive implementation of the subject x86 intrinsic to
> ease porting.
>
> 2021-07-15 Paul A. Clarke <pc@us.ibm.com>
>
> gcc
> * config/rs6000/smmintrin.h (_mm_minpos_epu16): New.
Segher already approved this with the changes requested.
Thanks, David
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 1/2] rs6000: Add support for _mm_minpos_epu16 2021-07-28 2:29 [PATCH v3 1/2] rs6000: Add support " David Edelsohn @ 2021-07-29 22:03 ` Paul A. Clarke 0 siblings, 0 replies; 10+ messages in thread From: Paul A. Clarke @ 2021-07-29 22:03 UTC (permalink / raw) To: David Edelsohn; +Cc: Bill Schmidt, GCC Patches, segher On Tue, Jul 27, 2021 at 10:29:13PM -0400, David Edelsohn via Gcc-patches wrote: > > Add a naive implementation of the subject x86 intrinsic to > > ease porting. > > > > 2021-07-15 Paul A. Clarke <pc@us.ibm.com> > > > > gcc > > * config/rs6000/smmintrin.h (_mm_minpos_epu16): New. > > Segher already approved this with the changes requested. Segher said: | This does not compute the index correctly for big endian (it needs to | walk from right to left for that). The construction of the return value | looks wrong as well. | | Okay for trunk with that fixed. Thanks! I responded: | I'm not seeing the issue here. The values are numbered by element order, | and the results are in the "first" (minimum value) and "second" (index of | first encountered minimum value in element order) elements of the result. I did not get a response, nor did I change any code. It feels like a stretch to equate the above exchange to "approved", so I'll continue to wait for explicit approval. PC ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2021-08-03 13:01 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-07-15 23:29 [PATCH v3 0/2] rs6000: Add support for _mm_minpos_epu16 Paul A. Clarke 2021-07-15 23:29 ` [PATCH v3 1/2] " Paul A. Clarke 2021-07-16 20:06 ` Bill Schmidt 2021-08-02 22:29 ` Segher Boessenkool 2021-08-03 13:01 ` Paul A. Clarke 2021-07-15 23:29 ` [PATCH v3 2/2] rs6000: Add test " Paul A. Clarke 2021-07-16 20:19 ` Bill Schmidt 2021-08-02 22:54 ` Segher Boessenkool 2021-07-28 2:29 [PATCH v3 1/2] rs6000: Add support " David Edelsohn 2021-07-29 22:03 ` Paul A. Clarke
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).