* [PATCH v2 0/2] rs6000: Add support for _mm_minpos_epu16 @ 2021-06-08 19:11 Paul A. Clarke 2021-06-08 19:11 ` [PATCH v2 1/2] " Paul A. Clarke 2021-06-08 19:11 ` [PATCH v2 2/2] rs6000: Add test " Paul A. Clarke 0 siblings, 2 replies; 7+ messages in thread From: Paul A. Clarke @ 2021-06-08 19:11 UTC (permalink / raw) To: gcc-patches; +Cc: segher Added compatible implementation of _mm_minpos_epu16 for powerpc. Copied, improved, and fixed testcase from i386. Tested on BE, LE (32 and 64bit). Paul A. Clarke (2): rs6000: Add support for _mm_minpos_epu16 rs6000: Add test for _mm_minpos_epu16 gcc/config/rs6000/smmintrin.h | 25 +++++++ .../gcc.target/powerpc/sse4_1-phminposuw.c | 68 +++++++++++++++++++ 2 files changed, 93 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c -- 2.27.0 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v2 1/2] rs6000: Add support for _mm_minpos_epu16 2021-06-08 19:11 [PATCH v2 0/2] rs6000: Add support for _mm_minpos_epu16 Paul A. Clarke @ 2021-06-08 19:11 ` Paul A. Clarke 2021-06-21 14:27 ` [ping PATCH " Paul A. Clarke 2021-07-09 15:35 ` [PATCH " Bill Schmidt 2021-06-08 19:11 ` [PATCH v2 2/2] rs6000: Add test " Paul A. Clarke 1 sibling, 2 replies; 7+ messages in thread From: Paul A. Clarke @ 2021-06-08 19:11 UTC (permalink / raw) To: gcc-patches; +Cc: segher Add a naive implementation of the subject x86 intrinsic to ease porting. 2021-06-08 Paul A. Clarke <pc@us.ibm.com> gcc/ChangeLog: * config/rs6000/smmintrin.h (_mm_minpos_epu16): New. --- gcc/config/rs6000/smmintrin.h | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h index bdf6eb365d88..b7de38763f2b 100644 --- a/gcc/config/rs6000/smmintrin.h +++ b/gcc/config/rs6000/smmintrin.h @@ -116,4 +116,29 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask) return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask); } +/* Return horizontal packed word minimum and its index in bits [15:0] + and bits [18:16] respectively. */ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_minpos_epu16 (__m128i __A) +{ + union __u + { + __m128i __m; + __v8hu __uh; + }; + union __u __u = { .__m = __A }, __r = { .__m = {0} }; + unsigned short __ridx = 0; + unsigned short __rmin = __u.__uh[__ridx]; + for (unsigned long __i = __ridx + 1; __i < 8; __i++) + { + if (__u.__uh[__i] < __rmin) + { + __rmin = __u.__uh[__i]; + __ridx = __i; + } + } + __r.__uh[0] = __rmin; + __r.__uh[1] = __ridx; + return __r.__m; +} #endif -- 2.27.0 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [ping PATCH v2 1/2] rs6000: Add support for _mm_minpos_epu16 2021-06-08 19:11 ` [PATCH v2 1/2] " Paul A. Clarke @ 2021-06-21 14:27 ` Paul A. Clarke 2021-07-09 15:35 ` [PATCH " Bill Schmidt 1 sibling, 0 replies; 7+ messages in thread From: Paul A. Clarke @ 2021-06-21 14:27 UTC (permalink / raw) To: gcc-patches, segher Gentle ping. I now realize I forgot to include a blurb about "what changed in v2". v2: - Slight formatting changes based on Segher's review (simplified condition, single line). PC On Tue, Jun 08, 2021 at 02:11:54PM -0500, Paul A. Clarke via Gcc-patches wrote: > Add a naive implementation of the subject x86 intrinsic to > ease porting. > > 2021-06-08 Paul A. Clarke <pc@us.ibm.com> > > gcc/ChangeLog: > * config/rs6000/smmintrin.h (_mm_minpos_epu16): New. > --- > gcc/config/rs6000/smmintrin.h | 25 +++++++++++++++++++++++++ > 1 file changed, 25 insertions(+) > > diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h > index bdf6eb365d88..b7de38763f2b 100644 > --- a/gcc/config/rs6000/smmintrin.h > +++ b/gcc/config/rs6000/smmintrin.h > @@ -116,4 +116,29 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask) > return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask); > } > > +/* Return horizontal packed word minimum and its index in bits [15:0] > + and bits [18:16] respectively. */ > +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) > +_mm_minpos_epu16 (__m128i __A) > +{ > + union __u > + { > + __m128i __m; > + __v8hu __uh; > + }; > + union __u __u = { .__m = __A }, __r = { .__m = {0} }; > + unsigned short __ridx = 0; > + unsigned short __rmin = __u.__uh[__ridx]; > + for (unsigned long __i = __ridx + 1; __i < 8; __i++) > + { > + if (__u.__uh[__i] < __rmin) > + { > + __rmin = __u.__uh[__i]; > + __ridx = __i; > + } > + } > + __r.__uh[0] = __rmin; > + __r.__uh[1] = __ridx; > + return __r.__m; > +} > #endif > -- > 2.27.0 > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2 1/2] rs6000: Add support for _mm_minpos_epu16 2021-06-08 19:11 ` [PATCH v2 1/2] " Paul A. Clarke 2021-06-21 14:27 ` [ping PATCH " Paul A. Clarke @ 2021-07-09 15:35 ` Bill Schmidt 1 sibling, 0 replies; 7+ messages in thread From: Bill Schmidt @ 2021-07-09 15:35 UTC (permalink / raw) To: Paul A. Clarke, gcc-patches; +Cc: segher Hi Paul, On 6/8/21 2:11 PM, Paul A. Clarke via Gcc-patches wrote: > Add a naive implementation of the subject x86 intrinsic to > ease porting. "subject" won't be part of eventual commit, so please specify in commit blurb. > > 2021-06-08 Paul A. Clarke <pc@us.ibm.com> > > gcc/ChangeLog: > * config/rs6000/smmintrin.h (_mm_minpos_epu16): New. > --- > gcc/config/rs6000/smmintrin.h | 25 +++++++++++++++++++++++++ > 1 file changed, 25 insertions(+) > > diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h > index bdf6eb365d88..b7de38763f2b 100644 > --- a/gcc/config/rs6000/smmintrin.h > +++ b/gcc/config/rs6000/smmintrin.h > @@ -116,4 +116,29 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask) > return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask); > } > > +/* Return horizontal packed word minimum and its index in bits [15:0] > + and bits [18:16] respectively. */ > +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) Line too long, please break up. (I realize this happens throughout this file already, but...) > +_mm_minpos_epu16 (__m128i __A) > +{ > + union __u > + { > + __m128i __m; > + __v8hu __uh; > + }; > + union __u __u = { .__m = __A }, __r = { .__m = {0} }; > + unsigned short __ridx = 0; > + unsigned short __rmin = __u.__uh[__ridx]; > + for (unsigned long __i = __ridx + 1; __i < 8; __i++) "__ridx + 1" can just be "1" > + { > + if (__u.__uh[__i] < __rmin) > + { > + __rmin = __u.__uh[__i]; > + __ridx = __i; > + } Preceding four lines need tabs, not spaces. > + } > + __r.__uh[0] = __rmin; > + __r.__uh[1] = __ridx; > + return __r.__m; > +} > #endif Otherwise LGTM. I can't approve, but recommend approval with those things fixed. Thanks, Bill ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v2 2/2] rs6000: Add test for _mm_minpos_epu16 2021-06-08 19:11 [PATCH v2 0/2] rs6000: Add support for _mm_minpos_epu16 Paul A. Clarke 2021-06-08 19:11 ` [PATCH v2 1/2] " Paul A. Clarke @ 2021-06-08 19:11 ` Paul A. Clarke 2021-06-21 14:35 ` [ping PATCH " Paul A. Clarke 2021-07-09 15:46 ` [PATCH " Bill Schmidt 1 sibling, 2 replies; 7+ messages in thread From: Paul A. Clarke @ 2021-06-08 19:11 UTC (permalink / raw) To: gcc-patches; +Cc: segher Copy the test for _mm_minpos_epu16 from gcc/testsuite/gcc.target/i386/sse4_1-phminposuw.c, with a few adjustments: - Adjust the dejagnu directives for powerpc platform. - Make the data not be monotonically increasing, such that some of the returned values are not always the first value (index 0). - Create a list of input data testing various scenarios including more than one minimum value and different orders and indicies of the minimum value. - Fix a masking issue where the index was being truncated to 2 bits instead of 3 bits, which wasn't found because all of the returned indicies were 0 with the original generated data. - Support big-endian. 2021-06-08 Paul A. Clarke <pc@us.ibm.com> gcc/testsuite/ChangeLog: * gcc.target/powerpc/sse4_1-phminposuw.c: Copy from gcc/testsuite/gcc.target/i386, make more robust. --- .../gcc.target/powerpc/sse4_1-phminposuw.c | 68 +++++++++++++++++++ 1 file changed, 68 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c new file mode 100644 index 000000000000..3bb5a2dfe4f5 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c @@ -0,0 +1,68 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */ +/* { dg-require-effective-target p8vector_hw } */ + +#define NO_WARN_X86_INTRINSICS 1 +#ifndef CHECK_H +#define CHECK_H "sse4_1-check.h" +#endif + +#ifndef TEST +#define TEST sse4_1_test +#endif + +#include CHECK_H + +#include <smmintrin.h> + +#define DIM(a) (sizeof (a) / sizeof ((a)[0])) + +static void +TEST (void) +{ + union + { + __m128i x; + unsigned short s[8]; + } src[] = + { + { .s = { 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000 } }, + { .s = { 0x0000, 0xffff, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000 } }, + { .s = { 0xffff, 0xffff, 0x0000, 0xffff, 0xffff, 0xffff, 0x0000, 0xffff } }, + { .s = { 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007, 0x0008 } }, + { .s = { 0x0008, 0x0007, 0x0006, 0x0005, 0x0004, 0x0003, 0x0002, 0x0001 } }, + { .s = { 0xfff4, 0xfff3, 0xfff2, 0xfff1, 0xfff3, 0xfff1, 0xfff2, 0xfff3 } } + }; + unsigned short minVal[DIM (src)]; + int minInd[DIM (src)]; + unsigned short minValScalar, minIndScalar; + int i, j; + union + { + int i; + unsigned short s[2]; + } res; + + for (i = 0; i < DIM (src); i++) + { + res.i = _mm_cvtsi128_si32 (_mm_minpos_epu16 (src[i].x)); + minVal[i] = res.s[0]; + minInd[i] = res.s[1] & 0b111; + } + + for (i = 0; i < DIM (src); i++) + { + minValScalar = src[i].s[0]; + minIndScalar = 0; + + for (j = 1; j < 8; j++) + if (minValScalar > src[i].s[j]) + { + minValScalar = src[i].s[j]; + minIndScalar = j; + } + + if (minValScalar != minVal[i] && minIndScalar != minInd[i]) + abort (); + } +} -- 2.27.0 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [ping PATCH v2 2/2] rs6000: Add test for _mm_minpos_epu16 2021-06-08 19:11 ` [PATCH v2 2/2] rs6000: Add test " Paul A. Clarke @ 2021-06-21 14:35 ` Paul A. Clarke 2021-07-09 15:46 ` [PATCH " Bill Schmidt 1 sibling, 0 replies; 7+ messages in thread From: Paul A. Clarke @ 2021-06-21 14:35 UTC (permalink / raw) To: gcc-patches, segher Gentle ping. I now realize I forgot to include a blurb about "what changed in v2". v2: - Rewrote input computation into a fixed set of useful input sets: - All equal. - All but one equal. - A couple equal, but not in positions that are endian-identical. - Minimum first. - Minimum last. - Values which are large enough that if considered signed would be considered minimums, but not if unsigned. - Convert "dimension of array" computation to DIM macro. On Tue, Jun 08, 2021 at 02:11:55PM -0500, Paul A. Clarke via Gcc-patches wrote: > Copy the test for _mm_minpos_epu16 from > gcc/testsuite/gcc.target/i386/sse4_1-phminposuw.c, with > a few adjustments: > > - Adjust the dejagnu directives for powerpc platform. > - Make the data not be monotonically increasing, > such that some of the returned values are not > always the first value (index 0). > - Create a list of input data testing various scenarios > including more than one minimum value and different > orders and indicies of the minimum value. > - Fix a masking issue where the index was being truncated > to 2 bits instead of 3 bits, which wasn't found because > all of the returned indicies were 0 with the original > generated data. > - Support big-endian. > > 2021-06-08 Paul A. Clarke <pc@us.ibm.com> > > gcc/testsuite/ChangeLog: > * gcc.target/powerpc/sse4_1-phminposuw.c: Copy from > gcc/testsuite/gcc.target/i386, make more robust. > --- > .../gcc.target/powerpc/sse4_1-phminposuw.c | 68 +++++++++++++++++++ > 1 file changed, 68 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c > > diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c > new file mode 100644 > index 000000000000..3bb5a2dfe4f5 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c > @@ -0,0 +1,68 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */ > +/* { dg-require-effective-target p8vector_hw } */ > + > +#define NO_WARN_X86_INTRINSICS 1 > +#ifndef CHECK_H > +#define CHECK_H "sse4_1-check.h" > +#endif > + > +#ifndef TEST > +#define TEST sse4_1_test > +#endif > + > +#include CHECK_H > + > +#include <smmintrin.h> > + > +#define DIM(a) (sizeof (a) / sizeof ((a)[0])) > + > +static void > +TEST (void) > +{ > + union > + { > + __m128i x; > + unsigned short s[8]; > + } src[] = > + { > + { .s = { 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000 } }, > + { .s = { 0x0000, 0xffff, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000 } }, > + { .s = { 0xffff, 0xffff, 0x0000, 0xffff, 0xffff, 0xffff, 0x0000, 0xffff } }, > + { .s = { 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007, 0x0008 } }, > + { .s = { 0x0008, 0x0007, 0x0006, 0x0005, 0x0004, 0x0003, 0x0002, 0x0001 } }, > + { .s = { 0xfff4, 0xfff3, 0xfff2, 0xfff1, 0xfff3, 0xfff1, 0xfff2, 0xfff3 } } > + }; > + unsigned short minVal[DIM (src)]; > + int minInd[DIM (src)]; > + unsigned short minValScalar, minIndScalar; > + int i, j; > + union > + { > + int i; > + unsigned short s[2]; > + } res; > + > + for (i = 0; i < DIM (src); i++) > + { > + res.i = _mm_cvtsi128_si32 (_mm_minpos_epu16 (src[i].x)); > + minVal[i] = res.s[0]; > + minInd[i] = res.s[1] & 0b111; > + } > + > + for (i = 0; i < DIM (src); i++) > + { > + minValScalar = src[i].s[0]; > + minIndScalar = 0; > + > + for (j = 1; j < 8; j++) > + if (minValScalar > src[i].s[j]) > + { > + minValScalar = src[i].s[j]; > + minIndScalar = j; > + } > + > + if (minValScalar != minVal[i] && minIndScalar != minInd[i]) > + abort (); > + } > +} > -- > 2.27.0 > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2 2/2] rs6000: Add test for _mm_minpos_epu16 2021-06-08 19:11 ` [PATCH v2 2/2] rs6000: Add test " Paul A. Clarke 2021-06-21 14:35 ` [ping PATCH " Paul A. Clarke @ 2021-07-09 15:46 ` Bill Schmidt 1 sibling, 0 replies; 7+ messages in thread From: Bill Schmidt @ 2021-07-09 15:46 UTC (permalink / raw) To: gcc-patches Hi Paul, On 6/8/21 2:11 PM, Paul A. Clarke via Gcc-patches wrote: > Copy the test for _mm_minpos_epu16 from > gcc/testsuite/gcc.target/i386/sse4_1-phminposuw.c, with > a few adjustments: > > - Adjust the dejagnu directives for powerpc platform. > - Make the data not be monotonically increasing, > such that some of the returned values are not > always the first value (index 0). > - Create a list of input data testing various scenarios > including more than one minimum value and different > orders and indicies of the minimum value. Typo: indices > - Fix a masking issue where the index was being truncated > to 2 bits instead of 3 bits, which wasn't found because > all of the returned indicies were 0 with the original and here > generated data. > - Support big-endian. Thank you for attention to detail. :) > > 2021-06-08 Paul A. Clarke <pc@us.ibm.com> > > gcc/testsuite/ChangeLog: > * gcc.target/powerpc/sse4_1-phminposuw.c: Copy from > gcc/testsuite/gcc.target/i386, make more robust. > --- > .../gcc.target/powerpc/sse4_1-phminposuw.c | 68 +++++++++++++++++++ > 1 file changed, 68 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c > > diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c > new file mode 100644 > index 000000000000..3bb5a2dfe4f5 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c > @@ -0,0 +1,68 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */ > +/* { dg-require-effective-target p8vector_hw } */ > + > +#define NO_WARN_X86_INTRINSICS 1 > +#ifndef CHECK_H > +#define CHECK_H "sse4_1-check.h" > +#endif > + > +#ifndef TEST > +#define TEST sse4_1_test > +#endif > + > +#include CHECK_H > + > +#include <smmintrin.h> > + > +#define DIM(a) (sizeof (a) / sizeof ((a)[0])) > + > +static void > +TEST (void) > +{ > + union > + { > + __m128i x; > + unsigned short s[8]; > + } src[] = > + { > + { .s = { 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000 } }, > + { .s = { 0x0000, 0xffff, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000 } }, > + { .s = { 0xffff, 0xffff, 0x0000, 0xffff, 0xffff, 0xffff, 0x0000, 0xffff } }, > + { .s = { 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007, 0x0008 } }, > + { .s = { 0x0008, 0x0007, 0x0006, 0x0005, 0x0004, 0x0003, 0x0002, 0x0001 } }, > + { .s = { 0xfff4, 0xfff3, 0xfff2, 0xfff1, 0xfff3, 0xfff1, 0xfff2, 0xfff3 } } > + }; > + unsigned short minVal[DIM (src)]; > + int minInd[DIM (src)]; > + unsigned short minValScalar, minIndScalar; > + int i, j; > + union > + { > + int i; No need to change, but overloading i in another scope is not the greatest style. I assume this came with the original test. > + unsigned short s[2]; > + } res; > + > + for (i = 0; i < DIM (src); i++) > + { > + res.i = _mm_cvtsi128_si32 (_mm_minpos_epu16 (src[i].x)); > + minVal[i] = res.s[0]; > + minInd[i] = res.s[1] & 0b111; > + } > + > + for (i = 0; i < DIM (src); i++) > + { > + minValScalar = src[i].s[0]; > + minIndScalar = 0; > + > + for (j = 1; j < 8; j++) > + if (minValScalar > src[i].s[j]) > + { > + minValScalar = src[i].s[j]; > + minIndScalar = j; > + } > + > + if (minValScalar != minVal[i] && minIndScalar != minInd[i]) > + abort (); > + } > +} LGTM with spelling addressed. I can't approve, but recommend approval with those changes. Thanks, Bill ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2021-07-09 15:46 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-06-08 19:11 [PATCH v2 0/2] rs6000: Add support for _mm_minpos_epu16 Paul A. Clarke 2021-06-08 19:11 ` [PATCH v2 1/2] " Paul A. Clarke 2021-06-21 14:27 ` [ping PATCH " Paul A. Clarke 2021-07-09 15:35 ` [PATCH " Bill Schmidt 2021-06-08 19:11 ` [PATCH v2 2/2] rs6000: Add test " Paul A. Clarke 2021-06-21 14:35 ` [ping PATCH " Paul A. Clarke 2021-07-09 15:46 ` [PATCH " Bill Schmidt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).