public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH v3 0/2] rs6000: Add support for _mm_minpos_epu16
@ 2021-07-15 23:29 Paul A. Clarke
  2021-07-15 23:29 ` [PATCH v3 1/2] " Paul A. Clarke
  2021-07-15 23:29 ` [PATCH v3 2/2] rs6000: Add test " Paul A. Clarke
  0 siblings, 2 replies; 9+ messages in thread
From: Paul A. Clarke @ 2021-07-15 23:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, wschmidt

Added compatible implementation of _mm_minpos_epu16 for powerpc.
Copied, improved, and fixed testcase from i386.
Tested on BE, LE (32 and 64bit).

Paul A. Clarke (2):
  rs6000: Add support for _mm_minpos_epu16
  - v3: Changes per Bill's review.
  rs6000: Add test for _mm_minpos_epu16
  - v3: Changes per Bill's review.

 gcc/config/rs6000/smmintrin.h                 | 27 ++++++++
 .../gcc.target/powerpc/sse4_1-phminposuw.c    | 68 +++++++++++++++++++
 2 files changed, 95 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c

-- 
2.27.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v3 1/2] rs6000: Add support for _mm_minpos_epu16
  2021-07-15 23:29 [PATCH v3 0/2] rs6000: Add support for _mm_minpos_epu16 Paul A. Clarke
@ 2021-07-15 23:29 ` Paul A. Clarke
  2021-07-16 20:06   ` Bill Schmidt
  2021-08-02 22:29   ` Segher Boessenkool
  2021-07-15 23:29 ` [PATCH v3 2/2] rs6000: Add test " Paul A. Clarke
  1 sibling, 2 replies; 9+ messages in thread
From: Paul A. Clarke @ 2021-07-15 23:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, wschmidt

Add a naive implementation of the subject x86 intrinsic to
ease porting.

2021-07-15  Paul A. Clarke  <pc@us.ibm.com>

gcc
        * config/rs6000/smmintrin.h (_mm_minpos_epu16): New.
---
v3: Minor formatting changes per review from Bill.
v2: Minor formatting changes per review from Segher.

 gcc/config/rs6000/smmintrin.h | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 16fd34d836ff..6a010fdbb96f 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -172,4 +172,31 @@ _mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
   return any_ones * any_zeros;
 }
 
+/* Return horizontal packed word minimum and its index in bits [15:0]
+   and bits [18:16] respectively.  */
+__inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_minpos_epu16 (__m128i __A)
+{
+  union __u
+    {
+      __m128i __m;
+      __v8hu __uh;
+    };
+  union __u __u = { .__m = __A }, __r = { .__m = {0} };
+  unsigned short __ridx = 0;
+  unsigned short __rmin = __u.__uh[__ridx];
+  for (unsigned long __i = 1; __i < 8; __i++)
+    {
+      if (__u.__uh[__i] < __rmin)
+	{
+	  __rmin = __u.__uh[__i];
+	  __ridx = __i;
+	}
+    }
+  __r.__uh[0] = __rmin;
+  __r.__uh[1] = __ridx;
+  return __r.__m;
+}
+
 #endif
-- 
2.27.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v3 2/2] rs6000: Add test for _mm_minpos_epu16
  2021-07-15 23:29 [PATCH v3 0/2] rs6000: Add support for _mm_minpos_epu16 Paul A. Clarke
  2021-07-15 23:29 ` [PATCH v3 1/2] " Paul A. Clarke
@ 2021-07-15 23:29 ` Paul A. Clarke
  2021-07-16 20:19   ` Bill Schmidt
  2021-08-02 22:54   ` Segher Boessenkool
  1 sibling, 2 replies; 9+ messages in thread
From: Paul A. Clarke @ 2021-07-15 23:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, wschmidt

Copy the test for _mm_minpos_epu16 from
gcc/testsuite/gcc.target/i386/sse4_1-phminposuw.c, with
a few adjustments:

- Adjust the dejagnu directives for powerpc platform.
- Make the data not be monotonically increasing,
  such that some of the returned values are not
  always the first value (index 0).
- Create a list of input data testing various scenarios
  including more than one minimum value and different
  orders and indices of the minimum value.
- Fix a masking issue where the index was being truncated
  to 2 bits instead of 3 bits, which wasn't found because
  all of the returned indices were 0 with the original
  generated data.
- Support big-endian.

2021-07-15  Paul A. Clarke  <pc@us.ibm.com>

gcc/testsuite
        * gcc.target/powerpc/sse4_1-phminposuw.c: Copy from
        gcc/testsuite/gcc.target/i386, make more robust.
---
v3: Minor formatting changes per Bill's review.
v2: Rewrote to utilize much more interesting input data afer Segher's
    review.

 .../gcc.target/powerpc/sse4_1-phminposuw.c    | 68 +++++++++++++++++++
 1 file changed, 68 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c

diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c
new file mode 100644
index 000000000000..88d9b43c431c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c
@@ -0,0 +1,68 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+/* { dg-require-effective-target p8vector_hw } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse4_1_test
+#endif
+
+#include CHECK_H
+
+#include <smmintrin.h>
+
+#define DIM(a) (sizeof (a) / sizeof ((a)[0]))
+
+static void
+TEST (void)
+{
+  union
+    {
+      __m128i x;
+      unsigned short s[8];
+    } src[] =
+    {
+      { .s = { 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000 } },
+      { .s = { 0x0000, 0xffff, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000 } },
+      { .s = { 0xffff, 0xffff, 0x0000, 0xffff, 0xffff, 0xffff, 0x0000, 0xffff } },
+      { .s = { 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007, 0x0008 } },
+      { .s = { 0x0008, 0x0007, 0x0006, 0x0005, 0x0004, 0x0003, 0x0002, 0x0001 } },
+      { .s = { 0xfff4, 0xfff3, 0xfff2, 0xfff1, 0xfff3, 0xfff1, 0xfff2, 0xfff3 } }
+    };
+  unsigned short minVal[DIM (src)];
+  int minInd[DIM (src)];
+  unsigned short minValScalar, minIndScalar;
+  int i, j;
+  union
+    {
+      int si;
+      unsigned short s[2];
+    } res;
+
+  for (i = 0; i < DIM (src); i++)
+    {
+      res.si = _mm_cvtsi128_si32 (_mm_minpos_epu16 (src[i].x));
+      minVal[i] = res.s[0];
+      minInd[i] = res.s[1] & 0b111;
+    }
+
+  for (i = 0; i < DIM (src); i++)
+    {
+      minValScalar = src[i].s[0];
+      minIndScalar = 0;
+
+      for (j = 1; j < 8; j++)
+	if (minValScalar > src[i].s[j])
+	  {
+	    minValScalar = src[i].s[j];
+	    minIndScalar = j;
+	  }
+
+      if (minValScalar != minVal[i] && minIndScalar != minInd[i])
+	abort ();
+    }
+}
-- 
2.27.0


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/2] rs6000: Add support for _mm_minpos_epu16
  2021-07-15 23:29 ` [PATCH v3 1/2] " Paul A. Clarke
@ 2021-07-16 20:06   ` Bill Schmidt
  2021-08-02 22:29   ` Segher Boessenkool
  1 sibling, 0 replies; 9+ messages in thread
From: Bill Schmidt @ 2021-07-16 20:06 UTC (permalink / raw)
  To: Paul A. Clarke, gcc-patches; +Cc: segher

Hi Paul,

LGTM.  Recommend maintainers approve.

Thanks for the cleanups,
Bill

On 7/15/21 6:29 PM, Paul A. Clarke wrote:
> Add a naive implementation of the subject x86 intrinsic to
> ease porting.
>
> 2021-07-15  Paul A. Clarke  <pc@us.ibm.com>
>
> gcc
>          * config/rs6000/smmintrin.h (_mm_minpos_epu16): New.
> ---
> v3: Minor formatting changes per review from Bill.
> v2: Minor formatting changes per review from Segher.
>
>   gcc/config/rs6000/smmintrin.h | 27 +++++++++++++++++++++++++++
>   1 file changed, 27 insertions(+)
>
> diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
> index 16fd34d836ff..6a010fdbb96f 100644
> --- a/gcc/config/rs6000/smmintrin.h
> +++ b/gcc/config/rs6000/smmintrin.h
> @@ -172,4 +172,31 @@ _mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
>     return any_ones * any_zeros;
>   }
>   
> +/* Return horizontal packed word minimum and its index in bits [15:0]
> +   and bits [18:16] respectively.  */
> +__inline __m128i
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_minpos_epu16 (__m128i __A)
> +{
> +  union __u
> +    {
> +      __m128i __m;
> +      __v8hu __uh;
> +    };
> +  union __u __u = { .__m = __A }, __r = { .__m = {0} };
> +  unsigned short __ridx = 0;
> +  unsigned short __rmin = __u.__uh[__ridx];
> +  for (unsigned long __i = 1; __i < 8; __i++)
> +    {
> +      if (__u.__uh[__i] < __rmin)
> +	{
> +	  __rmin = __u.__uh[__i];
> +	  __ridx = __i;
> +	}
> +    }
> +  __r.__uh[0] = __rmin;
> +  __r.__uh[1] = __ridx;
> +  return __r.__m;
> +}
> +
>   #endif

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 2/2] rs6000: Add test for _mm_minpos_epu16
  2021-07-15 23:29 ` [PATCH v3 2/2] rs6000: Add test " Paul A. Clarke
@ 2021-07-16 20:19   ` Bill Schmidt
  2021-08-02 22:54   ` Segher Boessenkool
  1 sibling, 0 replies; 9+ messages in thread
From: Bill Schmidt @ 2021-07-16 20:19 UTC (permalink / raw)
  To: Paul A. Clarke, gcc-patches; +Cc: segher

Hi Paul,

Thanks for the cleanups, LGTM!  Recommend maintainers approve.

Bill

On 7/15/21 6:29 PM, Paul A. Clarke wrote:
> Copy the test for _mm_minpos_epu16 from
> gcc/testsuite/gcc.target/i386/sse4_1-phminposuw.c, with
> a few adjustments:
>
> - Adjust the dejagnu directives for powerpc platform.
> - Make the data not be monotonically increasing,
>    such that some of the returned values are not
>    always the first value (index 0).
> - Create a list of input data testing various scenarios
>    including more than one minimum value and different
>    orders and indices of the minimum value.
> - Fix a masking issue where the index was being truncated
>    to 2 bits instead of 3 bits, which wasn't found because
>    all of the returned indices were 0 with the original
>    generated data.
> - Support big-endian.
>
> 2021-07-15  Paul A. Clarke  <pc@us.ibm.com>
>
> gcc/testsuite
>          * gcc.target/powerpc/sse4_1-phminposuw.c: Copy from
>          gcc/testsuite/gcc.target/i386, make more robust.
> ---
> v3: Minor formatting changes per Bill's review.
> v2: Rewrote to utilize much more interesting input data afer Segher's
>      review.
>
>   .../gcc.target/powerpc/sse4_1-phminposuw.c    | 68 +++++++++++++++++++
>   1 file changed, 68 insertions(+)
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c
> new file mode 100644
> index 000000000000..88d9b43c431c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c
> @@ -0,0 +1,68 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> +/* { dg-require-effective-target p8vector_hw } */
> +
> +#define NO_WARN_X86_INTRINSICS 1
> +#ifndef CHECK_H
> +#define CHECK_H "sse4_1-check.h"
> +#endif
> +
> +#ifndef TEST
> +#define TEST sse4_1_test
> +#endif
> +
> +#include CHECK_H
> +
> +#include <smmintrin.h>
> +
> +#define DIM(a) (sizeof (a) / sizeof ((a)[0]))
> +
> +static void
> +TEST (void)
> +{
> +  union
> +    {
> +      __m128i x;
> +      unsigned short s[8];
> +    } src[] =
> +    {
> +      { .s = { 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000 } },
> +      { .s = { 0x0000, 0xffff, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000 } },
> +      { .s = { 0xffff, 0xffff, 0x0000, 0xffff, 0xffff, 0xffff, 0x0000, 0xffff } },
> +      { .s = { 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007, 0x0008 } },
> +      { .s = { 0x0008, 0x0007, 0x0006, 0x0005, 0x0004, 0x0003, 0x0002, 0x0001 } },
> +      { .s = { 0xfff4, 0xfff3, 0xfff2, 0xfff1, 0xfff3, 0xfff1, 0xfff2, 0xfff3 } }
> +    };
> +  unsigned short minVal[DIM (src)];
> +  int minInd[DIM (src)];
> +  unsigned short minValScalar, minIndScalar;
> +  int i, j;
> +  union
> +    {
> +      int si;
> +      unsigned short s[2];
> +    } res;
> +
> +  for (i = 0; i < DIM (src); i++)
> +    {
> +      res.si = _mm_cvtsi128_si32 (_mm_minpos_epu16 (src[i].x));
> +      minVal[i] = res.s[0];
> +      minInd[i] = res.s[1] & 0b111;
> +    }
> +
> +  for (i = 0; i < DIM (src); i++)
> +    {
> +      minValScalar = src[i].s[0];
> +      minIndScalar = 0;
> +
> +      for (j = 1; j < 8; j++)
> +	if (minValScalar > src[i].s[j])
> +	  {
> +	    minValScalar = src[i].s[j];
> +	    minIndScalar = j;
> +	  }
> +
> +      if (minValScalar != minVal[i] && minIndScalar != minInd[i])
> +	abort ();
> +    }
> +}

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/2] rs6000: Add support for _mm_minpos_epu16
  2021-07-15 23:29 ` [PATCH v3 1/2] " Paul A. Clarke
  2021-07-16 20:06   ` Bill Schmidt
@ 2021-08-02 22:29   ` Segher Boessenkool
  2021-08-03 13:01     ` Paul A. Clarke
  1 sibling, 1 reply; 9+ messages in thread
From: Segher Boessenkool @ 2021-08-02 22:29 UTC (permalink / raw)
  To: Paul A. Clarke; +Cc: gcc-patches, wschmidt

Hi!

On Thu, Jul 15, 2021 at 06:29:17PM -0500, Paul A. Clarke wrote:
> Add a naive implementation of the subject x86 intrinsic to
> ease porting.

> --- a/gcc/config/rs6000/smmintrin.h
> +++ b/gcc/config/rs6000/smmintrin.h
> @@ -172,4 +172,31 @@ _mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
>    return any_ones * any_zeros;
>  }
>  
> +/* Return horizontal packed word minimum and its index in bits [15:0]
> +   and bits [18:16] respectively.  */
> +__inline __m128i
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_minpos_epu16 (__m128i __A)
> +{
> +  union __u
> +    {
> +      __m128i __m;
> +      __v8hu __uh;
> +    };
> +  union __u __u = { .__m = __A }, __r = { .__m = {0} };
> +  unsigned short __ridx = 0;
> +  unsigned short __rmin = __u.__uh[__ridx];
> +  for (unsigned long __i = 1; __i < 8; __i++)
> +    {
> +      if (__u.__uh[__i] < __rmin)
> +	{
> +	  __rmin = __u.__uh[__i];
> +	  __ridx = __i;
> +	}
> +    }
> +  __r.__uh[0] = __rmin;
> +  __r.__uh[1] = __ridx;
> +  return __r.__m;
> +}

As before: does this work correctly on BE?  Was it tested there?

Okay for trunk if so.  Thanks!


Segher

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 2/2] rs6000: Add test for _mm_minpos_epu16
  2021-07-15 23:29 ` [PATCH v3 2/2] rs6000: Add test " Paul A. Clarke
  2021-07-16 20:19   ` Bill Schmidt
@ 2021-08-02 22:54   ` Segher Boessenkool
  1 sibling, 0 replies; 9+ messages in thread
From: Segher Boessenkool @ 2021-08-02 22:54 UTC (permalink / raw)
  To: Paul A. Clarke; +Cc: gcc-patches, wschmidt

Hi!

On Thu, Jul 15, 2021 at 06:29:18PM -0500, Paul A. Clarke wrote:
> Copy the test for _mm_minpos_epu16 from
> gcc/testsuite/gcc.target/i386/sse4_1-phminposuw.c, with
> a few adjustments:

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c
> @@ -0,0 +1,68 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
> +/* { dg-require-effective-target p8vector_hw } */

What does this need P8 for?  Please test for just what you need, and
don't use -mpower8-vector at all, it is never needed?

Okay for trunk with that fixed.  Thanks!


Segher

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/2] rs6000: Add support for _mm_minpos_epu16
  2021-08-02 22:29   ` Segher Boessenkool
@ 2021-08-03 13:01     ` Paul A. Clarke
  0 siblings, 0 replies; 9+ messages in thread
From: Paul A. Clarke @ 2021-08-03 13:01 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: wschmidt, gcc-patches

On Mon, Aug 02, 2021 at 05:29:08PM -0500, Segher Boessenkool wrote:
> On Thu, Jul 15, 2021 at 06:29:17PM -0500, Paul A. Clarke wrote:
> > Add a naive implementation of the subject x86 intrinsic to
> > ease porting.
> 
> > --- a/gcc/config/rs6000/smmintrin.h
> > +++ b/gcc/config/rs6000/smmintrin.h
> > @@ -172,4 +172,31 @@ _mm_test_mix_ones_zeros (__m128i __A, __m128i __mask)
> >    return any_ones * any_zeros;
> >  }
> >  
> > +/* Return horizontal packed word minimum and its index in bits [15:0]
> > +   and bits [18:16] respectively.  */
> > +__inline __m128i
> > +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > +_mm_minpos_epu16 (__m128i __A)
> > +{
> > +  union __u
> > +    {
> > +      __m128i __m;
> > +      __v8hu __uh;
> > +    };
> > +  union __u __u = { .__m = __A }, __r = { .__m = {0} };
> > +  unsigned short __ridx = 0;
> > +  unsigned short __rmin = __u.__uh[__ridx];
> > +  for (unsigned long __i = 1; __i < 8; __i++)
> > +    {
> > +      if (__u.__uh[__i] < __rmin)
> > +	{
> > +	  __rmin = __u.__uh[__i];
> > +	  __ridx = __i;
> > +	}
> > +    }
> > +  __r.__uh[0] = __rmin;
> > +  __r.__uh[1] = __ridx;
> > +  return __r.__m;
> > +}
> 
> As before: does this work correctly on BE?  Was it tested there?

Per the "cover letter":
| Tested on BE, LE (32 and 64bit).

> Okay for trunk if so.  Thanks!

Thanks! I'll push this shortly.

PC

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 2/2] rs6000: Add test for _mm_minpos_epu16
@ 2021-07-28  2:31 David Edelsohn
  0 siblings, 0 replies; 9+ messages in thread
From: David Edelsohn @ 2021-07-28  2:31 UTC (permalink / raw)
  To: Paul A. Clarke; +Cc: GCC Patches, Bill Schmidt

> Copy the test for _mm_minpos_epu16 from
> gcc/testsuite/gcc.target/i386/sse4_1-phminposuw.c, with
> a few adjustments:
>
> - Adjust the dejagnu directives for powerpc platform.
> - Make the data not be monotonically increasing,
>  such that some of the returned values are not
>  always the first value (index 0).
> - Create a list of input data testing various scenarios
>  including more than one minimum value and different
>  orders and indices of the minimum value.
> - Fix a masking issue where the index was being truncated
>  to 2 bits instead of 3 bits, which wasn't found because
>  all of the returned indices were 0 with the original
>  generated data.
> - Support big-endian.
>
> 2021-07-15  Paul A. Clarke  <pc@us.ibm.com>
>
> gcc/testsuite
>         * gcc.target/powerpc/sse4_1-phminposuw.c: Copy from
 >        gcc/testsuite/gcc.target/i386, make more robust.

This is okay.

Thanks David

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-08-03 13:01 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-15 23:29 [PATCH v3 0/2] rs6000: Add support for _mm_minpos_epu16 Paul A. Clarke
2021-07-15 23:29 ` [PATCH v3 1/2] " Paul A. Clarke
2021-07-16 20:06   ` Bill Schmidt
2021-08-02 22:29   ` Segher Boessenkool
2021-08-03 13:01     ` Paul A. Clarke
2021-07-15 23:29 ` [PATCH v3 2/2] rs6000: Add test " Paul A. Clarke
2021-07-16 20:19   ` Bill Schmidt
2021-08-02 22:54   ` Segher Boessenkool
2021-07-28  2:31 David Edelsohn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).