public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH 0/2] rs6000: Add support for _mm_minpos_epu16
@ 2021-06-02 22:13 Paul A. Clarke
  2021-06-02 22:13 ` [PATCH 1/2] " Paul A. Clarke
  2021-06-02 22:13 ` [PATCH 2/2] rs6000: Add test " Paul A. Clarke
  0 siblings, 2 replies; 7+ messages in thread
From: Paul A. Clarke @ 2021-06-02 22:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher

Added compatible implementation of _mm_minpos_epu16 for powerpc.
Copied, improved, and fixed testcase from i386.
Tested on BE, LE (32 and 64bit).

Paul A. Clarke (2):
  rs6000: Add support for _mm_minpos_epu16
  rs6000: Add test for _mm_minpos_epu16

 gcc/config/rs6000/smmintrin.h                 | 27 ++++++++
 .../gcc.target/powerpc/sse4_1-phminposuw.c    | 63 +++++++++++++++++++
 2 files changed, 90 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c

-- 
2.27.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] rs6000: Add support for _mm_minpos_epu16
  2021-06-02 22:13 [PATCH 0/2] rs6000: Add support for _mm_minpos_epu16 Paul A. Clarke
@ 2021-06-02 22:13 ` Paul A. Clarke
  2021-06-03  0:27   ` Segher Boessenkool
  2021-06-02 22:13 ` [PATCH 2/2] rs6000: Add test " Paul A. Clarke
  1 sibling, 1 reply; 7+ messages in thread
From: Paul A. Clarke @ 2021-06-02 22:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher

Add a naive implementation of the subject x86 intrinsic to
ease porting.

2021-06-02  Paul A. Clarke  <pc@us.ibm.com>

gcc/ChangeLog:
        * config/rs6000/smmintrin.h (_mm_minpos_epu16): New.
---
 gcc/config/rs6000/smmintrin.h | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index bdf6eb365d88..358a48958192 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -116,4 +116,31 @@ _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
   return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
 }
 
+/* Return horizontal packed word minimum and its index in bits [15:0]
+   and bits [18:16] respectively.  */
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_minpos_epu16 (__m128i __A)
+{
+  union __u
+    {
+      __m128i __m;
+      __v8hu __uh;
+    };
+  union __u __u = { .__m = __A }, __r = { .__m = {0} };
+  unsigned short __ridx = 0;
+  unsigned short __rmin = __u.__uh[__ridx];
+  for (unsigned long __i = __ridx+1;
+       __i < sizeof (__u.__uh) / sizeof (__u.__uh[0]);
+       __i++)
+    {
+      if (__u.__uh[__i] < __rmin)
+        {
+          __rmin = __u.__uh[__i];
+          __ridx = __i;
+        }
+    }
+  __r.__uh[0] = __rmin;
+  __r.__uh[1] = __ridx;
+  return __r.__m;
+}
 #endif
-- 
2.27.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 2/2] rs6000: Add test for _mm_minpos_epu16
  2021-06-02 22:13 [PATCH 0/2] rs6000: Add support for _mm_minpos_epu16 Paul A. Clarke
  2021-06-02 22:13 ` [PATCH 1/2] " Paul A. Clarke
@ 2021-06-02 22:13 ` Paul A. Clarke
  2021-06-03  1:50   ` Segher Boessenkool
  1 sibling, 1 reply; 7+ messages in thread
From: Paul A. Clarke @ 2021-06-02 22:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher

Copy the test for _mm_minpos_epu16 from
gcc/testsuite/gcc.target/i386/sse4_1-phminposuw.c, with
a few adjustments:

- Adjust the dejagnu directives for powerpc platform.
- Make the data not be monotonically increasing,
  such that some of the returned values are not
  always the first value (index 0).
- Fix a masking issue where the index was being truncated
  to 2 bits instead of 3 bits, which wasn't found because
  all of the returned indicies were 0 with the original
  generated data.
- Support big-endian.

2021-06-02  Paul A. Clarke  <pc@us.ibm.com>

gcc/testsuite/ChangeLog:
        * gcc.target/powerpc/sse4_1-phminposuw.c: Copy from
        gcc/testsuite/gcc.target/i386, make more robust.
---
 .../gcc.target/powerpc/sse4_1-phminposuw.c    | 63 +++++++++++++++++++
 1 file changed, 63 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c

diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c b/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c
new file mode 100644
index 000000000000..0b6318500b1e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-phminposuw.c
@@ -0,0 +1,63 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+/* { dg-require-effective-target p8vector_hw } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#ifndef CHECK_H
+#define CHECK_H "sse4_1-check.h"
+#endif
+
+#ifndef TEST
+#define TEST sse4_1_test
+#endif
+
+#include CHECK_H
+
+#include <smmintrin.h>
+
+#define NUM 64
+
+static void
+TEST (void)
+{
+  union
+    {
+      __m128i x[NUM/8];
+      unsigned short s[NUM];
+    } src;
+  unsigned short minVal[NUM/8];
+  int minInd[NUM/8];
+  unsigned short minValScalar, minIndScalar;
+  int i, j;
+  union
+    {
+      int i;
+      unsigned short s[2];
+    } res;
+
+  for (i = 0; i < NUM; i++)
+    src.s[i] = i * i - 68 * i + 1200;
+
+  for (i = 0, j = 0; i < NUM; i += 8, j++)
+    {
+      res.i = _mm_cvtsi128_si32 (_mm_minpos_epu16 (src.x [i/8]));
+      minVal[j] = res.s[0];
+      minInd[j] = res.s[1] & 0b111;
+    }
+
+  for (i = 0; i < NUM; i += 8)
+    {
+      minValScalar = src.s[i];
+      minIndScalar = 0;
+
+      for (j = i + 1; j < i + 8; j++)
+	if (minValScalar > src.s[j])
+	  {
+	    minValScalar = src.s[j];
+	    minIndScalar = j - i;
+	  }
+
+      if (minValScalar != minVal[i/8] && minIndScalar != minInd[i/8])
+	abort ();
+    }
+}
-- 
2.27.0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] rs6000: Add support for _mm_minpos_epu16
  2021-06-02 22:13 ` [PATCH 1/2] " Paul A. Clarke
@ 2021-06-03  0:27   ` Segher Boessenkool
  2021-06-03 17:59     ` Paul A. Clarke
  0 siblings, 1 reply; 7+ messages in thread
From: Segher Boessenkool @ 2021-06-03  0:27 UTC (permalink / raw)
  To: Paul A. Clarke; +Cc: gcc-patches

Hi!

On Wed, Jun 02, 2021 at 05:13:15PM -0500, Paul A. Clarke wrote:
> Add a naive implementation of the subject x86 intrinsic to
> ease porting.

> +/* Return horizontal packed word minimum and its index in bits [15:0]
> +   and bits [18:16] respectively.  */
> +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> +_mm_minpos_epu16 (__m128i __A)
> +{
> +  union __u
> +    {
> +      __m128i __m;
> +      __v8hu __uh;
> +    };
> +  union __u __u = { .__m = __A }, __r = { .__m = {0} };
> +  unsigned short __ridx = 0;
> +  unsigned short __rmin = __u.__uh[__ridx];
> +  for (unsigned long __i = __ridx+1;

(spaces around the "+"?)

> +       __i < sizeof (__u.__uh) / sizeof (__u.__uh[0]);

You should either use a macro for that, or just write "8" :-)

> +       __i++)
> +    {
> +      if (__u.__uh[__i] < __rmin)
> +        {
> +          __rmin = __u.__uh[__i];
> +          __ridx = __i;
> +        }
> +    }
> +  __r.__uh[0] = __rmin;
> +  __r.__uh[1] = __ridx;
> +  return __r.__m;
> +}

This does not compute the index correctly for big endian (it needs to
walk from right to left for that).  The construction of the return value
looks wrong as well.

Okay for trunk with that fixed.  Thanks!


Segher

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] rs6000: Add test for _mm_minpos_epu16
  2021-06-02 22:13 ` [PATCH 2/2] rs6000: Add test " Paul A. Clarke
@ 2021-06-03  1:50   ` Segher Boessenkool
  2021-06-03 18:01     ` Paul A. Clarke
  0 siblings, 1 reply; 7+ messages in thread
From: Segher Boessenkool @ 2021-06-03  1:50 UTC (permalink / raw)
  To: Paul A. Clarke; +Cc: gcc-patches

On Wed, Jun 02, 2021 at 05:13:16PM -0500, Paul A. Clarke wrote:
> +  for (i = 0; i < NUM; i++)
> +    src.s[i] = i * i - 68 * i + 1200;

Could you do tests with some identical elements as well?  Because that
is where I think it fails on BE currently.


Segher

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] rs6000: Add support for _mm_minpos_epu16
  2021-06-03  0:27   ` Segher Boessenkool
@ 2021-06-03 17:59     ` Paul A. Clarke
  0 siblings, 0 replies; 7+ messages in thread
From: Paul A. Clarke @ 2021-06-03 17:59 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

On Wed, Jun 02, 2021 at 07:27:35PM -0500, Segher Boessenkool wrote:
> On Wed, Jun 02, 2021 at 05:13:15PM -0500, Paul A. Clarke wrote:
> > Add a naive implementation of the subject x86 intrinsic to
> > ease porting.
> 
> > +/* Return horizontal packed word minimum and its index in bits [15:0]
> > +   and bits [18:16] respectively.  */
> > +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
> > +_mm_minpos_epu16 (__m128i __A)
> > +{
> > +  union __u
> > +    {
> > +      __m128i __m;
> > +      __v8hu __uh;
> > +    };
> > +  union __u __u = { .__m = __A }, __r = { .__m = {0} };
> > +  unsigned short __ridx = 0;
> > +  unsigned short __rmin = __u.__uh[__ridx];
> > +  for (unsigned long __i = __ridx+1;
> 
> (spaces around the "+"?)

ok

> 
> > +       __i < sizeof (__u.__uh) / sizeof (__u.__uh[0]);
> 
> You should either use a macro for that, or just write "8" :-)

ok. (There should be a standard thing for this operation.)

> > +       __i++)
> > +    {
> > +      if (__u.__uh[__i] < __rmin)
> > +        {
> > +          __rmin = __u.__uh[__i];
> > +          __ridx = __i;
> > +        }
> > +    }
> > +  __r.__uh[0] = __rmin;
> > +  __r.__uh[1] = __ridx;
> > +  return __r.__m;
> > +}
> 
> This does not compute the index correctly for big endian (it needs to
> walk from right to left for that).  The construction of the return value
> looks wrong as well.
> 
> Okay for trunk with that fixed.  Thanks!

I'm not seeing the issue here. The values are numbered by element order,
and the results are in the "first" (minimum value) and "second" (index of
first encountered minimum value in element order) elements of the result.

PC

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] rs6000: Add test for _mm_minpos_epu16
  2021-06-03  1:50   ` Segher Boessenkool
@ 2021-06-03 18:01     ` Paul A. Clarke
  0 siblings, 0 replies; 7+ messages in thread
From: Paul A. Clarke @ 2021-06-03 18:01 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

On Wed, Jun 02, 2021 at 08:50:56PM -0500, Segher Boessenkool wrote:
> On Wed, Jun 02, 2021 at 05:13:16PM -0500, Paul A. Clarke wrote:
> > +  for (i = 0; i < NUM; i++)
> > +    src.s[i] = i * i - 68 * i + 1200;
> 
> Could you do tests with some identical elements as well?  Because that
> is where I think it fails on BE currently.

Let me re-do the test case a bit more to provide a better set of
input data, rather than this computational attempt which misses a
bunch of interesting cases.

I'll send a v2 in a bit.

PC

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-06-03 18:01 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-02 22:13 [PATCH 0/2] rs6000: Add support for _mm_minpos_epu16 Paul A. Clarke
2021-06-02 22:13 ` [PATCH 1/2] " Paul A. Clarke
2021-06-03  0:27   ` Segher Boessenkool
2021-06-03 17:59     ` Paul A. Clarke
2021-06-02 22:13 ` [PATCH 2/2] rs6000: Add test " Paul A. Clarke
2021-06-03  1:50   ` Segher Boessenkool
2021-06-03 18:01     ` Paul A. Clarke

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).