public inbox for gcc-patches@gcc.gnu.org
* [PATCH, rs6000] Add x86 intrinsic headers to GCC PPC64LE target
@ 2017-05-08 14:50 Steven Munroe
  2017-05-09 17:35 ` Segher Boessenkool
  2017-05-12 18:39 ` Mike Stump
  0 siblings, 2 replies; 10+ messages in thread
From: Steven Munroe @ 2017-05-08 14:50 UTC (permalink / raw)
  To: gcc-patches; +Cc: Segher Boessenkool, David Edelsohn

A common issue in porting applications and packages is that someone may
have forgotten that there is more than one hardware platform. 

A specific example is applications using Intel x86 intrinsic functions
without appropriate conditional compile guards. Another example is a
developer tasked to port a large volume of code containing important
functions "optimized" with Intel x86 intrinsics, but without the skill
or time to perform the same optimization for another platform. Often the
developer who wrote the original optimization has moved on and those
left to maintain the application / package lack understanding of the
original x86 intrinsic code or design.
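
For example, a minimal conditional guard like the following sketch (the
macro name HAVE_X86_INTRIN is mine, purely illustrative) is all it takes
to keep such code from breaking non-x86 builds, yet it is often missing:

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#define HAVE_X86_INTRIN 1
#endif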

For PowerPC this can be especially acute for HPC vector SIMD codes. The
PowerISA (as implemented for POWER and OpenPOWER servers) has extensive
vector hardware facilities and GCC provides a large set of vector
intrinsics. Thus I would like to restrict this support to PowerPC
targets that support VMX/VSX and PowerISA-2.07 (power8) and later.

But the difference in (intrinsic) spelling alone is enough to stop many
application developers in their tracks.

So I propose to submit a series of patches to implement the PowerPC64LE
equivalent of a useful subset of the x86 intrinsics. The final size and
usefulness of this effort is to be determined. The proposal is to
incrementally port intrinsic header files from the ./config/i386 tree to
the ./config/rs6000 tree. This naturally provides the same header
structure and intrinsic names which will simplify code porting.

It seems natural to work from the bottom (oldest) up, for example
starting with mmintrin.h and working our way up through the following
headers (listed here from newest to oldest):

smmintrin.h    (SSE4.1)  includes tmmintrin.h
tmmintrin.h    (SSSE3)   includes pmmintrin.h
pmmintrin.h    (SSE3)    includes emmintrin.h
emmintrin.h    (SSE2)    includes xmmintrin.h
xmmintrin.h    (SSE)     includes mmintrin.h and mm_malloc.h
mmintrin.h     (MMX)

There is also a smattering of non-vector intrinsics in common use, such
as the Bit Manipulation Instruction sets (BMI and BMI2):

bmiintrin.h
bmi2intrin.h
x86intrin.h     (collector includes BMI headers and many others)
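
Once installed, usage mirrors the x86 side. A minimal sketch (the
function name is mine; __blsr_u64 is implemented in the patch below as
simple C):

#define NO_WARN_X86_INTRINSICS 1
#include <x86intrin.h>

unsigned long long
clear_lowest_set_bit (unsigned long long x)
{
  /* Expands to (x & (x - 1)) on PowerPC, per bmiintrin.h below.  */
  return __blsr_u64 (x);
}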

The older (BMI/MMX/SSE) instructions have been integrated into GCC, and
many of the intrinsic implementations are simple C code or common GCC
built-ins. The remaining intrinsic functions are implemented as x86
platform-specific built-ins (__builtin_ia32_*) and need to be mapped to
an equivalent PowerPC built-in or to a vector intrinsic from altivec.h.
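
To illustrate the latter case with a hypothetical sketch (this intrinsic
is not part of this patch; the later SSE2 work would define the real
types), an x86 intrinsic built on __builtin_ia32_paddd128 could map to
vec_add from altivec.h roughly like this:

#include <altivec.h>

typedef __vector long long __m128i;  /* assumed vector type for this sketch */

extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_add_epi32 (__m128i __A, __m128i __B)
{
  /* Element-wise 32-bit integer add via the altivec.h intrinsic.  */
  return (__m128i) vec_add ((__vector signed int) __A,
			    (__vector signed int) __B);
}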

Of course as part of this process we will port as many of the
corresponding DejaGnu tests from gcc/testsuite/gcc.target/i386/ to
gcc/testsuite/gcc.target/powerpc/ as appropriate. So far the dg-do run
tests only require minor source changes, mostly to the platform specific
dg-* directives. A few dg-do compile tests are needed to ensure we are
getting the expected folding / common subexpression elimination (CSE)
and generating the optimal sequence for PowerPC.
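
For example, porting a dg-do run test usually reduces to directive
changes like these (the i386 lines are illustrative from memory; the
powerpc lines are verbatim from the tests below):

/* i386 original (illustrative):  */
/* { dg-do run } */
/* { dg-options "-O3 -mbmi -fno-inline" } */

/* powerpc port:  */
/* { dg-do run } */
/* { dg-require-effective-target lp64 } */
/* { dg-options "-O3 -m64 -fno-inline" } */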

To get the ball rolling I include the BMI intrinsics ported to PowerPC
for review, as they are a reasonable size (31 intrinsic implementations).

[gcc]

2017-05-04  Steven Munroe  <munroesj@gcc.gnu.org>

        * config.gcc (powerpc*-*-*): Add bmi2intrin.h, bmiintrin.h,
        and x86intrin.h.
        * config/rs6000/bmiintrin.h: New file.
        * config/rs6000/bmi2intrin.h: New file.
        * config/rs6000/x86intrin.h: New file.

[gcc/testsuite]

2017-05-04  Steven Munroe  <munroesj@gcc.gnu.org>

        * gcc.target/powerpc/bmi-andn-1.c: New file.
        * gcc.target/powerpc/bmi-andn-2.c: New file.
        * gcc.target/powerpc/bmi-bextr-1.c: New file.
        * gcc.target/powerpc/bmi-bextr-2.c: New file.
        * gcc.target/powerpc/bmi-bextr-4.c: New file.
        * gcc.target/powerpc/bmi-bextr-5.c: New file.
        * gcc.target/powerpc/bmi-blsi-1.c: New file.
        * gcc.target/powerpc/bmi-blsi-2.c: New file.
        * gcc.target/powerpc/bmi-blsmsk-1.c: New file.
        * gcc.target/powerpc/bmi-blsmsk-2.c: New file.
        * gcc.target/powerpc/bmi-blsr-1.c: New file.
        * gcc.target/powerpc/bmi-blsr-2.c: New file.
        * gcc.target/powerpc/bmi-check.h: New file.
        * gcc.target/powerpc/bmi-tzcnt-1.c: New file.
        * gcc.target/powerpc/bmi-tzcnt-2.c: New file.
        * gcc.target/powerpc/bmi2-bzhi32-1.c: New file.
        * gcc.target/powerpc/bmi2-bzhi64-1.c: New file.
        * gcc.target/powerpc/bmi2-bzhi64-1a.c: New file.
        * gcc.target/powerpc/bmi2-check.h: New file.
        * gcc.target/powerpc/bmi2-mulx32-1.c: New file.
        * gcc.target/powerpc/bmi2-mulx32-2.c: New file.
        * gcc.target/powerpc/bmi2-mulx64-1.c: New file.
        * gcc.target/powerpc/bmi2-mulx64-2.c: New file.
        * gcc.target/powerpc/bmi2-pdep32-1.c: New file.
        * gcc.target/powerpc/bmi2-pdep64-1.c: New file.
        * gcc.target/powerpc/bmi2-pext32-1.c: New file.
        * gcc.target/powerpc/bmi2-pext64-1.c: New file.
        * gcc.target/powerpc/bmi2-pext64-1a.c: New file.

Index: gcc/testsuite/gcc.target/powerpc/bmi-blsi-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi-blsi-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi-blsi-1.c	(revision 0)
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O3 -m64 -fno-inline" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <x86intrin.h>
+#include "bmi-check.h"
+
+/* To fool the compiler, so it does not generate blsi here. */
+long long calc_blsi_u64 (long long src1, long long src2)
+{
+  return (-src1) & (src2);
+}
+
+static void
+bmi_test()
+{
+  unsigned i;
+
+  long long src = 0xfacec0ffeefacec0;
+  long long res, res_ref;
+
+  for (i=0; i<5; ++i) {
+    src = (i + src) << i;
+
+    res_ref = calc_blsi_u64 (src, src);
+    res = __blsi_u64 (src);
+
+    if (res != res_ref)
+      abort();
+  }
+}
Index: gcc/testsuite/gcc.target/powerpc/bmi-bextr-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi-bextr-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi-bextr-2.c	(revision 0)
@@ -0,0 +1,49 @@
+/* { dg-do run } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O3 -m64 -fno-inline" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <x86intrin.h>
+#include "bmi-check.h"
+
+unsigned calc_bextr_u32 (unsigned src1, unsigned src2)
+{
+  unsigned res = 0;
+  unsigned char start = (src2 & 0xff);
+  unsigned char len = (int) ((src2 >> 8) & 0xff);
+  if (start < 32) {
+    unsigned i;
+    unsigned last = (start+len) < 32 ? start+len : 32;
+
+    src1 >>= start;
+    for (i=start; i<last; ++i) {
+      res |= (src1 & 1) << (i-start);
+      src1 >>= 1;
+    }
+  }
+
+  return res;
+}
+
+static void
+bmi_test ()
+{
+  unsigned i;
+  unsigned char start, len;
+  unsigned src1 = 0xfacec0ff;
+  unsigned res, res_ref, src2;
+
+  for (i=0; i<5; ++i) {
+    start = (i * 1983) % 32;
+    len = (i + (i * 1983)) % 32;
+
+    src1 = src1 * 3;
+    src2 = start | (((unsigned)len) << 8);
+
+    res_ref = calc_bextr_u32 (src1, src2);
+    res = __bextr_u32 (src1, src2);
+
+    if (res != res_ref)
+      abort();
+  }
+}
Index: gcc/testsuite/gcc.target/powerpc/bmi2-bzhi64-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi2-bzhi64-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-bzhi64-1.c	(revision 0)
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -m64 -mcpu=power7" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <x86intrin.h>
+#include "bmi2-check.h"
+
+__attribute__((noinline))
+unsigned long long
+calc_bzhi_u64 (unsigned long long a, int l)
+{
+  unsigned long long res = a;
+  int i;
+  for (i = 0; i < 64 - l; ++i)
+    res &= ~(1LL << (63 - i));
+
+  return res;
+}
+
+static void
+bmi2_test ()
+{
+  unsigned i;
+  unsigned long long src = 0xce7ace0ce7ace0ff;
+  unsigned long long res, res_ref;
+
+  for (i = 0; i < 5; ++i) {
+    src = src * (i + 1);
+
+    res_ref = calc_bzhi_u64 (src, i * 2);
+    res = _bzhi_u64 (src, i * 2);
+
+    if (res != res_ref)
+      abort();
+  }
+}
Index: gcc/testsuite/gcc.target/powerpc/bmi-blsi-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi-blsi-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi-blsi-2.c	(revision 0)
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O3 -m64 -fno-inline" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <x86intrin.h>
+#include "bmi-check.h"
+
+/* To fool the compiler, so it does not generate blsi here. */
+int calc_blsi_u32 (int src1, int src2)
+{
+  return (-src1) & (src2);
+}
+
+static void
+bmi_test()
+{
+  unsigned i;
+  int src = 0xfacec0ff;
+  int res, res_ref;
+
+  for (i=0; i<5; ++i) {
+    src = (i + src) << i;
+
+    res_ref = calc_blsi_u32 (src, src);
+    res = __blsi_u32 (src);
+
+    if (res != res_ref)
+      abort();
+  }
+}
Index: gcc/testsuite/gcc.target/powerpc/bmi-blsr-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi-blsr-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi-blsr-1.c	(revision 0)
@@ -0,0 +1,30 @@
+/* { dg-do run } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O3 -m64 -fno-inline" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <x86intrin.h>
+#include "bmi-check.h"
+
+long long calc_blsr_u64 (long long src1, long long src2)
+{
+  return (src1-1) & (src2);
+}
+
+static void
+bmi_test()
+{
+  unsigned i;
+  long long src = 0xfacec0ffeefacec0;
+  long long res, res_ref;
+
+  for (i=0; i<5; ++i) {
+    src = (i + src) << i;
+
+    res_ref = calc_blsr_u64 (src, src);
+    res = __blsr_u64 (src);
+
+    if (res != res_ref)
+      abort();
+  }
+}
Index: gcc/testsuite/gcc.target/powerpc/bmi2-pdep64-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi2-pdep64-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-pdep64-1.c	(revision 0)
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -m64 -mcpu=power7" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <x86intrin.h>
+#include "bmi2-check.h"
+
+__attribute__((noinline))
+unsigned long long
+calc_pdep_u64 (unsigned long long a, unsigned long long mask)
+{
+  unsigned long long res = 0;
+  unsigned long long i, k = 0;
+
+  for (i = 0; i < 64; ++i)
+    if (mask & (1LL << i)) {
+      res |= ((a & (1LL << k)) >> k) << i;
+      ++k;
+    }
+  return res;
+}
+
+static
+void
+bmi2_test ()
+{
+  unsigned long long i;
+  unsigned long long src = 0xce7acce7acce7ac;
+  unsigned long long res, res_ref;
+
+  for (i = 0; i < 5; ++i) {
+    src = src * (i + 1);
+
+    res_ref = calc_pdep_u64 (src, ~(i * 3));
+    res = _pdep_u64 (src, ~(i * 3));
+
+    if (res != res_ref)
+      abort ();
+  }
+}
Index: gcc/testsuite/gcc.target/powerpc/bmi-bextr-4.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi-bextr-4.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi-bextr-4.c	(revision 0)
@@ -0,0 +1,49 @@
+/* { dg-do run } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O3 -m64 -fno-inline" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <x86intrin.h>
+#include "bmi-check.h"
+
+unsigned calc_bextr_u32 (unsigned src1, unsigned src2)
+{
+  unsigned res = 0;
+  unsigned char start = (src2 & 0xff);
+  unsigned char len = (int) ((src2 >> 8) & 0xff);
+  if (start < 32) {
+    unsigned i;
+    unsigned last = (start+len) < 32 ? start+len : 32;
+
+    src1 >>= start;
+    for (i=start; i<last; ++i) {
+      res |= (src1 & 1) << (i-start);
+      src1 >>= 1;
+    }
+  }
+
+  return res;
+}
+
+static void
+bmi_test ()
+{
+  unsigned i;
+  unsigned char start, len;
+  unsigned src1 = 0xfacec0ff;
+  unsigned res, res_ref, src2;
+
+  for (i=0; i<5; ++i) {
+    start = i * 4;
+    len = i * 4;
+
+    src1 = src1 * 3;
+    src2 = (start & 0xff) | ((len & 0xff) << 8);
+
+    res_ref = calc_bextr_u32 (src1, src2);
+    res = _bextr_u32 (src1, start, len);
+
+    if (res != res_ref)
+      abort();
+  }
+}
Index: gcc/testsuite/gcc.target/powerpc/bmi-check.h
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi-check.h	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi-check.h	(revision 0)
@@ -0,0 +1,30 @@
+#include <stdio.h>
+#include <stdlib.h>
+
+static void bmi_test (void);
+
+static void
+__attribute__ ((noinline))
+do_test (void)
+{
+  bmi_test ();
+}
+
+int
+main ()
+{
+  /* Need 64-bit for 64-bit longs as single instruction.  */
+  if ( __builtin_cpu_supports ("ppc64") )
+    {
+      do_test ();
+#ifdef DEBUG
+      printf ("PASSED\n");
+#endif
+    }
+#ifdef DEBUG
+  else
+    printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/bmi-blsr-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi-blsr-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi-blsr-2.c	(revision 0)
@@ -0,0 +1,30 @@
+/* { dg-do run } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O3 -m64 -fno-inline" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <x86intrin.h>
+#include "bmi-check.h"
+
+int calc_blsr_u32 (int src1, int src2)
+{
+  return (src1-1) & (src2);
+}
+
+static void
+bmi_test ()
+{
+  unsigned i;
+  int src = 0xfacec0ff;
+  int res, res_ref;
+
+  for (i=0; i<5; ++i) {
+    src = (i + src) << i;
+
+    res_ref = calc_blsr_u32 (src, src);
+    res = __blsr_u32 (src);
+
+    if (res != res_ref)
+      abort();
+  }
+}
Index: gcc/testsuite/gcc.target/powerpc/bmi-bextr-5.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi-bextr-5.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi-bextr-5.c	(revision 0)
@@ -0,0 +1,49 @@
+/* { dg-do run } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O3 -m64 -fno-inline" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <x86intrin.h>
+#include "bmi-check.h"
+
+long long calc_bextr_u64 (unsigned long long src1,
+			  unsigned long long src2)
+{
+  long long res = 0;
+  unsigned char start = (src2 & 0xff);
+  unsigned char len = (int) ((src2 >> 8) & 0xff);
+  if (start < 64) {
+    unsigned i;
+    unsigned last = (start+len) < 64 ? start+len : 64;
+
+    src1 >>= start;
+    for (i=start; i<last; ++i) {
+      res |= (src1 & 1) << (i-start);
+      src1 >>= 1;
+    }
+  }
+
+  return res;
+}
+
+static void
+bmi_test ()
+{
+  unsigned i;
+  unsigned char start, len;
+  unsigned long long src1 = 0xfacec0ffeefacec0;
+  unsigned long long res, res_ref, src2;
+
+  for (i=0; i<5; ++i) {
+    start = i * 4;
+    len = i * 3;
+    src1 = src1 * 3;
+    src2 = (start & 0xff) | ((len & 0xff) << 8);
+
+    res_ref = calc_bextr_u64 (src1, src2);
+    res = _bextr_u64 (src1, start, len);
+
+    if (res != res_ref)
+      abort();
+  }
+}
Index: gcc/testsuite/gcc.target/powerpc/bmi-blsmsk-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi-blsmsk-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi-blsmsk-1.c	(revision 0)
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O3 -m64 -fno-inline" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <x86intrin.h>
+#include "bmi-check.h"
+
+/*  Trick compiler in order not to generate target insn here. */
+long long calc_blsmsk_u64 (long long src1, long long src2)
+{
+  return (src1-1) ^ (src2);
+}
+
+static void
+bmi_test ()
+{
+  unsigned i;
+  long long src = 0xfacec0ffeefacec0;
+  long long res, res_ref;
+
+  for (i=0; i<5; ++i) {
+    src = (i + src) << i;
+
+    res_ref = calc_blsmsk_u64 (src, src);
+    res = __blsmsk_u64 (src);
+
+    if (res != res_ref)
+      abort();
+  }
+}
Index: gcc/testsuite/gcc.target/powerpc/bmi-blsmsk-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi-blsmsk-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi-blsmsk-2.c	(revision 0)
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O3 -m64 -fno-inline" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <x86intrin.h>
+#include "bmi-check.h"
+
+/*  Trick compiler in order not to generate target insn here. */
+int calc_blsmsk_u32 (int src1, int src2)
+{
+  return (src1-1) ^ (src2);
+}
+
+static void
+bmi_test ()
+{
+  unsigned i;
+  int src = 0xfacec0ff;
+  int res, res_ref;
+
+  for (i=0; i<5; ++i) {
+    src = (i + src) << i;
+
+    res_ref = calc_blsmsk_u32 (src, src);
+    res = __blsmsk_u32 (src);
+
+    if (res != res_ref)
+      abort();
+  }
+}
Index: gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1.c	(revision 0)
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -m64 -mcpu=power7" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <x86intrin.h>
+#include "bmi2-check.h"
+
+__attribute__((noinline))
+unsigned long long
+calc_pext_u64 (unsigned long long a, unsigned long long mask)
+{
+  unsigned long long res = 0;
+  int i, k = 0;
+
+  for (i = 0; i < 64; ++i)
+    if (mask & (1LL << i)) {
+      res |= ((a & (1LL << i)) >> i) << k;
+      ++k;
+    }
+
+  return res;
+}
+
+static void
+bmi2_test ()
+{
+  unsigned long long i;
+  unsigned long long src = 0xce7acce7acce7ac;
+  unsigned long long res, res_ref;
+
+  for (i = 0; i < 5; ++i) {
+    src = src * (i + 1);
+
+    res_ref = calc_pext_u64 (src, ~(i * 3));
+    res = _pext_u64 (src, ~(i * 3));
+
+    if (res != res_ref)
+      abort();
+  }
+}
Index: gcc/testsuite/gcc.target/powerpc/bmi2-bzhi32-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi2-bzhi32-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-bzhi32-1.c	(revision 0)
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -m64 -mcpu=power7" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <x86intrin.h>
+#include "bmi2-check.h"
+
+__attribute__((noinline))
+unsigned
+calc_bzhi_u32 (unsigned a, int l)
+{
+  unsigned res = a;
+  int i;
+  for (i = 0; i < 32 - l; ++i)
+    res &= ~(1 << (31 - i));
+
+  return res;
+}
+
+static void
+bmi2_test ()
+{
+  unsigned i;
+  unsigned src = 0xce7ace0f;
+  unsigned res, res_ref;
+
+  for (i = 0; i < 5; ++i) {
+    src = src * (i + 1);
+
+    res_ref = calc_bzhi_u32 (src, i * 2);
+    res = _bzhi_u32 (src, i * 2);
+
+    if (res != res_ref)
+      abort();
+  }
+}
Index: gcc/testsuite/gcc.target/powerpc/bmi2-pdep32-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi2-pdep32-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-pdep32-1.c	(revision 0)
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -m64 -mcpu=power7" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <x86intrin.h>
+#include "bmi2-check.h"
+
+__attribute__((noinline))
+unsigned
+calc_pdep_u32 (unsigned a, int mask)
+{
+  unsigned res = 0;
+  int i, k = 0;
+
+  for (i = 0; i < 32; ++i)
+    if (mask & (1 << i)) {
+      res |= ((a & (1 << k)) >> k) << i;
+      ++k;
+    }
+
+  return res;
+}
+
+static void
+bmi2_test ()
+{
+  unsigned i;
+  unsigned src = 0xce7acc;
+  unsigned res, res_ref;
+
+  for (i = 0; i < 5; ++i) {
+    src = src * (i + 1);
+
+    res_ref = calc_pdep_u32 (src, i * 3);
+    res = _pdep_u32 (src, i * 3);
+
+    if (res != res_ref)
+      abort();
+  }
+}
Index: gcc/testsuite/gcc.target/powerpc/bmi-andn-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi-andn-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi-andn-1.c	(revision 0)
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -m64" } */
+/* { dg-require-effective-target lp64 } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <x86intrin.h>
+#include "bmi-check.h"
+
+long long calc_andn_u64 (long long src1,
+			 long long src2,
+			 long long dummy)
+{
+  return (~src1 + dummy) & (src2);
+}
+
+static void
+bmi_test()
+{
+  unsigned i;
+
+  long long src = 0xfacec0ffeefacec0;
+  long long res, res_ref;
+
+  for (i=0; i<5; ++i) {
+    src = (i + src) << i;
+
+    res_ref = calc_andn_u64 (src, src+i, 0);
+    res = __andn_u64 (src, src+i);
+
+    if (res != res_ref)
+      abort();
+  }
+}
Index: gcc/testsuite/gcc.target/powerpc/bmi-tzcnt-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi-tzcnt-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi-tzcnt-1.c	(revision 0)
@@ -0,0 +1,38 @@
+/* { dg-do run } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O3 -m64 -fno-inline" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <x86intrin.h>
+#include "bmi-check.h"
+
+long long calc_tzcnt_u64 (long long src)
+{
+  int i;
+  int res = 0;
+
+  while ( (res<64) && ((src&1) == 0)) {
+    ++res;
+    src >>= 1;
+  }
+
+  return res;
+}
+
+static void
+bmi_test ()
+{
+  unsigned i;
+  long long src = 0xfacec0ffeefacec0;
+  long long res, res_ref;
+
+  for (i=0; i<5; ++i) {
+    src = (i + src) << i;
+
+    res_ref = calc_tzcnt_u64 (src);
+    res = __tzcnt_u64 (src);
+
+    if (res != res_ref)
+      abort();
+  }
+}
Index: gcc/testsuite/gcc.target/powerpc/bmi2-check.h
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi2-check.h	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-check.h	(revision 0)
@@ -0,0 +1,33 @@
+#include <stdio.h>
+#include <stdlib.h>
+
+static void bmi2_test (void);
+
+static void
+__attribute__ ((noinline))
+do_test (void)
+{
+  bmi2_test ();
+}
+
+int
+main ()
+{
+  /* The BMI2 test for pext requires the Bit Permute Doubleword
+     (bpermd) instruction, added in PowerISA 2.06 along with the VSX
+     facility.  So we can test for arch_2_06.  */
+  if ( __builtin_cpu_supports ("arch_2_06") )
+    {
+      do_test ();
+#ifdef DEBUG
+      printf ("PASSED\n");
+#endif
+    }
+#ifdef DEBUG
+  else
+    printf ("SKIPPED\n");
+#endif
+
+  return 0;
+}
+
Index: gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1a.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1a.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-pext64-1a.c	(revision 0)
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -m64 -mcpu=power7" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <x86intrin.h>
+
+unsigned long long
+test__pexp_cmask_u64 (unsigned long long a[4])
+{
+  /* The _pext implementation is nominally a popcount of the mask,
+     followed by a loop using count leading zeros to find the
+     next bit to process.
+     If the mask is a const, the popcount should be folded and
+     the constant propagation should eliminate the mask
+     generation loop and produce a single constant bpermd permute
+     control word.
+     This test verifies that the compiler is replacing the mask
+     popcount and loop with a const bperm control and generating
+     the bpermd for this case.  */
+  const unsigned long mask = 0x00000000100000a4UL;
+  unsigned long res;
+  res = _pext_u64 (a[0], mask);
+  res = (res << 8) | _pext_u64 (a[1], mask);
+  res = (res << 8) | _pext_u64 (a[2], mask);
+  res = (res << 8) | _pext_u64 (a[3], mask);
+  return (res);
+}
+/* The resulting assembler should have 4 X bpermd and no popcntd or
+   cntlzd instructions.  */
+
+/* { dg-final { scan-assembler-times "bpermd" 4 } } */
+/* { dg-final { scan-assembler-not "popcntd" } } */
+/* { dg-final { scan-assembler-not "cntlzd" } } */
Index: gcc/testsuite/gcc.target/powerpc/bmi-andn-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi-andn-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi-andn-2.c	(revision 0)
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -m64" } */
+/* { dg-require-effective-target lp64 } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <x86intrin.h>
+#include "bmi-check.h"
+
+long long calc_andn_u32 (int src1, int src2, int dummy)
+{
+  return (~src1+dummy) & (src2);
+}
+
+static void
+bmi_test()
+{
+  unsigned i;
+
+  int src = 0xfacec0ff;
+  int res, res_ref;
+
+  for (i=0; i<5; ++i) {
+    src = (i + src) << i;
+
+    res_ref = calc_andn_u32 (src, src+i, 0);
+    res = __andn_u32 (src, src+i);
+
+    if (res != res_ref)
+      abort();
+  }
+}
Index: gcc/testsuite/gcc.target/powerpc/bmi-tzcnt-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi-tzcnt-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi-tzcnt-2.c	(revision 0)
@@ -0,0 +1,37 @@
+/* { dg-do run } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O3 -m64 -fno-inline" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <x86intrin.h>
+#include "bmi-check.h"
+
+int calc_tzcnt_u32 (int src)
+{
+  int res = 0;
+
+  while ( (res<32) && ((src&1) == 0)) {
+    ++res;
+    src >>= 1;
+  }
+  return res;
+}
+
+static void
+bmi_test ()
+{
+  unsigned i;
+  int src = 0xfacec0ff;
+  int res, res_ref;
+
+  for (i=0; i<5; ++i) {
+    src = i + (src << i);
+
+    res_ref = calc_tzcnt_u32 (src);
+    res = __tzcnt_u32 (src);
+
+    if (res != res_ref)
+      abort();
+  }
+}
Index: gcc/testsuite/gcc.target/powerpc/bmi2-bzhi64-1a.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi2-bzhi64-1a.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-bzhi64-1a.c	(revision 0)
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -m64 -mcpu=power7" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <x86intrin.h>
+
+unsigned long long
+test__bzhi_u64_group (unsigned long long a)
+{
+  /* bzhi is implemented in source as shift left then shift right
+   to clear the high order bits.
+   For the case where the starting index is const, the compiler
+   should reduce this to a single Rotate Left Doubleword
+   Immediate then Clear Left (rldicl) instruction.  */
+  unsigned long long res;
+  res = _bzhi_u64 (a, 8);
+  res += _bzhi_u64 (a, 16);
+  res += _bzhi_u64 (a, 24);
+  res += _bzhi_u64 (a, 32);
+  res += _bzhi_u64 (a, 40);
+  res += _bzhi_u64 (a, 48);
+  return (res);
+}
+/* The resulting assembler should have 6 X rldicl and no sld or
+   srd instructions.  */
+
+/* { dg-final { scan-assembler-times "rldicl" 6 } } */
+/* { dg-final { scan-assembler-not "sld" } } */
+/* { dg-final { scan-assembler-not "srd" } } */
Index: gcc/testsuite/gcc.target/powerpc/bmi2-pext32-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi2-pext32-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-pext32-1.c	(revision 0)
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -m64 -mcpu=power7" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <x86intrin.h>
+#include "bmi2-check.h"
+
+__attribute__((noinline))
+unsigned
+calc_pext_u32 (unsigned a, unsigned mask)
+{
+  unsigned res = 0;
+  int i, k = 0;
+
+  for (i = 0; i < 32; ++i)
+    if (mask & (1 << i)) {
+      res |= ((a & (1 << i)) >> i) << k;
+      ++k;
+    }
+
+  return res;
+}
+
+static void
+bmi2_test ()
+{
+  unsigned i;
+  unsigned src = 0xce7acc;
+  unsigned res, res_ref;
+
+  for (i = 0; i < 5; ++i) {
+    src = src * (i + 1);
+
+    res_ref = calc_pext_u32 (src, ~(i * 3));
+    res = _pext_u32 (src, ~(i * 3));
+
+    if (res != res_ref)
+      abort();
+  }
+}
Index: gcc/testsuite/gcc.target/powerpc/bmi2-mulx64-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi2-mulx64-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-mulx64-1.c	(revision 0)
@@ -0,0 +1,38 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -m64 -mcpu=power7" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include "bmi2-check.h"
+
+__attribute__((noinline))
+unsigned __int128
+calc_mul_u64 (unsigned long long volatile a, unsigned long long b)
+{
+  unsigned __int128 res = 0;
+  int i;
+  for (i = 0; i < b; ++i)
+    res += (unsigned __int128) a;
+
+  return res;
+}
+
+static void
+bmi2_test ()
+{
+  unsigned i;
+  unsigned long long a = 0xce7ace0ce7ace0;
+  unsigned long long b = 0xface;
+  unsigned __int128 res, res_ref;
+
+  for (i=0; i<5; ++i) {
+    a = a * (i + 1);
+    b = b / (i + 1);
+
+    res_ref = calc_mul_u64 (a, b);
+    res = (unsigned __int128) a * b;
+
+    if (res != res_ref)
+      abort();
+  }
+}
Index: gcc/testsuite/gcc.target/powerpc/bmi2-mulx64-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi2-mulx64-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-mulx64-2.c	(revision 0)
@@ -0,0 +1,53 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -m64 -mcpu=power7" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <x86intrin.h>
+#include "bmi2-check.h"
+
+__attribute__((noinline))
+unsigned __int128
+calc_mul_u64 (unsigned long long volatile a, unsigned long long b)
+{
+  unsigned __int128 res = 0;
+  int i;
+  for (i = 0; i < b; ++i)
+    res += (unsigned __int128) a;
+
+  return res;
+}
+
+__attribute__((noinline))
+unsigned long long
+calc_mulx_u64 (unsigned long long x,
+	       unsigned long long y,
+	       unsigned long long *res_h)
+{
+  return _mulx_u64 (x, y, res_h);
+}
+
+
+static void
+bmi2_test ()
+{
+  unsigned i;
+  unsigned long long a = 0xce7ace0ce7ace0;
+  unsigned long long b = 0xface;
+  unsigned long long res_l, res_h;
+  unsigned __int128 res, res_ref;
+
+  for (i=0; i<5; ++i) {
+    a = a * (i + 1);
+    b = b / (i + 1);
+
+    res_ref = calc_mul_u64 (a, b);
+
+    res_l = calc_mulx_u64 (a, b, &res_h);
+
+    res = ((unsigned __int128) res_h << 64) | res_l;
+
+    if (res != res_ref)
+      abort();
+  }
+}
Index: gcc/testsuite/gcc.target/powerpc/bmi2-mulx32-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi2-mulx32-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-mulx32-1.c	(revision 0)
@@ -0,0 +1,49 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -m64 -mcpu=power7" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include "bmi2-check.h"
+
+__attribute__((noinline))
+unsigned long long
+calc_mul_u32 (unsigned volatile a, unsigned b)
+{
+  unsigned long long res = 0;
+  int i;
+  for (i = 0; i < b; ++i)
+    res += a;
+
+  return res;
+}
+
+__attribute__((noinline))
+unsigned long long
+gen_mulx (unsigned a, unsigned b)
+{
+  unsigned long long res;
+
+  res = (unsigned long long)a * b;
+
+  return res;
+}
+
+static void
+bmi2_test ()
+{
+  unsigned i;
+  unsigned a = 0xce7ace0;
+  unsigned b = 0xfacefff;
+  unsigned long long res, res_ref;
+
+  for (i = 0; i < 5; ++i) {
+    a = a * (i + 1);
+    b = b / (i + 1);
+
+    res_ref = calc_mul_u32 (a, b);
+    res = gen_mulx (a, b);
+
+    if (res != res_ref)
+      abort();
+  }
+}
Index: gcc/testsuite/gcc.target/powerpc/bmi2-mulx32-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi2-mulx32-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi2-mulx32-2.c	(revision 0)
@@ -0,0 +1,48 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -m64 -mcpu=power7" } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <x86intrin.h>
+#include "bmi2-check.h"
+
+__attribute__((noinline))
+unsigned long long
+calc_mul_u32 (unsigned volatile a, unsigned b)
+{
+  unsigned long long res = 0;
+  int i;
+  for (i = 0; i < b; ++i)
+    res += a;
+
+  return res;
+}
+
+__attribute__((noinline))
+unsigned calc_mulx_u32 (unsigned x, unsigned y, unsigned *res_h)
+{
+  return (unsigned) _mulx_u32 (x, y, res_h);
+}
+
+static void
+bmi2_test ()
+{
+  unsigned i;
+  unsigned a = 0xce7ace0;
+  unsigned b = 0xfacefff;
+  unsigned res_l, res_h;
+  unsigned long long res, res_ref;
+
+  for (i = 0; i < 5; ++i) {
+    a = a * (i + 1);
+    b = b / (i + 1);
+
+    res_ref = calc_mul_u32 (a, b);
+    res_l = calc_mulx_u32 (a, b, &res_h);
+
+    res = ((unsigned long long) res_h << 32) | res_l;
+
+    if (res != res_ref)
+      abort();
+  }
+}
Index: gcc/testsuite/gcc.target/powerpc/bmi-bextr-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/bmi-bextr-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/bmi-bextr-1.c	(revision 0)
@@ -0,0 +1,50 @@
+/* { dg-do run } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -m64 -fno-inline" } */
+
+#define NO_WARN_X86_INTRINSICS 1
+#include <x86intrin.h>
+#include "bmi-check.h"
+
+long long calc_bextr_u64 (unsigned long long src1,
+			  unsigned long long src2)
+{
+  long long res = 0;
+  unsigned char start = (src2 & 0xff);
+  unsigned char len = (int) ((src2 >> 8) & 0xff);
+  if (start < 64) {
+    unsigned i;
+    unsigned last = (start+len) < 64 ? start+len : 64;
+
+    src1 >>= start;
+    for (i=start; i<last; ++i) {
+      res |= (src1 & 1) << (i-start);
+      src1 >>= 1;
+    }
+  }
+
+  return res;
+}
+
+static void
+bmi_test ()
+{
+  unsigned i;
+  unsigned char start, len;
+  unsigned long long src1 = 0xfacec0ffeefacec0;
+  unsigned long long res, res_ref, src2;
+
+  for (i=0; i<5; ++i) {
+    start = (i * 1983) % 64;
+    len = (i + (i * 1983)) % 64;
+
+    src1 = src1 * 3;
+    src2 = start | (((unsigned long long)len) << 8);
+
+    res_ref = calc_bextr_u64 (src1, src2);
+    res = __bextr_u64 (src1, src2);
+
+    if (res != res_ref)
+      abort ();
+  }
+}
Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc	(revision 247616)
+++ gcc/config.gcc	(working copy)
@@ -444,7 +444,7 @@ nvptx-*-*)
 	;;
 powerpc*-*-*)
 	cpu_type=rs6000
-	extra_headers="ppc-asm.h altivec.h spe.h ppu_intrinsics.h paired.h
spu2vmx.h vec_types.h si2vmx.h htmintrin.h htmxlintrin.h"
+	extra_headers="ppc-asm.h altivec.h spe.h ppu_intrinsics.h paired.h
spu2vmx.h vec_types.h si2vmx.h htmintrin.h htmxlintrin.h bmi2intrin.h
bmiintrin.h x86intrin.h"
 	case x$with_cpu in
 	    xpowerpc64|xdefault64|x6[23]0|x970|xG5|xpower[3456789]|xpower6x|
xrs64a|xcell|xa2|xe500mc64|xe5500|xe6500)
 		cpu_is_64bit=yes
Index: gcc/config/rs6000/bmiintrin.h
===================================================================
--- gcc/config/rs6000/bmiintrin.h	(revision 0)
+++ gcc/config/rs6000/bmiintrin.h	(revision 0)
@@ -0,0 +1,187 @@
+/* Copyright (C) 2010-2017 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This header is distributed to simplify porting x86_64 code that
+   makes explicit use of Intel intrinsics to powerpc64le.
+   It is the user's responsibility to determine if the results are
+   acceptable and make additional changes as necessary.
+   Note that much code that uses Intel intrinsics can be rewritten in
+   standard C or GNU C extensions, which are more portable and better
+   optimized across multiple targets.  */
+
+#if !defined _X86INTRIN_H_INCLUDED
+# error "Never use <bmiintrin.h> directly; include <x86intrin.h> instead."
+#endif
+
+#ifndef _BMIINTRIN_H_INCLUDED
+#define _BMIINTRIN_H_INCLUDED
+
+extern __inline unsigned short __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__tzcnt_u16 (unsigned short __X)
+{
+  return __builtin_ctz (__X);
+}
+
+extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__andn_u32 (unsigned int __X, unsigned int __Y)
+{
+  return (~__X & __Y);
+}
+
+extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_bextr_u32 (unsigned int __X, unsigned int __P, unsigned int __L)
+{
+  return ((__X << (32 - (__L + __P))) >> (32 - __L));
+}
+
+extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__bextr_u32 (unsigned int __X, unsigned int __Y)
+{
+  unsigned int __P, __L;
+  __P = __Y & 0xFF;
+  __L = (__Y >> 8) & 0xFF;
+  return (_bextr_u32 (__X, __P, __L));
+}
+
+extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__blsi_u32 (unsigned int __X)
+{
+  return (__X & -__X);
+}
+
+extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_blsi_u32 (unsigned int __X)
+{
+  return __blsi_u32 (__X);
+}
+
+extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__blsmsk_u32 (unsigned int __X)
+{
+  return (__X ^ (__X - 1));
+}
+
+extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_blsmsk_u32 (unsigned int __X)
+{
+  return __blsmsk_u32 (__X);
+}
+
+extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__blsr_u32 (unsigned int __X)
+{
+  return (__X & (__X - 1));
+}
+
+extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_blsr_u32 (unsigned int __X)
+{
+  return __blsr_u32 (__X);
+}
+
+extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__tzcnt_u32 (unsigned int __X)
+{
+  return __builtin_ctz (__X);
+}
+
+extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_tzcnt_u32 (unsigned int __X)
+{
+  return __builtin_ctz (__X);
+}
+
+/* Use the 64-bit shift, rotate, and count leading zeros instructions
+   for long long.  */
+#ifdef  __PPC64__
+extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__andn_u64 (unsigned long long __X, unsigned long long __Y)
+{
+  return (~__X & __Y);
+}
+
+extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_bextr_u64 (unsigned long long __X, unsigned int __P, unsigned int __L)
+{
+  return ((__X << (64 - (__L + __P))) >> (64 - __L));
+}
+
+extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__bextr_u64 (unsigned long long __X, unsigned long long __Y)
+{
+  unsigned int __P, __L;
+  __P = __Y & 0xFF;
+  __L = (__Y & 0xFF00) >> 8;
+  return (_bextr_u64 (__X, __P, __L));
+}
+
+extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__blsi_u64 (unsigned long long __X)
+{
+  return __X & -__X;
+}
+
+extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_blsi_u64 (unsigned long long __X)
+{
+  return __blsi_u64 (__X);
+}
+
+extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__blsmsk_u64 (unsigned long long __X)
+{
+  return (__X ^ (__X - 1));
+}
+
+extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_blsmsk_u64 (unsigned long long __X)
+{
+  return __blsmsk_u64 (__X);
+}
+
+extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__blsr_u64 (unsigned long long __X)
+{
+  return (__X & (__X - 1));
+}
+
+extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_blsr_u64 (unsigned long long __X)
+{
+  return __blsr_u64 (__X);
+}
+
+extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__tzcnt_u64 (unsigned long long __X)
+{
+  return __builtin_ctzll (__X);
+}
+
+extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_tzcnt_u64 (unsigned long long __X)
+{
+  return __builtin_ctzll (__X);
+}
+#endif /* __PPC64__  */
+
+#endif /* _BMIINTRIN_H_INCLUDED */
Index: gcc/config/rs6000/bmi2intrin.h
===================================================================
--- gcc/config/rs6000/bmi2intrin.h	(revision 0)
+++ gcc/config/rs6000/bmi2intrin.h	(revision 0)
@@ -0,0 +1,169 @@
+/* Copyright (C) 2011-2017 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This header is distributed to simplify porting x86_64 code that
+   makes explicit use of Intel intrinsics to powerpc64le.
+   It is the user's responsibility to determine if the results are
+   acceptable and make additional changes as necessary.
+   Note that much code that uses Intel intrinsics can be rewritten in
+   standard C or GNU C extensions, which are more portable and better
+   optimized across multiple targets.  */
+
+#if !defined _X86INTRIN_H_INCLUDED
+# error "Never use <bmi2intrin.h> directly; include <x86intrin.h> instead."
+#endif
+
+#ifndef _BMI2INTRIN_H_INCLUDED
+#define _BMI2INTRIN_H_INCLUDED
+
+extern __inline unsigned int
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_bzhi_u32 (unsigned int __X, unsigned int __Y)
+{
+  return ((__X << (32 - __Y)) >> (32 - __Y));
+}
+
+extern __inline unsigned int
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mulx_u32 (unsigned int __X, unsigned int __Y, unsigned int *__P)
+{
+  unsigned long long __res = (unsigned long long) __X * __Y;
+  *__P = (unsigned int) (__res >> 32);
+  return (unsigned int) __res;
+}
+
+#ifdef  __PPC64__
+extern __inline unsigned long long
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_bzhi_u64 (unsigned long long __X, unsigned long long __Y)
+{
+  return ((__X << (64 - __Y)) >> (64 - __Y));
+}
+
+/* __int128 requires base 64-bit.  */
+extern __inline unsigned long long
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mulx_u64 (unsigned long long __X, unsigned long long __Y,
+	   unsigned long long *__P)
+{
+  unsigned __int128 __res = (unsigned __int128) __X * __Y;
+  *__P = (unsigned long long) (__res >> 64);
+  return (unsigned long long) __res;
+}
+
+#ifdef  _ARCH_PWR7
+/* popcount and bpermd require power7 minimum.  */
+extern __inline unsigned long long
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_pdep_u64 (unsigned long long __X, unsigned long long __M)
+{
+  unsigned long result = 0x0UL;
+  const unsigned long mask = 0x8000000000000000UL;
+  unsigned long m = __M;
+  unsigned long c, t;
+  unsigned long p;
+
+  /* The pop-count of the mask gives the number of bits from the
+   source to process.  This is also needed to shift bits from the
+   source into the correct position for the result.  */
+  p = 64 - __builtin_popcountl (__M);
+
+  /* The loop is for the number of '1' bits in the mask and clearing
+   each mask bit as it is processed.  */
+  while (m != 0)
+    {
+      c = __builtin_clzl (m);
+      t = __X << (p - c);
+      m ^= (mask >> c);
+      result |= (t & (mask >> c));
+      p++;
+    }
+  return (result);
+}
+
+extern __inline unsigned long long
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_pext_u64 (unsigned long long __X, unsigned long long __M)
+{
+  unsigned long p = 0x4040404040404040UL; /* Initial bit permute control.  */
+  const unsigned long mask = 0x8000000000000000UL;
+  unsigned long m = __M;
+  unsigned long c;
+  unsigned long result;
+
+  /* If the mask is constant and selects 8 bits or less we can use
+   the Bit Permute Doubleword (bpermd) instruction.  */
+  if (__builtin_constant_p (__M) && (__builtin_popcountl (__M) <= 8))
+    {
+      /* Also, if the pext mask is constant, then the popcount is
+       constant; we can evaluate the following loop at compile
+       time and use a constant bit permute vector.  */
+      for (long i = 0; i < __builtin_popcountl (__M); i++)
+	{
+	  c = __builtin_clzl (m);
+	  p = (p << 8) | c;
+	  m ^= (mask >> c);
+	}
+      result = __builtin_bpermd (p, __X);
+    }
+  else
+    {
+      p = 64 - __builtin_popcountl (__M);
+      result = 0;
+      /* We could use a for loop here, but that combined with
+       -funroll-loops can expand to a lot of code.  The while
+       loop avoids unrolling and the compiler commons the xor
+       from clearing the mask bit with the (m != 0) test.  The
+       result is a more compact loop setup and body.  */
+      while (m != 0)
+	{
+	  unsigned long t;
+	  c = __builtin_clzl (m);
+	  t = (__X & (mask >> c)) >> (p - c);
+	  m ^= (mask >> c);
+	  result |= (t);
+	  p++;
+	}
+    }
+  return (result);
+}
+
+/* These 32-bit implementations depend on 64-bit pdep/pext
+   which depend on _ARCH_PWR7.  */
+extern __inline unsigned int
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_pdep_u32 (unsigned int __X, unsigned int __Y)
+{
+  return _pdep_u64 (__X, __Y);
+}
+
+extern __inline unsigned int
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_pext_u32 (unsigned int __X, unsigned int __Y)
+{
+  return _pext_u64 (__X, __Y);
+}
+#endif /* _ARCH_PWR7  */
+#endif /* __PPC64__  */
+
+#endif /* _BMI2INTRIN_H_INCLUDED */
Index: gcc/config/rs6000/x86intrin.h
===================================================================
--- gcc/config/rs6000/x86intrin.h	(revision 0)
+++ gcc/config/rs6000/x86intrin.h	(revision 0)
@@ -0,0 +1,43 @@
+/* Copyright (C) 2008-2017 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef NO_WARN_X86_INTRINSICS
+/* This header is distributed to simplify porting x86_64 code that
+   makes explicit use of Intel intrinsics to powerpc64le.
+   It is the user's responsibility to determine if the results are
+   acceptable and make additional changes as necessary.
+   Note that much code that uses Intel intrinsics can be rewritten in
+   standard C or GNU C extensions, which are more portable and better
+   optimized across multiple targets.  */
+#warning "Please read comment above.  Use -DNO_WARN_X86_INTRINSICS to disable this warning."
+#endif
+
+#ifndef _X86INTRIN_H_INCLUDED
+#define _X86INTRIN_H_INCLUDED
+
+#include <bmiintrin.h>
+
+#include <bmi2intrin.h>
+
+
+#endif /* _X86INTRIN_H_INCLUDED */


* Re: [PATCH, rs6000] Add x86 intrinsic headers to GCC PPC64LE target
  2017-05-08 14:50 [PATCH, rs6000] Add x86 intrinsic headers to GCC PPC64LE target Steven Munroe
@ 2017-05-09 17:35 ` Segher Boessenkool
  2017-05-09 20:01   ` Steven Munroe
  2017-05-12 18:39 ` Mike Stump
  1 sibling, 1 reply; 10+ messages in thread
From: Segher Boessenkool @ 2017-05-09 17:35 UTC (permalink / raw)
  To: Steven Munroe; +Cc: gcc-patches, David Edelsohn

Hi!

On Mon, May 08, 2017 at 09:49:57AM -0500, Steven Munroe wrote:
> Thus I would like to restrict this support to PowerPC
> targets that support VMX/VSX and PowerISA-2.07 (power8) and later.

What happens if you run it on an older machine, or as BE or 32-bit,
or with vectors disabled?

> So I propose to submit a series of patches to implement the PowerPC64LE
> equivalent of a useful subset of the x86 intrinsics. The final size and
> usefulness of this effort is to be determined. The proposal is to
> incrementally port intrinsic header files from the ./config/i386 tree to
> the ./config/rs6000 tree. This naturally provides the same header
> structure and intrinsic names which will simplify code porting.

Yeah.

I'd still like to see these headers moved into some subdir (both in
the source tree and in the installed headers tree), to reduce clutter,
but I understand it's not trivial to do.

> To get the ball rolling I include the BMI intrinsics ported to PowerPC
> for review as they are reasonable size (31 intrinsic implementations).

This is okay for trunk.  Thanks!

> --- gcc/config.gcc	(revision 247616)
> +++ gcc/config.gcc	(working copy)
> @@ -444,7 +444,7 @@ nvptx-*-*)
>  	;;
>  powerpc*-*-*)
>  	cpu_type=rs6000
> -	extra_headers="ppc-asm.h altivec.h spe.h ppu_intrinsics.h paired.h
> spu2vmx.h vec_types.h si2vmx.h htmintrin.h htmxlintrin.h"
> +	extra_headers="ppc-asm.h altivec.h spe.h ppu_intrinsics.h paired.h
> spu2vmx.h vec_types.h si2vmx.h htmintrin.h htmxlintrin.h bmi2intrin.h
> bmiintrin.h x86intrin.h"

(Your mail client wrapped this).

Write this on a separate line?  Like
  extra_headers="${extra_headers} htmintrin.h htmxlintrin.h bmi2intrin.h"
(You cannot use += here, pity).


Segher


* Re: [PATCH, rs6000] Add x86 intrinsic headers to GCC PPC64LE target
  2017-05-09 17:35 ` Segher Boessenkool
@ 2017-05-09 20:01   ` Steven Munroe
  2017-05-09 21:03     ` Segher Boessenkool
  0 siblings, 1 reply; 10+ messages in thread
From: Steven Munroe @ 2017-05-09 20:01 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, David Edelsohn

On Tue, 2017-05-09 at 12:23 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, May 08, 2017 at 09:49:57AM -0500, Steven Munroe wrote:
> > Thus I would like to restrict this support to PowerPC
> > targets that support VMX/VSX and PowerISA-2.07 (power8) and later.
> 
> What happens if you run it on an older machine, or as BE or 32-bit,
> or with vectors disabled?
> 
Well I hope that I set the dg-require-effective-target correctly,
because while some of these intrinsics might work on a BE or 32-bit
machine, most will not.

For example; many of the BMI intrinsic implementations depend on 64-bit
instructions and so I use { dg-require-effective-target lp64 }.  The
BMI2 intrinsic _pext exploits the Bit Permute Doubleword instruction.
There is no Bit Permute Word instruction. So for BMI2 I use
{ dg-require-effective-target powerpc_vsx_ok } as bpermd was introduced
in PowerISA 2.06 along with the Vector Scalar Extension facility.

The situation gets more complicated when we start looking at the
SSE/SSE2. These headers define many variants of load and store
intrinsics that have decidedly LE semantics, including many unaligned
forms. While powerpc64le handles this with ease, implementing LE
semantics in BE mode gets seriously tricky. I think it is better to
avoid this and only support these headers for LE.
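
For example, an SSE2 unaligned load could plausibly map to the VSX
unaligned vector load on powerpc64le (a sketch under that assumption,
not part of this patch series yet):

#include <altivec.h>

typedef __vector double __m128d;  /* assumed type for this sketch */

extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_loadu_pd (double const *__P)
{
  /* vec_vsx_ld is the VSX unaligned load; on powerpc64le its byte
     order matches what x86 code expects.  */
  return vec_vsx_ld (0, __P);
}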

And while some SSE intrinsics can be implemented with VMX instructions,
all the SSE2 double float intrinsics require VSX. And some PowerISA 2.07
instructions simplify the implementation if available. As power8 is also
the first supported powerpc64le system, it seems the logical starting
point for most of this work.

I don't plan to spend effort on supporting Intel intrinsic functions on
older PowerPC machines (before power8) or BE.

> > So I propose to submit a series of patches to implement the PowerPC64LE
> > equivalent of a useful subset of the x86 intrinsics. The final size and
> > usefulness of this effort is to be determined. The proposal is to
> > incrementally port intrinsic header files from the ./config/i386 tree to
> > the ./config/rs6000 tree. This naturally provides the same header
> > structure and intrinsic names which will simplify code porting.
> 
> Yeah.
> 
> I'd still like to see these headers moved into some subdir (both in
> the source tree and in the installed headers tree), to reduce clutter,
> but I understand it's not trivial to do.
> 
> > To get the ball rolling I include the BMI intrinsics ported to PowerPC
> > for review as they are reasonable size (31 intrinsic implementations).
> 
> This is okay for trunk.  Thanks!
> 
Thank you

> > --- gcc/config.gcc	(revision 247616)
> > +++ gcc/config.gcc	(working copy)
> > @@ -444,7 +444,7 @@ nvptx-*-*)
> >  	;;
> >  powerpc*-*-*)
> >  	cpu_type=rs6000
> > -	extra_headers="ppc-asm.h altivec.h spe.h ppu_intrinsics.h paired.h
> > spu2vmx.h vec_types.h si2vmx.h htmintrin.h htmxlintrin.h"
> > +	extra_headers="ppc-asm.h altivec.h spe.h ppu_intrinsics.h paired.h
> > spu2vmx.h vec_types.h si2vmx.h htmintrin.h htmxlintrin.h bmi2intrin.h
> > bmiintrin.h x86intrin.h"
> 
> (Your mail client wrapped this).
> 
> Write this on a separate line?  Like
>   extra_headers="${extra_headers} htmintrin.h htmxlintrin.h bmi2intrin.h"
> (You cannot use += here, pity).
> 
> 
> Segher
> 



* Re: [PATCH, rs6000] Add x86 intrinsic headers to GCC PPC64LE target
  2017-05-09 20:01   ` Steven Munroe
@ 2017-05-09 21:03     ` Segher Boessenkool
  2017-05-10 18:05       ` Steven Munroe
  0 siblings, 1 reply; 10+ messages in thread
From: Segher Boessenkool @ 2017-05-09 21:03 UTC (permalink / raw)
  To: Steven Munroe; +Cc: gcc-patches, David Edelsohn

On Tue, May 09, 2017 at 02:33:00PM -0500, Steven Munroe wrote:
> On Tue, 2017-05-09 at 12:23 -0500, Segher Boessenkool wrote:
> > On Mon, May 08, 2017 at 09:49:57AM -0500, Steven Munroe wrote:
> > > Thus I would like to restrict this support to PowerPC
> > > targets that support VMX/VSX and PowerISA-2.07 (power8) and later.
> > 
> > What happens if you run it on an older machine, or as BE or 32-bit,
> > or with vectors disabled?
> > 
> Well I hope that I set the dg-require-effective-target correctly
> because, while some of these intrinsics might work on BE or 32-bit
> machines, most will not.

That is just for the testsuite; I meant what happens if a user tries
to use it with an older target (or BE, or 32-bit)?  Is there a useful,
obvious error message?

> The situation gets more complicated when we start looking at
> SSE/SSE2. These headers define many variants of load and store
> operations with decidedly LE semantics, including many unaligned
> forms. While powerpc64le handles this with ease, implementing LE
> semantics in BE mode gets seriously tricky. I think it is better to
> avoid this and only support these headers for LE.

Right.

> And while some SSE intrinsics can be implemented with VMX
> instructions, all the SSE2 double-float intrinsics require VSX. And
> some PowerISA 2.07 instructions simplify the implementation if
> available. As power8 is also the first supported powerpc64le system,
> it seems the logical starting point for most of this work.

Agreed as well.

> I don't plan to spend effort on supporting Intel intrinsic functions on
> older PowerPC machines (before power8) or BE.

Just make sure if anyone tries anyway, there is a clear error message
that tells them not to.


Segher

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH, rs6000] Add x86 instrinsic headers to GCC PPC64LE taget
  2017-05-09 21:03     ` Segher Boessenkool
@ 2017-05-10 18:05       ` Steven Munroe
  2017-05-11 14:52         ` Segher Boessenkool
  0 siblings, 1 reply; 10+ messages in thread
From: Steven Munroe @ 2017-05-10 18:05 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, David Edelsohn

On Tue, 2017-05-09 at 16:03 -0500, Segher Boessenkool wrote:
> On Tue, May 09, 2017 at 02:33:00PM -0500, Steven Munroe wrote:
> > On Tue, 2017-05-09 at 12:23 -0500, Segher Boessenkool wrote:
> > > On Mon, May 08, 2017 at 09:49:57AM -0500, Steven Munroe wrote:
> > > > Thus I would like to restrict this support to PowerPC
> > > > targets that support VMX/VSX and PowerISA-2.07 (power8) and later.
> > > 
> > > What happens if you run it on an older machine, or as BE or 32-bit,
> > > or with vectors disabled?
> > > 
> > Well I hope that I set the dg-require-effective-target correctly
> > because, while some of these intrinsics might work on BE or 32-bit
> > machines, most will not.
> 
> That is just for the testsuite; I meant what happens if a user tries
> to use it with an older target (or BE, or 32-bit)?  Is there a useful,
> obvious error message?
> 
So looking at the X86 headers, their current practice falls into two
areas.

1) guard 64-bit dependent intrinsic functions with:

#ifdef __x86_64__
#endif

But they do not provide any warnings. I assume that attempting to use an
intrinsic of this class would result in an implicit function declaration
and a link-time failure.

2) guard architecture level dependent intrinsic header content with:

#ifndef __AVX__
#pragma GCC push_options
#pragma GCC target("avx")
#define __DISABLE_AVX__
#endif /* __AVX__ */
...

#ifdef __DISABLE_AVX__
#undef __DISABLE_AVX__
#pragma GCC pop_options
#endif /* __DISABLE_AVX__ */

So they don't make any attempt to prevent anyone from using a specific
header. If the compiler version does not support the "GCC target", I
assume that specific target did not exist in that version.

If GCC does support that target then the '#pragma GCC target("avx")'
will enable code generation, but the user might get a SIGILL if the
hardware they have does not support those instructions.

In the BMI headers I already guard with:

#ifdef  __PPC64__
#endif

This means that like x86_64, attempting to use _pext_u64 on a 32-bit
compiler will result in an implicit function declaration and cause a
linker error.

This is sufficient for most of BMI and BMI2 (registers only / endian
agnostic). But this does not address the larger issues (for SSE/SSE2+)
of needing a VSX implementation or restricting to LE.

So should I check for:

#ifdef __VSX__
#endif

or 

#ifdef __POWER8_VECTOR__

or 

#ifdef _ARCH_PWR8

and perhaps:

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__

as well to enforce this. 

And are you suggesting I add an #else clause with #warning or #error? Or
is the implicit function and link failure sufficient?
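
For example, a combined guard could look like this (illustrative
only; which of the macros above to test is exactly the question):

/* Illustrative guard -- reject unsupported targets with a clear
   message instead of leaving an implicit declaration to fail at
   link time.  */
#if !defined (__PPC64__) || !defined (_ARCH_PWR8) \
    || __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
#error "This header requires 64-bit little-endian POWER8 or later."
#endif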

> > The situation gets more complicated when we start looking at
> > SSE/SSE2. These headers define many variants of load and store
> > operations with decidedly LE semantics, including many unaligned
> > forms. While powerpc64le handles this with ease, implementing LE
> > semantics in BE mode gets seriously tricky. I think it is better to
> > avoid this and only support these headers for LE.
> 
> Right.
> 
> > And while some SSE intrinsics can be implemented with VMX
> > instructions, all the SSE2 double-float intrinsics require VSX. And
> > some PowerISA 2.07 instructions simplify the implementation if
> > available. As power8 is also the first supported powerpc64le system,
> > it seems the logical starting point for most of this work.
> 
> Agreed as well.
> 
> > I don't plan to spend effort on supporting Intel intrinsic functions on
> > older PowerPC machines (before power8) or BE.
> 
> Just make sure if anyone tries anyway, there is a clear error message
> that tells them not to.
> 
> 
> Segher
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH, rs6000] Add x86 instrinsic headers to GCC PPC64LE taget
  2017-05-10 18:05       ` Steven Munroe
@ 2017-05-11 14:52         ` Segher Boessenkool
  2017-05-11 19:27           ` Steven Munroe
  0 siblings, 1 reply; 10+ messages in thread
From: Segher Boessenkool @ 2017-05-11 14:52 UTC (permalink / raw)
  To: Steven Munroe; +Cc: gcc-patches, David Edelsohn

On Wed, May 10, 2017 at 12:59:28PM -0500, Steven Munroe wrote:
> > That is just for the testsuite; I meant what happens if a user tries
> > to use it with an older target (or BE, or 32-bit)?  Is there a useful,
> > obvious error message?
> > 
> So looking at the X86 headers, their current practice falls into two
> areas.
> 
> 1) guard 64-bit dependent intrinsic functions with:
> 
> #ifdef __x86_64__
> #endif
> 
> But they do not provide any warnings. I assume that attempting to use an
> intrinsic of this class would result in an implicit function declaration
> and a link-time failure.

Yeah probably.  Which is fine -- it does not silently do the wrong thing,
and it is easy to find where the problem is.

> If GCC does support that target then the '#pragma GCC target("avx")'
> will enable code generation, but the user might get a SIGILL if the
> hardware they have does not support those instructions.

That is less friendly, but it still does not silently generate bad code.

> In the BMI headers I already guard with:
> 
> #ifdef  __PPC64__
> #endif
> 
> This means that like x86_64, attempting to use _pext_u64 on a 32-bit
> compiler will result in an implicit function declaration and cause a
> linker error.

Yup, that's fine.

> This is sufficient for most of BMI and BMI2 (registers only / endian
> agnostic). But this does not address the larger issues (for SSE/SSE2+)
> of needing a VSX implementation or restricting to LE.

Right.

> So should I check for:
> 
> #ifdef __VSX__
> #endif
> 
> or 
> 
> #ifdef __POWER8_VECTOR__
> 
> or 
> 
> #ifdef _ARCH_PWR8
> 
> and perhaps:
> 
> #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> 
> as well to enforce this. 
> 
> And are you suggesting I add an #else clause with #warning or #error? Or
> is the implicit function and link failure sufficient?

The first is friendlier, the second is sufficient I think.

Maybe it is good enough to check for LE only?  Most unmodified code
written for x86 (using intrinsics etc.) will not work correctly on BE.
And if you restrict to LE you get 64-bit and POWER8 automatically.

So maybe just require LE?


Segher

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH, rs6000] Add x86 instrinsic headers to GCC PPC64LE taget
  2017-05-11 14:52         ` Segher Boessenkool
@ 2017-05-11 19:27           ` Steven Munroe
  2017-05-14  1:57             ` David Edelsohn
  0 siblings, 1 reply; 10+ messages in thread
From: Steven Munroe @ 2017-05-11 19:27 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, David Edelsohn

On Thu, 2017-05-11 at 09:39 -0500, Segher Boessenkool wrote:
> On Wed, May 10, 2017 at 12:59:28PM -0500, Steven Munroe wrote:
> > > That is just for the testsuite; I meant what happens if a user tries
> > > to use it with an older target (or BE, or 32-bit)?  Is there a useful,
> > > obvious error message?
> > > 
> > So looking at the X86 headers, their current practice falls into
> > two areas.
> > 
> > 1) guard 64-bit dependent intrinsic functions with:
> > 
> > #ifdef __x86_64__
> > #endif
> > 
> > But they do not provide any warnings. I assume that attempting to use an
> > intrinsic of this class would result in an implicit function declaration
> > and a link-time failure.
> 
> Yeah probably.  Which is fine -- it does not silently do the wrong thing,
> and it is easy to find where the problem is.
> 
> > If GCC does support that target then the '#pragma GCC target("avx")'
> > will enable code generation, but the user might get a SIGILL if the
> > hardware they have does not support those instructions.
> 
> That is less friendly, but it still does not silently generate bad code.
> 
> > In the BMI headers I already guard with:
> > 
> > #ifdef  __PPC64__
> > #endif
> > 
> > This means that like x86_64, attempting to use _pext_u64 on a 32-bit
> > compiler will result in an implicit function declaration and cause a
> > linker error.
> 
> Yup, that's fine.
> 
> > This is sufficient for most of BMI and BMI2 (registers only / endian
> > agnostic). But this does not address the larger issues (for SSE/SSE2+)
> > of needing a VSX implementation or restricting to LE.
> 
> Right.
> 
> > So should I check for:
> > 
> > #ifdef __VSX__
> > #endif
> > 
> > or 
> > 
> > #ifdef __POWER8_VECTOR__
> > 
> > or 
> > 
> > #ifdef _ARCH_PWR8
> > 
> > and perhaps:
> > 
> > #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> > 
> > as well to enforce this. 
> > 
> > And are you suggesting I add an #else clause with #warning or #error? Or
> > is the implicit function and link failure sufficient?
> 
> The first is friendlier, the second is sufficient I think.
> 
> Maybe it is good enough to check for LE only?  Most unmodified code
> written for x86 (using intrinsics etc.) will not work correctly on BE.
> And if you restrict to LE you get 64-bit and POWER8 automatically.
> 
> So maybe just require LE?
> 
Ok, I will add a "#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__" guard
for the MMX/SSE and later intrinsic headers.
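
That is, each of those headers will be bracketed along these lines (a
sketch of the pattern, not any specific committed header):

#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
/* ... MMX/SSE intrinsic implementations ... */
#endif /* __ORDER_LITTLE_ENDIAN__ */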


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH, rs6000] Add x86 instrinsic headers to GCC PPC64LE taget
  2017-05-08 14:50 [PATCH, rs6000] Add x86 instrinsic headers to GCC PPC64LE taget Steven Munroe
  2017-05-09 17:35 ` Segher Boessenkool
@ 2017-05-12 18:39 ` Mike Stump
  2017-05-12 19:09   ` Steven Munroe
  1 sibling, 1 reply; 10+ messages in thread
From: Mike Stump @ 2017-05-12 18:39 UTC (permalink / raw)
  To: munroesj; +Cc: GCC Patches, Segher Boessenkool, David Edelsohn

On May 8, 2017, at 7:49 AM, Steven Munroe <munroesj@linux.vnet.ibm.com> wrote:
> Of course as part of this process we will port as many of the
> corresponding DejaGnu tests from gcc/testsuite/gcc.target/i386/ to
> gcc/testsuite/gcc.target/powerpc/ as appropriate. So far the dg-do run
> tests only require minor source changes, mostly to the platform specific
> dg-* directives. A few dg-do compile tests are needed to insure we are
> getting the expected folding/Common subexpression elimination (CSE) to
> generate the optimum sequence for PowerPC.

If there is a way to share that seems reasonable and the x86 folks
would like to share...

I'd let you and the x86 folks figure out what is best.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH, rs6000] Add x86 instrinsic headers to GCC PPC64LE taget
  2017-05-12 18:39 ` Mike Stump
@ 2017-05-12 19:09   ` Steven Munroe
  0 siblings, 0 replies; 10+ messages in thread
From: Steven Munroe @ 2017-05-12 19:09 UTC (permalink / raw)
  To: Mike Stump; +Cc: GCC Patches, Segher Boessenkool, David Edelsohn

On Fri, 2017-05-12 at 11:38 -0700, Mike Stump wrote:
> On May 8, 2017, at 7:49 AM, Steven Munroe <munroesj@linux.vnet.ibm.com> wrote:
> > Of course as part of this process we will port as many of the
> > corresponding DejaGnu tests from gcc/testsuite/gcc.target/i386/ to
> > gcc/testsuite/gcc.target/powerpc/ as appropriate. So far the dg-do run
> > tests only require minor source changes, mostly to the platform specific
> > dg-* directives. A few dg-do compile tests are needed to insure we are
> > getting the expected folding/Common subexpression elimination (CSE) to
> > generate the optimum sequence for PowerPC.
> 
> If there is a way to share that seems reasonable and the x86 folks
> would like to share...
> 
> I'd let you and the x86 folks figure out what is best.

It is too early to tell, but I have no objections to discussing
options.

Are you looking to share source files? This seems of low value because
the files tend to be small and the only difference is the dg-*
directives. I don't know enough about the DejaGnu macros to even guess
at what this might entail.

So far the sharing is mostly one way (./i386/ -> ./powerpc/), but if I
find cases that require a new dg test that might also apply to
./i386/, I'd be willing to share that with x86.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH, rs6000] Add x86 instrinsic headers to GCC PPC64LE taget
  2017-05-11 19:27           ` Steven Munroe
@ 2017-05-14  1:57             ` David Edelsohn
  0 siblings, 0 replies; 10+ messages in thread
From: David Edelsohn @ 2017-05-14  1:57 UTC (permalink / raw)
  To: Steve Munroe; +Cc: Segher Boessenkool, GCC Patches

On Thu, May 11, 2017 at 12:22 PM, Steven Munroe
<munroesj@linux.vnet.ibm.com> wrote:
> On Thu, 2017-05-11 at 09:39 -0500, Segher Boessenkool wrote:
>> On Wed, May 10, 2017 at 12:59:28PM -0500, Steven Munroe wrote:
>> > > That is just for the testsuite; I meant what happens if a user tries
>> > > to use it with an older target (or BE, or 32-bit)?  Is there a useful,
>> > > obvious error message?
>> > >
>> > So looking at the X86 headers, their current practice falls into
>> > two areas.
>> >
>> > 1) guard 64-bit dependent intrinsic functions with:
>> >
>> > #ifdef __x86_64__
>> > #endif
>> >
>> > But they do not provide any warnings. I assume that attempting to use an
>> > intrinsic of this class would result in an implicit function declaration
>> > and a link-time failure.
>>
>> Yeah probably.  Which is fine -- it does not silently do the wrong thing,
>> and it is easy to find where the problem is.
>>
>> > If GCC does support that target then the '#pragma GCC target("avx")'
>> > will enable code generation, but the user might get a SIGILL if the
>> > hardware they have does not support those instructions.
>>
>> That is less friendly, but it still does not silently generate bad code.
>>
>> > In the BMI headers I already guard with:
>> >
>> > #ifdef  __PPC64__
>> > #endif
>> >
>> > This means that like x86_64, attempting to use _pext_u64 on a 32-bit
>> > compiler will result in an implicit function declaration and cause a
>> > linker error.
>>
>> Yup, that's fine.
>>
>> > This is sufficient for most of BMI and BMI2 (registers only / endian
>> > agnostic). But this does not address the larger issues (for SSE/SSE2+)
>> > of needing a VSX implementation or restricting to LE.
>>
>> Right.
>>
>> > So should I check for:
>> >
>> > #ifdef __VSX__
>> > #endif
>> >
>> > or
>> >
>> > #ifdef __POWER8_VECTOR__
>> >
>> > or
>> >
>> > #ifdef _ARCH_PWR8
>> >
>> > and perhaps:
>> >
>> > #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
>> >
>> > as well to enforce this.
>> >
>> > And are you suggesting I add an #else clause with #warning or #error? Or
>> > is the implicit function and link failure sufficient?
>>
>> The first is friendlier, the second is sufficient I think.
>>
>> Maybe it is good enough to check for LE only?  Most unmodified code
>> written for x86 (using intrinsics etc.) will not work correctly on BE.
>> And if you restrict to LE you get 64-bit and POWER8 automatically.
>>
>> So maybe just require LE?
>>
> Ok, I will add a "#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__" guard
> for the MMX/SSE and later intrinsic headers.

Steve,

All of the testcases for the new functionality are failing on AIX.

The testcases should use a target of lp64, not explicitly use the
-m64 option.

I don't know if it's useful to run the tests on AIX, but they
definitely should use lp64 and not the explicit option.
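
For example, something along these lines (illustrative directives,
not a specific test in the tree):

/* { dg-do run { target { powerpc*-*-* && lp64 } } } */
/* { dg-require-effective-target powerpc_vsx_ok } */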

Thanks, David

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-05-14  1:09 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-08 14:50 [PATCH, rs6000] Add x86 instrinsic headers to GCC PPC64LE taget Steven Munroe
2017-05-09 17:35 ` Segher Boessenkool
2017-05-09 20:01   ` Steven Munroe
2017-05-09 21:03     ` Segher Boessenkool
2017-05-10 18:05       ` Steven Munroe
2017-05-11 14:52         ` Segher Boessenkool
2017-05-11 19:27           ` Steven Munroe
2017-05-14  1:57             ` David Edelsohn
2017-05-12 18:39 ` Mike Stump
2017-05-12 19:09   ` Steven Munroe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).