[PATCH][Aarch64] Add vectorized mersenne twister

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH][Aarch64] Add vectorized mersenne twister
@ 2017-06-01 23:48 Michael Collison
  0 siblings, 0 replies; 9+ messages in thread
From: Michael Collison @ 2017-06-01 23:48 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd

[-- Attachment #1: Type: text/plain, Size: 1389 bytes --]

This patch adds an vectorized implementation of the mersenne twister random number generator. This implementation is approximately 2.6 times faster than the non-vectorized implementation.

This implementation includes "arm_neon.h" when including the optimized <ext/random>.  This has the effect of polluting the global namespace with the Neon intrinsics, so user macros and functions could potentially clash with them.  Is this acceptable given this only happens when <ext/random> is explicitly included? Comments and input are welcome.

Sample code to use the new generator would look like this:

#include <random>
#include <ext/random>
#include <iostream>

int
main()
{
  __gnu_cxx::sfmt19937 mt(1729);

  std::uniform_int_distribution<int> dist(0,1008);

  for (int i = 0; i < 16; ++i)
    {
      std::cout << dist(mt) << " ";
    }
}



2017-06-01  Michael Collison  <michael.collison@arm.com>

	Add optimized implementation of mersenne twister for aarch64
	* config/cpu/aarch64/opt/ext/opt_random.h: New file.
	(__arch64_recursion): new function.
	(operator==): New function.
	(simd_fast_mersenne_twister_engine): New template class.
	* config/cpu/aarch64/opt/bits/opt_random.h: New file.
	* include/ext/random (add include for arm_neon.h):
	(simd_fast_mersenne_twister_engine): add _M_state private
	array for ARM_NEON conditional compilation.


[-- Attachment #2: gnutools-4218-v10.patch --]
[-- Type: application/octet-stream, Size: 8163 bytes --]

diff --git a/libstdc++-v3/config/cpu/aarch64/opt/bits/opt_random.h b/libstdc++-v3/config/cpu/aarch64/opt/bits/opt_random.h
new file mode 100644
index 0000000..7a49f3c
--- /dev/null
+++ b/libstdc++-v3/config/cpu/aarch64/opt/bits/opt_random.h
@@ -0,0 +1,52 @@
+// Optimizations for random number functions, aarch64 version -*- C++ -*-
+
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+/** @file bits/opt_random.h
+ *  This is an internal header file, included by other library headers.
+ *  Do not attempt to use it directly. @headername{random}
+ */
+
+#ifndef _BITS_OPT_RANDOM_H
+#define _BITS_OPT_RANDOM_H 1
+
+#ifdef __ARM_NEON
+#include "arm_neon.h"
+#endif
+
+
+#pragma GCC system_header
+
+
+namespace std _GLIBCXX_VISIBILITY (default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+
+
+
+_GLIBCXX_END_NAMESPACE_VERSION
+} // namespace
+
+
+#endif // _BITS_OPT_RANDOM_H
diff --git a/libstdc++-v3/config/cpu/aarch64/opt/ext/opt_random.h b/libstdc++-v3/config/cpu/aarch64/opt/ext/opt_random.h
new file mode 100644
index 0000000..5099c85
--- /dev/null
+++ b/libstdc++-v3/config/cpu/aarch64/opt/ext/opt_random.h
@@ -0,0 +1,160 @@
+// Optimizations for random number extensions, aarch64 version -*- C++ -*-
+
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+/** @file ext/random.tcc
+ *  This is an internal header file, included by other library headers.
+ *  Do not attempt to use it directly. @headername{ext/random}
+ */
+
+#ifndef _EXT_OPT_RANDOM_H
+#define _EXT_OPT_RANDOM_H 1
+
+#pragma GCC system_header
+
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+
+#ifdef __ARM_NEON
+
+#include "arm_neon.h"
+
+namespace __gnu_cxx _GLIBCXX_VISIBILITY (default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+
+  namespace {
+    template<size_t __sl1, size_t __sl2, size_t __sr1, size_t __sr2>
+      inline int32x4_t __aarch64_recursion (int32x4_t __a, int32x4_t __b,
+					    int32x4_t __c, int32x4_t __d,
+					    int32x4_t __e, int8x16_t __zero)
+      {
+	int32x4_t __y = (int32x4_t) vshrq_n_u32 ((uint32x4_t) __b, __sr1);
+	int32x4_t __z = (int32x4_t) vextq_s8 ((int8x16_t)__c, __zero, __sr2);
+	int32x4_t __v = (int32x4_t) vshlq_n_s32 (__d,__sl1);
+
+	__z = veorq_s32 (__z, __a);
+	__z = veorq_s32 (__z, __v);
+
+	int32x4_t __x = (int32x4_t) vextq_s8 (__zero,
+					      (int8x16_t) __a, 16 - __sl2);
+	__y = (int32x4_t) vandq_s32 (__y, __e);
+	__z = veorq_s32 (__z, __x);
+	return veorq_s32 (__z, __y);
+      }
+
+  }
+
+
+#define _GLIBCXX_OPT_HAVE_RANDOM_SFMT_GEN_READ	1
+  template<typename _UIntType, size_t __m,
+	   size_t __pos1, size_t __sl1, size_t __sl2,
+	   size_t __sr1, size_t __sr2,
+	   uint32_t __msk1, uint32_t __msk2,
+	   uint32_t __msk3, uint32_t __msk4,
+	   uint32_t __parity1, uint32_t __parity2,
+	   uint32_t __parity3, uint32_t __parity4>
+    void simd_fast_mersenne_twister_engine<_UIntType, __m,
+					   __pos1, __sl1, __sl2, __sr1, __sr2,
+					   __msk1, __msk2, __msk3, __msk4,
+					   __parity1, __parity2, __parity3,
+					   __parity4>::
+    _M_gen_rand (void)
+    {
+      int32x4_t __r1 = vld1q_s32 ((int32_t *)&_M_state[_M_nstate - 2]);
+      int32x4_t __r2 = vld1q_s32 ((int32_t *)&_M_state[_M_nstate - 1]);
+
+      int32_t __attribute__ ((aligned (16))) data[4] = {__msk1, __msk2,
+						      __msk3, __msk4};
+      int32x4_t __aData = vld1q_s32 (data);
+      const int8x16_t zero = (int8x16_t) vdupq_n_s8 (0);
+
+      size_t __i;
+      for (__i = 0; __i < _M_nstate - __pos1; ++__i)
+	{
+	  int32x4_t __r = __aarch64_recursion<__sl1, __sl2, __sr1, __sr2>
+	    (_M_state[__i], _M_state[__i + __pos1], __r1, __r2, __aData, zero);
+
+	  vst1q_s32 ((int32_t*) &_M_state[__i], __r);
+	  __r1 = __r2;
+	  __r2 = __r;
+	}
+      for (; __i < _M_nstate; ++__i)
+	{
+	  int32x4_t __r = __aarch64_recursion<__sl1, __sl2, __sr1, __sr2>
+	    (_M_state[__i], _M_state[__i + __pos1 - _M_nstate], __r1, __r2,
+	     __aData, zero);
+
+	  vst1q_s32 ((int32_t*)&_M_state[__i], __r);
+	  __r1 = __r2;
+	  __r2 = __r;
+	}
+
+      _M_pos = 0;
+    }
+
+
+#define _GLIBCXX_OPT_HAVE_RANDOM_SFMT_OPERATOREQUAL	1
+  template<typename _UIntType, size_t __m,
+	   size_t __pos1, size_t __sl1, size_t __sl2,
+	   size_t __sr1, size_t __sr2,
+	   uint32_t __msk1, uint32_t __msk2,
+	   uint32_t __msk3, uint32_t __msk4,
+	   uint32_t __parity1, uint32_t __parity2,
+	   uint32_t __parity3, uint32_t __parity4>
+    bool
+    operator==(const __gnu_cxx::simd_fast_mersenne_twister_engine<_UIntType,
+	       __m, __pos1, __sl1, __sl2, __sr1, __sr2,
+	       __msk1, __msk2, __msk3, __msk4,
+	       __parity1, __parity2, __parity3, __parity4>& __lhs,
+	       const __gnu_cxx::simd_fast_mersenne_twister_engine<_UIntType,
+	       __m, __pos1, __sl1, __sl2, __sr1, __sr2,
+	       __msk1, __msk2, __msk3, __msk4,
+	       __parity1, __parity2, __parity3, __parity4>& __rhs)
+    {
+      if (__lhs._M_pos != __rhs._M_pos)
+	return false;
+
+      uint8x16_t __res = veorq_u8 ((uint8x16_t) __lhs._M_state[0],
+				   (uint8x16_t) __rhs._M_state[0]);
+
+      int32x4_t __compare_res;
+      for (size_t __i = 1; __i < __lhs._M_nstate; ++__i)
+	{
+	  __compare_res = (int32x4_t) veorq_u8 ((uint8x16_t) __lhs._M_state[__i],
+					      (uint8x16_t) __rhs._M_state[__i]);
+
+	  __res = (uint8x16_t) vorrq_s32 ((int32x4_t) __res, __compare_res);
+	}
+
+      return ((__int128) __res == 0);
+    }
+
+
+_GLIBCXX_END_NAMESPACE_VERSION
+} // namespace
+
+#endif // __ARM_NEON
+
+#endif // __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+
+#endif // _EXT_OPT_RANDOM_H
diff --git a/libstdc++-v3/include/ext/random b/libstdc++-v3/include/ext/random
index 167c560..b8ebe2c 100644
--- a/libstdc++-v3/include/ext/random
+++ b/libstdc++-v3/include/ext/random
@@ -42,6 +42,9 @@
 #ifdef __SSE2__
 # include <emmintrin.h>
 #endif
+#ifdef __ARM_NEON
+#include "arm_neon.h"
+#endif
 
 #if defined(_GLIBCXX_USE_C99_STDINT_TR1) && defined(UINT32_C)
 
@@ -184,6 +187,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #ifdef __SSE2__
 	__m128i _M_state[_M_nstate];
 #endif
+#ifdef __ARM_NEON
+	int32x4_t _M_state[_M_nstate];
+#endif
 	uint32_t _M_state32[_M_nstate32];
 	result_type _M_stateT[state_size];
       } __attribute__ ((__aligned__ (16)));
-- 
1.9.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH][Aarch64] Add vectorized mersenne twister
  2017-06-06 11:41   ` Charles Baylis
@ 2017-06-06 11:51     ` Jonathan Wakely
  0 siblings, 0 replies; 9+ messages in thread
From: Jonathan Wakely @ 2017-06-06 11:51 UTC (permalink / raw)
  To: Charles Baylis
  Cc: James Greenhalgh, Ulrich Drepper, Michael Collison, libstdc++,
	nd, GCC Patches, Richard Earnshaw, Ramana Radhakrishnan,
	kyrtka01, nickc

On 06/06/17 12:41 +0100, Charles Baylis wrote:
>On 6 June 2017 at 11:07, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>
>> If we don't mind that, and instead take a GCC-centric view, we could avoid
>> namespace polution by using the GCC-internal names for the intrinsics
>> (__builtin_aarch64_...). As those names don't form a guaranteed interface,
>> that would tie us to a GCC version.
>>
>> So we have a few solutions to choose from, each of which invokes a trade-off:
>>
>>   1 Use the current names and pollute the namespace.
>>   2 Use the GCC internal __builtin_aarch64* names and tie libstdc++ to GCC
>>     internals.
>>   3 Define a new set of namespace-clean names and complicate the Neon
>>     intrinsic interface while we migrate old users to new names.
>>
>> I can see why the libstdc++ community would prefer 3) over the other options,
>> but I'm reticent to take that route as the cost to our intrinsic maintainers
>> and users looks high. I've added some of the other ARM port maintainers
>> for their opinion.
>>
>> Are there any other options I'm missing?
>
>If solving for C++ only is OK, then it might be feasible to do something like:
>
>namespace __arm_neon_for_ext_random {
>#include <arm_neon_internal.h>   // like arm_neon.h, but without
>include guards [*]
>}
>
>Then the libstdc++ headers can use "using namespace
>__arm_neon_for_ext_random" inside the functions which use NEON
>intrinsics.
>
>[*] without include guards so that other header files can use the same
>trick in their own namespace.
>
>I'm not sure if this will work for all host compilers, with GCC I
>think it's OK because the intrinsics are implemented as inline
>functions, rather than #defines, but not all compilers have taken that
>approach.

It would be a bit better, but this valid (albeit silly) program would
be rejected:

#define int32x4_t ++/!!!POISONED!!!/++
#include <ext/random>
int main() { }

Users are allowed to use names like int32x4_t for their own macros,
because that's not a name reserved for the implementation.

ARM developers are probably aware of those names and would avoid them,
but developers writing portable code that might get used on ARM are
now also unable to use those names if <ext/random> might use them.

Of course, using names that aren't ALL_CAPS for macros is crazy, but
the standard library has to support craziness as long as it doesn't
violate the standard. That's why including <arm_neon.h> in libstdc++
headers is problematic. I would be prepared to live with it for a
non-standard extension header, but Ulrich makes a good point about
this code maybe going into namespace std one day.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH][Aarch64] Add vectorized mersenne twister
  2017-06-06 10:08 ` James Greenhalgh
  2017-06-06 10:23   ` Jonathan Wakely
  2017-06-06 11:25   ` Ulrich Drepper
@ 2017-06-06 11:41   ` Charles Baylis
  2017-06-06 11:51     ` Jonathan Wakely
  2 siblings, 1 reply; 9+ messages in thread
From: Charles Baylis @ 2017-06-06 11:41 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: Jonathan Wakely, Ulrich Drepper, Michael Collison, libstdc++,
	nd, GCC Patches, Richard Earnshaw, Ramana Radhakrishnan,
	kyrtka01, nickc

On 6 June 2017 at 11:07, James Greenhalgh <james.greenhalgh@arm.com> wrote:

> If we don't mind that, and instead take a GCC-centric view, we could avoid
> namespace polution by using the GCC-internal names for the intrinsics
> (__builtin_aarch64_...). As those names don't form a guaranteed interface,
> that would tie us to a GCC version.
>
> So we have a few solutions to choose from, each of which invokes a trade-off:
>
>   1 Use the current names and pollute the namespace.
>   2 Use the GCC internal __builtin_aarch64* names and tie libstdc++ to GCC
>     internals.
>   3 Define a new set of namespace-clean names and complicate the Neon
>     intrinsic interface while we migrate old users to new names.
>
> I can see why the libstdc++ community would prefer 3) over the other options,
> but I'm reticent to take that route as the cost to our intrinsic maintainers
> and users looks high. I've added some of the other ARM port maintainers
> for their opinion.
>
> Are there any other options I'm missing?

If solving for C++ only is OK, then it might be feasible to do something like:

namespace __arm_neon_for_ext_random {
#include <arm_neon_internal.h>   // like arm_neon.h, but without
include guards [*]
}

Then the libstdc++ headers can use "using namespace
__arm_neon_for_ext_random" inside the functions which use NEON
intrinsics.

[*] without include guards so that other header files can use the same
trick in their own namespace.

I'm not sure if this will work for all host compilers, with GCC I
think it's OK because the intrinsics are implemented as inline
functions, rather than #defines, but not all compilers have taken that
approach.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH][Aarch64] Add vectorized mersenne twister
  2017-06-06 11:33       ` James Greenhalgh
@ 2017-06-06 11:40         ` Jonathan Wakely
  0 siblings, 0 replies; 9+ messages in thread
From: Jonathan Wakely @ 2017-06-06 11:40 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: Ulrich Drepper, Michael Collison, libstdc++,
	nd, gcc-patches, richard.earnshaw, ramana.radhakrishnan,
	kyrtka01, nickc

On 06/06/17 12:33 +0100, James Greenhalgh wrote:
>On Tue, Jun 06, 2017 at 11:47:45AM +0100, Jonathan Wakely wrote:
>> On 06/06/17 11:23 +0100, Jonathan Wakely wrote:
>> >On 06/06/17 11:07 +0100, James Greenhalgh wrote:
>> >>On Fri, Jun 02, 2017 at 07:13:17PM +0100, Jonathan Wakely wrote:
>> >>>On 02/06/17 19:19 +0200, Ulrich Drepper wrote:
>> >>>>On Fri, Jun 2, 2017 at 5:46 PM, Michael Collison
>> >>>><Michael.Collison@arm.com> wrote:
>> >>>>>This implementation includes "arm_neon.h" when including the optimized <ext/random>.  This has the effect of polluting the global namespace with the Neon intrinsics, so user macros and functions could potentially clash with them.  Is this acceptable given this only happens when <ext/random> is explicitly included?
>> >>>>
>> >>>>I don't think it is.  Ideally the sfmt classes would get added to the
>> >>>>standard some day; they are much better performance-wise than the mt
>> >>>>classes.  At that point you'd have no excuse anymore.
>> >>>>
>> >>>>Why are the symbols in arm_neon.h not prefixed with an underscore as
>> >>>>it is the case for equivalent definitions of other platforms (notably
>> >>>>x86)?  If you start adding such optimizations I suggest you create
>> >>>>headers which do not pollute the namespace and include it in
>> >>>>arm_neon.h and whatever other header is needed to define the symbols
>> >>>>needed for compatibility.
>> >>>>
>> >>>>Nip the problem in the butt now, this is probably just the beginning.
>> >>
>> >>We're a good number of years late to do that without causing some pain.
>> >>
>> >>The arm_neon.h names for AArch64 come from those for the ARM port, which
>> >>themselves started out in the mid 2000s in ARM's proprietary compiler. These
>> >>names are now all over the place; from LLVM, through popular proprietary
>> >>toolchains, through a variety of developer guides and codebases, and in
>> >>to GCC.
>> >>
>> >>Forging a new interface is possible, but expecting migration to the
>> >>namespace-safe names seems unlikely given the wide spread of the current
>> >>names. Essentially, we'd be doubling our maintenance, and asking all
>> >>other compilers to follow. It is possible, but we'd not want to do it
>> >>lightly.
>> >>
>> >>In particular, my immediate worry is needing other compilers to support the
>> >>new names if they are to use libstdc++.
>> >>
>> >>If we don't mind that, and instead take a GCC-centric view, we could avoid
>> >>namespace polution by using the GCC-internal names for the intrinsics
>> >>(__builtin_aarch64_...). As those names don't form a guaranteed interface,
>> >>that would tie us to a GCC version.
>> >
>> >Libstdc++ is already tied to a GCC version when used with GCC. Using
>> >the internal names might cause problems when using libstdc++ with
>> >non-GNU compilers if they don't use the same names (it's used with at
>> >least Clang and ICC).
>> >
>> >>So we have a few solutions to choose from, each of which invokes a trade-off:
>> >>
>> >>1 Use the current names and pollute the namespace.
>> >>2 Use the GCC internal __builtin_aarch64* names and tie libstdc++ to GCC
>> >>  internals.
>> >>3 Define a new set of namespace-clean names and complicate the Neon
>> >>  intrinsic interface while we migrate old users to new names.
>> >>
>> >>I can see why the libstdc++ community would prefer 3) over the other options,
>> >>but I'm reticent to take that route as the cost to our intrinsic maintainers
>> >>and users looks high. I've added some of the other ARM port maintainers
>> >>for their opinion.
>> >>
>> >>Are there any other options I'm missing?
>> >
>> >3b) Define a new subset of names for the parts needed by this patch,
>> >  but don't expect users to migrate to them.
>> >
>> >That probably doesn't have any advantage over 2.
>> >
>>
>>
>> I suppose <ext/random> could do something like:
>>
>> #if __GNUC__ >= 8  // This is GCC not Clang pretending to be GCC 4.2
>> namespace __detail {
>>  using __int32x4_t = __Int32x4_t;
>>  using __int8x16_t = __Int8x16_t;
>> }
>> #else
>> #include "arm_neon.h"
>> namespace __detail {
>>  using __int32x4_t = int32x4_t;
>>  using __int8x16_t = int8x16_t;
>> }
>> #endif
>>
>> But we'd need to redefine or wrap the operations like veorq_u8 which
>> would be a pain.
>
>Thanks for the feedback. We've got the clear message that this approach
>is unacceptable, the suggestions for other possible ways forward are very
>helpful.
>
>I think we got some of the way towards this when we were discussing this
>internally. For many of the intrinsics needed by this patch we can get
>away with using the GCC vector syntax (a ^ b for veorq_u8) and avoid
>relying on the header at all. I think we ought to go back to that idea and
>see how far we can take it, and which names we really do need to agree on
>or mask in this way.
>
>I presume using the GCC vector extensions (which are certainly supported
>by Clang, I don't know about ICC) will be OK?

Yes, I had a look in the Clang arm_neon.h header and it uses the same
vector extensions. If all the operations needed for Michael's patch
can be done using those instead of the named functions I'd be happy
with that. The handful of typedefs needed can be defined in some
libstdc++ header, rather than by including in arm_neon.h

I assume we don't actually need to worry about ICC, because this is
ARM-only code.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH][Aarch64] Add vectorized mersenne twister
  2017-06-06 10:47     ` Jonathan Wakely
@ 2017-06-06 11:33       ` James Greenhalgh
  2017-06-06 11:40         ` Jonathan Wakely
  0 siblings, 1 reply; 9+ messages in thread
From: James Greenhalgh @ 2017-06-06 11:33 UTC (permalink / raw)
  To: Jonathan Wakely
  Cc: Ulrich Drepper, Michael Collison, libstdc++,
	nd, gcc-patches, richard.earnshaw, ramana.radhakrishnan,
	kyrtka01, nickc

On Tue, Jun 06, 2017 at 11:47:45AM +0100, Jonathan Wakely wrote:
> On 06/06/17 11:23 +0100, Jonathan Wakely wrote:
> >On 06/06/17 11:07 +0100, James Greenhalgh wrote:
> >>On Fri, Jun 02, 2017 at 07:13:17PM +0100, Jonathan Wakely wrote:
> >>>On 02/06/17 19:19 +0200, Ulrich Drepper wrote:
> >>>>On Fri, Jun 2, 2017 at 5:46 PM, Michael Collison
> >>>><Michael.Collison@arm.com> wrote:
> >>>>>This implementation includes "arm_neon.h" when including the optimized <ext/random>.  This has the effect of polluting the global namespace with the Neon intrinsics, so user macros and functions could potentially clash with them.  Is this acceptable given this only happens when <ext/random> is explicitly included?
> >>>>
> >>>>I don't think it is.  Ideally the sfmt classes would get added to the
> >>>>standard some day; they are much better performance-wise than the mt
> >>>>classes.  At that point you'd have no excuse anymore.
> >>>>
> >>>>Why are the symbols in arm_neon.h not prefixed with an underscore as
> >>>>it is the case for equivalent definitions of other platforms (notably
> >>>>x86)?  If you start adding such optimizations I suggest you create
> >>>>headers which do not pollute the namespace and include it in
> >>>>arm_neon.h and whatever other header is needed to define the symbols
> >>>>needed for compatibility.
> >>>>
> >>>>Nip the problem in the butt now, this is probably just the beginning.
> >>
> >>We're a good number of years late to do that without causing some pain.
> >>
> >>The arm_neon.h names for AArch64 come from those for the ARM port, which
> >>themselves started out in the mid 2000s in ARM's proprietary compiler. These
> >>names are now all over the place; from LLVM, through popular proprietary
> >>toolchains, through a variety of developer guides and codebases, and in
> >>to GCC.
> >>
> >>Forging a new interface is possible, but expecting migration to the
> >>namespace-safe names seems unlikely given the wide spread of the current
> >>names. Essentially, we'd be doubling our maintenance, and asking all
> >>other compilers to follow. It is possible, but we'd not want to do it
> >>lightly.
> >>
> >>In particular, my immediate worry is needing other compilers to support the
> >>new names if they are to use libstdc++.
> >>
> >>If we don't mind that, and instead take a GCC-centric view, we could avoid
> >>namespace polution by using the GCC-internal names for the intrinsics
> >>(__builtin_aarch64_...). As those names don't form a guaranteed interface,
> >>that would tie us to a GCC version.
> >
> >Libstdc++ is already tied to a GCC version when used with GCC. Using
> >the internal names might cause problems when using libstdc++ with
> >non-GNU compilers if they don't use the same names (it's used with at
> >least Clang and ICC).
> >
> >>So we have a few solutions to choose from, each of which invokes a trade-off:
> >>
> >>1 Use the current names and pollute the namespace.
> >>2 Use the GCC internal __builtin_aarch64* names and tie libstdc++ to GCC
> >>  internals.
> >>3 Define a new set of namespace-clean names and complicate the Neon
> >>  intrinsic interface while we migrate old users to new names.
> >>
> >>I can see why the libstdc++ community would prefer 3) over the other options,
> >>but I'm reticent to take that route as the cost to our intrinsic maintainers
> >>and users looks high. I've added some of the other ARM port maintainers
> >>for their opinion.
> >>
> >>Are there any other options I'm missing?
> >
> >3b) Define a new subset of names for the parts needed by this patch,
> >  but don't expect users to migrate to them.
> >
> >That probably doesn't have any advantage over 2.
> >
> 
> 
> I suppose <ext/random> could do something like:
> 
> #if __GNUC__ >= 8  // This is GCC not Clang pretending to be GCC 4.2
> namespace __detail {
>  using __int32x4_t = __Int32x4_t;
>  using __int8x16_t = __Int8x16_t;
> }
> #else
> #include "arm_neon.h"
> namespace __detail {
>  using __int32x4_t = int32x4_t;
>  using __int8x16_t = int8x16_t;
> }
> #endif
> 
> But we'd need to redefine or wrap the operations like veorq_u8 which
> would be a pain.

Thanks for the feedback. We've got the clear message that this approach
is unacceptable, the suggestions for other possible ways forward are very
helpful.

I think we got some of the way towards this when we were discussing this
internally. For many of the intrinsics needed by this patch we can get
away with using the GCC vector syntax (a ^ b for veorq_u8) and avoid
relying on the header at all. I think we ought to go back to that idea and
see how far we can take it, and which names we really do need to agree on
or mask in this way.

I presume using the GCC vector extensions (which are certainly supported
by Clang, I don't know about ICC) will be OK?

Cheers,
James

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH][Aarch64] Add vectorized mersenne twister
  2017-06-06 10:08 ` James Greenhalgh
  2017-06-06 10:23   ` Jonathan Wakely
@ 2017-06-06 11:25   ` Ulrich Drepper
  2017-06-06 11:41   ` Charles Baylis
  2 siblings, 0 replies; 9+ messages in thread
From: Ulrich Drepper @ 2017-06-06 11:25 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: Jonathan Wakely, Michael Collison, libstdc++,
	nd, GCC Patches, richard.earnshaw, ramana.radhakrishnan,
	kyrtka01, Nick Clifton

On Tue, Jun 6, 2017 at 12:07 PM, James Greenhalgh
<james.greenhalgh@arm.com> wrote:
> We're a good number of years late to do that without causing some pain.

Well, it's pain for those who deserve it.  Who thought it to be a
smart idea to pollute the global namespace?

It's a one-time deal.


> So we have a few solutions to choose from, each of which invokes a trade-off:
>
>   1 Use the current names and pollute the namespace.

IMO unacceptable.


>   2 Use the GCC internal __builtin_aarch64* names and tie libstdc++ to GCC
>     internals.

Maybe.


>   3 Define a new set of namespace-clean names and complicate the Neon
>     intrinsic interface while we migrate old users to new names.

See Jonathan's proposal.  I never suggested that those who don't care
about namespace pollution would have to change their code.  Add
appropriate aliases.

There is perhaps number 4:

- use the x86-64 intrinsics which you map to aarch64 intrinsics.
Isn't this compatibility layer planned anyway?  I don't know whether
everything maps 1-to-1 and you don't lose performance but you could
this way use the arch-specific code I wrote a long time ago for
x86-64.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH][Aarch64] Add vectorized mersenne twister
  2017-06-06 10:23   ` Jonathan Wakely
@ 2017-06-06 10:47     ` Jonathan Wakely
  2017-06-06 11:33       ` James Greenhalgh
  0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Wakely @ 2017-06-06 10:47 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: Ulrich Drepper, Michael Collison, libstdc++,
	nd, gcc-patches, richard.earnshaw, ramana.radhakrishnan,
	kyrtka01, nickc

On 06/06/17 11:23 +0100, Jonathan Wakely wrote:
>On 06/06/17 11:07 +0100, James Greenhalgh wrote:
>>On Fri, Jun 02, 2017 at 07:13:17PM +0100, Jonathan Wakely wrote:
>>>On 02/06/17 19:19 +0200, Ulrich Drepper wrote:
>>>>On Fri, Jun 2, 2017 at 5:46 PM, Michael Collison
>>>><Michael.Collison@arm.com> wrote:
>>>>>This implementation includes "arm_neon.h" when including the optimized <ext/random>.  This has the effect of polluting the global namespace with the Neon intrinsics, so user macros and functions could potentially clash with them.  Is this acceptable given this only happens when <ext/random> is explicitly included?
>>>>
>>>>I don't think it is.  Ideally the sfmt classes would get added to the
>>>>standard some day; they are much better performance-wise than the mt
>>>>classes.  At that point you'd have no excuse anymore.
>>>>
>>>>Why are the symbols in arm_neon.h not prefixed with an underscore as
>>>>it is the case for equivalent definitions of other platforms (notably
>>>>x86)?  If you start adding such optimizations I suggest you create
>>>>headers which do not pollute the namespace and include it in
>>>>arm_neon.h and whatever other header is needed to define the symbols
>>>>needed for compatibility.
>>>>
>>>>Nip the problem in the butt now, this is probably just the beginning.
>>
>>We're a good number of years late to do that without causing some pain.
>>
>>The arm_neon.h names for AArch64 come from those for the ARM port, which
>>themselves started out in the mid 2000s in ARM's proprietary compiler. These
>>names are now all over the place; from LLVM, through popular proprietary
>>toolchains, through a variety of developer guides and codebases, and in
>>to GCC.
>>
>>Forging a new interface is possible, but expecting migration to the
>>namespace-safe names seems unlikely given the wide spread of the current
>>names. Essentially, we'd be doubling our maintenance, and asking all
>>other compilers to follow. It is possible, but we'd not want to do it
>>lightly.
>>
>>In particular, my immediate worry is needing other compilers to support the
>>new names if they are to use libstdc++.
>>
>>If we don't mind that, and instead take a GCC-centric view, we could avoid
>>namespace polution by using the GCC-internal names for the intrinsics
>>(__builtin_aarch64_...). As those names don't form a guaranteed interface,
>>that would tie us to a GCC version.
>
>Libstdc++ is already tied to a GCC version when used with GCC. Using
>the internal names might cause problems when using libstdc++ with
>non-GNU compilers if they don't use the same names (it's used with at
>least Clang and ICC).
>
>>So we have a few solutions to choose from, each of which invokes a trade-off:
>>
>> 1 Use the current names and pollute the namespace.
>> 2 Use the GCC internal __builtin_aarch64* names and tie libstdc++ to GCC
>>   internals.
>> 3 Define a new set of namespace-clean names and complicate the Neon
>>   intrinsic interface while we migrate old users to new names.
>>
>>I can see why the libstdc++ community would prefer 3) over the other options,
>>but I'm reticent to take that route as the cost to our intrinsic maintainers
>>and users looks high. I've added some of the other ARM port maintainers
>>for their opinion.
>>
>>Are there any other options I'm missing?
>
>3b) Define a new subset of names for the parts needed by this patch,
>   but don't expect users to migrate to them.
>
>That probably doesn't have any advantage over 2.
>


I suppose <ext/random> could do something like:

#if __GNUC__ >= 8  // This is GCC not Clang pretending to be GCC 4.2
namespace __detail {
  using __int32x4_t = __Int32x4_t;
  using __int8x16_t = __Int8x16_t;
}
#else
#include "arm_neon.h"
namespace __detail {
  using __int32x4_t = int32x4_t;
  using __int8x16_t = int8x16_t;
}
#endif

But we'd need to redefine or wrap the operations like veorq_u8 which
would be a pain.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH][Aarch64] Add vectorized mersenne twister
  2017-06-06 10:08 ` James Greenhalgh
@ 2017-06-06 10:23   ` Jonathan Wakely
  2017-06-06 10:47     ` Jonathan Wakely
  2017-06-06 11:25   ` Ulrich Drepper
  2017-06-06 11:41   ` Charles Baylis
  2 siblings, 1 reply; 9+ messages in thread
From: Jonathan Wakely @ 2017-06-06 10:23 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: Ulrich Drepper, Michael Collison, libstdc++,
	nd, gcc-patches, richard.earnshaw, ramana.radhakrishnan,
	kyrtka01, nickc

On 06/06/17 11:07 +0100, James Greenhalgh wrote:
>On Fri, Jun 02, 2017 at 07:13:17PM +0100, Jonathan Wakely wrote:
>> On 02/06/17 19:19 +0200, Ulrich Drepper wrote:
>> >On Fri, Jun 2, 2017 at 5:46 PM, Michael Collison
>> ><Michael.Collison@arm.com> wrote:
>> >>This implementation includes "arm_neon.h" when including the optimized <ext/random>.  This has the effect of polluting the global namespace with the Neon intrinsics, so user macros and functions could potentially clash with them.  Is this acceptable given this only happens when <ext/random> is explicitly included?
>> >
>> >I don't think it is.  Ideally the sfmt classes would get added to the
>> >standard some day; they are much better performance-wise than the mt
>> >classes.  At that point you'd have no excuse anymore.
>> >
>> >Why are the symbols in arm_neon.h not prefixed with an underscore as
>> >it is the case for equivalent definitions of other platforms (notably
>> >x86)?  If you start adding such optimizations I suggest you create
>> >headers which do not pollute the namespace and include it in
>> >arm_neon.h and whatever other header is needed to define the symbols
>> >needed for compatibility.
>> >
>> >Nip the problem in the butt now, this is probably just the beginning.
>
>We're a good number of years late to do that without causing some pain.
>
>The arm_neon.h names for AArch64 come from those for the ARM port, which
>themselves started out in the mid 2000s in ARM's proprietary compiler. These
>names are now all over the place; from LLVM, through popular proprietary
>toolchains, through a variety of developer guides and codebases, and in
>to GCC.
>
>Forging a new interface is possible, but expecting migration to the
>namespace-safe names seems unlikely given the wide spread of the current
>names. Essentially, we'd be doubling our maintenance, and asking all
>other compilers to follow. It is possible, but we'd not want to do it
>lightly.
>
>In particular, my immediate worry is needing other compilers to support the
>new names if they are to use libstdc++.
>
>If we don't mind that, and instead take a GCC-centric view, we could avoid
>namespace polution by using the GCC-internal names for the intrinsics
>(__builtin_aarch64_...). As those names don't form a guaranteed interface,
>that would tie us to a GCC version.

Libstdc++ is already tied to a GCC version when used with GCC. Using
the internal names might cause problems when using libstdc++ with
non-GNU compilers if they don't use the same names (it's used with at
least Clang and ICC).

>So we have a few solutions to choose from, each of which invokes a trade-off:
>
>  1 Use the current names and pollute the namespace.
>  2 Use the GCC internal __builtin_aarch64* names and tie libstdc++ to GCC
>    internals.
>  3 Define a new set of namespace-clean names and complicate the Neon
>    intrinsic interface while we migrate old users to new names.
>
>I can see why the libstdc++ community would prefer 3) over the other options,
>but I'm reticent to take that route as the cost to our intrinsic maintainers
>and users looks high. I've added some of the other ARM port maintainers
>for their opinion.
>
>Are there any other options I'm missing?

3b) Define a new subset of names for the parts needed by this patch,
    but don't expect users to migrate to them.

That probably doesn't have any advantage over 2.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH][Aarch64] Add vectorized mersenne twister
       [not found] <20170602181317.GJ12306@redhat.com>
@ 2017-06-06 10:08 ` James Greenhalgh
  2017-06-06 10:23   ` Jonathan Wakely
                     ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: James Greenhalgh @ 2017-06-06 10:08 UTC (permalink / raw)
  To: Jonathan Wakely
  Cc: Ulrich Drepper, Michael Collison, libstdc++,
	nd, gcc-patches, richard.earnshaw, ramana.radhakrishnan,
	kyrtka01, nickc

On Fri, Jun 02, 2017 at 07:13:17PM +0100, Jonathan Wakely wrote:
> On 02/06/17 19:19 +0200, Ulrich Drepper wrote:
> >On Fri, Jun 2, 2017 at 5:46 PM, Michael Collison
> ><Michael.Collison@arm.com> wrote:
> >>This implementation includes "arm_neon.h" when including the optimized <ext/random>.  This has the effect of polluting the global namespace with the Neon intrinsics, so user macros and functions could potentially clash with them.  Is this acceptable given this only happens when <ext/random> is explicitly included?
> >
> >I don't think it is.  Ideally the sfmt classes would get added to the
> >standard some day; they are much better performance-wise than the mt
> >classes.  At that point you'd have no excuse anymore.
> >
> >Why are the symbols in arm_neon.h not prefixed with an underscore as
> >it is the case for equivalent definitions of other platforms (notably
> >x86)?  If you start adding such optimizations I suggest you create
> >headers which do not pollute the namespace and include it in
> >arm_neon.h and whatever other header is needed to define the symbols
> >needed for compatibility.
> >
> >Nip the problem in the butt now, this is probably just the beginning.

We're a good number of years late to do that without causing some pain.

The arm_neon.h names for AArch64 come from those for the ARM port, which
themselves started out in the mid 2000s in ARM's proprietary compiler. These
names are now all over the place; from LLVM, through popular proprietary
toolchains, through a variety of developer guides and codebases, and in
to GCC.

Forging a new interface is possible, but expecting migration to the
namespace-safe names seems unlikely given the wide spread of the current
names. Essentially, we'd be doubling our maintenance, and asking all
other compilers to follow. It is possible, but we'd not want to do it
lightly.

In particular, my immediate worry is needing other compilers to support the
new names if they are to use libstdc++.

If we don't mind that, and instead take a GCC-centric view, we could avoid
namespace polution by using the GCC-internal names for the intrinsics
(__builtin_aarch64_...). As those names don't form a guaranteed interface,
that would tie us to a GCC version.

So we have a few solutions to choose from, each of which invokes a trade-off:

  1 Use the current names and pollute the namespace.
  2 Use the GCC internal __builtin_aarch64* names and tie libstdc++ to GCC
    internals.
  3 Define a new set of namespace-clean names and complicate the Neon
    intrinsic interface while we migrate old users to new names.

I can see why the libstdc++ community would prefer 3) over the other options,
but I'm reticent to take that route as the cost to our intrinsic maintainers
and users looks high. I've added some of the other ARM port maintainers
for their opinion.

Are there any other options I'm missing?

Thanks,
James

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-06-06 11:51 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-01 23:48 [PATCH][Aarch64] Add vectorized mersenne twister Michael Collison
     [not found] <20170602181317.GJ12306@redhat.com>
2017-06-06 10:08 ` James Greenhalgh
2017-06-06 10:23   ` Jonathan Wakely
2017-06-06 10:47     ` Jonathan Wakely
2017-06-06 11:33       ` James Greenhalgh
2017-06-06 11:40         ` Jonathan Wakely
2017-06-06 11:25   ` Ulrich Drepper
2017-06-06 11:41   ` Charles Baylis
2017-06-06 11:51     ` Jonathan Wakely

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).