[PATCH 00/11] stdx::simd optimizations, corrections, and cleanups

* [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups
@ 2021-06-08 12:10 Matthias Kretz
  2021-06-08 12:11 ` [PATCH 01/11] libstdc++: Improve copysign codegen Matthias Kretz
                   ` (11 more replies)
  0 siblings, 12 replies; 29+ messages in thread
From: Matthias Kretz @ 2021-06-08 12:10 UTC
  To: gcc-patches, libstdc++

The following patches mostly contain code cleanups and minor corrections. The
major feature of this patchset is the last patch, which should make the use of
stdx::simd much safer with respect to ODR violations inadvertently introduced
by linking TUs that were compiled with different -m and floating-point flags.

Matthias Kretz (11):
  libstdc++: Improve copysign codegen
  libstdc++: Remove dead code
  libstdc++: Improve fixed_size codegen
  libstdc++: Make use of __builtin_bit_cast
  libstdc++: Remove incorrect fabs overload
  libstdc++: Minor simd_math cleanups
  libstdc++: Fix condition when AVX512F ldexp implementation is used
  libstdc++: Avoid raising fp exceptions in trunc, floor, and ceil
  libstdc++: Ensure unrolled loops inline the lambda
  libstdc++: Fix internal names: add missing underscores
  libstdc++: Fix ODR issues with different -m flags

 libstdc++-v3/include/experimental/bits/simd.h | 438 ++++++++++++------
 .../include/experimental/bits/simd_builtin.h  |  48 +-
 .../experimental/bits/simd_converter.h        |   2 +-
 .../include/experimental/bits/simd_detail.h   |  40 ++
 .../experimental/bits/simd_fixed_size.h       |  95 ++--
 .../include/experimental/bits/simd_math.h     | 107 ++---
 .../include/experimental/bits/simd_neon.h     |   4 +-
 .../include/experimental/bits/simd_ppc.h      |   4 +-
 .../include/experimental/bits/simd_scalar.h   |  71 ++-
 .../include/experimental/bits/simd_x86.h      |  33 +-
 .../simd/tests/bits/test_values.h             |   8 +-
 11 files changed, 540 insertions(+), 310 deletions(-)

-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 std::experimental::simd              https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────



* [PATCH 01/11] libstdc++: Improve copysign codegen
  2021-06-08 12:10 [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Matthias Kretz
@ 2021-06-08 12:11 ` Matthias Kretz
  2021-06-08 12:11 ` [PATCH 02/11] libstdc++: Remove dead code Matthias Kretz
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: Matthias Kretz @ 2021-06-08 12:11 UTC
  To: gcc-patches, libstdc++

[-- Attachment #1: Type: text/plain, Size: 1824 bytes --]



From: Matthias Kretz <kretz@kde.org>

This also resolves a test failure on aarch64 with -ffast-math and
fixed_size<N> with large N.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>

libstdc++-v3/ChangeLog:

	* include/experimental/bits/simd.h: Add missing operator~
	overload for simd<floating-point> to __float_bitwise_operators.
	* include/experimental/bits/simd_builtin.h
	(_SimdImplBuiltin::_S_complement): Bitcast to int (and back) to
	implement complement for floating-point vectors.
	* include/experimental/bits/simd_fixed_size.h
	(_SimdImplFixedSize::_S_copysign): New function, forwarding to
	copysign implementation of _SimdTuple members.
	* include/experimental/bits/simd_math.h (copysign): Call
	_SimdImpl::_S_copysign for fixed_size arguments. Simplify
	generic copysign implementation using the new ~ operator.
---
 libstdc++-v3/include/experimental/bits/simd.h            | 6 ++++++
 libstdc++-v3/include/experimental/bits/simd_builtin.h    | 7 ++++++-
 libstdc++-v3/include/experimental/bits/simd_fixed_size.h | 2 +-
 libstdc++-v3/include/experimental/bits/simd_math.h       | 4 +++-
 4 files changed, 16 insertions(+), 3 deletions(-)
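
For readers unfamiliar with the bit trick in the generic copysign path, here
is a scalar sketch of the same expression, (x & ~signmask) | (y & signmask).
It is not the libstdc++ code: the helper names to_bits, from_bits, and
copysign_bits are hypothetical, and it assumes float is IEEE 754 binary32.
The sign mask is derived as in the patch, by XOR-ing the bit patterns of 1
and -1.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret the bits of a float as an integer.
inline std::uint32_t to_bits(float v)
{
  std::uint32_t b;
  std::memcpy(&b, &v, sizeof b);
  return b;
}

// Hypothetical helper: reinterpret an integer bit pattern as a float.
inline float from_bits(std::uint32_t b)
{
  float v;
  std::memcpy(&v, &b, sizeof v);
  return v;
}

// copysign as pure bit manipulation: magnitude bits of x, sign bit of y.
// Assumes IEEE 754 binary32, where 1.0f and -1.0f differ only in the sign bit.
inline float copysign_bits(float x, float y)
{
  const std::uint32_t signmask = to_bits(1.0f) ^ to_bits(-1.0f);
  return from_bits((to_bits(x) & ~signmask) | (to_bits(y) & signmask));
}
```

The previous form, x & (x ^ signmask), clears the sign bit of x as well; the
complement just states the intent directly, which is why the patch adds
operator~ for floating-point simd.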



[-- Attachment #2: 0001-libstdc-Improve-copysign-codegen.patch --]
[-- Type: text/x-patch, Size: 3375 bytes --]

diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h
index 59ddf3cc958..163f1b574e2 100644
--- a/libstdc++-v3/include/experimental/bits/simd.h
+++ b/libstdc++-v3/include/experimental/bits/simd.h
@@ -5189,6 +5189,12 @@ template <typename _Tp, typename _Ap>
     return {__private_init,
 	    _Ap::_SimdImpl::_S_bit_and(__data(__a), __data(__b))};
   }
+
+template <typename _Tp, typename _Ap>
+  _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR
+  enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Ap>>
+  operator~(const simd<_Tp, _Ap>& __a)
+  { return {__private_init, _Ap::_SimdImpl::_S_complement(__data(__a))}; }
 } // namespace __float_bitwise_operators }}}
 
 _GLIBCXX_SIMD_END_NAMESPACE
diff --git a/libstdc++-v3/include/experimental/bits/simd_builtin.h b/libstdc++-v3/include/experimental/bits/simd_builtin.h
index e986ee91620..8cd338e313f 100644
--- a/libstdc++-v3/include/experimental/bits/simd_builtin.h
+++ b/libstdc++-v3/include/experimental/bits/simd_builtin.h
@@ -1632,7 +1632,12 @@ template <typename _Abi>
     template <typename _Tp, size_t _Np>
       _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
       _S_complement(_SimdWrapper<_Tp, _Np> __x) noexcept
-      { return ~__x._M_data; }
+      {
+	if constexpr (is_floating_point_v<_Tp>)
+	  return __vector_bitcast<_Tp>(~__vector_bitcast<__int_for_sizeof_t<_Tp>>(__x));
+	else
+	  return ~__x._M_data;
+      }
 
     // _S_unary_minus {{{2
     template <typename _Tp, size_t _Np>
diff --git a/libstdc++-v3/include/experimental/bits/simd_fixed_size.h b/libstdc++-v3/include/experimental/bits/simd_fixed_size.h
index 2722055c899..7c2c1df77c8 100644
--- a/libstdc++-v3/include/experimental/bits/simd_fixed_size.h
+++ b/libstdc++-v3/include/experimental/bits/simd_fixed_size.h
@@ -1663,7 +1663,7 @@ template <int _Np>
     _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, ldexp)
     _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, fmod)
     _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, remainder)
-    // copysign in simd_math.h
+    _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, copysign)
     _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, nextafter)
     _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, fdim)
     _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, fmax)
diff --git a/libstdc++-v3/include/experimental/bits/simd_math.h b/libstdc++-v3/include/experimental/bits/simd_math.h
index 4799803a200..d954e761eee 100644
--- a/libstdc++-v3/include/experimental/bits/simd_math.h
+++ b/libstdc++-v3/include/experimental/bits/simd_math.h
@@ -1304,6 +1304,8 @@ template <typename _Tp, typename _Abi>
   {
     if constexpr (simd_size_v<_Tp, _Abi> == 1)
       return std::copysign(__x[0], __y[0]);
+    else if constexpr (__is_fixed_size_abi_v<_Abi>)
+      return {__private_init, _Abi::_SimdImpl::_S_copysign(__data(__x), __data(__y))};
     else if constexpr (is_same_v<_Tp, long double> && sizeof(_Tp) == 12)
       // Remove this case once __bit_cast is implemented via __builtin_bit_cast.
       // It is necessary, because __signmask below cannot be computed at compile
@@ -1315,7 +1317,7 @@ template <typename _Tp, typename _Abi>
 	using _V = simd<_Tp, _Abi>;
 	using namespace std::experimental::__float_bitwise_operators;
 	_GLIBCXX_SIMD_USE_CONSTEXPR_API auto __signmask = _V(1) ^ _V(-1);
-	return (__x & (__x ^ __signmask)) | (__y & __signmask);
+	return (__x & ~__signmask) | (__y & __signmask);
       }
   }
 


* [PATCH 02/11] libstdc++: Remove dead code
  2021-06-08 12:10 [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Matthias Kretz
  2021-06-08 12:11 ` [PATCH 01/11] libstdc++: Improve copysign codegen Matthias Kretz
@ 2021-06-08 12:11 ` Matthias Kretz
  2021-06-08 12:11 ` [PATCH 03/11] libstdc++: Improve fixed_size codegen Matthias Kretz
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: Matthias Kretz @ 2021-06-08 12:11 UTC
  To: gcc-patches, libstdc++

[-- Attachment #1: Type: text/plain, Size: 1018 bytes --]



From: Matthias Kretz <kretz@kde.org>

This helper type became unused at some point.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>

libstdc++-v3/ChangeLog:

	* include/experimental/bits/simd_fixed_size.h
	(_AbisInSimdTuple): Removed.
---
 .../experimental/bits/simd_fixed_size.h       | 49 -------------------
 1 file changed, 49 deletions(-)



[-- Attachment #2: 0002-libstdc-Remove-dead-code.patch --]
[-- Type: text/x-patch, Size: 2211 bytes --]

diff --git a/libstdc++-v3/include/experimental/bits/simd_fixed_size.h b/libstdc++-v3/include/experimental/bits/simd_fixed_size.h
index 7c2c1df77c8..b6fb47cdf39 100644
--- a/libstdc++-v3/include/experimental/bits/simd_fixed_size.h
+++ b/libstdc++-v3/include/experimental/bits/simd_fixed_size.h
@@ -1025,55 +1025,6 @@ template <typename _Tp, int _Np, typename... _As, typename _Next, int _Remain>
       _Tp, _Remain, _SimdTuple<_Tp, _As..., typename _Next::abi_type>>::type;
   };
 
-// }}}
-// _AbisInSimdTuple {{{
-template <typename _Tp>
-  struct _SeqOp;
-
-template <size_t _I0, size_t... _Is>
-  struct _SeqOp<index_sequence<_I0, _Is...>>
-  {
-    using _FirstPlusOne = index_sequence<_I0 + 1, _Is...>;
-    using _NotFirstPlusOne = index_sequence<_I0, (_Is + 1)...>;
-    template <size_t _First, size_t _Add>
-    using _Prepend = index_sequence<_First, _I0 + _Add, (_Is + _Add)...>;
-  };
-
-template <typename _Tp>
-  struct _AbisInSimdTuple;
-
-template <typename _Tp>
-  struct _AbisInSimdTuple<_SimdTuple<_Tp>>
-  {
-    using _Counts = index_sequence<0>;
-    using _Begins = index_sequence<0>;
-  };
-
-template <typename _Tp, typename _Ap>
-  struct _AbisInSimdTuple<_SimdTuple<_Tp, _Ap>>
-  {
-    using _Counts = index_sequence<1>;
-    using _Begins = index_sequence<0>;
-  };
-
-template <typename _Tp, typename _A0, typename... _As>
-  struct _AbisInSimdTuple<_SimdTuple<_Tp, _A0, _A0, _As...>>
-  {
-    using _Counts = typename _SeqOp<typename _AbisInSimdTuple<
-      _SimdTuple<_Tp, _A0, _As...>>::_Counts>::_FirstPlusOne;
-    using _Begins = typename _SeqOp<typename _AbisInSimdTuple<
-      _SimdTuple<_Tp, _A0, _As...>>::_Begins>::_NotFirstPlusOne;
-  };
-
-template <typename _Tp, typename _A0, typename _A1, typename... _As>
-  struct _AbisInSimdTuple<_SimdTuple<_Tp, _A0, _A1, _As...>>
-  {
-    using _Counts = typename _SeqOp<typename _AbisInSimdTuple<
-      _SimdTuple<_Tp, _A1, _As...>>::_Counts>::template _Prepend<1, 0>;
-    using _Begins = typename _SeqOp<typename _AbisInSimdTuple<
-      _SimdTuple<_Tp, _A1, _As...>>::_Begins>::template _Prepend<0, 1>;
-  };
-
 // }}}
 // __autocvt_to_simd {{{
 template <typename _Tp, bool = is_arithmetic_v<__remove_cvref_t<_Tp>>>


* [PATCH 03/11] libstdc++: Improve fixed_size codegen
  2021-06-08 12:10 [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Matthias Kretz
  2021-06-08 12:11 ` [PATCH 01/11] libstdc++: Improve copysign codegen Matthias Kretz
  2021-06-08 12:11 ` [PATCH 02/11] libstdc++: Remove dead code Matthias Kretz
@ 2021-06-08 12:11 ` Matthias Kretz
  2021-06-08 12:11 ` [PATCH 04/11] libstdc++: Make use of __builtin_bit_cast Matthias Kretz
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: Matthias Kretz @ 2021-06-08 12:11 UTC
  To: gcc-patches, libstdc++

[-- Attachment #1: Type: text/plain, Size: 1458 bytes --]



From: Matthias Kretz <kretz@kde.org>

Sometimes fixed_size objects get copied on the stack unnecessarily. To
avoid relying on the optimizer to see through these copies, the simd
implementation should never pass _SimdTuple by value.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>

libstdc++-v3/ChangeLog:

	* include/experimental/bits/simd_converter.h
	(_SimdConverter::operator()): Pass _SimdTuple by const-ref.
	* include/experimental/bits/simd_fixed_size.h
	(_GLIBCXX_SIMD_FIXED_OP): Pass binary operator _SimdTuple
	arguments by const-ref.
	(_S_masked_unary): Pass _SimdTuple by const-ref.
---
 libstdc++-v3/include/experimental/bits/simd_converter.h  | 2 +-
 libstdc++-v3/include/experimental/bits/simd_fixed_size.h | 5 ++---
 2 files changed, 3 insertions(+), 4 deletions(-)
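
To make the by-value cost visible, here is a small sketch with a hypothetical
Chunk type that counts its copies. It only stands in for a multi-register
_SimdTuple: the real type is copied as plain memory rather than through a
user-provided constructor, so the counter is purely an illustration device.

```cpp
#include <cassert>

// Hypothetical stand-in for a _SimdTuple spanning several registers.
// The copy constructor counts how often a copy is made.
struct Chunk
{
  static inline int copies = 0;
  float data[16] = {};

  Chunk() = default;

  Chunk(const Chunk& other)
  {
    for (int i = 0; i < 16; ++i)
      data[i] = other.data[i];
    ++copies;
  }
};

// By value: a call with an lvalue argument has to materialize a copy.
inline float first_by_value(const Chunk x)
{ return x.data[0]; }

// By const-ref: the callee reads the caller's object directly.
inline float first_by_cref(const Chunk& x)
{ return x.data[0]; }
```

After inlining, an optimizer can often remove the by-value copy, but the
const-ref signature does not depend on that happening.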


--
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 std::experimental::simd              https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────

[-- Attachment #2: 0003-libstdc-Improve-fixed_size-codegen.patch --]
[-- Type: text/x-patch, Size: 2133 bytes --]

diff --git a/libstdc++-v3/include/experimental/bits/simd_converter.h b/libstdc++-v3/include/experimental/bits/simd_converter.h
index 9c8bf382df9..11999df25e4 100644
--- a/libstdc++-v3/include/experimental/bits/simd_converter.h
+++ b/libstdc++-v3/include/experimental/bits/simd_converter.h
@@ -316,7 +316,7 @@ template <typename _From, int _Np, typename _To, typename _Ap>
 
     _GLIBCXX_SIMD_INTRINSIC constexpr
       typename _SimdTraits<_To, _Ap>::_SimdMember
-      operator()(_Arg __x) const noexcept
+      operator()(const _Arg& __x) const noexcept
     {
       if constexpr (_Arg::_S_tuple_size == 1)
 	return __vector_convert<__vector_type_t<_To, _Np>>(__x.first);
diff --git a/libstdc++-v3/include/experimental/bits/simd_fixed_size.h b/libstdc++-v3/include/experimental/bits/simd_fixed_size.h
index b6fb47cdf39..dc2fb90b9b2 100644
--- a/libstdc++-v3/include/experimental/bits/simd_fixed_size.h
+++ b/libstdc++-v3/include/experimental/bits/simd_fixed_size.h
@@ -1480,7 +1480,7 @@ template <int _Np>
 #define _GLIBCXX_SIMD_FIXED_OP(name_, op_)                                     \
     template <typename _Tp, typename... _As>                                   \
       static inline constexpr _SimdTuple<_Tp, _As...> name_(                   \
-	const _SimdTuple<_Tp, _As...> __x, const _SimdTuple<_Tp, _As...> __y)  \
+	const _SimdTuple<_Tp, _As...>& __x, const _SimdTuple<_Tp, _As...>& __y)\
       {                                                                        \
 	return __x._M_apply_per_chunk(                                         \
 	  [](auto __impl, auto __xx, auto __yy) constexpr {                    \
@@ -1780,8 +1780,7 @@ template <int _Np>
     // _S_masked_unary {{{2
     template <template <typename> class _Op, typename _Tp, typename... _As>
       static inline _SimdTuple<_Tp, _As...>
-      _S_masked_unary(const _MaskMember __bits,
-		      const _SimdTuple<_Tp, _As...> __v) // TODO: const-ref __v?
+      _S_masked_unary(const _MaskMember __bits, const _SimdTuple<_Tp, _As...>& __v)
       {
 	return __v._M_apply_wrapped([&__bits](auto __meta,
 					      auto __native) constexpr {


* [PATCH 04/11] libstdc++: Make use of __builtin_bit_cast
  2021-06-08 12:10 [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Matthias Kretz
                   ` (2 preceding siblings ...)
  2021-06-08 12:11 ` [PATCH 03/11] libstdc++: Improve fixed_size codegen Matthias Kretz
@ 2021-06-08 12:11 ` Matthias Kretz
  2021-06-11 10:53   ` [PATCH 04/11 v2] " Matthias Kretz
  2021-06-08 12:11 ` [PATCH 05/11] libstdc++: Remove incorrect fabs overload Matthias Kretz
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 29+ messages in thread
From: Matthias Kretz @ 2021-06-08 12:11 UTC
  To: gcc-patches, libstdc++

[-- Attachment #1: Type: text/plain, Size: 2008 bytes --]



From: Matthias Kretz <kretz@kde.org>

The __bit_cast function was a hack to achieve what __builtin_bit_cast
can do; therefore use __builtin_bit_cast where available. However,
__builtin_bit_cast cannot be used to cast from/to fixed_size_simd, since
that type isn't trivially copyable (in the language sense; in principle
it is). Therefore add __proposed::simd_bit_cast to enable the use case
required in the test framework.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>

libstdc++-v3/ChangeLog:

	* include/experimental/bits/simd.h (__bit_cast): Implement via
	__builtin_bit_cast #if available.
	(__proposed::simd_bit_cast): Add overloads for simd and
	simd_mask that use __builtin_bit_cast (or __bit_cast #if not
	available) and return an object of the requested type with the
	same bits as the argument.
	* include/experimental/bits/simd_math.h: Use simd_bit_cast
	instead of __bit_cast to allow casts to fixed_size_simd.
	* testsuite/experimental/simd/tests/bits/test_values.h: Switch
	from __bit_cast to __proposed::simd_bit_cast since the former
	will not cast fixed_size objects anymore.
---
 libstdc++-v3/include/experimental/bits/simd.h | 40 ++++++++++++++++++-
 .../include/experimental/bits/simd_math.h     |  8 ++--
 .../simd/tests/bits/test_values.h             |  8 ++--
 3 files changed, 46 insertions(+), 10 deletions(-)
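
The "builtin if available, fallback otherwise" pattern can be sketched
outside of libstdc++ as follows. The names bit_cast_sketch and
SKETCH_HAVE_BUILTIN_BIT_CAST are hypothetical, the fallback is a plain
memcpy rather than the vector-aware code in __bit_cast, and C++17 is
assumed. The static_asserts mirror the size and trivially-copyable
requirements that simd_bit_cast checks.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Detect the builtin without assuming that __has_builtin itself exists.
#if defined(__has_builtin)
#  if __has_builtin(__builtin_bit_cast)
#    define SKETCH_HAVE_BUILTIN_BIT_CAST 1
#  endif
#endif

// Hypothetical helper: return an object of type To with the same bits as x.
template <typename To, typename From>
  inline To
  bit_cast_sketch(const From& x)
  {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable_v<To>
		    && std::is_trivially_copyable_v<From>,
		  "both types must be trivially copyable");
#ifdef SKETCH_HAVE_BUILTIN_BIT_CAST
    return __builtin_bit_cast(To, x);
#else
    To r; // fallback: requires To to be default-constructible
    std::memcpy(&r, &x, sizeof(To));
    return r;
#endif
  }
```

The second static_assert is the reason a type that is not trivially copyable
in the language sense, such as fixed_size_simd, needs the separate
simd_bit_cast entry point that casts the underlying data member instead.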


--
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 std::experimental::simd              https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────

[-- Attachment #2: 0004-libstdc-Make-use-of-__builtin_bit_cast.patch --]
[-- Type: text/x-patch, Size: 4429 bytes --]

diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h
index 163f1b574e2..5d243f22434 100644
--- a/libstdc++-v3/include/experimental/bits/simd.h
+++ b/libstdc++-v3/include/experimental/bits/simd.h
@@ -1598,7 +1598,9 @@ template <typename _To, typename _From>
   _GLIBCXX_SIMD_INTRINSIC constexpr _To
   __bit_cast(const _From __x)
   {
-    // TODO: implement with / replace by __builtin_bit_cast ASAP
+#if __has_builtin(__builtin_bit_cast)
+    return __builtin_bit_cast(_To, __x);
+#else
     static_assert(sizeof(_To) == sizeof(_From));
     constexpr bool __to_is_vectorizable
       = is_arithmetic_v<_To> || is_enum_v<_To>;
@@ -1629,6 +1631,7 @@ template <typename _To, typename _From>
 			 reinterpret_cast<const char*>(&__x), sizeof(_To));
 	return __r;
       }
+#endif
   }
 
 // }}}
@@ -2900,6 +2903,41 @@ template <typename _Tp, typename _Up, typename _Ap,
     return {__private_init, _RM::abi_type::_MaskImpl::template _S_convert<
 			      typename _RM::simd_type::value_type>(__x)};
   }
+
+template <typename _To, typename _Up, typename _Abi>
+  _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR
+  _To
+  simd_bit_cast(const simd<_Up, _Abi>& __x)
+  {
+    using _Tp = typename _To::value_type;
+    using _ToMember = typename _SimdTraits<_Tp, typename _To::abi_type>::_SimdMember;
+    using _From = simd<_Up, _Abi>;
+    using _FromMember = typename _SimdTraits<_Up, _Abi>::_SimdMember;
+    // with concepts, the following should be constraints
+    static_assert(sizeof(_To) == sizeof(_From));
+    static_assert(is_trivially_copyable_v<_Tp> && is_trivially_copyable_v<_Up>);
+    static_assert(is_trivially_copyable_v<_ToMember> && is_trivially_copyable_v<_FromMember>);
+#if __has_builtin(__builtin_bit_cast)
+    return {__private_init, __builtin_bit_cast(_ToMember, __data(__x))};
+#else
+    return {__private_init, __bit_cast<_ToMember>(__data(__x))};
+#endif
+  }
+
+template <typename _To, typename _Up, typename _Abi>
+  _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR
+  _To
+  simd_bit_cast(const simd_mask<_Up, _Abi>& __x)
+  {
+    using _From = simd_mask<_Up, _Abi>;
+    static_assert(sizeof(_To) == sizeof(_From));
+    static_assert(is_trivially_copyable_v<_To> && is_trivially_copyable_v<_From>);
+#if __has_builtin(__builtin_bit_cast)
+    return __builtin_bit_cast(_To, __x);
+#else
+    return __bit_cast<_To>(__x);
+#endif
+  }
 } // namespace __proposed
 
 // simd_cast {{{2
diff --git a/libstdc++-v3/include/experimental/bits/simd_math.h b/libstdc++-v3/include/experimental/bits/simd_math.h
index d954e761eee..3ade293fcbf 100644
--- a/libstdc++-v3/include/experimental/bits/simd_math.h
+++ b/libstdc++-v3/include/experimental/bits/simd_math.h
@@ -700,11 +700,9 @@ template <typename _Tp, typename _Abi>
 	// (inf and NaN are excluded by -ffinite-math-only)
 	const auto __iszero_inf_nan = __x == 0;
 #else
-	const auto __as_int
-	  = __bit_cast<rebind_simd_t<__int_for_sizeof_t<_Tp>, _V>>(abs(__x));
-	const auto __inf
-	  = __bit_cast<rebind_simd_t<__int_for_sizeof_t<_Tp>, _V>>(
-	    _V(__infinity_v<_Tp>));
+	using _Ip = __int_for_sizeof_t<_Tp>;
+	const auto __as_int = simd_bit_cast<rebind_simd_t<_Ip, _V>>(abs(__x));
+	const auto __inf = simd_bit_cast<rebind_simd_t<_Ip, _V>>(_V(__infinity_v<_Tp>));
 	const auto __iszero_inf_nan = static_simd_cast<typename _V::mask_type>(
 	  __as_int == 0 || __as_int >= __inf);
 #endif
diff --git a/libstdc++-v3/testsuite/experimental/simd/tests/bits/test_values.h b/libstdc++-v3/testsuite/experimental/simd/tests/bits/test_values.h
index b69bd0b704d..67aa870659b 100644
--- a/libstdc++-v3/testsuite/experimental/simd/tests/bits/test_values.h
+++ b/libstdc++-v3/testsuite/experimental/simd/tests/bits/test_values.h
@@ -221,11 +221,11 @@ template <class V>
     if constexpr (sizeof(T) <= sizeof(double))
       {
 	using I = rebind_simd_t<__int_for_sizeof_t<T>, V>;
-	const I abs_x = __bit_cast<I>(abs(x));
-	const I min = __bit_cast<I>(V(std::__norm_min_v<T>));
-	const I max = __bit_cast<I>(V(std::__finite_max_v<T>));
+	const I abs_x = simd_bit_cast<I>(abs(x));
+	const I min = simd_bit_cast<I>(V(std::__norm_min_v<T>));
+	const I max = simd_bit_cast<I>(V(std::__finite_max_v<T>));
 	return static_simd_cast<typename V::mask_type>(
-		 __bit_cast<I>(x) == 0 || (abs_x >= min && abs_x <= max));
+		 simd_bit_cast<I>(x) == 0 || (abs_x >= min && abs_x <= max));
       }
     else
       {


* [PATCH 05/11] libstdc++: Remove incorrect fabs overload
  2021-06-08 12:10 [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Matthias Kretz
                   ` (3 preceding siblings ...)
  2021-06-08 12:11 ` [PATCH 04/11] libstdc++: Make use of __builtin_bit_cast Matthias Kretz
@ 2021-06-08 12:11 ` Matthias Kretz
  2021-06-08 12:11 ` [PATCH 06/11] libstdc++: Minor simd_math cleanups Matthias Kretz
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: Matthias Kretz @ 2021-06-08 12:11 UTC
  To: gcc-patches, libstdc++

[-- Attachment #1: Type: text/plain, Size: 1152 bytes --]



From: Matthias Kretz <kretz@kde.org>

fabs(int) returns double, but this overload returned simd<integral> for
a simd<integral> argument. The overload is also not specified in the
Parallelism TS 2. Also remove the comment about labs and llabs: it
doesn't belong here.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>

libstdc++-v3/ChangeLog:

	* include/experimental/bits/simd_math.h (fabs): Remove
	fabs(simd<integral>) overload.
---
 .../include/experimental/bits/simd_math.h        | 16 ----------------
 1 file changed, 16 deletions(-)
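
The scalar behavior the removed overload disagreed with can be checked
directly. This is a small sketch using only <cmath> and <cstdlib>; the
wrapper name fabs_of_int is hypothetical and C++17 is assumed.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <type_traits>

// std::fabs on an integer argument converts to double first ...
static_assert(std::is_same_v<decltype(std::fabs(-3)), double>,
	      "fabs(int) yields double");

// ... whereas std::abs keeps the integer type.
static_assert(std::is_same_v<decltype(std::abs(-3)), int>,
	      "abs(int) yields int");

// Hypothetical wrapper that makes the conversion explicit in its signature.
inline double
fabs_of_int(int v)
{ return std::fabs(v); }
```

A simd counterpart that stayed in the integer domain therefore behaved like
abs rather than fabs, and abs(simd<integral>) already covers that case.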


--
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 std::experimental::simd              https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────

[-- Attachment #2: 0005-libstdc-Remove-incorrect-fabs-overload.patch --]
[-- Type: text/x-patch, Size: 1372 bytes --]

diff --git a/libstdc++-v3/include/experimental/bits/simd_math.h b/libstdc++-v3/include/experimental/bits/simd_math.h
index 3ade293fcbf..cff4371619d 100644
--- a/libstdc++-v3/include/experimental/bits/simd_math.h
+++ b/libstdc++-v3/include/experimental/bits/simd_math.h
@@ -863,22 +863,6 @@ template <typename _Tp, typename _Abi>
   abs(const simd<_Tp, _Abi>& __x)
   { return {__private_init, _Abi::_SimdImpl::_S_abs(__data(__x))}; }
 
-template <typename _Tp, typename _Abi>
-  enable_if_t<!is_floating_point_v<_Tp> && is_signed_v<_Tp>, simd<_Tp, _Abi>>
-  fabs(const simd<_Tp, _Abi>& __x)
-  { return {__private_init, _Abi::_SimdImpl::_S_abs(__data(__x))}; }
-
-// the following are overloads for functions in <cstdlib> and not covered by
-// [parallel.simd.math]. I don't see much value in making them work, though
-/*
-template <typename _Abi> simd<long, _Abi> labs(const simd<long, _Abi> &__x)
-{ return {__private_init, _Abi::_SimdImpl::abs(__data(__x))}; }
-
-template <typename _Abi> simd<long long, _Abi> llabs(const simd<long long, _Abi>
-&__x)
-{ return {__private_init, _Abi::_SimdImpl::abs(__data(__x))}; }
-*/
-
 #define _GLIBCXX_SIMD_CVTING2(_NAME)                                           \
 template <typename _Tp, typename _Abi>                                         \
   _GLIBCXX_SIMD_INTRINSIC simd<_Tp, _Abi> _NAME(                               \


* [PATCH 06/11] libstdc++: Minor simd_math cleanups
  2021-06-08 12:10 [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Matthias Kretz
                   ` (4 preceding siblings ...)
  2021-06-08 12:11 ` [PATCH 05/11] libstdc++: Remove incorrect fabs overload Matthias Kretz
@ 2021-06-08 12:11 ` Matthias Kretz
  2021-06-08 12:11 ` [PATCH 07/11] libstdc++: Fix condition when AVX512F ldexp implementation is used Matthias Kretz
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: Matthias Kretz @ 2021-06-08 12:11 UTC
  To: gcc-patches, libstdc++

[-- Attachment #1: Type: text/plain, Size: 1183 bytes --]



From: Matthias Kretz <kretz@kde.org>

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>

libstdc++-v3/ChangeLog:

	* include/experimental/bits/simd_math.h: Undefine internal
	macros after use.
	(frexp): Move #if to a more sensible position and reformat
	preceding code.
	(logb): Call _SimdImpl::_S_logb for fixed_size instead of
	duplicating the code here.
	(modf): Simplify condition.
---
 .../include/experimental/bits/simd_math.h     | 22 +++++--------------
 1 file changed, 6 insertions(+), 16 deletions(-)


--
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 std::experimental::simd              https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────

[-- Attachment #2: 0006-libstdc-Minor-simd_math-cleanups.patch --]
[-- Type: text/x-patch, Size: 2308 bytes --]

diff --git a/libstdc++-v3/include/experimental/bits/simd_math.h b/libstdc++-v3/include/experimental/bits/simd_math.h
index cff4371619d..a5df2039970 100644
--- a/libstdc++-v3/include/experimental/bits/simd_math.h
+++ b/libstdc++-v3/include/experimental/bits/simd_math.h
@@ -645,11 +645,8 @@ template <typename _Tp, typename _Abi>
 	return __r;
       }
     else if constexpr (__is_fixed_size_abi_v<_Abi>)
-      {
-	return {__private_init,
-		_Abi::_SimdImpl::_S_frexp(__data(__x), __data(*__exp))};
+      return {__private_init, _Abi::_SimdImpl::_S_frexp(__data(__x), __data(*__exp))};
 #if _GLIBCXX_SIMD_X86INTRIN
-      }
     else if constexpr (__have_avx512f)
       {
 	constexpr size_t _Np = simd_size_v<_Tp, _Abi>;
@@ -667,8 +664,8 @@ template <typename _Tp, typename _Abi>
 		_Abi::_CommonImpl::_S_blend(_SimdWrapper<bool, _Np>(
 					      __isnonzero),
 					    __v, __getmant_avx512(__v))};
-#endif // _GLIBCXX_SIMD_X86INTRIN
       }
+#endif // _GLIBCXX_SIMD_X86INTRIN
     else
       {
 	// fallback implementation
@@ -749,14 +746,7 @@ template <typename _Tp, typename _Abi>
     if constexpr (_Np == 1)
       return std::logb(__x[0]);
     else if constexpr (__is_fixed_size_abi_v<_Abi>)
-      {
-	return {__private_init,
-		__data(__x)._M_apply_per_chunk([](auto __impl, auto __xx) {
-		  using _V = typename decltype(__impl)::simd_type;
-		  return __data(
-		    std::experimental::logb(_V(__private_init, __xx)));
-		})};
-      }
+      return {__private_init, _Abi::_SimdImpl::_S_logb(__data(__x))};
 #if _GLIBCXX_SIMD_X86INTRIN // {{{
     else if constexpr (__have_avx512vl && __is_sse_ps<_Tp, _Np>())
       return {__private_init,
@@ -827,9 +817,7 @@ template <typename _Tp, typename _Abi>
   enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
   modf(const simd<_Tp, _Abi>& __x, simd<_Tp, _Abi>* __iptr)
   {
-    if constexpr (__is_scalar_abi<_Abi>()
-		  || (__is_fixed_size_abi_v<
-			_Abi> && simd_size_v<_Tp, _Abi> == 1))
+    if constexpr (simd_size_v<_Tp, _Abi> == 1)
       {
 	_Tp __tmp;
 	_Tp __r = std::modf(__x[0], &__tmp);
@@ -1472,6 +1460,8 @@ template <typename _Tp, typename _Abi>
   }
 // }}}
 
+#undef _GLIBCXX_SIMD_CVTING2
+#undef _GLIBCXX_SIMD_CVTING3
 #undef _GLIBCXX_SIMD_MATH_CALL_
 #undef _GLIBCXX_SIMD_MATH_CALL2_
 #undef _GLIBCXX_SIMD_MATH_CALL3_


* [PATCH 07/11] libstdc++: Fix condition when AVX512F ldexp implementation is used
  2021-06-08 12:10 [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Matthias Kretz
                   ` (5 preceding siblings ...)
  2021-06-08 12:11 ` [PATCH 06/11] libstdc++: Minor simd_math cleanups Matthias Kretz
@ 2021-06-08 12:11 ` Matthias Kretz
  2021-06-08 12:11 ` [PATCH 08/11] libstdc++: Avoid raising fp exceptions in trunc, floor, and ceil Matthias Kretz
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: Matthias Kretz @ 2021-06-08 12:11 UTC
  To: gcc-patches, libstdc++

[-- Attachment #1: Type: text/plain, Size: 1170 bytes --]



From: Matthias Kretz <kretz@kde.org>

This improves codegen of ldexp if AVX512VL is available.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>

libstdc++-v3/ChangeLog:

	* include/experimental/bits/simd_x86.h (_S_ldexp): The AVX512F
	implementation doesn't require a _VecBltnBtmsk ABI tag, it
	requires either a 64-Byte input (in which case AVX512F must be
	available) or AVX512VL.
---
 libstdc++-v3/include/experimental/bits/simd_x86.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
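
Since the ABI tag no longer supplies the implicit mask, the patch computes
__k1 inline: the low _Np bits set, or all bits set when _Np fills the mask
type. A standalone sketch of that expression follows. The name low_bits_mask
is hypothetical, and plain unsigned integer types stand in for
__bool_storage_member_type_t, which is an assumption about that type.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: a mask of type Up with the low N bits set.
// If N covers the whole type, return all ones; otherwise (1 << N) - 1.
template <typename Up, std::size_t N>
  constexpr Up
  low_bits_mask()
  {
    return N < sizeof(Up) * CHAR_BIT ? Up((1ULL << N) - 1) : Up(~Up());
  }
```

For example, low_bits_mask<std::uint8_t, 3>() is 0x07, while
low_bits_mask<std::uint8_t, 8>() takes the second branch and yields 0xFF.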


--
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 std::experimental::simd              https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────

[-- Attachment #2: 0007-libstdc-Fix-condition-when-AVX512F-ldexp-implementat.patch --]
[-- Type: text/x-patch, Size: 1009 bytes --]

diff --git a/libstdc++-v3/include/experimental/bits/simd_x86.h b/libstdc++-v3/include/experimental/bits/simd_x86.h
index 305d7a9fa54..5706bf63845 100644
--- a/libstdc++-v3/include/experimental/bits/simd_x86.h
+++ b/libstdc++-v3/include/experimental/bits/simd_x86.h
@@ -2611,13 +2611,14 @@ template <typename _Abi>
       _S_ldexp(_SimdWrapper<_Tp, _Np> __x,
 	       __fixed_size_storage_t<int, _Np> __exp)
       {
-	if constexpr (__is_avx512_abi<_Abi>())
+	if constexpr (sizeof(__x) == 64 || __have_avx512vl)
 	  {
 	    const auto __xi = __to_intrin(__x);
 	    constexpr _SimdConverter<int, simd_abi::fixed_size<_Np>, _Tp, _Abi>
 	      __cvt;
 	    const auto __expi = __to_intrin(__cvt(__exp));
-	    constexpr auto __k1 = _Abi::template _S_implicit_mask_intrin<_Tp>();
+	    using _Up = __bool_storage_member_type_t<_Np>;
+	    constexpr _Up __k1 = _Np < sizeof(_Up) * __CHAR_BIT__ ? _Up((1ULL << _Np) - 1) : ~_Up();
 	    if constexpr (sizeof(__xi) == 16)
 	      {
 		if constexpr (sizeof(_Tp) == 8)


* [PATCH 08/11] libstdc++: Avoid raising fp exceptions in trunc, floor, and ceil
  2021-06-08 12:10 [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Matthias Kretz
                   ` (6 preceding siblings ...)
  2021-06-08 12:11 ` [PATCH 07/11] libstdc++: Fix condition when AVX512F ldexp implementation is used Matthias Kretz
@ 2021-06-08 12:11 ` Matthias Kretz
  2021-06-08 12:11 ` [PATCH 09/11] libstdc++: Ensure unrolled loops inline the lambda Matthias Kretz
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: Matthias Kretz @ 2021-06-08 12:11 UTC (permalink / raw)
  To: gcc-patches, libstdc++

[-- Attachment #1: Type: text/plain, Size: 1050 bytes --]



From: Matthias Kretz <kretz@kde.org>

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>

libstdc++-v3/ChangeLog:

	* include/experimental/bits/simd_x86.h (_S_trunc, _S_floor,
	_S_ceil): Set bit 3 (_MM_FROUND_NO_EXC, i.e. 0x8) on AVX and
	SSE4.1 roundp[sd] calls.
---
 .../include/experimental/bits/simd_x86.h      | 24 +++++++++----------
 1 file changed, 12 insertions(+), 12 deletions(-)
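
The new immediates can be checked against the rounding-control encoding
of the roundp[sd] instructions. The sketch below defines its own
constants (hypothetical names that mirror the values of the
_MM_FROUND_* macros from the x86 intrinsics headers) so that it compiles
on any target; it only illustrates the arithmetic and does not call any
intrinsic.

```cpp
// Values mirror the _MM_FROUND_* macros of the x86 intrinsics headers.
constexpr int fround_to_neg_inf = 0x01; // round toward -inf (floor)
constexpr int fround_to_pos_inf = 0x02; // round toward +inf (ceil)
constexpr int fround_to_zero    = 0x03; // round toward zero (trunc)
constexpr int fround_no_exc     = 0x08; // bit 3: suppress the precision exception

// Combine a rounding mode with the exception-suppression bit.
constexpr int
with_no_exc(int rounding_mode)
{ return rounding_mode | fround_no_exc; }

static_assert(with_no_exc(fround_to_zero) == 0xb, "trunc immediate");
static_assert(with_no_exc(fround_to_neg_inf) == 0x9, "floor immediate");
static_assert(with_no_exc(fround_to_pos_inf) == 0xa, "ceil immediate");
```

This also shows why _mm_floor_p[sd] and _mm_ceil_p[sd] are replaced by
explicit _mm_round_p[sd] calls: the convenience intrinsics use the
immediates 0x1 and 0x2, which leave bit 3 clear.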



[-- Attachment #2: 0008-libstdc-Avoid-raising-fp-exceptions-in-trunc-floor-a.patch --]
[-- Type: text/x-patch, Size: 2545 bytes --]

diff --git a/libstdc++-v3/include/experimental/bits/simd_x86.h b/libstdc++-v3/include/experimental/bits/simd_x86.h
index 5706bf63845..34633c096b1 100644
--- a/libstdc++-v3/include/experimental/bits/simd_x86.h
+++ b/libstdc++-v3/include/experimental/bits/simd_x86.h
@@ -2657,13 +2657,13 @@ template <typename _Abi>
 	else if constexpr (__is_avx512_pd<_Tp, _Np>())
 	  return _mm512_roundscale_pd(__x, 0x0b);
 	else if constexpr (__is_avx_ps<_Tp, _Np>())
-	  return _mm256_round_ps(__x, 0x3);
+	  return _mm256_round_ps(__x, 0xb);
 	else if constexpr (__is_avx_pd<_Tp, _Np>())
-	  return _mm256_round_pd(__x, 0x3);
+	  return _mm256_round_pd(__x, 0xb);
 	else if constexpr (__have_sse4_1 && __is_sse_ps<_Tp, _Np>())
-	  return __auto_bitcast(_mm_round_ps(__to_intrin(__x), 0x3));
+	  return __auto_bitcast(_mm_round_ps(__to_intrin(__x), 0xb));
 	else if constexpr (__have_sse4_1 && __is_sse_pd<_Tp, _Np>())
-	  return _mm_round_pd(__x, 0x3);
+	  return _mm_round_pd(__x, 0xb);
 	else if constexpr (__is_sse_ps<_Tp, _Np>())
 	  {
 	    auto __truncated
@@ -2786,13 +2786,13 @@ template <typename _Abi>
 	else if constexpr (__is_avx512_pd<_Tp, _Np>())
 	  return _mm512_roundscale_pd(__x, 0x09);
 	else if constexpr (__is_avx_ps<_Tp, _Np>())
-	  return _mm256_round_ps(__x, 0x1);
+	  return _mm256_round_ps(__x, 0x9);
 	else if constexpr (__is_avx_pd<_Tp, _Np>())
-	  return _mm256_round_pd(__x, 0x1);
+	  return _mm256_round_pd(__x, 0x9);
 	else if constexpr (__have_sse4_1 && __is_sse_ps<_Tp, _Np>())
-	  return __auto_bitcast(_mm_floor_ps(__to_intrin(__x)));
+	  return __auto_bitcast(_mm_round_ps(__to_intrin(__x), 0x9));
 	else if constexpr (__have_sse4_1 && __is_sse_pd<_Tp, _Np>())
-	  return _mm_floor_pd(__x);
+	  return _mm_round_pd(__x, 0x9);
 	else
 	  return _Base::_S_floor(__x);
       }
@@ -2808,13 +2808,13 @@ template <typename _Abi>
 	else if constexpr (__is_avx512_pd<_Tp, _Np>())
 	  return _mm512_roundscale_pd(__x, 0x0a);
 	else if constexpr (__is_avx_ps<_Tp, _Np>())
-	  return _mm256_round_ps(__x, 0x2);
+	  return _mm256_round_ps(__x, 0xa);
 	else if constexpr (__is_avx_pd<_Tp, _Np>())
-	  return _mm256_round_pd(__x, 0x2);
+	  return _mm256_round_pd(__x, 0xa);
 	else if constexpr (__have_sse4_1 && __is_sse_ps<_Tp, _Np>())
-	  return __auto_bitcast(_mm_ceil_ps(__to_intrin(__x)));
+	  return __auto_bitcast(_mm_round_ps(__to_intrin(__x), 0xa));
 	else if constexpr (__have_sse4_1 && __is_sse_pd<_Tp, _Np>())
-	  return _mm_ceil_pd(__x);
+	  return _mm_round_pd(__x, 0xa);
 	else
 	  return _Base::_S_ceil(__x);
       }


* [PATCH 09/11] libstdc++: Ensure unrolled loops inline the lambda
  2021-06-08 12:10 [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Matthias Kretz
                   ` (7 preceding siblings ...)
  2021-06-08 12:11 ` [PATCH 08/11] libstdc++: Avoid raising fp exceptions in trunc, floor, and ceil Matthias Kretz
@ 2021-06-08 12:11 ` Matthias Kretz
  2021-06-08 12:12 ` [PATCH 10/11] libstdc++: Fix internal names: add missing underscores Matthias Kretz
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 29+ messages in thread
From: Matthias Kretz @ 2021-06-08 12:11 UTC (permalink / raw)
  To: gcc-patches, libstdc++

[-- Attachment #1: Type: text/plain, Size: 1088 bytes --]



From: Matthias Kretz <kretz@kde.org>

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>

libstdc++-v3/ChangeLog:

	* include/experimental/bits/simd.h (__execute_on_index_sequence,
	__execute_on_index_sequence_with_return,
	__call_with_n_evaluations, __call_with_subscripts): Add flatten
	attribute.
---
 libstdc++-v3/include/experimental/bits/simd.h | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)
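
A reduced version of the pattern may help to see what the attribute is
applied to. The sketch below is an assumption-laden illustration, not
the libstdc++ code: the names are simplified, the _GLIBCXX_SIMD_INTRINSIC
macro is left out, and std::integral_constant stands in for
_SizeConstant. The flatten attribute asks GCC to inline every call made
in the body of the attributed function where possible, so the callable
passed in is inlined into each unrolled step.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Calls f once per index, passing the index as a compile-time constant.
// [[gnu::flatten]] requests that the calls to f in the body are inlined.
template <typename F, std::size_t... I>
  [[gnu::flatten]] constexpr void
  execute_on_index_sequence(F&& f, std::index_sequence<I...>)
  { ((void)f(std::integral_constant<std::size_t, I>()), ...); }

// Convenience wrapper that builds the index sequence 0, 1, ..., N - 1.
template <std::size_t N, typename F>
  constexpr void
  execute_n_times(F&& f)
  {
    execute_on_index_sequence(static_cast<F&&>(f),
			      std::make_index_sequence<N>());
  }

// Usage: an unrolled "loop" that sums the indices 0, 1, 2, 3.
constexpr int
sum_of_indices_below_4()
{
  int sum = 0;
  execute_n_times<4>([&sum](auto i) { sum += static_cast<int>(i()); });
  return sum;
}

static_assert(sum_of_indices_below_4() == 6, "0 + 1 + 2 + 3");
```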



[-- Attachment #2: 0009-libstdc-Ensure-unrolled-loops-inline-the-lambda.patch --]
[-- Type: text/x-patch, Size: 1830 bytes --]

diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h
index 5d243f22434..21100c1087d 100644
--- a/libstdc++-v3/include/experimental/bits/simd.h
+++ b/libstdc++-v3/include/experimental/bits/simd.h
@@ -234,7 +234,8 @@ namespace __detail
 // unrolled/pack execution helpers
 // __execute_n_times{{{
 template <typename _Fp, size_t... _I>
-  _GLIBCXX_SIMD_INTRINSIC constexpr void
+  [[__gnu__::__flatten__]] _GLIBCXX_SIMD_INTRINSIC constexpr
+  void
   __execute_on_index_sequence(_Fp&& __f, index_sequence<_I...>)
   { ((void)__f(_SizeConstant<_I>()), ...); }
 
@@ -254,7 +255,8 @@ template <size_t _Np, typename _Fp>
 // }}}
 // __generate_from_n_evaluations{{{
 template <typename _R, typename _Fp, size_t... _I>
-  _GLIBCXX_SIMD_INTRINSIC constexpr _R
+  [[__gnu__::__flatten__]] _GLIBCXX_SIMD_INTRINSIC constexpr
+  _R
   __execute_on_index_sequence_with_return(_Fp&& __f, index_sequence<_I...>)
   { return _R{__f(_SizeConstant<_I>())...}; }
 
@@ -269,7 +271,8 @@ template <size_t _Np, typename _R, typename _Fp>
 // }}}
 // __call_with_n_evaluations{{{
 template <size_t... _I, typename _F0, typename _FArgs>
-  _GLIBCXX_SIMD_INTRINSIC constexpr auto
+  [[__gnu__::__flatten__]] _GLIBCXX_SIMD_INTRINSIC constexpr
+  auto
   __call_with_n_evaluations(index_sequence<_I...>, _F0&& __f0, _FArgs&& __fargs)
   { return __f0(__fargs(_SizeConstant<_I>())...); }
 
@@ -285,7 +288,8 @@ template <size_t _Np, typename _F0, typename _FArgs>
 // }}}
 // __call_with_subscripts{{{
 template <size_t _First = 0, size_t... _It, typename _Tp, typename _Fp>
-  _GLIBCXX_SIMD_INTRINSIC constexpr auto
+  [[__gnu__::__flatten__]] _GLIBCXX_SIMD_INTRINSIC constexpr
+  auto
   __call_with_subscripts(_Tp&& __x, index_sequence<_It...>, _Fp&& __fun)
   { return __fun(__x[_First + _It]...); }
 


* [PATCH 10/11] libstdc++: Fix internal names: add missing underscores
  2021-06-08 12:10 [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Matthias Kretz
                   ` (8 preceding siblings ...)
  2021-06-08 12:11 ` [PATCH 09/11] libstdc++: Ensure unrolled loops inline the lambda Matthias Kretz
@ 2021-06-08 12:12 ` Matthias Kretz
  2021-06-08 12:12 ` [PATCH 11/11] libstdc++: Fix ODR issues with different -m flags Matthias Kretz
  2021-06-24 13:42 ` [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Jonathan Wakely
  11 siblings, 0 replies; 29+ messages in thread
From: Matthias Kretz @ 2021-06-08 12:12 UTC (permalink / raw)
  To: gcc-patches, libstdc++

[-- Attachment #1: Type: text/plain, Size: 1078 bytes --]



From: Matthias Kretz <kretz@kde.org>

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>

libstdc++-v3/ChangeLog:

	* include/experimental/bits/simd_math.h
	(_GLIBCXX_SIMD_MATH_CALL2_): Rename arg2_ to __arg2.
	(_GLIBCXX_SIMD_MATH_CALL3_): Rename arg2_ to __arg2 and arg3_ to
	__arg3.
---
 libstdc++-v3/include/experimental/bits/simd_math.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)



[-- Attachment #2: 0010-libstdc-Fix-internal-names-add-missing-underscores.patch --]
[-- Type: text/x-patch, Size: 2737 bytes --]

diff --git a/libstdc++-v3/include/experimental/bits/simd_math.h b/libstdc++-v3/include/experimental/bits/simd_math.h
index a5df2039970..61af9fc67af 100644
--- a/libstdc++-v3/include/experimental/bits/simd_math.h
+++ b/libstdc++-v3/include/experimental/bits/simd_math.h
@@ -119,10 +119,10 @@ template <typename _Up, typename _Tp, typename _Abi>
 
 //}}}
 // _GLIBCXX_SIMD_MATH_CALL2_ {{{
-#define _GLIBCXX_SIMD_MATH_CALL2_(__name, arg2_)                               \
+#define _GLIBCXX_SIMD_MATH_CALL2_(__name, __arg2)                              \
 template <                                                                     \
   typename _Tp, typename _Abi, typename...,                                    \
-  typename _Arg2 = _Extra_argument_type<arg2_, _Tp, _Abi>,                     \
+  typename _Arg2 = _Extra_argument_type<__arg2, _Tp, _Abi>,                    \
   typename _R = _Math_return_type_t<                                           \
     decltype(std::__name(declval<double>(), _Arg2::declval())), _Tp, _Abi>>    \
   enable_if_t<is_floating_point_v<_Tp>, _R>                                    \
@@ -137,7 +137,7 @@ template <typename _Up, typename _Tp, typename _Abi>                           \
       declval<double>(),                                                       \
       declval<enable_if_t<                                                     \
 	conjunction_v<                                                         \
-	  is_same<arg2_, _Tp>,                                                 \
+	  is_same<__arg2, _Tp>,                                                \
 	  negation<is_same<__remove_cvref_t<_Up>, simd<_Tp, _Abi>>>,           \
 	  is_convertible<_Up, simd<_Tp, _Abi>>, is_floating_point<_Tp>>,       \
 	double>>())),                                                          \
@@ -147,10 +147,10 @@ template <typename _Up, typename _Tp, typename _Abi>                           \
 
 // }}}
 // _GLIBCXX_SIMD_MATH_CALL3_ {{{
-#define _GLIBCXX_SIMD_MATH_CALL3_(__name, arg2_, arg3_)                        \
+#define _GLIBCXX_SIMD_MATH_CALL3_(__name, __arg2, __arg3)                      \
 template <typename _Tp, typename _Abi, typename...,                            \
-	  typename _Arg2 = _Extra_argument_type<arg2_, _Tp, _Abi>,             \
-	  typename _Arg3 = _Extra_argument_type<arg3_, _Tp, _Abi>,             \
+	  typename _Arg2 = _Extra_argument_type<__arg2, _Tp, _Abi>,            \
+	  typename _Arg3 = _Extra_argument_type<__arg3, _Tp, _Abi>,            \
 	  typename _R = _Math_return_type_t<                                   \
 	    decltype(std::__name(declval<double>(), _Arg2::declval(),          \
 				 _Arg3::declval())),                           \


* [PATCH 11/11] libstdc++: Fix ODR issues with different -m flags
  2021-06-08 12:10 [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Matthias Kretz
                   ` (9 preceding siblings ...)
  2021-06-08 12:12 ` [PATCH 10/11] libstdc++: Fix internal names: add missing underscores Matthias Kretz
@ 2021-06-08 12:12 ` Matthias Kretz
  2021-06-09 12:22   ` Richard Biener
  2021-11-15  8:57   ` Matthias Kretz
  2021-06-24 13:42 ` [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Jonathan Wakely
  11 siblings, 2 replies; 29+ messages in thread
From: Matthias Kretz @ 2021-06-08 12:12 UTC (permalink / raw)
  To: gcc-patches, libstdc++

[-- Attachment #1: Type: text/plain, Size: 3618 bytes --]


From: Matthias Kretz <kretz@kde.org>

Explicitly support use of the stdx::simd implementation in situations
where the user links TUs that were compiled with different -m flags. In
general, this is always a (quasi) ODR violation for inline functions
because at least codegen may differ in important ways. In the resulting
executable, however, only one of the definitions (unspecified which) may
end up being used. For simd we want to let users compile code multiple
times with different -m flags and dispatch at runtime to the TU matching
the target CPU. But if internal functions are not inlined, this may lead
to unexpected performance loss or execution of illegal instructions.
Therefore, inline functions that are not marked as always_inline must
use an additional template parameter somewhere in their name, to
disambiguate between the different -m translations.
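
The idea can be sketched outside of libstdc++ as follows. This is a
minimal illustration under stated assumptions, not the code from the
patch: the namespace odr_sketch, the function add3, and the reduced set
of three feature bits are hypothetical, and only the compiler-predefined
macros __SSE2__, __AVX__, and __AVX2__ are consulted. The default
template argument becomes part of the mangled name, so TUs compiled with
different -m flags emit different symbols for functions that are not
inlined.

```cpp
#include <cstdint>
#include <type_traits>

namespace odr_sketch
{
#ifdef __SSE2__
  constexpr bool have_sse2 = true;
#else
  constexpr bool have_sse2 = false;
#endif
#ifdef __AVX__
  constexpr bool have_avx = true;
#else
  constexpr bool have_avx = false;
#endif
#ifdef __AVX2__
  constexpr bool have_avx2 = true;
#else
  constexpr bool have_avx2 = false;
#endif

  // Packs the detected features into one number (one bit per feature).
  constexpr std::uint_least64_t
  machine_flags()
  { return have_sse2 | (have_avx << 1) | (have_avx2 << 2); }

  namespace
  {
    // Internal linkage: a distinct type in every TU.
    struct OdrEnforcer {};
  }

  template <std::uint_least64_t...>
    struct MachineFlagsTemplate {};

  // Either a TU-local type or a type that encodes the machine flags.
  using odr_helper
    = std::conditional_t<machine_flags() == 0, OdrEnforcer,
			 MachineFlagsTemplate<machine_flags()>>;

  // Hypothetical library function that is not always_inline: the extra
  // defaulted template parameter makes its symbol depend on the flags.
  template <typename T, typename = odr_helper>
    constexpr T
    add3(T a, T b, T c)
    { return a + b + c; }
}

static_assert(odr_sketch::add3(1, 2, 3) == 6, "callers never name the tag");
```

Callers are unaffected because the tag is never spelled out at the call
site; only the emitted symbol differs between builds.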

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>

libstdc++-v3/ChangeLog:

	* include/experimental/bits/simd.h: Move feature detection bools
	and add __have_avx512bitalg, __have_avx512vbmi2,
	__have_avx512vbmi, __have_avx512ifma, __have_avx512cd,
	__have_avx512vnni, __have_avx512vpopcntdq.
	(__detail::__machine_flags): New function which returns a unique
	uint64 depending on relevant -m and -f flags.
	(__detail::__odr_helper): New type alias for either an anonymous
	type or a type specialized with the __machine_flags number.
	(_SimdIntOperators): Change template parameters from _Impl to
	_Tp, _Abi because _Impl now has an __odr_helper parameter which
	may be _OdrEnforcer from the anonymous namespace, which makes
	for a bad base class.
	(many): Either add __odr_helper template parameter or mark as
	always_inline.
	* include/experimental/bits/simd_detail.h: Add defines for
	AVX512BITALG, AVX512VBMI2, AVX512VBMI, AVX512IFMA, AVX512CD,
	AVX512VNNI, AVX512VPOPCNTDQ, and AVX512VP2INTERSECT.
	* include/experimental/bits/simd_builtin.h: Add __odr_helper
	template parameter or mark as always_inline.
	* include/experimental/bits/simd_fixed_size.h: Ditto.
	* include/experimental/bits/simd_math.h: Ditto.
	* include/experimental/bits/simd_scalar.h: Ditto.
	* include/experimental/bits/simd_neon.h: Add __odr_helper
	template parameter.
	* include/experimental/bits/simd_ppc.h: Ditto.
	* include/experimental/bits/simd_x86.h: Ditto.
---
 libstdc++-v3/include/experimental/bits/simd.h | 380 ++++++++++++------
 .../include/experimental/bits/simd_builtin.h  |  41 +-
 .../include/experimental/bits/simd_detail.h   |  40 ++
 .../experimental/bits/simd_fixed_size.h       |  39 +-
 .../include/experimental/bits/simd_math.h     |  45 ++-
 .../include/experimental/bits/simd_neon.h     |   4 +-
 .../include/experimental/bits/simd_ppc.h      |   4 +-
 .../include/experimental/bits/simd_scalar.h   |  71 +++-
 .../include/experimental/bits/simd_x86.h      |   4 +-
 9 files changed, 440 insertions(+), 188 deletions(-)



[-- Attachment #2: 0011-libstdc-Fix-ODR-issues-with-different-m-flags.patch --]
[-- Type: text/x-patch, Size: 53223 bytes --]

diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h
index 21100c1087d..43331134301 100644
--- a/libstdc++-v3/include/experimental/bits/simd.h
+++ b/libstdc++-v3/include/experimental/bits/simd.h
@@ -35,6 +35,7 @@
 #include <cstdio> // for stderr
 #endif
 #include <cstring>
+#include <cmath>
 #include <functional>
 #include <iosfwd>
 #include <utility>
@@ -203,9 +204,170 @@ template <size_t _Np>
 // }}}
 template <size_t _Xp>
   using _SizeConstant = integral_constant<size_t, _Xp>;
+// constexpr feature detection{{{
+constexpr inline bool __have_mmx = _GLIBCXX_SIMD_HAVE_MMX;
+constexpr inline bool __have_sse = _GLIBCXX_SIMD_HAVE_SSE;
+constexpr inline bool __have_sse2 = _GLIBCXX_SIMD_HAVE_SSE2;
+constexpr inline bool __have_sse3 = _GLIBCXX_SIMD_HAVE_SSE3;
+constexpr inline bool __have_ssse3 = _GLIBCXX_SIMD_HAVE_SSSE3;
+constexpr inline bool __have_sse4_1 = _GLIBCXX_SIMD_HAVE_SSE4_1;
+constexpr inline bool __have_sse4_2 = _GLIBCXX_SIMD_HAVE_SSE4_2;
+constexpr inline bool __have_xop = _GLIBCXX_SIMD_HAVE_XOP;
+constexpr inline bool __have_avx = _GLIBCXX_SIMD_HAVE_AVX;
+constexpr inline bool __have_avx2 = _GLIBCXX_SIMD_HAVE_AVX2;
+constexpr inline bool __have_bmi = _GLIBCXX_SIMD_HAVE_BMI1;
+constexpr inline bool __have_bmi2 = _GLIBCXX_SIMD_HAVE_BMI2;
+constexpr inline bool __have_lzcnt = _GLIBCXX_SIMD_HAVE_LZCNT;
+constexpr inline bool __have_sse4a = _GLIBCXX_SIMD_HAVE_SSE4A;
+constexpr inline bool __have_fma = _GLIBCXX_SIMD_HAVE_FMA;
+constexpr inline bool __have_fma4 = _GLIBCXX_SIMD_HAVE_FMA4;
+constexpr inline bool __have_f16c = _GLIBCXX_SIMD_HAVE_F16C;
+constexpr inline bool __have_popcnt = _GLIBCXX_SIMD_HAVE_POPCNT;
+constexpr inline bool __have_avx512f = _GLIBCXX_SIMD_HAVE_AVX512F;
+constexpr inline bool __have_avx512dq = _GLIBCXX_SIMD_HAVE_AVX512DQ;
+constexpr inline bool __have_avx512vl = _GLIBCXX_SIMD_HAVE_AVX512VL;
+constexpr inline bool __have_avx512bw = _GLIBCXX_SIMD_HAVE_AVX512BW;
+constexpr inline bool __have_avx512dq_vl = __have_avx512dq && __have_avx512vl;
+constexpr inline bool __have_avx512bw_vl = __have_avx512bw && __have_avx512vl;
+constexpr inline bool __have_avx512bitalg = _GLIBCXX_SIMD_HAVE_AVX512BITALG;
+constexpr inline bool __have_avx512vbmi2 = _GLIBCXX_SIMD_HAVE_AVX512VBMI2;
+constexpr inline bool __have_avx512vbmi = _GLIBCXX_SIMD_HAVE_AVX512VBMI;
+constexpr inline bool __have_avx512ifma = _GLIBCXX_SIMD_HAVE_AVX512IFMA;
+constexpr inline bool __have_avx512cd = _GLIBCXX_SIMD_HAVE_AVX512CD;
+constexpr inline bool __have_avx512vnni = _GLIBCXX_SIMD_HAVE_AVX512VNNI;
+constexpr inline bool __have_avx512vpopcntdq = _GLIBCXX_SIMD_HAVE_AVX512VPOPCNTDQ;
+constexpr inline bool __have_avx512vp2intersect = _GLIBCXX_SIMD_HAVE_AVX512VP2INTERSECT;
+
+constexpr inline bool __have_neon = _GLIBCXX_SIMD_HAVE_NEON;
+constexpr inline bool __have_neon_a32 = _GLIBCXX_SIMD_HAVE_NEON_A32;
+constexpr inline bool __have_neon_a64 = _GLIBCXX_SIMD_HAVE_NEON_A64;
+constexpr inline bool __support_neon_float =
+#if defined __GCC_IEC_559
+  __GCC_IEC_559 == 0;
+#elif defined __FAST_MATH__
+  true;
+#else
+  false;
+#endif
+
+#ifdef _ARCH_PWR10
+constexpr inline bool __have_power10vec = true;
+#else
+constexpr inline bool __have_power10vec = false;
+#endif
+#ifdef __POWER9_VECTOR__
+constexpr inline bool __have_power9vec = true;
+#else
+constexpr inline bool __have_power9vec = false;
+#endif
+#if defined __POWER8_VECTOR__
+constexpr inline bool __have_power8vec = true;
+#else
+constexpr inline bool __have_power8vec = __have_power9vec;
+#endif
+#if defined __VSX__
+constexpr inline bool __have_power_vsx = true;
+#else
+constexpr inline bool __have_power_vsx = __have_power8vec;
+#endif
+#if defined __ALTIVEC__
+constexpr inline bool __have_power_vmx = true;
+#else
+constexpr inline bool __have_power_vmx = __have_power_vsx;
+#endif
+
+// }}}
 
 namespace __detail
 {
+  constexpr std::uint_least64_t
+  __floating_point_flags()
+  {
+    std::uint_least64_t __flags = 0;
+    if constexpr (math_errhandling & MATH_ERREXCEPT)
+      __flags |= 1;
+#ifdef __FAST_MATH__
+    __flags |= 1 << 1;
+#elif __FINITE_MATH_ONLY__
+    __flags |= 2 << 1;
+#elif __GCC_IEC_559 < 2
+    __flags |= 3 << 1;
+#endif
+    __flags |= (__FLT_EVAL_METHOD__ + 1) << 3;
+    return __flags;
+  }
+
+  constexpr std::uint_least64_t
+  __machine_flags()
+  {
+    if constexpr (__have_mmx || __have_sse)
+      return __have_mmx
+		 | (__have_sse                << 1)
+		 | (__have_sse2               << 2)
+		 | (__have_sse3               << 3)
+		 | (__have_ssse3              << 4)
+		 | (__have_sse4_1             << 5)
+		 | (__have_sse4_2             << 6)
+		 | (__have_xop                << 7)
+		 | (__have_avx                << 8)
+		 | (__have_avx2               << 9)
+		 | (__have_bmi                << 10)
+		 | (__have_bmi2               << 11)
+		 | (__have_lzcnt              << 12)
+		 | (__have_sse4a              << 13)
+		 | (__have_fma                << 14)
+		 | (__have_fma4               << 15)
+		 | (__have_f16c               << 16)
+		 | (__have_popcnt             << 17)
+		 | (__have_avx512f            << 18)
+		 | (__have_avx512dq           << 19)
+		 | (__have_avx512vl           << 20)
+		 | (__have_avx512bw           << 21)
+		 | (__have_avx512bitalg       << 22)
+		 | (__have_avx512vbmi2        << 23)
+		 | (__have_avx512vbmi         << 24)
+		 | (__have_avx512ifma         << 25)
+		 | (__have_avx512cd           << 26)
+		 | (__have_avx512vnni         << 27)
+		 | (__have_avx512vpopcntdq    << 28)
+		 | (__have_avx512vp2intersect << 29);
+    else if constexpr (__have_neon)
+      return __have_neon
+	       | (__have_neon_a32 << 1)
+	       | (__have_neon_a64 << 2)
+	       | (__have_neon_a64 << 2)
+	       | (__support_neon_float << 3);
+    else if constexpr (__have_power_vmx)
+      return __have_power_vmx
+	       | (__have_power_vsx  << 1)
+	       | (__have_power8vec  << 2)
+	       | (__have_power9vec  << 3)
+	       | (__have_power10vec << 4);
+    else
+      return 0;
+  }
+
+  namespace
+  {
+    struct _OdrEnforcer {};
+  }
+
+  template <std::uint_least64_t...>
+    struct _MachineFlagsTemplate {};
+
+  /**@internal
+   * Use this type as default template argument to all function templates that
+   * are not declared always_inline. It ensures, that a function
+   * specialization, which the compiler decides not to inline, has a unique symbol
+   * (_OdrEnforcer) or a symbol matching the machine/architecture flags
+   * (_MachineFlagsTemplate). This helps to avoid ODR violations in cases where
+   * users link TUs compiled with different flags. This is especially important
+   * for using simd in libraries.
+   */
+  using __odr_helper
+    = conditional_t<__machine_flags() == 0, _OdrEnforcer,
+		    _MachineFlagsTemplate<__machine_flags(), __floating_point_flags()>>;
+
   struct _Minimum
   {
     template <typename _Tp>
@@ -469,71 +631,6 @@ template <int _Np>
 template <typename _Tp>
   inline constexpr bool __is_fixed_size_abi_v = __is_fixed_size_abi<_Tp>::value;
 
-// }}}
-// constexpr feature detection{{{
-constexpr inline bool __have_mmx = _GLIBCXX_SIMD_HAVE_MMX;
-constexpr inline bool __have_sse = _GLIBCXX_SIMD_HAVE_SSE;
-constexpr inline bool __have_sse2 = _GLIBCXX_SIMD_HAVE_SSE2;
-constexpr inline bool __have_sse3 = _GLIBCXX_SIMD_HAVE_SSE3;
-constexpr inline bool __have_ssse3 = _GLIBCXX_SIMD_HAVE_SSSE3;
-constexpr inline bool __have_sse4_1 = _GLIBCXX_SIMD_HAVE_SSE4_1;
-constexpr inline bool __have_sse4_2 = _GLIBCXX_SIMD_HAVE_SSE4_2;
-constexpr inline bool __have_xop = _GLIBCXX_SIMD_HAVE_XOP;
-constexpr inline bool __have_avx = _GLIBCXX_SIMD_HAVE_AVX;
-constexpr inline bool __have_avx2 = _GLIBCXX_SIMD_HAVE_AVX2;
-constexpr inline bool __have_bmi = _GLIBCXX_SIMD_HAVE_BMI1;
-constexpr inline bool __have_bmi2 = _GLIBCXX_SIMD_HAVE_BMI2;
-constexpr inline bool __have_lzcnt = _GLIBCXX_SIMD_HAVE_LZCNT;
-constexpr inline bool __have_sse4a = _GLIBCXX_SIMD_HAVE_SSE4A;
-constexpr inline bool __have_fma = _GLIBCXX_SIMD_HAVE_FMA;
-constexpr inline bool __have_fma4 = _GLIBCXX_SIMD_HAVE_FMA4;
-constexpr inline bool __have_f16c = _GLIBCXX_SIMD_HAVE_F16C;
-constexpr inline bool __have_popcnt = _GLIBCXX_SIMD_HAVE_POPCNT;
-constexpr inline bool __have_avx512f = _GLIBCXX_SIMD_HAVE_AVX512F;
-constexpr inline bool __have_avx512dq = _GLIBCXX_SIMD_HAVE_AVX512DQ;
-constexpr inline bool __have_avx512vl = _GLIBCXX_SIMD_HAVE_AVX512VL;
-constexpr inline bool __have_avx512bw = _GLIBCXX_SIMD_HAVE_AVX512BW;
-constexpr inline bool __have_avx512dq_vl = __have_avx512dq && __have_avx512vl;
-constexpr inline bool __have_avx512bw_vl = __have_avx512bw && __have_avx512vl;
-
-constexpr inline bool __have_neon = _GLIBCXX_SIMD_HAVE_NEON;
-constexpr inline bool __have_neon_a32 = _GLIBCXX_SIMD_HAVE_NEON_A32;
-constexpr inline bool __have_neon_a64 = _GLIBCXX_SIMD_HAVE_NEON_A64;
-constexpr inline bool __support_neon_float =
-#if defined __GCC_IEC_559
-  __GCC_IEC_559 == 0;
-#elif defined __FAST_MATH__
-  true;
-#else
-  false;
-#endif
-
-#ifdef _ARCH_PWR10
-constexpr inline bool __have_power10vec = true;
-#else
-constexpr inline bool __have_power10vec = false;
-#endif
-#ifdef __POWER9_VECTOR__
-constexpr inline bool __have_power9vec = true;
-#else
-constexpr inline bool __have_power9vec = false;
-#endif
-#if defined __POWER8_VECTOR__
-constexpr inline bool __have_power8vec = true;
-#else
-constexpr inline bool __have_power8vec = __have_power9vec;
-#endif
-#if defined __VSX__
-constexpr inline bool __have_power_vsx = true;
-#else
-constexpr inline bool __have_power_vsx = __have_power8vec;
-#endif
-#if defined __ALTIVEC__
-constexpr inline bool __have_power_vmx = true;
-#else
-constexpr inline bool __have_power_vmx = __have_power_vsx;
-#endif
-
 // }}}
 // __is_scalar_abi {{{
 template <typename _Abi>
@@ -3984,7 +4081,7 @@ template <typename _Tp, typename _A0, typename... _As>
 
 // }}}
 // concat(simd...) {{{
-template <typename _Tp, typename... _As>
+template <typename _Tp, typename... _As, typename = __detail::__odr_helper>
   inline _GLIBCXX_SIMD_CONSTEXPR
   simd<_Tp, simd_abi::deduce_t<_Tp, (simd_size_v<_Tp, _As> + ...)>>
   concat(const simd<_Tp, _As>&... __xs)
@@ -4567,6 +4664,7 @@ template <typename _Tp, typename _Abi>
       template <typename _Up, typename _A2,
 		typename
 		= enable_if_t<simd_size_v<_Up, _A2> == simd_size_v<_Tp, _Abi>>>
+	_GLIBCXX_SIMD_ALWAYS_INLINE
 	operator simd_mask<_Up, _A2>() &&
 	{
 	  using namespace std::experimental::__proposed;
@@ -4801,121 +4899,153 @@ find_last_set(_ExactBool)
 // }}}
 
 // _SimdIntOperators{{{1
-template <typename _V, typename _Impl, bool>
+template <typename _V, typename _Tp, typename _Abi, bool>
   class _SimdIntOperators {};
 
-template <typename _V, typename _Impl>
-  class _SimdIntOperators<_V, _Impl, true>
+template <typename _V, typename _Tp, typename _Abi>
+  class _SimdIntOperators<_V, _Tp, _Abi, true>
   {
+    using _Impl = typename _SimdTraits<_Tp, _Abi>::_SimdImpl;
+
     _GLIBCXX_SIMD_INTRINSIC const _V& __derived() const
     { return *static_cast<const _V*>(this); }
 
-    template <typename _Tp>
+    template <typename _Up>
       _GLIBCXX_SIMD_INTRINSIC static _GLIBCXX_SIMD_CONSTEXPR _V
-      _S_make_derived(_Tp&& __d)
-      { return {__private_init, static_cast<_Tp&&>(__d)}; }
+      _S_make_derived(_Up&& __d)
+      { return {__private_init, static_cast<_Up&&>(__d)}; }
 
   public:
-    _GLIBCXX_SIMD_CONSTEXPR friend _V& operator%=(_V& __lhs, const _V& __x)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V&
+    operator%=(_V& __lhs, const _V& __x)
     { return __lhs = __lhs % __x; }
 
-    _GLIBCXX_SIMD_CONSTEXPR friend _V& operator&=(_V& __lhs, const _V& __x)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V&
+    operator&=(_V& __lhs, const _V& __x)
     { return __lhs = __lhs & __x; }
 
-    _GLIBCXX_SIMD_CONSTEXPR friend _V& operator|=(_V& __lhs, const _V& __x)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V&
+    operator|=(_V& __lhs, const _V& __x)
     { return __lhs = __lhs | __x; }
 
-    _GLIBCXX_SIMD_CONSTEXPR friend _V& operator^=(_V& __lhs, const _V& __x)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V&
+    operator^=(_V& __lhs, const _V& __x)
     { return __lhs = __lhs ^ __x; }
 
-    _GLIBCXX_SIMD_CONSTEXPR friend _V& operator<<=(_V& __lhs, const _V& __x)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V&
+    operator<<=(_V& __lhs, const _V& __x)
     { return __lhs = __lhs << __x; }
 
-    _GLIBCXX_SIMD_CONSTEXPR friend _V& operator>>=(_V& __lhs, const _V& __x)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V&
+    operator>>=(_V& __lhs, const _V& __x)
     { return __lhs = __lhs >> __x; }
 
-    _GLIBCXX_SIMD_CONSTEXPR friend _V& operator<<=(_V& __lhs, int __x)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V&
+    operator<<=(_V& __lhs, int __x)
     { return __lhs = __lhs << __x; }
 
-    _GLIBCXX_SIMD_CONSTEXPR friend _V& operator>>=(_V& __lhs, int __x)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V&
+    operator>>=(_V& __lhs, int __x)
     { return __lhs = __lhs >> __x; }
 
-    _GLIBCXX_SIMD_CONSTEXPR friend _V operator%(const _V& __x, const _V& __y)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V
+    operator%(const _V& __x, const _V& __y)
     {
       return _SimdIntOperators::_S_make_derived(
 	_Impl::_S_modulus(__data(__x), __data(__y)));
     }
 
-    _GLIBCXX_SIMD_CONSTEXPR friend _V operator&(const _V& __x, const _V& __y)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V
+    operator&(const _V& __x, const _V& __y)
     {
       return _SimdIntOperators::_S_make_derived(
 	_Impl::_S_bit_and(__data(__x), __data(__y)));
     }
 
-    _GLIBCXX_SIMD_CONSTEXPR friend _V operator|(const _V& __x, const _V& __y)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V
+    operator|(const _V& __x, const _V& __y)
     {
       return _SimdIntOperators::_S_make_derived(
 	_Impl::_S_bit_or(__data(__x), __data(__y)));
     }
 
-    _GLIBCXX_SIMD_CONSTEXPR friend _V operator^(const _V& __x, const _V& __y)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V
+    operator^(const _V& __x, const _V& __y)
     {
       return _SimdIntOperators::_S_make_derived(
 	_Impl::_S_bit_xor(__data(__x), __data(__y)));
     }
 
-    _GLIBCXX_SIMD_CONSTEXPR friend _V operator<<(const _V& __x, const _V& __y)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V
+    operator<<(const _V& __x, const _V& __y)
     {
       return _SimdIntOperators::_S_make_derived(
 	_Impl::_S_bit_shift_left(__data(__x), __data(__y)));
     }
 
-    _GLIBCXX_SIMD_CONSTEXPR friend _V operator>>(const _V& __x, const _V& __y)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V
+    operator>>(const _V& __x, const _V& __y)
     {
       return _SimdIntOperators::_S_make_derived(
 	_Impl::_S_bit_shift_right(__data(__x), __data(__y)));
     }
 
-    template <typename _VV = _V>
-      _GLIBCXX_SIMD_CONSTEXPR friend _V operator<<(const _V& __x, int __y)
-      {
-	using _Tp = typename _VV::value_type;
-	if (__y < 0)
-	  __invoke_ub("The behavior is undefined if the right operand of a "
-		      "shift operation is negative. [expr.shift]\nA shift by "
-		      "%d was requested",
-		      __y);
-	if (size_t(__y) >= sizeof(declval<_Tp>() << __y) * __CHAR_BIT__)
-	  __invoke_ub(
-	    "The behavior is undefined if the right operand of a "
-	    "shift operation is greater than or equal to the width of the "
-	    "promoted left operand. [expr.shift]\nA shift by %d was requested",
-	    __y);
-	return _SimdIntOperators::_S_make_derived(
-	  _Impl::_S_bit_shift_left(__data(__x), __y));
-      }
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V
+    operator<<(const _V& __x, int __y)
+    {
+      if (__y < 0)
+	__invoke_ub("The behavior is undefined if the right operand of a "
+		    "shift operation is negative. [expr.shift]\nA shift by "
+		    "%d was requested",
+		    __y);
+      if (size_t(__y) >= sizeof(declval<_Tp>() << __y) * __CHAR_BIT__)
+	__invoke_ub(
+	  "The behavior is undefined if the right operand of a "
+	  "shift operation is greater than or equal to the width of the "
+	  "promoted left operand. [expr.shift]\nA shift by %d was requested",
+	  __y);
+      return _SimdIntOperators::_S_make_derived(
+	_Impl::_S_bit_shift_left(__data(__x), __y));
+    }
 
-    template <typename _VV = _V>
-      _GLIBCXX_SIMD_CONSTEXPR friend _V operator>>(const _V& __x, int __y)
-      {
-	using _Tp = typename _VV::value_type;
-	if (__y < 0)
-	  __invoke_ub(
-	    "The behavior is undefined if the right operand of a shift "
-	    "operation is negative. [expr.shift]\nA shift by %d was requested",
-	    __y);
-	if (size_t(__y) >= sizeof(declval<_Tp>() << __y) * __CHAR_BIT__)
-	  __invoke_ub(
-	    "The behavior is undefined if the right operand of a shift "
-	    "operation is greater than or equal to the width of the promoted "
-	    "left operand. [expr.shift]\nA shift by %d was requested",
-	    __y);
-	return _SimdIntOperators::_S_make_derived(
-	  _Impl::_S_bit_shift_right(__data(__x), __y));
-      }
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V
+    operator>>(const _V& __x, int __y)
+    {
+      if (__y < 0)
+	__invoke_ub(
+	  "The behavior is undefined if the right operand of a shift "
+	  "operation is negative. [expr.shift]\nA shift by %d was requested",
+	  __y);
+      if (size_t(__y) >= sizeof(declval<_Tp>() << __y) * __CHAR_BIT__)
+	__invoke_ub(
+	  "The behavior is undefined if the right operand of a shift "
+	  "operation is greater than or equal to the width of the promoted "
+	  "left operand. [expr.shift]\nA shift by %d was requested",
+	  __y);
+      return _SimdIntOperators::_S_make_derived(
+	_Impl::_S_bit_shift_right(__data(__x), __y));
+    }
 
     // unary operators (for integral _Tp)
-    _GLIBCXX_SIMD_CONSTEXPR _V operator~() const
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR
+    _V
+    operator~() const
     { return {__private_init, _Impl::_S_complement(__derived()._M_data)}; }
   };
 
@@ -4924,7 +5054,7 @@ template <typename _V, typename _Impl>
 // simd {{{
 template <typename _Tp, typename _Abi>
   class simd : public _SimdIntOperators<
-		 simd<_Tp, _Abi>, typename _SimdTraits<_Tp, _Abi>::_SimdImpl,
+		 simd<_Tp, _Abi>, _Tp, _Abi,
 		 conjunction<is_integral<_Tp>,
 			     typename _SimdTraits<_Tp, _Abi>::_IsValid>::value>,
 	       public _SimdTraits<_Tp, _Abi>::_SimdBase
@@ -4938,7 +5068,7 @@ template <typename _Tp, typename _Abi>
   public:
     using _Impl = typename _Traits::_SimdImpl;
     friend _Impl;
-    friend _SimdIntOperators<simd, _Impl, true>;
+    friend _SimdIntOperators<simd, _Tp, _Abi, true>;
 
     using value_type = _Tp;
     using reference = _SmartReference<_MemberType, _Impl, value_type>;
diff --git a/libstdc++-v3/include/experimental/bits/simd_builtin.h b/libstdc++-v3/include/experimental/bits/simd_builtin.h
index 8cd338e313f..55fea77d4ab 100644
--- a/libstdc++-v3/include/experimental/bits/simd_builtin.h
+++ b/libstdc++-v3/include/experimental/bits/simd_builtin.h
@@ -50,7 +50,8 @@ template <typename _V, typename = _VectorTraits<_V>>
 //}}}
 // __vector_permute<Indices...>{{{
 // Index == -1 requests zeroing of the output element
-template <int... _Indices, typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+template <int... _Indices, typename _Tp, typename _TVT = _VectorTraits<_Tp>,
+	  typename = __detail::__odr_helper>
   _Tp
   __vector_permute(_Tp __x)
   {
@@ -62,7 +63,8 @@ template <int... _Indices, typename _Tp, typename _TVT = _VectorTraits<_Tp>>
 // }}}
 // __vector_shuffle<Indices...>{{{
 // Index == -1 requests zeroing of the output element
-template <int... _Indices, typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+template <int... _Indices, typename _Tp, typename _TVT = _VectorTraits<_Tp>,
+	  typename = __detail::__odr_helper>
   _Tp
   __vector_shuffle(_Tp __x, _Tp __y)
   {
@@ -820,10 +822,12 @@ template <typename _Tp, typename _Mp, typename _Abi, size_t _Np>
     // _SimdBase / base class for simd, providing extra conversions {{{
     struct _SimdBase2
     {
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       explicit operator __intrinsic_type_t<_Tp, _Np>() const
       {
 	return __to_intrin(static_cast<const simd<_Tp, _Abi>*>(this)->_M_data);
       }
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       explicit operator __vector_type_t<_Tp, _Np>() const
       {
 	return static_cast<const simd<_Tp, _Abi>*>(this)->_M_data.__builtin();
@@ -832,6 +836,7 @@ template <typename _Tp, typename _Mp, typename _Abi, size_t _Np>
 
     struct _SimdBase1
     {
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       explicit operator __intrinsic_type_t<_Tp, _Np>() const
       { return __data(*static_cast<const simd<_Tp, _Abi>*>(this)); }
     };
@@ -844,11 +849,13 @@ template <typename _Tp, typename _Mp, typename _Abi, size_t _Np>
     // _MaskBase {{{
     struct _MaskBase2
     {
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       explicit operator __intrinsic_type_t<_Tp, _Np>() const
       {
 	return static_cast<const simd_mask<_Tp, _Abi>*>(this)
 	  ->_M_data.__intrin();
       }
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       explicit operator __vector_type_t<_Tp, _Np>() const
       {
 	return static_cast<const simd_mask<_Tp, _Abi>*>(this)->_M_data._M_data;
@@ -857,6 +864,7 @@ template <typename _Tp, typename _Mp, typename _Abi, size_t _Np>
 
     struct _MaskBase1
     {
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       explicit operator __intrinsic_type_t<_Tp, _Np>() const
       { return __data(*static_cast<const simd_mask<_Tp, _Abi>*>(this)); }
     };
@@ -874,7 +882,9 @@ template <typename _Tp, typename _Mp, typename _Abi, size_t _Np>
       _Up _M_data;
 
     public:
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       _MaskCastType(_Up __x) : _M_data(__x) {}
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       operator _MaskMember() const { return _M_data; }
     };
 
@@ -887,7 +897,9 @@ template <typename _Tp, typename _Mp, typename _Abi, size_t _Np>
       _SimdMember _M_data;
 
     public:
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       _SimdCastType1(_Ap __a) : _M_data(__vector_bitcast<_Tp>(__a)) {}
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       operator _SimdMember() const { return _M_data; }
     };
 
@@ -898,8 +910,11 @@ template <typename _Tp, typename _Mp, typename _Abi, size_t _Np>
       _SimdMember _M_data;
 
     public:
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       _SimdCastType2(_Ap __a) : _M_data(__vector_bitcast<_Tp>(__a)) {}
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       _SimdCastType2(_Bp __b) : _M_data(__b) {}
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       operator _SimdMember() const { return _M_data; }
     };
 
@@ -913,14 +928,14 @@ template <typename _Tp, typename _Mp, typename _Abi, size_t _Np>
 struct _CommonImplX86;
 struct _CommonImplNeon;
 struct _CommonImplBuiltin;
-template <typename _Abi> struct _SimdImplBuiltin;
-template <typename _Abi> struct _MaskImplBuiltin;
-template <typename _Abi> struct _SimdImplX86;
-template <typename _Abi> struct _MaskImplX86;
-template <typename _Abi> struct _SimdImplNeon;
-template <typename _Abi> struct _MaskImplNeon;
-template <typename _Abi> struct _SimdImplPpc;
-template <typename _Abi> struct _MaskImplPpc;
+template <typename _Abi, typename = __detail::__odr_helper> struct _SimdImplBuiltin;
+template <typename _Abi, typename = __detail::__odr_helper> struct _MaskImplBuiltin;
+template <typename _Abi, typename = __detail::__odr_helper> struct _SimdImplX86;
+template <typename _Abi, typename = __detail::__odr_helper> struct _MaskImplX86;
+template <typename _Abi, typename = __detail::__odr_helper> struct _SimdImplNeon;
+template <typename _Abi, typename = __detail::__odr_helper> struct _MaskImplNeon;
+template <typename _Abi, typename = __detail::__odr_helper> struct _SimdImplPpc;
+template <typename _Abi, typename = __detail::__odr_helper> struct _MaskImplPpc;
 
 // simd_abi::_VecBuiltin {{{
 template <int _UsedBytes>
@@ -1369,7 +1384,7 @@ struct _CommonImplBuiltin
 
 // }}}
 // _SimdImplBuiltin {{{1
-template <typename _Abi>
+template <typename _Abi, typename>
   struct _SimdImplBuiltin
   {
     // member types {{{2
@@ -2618,7 +2633,7 @@ struct _MaskImplBuiltinMixin
 };
 
 // _MaskImplBuiltin {{{1
-template <typename _Abi>
+template <typename _Abi, typename>
   struct _MaskImplBuiltin : _MaskImplBuiltinMixin
   {
     using _MaskImplBuiltinMixin::_S_to_bits;
@@ -2953,4 +2968,4 @@ _GLIBCXX_SIMD_END_NAMESPACE
 #endif // __cplusplus >= 201703L
 #endif // _GLIBCXX_EXPERIMENTAL_SIMD_ABIS_H_
 
-// vim: foldmethod=marker foldmarker={{{,}}} sw=2 noet ts=8 sts=2 tw=80
+// vim: foldmethod=marker foldmarker={{{,}}} sw=2 noet ts=8 sts=2 tw=100
diff --git a/libstdc++-v3/include/experimental/bits/simd_detail.h b/libstdc++-v3/include/experimental/bits/simd_detail.h
index 1e75812d098..78ad33f74e4 100644
--- a/libstdc++-v3/include/experimental/bits/simd_detail.h
+++ b/libstdc++-v3/include/experimental/bits/simd_detail.h
@@ -172,6 +172,46 @@
 #else
 #define _GLIBCXX_SIMD_HAVE_AVX512BW 0
 #endif
+#ifdef __AVX512BITALG__
+#define _GLIBCXX_SIMD_HAVE_AVX512BITALG 1
+#else
+#define _GLIBCXX_SIMD_HAVE_AVX512BITALG 0
+#endif
+#ifdef __AVX512VBMI2__
+#define _GLIBCXX_SIMD_HAVE_AVX512VBMI2 1
+#else
+#define _GLIBCXX_SIMD_HAVE_AVX512VBMI2 0
+#endif
+#ifdef __AVX512VBMI__
+#define _GLIBCXX_SIMD_HAVE_AVX512VBMI 1
+#else
+#define _GLIBCXX_SIMD_HAVE_AVX512VBMI 0
+#endif
+#ifdef __AVX512IFMA__
+#define _GLIBCXX_SIMD_HAVE_AVX512IFMA 1
+#else
+#define _GLIBCXX_SIMD_HAVE_AVX512IFMA 0
+#endif
+#ifdef __AVX512CD__
+#define _GLIBCXX_SIMD_HAVE_AVX512CD 1
+#else
+#define _GLIBCXX_SIMD_HAVE_AVX512CD 0
+#endif
+#ifdef __AVX512VNNI__
+#define _GLIBCXX_SIMD_HAVE_AVX512VNNI 1
+#else
+#define _GLIBCXX_SIMD_HAVE_AVX512VNNI 0
+#endif
+#ifdef __AVX512VPOPCNTDQ__
+#define _GLIBCXX_SIMD_HAVE_AVX512VPOPCNTDQ 1
+#else
+#define _GLIBCXX_SIMD_HAVE_AVX512VPOPCNTDQ 0
+#endif
+#ifdef __AVX512VP2INTERSECT__
+#define _GLIBCXX_SIMD_HAVE_AVX512VP2INTERSECT 1
+#else
+#define _GLIBCXX_SIMD_HAVE_AVX512VP2INTERSECT 0
+#endif
 
 #if _GLIBCXX_SIMD_HAVE_SSE
 #define _GLIBCXX_SIMD_HAVE_SSE_ABI 1
diff --git a/libstdc++-v3/include/experimental/bits/simd_fixed_size.h b/libstdc++-v3/include/experimental/bits/simd_fixed_size.h
index dc2fb90b9b2..5a742ed52e1 100644
--- a/libstdc++-v3/include/experimental/bits/simd_fixed_size.h
+++ b/libstdc++-v3/include/experimental/bits/simd_fixed_size.h
@@ -201,6 +201,7 @@ template <typename _Tp, typename _Abi, size_t _Offset>
   };
 
 template <size_t _Offset, typename _Tp, typename _Abi, typename... _As>
+  _GLIBCXX_SIMD_INTRINSIC
   __tuple_element_meta<_Tp, _Abi, _Offset>
   __make_meta(const _SimdTuple<_Tp, _Abi, _As...>&)
   { return {}; }
@@ -230,11 +231,13 @@ template <size_t _O0, size_t _O1, typename _Base>
   struct _WithOffset<_O0, _WithOffset<_O1, _Base>> {};
 
 template <size_t _Offset, typename _Tp>
+  _GLIBCXX_SIMD_INTRINSIC
   decltype(auto)
   __add_offset(_Tp& __base)
   { return static_cast<_WithOffset<_Offset, __remove_cvref_t<_Tp>>&>(__base); }
 
 template <size_t _Offset, typename _Tp>
+  _GLIBCXX_SIMD_INTRINSIC
   decltype(auto)
   __add_offset(const _Tp& __base)
   {
@@ -243,6 +246,7 @@ template <size_t _Offset, typename _Tp>
   }
 
 template <size_t _Offset, size_t _ExistingOffset, typename _Tp>
+  _GLIBCXX_SIMD_INTRINSIC
   decltype(auto)
   __add_offset(_WithOffset<_ExistingOffset, _Tp>& __base)
   {
@@ -251,6 +255,7 @@ template <size_t _Offset, size_t _ExistingOffset, typename _Tp>
   }
 
 template <size_t _Offset, size_t _ExistingOffset, typename _Tp>
+  _GLIBCXX_SIMD_INTRINSIC
   decltype(auto)
   __add_offset(const _WithOffset<_ExistingOffset, _Tp>& __base)
   {
@@ -586,6 +591,7 @@ template <typename _Tp, typename _Abi0, typename... _Abis>
 	  return second[integral_constant<_Up, _I - simd_size_v<_Tp, _Abi0>>()];
       }
 
+    _GLIBCXX_SIMD_INTRINSIC
     _Tp operator[](size_t __i) const noexcept
     {
       if constexpr (_S_tuple_size == 1)
@@ -608,6 +614,7 @@ template <typename _Tp, typename _Abi0, typename... _Abis>
 	}
     }
 
+    _GLIBCXX_SIMD_INTRINSIC
     void _M_set(size_t __i, _Tp __val) noexcept
     {
       if constexpr (_S_tuple_size == 1)
@@ -627,6 +634,7 @@ template <typename _Tp, typename _Abi0, typename... _Abis>
 
   private:
     // _M_subscript_read/_write {{{
+    _GLIBCXX_SIMD_INTRINSIC
     _Tp _M_subscript_read([[maybe_unused]] size_t __i) const noexcept
     {
       if constexpr (__is_vectorizable_v<_FirstType>)
@@ -635,6 +643,7 @@ template <typename _Tp, typename _Abi0, typename... _Abis>
 	return first[__i];
     }
 
+    _GLIBCXX_SIMD_INTRINSIC
     void _M_subscript_write([[maybe_unused]] size_t __i, _Tp __y) noexcept
     {
       if constexpr (__is_vectorizable_v<_FirstType>)
@@ -1033,9 +1042,11 @@ template <typename _Tp, bool = is_arithmetic_v<__remove_cvref_t<_Tp>>>
     _Tp _M_data;
     using _TT = __remove_cvref_t<_Tp>;
 
+    _GLIBCXX_SIMD_INTRINSIC
     operator _TT()
     { return _M_data; }
 
+    _GLIBCXX_SIMD_INTRINSIC
     operator _TT&()
     {
       static_assert(is_lvalue_reference<_Tp>::value, "");
@@ -1043,6 +1054,7 @@ template <typename _Tp, bool = is_arithmetic_v<__remove_cvref_t<_Tp>>>
       return _M_data;
     }
 
+    _GLIBCXX_SIMD_INTRINSIC
     operator _TT*()
     {
       static_assert(is_lvalue_reference<_Tp>::value, "");
@@ -1050,13 +1062,16 @@ template <typename _Tp, bool = is_arithmetic_v<__remove_cvref_t<_Tp>>>
       return &_M_data;
     }
 
-    constexpr inline __autocvt_to_simd(_Tp dd) : _M_data(dd) {}
+    _GLIBCXX_SIMD_INTRINSIC
+    constexpr __autocvt_to_simd(_Tp dd) : _M_data(dd) {}
 
     template <typename _Abi>
+      _GLIBCXX_SIMD_INTRINSIC
       operator simd<typename _TT::value_type, _Abi>()
       { return {__private_init, _M_data}; }
 
     template <typename _Abi>
+      _GLIBCXX_SIMD_INTRINSIC
       operator simd<typename _TT::value_type, _Abi>&()
       {
 	return *reinterpret_cast<simd<typename _TT::value_type, _Abi>*>(
@@ -1064,6 +1079,7 @@ template <typename _Tp, bool = is_arithmetic_v<__remove_cvref_t<_Tp>>>
       }
 
     template <typename _Abi>
+      _GLIBCXX_SIMD_INTRINSIC
       operator simd<typename _TT::value_type, _Abi>*()
       {
 	return reinterpret_cast<simd<typename _TT::value_type, _Abi>*>(
@@ -1081,14 +1097,18 @@ template <typename _Tp>
     _Tp _M_data;
     fixed_size_simd<_TT, 1> _M_fd;
 
-    constexpr inline __autocvt_to_simd(_Tp dd) : _M_data(dd), _M_fd(_M_data) {}
+    _GLIBCXX_SIMD_INTRINSIC
+    constexpr __autocvt_to_simd(_Tp dd) : _M_data(dd), _M_fd(_M_data) {}
 
+    _GLIBCXX_SIMD_INTRINSIC
     ~__autocvt_to_simd()
     { _M_data = __data(_M_fd).first; }
 
+    _GLIBCXX_SIMD_INTRINSIC
     operator fixed_size_simd<_TT, 1>()
     { return _M_fd; }
 
+    _GLIBCXX_SIMD_INTRINSIC
     operator fixed_size_simd<_TT, 1> &()
     {
       static_assert(is_lvalue_reference<_Tp>::value, "");
@@ -1096,6 +1116,7 @@ template <typename _Tp>
       return _M_fd;
     }
 
+    _GLIBCXX_SIMD_INTRINSIC
     operator fixed_size_simd<_TT, 1> *()
     {
       static_assert(is_lvalue_reference<_Tp>::value, "");
@@ -1107,8 +1128,8 @@ template <typename _Tp>
 // }}}
 
 struct _CommonImplFixedSize;
-template <int _Np> struct _SimdImplFixedSize;
-template <int _Np> struct _MaskImplFixedSize;
+template <int _Np, typename = __detail::__odr_helper> struct _SimdImplFixedSize;
+template <int _Np, typename = __detail::__odr_helper> struct _MaskImplFixedSize;
 // simd_abi::_Fixed {{{
 template <int _Np>
   struct simd_abi::_Fixed
@@ -1172,12 +1193,15 @@ template <int _Np>
 	{
 	  // The following ensures, function arguments are passed via the stack.
 	  // This is important for ABI compatibility across TU boundaries
+	  _GLIBCXX_SIMD_ALWAYS_INLINE
 	  _SimdBase(const _SimdBase&) {}
 	  _SimdBase() = default;
 
+	  _GLIBCXX_SIMD_ALWAYS_INLINE
 	  explicit operator const _SimdMember &() const
 	  { return static_cast<const simd<_Tp, _Fixed>*>(this)->_M_data; }
 
+	  _GLIBCXX_SIMD_ALWAYS_INLINE
 	  explicit operator array<_Tp, _Np>() const
 	  {
 	    array<_Tp, _Np> __r;
@@ -1198,8 +1222,11 @@ template <int _Np>
 	// _SimdCastType {{{
 	struct _SimdCastType
 	{
+	  _GLIBCXX_SIMD_ALWAYS_INLINE
 	  _SimdCastType(const array<_Tp, _Np>&);
+	  _GLIBCXX_SIMD_ALWAYS_INLINE
 	  _SimdCastType(const _SimdMember& dd) : _M_data(dd) {}
+	  _GLIBCXX_SIMD_ALWAYS_INLINE
 	  explicit operator const _SimdMember &() const { return _M_data; }
 
 	private:
@@ -1237,7 +1264,7 @@ struct _CommonImplFixedSize
 // _SimdImplFixedSize {{{1
 // fixed_size should not inherit from _SimdMathFallback in order for
 // specializations in the used _SimdTuple Abis to get used
-template <int _Np>
+template <int _Np, typename>
   struct _SimdImplFixedSize
   {
     // member types {{{2
@@ -1794,7 +1821,7 @@ template <int _Np>
   };
 
 // _MaskImplFixedSize {{{1
-template <int _Np>
+template <int _Np, typename>
   struct _MaskImplFixedSize
   {
     static_assert(
diff --git a/libstdc++-v3/include/experimental/bits/simd_math.h b/libstdc++-v3/include/experimental/bits/simd_math.h
index 61af9fc67af..01061a75a5e 100644
--- a/libstdc++-v3/include/experimental/bits/simd_math.h
+++ b/libstdc++-v3/include/experimental/bits/simd_math.h
@@ -60,6 +60,7 @@ template <typename _DoubleR, typename _Tp, typename _Abi>
 template <typename _Tp, typename _Abi, typename...,                            \
 	  typename _R = _Math_return_type_t<                                   \
 	    decltype(std::__name(declval<double>())), _Tp, _Abi>>              \
+  _GLIBCXX_SIMD_ALWAYS_INLINE                                                  \
   enable_if_t<is_floating_point_v<_Tp>, _R>                                    \
   __name(simd<_Tp, _Abi> __x)                                                  \
   { return {__private_init, _Abi::_SimdImpl::_S_##__name(__data(__x))}; }
@@ -125,6 +126,7 @@ template <                                                                     \
   typename _Arg2 = _Extra_argument_type<__arg2, _Tp, _Abi>,                    \
   typename _R = _Math_return_type_t<                                           \
     decltype(std::__name(declval<double>(), _Arg2::declval())), _Tp, _Abi>>    \
+  _GLIBCXX_SIMD_ALWAYS_INLINE                                                  \
   enable_if_t<is_floating_point_v<_Tp>, _R>                                    \
   __name(const simd<_Tp, _Abi>& __x, const typename _Arg2::type& __y)          \
   {                                                                            \
@@ -155,6 +157,7 @@ template <typename _Tp, typename _Abi, typename...,                            \
 	    decltype(std::__name(declval<double>(), _Arg2::declval(),          \
 				 _Arg3::declval())),                           \
 	    _Tp, _Abi>>                                                        \
+  _GLIBCXX_SIMD_ALWAYS_INLINE                                                  \
   enable_if_t<is_floating_point_v<_Tp>, _R>                                    \
   __name(const simd<_Tp, _Abi>& __x, const typename _Arg2::type& __y,          \
 	 const typename _Arg3::type& __z)                                      \
@@ -399,6 +402,7 @@ template <typename _Abi>
 // }}}
 // __extract_exponent_as_int {{{
 template <typename _Tp, typename _Abi>
+  _GLIBCXX_SIMD_INTRINSIC
   rebind_simd_t<int, simd<_Tp, _Abi>>
   __extract_exponent_as_int(const simd<_Tp, _Abi>& __v)
   {
@@ -421,7 +425,8 @@ template <typename ImplFun, typename FallbackFun, typename... _Args>
     -> decltype(__impl_fun(static_cast<_Args&&>(__args)...))
   { return __impl_fun(static_cast<_Args&&>(__args)...); }
 
-template <typename ImplFun, typename FallbackFun, typename... _Args>
+template <typename ImplFun, typename FallbackFun, typename... _Args,
+	  typename = __detail::__odr_helper>
   inline auto
   __impl_or_fallback_dispatch(float, ImplFun&&, FallbackFun&& __fallback_fun,
 			      _Args&&... __args)
@@ -457,7 +462,7 @@ _GLIBCXX_SIMD_MATH_CALL2_(atan2, _Tp)
  * Fix sign.
  */
 // cos{{{
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
   cos(const simd<_Tp, _Abi>& __x)
   {
@@ -503,7 +508,7 @@ template <typename _Tp>
 
 //}}}
 // sin{{{
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
   sin(const simd<_Tp, _Abi>& __x)
   {
@@ -565,6 +570,7 @@ _GLIBCXX_SIMD_MATH_CALL_(expm1)
 // frexp {{{
 #if _GLIBCXX_SIMD_X86INTRIN
 template <typename _Tp, size_t _Np>
+  _GLIBCXX_SIMD_INTRINSIC
   _SimdWrapper<_Tp, _Np>
   __getexp(_SimdWrapper<_Tp, _Np> __x)
   {
@@ -593,6 +599,7 @@ template <typename _Tp, size_t _Np>
   }
 
 template <typename _Tp, size_t _Np>
+  _GLIBCXX_SIMD_INTRINSIC
   _SimdWrapper<_Tp, _Np>
   __getmant_avx512(_SimdWrapper<_Tp, _Np> __x)
   {
@@ -633,7 +640,7 @@ template <typename _Tp, size_t _Np>
  * The return value will be in the range [0.5, 1.0[
  * The @p __e value will be an integer defining the power-of-two exponent
  */
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
   frexp(const simd<_Tp, _Abi>& __x, _Samesize<int, simd<_Tp, _Abi>>* __exp)
   {
@@ -738,7 +745,7 @@ _GLIBCXX_SIMD_MATH_CALL_(log2)
 
 //}}}
 // logb{{{
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point<_Tp>::value, simd<_Tp, _Abi>>
   logb(const simd<_Tp, _Abi>& __x)
   {
@@ -813,7 +820,7 @@ template <typename _Tp, typename _Abi>
   }
 
 //}}}
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
   modf(const simd<_Tp, _Abi>& __x, simd<_Tp, _Abi>* __iptr)
   {
@@ -847,6 +854,7 @@ _GLIBCXX_SIMD_MATH_CALL_(fabs)
 // [parallel.simd.math] only asks for is_floating_point_v<_Tp> and forgot to
 // allow signed integral _Tp
 template <typename _Tp, typename _Abi>
+  _GLIBCXX_SIMD_ALWAYS_INLINE
   enable_if_t<!is_floating_point_v<_Tp> && is_signed_v<_Tp>, simd<_Tp, _Abi>>
   abs(const simd<_Tp, _Abi>& __x)
   { return {__private_init, _Abi::_SimdImpl::_S_abs(__data(__x))}; }
@@ -929,7 +937,7 @@ template <typename _R, typename _ToApply, typename _Tp, typename... _Tps>
 	      __data(__args)...)};
   }
 
-template <typename _VV>
+template <typename _VV, typename = __detail::__odr_helper>
   __remove_cvref_t<_VV>
   __hypot(_VV __x, _VV __y)
   {
@@ -1067,7 +1075,7 @@ template <typename _Tp, typename _Abi>
 
 _GLIBCXX_SIMD_CVTING2(hypot)
 
-  template <typename _VV>
+  template <typename _VV, typename = __detail::__odr_helper>
   __remove_cvref_t<_VV>
   __hypot(_VV __x, _VV __y, _VV __z)
   {
@@ -1268,7 +1276,7 @@ _GLIBCXX_SIMD_MATH_CALL2_(fmod, _Tp)
 _GLIBCXX_SIMD_MATH_CALL2_(remainder, _Tp)
 _GLIBCXX_SIMD_MATH_CALL3_(remquo, _Tp, int*)
 
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
   copysign(const simd<_Tp, _Abi>& __x, const simd<_Tp, _Abi>& __y)
   {
@@ -1306,12 +1314,14 @@ _GLIBCXX_SIMD_MATH_CALL_(isfinite)
 // `int isinf(double)`.
 template <typename _Tp, typename _Abi, typename...,
 	  typename _R = _Math_return_type_t<bool, _Tp, _Abi>>
+  _GLIBCXX_SIMD_ALWAYS_INLINE
   enable_if_t<is_floating_point_v<_Tp>, _R>
   isinf(simd<_Tp, _Abi> __x)
   { return {__private_init, _Abi::_SimdImpl::_S_isinf(__data(__x))}; }
 
 template <typename _Tp, typename _Abi, typename...,
 	  typename _R = _Math_return_type_t<bool, _Tp, _Abi>>
+  _GLIBCXX_SIMD_ALWAYS_INLINE
   enable_if_t<is_floating_point_v<_Tp>, _R>
   isnan(simd<_Tp, _Abi> __x)
   { return {__private_init, _Abi::_SimdImpl::_S_isnan(__data(__x))}; }
@@ -1319,6 +1329,7 @@ template <typename _Tp, typename _Abi, typename...,
 _GLIBCXX_SIMD_MATH_CALL_(isnormal)
 
 template <typename..., typename _Tp, typename _Abi>
+  _GLIBCXX_SIMD_ALWAYS_INLINE
   simd_mask<_Tp, _Abi>
   signbit(simd<_Tp, _Abi> __x)
   {
@@ -1366,7 +1377,7 @@ simd_div_t<__llongv<_Abi>> div(__llongv<_Abi> numer,
 */
 
 // special math {{{
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
   assoc_laguerre(const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __n,
 		 const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __m,
@@ -1377,7 +1388,7 @@ template <typename _Tp, typename _Abi>
     });
   }
 
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
   assoc_legendre(const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __n,
 		 const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __m,
@@ -1401,7 +1412,7 @@ _GLIBCXX_SIMD_MATH_CALL2_(ellint_2, _Tp)
 _GLIBCXX_SIMD_MATH_CALL3_(ellint_3, _Tp, _Tp)
 _GLIBCXX_SIMD_MATH_CALL_(expint)
 
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
   hermite(const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __n,
 	  const simd<_Tp, _Abi>& __x)
@@ -1410,7 +1421,7 @@ template <typename _Tp, typename _Abi>
       [&](auto __i) { return std::hermite(__n[__i], __x[__i]); });
   }
 
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
   laguerre(const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __n,
 	   const simd<_Tp, _Abi>& __x)
@@ -1419,7 +1430,7 @@ template <typename _Tp, typename _Abi>
       [&](auto __i) { return std::laguerre(__n[__i], __x[__i]); });
   }
 
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
   legendre(const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __n,
 	   const simd<_Tp, _Abi>& __x)
@@ -1430,7 +1441,7 @@ template <typename _Tp, typename _Abi>
 
 _GLIBCXX_SIMD_MATH_CALL_(riemann_zeta)
 
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
   sph_bessel(const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __n,
 	     const simd<_Tp, _Abi>& __x)
@@ -1439,7 +1450,7 @@ template <typename _Tp, typename _Abi>
       [&](auto __i) { return std::sph_bessel(__n[__i], __x[__i]); });
   }
 
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
   sph_legendre(const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __l,
 	       const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __m,
@@ -1450,7 +1461,7 @@ template <typename _Tp, typename _Abi>
     });
   }
 
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
   sph_neumann(const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __n,
 	      const simd<_Tp, _Abi>& __x)
diff --git a/libstdc++-v3/include/experimental/bits/simd_neon.h b/libstdc++-v3/include/experimental/bits/simd_neon.h
index 7f472e88649..bbd26835d9c 100644
--- a/libstdc++-v3/include/experimental/bits/simd_neon.h
+++ b/libstdc++-v3/include/experimental/bits/simd_neon.h
@@ -44,7 +44,7 @@ struct _CommonImplNeon : _CommonImplBuiltin
 
 // }}}
 // _SimdImplNeon {{{
-template <typename _Abi>
+template <typename _Abi, typename>
   struct _SimdImplNeon : _SimdImplBuiltin<_Abi>
   {
     using _Base = _SimdImplBuiltin<_Abi>;
@@ -390,7 +390,7 @@ struct _MaskImplNeonMixin
 
 // }}}
 // _MaskImplNeon {{{
-template <typename _Abi>
+template <typename _Abi, typename>
   struct _MaskImplNeon : _MaskImplNeonMixin, _MaskImplBuiltin<_Abi>
   {
     using _MaskImplBuiltinMixin::_S_to_maskvector;
diff --git a/libstdc++-v3/include/experimental/bits/simd_ppc.h b/libstdc++-v3/include/experimental/bits/simd_ppc.h
index ef52d129a85..4143bafa80e 100644
--- a/libstdc++-v3/include/experimental/bits/simd_ppc.h
+++ b/libstdc++-v3/include/experimental/bits/simd_ppc.h
@@ -35,7 +35,7 @@
 _GLIBCXX_SIMD_BEGIN_NAMESPACE
 
 // _SimdImplPpc {{{
-template <typename _Abi>
+template <typename _Abi, typename>
   struct _SimdImplPpc : _SimdImplBuiltin<_Abi>
   {
     using _Base = _SimdImplBuiltin<_Abi>;
@@ -117,7 +117,7 @@ template <typename _Abi>
 
 // }}}
 // _MaskImplPpc {{{
-template <typename _Abi>
+template <typename _Abi, typename>
   struct _MaskImplPpc : _MaskImplBuiltin<_Abi>
   {
     using _Base = _MaskImplBuiltin<_Abi>;
diff --git a/libstdc++-v3/include/experimental/bits/simd_scalar.h b/libstdc++-v3/include/experimental/bits/simd_scalar.h
index 48e13f6c719..b23011ca6c9 100644
--- a/libstdc++-v3/include/experimental/bits/simd_scalar.h
+++ b/libstdc++-v3/include/experimental/bits/simd_scalar.h
@@ -155,7 +155,8 @@ struct _SimdImplScalar
 
   // _S_masked_load {{{2
   template <typename _Tp, typename _Up>
-    static inline _Tp _S_masked_load(_Tp __merge, bool __k,
+    _GLIBCXX_SIMD_INTRINSIC
+    static _Tp _S_masked_load(_Tp __merge, bool __k,
 				     const _Up* __mem) noexcept
     {
       if (__k)
@@ -165,83 +166,97 @@ struct _SimdImplScalar
 
   // _S_store {{{2
   template <typename _Tp, typename _Up>
-    static inline void _S_store(_Tp __v, _Up* __mem, _TypeTag<_Tp>) noexcept
+    _GLIBCXX_SIMD_INTRINSIC
+    static void _S_store(_Tp __v, _Up* __mem, _TypeTag<_Tp>) noexcept
     { __mem[0] = static_cast<_Up>(__v); }
 
   // _S_masked_store {{{2
   template <typename _Tp, typename _Up>
-    static inline void _S_masked_store(const _Tp __v, _Up* __mem,
+    _GLIBCXX_SIMD_INTRINSIC
+    static void _S_masked_store(const _Tp __v, _Up* __mem,
 				       const bool __k) noexcept
     { if (__k) __mem[0] = __v; }
 
   // _S_negate {{{2
   template <typename _Tp>
-    static constexpr inline bool _S_negate(_Tp __x) noexcept
+    _GLIBCXX_SIMD_INTRINSIC
+    static constexpr bool _S_negate(_Tp __x) noexcept
     { return !__x; }
 
   // _S_reduce {{{2
   template <typename _Tp, typename _BinaryOperation>
-    static constexpr inline _Tp
+    _GLIBCXX_SIMD_INTRINSIC
+    static constexpr _Tp
     _S_reduce(const simd<_Tp, simd_abi::scalar>& __x, const _BinaryOperation&)
     { return __x._M_data; }
 
   // _S_min, _S_max {{{2
   template <typename _Tp>
-    static constexpr inline _Tp _S_min(const _Tp __a, const _Tp __b)
+    _GLIBCXX_SIMD_INTRINSIC
+    static constexpr _Tp _S_min(const _Tp __a, const _Tp __b)
     { return std::min(__a, __b); }
 
   template <typename _Tp>
-    static constexpr inline _Tp _S_max(const _Tp __a, const _Tp __b)
+    _GLIBCXX_SIMD_INTRINSIC
+    static constexpr _Tp _S_max(const _Tp __a, const _Tp __b)
     { return std::max(__a, __b); }
 
   // _S_complement {{{2
   template <typename _Tp>
-    static constexpr inline _Tp _S_complement(_Tp __x) noexcept
+    _GLIBCXX_SIMD_INTRINSIC
+    static constexpr _Tp _S_complement(_Tp __x) noexcept
     { return static_cast<_Tp>(~__x); }
 
   // _S_unary_minus {{{2
   template <typename _Tp>
-    static constexpr inline _Tp _S_unary_minus(_Tp __x) noexcept
+    _GLIBCXX_SIMD_INTRINSIC
+    static constexpr _Tp _S_unary_minus(_Tp __x) noexcept
     { return static_cast<_Tp>(-__x); }
 
   // arithmetic operators {{{2
   template <typename _Tp>
-    static constexpr inline _Tp _S_plus(_Tp __x, _Tp __y)
+    _GLIBCXX_SIMD_INTRINSIC
+    static constexpr _Tp _S_plus(_Tp __x, _Tp __y)
     {
       return static_cast<_Tp>(__promote_preserving_unsigned(__x)
 			      + __promote_preserving_unsigned(__y));
     }
 
   template <typename _Tp>
-    static constexpr inline _Tp _S_minus(_Tp __x, _Tp __y)
+    _GLIBCXX_SIMD_INTRINSIC
+    static constexpr _Tp _S_minus(_Tp __x, _Tp __y)
     {
       return static_cast<_Tp>(__promote_preserving_unsigned(__x)
 			      - __promote_preserving_unsigned(__y));
     }
 
   template <typename _Tp>
-    static constexpr inline _Tp _S_multiplies(_Tp __x, _Tp __y)
+    _GLIBCXX_SIMD_INTRINSIC
+    static constexpr _Tp _S_multiplies(_Tp __x, _Tp __y)
     {
       return static_cast<_Tp>(__promote_preserving_unsigned(__x)
 			      * __promote_preserving_unsigned(__y));
     }
 
   template <typename _Tp>
-    static constexpr inline _Tp _S_divides(_Tp __x, _Tp __y)
+    _GLIBCXX_SIMD_INTRINSIC
+    static constexpr _Tp _S_divides(_Tp __x, _Tp __y)
     {
       return static_cast<_Tp>(__promote_preserving_unsigned(__x)
 			      / __promote_preserving_unsigned(__y));
     }
 
   template <typename _Tp>
-    static constexpr inline _Tp _S_modulus(_Tp __x, _Tp __y)
+    _GLIBCXX_SIMD_INTRINSIC
+    static constexpr _Tp _S_modulus(_Tp __x, _Tp __y)
     {
       return static_cast<_Tp>(__promote_preserving_unsigned(__x)
 			      % __promote_preserving_unsigned(__y));
     }
 
   template <typename _Tp>
-    static constexpr inline _Tp _S_bit_and(_Tp __x, _Tp __y)
+    _GLIBCXX_SIMD_INTRINSIC
+    static constexpr _Tp _S_bit_and(_Tp __x, _Tp __y)
     {
       if constexpr (is_floating_point_v<_Tp>)
 	{
@@ -254,7 +269,8 @@ struct _SimdImplScalar
     }
 
   template <typename _Tp>
-    static constexpr inline _Tp _S_bit_or(_Tp __x, _Tp __y)
+    _GLIBCXX_SIMD_INTRINSIC
+    static constexpr _Tp _S_bit_or(_Tp __x, _Tp __y)
     {
       if constexpr (is_floating_point_v<_Tp>)
 	{
@@ -267,7 +283,8 @@ struct _SimdImplScalar
     }
 
   template <typename _Tp>
-    static constexpr inline _Tp _S_bit_xor(_Tp __x, _Tp __y)
+    _GLIBCXX_SIMD_INTRINSIC
+    static constexpr _Tp _S_bit_xor(_Tp __x, _Tp __y)
     {
       if constexpr (is_floating_point_v<_Tp>)
 	{
@@ -280,11 +297,13 @@ struct _SimdImplScalar
     }
 
   template <typename _Tp>
-    static constexpr inline _Tp _S_bit_shift_left(_Tp __x, int __y)
+    _GLIBCXX_SIMD_INTRINSIC
+    static constexpr _Tp _S_bit_shift_left(_Tp __x, int __y)
     { return static_cast<_Tp>(__promote_preserving_unsigned(__x) << __y); }
 
   template <typename _Tp>
-    static constexpr inline _Tp _S_bit_shift_right(_Tp __x, int __y)
+    _GLIBCXX_SIMD_INTRINSIC
+    static constexpr _Tp _S_bit_shift_right(_Tp __x, int __y)
     { return static_cast<_Tp>(__promote_preserving_unsigned(__x) >> __y); }
 
   // math {{{2
@@ -553,11 +572,13 @@ struct _SimdImplScalar
 
   // _S_increment & _S_decrement{{{2
   template <typename _Tp>
-    constexpr static inline void _S_increment(_Tp& __x)
+    _GLIBCXX_SIMD_INTRINSIC
+    constexpr static void _S_increment(_Tp& __x)
     { ++__x; }
 
   template <typename _Tp>
-    constexpr static inline void _S_decrement(_Tp& __x)
+    _GLIBCXX_SIMD_INTRINSIC
+    constexpr static void _S_decrement(_Tp& __x)
     { --__x; }
 
 
@@ -582,6 +603,7 @@ struct _SimdImplScalar
 
   // smart_reference access {{{2
   template <typename _Tp, typename _Up>
+    _GLIBCXX_SIMD_INTRINSIC
     constexpr static void _S_set(_Tp& __v, [[maybe_unused]] int __i,
 				 _Up&& __x) noexcept
     {
@@ -677,25 +699,32 @@ struct _MaskImplScalar
   }
 
   // logical and bitwise operators {{{2
+  _GLIBCXX_SIMD_INTRINSIC
   static constexpr bool _S_logical_and(bool __x, bool __y)
   { return __x && __y; }
 
+  _GLIBCXX_SIMD_INTRINSIC
   static constexpr bool _S_logical_or(bool __x, bool __y)
   { return __x || __y; }
 
+  _GLIBCXX_SIMD_INTRINSIC
   static constexpr bool _S_bit_not(bool __x)
   { return !__x; }
 
+  _GLIBCXX_SIMD_INTRINSIC
   static constexpr bool _S_bit_and(bool __x, bool __y)
   { return __x && __y; }
 
+  _GLIBCXX_SIMD_INTRINSIC
   static constexpr bool _S_bit_or(bool __x, bool __y)
   { return __x || __y; }
 
+  _GLIBCXX_SIMD_INTRINSIC
   static constexpr bool _S_bit_xor(bool __x, bool __y)
   { return __x != __y; }
 
   // smart_reference access {{{2
+  _GLIBCXX_SIMD_INTRINSIC
   constexpr static void _S_set(bool& __k, [[maybe_unused]] int __i,
 			       bool __x) noexcept
   {
diff --git a/libstdc++-v3/include/experimental/bits/simd_x86.h b/libstdc++-v3/include/experimental/bits/simd_x86.h
index 34633c096b1..e010740b44c 100644
--- a/libstdc++-v3/include/experimental/bits/simd_x86.h
+++ b/libstdc++-v3/include/experimental/bits/simd_x86.h
@@ -822,7 +822,7 @@ struct _CommonImplX86 : _CommonImplBuiltin
 
 // }}}
 // _SimdImplX86 {{{
-template <typename _Abi>
+template <typename _Abi, typename>
   struct _SimdImplX86 : _SimdImplBuiltin<_Abi>
   {
     using _Base = _SimdImplBuiltin<_Abi>;
@@ -4241,7 +4241,7 @@ struct _MaskImplX86Mixin
 
 // }}}
 // _MaskImplX86 {{{
-template <typename _Abi>
+template <typename _Abi, typename>
   struct _MaskImplX86 : _MaskImplX86Mixin, _MaskImplBuiltin<_Abi>
   {
     using _MaskImplX86Mixin::_S_to_bits;


* Re: [PATCH 11/11] libstdc++: Fix ODR issues with different -m flags
  2021-06-08 12:12 ` [PATCH 11/11] libstdc++: Fix ODR issues with different -m flags Matthias Kretz
@ 2021-06-09 12:22   ` Richard Biener
  2021-06-09 12:53     ` Matthias Kretz
  2021-11-15  8:57   ` Matthias Kretz
  1 sibling, 1 reply; 29+ messages in thread
From: Richard Biener @ 2021-06-09 12:22 UTC (permalink / raw)
  To: Matthias Kretz; +Cc: GCC Patches, libstdc++

On Tue, Jun 8, 2021 at 2:23 PM Matthias Kretz <m.kretz@gsi.de> wrote:
>
>
> From: Matthias Kretz <kretz@kde.org>
>
> Explicitly support use of the stdx::simd implementation in situations
> where the user links TUs that were compiled with different -m flags. In
> general, this is always a (quasi) ODR violation for inline functions
> because at least codegen may differ in important ways. However, in the
> resulting executable only one (unspecified which one) of them might be
> used. For simd we want to support users to compile code multiple times,
> with different -m flags and have a runtime dispatch to the TU matching
> the target CPU. But if internal functions are not inlined this may lead
> to unexpected performance loss or execution of illegal instructions.
> Therefore, inline functions that are not marked as always_inline must
> use an additional template parameter somewhere in their name, to
> disambiguate between the different -m translations.

Note that excessive use of always_inline can cause compile-time issues
(see for example PR99785).  I wonder whether the inlines can be
placed in an anonymous namespace instead of the difficult-to-maintain
explicit list of SIMD features?  It also doesn't solve the issue when
instantiating the functions from a TU which contains #pragma GCC target
sections to switch options, of course.

Richard.

> Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
>
> libstdc++-v3/ChangeLog:
>
>         * include/experimental/bits/simd.h: Move feature detection bools
>         and add __have_avx512bitalg, __have_avx512vbmi2,
>         __have_avx512vbmi, __have_avx512ifma, __have_avx512cd,
>         __have_avx512vnni, __have_avx512vpopcntdq.
>         (__detail::__machine_flags): New function which returns a unique
>         uint64 depending on relevant -m and -f flags.
>         (__detail::__odr_helper): New type alias for either an anonymous
>         type or a type specialized with the __machine_flags number.
>         (_SimdIntOperators): Change template parameters from _Impl to
>         _Tp, _Abi because _Impl now has an __odr_helper parameter which
>         may be _OdrEnforcer from the anonymous namespace, which makes
>         for a bad base class.
>         (many): Either add __odr_helper template parameter or mark as
>         always_inline.
>         * include/experimental/bits/simd_detail.h: Add defines for
>         AVX512BITALG, AVX512VBMI2, AVX512VBMI, AVX512IFMA, AVX512CD,
>         AVX512VNNI, AVX512VPOPCNTDQ, and AVX512VP2INTERSECT.
>         * include/experimental/bits/simd_builtin.h: Add __odr_helper
>         template parameter or mark as always_inline.
>         * include/experimental/bits/simd_fixed_size.h: Ditto.
>         * include/experimental/bits/simd_math.h: Ditto.
>         * include/experimental/bits/simd_scalar.h: Ditto.
>         * include/experimental/bits/simd_neon.h: Add __odr_helper
>         template parameter.
>         * include/experimental/bits/simd_ppc.h: Ditto.
>         * include/experimental/bits/simd_x86.h: Ditto.
> ---
>  libstdc++-v3/include/experimental/bits/simd.h | 380 ++++++++++++------
>  .../include/experimental/bits/simd_builtin.h  |  41 +-
>  .../include/experimental/bits/simd_detail.h   |  40 ++
>  .../experimental/bits/simd_fixed_size.h       |  39 +-
>  .../include/experimental/bits/simd_math.h     |  45 ++-
>  .../include/experimental/bits/simd_neon.h     |   4 +-
>  .../include/experimental/bits/simd_ppc.h      |   4 +-
>  .../include/experimental/bits/simd_scalar.h   |  71 +++-
>  .../include/experimental/bits/simd_x86.h      |   4 +-
>  9 files changed, 440 insertions(+), 188 deletions(-)
>
>
> --
> ──────────────────────────────────────────────────────────────────────────
>  Dr. Matthias Kretz                           https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
>  std::experimental::simd              https://github.com/VcDevel/std-simd
> ──────────────────────────────────────────────────────────────────────────


* Re: [PATCH 11/11] libstdc++: Fix ODR issues with different -m flags
  2021-06-09 12:22   ` Richard Biener
@ 2021-06-09 12:53     ` Matthias Kretz
  2021-06-09 13:22       ` Richard Biener
  0 siblings, 1 reply; 29+ messages in thread
From: Matthias Kretz @ 2021-06-09 12:53 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, libstdc++

On Wednesday, 9 June 2021 14:22:00 CEST Richard Biener wrote:
> On Tue, Jun 8, 2021 at 2:23 PM Matthias Kretz <m.kretz@gsi.de> wrote:
> > From: Matthias Kretz <kretz@kde.org>
> > 
> > Explicitly support use of the stdx::simd implementation in situations
> > where the user links TUs that were compiled with different -m flags. In
> > general, this is always a (quasi) ODR violation for inline functions
> > because at least codegen may differ in important ways. However, in the
> > resulting executable only one (unspecified which one) of them might be
> > used. For simd we want to support users to compile code multiple times,
> > with different -m flags and have a runtime dispatch to the TU matching
> > the target CPU. But if internal functions are not inlined this may lead
> > to unexpected performance loss or execution of illegal instructions.
> > Therefore, inline functions that are not marked as always_inline must
> > use an additional template parameter somewhere in their name, to
> > disambiguate between the different -m translations.
> 
> Note that excessive use of always_inline can cause compile-time issues
> (see for example PR99785).

Ah, I should verify whether that's also the reason my stdx::simd 
implementation is slow to compile.

However, I really must have the always_inline semantics in most of the places
where stdx::simd uses it, because most of these functions compile to either a
single function call or a single instruction (often f0 -> f1 -> f2 -> single
instruction). If the inliner makes even a single wrong inlining decision,
the whole program might slow down by integral factors, not just by small
percentages. And without inlining these functions, -fno-inline builds (i.e.
many debug builds) become unbearably slow (aka useless).
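
A minimal sketch of such a wrapper chain, for illustration only: the names
f0, f1, f2, add_through_wrappers and check_wrappers are hypothetical and not
taken from stdx::simd, and the [[gnu::always_inline]] spelling assumes GCC or
Clang. With the attribute, the calls are inlined even at -O0 or with
-fno-inline, so add_through_wrappers should reduce to the single addition
in f2.

```cpp
#include <cassert>

// Hypothetical helper macro for this sketch (GCC/Clang attribute syntax).
#define SKETCH_ALWAYS_INLINE [[gnu::always_inline]] inline

// Innermost function: ideally a single add instruction.
SKETCH_ALWAYS_INLINE int f2(int a, int b) { return a + b; }

// Thin wrappers that only forward; each would be a real call if not inlined.
SKETCH_ALWAYS_INLINE int f1(int a, int b) { return f2(a, b); }
SKETCH_ALWAYS_INLINE int f0(int a, int b) { return f1(a, b); }

// An ordinary function that uses the chain.
int add_through_wrappers(int a, int b) { return f0(a, b); }

// Small usage check.
inline void check_wrappers() { assert(add_through_wrappers(2, 3) == 5); }
```

Without the attribute, each level of the chain can remain a real call in such
builds, which is the slowdown described above.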

> I wonder whether the inlines can be
> placed in an anonymous namespace instead of the difficult-to-maintain
> explicit list of SIMD features?

It's possible, and part of the patch:

+  namespace
+  {
+    struct _OdrEnforcer {};
+  }
[...]
+  using __odr_helper
+    = conditional_t<__machine_flags() == 0, _OdrEnforcer,
+	_MachineFlagsTemplate<__machine_flags(), __floating_point_flags()>>;

It can potentially blow up the code size and the instruction cache usage,
though. The trade-off isn't obvious. I guess I can't promise that
mixing different compiler flags is free of ODR violations.
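
To show the idea in isolation, here is a simplified sketch; it is not the
code from the patch. Everything in namespace sketch (machine_flags,
MachineFlagsTag, OdrEnforcer, odr_helper, sum3, check_sum3) is a hypothetical
stand-in, only four x86 feature macros are encoded, and the floating-point
flags are left out. The point is that a function template that is not
always_inline takes the tag as a defaulted template parameter, so its
instantiations differ by type between TUs compiled with different -m flags.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

namespace sketch  // hypothetical namespace, not libstdc++'s
{
  // Encode a few target features into one number (the real list is longer).
  constexpr std::uint64_t machine_flags()
  {
    std::uint64_t flags = 0;
#ifdef __SSE2__
    flags |= 1u << 0;
#endif
#ifdef __AVX__
    flags |= 1u << 1;
#endif
#ifdef __AVX2__
    flags |= 1u << 2;
#endif
#ifdef __AVX512F__
    flags |= 1u << 3;
#endif
    return flags;
  }

  // One distinct type per flag combination.
  template <std::uint64_t Flags>
    struct MachineFlagsTag {};

  namespace
  {
    // Internal linkage: a different type in every TU.
    struct OdrEnforcer {};
  }

  // Mirrors the shape of the alias quoted above.
  using odr_helper = std::conditional_t<machine_flags() == 0, OdrEnforcer,
                                        MachineFlagsTag<machine_flags()>>;

  static_assert(std::is_same_v<odr_helper, OdrEnforcer>
                  == (machine_flags() == 0));

  // Not always_inline, so it carries the tag in its template arguments.
  template <typename T, typename = odr_helper>
    T sum3(T a, T b, T c)
    { return a + b + c; }

  // Small usage check.
  inline void check_sum3() { assert(sum3(1, 2, 3) == 6); }
}
```

Callers never spell the tag: sketch::sum3(1, 2, 3) picks up odr_helper from
the default argument.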

> It also doesn't solve the issue when
> instantiating the functions from a TU which contains #pragma GCC target
> sections to switch options, of course.

Yes. Can I get PR83875? ;-)

- Matthias



-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 std::experimental::simd              https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────





* Re: [PATCH 11/11] libstdc++: Fix ODR issues with different -m flags
  2021-06-09 12:53     ` Matthias Kretz
@ 2021-06-09 13:22       ` Richard Biener
  0 siblings, 0 replies; 29+ messages in thread
From: Richard Biener @ 2021-06-09 13:22 UTC (permalink / raw)
  To: Matthias Kretz; +Cc: GCC Patches, libstdc++

On Wed, Jun 9, 2021 at 2:53 PM Matthias Kretz <m.kretz@gsi.de> wrote:
>
> On Wednesday, 9 June 2021 14:22:00 CEST Richard Biener wrote:
> > On Tue, Jun 8, 2021 at 2:23 PM Matthias Kretz <m.kretz@gsi.de> wrote:
> > > From: Matthias Kretz <kretz@kde.org>
> > >
> > > Explicitly support use of the stdx::simd implementation in situations
> > > where the user links TUs that were compiled with different -m flags. In
> > > general, this is always a (quasi) ODR violation for inline functions
> > > because at least codegen may differ in important ways. However, in the
> > > resulting executable only one (unspecified which one) of them might be
> > > used. For simd we want to support users to compile code multiple times,
> > > with different -m flags and have a runtime dispatch to the TU matching
> > > the target CPU. But if internal functions are not inlined this may lead
> > > to unexpected performance loss or execution of illegal instructions.
> > > Therefore, inline functions that are not marked as always_inline must
> > > use an additional template parameter somewhere in their name, to
> > > disambiguate between the different -m translations.
> >
> > Note that excessive use of always_inline can cause compile-time issues
> > (see for example PR99785).
>
> Ah, I should verify whether that's also the reason my stdx::simd
> implementation is slow to compile.
>
> However, I really must have the always_inline semantics in most of the places
> where stdx::simd uses it, because most of these functions compile to either a
> single function call or a single instruction (often f0 -> f1 -> f2 -> single
> instruction). If the inliner makes even a single wrong inlining decision,
> the whole program might slow down by integral factors, not just by small
> percentages. And without inlining these functions, -fno-inline builds (i.e.
> many debug builds) become unbearably slow (aka useless).

Understood.  Note that I think the slow compile is a bug and there must be
a way to address it; the testcases are just too large at the moment to get
a handle on which kinds of callgraphs cause which problem, why, and how
we might want to address this.

> > I wonder whether the inlines can be
> > placed in an anonymous namespace instead of the difficult-to-maintain
> > explicit list of SIMD features?
>
> It's possible, and part of the patch:
>
> +  namespace
> +  {
> +    struct _OdrEnforcer {};
> +  }
> [...]
> +  using __odr_helper
> +    = conditional_t<__machine_flags() == 0, _OdrEnforcer,
> +       _MachineFlagsTemplate<__machine_flags(), __floating_point_flags()>>;
>
> It can potentially blow up the code size and the instruction cache usage,
> though. The trade-off isn't obvious. I guess I can't promise that
> mixing different compiler flags is free of ODR violations.
>
> > It also doesn't solve the issue when
> > instantiating the functions from a TU which contains #pragma GCC target
> > sections to switch options, of course.
>
> Yes. Can I get PR83875? ;-)

heh ;)

Richard.



* Re: [PATCH 04/11 v2] libstdc++: Make use of __builtin_bit_cast
  2021-06-08 12:11 ` [PATCH 04/11] libstdc++: Make use of __builtin_bit_cast Matthias Kretz
@ 2021-06-11 10:53   ` Matthias Kretz
  2021-06-24 14:01     ` [PATCH 04/11 v3] " Matthias Kretz
  0 siblings, 1 reply; 29+ messages in thread
From: Matthias Kretz @ 2021-06-11 10:53 UTC (permalink / raw)
  To: gcc-patches, libstdc++

[-- Attachment #1: Type: text/plain, Size: 2401 bytes --]

While testing newer patches I found several places in this patch where the
conversion from __bit_cast to simd_bit_cast was missing (i.e. where bit
casting to/from fixed_size was sometimes required). Corrected patch attached.


From: Matthias Kretz <kretz@kde.org>

The __bit_cast function was a hack to achieve what __builtin_bit_cast
can do, therefore use __builtin_bit_cast if possible. However,
__builtin_bit_cast cannot be used to cast from/to fixed_size_simd, since
it isn't trivially copyable (in the language sense — in principle it
is). Therefore add __proposed::simd_bit_cast to enable the use case
required in the test framework.
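
As an illustration of the scalar case (a sketch, not the code from this
patch): bit_cast_sketch, float_bits and check_float_bits are hypothetical
names, the fallback definition of __has_builtin is an assumption for
compilers that lack that macro, and the bit patterns in the check assume
IEEE 754 binary32 float.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <type_traits>

#ifndef __has_builtin
#define __has_builtin(x) 0  // assumption: treat the builtin as unavailable
#endif

// Reinterpret the object representation of x as a value of type To.
template <typename To, typename From>
  To bit_cast_sketch(const From& x)
  {
    static_assert(sizeof(To) == sizeof(From));
    static_assert(std::is_trivially_copyable_v<To>
                    && std::is_trivially_copyable_v<From>);
#if __has_builtin(__builtin_bit_cast)
    return __builtin_bit_cast(To, x);
#else
    To r;  // fallback: copy the bytes
    std::memcpy(&r, &x, sizeof(To));
    return r;
#endif
  }

// Convenience wrapper: the raw bits of a float.
inline std::uint32_t float_bits(float f)
{ return bit_cast_sketch<std::uint32_t>(f); }

// Small usage check (IEEE 754 binary32 assumed).
inline void check_float_bits()
{
  assert(float_bits(1.0f) == 0x3f800000u);
  assert(float_bits(std::numeric_limits<float>::infinity()) == 0x7f800000u);
}
```

A type that is not trivially copyable fails the second static_assert here.
That is the restriction simd_bit_cast works around by casting the underlying
data member and rebuilding the simd object from the result.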

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>

libstdc++-v3/ChangeLog:

        * include/experimental/bits/simd.h (__bit_cast): Implement via
        __builtin_bit_cast #if available.
        (__proposed::simd_bit_cast): Add overloads for simd and
        simd_mask, which use __builtin_bit_cast (or __bit_cast #if not
        available), which return an object of the requested type with
        the same bits as the argument.
        * include/experimental/bits/simd_math.h: Use simd_bit_cast
        instead of __bit_cast to allow casts to fixed_size_simd.
        (copysign): Remove branch that was only required if __bit_cast
        cannot be constexpr.
        * testsuite/experimental/simd/tests/bits/test_values.h: Switch
        from __bit_cast to __proposed::simd_bit_cast since the former
        will not cast fixed_size objects anymore.
---
 libstdc++-v3/include/experimental/bits/simd.h | 57 ++++++++++++++++++-
 .../include/experimental/bits/simd_math.h     | 36 +++++-------
 .../simd/tests/bits/test_values.h             |  8 +--
 3 files changed, 75 insertions(+), 26 deletions(-)


--
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 std::experimental::simd              https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────

[-- Attachment #2: 0001-libstdc-Make-use-of-__builtin_bit_cast.patch --]
[-- Type: text/x-patch, Size: 8732 bytes --]

diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h
index 163f1b574e2..852d0b62012 100644
--- a/libstdc++-v3/include/experimental/bits/simd.h
+++ b/libstdc++-v3/include/experimental/bits/simd.h
@@ -1598,7 +1598,9 @@ template <typename _To, typename _From>
   _GLIBCXX_SIMD_INTRINSIC constexpr _To
   __bit_cast(const _From __x)
   {
-    // TODO: implement with / replace by __builtin_bit_cast ASAP
+#if __has_builtin(__builtin_bit_cast)
+    return __builtin_bit_cast(_To, __x);
+#else
     static_assert(sizeof(_To) == sizeof(_From));
     constexpr bool __to_is_vectorizable
       = is_arithmetic_v<_To> || is_enum_v<_To>;
@@ -1629,6 +1631,7 @@ template <typename _To, typename _From>
 			 reinterpret_cast<const char*>(&__x), sizeof(_To));
 	return __r;
       }
+#endif
   }
 
 // }}}
@@ -2900,6 +2903,58 @@ template <typename _Tp, typename _Up, typename _Ap,
     return {__private_init, _RM::abi_type::_MaskImpl::template _S_convert<
 			      typename _RM::simd_type::value_type>(__x)};
   }
+
+template <typename _To, typename _Up, typename _Abi>
+  _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR
+  _To
+  simd_bit_cast(const simd<_Up, _Abi>& __x)
+  {
+    using _Tp = typename _To::value_type;
+    using _ToMember = typename _SimdTraits<_Tp, typename _To::abi_type>::_SimdMember;
+    using _From = simd<_Up, _Abi>;
+    using _FromMember = typename _SimdTraits<_Up, _Abi>::_SimdMember;
+    // with concepts, the following should be constraints
+    static_assert(sizeof(_To) == sizeof(_From));
+    static_assert(is_trivially_copyable_v<_Tp> && is_trivially_copyable_v<_Up>);
+    static_assert(is_trivially_copyable_v<_ToMember> && is_trivially_copyable_v<_FromMember>);
+#if __has_builtin(__builtin_bit_cast)
+    return {__private_init, __builtin_bit_cast(_ToMember, __data(__x))};
+#else
+    return {__private_init, __bit_cast<_ToMember>(__data(__x))};
+#endif
+  }
+
+template <typename _To, typename _Up, typename _Abi>
+  _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR
+  _To
+  simd_bit_cast(const simd_mask<_Up, _Abi>& __x)
+  {
+    using _From = simd_mask<_Up, _Abi>;
+    static_assert(sizeof(_To) == sizeof(_From));
+    static_assert(is_trivially_copyable_v<_From>);
+    // _To can be simd<T, A>, specifically simd<T, fixed_size<N>> in which case _To is not trivially
+    // copyable.
+    if constexpr (is_simd_v<_To>)
+      {
+	using _Tp = typename _To::value_type;
+	using _ToMember = typename _SimdTraits<_Tp, typename _To::abi_type>::_SimdMember;
+	static_assert(is_trivially_copyable_v<_ToMember>);
+#if __has_builtin(__builtin_bit_cast)
+	return {__private_init, __builtin_bit_cast(_ToMember, __x)};
+#else
+	return {__private_init, __bit_cast<_ToMember>(__x)};
+#endif
+      }
+    else
+      {
+	static_assert(is_trivially_copyable_v<_To>);
+#if __has_builtin(__builtin_bit_cast)
+	return __builtin_bit_cast(_To, __x);
+#else
+	return __bit_cast<_To>(__x);
+#endif
+      }
+  }
 } // namespace __proposed
 
 // simd_cast {{{2
diff --git a/libstdc++-v3/include/experimental/bits/simd_math.h b/libstdc++-v3/include/experimental/bits/simd_math.h
index d954e761eee..afd8b5a028f 100644
--- a/libstdc++-v3/include/experimental/bits/simd_math.h
+++ b/libstdc++-v3/include/experimental/bits/simd_math.h
@@ -405,10 +405,11 @@ template <typename _Tp, typename _Abi>
     using _Vp = simd<_Tp, _Abi>;
     using _Up = make_unsigned_t<__int_for_sizeof_t<_Tp>>;
     using namespace std::experimental::__float_bitwise_operators;
+    using namespace std::experimental::__proposed;
     const _Vp __exponent_mask
       = __infinity_v<_Tp>; // 0x7f800000 or 0x7ff0000000000000
     return static_simd_cast<rebind_simd_t<int, _Vp>>(
-      __bit_cast<rebind_simd_t<_Up, _Vp>>(__v & __exponent_mask)
+	     simd_bit_cast<rebind_simd_t<_Up, _Vp>>(__v & __exponent_mask)
       >> (__digits_v<_Tp> - 1));
   }
 
@@ -700,11 +701,9 @@ template <typename _Tp, typename _Abi>
 	// (inf and NaN are excluded by -ffinite-math-only)
 	const auto __iszero_inf_nan = __x == 0;
 #else
-	const auto __as_int
-	  = __bit_cast<rebind_simd_t<__int_for_sizeof_t<_Tp>, _V>>(abs(__x));
-	const auto __inf
-	  = __bit_cast<rebind_simd_t<__int_for_sizeof_t<_Tp>, _V>>(
-	    _V(__infinity_v<_Tp>));
+	using _Ip = __int_for_sizeof_t<_Tp>;
+	const auto __as_int = simd_bit_cast<rebind_simd_t<_Ip, _V>>(abs(__x));
+	const auto __inf = simd_bit_cast<rebind_simd_t<_Ip, _V>>(_V(__infinity_v<_Tp>));
 	const auto __iszero_inf_nan = static_simd_cast<typename _V::mask_type>(
 	  __as_int == 0 || __as_int >= __inf);
 #endif
@@ -722,10 +721,10 @@ template <typename _Tp, typename _Abi>
 	where(__value_isnormal.__cvt(), __e) = __exponent_bits;
 	static_assert(sizeof(_IV) == sizeof(__value_isnormal));
 	const _IV __offset
-	  = (__bit_cast<_IV>(__value_isnormal) & _IV(__exp_adjust))
-	    | (__bit_cast<_IV>(static_simd_cast<_MaskType>(__exponent_bits == 0)
-			       & static_simd_cast<_MaskType>(__x != 0))
-	       & _IV(__exp_adjust + __exp_offset));
+	  = (simd_bit_cast<_IV>(__value_isnormal) & _IV(__exp_adjust))
+	      | (simd_bit_cast<_IV>(static_simd_cast<_MaskType>(__exponent_bits == 0)
+				      & static_simd_cast<_MaskType>(__x != 0))
+		   & _IV(__exp_adjust + __exp_offset));
 	*__exp = simd_cast<_Samesize<int, _V>>(__e - __offset);
 	return __mant;
       }
@@ -796,7 +795,7 @@ template <typename _Tp, typename _Abi>
 	  using namespace std::experimental::__proposed;
 	  using _IV = rebind_simd_t<
 	    conditional_t<sizeof(_Tp) == sizeof(_LLong), _LLong, int>, _V>;
-	  return (__bit_cast<_IV>(__v) >> (__digits_v<_Tp> - 1))
+	  return (simd_bit_cast<_IV>(__v) >> (__digits_v<_Tp> - 1))
 		 - (__max_exponent_v<_Tp> - 1);
 	};
 	_V __r = static_simd_cast<_V>(__exponent(abs_x));
@@ -981,6 +980,7 @@ template <typename _VV>
 	// Skylake-AVX512 (not even for SSE and AVX vectors, and really bad for
 	// AVX-512).
 	using namespace __float_bitwise_operators;
+	using namespace __proposed;
 	_V __absx = abs(__x);          // no error
 	_V __absy = abs(__y);          // no error
 	_V __hi = max(__absx, __absy); // no error
@@ -1028,9 +1028,9 @@ template <typename _VV>
 #ifdef __FAST_MATH__
 	    using _Ip = __int_for_sizeof_t<_Tp>;
 	    using _IV = rebind_simd_t<_Ip, _V>;
-	    const auto __as_int = __bit_cast<_IV>(__hi_exp);
+	    const auto __as_int = simd_bit_cast<_IV>(__hi_exp);
 	    const _V __scale
-	      = __bit_cast<_V>(2 * __bit_cast<_Ip>(_Tp(1)) - __as_int);
+	      = simd_bit_cast<_V>(2 * simd_bit_cast<_Ip>(_Tp(1)) - __as_int);
 #else
 	    const _V __scale = (__hi_exp ^ __inf) * _Tp(.5);
 #endif
@@ -1197,9 +1197,9 @@ _GLIBCXX_SIMD_CVTING2(hypot)
 #ifdef __FAST_MATH__
 		using _Ip = __int_for_sizeof_t<_Tp>;
 		using _IV = rebind_simd_t<_Ip, _V>;
-		const auto __as_int = __bit_cast<_IV>(__hi_exp);
+		const auto __as_int = simd_bit_cast<_IV>(__hi_exp);
 		const _V __scale
-		  = __bit_cast<_V>(2 * __bit_cast<_Ip>(_Tp(1)) - __as_int);
+		  = simd_bit_cast<_V>(2 * simd_bit_cast<_Ip>(_Tp(1)) - __as_int);
 #else
 		const _V __scale = (__hi_exp ^ __inf) * _Tp(.5);
 #endif
@@ -1306,12 +1306,6 @@ template <typename _Tp, typename _Abi>
       return std::copysign(__x[0], __y[0]);
     else if constexpr (__is_fixed_size_abi_v<_Abi>)
       return {__private_init, _Abi::_SimdImpl::_S_copysign(__data(__x), __data(__y))};
-    else if constexpr (is_same_v<_Tp, long double> && sizeof(_Tp) == 12)
-      // Remove this case once __bit_cast is implemented via __builtin_bit_cast.
-      // It is necessary, because __signmask below cannot be computed at compile
-      // time.
-      return simd<_Tp, _Abi>(
-	[&](auto __i) { return std::copysign(__x[__i], __y[__i]); });
     else
       {
 	using _V = simd<_Tp, _Abi>;
diff --git a/libstdc++-v3/testsuite/experimental/simd/tests/bits/test_values.h b/libstdc++-v3/testsuite/experimental/simd/tests/bits/test_values.h
index b69bd0b704d..67aa870659b 100644
--- a/libstdc++-v3/testsuite/experimental/simd/tests/bits/test_values.h
+++ b/libstdc++-v3/testsuite/experimental/simd/tests/bits/test_values.h
@@ -221,11 +221,11 @@ template <class V>
     if constexpr (sizeof(T) <= sizeof(double))
       {
 	using I = rebind_simd_t<__int_for_sizeof_t<T>, V>;
-	const I abs_x = __bit_cast<I>(abs(x));
-	const I min = __bit_cast<I>(V(std::__norm_min_v<T>));
-	const I max = __bit_cast<I>(V(std::__finite_max_v<T>));
+	const I abs_x = simd_bit_cast<I>(abs(x));
+	const I min = simd_bit_cast<I>(V(std::__norm_min_v<T>));
+	const I max = simd_bit_cast<I>(V(std::__finite_max_v<T>));
 	return static_simd_cast<typename V::mask_type>(
-		 __bit_cast<I>(x) == 0 || (abs_x >= min && abs_x <= max));
+		 simd_bit_cast<I>(x) == 0 || (abs_x >= min && abs_x <= max));
       }
     else
       {

* Re: [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups
  2021-06-08 12:10 [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Matthias Kretz
                   ` (10 preceding siblings ...)
  2021-06-08 12:12 ` [PATCH 11/11] libstdc++: Fix ODR issues with different -m flags Matthias Kretz
@ 2021-06-24 13:42 ` Jonathan Wakely
  11 siblings, 0 replies; 29+ messages in thread
From: Jonathan Wakely @ 2021-06-24 13:42 UTC (permalink / raw)
  To: Matthias Kretz; +Cc: gcc Patches, libstdc++

On Tue, 8 Jun 2021 at 13:10, Matthias Kretz wrote:
>
> The following patches mostly contain code cleanups and minor corrections. The
> major feature in this patchset is the last patch, which should make the use of
> stdx::simd much safer wrt. ODR violations involuntarily introduced by linking
> TUs that were compiled with different -m and floating-point flags.
>
> Matthias Kretz (11):
>   libstdc++: Improve copysign codegen
>   libstdc++: Remove dead code
>   libstdc++: Improve fixed_size codegen
>   libstdc++: Make use of __builtin_bit_cast
>   libstdc++: Remove incorrect fabs overload
>   libstdc++: Minor simd_math cleanups
>   libstdc++: Fix condition when AVX512F ldexp implementation is used
>   libstdc++: Avoid raising fp exceptions in trunc, floor, and ceil
>   libstdc++: Ensure unrolled loops inline the lambda
>   libstdc++: Fix internal names: add missing underscores
>   libstdc++: Fix ODR issues with different -m flags

Thanks! I've pushed all except the bit_cast one (as discussed on IRC)
and the ODR one (which I'm still reviewing).


* Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast
  2021-06-11 10:53   ` [PATCH 04/11 v2] " Matthias Kretz
@ 2021-06-24 14:01     ` Matthias Kretz
  2021-06-24 14:08       ` Jakub Jelinek
  2021-06-25 11:23       ` Jonathan Wakely
  0 siblings, 2 replies; 29+ messages in thread
From: Matthias Kretz @ 2021-06-24 14:01 UTC (permalink / raw)
  To: gcc-patches, libstdc++

[-- Attachment #1: Type: text/plain, Size: 2303 bytes --]

One using-directive for namespace __proposed was still missing in the
-ffast-math code path. The attached patch resolves the issue.

From: Matthias Kretz <m.kretz@gsi.de>

The __bit_cast function was a hack to achieve what __builtin_bit_cast
can do, therefore use __builtin_bit_cast if possible. However,
__builtin_bit_cast cannot be used to cast from/to fixed_size_simd, since
it isn't trivially copyable (in the language sense — in principle it
is). Therefore add __proposed::simd_bit_cast to enable the use case
required in the test framework.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>

libstdc++-v3/ChangeLog:

        * include/experimental/bits/simd.h (__bit_cast): Implement via
        __builtin_bit_cast #if available.
        (__proposed::simd_bit_cast): Add overloads for simd and
        simd_mask, which use __builtin_bit_cast (or __bit_cast #if not
        available), which return an object of the requested type with
        the same bits as the argument.
        * include/experimental/bits/simd_math.h: Use simd_bit_cast
        instead of __bit_cast to allow casts to fixed_size_simd.
        (copysign): Remove branch that was only required if __bit_cast
        cannot be constexpr.
        * testsuite/experimental/simd/tests/bits/test_values.h: Switch
        from __bit_cast to __proposed::simd_bit_cast since the former
        will not cast fixed_size objects anymore.
---
 libstdc++-v3/include/experimental/bits/simd.h | 57 ++++++++++++++++++-
 .../include/experimental/bits/simd_math.h     | 37 ++++++------
 .../simd/tests/bits/test_values.h             |  8 +--
 3 files changed, 76 insertions(+), 26 deletions(-)
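
As a standalone illustration of the idea in the commit message (not the
libstdc++ code itself), the sketch below prefers __builtin_bit_cast when
__has_builtin reports it and falls back to memcpy otherwise. The names
bit_cast_sketch, exponent_bits_sketch and SKETCH_HAVE_BIT_CAST are
hypothetical. The mask and shift mirror the 0x7f800000 constant and the
"__digits_v<_Tp> - 1" shift in simd_math.h, assuming float is IEEE binary32.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Feature test. The nested #if keeps preprocessors without __has_builtin
// from ever parsing the __has_builtin(...) expression.
#if defined(__has_builtin)
# if __has_builtin(__builtin_bit_cast)
#  define SKETCH_HAVE_BIT_CAST 1
# endif
#endif

// Hypothetical helper: reinterpret the bytes of a trivially copyable object
// as another trivially copyable type of the same size.
template <typename To, typename From>
  inline To
  bit_cast_sketch(const From& x) noexcept
  {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<To>::value
                    && std::is_trivially_copyable<From>::value,
                  "both types must be trivially copyable");
#ifdef SKETCH_HAVE_BIT_CAST
    return __builtin_bit_cast(To, x);
#else
    To r; // the fallback additionally needs To to be default-constructible
    std::memcpy(&r, &x, sizeof(To));
    return r;
#endif
  }

// Biased exponent field of a float (assumes IEEE binary32).
inline int
exponent_bits_sketch(float v) noexcept
{
  const std::uint32_t exponent_mask = 0x7f800000u; // bit pattern of +inf
  const int mantissa_bits = 23;                    // 24 digits minus 1
  return static_cast<int>(
    (bit_cast_sketch<std::uint32_t>(v) & exponent_mask) >> mantissa_bits);
}

// Small runtime check of the sketch.
inline void
bit_cast_sketch_check()
{
  assert(bit_cast_sketch<std::uint32_t>(1.0f) == 0x3f800000u);
  assert(exponent_bits_sketch(1.0f) == 127); // bias of binary32
}
```

The static_asserts play the role of the constraints in simd_bit_cast: a type
that is not trivially copyable in the language sense is rejected at compile
time, which is why the patch casts the _SimdMember instead of the
fixed_size_simd object.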


-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 std::experimental::simd              https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────

[-- Attachment #2: 0001-libstdc-Make-use-of-__builtin_bit_cast.patch --]
[-- Type: text/x-patch, Size: 9051 bytes --]

diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h
index 163f1b574e2..852d0b62012 100644
--- a/libstdc++-v3/include/experimental/bits/simd.h
+++ b/libstdc++-v3/include/experimental/bits/simd.h
@@ -1598,7 +1598,9 @@ template <typename _To, typename _From>
   _GLIBCXX_SIMD_INTRINSIC constexpr _To
   __bit_cast(const _From __x)
   {
-    // TODO: implement with / replace by __builtin_bit_cast ASAP
+#if __has_builtin(__builtin_bit_cast)
+    return __builtin_bit_cast(_To, __x);
+#else
     static_assert(sizeof(_To) == sizeof(_From));
     constexpr bool __to_is_vectorizable
       = is_arithmetic_v<_To> || is_enum_v<_To>;
@@ -1629,6 +1631,7 @@ template <typename _To, typename _From>
 			 reinterpret_cast<const char*>(&__x), sizeof(_To));
 	return __r;
       }
+#endif
   }
 
 // }}}
@@ -2900,6 +2903,58 @@ template <typename _Tp, typename _Up, typename _Ap,
     return {__private_init, _RM::abi_type::_MaskImpl::template _S_convert<
 			      typename _RM::simd_type::value_type>(__x)};
   }
+
+template <typename _To, typename _Up, typename _Abi>
+  _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR
+  _To
+  simd_bit_cast(const simd<_Up, _Abi>& __x)
+  {
+    using _Tp = typename _To::value_type;
+    using _ToMember = typename _SimdTraits<_Tp, typename _To::abi_type>::_SimdMember;
+    using _From = simd<_Up, _Abi>;
+    using _FromMember = typename _SimdTraits<_Up, _Abi>::_SimdMember;
+    // with concepts, the following should be constraints
+    static_assert(sizeof(_To) == sizeof(_From));
+    static_assert(is_trivially_copyable_v<_Tp> && is_trivially_copyable_v<_Up>);
+    static_assert(is_trivially_copyable_v<_ToMember> && is_trivially_copyable_v<_FromMember>);
+#if __has_builtin(__builtin_bit_cast)
+    return {__private_init, __builtin_bit_cast(_ToMember, __data(__x))};
+#else
+    return {__private_init, __bit_cast<_ToMember>(__data(__x))};
+#endif
+  }
+
+template <typename _To, typename _Up, typename _Abi>
+  _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR
+  _To
+  simd_bit_cast(const simd_mask<_Up, _Abi>& __x)
+  {
+    using _From = simd_mask<_Up, _Abi>;
+    static_assert(sizeof(_To) == sizeof(_From));
+    static_assert(is_trivially_copyable_v<_From>);
+    // _To can be simd<T, A>, specifically simd<T, fixed_size<N>> in which case _To is not trivially
+    // copyable.
+    if constexpr (is_simd_v<_To>)
+      {
+	using _Tp = typename _To::value_type;
+	using _ToMember = typename _SimdTraits<_Tp, typename _To::abi_type>::_SimdMember;
+	static_assert(is_trivially_copyable_v<_ToMember>);
+#if __has_builtin(__builtin_bit_cast)
+	return {__private_init, __builtin_bit_cast(_ToMember, __x)};
+#else
+	return {__private_init, __bit_cast<_ToMember>(__x)};
+#endif
+      }
+    else
+      {
+	static_assert(is_trivially_copyable_v<_To>);
+#if __has_builtin(__builtin_bit_cast)
+	return __builtin_bit_cast(_To, __x);
+#else
+	return __bit_cast<_To>(__x);
+#endif
+      }
+  }
 } // namespace __proposed
 
 // simd_cast {{{2
diff --git a/libstdc++-v3/include/experimental/bits/simd_math.h b/libstdc++-v3/include/experimental/bits/simd_math.h
index d954e761eee..ef2bdc641b8 100644
--- a/libstdc++-v3/include/experimental/bits/simd_math.h
+++ b/libstdc++-v3/include/experimental/bits/simd_math.h
@@ -405,10 +405,11 @@ template <typename _Tp, typename _Abi>
     using _Vp = simd<_Tp, _Abi>;
     using _Up = make_unsigned_t<__int_for_sizeof_t<_Tp>>;
     using namespace std::experimental::__float_bitwise_operators;
+    using namespace std::experimental::__proposed;
     const _Vp __exponent_mask
       = __infinity_v<_Tp>; // 0x7f800000 or 0x7ff0000000000000
     return static_simd_cast<rebind_simd_t<int, _Vp>>(
-      __bit_cast<rebind_simd_t<_Up, _Vp>>(__v & __exponent_mask)
+	     simd_bit_cast<rebind_simd_t<_Up, _Vp>>(__v & __exponent_mask)
       >> (__digits_v<_Tp> - 1));
   }
 
@@ -700,11 +701,9 @@ template <typename _Tp, typename _Abi>
 	// (inf and NaN are excluded by -ffinite-math-only)
 	const auto __iszero_inf_nan = __x == 0;
 #else
-	const auto __as_int
-	  = __bit_cast<rebind_simd_t<__int_for_sizeof_t<_Tp>, _V>>(abs(__x));
-	const auto __inf
-	  = __bit_cast<rebind_simd_t<__int_for_sizeof_t<_Tp>, _V>>(
-	    _V(__infinity_v<_Tp>));
+	using _Ip = __int_for_sizeof_t<_Tp>;
+	const auto __as_int = simd_bit_cast<rebind_simd_t<_Ip, _V>>(abs(__x));
+	const auto __inf = simd_bit_cast<rebind_simd_t<_Ip, _V>>(_V(__infinity_v<_Tp>));
 	const auto __iszero_inf_nan = static_simd_cast<typename _V::mask_type>(
 	  __as_int == 0 || __as_int >= __inf);
 #endif
@@ -722,10 +721,10 @@ template <typename _Tp, typename _Abi>
 	where(__value_isnormal.__cvt(), __e) = __exponent_bits;
 	static_assert(sizeof(_IV) == sizeof(__value_isnormal));
 	const _IV __offset
-	  = (__bit_cast<_IV>(__value_isnormal) & _IV(__exp_adjust))
-	    | (__bit_cast<_IV>(static_simd_cast<_MaskType>(__exponent_bits == 0)
-			       & static_simd_cast<_MaskType>(__x != 0))
-	       & _IV(__exp_adjust + __exp_offset));
+	  = (simd_bit_cast<_IV>(__value_isnormal) & _IV(__exp_adjust))
+	      | (simd_bit_cast<_IV>(static_simd_cast<_MaskType>(__exponent_bits == 0)
+				      & static_simd_cast<_MaskType>(__x != 0))
+		   & _IV(__exp_adjust + __exp_offset));
 	*__exp = simd_cast<_Samesize<int, _V>>(__e - __offset);
 	return __mant;
       }
@@ -796,7 +795,7 @@ template <typename _Tp, typename _Abi>
 	  using namespace std::experimental::__proposed;
 	  using _IV = rebind_simd_t<
 	    conditional_t<sizeof(_Tp) == sizeof(_LLong), _LLong, int>, _V>;
-	  return (__bit_cast<_IV>(__v) >> (__digits_v<_Tp> - 1))
+	  return (simd_bit_cast<_IV>(__v) >> (__digits_v<_Tp> - 1))
 		 - (__max_exponent_v<_Tp> - 1);
 	};
 	_V __r = static_simd_cast<_V>(__exponent(abs_x));
@@ -981,6 +980,7 @@ template <typename _VV>
 	// Skylake-AVX512 (not even for SSE and AVX vectors, and really bad for
 	// AVX-512).
 	using namespace __float_bitwise_operators;
+	using namespace __proposed;
 	_V __absx = abs(__x);          // no error
 	_V __absy = abs(__y);          // no error
 	_V __hi = max(__absx, __absy); // no error
@@ -1028,9 +1028,9 @@ template <typename _VV>
 #ifdef __FAST_MATH__
 	    using _Ip = __int_for_sizeof_t<_Tp>;
 	    using _IV = rebind_simd_t<_Ip, _V>;
-	    const auto __as_int = __bit_cast<_IV>(__hi_exp);
+	    const auto __as_int = simd_bit_cast<_IV>(__hi_exp);
 	    const _V __scale
-	      = __bit_cast<_V>(2 * __bit_cast<_Ip>(_Tp(1)) - __as_int);
+	      = simd_bit_cast<_V>(2 * simd_bit_cast<_Ip>(_Tp(1)) - __as_int);
 #else
 	    const _V __scale = (__hi_exp ^ __inf) * _Tp(.5);
 #endif
@@ -1118,6 +1118,7 @@ _GLIBCXX_SIMD_CVTING2(hypot)
     else
       {
 	using namespace __float_bitwise_operators;
+	using namespace __proposed;
 	const _V __absx = abs(__x);                 // no error
 	const _V __absy = abs(__y);                 // no error
 	const _V __absz = abs(__z);                 // no error
@@ -1197,9 +1198,9 @@ _GLIBCXX_SIMD_CVTING2(hypot)
 #ifdef __FAST_MATH__
 		using _Ip = __int_for_sizeof_t<_Tp>;
 		using _IV = rebind_simd_t<_Ip, _V>;
-		const auto __as_int = __bit_cast<_IV>(__hi_exp);
+		const auto __as_int = simd_bit_cast<_IV>(__hi_exp);
 		const _V __scale
-		  = __bit_cast<_V>(2 * __bit_cast<_Ip>(_Tp(1)) - __as_int);
+		  = simd_bit_cast<_V>(2 * simd_bit_cast<_Ip>(_Tp(1)) - __as_int);
 #else
 		const _V __scale = (__hi_exp ^ __inf) * _Tp(.5);
 #endif
@@ -1306,12 +1307,6 @@ template <typename _Tp, typename _Abi>
       return std::copysign(__x[0], __y[0]);
     else if constexpr (__is_fixed_size_abi_v<_Abi>)
       return {__private_init, _Abi::_SimdImpl::_S_copysign(__data(__x), __data(__y))};
-    else if constexpr (is_same_v<_Tp, long double> && sizeof(_Tp) == 12)
-      // Remove this case once __bit_cast is implemented via __builtin_bit_cast.
-      // It is necessary, because __signmask below cannot be computed at compile
-      // time.
-      return simd<_Tp, _Abi>(
-	[&](auto __i) { return std::copysign(__x[__i], __y[__i]); });
     else
       {
 	using _V = simd<_Tp, _Abi>;
diff --git a/libstdc++-v3/testsuite/experimental/simd/tests/bits/test_values.h b/libstdc++-v3/testsuite/experimental/simd/tests/bits/test_values.h
index b69bd0b704d..67aa870659b 100644
--- a/libstdc++-v3/testsuite/experimental/simd/tests/bits/test_values.h
+++ b/libstdc++-v3/testsuite/experimental/simd/tests/bits/test_values.h
@@ -221,11 +221,11 @@ template <class V>
     if constexpr (sizeof(T) <= sizeof(double))
       {
 	using I = rebind_simd_t<__int_for_sizeof_t<T>, V>;
-	const I abs_x = __bit_cast<I>(abs(x));
-	const I min = __bit_cast<I>(V(std::__norm_min_v<T>));
-	const I max = __bit_cast<I>(V(std::__finite_max_v<T>));
+	const I abs_x = simd_bit_cast<I>(abs(x));
+	const I min = simd_bit_cast<I>(V(std::__norm_min_v<T>));
+	const I max = simd_bit_cast<I>(V(std::__finite_max_v<T>));
 	return static_simd_cast<typename V::mask_type>(
-		 __bit_cast<I>(x) == 0 || (abs_x >= min && abs_x <= max));
+		 simd_bit_cast<I>(x) == 0 || (abs_x >= min && abs_x <= max));
       }
     else
       {

* Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast
  2021-06-24 14:01     ` [PATCH 04/11 v3] " Matthias Kretz
@ 2021-06-24 14:08       ` Jakub Jelinek
  2021-06-24 14:11         ` Jonathan Wakely
  2021-06-25 11:23       ` Jonathan Wakely
  1 sibling, 1 reply; 29+ messages in thread
From: Jakub Jelinek @ 2021-06-24 14:08 UTC (permalink / raw)
  To: Matthias Kretz, Jonathan Wakely; +Cc: gcc-patches, libstdc++

On Thu, Jun 24, 2021 at 04:01:34PM +0200, Matthias Kretz wrote:
> --- a/libstdc++-v3/include/experimental/bits/simd.h
> +++ b/libstdc++-v3/include/experimental/bits/simd.h
> @@ -1598,7 +1598,9 @@ template <typename _To, typename _From>
>    _GLIBCXX_SIMD_INTRINSIC constexpr _To
>    __bit_cast(const _From __x)
>    {
> -    // TODO: implement with / replace by __builtin_bit_cast ASAP
> +#if __has_builtin(__builtin_bit_cast)

Shouldn't that use #if _GLIBCXX_HAS_BUILTIN(__builtin_bit_cast) in
c++config to define a new macro and use that macro here?
Though it is true that c++config already uses
#if __has_builtin(__builtin_is_constant_evaluated)
and so would fail miserably for compilers that don't support __has_builtin

	Jakub


* Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast
  2021-06-24 14:08       ` Jakub Jelinek
@ 2021-06-24 14:11         ` Jonathan Wakely
  2021-06-24 14:12           ` Jonathan Wakely
  2021-06-24 14:21           ` Jakub Jelinek
  0 siblings, 2 replies; 29+ messages in thread
From: Jonathan Wakely @ 2021-06-24 14:11 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Matthias Kretz, gcc Patches, libstdc++

On Thu, 24 Jun 2021 at 15:08, Jakub Jelinek wrote:
>
> On Thu, Jun 24, 2021 at 04:01:34PM +0200, Matthias Kretz wrote:
> > --- a/libstdc++-v3/include/experimental/bits/simd.h
> > +++ b/libstdc++-v3/include/experimental/bits/simd.h
> > @@ -1598,7 +1598,9 @@ template <typename _To, typename _From>
> >    _GLIBCXX_SIMD_INTRINSIC constexpr _To
> >    __bit_cast(const _From __x)
> >    {
> > -    // TODO: implement with / replace by __builtin_bit_cast ASAP
> > +#if __has_builtin(__builtin_bit_cast)
>
> Shouldn't that use #if _GLIBCXX_HAS_BUILTIN(__builtin_bit_cast) in
> c++config to define a new macro and use that macro here?
> Though it is true that c++config already uses
> #if __has_builtin(__builtin_is_constant_evaluated)
> and so would fail miserably for compilers that don't support __has_builtin

GCC was the last of our supported compilers to implement
__has_builtin, so for GCC trunk we can assume that it's always
supported.

The code in c++config.h still has some value for built-ins that aren't
called __builtin_xxx because older versions of Clang need different
handling for those. But for __builtin_bit_cast and
__builtin_is_constant_evaluated we can just use __is_builtin directly.
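
To make the distinction concrete, here is a minimal sketch of the c++config
pattern under discussion. SKETCH_HAS_BUILTIN, SKETCH_HAVE_BUILTIN_IS_SAME
and is_same_sketch are hypothetical stand-ins for the internal _GLIBCXX_
names, and the final "#else ... 0" branch is an addition of this sketch so
that it also preprocesses where __has_builtin is absent.

```cpp
#include <cassert>

#ifdef __has_builtin
# ifdef __is_identifier
// Intel and older Clang require !__is_identifier for some built-ins that
// are not named __builtin_xxx.
#  define SKETCH_HAS_BUILTIN(B) (__has_builtin(B) || !__is_identifier(B))
# else
#  define SKETCH_HAS_BUILTIN(B) __has_builtin(B)
# endif
#else
// Assumption of this sketch: without __has_builtin, report "not available".
# define SKETCH_HAS_BUILTIN(B) 0
#endif

// A built-in that is not called __builtin_xxx, so the wrapper still helps.
#if SKETCH_HAS_BUILTIN(__is_same)
# define SKETCH_HAVE_BUILTIN_IS_SAME 1
#endif

#undef SKETCH_HAS_BUILTIN

#ifdef SKETCH_HAVE_BUILTIN_IS_SAME
template <typename T, typename U>
  struct is_same_sketch
  { static constexpr bool value = __is_same(T, U); };
#else
// Portable fallback via partial specialization.
template <typename T, typename U>
  struct is_same_sketch
  { static constexpr bool value = false; };

template <typename T>
  struct is_same_sketch<T, T>
  { static constexpr bool value = true; };
#endif

static_assert(is_same_sketch<int, int>::value, "same type");
static_assert(!is_same_sketch<int, long>::value, "different types");

// Runtime probe, so the result can also be checked with assert.
inline bool
is_same_sketch_probe()
{ return is_same_sketch<char, char>::value && !is_same_sketch<char, int>::value; }
```

For names of the form __builtin_xxx the wrapper adds nothing, so a plain
"#if __has_builtin(__builtin_xxx)" at the point of use is enough.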


* Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast
  2021-06-24 14:11         ` Jonathan Wakely
@ 2021-06-24 14:12           ` Jonathan Wakely
  2021-06-24 14:21           ` Jakub Jelinek
  1 sibling, 0 replies; 29+ messages in thread
From: Jonathan Wakely @ 2021-06-24 14:12 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Matthias Kretz, gcc Patches, libstdc++

On Thu, 24 Jun 2021 at 15:11, Jonathan Wakely wrote:
>
> On Thu, 24 Jun 2021 at 15:08, Jakub Jelinek wrote:
> >
> > On Thu, Jun 24, 2021 at 04:01:34PM +0200, Matthias Kretz wrote:
> > > --- a/libstdc++-v3/include/experimental/bits/simd.h
> > > +++ b/libstdc++-v3/include/experimental/bits/simd.h
> > > @@ -1598,7 +1598,9 @@ template <typename _To, typename _From>
> > >    _GLIBCXX_SIMD_INTRINSIC constexpr _To
> > >    __bit_cast(const _From __x)
> > >    {
> > > -    // TODO: implement with / replace by __builtin_bit_cast ASAP
> > > +#if __has_builtin(__builtin_bit_cast)
> >
> > Shouldn't that use #if _GLIBCXX_HAS_BUILTIN(__builtin_bit_cast) in
> > c++config to define a new macro and use that macro here?
> > Though it is true that c++config already uses
> > #if __has_builtin(__builtin_is_constant_evaluated)
> > and so would fail miserably for compilers that don't support __has_builtin
>
> GCC was the last of our supported compilers to implement
> __has_builtin, so for GCC trunk we can assume that it's always
> supported.
>
> The code in c++config.h still has some value for built-ins that aren't
> called __builtin_xxx because older versions of Clang need different
> handling for those. But for __builtin_bit_cast and
> __builtin_is_constant_evaluated we can just use __is_builtin directly.

s/__is_builtin/__has_builtin/


* Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast
  2021-06-24 14:11         ` Jonathan Wakely
  2021-06-24 14:12           ` Jonathan Wakely
@ 2021-06-24 14:21           ` Jakub Jelinek
  2021-06-24 14:34             ` Jonathan Wakely
  1 sibling, 1 reply; 29+ messages in thread
From: Jakub Jelinek @ 2021-06-24 14:21 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: Matthias Kretz, gcc Patches, libstdc++

On Thu, Jun 24, 2021 at 03:11:01PM +0100, Jonathan Wakely wrote:
> On Thu, 24 Jun 2021 at 15:08, Jakub Jelinek wrote:
> >
> > On Thu, Jun 24, 2021 at 04:01:34PM +0200, Matthias Kretz wrote:
> > > --- a/libstdc++-v3/include/experimental/bits/simd.h
> > > +++ b/libstdc++-v3/include/experimental/bits/simd.h
> > > @@ -1598,7 +1598,9 @@ template <typename _To, typename _From>
> > >    _GLIBCXX_SIMD_INTRINSIC constexpr _To
> > >    __bit_cast(const _From __x)
> > >    {
> > > -    // TODO: implement with / replace by __builtin_bit_cast ASAP
> > > +#if __has_builtin(__builtin_bit_cast)
> >
> > Shouldn't that use #if _GLIBCXX_HAS_BUILTIN(__builtin_bit_cast) in
> > c++config to define a new macro and use that macro here?
> > Though it is true that c++config already uses
> > #if __has_builtin(__builtin_is_constant_evaluated)
> > and so would fail miserably for compilers that don't support __has_builtin
> 
> GCC was the last of our supported compilers to implement
> __has_builtin, so for GCC trunk we can assume that it's always
> supported.

We don't support mixing GCC and libstdc++ versions, so I'm not worried
about GCC.  At least according to godbolt, clang 3.0, which is 10 years
old, already supports it, so that is probably fine too.  But ICC 19.0/19.1
still doesn't support it; only ICC 2021 does.  And ICC 19.1 seems to have
been released in October 2020.

So, wouldn't it be better not to #undef _GLIBCXX_HAS_BUILTIN, move its
definition a little bit earlier and use it also for
__builtin_is_constant_evaluated?

	Jakub


* Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast
  2021-06-24 14:21           ` Jakub Jelinek
@ 2021-06-24 14:34             ` Jonathan Wakely
  2021-06-24 14:40               ` Jonathan Wakely
  0 siblings, 1 reply; 29+ messages in thread
From: Jonathan Wakely @ 2021-06-24 14:34 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Matthias Kretz, gcc Patches, libstdc++

[-- Attachment #1: Type: text/plain, Size: 2358 bytes --]

On Thu, 24 Jun 2021 at 15:21, Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Thu, Jun 24, 2021 at 03:11:01PM +0100, Jonathan Wakely wrote:
> > On Thu, 24 Jun 2021 at 15:08, Jakub Jelinek wrote:
> > >
> > > On Thu, Jun 24, 2021 at 04:01:34PM +0200, Matthias Kretz wrote:
> > > > --- a/libstdc++-v3/include/experimental/bits/simd.h
> > > > +++ b/libstdc++-v3/include/experimental/bits/simd.h
> > > > @@ -1598,7 +1598,9 @@ template <typename _To, typename _From>
> > > >    _GLIBCXX_SIMD_INTRINSIC constexpr _To
> > > >    __bit_cast(const _From __x)
> > > >    {
> > > > -    // TODO: implement with / replace by __builtin_bit_cast ASAP
> > > > +#if __has_builtin(__builtin_bit_cast)
> > >
> > > Shouldn't that use #if _GLIBCXX_HAS_BUILTIN(__builtin_bit_cast) in
> > > c++config to define a new macro and use that macro here?
> > > Though it is true that c++config already uses
> > > #if __has_builtin(__builtin_is_constant_evaluated)
> > > and so would fail miserably for compilers that don't support __has_builtin
> >
> > GCC was the last of our supported compilers to implement
> > __has_builtin, so for GCC trunk we can assume that it's always
> > supported.
>
> We don't support mixing GCC and libstdc++ versions, so I'm not worried
> about GCC.  At least according to godbolt, clang 3.0, which is 10 years
> old, already supports it, so that is probably fine too.  But ICC 19.0/19.1
> still doesn't support it; only ICC 2021 does.  And ICC 19.1 seems to have
> been released in October 2020.
>
> So, wouldn't it be better not to #undef _GLIBCXX_HAS_BUILTIN, move its
> definition a little bit earlier and use it also for
> __builtin_is_constant_evaluated?

I discussed this with Judy Ward on the Intel compiler team. If you're
using their compiler, you should be using the latest version. They
also claim 100% compatibility with GCC, for versions they've been able
to test. So if you are using libstdc++ headers from a GCC release that
supports __has_builtin, then you need to use a release of the Intel
compiler that supports __has_builtin. Otherwise, it's unsupported. So
in GCC 12 C++ headers we support GCC 12, versions of Intel compatible
with GCC 12, and the last few releases of Clang. All of those have
__has_builtin.

Rather than use the _GLIBCXX_HAS_BUILTIN macro more widely, I'd prefer
to not use it where it isn't needed, as in the attached (untested)
patch.
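
For readers who want to try the pattern outside the library, a minimal
sketch of the pointer comparison from stl_function.h with the direct
__has_builtin test might look like this. The names ptr_greater_sketch,
SKETCH_CONSTANT_EVALUATED and SKETCH_CONSTEXPR are hypothetical; the outer
defined(__has_builtin) check and the non-constexpr fallback are extras of
the sketch that the library headers would not need under the support policy
described above.

```cpp
#include <cassert>
#include <cstdint>

// Direct feature test. The nested #if keeps preprocessors without
// __has_builtin from parsing the __has_builtin(...) expression.
#if __cplusplus >= 201402L && defined(__has_builtin)
# if __has_builtin(__builtin_is_constant_evaluated)
#  define SKETCH_CONSTANT_EVALUATED 1
# endif
#endif

#ifdef SKETCH_CONSTANT_EVALUATED
# define SKETCH_CONSTEXPR constexpr
#else
// Without the built-in the cast below can never be a constant expression,
// so the sketch does not claim constexpr.
# define SKETCH_CONSTEXPR inline
#endif

template <typename T>
  SKETCH_CONSTEXPR bool
  ptr_greater_sketch(T* x, T* y) noexcept
  {
#ifdef SKETCH_CONSTANT_EVALUATED
    // During constant evaluation use the built-in comparison, because the
    // integer cast below is not allowed in a constant expression.
    if (__builtin_is_constant_evaluated())
      return x > y;
#endif
    // At run time compare the addresses as integers, as stl_function.h does.
    return (std::uintptr_t)x > (std::uintptr_t)y;
  }

// Small runtime check of the sketch.
inline void
ptr_greater_sketch_check()
{
  int a[2] = {0, 0};
  assert(ptr_greater_sketch(&a[1], &a[0]));
  assert(!ptr_greater_sketch(&a[0], &a[1]));
  assert(!ptr_greater_sketch(&a[0], &a[0]));
}
```

Note that the built-in is tested with a plain "if", not "if constexpr";
inside "if constexpr" the condition is itself constant-evaluated and would
always be true.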

[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 12043 bytes --]

diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index 9911d4deb72..3c075966660 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -55,7 +55,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #ifdef __cpp_lib_is_constant_evaluated
 // Support P1032R1 in C++20 (but not P0980R1 yet).
 # define __cpp_lib_constexpr_string 201811L
-#elif __cplusplus >= 201703L && _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+#elif __cplusplus >= 201703L && __has_builtin(__builtin_is_constant_evaluated)
 // Support P0426R1 changes to char_traits in C++17.
 # define __cpp_lib_constexpr_string 201611L
 #elif __cplusplus > 201703L
diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config
index 9314117aed8..3ec668b65cf 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -720,13 +720,11 @@ namespace std
 # define _GLIBCXX_DOUBLE_IS_IEEE_BINARY64 1
 #endif
 
-#ifdef __has_builtin
-# ifdef __is_identifier
+#ifdef __is_identifier
 // Intel and older Clang require !__is_identifier for some built-ins:
-#  define _GLIBCXX_HAS_BUILTIN(B) __has_builtin(B) || ! __is_identifier(B)
-# else
-#  define _GLIBCXX_HAS_BUILTIN(B) __has_builtin(B)
-# endif
+# define _GLIBCXX_HAS_BUILTIN(B) __has_builtin(B) || ! __is_identifier(B)
+#else
+# define _GLIBCXX_HAS_BUILTIN(B) __has_builtin(B)
 #endif
 
 #if _GLIBCXX_HAS_BUILTIN(__has_unique_object_representations)
@@ -737,18 +735,10 @@ namespace std
 # define _GLIBCXX_HAVE_BUILTIN_IS_AGGREGATE 1
 #endif
 
-#if _GLIBCXX_HAS_BUILTIN(__builtin_is_constant_evaluated)
-#  define _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED 1
-#endif
-
 #if _GLIBCXX_HAS_BUILTIN(__is_same)
 #  define _GLIBCXX_HAVE_BUILTIN_IS_SAME 1
 #endif
 
-#if _GLIBCXX_HAS_BUILTIN(__builtin_launder)
-# define _GLIBCXX_HAVE_BUILTIN_LAUNDER 1
-#endif
-
 #undef _GLIBCXX_HAS_BUILTIN
 
 
diff --git a/libstdc++-v3/include/bits/char_traits.h b/libstdc++-v3/include/bits/char_traits.h
index 3da6e28a513..77ad7be5dfb 100644
--- a/libstdc++-v3/include/bits/char_traits.h
+++ b/libstdc++-v3/include/bits/char_traits.h
@@ -238,7 +238,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #ifdef __cpp_lib_is_constant_evaluated
 // Unofficial macro indicating P1032R1 support in C++20
 # define __cpp_lib_constexpr_char_traits 201811L
-#elif __cplusplus >= 201703L && _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+#elif __cplusplus >= 201703L && __has_builtin(__builtin_is_constant_evaluated)
 // Unofficial macro indicating P0426R1 support in C++17
 # define __cpp_lib_constexpr_char_traits 201611L
 #endif
@@ -295,7 +295,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       {
 	if (__n == 0)
 	  return 0;
-#if __cplusplus >= 201703L && _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+#if __cplusplus >= 201703L && __has_builtin(__builtin_is_constant_evaluated)
 	if (__builtin_is_constant_evaluated())
 	  {
 	    for (size_t __i = 0; __i < __n; ++__i)
@@ -312,7 +312,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       static _GLIBCXX17_CONSTEXPR size_t
       length(const char_type* __s)
       {
-#if __cplusplus >= 201703L && _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+#if __cplusplus >= 201703L && __has_builtin(__builtin_is_constant_evaluated)
 	if (__builtin_is_constant_evaluated())
 	  return __gnu_cxx::char_traits<char_type>::length(__s);
 #endif
@@ -324,7 +324,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       {
 	if (__n == 0)
 	  return 0;
-#if __cplusplus >= 201703L && _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+#if __cplusplus >= 201703L && __has_builtin(__builtin_is_constant_evaluated)
 	if (__builtin_is_constant_evaluated())
 	  return __gnu_cxx::char_traits<char_type>::find(__s, __n, __a);
 #endif
@@ -422,7 +422,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       {
 	if (__n == 0)
 	  return 0;
-#if __cplusplus >= 201703L && _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+#if __cplusplus >= 201703L && __has_builtin(__builtin_is_constant_evaluated)
 	if (__builtin_is_constant_evaluated())
 	  return __gnu_cxx::char_traits<char_type>::compare(__s1, __s2, __n);
 #endif
@@ -432,7 +432,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       static _GLIBCXX17_CONSTEXPR size_t
       length(const char_type* __s)
       {
-#if __cplusplus >= 201703L && _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+#if __cplusplus >= 201703L && __has_builtin(__builtin_is_constant_evaluated)
 	if (__builtin_is_constant_evaluated())
 	  return __gnu_cxx::char_traits<char_type>::length(__s);
 #endif
@@ -444,7 +444,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       {
 	if (__n == 0)
 	  return 0;
-#if __cplusplus >= 201703L && _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+#if __cplusplus >= 201703L && __has_builtin(__builtin_is_constant_evaluated)
 	if (__builtin_is_constant_evaluated())
 	  return __gnu_cxx::char_traits<char_type>::find(__s, __n, __a);
 #endif
@@ -539,7 +539,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       {
 	if (__n == 0)
 	  return 0;
-#if __cplusplus >= 201703L && _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+#if __cplusplus >= 201703L && __has_builtin(__builtin_is_constant_evaluated)
 	if (__builtin_is_constant_evaluated())
 	  return __gnu_cxx::char_traits<char_type>::compare(__s1, __s2, __n);
 #endif
@@ -549,7 +549,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       static _GLIBCXX17_CONSTEXPR size_t
       length(const char_type* __s)
       {
-#if __cplusplus >= 201703L && _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+#if __cplusplus >= 201703L && __has_builtin(__builtin_is_constant_evaluated)
 	if (__builtin_is_constant_evaluated())
 	  return __gnu_cxx::char_traits<char_type>::length(__s);
 #endif
@@ -564,7 +564,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       {
 	if (__n == 0)
 	  return 0;
-#if __cplusplus >= 201703L && _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+#if __cplusplus >= 201703L && __has_builtin(__builtin_is_constant_evaluated)
 	if (__builtin_is_constant_evaluated())
 	  return __gnu_cxx::char_traits<char_type>::find(__s, __n, __a);
 #endif
diff --git a/libstdc++-v3/include/bits/stl_function.h b/libstdc++-v3/include/bits/stl_function.h
index 073018d522d..774a9829284 100644
--- a/libstdc++-v3/include/bits/stl_function.h
+++ b/libstdc++-v3/include/bits/stl_function.h
@@ -413,12 +413,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       _GLIBCXX14_CONSTEXPR bool
       operator()(_Tp* __x, _Tp* __y) const _GLIBCXX_NOTHROW
       {
-#if __cplusplus >= 201402L
-#ifdef _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+#if __cplusplus >= 201402L && __has_builtin(__builtin_is_constant_evaluated)
 	if (__builtin_is_constant_evaluated())
-#else
-	if (__builtin_constant_p(__x > __y))
-#endif
 	  return __x > __y;
 #endif
 	return (__UINTPTR_TYPE__)__x > (__UINTPTR_TYPE__)__y;
@@ -432,12 +428,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       _GLIBCXX14_CONSTEXPR bool
       operator()(_Tp* __x, _Tp* __y) const _GLIBCXX_NOTHROW
       {
-#if __cplusplus >= 201402L
-#ifdef _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+#if __cplusplus >= 201402L && __has_builtin(__builtin_is_constant_evaluated)
 	if (__builtin_is_constant_evaluated())
-#else
-	if (__builtin_constant_p(__x < __y))
-#endif
 	  return __x < __y;
 #endif
 	return (__UINTPTR_TYPE__)__x < (__UINTPTR_TYPE__)__y;
@@ -451,12 +443,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       _GLIBCXX14_CONSTEXPR bool
       operator()(_Tp* __x, _Tp* __y) const _GLIBCXX_NOTHROW
       {
-#if __cplusplus >= 201402L
-#ifdef _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+#if __cplusplus >= 201402L && __has_builtin(__builtin_is_constant_evaluated)
 	if (__builtin_is_constant_evaluated())
-#else
-	if (__builtin_constant_p(__x >= __y))
-#endif
 	  return __x >= __y;
 #endif
 	return (__UINTPTR_TYPE__)__x >= (__UINTPTR_TYPE__)__y;
@@ -470,12 +458,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       _GLIBCXX14_CONSTEXPR bool
       operator()(_Tp* __x, _Tp* __y) const _GLIBCXX_NOTHROW
       {
-#if __cplusplus >= 201402L
-#ifdef _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+#if __cplusplus >= 201402L && __has_builtin(__builtin_is_constant_evaluated)
 	if (__builtin_is_constant_evaluated())
-#else
-	if (__builtin_constant_p(__x <= __y))
-#endif
 	  return __x <= __y;
 #endif
 	return (__UINTPTR_TYPE__)__x <= (__UINTPTR_TYPE__)__y;
diff --git a/libstdc++-v3/include/debug/helper_functions.h b/libstdc++-v3/include/debug/helper_functions.h
index c0144ced979..c54311a22d1 100644
--- a/libstdc++-v3/include/debug/helper_functions.h
+++ b/libstdc++-v3/include/debug/helper_functions.h
@@ -125,7 +125,7 @@ namespace __gnu_debug
     __check_singular(_Iterator const& __x)
     {
       return
-#ifdef _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+#if __has_builtin(__builtin_is_constant_evaluated)
 	__builtin_is_constant_evaluated() ? false :
 #endif
 	__check_singular_aux(std::__addressof(__x));
@@ -138,7 +138,7 @@ namespace __gnu_debug
     __check_singular(_Tp* const& __ptr)
     {
       return
-#ifdef _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+#if __has_builtin(__builtin_is_constant_evaluated)
 	__builtin_is_constant_evaluated() ? false :
 #endif
 	__ptr == 0;
diff --git a/libstdc++-v3/include/std/bit b/libstdc++-v3/include/std/bit
index c5aae8bab03..ee8e001fd44 100644
--- a/libstdc++-v3/include/std/bit
+++ b/libstdc++-v3/include/std/bit
@@ -265,7 +265,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       // representable as a value of _Tp, and so the result is undefined.
       // Want that undefined behaviour to be detected in constant expressions,
       // by UBSan, and by debug assertions.
-#ifdef _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+#if __has_builtin(__builtin_is_constant_evaluated)
       if (!__builtin_is_constant_evaluated())
 	{
 	  __glibcxx_assert( __shift_exponent != __int_traits<_Tp>::__digits );
diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits
index d9068a06f08..95a60e406a8 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -3316,7 +3316,7 @@ template <typename _From, typename _To>
     inline constexpr bool is_scoped_enum_v = is_scoped_enum<_Tp>::value;
 #endif // C++23
 
-#ifdef _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+#if __has_builtin(__builtin_is_constant_evaluated)
 
 #define __cpp_lib_is_constant_evaluated 201811L
 
diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version
index 27bcd32cb60..3bb50d37a72 100644
--- a/libstdc++-v3/include/std/version
+++ b/libstdc++-v3/include/std/version
@@ -111,7 +111,7 @@
 #endif
 #define __cpp_lib_is_invocable 201703
 #define __cpp_lib_is_swappable 201603
-#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER
+#if __has_builtin(__builtin_launder)
 # define __cpp_lib_launder 201606
 #endif
 #define __cpp_lib_logical_traits 201510
@@ -130,7 +130,7 @@
 #define __cpp_lib_chrono 201611
 #define __cpp_lib_clamp 201603
 #if __cplusplus == 201703L // N.B. updated value in C++20
-# if _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+# if __has_builtin(__builtin_is_constant_evaluated)
 #  define __cpp_lib_constexpr_char_traits 201611L
 #  define __cpp_lib_constexpr_string 201611L
 # endif
@@ -188,7 +188,7 @@
 #endif
 #define __cpp_lib_endian 201907L
 #define __cpp_lib_int_pow2 202002L
-#ifdef _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED
+#if __has_builtin(__builtin_is_constant_evaluated)
 # define __cpp_lib_is_constant_evaluated 201811L
 #endif
 #define __cpp_lib_is_nothrow_convertible 201806L
diff --git a/libstdc++-v3/libsupc++/new b/libstdc++-v3/libsupc++/new
index 3349b13fd1b..8774b333b90 100644
--- a/libstdc++-v3/libsupc++/new
+++ b/libstdc++-v3/libsupc++/new
@@ -182,8 +182,7 @@ inline void operator delete[](void*, void*) _GLIBCXX_USE_NOEXCEPT { }
 //@}
 } // extern "C++"
 
-#if __cplusplus >= 201703L
-#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER
+#if __cplusplus >= 201703L && __has_builtin(__builtin_launder)
 namespace std
 {
 #define __cpp_lib_launder 201606
@@ -206,7 +205,6 @@ namespace std
   void launder(volatile void*) = delete;
   void launder(const volatile void*) = delete;
 }
-#endif // _GLIBCXX_HAVE_BUILTIN_LAUNDER
 #endif // C++17
 
 #if __cplusplus > 201703L


* Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast
  2021-06-24 14:34             ` Jonathan Wakely
@ 2021-06-24 14:40               ` Jonathan Wakely
  2021-06-24 14:44                 ` Jakub Jelinek
  0 siblings, 1 reply; 29+ messages in thread
From: Jonathan Wakely @ 2021-06-24 14:40 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Matthias Kretz, gcc Patches, libstdc++

On Thu, 24 Jun 2021 at 15:34, Jonathan Wakely wrote:
> Rather than use the _GLIBCXX_HAS_BUILTIN macro more widely, I'd prefer
> to not use it where it isn't needed, as in the attached (untested)
> patch.

My rationale for this is that I'd prefer to use standardized features
like __has_include and __has_cpp_attribute where possible, instead of
adding more and more configure macros. You don't need to look in
c++config.h to see how the macro is defined if you just use a standard
feature directly.

__has_builtin obviously isn't standardized, but as long as it's
available on all the compilers we care about (which it is) then the
same rationale applies.
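
As an illustration of the pattern the patch above switches to, here is a minimal sketch (not the libstdc++ code). The helper name my_length and the fallback definition of __has_builtin for compilers that lack it are assumptions made for this example only; it needs C++14 or later because of the loop in a constexpr function.

```cpp
#include <cstddef>
#include <cstring>

// Assumption for this sketch: on compilers without __has_builtin,
// treat every query as "builtin not available".
#ifndef __has_builtin
# define __has_builtin(x) 0
#endif

// Hypothetical helper, not part of libstdc++.
// At run time it defers to strlen; during constant evaluation it
// falls through to a plain loop the compiler can evaluate.
constexpr std::size_t
my_length(const char* s)
{
#if __has_builtin(__builtin_is_constant_evaluated)
  if (!__builtin_is_constant_evaluated())
    return std::strlen(s);
#endif
  std::size_t n = 0;
  while (s[n] != '\0')
    ++n;
  return n;
}

// Forces constant evaluation, so the loop branch is the one used here.
static_assert(my_length("simd") == 4, "length computed at compile time");
```

The feature test sits directly at the point of use, so a reader does not have to look up a configure-time macro to see when the branch is enabled.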


* Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast
  2021-06-24 14:40               ` Jonathan Wakely
@ 2021-06-24 14:44                 ` Jakub Jelinek
  0 siblings, 0 replies; 29+ messages in thread
From: Jakub Jelinek @ 2021-06-24 14:44 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: Matthias Kretz, gcc Patches, libstdc++

On Thu, Jun 24, 2021 at 03:40:09PM +0100, Jonathan Wakely wrote:
> On Thu, 24 Jun 2021 at 15:34, Jonathan Wakely wrote:
> > Rather than use the _GLIBCXX_HAS_BUILTIN macro more widely, I'd prefer
> > to not use it where it isn't needed, as in the attached (untested)
> > patch.
> 
> My rationale for this is that I'd prefer to use standardized features
> like __has_include and __has_cpp_attribute where possible, instead of
> adding more and more configure macros. You don't need to look in
> c++config.h to see how the macro is defined if you just use a standard
> feature directly.
> 
> __has_builtin obviously isn't standardized, but as long as it's
> available on all the compilers we care about (which it is) then the
> same rationale applies.

Okay.

	Jakub



* Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast
  2021-06-24 14:01     ` [PATCH 04/11 v3] " Matthias Kretz
  2021-06-24 14:08       ` Jakub Jelinek
@ 2021-06-25 11:23       ` Jonathan Wakely
  1 sibling, 0 replies; 29+ messages in thread
From: Jonathan Wakely @ 2021-06-25 11:23 UTC (permalink / raw)
  To: Matthias Kretz; +Cc: gcc Patches, libstdc++

On Thu, 24 Jun 2021 at 15:02, Matthias Kretz wrote:
>
> With -ffast-math, a "using namespace __proposed" was still missing. The
> attached patch resolves the issue.

OK for trunk, please push (after adding yourself to the "Write After
Approval" section of MAINTAINERS as per
https://gcc.gnu.org/gitwrite.html as your first commit).

Thanks!


> From: Matthias Kretz <m.kretz@gsi.de>
>
> The __bit_cast function was a hack to achieve what __builtin_bit_cast
> can do, therefore use __builtin_bit_cast if possible. However,
> __builtin_bit_cast cannot be used to cast from/to fixed_size_simd, since
> it isn't trivially copyable (in the language sense — in principle it
> is). Therefore add __proposed::simd_bit_cast to enable the use case
> required in the test framework.
>
> Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
>
> libstdc++-v3/ChangeLog:
>
>         * include/experimental/bits/simd.h (__bit_cast): Implement via
>         __builtin_bit_cast #if available.
>         (__proposed::simd_bit_cast): Add overloads for simd and
>         simd_mask that use __builtin_bit_cast (or __bit_cast if it is
>         not available) and return an object of the requested type with
>         the same bits as the argument.
>         * include/experimental/bits/simd_math.h: Use simd_bit_cast
>         instead of __bit_cast to allow casts to fixed_size_simd.
>         (copysign): Remove branch that was only required if __bit_cast
>         cannot be constexpr.
>         * testsuite/experimental/simd/tests/bits/test_values.h: Switch
>         from __bit_cast to __proposed::simd_bit_cast since the former
>         will not cast fixed_size objects anymore.
> ---
>  libstdc++-v3/include/experimental/bits/simd.h | 57 ++++++++++++++++++-
>  .../include/experimental/bits/simd_math.h     | 37 ++++++------
>  .../simd/tests/bits/test_values.h             |  8 +--
>  3 files changed, 76 insertions(+), 26 deletions(-)
>
>
> --
> ──────────────────────────────────────────────────────────────────────────
>  Dr. Matthias Kretz                           https://mattkretz.github.io
>  GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
>  std::experimental::simd              https://github.com/VcDevel/std-simd
> ──────────────────────────────────────────────────────────────────────────
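
The commit message quoted above prefers __builtin_bit_cast over a hand-written cast. The following is a rough sketch of that idea, not the actual simd.h implementation: my_bit_cast is a hypothetical name, and the memcpy fallback (which additionally assumes that To is default-constructible) stands in for whatever the library does when the builtin is missing.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Assumption for this sketch: no __has_builtin means "not available".
#ifndef __has_builtin
# define __has_builtin(x) 0
#endif

// Hypothetical stand-in for an internal bit-cast helper.
template <typename To, typename From>
#if __has_builtin(__builtin_bit_cast)
constexpr	// the builtin can be evaluated at compile time
#else
inline		// memcpy cannot, so this path is run-time only
#endif
To
my_bit_cast(const From& from) noexcept
{
  static_assert(sizeof(To) == sizeof(From), "sizes must match");
  // This is the requirement that rules out types which are not
  // trivially copyable in the language sense.
  static_assert(std::is_trivially_copyable<To>::value
		  && std::is_trivially_copyable<From>::value,
		"both types must be trivially copyable");
#if __has_builtin(__builtin_bit_cast)
  return __builtin_bit_cast(To, from);
#else
  To to;
  std::memcpy(&to, &from, sizeof(To));
  return to;
#endif
}

#if __has_builtin(__builtin_bit_cast)
// Assumes float is IEEE 754 binary32.
static_assert(my_bit_cast<std::uint32_t>(1.0f) == 0x3f800000u,
	      "bit pattern of 1.0f");
#endif
```

The trivially-copyable check shows why a separate cast function is needed for types that fail it, which is the role the commit message gives to __proposed::simd_bit_cast.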



* Re: [PATCH 11/11] libstdc++: Fix ODR issues with different -m flags
  2021-06-08 12:12 ` [PATCH 11/11] libstdc++: Fix ODR issues with different -m flags Matthias Kretz
  2021-06-09 12:22   ` Richard Biener
@ 2021-11-15  8:57   ` Matthias Kretz
  2022-01-14 21:30     ` Jonathan Wakely
  1 sibling, 1 reply; 29+ messages in thread
From: Matthias Kretz @ 2021-11-15  8:57 UTC (permalink / raw)
  To: gcc-patches, libstdc++, Jonathan Wakely

ping. OK to push?

On Tuesday, 8 June 2021 14:12:23 CET Matthias Kretz wrote:
> From: Matthias Kretz <kretz@kde.org>
> 
> Explicitly support use of the stdx::simd implementation in situations
> where the user links TUs that were compiled with different -m flags. In
> general, this is always a (quasi) ODR violation for inline functions
> because at least codegen may differ in important ways. However, the
> resulting executable might use only one of the definitions (it is
> unspecified which one). For simd we want to let users compile the same
> code multiple times with different -m flags and dispatch at run time to
> the TU matching the target CPU. But if internal functions are not
> inlined, this may lead to unexpected performance loss or execution of
> illegal instructions. Therefore, inline functions that are not marked
> as always_inline must use an additional template parameter somewhere in
> their name, to disambiguate between the different -m translations.
> 
> Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
> 
> libstdc++-v3/ChangeLog:
> 
> 	* include/experimental/bits/simd.h: Move feature detection bools
> 	and add __have_avx512bitalg, __have_avx512vbmi2,
> 	__have_avx512vbmi, __have_avx512ifma, __have_avx512cd,
> 	__have_avx512vnni, __have_avx512vpopcntdq.
> 	(__detail::__machine_flags): New function which returns a unique
> 	uint64 depending on relevant -m and -f flags.
> 	(__detail::__odr_helper): New type alias for either an anonymous
> 	type or a type specialized with the __machine_flags number.
> 	(_SimdIntOperators): Change template parameters from _Impl to
> 	_Tp, _Abi because _Impl now has an __odr_helper parameter which
> 	may be _OdrEnforcer from the anonymous namespace, which makes
> 	for a bad base class.
> 	(many): Either add __odr_helper template parameter or mark as
> 	always_inline.
> 	* include/experimental/bits/simd_detail.h: Add defines for
> 	AVX512BITALG, AVX512VBMI2, AVX512VBMI, AVX512IFMA, AVX512CD,
> 	AVX512VNNI, AVX512VPOPCNTDQ, and AVX512VP2INTERSECT.
> 	* include/experimental/bits/simd_builtin.h: Add __odr_helper
> 	template parameter or mark as always_inline.
> 	* include/experimental/bits/simd_fixed_size.h: Ditto.
> 	* include/experimental/bits/simd_math.h: Ditto.
> 	* include/experimental/bits/simd_scalar.h: Ditto.
> 	* include/experimental/bits/simd_neon.h: Add __odr_helper
> 	template parameter.
> 	* include/experimental/bits/simd_ppc.h: Ditto.
> 	* include/experimental/bits/simd_x86.h: Ditto.
> ---
>  libstdc++-v3/include/experimental/bits/simd.h | 380 ++++++++++++------
>  .../include/experimental/bits/simd_builtin.h  |  41 +-
>  .../include/experimental/bits/simd_detail.h   |  40 ++
>  .../experimental/bits/simd_fixed_size.h       |  39 +-
>  .../include/experimental/bits/simd_math.h     |  45 ++-
>  .../include/experimental/bits/simd_neon.h     |   4 +-
>  .../include/experimental/bits/simd_ppc.h      |   4 +-
>  .../include/experimental/bits/simd_scalar.h   |  71 +++-
>  .../include/experimental/bits/simd_x86.h      |   4 +-
>  9 files changed, 440 insertions(+), 188 deletions(-)
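
The technique described in the quoted commit message can be sketched roughly as follows. All names here (machine_flags, OdrTag, odr_helper, twice) are hypothetical and chosen for illustration; the real __machine_flags covers many more -m and -f options, and the ChangeLog also mentions an anonymous-namespace variant that this sketch leaves out.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical: fold a few ISA-related predefined macros into one number.
// TUs compiled with different -m flags get different values.
constexpr std::uint64_t
machine_flags()
{
  return std::uint64_t(0)
#ifdef __SSE2__
    | (std::uint64_t(1) << 0)
#endif
#ifdef __AVX__
    | (std::uint64_t(1) << 1)
#endif
#ifdef __AVX2__
    | (std::uint64_t(1) << 2)
#endif
#ifdef __AVX512F__
    | (std::uint64_t(1) << 3)
#endif
    ;
}

// A tag type specialized with the flags number.
template <std::uint64_t Flags>
  struct OdrTag
  {};

using odr_helper = OdrTag<machine_flags()>;

// The defaulted tag is part of the template arguments and therefore of
// the mangled name. Instantiations from TUs with different flags become
// distinct symbols, so the linker cannot merge them into one definition.
template <typename T, typename = odr_helper>
  constexpr T
  twice(T x)
  { return x + x; }

static_assert(std::is_same<odr_helper, OdrTag<machine_flags()>>::value,
	      "tag depends only on the flags of this TU");
static_assert(twice(21) == 42, "callers do not have to spell out the tag");
```

Callers never name the tag, so the call sites stay unchanged; only the symbol names differ between translations.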

-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 stdₓ::simd
──────────────────────────────────────────────────────────────────────────





* Re: [PATCH 11/11] libstdc++: Fix ODR issues with different -m flags
  2021-11-15  8:57   ` Matthias Kretz
@ 2022-01-14 21:30     ` Jonathan Wakely
  2022-01-17  0:08       ` Jonathan Wakely
  0 siblings, 1 reply; 29+ messages in thread
From: Jonathan Wakely @ 2022-01-14 21:30 UTC (permalink / raw)
  To: Matthias Kretz; +Cc: gcc Patches, libstdc++

On Mon, 15 Nov 2021 at 08:57, Matthias Kretz <m.kretz@gsi.de> wrote:

> ping. OK to push?
>

Sorry for the delay - this is OK for trunk.





* Re: [PATCH 11/11] libstdc++: Fix ODR issues with different -m flags
  2022-01-14 21:30     ` Jonathan Wakely
@ 2022-01-17  0:08       ` Jonathan Wakely
  0 siblings, 0 replies; 29+ messages in thread
From: Jonathan Wakely @ 2022-01-17  0:08 UTC (permalink / raw)
  To: Matthias Kretz; +Cc: gcc Patches, libstdc++

On Fri, 14 Jan 2022 at 21:30, Jonathan Wakely <jwakely@redhat.com> wrote:

>
>
> On Mon, 15 Nov 2021 at 08:57, Matthias Kretz <m.kretz@gsi.de> wrote:
>
>> ping. OK to push?
>>
>
> Sorry for the delay - this is OK for trunk.
>

I see a new failure on powerpc64le-linux (gcc112 in the cfarm) after this
commit:

FAIL: experimental/simd/standard_abi_usable_2.cc -maltivec -mpower8-vector
-O2 -Wno-psabi (test for excess errors)


Thread overview: 29+ messages
2021-06-08 12:10 [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Matthias Kretz
2021-06-08 12:11 ` [PATCH 01/11] libstdc++: Improve copysign codegen Matthias Kretz
2021-06-08 12:11 ` [PATCH 02/11] libstdc++: Remove dead code Matthias Kretz
2021-06-08 12:11 ` [PATCH 03/11] libstdc++: Improve fixed_size codegen Matthias Kretz
2021-06-08 12:11 ` [PATCH 04/11] libstdc++: Make use of __builtin_bit_cast Matthias Kretz
2021-06-11 10:53   ` [PATCH 04/11 v2] " Matthias Kretz
2021-06-24 14:01     ` [PATCH 04/11 v3] " Matthias Kretz
2021-06-24 14:08       ` Jakub Jelinek
2021-06-24 14:11         ` Jonathan Wakely
2021-06-24 14:12           ` Jonathan Wakely
2021-06-24 14:21           ` Jakub Jelinek
2021-06-24 14:34             ` Jonathan Wakely
2021-06-24 14:40               ` Jonathan Wakely
2021-06-24 14:44                 ` Jakub Jelinek
2021-06-25 11:23       ` Jonathan Wakely
2021-06-08 12:11 ` [PATCH 05/11] libstdc++: Remove incorrect fabs overload Matthias Kretz
2021-06-08 12:11 ` [PATCH 06/11] libstdc++: Minor simd_math cleanups Matthias Kretz
2021-06-08 12:11 ` [PATCH 07/11] libstdc++: Fix condition when AVX512F ldexp implementation is used Matthias Kretz
2021-06-08 12:11 ` [PATCH 08/11] libstdc++: Avoid raising fp exceptions in trunc, floor, and ceil Matthias Kretz
2021-06-08 12:11 ` [PATCH 09/11] libstdc++: Ensure unrolled loops inline the lambda Matthias Kretz
2021-06-08 12:12 ` [PATCH 10/11] libstdc++: Fix internal names: add missing underscores Matthias Kretz
2021-06-08 12:12 ` [PATCH 11/11] libstdc++: Fix ODR issues with different -m flags Matthias Kretz
2021-06-09 12:22   ` Richard Biener
2021-06-09 12:53     ` Matthias Kretz
2021-06-09 13:22       ` Richard Biener
2021-11-15  8:57   ` Matthias Kretz
2022-01-14 21:30     ` Jonathan Wakely
2022-01-17  0:08       ` Jonathan Wakely
2021-06-24 13:42 ` [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Jonathan Wakely
