* [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups
From: Matthias Kretz @ 2021-06-08 12:10 UTC
To: gcc-patches, libstdc++

The following patches mostly contain code cleanups and minor corrections.
The major feature in this patchset is the last patch, which should make the
use of stdx::simd much safer wrt. ODR violations inadvertently introduced by
linking TUs that were compiled with different -m and floating-point flags.

Matthias Kretz (11):
  libstdc++: Improve copysign codegen
  libstdc++: Remove dead code
  libstdc++: Improve fixed_size codegen
  libstdc++: Make use of __builtin_bit_cast
  libstdc++: Remove incorrect fabs overload
  libstdc++: Minor simd_math cleanups
  libstdc++: Fix condition when AVX512F ldexp implementation is used
  libstdc++: Avoid raising fp exceptions in trunc, floor, and ceil
  libstdc++: Ensure unrolled loops inline the lambda
  libstdc++: Fix internal names: add missing underscores
  libstdc++: Fix ODR issues with different -m flags

 libstdc++-v3/include/experimental/bits/simd.h | 438 ++++++++++++------
 .../include/experimental/bits/simd_builtin.h | 48 +-
 .../experimental/bits/simd_converter.h | 2 +-
 .../include/experimental/bits/simd_detail.h | 40 ++
 .../experimental/bits/simd_fixed_size.h | 95 ++--
 .../include/experimental/bits/simd_math.h | 107 ++---
 .../include/experimental/bits/simd_neon.h | 4 +-
 .../include/experimental/bits/simd_ppc.h | 4 +-
 .../include/experimental/bits/simd_scalar.h | 71 ++-
 .../include/experimental/bits/simd_x86.h | 33 +-
 .../simd/tests/bits/test_values.h | 8 +-
 11 files changed, 540 insertions(+), 310 deletions(-)

-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research https://gsi.de
 std::experimental::simd https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────
* [PATCH 01/11] libstdc++: Improve copysign codegen
From: Matthias Kretz @ 2021-06-08 12:11 UTC
To: gcc-patches, libstdc++

From: Matthias Kretz <kretz@kde.org>

This also resolves a test failure on aarch64 with -ffast-math and
fixed_size<N> with large N.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>

libstdc++-v3/ChangeLog:

	* include/experimental/bits/simd.h: Add missing operator~
	overload for simd<floating-point> to __float_bitwise_operators.
	* include/experimental/bits/simd_builtin.h
	(_SimdImplBuiltin::_S_complement): Bitcast to int (and back) to
	implement complement for floating-point vectors.
	* include/experimental/bits/simd_fixed_size.h
	(_SimdImplFixedSize::_S_copysign): New function, forwarding to
	copysign implementation of _SimdTuple members.
	* include/experimental/bits/simd_math.h (copysign): Call
	_SimdImpl::_S_copysign for fixed_size arguments. Simplify generic
	copysign implementation using the new ~ operator.
---
 libstdc++-v3/include/experimental/bits/simd.h | 6 ++++++
 libstdc++-v3/include/experimental/bits/simd_builtin.h | 7 ++++++-
 libstdc++-v3/include/experimental/bits/simd_fixed_size.h | 2 +-
 libstdc++-v3/include/experimental/bits/simd_math.h | 4 +++-
 4 files changed, 16 insertions(+), 3 deletions(-)

-- ────────────────────────────────────────────────────────────────────────── Dr.
Matthias Kretz https://mattkretz.github.io GSI Helmholtz Centre for Heavy Ion Research https://gsi.de std::experimental::simd https://github.com/VcDevel/std-simd ────────────────────────────────────────────────────────────────────────── [-- Attachment #2: 0001-libstdc-Improve-copysign-codegen.patch --] [-- Type: text/x-patch, Size: 3375 bytes --] diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h index 59ddf3cc958..163f1b574e2 100644 --- a/libstdc++-v3/include/experimental/bits/simd.h +++ b/libstdc++-v3/include/experimental/bits/simd.h @@ -5189,6 +5189,12 @@ template <typename _Tp, typename _Ap> return {__private_init, _Ap::_SimdImpl::_S_bit_and(__data(__a), __data(__b))}; } + +template <typename _Tp, typename _Ap> + _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR + enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Ap>> + operator~(const simd<_Tp, _Ap>& __a) + { return {__private_init, _Ap::_SimdImpl::_S_complement(__data(__a))}; } } // namespace __float_bitwise_operators }}} _GLIBCXX_SIMD_END_NAMESPACE diff --git a/libstdc++-v3/include/experimental/bits/simd_builtin.h b/libstdc++-v3/include/experimental/bits/simd_builtin.h index e986ee91620..8cd338e313f 100644 --- a/libstdc++-v3/include/experimental/bits/simd_builtin.h +++ b/libstdc++-v3/include/experimental/bits/simd_builtin.h @@ -1632,7 +1632,12 @@ template <typename _Abi> template <typename _Tp, size_t _Np> _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np> _S_complement(_SimdWrapper<_Tp, _Np> __x) noexcept - { return ~__x._M_data; } + { + if constexpr (is_floating_point_v<_Tp>) + return __vector_bitcast<_Tp>(~__vector_bitcast<__int_for_sizeof_t<_Tp>>(__x)); + else + return ~__x._M_data; + } // _S_unary_minus {{{2 template <typename _Tp, size_t _Np> diff --git a/libstdc++-v3/include/experimental/bits/simd_fixed_size.h b/libstdc++-v3/include/experimental/bits/simd_fixed_size.h index 2722055c899..7c2c1df77c8 100644 --- 
a/libstdc++-v3/include/experimental/bits/simd_fixed_size.h +++ b/libstdc++-v3/include/experimental/bits/simd_fixed_size.h @@ -1663,7 +1663,7 @@ template <int _Np> _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, ldexp) _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, fmod) _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, remainder) - // copysign in simd_math.h + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, copysign) _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, nextafter) _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, fdim) _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, fmax) diff --git a/libstdc++-v3/include/experimental/bits/simd_math.h b/libstdc++-v3/include/experimental/bits/simd_math.h index 4799803a200..d954e761eee 100644 --- a/libstdc++-v3/include/experimental/bits/simd_math.h +++ b/libstdc++-v3/include/experimental/bits/simd_math.h @@ -1304,6 +1304,8 @@ template <typename _Tp, typename _Abi> { if constexpr (simd_size_v<_Tp, _Abi> == 1) return std::copysign(__x[0], __y[0]); + else if constexpr (__is_fixed_size_abi_v<_Abi>) + return {__private_init, _Abi::_SimdImpl::_S_copysign(__data(__x), __data(__y))}; else if constexpr (is_same_v<_Tp, long double> && sizeof(_Tp) == 12) // Remove this case once __bit_cast is implemented via __builtin_bit_cast. // It is necessary, because __signmask below cannot be computed at compile @@ -1315,7 +1317,7 @@ template <typename _Tp, typename _Abi> using _V = simd<_Tp, _Abi>; using namespace std::experimental::__float_bitwise_operators; _GLIBCXX_SIMD_USE_CONSTEXPR_API auto __signmask = _V(1) ^ _V(-1); - return (__x & (__x ^ __signmask)) | (__y & __signmask); + return (__x & ~__signmask) | (__y & __signmask); } }
* [PATCH 02/11] libstdc++: Remove dead code
From: Matthias Kretz @ 2021-06-08 12:11 UTC
To: gcc-patches, libstdc++

From: Matthias Kretz <kretz@kde.org>

The _AbisInSimdTuple helper type (together with its _SeqOp helper)
became unused at some point. Remove it.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>

libstdc++-v3/ChangeLog:

	* include/experimental/bits/simd_fixed_size.h (_AbisInSimdTuple):
	Removed.
---
 .../experimental/bits/simd_fixed_size.h | 49 -------------------
 1 file changed, 49 deletions(-)

[-- Attachment #2: 0002-libstdc-Remove-dead-code.patch --] [-- Type: text/x-patch, Size: 2211 bytes --] diff --git a/libstdc++-v3/include/experimental/bits/simd_fixed_size.h b/libstdc++-v3/include/experimental/bits/simd_fixed_size.h index 7c2c1df77c8..b6fb47cdf39 100644 --- a/libstdc++-v3/include/experimental/bits/simd_fixed_size.h +++ b/libstdc++-v3/include/experimental/bits/simd_fixed_size.h @@ -1025,55 +1025,6 @@ template <typename _Tp, int _Np, typename... _As, typename _Next, int _Remain> _Tp, _Remain, _SimdTuple<_Tp, _As..., typename _Next::abi_type>>::type; }; -// }}} -// _AbisInSimdTuple {{{ -template <typename _Tp> - struct _SeqOp; - -template <size_t _I0, size_t...
_Is> - struct _SeqOp<index_sequence<_I0, _Is...>> - { - using _FirstPlusOne = index_sequence<_I0 + 1, _Is...>; - using _NotFirstPlusOne = index_sequence<_I0, (_Is + 1)...>; - template <size_t _First, size_t _Add> - using _Prepend = index_sequence<_First, _I0 + _Add, (_Is + _Add)...>; - }; - -template <typename _Tp> - struct _AbisInSimdTuple; - -template <typename _Tp> - struct _AbisInSimdTuple<_SimdTuple<_Tp>> - { - using _Counts = index_sequence<0>; - using _Begins = index_sequence<0>; - }; - -template <typename _Tp, typename _Ap> - struct _AbisInSimdTuple<_SimdTuple<_Tp, _Ap>> - { - using _Counts = index_sequence<1>; - using _Begins = index_sequence<0>; - }; - -template <typename _Tp, typename _A0, typename... _As> - struct _AbisInSimdTuple<_SimdTuple<_Tp, _A0, _A0, _As...>> - { - using _Counts = typename _SeqOp<typename _AbisInSimdTuple< - _SimdTuple<_Tp, _A0, _As...>>::_Counts>::_FirstPlusOne; - using _Begins = typename _SeqOp<typename _AbisInSimdTuple< - _SimdTuple<_Tp, _A0, _As...>>::_Begins>::_NotFirstPlusOne; - }; - -template <typename _Tp, typename _A0, typename _A1, typename... _As> - struct _AbisInSimdTuple<_SimdTuple<_Tp, _A0, _A1, _As...>> - { - using _Counts = typename _SeqOp<typename _AbisInSimdTuple< - _SimdTuple<_Tp, _A1, _As...>>::_Counts>::template _Prepend<1, 0>; - using _Begins = typename _SeqOp<typename _AbisInSimdTuple< - _SimdTuple<_Tp, _A1, _As...>>::_Begins>::template _Prepend<0, 1>; - }; - // }}} // __autocvt_to_simd {{{ template <typename _Tp, bool = is_arithmetic_v<__remove_cvref_t<_Tp>>>
* [PATCH 03/11] libstdc++: Improve fixed_size codegen
From: Matthias Kretz @ 2021-06-08 12:11 UTC
To: gcc-patches, libstdc++

From: Matthias Kretz <kretz@kde.org>

Sometimes fixed_size objects get copied on the stack unnecessarily. The
simd implementation should never pass _SimdTuple by value, so that the
optimizer is not required to see through these copies.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>

libstdc++-v3/ChangeLog:

	* include/experimental/bits/simd_converter.h
	(_SimdConverter::operator()): Pass _SimdTuple by const-ref.
	* include/experimental/bits/simd_fixed_size.h
	(_GLIBCXX_SIMD_FIXED_OP): Pass binary operator _SimdTuple
	arguments by const-ref.
	(_S_masked_unary): Pass _SimdTuple by const-ref.
---
 libstdc++-v3/include/experimental/bits/simd_converter.h | 2 +-
 libstdc++-v3/include/experimental/bits/simd_fixed_size.h | 5 ++---
 2 files changed, 3 insertions(+), 4 deletions(-)

-- ────────────────────────────────────────────────────────────────────────── Dr.
Matthias Kretz https://mattkretz.github.io GSI Helmholtz Centre for Heavy Ion Research https://gsi.de std::experimental::simd https://github.com/VcDevel/std-simd ────────────────────────────────────────────────────────────────────────── [-- Attachment #2: 0003-libstdc-Improve-fixed_size-codegen.patch --] [-- Type: text/x-patch, Size: 2133 bytes --] diff --git a/libstdc++-v3/include/experimental/bits/simd_converter.h b/libstdc++-v3/include/experimental/bits/simd_converter.h index 9c8bf382df9..11999df25e4 100644 --- a/libstdc++-v3/include/experimental/bits/simd_converter.h +++ b/libstdc++-v3/include/experimental/bits/simd_converter.h @@ -316,7 +316,7 @@ template <typename _From, int _Np, typename _To, typename _Ap> _GLIBCXX_SIMD_INTRINSIC constexpr typename _SimdTraits<_To, _Ap>::_SimdMember - operator()(_Arg __x) const noexcept + operator()(const _Arg& __x) const noexcept { if constexpr (_Arg::_S_tuple_size == 1) return __vector_convert<__vector_type_t<_To, _Np>>(__x.first); diff --git a/libstdc++-v3/include/experimental/bits/simd_fixed_size.h b/libstdc++-v3/include/experimental/bits/simd_fixed_size.h index b6fb47cdf39..dc2fb90b9b2 100644 --- a/libstdc++-v3/include/experimental/bits/simd_fixed_size.h +++ b/libstdc++-v3/include/experimental/bits/simd_fixed_size.h @@ -1480,7 +1480,7 @@ template <int _Np> #define _GLIBCXX_SIMD_FIXED_OP(name_, op_) \ template <typename _Tp, typename... _As> \ static inline constexpr _SimdTuple<_Tp, _As...> name_( \ - const _SimdTuple<_Tp, _As...> __x, const _SimdTuple<_Tp, _As...> __y) \ + const _SimdTuple<_Tp, _As...>& __x, const _SimdTuple<_Tp, _As...>& __y)\ { \ return __x._M_apply_per_chunk( \ [](auto __impl, auto __xx, auto __yy) constexpr { \ @@ -1780,8 +1780,7 @@ template <int _Np> // _S_masked_unary {{{2 template <template <typename> class _Op, typename _Tp, typename... _As> static inline _SimdTuple<_Tp, _As...> - _S_masked_unary(const _MaskMember __bits, - const _SimdTuple<_Tp, _As...> __v) // TODO: const-ref __v? 
+ _S_masked_unary(const _MaskMember __bits, const _SimdTuple<_Tp, _As...>& __v) { return __v._M_apply_wrapped([&__bits](auto __meta, auto __native) constexpr {
* [PATCH 04/11] libstdc++: Make use of __builtin_bit_cast
From: Matthias Kretz @ 2021-06-08 12:11 UTC
To: gcc-patches, libstdc++

From: Matthias Kretz <kretz@kde.org>

The __bit_cast function was a hack to achieve what __builtin_bit_cast
can do, therefore use __builtin_bit_cast if possible. However,
__builtin_bit_cast cannot be used to cast from/to fixed_size_simd, since
it isn't trivially copyable (in the language sense; in principle it
is). Therefore add __proposed::simd_bit_cast to enable the use case
required in the test framework.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>

libstdc++-v3/ChangeLog:

	* include/experimental/bits/simd.h (__bit_cast): Implement via
	__builtin_bit_cast #if available.
	(__proposed::simd_bit_cast): Add overloads for simd and simd_mask
	that return an object of the requested type with the same bits as
	the argument, using __builtin_bit_cast (or __bit_cast #if the
	builtin is not available).
	* include/experimental/bits/simd_math.h: Use simd_bit_cast
	instead of __bit_cast to allow casts to fixed_size_simd.
	* testsuite/experimental/simd/tests/bits/test_values.h: Switch
	from __bit_cast to __proposed::simd_bit_cast since the former will
	not cast fixed_size objects anymore.
--- libstdc++-v3/include/experimental/bits/simd.h | 40 ++++++++++++++++++- .../include/experimental/bits/simd_math.h | 8 ++-- .../simd/tests/bits/test_values.h | 8 ++-- 3 files changed, 46 insertions(+), 10 deletions(-) -- ────────────────────────────────────────────────────────────────────────── Dr. Matthias Kretz https://mattkretz.github.io GSI Helmholtz Centre for Heavy Ion Research https://gsi.de std::experimental::simd https://github.com/VcDevel/std-simd ────────────────────────────────────────────────────────────────────────── [-- Attachment #2: 0004-libstdc-Make-use-of-__builtin_bit_cast.patch --] [-- Type: text/x-patch, Size: 4429 bytes --] diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h index 163f1b574e2..5d243f22434 100644 --- a/libstdc++-v3/include/experimental/bits/simd.h +++ b/libstdc++-v3/include/experimental/bits/simd.h @@ -1598,7 +1598,9 @@ template <typename _To, typename _From> _GLIBCXX_SIMD_INTRINSIC constexpr _To __bit_cast(const _From __x) { - // TODO: implement with / replace by __builtin_bit_cast ASAP +#if __has_builtin(__builtin_bit_cast) + return __builtin_bit_cast(_To, __x); +#else static_assert(sizeof(_To) == sizeof(_From)); constexpr bool __to_is_vectorizable = is_arithmetic_v<_To> || is_enum_v<_To>; @@ -1629,6 +1631,7 @@ template <typename _To, typename _From> reinterpret_cast<const char*>(&__x), sizeof(_To)); return __r; } +#endif } // }}} @@ -2900,6 +2903,41 @@ template <typename _Tp, typename _Up, typename _Ap, return {__private_init, _RM::abi_type::_MaskImpl::template _S_convert< typename _RM::simd_type::value_type>(__x)}; } + +template <typename _To, typename _Up, typename _Abi> + _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR + _To + simd_bit_cast(const simd<_Up, _Abi>& __x) + { + using _Tp = typename _To::value_type; + using _ToMember = typename _SimdTraits<_Tp, typename _To::abi_type>::_SimdMember; + using _From = simd<_Up, _Abi>; + using _FromMember = typename 
_SimdTraits<_Up, _Abi>::_SimdMember; + // with concepts, the following should be constraints + static_assert(sizeof(_To) == sizeof(_From)); + static_assert(is_trivially_copyable_v<_Tp> && is_trivially_copyable_v<_Up>); + static_assert(is_trivially_copyable_v<_ToMember> && is_trivially_copyable_v<_FromMember>); +#if __has_builtin(__builtin_bit_cast) + return {__private_init, __builtin_bit_cast(_ToMember, __data(__x))}; +#else + return {__private_init, __bit_cast<_ToMember>(__data(__x))}; +#endif + } + +template <typename _To, typename _Up, typename _Abi> + _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR + _To + simd_bit_cast(const simd_mask<_Up, _Abi>& __x) + { + using _From = simd_mask<_Up, _Abi>; + static_assert(sizeof(_To) == sizeof(_From)); + static_assert(is_trivially_copyable_v<_To> && is_trivially_copyable_v<_From>); +#if __has_builtin(__builtin_bit_cast) + return __builtin_bit_cast(_To, __x); +#else + return __bit_cast<_To>(__x); +#endif + } } // namespace __proposed // simd_cast {{{2 diff --git a/libstdc++-v3/include/experimental/bits/simd_math.h b/libstdc++-v3/include/experimental/bits/simd_math.h index d954e761eee..3ade293fcbf 100644 --- a/libstdc++-v3/include/experimental/bits/simd_math.h +++ b/libstdc++-v3/include/experimental/bits/simd_math.h @@ -700,11 +700,9 @@ template <typename _Tp, typename _Abi> // (inf and NaN are excluded by -ffinite-math-only) const auto __iszero_inf_nan = __x == 0; #else - const auto __as_int - = __bit_cast<rebind_simd_t<__int_for_sizeof_t<_Tp>, _V>>(abs(__x)); - const auto __inf - = __bit_cast<rebind_simd_t<__int_for_sizeof_t<_Tp>, _V>>( - _V(__infinity_v<_Tp>)); + using _Ip = __int_for_sizeof_t<_Tp>; + const auto __as_int = simd_bit_cast<rebind_simd_t<_Ip, _V>>(abs(__x)); + const auto __inf = simd_bit_cast<rebind_simd_t<_Ip, _V>>(_V(__infinity_v<_Tp>)); const auto __iszero_inf_nan = static_simd_cast<typename _V::mask_type>( __as_int == 0 || __as_int >= __inf); #endif diff --git 
a/libstdc++-v3/testsuite/experimental/simd/tests/bits/test_values.h b/libstdc++-v3/testsuite/experimental/simd/tests/bits/test_values.h index b69bd0b704d..67aa870659b 100644 --- a/libstdc++-v3/testsuite/experimental/simd/tests/bits/test_values.h +++ b/libstdc++-v3/testsuite/experimental/simd/tests/bits/test_values.h @@ -221,11 +221,11 @@ template <class V> if constexpr (sizeof(T) <= sizeof(double)) { using I = rebind_simd_t<__int_for_sizeof_t<T>, V>; - const I abs_x = __bit_cast<I>(abs(x)); - const I min = __bit_cast<I>(V(std::__norm_min_v<T>)); - const I max = __bit_cast<I>(V(std::__finite_max_v<T>)); + const I abs_x = simd_bit_cast<I>(abs(x)); + const I min = simd_bit_cast<I>(V(std::__norm_min_v<T>)); + const I max = simd_bit_cast<I>(V(std::__finite_max_v<T>)); return static_simd_cast<typename V::mask_type>( - __bit_cast<I>(x) == 0 || (abs_x >= min && abs_x <= max)); + simd_bit_cast<I>(x) == 0 || (abs_x >= min && abs_x <= max)); } else {
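A minimal sketch of the dispatch pattern the new __bit_cast uses, assuming a GCC or Clang that provides __has_builtin: prefer __builtin_bit_cast when the compiler reports it, otherwise fall back to std::memcpy, which cannot be used in constant expressions. The names bit_cast_sketch and SKETCH_HAVE_BUILTIN_BIT_CAST are hypothetical, and the real fallback in simd.h is more involved than a plain memcpy.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Only query __has_builtin if the preprocessor knows about it.
#if defined(__has_builtin)
#  if __has_builtin(__builtin_bit_cast)
#    define SKETCH_HAVE_BUILTIN_BIT_CAST 1
#  endif
#endif

template <typename To, typename From>
constexpr To bit_cast_sketch(const From& x)
{
  static_assert(sizeof(To) == sizeof(From));
  static_assert(std::is_trivially_copyable_v<To>
                && std::is_trivially_copyable_v<From>);
#ifdef SKETCH_HAVE_BUILTIN_BIT_CAST
  // Usable in constant expressions.
  return __builtin_bit_cast(To, x);
#else
  // Runtime-only fallback; also requires To to be default-constructible.
  To r{};
  std::memcpy(&r, &x, sizeof(To));
  return r;
#endif
}

#ifdef SKETCH_HAVE_BUILTIN_BIT_CAST
// With the builtin, the cast can be evaluated at compile time.
static_assert(bit_cast_sketch<std::uint32_t>(1.0f) == 0x3f800000u);
#endif
```

Being constant-evaluable is what lets patch 04 v2 drop the long double special case in copysign, whose comment says the sign mask "cannot be computed at compile time" with the old __bit_cast.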
* Re: [PATCH 04/11 v2] libstdc++: Make use of __builtin_bit_cast
From: Matthias Kretz @ 2021-06-11 10:53 UTC
To: gcc-patches, libstdc++

While testing newer patches, I found several places in this patch where
__bit_cast still had to be converted to simd_bit_cast (i.e. where bit
casting to or from fixed_size was sometimes required). Corrected patch
attached.

From: Matthias Kretz <kretz@kde.org>

The __bit_cast function was a hack to achieve what __builtin_bit_cast
can do, therefore use __builtin_bit_cast if possible. However,
__builtin_bit_cast cannot be used to cast from/to fixed_size_simd, since
it isn't trivially copyable (in the language sense; in principle it
is). Therefore add __proposed::simd_bit_cast to enable the use case
required in the test framework.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>

libstdc++-v3/ChangeLog:

	* include/experimental/bits/simd.h (__bit_cast): Implement via
	__builtin_bit_cast #if available.
	(__proposed::simd_bit_cast): Add overloads for simd and simd_mask
	that return an object of the requested type with the same bits as
	the argument, using __builtin_bit_cast (or __bit_cast #if the
	builtin is not available).
	* include/experimental/bits/simd_math.h: Use simd_bit_cast
	instead of __bit_cast to allow casts to fixed_size_simd.
	(copysign): Remove branch that was only required if __bit_cast
	cannot be constexpr.
	* testsuite/experimental/simd/tests/bits/test_values.h: Switch
	from __bit_cast to __proposed::simd_bit_cast since the former will
	not cast fixed_size objects anymore.
--- libstdc++-v3/include/experimental/bits/simd.h | 57 ++++++++++++++++++- .../include/experimental/bits/simd_math.h | 36 +++++------- .../simd/tests/bits/test_values.h | 8 +-- 3 files changed, 75 insertions(+), 26 deletions(-) -- ────────────────────────────────────────────────────────────────────────── Dr. Matthias Kretz https://mattkretz.github.io GSI Helmholtz Centre for Heavy Ion Research https://gsi.de std::experimental::simd https://github.com/VcDevel/std-simd ────────────────────────────────────────────────────────────────────────── [-- Attachment #2: 0001-libstdc-Make-use-of-__builtin_bit_cast.patch --] [-- Type: text/x-patch, Size: 8732 bytes --] diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h index 163f1b574e2..852d0b62012 100644 --- a/libstdc++-v3/include/experimental/bits/simd.h +++ b/libstdc++-v3/include/experimental/bits/simd.h @@ -1598,7 +1598,9 @@ template <typename _To, typename _From> _GLIBCXX_SIMD_INTRINSIC constexpr _To __bit_cast(const _From __x) { - // TODO: implement with / replace by __builtin_bit_cast ASAP +#if __has_builtin(__builtin_bit_cast) + return __builtin_bit_cast(_To, __x); +#else static_assert(sizeof(_To) == sizeof(_From)); constexpr bool __to_is_vectorizable = is_arithmetic_v<_To> || is_enum_v<_To>; @@ -1629,6 +1631,7 @@ template <typename _To, typename _From> reinterpret_cast<const char*>(&__x), sizeof(_To)); return __r; } +#endif } // }}} @@ -2900,6 +2903,58 @@ template <typename _Tp, typename _Up, typename _Ap, return {__private_init, _RM::abi_type::_MaskImpl::template _S_convert< typename _RM::simd_type::value_type>(__x)}; } + +template <typename _To, typename _Up, typename _Abi> + _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR + _To + simd_bit_cast(const simd<_Up, _Abi>& __x) + { + using _Tp = typename _To::value_type; + using _ToMember = typename _SimdTraits<_Tp, typename _To::abi_type>::_SimdMember; + using _From = simd<_Up, _Abi>; + using _FromMember = typename 
_SimdTraits<_Up, _Abi>::_SimdMember; + // with concepts, the following should be constraints + static_assert(sizeof(_To) == sizeof(_From)); + static_assert(is_trivially_copyable_v<_Tp> && is_trivially_copyable_v<_Up>); + static_assert(is_trivially_copyable_v<_ToMember> && is_trivially_copyable_v<_FromMember>); +#if __has_builtin(__builtin_bit_cast) + return {__private_init, __builtin_bit_cast(_ToMember, __data(__x))}; +#else + return {__private_init, __bit_cast<_ToMember>(__data(__x))}; +#endif + } + +template <typename _To, typename _Up, typename _Abi> + _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR + _To + simd_bit_cast(const simd_mask<_Up, _Abi>& __x) + { + using _From = simd_mask<_Up, _Abi>; + static_assert(sizeof(_To) == sizeof(_From)); + static_assert(is_trivially_copyable_v<_From>); + // _To can be simd<T, A>, specifically simd<T, fixed_size<N>> in which case _To is not trivially + // copyable. + if constexpr (is_simd_v<_To>) + { + using _Tp = typename _To::value_type; + using _ToMember = typename _SimdTraits<_Tp, typename _To::abi_type>::_SimdMember; + static_assert(is_trivially_copyable_v<_ToMember>); +#if __has_builtin(__builtin_bit_cast) + return {__private_init, __builtin_bit_cast(_ToMember, __x)}; +#else + return {__private_init, __bit_cast<_ToMember>(__x)}; +#endif + } + else + { + static_assert(is_trivially_copyable_v<_To>); +#if __has_builtin(__builtin_bit_cast) + return __builtin_bit_cast(_To, __x); +#else + return __bit_cast<_To>(__x); +#endif + } + } } // namespace __proposed // simd_cast {{{2 diff --git a/libstdc++-v3/include/experimental/bits/simd_math.h b/libstdc++-v3/include/experimental/bits/simd_math.h index d954e761eee..afd8b5a028f 100644 --- a/libstdc++-v3/include/experimental/bits/simd_math.h +++ b/libstdc++-v3/include/experimental/bits/simd_math.h @@ -405,10 +405,11 @@ template <typename _Tp, typename _Abi> using _Vp = simd<_Tp, _Abi>; using _Up = make_unsigned_t<__int_for_sizeof_t<_Tp>>; using namespace 
std::experimental::__float_bitwise_operators; + using namespace std::experimental::__proposed; const _Vp __exponent_mask = __infinity_v<_Tp>; // 0x7f800000 or 0x7ff0000000000000 return static_simd_cast<rebind_simd_t<int, _Vp>>( - __bit_cast<rebind_simd_t<_Up, _Vp>>(__v & __exponent_mask) + simd_bit_cast<rebind_simd_t<_Up, _Vp>>(__v & __exponent_mask) >> (__digits_v<_Tp> - 1)); } @@ -700,11 +701,9 @@ template <typename _Tp, typename _Abi> // (inf and NaN are excluded by -ffinite-math-only) const auto __iszero_inf_nan = __x == 0; #else - const auto __as_int - = __bit_cast<rebind_simd_t<__int_for_sizeof_t<_Tp>, _V>>(abs(__x)); - const auto __inf - = __bit_cast<rebind_simd_t<__int_for_sizeof_t<_Tp>, _V>>( - _V(__infinity_v<_Tp>)); + using _Ip = __int_for_sizeof_t<_Tp>; + const auto __as_int = simd_bit_cast<rebind_simd_t<_Ip, _V>>(abs(__x)); + const auto __inf = simd_bit_cast<rebind_simd_t<_Ip, _V>>(_V(__infinity_v<_Tp>)); const auto __iszero_inf_nan = static_simd_cast<typename _V::mask_type>( __as_int == 0 || __as_int >= __inf); #endif @@ -722,10 +721,10 @@ template <typename _Tp, typename _Abi> where(__value_isnormal.__cvt(), __e) = __exponent_bits; static_assert(sizeof(_IV) == sizeof(__value_isnormal)); const _IV __offset - = (__bit_cast<_IV>(__value_isnormal) & _IV(__exp_adjust)) - | (__bit_cast<_IV>(static_simd_cast<_MaskType>(__exponent_bits == 0) - & static_simd_cast<_MaskType>(__x != 0)) - & _IV(__exp_adjust + __exp_offset)); + = (simd_bit_cast<_IV>(__value_isnormal) & _IV(__exp_adjust)) + | (simd_bit_cast<_IV>(static_simd_cast<_MaskType>(__exponent_bits == 0) + & static_simd_cast<_MaskType>(__x != 0)) + & _IV(__exp_adjust + __exp_offset)); *__exp = simd_cast<_Samesize<int, _V>>(__e - __offset); return __mant; } @@ -796,7 +795,7 @@ template <typename _Tp, typename _Abi> using namespace std::experimental::__proposed; using _IV = rebind_simd_t< conditional_t<sizeof(_Tp) == sizeof(_LLong), _LLong, int>, _V>; - return (__bit_cast<_IV>(__v) >> (__digits_v<_Tp> - 1)) 
+ return (simd_bit_cast<_IV>(__v) >> (__digits_v<_Tp> - 1)) - (__max_exponent_v<_Tp> - 1); }; _V __r = static_simd_cast<_V>(__exponent(abs_x)); @@ -981,6 +980,7 @@ template <typename _VV> // Skylake-AVX512 (not even for SSE and AVX vectors, and really bad for // AVX-512). using namespace __float_bitwise_operators; + using namespace __proposed; _V __absx = abs(__x); // no error _V __absy = abs(__y); // no error _V __hi = max(__absx, __absy); // no error @@ -1028,9 +1028,9 @@ template <typename _VV> #ifdef __FAST_MATH__ using _Ip = __int_for_sizeof_t<_Tp>; using _IV = rebind_simd_t<_Ip, _V>; - const auto __as_int = __bit_cast<_IV>(__hi_exp); + const auto __as_int = simd_bit_cast<_IV>(__hi_exp); const _V __scale - = __bit_cast<_V>(2 * __bit_cast<_Ip>(_Tp(1)) - __as_int); + = simd_bit_cast<_V>(2 * simd_bit_cast<_Ip>(_Tp(1)) - __as_int); #else const _V __scale = (__hi_exp ^ __inf) * _Tp(.5); #endif @@ -1197,9 +1197,9 @@ _GLIBCXX_SIMD_CVTING2(hypot) #ifdef __FAST_MATH__ using _Ip = __int_for_sizeof_t<_Tp>; using _IV = rebind_simd_t<_Ip, _V>; - const auto __as_int = __bit_cast<_IV>(__hi_exp); + const auto __as_int = simd_bit_cast<_IV>(__hi_exp); const _V __scale - = __bit_cast<_V>(2 * __bit_cast<_Ip>(_Tp(1)) - __as_int); + = simd_bit_cast<_V>(2 * simd_bit_cast<_Ip>(_Tp(1)) - __as_int); #else const _V __scale = (__hi_exp ^ __inf) * _Tp(.5); #endif @@ -1306,12 +1306,6 @@ template <typename _Tp, typename _Abi> return std::copysign(__x[0], __y[0]); else if constexpr (__is_fixed_size_abi_v<_Abi>) return {__private_init, _Abi::_SimdImpl::_S_copysign(__data(__x), __data(__y))}; - else if constexpr (is_same_v<_Tp, long double> && sizeof(_Tp) == 12) - // Remove this case once __bit_cast is implemented via __builtin_bit_cast. - // It is necessary, because __signmask below cannot be computed at compile - // time. 
- return simd<_Tp, _Abi>( - [&](auto __i) { return std::copysign(__x[__i], __y[__i]); }); else { using _V = simd<_Tp, _Abi>; diff --git a/libstdc++-v3/testsuite/experimental/simd/tests/bits/test_values.h b/libstdc++-v3/testsuite/experimental/simd/tests/bits/test_values.h index b69bd0b704d..67aa870659b 100644 --- a/libstdc++-v3/testsuite/experimental/simd/tests/bits/test_values.h +++ b/libstdc++-v3/testsuite/experimental/simd/tests/bits/test_values.h @@ -221,11 +221,11 @@ template <class V> if constexpr (sizeof(T) <= sizeof(double)) { using I = rebind_simd_t<__int_for_sizeof_t<T>, V>; - const I abs_x = __bit_cast<I>(abs(x)); - const I min = __bit_cast<I>(V(std::__norm_min_v<T>)); - const I max = __bit_cast<I>(V(std::__finite_max_v<T>)); + const I abs_x = simd_bit_cast<I>(abs(x)); + const I min = simd_bit_cast<I>(V(std::__norm_min_v<T>)); + const I max = simd_bit_cast<I>(V(std::__finite_max_v<T>)); return static_simd_cast<typename V::mask_type>( - __bit_cast<I>(x) == 0 || (abs_x >= min && abs_x <= max)); + simd_bit_cast<I>(x) == 0 || (abs_x >= min && abs_x <= max)); } else {
* Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast
From: Matthias Kretz @ 2021-06-24 14:01 UTC
To: gcc-patches, libstdc++

[-- Attachment #1: Type: text/plain, Size: 2303 bytes --]

For -ffast-math there was a missing using namespace __proposed left. The
attached patch resolves the issue.

From: Matthias Kretz <m.kretz@gsi.de>

The __bit_cast function was a hack to achieve what __builtin_bit_cast
can do, therefore use __builtin_bit_cast if possible. However,
__builtin_bit_cast cannot be used to cast from/to fixed_size_simd, since
it isn't trivially copyable (in the language sense — in principle it
is). Therefore add __proposed::simd_bit_cast to enable the use case
required in the test framework.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>

libstdc++-v3/ChangeLog:

	* include/experimental/bits/simd.h (__bit_cast): Implement via
	__builtin_bit_cast #if available.
	(__proposed::simd_bit_cast): Add overloads for simd and
	simd_mask, which use __builtin_bit_cast (or __bit_cast #if not
	available), which return an object of the requested type with
	the same bits as the argument.
	* include/experimental/bits/simd_math.h: Use simd_bit_cast
	instead of __bit_cast to allow casts to fixed_size_simd.
	(copysign): Remove branch that was only required if __bit_cast
	cannot be constexpr.
	* testsuite/experimental/simd/tests/bits/test_values.h: Switch
	from __bit_cast to __proposed::simd_bit_cast since the former
	will not cast fixed_size objects anymore.
---
 libstdc++-v3/include/experimental/bits/simd.h | 57 ++++++++++++++++++-
 .../include/experimental/bits/simd_math.h | 37 ++++++------
 .../simd/tests/bits/test_values.h | 8 +--
 3 files changed, 76 insertions(+), 26 deletions(-)

-- 
──────────────────────────────────────────────────────────────────────────
Dr.
Matthias Kretz https://mattkretz.github.io GSI Helmholtz Centre for Heavy Ion Research https://gsi.de std::experimental::simd https://github.com/VcDevel/std-simd ────────────────────────────────────────────────────────────────────────── [-- Attachment #2: 0001-libstdc-Make-use-of-__builtin_bit_cast.patch --] [-- Type: text/x-patch, Size: 9051 bytes --] diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h index 163f1b574e2..852d0b62012 100644 --- a/libstdc++-v3/include/experimental/bits/simd.h +++ b/libstdc++-v3/include/experimental/bits/simd.h @@ -1598,7 +1598,9 @@ template <typename _To, typename _From> _GLIBCXX_SIMD_INTRINSIC constexpr _To __bit_cast(const _From __x) { - // TODO: implement with / replace by __builtin_bit_cast ASAP +#if __has_builtin(__builtin_bit_cast) + return __builtin_bit_cast(_To, __x); +#else static_assert(sizeof(_To) == sizeof(_From)); constexpr bool __to_is_vectorizable = is_arithmetic_v<_To> || is_enum_v<_To>; @@ -1629,6 +1631,7 @@ template <typename _To, typename _From> reinterpret_cast<const char*>(&__x), sizeof(_To)); return __r; } +#endif } // }}} @@ -2900,6 +2903,58 @@ template <typename _Tp, typename _Up, typename _Ap, return {__private_init, _RM::abi_type::_MaskImpl::template _S_convert< typename _RM::simd_type::value_type>(__x)}; } + +template <typename _To, typename _Up, typename _Abi> + _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR + _To + simd_bit_cast(const simd<_Up, _Abi>& __x) + { + using _Tp = typename _To::value_type; + using _ToMember = typename _SimdTraits<_Tp, typename _To::abi_type>::_SimdMember; + using _From = simd<_Up, _Abi>; + using _FromMember = typename _SimdTraits<_Up, _Abi>::_SimdMember; + // with concepts, the following should be constraints + static_assert(sizeof(_To) == sizeof(_From)); + static_assert(is_trivially_copyable_v<_Tp> && is_trivially_copyable_v<_Up>); + static_assert(is_trivially_copyable_v<_ToMember> && 
is_trivially_copyable_v<_FromMember>); +#if __has_builtin(__builtin_bit_cast) + return {__private_init, __builtin_bit_cast(_ToMember, __data(__x))}; +#else + return {__private_init, __bit_cast<_ToMember>(__data(__x))}; +#endif + } + +template <typename _To, typename _Up, typename _Abi> + _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR + _To + simd_bit_cast(const simd_mask<_Up, _Abi>& __x) + { + using _From = simd_mask<_Up, _Abi>; + static_assert(sizeof(_To) == sizeof(_From)); + static_assert(is_trivially_copyable_v<_From>); + // _To can be simd<T, A>, specifically simd<T, fixed_size<N>> in which case _To is not trivially + // copyable. + if constexpr (is_simd_v<_To>) + { + using _Tp = typename _To::value_type; + using _ToMember = typename _SimdTraits<_Tp, typename _To::abi_type>::_SimdMember; + static_assert(is_trivially_copyable_v<_ToMember>); +#if __has_builtin(__builtin_bit_cast) + return {__private_init, __builtin_bit_cast(_ToMember, __x)}; +#else + return {__private_init, __bit_cast<_ToMember>(__x)}; +#endif + } + else + { + static_assert(is_trivially_copyable_v<_To>); +#if __has_builtin(__builtin_bit_cast) + return __builtin_bit_cast(_To, __x); +#else + return __bit_cast<_To>(__x); +#endif + } + } } // namespace __proposed // simd_cast {{{2 diff --git a/libstdc++-v3/include/experimental/bits/simd_math.h b/libstdc++-v3/include/experimental/bits/simd_math.h index d954e761eee..ef2bdc641b8 100644 --- a/libstdc++-v3/include/experimental/bits/simd_math.h +++ b/libstdc++-v3/include/experimental/bits/simd_math.h @@ -405,10 +405,11 @@ template <typename _Tp, typename _Abi> using _Vp = simd<_Tp, _Abi>; using _Up = make_unsigned_t<__int_for_sizeof_t<_Tp>>; using namespace std::experimental::__float_bitwise_operators; + using namespace std::experimental::__proposed; const _Vp __exponent_mask = __infinity_v<_Tp>; // 0x7f800000 or 0x7ff0000000000000 return static_simd_cast<rebind_simd_t<int, _Vp>>( - __bit_cast<rebind_simd_t<_Up, _Vp>>(__v & __exponent_mask) + 
simd_bit_cast<rebind_simd_t<_Up, _Vp>>(__v & __exponent_mask) >> (__digits_v<_Tp> - 1)); } @@ -700,11 +701,9 @@ template <typename _Tp, typename _Abi> // (inf and NaN are excluded by -ffinite-math-only) const auto __iszero_inf_nan = __x == 0; #else - const auto __as_int - = __bit_cast<rebind_simd_t<__int_for_sizeof_t<_Tp>, _V>>(abs(__x)); - const auto __inf - = __bit_cast<rebind_simd_t<__int_for_sizeof_t<_Tp>, _V>>( - _V(__infinity_v<_Tp>)); + using _Ip = __int_for_sizeof_t<_Tp>; + const auto __as_int = simd_bit_cast<rebind_simd_t<_Ip, _V>>(abs(__x)); + const auto __inf = simd_bit_cast<rebind_simd_t<_Ip, _V>>(_V(__infinity_v<_Tp>)); const auto __iszero_inf_nan = static_simd_cast<typename _V::mask_type>( __as_int == 0 || __as_int >= __inf); #endif @@ -722,10 +721,10 @@ template <typename _Tp, typename _Abi> where(__value_isnormal.__cvt(), __e) = __exponent_bits; static_assert(sizeof(_IV) == sizeof(__value_isnormal)); const _IV __offset - = (__bit_cast<_IV>(__value_isnormal) & _IV(__exp_adjust)) - | (__bit_cast<_IV>(static_simd_cast<_MaskType>(__exponent_bits == 0) - & static_simd_cast<_MaskType>(__x != 0)) - & _IV(__exp_adjust + __exp_offset)); + = (simd_bit_cast<_IV>(__value_isnormal) & _IV(__exp_adjust)) + | (simd_bit_cast<_IV>(static_simd_cast<_MaskType>(__exponent_bits == 0) + & static_simd_cast<_MaskType>(__x != 0)) + & _IV(__exp_adjust + __exp_offset)); *__exp = simd_cast<_Samesize<int, _V>>(__e - __offset); return __mant; } @@ -796,7 +795,7 @@ template <typename _Tp, typename _Abi> using namespace std::experimental::__proposed; using _IV = rebind_simd_t< conditional_t<sizeof(_Tp) == sizeof(_LLong), _LLong, int>, _V>; - return (__bit_cast<_IV>(__v) >> (__digits_v<_Tp> - 1)) + return (simd_bit_cast<_IV>(__v) >> (__digits_v<_Tp> - 1)) - (__max_exponent_v<_Tp> - 1); }; _V __r = static_simd_cast<_V>(__exponent(abs_x)); @@ -981,6 +980,7 @@ template <typename _VV> // Skylake-AVX512 (not even for SSE and AVX vectors, and really bad for // AVX-512). 
using namespace __float_bitwise_operators; + using namespace __proposed; _V __absx = abs(__x); // no error _V __absy = abs(__y); // no error _V __hi = max(__absx, __absy); // no error @@ -1028,9 +1028,9 @@ template <typename _VV> #ifdef __FAST_MATH__ using _Ip = __int_for_sizeof_t<_Tp>; using _IV = rebind_simd_t<_Ip, _V>; - const auto __as_int = __bit_cast<_IV>(__hi_exp); + const auto __as_int = simd_bit_cast<_IV>(__hi_exp); const _V __scale - = __bit_cast<_V>(2 * __bit_cast<_Ip>(_Tp(1)) - __as_int); + = simd_bit_cast<_V>(2 * simd_bit_cast<_Ip>(_Tp(1)) - __as_int); #else const _V __scale = (__hi_exp ^ __inf) * _Tp(.5); #endif @@ -1118,6 +1118,7 @@ _GLIBCXX_SIMD_CVTING2(hypot) else { using namespace __float_bitwise_operators; + using namespace __proposed; const _V __absx = abs(__x); // no error const _V __absy = abs(__y); // no error const _V __absz = abs(__z); // no error @@ -1197,9 +1198,9 @@ _GLIBCXX_SIMD_CVTING2(hypot) #ifdef __FAST_MATH__ using _Ip = __int_for_sizeof_t<_Tp>; using _IV = rebind_simd_t<_Ip, _V>; - const auto __as_int = __bit_cast<_IV>(__hi_exp); + const auto __as_int = simd_bit_cast<_IV>(__hi_exp); const _V __scale - = __bit_cast<_V>(2 * __bit_cast<_Ip>(_Tp(1)) - __as_int); + = simd_bit_cast<_V>(2 * simd_bit_cast<_Ip>(_Tp(1)) - __as_int); #else const _V __scale = (__hi_exp ^ __inf) * _Tp(.5); #endif @@ -1306,12 +1307,6 @@ template <typename _Tp, typename _Abi> return std::copysign(__x[0], __y[0]); else if constexpr (__is_fixed_size_abi_v<_Abi>) return {__private_init, _Abi::_SimdImpl::_S_copysign(__data(__x), __data(__y))}; - else if constexpr (is_same_v<_Tp, long double> && sizeof(_Tp) == 12) - // Remove this case once __bit_cast is implemented via __builtin_bit_cast. - // It is necessary, because __signmask below cannot be computed at compile - // time. 
- return simd<_Tp, _Abi>( - [&](auto __i) { return std::copysign(__x[__i], __y[__i]); }); else { using _V = simd<_Tp, _Abi>; diff --git a/libstdc++-v3/testsuite/experimental/simd/tests/bits/test_values.h b/libstdc++-v3/testsuite/experimental/simd/tests/bits/test_values.h index b69bd0b704d..67aa870659b 100644 --- a/libstdc++-v3/testsuite/experimental/simd/tests/bits/test_values.h +++ b/libstdc++-v3/testsuite/experimental/simd/tests/bits/test_values.h @@ -221,11 +221,11 @@ template <class V> if constexpr (sizeof(T) <= sizeof(double)) { using I = rebind_simd_t<__int_for_sizeof_t<T>, V>; - const I abs_x = __bit_cast<I>(abs(x)); - const I min = __bit_cast<I>(V(std::__norm_min_v<T>)); - const I max = __bit_cast<I>(V(std::__finite_max_v<T>)); + const I abs_x = simd_bit_cast<I>(abs(x)); + const I min = simd_bit_cast<I>(V(std::__norm_min_v<T>)); + const I max = simd_bit_cast<I>(V(std::__finite_max_v<T>)); return static_simd_cast<typename V::mask_type>( - __bit_cast<I>(x) == 0 || (abs_x >= min && abs_x <= max)); + simd_bit_cast<I>(x) == 0 || (abs_x >= min && abs_x <= max)); } else {
* Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast
From: Jakub Jelinek @ 2021-06-24 14:08 UTC
To: Matthias Kretz, Jonathan Wakely; +Cc: gcc-patches, libstdc++

On Thu, Jun 24, 2021 at 04:01:34PM +0200, Matthias Kretz wrote:
> --- a/libstdc++-v3/include/experimental/bits/simd.h
> +++ b/libstdc++-v3/include/experimental/bits/simd.h
> @@ -1598,7 +1598,9 @@ template <typename _To, typename _From>
> _GLIBCXX_SIMD_INTRINSIC constexpr _To
> __bit_cast(const _From __x)
> {
> - // TODO: implement with / replace by __builtin_bit_cast ASAP
> +#if __has_builtin(__builtin_bit_cast)

Shouldn't that use #if _GLIBCXX_HAS_BUILTIN(__builtin_bit_cast) in
c++config to define a new macro and use that macro here?
Though it is true that c++config already uses
#if __has_builtin(__builtin_is_constant_evaluated)
and so would fail miserably for compilers that don't support __has_builtin

	Jakub
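Jakub's concern is that a compiler without `__has_builtin` cannot even preprocess `#if __has_builtin(...)`. A wrapper macro avoids this: it tests `#ifdef __has_builtin` first and otherwise expands to 0. The sketch applies that pattern to a bit-cast helper shaped like the patched `__bit_cast`. `MY_HAS_BUILTIN`, `MY_BIT_CAST_CONSTEXPR` and `my_bit_cast` are hypothetical names; they are not the c++config macro. The `memcpy` fallback is an assumption of this sketch and is not copied from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Expands to 0 (instead of failing to preprocess) when the compiler
// does not provide __has_builtin at all.
#ifdef __has_builtin
# define MY_HAS_BUILTIN(x) __has_builtin(x)
#else
# define MY_HAS_BUILTIN(x) 0
#endif

// Only the builtin path can be used in constant expressions.
#if MY_HAS_BUILTIN(__builtin_bit_cast)
# define MY_BIT_CAST_CONSTEXPR constexpr
#else
# define MY_BIT_CAST_CONSTEXPR
#endif

template <typename To, typename From>
MY_BIT_CAST_CONSTEXPR To my_bit_cast(const From& x) noexcept
{
  static_assert(sizeof(To) == sizeof(From));
  static_assert(std::is_trivially_copyable_v<To>
                && std::is_trivially_copyable_v<From>);
#if MY_HAS_BUILTIN(__builtin_bit_cast)
  return __builtin_bit_cast(To, x);
#else
  To r;  // fallback: copy the object representation byte by byte
  std::memcpy(&r, &x, sizeof(To));
  return r;
#endif
}

// Usage: round-trip between a float and its IEEE-754 bit pattern.
inline void my_bit_cast_example()
{
  assert(my_bit_cast<std::uint32_t>(1.0f) == 0x3f800000u);
  assert(my_bit_cast<float>(0x3f800000u) == 1.0f);
}
```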
* Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast
From: Jonathan Wakely @ 2021-06-24 14:11 UTC
To: Jakub Jelinek; +Cc: Matthias Kretz, gcc Patches, libstdc++

On Thu, 24 Jun 2021 at 15:08, Jakub Jelinek wrote:
>
> On Thu, Jun 24, 2021 at 04:01:34PM +0200, Matthias Kretz wrote:
> > --- a/libstdc++-v3/include/experimental/bits/simd.h
> > +++ b/libstdc++-v3/include/experimental/bits/simd.h
> > @@ -1598,7 +1598,9 @@ template <typename _To, typename _From>
> > _GLIBCXX_SIMD_INTRINSIC constexpr _To
> > __bit_cast(const _From __x)
> > {
> > - // TODO: implement with / replace by __builtin_bit_cast ASAP
> > +#if __has_builtin(__builtin_bit_cast)
>
> Shouldn't that use #if _GLIBCXX_HAS_BUILTIN(__builtin_bit_cast) in
> c++config to define a new macro and use that macro here?
> Though it is true that c++config already uses
> #if __has_builtin(__builtin_is_constant_evaluated)
> and so would fail miserably for compilers that don't support __has_builtin

GCC was the last of our supported compilers to implement
__has_builtin, so for GCC trunk we can assume that it's always
supported.

The code in c++config.h still has some value for built-ins that aren't
called __builtin_xxx because older versions of Clang need different
handling for those. But for __builtin_bit_cast and
__builtin_is_constant_evaluated we can just use __is_builtin directly.
* Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast
From: Jonathan Wakely @ 2021-06-24 14:12 UTC
To: Jakub Jelinek; +Cc: Matthias Kretz, gcc Patches, libstdc++

On Thu, 24 Jun 2021 at 15:11, Jonathan Wakely wrote:
>
> On Thu, 24 Jun 2021 at 15:08, Jakub Jelinek wrote:
> >
> > On Thu, Jun 24, 2021 at 04:01:34PM +0200, Matthias Kretz wrote:
> > > --- a/libstdc++-v3/include/experimental/bits/simd.h
> > > +++ b/libstdc++-v3/include/experimental/bits/simd.h
> > > @@ -1598,7 +1598,9 @@ template <typename _To, typename _From>
> > > _GLIBCXX_SIMD_INTRINSIC constexpr _To
> > > __bit_cast(const _From __x)
> > > {
> > > - // TODO: implement with / replace by __builtin_bit_cast ASAP
> > > +#if __has_builtin(__builtin_bit_cast)
> >
> > Shouldn't that use #if _GLIBCXX_HAS_BUILTIN(__builtin_bit_cast) in
> > c++config to define a new macro and use that macro here?
> > Though it is true that c++config already uses
> > #if __has_builtin(__builtin_is_constant_evaluated)
> > and so would fail miserably for compilers that don't support __has_builtin
>
> GCC was the last of our supported compilers to implement
> __has_builtin, so for GCC trunk we can assume that it's always
> supported.
>
> The code in c++config.h still has some value for built-ins that aren't
> called __builtin_xxx because older versions of Clang need different
> handling for those. But for __builtin_bit_cast and
> __builtin_is_constant_evaluated we can just use __is_builtin directly.

s/__is_builtin/__has_builtin/
* Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast
From: Jakub Jelinek @ 2021-06-24 14:21 UTC
To: Jonathan Wakely; +Cc: Matthias Kretz, gcc Patches, libstdc++

On Thu, Jun 24, 2021 at 03:11:01PM +0100, Jonathan Wakely wrote:
> On Thu, 24 Jun 2021 at 15:08, Jakub Jelinek wrote:
> >
> > On Thu, Jun 24, 2021 at 04:01:34PM +0200, Matthias Kretz wrote:
> > > --- a/libstdc++-v3/include/experimental/bits/simd.h
> > > +++ b/libstdc++-v3/include/experimental/bits/simd.h
> > > @@ -1598,7 +1598,9 @@ template <typename _To, typename _From>
> > > _GLIBCXX_SIMD_INTRINSIC constexpr _To
> > > __bit_cast(const _From __x)
> > > {
> > > - // TODO: implement with / replace by __builtin_bit_cast ASAP
> > > +#if __has_builtin(__builtin_bit_cast)
> >
> > Shouldn't that use #if _GLIBCXX_HAS_BUILTIN(__builtin_bit_cast) in
> > c++config to define a new macro and use that macro here?
> > Though it is true that c++config already uses
> > #if __has_builtin(__builtin_is_constant_evaluated)
> > and so would fail miserably for compilers that don't support __has_builtin
>
> GCC was the last of our supported compilers to implement
> __has_builtin, so for GCC trunk we can assume that it's always
> supported.

We don't support mixing GCC and libstdc++ versions, so I'm not worried
about GCC. At least according to godbolt, already clang 3.0 supports it
which is 10 years old, so probably fine too, but ICC 19.0/19.1 still doesn't
support it, only ICC 2021 does. And ICC 19.1 seems to be released in
October 2020.

So, wouldn't it be better not to #undef _GLIBCXX_HAS_BUILTIN, move its
definition a little bit earlier and use it also for
__builtin_is_constant_evaluated?

	Jakub
* Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast
From: Jonathan Wakely @ 2021-06-24 14:34 UTC
To: Jakub Jelinek; +Cc: Matthias Kretz, gcc Patches, libstdc++

[-- Attachment #1: Type: text/plain, Size: 2358 bytes --]

On Thu, 24 Jun 2021 at 15:21, Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Thu, Jun 24, 2021 at 03:11:01PM +0100, Jonathan Wakely wrote:
> > On Thu, 24 Jun 2021 at 15:08, Jakub Jelinek wrote:
> > >
> > > On Thu, Jun 24, 2021 at 04:01:34PM +0200, Matthias Kretz wrote:
> > > > --- a/libstdc++-v3/include/experimental/bits/simd.h
> > > > +++ b/libstdc++-v3/include/experimental/bits/simd.h
> > > > @@ -1598,7 +1598,9 @@ template <typename _To, typename _From>
> > > > _GLIBCXX_SIMD_INTRINSIC constexpr _To
> > > > __bit_cast(const _From __x)
> > > > {
> > > > - // TODO: implement with / replace by __builtin_bit_cast ASAP
> > > > +#if __has_builtin(__builtin_bit_cast)
> > >
> > > Shouldn't that use #if _GLIBCXX_HAS_BUILTIN(__builtin_bit_cast) in
> > > c++config to define a new macro and use that macro here?
> > > Though it is true that c++config already uses
> > > #if __has_builtin(__builtin_is_constant_evaluated)
> > > and so would fail miserably for compilers that don't support __has_builtin
> >
> > GCC was the last of our supported compilers to implement
> > __has_builtin, so for GCC trunk we can assume that it's always
> > supported.
>
> We don't support mixing GCC and libstdc++ versions, so I'm not worried
> about GCC. At least according to godbolt, already clang 3.0 supports it
> which is 10 years old, so probably fine too, but ICC 19.0/19.1 still doesn't
> support it, only ICC 2021 does. And ICC 19.1 seems to be released in
> October 2020.
>
> So, wouldn't it be better not to #undef _GLIBCXX_HAS_BUILTIN, move its
> definition a little bit earlier and use it also for
> __builtin_is_constant_evaluated?

I discussed this with Judy Ward on the Intel compiler team. If you're
using their compiler, you should be using the latest version. They
also claim 100% compatibility with GCC, for versions they've been able
to test. So if you are using libstdc++ headers from a GCC release that
supports __has_builtin, then you need to use a release of the Intel
compiler that supports __has_builtin. Otherwise, it's unsupported.

So in GCC 12 C++ headers we support GCC 12, versions of Intel
compatible with GCC 12, and the last few releases of Clang. All of
those have __has_builtin.

Rather than use the _GLIBCXX_HAS_BUILTIN macro more widely, I'd prefer
to not use it where it isn't needed, as in the attached (untested)
patch.

[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 12043 bytes --]

diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h index 9911d4deb72..3c075966660 100644 --- a/libstdc++-v3/include/bits/basic_string.h +++ b/libstdc++-v3/include/bits/basic_string.h @@ -55,7 +55,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #ifdef __cpp_lib_is_constant_evaluated // Support P1032R1 in C++20 (but not P0980R1 yet). # define __cpp_lib_constexpr_string 201811L -#elif __cplusplus >= 201703L && _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED +#elif __cplusplus >= 201703L && __has_builtin(__builtin_is_constant_evaluated) // Support P0426R1 changes to char_traits in C++17.
# define __cpp_lib_constexpr_string 201611L #elif __cplusplus > 201703L diff --git a/libstdc++-v3/include/bits/c++config b/libstdc++-v3/include/bits/c++config index 9314117aed8..3ec668b65cf 100644 --- a/libstdc++-v3/include/bits/c++config +++ b/libstdc++-v3/include/bits/c++config @@ -720,13 +720,11 @@ namespace std # define _GLIBCXX_DOUBLE_IS_IEEE_BINARY64 1 #endif -#ifdef __has_builtin -# ifdef __is_identifier +#ifdef __is_identifier // Intel and older Clang require !__is_identifier for some built-ins: -# define _GLIBCXX_HAS_BUILTIN(B) __has_builtin(B) || ! __is_identifier(B) -# else -# define _GLIBCXX_HAS_BUILTIN(B) __has_builtin(B) -# endif +# define _GLIBCXX_HAS_BUILTIN(B) __has_builtin(B) || ! __is_identifier(B) +#else +# define _GLIBCXX_HAS_BUILTIN(B) __has_builtin(B) #endif #if _GLIBCXX_HAS_BUILTIN(__has_unique_object_representations) @@ -737,18 +735,10 @@ namespace std # define _GLIBCXX_HAVE_BUILTIN_IS_AGGREGATE 1 #endif -#if _GLIBCXX_HAS_BUILTIN(__builtin_is_constant_evaluated) -# define _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED 1 -#endif - #if _GLIBCXX_HAS_BUILTIN(__is_same) # define _GLIBCXX_HAVE_BUILTIN_IS_SAME 1 #endif -#if _GLIBCXX_HAS_BUILTIN(__builtin_launder) -# define _GLIBCXX_HAVE_BUILTIN_LAUNDER 1 -#endif - #undef _GLIBCXX_HAS_BUILTIN diff --git a/libstdc++-v3/include/bits/char_traits.h b/libstdc++-v3/include/bits/char_traits.h index 3da6e28a513..77ad7be5dfb 100644 --- a/libstdc++-v3/include/bits/char_traits.h +++ b/libstdc++-v3/include/bits/char_traits.h @@ -238,7 +238,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION #ifdef __cpp_lib_is_constant_evaluated // Unofficial macro indicating P1032R1 support in C++20 # define __cpp_lib_constexpr_char_traits 201811L -#elif __cplusplus >= 201703L && _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED +#elif __cplusplus >= 201703L && __has_builtin(__builtin_is_constant_evaluated) // Unofficial macro indicating P0426R1 support in C++17 # define __cpp_lib_constexpr_char_traits 201611L #endif @@ -295,7 +295,7 @@ 
_GLIBCXX_BEGIN_NAMESPACE_VERSION { if (__n == 0) return 0; -#if __cplusplus >= 201703L && _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED +#if __cplusplus >= 201703L && __has_builtin(__builtin_is_constant_evaluated) if (__builtin_is_constant_evaluated()) { for (size_t __i = 0; __i < __n; ++__i) @@ -312,7 +312,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION static _GLIBCXX17_CONSTEXPR size_t length(const char_type* __s) { -#if __cplusplus >= 201703L && _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED +#if __cplusplus >= 201703L && __has_builtin(__builtin_is_constant_evaluated) if (__builtin_is_constant_evaluated()) return __gnu_cxx::char_traits<char_type>::length(__s); #endif @@ -324,7 +324,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { if (__n == 0) return 0; -#if __cplusplus >= 201703L && _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED +#if __cplusplus >= 201703L && __has_builtin(__builtin_is_constant_evaluated) if (__builtin_is_constant_evaluated()) return __gnu_cxx::char_traits<char_type>::find(__s, __n, __a); #endif @@ -422,7 +422,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { if (__n == 0) return 0; -#if __cplusplus >= 201703L && _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED +#if __cplusplus >= 201703L && __has_builtin(__builtin_is_constant_evaluated) if (__builtin_is_constant_evaluated()) return __gnu_cxx::char_traits<char_type>::compare(__s1, __s2, __n); #endif @@ -432,7 +432,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION static _GLIBCXX17_CONSTEXPR size_t length(const char_type* __s) { -#if __cplusplus >= 201703L && _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED +#if __cplusplus >= 201703L && __has_builtin(__builtin_is_constant_evaluated) if (__builtin_is_constant_evaluated()) return __gnu_cxx::char_traits<char_type>::length(__s); #endif @@ -444,7 +444,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { if (__n == 0) return 0; -#if __cplusplus >= 201703L && _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED +#if __cplusplus >= 201703L && __has_builtin(__builtin_is_constant_evaluated) if (__builtin_is_constant_evaluated()) 
return __gnu_cxx::char_traits<char_type>::find(__s, __n, __a); #endif @@ -539,7 +539,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { if (__n == 0) return 0; -#if __cplusplus >= 201703L && _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED +#if __cplusplus >= 201703L && __has_builtin(__builtin_is_constant_evaluated) if (__builtin_is_constant_evaluated()) return __gnu_cxx::char_traits<char_type>::compare(__s1, __s2, __n); #endif @@ -549,7 +549,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION static _GLIBCXX17_CONSTEXPR size_t length(const char_type* __s) { -#if __cplusplus >= 201703L && _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED +#if __cplusplus >= 201703L && __has_builtin(__builtin_is_constant_evaluated) if (__builtin_is_constant_evaluated()) return __gnu_cxx::char_traits<char_type>::length(__s); #endif @@ -564,7 +564,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { if (__n == 0) return 0; -#if __cplusplus >= 201703L && _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED +#if __cplusplus >= 201703L && __has_builtin(__builtin_is_constant_evaluated) if (__builtin_is_constant_evaluated()) return __gnu_cxx::char_traits<char_type>::find(__s, __n, __a); #endif diff --git a/libstdc++-v3/include/bits/stl_function.h b/libstdc++-v3/include/bits/stl_function.h index 073018d522d..774a9829284 100644 --- a/libstdc++-v3/include/bits/stl_function.h +++ b/libstdc++-v3/include/bits/stl_function.h @@ -413,12 +413,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _GLIBCXX14_CONSTEXPR bool operator()(_Tp* __x, _Tp* __y) const _GLIBCXX_NOTHROW { -#if __cplusplus >= 201402L -#ifdef _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED +#if __cplusplus >= 201402L && __has_builtin(__builtin_is_constant_evaluated) if (__builtin_is_constant_evaluated()) -#else - if (__builtin_constant_p(__x > __y)) -#endif return __x > __y; #endif return (__UINTPTR_TYPE__)__x > (__UINTPTR_TYPE__)__y; @@ -432,12 +428,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _GLIBCXX14_CONSTEXPR bool operator()(_Tp* __x, _Tp* __y) const _GLIBCXX_NOTHROW { -#if __cplusplus >= 201402L -#ifdef 
_GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED +#if __cplusplus >= 201402L && __has_builtin(__builtin_is_constant_evaluated) if (__builtin_is_constant_evaluated()) -#else - if (__builtin_constant_p(__x < __y)) -#endif return __x < __y; #endif return (__UINTPTR_TYPE__)__x < (__UINTPTR_TYPE__)__y; @@ -451,12 +443,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _GLIBCXX14_CONSTEXPR bool operator()(_Tp* __x, _Tp* __y) const _GLIBCXX_NOTHROW { -#if __cplusplus >= 201402L -#ifdef _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED +#if __cplusplus >= 201402L && __has_builtin(__builtin_is_constant_evaluated) if (__builtin_is_constant_evaluated()) -#else - if (__builtin_constant_p(__x >= __y)) -#endif return __x >= __y; #endif return (__UINTPTR_TYPE__)__x >= (__UINTPTR_TYPE__)__y; @@ -470,12 +458,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _GLIBCXX14_CONSTEXPR bool operator()(_Tp* __x, _Tp* __y) const _GLIBCXX_NOTHROW { -#if __cplusplus >= 201402L -#ifdef _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED +#if __cplusplus >= 201402L && __has_builtin(__builtin_is_constant_evaluated) if (__builtin_is_constant_evaluated()) -#else - if (__builtin_constant_p(__x <= __y)) -#endif return __x <= __y; #endif return (__UINTPTR_TYPE__)__x <= (__UINTPTR_TYPE__)__y; diff --git a/libstdc++-v3/include/debug/helper_functions.h b/libstdc++-v3/include/debug/helper_functions.h index c0144ced979..c54311a22d1 100644 --- a/libstdc++-v3/include/debug/helper_functions.h +++ b/libstdc++-v3/include/debug/helper_functions.h @@ -125,7 +125,7 @@ namespace __gnu_debug __check_singular(_Iterator const& __x) { return -#ifdef _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED +#if __has_builtin(__builtin_is_constant_evaluated) __builtin_is_constant_evaluated() ? false : #endif __check_singular_aux(std::__addressof(__x)); @@ -138,7 +138,7 @@ namespace __gnu_debug __check_singular(_Tp* const& __ptr) { return -#ifdef _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED +#if __has_builtin(__builtin_is_constant_evaluated) __builtin_is_constant_evaluated() ? 
false : #endif __ptr == 0; diff --git a/libstdc++-v3/include/std/bit b/libstdc++-v3/include/std/bit index c5aae8bab03..ee8e001fd44 100644 --- a/libstdc++-v3/include/std/bit +++ b/libstdc++-v3/include/std/bit @@ -265,7 +265,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION // representable as a value of _Tp, and so the result is undefined. // Want that undefined behaviour to be detected in constant expressions, // by UBSan, and by debug assertions. -#ifdef _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED +#if __has_builtin(__builtin_is_constant_evaluated) if (!__builtin_is_constant_evaluated()) { __glibcxx_assert( __shift_exponent != __int_traits<_Tp>::__digits ); diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits index d9068a06f08..95a60e406a8 100644 --- a/libstdc++-v3/include/std/type_traits +++ b/libstdc++-v3/include/std/type_traits @@ -3316,7 +3316,7 @@ template <typename _From, typename _To> inline constexpr bool is_scoped_enum_v = is_scoped_enum<_Tp>::value; #endif // C++23 -#ifdef _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED +#if __has_builtin(__builtin_is_constant_evaluated) #define __cpp_lib_is_constant_evaluated 201811L diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version index 27bcd32cb60..3bb50d37a72 100644 --- a/libstdc++-v3/include/std/version +++ b/libstdc++-v3/include/std/version @@ -111,7 +111,7 @@ #endif #define __cpp_lib_is_invocable 201703 #define __cpp_lib_is_swappable 201603 -#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER +#if __has_builtin(__builtin_launder) # define __cpp_lib_launder 201606 #endif #define __cpp_lib_logical_traits 201510 @@ -130,7 +130,7 @@ #define __cpp_lib_chrono 201611 #define __cpp_lib_clamp 201603 #if __cplusplus == 201703L // N.B. 
updated value in C++20 -# if _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED +# if __has_builtin(__builtin_is_constant_evaluated) # define __cpp_lib_constexpr_char_traits 201611L # define __cpp_lib_constexpr_string 201611L # endif @@ -188,7 +188,7 @@ #endif #define __cpp_lib_endian 201907L #define __cpp_lib_int_pow2 202002L -#ifdef _GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED +#ifdef __has_builtin(__builtin_is_constant_evaluated) # define __cpp_lib_is_constant_evaluated 201811L #endif #define __cpp_lib_is_nothrow_convertible 201806L diff --git a/libstdc++-v3/libsupc++/new b/libstdc++-v3/libsupc++/new index 3349b13fd1b..8774b333b90 100644 --- a/libstdc++-v3/libsupc++/new +++ b/libstdc++-v3/libsupc++/new @@ -182,8 +182,7 @@ inline void operator delete[](void*, void*) _GLIBCXX_USE_NOEXCEPT { } //@} } // extern "C++" -#if __cplusplus >= 201703L -#ifdef _GLIBCXX_HAVE_BUILTIN_LAUNDER +#if __cplusplus >= 201703L && __has_builtin(__builtin_launder) namespace std { #define __cpp_lib_launder 201606 @@ -206,7 +205,6 @@ namespace std void launder(volatile void*) = delete; void launder(const volatile void*) = delete; } -#endif // _GLIBCXX_HAVE_BUILTIN_LAUNDER #endif // C++17 #if __cplusplus > 201703L
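The stl_function.h hunks in this patch guard `__builtin_is_constant_evaluated()` with `__has_builtin` directly. During constant evaluation they compare the pointers with the built-in `<`; otherwise they compare them as `__UINTPTR_TYPE__` integers. Below is a standalone sketch of that shape. `ptr_less` is a hypothetical helper, not `std::less`. The nested `#ifdef` is an addition of this sketch for compilers that lack `__has_builtin`, a case the patch deliberately leaves unsupported.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: built-in '<' during constant evaluation (where
// pointer-to-integer conversions are not allowed), integer comparison of
// the addresses otherwise, which also orders pointers to unrelated objects.
template <typename T>
constexpr bool ptr_less(T* x, T* y) noexcept
{
  // Nested rather than "defined(__has_builtin) && __has_builtin(...)",
  // because the latter does not preprocess where __has_builtin is missing.
#ifdef __has_builtin
# if __has_builtin(__builtin_is_constant_evaluated)
  if (__builtin_is_constant_evaluated())
    return x < y;
# endif
#endif
  return reinterpret_cast<std::uintptr_t>(x) < reinterpret_cast<std::uintptr_t>(y);
}

// Compile-time use (requires __builtin_is_constant_evaluated, e.g. GCC 9+).
constexpr int arr[3] = {};
static_assert(ptr_less(&arr[0], &arr[2]));
static_assert(!ptr_less(&arr[2], &arr[0]));

// Run-time use takes the integer-comparison path.
inline void ptr_less_example()
{
  int a[4] = {};
  assert(ptr_less(&a[1], &a[3]));
  assert(!ptr_less(&a[3], &a[1]));
}
```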
* Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast
From: Jonathan Wakely @ 2021-06-24 14:40 UTC
To: Jakub Jelinek; +Cc: Matthias Kretz, gcc Patches, libstdc++

On Thu, 24 Jun 2021 at 15:34, Jonathan Wakely wrote:
> Rather than use the _GLIBCXX_HAS_BUILTIN macro more widely, I'd prefer
> to not use it where it isn't needed, as in the attached (untested)
> patch.

My rationale for this is that I'd prefer to use standardized features
like __has_include and __has_cpp_attribute where possible, instead of
adding more and more configure macros. You don't need to look in
c++config.h to see how the macro is defined if you just use a standard
feature directly.

__has_builtin obviously isn't standardized, but as long as it's
available on all the compilers we care about (which it is) then the
same rationale applies.
* Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast 2021-06-24 14:40 ` Jonathan Wakely @ 2021-06-24 14:44 ` Jakub Jelinek 0 siblings, 0 replies; 29+ messages in thread From: Jakub Jelinek @ 2021-06-24 14:44 UTC (permalink / raw) To: Jonathan Wakely; +Cc: Matthias Kretz, gcc Patches, libstdc++ On Thu, Jun 24, 2021 at 03:40:09PM +0100, Jonathan Wakely wrote: > On Thu, 24 Jun 2021 at 15:34, Jonathan Wakely wrote: > > Rather than use the _GLIBCXX_HAS_BUILTIN macro more widely, I'd prefer > > to not use it where it isn't needed, as in the attached (untested) > > patch. > > My rationale for this is that I'd prefer to use standardized features > like __has_include and __has_cpp_attribute where possible, instead of > adding more and more configure macros. You don't need to look in > c++config.h to see how the macro is defined if you just use a standard > feature directly. > > __has_builtin obviously isn't standardized, but as long as it's > available on all the compilers we care about (which it is) then the > same rationale applies. Okay. Jakub
* Re: [PATCH 04/11 v3] libstdc++: Make use of __builtin_bit_cast 2021-06-24 14:01 ` [PATCH 04/11 v3] " Matthias Kretz 2021-06-24 14:08 ` Jakub Jelinek @ 2021-06-25 11:23 ` Jonathan Wakely 1 sibling, 0 replies; 29+ messages in thread From: Jonathan Wakely @ 2021-06-25 11:23 UTC (permalink / raw) To: Matthias Kretz; +Cc: gcc Patches, libstdc++ On Thu, 24 Jun 2021 at 15:02, Matthias Kretz wrote: > > For -ffast-math there was a missing using namespace __proposed left. The > attached patch resolves the issue. OK for trunk, please push (after adding yourself to the "Write After Approval" section of MAINTAINERS as per https://gcc.gnu.org/gitwrite.html as your first commit). Thanks! > From: Matthias Kretz <m.kretz@gsi.de> > > The __bit_cast function was a hack to achieve what __builtin_bit_cast > can do, therefore use __builtin_bit_cast if possible. However, > __builtin_bit_cast cannot be used to cast from/to fixed_size_simd, since > it isn't trivially copyable (in the language sense — in principle it > is). Therefore add __proposed::simd_bit_cast to enable the use case > required in the test framework. > > Signed-off-by: Matthias Kretz <m.kretz@gsi.de> > > libstdc++-v3/ChangeLog: > > * include/experimental/bits/simd.h (__bit_cast): Implement via > __builtin_bit_cast #if available. > (__proposed::simd_bit_cast): Add overloads for simd and > simd_mask, which use __builtin_bit_cast (or __bit_cast #if not > available), which return an object of the requested type with > the same bits as the argument. > * include/experimental/bits/simd_math.h: Use simd_bit_cast > instead of __bit_cast to allow casts to fixed_size_simd. > (copysign): Remove branch that was only required if __bit_cast > cannot be constexpr. > * testsuite/experimental/simd/tests/bits/test_values.h: Switch > from __bit_cast to __proposed::simd_bit_cast since the former > will not cast fixed_size objects anymore. 
> --- > libstdc++-v3/include/experimental/bits/simd.h | 57 ++++++++++++++++++- > .../include/experimental/bits/simd_math.h | 37 ++++++------ > .../simd/tests/bits/test_values.h | 8 +-- > 3 files changed, 76 insertions(+), 26 deletions(-)
* [PATCH 05/11] libstdc++: Remove incorrect fabs overload 2021-06-08 12:10 [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Matthias Kretz ` (3 preceding siblings ...) 2021-06-08 12:11 ` [PATCH 04/11] libstdc++: Make use of __builtin_bit_cast Matthias Kretz @ 2021-06-08 12:11 ` Matthias Kretz 2021-06-08 12:11 ` [PATCH 06/11] libstdc++: Minor simd_math cleanups Matthias Kretz ` (6 subsequent siblings) 11 siblings, 0 replies; 29+ messages in thread From: Matthias Kretz @ 2021-06-08 12:11 UTC (permalink / raw) To: gcc-patches, libstdc++ [-- Attachment #1: Type: text/plain, Size: 1152 bytes --] From: Matthias Kretz <kretz@kde.org> fabs(int) returns double, this one didn't. This overload is not specified in the Parallelism TS 2. Also remove the comment about labs and llabs: it doesn't belong here. Signed-off-by: Matthias Kretz <m.kretz@gsi.de> libstdc++-v3/ChangeLog: * include/experimental/bits/simd_math.h (fabs): Remove fabs(simd<integral>) overload. --- .../include/experimental/bits/simd_math.h | 16 ---------------- 1 file changed, 16 deletions(-) -- ────────────────────────────────────────────────────────────────────────── Dr. 
Matthias Kretz https://mattkretz.github.io GSI Helmholtz Centre for Heavy Ion Research https://gsi.de std::experimental::simd https://github.com/VcDevel/std-simd ────────────────────────────────────────────────────────────────────────── [-- Attachment #2: 0005-libstdc-Remove-incorrect-fabs-overload.patch --] [-- Type: text/x-patch, Size: 1372 bytes --] diff --git a/libstdc++-v3/include/experimental/bits/simd_math.h b/libstdc++-v3/include/experimental/bits/simd_math.h index 3ade293fcbf..cff4371619d 100644 --- a/libstdc++-v3/include/experimental/bits/simd_math.h +++ b/libstdc++-v3/include/experimental/bits/simd_math.h @@ -863,22 +863,6 @@ template <typename _Tp, typename _Abi> abs(const simd<_Tp, _Abi>& __x) { return {__private_init, _Abi::_SimdImpl::_S_abs(__data(__x))}; } -template <typename _Tp, typename _Abi> - enable_if_t<!is_floating_point_v<_Tp> && is_signed_v<_Tp>, simd<_Tp, _Abi>> - fabs(const simd<_Tp, _Abi>& __x) - { return {__private_init, _Abi::_SimdImpl::_S_abs(__data(__x))}; } - -// the following are overloads for functions in <cstdlib> and not covered by -// [parallel.simd.math]. I don't see much value in making them work, though -/* -template <typename _Abi> simd<long, _Abi> labs(const simd<long, _Abi> &__x) -{ return {__private_init, _Abi::_SimdImpl::abs(__data(__x))}; } - -template <typename _Abi> simd<long long, _Abi> llabs(const simd<long long, _Abi> -&__x) -{ return {__private_init, _Abi::_SimdImpl::abs(__data(__x))}; } -*/ - #define _GLIBCXX_SIMD_CVTING2(_NAME) \ template <typename _Tp, typename _Abi> \ _GLIBCXX_SIMD_INTRINSIC simd<_Tp, _Abi> _NAME( \
* [PATCH 06/11] libstdc++: Minor simd_math cleanups 2021-06-08 12:10 [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Matthias Kretz ` (4 preceding siblings ...) 2021-06-08 12:11 ` [PATCH 05/11] libstdc++: Remove incorrect fabs overload Matthias Kretz @ 2021-06-08 12:11 ` Matthias Kretz 2021-06-08 12:11 ` [PATCH 07/11] libstdc++: Fix condition when AVX512F ldexp implementation is used Matthias Kretz ` (5 subsequent siblings) 11 siblings, 0 replies; 29+ messages in thread From: Matthias Kretz @ 2021-06-08 12:11 UTC (permalink / raw) To: gcc-patches, libstdc++ [-- Attachment #1: Type: text/plain, Size: 1183 bytes --] From: Matthias Kretz <kretz@kde.org> Signed-off-by: Matthias Kretz <m.kretz@gsi.de> libstdc++-v3/ChangeLog: * include/experimental/bits/simd_math.h: Undefine internal macros after use. (frexp): Move #if to a more sensible position and reformat preceding code. (logb): Call _SimdImpl::_S_logb for fixed_size instead of duplicating the code here. (modf): Simplify condition. --- .../include/experimental/bits/simd_math.h | 22 +++++-------------- 1 file changed, 6 insertions(+), 16 deletions(-) -- ────────────────────────────────────────────────────────────────────────── Dr. 
Matthias Kretz https://mattkretz.github.io GSI Helmholtz Centre for Heavy Ion Research https://gsi.de std::experimental::simd https://github.com/VcDevel/std-simd ────────────────────────────────────────────────────────────────────────── [-- Attachment #2: 0006-libstdc-Minor-simd_math-cleanups.patch --] [-- Type: text/x-patch, Size: 2308 bytes --] diff --git a/libstdc++-v3/include/experimental/bits/simd_math.h b/libstdc++-v3/include/experimental/bits/simd_math.h index cff4371619d..a5df2039970 100644 --- a/libstdc++-v3/include/experimental/bits/simd_math.h +++ b/libstdc++-v3/include/experimental/bits/simd_math.h @@ -645,11 +645,8 @@ template <typename _Tp, typename _Abi> return __r; } else if constexpr (__is_fixed_size_abi_v<_Abi>) - { - return {__private_init, - _Abi::_SimdImpl::_S_frexp(__data(__x), __data(*__exp))}; + return {__private_init, _Abi::_SimdImpl::_S_frexp(__data(__x), __data(*__exp))}; #if _GLIBCXX_SIMD_X86INTRIN - } else if constexpr (__have_avx512f) { constexpr size_t _Np = simd_size_v<_Tp, _Abi>; @@ -667,8 +664,8 @@ template <typename _Tp, typename _Abi> _Abi::_CommonImpl::_S_blend(_SimdWrapper<bool, _Np>( __isnonzero), __v, __getmant_avx512(__v))}; -#endif // _GLIBCXX_SIMD_X86INTRIN } +#endif // _GLIBCXX_SIMD_X86INTRIN else { // fallback implementation @@ -749,14 +746,7 @@ template <typename _Tp, typename _Abi> if constexpr (_Np == 1) return std::logb(__x[0]); else if constexpr (__is_fixed_size_abi_v<_Abi>) - { - return {__private_init, - __data(__x)._M_apply_per_chunk([](auto __impl, auto __xx) { - using _V = typename decltype(__impl)::simd_type; - return __data( - std::experimental::logb(_V(__private_init, __xx))); - })}; - } + return {__private_init, _Abi::_SimdImpl::_S_logb(__data(__x))}; #if _GLIBCXX_SIMD_X86INTRIN // {{{ else if constexpr (__have_avx512vl && __is_sse_ps<_Tp, _Np>()) return {__private_init, @@ -827,9 +817,7 @@ template <typename _Tp, typename _Abi> enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>> modf(const simd<_Tp, 
_Abi>& __x, simd<_Tp, _Abi>* __iptr) { - if constexpr (__is_scalar_abi<_Abi>() - || (__is_fixed_size_abi_v< - _Abi> && simd_size_v<_Tp, _Abi> == 1)) + if constexpr (simd_size_v<_Tp, _Abi> == 1) { _Tp __tmp; _Tp __r = std::modf(__x[0], &__tmp); @@ -1472,6 +1460,8 @@ template <typename _Tp, typename _Abi> } // }}} +#undef _GLIBCXX_SIMD_CVTING2 +#undef _GLIBCXX_SIMD_CVTING3 #undef _GLIBCXX_SIMD_MATH_CALL_ #undef _GLIBCXX_SIMD_MATH_CALL2_ #undef _GLIBCXX_SIMD_MATH_CALL3_
* [PATCH 07/11] libstdc++: Fix condition when AVX512F ldexp implementation is used 2021-06-08 12:10 [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Matthias Kretz ` (5 preceding siblings ...) 2021-06-08 12:11 ` [PATCH 06/11] libstdc++: Minor simd_math cleanups Matthias Kretz @ 2021-06-08 12:11 ` Matthias Kretz 2021-06-08 12:11 ` [PATCH 08/11] libstdc++: Avoid raising fp exceptions in trunc, floor, and ceil Matthias Kretz ` (4 subsequent siblings) 11 siblings, 0 replies; 29+ messages in thread From: Matthias Kretz @ 2021-06-08 12:11 UTC (permalink / raw) To: gcc-patches, libstdc++ [-- Attachment #1: Type: text/plain, Size: 1170 bytes --] From: Matthias Kretz <kretz@kde.org> This improves codegen of ldexp if AVX512VL is available. Signed-off-by: Matthias Kretz <m.kretz@gsi.de> libstdc++-v3/ChangeLog: * include/experimental/bits/simd_x86.h (_S_ldexp): The AVX512F implementation doesn't require a _VecBltnBtmsk ABI tag, it requires either a 64-Byte input (in which case AVX512F must be available) or AVX512VL. --- libstdc++-v3/include/experimental/bits/simd_x86.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) -- ────────────────────────────────────────────────────────────────────────── Dr. 
Matthias Kretz https://mattkretz.github.io GSI Helmholtz Centre for Heavy Ion Research https://gsi.de std::experimental::simd https://github.com/VcDevel/std-simd ────────────────────────────────────────────────────────────────────────── [-- Attachment #2: 0007-libstdc-Fix-condition-when-AVX512F-ldexp-implementat.patch --] [-- Type: text/x-patch, Size: 1009 bytes --] diff --git a/libstdc++-v3/include/experimental/bits/simd_x86.h b/libstdc++-v3/include/experimental/bits/simd_x86.h index 305d7a9fa54..5706bf63845 100644 --- a/libstdc++-v3/include/experimental/bits/simd_x86.h +++ b/libstdc++-v3/include/experimental/bits/simd_x86.h @@ -2611,13 +2611,14 @@ template <typename _Abi> _S_ldexp(_SimdWrapper<_Tp, _Np> __x, __fixed_size_storage_t<int, _Np> __exp) { - if constexpr (__is_avx512_abi<_Abi>()) + if constexpr (sizeof(__x) == 64 || __have_avx512vl) { const auto __xi = __to_intrin(__x); constexpr _SimdConverter<int, simd_abi::fixed_size<_Np>, _Tp, _Abi> __cvt; const auto __expi = __to_intrin(__cvt(__exp)); - constexpr auto __k1 = _Abi::template _S_implicit_mask_intrin<_Tp>(); + using _Up = __bool_storage_member_type_t<_Np>; + constexpr _Up __k1 = _Np < sizeof(_Up) * __CHAR_BIT__ ? _Up((1ULL << _Np) - 1) : ~_Up(); if constexpr (sizeof(__xi) == 16) { if constexpr (sizeof(_Tp) == 8)
* [PATCH 08/11] libstdc++: Avoid raising fp exceptions in trunc, floor, and ceil 2021-06-08 12:10 [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Matthias Kretz ` (6 preceding siblings ...) 2021-06-08 12:11 ` [PATCH 07/11] libstdc++: Fix condition when AVX512F ldexp implementation is used Matthias Kretz @ 2021-06-08 12:11 ` Matthias Kretz 2021-06-08 12:11 ` [PATCH 09/11] libstdc++: Ensure unrolled loops inline the lambda Matthias Kretz ` (3 subsequent siblings) 11 siblings, 0 replies; 29+ messages in thread From: Matthias Kretz @ 2021-06-08 12:11 UTC (permalink / raw) To: gcc-patches, libstdc++ [-- Attachment #1: Type: text/plain, Size: 1050 bytes --] From: Matthias Kretz <kretz@kde.org> Signed-off-by: Matthias Kretz <m.kretz@gsi.de> libstdc++-v3/ChangeLog: * include/experimental/bits/simd_x86.h (_S_trunc, _S_floor, _S_ceil): Set bit 8 (_MM_FROUND_NO_EXC) on AVX and SSE4.1 roundp[sd] calls. --- .../include/experimental/bits/simd_x86.h | 24 +++++++++---------- 1 file changed, 12 insertions(+), 12 deletions(-) -- ────────────────────────────────────────────────────────────────────────── Dr. 
Matthias Kretz https://mattkretz.github.io GSI Helmholtz Centre for Heavy Ion Research https://gsi.de std::experimental::simd https://github.com/VcDevel/std-simd ────────────────────────────────────────────────────────────────────────── [-- Attachment #2: 0008-libstdc-Avoid-raising-fp-exceptions-in-trunc-floor-a.patch --] [-- Type: text/x-patch, Size: 2545 bytes --] diff --git a/libstdc++-v3/include/experimental/bits/simd_x86.h b/libstdc++-v3/include/experimental/bits/simd_x86.h index 5706bf63845..34633c096b1 100644 --- a/libstdc++-v3/include/experimental/bits/simd_x86.h +++ b/libstdc++-v3/include/experimental/bits/simd_x86.h @@ -2657,13 +2657,13 @@ template <typename _Abi> else if constexpr (__is_avx512_pd<_Tp, _Np>()) return _mm512_roundscale_pd(__x, 0x0b); else if constexpr (__is_avx_ps<_Tp, _Np>()) - return _mm256_round_ps(__x, 0x3); + return _mm256_round_ps(__x, 0xb); else if constexpr (__is_avx_pd<_Tp, _Np>()) - return _mm256_round_pd(__x, 0x3); + return _mm256_round_pd(__x, 0xb); else if constexpr (__have_sse4_1 && __is_sse_ps<_Tp, _Np>()) - return __auto_bitcast(_mm_round_ps(__to_intrin(__x), 0x3)); + return __auto_bitcast(_mm_round_ps(__to_intrin(__x), 0xb)); else if constexpr (__have_sse4_1 && __is_sse_pd<_Tp, _Np>()) - return _mm_round_pd(__x, 0x3); + return _mm_round_pd(__x, 0xb); else if constexpr (__is_sse_ps<_Tp, _Np>()) { auto __truncated @@ -2786,13 +2786,13 @@ template <typename _Abi> else if constexpr (__is_avx512_pd<_Tp, _Np>()) return _mm512_roundscale_pd(__x, 0x09); else if constexpr (__is_avx_ps<_Tp, _Np>()) - return _mm256_round_ps(__x, 0x1); + return _mm256_round_ps(__x, 0x9); else if constexpr (__is_avx_pd<_Tp, _Np>()) - return _mm256_round_pd(__x, 0x1); + return _mm256_round_pd(__x, 0x9); else if constexpr (__have_sse4_1 && __is_sse_ps<_Tp, _Np>()) - return __auto_bitcast(_mm_floor_ps(__to_intrin(__x))); + return __auto_bitcast(_mm_round_ps(__to_intrin(__x), 0x9)); else if constexpr (__have_sse4_1 && __is_sse_pd<_Tp, _Np>()) - return 
_mm_floor_pd(__x); + return _mm_round_pd(__x, 0x9); else return _Base::_S_floor(__x); } @@ -2808,13 +2808,13 @@ template <typename _Abi> else if constexpr (__is_avx512_pd<_Tp, _Np>()) return _mm512_roundscale_pd(__x, 0x0a); else if constexpr (__is_avx_ps<_Tp, _Np>()) - return _mm256_round_ps(__x, 0x2); + return _mm256_round_ps(__x, 0xa); else if constexpr (__is_avx_pd<_Tp, _Np>()) - return _mm256_round_pd(__x, 0x2); + return _mm256_round_pd(__x, 0xa); else if constexpr (__have_sse4_1 && __is_sse_ps<_Tp, _Np>()) - return __auto_bitcast(_mm_ceil_ps(__to_intrin(__x))); + return __auto_bitcast(_mm_round_ps(__to_intrin(__x), 0xa)); else if constexpr (__have_sse4_1 && __is_sse_pd<_Tp, _Np>()) - return _mm_ceil_pd(__x); + return _mm_round_pd(__x, 0xa); else return _Base::_S_ceil(__x); }
* [PATCH 09/11] libstdc++: Ensure unrolled loops inline the lambda 2021-06-08 12:10 [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Matthias Kretz ` (7 preceding siblings ...) 2021-06-08 12:11 ` [PATCH 08/11] libstdc++: Avoid raising fp exceptions in trunc, floor, and ceil Matthias Kretz @ 2021-06-08 12:11 ` Matthias Kretz 2021-06-08 12:12 ` [PATCH 10/11] libstdc++: Fix internal names: add missing underscores Matthias Kretz ` (2 subsequent siblings) 11 siblings, 0 replies; 29+ messages in thread From: Matthias Kretz @ 2021-06-08 12:11 UTC (permalink / raw) To: gcc-patches, libstdc++ [-- Attachment #1: Type: text/plain, Size: 1088 bytes --] From: Matthias Kretz <kretz@kde.org> Signed-off-by: Matthias Kretz <m.kretz@gsi.de> libstdc++-v3/ChangeLog: * include/experimental/bits/simd.h (__execute_on_index_sequence, __execute_on_index_sequence_with_return, __call_with_n_evaluations, __call_with_subscripts): Add flatten attribute. --- libstdc++-v3/include/experimental/bits/simd.h | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) -- ────────────────────────────────────────────────────────────────────────── Dr. Matthias Kretz https://mattkretz.github.io GSI Helmholtz Centre for Heavy Ion Research https://gsi.de std::experimental::simd https://github.com/VcDevel/std-simd ────────────────────────────────────────────────────────────────────────── [-- Attachment #2: 0009-libstdc-Ensure-unrolled-loops-inline-the-lambda.patch --] [-- Type: text/x-patch, Size: 1830 bytes --] diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h index 5d243f22434..21100c1087d 100644 --- a/libstdc++-v3/include/experimental/bits/simd.h +++ b/libstdc++-v3/include/experimental/bits/simd.h @@ -234,7 +234,8 @@ namespace __detail // unrolled/pack execution helpers // __execute_n_times{{{ template <typename _Fp, size_t... 
_I> - _GLIBCXX_SIMD_INTRINSIC constexpr void + [[__gnu__::__flatten__]] _GLIBCXX_SIMD_INTRINSIC constexpr + void __execute_on_index_sequence(_Fp&& __f, index_sequence<_I...>) { ((void)__f(_SizeConstant<_I>()), ...); } @@ -254,7 +255,8 @@ template <size_t _Np, typename _Fp> // }}} // __generate_from_n_evaluations{{{ template <typename _R, typename _Fp, size_t... _I> - _GLIBCXX_SIMD_INTRINSIC constexpr _R + [[__gnu__::__flatten__]] _GLIBCXX_SIMD_INTRINSIC constexpr + _R __execute_on_index_sequence_with_return(_Fp&& __f, index_sequence<_I...>) { return _R{__f(_SizeConstant<_I>())...}; } @@ -269,7 +271,8 @@ template <size_t _Np, typename _R, typename _Fp> // }}} // __call_with_n_evaluations{{{ template <size_t... _I, typename _F0, typename _FArgs> - _GLIBCXX_SIMD_INTRINSIC constexpr auto + [[__gnu__::__flatten__]] _GLIBCXX_SIMD_INTRINSIC constexpr + auto __call_with_n_evaluations(index_sequence<_I...>, _F0&& __f0, _FArgs&& __fargs) { return __f0(__fargs(_SizeConstant<_I>())...); } @@ -285,7 +288,8 @@ template <size_t _Np, typename _F0, typename _FArgs> // }}} // __call_with_subscripts{{{ template <size_t _First = 0, size_t... _It, typename _Tp, typename _Fp> - _GLIBCXX_SIMD_INTRINSIC constexpr auto + [[__gnu__::__flatten__]] _GLIBCXX_SIMD_INTRINSIC constexpr + auto __call_with_subscripts(_Tp&& __x, index_sequence<_It...>, _Fp&& __fun) { return __fun(__x[_First + _It]...); }
* [PATCH 10/11] libstdc++: Fix internal names: add missing underscores 2021-06-08 12:10 [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Matthias Kretz ` (8 preceding siblings ...) 2021-06-08 12:11 ` [PATCH 09/11] libstdc++: Ensure unrolled loops inline the lambda Matthias Kretz @ 2021-06-08 12:12 ` Matthias Kretz 2021-06-08 12:12 ` [PATCH 11/11] libstdc++: Fix ODR issues with different -m flags Matthias Kretz 2021-06-24 13:42 ` [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Jonathan Wakely 11 siblings, 0 replies; 29+ messages in thread From: Matthias Kretz @ 2021-06-08 12:12 UTC (permalink / raw) To: gcc-patches, libstdc++ [-- Attachment #1: Type: text/plain, Size: 1078 bytes --] From: Matthias Kretz <kretz@kde.org> Signed-off-by: Matthias Kretz <m.kretz@gsi.de> libstdc++-v3/ChangeLog: * include/experimental/bits/simd_math.h (_GLIBCXX_SIMD_MATH_CALL2_): Rename arg2_ to __arg2. (_GLIBCXX_SIMD_MATH_CALL3_): Rename arg2_ to __arg2 and arg3_ to __arg3. --- libstdc++-v3/include/experimental/bits/simd_math.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) -- ────────────────────────────────────────────────────────────────────────── Dr. 
Matthias Kretz https://mattkretz.github.io GSI Helmholtz Centre for Heavy Ion Research https://gsi.de std::experimental::simd https://github.com/VcDevel/std-simd ────────────────────────────────────────────────────────────────────────── [-- Attachment #2: 0010-libstdc-Fix-internal-names-add-missing-underscores.patch --] [-- Type: text/x-patch, Size: 2737 bytes --] diff --git a/libstdc++-v3/include/experimental/bits/simd_math.h b/libstdc++-v3/include/experimental/bits/simd_math.h index a5df2039970..61af9fc67af 100644 --- a/libstdc++-v3/include/experimental/bits/simd_math.h +++ b/libstdc++-v3/include/experimental/bits/simd_math.h @@ -119,10 +119,10 @@ template <typename _Up, typename _Tp, typename _Abi> //}}} // _GLIBCXX_SIMD_MATH_CALL2_ {{{ -#define _GLIBCXX_SIMD_MATH_CALL2_(__name, arg2_) \ +#define _GLIBCXX_SIMD_MATH_CALL2_(__name, __arg2) \ template < \ typename _Tp, typename _Abi, typename..., \ - typename _Arg2 = _Extra_argument_type<arg2_, _Tp, _Abi>, \ + typename _Arg2 = _Extra_argument_type<__arg2, _Tp, _Abi>, \ typename _R = _Math_return_type_t< \ decltype(std::__name(declval<double>(), _Arg2::declval())), _Tp, _Abi>> \ enable_if_t<is_floating_point_v<_Tp>, _R> \ @@ -137,7 +137,7 @@ template <typename _Up, typename _Tp, typename _Abi> \ declval<double>(), \ declval<enable_if_t< \ conjunction_v< \ - is_same<arg2_, _Tp>, \ + is_same<__arg2, _Tp>, \ negation<is_same<__remove_cvref_t<_Up>, simd<_Tp, _Abi>>>, \ is_convertible<_Up, simd<_Tp, _Abi>>, is_floating_point<_Tp>>, \ double>>())), \ @@ -147,10 +147,10 @@ template <typename _Up, typename _Tp, typename _Abi> \ // }}} // _GLIBCXX_SIMD_MATH_CALL3_ {{{ -#define _GLIBCXX_SIMD_MATH_CALL3_(__name, arg2_, arg3_) \ +#define _GLIBCXX_SIMD_MATH_CALL3_(__name, __arg2, __arg3) \ template <typename _Tp, typename _Abi, typename..., \ - typename _Arg2 = _Extra_argument_type<arg2_, _Tp, _Abi>, \ - typename _Arg3 = _Extra_argument_type<arg3_, _Tp, _Abi>, \ + typename _Arg2 = _Extra_argument_type<__arg2, _Tp, _Abi>, \ + 
typename _Arg3 = _Extra_argument_type<__arg3, _Tp, _Abi>, \ typename _R = _Math_return_type_t< \ decltype(std::__name(declval<double>(), _Arg2::declval(), \ _Arg3::declval())), \
* [PATCH 11/11] libstdc++: Fix ODR issues with different -m flags 2021-06-08 12:10 [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Matthias Kretz ` (9 preceding siblings ...) 2021-06-08 12:12 ` [PATCH 10/11] libstdc++: Fix internal names: add missing underscores Matthias Kretz @ 2021-06-08 12:12 ` Matthias Kretz 2021-06-09 12:22 ` Richard Biener 2021-11-15 8:57 ` Matthias Kretz 2021-06-24 13:42 ` [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Jonathan Wakely 11 siblings, 2 replies; 29+ messages in thread From: Matthias Kretz @ 2021-06-08 12:12 UTC (permalink / raw) To: gcc-patches, libstdc++ [-- Attachment #1: Type: text/plain, Size: 3618 bytes --] From: Matthias Kretz <kretz@kde.org> Explicitly support use of the stdx::simd implementation in situations where the user links TUs that were compiled with different -m flags. In general, this is always a (quasi) ODR violation for inline functions because at least codegen may differ in important ways. However, in the resulting executable only one (unspecified which one) of them might be used. For simd we want to support users to compile code multiple times, with different -m flags and have a runtime dispatch to the TU matching the target CPU. But if internal functions are not inlined this may lead to unexpected performance loss or execution of illegal instructions. Therefore, inline functions that are not marked as always_inline must use an additional template parameter somewhere in their name, to disambiguate between the different -m translations. Signed-off-by: Matthias Kretz <m.kretz@gsi.de> libstdc++-v3/ChangeLog: * include/experimental/bits/simd.h: Move feature detection bools and add __have_avx512bitalg, __have_avx512vbmi2, __have_avx512vbmi, __have_avx512ifma, __have_avx512cd, __have_avx512vnni, __have_avx512vpopcntdq. (__detail::__machine_flags): New function which returns a unique uint64 depending on relevant -m and -f flags. 
(__detail::__odr_helper): New type alias for either an anonymous type or a type specialized with the __machine_flags number. (_SimdIntOperators): Change template parameters from _Impl to _Tp, _Abi because _Impl now has an __odr_helper parameter which may be _OdrEnforcer from the anonymous namespace, which makes for a bad base class. (many): Either add __odr_helper template parameter or mark as always_inline. * include/experimental/bits/simd_detail.h: Add defines for AVX512BITALG, AVX512VBMI2, AVX512VBMI, AVX512IFMA, AVX512CD, AVX512VNNI, AVX512VPOPCNTDQ, and AVX512VP2INTERSECT. * include/experimental/bits/simd_builtin.h: Add __odr_helper template parameter or mark as always_inline. * include/experimental/bits/simd_fixed_size.h: Ditto. * include/experimental/bits/simd_math.h: Ditto. * include/experimental/bits/simd_scalar.h: Ditto. * include/experimental/bits/simd_neon.h: Add __odr_helper template parameter. * include/experimental/bits/simd_ppc.h: Ditto. * include/experimental/bits/simd_x86.h: Ditto. --- libstdc++-v3/include/experimental/bits/simd.h | 380 ++++++++++++------ .../include/experimental/bits/simd_builtin.h | 41 +- .../include/experimental/bits/simd_detail.h | 40 ++ .../experimental/bits/simd_fixed_size.h | 39 +- .../include/experimental/bits/simd_math.h | 45 ++- .../include/experimental/bits/simd_neon.h | 4 +- .../include/experimental/bits/simd_ppc.h | 4 +- .../include/experimental/bits/simd_scalar.h | 71 +++- .../include/experimental/bits/simd_x86.h | 4 +- 9 files changed, 440 insertions(+), 188 deletions(-) -- ────────────────────────────────────────────────────────────────────────── Dr. 
Matthias Kretz https://mattkretz.github.io GSI Helmholtz Centre for Heavy Ion Research https://gsi.de std::experimental::simd https://github.com/VcDevel/std-simd ────────────────────────────────────────────────────────────────────────── [-- Attachment #2: 0011-libstdc-Fix-ODR-issues-with-different-m-flags.patch --] [-- Type: text/x-patch, Size: 53223 bytes --] diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h index 21100c1087d..43331134301 100644 --- a/libstdc++-v3/include/experimental/bits/simd.h +++ b/libstdc++-v3/include/experimental/bits/simd.h @@ -35,6 +35,7 @@ #include <cstdio> // for stderr #endif #include <cstring> +#include <cmath> #include <functional> #include <iosfwd> #include <utility> @@ -203,9 +204,170 @@ template <size_t _Np> // }}} template <size_t _Xp> using _SizeConstant = integral_constant<size_t, _Xp>; +// constexpr feature detection{{{ +constexpr inline bool __have_mmx = _GLIBCXX_SIMD_HAVE_MMX; +constexpr inline bool __have_sse = _GLIBCXX_SIMD_HAVE_SSE; +constexpr inline bool __have_sse2 = _GLIBCXX_SIMD_HAVE_SSE2; +constexpr inline bool __have_sse3 = _GLIBCXX_SIMD_HAVE_SSE3; +constexpr inline bool __have_ssse3 = _GLIBCXX_SIMD_HAVE_SSSE3; +constexpr inline bool __have_sse4_1 = _GLIBCXX_SIMD_HAVE_SSE4_1; +constexpr inline bool __have_sse4_2 = _GLIBCXX_SIMD_HAVE_SSE4_2; +constexpr inline bool __have_xop = _GLIBCXX_SIMD_HAVE_XOP; +constexpr inline bool __have_avx = _GLIBCXX_SIMD_HAVE_AVX; +constexpr inline bool __have_avx2 = _GLIBCXX_SIMD_HAVE_AVX2; +constexpr inline bool __have_bmi = _GLIBCXX_SIMD_HAVE_BMI1; +constexpr inline bool __have_bmi2 = _GLIBCXX_SIMD_HAVE_BMI2; +constexpr inline bool __have_lzcnt = _GLIBCXX_SIMD_HAVE_LZCNT; +constexpr inline bool __have_sse4a = _GLIBCXX_SIMD_HAVE_SSE4A; +constexpr inline bool __have_fma = _GLIBCXX_SIMD_HAVE_FMA; +constexpr inline bool __have_fma4 = _GLIBCXX_SIMD_HAVE_FMA4; +constexpr inline bool __have_f16c = _GLIBCXX_SIMD_HAVE_F16C; +constexpr 
inline bool __have_popcnt = _GLIBCXX_SIMD_HAVE_POPCNT; +constexpr inline bool __have_avx512f = _GLIBCXX_SIMD_HAVE_AVX512F; +constexpr inline bool __have_avx512dq = _GLIBCXX_SIMD_HAVE_AVX512DQ; +constexpr inline bool __have_avx512vl = _GLIBCXX_SIMD_HAVE_AVX512VL; +constexpr inline bool __have_avx512bw = _GLIBCXX_SIMD_HAVE_AVX512BW; +constexpr inline bool __have_avx512dq_vl = __have_avx512dq && __have_avx512vl; +constexpr inline bool __have_avx512bw_vl = __have_avx512bw && __have_avx512vl; +constexpr inline bool __have_avx512bitalg = _GLIBCXX_SIMD_HAVE_AVX512BITALG; +constexpr inline bool __have_avx512vbmi2 = _GLIBCXX_SIMD_HAVE_AVX512VBMI2; +constexpr inline bool __have_avx512vbmi = _GLIBCXX_SIMD_HAVE_AVX512VBMI; +constexpr inline bool __have_avx512ifma = _GLIBCXX_SIMD_HAVE_AVX512IFMA; +constexpr inline bool __have_avx512cd = _GLIBCXX_SIMD_HAVE_AVX512CD; +constexpr inline bool __have_avx512vnni = _GLIBCXX_SIMD_HAVE_AVX512VNNI; +constexpr inline bool __have_avx512vpopcntdq = _GLIBCXX_SIMD_HAVE_AVX512VPOPCNTDQ; +constexpr inline bool __have_avx512vp2intersect = _GLIBCXX_SIMD_HAVE_AVX512VP2INTERSECT; + +constexpr inline bool __have_neon = _GLIBCXX_SIMD_HAVE_NEON; +constexpr inline bool __have_neon_a32 = _GLIBCXX_SIMD_HAVE_NEON_A32; +constexpr inline bool __have_neon_a64 = _GLIBCXX_SIMD_HAVE_NEON_A64; +constexpr inline bool __support_neon_float = +#if defined __GCC_IEC_559 + __GCC_IEC_559 == 0; +#elif defined __FAST_MATH__ + true; +#else + false; +#endif + +#ifdef _ARCH_PWR10 +constexpr inline bool __have_power10vec = true; +#else +constexpr inline bool __have_power10vec = false; +#endif +#ifdef __POWER9_VECTOR__ +constexpr inline bool __have_power9vec = true; +#else +constexpr inline bool __have_power9vec = false; +#endif +#if defined __POWER8_VECTOR__ +constexpr inline bool __have_power8vec = true; +#else +constexpr inline bool __have_power8vec = __have_power9vec; +#endif +#if defined __VSX__ +constexpr inline bool __have_power_vsx = true; +#else +constexpr inline bool 
__have_power_vsx = __have_power8vec; +#endif +#if defined __ALTIVEC__ +constexpr inline bool __have_power_vmx = true; +#else +constexpr inline bool __have_power_vmx = __have_power_vsx; +#endif + +// }}} namespace __detail { + constexpr std::uint_least64_t + __floating_point_flags() + { + std::uint_least64_t __flags = 0; + if constexpr (math_errhandling & MATH_ERREXCEPT) + __flags |= 1; +#ifdef __FAST_MATH__ + __flags |= 1 << 1; +#elif __FINITE_MATH_ONLY__ + __flags |= 2 << 1; +#elif __GCC_IEC_559 < 2 + __flags |= 3 << 1; +#endif + __flags |= (__FLT_EVAL_METHOD__ + 1) << 3; + return __flags; + } + + constexpr std::uint_least64_t + __machine_flags() + { + if constexpr (__have_mmx || __have_sse) + return __have_mmx + | (__have_sse << 1) + | (__have_sse2 << 2) + | (__have_sse3 << 3) + | (__have_ssse3 << 4) + | (__have_sse4_1 << 5) + | (__have_sse4_2 << 6) + | (__have_xop << 7) + | (__have_avx << 8) + | (__have_avx2 << 9) + | (__have_bmi << 10) + | (__have_bmi2 << 11) + | (__have_lzcnt << 12) + | (__have_sse4a << 13) + | (__have_fma << 14) + | (__have_fma4 << 15) + | (__have_f16c << 16) + | (__have_popcnt << 17) + | (__have_avx512f << 18) + | (__have_avx512dq << 19) + | (__have_avx512vl << 20) + | (__have_avx512bw << 21) + | (__have_avx512bitalg << 22) + | (__have_avx512vbmi2 << 23) + | (__have_avx512vbmi << 24) + | (__have_avx512ifma << 25) + | (__have_avx512cd << 26) + | (__have_avx512vnni << 27) + | (__have_avx512vpopcntdq << 28) + | (__have_avx512vp2intersect << 29); + else if constexpr (__have_neon) + return __have_neon + | (__have_neon_a32 << 1) + | (__have_neon_a64 << 2) + | (__have_neon_a64 << 2) + | (__support_neon_float << 3); + else if constexpr (__have_power_vmx) + return __have_power_vmx + | (__have_power_vsx << 1) + | (__have_power8vec << 2) + | (__have_power9vec << 3) + | (__have_power10vec << 4); + else + return 0; + } + + namespace + { + struct _OdrEnforcer {}; + } + + template <std::uint_least64_t...> + struct _MachineFlagsTemplate {}; + + /**@internal 
+   * Use this type as default template argument to all function templates that
+   * are not declared always_inline. It ensures, that a function
+   * specialization, which the compiler decides not to inline, has a unique symbol
+   * (_OdrEnforcer) or a symbol matching the machine/architecture flags
+   * (_MachineFlagsTemplate). This helps to avoid ODR violations in cases where
+   * users link TUs compiled with different flags. This is especially important
+   * for using simd in libraries.
+   */
+  using __odr_helper
+    = conditional_t<__machine_flags() == 0, _OdrEnforcer,
+                    _MachineFlagsTemplate<__machine_flags(), __floating_point_flags()>>;
+
   struct _Minimum
   {
     template <typename _Tp>
@@ -469,71 +631,6 @@ template <int _Np>
 template <typename _Tp>
   inline constexpr bool __is_fixed_size_abi_v = __is_fixed_size_abi<_Tp>::value;
-// }}}
-// constexpr feature detection{{{
-constexpr inline bool __have_mmx = _GLIBCXX_SIMD_HAVE_MMX;
-constexpr inline bool __have_sse = _GLIBCXX_SIMD_HAVE_SSE;
-constexpr inline bool __have_sse2 = _GLIBCXX_SIMD_HAVE_SSE2;
-constexpr inline bool __have_sse3 = _GLIBCXX_SIMD_HAVE_SSE3;
-constexpr inline bool __have_ssse3 = _GLIBCXX_SIMD_HAVE_SSSE3;
-constexpr inline bool __have_sse4_1 = _GLIBCXX_SIMD_HAVE_SSE4_1;
-constexpr inline bool __have_sse4_2 = _GLIBCXX_SIMD_HAVE_SSE4_2;
-constexpr inline bool __have_xop = _GLIBCXX_SIMD_HAVE_XOP;
-constexpr inline bool __have_avx = _GLIBCXX_SIMD_HAVE_AVX;
-constexpr inline bool __have_avx2 = _GLIBCXX_SIMD_HAVE_AVX2;
-constexpr inline bool __have_bmi = _GLIBCXX_SIMD_HAVE_BMI1;
-constexpr inline bool __have_bmi2 = _GLIBCXX_SIMD_HAVE_BMI2;
-constexpr inline bool __have_lzcnt = _GLIBCXX_SIMD_HAVE_LZCNT;
-constexpr inline bool __have_sse4a = _GLIBCXX_SIMD_HAVE_SSE4A;
-constexpr inline bool __have_fma = _GLIBCXX_SIMD_HAVE_FMA;
-constexpr inline bool __have_fma4 = _GLIBCXX_SIMD_HAVE_FMA4;
-constexpr inline bool __have_f16c = _GLIBCXX_SIMD_HAVE_F16C;
-constexpr inline bool __have_popcnt = _GLIBCXX_SIMD_HAVE_POPCNT;
-constexpr inline bool __have_avx512f = _GLIBCXX_SIMD_HAVE_AVX512F;
-constexpr inline bool __have_avx512dq = _GLIBCXX_SIMD_HAVE_AVX512DQ;
-constexpr inline bool __have_avx512vl = _GLIBCXX_SIMD_HAVE_AVX512VL;
-constexpr inline bool __have_avx512bw = _GLIBCXX_SIMD_HAVE_AVX512BW;
-constexpr inline bool __have_avx512dq_vl = __have_avx512dq && __have_avx512vl;
-constexpr inline bool __have_avx512bw_vl = __have_avx512bw && __have_avx512vl;
-
-constexpr inline bool __have_neon = _GLIBCXX_SIMD_HAVE_NEON;
-constexpr inline bool __have_neon_a32 = _GLIBCXX_SIMD_HAVE_NEON_A32;
-constexpr inline bool __have_neon_a64 = _GLIBCXX_SIMD_HAVE_NEON_A64;
-constexpr inline bool __support_neon_float =
-#if defined __GCC_IEC_559
-  __GCC_IEC_559 == 0;
-#elif defined __FAST_MATH__
-  true;
-#else
-  false;
-#endif
-
-#ifdef _ARCH_PWR10
-constexpr inline bool __have_power10vec = true;
-#else
-constexpr inline bool __have_power10vec = false;
-#endif
-#ifdef __POWER9_VECTOR__
-constexpr inline bool __have_power9vec = true;
-#else
-constexpr inline bool __have_power9vec = false;
-#endif
-#if defined __POWER8_VECTOR__
-constexpr inline bool __have_power8vec = true;
-#else
-constexpr inline bool __have_power8vec = __have_power9vec;
-#endif
-#if defined __VSX__
-constexpr inline bool __have_power_vsx = true;
-#else
-constexpr inline bool __have_power_vsx = __have_power8vec;
-#endif
-#if defined __ALTIVEC__
-constexpr inline bool __have_power_vmx = true;
-#else
-constexpr inline bool __have_power_vmx = __have_power_vsx;
-#endif
-
 // }}}
 // __is_scalar_abi {{{
 template <typename _Abi>
@@ -3984,7 +4081,7 @@ template <typename _Tp, typename _A0, typename... _As>
 // }}}
 // concat(simd...) {{{
-template <typename _Tp, typename... _As>
+template <typename _Tp, typename... _As, typename = __detail::__odr_helper>
   inline _GLIBCXX_SIMD_CONSTEXPR
   simd<_Tp, simd_abi::deduce_t<_Tp, (simd_size_v<_Tp, _As> + ...)>>
   concat(const simd<_Tp, _As>&... __xs)
@@ -4567,6 +4664,7 @@ template <typename _Tp, typename _Abi>
     template <typename _Up, typename _A2,
               typename = enable_if_t<simd_size_v<_Up, _A2> == simd_size_v<_Tp, _Abi>>>
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       operator simd_mask<_Up, _A2>() &&
       {
         using namespace std::experimental::__proposed;
@@ -4801,121 +4899,153 @@ find_last_set(_ExactBool)
 // }}}
 // _SimdIntOperators{{{1
-template <typename _V, typename _Impl, bool>
+template <typename _V, typename _Tp, typename _Abi, bool>
   class _SimdIntOperators {};
-template <typename _V, typename _Impl>
-  class _SimdIntOperators<_V, _Impl, true>
+template <typename _V, typename _Tp, typename _Abi>
+  class _SimdIntOperators<_V, _Tp, _Abi, true>
   {
+    using _Impl = typename _SimdTraits<_Tp, _Abi>::_SimdImpl;
+
     _GLIBCXX_SIMD_INTRINSIC const _V& __derived() const
     { return *static_cast<const _V*>(this); }
-    template <typename _Tp>
+    template <typename _Up>
       _GLIBCXX_SIMD_INTRINSIC static _GLIBCXX_SIMD_CONSTEXPR _V
-      _S_make_derived(_Tp&& __d)
-      { return {__private_init, static_cast<_Tp&&>(__d)}; }
+      _S_make_derived(_Up&& __d)
+      { return {__private_init, static_cast<_Up&&>(__d)}; }
   public:
-    _GLIBCXX_SIMD_CONSTEXPR friend _V& operator%=(_V& __lhs, const _V& __x)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V&
+    operator%=(_V& __lhs, const _V& __x)
     { return __lhs = __lhs % __x; }
-    _GLIBCXX_SIMD_CONSTEXPR friend _V& operator&=(_V& __lhs, const _V& __x)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V&
+    operator&=(_V& __lhs, const _V& __x)
     { return __lhs = __lhs & __x; }
-    _GLIBCXX_SIMD_CONSTEXPR friend _V& operator|=(_V& __lhs, const _V& __x)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V&
+    operator|=(_V& __lhs, const _V& __x)
     { return __lhs = __lhs | __x; }
-    _GLIBCXX_SIMD_CONSTEXPR friend _V& operator^=(_V& __lhs, const _V& __x)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V&
+    operator^=(_V& __lhs, const _V& __x)
     { return __lhs = __lhs ^ __x; }
-    _GLIBCXX_SIMD_CONSTEXPR friend _V& operator<<=(_V& __lhs, const _V& __x)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V&
+    operator<<=(_V& __lhs, const _V& __x)
     { return __lhs = __lhs << __x; }
-    _GLIBCXX_SIMD_CONSTEXPR friend _V& operator>>=(_V& __lhs, const _V& __x)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V&
+    operator>>=(_V& __lhs, const _V& __x)
     { return __lhs = __lhs >> __x; }
-    _GLIBCXX_SIMD_CONSTEXPR friend _V& operator<<=(_V& __lhs, int __x)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V&
+    operator<<=(_V& __lhs, int __x)
     { return __lhs = __lhs << __x; }
-    _GLIBCXX_SIMD_CONSTEXPR friend _V& operator>>=(_V& __lhs, int __x)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V&
+    operator>>=(_V& __lhs, int __x)
     { return __lhs = __lhs >> __x; }
-    _GLIBCXX_SIMD_CONSTEXPR friend _V operator%(const _V& __x, const _V& __y)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V
+    operator%(const _V& __x, const _V& __y)
     {
       return _SimdIntOperators::_S_make_derived(
         _Impl::_S_modulus(__data(__x), __data(__y)));
     }
-    _GLIBCXX_SIMD_CONSTEXPR friend _V operator&(const _V& __x, const _V& __y)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V
+    operator&(const _V& __x, const _V& __y)
     {
       return _SimdIntOperators::_S_make_derived(
         _Impl::_S_bit_and(__data(__x), __data(__y)));
     }
-    _GLIBCXX_SIMD_CONSTEXPR friend _V operator|(const _V& __x, const _V& __y)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V
+    operator|(const _V& __x, const _V& __y)
     {
       return _SimdIntOperators::_S_make_derived(
         _Impl::_S_bit_or(__data(__x), __data(__y)));
     }
-    _GLIBCXX_SIMD_CONSTEXPR friend _V operator^(const _V& __x, const _V& __y)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V
+    operator^(const _V& __x, const _V& __y)
     {
       return _SimdIntOperators::_S_make_derived(
         _Impl::_S_bit_xor(__data(__x), __data(__y)));
     }
-    _GLIBCXX_SIMD_CONSTEXPR friend _V operator<<(const _V& __x, const _V& __y)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V
+    operator<<(const _V& __x, const _V& __y)
     {
       return _SimdIntOperators::_S_make_derived(
         _Impl::_S_bit_shift_left(__data(__x), __data(__y)));
     }
-    _GLIBCXX_SIMD_CONSTEXPR friend _V operator>>(const _V& __x, const _V& __y)
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V
+    operator>>(const _V& __x, const _V& __y)
     {
       return _SimdIntOperators::_S_make_derived(
         _Impl::_S_bit_shift_right(__data(__x), __data(__y)));
     }
-    template <typename _VV = _V>
-      _GLIBCXX_SIMD_CONSTEXPR friend _V operator<<(const _V& __x, int __y)
-      {
-        using _Tp = typename _VV::value_type;
-        if (__y < 0)
-          __invoke_ub("The behavior is undefined if the right operand of a "
-                      "shift operation is negative. [expr.shift]\nA shift by "
-                      "%d was requested",
-                      __y);
-        if (size_t(__y) >= sizeof(declval<_Tp>() << __y) * __CHAR_BIT__)
-          __invoke_ub(
-            "The behavior is undefined if the right operand of a "
-            "shift operation is greater than or equal to the width of the "
-            "promoted left operand. [expr.shift]\nA shift by %d was requested",
-            __y);
-        return _SimdIntOperators::_S_make_derived(
-          _Impl::_S_bit_shift_left(__data(__x), __y));
-      }
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V
+    operator<<(const _V& __x, int __y)
+    {
+      if (__y < 0)
+        __invoke_ub("The behavior is undefined if the right operand of a "
+                    "shift operation is negative. [expr.shift]\nA shift by "
+                    "%d was requested",
+                    __y);
+      if (size_t(__y) >= sizeof(declval<_Tp>() << __y) * __CHAR_BIT__)
+        __invoke_ub(
+          "The behavior is undefined if the right operand of a "
+          "shift operation is greater than or equal to the width of the "
+          "promoted left operand. [expr.shift]\nA shift by %d was requested",
+          __y);
+      return _SimdIntOperators::_S_make_derived(
+        _Impl::_S_bit_shift_left(__data(__x), __y));
+    }
-    template <typename _VV = _V>
-      _GLIBCXX_SIMD_CONSTEXPR friend _V operator>>(const _V& __x, int __y)
-      {
-        using _Tp = typename _VV::value_type;
-        if (__y < 0)
-          __invoke_ub(
-            "The behavior is undefined if the right operand of a shift "
-            "operation is negative. [expr.shift]\nA shift by %d was requested",
-            __y);
-        if (size_t(__y) >= sizeof(declval<_Tp>() << __y) * __CHAR_BIT__)
-          __invoke_ub(
-            "The behavior is undefined if the right operand of a shift "
-            "operation is greater than or equal to the width of the promoted "
-            "left operand. [expr.shift]\nA shift by %d was requested",
-            __y);
-        return _SimdIntOperators::_S_make_derived(
-          _Impl::_S_bit_shift_right(__data(__x), __y));
-      }
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend
+    _V
+    operator>>(const _V& __x, int __y)
+    {
+      if (__y < 0)
+        __invoke_ub(
+          "The behavior is undefined if the right operand of a shift "
+          "operation is negative. [expr.shift]\nA shift by %d was requested",
+          __y);
+      if (size_t(__y) >= sizeof(declval<_Tp>() << __y) * __CHAR_BIT__)
+        __invoke_ub(
+          "The behavior is undefined if the right operand of a shift "
+          "operation is greater than or equal to the width of the promoted "
+          "left operand. [expr.shift]\nA shift by %d was requested",
+          __y);
+      return _SimdIntOperators::_S_make_derived(
+        _Impl::_S_bit_shift_right(__data(__x), __y));
+    }
     // unary operators (for integral _Tp)
-    _GLIBCXX_SIMD_CONSTEXPR _V operator~() const
+    _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR
+    _V
+    operator~() const
     { return {__private_init, _Impl::_S_complement(__derived()._M_data)}; }
   };
@@ -4924,7 +5054,7 @@ template <typename _V, typename _Impl>
 // simd {{{
 template <typename _Tp, typename _Abi>
   class simd : public _SimdIntOperators<
-                 simd<_Tp, _Abi>, typename _SimdTraits<_Tp, _Abi>::_SimdImpl,
+                 simd<_Tp, _Abi>, _Tp, _Abi,
                  conjunction<is_integral<_Tp>,
                              typename _SimdTraits<_Tp, _Abi>::_IsValid>::value>,
                public _SimdTraits<_Tp, _Abi>::_SimdBase
@@ -4938,7 +5068,7 @@ template <typename _Tp, typename _Abi>
   public:
     using _Impl = typename _Traits::_SimdImpl;
     friend _Impl;
-    friend _SimdIntOperators<simd, _Impl, true>;
+    friend _SimdIntOperators<simd, _Tp, _Abi, true>;
     using value_type = _Tp;
     using reference = _SmartReference<_MemberType, _Impl, value_type>;
diff --git a/libstdc++-v3/include/experimental/bits/simd_builtin.h b/libstdc++-v3/include/experimental/bits/simd_builtin.h
index 8cd338e313f..55fea77d4ab 100644
--- a/libstdc++-v3/include/experimental/bits/simd_builtin.h
+++ b/libstdc++-v3/include/experimental/bits/simd_builtin.h
@@ -50,7 +50,8 @@ template <typename _V, typename = _VectorTraits<_V>>
 //}}}
 // __vector_permute<Indices...>{{{
 // Index == -1 requests zeroing of the output element
-template <int... _Indices, typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+template <int... _Indices, typename _Tp, typename _TVT = _VectorTraits<_Tp>,
+          typename = __detail::__odr_helper>
   _Tp
   __vector_permute(_Tp __x)
   {
@@ -62,7 +63,8 @@ template <int... _Indices, typename _Tp, typename _TVT = _VectorTraits<_Tp>>
 // }}}
 // __vector_shuffle<Indices...>{{{
 // Index == -1 requests zeroing of the output element
-template <int... _Indices, typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+template <int... _Indices, typename _Tp, typename _TVT = _VectorTraits<_Tp>,
+          typename = __detail::__odr_helper>
   _Tp
   __vector_shuffle(_Tp __x, _Tp __y)
   {
@@ -820,10 +822,12 @@ template <typename _Tp, typename _Mp, typename _Abi, size_t _Np>
     // _SimdBase / base class for simd, providing extra conversions {{{
     struct _SimdBase2
     {
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       explicit operator __intrinsic_type_t<_Tp, _Np>() const
       { return __to_intrin(static_cast<const simd<_Tp, _Abi>*>(this)->_M_data); }
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       explicit operator __vector_type_t<_Tp, _Np>() const
       {
         return static_cast<const simd<_Tp, _Abi>*>(this)->_M_data.__builtin();
@@ -832,6 +836,7 @@ template <typename _Tp, typename _Mp, typename _Abi, size_t _Np>
     struct _SimdBase1
     {
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       explicit operator __intrinsic_type_t<_Tp, _Np>() const
       { return __data(*static_cast<const simd<_Tp, _Abi>*>(this)); }
     };
@@ -844,11 +849,13 @@ template <typename _Tp, typename _Mp, typename _Abi, size_t _Np>
     // _MaskBase {{{
     struct _MaskBase2
     {
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       explicit operator __intrinsic_type_t<_Tp, _Np>() const
       {
         return static_cast<const simd_mask<_Tp, _Abi>*>(this)
           ->_M_data.__intrin();
       }
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       explicit operator __vector_type_t<_Tp, _Np>() const
       {
         return static_cast<const simd_mask<_Tp, _Abi>*>(this)->_M_data._M_data;
@@ -857,6 +864,7 @@ template <typename _Tp, typename _Mp, typename _Abi, size_t _Np>
     struct _MaskBase1
     {
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       explicit operator __intrinsic_type_t<_Tp, _Np>() const
       { return __data(*static_cast<const simd_mask<_Tp, _Abi>*>(this)); }
     };
@@ -874,7 +882,9 @@ template <typename _Tp, typename _Mp, typename _Abi, size_t _Np>
       _Up _M_data;
     public:
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       _MaskCastType(_Up __x) : _M_data(__x) {}
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       operator _MaskMember() const { return _M_data; }
     };
@@ -887,7 +897,9 @@ template <typename _Tp, typename _Mp, typename _Abi, size_t _Np>
       _SimdMember _M_data;
     public:
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       _SimdCastType1(_Ap __a) : _M_data(__vector_bitcast<_Tp>(__a)) {}
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       operator _SimdMember() const { return _M_data; }
     };
@@ -898,8 +910,11 @@ template <typename _Tp, typename _Mp, typename _Abi, size_t _Np>
       _SimdMember _M_data;
     public:
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       _SimdCastType2(_Ap __a) : _M_data(__vector_bitcast<_Tp>(__a)) {}
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       _SimdCastType2(_Bp __b) : _M_data(__b) {}
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       operator _SimdMember() const { return _M_data; }
     };
@@ -913,14 +928,14 @@ template <typename _Tp, typename _Mp, typename _Abi, size_t _Np>
 struct _CommonImplX86;
 struct _CommonImplNeon;
 struct _CommonImplBuiltin;
-template <typename _Abi> struct _SimdImplBuiltin;
-template <typename _Abi> struct _MaskImplBuiltin;
-template <typename _Abi> struct _SimdImplX86;
-template <typename _Abi> struct _MaskImplX86;
-template <typename _Abi> struct _SimdImplNeon;
-template <typename _Abi> struct _MaskImplNeon;
-template <typename _Abi> struct _SimdImplPpc;
-template <typename _Abi> struct _MaskImplPpc;
+template <typename _Abi, typename = __detail::__odr_helper> struct _SimdImplBuiltin;
+template <typename _Abi, typename = __detail::__odr_helper> struct _MaskImplBuiltin;
+template <typename _Abi, typename = __detail::__odr_helper> struct _SimdImplX86;
+template <typename _Abi, typename = __detail::__odr_helper> struct _MaskImplX86;
+template <typename _Abi, typename = __detail::__odr_helper> struct _SimdImplNeon;
+template <typename _Abi, typename = __detail::__odr_helper> struct _MaskImplNeon;
+template <typename _Abi, typename = __detail::__odr_helper> struct _SimdImplPpc;
+template <typename _Abi, typename = __detail::__odr_helper> struct _MaskImplPpc;
 // simd_abi::_VecBuiltin {{{
 template <int _UsedBytes>
@@ -1369,7 +1384,7 @@ struct _CommonImplBuiltin
 // }}}
 // _SimdImplBuiltin {{{1
-template <typename _Abi>
+template <typename _Abi, typename>
   struct _SimdImplBuiltin
   {
     // member types {{{2
@@ -2618,7 +2633,7 @@ struct _MaskImplBuiltinMixin
 };
 // _MaskImplBuiltin {{{1
-template <typename _Abi>
+template <typename _Abi, typename>
   struct _MaskImplBuiltin : _MaskImplBuiltinMixin
   {
     using _MaskImplBuiltinMixin::_S_to_bits;
@@ -2953,4 +2968,4 @@ _GLIBCXX_SIMD_END_NAMESPACE
 #endif // __cplusplus >= 201703L
 #endif // _GLIBCXX_EXPERIMENTAL_SIMD_ABIS_H_
-// vim: foldmethod=marker foldmarker={{{,}}} sw=2 noet ts=8 sts=2 tw=80
+// vim: foldmethod=marker foldmarker={{{,}}} sw=2 noet ts=8 sts=2 tw=100
diff --git a/libstdc++-v3/include/experimental/bits/simd_detail.h b/libstdc++-v3/include/experimental/bits/simd_detail.h
index 1e75812d098..78ad33f74e4 100644
--- a/libstdc++-v3/include/experimental/bits/simd_detail.h
+++ b/libstdc++-v3/include/experimental/bits/simd_detail.h
@@ -172,6 +172,46 @@
 #else
 #define _GLIBCXX_SIMD_HAVE_AVX512BW 0
 #endif
+#ifdef __AVX512BITALG__
+#define _GLIBCXX_SIMD_HAVE_AVX512BITALG 1
+#else
+#define _GLIBCXX_SIMD_HAVE_AVX512BITALG 0
+#endif
+#ifdef __AVX512VBMI2__
+#define _GLIBCXX_SIMD_HAVE_AVX512VBMI2 1
+#else
+#define _GLIBCXX_SIMD_HAVE_AVX512VBMI2 0
+#endif
+#ifdef __AVX512VBMI__
+#define _GLIBCXX_SIMD_HAVE_AVX512VBMI 1
+#else
+#define _GLIBCXX_SIMD_HAVE_AVX512VBMI 0
+#endif
+#ifdef __AVX512IFMA__
+#define _GLIBCXX_SIMD_HAVE_AVX512IFMA 1
+#else
+#define _GLIBCXX_SIMD_HAVE_AVX512IFMA 0
+#endif
+#ifdef __AVX512CD__
+#define _GLIBCXX_SIMD_HAVE_AVX512CD 1
+#else
+#define _GLIBCXX_SIMD_HAVE_AVX512CD 0
+#endif
+#ifdef __AVX512VNNI__
+#define _GLIBCXX_SIMD_HAVE_AVX512VNNI 1
+#else
+#define _GLIBCXX_SIMD_HAVE_AVX512VNNI 0
+#endif
+#ifdef __AVX512VPOPCNTDQ__
+#define _GLIBCXX_SIMD_HAVE_AVX512VPOPCNTDQ 1
+#else
+#define _GLIBCXX_SIMD_HAVE_AVX512VPOPCNTDQ 0
+#endif
+#ifdef __AVX512VP2INTERSECT__
+#define _GLIBCXX_SIMD_HAVE_AVX512VP2INTERSECT 1
+#else
+#define _GLIBCXX_SIMD_HAVE_AVX512VP2INTERSECT 0
+#endif
 #if _GLIBCXX_SIMD_HAVE_SSE
 #define _GLIBCXX_SIMD_HAVE_SSE_ABI 1
diff --git a/libstdc++-v3/include/experimental/bits/simd_fixed_size.h b/libstdc++-v3/include/experimental/bits/simd_fixed_size.h
index dc2fb90b9b2..5a742ed52e1 100644
--- a/libstdc++-v3/include/experimental/bits/simd_fixed_size.h
+++ b/libstdc++-v3/include/experimental/bits/simd_fixed_size.h
@@ -201,6 +201,7 @@ template <typename _Tp, typename _Abi, size_t _Offset>
   };
 template <size_t _Offset, typename _Tp, typename _Abi, typename... _As>
+  _GLIBCXX_SIMD_INTRINSIC
   __tuple_element_meta<_Tp, _Abi, _Offset>
   __make_meta(const _SimdTuple<_Tp, _Abi, _As...>&)
   { return {}; }
@@ -230,11 +231,13 @@ template <size_t _O0, size_t _O1, typename _Base>
   struct _WithOffset<_O0, _WithOffset<_O1, _Base>> {};
 template <size_t _Offset, typename _Tp>
+  _GLIBCXX_SIMD_INTRINSIC
   decltype(auto)
   __add_offset(_Tp& __base)
   { return static_cast<_WithOffset<_Offset, __remove_cvref_t<_Tp>>&>(__base); }
 template <size_t _Offset, typename _Tp>
+  _GLIBCXX_SIMD_INTRINSIC
   decltype(auto)
   __add_offset(const _Tp& __base)
   {
@@ -243,6 +246,7 @@ template <size_t _Offset, typename _Tp>
   }
 template <size_t _Offset, size_t _ExistingOffset, typename _Tp>
+  _GLIBCXX_SIMD_INTRINSIC
   decltype(auto)
   __add_offset(_WithOffset<_ExistingOffset, _Tp>& __base)
   {
@@ -251,6 +255,7 @@ template <size_t _Offset, size_t _ExistingOffset, typename _Tp>
   }
 template <size_t _Offset, size_t _ExistingOffset, typename _Tp>
+  _GLIBCXX_SIMD_INTRINSIC
   decltype(auto)
   __add_offset(const _WithOffset<_ExistingOffset, _Tp>& __base)
   {
@@ -586,6 +591,7 @@ template <typename _Tp, typename _Abi0, typename... _Abis>
       return second[integral_constant<_Up, _I - simd_size_v<_Tp, _Abi0>>()];
     }
+    _GLIBCXX_SIMD_INTRINSIC
     _Tp operator[](size_t __i) const noexcept
     {
       if constexpr (_S_tuple_size == 1)
@@ -608,6 +614,7 @@ template <typename _Tp, typename _Abi0, typename... _Abis>
       }
     }
+    _GLIBCXX_SIMD_INTRINSIC
     void _M_set(size_t __i, _Tp __val) noexcept
     {
       if constexpr (_S_tuple_size == 1)
@@ -627,6 +634,7 @@ template <typename _Tp, typename _Abi0, typename... _Abis>
   private:
     // _M_subscript_read/_write {{{
+    _GLIBCXX_SIMD_INTRINSIC
     _Tp _M_subscript_read([[maybe_unused]] size_t __i) const noexcept
     {
       if constexpr (__is_vectorizable_v<_FirstType>)
@@ -635,6 +643,7 @@ template <typename _Tp, typename _Abi0, typename... _Abis>
       return first[__i];
     }
+    _GLIBCXX_SIMD_INTRINSIC
     void _M_subscript_write([[maybe_unused]] size_t __i, _Tp __y) noexcept
     {
       if constexpr (__is_vectorizable_v<_FirstType>)
@@ -1033,9 +1042,11 @@ template <typename _Tp, bool = is_arithmetic_v<__remove_cvref_t<_Tp>>>
     _Tp _M_data;
     using _TT = __remove_cvref_t<_Tp>;
+    _GLIBCXX_SIMD_INTRINSIC
     operator _TT()
     { return _M_data; }
+    _GLIBCXX_SIMD_INTRINSIC
     operator _TT&()
     {
       static_assert(is_lvalue_reference<_Tp>::value, "");
@@ -1043,6 +1054,7 @@ template <typename _Tp, bool = is_arithmetic_v<__remove_cvref_t<_Tp>>>
       return _M_data;
     }
+    _GLIBCXX_SIMD_INTRINSIC
     operator _TT*()
     {
       static_assert(is_lvalue_reference<_Tp>::value, "");
@@ -1050,13 +1062,16 @@ template <typename _Tp, bool = is_arithmetic_v<__remove_cvref_t<_Tp>>>
       return &_M_data;
     }
-    constexpr inline __autocvt_to_simd(_Tp dd) : _M_data(dd) {}
+    _GLIBCXX_SIMD_INTRINSIC
+    constexpr __autocvt_to_simd(_Tp dd) : _M_data(dd) {}
     template <typename _Abi>
+      _GLIBCXX_SIMD_INTRINSIC
       operator simd<typename _TT::value_type, _Abi>()
       { return {__private_init, _M_data}; }
     template <typename _Abi>
+      _GLIBCXX_SIMD_INTRINSIC
       operator simd<typename _TT::value_type, _Abi>&()
       {
         return *reinterpret_cast<simd<typename _TT::value_type, _Abi>*>(
@@ -1064,6 +1079,7 @@ template <typename _Tp, bool = is_arithmetic_v<__remove_cvref_t<_Tp>>>
       }
     template <typename _Abi>
+      _GLIBCXX_SIMD_INTRINSIC
       operator simd<typename _TT::value_type, _Abi>*()
       {
         return reinterpret_cast<simd<typename _TT::value_type, _Abi>*>(
@@ -1081,14 +1097,18 @@ template <typename _Tp>
     _Tp _M_data;
     fixed_size_simd<_TT, 1> _M_fd;
-    constexpr inline __autocvt_to_simd(_Tp dd) : _M_data(dd), _M_fd(_M_data) {}
+    _GLIBCXX_SIMD_INTRINSIC
+    constexpr __autocvt_to_simd(_Tp dd) : _M_data(dd), _M_fd(_M_data) {}
+    _GLIBCXX_SIMD_INTRINSIC
     ~__autocvt_to_simd()
     { _M_data = __data(_M_fd).first; }
+    _GLIBCXX_SIMD_INTRINSIC
     operator fixed_size_simd<_TT, 1>()
     { return _M_fd; }
+    _GLIBCXX_SIMD_INTRINSIC
     operator fixed_size_simd<_TT, 1> &()
     {
       static_assert(is_lvalue_reference<_Tp>::value, "");
@@ -1096,6 +1116,7 @@ template <typename _Tp>
       return _M_fd;
     }
+    _GLIBCXX_SIMD_INTRINSIC
     operator fixed_size_simd<_TT, 1> *()
     {
       static_assert(is_lvalue_reference<_Tp>::value, "");
@@ -1107,8 +1128,8 @@ template <typename _Tp>
 // }}}
 struct _CommonImplFixedSize;
-template <int _Np> struct _SimdImplFixedSize;
-template <int _Np> struct _MaskImplFixedSize;
+template <int _Np, typename = __detail::__odr_helper> struct _SimdImplFixedSize;
+template <int _Np, typename = __detail::__odr_helper> struct _MaskImplFixedSize;
 // simd_abi::_Fixed {{{
 template <int _Np>
   struct simd_abi::_Fixed
@@ -1172,12 +1193,15 @@ template <int _Np>
     {
       // The following ensures, function arguments are passed via the stack.
       // This is important for ABI compatibility across TU boundaries
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       _SimdBase(const _SimdBase&) {}
       _SimdBase() = default;
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       explicit operator const _SimdMember &() const
       { return static_cast<const simd<_Tp, _Fixed>*>(this)->_M_data; }
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       explicit operator array<_Tp, _Np>() const
       {
         array<_Tp, _Np> __r;
@@ -1198,8 +1222,11 @@ template <int _Np>
     // _SimdCastType {{{
     struct _SimdCastType
     {
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       _SimdCastType(const array<_Tp, _Np>&);
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       _SimdCastType(const _SimdMember& dd) : _M_data(dd) {}
+      _GLIBCXX_SIMD_ALWAYS_INLINE
       explicit operator const _SimdMember &() const { return _M_data; }
     private:
@@ -1237,7 +1264,7 @@ struct _CommonImplFixedSize
 // _SimdImplFixedSize {{{1
 // fixed_size should not inherit from _SimdMathFallback in order for
 // specializations in the used _SimdTuple Abis to get used
-template <int _Np>
+template <int _Np, typename>
   struct _SimdImplFixedSize
   {
     // member types {{{2
@@ -1794,7 +1821,7 @@ template <int _Np>
   };
 // _MaskImplFixedSize {{{1
-template <int _Np>
+template <int _Np, typename>
   struct _MaskImplFixedSize
   {
     static_assert(
diff --git a/libstdc++-v3/include/experimental/bits/simd_math.h b/libstdc++-v3/include/experimental/bits/simd_math.h
index 61af9fc67af..01061a75a5e 100644
--- a/libstdc++-v3/include/experimental/bits/simd_math.h
+++ b/libstdc++-v3/include/experimental/bits/simd_math.h
@@ -60,6 +60,7 @@ template <typename _DoubleR, typename _Tp, typename _Abi>
 template <typename _Tp, typename _Abi, typename..., \
           typename _R = _Math_return_type_t< \
             decltype(std::__name(declval<double>())), _Tp, _Abi>> \
+  _GLIBCXX_SIMD_ALWAYS_INLINE \
   enable_if_t<is_floating_point_v<_Tp>, _R> \
   __name(simd<_Tp, _Abi> __x) \
   { return {__private_init, _Abi::_SimdImpl::_S_##__name(__data(__x))}; }
@@ -125,6 +126,7 @@ template < \
   typename _Arg2 = _Extra_argument_type<__arg2, _Tp, _Abi>, \
   typename _R = _Math_return_type_t< \
     decltype(std::__name(declval<double>(), _Arg2::declval())), _Tp, _Abi>> \
+  _GLIBCXX_SIMD_ALWAYS_INLINE \
   enable_if_t<is_floating_point_v<_Tp>, _R> \
   __name(const simd<_Tp, _Abi>& __x, const typename _Arg2::type& __y) \
   { \
@@ -155,6 +157,7 @@ template <typename _Tp, typename _Abi, typename..., \
               decltype(std::__name(declval<double>(), _Arg2::declval(), \
                                    _Arg3::declval())), \
               _Tp, _Abi>> \
+  _GLIBCXX_SIMD_ALWAYS_INLINE \
   enable_if_t<is_floating_point_v<_Tp>, _R> \
   __name(const simd<_Tp, _Abi>& __x, const typename _Arg2::type& __y, \
          const typename _Arg3::type& __z) \
@@ -399,6 +402,7 @@ template <typename _Abi>
 // }}}
 // __extract_exponent_as_int {{{
 template <typename _Tp, typename _Abi>
+  _GLIBCXX_SIMD_INTRINSIC
   rebind_simd_t<int, simd<_Tp, _Abi>>
   __extract_exponent_as_int(const simd<_Tp, _Abi>& __v)
   {
@@ -421,7 +425,8 @@ template <typename ImplFun, typename FallbackFun, typename... _Args>
   -> decltype(__impl_fun(static_cast<_Args&&>(__args)...))
   { return __impl_fun(static_cast<_Args&&>(__args)...); }
-template <typename ImplFun, typename FallbackFun, typename... _Args>
+template <typename ImplFun, typename FallbackFun, typename... _Args,
+          typename = __detail::__odr_helper>
   inline auto
   __impl_or_fallback_dispatch(float, ImplFun&&, FallbackFun&& __fallback_fun,
                               _Args&&... __args)
@@ -457,7 +462,7 @@ _GLIBCXX_SIMD_MATH_CALL2_(atan2, _Tp)
  * Fix sign.
  */
 // cos{{{
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
   cos(const simd<_Tp, _Abi>& __x)
   {
@@ -503,7 +508,7 @@ template <typename _Tp>
 //}}}
 // sin{{{
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
   sin(const simd<_Tp, _Abi>& __x)
   {
@@ -565,6 +570,7 @@ _GLIBCXX_SIMD_MATH_CALL_(expm1)
 // frexp {{{
 #if _GLIBCXX_SIMD_X86INTRIN
 template <typename _Tp, size_t _Np>
+  _GLIBCXX_SIMD_INTRINSIC
   _SimdWrapper<_Tp, _Np>
   __getexp(_SimdWrapper<_Tp, _Np> __x)
   {
@@ -593,6 +599,7 @@ template <typename _Tp, size_t _Np>
   }
 template <typename _Tp, size_t _Np>
+  _GLIBCXX_SIMD_INTRINSIC
   _SimdWrapper<_Tp, _Np>
   __getmant_avx512(_SimdWrapper<_Tp, _Np> __x)
   {
@@ -633,7 +640,7 @@ template <typename _Tp, size_t _Np>
  * The return value will be in the range [0.5, 1.0[
  * The @p __e value will be an integer defining the power-of-two exponent
  */
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
   frexp(const simd<_Tp, _Abi>& __x, _Samesize<int, simd<_Tp, _Abi>>* __exp)
   {
@@ -738,7 +745,7 @@ _GLIBCXX_SIMD_MATH_CALL_(log2)
 //}}}
 // logb{{{
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point<_Tp>::value, simd<_Tp, _Abi>>
   logb(const simd<_Tp, _Abi>& __x)
   {
@@ -813,7 +820,7 @@ template <typename _Tp, typename _Abi>
   }
 //}}}
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
   modf(const simd<_Tp, _Abi>& __x, simd<_Tp, _Abi>* __iptr)
   {
@@ -847,6 +854,7 @@ _GLIBCXX_SIMD_MATH_CALL_(fabs)
 // [parallel.simd.math] only asks for is_floating_point_v<_Tp> and forgot to
 // allow signed integral _Tp
 template <typename _Tp, typename _Abi>
+  _GLIBCXX_SIMD_ALWAYS_INLINE
   enable_if_t<!is_floating_point_v<_Tp> && is_signed_v<_Tp>, simd<_Tp, _Abi>>
   abs(const simd<_Tp, _Abi>& __x)
   { return {__private_init, _Abi::_SimdImpl::_S_abs(__data(__x))}; }
@@ -929,7 +937,7 @@ template <typename _R, typename _ToApply, typename _Tp, typename... _Tps>
                                 __data(__args)...)};
   }
-template <typename _VV>
+template <typename _VV, typename = __detail::__odr_helper>
   __remove_cvref_t<_VV>
   __hypot(_VV __x, _VV __y)
   {
@@ -1067,7 +1075,7 @@ template <typename _Tp, typename _Abi>
 _GLIBCXX_SIMD_CVTING2(hypot)
-  template <typename _VV>
+  template <typename _VV, typename = __detail::__odr_helper>
     __remove_cvref_t<_VV>
     __hypot(_VV __x, _VV __y, _VV __z)
     {
@@ -1268,7 +1276,7 @@ _GLIBCXX_SIMD_MATH_CALL2_(fmod, _Tp)
 _GLIBCXX_SIMD_MATH_CALL2_(remainder, _Tp)
 _GLIBCXX_SIMD_MATH_CALL3_(remquo, _Tp, int*)
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
   copysign(const simd<_Tp, _Abi>& __x, const simd<_Tp, _Abi>& __y)
   {
@@ -1306,12 +1314,14 @@ _GLIBCXX_SIMD_MATH_CALL_(isfinite)
 // `int isinf(double)`.
 template <typename _Tp, typename _Abi, typename...,
           typename _R = _Math_return_type_t<bool, _Tp, _Abi>>
+  _GLIBCXX_SIMD_ALWAYS_INLINE
   enable_if_t<is_floating_point_v<_Tp>, _R>
   isinf(simd<_Tp, _Abi> __x)
   { return {__private_init, _Abi::_SimdImpl::_S_isinf(__data(__x))}; }
 template <typename _Tp, typename _Abi, typename...,
           typename _R = _Math_return_type_t<bool, _Tp, _Abi>>
+  _GLIBCXX_SIMD_ALWAYS_INLINE
   enable_if_t<is_floating_point_v<_Tp>, _R>
   isnan(simd<_Tp, _Abi> __x)
   { return {__private_init, _Abi::_SimdImpl::_S_isnan(__data(__x))}; }
@@ -1319,6 +1329,7 @@ template <typename _Tp, typename _Abi, typename...,
 _GLIBCXX_SIMD_MATH_CALL_(isnormal)
 template <typename..., typename _Tp, typename _Abi>
+  _GLIBCXX_SIMD_ALWAYS_INLINE
   simd_mask<_Tp, _Abi>
   signbit(simd<_Tp, _Abi> __x)
   {
@@ -1366,7 +1377,7 @@ simd_div_t<__llongv<_Abi>> div(__llongv<_Abi> numer,
 */
 // special math {{{
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
   assoc_laguerre(const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __n,
                  const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __m,
@@ -1377,7 +1388,7 @@ template <typename _Tp, typename _Abi>
     });
   }
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
   assoc_legendre(const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __n,
                  const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __m,
@@ -1401,7 +1412,7 @@ _GLIBCXX_SIMD_MATH_CALL2_(ellint_2, _Tp)
 _GLIBCXX_SIMD_MATH_CALL3_(ellint_3, _Tp, _Tp)
 _GLIBCXX_SIMD_MATH_CALL_(expint)
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
   hermite(const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __n,
           const simd<_Tp, _Abi>& __x)
@@ -1410,7 +1421,7 @@ template <typename _Tp, typename _Abi>
       [&](auto __i) { return std::hermite(__n[__i], __x[__i]); });
   }
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
   laguerre(const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __n,
            const simd<_Tp, _Abi>& __x)
@@ -1419,7 +1430,7 @@ template <typename _Tp, typename _Abi>
       [&](auto __i) { return std::laguerre(__n[__i], __x[__i]); });
   }
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
   legendre(const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __n,
            const simd<_Tp, _Abi>& __x)
@@ -1430,7 +1441,7 @@ template <typename _Tp, typename _Abi>
 _GLIBCXX_SIMD_MATH_CALL_(riemann_zeta)
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
   sph_bessel(const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __n,
              const simd<_Tp, _Abi>& __x)
@@ -1439,7 +1450,7 @@ template <typename _Tp, typename _Abi>
       [&](auto __i) { return std::sph_bessel(__n[__i], __x[__i]); });
   }
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
   sph_legendre(const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __l,
                const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __m,
@@ -1450,7 +1461,7 @@ template <typename _Tp, typename _Abi>
     });
   }
-template <typename _Tp, typename _Abi>
+template <typename _Tp, typename _Abi, typename = __detail::__odr_helper>
   enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
   sph_neumann(const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __n,
               const simd<_Tp, _Abi>& __x)
diff --git a/libstdc++-v3/include/experimental/bits/simd_neon.h b/libstdc++-v3/include/experimental/bits/simd_neon.h
index 7f472e88649..bbd26835d9c 100644
--- a/libstdc++-v3/include/experimental/bits/simd_neon.h
+++ b/libstdc++-v3/include/experimental/bits/simd_neon.h
@@ -44,7 +44,7 @@ struct _CommonImplNeon : _CommonImplBuiltin
 // }}}
 // _SimdImplNeon {{{
-template <typename _Abi>
+template <typename _Abi, typename>
   struct _SimdImplNeon : _SimdImplBuiltin<_Abi>
   {
     using _Base = _SimdImplBuiltin<_Abi>;
@@ -390,7 +390,7 @@ struct _MaskImplNeonMixin
 // }}}
 // _MaskImplNeon {{{
-template <typename _Abi>
+template <typename _Abi, typename>
   struct _MaskImplNeon : _MaskImplNeonMixin, _MaskImplBuiltin<_Abi>
   {
     using _MaskImplBuiltinMixin::_S_to_maskvector;
diff --git a/libstdc++-v3/include/experimental/bits/simd_ppc.h b/libstdc++-v3/include/experimental/bits/simd_ppc.h
index ef52d129a85..4143bafa80e 100644
--- a/libstdc++-v3/include/experimental/bits/simd_ppc.h
+++ b/libstdc++-v3/include/experimental/bits/simd_ppc.h
@@ -35,7 +35,7 @@ _GLIBCXX_SIMD_BEGIN_NAMESPACE
 // _SimdImplPpc {{{
-template <typename _Abi>
+template <typename _Abi, typename>
   struct _SimdImplPpc : _SimdImplBuiltin<_Abi>
   {
     using _Base = _SimdImplBuiltin<_Abi>;
@@ -117,7 +117,7 @@ template <typename _Abi>
 // }}}
 // _MaskImplPpc {{{
-template <typename _Abi>
+template <typename _Abi, typename>
   struct _MaskImplPpc : _MaskImplBuiltin<_Abi>
   {
     using _Base = _MaskImplBuiltin<_Abi>;
diff --git a/libstdc++-v3/include/experimental/bits/simd_scalar.h b/libstdc++-v3/include/experimental/bits/simd_scalar.h
index 48e13f6c719..b23011ca6c9 100644
--- a/libstdc++-v3/include/experimental/bits/simd_scalar.h
+++ b/libstdc++-v3/include/experimental/bits/simd_scalar.h
@@ -155,7 +155,8 @@ struct _SimdImplScalar
   // _S_masked_load {{{2
   template <typename _Tp, typename _Up>
-    static inline _Tp _S_masked_load(_Tp __merge, bool __k,
+    _GLIBCXX_SIMD_INTRINSIC
+    static _Tp _S_masked_load(_Tp __merge, bool __k,
                                      const _Up* __mem) noexcept
     {
       if (__k)
@@ -165,83 +166,97 @@ struct _SimdImplScalar
   //
_S_store {{{2 template <typename _Tp, typename _Up> - static inline void _S_store(_Tp __v, _Up* __mem, _TypeTag<_Tp>) noexcept + _GLIBCXX_SIMD_INTRINSIC + static void _S_store(_Tp __v, _Up* __mem, _TypeTag<_Tp>) noexcept { __mem[0] = static_cast<_Up>(__v); } // _S_masked_store {{{2 template <typename _Tp, typename _Up> - static inline void _S_masked_store(const _Tp __v, _Up* __mem, + _GLIBCXX_SIMD_INTRINSIC + static void _S_masked_store(const _Tp __v, _Up* __mem, const bool __k) noexcept { if (__k) __mem[0] = __v; } // _S_negate {{{2 template <typename _Tp> - static constexpr inline bool _S_negate(_Tp __x) noexcept + _GLIBCXX_SIMD_INTRINSIC + static constexpr bool _S_negate(_Tp __x) noexcept { return !__x; } // _S_reduce {{{2 template <typename _Tp, typename _BinaryOperation> - static constexpr inline _Tp + _GLIBCXX_SIMD_INTRINSIC + static constexpr _Tp _S_reduce(const simd<_Tp, simd_abi::scalar>& __x, const _BinaryOperation&) { return __x._M_data; } // _S_min, _S_max {{{2 template <typename _Tp> - static constexpr inline _Tp _S_min(const _Tp __a, const _Tp __b) + _GLIBCXX_SIMD_INTRINSIC + static constexpr _Tp _S_min(const _Tp __a, const _Tp __b) { return std::min(__a, __b); } template <typename _Tp> - static constexpr inline _Tp _S_max(const _Tp __a, const _Tp __b) + _GLIBCXX_SIMD_INTRINSIC + static constexpr _Tp _S_max(const _Tp __a, const _Tp __b) { return std::max(__a, __b); } // _S_complement {{{2 template <typename _Tp> - static constexpr inline _Tp _S_complement(_Tp __x) noexcept + _GLIBCXX_SIMD_INTRINSIC + static constexpr _Tp _S_complement(_Tp __x) noexcept { return static_cast<_Tp>(~__x); } // _S_unary_minus {{{2 template <typename _Tp> - static constexpr inline _Tp _S_unary_minus(_Tp __x) noexcept + _GLIBCXX_SIMD_INTRINSIC + static constexpr _Tp _S_unary_minus(_Tp __x) noexcept { return static_cast<_Tp>(-__x); } // arithmetic operators {{{2 template <typename _Tp> - static constexpr inline _Tp _S_plus(_Tp __x, _Tp __y) + _GLIBCXX_SIMD_INTRINSIC + static 
constexpr _Tp _S_plus(_Tp __x, _Tp __y) { return static_cast<_Tp>(__promote_preserving_unsigned(__x) + __promote_preserving_unsigned(__y)); } template <typename _Tp> - static constexpr inline _Tp _S_minus(_Tp __x, _Tp __y) + _GLIBCXX_SIMD_INTRINSIC + static constexpr _Tp _S_minus(_Tp __x, _Tp __y) { return static_cast<_Tp>(__promote_preserving_unsigned(__x) - __promote_preserving_unsigned(__y)); } template <typename _Tp> - static constexpr inline _Tp _S_multiplies(_Tp __x, _Tp __y) + _GLIBCXX_SIMD_INTRINSIC + static constexpr _Tp _S_multiplies(_Tp __x, _Tp __y) { return static_cast<_Tp>(__promote_preserving_unsigned(__x) * __promote_preserving_unsigned(__y)); } template <typename _Tp> - static constexpr inline _Tp _S_divides(_Tp __x, _Tp __y) + _GLIBCXX_SIMD_INTRINSIC + static constexpr _Tp _S_divides(_Tp __x, _Tp __y) { return static_cast<_Tp>(__promote_preserving_unsigned(__x) / __promote_preserving_unsigned(__y)); } template <typename _Tp> - static constexpr inline _Tp _S_modulus(_Tp __x, _Tp __y) + _GLIBCXX_SIMD_INTRINSIC + static constexpr _Tp _S_modulus(_Tp __x, _Tp __y) { return static_cast<_Tp>(__promote_preserving_unsigned(__x) % __promote_preserving_unsigned(__y)); } template <typename _Tp> - static constexpr inline _Tp _S_bit_and(_Tp __x, _Tp __y) + _GLIBCXX_SIMD_INTRINSIC + static constexpr _Tp _S_bit_and(_Tp __x, _Tp __y) { if constexpr (is_floating_point_v<_Tp>) { @@ -254,7 +269,8 @@ struct _SimdImplScalar } template <typename _Tp> - static constexpr inline _Tp _S_bit_or(_Tp __x, _Tp __y) + _GLIBCXX_SIMD_INTRINSIC + static constexpr _Tp _S_bit_or(_Tp __x, _Tp __y) { if constexpr (is_floating_point_v<_Tp>) { @@ -267,7 +283,8 @@ struct _SimdImplScalar } template <typename _Tp> - static constexpr inline _Tp _S_bit_xor(_Tp __x, _Tp __y) + _GLIBCXX_SIMD_INTRINSIC + static constexpr _Tp _S_bit_xor(_Tp __x, _Tp __y) { if constexpr (is_floating_point_v<_Tp>) { @@ -280,11 +297,13 @@ struct _SimdImplScalar } template <typename _Tp> - static constexpr inline _Tp 
_S_bit_shift_left(_Tp __x, int __y) + _GLIBCXX_SIMD_INTRINSIC + static constexpr _Tp _S_bit_shift_left(_Tp __x, int __y) { return static_cast<_Tp>(__promote_preserving_unsigned(__x) << __y); } template <typename _Tp> - static constexpr inline _Tp _S_bit_shift_right(_Tp __x, int __y) + _GLIBCXX_SIMD_INTRINSIC + static constexpr _Tp _S_bit_shift_right(_Tp __x, int __y) { return static_cast<_Tp>(__promote_preserving_unsigned(__x) >> __y); } // math {{{2 @@ -553,11 +572,13 @@ struct _SimdImplScalar // _S_increment & _S_decrement{{{2 template <typename _Tp> - constexpr static inline void _S_increment(_Tp& __x) + _GLIBCXX_SIMD_INTRINSIC + constexpr static void _S_increment(_Tp& __x) { ++__x; } template <typename _Tp> - constexpr static inline void _S_decrement(_Tp& __x) + _GLIBCXX_SIMD_INTRINSIC + constexpr static void _S_decrement(_Tp& __x) { --__x; } @@ -582,6 +603,7 @@ struct _SimdImplScalar // smart_reference access {{{2 template <typename _Tp, typename _Up> + _GLIBCXX_SIMD_INTRINSIC constexpr static void _S_set(_Tp& __v, [[maybe_unused]] int __i, _Up&& __x) noexcept { @@ -677,25 +699,32 @@ struct _MaskImplScalar } // logical and bitwise operators {{{2 + _GLIBCXX_SIMD_INTRINSIC static constexpr bool _S_logical_and(bool __x, bool __y) { return __x && __y; } + _GLIBCXX_SIMD_INTRINSIC static constexpr bool _S_logical_or(bool __x, bool __y) { return __x || __y; } + _GLIBCXX_SIMD_INTRINSIC static constexpr bool _S_bit_not(bool __x) { return !__x; } + _GLIBCXX_SIMD_INTRINSIC static constexpr bool _S_bit_and(bool __x, bool __y) { return __x && __y; } + _GLIBCXX_SIMD_INTRINSIC static constexpr bool _S_bit_or(bool __x, bool __y) { return __x || __y; } + _GLIBCXX_SIMD_INTRINSIC static constexpr bool _S_bit_xor(bool __x, bool __y) { return __x != __y; } // smart_reference access {{{2 + _GLIBCXX_SIMD_INTRINSIC constexpr static void _S_set(bool& __k, [[maybe_unused]] int __i, bool __x) noexcept { diff --git a/libstdc++-v3/include/experimental/bits/simd_x86.h 
b/libstdc++-v3/include/experimental/bits/simd_x86.h index 34633c096b1..e010740b44c 100644 --- a/libstdc++-v3/include/experimental/bits/simd_x86.h +++ b/libstdc++-v3/include/experimental/bits/simd_x86.h @@ -822,7 +822,7 @@ struct _CommonImplX86 : _CommonImplBuiltin // }}} // _SimdImplX86 {{{ -template <typename _Abi> +template <typename _Abi, typename> struct _SimdImplX86 : _SimdImplBuiltin<_Abi> { using _Base = _SimdImplBuiltin<_Abi>; @@ -4241,7 +4241,7 @@ struct _MaskImplX86Mixin // }}} // _MaskImplX86 {{{ -template <typename _Abi> +template <typename _Abi, typename> struct _MaskImplX86 : _MaskImplX86Mixin, _MaskImplBuiltin<_Abi> { using _MaskImplX86Mixin::_S_to_bits;
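The NEON, PPC, and x86 hunks change the primary definitions to `template <typename _Abi, typename>`: a second, unnamed parameter with no default. That only compiles if an earlier declaration (presumably in simd.h, not shown in this excerpt) already gave the parameter its default, likely `__detail::__odr_helper`, because C++ does not allow a default template argument to be repeated on a later declaration of the same template. A minimal sketch of that rule; `OdrTag`, `SimdImpl`, and `MyAbi` are hypothetical names:

```cpp
#include <type_traits>

namespace sketch {
  // Hypothetical stand-in for __detail::__odr_helper.
  struct OdrTag {};

  // Forward declaration: the one place where the default is given.
  template <typename Abi, typename = OdrTag>
  struct SimdImpl;

  // Later definition: repeating "= OdrTag" here would be ill-formed,
  // so the default is omitted (the patch also leaves the parameter unnamed;
  // it is named Tag here only so it can be inspected).
  template <typename Abi, typename Tag>
  struct SimdImpl
  {
    using tag_type = Tag;
  };

  struct MyAbi {};
}

// Callers still write SimdImpl<MyAbi>; the tag comes from the declaration.
static_assert(std::is_same_v<sketch::SimdImpl<sketch::MyAbi>::tag_type,
                             sketch::OdrTag>);
```

Keeping the default in a single declaration presumably lets the ODR tag be changed in one place without touching each backend.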
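The `_SimdImplScalar` arithmetic hunks compute results as `static_cast<_Tp>(__promote_preserving_unsigned(__x) * __promote_preserving_unsigned(__y))`. The helper itself is not part of this excerpt; the sketch below assumes its purpose is to sidestep integral promotion of narrow unsigned types to signed `int`, where `uint16_t(65535) * uint16_t(65535)` overflows `int` (undefined behaviour). The names are hypothetical, not the libstdc++ ones:

```cpp
#include <cstdint>
#include <type_traits>

namespace sketch {
  // Unsigned types narrower than int are widened to unsigned int, so the
  // arithmetic wraps (well-defined) instead of overflowing a signed int.
  // All other types are passed through unchanged.
  template <typename T>
  constexpr auto promote_preserving_unsigned(T x)
  {
    if constexpr (std::is_unsigned_v<T> && sizeof(T) < sizeof(unsigned))
      return static_cast<unsigned>(x);
    else
      return x;
  }

  // Shape of the patched _S_multiplies for a scalar element type T.
  template <typename T>
  constexpr T multiplies(T x, T y)
  {
    return static_cast<T>(promote_preserving_unsigned(x)
                          * promote_preserving_unsigned(y));
  }
}
```

With a 32-bit `int`, `sketch::multiplies<std::uint16_t>(65535, 65535)` evaluates to 1 (the product wraps modulo 2^16), and it is usable in a constant expression; the unpromoted `int` product would be rejected there as an overflow.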
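`_S_bit_and`, `_S_bit_or`, and `_S_bit_xor` each have an `if constexpr (is_floating_point_v<_Tp>)` branch whose body is cut off in this excerpt. A common way to implement bitwise operators on floating-point values is to reinterpret them as equally sized unsigned integers. The sketch below assumes that approach and uses the GCC/Clang builtin `__builtin_bit_cast` (patch 04/11 of this series adopts it too), so it needs GCC 11 or a recent Clang. It is an illustration, not the libstdc++ code:

```cpp
#include <cstdint>
#include <type_traits>

namespace sketch {
  template <typename T>
  constexpr T bit_and(T x, T y)
  {
    if constexpr (std::is_floating_point_v<T>)
      {
        // Only float and double are handled in this sketch.
        static_assert(sizeof(T) == 4 || sizeof(T) == 8);
        using U = std::conditional_t<sizeof(T) == 4, std::uint32_t,
                                     std::uint64_t>;
        // AND the object representations, then reinterpret back as T.
        const U bits = __builtin_bit_cast(U, x) & __builtin_bit_cast(U, y);
        return __builtin_bit_cast(T, bits);
      }
    else
      return static_cast<T>(x & y);
  }
}
```

For example, `-2.5f` and `2.5f` differ only in the sign bit, so AND-ing them clears it and yields `2.5f`.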
* Re: [PATCH 11/11] libstdc++: Fix ODR issues with different -m flags From: Richard Biener @ 2021-06-09 12:22 UTC To: Matthias Kretz; +Cc: GCC Patches, libstdc++ On Tue, Jun 8, 2021 at 2:23 PM Matthias Kretz <m.kretz@gsi.de> wrote: > > > From: Matthias Kretz <kretz@kde.org> > > Explicitly support use of the stdx::simd implementation in situations > where the user links TUs that were compiled with different -m flags. In > general, this is always a (quasi) ODR violation for inline functions > because at least codegen may differ in important ways. However, in the > resulting executable only one (unspecified which one) of them might be > used. For simd we want to support users to compile code multiple times, > with different -m flags and have a runtime dispatch to the TU matching > the target CPU. But if internal functions are not inlined this may lead > to unexpected performance loss or execution of illegal instructions. > Therefore, inline functions that are not marked as always_inline must > use an additional template parameter somewhere in their name, to > disambiguate between the different -m translations. Note that excessive use of always_inline can cause compile-time issues (see for example PR99785). I wonder whether the inlines can be placed in an anonymous namespace instead of the difficult-to-maintain explicit list of SIMD features? It also doesn't solve the issue when instantiating the functions from a TU which contains #pragma GCC target sections to switch options, of course. Richard.
> Signed-off-by: Matthias Kretz <m.kretz@gsi.de> > > libstdc++-v3/ChangeLog: > > * include/experimental/bits/simd.h: Move feature detection bools > and add __have_avx512bitalg, __have_avx512vbmi2, > __have_avx512vbmi, __have_avx512ifma, __have_avx512cd, > __have_avx512vnni, __have_avx512vpopcntdq. > (__detail::__machine_flags): New function which returns a unique > uint64 depending on relevant -m and -f flags. > (__detail::__odr_helper): New type alias for either an anonymous > type or a type specialized with the __machine_flags number. > (_SimdIntOperators): Change template parameters from _Impl to > _Tp, _Abi because _Impl now has an __odr_helper parameter which > may be _OdrEnforcer from the anonymous namespace, which makes > for a bad base class. > (many): Either add __odr_helper template parameter or mark as > always_inline. > * include/experimental/bits/simd_detail.h: Add defines for > AVX512BITALG, AVX512VBMI2, AVX512VBMI, AVX512IFMA, AVX512CD, > AVX512VNNI, AVX512VPOPCNTDQ, and AVX512VP2INTERSECT. > * include/experimental/bits/simd_builtin.h: Add __odr_helper > template parameter or mark as always_inline. > * include/experimental/bits/simd_fixed_size.h: Ditto. > * include/experimental/bits/simd_math.h: Ditto. > * include/experimental/bits/simd_scalar.h: Ditto. > * include/experimental/bits/simd_neon.h: Add __odr_helper > template parameter. > * include/experimental/bits/simd_ppc.h: Ditto. > * include/experimental/bits/simd_x86.h: Ditto. 
> --- > libstdc++-v3/include/experimental/bits/simd.h | 380 ++++++++++++------ > .../include/experimental/bits/simd_builtin.h | 41 +- > .../include/experimental/bits/simd_detail.h | 40 ++ > .../experimental/bits/simd_fixed_size.h | 39 +- > .../include/experimental/bits/simd_math.h | 45 ++- > .../include/experimental/bits/simd_neon.h | 4 +- > .../include/experimental/bits/simd_ppc.h | 4 +- > .../include/experimental/bits/simd_scalar.h | 71 +++- > .../include/experimental/bits/simd_x86.h | 4 +- > 9 files changed, 440 insertions(+), 188 deletions(-)
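The ChangeLog describes `__detail::__machine_flags()` as returning a unique `uint64` that depends on the relevant -m and -f flags, and `__odr_helper` as a type specialized with that number. The real encoding is not in this excerpt, so the sketch below invents one: the bit positions, the set of macros, and the names `machine_flags`, `MachineFlagsTag`, and `add` are all hypothetical. It shows how predefined macros such as `__AVX2__` can be folded into a constant that then lands in every mangled name through a defaulted template parameter. Each new ISA extension needs another entry, which is presumably the "explicit list of SIMD features" Richard refers to.

```cpp
#include <cstdint>

namespace sketch {
  // Hypothetical encoding: one bit per feature that changes codegen.
  constexpr std::uint64_t machine_flags()
  {
    std::uint64_t flags = 0;
#ifdef __SSE2__
    flags |= std::uint64_t{1} << 0;
#endif
#ifdef __SSE4_2__
    flags |= std::uint64_t{1} << 1;
#endif
#ifdef __AVX__
    flags |= std::uint64_t{1} << 2;
#endif
#ifdef __AVX2__
    flags |= std::uint64_t{1} << 3;
#endif
#ifdef __AVX512F__
    flags |= std::uint64_t{1} << 4;
#endif
#ifdef __ARM_NEON
    flags |= std::uint64_t{1} << 5;
#endif
#ifdef __FAST_MATH__
    // -ffast-math changes floating-point semantics, so it is part of the key.
    flags |= std::uint64_t{1} << 63;
#endif
    return flags;
  }

  // A distinct type for each distinct flag combination ...
  template <std::uint64_t Flags>
  struct MachineFlagsTag {};

  using odr_tag = MachineFlagsTag<machine_flags()>;

  // ... which, as a defaulted template argument, becomes part of the
  // mangled name: TUs built with different -m flags emit different symbols.
  template <typename T, typename = odr_tag>
  constexpr T add(T a, T b) { return a + b; }
}
```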
* Re: [PATCH 11/11] libstdc++: Fix ODR issues with different -m flags From: Matthias Kretz @ 2021-06-09 12:53 UTC To: Richard Biener; +Cc: GCC Patches, libstdc++ On Wednesday, 9 June 2021 14:22:00 CEST Richard Biener wrote: > On Tue, Jun 8, 2021 at 2:23 PM Matthias Kretz <m.kretz@gsi.de> wrote: > > From: Matthias Kretz <kretz@kde.org> > > > > Explicitly support use of the stdx::simd implementation in situations > > where the user links TUs that were compiled with different -m flags. In > > general, this is always a (quasi) ODR violation for inline functions > > because at least codegen may differ in important ways. However, in the > > resulting executable only one (unspecified which one) of them might be > > used. For simd we want to support users to compile code multiple times, > > with different -m flags and have a runtime dispatch to the TU matching > > the target CPU. But if internal functions are not inlined this may lead > > to unexpected performance loss or execution of illegal instructions. > > Therefore, inline functions that are not marked as always_inline must > > use an additional template parameter somewhere in their name, to > > disambiguate between the different -m translations. > > Note that excessive use of always_inline can cause compile-time issues > (see for example PR99785). Ah, I should verify whether that's also the reason my stdx::simd implementation is slow to compile. However, I really must have the always_inline semantics in most of the places stdx::simd uses it, because most of these functions compile to either a single function call or a single instruction (often f0 -> f1 -> f2 -> single instruction). If the inliner makes even a single wrong inlining decision, the whole program might slow down by integral factors, not only small percentages.
And without inlining these functions, -fno-inline builds (i.e. many debug builds) become unbearably slow (aka useless). > I wonder whether the inlines can be > placed in an anonymous namespace instead of the difficult to maintain > explict list of SIMD features? It's possible, and part of the patch: + namespace + { + struct _OdrEnforcer {}; + } [...] + using __odr_helper + = conditional_t<__machine_flags() == 0, _OdrEnforcer, + _MachineFlagsTemplate<__machine_flags(), __floating_point_flags()>>; It can potentially blow up the code size and the instruction cache usage, though. The trade-off isn't obvious to make. I guess I can't promise that mixing different compiler flags is ODR-violation free. > It also doesn't solve the issue when > instantiating the functions from a TU which contains #pragma GCC target > sections to switch options, of course. Yes. Can I get PR83875? ;-) - Matthias > > Signed-off-by: Matthias Kretz <m.kretz@gsi.de> > > > > libstdc++-v3/ChangeLog: > > * include/experimental/bits/simd.h: Move feature detection bools > > and add __have_avx512bitalg, __have_avx512vbmi2, > > __have_avx512vbmi, __have_avx512ifma, __have_avx512cd, > > __have_avx512vnni, __have_avx512vpopcntdq. > > (__detail::__machine_flags): New function which returns a unique > > uint64 depending on relevant -m and -f flags. > > (__detail::__odr_helper): New type alias for either an anonymous > > type or a type specialized with the __machine_flags number. > > (_SimdIntOperators): Change template parameters from _Impl to > > _Tp, _Abi because _Impl now has an __odr_helper parameter which > > may be _OdrEnforcer from the anonymous namespace, which makes > > for a bad base class. > > (many): Either add __odr_helper template parameter or mark as > > always_inline. > > * include/experimental/bits/simd_detail.h: Add defines for > > AVX512BITALG, AVX512VBMI2, AVX512VBMI, AVX512IFMA, AVX512CD, > > AVX512VNNI, AVX512VPOPCNTDQ, and AVX512VP2INTERSECT.
> > * include/experimental/bits/simd_builtin.h: Add __odr_helper > > template parameter or mark as always_inline. > > * include/experimental/bits/simd_fixed_size.h: Ditto. > > * include/experimental/bits/simd_math.h: Ditto. > > * include/experimental/bits/simd_scalar.h: Ditto. > > * include/experimental/bits/simd_neon.h: Add __odr_helper > > template parameter. > > * include/experimental/bits/simd_ppc.h: Ditto. > > * include/experimental/bits/simd_x86.h: Ditto. > > > > --- > > > > libstdc++-v3/include/experimental/bits/simd.h | 380 ++++++++++++------ > > .../include/experimental/bits/simd_builtin.h | 41 +- > > .../include/experimental/bits/simd_detail.h | 40 ++ > > .../experimental/bits/simd_fixed_size.h | 39 +- > > .../include/experimental/bits/simd_math.h | 45 ++- > > .../include/experimental/bits/simd_neon.h | 4 +- > > .../include/experimental/bits/simd_ppc.h | 4 +- > > .../include/experimental/bits/simd_scalar.h | 71 +++- > > .../include/experimental/bits/simd_x86.h | 4 +- > > 9 files changed, 440 insertions(+), 188 deletions(-)
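Matthias's `_OdrEnforcer` / `__odr_helper` snippet, expanded into a self-contained sketch. `OdrEnforcer`, `MachineFlagsTemplate`, and the two flag functions are hypothetical stand-ins. Treating a result of 0 as "no recognised flags, fall back to the anonymous type" is an assumption about the intent. Because a type in an unnamed namespace has internal linkage, every template specialization that uses it is private to its translation unit. Nothing can then be merged across TUs, which is safe but can duplicate code, the trade-off Matthias describes.

```cpp
#include <cstdint>
#include <type_traits>

namespace sketch {
  // Hypothetical stand-ins for __machine_flags() / __floating_point_flags().
  constexpr std::uint64_t machine_flags()
  {
#if defined(__AVX2__)
    return 2;
#elif defined(__SSE2__)
    return 1;
#else
    return 0; // nothing recognised
#endif
  }

  constexpr unsigned floating_point_flags()
  {
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
  }

  namespace {
    // Internal linkage: instantiations using this type are TU-local.
    struct OdrEnforcer {};
  }

  template <std::uint64_t Machine, unsigned Fp>
  struct MachineFlagsTemplate {};

  // TUs with equal flags share instantiations; unknown targets get
  // per-TU copies.
  using odr_helper
    = std::conditional_t<machine_flags() == 0, OdrEnforcer,
                         MachineFlagsTemplate<machine_flags(),
                                              floating_point_flags()>>;

  template <typename T, typename = odr_helper>
  constexpr T twice(T x) { return x + x; }
}
```

The ChangeLog's remark that such a type "makes for a bad base class" fits GCC's `-Wsubobject-linkage` warning, which fires when a class defined in a header has a base or member whose type uses an anonymous namespace. That is presumably why `_SimdIntOperators` was re-parameterized on `_Tp, _Abi`.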
* Re: [PATCH 11/11] libstdc++: Fix ODR issues with different -m flags From: Richard Biener @ 2021-06-09 13:22 UTC To: Matthias Kretz; +Cc: GCC Patches, libstdc++ On Wed, Jun 9, 2021 at 2:53 PM Matthias Kretz <m.kretz@gsi.de> wrote: > > On Wednesday, 9 June 2021 14:22:00 CEST Richard Biener wrote: > > On Tue, Jun 8, 2021 at 2:23 PM Matthias Kretz <m.kretz@gsi.de> wrote: > > > From: Matthias Kretz <kretz@kde.org> > > > > > > Explicitly support use of the stdx::simd implementation in situations > > > where the user links TUs that were compiled with different -m flags. In > > > general, this is always a (quasi) ODR violation for inline functions > > > because at least codegen may differ in important ways. However, in the > > > resulting executable only one (unspecified which one) of them might be > > > used. For simd we want to support users to compile code multiple times, > > > with different -m flags and have a runtime dispatch to the TU matching > > > the target CPU. But if internal functions are not inlined this may lead > > > to unexpected performance loss or execution of illegal instructions. > > > Therefore, inline functions that are not marked as always_inline must > > > use an additional template parameter somewhere in their name, to > > > disambiguate between the different -m translations. > > > > Note that excessive use of always_inline can cause compile-time issues > > (see for example PR99785). > > Ah, I should verify whether that's also the reason my stdx::simd > implementation is slow to compile. > > However, I really must have the always_inline semantics in most of the places > stdx::simd uses it. Because most of these functions compile to either a single > function call or a single instruction (often f0 -> f1 -> f2 -> single > instruction). If the inliner even makes one single wrong inlining decision,
If the inliner even makes one single wrong inlining decision, > the whole program might slow down by integral factors, not only small > percentages. And without inlining these functions, -fno-inline builds (i.e. > many debug builds) become unbearably slow (aka useless). Understood. Note I think that the slow compile is a bug and there must be a way to address it, there's just too large testcases at the moment to get a hand on what kind of callgraphs cause which problem and why and how we might want to address this. > > I wonder whether the inlines can be > > placed in an anonymous namespace instead of the difficult to maintain > > explict list of SIMD features? > > It's possible, and part of the patch: > > + namespace > + { > + struct _OdrEnforcer {}; > + } > [...] > + using __odr_helper > + = conditional_t<__machine_flags() == 0, _OdrEnforcer, > + _MachineFlagsTemplate<__machine_flags(), __floating_point_flags()>>; > > It can potentially blow up the code size and the instruction cache usage, > though. The trade-off isn't obvious to make. I guess I can't promise that > mixing different compiler flags is ODR violation free > > > It also doesn't solve the issue when > > instantiating the functions from a TU which contains #pragma GCC target > > sections to switch options, of course. > > Yes. Can I get PR83875? ;-) heh ;) Richard. > - Matthias > > > > Signed-off-by: Matthias Kretz <m.kretz@gsi.de> > > > > > > libstdc++-v3/ChangeLog: > > > * include/experimental/bits/simd.h: Move feature detection bools > > > and add __have_avx512bitalg, __have_avx512vbmi2, > > > __have_avx512vbmi, __have_avx512ifma, __have_avx512cd, > > > __have_avx512vnni, __have_avx512vpopcntdq. > > > (__detail::__machine_flags): New function which returns a unique > > > uint64 depending on relevant -m and -f flags. > > > (__detail::__odr_helper): New type alias for either an anonymous > > > type or a type specialized with the __machine_flags number. 
> > > (_SimdIntOperators): Change template parameters from _Impl to > > > _Tp, _Abi because _Impl now has an __odr_helper parameter which > > > may be _OdrEnforcer from the anonymous namespace, which makes > > > for a bad base class. > > > (many): Either add __odr_helper template parameter or mark as > > > always_inline. > > > * include/experimental/bits/simd_detail.h: Add defines for > > > AVX512BITALG, AVX512VBMI2, AVX512VBMI, AVX512IFMA, AVX512CD, > > > AVX512VNNI, AVX512VPOPCNTDQ, and AVX512VP2INTERSECT. > > > * include/experimental/bits/simd_builtin.h: Add __odr_helper > > > template parameter or mark as always_inline. > > > * include/experimental/bits/simd_fixed_size.h: Ditto. > > > * include/experimental/bits/simd_math.h: Ditto. > > > * include/experimental/bits/simd_scalar.h: Ditto. > > > * include/experimental/bits/simd_neon.h: Add __odr_helper > > > template parameter. > > > * include/experimental/bits/simd_ppc.h: Ditto. > > > * include/experimental/bits/simd_x86.h: Ditto. > > > > > > --- > > > > > > libstdc++-v3/include/experimental/bits/simd.h | 380 ++++++++++++------ > > > .../include/experimental/bits/simd_builtin.h | 41 +- > > > .../include/experimental/bits/simd_detail.h | 40 ++ > > > .../experimental/bits/simd_fixed_size.h | 39 +- > > > .../include/experimental/bits/simd_math.h | 45 ++- > > > .../include/experimental/bits/simd_neon.h | 4 +- > > > .../include/experimental/bits/simd_ppc.h | 4 +- > > > .../include/experimental/bits/simd_scalar.h | 71 +++- > > > .../include/experimental/bits/simd_x86.h | 4 +- > > > 9 files changed, 440 insertions(+), 188 deletions(-)
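The scalar hunks replace `static inline` with `_GLIBCXX_SIMD_INTRINSIC static`. The macro's definition is not shown in this excerpt; the fact that the explicit `inline` is dropped suggests it already expands to something like an always_inline attribute followed by `inline`. The sketch below defines a hypothetical `SKETCH_INTRINSIC` along those lines. GCC honours `always_inline` even at -O0, which is what keeps unoptimized builds usable in Matthias's argument, and per the commit message functions marked this way are exempt from the extra ODR template parameter. The flip side is Richard's point: such calls are expanded unconditionally, so deep always_inline call chains can get expensive to compile.

```cpp
// Hypothetical stand-in for _GLIBCXX_SIMD_INTRINSIC.
#if defined(__GNUC__)
#  define SKETCH_INTRINSIC [[gnu::always_inline, gnu::artificial]] inline
#else
#  define SKETCH_INTRINSIC inline
#endif

namespace sketch {
  struct ScalarImpl
  {
    // Same shape as the patched _S_unary_minus.
    template <typename T>
    SKETCH_INTRINSIC static constexpr T unary_minus(T x) noexcept
    { return static_cast<T>(-x); }

    // Same shape as the patched _MaskImplScalar::_S_logical_and.
    SKETCH_INTRINSIC static constexpr bool logical_and(bool x, bool y)
    { return x && y; }
  };
}
```

Folding `inline` into the macro matters mainly for non-member functions: GCC warns that an `always_inline` function might not be inlinable if it is not also declared `inline`.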
* Re: [PATCH 11/11] libstdc++: Fix ODR issues with different -m flags From: Matthias Kretz @ 2021-11-15 8:57 UTC To: gcc-patches, libstdc++, Jonathan Wakely ping. OK to push? On Tuesday, 8 June 2021 14:12:23 CET Matthias Kretz wrote: > From: Matthias Kretz <kretz@kde.org> > > Explicitly support use of the stdx::simd implementation in situations > where the user links TUs that were compiled with different -m flags. In > general, this is always a (quasi) ODR violation for inline functions > because at least codegen may differ in important ways. However, in the > resulting executable only one (unspecified which one) of them might be > used. For simd we want to support users to compile code multiple times, > with different -m flags and have a runtime dispatch to the TU matching > the target CPU. But if internal functions are not inlined this may lead > to unexpected performance loss or execution of illegal instructions. > Therefore, inline functions that are not marked as always_inline must > use an additional template parameter somewhere in their name, to > disambiguate between the different -m translations. > > Signed-off-by: Matthias Kretz <m.kretz@gsi.de> > > libstdc++-v3/ChangeLog: > > * include/experimental/bits/simd.h: Move feature detection bools > and add __have_avx512bitalg, __have_avx512vbmi2, > __have_avx512vbmi, __have_avx512ifma, __have_avx512cd, > __have_avx512vnni, __have_avx512vpopcntdq. > (__detail::__machine_flags): New function which returns a unique > uint64 depending on relevant -m and -f flags. > (__detail::__odr_helper): New type alias for either an anonymous > type or a type specialized with the __machine_flags number.
> (_SimdIntOperators): Change template parameters from _Impl to > _Tp, _Abi because _Impl now has an __odr_helper parameter which > may be _OdrEnforcer from the anonymous namespace, which makes > for a bad base class. > (many): Either add __odr_helper template parameter or mark as > always_inline. > * include/experimental/bits/simd_detail.h: Add defines for > AVX512BITALG, AVX512VBMI2, AVX512VBMI, AVX512IFMA, AVX512CD, > AVX512VNNI, AVX512VPOPCNTDQ, and AVX512VP2INTERSECT. > * include/experimental/bits/simd_builtin.h: Add __odr_helper > template parameter or mark as always_inline. > * include/experimental/bits/simd_fixed_size.h: Ditto. > * include/experimental/bits/simd_math.h: Ditto. > * include/experimental/bits/simd_scalar.h: Ditto. > * include/experimental/bits/simd_neon.h: Add __odr_helper > template parameter. > * include/experimental/bits/simd_ppc.h: Ditto. > * include/experimental/bits/simd_x86.h: Ditto. > --- > libstdc++-v3/include/experimental/bits/simd.h | 380 ++++++++++++------ > .../include/experimental/bits/simd_builtin.h | 41 +- > .../include/experimental/bits/simd_detail.h | 40 ++ > .../experimental/bits/simd_fixed_size.h | 39 +- > .../include/experimental/bits/simd_math.h | 45 ++- > .../include/experimental/bits/simd_neon.h | 4 +- > .../include/experimental/bits/simd_ppc.h | 4 +- > .../include/experimental/bits/simd_scalar.h | 71 +++- > .../include/experimental/bits/simd_x86.h | 4 +- > 9 files changed, 440 insertions(+), 188 deletions(-)
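The use case in the commit message (compile the same code several times with different -m flags and dispatch at run time to the TU that matches the CPU) might look roughly like this. In a real project `sum_avx2` and `sum_generic` would be defined in separate TUs, one compiled with `-mavx2`, both built on stdx::simd. Here they are plain loops in one file to keep the sketch self-contained. `__builtin_cpu_supports` is a GCC/Clang builtin available on x86 only, hence the guard. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Would live in a TU compiled with the baseline flags.
float sum_generic(const float* p, std::size_t n)
{
  float s = 0;
  for (std::size_t i = 0; i < n; ++i)
    s += p[i];
  return s;
}

// Would live in a TU compiled with -mavx2; it forwards to the generic
// version only to keep this sketch in one file.
float sum_avx2(const float* p, std::size_t n)
{
  return sum_generic(p, n);
}

using sum_fn = float (*)(const float*, std::size_t);

// Pick the implementation based on the CPU the program runs on.
sum_fn select_sum()
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  if (__builtin_cpu_supports("avx2"))
    return sum_avx2;
#endif
  return sum_generic;
}

// Usage: whichever path is chosen, the result must be the same.
void demo()
{
  const float data[4] = {1.0f, 2.0f, 3.0f, 4.0f};
  assert(select_sum()(data, 4) == 10.0f);
}
```

Without the patch, if both TUs emitted the same non-inlined internal simd helper, the linker could keep the `-mavx2` copy, and the generic path would then execute AVX2 instructions on a CPU that lacks them. That is the "execution of illegal instructions" the commit message warns about.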
* Re: [PATCH 11/11] libstdc++: Fix ODR issues with different -m flags From: Jonathan Wakely @ 2022-01-14 21:30 UTC To: Matthias Kretz; +Cc: gcc Patches, libstdc++ On Mon, 15 Nov 2021 at 08:57, Matthias Kretz <m.kretz@gsi.de> wrote: > ping. OK to push? > Sorry for the delay - this is OK for trunk. > On Tuesday, 8 June 2021 14:12:23 CET Matthias Kretz wrote: > > From: Matthias Kretz <kretz@kde.org> > > > > Explicitly support use of the stdx::simd implementation in situations > > where the user links TUs that were compiled with different -m flags. In > > general, this is always a (quasi) ODR violation for inline functions > > because at least codegen may differ in important ways. However, in the > > resulting executable only one (unspecified which one) of them might be > > used. For simd we want to support users to compile code multiple times, > > with different -m flags and have a runtime dispatch to the TU matching > > the target CPU. But if internal functions are not inlined this may lead > > to unexpected performance loss or execution of illegal instructions. > > Therefore, inline functions that are not marked as always_inline must > > use an additional template parameter somewhere in their name, to > > disambiguate between the different -m translations. > > > > Signed-off-by: Matthias Kretz <m.kretz@gsi.de> > > > > libstdc++-v3/ChangeLog: > > > > * include/experimental/bits/simd.h: Move feature detection bools > > and add __have_avx512bitalg, __have_avx512vbmi2, > > __have_avx512vbmi, __have_avx512ifma, __have_avx512cd, > > __have_avx512vnni, __have_avx512vpopcntdq. > > (__detail::__machine_flags): New function which returns a unique > > uint64 depending on relevant -m and -f flags.
> > (__detail::__odr_helper): New type alias for either an anonymous > > type or a type specialized with the __machine_flags number. > > (_SimdIntOperators): Change template parameters from _Impl to > > _Tp, _Abi because _Impl now has an __odr_helper parameter which > > may be _OdrEnforcer from the anonymous namespace, which makes > > for a bad base class. > > (many): Either add __odr_helper template parameter or mark as > > always_inline. > > * include/experimental/bits/simd_detail.h: Add defines for > > AVX512BITALG, AVX512VBMI2, AVX512VBMI, AVX512IFMA, AVX512CD, > > AVX512VNNI, AVX512VPOPCNTDQ, and AVX512VP2INTERSECT. > > * include/experimental/bits/simd_builtin.h: Add __odr_helper > > template parameter or mark as always_inline. > > * include/experimental/bits/simd_fixed_size.h: Ditto. > > * include/experimental/bits/simd_math.h: Ditto. > > * include/experimental/bits/simd_scalar.h: Ditto. > > * include/experimental/bits/simd_neon.h: Add __odr_helper > > template parameter. > > * include/experimental/bits/simd_ppc.h: Ditto. > > * include/experimental/bits/simd_x86.h: Ditto. > > --- > > libstdc++-v3/include/experimental/bits/simd.h | 380 ++++++++++++------ > > .../include/experimental/bits/simd_builtin.h | 41 +- > > .../include/experimental/bits/simd_detail.h | 40 ++ > > .../experimental/bits/simd_fixed_size.h | 39 +- > > .../include/experimental/bits/simd_math.h | 45 ++- > > .../include/experimental/bits/simd_neon.h | 4 +- > > .../include/experimental/bits/simd_ppc.h | 4 +- > > .../include/experimental/bits/simd_scalar.h | 71 +++- > > .../include/experimental/bits/simd_x86.h | 4 +- > > 9 files changed, 440 insertions(+), 188 deletions(-) > > -- > ────────────────────────────────────────────────────────────────────────── > Dr. 
Matthias Kretz https://mattkretz.github.io > GSI Helmholtz Centre for Heavy Ion Research https://gsi.de > stdₓ::simd > ────────────────────────────────────────────────────────────────────────── > > > > ^ permalink raw reply [flat|nested] 29+ messages in thread
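The ChangeLog describes `__detail::__machine_flags` as returning a unique 64-bit number that depends on the relevant -m and -f flags, and `__detail::__odr_helper` as an alias for either an anonymous type or a type specialized on that number. The sketch below illustrates that pattern under stated assumptions: all names are hypothetical, the macro list and bit layout are invented for the example (the ChangeLog indicates the real function tracks many more ISA features), and choosing the anonymous-namespace type when the number is 0 is an assumed selection rule, not something the patch text confirms.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical sketch of the __machine_flags / __odr_helper idea.
namespace sketch {

// Encode a few GCC/Clang predefined macros as bits. Which -m/-f flags are
// covered, and which bit each one uses, is an assumption for this example.
constexpr std::uint64_t machine_flags()
{
  std::uint64_t flags = 0;
#ifdef __SSE2__
  flags |= std::uint64_t{1} << 0;
#endif
#ifdef __SSE4_2__
  flags |= std::uint64_t{1} << 1;
#endif
#ifdef __AVX__
  flags |= std::uint64_t{1} << 2;
#endif
#ifdef __AVX2__
  flags |= std::uint64_t{1} << 3;
#endif
#ifdef __FMA__
  flags |= std::uint64_t{1} << 4;
#endif
#ifdef __AVX512F__
  flags |= std::uint64_t{1} << 5;
#endif
#ifdef __FAST_MATH__
  flags |= std::uint64_t{1} << 6;
#endif
#if defined __FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__
  flags |= std::uint64_t{1} << 7;
#endif
  return flags;
}

namespace {
// Fallback tag: a distinct type in every translation unit.
struct OdrEnforcer {};
} // namespace

// One distinct tag type per flag combination.
template <std::uint64_t Flags>
struct MachineFlagsTag {};

// TUs built with the same flags share a tag (and may share symbols); TUs
// built with different flags get different tags. Assumed rule: if no flag
// was detected, use the TU-local type so nothing is shared across TUs.
using odr_helper = std::conditional_t<machine_flags() == 0, OdrEnforcer,
                                      MachineFlagsTag<machine_flags()>>;

// A helper that is not always_inline: the defaulted parameter puts the tag
// into the mangled name of any out-of-line copy.
template <typename T, typename Tag = odr_helper>
constexpr T square(T x)
{ return x * x; }

} // namespace sketch
```

In this sketch, compiling the same code once with -mavx2 and once without changes `machine_flags()`, so an out-of-line `square<float>` gets a different `MachineFlagsTag` argument in each TU and the linker cannot substitute the AVX2 copy for the baseline one, or vice versa.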

* Re: [PATCH 11/11] libstdc++: Fix ODR issues with different -m flags
  2022-01-14 21:30 ` Jonathan Wakely
@ 2022-01-17 0:08 ` Jonathan Wakely
  0 siblings, 0 replies; 29+ messages in thread
From: Jonathan Wakely @ 2022-01-17 0:08 UTC
  To: Matthias Kretz; +Cc: gcc Patches, libstdc++

On Fri, 14 Jan 2022 at 21:30, Jonathan Wakely <jwakely@redhat.com> wrote:

>
> On Mon, 15 Nov 2021 at 08:57, Matthias Kretz <m.kretz@gsi.de> wrote:
>
>> ping. OK to push?
>>
>
> Sorry for the delay - this is OK for trunk.
>

I see a new failure on powerpc64le-linux (gcc112 in the cfarm) after this
commit:

FAIL: experimental/simd/standard_abi_usable_2.cc -maltivec -mpower8-vector -O2 -Wno-psabi (test for excess errors)

* Re: [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups
  2021-06-08 12:10 [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Matthias Kretz
  ` (10 preceding siblings ...)
  2021-06-08 12:12 ` [PATCH 11/11] libstdc++: Fix ODR issues with different -m flags Matthias Kretz
@ 2021-06-24 13:42 ` Jonathan Wakely
  11 siblings, 0 replies; 29+ messages in thread
From: Jonathan Wakely @ 2021-06-24 13:42 UTC
  To: Matthias Kretz; +Cc: gcc Patches, libstdc++

On Tue, 8 Jun 2021 at 13:10, Matthias Kretz wrote:
>
> The following patches mostly contain code cleanups and minor corrections. The
> major feature in this patchset is the last patch, which should make the use of
> stdx::simd much safer wrt. ODR violations involuntarily introduced by linking
> TUs that were compiled with different -m and floating-point flags.
>
> Matthias Kretz (11):
>   libstdc++: Improve copysign codegen
>   libstdc++: Remove dead code
>   libstdc++: Improve fixed_size codegen
>   libstdc++: Make use of __builtin_bit_cast
>   libstdc++: Remove incorrect fabs overload
>   libstdc++: Minor simd_math cleanups
>   libstdc++: Fix condition when AVX512F ldexp implementation is used
>   libstdc++: Avoid raising fp exceptions in trunc, floor, and ceil
>   libstdc++: Ensure unrolled loops inline the lambda
>   libstdc++: Fix internal names: add missing underscores
>   libstdc++: Fix ODR issues with different -m flags

Thanks! I've pushed all except the bit_cast one (as discussed on IRC) and
the ODR one (which I'm still reviewing).

Thread overview: 29+ messages
2021-06-08 12:10 [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Matthias Kretz
2021-06-08 12:11 ` [PATCH 01/11] libstdc++: Improve copysign codegen Matthias Kretz
2021-06-08 12:11 ` [PATCH 02/11] libstdc++: Remove dead code Matthias Kretz
2021-06-08 12:11 ` [PATCH 03/11] libstdc++: Improve fixed_size codegen Matthias Kretz
2021-06-08 12:11 ` [PATCH 04/11] libstdc++: Make use of __builtin_bit_cast Matthias Kretz
2021-06-11 10:53 ` [PATCH 04/11 v2] " Matthias Kretz
2021-06-24 14:01 ` [PATCH 04/11 v3] " Matthias Kretz
2021-06-24 14:08 ` Jakub Jelinek
2021-06-24 14:11 ` Jonathan Wakely
2021-06-24 14:12 ` Jonathan Wakely
2021-06-24 14:21 ` Jakub Jelinek
2021-06-24 14:34 ` Jonathan Wakely
2021-06-24 14:40 ` Jonathan Wakely
2021-06-24 14:44 ` Jakub Jelinek
2021-06-25 11:23 ` Jonathan Wakely
2021-06-08 12:11 ` [PATCH 05/11] libstdc++: Remove incorrect fabs overload Matthias Kretz
2021-06-08 12:11 ` [PATCH 06/11] libstdc++: Minor simd_math cleanups Matthias Kretz
2021-06-08 12:11 ` [PATCH 07/11] libstdc++: Fix condition when AVX512F ldexp implementation is used Matthias Kretz
2021-06-08 12:11 ` [PATCH 08/11] libstdc++: Avoid raising fp exceptions in trunc, floor, and ceil Matthias Kretz
2021-06-08 12:11 ` [PATCH 09/11] libstdc++: Ensure unrolled loops inline the lambda Matthias Kretz
2021-06-08 12:12 ` [PATCH 10/11] libstdc++: Fix internal names: add missing underscores Matthias Kretz
2021-06-08 12:12 ` [PATCH 11/11] libstdc++: Fix ODR issues with different -m flags Matthias Kretz
2021-06-09 12:22 ` Richard Biener
2021-06-09 12:53 ` Matthias Kretz
2021-06-09 13:22 ` Richard Biener
2021-11-15 8:57 ` Matthias Kretz
2022-01-14 21:30 ` Jonathan Wakely
2022-01-17 0:08 ` Jonathan Wakely
2021-06-24 13:42 ` [PATCH 00/11] stdx::simd optimizations, corrections, and cleanups Jonathan Wakely