From: Jonathan Wakely <redi@gcc.gnu.org>
To: gcc-cvs@gcc.gnu.org, libstdc++-cvs@gcc.gnu.org
Subject: [gcc r11-6935] libstdc++: Add std::experimental::simd from the Parallelism TS 2
Date: Wed, 27 Jan 2021 16:39:18 +0000 (GMT)
Message-ID: <20210127163918.E607F3846405@sourceware.org>

https://gcc.gnu.org/g:2bcceb6fc59fcdaf51006d4fcfc71c2d26761396

commit r11-6935-g2bcceb6fc59fcdaf51006d4fcfc71c2d26761396
Author: Matthias Kretz <kretz@kde.org>
Date:   Thu Jan 21 11:45:15 2021 +0000

    libstdc++: Add std::experimental::simd from the Parallelism TS 2

    Adds <experimental/simd>. This implements the simd and simd_mask class
    templates via [[gnu::vector_size(N)]] data members. It implements
    overloads of all <cmath> functions for simd; explicit vectorization of
    the <cmath> functions is not finished.

    The majority of functions are marked as [[gnu::always_inline]] to
    enable quasi-ODR-conforming linking of TUs compiled with different -m
    flags. Performance optimization was done for x86_64. ARM, AArch64, and
    POWER rely on the compiler to recognize reduction, conversion, and
    shuffle patterns. Besides verification using many different machine
    flags, the code was also verified with different fast-math flags.

    libstdc++-v3/ChangeLog:

	* doc/xml/manual/status_cxx2017.xml: Add implementation status
	of the Parallelism TS 2. Document implementation-defined types
	and behavior.
	* include/Makefile.am: Add new headers.
	* include/Makefile.in: Regenerate.
	* include/experimental/simd: New file. New header for
	Parallelism TS 2.
	* include/experimental/bits/numeric_traits.h: New file.
	Implementation of P1841R1 using internal naming. Addition of
	missing IEC559 functionality query.
	* include/experimental/bits/simd.h: New file. Definition of the
	public simd interfaces and general implementation helpers.
	* include/experimental/bits/simd_builtin.h: New file.
	Implementation of the _VecBuiltin simd_abi.
	* include/experimental/bits/simd_converter.h: New file. Generic
	simd conversions.
	* include/experimental/bits/simd_detail.h: New file. Internal
	macros for the simd implementation.
	* include/experimental/bits/simd_fixed_size.h: New file. Simd
	fixed_size ABI specific implementations.
	* include/experimental/bits/simd_math.h: New file. Math
	overloads for simd.
	* include/experimental/bits/simd_neon.h: New file. Simd NEON
	specific implementations.
	* include/experimental/bits/simd_ppc.h: New file. Implement bit
	shifts to avoid invalid results for integral types smaller than
	int.
	* include/experimental/bits/simd_scalar.h: New file. Simd
	scalar ABI specific implementations.
	* include/experimental/bits/simd_x86.h: New file. Simd x86
	specific implementations.
	* include/experimental/bits/simd_x86_conversions.h: New file.
	x86 specific conversion optimizations. The conversion patterns
	work around missing conversion patterns in the compiler and
	should be removed as soon as PR85048 is resolved.
	* testsuite/experimental/simd/standard_abi_usable.cc: New file.
	Test that all (not all fixed_size<N>, though) standard simd and
	simd_mask types are usable.
	* testsuite/experimental/simd/standard_abi_usable_2.cc: New
	file. As above but with -ffast-math.
	* testsuite/libstdc++-dg/conformance.exp: Don't build simd
	tests from the standard test loop. Instead use
	check_vect_support_and_set_flags to build simd tests with the
	relevant machine flags.
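Not part of the patch: a minimal sketch of the user-facing interface this
commit provides, using only names specified by the Parallelism TS 2
(native_simd, the generator constructor, reduce):

    // Element-wise arithmetic on a native-width simd vector.
    #include <experimental/simd>
    #include <iostream>

    namespace stdx = std::experimental;

    int main()
    {
      using V = stdx::native_simd<float>;   // e.g. 8 floats with -mavx2
      V a([](int i) { return float(i); });  // generator ctor: 0, 1, 2, ...
      V b = 2.f;                            // broadcast constructor
      V c = a * b + 1.f;                    // element-wise operators
      std::cout << "width " << V::size()
                << ", sum " << stdx::reduce(c) << '\n'; // horizontal reduction
    }

Compiled with, e.g., g++ -std=c++17 -march=native; the vector width (and
hence V::size()) follows from the -m machine flags, as documented below.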
Diff:
---
 libstdc++-v3/doc/xml/manual/status_cxx2017.xml     |  216 +
 libstdc++-v3/include/Makefile.am                   |   13 +
 libstdc++-v3/include/Makefile.in                   |   13 +
 .../include/experimental/bits/numeric_traits.h     |  567 +++
 libstdc++-v3/include/experimental/bits/simd.h      | 5051 +++++++++++++++++++
 .../include/experimental/bits/simd_builtin.h       | 2949 +++++++++++
 .../include/experimental/bits/simd_converter.h     |  354 ++
 .../include/experimental/bits/simd_detail.h        |  306 ++
 .../include/experimental/bits/simd_fixed_size.h    | 2066 ++++++++
 libstdc++-v3/include/experimental/bits/simd_math.h | 1500 ++++++
 libstdc++-v3/include/experimental/bits/simd_neon.h |  519 ++
 libstdc++-v3/include/experimental/bits/simd_ppc.h  |  123 +
 .../include/experimental/bits/simd_scalar.h        |  772 +++
 libstdc++-v3/include/experimental/bits/simd_x86.h  | 5169 ++++++++++++++++++++
 .../experimental/bits/simd_x86_conversions.h       | 2029 ++++++++
 libstdc++-v3/include/experimental/simd             |   70 +
 .../experimental/simd/standard_abi_usable.cc       |   64 +
 .../experimental/simd/standard_abi_usable_2.cc     |    4 +
 .../testsuite/libstdc++-dg/conformance.exp         |   18 +-
 19 files changed, 21802 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
index e6834b3607a..bc740f8e1ba 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
@@ -2869,6 +2869,17 @@ since C++14 and the implementation is complete.
       <entry>Library Fundamentals 2 TS</entry>
     </row>

+    <row>
+      <entry>
+        <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0214r9.pdf">
+        P0214R9
+        </link>
+      </entry>
+      <entry>Data-Parallel Types</entry>
+      <entry>Y</entry>
+      <entry>Parallelism 2 TS</entry>
+    </row>
+
   </tbody>
 </tgroup>
</table>
@@ -3014,6 +3025,211 @@ since C++14 and the implementation is complete.
   If <code>!is_regular_file(p)</code>, an error is reported.
 </para>

+<section xml:id="iso.2017.par2ts" xreflabel="Implementation Specific Behavior of the Parallelism 2 TS"><info><title>Parallelism 2 TS</title></info>
+
+<para>
+  <emphasis>9.3 [parallel.simd.abi]</emphasis>
+  <code>max_fixed_size<T></code> is 32, except when targeting
+  AVX512BW and <code>sizeof(T)</code> is 1.
+</para>
+
+<para>
+  When targeting 32-bit x86,
+  <classname>simd_abi::compatible<T></classname> is an alias for
+  <classname>simd_abi::scalar</classname>.
+  When targeting 64-bit x86 (including x32) or AArch64,
+  <classname>simd_abi::compatible<T></classname> is an alias for
+  <classname>simd_abi::_VecBuiltin<16></classname>,
+  unless <code>T</code> is <code>long double</code>, in which case it is
+  an alias for <classname>simd_abi::scalar</classname>.
+  When targeting ARM (but not AArch64) with NEON support,
+  <classname>simd_abi::compatible<T></classname> is an alias for
+  <classname>simd_abi::_VecBuiltin<16></classname>,
+  unless <code>sizeof(T) > 4</code>, in which case it is
+  an alias for <classname>simd_abi::scalar</classname>. Additionally,
+  <classname>simd_abi::compatible<float></classname> is an alias for
+  <classname>simd_abi::scalar</classname> unless compiling with
+  -ffast-math.
+</para>
+
+<para>
+  When targeting x86 (both 32-bit and 64-bit),
+  <classname>simd_abi::native<T></classname> is an alias for one of
+  <classname>simd_abi::scalar</classname>,
+  <classname>simd_abi::_VecBuiltin<16></classname>,
+  <classname>simd_abi::_VecBuiltin<32></classname>, or
+  <classname>simd_abi::_VecBltnBtmsk<64></classname>, depending on
+  <code>T</code> and the machine options the compiler was invoked with.
+</para>
+
+<para>
+  When targeting ARM/AArch64 or POWER,
+  <classname>simd_abi::native<T></classname> is an alias for
+  <classname>simd_abi::scalar</classname> or
+  <classname>simd_abi::_VecBuiltin<16></classname>, depending on
+  <code>T</code> and the machine options the compiler was invoked with.
+</para>
+
+<para>
+  For any other targeted machine,
+  <classname>simd_abi::compatible<T></classname> and
+  <classname>simd_abi::native<T></classname> are aliases for
+  <classname>simd_abi::scalar</classname>. (subject to change)
+</para>
+
+<para>
+  The extended ABI tag types defined in the
+  <code>std::experimental::parallelism_v2::simd_abi</code> namespace are
+  <classname>simd_abi::_VecBuiltin<Bytes></classname> and
+  <classname>simd_abi::_VecBltnBtmsk<Bytes></classname>.
+</para>
+
+<para>
+  <classname>simd_abi::deduce<T, N, Abis...>::type</classname>,
+  with <code>N > 1</code>, is an alias for an extended ABI tag if a
+  supported extended ABI tag exists. Otherwise it is an alias for
+  <classname>simd_abi::fixed_size<N></classname>. The
+  <classname>simd_abi::_VecBltnBtmsk</classname> ABI tag is preferred
+  over <classname>simd_abi::_VecBuiltin</classname>.
+</para>
+
+<para>
+  <emphasis>9.4 [parallel.simd.traits]</emphasis>
+  <classname>memory_alignment<T, U>::value</classname> is
+  <code>sizeof(U) * T::size()</code> rounded up to the next power-of-two
+  value.
+</para>
+
+<para>
+  <emphasis>9.6.1 [parallel.simd.overview]</emphasis>
+  On ARM, <classname>simd<T, _VecBuiltin<Bytes>></classname>
+  is supported if <code>__ARM_NEON</code> is defined and
+  <code>sizeof(T) <= 4</code>. Additionally,
+  <code>sizeof(T) == 8</code> with integral <code>T</code> is supported
+  if <code>__ARM_ARCH >= 8</code>, and <code>double</code> is
+  supported if <code>__aarch64__</code> is defined.
+
+  On POWER, <classname>simd<T, _VecBuiltin<Bytes>></classname>
+  is supported if <code>__ALTIVEC__</code> is defined and
+  <code>sizeof(T) < 8</code>. Additionally, <code>double</code> is
+  supported if <code>__VSX__</code> is defined, and any <code>T</code>
+  with <code>sizeof(T) ≤ 8</code> is supported if
+  <code>__POWER8_VECTOR__</code> is defined.
+
+  On x86, given an extended ABI tag <code>Abi</code>,
+  <classname>simd<T, Abi></classname> is supported according to the
+  following table:
+  <table frame="all" xml:id="table.par2ts_simd_support">
+    <title>Support for Extended ABI Tags</title>
+
+    <tgroup cols="4" align="left" colsep="0" rowsep="1">
+      <colspec colname="c1"/>
+      <colspec colname="c2"/>
+      <colspec colname="c3"/>
+      <colspec colname="c4"/>
+      <thead>
+        <row>
+          <entry>ABI tag <code>Abi</code></entry>
+          <entry>value type <code>T</code></entry>
+          <entry>values for <code>Bytes</code></entry>
+          <entry>required machine option</entry>
+        </row>
+      </thead>
+
+      <tbody>
+        <row>
+          <entry morerows="5">
+            <classname>_VecBuiltin<Bytes></classname>
+          </entry>
+          <entry morerows="1"><code>float</code></entry>
+          <entry>8, 12, 16</entry>
+          <entry>"-msse"</entry>
+        </row>
+
+        <row>
+          <entry>20, 24, 28, 32</entry>
+          <entry>"-mavx"</entry>
+        </row>
+
+        <row>
+          <entry morerows="1"><code>double</code></entry>
+          <entry>16</entry>
+          <entry>"-msse2"</entry>
+        </row>
+
+        <row>
+          <entry>24, 32</entry>
+          <entry>"-mavx"</entry>
+        </row>
+
+        <row>
+          <entry morerows="1">
+            integral types other than <code>bool</code>
+          </entry>
+          <entry>
+            <code>Bytes</code> ≤ 16 and <code>Bytes</code> divisible by
+            <code>sizeof(T)</code>
+          </entry>
+          <entry>"-msse2"</entry>
+        </row>
+
+        <row>
+          <entry>
+            16 < <code>Bytes</code> ≤ 32 and <code>Bytes</code>
+            divisible by <code>sizeof(T)</code>
+          </entry>
+          <entry>"-mavx2"</entry>
+        </row>
+
+        <row>
+          <entry morerows="1">
+            <classname>_VecBuiltin<Bytes></classname> and
+            <classname>_VecBltnBtmsk<Bytes></classname>
+          </entry>
+          <entry>
+            vectorizable types with <code>sizeof(T)</code> ≥ 4
+          </entry>
+          <entry morerows="1">
+            32 < <code>Bytes</code> ≤ 64 and <code>Bytes</code>
+            divisible by <code>sizeof(T)</code>
+          </entry>
+          <entry>"-mavx512f"</entry>
+        </row>
+
+        <row>
+          <entry>
+            vectorizable types with <code>sizeof(T)</code> < 4
+          </entry>
+          <entry>"-mavx512bw"</entry>
+        </row>
+
+        <row>
+          <entry morerows="1">
+            <classname>_VecBltnBtmsk<Bytes></classname>
+          </entry>
+          <entry>
+            vectorizable types with <code>sizeof(T)</code> ≥ 4
+          </entry>
+          <entry morerows="1">
+            <code>Bytes</code> ≤ 32 and <code>Bytes</code> divisible by
+            <code>sizeof(T)</code>
+          </entry>
+          <entry>"-mavx512vl"</entry>
+        </row>
+
+        <row>
+          <entry>
+            vectorizable types with <code>sizeof(T)</code> < 4
+          </entry>
+          <entry>"-mavx512bw" and "-mavx512vl"</entry>
+        </row>
+
+      </tbody>
+    </tgroup>
+  </table>
+</para>
+
+</section>
 </section>

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 90508a8fe83..f24a5489e8e 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -747,6 +747,7 @@ experimental_headers = \
 	${experimental_srcdir}/ratio \
 	${experimental_srcdir}/regex \
 	${experimental_srcdir}/set \
+	${experimental_srcdir}/simd \
 	${experimental_srcdir}/socket \
 	${experimental_srcdir}/source_location \
 	${experimental_srcdir}/string \
@@ -766,7 +767,19 @@ experimental_bits_builddir = ./experimental/bits
 experimental_bits_headers = \
 	${experimental_bits_srcdir}/lfts_config.h \
 	${experimental_bits_srcdir}/net.h \
+	${experimental_bits_srcdir}/numeric_traits.h \
 	${experimental_bits_srcdir}/shared_ptr.h \
+	${experimental_bits_srcdir}/simd.h \
+	${experimental_bits_srcdir}/simd_builtin.h \
+	${experimental_bits_srcdir}/simd_converter.h \
+	${experimental_bits_srcdir}/simd_detail.h \
+	${experimental_bits_srcdir}/simd_fixed_size.h \
+	${experimental_bits_srcdir}/simd_math.h \
+	${experimental_bits_srcdir}/simd_neon.h \
+	${experimental_bits_srcdir}/simd_ppc.h \
+	${experimental_bits_srcdir}/simd_scalar.h \
+	${experimental_bits_srcdir}/simd_x86.h \
+	${experimental_bits_srcdir}/simd_x86_conversions.h \
 	${experimental_bits_srcdir}/string_view.tcc \
 	${experimental_bits_filesystem_headers}

diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index 922ba440df0..12c63400706 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -1097,6 +1097,7 @@ experimental_headers = \
 	${experimental_srcdir}/ratio \
 	${experimental_srcdir}/regex \
 	${experimental_srcdir}/set \
+	${experimental_srcdir}/simd \
 	${experimental_srcdir}/socket \
 	${experimental_srcdir}/source_location \
 	${experimental_srcdir}/string \
@@ -1116,7 +1117,19 @@ experimental_bits_builddir = ./experimental/bits
 experimental_bits_headers = \
 	${experimental_bits_srcdir}/lfts_config.h \
 	${experimental_bits_srcdir}/net.h \
+	${experimental_bits_srcdir}/numeric_traits.h \
 	${experimental_bits_srcdir}/shared_ptr.h \
+	${experimental_bits_srcdir}/simd.h \
+	${experimental_bits_srcdir}/simd_builtin.h \
+	${experimental_bits_srcdir}/simd_converter.h \
+	${experimental_bits_srcdir}/simd_detail.h \
+	${experimental_bits_srcdir}/simd_fixed_size.h \
+	${experimental_bits_srcdir}/simd_math.h \
+	${experimental_bits_srcdir}/simd_neon.h \
+	${experimental_bits_srcdir}/simd_ppc.h \
+	${experimental_bits_srcdir}/simd_scalar.h \
+	${experimental_bits_srcdir}/simd_x86.h \
+	${experimental_bits_srcdir}/simd_x86_conversions.h \
 	${experimental_bits_srcdir}/string_view.tcc \
 	${experimental_bits_filesystem_headers}

diff --git a/libstdc++-v3/include/experimental/bits/numeric_traits.h b/libstdc++-v3/include/experimental/bits/numeric_traits.h
new file mode 100644
index 00000000000..1b60874b788
--- /dev/null
+++ b/libstdc++-v3/include/experimental/bits/numeric_traits.h
@@ -0,0 +1,567 @@
+// Definition of numeric_limits replacement traits P1841R1 -*- C++ -*-

+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.

+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.

+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.

+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+ +#include <type_traits> + +namespace std { + +template <template <typename> class _Trait, typename _Tp, typename = void> + struct __value_exists_impl : false_type {}; + +template <template <typename> class _Trait, typename _Tp> + struct __value_exists_impl<_Trait, _Tp, void_t<decltype(_Trait<_Tp>::value)>> + : true_type {}; + +template <typename _Tp, bool = is_arithmetic_v<_Tp>> + struct __digits_impl {}; + +template <typename _Tp> + struct __digits_impl<_Tp, true> + { + static inline constexpr int value + = sizeof(_Tp) * __CHAR_BIT__ - is_signed_v<_Tp>; + }; + +template <> + struct __digits_impl<float, true> + { static inline constexpr int value = __FLT_MANT_DIG__; }; + +template <> + struct __digits_impl<double, true> + { static inline constexpr int value = __DBL_MANT_DIG__; }; + +template <> + struct __digits_impl<long double, true> + { static inline constexpr int value = __LDBL_MANT_DIG__; }; + +template <typename _Tp, bool = is_arithmetic_v<_Tp>> + struct __digits10_impl {}; + +template <typename _Tp> + struct __digits10_impl<_Tp, true> + { + // The fraction 643/2136 approximates log10(2) to 7 significant digits. + static inline constexpr int value = __digits_impl<_Tp>::value * 643L / 2136; + }; + +template <> + struct __digits10_impl<float, true> + { static inline constexpr int value = __FLT_DIG__; }; + +template <> + struct __digits10_impl<double, true> + { static inline constexpr int value = __DBL_DIG__; }; + +template <> + struct __digits10_impl<long double, true> + { static inline constexpr int value = __LDBL_DIG__; }; + +template <typename _Tp, bool = is_arithmetic_v<_Tp>> + struct __max_digits10_impl {}; + +template <typename _Tp> + struct __max_digits10_impl<_Tp, true> + { + static inline constexpr int value + = is_floating_point_v<_Tp> ? 
2 + __digits_impl<_Tp>::value * 643L / 2136 + : __digits10_impl<_Tp>::value + 1; + }; + +template <typename _Tp> + struct __max_exponent_impl {}; + +template <> + struct __max_exponent_impl<float> + { static inline constexpr int value = __FLT_MAX_EXP__; }; + +template <> + struct __max_exponent_impl<double> + { static inline constexpr int value = __DBL_MAX_EXP__; }; + +template <> + struct __max_exponent_impl<long double> + { static inline constexpr int value = __LDBL_MAX_EXP__; }; + +template <typename _Tp> + struct __max_exponent10_impl {}; + +template <> + struct __max_exponent10_impl<float> + { static inline constexpr int value = __FLT_MAX_10_EXP__; }; + +template <> + struct __max_exponent10_impl<double> + { static inline constexpr int value = __DBL_MAX_10_EXP__; }; + +template <> + struct __max_exponent10_impl<long double> + { static inline constexpr int value = __LDBL_MAX_10_EXP__; }; + +template <typename _Tp> + struct __min_exponent_impl {}; + +template <> + struct __min_exponent_impl<float> + { static inline constexpr int value = __FLT_MIN_EXP__; }; + +template <> + struct __min_exponent_impl<double> + { static inline constexpr int value = __DBL_MIN_EXP__; }; + +template <> + struct __min_exponent_impl<long double> + { static inline constexpr int value = __LDBL_MIN_EXP__; }; + +template <typename _Tp> + struct __min_exponent10_impl {}; + +template <> + struct __min_exponent10_impl<float> + { static inline constexpr int value = __FLT_MIN_10_EXP__; }; + +template <> + struct __min_exponent10_impl<double> + { static inline constexpr int value = __DBL_MIN_10_EXP__; }; + +template <> + struct __min_exponent10_impl<long double> + { static inline constexpr int value = __LDBL_MIN_10_EXP__; }; + +template <typename _Tp, bool = is_arithmetic_v<_Tp>> + struct __radix_impl {}; + +template <typename _Tp> + struct __radix_impl<_Tp, true> + { + static inline constexpr int value + = is_floating_point_v<_Tp> ? 
__FLT_RADIX__ : 2; + }; + +// [num.traits.util], numeric utility traits +template <template <typename> class _Trait, typename _Tp> + struct __value_exists : __value_exists_impl<_Trait, _Tp> {}; + +template <template <typename> class _Trait, typename _Tp> + inline constexpr bool __value_exists_v = __value_exists<_Trait, _Tp>::value; + +template <template <typename> class _Trait, typename _Tp, typename _Up = _Tp> + inline constexpr _Up + __value_or(_Up __def = _Up()) noexcept + { + if constexpr (__value_exists_v<_Trait, _Tp>) + return static_cast<_Up>(_Trait<_Tp>::value); + else + return __def; + } + +template <typename _Tp, bool = is_arithmetic_v<_Tp>> + struct __norm_min_impl {}; + +template <typename _Tp> + struct __norm_min_impl<_Tp, true> + { static inline constexpr _Tp value = 1; }; + +template <> + struct __norm_min_impl<float, true> + { static inline constexpr float value = __FLT_MIN__; }; + +template <> + struct __norm_min_impl<double, true> + { static inline constexpr double value = __DBL_MIN__; }; + +template <> + struct __norm_min_impl<long double, true> + { static inline constexpr long double value = __LDBL_MIN__; }; + +template <typename _Tp> + struct __denorm_min_impl : __norm_min_impl<_Tp> {}; + +#if __FLT_HAS_DENORM__ +template <> + struct __denorm_min_impl<float> + { static inline constexpr float value = __FLT_DENORM_MIN__; }; +#endif + +#if __DBL_HAS_DENORM__ +template <> + struct __denorm_min_impl<double> + { static inline constexpr double value = __DBL_DENORM_MIN__; }; +#endif + +#if __LDBL_HAS_DENORM__ +template <> + struct __denorm_min_impl<long double> + { static inline constexpr long double value = __LDBL_DENORM_MIN__; }; +#endif + +template <typename _Tp> + struct __epsilon_impl {}; + +template <> + struct __epsilon_impl<float> + { static inline constexpr float value = __FLT_EPSILON__; }; + +template <> + struct __epsilon_impl<double> + { static inline constexpr double value = __DBL_EPSILON__; }; + +template <> + struct __epsilon_impl<long double> + { static inline constexpr long double value = __LDBL_EPSILON__; }; + +template <typename _Tp, bool = is_arithmetic_v<_Tp>> + struct __finite_min_impl {}; + +template <typename _Tp> + struct __finite_min_impl<_Tp, true> + { + static inline constexpr _Tp value + = is_unsigned_v<_Tp> ? 
_Tp() + : -2 * (_Tp(1) << __digits_impl<_Tp>::value - 1); + }; + +template <> + struct __finite_min_impl<float, true> + { static inline constexpr float value = -__FLT_MAX__; }; + +template <> + struct __finite_min_impl<double, true> + { static inline constexpr double value = -__DBL_MAX__; }; + +template <> + struct __finite_min_impl<long double, true> + { static inline constexpr long double value = -__LDBL_MAX__; }; + +template <typename _Tp, bool = is_arithmetic_v<_Tp>> + struct __finite_max_impl {}; + +template <typename _Tp> + struct __finite_max_impl<_Tp, true> + { static inline constexpr _Tp value = ~__finite_min_impl<_Tp>::value; }; + +template <> + struct __finite_max_impl<float, true> + { static inline constexpr float value = __FLT_MAX__; }; + +template <> + struct __finite_max_impl<double, true> + { static inline constexpr double value = __DBL_MAX__; }; + +template <> + struct __finite_max_impl<long double, true> + { static inline constexpr long double value = __LDBL_MAX__; }; + +template <typename _Tp> + struct __infinity_impl {}; + +#if __FLT_HAS_INFINITY__ +template <> + struct __infinity_impl<float> + { static inline constexpr float value = __builtin_inff(); }; +#endif + +#if __DBL_HAS_INFINITY__ +template <> + struct __infinity_impl<double> + { static inline constexpr double value = __builtin_inf(); }; +#endif + +#if __LDBL_HAS_INFINITY__ +template <> + struct __infinity_impl<long double> + { static inline constexpr long double value = __builtin_infl(); }; +#endif + +template <typename _Tp> + struct __quiet_NaN_impl {}; + +#if __FLT_HAS_QUIET_NAN__ +template <> + struct __quiet_NaN_impl<float> + { static inline constexpr float value = __builtin_nanf(""); }; +#endif + +#if __DBL_HAS_QUIET_NAN__ +template <> + struct __quiet_NaN_impl<double> + { static inline constexpr double value = __builtin_nan(""); }; +#endif + +#if __LDBL_HAS_QUIET_NAN__ +template <> + struct __quiet_NaN_impl<long double> + { static inline constexpr long double value = __builtin_nanl(""); }; +#endif + +template <typename _Tp, bool = is_floating_point_v<_Tp>> + struct __reciprocal_overflow_threshold_impl {}; + +template <typename _Tp> + struct __reciprocal_overflow_threshold_impl<_Tp, true> + { + // This typically yields a subnormal value. Is this incorrect for + // flush-to-zero configurations? + static constexpr _Tp _S_search(_Tp __ok, _Tp __overflows) + { + const _Tp __mid = (__ok + __overflows) / 2; + // 1/__mid without -ffast-math is not a constant expression if it + // overflows. Therefore divide 1 by the radix before division. + // Consequently finite_max (the threshold) must be scaled by the + // same value. 
+ if (__mid == __ok || __mid == __overflows) + return __ok; + else if (_Tp(1) / (__radix_impl<_Tp>::value * __mid) + <= __finite_max_impl<_Tp>::value / __radix_impl<_Tp>::value) + return _S_search(__mid, __overflows); + else + return _S_search(__ok, __mid); + } + + static inline constexpr _Tp value + = _S_search(_Tp(1.01) / __finite_max_impl<_Tp>::value, + _Tp(0.99) / __finite_max_impl<_Tp>::value); + }; + +template <typename _Tp, bool = is_floating_point_v<_Tp>> + struct __round_error_impl {}; + +template <typename _Tp> + struct __round_error_impl<_Tp, true> + { static inline constexpr _Tp value = 0.5; }; + +template <typename _Tp> + struct __signaling_NaN_impl {}; + +#if __FLT_HAS_QUIET_NAN__ +template <> + struct __signaling_NaN_impl<float> + { static inline constexpr float value = __builtin_nansf(""); }; +#endif + +#if __DBL_HAS_QUIET_NAN__ +template <> + struct __signaling_NaN_impl<double> + { static inline constexpr double value = __builtin_nans(""); }; +#endif + +#if __LDBL_HAS_QUIET_NAN__ +template <> + struct __signaling_NaN_impl<long double> + { static inline constexpr long double value = __builtin_nansl(""); }; +#endif + +// [num.traits.val], numeric distinguished value traits +template <typename _Tp> + struct __denorm_min : __denorm_min_impl<remove_cv_t<_Tp>> {}; + +template <typename _Tp> + struct __epsilon : __epsilon_impl<remove_cv_t<_Tp>> {}; + +template <typename _Tp> + struct __finite_max : __finite_max_impl<remove_cv_t<_Tp>> {}; + +template <typename _Tp> + struct __finite_min : __finite_min_impl<remove_cv_t<_Tp>> {}; + +template <typename _Tp> + struct __infinity : __infinity_impl<remove_cv_t<_Tp>> {}; + +template <typename _Tp> + struct __norm_min : __norm_min_impl<remove_cv_t<_Tp>> {}; + +template <typename _Tp> + struct __quiet_NaN : __quiet_NaN_impl<remove_cv_t<_Tp>> {}; + +template <typename _Tp> + struct __reciprocal_overflow_threshold + : __reciprocal_overflow_threshold_impl<remove_cv_t<_Tp>> {}; + +template <typename _Tp> + struct __round_error : __round_error_impl<remove_cv_t<_Tp>> {}; + +template <typename _Tp> + struct __signaling_NaN : __signaling_NaN_impl<remove_cv_t<_Tp>> {}; + +template <typename _Tp> + inline constexpr auto __denorm_min_v = __denorm_min<_Tp>::value; + +template <typename _Tp> + inline constexpr auto __epsilon_v = __epsilon<_Tp>::value; + +template <typename _Tp> + inline constexpr auto __finite_max_v = __finite_max<_Tp>::value; + +template <typename _Tp> + inline constexpr auto __finite_min_v = __finite_min<_Tp>::value; + +template <typename _Tp> + inline constexpr auto __infinity_v = __infinity<_Tp>::value; + +template <typename _Tp> + inline constexpr auto __norm_min_v = __norm_min<_Tp>::value; + +template <typename _Tp> + inline constexpr auto __quiet_NaN_v = __quiet_NaN<_Tp>::value; + +template <typename _Tp> + inline constexpr auto __reciprocal_overflow_threshold_v + = __reciprocal_overflow_threshold<_Tp>::value; + +template <typename _Tp> + inline constexpr auto __round_error_v = __round_error<_Tp>::value; + +template <typename _Tp> + inline constexpr auto __signaling_NaN_v = __signaling_NaN<_Tp>::value; + +// [num.traits.char], numeric characteristics traits +template <typename _Tp> + struct __digits : __digits_impl<remove_cv_t<_Tp>> {}; + +template <typename _Tp> + struct __digits10 : __digits10_impl<remove_cv_t<_Tp>> {}; + +template <typename _Tp> + struct __max_digits10 : __max_digits10_impl<remove_cv_t<_Tp>> {}; + +template <typename _Tp> + struct __max_exponent : __max_exponent_impl<remove_cv_t<_Tp>> {}; + +template <typename 
_Tp> + struct __max_exponent10 : __max_exponent10_impl<remove_cv_t<_Tp>> {}; + +template <typename _Tp> + struct __min_exponent : __min_exponent_impl<remove_cv_t<_Tp>> {}; + +template <typename _Tp> + struct __min_exponent10 : __min_exponent10_impl<remove_cv_t<_Tp>> {}; + +template <typename _Tp> + struct __radix : __radix_impl<remove_cv_t<_Tp>> {}; + +template <typename _Tp> + inline constexpr auto __digits_v = __digits<_Tp>::value; + +template <typename _Tp> + inline constexpr auto __digits10_v = __digits10<_Tp>::value; + +template <typename _Tp> + inline constexpr auto __max_digits10_v = __max_digits10<_Tp>::value; + +template <typename _Tp> + inline constexpr auto __max_exponent_v = __max_exponent<_Tp>::value; + +template <typename _Tp> + inline constexpr auto __max_exponent10_v = __max_exponent10<_Tp>::value; + +template <typename _Tp> + inline constexpr auto __min_exponent_v = __min_exponent<_Tp>::value; + +template <typename _Tp> + inline constexpr auto __min_exponent10_v = __min_exponent10<_Tp>::value; + +template <typename _Tp> + inline constexpr auto __radix_v = __radix<_Tp>::value; + +// mkretz's extensions +// TODO: does GCC tell me? __GCC_IEC_559 >= 2 is not the right answer +template <typename _Tp> + struct __has_iec559_storage_format : true_type {}; + +template <typename _Tp> + inline constexpr bool __has_iec559_storage_format_v + = __has_iec559_storage_format<_Tp>::value; + +/* To propose: + If __has_iec559_behavior<__quiet_NaN, T> is true the following holds: + - nan == nan is false + - isnan(nan) is true + - isnan(nan + x) is true + - isnan(inf/inf) is true + - isnan(0/0) is true + - isunordered(nan, x) is true + + If __has_iec559_behavior<__infinity, T> is true the following holds (x is + neither nan nor inf): + - isinf(inf) is true + - isinf(inf + x) is true + - isinf(1/0) is true + */ +template <template <typename> class _Trait, typename _Tp> + struct __has_iec559_behavior : false_type {}; + +template <template <typename> class _Trait, typename _Tp> + inline constexpr bool __has_iec559_behavior_v + = __has_iec559_behavior<_Trait, _Tp>::value; + +#if !__FINITE_MATH_ONLY__ +#if __FLT_HAS_QUIET_NAN__ +template <> + struct __has_iec559_behavior<__quiet_NaN, float> : true_type {}; +#endif + +#if __DBL_HAS_QUIET_NAN__ +template <> + struct __has_iec559_behavior<__quiet_NaN, double> : true_type {}; +#endif + +#if __LDBL_HAS_QUIET_NAN__ +template <> + struct __has_iec559_behavior<__quiet_NaN, long double> : true_type {}; +#endif + +#if __FLT_HAS_INFINITY__ +template <> + struct __has_iec559_behavior<__infinity, float> : true_type {}; +#endif + +#if __DBL_HAS_INFINITY__ +template <> + struct __has_iec559_behavior<__infinity, double> : true_type {}; +#endif + +#if __LDBL_HAS_INFINITY__ +template <> + struct __has_iec559_behavior<__infinity, long double> : true_type {}; +#endif + +#ifdef __SUPPORT_SNAN__ +#if __FLT_HAS_QUIET_NAN__ +template <> + struct __has_iec559_behavior<__signaling_NaN, float> : true_type {}; +#endif + +#if __DBL_HAS_QUIET_NAN__ +template <> + struct __has_iec559_behavior<__signaling_NaN, double> : true_type {}; +#endif + +#if __LDBL_HAS_QUIET_NAN__ +template <> + struct __has_iec559_behavior<__signaling_NaN, long double> : true_type {}; +#endif + +#endif +#endif // __FINITE_MATH_ONLY__ + +} // namespace std diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h new file mode 100644 index 00000000000..00eec50d64f --- /dev/null +++ b/libstdc++-v3/include/experimental/bits/simd.h @@ -0,0 +1,5051 @@ +// 
Definition of the public simd interfaces -*- C++ -*- + +// Copyright (C) 2020 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// Under Section 7 of GPL version 3, you are granted additional +// permissions described in the GCC Runtime Library Exception, version +// 3.1, as published by the Free Software Foundation. + +// You should have received a copy of the GNU General Public License and +// a copy of the GCC Runtime Library Exception along with this program; +// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +// <http://www.gnu.org/licenses/>. + +#ifndef _GLIBCXX_EXPERIMENTAL_SIMD_H +#define _GLIBCXX_EXPERIMENTAL_SIMD_H + +#if __cplusplus >= 201703L + +#include "simd_detail.h" +#include "numeric_traits.h" +#include <bit> +#include <bitset> +#ifdef _GLIBCXX_DEBUG_UB +#include <cstdio> // for stderr +#endif +#include <cstring> +#include <functional> +#include <iosfwd> +#include <utility> + +#if _GLIBCXX_SIMD_X86INTRIN +#include <x86intrin.h> +#elif _GLIBCXX_SIMD_HAVE_NEON +#include <arm_neon.h> +#endif + +/* There are several closely related types, with the following naming + * convention: + * _Tp: vectorizable (arithmetic) type (or any type) + * _TV: __vector_type_t<_Tp, _Np> + * _TW: _SimdWrapper<_Tp, _Np> + * _TI: __intrinsic_type_t<_Tp, _Np> + * _TVT: _VectorTraits<_TV> or _VectorTraits<_TW> + * If one additional type is needed use _U instead of _T. + * Otherwise use _T\d, _TV\d, _TW\d, TI\d, _TVT\d. + * + * More naming conventions: + * _Ap or _Abi: An ABI tag from the simd_abi namespace + * _Ip: often used for integer types with sizeof(_Ip) == sizeof(_Tp), + * _IV, _IW as for _TV, _TW + * _Np: number of elements (not bytes) + * _Bytes: number of bytes + * + * Variable names: + * __k: mask object (vector- or bitmask) + */ +_GLIBCXX_SIMD_BEGIN_NAMESPACE + +#if !_GLIBCXX_SIMD_X86INTRIN +using __m128 [[__gnu__::__vector_size__(16)]] = float; +using __m128d [[__gnu__::__vector_size__(16)]] = double; +using __m128i [[__gnu__::__vector_size__(16)]] = long long; +using __m256 [[__gnu__::__vector_size__(32)]] = float; +using __m256d [[__gnu__::__vector_size__(32)]] = double; +using __m256i [[__gnu__::__vector_size__(32)]] = long long; +using __m512 [[__gnu__::__vector_size__(64)]] = float; +using __m512d [[__gnu__::__vector_size__(64)]] = double; +using __m512i [[__gnu__::__vector_size__(64)]] = long long; +#endif + +namespace simd_abi { +// simd_abi forward declarations {{{ +// implementation details: +struct _Scalar; + +template <int _Np> + struct _Fixed; + +// There are two major ABIs that appear on different architectures. +// Both have non-boolean values packed into an N Byte register +// -> #elements = N / sizeof(T) +// Masks differ: +// 1. Use value vector registers for masks (all 0 or all 1) +// 2. Use bitmasks (mask registers) with one bit per value in the corresponding +// value vector +// +// Both can be partially used, masking off the rest when doing horizontal +// operations or operations that can trap (e.g. 
FP_INVALID or integer division +// by 0). This is encoded as the number of used bytes. +template <int _UsedBytes> + struct _VecBuiltin; + +template <int _UsedBytes> + struct _VecBltnBtmsk; + +template <typename _Tp, int _Np> + using _VecN = _VecBuiltin<sizeof(_Tp) * _Np>; + +template <int _UsedBytes = 16> + using _Sse = _VecBuiltin<_UsedBytes>; + +template <int _UsedBytes = 32> + using _Avx = _VecBuiltin<_UsedBytes>; + +template <int _UsedBytes = 64> + using _Avx512 = _VecBltnBtmsk<_UsedBytes>; + +template <int _UsedBytes = 16> + using _Neon = _VecBuiltin<_UsedBytes>; + +// implementation-defined: +using __sse = _Sse<>; +using __avx = _Avx<>; +using __avx512 = _Avx512<>; +using __neon = _Neon<>; +using __neon128 = _Neon<16>; +using __neon64 = _Neon<8>; + +// standard: +template <typename _Tp, size_t _Np, typename...> + struct deduce; + +template <int _Np> + using fixed_size = _Fixed<_Np>; + +using scalar = _Scalar; + +// }}} +} // namespace simd_abi +// forward declarations is_simd(_mask), simd(_mask), simd_size {{{ +template <typename _Tp> + struct is_simd; + +template <typename _Tp> + struct is_simd_mask; + +template <typename _Tp, typename _Abi> + class simd; + +template <typename _Tp, typename _Abi> + class simd_mask; + +template <typename _Tp, typename _Abi> + struct simd_size; + +// }}} +// load/store flags {{{ +struct element_aligned_tag +{ + template <typename _Tp, typename _Up = typename _Tp::value_type> + static constexpr size_t _S_alignment = alignof(_Up); + + template <typename _Tp, typename _Up> + _GLIBCXX_SIMD_INTRINSIC static constexpr _Up* + _S_apply(_Up* __ptr) + { return __ptr; } +}; + +struct vector_aligned_tag +{ + template <typename _Tp, typename _Up = typename _Tp::value_type> + static constexpr size_t _S_alignment + = std::__bit_ceil(sizeof(_Up) * _Tp::size()); + + template <typename _Tp, typename _Up> + _GLIBCXX_SIMD_INTRINSIC static constexpr _Up* + _S_apply(_Up* __ptr) + { + return static_cast<_Up*>( + __builtin_assume_aligned(__ptr, _S_alignment<_Tp, _Up>)); + } +}; + +template <size_t _Np> struct overaligned_tag +{ + template <typename _Tp, typename _Up = typename _Tp::value_type> + static constexpr size_t _S_alignment = _Np; + + template <typename _Tp, typename _Up> + _GLIBCXX_SIMD_INTRINSIC static constexpr _Up* + _S_apply(_Up* __ptr) + { return static_cast<_Up*>(__builtin_assume_aligned(__ptr, _Np)); } +}; + +inline constexpr element_aligned_tag element_aligned = {}; + +inline constexpr vector_aligned_tag vector_aligned = {}; + +template <size_t _Np> + inline constexpr overaligned_tag<_Np> overaligned = {}; + +// }}} +template <size_t _X> + using _SizeConstant = integral_constant<size_t, _X>; + +// unrolled/pack execution helpers +// __execute_n_times{{{ +template <typename _Fp, size_t... _I> + _GLIBCXX_SIMD_INTRINSIC constexpr void + __execute_on_index_sequence(_Fp&& __f, index_sequence<_I...>) + { ((void)__f(_SizeConstant<_I>()), ...); } + +template <typename _Fp> + _GLIBCXX_SIMD_INTRINSIC constexpr void + __execute_on_index_sequence(_Fp&&, index_sequence<>) + { } + +template <size_t _Np, typename _Fp> + _GLIBCXX_SIMD_INTRINSIC constexpr void + __execute_n_times(_Fp&& __f) + { + __execute_on_index_sequence(static_cast<_Fp&&>(__f), + make_index_sequence<_Np>{}); + } + +// }}} +// __generate_from_n_evaluations{{{ +template <typename _R, typename _Fp, size_t... 
_I> + _GLIBCXX_SIMD_INTRINSIC constexpr _R + __execute_on_index_sequence_with_return(_Fp&& __f, index_sequence<_I...>) + { return _R{__f(_SizeConstant<_I>())...}; } + +template <size_t _Np, typename _R, typename _Fp> + _GLIBCXX_SIMD_INTRINSIC constexpr _R + __generate_from_n_evaluations(_Fp&& __f) + { + return __execute_on_index_sequence_with_return<_R>( + static_cast<_Fp&&>(__f), make_index_sequence<_Np>{}); + } + +// }}} +// __call_with_n_evaluations{{{ +template <size_t... _I, typename _F0, typename _FArgs> + _GLIBCXX_SIMD_INTRINSIC constexpr auto + __call_with_n_evaluations(index_sequence<_I...>, _F0&& __f0, _FArgs&& __fargs) + { return __f0(__fargs(_SizeConstant<_I>())...); } + +template <size_t _Np, typename _F0, typename _FArgs> + _GLIBCXX_SIMD_INTRINSIC constexpr auto + __call_with_n_evaluations(_F0&& __f0, _FArgs&& __fargs) + { + return __call_with_n_evaluations(make_index_sequence<_Np>{}, + static_cast<_F0&&>(__f0), + static_cast<_FArgs&&>(__fargs)); + } + +// }}} +// __call_with_subscripts{{{ +template <size_t _First = 0, size_t... _It, typename _Tp, typename _Fp> + _GLIBCXX_SIMD_INTRINSIC constexpr auto + __call_with_subscripts(_Tp&& __x, index_sequence<_It...>, _Fp&& __fun) + { return __fun(__x[_First + _It]...); } + +template <size_t _Np, size_t _First = 0, typename _Tp, typename _Fp> + _GLIBCXX_SIMD_INTRINSIC constexpr auto + __call_with_subscripts(_Tp&& __x, _Fp&& __fun) + { + return __call_with_subscripts<_First>(static_cast<_Tp&&>(__x), + make_index_sequence<_Np>(), + static_cast<_Fp&&>(__fun)); + } + +// }}} + +// vvv ---- type traits ---- vvv +// integer type aliases{{{ +using _UChar = unsigned char; +using _SChar = signed char; +using _UShort = unsigned short; +using _UInt = unsigned int; +using _ULong = unsigned long; +using _ULLong = unsigned long long; +using _LLong = long long; + +//}}} +// __first_of_pack{{{ +template <typename _T0, typename...> + struct __first_of_pack + { using type = _T0; }; + +template <typename... _Ts> + using __first_of_pack_t = typename __first_of_pack<_Ts...>::type; + +//}}} +// __value_type_or_identity_t {{{ +template <typename _Tp> + typename _Tp::value_type + __value_type_or_identity_impl(int); + +template <typename _Tp> + _Tp + __value_type_or_identity_impl(float); + +template <typename _Tp> + using __value_type_or_identity_t + = decltype(__value_type_or_identity_impl<_Tp>(int())); + +// }}} +// __is_vectorizable {{{ +template <typename _Tp> + struct __is_vectorizable : public is_arithmetic<_Tp> {}; + +template <> + struct __is_vectorizable<bool> : public false_type {}; + +template <typename _Tp> + inline constexpr bool __is_vectorizable_v = __is_vectorizable<_Tp>::value; + +// Deduces to a vectorizable type +template <typename _Tp, typename = enable_if_t<__is_vectorizable_v<_Tp>>> + using _Vectorizable = _Tp; + +// }}} +// _LoadStorePtr / __is_possible_loadstore_conversion {{{ +template <typename _Ptr, typename _ValueType> + struct __is_possible_loadstore_conversion + : conjunction<__is_vectorizable<_Ptr>, __is_vectorizable<_ValueType>> {}; + +template <> + struct __is_possible_loadstore_conversion<bool, bool> : true_type {}; + +// Deduces to a type allowed for load/store with the given value type. 
+template <typename _Ptr, typename _ValueType, + typename = enable_if_t< + __is_possible_loadstore_conversion<_Ptr, _ValueType>::value>> + using _LoadStorePtr = _Ptr; + +// }}} +// __is_bitmask{{{ +template <typename _Tp, typename = void_t<>> + struct __is_bitmask : false_type {}; + +template <typename _Tp> + inline constexpr bool __is_bitmask_v = __is_bitmask<_Tp>::value; + +// the __mmaskXX case: +template <typename _Tp> + struct __is_bitmask<_Tp, + void_t<decltype(declval<unsigned&>() = declval<_Tp>() & 1u)>> + : true_type {}; + +// }}} +// __int_for_sizeof{{{ +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored "-Wpedantic" +template <size_t _Bytes> + constexpr auto + __int_for_sizeof() + { + if constexpr (_Bytes == sizeof(int)) + return int(); + #ifdef __clang__ + else if constexpr (_Bytes == sizeof(char)) + return char(); + #else + else if constexpr (_Bytes == sizeof(_SChar)) + return _SChar(); + #endif + else if constexpr (_Bytes == sizeof(short)) + return short(); + #ifndef __clang__ + else if constexpr (_Bytes == sizeof(long)) + return long(); + #endif + else if constexpr (_Bytes == sizeof(_LLong)) + return _LLong(); + #ifdef __SIZEOF_INT128__ + else if constexpr (_Bytes == sizeof(__int128)) + return __int128(); + #endif // __SIZEOF_INT128__ + else if constexpr (_Bytes % sizeof(int) == 0) + { + constexpr size_t _Np = _Bytes / sizeof(int); + struct _Ip + { + int _M_data[_Np]; + + _GLIBCXX_SIMD_INTRINSIC constexpr _Ip + operator&(_Ip __rhs) const + { + return __generate_from_n_evaluations<_Np, _Ip>( + [&](auto __i) { return __rhs._M_data[__i] & _M_data[__i]; }); + } + + _GLIBCXX_SIMD_INTRINSIC constexpr _Ip + operator|(_Ip __rhs) const + { + return __generate_from_n_evaluations<_Np, _Ip>( + [&](auto __i) { return __rhs._M_data[__i] | _M_data[__i]; }); + } + + _GLIBCXX_SIMD_INTRINSIC constexpr _Ip + operator^(_Ip __rhs) const + { + return __generate_from_n_evaluations<_Np, _Ip>( + [&](auto __i) { return __rhs._M_data[__i] ^ _M_data[__i]; }); + } + + _GLIBCXX_SIMD_INTRINSIC constexpr _Ip + operator~() const + { + return __generate_from_n_evaluations<_Np, _Ip>( + [&](auto __i) { return ~_M_data[__i]; }); + } + }; + return _Ip{}; + } + else + static_assert(_Bytes != _Bytes, "this should be unreachable"); + } +#pragma GCC diagnostic pop + +template <typename _Tp> + using __int_for_sizeof_t = decltype(__int_for_sizeof<sizeof(_Tp)>()); + +template <size_t _Np> + using __int_with_sizeof_t = decltype(__int_for_sizeof<_Np>()); + +// }}} +// __is_fixed_size_abi{{{ +template <typename _Tp> + struct __is_fixed_size_abi : false_type {}; + +template <int _Np> + struct __is_fixed_size_abi<simd_abi::fixed_size<_Np>> : true_type {}; + +template <typename _Tp> + inline constexpr bool __is_fixed_size_abi_v = __is_fixed_size_abi<_Tp>::value; + +// }}} +// constexpr feature detection{{{ +constexpr inline bool __have_mmx = _GLIBCXX_SIMD_HAVE_MMX; +constexpr inline bool __have_sse = _GLIBCXX_SIMD_HAVE_SSE; +constexpr inline bool __have_sse2 = _GLIBCXX_SIMD_HAVE_SSE2; +constexpr inline bool __have_sse3 = _GLIBCXX_SIMD_HAVE_SSE3; +constexpr inline bool __have_ssse3 = _GLIBCXX_SIMD_HAVE_SSSE3; +constexpr inline bool __have_sse4_1 = _GLIBCXX_SIMD_HAVE_SSE4_1; +constexpr inline bool __have_sse4_2 = _GLIBCXX_SIMD_HAVE_SSE4_2; +constexpr inline bool __have_xop = _GLIBCXX_SIMD_HAVE_XOP; +constexpr inline bool __have_avx = _GLIBCXX_SIMD_HAVE_AVX; +constexpr inline bool __have_avx2 = _GLIBCXX_SIMD_HAVE_AVX2; +constexpr inline bool __have_bmi = _GLIBCXX_SIMD_HAVE_BMI1; +constexpr inline bool __have_bmi2 = 
_GLIBCXX_SIMD_HAVE_BMI2; +constexpr inline bool __have_lzcnt = _GLIBCXX_SIMD_HAVE_LZCNT; +constexpr inline bool __have_sse4a = _GLIBCXX_SIMD_HAVE_SSE4A; +constexpr inline bool __have_fma = _GLIBCXX_SIMD_HAVE_FMA; +constexpr inline bool __have_fma4 = _GLIBCXX_SIMD_HAVE_FMA4; +constexpr inline bool __have_f16c = _GLIBCXX_SIMD_HAVE_F16C; +constexpr inline bool __have_popcnt = _GLIBCXX_SIMD_HAVE_POPCNT; +constexpr inline bool __have_avx512f = _GLIBCXX_SIMD_HAVE_AVX512F; +constexpr inline bool __have_avx512dq = _GLIBCXX_SIMD_HAVE_AVX512DQ; +constexpr inline bool __have_avx512vl = _GLIBCXX_SIMD_HAVE_AVX512VL; +constexpr inline bool __have_avx512bw = _GLIBCXX_SIMD_HAVE_AVX512BW; +constexpr inline bool __have_avx512dq_vl = __have_avx512dq && __have_avx512vl; +constexpr inline bool __have_avx512bw_vl = __have_avx512bw && __have_avx512vl; + +constexpr inline bool __have_neon = _GLIBCXX_SIMD_HAVE_NEON; +constexpr inline bool __have_neon_a32 = _GLIBCXX_SIMD_HAVE_NEON_A32; +constexpr inline bool __have_neon_a64 = _GLIBCXX_SIMD_HAVE_NEON_A64; +constexpr inline bool __support_neon_float = +#if defined __GCC_IEC_559 + __GCC_IEC_559 == 0; +#elif defined __FAST_MATH__ + true; +#else + false; +#endif + +#ifdef __POWER9_VECTOR__ +constexpr inline bool __have_power9vec = true; +#else +constexpr inline bool __have_power9vec = false; +#endif +#if defined __POWER8_VECTOR__ +constexpr inline bool __have_power8vec = true; +#else +constexpr inline bool __have_power8vec = __have_power9vec; +#endif +#if defined __VSX__ +constexpr inline bool __have_power_vsx = true; +#else +constexpr inline bool __have_power_vsx = __have_power8vec; +#endif +#if defined __ALTIVEC__ +constexpr inline bool __have_power_vmx = true; +#else +constexpr inline bool __have_power_vmx = __have_power_vsx; +#endif + +// }}} +// __is_scalar_abi {{{ +template <typename _Abi> + constexpr bool + __is_scalar_abi() + { return is_same_v<simd_abi::scalar, _Abi>; } + +// }}} +// __abi_bytes_v {{{ +template <template <int> class _Abi, int _Bytes> + constexpr int + __abi_bytes_impl(_Abi<_Bytes>*) + { return _Bytes; } + +template <typename _Tp> + constexpr int + __abi_bytes_impl(_Tp*) + { return -1; } + +template <typename _Abi> + inline constexpr int __abi_bytes_v + = __abi_bytes_impl(static_cast<_Abi*>(nullptr)); + +// }}} +// __is_builtin_bitmask_abi {{{ +template <typename _Abi> + constexpr bool + __is_builtin_bitmask_abi() + { return is_same_v<simd_abi::_VecBltnBtmsk<__abi_bytes_v<_Abi>>, _Abi>; } + +// }}} +// __is_sse_abi {{{ +template <typename _Abi> + constexpr bool + __is_sse_abi() + { + constexpr auto _Bytes = __abi_bytes_v<_Abi>; + return _Bytes <= 16 && is_same_v<simd_abi::_VecBuiltin<_Bytes>, _Abi>; + } + +// }}} +// __is_avx_abi {{{ +template <typename _Abi> + constexpr bool + __is_avx_abi() + { + constexpr auto _Bytes = __abi_bytes_v<_Abi>; + return _Bytes > 16 && _Bytes <= 32 + && is_same_v<simd_abi::_VecBuiltin<_Bytes>, _Abi>; + } + +// }}} +// __is_avx512_abi {{{ +template <typename _Abi> + constexpr bool + __is_avx512_abi() + { + constexpr auto _Bytes = __abi_bytes_v<_Abi>; + return _Bytes <= 64 && is_same_v<simd_abi::_Avx512<_Bytes>, _Abi>; + } + +// }}} +// __is_neon_abi {{{ +template <typename _Abi> + constexpr bool + __is_neon_abi() + { + constexpr auto _Bytes = __abi_bytes_v<_Abi>; + return _Bytes <= 16 && is_same_v<simd_abi::_VecBuiltin<_Bytes>, _Abi>; + } + +// }}} +// __make_dependent_t {{{ +template <typename, typename _Up> + struct __make_dependent + { using type = _Up; }; + +template <typename _Tp, typename _Up> + using 
__make_dependent_t = typename __make_dependent<_Tp, _Up>::type; + +// }}} +// ^^^ ---- type traits ---- ^^^ + +// __invoke_ub{{{ +template <typename... _Args> + [[noreturn]] _GLIBCXX_SIMD_ALWAYS_INLINE void + __invoke_ub([[maybe_unused]] const char* __msg, + [[maybe_unused]] const _Args&... __args) + { +#ifdef _GLIBCXX_DEBUG_UB + __builtin_fprintf(stderr, __msg, __args...); + __builtin_trap(); +#else + __builtin_unreachable(); +#endif + } + +// }}} +// __assert_unreachable{{{ +template <typename _Tp> + struct __assert_unreachable + { static_assert(!is_same_v<_Tp, _Tp>, "this should be unreachable"); }; + +// }}} +// __size_or_zero_v {{{ +template <typename _Tp, typename _Ap, size_t _Np = simd_size<_Tp, _Ap>::value> + constexpr size_t + __size_or_zero_dispatch(int) + { return _Np; } + +template <typename _Tp, typename _Ap> + constexpr size_t + __size_or_zero_dispatch(float) + { return 0; } + +template <typename _Tp, typename _Ap> + inline constexpr size_t __size_or_zero_v + = __size_or_zero_dispatch<_Tp, _Ap>(0); + +// }}} +// __div_roundup {{{ +inline constexpr size_t +__div_roundup(size_t __a, size_t __b) +{ return (__a + __b - 1) / __b; } + +// }}} +// _ExactBool{{{ +class _ExactBool +{ + const bool _M_data; + +public: + _GLIBCXX_SIMD_INTRINSIC constexpr _ExactBool(bool __b) : _M_data(__b) {} + + _ExactBool(int) = delete; + + _GLIBCXX_SIMD_INTRINSIC constexpr operator bool() const { return _M_data; } +}; + +// }}} +// __may_alias{{{ +/**@internal + * Helper __may_alias<_Tp> that turns _Tp into the type to be used for an + * aliasing pointer. This adds the __may_alias attribute to _Tp (with compilers + * that support it). + */ +template <typename _Tp> + using __may_alias [[__gnu__::__may_alias__]] = _Tp; + +// }}} +// _UnsupportedBase {{{ +// simd and simd_mask base for unsupported <_Tp, _Abi> +struct _UnsupportedBase +{ + _UnsupportedBase() = delete; + _UnsupportedBase(const _UnsupportedBase&) = delete; + _UnsupportedBase& operator=(const _UnsupportedBase&) = delete; + ~_UnsupportedBase() = delete; +}; + +// }}} +// _InvalidTraits {{{ +/** + * @internal + * Defines the implementation of __a given <_Tp, _Abi>. + * + * Implementations must ensure that only valid <_Tp, _Abi> instantiations are + * possible. Static assertions in the type definition do not suffice. It is + * important that SFINAE works. + */ +struct _InvalidTraits +{ + using _IsValid = false_type; + using _SimdBase = _UnsupportedBase; + using _MaskBase = _UnsupportedBase; + + static constexpr size_t _S_full_size = 0; + static constexpr bool _S_is_partial = false; + + static constexpr size_t _S_simd_align = 1; + struct _SimdImpl; + struct _SimdMember {}; + struct _SimdCastType; + + static constexpr size_t _S_mask_align = 1; + struct _MaskImpl; + struct _MaskMember {}; + struct _MaskCastType; +}; + +// }}} +// _SimdTraits {{{ +template <typename _Tp, typename _Abi, typename = void_t<>> + struct _SimdTraits : _InvalidTraits {}; + +// }}} +// __private_init, __bitset_init{{{ +/** + * @internal + * Tag used for private init constructor of simd and simd_mask + */ +inline constexpr struct _PrivateInit {} __private_init = {}; + +inline constexpr struct _BitsetInit {} __bitset_init = {}; + +// }}} +// __is_narrowing_conversion<_From, _To>{{{ +template <typename _From, typename _To, bool = is_arithmetic_v<_From>, + bool = is_arithmetic_v<_To>> + struct __is_narrowing_conversion; + +// ignore "signed/unsigned mismatch" in the following trait. +// The implicit conversions will do the right thing here. 
+template <typename _From, typename _To> + struct __is_narrowing_conversion<_From, _To, true, true> + : public __bool_constant<( + __digits_v<_From> > __digits_v<_To> + || __finite_max_v<_From> > __finite_max_v<_To> + || __finite_min_v<_From> < __finite_min_v<_To> + || (is_signed_v<_From> && is_unsigned_v<_To>))> {}; + +template <typename _Tp> + struct __is_narrowing_conversion<_Tp, bool, true, true> + : public true_type {}; + +template <> + struct __is_narrowing_conversion<bool, bool, true, true> + : public false_type {}; + +template <typename _Tp> + struct __is_narrowing_conversion<_Tp, _Tp, true, true> + : public false_type {}; + +template <typename _From, typename _To> + struct __is_narrowing_conversion<_From, _To, false, true> + : public negation<is_convertible<_From, _To>> {}; + +// }}} +// __converts_to_higher_integer_rank{{{ +template <typename _From, typename _To, bool = (sizeof(_From) < sizeof(_To))> + struct __converts_to_higher_integer_rank : public true_type {}; + +// this may fail for char -> short if sizeof(char) == sizeof(short) +template <typename _From, typename _To> + struct __converts_to_higher_integer_rank<_From, _To, false> + : public is_same<decltype(declval<_From>() + declval<_To>()), _To> {}; + +// }}} +// __data(simd/simd_mask) {{{ +template <typename _Tp, typename _Ap> + _GLIBCXX_SIMD_INTRINSIC constexpr const auto& + __data(const simd<_Tp, _Ap>& __x); + +template <typename _Tp, typename _Ap> + _GLIBCXX_SIMD_INTRINSIC constexpr auto& + __data(simd<_Tp, _Ap>& __x); + +template <typename _Tp, typename _Ap> + _GLIBCXX_SIMD_INTRINSIC constexpr const auto& + __data(const simd_mask<_Tp, _Ap>& __x); + +template <typename _Tp, typename _Ap> + _GLIBCXX_SIMD_INTRINSIC constexpr auto& + __data(simd_mask<_Tp, _Ap>& __x); + +// }}} +// _SimdConverter {{{ +template <typename _FromT, typename _FromA, typename _ToT, typename _ToA, + typename = void> + struct _SimdConverter; + +template <typename _Tp, typename _Ap> + struct _SimdConverter<_Tp, _Ap, _Tp, _Ap, void> + { + template <typename _Up> + _GLIBCXX_SIMD_INTRINSIC const _Up& + operator()(const _Up& __x) + { return __x; } + }; + +// }}} +// __to_value_type_or_member_type {{{ +template <typename _V> + _GLIBCXX_SIMD_INTRINSIC constexpr auto + __to_value_type_or_member_type(const _V& __x) -> decltype(__data(__x)) + { return __data(__x); } + +template <typename _V> + _GLIBCXX_SIMD_INTRINSIC constexpr const typename _V::value_type& + __to_value_type_or_member_type(const typename _V::value_type& __x) + { return __x; } + +// }}} +// __bool_storage_member_type{{{ +template <size_t _Size> + struct __bool_storage_member_type; + +template <size_t _Size> + using __bool_storage_member_type_t = + typename __bool_storage_member_type<_Size>::type; + +// }}} +// _SimdTuple {{{ +// why not tuple? +// 1. tuple gives no guarantee about the storage order, but I require +// storage +// equivalent to array<_Tp, _Np> +// 2. direct access to the element type (first template argument) +// 3. enforces equal element type, only different _Abi types are allowed +template <typename _Tp, typename... 
_Abis> + struct _SimdTuple; + +//}}} +// __fixed_size_storage_t {{{ +template <typename _Tp, int _Np> + struct __fixed_size_storage; + +template <typename _Tp, int _Np> + using __fixed_size_storage_t = typename __fixed_size_storage<_Tp, _Np>::type; + +// }}} +// _SimdWrapper fwd decl{{{ +template <typename _Tp, size_t _Size, typename = void_t<>> + struct _SimdWrapper; + +template <typename _Tp> + using _SimdWrapper8 = _SimdWrapper<_Tp, 8 / sizeof(_Tp)>; +template <typename _Tp> + using _SimdWrapper16 = _SimdWrapper<_Tp, 16 / sizeof(_Tp)>; +template <typename _Tp> + using _SimdWrapper32 = _SimdWrapper<_Tp, 32 / sizeof(_Tp)>; +template <typename _Tp> + using _SimdWrapper64 = _SimdWrapper<_Tp, 64 / sizeof(_Tp)>; + +// }}} +// __is_simd_wrapper {{{ +template <typename _Tp> + struct __is_simd_wrapper : false_type {}; + +template <typename _Tp, size_t _Np> + struct __is_simd_wrapper<_SimdWrapper<_Tp, _Np>> : true_type {}; + +template <typename _Tp> + inline constexpr bool __is_simd_wrapper_v = __is_simd_wrapper<_Tp>::value; + +// }}} +// _BitOps {{{ +struct _BitOps +{ + // _S_bit_iteration {{{ + template <typename _Tp, typename _Fp> + static void + _S_bit_iteration(_Tp __mask, _Fp&& __f) + { + static_assert(sizeof(_ULLong) >= sizeof(_Tp)); + conditional_t<sizeof(_Tp) <= sizeof(_UInt), _UInt, _ULLong> __k; + if constexpr (is_convertible_v<_Tp, decltype(__k)>) + __k = __mask; + else + __k = __mask.to_ullong(); + while(__k) + { + __f(std::__countr_zero(__k)); + __k &= (__k - 1); + } + } + + //}}} +}; + +//}}} +// __increment, __decrement {{{ +template <typename _Tp = void> + struct __increment + { constexpr _Tp operator()(_Tp __a) const { return ++__a; } }; + +template <> + struct __increment<void> + { + template <typename _Tp> + constexpr _Tp + operator()(_Tp __a) const + { return ++__a; } + }; + +template <typename _Tp = void> + struct __decrement + { constexpr _Tp operator()(_Tp __a) const { return --__a; } }; + +template <> + struct __decrement<void> + { + template <typename _Tp> + constexpr _Tp + operator()(_Tp __a) const + { return --__a; } + }; + +// }}} +// _ValuePreserving(OrInt) {{{ +template <typename _From, typename _To, + typename = enable_if_t<negation< + __is_narrowing_conversion<__remove_cvref_t<_From>, _To>>::value>> + using _ValuePreserving = _From; + +template <typename _From, typename _To, + typename _DecayedFrom = __remove_cvref_t<_From>, + typename = enable_if_t<conjunction< + is_convertible<_From, _To>, + disjunction< + is_same<_DecayedFrom, _To>, is_same<_DecayedFrom, int>, + conjunction<is_same<_DecayedFrom, _UInt>, is_unsigned<_To>>, + negation<__is_narrowing_conversion<_DecayedFrom, _To>>>>::value>> + using _ValuePreservingOrInt = _From; + +// }}} +// __intrinsic_type {{{ +template <typename _Tp, size_t _Bytes, typename = void_t<>> + struct __intrinsic_type; + +template <typename _Tp, size_t _Size> + using __intrinsic_type_t = + typename __intrinsic_type<_Tp, _Size * sizeof(_Tp)>::type; + +template <typename _Tp> + using __intrinsic_type2_t = typename __intrinsic_type<_Tp, 2>::type; +template <typename _Tp> + using __intrinsic_type4_t = typename __intrinsic_type<_Tp, 4>::type; +template <typename _Tp> + using __intrinsic_type8_t = typename __intrinsic_type<_Tp, 8>::type; +template <typename _Tp> + using __intrinsic_type16_t = typename __intrinsic_type<_Tp, 16>::type; +template <typename _Tp> + using __intrinsic_type32_t = typename __intrinsic_type<_Tp, 32>::type; +template <typename _Tp> + using __intrinsic_type64_t = typename __intrinsic_type<_Tp, 64>::type; + +// }}} 
+// _BitMask {{{ +template <size_t _Np, bool _Sanitized = false> + struct _BitMask; + +template <size_t _Np, bool _Sanitized> + struct __is_bitmask<_BitMask<_Np, _Sanitized>, void> : true_type {}; + +template <size_t _Np> + using _SanitizedBitMask = _BitMask<_Np, true>; + +template <size_t _Np, bool _Sanitized> + struct _BitMask + { + static_assert(_Np > 0); + + static constexpr size_t _NBytes = __div_roundup(_Np, __CHAR_BIT__); + + using _Tp = conditional_t<_Np == 1, bool, + make_unsigned_t<__int_with_sizeof_t<std::min( + sizeof(_ULLong), std::__bit_ceil(_NBytes))>>>; + + static constexpr int _S_array_size = __div_roundup(_NBytes, sizeof(_Tp)); + + _Tp _M_bits[_S_array_size]; + + static constexpr int _S_unused_bits + = _Np == 1 ? 0 : _S_array_size * sizeof(_Tp) * __CHAR_BIT__ - _Np; + + static constexpr _Tp _S_bitmask = +_Tp(~_Tp()) >> _S_unused_bits; + + constexpr _BitMask() noexcept = default; + + constexpr _BitMask(unsigned long long __x) noexcept + : _M_bits{static_cast<_Tp>(__x)} {} + + _BitMask(bitset<_Np> __x) noexcept : _BitMask(__x.to_ullong()) {} + + constexpr _BitMask(const _BitMask&) noexcept = default; + + template <bool _RhsSanitized, typename = enable_if_t<_RhsSanitized == false + && _Sanitized == true>> + constexpr _BitMask(const _BitMask<_Np, _RhsSanitized>& __rhs) noexcept + : _BitMask(__rhs._M_sanitized()) {} + + constexpr operator _SimdWrapper<bool, _Np>() const noexcept + { + static_assert(_S_array_size == 1); + return _M_bits[0]; + } + + // precondition: is sanitized + constexpr _Tp + _M_to_bits() const noexcept + { + static_assert(_S_array_size == 1); + return _M_bits[0]; + } + + // precondition: is sanitized + constexpr unsigned long long + to_ullong() const noexcept + { + static_assert(_S_array_size == 1); + return _M_bits[0]; + } + + // precondition: is sanitized + constexpr unsigned long + to_ulong() const noexcept + { + static_assert(_S_array_size == 1); + return _M_bits[0]; + } + + constexpr bitset<_Np> + _M_to_bitset() const noexcept + { + static_assert(_S_array_size == 1); + return _M_bits[0]; + } + + constexpr decltype(auto) + _M_sanitized() const noexcept + { + if constexpr (_Sanitized) + return *this; + else if constexpr (_Np == 1) + return _SanitizedBitMask<_Np>(_M_bits[0]); + else + { + _SanitizedBitMask<_Np> __r = {}; + for (int __i = 0; __i < _S_array_size; ++__i) + __r._M_bits[__i] = _M_bits[__i]; + if constexpr (_S_unused_bits > 0) + __r._M_bits[_S_array_size - 1] &= _S_bitmask; + return __r; + } + } + + template <size_t _Mp, bool _LSanitized> + constexpr _BitMask<_Np + _Mp, _Sanitized> + _M_prepend(_BitMask<_Mp, _LSanitized> __lsb) const noexcept + { + constexpr size_t _RN = _Np + _Mp; + using _Rp = _BitMask<_RN, _Sanitized>; + if constexpr (_Rp::_S_array_size == 1) + { + _Rp __r{{_M_bits[0]}}; + __r._M_bits[0] <<= _Mp; + __r._M_bits[0] |= __lsb._M_sanitized()._M_bits[0]; + return __r; + } + else + __assert_unreachable<_Rp>(); + } + + // Return a new _BitMask with size _NewSize while dropping _DropLsb least + // significant bits. If the operation implicitly produces a sanitized bitmask, + // the result type will have _Sanitized set. 
+    template <size_t _DropLsb, size_t _NewSize = _Np - _DropLsb>
+      constexpr auto
+      _M_extract() const noexcept
+      {
+        static_assert(_Np > _DropLsb);
+        static_assert(_DropLsb + _NewSize <= sizeof(_ULLong) * __CHAR_BIT__,
+                      "not implemented for bitmasks larger than one ullong");
+        if constexpr (_NewSize == 1)
+          // must sanitize because the return _Tp is bool
+          return _SanitizedBitMask<1>(_M_bits[0] & (_Tp(1) << _DropLsb));
+        else
+          return _BitMask<_NewSize,
+                          ((_NewSize + _DropLsb == sizeof(_Tp) * __CHAR_BIT__
+                            && _NewSize + _DropLsb <= _Np)
+                           || ((_Sanitized || _Np == sizeof(_Tp) * __CHAR_BIT__)
+                               && _NewSize + _DropLsb >= _Np))>(_M_bits[0]
+                                                                >> _DropLsb);
+      }
+
+    // True if all bits are set. Implicitly sanitizes if _Sanitized == false.
+    constexpr bool
+    all() const noexcept
+    {
+      if constexpr (_Np == 1)
+        return _M_bits[0];
+      else if constexpr (!_Sanitized)
+        return _M_sanitized().all();
+      else
+        {
+          constexpr _Tp __allbits = ~_Tp();
+          for (int __i = 0; __i < _S_array_size - 1; ++__i)
+            if (_M_bits[__i] != __allbits)
+              return false;
+          return _M_bits[_S_array_size - 1] == _S_bitmask;
+        }
+    }
+
+    // True if at least one bit is set. Implicitly sanitizes if _Sanitized ==
+    // false.
+    constexpr bool
+    any() const noexcept
+    {
+      if constexpr (_Np == 1)
+        return _M_bits[0];
+      else if constexpr (!_Sanitized)
+        return _M_sanitized().any();
+      else
+        {
+          for (int __i = 0; __i < _S_array_size - 1; ++__i)
+            if (_M_bits[__i] != 0)
+              return true;
+          return _M_bits[_S_array_size - 1] != 0;
+        }
+    }
+
+    // True if no bit is set. Implicitly sanitizes if _Sanitized == false.
+    constexpr bool
+    none() const noexcept
+    {
+      if constexpr (_Np == 1)
+        return !_M_bits[0];
+      else if constexpr (!_Sanitized)
+        return _M_sanitized().none();
+      else
+        {
+          for (int __i = 0; __i < _S_array_size - 1; ++__i)
+            if (_M_bits[__i] != 0)
+              return false;
+          return _M_bits[_S_array_size - 1] == 0;
+        }
+    }
+
+    // Returns the number of set bits. Implicitly sanitizes if _Sanitized ==
+    // false.
+    constexpr int
+    count() const noexcept
+    {
+      if constexpr (_Np == 1)
+        return _M_bits[0];
+      else if constexpr (!_Sanitized)
+        return _M_sanitized().count();
+      else
+        {
+          int __result = __builtin_popcountll(_M_bits[0]);
+          for (int __i = 1; __i < _S_array_size; ++__i)
+            __result += __builtin_popcountll(_M_bits[__i]);
+          return __result;
+        }
+    }
+
+    // Returns the bit at offset __i as bool.
+    constexpr bool
+    operator[](size_t __i) const noexcept
+    {
+      if constexpr (_Np == 1)
+        return _M_bits[0];
+      else if constexpr (_S_array_size == 1)
+        return (_M_bits[0] >> __i) & 1;
+      else
+        {
+          const size_t __j = __i / (sizeof(_Tp) * __CHAR_BIT__);
+          const size_t __shift = __i % (sizeof(_Tp) * __CHAR_BIT__);
+          return (_M_bits[__j] >> __shift) & 1;
+        }
+    }
+
+    template <size_t __i>
+      constexpr bool
+      operator[](_SizeConstant<__i>) const noexcept
+      {
+        static_assert(__i < _Np);
+        constexpr size_t __j = __i / (sizeof(_Tp) * __CHAR_BIT__);
+        constexpr size_t __shift = __i % (sizeof(_Tp) * __CHAR_BIT__);
+        return static_cast<bool>(_M_bits[__j] & (_Tp(1) << __shift));
+      }
+
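+    // Example (illustrative): a _BitMask<4> stores its bits in one unsigned
+    // char, of which only the low 4 bits are significant. Bits above _Np are
+    // unspecified until _M_sanitized() zeroes them:
+    //   _BitMask<4> __k(0b1111'0101);     // high nibble may be garbage
+    //   __k._M_sanitized()._M_to_bits();  // 0b0101
+    //   __k._M_sanitized().count();       // 2
+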
+    // Set the bit at offset __i to __x.
+    constexpr void
+    set(size_t __i, bool __x) noexcept
+    {
+      if constexpr (_Np == 1)
+        _M_bits[0] = __x;
+      else if constexpr (_S_array_size == 1)
+        {
+          _M_bits[0] &= ~_Tp(_Tp(1) << __i);
+          _M_bits[0] |= _Tp(_Tp(__x) << __i);
+        }
+      else
+        {
+          const size_t __j = __i / (sizeof(_Tp) * __CHAR_BIT__);
+          const size_t __shift = __i % (sizeof(_Tp) * __CHAR_BIT__);
+          _M_bits[__j] &= ~_Tp(_Tp(1) << __shift);
+          _M_bits[__j] |= _Tp(_Tp(__x) << __shift);
+        }
+    }
+
+    template <size_t __i>
+      constexpr void
+      set(_SizeConstant<__i>, bool __x) noexcept
+      {
+        static_assert(__i < _Np);
+        if constexpr (_Np == 1)
+          _M_bits[0] = __x;
+        else
+          {
+            constexpr size_t __j = __i / (sizeof(_Tp) * __CHAR_BIT__);
+            constexpr size_t __shift = __i % (sizeof(_Tp) * __CHAR_BIT__);
+            constexpr _Tp __mask = ~_Tp(_Tp(1) << __shift);
+            _M_bits[__j] &= __mask;
+            _M_bits[__j] |= _Tp(_Tp(__x) << __shift);
+          }
+      }
+
+    // Inverts all bits. Sanitized input leads to sanitized output.
+    constexpr _BitMask
+    operator~() const noexcept
+    {
+      if constexpr (_Np == 1)
+        return !_M_bits[0];
+      else
+        {
+          _BitMask __result{};
+          for (int __i = 0; __i < _S_array_size - 1; ++__i)
+            __result._M_bits[__i] = ~_M_bits[__i];
+          if constexpr (_Sanitized)
+            __result._M_bits[_S_array_size - 1]
+              = _M_bits[_S_array_size - 1] ^ _S_bitmask;
+          else
+            __result._M_bits[_S_array_size - 1] = ~_M_bits[_S_array_size - 1];
+          return __result;
+        }
+    }
+
+    constexpr _BitMask&
+    operator^=(const _BitMask& __b) & noexcept
+    {
+      __execute_n_times<_S_array_size>(
+        [&](auto __i) { _M_bits[__i] ^= __b._M_bits[__i]; });
+      return *this;
+    }
+
+    constexpr _BitMask&
+    operator|=(const _BitMask& __b) & noexcept
+    {
+      __execute_n_times<_S_array_size>(
+        [&](auto __i) { _M_bits[__i] |= __b._M_bits[__i]; });
+      return *this;
+    }
+
+    constexpr _BitMask&
+    operator&=(const _BitMask& __b) & noexcept
+    {
+      __execute_n_times<_S_array_size>(
+        [&](auto __i) { _M_bits[__i] &= __b._M_bits[__i]; });
+      return *this;
+    }
+
+    friend constexpr _BitMask
+    operator^(const _BitMask& __a, const _BitMask& __b) noexcept
+    {
+      _BitMask __r = __a;
+      __r ^= __b;
+      return __r;
+    }
+
+    friend constexpr _BitMask
+    operator|(const _BitMask& __a, const _BitMask& __b) noexcept
+    {
+      _BitMask __r = __a;
+      __r |= __b;
+      return __r;
+    }
+
+    friend constexpr _BitMask
+    operator&(const _BitMask& __a, const _BitMask& __b) noexcept
+    {
+      _BitMask __r = __a;
+      __r &= __b;
+      return __r;
+    }
+
+    _GLIBCXX_SIMD_INTRINSIC
+    constexpr bool
+    _M_is_constprop() const
+    {
+      if constexpr (_S_array_size == 0)
+        return __builtin_constant_p(_M_bits[0]);
+      else
+        {
+          for (int __i = 0; __i < _S_array_size; ++__i)
+            if (!__builtin_constant_p(_M_bits[__i]))
+              return false;
+          return true;
+        }
+    }
+  };
+
+// }}}
+
+// vvv ---- builtin vector types [[gnu::vector_size(N)]] and operations ---- vvv
+// __min_vector_size {{{
+template <typename _Tp = void>
+  static inline constexpr int __min_vector_size = 2 * sizeof(_Tp);
+
+#if _GLIBCXX_SIMD_HAVE_NEON
+template <>
+  inline constexpr int __min_vector_size<void> = 8;
+#else
+template <>
+  inline constexpr int __min_vector_size<void> = 16;
+#endif
+
+// }}}
+// __vector_type {{{
+template <typename _Tp, size_t _Np, typename = void>
+  struct __vector_type_n {};
+
+// substitution failure for 0-element case
+template <typename _Tp>
+  struct __vector_type_n<_Tp, 0, void> {};
+
+// special case 1-element to be _Tp itself
+template <typename _Tp>
+  struct __vector_type_n<_Tp, 1, enable_if_t<__is_vectorizable_v<_Tp>>>
+  { using type = _Tp; };
+
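+// Example (illustrative): with the specializations above and the general case
+// below,
+//   __vector_type_t<float, 1> is plain float (scalar special case),
+//   __vector_type_t<float, 4> is a 16-byte [[gnu::vector_size(16)]] float, and
+//   __vector_type_t<float, 3> is also 16 bytes, because _S_Np2 rounds the
+//   requested byte count up to the next power of two.
+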
+// else, use GNU-style builtin vector types
+template <typename _Tp, size_t _Np>
+  struct __vector_type_n<_Tp, _Np,
+                         enable_if_t<__is_vectorizable_v<_Tp> && _Np >= 2>>
+  {
+    static constexpr size_t _S_Np2 = std::__bit_ceil(_Np * sizeof(_Tp));
+
+    static constexpr size_t _S_Bytes =
+#ifdef __i386__
+      // Using [[gnu::vector_size(8)]] would wreak havoc on the FPU because
+      // those objects are passed via MMX registers and nothing ever calls
+      // EMMS.
+      _S_Np2 == 8 ? 16 :
+#endif
+      _S_Np2 < __min_vector_size<_Tp> ? __min_vector_size<_Tp>
+                                      : _S_Np2;
+
+    using type [[__gnu__::__vector_size__(_S_Bytes)]] = _Tp;
+  };
+
+template <typename _Tp, size_t _Bytes, size_t = _Bytes % sizeof(_Tp)>
+  struct __vector_type;
+
+template <typename _Tp, size_t _Bytes>
+  struct __vector_type<_Tp, _Bytes, 0>
+  : __vector_type_n<_Tp, _Bytes / sizeof(_Tp)> {};
+
+template <typename _Tp, size_t _Size>
+  using __vector_type_t = typename __vector_type_n<_Tp, _Size>::type;
+
+template <typename _Tp>
+  using __vector_type2_t = typename __vector_type<_Tp, 2>::type;
+template <typename _Tp>
+  using __vector_type4_t = typename __vector_type<_Tp, 4>::type;
+template <typename _Tp>
+  using __vector_type8_t = typename __vector_type<_Tp, 8>::type;
+template <typename _Tp>
+  using __vector_type16_t = typename __vector_type<_Tp, 16>::type;
+template <typename _Tp>
+  using __vector_type32_t = typename __vector_type<_Tp, 32>::type;
+template <typename _Tp>
+  using __vector_type64_t = typename __vector_type<_Tp, 64>::type;
+
+// }}}
+// __is_vector_type {{{
+template <typename _Tp, typename = void_t<>>
+  struct __is_vector_type : false_type {};
+
+template <typename _Tp>
+  struct __is_vector_type<
+    _Tp, void_t<typename __vector_type<
+           remove_reference_t<decltype(declval<_Tp>()[0])>, sizeof(_Tp)>::type>>
+  : is_same<_Tp, typename __vector_type<
+                   remove_reference_t<decltype(declval<_Tp>()[0])>,
+                   sizeof(_Tp)>::type> {};
+
+template <typename _Tp>
+  inline constexpr bool __is_vector_type_v = __is_vector_type<_Tp>::value;
+
+// }}}
+// _VectorTraits{{{
+template <typename _Tp, typename = void_t<>>
+  struct _VectorTraitsImpl;
+
+template <typename _Tp>
+  struct _VectorTraitsImpl<_Tp, enable_if_t<__is_vector_type_v<_Tp>>>
+  {
+    using type = _Tp;
+    using value_type = remove_reference_t<decltype(declval<_Tp>()[0])>;
+    static constexpr int _S_full_size = sizeof(_Tp) / sizeof(value_type);
+    using _Wrapper = _SimdWrapper<value_type, _S_full_size>;
+    template <typename _Up, int _W = _S_full_size>
+      static constexpr bool _S_is
+        = is_same_v<value_type, _Up> && _W == _S_full_size;
+  };
+
+template <typename _Tp, size_t _Np>
+  struct _VectorTraitsImpl<_SimdWrapper<_Tp, _Np>,
+                           void_t<__vector_type_t<_Tp, _Np>>>
+  {
+    using type = __vector_type_t<_Tp, _Np>;
+    using value_type = _Tp;
+    static constexpr int _S_full_size = sizeof(type) / sizeof(value_type);
+    using _Wrapper = _SimdWrapper<_Tp, _Np>;
+    static constexpr bool _S_is_partial = (_Np != _S_full_size);
+    static constexpr int _S_partial_width = _Np;
+    template <typename _Up, int _W = _S_full_size>
+      static constexpr bool _S_is
+        = is_same_v<value_type, _Up> && _W == _S_full_size;
+  };
+
+template <typename _Tp, typename = typename _VectorTraitsImpl<_Tp>::type>
+  using _VectorTraits = _VectorTraitsImpl<_Tp>;
+
+// }}}
+// __as_vector{{{
+template <typename _V>
+  _GLIBCXX_SIMD_INTRINSIC constexpr auto
+  __as_vector(_V __x)
+  {
+    if constexpr (__is_vector_type_v<_V>)
+      return __x;
+    else if constexpr (is_simd<_V>::value || is_simd_mask<_V>::value)
+      return __data(__x)._M_data;
+    else if constexpr (__is_vectorizable_v<_V>)
+      return
__vector_type_t<_V, 2>{__x}; + else + return __x._M_data; + } + +// }}} +// __as_wrapper{{{ +template <size_t _Np = 0, typename _V> + _GLIBCXX_SIMD_INTRINSIC constexpr auto + __as_wrapper(_V __x) + { + if constexpr (__is_vector_type_v<_V>) + return _SimdWrapper<typename _VectorTraits<_V>::value_type, + (_Np > 0 ? _Np : _VectorTraits<_V>::_S_full_size)>(__x); + else if constexpr (is_simd<_V>::value || is_simd_mask<_V>::value) + { + static_assert(_V::size() == _Np); + return __data(__x); + } + else + { + static_assert(_V::_S_size == _Np); + return __x; + } + } + +// }}} +// __intrin_bitcast{{{ +template <typename _To, typename _From> + _GLIBCXX_SIMD_INTRINSIC constexpr _To + __intrin_bitcast(_From __v) + { + static_assert(__is_vector_type_v<_From> && __is_vector_type_v<_To>); + if constexpr (sizeof(_To) == sizeof(_From)) + return reinterpret_cast<_To>(__v); + else if constexpr (sizeof(_From) > sizeof(_To)) + if constexpr (sizeof(_To) >= 16) + return reinterpret_cast<const __may_alias<_To>&>(__v); + else + { + _To __r; + __builtin_memcpy(&__r, &__v, sizeof(_To)); + return __r; + } +#if _GLIBCXX_SIMD_X86INTRIN && !defined __clang__ + else if constexpr (__have_avx && sizeof(_From) == 16 && sizeof(_To) == 32) + return reinterpret_cast<_To>(__builtin_ia32_ps256_ps( + reinterpret_cast<__vector_type_t<float, 4>>(__v))); + else if constexpr (__have_avx512f && sizeof(_From) == 16 + && sizeof(_To) == 64) + return reinterpret_cast<_To>(__builtin_ia32_ps512_ps( + reinterpret_cast<__vector_type_t<float, 4>>(__v))); + else if constexpr (__have_avx512f && sizeof(_From) == 32 + && sizeof(_To) == 64) + return reinterpret_cast<_To>(__builtin_ia32_ps512_256ps( + reinterpret_cast<__vector_type_t<float, 8>>(__v))); +#endif // _GLIBCXX_SIMD_X86INTRIN + else if constexpr (sizeof(__v) <= 8) + return reinterpret_cast<_To>( + __vector_type_t<__int_for_sizeof_t<_From>, sizeof(_To) / sizeof(_From)>{ + reinterpret_cast<__int_for_sizeof_t<_From>>(__v)}); + else + { + static_assert(sizeof(_To) > sizeof(_From)); + _To __r = {}; + __builtin_memcpy(&__r, &__v, sizeof(_From)); + return __r; + } + } + +// }}} +// __vector_bitcast{{{ +template <typename _To, size_t _NN = 0, typename _From, + typename _FromVT = _VectorTraits<_From>, + size_t _Np = _NN == 0 ? sizeof(_From) / sizeof(_To) : _NN> + _GLIBCXX_SIMD_INTRINSIC constexpr __vector_type_t<_To, _Np> + __vector_bitcast(_From __x) + { + using _R = __vector_type_t<_To, _Np>; + return __intrin_bitcast<_R>(__x); + } + +template <typename _To, size_t _NN = 0, typename _Tp, size_t _Nx, + size_t _Np + = _NN == 0 ? 
sizeof(_SimdWrapper<_Tp, _Nx>) / sizeof(_To) : _NN> + _GLIBCXX_SIMD_INTRINSIC constexpr __vector_type_t<_To, _Np> + __vector_bitcast(const _SimdWrapper<_Tp, _Nx>& __x) + { + static_assert(_Np > 1); + return __intrin_bitcast<__vector_type_t<_To, _Np>>(__x._M_data); + } + +// }}} +// __convert_x86 declarations {{{ +#ifdef _GLIBCXX_SIMD_WORKAROUND_PR85048 +template <typename _To, typename _Tp, typename _TVT = _VectorTraits<_Tp>> + _To __convert_x86(_Tp); + +template <typename _To, typename _Tp, typename _TVT = _VectorTraits<_Tp>> + _To __convert_x86(_Tp, _Tp); + +template <typename _To, typename _Tp, typename _TVT = _VectorTraits<_Tp>> + _To __convert_x86(_Tp, _Tp, _Tp, _Tp); + +template <typename _To, typename _Tp, typename _TVT = _VectorTraits<_Tp>> + _To __convert_x86(_Tp, _Tp, _Tp, _Tp, _Tp, _Tp, _Tp, _Tp); + +template <typename _To, typename _Tp, typename _TVT = _VectorTraits<_Tp>> + _To __convert_x86(_Tp, _Tp, _Tp, _Tp, _Tp, _Tp, _Tp, _Tp, _Tp, _Tp, _Tp, _Tp, + _Tp, _Tp, _Tp, _Tp); +#endif // _GLIBCXX_SIMD_WORKAROUND_PR85048 + +//}}} +// __bit_cast {{{ +template <typename _To, typename _From> + _GLIBCXX_SIMD_INTRINSIC constexpr _To + __bit_cast(const _From __x) + { + // TODO: implement with / replace by __builtin_bit_cast ASAP + static_assert(sizeof(_To) == sizeof(_From)); + constexpr bool __to_is_vectorizable + = is_arithmetic_v<_To> || is_enum_v<_To>; + constexpr bool __from_is_vectorizable + = is_arithmetic_v<_From> || is_enum_v<_From>; + if constexpr (__is_vector_type_v<_To> && __is_vector_type_v<_From>) + return reinterpret_cast<_To>(__x); + else if constexpr (__is_vector_type_v<_To> && __from_is_vectorizable) + { + using _FV [[gnu::vector_size(sizeof(_From))]] = _From; + return reinterpret_cast<_To>(_FV{__x}); + } + else if constexpr (__to_is_vectorizable && __from_is_vectorizable) + { + using _TV [[gnu::vector_size(sizeof(_To))]] = _To; + using _FV [[gnu::vector_size(sizeof(_From))]] = _From; + return reinterpret_cast<_TV>(_FV{__x})[0]; + } + else if constexpr (__to_is_vectorizable && __is_vector_type_v<_From>) + { + using _TV [[gnu::vector_size(sizeof(_To))]] = _To; + return reinterpret_cast<_TV>(__x)[0]; + } + else + { + _To __r; + __builtin_memcpy(reinterpret_cast<char*>(&__r), + reinterpret_cast<const char*>(&__x), sizeof(_To)); + return __r; + } + } + +// }}} +// __to_intrin {{{ +template <typename _Tp, typename _TVT = _VectorTraits<_Tp>, + typename _R + = __intrinsic_type_t<typename _TVT::value_type, _TVT::_S_full_size>> + _GLIBCXX_SIMD_INTRINSIC constexpr _R + __to_intrin(_Tp __x) + { + static_assert(sizeof(__x) <= sizeof(_R), + "__to_intrin may never drop values off the end"); + if constexpr (sizeof(__x) == sizeof(_R)) + return reinterpret_cast<_R>(__as_vector(__x)); + else + { + using _Up = __int_for_sizeof_t<_Tp>; + return reinterpret_cast<_R>( + __vector_type_t<_Up, sizeof(_R) / sizeof(_Up)>{__bit_cast<_Up>(__x)}); + } + } + +// }}} +// __make_vector{{{ +template <typename _Tp, typename... _Args> + _GLIBCXX_SIMD_INTRINSIC constexpr __vector_type_t<_Tp, sizeof...(_Args)> + __make_vector(const _Args&... __args) + { + return __vector_type_t<_Tp, sizeof...(_Args)>{static_cast<_Tp>(__args)...}; + } + +// }}} +// __vector_broadcast{{{ +template <size_t _Np, typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr __vector_type_t<_Tp, _Np> + __vector_broadcast(_Tp __x) + { + return __call_with_n_evaluations<_Np>( + [](auto... 
__xx) { return __vector_type_t<_Tp, _Np>{__xx...}; }, + [&__x](int) { return __x; }); + } + +// }}} +// __generate_vector{{{ + template <typename _Tp, size_t _Np, typename _Gp, size_t... _I> + _GLIBCXX_SIMD_INTRINSIC constexpr __vector_type_t<_Tp, _Np> + __generate_vector_impl(_Gp&& __gen, index_sequence<_I...>) + { + return __vector_type_t<_Tp, _Np>{ + static_cast<_Tp>(__gen(_SizeConstant<_I>()))...}; + } + +template <typename _V, typename _VVT = _VectorTraits<_V>, typename _Gp> + _GLIBCXX_SIMD_INTRINSIC constexpr _V + __generate_vector(_Gp&& __gen) + { + if constexpr (__is_vector_type_v<_V>) + return __generate_vector_impl<typename _VVT::value_type, + _VVT::_S_full_size>( + static_cast<_Gp&&>(__gen), make_index_sequence<_VVT::_S_full_size>()); + else + return __generate_vector_impl<typename _VVT::value_type, + _VVT::_S_partial_width>( + static_cast<_Gp&&>(__gen), + make_index_sequence<_VVT::_S_partial_width>()); + } + +template <typename _Tp, size_t _Np, typename _Gp> + _GLIBCXX_SIMD_INTRINSIC constexpr __vector_type_t<_Tp, _Np> + __generate_vector(_Gp&& __gen) + { + return __generate_vector_impl<_Tp, _Np>(static_cast<_Gp&&>(__gen), + make_index_sequence<_Np>()); + } + +// }}} +// __xor{{{ +template <typename _TW> + _GLIBCXX_SIMD_INTRINSIC constexpr _TW + __xor(_TW __a, _TW __b) noexcept + { + if constexpr (__is_vector_type_v<_TW> || __is_simd_wrapper_v<_TW>) + { + using _Tp = typename conditional_t<__is_simd_wrapper_v<_TW>, _TW, + _VectorTraitsImpl<_TW>>::value_type; + if constexpr (is_floating_point_v<_Tp>) + { + using _Ip = make_unsigned_t<__int_for_sizeof_t<_Tp>>; + return __vector_bitcast<_Tp>(__vector_bitcast<_Ip>(__a) + ^ __vector_bitcast<_Ip>(__b)); + } + else if constexpr (__is_vector_type_v<_TW>) + return __a ^ __b; + else + return __a._M_data ^ __b._M_data; + } + else + return __a ^ __b; + } + +// }}} +// __or{{{ +template <typename _TW> + _GLIBCXX_SIMD_INTRINSIC constexpr _TW + __or(_TW __a, _TW __b) noexcept + { + if constexpr (__is_vector_type_v<_TW> || __is_simd_wrapper_v<_TW>) + { + using _Tp = typename conditional_t<__is_simd_wrapper_v<_TW>, _TW, + _VectorTraitsImpl<_TW>>::value_type; + if constexpr (is_floating_point_v<_Tp>) + { + using _Ip = make_unsigned_t<__int_for_sizeof_t<_Tp>>; + return __vector_bitcast<_Tp>(__vector_bitcast<_Ip>(__a) + | __vector_bitcast<_Ip>(__b)); + } + else if constexpr (__is_vector_type_v<_TW>) + return __a | __b; + else + return __a._M_data | __b._M_data; + } + else + return __a | __b; + } + +// }}} +// __and{{{ +template <typename _TW> + _GLIBCXX_SIMD_INTRINSIC constexpr _TW + __and(_TW __a, _TW __b) noexcept + { + if constexpr (__is_vector_type_v<_TW> || __is_simd_wrapper_v<_TW>) + { + using _Tp = typename conditional_t<__is_simd_wrapper_v<_TW>, _TW, + _VectorTraitsImpl<_TW>>::value_type; + if constexpr (is_floating_point_v<_Tp>) + { + using _Ip = make_unsigned_t<__int_for_sizeof_t<_Tp>>; + return __vector_bitcast<_Tp>(__vector_bitcast<_Ip>(__a) + & __vector_bitcast<_Ip>(__b)); + } + else if constexpr (__is_vector_type_v<_TW>) + return __a & __b; + else + return __a._M_data & __b._M_data; + } + else + return __a & __b; + } + +// }}} +// __andnot{{{ +#if _GLIBCXX_SIMD_X86INTRIN && !defined __clang__ +static constexpr struct +{ + _GLIBCXX_SIMD_INTRINSIC __v4sf + operator()(__v4sf __a, __v4sf __b) const noexcept + { return __builtin_ia32_andnps(__a, __b); } + + _GLIBCXX_SIMD_INTRINSIC __v2df + operator()(__v2df __a, __v2df __b) const noexcept + { return __builtin_ia32_andnpd(__a, __b); } + + _GLIBCXX_SIMD_INTRINSIC __v2di + operator()(__v2di 
__a, __v2di __b) const noexcept + { return __builtin_ia32_pandn128(__a, __b); } + + _GLIBCXX_SIMD_INTRINSIC __v8sf + operator()(__v8sf __a, __v8sf __b) const noexcept + { return __builtin_ia32_andnps256(__a, __b); } + + _GLIBCXX_SIMD_INTRINSIC __v4df + operator()(__v4df __a, __v4df __b) const noexcept + { return __builtin_ia32_andnpd256(__a, __b); } + + _GLIBCXX_SIMD_INTRINSIC __v4di + operator()(__v4di __a, __v4di __b) const noexcept + { + if constexpr (__have_avx2) + return __builtin_ia32_andnotsi256(__a, __b); + else + return reinterpret_cast<__v4di>( + __builtin_ia32_andnpd256(reinterpret_cast<__v4df>(__a), + reinterpret_cast<__v4df>(__b))); + } + + _GLIBCXX_SIMD_INTRINSIC __v16sf + operator()(__v16sf __a, __v16sf __b) const noexcept + { + if constexpr (__have_avx512dq) + return _mm512_andnot_ps(__a, __b); + else + return reinterpret_cast<__v16sf>( + _mm512_andnot_si512(reinterpret_cast<__v8di>(__a), + reinterpret_cast<__v8di>(__b))); + } + + _GLIBCXX_SIMD_INTRINSIC __v8df + operator()(__v8df __a, __v8df __b) const noexcept + { + if constexpr (__have_avx512dq) + return _mm512_andnot_pd(__a, __b); + else + return reinterpret_cast<__v8df>( + _mm512_andnot_si512(reinterpret_cast<__v8di>(__a), + reinterpret_cast<__v8di>(__b))); + } + + _GLIBCXX_SIMD_INTRINSIC __v8di + operator()(__v8di __a, __v8di __b) const noexcept + { return _mm512_andnot_si512(__a, __b); } +} _S_x86_andnot; +#endif // _GLIBCXX_SIMD_X86INTRIN && !__clang__ + +template <typename _TW> + _GLIBCXX_SIMD_INTRINSIC constexpr _TW + __andnot(_TW __a, _TW __b) noexcept + { + if constexpr (__is_vector_type_v<_TW> || __is_simd_wrapper_v<_TW>) + { + using _TVT = conditional_t<__is_simd_wrapper_v<_TW>, _TW, + _VectorTraitsImpl<_TW>>; + using _Tp = typename _TVT::value_type; +#if _GLIBCXX_SIMD_X86INTRIN && !defined __clang__ + if constexpr (sizeof(_TW) >= 16) + { + const auto __ai = __to_intrin(__a); + const auto __bi = __to_intrin(__b); + if (!__builtin_is_constant_evaluated() + && !(__builtin_constant_p(__ai) && __builtin_constant_p(__bi))) + { + const auto __r = _S_x86_andnot(__ai, __bi); + if constexpr (is_convertible_v<decltype(__r), _TW>) + return __r; + else + return reinterpret_cast<typename _TVT::type>(__r); + } + } +#endif // _GLIBCXX_SIMD_X86INTRIN + using _Ip = make_unsigned_t<__int_for_sizeof_t<_Tp>>; + return __vector_bitcast<_Tp>(~__vector_bitcast<_Ip>(__a) + & __vector_bitcast<_Ip>(__b)); + } + else + return ~__a & __b; + } + +// }}} +// __not{{{ +template <typename _Tp, typename _TVT = _VectorTraits<_Tp>> + _GLIBCXX_SIMD_INTRINSIC constexpr _Tp + __not(_Tp __a) noexcept + { + if constexpr (is_floating_point_v<typename _TVT::value_type>) + return reinterpret_cast<typename _TVT::type>( + ~__vector_bitcast<unsigned>(__a)); + else + return ~__a; + } + +// }}} +// __concat{{{ +template <typename _Tp, typename _TVT = _VectorTraits<_Tp>, + typename _R = __vector_type_t<typename _TVT::value_type, + _TVT::_S_full_size * 2>> + constexpr _R + __concat(_Tp a_, _Tp b_) + { +#ifdef _GLIBCXX_SIMD_WORKAROUND_XXX_1 + using _W + = conditional_t<is_floating_point_v<typename _TVT::value_type>, double, + conditional_t<(sizeof(_Tp) >= 2 * sizeof(long long)), + long long, typename _TVT::value_type>>; + constexpr int input_width = sizeof(_Tp) / sizeof(_W); + const auto __a = __vector_bitcast<_W>(a_); + const auto __b = __vector_bitcast<_W>(b_); + using _Up = __vector_type_t<_W, sizeof(_R) / sizeof(_W)>; +#else + constexpr int input_width = _TVT::_S_full_size; + const _Tp& __a = a_; + const _Tp& __b = b_; + using _Up = _R; +#endif + if 
constexpr (input_width == 2) + return reinterpret_cast<_R>(_Up{__a[0], __a[1], __b[0], __b[1]}); + else if constexpr (input_width == 4) + return reinterpret_cast<_R>( + _Up{__a[0], __a[1], __a[2], __a[3], __b[0], __b[1], __b[2], __b[3]}); + else if constexpr (input_width == 8) + return reinterpret_cast<_R>( + _Up{__a[0], __a[1], __a[2], __a[3], __a[4], __a[5], __a[6], __a[7], + __b[0], __b[1], __b[2], __b[3], __b[4], __b[5], __b[6], __b[7]}); + else if constexpr (input_width == 16) + return reinterpret_cast<_R>( + _Up{__a[0], __a[1], __a[2], __a[3], __a[4], __a[5], __a[6], + __a[7], __a[8], __a[9], __a[10], __a[11], __a[12], __a[13], + __a[14], __a[15], __b[0], __b[1], __b[2], __b[3], __b[4], + __b[5], __b[6], __b[7], __b[8], __b[9], __b[10], __b[11], + __b[12], __b[13], __b[14], __b[15]}); + else if constexpr (input_width == 32) + return reinterpret_cast<_R>( + _Up{__a[0], __a[1], __a[2], __a[3], __a[4], __a[5], __a[6], + __a[7], __a[8], __a[9], __a[10], __a[11], __a[12], __a[13], + __a[14], __a[15], __a[16], __a[17], __a[18], __a[19], __a[20], + __a[21], __a[22], __a[23], __a[24], __a[25], __a[26], __a[27], + __a[28], __a[29], __a[30], __a[31], __b[0], __b[1], __b[2], + __b[3], __b[4], __b[5], __b[6], __b[7], __b[8], __b[9], + __b[10], __b[11], __b[12], __b[13], __b[14], __b[15], __b[16], + __b[17], __b[18], __b[19], __b[20], __b[21], __b[22], __b[23], + __b[24], __b[25], __b[26], __b[27], __b[28], __b[29], __b[30], + __b[31]}); + } + +// }}} +// __zero_extend {{{ +template <typename _Tp, typename _TVT = _VectorTraits<_Tp>> + struct _ZeroExtendProxy + { + using value_type = typename _TVT::value_type; + static constexpr size_t _Np = _TVT::_S_full_size; + const _Tp __x; + + template <typename _To, typename _ToVT = _VectorTraits<_To>, + typename + = enable_if_t<is_same_v<typename _ToVT::value_type, value_type>>> + _GLIBCXX_SIMD_INTRINSIC operator _To() const + { + constexpr size_t _ToN = _ToVT::_S_full_size; + if constexpr (_ToN == _Np) + return __x; + else if constexpr (_ToN == 2 * _Np) + { +#ifdef _GLIBCXX_SIMD_WORKAROUND_XXX_3 + if constexpr (__have_avx && _TVT::template _S_is<float, 4>) + return __vector_bitcast<value_type>( + _mm256_insertf128_ps(__m256(), __x, 0)); + else if constexpr (__have_avx && _TVT::template _S_is<double, 2>) + return __vector_bitcast<value_type>( + _mm256_insertf128_pd(__m256d(), __x, 0)); + else if constexpr (__have_avx2 && _Np * sizeof(value_type) == 16) + return __vector_bitcast<value_type>( + _mm256_insertf128_si256(__m256i(), __to_intrin(__x), 0)); + else if constexpr (__have_avx512f && _TVT::template _S_is<float, 8>) + { + if constexpr (__have_avx512dq) + return __vector_bitcast<value_type>( + _mm512_insertf32x8(__m512(), __x, 0)); + else + return reinterpret_cast<__m512>( + _mm512_insertf64x4(__m512d(), + reinterpret_cast<__m256d>(__x), 0)); + } + else if constexpr (__have_avx512f + && _TVT::template _S_is<double, 4>) + return __vector_bitcast<value_type>( + _mm512_insertf64x4(__m512d(), __x, 0)); + else if constexpr (__have_avx512f && _Np * sizeof(value_type) == 32) + return __vector_bitcast<value_type>( + _mm512_inserti64x4(__m512i(), __to_intrin(__x), 0)); +#endif + return __concat(__x, _Tp()); + } + else if constexpr (_ToN == 4 * _Np) + { +#ifdef _GLIBCXX_SIMD_WORKAROUND_XXX_3 + if constexpr (__have_avx512dq && _TVT::template _S_is<double, 2>) + { + return __vector_bitcast<value_type>( + _mm512_insertf64x2(__m512d(), __x, 0)); + } + else if constexpr (__have_avx512f + && is_floating_point_v<value_type>) + { + return __vector_bitcast<value_type>( + 
_mm512_insertf32x4(__m512(), reinterpret_cast<__m128>(__x), + 0)); + } + else if constexpr (__have_avx512f && _Np * sizeof(value_type) == 16) + { + return __vector_bitcast<value_type>( + _mm512_inserti32x4(__m512i(), __to_intrin(__x), 0)); + } +#endif + return __concat(__concat(__x, _Tp()), + __vector_type_t<value_type, _Np * 2>()); + } + else if constexpr (_ToN == 8 * _Np) + return __concat(operator __vector_type_t<value_type, _Np * 4>(), + __vector_type_t<value_type, _Np * 4>()); + else if constexpr (_ToN == 16 * _Np) + return __concat(operator __vector_type_t<value_type, _Np * 8>(), + __vector_type_t<value_type, _Np * 8>()); + else + __assert_unreachable<_Tp>(); + } + }; + +template <typename _Tp, typename _TVT = _VectorTraits<_Tp>> + _GLIBCXX_SIMD_INTRINSIC _ZeroExtendProxy<_Tp, _TVT> + __zero_extend(_Tp __x) + { return {__x}; } + +// }}} +// __extract<_Np, By>{{{ +template <int _Offset, + int _SplitBy, + typename _Tp, + typename _TVT = _VectorTraits<_Tp>, + typename _R = __vector_type_t<typename _TVT::value_type, + _TVT::_S_full_size / _SplitBy>> + _GLIBCXX_SIMD_INTRINSIC constexpr _R + __extract(_Tp __in) + { + using value_type = typename _TVT::value_type; +#if _GLIBCXX_SIMD_X86INTRIN // {{{ + if constexpr (sizeof(_Tp) == 64 && _SplitBy == 4 && _Offset > 0) + { + if constexpr (__have_avx512dq && is_same_v<double, value_type>) + return _mm512_extractf64x2_pd(__to_intrin(__in), _Offset); + else if constexpr (is_floating_point_v<value_type>) + return __vector_bitcast<value_type>( + _mm512_extractf32x4_ps(__intrin_bitcast<__m512>(__in), _Offset)); + else + return reinterpret_cast<_R>( + _mm512_extracti32x4_epi32(__intrin_bitcast<__m512i>(__in), + _Offset)); + } + else +#endif // _GLIBCXX_SIMD_X86INTRIN }}} + { +#ifdef _GLIBCXX_SIMD_WORKAROUND_XXX_1 + using _W = conditional_t< + is_floating_point_v<value_type>, double, + conditional_t<(sizeof(_R) >= 16), long long, value_type>>; + static_assert(sizeof(_R) % sizeof(_W) == 0); + constexpr int __return_width = sizeof(_R) / sizeof(_W); + using _Up = __vector_type_t<_W, __return_width>; + const auto __x = __vector_bitcast<_W>(__in); +#else + constexpr int __return_width = _TVT::_S_full_size / _SplitBy; + using _Up = _R; + const __vector_type_t<value_type, _TVT::_S_full_size>& __x + = __in; // only needed for _Tp = _SimdWrapper<value_type, _Np> +#endif + constexpr int _O = _Offset * __return_width; + return __call_with_subscripts<__return_width, _O>( + __x, [](auto... 
__entries) { + return reinterpret_cast<_R>(_Up{__entries...}); + }); + } + } + +// }}} +// __lo/__hi64[z]{{{ +template <typename _Tp, + typename _R + = __vector_type8_t<typename _VectorTraits<_Tp>::value_type>> + _GLIBCXX_SIMD_INTRINSIC constexpr _R + __lo64(_Tp __x) + { + _R __r{}; + __builtin_memcpy(&__r, &__x, 8); + return __r; + } + +template <typename _Tp, + typename _R + = __vector_type8_t<typename _VectorTraits<_Tp>::value_type>> + _GLIBCXX_SIMD_INTRINSIC constexpr _R + __hi64(_Tp __x) + { + static_assert(sizeof(_Tp) == 16, "use __hi64z if you meant it"); + _R __r{}; + __builtin_memcpy(&__r, reinterpret_cast<const char*>(&__x) + 8, 8); + return __r; + } + +template <typename _Tp, + typename _R + = __vector_type8_t<typename _VectorTraits<_Tp>::value_type>> + _GLIBCXX_SIMD_INTRINSIC constexpr _R + __hi64z([[maybe_unused]] _Tp __x) + { + _R __r{}; + if constexpr (sizeof(_Tp) == 16) + __builtin_memcpy(&__r, reinterpret_cast<const char*>(&__x) + 8, 8); + return __r; + } + +// }}} +// __lo/__hi128{{{ +template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr auto + __lo128(_Tp __x) + { return __extract<0, sizeof(_Tp) / 16>(__x); } + +template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr auto + __hi128(_Tp __x) + { + static_assert(sizeof(__x) == 32); + return __extract<1, 2>(__x); + } + +// }}} +// __lo/__hi256{{{ +template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr auto + __lo256(_Tp __x) + { + static_assert(sizeof(__x) == 64); + return __extract<0, 2>(__x); + } + +template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr auto + __hi256(_Tp __x) + { + static_assert(sizeof(__x) == 64); + return __extract<1, 2>(__x); + } + +// }}} +// __auto_bitcast{{{ +template <typename _Tp> + struct _AutoCast + { + static_assert(__is_vector_type_v<_Tp>); + + const _Tp __x; + + template <typename _Up, typename _UVT = _VectorTraits<_Up>> + _GLIBCXX_SIMD_INTRINSIC constexpr operator _Up() const + { return __intrin_bitcast<typename _UVT::type>(__x); } + }; + +template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr _AutoCast<_Tp> + __auto_bitcast(const _Tp& __x) + { return {__x}; } + +template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC constexpr + _AutoCast<typename _SimdWrapper<_Tp, _Np>::_BuiltinType> + __auto_bitcast(const _SimdWrapper<_Tp, _Np>& __x) + { return {__x._M_data}; } + +// }}} +// ^^^ ---- builtin vector types [[gnu::vector_size(N)]] and operations ---- ^^^ + +#if _GLIBCXX_SIMD_HAVE_SSE_ABI +// __bool_storage_member_type{{{ +#if _GLIBCXX_SIMD_HAVE_AVX512F && _GLIBCXX_SIMD_X86INTRIN +template <size_t _Size> + struct __bool_storage_member_type + { + static_assert((_Size & (_Size - 1)) != 0, + "This trait may only be used for non-power-of-2 sizes. 
" + "Power-of-2 sizes must be specialized."); + using type = + typename __bool_storage_member_type<std::__bit_ceil(_Size)>::type; + }; + +template <> + struct __bool_storage_member_type<1> { using type = bool; }; + +template <> + struct __bool_storage_member_type<2> { using type = __mmask8; }; + +template <> + struct __bool_storage_member_type<4> { using type = __mmask8; }; + +template <> + struct __bool_storage_member_type<8> { using type = __mmask8; }; + +template <> + struct __bool_storage_member_type<16> { using type = __mmask16; }; + +template <> + struct __bool_storage_member_type<32> { using type = __mmask32; }; + +template <> + struct __bool_storage_member_type<64> { using type = __mmask64; }; +#endif // _GLIBCXX_SIMD_HAVE_AVX512F + +// }}} +// __intrinsic_type (x86){{{ +// the following excludes bool via __is_vectorizable +#if _GLIBCXX_SIMD_HAVE_SSE +template <typename _Tp, size_t _Bytes> + struct __intrinsic_type<_Tp, _Bytes, + enable_if_t<__is_vectorizable_v<_Tp> && _Bytes <= 64>> + { + static_assert(!is_same_v<_Tp, long double>, + "no __intrinsic_type support for long double on x86"); + + static constexpr size_t _S_VBytes = _Bytes <= 16 ? 16 + : _Bytes <= 32 ? 32 + : 64; + + using type [[__gnu__::__vector_size__(_S_VBytes)]] + = conditional_t<is_integral_v<_Tp>, long long int, _Tp>; + }; +#endif // _GLIBCXX_SIMD_HAVE_SSE + +// }}} +#endif // _GLIBCXX_SIMD_HAVE_SSE_ABI +// __intrinsic_type (ARM){{{ +#if _GLIBCXX_SIMD_HAVE_NEON +template <typename _Tp, size_t _Bytes> + struct __intrinsic_type<_Tp, _Bytes, + enable_if_t<__is_vectorizable_v<_Tp> && _Bytes <= 16>> + { + static constexpr int _S_VBytes = _Bytes <= 8 ? 8 : 16; + using _Ip = __int_for_sizeof_t<_Tp>; + using _Up = conditional_t< + is_floating_point_v<_Tp>, _Tp, + conditional_t<is_unsigned_v<_Tp>, make_unsigned_t<_Ip>, _Ip>>; + using type [[__gnu__::__vector_size__(_S_VBytes)]] = _Up; + }; +#endif // _GLIBCXX_SIMD_HAVE_NEON + +// }}} +// __intrinsic_type (PPC){{{ +#ifdef __ALTIVEC__ +template <typename _Tp> + struct __intrinsic_type_impl; + +#define _GLIBCXX_SIMD_PPC_INTRIN(_Tp) \ + template <> \ + struct __intrinsic_type_impl<_Tp> { using type = __vector _Tp; } +_GLIBCXX_SIMD_PPC_INTRIN(float); +_GLIBCXX_SIMD_PPC_INTRIN(double); +_GLIBCXX_SIMD_PPC_INTRIN(signed char); +_GLIBCXX_SIMD_PPC_INTRIN(unsigned char); +_GLIBCXX_SIMD_PPC_INTRIN(signed short); +_GLIBCXX_SIMD_PPC_INTRIN(unsigned short); +_GLIBCXX_SIMD_PPC_INTRIN(signed int); +_GLIBCXX_SIMD_PPC_INTRIN(unsigned int); +_GLIBCXX_SIMD_PPC_INTRIN(signed long); +_GLIBCXX_SIMD_PPC_INTRIN(unsigned long); +_GLIBCXX_SIMD_PPC_INTRIN(signed long long); +_GLIBCXX_SIMD_PPC_INTRIN(unsigned long long); +#undef _GLIBCXX_SIMD_PPC_INTRIN + +template <typename _Tp, size_t _Bytes> + struct __intrinsic_type<_Tp, _Bytes, + enable_if_t<__is_vectorizable_v<_Tp> && _Bytes <= 16>> + { + static_assert(!is_same_v<_Tp, long double>, + "no __intrinsic_type support for long double on PPC"); +#ifndef __VSX__ + static_assert(!is_same_v<_Tp, double>, + "no __intrinsic_type support for double on PPC w/o VSX"); +#endif +#ifndef __POWER8_VECTOR__ + static_assert( + !(is_integral_v<_Tp> && sizeof(_Tp) > 4), + "no __intrinsic_type support for integers larger than 4 Bytes " + "on PPC w/o POWER8 vectors"); +#endif + using type = typename __intrinsic_type_impl<conditional_t< + is_floating_point_v<_Tp>, _Tp, __int_for_sizeof_t<_Tp>>>::type; + }; +#endif // __ALTIVEC__ + +// }}} +// _SimdWrapper<bool>{{{1 +template <size_t _Width> + struct _SimdWrapper<bool, _Width, + void_t<typename 
__bool_storage_member_type<_Width>::type>> + { + using _BuiltinType = typename __bool_storage_member_type<_Width>::type; + using value_type = bool; + + static constexpr size_t _S_full_size = sizeof(_BuiltinType) * __CHAR_BIT__; + + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper<bool, _S_full_size> + __as_full_vector() const { return _M_data; } + + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper() = default; + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper(_BuiltinType __k) + : _M_data(__k) {}; + + _GLIBCXX_SIMD_INTRINSIC operator const _BuiltinType&() const + { return _M_data; } + + _GLIBCXX_SIMD_INTRINSIC operator _BuiltinType&() + { return _M_data; } + + _GLIBCXX_SIMD_INTRINSIC _BuiltinType __intrin() const + { return _M_data; } + + _GLIBCXX_SIMD_INTRINSIC constexpr value_type operator[](size_t __i) const + { return _M_data & (_BuiltinType(1) << __i); } + + template <size_t __i> + _GLIBCXX_SIMD_INTRINSIC constexpr value_type + operator[](_SizeConstant<__i>) const + { return _M_data & (_BuiltinType(1) << __i); } + + _GLIBCXX_SIMD_INTRINSIC constexpr void _M_set(size_t __i, value_type __x) + { + if (__x) + _M_data |= (_BuiltinType(1) << __i); + else + _M_data &= ~(_BuiltinType(1) << __i); + } + + _GLIBCXX_SIMD_INTRINSIC + constexpr bool _M_is_constprop() const + { return __builtin_constant_p(_M_data); } + + _GLIBCXX_SIMD_INTRINSIC constexpr bool _M_is_constprop_none_of() const + { + if (__builtin_constant_p(_M_data)) + { + constexpr int __nbits = sizeof(_BuiltinType) * __CHAR_BIT__; + constexpr _BuiltinType __active_mask + = ~_BuiltinType() >> (__nbits - _Width); + return (_M_data & __active_mask) == 0; + } + return false; + } + + _GLIBCXX_SIMD_INTRINSIC constexpr bool _M_is_constprop_all_of() const + { + if (__builtin_constant_p(_M_data)) + { + constexpr int __nbits = sizeof(_BuiltinType) * __CHAR_BIT__; + constexpr _BuiltinType __active_mask + = ~_BuiltinType() >> (__nbits - _Width); + return (_M_data & __active_mask) == __active_mask; + } + return false; + } + + _BuiltinType _M_data; + }; + +// _SimdWrapperBase{{{1 +template <bool _MustZeroInitPadding, typename _BuiltinType> + struct _SimdWrapperBase; + +template <typename _BuiltinType> + struct _SimdWrapperBase<false, _BuiltinType> // no padding or no SNaNs + { + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapperBase() = default; + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapperBase(_BuiltinType __init) + : _M_data(__init) + {} + + _BuiltinType _M_data; + }; + +template <typename _BuiltinType> + struct _SimdWrapperBase<true, _BuiltinType> // with padding that needs to + // never become SNaN + { + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapperBase() : _M_data() {} + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapperBase(_BuiltinType __init) + : _M_data(__init) + {} + + _BuiltinType _M_data; + }; + +// }}} +// _SimdWrapper{{{ +template <typename _Tp, size_t _Width> + struct _SimdWrapper< + _Tp, _Width, + void_t<__vector_type_t<_Tp, _Width>, __intrinsic_type_t<_Tp, _Width>>> + : _SimdWrapperBase<__has_iec559_behavior<__signaling_NaN, _Tp>::value + && sizeof(_Tp) * _Width + == sizeof(__vector_type_t<_Tp, _Width>), + __vector_type_t<_Tp, _Width>> + { + using _Base + = _SimdWrapperBase<__has_iec559_behavior<__signaling_NaN, _Tp>::value + && sizeof(_Tp) * _Width + == sizeof(__vector_type_t<_Tp, _Width>), + __vector_type_t<_Tp, _Width>>; + + static_assert(__is_vectorizable_v<_Tp>); + static_assert(_Width >= 2); // 1 doesn't make sense, use _Tp directly then + + using _BuiltinType = __vector_type_t<_Tp, _Width>; + using value_type = _Tp; + + static 
inline constexpr size_t _S_full_size + = sizeof(_BuiltinType) / sizeof(value_type); + static inline constexpr int _S_size = _Width; + static inline constexpr bool _S_is_partial = _S_full_size != _S_size; + + using _Base::_M_data; + + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper<_Tp, _S_full_size> + __as_full_vector() const + { return _M_data; } + + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper(initializer_list<_Tp> __init) + : _Base(__generate_from_n_evaluations<_Width, _BuiltinType>( + [&](auto __i) { return __init.begin()[__i.value]; })) {} + + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper() = default; + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper(const _SimdWrapper&) + = default; + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper(_SimdWrapper&&) = default; + + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper& + operator=(const _SimdWrapper&) = default; + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper& + operator=(_SimdWrapper&&) = default; + + template <typename _V, typename = enable_if_t<disjunction_v< + is_same<_V, __vector_type_t<_Tp, _Width>>, + is_same<_V, __intrinsic_type_t<_Tp, _Width>>>>> + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper(_V __x) + // __vector_bitcast can convert e.g. __m128 to __vector(2) float + : _Base(__vector_bitcast<_Tp, _Width>(__x)) {} + + template <typename... _As, + typename = enable_if_t<((is_same_v<simd_abi::scalar, _As> && ...) + && sizeof...(_As) <= _Width)>> + _GLIBCXX_SIMD_INTRINSIC constexpr + operator _SimdTuple<_Tp, _As...>() const + { + const auto& dd = _M_data; // workaround for GCC7 ICE + return __generate_from_n_evaluations<sizeof...(_As), + _SimdTuple<_Tp, _As...>>([&]( + auto __i) constexpr { return dd[int(__i)]; }); + } + + _GLIBCXX_SIMD_INTRINSIC constexpr operator const _BuiltinType&() const + { return _M_data; } + + _GLIBCXX_SIMD_INTRINSIC constexpr operator _BuiltinType&() + { return _M_data; } + + _GLIBCXX_SIMD_INTRINSIC constexpr _Tp operator[](size_t __i) const + { return _M_data[__i]; } + + template <size_t __i> + _GLIBCXX_SIMD_INTRINSIC constexpr _Tp operator[](_SizeConstant<__i>) const + { return _M_data[__i]; } + + _GLIBCXX_SIMD_INTRINSIC constexpr void _M_set(size_t __i, _Tp __x) + { _M_data[__i] = __x; } + + _GLIBCXX_SIMD_INTRINSIC + constexpr bool _M_is_constprop() const + { return __builtin_constant_p(_M_data); } + + _GLIBCXX_SIMD_INTRINSIC constexpr bool _M_is_constprop_none_of() const + { + if (__builtin_constant_p(_M_data)) + { + bool __r = true; + if constexpr (is_floating_point_v<_Tp>) + { + using _Ip = __int_for_sizeof_t<_Tp>; + const auto __intdata = __vector_bitcast<_Ip>(_M_data); + __execute_n_times<_Width>( + [&](auto __i) { __r &= __intdata[__i.value] == _Ip(); }); + } + else + __execute_n_times<_Width>( + [&](auto __i) { __r &= _M_data[__i.value] == _Tp(); }); + return __r; + } + return false; + } + + _GLIBCXX_SIMD_INTRINSIC constexpr bool _M_is_constprop_all_of() const + { + if (__builtin_constant_p(_M_data)) + { + bool __r = true; + if constexpr (is_floating_point_v<_Tp>) + { + using _Ip = __int_for_sizeof_t<_Tp>; + const auto __intdata = __vector_bitcast<_Ip>(_M_data); + __execute_n_times<_Width>( + [&](auto __i) { __r &= __intdata[__i.value] == ~_Ip(); }); + } + else + __execute_n_times<_Width>( + [&](auto __i) { __r &= _M_data[__i.value] == ~_Tp(); }); + return __r; + } + return false; + } + }; + +// }}} + +// __vectorized_sizeof {{{ +template <typename _Tp> + constexpr size_t + __vectorized_sizeof() + { + if constexpr (!__is_vectorizable_v<_Tp>) + return 0; + + if constexpr (sizeof(_Tp) <= 8) + { + // 
X86:
+      if constexpr (__have_avx512bw)
+        return 64;
+      if constexpr (__have_avx512f && sizeof(_Tp) >= 4)
+        return 64;
+      if constexpr (__have_avx2)
+        return 32;
+      if constexpr (__have_avx && is_floating_point_v<_Tp>)
+        return 32;
+      if constexpr (__have_sse2)
+        return 16;
+      if constexpr (__have_sse && is_same_v<_Tp, float>)
+        return 16;
+      /* The following is too much trouble because of mixed MMX and x87 code.
+       * While nothing here explicitly calls MMX instructions or registers,
+       * they are still emitted but no EMMS cleanup is done.
+      if constexpr (__have_mmx && sizeof(_Tp) <= 4 && is_integral_v<_Tp>)
+        return 8;
+       */
+
+      // PowerPC:
+      if constexpr (__have_power8vec
+                    || (__have_power_vmx && (sizeof(_Tp) < 8))
+                    || (__have_power_vsx && is_floating_point_v<_Tp>))
+        return 16;
+
+      // ARM:
+      if constexpr (__have_neon_a64
+                    || (__have_neon_a32 && !is_same_v<_Tp, double>))
+        return 16;
+      if constexpr (__have_neon
+                    && sizeof(_Tp) < 8
+                    // Only allow fp if the user allows non-IEC559 fp (e.g.
+                    // via -ffast-math). ARMv7 NEON fp does not conform to
+                    // IEC559.
+                    && (__support_neon_float || !is_floating_point_v<_Tp>))
+        return 16;
+    }
+
+    return sizeof(_Tp);
+  }
+
+// }}}
+namespace simd_abi {
+// most of simd_abi is defined in simd_detail.h
+template <typename _Tp>
+  inline constexpr int max_fixed_size
+    = (__have_avx512bw && sizeof(_Tp) == 1) ? 64 : 32;
+
+// compatible {{{
+#if defined __x86_64__ || defined __aarch64__
+template <typename _Tp>
+  using compatible = conditional_t<(sizeof(_Tp) <= 8), _VecBuiltin<16>, scalar>;
+#elif defined __ARM_NEON
+// FIXME: not sure, probably needs to be scalar (or dependent on the hard-float
+// ABI?)
+template <typename _Tp>
+  using compatible
+    = conditional_t<(sizeof(_Tp) < 8
+                     && (__support_neon_float || !is_floating_point_v<_Tp>)),
+                    _VecBuiltin<16>, scalar>;
+#else
+template <typename>
+  using compatible = scalar;
+#endif
+
+// }}}
+// native {{{
+template <typename _Tp>
+  constexpr auto
+  __determine_native_abi()
+  {
+    constexpr size_t __bytes = __vectorized_sizeof<_Tp>();
+    if constexpr (__bytes == sizeof(_Tp))
+      return static_cast<scalar*>(nullptr);
+    else if constexpr (__have_avx512vl || (__have_avx512f && __bytes == 64))
+      return static_cast<_VecBltnBtmsk<__bytes>*>(nullptr);
+    else
+      return static_cast<_VecBuiltin<__bytes>*>(nullptr);
+  }
+
+template <typename _Tp, typename = enable_if_t<__is_vectorizable_v<_Tp>>>
+  using native = remove_pointer_t<decltype(__determine_native_abi<_Tp>())>;
+
+// }}}
+// __default_abi {{{
+#if defined _GLIBCXX_SIMD_DEFAULT_ABI
+template <typename _Tp>
+  using __default_abi = _GLIBCXX_SIMD_DEFAULT_ABI<_Tp>;
+#else
+template <typename _Tp>
+  using __default_abi = compatible<_Tp>;
+#endif
+
+// }}}
+} // namespace simd_abi
+
+// traits {{{1
+// is_abi_tag {{{2
+template <typename _Tp, typename = void_t<>>
+  struct is_abi_tag : false_type {};
+
+template <typename _Tp>
+  struct is_abi_tag<_Tp, void_t<typename _Tp::_IsValidAbiTag>>
+  : public _Tp::_IsValidAbiTag {};
+
+template <typename _Tp>
+  inline constexpr bool is_abi_tag_v = is_abi_tag<_Tp>::value;
+
+// is_simd(_mask) {{{2
+template <typename _Tp>
+  struct is_simd : public false_type {};
+
+template <typename _Tp>
+  inline constexpr bool is_simd_v = is_simd<_Tp>::value;
+
+template <typename _Tp>
+  struct is_simd_mask : public false_type {};
+
+template <typename _Tp>
+  inline constexpr bool is_simd_mask_v = is_simd_mask<_Tp>::value;
+
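+// Example (illustrative): on an x86-64 machine with AVX2 but no AVX-512,
+// __vectorized_sizeof<float>() is 32, so simd_abi::native<float> is
+// _VecBuiltin<32> (8 floats per register), while simd_abi::compatible<float>
+// remains _VecBuiltin<16> (4 floats) so that TUs compiled with different -m
+// flags agree on one ABI.
+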
+// simd_size {{{2
+template <typename _Tp, typename _Abi, typename = void>
+  struct __simd_size_impl {};
+
+template <typename _Tp, typename _Abi>
+  struct __simd_size_impl<
+    _Tp, _Abi,
+    enable_if_t<conjunction_v<__is_vectorizable<_Tp>, is_abi_tag<_Abi>>>>
+  : _SizeConstant<_Abi::template _S_size<_Tp>> {};
+
+template <typename _Tp, typename _Abi = simd_abi::__default_abi<_Tp>>
+  struct simd_size : __simd_size_impl<_Tp, _Abi> {};
+
+template <typename _Tp, typename _Abi = simd_abi::__default_abi<_Tp>>
+  inline constexpr size_t simd_size_v = simd_size<_Tp, _Abi>::value;
+
+// simd_abi::deduce {{{2
+template <typename _Tp, size_t _Np, typename = void>
+  struct __deduce_impl;
+
+namespace simd_abi {
+/**
+ * @tparam _Tp The requested `value_type` for the elements.
+ * @tparam _Np The requested number of elements.
+ * @tparam _Abis This parameter is ignored, since this implementation cannot
+ * make any use of it. Either a good native ABI is matched and used as `type`
+ * alias, or the `fixed_size<_Np>` ABI is used, which internally is built from
+ * the best matching native ABIs.
+ */
+template <typename _Tp, size_t _Np, typename...>
+  struct deduce : __deduce_impl<_Tp, _Np> {};
+
+template <typename _Tp, size_t _Np, typename... _Abis>
+  using deduce_t = typename deduce<_Tp, _Np, _Abis...>::type;
+} // namespace simd_abi
+
+// }}}2
+// rebind_simd {{{2
+template <typename _Tp, typename _V, typename = void>
+  struct rebind_simd;
+
+template <typename _Tp, typename _Up, typename _Abi>
+  struct rebind_simd<
+    _Tp, simd<_Up, _Abi>,
+    void_t<simd_abi::deduce_t<_Tp, simd_size_v<_Up, _Abi>, _Abi>>>
+  {
+    using type
+      = simd<_Tp, simd_abi::deduce_t<_Tp, simd_size_v<_Up, _Abi>, _Abi>>;
+  };
+
+template <typename _Tp, typename _Up, typename _Abi>
+  struct rebind_simd<
+    _Tp, simd_mask<_Up, _Abi>,
+    void_t<simd_abi::deduce_t<_Tp, simd_size_v<_Up, _Abi>, _Abi>>>
+  {
+    using type
+      = simd_mask<_Tp, simd_abi::deduce_t<_Tp, simd_size_v<_Up, _Abi>, _Abi>>;
+  };
+
+template <typename _Tp, typename _V>
+  using rebind_simd_t = typename rebind_simd<_Tp, _V>::type;
+
+// resize_simd {{{2
+template <int _Np, typename _V, typename = void>
+  struct resize_simd;
+
+template <int _Np, typename _Tp, typename _Abi>
+  struct resize_simd<_Np, simd<_Tp, _Abi>,
+                     void_t<simd_abi::deduce_t<_Tp, _Np, _Abi>>>
+  { using type = simd<_Tp, simd_abi::deduce_t<_Tp, _Np, _Abi>>; };
+
+template <int _Np, typename _Tp, typename _Abi>
+  struct resize_simd<_Np, simd_mask<_Tp, _Abi>,
+                     void_t<simd_abi::deduce_t<_Tp, _Np, _Abi>>>
+  { using type = simd_mask<_Tp, simd_abi::deduce_t<_Tp, _Np, _Abi>>; };
+
+template <int _Np, typename _V>
+  using resize_simd_t = typename resize_simd<_Np, _V>::type;
+
+// }}}2
+// memory_alignment {{{2
+template <typename _Tp, typename _Up = typename _Tp::value_type>
+  struct memory_alignment
+  : public _SizeConstant<vector_aligned_tag::_S_alignment<_Tp, _Up>> {};
+
+template <typename _Tp, typename _Up = typename _Tp::value_type>
+  inline constexpr size_t memory_alignment_v = memory_alignment<_Tp, _Up>::value;
+
+// class template simd [simd] {{{1
+template <typename _Tp, typename _Abi = simd_abi::__default_abi<_Tp>>
+  class simd;
+
+template <typename _Tp, typename _Abi>
+  struct is_simd<simd<_Tp, _Abi>> : public true_type {};
+
+template <typename _Tp>
+  using native_simd = simd<_Tp, simd_abi::native<_Tp>>;
+
+template <typename _Tp, int _Np>
+  using fixed_size_simd = simd<_Tp, simd_abi::fixed_size<_Np>>;
+
+template <typename _Tp, size_t _Np>
+  using __deduced_simd = simd<_Tp, simd_abi::deduce_t<_Tp, _Np>>;
+
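+// Example (illustrative): rebind_simd swaps the value_type while keeping the
+// element count; resize_simd keeps the value_type while changing the count:
+//   using _V  = native_simd<float>;      // e.g. 8 elements with AVX2
+//   using _Vi = rebind_simd_t<int, _V>;  // 8 ints
+//   using _V4 = resize_simd_t<4, _V>;    // 4 floats
+//   static_assert(_Vi::size() == _V::size());
+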
+// class template simd_mask [simd_mask] {{{1
+template <typename _Tp, typename _Abi = simd_abi::__default_abi<_Tp>>
+  class simd_mask;
+
+template <typename _Tp, typename _Abi>
+  struct is_simd_mask<simd_mask<_Tp, _Abi>> : public true_type {};
+
+template <typename _Tp>
+  using native_simd_mask = simd_mask<_Tp, simd_abi::native<_Tp>>;
+
+template <typename _Tp, int _Np>
+  using fixed_size_simd_mask = simd_mask<_Tp, simd_abi::fixed_size<_Np>>;
+
+template <typename _Tp, size_t _Np>
+  using __deduced_simd_mask = simd_mask<_Tp, simd_abi::deduce_t<_Tp, _Np>>;
+
+// casts [simd.casts] {{{1
+// static_simd_cast {{{2
+template <typename _Tp, typename _Up, typename _Ap, bool = is_simd_v<_Tp>,
+          typename = void>
+  struct __static_simd_cast_return_type;
+
+template <typename _Tp, typename _A0, typename _Up, typename _Ap>
+  struct __static_simd_cast_return_type<simd_mask<_Tp, _A0>, _Up, _Ap, false,
+                                        void>
+  : __static_simd_cast_return_type<simd<_Tp, _A0>, _Up, _Ap> {};
+
+template <typename _Tp, typename _Up, typename _Ap>
+  struct __static_simd_cast_return_type<
+    _Tp, _Up, _Ap, true, enable_if_t<_Tp::size() == simd_size_v<_Up, _Ap>>>
+  { using type = _Tp; };
+
+template <typename _Tp, typename _Ap>
+  struct __static_simd_cast_return_type<_Tp, _Tp, _Ap, false,
+#ifdef _GLIBCXX_SIMD_FIX_P2TS_ISSUE66
+                                        enable_if_t<__is_vectorizable_v<_Tp>>
+#else
+                                        void
+#endif
+                                        >
+  { using type = simd<_Tp, _Ap>; };
+
+template <typename _Tp, typename = void>
+  struct __safe_make_signed { using type = _Tp; };
+
+template <typename _Tp>
+  struct __safe_make_signed<_Tp, enable_if_t<is_integral_v<_Tp>>>
+  {
+    // the extra make_unsigned_t is because of PR85951
+    using type = make_signed_t<make_unsigned_t<_Tp>>;
+  };
+
+template <typename _Tp>
+  using safe_make_signed_t = typename __safe_make_signed<_Tp>::type;
+
+template <typename _Tp, typename _Up, typename _Ap>
+  struct __static_simd_cast_return_type<_Tp, _Up, _Ap, false,
+#ifdef _GLIBCXX_SIMD_FIX_P2TS_ISSUE66
+                                        enable_if_t<__is_vectorizable_v<_Tp>>
+#else
+                                        void
+#endif
+                                        >
+  {
+    using type = conditional_t<
+      (is_integral_v<_Up> && is_integral_v<_Tp> &&
+#ifndef _GLIBCXX_SIMD_FIX_P2TS_ISSUE65
+       is_signed_v<_Up> != is_signed_v<_Tp> &&
+#endif
+       is_same_v<safe_make_signed_t<_Up>, safe_make_signed_t<_Tp>>),
+      simd<_Tp, _Ap>, fixed_size_simd<_Tp, simd_size_v<_Up, _Ap>>>;
+  };
+
+template <typename _Tp, typename _Up, typename _Ap,
+          typename _R
+            = typename __static_simd_cast_return_type<_Tp, _Up, _Ap>::type>
+  _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR _R
+  static_simd_cast(const simd<_Up, _Ap>& __x)
+  {
+    if constexpr (is_same<_R, simd<_Up, _Ap>>::value)
+      return __x;
+    else
+      {
+        _SimdConverter<_Up, _Ap, typename _R::value_type, typename _R::abi_type>
+          __c;
+        return _R(__private_init, __c(__data(__x)));
+      }
+  }
+
+namespace __proposed {
+template <typename _Tp, typename _Up, typename _Ap,
+          typename _R
+            = typename __static_simd_cast_return_type<_Tp, _Up, _Ap>::type>
+  _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR typename _R::mask_type
+  static_simd_cast(const simd_mask<_Up, _Ap>& __x)
+  {
+    using _RM = typename _R::mask_type;
+    return {__private_init, _RM::abi_type::_MaskImpl::template _S_convert<
+                              typename _RM::simd_type::value_type>(__x)};
+  }
+} // namespace __proposed
+
+// simd_cast {{{2
+template <typename _Tp, typename _Up, typename _Ap,
+          typename _To = __value_type_or_identity_t<_Tp>>
+  _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR auto
+  simd_cast(const simd<_ValuePreserving<_Up, _To>, _Ap>& __x)
+    -> decltype(static_simd_cast<_Tp>(__x))
+  { return static_simd_cast<_Tp>(__x); }
+
+namespace __proposed {
+template <typename _Tp, typename _Up, typename _Ap,
typename _To = __value_type_or_identity_t<_Tp>> + _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR auto + simd_cast(const simd_mask<_ValuePreserving<_Up, _To>, _Ap>& __x) + -> decltype(static_simd_cast<_Tp>(__x)) + { return static_simd_cast<_Tp>(__x); } +} // namespace __proposed + +// }}}2 +// resizing_simd_cast {{{ +namespace __proposed { +/* Proposed spec: + +template <class T, class U, class Abi> +T resizing_simd_cast(const simd<U, Abi>& x) + +p1 Constraints: + - is_simd_v<T> is true and + - T::value_type is the same type as U + +p2 Returns: + A simd object with the i^th element initialized to x[i] for all i in the + range of [0, min(T::size(), simd_size_v<U, Abi>)). If T::size() is larger + than simd_size_v<U, Abi>, the remaining elements are value-initialized. + +template <class T, class U, class Abi> +T resizing_simd_cast(const simd_mask<U, Abi>& x) + +p1 Constraints: is_simd_mask_v<T> is true + +p2 Returns: + A simd_mask object with the i^th element initialized to x[i] for all i in +the range of [0, min(T::size(), simd_size_v<U, Abi>)). If T::size() is larger + than simd_size_v<U, Abi>, the remaining elements are initialized to false. + + */ + +template <typename _Tp, typename _Up, typename _Ap> + _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR enable_if_t< + conjunction_v<is_simd<_Tp>, is_same<typename _Tp::value_type, _Up>>, _Tp> + resizing_simd_cast(const simd<_Up, _Ap>& __x) + { + if constexpr (is_same_v<typename _Tp::abi_type, _Ap>) + return __x; + else if constexpr (simd_size_v<_Up, _Ap> == 1) + { + _Tp __r{}; + __r[0] = __x[0]; + return __r; + } + else if constexpr (_Tp::size() == 1) + return __x[0]; + else if constexpr (sizeof(_Tp) == sizeof(__x) + && !__is_fixed_size_abi_v<_Ap>) + return {__private_init, + __vector_bitcast<typename _Tp::value_type, _Tp::size()>( + _Ap::_S_masked(__data(__x))._M_data)}; + else + { + _Tp __r{}; + __builtin_memcpy(&__data(__r), &__data(__x), + sizeof(_Up) + * std::min(_Tp::size(), simd_size_v<_Up, _Ap>)); + return __r; + } + } + +template <typename _Tp, typename _Up, typename _Ap> + _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR + enable_if_t<is_simd_mask_v<_Tp>, _Tp> + resizing_simd_cast(const simd_mask<_Up, _Ap>& __x) + { + return {__private_init, _Tp::abi_type::_MaskImpl::template _S_convert< + typename _Tp::simd_type::value_type>(__x)}; + } +} // namespace __proposed + +// }}} +// to_fixed_size {{{2 +template <typename _Tp, int _Np> + _GLIBCXX_SIMD_INTRINSIC fixed_size_simd<_Tp, _Np> + to_fixed_size(const fixed_size_simd<_Tp, _Np>& __x) + { return __x; } + +template <typename _Tp, int _Np> + _GLIBCXX_SIMD_INTRINSIC fixed_size_simd_mask<_Tp, _Np> + to_fixed_size(const fixed_size_simd_mask<_Tp, _Np>& __x) + { return __x; } + +template <typename _Tp, typename _Ap> + _GLIBCXX_SIMD_INTRINSIC auto + to_fixed_size(const simd<_Tp, _Ap>& __x) + { + return simd<_Tp, simd_abi::fixed_size<simd_size_v<_Tp, _Ap>>>([&__x]( + auto __i) constexpr { return __x[__i]; }); + } + +template <typename _Tp, typename _Ap> + _GLIBCXX_SIMD_INTRINSIC auto + to_fixed_size(const simd_mask<_Tp, _Ap>& __x) + { + constexpr int _Np = simd_mask<_Tp, _Ap>::size(); + fixed_size_simd_mask<_Tp, _Np> __r; + __execute_n_times<_Np>([&](auto __i) constexpr { __r[__i] = __x[__i]; }); + return __r; + } + +// to_native {{{2 +template <typename _Tp, int _Np> + _GLIBCXX_SIMD_INTRINSIC + enable_if_t<(_Np == native_simd<_Tp>::size()), native_simd<_Tp>> + to_native(const fixed_size_simd<_Tp, _Np>& __x) + { + alignas(memory_alignment_v<native_simd<_Tp>>) _Tp __mem[_Np]; + __x.copy_to(__mem, 
vector_aligned); + return {__mem, vector_aligned}; + } + +template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC + enable_if_t<(_Np == native_simd_mask<_Tp>::size()), native_simd_mask<_Tp>> + to_native(const fixed_size_simd_mask<_Tp, _Np>& __x) + { + return native_simd_mask<_Tp>([&](auto __i) constexpr { return __x[__i]; }); + } + +// to_compatible {{{2 +template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC enable_if_t<(_Np == simd<_Tp>::size()), simd<_Tp>> + to_compatible(const simd<_Tp, simd_abi::fixed_size<_Np>>& __x) + { + alignas(memory_alignment_v<simd<_Tp>>) _Tp __mem[_Np]; + __x.copy_to(__mem, vector_aligned); + return {__mem, vector_aligned}; + } + +template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC + enable_if_t<(_Np == simd_mask<_Tp>::size()), simd_mask<_Tp>> + to_compatible(const simd_mask<_Tp, simd_abi::fixed_size<_Np>>& __x) + { return simd_mask<_Tp>([&](auto __i) constexpr { return __x[__i]; }); } + +// masked assignment [simd_mask.where] {{{1 + +// where_expression {{{1 +// const_where_expression<M, T> {{{2 +template <typename _M, typename _Tp> + class const_where_expression + { + using _V = _Tp; + static_assert(is_same_v<_V, __remove_cvref_t<_Tp>>); + + struct _Wrapper { using value_type = _V; }; + + protected: + using _Impl = typename _V::_Impl; + + using value_type = + typename conditional_t<is_arithmetic_v<_V>, _Wrapper, _V>::value_type; + + _GLIBCXX_SIMD_INTRINSIC friend const _M& + __get_mask(const const_where_expression& __x) + { return __x._M_k; } + + _GLIBCXX_SIMD_INTRINSIC friend const _Tp& + __get_lvalue(const const_where_expression& __x) + { return __x._M_value; } + + const _M& _M_k; + _Tp& _M_value; + + public: + const_where_expression(const const_where_expression&) = delete; + const_where_expression& operator=(const const_where_expression&) = delete; + + _GLIBCXX_SIMD_INTRINSIC const_where_expression(const _M& __kk, const _Tp& dd) + : _M_k(__kk), _M_value(const_cast<_Tp&>(dd)) {} + + _GLIBCXX_SIMD_INTRINSIC _V + operator-() const&& + { + return {__private_init, + _Impl::template _S_masked_unary<negate>(__data(_M_k), + __data(_M_value))}; + } + + template <typename _Up, typename _Flags> + [[nodiscard]] _GLIBCXX_SIMD_INTRINSIC _V + copy_from(const _LoadStorePtr<_Up, value_type>* __mem, _Flags) const&& + { + return {__private_init, + _Impl::_S_masked_load(__data(_M_value), __data(_M_k), + _Flags::template _S_apply<_V>(__mem))}; + } + + template <typename _Up, typename _Flags> + _GLIBCXX_SIMD_INTRINSIC void + copy_to(_LoadStorePtr<_Up, value_type>* __mem, _Flags) const&& + { + _Impl::_S_masked_store(__data(_M_value), + _Flags::template _S_apply<_V>(__mem), + __data(_M_k)); + } + }; + +// const_where_expression<bool, T> {{{2 +template <typename _Tp> + class const_where_expression<bool, _Tp> + { + using _M = bool; + using _V = _Tp; + + static_assert(is_same_v<_V, __remove_cvref_t<_Tp>>); + + struct _Wrapper { using value_type = _V; }; + + protected: + using value_type = + typename conditional_t<is_arithmetic_v<_V>, _Wrapper, _V>::value_type; + + _GLIBCXX_SIMD_INTRINSIC friend const _M& + __get_mask(const const_where_expression& __x) + { return __x._M_k; } + + _GLIBCXX_SIMD_INTRINSIC friend const _Tp& + __get_lvalue(const const_where_expression& __x) + { return __x._M_value; } + + const bool _M_k; + _Tp& _M_value; + + public: + const_where_expression(const const_where_expression&) = delete; + const_where_expression& operator=(const const_where_expression&) = delete; + + _GLIBCXX_SIMD_INTRINSIC const_where_expression(const bool __kk, const 
_Tp& dd) + : _M_k(__kk), _M_value(const_cast<_Tp&>(dd)) {} + + _GLIBCXX_SIMD_INTRINSIC _V operator-() const&& + { return _M_k ? -_M_value : _M_value; } + + template <typename _Up, typename _Flags> + [[nodiscard]] _GLIBCXX_SIMD_INTRINSIC _V + copy_from(const _LoadStorePtr<_Up, value_type>* __mem, _Flags) const&& + { return _M_k ? static_cast<_V>(__mem[0]) : _M_value; } + + template <typename _Up, typename _Flags> + _GLIBCXX_SIMD_INTRINSIC void + copy_to(_LoadStorePtr<_Up, value_type>* __mem, _Flags) const&& + { + if (_M_k) + __mem[0] = _M_value; + } + }; + +// where_expression<M, T> {{{2 +template <typename _M, typename _Tp> + class where_expression : public const_where_expression<_M, _Tp> + { + using _Impl = typename const_where_expression<_M, _Tp>::_Impl; + + static_assert(!is_const<_Tp>::value, + "where_expression may only be instantiated with a non-const " + "_Tp parameter"); + + using typename const_where_expression<_M, _Tp>::value_type; + using const_where_expression<_M, _Tp>::_M_k; + using const_where_expression<_M, _Tp>::_M_value; + + static_assert( + is_same<typename _M::abi_type, typename _Tp::abi_type>::value, ""); + static_assert(_M::size() == _Tp::size(), ""); + + _GLIBCXX_SIMD_INTRINSIC friend _Tp& __get_lvalue(where_expression& __x) + { return __x._M_value; } + + public: + where_expression(const where_expression&) = delete; + where_expression& operator=(const where_expression&) = delete; + + _GLIBCXX_SIMD_INTRINSIC where_expression(const _M& __kk, _Tp& dd) + : const_where_expression<_M, _Tp>(__kk, dd) {} + + template <typename _Up> + _GLIBCXX_SIMD_INTRINSIC void operator=(_Up&& __x) && + { + _Impl::_S_masked_assign(__data(_M_k), __data(_M_value), + __to_value_type_or_member_type<_Tp>( + static_cast<_Up&&>(__x))); + } + +#define _GLIBCXX_SIMD_OP_(__op, __name) \ + template <typename _Up> \ + _GLIBCXX_SIMD_INTRINSIC void operator __op##=(_Up&& __x)&& \ + { \ + _Impl::template _S_masked_cassign( \ + __data(_M_k), __data(_M_value), \ + __to_value_type_or_member_type<_Tp>(static_cast<_Up&&>(__x)), \ + [](auto __impl, auto __lhs, auto __rhs) constexpr { \ + return __impl.__name(__lhs, __rhs); \ + }); \ + } \ + static_assert(true) + _GLIBCXX_SIMD_OP_(+, _S_plus); + _GLIBCXX_SIMD_OP_(-, _S_minus); + _GLIBCXX_SIMD_OP_(*, _S_multiplies); + _GLIBCXX_SIMD_OP_(/, _S_divides); + _GLIBCXX_SIMD_OP_(%, _S_modulus); + _GLIBCXX_SIMD_OP_(&, _S_bit_and); + _GLIBCXX_SIMD_OP_(|, _S_bit_or); + _GLIBCXX_SIMD_OP_(^, _S_bit_xor); + _GLIBCXX_SIMD_OP_(<<, _S_shift_left); + _GLIBCXX_SIMD_OP_(>>, _S_shift_right); +#undef _GLIBCXX_SIMD_OP_ + + _GLIBCXX_SIMD_INTRINSIC void operator++() && + { + __data(_M_value) + = _Impl::template _S_masked_unary<__increment>(__data(_M_k), + __data(_M_value)); + } + + _GLIBCXX_SIMD_INTRINSIC void operator++(int) && + { + __data(_M_value) + = _Impl::template _S_masked_unary<__increment>(__data(_M_k), + __data(_M_value)); + } + + _GLIBCXX_SIMD_INTRINSIC void operator--() && + { + __data(_M_value) + = _Impl::template _S_masked_unary<__decrement>(__data(_M_k), + __data(_M_value)); + } + + _GLIBCXX_SIMD_INTRINSIC void operator--(int) && + { + __data(_M_value) + = _Impl::template _S_masked_unary<__decrement>(__data(_M_k), + __data(_M_value)); + } + + // intentionally hides const_where_expression::copy_from + template <typename _Up, typename _Flags> + _GLIBCXX_SIMD_INTRINSIC void + copy_from(const _LoadStorePtr<_Up, value_type>* __mem, _Flags) && + { + __data(_M_value) + = _Impl::_S_masked_load(__data(_M_value), __data(_M_k), + _Flags::template _S_apply<_Tp>(__mem)); + } + }; + 
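+// Example (illustrative only; `v` and the literals below are hypothetical +// user code, not part of this patch): masked assignment through where(): +//   native_simd<float> v = ...; +//   where(v < 0.f, v) = 0.f;    // set all negative elements to zero +//   where(v > 1.f, v) *= 0.5f;  // compound ops apply only to the elements +//                               // selected by the mask +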
+// where_expression<bool, T> {{{2 +template <typename _Tp> + class where_expression<bool, _Tp> : public const_where_expression<bool, _Tp> + { + using _M = bool; + using typename const_where_expression<_M, _Tp>::value_type; + using const_where_expression<_M, _Tp>::_M_k; + using const_where_expression<_M, _Tp>::_M_value; + + public: + where_expression(const where_expression&) = delete; + where_expression& operator=(const where_expression&) = delete; + + _GLIBCXX_SIMD_INTRINSIC where_expression(const _M& __kk, _Tp& dd) + : const_where_expression<_M, _Tp>(__kk, dd) {} + +#define _GLIBCXX_SIMD_OP_(__op) \ + template <typename _Up> \ + _GLIBCXX_SIMD_INTRINSIC void operator __op(_Up&& __x)&& \ + { if (_M_k) _M_value __op static_cast<_Up&&>(__x); } + + _GLIBCXX_SIMD_OP_(=) + _GLIBCXX_SIMD_OP_(+=) + _GLIBCXX_SIMD_OP_(-=) + _GLIBCXX_SIMD_OP_(*=) + _GLIBCXX_SIMD_OP_(/=) + _GLIBCXX_SIMD_OP_(%=) + _GLIBCXX_SIMD_OP_(&=) + _GLIBCXX_SIMD_OP_(|=) + _GLIBCXX_SIMD_OP_(^=) + _GLIBCXX_SIMD_OP_(<<=) + _GLIBCXX_SIMD_OP_(>>=) + #undef _GLIBCXX_SIMD_OP_ + + _GLIBCXX_SIMD_INTRINSIC void operator++() && + { if (_M_k) ++_M_value; } + + _GLIBCXX_SIMD_INTRINSIC void operator++(int) && + { if (_M_k) ++_M_value; } + + _GLIBCXX_SIMD_INTRINSIC void operator--() && + { if (_M_k) --_M_value; } + + _GLIBCXX_SIMD_INTRINSIC void operator--(int) && + { if (_M_k) --_M_value; } + + // intentionally hides const_where_expression::copy_from + template <typename _Up, typename _Flags> + _GLIBCXX_SIMD_INTRINSIC void + copy_from(const _LoadStorePtr<_Up, value_type>* __mem, _Flags) && + { if (_M_k) _M_value = __mem[0]; } + }; + +// where {{{1 +template <typename _Tp, typename _Ap> + _GLIBCXX_SIMD_INTRINSIC where_expression<simd_mask<_Tp, _Ap>, simd<_Tp, _Ap>> + where(const typename simd<_Tp, _Ap>::mask_type& __k, simd<_Tp, _Ap>& __value) + { return {__k, __value}; } + +template <typename _Tp, typename _Ap> + _GLIBCXX_SIMD_INTRINSIC + const_where_expression<simd_mask<_Tp, _Ap>, simd<_Tp, _Ap>> + where(const typename simd<_Tp, _Ap>::mask_type& __k, + const simd<_Tp, _Ap>& __value) + { return {__k, __value}; } + +template <typename _Tp, typename _Ap> + _GLIBCXX_SIMD_INTRINSIC + where_expression<simd_mask<_Tp, _Ap>, simd_mask<_Tp, _Ap>> + where(const remove_const_t<simd_mask<_Tp, _Ap>>& __k, + simd_mask<_Tp, _Ap>& __value) + { return {__k, __value}; } + +template <typename _Tp, typename _Ap> + _GLIBCXX_SIMD_INTRINSIC + const_where_expression<simd_mask<_Tp, _Ap>, simd_mask<_Tp, _Ap>> + where(const remove_const_t<simd_mask<_Tp, _Ap>>& __k, + const simd_mask<_Tp, _Ap>& __value) + { return {__k, __value}; } + +template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC where_expression<bool, _Tp> + where(_ExactBool __k, _Tp& __value) + { return {__k, __value}; } + +template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC const_where_expression<bool, _Tp> + where(_ExactBool __k, const _Tp& __value) + { return {__k, __value}; } + + template <typename _Tp, typename _Ap> + void where(bool __k, simd<_Tp, _Ap>& __value) = delete; + + template <typename _Tp, typename _Ap> + void where(bool __k, const simd<_Tp, _Ap>& __value) = delete; + +// proposed mask iterations {{{1 +namespace __proposed { +template <size_t _Np> + class where_range + { + const bitset<_Np> __bits; + + public: + where_range(bitset<_Np> __b) : __bits(__b) {} + + class iterator + { + size_t __mask; + size_t __bit; + + _GLIBCXX_SIMD_INTRINSIC void __next_bit() + { __bit = __builtin_ctzl(__mask); } + + _GLIBCXX_SIMD_INTRINSIC void __reset_lsb() + { + // 01100100 - 1 = 01100011 + __mask &= (__mask - 1); + 
// __asm__("btr %1,%0" : "+r"(__mask) : "r"(__bit)); + } + + public: + iterator(decltype(__mask) __m) : __mask(__m) { __next_bit(); } + iterator(const iterator&) = default; + iterator(iterator&&) = default; + + _GLIBCXX_SIMD_ALWAYS_INLINE size_t operator->() const + { return __bit; } + + _GLIBCXX_SIMD_ALWAYS_INLINE size_t operator*() const + { return __bit; } + + _GLIBCXX_SIMD_ALWAYS_INLINE iterator& operator++() + { + __reset_lsb(); + __next_bit(); + return *this; + } + + _GLIBCXX_SIMD_ALWAYS_INLINE iterator operator++(int) + { + iterator __tmp = *this; + __reset_lsb(); + __next_bit(); + return __tmp; + } + + _GLIBCXX_SIMD_ALWAYS_INLINE bool operator==(const iterator& __rhs) const + { return __mask == __rhs.__mask; } + + _GLIBCXX_SIMD_ALWAYS_INLINE bool operator!=(const iterator& __rhs) const + { return __mask != __rhs.__mask; } + }; + + iterator begin() const + { return __bits.to_ullong(); } + + iterator end() const + { return 0; } + }; + +template <typename _Tp, typename _Ap> + where_range<simd_size_v<_Tp, _Ap>> + where(const simd_mask<_Tp, _Ap>& __k) + { return __k.__to_bitset(); } + +} // namespace __proposed + +// }}}1 +// reductions [simd.reductions] {{{1 + template <typename _Tp, typename _Abi, typename _BinaryOperation = plus<>> + _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR _Tp + reduce(const simd<_Tp, _Abi>& __v, + _BinaryOperation __binary_op = _BinaryOperation()) + { return _Abi::_SimdImpl::_S_reduce(__v, __binary_op); } + +template <typename _M, typename _V, typename _BinaryOperation = plus<>> + _GLIBCXX_SIMD_INTRINSIC typename _V::value_type + reduce(const const_where_expression<_M, _V>& __x, + typename _V::value_type __identity_element, + _BinaryOperation __binary_op) + { + if (__builtin_expect(none_of(__get_mask(__x)), false)) + return __identity_element; + + _V __tmp = __identity_element; + _V::_Impl::_S_masked_assign(__data(__get_mask(__x)), __data(__tmp), + __data(__get_lvalue(__x))); + return reduce(__tmp, __binary_op); + } + +template <typename _M, typename _V> + _GLIBCXX_SIMD_INTRINSIC typename _V::value_type + reduce(const const_where_expression<_M, _V>& __x, plus<> __binary_op = {}) + { return reduce(__x, 0, __binary_op); } + +template <typename _M, typename _V> + _GLIBCXX_SIMD_INTRINSIC typename _V::value_type + reduce(const const_where_expression<_M, _V>& __x, multiplies<> __binary_op) + { return reduce(__x, 1, __binary_op); } + +template <typename _M, typename _V> + _GLIBCXX_SIMD_INTRINSIC typename _V::value_type + reduce(const const_where_expression<_M, _V>& __x, bit_and<> __binary_op) + { return reduce(__x, ~typename _V::value_type(), __binary_op); } + +template <typename _M, typename _V> + _GLIBCXX_SIMD_INTRINSIC typename _V::value_type + reduce(const const_where_expression<_M, _V>& __x, bit_or<> __binary_op) + { return reduce(__x, 0, __binary_op); } + +template <typename _M, typename _V> + _GLIBCXX_SIMD_INTRINSIC typename _V::value_type + reduce(const const_where_expression<_M, _V>& __x, bit_xor<> __binary_op) + { return reduce(__x, 0, __binary_op); } + +// }}}1 +// algorithms [simd.alg] {{{ +template <typename _Tp, typename _Ap> + _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR simd<_Tp, _Ap> + min(const simd<_Tp, _Ap>& __a, const simd<_Tp, _Ap>& __b) + { return {__private_init, _Ap::_SimdImpl::_S_min(__data(__a), __data(__b))}; } + +template <typename _Tp, typename _Ap> + _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR simd<_Tp, _Ap> + max(const simd<_Tp, _Ap>& __a, const simd<_Tp, _Ap>& __b) + { return {__private_init, 
_Ap::_SimdImpl::_S_max(__data(__a), __data(__b))}; } + +template <typename _Tp, typename _Ap> + _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR + pair<simd<_Tp, _Ap>, simd<_Tp, _Ap>> + minmax(const simd<_Tp, _Ap>& __a, const simd<_Tp, _Ap>& __b) + { + const auto pair_of_members + = _Ap::_SimdImpl::_S_minmax(__data(__a), __data(__b)); + return {simd<_Tp, _Ap>(__private_init, pair_of_members.first), + simd<_Tp, _Ap>(__private_init, pair_of_members.second)}; + } + +template <typename _Tp, typename _Ap> + _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR simd<_Tp, _Ap> + clamp(const simd<_Tp, _Ap>& __v, const simd<_Tp, _Ap>& __lo, + const simd<_Tp, _Ap>& __hi) + { + using _Impl = typename _Ap::_SimdImpl; + return {__private_init, + _Impl::_S_min(__data(__hi), + _Impl::_S_max(__data(__lo), __data(__v)))}; + } + +// }}} + +template <size_t... _Sizes, typename _Tp, typename _Ap, + typename = enable_if_t<((_Sizes + ...) == simd<_Tp, _Ap>::size())>> + inline tuple<simd<_Tp, simd_abi::deduce_t<_Tp, _Sizes>>...> + split(const simd<_Tp, _Ap>&); + +// __extract_part {{{ +template <int _Index, int _Total, int _Combine = 1, typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_CONST + _SimdWrapper<_Tp, _Np / _Total * _Combine> + __extract_part(const _SimdWrapper<_Tp, _Np> __x); + +template <int Index, int Parts, int _Combine = 1, typename _Tp, typename _A0, + typename... _As> + _GLIBCXX_SIMD_INTRINSIC auto + __extract_part(const _SimdTuple<_Tp, _A0, _As...>& __x); + +// }}} +// _SizeList {{{ +template <size_t _V0, size_t... _Values> + struct _SizeList + { + template <size_t _I> + static constexpr size_t _S_at(_SizeConstant<_I> = {}) + { + if constexpr (_I == 0) + return _V0; + else + return _SizeList<_Values...>::template _S_at<_I - 1>(); + } + + template <size_t _I> + static constexpr auto _S_before(_SizeConstant<_I> = {}) + { + if constexpr (_I == 0) + return _SizeConstant<0>(); + else + return _SizeConstant< + _V0 + _SizeList<_Values...>::template _S_before<_I - 1>()>(); + } + + template <size_t _Np> + static constexpr auto _S_pop_front(_SizeConstant<_Np> = {}) + { + if constexpr (_Np == 0) + return _SizeList(); + else + return _SizeList<_Values...>::template _S_pop_front<_Np - 1>(); + } + }; + +// }}} +// __extract_center {{{ +template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC _SimdWrapper<_Tp, _Np / 2> + __extract_center(_SimdWrapper<_Tp, _Np> __x) + { + static_assert(_Np >= 4); + static_assert(_Np % 4 == 0); // x0 - x1 - x2 - x3 -> return {x1, x2} +#if _GLIBCXX_SIMD_X86INTRIN // {{{ + if constexpr (__have_avx512f && sizeof(_Tp) * _Np == 64) + { + const auto __intrin = __to_intrin(__x); + if constexpr (is_integral_v<_Tp>) + return __vector_bitcast<_Tp>(_mm512_castsi512_si256( + _mm512_shuffle_i32x4(__intrin, __intrin, + 1 + 2 * 0x4 + 2 * 0x10 + 3 * 0x40))); + else if constexpr (sizeof(_Tp) == 4) + return __vector_bitcast<_Tp>(_mm512_castps512_ps256( + _mm512_shuffle_f32x4(__intrin, __intrin, + 1 + 2 * 0x4 + 2 * 0x10 + 3 * 0x40))); + else if constexpr (sizeof(_Tp) == 8) + return __vector_bitcast<_Tp>(_mm512_castpd512_pd256( + _mm512_shuffle_f64x2(__intrin, __intrin, + 1 + 2 * 0x4 + 2 * 0x10 + 3 * 0x40))); + else + __assert_unreachable<_Tp>(); + } + else if constexpr (sizeof(_Tp) * _Np == 32 && is_floating_point_v<_Tp>) + return __vector_bitcast<_Tp>( + _mm_shuffle_pd(__lo128(__vector_bitcast<double>(__x)), + __hi128(__vector_bitcast<double>(__x)), 1)); + else if constexpr (sizeof(__x) == 32 && sizeof(_Tp) * _Np <= 32) + return __vector_bitcast<_Tp>( + 
_mm_alignr_epi8(__hi128(__vector_bitcast<_LLong>(__x)), + __lo128(__vector_bitcast<_LLong>(__x)), + sizeof(_Tp) * _Np / 4)); + else +#endif // _GLIBCXX_SIMD_X86INTRIN }}} + { + __vector_type_t<_Tp, _Np / 2> __r; + __builtin_memcpy(&__r, + reinterpret_cast<const char*>(&__x) + + sizeof(_Tp) * _Np / 4, + sizeof(_Tp) * _Np / 2); + return __r; + } + } + +template <typename _Tp, typename _A0, typename... _As> + _GLIBCXX_SIMD_INTRINSIC + _SimdWrapper<_Tp, _SimdTuple<_Tp, _A0, _As...>::_S_size() / 2> + __extract_center(const _SimdTuple<_Tp, _A0, _As...>& __x) + { + if constexpr (sizeof...(_As) == 0) + return __extract_center(__x.first); + else + return __extract_part<1, 4, 2>(__x); + } + +// }}} +// __split_wrapper {{{ +template <size_t... _Sizes, typename _Tp, typename... _As> + auto + __split_wrapper(_SizeList<_Sizes...>, const _SimdTuple<_Tp, _As...>& __x) + { + return split<_Sizes...>( + fixed_size_simd<_Tp, _SimdTuple<_Tp, _As...>::_S_size()>(__private_init, + __x)); + } + +// }}} + +// split<simd>(simd) {{{ +template <typename _V, typename _Ap, + size_t Parts = simd_size_v<typename _V::value_type, _Ap> / _V::size()> + enable_if_t<simd_size_v<typename _V::value_type, _Ap> == Parts * _V::size() + && is_simd_v<_V>, array<_V, Parts>> + split(const simd<typename _V::value_type, _Ap>& __x) + { + using _Tp = typename _V::value_type; + if constexpr (Parts == 1) + { + return {simd_cast<_V>(__x)}; + } + else if (__x._M_is_constprop()) + { + return __generate_from_n_evaluations<Parts, array<_V, Parts>>([&]( + auto __i) constexpr { + return _V([&](auto __j) constexpr { + return __x[__i * _V::size() + __j]; + }); + }); + } + else if constexpr ( + __is_fixed_size_abi_v<_Ap> + && (is_same_v<typename _V::abi_type, simd_abi::scalar> + || (__is_fixed_size_abi_v<typename _V::abi_type> + && sizeof(_V) == sizeof(_Tp) * _V::size() // _V doesn't have padding + ))) + { + // fixed_size -> fixed_size (w/o padding) or scalar +#ifdef _GLIBCXX_SIMD_USE_ALIASING_LOADS + const __may_alias<_Tp>* const __element_ptr + = reinterpret_cast<const __may_alias<_Tp>*>(&__data(__x)); + return __generate_from_n_evaluations<Parts, array<_V, Parts>>([&]( + auto __i) constexpr { + return _V(__element_ptr + __i * _V::size(), vector_aligned); + }); +#else + const auto& __xx = __data(__x); + return __generate_from_n_evaluations<Parts, array<_V, Parts>>([&]( + auto __i) constexpr { + [[maybe_unused]] constexpr size_t __offset + = decltype(__i)::value * _V::size(); + return _V([&](auto __j) constexpr { + constexpr _SizeConstant<__j + __offset> __k; + return __xx[__k]; + }); + }); +#endif + } + else if constexpr (is_same_v<typename _V::abi_type, simd_abi::scalar>) + { + // normally memcpy should work here as well + return __generate_from_n_evaluations<Parts, array<_V, Parts>>([&]( + auto __i) constexpr { return __x[__i]; }); + } + else + { + return __generate_from_n_evaluations<Parts, array<_V, Parts>>([&]( + auto __i) constexpr { + if constexpr (__is_fixed_size_abi_v<typename _V::abi_type>) + return _V([&](auto __j) constexpr { + return __x[__i * _V::size() + __j]; + }); + else + return _V(__private_init, + __extract_part<decltype(__i)::value, Parts>(__data(__x))); + }); + } + } + +// }}} +// split<simd_mask>(simd_mask) {{{ +template <typename _V, typename _Ap, + size_t _Parts + = simd_size_v<typename _V::simd_type::value_type, _Ap> / _V::size()> + enable_if_t<is_simd_mask_v<_V> && simd_size_v<typename + _V::simd_type::value_type, _Ap> == _Parts * _V::size(), array<_V, _Parts>> + split(const simd_mask<typename _V::simd_type::value_type, 
_Ap>& __x) + { + if constexpr (is_same_v<_Ap, typename _V::abi_type>) + return {__x}; + else if constexpr (_Parts == 1) + return {__proposed::static_simd_cast<_V>(__x)}; + else if constexpr (_Parts == 2 && __is_sse_abi<typename _V::abi_type>() + && __is_avx_abi<_Ap>()) + return {_V(__private_init, __lo128(__data(__x))), + _V(__private_init, __hi128(__data(__x)))}; + else if constexpr (_V::size() <= __CHAR_BIT__ * sizeof(_ULLong)) + { + const bitset __bits = __x.__to_bitset(); + return __generate_from_n_evaluations<_Parts, array<_V, _Parts>>([&]( + auto __i) constexpr { + constexpr size_t __offset = __i * _V::size(); + return _V(__bitset_init, (__bits >> __offset).to_ullong()); + }); + } + else + { + return __generate_from_n_evaluations<_Parts, array<_V, _Parts>>([&]( + auto __i) constexpr { + constexpr size_t __offset = __i * _V::size(); + return _V( + __private_init, [&](auto __j) constexpr { + return __x[__j + __offset]; + }); + }); + } + } + +// }}} +// split<_Sizes...>(simd) {{{ +template <size_t... _Sizes, typename _Tp, typename _Ap, typename> + _GLIBCXX_SIMD_ALWAYS_INLINE + tuple<simd<_Tp, simd_abi::deduce_t<_Tp, _Sizes>>...> + split(const simd<_Tp, _Ap>& __x) + { + using _SL = _SizeList<_Sizes...>; + using _Tuple = tuple<__deduced_simd<_Tp, _Sizes>...>; + constexpr size_t _Np = simd_size_v<_Tp, _Ap>; + constexpr size_t _N0 = _SL::template _S_at<0>(); + using _V = __deduced_simd<_Tp, _N0>; + + if (__x._M_is_constprop()) + return __generate_from_n_evaluations<sizeof...(_Sizes), _Tuple>([&]( + auto __i) constexpr { + using _Vi = __deduced_simd<_Tp, _SL::_S_at(__i)>; + constexpr size_t __offset = _SL::_S_before(__i); + return _Vi([&](auto __j) constexpr { return __x[__offset + __j]; }); + }); + else if constexpr (_Np == _N0) + { + static_assert(sizeof...(_Sizes) == 1); + return {simd_cast<_V>(__x)}; + } + else if constexpr // split from fixed_size, such that __x::first.size == _N0 + (__is_fixed_size_abi_v< + _Ap> && __fixed_size_storage_t<_Tp, _Np>::_S_first_size == _N0) + { + static_assert( + !__is_fixed_size_abi_v<typename _V::abi_type>, + "How can <_Tp, _Np> be a single _SimdTuple entry but a " + "fixed_size_simd " + "when deduced?"); + // extract first and recurse (__split_wrapper is needed to deduce a new + // _Sizes pack) + return tuple_cat(make_tuple(_V(__private_init, __data(__x).first)), + __split_wrapper(_SL::template _S_pop_front<1>(), + __data(__x).second)); + } + else if constexpr ((!is_same_v<simd_abi::scalar, + simd_abi::deduce_t<_Tp, _Sizes>> && ...) 
+ && (!__is_fixed_size_abi_v< + simd_abi::deduce_t<_Tp, _Sizes>> && ...)) + { + if constexpr (((_Sizes * 2 == _Np) && ...)) + return {{__private_init, __extract_part<0, 2>(__data(__x))}, + {__private_init, __extract_part<1, 2>(__data(__x))}}; + else if constexpr (is_same_v<_SizeList<_Sizes...>, + _SizeList<_Np / 3, _Np / 3, _Np / 3>>) + return {{__private_init, __extract_part<0, 3>(__data(__x))}, + {__private_init, __extract_part<1, 3>(__data(__x))}, + {__private_init, __extract_part<2, 3>(__data(__x))}}; + else if constexpr (is_same_v<_SizeList<_Sizes...>, + _SizeList<2 * _Np / 3, _Np / 3>>) + return {{__private_init, __extract_part<0, 3, 2>(__data(__x))}, + {__private_init, __extract_part<2, 3>(__data(__x))}}; + else if constexpr (is_same_v<_SizeList<_Sizes...>, + _SizeList<_Np / 3, 2 * _Np / 3>>) + return {{__private_init, __extract_part<0, 3>(__data(__x))}, + {__private_init, __extract_part<1, 3, 2>(__data(__x))}}; + else if constexpr (is_same_v<_SizeList<_Sizes...>, + _SizeList<_Np / 2, _Np / 4, _Np / 4>>) + return {{__private_init, __extract_part<0, 2>(__data(__x))}, + {__private_init, __extract_part<2, 4>(__data(__x))}, + {__private_init, __extract_part<3, 4>(__data(__x))}}; + else if constexpr (is_same_v<_SizeList<_Sizes...>, + _SizeList<_Np / 4, _Np / 4, _Np / 2>>) + return {{__private_init, __extract_part<0, 4>(__data(__x))}, + {__private_init, __extract_part<1, 4>(__data(__x))}, + {__private_init, __extract_part<1, 2>(__data(__x))}}; + else if constexpr (is_same_v<_SizeList<_Sizes...>, + _SizeList<_Np / 4, _Np / 2, _Np / 4>>) + return {{__private_init, __extract_part<0, 4>(__data(__x))}, + {__private_init, __extract_center(__data(__x))}, + {__private_init, __extract_part<3, 4>(__data(__x))}}; + else if constexpr (((_Sizes * 4 == _Np) && ...)) + return {{__private_init, __extract_part<0, 4>(__data(__x))}, + {__private_init, __extract_part<1, 4>(__data(__x))}, + {__private_init, __extract_part<2, 4>(__data(__x))}, + {__private_init, __extract_part<3, 4>(__data(__x))}}; + // else fall through + } +#ifdef _GLIBCXX_SIMD_USE_ALIASING_LOADS + const __may_alias<_Tp>* const __element_ptr + = reinterpret_cast<const __may_alias<_Tp>*>(&__x); + return __generate_from_n_evaluations<sizeof...(_Sizes), _Tuple>([&]( + auto __i) constexpr { + using _Vi = __deduced_simd<_Tp, _SL::_S_at(__i)>; + constexpr size_t __offset = _SL::_S_before(__i); + constexpr size_t __base_align = alignof(simd<_Tp, _Ap>); + constexpr size_t __a + = __base_align - ((__offset * sizeof(_Tp)) % __base_align); + constexpr size_t __b = ((__a - 1) & __a) ^ __a; + constexpr size_t __alignment = __b == 0 ? __a : __b; + return _Vi(__element_ptr + __offset, overaligned<__alignment>); + }); +#else + return __generate_from_n_evaluations<sizeof...(_Sizes), _Tuple>([&]( + auto __i) constexpr { + using _Vi = __deduced_simd<_Tp, _SL::_S_at(__i)>; + const auto& __xx = __data(__x); + using _Offset = decltype(_SL::_S_before(__i)); + return _Vi([&](auto __j) constexpr { + constexpr _SizeConstant<_Offset::value + __j> __k; + return __xx[__k]; + }); + }); +#endif + } + +// }}} + +// __subscript_in_pack {{{ +template <size_t _I, typename _Tp, typename _Ap, typename... _As> + _GLIBCXX_SIMD_INTRINSIC constexpr _Tp + __subscript_in_pack(const simd<_Tp, _Ap>& __x, const simd<_Tp, _As>&... __xs) + { + if constexpr (_I < simd_size_v<_Tp, _Ap>) + return __x[_I]; + else + return __subscript_in_pack<_I - simd_size_v<_Tp, _Ap>>(__xs...); + } + +// }}} +// __store_pack_of_simd {{{ +template <typename _Tp, typename _A0, typename... 
_As> + _GLIBCXX_SIMD_INTRINSIC void + __store_pack_of_simd(char* __mem, const simd<_Tp, _A0>& __x0, + const simd<_Tp, _As>&... __xs) + { + constexpr size_t __n_bytes = sizeof(_Tp) * simd_size_v<_Tp, _A0>; + __builtin_memcpy(__mem, &__data(__x0), __n_bytes); + if constexpr (sizeof...(__xs) > 0) + __store_pack_of_simd(__mem + __n_bytes, __xs...); + } + +// }}} +// concat(simd...) {{{ +template <typename _Tp, typename... _As> + inline _GLIBCXX_SIMD_CONSTEXPR + simd<_Tp, simd_abi::deduce_t<_Tp, (simd_size_v<_Tp, _As> + ...)>> + concat(const simd<_Tp, _As>&... __xs) + { + using _Rp = __deduced_simd<_Tp, (simd_size_v<_Tp, _As> + ...)>; + if constexpr (sizeof...(__xs) == 1) + return simd_cast<_Rp>(__xs...); + else if ((... && __xs._M_is_constprop())) + return simd<_Tp, + simd_abi::deduce_t<_Tp, (simd_size_v<_Tp, _As> + ...)>>([&]( + auto __i) constexpr { return __subscript_in_pack<__i>(__xs...); }); + else + { + _Rp __r{}; + __store_pack_of_simd(reinterpret_cast<char*>(&__data(__r)), __xs...); + return __r; + } + } + +// }}} +// concat(array<simd>) {{{ +template <typename _Tp, typename _Abi, size_t _Np> + _GLIBCXX_SIMD_ALWAYS_INLINE + _GLIBCXX_SIMD_CONSTEXPR __deduced_simd<_Tp, simd_size_v<_Tp, _Abi> * _Np> + concat(const array<simd<_Tp, _Abi>, _Np>& __x) + { + return __call_with_subscripts<_Np>(__x, [](const auto&... __xs) { + return concat(__xs...); + }); + } + +// }}} + +// _SmartReference {{{ +template <typename _Up, typename _Accessor = _Up, + typename _ValueType = typename _Up::value_type> + class _SmartReference + { + friend _Accessor; + int _M_index; + _Up& _M_obj; + + _GLIBCXX_SIMD_INTRINSIC constexpr _ValueType _M_read() const noexcept + { + if constexpr (is_arithmetic_v<_Up>) + return _M_obj; + else + return _M_obj[_M_index]; + } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr void _M_write(_Tp&& __x) const + { _Accessor::_S_set(_M_obj, _M_index, static_cast<_Tp&&>(__x)); } + + public: + _GLIBCXX_SIMD_INTRINSIC constexpr + _SmartReference(_Up& __o, int __i) noexcept + : _M_index(__i), _M_obj(__o) {} + + using value_type = _ValueType; + + _GLIBCXX_SIMD_INTRINSIC _SmartReference(const _SmartReference&) = delete; + + _GLIBCXX_SIMD_INTRINSIC constexpr operator value_type() const noexcept + { return _M_read(); } + + template <typename _Tp, + typename + = _ValuePreservingOrInt<__remove_cvref_t<_Tp>, value_type>> + _GLIBCXX_SIMD_INTRINSIC constexpr _SmartReference operator=(_Tp&& __x) && + { + _M_write(static_cast<_Tp&&>(__x)); + return {_M_obj, _M_index}; + } + +#define _GLIBCXX_SIMD_OP_(__op) \ + template <typename _Tp, \ + typename _TT \ + = decltype(declval<value_type>() __op declval<_Tp>()), \ + typename = _ValuePreservingOrInt<__remove_cvref_t<_Tp>, _TT>, \ + typename = _ValuePreservingOrInt<_TT, value_type>> \ + _GLIBCXX_SIMD_INTRINSIC constexpr _SmartReference \ + operator __op##=(_Tp&& __x) && \ + { \ + const value_type& __lhs = _M_read(); \ + _M_write(__lhs __op __x); \ + return {_M_obj, _M_index}; \ + } + _GLIBCXX_SIMD_ALL_ARITHMETICS(_GLIBCXX_SIMD_OP_); + _GLIBCXX_SIMD_ALL_SHIFTS(_GLIBCXX_SIMD_OP_); + _GLIBCXX_SIMD_ALL_BINARY(_GLIBCXX_SIMD_OP_); +#undef _GLIBCXX_SIMD_OP_ + + template <typename _Tp = void, + typename + = decltype(++declval<conditional_t<true, value_type, _Tp>&>())> + _GLIBCXX_SIMD_INTRINSIC constexpr _SmartReference operator++() && + { + value_type __x = _M_read(); + _M_write(++__x); + return {_M_obj, _M_index}; + } + + template <typename _Tp = void, + typename + = decltype(declval<conditional_t<true, value_type, _Tp>&>()++)> + 
_GLIBCXX_SIMD_INTRINSIC constexpr value_type operator++(int) && + { + const value_type __r = _M_read(); + value_type __x = __r; + _M_write(++__x); + return __r; + } + + template <typename _Tp = void, + typename + = decltype(--declval<conditional_t<true, value_type, _Tp>&>())> + _GLIBCXX_SIMD_INTRINSIC constexpr _SmartReference operator--() && + { + value_type __x = _M_read(); + _M_write(--__x); + return {_M_obj, _M_index}; + } + + template <typename _Tp = void, + typename + = decltype(declval<conditional_t<true, value_type, _Tp>&>()--)> + _GLIBCXX_SIMD_INTRINSIC constexpr value_type operator--(int) && + { + const value_type __r = _M_read(); + value_type __x = __r; + _M_write(--__x); + return __r; + } + + _GLIBCXX_SIMD_INTRINSIC friend void + swap(_SmartReference&& __a, _SmartReference&& __b) noexcept( + conjunction< + is_nothrow_constructible<value_type, _SmartReference&&>, + is_nothrow_assignable<_SmartReference&&, value_type&&>>::value) + { + value_type __tmp = static_cast<_SmartReference&&>(__a); + static_cast<_SmartReference&&>(__a) = static_cast<value_type>(__b); + static_cast<_SmartReference&&>(__b) = std::move(__tmp); + } + + _GLIBCXX_SIMD_INTRINSIC friend void + swap(value_type& __a, _SmartReference&& __b) noexcept( + conjunction< + is_nothrow_constructible<value_type, value_type&&>, + is_nothrow_assignable<value_type&, value_type&&>, + is_nothrow_assignable<_SmartReference&&, value_type&&>>::value) + { + value_type __tmp(std::move(__a)); + __a = static_cast<value_type>(__b); + static_cast<_SmartReference&&>(__b) = std::move(__tmp); + } + + _GLIBCXX_SIMD_INTRINSIC friend void + swap(_SmartReference&& __a, value_type& __b) noexcept( + conjunction< + is_nothrow_constructible<value_type, _SmartReference&&>, + is_nothrow_assignable<value_type&, value_type&&>, + is_nothrow_assignable<_SmartReference&&, value_type&&>>::value) + { + value_type __tmp(__a); + static_cast<_SmartReference&&>(__a) = std::move(__b); + __b = std::move(__tmp); + } + }; + +// }}} +// __scalar_abi_wrapper {{{ +template <int _Bytes> + struct __scalar_abi_wrapper + { + template <typename _Tp> static constexpr size_t _S_full_size = 1; + template <typename _Tp> static constexpr size_t _S_size = 1; + template <typename _Tp> static constexpr size_t _S_is_partial = false; + + template <typename _Tp, typename _Abi = simd_abi::scalar> + static constexpr bool _S_is_valid_v + = _Abi::template _IsValid<_Tp>::value && sizeof(_Tp) == _Bytes; + }; + +// }}} +// __decay_abi metafunction {{{ +template <typename _Tp> + struct __decay_abi { using type = _Tp; }; + +template <int _Bytes> + struct __decay_abi<__scalar_abi_wrapper<_Bytes>> + { using type = simd_abi::scalar; }; + +// }}} +// __find_next_valid_abi metafunction {{{1 +// Given an ABI tag A<N>, find an N2 < N such that A<N2>::_S_is_valid_v<_Tp> == +// true, N2 is a power-of-2, and A<N2>::_S_is_partial<_Tp> is false. Break +// recursion at 2 elements in the resulting ABI tag. In this case +// type::_S_is_valid_v<_Tp> may be false. 
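+// For example (illustrative; assuming an x86 target where 16 bytes is the +// largest valid vector size, i.e. SSE without AVX), +// __find_next_valid_abi<simd_abi::_VecBuiltin, 32, float>::type is +// _VecBuiltin<16>: _VecBuiltin<32> is not valid without AVX, and the next +// smaller power-of-2 size, 16 bytes, holds 4 floats with no partial use. +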
+template <template <int> class _Abi, int _Bytes, typename _Tp> + struct __find_next_valid_abi + { + static constexpr auto _S_choose() + { + constexpr int _NextBytes = std::__bit_ceil(_Bytes) / 2; + using _NextAbi = _Abi<_NextBytes>; + if constexpr (_NextBytes < sizeof(_Tp) * 2) // break recursion + return _Abi<_Bytes>(); + else if constexpr (_NextAbi::template _S_is_partial<_Tp> == false + && _NextAbi::template _S_is_valid_v<_Tp>) + return _NextAbi(); + else + return __find_next_valid_abi<_Abi, _NextBytes, _Tp>::_S_choose(); + } + + using type = decltype(_S_choose()); + }; + +template <int _Bytes, typename _Tp> + struct __find_next_valid_abi<__scalar_abi_wrapper, _Bytes, _Tp> + { using type = simd_abi::scalar; }; + +// _AbiList {{{1 +template <template <int> class...> + struct _AbiList + { + template <typename, int> static constexpr bool _S_has_valid_abi = false; + template <typename, int> using _FirstValidAbi = void; + template <typename, int> using _BestAbi = void; + }; + +template <template <int> class _A0, template <int> class... _Rest> + struct _AbiList<_A0, _Rest...> + { + template <typename _Tp, int _Np> + static constexpr bool _S_has_valid_abi + = _A0<sizeof(_Tp) * _Np>::template _S_is_valid_v< + _Tp> || _AbiList<_Rest...>::template _S_has_valid_abi<_Tp, _Np>; + + template <typename _Tp, int _Np> + using _FirstValidAbi = conditional_t< + _A0<sizeof(_Tp) * _Np>::template _S_is_valid_v<_Tp>, + typename __decay_abi<_A0<sizeof(_Tp) * _Np>>::type, + typename _AbiList<_Rest...>::template _FirstValidAbi<_Tp, _Np>>; + + template <typename _Tp, int _Np> + static constexpr auto _S_determine_best_abi() + { + static_assert(_Np >= 1); + constexpr int _Bytes = sizeof(_Tp) * _Np; + if constexpr (_Np == 1) + return __make_dependent_t<_Tp, simd_abi::scalar>{}; + else + { + constexpr int __fullsize = _A0<_Bytes>::template _S_full_size<_Tp>; + // _A0<_Bytes> is good if: + // 1. The ABI tag is valid for _Tp + // 2. The storage overhead is no more than padding to fill the next + // power-of-2 number of bytes + if constexpr (_A0<_Bytes>::template _S_is_valid_v< + _Tp> && __fullsize / 2 < _Np) + return typename __decay_abi<_A0<_Bytes>>::type{}; + else + { + using _B = + typename __find_next_valid_abi<_A0, _Bytes, _Tp>::type; + if constexpr (_B::template _S_is_valid_v< + _Tp> && _B::template _S_size<_Tp> <= _Np) + return _B{}; + else + return + typename _AbiList<_Rest...>::template _BestAbi<_Tp, _Np>{}; + } + } + } + + template <typename _Tp, int _Np> + using _BestAbi = decltype(_S_determine_best_abi<_Tp, _Np>()); + }; + +// }}}1 + +// the following lists all native ABIs, which makes them accessible to +// simd_abi::deduce and select_best_vector_type_t (for fixed_size). Order +// matters: Whatever comes first has higher priority. 
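+// For example (illustrative; the actual choice depends on the enabled ISA +// extensions), with AVX512F a request for 16 floats (64 bytes) resolves to +// simd_abi::_VecBltnBtmsk<64> rather than simd_abi::_VecBuiltin<64>, simply +// because _VecBltnBtmsk is listed first below. +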
+using _AllNativeAbis = _AbiList<simd_abi::_VecBltnBtmsk, simd_abi::_VecBuiltin, + __scalar_abi_wrapper>; + +// valid _SimdTraits specialization {{{1 +template <typename _Tp, typename _Abi> + struct _SimdTraits<_Tp, _Abi, void_t<typename _Abi::template _IsValid<_Tp>>> + : _Abi::template __traits<_Tp> {}; + +// __deduce_impl specializations {{{1 +// try all native ABIs (including scalar) first +template <typename _Tp, size_t _Np> + struct __deduce_impl< + _Tp, _Np, enable_if_t<_AllNativeAbis::template _S_has_valid_abi<_Tp, _Np>>> + { using type = _AllNativeAbis::_FirstValidAbi<_Tp, _Np>; }; + +// fall back to fixed_size only if scalar and native ABIs don't match +template <typename _Tp, size_t _Np, typename = void> + struct __deduce_fixed_size_fallback {}; + +template <typename _Tp, size_t _Np> + struct __deduce_fixed_size_fallback<_Tp, _Np, + enable_if_t<simd_abi::fixed_size<_Np>::template _S_is_valid_v<_Tp>>> + { using type = simd_abi::fixed_size<_Np>; }; + +template <typename _Tp, size_t _Np, typename> + struct __deduce_impl : public __deduce_fixed_size_fallback<_Tp, _Np> {}; + +//}}}1 + +// simd_mask {{{ +template <typename _Tp, typename _Abi> + class simd_mask : public _SimdTraits<_Tp, _Abi>::_MaskBase + { + // types, tags, and friends {{{ + using _Traits = _SimdTraits<_Tp, _Abi>; + using _MemberType = typename _Traits::_MaskMember; + + // We map all masks with equal element sizeof to a single integer type, the + // one given by __int_for_sizeof_t<_Tp>. This is the approach + // [[gnu::vector_size(N)]] types take as well and it reduces the number of + // template specializations in the implementation classes. + using _Ip = __int_for_sizeof_t<_Tp>; + static constexpr _Ip* _S_type_tag = nullptr; + + friend typename _Traits::_MaskBase; + friend class simd<_Tp, _Abi>; // to construct masks on return + friend typename _Traits::_SimdImpl; // to construct masks on return and + // inspect data on masked operations + public: + using _Impl = typename _Traits::_MaskImpl; + friend _Impl; + + // }}} + // member types {{{ + using value_type = bool; + using reference = _SmartReference<_MemberType, _Impl, value_type>; + using simd_type = simd<_Tp, _Abi>; + using abi_type = _Abi; + + // }}} + static constexpr size_t size() // {{{ + { return __size_or_zero_v<_Tp, _Abi>; } + + // }}} + // constructors & assignment {{{ + simd_mask() = default; + simd_mask(const simd_mask&) = default; + simd_mask(simd_mask&&) = default; + simd_mask& operator=(const simd_mask&) = default; + simd_mask& operator=(simd_mask&&) = default; + + // }}} + // access to internal representation (optional feature) {{{ + _GLIBCXX_SIMD_ALWAYS_INLINE explicit + simd_mask(typename _Traits::_MaskCastType __init) + : _M_data{__init} {} + // conversion to the internal type is done in _MaskBase + + // }}} + // bitset interface (extension to be proposed) {{{ + // TS_FEEDBACK: + // Conversion of simd_mask to and from bitset makes it much easier to + // interface with other facilities. I suggest adding `static + // simd_mask::from_bitset` and `simd_mask::to_bitset`. 
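+ // Example of the intended round-trip (illustrative only; this + // implementation spells the functions __from_bitset and __to_bitset): + //   fixed_size_simd_mask<int, 8> k + //     = fixed_size_simd_mask<int, 8>::__from_bitset(bitset<8>(0b1010'1010)); + //   // k.__to_bitset().count() == size_t(popcount(k)), i.e. 4 + 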
+ _GLIBCXX_SIMD_ALWAYS_INLINE static simd_mask + __from_bitset(bitset<size()> bs) + { return {__bitset_init, bs}; } + + _GLIBCXX_SIMD_ALWAYS_INLINE bitset<size()> + __to_bitset() const + { return _Impl::_S_to_bits(_M_data)._M_to_bitset(); } + + // }}} + // explicit broadcast constructor {{{ + _GLIBCXX_SIMD_ALWAYS_INLINE explicit _GLIBCXX_SIMD_CONSTEXPR + simd_mask(value_type __x) + : _M_data(_Impl::template _S_broadcast<_Ip>(__x)) {} + + // }}} + // implicit type conversion constructor {{{ + #ifdef _GLIBCXX_SIMD_ENABLE_IMPLICIT_MASK_CAST + // proposed improvement + template <typename _Up, typename _A2, + typename = enable_if_t<simd_size_v<_Up, _A2> == size()>> + _GLIBCXX_SIMD_ALWAYS_INLINE explicit(sizeof(_MemberType) + != sizeof(typename _SimdTraits<_Up, _A2>::_MaskMember)) + simd_mask(const simd_mask<_Up, _A2>& __x) + : simd_mask(__proposed::static_simd_cast<simd_mask>(__x)) {} + #else + // conforming to ISO/IEC 19570:2018 + template <typename _Up, typename = enable_if_t<conjunction< + is_same<abi_type, simd_abi::fixed_size<size()>>, + is_same<_Up, _Up>>::value>> + _GLIBCXX_SIMD_ALWAYS_INLINE + simd_mask(const simd_mask<_Up, simd_abi::fixed_size<size()>>& __x) + : _M_data(_Impl::_S_from_bitmask(__data(__x), _S_type_tag)) {} + #endif + + // }}} + // load constructor {{{ + template <typename _Flags> + _GLIBCXX_SIMD_ALWAYS_INLINE + simd_mask(const value_type* __mem, _Flags) + : _M_data(_Impl::template _S_load<_Ip>( + _Flags::template _S_apply<simd_mask>(__mem))) {} + + template <typename _Flags> + _GLIBCXX_SIMD_ALWAYS_INLINE + simd_mask(const value_type* __mem, simd_mask __k, _Flags) + : _M_data{} + { + _M_data + = _Impl::_S_masked_load(_M_data, __k._M_data, + _Flags::template _S_apply<simd_mask>(__mem)); + } + + // }}} + // loads [simd_mask.load] {{{ + template <typename _Flags> + _GLIBCXX_SIMD_ALWAYS_INLINE void + copy_from(const value_type* __mem, _Flags) + { + _M_data = _Impl::template _S_load<_Ip>( + _Flags::template _S_apply<simd_mask>(__mem)); + } + + // }}} + // stores [simd_mask.store] {{{ + template <typename _Flags> + _GLIBCXX_SIMD_ALWAYS_INLINE void + copy_to(value_type* __mem, _Flags) const + { _Impl::_S_store(_M_data, _Flags::template _S_apply<simd_mask>(__mem)); } + + // }}} + // scalar access {{{ + _GLIBCXX_SIMD_ALWAYS_INLINE reference + operator[](size_t __i) + { + if (__i >= size()) + __invoke_ub("Subscript %d is out of range [0, %d]", __i, size() - 1); + return {_M_data, int(__i)}; + } + + _GLIBCXX_SIMD_ALWAYS_INLINE value_type + operator[](size_t __i) const + { + if (__i >= size()) + __invoke_ub("Subscript %d is out of range [0, %d]", __i, size() - 1); + if constexpr (__is_scalar_abi<_Abi>()) + return _M_data; + else + return static_cast<bool>(_M_data[__i]); + } + + // }}} + // negation {{{ + _GLIBCXX_SIMD_ALWAYS_INLINE simd_mask + operator!() const + { return {__private_init, _Impl::_S_bit_not(_M_data)}; } + + // }}} + // simd_mask binary operators [simd_mask.binary] {{{ + #ifdef _GLIBCXX_SIMD_ENABLE_IMPLICIT_MASK_CAST + // simd_mask<int> && simd_mask<uint> needs disambiguation + template <typename _Up, typename _A2, + typename + = enable_if_t<is_convertible_v<simd_mask<_Up, _A2>, simd_mask>>> + _GLIBCXX_SIMD_ALWAYS_INLINE friend simd_mask + operator&&(const simd_mask& __x, const simd_mask<_Up, _A2>& __y) + { + return {__private_init, + _Impl::_S_logical_and(__x._M_data, simd_mask(__y)._M_data)}; + } + + template <typename _Up, typename _A2, + typename + = enable_if_t<is_convertible_v<simd_mask<_Up, _A2>, simd_mask>>> + _GLIBCXX_SIMD_ALWAYS_INLINE friend simd_mask + 
operator||(const simd_mask& __x, const simd_mask<_Up, _A2>& __y) + { + return {__private_init, + _Impl::_S_logical_or(__x._M_data, simd_mask(__y)._M_data)}; + } + #endif // _GLIBCXX_SIMD_ENABLE_IMPLICIT_MASK_CAST + + _GLIBCXX_SIMD_ALWAYS_INLINE friend simd_mask + operator&&(const simd_mask& __x, const simd_mask& __y) + { + return {__private_init, _Impl::_S_logical_and(__x._M_data, __y._M_data)}; + } + + _GLIBCXX_SIMD_ALWAYS_INLINE friend simd_mask + operator||(const simd_mask& __x, const simd_mask& __y) + { + return {__private_init, _Impl::_S_logical_or(__x._M_data, __y._M_data)}; + } + + _GLIBCXX_SIMD_ALWAYS_INLINE friend simd_mask + operator&(const simd_mask& __x, const simd_mask& __y) + { return {__private_init, _Impl::_S_bit_and(__x._M_data, __y._M_data)}; } + + _GLIBCXX_SIMD_ALWAYS_INLINE friend simd_mask + operator|(const simd_mask& __x, const simd_mask& __y) + { return {__private_init, _Impl::_S_bit_or(__x._M_data, __y._M_data)}; } + + _GLIBCXX_SIMD_ALWAYS_INLINE friend simd_mask + operator^(const simd_mask& __x, const simd_mask& __y) + { return {__private_init, _Impl::_S_bit_xor(__x._M_data, __y._M_data)}; } + + _GLIBCXX_SIMD_ALWAYS_INLINE friend simd_mask& + operator&=(simd_mask& __x, const simd_mask& __y) + { + __x._M_data = _Impl::_S_bit_and(__x._M_data, __y._M_data); + return __x; + } + + _GLIBCXX_SIMD_ALWAYS_INLINE friend simd_mask& + operator|=(simd_mask& __x, const simd_mask& __y) + { + __x._M_data = _Impl::_S_bit_or(__x._M_data, __y._M_data); + return __x; + } + + _GLIBCXX_SIMD_ALWAYS_INLINE friend simd_mask& + operator^=(simd_mask& __x, const simd_mask& __y) + { + __x._M_data = _Impl::_S_bit_xor(__x._M_data, __y._M_data); + return __x; + } + + // }}} + // simd_mask compares [simd_mask.comparison] {{{ + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend simd_mask + operator==(const simd_mask& __x, const simd_mask& __y) + { return !operator!=(__x, __y); } + + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend simd_mask + operator!=(const simd_mask& __x, const simd_mask& __y) + { return {__private_init, _Impl::_S_bit_xor(__x._M_data, __y._M_data)}; } + + // }}} + // private_init ctor {{{ + _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR + simd_mask(_PrivateInit, typename _Traits::_MaskMember __init) + : _M_data(__init) {} + + // }}} + // private_init generator ctor {{{ + template <typename _Fp, typename = decltype(bool(declval<_Fp>()(size_t())))> + _GLIBCXX_SIMD_INTRINSIC constexpr + simd_mask(_PrivateInit, _Fp&& __gen) + : _M_data() + { + __execute_n_times<size()>([&](auto __i) constexpr { + _Impl::_S_set(_M_data, __i, __gen(__i)); + }); + } + + // }}} + // bitset_init ctor {{{ + _GLIBCXX_SIMD_INTRINSIC simd_mask(_BitsetInit, bitset<size()> __init) + : _M_data( + _Impl::_S_from_bitmask(_SanitizedBitMask<size()>(__init), _S_type_tag)) + {} + + // }}} + // __cvt {{{ + // TS_FEEDBACK: + // The conversion operator this implements should be a ctor on simd_mask. + // Once you call .__cvt() on a simd_mask it converts conveniently. 
+ // A useful variation: add `explicit(sizeof(_Tp) != sizeof(_Up))` + struct _CvtProxy + { + template <typename _Up, typename _A2, + typename + = enable_if_t<simd_size_v<_Up, _A2> == simd_size_v<_Tp, _Abi>>> + operator simd_mask<_Up, _A2>() && + { + using namespace std::experimental::__proposed; + return static_simd_cast<simd_mask<_Up, _A2>>(_M_data); + } + + const simd_mask<_Tp, _Abi>& _M_data; + }; + + _GLIBCXX_SIMD_INTRINSIC _CvtProxy + __cvt() const + { return {*this}; } + + // }}} + // operator?: overloads (suggested extension) {{{ + #ifdef __GXX_CONDITIONAL_IS_OVERLOADABLE__ + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend simd_mask + operator?:(const simd_mask& __k, const simd_mask& __where_true, + const simd_mask& __where_false) + { + auto __ret = __where_false; + _Impl::_S_masked_assign(__k._M_data, __ret._M_data, __where_true._M_data); + return __ret; + } + + template <typename _U1, typename _U2, + typename _Rp = simd<common_type_t<_U1, _U2>, _Abi>, + typename = enable_if_t<conjunction_v< + is_convertible<_U1, _Rp>, is_convertible<_U2, _Rp>, + is_convertible<simd_mask, typename _Rp::mask_type>>>> + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend _Rp + operator?:(const simd_mask& __k, const _U1& __where_true, + const _U2& __where_false) + { + _Rp __ret = __where_false; + _Rp::_Impl::_S_masked_assign( + __data(static_cast<typename _Rp::mask_type>(__k)), __data(__ret), + __data(static_cast<_Rp>(__where_true))); + return __ret; + } + + #ifdef _GLIBCXX_SIMD_ENABLE_IMPLICIT_MASK_CAST + template <typename _Kp, typename _Ak, typename _Up, typename _Au, + typename = enable_if_t< + conjunction_v<is_convertible<simd_mask<_Kp, _Ak>, simd_mask>, + is_convertible<simd_mask<_Up, _Au>, simd_mask>>>> + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend simd_mask + operator?:(const simd_mask<_Kp, _Ak>& __k, const simd_mask& __where_true, + const simd_mask<_Up, _Au>& __where_false) + { + simd_mask __ret = __where_false; + _Impl::_S_masked_assign(simd_mask(__k)._M_data, __ret._M_data, + __where_true._M_data); + return __ret; + } + #endif // _GLIBCXX_SIMD_ENABLE_IMPLICIT_MASK_CAST + #endif // __GXX_CONDITIONAL_IS_OVERLOADABLE__ + + // }}} + // _M_is_constprop {{{ + _GLIBCXX_SIMD_INTRINSIC constexpr bool + _M_is_constprop() const + { + if constexpr (__is_scalar_abi<_Abi>()) + return __builtin_constant_p(_M_data); + else + return _M_data._M_is_constprop(); + } + + // }}} + + private: + friend const auto& __data<_Tp, abi_type>(const simd_mask&); + friend auto& __data<_Tp, abi_type>(simd_mask&); + alignas(_Traits::_S_mask_align) _MemberType _M_data; + }; + +// }}} + +// __data(simd_mask) {{{ +template <typename _Tp, typename _Ap> + _GLIBCXX_SIMD_INTRINSIC constexpr const auto& + __data(const simd_mask<_Tp, _Ap>& __x) + { return __x._M_data; } + +template <typename _Tp, typename _Ap> + _GLIBCXX_SIMD_INTRINSIC constexpr auto& + __data(simd_mask<_Tp, _Ap>& __x) + { return __x._M_data; } + +// }}} + +// simd_mask reductions [simd_mask.reductions] {{{ +template <typename _Tp, typename _Abi> + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR bool + all_of(const simd_mask<_Tp, _Abi>& __k) noexcept + { + if (__builtin_is_constant_evaluated() || __k._M_is_constprop()) + { + for (size_t __i = 0; __i < simd_size_v<_Tp, _Abi>; ++__i) + if (!__k[__i]) + return false; + return true; + } + else + return _Abi::_MaskImpl::_S_all_of(__k); + } + +template <typename _Tp, typename _Abi> + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR bool + any_of(const simd_mask<_Tp, _Abi>& __k) 
noexcept + { + if (__builtin_is_constant_evaluated() || __k._M_is_constprop()) + { + for (size_t __i = 0; __i < simd_size_v<_Tp, _Abi>; ++__i) + if (__k[__i]) + return true; + return false; + } + else + return _Abi::_MaskImpl::_S_any_of(__k); + } + +template <typename _Tp, typename _Abi> + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR bool + none_of(const simd_mask<_Tp, _Abi>& __k) noexcept + { + if (__builtin_is_constant_evaluated() || __k._M_is_constprop()) + { + for (size_t __i = 0; __i < simd_size_v<_Tp, _Abi>; ++__i) + if (__k[__i]) + return false; + return true; + } + else + return _Abi::_MaskImpl::_S_none_of(__k); + } + +template <typename _Tp, typename _Abi> + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR bool + some_of(const simd_mask<_Tp, _Abi>& __k) noexcept + { + if (__builtin_is_constant_evaluated() || __k._M_is_constprop()) + { + for (size_t __i = 1; __i < simd_size_v<_Tp, _Abi>; ++__i) + if (__k[__i] != __k[__i - 1]) + return true; + return false; + } + else + return _Abi::_MaskImpl::_S_some_of(__k); + } + +template <typename _Tp, typename _Abi> + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR int + popcount(const simd_mask<_Tp, _Abi>& __k) noexcept + { + if (__builtin_is_constant_evaluated() || __k._M_is_constprop()) + { + const int __r = __call_with_subscripts<simd_size_v<_Tp, _Abi>>( + __k, [](auto... __elements) { return ((__elements != 0) + ...); }); + if (__builtin_is_constant_evaluated() || __builtin_constant_p(__r)) + return __r; + } + return _Abi::_MaskImpl::_S_popcount(__k); + } + +template <typename _Tp, typename _Abi> + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR int + find_first_set(const simd_mask<_Tp, _Abi>& __k) + { + if (__builtin_is_constant_evaluated() || __k._M_is_constprop()) + { + constexpr size_t _Np = simd_size_v<_Tp, _Abi>; + const size_t _Idx = __call_with_n_evaluations<_Np>( + [](auto... __indexes) { return std::min({__indexes...}); }, + [&](auto __i) { return __k[__i] ? +__i : _Np; }); + if (_Idx >= _Np) + __invoke_ub("find_first_set(empty mask) is UB"); + if (__builtin_constant_p(_Idx)) + return _Idx; + } + return _Abi::_MaskImpl::_S_find_first_set(__k); + } + +template <typename _Tp, typename _Abi> + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR int + find_last_set(const simd_mask<_Tp, _Abi>& __k) + { + if (__builtin_is_constant_evaluated() || __k._M_is_constprop()) + { + constexpr size_t _Np = simd_size_v<_Tp, _Abi>; + const int _Idx = __call_with_n_evaluations<_Np>( + [](auto... __indexes) { return std::max({__indexes...}); }, + [&](auto __i) { return __k[__i] ? 
int(__i) : -1; }); + if (_Idx < 0) + __invoke_ub("find_last_set(empty mask) is UB"); + if (__builtin_constant_p(_Idx)) + return _Idx; + } + return _Abi::_MaskImpl::_S_find_last_set(__k); + } + +_GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR bool +all_of(_ExactBool __x) noexcept +{ return __x; } + +_GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR bool +any_of(_ExactBool __x) noexcept +{ return __x; } + +_GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR bool +none_of(_ExactBool __x) noexcept +{ return !__x; } + +_GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR bool +some_of(_ExactBool) noexcept +{ return false; } + +_GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR int +popcount(_ExactBool __x) noexcept +{ return __x; } + +_GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR int +find_first_set(_ExactBool) +{ return 0; } + +_GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR int +find_last_set(_ExactBool) +{ return 0; } + +// }}} + +// _SimdIntOperators{{{1 +template <typename _V, typename _Impl, bool> + class _SimdIntOperators {}; + +template <typename _V, typename _Impl> + class _SimdIntOperators<_V, _Impl, true> + { + _GLIBCXX_SIMD_INTRINSIC const _V& __derived() const + { return *static_cast<const _V*>(this); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _GLIBCXX_SIMD_CONSTEXPR _V + _S_make_derived(_Tp&& __d) + { return {__private_init, static_cast<_Tp&&>(__d)}; } + + public: + _GLIBCXX_SIMD_CONSTEXPR friend _V& operator%=(_V& __lhs, const _V& __x) + { return __lhs = __lhs % __x; } + + _GLIBCXX_SIMD_CONSTEXPR friend _V& operator&=(_V& __lhs, const _V& __x) + { return __lhs = __lhs & __x; } + + _GLIBCXX_SIMD_CONSTEXPR friend _V& operator|=(_V& __lhs, const _V& __x) + { return __lhs = __lhs | __x; } + + _GLIBCXX_SIMD_CONSTEXPR friend _V& operator^=(_V& __lhs, const _V& __x) + { return __lhs = __lhs ^ __x; } + + _GLIBCXX_SIMD_CONSTEXPR friend _V& operator<<=(_V& __lhs, const _V& __x) + { return __lhs = __lhs << __x; } + + _GLIBCXX_SIMD_CONSTEXPR friend _V& operator>>=(_V& __lhs, const _V& __x) + { return __lhs = __lhs >> __x; } + + _GLIBCXX_SIMD_CONSTEXPR friend _V& operator<<=(_V& __lhs, int __x) + { return __lhs = __lhs << __x; } + + _GLIBCXX_SIMD_CONSTEXPR friend _V& operator>>=(_V& __lhs, int __x) + { return __lhs = __lhs >> __x; } + + _GLIBCXX_SIMD_CONSTEXPR friend _V operator%(const _V& __x, const _V& __y) + { + return _SimdIntOperators::_S_make_derived( + _Impl::_S_modulus(__data(__x), __data(__y))); + } + + _GLIBCXX_SIMD_CONSTEXPR friend _V operator&(const _V& __x, const _V& __y) + { + return _SimdIntOperators::_S_make_derived( + _Impl::_S_bit_and(__data(__x), __data(__y))); + } + + _GLIBCXX_SIMD_CONSTEXPR friend _V operator|(const _V& __x, const _V& __y) + { + return _SimdIntOperators::_S_make_derived( + _Impl::_S_bit_or(__data(__x), __data(__y))); + } + + _GLIBCXX_SIMD_CONSTEXPR friend _V operator^(const _V& __x, const _V& __y) + { + return _SimdIntOperators::_S_make_derived( + _Impl::_S_bit_xor(__data(__x), __data(__y))); + } + + _GLIBCXX_SIMD_CONSTEXPR friend _V operator<<(const _V& __x, const _V& __y) + { + return _SimdIntOperators::_S_make_derived( + _Impl::_S_bit_shift_left(__data(__x), __data(__y))); + } + + _GLIBCXX_SIMD_CONSTEXPR friend _V operator>>(const _V& __x, const _V& __y) + { + return _SimdIntOperators::_S_make_derived( + _Impl::_S_bit_shift_right(__data(__x), __data(__y))); + } + + template <typename _VV = _V> + _GLIBCXX_SIMD_CONSTEXPR friend _V operator<<(const _V& __x, int __y) + { + using _Tp = typename _VV::value_type; + if (__y 
< 0) + __invoke_ub("The behavior is undefined if the right operand of a " + "shift operation is negative. [expr.shift]\nA shift by " + "%d was requested", + __y); + if (size_t(__y) >= sizeof(declval<_Tp>() << __y) * __CHAR_BIT__) + __invoke_ub( + "The behavior is undefined if the right operand of a " + "shift operation is greater than or equal to the width of the " + "promoted left operand. [expr.shift]\nA shift by %d was requested", + __y); + return _SimdIntOperators::_S_make_derived( + _Impl::_S_bit_shift_left(__data(__x), __y)); + } + + template <typename _VV = _V> + _GLIBCXX_SIMD_CONSTEXPR friend _V operator>>(const _V& __x, int __y) + { + using _Tp = typename _VV::value_type; + if (__y < 0) + __invoke_ub( + "The behavior is undefined if the right operand of a shift " + "operation is negative. [expr.shift]\nA shift by %d was requested", + __y); + if (size_t(__y) >= sizeof(declval<_Tp>() << __y) * __CHAR_BIT__) + __invoke_ub( + "The behavior is undefined if the right operand of a shift " + "operation is greater than or equal to the width of the promoted " + "left operand. [expr.shift]\nA shift by %d was requested", + __y); + return _SimdIntOperators::_S_make_derived( + _Impl::_S_bit_shift_right(__data(__x), __y)); + } + + // unary operators (for integral _Tp) + _GLIBCXX_SIMD_CONSTEXPR _V operator~() const + { return {__private_init, _Impl::_S_complement(__derived()._M_data)}; } + }; + +//}}}1 + +// simd {{{ +template <typename _Tp, typename _Abi> + class simd : public _SimdIntOperators< + simd<_Tp, _Abi>, typename _SimdTraits<_Tp, _Abi>::_SimdImpl, + conjunction<is_integral<_Tp>, + typename _SimdTraits<_Tp, _Abi>::_IsValid>::value>, + public _SimdTraits<_Tp, _Abi>::_SimdBase + { + using _Traits = _SimdTraits<_Tp, _Abi>; + using _MemberType = typename _Traits::_SimdMember; + using _CastType = typename _Traits::_SimdCastType; + static constexpr _Tp* _S_type_tag = nullptr; + friend typename _Traits::_SimdBase; + + public: + using _Impl = typename _Traits::_SimdImpl; + friend _Impl; + friend _SimdIntOperators<simd, _Impl, true>; + + using value_type = _Tp; + using reference = _SmartReference<_MemberType, _Impl, value_type>; + using mask_type = simd_mask<_Tp, _Abi>; + using abi_type = _Abi; + + static constexpr size_t size() + { return __size_or_zero_v<_Tp, _Abi>; } + + _GLIBCXX_SIMD_CONSTEXPR simd() = default; + _GLIBCXX_SIMD_CONSTEXPR simd(const simd&) = default; + _GLIBCXX_SIMD_CONSTEXPR simd(simd&&) noexcept = default; + _GLIBCXX_SIMD_CONSTEXPR simd& operator=(const simd&) = default; + _GLIBCXX_SIMD_CONSTEXPR simd& operator=(simd&&) noexcept = default; + + // implicit broadcast constructor + template <typename _Up, + typename = enable_if_t<!is_same_v<__remove_cvref_t<_Up>, bool>>> + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR + simd(_ValuePreservingOrInt<_Up, value_type>&& __x) + : _M_data( + _Impl::_S_broadcast(static_cast<value_type>(static_cast<_Up&&>(__x)))) + {} + + // implicit type conversion constructor (convert from fixed_size to + // fixed_size) + template <typename _Up> + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR + simd(const simd<_Up, simd_abi::fixed_size<size()>>& __x, + enable_if_t< + conjunction< + is_same<simd_abi::fixed_size<size()>, abi_type>, + negation<__is_narrowing_conversion<_Up, value_type>>, + __converts_to_higher_integer_rank<_Up, value_type>>::value, + void*> = nullptr) + : simd{static_cast<array<_Up, size()>>(__x).data(), vector_aligned} {} + + // explicit type conversion constructor +#ifdef _GLIBCXX_SIMD_ENABLE_STATIC_CAST + template 
<typename _Up, typename _A2, + typename = decltype(static_simd_cast<simd>( + declval<const simd<_Up, _A2>&>()))> + _GLIBCXX_SIMD_ALWAYS_INLINE explicit _GLIBCXX_SIMD_CONSTEXPR + simd(const simd<_Up, _A2>& __x) + : simd(static_simd_cast<simd>(__x)) {} +#endif // _GLIBCXX_SIMD_ENABLE_STATIC_CAST + + // generator constructor + template <typename _Fp> + _GLIBCXX_SIMD_ALWAYS_INLINE explicit _GLIBCXX_SIMD_CONSTEXPR + simd(_Fp&& __gen, _ValuePreservingOrInt<decltype(declval<_Fp>()( + declval<_SizeConstant<0>&>())), + value_type>* = nullptr) + : _M_data(_Impl::_S_generator(static_cast<_Fp&&>(__gen), _S_type_tag)) {} + + // load constructor + template <typename _Up, typename _Flags> + _GLIBCXX_SIMD_ALWAYS_INLINE + simd(const _Up* __mem, _Flags) + : _M_data( + _Impl::_S_load(_Flags::template _S_apply<simd>(__mem), _S_type_tag)) + {} + + // loads [simd.load] + template <typename _Up, typename _Flags> + _GLIBCXX_SIMD_ALWAYS_INLINE void + copy_from(const _Vectorizable<_Up>* __mem, _Flags) + { + _M_data = static_cast<decltype(_M_data)>( + _Impl::_S_load(_Flags::template _S_apply<simd>(__mem), _S_type_tag)); + } + + // stores [simd.store] + template <typename _Up, typename _Flags> + _GLIBCXX_SIMD_ALWAYS_INLINE void + copy_to(_Vectorizable<_Up>* __mem, _Flags) const + { + _Impl::_S_store(_M_data, _Flags::template _S_apply<simd>(__mem), + _S_type_tag); + } + + // scalar access + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR reference + operator[](size_t __i) + { return {_M_data, int(__i)}; } + + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR value_type + operator[]([[maybe_unused]] size_t __i) const + { + if constexpr (__is_scalar_abi<_Abi>()) + { + _GLIBCXX_DEBUG_ASSERT(__i == 0); + return _M_data; + } + else + return _M_data[__i]; + } + + // increment and decrement: + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR simd& + operator++() + { + _Impl::_S_increment(_M_data); + return *this; + } + + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR simd + operator++(int) + { + simd __r = *this; + _Impl::_S_increment(_M_data); + return __r; + } + + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR simd& + operator--() + { + _Impl::_S_decrement(_M_data); + return *this; + } + + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR simd + operator--(int) + { + simd __r = *this; + _Impl::_S_decrement(_M_data); + return __r; + } + + // unary operators (for any _Tp) + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR mask_type + operator!() const + { return {__private_init, _Impl::_S_negate(_M_data)}; } + + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR simd + operator+() const + { return *this; } + + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR simd + operator-() const + { return {__private_init, _Impl::_S_unary_minus(_M_data)}; } + + // access to internal representation (suggested extension) + _GLIBCXX_SIMD_ALWAYS_INLINE explicit _GLIBCXX_SIMD_CONSTEXPR + simd(_CastType __init) : _M_data(__init) {} + + // compound assignment [simd.cassign] + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend simd& + operator+=(simd& __lhs, const simd& __x) + { return __lhs = __lhs + __x; } + + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend simd& + operator-=(simd& __lhs, const simd& __x) + { return __lhs = __lhs - __x; } + + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend simd& + operator*=(simd& __lhs, const simd& __x) + { return __lhs = __lhs * __x; } + + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend simd& + operator/=(simd& __lhs, const simd& __x) + { 
return __lhs = __lhs / __x; } + + // binary operators [simd.binary] + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend simd + operator+(const simd& __x, const simd& __y) + { return {__private_init, _Impl::_S_plus(__x._M_data, __y._M_data)}; } + + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend simd + operator-(const simd& __x, const simd& __y) + { return {__private_init, _Impl::_S_minus(__x._M_data, __y._M_data)}; } + + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend simd + operator*(const simd& __x, const simd& __y) + { return {__private_init, _Impl::_S_multiplies(__x._M_data, __y._M_data)}; } + + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend simd + operator/(const simd& __x, const simd& __y) + { return {__private_init, _Impl::_S_divides(__x._M_data, __y._M_data)}; } + + // compares [simd.comparison] + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend mask_type + operator==(const simd& __x, const simd& __y) + { return simd::_S_make_mask(_Impl::_S_equal_to(__x._M_data, __y._M_data)); } + + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend mask_type + operator!=(const simd& __x, const simd& __y) + { + return simd::_S_make_mask( + _Impl::_S_not_equal_to(__x._M_data, __y._M_data)); + } + + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend mask_type + operator<(const simd& __x, const simd& __y) + { return simd::_S_make_mask(_Impl::_S_less(__x._M_data, __y._M_data)); } + + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend mask_type + operator<=(const simd& __x, const simd& __y) + { + return simd::_S_make_mask(_Impl::_S_less_equal(__x._M_data, __y._M_data)); + } + + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend mask_type + operator>(const simd& __x, const simd& __y) + { return simd::_S_make_mask(_Impl::_S_less(__y._M_data, __x._M_data)); } + + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend mask_type + operator>=(const simd& __x, const simd& __y) + { + return simd::_S_make_mask(_Impl::_S_less_equal(__y._M_data, __x._M_data)); + } + + // operator?: overloads (suggested extension) {{{ +#ifdef __GXX_CONDITIONAL_IS_OVERLOADABLE__ + _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend simd + operator?:(const mask_type& __k, const simd& __where_true, + const simd& __where_false) + { + auto __ret = __where_false; + _Impl::_S_masked_assign(__data(__k), __data(__ret), __data(__where_true)); + return __ret; + } + +#endif // __GXX_CONDITIONAL_IS_OVERLOADABLE__ + // }}} + + // "private" because of the first argument's namespace + _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR + simd(_PrivateInit, const _MemberType& __init) + : _M_data(__init) {} + + // "private" because of the first argument's namespace + _GLIBCXX_SIMD_INTRINSIC + simd(_BitsetInit, bitset<size()> __init) : _M_data() + { where(mask_type(__bitset_init, __init), *this) = ~*this; } + + _GLIBCXX_SIMD_INTRINSIC constexpr bool + _M_is_constprop() const + { + if constexpr (__is_scalar_abi<_Abi>()) + return __builtin_constant_p(_M_data); + else + return _M_data._M_is_constprop(); + } + + private: + _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR static mask_type + _S_make_mask(typename mask_type::_MemberType __k) + { return {__private_init, __k}; } + + friend const auto& __data<value_type, abi_type>(const simd&); + friend auto& __data<value_type, abi_type>(simd&); + alignas(_Traits::_S_simd_align) _MemberType _M_data; + }; + +// }}} +// __data {{{ +template <typename _Tp, typename _Ap> + _GLIBCXX_SIMD_INTRINSIC constexpr const auto& + 
__data(const simd<_Tp, _Ap>& __x) + { return __x._M_data; } + +template <typename _Tp, typename _Ap> + _GLIBCXX_SIMD_INTRINSIC constexpr auto& + __data(simd<_Tp, _Ap>& __x) + { return __x._M_data; } + +// }}} +namespace __float_bitwise_operators { //{{{ +template <typename _Tp, typename _Ap> + _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR simd<_Tp, _Ap> + operator^(const simd<_Tp, _Ap>& __a, const simd<_Tp, _Ap>& __b) + { + return {__private_init, + _Ap::_SimdImpl::_S_bit_xor(__data(__a), __data(__b))}; + } + +template <typename _Tp, typename _Ap> + _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR simd<_Tp, _Ap> + operator|(const simd<_Tp, _Ap>& __a, const simd<_Tp, _Ap>& __b) + { + return {__private_init, + _Ap::_SimdImpl::_S_bit_or(__data(__a), __data(__b))}; + } + +template <typename _Tp, typename _Ap> + _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR simd<_Tp, _Ap> + operator&(const simd<_Tp, _Ap>& __a, const simd<_Tp, _Ap>& __b) + { + return {__private_init, + _Ap::_SimdImpl::_S_bit_and(__data(__a), __data(__b))}; + } +} // namespace __float_bitwise_operators }}} + +_GLIBCXX_SIMD_END_NAMESPACE + +#endif // __cplusplus >= 201703L +#endif // _GLIBCXX_EXPERIMENTAL_SIMD_H + +// vim: foldmethod=marker foldmarker={{{,}}} diff --git a/libstdc++-v3/include/experimental/bits/simd_builtin.h b/libstdc++-v3/include/experimental/bits/simd_builtin.h new file mode 100644 index 00000000000..f2c99faa4ee --- /dev/null +++ b/libstdc++-v3/include/experimental/bits/simd_builtin.h @@ -0,0 +1,2949 @@ +// Simd Abi specific implementations -*- C++ -*- + +// Copyright (C) 2020 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// Under Section 7 of GPL version 3, you are granted additional +// permissions described in the GCC Runtime Library Exception, version +// 3.1, as published by the Free Software Foundation. + +// You should have received a copy of the GNU General Public License and +// a copy of the GCC Runtime Library Exception along with this program; +// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +// <http://www.gnu.org/licenses/>. + +#ifndef _GLIBCXX_EXPERIMENTAL_SIMD_ABIS_H_ +#define _GLIBCXX_EXPERIMENTAL_SIMD_ABIS_H_ + +#if __cplusplus >= 201703L + +#include <array> +#include <cmath> +#include <cstdlib> + +_GLIBCXX_SIMD_BEGIN_NAMESPACE +// _S_allbits{{{ +template <typename _V> + static inline _GLIBCXX_SIMD_USE_CONSTEXPR _V _S_allbits + = reinterpret_cast<_V>(~__vector_type_t<char, sizeof(_V) / sizeof(char)>()); + +// }}} +// _S_signmask, _S_absmask{{{ +template <typename _V, typename = _VectorTraits<_V>> + static inline _GLIBCXX_SIMD_USE_CONSTEXPR _V _S_signmask + = __xor(_V() + 1, _V() - 1); + +template <typename _V, typename = _VectorTraits<_V>> + static inline _GLIBCXX_SIMD_USE_CONSTEXPR _V _S_absmask + = __andnot(_S_signmask<_V>, _S_allbits<_V>); + +//}}} +// __vector_permute<Indices...>{{{ +// Index == -1 requests zeroing of the output element +template <int... 
_Indices, typename _Tp, typename _TVT = _VectorTraits<_Tp>> + _Tp + __vector_permute(_Tp __x) + { + static_assert(sizeof...(_Indices) == _TVT::_S_full_size); + return __make_vector<typename _TVT::value_type>( + (_Indices == -1 ? 0 : __x[_Indices == -1 ? 0 : _Indices])...); + } + +// }}} +// __vector_shuffle<Indices...>{{{ +// Index == -1 requests zeroing of the output element +template <int... _Indices, typename _Tp, typename _TVT = _VectorTraits<_Tp>> + _Tp + __vector_shuffle(_Tp __x, _Tp __y) + { + return _Tp{(_Indices == -1 ? 0 + : _Indices < _TVT::_S_full_size + ? __x[_Indices] + : __y[_Indices - _TVT::_S_full_size])...}; + } + +// }}} +// __make_wrapper{{{ +template <typename _Tp, typename... _Args> + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper<_Tp, sizeof...(_Args)> + __make_wrapper(const _Args&... __args) + { return __make_vector<_Tp>(__args...); } + +// }}} +// __wrapper_bitcast{{{ +template <typename _Tp, size_t _ToN = 0, typename _Up, size_t _M, + size_t _Np = _ToN != 0 ? _ToN : sizeof(_Up) * _M / sizeof(_Tp)> + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper<_Tp, _Np> + __wrapper_bitcast(_SimdWrapper<_Up, _M> __x) + { + static_assert(_Np > 1); + return __intrin_bitcast<__vector_type_t<_Tp, _Np>>(__x._M_data); + } + +// }}} +// __shift_elements_right{{{ +// if (__shift % 2ⁿ == 0) => the low n Bytes are correct +template <unsigned __shift, typename _Tp, typename _TVT = _VectorTraits<_Tp>> + _GLIBCXX_SIMD_INTRINSIC _Tp + __shift_elements_right(_Tp __v) + { + [[maybe_unused]] const auto __iv = __to_intrin(__v); + static_assert(__shift <= sizeof(_Tp)); + if constexpr (__shift == 0) + return __v; + else if constexpr (__shift == sizeof(_Tp)) + return _Tp(); +#if _GLIBCXX_SIMD_X86INTRIN // {{{ + else if constexpr (__have_sse && __shift == 8 + && _TVT::template _S_is<float, 4>) + return _mm_movehl_ps(__iv, __iv); + else if constexpr (__have_sse2 && __shift == 8 + && _TVT::template _S_is<double, 2>) + return _mm_unpackhi_pd(__iv, __iv); + else if constexpr (__have_sse2 && sizeof(_Tp) == 16) + return reinterpret_cast<typename _TVT::type>( + _mm_srli_si128(reinterpret_cast<__m128i>(__iv), __shift)); + else if constexpr (__shift == 16 && sizeof(_Tp) == 32) + { + /*if constexpr (__have_avx && _TVT::template _S_is<double, 4>) + return _mm256_permute2f128_pd(__iv, __iv, 0x81); + else if constexpr (__have_avx && _TVT::template _S_is<float, 8>) + return _mm256_permute2f128_ps(__iv, __iv, 0x81); + else if constexpr (__have_avx) + return reinterpret_cast<typename _TVT::type>( + _mm256_permute2f128_si256(__iv, __iv, 0x81)); + else*/ + return __zero_extend(__hi128(__v)); + } + else if constexpr (__have_avx2 && sizeof(_Tp) == 32 && __shift < 16) + { + const auto __vll = __vector_bitcast<_LLong>(__v); + return reinterpret_cast<typename _TVT::type>( + _mm256_alignr_epi8(_mm256_permute2x128_si256(__vll, __vll, 0x81), + __vll, __shift)); + } + else if constexpr (__have_avx && sizeof(_Tp) == 32 && __shift < 16) + { + const auto __vll = __vector_bitcast<_LLong>(__v); + return reinterpret_cast<typename _TVT::type>( + __concat(_mm_alignr_epi8(__hi128(__vll), __lo128(__vll), __shift), + _mm_srli_si128(__hi128(__vll), __shift))); + } + else if constexpr (sizeof(_Tp) == 32 && __shift > 16) + return __zero_extend(__shift_elements_right<__shift - 16>(__hi128(__v))); + else if constexpr (sizeof(_Tp) == 64 && __shift == 32) + return __zero_extend(__hi256(__v)); + else if constexpr (__have_avx512f && sizeof(_Tp) == 64) + { + if constexpr (__shift >= 48) + return __zero_extend( + __shift_elements_right<__shift - 
48>(__extract<3, 4>(__v))); + else if constexpr (__shift >= 32) + return __zero_extend( + __shift_elements_right<__shift - 32>(__hi256(__v))); + else if constexpr (__shift % 8 == 0) + return reinterpret_cast<typename _TVT::type>( + _mm512_alignr_epi64(__m512i(), __intrin_bitcast<__m512i>(__v), + __shift / 8)); + else if constexpr (__shift % 4 == 0) + return reinterpret_cast<typename _TVT::type>( + _mm512_alignr_epi32(__m512i(), __intrin_bitcast<__m512i>(__v), + __shift / 4)); + else if constexpr (__have_avx512bw && __shift < 16) + { + const auto __vll = __vector_bitcast<_LLong>(__v); + return reinterpret_cast<typename _TVT::type>( + _mm512_alignr_epi8(_mm512_shuffle_i32x4(__vll, __vll, 0xf9), + __vll, __shift)); + } + else if constexpr (__have_avx512bw && __shift < 32) + { + const auto __vll = __vector_bitcast<_LLong>(__v); + return reinterpret_cast<typename _TVT::type>( + _mm512_alignr_epi8(_mm512_shuffle_i32x4(__vll, __m512i(), 0xee), + _mm512_shuffle_i32x4(__vll, __vll, 0xf9), + __shift - 16)); + } + else + __assert_unreachable<_Tp>(); + } + /* + } else if constexpr (__shift % 16 == 0 && sizeof(_Tp) == 64) + return __auto_bitcast(__extract<__shift / 16, 4>(__v)); + */ +#endif // _GLIBCXX_SIMD_X86INTRIN }}} + else + { + constexpr int __chunksize = __shift % 8 == 0 ? 8 + : __shift % 4 == 0 ? 4 + : __shift % 2 == 0 ? 2 + : 1; + auto __w = __vector_bitcast<__int_with_sizeof_t<__chunksize>>(__v); + using _Up = decltype(__w); + return __intrin_bitcast<_Tp>( + __call_with_n_evaluations<(sizeof(_Tp) - __shift) / __chunksize>( + [](auto... __chunks) { return _Up{__chunks...}; }, + [&](auto __i) { return __w[__shift / __chunksize + __i]; })); + } + } + +// }}} +// __extract_part(_SimdWrapper<_Tp, _Np>) {{{ +template <int _Index, int _Total, int _Combine, typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_CONST + _SimdWrapper<_Tp, _Np / _Total * _Combine> + __extract_part(const _SimdWrapper<_Tp, _Np> __x) + { + if constexpr (_Index % 2 == 0 && _Total % 2 == 0 && _Combine % 2 == 0) + return __extract_part<_Index / 2, _Total / 2, _Combine / 2>(__x); + else + { + constexpr size_t __values_per_part = _Np / _Total; + constexpr size_t __values_to_skip = _Index * __values_per_part; + constexpr size_t __return_size = __values_per_part * _Combine; + using _R = __vector_type_t<_Tp, __return_size>; + static_assert((_Index + _Combine) * __values_per_part * sizeof(_Tp) + <= sizeof(__x), + "out of bounds __extract_part"); + // the following assertion would ensure no "padding" to be read + // static_assert(_Total >= _Index + _Combine, "_Total must be greater + // than _Index"); + + // static_assert(__return_size * _Total == _Np, "_Np must be divisible + // by _Total"); + if (__x._M_is_constprop()) + return __generate_from_n_evaluations<__return_size, _R>( + [&](auto __i) { return __x[__values_to_skip + __i]; }); + if constexpr (_Index == 0 && _Total == 1) + return __x; + else if constexpr (_Index == 0) + return __intrin_bitcast<_R>(__as_vector(__x)); +#if _GLIBCXX_SIMD_X86INTRIN // {{{ + else if constexpr (sizeof(__x) == 32 + && __return_size * sizeof(_Tp) <= 16) + { + constexpr size_t __bytes_to_skip = __values_to_skip * sizeof(_Tp); + if constexpr (__bytes_to_skip == 16) + return __vector_bitcast<_Tp, __return_size>( + __hi128(__as_vector(__x))); + else + return __vector_bitcast<_Tp, __return_size>( + _mm_alignr_epi8(__hi128(__vector_bitcast<_LLong>(__x)), + __lo128(__vector_bitcast<_LLong>(__x)), + __bytes_to_skip)); + } +#endif // _GLIBCXX_SIMD_X86INTRIN }}} + else if constexpr (_Index > 0 + && 
(__values_to_skip % __return_size != 0 + || sizeof(_R) >= 8) + && (__values_to_skip + __return_size) * sizeof(_Tp) + <= 64 + && sizeof(__x) >= 16) + return __intrin_bitcast<_R>( + __shift_elements_right<__values_to_skip * sizeof(_Tp)>( + __as_vector(__x))); + else + { + _R __r = {}; + __builtin_memcpy(&__r, + reinterpret_cast<const char*>(&__x) + + sizeof(_Tp) * __values_to_skip, + __return_size * sizeof(_Tp)); + return __r; + } + } + } + +// }}} +// __extract_part(_SimdWrapper<bool, _Np>) {{{ +template <int _Index, int _Total, int _Combine = 1, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper<bool, _Np / _Total * _Combine> + __extract_part(const _SimdWrapper<bool, _Np> __x) + { + static_assert(_Combine == 1, "_Combine != 1 not implemented"); + static_assert(__have_avx512f && _Np == _Np); + static_assert(_Total >= 2 && _Index + _Combine <= _Total && _Index >= 0); + return __x._M_data >> (_Index * _Np / _Total); + } + +// }}} + +// __vector_convert {{{ +// implementation requires an index sequence +template <typename _To, typename _From, size_t... _I> + _GLIBCXX_SIMD_INTRINSIC constexpr _To + __vector_convert(_From __a, index_sequence<_I...>) + { + using _Tp = typename _VectorTraits<_To>::value_type; + return _To{static_cast<_Tp>(__a[_I])...}; + } + +template <typename _To, typename _From, size_t... _I> + _GLIBCXX_SIMD_INTRINSIC constexpr _To + __vector_convert(_From __a, _From __b, index_sequence<_I...>) + { + using _Tp = typename _VectorTraits<_To>::value_type; + return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])...}; + } + +template <typename _To, typename _From, size_t... _I> + _GLIBCXX_SIMD_INTRINSIC constexpr _To + __vector_convert(_From __a, _From __b, _From __c, index_sequence<_I...>) + { + using _Tp = typename _VectorTraits<_To>::value_type; + return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])..., + static_cast<_Tp>(__c[_I])...}; + } + +template <typename _To, typename _From, size_t... _I> + _GLIBCXX_SIMD_INTRINSIC constexpr _To + __vector_convert(_From __a, _From __b, _From __c, _From __d, + index_sequence<_I...>) + { + using _Tp = typename _VectorTraits<_To>::value_type; + return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])..., + static_cast<_Tp>(__c[_I])..., static_cast<_Tp>(__d[_I])...}; + } + +template <typename _To, typename _From, size_t... _I> + _GLIBCXX_SIMD_INTRINSIC constexpr _To + __vector_convert(_From __a, _From __b, _From __c, _From __d, _From __e, + index_sequence<_I...>) + { + using _Tp = typename _VectorTraits<_To>::value_type; + return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])..., + static_cast<_Tp>(__c[_I])..., static_cast<_Tp>(__d[_I])..., + static_cast<_Tp>(__e[_I])...}; + } + +template <typename _To, typename _From, size_t... _I> + _GLIBCXX_SIMD_INTRINSIC constexpr _To + __vector_convert(_From __a, _From __b, _From __c, _From __d, _From __e, + _From __f, index_sequence<_I...>) + { + using _Tp = typename _VectorTraits<_To>::value_type; + return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])..., + static_cast<_Tp>(__c[_I])..., static_cast<_Tp>(__d[_I])..., + static_cast<_Tp>(__e[_I])..., static_cast<_Tp>(__f[_I])...}; + } + +template <typename _To, typename _From, size_t... 
_I> + _GLIBCXX_SIMD_INTRINSIC constexpr _To + __vector_convert(_From __a, _From __b, _From __c, _From __d, _From __e, + _From __f, _From __g, index_sequence<_I...>) + { + using _Tp = typename _VectorTraits<_To>::value_type; + return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])..., + static_cast<_Tp>(__c[_I])..., static_cast<_Tp>(__d[_I])..., + static_cast<_Tp>(__e[_I])..., static_cast<_Tp>(__f[_I])..., + static_cast<_Tp>(__g[_I])...}; + } + +template <typename _To, typename _From, size_t... _I> + _GLIBCXX_SIMD_INTRINSIC constexpr _To + __vector_convert(_From __a, _From __b, _From __c, _From __d, _From __e, + _From __f, _From __g, _From __h, index_sequence<_I...>) + { + using _Tp = typename _VectorTraits<_To>::value_type; + return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])..., + static_cast<_Tp>(__c[_I])..., static_cast<_Tp>(__d[_I])..., + static_cast<_Tp>(__e[_I])..., static_cast<_Tp>(__f[_I])..., + static_cast<_Tp>(__g[_I])..., static_cast<_Tp>(__h[_I])...}; + } + +template <typename _To, typename _From, size_t... _I> + _GLIBCXX_SIMD_INTRINSIC constexpr _To + __vector_convert(_From __a, _From __b, _From __c, _From __d, _From __e, + _From __f, _From __g, _From __h, _From __i, + index_sequence<_I...>) + { + using _Tp = typename _VectorTraits<_To>::value_type; + return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])..., + static_cast<_Tp>(__c[_I])..., static_cast<_Tp>(__d[_I])..., + static_cast<_Tp>(__e[_I])..., static_cast<_Tp>(__f[_I])..., + static_cast<_Tp>(__g[_I])..., static_cast<_Tp>(__h[_I])..., + static_cast<_Tp>(__i[_I])...}; + } + +template <typename _To, typename _From, size_t... _I> + _GLIBCXX_SIMD_INTRINSIC constexpr _To + __vector_convert(_From __a, _From __b, _From __c, _From __d, _From __e, + _From __f, _From __g, _From __h, _From __i, _From __j, + index_sequence<_I...>) + { + using _Tp = typename _VectorTraits<_To>::value_type; + return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])..., + static_cast<_Tp>(__c[_I])..., static_cast<_Tp>(__d[_I])..., + static_cast<_Tp>(__e[_I])..., static_cast<_Tp>(__f[_I])..., + static_cast<_Tp>(__g[_I])..., static_cast<_Tp>(__h[_I])..., + static_cast<_Tp>(__i[_I])..., static_cast<_Tp>(__j[_I])...}; + } + +template <typename _To, typename _From, size_t... _I> + _GLIBCXX_SIMD_INTRINSIC constexpr _To + __vector_convert(_From __a, _From __b, _From __c, _From __d, _From __e, + _From __f, _From __g, _From __h, _From __i, _From __j, + _From __k, index_sequence<_I...>) + { + using _Tp = typename _VectorTraits<_To>::value_type; + return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])..., + static_cast<_Tp>(__c[_I])..., static_cast<_Tp>(__d[_I])..., + static_cast<_Tp>(__e[_I])..., static_cast<_Tp>(__f[_I])..., + static_cast<_Tp>(__g[_I])..., static_cast<_Tp>(__h[_I])..., + static_cast<_Tp>(__i[_I])..., static_cast<_Tp>(__j[_I])..., + static_cast<_Tp>(__k[_I])...}; + } + +template <typename _To, typename _From, size_t... 
_I> + _GLIBCXX_SIMD_INTRINSIC constexpr _To + __vector_convert(_From __a, _From __b, _From __c, _From __d, _From __e, + _From __f, _From __g, _From __h, _From __i, _From __j, + _From __k, _From __l, index_sequence<_I...>) + { + using _Tp = typename _VectorTraits<_To>::value_type; + return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])..., + static_cast<_Tp>(__c[_I])..., static_cast<_Tp>(__d[_I])..., + static_cast<_Tp>(__e[_I])..., static_cast<_Tp>(__f[_I])..., + static_cast<_Tp>(__g[_I])..., static_cast<_Tp>(__h[_I])..., + static_cast<_Tp>(__i[_I])..., static_cast<_Tp>(__j[_I])..., + static_cast<_Tp>(__k[_I])..., static_cast<_Tp>(__l[_I])...}; + } + +template <typename _To, typename _From, size_t... _I> + _GLIBCXX_SIMD_INTRINSIC constexpr _To + __vector_convert(_From __a, _From __b, _From __c, _From __d, _From __e, + _From __f, _From __g, _From __h, _From __i, _From __j, + _From __k, _From __l, _From __m, index_sequence<_I...>) + { + using _Tp = typename _VectorTraits<_To>::value_type; + return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])..., + static_cast<_Tp>(__c[_I])..., static_cast<_Tp>(__d[_I])..., + static_cast<_Tp>(__e[_I])..., static_cast<_Tp>(__f[_I])..., + static_cast<_Tp>(__g[_I])..., static_cast<_Tp>(__h[_I])..., + static_cast<_Tp>(__i[_I])..., static_cast<_Tp>(__j[_I])..., + static_cast<_Tp>(__k[_I])..., static_cast<_Tp>(__l[_I])..., + static_cast<_Tp>(__m[_I])...}; + } + +template <typename _To, typename _From, size_t... _I> + _GLIBCXX_SIMD_INTRINSIC constexpr _To + __vector_convert(_From __a, _From __b, _From __c, _From __d, _From __e, + _From __f, _From __g, _From __h, _From __i, _From __j, + _From __k, _From __l, _From __m, _From __n, + index_sequence<_I...>) + { + using _Tp = typename _VectorTraits<_To>::value_type; + return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])..., + static_cast<_Tp>(__c[_I])..., static_cast<_Tp>(__d[_I])..., + static_cast<_Tp>(__e[_I])..., static_cast<_Tp>(__f[_I])..., + static_cast<_Tp>(__g[_I])..., static_cast<_Tp>(__h[_I])..., + static_cast<_Tp>(__i[_I])..., static_cast<_Tp>(__j[_I])..., + static_cast<_Tp>(__k[_I])..., static_cast<_Tp>(__l[_I])..., + static_cast<_Tp>(__m[_I])..., static_cast<_Tp>(__n[_I])...}; + } + +template <typename _To, typename _From, size_t... _I> + _GLIBCXX_SIMD_INTRINSIC constexpr _To + __vector_convert(_From __a, _From __b, _From __c, _From __d, _From __e, + _From __f, _From __g, _From __h, _From __i, _From __j, + _From __k, _From __l, _From __m, _From __n, _From __o, + index_sequence<_I...>) + { + using _Tp = typename _VectorTraits<_To>::value_type; + return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])..., + static_cast<_Tp>(__c[_I])..., static_cast<_Tp>(__d[_I])..., + static_cast<_Tp>(__e[_I])..., static_cast<_Tp>(__f[_I])..., + static_cast<_Tp>(__g[_I])..., static_cast<_Tp>(__h[_I])..., + static_cast<_Tp>(__i[_I])..., static_cast<_Tp>(__j[_I])..., + static_cast<_Tp>(__k[_I])..., static_cast<_Tp>(__l[_I])..., + static_cast<_Tp>(__m[_I])..., static_cast<_Tp>(__n[_I])..., + static_cast<_Tp>(__o[_I])...}; + } + +template <typename _To, typename _From, size_t... 
_I> + _GLIBCXX_SIMD_INTRINSIC constexpr _To + __vector_convert(_From __a, _From __b, _From __c, _From __d, _From __e, + _From __f, _From __g, _From __h, _From __i, _From __j, + _From __k, _From __l, _From __m, _From __n, _From __o, + _From __p, index_sequence<_I...>) + { + using _Tp = typename _VectorTraits<_To>::value_type; + return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])..., + static_cast<_Tp>(__c[_I])..., static_cast<_Tp>(__d[_I])..., + static_cast<_Tp>(__e[_I])..., static_cast<_Tp>(__f[_I])..., + static_cast<_Tp>(__g[_I])..., static_cast<_Tp>(__h[_I])..., + static_cast<_Tp>(__i[_I])..., static_cast<_Tp>(__j[_I])..., + static_cast<_Tp>(__k[_I])..., static_cast<_Tp>(__l[_I])..., + static_cast<_Tp>(__m[_I])..., static_cast<_Tp>(__n[_I])..., + static_cast<_Tp>(__o[_I])..., static_cast<_Tp>(__p[_I])...}; + } + +// Defer actual conversion to the overload that takes an index sequence. Note +// that this function adds zeros or drops values off the end if you don't ensure +// matching width. +template <typename _To, typename... _From, size_t _FromSize> + _GLIBCXX_SIMD_INTRINSIC constexpr _To + __vector_convert(_SimdWrapper<_From, _FromSize>... __xs) + { +#ifdef _GLIBCXX_SIMD_WORKAROUND_PR85048 + using _From0 = __first_of_pack_t<_From...>; + using _FW = _SimdWrapper<_From0, _FromSize>; + if (!_FW::_S_is_partial && !(... && __xs._M_is_constprop())) + { + if constexpr ((sizeof...(_From) & (sizeof...(_From) - 1)) + == 0) // power-of-two number of arguments + return __convert_x86<_To>(__as_vector(__xs)...); + else // append zeros and recurse until the above branch is taken + return __vector_convert<_To>(__xs..., _FW{}); + } + else +#endif + return __vector_convert<_To>( + __as_vector(__xs)..., + make_index_sequence<(sizeof...(__xs) == 1 ? std::min( + _VectorTraits<_To>::_S_full_size, int(_FromSize)) + : _FromSize)>()); + } + +// }}} +// __convert function{{{ +template <typename _To, typename _From, typename... _More> + _GLIBCXX_SIMD_INTRINSIC constexpr auto + __convert(_From __v0, _More... __vs) + { + static_assert((true && ... && is_same_v<_From, _More>) ); + if constexpr (__is_vectorizable_v<_From>) + { + using _V = typename _VectorTraits<_To>::type; + using _Tp = typename _VectorTraits<_To>::value_type; + return _V{static_cast<_Tp>(__v0), static_cast<_Tp>(__vs)...}; + } + else if constexpr (__is_vector_type_v<_From>) + return __convert<_To>(__as_wrapper(__v0), __as_wrapper(__vs)...); + else // _SimdWrapper arguments + { + constexpr size_t __input_size = _From::_S_size * (1 + sizeof...(_More)); + if constexpr (__is_vectorizable_v<_To>) + return __convert<__vector_type_t<_To, __input_size>>(__v0, __vs...); + else if constexpr (!__is_vector_type_v<_To>) + return _To(__convert<typename _To::_BuiltinType>(__v0, __vs...)); + else + { + static_assert( + sizeof...(_More) == 0 + || _VectorTraits<_To>::_S_full_size >= __input_size, + "__convert(...) requires the input to fit into the output"); + return __vector_convert<_To>(__v0, __vs...); + } + } + } + +// }}} +// __convert_all{{{ +// Converts __v into array<_To, N>, where N is _NParts if non-zero or +// otherwise deduced from _To such that N * #elements(_To) <= #elements(__v). 
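+// For example (illustrative): with __v a 16-element vector of char and
+// _To = __vector_type_t<int, 4>, N is deduced as 16/4 = 4, so
+// __convert_all<__vector_type_t<int, 4>>(__v) yields
+// array<__vector_type_t<int, 4>, 4>, each element holding four converted
+// values.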
+// Note: this function may return less than all converted elements +template <typename _To, + size_t _NParts = 0, // allows to convert fewer or more (only last + // _To, to be partially filled) than all + size_t _Offset = 0, // where to start, # of elements (not Bytes or + // Parts) + typename _From, typename _FromVT = _VectorTraits<_From>> + _GLIBCXX_SIMD_INTRINSIC auto + __convert_all(_From __v) + { + if constexpr (is_arithmetic_v<_To> && _NParts != 1) + { + static_assert(_Offset < _FromVT::_S_full_size); + constexpr auto _Np + = _NParts == 0 ? _FromVT::_S_partial_width - _Offset : _NParts; + return __generate_from_n_evaluations<_Np, array<_To, _Np>>( + [&](auto __i) { return static_cast<_To>(__v[__i + _Offset]); }); + } + else + { + static_assert(__is_vector_type_v<_To>); + using _ToVT = _VectorTraits<_To>; + if constexpr (__is_vector_type_v<_From>) + return __convert_all<_To, _NParts>(__as_wrapper(__v)); + else if constexpr (_NParts == 1) + { + static_assert(_Offset % _ToVT::_S_full_size == 0); + return array<_To, 1>{__vector_convert<_To>( + __extract_part<_Offset / _ToVT::_S_full_size, + __div_roundup(_FromVT::_S_partial_width, + _ToVT::_S_full_size)>(__v))}; + } +#if _GLIBCXX_SIMD_X86INTRIN // {{{ + else if constexpr (!__have_sse4_1 && _Offset == 0 + && is_integral_v<typename _FromVT::value_type> + && sizeof(typename _FromVT::value_type) + < sizeof(typename _ToVT::value_type) + && !(sizeof(typename _FromVT::value_type) == 4 + && is_same_v<typename _ToVT::value_type, double>)) + { + using _ToT = typename _ToVT::value_type; + using _FromT = typename _FromVT::value_type; + constexpr size_t _Np + = _NParts != 0 + ? _NParts + : (_FromVT::_S_partial_width / _ToVT::_S_full_size); + using _R = array<_To, _Np>; + // __adjust modifies its input to have _Np (use _SizeConstant) + // entries so that no unnecessary intermediate conversions are + // requested and, more importantly, no intermediate conversions are + // missing + [[maybe_unused]] auto __adjust + = [](auto __n, + auto __vv) -> _SimdWrapper<_FromT, decltype(__n)::value> { + return __vector_bitcast<_FromT, decltype(__n)::value>(__vv); + }; + [[maybe_unused]] const auto __vi = __to_intrin(__v); + auto&& __make_array = [](auto __x0, [[maybe_unused]] auto __x1) { + if constexpr (_Np == 1) + return _R{__intrin_bitcast<_To>(__x0)}; + else + return _R{__intrin_bitcast<_To>(__x0), + __intrin_bitcast<_To>(__x1)}; + }; + + if constexpr (_Np == 0) + return _R{}; + else if constexpr (sizeof(_FromT) == 1 && sizeof(_ToT) == 2) + { + static_assert(is_integral_v<_FromT>); + static_assert(is_integral_v<_ToT>); + if constexpr (is_unsigned_v<_FromT>) + return __make_array(_mm_unpacklo_epi8(__vi, __m128i()), + _mm_unpackhi_epi8(__vi, __m128i())); + else + return __make_array( + _mm_srai_epi16(_mm_unpacklo_epi8(__vi, __vi), 8), + _mm_srai_epi16(_mm_unpackhi_epi8(__vi, __vi), 8)); + } + else if constexpr (sizeof(_FromT) == 2 && sizeof(_ToT) == 4) + { + static_assert(is_integral_v<_FromT>); + if constexpr (is_floating_point_v<_ToT>) + { + const auto __ints + = __convert_all<__vector_type16_t<int>, _Np>( + __adjust(_SizeConstant<_Np * 4>(), __v)); + return __generate_from_n_evaluations<_Np, _R>( + [&](auto __i) { + return __vector_convert<_To>(__as_wrapper(__ints[__i])); + }); + } + else if constexpr (is_unsigned_v<_FromT>) + return __make_array(_mm_unpacklo_epi16(__vi, __m128i()), + _mm_unpackhi_epi16(__vi, __m128i())); + else + return __make_array( + _mm_srai_epi32(_mm_unpacklo_epi16(__vi, __vi), 16), + _mm_srai_epi32(_mm_unpackhi_epi16(__vi, __vi), 16)); 
+ } + else if constexpr (sizeof(_FromT) == 4 && sizeof(_ToT) == 8 + && is_integral_v<_FromT> && is_integral_v<_ToT>) + { + if constexpr (is_unsigned_v<_FromT>) + return __make_array(_mm_unpacklo_epi32(__vi, __m128i()), + _mm_unpackhi_epi32(__vi, __m128i())); + else + return __make_array( + _mm_unpacklo_epi32(__vi, _mm_srai_epi32(__vi, 31)), + _mm_unpackhi_epi32(__vi, _mm_srai_epi32(__vi, 31))); + } + else if constexpr (sizeof(_FromT) == 1 && sizeof(_ToT) >= 4 + && is_signed_v<_FromT>) + { + const __m128i __vv[2] = {_mm_unpacklo_epi8(__vi, __vi), + _mm_unpackhi_epi8(__vi, __vi)}; + const __vector_type_t<int, 4> __vvvv[4] = { + __vector_bitcast<int>(_mm_unpacklo_epi16(__vv[0], __vv[0])), + __vector_bitcast<int>(_mm_unpackhi_epi16(__vv[0], __vv[0])), + __vector_bitcast<int>(_mm_unpacklo_epi16(__vv[1], __vv[1])), + __vector_bitcast<int>(_mm_unpackhi_epi16(__vv[1], __vv[1]))}; + if constexpr (sizeof(_ToT) == 4) + return __generate_from_n_evaluations<_Np, _R>([&](auto __i) { + return __vector_convert<_To>( + _SimdWrapper<int, 4>(__vvvv[__i] >> 24)); + }); + else if constexpr (is_integral_v<_ToT>) + return __generate_from_n_evaluations<_Np, _R>([&](auto __i) { + const auto __signbits = __to_intrin(__vvvv[__i / 2] >> 31); + const auto __sx32 = __to_intrin(__vvvv[__i / 2] >> 24); + return __vector_bitcast<_ToT>( + __i % 2 == 0 ? _mm_unpacklo_epi32(__sx32, __signbits) + : _mm_unpackhi_epi32(__sx32, __signbits)); + }); + else + return __generate_from_n_evaluations<_Np, _R>([&](auto __i) { + const _SimdWrapper<int, 4> __int4 = __vvvv[__i / 2] >> 24; + return __vector_convert<_To>( + __i % 2 == 0 ? 
__int4 + : _SimdWrapper<int, 4>( + _mm_unpackhi_epi64(__to_intrin(__int4), + __to_intrin(__int4)))); + }); + } + else if constexpr (sizeof(_FromT) == 1 && sizeof(_ToT) == 4) + { + const auto __shorts = __convert_all<__vector_type16_t< + conditional_t<is_signed_v<_FromT>, short, unsigned short>>>( + __adjust(_SizeConstant<(_Np + 1) / 2 * 8>(), __v)); + return __generate_from_n_evaluations<_Np, _R>([&](auto __i) { + return __convert_all<_To>(__shorts[__i / 2])[__i % 2]; + }); + } + else if constexpr (sizeof(_FromT) == 2 && sizeof(_ToT) == 8 + && is_signed_v<_FromT> && is_integral_v<_ToT>) + { + const __m128i __vv[2] = {_mm_unpacklo_epi16(__vi, __vi), + _mm_unpackhi_epi16(__vi, __vi)}; + const __vector_type16_t<int> __vvvv[4] + = {__vector_bitcast<int>( + _mm_unpacklo_epi32(_mm_srai_epi32(__vv[0], 16), + _mm_srai_epi32(__vv[0], 31))), + __vector_bitcast<int>( + _mm_unpackhi_epi32(_mm_srai_epi32(__vv[0], 16), + _mm_srai_epi32(__vv[0], 31))), + __vector_bitcast<int>( + _mm_unpacklo_epi32(_mm_srai_epi32(__vv[1], 16), + _mm_srai_epi32(__vv[1], 31))), + __vector_bitcast<int>( + _mm_unpackhi_epi32(_mm_srai_epi32(__vv[1], 16), + _mm_srai_epi32(__vv[1], 31)))}; + return __generate_from_n_evaluations<_Np, _R>([&](auto __i) { + return __vector_bitcast<_ToT>(__vvvv[__i]); + }); + } + else if constexpr (sizeof(_FromT) <= 2 && sizeof(_ToT) == 8) + { + const auto __ints + = __convert_all<__vector_type16_t<conditional_t< + is_signed_v<_FromT> || is_floating_point_v<_ToT>, int, + unsigned int>>>( + __adjust(_SizeConstant<(_Np + 1) / 2 * 4>(), __v)); + return __generate_from_n_evaluations<_Np, _R>([&](auto __i) { + return __convert_all<_To>(__ints[__i / 2])[__i % 2]; + }); + } + else + __assert_unreachable<_To>(); + } +#endif // _GLIBCXX_SIMD_X86INTRIN }}} + else if constexpr ((_FromVT::_S_partial_width - _Offset) + > _ToVT::_S_full_size) + { + /* + static_assert( + (_FromVT::_S_partial_width & (_FromVT::_S_partial_width - 1)) == + 0, + "__convert_all only supports power-of-2 number of elements. + Otherwise " "the return type cannot be array<_To, N>."); + */ + constexpr size_t _NTotal + = (_FromVT::_S_partial_width - _Offset) / _ToVT::_S_full_size; + constexpr size_t _Np = _NParts == 0 ? 
_NTotal : _NParts; + static_assert( + _Np <= _NTotal + || (_Np == _NTotal + 1 + && (_FromVT::_S_partial_width - _Offset) % _ToVT::_S_full_size + > 0)); + using _R = array<_To, _Np>; + if constexpr (_Np == 1) + return _R{__vector_convert<_To>( + __extract_part<_Offset, _FromVT::_S_partial_width, + _ToVT::_S_full_size>(__v))}; + else + return __generate_from_n_evaluations<_Np, _R>([&]( + auto __i) constexpr { + auto __part + = __extract_part<__i * _ToVT::_S_full_size + _Offset, + _FromVT::_S_partial_width, + _ToVT::_S_full_size>(__v); + return __vector_convert<_To>(__part); + }); + } + else if constexpr (_Offset == 0) + return array<_To, 1>{__vector_convert<_To>(__v)}; + else + return array<_To, 1>{__vector_convert<_To>( + __extract_part<_Offset, _FromVT::_S_partial_width, + _FromVT::_S_partial_width - _Offset>(__v))}; + } + } + +// }}} + +// _GnuTraits {{{ +template <typename _Tp, typename _Mp, typename _Abi, size_t _Np> + struct _GnuTraits + { + using _IsValid = true_type; + using _SimdImpl = typename _Abi::_SimdImpl; + using _MaskImpl = typename _Abi::_MaskImpl; + + // simd and simd_mask member types {{{ + using _SimdMember = _SimdWrapper<_Tp, _Np>; + using _MaskMember = _SimdWrapper<_Mp, _Np>; + static constexpr size_t _S_simd_align = alignof(_SimdMember); + static constexpr size_t _S_mask_align = alignof(_MaskMember); + + // }}} + // size metadata {{{ + static constexpr size_t _S_full_size = _SimdMember::_S_full_size; + static constexpr bool _S_is_partial = _SimdMember::_S_is_partial; + + // }}} + // _SimdBase / base class for simd, providing extra conversions {{{ + struct _SimdBase2 + { + explicit operator __intrinsic_type_t<_Tp, _Np>() const + { + return __to_intrin(static_cast<const simd<_Tp, _Abi>*>(this)->_M_data); + } + explicit operator __vector_type_t<_Tp, _Np>() const + { + return static_cast<const simd<_Tp, _Abi>*>(this)->_M_data.__builtin(); + } + }; + + struct _SimdBase1 + { + explicit operator __intrinsic_type_t<_Tp, _Np>() const + { return __data(*static_cast<const simd<_Tp, _Abi>*>(this)); } + }; + + using _SimdBase = conditional_t< + is_same<__intrinsic_type_t<_Tp, _Np>, __vector_type_t<_Tp, _Np>>::value, + _SimdBase1, _SimdBase2>; + + // }}} + // _MaskBase {{{ + struct _MaskBase2 + { + explicit operator __intrinsic_type_t<_Tp, _Np>() const + { + return static_cast<const simd_mask<_Tp, _Abi>*>(this) + ->_M_data.__intrin(); + } + explicit operator __vector_type_t<_Tp, _Np>() const + { + return static_cast<const simd_mask<_Tp, _Abi>*>(this)->_M_data._M_data; + } + }; + + struct _MaskBase1 + { + explicit operator __intrinsic_type_t<_Tp, _Np>() const + { return __data(*static_cast<const simd_mask<_Tp, _Abi>*>(this)); } + }; + + using _MaskBase = conditional_t< + is_same<__intrinsic_type_t<_Tp, _Np>, __vector_type_t<_Tp, _Np>>::value, + _MaskBase1, _MaskBase2>; + + // }}} + // _MaskCastType {{{ + // parameter type of one explicit simd_mask constructor + class _MaskCastType + { + using _Up = __intrinsic_type_t<_Tp, _Np>; + _Up _M_data; + + public: + _MaskCastType(_Up __x) : _M_data(__x) {} + operator _MaskMember() const { return _M_data; } + }; + + // }}} + // _SimdCastType {{{ + // parameter type of one explicit simd constructor + class _SimdCastType1 + { + using _Ap = __intrinsic_type_t<_Tp, _Np>; + _SimdMember _M_data; + + public: + _SimdCastType1(_Ap __a) : _M_data(__vector_bitcast<_Tp>(__a)) {} + operator _SimdMember() const { return _M_data; } + }; + + class _SimdCastType2 + { + using _Ap = __intrinsic_type_t<_Tp, _Np>; + using _B = __vector_type_t<_Tp, _Np>; + 
_SimdMember _M_data; + + public: + _SimdCastType2(_Ap __a) : _M_data(__vector_bitcast<_Tp>(__a)) {} + _SimdCastType2(_B __b) : _M_data(__b) {} + operator _SimdMember() const { return _M_data; } + }; + + using _SimdCastType = conditional_t< + is_same<__intrinsic_type_t<_Tp, _Np>, __vector_type_t<_Tp, _Np>>::value, + _SimdCastType1, _SimdCastType2>; + //}}} + }; + +// }}} +struct _CommonImplX86; +struct _CommonImplNeon; +struct _CommonImplBuiltin; +template <typename _Abi> struct _SimdImplBuiltin; +template <typename _Abi> struct _MaskImplBuiltin; +template <typename _Abi> struct _SimdImplX86; +template <typename _Abi> struct _MaskImplX86; +template <typename _Abi> struct _SimdImplNeon; +template <typename _Abi> struct _MaskImplNeon; +template <typename _Abi> struct _SimdImplPpc; + +// simd_abi::_VecBuiltin {{{ +template <int _UsedBytes> + struct simd_abi::_VecBuiltin + { + template <typename _Tp> + static constexpr size_t _S_size = _UsedBytes / sizeof(_Tp); + + // validity traits {{{ + struct _IsValidAbiTag : __bool_constant<(_UsedBytes > 1)> {}; + + template <typename _Tp> + struct _IsValidSizeFor + : __bool_constant<(_UsedBytes / sizeof(_Tp) > 1 + && _UsedBytes % sizeof(_Tp) == 0 + && _UsedBytes <= __vectorized_sizeof<_Tp>() + && (!__have_avx512f || _UsedBytes <= 32))> {}; + + template <typename _Tp> + struct _IsValid : conjunction<_IsValidAbiTag, __is_vectorizable<_Tp>, + _IsValidSizeFor<_Tp>> {}; + + template <typename _Tp> + static constexpr bool _S_is_valid_v = _IsValid<_Tp>::value; + + // }}} + // _SimdImpl/_MaskImpl {{{ +#if _GLIBCXX_SIMD_X86INTRIN + using _CommonImpl = _CommonImplX86; + using _SimdImpl = _SimdImplX86<_VecBuiltin<_UsedBytes>>; + using _MaskImpl = _MaskImplX86<_VecBuiltin<_UsedBytes>>; +#elif _GLIBCXX_SIMD_HAVE_NEON + using _CommonImpl = _CommonImplNeon; + using _SimdImpl = _SimdImplNeon<_VecBuiltin<_UsedBytes>>; + using _MaskImpl = _MaskImplNeon<_VecBuiltin<_UsedBytes>>; +#else + using _CommonImpl = _CommonImplBuiltin; +#ifdef __ALTIVEC__ + using _SimdImpl = _SimdImplPpc<_VecBuiltin<_UsedBytes>>; +#else + using _SimdImpl = _SimdImplBuiltin<_VecBuiltin<_UsedBytes>>; +#endif + using _MaskImpl = _MaskImplBuiltin<_VecBuiltin<_UsedBytes>>; +#endif + + // }}} + // __traits {{{ + template <typename _Tp> + using _MaskValueType = __int_for_sizeof_t<_Tp>; + + template <typename _Tp> + using __traits + = conditional_t<_S_is_valid_v<_Tp>, + _GnuTraits<_Tp, _MaskValueType<_Tp>, + _VecBuiltin<_UsedBytes>, _S_size<_Tp>>, + _InvalidTraits>; + + //}}} + // size metadata {{{ + template <typename _Tp> + static constexpr size_t _S_full_size = __traits<_Tp>::_S_full_size; + + template <typename _Tp> + static constexpr bool _S_is_partial = __traits<_Tp>::_S_is_partial; + + // }}} + // implicit masks {{{ + template <typename _Tp> + using _MaskMember = _SimdWrapper<_MaskValueType<_Tp>, _S_size<_Tp>>; + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp> + _S_implicit_mask() + { + using _UV = typename _MaskMember<_Tp>::_BuiltinType; + if constexpr (!_MaskMember<_Tp>::_S_is_partial) + return ~_UV(); + else + { + constexpr auto __size = _S_size<_Tp>; + _GLIBCXX_SIMD_USE_CONSTEXPR auto __r = __generate_vector<_UV>( + [](auto __i) constexpr { return __i < __size ? 
-1 : 0; }); + return __r; + } + } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static constexpr __intrinsic_type_t<_Tp, + _S_size<_Tp>> + _S_implicit_mask_intrin() + { + return __to_intrin( + __vector_bitcast<_Tp>(_S_implicit_mask<_Tp>()._M_data)); + } + + template <typename _TW, typename _TVT = _VectorTraits<_TW>> + _GLIBCXX_SIMD_INTRINSIC static constexpr _TW _S_masked(_TW __x) + { + using _Tp = typename _TVT::value_type; + if constexpr (!_MaskMember<_Tp>::_S_is_partial) + return __x; + else + return __and(__as_vector(__x), + __vector_bitcast<_Tp>(_S_implicit_mask<_Tp>())); + } + + template <typename _TW, typename _TVT = _VectorTraits<_TW>> + _GLIBCXX_SIMD_INTRINSIC static constexpr auto + __make_padding_nonzero(_TW __x) + { + using _Tp = typename _TVT::value_type; + if constexpr (!_S_is_partial<_Tp>) + return __x; + else + { + _GLIBCXX_SIMD_USE_CONSTEXPR auto __implicit_mask + = __vector_bitcast<_Tp>(_S_implicit_mask<_Tp>()); + if constexpr (is_integral_v<_Tp>) + return __or(__x, ~__implicit_mask); + else + { + _GLIBCXX_SIMD_USE_CONSTEXPR auto __one + = __andnot(__implicit_mask, + __vector_broadcast<_S_full_size<_Tp>>(_Tp(1))); + // it's not enough to return `x | 1_in_padding` because the + // padding in x might be inf or nan (independent of + // __FINITE_MATH_ONLY__, because it's about padding bits) + return __or(__and(__x, __implicit_mask), __one); + } + } + } + // }}} + }; + +// }}} +// simd_abi::_VecBltnBtmsk {{{ +template <int _UsedBytes> + struct simd_abi::_VecBltnBtmsk + { + template <typename _Tp> + static constexpr size_t _S_size = _UsedBytes / sizeof(_Tp); + + // validity traits {{{ + struct _IsValidAbiTag : __bool_constant<(_UsedBytes > 1)> {}; + + template <typename _Tp> + struct _IsValidSizeFor + : __bool_constant<(_UsedBytes / sizeof(_Tp) > 1 + && _UsedBytes % sizeof(_Tp) == 0 && _UsedBytes <= 64 + && (_UsedBytes > 32 || __have_avx512vl))> {}; + + // Bitmasks require at least AVX512F. If sizeof(_Tp) < 4, AVX512BW is also + // required. 
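+ // For example (editorial illustration): with -mavx512f alone,
+ // _VecBltnBtmsk<64> is usable for int, float, and double lanes (their
+ // compares produce __mmask16 or __mmask8 values), whereas short and char
+ // lanes additionally need -mavx512bw for the __mmask32 and __mmask64
+ // operations.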
+ template <typename _Tp> + struct _IsValid + : conjunction< + _IsValidAbiTag, __bool_constant<__have_avx512f>, + __bool_constant<__have_avx512bw || (sizeof(_Tp) >= 4)>, + __bool_constant<(__vectorized_sizeof<_Tp>() > sizeof(_Tp))>, + _IsValidSizeFor<_Tp>> {}; + + template <typename _Tp> + static constexpr bool _S_is_valid_v = _IsValid<_Tp>::value; + + // }}} + // simd/_MaskImpl {{{ + #if _GLIBCXX_SIMD_X86INTRIN + using _CommonImpl = _CommonImplX86; + using _SimdImpl = _SimdImplX86<_VecBltnBtmsk<_UsedBytes>>; + using _MaskImpl = _MaskImplX86<_VecBltnBtmsk<_UsedBytes>>; + #else + template <int> + struct _MissingImpl; + + using _CommonImpl = _MissingImpl<_UsedBytes>; + using _SimdImpl = _MissingImpl<_UsedBytes>; + using _MaskImpl = _MissingImpl<_UsedBytes>; + #endif + + // }}} + // __traits {{{ + template <typename _Tp> + using _MaskMember = _SimdWrapper<bool, _S_size<_Tp>>; + + template <typename _Tp> + using __traits = conditional_t< + _S_is_valid_v<_Tp>, + _GnuTraits<_Tp, bool, _VecBltnBtmsk<_UsedBytes>, _S_size<_Tp>>, + _InvalidTraits>; + + //}}} + // size metadata {{{ + template <typename _Tp> + static constexpr size_t _S_full_size = __traits<_Tp>::_S_full_size; + template <typename _Tp> + static constexpr bool _S_is_partial = __traits<_Tp>::_S_is_partial; + + // }}} + // implicit mask {{{ + private: + template <typename _Tp> + using _ImplicitMask = _SimdWrapper<bool, _S_size<_Tp>>; + + public: + template <size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr __bool_storage_member_type_t<_Np> + __implicit_mask_n() + { + using _Tp = __bool_storage_member_type_t<_Np>; + return _Np < sizeof(_Tp) * __CHAR_BIT__ ? _Tp((1ULL << _Np) - 1) : ~_Tp(); + } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static constexpr _ImplicitMask<_Tp> + _S_implicit_mask() + { return __implicit_mask_n<_S_size<_Tp>>(); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static constexpr __bool_storage_member_type_t< + _S_size<_Tp>> + _S_implicit_mask_intrin() + { return __implicit_mask_n<_S_size<_Tp>>(); } + + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np> + _S_masked(_SimdWrapper<_Tp, _Np> __x) + { + if constexpr (is_same_v<_Tp, bool>) + if constexpr (_Np < 8 || (_Np & (_Np - 1)) != 0) + return _MaskImpl::_S_bit_and( + __x, _SimdWrapper<_Tp, _Np>( + __bool_storage_member_type_t<_Np>((1ULL << _Np) - 1))); + else + return __x; + else + return _S_masked(__x._M_data); + } + + template <typename _TV> + _GLIBCXX_SIMD_INTRINSIC static constexpr _TV + _S_masked(_TV __x) + { + using _Tp = typename _VectorTraits<_TV>::value_type; + static_assert( + !__is_bitmask_v<_TV>, + "_VecBltnBtmsk::_S_masked cannot work on bitmasks, since it doesn't " + "know the number of elements. 
Use _SimdWrapper<bool, N> instead."); + if constexpr (_S_is_partial<_Tp>) + { + constexpr size_t _Np = _S_size<_Tp>; + return __make_dependent_t<_TV, _CommonImpl>::_S_blend( + _S_implicit_mask<_Tp>(), _SimdWrapper<_Tp, _Np>(), + _SimdWrapper<_Tp, _Np>(__x)); + } + else + return __x; + } + + template <typename _TV, typename _TVT = _VectorTraits<_TV>> + _GLIBCXX_SIMD_INTRINSIC static constexpr auto + __make_padding_nonzero(_TV __x) + { + using _Tp = typename _TVT::value_type; + if constexpr (!_S_is_partial<_Tp>) + return __x; + else + { + constexpr size_t _Np = _S_size<_Tp>; + if constexpr (is_integral_v<typename _TVT::value_type>) + return __x + | __generate_vector<_Tp, _S_full_size<_Tp>>( + [](auto __i) -> _Tp { + if (__i < _Np) + return 0; + else + return 1; + }); + else + return __make_dependent_t<_TV, _CommonImpl>::_S_blend( + _S_implicit_mask<_Tp>(), + _SimdWrapper<_Tp, _Np>( + __vector_broadcast<_S_full_size<_Tp>>(_Tp(1))), + _SimdWrapper<_Tp, _Np>(__x)) + ._M_data; + } + } + + // }}} + }; + +//}}} +// _CommonImplBuiltin {{{ +struct _CommonImplBuiltin +{ + // _S_converts_via_decomposition{{{ + // This lists all cases where a __vector_convert needs to fall back to + // conversion of individual scalars (i.e. decompose the input vector into + // scalars, convert, compose output vector). In those cases, _S_masked_load & + // _S_masked_store prefer to use the _S_bit_iteration implementation. + template <typename _From, typename _To, size_t _ToSize> + static inline constexpr bool __converts_via_decomposition_v + = sizeof(_From) != sizeof(_To); + + // }}} + // _S_load{{{ + template <typename _Tp, size_t _Np, size_t _Bytes = _Np * sizeof(_Tp)> + _GLIBCXX_SIMD_INTRINSIC static __vector_type_t<_Tp, _Np> + _S_load(const void* __p) + { + static_assert(_Np > 1); + static_assert(_Bytes % sizeof(_Tp) == 0); + using _Rp = __vector_type_t<_Tp, _Np>; + if constexpr (sizeof(_Rp) == _Bytes) + { + _Rp __r; + __builtin_memcpy(&__r, __p, _Bytes); + return __r; + } + else + { +#ifdef _GLIBCXX_SIMD_WORKAROUND_PR90424 + using _Up = conditional_t< + is_integral_v<_Tp>, + conditional_t<_Bytes % 4 == 0, + conditional_t<_Bytes % 8 == 0, long long, int>, + conditional_t<_Bytes % 2 == 0, short, signed char>>, + conditional_t<(_Bytes < 8 || _Np % 2 == 1 || _Np == 2), _Tp, + double>>; + using _V = __vector_type_t<_Up, _Np * sizeof(_Tp) / sizeof(_Up)>; + if constexpr (sizeof(_V) != sizeof(_Rp)) + { // on i386 with 4 < _Bytes <= 8 + _Rp __r{}; + __builtin_memcpy(&__r, __p, _Bytes); + return __r; + } + else +#else // _GLIBCXX_SIMD_WORKAROUND_PR90424 + using _V = _Rp; +#endif // _GLIBCXX_SIMD_WORKAROUND_PR90424 + { + _V __r{}; + static_assert(_Bytes <= sizeof(_V)); + __builtin_memcpy(&__r, __p, _Bytes); + return reinterpret_cast<_Rp>(__r); + } + } + } + + // }}} + // _S_store {{{ + template <size_t _ReqBytes = 0, typename _TV> + _GLIBCXX_SIMD_INTRINSIC static void _S_store(_TV __x, void* __addr) + { + constexpr size_t _Bytes = _ReqBytes == 0 ? 
sizeof(__x) : _ReqBytes; + static_assert(sizeof(__x) >= _Bytes); + + if constexpr (__is_vector_type_v<_TV>) + { + using _Tp = typename _VectorTraits<_TV>::value_type; + constexpr size_t _Np = _Bytes / sizeof(_Tp); + static_assert(_Np * sizeof(_Tp) == _Bytes); + +#ifdef _GLIBCXX_SIMD_WORKAROUND_PR90424 + using _Up = conditional_t< + (is_integral_v<_Tp> || _Bytes < 4), + conditional_t<(sizeof(__x) > sizeof(long long)), long long, _Tp>, + float>; + const auto __v = __vector_bitcast<_Up>(__x); +#else // _GLIBCXX_SIMD_WORKAROUND_PR90424 + const __vector_type_t<_Tp, _Np> __v = __x; +#endif // _GLIBCXX_SIMD_WORKAROUND_PR90424 + + if constexpr ((_Bytes & (_Bytes - 1)) != 0) + { + constexpr size_t _MoreBytes = std::__bit_ceil(_Bytes); + alignas(decltype(__v)) char __tmp[_MoreBytes]; + __builtin_memcpy(__tmp, &__v, _MoreBytes); + __builtin_memcpy(__addr, __tmp, _Bytes); + } + else + __builtin_memcpy(__addr, &__v, _Bytes); + } + else + __builtin_memcpy(__addr, &__x, _Bytes); + } + + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static void _S_store(_SimdWrapper<_Tp, _Np> __x, + void* __addr) + { _S_store<_Np * sizeof(_Tp)>(__x._M_data, __addr); } + + // }}} + // _S_store_bool_array(_BitMask) {{{ + template <size_t _Np, bool _Sanitized> + _GLIBCXX_SIMD_INTRINSIC static constexpr void + _S_store_bool_array(_BitMask<_Np, _Sanitized> __x, bool* __mem) + { + if constexpr (_Np == 1) + __mem[0] = __x[0]; + else if constexpr (_Np == 2) + { + short __bool2 = (__x._M_to_bits() * 0x81) & 0x0101; + _S_store<_Np>(__bool2, __mem); + } + else if constexpr (_Np == 3) + { + int __bool3 = (__x._M_to_bits() * 0x4081) & 0x010101; + _S_store<_Np>(__bool3, __mem); + } + else + { + __execute_n_times<__div_roundup(_Np, 4)>([&](auto __i) { + constexpr int __offset = __i * 4; + constexpr int __remaining = _Np - __offset; + if constexpr (__remaining > 4 && __remaining <= 7) + { + const _ULLong __bool7 + = (__x.template _M_extract<__offset>()._M_to_bits() + * 0x40810204081ULL) + & 0x0101010101010101ULL; + _S_store<__remaining>(__bool7, __mem + __offset); + } + else if constexpr (__remaining >= 4) + { + int __bits = __x.template _M_extract<__offset>()._M_to_bits(); + if constexpr (__remaining > 7) + __bits &= 0xf; + const int __bool4 = (__bits * 0x204081) & 0x01010101; + _S_store<4>(__bool4, __mem + __offset); + } + }); + } + } + + // }}} + // _S_blend{{{ + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr auto + _S_blend(_SimdWrapper<__int_for_sizeof_t<_Tp>, _Np> __k, + _SimdWrapper<_Tp, _Np> __at0, _SimdWrapper<_Tp, _Np> __at1) + { return __k._M_data ? 
__at1._M_data : __at0._M_data; } + + // }}} +}; + +// }}} +// _SimdImplBuiltin {{{1 +template <typename _Abi> + struct _SimdImplBuiltin + { + // member types {{{2 + template <typename _Tp> + static constexpr size_t _S_max_store_size = 16; + + using abi_type = _Abi; + + template <typename _Tp> + using _TypeTag = _Tp*; + + template <typename _Tp> + using _SimdMember = typename _Abi::template __traits<_Tp>::_SimdMember; + + template <typename _Tp> + using _MaskMember = typename _Abi::template _MaskMember<_Tp>; + + template <typename _Tp> + static constexpr size_t _S_size = _Abi::template _S_size<_Tp>; + + template <typename _Tp> + static constexpr size_t _S_full_size = _Abi::template _S_full_size<_Tp>; + + using _CommonImpl = typename _Abi::_CommonImpl; + using _SuperImpl = typename _Abi::_SimdImpl; + using _MaskImpl = typename _Abi::_MaskImpl; + + // _M_make_simd(_SimdWrapper/__intrinsic_type_t) {{{2 + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static simd<_Tp, _Abi> + _M_make_simd(_SimdWrapper<_Tp, _Np> __x) + { return {__private_init, __x}; } + + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static simd<_Tp, _Abi> + _M_make_simd(__intrinsic_type_t<_Tp, _Np> __x) + { return {__private_init, __vector_bitcast<_Tp>(__x)}; } + + // _S_broadcast {{{2 + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdMember<_Tp> + _S_broadcast(_Tp __x) noexcept + { return __vector_broadcast<_S_full_size<_Tp>>(__x); } + + // _S_generator {{{2 + template <typename _Fp, typename _Tp> + inline static constexpr _SimdMember<_Tp> _S_generator(_Fp&& __gen, + _TypeTag<_Tp>) + { + return __generate_vector<_Tp, _S_full_size<_Tp>>([&]( + auto __i) constexpr { + if constexpr (__i < _S_size<_Tp>) + return __gen(__i); + else + return 0; + }); + } + + // _S_load {{{2 + template <typename _Tp, typename _Up> + _GLIBCXX_SIMD_INTRINSIC static _SimdMember<_Tp> + _S_load(const _Up* __mem, _TypeTag<_Tp>) noexcept + { + constexpr size_t _Np = _S_size<_Tp>; + constexpr size_t __max_load_size + = (sizeof(_Up) >= 4 && __have_avx512f) || __have_avx512bw ? 64 + : (is_floating_point_v<_Up> && __have_avx) || __have_avx2 ? 32 + : 16; + constexpr size_t __bytes_to_load = sizeof(_Up) * _Np; + if constexpr (sizeof(_Up) > 8) + return __generate_vector<_Tp, _SimdMember<_Tp>::_S_full_size>([&]( + auto __i) constexpr { + return static_cast<_Tp>(__i < _Np ? __mem[__i] : 0); + }); + else if constexpr (is_same_v<_Up, _Tp>) + return _CommonImpl::template _S_load<_Tp, _S_full_size<_Tp>, + _Np * sizeof(_Tp)>(__mem); + else if constexpr (__bytes_to_load <= __max_load_size) + return __convert<_SimdMember<_Tp>>( + _CommonImpl::template _S_load<_Up, _Np>(__mem)); + else if constexpr (__bytes_to_load % __max_load_size == 0) + { + constexpr size_t __n_loads = __bytes_to_load / __max_load_size; + constexpr size_t __elements_per_load = _Np / __n_loads; + return __call_with_n_evaluations<__n_loads>( + [](auto... __uncvted) { + return __convert<_SimdMember<_Tp>>(__uncvted...); + }, + [&](auto __i) { + return _CommonImpl::template _S_load<_Up, __elements_per_load>( + __mem + __i * __elements_per_load); + }); + } + else if constexpr (__bytes_to_load % (__max_load_size / 2) == 0 + && __max_load_size > 16) + { // e.g. int[] -> <char, 12> with AVX2 + constexpr size_t __n_loads + = __bytes_to_load / (__max_load_size / 2); + constexpr size_t __elements_per_load = _Np / __n_loads; + return __call_with_n_evaluations<__n_loads>( + [](auto... 
__uncvted) { + return __convert<_SimdMember<_Tp>>(__uncvted...); + }, + [&](auto __i) { + return _CommonImpl::template _S_load<_Up, __elements_per_load>( + __mem + __i * __elements_per_load); + }); + } + else // e.g. int[] -> <char, 9> + return __call_with_subscripts( + __mem, make_index_sequence<_Np>(), [](auto... __args) { + return __vector_type_t<_Tp, _S_full_size<_Tp>>{ + static_cast<_Tp>(__args)...}; + }); + } + + // _S_masked_load {{{2 + template <typename _Tp, size_t _Np, typename _Up> + static inline _SimdWrapper<_Tp, _Np> + _S_masked_load(_SimdWrapper<_Tp, _Np> __merge, _MaskMember<_Tp> __k, + const _Up* __mem) noexcept + { + _BitOps::_S_bit_iteration(_MaskImpl::_S_to_bits(__k), [&](auto __i) { + __merge._M_set(__i, static_cast<_Tp>(__mem[__i])); + }); + return __merge; + } + + // _S_store {{{2 + template <typename _Tp, typename _Up> + _GLIBCXX_SIMD_INTRINSIC static void + _S_store(_SimdMember<_Tp> __v, _Up* __mem, _TypeTag<_Tp>) noexcept + { + // TODO: converting int -> "smaller int" can be optimized with AVX512 + constexpr size_t _Np = _S_size<_Tp>; + constexpr size_t __max_store_size + = _SuperImpl::template _S_max_store_size<_Up>; + if constexpr (sizeof(_Up) > 8) + __execute_n_times<_Np>([&](auto __i) constexpr { + __mem[__i] = __v[__i]; + }); + else if constexpr (is_same_v<_Up, _Tp>) + _CommonImpl::_S_store(__v, __mem); + else if constexpr (sizeof(_Up) * _Np <= __max_store_size) + _CommonImpl::_S_store(_SimdWrapper<_Up, _Np>(__convert<_Up>(__v)), + __mem); + else + { + constexpr size_t __vsize = __max_store_size / sizeof(_Up); + // round up to convert the last partial vector as well: + constexpr size_t __stores = __div_roundup(_Np, __vsize); + constexpr size_t __full_stores = _Np / __vsize; + using _V = __vector_type_t<_Up, __vsize>; + const array<_V, __stores> __converted + = __convert_all<_V, __stores>(__v); + __execute_n_times<__full_stores>([&](auto __i) constexpr { + _CommonImpl::_S_store(__converted[__i], __mem + __i * __vsize); + }); + if constexpr (__full_stores < __stores) + _CommonImpl::template _S_store<(_Np - __full_stores * __vsize) + * sizeof(_Up)>( + __converted[__full_stores], __mem + __full_stores * __vsize); + } + } + + // _S_masked_store_nocvt {{{2 + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static void + _S_masked_store_nocvt(_SimdWrapper<_Tp, _Np> __v, _Tp* __mem, + _MaskMember<_Tp> __k) + { + _BitOps::_S_bit_iteration( + _MaskImpl::_S_to_bits(__k), [&](auto __i) constexpr { + __mem[__i] = __v[__i]; + }); + } + + // _S_masked_store {{{2 + template <typename _TW, typename _TVT = _VectorTraits<_TW>, + typename _Tp = typename _TVT::value_type, typename _Up> + static inline void + _S_masked_store(const _TW __v, _Up* __mem, const _MaskMember<_Tp> __k) + noexcept + { + constexpr size_t _TV_size = _S_size<_Tp>; + [[maybe_unused]] const auto __vi = __to_intrin(__v); + constexpr size_t __max_store_size + = _SuperImpl::template _S_max_store_size<_Up>; + if constexpr ( + is_same_v< + _Tp, + _Up> || (is_integral_v<_Tp> && is_integral_v<_Up> && sizeof(_Tp) == sizeof(_Up))) + { + // bitwise or no conversion, reinterpret: + const _MaskMember<_Up> __kk = [&]() { + if constexpr (__is_bitmask_v<decltype(__k)>) + return _MaskMember<_Up>(__k._M_data); + else + return __wrapper_bitcast<__int_for_sizeof_t<_Up>>(__k); + }(); + _SuperImpl::_S_masked_store_nocvt(__wrapper_bitcast<_Up>(__v), + __mem, __kk); + } + else if constexpr (__vectorized_sizeof<_Up>() > sizeof(_Up) + && !_CommonImpl:: + template __converts_via_decomposition_v< + _Tp, _Up, 
__max_store_size>) + { // conversion via decomposition is better handled via the + // bit_iteration fallback below + constexpr size_t _UW_size + = std::min(_TV_size, __max_store_size / sizeof(_Up)); + static_assert(_UW_size <= _TV_size); + using _UW = _SimdWrapper<_Up, _UW_size>; + using _UV = __vector_type_t<_Up, _UW_size>; + using _UAbi = simd_abi::deduce_t<_Up, _UW_size>; + if constexpr (_UW_size == _TV_size) // one convert+store + { + const _UW __converted = __convert<_UW>(__v); + _SuperImpl::_S_masked_store_nocvt( + __converted, __mem, + _UAbi::_MaskImpl::template _S_convert< + __int_for_sizeof_t<_Up>>(__k)); + } + else + { + static_assert(_UW_size * sizeof(_Up) == __max_store_size); + constexpr size_t _NFullStores = _TV_size / _UW_size; + constexpr size_t _NAllStores + = __div_roundup(_TV_size, _UW_size); + constexpr size_t _NParts = _S_full_size<_Tp> / _UW_size; + const array<_UV, _NAllStores> __converted + = __convert_all<_UV, _NAllStores>(__v); + __execute_n_times<_NFullStores>([&](auto __i) { + _SuperImpl::_S_masked_store_nocvt( + _UW(__converted[__i]), __mem + __i * _UW_size, + _UAbi::_MaskImpl::template _S_convert< + __int_for_sizeof_t<_Up>>( + __extract_part<__i, _NParts>(__k.__as_full_vector()))); + }); + if constexpr (_NAllStores + > _NFullStores) // one partial at the end + _SuperImpl::_S_masked_store_nocvt( + _UW(__converted[_NFullStores]), + __mem + _NFullStores * _UW_size, + _UAbi::_MaskImpl::template _S_convert< + __int_for_sizeof_t<_Up>>( + __extract_part<_NFullStores, _NParts>( + __k.__as_full_vector()))); + } + } + else + _BitOps::_S_bit_iteration( + _MaskImpl::_S_to_bits(__k), [&](auto __i) constexpr { + __mem[__i] = static_cast<_Up>(__v[__i]); + }); + } + + // _S_complement {{{2 + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np> + _S_complement(_SimdWrapper<_Tp, _Np> __x) noexcept + { return ~__x._M_data; } + + // _S_unary_minus {{{2 + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np> + _S_unary_minus(_SimdWrapper<_Tp, _Np> __x) noexcept + { + // GCC doesn't use the psign instructions, but pxor & psub seem to be + // just as good a choice as pcmpeqd & psign. So meh.
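// (Editorial sketch, not part of the patch: the negation below can be
// observed in isolation with GNU vector extensions; assuming x86 with SSE2,
// GCC typically lowers it to pxor, to materialize a zero, plus psub:
//   typedef int _V4int __attribute__((vector_size(16)));
//   _V4int __negate(_V4int __v) { return -__v; }
// )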
+ return -__x._M_data; + } + + // arithmetic operators {{{2 + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np> + _S_plus(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y) + { return __x._M_data + __y._M_data; } + + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np> + _S_minus(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y) + { return __x._M_data - __y._M_data; } + + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np> + _S_multiplies(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y) + { return __x._M_data * __y._M_data; } + + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np> + _S_divides(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y) + { + // Note that division by 0 is always UB, so we must ensure we avoid the + // case for partial registers + if constexpr (!_Abi::template _S_is_partial<_Tp>) + return __x._M_data / __y._M_data; + else + return __x._M_data / _Abi::__make_padding_nonzero(__y._M_data); + } + + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np> + _S_modulus(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y) + { + if constexpr (!_Abi::template _S_is_partial<_Tp>) + return __x._M_data % __y._M_data; + else + return __as_vector(__x) + % _Abi::__make_padding_nonzero(__as_vector(__y)); + } + + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np> + _S_bit_and(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y) + { return __and(__x, __y); } + + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np> + _S_bit_or(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y) + { return __or(__x, __y); } + + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np> + _S_bit_xor(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y) + { return __xor(__x, __y); } + + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static _SimdWrapper<_Tp, _Np> + _S_bit_shift_left(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y) + { return __x._M_data << __y._M_data; } + + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static _SimdWrapper<_Tp, _Np> + _S_bit_shift_right(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y) + { return __x._M_data >> __y._M_data; } + + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np> + _S_bit_shift_left(_SimdWrapper<_Tp, _Np> __x, int __y) + { return __x._M_data << __y; } + + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np> + _S_bit_shift_right(_SimdWrapper<_Tp, _Np> __x, int __y) + { return __x._M_data >> __y; } + + // compares {{{2 + // _S_equal_to {{{3 + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp> + _S_equal_to(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y) + { return __x._M_data == __y._M_data; } + + // _S_not_equal_to {{{3 + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp> + _S_not_equal_to(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y) + { return __x._M_data != __y._M_data; } + + // _S_less {{{3 + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp> + 
_S_less(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y) + { return __x._M_data < __y._M_data; } + + // _S_less_equal {{{3 + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp> + _S_less_equal(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y) + { return __x._M_data <= __y._M_data; } + + // _S_negate {{{2 + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp> + _S_negate(_SimdWrapper<_Tp, _Np> __x) noexcept + { return !__x._M_data; } + + // _S_min, _S_max, _S_minmax {{{2 + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_NORMAL_MATH _GLIBCXX_SIMD_INTRINSIC static constexpr + _SimdWrapper<_Tp, _Np> + _S_min(_SimdWrapper<_Tp, _Np> __a, _SimdWrapper<_Tp, _Np> __b) + { return __a._M_data < __b._M_data ? __a._M_data : __b._M_data; } + + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_NORMAL_MATH _GLIBCXX_SIMD_INTRINSIC static constexpr + _SimdWrapper<_Tp, _Np> + _S_max(_SimdWrapper<_Tp, _Np> __a, _SimdWrapper<_Tp, _Np> __b) + { return __a._M_data > __b._M_data ? __a._M_data : __b._M_data; } + + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_NORMAL_MATH _GLIBCXX_SIMD_INTRINSIC static constexpr + pair<_SimdWrapper<_Tp, _Np>, _SimdWrapper<_Tp, _Np>> + _S_minmax(_SimdWrapper<_Tp, _Np> __a, _SimdWrapper<_Tp, _Np> __b) + { + return {__a._M_data < __b._M_data ? __a._M_data : __b._M_data, + __a._M_data < __b._M_data ? __b._M_data : __a._M_data}; + } + + // reductions {{{2 + template <size_t _Np, size_t... _Is, size_t... _Zeros, typename _Tp, + typename _BinaryOperation> + _GLIBCXX_SIMD_INTRINSIC static _Tp + _S_reduce_partial(index_sequence<_Is...>, index_sequence<_Zeros...>, + simd<_Tp, _Abi> __x, _BinaryOperation&& __binary_op) + { + using _V = __vector_type_t<_Tp, _Np / 2>; + static_assert(sizeof(_V) <= sizeof(__x)); + // _S_full_size is the size of the smallest native SIMD register that + // can store _Np/2 elements: + using _FullSimd = __deduced_simd<_Tp, _VectorTraits<_V>::_S_full_size>; + using _HalfSimd = __deduced_simd<_Tp, _Np / 2>; + const auto __xx = __as_vector(__x); + return _HalfSimd::abi_type::_SimdImpl::_S_reduce( + static_cast<_HalfSimd>(__as_vector(__binary_op( + static_cast<_FullSimd>(__intrin_bitcast<_V>(__xx)), + static_cast<_FullSimd>(__intrin_bitcast<_V>( + __vector_permute<(_Np / 2 + _Is)..., (int(_Zeros * 0) - 1)...>( + __xx)))))), + __binary_op); + } + + template <typename _Tp, typename _BinaryOperation> + _GLIBCXX_SIMD_INTRINSIC static constexpr _Tp + _S_reduce(simd<_Tp, _Abi> __x, _BinaryOperation&& __binary_op) + { + constexpr size_t _Np = simd_size_v<_Tp, _Abi>; + if constexpr (_Np == 1) + return __x[0]; + else if constexpr (_Np == 2) + return __binary_op(simd<_Tp, simd_abi::scalar>(__x[0]), + simd<_Tp, simd_abi::scalar>(__x[1]))[0]; + else if constexpr (_Abi::template _S_is_partial<_Tp>) //{{{ + { + [[maybe_unused]] constexpr auto __full_size + = _Abi::template _S_full_size<_Tp>; + if constexpr (_Np == 3) + return __binary_op( + __binary_op(simd<_Tp, simd_abi::scalar>(__x[0]), + simd<_Tp, simd_abi::scalar>(__x[1])), + simd<_Tp, simd_abi::scalar>(__x[2]))[0]; + else if constexpr (is_same_v<__remove_cvref_t<_BinaryOperation>, + plus<>>) + { + using _Ap = simd_abi::deduce_t<_Tp, __full_size>; + return _Ap::_SimdImpl::_S_reduce( + simd<_Tp, _Ap>(__private_init, + _Abi::_S_masked(__as_vector(__x))), + __binary_op); + } + else if constexpr (is_same_v<__remove_cvref_t<_BinaryOperation>, + multiplies<>>) + { + using _Ap = simd_abi::deduce_t<_Tp, __full_size>; + using 
_TW = _SimdWrapper<_Tp, __full_size>; + _GLIBCXX_SIMD_USE_CONSTEXPR auto __implicit_mask_full + = _Abi::template _S_implicit_mask<_Tp>().__as_full_vector(); + _GLIBCXX_SIMD_USE_CONSTEXPR _TW __one + = __vector_broadcast<__full_size>(_Tp(1)); + const _TW __x_full = __data(__x).__as_full_vector(); + const _TW __x_padded_with_ones + = _Ap::_CommonImpl::_S_blend(__implicit_mask_full, __one, + __x_full); + return _Ap::_SimdImpl::_S_reduce( + simd<_Tp, _Ap>(__private_init, __x_padded_with_ones), + __binary_op); + } + else if constexpr (_Np & 1) + { + using _Ap = simd_abi::deduce_t<_Tp, _Np - 1>; + return __binary_op( + simd<_Tp, simd_abi::scalar>(_Ap::_SimdImpl::_S_reduce( + simd<_Tp, _Ap>( + __intrin_bitcast<__vector_type_t<_Tp, _Np - 1>>( + __as_vector(__x))), + __binary_op)), + simd<_Tp, simd_abi::scalar>(__x[_Np - 1]))[0]; + } + else + return _S_reduce_partial<_Np>( + make_index_sequence<_Np / 2>(), + make_index_sequence<__full_size - _Np / 2>(), __x, __binary_op); + } //}}} + else if constexpr (sizeof(__x) == 16) //{{{ + { + if constexpr (_Np == 16) + { + const auto __y = __data(__x); + __x = __binary_op( + _M_make_simd<_Tp, _Np>( + __vector_permute<0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, + 7, 7>(__y)), + _M_make_simd<_Tp, _Np>( + __vector_permute<8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, + 14, 14, 15, 15>(__y))); + } + if constexpr (_Np >= 8) + { + const auto __y = __vector_bitcast<short>(__data(__x)); + __x = __binary_op( + _M_make_simd<_Tp, _Np>(__vector_bitcast<_Tp>( + __vector_permute<0, 0, 1, 1, 2, 2, 3, 3>(__y))), + _M_make_simd<_Tp, _Np>(__vector_bitcast<_Tp>( + __vector_permute<4, 4, 5, 5, 6, 6, 7, 7>(__y)))); + } + if constexpr (_Np >= 4) + { + using _Up = conditional_t<is_floating_point_v<_Tp>, float, int>; + const auto __y = __vector_bitcast<_Up>(__data(__x)); + __x = __binary_op(__x, + _M_make_simd<_Tp, _Np>(__vector_bitcast<_Tp>( + __vector_permute<3, 2, 1, 0>(__y)))); + } + using _Up = conditional_t<is_floating_point_v<_Tp>, double, _LLong>; + const auto __y = __vector_bitcast<_Up>(__data(__x)); + __x = __binary_op(__x, _M_make_simd<_Tp, _Np>(__vector_bitcast<_Tp>( + __vector_permute<1, 1>(__y)))); + return __x[0]; + } //}}} + else + { + static_assert(sizeof(__x) > __min_vector_size<_Tp>); + static_assert((_Np & (_Np - 1)) == 0); // _Np must be a power of 2 + using _Ap = simd_abi::deduce_t<_Tp, _Np / 2>; + using _V = simd<_Tp, _Ap>; + return _Ap::_SimdImpl::_S_reduce( + __binary_op(_V(__private_init, __extract<0, 2>(__as_vector(__x))), + _V(__private_init, + __extract<1, 2>(__as_vector(__x)))), + static_cast<_BinaryOperation&&>(__binary_op)); + } + } + + // math {{{2 + // frexp, modf and copysign implemented in simd_math.h +#define _GLIBCXX_SIMD_MATH_FALLBACK(__name) \ + template <typename _Tp, typename... _More> \ + static _Tp _S_##__name(const _Tp& __x, const _More&... __more) \ + { \ + return __generate_vector<_Tp>( \ + [&](auto __i) { return __name(__x[__i], __more[__i]...); }); \ + } + +#define _GLIBCXX_SIMD_MATH_FALLBACK_MASKRET(__name) \ + template <typename _Tp, typename... _More> \ + static typename _Tp::mask_type _S_##__name(const _Tp& __x, \ + const _More&... __more) \ + { \ + return __generate_vector<_Tp>( \ + [&](auto __i) { return __name(__x[__i], __more[__i]...); }); \ + } + +#define _GLIBCXX_SIMD_MATH_FALLBACK_FIXEDRET(_RetTp, __name) \ + template <typename _Tp, typename... _More> \ + static auto _S_##__name(const _Tp& __x, const _More&... 
__more) \ + { \ + return __fixed_size_storage_t<_RetTp, \ + _VectorTraits<_Tp>::_S_partial_width>:: \ + _S_generate([&](auto __meta) constexpr { \ + return __meta._S_generator( \ + [&](auto __i) { \ + return __name(__x[__meta._S_offset + __i], \ + __more[__meta._S_offset + __i]...); \ + }, \ + static_cast<_RetTp*>(nullptr)); \ + }); \ + } + + _GLIBCXX_SIMD_MATH_FALLBACK(acos) + _GLIBCXX_SIMD_MATH_FALLBACK(asin) + _GLIBCXX_SIMD_MATH_FALLBACK(atan) + _GLIBCXX_SIMD_MATH_FALLBACK(atan2) + _GLIBCXX_SIMD_MATH_FALLBACK(cos) + _GLIBCXX_SIMD_MATH_FALLBACK(sin) + _GLIBCXX_SIMD_MATH_FALLBACK(tan) + _GLIBCXX_SIMD_MATH_FALLBACK(acosh) + _GLIBCXX_SIMD_MATH_FALLBACK(asinh) + _GLIBCXX_SIMD_MATH_FALLBACK(atanh) + _GLIBCXX_SIMD_MATH_FALLBACK(cosh) + _GLIBCXX_SIMD_MATH_FALLBACK(sinh) + _GLIBCXX_SIMD_MATH_FALLBACK(tanh) + _GLIBCXX_SIMD_MATH_FALLBACK(exp) + _GLIBCXX_SIMD_MATH_FALLBACK(exp2) + _GLIBCXX_SIMD_MATH_FALLBACK(expm1) + _GLIBCXX_SIMD_MATH_FALLBACK(ldexp) + _GLIBCXX_SIMD_MATH_FALLBACK_FIXEDRET(int, ilogb) + _GLIBCXX_SIMD_MATH_FALLBACK(log) + _GLIBCXX_SIMD_MATH_FALLBACK(log10) + _GLIBCXX_SIMD_MATH_FALLBACK(log1p) + _GLIBCXX_SIMD_MATH_FALLBACK(log2) + _GLIBCXX_SIMD_MATH_FALLBACK(logb) + + // modf implemented in simd_math.h + _GLIBCXX_SIMD_MATH_FALLBACK(scalbn) + _GLIBCXX_SIMD_MATH_FALLBACK(scalbln) + _GLIBCXX_SIMD_MATH_FALLBACK(cbrt) + _GLIBCXX_SIMD_MATH_FALLBACK(fabs) + _GLIBCXX_SIMD_MATH_FALLBACK(pow) + _GLIBCXX_SIMD_MATH_FALLBACK(sqrt) + _GLIBCXX_SIMD_MATH_FALLBACK(erf) + _GLIBCXX_SIMD_MATH_FALLBACK(erfc) + _GLIBCXX_SIMD_MATH_FALLBACK(lgamma) + _GLIBCXX_SIMD_MATH_FALLBACK(tgamma) + + _GLIBCXX_SIMD_MATH_FALLBACK_FIXEDRET(long, lrint) + _GLIBCXX_SIMD_MATH_FALLBACK_FIXEDRET(long long, llrint) + + _GLIBCXX_SIMD_MATH_FALLBACK_FIXEDRET(long, lround) + _GLIBCXX_SIMD_MATH_FALLBACK_FIXEDRET(long long, llround) + + _GLIBCXX_SIMD_MATH_FALLBACK(fmod) + _GLIBCXX_SIMD_MATH_FALLBACK(remainder) + + template <typename _Tp, typename _TVT = _VectorTraits<_Tp>> + static _Tp + _S_remquo(const _Tp __x, const _Tp __y, + __fixed_size_storage_t<int, _TVT::_S_partial_width>* __z) + { + return __generate_vector<_Tp>([&](auto __i) { + int __tmp; + auto __r = remquo(__x[__i], __y[__i], &__tmp); + __z->_M_set(__i, __tmp); + return __r; + }); + } + + // copysign in simd_math.h + _GLIBCXX_SIMD_MATH_FALLBACK(nextafter) + _GLIBCXX_SIMD_MATH_FALLBACK(fdim) + _GLIBCXX_SIMD_MATH_FALLBACK(fmax) + _GLIBCXX_SIMD_MATH_FALLBACK(fmin) + _GLIBCXX_SIMD_MATH_FALLBACK(fma) + + template <typename _Tp, size_t _Np> + static constexpr _MaskMember<_Tp> + _S_isgreater(_SimdWrapper<_Tp, _Np> __x, + _SimdWrapper<_Tp, _Np> __y) noexcept + { + using _Ip = __int_for_sizeof_t<_Tp>; + const auto __xn = __vector_bitcast<_Ip>(__x); + const auto __yn = __vector_bitcast<_Ip>(__y); + const auto __xp = __xn < 0 ? -(__xn & __finite_max_v<_Ip>) : __xn; + const auto __yp = __yn < 0 ? -(__yn & __finite_max_v<_Ip>) : __yn; + return __andnot(_SuperImpl::_S_isunordered(__x, __y)._M_data, + __xp > __yp); + } + + template <typename _Tp, size_t _Np> + static constexpr _MaskMember<_Tp> + _S_isgreaterequal(_SimdWrapper<_Tp, _Np> __x, + _SimdWrapper<_Tp, _Np> __y) noexcept + { + using _Ip = __int_for_sizeof_t<_Tp>; + const auto __xn = __vector_bitcast<_Ip>(__x); + const auto __yn = __vector_bitcast<_Ip>(__y); + const auto __xp = __xn < 0 ? -(__xn & __finite_max_v<_Ip>) : __xn; + const auto __yp = __yn < 0 ? 
-(__yn & __finite_max_v<_Ip>) : __yn; + return __andnot(_SuperImpl::_S_isunordered(__x, __y)._M_data, + __xp >= __yp); + } + + template <typename _Tp, size_t _Np> + static constexpr _MaskMember<_Tp> + _S_isless(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y) noexcept + { + using _Ip = __int_for_sizeof_t<_Tp>; + const auto __xn = __vector_bitcast<_Ip>(__x); + const auto __yn = __vector_bitcast<_Ip>(__y); + const auto __xp = __xn < 0 ? -(__xn & __finite_max_v<_Ip>) : __xn; + const auto __yp = __yn < 0 ? -(__yn & __finite_max_v<_Ip>) : __yn; + return __andnot(_SuperImpl::_S_isunordered(__x, __y)._M_data, + __xp < __yp); + } + + template <typename _Tp, size_t _Np> + static constexpr _MaskMember<_Tp> + _S_islessequal(_SimdWrapper<_Tp, _Np> __x, + _SimdWrapper<_Tp, _Np> __y) noexcept + { + using _Ip = __int_for_sizeof_t<_Tp>; + const auto __xn = __vector_bitcast<_Ip>(__x); + const auto __yn = __vector_bitcast<_Ip>(__y); + const auto __xp = __xn < 0 ? -(__xn & __finite_max_v<_Ip>) : __xn; + const auto __yp = __yn < 0 ? -(__yn & __finite_max_v<_Ip>) : __yn; + return __andnot(_SuperImpl::_S_isunordered(__x, __y)._M_data, + __xp <= __yp); + } + + template <typename _Tp, size_t _Np> + static constexpr _MaskMember<_Tp> + _S_islessgreater(_SimdWrapper<_Tp, _Np> __x, + _SimdWrapper<_Tp, _Np> __y) noexcept + { + return __andnot(_SuperImpl::_S_isunordered(__x, __y), + _SuperImpl::_S_not_equal_to(__x, __y)); + } + +#undef _GLIBCXX_SIMD_MATH_FALLBACK +#undef _GLIBCXX_SIMD_MATH_FALLBACK_MASKRET +#undef _GLIBCXX_SIMD_MATH_FALLBACK_FIXEDRET + // _S_abs {{{3 + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static _SimdWrapper<_Tp, _Np> + _S_abs(_SimdWrapper<_Tp, _Np> __x) noexcept + { + // if (__builtin_is_constant_evaluated()) + // { + // return __x._M_data < 0 ? -__x._M_data : __x._M_data; + // } + if constexpr (is_floating_point_v<_Tp>) + // `v < 0 ? -v : v` cannot compile to the efficient implementation of + // masking off the sign bit, because it must consider v == -0 + + // ~(-0.) & v would be easy, but breaks with -fno-signed-zeros + return __and(_S_absmask<__vector_type_t<_Tp, _Np>>, __x._M_data); + else + return __x._M_data < 0 ? -__x._M_data : __x._M_data; + } + + // }}}3 + // _S_plus_minus {{{ + // Returns __x + __y - __y without -fassociative-math optimizing to __x. + // - _TV must be __vector_type_t<floating-point type, N>. + // - _UV must be _TV or floating-point type.
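// (Editorial sketch, not part of the patch: callers such as _S_nearbyint
// below depend on the rounding performed by the intermediate __x + __y, so
// it must not be folded away. A scalar rendition of the same idea, assuming
// float, |x| < 2^23, and round-to-nearest, with volatile standing in for
// the asm barrier used below:
//   float __nearbyint_sketch(float __x)
//   {
//     const float __shifter = 0x1p23f; // 2^23: ulp becomes 1 after the add
//     const float __s = __x < 0 ? -__shifter : __shifter;
//     volatile float __t = __x + __s;  // rounds __x to an integer value
//     return __t - __s;                // volatile keeps (x+s)-s from folding to x
//   }
// )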
+ template <typename _TV, typename _UV> + _GLIBCXX_SIMD_INTRINSIC static constexpr _TV _S_plus_minus(_TV __x, + _UV __y) noexcept + { + #if defined __i386__ && !defined __SSE_MATH__ + if constexpr (sizeof(__x) == 8) + { // operations on __x would use the FPU + static_assert(is_same_v<_TV, __vector_type_t<float, 2>>); + const auto __x4 = __vector_bitcast<float, 4>(__x); + if constexpr (is_same_v<_TV, _UV>) + return __vector_bitcast<float, 2>( + _S_plus_minus(__x4, __vector_bitcast<float, 4>(__y))); + else + return __vector_bitcast<float, 2>(_S_plus_minus(__x4, __y)); + } + #endif + #if !defined __clang__ && __GCC_IEC_559 == 0 + if (__builtin_is_constant_evaluated() + || (__builtin_constant_p(__x) && __builtin_constant_p(__y))) + return (__x + __y) - __y; + else + return [&] { + __x += __y; + if constexpr(__have_sse) + { + if constexpr (sizeof(__x) >= 16) + asm("" : "+x"(__x)); + else if constexpr (is_same_v<__vector_type_t<float, 2>, _TV>) + asm("" : "+x"(__x[0]), "+x"(__x[1])); + else + __assert_unreachable<_TV>(); + } + else if constexpr(__have_neon) + asm("" : "+w"(__x)); + else if constexpr (__have_power_vmx) + { + if constexpr (is_same_v<__vector_type_t<float, 2>, _TV>) + asm("" : "+fgr"(__x[0]), "+fgr"(__x[1])); + else + asm("" : "+v"(__x)); + } + else + asm("" : "+g"(__x)); + return __x - __y; + }(); + #else + return (__x + __y) - __y; + #endif + } + + // }}} + // _S_nearbyint {{{3 + template <typename _Tp, typename _TVT = _VectorTraits<_Tp>> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_nearbyint(_Tp __x_) noexcept + { + using value_type = typename _TVT::value_type; + using _V = typename _TVT::type; + const _V __x = __x_; + const _V __absx = __and(__x, _S_absmask<_V>); + static_assert(__CHAR_BIT__ * sizeof(1ull) >= __digits_v<value_type>); + _GLIBCXX_SIMD_USE_CONSTEXPR _V __shifter_abs + = _V() + (1ull << (__digits_v<value_type> - 1)); + const _V __shifter = __or(__and(_S_signmask<_V>, __x), __shifter_abs); + const _V __shifted = _S_plus_minus(__x, __shifter); + return __absx < __shifter_abs ? __shifted : __x; + } + + // _S_rint {{{3 + template <typename _Tp, typename _TVT = _VectorTraits<_Tp>> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_rint(_Tp __x) noexcept + { + return _SuperImpl::_S_nearbyint(__x); + } + + // _S_trunc {{{3 + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static _SimdWrapper<_Tp, _Np> + _S_trunc(_SimdWrapper<_Tp, _Np> __x) + { + using _V = __vector_type_t<_Tp, _Np>; + const _V __absx = __and(__x._M_data, _S_absmask<_V>); + static_assert(__CHAR_BIT__ * sizeof(1ull) >= __digits_v<_Tp>); + constexpr _Tp __shifter = 1ull << (__digits_v<_Tp> - 1); + _V __truncated = _S_plus_minus(__absx, __shifter); + __truncated -= __truncated > __absx ? _V() + 1 : _V(); + return __absx < __shifter ? __or(__xor(__absx, __x._M_data), __truncated) + : __x._M_data; + } + + // _S_round {{{3 + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static _SimdWrapper<_Tp, _Np> + _S_round(_SimdWrapper<_Tp, _Np> __x) + { + const auto __abs_x = _SuperImpl::_S_abs(__x); + const auto __t_abs = _SuperImpl::_S_trunc(__abs_x)._M_data; + const auto __r_abs // round(abs(x)) = + = __t_abs + (__abs_x._M_data - __t_abs >= _Tp(.5) ? 
_Tp(1) : 0); + return __or(__xor(__abs_x._M_data, __x._M_data), __r_abs); + } + + // _S_floor {{{3 + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static _SimdWrapper<_Tp, _Np> + _S_floor(_SimdWrapper<_Tp, _Np> __x) + { + const auto __y = _SuperImpl::_S_trunc(__x)._M_data; + const auto __negative_input + = __vector_bitcast<_Tp>(__x._M_data < __vector_broadcast<_Np, _Tp>(0)); + const auto __mask + = __andnot(__vector_bitcast<_Tp>(__y == __x._M_data), __negative_input); + return __or(__andnot(__mask, __y), + __and(__mask, __y - __vector_broadcast<_Np, _Tp>(1))); + } + + // _S_ceil {{{3 + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static _SimdWrapper<_Tp, _Np> + _S_ceil(_SimdWrapper<_Tp, _Np> __x) + { + const auto __y = _SuperImpl::_S_trunc(__x)._M_data; + const auto __negative_input + = __vector_bitcast<_Tp>(__x._M_data < __vector_broadcast<_Np, _Tp>(0)); + const auto __inv_mask + = __or(__vector_bitcast<_Tp>(__y == __x._M_data), __negative_input); + return __or(__and(__inv_mask, __y), + __andnot(__inv_mask, __y + __vector_broadcast<_Np, _Tp>(1))); + } + + // _S_isnan {{{3 + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static _MaskMember<_Tp> + _S_isnan([[maybe_unused]] _SimdWrapper<_Tp, _Np> __x) + { + #if __FINITE_MATH_ONLY__ + return {}; // false + #elif !defined __SUPPORT_SNAN__ + return ~(__x._M_data == __x._M_data); + #elif defined __STDC_IEC_559__ + using _Ip = __int_for_sizeof_t<_Tp>; + const auto __absn = __vector_bitcast<_Ip>(_SuperImpl::_S_abs(__x)); + const auto __infn + = __vector_bitcast<_Ip>(__vector_broadcast<_Np>(__infinity_v<_Tp>)); + return __infn < __absn; + #else + #error "Not implemented: how to support SNaN but non-IEC559 floating-point?" + #endif + } + + // _S_isfinite {{{3 + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static _MaskMember<_Tp> + _S_isfinite([[maybe_unused]] _SimdWrapper<_Tp, _Np> __x) + { + #if __FINITE_MATH_ONLY__ + using _UV = typename _MaskMember<_Tp>::_BuiltinType; + _GLIBCXX_SIMD_USE_CONSTEXPR _UV __alltrue = ~_UV(); + return __alltrue; + #else + // if all exponent bits are set, __x is either inf or NaN + using _Ip = __int_for_sizeof_t<_Tp>; + const auto __absn = __vector_bitcast<_Ip>(_SuperImpl::_S_abs(__x)); + const auto __maxn + = __vector_bitcast<_Ip>(__vector_broadcast<_Np>(__finite_max_v<_Tp>)); + return __absn <= __maxn; + #endif + } + + // _S_isunordered {{{3 + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static _MaskMember<_Tp> + _S_isunordered(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y) + { + return __or(_S_isnan(__x), _S_isnan(__y)); + } + + // _S_signbit {{{3 + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static _MaskMember<_Tp> + _S_signbit(_SimdWrapper<_Tp, _Np> __x) + { + using _Ip = __int_for_sizeof_t<_Tp>; + return __vector_bitcast<_Ip>(__x) < 0; + // Arithmetic right shift (SRA) would also work (instead of compare), but + // 64-bit SRA isn't available on x86 before AVX512. And in general, + // compares are more likely to be efficient than SRA. 
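// (Editorial sketch, not part of the patch: with GNU vector extensions the
// two formulations are interchangeable for, e.g., 32-bit lanes:
//   __mask = __bits < 0;                                  // compare, as above
//   __mask = __bits >> (sizeof(int) * __CHAR_BIT__ - 1);  // arithmetic shift
// both yield all-ones lanes exactly where the sign bit is set.)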
+ } + + // _S_isinf {{{3 + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static _MaskMember<_Tp> + _S_isinf([[maybe_unused]] _SimdWrapper<_Tp, _Np> __x) + { + #if __FINITE_MATH_ONLY__ + return {}; // false + #else + return _SuperImpl::template _S_equal_to<_Tp, _Np>(_SuperImpl::_S_abs(__x), + __vector_broadcast<_Np>( + __infinity_v<_Tp>)); + // alternative: + // compare to inf using the corresponding integer type + /* + return + __vector_bitcast<_Tp>(__vector_bitcast<__int_for_sizeof_t<_Tp>>( + _S_abs(__x)._M_data) + == + __vector_bitcast<__int_for_sizeof_t<_Tp>>(__vector_broadcast<_Np>( + __infinity_v<_Tp>))); + */ + #endif + } + + // _S_isnormal {{{3 + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static _MaskMember<_Tp> + _S_isnormal(_SimdWrapper<_Tp, _Np> __x) + { + using _Ip = __int_for_sizeof_t<_Tp>; + const auto __absn = __vector_bitcast<_Ip>(_SuperImpl::_S_abs(__x)); + const auto __minn + = __vector_bitcast<_Ip>(__vector_broadcast<_Np>(__norm_min_v<_Tp>)); + #if __FINITE_MATH_ONLY__ + return __absn >= __minn; + #else + const auto __maxn + = __vector_bitcast<_Ip>(__vector_broadcast<_Np>(__finite_max_v<_Tp>)); + return __minn <= __absn && __absn <= __maxn; + #endif + } + + // _S_fpclassify {{{3 + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static __fixed_size_storage_t<int, _Np> + _S_fpclassify(_SimdWrapper<_Tp, _Np> __x) + { + using _I = __int_for_sizeof_t<_Tp>; + const auto __xn + = __vector_bitcast<_I>(__to_intrin(_SuperImpl::_S_abs(__x))); + constexpr size_t _NI = sizeof(__xn) / sizeof(_I); + _GLIBCXX_SIMD_USE_CONSTEXPR auto __minn + = __vector_bitcast<_I>(__vector_broadcast<_NI>(__norm_min_v<_Tp>)); + _GLIBCXX_SIMD_USE_CONSTEXPR auto __infn + = __vector_bitcast<_I>(__vector_broadcast<_NI>(__infinity_v<_Tp>)); + + _GLIBCXX_SIMD_USE_CONSTEXPR auto __fp_normal + = __vector_broadcast<_NI, _I>(FP_NORMAL); + #if !__FINITE_MATH_ONLY__ + _GLIBCXX_SIMD_USE_CONSTEXPR auto __fp_nan + = __vector_broadcast<_NI, _I>(FP_NAN); + _GLIBCXX_SIMD_USE_CONSTEXPR auto __fp_infinite + = __vector_broadcast<_NI, _I>(FP_INFINITE); + #endif + #ifndef __FAST_MATH__ + _GLIBCXX_SIMD_USE_CONSTEXPR auto __fp_subnormal + = __vector_broadcast<_NI, _I>(FP_SUBNORMAL); + #endif + _GLIBCXX_SIMD_USE_CONSTEXPR auto __fp_zero + = __vector_broadcast<_NI, _I>(FP_ZERO); + + __vector_type_t<_I, _NI> + __tmp = __xn < __minn + #ifdef __FAST_MATH__ + ? __fp_zero + #else + ? (__xn == 0 ? __fp_zero : __fp_subnormal) + #endif + #if __FINITE_MATH_ONLY__ + : __fp_normal; + #else + : (__xn < __infn ? __fp_normal + : (__xn == __infn ? 
__fp_infinite : __fp_nan)); + #endif + + if constexpr (sizeof(_I) == sizeof(int)) + { + using _FixedInt = __fixed_size_storage_t<int, _Np>; + const auto __as_int = __vector_bitcast<int, _Np>(__tmp); + if constexpr (_FixedInt::_S_tuple_size == 1) + return {__as_int}; + else if constexpr (_FixedInt::_S_tuple_size == 2 + && is_same_v< + typename _FixedInt::_SecondType::_FirstAbi, + simd_abi::scalar>) + return {__extract<0, 2>(__as_int), __as_int[_Np - 1]}; + else if constexpr (_FixedInt::_S_tuple_size == 2) + return {__extract<0, 2>(__as_int), + __auto_bitcast(__extract<1, 2>(__as_int))}; + else + __assert_unreachable<_Tp>(); + } + else if constexpr (_Np == 2 && sizeof(_I) == 8 + && __fixed_size_storage_t<int, _Np>::_S_tuple_size == 2) + { + const auto __aslong = __vector_bitcast<_LLong>(__tmp); + return {int(__aslong[0]), {int(__aslong[1])}}; + } + #if _GLIBCXX_SIMD_X86INTRIN + else if constexpr (sizeof(_Tp) == 8 && sizeof(__tmp) == 32 + && __fixed_size_storage_t<int, _Np>::_S_tuple_size == 1) + return {_mm_packs_epi32(__to_intrin(__lo128(__tmp)), + __to_intrin(__hi128(__tmp)))}; + else if constexpr (sizeof(_Tp) == 8 && sizeof(__tmp) == 64 + && __fixed_size_storage_t<int, _Np>::_S_tuple_size == 1) + return {_mm512_cvtepi64_epi32(__to_intrin(__tmp))}; + #endif // _GLIBCXX_SIMD_X86INTRIN + else if constexpr (__fixed_size_storage_t<int, _Np>::_S_tuple_size == 1) + return {__call_with_subscripts<_Np>(__vector_bitcast<_LLong>(__tmp), + [](auto... __l) { + return __make_wrapper<int>(__l...); + })}; + else + __assert_unreachable<_Tp>(); + } + + // _S_increment & _S_decrement{{{2 + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static void + _S_increment(_SimdWrapper<_Tp, _Np>& __x) + { __x = __x._M_data + 1; } + + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static void + _S_decrement(_SimdWrapper<_Tp, _Np>& __x) + { __x = __x._M_data - 1; } + + // smart_reference access {{{2 + template <typename _Tp, size_t _Np, typename _Up> + _GLIBCXX_SIMD_INTRINSIC constexpr static void + _S_set(_SimdWrapper<_Tp, _Np>& __v, int __i, _Up&& __x) noexcept + { __v._M_set(__i, static_cast<_Up&&>(__x)); } + + // _S_masked_assign{{{2 + template <typename _Tp, typename _K, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static void + _S_masked_assign(_SimdWrapper<_K, _Np> __k, _SimdWrapper<_Tp, _Np>& __lhs, + __type_identity_t<_SimdWrapper<_Tp, _Np>> __rhs) + { + if (__k._M_is_constprop_none_of()) + return; + else if (__k._M_is_constprop_all_of()) + __lhs = __rhs; + else + __lhs = _CommonImpl::_S_blend(__k, __lhs, __rhs); + } + + template <typename _Tp, typename _K, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static void + _S_masked_assign(_SimdWrapper<_K, _Np> __k, _SimdWrapper<_Tp, _Np>& __lhs, + __type_identity_t<_Tp> __rhs) + { + if (__k._M_is_constprop_none_of()) + return; + else if (__k._M_is_constprop_all_of()) + __lhs = __vector_broadcast<_Np>(__rhs); + else if (__builtin_constant_p(__rhs) && __rhs == 0) + { + if constexpr (!is_same_v<bool, _K>) + // the __andnot optimization only makes sense if __k._M_data is a + // vector register + __lhs._M_data + = __andnot(__vector_bitcast<_Tp>(__k), __lhs._M_data); + else + // for AVX512/__mmask, a _mm512_maskz_mov is best + __lhs + = _CommonImpl::_S_blend(__k, __lhs, _SimdWrapper<_Tp, _Np>()); + } + else + __lhs = _CommonImpl::_S_blend(__k, __lhs, + _SimdWrapper<_Tp, _Np>( + __vector_broadcast<_Np>(__rhs))); + } + + // _S_masked_cassign {{{2 + template <typename _Op, typename _Tp, typename _K, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static void + 
_S_masked_cassign(const _SimdWrapper<_K, _Np> __k, + _SimdWrapper<_Tp, _Np>& __lhs, + const __type_identity_t<_SimdWrapper<_Tp, _Np>> __rhs, + _Op __op) + { + if (__k._M_is_constprop_none_of()) + return; + else if (__k._M_is_constprop_all_of()) + __lhs = __op(_SuperImpl{}, __lhs, __rhs); + else + __lhs = _CommonImpl::_S_blend(__k, __lhs, + __op(_SuperImpl{}, __lhs, __rhs)); + } + + template <typename _Op, typename _Tp, typename _K, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static void + _S_masked_cassign(const _SimdWrapper<_K, _Np> __k, + _SimdWrapper<_Tp, _Np>& __lhs, + const __type_identity_t<_Tp> __rhs, _Op __op) + { _S_masked_cassign(__k, __lhs, __vector_broadcast<_Np>(__rhs), __op); } + + // _S_masked_unary {{{2 + template <template <typename> class _Op, typename _Tp, typename _K, + size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static _SimdWrapper<_Tp, _Np> + _S_masked_unary(const _SimdWrapper<_K, _Np> __k, + const _SimdWrapper<_Tp, _Np> __v) + { + if (__k._M_is_constprop_none_of()) + return __v; + auto __vv = _M_make_simd(__v); + _Op<decltype(__vv)> __op; + if (__k._M_is_constprop_all_of()) + return __data(__op(__vv)); + else + return _CommonImpl::_S_blend(__k, __v, __data(__op(__vv))); + } + + //}}}2 + }; + +// _MaskImplBuiltinMixin {{{1 +struct _MaskImplBuiltinMixin +{ + template <typename _Tp> + using _TypeTag = _Tp*; + + // _S_to_maskvector {{{ + template <typename _Up, size_t _ToN = 1> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Up, _ToN> + _S_to_maskvector(bool __x) + { + static_assert(is_same_v<_Up, __int_for_sizeof_t<_Up>>); + return __x ? __vector_type_t<_Up, _ToN>{~_Up()} + : __vector_type_t<_Up, _ToN>{}; + } + + template <typename _Up, size_t _UpN = 0, size_t _Np, bool _Sanitized, + size_t _ToN = _UpN == 0 ? _Np : _UpN> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Up, _ToN> + _S_to_maskvector(_BitMask<_Np, _Sanitized> __x) + { + static_assert(is_same_v<_Up, __int_for_sizeof_t<_Up>>); + return __generate_vector<__vector_type_t<_Up, _ToN>>([&]( + auto __i) constexpr { + if constexpr (__i < _Np) + return __x[__i] ? ~_Up() : _Up(); + else + return _Up(); + }); + } + + template <typename _Up, size_t _UpN = 0, typename _Tp, size_t _Np, + size_t _ToN = _UpN == 0 ? 
_Np : _UpN> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Up, _ToN> + _S_to_maskvector(_SimdWrapper<_Tp, _Np> __x) + { + static_assert(is_same_v<_Up, __int_for_sizeof_t<_Up>>); + using _TW = _SimdWrapper<_Tp, _Np>; + using _UW = _SimdWrapper<_Up, _ToN>; + if constexpr (sizeof(_Up) == sizeof(_Tp) && sizeof(_TW) == sizeof(_UW)) + return __wrapper_bitcast<_Up, _ToN>(__x); + else if constexpr (is_same_v<_Tp, bool>) // bits -> vector + return _S_to_maskvector<_Up, _ToN>(_BitMask<_Np>(__x._M_data)); + else + { // vector -> vector + /* + [[maybe_unused]] const auto __y = __vector_bitcast<_Up>(__x._M_data); + if constexpr (sizeof(_Tp) == 8 && sizeof(_Up) == 4 && sizeof(__y) == + 16) return __vector_permute<1, 3, -1, -1>(__y); else if constexpr + (sizeof(_Tp) == 4 && sizeof(_Up) == 2 + && sizeof(__y) == 16) + return __vector_permute<1, 3, 5, 7, -1, -1, -1, -1>(__y); + else if constexpr (sizeof(_Tp) == 8 && sizeof(_Up) == 2 + && sizeof(__y) == 16) + return __vector_permute<3, 7, -1, -1, -1, -1, -1, -1>(__y); + else if constexpr (sizeof(_Tp) == 2 && sizeof(_Up) == 1 + && sizeof(__y) == 16) + return __vector_permute<1, 3, 5, 7, 9, 11, 13, 15, -1, -1, -1, -1, + -1, -1, -1, -1>(__y); else if constexpr (sizeof(_Tp) == 4 && + sizeof(_Up) == 1 + && sizeof(__y) == 16) + return __vector_permute<3, 7, 11, 15, -1, -1, -1, -1, -1, -1, -1, + -1, -1, -1, -1, -1>(__y); else if constexpr (sizeof(_Tp) == 8 && + sizeof(_Up) == 1 + && sizeof(__y) == 16) + return __vector_permute<7, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1, + -1, -1, -1, -1, -1>(__y); else + */ + { + return __generate_vector<__vector_type_t<_Up, _ToN>>([&]( + auto __i) constexpr { + if constexpr (__i < _Np) + return _Up(__x[__i.value]); + else + return _Up(); + }); + } + } + } + + // }}} + // _S_to_bits {{{ + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SanitizedBitMask<_Np> + _S_to_bits(_SimdWrapper<_Tp, _Np> __x) + { + static_assert(!is_same_v<_Tp, bool>); + static_assert(_Np <= __CHAR_BIT__ * sizeof(_ULLong)); + using _Up = make_unsigned_t<__int_for_sizeof_t<_Tp>>; + const auto __bools + = __vector_bitcast<_Up>(__x) >> (sizeof(_Up) * __CHAR_BIT__ - 1); + _ULLong __r = 0; + __execute_n_times<_Np>( + [&](auto __i) { __r |= _ULLong(__bools[__i.value]) << __i; }); + return __r; + } + + // }}} +}; + +// _MaskImplBuiltin {{{1 +template <typename _Abi> + struct _MaskImplBuiltin : _MaskImplBuiltinMixin + { + using _MaskImplBuiltinMixin::_S_to_bits; + using _MaskImplBuiltinMixin::_S_to_maskvector; + + // member types {{{ + template <typename _Tp> + using _SimdMember = typename _Abi::template __traits<_Tp>::_SimdMember; + + template <typename _Tp> + using _MaskMember = typename _Abi::template _MaskMember<_Tp>; + + using _SuperImpl = typename _Abi::_MaskImpl; + using _CommonImpl = typename _Abi::_CommonImpl; + + template <typename _Tp> + static constexpr size_t _S_size = simd_size_v<_Tp, _Abi>; + + // }}} + // _S_broadcast {{{ + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp> + _S_broadcast(bool __x) + { + return __x ? 
_Abi::template _S_implicit_mask<_Tp>() + : _MaskMember<_Tp>(); + } + + // }}} + // _S_load {{{ + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp> + _S_load(const bool* __mem) + { + using _I = __int_for_sizeof_t<_Tp>; + if constexpr (sizeof(_Tp) == sizeof(bool)) + { + const auto __bools + = _CommonImpl::template _S_load<_I, _S_size<_Tp>>(__mem); + // bool is {0, 1}, everything else is UB + return __bools > 0; + } + else + return __generate_vector<_I, _S_size<_Tp>>([&](auto __i) constexpr { + return __mem[__i] ? ~_I() : _I(); + }); + } + + // }}} + // _S_convert {{{ + template <typename _Tp, size_t _Np, bool _Sanitized> + _GLIBCXX_SIMD_INTRINSIC static constexpr auto + _S_convert(_BitMask<_Np, _Sanitized> __x) + { + if constexpr (__is_builtin_bitmask_abi<_Abi>()) + return _SimdWrapper<bool, simd_size_v<_Tp, _Abi>>(__x._M_to_bits()); + else + return _SuperImpl::template _S_to_maskvector<__int_for_sizeof_t<_Tp>, + _S_size<_Tp>>( + __x._M_sanitized()); + } + + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr auto + _S_convert(_SimdWrapper<bool, _Np> __x) + { + if constexpr (__is_builtin_bitmask_abi<_Abi>()) + return _SimdWrapper<bool, simd_size_v<_Tp, _Abi>>(__x._M_data); + else + return _SuperImpl::template _S_to_maskvector<__int_for_sizeof_t<_Tp>, + _S_size<_Tp>>( + _BitMask<_Np>(__x._M_data)._M_sanitized()); + } + + template <typename _Tp, typename _Up, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr auto + _S_convert(_SimdWrapper<_Up, _Np> __x) + { + if constexpr (__is_builtin_bitmask_abi<_Abi>()) + return _SimdWrapper<bool, simd_size_v<_Tp, _Abi>>( + _SuperImpl::_S_to_bits(__x)); + else + return _SuperImpl::template _S_to_maskvector<__int_for_sizeof_t<_Tp>, + _S_size<_Tp>>(__x); + } + + template <typename _Tp, typename _Up, typename _UAbi> + _GLIBCXX_SIMD_INTRINSIC static constexpr auto + _S_convert(simd_mask<_Up, _UAbi> __x) + { + if constexpr (__is_builtin_bitmask_abi<_Abi>()) + { + using _R = _SimdWrapper<bool, simd_size_v<_Tp, _Abi>>; + if constexpr (__is_builtin_bitmask_abi<_UAbi>()) // bits -> bits + return _R(__data(__x)); + else if constexpr (__is_scalar_abi<_UAbi>()) // bool -> bits + return _R(__data(__x)); + else if constexpr (__is_fixed_size_abi_v<_UAbi>) // bitset -> bits + return _R(__data(__x)._M_to_bits()); + else // vector -> bits + return _R(_UAbi::_MaskImpl::_S_to_bits(__data(__x))._M_to_bits()); + } + else + return _SuperImpl::template _S_to_maskvector<__int_for_sizeof_t<_Tp>, + _S_size<_Tp>>( + __data(__x)); + } + + // }}} + // _S_masked_load {{{2 + template <typename _Tp, size_t _Np> + static inline _SimdWrapper<_Tp, _Np> + _S_masked_load(_SimdWrapper<_Tp, _Np> __merge, + _SimdWrapper<_Tp, _Np> __mask, const bool* __mem) noexcept + { + // AVX(2) has 32/64 bit maskload, but nothing at 8 bit granularity + auto __tmp = __wrapper_bitcast<__int_for_sizeof_t<_Tp>>(__merge); + _BitOps::_S_bit_iteration(_SuperImpl::_S_to_bits(__mask), + [&](auto __i) { + __tmp._M_set(__i, -__mem[__i]); + }); + __merge = __wrapper_bitcast<_Tp>(__tmp); + return __merge; + } + + // _S_store {{{2 + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static void _S_store(_SimdWrapper<_Tp, _Np> __v, + bool* __mem) noexcept + { + __execute_n_times<_Np>([&](auto __i) constexpr { + __mem[__i] = __v[__i]; + }); + } + + // _S_masked_store {{{2 + template <typename _Tp, size_t _Np> + static inline void + _S_masked_store(const _SimdWrapper<_Tp, _Np> __v, bool* __mem, + const _SimdWrapper<_Tp, _Np> __k) noexcept + { + 
_BitOps::_S_bit_iteration( + _SuperImpl::_S_to_bits(__k), [&](auto __i) constexpr { + __mem[__i] = __v[__i]; + }); + } + + // _S_from_bitmask{{{2 + template <size_t _Np, typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _MaskMember<_Tp> + _S_from_bitmask(_SanitizedBitMask<_Np> __bits, _TypeTag<_Tp>) + { + return _SuperImpl::template _S_to_maskvector<_Tp, _S_size<_Tp>>(__bits); + } + + // logical and bitwise operators {{{2 + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np> + _S_logical_and(const _SimdWrapper<_Tp, _Np>& __x, + const _SimdWrapper<_Tp, _Np>& __y) + { return __and(__x._M_data, __y._M_data); } + + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np> + _S_logical_or(const _SimdWrapper<_Tp, _Np>& __x, + const _SimdWrapper<_Tp, _Np>& __y) + { return __or(__x._M_data, __y._M_data); } + + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np> + _S_bit_not(const _SimdWrapper<_Tp, _Np>& __x) + { + if constexpr (_Abi::template _S_is_partial<_Tp>) + return __andnot(__x, __wrapper_bitcast<_Tp>( + _Abi::template _S_implicit_mask<_Tp>())); + else + return __not(__x._M_data); + } + + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np> + _S_bit_and(const _SimdWrapper<_Tp, _Np>& __x, + const _SimdWrapper<_Tp, _Np>& __y) + { return __and(__x._M_data, __y._M_data); } + + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np> + _S_bit_or(const _SimdWrapper<_Tp, _Np>& __x, + const _SimdWrapper<_Tp, _Np>& __y) + { return __or(__x._M_data, __y._M_data); } + + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np> + _S_bit_xor(const _SimdWrapper<_Tp, _Np>& __x, + const _SimdWrapper<_Tp, _Np>& __y) + { return __xor(__x._M_data, __y._M_data); } + + // smart_reference access {{{2 + template <typename _Tp, size_t _Np> + static constexpr void _S_set(_SimdWrapper<_Tp, _Np>& __k, int __i, + bool __x) noexcept + { + if constexpr (is_same_v<_Tp, bool>) + __k._M_set(__i, __x); + else + { + static_assert(is_same_v<_Tp, __int_for_sizeof_t<_Tp>>); + if (__builtin_is_constant_evaluated()) + { + __k = __generate_from_n_evaluations<_Np, + __vector_type_t<_Tp, _Np>>( + [&](auto __j) { + if (__i == __j) + return _Tp(-__x); + else + return __k[+__j]; + }); + } + else + __k._M_data[__i] = -__x; + } + } + + // _S_masked_assign{{{2 + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static void + _S_masked_assign(_SimdWrapper<_Tp, _Np> __k, + _SimdWrapper<_Tp, _Np>& __lhs, + __type_identity_t<_SimdWrapper<_Tp, _Np>> __rhs) + { __lhs = _CommonImpl::_S_blend(__k, __lhs, __rhs); } + + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static void + _S_masked_assign(_SimdWrapper<_Tp, _Np> __k, + _SimdWrapper<_Tp, _Np>& __lhs, bool __rhs) + { + if (__builtin_constant_p(__rhs)) + { + if (__rhs == false) + __lhs = __andnot(__k, __lhs); + else + __lhs = __or(__k, __lhs); + return; + } + __lhs = _CommonImpl::_S_blend(__k, __lhs, + __data(simd_mask<_Tp, _Abi>(__rhs))); + } + + //}}}2 + // _S_all_of {{{ + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static bool + _S_all_of(simd_mask<_Tp, _Abi> __k) + { + return __call_with_subscripts( + __data(__k), make_index_sequence<_S_size<_Tp>>(), + [](const auto... __ent) constexpr { return (... 
&& !(__ent == 0)); }); + } + + // }}} + // _S_any_of {{{ + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static bool + _S_any_of(simd_mask<_Tp, _Abi> __k) + { + return __call_with_subscripts( + __data(__k), make_index_sequence<_S_size<_Tp>>(), + [](const auto... __ent) constexpr { return (... || !(__ent == 0)); }); + } + + // }}} + // _S_none_of {{{ + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static bool + _S_none_of(simd_mask<_Tp, _Abi> __k) + { + return __call_with_subscripts( + __data(__k), make_index_sequence<_S_size<_Tp>>(), + [](const auto... __ent) constexpr { return (... && (__ent == 0)); }); + } + + // }}} + // _S_some_of {{{ + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static bool + _S_some_of(simd_mask<_Tp, _Abi> __k) + { + const int __n_true = _S_popcount(__k); + return __n_true > 0 && __n_true < int(_S_size<_Tp>); + } + + // }}} + // _S_popcount {{{ + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static int + _S_popcount(simd_mask<_Tp, _Abi> __k) + { + using _I = __int_for_sizeof_t<_Tp>; + if constexpr (is_default_constructible_v<simd<_I, _Abi>>) + return -reduce( + simd<_I, _Abi>(__private_init, __wrapper_bitcast<_I>(__data(__k)))); + else + return -reduce(__bit_cast<rebind_simd_t<_I, simd<_Tp, _Abi>>>( + simd<_Tp, _Abi>(__private_init, __data(__k)))); + } + + // }}} + // _S_find_first_set {{{ + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static int + _S_find_first_set(simd_mask<_Tp, _Abi> __k) + { + return std::__countr_zero( + _SuperImpl::_S_to_bits(__data(__k))._M_to_bits()); + } + + // }}} + // _S_find_last_set {{{ + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static int + _S_find_last_set(simd_mask<_Tp, _Abi> __k) + { + return std::__bit_width( + _SuperImpl::_S_to_bits(__data(__k))._M_to_bits()) - 1; + } + + // }}} + }; + +//}}}1 +_GLIBCXX_SIMD_END_NAMESPACE +#endif // __cplusplus >= 201703L +#endif // _GLIBCXX_EXPERIMENTAL_SIMD_ABIS_H_ + +// vim: foldmethod=marker foldmarker={{{,}}} sw=2 noet ts=8 sts=2 tw=80 diff --git a/libstdc++-v3/include/experimental/bits/simd_converter.h b/libstdc++-v3/include/experimental/bits/simd_converter.h new file mode 100644 index 00000000000..dc4598743f9 --- /dev/null +++ b/libstdc++-v3/include/experimental/bits/simd_converter.h @@ -0,0 +1,354 @@ +// Generic simd conversions -*- C++ -*- + +// Copyright (C) 2020 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// Under Section 7 of GPL version 3, you are granted additional +// permissions described in the GCC Runtime Library Exception, version +// 3.1, as published by the Free Software Foundation. + +// You should have received a copy of the GNU General Public License and +// a copy of the GCC Runtime Library Exception along with this program; +// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +// <http://www.gnu.org/licenses/>. 
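// (Editorial note, not part of the patch: each _SimdConverter specialization
// in this file is a function object that maps the storage of one
// (value type, ABI) pair to another. Conceptually, for the scalar -> scalar
// case defined first below:
//   _SimdConverter<int, simd_abi::scalar, float, simd_abi::scalar> __cvt;
//   float __f = __cvt(42); // static_cast<float>(42)
// )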
+ +#ifndef _GLIBCXX_EXPERIMENTAL_SIMD_CONVERTER_H_ +#define _GLIBCXX_EXPERIMENTAL_SIMD_CONVERTER_H_ + +#if __cplusplus >= 201703L + +_GLIBCXX_SIMD_BEGIN_NAMESPACE +// _SimdConverter scalar -> scalar {{{ +template <typename _From, typename _To> + struct _SimdConverter<_From, simd_abi::scalar, _To, simd_abi::scalar, + enable_if_t<!is_same_v<_From, _To>>> + { + _GLIBCXX_SIMD_INTRINSIC constexpr _To operator()(_From __a) const noexcept + { return static_cast<_To>(__a); } + }; + +// }}} +// _SimdConverter scalar -> "native" {{{ +template <typename _From, typename _To, typename _Abi> + struct _SimdConverter<_From, simd_abi::scalar, _To, _Abi, + enable_if_t<!is_same_v<_Abi, simd_abi::scalar>>> + { + using _Ret = typename _Abi::template __traits<_To>::_SimdMember; + + template <typename... _More> + _GLIBCXX_SIMD_INTRINSIC constexpr _Ret + operator()(_From __a, _More... __more) const noexcept + { + static_assert(sizeof...(_More) + 1 == _Abi::template _S_size<_To>); + static_assert(conjunction_v<is_same<_From, _More>...>); + return __make_vector<_To>(__a, __more...); + } + }; + +// }}} +// _SimdConverter "native 1" -> "native 2" {{{ +template <typename _From, typename _To, typename _AFrom, typename _ATo> + struct _SimdConverter< + _From, _AFrom, _To, _ATo, + enable_if_t<!disjunction_v< + __is_fixed_size_abi<_AFrom>, __is_fixed_size_abi<_ATo>, + is_same<_AFrom, simd_abi::scalar>, is_same<_ATo, simd_abi::scalar>, + conjunction<is_same<_From, _To>, is_same<_AFrom, _ATo>>>>> + { + using _Arg = typename _AFrom::template __traits<_From>::_SimdMember; + using _Ret = typename _ATo::template __traits<_To>::_SimdMember; + using _V = __vector_type_t<_To, simd_size_v<_To, _ATo>>; + + template <typename... _More> + _GLIBCXX_SIMD_INTRINSIC constexpr _Ret + operator()(_Arg __a, _More... __more) const noexcept + { return __vector_convert<_V>(__a, __more...); } + }; + +// }}} +// _SimdConverter scalar -> fixed_size<1> {{{1 +template <typename _From, typename _To> + struct _SimdConverter<_From, simd_abi::scalar, _To, simd_abi::fixed_size<1>, + void> + { + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdTuple<_To, simd_abi::scalar> + operator()(_From __x) const noexcept + { return {static_cast<_To>(__x)}; } + }; + +// _SimdConverter fixed_size<1> -> scalar {{{1 +template <typename _From, typename _To> + struct _SimdConverter<_From, simd_abi::fixed_size<1>, _To, simd_abi::scalar, + void> + { + _GLIBCXX_SIMD_INTRINSIC constexpr _To + operator()(_SimdTuple<_From, simd_abi::scalar> __x) const noexcept + { return {static_cast<_To>(__x.first)}; } + }; + +// _SimdConverter fixed_size<_Np> -> fixed_size<_Np> {{{1 +template <typename _From, typename _To, int _Np> + struct _SimdConverter<_From, simd_abi::fixed_size<_Np>, _To, + simd_abi::fixed_size<_Np>, + enable_if_t<!is_same_v<_From, _To>>> + { + using _Ret = __fixed_size_storage_t<_To, _Np>; + using _Arg = __fixed_size_storage_t<_From, _Np>; + + _GLIBCXX_SIMD_INTRINSIC constexpr _Ret + operator()(const _Arg& __x) const noexcept + { + if constexpr (is_same_v<_From, _To>) + return __x; + + // special case (optimize) int signedness casts + else if constexpr (sizeof(_From) == sizeof(_To) + && is_integral_v<_From> && is_integral_v<_To>) + return __bit_cast<_Ret>(__x); + + // special case if all ABI tags in _Ret are scalar + else if constexpr (__is_scalar_abi<typename _Ret::_FirstAbi>()) + { + return __call_with_subscripts( + __x, make_index_sequence<_Np>(), + [](auto... 
__values) constexpr->_Ret { + return __make_simd_tuple<_To, decltype((void) __values, + simd_abi::scalar())...>( + static_cast<_To>(__values)...); + }); + } + + // from one vector to one vector + else if constexpr (_Arg::_S_first_size == _Ret::_S_first_size) + { + _SimdConverter<_From, typename _Arg::_FirstAbi, _To, + typename _Ret::_FirstAbi> + __native_cvt; + if constexpr (_Arg::_S_tuple_size == 1) + return {__native_cvt(__x.first)}; + else + { + constexpr size_t _NRemain = _Np - _Arg::_S_first_size; + _SimdConverter<_From, simd_abi::fixed_size<_NRemain>, _To, + simd_abi::fixed_size<_NRemain>> + __remainder_cvt; + return {__native_cvt(__x.first), __remainder_cvt(__x.second)}; + } + } + + // from one vector to multiple vectors + else if constexpr (_Arg::_S_first_size > _Ret::_S_first_size) + { + const auto __multiple_return_chunks + = __convert_all<__vector_type_t<_To, _Ret::_S_first_size>>( + __x.first); + constexpr auto __converted = __multiple_return_chunks.size() + * _Ret::_FirstAbi::template _S_size<_To>; + constexpr auto __remaining = _Np - __converted; + if constexpr (_Arg::_S_tuple_size == 1 && __remaining == 0) + return __to_simd_tuple<_To, _Np>(__multiple_return_chunks); + else if constexpr (_Arg::_S_tuple_size == 1) + { // e.g. <int, 3> -> <double, 2, 1> or <short, 7> -> <double, 4, 2, + // 1> + using _RetRem + = __remove_cvref_t<decltype(__simd_tuple_pop_front<__converted>( + _Ret()))>; + const auto __return_chunks2 + = __convert_all<__vector_type_t<_To, _RetRem::_S_first_size>, 0, + __converted>(__x.first); + constexpr auto __converted2 + = __converted + + __return_chunks2.size() * _RetRem::_S_first_size; + if constexpr (__converted2 == _Np) + return __to_simd_tuple<_To, _Np>(__multiple_return_chunks, + __return_chunks2); + else + { + using _RetRem2 = __remove_cvref_t< + decltype(__simd_tuple_pop_front<__return_chunks2.size() + * _RetRem::_S_first_size>( + _RetRem()))>; + const auto __return_chunks3 = __convert_all< + __vector_type_t<_To, _RetRem2::_S_first_size>, 0, + __converted2>(__x.first); + constexpr auto __converted3 + = __converted2 + + __return_chunks3.size() * _RetRem2::_S_first_size; + if constexpr (__converted3 == _Np) + return __to_simd_tuple<_To, _Np>(__multiple_return_chunks, + __return_chunks2, + __return_chunks3); + else + { + using _RetRem3 + = __remove_cvref_t<decltype(__simd_tuple_pop_front< + __return_chunks3.size() + * _RetRem2::_S_first_size>( + _RetRem2()))>; + const auto __return_chunks4 = __convert_all< + __vector_type_t<_To, _RetRem3::_S_first_size>, 0, + __converted3>(__x.first); + constexpr auto __converted4 + = __converted3 + + __return_chunks4.size() * _RetRem3::_S_first_size; + if constexpr (__converted4 == _Np) + return __to_simd_tuple<_To, _Np>( + __multiple_return_chunks, __return_chunks2, + __return_chunks3, __return_chunks4); + else + __assert_unreachable<_To>(); + } + } + } + else + { + constexpr size_t _NRemain = _Np - _Arg::_S_first_size; + _SimdConverter<_From, simd_abi::fixed_size<_NRemain>, _To, + simd_abi::fixed_size<_NRemain>> + __remainder_cvt; + return __simd_tuple_concat( + __to_simd_tuple<_To, _Arg::_S_first_size>( + __multiple_return_chunks), + __remainder_cvt(__x.second)); + } + } + + // from multiple vectors to one vector + // _Arg::_S_first_size < _Ret::_S_first_size + // a) heterogeneous input at the end of the tuple (possible with partial + // native registers in _Ret) + else if constexpr (_Ret::_S_tuple_size == 1 + && _Np % _Arg::_S_first_size != 0) + { + static_assert(_Ret::_FirstAbi::template _S_is_partial<_To>); + 
return _Ret{__generate_from_n_evaluations< + _Np, typename _VectorTraits<typename _Ret::_FirstType>::type>( + [&](auto __i) { return static_cast<_To>(__x[__i]); })}; + } + else + { + static_assert(_Arg::_S_tuple_size > 1); + constexpr auto __n + = __div_roundup(_Ret::_S_first_size, _Arg::_S_first_size); + return __call_with_n_evaluations<__n>( + [&__x](auto... __uncvted) { + // assuming _Arg Abi tags for all __i are _Arg::_FirstAbi + _SimdConverter<_From, typename _Arg::_FirstAbi, _To, + typename _Ret::_FirstAbi> + __native_cvt; + if constexpr (_Ret::_S_tuple_size == 1) + return _Ret{__native_cvt(__uncvted...)}; + else + return _Ret{ + __native_cvt(__uncvted...), + _SimdConverter< + _From, simd_abi::fixed_size<_Np - _Ret::_S_first_size>, _To, + simd_abi::fixed_size<_Np - _Ret::_S_first_size>>()( + __simd_tuple_pop_front<_Ret::_S_first_size>(__x))}; + }, + [&__x](auto __i) { return __get_tuple_at<__i>(__x); }); + } + } + }; + +// _SimdConverter "native" -> fixed_size<_Np> {{{1 +// i.e. 1 register to ? registers +template <typename _From, typename _Ap, typename _To, int _Np> + struct _SimdConverter<_From, _Ap, _To, simd_abi::fixed_size<_Np>, + enable_if_t<!__is_fixed_size_abi_v<_Ap>>> + { + static_assert( + _Np == simd_size_v<_From, _Ap>, + "_SimdConverter to fixed_size only works for equal element counts"); + + using _Ret = __fixed_size_storage_t<_To, _Np>; + + _GLIBCXX_SIMD_INTRINSIC constexpr _Ret + operator()(typename _SimdTraits<_From, _Ap>::_SimdMember __x) const noexcept + { + if constexpr (_Ret::_S_tuple_size == 1) + return {__vector_convert<typename _Ret::_FirstType::_BuiltinType>(__x)}; + else + { + using _FixedNp = simd_abi::fixed_size<_Np>; + _SimdConverter<_From, _FixedNp, _To, _FixedNp> __fixed_cvt; + using _FromFixedStorage = __fixed_size_storage_t<_From, _Np>; + if constexpr (_FromFixedStorage::_S_tuple_size == 1) + return __fixed_cvt(_FromFixedStorage{__x}); + else if constexpr (_FromFixedStorage::_S_tuple_size == 2) + { + _FromFixedStorage __tmp; + static_assert(sizeof(__tmp) <= sizeof(__x)); + __builtin_memcpy(&__tmp.first, &__x, sizeof(__tmp.first)); + __builtin_memcpy(&__tmp.second.first, + reinterpret_cast<const char*>(&__x) + + sizeof(__tmp.first), + sizeof(__tmp.second.first)); + return __fixed_cvt(__tmp); + } + else + __assert_unreachable<_From>(); + } + } + }; + +// _SimdConverter fixed_size<_Np> -> "native" {{{1 +// i.e. ? registers to 1 register +template <typename _From, int _Np, typename _To, typename _Ap> + struct _SimdConverter<_From, simd_abi::fixed_size<_Np>, _To, _Ap, + enable_if_t<!__is_fixed_size_abi_v<_Ap>>> + { + static_assert( + _Np == simd_size_v<_To, _Ap>, + "_SimdConverter from fixed_size only works for equal element counts"); + + using _Arg = __fixed_size_storage_t<_From, _Np>; + + _GLIBCXX_SIMD_INTRINSIC constexpr + typename _SimdTraits<_To, _Ap>::_SimdMember + operator()(_Arg __x) const noexcept + { + if constexpr (_Arg::_S_tuple_size == 1) + return __vector_convert<__vector_type_t<_To, _Np>>(__x.first); + else if constexpr (_Arg::_S_is_homogeneous) + return __call_with_n_evaluations<_Arg::_S_tuple_size>( + [](auto... 
__members) { + if constexpr ((is_convertible_v<decltype(__members), _To> && ...)) + return __vector_type_t<_To, _Np>{static_cast<_To>(__members)...}; + else + return __vector_convert<__vector_type_t<_To, _Np>>(__members...); + }, + [&](auto __i) { return __get_tuple_at<__i>(__x); }); + else if constexpr (__fixed_size_storage_t<_To, _Np>::_S_tuple_size == 1) + { + _SimdConverter<_From, simd_abi::fixed_size<_Np>, _To, + simd_abi::fixed_size<_Np>> + __fixed_cvt; + return __fixed_cvt(__x).first; + } + else + { + const _SimdWrapper<_From, _Np> __xv + = __generate_from_n_evaluations<_Np, __vector_type_t<_From, _Np>>( + [&](auto __i) { return __x[__i]; }); + return __vector_convert<__vector_type_t<_To, _Np>>(__xv); + } + } + }; + +// }}}1 +_GLIBCXX_SIMD_END_NAMESPACE +#endif // __cplusplus >= 201703L +#endif // _GLIBCXX_EXPERIMENTAL_SIMD_CONVERTER_H_ + +// vim: foldmethod=marker sw=2 noet ts=8 sts=2 tw=80 diff --git a/libstdc++-v3/include/experimental/bits/simd_detail.h b/libstdc++-v3/include/experimental/bits/simd_detail.h new file mode 100644 index 00000000000..a49a9d88b7f --- /dev/null +++ b/libstdc++-v3/include/experimental/bits/simd_detail.h @@ -0,0 +1,306 @@ +// Internal macros for the simd implementation -*- C++ -*- + +// Copyright (C) 2020 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// Under Section 7 of GPL version 3, you are granted additional +// permissions described in the GCC Runtime Library Exception, version +// 3.1, as published by the Free Software Foundation. + +// You should have received a copy of the GNU General Public License and +// a copy of the GCC Runtime Library Exception along with this program; +// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +// <http://www.gnu.org/licenses/>. + +#ifndef _GLIBCXX_EXPERIMENTAL_SIMD_DETAIL_H_ +#define _GLIBCXX_EXPERIMENTAL_SIMD_DETAIL_H_ + +#if __cplusplus >= 201703L + +#include <cstddef> +#include <cstdint> + + +#define _GLIBCXX_SIMD_BEGIN_NAMESPACE \ + namespace std _GLIBCXX_VISIBILITY(default) \ + { \ + _GLIBCXX_BEGIN_NAMESPACE_VERSION \ + namespace experimental { \ + inline namespace parallelism_v2 { +#define _GLIBCXX_SIMD_END_NAMESPACE \ + } \ + } \ + _GLIBCXX_END_NAMESPACE_VERSION \ + } + +// ISA extension detection. 
The following defines all the _GLIBCXX_SIMD_HAVE_XXX +// macros ARM{{{ +#if defined __ARM_NEON +#define _GLIBCXX_SIMD_HAVE_NEON 1 +#else +#define _GLIBCXX_SIMD_HAVE_NEON 0 +#endif +#if defined __ARM_NEON && (__ARM_ARCH >= 8 || defined __aarch64__) +#define _GLIBCXX_SIMD_HAVE_NEON_A32 1 +#else +#define _GLIBCXX_SIMD_HAVE_NEON_A32 0 +#endif +#if defined __ARM_NEON && defined __aarch64__ +#define _GLIBCXX_SIMD_HAVE_NEON_A64 1 +#else +#define _GLIBCXX_SIMD_HAVE_NEON_A64 0 +#endif +//}}} +// x86{{{ +#ifdef __MMX__ +#define _GLIBCXX_SIMD_HAVE_MMX 1 +#else +#define _GLIBCXX_SIMD_HAVE_MMX 0 +#endif +#if defined __SSE__ || defined __x86_64__ +#define _GLIBCXX_SIMD_HAVE_SSE 1 +#else +#define _GLIBCXX_SIMD_HAVE_SSE 0 +#endif +#if defined __SSE2__ || defined __x86_64__ +#define _GLIBCXX_SIMD_HAVE_SSE2 1 +#else +#define _GLIBCXX_SIMD_HAVE_SSE2 0 +#endif +#ifdef __SSE3__ +#define _GLIBCXX_SIMD_HAVE_SSE3 1 +#else +#define _GLIBCXX_SIMD_HAVE_SSE3 0 +#endif +#ifdef __SSSE3__ +#define _GLIBCXX_SIMD_HAVE_SSSE3 1 +#else +#define _GLIBCXX_SIMD_HAVE_SSSE3 0 +#endif +#ifdef __SSE4_1__ +#define _GLIBCXX_SIMD_HAVE_SSE4_1 1 +#else +#define _GLIBCXX_SIMD_HAVE_SSE4_1 0 +#endif +#ifdef __SSE4_2__ +#define _GLIBCXX_SIMD_HAVE_SSE4_2 1 +#else +#define _GLIBCXX_SIMD_HAVE_SSE4_2 0 +#endif +#ifdef __XOP__ +#define _GLIBCXX_SIMD_HAVE_XOP 1 +#else +#define _GLIBCXX_SIMD_HAVE_XOP 0 +#endif +#ifdef __AVX__ +#define _GLIBCXX_SIMD_HAVE_AVX 1 +#else +#define _GLIBCXX_SIMD_HAVE_AVX 0 +#endif +#ifdef __AVX2__ +#define _GLIBCXX_SIMD_HAVE_AVX2 1 +#else +#define _GLIBCXX_SIMD_HAVE_AVX2 0 +#endif +#ifdef __BMI__ +#define _GLIBCXX_SIMD_HAVE_BMI1 1 +#else +#define _GLIBCXX_SIMD_HAVE_BMI1 0 +#endif +#ifdef __BMI2__ +#define _GLIBCXX_SIMD_HAVE_BMI2 1 +#else +#define _GLIBCXX_SIMD_HAVE_BMI2 0 +#endif +#ifdef __LZCNT__ +#define _GLIBCXX_SIMD_HAVE_LZCNT 1 +#else +#define _GLIBCXX_SIMD_HAVE_LZCNT 0 +#endif +#ifdef __SSE4A__ +#define _GLIBCXX_SIMD_HAVE_SSE4A 1 +#else +#define _GLIBCXX_SIMD_HAVE_SSE4A 0 +#endif +#ifdef __FMA__ +#define _GLIBCXX_SIMD_HAVE_FMA 1 +#else +#define _GLIBCXX_SIMD_HAVE_FMA 0 +#endif +#ifdef __FMA4__ +#define _GLIBCXX_SIMD_HAVE_FMA4 1 +#else +#define _GLIBCXX_SIMD_HAVE_FMA4 0 +#endif +#ifdef __F16C__ +#define _GLIBCXX_SIMD_HAVE_F16C 1 +#else +#define _GLIBCXX_SIMD_HAVE_F16C 0 +#endif +#ifdef __POPCNT__ +#define _GLIBCXX_SIMD_HAVE_POPCNT 1 +#else +#define _GLIBCXX_SIMD_HAVE_POPCNT 0 +#endif +#ifdef __AVX512F__ +#define _GLIBCXX_SIMD_HAVE_AVX512F 1 +#else +#define _GLIBCXX_SIMD_HAVE_AVX512F 0 +#endif +#ifdef __AVX512DQ__ +#define _GLIBCXX_SIMD_HAVE_AVX512DQ 1 +#else +#define _GLIBCXX_SIMD_HAVE_AVX512DQ 0 +#endif +#ifdef __AVX512VL__ +#define _GLIBCXX_SIMD_HAVE_AVX512VL 1 +#else +#define _GLIBCXX_SIMD_HAVE_AVX512VL 0 +#endif +#ifdef __AVX512BW__ +#define _GLIBCXX_SIMD_HAVE_AVX512BW 1 +#else +#define _GLIBCXX_SIMD_HAVE_AVX512BW 0 +#endif + +#if _GLIBCXX_SIMD_HAVE_SSE +#define _GLIBCXX_SIMD_HAVE_SSE_ABI 1 +#else +#define _GLIBCXX_SIMD_HAVE_SSE_ABI 0 +#endif +#if _GLIBCXX_SIMD_HAVE_SSE2 +#define _GLIBCXX_SIMD_HAVE_FULL_SSE_ABI 1 +#else +#define _GLIBCXX_SIMD_HAVE_FULL_SSE_ABI 0 +#endif + +#if _GLIBCXX_SIMD_HAVE_AVX +#define _GLIBCXX_SIMD_HAVE_AVX_ABI 1 +#else +#define _GLIBCXX_SIMD_HAVE_AVX_ABI 0 +#endif +#if _GLIBCXX_SIMD_HAVE_AVX2 +#define _GLIBCXX_SIMD_HAVE_FULL_AVX_ABI 1 +#else +#define _GLIBCXX_SIMD_HAVE_FULL_AVX_ABI 0 +#endif + +#if _GLIBCXX_SIMD_HAVE_AVX512F +#define _GLIBCXX_SIMD_HAVE_AVX512_ABI 1 +#else +#define _GLIBCXX_SIMD_HAVE_AVX512_ABI 0 +#endif +#if _GLIBCXX_SIMD_HAVE_AVX512BW +#define 
_GLIBCXX_SIMD_HAVE_FULL_AVX512_ABI 1 +#else +#define _GLIBCXX_SIMD_HAVE_FULL_AVX512_ABI 0 +#endif + +#if defined __x86_64__ && !_GLIBCXX_SIMD_HAVE_SSE2 +#error "Use of SSE2 is required on AMD64" +#endif +//}}} + +#ifdef __clang__ +#define _GLIBCXX_SIMD_NORMAL_MATH +#else +#define _GLIBCXX_SIMD_NORMAL_MATH \ + [[__gnu__::__optimize__("finite-math-only,no-signed-zeros")]] +#endif +#define _GLIBCXX_SIMD_NEVER_INLINE [[__gnu__::__noinline__]] +#define _GLIBCXX_SIMD_INTRINSIC \ + [[__gnu__::__always_inline__, __gnu__::__artificial__]] inline +#define _GLIBCXX_SIMD_ALWAYS_INLINE [[__gnu__::__always_inline__]] inline +#define _GLIBCXX_SIMD_IS_UNLIKELY(__x) __builtin_expect(__x, 0) +#define _GLIBCXX_SIMD_IS_LIKELY(__x) __builtin_expect(__x, 1) + +#if defined __STRICT_ANSI__ && __STRICT_ANSI__ +#define _GLIBCXX_SIMD_CONSTEXPR +#define _GLIBCXX_SIMD_USE_CONSTEXPR_API const +#else +#define _GLIBCXX_SIMD_CONSTEXPR constexpr +#define _GLIBCXX_SIMD_USE_CONSTEXPR_API constexpr +#endif + +#if defined __clang__ +#define _GLIBCXX_SIMD_USE_CONSTEXPR const +#else +#define _GLIBCXX_SIMD_USE_CONSTEXPR constexpr +#endif + +#define _GLIBCXX_SIMD_LIST_BINARY(__macro) __macro(|) __macro(&) __macro(^) +#define _GLIBCXX_SIMD_LIST_SHIFTS(__macro) __macro(<<) __macro(>>) +#define _GLIBCXX_SIMD_LIST_ARITHMETICS(__macro) \ + __macro(+) __macro(-) __macro(*) __macro(/) __macro(%) + +#define _GLIBCXX_SIMD_ALL_BINARY(__macro) \ + _GLIBCXX_SIMD_LIST_BINARY(__macro) static_assert(true) +#define _GLIBCXX_SIMD_ALL_SHIFTS(__macro) \ + _GLIBCXX_SIMD_LIST_SHIFTS(__macro) static_assert(true) +#define _GLIBCXX_SIMD_ALL_ARITHMETICS(__macro) \ + _GLIBCXX_SIMD_LIST_ARITHMETICS(__macro) static_assert(true) + +#ifdef _GLIBCXX_SIMD_NO_ALWAYS_INLINE +#undef _GLIBCXX_SIMD_ALWAYS_INLINE +#define _GLIBCXX_SIMD_ALWAYS_INLINE inline +#undef _GLIBCXX_SIMD_INTRINSIC +#define _GLIBCXX_SIMD_INTRINSIC inline +#endif + +#if _GLIBCXX_SIMD_HAVE_SSE || _GLIBCXX_SIMD_HAVE_MMX +#define _GLIBCXX_SIMD_X86INTRIN 1 +#else +#define _GLIBCXX_SIMD_X86INTRIN 0 +#endif + +// workaround macros {{{ +// use aliasing loads to help GCC understand the data accesses better +// This also seems to hide a miscompilation on swap(x[i], x[i + 1]) with +// fixed_size_simd<float, 16> x. 
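+// (Editorial sketch, not part of the original commit: an "aliasing load" +// reads elements through a __may_alias-qualified pointer, e.g. +// return reinterpret_cast<const __may_alias<_Tp>*>(this)[__i]; +// which is what _SimdTuple::operator[] in simd_fixed_size.h does when +// this macro is defined.)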
+#define _GLIBCXX_SIMD_USE_ALIASING_LOADS 1 + +// vector conversions on x86 not optimized: +#if _GLIBCXX_SIMD_X86INTRIN +#define _GLIBCXX_SIMD_WORKAROUND_PR85048 1 +#endif + +// integer division not optimized +#define _GLIBCXX_SIMD_WORKAROUND_PR90993 1 + +// very bad codegen for extraction and concatenation of 128/256 "subregisters" +// with sizeof(element type) < 8: https://godbolt.org/g/mqUsgM +#if _GLIBCXX_SIMD_X86INTRIN +#define _GLIBCXX_SIMD_WORKAROUND_XXX_1 1 +#endif + +// bad codegen for 8-byte memcpy to __vector_type_t<char, 16> +#define _GLIBCXX_SIMD_WORKAROUND_PR90424 1 + +// bad codegen for zero-extend using simple concat(__x, 0) +#if _GLIBCXX_SIMD_X86INTRIN +#define _GLIBCXX_SIMD_WORKAROUND_XXX_3 1 +#endif + +// https://github.com/cplusplus/parallelism-ts/issues/65 (incorrect return type +// of static_simd_cast) +#define _GLIBCXX_SIMD_FIX_P2TS_ISSUE65 1 + +// https://github.com/cplusplus/parallelism-ts/issues/66 (incorrect SFINAE +// constraint on (static)_simd_cast) +#define _GLIBCXX_SIMD_FIX_P2TS_ISSUE66 1 +// }}} + +#endif // __cplusplus >= 201703L +#endif // _GLIBCXX_EXPERIMENTAL_SIMD_DETAIL_H_ + +// vim: foldmethod=marker diff --git a/libstdc++-v3/include/experimental/bits/simd_fixed_size.h b/libstdc++-v3/include/experimental/bits/simd_fixed_size.h new file mode 100644 index 00000000000..fba8c7e466e --- /dev/null +++ b/libstdc++-v3/include/experimental/bits/simd_fixed_size.h @@ -0,0 +1,2066 @@ +// Simd fixed_size ABI specific implementations -*- C++ -*- + +// Copyright (C) 2020 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// Under Section 7 of GPL version 3, you are granted additional +// permissions described in the GCC Runtime Library Exception, version +// 3.1, as published by the Free Software Foundation. + +// You should have received a copy of the GNU General Public License and +// a copy of the GCC Runtime Library Exception along with this program; +// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +// <http://www.gnu.org/licenses/>. + +/* + * The fixed_size ABI gives the following guarantees: + * - simd objects are passed via the stack + * - memory layout of `simd<_Tp, _Np>` is equivalent to `array<_Tp, _Np>` + * - alignment of `simd<_Tp, _Np>` is `_Np * sizeof(_Tp)` if _Np is a + * power-of-2 value, otherwise `std::__bit_ceil(_Np * sizeof(_Tp))` (Note: + * if the alignment were to exceed the system/compiler maximum, it is bounded + * to that maximum) + * - simd_mask objects are passed like bitset<_Np> + * - memory layout of `simd_mask<_Tp, _Np>` is equivalent to `bitset<_Np>` + * - alignment of `simd_mask<_Tp, _Np>` is equal to the alignment of + * `bitset<_Np>` + */ + +#ifndef _GLIBCXX_EXPERIMENTAL_SIMD_FIXED_SIZE_H_ +#define _GLIBCXX_EXPERIMENTAL_SIMD_FIXED_SIZE_H_ + +#if __cplusplus >= 201703L + +#include <array> + +_GLIBCXX_SIMD_BEGIN_NAMESPACE + +// __simd_tuple_element {{{ +template <size_t _I, typename _Tp> + struct __simd_tuple_element; + +template <typename _Tp, typename _A0, typename... 
_As> + struct __simd_tuple_element<0, _SimdTuple<_Tp, _A0, _As...>> + { using type = simd<_Tp, _A0>; }; + +template <size_t _I, typename _Tp, typename _A0, typename... _As> + struct __simd_tuple_element<_I, _SimdTuple<_Tp, _A0, _As...>> + { + using type = + typename __simd_tuple_element<_I - 1, _SimdTuple<_Tp, _As...>>::type; + }; + +template <size_t _I, typename _Tp> + using __simd_tuple_element_t = typename __simd_tuple_element<_I, _Tp>::type; + +// }}} +// __simd_tuple_concat {{{ + +template <typename _Tp, typename... _A0s, typename... _A1s> + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdTuple<_Tp, _A0s..., _A1s...> + __simd_tuple_concat(const _SimdTuple<_Tp, _A0s...>& __left, + const _SimdTuple<_Tp, _A1s...>& __right) + { + if constexpr (sizeof...(_A0s) == 0) + return __right; + else if constexpr (sizeof...(_A1s) == 0) + return __left; + else + return {__left.first, __simd_tuple_concat(__left.second, __right)}; + } + +template <typename _Tp, typename _A10, typename... _A1s> + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdTuple<_Tp, simd_abi::scalar, _A10, + _A1s...> + __simd_tuple_concat(const _Tp& __left, + const _SimdTuple<_Tp, _A10, _A1s...>& __right) + { return {__left, __right}; } + +// }}} +// __simd_tuple_pop_front {{{ +// Returns the tail of __x, i.e. __x with its first _Np elements dropped. +// Precondition: _Np must match the number of elements in __first (recursively) +template <size_t _Np, typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr decltype(auto) + __simd_tuple_pop_front(_Tp&& __x) + { + if constexpr (_Np == 0) + return static_cast<_Tp&&>(__x); + else + { + using _Up = __remove_cvref_t<_Tp>; + static_assert(_Np >= _Up::_S_first_size); + return __simd_tuple_pop_front<_Np - _Up::_S_first_size>(__x.second); + } + } + +// }}} +// __get_simd_at<_Np> {{{1 +struct __as_simd {}; + +struct __as_simd_tuple {}; + +template <typename _Tp, typename _A0, typename... _Abis> + _GLIBCXX_SIMD_INTRINSIC constexpr simd<_Tp, _A0> + __simd_tuple_get_impl(__as_simd, const _SimdTuple<_Tp, _A0, _Abis...>& __t, + _SizeConstant<0>) + { return {__private_init, __t.first}; } + +template <typename _Tp, typename _A0, typename... _Abis> + _GLIBCXX_SIMD_INTRINSIC constexpr const auto& + __simd_tuple_get_impl(__as_simd_tuple, + const _SimdTuple<_Tp, _A0, _Abis...>& __t, + _SizeConstant<0>) + { return __t.first; } + +template <typename _Tp, typename _A0, typename... _Abis> + _GLIBCXX_SIMD_INTRINSIC constexpr auto& + __simd_tuple_get_impl(__as_simd_tuple, _SimdTuple<_Tp, _A0, _Abis...>& __t, + _SizeConstant<0>) + { return __t.first; } + +template <typename _R, size_t _Np, typename _Tp, typename... _Abis> + _GLIBCXX_SIMD_INTRINSIC constexpr auto + __simd_tuple_get_impl(_R, const _SimdTuple<_Tp, _Abis...>& __t, + _SizeConstant<_Np>) + { return __simd_tuple_get_impl(_R(), __t.second, _SizeConstant<_Np - 1>()); } + +template <size_t _Np, typename _Tp, typename... _Abis> + _GLIBCXX_SIMD_INTRINSIC constexpr auto& + __simd_tuple_get_impl(__as_simd_tuple, _SimdTuple<_Tp, _Abis...>& __t, + _SizeConstant<_Np>) + { + return __simd_tuple_get_impl(__as_simd_tuple(), __t.second, + _SizeConstant<_Np - 1>()); + } + +template <size_t _Np, typename _Tp, typename... _Abis> + _GLIBCXX_SIMD_INTRINSIC constexpr auto + __get_simd_at(const _SimdTuple<_Tp, _Abis...>& __t) + { return __simd_tuple_get_impl(__as_simd(), __t, _SizeConstant<_Np>()); } + +// }}} +// __get_tuple_at<_Np> {{{ +template <size_t _Np, typename _Tp, typename... 
_Abis> + _GLIBCXX_SIMD_INTRINSIC constexpr auto + __get_tuple_at(const _SimdTuple<_Tp, _Abis...>& __t) + { + return __simd_tuple_get_impl(__as_simd_tuple(), __t, _SizeConstant<_Np>()); + } + +template <size_t _Np, typename _Tp, typename... _Abis> + _GLIBCXX_SIMD_INTRINSIC constexpr auto& + __get_tuple_at(_SimdTuple<_Tp, _Abis...>& __t) + { + return __simd_tuple_get_impl(__as_simd_tuple(), __t, _SizeConstant<_Np>()); + } + +// __tuple_element_meta {{{1 +template <typename _Tp, typename _Abi, size_t _Offset> + struct __tuple_element_meta : public _Abi::_SimdImpl + { + static_assert(is_same_v<typename _Abi::_SimdImpl::abi_type, + _Abi>); // this fails e.g. when _SimdImpl is an + // alias for _SimdImplBuiltin<_DifferentAbi> + using value_type = _Tp; + using abi_type = _Abi; + using _Traits = _SimdTraits<_Tp, _Abi>; + using _MaskImpl = typename _Abi::_MaskImpl; + using _MaskMember = typename _Traits::_MaskMember; + using simd_type = simd<_Tp, _Abi>; + static constexpr size_t _S_offset = _Offset; + static constexpr size_t _S_size() { return simd_size<_Tp, _Abi>::value; } + static constexpr _MaskImpl _S_mask_impl = {}; + + template <size_t _Np, bool _Sanitized> + _GLIBCXX_SIMD_INTRINSIC static auto + _S_submask(_BitMask<_Np, _Sanitized> __bits) + { return __bits.template _M_extract<_Offset, _S_size()>(); } + + template <size_t _Np, bool _Sanitized> + _GLIBCXX_SIMD_INTRINSIC static _MaskMember + _S_make_mask(_BitMask<_Np, _Sanitized> __bits) + { + return _MaskImpl::template _S_convert<_Tp>( + __bits.template _M_extract<_Offset, _S_size()>()._M_sanitized()); + } + + _GLIBCXX_SIMD_INTRINSIC static _ULLong + _S_mask_to_shifted_ullong(_MaskMember __k) + { return _MaskImpl::_S_to_bits(__k).to_ullong() << _Offset; } + }; + +template <size_t _Offset, typename _Tp, typename _Abi, typename... 
_As> + __tuple_element_meta<_Tp, _Abi, _Offset> + __make_meta(const _SimdTuple<_Tp, _Abi, _As...>&) + { return {}; } + +// }}}1 +// _WithOffset wrapper class {{{ +template <size_t _Offset, typename _Base> + struct _WithOffset : public _Base + { + static inline constexpr size_t _S_offset = _Offset; + + _GLIBCXX_SIMD_INTRINSIC char* _M_as_charptr() + { + return reinterpret_cast<char*>(this) + + _S_offset * sizeof(typename _Base::value_type); + } + + _GLIBCXX_SIMD_INTRINSIC const char* _M_as_charptr() const + { + return reinterpret_cast<const char*>(this) + + _S_offset * sizeof(typename _Base::value_type); + } + }; + +// make _WithOffset<_WithOffset> ill-formed to use: +template <size_t _O0, size_t _O1, typename _Base> + struct _WithOffset<_O0, _WithOffset<_O1, _Base>> {}; + +template <size_t _Offset, typename _Tp> + decltype(auto) + __add_offset(_Tp& __base) + { return static_cast<_WithOffset<_Offset, __remove_cvref_t<_Tp>>&>(__base); } + +template <size_t _Offset, typename _Tp> + decltype(auto) + __add_offset(const _Tp& __base) + { + return static_cast<const _WithOffset<_Offset, __remove_cvref_t<_Tp>>&>( + __base); + } + +template <size_t _Offset, size_t _ExistingOffset, typename _Tp> + decltype(auto) + __add_offset(_WithOffset<_ExistingOffset, _Tp>& __base) + { + return static_cast<_WithOffset<_Offset + _ExistingOffset, _Tp>&>( + static_cast<_Tp&>(__base)); + } + +template <size_t _Offset, size_t _ExistingOffset, typename _Tp> + decltype(auto) + __add_offset(const _WithOffset<_ExistingOffset, _Tp>& __base) + { + return static_cast<const _WithOffset<_Offset + _ExistingOffset, _Tp>&>( + static_cast<const _Tp&>(__base)); + } + +template <typename _Tp> + constexpr inline size_t __offset = 0; + +template <size_t _Offset, typename _Tp> + constexpr inline size_t __offset<_WithOffset<_Offset, _Tp>> + = _WithOffset<_Offset, _Tp>::_S_offset; + +template <typename _Tp> + constexpr inline size_t __offset<const _Tp> = __offset<_Tp>; + +template <typename _Tp> + constexpr inline size_t __offset<_Tp&> = __offset<_Tp>; + +template <typename _Tp> + constexpr inline size_t __offset<_Tp&&> = __offset<_Tp>; + +// }}} +// _SimdTuple specializations {{{1 +// empty {{{2 +template <typename _Tp> + struct _SimdTuple<_Tp> + { + using value_type = _Tp; + static constexpr size_t _S_tuple_size = 0; + static constexpr size_t _S_size() { return 0; } + }; + +// _SimdTupleData {{{2 +template <typename _FirstType, typename _SecondType> + struct _SimdTupleData + { + _FirstType first; + _SecondType second; + + _GLIBCXX_SIMD_INTRINSIC + constexpr bool _M_is_constprop() const + { + if constexpr (is_class_v<_FirstType>) + return first._M_is_constprop() && second._M_is_constprop(); + else + return __builtin_constant_p(first) && second._M_is_constprop(); + } + }; + +template <typename _FirstType, typename _Tp> + struct _SimdTupleData<_FirstType, _SimdTuple<_Tp>> + { + _FirstType first; + static constexpr _SimdTuple<_Tp> second = {}; + + _GLIBCXX_SIMD_INTRINSIC + constexpr bool _M_is_constprop() const + { + if constexpr (is_class_v<_FirstType>) + return first._M_is_constprop(); + else + return __builtin_constant_p(first); + } + }; + +// 1 or more {{{2 +template <typename _Tp, typename _Abi0, typename... 
_Abis> + struct _SimdTuple<_Tp, _Abi0, _Abis...> + : _SimdTupleData<typename _SimdTraits<_Tp, _Abi0>::_SimdMember, + _SimdTuple<_Tp, _Abis...>> + { + static_assert(!__is_fixed_size_abi_v<_Abi0>); + using value_type = _Tp; + using _FirstType = typename _SimdTraits<_Tp, _Abi0>::_SimdMember; + using _FirstAbi = _Abi0; + using _SecondType = _SimdTuple<_Tp, _Abis...>; + static constexpr size_t _S_tuple_size = sizeof...(_Abis) + 1; + + static constexpr size_t _S_size() + { return simd_size_v<_Tp, _Abi0> + _SecondType::_S_size(); } + + static constexpr size_t _S_first_size = simd_size_v<_Tp, _Abi0>; + static constexpr bool _S_is_homogeneous = (is_same_v<_Abi0, _Abis> && ...); + + using _Base = _SimdTupleData<typename _SimdTraits<_Tp, _Abi0>::_SimdMember, + _SimdTuple<_Tp, _Abis...>>; + using _Base::first; + using _Base::second; + + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdTuple() = default; + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdTuple(const _SimdTuple&) = default; + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdTuple& operator=(const _SimdTuple&) + = default; + + template <typename _Up> + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdTuple(_Up&& __x) + : _Base{static_cast<_Up&&>(__x)} {} + + template <typename _Up, typename _Up2> + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdTuple(_Up&& __x, _Up2&& __y) + : _Base{static_cast<_Up&&>(__x), static_cast<_Up2&&>(__y)} {} + + template <typename _Up> + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdTuple(_Up&& __x, _SimdTuple<_Tp>) + : _Base{static_cast<_Up&&>(__x)} {} + + _GLIBCXX_SIMD_INTRINSIC char* _M_as_charptr() + { return reinterpret_cast<char*>(this); } + + _GLIBCXX_SIMD_INTRINSIC const char* _M_as_charptr() const + { return reinterpret_cast<const char*>(this); } + + template <size_t _Np> + _GLIBCXX_SIMD_INTRINSIC constexpr auto& _M_at() + { + if constexpr (_Np == 0) + return first; + else + return second.template _M_at<_Np - 1>(); + } + + template <size_t _Np> + _GLIBCXX_SIMD_INTRINSIC constexpr const auto& _M_at() const + { + if constexpr (_Np == 0) + return first; + else + return second.template _M_at<_Np - 1>(); + } + + template <size_t _Np> + _GLIBCXX_SIMD_INTRINSIC constexpr auto _M_simd_at() const + { + if constexpr (_Np == 0) + return simd<_Tp, _Abi0>(__private_init, first); + else + return second.template _M_simd_at<_Np - 1>(); + } + + template <size_t _Offset = 0, typename _Fp> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdTuple + _S_generate(_Fp&& __gen, _SizeConstant<_Offset> = {}) + { + auto&& __first = __gen(__tuple_element_meta<_Tp, _Abi0, _Offset>()); + if constexpr (_S_tuple_size == 1) + return {__first}; + else + return {__first, + _SecondType::_S_generate( + static_cast<_Fp&&>(__gen), + _SizeConstant<_Offset + simd_size_v<_Tp, _Abi0>>())}; + } + + template <size_t _Offset = 0, typename _Fp, typename... _More> + _GLIBCXX_SIMD_INTRINSIC _SimdTuple + _M_apply_wrapped(_Fp&& __fun, const _More&... 
__more) const + { + auto&& __first + = __fun(__make_meta<_Offset>(*this), first, __more.first...); + if constexpr (_S_tuple_size == 1) + return {__first}; + else + return { + __first, + second.template _M_apply_wrapped<_Offset + simd_size_v<_Tp, _Abi0>>( + static_cast<_Fp&&>(__fun), __more.second...)}; + } + + template <typename _Tup> + _GLIBCXX_SIMD_INTRINSIC constexpr decltype(auto) + _M_extract_argument(_Tup&& __tup) const + { + using _TupT = typename __remove_cvref_t<_Tup>::value_type; + if constexpr (is_same_v<_SimdTuple, __remove_cvref_t<_Tup>>) + return __tup.first; + else if (__builtin_is_constant_evaluated()) + return __fixed_size_storage_t<_TupT, _S_first_size>::_S_generate([&]( + auto __meta) constexpr { + return __meta._S_generator( + [&](auto __i) constexpr { return __tup[__i]; }, + static_cast<_TupT*>(nullptr)); + }); + else + return [&]() { + __fixed_size_storage_t<_TupT, _S_first_size> __r; + __builtin_memcpy(__r._M_as_charptr(), __tup._M_as_charptr(), + sizeof(__r)); + return __r; + }(); + } + + template <typename _Tup> + _GLIBCXX_SIMD_INTRINSIC constexpr auto& + _M_skip_argument(_Tup&& __tup) const + { + static_assert(_S_tuple_size > 1); + using _Up = __remove_cvref_t<_Tup>; + constexpr size_t __off = __offset<_Up>; + if constexpr (_S_first_size == _Up::_S_first_size && __off == 0) + return __tup.second; + else if constexpr (_S_first_size > _Up::_S_first_size + && _S_first_size % _Up::_S_first_size == 0 + && __off == 0) + return __simd_tuple_pop_front<_S_first_size>(__tup); + else if constexpr (_S_first_size + __off < _Up::_S_first_size) + return __add_offset<_S_first_size>(__tup); + else if constexpr (_S_first_size + __off == _Up::_S_first_size) + return __tup.second; + else + __assert_unreachable<_Tup>(); + } + + template <size_t _Offset, typename... _More> + _GLIBCXX_SIMD_INTRINSIC constexpr void + _M_assign_front(const _SimdTuple<_Tp, _Abi0, _More...>& __x) & + { + static_assert(_Offset == 0); + first = __x.first; + if constexpr (sizeof...(_More) > 0) + { + static_assert(sizeof...(_Abis) >= sizeof...(_More)); + second.template _M_assign_front<0>(__x.second); + } + } + + template <size_t _Offset> + _GLIBCXX_SIMD_INTRINSIC constexpr void + _M_assign_front(const _FirstType& __x) & + { + static_assert(_Offset == 0); + first = __x; + } + + template <size_t _Offset, typename... _As> + _GLIBCXX_SIMD_INTRINSIC constexpr void + _M_assign_front(const _SimdTuple<_Tp, _As...>& __x) & + { + __builtin_memcpy(_M_as_charptr() + _Offset * sizeof(value_type), + __x._M_as_charptr(), + sizeof(_Tp) * _SimdTuple<_Tp, _As...>::_S_size()); + } + + /* + * Iterate over the first objects in this _SimdTuple and call __fun for each + * of them. If additional arguments are passed via __more, chunk them into + * _SimdTuple or __vector_type_t objects of the same number of values. + */ + template <typename _Fp, typename... _More> + _GLIBCXX_SIMD_INTRINSIC constexpr _SimdTuple + _M_apply_per_chunk(_Fp&& __fun, _More&&... __more) const + { + if constexpr ((... + || conjunction_v< + is_lvalue_reference<_More>, + negation<is_const<remove_reference_t<_More>>>>) ) + { + // need to write back at least one of __more after calling __fun + auto&& __first = [&](auto... 
__args) constexpr + { + auto __r = __fun(__tuple_element_meta<_Tp, _Abi0, 0>(), first, + __args...); + [[maybe_unused]] auto&& __ignore_me = {( + [](auto&& __dst, const auto& __src) { + if constexpr (is_assignable_v<decltype(__dst), + decltype(__dst)>) + { + __dst.template _M_assign_front<__offset<decltype(__dst)>>( + __src); + } + }(static_cast<_More&&>(__more), __args), + 0)...}; + return __r; + } + (_M_extract_argument(__more)...); + if constexpr (_S_tuple_size == 1) + return {__first}; + else + return {__first, + second._M_apply_per_chunk(static_cast<_Fp&&>(__fun), + _M_skip_argument(__more)...)}; + } + else + { + auto&& __first = __fun(__tuple_element_meta<_Tp, _Abi0, 0>(), first, + _M_extract_argument(__more)...); + if constexpr (_S_tuple_size == 1) + return {__first}; + else + return {__first, + second._M_apply_per_chunk(static_cast<_Fp&&>(__fun), + _M_skip_argument(__more)...)}; + } + } + + template <typename _R = _Tp, typename _Fp, typename... _More> + _GLIBCXX_SIMD_INTRINSIC auto _M_apply_r(_Fp&& __fun, + const _More&... __more) const + { + auto&& __first = __fun(__tuple_element_meta<_Tp, _Abi0, 0>(), first, + __more.first...); + if constexpr (_S_tuple_size == 1) + return __first; + else + return __simd_tuple_concat<_R>( + __first, second.template _M_apply_r<_R>(static_cast<_Fp&&>(__fun), + __more.second...)); + } + + template <typename _Fp, typename... _More> + _GLIBCXX_SIMD_INTRINSIC constexpr friend _SanitizedBitMask<_S_size()> + _M_test(const _Fp& __fun, const _SimdTuple& __x, const _More&... __more) + { + const _SanitizedBitMask<_S_first_size> __first + = _Abi0::_MaskImpl::_S_to_bits( + __fun(__tuple_element_meta<_Tp, _Abi0, 0>(), __x.first, + __more.first...)); + if constexpr (_S_tuple_size == 1) + return __first; + else + return _M_test(__fun, __x.second, __more.second...) + ._M_prepend(__first); + } + + template <typename _Up, _Up _I> + _GLIBCXX_SIMD_INTRINSIC constexpr _Tp + operator[](integral_constant<_Up, _I>) const noexcept + { + if constexpr (_I < simd_size_v<_Tp, _Abi0>) + return _M_subscript_read(_I); + else + return second[integral_constant<_Up, _I - simd_size_v<_Tp, _Abi0>>()]; + } + + _Tp operator[](size_t __i) const noexcept + { + if constexpr (_S_tuple_size == 1) + return _M_subscript_read(__i); + else + { +#ifdef _GLIBCXX_SIMD_USE_ALIASING_LOADS + return reinterpret_cast<const __may_alias<_Tp>*>(this)[__i]; +#else + if constexpr (__is_scalar_abi<_Abi0>()) + { + const _Tp* ptr = &first; + return ptr[__i]; + } + else + return __i < simd_size_v<_Tp, _Abi0> + ? 
_M_subscript_read(__i) + : second[__i - simd_size_v<_Tp, _Abi0>]; +#endif + } + } + + void _M_set(size_t __i, _Tp __val) noexcept + { + if constexpr (_S_tuple_size == 1) + return _M_subscript_write(__i, __val); + else + { +#ifdef _GLIBCXX_SIMD_USE_ALIASING_LOADS + reinterpret_cast<__may_alias<_Tp>*>(this)[__i] = __val; +#else + if (__i < simd_size_v<_Tp, _Abi0>) + _M_subscript_write(__i, __val); + else + second._M_set(__i - simd_size_v<_Tp, _Abi0>, __val); +#endif + } + } + + private: + // _M_subscript_read/_write {{{ + _Tp _M_subscript_read([[maybe_unused]] size_t __i) const noexcept + { + if constexpr (__is_vectorizable_v<_FirstType>) + return first; + else + return first[__i]; + } + + void _M_subscript_write([[maybe_unused]] size_t __i, _Tp __y) noexcept + { + if constexpr (__is_vectorizable_v<_FirstType>) + first = __y; + else + first._M_set(__i, __y); + } + + // }}} + }; + +// __make_simd_tuple {{{1 +template <typename _Tp, typename _A0> + _GLIBCXX_SIMD_INTRINSIC _SimdTuple<_Tp, _A0> + __make_simd_tuple(simd<_Tp, _A0> __x0) + { return {__data(__x0)}; } + +template <typename _Tp, typename _A0, typename... _As> + _GLIBCXX_SIMD_INTRINSIC _SimdTuple<_Tp, _A0, _As...> + __make_simd_tuple(const simd<_Tp, _A0>& __x0, const simd<_Tp, _As>&... __xs) + { return {__data(__x0), __make_simd_tuple(__xs...)}; } + +template <typename _Tp, typename _A0> + _GLIBCXX_SIMD_INTRINSIC _SimdTuple<_Tp, _A0> + __make_simd_tuple(const typename _SimdTraits<_Tp, _A0>::_SimdMember& __arg0) + { return {__arg0}; } + +template <typename _Tp, typename _A0, typename _A1, typename... _Abis> + _GLIBCXX_SIMD_INTRINSIC _SimdTuple<_Tp, _A0, _A1, _Abis...> + __make_simd_tuple( + const typename _SimdTraits<_Tp, _A0>::_SimdMember& __arg0, + const typename _SimdTraits<_Tp, _A1>::_SimdMember& __arg1, + const typename _SimdTraits<_Tp, _Abis>::_SimdMember&... __args) + { return {__arg0, __make_simd_tuple<_Tp, _A1, _Abis...>(__arg1, __args...)}; } + +// __to_simd_tuple {{{1 +template <typename _Tp, size_t _Np, typename _V, size_t _NV, typename... _VX> + _GLIBCXX_SIMD_INTRINSIC constexpr __fixed_size_storage_t<_Tp, _Np> + __to_simd_tuple(const array<_V, _NV>& __from, const _VX... __fromX); + +template <typename _Tp, size_t _Np, + size_t _Offset = 0, // skip this many elements in __from0 + typename _R = __fixed_size_storage_t<_Tp, _Np>, typename _V0, + typename _V0VT = _VectorTraits<_V0>, typename... _VX> + _GLIBCXX_SIMD_INTRINSIC _R constexpr __to_simd_tuple(const _V0 __from0, + const _VX... 
__fromX) + { + static_assert(is_same_v<typename _V0VT::value_type, _Tp>); + static_assert(_Offset < _V0VT::_S_full_size); + using _R0 = __vector_type_t<_Tp, _R::_S_first_size>; + if constexpr (_R::_S_tuple_size == 1) + { + if constexpr (_Np == 1) + return _R{__from0[_Offset]}; + else if constexpr (_Offset == 0 && _V0VT::_S_full_size >= _Np) + return _R{__intrin_bitcast<_R0>(__from0)}; + else if constexpr (_Offset * 2 == _V0VT::_S_full_size + && _V0VT::_S_full_size / 2 >= _Np) + return _R{__intrin_bitcast<_R0>(__extract_part<1, 2>(__from0))}; + else if constexpr (_Offset * 4 == _V0VT::_S_full_size + && _V0VT::_S_full_size / 4 >= _Np) + return _R{__intrin_bitcast<_R0>(__extract_part<1, 4>(__from0))}; + else + __assert_unreachable<_Tp>(); + } + else + { + if constexpr (1 == _R::_S_first_size) + { // extract one scalar and recurse + if constexpr (_Offset + 1 < _V0VT::_S_full_size) + return _R{__from0[_Offset], + __to_simd_tuple<_Tp, _Np - 1, _Offset + 1>(__from0, + __fromX...)}; + else + return _R{__from0[_Offset], + __to_simd_tuple<_Tp, _Np - 1, 0>(__fromX...)}; + } + + // place __from0 into _R::first and recurse for __fromX -> _R::second + else if constexpr (_V0VT::_S_full_size == _R::_S_first_size + && _Offset == 0) + return _R{__from0, + __to_simd_tuple<_Tp, _Np - _R::_S_first_size>(__fromX...)}; + + // place lower part of __from0 into _R::first and recurse with _Offset + else if constexpr (_V0VT::_S_full_size > _R::_S_first_size + && _Offset == 0) + return _R{__intrin_bitcast<_R0>(__from0), + __to_simd_tuple<_Tp, _Np - _R::_S_first_size, + _R::_S_first_size>(__from0, __fromX...)}; + + // place lower part of second quarter of __from0 into _R::first and + // recurse with _Offset + else if constexpr (_Offset * 4 == _V0VT::_S_full_size + && _V0VT::_S_full_size >= 4 * _R::_S_first_size) + return _R{__intrin_bitcast<_R0>(__extract_part<2, 4>(__from0)), + __to_simd_tuple<_Tp, _Np - _R::_S_first_size, + _Offset + _R::_S_first_size>(__from0, + __fromX...)}; + + // place lower half of high half of __from0 into _R::first and recurse + // with _Offset + else if constexpr (_Offset * 2 == _V0VT::_S_full_size + && _V0VT::_S_full_size >= 4 * _R::_S_first_size) + return _R{__intrin_bitcast<_R0>(__extract_part<2, 4>(__from0)), + __to_simd_tuple<_Tp, _Np - _R::_S_first_size, + _Offset + _R::_S_first_size>(__from0, + __fromX...)}; + + // place high half of __from0 into _R::first and recurse with __fromX + else if constexpr (_Offset * 2 == _V0VT::_S_full_size + && _V0VT::_S_full_size / 2 >= _R::_S_first_size) + return _R{__intrin_bitcast<_R0>(__extract_part<1, 2>(__from0)), + __to_simd_tuple<_Tp, _Np - _R::_S_first_size, 0>( + __fromX...)}; + + // ill-formed if some unforeseen pattern is needed + else + __assert_unreachable<_Tp>(); + } + } + +template <typename _Tp, size_t _Np, typename _V, size_t _NV, typename... _VX> + _GLIBCXX_SIMD_INTRINSIC constexpr __fixed_size_storage_t<_Tp, _Np> + __to_simd_tuple(const array<_V, _NV>& __from, const _VX... __fromX) + { + if constexpr (is_same_v<_Tp, _V>) + { + static_assert( + sizeof...(_VX) == 0, + "An array of scalars must be the last argument to __to_simd_tuple"); + return __call_with_subscripts( + __from, + make_index_sequence<_NV>(), [&](const auto... __args) constexpr { + return __simd_tuple_concat( + _SimdTuple<_Tp, simd_abi::scalar>{__args}..., _SimdTuple<_Tp>()); + }); + } + else + return __call_with_subscripts( + __from, + make_index_sequence<_NV>(), [&](const auto... 
__args) constexpr { + return __to_simd_tuple<_Tp, _Np>(__args..., __fromX...); + }); + } + +template <size_t, typename _Tp> + using __to_tuple_helper = _Tp; + +template <typename _Tp, typename _A0, size_t _NOut, size_t _Np, + size_t... _Indexes> + _GLIBCXX_SIMD_INTRINSIC __fixed_size_storage_t<_Tp, _NOut> + __to_simd_tuple_impl(index_sequence<_Indexes...>, + const array<__vector_type_t<_Tp, simd_size_v<_Tp, _A0>>, _Np>& __args) + { + return __make_simd_tuple<_Tp, __to_tuple_helper<_Indexes, _A0>...>( + __args[_Indexes]...); + } + +template <typename _Tp, typename _A0, size_t _NOut, size_t _Np, + typename _R = __fixed_size_storage_t<_Tp, _NOut>> + _GLIBCXX_SIMD_INTRINSIC _R + __to_simd_tuple_sized( + const array<__vector_type_t<_Tp, simd_size_v<_Tp, _A0>>, _Np>& __args) + { + static_assert(_Np * simd_size_v<_Tp, _A0> >= _NOut); + return __to_simd_tuple_impl<_Tp, _A0, _NOut>( + make_index_sequence<_R::_S_tuple_size>(), __args); + } + +// __optimize_simd_tuple {{{1 +template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC _SimdTuple<_Tp> + __optimize_simd_tuple(const _SimdTuple<_Tp>) + { return {}; } + +template <typename _Tp, typename _Ap> + _GLIBCXX_SIMD_INTRINSIC const _SimdTuple<_Tp, _Ap>& + __optimize_simd_tuple(const _SimdTuple<_Tp, _Ap>& __x) + { return __x; } + +template <typename _Tp, typename _A0, typename _A1, typename... _Abis, + typename _R = __fixed_size_storage_t< + _Tp, _SimdTuple<_Tp, _A0, _A1, _Abis...>::_S_size()>> + _GLIBCXX_SIMD_INTRINSIC _R + __optimize_simd_tuple(const _SimdTuple<_Tp, _A0, _A1, _Abis...>& __x) + { + using _Tup = _SimdTuple<_Tp, _A0, _A1, _Abis...>; + if constexpr (is_same_v<_R, _Tup>) + return __x; + else if constexpr (is_same_v<typename _R::_FirstType, + typename _Tup::_FirstType>) + return {__x.first, __optimize_simd_tuple(__x.second)}; + else if constexpr (__is_scalar_abi<_A0>() + || _A0::template _S_is_partial<_Tp>) + return {__generate_from_n_evaluations<_R::_S_first_size, + typename _R::_FirstType>( + [&](auto __i) { return __x[__i]; }), + __optimize_simd_tuple( + __simd_tuple_pop_front<_R::_S_first_size>(__x))}; + else if constexpr (is_same_v<_A0, _A1> + && _R::_S_first_size == simd_size_v<_Tp, _A0> + simd_size_v<_Tp, _A1>) + return {__concat(__x.template _M_at<0>(), __x.template _M_at<1>()), + __optimize_simd_tuple(__x.second.second)}; + else if constexpr (sizeof...(_Abis) >= 2 + && _R::_S_first_size == (4 * simd_size_v<_Tp, _A0>) + && simd_size_v<_Tp, _A0> == __simd_tuple_element_t< + (sizeof...(_Abis) >= 2 ? 3 : 0), _Tup>::size()) + return { + __concat(__concat(__x.template _M_at<0>(), __x.template _M_at<1>()), + __concat(__x.template _M_at<2>(), __x.template _M_at<3>())), + __optimize_simd_tuple(__x.second.second.second.second)}; + else + { + static_assert(sizeof(_R) == sizeof(__x)); + _R __r; + __builtin_memcpy(__r._M_as_charptr(), __x._M_as_charptr(), + sizeof(_Tp) * _R::_S_size()); + return __r; + } + } + +// __for_each(const _SimdTuple &, Fun) {{{1 +template <size_t _Offset = 0, typename _Tp, typename _A0, typename _Fp> + _GLIBCXX_SIMD_INTRINSIC constexpr void + __for_each(const _SimdTuple<_Tp, _A0>& __t, _Fp&& __fun) + { static_cast<_Fp&&>(__fun)(__make_meta<_Offset>(__t), __t.first); } + +template <size_t _Offset = 0, typename _Tp, typename _A0, typename _A1, + typename... 
_As, typename _Fp> + _GLIBCXX_SIMD_INTRINSIC constexpr void + __for_each(const _SimdTuple<_Tp, _A0, _A1, _As...>& __t, _Fp&& __fun) + { + __fun(__make_meta<_Offset>(__t), __t.first); + __for_each<_Offset + simd_size<_Tp, _A0>::value>(__t.second, + static_cast<_Fp&&>(__fun)); + } + +// __for_each(_SimdTuple &, Fun) {{{1 +template <size_t _Offset = 0, typename _Tp, typename _A0, typename _Fp> + _GLIBCXX_SIMD_INTRINSIC constexpr void + __for_each(_SimdTuple<_Tp, _A0>& __t, _Fp&& __fun) + { static_cast<_Fp&&>(__fun)(__make_meta<_Offset>(__t), __t.first); } + +template <size_t _Offset = 0, typename _Tp, typename _A0, typename _A1, + typename... _As, typename _Fp> + _GLIBCXX_SIMD_INTRINSIC constexpr void + __for_each(_SimdTuple<_Tp, _A0, _A1, _As...>& __t, _Fp&& __fun) + { + __fun(__make_meta<_Offset>(__t), __t.first); + __for_each<_Offset + simd_size<_Tp, _A0>::value>(__t.second, + static_cast<_Fp&&>(__fun)); + } + +// __for_each(_SimdTuple &, const _SimdTuple &, Fun) {{{1 +template <size_t _Offset = 0, typename _Tp, typename _A0, typename _Fp> + _GLIBCXX_SIMD_INTRINSIC constexpr void + __for_each(_SimdTuple<_Tp, _A0>& __a, const _SimdTuple<_Tp, _A0>& __b, + _Fp&& __fun) + { + static_cast<_Fp&&>(__fun)(__make_meta<_Offset>(__a), __a.first, __b.first); + } + +template <size_t _Offset = 0, typename _Tp, typename _A0, typename _A1, + typename... _As, typename _Fp> + _GLIBCXX_SIMD_INTRINSIC constexpr void + __for_each(_SimdTuple<_Tp, _A0, _A1, _As...>& __a, + const _SimdTuple<_Tp, _A0, _A1, _As...>& __b, _Fp&& __fun) + { + __fun(__make_meta<_Offset>(__a), __a.first, __b.first); + __for_each<_Offset + simd_size<_Tp, _A0>::value>(__a.second, __b.second, + static_cast<_Fp&&>(__fun)); + } + +// __for_each(const _SimdTuple &, const _SimdTuple &, Fun) {{{1 +template <size_t _Offset = 0, typename _Tp, typename _A0, typename _Fp> + _GLIBCXX_SIMD_INTRINSIC constexpr void + __for_each(const _SimdTuple<_Tp, _A0>& __a, const _SimdTuple<_Tp, _A0>& __b, + _Fp&& __fun) + { + static_cast<_Fp&&>(__fun)(__make_meta<_Offset>(__a), __a.first, __b.first); + } + +template <size_t _Offset = 0, typename _Tp, typename _A0, typename _A1, + typename... _As, typename _Fp> + _GLIBCXX_SIMD_INTRINSIC constexpr void + __for_each(const _SimdTuple<_Tp, _A0, _A1, _As...>& __a, + const _SimdTuple<_Tp, _A0, _A1, _As...>& __b, _Fp&& __fun) + { + __fun(__make_meta<_Offset>(__a), __a.first, __b.first); + __for_each<_Offset + simd_size<_Tp, _A0>::value>(__a.second, __b.second, + static_cast<_Fp&&>(__fun)); + } + +// }}}1 +// __extract_part(_SimdTuple) {{{ +template <int _Index, int _Total, int _Combine, typename _Tp, typename _A0, + typename... 
_As> + _GLIBCXX_SIMD_INTRINSIC auto // __vector_type_t or _SimdTuple + __extract_part(const _SimdTuple<_Tp, _A0, _As...>& __x) + { + // worst cases: + // (a) 4, 4, 4 => 3, 3, 3, 3 (_Total = 4) + // (b) 2, 2, 2 => 3, 3 (_Total = 2) + // (c) 4, 2 => 2, 2, 2 (_Total = 3) + using _Tuple = _SimdTuple<_Tp, _A0, _As...>; + static_assert(_Index + _Combine <= _Total && _Index >= 0 && _Total >= 1); + constexpr size_t _Np = _Tuple::_S_size(); + static_assert(_Np >= _Total && _Np % _Total == 0); + constexpr size_t __values_per_part = _Np / _Total; + [[maybe_unused]] constexpr size_t __values_to_skip + = _Index * __values_per_part; + constexpr size_t __return_size = __values_per_part * _Combine; + using _RetAbi = simd_abi::deduce_t<_Tp, __return_size>; + + // handle (optimize) the simple cases + if constexpr (_Index == 0 && _Tuple::_S_first_size == __return_size) + return __x.first._M_data; + else if constexpr (_Index == 0 && _Total == _Combine) + return __x; + else if constexpr (_Index == 0 && _Tuple::_S_first_size >= __return_size) + return __intrin_bitcast<__vector_type_t<_Tp, __return_size>>( + __as_vector(__x.first)); + + // recurse to skip unused data members at the beginning of _SimdTuple + else if constexpr (__values_to_skip >= _Tuple::_S_first_size) + { // recurse + if constexpr (_Tuple::_S_first_size % __values_per_part == 0) + { + constexpr int __parts_in_first + = _Tuple::_S_first_size / __values_per_part; + return __extract_part<_Index - __parts_in_first, + _Total - __parts_in_first, _Combine>( + __x.second); + } + else + return __extract_part<__values_to_skip - _Tuple::_S_first_size, + _Np - _Tuple::_S_first_size, __return_size>( + __x.second); + } + + // extract from multiple _SimdTuple data members + else if constexpr (__return_size > _Tuple::_S_first_size - __values_to_skip) + { +#ifdef _GLIBCXX_SIMD_USE_ALIASING_LOADS + const __may_alias<_Tp>* const element_ptr + = reinterpret_cast<const __may_alias<_Tp>*>(&__x) + __values_to_skip; + return __as_vector(simd<_Tp, _RetAbi>(element_ptr, element_aligned)); +#else + [[maybe_unused]] constexpr size_t __offset = __values_to_skip; + return __as_vector(simd<_Tp, _RetAbi>([&](auto __i) constexpr { + constexpr _SizeConstant<__i + __offset> __k; + return __x[__k]; + })); +#endif + } + + // all of the return values are in __x.first + else if constexpr (_Tuple::_S_first_size % __values_per_part == 0) + return __extract_part<_Index, _Tuple::_S_first_size / __values_per_part, + _Combine>(__x.first); + else + return __extract_part<__values_to_skip, _Tuple::_S_first_size, + _Combine * __values_per_part>(__x.first); + } + +// }}} +// __fixed_size_storage_t<_Tp, _Np>{{{ +template <typename _Tp, int _Np, typename _Tuple, + typename _Next = simd<_Tp, _AllNativeAbis::_BestAbi<_Tp, _Np>>, + int _Remain = _Np - int(_Next::size())> + struct __fixed_size_storage_builder; + +template <typename _Tp, int _Np> + struct __fixed_size_storage + : public __fixed_size_storage_builder<_Tp, _Np, _SimdTuple<_Tp>> {}; + +template <typename _Tp, int _Np, typename... _As, typename _Next> + struct __fixed_size_storage_builder<_Tp, _Np, _SimdTuple<_Tp, _As...>, _Next, + 0> + { using type = _SimdTuple<_Tp, _As..., typename _Next::abi_type>; }; + +template <typename _Tp, int _Np, typename... 
_As, typename _Next, int _Remain> + struct __fixed_size_storage_builder<_Tp, _Np, _SimdTuple<_Tp, _As...>, _Next, + _Remain> + { + using type = typename __fixed_size_storage_builder< + _Tp, _Remain, _SimdTuple<_Tp, _As..., typename _Next::abi_type>>::type; + }; + +// }}} +// _AbisInSimdTuple {{{ +template <typename _Tp> + struct _SeqOp; + +template <size_t _I0, size_t... _Is> + struct _SeqOp<index_sequence<_I0, _Is...>> + { + using _FirstPlusOne = index_sequence<_I0 + 1, _Is...>; + using _NotFirstPlusOne = index_sequence<_I0, (_Is + 1)...>; + template <size_t _First, size_t _Add> + using _Prepend = index_sequence<_First, _I0 + _Add, (_Is + _Add)...>; + }; + +template <typename _Tp> + struct _AbisInSimdTuple; + +template <typename _Tp> + struct _AbisInSimdTuple<_SimdTuple<_Tp>> + { + using _Counts = index_sequence<0>; + using _Begins = index_sequence<0>; + }; + +template <typename _Tp, typename _Ap> + struct _AbisInSimdTuple<_SimdTuple<_Tp, _Ap>> + { + using _Counts = index_sequence<1>; + using _Begins = index_sequence<0>; + }; + +template <typename _Tp, typename _A0, typename... _As> + struct _AbisInSimdTuple<_SimdTuple<_Tp, _A0, _A0, _As...>> + { + using _Counts = typename _SeqOp<typename _AbisInSimdTuple< + _SimdTuple<_Tp, _A0, _As...>>::_Counts>::_FirstPlusOne; + using _Begins = typename _SeqOp<typename _AbisInSimdTuple< + _SimdTuple<_Tp, _A0, _As...>>::_Begins>::_NotFirstPlusOne; + }; + +template <typename _Tp, typename _A0, typename _A1, typename... _As> + struct _AbisInSimdTuple<_SimdTuple<_Tp, _A0, _A1, _As...>> + { + using _Counts = typename _SeqOp<typename _AbisInSimdTuple< + _SimdTuple<_Tp, _A1, _As...>>::_Counts>::template _Prepend<1, 0>; + using _Begins = typename _SeqOp<typename _AbisInSimdTuple< + _SimdTuple<_Tp, _A1, _As...>>::_Begins>::template _Prepend<0, 1>; + }; + +// }}} +// __autocvt_to_simd {{{ +template <typename _Tp, bool = is_arithmetic_v<__remove_cvref_t<_Tp>>> + struct __autocvt_to_simd + { + _Tp _M_data; + using _TT = __remove_cvref_t<_Tp>; + + operator _TT() + { return _M_data; } + + operator _TT&() + { + static_assert(is_lvalue_reference<_Tp>::value, ""); + static_assert(!is_const<_Tp>::value, ""); + return _M_data; + } + + operator _TT*() + { + static_assert(is_lvalue_reference<_Tp>::value, ""); + static_assert(!is_const<_Tp>::value, ""); + return &_M_data; + } + + constexpr inline __autocvt_to_simd(_Tp dd) : _M_data(dd) {} + + template <typename _Abi> + operator simd<typename _TT::value_type, _Abi>() + { return {__private_init, _M_data}; } + + template <typename _Abi> + operator simd<typename _TT::value_type, _Abi>&() + { + return *reinterpret_cast<simd<typename _TT::value_type, _Abi>*>( + &_M_data); + } + + template <typename _Abi> + operator simd<typename _TT::value_type, _Abi>*() + { + return reinterpret_cast<simd<typename _TT::value_type, _Abi>*>( + &_M_data); + } + }; + +template <typename _Tp> + __autocvt_to_simd(_Tp &&) -> __autocvt_to_simd<_Tp>; + +template <typename _Tp> + struct __autocvt_to_simd<_Tp, true> + { + using _TT = __remove_cvref_t<_Tp>; + _Tp _M_data; + fixed_size_simd<_TT, 1> _M_fd; + + constexpr inline __autocvt_to_simd(_Tp dd) : _M_data(dd), _M_fd(_M_data) {} + + ~__autocvt_to_simd() + { _M_data = __data(_M_fd).first; } + + operator fixed_size_simd<_TT, 1>() + { return _M_fd; } + + operator fixed_size_simd<_TT, 1> &() + { + static_assert(is_lvalue_reference<_Tp>::value, ""); + static_assert(!is_const<_Tp>::value, ""); + return _M_fd; + } + + operator fixed_size_simd<_TT, 1> *() + { + static_assert(is_lvalue_reference<_Tp>::value, ""); 
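+ // Editorial note (not in the original commit): as in the conversion + // operators above, a mutable lvalue is required because the destructor + // writes _M_fd back into the referenced scalar.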
+ static_assert(!is_const<_Tp>::value, ""); + return &_M_fd; + } + }; + +// }}} + +struct _CommonImplFixedSize; +template <int _Np> struct _SimdImplFixedSize; +template <int _Np> struct _MaskImplFixedSize; +// simd_abi::_Fixed {{{ +template <int _Np> + struct simd_abi::_Fixed + { + template <typename _Tp> static constexpr size_t _S_size = _Np; + template <typename _Tp> static constexpr size_t _S_full_size = _Np; + // validity traits {{{ + struct _IsValidAbiTag : public __bool_constant<(_Np > 0)> {}; + + template <typename _Tp> + struct _IsValidSizeFor + : __bool_constant<(_Np <= simd_abi::max_fixed_size<_Tp>)> {}; + + template <typename _Tp> + struct _IsValid : conjunction<_IsValidAbiTag, __is_vectorizable<_Tp>, + _IsValidSizeFor<_Tp>> {}; + + template <typename _Tp> + static constexpr bool _S_is_valid_v = _IsValid<_Tp>::value; + + // }}} + // _S_masked {{{ + _GLIBCXX_SIMD_INTRINSIC static constexpr _SanitizedBitMask<_Np> + _S_masked(_BitMask<_Np> __x) + { return __x._M_sanitized(); } + + _GLIBCXX_SIMD_INTRINSIC static constexpr _SanitizedBitMask<_Np> + _S_masked(_SanitizedBitMask<_Np> __x) + { return __x; } + + // }}} + // _*Impl {{{ + using _CommonImpl = _CommonImplFixedSize; + using _SimdImpl = _SimdImplFixedSize<_Np>; + using _MaskImpl = _MaskImplFixedSize<_Np>; + + // }}} + // __traits {{{ + template <typename _Tp, bool = _S_is_valid_v<_Tp>> + struct __traits : _InvalidTraits {}; + + template <typename _Tp> + struct __traits<_Tp, true> + { + using _IsValid = true_type; + using _SimdImpl = _SimdImplFixedSize<_Np>; + using _MaskImpl = _MaskImplFixedSize<_Np>; + + // simd and simd_mask member types {{{ + using _SimdMember = __fixed_size_storage_t<_Tp, _Np>; + using _MaskMember = _SanitizedBitMask<_Np>; + + static constexpr size_t _S_simd_align + = std::__bit_ceil(_Np * sizeof(_Tp)); + + static constexpr size_t _S_mask_align = alignof(_MaskMember); + + // }}} + // _SimdBase / base class for simd, providing extra conversions {{{ + struct _SimdBase + { + // The following ensures that function arguments are passed via the stack. + // This is important for ABI compatibility across TU boundaries. + _SimdBase(const _SimdBase&) {} + _SimdBase() = default; + + explicit operator const _SimdMember &() const + { return static_cast<const simd<_Tp, _Fixed>*>(this)->_M_data; } + + explicit operator array<_Tp, _Np>() const + { + array<_Tp, _Np> __r; + // _SimdMember can be larger because of higher alignment + static_assert(sizeof(__r) <= sizeof(_SimdMember), ""); + __builtin_memcpy(__r.data(), &static_cast<const _SimdMember&>(*this), + sizeof(__r)); + return __r; + } + }; + + // }}} + // _MaskBase {{{ + // empty. The bitset interface suffices + struct _MaskBase {}; + + // }}} + // _SimdCastType {{{ + struct _SimdCastType + { + _SimdCastType(const array<_Tp, _Np>&); + _SimdCastType(const _SimdMember& dd) : _M_data(dd) {} + explicit operator const _SimdMember &() const { return _M_data; } + + private: + const _SimdMember& _M_data; + }; + + // }}} + // _MaskCastType {{{ + class _MaskCastType + { + _MaskCastType() = delete; + }; + // }}} + }; + // }}} + }; + +// }}} +// _CommonImplFixedSize {{{ +struct _CommonImplFixedSize +{ + // _S_store {{{ + template <typename _Tp, typename... 
_As> + _GLIBCXX_SIMD_INTRINSIC static void + _S_store(const _SimdTuple<_Tp, _As...>& __x, void* __addr) + { + constexpr size_t _Np = _SimdTuple<_Tp, _As...>::_S_size(); + __builtin_memcpy(__addr, &__x, _Np * sizeof(_Tp)); + } + + // }}} +}; + +// }}} +// _SimdImplFixedSize {{{1 +// fixed_size should not inherit from _SimdMathFallback in order for +// specializations in the used _SimdTuple Abis to get used +template <int _Np> + struct _SimdImplFixedSize + { + // member types {{{2 + using _MaskMember = _SanitizedBitMask<_Np>; + + template <typename _Tp> + using _SimdMember = __fixed_size_storage_t<_Tp, _Np>; + + template <typename _Tp> + static constexpr size_t _S_tuple_size = _SimdMember<_Tp>::_S_tuple_size; + + template <typename _Tp> + using _Simd = simd<_Tp, simd_abi::fixed_size<_Np>>; + + template <typename _Tp> + using _TypeTag = _Tp*; + + // broadcast {{{2 + template <typename _Tp> + static constexpr inline _SimdMember<_Tp> _S_broadcast(_Tp __x) noexcept + { + return _SimdMember<_Tp>::_S_generate([&](auto __meta) constexpr { + return __meta._S_broadcast(__x); + }); + } + + // _S_generator {{{2 + template <typename _Fp, typename _Tp> + static constexpr inline _SimdMember<_Tp> _S_generator(_Fp&& __gen, + _TypeTag<_Tp>) + { + return _SimdMember<_Tp>::_S_generate([&__gen](auto __meta) constexpr { + return __meta._S_generator( + [&](auto __i) constexpr { + return __i < _Np ? __gen(_SizeConstant<__meta._S_offset + __i>()) + : 0; + }, + _TypeTag<_Tp>()); + }); + } + + // _S_load {{{2 + template <typename _Tp, typename _Up> + static inline _SimdMember<_Tp> _S_load(const _Up* __mem, + _TypeTag<_Tp>) noexcept + { + return _SimdMember<_Tp>::_S_generate([&](auto __meta) { + return __meta._S_load(&__mem[__meta._S_offset], _TypeTag<_Tp>()); + }); + } + + // _S_masked_load {{{2 + template <typename _Tp, typename... _As, typename _Up> + static inline _SimdTuple<_Tp, _As...> + _S_masked_load(const _SimdTuple<_Tp, _As...>& __old, + const _MaskMember __bits, const _Up* __mem) noexcept + { + auto __merge = __old; + __for_each(__merge, [&](auto __meta, auto& __native) { + if (__meta._S_submask(__bits).any()) +#pragma GCC diagnostic push + // __mem + __meta._S_offset could be UB ([expr.add]/4.3), but it punts + // the responsibility for avoiding UB to the caller of the masked load + // via the mask. Consequently, the compiler may assume this branch is + // unreachable if the pointer arithmetic is UB. +#pragma GCC diagnostic ignored "-Warray-bounds" + __native + = __meta._S_masked_load(__native, __meta._S_make_mask(__bits), + __mem + __meta._S_offset); +#pragma GCC diagnostic pop + }); + return __merge; + } + + // _S_store {{{2 + template <typename _Tp, typename _Up> + static inline void _S_store(const _SimdMember<_Tp>& __v, _Up* __mem, + _TypeTag<_Tp>) noexcept + { + __for_each(__v, [&](auto __meta, auto __native) { + __meta._S_store(__native, &__mem[__meta._S_offset], _TypeTag<_Tp>()); + }); + } + + // _S_masked_store {{{2 + template <typename _Tp, typename... _As, typename _Up> + static inline void _S_masked_store(const _SimdTuple<_Tp, _As...>& __v, + _Up* __mem, + const _MaskMember __bits) noexcept + { + __for_each(__v, [&](auto __meta, auto __native) { + if (__meta._S_submask(__bits).any()) +#pragma GCC diagnostic push + // __mem + __meta._S_offset could be UB ([expr.add]/4.3), but it punts + // the responsibility for avoiding UB to the caller of the masked + // store via the mask. Consequently, the compiler may assume this + // branch is unreachable if the pointer arithmetic is UB. 
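+ // (Editorial example, not part of the original commit: with + // fixed_size_simd<float, 8> chunked as 4 + 4 and a store to a buffer of + // only 3 floats with just lanes 0-2 selected, __mem + 4 is never + // evaluated, but GCC may still warn, hence the suppression below.)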
+#pragma GCC diagnostic ignored "-Warray-bounds" + __meta._S_masked_store(__native, __mem + __meta._S_offset, + __meta._S_make_mask(__bits)); +#pragma GCC diagnostic pop + }); + } + + // negation {{{2 + template <typename _Tp, typename... _As> + static inline _MaskMember + _S_negate(const _SimdTuple<_Tp, _As...>& __x) noexcept + { + _MaskMember __bits = 0; + __for_each( + __x, [&__bits](auto __meta, auto __native) constexpr { + __bits + |= __meta._S_mask_to_shifted_ullong(__meta._S_negate(__native)); + }); + return __bits; + } + + // reductions {{{2 + template <typename _Tp, typename _BinaryOperation> + static constexpr inline _Tp _S_reduce(const _Simd<_Tp>& __x, + const _BinaryOperation& __binary_op) + { + using _Tup = _SimdMember<_Tp>; + const _Tup& __tup = __data(__x); + if constexpr (_Tup::_S_tuple_size == 1) + return _Tup::_FirstAbi::_SimdImpl::_S_reduce( + __tup.template _M_simd_at<0>(), __binary_op); + else if constexpr (_Tup::_S_tuple_size == 2 && _Tup::_S_size() > 2 + && _Tup::_SecondType::_S_size() == 1) + { + return __binary_op(simd<_Tp, simd_abi::scalar>( + reduce(__tup.template _M_simd_at<0>(), + __binary_op)), + __tup.template _M_simd_at<1>())[0]; + } + else if constexpr (_Tup::_S_tuple_size == 2 && _Tup::_S_size() > 4 + && _Tup::_SecondType::_S_size() == 2) + { + return __binary_op( + simd<_Tp, simd_abi::scalar>( + reduce(__tup.template _M_simd_at<0>(), __binary_op)), + simd<_Tp, simd_abi::scalar>( + reduce(__tup.template _M_simd_at<1>(), __binary_op)))[0]; + } + else + { + const auto& __x2 = __call_with_n_evaluations< + __div_roundup(_Tup::_S_tuple_size, 2)>( + [](auto __first_simd, auto... __remaining) { + if constexpr (sizeof...(__remaining) == 0) + return __first_simd; + else + { + using _Tup2 + = _SimdTuple<_Tp, + typename decltype(__first_simd)::abi_type, + typename decltype(__remaining)::abi_type...>; + return fixed_size_simd<_Tp, _Tup2::_S_size()>( + __private_init, + __make_simd_tuple(__first_simd, __remaining...)); + } + }, + [&](auto __i) { + auto __left = __tup.template _M_simd_at<2 * __i>(); + if constexpr (2 * __i + 1 == _Tup::_S_tuple_size) + return __left; + else + { + auto __right = __tup.template _M_simd_at<2 * __i + 1>(); + using _LT = decltype(__left); + using _RT = decltype(__right); + if constexpr (_LT::size() == _RT::size()) + return __binary_op(__left, __right); + else + { + _GLIBCXX_SIMD_USE_CONSTEXPR_API + typename _LT::mask_type __k( + __private_init, + [](auto __j) constexpr { return __j < _RT::size(); }); + _LT __ext_right = __left; + where(__k, __ext_right) + = __proposed::resizing_simd_cast<_LT>(__right); + where(__k, __left) = __binary_op(__left, __ext_right); + return __left; + } + } + }); + return reduce(__x2, __binary_op); + } + } + + // _S_min, _S_max {{{2 + template <typename _Tp, typename... _As> + static inline constexpr _SimdTuple<_Tp, _As...> + _S_min(const _SimdTuple<_Tp, _As...>& __a, + const _SimdTuple<_Tp, _As...>& __b) + { + return __a._M_apply_per_chunk( + [](auto __impl, auto __aa, auto __bb) constexpr { + return __impl._S_min(__aa, __bb); + }, + __b); + } + + template <typename _Tp, typename... _As> + static inline constexpr _SimdTuple<_Tp, _As...> + _S_max(const _SimdTuple<_Tp, _As...>& __a, + const _SimdTuple<_Tp, _As...>& __b) + { + return __a._M_apply_per_chunk( + [](auto __impl, auto __aa, auto __bb) constexpr { + return __impl._S_max(__aa, __bb); + }, + __b); + } + + // _S_complement {{{2 + template <typename _Tp, typename... 
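
For unequal chunk sizes the reduction above widens the smaller operand and updates only the lanes that actually carry a right-hand element; from user code all of this is just reduce(). A hedged sketch:

#include <experimental/simd>
#include <functional>
namespace stdx = std::experimental;

int sum13(const int* data)
{
  stdx::fixed_size_simd<int, 13> v;   // e.g. 4+4+4+1 native chunks internally
  v.copy_from(data, stdx::element_aligned);
  return stdx::reduce(v, std::plus<>());   // ends up in _S_reduce above
}
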
_As> + static inline constexpr _SimdTuple<_Tp, _As...> + _S_complement(const _SimdTuple<_Tp, _As...>& __x) noexcept + { + return __x._M_apply_per_chunk([](auto __impl, auto __xx) constexpr { + return __impl._S_complement(__xx); + }); + } + + // _S_unary_minus {{{2 + template <typename _Tp, typename... _As> + static inline constexpr _SimdTuple<_Tp, _As...> + _S_unary_minus(const _SimdTuple<_Tp, _As...>& __x) noexcept + { + return __x._M_apply_per_chunk([](auto __impl, auto __xx) constexpr { + return __impl._S_unary_minus(__xx); + }); + } + + // arithmetic operators {{{2 + +#define _GLIBCXX_SIMD_FIXED_OP(name_, op_) \ + template <typename _Tp, typename... _As> \ + static inline constexpr _SimdTuple<_Tp, _As...> name_( \ + const _SimdTuple<_Tp, _As...> __x, const _SimdTuple<_Tp, _As...> __y) \ + { \ + return __x._M_apply_per_chunk( \ + [](auto __impl, auto __xx, auto __yy) constexpr { \ + return __impl.name_(__xx, __yy); \ + }, \ + __y); \ + } + + _GLIBCXX_SIMD_FIXED_OP(_S_plus, +) + _GLIBCXX_SIMD_FIXED_OP(_S_minus, -) + _GLIBCXX_SIMD_FIXED_OP(_S_multiplies, *) + _GLIBCXX_SIMD_FIXED_OP(_S_divides, /) + _GLIBCXX_SIMD_FIXED_OP(_S_modulus, %) + _GLIBCXX_SIMD_FIXED_OP(_S_bit_and, &) + _GLIBCXX_SIMD_FIXED_OP(_S_bit_or, |) + _GLIBCXX_SIMD_FIXED_OP(_S_bit_xor, ^) + _GLIBCXX_SIMD_FIXED_OP(_S_bit_shift_left, <<) + _GLIBCXX_SIMD_FIXED_OP(_S_bit_shift_right, >>) +#undef _GLIBCXX_SIMD_FIXED_OP + + template <typename _Tp, typename... _As> + static inline constexpr _SimdTuple<_Tp, _As...> + _S_bit_shift_left(const _SimdTuple<_Tp, _As...>& __x, int __y) + { + return __x._M_apply_per_chunk([__y](auto __impl, auto __xx) constexpr { + return __impl._S_bit_shift_left(__xx, __y); + }); + } + + template <typename _Tp, typename... _As> + static inline constexpr _SimdTuple<_Tp, _As...> + _S_bit_shift_right(const _SimdTuple<_Tp, _As...>& __x, int __y) + { + return __x._M_apply_per_chunk([__y](auto __impl, auto __xx) constexpr { + return __impl._S_bit_shift_right(__xx, __y); + }); + } + + // math {{{2 +#define _GLIBCXX_SIMD_APPLY_ON_TUPLE(_RetTp, __name) \ + template <typename _Tp, typename... _As, typename... _More> \ + static inline __fixed_size_storage_t<_RetTp, _Np> \ + _S_##__name(const _SimdTuple<_Tp, _As...>& __x, \ + const _More&... __more) \ + { \ + if constexpr (sizeof...(_More) == 0) \ + { \ + if constexpr (is_same_v<_Tp, _RetTp>) \ + return __x._M_apply_per_chunk( \ + [](auto __impl, auto __xx) constexpr { \ + using _V = typename decltype(__impl)::simd_type; \ + return __data(__name(_V(__private_init, __xx))); \ + }); \ + else \ + return __optimize_simd_tuple( \ + __x.template _M_apply_r<_RetTp>([](auto __impl, auto __xx) { \ + return __impl._S_##__name(__xx); \ + })); \ + } \ + else if constexpr ( \ + is_same_v< \ + _Tp, \ + _RetTp> && (... && is_same_v<_SimdTuple<_Tp, _As...>, _More>) ) \ + return __x._M_apply_per_chunk( \ + [](auto __impl, auto __xx, auto... __pack) constexpr { \ + using _V = typename decltype(__impl)::simd_type; \ + return __data(__name(_V(__private_init, __xx), \ + _V(__private_init, __pack)...)); \ + }, \ + __more...); \ + else if constexpr (is_same_v<_Tp, _RetTp>) \ + return __x._M_apply_per_chunk( \ + [](auto __impl, auto __xx, auto... 
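
The macro instantiates the element-wise operator pair; shifting every element by the same int gets the two dedicated per-chunk overloads above. User-visibly (hedged sketch):

#include <experimental/simd>
namespace stdx = std::experimental;

void shifts(stdx::fixed_size_simd<unsigned, 5>& v,
            const stdx::fixed_size_simd<unsigned, 5>& w)
{
  v = v << 3;   // uniform shift -> _S_bit_shift_left(tuple, int)
  v = v << w;   // element-wise  -> the macro-generated overload
}
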
__pack) constexpr { \ + using _V = typename decltype(__impl)::simd_type; \ + return __data(__name(_V(__private_init, __xx), \ + __autocvt_to_simd(__pack)...)); \ + }, \ + __more...); \ + else \ + __assert_unreachable<_Tp>(); \ + } + + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, acos) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, asin) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, atan) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, atan2) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, cos) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, sin) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, tan) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, acosh) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, asinh) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, atanh) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, cosh) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, sinh) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, tanh) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, exp) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, exp2) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, expm1) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(int, ilogb) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, log) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, log10) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, log1p) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, log2) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, logb) + // modf implemented in simd_math.h + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, + scalbn) // double scalbn(double x, int exp); + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, scalbln) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, cbrt) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, abs) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, fabs) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, pow) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, sqrt) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, erf) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, erfc) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, lgamma) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, tgamma) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, trunc) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, ceil) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, floor) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, nearbyint) + + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, rint) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(long, lrint) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(long long, llrint) + + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, round) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(long, lround) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(long long, llround) + + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, ldexp) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, fmod) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, remainder) + // copysign in simd_math.h + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, nextafter) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, fdim) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, fmax) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, fmin) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, fma) + _GLIBCXX_SIMD_APPLY_ON_TUPLE(int, fpclassify) +#undef _GLIBCXX_SIMD_APPLY_ON_TUPLE + + template <typename _Tp, typename... _Abis> + static _SimdTuple<_Tp, _Abis...> _S_remquo( + const _SimdTuple<_Tp, _Abis...>& __x, + const _SimdTuple<_Tp, _Abis...>& __y, + __fixed_size_storage_t<int, _SimdTuple<_Tp, _Abis...>::_S_size()>* __z) + { + return __x._M_apply_per_chunk( + [](auto __impl, const auto __xx, const auto __yy, auto& __zz) { + return __impl._S_remquo(__xx, __yy, &__zz); + }, + __y, *__z); + } + + template <typename _Tp, typename... _As> + static inline _SimdTuple<_Tp, _As...> + _S_frexp(const _SimdTuple<_Tp, _As...>& __x, + __fixed_size_storage_t<int, _Np>& __exp) noexcept + { + return __x._M_apply_per_chunk( + [](auto __impl, const auto& __a, auto& __b) { + return __data( + frexp(typename decltype(__impl)::simd_type(__private_init, __a), + __autocvt_to_simd(__b))); + }, + __exp); + } + +#define _GLIBCXX_SIMD_TEST_ON_TUPLE_(name_) \ + template <typename _Tp, typename... 
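
Each name in this list expands to a per-chunk dispatch, so a single call on a fixed_size simd runs the vectorized <cmath> implementation chunk by chunk; functions whose scalar return type differs from the argument type, such as ilogb, yield an int simd of the same width. A hedged sketch:

#include <experimental/simd>
namespace stdx = std::experimental;

void math_demo(const stdx::fixed_size_simd<float, 6>& x)
{
  auto s = stdx::sin(x);     // fixed_size_simd<float, 6>
  auto e = stdx::ilogb(x);   // fixed_size_simd<int, 6>
  (void) s; (void) e;
}
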
_As> \ + static inline _MaskMember \ + _S_##name_(const _SimdTuple<_Tp, _As...>& __x) noexcept \ + { \ + return _M_test([](auto __impl, \ + auto __xx) { return __impl._S_##name_(__xx); }, \ + __x); \ + } + + _GLIBCXX_SIMD_TEST_ON_TUPLE_(isinf) + _GLIBCXX_SIMD_TEST_ON_TUPLE_(isfinite) + _GLIBCXX_SIMD_TEST_ON_TUPLE_(isnan) + _GLIBCXX_SIMD_TEST_ON_TUPLE_(isnormal) + _GLIBCXX_SIMD_TEST_ON_TUPLE_(signbit) +#undef _GLIBCXX_SIMD_TEST_ON_TUPLE_ + + // _S_increment & _S_decrement{{{2 + template <typename... _Ts> + _GLIBCXX_SIMD_INTRINSIC static constexpr void + _S_increment(_SimdTuple<_Ts...>& __x) + { + __for_each( + __x, [](auto __meta, auto& native) constexpr { + __meta._S_increment(native); + }); + } + + template <typename... _Ts> + _GLIBCXX_SIMD_INTRINSIC static constexpr void + _S_decrement(_SimdTuple<_Ts...>& __x) + { + __for_each( + __x, [](auto __meta, auto& native) constexpr { + __meta._S_decrement(native); + }); + } + + // compares {{{2 +#define _GLIBCXX_SIMD_CMP_OPERATIONS(__cmp) \ + template <typename _Tp, typename... _As> \ + _GLIBCXX_SIMD_INTRINSIC constexpr static _MaskMember \ + __cmp(const _SimdTuple<_Tp, _As...>& __x, \ + const _SimdTuple<_Tp, _As...>& __y) \ + { \ + return _M_test( \ + [](auto __impl, auto __xx, auto __yy) constexpr { \ + return __impl.__cmp(__xx, __yy); \ + }, \ + __x, __y); \ + } + + _GLIBCXX_SIMD_CMP_OPERATIONS(_S_equal_to) + _GLIBCXX_SIMD_CMP_OPERATIONS(_S_not_equal_to) + _GLIBCXX_SIMD_CMP_OPERATIONS(_S_less) + _GLIBCXX_SIMD_CMP_OPERATIONS(_S_less_equal) + _GLIBCXX_SIMD_CMP_OPERATIONS(_S_isless) + _GLIBCXX_SIMD_CMP_OPERATIONS(_S_islessequal) + _GLIBCXX_SIMD_CMP_OPERATIONS(_S_isgreater) + _GLIBCXX_SIMD_CMP_OPERATIONS(_S_isgreaterequal) + _GLIBCXX_SIMD_CMP_OPERATIONS(_S_islessgreater) + _GLIBCXX_SIMD_CMP_OPERATIONS(_S_isunordered) +#undef _GLIBCXX_SIMD_CMP_OPERATIONS + + // smart_reference access {{{2 + template <typename _Tp, typename... _As, typename _Up> + _GLIBCXX_SIMD_INTRINSIC static void _S_set(_SimdTuple<_Tp, _As...>& __v, + int __i, _Up&& __x) noexcept + { __v._M_set(__i, static_cast<_Up&&>(__x)); } + + // _S_masked_assign {{{2 + template <typename _Tp, typename... _As> + _GLIBCXX_SIMD_INTRINSIC static void + _S_masked_assign(const _MaskMember __bits, _SimdTuple<_Tp, _As...>& __lhs, + const __type_identity_t<_SimdTuple<_Tp, _As...>>& __rhs) + { + __for_each( + __lhs, __rhs, + [&](auto __meta, auto& __native_lhs, auto __native_rhs) constexpr { + __meta._S_masked_assign(__meta._S_make_mask(__bits), __native_lhs, + __native_rhs); + }); + } + + // Optimization for the case where the RHS is a scalar. No need to broadcast + // the scalar to a simd first. + template <typename _Tp, typename... _As> + _GLIBCXX_SIMD_INTRINSIC static void + _S_masked_assign(const _MaskMember __bits, _SimdTuple<_Tp, _As...>& __lhs, + const __type_identity_t<_Tp> __rhs) + { + __for_each( + __lhs, [&](auto __meta, auto& __native_lhs) constexpr { + __meta._S_masked_assign(__meta._S_make_mask(__bits), __native_lhs, + __rhs); + }); + } + + // _S_masked_cassign {{{2 + template <typename _Op, typename _Tp, typename... _As> + static inline void _S_masked_cassign(const _MaskMember __bits, + _SimdTuple<_Tp, _As...>& __lhs, + const _SimdTuple<_Tp, _As...>& __rhs, + _Op __op) + { + __for_each( + __lhs, __rhs, + [&](auto __meta, auto& __native_lhs, auto __native_rhs) constexpr { + __meta.template _S_masked_cassign(__meta._S_make_mask(__bits), + __native_lhs, __native_rhs, __op); + }); + } + + // Optimization for the case where the RHS is a scalar. 
No need to broadcast + // the scalar to a simd first. + template <typename _Op, typename _Tp, typename... _As> + static inline void _S_masked_cassign(const _MaskMember __bits, + _SimdTuple<_Tp, _As...>& __lhs, + const _Tp& __rhs, _Op __op) + { + __for_each( + __lhs, [&](auto __meta, auto& __native_lhs) constexpr { + __meta.template _S_masked_cassign(__meta._S_make_mask(__bits), + __native_lhs, __rhs, __op); + }); + } + + // _S_masked_unary {{{2 + template <template <typename> class _Op, typename _Tp, typename... _As> + static inline _SimdTuple<_Tp, _As...> + _S_masked_unary(const _MaskMember __bits, + const _SimdTuple<_Tp, _As...> __v) // TODO: const-ref __v? + { + return __v._M_apply_wrapped([&__bits](auto __meta, + auto __native) constexpr { + return __meta.template _S_masked_unary<_Op>(__meta._S_make_mask( + __bits), + __native); + }); + } + + // }}}2 + }; + +// _MaskImplFixedSize {{{1 +template <int _Np> + struct _MaskImplFixedSize + { + static_assert( + sizeof(_ULLong) * __CHAR_BIT__ >= _Np, + "The fixed_size implementation relies on one _ULLong being able to store " + "all boolean elements."); // required in load & store + + // member types {{{ + using _Abi = simd_abi::fixed_size<_Np>; + + using _MaskMember = _SanitizedBitMask<_Np>; + + template <typename _Tp> + using _FirstAbi = typename __fixed_size_storage_t<_Tp, _Np>::_FirstAbi; + + template <typename _Tp> + using _TypeTag = _Tp*; + + // }}} + // _S_broadcast {{{ + template <typename> + _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember + _S_broadcast(bool __x) + { return __x ? ~_MaskMember() : _MaskMember(); } + + // }}} + // _S_load {{{ + template <typename> + _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember + _S_load(const bool* __mem) + { + using _Ip = __int_for_sizeof_t<bool>; + // the following load uses element_aligned and relies on __mem already + // carrying alignment information from when this load function was + // called. + const simd<_Ip, _Abi> __bools(reinterpret_cast<const __may_alias<_Ip>*>( + __mem), + element_aligned); + return __data(__bools != 0); + } + + // }}} + // _S_to_bits {{{ + template <bool _Sanitized> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SanitizedBitMask<_Np> + _S_to_bits(_BitMask<_Np, _Sanitized> __x) + { + if constexpr (_Sanitized) + return __x; + else + return __x._M_sanitized(); + } + + // }}} + // _S_convert {{{ + template <typename _Tp, typename _Up, typename _UAbi> + _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember + _S_convert(simd_mask<_Up, _UAbi> __x) + { + return _UAbi::_MaskImpl::_S_to_bits(__data(__x)) + .template _M_extract<0, _Np>(); + } + + // }}} + // _S_from_bitmask {{{2 + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _MaskMember + _S_from_bitmask(_MaskMember __bits, _TypeTag<_Tp>) noexcept + { return __bits; } + + // _S_load {{{2 + static inline _MaskMember _S_load(const bool* __mem) noexcept + { + // TODO: _UChar is not necessarily the best type to use here. For smaller + // _Np _UShort, _UInt, _ULLong, float, and double can be more efficient. 
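
Both directions of the bool-array conversion surface as simd_mask::copy_from / copy_to. A round trip (hedged sketch):

#include <experimental/simd>
namespace stdx = std::experimental;

void mask_roundtrip(const bool* in, bool* out)
{
  stdx::fixed_size_simd_mask<float, 8> k;
  k.copy_from(in, stdx::element_aligned);   // bools -> bitmask
  k = !k;
  k.copy_to(out, stdx::element_aligned);    // bitmask -> bools
}
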
+ _ULLong __r = 0; + using _Vs = __fixed_size_storage_t<_UChar, _Np>; + __for_each(_Vs{}, [&](auto __meta, auto) { + __r |= __meta._S_mask_to_shifted_ullong( + __meta._S_mask_impl._S_load(&__mem[__meta._S_offset], + _SizeConstant<__meta._S_size()>())); + }); + return __r; + } + + // _S_masked_load {{{2 + static inline _MaskMember _S_masked_load(_MaskMember __merge, + _MaskMember __mask, + const bool* __mem) noexcept + { + _BitOps::_S_bit_iteration(__mask.to_ullong(), [&](auto __i) { + __merge.set(__i, __mem[__i]); + }); + return __merge; + } + + // _S_store {{{2 + static inline void _S_store(const _MaskMember __bitmask, + bool* __mem) noexcept + { + if constexpr (_Np == 1) + __mem[0] = __bitmask[0]; + else + _FirstAbi<_UChar>::_CommonImpl::_S_store_bool_array(__bitmask, __mem); + } + + // _S_masked_store {{{2 + static inline void _S_masked_store(const _MaskMember __v, bool* __mem, + const _MaskMember __k) noexcept + { + _BitOps::_S_bit_iteration(__k, [&](auto __i) { __mem[__i] = __v[__i]; }); + } + + // logical and bitwise operators {{{2 + _GLIBCXX_SIMD_INTRINSIC static _MaskMember + _S_logical_and(const _MaskMember& __x, const _MaskMember& __y) noexcept + { return __x & __y; } + + _GLIBCXX_SIMD_INTRINSIC static _MaskMember + _S_logical_or(const _MaskMember& __x, const _MaskMember& __y) noexcept + { return __x | __y; } + + _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember + _S_bit_not(const _MaskMember& __x) noexcept + { return ~__x; } + + _GLIBCXX_SIMD_INTRINSIC static _MaskMember + _S_bit_and(const _MaskMember& __x, const _MaskMember& __y) noexcept + { return __x & __y; } + + _GLIBCXX_SIMD_INTRINSIC static _MaskMember + _S_bit_or(const _MaskMember& __x, const _MaskMember& __y) noexcept + { return __x | __y; } + + _GLIBCXX_SIMD_INTRINSIC static _MaskMember + _S_bit_xor(const _MaskMember& __x, const _MaskMember& __y) noexcept + { return __x ^ __y; } + + // smart_reference access {{{2 + _GLIBCXX_SIMD_INTRINSIC static void _S_set(_MaskMember& __k, int __i, + bool __x) noexcept + { __k.set(__i, __x); } + + // _S_masked_assign {{{2 + _GLIBCXX_SIMD_INTRINSIC static void + _S_masked_assign(const _MaskMember __k, _MaskMember& __lhs, + const _MaskMember __rhs) + { __lhs = (__lhs & ~__k) | (__rhs & __k); } + + // Optimization for the case where the RHS is a scalar. 
+ _GLIBCXX_SIMD_INTRINSIC static void _S_masked_assign(const _MaskMember __k, + _MaskMember& __lhs, + const bool __rhs) + { + if (__rhs) + __lhs |= __k; + else + __lhs &= ~__k; + } + + // }}}2 + // _S_all_of {{{ + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static bool _S_all_of(simd_mask<_Tp, _Abi> __k) + { return __data(__k).all(); } + + // }}} + // _S_any_of {{{ + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static bool _S_any_of(simd_mask<_Tp, _Abi> __k) + { return __data(__k).any(); } + + // }}} + // _S_none_of {{{ + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static bool _S_none_of(simd_mask<_Tp, _Abi> __k) + { return __data(__k).none(); } + + // }}} + // _S_some_of {{{ + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static bool + _S_some_of([[maybe_unused]] simd_mask<_Tp, _Abi> __k) + { + if constexpr (_Np == 1) + return false; + else + return __data(__k).any() && !__data(__k).all(); + } + + // }}} + // _S_popcount {{{ + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static int _S_popcount(simd_mask<_Tp, _Abi> __k) + { return __data(__k).count(); } + + // }}} + // _S_find_first_set {{{ + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static int + _S_find_first_set(simd_mask<_Tp, _Abi> __k) + { return std::__countr_zero(__data(__k).to_ullong()); } + + // }}} + // _S_find_last_set {{{ + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static int + _S_find_last_set(simd_mask<_Tp, _Abi> __k) + { return std::__bit_width(__data(__k).to_ullong()) - 1; } + + // }}} + }; +// }}}1 + +_GLIBCXX_SIMD_END_NAMESPACE +#endif // __cplusplus >= 201703L +#endif // _GLIBCXX_EXPERIMENTAL_SIMD_FIXED_SIZE_H_ + +// vim: foldmethod=marker sw=2 noet ts=8 sts=2 tw=80 diff --git a/libstdc++-v3/include/experimental/bits/simd_math.h b/libstdc++-v3/include/experimental/bits/simd_math.h new file mode 100644 index 00000000000..bbaa899faa2 --- /dev/null +++ b/libstdc++-v3/include/experimental/bits/simd_math.h @@ -0,0 +1,1500 @@ +// Math overloads for simd -*- C++ -*- + +// Copyright (C) 2020 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// Under Section 7 of GPL version 3, you are granted additional +// permissions described in the GCC Runtime Library Exception, version +// 3.1, as published by the Free Software Foundation. + +// You should have received a copy of the GNU General Public License and +// a copy of the GCC Runtime Library Exception along with this program; +// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +// <http://www.gnu.org/licenses/>. 
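
The mask reductions at the end of _MaskImplFixedSize above map directly onto the underlying bitset operations; at the user level (hedged sketch):

#include <experimental/simd>
namespace stdx = std::experimental;

void mask_queries(const stdx::fixed_size_simd_mask<int, 8>& k)
{
  bool a = stdx::all_of(k);          // __data(k).all()
  int  n = stdx::popcount(k);        // __data(k).count()
  int  i = stdx::find_first_set(k);  // countr_zero; requires any_of(k)
  (void) a; (void) n; (void) i;
}
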
+ +#ifndef _GLIBCXX_EXPERIMENTAL_SIMD_MATH_H_ +#define _GLIBCXX_EXPERIMENTAL_SIMD_MATH_H_ + +#if __cplusplus >= 201703L + +#include <utility> +#include <iomanip> + +_GLIBCXX_SIMD_BEGIN_NAMESPACE +template <typename _Tp, typename _V> + using _Samesize = fixed_size_simd<_Tp, _V::size()>; + +// _Math_return_type {{{ +template <typename _DoubleR, typename _Tp, typename _Abi> + struct _Math_return_type; + +template <typename _DoubleR, typename _Tp, typename _Abi> + using _Math_return_type_t = + typename _Math_return_type<_DoubleR, _Tp, _Abi>::type; + +template <typename _Tp, typename _Abi> + struct _Math_return_type<double, _Tp, _Abi> + { using type = simd<_Tp, _Abi>; }; + +template <typename _Tp, typename _Abi> + struct _Math_return_type<bool, _Tp, _Abi> + { using type = simd_mask<_Tp, _Abi>; }; + +template <typename _DoubleR, typename _Tp, typename _Abi> + struct _Math_return_type + { using type = fixed_size_simd<_DoubleR, simd_size_v<_Tp, _Abi>>; }; + +//}}} +// _GLIBCXX_SIMD_MATH_CALL_ {{{ +#define _GLIBCXX_SIMD_MATH_CALL_(__name) \ +template <typename _Tp, typename _Abi, typename..., \ + typename _R = _Math_return_type_t< \ + decltype(std::__name(declval<double>())), _Tp, _Abi>> \ + enable_if_t<is_floating_point_v<_Tp>, _R> \ + __name(simd<_Tp, _Abi> __x) \ + { return {__private_init, _Abi::_SimdImpl::_S_##__name(__data(__x))}; } + +// }}} +//_Extra_argument_type{{{ +template <typename _Up, typename _Tp, typename _Abi> + struct _Extra_argument_type; + +template <typename _Tp, typename _Abi> + struct _Extra_argument_type<_Tp*, _Tp, _Abi> + { + using type = simd<_Tp, _Abi>*; + static constexpr double* declval(); + static constexpr bool __needs_temporary_scalar = true; + + _GLIBCXX_SIMD_INTRINSIC static constexpr auto _S_data(type __x) + { return &__data(*__x); } + }; + +template <typename _Up, typename _Tp, typename _Abi> + struct _Extra_argument_type<_Up*, _Tp, _Abi> + { + static_assert(is_integral_v<_Up>); + using type = fixed_size_simd<_Up, simd_size_v<_Tp, _Abi>>*; + static constexpr _Up* declval(); + static constexpr bool __needs_temporary_scalar = true; + + _GLIBCXX_SIMD_INTRINSIC static constexpr auto _S_data(type __x) + { return &__data(*__x); } + }; + +template <typename _Tp, typename _Abi> + struct _Extra_argument_type<_Tp, _Tp, _Abi> + { + using type = simd<_Tp, _Abi>; + static constexpr double declval(); + static constexpr bool __needs_temporary_scalar = false; + + _GLIBCXX_SIMD_INTRINSIC static constexpr decltype(auto) + _S_data(const type& __x) + { return __data(__x); } + }; + +template <typename _Up, typename _Tp, typename _Abi> + struct _Extra_argument_type + { + static_assert(is_integral_v<_Up>); + using type = fixed_size_simd<_Up, simd_size_v<_Tp, _Abi>>; + static constexpr _Up declval(); + static constexpr bool __needs_temporary_scalar = false; + + _GLIBCXX_SIMD_INTRINSIC static constexpr decltype(auto) + _S_data(const type& __x) + { return __data(__x); } + }; + +//}}} +// _GLIBCXX_SIMD_MATH_CALL2_ {{{ +#define _GLIBCXX_SIMD_MATH_CALL2_(__name, arg2_) \ +template < \ + typename _Tp, typename _Abi, typename..., \ + typename _Arg2 = _Extra_argument_type<arg2_, _Tp, _Abi>, \ + typename _R = _Math_return_type_t< \ + decltype(std::__name(declval<double>(), _Arg2::declval())), _Tp, _Abi>> \ + enable_if_t<is_floating_point_v<_Tp>, _R> \ + __name(const simd<_Tp, _Abi>& __x, const typename _Arg2::type& __y) \ + { \ + return {__private_init, \ + _Abi::_SimdImpl::_S_##__name(__data(__x), _Arg2::_S_data(__y))}; \ + } \ +template <typename _Up, typename _Tp, typename _Abi> \ + 
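
_Math_return_type encodes the TS return-type rule: a scalar double return maps to simd<T, Abi>, bool maps to simd_mask<T, Abi>, and everything else becomes a fixed_size simd of the scalar return type. A hedged illustration:

#include <experimental/simd>
#include <type_traits>
namespace stdx = std::experimental;

using V = stdx::native_simd<float>;
static_assert(std::is_same_v<decltype(stdx::sqrt(V())), V>);
static_assert(std::is_same_v<decltype(stdx::isnan(V())),
                             stdx::native_simd_mask<float>>);
static_assert(std::is_same_v<decltype(stdx::ilogb(V())),
                             stdx::fixed_size_simd<int, V::size()>>);
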
_GLIBCXX_SIMD_INTRINSIC _Math_return_type_t< \ + decltype(std::__name( \ + declval<double>(), \ + declval<enable_if_t< \ + conjunction_v< \ + is_same<arg2_, _Tp>, \ + negation<is_same<__remove_cvref_t<_Up>, simd<_Tp, _Abi>>>, \ + is_convertible<_Up, simd<_Tp, _Abi>>, is_floating_point<_Tp>>, \ + double>>())), \ + _Tp, _Abi> \ + __name(_Up&& __xx, const simd<_Tp, _Abi>& __yy) \ + { return __name(simd<_Tp, _Abi>(static_cast<_Up&&>(__xx)), __yy); } + +// }}} +// _GLIBCXX_SIMD_MATH_CALL3_ {{{ +#define _GLIBCXX_SIMD_MATH_CALL3_(__name, arg2_, arg3_) \ +template <typename _Tp, typename _Abi, typename..., \ + typename _Arg2 = _Extra_argument_type<arg2_, _Tp, _Abi>, \ + typename _Arg3 = _Extra_argument_type<arg3_, _Tp, _Abi>, \ + typename _R = _Math_return_type_t< \ + decltype(std::__name(declval<double>(), _Arg2::declval(), \ + _Arg3::declval())), \ + _Tp, _Abi>> \ + enable_if_t<is_floating_point_v<_Tp>, _R> \ + __name(const simd<_Tp, _Abi>& __x, const typename _Arg2::type& __y, \ + const typename _Arg3::type& __z) \ + { \ + return {__private_init, \ + _Abi::_SimdImpl::_S_##__name(__data(__x), _Arg2::_S_data(__y), \ + _Arg3::_S_data(__z))}; \ + } \ +template < \ + typename _T0, typename _T1, typename _T2, typename..., \ + typename _U0 = __remove_cvref_t<_T0>, \ + typename _U1 = __remove_cvref_t<_T1>, \ + typename _U2 = __remove_cvref_t<_T2>, \ + typename _Simd = conditional_t<is_simd_v<_U1>, _U1, _U2>, \ + typename = enable_if_t<conjunction_v< \ + is_simd<_Simd>, is_convertible<_T0&&, _Simd>, \ + is_convertible<_T1&&, _Simd>, is_convertible<_T2&&, _Simd>, \ + negation<conjunction< \ + is_simd<_U0>, is_floating_point<__value_type_or_identity_t<_U0>>>>>>> \ + _GLIBCXX_SIMD_INTRINSIC decltype(__name(declval<const _Simd&>(), \ + declval<const _Simd&>(), \ + declval<const _Simd&>())) \ + __name(_T0&& __xx, _T1&& __yy, _T2&& __zz) \ + { \ + return __name(_Simd(static_cast<_T0&&>(__xx)), \ + _Simd(static_cast<_T1&&>(__yy)), \ + _Simd(static_cast<_T2&&>(__zz))); \ + } + +// }}} +// __cosSeries {{{ +template <typename _Abi> + _GLIBCXX_SIMD_ALWAYS_INLINE static simd<float, _Abi> + __cosSeries(const simd<float, _Abi>& __x) + { + const simd<float, _Abi> __x2 = __x * __x; + simd<float, _Abi> __y; + __y = 0x1.ap-16f; // 1/8! + __y = __y * __x2 - 0x1.6c1p-10f; // -1/6! + __y = __y * __x2 + 0x1.555556p-5f; // 1/4! + return __y * (__x2 * __x2) - .5f * __x2 + 1.f; + } + +template <typename _Abi> + _GLIBCXX_SIMD_ALWAYS_INLINE static simd<double, _Abi> + __cosSeries(const simd<double, _Abi>& __x) + { + const simd<double, _Abi> __x2 = __x * __x; + simd<double, _Abi> __y; + __y = 0x1.AC00000000000p-45; // 1/16! + __y = __y * __x2 - 0x1.9394000000000p-37; // -1/14! + __y = __y * __x2 + 0x1.1EED8C0000000p-29; // 1/12! + __y = __y * __x2 - 0x1.27E4FB7400000p-22; // -1/10! + __y = __y * __x2 + 0x1.A01A01A018000p-16; // 1/8! + __y = __y * __x2 - 0x1.6C16C16C16C00p-10; // -1/6! + __y = __y * __x2 + 0x1.5555555555554p-5; // 1/4! + return (__y * __x2 - .5f) * __x2 + 1.f; + } + +// }}} +// __sinSeries {{{ +template <typename _Abi> + _GLIBCXX_SIMD_ALWAYS_INLINE static simd<float, _Abi> + __sinSeries(const simd<float, _Abi>& __x) + { + const simd<float, _Abi> __x2 = __x * __x; + simd<float, _Abi> __y; + __y = -0x1.9CC000p-13f; // -1/7! + __y = __y * __x2 + 0x1.111100p-7f; // 1/5! + __y = __y * __x2 - 0x1.555556p-3f; // -1/3! 
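
These series helpers are plain Horner evaluations of truncated Taylor expansions with slightly retuned hex-float coefficients. The float variant, written out as a scalar function for clarity (same coefficients as __sinSeries below; valid for |x| <= pi/4):

float sin_series_scalar(float x)
{
  const float x2 = x * x;
  float y = -0x1.9CC000p-13f;    // ~ -1/7!
  y = y * x2 + 0x1.111100p-7f;   // ~  1/5!
  y = y * x2 - 0x1.555556p-3f;   // ~ -1/3!
  return y * (x2 * x) + x;       // x - x^3/3! + x^5/5! - x^7/7!
}
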
+    return __y * (__x2 * __x) + __x;
+  }
+
+template <typename _Abi>
+  _GLIBCXX_SIMD_ALWAYS_INLINE static simd<double, _Abi>
+  __sinSeries(const simd<double, _Abi>& __x)
+  {
+    // __x = [0, 0.7854 = pi/4]
+    // __x² = [0, 0.6169 = pi²/8]
+    const simd<double, _Abi> __x2 = __x * __x;
+    simd<double, _Abi> __y;
+    __y = -0x1.ACF0000000000p-41;             // -1/15!
+    __y = __y * __x2 + 0x1.6124400000000p-33; //  1/13!
+    __y = __y * __x2 - 0x1.AE64567000000p-26; // -1/11!
+    __y = __y * __x2 + 0x1.71DE3A5540000p-19; //  1/9!
+    __y = __y * __x2 - 0x1.A01A01A01A000p-13; // -1/7!
+    __y = __y * __x2 + 0x1.1111111111110p-7;  //  1/5!
+    __y = __y * __x2 - 0x1.5555555555555p-3;  // -1/3!
+    return __y * (__x2 * __x) + __x;
+  }
+
+// }}}
+// __zero_low_bits {{{
+template <int _Bits, typename _Tp, typename _Abi>
+  _GLIBCXX_SIMD_INTRINSIC simd<_Tp, _Abi>
+  __zero_low_bits(simd<_Tp, _Abi> __x)
+  {
+    const simd<_Tp, _Abi> __bitmask
+      = __bit_cast<_Tp>(~make_unsigned_t<__int_for_sizeof_t<_Tp>>() << _Bits);
+    return {__private_init,
+	    _Abi::_SimdImpl::_S_bit_and(__data(__x), __data(__bitmask))};
+  }
+
+// }}}
+// __fold_input {{{
+
+/**@internal
+ * Fold @p x into [-¼π, ¼π] and remember the quadrant it came from:
+ * quadrant 0: [-¼π, ¼π]
+ * quadrant 1: [ ¼π, ¾π]
+ * quadrant 2: [ ¾π, 1¼π]
+ * quadrant 3: [1¼π, 1¾π]
+ *
+ * The algorithm determines `y` as the multiple for which `x - y * ¼π` falls
+ * into [-¼π, ¼π]. Using a bitmask, `y` is reduced to `quadrant`. `y` can be
+ * calculated as
+ * ```
+ * y = trunc(x / ¼π);
+ * y += fmod(y, 2);
+ * ```
+ * This can be simplified by moving the (implicit) division by 2 into the
+ * truncation expression. The `+= fmod` effect can then be achieved by using
+ * rounding instead of truncation: `y = round(x / ½π) * 2`. If precision
+ * allows, `2/π * x` is better (faster).
+ */ +template <typename _Tp, typename _Abi> + struct _Folded + { + simd<_Tp, _Abi> _M_x; + rebind_simd_t<int, simd<_Tp, _Abi>> _M_quadrant; + }; + +namespace __math_float { +inline constexpr float __pi_over_4 = 0x1.921FB6p-1f; // π/4 +inline constexpr float __2_over_pi = 0x1.45F306p-1f; // 2/π +inline constexpr float __pi_2_5bits0 + = 0x1.921fc0p0f; // π/2, 5 0-bits (least significant) +inline constexpr float __pi_2_5bits0_rem + = -0x1.5777a6p-21f; // π/2 - __pi_2_5bits0 +} // namespace __math_float +namespace __math_double { +inline constexpr double __pi_over_4 = 0x1.921fb54442d18p-1; // π/4 +inline constexpr double __2_over_pi = 0x1.45F306DC9C883p-1; // 2/π +inline constexpr double __pi_2 = 0x1.921fb54442d18p0; // π/2 +} // namespace __math_double + +template <typename _Abi> + _GLIBCXX_SIMD_ALWAYS_INLINE _Folded<float, _Abi> + __fold_input(const simd<float, _Abi>& __x) + { + using _V = simd<float, _Abi>; + using _IV = rebind_simd_t<int, _V>; + using namespace __math_float; + _Folded<float, _Abi> __r; + __r._M_x = abs(__x); +#if 0 + // zero most mantissa bits: + constexpr float __1_over_pi = 0x1.45F306p-2f; // 1/π + const auto __y = (__r._M_x * __1_over_pi + 0x1.8p23f) - 0x1.8p23f; + // split π into 4 parts, the first three with 13 trailing zeros (to make the + // following multiplications precise): + constexpr float __pi0 = 0x1.920000p1f; + constexpr float __pi1 = 0x1.fb4000p-11f; + constexpr float __pi2 = 0x1.444000p-23f; + constexpr float __pi3 = 0x1.68c234p-38f; + __r._M_x - __y*__pi0 - __y*__pi1 - __y*__pi2 - __y*__pi3 +#else + if (_GLIBCXX_SIMD_IS_UNLIKELY(all_of(__r._M_x < __pi_over_4))) + __r._M_quadrant = 0; + else if (_GLIBCXX_SIMD_IS_LIKELY(all_of(__r._M_x < 6 * __pi_over_4))) + { + const _V __y = nearbyint(__r._M_x * __2_over_pi); + __r._M_quadrant = static_simd_cast<_IV>(__y) & 3; // __y mod 4 + __r._M_x -= __y * __pi_2_5bits0; + __r._M_x -= __y * __pi_2_5bits0_rem; + } + else + { + using __math_double::__2_over_pi; + using __math_double::__pi_2; + using _VD = rebind_simd_t<double, _V>; + _VD __xd = static_simd_cast<_VD>(__r._M_x); + _VD __y = nearbyint(__xd * __2_over_pi); + __r._M_quadrant = static_simd_cast<_IV>(__y) & 3; // = __y mod 4 + __r._M_x = static_simd_cast<_V>(__xd - __y * __pi_2); + } +#endif + return __r; + } + +template <typename _Abi> + _GLIBCXX_SIMD_ALWAYS_INLINE _Folded<double, _Abi> + __fold_input(const simd<double, _Abi>& __x) + { + using _V = simd<double, _Abi>; + using _IV = rebind_simd_t<int, _V>; + using namespace __math_double; + + _Folded<double, _Abi> __r; + __r._M_x = abs(__x); + if (_GLIBCXX_SIMD_IS_UNLIKELY(all_of(__r._M_x < __pi_over_4))) + { + __r._M_quadrant = 0; + return __r; + } + const _V __y = nearbyint(__r._M_x / (2 * __pi_over_4)); + __r._M_quadrant = static_simd_cast<_IV>(__y) & 3; + + if (_GLIBCXX_SIMD_IS_LIKELY(all_of(__r._M_x < 1025 * __pi_over_4))) + { + // x - y * pi/2, y uses no more than 11 mantissa bits + __r._M_x -= __y * 0x1.921FB54443000p0; + __r._M_x -= __y * -0x1.73DCB3B39A000p-43; + __r._M_x -= __y * 0x1.45C06E0E68948p-86; + } + else if (_GLIBCXX_SIMD_IS_LIKELY(all_of(__y <= 0x1.0p30))) + { + // x - y * pi/2, y uses no more than 29 mantissa bits + __r._M_x -= __y * 0x1.921FB40000000p0; + __r._M_x -= __y * 0x1.4442D00000000p-24; + __r._M_x -= __y * 0x1.8469898CC5170p-48; + } + else + { + // x - y * pi/2, y may require all mantissa bits + const _V __y_hi = __zero_low_bits<26>(__y); + const _V __y_lo = __y - __y_hi; + const auto __pi_2_1 = 0x1.921FB50000000p0; + const auto __pi_2_2 = 0x1.110B460000000p-26; + const auto 
__pi_2_3 = 0x1.1A62630000000p-54; + const auto __pi_2_4 = 0x1.8A2E03707344Ap-81; + __r._M_x = __r._M_x - __y_hi * __pi_2_1 + - max(__y_hi * __pi_2_2, __y_lo * __pi_2_1) + - min(__y_hi * __pi_2_2, __y_lo * __pi_2_1) + - max(__y_hi * __pi_2_3, __y_lo * __pi_2_2) + - min(__y_hi * __pi_2_3, __y_lo * __pi_2_2) + - max(__y * __pi_2_4, __y_lo * __pi_2_3) + - min(__y * __pi_2_4, __y_lo * __pi_2_3); + } + return __r; + } + +// }}} +// __extract_exponent_as_int {{{ +template <typename _Tp, typename _Abi> + rebind_simd_t<int, simd<_Tp, _Abi>> + __extract_exponent_as_int(const simd<_Tp, _Abi>& __v) + { + using _Vp = simd<_Tp, _Abi>; + using _Up = make_unsigned_t<__int_for_sizeof_t<_Tp>>; + using namespace std::experimental::__float_bitwise_operators; + const _Vp __exponent_mask + = __infinity_v<_Tp>; // 0x7f800000 or 0x7ff0000000000000 + return static_simd_cast<rebind_simd_t<int, _Vp>>( + __bit_cast<rebind_simd_t<_Up, _Vp>>(__v & __exponent_mask) + >> (__digits_v<_Tp> - 1)); + } + +// }}} +// __impl_or_fallback {{{ +template <typename ImplFun, typename FallbackFun, typename... _Args> + _GLIBCXX_SIMD_INTRINSIC auto + __impl_or_fallback_dispatch(int, ImplFun&& __impl_fun, FallbackFun&&, + _Args&&... __args) + -> decltype(__impl_fun(static_cast<_Args&&>(__args)...)) + { return __impl_fun(static_cast<_Args&&>(__args)...); } + +template <typename ImplFun, typename FallbackFun, typename... _Args> + inline auto + __impl_or_fallback_dispatch(float, ImplFun&&, FallbackFun&& __fallback_fun, + _Args&&... __args) + -> decltype(__fallback_fun(static_cast<_Args&&>(__args)...)) + { return __fallback_fun(static_cast<_Args&&>(__args)...); } + +template <typename... _Args> + _GLIBCXX_SIMD_INTRINSIC auto + __impl_or_fallback(_Args&&... __args) + { + return __impl_or_fallback_dispatch(int(), static_cast<_Args&&>(__args)...); + } +//}}} + +// trigonometric functions {{{ +_GLIBCXX_SIMD_MATH_CALL_(acos) +_GLIBCXX_SIMD_MATH_CALL_(asin) +_GLIBCXX_SIMD_MATH_CALL_(atan) +_GLIBCXX_SIMD_MATH_CALL2_(atan2, _Tp) + +/* + * algorithm for sine and cosine: + * + * The result can be calculated with sine or cosine depending on the π/4 section + * the input is in. sine ≈ __x + __x³ cosine ≈ 1 - __x² + * + * sine: + * Map -__x to __x and invert the output + * Extend precision of __x - n * π/4 by calculating + * ((__x - n * p1) - n * p2) - n * p3 (p1 + p2 + p3 = π/4) + * + * Calculate Taylor series with tuned coefficients. + * Fix sign. 
+ */ +// cos{{{ +template <typename _Tp, typename _Abi> + enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>> + cos(const simd<_Tp, _Abi>& __x) + { + using _V = simd<_Tp, _Abi>; + if constexpr (__is_scalar_abi<_Abi>() || __is_fixed_size_abi_v<_Abi>) + return {__private_init, _Abi::_SimdImpl::_S_cos(__data(__x))}; + else + { + if constexpr (is_same_v<_Tp, float>) + if (_GLIBCXX_SIMD_IS_UNLIKELY(any_of(abs(__x) >= 393382))) + return static_simd_cast<_V>( + cos(static_simd_cast<rebind_simd_t<double, _V>>(__x))); + + const auto __f = __fold_input(__x); + // quadrant | effect + // 0 | cosSeries, + + // 1 | sinSeries, - + // 2 | cosSeries, - + // 3 | sinSeries, + + using namespace std::experimental::__float_bitwise_operators; + const _V __sign_flip + = _V(-0.f) & static_simd_cast<_V>((1 + __f._M_quadrant) << 30); + + const auto __need_cos = (__f._M_quadrant & 1) == 0; + if (_GLIBCXX_SIMD_IS_UNLIKELY(all_of(__need_cos))) + return __sign_flip ^ __cosSeries(__f._M_x); + else if (_GLIBCXX_SIMD_IS_UNLIKELY(none_of(__need_cos))) + return __sign_flip ^ __sinSeries(__f._M_x); + else // some_of(__need_cos) + { + _V __r = __sinSeries(__f._M_x); + where(__need_cos.__cvt(), __r) = __cosSeries(__f._M_x); + return __r ^ __sign_flip; + } + } + } + +template <typename _Tp> + _GLIBCXX_SIMD_ALWAYS_INLINE + enable_if_t<is_floating_point<_Tp>::value, simd<_Tp, simd_abi::scalar>> + cos(simd<_Tp, simd_abi::scalar> __x) + { return std::cos(__data(__x)); } + +//}}} +// sin{{{ +template <typename _Tp, typename _Abi> + enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>> + sin(const simd<_Tp, _Abi>& __x) + { + using _V = simd<_Tp, _Abi>; + if constexpr (__is_scalar_abi<_Abi>() || __is_fixed_size_abi_v<_Abi>) + return {__private_init, _Abi::_SimdImpl::_S_sin(__data(__x))}; + else + { + if constexpr (is_same_v<_Tp, float>) + if (_GLIBCXX_SIMD_IS_UNLIKELY(any_of(abs(__x) >= 527449))) + return static_simd_cast<_V>( + sin(static_simd_cast<rebind_simd_t<double, _V>>(__x))); + + const auto __f = __fold_input(__x); + // quadrant | effect + // 0 | sinSeries + // 1 | cosSeries + // 2 | sinSeries, sign flip + // 3 | cosSeries, sign flip + using namespace std::experimental::__float_bitwise_operators; + const auto __sign_flip + = (__x ^ static_simd_cast<_V>(1 - __f._M_quadrant)) & _V(_Tp(-0.)); + + const auto __need_sin = (__f._M_quadrant & 1) == 0; + if (_GLIBCXX_SIMD_IS_UNLIKELY(all_of(__need_sin))) + return __sign_flip ^ __sinSeries(__f._M_x); + else if (_GLIBCXX_SIMD_IS_UNLIKELY(none_of(__need_sin))) + return __sign_flip ^ __cosSeries(__f._M_x); + else // some_of(__need_sin) + { + _V __r = __cosSeries(__f._M_x); + where(__need_sin.__cvt(), __r) = __sinSeries(__f._M_x); + return __sign_flip ^ __r; + } + } + } + +template <typename _Tp> + _GLIBCXX_SIMD_ALWAYS_INLINE + enable_if_t<is_floating_point<_Tp>::value, simd<_Tp, simd_abi::scalar>> + sin(simd<_Tp, simd_abi::scalar> __x) + { return std::sin(__data(__x)); } + +//}}} +_GLIBCXX_SIMD_MATH_CALL_(tan) +_GLIBCXX_SIMD_MATH_CALL_(acosh) +_GLIBCXX_SIMD_MATH_CALL_(asinh) +_GLIBCXX_SIMD_MATH_CALL_(atanh) +_GLIBCXX_SIMD_MATH_CALL_(cosh) +_GLIBCXX_SIMD_MATH_CALL_(sinh) +_GLIBCXX_SIMD_MATH_CALL_(tanh) +// }}} +// exponential functions {{{ +_GLIBCXX_SIMD_MATH_CALL_(exp) +_GLIBCXX_SIMD_MATH_CALL_(exp2) +_GLIBCXX_SIMD_MATH_CALL_(expm1) + +// }}} +// frexp {{{ +#if _GLIBCXX_SIMD_X86INTRIN +template <typename _Tp, size_t _Np> + _SimdWrapper<_Tp, _Np> + __getexp(_SimdWrapper<_Tp, _Np> __x) + { + if constexpr (__have_avx512vl && __is_sse_ps<_Tp, _Np>()) + return 
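
For reference, the quadrant dispatch in cos() above reduces to the following scalar sketch (hedged: sin_series/cos_series stand for scalar versions of the __sinSeries/__cosSeries polynomials, the constants are those from __math_float, and only the |x| < 6*pi/4 fast path is covered):

#include <cmath>

float sin_series(float x);   // the polynomials sketched earlier
float cos_series(float x);

float cos_sketch(float x)
{
  constexpr float two_over_pi = 0x1.45F306p-1f;
  constexpr float pi_2_hi = 0x1.921fc0p0f;      // __pi_2_5bits0
  constexpr float pi_2_lo = -0x1.5777a6p-21f;   // __pi_2_5bits0_rem
  const float ax = std::fabs(x);
  const float y = std::nearbyint(ax * two_over_pi);
  const int quadrant = static_cast<int>(y) & 3;
  const float xr = ax - y * pi_2_hi - y * pi_2_lo;   // extended-precision fold
  const float r = (quadrant & 1) ? sin_series(xr) : cos_series(xr);
  return (quadrant == 1 || quadrant == 2) ? -r : r;  // sign per quadrant table
}
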
__auto_bitcast(_mm_getexp_ps(__to_intrin(__x))); + else if constexpr (__have_avx512f && __is_sse_ps<_Tp, _Np>()) + return __auto_bitcast(_mm512_getexp_ps(__auto_bitcast(__to_intrin(__x)))); + else if constexpr (__have_avx512vl && __is_sse_pd<_Tp, _Np>()) + return _mm_getexp_pd(__x); + else if constexpr (__have_avx512f && __is_sse_pd<_Tp, _Np>()) + return __lo128(_mm512_getexp_pd(__auto_bitcast(__x))); + else if constexpr (__have_avx512vl && __is_avx_ps<_Tp, _Np>()) + return _mm256_getexp_ps(__x); + else if constexpr (__have_avx512f && __is_avx_ps<_Tp, _Np>()) + return __lo256(_mm512_getexp_ps(__auto_bitcast(__x))); + else if constexpr (__have_avx512vl && __is_avx_pd<_Tp, _Np>()) + return _mm256_getexp_pd(__x); + else if constexpr (__have_avx512f && __is_avx_pd<_Tp, _Np>()) + return __lo256(_mm512_getexp_pd(__auto_bitcast(__x))); + else if constexpr (__is_avx512_ps<_Tp, _Np>()) + return _mm512_getexp_ps(__x); + else if constexpr (__is_avx512_pd<_Tp, _Np>()) + return _mm512_getexp_pd(__x); + else + __assert_unreachable<_Tp>(); + } + +template <typename _Tp, size_t _Np> + _SimdWrapper<_Tp, _Np> + __getmant_avx512(_SimdWrapper<_Tp, _Np> __x) + { + if constexpr (__have_avx512vl && __is_sse_ps<_Tp, _Np>()) + return __auto_bitcast(_mm_getmant_ps(__to_intrin(__x), _MM_MANT_NORM_p5_1, + _MM_MANT_SIGN_src)); + else if constexpr (__have_avx512f && __is_sse_ps<_Tp, _Np>()) + return __auto_bitcast(_mm512_getmant_ps(__auto_bitcast(__to_intrin(__x)), + _MM_MANT_NORM_p5_1, + _MM_MANT_SIGN_src)); + else if constexpr (__have_avx512vl && __is_sse_pd<_Tp, _Np>()) + return _mm_getmant_pd(__x, _MM_MANT_NORM_p5_1, _MM_MANT_SIGN_src); + else if constexpr (__have_avx512f && __is_sse_pd<_Tp, _Np>()) + return __lo128(_mm512_getmant_pd(__auto_bitcast(__x), _MM_MANT_NORM_p5_1, + _MM_MANT_SIGN_src)); + else if constexpr (__have_avx512vl && __is_avx_ps<_Tp, _Np>()) + return _mm256_getmant_ps(__x, _MM_MANT_NORM_p5_1, _MM_MANT_SIGN_src); + else if constexpr (__have_avx512f && __is_avx_ps<_Tp, _Np>()) + return __lo256(_mm512_getmant_ps(__auto_bitcast(__x), _MM_MANT_NORM_p5_1, + _MM_MANT_SIGN_src)); + else if constexpr (__have_avx512vl && __is_avx_pd<_Tp, _Np>()) + return _mm256_getmant_pd(__x, _MM_MANT_NORM_p5_1, _MM_MANT_SIGN_src); + else if constexpr (__have_avx512f && __is_avx_pd<_Tp, _Np>()) + return __lo256(_mm512_getmant_pd(__auto_bitcast(__x), _MM_MANT_NORM_p5_1, + _MM_MANT_SIGN_src)); + else if constexpr (__is_avx512_ps<_Tp, _Np>()) + return _mm512_getmant_ps(__x, _MM_MANT_NORM_p5_1, _MM_MANT_SIGN_src); + else if constexpr (__is_avx512_pd<_Tp, _Np>()) + return _mm512_getmant_pd(__x, _MM_MANT_NORM_p5_1, _MM_MANT_SIGN_src); + else + __assert_unreachable<_Tp>(); + } +#endif // _GLIBCXX_SIMD_X86INTRIN + +/** + * splits @p __v into exponent and mantissa, the sign is kept with the mantissa + * + * The return value will be in the range [0.5, 1.0[ + * The @p __e value will be an integer defining the power-of-two exponent + */ +template <typename _Tp, typename _Abi> + enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>> + frexp(const simd<_Tp, _Abi>& __x, _Samesize<int, simd<_Tp, _Abi>>* __exp) + { + if constexpr (simd_size_v<_Tp, _Abi> == 1) + { + int __tmp; + const auto __r = std::frexp(__x[0], &__tmp); + (*__exp)[0] = __tmp; + return __r; + } + else if constexpr (__is_fixed_size_abi_v<_Abi>) + { + return {__private_init, + _Abi::_SimdImpl::_S_frexp(__data(__x), __data(*__exp))}; +#if _GLIBCXX_SIMD_X86INTRIN + } + else if constexpr (__have_avx512f) + { + constexpr size_t _Np = simd_size_v<_Tp, _Abi>; + constexpr size_t 
_NI = _Np < 4 ? 4 : _Np; + const auto __v = __data(__x); + const auto __isnonzero + = _Abi::_SimdImpl::_S_isnonzerovalue_mask(__v._M_data); + const _SimdWrapper<int, _NI> __exp_plus1 + = 1 + __convert<_SimdWrapper<int, _NI>>(__getexp(__v))._M_data; + const _SimdWrapper<int, _Np> __e = __wrapper_bitcast<int, _Np>( + _Abi::_CommonImpl::_S_blend(_SimdWrapper<bool, _NI>(__isnonzero), + _SimdWrapper<int, _NI>(), __exp_plus1)); + simd_abi::deduce_t<int, _Np>::_CommonImpl::_S_store(__e, __exp); + return {__private_init, + _Abi::_CommonImpl::_S_blend(_SimdWrapper<bool, _Np>( + __isnonzero), + __v, __getmant_avx512(__v))}; +#endif // _GLIBCXX_SIMD_X86INTRIN + } + else + { + // fallback implementation + static_assert(sizeof(_Tp) == 4 || sizeof(_Tp) == 8); + using _V = simd<_Tp, _Abi>; + using _IV = rebind_simd_t<int, _V>; + using namespace std::experimental::__proposed; + using namespace std::experimental::__float_bitwise_operators; + + constexpr int __exp_adjust = sizeof(_Tp) == 4 ? 0x7e : 0x3fe; + constexpr int __exp_offset = sizeof(_Tp) == 4 ? 0x70 : 0x200; + constexpr _Tp __subnorm_scale = sizeof(_Tp) == 4 ? 0x1p112 : 0x1p512; + _GLIBCXX_SIMD_USE_CONSTEXPR_API _V __exponent_mask + = __infinity_v<_Tp>; // 0x7f800000 or 0x7ff0000000000000 + _GLIBCXX_SIMD_USE_CONSTEXPR_API _V __p5_1_exponent + = -(2 - __epsilon_v<_Tp>) / 2; // 0xbf7fffff or 0xbfefffffffffffff + + _V __mant = __p5_1_exponent & (__exponent_mask | __x); // +/-[.5, 1) + const _IV __exponent_bits = __extract_exponent_as_int(__x); + if (_GLIBCXX_SIMD_IS_LIKELY(all_of(isnormal(__x)))) + { + *__exp + = simd_cast<_Samesize<int, _V>>(__exponent_bits - __exp_adjust); + return __mant; + } + +#if __FINITE_MATH_ONLY__ + // at least one element of __x is 0 or subnormal, the rest is normal + // (inf and NaN are excluded by -ffinite-math-only) + const auto __iszero_inf_nan = __x == 0; +#else + const auto __as_int + = __bit_cast<rebind_simd_t<__int_for_sizeof_t<_Tp>, _V>>(abs(__x)); + const auto __inf + = __bit_cast<rebind_simd_t<__int_for_sizeof_t<_Tp>, _V>>( + _V(__infinity_v<_Tp>)); + const auto __iszero_inf_nan = static_simd_cast<typename _V::mask_type>( + __as_int == 0 || __as_int >= __inf); +#endif + + const _V __scaled_subnormal = __x * __subnorm_scale; + const _V __mant_subnormal + = __p5_1_exponent & (__exponent_mask | __scaled_subnormal); + where(!isnormal(__x), __mant) = __mant_subnormal; + where(__iszero_inf_nan, __mant) = __x; + _IV __e = __extract_exponent_as_int(__scaled_subnormal); + using _MaskType = + typename conditional_t<sizeof(typename _V::value_type) == sizeof(int), + _V, _IV>::mask_type; + const _MaskType __value_isnormal = isnormal(__x).__cvt(); + where(__value_isnormal.__cvt(), __e) = __exponent_bits; + static_assert(sizeof(_IV) == sizeof(__value_isnormal)); + const _IV __offset + = (__bit_cast<_IV>(__value_isnormal) & _IV(__exp_adjust)) + | (__bit_cast<_IV>(static_simd_cast<_MaskType>(__exponent_bits == 0) + & static_simd_cast<_MaskType>(__x != 0)) + & _IV(__exp_adjust + __exp_offset)); + *__exp = simd_cast<_Samesize<int, _V>>(__e - __offset); + return __mant; + } + } + +// }}} +_GLIBCXX_SIMD_MATH_CALL2_(ldexp, int) +_GLIBCXX_SIMD_MATH_CALL_(ilogb) + +// logarithms {{{ +_GLIBCXX_SIMD_MATH_CALL_(log) +_GLIBCXX_SIMD_MATH_CALL_(log10) +_GLIBCXX_SIMD_MATH_CALL_(log1p) +_GLIBCXX_SIMD_MATH_CALL_(log2) + +//}}} +// logb{{{ +template <typename _Tp, typename _Abi> + enable_if_t<is_floating_point<_Tp>::value, simd<_Tp, _Abi>> + logb(const simd<_Tp, _Abi>& __x) + { + constexpr size_t _Np = simd_size_v<_Tp, _Abi>; + if constexpr (_Np 
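
The generic fallback manufactures the mantissa by forcing the exponent field to that of 0.5 while keeping sign and mantissa bits. In scalar binary32 terms (a hedged sketch of the same bit-pattern arithmetic; assumes a normal, nonzero input):

#include <cstdint>
#include <cstring>

float frexp_mantissa(float x)   // isnormal(x) assumed
{
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  // (exponent_mask | x) sets the exponent field to all-ones; ANDing with the
  // bit pattern of -(2 - epsilon)/2 == 0xbf7fffff then overwrites it with the
  // exponent of 0.5 while preserving sign and mantissa bits:
  bits = 0xbf7fffffu & (bits | 0x7f800000u);
  float r;
  std::memcpy(&r, &bits, sizeof r);
  return r;                     // +/-[0.5, 1)
}
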
== 1) + return std::logb(__x[0]); + else if constexpr (__is_fixed_size_abi_v<_Abi>) + { + return {__private_init, + __data(__x)._M_apply_per_chunk([](auto __impl, auto __xx) { + using _V = typename decltype(__impl)::simd_type; + return __data( + std::experimental::logb(_V(__private_init, __xx))); + })}; + } +#if _GLIBCXX_SIMD_X86INTRIN // {{{ + else if constexpr (__have_avx512vl && __is_sse_ps<_Tp, _Np>()) + return {__private_init, + __auto_bitcast(_mm_getexp_ps(__to_intrin(__as_vector(__x))))}; + else if constexpr (__have_avx512vl && __is_sse_pd<_Tp, _Np>()) + return {__private_init, _mm_getexp_pd(__data(__x))}; + else if constexpr (__have_avx512vl && __is_avx_ps<_Tp, _Np>()) + return {__private_init, _mm256_getexp_ps(__data(__x))}; + else if constexpr (__have_avx512vl && __is_avx_pd<_Tp, _Np>()) + return {__private_init, _mm256_getexp_pd(__data(__x))}; + else if constexpr (__have_avx512f && __is_avx_ps<_Tp, _Np>()) + return {__private_init, + __lo256(_mm512_getexp_ps(__auto_bitcast(__data(__x))))}; + else if constexpr (__have_avx512f && __is_avx_pd<_Tp, _Np>()) + return {__private_init, + __lo256(_mm512_getexp_pd(__auto_bitcast(__data(__x))))}; + else if constexpr (__is_avx512_ps<_Tp, _Np>()) + return {__private_init, _mm512_getexp_ps(__data(__x))}; + else if constexpr (__is_avx512_pd<_Tp, _Np>()) + return {__private_init, _mm512_getexp_pd(__data(__x))}; +#endif // _GLIBCXX_SIMD_X86INTRIN }}} + else + { + using _V = simd<_Tp, _Abi>; + using namespace std::experimental::__proposed; + auto __is_normal = isnormal(__x); + + // work on abs(__x) to reflect the return value on Linux for negative + // inputs (domain-error => implementation-defined value is returned) + const _V abs_x = abs(__x); + + // __exponent(__x) returns the exponent value (bias removed) as + // simd<_Up> with integral _Up + auto&& __exponent = [](const _V& __v) { + using namespace std::experimental::__proposed; + using _IV = rebind_simd_t< + conditional_t<sizeof(_Tp) == sizeof(_LLong), _LLong, int>, _V>; + return (__bit_cast<_IV>(__v) >> (__digits_v<_Tp> - 1)) + - (__max_exponent_v<_Tp> - 1); + }; + _V __r = static_simd_cast<_V>(__exponent(abs_x)); + if (_GLIBCXX_SIMD_IS_LIKELY(all_of(__is_normal))) + // without corner cases (nan, inf, subnormal, zero) we have our + // answer: + return __r; + const auto __is_zero = __x == 0; + const auto __is_nan = isnan(__x); + const auto __is_inf = isinf(__x); + where(__is_zero, __r) = -__infinity_v<_Tp>; + where(__is_nan, __r) = __x; + where(__is_inf, __r) = __infinity_v<_Tp>; + __is_normal |= __is_zero || __is_nan || __is_inf; + if (all_of(__is_normal)) + // at this point everything but subnormals is handled + return __r; + // subnormals repeat the exponent extraction after multiplication of the + // input with __a floating point value that has 112 (0x70) in its exponent + // (not too big for sp and large enough for dp) + const _V __scaled = abs_x * _Tp(0x1p112); + _V __scaled_exp = static_simd_cast<_V>(__exponent(__scaled) - 112); + where(__is_normal, __scaled_exp) = __r; + return __scaled_exp; + } + } + +//}}} +template <typename _Tp, typename _Abi> + enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>> + modf(const simd<_Tp, _Abi>& __x, simd<_Tp, _Abi>* __iptr) + { + if constexpr (__is_scalar_abi<_Abi>() + || (__is_fixed_size_abi_v< + _Abi> && simd_size_v<_Tp, _Abi> == 1)) + { + _Tp __tmp; + _Tp __r = std::modf(__x[0], &__tmp); + __iptr[0] = __tmp; + return __r; + } + else + { + const auto __integral = trunc(__x); + *__iptr = __integral; + auto __r = __x - __integral; +#if 
!__FINITE_MATH_ONLY__ + where(isinf(__x), __r) = _Tp(); +#endif + return copysign(__r, __x); + } + } + +_GLIBCXX_SIMD_MATH_CALL2_(scalbn, int) +_GLIBCXX_SIMD_MATH_CALL2_(scalbln, long) + +_GLIBCXX_SIMD_MATH_CALL_(cbrt) + +_GLIBCXX_SIMD_MATH_CALL_(abs) +_GLIBCXX_SIMD_MATH_CALL_(fabs) + +// [parallel.simd.math] only asks for is_floating_point_v<_Tp> and forgot to +// allow signed integral _Tp +template <typename _Tp, typename _Abi> + enable_if_t<!is_floating_point_v<_Tp> && is_signed_v<_Tp>, simd<_Tp, _Abi>> + abs(const simd<_Tp, _Abi>& __x) + { return {__private_init, _Abi::_SimdImpl::_S_abs(__data(__x))}; } + +template <typename _Tp, typename _Abi> + enable_if_t<!is_floating_point_v<_Tp> && is_signed_v<_Tp>, simd<_Tp, _Abi>> + fabs(const simd<_Tp, _Abi>& __x) + { return {__private_init, _Abi::_SimdImpl::_S_abs(__data(__x))}; } + +// the following are overloads for functions in <cstdlib> and not covered by +// [parallel.simd.math]. I don't see much value in making them work, though +/* +template <typename _Abi> simd<long, _Abi> labs(const simd<long, _Abi> &__x) +{ return {__private_init, _Abi::_SimdImpl::abs(__data(__x))}; } + +template <typename _Abi> simd<long long, _Abi> llabs(const simd<long long, _Abi> +&__x) +{ return {__private_init, _Abi::_SimdImpl::abs(__data(__x))}; } +*/ + +#define _GLIBCXX_SIMD_CVTING2(_NAME) \ +template <typename _Tp, typename _Abi> \ + _GLIBCXX_SIMD_INTRINSIC simd<_Tp, _Abi> _NAME( \ + const simd<_Tp, _Abi>& __x, const __type_identity_t<simd<_Tp, _Abi>>& __y) \ + { \ + return _NAME(__x, __y); \ + } \ + \ +template <typename _Tp, typename _Abi> \ + _GLIBCXX_SIMD_INTRINSIC simd<_Tp, _Abi> _NAME( \ + const __type_identity_t<simd<_Tp, _Abi>>& __x, const simd<_Tp, _Abi>& __y) \ + { \ + return _NAME(__x, __y); \ + } + +#define _GLIBCXX_SIMD_CVTING3(_NAME) \ +template <typename _Tp, typename _Abi> \ + _GLIBCXX_SIMD_INTRINSIC simd<_Tp, _Abi> _NAME( \ + const __type_identity_t<simd<_Tp, _Abi>>& __x, const simd<_Tp, _Abi>& __y, \ + const simd<_Tp, _Abi>& __z) \ + { \ + return _NAME(__x, __y, __z); \ + } \ + \ +template <typename _Tp, typename _Abi> \ + _GLIBCXX_SIMD_INTRINSIC simd<_Tp, _Abi> _NAME( \ + const simd<_Tp, _Abi>& __x, const __type_identity_t<simd<_Tp, _Abi>>& __y, \ + const simd<_Tp, _Abi>& __z) \ + { \ + return _NAME(__x, __y, __z); \ + } \ + \ +template <typename _Tp, typename _Abi> \ + _GLIBCXX_SIMD_INTRINSIC simd<_Tp, _Abi> _NAME( \ + const simd<_Tp, _Abi>& __x, const simd<_Tp, _Abi>& __y, \ + const __type_identity_t<simd<_Tp, _Abi>>& __z) \ + { \ + return _NAME(__x, __y, __z); \ + } \ + \ +template <typename _Tp, typename _Abi> \ + _GLIBCXX_SIMD_INTRINSIC simd<_Tp, _Abi> _NAME( \ + const simd<_Tp, _Abi>& __x, const __type_identity_t<simd<_Tp, _Abi>>& __y, \ + const __type_identity_t<simd<_Tp, _Abi>>& __z) \ + { \ + return _NAME(__x, __y, __z); \ + } \ + \ +template <typename _Tp, typename _Abi> \ + _GLIBCXX_SIMD_INTRINSIC simd<_Tp, _Abi> _NAME( \ + const __type_identity_t<simd<_Tp, _Abi>>& __x, const simd<_Tp, _Abi>& __y, \ + const __type_identity_t<simd<_Tp, _Abi>>& __z) \ + { \ + return _NAME(__x, __y, __z); \ + } \ + \ +template <typename _Tp, typename _Abi> \ + _GLIBCXX_SIMD_INTRINSIC simd<_Tp, _Abi> _NAME( \ + const __type_identity_t<simd<_Tp, _Abi>>& __x, \ + const __type_identity_t<simd<_Tp, _Abi>>& __y, const simd<_Tp, _Abi>& __z) \ + { \ + return _NAME(__x, __y, __z); \ + } + +template <typename _R, typename _ToApply, typename _Tp, typename... 
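
The extra overloads above paper over the [parallel.simd.math] wording defect: with them, abs works for any signed element type, not just floating point (hedged sketch):

#include <experimental/simd>
namespace stdx = std::experimental;

void abs_demo()
{
  stdx::native_simd<int>   vi(-3);
  stdx::native_simd<float> vf(-3.f);
  auto ai = stdx::abs(vi);   // via the signed-integral overload above
  auto af = stdx::abs(vf);   // via the [parallel.simd.math] overload
  (void) ai; (void) af;
}
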
_Tps> + _GLIBCXX_SIMD_INTRINSIC _R + __fixed_size_apply(_ToApply&& __apply, const _Tp& __arg0, + const _Tps&... __args) + { + return {__private_init, + __data(__arg0)._M_apply_per_chunk( + [&](auto __impl, const auto&... __inner) { + using _V = typename decltype(__impl)::simd_type; + return __data(__apply(_V(__private_init, __inner)...)); + }, + __data(__args)...)}; + } + +template <typename _VV> + __remove_cvref_t<_VV> + __hypot(_VV __x, _VV __y) + { + using _V = __remove_cvref_t<_VV>; + using _Tp = typename _V::value_type; + if constexpr (_V::size() == 1) + return std::hypot(_Tp(__x[0]), _Tp(__y[0])); + else if constexpr (__is_fixed_size_abi_v<typename _V::abi_type>) + { + return __fixed_size_apply<_V>([](auto __a, + auto __b) { return hypot(__a, __b); }, + __x, __y); + } + else + { + // A simple solution for _Tp == float would be to cast to double and + // simply calculate sqrt(x²+y²) as it can't over-/underflow anymore with + // dp. It still needs the Annex F fixups though and isn't faster on + // Skylake-AVX512 (not even for SSE and AVX vectors, and really bad for + // AVX-512). + using namespace __float_bitwise_operators; + _V __absx = abs(__x); // no error + _V __absy = abs(__y); // no error + _V __hi = max(__absx, __absy); // no error + _V __lo = min(__absy, __absx); // no error + + // round __hi down to the next power-of-2: + _GLIBCXX_SIMD_USE_CONSTEXPR_API _V __inf(__infinity_v<_Tp>); + +#ifndef __FAST_MATH__ + if constexpr (__have_neon && !__have_neon_a32) + { // With ARMv7 NEON, we have no subnormals and must use slightly + // different strategy + const _V __hi_exp = __hi & __inf; + _V __scale_back = __hi_exp; + // For large exponents (max & max/2) the inversion comes too close + // to subnormals. Subtract 3 from the exponent: + where(__hi_exp > 1, __scale_back) = __hi_exp * _Tp(0.125); + // Invert and adjust for the off-by-one error of inversion via xor: + const _V __scale = (__scale_back ^ __inf) * _Tp(.5); + const _V __h1 = __hi * __scale; + const _V __l1 = __lo * __scale; + _V __r = __scale_back * sqrt(__h1 * __h1 + __l1 * __l1); + // Fix up hypot(0, 0) to not be NaN: + where(__hi == 0, __r) = 0; + return __r; + } +#endif + +#ifdef __FAST_MATH__ + // With fast-math, ignore precision of subnormals and inputs from + // __finite_max_v/2 to __finite_max_v. This removes all + // branching/masking. + if constexpr (true) +#else + if (_GLIBCXX_SIMD_IS_LIKELY(all_of(isnormal(__x)) + && all_of(isnormal(__y)))) +#endif + { + const _V __hi_exp = __hi & __inf; + //((__hi + __hi) & __inf) ^ __inf almost works for computing + //__scale, + // except when (__hi + __hi) & __inf == __inf, in which case __scale + // becomes 0 (should be min/2 instead) and thus loses the + // information from __lo. 
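
The scaling step avoids a division: for a power of two, XOR-ing the exponent field with the infinity bit pattern complements the biased exponent, which is the reciprocal up to a factor of two. A scalar sketch of the same trick (binary32, normal input assumed):

#include <cstdint>
#include <cstring>

float reciprocal_pow2(float hi_exp)   // hi_exp == 2^k, normal
{
  std::uint32_t bits;
  std::memcpy(&bits, &hi_exp, sizeof bits);
  bits ^= 0x7f800000u;                // complement exponent; off by one step
  float r;
  std::memcpy(&r, &bits, sizeof r);
  return r * 0.5f;                    // exactly 2^-k
}
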
+#ifdef __FAST_MATH__ + using _Ip = __int_for_sizeof_t<_Tp>; + using _IV = rebind_simd_t<_Ip, _V>; + const auto __as_int = __bit_cast<_IV>(__hi_exp); + const _V __scale + = __bit_cast<_V>(2 * __bit_cast<_Ip>(_Tp(1)) - __as_int); +#else + const _V __scale = (__hi_exp ^ __inf) * _Tp(.5); +#endif + _GLIBCXX_SIMD_USE_CONSTEXPR_API _V __mant_mask + = __norm_min_v<_Tp> - __denorm_min_v<_Tp>; + const _V __h1 = (__hi & __mant_mask) | _V(1); + const _V __l1 = __lo * __scale; + return __hi_exp * sqrt(__h1 * __h1 + __l1 * __l1); + } + else + { + // slower path to support subnormals + // if __hi is subnormal, avoid scaling by inf & final mul by 0 + // (which yields NaN) by using min() + _V __scale = _V(1 / __norm_min_v<_Tp>); + // invert exponent w/o error and w/o using the slow divider unit: + // xor inverts the exponent but off by 1. Multiplication with .5 + // adjusts for the discrepancy. + where(__hi >= __norm_min_v<_Tp>, __scale) + = ((__hi & __inf) ^ __inf) * _Tp(.5); + // adjust final exponent for subnormal inputs + _V __hi_exp = __norm_min_v<_Tp>; + where(__hi >= __norm_min_v<_Tp>, __hi_exp) + = __hi & __inf; // no error + _V __h1 = __hi * __scale; // no error + _V __l1 = __lo * __scale; // no error + + // sqrt(x²+y²) = e*sqrt((x/e)²+(y/e)²): + // this ensures no overflow in the argument to sqrt + _V __r = __hi_exp * sqrt(__h1 * __h1 + __l1 * __l1); +#ifdef __STDC_IEC_559__ + // fixup for Annex F requirements + // the naive fixup goes like this: + // + // where(__l1 == 0, __r) = __hi; + // where(isunordered(__x, __y), __r) = __quiet_NaN_v<_Tp>; + // where(isinf(__absx) || isinf(__absy), __r) = __inf; + // + // The fixup can be prepared in parallel with the sqrt, requiring a + // single blend step after hi_exp * sqrt, reducing latency and + // throughput: + _V __fixup = __hi; // __lo == 0 + where(isunordered(__x, __y), __fixup) = __quiet_NaN_v<_Tp>; + where(isinf(__absx) || isinf(__absy), __fixup) = __inf; + where(!(__lo == 0 || isunordered(__x, __y) + || (isinf(__absx) || isinf(__absy))), + __fixup) + = __r; + __r = __fixup; +#endif + return __r; + } + } + } + +template <typename _Tp, typename _Abi> + _GLIBCXX_SIMD_INTRINSIC simd<_Tp, _Abi> + hypot(const simd<_Tp, _Abi>& __x, const simd<_Tp, _Abi>& __y) + { + return __hypot<conditional_t<__is_fixed_size_abi_v<_Abi>, + const simd<_Tp, _Abi>&, simd<_Tp, _Abi>>>(__x, + __y); + } + +_GLIBCXX_SIMD_CVTING2(hypot) + + template <typename _VV> + __remove_cvref_t<_VV> + __hypot(_VV __x, _VV __y, _VV __z) + { + using _V = __remove_cvref_t<_VV>; + using _Abi = typename _V::abi_type; + using _Tp = typename _V::value_type; + /* FIXME: enable after PR77776 is resolved + if constexpr (_V::size() == 1) + return std::hypot(_Tp(__x[0]), _Tp(__y[0]), _Tp(__z[0])); + else + */ + if constexpr (__is_fixed_size_abi_v<_Abi> && _V::size() > 1) + { + return __fixed_size_apply<simd<_Tp, _Abi>>( + [](auto __a, auto __b, auto __c) { return hypot(__a, __b, __c); }, + __x, __y, __z); + } + else + { + using namespace __float_bitwise_operators; + const _V __absx = abs(__x); // no error + const _V __absy = abs(__y); // no error + const _V __absz = abs(__z); // no error + _V __hi = max(max(__absx, __absy), __absz); // no error + _V __l0 = min(__absz, max(__absx, __absy)); // no error + _V __l1 = min(__absy, __absx); // no error + if constexpr (__digits_v<_Tp> == 64 && __max_exponent_v<_Tp> == 0x4000 + && __min_exponent_v<_Tp> == -0x3FFD && _V::size() == 1) + { // Seems like x87 fp80, where bit 63 is always 1 unless subnormal or + // NaN. 
In this case the bit-tricks don't work, they require IEC559 + // binary32 or binary64 format. +#ifdef __STDC_IEC_559__ + // fixup for Annex F requirements + if (isinf(__absx[0]) || isinf(__absy[0]) || isinf(__absz[0])) + return __infinity_v<_Tp>; + else if (isunordered(__absx[0], __absy[0] + __absz[0])) + return __quiet_NaN_v<_Tp>; + else if (__l0[0] == 0 && __l1[0] == 0) + return __hi; +#endif + _V __hi_exp = __hi; + const _ULLong __tmp = 0x8000'0000'0000'0000ull; + __builtin_memcpy(&__data(__hi_exp), &__tmp, 8); + const _V __scale = 1 / __hi_exp; + __hi *= __scale; + __l0 *= __scale; + __l1 *= __scale; + return __hi_exp * sqrt((__l0 * __l0 + __l1 * __l1) + __hi * __hi); + } + else + { + // round __hi down to the next power-of-2: + _GLIBCXX_SIMD_USE_CONSTEXPR_API _V __inf(__infinity_v<_Tp>); + +#ifndef __FAST_MATH__ + if constexpr (_V::size() > 1 && __have_neon && !__have_neon_a32) + { // With ARMv7 NEON, we have no subnormals and must use slightly + // different strategy + const _V __hi_exp = __hi & __inf; + _V __scale_back = __hi_exp; + // For large exponents (max & max/2) the inversion comes too + // close to subnormals. Subtract 3 from the exponent: + where(__hi_exp > 1, __scale_back) = __hi_exp * _Tp(0.125); + // Invert and adjust for the off-by-one error of inversion via + // xor: + const _V __scale = (__scale_back ^ __inf) * _Tp(.5); + const _V __h1 = __hi * __scale; + __l0 *= __scale; + __l1 *= __scale; + _V __lo = __l0 * __l0 + + __l1 * __l1; // add the two smaller values first + asm("" : "+m"(__lo)); + _V __r = __scale_back * sqrt(__h1 * __h1 + __lo); + // Fix up hypot(0, 0, 0) to not be NaN: + where(__hi == 0, __r) = 0; + return __r; + } +#endif + +#ifdef __FAST_MATH__ + // With fast-math, ignore precision of subnormals and inputs from + // __finite_max_v/2 to __finite_max_v. This removes all + // branching/masking. + if constexpr (true) +#else + if (_GLIBCXX_SIMD_IS_LIKELY(all_of(isnormal(__x)) + && all_of(isnormal(__y)) + && all_of(isnormal(__z)))) +#endif + { + const _V __hi_exp = __hi & __inf; + //((__hi + __hi) & __inf) ^ __inf almost works for computing + //__scale, except when (__hi + __hi) & __inf == __inf, in which + // case __scale + // becomes 0 (should be min/2 instead) and thus loses the + // information from __lo. +#ifdef __FAST_MATH__ + using _Ip = __int_for_sizeof_t<_Tp>; + using _IV = rebind_simd_t<_Ip, _V>; + const auto __as_int = __bit_cast<_IV>(__hi_exp); + const _V __scale + = __bit_cast<_V>(2 * __bit_cast<_Ip>(_Tp(1)) - __as_int); +#else + const _V __scale = (__hi_exp ^ __inf) * _Tp(.5); +#endif + constexpr _Tp __mant_mask + = __norm_min_v<_Tp> - __denorm_min_v<_Tp>; + const _V __h1 = (__hi & _V(__mant_mask)) | _V(1); + __l0 *= __scale; + __l1 *= __scale; + const _V __lo + = __l0 * __l0 + + __l1 * __l1; // add the two smaller values first + return __hi_exp * sqrt(__lo + __h1 * __h1); + } + else + { + // slower path to support subnormals + // if __hi is subnormal, avoid scaling by inf & final mul by 0 + // (which yields NaN) by using min() + _V __scale = _V(1 / __norm_min_v<_Tp>); + // invert exponent w/o error and w/o using the slow divider + // unit: xor inverts the exponent but off by 1. Multiplication + // with .5 adjusts for the discrepancy. 
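+		// E.g., assuming IEC559 binary32: __hi & __inf == 4.f has
+		// biased exponent 129; xor with __inf complements the
+		// exponent field (129 ^ 0xff == 126, i.e. 0.5f == 2 / 4.f),
+		// and the subsequent * .5 yields the exact reciprocal
+		// 0.25f == 1 / 4.f.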
+ where(__hi >= __norm_min_v<_Tp>, __scale) + = ((__hi & __inf) ^ __inf) * _Tp(.5); + // adjust final exponent for subnormal inputs + _V __hi_exp = __norm_min_v<_Tp>; + where(__hi >= __norm_min_v<_Tp>, __hi_exp) + = __hi & __inf; // no error + _V __h1 = __hi * __scale; // no error + __l0 *= __scale; // no error + __l1 *= __scale; // no error + _V __lo = __l0 * __l0 + + __l1 * __l1; // add the two smaller values first + _V __r = __hi_exp * sqrt(__lo + __h1 * __h1); +#ifdef __STDC_IEC_559__ + // fixup for Annex F requirements + _V __fixup = __hi; // __lo == 0 + // where(__lo == 0, __fixup) = __hi; + where(isunordered(__x, __y + __z), __fixup) + = __quiet_NaN_v<_Tp>; + where(isinf(__absx) || isinf(__absy) || isinf(__absz), __fixup) + = __inf; + // Instead of __lo == 0, the following could depend on __h1² == + // __h1² + __lo (i.e. __hi is so much larger than the other two + // inputs that the result is exactly __hi). While this may + // improve precision, it is likely to reduce efficiency if the + // ISA has FMAs (because __h1² + __lo is an FMA, but the + // intermediate + // __h1² must be kept) + where(!(__lo == 0 || isunordered(__x, __y + __z) + || isinf(__absx) || isinf(__absy) || isinf(__absz)), + __fixup) + = __r; + __r = __fixup; +#endif + return __r; + } + } + } + } + + template <typename _Tp, typename _Abi> + _GLIBCXX_SIMD_INTRINSIC simd<_Tp, _Abi> + hypot(const simd<_Tp, _Abi>& __x, const simd<_Tp, _Abi>& __y, + const simd<_Tp, _Abi>& __z) + { + return __hypot<conditional_t<__is_fixed_size_abi_v<_Abi>, + const simd<_Tp, _Abi>&, simd<_Tp, _Abi>>>(__x, + __y, + __z); + } + +_GLIBCXX_SIMD_CVTING3(hypot) + +_GLIBCXX_SIMD_MATH_CALL2_(pow, _Tp) + +_GLIBCXX_SIMD_MATH_CALL_(sqrt) +_GLIBCXX_SIMD_MATH_CALL_(erf) +_GLIBCXX_SIMD_MATH_CALL_(erfc) +_GLIBCXX_SIMD_MATH_CALL_(lgamma) +_GLIBCXX_SIMD_MATH_CALL_(tgamma) +_GLIBCXX_SIMD_MATH_CALL_(ceil) +_GLIBCXX_SIMD_MATH_CALL_(floor) +_GLIBCXX_SIMD_MATH_CALL_(nearbyint) +_GLIBCXX_SIMD_MATH_CALL_(rint) +_GLIBCXX_SIMD_MATH_CALL_(lrint) +_GLIBCXX_SIMD_MATH_CALL_(llrint) + +_GLIBCXX_SIMD_MATH_CALL_(round) +_GLIBCXX_SIMD_MATH_CALL_(lround) +_GLIBCXX_SIMD_MATH_CALL_(llround) + +_GLIBCXX_SIMD_MATH_CALL_(trunc) + +_GLIBCXX_SIMD_MATH_CALL2_(fmod, _Tp) +_GLIBCXX_SIMD_MATH_CALL2_(remainder, _Tp) +_GLIBCXX_SIMD_MATH_CALL3_(remquo, _Tp, int*) + +template <typename _Tp, typename _Abi> + enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>> + copysign(const simd<_Tp, _Abi>& __x, const simd<_Tp, _Abi>& __y) + { + if constexpr (simd_size_v<_Tp, _Abi> == 1) + return std::copysign(__x[0], __y[0]); + else if constexpr (is_same_v<_Tp, long double> && sizeof(_Tp) == 12) + // Remove this case once __bit_cast is implemented via __builtin_bit_cast. + // It is necessary, because __signmask below cannot be computed at compile + // time. 
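+    // (For IEC559 formats, the __signmask used in the else branch below,
+    // i.e. _V(1) ^ _V(-1), has only the sign bit set in each element,
+    // e.g. 0x8000'0000 per element for binary32.)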
+ return simd<_Tp, _Abi>( + [&](auto __i) { return std::copysign(__x[__i], __y[__i]); }); + else + { + using _V = simd<_Tp, _Abi>; + using namespace std::experimental::__float_bitwise_operators; + _GLIBCXX_SIMD_USE_CONSTEXPR_API auto __signmask = _V(1) ^ _V(-1); + return (__x & (__x ^ __signmask)) | (__y & __signmask); + } + } + +_GLIBCXX_SIMD_MATH_CALL2_(nextafter, _Tp) +// not covered in [parallel.simd.math]: +// _GLIBCXX_SIMD_MATH_CALL2_(nexttoward, long double) +_GLIBCXX_SIMD_MATH_CALL2_(fdim, _Tp) +_GLIBCXX_SIMD_MATH_CALL2_(fmax, _Tp) +_GLIBCXX_SIMD_MATH_CALL2_(fmin, _Tp) + +_GLIBCXX_SIMD_MATH_CALL3_(fma, _Tp, _Tp) +_GLIBCXX_SIMD_MATH_CALL_(fpclassify) +_GLIBCXX_SIMD_MATH_CALL_(isfinite) + +// isnan and isinf require special treatment because old glibc may declare +// `int isinf(double)`. +template <typename _Tp, typename _Abi, typename..., + typename _R = _Math_return_type_t<bool, _Tp, _Abi>> + enable_if_t<is_floating_point_v<_Tp>, _R> + isinf(simd<_Tp, _Abi> __x) + { return {__private_init, _Abi::_SimdImpl::_S_isinf(__data(__x))}; } + +template <typename _Tp, typename _Abi, typename..., + typename _R = _Math_return_type_t<bool, _Tp, _Abi>> + enable_if_t<is_floating_point_v<_Tp>, _R> + isnan(simd<_Tp, _Abi> __x) + { return {__private_init, _Abi::_SimdImpl::_S_isnan(__data(__x))}; } + +_GLIBCXX_SIMD_MATH_CALL_(isnormal) + +template <typename..., typename _Tp, typename _Abi> + simd_mask<_Tp, _Abi> + signbit(simd<_Tp, _Abi> __x) + { + if constexpr (is_integral_v<_Tp>) + { + if constexpr (is_unsigned_v<_Tp>) + return simd_mask<_Tp, _Abi>{}; // false + else + return __x < 0; + } + else + return {__private_init, _Abi::_SimdImpl::_S_signbit(__data(__x))}; + } + +_GLIBCXX_SIMD_MATH_CALL2_(isgreater, _Tp) +_GLIBCXX_SIMD_MATH_CALL2_(isgreaterequal, _Tp) +_GLIBCXX_SIMD_MATH_CALL2_(isless, _Tp) +_GLIBCXX_SIMD_MATH_CALL2_(islessequal, _Tp) +_GLIBCXX_SIMD_MATH_CALL2_(islessgreater, _Tp) +_GLIBCXX_SIMD_MATH_CALL2_(isunordered, _Tp) + +/* not covered in [parallel.simd.math] +template <typename _Abi> __doublev<_Abi> nan(const char* tagp); +template <typename _Abi> __floatv<_Abi> nanf(const char* tagp); +template <typename _Abi> __ldoublev<_Abi> nanl(const char* tagp); + +template <typename _V> struct simd_div_t { + _V quot, rem; +}; + +template <typename _Abi> +simd_div_t<_SCharv<_Abi>> div(_SCharv<_Abi> numer, + _SCharv<_Abi> denom); +template <typename _Abi> +simd_div_t<__shortv<_Abi>> div(__shortv<_Abi> numer, + __shortv<_Abi> denom); +template <typename _Abi> +simd_div_t<__intv<_Abi>> div(__intv<_Abi> numer, __intv<_Abi> denom); +template <typename _Abi> +simd_div_t<__longv<_Abi>> div(__longv<_Abi> numer, + __longv<_Abi> denom); +template <typename _Abi> +simd_div_t<__llongv<_Abi>> div(__llongv<_Abi> numer, + __llongv<_Abi> denom); +*/ + +// special math {{{ +template <typename _Tp, typename _Abi> + enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>> + assoc_laguerre(const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __n, + const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __m, + const simd<_Tp, _Abi>& __x) + { + return simd<_Tp, _Abi>([&](auto __i) { + return std::assoc_laguerre(__n[__i], __m[__i], __x[__i]); + }); + } + +template <typename _Tp, typename _Abi> + enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>> + assoc_legendre(const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __n, + const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __m, + const simd<_Tp, _Abi>& __x) + { + return simd<_Tp, _Abi>([&](auto __i) { + return std::assoc_legendre(__n[__i], __m[__i], 
__x[__i]); + }); + } + +_GLIBCXX_SIMD_MATH_CALL2_(beta, _Tp) +_GLIBCXX_SIMD_MATH_CALL_(comp_ellint_1) +_GLIBCXX_SIMD_MATH_CALL_(comp_ellint_2) +_GLIBCXX_SIMD_MATH_CALL2_(comp_ellint_3, _Tp) +_GLIBCXX_SIMD_MATH_CALL2_(cyl_bessel_i, _Tp) +_GLIBCXX_SIMD_MATH_CALL2_(cyl_bessel_j, _Tp) +_GLIBCXX_SIMD_MATH_CALL2_(cyl_bessel_k, _Tp) +_GLIBCXX_SIMD_MATH_CALL2_(cyl_neumann, _Tp) +_GLIBCXX_SIMD_MATH_CALL2_(ellint_1, _Tp) +_GLIBCXX_SIMD_MATH_CALL2_(ellint_2, _Tp) +_GLIBCXX_SIMD_MATH_CALL3_(ellint_3, _Tp, _Tp) +_GLIBCXX_SIMD_MATH_CALL_(expint) + +template <typename _Tp, typename _Abi> + enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>> + hermite(const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __n, + const simd<_Tp, _Abi>& __x) + { + return simd<_Tp, _Abi>( + [&](auto __i) { return std::hermite(__n[__i], __x[__i]); }); + } + +template <typename _Tp, typename _Abi> + enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>> + laguerre(const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __n, + const simd<_Tp, _Abi>& __x) + { + return simd<_Tp, _Abi>( + [&](auto __i) { return std::laguerre(__n[__i], __x[__i]); }); + } + +template <typename _Tp, typename _Abi> + enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>> + legendre(const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __n, + const simd<_Tp, _Abi>& __x) + { + return simd<_Tp, _Abi>( + [&](auto __i) { return std::legendre(__n[__i], __x[__i]); }); + } + +_GLIBCXX_SIMD_MATH_CALL_(riemann_zeta) + +template <typename _Tp, typename _Abi> + enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>> + sph_bessel(const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __n, + const simd<_Tp, _Abi>& __x) + { + return simd<_Tp, _Abi>( + [&](auto __i) { return std::sph_bessel(__n[__i], __x[__i]); }); + } + +template <typename _Tp, typename _Abi> + enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>> + sph_legendre(const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __l, + const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __m, + const simd<_Tp, _Abi>& __theta) + { + return simd<_Tp, _Abi>([&](auto __i) { + return std::sph_legendre(__l[__i], __m[__i], __theta[__i]); + }); + } + +template <typename _Tp, typename _Abi> + enable_if_t<is_floating_point_v<_Tp>, simd<_Tp, _Abi>> + sph_neumann(const fixed_size_simd<unsigned, simd_size_v<_Tp, _Abi>>& __n, + const simd<_Tp, _Abi>& __x) + { + return simd<_Tp, _Abi>( + [&](auto __i) { return std::sph_neumann(__n[__i], __x[__i]); }); + } +// }}} + +#undef _GLIBCXX_SIMD_MATH_CALL_ +#undef _GLIBCXX_SIMD_MATH_CALL2_ +#undef _GLIBCXX_SIMD_MATH_CALL3_ + +_GLIBCXX_SIMD_END_NAMESPACE + +#endif // __cplusplus >= 201703L +#endif // _GLIBCXX_EXPERIMENTAL_SIMD_MATH_H_ + +// vim: foldmethod=marker sw=2 ts=8 noet sts=2 diff --git a/libstdc++-v3/include/experimental/bits/simd_neon.h b/libstdc++-v3/include/experimental/bits/simd_neon.h new file mode 100644 index 00000000000..a3a8ffe165f --- /dev/null +++ b/libstdc++-v3/include/experimental/bits/simd_neon.h @@ -0,0 +1,519 @@ +// Simd NEON specific implementations -*- C++ -*- + +// Copyright (C) 2020 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. 
+ +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// Under Section 7 of GPL version 3, you are granted additional +// permissions described in the GCC Runtime Library Exception, version +// 3.1, as published by the Free Software Foundation. + +// You should have received a copy of the GNU General Public License and +// a copy of the GCC Runtime Library Exception along with this program; +// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +// <http://www.gnu.org/licenses/>. + +#ifndef _GLIBCXX_EXPERIMENTAL_SIMD_NEON_H_ +#define _GLIBCXX_EXPERIMENTAL_SIMD_NEON_H_ + +#if __cplusplus >= 201703L + +#if !_GLIBCXX_SIMD_HAVE_NEON +#error "simd_neon.h may only be included when NEON on ARM is available" +#endif + +_GLIBCXX_SIMD_BEGIN_NAMESPACE + +// _CommonImplNeon {{{ +struct _CommonImplNeon : _CommonImplBuiltin +{ + // _S_store {{{ + using _CommonImplBuiltin::_S_store; + + // }}} +}; + +// }}} +// _SimdImplNeon {{{ +template <typename _Abi> + struct _SimdImplNeon : _SimdImplBuiltin<_Abi> + { + using _Base = _SimdImplBuiltin<_Abi>; + + template <typename _Tp> + using _MaskMember = typename _Base::template _MaskMember<_Tp>; + + template <typename _Tp> + static constexpr size_t _S_max_store_size = 16; + + // _S_masked_load {{{ + template <typename _Tp, size_t _Np, typename _Up> + static inline _SimdWrapper<_Tp, _Np> + _S_masked_load(_SimdWrapper<_Tp, _Np> __merge, _MaskMember<_Tp> __k, + const _Up* __mem) noexcept + { + __execute_n_times<_Np>([&](auto __i) { + if (__k[__i] != 0) + __merge._M_set(__i, static_cast<_Tp>(__mem[__i])); + }); + return __merge; + } + + // }}} + // _S_masked_store_nocvt {{{ + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static void + _S_masked_store_nocvt(_SimdWrapper<_Tp, _Np> __v, _Tp* __mem, + _MaskMember<_Tp> __k) + { + __execute_n_times<_Np>([&](auto __i) { + if (__k[__i] != 0) + __mem[__i] = __v[__i]; + }); + } + + // }}} + // _S_reduce {{{ + template <typename _Tp, typename _BinaryOperation> + _GLIBCXX_SIMD_INTRINSIC static _Tp + _S_reduce(simd<_Tp, _Abi> __x, _BinaryOperation&& __binary_op) + { + constexpr size_t _Np = __x.size(); + if constexpr (sizeof(__x) == 16 && _Np >= 4 + && !_Abi::template _S_is_partial<_Tp>) + { + const auto __halves = split<simd<_Tp, simd_abi::_Neon<8>>>(__x); + const auto __y = __binary_op(__halves[0], __halves[1]); + return _SimdImplNeon<simd_abi::_Neon<8>>::_S_reduce( + __y, static_cast<_BinaryOperation&&>(__binary_op)); + } + else if constexpr (_Np == 8) + { + __x = __binary_op(__x, _Base::template _M_make_simd<_Tp, _Np>( + __vector_permute<1, 0, 3, 2, 5, 4, 7, 6>( + __x._M_data))); + __x = __binary_op(__x, _Base::template _M_make_simd<_Tp, _Np>( + __vector_permute<3, 2, 1, 0, 7, 6, 5, 4>( + __x._M_data))); + __x = __binary_op(__x, _Base::template _M_make_simd<_Tp, _Np>( + __vector_permute<7, 6, 5, 4, 3, 2, 1, 0>( + __x._M_data))); + return __x[0]; + } + else if constexpr (_Np == 4) + { + __x + = __binary_op(__x, _Base::template _M_make_simd<_Tp, _Np>( + __vector_permute<1, 0, 3, 2>(__x._M_data))); + __x + = __binary_op(__x, _Base::template _M_make_simd<_Tp, _Np>( + __vector_permute<3, 2, 1, 0>(__x._M_data))); + return __x[0]; + } + else if constexpr (_Np == 2) + { + __x = __binary_op(__x, _Base::template _M_make_simd<_Tp, _Np>( + __vector_permute<1, 0>(__x._M_data))); + return __x[0]; + } + else + 
return _Base::_S_reduce(__x, + static_cast<_BinaryOperation&&>(__binary_op)); + } + + // }}} + // math {{{ + // _S_sqrt {{{ + template <typename _Tp, typename _TVT = _VectorTraits<_Tp>> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_sqrt(_Tp __x) + { + if constexpr (__have_neon_a64) + { + const auto __intrin = __to_intrin(__x); + if constexpr (_TVT::template _S_is<float, 2>) + return vsqrt_f32(__intrin); + else if constexpr (_TVT::template _S_is<float, 4>) + return vsqrtq_f32(__intrin); + else if constexpr (_TVT::template _S_is<double, 1>) + return vsqrt_f64(__intrin); + else if constexpr (_TVT::template _S_is<double, 2>) + return vsqrtq_f64(__intrin); + else + __assert_unreachable<_Tp>(); + } + else + return _Base::_S_sqrt(__x); + } + + // }}} + // _S_trunc {{{ + template <typename _TW, typename _TVT = _VectorTraits<_TW>> + _GLIBCXX_SIMD_INTRINSIC static _TW _S_trunc(_TW __x) + { + using _Tp = typename _TVT::value_type; + if constexpr (__have_neon_a32) + { + const auto __intrin = __to_intrin(__x); + if constexpr (_TVT::template _S_is<float, 2>) + return vrnd_f32(__intrin); + else if constexpr (_TVT::template _S_is<float, 4>) + return vrndq_f32(__intrin); + else if constexpr (_TVT::template _S_is<double, 1>) + return vrnd_f64(__intrin); + else if constexpr (_TVT::template _S_is<double, 2>) + return vrndq_f64(__intrin); + else + __assert_unreachable<_Tp>(); + } + else if constexpr (is_same_v<_Tp, float>) + { + auto __intrin = __to_intrin(__x); + if constexpr (sizeof(__x) == 16) + __intrin = vcvtq_f32_s32(vcvtq_s32_f32(__intrin)); + else + __intrin = vcvt_f32_s32(vcvt_s32_f32(__intrin)); + return _Base::_S_abs(__x)._M_data < 0x1p23f + ? __vector_bitcast<float>(__intrin) + : __x._M_data; + } + else + return _Base::_S_trunc(__x); + } + + // }}} + // _S_round {{{ + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static _SimdWrapper<_Tp, _Np> + _S_round(_SimdWrapper<_Tp, _Np> __x) + { + if constexpr (__have_neon_a32) + { + const auto __intrin = __to_intrin(__x); + if constexpr (sizeof(_Tp) == 4 && sizeof(__x) == 8) + return vrnda_f32(__intrin); + else if constexpr (sizeof(_Tp) == 4 && sizeof(__x) == 16) + return vrndaq_f32(__intrin); + else if constexpr (sizeof(_Tp) == 8 && sizeof(__x) == 8) + return vrnda_f64(__intrin); + else if constexpr (sizeof(_Tp) == 8 && sizeof(__x) == 16) + return vrndaq_f64(__intrin); + else + __assert_unreachable<_Tp>(); + } + else + return _Base::_S_round(__x); + } + + // }}} + // _S_floor {{{ + template <typename _Tp, typename _TVT = _VectorTraits<_Tp>> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_floor(_Tp __x) + { + if constexpr (__have_neon_a32) + { + const auto __intrin = __to_intrin(__x); + if constexpr (_TVT::template _S_is<float, 2>) + return vrndm_f32(__intrin); + else if constexpr (_TVT::template _S_is<float, 4>) + return vrndmq_f32(__intrin); + else if constexpr (_TVT::template _S_is<double, 1>) + return vrndm_f64(__intrin); + else if constexpr (_TVT::template _S_is<double, 2>) + return vrndmq_f64(__intrin); + else + __assert_unreachable<_Tp>(); + } + else + return _Base::_S_floor(__x); + } + + // }}} + // _S_ceil {{{ + template <typename _Tp, typename _TVT = _VectorTraits<_Tp>> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_ceil(_Tp __x) + { + if constexpr (__have_neon_a32) + { + const auto __intrin = __to_intrin(__x); + if constexpr (_TVT::template _S_is<float, 2>) + return vrndp_f32(__intrin); + else if constexpr (_TVT::template _S_is<float, 4>) + return vrndpq_f32(__intrin); + else if constexpr (_TVT::template _S_is<double, 1>) + return vrndp_f64(__intrin); 
+ else if constexpr (_TVT::template _S_is<double, 2>) + return vrndpq_f64(__intrin); + else + __assert_unreachable<_Tp>(); + } + else + return _Base::_S_ceil(__x); + } + + //}}} }}} + }; // }}} +// _MaskImplNeonMixin {{{ +struct _MaskImplNeonMixin +{ + using _Base = _MaskImplBuiltinMixin; + + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SanitizedBitMask<_Np> + _S_to_bits(_SimdWrapper<_Tp, _Np> __x) + { + if (__builtin_is_constant_evaluated()) + return _Base::_S_to_bits(__x); + + using _I = __int_for_sizeof_t<_Tp>; + if constexpr (sizeof(__x) == 16) + { + auto __asint = __vector_bitcast<_I>(__x); +#ifdef __aarch64__ + [[maybe_unused]] constexpr auto __zero = decltype(__asint)(); +#else + [[maybe_unused]] constexpr auto __zero = decltype(__lo64(__asint))(); +#endif + if constexpr (sizeof(_Tp) == 1) + { + constexpr auto __bitsel + = __generate_from_n_evaluations<16, __vector_type_t<_I, 16>>( + [&](auto __i) { + return static_cast<_I>( + __i < _Np ? (__i < 8 ? 1 << __i : 1 << (__i - 8)) : 0); + }); + __asint &= __bitsel; +#ifdef __aarch64__ + return __vector_bitcast<_UShort>( + vpaddq_s8(vpaddq_s8(vpaddq_s8(__asint, __zero), __zero), + __zero))[0]; +#else + return __vector_bitcast<_UShort>( + vpadd_s8(vpadd_s8(vpadd_s8(__lo64(__asint), __hi64(__asint)), + __zero), + __zero))[0]; +#endif + } + else if constexpr (sizeof(_Tp) == 2) + { + constexpr auto __bitsel + = __generate_from_n_evaluations<8, __vector_type_t<_I, 8>>( + [&](auto __i) { + return static_cast<_I>(__i < _Np ? 1 << __i : 0); + }); + __asint &= __bitsel; +#ifdef __aarch64__ + return vpaddq_s16(vpaddq_s16(vpaddq_s16(__asint, __zero), __zero), + __zero)[0]; +#else + return vpadd_s16( + vpadd_s16(vpadd_s16(__lo64(__asint), __hi64(__asint)), __zero), + __zero)[0]; +#endif + } + else if constexpr (sizeof(_Tp) == 4) + { + constexpr auto __bitsel + = __generate_from_n_evaluations<4, __vector_type_t<_I, 4>>( + [&](auto __i) { + return static_cast<_I>(__i < _Np ? 1 << __i : 0); + }); + __asint &= __bitsel; +#ifdef __aarch64__ + return vpaddq_s32(vpaddq_s32(__asint, __zero), __zero)[0]; +#else + return vpadd_s32(vpadd_s32(__lo64(__asint), __hi64(__asint)), + __zero)[0]; +#endif + } + else if constexpr (sizeof(_Tp) == 8) + return (__asint[0] & 1) | (__asint[1] & 2); + else + __assert_unreachable<_Tp>(); + } + else if constexpr (sizeof(__x) == 8) + { + auto __asint = __vector_bitcast<_I>(__x); + [[maybe_unused]] constexpr auto __zero = decltype(__asint)(); + if constexpr (sizeof(_Tp) == 1) + { + constexpr auto __bitsel + = __generate_from_n_evaluations<8, __vector_type_t<_I, 8>>( + [&](auto __i) { + return static_cast<_I>(__i < _Np ? 1 << __i : 0); + }); + __asint &= __bitsel; + return vpadd_s8(vpadd_s8(vpadd_s8(__asint, __zero), __zero), + __zero)[0]; + } + else if constexpr (sizeof(_Tp) == 2) + { + constexpr auto __bitsel + = __generate_from_n_evaluations<4, __vector_type_t<_I, 4>>( + [&](auto __i) { + return static_cast<_I>(__i < _Np ? 
1 << __i : 0); + }); + __asint &= __bitsel; + return vpadd_s16(vpadd_s16(__asint, __zero), __zero)[0]; + } + else if constexpr (sizeof(_Tp) == 4) + { + __asint &= __make_vector<_I>(0x1, 0x2); + return vpadd_s32(__asint, __zero)[0]; + } + else + __assert_unreachable<_Tp>(); + } + else + return _Base::_S_to_bits(__x); + } +}; + +// }}} +// _MaskImplNeon {{{ +template <typename _Abi> + struct _MaskImplNeon : _MaskImplNeonMixin, _MaskImplBuiltin<_Abi> + { + using _MaskImplBuiltinMixin::_S_to_maskvector; + using _MaskImplNeonMixin::_S_to_bits; + using _Base = _MaskImplBuiltin<_Abi>; + using _Base::_S_convert; + + // _S_all_of {{{ + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static bool _S_all_of(simd_mask<_Tp, _Abi> __k) + { + const auto __kk + = __vector_bitcast<char>(__k._M_data) + | ~__vector_bitcast<char>(_Abi::template _S_implicit_mask<_Tp>()); + if constexpr (sizeof(__k) == 16) + { + const auto __x = __vector_bitcast<long long>(__kk); + return __x[0] + __x[1] == -2; + } + else if constexpr (sizeof(__k) <= 8) + return __bit_cast<__int_for_sizeof_t<decltype(__kk)>>(__kk) == -1; + else + __assert_unreachable<_Tp>(); + } + + // }}} + // _S_any_of {{{ + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static bool _S_any_of(simd_mask<_Tp, _Abi> __k) + { + const auto __kk + = __vector_bitcast<char>(__k._M_data) + | ~__vector_bitcast<char>(_Abi::template _S_implicit_mask<_Tp>()); + if constexpr (sizeof(__k) == 16) + { + const auto __x = __vector_bitcast<long long>(__kk); + return (__x[0] | __x[1]) != 0; + } + else if constexpr (sizeof(__k) <= 8) + return __bit_cast<__int_for_sizeof_t<decltype(__kk)>>(__kk) != 0; + else + __assert_unreachable<_Tp>(); + } + + // }}} + // _S_none_of {{{ + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static bool _S_none_of(simd_mask<_Tp, _Abi> __k) + { + const auto __kk = _Abi::_S_masked(__k._M_data); + if constexpr (sizeof(__k) == 16) + { + const auto __x = __vector_bitcast<long long>(__kk); + return (__x[0] | __x[1]) == 0; + } + else if constexpr (sizeof(__k) <= 8) + return __bit_cast<__int_for_sizeof_t<decltype(__kk)>>(__kk) == 0; + else + __assert_unreachable<_Tp>(); + } + + // }}} + // _S_some_of {{{ + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static bool _S_some_of(simd_mask<_Tp, _Abi> __k) + { + if constexpr (sizeof(__k) <= 8) + { + const auto __kk = __vector_bitcast<char>(__k._M_data) + | ~__vector_bitcast<char>( + _Abi::template _S_implicit_mask<_Tp>()); + using _Up = make_unsigned_t<__int_for_sizeof_t<decltype(__kk)>>; + return __bit_cast<_Up>(__kk) + 1 > 1; + } + else + return _Base::_S_some_of(__k); + } + + // }}} + // _S_popcount {{{ + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static int _S_popcount(simd_mask<_Tp, _Abi> __k) + { + if constexpr (sizeof(_Tp) == 1) + { + const auto __s8 = __vector_bitcast<_SChar>(__k._M_data); + int8x8_t __tmp = __lo64(__s8) + __hi64z(__s8); + return -vpadd_s8(vpadd_s8(vpadd_s8(__tmp, int8x8_t()), int8x8_t()), + int8x8_t())[0]; + } + else if constexpr (sizeof(_Tp) == 2) + { + const auto __s16 = __vector_bitcast<short>(__k._M_data); + int16x4_t __tmp = __lo64(__s16) + __hi64z(__s16); + return -vpadd_s16(vpadd_s16(__tmp, int16x4_t()), int16x4_t())[0]; + } + else if constexpr (sizeof(_Tp) == 4) + { + const auto __s32 = __vector_bitcast<int>(__k._M_data); + int32x2_t __tmp = __lo64(__s32) + __hi64z(__s32); + return -vpadd_s32(__tmp, int32x2_t())[0]; + } + else if constexpr (sizeof(_Tp) == 8) + { + static_assert(sizeof(__k) == 16); + const auto __s64 = __vector_bitcast<long>(__k._M_data); + return 
-(__s64[0] + __s64[1]); + } + } + + // }}} + // _S_find_first_set {{{ + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static int + _S_find_first_set(simd_mask<_Tp, _Abi> __k) + { + // TODO: the _Base implementation is not optimal for NEON + return _Base::_S_find_first_set(__k); + } + + // }}} + // _S_find_last_set {{{ + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static int + _S_find_last_set(simd_mask<_Tp, _Abi> __k) + { + // TODO: the _Base implementation is not optimal for NEON + return _Base::_S_find_last_set(__k); + } + + // }}} + }; // }}} + +_GLIBCXX_SIMD_END_NAMESPACE +#endif // __cplusplus >= 201703L +#endif // _GLIBCXX_EXPERIMENTAL_SIMD_NEON_H_ +// vim: foldmethod=marker sw=2 noet ts=8 sts=2 tw=80 diff --git a/libstdc++-v3/include/experimental/bits/simd_ppc.h b/libstdc++-v3/include/experimental/bits/simd_ppc.h new file mode 100644 index 00000000000..c00d2323ac6 --- /dev/null +++ b/libstdc++-v3/include/experimental/bits/simd_ppc.h @@ -0,0 +1,123 @@ +// Simd PowerPC specific implementations -*- C++ -*- + +// Copyright (C) 2020 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// Under Section 7 of GPL version 3, you are granted additional +// permissions described in the GCC Runtime Library Exception, version +// 3.1, as published by the Free Software Foundation. + +// You should have received a copy of the GNU General Public License and +// a copy of the GCC Runtime Library Exception along with this program; +// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +// <http://www.gnu.org/licenses/>. + +#ifndef _GLIBCXX_EXPERIMENTAL_SIMD_PPC_H_ +#define _GLIBCXX_EXPERIMENTAL_SIMD_PPC_H_ + +#if __cplusplus >= 201703L + +#ifndef __ALTIVEC__ +#error "simd_ppc.h may only be included when AltiVec/VMX is available" +#endif + +_GLIBCXX_SIMD_BEGIN_NAMESPACE + +// _SimdImplPpc {{{ +template <typename _Abi> + struct _SimdImplPpc : _SimdImplBuiltin<_Abi> + { + using _Base = _SimdImplBuiltin<_Abi>; + + // Byte and halfword shift instructions on PPC only consider the low 3 or 4 + // bits of the RHS. Consequently, shifting by sizeof(_Tp)*CHAR_BIT (or more) + // is UB without extra measures. To match scalar behavior, byte and halfword + // shifts need an extra fixup step. 
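+    // E.g. for simd<unsigned char>, a shift by 8 would execute as a shift
+    // by 8 % 8 == 0 in hardware (vslb considers only the low 3 bits of
+    // each shift count), returning the left operand unchanged, whereas the
+    // expected element-wise result is 0. The fixups below therefore zero
+    // the result (or clamp the count for signed right shifts) whenever the
+    // count is >= the element width.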
+ + // _S_bit_shift_left {{{ + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np> + _S_bit_shift_left(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y) + { + __x = _Base::_S_bit_shift_left(__x, __y); + if constexpr (sizeof(_Tp) < sizeof(int)) + __x._M_data + = (__y._M_data < sizeof(_Tp) * __CHAR_BIT__) & __x._M_data; + return __x; + } + + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np> + _S_bit_shift_left(_SimdWrapper<_Tp, _Np> __x, int __y) + { + __x = _Base::_S_bit_shift_left(__x, __y); + if constexpr (sizeof(_Tp) < sizeof(int)) + { + if (__y >= sizeof(_Tp) * __CHAR_BIT__) + return {}; + } + return __x; + } + + // }}} + // _S_bit_shift_right {{{ + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np> + _S_bit_shift_right(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y) + { + if constexpr (sizeof(_Tp) < sizeof(int)) + { + constexpr int __nbits = sizeof(_Tp) * __CHAR_BIT__; + if constexpr (is_unsigned_v<_Tp>) + return (__y._M_data < __nbits) + & _Base::_S_bit_shift_right(__x, __y)._M_data; + else + { + _Base::_S_masked_assign(_SimdWrapper<_Tp, _Np>(__y._M_data + >= __nbits), + __y, __nbits - 1); + return _Base::_S_bit_shift_right(__x, __y); + } + } + else + return _Base::_S_bit_shift_right(__x, __y); + } + + template <typename _Tp, size_t _Np> + _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np> + _S_bit_shift_right(_SimdWrapper<_Tp, _Np> __x, int __y) + { + if constexpr (sizeof(_Tp) < sizeof(int)) + { + constexpr int __nbits = sizeof(_Tp) * __CHAR_BIT__; + if (__y >= __nbits) + { + if constexpr (is_unsigned_v<_Tp>) + return {}; + else + return _Base::_S_bit_shift_right(__x, __nbits - 1); + } + } + return _Base::_S_bit_shift_right(__x, __y); + } + + // }}} + }; + +// }}} + +_GLIBCXX_SIMD_END_NAMESPACE +#endif // __cplusplus >= 201703L +#endif // _GLIBCXX_EXPERIMENTAL_SIMD_PPC_H_ + +// vim: foldmethod=marker sw=2 noet ts=8 sts=2 tw=80 diff --git a/libstdc++-v3/include/experimental/bits/simd_scalar.h b/libstdc++-v3/include/experimental/bits/simd_scalar.h new file mode 100644 index 00000000000..7680bc39c30 --- /dev/null +++ b/libstdc++-v3/include/experimental/bits/simd_scalar.h @@ -0,0 +1,772 @@ +// Simd scalar ABI specific implementations -*- C++ -*- + +// Copyright (C) 2020 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// Under Section 7 of GPL version 3, you are granted additional +// permissions described in the GCC Runtime Library Exception, version +// 3.1, as published by the Free Software Foundation. + +// You should have received a copy of the GNU General Public License and +// a copy of the GCC Runtime Library Exception along with this program; +// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +// <http://www.gnu.org/licenses/>. 
+ +#ifndef _GLIBCXX_EXPERIMENTAL_SIMD_SCALAR_H_ +#define _GLIBCXX_EXPERIMENTAL_SIMD_SCALAR_H_ +#if __cplusplus >= 201703L + +#include <cmath> + +_GLIBCXX_SIMD_BEGIN_NAMESPACE + +// __promote_preserving_unsigned{{{ +// work around crazy semantics of unsigned integers of lower rank than int: +// Before applying an operator the operands are promoted to int. In which case +// over- or underflow is UB, even though the operand types were unsigned. +template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr decltype(auto) + __promote_preserving_unsigned(const _Tp& __x) + { + if constexpr (is_signed_v<decltype(+__x)> && is_unsigned_v<_Tp>) + return static_cast<unsigned int>(__x); + else + return __x; + } + +// }}} + +struct _CommonImplScalar; +struct _CommonImplBuiltin; +struct _SimdImplScalar; +struct _MaskImplScalar; + +// simd_abi::_Scalar {{{ +struct simd_abi::_Scalar +{ + template <typename _Tp> + static constexpr size_t _S_size = 1; + + template <typename _Tp> + static constexpr size_t _S_full_size = 1; + + template <typename _Tp> + static constexpr bool _S_is_partial = false; + + struct _IsValidAbiTag : true_type {}; + + template <typename _Tp> + struct _IsValidSizeFor : true_type {}; + + template <typename _Tp> + struct _IsValid : __is_vectorizable<_Tp> {}; + + template <typename _Tp> + static constexpr bool _S_is_valid_v = _IsValid<_Tp>::value; + + _GLIBCXX_SIMD_INTRINSIC static constexpr bool _S_masked(bool __x) + { return __x; } + + using _CommonImpl = _CommonImplScalar; + using _SimdImpl = _SimdImplScalar; + using _MaskImpl = _MaskImplScalar; + + template <typename _Tp, bool = _S_is_valid_v<_Tp>> + struct __traits : _InvalidTraits {}; + + template <typename _Tp> + struct __traits<_Tp, true> + { + using _IsValid = true_type; + using _SimdImpl = _SimdImplScalar; + using _MaskImpl = _MaskImplScalar; + using _SimdMember = _Tp; + using _MaskMember = bool; + + static constexpr size_t _S_simd_align = alignof(_SimdMember); + static constexpr size_t _S_mask_align = alignof(_MaskMember); + + // nothing the user can spell converts to/from simd/simd_mask + struct _SimdCastType { _SimdCastType() = delete; }; + struct _MaskCastType { _MaskCastType() = delete; }; + struct _SimdBase {}; + struct _MaskBase {}; + }; +}; + +// }}} +// _CommonImplScalar {{{ +struct _CommonImplScalar +{ + // _S_store {{{ + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static void _S_store(_Tp __x, void* __addr) + { __builtin_memcpy(__addr, &__x, sizeof(_Tp)); } + + // }}} + // _S_store_bool_array(_BitMask) {{{ + template <size_t _Np, bool _Sanitized> + _GLIBCXX_SIMD_INTRINSIC static constexpr void + _S_store_bool_array(_BitMask<_Np, _Sanitized> __x, bool* __mem) + { + __make_dependent_t<decltype(__x), _CommonImplBuiltin>::_S_store_bool_array( + __x, __mem); + } + + // }}} +}; + +// }}} +// _SimdImplScalar {{{ +struct _SimdImplScalar +{ + // member types {{{2 + using abi_type = simd_abi::scalar; + + template <typename _Tp> + using _TypeTag = _Tp*; + + // _S_broadcast {{{2 + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static constexpr _Tp _S_broadcast(_Tp __x) noexcept + { return __x; } + + // _S_generator {{{2 + template <typename _Fp, typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static constexpr _Tp _S_generator(_Fp&& __gen, + _TypeTag<_Tp>) + { return __gen(_SizeConstant<0>()); } + + // _S_load {{{2 + template <typename _Tp, typename _Up> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_load(const _Up* __mem, + _TypeTag<_Tp>) noexcept + { return static_cast<_Tp>(__mem[0]); } + + // _S_masked_load {{{2 + template <typename 
_Tp, typename _Up> + static inline _Tp _S_masked_load(_Tp __merge, bool __k, + const _Up* __mem) noexcept + { + if (__k) + __merge = static_cast<_Tp>(__mem[0]); + return __merge; + } + + // _S_store {{{2 + template <typename _Tp, typename _Up> + static inline void _S_store(_Tp __v, _Up* __mem, _TypeTag<_Tp>) noexcept + { __mem[0] = static_cast<_Up>(__v); } + + // _S_masked_store {{{2 + template <typename _Tp, typename _Up> + static inline void _S_masked_store(const _Tp __v, _Up* __mem, + const bool __k) noexcept + { if (__k) __mem[0] = __v; } + + // _S_negate {{{2 + template <typename _Tp> + static constexpr inline bool _S_negate(_Tp __x) noexcept + { return !__x; } + + // _S_reduce {{{2 + template <typename _Tp, typename _BinaryOperation> + static constexpr inline _Tp + _S_reduce(const simd<_Tp, simd_abi::scalar>& __x, _BinaryOperation&) + { return __x._M_data; } + + // _S_min, _S_max {{{2 + template <typename _Tp> + static constexpr inline _Tp _S_min(const _Tp __a, const _Tp __b) + { return std::min(__a, __b); } + + template <typename _Tp> + static constexpr inline _Tp _S_max(const _Tp __a, const _Tp __b) + { return std::max(__a, __b); } + + // _S_complement {{{2 + template <typename _Tp> + static constexpr inline _Tp _S_complement(_Tp __x) noexcept + { return static_cast<_Tp>(~__x); } + + // _S_unary_minus {{{2 + template <typename _Tp> + static constexpr inline _Tp _S_unary_minus(_Tp __x) noexcept + { return static_cast<_Tp>(-__x); } + + // arithmetic operators {{{2 + template <typename _Tp> + static constexpr inline _Tp _S_plus(_Tp __x, _Tp __y) + { + return static_cast<_Tp>(__promote_preserving_unsigned(__x) + + __promote_preserving_unsigned(__y)); + } + + template <typename _Tp> + static constexpr inline _Tp _S_minus(_Tp __x, _Tp __y) + { + return static_cast<_Tp>(__promote_preserving_unsigned(__x) + - __promote_preserving_unsigned(__y)); + } + + template <typename _Tp> + static constexpr inline _Tp _S_multiplies(_Tp __x, _Tp __y) + { + return static_cast<_Tp>(__promote_preserving_unsigned(__x) + * __promote_preserving_unsigned(__y)); + } + + template <typename _Tp> + static constexpr inline _Tp _S_divides(_Tp __x, _Tp __y) + { + return static_cast<_Tp>(__promote_preserving_unsigned(__x) + / __promote_preserving_unsigned(__y)); + } + + template <typename _Tp> + static constexpr inline _Tp _S_modulus(_Tp __x, _Tp __y) + { + return static_cast<_Tp>(__promote_preserving_unsigned(__x) + % __promote_preserving_unsigned(__y)); + } + + template <typename _Tp> + static constexpr inline _Tp _S_bit_and(_Tp __x, _Tp __y) + { + if constexpr (is_floating_point_v<_Tp>) + { + using _Ip = __int_for_sizeof_t<_Tp>; + return __bit_cast<_Tp>(__bit_cast<_Ip>(__x) & __bit_cast<_Ip>(__y)); + } + else + return static_cast<_Tp>(__promote_preserving_unsigned(__x) + & __promote_preserving_unsigned(__y)); + } + + template <typename _Tp> + static constexpr inline _Tp _S_bit_or(_Tp __x, _Tp __y) + { + if constexpr (is_floating_point_v<_Tp>) + { + using _Ip = __int_for_sizeof_t<_Tp>; + return __bit_cast<_Tp>(__bit_cast<_Ip>(__x) | __bit_cast<_Ip>(__y)); + } + else + return static_cast<_Tp>(__promote_preserving_unsigned(__x) + | __promote_preserving_unsigned(__y)); + } + + template <typename _Tp> + static constexpr inline _Tp _S_bit_xor(_Tp __x, _Tp __y) + { + if constexpr (is_floating_point_v<_Tp>) + { + using _Ip = __int_for_sizeof_t<_Tp>; + return __bit_cast<_Tp>(__bit_cast<_Ip>(__x) ^ __bit_cast<_Ip>(__y)); + } + else + return static_cast<_Tp>(__promote_preserving_unsigned(__x) + ^ 
__promote_preserving_unsigned(__y)); + } + + template <typename _Tp> + static constexpr inline _Tp _S_bit_shift_left(_Tp __x, int __y) + { return static_cast<_Tp>(__promote_preserving_unsigned(__x) << __y); } + + template <typename _Tp> + static constexpr inline _Tp _S_bit_shift_right(_Tp __x, int __y) + { return static_cast<_Tp>(__promote_preserving_unsigned(__x) >> __y); } + + // math {{{2 + // frexp, modf and copysign implemented in simd_math.h + template <typename _Tp> + using _ST = _SimdTuple<_Tp, simd_abi::scalar>; + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_acos(_Tp __x) + { return std::acos(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_asin(_Tp __x) + { return std::asin(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_atan(_Tp __x) + { return std::atan(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_cos(_Tp __x) + { return std::cos(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_sin(_Tp __x) + { return std::sin(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_tan(_Tp __x) + { return std::tan(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_acosh(_Tp __x) + { return std::acosh(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_asinh(_Tp __x) + { return std::asinh(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_atanh(_Tp __x) + { return std::atanh(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_cosh(_Tp __x) + { return std::cosh(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_sinh(_Tp __x) + { return std::sinh(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_tanh(_Tp __x) + { return std::tanh(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_atan2(_Tp __x, _Tp __y) + { return std::atan2(__x, __y); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_exp(_Tp __x) + { return std::exp(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_exp2(_Tp __x) + { return std::exp2(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_expm1(_Tp __x) + { return std::expm1(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_log(_Tp __x) + { return std::log(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_log10(_Tp __x) + { return std::log10(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_log1p(_Tp __x) + { return std::log1p(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_log2(_Tp __x) + { return std::log2(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_logb(_Tp __x) + { return std::logb(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _ST<int> _S_ilogb(_Tp __x) + { return {std::ilogb(__x)}; } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_pow(_Tp __x, _Tp __y) + { return std::pow(__x, __y); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_abs(_Tp __x) + { return std::abs(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_fabs(_Tp __x) + { return std::fabs(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_sqrt(_Tp __x) + { return std::sqrt(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_cbrt(_Tp __x) + { return 
std::cbrt(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_erf(_Tp __x) + { return std::erf(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_erfc(_Tp __x) + { return std::erfc(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_lgamma(_Tp __x) + { return std::lgamma(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_tgamma(_Tp __x) + { return std::tgamma(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_trunc(_Tp __x) + { return std::trunc(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_floor(_Tp __x) + { return std::floor(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_ceil(_Tp __x) + { return std::ceil(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_nearbyint(_Tp __x) + { return std::nearbyint(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_rint(_Tp __x) + { return std::rint(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _ST<long> _S_lrint(_Tp __x) + { return {std::lrint(__x)}; } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _ST<long long> _S_llrint(_Tp __x) + { return {std::llrint(__x)}; } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_round(_Tp __x) + { return std::round(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _ST<long> _S_lround(_Tp __x) + { return {std::lround(__x)}; } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _ST<long long> _S_llround(_Tp __x) + { return {std::llround(__x)}; } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_ldexp(_Tp __x, _ST<int> __y) + { return std::ldexp(__x, __y.first); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_scalbn(_Tp __x, _ST<int> __y) + { return std::scalbn(__x, __y.first); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_scalbln(_Tp __x, _ST<long> __y) + { return std::scalbln(__x, __y.first); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_fmod(_Tp __x, _Tp __y) + { return std::fmod(__x, __y); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_remainder(_Tp __x, _Tp __y) + { return std::remainder(__x, __y); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_nextafter(_Tp __x, _Tp __y) + { return std::nextafter(__x, __y); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_fdim(_Tp __x, _Tp __y) + { return std::fdim(__x, __y); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_fmax(_Tp __x, _Tp __y) + { return std::fmax(__x, __y); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_fmin(_Tp __x, _Tp __y) + { return std::fmin(__x, __y); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_fma(_Tp __x, _Tp __y, _Tp __z) + { return std::fma(__x, __y, __z); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC static _Tp _S_remquo(_Tp __x, _Tp __y, _ST<int>* __z) + { return std::remquo(__x, __y, &__z->first); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr static _ST<int> _S_fpclassify(_Tp __x) + { return {std::fpclassify(__x)}; } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr static bool _S_isfinite(_Tp __x) + { return std::isfinite(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr static bool _S_isinf(_Tp __x) + { return std::isinf(__x); } + + template <typename 
_Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr static bool _S_isnan(_Tp __x) + { return std::isnan(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr static bool _S_isnormal(_Tp __x) + { return std::isnormal(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr static bool _S_signbit(_Tp __x) + { return std::signbit(__x); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr static bool _S_isgreater(_Tp __x, _Tp __y) + { return std::isgreater(__x, __y); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr static bool _S_isgreaterequal(_Tp __x, + _Tp __y) + { return std::isgreaterequal(__x, __y); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr static bool _S_isless(_Tp __x, _Tp __y) + { return std::isless(__x, __y); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr static bool _S_islessequal(_Tp __x, _Tp __y) + { return std::islessequal(__x, __y); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr static bool _S_islessgreater(_Tp __x, + _Tp __y) + { return std::islessgreater(__x, __y); } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr static bool _S_isunordered(_Tp __x, + _Tp __y) + { return std::isunordered(__x, __y); } + + // _S_increment & _S_decrement{{{2 + template <typename _Tp> + constexpr static inline void _S_increment(_Tp& __x) + { ++__x; } + + template <typename _Tp> + constexpr static inline void _S_decrement(_Tp& __x) + { --__x; } + + + // compares {{{2 + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr static bool _S_equal_to(_Tp __x, _Tp __y) + { return __x == __y; } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr static bool _S_not_equal_to(_Tp __x, + _Tp __y) + { return __x != __y; } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr static bool _S_less(_Tp __x, _Tp __y) + { return __x < __y; } + + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr static bool _S_less_equal(_Tp __x, + _Tp __y) + { return __x <= __y; } + + // smart_reference access {{{2 + template <typename _Tp, typename _Up> + constexpr static void _S_set(_Tp& __v, [[maybe_unused]] int __i, + _Up&& __x) noexcept + { + _GLIBCXX_DEBUG_ASSERT(__i == 0); + __v = static_cast<_Up&&>(__x); + } + + // _S_masked_assign {{{2 + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr static void + _S_masked_assign(bool __k, _Tp& __lhs, _Tp __rhs) + { if (__k) __lhs = __rhs; } + + // _S_masked_cassign {{{2 + template <typename _Op, typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr static void + _S_masked_cassign(const bool __k, _Tp& __lhs, const _Tp __rhs, _Op __op) + { if (__k) __lhs = __op(_SimdImplScalar{}, __lhs, __rhs); } + + // _S_masked_unary {{{2 + template <template <typename> class _Op, typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr static _Tp _S_masked_unary(const bool __k, + const _Tp __v) + { return static_cast<_Tp>(__k ? 
_Op<_Tp>{}(__v) : __v); } + + // }}}2 +}; + +// }}} +// _MaskImplScalar {{{ +struct _MaskImplScalar +{ + // member types {{{ + template <typename _Tp> + using _TypeTag = _Tp*; + + // }}} + // _S_broadcast {{{ + template <typename> + _GLIBCXX_SIMD_INTRINSIC static constexpr bool _S_broadcast(bool __x) + { return __x; } + + // }}} + // _S_load {{{ + template <typename> + _GLIBCXX_SIMD_INTRINSIC static constexpr bool _S_load(const bool* __mem) + { return __mem[0]; } + + // }}} + // _S_to_bits {{{ + _GLIBCXX_SIMD_INTRINSIC static constexpr _SanitizedBitMask<1> + _S_to_bits(bool __x) + { return __x; } + + // }}} + // _S_convert {{{ + template <typename, bool _Sanitized> + _GLIBCXX_SIMD_INTRINSIC static constexpr bool + _S_convert(_BitMask<1, _Sanitized> __x) + { return __x[0]; } + + template <typename, typename _Up, typename _UAbi> + _GLIBCXX_SIMD_INTRINSIC static constexpr bool + _S_convert(simd_mask<_Up, _UAbi> __x) + { return __x[0]; } + + // }}} + // _S_from_bitmask {{{2 + template <typename _Tp> + _GLIBCXX_SIMD_INTRINSIC constexpr static bool + _S_from_bitmask(_SanitizedBitMask<1> __bits, _TypeTag<_Tp>) noexcept + { return __bits[0]; } + + // _S_masked_load {{{2 + _GLIBCXX_SIMD_INTRINSIC constexpr static bool + _S_masked_load(bool __merge, bool __mask, const bool* __mem) noexcept + { + if (__mask) + __merge = __mem[0]; + return __merge; + } + + // _S_store {{{2 + _GLIBCXX_SIMD_INTRINSIC static void _S_store(bool __v, bool* __mem) noexcept + { __mem[0] = __v; } + + // _S_masked_store {{{2 + _GLIBCXX_SIMD_INTRINSIC static void + _S_masked_store(const bool __v, bool* __mem, const bool __k) noexcept + { + if (__k) + __mem[0] = __v; + } + + // logical and bitwise operators {{{2 + static constexpr bool _S_logical_and(bool __x, bool __y) + { return __x && __y; } + + static constexpr bool _S_logical_or(bool __x, bool __y) + { return __x || __y; } + + static constexpr bool _S_bit_not(bool __x) + { return !__x; } + + static constexpr bool _S_bit_and(bool __x, bool __y) + { return __x && __y; } + + static constexpr bool _S_bit_or(bool __x, bool __y) + { return __x || __y; } + + static constexpr bool _S_bit_xor(bool __x, bool __y) + { return __x != __y; } + + // smart_reference access {{{2 + constexpr static void _S_set(bool& __k, [[maybe_unused]] int __i, + bool __x) noexcept + { + _GLIBCXX_DEBUG_ASSERT(__i == 0); + __k = __x; + } + + // _S_masked_assign {{{2 + _GLIBCXX_SIMD_INTRINSIC static void _S_masked_assign(bool __k, bool& __lhs, + bool __rhs) + { + if (__k) + __lhs = __rhs; + } + + // }}}2 + // _S_all_of {{{ + template <typename _Tp, typename _Abi> + _GLIBCXX_SIMD_INTRINSIC constexpr static bool + _S_all_of(simd_mask<_Tp, _Abi> __k) + { return __k._M_data; } + + // }}} + // _S_any_of {{{ + template <typename _Tp, typename _Abi> + _GLIBCXX_SIMD_INTRINSIC constexpr static bool + _S_any_of(simd_mask<_Tp, _Abi> __k) + { return __k._M_data; } + + // }}} + // _S_none_of {{{ + template <typename _Tp, typename _Abi> + _GLIBCXX_SIMD_INTRINSIC constexpr static bool + _S_none_of(simd_mask<_Tp, _Abi> __k) + { return !__k._M_data; } + + // }}} + // _S_some_of {{{ + template <typename _Tp, typename _Abi> + _GLIBCXX_SIMD_INTRINSIC constexpr static bool + _S_some_of(simd_mask<_Tp, _Abi>) + { return false; } + + // }}} + // _S_popcount {{{ + template <typename _Tp, typename _Abi> + _GLIBCXX_SIMD_INTRINSIC constexpr static int + _S_popcount(simd_mask<_Tp, _Abi> __k) + { return __k._M_data; } + + // }}} + // _S_find_first_set {{{ + template <typename _Tp, typename _Abi> + _GLIBCXX_SIMD_INTRINSIC constexpr static 
+  // _S_find_first_set {{{
+  template <typename _Tp, typename _Abi>
+    _GLIBCXX_SIMD_INTRINSIC constexpr static int
+    _S_find_first_set(simd_mask<_Tp, _Abi>)
+    { return 0; }
+
+  // }}}
+  // _S_find_last_set {{{
+  template <typename _Tp, typename _Abi>
+    _GLIBCXX_SIMD_INTRINSIC constexpr static int
+    _S_find_last_set(simd_mask<_Tp, _Abi>)
+    { return 0; }
+
+  // }}}
+};
+
+// }}}
+
+_GLIBCXX_SIMD_END_NAMESPACE
+#endif  // __cplusplus >= 201703L
+#endif  // _GLIBCXX_EXPERIMENTAL_SIMD_SCALAR_H_
+
+// vim: foldmethod=marker sw=2 noet ts=8 sts=2 tw=80
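
That is the whole scalar ABI: element count one everywhere, no vector registers involved. A short, hedged usage sketch against the public interface (alias illustrative):

  #include <experimental/simd>
  namespace stdx = std::experimental;

  using V = stdx::simd<float, stdx::simd_abi::scalar>;
  static_assert(V::size() == 1);
  V x = 1.5f;      // broadcast construction of the single element
  float y = x[0];  // 0 is the only valid subscript
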
diff --git a/libstdc++-v3/include/experimental/bits/simd_x86.h b/libstdc++-v3/include/experimental/bits/simd_x86.h
new file mode 100644
index 00000000000..d1d7b9d4bf3
--- /dev/null
+++ b/libstdc++-v3/include/experimental/bits/simd_x86.h
@@ -0,0 +1,5169 @@
+// Simd x86 specific implementations -*- C++ -*-
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library. This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+// <http://www.gnu.org/licenses/>.
+
+#ifndef _GLIBCXX_EXPERIMENTAL_SIMD_X86_H_
+#define _GLIBCXX_EXPERIMENTAL_SIMD_X86_H_
+
+#if __cplusplus >= 201703L
+
+#if !_GLIBCXX_SIMD_X86INTRIN
+#error \
+  "simd_x86.h may only be included when MMX or SSE on x86(_64) are available"
+#endif
+
+_GLIBCXX_SIMD_BEGIN_NAMESPACE
+
+// __to_masktype {{{
+// Given <T, N> return <__int_for_sizeof_t<T>, N>. For _SimdWrapper and
+// __vector_type_t.
+template <typename _Tp, size_t _Np>
+  _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper<__int_for_sizeof_t<_Tp>, _Np>
+  __to_masktype(_SimdWrapper<_Tp, _Np> __x)
+  {
+    return reinterpret_cast<__vector_type_t<__int_for_sizeof_t<_Tp>, _Np>>(
+      __x._M_data);
+  }
+
+template <typename _TV,
+          typename _TVT
+          = enable_if_t<__is_vector_type_v<_TV>, _VectorTraits<_TV>>,
+          typename _Up = __int_for_sizeof_t<typename _TVT::value_type>>
+  _GLIBCXX_SIMD_INTRINSIC constexpr __vector_type_t<_Up, _TVT::_S_full_size>
+  __to_masktype(_TV __x)
+  { return reinterpret_cast<__vector_type_t<_Up, _TVT::_S_full_size>>(__x); }
+
+// }}}
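
At the builtin-vector level a mask is just a same-width signed-integer vector, since GCC's vector comparisons produce 0/-1 integer lanes; __to_masktype only reinterprets the bits. A standalone sketch of the idea using GNU vector extensions (the typedef and function names are illustrative, not from the patch):

  typedef float v4sf __attribute__((vector_size(16)));
  typedef int   v4si __attribute__((vector_size(16)));

  // same total size, same lane count; only the element type changes
  v4si
  as_masktype(v4sf v)
  { return reinterpret_cast<v4si>(v); }

  v4si
  less(v4sf a, v4sf b)
  { return a < b; }  // each lane is already 0 or -1
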
+// __interleave128_lo {{{
+template <typename _Ap, typename _B, typename _Tp = common_type_t<_Ap, _B>,
+          typename _Trait = _VectorTraits<_Tp>>
+  _GLIBCXX_SIMD_INTRINSIC constexpr _Tp
+  __interleave128_lo(const _Ap& __av, const _B& __bv)
+  {
+    const _Tp __a(__av);
+    const _Tp __b(__bv);
+    if constexpr (sizeof(_Tp) == 16 && _Trait::_S_full_size == 2)
+      return _Tp{__a[0], __b[0]};
+    else if constexpr (sizeof(_Tp) == 16 && _Trait::_S_full_size == 4)
+      return _Tp{__a[0], __b[0], __a[1], __b[1]};
+    else if constexpr (sizeof(_Tp) == 16 && _Trait::_S_full_size == 8)
+      return _Tp{__a[0], __b[0], __a[1], __b[1],
+                 __a[2], __b[2], __a[3], __b[3]};
+    else if constexpr (sizeof(_Tp) == 16 && _Trait::_S_full_size == 16)
+      return _Tp{__a[0], __b[0], __a[1], __b[1], __a[2], __b[2],
+                 __a[3], __b[3], __a[4], __b[4], __a[5], __b[5],
+                 __a[6], __b[6], __a[7], __b[7]};
+    else if constexpr (sizeof(_Tp) == 32 && _Trait::_S_full_size == 4)
+      return _Tp{__a[0], __b[0], __a[2], __b[2]};
+    else if constexpr (sizeof(_Tp) == 32 && _Trait::_S_full_size == 8)
+      return _Tp{__a[0], __b[0], __a[1], __b[1],
+                 __a[4], __b[4], __a[5], __b[5]};
+    else if constexpr (sizeof(_Tp) == 32 && _Trait::_S_full_size == 16)
+      return _Tp{__a[0], __b[0], __a[1], __b[1], __a[2], __b[2],
+                 __a[3], __b[3], __a[8], __b[8], __a[9], __b[9],
+                 __a[10], __b[10], __a[11], __b[11]};
+    else if constexpr (sizeof(_Tp) == 32 && _Trait::_S_full_size == 32)
+      return _Tp{__a[0], __b[0], __a[1], __b[1], __a[2], __b[2], __a[3],
+                 __b[3], __a[4], __b[4], __a[5], __b[5], __a[6], __b[6],
+                 __a[7], __b[7], __a[16], __b[16], __a[17], __b[17], __a[18],
+                 __b[18], __a[19], __b[19], __a[20], __b[20], __a[21], __b[21],
+                 __a[22], __b[22], __a[23], __b[23]};
+    else if constexpr (sizeof(_Tp) == 64 && _Trait::_S_full_size == 8)
+      return _Tp{__a[0], __b[0], __a[2], __b[2],
+                 __a[4], __b[4], __a[6], __b[6]};
+    else if constexpr (sizeof(_Tp) == 64 && _Trait::_S_full_size == 16)
+      return _Tp{__a[0], __b[0], __a[1], __b[1], __a[4], __b[4],
+                 __a[5], __b[5], __a[8], __b[8], __a[9], __b[9],
+                 __a[12], __b[12], __a[13], __b[13]};
+    else if constexpr (sizeof(_Tp) == 64 && _Trait::_S_full_size == 32)
+      return _Tp{__a[0], __b[0], __a[1], __b[1], __a[2], __b[2], __a[3],
+                 __b[3], __a[8], __b[8], __a[9], __b[9], __a[10], __b[10],
+                 __a[11], __b[11], __a[16], __b[16], __a[17], __b[17], __a[18],
+                 __b[18], __a[19], __b[19], __a[24], __b[24], __a[25], __b[25],
+                 __a[26], __b[26], __a[27], __b[27]};
+    else if constexpr (sizeof(_Tp) == 64 && _Trait::_S_full_size == 64)
+      return _Tp{__a[0], __b[0], __a[1], __b[1], __a[2], __b[2], __a[3],
+                 __b[3], __a[4], __b[4], __a[5], __b[5], __a[6], __b[6],
+                 __a[7], __b[7], __a[16], __b[16], __a[17], __b[17], __a[18],
+                 __b[18], __a[19], __b[19], __a[20], __b[20], __a[21], __b[21],
+                 __a[22], __b[22], __a[23], __b[23], __a[32], __b[32], __a[33],
+                 __b[33], __a[34], __b[34], __a[35], __b[35], __a[36], __b[36],
+                 __a[37], __b[37], __a[38], __b[38], __a[39], __b[39], __a[48],
+                 __b[48], __a[49], __b[49], __a[50], __b[50], __a[51], __b[51],
+                 __a[52], __b[52], __a[53], __b[53], __a[54], __b[54], __a[55],
+                 __b[55]};
+    else
+      __assert_unreachable<_Tp>();
+  }
+
+// }}}
+// __is_zero{{{
+template <typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+  _GLIBCXX_SIMD_INTRINSIC constexpr bool
+  __is_zero(_Tp __a)
+  {
+    if (!__builtin_is_constant_evaluated())
+      {
+        if constexpr (__have_avx)
+          {
+            if constexpr (_TVT::template _S_is<float, 8>)
+              return _mm256_testz_ps(__a, __a);
+            else if constexpr (_TVT::template _S_is<double, 4>)
+              return _mm256_testz_pd(__a, __a);
+            else if constexpr (sizeof(_Tp) == 32)
+              return _mm256_testz_si256(__to_intrin(__a), __to_intrin(__a));
+            else if constexpr (_TVT::template _S_is<float>)
+              return _mm_testz_ps(__to_intrin(__a), __to_intrin(__a));
+            else if constexpr (_TVT::template _S_is<double, 2>)
+              return _mm_testz_pd(__a, __a);
+            else
+              return _mm_testz_si128(__to_intrin(__a), __to_intrin(__a));
+          }
+        else if constexpr (__have_sse4_1)
+          return _mm_testz_si128(__intrin_bitcast<__m128i>(__a),
+                                 __intrin_bitcast<__m128i>(__a));
+      }
+    else if constexpr (sizeof(_Tp) <= 8)
+      return reinterpret_cast<__int_for_sizeof_t<_Tp>>(__a) == 0;
+    else
+      {
+        const auto __b = __vector_bitcast<_LLong>(__a);
+        if constexpr (sizeof(__b) == 16)
+          return (__b[0] | __b[1]) == 0;
+        else if constexpr (sizeof(__b) == 32)
+          return __is_zero(__lo128(__b) | __hi128(__b));
+        else if constexpr (sizeof(__b) == 64)
+          return __is_zero(__lo256(__b) | __hi256(__b));
+        else
+          __assert_unreachable<_Tp>();
+      }
+  }
+
+// }}}
+// __movemask{{{
+template <typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+  _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_CONST int
+  __movemask(_Tp __a)
+  {
+    if constexpr (sizeof(_Tp) == 32)
+      {
+        if constexpr (_TVT::template _S_is<float>)
+          return _mm256_movemask_ps(__to_intrin(__a));
+        else if constexpr (_TVT::template _S_is<double>)
+          return _mm256_movemask_pd(__to_intrin(__a));
+        else
+          return _mm256_movemask_epi8(__to_intrin(__a));
+      }
+    else if constexpr (_TVT::template _S_is<float>)
+      return _mm_movemask_ps(__to_intrin(__a));
+    else if constexpr (_TVT::template _S_is<double>)
+      return _mm_movemask_pd(__to_intrin(__a));
+    else
+      return _mm_movemask_epi8(__to_intrin(__a));
+  }
+
+// }}}
+// __testz{{{
+template <typename _TI, typename _TVT = _VectorTraits<_TI>>
+  _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_CONST constexpr int
+  __testz(_TI __a, _TI __b)
+  {
+    static_assert(is_same_v<_TI, __intrinsic_type_t<typename _TVT::value_type,
+                                                    _TVT::_S_full_size>>);
+    if (!__builtin_is_constant_evaluated())
+      {
+        if constexpr (sizeof(_TI) == 32)
+          {
+            if constexpr (_TVT::template _S_is<float>)
+              return _mm256_testz_ps(__to_intrin(__a), __to_intrin(__b));
+            else if constexpr (_TVT::template _S_is<double>)
+              return _mm256_testz_pd(__to_intrin(__a), __to_intrin(__b));
+            else
+              return _mm256_testz_si256(__to_intrin(__a), __to_intrin(__b));
+          }
+        else if constexpr (_TVT::template _S_is<float> && __have_avx)
+          return _mm_testz_ps(__to_intrin(__a), __to_intrin(__b));
+        else if constexpr (_TVT::template _S_is<double> && __have_avx)
+          return _mm_testz_pd(__to_intrin(__a), __to_intrin(__b));
+        else if constexpr (__have_sse4_1)
+          return _mm_testz_si128(__intrin_bitcast<__m128i>(__to_intrin(__a)),
+                                 __intrin_bitcast<__m128i>(__to_intrin(__b)));
+        else
+          return __movemask(0 == __and(__a, __b)) != 0;
+      }
+    else
+      return __is_zero(__and(__a, __b));
+  }
+
+// }}}
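
Everything in the pre-AVX512 mask protocol funnels through __movemask: one bit per lane, taken from the lane's sign bit. An illustrative all_of over a 4-float comparison result, in plain intrinsics (assumes SSE; the function name is hypothetical, not from the patch):

  #include <xmmintrin.h>

  bool
  sketch_all_of(__m128 mask)
  {
    // one bit per 32-bit lane; 0xf means all four sign bits are set
    return _mm_movemask_ps(mask) == 0xf;
  }
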
+// __testc{{{
+// requires SSE4.1 or above
+template <typename _TI, typename _TVT = _VectorTraits<_TI>>
+  _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_CONST constexpr int
+  __testc(_TI __a, _TI __b)
+  {
+    static_assert(is_same_v<_TI, __intrinsic_type_t<typename _TVT::value_type,
+                                                    _TVT::_S_full_size>>);
+    if (__builtin_is_constant_evaluated())
+      return __is_zero(__andnot(__a, __b));
+
+    if constexpr (sizeof(_TI) == 32)
+      {
+        if constexpr (_TVT::template _S_is<float>)
+          return _mm256_testc_ps(__a, __b);
+        else if constexpr (_TVT::template _S_is<double>)
+          return _mm256_testc_pd(__a, __b);
+        else
+          return _mm256_testc_si256(__to_intrin(__a), __to_intrin(__b));
+      }
+    else if constexpr (_TVT::template _S_is<float> && __have_avx)
+      return _mm_testc_ps(__to_intrin(__a), __to_intrin(__b));
+    else if constexpr (_TVT::template _S_is<double> && __have_avx)
+      return _mm_testc_pd(__to_intrin(__a), __to_intrin(__b));
+    else
+      {
+        static_assert(is_same_v<_TI, _TI> && __have_sse4_1);
+        return _mm_testc_si128(__intrin_bitcast<__m128i>(__to_intrin(__a)),
+                               __intrin_bitcast<__m128i>(__to_intrin(__b)));
+      }
+  }
+
+// }}}
+// __testnzc{{{
+template <typename _TI, typename _TVT = _VectorTraits<_TI>>
+  _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_CONST constexpr int
+  __testnzc(_TI __a, _TI __b)
+  {
+    static_assert(is_same_v<_TI, __intrinsic_type_t<typename _TVT::value_type,
+                                                    _TVT::_S_full_size>>);
+    if (!__builtin_is_constant_evaluated())
+      {
+        if constexpr (sizeof(_TI) == 32)
+          {
+            if constexpr (_TVT::template _S_is<float>)
+              return _mm256_testnzc_ps(__a, __b);
+            else if constexpr (_TVT::template _S_is<double>)
+              return _mm256_testnzc_pd(__a, __b);
+            else
+              return _mm256_testnzc_si256(__to_intrin(__a), __to_intrin(__b));
+          }
+        else if constexpr (_TVT::template _S_is<float> && __have_avx)
+          return _mm_testnzc_ps(__to_intrin(__a), __to_intrin(__b));
+        else if constexpr (_TVT::template _S_is<double> && __have_avx)
+          return _mm_testnzc_pd(__to_intrin(__a), __to_intrin(__b));
+        else if constexpr (__have_sse4_1)
+          return _mm_testnzc_si128(__intrin_bitcast<__m128i>(__to_intrin(__a)),
+                                   __intrin_bitcast<__m128i>(__to_intrin(__b)));
+        else
+          return __movemask(0 == __and(__a, __b)) == 0
+                 && __movemask(0 == __andnot(__a, __b)) == 0;
+      }
+    else
+      return !(__is_zero(__and(__a, __b)) || __is_zero(__andnot(__a, __b)));
+  }
+
+// }}}
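
The three helpers model the ZF/CF conditions of the ptest instruction family, as the constexpr fallbacks above spell out. In scalar terms (a hedged model with hypothetical names; uint64_t stands in for a register):

  #include <cstdint>

  bool testz_model(uint64_t a, uint64_t b)    // ZF: a AND b == 0
  { return (a & b) == 0; }

  bool testc_model(uint64_t a, uint64_t b)    // CF: NOT(a) AND b == 0
  { return (~a & b) == 0; }

  bool testnzc_model(uint64_t a, uint64_t b)  // neither ZF nor CF
  { return !testz_model(a, b) && !testc_model(a, b); }
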
+// __xzyw{{{
+// shuffles the complete vector, swapping the inner two quarters. Often useful
+// for AVX for fixing up a shuffle result.
+template <typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+  _GLIBCXX_SIMD_INTRINSIC _Tp
+  __xzyw(_Tp __a)
+  {
+    if constexpr (sizeof(_Tp) == 16)
+      {
+        const auto __x = __vector_bitcast<conditional_t<
+          is_floating_point_v<typename _TVT::value_type>, float, int>>(__a);
+        return reinterpret_cast<_Tp>(
+          decltype(__x){__x[0], __x[2], __x[1], __x[3]});
+      }
+    else if constexpr (sizeof(_Tp) == 32)
+      {
+        const auto __x = __vector_bitcast<conditional_t<
+          is_floating_point_v<typename _TVT::value_type>, double, _LLong>>(__a);
+        return reinterpret_cast<_Tp>(
+          decltype(__x){__x[0], __x[2], __x[1], __x[3]});
+      }
+    else if constexpr (sizeof(_Tp) == 64)
+      {
+        const auto __x = __vector_bitcast<conditional_t<
+          is_floating_point_v<typename _TVT::value_type>, double, _LLong>>(__a);
+        return reinterpret_cast<_Tp>(decltype(__x){__x[0], __x[1], __x[4],
+                                                   __x[5], __x[2], __x[3],
+                                                   __x[6], __x[7]});
+      }
+    else
+      __assert_unreachable<_Tp>();
+  }
+
+// }}}
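
__xzyw in isolation, for the 32-byte case (GNU vector extensions, illustrative names): the two inner 64-bit quarters swap, turning the "two interleaved 128-bit lanes" layout that AVX in-lane shuffles produce back into linear element order:

  typedef long long v4di __attribute__((vector_size(32)));

  v4di
  sketch_xzyw(v4di x)
  { return v4di{x[0], x[2], x[1], x[3]}; }
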
+// __maskload_epi32{{{
+template <typename _Tp>
+  _GLIBCXX_SIMD_INTRINSIC auto
+  __maskload_epi32(const int* __ptr, _Tp __k)
+  {
+    if constexpr (sizeof(__k) == 16)
+      return _mm_maskload_epi32(__ptr, __k);
+    else
+      return _mm256_maskload_epi32(__ptr, __k);
+  }
+
+// }}}
+// __maskload_epi64{{{
+template <typename _Tp>
+  _GLIBCXX_SIMD_INTRINSIC auto
+  __maskload_epi64(const _LLong* __ptr, _Tp __k)
+  {
+    if constexpr (sizeof(__k) == 16)
+      return _mm_maskload_epi64(__ptr, __k);
+    else
+      return _mm256_maskload_epi64(__ptr, __k);
+  }
+
+// }}}
+// __maskload_ps{{{
+template <typename _Tp>
+  _GLIBCXX_SIMD_INTRINSIC auto
+  __maskload_ps(const float* __ptr, _Tp __k)
+  {
+    if constexpr (sizeof(__k) == 16)
+      return _mm_maskload_ps(__ptr, __k);
+    else
+      return _mm256_maskload_ps(__ptr, __k);
+  }
+
+// }}}
+// __maskload_pd{{{
+template <typename _Tp>
+  _GLIBCXX_SIMD_INTRINSIC auto
+  __maskload_pd(const double* __ptr, _Tp __k)
+  {
+    if constexpr (sizeof(__k) == 16)
+      return _mm_maskload_pd(__ptr, __k);
+    else
+      return _mm256_maskload_pd(__ptr, __k);
+  }
+
+// }}}
+
+#ifdef _GLIBCXX_SIMD_WORKAROUND_PR85048
+#include "simd_x86_conversions.h"
+#endif
+
+// ISA & type detection {{{
+template <typename _Tp, size_t _Np>
+  constexpr bool
+  __is_sse_ps()
+  {
+    return __have_sse
+           && is_same_v<_Tp,
+                        float> && sizeof(__intrinsic_type_t<_Tp, _Np>) == 16;
+  }
+
+template <typename _Tp, size_t _Np>
+  constexpr bool
+  __is_sse_pd()
+  {
+    return __have_sse2
+           && is_same_v<_Tp,
+                        double> && sizeof(__intrinsic_type_t<_Tp, _Np>) == 16;
+  }
+
+template <typename _Tp, size_t _Np>
+  constexpr bool
+  __is_avx_ps()
+  {
+    return __have_avx
+           && is_same_v<_Tp,
+                        float> && sizeof(__intrinsic_type_t<_Tp, _Np>) == 32;
+  }
+
+template <typename _Tp, size_t _Np>
+  constexpr bool
+  __is_avx_pd()
+  {
+    return __have_avx
+           && is_same_v<_Tp,
+                        double> && sizeof(__intrinsic_type_t<_Tp, _Np>) == 32;
+  }
+
+template <typename _Tp, size_t _Np>
+  constexpr bool
+  __is_avx512_ps()
+  {
+    return __have_avx512f
+           && is_same_v<_Tp,
+                        float> && sizeof(__intrinsic_type_t<_Tp, _Np>) == 64;
+  }
+
+template <typename _Tp, size_t _Np>
+  constexpr bool
+  __is_avx512_pd()
+  {
+    return __have_avx512f
+           && is_same_v<_Tp,
+                        double> && sizeof(__intrinsic_type_t<_Tp, _Np>) == 64;
+  }
+
+// }}}
+struct _MaskImplX86Mixin;
+
+// _CommonImplX86 {{{
+struct _CommonImplX86 : _CommonImplBuiltin
+{
+#ifdef _GLIBCXX_SIMD_WORKAROUND_PR85048
+  // _S_converts_via_decomposition {{{
+  template <typename _From, typename _To, size_t _ToSize>
+    static constexpr bool _S_converts_via_decomposition()
+    {
+      if constexpr (is_integral_v<
+                      _From> && is_integral_v<_To> && sizeof(_From) == 8
+                    && _ToSize == 16)
+        return (sizeof(_To) == 2 && !__have_ssse3)
+               || (sizeof(_To) == 1 && !__have_avx512f);
+      else if constexpr (is_floating_point_v<_From> && is_integral_v<_To>)
+        return ((sizeof(_From) == 4 || sizeof(_From) == 8) && sizeof(_To) == 8
+                && !__have_avx512dq)
+               || (sizeof(_From) == 8 && sizeof(_To) == 4 && !__have_sse4_1
+                   && _ToSize == 16);
+      else if constexpr (
+        is_integral_v<_From> && is_floating_point_v<_To> && sizeof(_From) == 8
+        && !__have_avx512dq)
+        return (sizeof(_To) == 4 && _ToSize == 16)
+               || (sizeof(_To) == 8 && _ToSize < 64);
+      else
+        return false;
+    }
+
+  template <typename _From, typename _To, size_t _ToSize>
+    static inline constexpr bool __converts_via_decomposition_v
+      = _S_converts_via_decomposition<_From, _To, _ToSize>();
+
+  // }}}
+#endif
+  // _S_store {{{
+  using _CommonImplBuiltin::_S_store;
+
+  template <typename _Tp, size_t _Np>
+    _GLIBCXX_SIMD_INTRINSIC static void _S_store(_SimdWrapper<_Tp, _Np> __x,
+                                                 void* __addr)
+    {
+      constexpr size_t _Bytes = _Np * sizeof(_Tp);
+
+      if constexpr ((_Bytes & (_Bytes - 1)) != 0 && __have_avx512bw_vl)
+        {
+          const auto __v = __to_intrin(__x);
+
+          if constexpr (_Bytes & 1)
+            {
+              if constexpr (_Bytes < 16)
+                _mm_mask_storeu_epi8(__addr, 0xffffu >> (16 - _Bytes),
+                                     __intrin_bitcast<__m128i>(__v));
+              else if constexpr (_Bytes < 32)
+                _mm256_mask_storeu_epi8(__addr, 0xffffffffu >> (32 - _Bytes),
+                                        __intrin_bitcast<__m256i>(__v));
+              else
+                _mm512_mask_storeu_epi8(__addr,
+                                        0xffffffffffffffffull >> (64 - _Bytes),
+                                        __intrin_bitcast<__m512i>(__v));
+            }
+          else if constexpr (_Bytes & 2)
+            {
+              if constexpr (_Bytes < 16)
+                _mm_mask_storeu_epi16(__addr, 0xffu >> (8 - _Bytes / 2),
+                                      __intrin_bitcast<__m128i>(__v));
+              else if constexpr (_Bytes < 32)
+                _mm256_mask_storeu_epi16(__addr, 0xffffu >> (16 - _Bytes / 2),
+                                         __intrin_bitcast<__m256i>(__v));
+              else
+                _mm512_mask_storeu_epi16(__addr,
+                                         0xffffffffull >> (32 - _Bytes / 2),
+                                         __intrin_bitcast<__m512i>(__v));
+            }
+          else if constexpr (_Bytes & 4)
+            {
+              if constexpr (_Bytes < 16)
+                _mm_mask_storeu_epi32(__addr, 0xfu >> (4 - _Bytes / 4),
+                                      __intrin_bitcast<__m128i>(__v));
+              else if constexpr (_Bytes < 32)
+                _mm256_mask_storeu_epi32(__addr, 0xffu >> (8 - _Bytes / 4),
+                                         __intrin_bitcast<__m256i>(__v));
+              else
+                _mm512_mask_storeu_epi32(__addr, 0xffffull >> (16 - _Bytes / 4),
+                                         __intrin_bitcast<__m512i>(__v));
[...]

[diff truncated at 524288 bytes]
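
The masked-store branches above all derive a contiguous low-bits mask from the (non-power-of-two) byte count. The same computation in scalar form, for the 16-byte epi8 case (hypothetical helper name, not from the patch):

  #include <cstdint>

  // low n of 16 bits set, valid for 1 <= n <= 16
  uint16_t
  tail_mask16(unsigned n)
  { return static_cast<uint16_t>(0xffffu >> (16 - n)); }

For n == 3 this yields 0x0007, so _mm_mask_storeu_epi8 writes only the first three bytes and leaves the rest of the destination untouched.
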
Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210127163918.E607F3846405@sourceware.org \
    --to=redi@gcc.gnu.org \
    --cc=gcc-cvs@gcc.gnu.org \
    --cc=libstdc++-cvs@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line
before the message body.