* [PATCH] std::experimental::simd
@ 2019-10-14 12:12 Matthias Kretz
2019-10-15 3:52 ` Thomas Rodgers
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Matthias Kretz @ 2019-10-14 12:12 UTC (permalink / raw)
To: gcc-patches, libstdc++
[-- Attachment #1: Type: text/plain, Size: 2346 bytes --]
Let me try again to get this patch ready. It will need a few iterations...
This patch is without documentation and testsuite. I can add them on request
but would prefer a follow-up patch after getting this one right.
I recommend starting the review from simd.h + simd_scalar.h, then
simd_builtin.h, simd_x86.h, and simd_fixed_size.h. I'm sure when we get that
far we will be a few iterations further.
Regarding the license. The license header is currently just a copy from my
repo, but we can change it to the libstdc++ license. The paperwork with the
FSF is done.
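For context, a minimal usage sketch of the Parallelism TS 2 interface this patch
provides (illustrative only, not taken from the patch; it assumes a C++17
compiler with the new <experimental/simd> header installed and uses only
TS-specified names):

#include <experimental/simd>
#include <cstdio>

namespace stdx = std::experimental;

int main()
{
  // Broadcast constructor: every element is 1.5f.
  stdx::native_simd<float> a = 1.5f;
  // Generator constructor: element i gets the value i.
  stdx::native_simd<float> b([](int i) { return float(i); });
  // Element-wise arithmetic, then a horizontal reduction.
  const auto c = a * b + 2.0f;
  std::printf("sum = %g\n", stdx::reduce(c));
  // Stores (and loads) take an alignment tag, e.g. element_aligned.
  float out[stdx::native_simd<float>::size()];
  c.copy_to(out, stdx::element_aligned);
}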
* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/experimental/simd: New header for Parallelism TS 2.
* include/experimental/bits/simd.h: Definition of the public simd
interfaces and general implementation helpers.
* include/experimental/bits/simd_builtin.h: Implementation of the
_VecBuiltin simd_abi.
* include/experimental/bits/simd_combine.h: Preliminary
implementation of the _Combine simd_abi.
* include/experimental/bits/simd_converter.h: Generic simd
conversions.
* include/experimental/bits/simd_detail.h: Internal macros for the
simd implementation.
* include/experimental/bits/simd_fixed_size.h: Simd fixed_size ABI
specific implementations.
* include/experimental/bits/simd_math.h: Math overloads for simd.
* include/experimental/bits/simd_neon.h: Simd NEON specific
implementations.
* include/experimental/bits/simd_scalar.h: Simd scalar ABI
specific implementations.
* include/experimental/bits/simd_x86.h: Simd x86 specific
implementations.
* include/experimental/bits/simd_x86_conversions.h: x86 specific
conversion optimizations.
--
──────────────────────────────────────────────────────────────────────────
Dr. Matthias Kretz https://mattkretz.github.io
GSI Helmholtzzentrum für Schwerionenforschung https://gsi.de
SIMD easy and portable https://github.com/VcDevel/Vc
──────────────────────────────────────────────────────────────────────────
[-- Attachment #2: simd.diff.xz --]
[-- Type: application/x-xz, Size: 87536 bytes --]
* Re: [PATCH] std::experimental::simd
2019-10-14 12:12 [PATCH] std::experimental::simd Matthias Kretz
@ 2019-10-15 3:52 ` Thomas Rodgers
2019-10-24 8:26 ` Dr. Matthias Kretz
` (2 subsequent siblings)
3 siblings, 0 replies; 13+ messages in thread
From: Thomas Rodgers @ 2019-10-15 3:52 UTC (permalink / raw)
To: libstdc++
Hey Matthias,
Sorry about missing you in Cologne. I have been meaning to send a
followup email. I'd like to start the review of this patch but I don't
think I'll have time before the close of stage 1, as I have a few C++20
features I'm trying to finish by the end of the month.
Can we set up a bit of time early next week to discuss this? I can
arrange a BlueJeans teleconference.
Thanks,
Tom.
Matthias Kretz writes:
> Let me try again to get this patch ready. It will need a few iterations...
> This patch is without documentation and testsuite. I can add them on request
> but would prefer a follow-up patch after getting this one right.
>
> I recommend starting the review from simd.h + simd_scalar.h, then
> simd_builtin.h, simd_x86.h, and simd_fixed_size.h. I'm sure when we get that
> far we will be a few iterations further.
>
> Regarding the license. The license header is currently just a copy from my
> repo, but we can change it to the libstdc++ license. The paperwork with the
> FSF is done.
>
>
> * include/Makefile.am: Add new header.
> * include/Makefile.in: Regenerate.
> * include/experimental/simd: New header for Parallelism TS 2.
> * include/experimental/bits/simd.h: Definition of the public simd
> interfaces and general implementation helpers.
> * include/experimental/bits/simd_builtin.h: Implementation of the
> _VecBuiltin simd_abi.
> * include/experimental/bits/simd_combine.h: Preliminary
> implementation of the _Combine simd_abi.
> * include/experimental/bits/simd_converter.h: Generic simd
> conversions.
> * include/experimental/bits/simd_detail.h: Internal macros for the
> simd implementation.
> * include/experimental/bits/simd_fixed_size.h: Simd fixed_size ABI
> specific implementations.
> * include/experimental/bits/simd_math.h: Math overloads for simd.
> * include/experimental/bits/simd_neon.h: Simd NEON specific
> implementations.
> * include/experimental/bits/simd_scalar.h: Simd scalar ABI
> specific implementations.
> * include/experimental/bits/simd_x86.h: Simd x86 specific
> implementations.
> * include/experimental/bits/simd_x86_conversions.h: x86 specific
> conversion optimizations.
* Re: [PATCH] std::experimental::simd
2019-10-14 12:12 [PATCH] std::experimental::simd Matthias Kretz
2019-10-15 3:52 ` Thomas Rodgers
@ 2019-10-24 8:26 ` Dr. Matthias Kretz
2020-02-10 16:49 ` Thomas Rodgers
2020-01-07 11:01 ` Matthias Kretz
[not found] ` <3486545.znU0eCzeS4@excalibur>
3 siblings, 1 reply; 13+ messages in thread
From: Dr. Matthias Kretz @ 2019-10-24 8:26 UTC (permalink / raw)
To: trodgers; +Cc: libstdc++
On Monday, 14 October 2019 14:12:12 CEST Matthias Kretz wrote:
> This patch is without documentation and testsuite. I can add them on request
> but would prefer a follow-up patch after getting this one right.
Regarding tests, here's what I think a "consumer test" should cover:
1. compiler flags
- `-march=native -O2`
2. element types
- char
- uint8_t
- int
- float
- double
3. operations
- broadcast ctor
- generator ctor
- all operators
- compares
- mask reductions
- non-converting loads & stores
- non-converting masked loads & stores
- horizontal reductions
- math: 1024 input values, comparing fun(T) against fun(simd<T>)
- conversions:
* char <-> int
* uint8_t <-> int
* uint8_t <-> float
* int <-> float
* int <-> double
* float <-> double
4. ABI tags:
- scalar
- fixed_size<{2, 3, 4, 6, 8, 12, 16, 32}>
- deduce_t<T, {2, 3, 4, 6, 8, 12, 16, 32}> (if different from fixed_size)
- deduce_t<T, 64> for AVX512BW targets
- compatible (if different from above)
- native (if different from above)
Note that 1-4 are orthogonal and already span a huge space. I'm sure we need
to reduce this list, not expand it.
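To make the math item above concrete, here is a hypothetical sketch of such a
consumer test (not code from the patch or its eventual testsuite; the input
range and tolerance are placeholder choices, and only TS-specified interfaces
are assumed):

#include <experimental/simd>
#include <cassert>
#include <cmath>

namespace stdx = std::experimental;

template <typename T, typename Abi>
void check_sin()
{
  using V = stdx::simd<T, Abi>;
  for (int i = 0; i < 1024; i += int(V::size()))
    {
      // Fill one simd object with consecutive input values via the generator ctor.
      const V x([&](int j) { return T(i + j) / T(64); });
      const V r = sin(x); // simd overload, found via ADL (simd_math.h)
      for (int j = 0; j < int(V::size()); ++j)
        assert(std::abs(r[j] - std::sin(x[j])) <= T(1e-5)); // placeholder tolerance
    }
}

int main()
{
  check_sin<float, stdx::simd_abi::scalar>();
  check_sin<float, stdx::simd_abi::native<float>>();
  check_sin<double, stdx::simd_abi::fixed_size<8>>();
}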
--
──────────────────────────────────────────────────────────────────────────
Dr. Matthias Kretz https://mattkretz.github.io
GSI Helmholtzzentrum für Schwerionenforschung https://gsi.de
SIMD easy and portable https://github.com/VcDevel/Vc
──────────────────────────────────────────────────────────────────────────
* Re: [PATCH] std::experimental::simd
2019-10-14 12:12 [PATCH] std::experimental::simd Matthias Kretz
2019-10-15 3:52 ` Thomas Rodgers
2019-10-24 8:26 ` Dr. Matthias Kretz
@ 2020-01-07 11:01 ` Matthias Kretz
2020-01-07 11:17 ` Andrew Pinski
[not found] ` <3486545.znU0eCzeS4@excalibur>
3 siblings, 1 reply; 13+ messages in thread
From: Matthias Kretz @ 2020-01-07 11:01 UTC (permalink / raw)
To: libstdc++; +Cc: gcc-patches
Is there any chance left we can get this done for 10.1? If not, can we please
get it ready for 10.2 ASAP?
Cheers,
Matthias
On Monday, 14 October 2019 14:12:12 CET Matthias Kretz wrote:
> Let me try again to get this patch ready. It will need a few iterations...
> This patch is without documentation and testsuite. I can add them on request
> but would prefer a follow-up patch after getting this one right.
>
> I recommend starting the review from simd.h + simd_scalar.h, then
> simd_builtin.h, simd_x86.h, and simd_fixed_size.h. I'm sure when we get that
> far we will be a few iterations further.
>
> Regarding the license. The license header is currently just a copy from my
> repo, but we can change it to the libstdc++ license. The paperwork with the
> FSF is done.
>
>
> * include/Makefile.am: Add new header.
> * include/Makefile.in: Regenerate.
> * include/experimental/simd: New header for Parallelism TS 2.
> * include/experimental/bits/simd.h: Definition of the public simd
> interfaces and general implementation helpers.
> * include/experimental/bits/simd_builtin.h: Implementation of the
> _VecBuiltin simd_abi.
> * include/experimental/bits/simd_combine.h: Preliminary
> implementation of the _Combine simd_abi.
> * include/experimental/bits/simd_converter.h: Generic simd
> conversions.
> * include/experimental/bits/simd_detail.h: Internal macros for the
> simd implementation.
> * include/experimental/bits/simd_fixed_size.h: Simd fixed_size ABI
> specific implementations.
> * include/experimental/bits/simd_math.h: Math overloads for simd.
> * include/experimental/bits/simd_neon.h: Simd NEON specific
> implementations.
> * include/experimental/bits/simd_scalar.h: Simd scalar ABI
> specific implementations.
> * include/experimental/bits/simd_x86.h: Simd x86 specific
> implementations.
> * include/experimental/bits/simd_x86_conversions.h: x86 specific
> conversion optimizations.
--
──────────────────────────────────────────────────────────────────────────
Dr. Matthias Kretz https://mattkretz.github.io
GSI Helmholtz Centre for Heavy Ion Research https://gsi.de
std::experimental::simd https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────
* Re: [PATCH] std::experimental::simd
2020-01-07 11:01 ` Matthias Kretz
@ 2020-01-07 11:17 ` Andrew Pinski
2020-01-07 13:19 ` Dr. Matthias Kretz
0 siblings, 1 reply; 13+ messages in thread
From: Andrew Pinski @ 2020-01-07 11:17 UTC (permalink / raw)
To: Matthias Kretz; +Cc: libstdc++, GCC Patches
On Tue, Jan 7, 2020 at 3:01 AM Matthias Kretz <m.kretz@gsi.de> wrote:
>
> Is there any chance left we can get this done for 10.1? If not, can we please
> get it ready for 10.2 ASAP?
>
> Cheers,
> Matthias
>
> On Monday, 14 October 2019 14:12:12 CET Matthias Kretz wrote:
> > Let me try again to get this patch ready. It will need a few iterations...
> > This patch is without documentation and testsuite. I can add them on request
> > but would prefer a follow-up patch after getting this one right.
> >
> > I recommend starting the review from simd.h + simd_scalar.h, then
> > simd_builtin.h, simd_x86.h, and simd_fixed_size.h. I'm sure when we get
> > that far we will be a few iterations further.
> >
> > Regarding the license. The license header is currently just a copy from my
> > repo, but we can change it to the libstdc++ license. The paperwork with the
> > FSF is done.
Seems like it would be better if we put the x86 and aarch64/arm
specific parts in their own headers.
Also, all of the x86 conversions should be removed, as
__builtin_convertvector is supported now.
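As a minimal illustration of that builtin (not code from the patch;
__builtin_convertvector has been available since GCC 9 and converts
element-wise between vector types with the same number of elements):

using v4si [[__gnu__::__vector_size__(16)]] = int;   // 4 x int
using v4sf [[__gnu__::__vector_size__(16)]] = float; // 4 x float

v4sf to_float(v4si x)
{
  // Element-wise int -> float conversion without any target-specific intrinsics.
  return __builtin_convertvector(x, v4sf);
}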
libstdc++-v3 is only ever supported with the compiler version it ships with.
Thanks,
Andrew
> >
> >
> > * include/Makefile.am: Add new header.
> > * include/Makefile.in: Regenerate.
> > * include/experimental/simd: New header for Parallelism TS 2.
> > * include/experimental/bits/simd.h: Definition of the public simd
> > interfaces and general implementation helpers.
> > * include/experimental/bits/simd_builtin.h: Implementation of the
> > _VecBuiltin simd_abi.
> > * include/experimental/bits/simd_combine.h: Preliminary
> > implementation of the _Combine simd_abi.
> > * include/experimental/bits/simd_converter.h: Generic simd
> > conversions.
> > * include/experimental/bits/simd_detail.h: Internal macros for the
> > simd implementation.
> > * include/experimental/bits/simd_fixed_size.h: Simd fixed_size ABI
> > specific implementations.
> > * include/experimental/bits/simd_math.h: Math overloads for simd.
> > * include/experimental/bits/simd_neon.h: Simd NEON specific
> > implementations.
> > * include/experimental/bits/simd_scalar.h: Simd scalar ABI
> > specific implementations.
> > * include/experimental/bits/simd_x86.h: Simd x86 specific
> > implementations.
> > * include/experimental/bits/simd_x86_conversions.h: x86 specific
> > conversion optimizations.
>
>
> --
> ──────────────────────────────────────────────────────────────────────────
> Dr. Matthias Kretz https://mattkretz.github.io
> GSI Helmholtz Centre for Heavy Ion Research https://gsi.de
> std::experimental::simd https://github.com/VcDevel/std-simd
> ──────────────────────────────────────────────────────────────────────────
* Re: [PATCH] std::experimental::simd
2020-01-07 11:17 ` Andrew Pinski
@ 2020-01-07 13:19 ` Dr. Matthias Kretz
0 siblings, 0 replies; 13+ messages in thread
From: Dr. Matthias Kretz @ 2020-01-07 13:19 UTC (permalink / raw)
To: Andrew Pinski; +Cc: libstdc++, GCC Patches
On Tuesday, 7 January 2020 12:16:57 CET Andrew Pinski wrote:
> On Tue, Jan 7, 2020 at 3:01 AM Matthias Kretz <m.kretz@gsi.de> wrote:
> > Is there any chance left we can get this done for 10.1? If not, can we
> > please get it ready for 10.2 ASAP?
> >
> > Cheers,
> >
> > Matthias
> >
> > On Monday, 14 October 2019 14:12:12 CET Matthias Kretz wrote:
> > > Let me try again to get this patch ready. It will need a few
> > > iterations...
> > > This patch is without documentation and testsuite. I can add them on
> > > request but would prefer a follow-up patch after getting this one
> > > right.
> > >
> > > I recommend starting the review from simd.h + simd_scalar.h, then
> > > simd_builtin.h, simd_x86.h, and simd_fixed_size.h. I'm sure when we get
> > > that far we will be a few iterations further.
> > >
> > > Regarding the license. The license header is currently just a copy from
> > > my
> > > repo, but we can change it to the libstdc++ license. The paperwork with
> > > the
> > > FSF is done.
>
> Seems like it would be better if we put the x86 and aarch64/arm
> specific parts in their own headers.
Yes. I'm already working on it. It makes me unhappy in some of the generic
parts of the code, but I think it's still a worthwhile reorganization. The
latest state is here:
https://github.com/VcDevel/std-simd/tree/master/experimental/bits
I'll prepare a new patch.
> Also all of the x86 conversion should be removed as
> __builtin_convertvector is supported now.
simd_x86_conversions.h is about PR85048 (and more missing optimizations). I'd
prefer to implement simd_x86_conversions.h in the compiler, but I'd need some
guidance. I'd like the first release of std::experimental::simd to have high
performance - because that's the main reason for using it. I'd rather wait a
release than taint the impression of its usefulness.
> libstdc++v3 is only ever supported by the version that comes with the
> compiler.
Right, that's an artifact of having active users of this code. I'll clean it
up.
Thanks for the feedback,
Matthias
--
──────────────────────────────┬────────────────────────────────────────────
Dr. Matthias Kretz │ SDE — Software Development for Experiments
Senior Software Engineer, │ 📞 +49 6159 713084
SIMD Expert, │ 📧 m.kretz@gsi.de
ISO C++ Committee Member │ 🔗 mattkretz.github.io
──────────────────────────────┴────────────────────────────────────────────
GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de
Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Ursula Weyrich, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Georg Schütte
* Re: [PATCH] std::experimental::simd
2019-10-24 8:26 ` Dr. Matthias Kretz
@ 2020-02-10 16:49 ` Thomas Rodgers
2020-02-10 20:14 ` Thomas Rodgers
0 siblings, 1 reply; 13+ messages in thread
From: Thomas Rodgers @ 2020-02-10 16:49 UTC (permalink / raw)
To: Dr. Matthias Kretz; +Cc: libstdc++
Catching up on this...(and since I'm not in Prague I won't be able to do so in person).
I finally got a chance to talk with Jeff Law recently about a more extensive testing strategy using Red Hat internal resources, and I am going to start
adapting his CI build scripts, which handle checking out machines from our lab environment, later this week.
Jonathan indicated you may have a newer version of your <simd> patch forthcoming. Is that something that's likely to happen before the end of this month? I'd
like to start working on getting <simd> in during stage4 (Jonathan is willing to accept it because it is experimental, as long as it is in
good shape to commit).
Thanks,
Tom.
----- Original Message -----
From: "Dr. Matthias Kretz" <m.kretz@gsi.de>
To: trodgers@redhat.com
Cc: libstdc++@gcc.gnu.org
Sent: Thursday, October 24, 2019 1:26:47 AM
Subject: Re: [PATCH] std::experimental::simd
On Monday, 14 October 2019 14:12:12 CEST Matthias Kretz wrote:
> This patch is without documentation and testsuite. I can add them on request
> but would prefer a follow-up patch after getting this one right.
Regarding tests, here's what I think a "consumer test" should cover:
1. compiler flags
- `-march=native -O2`
2. element types
- char
- uint8_t
- int
- float
- double
3. operations
- broadcast ctor
- generator ctor
- all operators
- compares
- mask reductions
- non-converting loads & stores
- non-converting masked loads & stores
- horizontal reductions
- math: 1024 input values, comparing fun(T) against fun(simd<T>)
- conversions:
* char <-> int
* uint8_t <-> int
* uint8_t <-> float
* int <-> float
* int <-> double
* float <-> double
4. ABI tags:
- scalar
- fixed_size<{2, 3, 4, 6, 8, 12, 16, 32}>
- deduce_t<T, {2, 3, 4, 6, 8, 12, 16, 32}> (if different from fixed_size)
- deduce_t<T, 64> for AVX512BW targets
- compatible (if different from above)
- native (if different from above)
Note that 1-4 are orthogonal and already span a huge space. I'm sure we need
to reduce this list, not expand it.
--
──────────────────────────────────────────────────────────────────────────
Dr. Matthias Kretz https://mattkretz.github.io
GSI Helmholtzzentrum für Schwerionenforschung https://gsi.de
SIMD easy and portable https://github.com/VcDevel/Vc
──────────────────────────────────────────────────────────────────────────
* Re: [PATCH] std::experimental::simd
2020-02-10 16:49 ` Thomas Rodgers
@ 2020-02-10 20:14 ` Thomas Rodgers
0 siblings, 0 replies; 13+ messages in thread
From: Thomas Rodgers @ 2020-02-10 20:14 UTC (permalink / raw)
To: Dr. Matthias Kretz; +Cc: libstdc++
Erk, that was intended as a private reply.
Thomas Rodgers writes:
> Catching up on this...(and since I'm not in Prague I won't be able to do so in person).
>
> I finally got a chance to talk with Jeff Law recently about a more extensive testing strategy using Red Hat internal resources, and I am going to start
> adapting his CI build scripts, which handle checking out machines from our lab environment, later this week.
>
> Jonathan indicated you may have a newer version of your <simd> patch forthcoming. Is that something that's likely to happen before the end of this month? I'd
> like to start working on getting <simd> in during stage4 (Jonathan is willing to accept it because it is experimental, as long as it is in
> good shape to commit).
>
> Thanks,
> Tom.
>
>
> ----- Original Message -----
> From: "Dr. Matthias Kretz" <m.kretz@gsi.de>
> To: trodgers@redhat.com
> Cc: libstdc++@gcc.gnu.org
> Sent: Thursday, October 24, 2019 1:26:47 AM
> Subject: Re: [PATCH] std::experimental::simd
>
> On Monday, 14 October 2019 14:12:12 CEST Matthias Kretz wrote:
>> This patch is without documentation and testsuite. I can add them on request
>> but would prefer a follow-up patch after getting this one right.
>
> Regarding tests, here's what I think a "consumer test" should cover:
>
> 1. compiler flags
> - `-march=native -O2`
>
> 2. element types
> - char
> - uint8_t
> - int
> - float
> - double
>
> 3. operations
> - broadcast ctor
> - generator ctor
> - all operators
> - compares
> - mask reductions
> - non-converting loads & stores
> - non-converting masked loads & stores
> - horizontal reductions
> - math: 1024 input values, comparing fun(T) against fun(simd<T>)
> - conversions:
> * char <-> int
> * uint8_t <-> int
> * uint8_t <-> float
> * int <-> float
> * int <-> double
> * float <-> double
>
> 4. ABI tags:
> - scalar
> - fixed_size<{2, 3, 4, 6, 8, 12, 16, 32}>
> - deduce_t<T, {2, 3, 4, 6, 8, 12, 16, 32}> (if different from fixed_size)
> - deduce_t<T, 64> for AVX512BW targets
> - compatible (if different from above)
> - native (if different from above)
>
> Note that 1-4 are orthogonal and already span a huge space. I'm sure we need
> to reduce this list, not expand it.
* Re: [PATCH] std::experimental::simd
[not found] ` <xkqeo8qyl8y8.fsf@trodgers.remote>
@ 2020-05-08 19:03 ` Matthias Kretz
2020-11-11 23:43 ` Jonathan Wakely
0 siblings, 1 reply; 13+ messages in thread
From: Matthias Kretz @ 2020-05-08 19:03 UTC (permalink / raw)
To: Thomas Rodgers, libstdc++, Gcc-patches
[-- Attachment #1: Type: text/plain, Size: 797 bytes --]
Here's my last update to the std::experimental::simd patch. It's currently
based on the gcc-10 branch.
Cheers,
Matthias
--
──────────────────────────────────────────────────────────────────────────
Dr. Matthias Kretz https://mattkretz.github.io
GSI Helmholtz Centre for Heavy Ion Research https://gsi.de
std::experimental::simd https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────
[-- Attachment #2: simd.patch --]
[-- Type: text/x-patch, Size: 1583956 bytes --]
diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
index 0f03126db1c..c7ac33faaf5 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
@@ -2869,6 +2869,17 @@ since C++14 and the implementation is complete.
<entry>Library Fundamentals 2 TS</entry>
</row>
+ <row>
+ <entry>
+ <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0214r9.pdf">
+ P0214R9
+ </link>
+ </entry>
+ <entry>Data-Parallel Types</entry>
+ <entry>Y</entry>
+ <entry>Parallelism 2 TS</entry>
+ </row>
+
</tbody>
</tgroup>
</table>
@@ -3014,6 +3025,185 @@ since C++14 and the implementation is complete.
If <code>!is_regular_file(p)</code>, an error is reported.
</para>
+ <section xml:id="iso.2017.par2ts" xreflabel="Implementation Specific Behavior of the Parallelism 2 TS"><info><title>Parallelism 2 TS</title></info>
+
+ <para>
+ <emphasis>9.3 [parallel.simd.abi]</emphasis>
+ <code>max_fixed_size<T></code> is 32, except when targeting
+ AVX512BW and <code>sizeof(T)</code> is 1.
+ </para>
+
+ <para>
+ When targeting 32-bit x86,
+ <classname>simd_abi::compatible<T></classname> is an alias for
+ <classname>simd_abi::scalar</classname>. When targeting 64-bit x86
+ (including x32), <classname>simd_abi::compatible<T></classname> is
+ an alias for <classname>simd_abi::_VecBuiltin<16></classname>,
+ unless <code>T</code> is <code>long double</code>, in which case it is
+ an alias for <classname>simd_abi::scalar</classname>.
+ </para>
+
+ <para>
+ When targeting x86 (both 32-bit and 64-bit),
+ <classname>simd_abi::native<T></classname> is an alias for one of
+ <classname>simd_abi::_VecBuiltin<16></classname>,
+ <classname>simd_abi::_VecBuiltin<32></classname>, or
+ <classname>simd_abi::_VecBltnBtmsk<64></classname>, depending on
+ the machine options the compiler was invoked with.
+ </para>
+
+ <para>
+ For any other targeted machine
+ <classname>simd_abi::compatible<T></classname> and
+ <classname>simd_abi::native<T></classname> are aliases for
+ <classname>simd_abi::scalar</classname>. (subject to change)
+ </para>
+
+ <para>
+ The extended ABI tag types defined in the
+ <code>std::experimental::parallelism_v2::simd_abi</code> namespace are:
+ <classname>simd_abi::_VecBuiltin<Bytes></classname>, and
+ <classname>simd_abi::_VecBltnBtmsk<Bytes></classname>.
+ </para>
+
+ <para>
+ <classname>simd_abi::deduce<T, N, Abis...>::type</classname>,
+ with <code>N > 1</code> is an alias for an extended ABI tag, if a
+ supported extended ABI tag exists. Otherwise it is an alias for
+ <classname>simd_abi::fixed_size<N></classname>. The <classname>
+ simd_abi::_VecBltnBtmsk</classname> ABI tag is preferred over
+ <classname>simd_abi::_VecBuiltin</classname>.
+ </para>
+
+ <para>
+ <emphasis>9.4 [parallel.simd.traits]</emphasis>
+ <classname>memory_alignment<T, U>::value</classname> is
+ <code>sizeof(U) * T::size()</code> rounded up to the next power-of-two
+ value.
+ </para>
+
+ <para>
+ <emphasis>9.6.1 [parallel.simd.overview]</emphasis>
+ On ARM, <classname>simd<T, _VecBuiltin<Bytes>></classname>
+ is supported if <code>__ARM_NEON</code> is defined and
+ <code>sizeof(T) <= 4</code>. Additionally,
+ <code>sizeof(T) == 8</code> with integral <code>T</code> is supported if
+ <code>__ARM_ARCH >= 8</code>, and <code>double</code> is supported if
+ <code>__aarch64__</code> is defined.
+ On x86, given an extended ABI tag <code>Abi</code>,
+ <classname>simd<T, Abi></classname> is supported according to the
+ following table:
+ <table frame="all" xml:id="table.par2ts_simd_support">
+ <title>Support for Extended ABI Tags</title>
+
+ <tgroup cols="4" align="left" colsep="0" rowsep="1">
+ <colspec colname="c1"/>
+ <colspec colname="c2"/>
+ <colspec colname="c3"/>
+ <colspec colname="c4"/>
+ <thead>
+ <row>
+ <entry>ABI tag <code>Abi</code></entry>
+ <entry>value type <code>T</code></entry>
+ <entry>values for <code>Bytes</code></entry>
+ <entry>required machine option</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry morerows="5">
+ <classname>_VecBuiltin<Bytes></classname>
+ </entry>
+ <entry morerows="1"><code>float</code></entry>
+ <entry>8, 12, 16</entry>
+ <entry>"-msse"</entry>
+ </row>
+
+ <row>
+ <entry>20, 24, 28, 32</entry>
+ <entry>"-mavx"</entry>
+ </row>
+
+ <row>
+ <entry morerows="1"><code>double</code></entry>
+ <entry>16</entry>
+ <entry>"-msse2"</entry>
+ </row>
+
+ <row>
+ <entry>24, 32</entry>
+ <entry>"-mavx"</entry>
+ </row>
+
+ <row>
+ <entry morerows="1">
+ integral types other than <code>bool</code>
+ </entry>
+ <entry>
+ <code>Bytes</code> ≤ 16 and <code>Bytes</code> divisible by
+ <code>sizeof(T)</code>
+ </entry>
+ <entry>"-msse2"</entry>
+ </row>
+
+ <row>
+ <entry>
+ 16 < <code>Bytes</code> ≤ 32 and <code>Bytes</code>
+ divisible by <code>sizeof(T)</code>
+ </entry>
+ <entry>"-mavx2"</entry>
+ </row>
+
+ <row>
+ <entry morerows="1">
+ <classname>_VecBuiltin<Bytes></classname> and
+ <classname>_VecBltnBtmsk<Bytes></classname>
+ </entry>
+ <entry>
+ vectorizable types with <code>sizeof(T)</code> ≥ 4
+ </entry>
+ <entry morerows="1">
+ 32 < <code>Bytes</code> ≤ 64 and <code>Bytes</code>
+ divisible by <code>sizeof(T)</code>
+ </entry>
+ <entry>"-mavx512f"</entry>
+ </row>
+
+ <row>
+ <entry>
+ vectorizable types with <code>sizeof(T)</code> < 4
+ </entry>
+ <entry>"-mavx512bw"</entry>
+ </row>
+
+ <row>
+ <entry morerows="1">
+ <classname>_VecBltnBtmsk<Bytes></classname>
+ </entry>
+ <entry>
+ vectorizable types with <code>sizeof(T)</code> ≥ 4
+ </entry>
+ <entry morerows="1">
+ <code>Bytes</code> ≤ 32 and <code>Bytes</code> divisible by
+ <code>sizeof(T)</code>
+ </entry>
+ <entry>"-mavx512vl"</entry>
+ </row>
+
+ <row>
+ <entry>
+ vectorizable types with <code>sizeof(T)</code> < 4
+ </entry>
+ <entry>"-mavx512bw" and "-mavx512vl"</entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </table>
+ </para>
+
+ </section>
</section>
diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 80aeb3f8959..d1c870f620c 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -734,6 +734,7 @@ experimental_headers = \
${experimental_srcdir}/ratio \
${experimental_srcdir}/regex \
${experimental_srcdir}/set \
+ ${experimental_srcdir}/simd \
${experimental_srcdir}/socket \
${experimental_srcdir}/source_location \
${experimental_srcdir}/string \
@@ -754,6 +755,16 @@ experimental_bits_headers = \
${experimental_bits_srcdir}/lfts_config.h \
${experimental_bits_srcdir}/net.h \
${experimental_bits_srcdir}/shared_ptr.h \
+ ${experimental_bits_srcdir}/simd.h \
+ ${experimental_bits_srcdir}/simd_builtin.h \
+ ${experimental_bits_srcdir}/simd_converter.h \
+ ${experimental_bits_srcdir}/simd_detail.h \
+ ${experimental_bits_srcdir}/simd_fixed_size.h \
+ ${experimental_bits_srcdir}/simd_math.h \
+ ${experimental_bits_srcdir}/simd_neon.h \
+ ${experimental_bits_srcdir}/simd_scalar.h \
+ ${experimental_bits_srcdir}/simd_x86.h \
+ ${experimental_bits_srcdir}/simd_x86_conversions.h \
${experimental_bits_srcdir}/string_view.tcc \
${experimental_bits_filesystem_headers}
diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index eb437ad8d8d..686331fd15c 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -1079,6 +1079,7 @@ experimental_headers = \
${experimental_srcdir}/ratio \
${experimental_srcdir}/regex \
${experimental_srcdir}/set \
+ ${experimental_srcdir}/simd \
${experimental_srcdir}/socket \
${experimental_srcdir}/source_location \
${experimental_srcdir}/string \
@@ -1099,6 +1100,16 @@ experimental_bits_headers = \
${experimental_bits_srcdir}/lfts_config.h \
${experimental_bits_srcdir}/net.h \
${experimental_bits_srcdir}/shared_ptr.h \
+ ${experimental_bits_srcdir}/simd.h \
+ ${experimental_bits_srcdir}/simd_builtin.h \
+ ${experimental_bits_srcdir}/simd_converter.h \
+ ${experimental_bits_srcdir}/simd_detail.h \
+ ${experimental_bits_srcdir}/simd_fixed_size.h \
+ ${experimental_bits_srcdir}/simd_math.h \
+ ${experimental_bits_srcdir}/simd_neon.h \
+ ${experimental_bits_srcdir}/simd_scalar.h \
+ ${experimental_bits_srcdir}/simd_x86.h \
+ ${experimental_bits_srcdir}/simd_x86_conversions.h \
${experimental_bits_srcdir}/string_view.tcc \
${experimental_bits_filesystem_headers}
diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h
new file mode 100644
index 00000000000..298ff5957a1
--- /dev/null
+++ b/libstdc++-v3/include/experimental/bits/simd.h
@@ -0,0 +1,5031 @@
+// Definition of the public simd interfaces -*- C++ -*-
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library. This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+// <http://www.gnu.org/licenses/>.
+
+#ifndef _GLIBCXX_EXPERIMENTAL_SIMD_H
+#define _GLIBCXX_EXPERIMENTAL_SIMD_H
+
+#if __cplusplus >= 201703L
+
+#include "simd_detail.h"
+#include <bitset>
+#include <climits>
+#include <cstring>
+#include <functional>
+#include <iosfwd>
+#include <limits>
+#include <utility>
+
+#if _GLIBCXX_SIMD_X86INTRIN
+#include <x86intrin.h>
+#elif _GLIBCXX_SIMD_HAVE_NEON
+#include <arm_neon.h>
+#endif
+
+_GLIBCXX_SIMD_BEGIN_NAMESPACE
+
+#if !_GLIBCXX_SIMD_X86INTRIN
+using __m128 [[__gnu__::__vector_size__(16)]] = float;
+using __m128d [[__gnu__::__vector_size__(16)]] = double;
+using __m128i [[__gnu__::__vector_size__(16)]] = long long;
+using __m256 [[__gnu__::__vector_size__(32)]] = float;
+using __m256d [[__gnu__::__vector_size__(32)]] = double;
+using __m256i [[__gnu__::__vector_size__(32)]] = long long;
+using __m512 [[__gnu__::__vector_size__(64)]] = float;
+using __m512d [[__gnu__::__vector_size__(64)]] = double;
+using __m512i [[__gnu__::__vector_size__(64)]] = long long;
+#endif
+
+// __next_power_of_2{{{
+/**
+ * \internal
+ * Returns the next power of 2 larger than or equal to \p __x.
+ */
+constexpr std::size_t
+__next_power_of_2(std::size_t __x)
+{
+ return (__x & (__x - 1)) == 0 ? __x
+ : __next_power_of_2((__x | (__x >> 1)) + 1);
+}
+
+// }}}
+namespace simd_abi {
+// {{{
+// implementation details:
+struct _Scalar;
+template <int _Np> struct _Fixed;
+
+// There are two major ABIs that appear on different architectures.
+// Both have non-boolean values packed into an N Byte register
+// -> #elements = N / sizeof(T)
+// Masks differ:
+// 1. Use value vector registers for masks (all 0 or all 1)
+// 2. Use bitmasks (mask registers) with one bit per value in the corresponding
+// value vector
+//
+// Both can be partially used, masking off the rest when doing horizontal
+// operations or operations that can trap (e.g. FP_INVALID or integer division
+// by 0). This is encoded as the number of used bytes.
+template <int _UsedBytes> struct _VecBuiltin;
+template <int _UsedBytes> struct _VecBltnBtmsk;
+
+template <typename _Tp, int _Np> using _VecN = _VecBuiltin<sizeof(_Tp) * _Np>;
+
+template <int _UsedBytes = 16> using _Sse = _VecBuiltin<_UsedBytes>;
+template <int _UsedBytes = 32> using _Avx = _VecBuiltin<_UsedBytes>;
+template <int _UsedBytes = 64> using _Avx512 = _VecBltnBtmsk<_UsedBytes>;
+template <int _UsedBytes = 16> using _Neon = _VecBuiltin<_UsedBytes>;
+
+// implementation-defined:
+using __sse = _Sse<>;
+using __avx = _Avx<>;
+using __avx512 = _Avx512<>;
+using __neon = _Neon<>;
+
+using __neon128 = _Neon<16>;
+using __neon64 = _Neon<8>;
+
+// standard:
+template <typename _Tp, size_t _Np, typename...> struct deduce;
+template <int _Np> using fixed_size = _Fixed<_Np>;
+using scalar = _Scalar;
+// }}}
+} // namespace simd_abi
+// forward declarations is_simd(_mask), simd(_mask), simd_size {{{
+template <typename _Tp> struct is_simd;
+template <typename _Tp> struct is_simd_mask;
+template <typename _Tp, typename _Abi> class simd;
+template <typename _Tp, typename _Abi> class simd_mask;
+template <typename _Tp, typename _Abi> struct simd_size;
+// }}}
+// load/store flags {{{
+struct element_aligned_tag
+{
+};
+struct vector_aligned_tag
+{
+};
+template <size_t _Np> struct overaligned_tag
+{
+ static constexpr size_t _S_alignment = _Np;
+};
+inline constexpr element_aligned_tag element_aligned = {};
+inline constexpr vector_aligned_tag vector_aligned = {};
+template <size_t _Np> inline constexpr overaligned_tag<_Np> overaligned = {};
+// }}}
+
+// vvv ---- type traits ---- vvv
+// integer type aliases{{{
+using _UChar = unsigned char;
+using _SChar = signed char;
+using _UShort = unsigned short;
+using _UInt = unsigned int;
+using _ULong = unsigned long;
+using _ULLong = unsigned long long;
+using _LLong = long long;
+//}}}
+// __identity/__id{{{
+template <typename _Tp> struct __identity
+{
+ using type = _Tp;
+};
+template <typename _Tp> using __id = typename __identity<_Tp>::type;
+
+// }}}
+// __first_of_pack{{{
+template <typename _T0, typename...> struct __first_of_pack
+{
+ using type = _T0;
+};
+template <typename... _Ts>
+using __first_of_pack_t = typename __first_of_pack<_Ts...>::type;
+
+//}}}
+// __value_type_or_identity_t {{{
+template <typename _Tp>
+typename _Tp::value_type
+__value_type_or_identity_impl(int);
+template <typename _Tp>
+_Tp
+__value_type_or_identity_impl(float);
+template <typename _Tp>
+using __value_type_or_identity_t
+ = decltype(__value_type_or_identity_impl<_Tp>(int()));
+
+// }}}
+// __is_vectorizable {{{
+template <typename _Tp>
+struct __is_vectorizable : public std::is_arithmetic<_Tp>
+{
+};
+template <> struct __is_vectorizable<bool> : public false_type
+{
+};
+template <typename _Tp>
+inline constexpr bool __is_vectorizable_v = __is_vectorizable<_Tp>::value;
+// Deduces to a vectorizable type
+template <typename _Tp, typename = enable_if_t<__is_vectorizable_v<_Tp>>>
+using _Vectorizable = _Tp;
+
+// }}}
+// _LoadStorePtr / __is_possible_loadstore_conversion {{{
+template <typename _Ptr, typename _ValueType>
+struct __is_possible_loadstore_conversion
+ : conjunction<__is_vectorizable<_Ptr>, __is_vectorizable<_ValueType>>
+{
+};
+template <> struct __is_possible_loadstore_conversion<bool, bool> : true_type
+{
+};
+// Deduces to a type allowed for load/store with the given value type.
+template <typename _Ptr, typename _ValueType,
+ typename = enable_if_t<
+ __is_possible_loadstore_conversion<_Ptr, _ValueType>::value>>
+using _LoadStorePtr = _Ptr;
+
+// }}}
+// _SizeConstant{{{
+template <size_t _X> using _SizeConstant = integral_constant<size_t, _X>;
+// }}}
+// __is_bitmask{{{
+template <typename _Tp, typename = std::void_t<>>
+struct __is_bitmask : false_type
+{
+};
+template <typename _Tp>
+inline constexpr bool __is_bitmask_v = __is_bitmask<_Tp>::value;
+
+// the __mmaskXX case:
+template <typename _Tp>
+struct __is_bitmask<_Tp, std::void_t<decltype(std::declval<unsigned&>()
+ = std::declval<_Tp>() & 1u)>>
+ : true_type
+{
+};
+
+// }}}
+// __int_for_sizeof{{{
+template <size_t> struct __int_for_sizeof;
+template <> struct __int_for_sizeof<1>
+{
+ using type = signed char;
+ static_assert(sizeof(type) == 1);
+};
+template <> struct __int_for_sizeof<2>
+{
+ using type = signed short;
+ static_assert(sizeof(type) == 2);
+};
+template <> struct __int_for_sizeof<4>
+{
+ using type = signed int;
+ static_assert(sizeof(type) == 4);
+};
+template <> struct __int_for_sizeof<8>
+{
+ using type = signed long long;
+ static_assert(sizeof(type) == 8);
+};
+#ifdef __SIZEOF_INT128__
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wpedantic"
+template <> struct __int_for_sizeof<16>
+{
+ using type = __int128;
+ static_assert(sizeof(type) == 16);
+};
+#pragma GCC diagnostic pop
+#endif // __SIZEOF_INT128__
+template <typename _Tp>
+using __int_for_sizeof_t = typename __int_for_sizeof<sizeof(_Tp)>::type;
+template <size_t _Np>
+using __int_with_sizeof_t = typename __int_for_sizeof<_Np>::type;
+
+// }}}
+// __is_fixed_size_abi{{{
+template <typename _Tp> struct __is_fixed_size_abi : false_type
+{
+};
+template <int _Np>
+struct __is_fixed_size_abi<simd_abi::fixed_size<_Np>> : true_type
+{
+};
+
+template <typename _Tp>
+inline constexpr bool __is_fixed_size_abi_v = __is_fixed_size_abi<_Tp>::value;
+
+// }}}
+// constexpr feature detection{{{
+constexpr inline bool __have_mmx = _GLIBCXX_SIMD_HAVE_MMX;
+constexpr inline bool __have_sse = _GLIBCXX_SIMD_HAVE_SSE;
+constexpr inline bool __have_sse2 = _GLIBCXX_SIMD_HAVE_SSE2;
+constexpr inline bool __have_sse3 = _GLIBCXX_SIMD_HAVE_SSE3;
+constexpr inline bool __have_ssse3 = _GLIBCXX_SIMD_HAVE_SSSE3;
+constexpr inline bool __have_sse4_1 = _GLIBCXX_SIMD_HAVE_SSE4_1;
+constexpr inline bool __have_sse4_2 = _GLIBCXX_SIMD_HAVE_SSE4_2;
+constexpr inline bool __have_xop = _GLIBCXX_SIMD_HAVE_XOP;
+constexpr inline bool __have_avx = _GLIBCXX_SIMD_HAVE_AVX;
+constexpr inline bool __have_avx2 = _GLIBCXX_SIMD_HAVE_AVX2;
+constexpr inline bool __have_bmi = _GLIBCXX_SIMD_HAVE_BMI1;
+constexpr inline bool __have_bmi2 = _GLIBCXX_SIMD_HAVE_BMI2;
+constexpr inline bool __have_lzcnt = _GLIBCXX_SIMD_HAVE_LZCNT;
+constexpr inline bool __have_sse4a = _GLIBCXX_SIMD_HAVE_SSE4A;
+constexpr inline bool __have_fma = _GLIBCXX_SIMD_HAVE_FMA;
+constexpr inline bool __have_fma4 = _GLIBCXX_SIMD_HAVE_FMA4;
+constexpr inline bool __have_f16c = _GLIBCXX_SIMD_HAVE_F16C;
+constexpr inline bool __have_popcnt = _GLIBCXX_SIMD_HAVE_POPCNT;
+constexpr inline bool __have_avx512f = _GLIBCXX_SIMD_HAVE_AVX512F;
+constexpr inline bool __have_avx512dq = _GLIBCXX_SIMD_HAVE_AVX512DQ;
+constexpr inline bool __have_avx512vl = _GLIBCXX_SIMD_HAVE_AVX512VL;
+constexpr inline bool __have_avx512bw = _GLIBCXX_SIMD_HAVE_AVX512BW;
+constexpr inline bool __have_avx512dq_vl = __have_avx512dq && __have_avx512vl;
+constexpr inline bool __have_avx512bw_vl = __have_avx512bw && __have_avx512vl;
+
+constexpr inline bool __have_neon = _GLIBCXX_SIMD_HAVE_NEON;
+constexpr inline bool __have_neon_a32 = _GLIBCXX_SIMD_HAVE_NEON_A32;
+constexpr inline bool __have_neon_a64 = _GLIBCXX_SIMD_HAVE_NEON_A64;
+
+#ifdef __POWER9_VECTOR__
+constexpr inline bool __have_power9vec = true;
+#else
+constexpr inline bool __have_power9vec = false;
+#endif
+#if defined __POWER8_VECTOR__
+constexpr inline bool __have_power8vec = true;
+#else
+constexpr inline bool __have_power8vec = __have_power9vec;
+#endif
+#if defined __VSX__
+constexpr inline bool __have_power_vsx = true;
+#else
+constexpr inline bool __have_power_vsx = __have_power8vec;
+#endif
+#if defined __ALTIVEC__
+constexpr inline bool __have_power_vmx = true;
+#else
+constexpr inline bool __have_power_vmx = __have_power_vsx;
+#endif
+
+// }}}
+// __is_scalar_abi {{{
+template <typename _Abi>
+constexpr bool
+__is_scalar_abi()
+{
+ return std::is_same_v<simd_abi::scalar, _Abi>;
+}
+
+// }}}
+// __abi_bytes_v {{{
+template <template <int> class _Abi, int _Bytes>
+constexpr int
+__abi_bytes_impl(_Abi<_Bytes>*)
+{
+ return _Bytes;
+}
+template <typename _Tp>
+constexpr int
+__abi_bytes_impl(_Tp*)
+{
+ return -1;
+}
+template <typename _Abi>
+inline constexpr int __abi_bytes_v
+ = __abi_bytes_impl(static_cast<_Abi*>(nullptr));
+
+// }}}
+// __is_builtin_bitmask_abi {{{
+template <typename _Abi>
+constexpr bool
+__is_builtin_bitmask_abi()
+{
+ return std::is_same_v<simd_abi::_VecBltnBtmsk<__abi_bytes_v<_Abi>>, _Abi>;
+}
+
+// }}}
+// __is_sse_abi {{{
+template <typename _Abi>
+constexpr bool
+__is_sse_abi()
+{
+ constexpr auto _Bytes = __abi_bytes_v<_Abi>;
+ return _Bytes <= 16 && std::is_same_v<simd_abi::_VecBuiltin<_Bytes>, _Abi>;
+}
+
+// }}}
+// __is_avx_abi {{{
+template <typename _Abi>
+constexpr bool
+__is_avx_abi()
+{
+ constexpr auto _Bytes = __abi_bytes_v<_Abi>;
+ return _Bytes > 16 && _Bytes <= 32
+ && std::is_same_v<simd_abi::_VecBuiltin<_Bytes>, _Abi>;
+}
+
+// }}}
+// __is_avx512_abi {{{
+template <typename _Abi>
+constexpr bool
+__is_avx512_abi()
+{
+ constexpr auto _Bytes = __abi_bytes_v<_Abi>;
+ return _Bytes <= 64 && std::is_same_v<simd_abi::_Avx512<_Bytes>, _Abi>;
+}
+
+// }}}
+// __is_neon_abi {{{
+template <typename _Abi>
+constexpr bool
+__is_neon_abi()
+{
+ constexpr auto _Bytes = __abi_bytes_v<_Abi>;
+ return _Bytes <= 16 && std::is_same_v<simd_abi::_VecBuiltin<_Bytes>, _Abi>;
+}
+
+// }}}
+// __make_dependent_t {{{
+template <typename, typename _Up> struct __make_dependent
+{
+ using type = _Up;
+};
+template <typename _Tp, typename _Up>
+using __make_dependent_t = typename __make_dependent<_Tp, _Up>::type;
+
+// }}}
+// ^^^ ---- type traits ---- ^^^
+
+// __assert_unreachable{{{
+template <typename _Tp> struct __assert_unreachable
+{
+ static_assert(!std::is_same_v<_Tp, _Tp>, "this should be unreachable");
+};
+
+// }}}
+// __size_or_zero_v {{{
+template <typename _Tp, typename _Ap, size_t _Np = simd_size<_Tp, _Ap>::value>
+constexpr size_t
+__size_or_zero_dispatch(int)
+{
+ return _Np;
+}
+template <typename _Tp, typename _Ap>
+constexpr size_t
+__size_or_zero_dispatch(float)
+{
+ return 0;
+}
+template <typename _Tp, typename _Ap>
+inline constexpr size_t __size_or_zero_v = __size_or_zero_dispatch<_Tp, _Ap>(0);
+
+// }}}
+// __bit_cast {{{
+template <typename _To, typename _From>
+_GLIBCXX_SIMD_INTRINSIC _To
+__bit_cast(const _From __x)
+{
+ static_assert(sizeof(_To) == sizeof(_From));
+ _To __r;
+ __builtin_memcpy(reinterpret_cast<char*>(&__r),
+ reinterpret_cast<const char*>(&__x), sizeof(_To));
+ return __r;
+}
+
+// }}}
+// __div_roundup {{{
+inline constexpr std::size_t
+__div_roundup(std::size_t __a, std::size_t __b)
+{
+ return (__a + __b - 1) / __b;
+}
+
+// }}}
+// _ExactBool{{{
+class _ExactBool
+{
+ const bool _M_data;
+
+public:
+ _GLIBCXX_SIMD_INTRINSIC constexpr _ExactBool(bool __b) : _M_data(__b) {}
+ _ExactBool(int) = delete;
+ _GLIBCXX_SIMD_INTRINSIC constexpr operator bool() const { return _M_data; }
+};
+
+// }}}
+// __execute_n_times{{{
+template <typename _Fp, size_t... _I>
+_GLIBCXX_SIMD_INTRINSIC constexpr void
+__execute_on_index_sequence(_Fp&& __f, std::index_sequence<_I...>)
+{
+ [[maybe_unused]] auto&& __x = {(__f(_SizeConstant<_I>()), 0)...};
+}
+
+template <typename _Fp>
+_GLIBCXX_SIMD_INTRINSIC constexpr void
+__execute_on_index_sequence(_Fp&&, std::index_sequence<>)
+{}
+
+template <size_t _Np, typename _Fp>
+_GLIBCXX_SIMD_INTRINSIC constexpr void
+__execute_n_times(_Fp&& __f)
+{
+ __execute_on_index_sequence(static_cast<_Fp&&>(__f),
+ std::make_index_sequence<_Np>{});
+}
+
+// }}}
+// __generate_from_n_evaluations{{{
+template <typename _R, typename _Fp, size_t... _I>
+_GLIBCXX_SIMD_INTRINSIC constexpr _R
+__execute_on_index_sequence_with_return(_Fp&& __f, std::index_sequence<_I...>)
+{
+ return _R{__f(_SizeConstant<_I>())...};
+}
+
+template <size_t _Np, typename _R, typename _Fp>
+_GLIBCXX_SIMD_INTRINSIC constexpr _R
+__generate_from_n_evaluations(_Fp&& __f)
+{
+ return __execute_on_index_sequence_with_return<_R>(
+ static_cast<_Fp&&>(__f), std::make_index_sequence<_Np>{});
+}
+
+// }}}
+// __call_with_n_evaluations{{{
+template <size_t... _I, typename _F0, typename _FArgs>
+_GLIBCXX_SIMD_INTRINSIC constexpr auto
+__call_with_n_evaluations(std::index_sequence<_I...>, _F0&& __f0,
+ _FArgs&& __fargs)
+{
+ return __f0(__fargs(_SizeConstant<_I>())...);
+}
+
+template <size_t _Np, typename _F0, typename _FArgs>
+_GLIBCXX_SIMD_INTRINSIC constexpr auto
+__call_with_n_evaluations(_F0&& __f0, _FArgs&& __fargs)
+{
+ return __call_with_n_evaluations(std::make_index_sequence<_Np>{},
+ static_cast<_F0&&>(__f0),
+ static_cast<_FArgs&&>(__fargs));
+}
+
+// }}}
+// __call_with_subscripts{{{
+template <size_t _First = 0, size_t... _It, typename _Tp, typename _Fp>
+_GLIBCXX_SIMD_INTRINSIC constexpr auto
+__call_with_subscripts(_Tp&& __x, index_sequence<_It...>, _Fp&& __fun)
+{
+ return __fun(__x[_First + _It]...);
+}
+
+template <size_t _Np, size_t _First = 0, typename _Tp, typename _Fp>
+_GLIBCXX_SIMD_INTRINSIC constexpr auto
+__call_with_subscripts(_Tp&& __x, _Fp&& __fun)
+{
+ return __call_with_subscripts<_First>(static_cast<_Tp&&>(__x),
+ std::make_index_sequence<_Np>(),
+ static_cast<_Fp&&>(__fun));
+}
+
+// }}}
+// __may_alias{{{
+/**\internal
+ * Helper __may_alias<_Tp> that turns _Tp into the type to be used for an
+ * aliasing pointer. This adds the __may_alias attribute to _Tp (with compilers
+ * that support it).
+ */
+template <typename _Tp> using __may_alias [[__gnu__::__may_alias__]] = _Tp;
+
+// }}}
+// _UnsupportedBase {{{
+// simd and simd_mask base for unsupported <_Tp, _Abi>
+struct _UnsupportedBase
+{
+ _UnsupportedBase() = delete;
+ _UnsupportedBase(const _UnsupportedBase&) = delete;
+ _UnsupportedBase& operator=(const _UnsupportedBase&) = delete;
+ ~_UnsupportedBase() = delete;
+};
+
+// }}}
+// _InvalidTraits {{{
+/**
+ * \internal
+ * Defines the implementation of __a given <_Tp, _Abi>.
+ *
+ * Implementations must ensure that only valid <_Tp, _Abi> instantiations are
+ * possible. Static assertions in the type definition do not suffice. It is
+ * important that SFINAE works.
+ */
+struct _InvalidTraits
+{
+ using _IsValid = false_type;
+ using _SimdBase = _UnsupportedBase;
+ using _MaskBase = _UnsupportedBase;
+
+ static constexpr size_t _S_simd_align = 1;
+ struct _SimdImpl;
+ struct _SimdMember
+ {
+ };
+ struct _SimdCastType;
+
+ static constexpr size_t _S_mask_align = 1;
+ struct _MaskImpl;
+ struct _MaskMember
+ {
+ };
+ struct _MaskCastType;
+};
+// }}}
+// _SimdTraits {{{
+template <typename _Tp, typename _Abi, typename = std::void_t<>>
+struct _SimdTraits : _InvalidTraits
+{
+};
+
+// }}}
+// __private_init, __bitset_init{{{
+/**
+ * \internal
+ * Tag used for private init constructor of simd and simd_mask
+ */
+inline constexpr struct _PrivateInit
+{
+} __private_init = {};
+inline constexpr struct _BitsetInit
+{
+} __bitset_init = {};
+
+// }}}
+// __is_narrowing_conversion<_From, _To>{{{
+template <typename _From, typename _To, bool = std::is_arithmetic<_From>::value,
+ bool = std::is_arithmetic<_To>::value>
+struct __is_narrowing_conversion;
+
+// ignore "warning C4018: '<': signed/unsigned mismatch" in the following trait.
+// The implicit conversions will do the right thing here.
+template <typename _From, typename _To>
+struct __is_narrowing_conversion<_From, _To, true, true>
+ : public __bool_constant<(
+ std::numeric_limits<_From>::digits > std::numeric_limits<_To>::digits
+ || std::numeric_limits<_From>::max() > std::numeric_limits<_To>::max()
+ || std::numeric_limits<_From>::lowest()
+ < std::numeric_limits<_To>::lowest()
+ || (std::is_signed<_From>::value && std::is_unsigned<_To>::value))>
+{
+};
+
+template <typename _Tp>
+struct __is_narrowing_conversion<bool, _Tp, true, true> : public true_type
+{
+};
+template <>
+struct __is_narrowing_conversion<bool, bool, true, true> : public false_type
+{
+};
+template <typename _Tp>
+struct __is_narrowing_conversion<_Tp, _Tp, true, true> : public false_type
+{
+};
+
+template <typename _From, typename _To>
+struct __is_narrowing_conversion<_From, _To, false, true>
+ : public negation<std::is_convertible<_From, _To>>
+{
+};
+
+// }}}
+// __converts_to_higher_integer_rank{{{
+template <typename _From, typename _To, bool = (sizeof(_From) < sizeof(_To))>
+struct __converts_to_higher_integer_rank : public true_type
+{
+};
+// this may fail for char -> short if sizeof(char) == sizeof(short)
+template <typename _From, typename _To>
+struct __converts_to_higher_integer_rank<_From, _To, false>
+ : public std::is_same<decltype(std::declval<_From>() + std::declval<_To>()),
+ _To>
+{
+};
+
+// }}}
+// __is_aligned(_v){{{
+template <typename _Flag, size_t _Alignment> struct __is_aligned;
+template <size_t _Alignment>
+struct __is_aligned<vector_aligned_tag, _Alignment> : public true_type
+{
+};
+template <size_t _Alignment>
+struct __is_aligned<element_aligned_tag, _Alignment> : public false_type
+{
+};
+template <size_t _GivenAlignment, size_t _Alignment>
+struct __is_aligned<overaligned_tag<_GivenAlignment>, _Alignment>
+ : public std::integral_constant<bool, (_GivenAlignment % _Alignment == 0)>
+{
+};
+template <typename _Flag, size_t _Alignment>
+inline constexpr bool __is_aligned_v = __is_aligned<_Flag, _Alignment>::value;
+
+// }}}
+// __data(simd/simd_mask) {{{
+template <typename _Tp, typename _Ap>
+_GLIBCXX_SIMD_INTRINSIC constexpr const auto&
+__data(const simd<_Tp, _Ap>& __x);
+template <typename _Tp, typename _Ap>
+_GLIBCXX_SIMD_INTRINSIC constexpr auto&
+__data(simd<_Tp, _Ap>& __x);
+
+template <typename _Tp, typename _Ap>
+_GLIBCXX_SIMD_INTRINSIC constexpr const auto&
+__data(const simd_mask<_Tp, _Ap>& __x);
+template <typename _Tp, typename _Ap>
+_GLIBCXX_SIMD_INTRINSIC constexpr auto&
+__data(simd_mask<_Tp, _Ap>& __x);
+
+// }}}
+// _SimdConverter {{{
+template <typename _FromT, typename _FromA, typename _ToT, typename _ToA,
+ typename = void>
+struct _SimdConverter;
+
+template <typename _Tp, typename _Ap>
+struct _SimdConverter<_Tp, _Ap, _Tp, _Ap, void>
+{
+ template <typename _Up>
+ _GLIBCXX_SIMD_INTRINSIC const _Up& operator()(const _Up& __x)
+ {
+ return __x;
+ }
+};
+
+// }}}
+// __to_value_type_or_member_type {{{
+template <typename _V>
+_GLIBCXX_SIMD_INTRINSIC constexpr auto
+__to_value_type_or_member_type(const _V& __x) -> decltype(__data(__x))
+{
+ return __data(__x);
+}
+
+template <typename _V>
+_GLIBCXX_SIMD_INTRINSIC constexpr const typename _V::value_type&
+__to_value_type_or_member_type(const typename _V::value_type& __x)
+{
+ return __x;
+}
+
+// }}}
+// __bool_storage_member_type{{{
+template <size_t _Size> struct __bool_storage_member_type;
+
+template <size_t _Size>
+using __bool_storage_member_type_t =
+ typename __bool_storage_member_type<_Size>::type;
+
+// }}}
+// _SimdTuple {{{
+// why not std::tuple?
+// 1. std::tuple gives no guarantee about the storage order, but I require
+// storage
+// equivalent to std::array<_Tp, _Np>
+// 2. direct access to the element type (first template argument)
+// 3. enforces equal element type, only different _Abi types are allowed
+template <typename _Tp, typename... _Abis> struct _SimdTuple;
+
+//}}}
+// __fixed_size_storage_t {{{
+template <typename _Tp, int _Np> struct __fixed_size_storage;
+
+template <typename _Tp, int _Np>
+using __fixed_size_storage_t = typename __fixed_size_storage<_Tp, _Np>::type;
+
+// }}}
+// _SimdWrapper fwd decl{{{
+template <typename _Tp, size_t _Size, typename = std::void_t<>>
+struct _SimdWrapper;
+
+template <typename _Tp>
+using _SimdWrapper8 = _SimdWrapper<_Tp, 8 / sizeof(_Tp)>;
+template <typename _Tp>
+using _SimdWrapper16 = _SimdWrapper<_Tp, 16 / sizeof(_Tp)>;
+template <typename _Tp>
+using _SimdWrapper32 = _SimdWrapper<_Tp, 32 / sizeof(_Tp)>;
+template <typename _Tp>
+using _SimdWrapper64 = _SimdWrapper<_Tp, 64 / sizeof(_Tp)>;
+
+// }}}
+// __is_simd_wrapper {{{
+template <typename _Tp> struct __is_simd_wrapper : false_type
+{
+};
+template <typename _Tp, size_t _Np>
+struct __is_simd_wrapper<_SimdWrapper<_Tp, _Np>> : true_type
+{
+};
+template <typename _Tp>
+inline constexpr bool __is_simd_wrapper_v = __is_simd_wrapper<_Tp>::value;
+
+// }}}
+// _BitOps {{{
+struct _BitOps
+{
+ // __popcount {{{
+ static constexpr _UInt __popcount(_UInt __x)
+ {
+ return __builtin_popcount(__x);
+ }
+ static constexpr _ULong __popcount(_ULong __x)
+ {
+ return __builtin_popcountl(__x);
+ }
+ static constexpr _ULLong __popcount(_ULLong __x)
+ {
+ return __builtin_popcountll(__x);
+ }
+
+ // }}}
+ // __ctz/__clz {{{
+ static constexpr _UInt __ctz(_UInt __x) { return __builtin_ctz(__x); }
+ static constexpr _ULong __ctz(_ULong __x) { return __builtin_ctzl(__x); }
+ static constexpr _ULLong __ctz(_ULLong __x) { return __builtin_ctzll(__x); }
+ static constexpr _UInt __clz(_UInt __x) { return __builtin_clz(__x); }
+ static constexpr _ULong __clz(_ULong __x) { return __builtin_clzl(__x); }
+ static constexpr _ULLong __clz(_ULLong __x) { return __builtin_clzll(__x); }
+
+ // }}}
+ // __bit_iteration {{{
+ template <typename _Tp, typename _Fp>
+ static void __bit_iteration(_Tp __mask, _Fp&& __f)
+ {
+ static_assert(sizeof(_ULLong) >= sizeof(_Tp));
+ std::conditional_t<sizeof(_Tp) <= sizeof(_UInt), _UInt, _ULLong> __k;
+ if constexpr (std::is_convertible_v<_Tp, decltype(__k)>)
+ __k = __mask;
+ else
+ __k = __mask.to_ullong();
+ switch (__popcount(__k))
+ {
+ default:
+ do
+ {
+ __f(__ctz(__k));
+ __k &= (__k - 1);
+ }
+ while (__k);
+ break;
+ /*case 3:
+ __f(__ctz(__k));
+ __k &= (__k - 1);
+ [[fallthrough]];*/
+ case 2:
+ __f(__ctz(__k));
+ [[fallthrough]];
+ case 1:
+ __f(__popcount(~decltype(__k)()) - 1 - __clz(__k));
+ [[fallthrough]];
+ case 0:
+ break;
+ }
+ }
+
+ //}}}
+ // __firstbit{{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_CONST static auto __firstbit(_Tp __bits)
+ {
+ static_assert(std::is_integral_v<_Tp>,
+ "__firstbit requires an integral argument");
+ if constexpr (sizeof(_Tp) <= sizeof(int))
+ return __builtin_ctz(__bits);
+ else if constexpr (alignof(_ULLong) == 8)
+ return __builtin_ctzll(__bits);
+ else
+ {
+ _UInt __lo = __bits;
+ return __lo == 0 ? 32 + __builtin_ctz(__bits >> 32)
+ : __builtin_ctz(__lo);
+ }
+ }
+
+ // }}}
+ // __lastbit{{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_CONST static auto __lastbit(_Tp __bits)
+ {
+ static_assert(std::is_integral_v<_Tp>,
+ "__lastbit requires an integral argument");
+ if constexpr (sizeof(_Tp) <= sizeof(int))
+ return 31 - __builtin_clz(__bits);
+ else if constexpr (alignof(_ULLong) == 8)
+ return 63 - __builtin_clzll(__bits);
+ else
+ {
+ _UInt __lo = __bits;
+ _UInt __hi = __bits >> 32u;
+ return __hi == 0 ? 31 - __builtin_clz(__lo) : 63 - __builtin_clz(__hi);
+ }
+ }
+
+ // }}}
+};
+
+//}}}
+// __increment, __decrement {{{
+template <typename _Tp = void> struct __increment
+{
+ constexpr _Tp operator()(_Tp __a) const { return ++__a; }
+};
+template <> struct __increment<void>
+{
+ template <typename _Tp> constexpr _Tp operator()(_Tp __a) const
+ {
+ return ++__a;
+ }
+};
+template <typename _Tp = void> struct __decrement
+{
+ constexpr _Tp operator()(_Tp __a) const { return --__a; }
+};
+template <> struct __decrement<void>
+{
+ template <typename _Tp> constexpr _Tp operator()(_Tp __a) const
+ {
+ return --__a;
+ }
+};
+
+// }}}
+// _ValuePreserving(OrInt) {{{
+template <typename _From, typename _To,
+ typename = enable_if_t<negation<
+ __is_narrowing_conversion<__remove_cvref_t<_From>, _To>>::value>>
+using _ValuePreserving = _From;
+
+template <typename _From, typename _To,
+ typename _DecayedFrom = __remove_cvref_t<_From>,
+ typename = enable_if_t<conjunction<
+ is_convertible<_From, _To>,
+ disjunction<
+ is_same<_DecayedFrom, _To>, is_same<_DecayedFrom, int>,
+ conjunction<is_same<_DecayedFrom, _UInt>, is_unsigned<_To>>,
+ negation<__is_narrowing_conversion<_DecayedFrom, _To>>>>::value>>
+using _ValuePreservingOrInt = _From;
+
+// }}}
+// __intrinsic_type {{{
+template <typename _Tp, size_t _Bytes, typename = std::void_t<>>
+struct __intrinsic_type;
+template <typename _Tp, size_t _Size>
+using __intrinsic_type_t =
+ typename __intrinsic_type<_Tp, _Size * sizeof(_Tp)>::type;
+template <typename _Tp>
+using __intrinsic_type2_t = typename __intrinsic_type<_Tp, 2>::type;
+template <typename _Tp>
+using __intrinsic_type4_t = typename __intrinsic_type<_Tp, 4>::type;
+template <typename _Tp>
+using __intrinsic_type8_t = typename __intrinsic_type<_Tp, 8>::type;
+template <typename _Tp>
+using __intrinsic_type16_t = typename __intrinsic_type<_Tp, 16>::type;
+template <typename _Tp>
+using __intrinsic_type32_t = typename __intrinsic_type<_Tp, 32>::type;
+template <typename _Tp>
+using __intrinsic_type64_t = typename __intrinsic_type<_Tp, 64>::type;
+template <typename _Tp>
+using __intrinsic_type128_t = typename __intrinsic_type<_Tp, 128>::type;
+
+// }}}
+// _BitMask {{{
+template <size_t _Np, bool _Sanitized = false> struct _BitMask;
+
+template <size_t _Np, bool _Sanitized>
+struct __is_bitmask<_BitMask<_Np, _Sanitized>, void> : true_type
+{
+};
+
+template <size_t _Np> using _SanitizedBitMask = _BitMask<_Np, true>;
+
+template <size_t _Np, bool _Sanitized> struct _BitMask
+{
+ static_assert(_Np > 0);
+ static constexpr size_t _NBytes = __div_roundup(_Np, CHAR_BIT);
+ using _Tp = conditional_t<_Np == 1, bool,
+ make_unsigned_t<__int_with_sizeof_t<std::min(
+ sizeof(_ULLong), __next_power_of_2(_NBytes))>>>;
+ static constexpr int _S_array_size = __div_roundup(_NBytes, sizeof(_Tp));
+ _Tp _M_bits[_S_array_size];
+ static constexpr int _S_unused_bits
+ = _Np == 1 ? 0 : _S_array_size * sizeof(_Tp) * CHAR_BIT - _Np;
+ static constexpr _Tp _S_bitmask = +_Tp(~_Tp()) >> _S_unused_bits;
+
+ constexpr _BitMask() noexcept = default;
+ constexpr _BitMask(unsigned long long __x) noexcept
+ : _M_bits{static_cast<_Tp>(__x)}
+ {}
+ _BitMask(std::bitset<_Np> __x) noexcept : _BitMask(__x.to_ullong()) {}
+
+ constexpr _BitMask(const _BitMask&) noexcept = default;
+
+ template <bool _RhsSanitized, typename = enable_if_t<_RhsSanitized == false
+ && _Sanitized == true>>
+ constexpr _BitMask(const _BitMask<_Np, _RhsSanitized>& __rhs) noexcept
+ : _BitMask(__rhs._M_sanitized())
+ {}
+
+ constexpr operator _SimdWrapper<bool, _Np>() const noexcept
+ {
+ static_assert(_S_array_size == 1);
+ return _M_bits[0];
+ }
+
+ // precondition: is sanitized
+ constexpr _Tp _M_to_bits() const noexcept
+ {
+ static_assert(_S_array_size == 1);
+ return _M_bits[0];
+ }
+ // precondition: is sanitized
+ constexpr unsigned long long to_ullong() const noexcept
+ {
+ static_assert(_S_array_size == 1);
+ return _M_bits[0];
+ }
+ // precondition: is sanitized
+ constexpr unsigned long to_ulong() const noexcept
+ {
+ static_assert(_S_array_size == 1);
+ return _M_bits[0];
+ }
+ constexpr std::bitset<_Np> _M_to_bitset() const noexcept
+ {
+ static_assert(_S_array_size == 1);
+ return _M_bits[0];
+ }
+
+ constexpr decltype(auto) _M_sanitized() const noexcept
+ {
+ if constexpr (_Sanitized)
+ return *this;
+ else if constexpr (_Np == 1)
+ return _SanitizedBitMask<_Np>(_M_bits[0]);
+ else
+ {
+ _SanitizedBitMask<_Np> __r = {};
+ for (int __i = 0; __i < _S_array_size; ++__i)
+ __r._M_bits[__i] = _M_bits[__i];
+ if constexpr (_S_unused_bits > 0)
+ __r._M_bits[_S_array_size - 1] &= _S_bitmask;
+ return __r;
+ }
+ }
+
+ template <size_t _Mp, bool _LSanitized>
+ constexpr _BitMask<_Np + _Mp, _Sanitized>
+ _M_prepend(_BitMask<_Mp, _LSanitized> __lsb) const noexcept
+ {
+ constexpr size_t _RN = _Np + _Mp;
+ using _Rp = _BitMask<_RN, _Sanitized>;
+ if constexpr (_Rp::_S_array_size == 1)
+ {
+ _Rp __r{{_M_bits[0]}};
+ __r._M_bits[0] <<= _Mp;
+ __r._M_bits[0] |= __lsb._M_sanitized()._M_bits[0];
+ return __r;
+ }
+ else
+ __assert_unreachable<_Rp>();
+ }
+
+ // Return a new _BitMask with size _NewSize while dropping _DropLsb least
+ // significant bits. If the operation implicitly produces a sanitized bitmask,
+ // the result type will have _Sanitized set.
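+ // A worked example: _BitMask<8>(0b10110100)._M_extract<2, 4>() drops the
+ // two least significant bits; the low four bits of the result are 0b1101.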
+ template <size_t _DropLsb, size_t _NewSize = _Np - _DropLsb>
+ constexpr auto _M_extract() const noexcept
+ {
+ static_assert(_Np > _DropLsb);
+ static_assert(_DropLsb + _NewSize <= sizeof(_ULLong) * CHAR_BIT,
+ "not implemented for bitmasks larger than one ullong");
+ if constexpr (_NewSize == 1) // must sanitize because the return _Tp is bool
+ return _SanitizedBitMask<1>{
+ {static_cast<bool>(_M_bits[0] & (_Tp(1) << _DropLsb))}};
+ else
+ return _BitMask<_NewSize,
+ ((_NewSize + _DropLsb == sizeof(_Tp) * CHAR_BIT
+ && _NewSize + _DropLsb <= _Np)
+ || ((_Sanitized || _Np == sizeof(_Tp) * CHAR_BIT)
+ && _NewSize + _DropLsb >= _Np))>(_M_bits[0]
+ >> _DropLsb);
+ }
+
+ // True if all bits are set. Implicitly sanitizes if _Sanitized == false.
+ constexpr bool all() const noexcept
+ {
+ if constexpr (_Np == 1)
+ return _M_bits[0];
+ else if constexpr (!_Sanitized)
+ return _M_sanitized().all();
+ else
+ {
+ constexpr _Tp __allbits = ~_Tp();
+ for (int __i = 0; __i < _S_array_size - 1; ++__i)
+ if (_M_bits[__i] != __allbits)
+ return false;
+ return _M_bits[_S_array_size - 1] == _S_bitmask;
+ }
+ }
+
+ // True if at least one bit is set. Implicitly sanitizes if _Sanitized ==
+ // false.
+ constexpr bool any() const noexcept
+ {
+ if constexpr (_Np == 1)
+ return _M_bits[0];
+ else if constexpr (!_Sanitized)
+ return _M_sanitized().any();
+ else
+ {
+ for (int __i = 0; __i < _S_array_size - 1; ++__i)
+ if (_M_bits[__i] != 0)
+ return true;
+ return _M_bits[_S_array_size - 1] != 0;
+ }
+ }
+
+ // True if no bit is set. Implicitly sanitizes if _Sanitized == false.
+ constexpr bool none() const noexcept
+ {
+ if constexpr (_Np == 1)
+ return !_M_bits[0];
+ else if constexpr (!_Sanitized)
+ return _M_sanitized().none();
+ else
+ {
+ for (int __i = 0; __i < _S_array_size - 1; ++__i)
+ if (_M_bits[__i] != 0)
+ return false;
+ return _M_bits[_S_array_size - 1] == 0;
+ }
+ }
+
+ // Returns the number of set bits. Implicitly sanitizes if _Sanitized ==
+ // false.
+ constexpr int count() const noexcept
+ {
+ if constexpr (_Np == 1)
+ return _M_bits[0];
+ else if constexpr (!_Sanitized)
+ return _M_sanitized().count();
+ else
+ {
+ int __result = __builtin_popcountll(_M_bits[0]);
+ for (int __i = 1; __i < _S_array_size; ++__i)
+ __result += __builtin_popcountll(_M_bits[__i]);
+ return __result;
+ }
+ }
+
+ // Returns the bit at offset __i as bool.
+ constexpr bool operator[](size_t __i) const noexcept
+ {
+ if constexpr (_Np == 1)
+ return _M_bits[0];
+ else if constexpr (_S_array_size == 1)
+ return (_M_bits[0] >> __i) & 1;
+ else
+ {
+ const size_t __j = __i / (sizeof(_Tp) * CHAR_BIT);
+ const size_t __shift = __i % (sizeof(_Tp) * CHAR_BIT);
+ return (_M_bits[__j] >> __shift) & 1;
+ }
+ }
+ template <size_t __i>
+ constexpr bool operator[](_SizeConstant<__i>) const noexcept
+ {
+ static_assert(__i < _Np);
+ constexpr size_t __j = __i / (sizeof(_Tp) * CHAR_BIT);
+ constexpr size_t __shift = __i % (sizeof(_Tp) * CHAR_BIT);
+ return static_cast<bool>(_M_bits[__j] & (_Tp(1) << __shift));
+ }
+
+ // Set the bit at offset __i to __x.
+ constexpr void set(size_t __i, bool __x) noexcept
+ {
+ if constexpr (_Np == 1)
+ _M_bits[0] = __x;
+ else if constexpr (_S_array_size == 1)
+ {
+ _M_bits[0] &= ~_Tp(_Tp(1) << __i);
+ _M_bits[0] |= _Tp(_Tp(__x) << __i);
+ }
+ else
+ {
+ const size_t __j = __i / (sizeof(_Tp) * CHAR_BIT);
+ const size_t __shift = __i % (sizeof(_Tp) * CHAR_BIT);
+ _M_bits[__j] &= ~_Tp(_Tp(1) << __shift);
+ _M_bits[__j] |= _Tp(_Tp(__x) << __shift);
+ }
+ }
+ template <size_t __i>
+ constexpr void set(_SizeConstant<__i>, bool __x) noexcept
+ {
+ static_assert(__i < _Np);
+ if constexpr (_Np == 1)
+ _M_bits[0] = __x;
+ else
+ {
+ constexpr size_t __j = __i / (sizeof(_Tp) * CHAR_BIT);
+ constexpr size_t __shift = __i % (sizeof(_Tp) * CHAR_BIT);
+ constexpr _Tp __mask = ~_Tp(_Tp(1) << __shift);
+ _M_bits[__j] &= __mask;
+ _M_bits[__j] |= _Tp(_Tp(__x) << __shift);
+ }
+ }
+
+ // Inverts all bits. Sanitized input leads to sanitized output.
+ constexpr _BitMask operator~() const noexcept
+ {
+ if constexpr (_Np == 1)
+ return !_M_bits[0];
+ else
+ {
+ _BitMask __result{};
+ for (int __i = 0; __i < _S_array_size - 1; ++__i)
+ __result._M_bits[__i] = ~_M_bits[__i];
+ if constexpr (_Sanitized)
+ __result._M_bits[_S_array_size - 1]
+ = _M_bits[_S_array_size - 1] ^ _S_bitmask;
+ else
+ __result._M_bits[_S_array_size - 1] = ~_M_bits[_S_array_size - 1];
+ return __result;
+ }
+ }
+
+ constexpr _BitMask& operator^=(const _BitMask& __b) & noexcept
+ {
+ __execute_n_times<_S_array_size>(
+ [&](auto __i) { _M_bits[__i] ^= __b._M_bits[__i]; });
+ return *this;
+ }
+ constexpr _BitMask& operator|=(const _BitMask& __b) & noexcept
+ {
+ __execute_n_times<_S_array_size>(
+ [&](auto __i) { _M_bits[__i] |= __b._M_bits[__i]; });
+ return *this;
+ }
+ constexpr _BitMask& operator&=(const _BitMask& __b) & noexcept
+ {
+ __execute_n_times<_S_array_size>(
+ [&](auto __i) { _M_bits[__i] &= __b._M_bits[__i]; });
+ return *this;
+ }
+ friend constexpr _BitMask operator^(const _BitMask& __a,
+ const _BitMask& __b) noexcept
+ {
+ _BitMask __r = __a;
+ __r ^= __b;
+ return __r;
+ }
+ friend constexpr _BitMask operator|(const _BitMask& __a,
+ const _BitMask& __b) noexcept
+ {
+ _BitMask __r = __a;
+ __r |= __b;
+ return __r;
+ }
+ friend constexpr _BitMask operator&(const _BitMask& __a,
+ const _BitMask& __b) noexcept
+ {
+ _BitMask __r = __a;
+ __r &= __b;
+ return __r;
+ }
+
+ _GLIBCXX_SIMD_INTRINSIC
+ constexpr bool _M_is_constprop() const
+ {
+ if constexpr (_S_array_size == 1)
+ return __builtin_constant_p(_M_bits[0]);
+ else
+ {
+ for (int __i = 0; __i < _S_array_size; ++__i)
+ if (!__builtin_constant_p(_M_bits[__i]))
+ return false;
+ return true;
+ }
+ }
+};
+
+// }}}
+
+// vvv ---- builtin vector types [[gnu::vector_size(N)]] and operations ---- vvv
+// __min_vector_size {{{
+template <typename _Tp = void>
+static inline constexpr int __min_vector_size = 2 * sizeof(_Tp);
+#if _GLIBCXX_SIMD_HAVE_NEON
+template <> inline constexpr int __min_vector_size<void> = 8;
+#else
+template <> inline constexpr int __min_vector_size<void> = 16;
+#endif
+
+// }}}
+// __vector_type {{{
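+// Maps (_Tp, _Np) to a GNU vector type. For example, __vector_type_t<float, 4>
+// (defined below) is 'float [[__gnu__::__vector_size__(16)]]', and
+// __vector_type_t<float, 3> rounds the 12 bytes up to the same 16-byte type.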
+template <typename _Tp, size_t _Np, typename = void> struct __vector_type_n
+{
+};
+
+// substitution failure for 0-element case
+template <typename _Tp> struct __vector_type_n<_Tp, 0, void>
+{
+};
+
+// special case 1-element to be _Tp itself
+template <typename _Tp>
+struct __vector_type_n<_Tp, 1, enable_if_t<__is_vectorizable_v<_Tp>>>
+{
+ using type = _Tp;
+};
+
+// else, use GNU-style builtin vector types
+template <typename _Tp, size_t _Np>
+struct __vector_type_n<_Tp, _Np,
+ enable_if_t<__is_vectorizable_v<_Tp> && _Np >= 2>>
+{
+ static constexpr size_t _Bytes = _Np * sizeof(_Tp) < __min_vector_size<_Tp>
+ ? __min_vector_size<_Tp>
+ : __next_power_of_2(_Np * sizeof(_Tp));
+ using type [[__gnu__::__vector_size__(_Bytes)]] = _Tp;
+};
+
+template <typename _Tp, size_t _Bytes, size_t = _Bytes % sizeof(_Tp)>
+struct __vector_type;
+
+template <typename _Tp, size_t _Bytes>
+struct __vector_type<_Tp, _Bytes, 0>
+ : __vector_type_n<_Tp, _Bytes / sizeof(_Tp)>
+{
+};
+
+template <typename _Tp, size_t _Size>
+using __vector_type_t = typename __vector_type_n<_Tp, _Size>::type;
+template <typename _Tp>
+using __vector_type2_t = typename __vector_type<_Tp, 2>::type;
+template <typename _Tp>
+using __vector_type4_t = typename __vector_type<_Tp, 4>::type;
+template <typename _Tp>
+using __vector_type8_t = typename __vector_type<_Tp, 8>::type;
+template <typename _Tp>
+using __vector_type16_t = typename __vector_type<_Tp, 16>::type;
+template <typename _Tp>
+using __vector_type32_t = typename __vector_type<_Tp, 32>::type;
+template <typename _Tp>
+using __vector_type64_t = typename __vector_type<_Tp, 64>::type;
+template <typename _Tp>
+using __vector_type128_t = typename __vector_type<_Tp, 128>::type;
+
+// }}}
+// __is_vector_type {{{
+template <typename _Tp, typename = std::void_t<>>
+struct __is_vector_type : false_type
+{
+};
+template <typename _Tp>
+struct __is_vector_type<
+ _Tp, std::void_t<typename __vector_type<decltype(std::declval<_Tp>()[0]),
+ sizeof(_Tp)>::type>>
+ : std::is_same<_Tp, typename __vector_type<decltype(std::declval<_Tp>()[0]),
+ sizeof(_Tp)>::type>
+{
+};
+
+template <typename _Tp>
+inline constexpr bool __is_vector_type_v = __is_vector_type<_Tp>::value;
+
+// }}}
+// _VectorTraits{{{
+template <typename _Tp, typename = std::void_t<>> struct _VectorTraitsImpl;
+template <typename _Tp>
+struct _VectorTraitsImpl<_Tp, enable_if_t<__is_vector_type_v<_Tp>>>
+{
+ using type = _Tp;
+ using value_type = decltype(std::declval<_Tp>()[0]);
+ static constexpr int _S_width = sizeof(_Tp) / sizeof(value_type);
+ using _Wrapper = _SimdWrapper<value_type, _S_width>;
+ template <typename _Up, int _W = _S_width>
+ static constexpr bool __is = std::is_same_v<value_type, _Up> && _W == _S_width;
+};
+template <typename _Tp, size_t _Np>
+struct _VectorTraitsImpl<_SimdWrapper<_Tp, _Np>,
+ std::void_t<__vector_type_t<_Tp, _Np>>>
+{
+ using type = __vector_type_t<_Tp, _Np>;
+ using value_type = _Tp;
+ static constexpr int _S_width = sizeof(type) / sizeof(value_type);
+ using _Wrapper = _SimdWrapper<_Tp, _Np>;
+ static constexpr bool _S_is_partial = (_Np != _S_width);
+ static constexpr int _S_partial_width = _Np;
+ template <typename _Up, int _W = _S_width>
+ static constexpr bool __is = std::is_same_v<value_type, _Up> && _W == _S_width;
+};
+
+template <typename _Tp, typename = typename _VectorTraitsImpl<_Tp>::type>
+using _VectorTraits = _VectorTraitsImpl<_Tp>;
+
+// }}}
+// __as_vector{{{
+template <typename _V>
+_GLIBCXX_SIMD_INTRINSIC constexpr auto
+__as_vector(_V __x)
+{
+ if constexpr (__is_vector_type_v<_V>)
+ return __x;
+ else if constexpr (is_simd<_V>::value || is_simd_mask<_V>::value)
+ return __data(__x)._M_data;
+ else if constexpr (__is_vectorizable_v<_V>)
+ return __vector_type_t<_V, 2>{__x};
+ else
+ return __x._M_data;
+}
+
+// }}}
+// __as_wrapper{{{
+template <typename _V>
+_GLIBCXX_SIMD_INTRINSIC constexpr auto
+__as_wrapper(_V __x)
+{
+ if constexpr (__is_vector_type_v<_V>)
+ return _SimdWrapper<typename _VectorTraits<_V>::value_type,
+ _VectorTraits<_V>::_S_width>(__x);
+ else if constexpr (is_simd<_V>::value || is_simd_mask<_V>::value)
+ return __data(__x);
+ else
+ return __x;
+}
+
+// }}}
+// __intrin_bitcast{{{
+template <typename _To, typename _From>
+_GLIBCXX_SIMD_INTRINSIC constexpr _To
+__intrin_bitcast(_From __v)
+{
+ static_assert(__is_vector_type_v<_From> && __is_vector_type_v<_To>);
+ if constexpr (sizeof(_To) == sizeof(_From))
+ return reinterpret_cast<_To>(__v);
+ else if constexpr (sizeof(_From) > sizeof(_To))
+ if constexpr (sizeof(_To) >= 16)
+ return reinterpret_cast<const __may_alias<_To>&>(__v);
+ else
+ {
+ _To __r;
+ __builtin_memcpy(&__r, &__v, sizeof(_To));
+ return __r;
+ }
+#if _GLIBCXX_SIMD_X86INTRIN
+ else if constexpr (__have_avx && sizeof(_From) == 16 && sizeof(_To) == 32)
+ return reinterpret_cast<_To>(__builtin_ia32_ps256_ps(
+ reinterpret_cast<__vector_type_t<float, 4>>(__v)));
+ else if constexpr (__have_avx512f && sizeof(_From) == 16 && sizeof(_To) == 64)
+ return reinterpret_cast<_To>(__builtin_ia32_ps512_ps(
+ reinterpret_cast<__vector_type_t<float, 4>>(__v)));
+ else if constexpr (__have_avx512f && sizeof(_From) == 32 && sizeof(_To) == 64)
+ return reinterpret_cast<_To>(__builtin_ia32_ps512_256ps(
+ reinterpret_cast<__vector_type_t<float, 8>>(__v)));
+#endif // _GLIBCXX_SIMD_X86INTRIN
+ else if constexpr (sizeof(__v) <= 8)
+ return reinterpret_cast<_To>(
+ __vector_type_t<__int_for_sizeof_t<_From>, sizeof(_To) / sizeof(_From)>{
+ reinterpret_cast<__int_for_sizeof_t<_From>>(__v)});
+ else
+ {
+ static_assert(sizeof(_To) > sizeof(_From));
+ _To __r = {};
+ __builtin_memcpy(&__r, &__v, sizeof(_From));
+ return __r;
+ }
+}
+
+// }}}
+// __vector_bitcast{{{
+template <typename _To, size_t _NN = 0, typename _From,
+ typename _FromVT = _VectorTraits<_From>,
+ size_t _Np = _NN == 0 ? sizeof(_From) / sizeof(_To) : _NN>
+_GLIBCXX_SIMD_INTRINSIC constexpr __vector_type_t<_To, _Np>
+__vector_bitcast(_From __x)
+{
+ using _R = __vector_type_t<_To, _Np>;
+ return __intrin_bitcast<_R>(__x);
+}
+template <typename _To, size_t _NN = 0, typename _Tp, size_t _Nx,
+ size_t _Np
+ = _NN == 0 ? sizeof(_SimdWrapper<_Tp, _Nx>) / sizeof(_To) : _NN>
+_GLIBCXX_SIMD_INTRINSIC constexpr __vector_type_t<_To, _Np>
+__vector_bitcast(const _SimdWrapper<_Tp, _Nx>& __x)
+{
+ static_assert(_Np > 1);
+ return __intrin_bitcast<__vector_type_t<_To, _Np>>(__x._M_data);
+}
+
+// }}}
+// __convert_x86 declarations {{{
+#ifdef _GLIBCXX_SIMD_WORKAROUND_PR85048
+template <typename _To, typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+_To __convert_x86(_Tp);
+
+template <typename _To, typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+_To __convert_x86(_Tp, _Tp);
+
+template <typename _To, typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+_To __convert_x86(_Tp, _Tp, _Tp, _Tp);
+
+template <typename _To, typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+_To __convert_x86(_Tp, _Tp, _Tp, _Tp, _Tp, _Tp, _Tp, _Tp);
+
+template <typename _To, typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+_To __convert_x86(_Tp, _Tp, _Tp, _Tp, _Tp, _Tp, _Tp, _Tp, _Tp, _Tp, _Tp, _Tp,
+ _Tp, _Tp, _Tp, _Tp);
+#endif // _GLIBCXX_SIMD_WORKAROUND_PR85048
+
+//}}}
+// __to_intrin {{{
+template <typename _Tp, typename _TVT = _VectorTraits<_Tp>,
+ typename _R
+ = __intrinsic_type_t<typename _TVT::value_type, _TVT::_S_width>>
+_GLIBCXX_SIMD_INTRINSIC constexpr _R
+__to_intrin(_Tp __x)
+{
+ static_assert(sizeof(__x) <= sizeof(_R),
+ "__to_intrin may never drop values off the end");
+ if constexpr (sizeof(__x) == sizeof(_R))
+ return reinterpret_cast<_R>(__as_vector(__x));
+ else
+ {
+ using _Up = __int_for_sizeof_t<_Tp>;
+ return reinterpret_cast<_R>(
+ __vector_type_t<_Up, sizeof(_R) / sizeof(_Up)>{__bit_cast<_Up>(__x)});
+ }
+}
+
+// }}}
+// __make_vector{{{
+template <typename _Tp, typename... _Args>
+_GLIBCXX_SIMD_INTRINSIC constexpr __vector_type_t<_Tp, sizeof...(_Args)>
+__make_vector(const _Args&... __args)
+{
+ return __vector_type_t<_Tp, sizeof...(_Args)>{static_cast<_Tp>(__args)...};
+}
+
+// }}}
+// __vector_broadcast{{{
+template <size_t _Np, typename _Tp>
+_GLIBCXX_SIMD_INTRINSIC constexpr __vector_type_t<_Tp, _Np>
+__vector_broadcast(_Tp __x)
+{
+ return __call_with_n_evaluations<_Np>(
+ [](auto... __xx) { return __vector_type_t<_Tp, _Np>{__xx...}; },
+ [&__x](int) { return __x; });
+}
+
+// }}}
+// __generate_vector{{{
+template <typename _Tp, size_t _Np, typename _Gp, size_t... _I>
+_GLIBCXX_SIMD_INTRINSIC constexpr __vector_type_t<_Tp, _Np>
+__generate_vector_impl(_Gp&& __gen, std::index_sequence<_I...>)
+{
+#ifdef _GLIBCXX_SIMD_WORKAROUND_PR89229
+ // With -S -fverbose-asm this function turned up as the place where the
+ // invalid instruction was produced. Use some arbitrary memory clobbers to
+ // defeat the optimizer and thus avoid the problem.
+ if constexpr (__have_avx512f && !__have_avx512vl && sizeof(_Tp) == 8
+ && std::is_integral_v<_Tp>)
+ if (!__builtin_is_constant_evaluated())
+ [] { asm("" ::: "memory"); }();
+#endif
+ return __vector_type_t<_Tp, _Np>{
+ static_cast<_Tp>(__gen(_SizeConstant<_I>()))...};
+}
+
+template <typename _V, typename _VVT = _VectorTraits<_V>, typename _Gp>
+_GLIBCXX_SIMD_INTRINSIC constexpr _V
+__generate_vector(_Gp&& __gen)
+{
+ if constexpr (__is_vector_type_v<_V>)
+ return __generate_vector_impl<typename _VVT::value_type, _VVT::_S_width>(
+ static_cast<_Gp&&>(__gen), std::make_index_sequence<_VVT::_S_width>());
+ else
+ return __generate_vector_impl<typename _VVT::value_type,
+ _VVT::_S_partial_width>(
+ static_cast<_Gp&&>(__gen),
+ std::make_index_sequence<_VVT::_S_partial_width>());
+}
+
+template <typename _Tp, size_t _Np, typename _Gp>
+_GLIBCXX_SIMD_INTRINSIC constexpr __vector_type_t<_Tp, _Np>
+__generate_vector(_Gp&& __gen)
+{
+ return __generate_vector_impl<_Tp, _Np>(static_cast<_Gp&&>(__gen),
+ std::make_index_sequence<_Np>());
+}
+
+// }}}
+// __xor{{{
+template <typename _Tp, typename _TVT = _VectorTraits<_Tp>, typename... _Dummy>
+_GLIBCXX_SIMD_INTRINSIC constexpr _Tp
+__xor(_Tp __a, typename _TVT::type __b, _Dummy...) noexcept
+{
+ static_assert(sizeof...(_Dummy) == 0);
+ using _Up = typename _TVT::value_type;
+ using _Ip = make_unsigned_t<__int_for_sizeof_t<_Up>>;
+ return __vector_bitcast<_Up>(__vector_bitcast<_Ip>(__a)
+ ^ __vector_bitcast<_Ip>(__b));
+}
+
+template <typename _Tp, typename = decltype(_Tp() ^ _Tp())>
+_GLIBCXX_SIMD_INTRINSIC constexpr _Tp
+__xor(_Tp __a, _Tp __b) noexcept
+{
+ return __a ^ __b;
+}
+
+// }}}
+// __or{{{
+template <typename _Tp, typename _TVT = _VectorTraits<_Tp>, typename... _Dummy>
+_GLIBCXX_SIMD_INTRINSIC constexpr _Tp
+__or(_Tp __a, typename _TVT::type __b, _Dummy...) noexcept
+{
+ static_assert(sizeof...(_Dummy) == 0);
+ using _Up = typename _TVT::value_type;
+ using _Ip = make_unsigned_t<__int_for_sizeof_t<_Up>>;
+ return __vector_bitcast<_Up>(__vector_bitcast<_Ip>(__a)
+ | __vector_bitcast<_Ip>(__b));
+}
+
+template <typename _Tp, typename = decltype(_Tp() | _Tp())>
+_GLIBCXX_SIMD_INTRINSIC constexpr _Tp
+__or(_Tp __a, _Tp __b) noexcept
+{
+ return __a | __b;
+}
+
+// }}}
+// __and{{{
+template <typename _Tp, typename _TVT = _VectorTraits<_Tp>, typename... _Dummy>
+_GLIBCXX_SIMD_INTRINSIC constexpr _Tp
+__and(_Tp __a, typename _TVT::type __b, _Dummy...) noexcept
+{
+ static_assert(sizeof...(_Dummy) == 0);
+ using _Up = typename _TVT::value_type;
+ using _Ip = make_unsigned_t<__int_for_sizeof_t<_Up>>;
+ return __vector_bitcast<_Up>(__vector_bitcast<_Ip>(__a)
+ & __vector_bitcast<_Ip>(__b));
+}
+
+template <typename _Tp, typename = decltype(_Tp() & _Tp())>
+_GLIBCXX_SIMD_INTRINSIC constexpr _Tp
+__and(_Tp __a, _Tp __b) noexcept
+{
+ return __a & __b;
+}
+
+// }}}
+// __andnot{{{
+#if _GLIBCXX_SIMD_X86INTRIN && !defined __clang__
+static constexpr struct
+{
+ _GLIBCXX_SIMD_INTRINSIC __v4sf operator()(__v4sf __a,
+ __v4sf __b) const noexcept
+ {
+ return __builtin_ia32_andnps(__a, __b);
+ }
+ _GLIBCXX_SIMD_INTRINSIC __v2df operator()(__v2df __a,
+ __v2df __b) const noexcept
+ {
+ return __builtin_ia32_andnpd(__a, __b);
+ }
+ _GLIBCXX_SIMD_INTRINSIC __v2di operator()(__v2di __a,
+ __v2di __b) const noexcept
+ {
+ return __builtin_ia32_pandn128(__a, __b);
+ }
+ _GLIBCXX_SIMD_INTRINSIC __v8sf operator()(__v8sf __a,
+ __v8sf __b) const noexcept
+ {
+ return __builtin_ia32_andnps256(__a, __b);
+ }
+ _GLIBCXX_SIMD_INTRINSIC __v4df operator()(__v4df __a,
+ __v4df __b) const noexcept
+ {
+ return __builtin_ia32_andnpd256(__a, __b);
+ }
+ _GLIBCXX_SIMD_INTRINSIC __v4di operator()(__v4di __a,
+ __v4di __b) const noexcept
+ {
+ return __builtin_ia32_andnotsi256(__a, __b);
+ }
+ _GLIBCXX_SIMD_INTRINSIC __v16sf operator()(__v16sf __a,
+ __v16sf __b) const noexcept
+ {
+ if constexpr (__have_avx512dq)
+ return _mm512_andnot_ps(__a, __b);
+ else
+ return reinterpret_cast<__v16sf>(
+ _mm512_andnot_si512(reinterpret_cast<__v8di>(__a),
+ reinterpret_cast<__v8di>(__b)));
+ }
+ _GLIBCXX_SIMD_INTRINSIC __v8df operator()(__v8df __a,
+ __v8df __b) const noexcept
+ {
+ if constexpr (__have_avx512dq)
+ return _mm512_andnot_pd(__a, __b);
+ else
+ return reinterpret_cast<__v8df>(
+ _mm512_andnot_si512(reinterpret_cast<__v8di>(__a),
+ reinterpret_cast<__v8di>(__b)));
+ }
+ _GLIBCXX_SIMD_INTRINSIC __v8di operator()(__v8di __a,
+ __v8di __b) const noexcept
+ {
+ return _mm512_andnot_si512(__a, __b);
+ }
+} _S_x86_andnot;
+#endif // _GLIBCXX_SIMD_X86INTRIN && !__clang__
+
+template <typename _Tp, typename _TVT = _VectorTraits<_Tp>, typename... _Dummy>
+_GLIBCXX_SIMD_INTRINSIC constexpr _Tp
+__andnot(_Tp __a, typename _TVT::type __b, _Dummy...) noexcept
+{
+ static_assert(sizeof...(_Dummy) == 0);
+#if _GLIBCXX_SIMD_X86INTRIN && !defined __clang__
+ if constexpr (sizeof(_Tp) >= 16)
+ {
+ const auto __ai = __to_intrin(__a);
+ const auto __bi = __to_intrin(__b);
+ if (!__builtin_is_constant_evaluated()
+ && !(__builtin_constant_p(__ai) && __builtin_constant_p(__bi)))
+ {
+ const auto __r = _S_x86_andnot(__ai, __bi);
+ if constexpr (is_convertible_v<decltype(__r), _Tp>)
+ return __r;
+ else
+ return reinterpret_cast<_Tp>(__r);
+ }
+ }
+#endif // _GLIBCXX_SIMD_X86INTRIN
+ using _Up = typename _TVT::value_type;
+ using _Ip = make_unsigned_t<__int_for_sizeof_t<_Up>>;
+ return __vector_bitcast<_Up>(~__vector_bitcast<_Ip>(__a)
+ & __vector_bitcast<_Ip>(__b));
+}
+
+template <typename _Tp, typename = decltype(~_Tp() & _Tp())>
+_GLIBCXX_SIMD_INTRINSIC constexpr _Tp
+__andnot(_Tp __a, _Tp __b) noexcept
+{
+ return ~__a & __b;
+}
+
+// }}}
+// __not{{{
+template <typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+_GLIBCXX_SIMD_INTRINSIC constexpr _Tp
+__not(_Tp __a) noexcept
+{
+ if constexpr (std::is_floating_point_v<typename _TVT::value_type>)
+ return reinterpret_cast<typename _TVT::type>(
+ ~__vector_bitcast<unsigned>(__a));
+ else
+ return ~__a;
+}
+
+// }}}
+// __concat{{{
+template <typename _Tp, typename _TVT = _VectorTraits<_Tp>,
+ typename _R
+ = __vector_type_t<typename _TVT::value_type, _TVT::_S_width * 2>>
+constexpr _R
+__concat(_Tp a_, _Tp b_)
+{
+#ifdef _GLIBCXX_SIMD_WORKAROUND_XXX_1
+ using _W
+ = std::conditional_t<std::is_floating_point_v<typename _TVT::value_type>,
+ double,
+ conditional_t<(sizeof(_Tp) >= 2 * sizeof(long long)),
+ long long, typename _TVT::value_type>>;
+ constexpr int input_width = sizeof(_Tp) / sizeof(_W);
+ const auto __a = __vector_bitcast<_W>(a_);
+ const auto __b = __vector_bitcast<_W>(b_);
+ using _Up = __vector_type_t<_W, sizeof(_R) / sizeof(_W)>;
+#else
+ constexpr int input_width = _TVT::_S_width;
+ const _Tp& __a = a_;
+ const _Tp& __b = b_;
+ using _Up = _R;
+#endif
+ if constexpr (input_width == 2)
+ return reinterpret_cast<_R>(_Up{__a[0], __a[1], __b[0], __b[1]});
+ else if constexpr (input_width == 4)
+ return reinterpret_cast<_R>(
+ _Up{__a[0], __a[1], __a[2], __a[3], __b[0], __b[1], __b[2], __b[3]});
+ else if constexpr (input_width == 8)
+ return reinterpret_cast<_R>(
+ _Up{__a[0], __a[1], __a[2], __a[3], __a[4], __a[5], __a[6], __a[7],
+ __b[0], __b[1], __b[2], __b[3], __b[4], __b[5], __b[6], __b[7]});
+ else if constexpr (input_width == 16)
+ return reinterpret_cast<_R>(
+ _Up{__a[0], __a[1], __a[2], __a[3], __a[4], __a[5], __a[6],
+ __a[7], __a[8], __a[9], __a[10], __a[11], __a[12], __a[13],
+ __a[14], __a[15], __b[0], __b[1], __b[2], __b[3], __b[4],
+ __b[5], __b[6], __b[7], __b[8], __b[9], __b[10], __b[11],
+ __b[12], __b[13], __b[14], __b[15]});
+ else if constexpr (input_width == 32)
+ return reinterpret_cast<_R>(_Up{
+ __a[0], __a[1], __a[2], __a[3], __a[4], __a[5], __a[6], __a[7],
+ __a[8], __a[9], __a[10], __a[11], __a[12], __a[13], __a[14], __a[15],
+ __a[16], __a[17], __a[18], __a[19], __a[20], __a[21], __a[22], __a[23],
+ __a[24], __a[25], __a[26], __a[27], __a[28], __a[29], __a[30], __a[31],
+ __b[0], __b[1], __b[2], __b[3], __b[4], __b[5], __b[6], __b[7],
+ __b[8], __b[9], __b[10], __b[11], __b[12], __b[13], __b[14], __b[15],
+ __b[16], __b[17], __b[18], __b[19], __b[20], __b[21], __b[22], __b[23],
+ __b[24], __b[25], __b[26], __b[27], __b[28], __b[29], __b[30], __b[31]});
+}
+
+// }}}
+// __zero_extend {{{
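+// __zero_extend(__x) returns a proxy that converts to any wider vector type
+// with the same value_type. A usage sketch: assigning __zero_extend(__x) of a
+// 4-float __x to a __vector_type_t<float, 8> value-initializes the upper four
+// elements.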
+template <typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+struct _ZeroExtendProxy
+{
+ using value_type = typename _TVT::value_type;
+ static constexpr size_t _Np = _TVT::_S_width;
+ const _Tp __x;
+
+ template <typename _To, typename _ToVT = _VectorTraits<_To>,
+ typename
+ = enable_if_t<is_same_v<typename _ToVT::value_type, value_type>>>
+ _GLIBCXX_SIMD_INTRINSIC operator _To() const
+ {
+ constexpr size_t _ToN = _ToVT::_S_width;
+ if constexpr (_ToN == _Np)
+ return __x;
+ else if constexpr (_ToN == 2 * _Np)
+ {
+#ifdef _GLIBCXX_SIMD_WORKAROUND_XXX_3
+ if constexpr (__have_avx && _TVT::template __is<float, 4>)
+ return __vector_bitcast<value_type>(
+ _mm256_insertf128_ps(__m256(), __x, 0));
+ else if constexpr (__have_avx && _TVT::template __is<double, 2>)
+ return __vector_bitcast<value_type>(
+ _mm256_insertf128_pd(__m256d(), __x, 0));
+ else if constexpr (__have_avx2 && _Np * sizeof(value_type) == 16)
+ return __vector_bitcast<value_type>(
+ _mm256_insertf128_si256(__m256i(), __to_intrin(__x), 0));
+ else if constexpr (__have_avx512f && _TVT::template __is<float, 8>)
+ {
+ if constexpr (__have_avx512dq)
+ return __vector_bitcast<value_type>(
+ _mm512_insertf32x8(__m512(), __x, 0));
+ else
+ return reinterpret_cast<__m512>(
+ _mm512_insertf64x4(__m512d(), reinterpret_cast<__m256d>(__x),
+ 0));
+ }
+ else if constexpr (__have_avx512f && _TVT::template __is<double, 4>)
+ return __vector_bitcast<value_type>(
+ _mm512_insertf64x4(__m512d(), __x, 0));
+ else if constexpr (__have_avx512f && _Np * sizeof(value_type) == 32)
+ return __vector_bitcast<value_type>(
+ _mm512_inserti64x4(__m512i(), __to_intrin(__x), 0));
+#endif
+ return __concat(__x, _Tp());
+ }
+ else if constexpr (_ToN == 4 * _Np)
+ {
+#ifdef _GLIBCXX_SIMD_WORKAROUND_XXX_3
+ if constexpr (__have_avx512dq && _TVT::template __is<double, 2>)
+ {
+ return __vector_bitcast<value_type>(
+ _mm512_insertf64x2(__m512d(), __x, 0));
+ }
+ else if constexpr (__have_avx512f
+ && std::is_floating_point_v<value_type>)
+ {
+ return __vector_bitcast<value_type>(
+ _mm512_insertf32x4(__m512(), reinterpret_cast<__m128>(__x), 0));
+ }
+ else if constexpr (__have_avx512f && _Np * sizeof(value_type) == 16)
+ {
+ return __vector_bitcast<value_type>(
+ _mm512_inserti32x4(__m512i(), __to_intrin(__x), 0));
+ }
+#endif
+ return __concat(__concat(__x, _Tp()),
+ __vector_type_t<value_type, _Np * 2>());
+ }
+ else if constexpr (_ToN == 8 * _Np)
+ return __concat(operator __vector_type_t<value_type, _Np * 4>(),
+ __vector_type_t<value_type, _Np * 4>());
+ else if constexpr (_ToN == 16 * _Np)
+ return __concat(operator __vector_type_t<value_type, _Np * 8>(),
+ __vector_type_t<value_type, _Np * 8>());
+ else
+ __assert_unreachable<_Tp>();
+ }
+};
+
+template <typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+_GLIBCXX_SIMD_INTRINSIC _ZeroExtendProxy<_Tp, _TVT>
+__zero_extend(_Tp __x)
+{
+ return {__x};
+}
+
+// }}}
+// __extract<_Np, By>{{{
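+// Returns the _Offset-th of _SplitBy equally sized pieces of __in. E.g.
+// __extract<1, 2>(__x) yields the upper half of __x (cf. __lo128/__hi128
+// below) and __extract<0, 2>(__x) the lower half.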
+template <
+ int _Offset, int _SplitBy, typename _Tp, typename _TVT = _VectorTraits<_Tp>,
+ typename _R
+ = __vector_type_t<typename _TVT::value_type, _TVT::_S_width / _SplitBy>>
+_GLIBCXX_SIMD_INTRINSIC constexpr _R
+__extract(_Tp __in)
+{
+ using value_type = typename _TVT::value_type;
+#if _GLIBCXX_SIMD_X86INTRIN // {{{
+ if constexpr (sizeof(_Tp) == 64 && _SplitBy == 4 && _Offset > 0)
+ {
+ if constexpr (__have_avx512dq && std::is_same_v<double, value_type>)
+ return _mm512_extractf64x2_pd(__to_intrin(__in), _Offset);
+ else if constexpr (std::is_floating_point_v<value_type>)
+ return __vector_bitcast<value_type>(
+ _mm512_extractf32x4_ps(__intrin_bitcast<__m512>(__in), _Offset));
+ else
+ return reinterpret_cast<_R>(
+ _mm512_extracti32x4_epi32(__intrin_bitcast<__m512i>(__in), _Offset));
+ }
+ else
+#endif // _GLIBCXX_SIMD_X86INTRIN }}}
+ {
+#ifdef _GLIBCXX_SIMD_WORKAROUND_XXX_1
+ using _W = std::conditional_t<
+ std::is_floating_point_v<value_type>, double,
+ std::conditional_t<(sizeof(_R) >= 16), long long, value_type>>;
+ static_assert(sizeof(_R) % sizeof(_W) == 0);
+ constexpr int __return_width = sizeof(_R) / sizeof(_W);
+ using _Up = __vector_type_t<_W, __return_width>;
+ const auto __x = __vector_bitcast<_W>(__in);
+#else
+ constexpr int __return_width = _TVT::_S_width / _SplitBy;
+ using _Up = _R;
+ const __vector_type_t<value_type, _TVT::_S_width>& __x
+ = __in; // only needed for _Tp = _SimdWrapper<value_type, _Np>
+#endif
+ constexpr int _O = _Offset * __return_width;
+ return __call_with_subscripts<__return_width, _O>(
+ __x, [](auto... __entries) {
+ return reinterpret_cast<_R>(_Up{__entries...});
+ });
+ }
+}
+
+// }}}
+// __lo/__hi64[z]{{{
+template <typename _Tp,
+ typename _R
+ = __vector_type8_t<typename _VectorTraits<_Tp>::value_type>>
+_GLIBCXX_SIMD_INTRINSIC constexpr _R
+__lo64(_Tp __x)
+{
+ _R __r{};
+ __builtin_memcpy(&__r, &__x, 8);
+ return __r;
+}
+
+template <typename _Tp,
+ typename _R
+ = __vector_type8_t<typename _VectorTraits<_Tp>::value_type>>
+_GLIBCXX_SIMD_INTRINSIC constexpr _R
+__hi64(_Tp __x)
+{
+ static_assert(sizeof(_Tp) == 16, "use __hi64z if you meant it");
+ _R __r{};
+ __builtin_memcpy(&__r, reinterpret_cast<const char*>(&__x) + 8, 8);
+ return __r;
+}
+
+template <typename _Tp,
+ typename _R
+ = __vector_type8_t<typename _VectorTraits<_Tp>::value_type>>
+_GLIBCXX_SIMD_INTRINSIC constexpr _R
+__hi64z([[maybe_unused]] _Tp __x)
+{
+ _R __r{};
+ if constexpr (sizeof(_Tp) == 16)
+ __builtin_memcpy(&__r, reinterpret_cast<const char*>(&__x) + 8, 8);
+ return __r;
+}
+
+// }}}
+// __lo/__hi128{{{
+template <typename _Tp>
+_GLIBCXX_SIMD_INTRINSIC constexpr auto
+__lo128(_Tp __x)
+{
+ return __extract<0, sizeof(_Tp) / 16>(__x);
+}
+template <typename _Tp>
+_GLIBCXX_SIMD_INTRINSIC constexpr auto
+__hi128(_Tp __x)
+{
+ static_assert(sizeof(__x) == 32);
+ return __extract<1, 2>(__x);
+}
+
+// }}}
+// __lo/__hi256{{{
+template <typename _Tp>
+_GLIBCXX_SIMD_INTRINSIC constexpr auto
+__lo256(_Tp __x)
+{
+ static_assert(sizeof(__x) == 64);
+ return __extract<0, 2>(__x);
+}
+template <typename _Tp>
+_GLIBCXX_SIMD_INTRINSIC constexpr auto
+__hi256(_Tp __x)
+{
+ static_assert(sizeof(__x) == 64);
+ return __extract<1, 2>(__x);
+}
+
+// }}}
+// __auto_bitcast{{{
+template <typename _Tp> struct _AutoCast
+{
+ static_assert(__is_vector_type_v<_Tp>);
+ const _Tp __x;
+ template <typename _Up, typename _UVT = _VectorTraits<_Up>>
+ _GLIBCXX_SIMD_INTRINSIC constexpr operator _Up() const
+ {
+ return __intrin_bitcast<typename _UVT::type>(__x);
+ }
+};
+template <typename _Tp>
+_GLIBCXX_SIMD_INTRINSIC constexpr _AutoCast<_Tp>
+__auto_bitcast(const _Tp& __x)
+{
+ return {__x};
+}
+template <typename _Tp, size_t _Np>
+_GLIBCXX_SIMD_INTRINSIC constexpr _AutoCast<
+ typename _SimdWrapper<_Tp, _Np>::_BuiltinType>
+__auto_bitcast(const _SimdWrapper<_Tp, _Np>& __x)
+{
+ return {__x._M_data};
+}
+
+// }}}
+// ^^^ ---- builtin vector types [[gnu::vector_size(N)]] and operations ---- ^^^
+
+#if _GLIBCXX_SIMD_HAVE_SSE_ABI
+// __bool_storage_member_type{{{
+#if _GLIBCXX_SIMD_HAVE_AVX512F && _GLIBCXX_SIMD_X86INTRIN
+template <size_t _Size> struct __bool_storage_member_type
+{
+ static_assert((_Size & (_Size - 1)) != 0,
+ "This trait may only be used for non-power-of-2 sizes. "
+ "Power-of-2 sizes must be specialized.");
+ using type =
+ typename __bool_storage_member_type<__next_power_of_2(_Size)>::type;
+};
+template <> struct __bool_storage_member_type<1>
+{
+ using type = bool;
+};
+template <> struct __bool_storage_member_type<2>
+{
+ using type = __mmask8;
+};
+template <> struct __bool_storage_member_type<4>
+{
+ using type = __mmask8;
+};
+template <> struct __bool_storage_member_type<8>
+{
+ using type = __mmask8;
+};
+template <> struct __bool_storage_member_type<16>
+{
+ using type = __mmask16;
+};
+template <> struct __bool_storage_member_type<32>
+{
+ using type = __mmask32;
+};
+template <> struct __bool_storage_member_type<64>
+{
+ using type = __mmask64;
+};
+#endif // _GLIBCXX_SIMD_HAVE_AVX512F
+
+// }}}
+// __intrinsic_type (x86){{{
+// the following excludes bool via __is_vectorizable
+#if _GLIBCXX_SIMD_HAVE_SSE
+template <typename _Tp, size_t _Bytes>
+struct __intrinsic_type<
+ _Tp, _Bytes, std::enable_if_t<__is_vectorizable_v<_Tp> && _Bytes <= 64>>
+{
+ static_assert(!std::is_same_v<_Tp, long double>,
+ "no __intrinsic_type support for long double on x86");
+ static constexpr std::size_t _VBytes
+ = _Bytes <= 16 ? 16 : _Bytes <= 32 ? 32 : 64;
+ using type [[__gnu__::__vector_size__(_VBytes)]]
+ = std::conditional_t<std::is_integral_v<_Tp>, long long int, _Tp>;
+};
+#endif // _GLIBCXX_SIMD_HAVE_SSE
+
+// }}}
+#endif // _GLIBCXX_SIMD_HAVE_SSE_ABI
+// __intrinsic_type (ARM){{{
+#if _GLIBCXX_SIMD_HAVE_NEON
+#define _GLIBCXX_SIMD_NEON_INTRIN(_Tp) \
+ template <> \
+ struct __intrinsic_type<__remove_cvref_t<decltype(_Tp()[0])>, sizeof(_Tp), \
+ void> \
+ { \
+ using type = _Tp; \
+ }
+_GLIBCXX_SIMD_NEON_INTRIN(int8x8_t);
+_GLIBCXX_SIMD_NEON_INTRIN(int8x16_t);
+_GLIBCXX_SIMD_NEON_INTRIN(int16x4_t);
+_GLIBCXX_SIMD_NEON_INTRIN(int16x8_t);
+_GLIBCXX_SIMD_NEON_INTRIN(int32x2_t);
+_GLIBCXX_SIMD_NEON_INTRIN(int32x4_t);
+_GLIBCXX_SIMD_NEON_INTRIN(uint8x8_t);
+_GLIBCXX_SIMD_NEON_INTRIN(uint8x16_t);
+_GLIBCXX_SIMD_NEON_INTRIN(uint16x4_t);
+_GLIBCXX_SIMD_NEON_INTRIN(uint16x8_t);
+_GLIBCXX_SIMD_NEON_INTRIN(uint32x2_t);
+_GLIBCXX_SIMD_NEON_INTRIN(uint32x4_t);
+#if defined __ARM_FEATURE_FP16_VECTOR_ARITHMETIC
+_GLIBCXX_SIMD_NEON_INTRIN(float16x4_t);
+_GLIBCXX_SIMD_NEON_INTRIN(float16x8_t);
+#endif
+_GLIBCXX_SIMD_NEON_INTRIN(float32x2_t);
+_GLIBCXX_SIMD_NEON_INTRIN(float32x4_t);
+#if defined __aarch64__
+_GLIBCXX_SIMD_NEON_INTRIN(int64x1_t);
+_GLIBCXX_SIMD_NEON_INTRIN(uint64x1_t);
+_GLIBCXX_SIMD_NEON_INTRIN(float64x1_t);
+_GLIBCXX_SIMD_NEON_INTRIN(float64x2_t);
+#endif
+_GLIBCXX_SIMD_NEON_INTRIN(int64x2_t);
+_GLIBCXX_SIMD_NEON_INTRIN(uint64x2_t);
+#undef _GLIBCXX_SIMD_NEON_INTRIN
+
+template <typename _Tp, size_t _Bytes>
+struct __intrinsic_type<_Tp, _Bytes,
+ enable_if_t<__is_vectorizable_v<_Tp> && _Bytes <= 16>>
+{
+ static constexpr int _VBytes = _Bytes <= 8 ? 8 : 16;
+ using _Tmp = conditional_t<
+ sizeof(_Tp) == 1, __remove_cvref_t<decltype(int8x16_t()[0])>,
+ conditional_t<
+ sizeof(_Tp) == 2, short,
+ conditional_t<
+ sizeof(_Tp) == 4, int,
+ conditional_t<sizeof(_Tp) == 8,
+ __remove_cvref_t<decltype(int64x2_t()[0])>, void>>>>;
+ using _Up = conditional_t<
+ is_floating_point_v<_Tp>, _Tp,
+ conditional_t<is_unsigned_v<_Tp>, make_unsigned_t<_Tmp>, _Tmp>>;
+ using type = typename __intrinsic_type<_Up, _VBytes>::type;
+};
+#endif // _GLIBCXX_SIMD_HAVE_NEON
+
+// }}}
+// __intrinsic_type (PPC){{{
+#ifdef __ALTIVEC__
+template <typename _Tp> struct __intrinsic_type_impl;
+#define _GLIBCXX_SIMD_PPC_INTRIN(_Tp) \
+ template <> struct __intrinsic_type_impl<_Tp> \
+ { \
+ using type = __vector _Tp; \
+ }
+_GLIBCXX_SIMD_PPC_INTRIN(float);
+_GLIBCXX_SIMD_PPC_INTRIN(double);
+_GLIBCXX_SIMD_PPC_INTRIN(signed char);
+_GLIBCXX_SIMD_PPC_INTRIN(unsigned char);
+_GLIBCXX_SIMD_PPC_INTRIN(signed short);
+_GLIBCXX_SIMD_PPC_INTRIN(unsigned short);
+_GLIBCXX_SIMD_PPC_INTRIN(signed int);
+_GLIBCXX_SIMD_PPC_INTRIN(unsigned int);
+_GLIBCXX_SIMD_PPC_INTRIN(signed long long);
+_GLIBCXX_SIMD_PPC_INTRIN(unsigned long long);
+#undef _GLIBCXX_SIMD_PPC_INTRIN
+
+template <typename _Tp, size_t _Bytes>
+struct __intrinsic_type<
+ _Tp, _Bytes, std::enable_if_t<__is_vectorizable_v<_Tp> && _Bytes <= 16>>
+{
+ static_assert(!std::is_same_v<_Tp, long double>,
+ "no __intrinsic_type support for long double on PPC");
+#ifndef __VSX__
+ static_assert(!std::is_same_v<_Tp, double>,
+ "no __intrinsic_type support for double on PPC w/o VSX");
+#endif
+#ifndef __POWER8_VECTOR__
+ static_assert(!(std::is_integral_v<_Tp> && sizeof(_Tp) > 4),
+ "no __intrinsic_type support for integers larger than 4 Bytes "
+ "on PPC w/o POWER8 vectors");
+#endif
+ using type = typename __intrinsic_type_impl<conditional_t<
+ is_floating_point_v<_Tp>, _Tp, __int_for_sizeof_t<_Tp>>>::type;
+};
+#endif // __ALTIVEC__
+
+// }}}
+// _SimdWrapper<bool>{{{1
+template <size_t _Width>
+struct _SimdWrapper<
+ bool, _Width, std::void_t<typename __bool_storage_member_type<_Width>::type>>
+{
+ using _BuiltinType = typename __bool_storage_member_type<_Width>::type;
+ using value_type = bool;
+ static constexpr size_t _S_width = sizeof(_BuiltinType) * CHAR_BIT;
+
+ _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper<bool, _S_width>
+ __as_full_vector() const
+ {
+ return _M_data;
+ }
+
+ _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper() = default;
+ _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper(_BuiltinType __k)
+ : _M_data(__k) {}
+
+ _GLIBCXX_SIMD_INTRINSIC operator const _BuiltinType &() const
+ {
+ return _M_data;
+ }
+ _GLIBCXX_SIMD_INTRINSIC operator _BuiltinType&() { return _M_data; }
+
+ _GLIBCXX_SIMD_INTRINSIC _BuiltinType __intrin() const { return _M_data; }
+
+ _GLIBCXX_SIMD_INTRINSIC constexpr value_type operator[](size_t __i) const
+ {
+ return _M_data & (_BuiltinType(1) << __i);
+ }
+ template <size_t __i>
+ _GLIBCXX_SIMD_INTRINSIC constexpr value_type
+ operator[](_SizeConstant<__i>) const
+ {
+ return _M_data & (_BuiltinType(1) << __i);
+ }
+ _GLIBCXX_SIMD_INTRINSIC constexpr void __set(size_t __i, value_type __x)
+ {
+ if (__x)
+ _M_data |= (_BuiltinType(1) << __i);
+ else
+ _M_data &= ~(_BuiltinType(1) << __i);
+ }
+
+ _GLIBCXX_SIMD_INTRINSIC
+ constexpr bool _M_is_constprop() const
+ {
+ return __builtin_constant_p(_M_data);
+ }
+
+ _BuiltinType _M_data;
+};
+
+// _SimdWrapperBase{{{1
+template <bool> struct _SimdWrapperBase;
+
+template <> struct _SimdWrapperBase<true> // no padding or no SNaNs
+{
+};
+
+#ifdef __SUPPORT_SNAN__
+template <>
+struct _SimdWrapperBase<false> // with padding that needs to never become SNaN
+{
+ _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapperBase() : _M_data() {}
+};
+#endif // __SUPPORT_SNAN__
+
+// }}}
+// _SimdWrapper{{{
+template <typename _Tp, size_t _Width>
+struct _SimdWrapper<
+ _Tp, _Width,
+ std::void_t<__vector_type_t<_Tp, _Width>, __intrinsic_type_t<_Tp, _Width>>>
+ : _SimdWrapperBase<
+#ifdef __SUPPORT_SNAN__
+ !std::numeric_limits<_Tp>::has_signaling_NaN
+ || sizeof(_Tp) * _Width == sizeof(__vector_type_t<_Tp, _Width>)
+#else
+ true
+#endif
+ >
+{
+ static_assert(__is_vectorizable_v<_Tp>);
+ static_assert(_Width >= 2); // 1 doesn't make sense, use _Tp directly then
+ using _BuiltinType = __vector_type_t<_Tp, _Width>;
+ using value_type = _Tp;
+ static constexpr size_t _S_width = sizeof(_BuiltinType) / sizeof(value_type);
+ static inline constexpr int __size = _Width;
+
+ _BuiltinType _M_data;
+
+ _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper<_Tp, _S_width>
+ __as_full_vector() const
+ {
+ return _M_data;
+ }
+
+ _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper(
+ std::initializer_list<_Tp> __init)
+ : _M_data(__generate_from_n_evaluations<_Width, _BuiltinType>(
+ [&](auto __i) { return __init.begin()[__i.value]; }))
+ {}
+
+ _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper() = default;
+ _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper(const _SimdWrapper&) = default;
+ _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper(_SimdWrapper&&) = default;
+ _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper& operator=(const _SimdWrapper&)
+ = default;
+ _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper& operator=(_SimdWrapper&&)
+ = default;
+
+ template <typename _V, typename = std::enable_if_t<std::disjunction_v<
+ is_same<_V, __vector_type_t<_Tp, _Width>>,
+ is_same<_V, __intrinsic_type_t<_Tp, _Width>>>>>
+ _GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper(_V __x)
+ : _M_data(__vector_bitcast<_Tp, _Width>(
+ __x)) // __vector_bitcast can convert e.g. __m128 to __vector(2) float
+ {}
+
+ template <typename... _As,
+ typename
+ = enable_if_t<((std::is_same_v<simd_abi::scalar, _As> && ...)
+ && sizeof...(_As) <= _Width)>>
+ _GLIBCXX_SIMD_INTRINSIC constexpr operator _SimdTuple<_Tp, _As...>() const
+ {
+ const auto& dd = _M_data; // workaround for GCC7 ICE
+ return __generate_from_n_evaluations<sizeof...(_As),
+ _SimdTuple<_Tp, _As...>>([&](
+ auto __i) constexpr { return dd[int(__i)]; });
+ }
+
+ _GLIBCXX_SIMD_INTRINSIC constexpr operator const _BuiltinType &() const
+ {
+ return _M_data;
+ }
+ _GLIBCXX_SIMD_INTRINSIC constexpr operator _BuiltinType&() { return _M_data; }
+
+ _GLIBCXX_SIMD_INTRINSIC constexpr _Tp operator[](size_t __i) const
+ {
+ return _M_data[__i];
+ }
+ template <size_t __i>
+ _GLIBCXX_SIMD_INTRINSIC constexpr _Tp operator[](_SizeConstant<__i>) const
+ {
+ return _M_data[__i];
+ }
+
+ _GLIBCXX_SIMD_INTRINSIC constexpr void __set(size_t __i, _Tp __x)
+ {
+ _M_data[__i] = __x;
+ }
+
+ _GLIBCXX_SIMD_INTRINSIC
+ constexpr bool _M_is_constprop() const
+ {
+ return __builtin_constant_p(_M_data);
+ }
+};
+
+// }}}
+
+// __vectorized_sizeof {{{
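+// Determines the number of bytes of the widest vector register the current
+// target can use for _Tp (falling back to sizeof(_Tp)). E.g. on an AVX2 (but
+// not AVX-512) target this returns 32 for float, so simd_abi::native<float>
+// below ends up with eight elements.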
+template <typename _Tp>
+constexpr size_t
+__vectorized_sizeof()
+{
+ if constexpr (!__is_vectorizable_v<_Tp>)
+ return 0;
+
+ if constexpr (sizeof(_Tp) <= 8)
+ {
+ // X86:
+ if constexpr (__have_avx512bw)
+ return 64;
+ if constexpr (__have_avx512f && sizeof(_Tp) >= 4)
+ return 64;
+ if constexpr (__have_avx2)
+ return 32;
+ if constexpr (__have_avx && std::is_floating_point_v<_Tp>)
+ return 32;
+ if constexpr (__have_sse2)
+ return 16;
+ if constexpr (__have_sse && std::is_same_v<_Tp, float>)
+ return 16;
+ if constexpr (__have_mmx && sizeof(_Tp) <= 4 && std::is_integral_v<_Tp>)
+ return 8;
+
+ // PowerPC:
+ if constexpr (__have_power8vec || (__have_power_vmx && (sizeof(_Tp) < 8))
+ || (__have_power_vsx && std::is_floating_point_v<_Tp>) )
+ return 16;
+
+ // ARM:
+ if constexpr (__have_neon_a64
+ || (__have_neon_a32 && !is_same_v<_Tp, double>) )
+ return 16;
+ if constexpr (__have_neon
+ && sizeof(_Tp) < 8
+ // Only allow fp if the user does not require IEC 559 conformance
+ // (e.g. via -ffast-math). ARMv7 NEON fp does not conform to IEC 559.
+ && (__GCC_IEC_559 == 0 || !std::is_floating_point_v<_Tp>) )
+ return 16;
+ }
+
+ return sizeof(_Tp);
+}
+
+// }}}
+namespace simd_abi {
+// most of simd_abi is defined in simd_detail.h
+template <typename _Tp>
+inline constexpr int max_fixed_size
+ = (__have_avx512bw && sizeof(_Tp) == 1) ? 64 : 32;
+// compatible {{{
+#if defined __x86_64__ || defined __aarch64__
+template <typename _Tp>
+using compatible
+ = std::conditional_t<(sizeof(_Tp) <= 8), _VecBuiltin<16>, scalar>;
+#elif defined __ARM_NEON
+// FIXME: not sure, probably needs to be scalar (or dependent on the hard-float
+// ABI?)
+template <typename _Tp>
+using compatible
+ = std::conditional_t<(sizeof(_Tp) < 8), _VecBuiltin<16>, scalar>;
+#else
+template <typename> using compatible = scalar;
+#endif
+
+// }}}
+// native {{{
+template <typename _Tp>
+constexpr auto
+__determine_native_abi()
+{
+ constexpr size_t __bytes = __vectorized_sizeof<_Tp>();
+ if constexpr (__bytes == sizeof(_Tp))
+ return static_cast<scalar*>(nullptr);
+ else if constexpr (__have_avx512vl || (__have_avx512f && __bytes == 64))
+ return static_cast<_VecBltnBtmsk<__bytes>*>(nullptr);
+ else
+ return static_cast<_VecBuiltin<__bytes>*>(nullptr);
+}
+
+template <typename _Tp, typename = enable_if_t<__is_vectorizable_v<_Tp>>>
+using native = std::remove_pointer_t<decltype(__determine_native_abi<_Tp>())>;
+
+// }}}
+// __default_abi {{{
+#if defined _GLIBCXX_SIMD_DEFAULT_ABI
+template <typename _Tp> using __default_abi = _GLIBCXX_SIMD_DEFAULT_ABI<_Tp>;
+#else
+template <typename _Tp> using __default_abi = compatible<_Tp>;
+#endif
+
+// }}}
+} // namespace simd_abi
+
+// traits {{{1
+// is_abi_tag {{{2
+template <typename _Tp, typename = std::void_t<>> struct is_abi_tag : false_type
+{
+};
+template <typename _Tp>
+struct is_abi_tag<_Tp, std::void_t<typename _Tp::_IsValidAbiTag>>
+ : public _Tp::_IsValidAbiTag
+{
+};
+template <typename _Tp>
+inline constexpr bool is_abi_tag_v = is_abi_tag<_Tp>::value;
+
+// is_simd(_mask) {{{2
+template <typename _Tp> struct is_simd : public false_type
+{
+};
+template <typename _Tp> inline constexpr bool is_simd_v = is_simd<_Tp>::value;
+
+template <typename _Tp> struct is_simd_mask : public false_type
+{
+};
+template <typename _Tp>
+inline constexpr bool is_simd_mask_v = is_simd_mask<_Tp>::value;
+
+// simd_size {{{2
+template <typename _Tp, typename _Abi, typename = void> struct __simd_size_impl
+{
+};
+template <typename _Tp, typename _Abi>
+struct __simd_size_impl<
+ _Tp, _Abi,
+ enable_if_t<std::conjunction_v<__is_vectorizable<_Tp>,
+ std::experimental::is_abi_tag<_Abi>>>>
+ : _SizeConstant<_Abi::template size<_Tp>>
+{
+};
+
+template <typename _Tp, typename _Abi = simd_abi::__default_abi<_Tp>>
+struct simd_size : __simd_size_impl<_Tp, _Abi>
+{
+};
+template <typename _Tp, typename _Abi = simd_abi::__default_abi<_Tp>>
+inline constexpr size_t simd_size_v = simd_size<_Tp, _Abi>::value;
+
+// simd_abi::deduce {{{2
+template <typename _Tp, std::size_t _Np, typename = void> struct __deduce_impl;
+namespace simd_abi {
+/**
+ * \tparam _Tp The requested `value_type` for the elements.
+ * \tparam _Np The requested number of elements.
+ * \tparam _Abis This parameter is ignored, since this implementation cannot
+ * make any use of it. Either a good native ABI is matched and used as `type`
+ * alias, or the `fixed_size<_Np>` ABI is used, which internally is built from
+ * the best matching native ABIs.
+ */
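+// A usage sketch: simd_abi::deduce_t<float, 4> (defined just below) is either
+// a native ABI holding exactly four floats (presumably _VecBuiltin<16> where
+// SSE is available) or simd_abi::fixed_size<4> as the fallback.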
+template <typename _Tp, std::size_t _Np, typename...>
+struct deduce : std::experimental::__deduce_impl<_Tp, _Np>
+{
+};
+
+template <typename _Tp, size_t _Np, typename... _Abis>
+using deduce_t = typename deduce<_Tp, _Np, _Abis...>::type;
+} // namespace simd_abi
+
+// }}}2
+// rebind_simd {{{2
+template <typename _Tp, typename _V, typename = void> struct rebind_simd;
+template <typename _Tp, typename _Up, typename _Abi>
+struct rebind_simd<
+ _Tp, simd<_Up, _Abi>,
+ void_t<simd_abi::deduce_t<_Tp, simd_size_v<_Up, _Abi>, _Abi>>>
+{
+ using type = simd<_Tp, simd_abi::deduce_t<_Tp, simd_size_v<_Up, _Abi>, _Abi>>;
+};
+template <typename _Tp, typename _Up, typename _Abi>
+struct rebind_simd<
+ _Tp, simd_mask<_Up, _Abi>,
+ void_t<simd_abi::deduce_t<_Tp, simd_size_v<_Up, _Abi>, _Abi>>>
+{
+ using type
+ = simd_mask<_Tp, simd_abi::deduce_t<_Tp, simd_size_v<_Up, _Abi>, _Abi>>;
+};
+template <typename _Tp, typename _V>
+using rebind_simd_t = typename rebind_simd<_Tp, _V>::type;
+
+// resize_simd {{{2
+template <int _Np, typename _V, typename = void> struct resize_simd;
+template <int _Np, typename _Tp, typename _Abi>
+struct resize_simd<_Np, simd<_Tp, _Abi>,
+ void_t<simd_abi::deduce_t<_Tp, _Np, _Abi>>>
+{
+ using type = simd<_Tp, simd_abi::deduce_t<_Tp, _Np, _Abi>>;
+};
+template <int _Np, typename _Tp, typename _Abi>
+struct resize_simd<_Np, simd_mask<_Tp, _Abi>,
+ void_t<simd_abi::deduce_t<_Tp, _Np, _Abi>>>
+{
+ using type = simd_mask<_Tp, simd_abi::deduce_t<_Tp, _Np, _Abi>>;
+};
+template <int _Np, typename _V>
+using resize_simd_t = typename resize_simd<_Np, _V>::type;
+
+// }}}2
+// memory_alignment {{{2
+template <typename _Tp, typename _Up = typename _Tp::value_type>
+struct memory_alignment
+ : public _SizeConstant<__next_power_of_2(sizeof(_Up) * _Tp::size())>
+{
+};
+template <typename _Tp, typename _Up = typename _Tp::value_type>
+inline constexpr size_t memory_alignment_v = memory_alignment<_Tp, _Up>::value;
+
+// class template simd [simd] {{{1
+template <typename _Tp, typename _Abi = simd_abi::__default_abi<_Tp>>
+class simd;
+template <typename _Tp, typename _Abi>
+struct is_simd<simd<_Tp, _Abi>> : public true_type
+{
+};
+template <typename _Tp> using native_simd = simd<_Tp, simd_abi::native<_Tp>>;
+template <typename _Tp, int _Np>
+using fixed_size_simd = simd<_Tp, simd_abi::fixed_size<_Np>>;
+template <typename _Tp, size_t _Np>
+using __deduced_simd = simd<_Tp, simd_abi::deduce_t<_Tp, _Np>>;
+
+// class template simd_mask [simd_mask] {{{1
+template <typename _Tp, typename _Abi = simd_abi::__default_abi<_Tp>>
+class simd_mask;
+template <typename _Tp, typename _Abi>
+struct is_simd_mask<simd_mask<_Tp, _Abi>> : public true_type
+{
+};
+template <typename _Tp>
+using native_simd_mask = simd_mask<_Tp, simd_abi::native<_Tp>>;
+template <typename _Tp, int _Np>
+using fixed_size_simd_mask = simd_mask<_Tp, simd_abi::fixed_size<_Np>>;
+template <typename _Tp, size_t _Np>
+using __deduced_simd_mask = simd_mask<_Tp, simd_abi::deduce_t<_Tp, _Np>>;
+
+// casts [simd.casts] {{{1
+// static_simd_cast {{{2
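+// static_simd_cast<_Tp>(__x) converts __x elementwise; the return type is
+// computed by __static_simd_cast_return_type below. E.g.
+// static_simd_cast<double>(native_simd<int>()) yields
+// fixed_size_simd<double, native_simd<int>::size()>, while
+// static_simd_cast<int>(native_simd<unsigned>()) keeps the source ABI because
+// only the signedness differs.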
+template <typename _Tp, typename _Up, typename _Ap, bool = is_simd_v<_Tp>,
+ typename = void>
+struct __static_simd_cast_return_type;
+
+template <typename _Tp, typename _A0, typename _Up, typename _Ap>
+struct __static_simd_cast_return_type<simd_mask<_Tp, _A0>, _Up, _Ap, false,
+ void>
+ : __static_simd_cast_return_type<simd<_Tp, _A0>, _Up, _Ap>
+{
+};
+
+template <typename _Tp, typename _Up, typename _Ap>
+struct __static_simd_cast_return_type<
+ _Tp, _Up, _Ap, true, enable_if_t<_Tp::size() == simd_size_v<_Up, _Ap>>>
+{
+ using type = _Tp;
+};
+
+template <typename _Tp, typename _Ap>
+struct __static_simd_cast_return_type<_Tp, _Tp, _Ap, false,
+#ifdef _GLIBCXX_SIMD_FIX_P2TS_ISSUE66
+ enable_if_t<__is_vectorizable_v<_Tp>>
+#else
+ void
+#endif
+ >
+{
+ using type = simd<_Tp, _Ap>;
+};
+
+template <typename _Tp, typename = void> struct __safe_make_signed
+{
+ using type = _Tp;
+};
+template <typename _Tp>
+struct __safe_make_signed<_Tp, enable_if_t<std::is_integral_v<_Tp>>>
+{
+ // the extra make_unsigned_t is because of PR85951
+ using type = std::make_signed_t<std::make_unsigned_t<_Tp>>;
+};
+template <typename _Tp>
+using safe_make_signed_t = typename __safe_make_signed<_Tp>::type;
+
+template <typename _Tp, typename _Up, typename _Ap>
+struct __static_simd_cast_return_type<_Tp, _Up, _Ap, false,
+#ifdef _GLIBCXX_SIMD_FIX_P2TS_ISSUE66
+ enable_if_t<__is_vectorizable_v<_Tp>>
+#else
+ void
+#endif
+ >
+{
+ using type = std::conditional_t<
+ (std::is_integral_v<_Up> && std::is_integral_v<_Tp> &&
+#ifndef _GLIBCXX_SIMD_FIX_P2TS_ISSUE65
+ std::is_signed_v<_Up> != std::is_signed_v<_Tp> &&
+#endif
+ std::is_same_v<safe_make_signed_t<_Up>, safe_make_signed_t<_Tp>>),
+ simd<_Tp, _Ap>, fixed_size_simd<_Tp, simd_size_v<_Up, _Ap>>>;
+};
+
+template <typename _Tp, typename _Up, typename _Ap,
+ typename _R
+ = typename __static_simd_cast_return_type<_Tp, _Up, _Ap>::type>
+_GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR _R
+static_simd_cast(const simd<_Up, _Ap>& __x)
+{
+ if constexpr (std::is_same<_R, simd<_Up, _Ap>>::value)
+ {
+ return __x;
+ }
+ else
+ {
+ _SimdConverter<_Up, _Ap, typename _R::value_type, typename _R::abi_type>
+ __c;
+ return _R(__private_init, __c(__data(__x)));
+ }
+}
+
+namespace __proposed {
+template <typename _Tp, typename _Up, typename _Ap,
+ typename _R
+ = typename __static_simd_cast_return_type<_Tp, _Up, _Ap>::type>
+_GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR typename _R::mask_type
+static_simd_cast(const simd_mask<_Up, _Ap>& __x)
+{
+ using _RM = typename _R::mask_type;
+ return {__private_init, _RM::abi_type::_MaskImpl::template __convert<
+ typename _RM::simd_type::value_type>(__x)};
+}
+} // namespace __proposed
+
+// simd_cast {{{2
+template <typename _Tp, typename _Up, typename _Ap,
+ typename _To = __value_type_or_identity_t<_Tp>>
+_GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR auto
+simd_cast(const simd<_ValuePreserving<_Up, _To>, _Ap>& __x)
+ -> decltype(static_simd_cast<_Tp>(__x))
+{
+ return static_simd_cast<_Tp>(__x);
+}
+
+namespace __proposed {
+template <typename _Tp, typename _Up, typename _Ap,
+ typename _To = __value_type_or_identity_t<_Tp>>
+_GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR auto
+simd_cast(const simd_mask<_ValuePreserving<_Up, _To>, _Ap>& __x)
+ -> decltype(static_simd_cast<_Tp>(__x))
+{
+ return static_simd_cast<_Tp>(__x);
+}
+} // namespace __proposed
+
+// }}}2
+// resizing_simd_cast {{{
+namespace __proposed {
+/* Proposed spec:
+
+template <class T, class U, class Abi>
+T resizing_simd_cast(const simd<U, Abi>& x)
+
+p1 Constraints:
+ - is_simd_v<T> is true and
+ - T::value_type is the same type as U
+
+p2 Returns:
+ A simd object with the i^th element initialized to x[i] for all i in the
+ range of [0, min(T::size(), simd_size_v<U, Abi>)). If T::size() is larger
+ than simd_size_v<U, Abi>, the remaining elements are value-initialized.
+
+template <class T, class U, class Abi>
+T resizing_simd_cast(const simd_mask<U, Abi>& x)
+
+p1 Constraints: is_simd_mask_v<T> is true
+
+p2 Returns:
+ A simd_mask object with the i^th element initialized to x[i] for all i in
+the range of [0, min(T::size(), simd_size_v<U, Abi>)). If T::size() is larger
+ than simd_size_v<U, Abi>, the remaining elements are initialized to false.
+
+ */
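+// For example, per the proposed wording above, resizing a
+// fixed_size_simd<int, 4> to a fixed_size_simd<int, 8> copies the four
+// existing elements and value-initializes the remaining four.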
+
+template <typename _Tp, typename _Up, typename _Ap>
+_GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR enable_if_t<
+ conjunction_v<is_simd<_Tp>, is_same<typename _Tp::value_type, _Up>>, _Tp>
+resizing_simd_cast(const simd<_Up, _Ap>& __x)
+{
+ if constexpr (is_same_v<typename _Tp::abi_type, _Ap>)
+ return __x;
+ else if constexpr (simd_size_v<_Up, _Ap> == 1)
+ {
+ _Tp __r{};
+ __r[0] = __x[0];
+ return __r;
+ }
+ else if constexpr (_Tp::size() == 1)
+ return __x[0];
+ else if constexpr (sizeof(_Tp) == sizeof(__x) && !__is_fixed_size_abi_v<_Ap>)
+ return {__private_init,
+ __vector_bitcast<typename _Tp::value_type, _Tp::size()>(
+ _Ap::__masked(__data(__x))._M_data)};
+ else
+ {
+ _Tp __r{};
+ __builtin_memcpy(&__data(__r), &__data(__x),
+ sizeof(_Up)
+ * std::min(_Tp::size(), simd_size_v<_Up, _Ap>));
+ return __r;
+ }
+}
+
+template <typename _Tp, typename _Up, typename _Ap>
+_GLIBCXX_SIMD_INTRINSIC
+ _GLIBCXX_SIMD_CONSTEXPR enable_if_t<is_simd_mask_v<_Tp>, _Tp>
+ resizing_simd_cast(const simd_mask<_Up, _Ap>& __x)
+{
+ return {__private_init, _Tp::abi_type::_MaskImpl::template __convert<
+ typename _Tp::simd_type::value_type>(__x)};
+}
+} // namespace __proposed
+
+// }}}
+// to_fixed_size {{{2
+template <typename _Tp, int _Np>
+_GLIBCXX_SIMD_INTRINSIC fixed_size_simd<_Tp, _Np>
+to_fixed_size(const fixed_size_simd<_Tp, _Np>& __x)
+{
+ return __x;
+}
+
+template <typename _Tp, int _Np>
+_GLIBCXX_SIMD_INTRINSIC fixed_size_simd_mask<_Tp, _Np>
+to_fixed_size(const fixed_size_simd_mask<_Tp, _Np>& __x)
+{
+ return __x;
+}
+
+template <typename _Tp, typename _Ap>
+_GLIBCXX_SIMD_INTRINSIC auto
+to_fixed_size(const simd<_Tp, _Ap>& __x)
+{
+ return simd<_Tp, simd_abi::fixed_size<simd_size_v<_Tp, _Ap>>>([&__x](
+ auto __i) constexpr { return __x[__i]; });
+}
+
+template <typename _Tp, typename _Ap>
+_GLIBCXX_SIMD_INTRINSIC auto
+to_fixed_size(const simd_mask<_Tp, _Ap>& __x)
+{
+ constexpr int _Np = simd_mask<_Tp, _Ap>::size();
+ fixed_size_simd_mask<_Tp, _Np> __r;
+ __execute_n_times<_Np>([&](auto __i) constexpr { __r[__i] = __x[__i]; });
+ return __r;
+}
+
+// to_native {{{2
+template <typename _Tp, int _Np>
+_GLIBCXX_SIMD_INTRINSIC
+ enable_if_t<(_Np == native_simd<_Tp>::size()), native_simd<_Tp>>
+ to_native(const fixed_size_simd<_Tp, _Np>& __x)
+{
+ alignas(memory_alignment_v<native_simd<_Tp>>) _Tp __mem[_Np];
+ __x.copy_to(__mem, vector_aligned);
+ return {__mem, vector_aligned};
+}
+
+template <typename _Tp, size_t _Np>
+_GLIBCXX_SIMD_INTRINSIC
+ enable_if_t<(_Np == native_simd_mask<_Tp>::size()), native_simd_mask<_Tp>>
+ to_native(const fixed_size_simd_mask<_Tp, _Np>& __x)
+{
+ return native_simd_mask<_Tp>([&](auto __i) constexpr { return __x[__i]; });
+}
+
+// to_compatible {{{2
+template <typename _Tp, size_t _Np>
+_GLIBCXX_SIMD_INTRINSIC enable_if_t<(_Np == simd<_Tp>::size()), simd<_Tp>>
+to_compatible(const simd<_Tp, simd_abi::fixed_size<_Np>>& __x)
+{
+ alignas(memory_alignment_v<simd<_Tp>>) _Tp __mem[_Np];
+ __x.copy_to(__mem, vector_aligned);
+ return {__mem, vector_aligned};
+}
+
+template <typename _Tp, size_t _Np>
+_GLIBCXX_SIMD_INTRINSIC
+ enable_if_t<(_Np == simd_mask<_Tp>::size()), simd_mask<_Tp>>
+ to_compatible(const simd_mask<_Tp, simd_abi::fixed_size<_Np>>& __x)
+{
+ return simd_mask<_Tp>([&](auto __i) constexpr { return __x[__i]; });
+}
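+
+// Illustration (not part of the interface): these functions convert between
+// the fixed_size ABI and the native/compatible ABIs without changing any
+// element values, e.g.
+//   native_simd<float> x = ...;
+//   auto y = to_fixed_size(x); // fixed_size_simd<float, native_simd<float>::size()>
+//   auto z = to_native(y);     // native_simd<float> again, z[i] == x[i]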
+
+// masked assignment [simd_mask.where] {{{1
+
+// where_expression {{{1
+template <typename _M, typename _Tp> class const_where_expression //{{{2
+{
+ using _V = _Tp;
+ static_assert(std::is_same_v<_V, __remove_cvref_t<_Tp>>);
+  struct _Wrapper
+ {
+ using value_type = _V;
+ };
+
+protected:
+ using _Impl = typename _V::_Impl;
+
+  using value_type = typename std::conditional_t<std::is_arithmetic<_V>::value,
+                                                 _Wrapper, _V>::value_type;
+ _GLIBCXX_SIMD_INTRINSIC friend const _M&
+ __get_mask(const const_where_expression& __x)
+ {
+ return __x._M_k;
+ }
+ _GLIBCXX_SIMD_INTRINSIC friend const _Tp&
+ __get_lvalue(const const_where_expression& __x)
+ {
+ return __x._M_value;
+ }
+ const _M& _M_k;
+ _Tp& _M_value;
+
+public:
+ const_where_expression(const const_where_expression&) = delete;
+ const_where_expression& operator=(const const_where_expression&) = delete;
+
+  _GLIBCXX_SIMD_INTRINSIC const_where_expression(const _M& __kk, const _Tp& __dd)
+    : _M_k(__kk), _M_value(const_cast<_Tp&>(__dd))
+ {}
+
+ _GLIBCXX_SIMD_INTRINSIC _V operator-() const&&
+ {
+ return {__private_init,
+ _Impl::template __masked_unary<std::negate>(__data(_M_k),
+ __data(_M_value))};
+ }
+
+ template <typename _Up, typename _Flags>
+ [[nodiscard]] _GLIBCXX_SIMD_INTRINSIC _V
+ copy_from(const _LoadStorePtr<_Up, value_type>* __mem, _Flags __f) const&&
+ {
+ return {__private_init,
+ _Impl::__masked_load(__data(_M_value), __data(_M_k), __mem, __f)};
+ }
+
+ template <typename _Up, typename _Flags>
+ _GLIBCXX_SIMD_INTRINSIC void copy_to(_LoadStorePtr<_Up, value_type>* __mem,
+ _Flags __f) const&&
+ {
+ _Impl::__masked_store(__data(_M_value), __mem, __f, __data(_M_k));
+ }
+};
+
+template <typename _Tp> class const_where_expression<bool, _Tp> //{{{2
+{
+ using _M = bool;
+ using _V = _Tp;
+ static_assert(std::is_same_v<_V, __remove_cvref_t<_Tp>>);
+  struct _Wrapper
+ {
+ using value_type = _V;
+ };
+
+protected:
+  using value_type = typename std::conditional_t<std::is_arithmetic<_V>::value,
+                                                 _Wrapper, _V>::value_type;
+ _GLIBCXX_SIMD_INTRINSIC friend const _M&
+ __get_mask(const const_where_expression& __x)
+ {
+ return __x._M_k;
+ }
+ _GLIBCXX_SIMD_INTRINSIC friend const _Tp&
+ __get_lvalue(const const_where_expression& __x)
+ {
+ return __x._M_value;
+ }
+ const bool _M_k;
+ _Tp& _M_value;
+
+public:
+ const_where_expression(const const_where_expression&) = delete;
+ const_where_expression& operator=(const const_where_expression&) = delete;
+
+  _GLIBCXX_SIMD_INTRINSIC const_where_expression(const bool __kk, const _Tp& __dd)
+    : _M_k(__kk), _M_value(const_cast<_Tp&>(__dd))
+ {}
+
+ _GLIBCXX_SIMD_INTRINSIC _V operator-() const&&
+ {
+ return _M_k ? -_M_value : _M_value;
+ }
+
+ template <typename _Up, typename _Flags>
+ [[nodiscard]] _GLIBCXX_SIMD_INTRINSIC _V
+ copy_from(const _LoadStorePtr<_Up, value_type>* __mem, _Flags) const&&
+ {
+ return _M_k ? static_cast<_V>(__mem[0]) : _M_value;
+ }
+
+ template <typename _Up, typename _Flags>
+ _GLIBCXX_SIMD_INTRINSIC void copy_to(_LoadStorePtr<_Up, value_type>* __mem,
+ _Flags) const&&
+ {
+ if (_M_k)
+ {
+ __mem[0] = _M_value;
+ }
+ }
+};
+
+// where_expression {{{2
+template <typename _M, typename _Tp>
+class where_expression : public const_where_expression<_M, _Tp>
+{
+ using _Impl = typename const_where_expression<_M, _Tp>::_Impl;
+
+  static_assert(!std::is_const<_Tp>::value,
+                "where_expression may only be instantiated with a non-const "
+                "_Tp parameter");
+ using typename const_where_expression<_M, _Tp>::value_type;
+ using const_where_expression<_M, _Tp>::_M_k;
+ using const_where_expression<_M, _Tp>::_M_value;
+ static_assert(
+ std::is_same<typename _M::abi_type, typename _Tp::abi_type>::value, "");
+ static_assert(_M::size() == _Tp::size(), "");
+
+ _GLIBCXX_SIMD_INTRINSIC friend _Tp& __get_lvalue(where_expression& __x)
+ {
+ return __x._M_value;
+ }
+
+public:
+ where_expression(const where_expression&) = delete;
+ where_expression& operator=(const where_expression&) = delete;
+
+  _GLIBCXX_SIMD_INTRINSIC where_expression(const _M& __kk, _Tp& __dd)
+    : const_where_expression<_M, _Tp>(__kk, __dd)
+ {}
+
+ template <typename _Up> _GLIBCXX_SIMD_INTRINSIC void operator=(_Up&& __x) &&
+ {
+ _Impl::__masked_assign(__data(_M_k), __data(_M_value),
+ __to_value_type_or_member_type<_Tp>(
+ static_cast<_Up&&>(__x)));
+ }
+
+#define _GLIBCXX_SIMD_OP_(__op, __name) \
+ template <typename _Up> \
+ _GLIBCXX_SIMD_INTRINSIC void operator __op##=(_Up&& __x)&& \
+ { \
+ _Impl::template __masked_cassign( \
+ __data(_M_k), __data(_M_value), \
+ __to_value_type_or_member_type<_Tp>(static_cast<_Up&&>(__x)), \
+ [](auto __impl, auto __lhs, auto __rhs) constexpr { \
+ return __impl.__name(__lhs, __rhs); \
+ }); \
+ } \
+ static_assert(true)
+ _GLIBCXX_SIMD_OP_(+, __plus);
+ _GLIBCXX_SIMD_OP_(-, __minus);
+ _GLIBCXX_SIMD_OP_(*, __multiplies);
+ _GLIBCXX_SIMD_OP_(/, __divides);
+ _GLIBCXX_SIMD_OP_(%, __modulus);
+ _GLIBCXX_SIMD_OP_(&, __bit_and);
+ _GLIBCXX_SIMD_OP_(|, __bit_or);
+ _GLIBCXX_SIMD_OP_(^, __bit_xor);
+ _GLIBCXX_SIMD_OP_(<<, __shift_left);
+ _GLIBCXX_SIMD_OP_(>>, __shift_right);
+#undef _GLIBCXX_SIMD_OP_
+
+ _GLIBCXX_SIMD_INTRINSIC void operator++() &&
+ {
+ __data(_M_value)
+ = _Impl::template __masked_unary<__increment>(__data(_M_k),
+ __data(_M_value));
+ }
+ _GLIBCXX_SIMD_INTRINSIC void operator++(int) &&
+ {
+ __data(_M_value)
+ = _Impl::template __masked_unary<__increment>(__data(_M_k),
+ __data(_M_value));
+ }
+ _GLIBCXX_SIMD_INTRINSIC void operator--() &&
+ {
+ __data(_M_value)
+ = _Impl::template __masked_unary<__decrement>(__data(_M_k),
+ __data(_M_value));
+ }
+ _GLIBCXX_SIMD_INTRINSIC void operator--(int) &&
+ {
+ __data(_M_value)
+ = _Impl::template __masked_unary<__decrement>(__data(_M_k),
+ __data(_M_value));
+ }
+
+ // intentionally hides const_where_expression::copy_from
+ template <typename _Up, typename _Flags>
+ _GLIBCXX_SIMD_INTRINSIC void
+ copy_from(const _LoadStorePtr<_Up, value_type>* __mem, _Flags __f) &&
+ {
+ __data(_M_value)
+ = _Impl::__masked_load(__data(_M_value), __data(_M_k), __mem, __f);
+ }
+};
+
+// where_expression<bool> {{{2
+template <typename _Tp>
+class where_expression<bool, _Tp> : public const_where_expression<bool, _Tp>
+{
+ using _M = bool;
+ using typename const_where_expression<_M, _Tp>::value_type;
+ using const_where_expression<_M, _Tp>::_M_k;
+ using const_where_expression<_M, _Tp>::_M_value;
+
+public:
+ where_expression(const where_expression&) = delete;
+ where_expression& operator=(const where_expression&) = delete;
+
+  _GLIBCXX_SIMD_INTRINSIC where_expression(const _M& __kk, _Tp& __dd)
+    : const_where_expression<_M, _Tp>(__kk, __dd)
+ {}
+
+#define _GLIBCXX_SIMD_OP_(__op) \
+ template <typename _Up> \
+ _GLIBCXX_SIMD_INTRINSIC void operator __op(_Up&& __x)&& \
+ { \
+ if (_M_k) \
+ { \
+ _M_value __op static_cast<_Up&&>(__x); \
+ } \
+ } \
+ static_assert(true)
+ _GLIBCXX_SIMD_OP_(=);
+ _GLIBCXX_SIMD_OP_(+=);
+ _GLIBCXX_SIMD_OP_(-=);
+ _GLIBCXX_SIMD_OP_(*=);
+ _GLIBCXX_SIMD_OP_(/=);
+ _GLIBCXX_SIMD_OP_(%=);
+ _GLIBCXX_SIMD_OP_(&=);
+ _GLIBCXX_SIMD_OP_(|=);
+ _GLIBCXX_SIMD_OP_(^=);
+ _GLIBCXX_SIMD_OP_(<<=);
+ _GLIBCXX_SIMD_OP_(>>=);
+#undef _GLIBCXX_SIMD_OP_
+ _GLIBCXX_SIMD_INTRINSIC void operator++() &&
+ {
+ if (_M_k)
+ {
+ ++_M_value;
+ }
+ }
+ _GLIBCXX_SIMD_INTRINSIC void operator++(int) &&
+ {
+ if (_M_k)
+ {
+ ++_M_value;
+ }
+ }
+ _GLIBCXX_SIMD_INTRINSIC void operator--() &&
+ {
+ if (_M_k)
+ {
+ --_M_value;
+ }
+ }
+ _GLIBCXX_SIMD_INTRINSIC void operator--(int) &&
+ {
+ if (_M_k)
+ {
+ --_M_value;
+ }
+ }
+
+ // intentionally hides const_where_expression::copy_from
+ template <typename _Up, typename _Flags>
+ _GLIBCXX_SIMD_INTRINSIC void
+ copy_from(const _LoadStorePtr<_Up, value_type>* __mem, _Flags) &&
+ {
+ if (_M_k)
+ {
+ _M_value = __mem[0];
+ }
+ }
+};
+
+// where_expression<_M, tuple<...>> {{{2
+
+// where {{{1
+template <typename _Tp, typename _Ap>
+_GLIBCXX_SIMD_INTRINSIC where_expression<simd_mask<_Tp, _Ap>, simd<_Tp, _Ap>>
+where(const typename simd<_Tp, _Ap>::mask_type& __k, simd<_Tp, _Ap>& __value)
+{
+ return {__k, __value};
+}
+template <typename _Tp, typename _Ap>
+_GLIBCXX_SIMD_INTRINSIC
+ const_where_expression<simd_mask<_Tp, _Ap>, simd<_Tp, _Ap>>
+ where(const typename simd<_Tp, _Ap>::mask_type& __k,
+ const simd<_Tp, _Ap>& __value)
+{
+ return {__k, __value};
+}
+template <typename _Tp, typename _Ap>
+_GLIBCXX_SIMD_INTRINSIC
+ where_expression<simd_mask<_Tp, _Ap>, simd_mask<_Tp, _Ap>>
+ where(const std::remove_const_t<simd_mask<_Tp, _Ap>>& __k,
+ simd_mask<_Tp, _Ap>& __value)
+{
+ return {__k, __value};
+}
+template <typename _Tp, typename _Ap>
+_GLIBCXX_SIMD_INTRINSIC
+ const_where_expression<simd_mask<_Tp, _Ap>, simd_mask<_Tp, _Ap>>
+ where(const std::remove_const_t<simd_mask<_Tp, _Ap>>& __k,
+ const simd_mask<_Tp, _Ap>& __value)
+{
+ return {__k, __value};
+}
+template <typename _Tp>
+_GLIBCXX_SIMD_INTRINSIC where_expression<bool, _Tp>
+where(_ExactBool __k, _Tp& __value)
+{
+ return {__k, __value};
+}
+template <typename _Tp>
+_GLIBCXX_SIMD_INTRINSIC const_where_expression<bool, _Tp>
+where(_ExactBool __k, const _Tp& __value)
+{
+ return {__k, __value};
+}
+template <typename _Tp, typename _Ap>
+void
+where(bool __k, simd<_Tp, _Ap>& __value)
+ = delete;
+template <typename _Tp, typename _Ap>
+void
+where(bool __k, const simd<_Tp, _Ap>& __value)
+ = delete;
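+
+// Usage sketch (illustrative): where() returns a proxy through which only the
+// elements selected by the mask are modified, e.g.
+//   simd<float> v = ...;
+//   where(v < 0, v) = 0;    // masked assignment: negative elements become 0
+//   where(v > 1, v) *= 2;   // masked compound assignment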
+
+// proposed mask iterations {{{1
+namespace __proposed {
+template <size_t _Np> class where_range
+{
+ const std::bitset<_Np> __bits;
+
+public:
+ where_range(std::bitset<_Np> __b) : __bits(__b) {}
+
+ class iterator
+ {
+ size_t __mask;
+ size_t __bit;
+
+ _GLIBCXX_SIMD_INTRINSIC void __next_bit()
+ {
+ __bit = __builtin_ctzl(__mask);
+ }
+ _GLIBCXX_SIMD_INTRINSIC void __reset_lsb()
+ {
+      // clear the least significant set bit, e.g.
+      // 01100100 & (01100100 - 1) == 01100100 & 01100011 == 01100000
+ __mask &= (__mask - 1);
+ // __asm__("btr %1,%0" : "+r"(__mask) : "r"(__bit));
+ }
+
+ public:
+ iterator(decltype(__mask) __m) : __mask(__m) { __next_bit(); }
+ iterator(const iterator&) = default;
+ iterator(iterator&&) = default;
+
+ _GLIBCXX_SIMD_ALWAYS_INLINE size_t operator->() const { return __bit; }
+ _GLIBCXX_SIMD_ALWAYS_INLINE size_t operator*() const { return __bit; }
+
+ _GLIBCXX_SIMD_ALWAYS_INLINE iterator& operator++()
+ {
+ __reset_lsb();
+ __next_bit();
+ return *this;
+ }
+ _GLIBCXX_SIMD_ALWAYS_INLINE iterator operator++(int)
+ {
+ iterator __tmp = *this;
+ __reset_lsb();
+ __next_bit();
+ return __tmp;
+ }
+
+ _GLIBCXX_SIMD_ALWAYS_INLINE bool operator==(const iterator& __rhs) const
+ {
+ return __mask == __rhs.__mask;
+ }
+ _GLIBCXX_SIMD_ALWAYS_INLINE bool operator!=(const iterator& __rhs) const
+ {
+ return __mask != __rhs.__mask;
+ }
+ };
+
+ iterator begin() const { return __bits.to_ullong(); }
+ iterator end() const { return 0; }
+};
+
+template <typename _Tp, typename _Ap>
+where_range<simd_size_v<_Tp, _Ap>>
+where(const simd_mask<_Tp, _Ap>& __k)
+{
+ return __k.__to_bitset();
+}
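+
+// Iteration sketch (illustrative): the proposed where(mask) overload allows
+// visiting the indices of all set mask elements, e.g.
+//   for (std::size_t i : __proposed::where(v > 0))
+//     do_something_with(v[i]);   // do_something_with is a hypothetical function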
+
+} // namespace __proposed
+
+// }}}1
+// reductions [simd.reductions] {{{1
+template <typename _Tp, typename _Abi, typename _BinaryOperation = std::plus<>>
+_GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR _Tp
+reduce(const simd<_Tp, _Abi>& __v,
+ _BinaryOperation __binary_op = _BinaryOperation())
+{
+ return _Abi::_SimdImpl::__reduce(__v, __binary_op);
+}
+
+template <typename _M, typename _V, typename _BinaryOperation = std::plus<>>
+_GLIBCXX_SIMD_INTRINSIC typename _V::value_type
+reduce(const const_where_expression<_M, _V>& __x,
+ typename _V::value_type __identity_element, _BinaryOperation __binary_op)
+{
+ if (__builtin_expect(none_of(__get_mask(__x)), false))
+ return __identity_element;
+
+ _V __tmp = __identity_element;
+ _V::_Impl::__masked_assign(__data(__get_mask(__x)), __data(__tmp),
+ __data(__get_lvalue(__x)));
+ return reduce(__tmp, __binary_op);
+}
+
+template <typename _M, typename _V>
+_GLIBCXX_SIMD_INTRINSIC typename _V::value_type
+reduce(const const_where_expression<_M, _V>& __x, std::plus<> __binary_op = {})
+{
+ return reduce(__x, 0, __binary_op);
+}
+
+template <typename _M, typename _V>
+_GLIBCXX_SIMD_INTRINSIC typename _V::value_type
+reduce(const const_where_expression<_M, _V>& __x, std::multiplies<> __binary_op)
+{
+ return reduce(__x, 1, __binary_op);
+}
+
+template <typename _M, typename _V>
+_GLIBCXX_SIMD_INTRINSIC typename _V::value_type
+reduce(const const_where_expression<_M, _V>& __x, std::bit_and<> __binary_op)
+{
+ return reduce(__x, ~typename _V::value_type(), __binary_op);
+}
+
+template <typename _M, typename _V>
+_GLIBCXX_SIMD_INTRINSIC typename _V::value_type
+reduce(const const_where_expression<_M, _V>& __x, std::bit_or<> __binary_op)
+{
+ return reduce(__x, 0, __binary_op);
+}
+
+template <typename _M, typename _V>
+_GLIBCXX_SIMD_INTRINSIC typename _V::value_type
+reduce(const const_where_expression<_M, _V>& __x, std::bit_xor<> __binary_op)
+{
+ return reduce(__x, 0, __binary_op);
+}
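+
+// Illustration: masked reductions substitute the identity element of the
+// operation for all inactive elements, e.g.
+//   simd<int> v = ...;
+//   int sum  = reduce(where(v > 0, v));                      // identity 0
+//   int prod = reduce(where(v > 0, v), std::multiplies<>()); // identity 1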
+
+// }}}1
+// algorithms [simd.alg] {{{
+template <typename _Tp, typename _Ap>
+_GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR simd<_Tp, _Ap>
+min(const simd<_Tp, _Ap>& __a, const simd<_Tp, _Ap>& __b)
+{
+ return {__private_init, _Ap::_SimdImpl::__min(__data(__a), __data(__b))};
+}
+template <typename _Tp, typename _Ap>
+_GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR simd<_Tp, _Ap>
+max(const simd<_Tp, _Ap>& __a, const simd<_Tp, _Ap>& __b)
+{
+ return {__private_init, _Ap::_SimdImpl::__max(__data(__a), __data(__b))};
+}
+template <typename _Tp, typename _Ap>
+_GLIBCXX_SIMD_INTRINSIC
+ _GLIBCXX_SIMD_CONSTEXPR std::pair<simd<_Tp, _Ap>, simd<_Tp, _Ap>>
+ minmax(const simd<_Tp, _Ap>& __a, const simd<_Tp, _Ap>& __b)
+{
+  const auto __pair_of_members
+    = _Ap::_SimdImpl::__minmax(__data(__a), __data(__b));
+  return {simd<_Tp, _Ap>(__private_init, __pair_of_members.first),
+          simd<_Tp, _Ap>(__private_init, __pair_of_members.second)};
+}
+template <typename _Tp, typename _Ap>
+_GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR simd<_Tp, _Ap>
+clamp(const simd<_Tp, _Ap>& __v, const simd<_Tp, _Ap>& __lo,
+ const simd<_Tp, _Ap>& __hi)
+{
+ using _Impl = typename _Ap::_SimdImpl;
+ return {__private_init,
+ _Impl::__min(__data(__hi), _Impl::__max(__data(__lo), __data(__v)))};
+}
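+
+// Illustration: min, max, minmax, and clamp work element-wise, e.g.
+//   clamp(v, simd<float>(0.f), simd<float>(1.f)); // per-element clamp to [0, 1]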
+
+// }}}
+
+namespace _P0918 {
+// shuffle {{{1
+template <int _Stride, int _Offset = 0> struct strided
+{
+ static constexpr int _S_stride = _Stride;
+ static constexpr int _S_offset = _Offset;
+ template <typename _Tp, typename _Ap>
+ using __shuffle_return_type = simd<
+ _Tp, simd_abi::deduce_t<
+ _Tp, __div_roundup(simd_size_v<_Tp, _Ap> - _Offset, _Stride), _Ap>>;
+ // alternative, always use fixed_size:
+ // fixed_size_simd<_Tp, __div_roundup(simd_size_v<_Tp, _Ap> - _Offset,
+ // _Stride)>;
+ template <typename _Tp> static constexpr auto __src_index(_Tp __dst_index)
+ {
+ return _Offset + __dst_index * _Stride;
+ }
+};
+
+// SFINAE on the return type ensures that _P is a type providing the member
+// alias template __shuffle_return_type and the static member function
+// __src_index.
+template <typename _P, typename _Tp, typename _Ap,
+ typename _R = typename _P::template __shuffle_return_type<_Tp, _Ap>,
+ typename
+ = decltype(_P::__src_index(std::experimental::_SizeConstant<0>()))>
+_GLIBCXX_SIMD_INTRINSIC _R
+shuffle(const simd<_Tp, _Ap>& __x)
+{
+ return _R([&__x](auto __i) constexpr { return __x[_P::__src_index(__i)]; });
+}
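+
+// Illustration: shuffle<strided<_Stride, _Offset>> selects every _Stride-th
+// element starting at _Offset, e.g. (in namespace __proposed)
+//   fixed_size_simd<int, 8> x([](int i) { return i; }); // 0 1 2 3 4 5 6 7
+//   auto even = shuffle<strided<2>>(x);                 // 0 2 4 6
+//   auto odd  = shuffle<strided<2, 1>>(x);              // 1 3 5 7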
+
+// }}}1
+} // namespace _P0918
+
+namespace __proposed {
+using namespace _P0918;
+} // namespace __proposed
+
+template <size_t... _Sizes, typename _Tp, typename _Ap,
+ typename = enable_if_t<((_Sizes + ...) == simd<_Tp, _Ap>::size())>>
+inline std::tuple<simd<_Tp, simd_abi::deduce_t<_Tp, _Sizes>>...>
+split(const simd<_Tp, _Ap>&);
+
+// __extract_part {{{
+template <int _Index, int _Total, int _Combine = 1, typename _Tp, size_t _Np>
+_GLIBCXX_SIMD_INTRINSIC
+ _GLIBCXX_CONST _SimdWrapper<_Tp, _Np / _Total * _Combine>
+ __extract_part(const _SimdWrapper<_Tp, _Np> __x);
+
+template <int _Index, int _Parts, int _Combine = 1, typename _Tp, typename _A0,
+ typename... _As>
+_GLIBCXX_SIMD_INTRINSIC auto
+__extract_part(const _SimdTuple<_Tp, _A0, _As...>& __x);
+
+// }}}
+// _SizeList {{{
+template <size_t _V0, size_t... _Values> struct _SizeList
+{
+ template <size_t _I> static constexpr size_t __at(_SizeConstant<_I> = {})
+ {
+ if constexpr (_I == 0)
+ {
+ return _V0;
+ }
+ else
+ {
+ return _SizeList<_Values...>::template __at<_I - 1>();
+ }
+ }
+
+ template <size_t _I> static constexpr auto __before(_SizeConstant<_I> = {})
+ {
+ if constexpr (_I == 0)
+ {
+ return _SizeConstant<0>();
+ }
+ else
+ {
+ return _SizeConstant<
+ _V0 + _SizeList<_Values...>::template __before<_I - 1>()>();
+ }
+ }
+
+ template <size_t _Np>
+ static constexpr auto __pop_front(_SizeConstant<_Np> = {})
+ {
+ if constexpr (_Np == 0)
+ {
+ return _SizeList();
+ }
+ else
+ {
+ return _SizeList<_Values...>::template __pop_front<_Np - 1>();
+ }
+ }
+};
+// }}}
+// __extract_center {{{
+template <typename _Tp, size_t _Np>
+_GLIBCXX_SIMD_INTRINSIC _SimdWrapper<_Tp, _Np / 2>
+__extract_center(_SimdWrapper<_Tp, _Np> __x)
+{
+ static_assert(_Np >= 4);
+ static_assert(_Np % 4 == 0); // x0 - x1 - x2 - x3 -> return {x1, x2}
+#if _GLIBCXX_SIMD_X86INTRIN // {{{
+ if constexpr (__have_avx512f && sizeof(_Tp) * _Np == 64)
+ {
+ const auto __intrin = __to_intrin(__x);
+ if constexpr (std::is_integral_v<_Tp>)
+ return __vector_bitcast<_Tp>(_mm512_castsi512_si256(
+ _mm512_shuffle_i32x4(__intrin, __intrin,
+ 1 + 2 * 0x4 + 2 * 0x10 + 3 * 0x40)));
+ else if constexpr (sizeof(_Tp) == 4)
+ return __vector_bitcast<_Tp>(_mm512_castps512_ps256(
+ _mm512_shuffle_f32x4(__intrin, __intrin,
+ 1 + 2 * 0x4 + 2 * 0x10 + 3 * 0x40)));
+ else if constexpr (sizeof(_Tp) == 8)
+ return __vector_bitcast<_Tp>(_mm512_castpd512_pd256(
+ _mm512_shuffle_f64x2(__intrin, __intrin,
+ 1 + 2 * 0x4 + 2 * 0x10 + 3 * 0x40)));
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (sizeof(_Tp) * _Np == 32 && std::is_floating_point_v<_Tp>)
+ return __vector_bitcast<_Tp>(
+ _mm_shuffle_pd(__lo128(__vector_bitcast<double>(__x)),
+ __hi128(__vector_bitcast<double>(__x)), 1));
+ else if constexpr (sizeof(__x) == 32 && sizeof(_Tp) * _Np <= 32)
+ return __vector_bitcast<_Tp>(
+ _mm_alignr_epi8(__hi128(__vector_bitcast<_LLong>(__x)),
+ __lo128(__vector_bitcast<_LLong>(__x)),
+ sizeof(_Tp) * _Np / 4));
+ else
+#endif // _GLIBCXX_SIMD_X86INTRIN }}}
+ {
+ __vector_type_t<_Tp, _Np / 2> __r;
+ __builtin_memcpy(&__r,
+ reinterpret_cast<const char*>(&__x)
+ + sizeof(_Tp) * _Np / 4,
+ sizeof(_Tp) * _Np / 2);
+ return __r;
+ }
+}
+
+template <typename _Tp, typename _A0, typename... _As>
+_GLIBCXX_SIMD_INTRINSIC
+ _SimdWrapper<_Tp, _SimdTuple<_Tp, _A0, _As...>::size() / 2>
+ __extract_center(const _SimdTuple<_Tp, _A0, _As...>& __x)
+{
+ if constexpr (sizeof...(_As) == 0)
+ return __extract_center(__x.first);
+ else
+ return __extract_part<1, 4, 2>(__x);
+}
+
+// }}}
+// __split_wrapper {{{
+template <size_t... _Sizes, typename _Tp, typename... _As>
+auto
+__split_wrapper(_SizeList<_Sizes...>, const _SimdTuple<_Tp, _As...>& __x)
+{
+ return std::experimental::split<_Sizes...>(
+ fixed_size_simd<_Tp, _SimdTuple<_Tp, _As...>::size()>(__private_init, __x));
+}
+
+// }}}
+
+// split<simd>(simd) {{{
+template <typename _V, typename _Ap,
+ size_t Parts = simd_size_v<typename _V::value_type, _Ap> / _V::size()>
+inline enable_if_t<
+ (is_simd<_V>::value
+ && simd_size_v<typename _V::value_type, _Ap> == Parts * _V::size()),
+ std::array<_V, Parts>>
+split(const simd<typename _V::value_type, _Ap>& __x)
+{
+ using _Tp = typename _V::value_type;
+ if constexpr (Parts == 1)
+ {
+ return {simd_cast<_V>(__x)};
+ }
+ else if (__x._M_is_constprop())
+ {
+ return __generate_from_n_evaluations<Parts, std::array<_V, Parts>>([&](
+ auto __i) constexpr {
+ return _V([&](auto __j) constexpr {
+ return __x[__i * _V::size() + __j];
+ });
+ });
+ }
+ else if constexpr (
+ __is_fixed_size_abi_v<_Ap>
+ && (std::is_same_v<typename _V::abi_type, simd_abi::scalar>
+ || (__is_fixed_size_abi_v<typename _V::abi_type>
+ && sizeof(_V) == sizeof(_Tp) * _V::size() // _V doesn't have padding
+ )))
+ {
+ // fixed_size -> fixed_size (w/o padding) or scalar
+#ifdef _GLIBCXX_SIMD_USE_ALIASING_LOADS
+ const __may_alias<_Tp>* const __element_ptr
+ = reinterpret_cast<const __may_alias<_Tp>*>(&__data(__x));
+ return __generate_from_n_evaluations<Parts, std::array<_V, Parts>>([&](
+ auto __i) constexpr {
+ return _V(__element_ptr + __i * _V::size(), vector_aligned);
+ });
+#else
+ const auto& __xx = __data(__x);
+ return __generate_from_n_evaluations<Parts, std::array<_V, Parts>>([&](
+ auto __i) constexpr {
+ [[maybe_unused]] constexpr size_t __offset
+ = decltype(__i)::value * _V::size();
+ return _V([&](auto __j) constexpr {
+ constexpr _SizeConstant<__j + __offset> __k;
+ return __xx[__k];
+ });
+ });
+#endif
+ }
+ else if constexpr (std::is_same_v<typename _V::abi_type, simd_abi::scalar>)
+ {
+ // normally memcpy should work here as well
+ return __generate_from_n_evaluations<Parts, std::array<_V, Parts>>([&](
+ auto __i) constexpr { return __x[__i]; });
+ }
+ else
+ {
+ return __generate_from_n_evaluations<Parts, std::array<_V, Parts>>([&](
+ auto __i) constexpr {
+ if constexpr (__is_fixed_size_abi_v<typename _V::abi_type>)
+ {
+ return _V([&](auto __j) constexpr {
+ return __x[__i * _V::size() + __j];
+ });
+ }
+ else
+ {
+ return _V(__private_init,
+ __extract_part<decltype(__i)::value, Parts>(__data(__x)));
+ }
+ });
+ }
+}
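+
+// Illustration: splitting a simd into equally sized parts, e.g.
+//   fixed_size_simd<int, 8> x = ...;
+//   std::array<fixed_size_simd<int, 4>, 2> halves
+//     = split<fixed_size_simd<int, 4>>(x);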
+
+// }}}
+// split<simd_mask>(simd_mask) {{{
+template <typename _V, typename _Ap,
+ size_t _Parts
+ = simd_size_v<typename _V::simd_type::value_type, _Ap> / _V::size()>
+enable_if_t<
+ (is_simd_mask_v<
+ _V> && simd_size_v<typename _V::simd_type::value_type, _Ap> == _Parts * _V::size()),
+ std::array<_V, _Parts>>
+split(const simd_mask<typename _V::simd_type::value_type, _Ap>& __x)
+{
+ if constexpr (std::is_same_v<_Ap, typename _V::abi_type>)
+ {
+ return {__x};
+ }
+ else if constexpr (_Parts == 1)
+ {
+ return {__proposed::static_simd_cast<_V>(__x)};
+ }
+ else if constexpr (_Parts == 2 && __is_sse_abi<typename _V::abi_type>()
+ && __is_avx_abi<_Ap>())
+ {
+ return {_V(__private_init, __lo128(__data(__x))),
+ _V(__private_init, __hi128(__data(__x)))};
+ }
+ else if constexpr (_V::size() <= CHAR_BIT * sizeof(_ULLong))
+ {
+ const std::bitset __bits = __x.__to_bitset();
+ return __generate_from_n_evaluations<_Parts, std::array<_V, _Parts>>([&](
+ auto __i) constexpr {
+ constexpr size_t __offset = __i * _V::size();
+ return _V(__bitset_init, (__bits >> __offset).to_ullong());
+ });
+ }
+ else
+ {
+ return __generate_from_n_evaluations<_Parts, std::array<_V, _Parts>>([&](
+ auto __i) constexpr {
+ constexpr size_t __offset = __i * _V::size();
+ return _V(
+ __private_init, [&](auto __j) constexpr {
+ return __x[__j + __offset];
+ });
+ });
+ }
+}
+
+// }}}
+// split<_Sizes...>(simd) {{{
+template <size_t... _Sizes, typename _Tp, typename _Ap, typename>
+_GLIBCXX_SIMD_ALWAYS_INLINE
+ std::tuple<simd<_Tp, simd_abi::deduce_t<_Tp, _Sizes>>...>
+ split(const simd<_Tp, _Ap>& __x)
+{
+ using _SL = _SizeList<_Sizes...>;
+ using _Tuple = std::tuple<__deduced_simd<_Tp, _Sizes>...>;
+ constexpr size_t _Np = simd_size_v<_Tp, _Ap>;
+ constexpr size_t _N0 = _SL::template __at<0>();
+ using _V = __deduced_simd<_Tp, _N0>;
+
+ if (__x._M_is_constprop())
+ return __generate_from_n_evaluations<sizeof...(_Sizes), _Tuple>([&](
+ auto __i) constexpr {
+ using _Vi = __deduced_simd<_Tp, _SL::__at(__i)>;
+ constexpr size_t __offset = _SL::__before(__i);
+ return _Vi([&](auto __j) constexpr { return __x[__offset + __j]; });
+ });
+ else if constexpr (_Np == _N0)
+ {
+ static_assert(sizeof...(_Sizes) == 1);
+ return {simd_cast<_V>(__x)};
+ }
+  else if constexpr // split from fixed_size, such that __data(__x).first.size() == _N0
+ (__is_fixed_size_abi_v<
+ _Ap> && __fixed_size_storage_t<_Tp, _Np>::_S_first_size == _N0)
+ {
+      static_assert(!__is_fixed_size_abi_v<typename _V::abi_type>,
+                    "How can <_Tp, _Np> be a single _SimdTuple entry but a "
+                    "fixed_size_simd when deduced?");
+ // extract first and recurse (__split_wrapper is needed to deduce a new
+ // _Sizes pack)
+ return std::tuple_cat(
+ std::make_tuple(_V(__private_init, __data(__x).first)),
+ __split_wrapper(_SL::template __pop_front<1>(), __data(__x).second));
+ }
+ else if constexpr ((!std::is_same_v<simd_abi::scalar,
+ simd_abi::deduce_t<_Tp, _Sizes>> && ...)
+ && (!__is_fixed_size_abi_v<
+ simd_abi::deduce_t<_Tp, _Sizes>> && ...))
+ {
+ if constexpr (((_Sizes * 2 == _Np) && ...))
+ return {{__private_init, __extract_part<0, 2>(__data(__x))},
+ {__private_init, __extract_part<1, 2>(__data(__x))}};
+ else if constexpr (std::is_same_v<_SizeList<_Sizes...>,
+ _SizeList<_Np / 3, _Np / 3, _Np / 3>>)
+ return {{__private_init, __extract_part<0, 3>(__data(__x))},
+ {__private_init, __extract_part<1, 3>(__data(__x))},
+ {__private_init, __extract_part<2, 3>(__data(__x))}};
+ else if constexpr (std::is_same_v<_SizeList<_Sizes...>,
+ _SizeList<2 * _Np / 3, _Np / 3>>)
+ return {{__private_init, __extract_part<0, 3, 2>(__data(__x))},
+ {__private_init, __extract_part<2, 3>(__data(__x))}};
+ else if constexpr (std::is_same_v<_SizeList<_Sizes...>,
+ _SizeList<_Np / 3, 2 * _Np / 3>>)
+ return {{__private_init, __extract_part<0, 3>(__data(__x))},
+ {__private_init, __extract_part<1, 3, 2>(__data(__x))}};
+ else if constexpr (std::is_same_v<_SizeList<_Sizes...>,
+ _SizeList<_Np / 2, _Np / 4, _Np / 4>>)
+ return {{__private_init, __extract_part<0, 2>(__data(__x))},
+ {__private_init, __extract_part<2, 4>(__data(__x))},
+ {__private_init, __extract_part<3, 4>(__data(__x))}};
+ else if constexpr (std::is_same_v<_SizeList<_Sizes...>,
+ _SizeList<_Np / 4, _Np / 4, _Np / 2>>)
+ return {{__private_init, __extract_part<0, 4>(__data(__x))},
+ {__private_init, __extract_part<1, 4>(__data(__x))},
+ {__private_init, __extract_part<1, 2>(__data(__x))}};
+ else if constexpr (std::is_same_v<_SizeList<_Sizes...>,
+ _SizeList<_Np / 4, _Np / 2, _Np / 4>>)
+ return {{__private_init, __extract_part<0, 4>(__data(__x))},
+ {__private_init, __extract_center(__data(__x))},
+ {__private_init, __extract_part<3, 4>(__data(__x))}};
+ else if constexpr (((_Sizes * 4 == _Np) && ...))
+ return {{__private_init, __extract_part<0, 4>(__data(__x))},
+ {__private_init, __extract_part<1, 4>(__data(__x))},
+ {__private_init, __extract_part<2, 4>(__data(__x))},
+ {__private_init, __extract_part<3, 4>(__data(__x))}};
+ // else fall through
+ }
+#ifdef _GLIBCXX_SIMD_USE_ALIASING_LOADS
+ const __may_alias<_Tp>* const __element_ptr
+ = reinterpret_cast<const __may_alias<_Tp>*>(&__x);
+ return __generate_from_n_evaluations<sizeof...(_Sizes), _Tuple>([&](
+ auto __i) constexpr {
+ using _Vi = __deduced_simd<_Tp, _SL::__at(__i)>;
+ constexpr size_t __offset = _SL::__before(__i);
+ constexpr size_t __base_align = alignof(simd<_Tp, _Ap>);
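+    // __alignment below is the alignment that can be guaranteed for
+    // __element_ptr + __offset, i.e. the largest power of two dividing
+    // __offset * sizeof(_Tp), capped at the alignment of the whole simd object.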
+ constexpr size_t __a
+ = __base_align - ((__offset * sizeof(_Tp)) % __base_align);
+ constexpr size_t __b = ((__a - 1) & __a) ^ __a;
+ constexpr size_t __alignment = __b == 0 ? __a : __b;
+ return _Vi(__element_ptr + __offset, overaligned<__alignment>);
+ });
+#else
+ return __generate_from_n_evaluations<sizeof...(_Sizes), _Tuple>([&](
+ auto __i) constexpr {
+ using _Vi = __deduced_simd<_Tp, _SL::__at(__i)>;
+ const auto& __xx = __data(__x);
+ using _Offset = decltype(_SL::__before(__i));
+ return _Vi([&](auto __j) constexpr {
+ constexpr _SizeConstant<_Offset::value + __j> __k;
+ return __xx[__k];
+ });
+ });
+#endif
+}
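+
+// Illustration: splitting by an explicit pack of sizes, e.g.
+//   fixed_size_simd<int, 8> x = ...;
+//   auto [lo, hi] = split<3, 5>(x); // parts with 3 and 5 elements (deduced ABIs)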
+
+// }}}
+
+// __subscript_in_pack {{{
+template <size_t _I, typename _Tp, typename _Ap, typename... _As>
+_GLIBCXX_SIMD_INTRINSIC constexpr _Tp
+__subscript_in_pack(const simd<_Tp, _Ap>& __x, const simd<_Tp, _As>&... __xs)
+{
+ if constexpr (_I < simd_size_v<_Tp, _Ap>)
+ return __x[_I];
+ else
+ return __subscript_in_pack<_I - simd_size_v<_Tp, _Ap>>(__xs...);
+}
+
+// }}}
+// __store_pack_of_simd {{{
+template <typename _Tp, typename _A0, typename... _As>
+_GLIBCXX_SIMD_INTRINSIC void
+__store_pack_of_simd(char* __mem, const simd<_Tp, _A0>& __x0,
+ const simd<_Tp, _As>&... __xs)
+{
+ constexpr size_t __n_bytes = sizeof(_Tp) * simd_size_v<_Tp, _A0>;
+ __builtin_memcpy(__mem, &__data(__x0), __n_bytes);
+ if constexpr (sizeof...(__xs) > 0)
+ __store_pack_of_simd(__mem + __n_bytes, __xs...);
+}
+
+// }}}
+// concat(simd...) {{{
+template <typename _Tp, typename... _As>
+inline _GLIBCXX_SIMD_CONSTEXPR
+simd<_Tp, simd_abi::deduce_t<_Tp, (simd_size_v<_Tp, _As> + ...)>>
+concat(const simd<_Tp, _As>&... __xs)
+{
+ using _Rp = __deduced_simd<_Tp, (simd_size_v<_Tp, _As> + ...)>;
+  if constexpr (sizeof...(__xs) == 1)
+ return simd_cast<_Rp>(__xs...);
+ else if ((... && __xs._M_is_constprop()))
+    return _Rp([&](auto __i) constexpr {
+      return __subscript_in_pack<__i>(__xs...);
+    });
+ else
+ {
+ _Rp __r{};
+ __store_pack_of_simd(reinterpret_cast<char*>(&__data(__r)), __xs...);
+ return __r;
+ }
+}
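+
+// Illustration: concat is the inverse of split, e.g.
+//   auto xy = concat(x, y); // xy.size() == x.size() + y.size()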
+
+// }}}
+// concat(array<simd>) {{{
+template <typename _Tp, typename _Abi, size_t _Np>
+_GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR
+__deduced_simd<_Tp, simd_size_v<_Tp, _Abi> * _Np>
+concat(const std::array<simd<_Tp, _Abi>, _Np>& __x)
+{
+ return __call_with_subscripts<_Np>(__x, [](const auto&... __xs) {
+ return concat(__xs...);
+ });
+}
+
+// }}}
+
+// _SmartReference {{{
+template <typename _Up, typename _Accessor = _Up,
+ typename _ValueType = typename _Up::value_type>
+class _SmartReference
+{
+ friend _Accessor;
+ int _M_index;
+ _Up& _M_obj;
+
+ _GLIBCXX_SIMD_INTRINSIC constexpr _ValueType __read() const noexcept
+ {
+ if constexpr (std::is_arithmetic_v<_Up>)
+ {
+ _GLIBCXX_DEBUG_ASSERT(_M_index == 0);
+ return _M_obj;
+ }
+ else
+ {
+ return _M_obj[_M_index];
+ }
+ }
+
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC constexpr void __write(_Tp&& __x) const
+ {
+ _Accessor::__set(_M_obj, _M_index, static_cast<_Tp&&>(__x));
+ }
+
+public:
+ _GLIBCXX_SIMD_INTRINSIC constexpr _SmartReference(_Up& __o, int __i) noexcept
+ : _M_index(__i), _M_obj(__o)
+ {}
+
+ using value_type = _ValueType;
+
+ _GLIBCXX_SIMD_INTRINSIC _SmartReference(const _SmartReference&) = delete;
+
+ _GLIBCXX_SIMD_INTRINSIC constexpr operator value_type() const noexcept
+ {
+ return __read();
+ }
+
+ template <typename _Tp,
+ typename = _ValuePreservingOrInt<__remove_cvref_t<_Tp>, value_type>>
+ _GLIBCXX_SIMD_INTRINSIC constexpr _SmartReference operator=(_Tp&& __x) &&
+ {
+ __write(static_cast<_Tp&&>(__x));
+ return {_M_obj, _M_index};
+ }
+
+ // TODO: improve with operator.()
+
+#define _GLIBCXX_SIMD_OP_(__op) \
+ template <typename _Tp, \
+ typename _TT \
+ = decltype(std::declval<value_type>() __op std::declval<_Tp>()), \
+ typename = _ValuePreservingOrInt<__remove_cvref_t<_Tp>, _TT>, \
+ typename = _ValuePreservingOrInt<_TT, value_type>> \
+ _GLIBCXX_SIMD_INTRINSIC constexpr _SmartReference operator __op##=( \
+ _Tp&& __x)&& \
+ { \
+ const value_type& __lhs = __read(); \
+ __write(__lhs __op __x); \
+ return {_M_obj, _M_index}; \
+ }
+ _GLIBCXX_SIMD_ALL_ARITHMETICS(_GLIBCXX_SIMD_OP_);
+ _GLIBCXX_SIMD_ALL_SHIFTS(_GLIBCXX_SIMD_OP_);
+ _GLIBCXX_SIMD_ALL_BINARY(_GLIBCXX_SIMD_OP_);
+#undef _GLIBCXX_SIMD_OP_
+
+ template <typename _Tp = void,
+ typename = decltype(
+ ++std::declval<std::conditional_t<true, value_type, _Tp>&>())>
+ _GLIBCXX_SIMD_INTRINSIC constexpr _SmartReference operator++() &&
+ {
+ value_type __x = __read();
+ __write(++__x);
+ return {_M_obj, _M_index};
+ }
+
+ template <typename _Tp = void,
+ typename = decltype(
+ std::declval<std::conditional_t<true, value_type, _Tp>&>()++)>
+ _GLIBCXX_SIMD_INTRINSIC constexpr value_type operator++(int) &&
+ {
+ const value_type __r = __read();
+ value_type __x = __r;
+ __write(++__x);
+ return __r;
+ }
+
+ template <typename _Tp = void,
+ typename = decltype(
+ --std::declval<std::conditional_t<true, value_type, _Tp>&>())>
+ _GLIBCXX_SIMD_INTRINSIC constexpr _SmartReference operator--() &&
+ {
+ value_type __x = __read();
+ __write(--__x);
+ return {_M_obj, _M_index};
+ }
+
+ template <typename _Tp = void,
+ typename = decltype(
+ std::declval<std::conditional_t<true, value_type, _Tp>&>()--)>
+ _GLIBCXX_SIMD_INTRINSIC constexpr value_type operator--(int) &&
+ {
+ const value_type __r = __read();
+ value_type __x = __r;
+ __write(--__x);
+ return __r;
+ }
+
+ _GLIBCXX_SIMD_INTRINSIC friend void
+ swap(_SmartReference&& __a, _SmartReference&& __b) noexcept(
+ conjunction<
+ std::is_nothrow_constructible<value_type, _SmartReference&&>,
+ std::is_nothrow_assignable<_SmartReference&&, value_type&&>>::value)
+ {
+ value_type __tmp = static_cast<_SmartReference&&>(__a);
+ static_cast<_SmartReference&&>(__a) = static_cast<value_type>(__b);
+ static_cast<_SmartReference&&>(__b) = std::move(__tmp);
+ }
+
+ _GLIBCXX_SIMD_INTRINSIC friend void
+ swap(value_type& __a, _SmartReference&& __b) noexcept(
+ conjunction<
+ std::is_nothrow_constructible<value_type, value_type&&>,
+ std::is_nothrow_assignable<value_type&, value_type&&>,
+ std::is_nothrow_assignable<_SmartReference&&, value_type&&>>::value)
+ {
+ value_type __tmp(std::move(__a));
+ __a = static_cast<value_type>(__b);
+ static_cast<_SmartReference&&>(__b) = std::move(__tmp);
+ }
+
+ _GLIBCXX_SIMD_INTRINSIC friend void
+ swap(_SmartReference&& __a, value_type& __b) noexcept(
+ conjunction<
+ std::is_nothrow_constructible<value_type, _SmartReference&&>,
+ std::is_nothrow_assignable<value_type&, value_type&&>,
+ std::is_nothrow_assignable<_SmartReference&&, value_type&&>>::value)
+ {
+ value_type __tmp(__a);
+ static_cast<_SmartReference&&>(__a) = std::move(__b);
+ __b = std::move(__tmp);
+ }
+};
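+
+// Illustration: _SmartReference is the write-through proxy returned by
+// simd::operator[] and simd_mask::operator[], e.g.
+//   simd<float> v = 1.f;
+//   v[0] = 2.f;      // writes through the proxy via _Accessor::__set
+//   float f = v[0];  // reads via the conversion to value_type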
+
+// }}}
+// __scalar_abi_wrapper {{{
+template <int _Bytes> struct __scalar_abi_wrapper
+{
+ template <typename _Tp, typename _Abi = simd_abi::scalar>
+ static constexpr bool _S_is_valid_v
+ = _Abi::template _IsValid<_Tp>::value && sizeof(_Tp) == _Bytes;
+};
+
+// }}}
+// __decay_abi metafunction {{{
+template <typename _Tp> struct __decay_abi
+{
+ using type = _Tp;
+};
+template <int _Bytes> struct __decay_abi<__scalar_abi_wrapper<_Bytes>>
+{
+ using type = simd_abi::scalar;
+};
+
+// }}}
+// __full_abi metafunction {{{1
+// Given an ABI tag A where A::_S_is_partial == true, define type to be such
+// that _S_is_partial == false and A::_S_full_size<T> == type::size<T> for all
+// valid T
+template <template <int> class _Abi, int _Bytes, typename _Tp> struct __full_abi
+{
+ static constexpr auto __choose()
+ {
+ using _High = _Abi<__next_power_of_2(_Bytes) / 2>;
+ if constexpr (_High::template _S_is_valid_v<
+ _Tp> || _Bytes <= sizeof(_Tp) * 2)
+ return _High();
+ else
+ return
+ typename __full_abi<_Abi, __next_power_of_2(_Bytes) / 2, _Tp>::type();
+ }
+ using type = decltype(__choose());
+};
+
+template <int _Bytes, typename _Tp>
+struct __full_abi<__scalar_abi_wrapper, _Bytes, _Tp>
+{
+ using type = simd_abi::scalar;
+};
+
+// _AbiList {{{1
+template <template <int> class...> struct _AbiList
+{
+ template <typename, int> static constexpr bool _S_has_valid_abi = false;
+ template <typename, int> using _FirstValidAbi = void;
+ template <typename, int> using _BestAbi = void;
+};
+
+template <template <int> class _A0, template <int> class... _Rest>
+struct _AbiList<_A0, _Rest...>
+{
+ template <typename _Tp, int _Np>
+ static constexpr bool _S_has_valid_abi
+ = _A0<sizeof(_Tp) * _Np>::template _S_is_valid_v<
+ _Tp> || _AbiList<_Rest...>::template _S_has_valid_abi<_Tp, _Np>;
+
+ template <typename _Tp, int _Np>
+ using _FirstValidAbi = std::conditional_t<
+ _A0<sizeof(_Tp) * _Np>::template _S_is_valid_v<_Tp>,
+ typename __decay_abi<_A0<sizeof(_Tp) * _Np>>::type,
+ typename _AbiList<_Rest...>::template _FirstValidAbi<_Tp, _Np>>;
+
+ template <typename _Tp, int _Np> static constexpr auto __determine_best_abi()
+ {
+ constexpr int _Bytes = sizeof(_Tp) * _Np;
+ if constexpr (_A0<_Bytes>::template _S_is_valid_v<_Tp>)
+ return typename __decay_abi<_A0<_Bytes>>::type{};
+ else
+ {
+ using _B = typename __full_abi<_A0, _Bytes, _Tp>::type;
+ if constexpr (_B::template _S_is_valid_v<
+ _Tp> && _B::template size<_Tp> <= _Np)
+ return _B{};
+ else
+ return typename _AbiList<_Rest...>::template _BestAbi<_Tp, _Np>{};
+ }
+ }
+
+ template <typename _Tp, int _Np>
+ using _BestAbi = decltype(__determine_best_abi<_Tp, _Np>());
+};
+
+// }}}1
+
+// _AllNativeAbis lists all native ABIs, making them accessible to
+// simd_abi::deduce and select_best_vector_type_t (for fixed_size). Order
+// matters: whatever comes first has higher priority.
+using _AllNativeAbis = _AbiList<simd_abi::_VecBltnBtmsk, simd_abi::_VecBuiltin,
+ __scalar_abi_wrapper>;
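+
+// Illustrative example: on x86-64 with only SSE enabled,
+// simd_abi::deduce_t<float, 4> resolves through _AllNativeAbis to
+// simd_abi::_VecBuiltin<16>; element counts no native ABI can handle fall
+// back to simd_abi::fixed_size (see __deduce_fixed_size_fallback below).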
+
+// valid _SimdTraits specialization {{{1
+template <typename _Tp, typename _Abi>
+struct _SimdTraits<_Tp, _Abi,
+ std::void_t<typename _Abi::template _IsValid<_Tp>>>
+ : _Abi::template __traits<_Tp>
+{
+};
+
+// __deduce_impl specializations {{{1
+// try all native ABIs (including scalar) first
+template <typename _Tp, std::size_t _Np>
+struct __deduce_impl<
+ _Tp, _Np, enable_if_t<_AllNativeAbis::template _S_has_valid_abi<_Tp, _Np>>>
+{
+ using type = _AllNativeAbis::_FirstValidAbi<_Tp, _Np>;
+};
+
+// fall back to fixed_size only if scalar and native ABIs don't match
+template <typename _Tp, std::size_t _Np, typename = void>
+struct __deduce_fixed_size_fallback
+{
+};
+template <typename _Tp, std::size_t _Np>
+struct __deduce_fixed_size_fallback<
+ _Tp, _Np, enable_if_t<simd_abi::fixed_size<_Np>::template _S_is_valid_v<_Tp>>>
+{
+ using type = simd_abi::fixed_size<_Np>;
+};
+template <typename _Tp, std::size_t _Np, typename>
+struct __deduce_impl : public __deduce_fixed_size_fallback<_Tp, _Np>
+{
+};
+
+//}}}1
+
+// simd_mask {{{
+template <typename _Tp, typename _Abi>
+class simd_mask : public _SimdTraits<_Tp, _Abi>::_MaskBase
+{
+ // types, tags, and friends {{{
+ using _Traits = _SimdTraits<_Tp, _Abi>;
+ using _MemberType = typename _Traits::_MaskMember;
+ static constexpr _Tp* _S_type_tag = nullptr;
+ friend typename _Traits::_MaskBase;
+ friend class simd<_Tp, _Abi>; // to construct masks on return
+ friend typename _Traits::_SimdImpl; // to construct masks on return and
+ // inspect data on masked operations
+public:
+ using _Impl = typename _Traits::_MaskImpl;
+ friend _Impl;
+ // }}}
+ // member types {{{
+ using value_type = bool;
+ using reference = _SmartReference<_MemberType, _Impl, value_type>;
+ using simd_type = simd<_Tp, _Abi>;
+ using abi_type = _Abi;
+
+ // }}}
+ static constexpr size_t size() { return __size_or_zero_v<_Tp, _Abi>; }
+ // constructors & assignment {{{
+ simd_mask() = default;
+ simd_mask(const simd_mask&) = default;
+ simd_mask(simd_mask&&) = default;
+ simd_mask& operator=(const simd_mask&) = default;
+ simd_mask& operator=(simd_mask&&) = default;
+
+ // }}}
+
+ // access to internal representation (suggested extension) {{{
+ _GLIBCXX_SIMD_ALWAYS_INLINE explicit simd_mask(
+ typename _Traits::_MaskCastType __init)
+ : _M_data{__init}
+ {}
+  // conversion to the internal type is done in _MaskBase
+
+ // }}}
+ // bitset interface (extension to be proposed) {{{
+ // TS_FEEDBACK:
+ // Conversion of simd_mask to and from bitset makes it much easier to
+ // interface with other facilities. I suggest adding `static
+ // simd_mask::from_bitset` and `simd_mask::to_bitset`.
+ _GLIBCXX_SIMD_ALWAYS_INLINE static simd_mask
+  __from_bitset(std::bitset<size()> __bs)
+  {
+    return {__bitset_init, __bs};
+ }
+ _GLIBCXX_SIMD_ALWAYS_INLINE std::bitset<size()> __to_bitset() const
+ {
+ return _Impl::__to_bits(_M_data)._M_to_bitset();
+ }
+
+ // }}}
+ // explicit broadcast constructor {{{
+ _GLIBCXX_SIMD_ALWAYS_INLINE explicit _GLIBCXX_SIMD_CONSTEXPR
+ simd_mask(value_type __x)
+ : _M_data(_Impl::template __broadcast<_Tp>(__x))
+ {}
+
+ // }}}
+ // implicit type conversion constructor {{{
+#ifdef _GLIBCXX_SIMD_ENABLE_IMPLICIT_MASK_CAST
+ // proposed improvement
+ template <typename _Up, typename _A2,
+ typename = enable_if_t<simd_size_v<_Up, _A2> == size()>>
+ _GLIBCXX_SIMD_ALWAYS_INLINE explicit(
+ sizeof(_MemberType) != sizeof(typename _SimdTraits<_Up, _A2>::_MaskMember))
+ simd_mask(const simd_mask<_Up, _A2>& __x)
+ : simd_mask(__proposed::static_simd_cast<simd_mask>(__x))
+ {}
+#else
+ // conforming to ISO/IEC 19570:2018
+ template <typename _Up, typename = enable_if_t<conjunction<
+ is_same<abi_type, simd_abi::fixed_size<size()>>,
+ is_same<_Up, _Up>>::value>>
+ _GLIBCXX_SIMD_ALWAYS_INLINE
+ simd_mask(const simd_mask<_Up, simd_abi::fixed_size<size()>>& __x)
+ : _M_data(_Impl::__from_bitmask(__data(__x), _S_type_tag))
+ {}
+#endif
+ // }}}
+ // load constructor {{{
+ template <typename _Flags>
+ _GLIBCXX_SIMD_ALWAYS_INLINE simd_mask(const value_type* __mem, _Flags)
+ : _M_data(_Impl::template __load<_Tp, _Flags>(__mem))
+ {}
+ template <typename _Flags>
+ _GLIBCXX_SIMD_ALWAYS_INLINE simd_mask(const value_type* __mem, simd_mask __k,
+ _Flags __f)
+ : _M_data{}
+ {
+ _M_data = _Impl::__masked_load(_M_data, __k._M_data, __mem, __f);
+ }
+
+ // }}}
+ // loads [simd_mask.load] {{{
+ template <typename _Flags>
+ _GLIBCXX_SIMD_ALWAYS_INLINE void copy_from(const value_type* __mem, _Flags)
+ {
+ _M_data = _Impl::template __load<_Tp, _Flags>(__mem);
+ }
+
+ // }}}
+ // stores [simd_mask.store] {{{
+ template <typename _Flags>
+ _GLIBCXX_SIMD_ALWAYS_INLINE void copy_to(value_type* __mem, _Flags __f) const
+ {
+ _Impl::__store(_M_data, __mem, __f);
+ }
+
+ // }}}
+ // scalar access {{{
+ _GLIBCXX_SIMD_ALWAYS_INLINE reference operator[](size_t __i)
+ {
+ return {_M_data, int(__i)};
+ }
+ _GLIBCXX_SIMD_ALWAYS_INLINE value_type operator[]([
+ [maybe_unused]] size_t __i) const
+ {
+ if constexpr (__is_scalar_abi<_Abi>())
+ {
+ _GLIBCXX_DEBUG_ASSERT(__i == 0);
+ return _M_data;
+ }
+ else
+ return static_cast<bool>(_M_data[__i]);
+ }
+
+ // }}}
+ // negation {{{
+ _GLIBCXX_SIMD_ALWAYS_INLINE simd_mask operator!() const
+ {
+ return {__private_init, _Impl::__bit_not(_M_data)};
+ }
+
+ // }}}
+ // simd_mask binary operators [simd_mask.binary] {{{
+#ifdef _GLIBCXX_SIMD_ENABLE_IMPLICIT_MASK_CAST
+ // simd_mask<int> && simd_mask<uint> needs disambiguation
+ template <typename _Up, typename _A2,
+ typename
+ = enable_if_t<is_convertible_v<simd_mask<_Up, _A2>, simd_mask>>>
+ _GLIBCXX_SIMD_ALWAYS_INLINE friend simd_mask
+ operator&&(const simd_mask& __x, const simd_mask<_Up, _A2>& __y)
+ {
+ return {__private_init,
+ _Impl::__logical_and(__x._M_data, simd_mask(__y)._M_data)};
+ }
+ template <typename _Up, typename _A2,
+ typename
+ = enable_if_t<is_convertible_v<simd_mask<_Up, _A2>, simd_mask>>>
+ _GLIBCXX_SIMD_ALWAYS_INLINE friend simd_mask
+ operator||(const simd_mask& __x, const simd_mask<_Up, _A2>& __y)
+ {
+ return {__private_init,
+ _Impl::__logical_or(__x._M_data, simd_mask(__y)._M_data)};
+ }
+#endif // _GLIBCXX_SIMD_ENABLE_IMPLICIT_MASK_CAST
+ _GLIBCXX_SIMD_ALWAYS_INLINE friend simd_mask operator&&(const simd_mask& __x,
+ const simd_mask& __y)
+ {
+ return {__private_init, _Impl::__logical_and(__x._M_data, __y._M_data)};
+ }
+ _GLIBCXX_SIMD_ALWAYS_INLINE friend simd_mask operator||(const simd_mask& __x,
+ const simd_mask& __y)
+ {
+ return {__private_init, _Impl::__logical_or(__x._M_data, __y._M_data)};
+ }
+
+ _GLIBCXX_SIMD_ALWAYS_INLINE friend simd_mask operator&(const simd_mask& __x,
+ const simd_mask& __y)
+ {
+ return {__private_init, _Impl::__bit_and(__x._M_data, __y._M_data)};
+ }
+ _GLIBCXX_SIMD_ALWAYS_INLINE friend simd_mask operator|(const simd_mask& __x,
+ const simd_mask& __y)
+ {
+ return {__private_init, _Impl::__bit_or(__x._M_data, __y._M_data)};
+ }
+ _GLIBCXX_SIMD_ALWAYS_INLINE friend simd_mask operator^(const simd_mask& __x,
+ const simd_mask& __y)
+ {
+ return {__private_init, _Impl::__bit_xor(__x._M_data, __y._M_data)};
+ }
+
+ _GLIBCXX_SIMD_ALWAYS_INLINE friend simd_mask& operator&=(simd_mask& __x,
+ const simd_mask& __y)
+ {
+ __x._M_data = _Impl::__bit_and(__x._M_data, __y._M_data);
+ return __x;
+ }
+ _GLIBCXX_SIMD_ALWAYS_INLINE friend simd_mask& operator|=(simd_mask& __x,
+ const simd_mask& __y)
+ {
+ __x._M_data = _Impl::__bit_or(__x._M_data, __y._M_data);
+ return __x;
+ }
+ _GLIBCXX_SIMD_ALWAYS_INLINE friend simd_mask& operator^=(simd_mask& __x,
+ const simd_mask& __y)
+ {
+ __x._M_data = _Impl::__bit_xor(__x._M_data, __y._M_data);
+ return __x;
+ }
+
+ // }}}
+ // simd_mask compares [simd_mask.comparison] {{{
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend simd_mask
+ operator==(const simd_mask& __x, const simd_mask& __y)
+ {
+ return !operator!=(__x, __y);
+ }
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend simd_mask
+ operator!=(const simd_mask& __x, const simd_mask& __y)
+ {
+ return {__private_init, _Impl::__bit_xor(__x._M_data, __y._M_data)};
+ }
+
+ // }}}
+ // private_init ctor {{{
+ _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR
+ simd_mask(_PrivateInit, typename _Traits::_MaskMember __init)
+ : _M_data(__init)
+ {}
+
+ // }}}
+ // private_init generator ctor {{{
+ template <typename _Fp,
+ typename = decltype(bool(std::declval<_Fp>()(size_t())))>
+ _GLIBCXX_SIMD_INTRINSIC constexpr simd_mask(_PrivateInit, _Fp&& __gen)
+ : _M_data()
+ {
+ __execute_n_times<size()>(
+ [&](auto __i) constexpr { _Impl::__set(_M_data, __i, __gen(__i)); });
+ }
+
+ // }}}
+ // bitset_init ctor {{{
+ _GLIBCXX_SIMD_INTRINSIC simd_mask(_BitsetInit, std::bitset<size()> __init)
+ : _M_data(
+ _Impl::__from_bitmask(_SanitizedBitMask<size()>(__init), _S_type_tag))
+ {}
+
+ // }}}
+ // __cvt {{{
+ // TS_FEEDBACK:
+  // The conversion this proxy implements should instead be a ctor on
+  // simd_mask. Calling .__cvt() on a simd_mask yields an object that
+  // converts implicitly to any simd_mask of equal size.
+  // A useful variation: add `explicit(sizeof(_Tp) != sizeof(_Up))`
+ struct _CvtProxy
+ {
+ template <typename _Up, typename _A2,
+ typename
+ = enable_if_t<simd_size_v<_Up, _A2> == simd_size_v<_Tp, _Abi>>>
+ operator simd_mask<_Up, _A2>() &&
+ {
+ using namespace std::experimental::__proposed;
+ return static_simd_cast<simd_mask<_Up, _A2>>(_M_data);
+ }
+
+ const simd_mask<_Tp, _Abi>& _M_data;
+ };
+ _GLIBCXX_SIMD_INTRINSIC _CvtProxy __cvt() const { return {*this}; }
+ // }}}
+ // operator?: overloads (suggested extension) {{{
+#ifdef __GXX_CONDITIONAL_IS_OVERLOADABLE__
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend simd_mask
+ operator?:(const simd_mask& __k, const simd_mask& __where_true,
+ const simd_mask& __where_false)
+ {
+ auto __ret = __where_false;
+ _Impl::__masked_assign(__k._M_data, __ret._M_data, __where_true._M_data);
+ return __ret;
+ }
+
+ template <typename _U1, typename _U2,
+ typename _Rp = simd<common_type_t<_U1, _U2>, _Abi>,
+ typename = enable_if_t<conjunction_v<
+ is_convertible<_U1, _Rp>, is_convertible<_U2, _Rp>,
+ is_convertible<simd_mask, typename _Rp::mask_type>>>>
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend _Rp
+ operator?:(const simd_mask& __k, const _U1& __where_true,
+ const _U2& __where_false)
+ {
+ _Rp __ret = __where_false;
+ _Rp::_Impl::__masked_assign(__data(
+ static_cast<typename _Rp::mask_type>(__k)),
+ __data(__ret),
+ __data(static_cast<_Rp>(__where_true)));
+ return __ret;
+ }
+
+#ifdef _GLIBCXX_SIMD_ENABLE_IMPLICIT_MASK_CAST
+ template <typename _Kp, typename _Ak, typename _Up, typename _Au,
+ typename = enable_if_t<
+ conjunction_v<is_convertible<simd_mask<_Kp, _Ak>, simd_mask>,
+ is_convertible<simd_mask<_Up, _Au>, simd_mask>>>>
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend simd_mask
+ operator?:(const simd_mask<_Kp, _Ak>& __k, const simd_mask& __where_true,
+ const simd_mask<_Up, _Au>& __where_false)
+ {
+ simd_mask __ret = __where_false;
+ _Impl::__masked_assign(simd_mask(__k)._M_data, __ret._M_data,
+ __where_true._M_data);
+ return __ret;
+ }
+#endif // _GLIBCXX_SIMD_ENABLE_IMPLICIT_MASK_CAST
+#endif // __GXX_CONDITIONAL_IS_OVERLOADABLE__
+ // }}}
+ // _M_is_constprop {{{
+ _GLIBCXX_SIMD_INTRINSIC
+ constexpr bool _M_is_constprop() const
+ {
+ if constexpr (__is_scalar_abi<_Abi>())
+ return __builtin_constant_p(_M_data);
+ else
+ return _M_data._M_is_constprop();
+ }
+
+ // }}}
+
+private:
+ friend const auto& __data<_Tp, abi_type>(const simd_mask&);
+ friend auto& __data<_Tp, abi_type>(simd_mask&);
+ alignas(_Traits::_S_mask_align) _MemberType _M_data;
+};
+
+// }}}
+
+// __data(simd_mask) {{{
+template <typename _Tp, typename _Ap>
+_GLIBCXX_SIMD_INTRINSIC constexpr const auto&
+__data(const simd_mask<_Tp, _Ap>& __x)
+{
+ return __x._M_data;
+}
+template <typename _Tp, typename _Ap>
+_GLIBCXX_SIMD_INTRINSIC constexpr auto&
+__data(simd_mask<_Tp, _Ap>& __x)
+{
+ return __x._M_data;
+}
+// }}}
+
+// simd_mask reductions [simd_mask.reductions] {{{
+template <typename _Tp, typename _Abi>
+_GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR bool
+all_of(const simd_mask<_Tp, _Abi>& __k) noexcept
+{
+ if (__builtin_is_constant_evaluated() || __k._M_is_constprop())
+ {
+ for (size_t __i = 0; __i < simd_size_v<_Tp, _Abi>; ++__i)
+ if (!__k[__i])
+ return false;
+ return true;
+ }
+ else
+ return _Abi::_MaskImpl::__all_of(__k);
+}
+template <typename _Tp, typename _Abi>
+_GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR bool
+any_of(const simd_mask<_Tp, _Abi>& __k) noexcept
+{
+ if (__builtin_is_constant_evaluated() || __k._M_is_constprop())
+ {
+ for (size_t __i = 0; __i < simd_size_v<_Tp, _Abi>; ++__i)
+ if (__k[__i])
+ return true;
+ return false;
+ }
+ else
+ return _Abi::_MaskImpl::__any_of(__k);
+}
+template <typename _Tp, typename _Abi>
+_GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR bool
+none_of(const simd_mask<_Tp, _Abi>& __k) noexcept
+{
+ if (__builtin_is_constant_evaluated() || __k._M_is_constprop())
+ {
+ for (size_t __i = 0; __i < simd_size_v<_Tp, _Abi>; ++__i)
+ if (__k[__i])
+ return false;
+ return true;
+ }
+ else
+ return _Abi::_MaskImpl::__none_of(__k);
+}
+template <typename _Tp, typename _Abi>
+_GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR bool
+some_of(const simd_mask<_Tp, _Abi>& __k) noexcept
+{
+ if (__builtin_is_constant_evaluated() || __k._M_is_constprop())
+ {
+ for (size_t __i = 1; __i < simd_size_v<_Tp, _Abi>; ++__i)
+ if (__k[__i] != __k[__i - 1])
+ return true;
+ return false;
+ }
+ else
+ return _Abi::_MaskImpl::__some_of(__k);
+}
+template <typename _Tp, typename _Abi>
+_GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR int
+popcount(const simd_mask<_Tp, _Abi>& __k) noexcept
+{
+ if (__builtin_is_constant_evaluated() || __k._M_is_constprop())
+ {
+ int __r = 0;
+ for (size_t __i = 0; __i < simd_size_v<_Tp, _Abi>; ++__i)
+ if (__k[__i])
+ ++__r;
+ return __r;
+ }
+ else
+ return _Abi::_MaskImpl::__popcount(__k);
+}
+template <typename _Tp, typename _Abi>
+_GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR int
+find_first_set(const simd_mask<_Tp, _Abi>& __k)
+{
+ if (__builtin_is_constant_evaluated() || __k._M_is_constprop())
+ {
+ for (size_t __i = 0; __i < simd_size_v<_Tp, _Abi>; ++__i)
+ if (__k[__i])
+ return __i;
+ __builtin_unreachable(); // make none_of(__k) UB/ill-formed
+ }
+ else
+ return _Abi::_MaskImpl::__find_first_set(__k);
+}
+template <typename _Tp, typename _Abi>
+_GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR int
+find_last_set(const simd_mask<_Tp, _Abi>& __k)
+{
+ if (__builtin_is_constant_evaluated() || __k._M_is_constprop())
+ {
+ for (size_t __i = simd_size_v<_Tp, _Abi>; __i > 0; --__i)
+ if (__k[__i - 1])
+ return __i - 1;
+ __builtin_unreachable(); // make none_of(__k) UB/ill-formed
+ }
+ else
+ return _Abi::_MaskImpl::__find_last_set(__k);
+}
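+
+// Illustration: these reductions map a mask to scalar information, e.g.
+//   auto k = v > 0;           // simd_mask
+//   int n = popcount(k);      // number of true elements
+//   if (any_of(k))
+//     use(find_first_set(k)); // index of the first true element (use() is hypothetical)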
+
+_GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR bool
+all_of(_ExactBool __x) noexcept
+{
+ return __x;
+}
+_GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR bool
+any_of(_ExactBool __x) noexcept
+{
+ return __x;
+}
+_GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR bool
+none_of(_ExactBool __x) noexcept
+{
+ return !__x;
+}
+_GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR bool
+ some_of(_ExactBool) noexcept
+{
+ return false;
+}
+_GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR int
+popcount(_ExactBool __x) noexcept
+{
+ return __x;
+}
+_GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR int
+ find_first_set(_ExactBool)
+{
+ return 0;
+}
+_GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR int
+ find_last_set(_ExactBool)
+{
+ return 0;
+}
+
+// }}}
+
+// _SimdIntOperators{{{1
+template <typename _V, typename _Impl, bool> class _SimdIntOperators
+{
+};
+
+template <typename _V, typename _Impl> class _SimdIntOperators<_V, _Impl, true>
+{
+ _GLIBCXX_SIMD_INTRINSIC const _V& __derived() const
+ {
+ return *static_cast<const _V*>(this);
+ }
+
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static _GLIBCXX_SIMD_CONSTEXPR _V
+ __make_derived(_Tp&& __d)
+ {
+ return {__private_init, static_cast<_Tp&&>(__d)};
+ }
+
+public:
+ _GLIBCXX_SIMD_CONSTEXPR friend _V& operator%=(_V& __lhs, const _V& __x)
+ {
+ return __lhs = __lhs % __x;
+ }
+ _GLIBCXX_SIMD_CONSTEXPR friend _V& operator&=(_V& __lhs, const _V& __x)
+ {
+ return __lhs = __lhs & __x;
+ }
+ _GLIBCXX_SIMD_CONSTEXPR friend _V& operator|=(_V& __lhs, const _V& __x)
+ {
+ return __lhs = __lhs | __x;
+ }
+ _GLIBCXX_SIMD_CONSTEXPR friend _V& operator^=(_V& __lhs, const _V& __x)
+ {
+ return __lhs = __lhs ^ __x;
+ }
+ _GLIBCXX_SIMD_CONSTEXPR friend _V& operator<<=(_V& __lhs, const _V& __x)
+ {
+ return __lhs = __lhs << __x;
+ }
+ _GLIBCXX_SIMD_CONSTEXPR friend _V& operator>>=(_V& __lhs, const _V& __x)
+ {
+ return __lhs = __lhs >> __x;
+ }
+ _GLIBCXX_SIMD_CONSTEXPR friend _V& operator<<=(_V& __lhs, int __x)
+ {
+ return __lhs = __lhs << __x;
+ }
+ _GLIBCXX_SIMD_CONSTEXPR friend _V& operator>>=(_V& __lhs, int __x)
+ {
+ return __lhs = __lhs >> __x;
+ }
+
+ _GLIBCXX_SIMD_CONSTEXPR friend _V operator%(const _V& __x, const _V& __y)
+ {
+ return _SimdIntOperators::__make_derived(
+ _Impl::__modulus(__data(__x), __data(__y)));
+ }
+ _GLIBCXX_SIMD_CONSTEXPR friend _V operator&(const _V& __x, const _V& __y)
+ {
+ return _SimdIntOperators::__make_derived(
+ _Impl::__bit_and(__data(__x), __data(__y)));
+ }
+ _GLIBCXX_SIMD_CONSTEXPR friend _V operator|(const _V& __x, const _V& __y)
+ {
+ return _SimdIntOperators::__make_derived(
+ _Impl::__bit_or(__data(__x), __data(__y)));
+ }
+ _GLIBCXX_SIMD_CONSTEXPR friend _V operator^(const _V& __x, const _V& __y)
+ {
+ return _SimdIntOperators::__make_derived(
+ _Impl::__bit_xor(__data(__x), __data(__y)));
+ }
+ _GLIBCXX_SIMD_CONSTEXPR friend _V operator<<(const _V& __x, const _V& __y)
+ {
+ return _SimdIntOperators::__make_derived(
+ _Impl::__bit_shift_left(__data(__x), __data(__y)));
+ }
+ _GLIBCXX_SIMD_CONSTEXPR friend _V operator>>(const _V& __x, const _V& __y)
+ {
+ return _SimdIntOperators::__make_derived(
+ _Impl::__bit_shift_right(__data(__x), __data(__y)));
+ }
+ _GLIBCXX_SIMD_CONSTEXPR friend _V operator<<(const _V& __x, int __y)
+ {
+ return _SimdIntOperators::__make_derived(
+ _Impl::__bit_shift_left(__data(__x), __y));
+ }
+ _GLIBCXX_SIMD_CONSTEXPR friend _V operator>>(const _V& __x, int __y)
+ {
+ return _SimdIntOperators::__make_derived(
+ _Impl::__bit_shift_right(__data(__x), __y));
+ }
+
+ // unary operators (for integral _Tp)
+ _GLIBCXX_SIMD_CONSTEXPR _V operator~() const
+ {
+ return {__private_init, _Impl::__complement(__derived()._M_data)};
+ }
+};
+
+//}}}1
+
+// simd {{{
+template <typename _Tp, typename _Abi>
+class simd : public _SimdIntOperators<
+ simd<_Tp, _Abi>, typename _SimdTraits<_Tp, _Abi>::_SimdImpl,
+ conjunction<std::is_integral<_Tp>,
+ typename _SimdTraits<_Tp, _Abi>::_IsValid>::value>,
+ public _SimdTraits<_Tp, _Abi>::_SimdBase
+{
+ using _Traits = _SimdTraits<_Tp, _Abi>;
+ using _MemberType = typename _Traits::_SimdMember;
+ using _CastType = typename _Traits::_SimdCastType;
+ static constexpr _Tp* _S_type_tag = nullptr;
+ friend typename _Traits::_SimdBase;
+
+public:
+ using _Impl = typename _Traits::_SimdImpl;
+ friend _Impl;
+ friend _SimdIntOperators<simd, _Impl, true>;
+
+ using value_type = _Tp;
+ using reference = _SmartReference<_MemberType, _Impl, value_type>;
+ using mask_type = simd_mask<_Tp, _Abi>;
+ using abi_type = _Abi;
+
+ static constexpr size_t size() { return __size_or_zero_v<_Tp, _Abi>; }
+ _GLIBCXX_SIMD_CONSTEXPR simd() = default;
+ _GLIBCXX_SIMD_CONSTEXPR simd(const simd&) = default;
+ _GLIBCXX_SIMD_CONSTEXPR simd(simd&&) noexcept = default;
+ _GLIBCXX_SIMD_CONSTEXPR simd& operator=(const simd&) = default;
+ _GLIBCXX_SIMD_CONSTEXPR simd& operator=(simd&&) noexcept = default;
+
+ // implicit broadcast constructor
+ template <typename _Up, typename = _ValuePreservingOrInt<_Up, value_type>>
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR simd(_Up&& __x)
+ : _M_data(
+ _Impl::__broadcast(static_cast<value_type>(static_cast<_Up&&>(__x))))
+ {}
+
+ // implicit type conversion constructor (convert from fixed_size to
+ // fixed_size)
+ template <typename _Up>
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR
+ simd(const simd<_Up, simd_abi::fixed_size<size()>>& __x,
+ enable_if_t<
+ conjunction<std::is_same<simd_abi::fixed_size<size()>, abi_type>,
+ std::negation<__is_narrowing_conversion<_Up, value_type>>,
+ __converts_to_higher_integer_rank<_Up, value_type>>::value,
+ void*> = nullptr)
+ : simd{static_cast<std::array<_Up, size()>>(__x).data(), vector_aligned}
+ {}
+
+ // explicit type conversion constructor
+#ifdef _GLIBCXX_SIMD_ENABLE_STATIC_CAST
+ template <typename _Up, typename _A2,
+ typename = decltype(
+ static_simd_cast<simd>(std::declval<const simd<_Up, _A2>&>()))>
+ _GLIBCXX_SIMD_ALWAYS_INLINE explicit _GLIBCXX_SIMD_CONSTEXPR
+ simd(const simd<_Up, _A2>& __x)
+ : simd(static_simd_cast<simd>(__x))
+ {}
+#endif // _GLIBCXX_SIMD_ENABLE_STATIC_CAST
+
+ // generator constructor
+ template <typename _Fp>
+ _GLIBCXX_SIMD_ALWAYS_INLINE explicit _GLIBCXX_SIMD_CONSTEXPR
+ simd(_Fp&& __gen, _ValuePreservingOrInt<decltype(std::declval<_Fp>()(
+ std::declval<_SizeConstant<0>&>())),
+ value_type>* = nullptr)
+ : _M_data(_Impl::__generator(static_cast<_Fp&&>(__gen), _S_type_tag))
+ {}
+
+ // load constructor
+ template <typename _Up, typename _Flags>
+ _GLIBCXX_SIMD_ALWAYS_INLINE simd(const _Up* __mem, _Flags __f)
+ : _M_data(_Impl::__load(__mem, __f, _S_type_tag))
+ {}
+
+ // loads [simd.load]
+ template <typename _Up, typename _Flags>
+ _GLIBCXX_SIMD_ALWAYS_INLINE void copy_from(const _Vectorizable<_Up>* __mem,
+ _Flags __f)
+ {
+ _M_data
+ = static_cast<decltype(_M_data)>(_Impl::__load(__mem, __f, _S_type_tag));
+ }
+
+ // stores [simd.store]
+ template <typename _Up, typename _Flags>
+ _GLIBCXX_SIMD_ALWAYS_INLINE void copy_to(_Vectorizable<_Up>* __mem,
+ _Flags __f) const
+ {
+ _Impl::__store(_M_data, __mem, __f, _S_type_tag);
+ }
+
+ // scalar access
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR reference
+ operator[](size_t __i)
+ {
+ return {_M_data, int(__i)};
+ }
+  _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR value_type
+  operator[]([[maybe_unused]] size_t __i) const
+ {
+ if constexpr (__is_scalar_abi<_Abi>())
+ {
+ _GLIBCXX_DEBUG_ASSERT(__i == 0);
+ return _M_data;
+ }
+ else
+ {
+ return _M_data[__i];
+ }
+ }
+
+ // increment and decrement:
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR simd& operator++()
+ {
+ _Impl::__increment(_M_data);
+ return *this;
+ }
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR simd operator++(int)
+ {
+ simd __r = *this;
+ _Impl::__increment(_M_data);
+ return __r;
+ }
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR simd& operator--()
+ {
+ _Impl::__decrement(_M_data);
+ return *this;
+ }
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR simd operator--(int)
+ {
+ simd __r = *this;
+ _Impl::__decrement(_M_data);
+ return __r;
+ }
+
+ // unary operators (for any _Tp)
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR mask_type
+ operator!() const
+ {
+ return {__private_init, _Impl::__negate(_M_data)};
+ }
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR simd operator+() const
+ {
+ return *this;
+ }
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR simd operator-() const
+ {
+ return {__private_init, _Impl::__unary_minus(_M_data)};
+ }
+
+ // access to internal representation (suggested extension)
+ _GLIBCXX_SIMD_ALWAYS_INLINE explicit _GLIBCXX_SIMD_CONSTEXPR
+ simd(_CastType __init)
+ : _M_data(__init)
+ {}
+
+ // compound assignment [simd.cassign]
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend simd&
+ operator+=(simd& __lhs, const simd& __x)
+ {
+ return __lhs = __lhs + __x;
+ }
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend simd&
+ operator-=(simd& __lhs, const simd& __x)
+ {
+ return __lhs = __lhs - __x;
+ }
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend simd&
+ operator*=(simd& __lhs, const simd& __x)
+ {
+ return __lhs = __lhs * __x;
+ }
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend simd&
+ operator/=(simd& __lhs, const simd& __x)
+ {
+ return __lhs = __lhs / __x;
+ }
+
+ // binary operators [simd.binary]
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend simd
+ operator+(const simd& __x, const simd& __y)
+ {
+ return {__private_init, _Impl::__plus(__x._M_data, __y._M_data)};
+ }
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend simd
+ operator-(const simd& __x, const simd& __y)
+ {
+ return {__private_init, _Impl::__minus(__x._M_data, __y._M_data)};
+ }
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend simd
+ operator*(const simd& __x, const simd& __y)
+ {
+ return {__private_init, _Impl::__multiplies(__x._M_data, __y._M_data)};
+ }
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend simd
+ operator/(const simd& __x, const simd& __y)
+ {
+ return {__private_init, _Impl::__divides(__x._M_data, __y._M_data)};
+ }
+
+ // compares [simd.comparison]
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend mask_type
+ operator==(const simd& __x, const simd& __y)
+ {
+ return simd::__make_mask(_Impl::__equal_to(__x._M_data, __y._M_data));
+ }
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend mask_type
+ operator!=(const simd& __x, const simd& __y)
+ {
+ return simd::__make_mask(_Impl::__not_equal_to(__x._M_data, __y._M_data));
+ }
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend mask_type
+ operator<(const simd& __x, const simd& __y)
+ {
+ return simd::__make_mask(_Impl::__less(__x._M_data, __y._M_data));
+ }
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend mask_type
+ operator<=(const simd& __x, const simd& __y)
+ {
+ return simd::__make_mask(_Impl::__less_equal(__x._M_data, __y._M_data));
+ }
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend mask_type
+ operator>(const simd& __x, const simd& __y)
+ {
+ return simd::__make_mask(_Impl::__less(__y._M_data, __x._M_data));
+ }
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend mask_type
+ operator>=(const simd& __x, const simd& __y)
+ {
+ return simd::__make_mask(_Impl::__less_equal(__y._M_data, __x._M_data));
+ }
+
+ // operator?: overloads (suggested extension) {{{
+#ifdef __GXX_CONDITIONAL_IS_OVERLOADABLE__
+ _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR friend simd
+ operator?:(const mask_type& __k, const simd& __where_true,
+ const simd& __where_false)
+ {
+ auto __ret = __where_false;
+ _Impl::__masked_assign(__data(__k), __data(__ret), __data(__where_true));
+ return __ret;
+ }
+#endif // __GXX_CONDITIONAL_IS_OVERLOADABLE__
+ // }}}
+
+  // "private" because of the first argument's namespace
+ _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR
+ simd(_PrivateInit, const _MemberType& __init)
+ : _M_data(__init)
+ {}
+
+  // "private" because of the first argument's namespace
+ _GLIBCXX_SIMD_INTRINSIC simd(_BitsetInit, std::bitset<size()> __init)
+ : _M_data()
+ {
+ where(mask_type(__bitset_init, __init), *this) = ~*this;
+ }
+
+ _GLIBCXX_SIMD_INTRINSIC
+ constexpr bool _M_is_constprop() const
+ {
+ if constexpr (__is_scalar_abi<_Abi>())
+ return __builtin_constant_p(_M_data);
+ else
+ return _M_data._M_is_constprop();
+ }
+
+private:
+ _GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR static mask_type
+ __make_mask(typename mask_type::_MemberType __k)
+ {
+ return {__private_init, __k};
+ }
+
+ friend const auto& __data<value_type, abi_type>(const simd&);
+ friend auto& __data<value_type, abi_type>(simd&);
+ alignas(_Traits::_S_simd_align) _MemberType _M_data;
+};
+
+// }}}
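+// A sketch of how the interface above is meant to be used (illustrative only;
+// assumes the Parallelism TS 2 alias native_simd and the element_aligned tag
+// are available):
+//   using _V = std::experimental::native_simd<float>;
+//   float __buf[_V::size()] = {};
+//   _V __v(__buf, std::experimental::element_aligned); // load constructor
+//   __v += 1.f;                     // broadcast + compound assignment
+//   auto __k = (__v > 0.f);         // comparisons yield a simd_mask
+//   __v.copy_to(__buf, std::experimental::element_aligned);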
+// __data {{{
+template <typename _Tp, typename _Ap>
+_GLIBCXX_SIMD_INTRINSIC constexpr const auto&
+__data(const simd<_Tp, _Ap>& __x)
+{
+ return __x._M_data;
+}
+template <typename _Tp, typename _Ap>
+_GLIBCXX_SIMD_INTRINSIC constexpr auto&
+__data(simd<_Tp, _Ap>& __x)
+{
+ return __x._M_data;
+}
+// }}}
+
+namespace __proposed {
+namespace float_bitwise_operators {
+// float_bitwise_operators {{{
+template <typename _Tp, typename _Ap>
+_GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR simd<_Tp, _Ap>
+operator^(const simd<_Tp, _Ap>& __a, const simd<_Tp, _Ap>& __b)
+{
+ return {__private_init, _Ap::_SimdImpl::__bit_xor(__data(__a), __data(__b))};
+}
+
+template <typename _Tp, typename _Ap>
+_GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR simd<_Tp, _Ap>
+operator|(const simd<_Tp, _Ap>& __a, const simd<_Tp, _Ap>& __b)
+{
+ return {__private_init, _Ap::_SimdImpl::__bit_or(__data(__a), __data(__b))};
+}
+
+template <typename _Tp, typename _Ap>
+_GLIBCXX_SIMD_INTRINSIC _GLIBCXX_SIMD_CONSTEXPR simd<_Tp, _Ap>
+operator&(const simd<_Tp, _Ap>& __a, const simd<_Tp, _Ap>& __b)
+{
+ return {__private_init, _Ap::_SimdImpl::__bit_and(__data(__a), __data(__b))};
+}
+// }}}
+} // namespace float_bitwise_operators
+} // namespace __proposed
+
+_GLIBCXX_SIMD_END_NAMESPACE
+
+#endif // __cplusplus >= 201703L
+#endif // _GLIBCXX_EXPERIMENTAL_SIMD_H
+
+// vim: foldmethod=marker
diff --git a/libstdc++-v3/include/experimental/bits/simd_builtin.h b/libstdc++-v3/include/experimental/bits/simd_builtin.h
new file mode 100644
index 00000000000..4dbdce95797
--- /dev/null
+++ b/libstdc++-v3/include/experimental/bits/simd_builtin.h
@@ -0,0 +1,2854 @@
+// Simd Abi specific implementations -*- C++ -*-
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library. This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+// <http://www.gnu.org/licenses/>.
+
+#ifndef _GLIBCXX_EXPERIMENTAL_SIMD_ABIS_H_
+#define _GLIBCXX_EXPERIMENTAL_SIMD_ABIS_H_
+
+#if __cplusplus >= 201703L
+
+#include <array>
+#include <cmath>
+#include <cstdlib>
+
+_GLIBCXX_SIMD_BEGIN_NAMESPACE
+// _S_allbits{{{
+template <typename _V>
+static inline constexpr _V _S_allbits
+ = reinterpret_cast<_V>(~__vector_type_t<char, sizeof(_V) / sizeof(char)>());
+
+// }}}
+// _S_signmask, _S_absmask{{{
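+// For floating-point element types, +1 and -1 differ only in the sign bit, so
+// the XOR below yields a vector with only the sign bits set; _S_absmask is its
+// complement (all bits except the sign bits).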
+template <typename _V, typename = _VectorTraits<_V>>
+static inline constexpr _V _S_signmask = __xor(_V() + 1, _V() - 1);
+template <typename _V, typename = _VectorTraits<_V>>
+static inline constexpr _V _S_absmask
+ = __andnot(_S_signmask<_V>, _S_allbits<_V>);
+
+//}}}
+// __vector_permute<Indices...>{{{
+// Index == -1 requests zeroing of the output element
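+// e.g. __vector_permute<3, 2, 1, 0>(__x) reverses a 4-element vector, and
+// __vector_permute<0, -1, 1, -1>(__x) yields {__x[0], 0, __x[1], 0}.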
+template <int... _Indices, typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+_Tp
+__vector_permute(_Tp __x)
+{
+ static_assert(sizeof...(_Indices) == _TVT::_S_width);
+ return __make_vector<typename _TVT::value_type>(
+ (_Indices == -1 ? 0 : __x[_Indices == -1 ? 0 : _Indices])...);
+}
+
+// }}}
+// __vector_shuffle<Indices...>{{{
+// Index == -1 requests zeroing of the output element
+template <int... _Indices, typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+_Tp
+__vector_shuffle(_Tp __x, _Tp __y)
+{
+ return _Tp{(_Indices == -1 ? 0
+ : _Indices < _TVT::_S_width
+ ? __x[_Indices]
+ : __y[_Indices - _TVT::_S_width])...};
+}
+
+// }}}
+// __make_wrapper{{{
+template <typename _Tp, typename... _Args>
+_GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper<_Tp, sizeof...(_Args)>
+__make_wrapper(const _Args&... __args)
+{
+ return __make_vector<_Tp>(__args...);
+}
+
+// }}}
+// __wrapper_bitcast{{{
+template <typename _Tp, size_t _ToN = 0, typename _Up, size_t _M,
+ size_t _Np = _ToN != 0 ? _ToN : sizeof(_Up) * _M / sizeof(_Tp)>
+_GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper<_Tp, _Np>
+__wrapper_bitcast(_SimdWrapper<_Up, _M> __x)
+{
+ static_assert(_Np > 1);
+ return __intrin_bitcast<__vector_type_t<_Tp, _Np>>(__x._M_data);
+}
+
+// }}}
+// __shift_elements_right{{{
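+// Shifts the contents of __v right by __shift bytes, shifting in zeros at the
+// high end (__shift counts bytes, not elements).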
+// if (__shift % 2ⁿ == 0) => the low n Bytes are correct
+template <unsigned __shift, typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+_GLIBCXX_SIMD_INTRINSIC _Tp
+__shift_elements_right(_Tp __v)
+{
+ [[maybe_unused]] const auto __iv = __to_intrin(__v);
+ static_assert(__shift <= sizeof(_Tp));
+ if constexpr (__shift == 0)
+ return __v;
+ else if constexpr (__shift == sizeof(_Tp))
+ return _Tp();
+#if _GLIBCXX_SIMD_X86INTRIN // {{{
+ else if constexpr (__have_sse && __shift == 8
+ && _TVT::template __is<float, 4>)
+ return _mm_movehl_ps(__iv, __iv);
+ else if constexpr (__have_sse2 && __shift == 8
+ && _TVT::template __is<double, 2>)
+ return _mm_unpackhi_pd(__iv, __iv);
+ else if constexpr (__have_sse2 && sizeof(_Tp) == 16)
+ return reinterpret_cast<typename _TVT::type>(
+ _mm_srli_si128(reinterpret_cast<__m128i>(__iv), __shift));
+ else if constexpr (__shift == 16 && sizeof(_Tp) == 32)
+ {
+ /*if constexpr (__have_avx && _TVT::template __is<double, 4>)
+ return _mm256_permute2f128_pd(__iv, __iv, 0x81);
+ else if constexpr (__have_avx && _TVT::template __is<float, 8>)
+ return _mm256_permute2f128_ps(__iv, __iv, 0x81);
+ else if constexpr (__have_avx)
+ return reinterpret_cast<typename _TVT::type>(
+ _mm256_permute2f128_si256(__iv, __iv, 0x81));
+ else*/
+ return __zero_extend(__hi128(__v));
+ }
+ else if constexpr (__have_avx2 && sizeof(_Tp) == 32 && __shift < 16)
+ {
+ const auto __vll = __vector_bitcast<_LLong>(__v);
+ return reinterpret_cast<typename _TVT::type>(
+ _mm256_alignr_epi8(_mm256_permute2x128_si256(__vll, __vll, 0x81), __vll,
+ __shift));
+ }
+ else if constexpr (__have_avx && sizeof(_Tp) == 32 && __shift < 16)
+ {
+ const auto __vll = __vector_bitcast<_LLong>(__v);
+ return reinterpret_cast<typename _TVT::type>(
+ __concat(_mm_alignr_epi8(__hi128(__vll), __lo128(__vll), __shift),
+ _mm_srli_si128(__hi128(__vll), __shift)));
+ }
+ else if constexpr (sizeof(_Tp) == 32 && __shift > 16)
+ return __zero_extend(__shift_elements_right<__shift - 16>(__hi128(__v)));
+ else if constexpr (sizeof(_Tp) == 64 && __shift == 32)
+ return __zero_extend(__hi256(__v));
+ else if constexpr (__have_avx512f && sizeof(_Tp) == 64)
+ {
+ if constexpr (__shift >= 48)
+ return __zero_extend(
+ __shift_elements_right<__shift - 48>(__extract<3, 4>(__v)));
+ else if constexpr (__shift >= 32)
+ return __zero_extend(
+ __shift_elements_right<__shift - 32>(__hi256(__v)));
+ else if constexpr (__shift % 8 == 0)
+ return reinterpret_cast<typename _TVT::type>(
+ _mm512_alignr_epi64(__m512i(), __intrin_bitcast<__m512i>(__v),
+ __shift / 8));
+ else if constexpr (__shift % 4 == 0)
+ return reinterpret_cast<typename _TVT::type>(
+ _mm512_alignr_epi32(__m512i(), __intrin_bitcast<__m512i>(__v),
+ __shift / 4));
+ else if constexpr (__have_avx512bw && __shift < 16)
+ {
+ const auto __vll = __vector_bitcast<_LLong>(__v);
+ return reinterpret_cast<typename _TVT::type>(
+ _mm512_alignr_epi8(_mm512_shuffle_i32x4(__vll, __vll, 0xf9), __vll,
+ __shift));
+ }
+ else if constexpr (__have_avx512bw && __shift < 32)
+ {
+ const auto __vll = __vector_bitcast<_LLong>(__v);
+ return reinterpret_cast<typename _TVT::type>(
+ _mm512_alignr_epi8(_mm512_shuffle_i32x4(__vll, __m512i(), 0xee),
+ _mm512_shuffle_i32x4(__vll, __vll, 0xf9),
+ __shift - 16));
+ }
+ else
+ __assert_unreachable<_Tp>();
+ }
+/*
+ } else if constexpr (__shift % 16 == 0 && sizeof(_Tp) == 64)
+ return __auto_bitcast(__extract<__shift / 16, 4>(__v));
+*/
+#endif // _GLIBCXX_SIMD_X86INTRIN }}}
+ else
+ {
+ constexpr int __chunksize
+ = __shift % 8 == 0 ? 8
+ : __shift % 4 == 0 ? 4 : __shift % 2 == 0 ? 2 : 1;
+ auto __w = __vector_bitcast<__int_with_sizeof_t<__chunksize>>(__v);
+ using _Up = decltype(__w);
+ return __intrin_bitcast<_Tp>(
+ __call_with_n_evaluations<(sizeof(_Tp) - __shift) / __chunksize>(
+ [](auto... __chunks) { return _Up{__chunks...}; },
+ [&](auto __i) { return __w[__shift / __chunksize + __i]; }));
+ }
+}
+
+// }}}
+// __extract_part(_SimdWrapper<_Tp, _Np>) {{{
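+// Returns _Combine contiguous chunks of _Np / _Total elements, starting at
+// chunk _Index, i.e. the elements
+// [_Index * _Np / _Total, (_Index + _Combine) * _Np / _Total) of __x.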
+template <int _Index, int _Total, int _Combine, typename _Tp, size_t _Np>
+_GLIBCXX_SIMD_INTRINSIC
+ _GLIBCXX_CONST _SimdWrapper<_Tp, _Np / _Total * _Combine>
+ __extract_part(const _SimdWrapper<_Tp, _Np> __x)
+{
+ if constexpr (_Index % 2 == 0 && _Total % 2 == 0 && _Combine % 2 == 0)
+ return __extract_part<_Index / 2, _Total / 2, _Combine / 2>(__x);
+ else
+ {
+ constexpr size_t __values_per_part = _Np / _Total;
+ constexpr size_t __values_to_skip = _Index * __values_per_part;
+ constexpr size_t __return_size = __values_per_part * _Combine;
+ using _R = __vector_type_t<_Tp, __return_size>;
+ static_assert((_Index + _Combine) * __values_per_part * sizeof(_Tp)
+ <= sizeof(__x),
+ "out of bounds __extract_part");
+      // the following assertion would ensure that no "padding" is read
+ // static_assert(_Total >= _Index + _Combine, "_Total must be greater than
+ // _Index");
+
+ // static_assert(__return_size * _Total == _Np, "_Np must be divisible by
+ // _Total");
+ if (__x._M_is_constprop())
+ return __generate_from_n_evaluations<__return_size, _R>(
+ [&](auto __i) { return __x[__values_to_skip + __i]; });
+ if constexpr (_Index == 0 && _Total == 1)
+ return __x;
+ else if constexpr (_Index == 0)
+ return __intrin_bitcast<_R>(__as_vector(__x));
+#if _GLIBCXX_SIMD_X86INTRIN // {{{
+ else if constexpr (sizeof(__x) == 32 && __return_size * sizeof(_Tp) <= 16)
+ {
+ constexpr size_t __bytes_to_skip = __values_to_skip * sizeof(_Tp);
+ if constexpr (__bytes_to_skip == 16)
+ return __vector_bitcast<_Tp, __return_size>(
+ __hi128(__as_vector(__x)));
+ else
+ return __vector_bitcast<_Tp, __return_size>(
+ _mm_alignr_epi8(__hi128(__vector_bitcast<_LLong>(__x)),
+ __lo128(__vector_bitcast<_LLong>(__x)),
+ __bytes_to_skip));
+ }
+#endif // _GLIBCXX_SIMD_X86INTRIN }}}
+ else if constexpr (_Index > 0
+ && (__values_to_skip % __return_size != 0
+ || sizeof(_R) >= 8)
+ && (__values_to_skip + __return_size) * sizeof(_Tp)
+ <= 64
+ && sizeof(__x) >= 16)
+ return __intrin_bitcast<_R>(
+ __shift_elements_right<__values_to_skip * sizeof(_Tp)>(
+ __as_vector(__x)));
+ else
+ {
+ _R __r = {};
+ __builtin_memcpy(&__r,
+ reinterpret_cast<const char*>(&__x)
+ + sizeof(_Tp) * __values_to_skip,
+ __return_size * sizeof(_Tp));
+ return __r;
+ }
+ }
+}
+
+// }}}
+// __extract_part(_SimdWrapper<bool, _Np>) {{{
+template <int _Index, int _Total, int _Combine = 1, size_t _Np>
+_GLIBCXX_SIMD_INTRINSIC constexpr _SimdWrapper<bool, _Np / _Total * _Combine>
+__extract_part(const _SimdWrapper<bool, _Np> __x)
+{
+ static_assert(_Combine == 1, "_Combine != 1 not implemented");
+ static_assert(__have_avx512f && _Np == _Np);
+ static_assert(_Total >= 2 && _Index + _Combine <= _Total && _Index >= 0);
+ return __x._M_data >> (_Index * _Np / _Total);
+}
+
+// }}}
+
+// __vector_convert {{{
+// implementation requires an index sequence
+template <typename _To, typename _From, size_t... _I>
+_GLIBCXX_SIMD_INTRINSIC constexpr _To
+__vector_convert(_From __a, index_sequence<_I...>)
+{
+ using _Tp = typename _VectorTraits<_To>::value_type;
+ return _To{static_cast<_Tp>(__a[_I])...};
+}
+
+template <typename _To, typename _From, size_t... _I>
+_GLIBCXX_SIMD_INTRINSIC constexpr _To
+__vector_convert(_From __a, _From __b, index_sequence<_I...>)
+{
+ using _Tp = typename _VectorTraits<_To>::value_type;
+ return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])...};
+}
+
+template <typename _To, typename _From, size_t... _I>
+_GLIBCXX_SIMD_INTRINSIC constexpr _To
+__vector_convert(_From __a, _From __b, _From __c, index_sequence<_I...>)
+{
+ using _Tp = typename _VectorTraits<_To>::value_type;
+ return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])...,
+ static_cast<_Tp>(__c[_I])...};
+}
+
+template <typename _To, typename _From, size_t... _I>
+_GLIBCXX_SIMD_INTRINSIC constexpr _To
+__vector_convert(_From __a, _From __b, _From __c, _From __d,
+ index_sequence<_I...>)
+{
+ using _Tp = typename _VectorTraits<_To>::value_type;
+ return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])...,
+ static_cast<_Tp>(__c[_I])..., static_cast<_Tp>(__d[_I])...};
+}
+
+template <typename _To, typename _From, size_t... _I>
+_GLIBCXX_SIMD_INTRINSIC constexpr _To
+__vector_convert(_From __a, _From __b, _From __c, _From __d, _From __e,
+ index_sequence<_I...>)
+{
+ using _Tp = typename _VectorTraits<_To>::value_type;
+ return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])...,
+ static_cast<_Tp>(__c[_I])..., static_cast<_Tp>(__d[_I])...,
+ static_cast<_Tp>(__e[_I])...};
+}
+
+template <typename _To, typename _From, size_t... _I>
+_GLIBCXX_SIMD_INTRINSIC constexpr _To
+__vector_convert(_From __a, _From __b, _From __c, _From __d, _From __e,
+ _From __f, index_sequence<_I...>)
+{
+ using _Tp = typename _VectorTraits<_To>::value_type;
+ return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])...,
+ static_cast<_Tp>(__c[_I])..., static_cast<_Tp>(__d[_I])...,
+ static_cast<_Tp>(__e[_I])..., static_cast<_Tp>(__f[_I])...};
+}
+
+template <typename _To, typename _From, size_t... _I>
+_GLIBCXX_SIMD_INTRINSIC constexpr _To
+__vector_convert(_From __a, _From __b, _From __c, _From __d, _From __e,
+ _From __f, _From __g, index_sequence<_I...>)
+{
+ using _Tp = typename _VectorTraits<_To>::value_type;
+ return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])...,
+ static_cast<_Tp>(__c[_I])..., static_cast<_Tp>(__d[_I])...,
+ static_cast<_Tp>(__e[_I])..., static_cast<_Tp>(__f[_I])...,
+ static_cast<_Tp>(__g[_I])...};
+}
+
+template <typename _To, typename _From, size_t... _I>
+_GLIBCXX_SIMD_INTRINSIC constexpr _To
+__vector_convert(_From __a, _From __b, _From __c, _From __d, _From __e,
+ _From __f, _From __g, _From __h, index_sequence<_I...>)
+{
+ using _Tp = typename _VectorTraits<_To>::value_type;
+ return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])...,
+ static_cast<_Tp>(__c[_I])..., static_cast<_Tp>(__d[_I])...,
+ static_cast<_Tp>(__e[_I])..., static_cast<_Tp>(__f[_I])...,
+ static_cast<_Tp>(__g[_I])..., static_cast<_Tp>(__h[_I])...};
+}
+
+template <typename _To, typename _From, size_t... _I>
+_GLIBCXX_SIMD_INTRINSIC constexpr _To
+__vector_convert(_From __a, _From __b, _From __c, _From __d, _From __e,
+ _From __f, _From __g, _From __h, _From __i,
+ index_sequence<_I...>)
+{
+ using _Tp = typename _VectorTraits<_To>::value_type;
+ return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])...,
+ static_cast<_Tp>(__c[_I])..., static_cast<_Tp>(__d[_I])...,
+ static_cast<_Tp>(__e[_I])..., static_cast<_Tp>(__f[_I])...,
+ static_cast<_Tp>(__g[_I])..., static_cast<_Tp>(__h[_I])...,
+ static_cast<_Tp>(__i[_I])...};
+}
+
+template <typename _To, typename _From, size_t... _I>
+_GLIBCXX_SIMD_INTRINSIC constexpr _To
+__vector_convert(_From __a, _From __b, _From __c, _From __d, _From __e,
+ _From __f, _From __g, _From __h, _From __i, _From __j,
+ index_sequence<_I...>)
+{
+ using _Tp = typename _VectorTraits<_To>::value_type;
+ return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])...,
+ static_cast<_Tp>(__c[_I])..., static_cast<_Tp>(__d[_I])...,
+ static_cast<_Tp>(__e[_I])..., static_cast<_Tp>(__f[_I])...,
+ static_cast<_Tp>(__g[_I])..., static_cast<_Tp>(__h[_I])...,
+ static_cast<_Tp>(__i[_I])..., static_cast<_Tp>(__j[_I])...};
+}
+
+template <typename _To, typename _From, size_t... _I>
+_GLIBCXX_SIMD_INTRINSIC constexpr _To
+__vector_convert(_From __a, _From __b, _From __c, _From __d, _From __e,
+ _From __f, _From __g, _From __h, _From __i, _From __j,
+ _From __k, index_sequence<_I...>)
+{
+ using _Tp = typename _VectorTraits<_To>::value_type;
+ return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])...,
+ static_cast<_Tp>(__c[_I])..., static_cast<_Tp>(__d[_I])...,
+ static_cast<_Tp>(__e[_I])..., static_cast<_Tp>(__f[_I])...,
+ static_cast<_Tp>(__g[_I])..., static_cast<_Tp>(__h[_I])...,
+ static_cast<_Tp>(__i[_I])..., static_cast<_Tp>(__j[_I])...,
+ static_cast<_Tp>(__k[_I])...};
+}
+
+template <typename _To, typename _From, size_t... _I>
+_GLIBCXX_SIMD_INTRINSIC constexpr _To
+__vector_convert(_From __a, _From __b, _From __c, _From __d, _From __e,
+ _From __f, _From __g, _From __h, _From __i, _From __j,
+ _From __k, _From __l, index_sequence<_I...>)
+{
+ using _Tp = typename _VectorTraits<_To>::value_type;
+ return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])...,
+ static_cast<_Tp>(__c[_I])..., static_cast<_Tp>(__d[_I])...,
+ static_cast<_Tp>(__e[_I])..., static_cast<_Tp>(__f[_I])...,
+ static_cast<_Tp>(__g[_I])..., static_cast<_Tp>(__h[_I])...,
+ static_cast<_Tp>(__i[_I])..., static_cast<_Tp>(__j[_I])...,
+ static_cast<_Tp>(__k[_I])..., static_cast<_Tp>(__l[_I])...};
+}
+
+template <typename _To, typename _From, size_t... _I>
+_GLIBCXX_SIMD_INTRINSIC constexpr _To
+__vector_convert(_From __a, _From __b, _From __c, _From __d, _From __e,
+ _From __f, _From __g, _From __h, _From __i, _From __j,
+ _From __k, _From __l, _From __m, index_sequence<_I...>)
+{
+ using _Tp = typename _VectorTraits<_To>::value_type;
+ return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])...,
+ static_cast<_Tp>(__c[_I])..., static_cast<_Tp>(__d[_I])...,
+ static_cast<_Tp>(__e[_I])..., static_cast<_Tp>(__f[_I])...,
+ static_cast<_Tp>(__g[_I])..., static_cast<_Tp>(__h[_I])...,
+ static_cast<_Tp>(__i[_I])..., static_cast<_Tp>(__j[_I])...,
+ static_cast<_Tp>(__k[_I])..., static_cast<_Tp>(__l[_I])...,
+ static_cast<_Tp>(__m[_I])...};
+}
+
+template <typename _To, typename _From, size_t... _I>
+_GLIBCXX_SIMD_INTRINSIC constexpr _To
+__vector_convert(_From __a, _From __b, _From __c, _From __d, _From __e,
+ _From __f, _From __g, _From __h, _From __i, _From __j,
+ _From __k, _From __l, _From __m, _From __n,
+ index_sequence<_I...>)
+{
+ using _Tp = typename _VectorTraits<_To>::value_type;
+ return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])...,
+ static_cast<_Tp>(__c[_I])..., static_cast<_Tp>(__d[_I])...,
+ static_cast<_Tp>(__e[_I])..., static_cast<_Tp>(__f[_I])...,
+ static_cast<_Tp>(__g[_I])..., static_cast<_Tp>(__h[_I])...,
+ static_cast<_Tp>(__i[_I])..., static_cast<_Tp>(__j[_I])...,
+ static_cast<_Tp>(__k[_I])..., static_cast<_Tp>(__l[_I])...,
+ static_cast<_Tp>(__m[_I])..., static_cast<_Tp>(__n[_I])...};
+}
+
+template <typename _To, typename _From, size_t... _I>
+_GLIBCXX_SIMD_INTRINSIC constexpr _To
+__vector_convert(_From __a, _From __b, _From __c, _From __d, _From __e,
+ _From __f, _From __g, _From __h, _From __i, _From __j,
+ _From __k, _From __l, _From __m, _From __n, _From __o,
+ index_sequence<_I...>)
+{
+ using _Tp = typename _VectorTraits<_To>::value_type;
+ return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])...,
+ static_cast<_Tp>(__c[_I])..., static_cast<_Tp>(__d[_I])...,
+ static_cast<_Tp>(__e[_I])..., static_cast<_Tp>(__f[_I])...,
+ static_cast<_Tp>(__g[_I])..., static_cast<_Tp>(__h[_I])...,
+ static_cast<_Tp>(__i[_I])..., static_cast<_Tp>(__j[_I])...,
+ static_cast<_Tp>(__k[_I])..., static_cast<_Tp>(__l[_I])...,
+ static_cast<_Tp>(__m[_I])..., static_cast<_Tp>(__n[_I])...,
+ static_cast<_Tp>(__o[_I])...};
+}
+
+template <typename _To, typename _From, size_t... _I>
+_GLIBCXX_SIMD_INTRINSIC constexpr _To
+__vector_convert(_From __a, _From __b, _From __c, _From __d, _From __e,
+ _From __f, _From __g, _From __h, _From __i, _From __j,
+ _From __k, _From __l, _From __m, _From __n, _From __o,
+ _From __p, index_sequence<_I...>)
+{
+ using _Tp = typename _VectorTraits<_To>::value_type;
+ return _To{static_cast<_Tp>(__a[_I])..., static_cast<_Tp>(__b[_I])...,
+ static_cast<_Tp>(__c[_I])..., static_cast<_Tp>(__d[_I])...,
+ static_cast<_Tp>(__e[_I])..., static_cast<_Tp>(__f[_I])...,
+ static_cast<_Tp>(__g[_I])..., static_cast<_Tp>(__h[_I])...,
+ static_cast<_Tp>(__i[_I])..., static_cast<_Tp>(__j[_I])...,
+ static_cast<_Tp>(__k[_I])..., static_cast<_Tp>(__l[_I])...,
+ static_cast<_Tp>(__m[_I])..., static_cast<_Tp>(__n[_I])...,
+ static_cast<_Tp>(__o[_I])..., static_cast<_Tp>(__p[_I])...};
+}
+
+// Defer actual conversion to the overload that takes an index sequence. Note
+// that this function adds zeros or drops values off the end if you don't ensure
+// matching width.
+template <typename _To, typename... _From, typename _ToT = _VectorTraits<_To>,
+ typename _FromT = _VectorTraits<__first_of_pack_t<_From...>>>
+_GLIBCXX_SIMD_INTRINSIC constexpr _To
+__vector_convert(_From... __xs)
+{
+#ifdef _GLIBCXX_SIMD_WORKAROUND_PR85048
+ if (!(... && __builtin_constant_p(__xs)))
+ {
+ if constexpr ((sizeof...(_From) & (sizeof...(_From) - 1))
+ == 0) // power-of-two number of arguments
+ return __convert_x86<_To>(__as_vector(__xs)...);
+ else
+ {
+ using _FF = __first_of_pack_t<_From...>;
+ return __vector_convert<_To>(__xs..., _FF{});
+ }
+ }
+ else
+#endif
+ return __vector_convert<_To>(
+ __xs...,
+ make_index_sequence<std::min(_ToT::_S_width, _FromT::_S_width)>());
+}
+
+// This overload takes a vectorizable type _To and produces a return type that
+// matches the width.
+template <typename _To, typename... _From,
+ typename = enable_if_t<__is_vectorizable_v<_To>>,
+ typename _FromT = _VectorTraits<__first_of_pack_t<_From...>>,
+ typename = int>
+_GLIBCXX_SIMD_INTRINSIC constexpr _To
+__vector_convert(_From... __xs)
+{
+ return __vector_convert<__vector_type_t<_To, _FromT::_S_width>>(__xs...);
+}
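+
+// A sketch of the intended use of __vector_convert (illustrative only,
+// assuming __vector_type_t yields the usual GCC vector types):
+//   using _V4i = __vector_type_t<int, 4>;
+//   using _V4f = __vector_type_t<float, 4>;
+//   using _V8s = __vector_type_t<short, 8>;
+//   _V4i __a = {1, 2, 3, 4};
+//   _V4f __f = __vector_convert<_V4f>(__a);      // element-wise static_cast
+//   _V8s __s = __vector_convert<_V8s>(__a, __a); // converts and concatenates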
+
+// }}}
+// __convert function{{{
+template <typename _To, typename _From, typename... _More>
+_GLIBCXX_SIMD_INTRINSIC constexpr auto
+__convert(_From __v0, _More... __vs)
+{
+ if constexpr (__is_vectorizable_v<_From>)
+ {
+ static_assert((true && ... && is_same_v<_From, _More>) );
+ using _V = typename _VectorTraits<_To>::type;
+ using _Tp = typename _VectorTraits<_To>::value_type;
+ return _V{static_cast<_Tp>(__v0), static_cast<_Tp>(__vs)...};
+ }
+ else if constexpr (!__is_vector_type_v<_From>)
+ return __convert<_To>(__as_vector(__v0), __as_vector(__vs)...);
+ else
+ {
+ static_assert((true && ... && is_same_v<_From, _More>) );
+ if constexpr (__is_vectorizable_v<_To>)
+ return __convert<__vector_type_t<_To, (_VectorTraits<_From>::_S_width
+ * (1 + sizeof...(_More)))>>(
+ __v0, __vs...);
+ else if constexpr (!__is_vector_type_v<_To>)
+ return _To(__convert<typename _To::_BuiltinType>(__v0, __vs...));
+ else
+ {
+ static_assert(
+ sizeof...(_More) == 0
+ || _VectorTraits<_To>::_S_width
+ >= (1 + sizeof...(_More)) * _VectorTraits<_From>::_S_width,
+ "__convert(...) requires the input to fit into the output");
+ return __vector_convert<_To>(__v0, __vs...);
+ }
+ }
+}
+
+// }}}
+// __convert_all{{{
+// Converts __v into std::array<_To, N>, where N is _NParts if non-zero or
+// otherwise deduced from _To such that N * #elements(_To) <= #elements(__v).
+// Note: this function may return fewer than all of the converted elements
+template <typename _To,
+	  size_t _NParts = 0, // allows converting fewer or more elements than
+			      // all (only the last _To may be partially filled)
+	  size_t _Offset = 0, // index of the first element to convert
+			      // (in elements, not bytes or parts)
+ typename _From, typename _FromVT = _VectorTraits<_From>>
+_GLIBCXX_SIMD_INTRINSIC auto
+__convert_all(_From __v)
+{
+ if constexpr (std::is_arithmetic_v<_To> && _NParts != 1)
+ {
+ static_assert(_Offset < _FromVT::_S_width);
+ constexpr auto _Np
+ = _NParts == 0 ? _FromVT::_S_partial_width - _Offset : _NParts;
+ return __generate_from_n_evaluations<_Np, std::array<_To, _Np>>(
+ [&](auto __i) { return static_cast<_To>(__v[__i + _Offset]); });
+ }
+ else
+ {
+ static_assert(__is_vector_type_v<_To>);
+ using _ToVT = _VectorTraits<_To>;
+ if constexpr (__is_vector_type_v<_From>)
+ return __convert_all<_To, _NParts>(__as_wrapper(__v));
+ else if constexpr (_NParts == 1)
+ {
+ static_assert(_Offset % _ToVT::_S_width == 0);
+ return std::array<_To, 1>{__vector_convert<_To>(
+ __extract_part<_Offset / _ToVT::_S_width,
+ __div_roundup(_FromVT::_S_partial_width,
+ _ToVT::_S_width)>(__v))};
+ }
+#if _GLIBCXX_SIMD_X86INTRIN // {{{
+    else if constexpr (
+      !__have_sse4_1 && _Offset == 0
+      && is_integral_v<typename _FromVT::value_type>
+      && sizeof(typename _FromVT::value_type)
+	   < sizeof(typename _ToVT::value_type)
+      && !(sizeof(typename _FromVT::value_type) == 4
+	   && is_same_v<typename _ToVT::value_type, double>) )
+ {
+ using _ToT = typename _ToVT::value_type;
+ using _FromT = typename _FromVT::value_type;
+ constexpr size_t _Np
+ = _NParts != 0 ? _NParts
+ : (_FromVT::_S_partial_width / _ToVT::_S_width);
+ using _R = std::array<_To, _Np>;
+	  // __adjust returns its input reinterpreted as a vector of __n (pass
+	  // _Np via _SizeConstant) entries so that no unnecessary intermediate
+	  // conversions are requested and, more importantly, none are missing
+ [[maybe_unused]] auto __adjust
+ = [](auto __n,
+ auto __vv) -> _SimdWrapper<_FromT, decltype(__n)::value> {
+ return __vector_bitcast<_FromT, decltype(__n)::value>(__vv);
+ };
+ [[maybe_unused]] const auto __vi = __to_intrin(__v);
+ auto&& __make_array =
+ []<typename _ToConvert>(_ToConvert __x0,
+ [[maybe_unused]] _ToConvert __x1) {
+ if constexpr (_Np == 1)
+ return _R{__vector_bitcast<_ToT>(__x0)};
+ else
+ return _R{__vector_bitcast<_ToT>(__x0),
+ __vector_bitcast<_ToT>(__x1)};
+ };
+
+ if constexpr (_Np == 0)
+ return _R{};
+ else if constexpr (sizeof(_FromT) == 1 && sizeof(_ToT) == 2)
+ {
+ static_assert(std::is_integral_v<_FromT>);
+ static_assert(std::is_integral_v<_ToT>);
+ if constexpr (is_unsigned_v<_FromT>)
+ return __make_array(_mm_unpacklo_epi8(__vi, __m128i()),
+ _mm_unpackhi_epi8(__vi, __m128i()));
+ else
+ return __make_array(
+ _mm_srai_epi16(_mm_unpacklo_epi8(__vi, __vi), 8),
+ _mm_srai_epi16(_mm_unpackhi_epi8(__vi, __vi), 8));
+ }
+ else if constexpr (sizeof(_FromT) == 2 && sizeof(_ToT) == 4)
+ {
+ static_assert(std::is_integral_v<_FromT>);
+ if constexpr (is_floating_point_v<_ToT>)
+ {
+ const auto __ints
+ = __convert_all<__vector_type16_t<int>, _Np>(
+ __adjust(_SizeConstant<_Np * 4>(), __v));
+ return __generate_from_n_evaluations<_Np, _R>([&](auto __i) {
+ return __vector_convert<_To>(__ints[__i]);
+ });
+ }
+ else if constexpr (is_unsigned_v<_FromT>)
+ return __make_array(_mm_unpacklo_epi16(__vi, __m128i()),
+ _mm_unpackhi_epi16(__vi, __m128i()));
+ else
+ return __make_array(
+ _mm_srai_epi32(_mm_unpacklo_epi16(__vi, __vi), 16),
+ _mm_srai_epi32(_mm_unpackhi_epi16(__vi, __vi), 16));
+ }
+ else if constexpr (sizeof(_FromT) == 4 && sizeof(_ToT) == 8
+ && is_integral_v<_FromT> && is_integral_v<_ToT>)
+ {
+ if constexpr (is_unsigned_v<_FromT>)
+ return __make_array(_mm_unpacklo_epi32(__vi, __m128i()),
+ _mm_unpackhi_epi32(__vi, __m128i()));
+ else
+ return __make_array(
+ _mm_unpacklo_epi32(__vi, _mm_srai_epi32(__vi, 31)),
+ _mm_unpackhi_epi32(__vi, _mm_srai_epi32(__vi, 31)));
+ }
+ else if constexpr (sizeof(_FromT) == 1 && sizeof(_ToT) >= 4
+ && is_signed_v<_FromT>)
+ {
+ const __m128i __vv[2] = {_mm_unpacklo_epi8(__vi, __vi),
+ _mm_unpackhi_epi8(__vi, __vi)};
+ const __vector_type16_t<int> __vvvv[4]
+ = {__vector_bitcast<int>(_mm_unpacklo_epi16(__vv[0], __vv[0])),
+ __vector_bitcast<int>(_mm_unpackhi_epi16(__vv[0], __vv[0])),
+ __vector_bitcast<int>(_mm_unpacklo_epi16(__vv[1], __vv[1])),
+ __vector_bitcast<int>(_mm_unpackhi_epi16(__vv[1], __vv[1]))};
+ if constexpr (sizeof(_ToT) == 4)
+ return __generate_from_n_evaluations<_Np, _R>([&](auto __i) {
+ return __vector_convert<_To>(__vvvv[__i] >> 24);
+ });
+ else if constexpr (is_integral_v<_ToT>)
+ return __generate_from_n_evaluations<_Np, _R>([&](auto __i) {
+ const auto __signbits = __to_intrin(__vvvv[__i / 2] >> 31);
+ const auto __sx32 = __to_intrin(__vvvv[__i / 2] >> 24);
+ return __vector_bitcast<_ToT>(
+ __i % 2 == 0 ? _mm_unpacklo_epi32(__sx32, __signbits)
+ : _mm_unpackhi_epi32(__sx32, __signbits));
+ });
+ else
+ return __generate_from_n_evaluations<_Np, _R>([&](auto __i) {
+ const auto __int4 = __vvvv[__i / 2] >> 24;
+ return __vector_convert<_To>(
+ __i % 2 == 0 ? __int4
+ : __vector_bitcast<int>(
+ _mm_unpackhi_epi64(__to_intrin(__int4),
+ __to_intrin(__int4))));
+ });
+ }
+ else if constexpr (sizeof(_FromT) == 1 && sizeof(_ToT) == 4)
+ {
+ const auto __shorts = __convert_all<__vector_type16_t<
+ conditional_t<is_signed_v<_FromT>, short, unsigned short>>>(
+ __adjust(_SizeConstant<(_Np + 1) / 2 * 8>(), __v));
+ return __generate_from_n_evaluations<_Np, _R>([&](auto __i) {
+ return __convert_all<_To>(__shorts[__i / 2])[__i % 2];
+ });
+ }
+ else if constexpr (sizeof(_FromT) == 2 && sizeof(_ToT) == 8
+ && is_signed_v<_FromT> && is_integral_v<_ToT>)
+ {
+ const __m128i __vv[2] = {_mm_unpacklo_epi16(__vi, __vi),
+ _mm_unpackhi_epi16(__vi, __vi)};
+ const __vector_type16_t<int> __vvvv[4]
+ = {__vector_bitcast<int>(
+ _mm_unpacklo_epi32(_mm_srai_epi32(__vv[0], 16),
+ _mm_srai_epi32(__vv[0], 31))),
+ __vector_bitcast<int>(
+ _mm_unpackhi_epi32(_mm_srai_epi32(__vv[0], 16),
+ _mm_srai_epi32(__vv[0], 31))),
+ __vector_bitcast<int>(
+ _mm_unpacklo_epi32(_mm_srai_epi32(__vv[1], 16),
+ _mm_srai_epi32(__vv[1], 31))),
+ __vector_bitcast<int>(
+ _mm_unpackhi_epi32(_mm_srai_epi32(__vv[1], 16),
+ _mm_srai_epi32(__vv[1], 31)))};
+ return __generate_from_n_evaluations<_Np, _R>(
+ [&](auto __i) { return __vector_bitcast<_ToT>(__vvvv[__i]); });
+ }
+ else if constexpr (sizeof(_FromT) <= 2 && sizeof(_ToT) == 8)
+ {
+ const auto __ints = __convert_all<__vector_type16_t<
+ conditional_t<is_signed_v<_FromT> || is_floating_point_v<_ToT>,
+ int, unsigned int>>>(
+ __adjust(_SizeConstant<(_Np + 1) / 2 * 4>(), __v));
+ return __generate_from_n_evaluations<_Np, _R>([&](auto __i) {
+ return __convert_all<_To>(__ints[__i / 2])[__i % 2];
+ });
+ }
+ else
+ __assert_unreachable<_To>();
+ }
+#endif // _GLIBCXX_SIMD_X86INTRIN }}}
+ else if constexpr ((_FromVT::_S_partial_width - _Offset)
+ > _ToVT::_S_width)
+ {
+ /*
+ static_assert(
+ (_FromVT::_S_partial_width & (_FromVT::_S_partial_width - 1)) == 0,
+	    "__convert_all only supports power-of-2 number of elements. "
+	    "Otherwise the return type cannot be std::array<_To, N>.");
+ */
+ constexpr size_t _NTotal
+ = (_FromVT::_S_partial_width - _Offset) / _ToVT::_S_width;
+ constexpr size_t _Np = _NParts == 0 ? _NTotal : _NParts;
+ static_assert(
+ _Np <= _NTotal
+ || (_Np == _NTotal + 1
+ && (_FromVT::_S_partial_width - _Offset) % _ToVT::_S_width
+ > 0));
+ using _R = std::array<_To, _Np>;
+ if constexpr (_Np == 1)
+ return _R{__vector_convert<_To>(
+ __as_vector(__extract_part<_Offset, _FromVT::_S_partial_width,
+ _ToVT::_S_width>(__v)))};
+ else
+ return __generate_from_n_evaluations<_Np, _R>([&](
+ auto __i) constexpr {
+ auto __part
+ = __extract_part<__i * _ToVT::_S_width + _Offset,
+ _FromVT::_S_partial_width, _ToVT::_S_width>(
+ __v);
+ return __vector_convert<_To>(__part);
+ });
+ }
+ else if constexpr (_Offset == 0)
+ return std::array<_To, 1>{__vector_convert<_To>(__as_vector(__v))};
+ else
+ return std::array<_To, 1>{__vector_convert<_To>(__as_vector(
+ __extract_part<_Offset, _FromVT::_S_partial_width,
+ _FromVT::_S_partial_width - _Offset>(__v)))};
+ }
+}
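+
+// Illustrative sketch (assumed widths, not part of the interface): splitting
+// a vector of 16 signed chars into two vectors of 8 shorts each:
+//   using _V16c = __vector_type_t<signed char, 16>;
+//   using _V8s  = __vector_type_t<short, 8>;
+//   _V16c __v = {};
+//   std::array<_V8s, 2> __lohi = __convert_all<_V8s>(__v);
+//   // __lohi[0] holds elements 0..7, __lohi[1] holds elements 8..15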
+
+// }}}
+
+// _GnuTraits {{{
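+// Collects the member and base-class types that simd<_Tp, _Abi> and
+// simd_mask<_Tp, _Abi> need when the storage is a vector builtin. _Mp is the
+// mask element type: bool for the bitmask ABI (_VecBltnBtmsk), the value type
+// itself otherwise.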
+template <typename _Tp, typename _Mp, typename _Abi, size_t _Np>
+struct _GnuTraits
+{
+ using _IsValid = true_type;
+ using _SimdImpl = typename _Abi::_SimdImpl;
+ using _MaskImpl = typename _Abi::_MaskImpl;
+
+ // simd and simd_mask member types {{{
+ using _SimdMember = _SimdWrapper<_Tp, _Np>;
+ using _MaskMember = _SimdWrapper<_Mp, _Np>;
+ static constexpr size_t _S_simd_align = alignof(_SimdMember);
+ static constexpr size_t _S_mask_align = alignof(_MaskMember);
+
+ // }}}
+ // _SimdBase / base class for simd, providing extra conversions {{{
+ struct _SimdBase2
+ {
+ explicit operator __intrinsic_type_t<_Tp, _Np>() const
+ {
+ return __to_intrin(static_cast<const simd<_Tp, _Abi>*>(this)->_M_data);
+ }
+ explicit operator __vector_type_t<_Tp, _Np>() const
+ {
+ return static_cast<const simd<_Tp, _Abi>*>(this)->_M_data.__builtin();
+ }
+ };
+ struct _SimdBase1
+ {
+ explicit operator __intrinsic_type_t<_Tp, _Np>() const
+ {
+ return __data(*static_cast<const simd<_Tp, _Abi>*>(this));
+ }
+ };
+ using _SimdBase
+ = std::conditional_t<std::is_same<__intrinsic_type_t<_Tp, _Np>,
+ __vector_type_t<_Tp, _Np>>::value,
+ _SimdBase1, _SimdBase2>;
+
+ // }}}
+ // _MaskBase {{{
+ struct _MaskBase2
+ {
+ explicit operator __intrinsic_type_t<_Tp, _Np>() const
+ {
+ return static_cast<const simd_mask<_Tp, _Abi>*>(this)->_M_data.__intrin();
+ }
+ explicit operator __vector_type_t<_Tp, _Np>() const
+ {
+ return static_cast<const simd_mask<_Tp, _Abi>*>(this)->_M_data._M_data;
+ }
+ };
+ struct _MaskBase1
+ {
+ explicit operator __intrinsic_type_t<_Tp, _Np>() const
+ {
+ return __data(*static_cast<const simd_mask<_Tp, _Abi>*>(this));
+ }
+ };
+ using _MaskBase
+ = std::conditional_t<std::is_same<__intrinsic_type_t<_Tp, _Np>,
+ __vector_type_t<_Tp, _Np>>::value,
+ _MaskBase1, _MaskBase2>;
+
+ // }}}
+ // _MaskCastType {{{
+ // parameter type of one explicit simd_mask constructor
+ class _MaskCastType
+ {
+ using _Up = __intrinsic_type_t<_Tp, _Np>;
+ _Up _M_data;
+
+ public:
+ _MaskCastType(_Up __x) : _M_data(__x) {}
+ operator _MaskMember() const { return _M_data; }
+ };
+
+ // }}}
+ // _SimdCastType {{{
+ // parameter type of one explicit simd constructor
+ class _SimdCastType1
+ {
+ using _Ap = __intrinsic_type_t<_Tp, _Np>;
+ _SimdMember _M_data;
+
+ public:
+ _SimdCastType1(_Ap __a) : _M_data(__vector_bitcast<_Tp>(__a)) {}
+ operator _SimdMember() const { return _M_data; }
+ };
+
+ class _SimdCastType2
+ {
+ using _Ap = __intrinsic_type_t<_Tp, _Np>;
+ using _B = __vector_type_t<_Tp, _Np>;
+ _SimdMember _M_data;
+
+ public:
+ _SimdCastType2(_Ap __a) : _M_data(__vector_bitcast<_Tp>(__a)) {}
+ _SimdCastType2(_B __b) : _M_data(__b) {}
+ operator _SimdMember() const { return _M_data; }
+ };
+
+ using _SimdCastType
+ = std::conditional_t<std::is_same<__intrinsic_type_t<_Tp, _Np>,
+ __vector_type_t<_Tp, _Np>>::value,
+ _SimdCastType1, _SimdCastType2>;
+ //}}}
+};
+
+// }}}
+struct _CommonImplX86;
+struct _CommonImplNeon;
+struct _CommonImplBuiltin;
+template <typename _Abi> struct _SimdImplBuiltin;
+template <typename _Abi> struct _MaskImplBuiltin;
+template <typename _Abi> struct _SimdImplX86;
+template <typename _Abi> struct _MaskImplX86;
+template <typename _Abi> struct _SimdImplNeon;
+template <typename _Abi> struct _MaskImplNeon;
+// simd_abi::_VecBuiltin {{{
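+// ABI tag for simd objects stored in a single vector builtin of _UsedBytes
+// bytes. If _UsedBytes is not a power of two, the underlying vector is larger
+// and the helpers below (__implicit_mask, __masked, __make_padding_nonzero)
+// are used to ignore the padding elements.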
+template <int _UsedBytes> struct simd_abi::_VecBuiltin
+{
+ template <typename _Tp>
+ static constexpr size_t size = _UsedBytes / sizeof(_Tp);
+ template <typename _Tp>
+ static constexpr size_t _S_full_size
+ = sizeof(__vector_type_t<_Tp, size<_Tp>>) / sizeof(_Tp);
+ static constexpr bool _S_is_partial = (_UsedBytes & (_UsedBytes - 1)) != 0;
+
+ // validity traits {{{
+ struct _IsValidAbiTag : __bool_constant<(_UsedBytes > 1)>
+ {
+ };
+
+ template <typename _Tp>
+ struct _IsValidSizeFor
+ : std::conjunction<
+ __bool_constant<(_UsedBytes / sizeof(_Tp) > 1
+ && _UsedBytes % sizeof(_Tp) == 0)>,
+ __bool_constant<(_UsedBytes <= __vectorized_sizeof<_Tp>())>>
+ {
+ };
+ template <typename _Tp>
+ struct _IsValid : std::conjunction<_IsValidAbiTag, __is_vectorizable<_Tp>,
+ _IsValidSizeFor<_Tp>>
+ {
+ };
+ template <typename _Tp>
+ static constexpr bool _S_is_valid_v = _IsValid<_Tp>::value;
+
+ // }}}
+ // _SimdImpl/_MaskImpl {{{
+#if _GLIBCXX_SIMD_X86INTRIN
+ using _CommonImpl = _CommonImplX86;
+ using _SimdImpl = _SimdImplX86<_VecBuiltin<_UsedBytes>>;
+ using _MaskImpl = _MaskImplX86<_VecBuiltin<_UsedBytes>>;
+#elif _GLIBCXX_SIMD_HAVE_NEON
+ using _CommonImpl = _CommonImplNeon;
+ using _SimdImpl = _SimdImplNeon<_VecBuiltin<_UsedBytes>>;
+ using _MaskImpl = _MaskImplNeon<_VecBuiltin<_UsedBytes>>;
+#else
+ using _CommonImpl = _CommonImplBuiltin;
+ using _SimdImpl = _SimdImplBuiltin<_VecBuiltin<_UsedBytes>>;
+ using _MaskImpl = _MaskImplBuiltin<_VecBuiltin<_UsedBytes>>;
+#endif
+
+ // }}}
+ // __traits {{{
+ template <typename _Tp>
+ using __traits = std::conditional_t<
+ _S_is_valid_v<_Tp>,
+ _GnuTraits<_Tp, _Tp, _VecBuiltin<_UsedBytes>, size<_Tp>>, _InvalidTraits>;
+ //}}}
+ // implicit masks {{{
+ template <typename _Tp>
+ static constexpr _SimdWrapper<_Tp, size<_Tp>> __implicit_mask()
+ {
+ constexpr auto __size = _S_full_size<_Tp>;
+ using _ImplicitMask = __vector_type_t<__int_for_sizeof_t<_Tp>, __size>;
+ return reinterpret_cast<__vector_type_t<_Tp, __size>>(
+ !_S_is_partial ? ~_ImplicitMask()
+ : __generate_vector<_ImplicitMask>([](auto __i) constexpr {
+ return __i < _UsedBytes / sizeof(_Tp) ? -1 : 0;
+ }));
+ }
+
+ template <typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+ static constexpr _Tp __masked(_Tp __x)
+ {
+ using _Up = typename _TVT::value_type;
+ if constexpr (_S_is_partial)
+ return __and(__as_vector(__x), __implicit_mask<_Up>()._M_data);
+ else
+ return __x;
+ }
+
+ template <typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+ static constexpr auto __make_padding_nonzero(_Tp __x)
+ {
+ if constexpr (!_S_is_partial)
+ return __x;
+ else
+ {
+ using _Up = typename _TVT::value_type;
+ if constexpr (std::is_integral_v<_Up>)
+ return __or(__x, ~__implicit_mask<_Up>()._M_data);
+ else
+ {
+ constexpr auto __one
+ = __andnot(__implicit_mask<_Up>()._M_data,
+ __vector_broadcast<_S_full_size<_Up>>(_Up(1)));
+ return __or(__x, __one);
+ }
+ }
+ }
+ // }}}
+};
+
+// }}}
+// simd_abi::_VecBltnBtmsk {{{
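+// Like _VecBuiltin, but simd_mask is stored as a bitmask (one bit per
+// element, as in AVX-512 mask registers) instead of a vector mask.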
+template <int _UsedBytes> struct simd_abi::_VecBltnBtmsk
+{
+ template <typename _Tp>
+ static constexpr size_t size = _UsedBytes / sizeof(_Tp);
+ template <typename _Tp>
+ static constexpr size_t _S_full_size
+ = sizeof(__vector_type_t<_Tp, size<_Tp>>) / sizeof(_Tp);
+ static constexpr bool _S_is_partial = (_UsedBytes & (_UsedBytes - 1)) != 0;
+
+ // validity traits {{{
+ struct _IsValidAbiTag : __bool_constant<(_UsedBytes > 1)>
+ {
+ };
+ template <typename _Tp>
+ struct _IsValidSizeFor
+ : __bool_constant<(_UsedBytes / sizeof(_Tp) > 1
+ && _UsedBytes % sizeof(_Tp) == 0 && _UsedBytes <= 64
+ && (_UsedBytes > 32 || __have_avx512vl))>
+ {
+ };
+  // Bitmasks require at least AVX512F. If sizeof(_Tp) < 4, AVX512BW is also
+  // required.
+ template <typename _Tp>
+ struct _IsValid
+ : conjunction<_IsValidAbiTag, __bool_constant<__have_avx512f>,
+ __bool_constant<__have_avx512bw || (sizeof(_Tp) >= 4)>,
+ __bool_constant<(__vectorized_sizeof<_Tp>() > sizeof(_Tp))>,
+ _IsValidSizeFor<_Tp>>
+ {
+ };
+ template <typename _Tp>
+ static constexpr bool _S_is_valid_v = _IsValid<_Tp>::value;
+
+ // }}}
+ // implicit mask {{{
+private:
+ template <typename _Tp> using _ImplicitMask = _SimdWrapper<bool, size<_Tp>>;
+
+public:
+ template <size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr __bool_storage_member_type_t<_Np>
+ __implicit_mask_n()
+ {
+ using _Tp = __bool_storage_member_type_t<_Np>;
+ return _Np < sizeof(_Tp) * CHAR_BIT ? _Tp((1ULL << _Np) - 1) : ~_Tp();
+ }
+
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _ImplicitMask<_Tp> __implicit_mask()
+ {
+ return __implicit_mask_n<size<_Tp>>();
+ }
+
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __masked(_SimdWrapper<_Tp, _Np> __x)
+ {
+ if constexpr (is_same_v<_Tp, bool>)
+ if constexpr (_S_is_partial || _Np < 8)
+ return _MaskImpl::__bit_and(__x, _SimdWrapper<_Tp, _Np>(
+ __bool_storage_member_type_t<_Np>(
+ (1ULL << _Np) - 1)));
+ else
+ return __x;
+ else
+ return __masked(__x._M_data);
+ }
+
+ template <typename _TV>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _TV __masked(_TV __x)
+ {
+ static_assert(
+ !__is_bitmask_v<_TV>,
+ "_VecBltnBtmsk::__masked cannot work on bitmasks, since it doesn't "
+ "know the number of elements. Use _SimdWrapper<bool, N> instead.");
+ if constexpr (_S_is_partial)
+ {
+ using _Tp = typename _VectorTraits<_TV>::value_type;
+ constexpr size_t _Np = size<_Tp>;
+ return __make_dependent_t<_TV, _CommonImpl>::_S_blend(
+ __implicit_mask<_Tp>(), _SimdWrapper<_Tp, _Np>(),
+ _SimdWrapper<_Tp, _Np>(__x));
+ }
+ else
+ return __x;
+ }
+
+ template <typename _TV, typename _TVT = _VectorTraits<_TV>>
+ static constexpr auto __make_padding_nonzero(_TV __x)
+ {
+ if constexpr (!_S_is_partial)
+ return __x;
+ else
+ {
+ using _Tp = typename _TVT::value_type;
+ constexpr size_t _Np = size<_Tp>;
+ if constexpr (is_integral_v<typename _TVT::value_type>)
+ return __x
+ | __generate_vector<_Tp, _S_full_size<_Tp>>(
+ [](auto __i) -> _Tp {
+ if (__i < _Np)
+ return 0;
+ else
+ return 1;
+ });
+ else
+ return __make_dependent_t<_TV, _CommonImpl>::_S_blend(
+ __implicit_mask<_Tp>(),
+ _SimdWrapper<_Tp, _Np>(
+ __vector_broadcast<_S_full_size<_Tp>>(_Tp(1))),
+ _SimdWrapper<_Tp, _Np>(__x))
+ ._M_data;
+ }
+ }
+
+ // }}}
+ // simd/_MaskImpl {{{
+#if _GLIBCXX_SIMD_X86INTRIN
+ using _CommonImpl = _CommonImplX86;
+ using _SimdImpl = _SimdImplX86<_VecBltnBtmsk<_UsedBytes>>;
+ using _MaskImpl = _MaskImplX86<_VecBltnBtmsk<_UsedBytes>>;
+#else
+ template <int> struct _MissingImpl;
+ using _CommonImpl = _MissingImpl<_UsedBytes>;
+ using _SimdImpl = _MissingImpl<_UsedBytes>;
+ using _MaskImpl = _MissingImpl<_UsedBytes>;
+#endif
+
+ // }}}
+ // __traits {{{
+ template <typename _Tp>
+ using __traits = std::conditional_t<
+ _S_is_valid_v<_Tp>,
+ _GnuTraits<_Tp, bool, _VecBltnBtmsk<_UsedBytes>, size<_Tp>>,
+ _InvalidTraits>;
+ //}}}
+};
+
+//}}}
+// _CommonImplBuiltin {{{
+struct _CommonImplBuiltin
+{
+ // __converts_via_decomposition{{{
+ // This lists all cases where a __vector_convert needs to fall back to
+ // conversion of individual scalars (i.e. decompose the input vector into
+ // scalars, convert, compose output vector). In those cases, __masked_load &
+ // __masked_store prefer to use the __bit_iteration implementation.
+ template <typename _From, typename _To, size_t _ToSize>
+ static inline constexpr bool __converts_via_decomposition_v
+ = sizeof(_From) != sizeof(_To);
+
+ // }}}
+ // _S_load{{{
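+  // Copies _M bytes from __p into the low bytes of a zero-initialized
+  // __vector_type_t<_Tp, _Np>; _Fp encodes the alignment that may be assumed
+  // for __p.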
+ template <typename _Tp, size_t _Np, size_t _M = _Np * sizeof(_Tp),
+ typename _Fp>
+ _GLIBCXX_SIMD_INTRINSIC static __vector_type_t<_Tp, _Np>
+ _S_load(const void* __p, _Fp)
+ {
+ static_assert(_Np > 1);
+ static_assert(_M % sizeof(_Tp) == 0);
+#ifdef _GLIBCXX_SIMD_WORKAROUND_PR90424
+ using _Up = conditional_t<
+ is_integral_v<_Tp>,
+ conditional_t<_M % 4 == 0, conditional_t<_M % 8 == 0, long long, int>,
+ conditional_t<_M % 2 == 0, short, signed char>>,
+ conditional_t<(_M < 8 || _Np % 2 == 1 || _Np == 2), _Tp, double>>;
+ using _V = __vector_type_t<_Up, _Np * sizeof(_Tp) / sizeof(_Up)>;
+#else // _GLIBCXX_SIMD_WORKAROUND_PR90424
+ using _V = __vector_type_t<_Tp, _Np>;
+#endif // _GLIBCXX_SIMD_WORKAROUND_PR90424
+ _V __r{};
+ static_assert(_M <= sizeof(_V));
+ if constexpr (std::is_same_v<_Fp, vector_aligned_tag>)
+ __p = __builtin_assume_aligned(__p, alignof(__vector_type_t<_Tp, _Np>));
+ else if constexpr (!std::is_same_v<_Fp, element_aligned_tag>)
+ __p = __builtin_assume_aligned(__p, _Fp::_S_alignment);
+
+ __builtin_memcpy(&__r, __p, _M);
+ return reinterpret_cast<__vector_type_t<_Tp, _Np>>(__r);
+ }
+
+ // }}}
+ // __store {{{
+ template <size_t _ReqBytes = 0, typename _Flags, typename _TV>
+ _GLIBCXX_SIMD_INTRINSIC static void __store(_TV __x, void* __addr, _Flags)
+ {
+ constexpr size_t _Bytes = _ReqBytes == 0 ? sizeof(__x) : _ReqBytes;
+ static_assert(sizeof(__x) >= _Bytes);
+
+ if constexpr (std::is_same_v<_Flags, vector_aligned_tag>)
+ __addr = __builtin_assume_aligned(__addr, alignof(_TV));
+ else if constexpr (!std::is_same_v<_Flags, element_aligned_tag>)
+ __addr = __builtin_assume_aligned(__addr, _Flags::_S_alignment);
+
+ if constexpr (__is_vector_type_v<_TV>)
+ {
+ using _Tp = typename _VectorTraits<_TV>::value_type;
+ constexpr size_t _Np = _Bytes / sizeof(_Tp);
+ static_assert(_Np * sizeof(_Tp) == _Bytes);
+
+#ifdef _GLIBCXX_SIMD_WORKAROUND_PR90424
+ using _Up = std::conditional_t<
+ (std::is_integral_v<_Tp> || _Bytes < 4),
+ std::conditional_t<(sizeof(__x) > sizeof(long long)), long long, _Tp>,
+ float>;
+ const auto __v = __vector_bitcast<_Up>(__x);
+#else // _GLIBCXX_SIMD_WORKAROUND_PR90424
+ const __vector_type_t<_Tp, _Np> __v = __x;
+#endif // _GLIBCXX_SIMD_WORKAROUND_PR90424
+
+ if constexpr ((_Bytes & (_Bytes - 1)) != 0)
+ {
+ constexpr size_t _MoreBytes = __next_power_of_2(_Bytes);
+ alignas(decltype(__v)) char __tmp[_MoreBytes];
+ __builtin_memcpy(__tmp, &__v, _MoreBytes);
+ __builtin_memcpy(__addr, __tmp, _Bytes);
+ }
+ else
+ __builtin_memcpy(__addr, &__v, _Bytes);
+ }
+ else
+ __builtin_memcpy(__addr, &__x, _Bytes);
+ }
+
+ template <typename _Flags, typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static void __store(_SimdWrapper<_Tp, _Np> __x,
+ void* __addr, _Flags)
+ {
+ __store<_Np * sizeof(_Tp)>(__x._M_data, __addr, _Flags());
+ }
+
+ // }}}
+ // __store_bool_array(_BitMask) {{{
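+  // Expands a bitmask into an array of bool. The multiplications below spread
+  // the packed bits apart (e.g. for two bits, (__bits * 0x81) & 0x0101 places
+  // bit 0 in the least significant byte and bit 1 in the next byte), so a
+  // group of bools can be written with a single __store.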
+ template <size_t _Np, typename _Flags, bool _Sanitized>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr void
+ __store_bool_array(_BitMask<_Np, _Sanitized> __x, bool* __mem, _Flags)
+ {
+ if constexpr (_Np == 1)
+ __mem[0] = __x[0];
+ else if constexpr (_Np == 2)
+ {
+ short __bool2 = (__x._M_to_bits() * 0x81) & 0x0101;
+ __store<_Np>(__bool2, __mem, _Flags());
+ }
+ else if constexpr (_Np == 3)
+ {
+ int __bool3 = (__x._M_to_bits() * 0x4081) & 0x010101;
+ __store<_Np>(__bool3, __mem, _Flags());
+ }
+ else
+ {
+ __execute_n_times<__div_roundup(_Np, 4)>([&](auto __i) {
+ constexpr int __offset = __i * 4;
+ constexpr int __remaining = _Np - __offset;
+ if constexpr (__remaining > 4 && __remaining <= 7)
+ {
+ const _ULLong __bool7
+ = (__x.template _M_extract<__offset>()._M_to_bits()
+ * 0x40810204081ULL)
+ & 0x0101010101010101ULL;
+ __store<__remaining>(__bool7, __mem + __offset, _Flags());
+ }
+ else if constexpr (__remaining >= 4)
+ {
+ int __bits = __x.template _M_extract<__offset>()._M_to_bits();
+ if constexpr (__remaining > 7)
+ __bits &= 0xf;
+ const int __bool4 = (__bits * 0x204081) & 0x01010101;
+ __store<4>(__bool4, __mem + __offset, _Flags());
+ }
+ });
+ }
+ }
+
+ // }}}
+ // _S_blend{{{
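+  // Element-wise selection: returns __at1 where __k is true (i.e. where all
+  // bits of the element are set) and __at0 elsewhere, using the vector
+  // ternary operator.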
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr auto
+ _S_blend(_SimdWrapper<_Tp, _Np> __k, _SimdWrapper<_Tp, _Np> __at0,
+ _SimdWrapper<_Tp, _Np> __at1)
+ {
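+    // Element-wise select: where the corresponding __k element is nonzero
+    // (all-ones for true) the result takes __at1, otherwise __at0.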
+ return __vector_bitcast<__int_for_sizeof_t<_Tp>>(__k) ? __at1._M_data
+ : __at0._M_data;
+ }
+
+ // }}}
+};
+
+// }}}
+// _SimdImplBuiltin {{{1
+template <typename _Abi> struct _SimdImplBuiltin
+{
+ // member types {{{2
+ template <typename _Tp> static constexpr size_t _S_max_store_size = 16;
+ using abi_type = _Abi;
+ template <typename _Tp> using _TypeTag = _Tp*;
+ template <typename _Tp>
+ using _SimdMember = typename _Abi::template __traits<_Tp>::_SimdMember;
+ template <typename _Tp>
+ using _MaskMember = typename _Abi::template __traits<_Tp>::_MaskMember;
+ template <typename _Tp>
+ static constexpr size_t _S_size = _Abi::template size<_Tp>;
+ template <typename _Tp>
+ static constexpr size_t _S_full_size = _Abi::template _S_full_size<_Tp>;
+ using _CommonImpl = typename _Abi::_CommonImpl;
+ using _SuperImpl = typename _Abi::_SimdImpl;
+ using _MaskImpl = typename _Abi::_MaskImpl;
+
+ // __make_simd(_SimdWrapper/__intrinsic_type_t) {{{2
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static simd<_Tp, _Abi>
+ __make_simd(_SimdWrapper<_Tp, _Np> __x)
+ {
+ return {__private_init, __x};
+ }
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static simd<_Tp, _Abi>
+ __make_simd(__intrinsic_type_t<_Tp, _Np> __x)
+ {
+ return {__private_init, __vector_bitcast<_Tp>(__x)};
+ }
+
+ // __broadcast {{{2
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdMember<_Tp>
+ __broadcast(_Tp __x) noexcept
+ {
+ return __vector_broadcast<_S_full_size<_Tp>>(__x);
+ }
+
+ // __generator {{{2
+ template <typename _Fp, typename _Tp>
+ inline static constexpr _SimdMember<_Tp> __generator(_Fp&& __gen,
+ _TypeTag<_Tp>)
+ {
+ return __generate_vector<_Tp, _S_full_size<_Tp>>([&](auto __i) constexpr {
+ if constexpr (__i < _S_size<_Tp>)
+ return __gen(__i);
+ else
+ return 0;
+ });
+ }
+
+ // __load {{{2
+ template <typename _Tp, typename _Up, typename _Fp>
+ _GLIBCXX_SIMD_INTRINSIC static _SimdMember<_Tp> __load(const _Up* __mem, _Fp,
+ _TypeTag<_Tp>) noexcept
+ {
+ constexpr size_t _Np = _S_size<_Tp>;
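+    // Largest chunk handled by a single load-and-convert: 64 bytes with
+    // AVX-512 (AVX-512BW for sub-4-byte elements), 32 bytes with AVX
+    // (floating-point elements) or AVX2, otherwise 16 bytes.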
+ constexpr size_t __max_load_size
+ = (sizeof(_Up) >= 4 && __have_avx512f) || __have_avx512bw
+ ? 64
+ : (std::is_floating_point_v<_Up> && __have_avx) || __have_avx2 ? 32
+ : 16;
+ constexpr size_t __bytes_to_load = sizeof(_Up) * _Np;
+ if constexpr (sizeof(_Up) > 8)
+ return __generate_vector<_Tp, _SimdMember<_Tp>::_S_width>([&](
+ auto __i) constexpr {
+ return static_cast<_Tp>(__i < _Np ? __mem[__i] : 0);
+ });
+ else if constexpr (std::is_same_v<_Up, _Tp>)
+ return _CommonImpl::template _S_load<_Tp, _S_full_size<_Tp>,
+ _Np * sizeof(_Tp)>(__mem, _Fp());
+ else if constexpr (__bytes_to_load <= __max_load_size)
+ return __convert<_SimdMember<_Tp>>(
+ _CommonImpl::template _S_load<_Up, _Np>(__mem, _Fp()));
+ else if constexpr (__bytes_to_load % __max_load_size == 0)
+ {
+ constexpr size_t __n_loads = __bytes_to_load / __max_load_size;
+ constexpr size_t __elements_per_load = _Np / __n_loads;
+ return __call_with_n_evaluations<__n_loads>(
+ [](auto... __uncvted) {
+ return __convert<_SimdMember<_Tp>>(__uncvted...);
+ },
+ [&](auto __i) {
+ return _CommonImpl::template _S_load<_Up, __elements_per_load>(
+ __mem + __i * __elements_per_load, _Fp());
+ });
+ }
+ else if constexpr (__bytes_to_load % (__max_load_size / 2) == 0
+ && __max_load_size > 16)
+ { // e.g. int[] -> <char, 12> with AVX2
+ constexpr size_t __n_loads = __bytes_to_load / (__max_load_size / 2);
+ constexpr size_t __elements_per_load = _Np / __n_loads;
+ return __call_with_n_evaluations<__n_loads>(
+ [](auto... __uncvted) {
+ return __convert<_SimdMember<_Tp>>(__uncvted...);
+ },
+ [&](auto __i) {
+ return _CommonImpl::template _S_load<_Up, __elements_per_load>(
+ __mem + __i * __elements_per_load, _Fp());
+ });
+ }
+ else // e.g. int[] -> <char, 9>
+ return __call_with_subscripts(
+ __mem, make_index_sequence<_Np>(), [](auto... __args) {
+ return __vector_type_t<_Tp, _S_full_size<_Tp>>{
+ static_cast<_Tp>(__args)...};
+ });
+ }
+
+ // __masked_load {{{2
+ template <typename _Tp, size_t _Np, typename _Up, typename _Fp>
+ static inline _SimdWrapper<_Tp, _Np>
+ __masked_load(_SimdWrapper<_Tp, _Np> __merge, _MaskMember<_Tp> __k,
+ const _Up* __mem, _Fp) noexcept
+ {
+ _BitOps::__bit_iteration(_MaskImpl::__to_bits(__k), [&](auto __i) {
+ __merge.__set(__i, static_cast<_Tp>(__mem[__i]));
+ });
+ return __merge;
+ }
+
+ // __store {{{2
+ template <typename _Tp, typename _Up, typename _Fp>
+ _GLIBCXX_SIMD_INTRINSIC static void __store(_SimdMember<_Tp> __v, _Up* __mem,
+ _Fp, _TypeTag<_Tp>) noexcept
+ {
+ // TODO: converting int -> "smaller int" can be optimized with AVX512
+ constexpr size_t _Np = _S_size<_Tp>;
+ constexpr size_t __max_store_size
+ = _SuperImpl::template _S_max_store_size<_Up>;
+ if constexpr (sizeof(_Up) > 8)
+ __execute_n_times<_Np>([&](auto __i) constexpr {
+ __mem[__i] = __v[__i];
+ });
+ else if constexpr (std::is_same_v<_Up, _Tp>)
+ _CommonImpl::__store(__v, __mem, _Fp());
+ else if constexpr (sizeof(_Up) * _Np <= __max_store_size)
+ _CommonImpl::__store(_SimdWrapper<_Up, _Np>(__convert<_Up>(__v)), __mem,
+ _Fp());
+ else
+ {
+ constexpr size_t __vsize = __max_store_size / sizeof(_Up);
+ // round up to convert the last partial vector as well:
+ constexpr size_t __stores = __div_roundup(_Np, __vsize);
+ constexpr size_t __full_stores = _Np / __vsize;
+ using _V = __vector_type_t<_Up, __vsize>;
+ const std::array<_V, __stores> __converted
+ = __convert_all<_V, __stores>(__v);
+ __execute_n_times<__full_stores>([&](auto __i) constexpr {
+ _CommonImpl::__store(__converted[__i], __mem + __i * __vsize, _Fp());
+ });
+ if constexpr (__full_stores < __stores)
+ _CommonImpl::template __store<(_Np - __full_stores * __vsize)
+ * sizeof(_Up)>(
+ __converted[__full_stores], __mem + __full_stores * __vsize, _Fp());
+ }
+ }
+
+ // __masked_store_nocvt {{{2
+ template <typename _Tp, std::size_t _Np, typename _Fp>
+ _GLIBCXX_SIMD_INTRINSIC static void
+ __masked_store_nocvt(_SimdWrapper<_Tp, _Np> __v, _Tp* __mem, _Fp,
+ _SimdWrapper<_Tp, _Np> __k)
+ {
+ _BitOps::__bit_iteration(
+ _MaskImpl::__to_bits(__k), [&](auto __i) constexpr {
+ __mem[__i] = __v[__i];
+ });
+ }
+
+ // __masked_store {{{2
+ template <typename _TW, typename _TVT = _VectorTraits<_TW>,
+ typename _Tp = typename _TVT::value_type, typename _Up,
+ typename _Fp>
+ static inline void __masked_store(const _TW __v, _Up* __mem, _Fp,
+ const _MaskMember<_Tp> __k) noexcept
+ {
+ constexpr size_t _TV_size = _S_size<_Tp>;
+ [[maybe_unused]] const auto __vi = __to_intrin(__v);
+ constexpr size_t __max_store_size
+ = _SuperImpl::template _S_max_store_size<_Up>;
+    if constexpr (std::is_same_v<_Tp, _Up>
+		  || (std::is_integral_v<_Tp> && std::is_integral_v<_Up>
+		      && sizeof(_Tp) == sizeof(_Up)))
+ {
+ // bitwise or no conversion, reinterpret:
+ const auto __kk = [&]() {
+ if constexpr (__is_bitmask_v<decltype(__k)>)
+ return _MaskMember<_Up>(__k._M_data);
+ else
+ return __wrapper_bitcast<_Up>(__k);
+ }();
+ _SuperImpl::__masked_store_nocvt(__wrapper_bitcast<_Up>(__v), __mem,
+ _Fp(), __kk);
+ }
+ else if constexpr (__vectorized_sizeof<_Up>() > sizeof(_Up)
+ && !_CommonImpl::template __converts_via_decomposition_v<
+ _Tp, _Up, __max_store_size>)
+ { // conversion via decomposition is better handled via the bit_iteration
+ // fallback below
+ constexpr size_t _UW_size
+ = std::min(_TV_size, __max_store_size / sizeof(_Up));
+ static_assert(_UW_size <= _TV_size);
+ using _UW = _SimdWrapper<_Up, _UW_size>;
+ using _UV = __vector_type_t<_Up, _UW_size>;
+ using _UAbi = simd_abi::deduce_t<_Up, _UW_size>;
+ if constexpr (_UW_size == _TV_size) // one convert+store
+ {
+ const _UW __converted = __convert<_UW>(__v);
+ _SuperImpl::__masked_store_nocvt(
+ __converted, __mem, _Fp(),
+ _UAbi::_MaskImpl::template __convert<_Up>(__k));
+ }
+ else
+ {
+ static_assert(_UW_size * sizeof(_Up) == __max_store_size);
+ constexpr size_t _NFullStores = _TV_size / _UW_size;
+ constexpr size_t _NAllStores = __div_roundup(_TV_size, _UW_size);
+ constexpr size_t _NParts = _S_full_size<_Tp> / _UW_size;
+ const std::array<_UV, _NAllStores> __converted
+ = __convert_all<_UV, _NAllStores>(__v);
+ __execute_n_times<_NFullStores>([&](auto __i) {
+ _SuperImpl::__masked_store_nocvt(
+ _UW(__converted[__i]), __mem + __i * _UW_size, _Fp(),
+ _UAbi::_MaskImpl::template __convert<_Up>(
+ __extract_part<__i, _NParts>(__k.__as_full_vector())));
+ });
+ if constexpr (_NAllStores > _NFullStores) // one partial at the end
+ _SuperImpl::__masked_store_nocvt(
+ _UW(__converted[_NFullStores]), __mem + _NFullStores * _UW_size,
+ _Fp(),
+ _UAbi::_MaskImpl::template __convert<_Up>(
+ __extract_part<_NFullStores, _NParts>(
+ __k.__as_full_vector())));
+ }
+ }
+ else
+ _BitOps::__bit_iteration(
+ _MaskImpl::__to_bits(__k), [&](auto __i) constexpr {
+ __mem[__i] = static_cast<_Up>(__v[__i]);
+ });
+ }
+
+ // __complement {{{2
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __complement(_SimdWrapper<_Tp, _Np> __x) noexcept
+ {
+ return ~__x._M_data;
+ }
+
+ // __unary_minus {{{2
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __unary_minus(_SimdWrapper<_Tp, _Np> __x) noexcept
+ {
+ // GCC doesn't use the psign instructions, but pxor & psub seem to be just
+ // as good a choice as pcmpeqd & psign. So meh.
+ return -__x._M_data;
+ }
+
+ // arithmetic operators {{{2
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __plus(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y)
+ {
+ return __x._M_data + __y._M_data;
+ }
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __minus(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y)
+ {
+ return __x._M_data - __y._M_data;
+ }
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __multiplies(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y)
+ {
+ return __x._M_data * __y._M_data;
+ }
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __divides(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y)
+ {
+    // Note that division by 0 is always UB, so we must avoid dividing by the
+    // zero padding elements of partial registers
+ if constexpr (!_Abi::_S_is_partial)
+ return __x._M_data / __y._M_data;
+ else
+ return __as_vector(__x) / _Abi::__make_padding_nonzero(__as_vector(__y));
+ }
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __modulus(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y)
+ {
+ if constexpr (!_Abi::_S_is_partial)
+ return __x._M_data % __y._M_data;
+ else
+ return __as_vector(__x) % _Abi::__make_padding_nonzero(__as_vector(__y));
+ }
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __bit_and(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y)
+ {
+ return __and(__x._M_data, __y._M_data);
+ }
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __bit_or(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y)
+ {
+ return __or(__x._M_data, __y._M_data);
+ }
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __bit_xor(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y)
+ {
+ return __xor(__x._M_data, __y._M_data);
+ }
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static _SimdWrapper<_Tp, _Np>
+ __bit_shift_left(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y)
+ {
+ return __x._M_data << __y._M_data;
+ }
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static _SimdWrapper<_Tp, _Np>
+ __bit_shift_right(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y)
+ {
+#ifdef _GLIBCXX_SIMD_WORKAROUND_XXX_5
+ if constexpr (sizeof(_Tp) == 8)
+ return __generate_vector<__vector_type_t<_Tp, _Np>>([&](auto __i) {
+ return __x._M_data[__i.value] >> __y._M_data[__i.value];
+ });
+ else
+#endif
+ return __x._M_data >> __y._M_data;
+ }
+
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __bit_shift_left(_SimdWrapper<_Tp, _Np> __x, int __y)
+ {
+ // The behavior is undefined if the right operand is negative, or greater
+ // than or equal to the width of the promoted left operand.
+ if (__y < 0 || __y >= sizeof(std::declval<_Tp>() << __y) * CHAR_BIT)
+ __builtin_unreachable();
+ else if (__builtin_constant_p(__y) && __y >= sizeof(_Tp) * CHAR_BIT)
+ return {};
+ else
+ return __x._M_data << __y;
+ }
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __bit_shift_right(_SimdWrapper<_Tp, _Np> __x, int __y)
+ {
+ if (__y < 0 || __y >= sizeof(std::declval<_Tp>() >> __y) * CHAR_BIT)
+ __builtin_unreachable();
+ else if (__builtin_constant_p(__y) && __y >= sizeof(_Tp) * CHAR_BIT
+ && is_unsigned_v<_Tp>)
+ return {};
+ else
+ return __x._M_data >> __y;
+ }
+
+ // compares {{{2
+ // __equal_to {{{3
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp>
+ __equal_to(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y)
+ {
+ return __vector_bitcast<_Tp>(__x._M_data == __y._M_data);
+ }
+
+ // __not_equal_to {{{3
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp>
+ __not_equal_to(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y)
+ {
+ return __vector_bitcast<_Tp>(__x._M_data != __y._M_data);
+ }
+
+ // __less {{{3
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp>
+ __less(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y)
+ {
+ return __vector_bitcast<_Tp>(__x._M_data < __y._M_data);
+ }
+
+ // __less_equal {{{3
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp>
+ __less_equal(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y)
+ {
+ return __vector_bitcast<_Tp>(__x._M_data <= __y._M_data);
+ }
+
+ // negation {{{2
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp>
+ __negate(_SimdWrapper<_Tp, _Np> __x) noexcept
+ {
+ return __vector_bitcast<_Tp>(!__x._M_data);
+ }
+
+ // __min, __max, __minmax {{{2
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_NORMAL_MATH
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __min(_SimdWrapper<_Tp, _Np> __a, _SimdWrapper<_Tp, _Np> __b)
+ {
+ return __a._M_data < __b._M_data ? __a._M_data : __b._M_data;
+ }
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_NORMAL_MATH
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __max(_SimdWrapper<_Tp, _Np> __a, _SimdWrapper<_Tp, _Np> __b)
+ {
+ return __a._M_data > __b._M_data ? __a._M_data : __b._M_data;
+ }
+
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_NORMAL_MATH
+ _GLIBCXX_SIMD_INTRINSIC static constexpr std::pair<_SimdWrapper<_Tp, _Np>,
+ _SimdWrapper<_Tp, _Np>>
+ __minmax(_SimdWrapper<_Tp, _Np> __a, _SimdWrapper<_Tp, _Np> __b)
+ {
+ return {__a._M_data < __b._M_data ? __a._M_data : __b._M_data,
+ __a._M_data < __b._M_data ? __b._M_data : __a._M_data};
+ }
+
+ // reductions {{{2
+ template <size_t _Np, size_t... _Is, size_t... _Zeros, typename _Tp,
+ typename _BinaryOperation>
+ _GLIBCXX_SIMD_INTRINSIC static _Tp
+ __reduce_partial(std::index_sequence<_Is...>, std::index_sequence<_Zeros...>,
+ simd<_Tp, _Abi> __x, _BinaryOperation&& __binary_op)
+ {
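+    // Combine the lower _Np/2 elements with the upper elements (moved to the
+    // front via __vector_permute; the -1 indices presumably mean "don't
+    // care") and reduce the resulting half-sized simd.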
+ using _V = __vector_type_t<_Tp, _Np / 2>;
+ static_assert(sizeof(_V) <= sizeof(__x));
+    // _S_width is the number of elements in the smallest native SIMD
+    // register that can store _Np/2 elements:
+ using _FullSimd = __deduced_simd<_Tp, _VectorTraits<_V>::_S_width>;
+ using _HalfSimd = __deduced_simd<_Tp, _Np / 2>;
+ const auto __xx = __as_vector(__x);
+ return _HalfSimd::abi_type::_SimdImpl::__reduce(
+ static_cast<_HalfSimd>(__as_vector(__binary_op(
+ static_cast<_FullSimd>(__intrin_bitcast<_V>(__xx)),
+ static_cast<_FullSimd>(__intrin_bitcast<_V>(
+ __vector_permute<(_Np / 2 + _Is)..., (int(_Zeros * 0) - 1)...>(
+ __xx)))))),
+ __binary_op);
+ }
+
+ template <typename _Tp, typename _BinaryOperation>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _Tp
+ __reduce(simd<_Tp, _Abi> __x, _BinaryOperation&& __binary_op)
+ {
+ constexpr size_t _Np = simd_size_v<_Tp, _Abi>;
+ if constexpr (_Np == 1)
+ return __x[0];
+ else if constexpr (_Np == 2)
+ return __binary_op(simd<_Tp, simd_abi::scalar>(__x[0]),
+ simd<_Tp, simd_abi::scalar>(__x[1]))[0];
+ else if constexpr (_Abi::_S_is_partial) //{{{
+ {
+ [[maybe_unused]] constexpr auto __full_size
+ = _Abi::template _S_full_size<_Tp>;
+ if constexpr (_Np == 3)
+ return __binary_op(__binary_op(simd<_Tp, simd_abi::scalar>(__x[0]),
+ simd<_Tp, simd_abi::scalar>(__x[1])),
+ simd<_Tp, simd_abi::scalar>(__x[2]))[0];
+ else if constexpr (std::is_same_v<__remove_cvref_t<_BinaryOperation>,
+ std::plus<>>)
+ {
+ using _Ap = simd_abi::deduce_t<_Tp, __full_size>;
+ return _Ap::_SimdImpl::__reduce(
+ simd<_Tp, _Ap>(__private_init, _Abi::__masked(__as_vector(__x))),
+ __binary_op);
+ }
+ else if constexpr (std::is_same_v<__remove_cvref_t<_BinaryOperation>,
+ std::multiplies<>>)
+ {
+ using _Ap = simd_abi::deduce_t<_Tp, __full_size>;
+ using _TW = _SimdWrapper<_Tp, __full_size>;
+ constexpr auto __implicit_mask_full
+ = _Abi::template __implicit_mask<_Tp>().__as_full_vector();
+ constexpr _TW __one = __vector_broadcast<__full_size>(_Tp(1));
+ const _TW __x_full = __data(__x).__as_full_vector();
+ const _TW __x_padded_with_ones
+ = _Ap::_CommonImpl::_S_blend(__implicit_mask_full, __one,
+ __x_full);
+ return _Ap::_SimdImpl::__reduce(
+ simd<_Tp, _Ap>(__private_init, __x_padded_with_ones),
+ __binary_op);
+ }
+ else if constexpr (_Np & 1)
+ {
+ using _Ap = simd_abi::deduce_t<_Tp, _Np - 1>;
+ return __binary_op(
+ simd<_Tp, simd_abi::scalar>(_Ap::_SimdImpl::__reduce(
+ simd<_Tp, _Ap>(__intrin_bitcast<__vector_type_t<_Tp, _Np - 1>>(
+ __as_vector(__x))),
+ __binary_op)),
+ simd<_Tp, simd_abi::scalar>(__x[_Np - 1]))[0];
+ }
+ else
+ return __reduce_partial<_Np>(
+ std::make_index_sequence<_Np / 2>(),
+ std::make_index_sequence<__full_size - _Np / 2>(), __x,
+ __binary_op);
+ } //}}}
+ else if constexpr (sizeof(__x) == 16) //{{{
+ {
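+        // Reduce within the 16-byte register by repeatedly combining its
+        // lower and upper halves with __binary_op (via permutes that move
+        // the upper elements into the lower positions) until element 0
+        // holds the result.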
+ if constexpr (_Np == 16)
+ {
+ const auto __y = __data(__x);
+ __x = __binary_op(
+ __make_simd<_Tp, _Np>(__vector_permute<0, 0, 1, 1, 2, 2, 3, 3, 4,
+ 4, 5, 5, 6, 6, 7, 7>(__y)),
+ __make_simd<_Tp, _Np>(
+ __vector_permute<8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14,
+ 14, 15, 15>(__y)));
+ }
+ if constexpr (_Np >= 8)
+ {
+ const auto __y = __vector_bitcast<short>(__data(__x));
+ __x
+ = __binary_op(__make_simd<_Tp, _Np>(__vector_bitcast<_Tp>(
+ __vector_permute<0, 0, 1, 1, 2, 2, 3, 3>(__y))),
+ __make_simd<_Tp, _Np>(__vector_bitcast<_Tp>(
+ __vector_permute<4, 4, 5, 5, 6, 6, 7, 7>(__y))));
+ }
+ if constexpr (_Np >= 4)
+ {
+ using _Up
+ = std::conditional_t<std::is_floating_point_v<_Tp>, float, int>;
+ const auto __y = __vector_bitcast<_Up>(__data(__x));
+ __x = __binary_op(__x, __make_simd<_Tp, _Np>(__vector_bitcast<_Tp>(
+ __vector_permute<3, 2, 1, 0>(__y))));
+ }
+ using _Up
+ = std::conditional_t<std::is_floating_point_v<_Tp>, double, _LLong>;
+ const auto __y = __vector_bitcast<_Up>(__data(__x));
+ __x = __binary_op(__x, __make_simd<_Tp, _Np>(__vector_bitcast<_Tp>(
+ __vector_permute<1, 1>(__y))));
+ return __x[0];
+ } //}}}
+ else
+ {
+ static_assert(sizeof(__x) > __min_vector_size<_Tp>);
+ static_assert((_Np & (_Np - 1)) == 0); // _Np must be a power of 2
+ using _Ap = simd_abi::deduce_t<_Tp, _Np / 2>;
+ using _V = std::experimental::simd<_Tp, _Ap>;
+ return _Ap::_SimdImpl::__reduce(
+ __binary_op(_V(__private_init, __extract<0, 2>(__as_vector(__x))),
+ _V(__private_init, __extract<1, 2>(__as_vector(__x)))),
+ static_cast<_BinaryOperation&&>(__binary_op));
+ }
+ }
+
+ // math {{{2
+ // frexp, modf and copysign implemented in simd_math.h
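+  // The fallback macros below implement these functions element-wise by
+  // calling the scalar std:: function for each element; ABI-specific
+  // implementations may shadow them with vectorized overloads.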
+#define _GLIBCXX_SIMD_MATH_FALLBACK(__name) \
+ template <typename _Tp, typename... _More> \
+ static _Tp __##__name(const _Tp& __x, const _More&... __more) \
+ { \
+ return __generate_vector<_Tp>( \
+ [&](auto __i) { return std::__name(__x[__i], __more[__i]...); }); \
+ }
+
+#define _GLIBCXX_SIMD_MATH_FALLBACK_MASKRET(__name) \
+ template <typename _Tp, typename... _More> \
+ static \
+ typename _Tp::mask_type __##__name(const _Tp& __x, const _More&... __more) \
+ { \
+ return __generate_vector<_Tp>( \
+ [&](auto __i) { return std::__name(__x[__i], __more[__i]...); }); \
+ }
+
+#define _GLIBCXX_SIMD_MATH_FALLBACK_FIXEDRET(_RetTp, __name) \
+ template <typename _Tp, typename... _More> \
+ static auto __##__name(const _Tp& __x, const _More&... __more) \
+ { \
+ return __fixed_size_storage_t<_RetTp, \
+ _VectorTraits<_Tp>::_S_partial_width>:: \
+ __generate([&](auto __meta) constexpr { \
+ return __meta.__generator( \
+ [&](auto __i) { \
+ return std::__name(__x[__meta._S_offset + __i], \
+ __more[__meta._S_offset + __i]...); \
+ }, \
+ static_cast<_RetTp*>(nullptr)); \
+ }); \
+ }
+
+ _GLIBCXX_SIMD_MATH_FALLBACK(acos)
+ _GLIBCXX_SIMD_MATH_FALLBACK(asin)
+ _GLIBCXX_SIMD_MATH_FALLBACK(atan)
+ _GLIBCXX_SIMD_MATH_FALLBACK(atan2)
+ _GLIBCXX_SIMD_MATH_FALLBACK(cos)
+ _GLIBCXX_SIMD_MATH_FALLBACK(sin)
+ _GLIBCXX_SIMD_MATH_FALLBACK(tan)
+ _GLIBCXX_SIMD_MATH_FALLBACK(acosh)
+ _GLIBCXX_SIMD_MATH_FALLBACK(asinh)
+ _GLIBCXX_SIMD_MATH_FALLBACK(atanh)
+ _GLIBCXX_SIMD_MATH_FALLBACK(cosh)
+ _GLIBCXX_SIMD_MATH_FALLBACK(sinh)
+ _GLIBCXX_SIMD_MATH_FALLBACK(tanh)
+ _GLIBCXX_SIMD_MATH_FALLBACK(exp)
+ _GLIBCXX_SIMD_MATH_FALLBACK(exp2)
+ _GLIBCXX_SIMD_MATH_FALLBACK(expm1)
+ _GLIBCXX_SIMD_MATH_FALLBACK(ldexp)
+ _GLIBCXX_SIMD_MATH_FALLBACK_FIXEDRET(int, ilogb)
+ _GLIBCXX_SIMD_MATH_FALLBACK(log)
+ _GLIBCXX_SIMD_MATH_FALLBACK(log10)
+ _GLIBCXX_SIMD_MATH_FALLBACK(log1p)
+ _GLIBCXX_SIMD_MATH_FALLBACK(log2)
+ _GLIBCXX_SIMD_MATH_FALLBACK(logb)
+
+ // modf implemented in simd_math.h
+ _GLIBCXX_SIMD_MATH_FALLBACK(scalbn)
+ _GLIBCXX_SIMD_MATH_FALLBACK(scalbln)
+ _GLIBCXX_SIMD_MATH_FALLBACK(cbrt)
+ _GLIBCXX_SIMD_MATH_FALLBACK(fabs)
+ _GLIBCXX_SIMD_MATH_FALLBACK(pow)
+ _GLIBCXX_SIMD_MATH_FALLBACK(sqrt)
+ _GLIBCXX_SIMD_MATH_FALLBACK(erf)
+ _GLIBCXX_SIMD_MATH_FALLBACK(erfc)
+ _GLIBCXX_SIMD_MATH_FALLBACK(lgamma)
+ _GLIBCXX_SIMD_MATH_FALLBACK(tgamma)
+
+ _GLIBCXX_SIMD_MATH_FALLBACK_FIXEDRET(long, lrint)
+ _GLIBCXX_SIMD_MATH_FALLBACK_FIXEDRET(long long, llrint)
+
+ _GLIBCXX_SIMD_MATH_FALLBACK_FIXEDRET(long, lround)
+ _GLIBCXX_SIMD_MATH_FALLBACK_FIXEDRET(long long, llround)
+
+ _GLIBCXX_SIMD_MATH_FALLBACK(fmod)
+ _GLIBCXX_SIMD_MATH_FALLBACK(remainder)
+
+ template <typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+ static _Tp __remquo(const _Tp __x, const _Tp __y,
+ __fixed_size_storage_t<int, _TVT::_S_partial_width>* __z)
+ {
+ return __generate_vector<_Tp>([&](auto __i) {
+ int __tmp;
+ auto __r = std::remquo(__x[__i], __y[__i], &__tmp);
+ __z->__set(__i, __tmp);
+ return __r;
+ });
+ }
+
+ // copysign in simd_math.h
+ _GLIBCXX_SIMD_MATH_FALLBACK(nextafter)
+ _GLIBCXX_SIMD_MATH_FALLBACK(fdim)
+ _GLIBCXX_SIMD_MATH_FALLBACK(fmax)
+ _GLIBCXX_SIMD_MATH_FALLBACK(fmin)
+ _GLIBCXX_SIMD_MATH_FALLBACK(fma)
+
+ template <typename _Tp, size_t _Np>
+ static constexpr auto __isgreater(_SimdWrapper<_Tp, _Np> __x,
+ _SimdWrapper<_Tp, _Np> __y) noexcept
+ {
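+    // Map the IEEE sign-magnitude bit patterns to integers with the same
+    // ordering: for negative inputs the magnitude bits are negated, so a
+    // signed integer compare matches the floating-point compare. NaNs are
+    // excluded via __isunordered.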
+ using _Ip = __int_for_sizeof_t<_Tp>;
+ const auto __xn = __vector_bitcast<_Ip>(__x);
+ const auto __yn = __vector_bitcast<_Ip>(__y);
+ const auto __xp = __xn < 0 ? -(__xn & numeric_limits<_Ip>::max()) : __xn;
+ const auto __yp = __yn < 0 ? -(__yn & numeric_limits<_Ip>::max()) : __yn;
+ return __and(__not(_SuperImpl::__isunordered(__x, __y)),
+ __vector_bitcast<_Tp>(__xp > __yp));
+ }
+ template <typename _Tp, size_t _Np>
+ static constexpr auto __isgreaterequal(_SimdWrapper<_Tp, _Np> __x,
+ _SimdWrapper<_Tp, _Np> __y) noexcept
+ {
+ using _Ip = __int_for_sizeof_t<_Tp>;
+ const auto __xn = __vector_bitcast<_Ip>(__x);
+ const auto __yn = __vector_bitcast<_Ip>(__y);
+ const auto __xp = __xn < 0 ? -(__xn & numeric_limits<_Ip>::max()) : __xn;
+ const auto __yp = __yn < 0 ? -(__yn & numeric_limits<_Ip>::max()) : __yn;
+ return __and(__not(_SuperImpl::__isunordered(__x, __y)),
+ __vector_bitcast<_Tp>(__xp >= __yp));
+ }
+ template <typename _Tp, size_t _Np>
+ static constexpr auto __isless(_SimdWrapper<_Tp, _Np> __x,
+ _SimdWrapper<_Tp, _Np> __y) noexcept
+ {
+ using _Ip = __int_for_sizeof_t<_Tp>;
+ const auto __xn = __vector_bitcast<_Ip>(__x);
+ const auto __yn = __vector_bitcast<_Ip>(__y);
+ const auto __xp = __xn < 0 ? -(__xn & numeric_limits<_Ip>::max()) : __xn;
+ const auto __yp = __yn < 0 ? -(__yn & numeric_limits<_Ip>::max()) : __yn;
+ return __and(__not(_SuperImpl::__isunordered(__x, __y)),
+ __vector_bitcast<_Tp>(__xp < __yp));
+ }
+ template <typename _Tp, size_t _Np>
+ static constexpr auto __islessequal(_SimdWrapper<_Tp, _Np> __x,
+ _SimdWrapper<_Tp, _Np> __y) noexcept
+ {
+ using _Ip = __int_for_sizeof_t<_Tp>;
+ const auto __xn = __vector_bitcast<_Ip>(__x);
+ const auto __yn = __vector_bitcast<_Ip>(__y);
+ const auto __xp = __xn < 0 ? -(__xn & numeric_limits<_Ip>::max()) : __xn;
+ const auto __yp = __yn < 0 ? -(__yn & numeric_limits<_Ip>::max()) : __yn;
+ return __and(__not(_SuperImpl::__isunordered(__x, __y)),
+ __vector_bitcast<_Tp>(__xp <= __yp));
+ }
+ template <typename _Tp, size_t _Np>
+ static constexpr auto __islessgreater(_SimdWrapper<_Tp, _Np> __x,
+ _SimdWrapper<_Tp, _Np> __y) noexcept
+ {
+ return __and(__not(_SuperImpl::__isunordered(__x, __y)),
+ _SuperImpl::__not_equal_to(__x, __y));
+ }
+
+#undef _GLIBCXX_SIMD_MATH_FALLBACK
+#undef _GLIBCXX_SIMD_MATH_FALLBACK_MASKRET
+#undef _GLIBCXX_SIMD_MATH_FALLBACK_FIXEDRET
+ // __abs {{{3
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static _SimdWrapper<_Tp, _Np>
+ __abs(_SimdWrapper<_Tp, _Np> __x) noexcept
+ {
+ // if (__builtin_is_constant_evaluated())
+ // {
+ // return __x._M_data < 0 ? -__x._M_data : __x._M_data;
+ // }
+ if constexpr (std::is_floating_point_v<_Tp>)
+      // `v < 0 ? -v : v` cannot compile to the efficient implementation of
+      // masking the signbit off because it must consider v == -0.
+      // ~(-0.) & v would be easy, but breaks with -fno-signed-zeros.
+ return __and(_S_absmask<__vector_type_t<_Tp, _Np>>, __x._M_data);
+ else
+#ifdef _GLIBCXX_SIMD_WORKAROUND_PR91533
+ if constexpr (sizeof(__x) < 16 && std::is_signed_v<_Tp>)
+ {
+ if constexpr (sizeof(_Tp) == 4)
+ return __auto_bitcast(_mm_abs_epi32(__to_intrin(__x)));
+ else if constexpr (sizeof(_Tp) == 2)
+ return __auto_bitcast(_mm_abs_epi16(__to_intrin(__x)));
+ else
+ return __auto_bitcast(_mm_abs_epi8(__to_intrin(__x)));
+ }
+ else
+#endif //_GLIBCXX_SIMD_WORKAROUND_PR91533
+ return __x._M_data < 0 ? -__x._M_data : __x._M_data;
+ }
+
+ // __nearbyint {{{3
+ template <typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+ _GLIBCXX_SIMD_INTRINSIC static _Tp __nearbyint(_Tp __x_) noexcept
+ {
+ using value_type = typename _TVT::value_type;
+ using _V = typename _TVT::type;
+ const _V __x = __x_;
+ const _V __absx = __and(__x, _S_absmask<_V>);
+ static_assert(CHAR_BIT * sizeof(1ull)
+ >= std::numeric_limits<value_type>::digits);
+ constexpr _V __shifter_abs
+ = _V() + (1ull << (std::numeric_limits<value_type>::digits - 1));
+ const _V __shifter = __or(__and(_S_signmask<_V>, __x), __shifter_abs);
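+    // Adding 2^(digits-1) with the sign of __x forces rounding to an integer
+    // in the current rounding mode (the result's ulp is 1); subtracting it
+    // again yields the rounded value. Inputs with |x| >= 2^(digits-1) are
+    // already integral and are returned unchanged by the final select.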
+ _V __shifted = __x + __shifter;
+    // how can we stop -fassociative-math from breaking this pattern?
+ // asm("" : "+X"(__shifted));
+ __shifted -= __shifter;
+ return __absx < __shifter_abs ? __shifted : __x;
+ }
+
+ // __rint {{{3
+ template <typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+ _GLIBCXX_SIMD_INTRINSIC static _Tp __rint(_Tp __x) noexcept
+ {
+ return _SuperImpl::__nearbyint(__x);
+ }
+
+ // __trunc {{{3
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static _SimdWrapper<_Tp, _Np>
+ __trunc(_SimdWrapper<_Tp, _Np> __x)
+ {
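+    // Add and subtract 2^(digits-1) to round |x| to an integer in the
+    // current rounding mode, then subtract 1 where the result exceeds |x| to
+    // turn rounding into truncation; finally restore the sign bit via
+    // (|x| XOR x). Inputs with |x| >= 2^(digits-1) are already integral.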
+ using _V = __vector_type_t<_Tp, _Np>;
+ const _V __absx = __and(__x._M_data, _S_absmask<_V>);
+ static_assert(CHAR_BIT * sizeof(1ull) >= std::numeric_limits<_Tp>::digits);
+ constexpr _Tp __shifter = 1ull << (std::numeric_limits<_Tp>::digits - 1);
+ _V __truncated = (__absx + __shifter) - __shifter;
+ __truncated -= __truncated > __absx ? _V() + 1 : _V();
+ return __absx < __shifter ? __or(__xor(__absx, __x._M_data), __truncated)
+ : __x._M_data;
+ }
+
+ // __round {{{3
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static _SimdWrapper<_Tp, _Np>
+ __round(_SimdWrapper<_Tp, _Np> __x)
+ {
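+    // Truncate as in __trunc above, then add 1 where the discarded fraction
+    // is >= 0.5, and restore the sign bit via (|x| XOR x).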
+ using _V = __vector_type_t<_Tp, _Np>;
+ const _V __absx = __and(__x._M_data, _S_absmask<_V>);
+ static_assert(CHAR_BIT * sizeof(1ull) >= std::numeric_limits<_Tp>::digits);
+ constexpr _Tp __shifter = 1ull << (std::numeric_limits<_Tp>::digits - 1);
+ _V __truncated = (__absx + __shifter) - __shifter;
+ __truncated -= __truncated > __absx ? _V() + 1 : _V();
+ const _V __rounded
+ = __or(__xor(__absx, __x._M_data),
+ __truncated + (__absx - __truncated >= _Tp(.5) ? _V() + 1 : _V()));
+ return __absx < __shifter ? __rounded : __x._M_data;
+ }
+
+ // __floor {{{3
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static _SimdWrapper<_Tp, _Np>
+ __floor(_SimdWrapper<_Tp, _Np> __x)
+ {
+ const auto __y = _SuperImpl::__trunc(__x)._M_data;
+ const auto __negative_input
+ = __vector_bitcast<_Tp>(__x._M_data < __vector_broadcast<_Np, _Tp>(0));
+ const auto __mask
+ = __andnot(__vector_bitcast<_Tp>(__y == __x._M_data), __negative_input);
+ return __or(__andnot(__mask, __y),
+ __and(__mask, __y - __vector_broadcast<_Np, _Tp>(1)));
+ }
+
+ // __ceil {{{3
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static _SimdWrapper<_Tp, _Np>
+ __ceil(_SimdWrapper<_Tp, _Np> __x)
+ {
+ const auto __y = _SuperImpl::__trunc(__x)._M_data;
+ const auto __negative_input
+ = __vector_bitcast<_Tp>(__x._M_data < __vector_broadcast<_Np, _Tp>(0));
+ const auto __inv_mask
+ = __or(__vector_bitcast<_Tp>(__y == __x._M_data), __negative_input);
+ return __or(__and(__inv_mask, __y),
+ __andnot(__inv_mask, __y + __vector_broadcast<_Np, _Tp>(1)));
+ }
+
+ // __isnan {{{3
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static _MaskMember<_Tp>
+ __isnan(_SimdWrapper<_Tp, _Np> __x)
+ {
+#if __FINITE_MATH_ONLY__
+ [](auto&&) {}(__x);
+ return {}; // false
+#elif !defined __SUPPORT_SNAN__
+ return __vector_bitcast<_Tp>(~(__x._M_data == __x._M_data));
+#elif defined __STDC_IEC_559__
+ using _Up = make_unsigned_t<__int_for_sizeof_t<_Tp>>;
+ constexpr auto __max = __vector_bitcast<_Up>(
+ __vector_broadcast<_Np>(numeric_limits<_Tp>::infinity()));
+ auto __bits = __vector_bitcast<_Up>(__x);
+ __bits &= __vector_bitcast<_Up>(_S_absmask<__vector_type_t<_Tp, _Np>>);
+ return __vector_bitcast<_Tp>(__bits > __max);
+#else
+#error "Not implemented: how to support SNaN but non-IEC559 floating-point?"
+#endif
+ }
+
+ // __isfinite {{{3
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static _MaskMember<_Tp>
+ __isfinite(_SimdWrapper<_Tp, _Np> __x)
+ {
+#if __FINITE_MATH_ONLY__
+ [](auto&&) {}(__x);
+ return __vector_bitcast<_Np>(_Tp()) == __vector_bitcast<_Np>(_Tp());
+#else
+ // if all exponent bits are set, __x is either inf or NaN
+ using _I = __int_for_sizeof_t<_Tp>;
+ constexpr auto __inf = __vector_bitcast<_I>(
+ __vector_broadcast<_Np>(std::numeric_limits<_Tp>::infinity()));
+ return __vector_bitcast<_Tp>(__inf > (__vector_bitcast<_I>(__x) & __inf));
+#endif
+ }
+
+ // __isunordered {{{3
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static _MaskMember<_Tp>
+ __isunordered(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y)
+ {
+ return __or(__isnan(__x), __isnan(__y));
+ }
+
+ // __signbit {{{3
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static _MaskMember<_Tp>
+ __signbit(_SimdWrapper<_Tp, _Np> __x)
+ {
+ using _I = __int_for_sizeof_t<_Tp>;
+ return __vector_bitcast<_Tp>(__vector_bitcast<_I>(__x) < 0);
+ // Arithmetic right shift (SRA) would also work (instead of compare), but
+ // 64-bit SRA isn't available on x86 before AVX512. And in general,
+ // compares are more likely to be efficient than SRA.
+ }
+
+ // __isinf {{{3
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static _MaskMember<_Tp>
+ __isinf(_SimdWrapper<_Tp, _Np> __x)
+ {
+#if __FINITE_MATH_ONLY__
+ [](auto&&) {}(__x);
+ return {}; // false
+#else
+ return _SuperImpl::template __equal_to<_Tp, _Np>(
+ _SuperImpl::__abs(__x),
+ __vector_broadcast<_Np>(std::numeric_limits<_Tp>::infinity()));
+ // alternative:
+ // compare to inf using the corresponding integer type
+ /*
+ return
+ __vector_bitcast<_Tp>(__vector_bitcast<__int_for_sizeof_t<_Tp>>(__abs(__x)._M_data)
+ ==
+ __vector_bitcast<__int_for_sizeof_t<_Tp>>(__vector_broadcast<_Np>(
+ std::numeric_limits<_Tp>::infinity())));
+ */
+#endif
+ }
+
+ // __isnormal {{{3
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static _MaskMember<_Tp>
+ __isnormal(_SimdWrapper<_Tp, _Np> __x)
+ {
+ using _I = __int_for_sizeof_t<_Tp>;
+ const auto absn = __vector_bitcast<_I>(_SuperImpl::__abs(__x));
+ const auto minn = __vector_bitcast<_I>(
+ __vector_broadcast<_Np>(std::numeric_limits<_Tp>::min()));
+#if __FINITE_MATH_ONLY__
+ return __auto_bitcast(absn >= minn);
+#else
+ const auto infn = __vector_bitcast<_I>(
+ __vector_broadcast<_Np>(std::numeric_limits<_Tp>::infinity()));
+ return __auto_bitcast(absn >= minn && absn < infn);
+#endif
+ }
+
+ // __fpclassify {{{3
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static __fixed_size_storage_t<int, _Np>
+ __fpclassify(_SimdWrapper<_Tp, _Np> __x)
+ {
+ using _I = __int_for_sizeof_t<_Tp>;
+ const auto __xi = __to_intrin(__abs(__x));
+ const auto __xn = __vector_bitcast<_I>(__xi);
+ constexpr size_t _NI = sizeof(__xn) / sizeof(_I);
+
+ constexpr auto __fp_normal = __vector_broadcast<_NI, _I>(FP_NORMAL);
+ constexpr auto __fp_nan = __vector_broadcast<_NI, _I>(FP_NAN);
+ constexpr auto __fp_infinite = __vector_broadcast<_NI, _I>(FP_INFINITE);
+ constexpr auto __fp_subnormal = __vector_broadcast<_NI, _I>(FP_SUBNORMAL);
+ constexpr auto __fp_zero = __vector_broadcast<_NI, _I>(FP_ZERO);
+
+ __vector_type_t<_I, _NI> __tmp;
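+    // Classify via the magnitude of the IEEE bit pattern: below the smallest
+    // normal value's pattern -> zero or subnormal, below the infinity
+    // pattern -> normal, equal to it -> infinite, above -> NaN.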
+ if constexpr (sizeof(_Tp) == 4)
+ __tmp = __xn < 0x0080'0000
+ ? (__xn == 0 ? __fp_zero : __fp_subnormal)
+ : (__xn < 0x7f80'0000
+ ? __fp_normal
+ : (__xn == 0x7f80'0000 ? __fp_infinite : __fp_nan));
+ else if constexpr (sizeof(_Tp) == 8)
+ __tmp = __xn < 0x0010'0000'0000'0000LL
+ ? (__xn == 0 ? __fp_zero : __fp_subnormal)
+ : (__xn < 0x7ff0'0000'0000'0000LL
+ ? __fp_normal
+ : (__xn == 0x7ff0'0000'0000'0000LL ? __fp_infinite
+ : __fp_nan));
+ else
+ __assert_unreachable<_Tp>();
+
+ if constexpr (sizeof(_I) == sizeof(int))
+ {
+ using _FixedInt = __fixed_size_storage_t<int, _Np>;
+ const auto __as_int = __vector_bitcast<int, _Np>(__tmp);
+ if constexpr (_FixedInt::_S_tuple_size == 1)
+ return {__as_int};
+ else if constexpr (_FixedInt::_S_tuple_size == 2
+ && std::is_same_v<
+ typename _FixedInt::_SecondType::_FirstAbi,
+ simd_abi::scalar>)
+ return {__extract<0, 2>(__as_int), __as_int[_Np - 1]};
+ else if constexpr (_FixedInt::_S_tuple_size == 2)
+ return {__extract<0, 2>(__as_int),
+ __auto_bitcast(__extract<1, 2>(__as_int))};
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (_Np == 2 && sizeof(_I) == 8
+ && __fixed_size_storage_t<int, _Np>::_S_tuple_size == 2)
+ {
+ const auto __aslong = __vector_bitcast<_LLong>(__tmp);
+ return {int(__aslong[0]), {int(__aslong[1])}};
+ }
+#if _GLIBCXX_SIMD_X86INTRIN
+ else if constexpr (sizeof(_Tp) == 8 && sizeof(__tmp) == 32
+ && __fixed_size_storage_t<int, _Np>::_S_tuple_size == 1)
+ return {_mm_packs_epi32(__to_intrin(__lo128(__tmp)),
+ __to_intrin(__hi128(__tmp)))};
+ else if constexpr (sizeof(_Tp) == 8 && sizeof(__tmp) == 64
+ && __fixed_size_storage_t<int, _Np>::_S_tuple_size == 1)
+ return {_mm512_cvtepi64_epi32(__to_intrin(__tmp))};
+#endif // _GLIBCXX_SIMD_X86INTRIN
+ else if constexpr (__fixed_size_storage_t<int, _Np>::_S_tuple_size == 1)
+ return {__call_with_subscripts<_Np>(__vector_bitcast<_LLong>(__tmp),
+ [](auto... __l) {
+ return __make_wrapper<int>(__l...);
+ })};
+ else
+ __assert_unreachable<_Tp>();
+ }
+
+ // __increment & __decrement{{{2
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static void __increment(_SimdWrapper<_Tp, _Np>& __x)
+ {
+ __x = __x._M_data + 1;
+ }
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static void __decrement(_SimdWrapper<_Tp, _Np>& __x)
+ {
+ __x = __x._M_data - 1;
+ }
+
+ // smart_reference access {{{2
+ template <typename _Tp, size_t _Np, typename _Up>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static void
+ __set(_SimdWrapper<_Tp, _Np>& __v, int __i, _Up&& __x) noexcept
+ {
+ __v.__set(__i, static_cast<_Up&&>(__x));
+ }
+
+ // __masked_assign{{{2
+ template <typename _Tp, typename _K, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static void
+ __masked_assign(_SimdWrapper<_K, _Np> __k, _SimdWrapper<_Tp, _Np>& __lhs,
+ __id<_SimdWrapper<_Tp, _Np>> __rhs)
+ {
+ __lhs = _CommonImpl::_S_blend(__k, __lhs, __rhs);
+ }
+
+ template <typename _Tp, typename _K, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static void
+ __masked_assign(_SimdWrapper<_K, _Np> __k, _SimdWrapper<_Tp, _Np>& __lhs,
+ __id<_Tp> __rhs)
+ {
+ if (__builtin_constant_p(__rhs) && __rhs == 0 && std::is_same_v<_K, _Tp>)
+ {
+ if constexpr (!is_same_v<bool, _K>)
+ // the __andnot optimization only makes sense if __k._M_data is a
+ // vector register
+ __lhs._M_data = __andnot(__k._M_data, __lhs._M_data);
+ else
+ // for AVX512/__mmask, a _mm512_maskz_mov is best
+ __lhs = _CommonImpl::_S_blend(__k, __lhs, _SimdWrapper<_Tp, _Np>());
+ }
+ else
+ __lhs = _CommonImpl::_S_blend(__k, __lhs,
+ _SimdWrapper<_Tp, _Np>(
+ __vector_broadcast<_Np>(__rhs)));
+ }
+
+ // __masked_cassign {{{2
+ template <typename _Op, typename _Tp, typename _K, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static void
+ __masked_cassign(const _SimdWrapper<_K, _Np> __k,
+ _SimdWrapper<_Tp, _Np>& __lhs,
+ const __id<_SimdWrapper<_Tp, _Np>> __rhs, _Op __op)
+ {
+ __lhs = _CommonImpl::_S_blend(__k, __lhs, __op(_SuperImpl{}, __lhs, __rhs));
+ }
+
+ template <typename _Op, typename _Tp, typename _K, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static void
+ __masked_cassign(const _SimdWrapper<_K, _Np> __k,
+ _SimdWrapper<_Tp, _Np>& __lhs, const __id<_Tp> __rhs,
+ _Op __op)
+ {
+ __lhs = _CommonImpl::_S_blend(__k, __lhs,
+ __op(_SuperImpl{}, __lhs,
+ _SimdWrapper<_Tp, _Np>(
+ __vector_broadcast<_Np>(__rhs))));
+ }
+
+ // __masked_unary {{{2
+ template <template <typename> class _Op, typename _Tp, typename _K,
+ size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static _SimdWrapper<_Tp, _Np>
+ __masked_unary(const _SimdWrapper<_K, _Np> __k,
+ const _SimdWrapper<_Tp, _Np> __v)
+ {
+ auto __vv = __make_simd(__v);
+ _Op<decltype(__vv)> __op;
+ return _CommonImpl::_S_blend(__k, __v, __data(__op(__vv)));
+ }
+
+ //}}}2
+};
+
+// _MaskImplBuiltinMixin {{{1
+struct _MaskImplBuiltinMixin
+{
+ template <typename _Tp> using _TypeTag = _Tp*;
+
+ // __to_maskvector {{{
+ template <typename _Up, size_t _ToN = 1>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Up, _ToN>
+ __to_maskvector(bool __x)
+ {
+ using _I = __int_for_sizeof_t<_Up>;
+ return __vector_bitcast<_Up>(__x ? __vector_type_t<_I, _ToN>{~_I()}
+ : __vector_type_t<_I, _ToN>{});
+ }
+
+ template <typename _Up, size_t _UpN = 0, size_t _Np, bool _Sanitized,
+ size_t _ToN = _UpN == 0 ? _Np : _UpN>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Up, _ToN>
+ __to_maskvector(_BitMask<_Np, _Sanitized> __x)
+ {
+ using _I = __int_for_sizeof_t<_Up>;
+ return __vector_bitcast<_Up>(
+ __generate_vector<__vector_type_t<_I, _ToN>>([&](auto __i) constexpr {
+ if constexpr (__i < _Np)
+ return __x[__i] ? ~_I() : _I();
+ else
+ return _I();
+ }));
+ }
+
+ template <typename _Up, size_t _UpN = 0, typename _Tp, size_t _Np,
+ size_t _ToN = _UpN == 0 ? _Np : _UpN>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Up, _ToN>
+ __to_maskvector(_SimdWrapper<_Tp, _Np> __x)
+ {
+ using _TW = _SimdWrapper<_Tp, _Np>;
+ using _UW = _SimdWrapper<_Up, _ToN>;
+ if constexpr (sizeof(_Up) == sizeof(_Tp) && sizeof(_TW) == sizeof(_UW))
+ return __wrapper_bitcast<_Up, _ToN>(__x);
+ else if constexpr (is_same_v<_Tp, bool>) // bits -> vector
+ return __to_maskvector<_Up, _ToN>(std::bitset<_Np>(__x._M_data));
+ else
+ { // vector -> vector
+ /*
+ [[maybe_unused]] const auto __y = __vector_bitcast<_Up>(__x._M_data);
+ if constexpr (sizeof(_Tp) == 8 && sizeof(_Up) == 4 && sizeof(__y) == 16)
+ return __vector_permute<1, 3, -1, -1>(__y);
+ else if constexpr (sizeof(_Tp) == 4 && sizeof(_Up) == 2
+ && sizeof(__y) == 16)
+ return __vector_permute<1, 3, 5, 7, -1, -1, -1, -1>(__y);
+ else if constexpr (sizeof(_Tp) == 8 && sizeof(_Up) == 2
+ && sizeof(__y) == 16)
+ return __vector_permute<3, 7, -1, -1, -1, -1, -1, -1>(__y);
+ else if constexpr (sizeof(_Tp) == 2 && sizeof(_Up) == 1
+ && sizeof(__y) == 16)
+ return __vector_permute<1, 3, 5, 7, 9, 11, 13, 15, -1, -1, -1, -1, -1,
+ -1, -1, -1>(__y);
+ else if constexpr (sizeof(_Tp) == 4 && sizeof(_Up) == 1
+ && sizeof(__y) == 16)
+ return __vector_permute<3, 7, 11, 15, -1, -1, -1, -1, -1, -1, -1, -1,
+ -1, -1, -1, -1>(__y);
+ else if constexpr (sizeof(_Tp) == 8 && sizeof(_Up) == 1
+ && sizeof(__y) == 16)
+ return __vector_permute<7, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+ -1, -1, -1, -1>(__y);
+ else
+ */
+ {
+ using _I = __int_for_sizeof_t<_Up>;
+ const auto __y
+ = __vector_bitcast<__int_for_sizeof_t<_Tp>>(__x._M_data);
+ return __vector_bitcast<_Up>(
+ __generate_vector<__vector_type_t<_I, _ToN>>([&](
+ auto __i) constexpr {
+ if constexpr (__i < _Np)
+ return _I(__y[__i.value]);
+ else
+ return _I();
+ }));
+ }
+ }
+ }
+
+ // }}}
+ // __to_bits {{{
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SanitizedBitMask<_Np>
+ __to_bits(_SimdWrapper<_Tp, _Np> __x)
+ {
+ static_assert(!is_same_v<_Tp, bool>);
+ static_assert(_Np <= CHAR_BIT * sizeof(_ULLong));
+ using _Up = make_unsigned_t<__int_for_sizeof_t<_Tp>>;
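+    // True mask elements have all bits set, so shifting the unsigned
+    // representation right by (element width - 1) yields 1 for true and 0
+    // for false; these single bits are then ORed into one bitmask.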
+ const auto __bools
+ = __vector_bitcast<_Up>(__x) >> (sizeof(_Up) * CHAR_BIT - 1);
+ _ULLong __r = 0;
+ __execute_n_times<_Np>(
+ [&](auto __i) { __r |= _ULLong(__bools[__i.value]) << __i; });
+ return __r;
+ }
+
+ // }}}
+};
+
+// _MaskImplBuiltin {{{1
+template <typename _Abi> struct _MaskImplBuiltin : _MaskImplBuiltinMixin
+{
+ using _MaskImplBuiltinMixin::__to_bits;
+ using _MaskImplBuiltinMixin::__to_maskvector;
+
+ // member types {{{
+ template <typename _Tp>
+ using _SimdMember = typename _Abi::template __traits<_Tp>::_SimdMember;
+ template <typename _Tp>
+ using _MaskMember = typename _Abi::template __traits<_Tp>::_MaskMember;
+ using _SuperImpl = typename _Abi::_MaskImpl;
+ using _CommonImpl = typename _Abi::_CommonImpl;
+ template <typename _Tp> static constexpr size_t size = simd_size_v<_Tp, _Abi>;
+
+ // }}}
+ // __broadcast {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp>
+ __broadcast(bool __x)
+ {
+ return __x ? _Abi::template __implicit_mask<_Tp>() : _MaskMember<_Tp>();
+ }
+
+ // }}}
+ // __load {{{
+ template <typename _Tp, typename _Flags>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp>
+ __load(const bool* __mem)
+ {
+ using _I = __int_for_sizeof_t<_Tp>;
+ if constexpr (sizeof(_Tp) == sizeof(bool))
+ {
+ const auto __bools
+ = _CommonImpl::template _S_load<_I, size<_Tp>>(__mem, _Flags());
+ // bool is {0, 1}, everything else is UB
+ return __vector_bitcast<_Tp>(__bools > 0);
+ }
+ else
+ return __vector_bitcast<_Tp>(__generate_vector<_I, size<_Tp>>([&](
+ auto __i) constexpr { return __mem[__i] ? ~_I() : _I(); }));
+ }
+
+ // }}}
+ // __convert {{{
+ template <typename _Tp, size_t _Np, bool _Sanitized>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr auto
+ __convert(_BitMask<_Np, _Sanitized> __x)
+ {
+ if constexpr (__is_builtin_bitmask_abi<_Abi>())
+ return _SimdWrapper<bool, simd_size_v<_Tp, _Abi>>(__x._M_to_bits());
+ else
+ return _SuperImpl::template __to_maskvector<_Tp, size<_Tp>>(
+ __x._M_sanitized());
+ }
+
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr auto
+ __convert(_SimdWrapper<bool, _Np> __x)
+ {
+ if constexpr (__is_builtin_bitmask_abi<_Abi>())
+ return _SimdWrapper<bool, simd_size_v<_Tp, _Abi>>(__x._M_data);
+ else
+ return _SuperImpl::template __to_maskvector<_Tp, size<_Tp>>(
+ _BitMask<_Np>(__x._M_data)._M_sanitized());
+ }
+
+ template <typename _Tp, typename _Up, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr auto
+ __convert(_SimdWrapper<_Up, _Np> __x)
+ {
+ if constexpr (__is_builtin_bitmask_abi<_Abi>())
+ return _SimdWrapper<bool, simd_size_v<_Tp, _Abi>>(
+ _SuperImpl::__to_bits(__x));
+ else
+ return _SuperImpl::template __to_maskvector<_Tp, size<_Tp>>(__x);
+ }
+
+ template <typename _Tp, typename _Up, typename _UAbi>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr auto
+ __convert(simd_mask<_Up, _UAbi> __x)
+ {
+ if constexpr (__is_builtin_bitmask_abi<_Abi>())
+ {
+ using _R = _SimdWrapper<bool, simd_size_v<_Tp, _Abi>>;
+ if constexpr (__is_builtin_bitmask_abi<_UAbi>()) // bits -> bits
+ return _R(__data(__x));
+ else if constexpr (__is_scalar_abi<_UAbi>()) // bool -> bits
+ return _R(__data(__x));
+ else if constexpr (__is_fixed_size_abi_v<_UAbi>) // bitset -> bits
+ return _R(__data(__x)._M_to_bits());
+ else // vector -> bits
+ return _R(_UAbi::_MaskImpl::__to_bits(__data(__x))._M_to_bits());
+ }
+ else
+ return _SuperImpl::template __to_maskvector<_Tp, size<_Tp>>(__data(__x));
+ }
+
+ // }}}
+ // __masked_load {{{2
+ template <typename _Tp, size_t _Np, typename _Fp>
+ static inline _SimdWrapper<_Tp, _Np>
+ __masked_load(_SimdWrapper<_Tp, _Np> __merge, _SimdWrapper<_Tp, _Np> __mask,
+ const bool* __mem, _Fp) noexcept
+ {
+ // AVX(2) has 32/64 bit maskload, but nothing at 8 bit granularity
+ auto __tmp = __wrapper_bitcast<__int_for_sizeof_t<_Tp>>(__merge);
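+    // __mem[__i] is 0 or 1; negating yields 0 or -1 (all bits set), which is
+    // the required vector mask element value.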
+ _BitOps::__bit_iteration(_SuperImpl::__to_bits(__mask),
+ [&](auto __i) { __tmp.__set(__i, -__mem[__i]); });
+ __merge = __wrapper_bitcast<_Tp>(__tmp);
+ return __merge;
+ }
+
+ // __store {{{2
+ template <typename _Tp, size_t _Np, typename _Fp>
+ _GLIBCXX_SIMD_INTRINSIC static void __store(_SimdWrapper<_Tp, _Np> __v,
+ bool* __mem, _Fp) noexcept
+ {
+ __execute_n_times<_Np>([&](auto __i) constexpr { __mem[__i] = __v[__i]; });
+ }
+
+ // __masked_store {{{2
+ template <typename _Tp, size_t _Np, typename _Fp>
+ static inline void __masked_store(const _SimdWrapper<_Tp, _Np> __v,
+ bool* __mem, _Fp,
+ const _SimdWrapper<_Tp, _Np> __k) noexcept
+ {
+ _BitOps::__bit_iteration(
+ _SuperImpl::__to_bits(__k), [&](auto __i) constexpr {
+ __mem[__i] = __v[__i];
+ });
+ }
+
+ // __from_bitmask{{{2
+ template <size_t _Np, typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static _MaskMember<_Tp>
+ __from_bitmask(_SanitizedBitMask<_Np> __bits, _TypeTag<_Tp>)
+ {
+ return _SuperImpl::template __to_maskvector<_Tp, size<_Tp>>(__bits);
+ }
+
+ // logical and bitwise operators {{{2
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __logical_and(const _SimdWrapper<_Tp, _Np>& __x,
+ const _SimdWrapper<_Tp, _Np>& __y)
+ {
+ return __and(__x._M_data, __y._M_data);
+ }
+
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __logical_or(const _SimdWrapper<_Tp, _Np>& __x,
+ const _SimdWrapper<_Tp, _Np>& __y)
+ {
+ return __or(__x._M_data, __y._M_data);
+ }
+
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __bit_not(const _SimdWrapper<_Tp, _Np>& __x)
+ {
+ if constexpr(_Abi::_S_is_partial)
+ return __andnot(__x._M_data, _Abi::template __implicit_mask<_Tp>());
+ else
+ return __not(__x._M_data);
+ }
+
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __bit_and(const _SimdWrapper<_Tp, _Np>& __x,
+ const _SimdWrapper<_Tp, _Np>& __y)
+ {
+ return __and(__x._M_data, __y._M_data);
+ }
+
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __bit_or(const _SimdWrapper<_Tp, _Np>& __x, const _SimdWrapper<_Tp, _Np>& __y)
+ {
+ return __or(__x._M_data, __y._M_data);
+ }
+
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __bit_xor(const _SimdWrapper<_Tp, _Np>& __x,
+ const _SimdWrapper<_Tp, _Np>& __y)
+ {
+ return __xor(__x._M_data, __y._M_data);
+ }
+
+ // smart_reference access {{{2
+ template <typename _Tp, size_t _Np>
+ static constexpr void __set(_SimdWrapper<_Tp, _Np>& __k, int __i,
+ bool __x) noexcept
+ {
+ if constexpr (std::is_same_v<_Tp, bool>)
+ __k.__set(__i, __x);
+ else
+ {
+ using _Ip = __int_for_sizeof_t<_Tp>;
+ auto __ki = __vector_bitcast<_Ip>(__k._M_data);
+ if (__builtin_is_constant_evaluated())
+ {
+ __k = __vector_bitcast<_Tp>(
+ __generate_from_n_evaluations<_Np, decltype(__ki)>([&](auto __j) {
+ if (__i == __j)
+ return _Ip(-__x);
+ else
+ return __ki[+__j];
+ }));
+ }
+ else
+ {
+ __ki[__i] = _Ip(-__x);
+ __k = __vector_bitcast<_Tp>(__ki);
+ }
+ }
+ }
+
+ // __masked_assign{{{2
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static void
+ __masked_assign(_SimdWrapper<_Tp, _Np> __k, _SimdWrapper<_Tp, _Np>& __lhs,
+ __id<_SimdWrapper<_Tp, _Np>> __rhs)
+ {
+ __lhs = _CommonImpl::_S_blend(__k, __lhs, __rhs);
+ }
+
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static void
+ __masked_assign(_SimdWrapper<_Tp, _Np> __k, _SimdWrapper<_Tp, _Np>& __lhs,
+ bool __rhs)
+ {
+ if (__builtin_constant_p(__rhs))
+ {
+ if (__rhs == false)
+ {
+ __lhs = __andnot(__k._M_data, __lhs._M_data);
+ }
+ else
+ {
+ __lhs = __or(__k._M_data, __lhs._M_data);
+ }
+ return;
+ }
+ __lhs
+ = _CommonImpl::_S_blend(__k, __lhs, __data(simd_mask<_Tp, _Abi>(__rhs)));
+ }
+
+ //}}}2
+ // __all_of {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static bool __all_of(simd_mask<_Tp, _Abi> __k)
+ {
+ return __call_with_subscripts(
+ __vector_bitcast<__int_for_sizeof_t<_Tp>>(__data(__k)),
+ make_index_sequence<size<_Tp>>(),
+ [](const auto... __ent) constexpr { return (... && !(__ent == 0)); });
+ }
+
+ // }}}
+ // __any_of {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static bool __any_of(simd_mask<_Tp, _Abi> __k)
+ {
+ return __call_with_subscripts(
+ __vector_bitcast<__int_for_sizeof_t<_Tp>>(__data(__k)),
+ make_index_sequence<size<_Tp>>(),
+ [](const auto... __ent) constexpr { return (... || !(__ent == 0)); });
+ }
+
+ // }}}
+ // __none_of {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static bool __none_of(simd_mask<_Tp, _Abi> __k)
+ {
+ return __call_with_subscripts(
+ __vector_bitcast<__int_for_sizeof_t<_Tp>>(__data(__k)),
+ make_index_sequence<size<_Tp>>(),
+ [](const auto... __ent) constexpr { return (... && (__ent == 0)); });
+ }
+
+ // }}}
+ // __some_of {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static bool __some_of(simd_mask<_Tp, _Abi> __k)
+ {
+ const int __n_true = __popcount(__k);
+ return __n_true > 0 && __n_true < int(size<_Tp>);
+ }
+
+ // }}}
+ // __popcount {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static int __popcount(simd_mask<_Tp, _Abi> __k)
+ {
+ using _I = __int_for_sizeof_t<_Tp>;
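+    // True mask elements are -1 when reinterpreted as integers, so the
+    // negated sum over all elements equals the number of set elements.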
+ if constexpr (std::is_default_constructible_v<simd<_I, _Abi>>)
+ return -reduce(
+ simd<_I, _Abi>(__private_init, __wrapper_bitcast<_I>(__data(__k))));
+ else
+ return -reduce(__bit_cast<rebind_simd_t<_I, simd<_Tp, _Abi>>>(
+ simd<_Tp, _Abi>(__private_init, __data(__k))));
+ }
+
+ // }}}
+ // __find_first_set {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static int __find_first_set(simd_mask<_Tp, _Abi> __k)
+ {
+ return _BitOps::__firstbit(_SuperImpl::__to_bits(__data(__k))._M_to_bits());
+ }
+
+ // }}}
+ // __find_last_set {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static int __find_last_set(simd_mask<_Tp, _Abi> __k)
+ {
+ return _BitOps::__lastbit(_SuperImpl::__to_bits(__data(__k))._M_to_bits());
+ }
+
+ // }}}
+};
+
+//}}}1
+_GLIBCXX_SIMD_END_NAMESPACE
+#endif // __cplusplus >= 201703L
+#endif // _GLIBCXX_EXPERIMENTAL_SIMD_ABIS_H_
+
+// vim: foldmethod=marker sw=2 noet ts=8 sts=2 tw=80
diff --git a/libstdc++-v3/include/experimental/bits/simd_converter.h b/libstdc++-v3/include/experimental/bits/simd_converter.h
new file mode 100644
index 00000000000..256b64023d2
--- /dev/null
+++ b/libstdc++-v3/include/experimental/bits/simd_converter.h
@@ -0,0 +1,337 @@
+// Generic simd conversions -*- C++ -*-
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library. This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+// <http://www.gnu.org/licenses/>.
+
+#ifndef _GLIBCXX_EXPERIMENTAL_SIMD_CONVERTER_H_
+#define _GLIBCXX_EXPERIMENTAL_SIMD_CONVERTER_H_
+
+#if __cplusplus >= 201703L
+
+_GLIBCXX_SIMD_BEGIN_NAMESPACE
+// _SimdConverter scalar -> scalar {{{
+template <typename _From, typename _To>
+struct _SimdConverter<_From, simd_abi::scalar, _To, simd_abi::scalar,
+ std::enable_if_t<!std::is_same_v<_From, _To>>>
+{
+ _GLIBCXX_SIMD_INTRINSIC constexpr _To operator()(_From __a) const noexcept
+ {
+ return static_cast<_To>(__a);
+ }
+};
+
+// }}}
+// _SimdConverter "native" -> scalar {{{
+template <typename _From, typename _To, typename _Abi>
+struct _SimdConverter<_From, _Abi, _To, simd_abi::scalar,
+ std::enable_if_t<!std::is_same_v<_Abi, simd_abi::scalar>>>
+{
+ using _Arg = typename _Abi::template __traits<_From>::_SimdMember;
+ static constexpr size_t _S_n = _Arg::_S_width;
+
+ _GLIBCXX_SIMD_INTRINSIC constexpr std::array<_To, _S_n>
+ __all(_Arg __a) const noexcept
+ {
+ return __call_with_subscripts(
+ __a, make_index_sequence<_S_n>(),
+ [&](auto... __values) constexpr -> std::array<_To, _S_n> {
+ return {static_cast<_To>(__values)...};
+ });
+ }
+};
+
+// }}}
+// _SimdConverter scalar -> "native" {{{
+template <typename _From, typename _To, typename _Abi>
+struct _SimdConverter<_From, simd_abi::scalar, _To, _Abi,
+ std::enable_if_t<!std::is_same_v<_Abi, simd_abi::scalar>>>
+{
+ using _Ret = typename _Abi::template __traits<_To>::_SimdMember;
+
+ template <typename... _More>
+ _GLIBCXX_SIMD_INTRINSIC constexpr _Ret
+ operator()(_From __a, _More... __more) const noexcept
+ {
+ static_assert(sizeof...(_More) + 1 == _Abi::template size<_To>);
+ static_assert(std::conjunction_v<std::is_same<_From, _More>...>);
+ return __make_vector<_To>(__a, __more...);
+ }
+};
+
+// }}}
+// _SimdConverter "native 1" -> "native 2" {{{
+template <typename _From, typename _To, typename _AFrom, typename _ATo>
+struct _SimdConverter<
+ _From, _AFrom, _To, _ATo,
+ std::enable_if_t<!std::disjunction_v<
+ __is_fixed_size_abi<_AFrom>, __is_fixed_size_abi<_ATo>,
+ std::is_same<_AFrom, simd_abi::scalar>,
+ std::is_same<_ATo, simd_abi::scalar>,
+ std::conjunction<std::is_same<_From, _To>, std::is_same<_AFrom, _ATo>>>>>
+{
+ using _Arg = typename _AFrom::template __traits<_From>::_SimdMember;
+ using _Ret = typename _ATo::template __traits<_To>::_SimdMember;
+ using _V = __vector_type_t<_To, simd_size_v<_To, _ATo>>;
+
+ _GLIBCXX_SIMD_INTRINSIC constexpr auto __all(_Arg __a) const noexcept
+ {
+ return __convert_all<_V>(__a);
+ }
+
+ template <typename... _More>
+ _GLIBCXX_SIMD_INTRINSIC constexpr _Ret
+ operator()(_Arg __a, _More... __more) const noexcept
+ {
+ return __convert<_V>(__a, __more...);
+ }
+};
+
+// }}}
+// _SimdConverter scalar -> fixed_size<1> {{{1
+template <typename _From, typename _To>
+struct _SimdConverter<_From, simd_abi::scalar, _To, simd_abi::fixed_size<1>,
+ void>
+{
+ _GLIBCXX_SIMD_INTRINSIC constexpr _SimdTuple<_To, simd_abi::scalar>
+ operator()(_From __x) const noexcept
+ {
+ return {static_cast<_To>(__x)};
+ }
+};
+
+// _SimdConverter fixed_size<1> -> scalar {{{1
+template <typename _From, typename _To>
+struct _SimdConverter<_From, simd_abi::fixed_size<1>, _To, simd_abi::scalar,
+ void>
+{
+ _GLIBCXX_SIMD_INTRINSIC constexpr _To
+ operator()(_SimdTuple<_From, simd_abi::scalar> __x) const noexcept
+ {
+ return {static_cast<_To>(__x.first)};
+ }
+};
+
+// _SimdConverter fixed_size<_Np> -> fixed_size<_Np> {{{1
+template <typename _From, typename _To, int _Np>
+struct _SimdConverter<_From, simd_abi::fixed_size<_Np>, _To,
+ simd_abi::fixed_size<_Np>,
+ std::enable_if_t<!std::is_same_v<_From, _To>>>
+{
+ using _Ret = __fixed_size_storage_t<_To, _Np>;
+ using _Arg = __fixed_size_storage_t<_From, _Np>;
+
+ _GLIBCXX_SIMD_INTRINSIC constexpr _Ret
+ operator()(const _Arg& __x) const noexcept
+ {
+ if constexpr (std::is_same_v<_From, _To>)
+ return __x;
+
+ // special case (optimize) int signedness casts
+ else if constexpr (sizeof(_From) == sizeof(_To)
+ && std::is_integral_v<_From> && std::is_integral_v<_To>)
+ return __bit_cast<_Ret>(__x);
+
+ // special case if all ABI tags in _Ret are scalar
+ else if constexpr (__is_scalar_abi<typename _Ret::_FirstAbi>())
+ {
+ return __call_with_subscripts(
+ __x, make_index_sequence<_Np>(),
+ [](auto... __values) constexpr -> _Ret {
+ return __make_simd_tuple<_To, decltype((void) __values,
+ simd_abi::scalar())...>(
+ static_cast<_To>(__values)...);
+ });
+ }
+
+ // from one vector to one vector
+ else if constexpr (_Arg::_S_first_size == _Ret::_S_first_size)
+ {
+ _SimdConverter<_From, typename _Arg::_FirstAbi, _To,
+ typename _Ret::_FirstAbi>
+ __native_cvt;
+ if constexpr (_Arg::_S_tuple_size == 1)
+ return {__native_cvt(__x.first)};
+ else
+ {
+ constexpr size_t _NRemain = _Np - _Arg::_S_first_size;
+ _SimdConverter<_From, simd_abi::fixed_size<_NRemain>, _To,
+ simd_abi::fixed_size<_NRemain>>
+ __remainder_cvt;
+ return {__native_cvt(__x.first), __remainder_cvt(__x.second)};
+ }
+ }
+
+ // from one vector to multiple vectors
+ else if constexpr (_Arg::_S_first_size > _Ret::_S_first_size)
+ {
+ const auto __multiple_return_chunks
+ = __convert_all<__vector_type_t<_To, _Ret::_S_first_size>>(__x.first);
+ constexpr auto __converted = __multiple_return_chunks.size()
+ * _Ret::_FirstAbi::template size<_To>;
+ constexpr auto __remaining = _Np - __converted;
+ if constexpr (_Arg::_S_tuple_size == 1 && __remaining == 0)
+ return __to_simd_tuple<_To, _Np>(__multiple_return_chunks);
+ else if constexpr (_Arg::_S_tuple_size == 1)
+ { // e.g. <int, 3> -> <double, 2, 1> or <short, 7> -> <double, 4, 2,
+ // 1>
+ using _RetRem = __remove_cvref_t<decltype(
+ __simd_tuple_pop_front<__multiple_return_chunks.size()>(_Ret()))>;
+ const auto __return_chunks2
+ = __convert_all<__vector_type_t<_To, _RetRem::_S_first_size>, 0,
+ __converted>(__x.first);
+ constexpr auto __converted2
+ = __converted + __return_chunks2.size() * _RetRem::_S_first_size;
+ if constexpr (__converted2 == _Np)
+ return __to_simd_tuple<_To, _Np>(__multiple_return_chunks,
+ __return_chunks2);
+ else
+ {
+ using _RetRem2 = __remove_cvref_t<decltype(
+ __simd_tuple_pop_front<__return_chunks2.size()>(_RetRem()))>;
+ const auto __return_chunks3
+ = __convert_all<__vector_type_t<_To, _RetRem2::_S_first_size>,
+ 0, __converted2>(__x.first);
+ constexpr auto __converted3
+ = __converted2
+ + __return_chunks3.size() * _RetRem2::_S_first_size;
+ if constexpr (__converted3 == _Np)
+ return __to_simd_tuple<_To, _Np>(__multiple_return_chunks,
+ __return_chunks2,
+ __return_chunks3);
+ else
+ {
+ using _RetRem3 = __remove_cvref_t<decltype(
+ __simd_tuple_pop_front<__return_chunks3.size()>(
+ _RetRem2()))>;
+ const auto __return_chunks4 = __convert_all<
+ __vector_type_t<_To, _RetRem3::_S_first_size>, 0,
+ __converted3>(__x.first);
+ constexpr auto __converted4
+ = __converted3
+ + __return_chunks4.size() * _RetRem3::_S_first_size;
+ if constexpr (__converted4 == _Np)
+ return __to_simd_tuple<_To, _Np>(__multiple_return_chunks,
+ __return_chunks2,
+ __return_chunks3,
+ __return_chunks4);
+ else
+ __assert_unreachable<_To>();
+ }
+ }
+ }
+ else
+ {
+ constexpr size_t _NRemain = _Np - _Arg::_S_first_size;
+ _SimdConverter<_From, simd_abi::fixed_size<_NRemain>, _To,
+ simd_abi::fixed_size<_NRemain>>
+ __remainder_cvt;
+ return __simd_tuple_concat(
+ __to_simd_tuple<_To, _Arg::_S_first_size>(
+ __multiple_return_chunks),
+ __remainder_cvt(__x.second));
+ }
+ }
+
+ // from multiple vectors to one vector
+ // _Arg::_S_first_size < _Ret::_S_first_size
+ // a) heterogeneous input at the end of the tuple (possible with partial
+ // native registers in _Ret)
+ else if constexpr (_Ret::_S_tuple_size == 1
+ && _Np % _Arg::_S_first_size != 0)
+ {
+ static_assert(_Ret::_FirstAbi::_S_is_partial);
+ return _Ret{__generate_from_n_evaluations<
+ _Np, typename _VectorTraits<typename _Ret::_FirstType>::type>(
+ [&](auto __i) { return static_cast<_To>(__x[__i]); })};
+ }
+ else
+ {
+ static_assert(_Arg::_S_tuple_size > 1);
+ constexpr auto __n
+ = __div_roundup(_Ret::_S_first_size, _Arg::_S_first_size);
+ return __call_with_n_evaluations<__n>(
+ [&__x](auto... __uncvted) {
+ // assuming _Arg Abi tags for all __i are _Arg::_FirstAbi
+ _SimdConverter<_From, typename _Arg::_FirstAbi, _To,
+ typename _Ret::_FirstAbi>
+ __native_cvt;
+ if constexpr (_Ret::_S_tuple_size == 1)
+ return _Ret{__native_cvt(__uncvted...)};
+ else
+ return _Ret{
+ __native_cvt(__uncvted...),
+ _SimdConverter<
+ _From, simd_abi::fixed_size<_Np - _Ret::_S_first_size>, _To,
+ simd_abi::fixed_size<_Np - _Ret::_S_first_size>>()(
+ __simd_tuple_pop_front<sizeof...(__uncvted)>(__x))};
+ },
+ [&__x](auto __i) { return __get_tuple_at<__i>(__x); });
+ }
+ }
+};
+
+// _SimdConverter "native" -> fixed_size<_Np> {{{1
+// i.e. 1 register to ? registers
+template <typename _From, typename _Ap, typename _To, int _Np>
+struct _SimdConverter<_From, _Ap, _To, simd_abi::fixed_size<_Np>,
+ std::enable_if_t<!__is_fixed_size_abi_v<_Ap>>>
+{
+ static_assert(
+ _Np == simd_size_v<_From, _Ap>,
+ "_SimdConverter to fixed_size only works for equal element counts");
+
+ _GLIBCXX_SIMD_INTRINSIC constexpr __fixed_size_storage_t<_To, _Np>
+ operator()(typename _SimdTraits<_From, _Ap>::_SimdMember __x) const noexcept
+ {
+ _SimdConverter<_From, simd_abi::fixed_size<_Np>, _To,
+ simd_abi::fixed_size<_Np>>
+ __fixed_cvt;
+ return __fixed_cvt(__fixed_size_storage_t<_From, _Np>{__x});
+ }
+};
+
+// _SimdConverter fixed_size<_Np> -> "native" {{{1
+// i.e. ? registers to 1 register
+template <typename _From, int _Np, typename _To, typename _Ap>
+struct _SimdConverter<_From, simd_abi::fixed_size<_Np>, _To, _Ap,
+ std::enable_if_t<!__is_fixed_size_abi_v<_Ap>>>
+{
+ static_assert(
+ _Np == simd_size_v<_To, _Ap>,
+ "_SimdConverter to fixed_size only works for equal element counts");
+
+ _GLIBCXX_SIMD_INTRINSIC constexpr typename _SimdTraits<_To, _Ap>::_SimdMember
+ operator()(__fixed_size_storage_t<_From, _Np> __x) const noexcept
+ {
+ _SimdConverter<_From, simd_abi::fixed_size<_Np>, _To,
+ simd_abi::fixed_size<_Np>>
+ __fixed_cvt;
+ return __fixed_cvt(__x).first;
+ }
+};
+
+// }}}1
+_GLIBCXX_SIMD_END_NAMESPACE
+#endif // __cplusplus >= 201703L
+#endif // _GLIBCXX_EXPERIMENTAL_SIMD_CONVERTER_H_
+
+// vim: foldmethod=marker sw=2 noet ts=8 sts=2 tw=80
diff --git a/libstdc++-v3/include/experimental/bits/simd_detail.h b/libstdc++-v3/include/experimental/bits/simd_detail.h
new file mode 100644
index 00000000000..c8a40ecc3af
--- /dev/null
+++ b/libstdc++-v3/include/experimental/bits/simd_detail.h
@@ -0,0 +1,309 @@
+// Internal macros for the simd implementation -*- C++ -*-
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library. This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+// <http://www.gnu.org/licenses/>.
+
+#ifndef _GLIBCXX_EXPERIMENTAL_SIMD_DETAIL_H_
+#define _GLIBCXX_EXPERIMENTAL_SIMD_DETAIL_H_
+
+#if __cplusplus >= 201703L
+
+#include <cstddef>
+#include <cstdint>
+
+
+#define _GLIBCXX_SIMD_BEGIN_NAMESPACE \
+ namespace std _GLIBCXX_VISIBILITY(default) \
+ { \
+ _GLIBCXX_BEGIN_NAMESPACE_VERSION \
+ namespace experimental { \
+ inline namespace parallelism_v2 {
+#define _GLIBCXX_SIMD_END_NAMESPACE \
+ } \
+ } \
+ _GLIBCXX_END_NAMESPACE_VERSION \
+ }
+
+// ISA extension detection. The following defines all the
+// _GLIBCXX_SIMD_HAVE_XXX macros.
+// ARM{{{
+#if defined __ARM_NEON
+#define _GLIBCXX_SIMD_HAVE_NEON 1
+#else
+#define _GLIBCXX_SIMD_HAVE_NEON 0
+#endif
+#if defined __ARM_NEON && (__ARM_ARCH >= 8 || defined __aarch64__)
+#define _GLIBCXX_SIMD_HAVE_NEON_A32 1
+#else
+#define _GLIBCXX_SIMD_HAVE_NEON_A32 0
+#endif
+#if defined __ARM_NEON && defined __aarch64__
+#define _GLIBCXX_SIMD_HAVE_NEON_A64 1
+#else
+#define _GLIBCXX_SIMD_HAVE_NEON_A64 0
+#endif
+//}}}
+// x86{{{
+#ifdef __MMX__
+#define _GLIBCXX_SIMD_HAVE_MMX 1
+#else
+#define _GLIBCXX_SIMD_HAVE_MMX 0
+#endif
+#if defined __SSE__ || defined __x86_64__
+#define _GLIBCXX_SIMD_HAVE_SSE 1
+#else
+#define _GLIBCXX_SIMD_HAVE_SSE 0
+#endif
+#if defined __SSE2__ || defined __x86_64__
+#define _GLIBCXX_SIMD_HAVE_SSE2 1
+#else
+#define _GLIBCXX_SIMD_HAVE_SSE2 0
+#endif
+#ifdef __SSE3__
+#define _GLIBCXX_SIMD_HAVE_SSE3 1
+#else
+#define _GLIBCXX_SIMD_HAVE_SSE3 0
+#endif
+#ifdef __SSSE3__
+#define _GLIBCXX_SIMD_HAVE_SSSE3 1
+#else
+#define _GLIBCXX_SIMD_HAVE_SSSE3 0
+#endif
+#ifdef __SSE4_1__
+#define _GLIBCXX_SIMD_HAVE_SSE4_1 1
+#else
+#define _GLIBCXX_SIMD_HAVE_SSE4_1 0
+#endif
+#ifdef __SSE4_2__
+#define _GLIBCXX_SIMD_HAVE_SSE4_2 1
+#else
+#define _GLIBCXX_SIMD_HAVE_SSE4_2 0
+#endif
+#ifdef __XOP__
+#define _GLIBCXX_SIMD_HAVE_XOP 1
+#else
+#define _GLIBCXX_SIMD_HAVE_XOP 0
+#endif
+#ifdef __AVX__
+#define _GLIBCXX_SIMD_HAVE_AVX 1
+#else
+#define _GLIBCXX_SIMD_HAVE_AVX 0
+#endif
+#ifdef __AVX2__
+#define _GLIBCXX_SIMD_HAVE_AVX2 1
+#else
+#define _GLIBCXX_SIMD_HAVE_AVX2 0
+#endif
+#ifdef __BMI__
+#define _GLIBCXX_SIMD_HAVE_BMI1 1
+#else
+#define _GLIBCXX_SIMD_HAVE_BMI1 0
+#endif
+#ifdef __BMI2__
+#define _GLIBCXX_SIMD_HAVE_BMI2 1
+#else
+#define _GLIBCXX_SIMD_HAVE_BMI2 0
+#endif
+#ifdef __LZCNT__
+#define _GLIBCXX_SIMD_HAVE_LZCNT 1
+#else
+#define _GLIBCXX_SIMD_HAVE_LZCNT 0
+#endif
+#ifdef __SSE4A__
+#define _GLIBCXX_SIMD_HAVE_SSE4A 1
+#else
+#define _GLIBCXX_SIMD_HAVE_SSE4A 0
+#endif
+#ifdef __FMA__
+#define _GLIBCXX_SIMD_HAVE_FMA 1
+#else
+#define _GLIBCXX_SIMD_HAVE_FMA 0
+#endif
+#ifdef __FMA4__
+#define _GLIBCXX_SIMD_HAVE_FMA4 1
+#else
+#define _GLIBCXX_SIMD_HAVE_FMA4 0
+#endif
+#ifdef __F16C__
+#define _GLIBCXX_SIMD_HAVE_F16C 1
+#else
+#define _GLIBCXX_SIMD_HAVE_F16C 0
+#endif
+#ifdef __POPCNT__
+#define _GLIBCXX_SIMD_HAVE_POPCNT 1
+#else
+#define _GLIBCXX_SIMD_HAVE_POPCNT 0
+#endif
+#ifdef __AVX512F__
+#define _GLIBCXX_SIMD_HAVE_AVX512F 1
+#else
+#define _GLIBCXX_SIMD_HAVE_AVX512F 0
+#endif
+#ifdef __AVX512DQ__
+#define _GLIBCXX_SIMD_HAVE_AVX512DQ 1
+#else
+#define _GLIBCXX_SIMD_HAVE_AVX512DQ 0
+#endif
+#ifdef __AVX512VL__
+#define _GLIBCXX_SIMD_HAVE_AVX512VL 1
+#else
+#define _GLIBCXX_SIMD_HAVE_AVX512VL 0
+#endif
+#ifdef __AVX512BW__
+#define _GLIBCXX_SIMD_HAVE_AVX512BW 1
+#else
+#define _GLIBCXX_SIMD_HAVE_AVX512BW 0
+#endif
+
+#if _GLIBCXX_SIMD_HAVE_SSE
+#define _GLIBCXX_SIMD_HAVE_SSE_ABI 1
+#else
+#define _GLIBCXX_SIMD_HAVE_SSE_ABI 0
+#endif
+#if _GLIBCXX_SIMD_HAVE_SSE2
+#define _GLIBCXX_SIMD_HAVE_FULL_SSE_ABI 1
+#else
+#define _GLIBCXX_SIMD_HAVE_FULL_SSE_ABI 0
+#endif
+
+#if _GLIBCXX_SIMD_HAVE_AVX
+#define _GLIBCXX_SIMD_HAVE_AVX_ABI 1
+#else
+#define _GLIBCXX_SIMD_HAVE_AVX_ABI 0
+#endif
+#if _GLIBCXX_SIMD_HAVE_AVX2
+#define _GLIBCXX_SIMD_HAVE_FULL_AVX_ABI 1
+#else
+#define _GLIBCXX_SIMD_HAVE_FULL_AVX_ABI 0
+#endif
+
+#if _GLIBCXX_SIMD_HAVE_AVX512F
+#define _GLIBCXX_SIMD_HAVE_AVX512_ABI 1
+#else
+#define _GLIBCXX_SIMD_HAVE_AVX512_ABI 0
+#endif
+#if _GLIBCXX_SIMD_HAVE_AVX512BW
+#define _GLIBCXX_SIMD_HAVE_FULL_AVX512_ABI 1
+#else
+#define _GLIBCXX_SIMD_HAVE_FULL_AVX512_ABI 0
+#endif
+
+#if defined __x86_64__ && !_GLIBCXX_SIMD_HAVE_SSE2
+#error "Use of SSE2 is required on AMD64"
+#endif
+//}}}
+
+#define _GLIBCXX_SIMD_NORMAL_MATH \
+ [[__gnu__::__optimize__("finite-math-only,no-signed-zeros")]]
+#define _GLIBCXX_SIMD_NEVER_INLINE [[__gnu__::__noinline__]]
+#define _GLIBCXX_SIMD_INTRINSIC \
+ [[__gnu__::__always_inline__, __gnu__::__artificial__]] inline
+#define _GLIBCXX_SIMD_ALWAYS_INLINE [[__gnu__::__always_inline__]] inline
+#define _GLIBCXX_SIMD_IS_UNLIKELY(__x) __builtin_expect(__x, 0)
+#define _GLIBCXX_SIMD_IS_LIKELY(__x) __builtin_expect(__x, 1)
+#if defined __STRICT_ANSI__ && __STRICT_ANSI__
+#define _GLIBCXX_SIMD_CONSTEXPR
+#else
+#define _GLIBCXX_SIMD_CONSTEXPR constexpr
+#endif
+
+#define _GLIBCXX_SIMD_LIST_BINARY(__macro) __macro(|) __macro(&) __macro(^)
+#define _GLIBCXX_SIMD_LIST_SHIFTS(__macro) __macro(<<) __macro(>>)
+#define _GLIBCXX_SIMD_LIST_ARITHMETICS(__macro) \
+ __macro(+) __macro(-) __macro(*) __macro(/) __macro(%)
+
+#define _GLIBCXX_SIMD_ALL_BINARY(__macro) \
+ _GLIBCXX_SIMD_LIST_BINARY(__macro) static_assert(true)
+#define _GLIBCXX_SIMD_ALL_SHIFTS(__macro) \
+ _GLIBCXX_SIMD_LIST_SHIFTS(__macro) static_assert(true)
+#define _GLIBCXX_SIMD_ALL_ARITHMETICS(__macro) \
+ _GLIBCXX_SIMD_LIST_ARITHMETICS(__macro) static_assert(true)
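+
+// Illustration: the _LIST_ macros apply a caller-supplied macro to each
+// operator token. E.g. with a hypothetical
+//   #define _MY_OP(__op) /* declare something for operator __op */
+// the use `_GLIBCXX_SIMD_ALL_BINARY(_MY_OP);` expands to
+//   _MY_OP(|) _MY_OP(&) _MY_OP(^) static_assert(true);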
+
+#ifdef _GLIBCXX_SIMD_NO_ALWAYS_INLINE
+#undef _GLIBCXX_SIMD_ALWAYS_INLINE
+#define _GLIBCXX_SIMD_ALWAYS_INLINE inline
+#undef _GLIBCXX_SIMD_INTRINSIC
+#define _GLIBCXX_SIMD_INTRINSIC inline
+#endif
+
+#if _GLIBCXX_SIMD_HAVE_SSE || _GLIBCXX_SIMD_HAVE_MMX
+#define _GLIBCXX_SIMD_X86INTRIN 1
+#else
+#define _GLIBCXX_SIMD_X86INTRIN 0
+#endif
+
+// workaround macros {{{
+// Use aliasing loads to help GCC understand the data accesses better.
+// This also seems to hide a miscompilation on swap(x[i], x[i + 1]) with
+// fixed_size_simd<float, 16> x.
+#define _GLIBCXX_SIMD_USE_ALIASING_LOADS 1
+
+// vector conversions on x86 not optimized:
+#if _GLIBCXX_SIMD_X86INTRIN
+#define _GLIBCXX_SIMD_WORKAROUND_PR85048 1
+#endif
+
+// Invalid instruction mov from xmm16-31
+#define _GLIBCXX_SIMD_WORKAROUND_PR89229 1
+
+// integer division not optimized
+#define _GLIBCXX_SIMD_WORKAROUND_PR90993 1
+
+// very bad codegen for extraction and concatenation of 128/256 "subregisters"
+// with sizeof(element type) < 8: https://godbolt.org/g/mqUsgM
+#if _GLIBCXX_SIMD_X86INTRIN
+#define _GLIBCXX_SIMD_WORKAROUND_XXX_1 1
+#endif
+
+// bad codegen for 8-byte memcpy to __vector_type_t<char, 16>
+#define _GLIBCXX_SIMD_WORKAROUND_PR90424 1
+
+// bad codegen for zero-extend using simple concat(__x, 0)
+#if _GLIBCXX_SIMD_X86INTRIN
+#define _GLIBCXX_SIMD_WORKAROUND_XXX_3 1
+#endif
+
+// bad codegen for integer division
+#define _GLIBCXX_SIMD_WORKAROUND_XXX_4 1
+
+// The abs pattern may generate MMX instructions without EMMS cleanup. (This
+// only happens with SSSE3 because pabs[bwd] is part of SSSE3.)
+#if __GNUC__ < 10 && defined __SSSE3__ && _GLIBCXX_SIMD_X86INTRIN
+#define _GLIBCXX_SIMD_WORKAROUND_PR91533 1
+#endif
+
+#if __GNUC__ < 10 && defined __aarch64__
+#define _GLIBCXX_SIMD_WORKAROUND_XXX_5 1
+#endif
+
+// https://github.com/cplusplus/parallelism-ts/issues/65 (incorrect return type
+// of static_simd_cast)
+#define _GLIBCXX_SIMD_FIX_P2TS_ISSUE65 1
+
+// https://github.com/cplusplus/parallelism-ts/issues/66 (incorrect SFINAE
+// constraint on (static)_simd_cast)
+#define _GLIBCXX_SIMD_FIX_P2TS_ISSUE66 1
+// }}}
+
+#endif // __cplusplus >= 201703L
+#endif // _GLIBCXX_EXPERIMENTAL_SIMD_DETAIL_H_
+
+// vim: foldmethod=marker
diff --git a/libstdc++-v3/include/experimental/bits/simd_fixed_size.h b/libstdc++-v3/include/experimental/bits/simd_fixed_size.h
new file mode 100644
index 00000000000..2b643f28835
--- /dev/null
+++ b/libstdc++-v3/include/experimental/bits/simd_fixed_size.h
@@ -0,0 +1,2102 @@
+// Simd fixed_size ABI specific implementations -*- C++ -*-
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library. This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+// <http://www.gnu.org/licenses/>.
+
+/*
+ * The fixed_size ABI gives the following guarantees:
+ * - simd objects are passed via the stack
+ * - memory layout of `simd<_Tp, _Np>` is equivalent to `std::array<_Tp, _Np>`
+ * - alignment of `simd<_Tp, _Np>` is `_Np * sizeof(_Tp)` if _Np is a
+ * power-of-2 value, otherwise `__next_power_of_2(_Np * sizeof(_Tp))` (Note:
+ * if the alignment were to exceed the system/compiler maximum, it is bounded
+ * to that maximum)
+ * - simd_mask objects are passed like std::bitset<_Np>
+ * - memory layout of `simd_mask<_Tp, _Np>` is equivalent to `std::bitset<_Np>`
+ * - alignment of `simd_mask<_Tp, _Np>` is equal to the alignment of
+ * `std::bitset<_Np>`
+ */
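+// Worked example of the alignment guarantee above (illustrative only): for
+// simd<float, 7>, _Np * sizeof(_Tp) == 28 is not a power of two, so the
+// alignment is __next_power_of_2(28) == 32 (subject to the system/compiler
+// maximum); for simd<float, 8> it is 8 * sizeof(float) == 32 directly.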
+
+#ifndef _GLIBCXX_EXPERIMENTAL_SIMD_FIXED_SIZE_H_
+#define _GLIBCXX_EXPERIMENTAL_SIMD_FIXED_SIZE_H_
+
+#if __cplusplus >= 201703L
+
+#include <array>
+
+_GLIBCXX_SIMD_BEGIN_NAMESPACE
+
+// __simd_tuple_element {{{
+template <size_t _I, typename _Tp> struct __simd_tuple_element;
+template <typename _Tp, typename _A0, typename... _As>
+struct __simd_tuple_element<0, _SimdTuple<_Tp, _A0, _As...>>
+{
+ using type = std::experimental::simd<_Tp, _A0>;
+};
+template <size_t _I, typename _Tp, typename _A0, typename... _As>
+struct __simd_tuple_element<_I, _SimdTuple<_Tp, _A0, _As...>>
+{
+ using type =
+ typename __simd_tuple_element<_I - 1, _SimdTuple<_Tp, _As...>>::type;
+};
+template <size_t _I, typename _Tp>
+using __simd_tuple_element_t = typename __simd_tuple_element<_I, _Tp>::type;
+
+// }}}
+// __simd_tuple_concat {{{
+template <typename _Tp, typename... _A0s, typename... _A1s>
+_GLIBCXX_SIMD_INTRINSIC constexpr _SimdTuple<_Tp, _A0s..., _A1s...>
+__simd_tuple_concat(const _SimdTuple<_Tp, _A0s...>& __left,
+ const _SimdTuple<_Tp, _A1s...>& __right)
+{
+ if constexpr (sizeof...(_A0s) == 0)
+ return __right;
+ else if constexpr (sizeof...(_A1s) == 0)
+ return __left;
+ else
+ return {__left.first, __simd_tuple_concat(__left.second, __right)};
+}
+
+template <typename _Tp, typename _A10, typename... _A1s>
+_GLIBCXX_SIMD_INTRINSIC constexpr _SimdTuple<_Tp, simd_abi::scalar, _A10,
+ _A1s...>
+__simd_tuple_concat(const _Tp& __left,
+ const _SimdTuple<_Tp, _A10, _A1s...>& __right)
+{
+ return {__left, __right};
+}
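+
+// Illustration (hypothetical ABI names _A8, _A4): concatenating
+// _SimdTuple<float, _A8> and _SimdTuple<float, _A4> yields
+// _SimdTuple<float, _A8, _A4>; the scalar overload prepends a single value,
+// yielding _SimdTuple<float, simd_abi::scalar, _A8, _A4>.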
+
+// }}}
+// __simd_tuple_pop_front {{{
+template <size_t _Np, typename _Tp>
+_GLIBCXX_SIMD_INTRINSIC constexpr decltype(auto)
+__simd_tuple_pop_front(_Tp&& __x)
+{
+ if constexpr (_Np == 0)
+ return static_cast<_Tp&&>(__x);
+ else
+ return __simd_tuple_pop_front<_Np - 1>(__x.second);
+}
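+
+// Illustration: __simd_tuple_pop_front<0>(__x) returns __x unchanged, and
+// __simd_tuple_pop_front<1>(__x) returns __x.second, i.e. the tuple with its
+// first chunk removed; larger _Np recurses accordingly.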
+
+// }}}
+// __get_simd_at<_Np> {{{1
+struct __as_simd
+{
+};
+struct __as_simd_tuple
+{
+};
+template <typename _Tp, typename _A0, typename... _Abis>
+_GLIBCXX_SIMD_INTRINSIC constexpr simd<_Tp, _A0>
+__simd_tuple_get_impl(__as_simd, const _SimdTuple<_Tp, _A0, _Abis...>& __t,
+ _SizeConstant<0>)
+{
+ return {__private_init, __t.first};
+}
+template <typename _Tp, typename _A0, typename... _Abis>
+_GLIBCXX_SIMD_INTRINSIC constexpr const auto&
+__simd_tuple_get_impl(__as_simd_tuple,
+ const _SimdTuple<_Tp, _A0, _Abis...>& __t,
+ _SizeConstant<0>)
+{
+ return __t.first;
+}
+template <typename _Tp, typename _A0, typename... _Abis>
+_GLIBCXX_SIMD_INTRINSIC constexpr auto&
+__simd_tuple_get_impl(__as_simd_tuple, _SimdTuple<_Tp, _A0, _Abis...>& __t,
+ _SizeConstant<0>)
+{
+ return __t.first;
+}
+
+template <typename _R, size_t _Np, typename _Tp, typename... _Abis>
+_GLIBCXX_SIMD_INTRINSIC constexpr auto
+__simd_tuple_get_impl(_R, const _SimdTuple<_Tp, _Abis...>& __t,
+ _SizeConstant<_Np>)
+{
+ return __simd_tuple_get_impl(_R(), __t.second, _SizeConstant<_Np - 1>());
+}
+template <size_t _Np, typename _Tp, typename... _Abis>
+_GLIBCXX_SIMD_INTRINSIC constexpr auto&
+__simd_tuple_get_impl(__as_simd_tuple, _SimdTuple<_Tp, _Abis...>& __t,
+ _SizeConstant<_Np>)
+{
+ return __simd_tuple_get_impl(__as_simd_tuple(), __t.second,
+ _SizeConstant<_Np - 1>());
+}
+
+template <size_t _Np, typename _Tp, typename... _Abis>
+_GLIBCXX_SIMD_INTRINSIC constexpr auto
+__get_simd_at(const _SimdTuple<_Tp, _Abis...>& __t)
+{
+ return __simd_tuple_get_impl(__as_simd(), __t, _SizeConstant<_Np>());
+}
+
+// }}}
+// __get_tuple_at<_Np> {{{
+template <size_t _Np, typename _Tp, typename... _Abis>
+_GLIBCXX_SIMD_INTRINSIC constexpr auto
+__get_tuple_at(const _SimdTuple<_Tp, _Abis...>& __t)
+{
+ return __simd_tuple_get_impl(__as_simd_tuple(), __t, _SizeConstant<_Np>());
+}
+
+template <size_t _Np, typename _Tp, typename... _Abis>
+_GLIBCXX_SIMD_INTRINSIC constexpr auto&
+__get_tuple_at(_SimdTuple<_Tp, _Abis...>& __t)
+{
+ return __simd_tuple_get_impl(__as_simd_tuple(), __t, _SizeConstant<_Np>());
+}
+
+// __tuple_element_meta {{{1
+template <typename _Tp, typename _Abi, size_t _Offset>
+struct __tuple_element_meta : public _Abi::_SimdImpl
+{
+ static_assert(is_same_v<typename _Abi::_SimdImpl::abi_type,
+ _Abi>); // this fails e.g. when _SimdImpl is an alias
+ // for _SimdImplBuiltin<_DifferentAbi>
+ using value_type = _Tp;
+ using abi_type = _Abi;
+ using _Traits = _SimdTraits<_Tp, _Abi>;
+ using _MaskImpl = typename _Abi::_MaskImpl;
+ using _MaskMember = typename _Traits::_MaskMember;
+ using simd_type = std::experimental::simd<_Tp, _Abi>;
+ static constexpr size_t _S_offset = _Offset;
+ static constexpr size_t size() { return simd_size<_Tp, _Abi>::value; }
+ static constexpr _MaskImpl _S_mask_impl = {};
+
+ template <size_t _Np, bool _Sanitized>
+ _GLIBCXX_SIMD_INTRINSIC static auto
+ __submask(_BitMask<_Np, _Sanitized> __bits)
+ {
+ return __bits.template _M_extract<_Offset, size()>();
+ }
+
+ template <size_t _Np, bool _Sanitized>
+ _GLIBCXX_SIMD_INTRINSIC static _MaskMember
+ __make_mask(_BitMask<_Np, _Sanitized> __bits)
+ {
+ return _MaskImpl::template __convert<_Tp>(
+ __bits.template _M_extract<_Offset, size()>()._M_sanitized());
+ }
+
+ _GLIBCXX_SIMD_INTRINSIC static _ULLong
+ __mask_to_shifted_ullong(_MaskMember __k)
+ {
+ return _MaskImpl::__to_bits(__k).to_ullong() << _Offset;
+ }
+};
+
+template <size_t _Offset, typename _Tp, typename _Abi, typename... _As>
+__tuple_element_meta<_Tp, _Abi, _Offset>
+__make_meta(const _SimdTuple<_Tp, _Abi, _As...>&)
+{
+ return {};
+}
+
+// }}}1
+// _WithOffset wrapper class {{{
+template <size_t _Offset, typename _Base> struct _WithOffset : public _Base
+{
+ static inline constexpr size_t _S_offset = _Offset;
+
+ _GLIBCXX_SIMD_INTRINSIC char* __as_charptr()
+ {
+ return reinterpret_cast<char*>(this)
+ + _S_offset * sizeof(typename _Base::value_type);
+ }
+ _GLIBCXX_SIMD_INTRINSIC const char* __as_charptr() const
+ {
+ return reinterpret_cast<const char*>(this)
+ + _S_offset * sizeof(typename _Base::value_type);
+ }
+};
+
+// make _WithOffset<_WithOffset> ill-formed to use:
+template <size_t _O0, size_t _O1, typename _Base>
+struct _WithOffset<_O0, _WithOffset<_O1, _Base>>
+{
+};
+
+template <size_t _Offset, typename _Tp>
+decltype(auto)
+__add_offset(_Tp& __base)
+{
+ return static_cast<_WithOffset<_Offset, __remove_cvref_t<_Tp>>&>(__base);
+}
+template <size_t _Offset, typename _Tp>
+decltype(auto)
+__add_offset(const _Tp& __base)
+{
+ return static_cast<const _WithOffset<_Offset, __remove_cvref_t<_Tp>>&>(
+ __base);
+}
+template <size_t _Offset, size_t _ExistingOffset, typename _Tp>
+decltype(auto)
+__add_offset(_WithOffset<_ExistingOffset, _Tp>& __base)
+{
+ return static_cast<_WithOffset<_Offset + _ExistingOffset, _Tp>&>(
+ static_cast<_Tp&>(__base));
+}
+template <size_t _Offset, size_t _ExistingOffset, typename _Tp>
+decltype(auto)
+__add_offset(const _WithOffset<_ExistingOffset, _Tp>& __base)
+{
+ return static_cast<const _WithOffset<_Offset + _ExistingOffset, _Tp>&>(
+ static_cast<const _Tp&>(__base));
+}
+
+template <typename _Tp> constexpr inline size_t __offset = 0;
+template <size_t _Offset, typename _Tp>
+constexpr inline size_t
+ __offset<_WithOffset<_Offset, _Tp>> = _WithOffset<_Offset, _Tp>::_S_offset;
+template <typename _Tp>
+constexpr inline size_t __offset<const _Tp> = __offset<_Tp>;
+template <typename _Tp> constexpr inline size_t __offset<_Tp&> = __offset<_Tp>;
+template <typename _Tp> constexpr inline size_t __offset<_Tp&&> = __offset<_Tp>;
+
+// }}}
+// _SimdTuple specializations {{{1
+// empty {{{2
+template <typename _Tp> struct _SimdTuple<_Tp>
+{
+ using value_type = _Tp;
+ static constexpr size_t _S_tuple_size = 0;
+ static constexpr size_t size() { return 0; }
+};
+
+// _SimdTupleData {{{2
+template <typename _FirstType, typename _SecondType> struct _SimdTupleData
+{
+ _FirstType first;
+ _SecondType second;
+
+ _GLIBCXX_SIMD_INTRINSIC
+ constexpr bool _M_is_constprop() const
+ {
+ if constexpr(is_class_v<_FirstType>)
+ return first._M_is_constprop() && second._M_is_constprop();
+ else
+ return __builtin_constant_p(first) && second._M_is_constprop();
+ }
+};
+
+template <typename _FirstType, typename _Tp>
+struct _SimdTupleData<_FirstType, _SimdTuple<_Tp>>
+{
+ _FirstType first;
+ static constexpr _SimdTuple<_Tp> second = {};
+
+ _GLIBCXX_SIMD_INTRINSIC
+ constexpr bool _M_is_constprop() const
+ {
+ if constexpr(is_class_v<_FirstType>)
+ return first._M_is_constprop();
+ else
+ return __builtin_constant_p(first);
+ }
+};
+
+// 1 or more {{{2
+template <typename _Tp, typename _Abi0, typename... _Abis>
+struct _SimdTuple<_Tp, _Abi0, _Abis...>
+ : _SimdTupleData<typename _SimdTraits<_Tp, _Abi0>::_SimdMember,
+ _SimdTuple<_Tp, _Abis...>>
+{
+ static_assert(!__is_fixed_size_abi_v<_Abi0>);
+ using value_type = _Tp;
+ using _FirstType = typename _SimdTraits<_Tp, _Abi0>::_SimdMember;
+ using _FirstAbi = _Abi0;
+ using _SecondType = _SimdTuple<_Tp, _Abis...>;
+ static constexpr size_t _S_tuple_size = sizeof...(_Abis) + 1;
+ static constexpr size_t size()
+ {
+ return simd_size_v<_Tp, _Abi0> + _SecondType::size();
+ }
+ static constexpr size_t _S_first_size = simd_size_v<_Tp, _Abi0>;
+
+ using _Base = _SimdTupleData<typename _SimdTraits<_Tp, _Abi0>::_SimdMember,
+ _SimdTuple<_Tp, _Abis...>>;
+ using _Base::first;
+ using _Base::second;
+
+ _GLIBCXX_SIMD_INTRINSIC constexpr _SimdTuple() = default;
+ _GLIBCXX_SIMD_INTRINSIC constexpr _SimdTuple(const _SimdTuple&) = default;
+ _GLIBCXX_SIMD_INTRINSIC constexpr _SimdTuple& operator=(const _SimdTuple&)
+ = default;
+
+ template <typename _Up>
+ _GLIBCXX_SIMD_INTRINSIC constexpr _SimdTuple(_Up&& __x)
+ : _Base{static_cast<_Up&&>(__x)}
+ {}
+ template <typename _Up, typename _Up2>
+ _GLIBCXX_SIMD_INTRINSIC constexpr _SimdTuple(_Up&& __x, _Up2&& __y)
+ : _Base{static_cast<_Up&&>(__x), static_cast<_Up2&&>(__y)}
+ {}
+ template <typename _Up>
+ _GLIBCXX_SIMD_INTRINSIC constexpr _SimdTuple(_Up&& __x, _SimdTuple<_Tp>)
+ : _Base{static_cast<_Up&&>(__x)}
+ {}
+
+ _GLIBCXX_SIMD_INTRINSIC char* __as_charptr()
+ {
+ return reinterpret_cast<char*>(this);
+ }
+ _GLIBCXX_SIMD_INTRINSIC const char* __as_charptr() const
+ {
+ return reinterpret_cast<const char*>(this);
+ }
+
+ template <size_t _Np> _GLIBCXX_SIMD_INTRINSIC constexpr auto& __at()
+ {
+ if constexpr (_Np == 0)
+ return first;
+ else
+ return second.template __at<_Np - 1>();
+ }
+ template <size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC constexpr const auto& __at() const
+ {
+ if constexpr (_Np == 0)
+ return first;
+ else
+ return second.template __at<_Np - 1>();
+ }
+
+ template <size_t _Np> _GLIBCXX_SIMD_INTRINSIC constexpr auto __simd_at() const
+ {
+ if constexpr (_Np == 0)
+ return simd<_Tp, _Abi0>(__private_init, first);
+ else
+ return second.template __simd_at<_Np - 1>();
+ }
+
+ template <size_t _Offset = 0, typename _Fp>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdTuple
+ __generate(_Fp&& __gen, _SizeConstant<_Offset> = {})
+ {
+ auto&& __first = __gen(__tuple_element_meta<_Tp, _Abi0, _Offset>());
+ if constexpr (_S_tuple_size == 1)
+ return {__first};
+ else
+ return {__first, _SecondType::__generate(
+ static_cast<_Fp&&>(__gen),
+ _SizeConstant<_Offset + simd_size_v<_Tp, _Abi0>>())};
+ }
+
+ template <size_t _Offset = 0, typename _Fp, typename... _More>
+ _GLIBCXX_SIMD_INTRINSIC _SimdTuple
+ __apply_wrapped(_Fp&& __fun, const _More&... __more) const
+ {
+ auto&& __first = __fun(__make_meta<_Offset>(*this), first, __more.first...);
+ if constexpr (_S_tuple_size == 1)
+ return {__first};
+ else
+ return {
+ __first,
+ second.template __apply_wrapped<_Offset + simd_size_v<_Tp, _Abi0>>(
+ static_cast<_Fp&&>(__fun), __more.second...)};
+ }
+
+ template <size_t _Size, size_t _Offset = 0,
+ typename _R = __fixed_size_storage_t<_Tp, _Size>>
+ _GLIBCXX_SIMD_INTRINSIC constexpr _R __extract_tuple_with_size() const
+ {
+ if constexpr (_Size == _S_first_size && _Offset == 0)
+ return {first};
+ else if constexpr (_Size > _S_first_size && _Offset == 0
+ && _S_tuple_size > 1)
+ return {
+ first,
+ second.template __extract_tuple_with_size<_Size - _S_first_size>()};
+ else if constexpr (_Size == 1)
+ return {operator[](_SizeConstant<_Offset>())};
+ else if constexpr (_R::_S_tuple_size == 1)
+ {
+ static_assert(_Offset % _Size == 0);
+ static_assert(_S_first_size % _Size == 0);
+ return {typename _R::_FirstType(
+ __private_init,
+ __extract_part<_Offset / _Size, _S_first_size / _Size>(first))};
+ }
+ else
+ __assert_unreachable<_SizeConstant<_Size>>();
+ }
+
+ template <typename _Tup>
+ _GLIBCXX_SIMD_INTRINSIC constexpr decltype(auto)
+ __extract_argument(_Tup&& __tup) const
+ {
+ using _TupT = typename __remove_cvref_t<_Tup>::value_type;
+ if constexpr (is_same_v<_SimdTuple, __remove_cvref_t<_Tup>>)
+ return __tup.first;
+ else if (__builtin_is_constant_evaluated())
+ return __fixed_size_storage_t<_TupT, _S_first_size>::__generate([&](
+ auto __meta) constexpr {
+ return __meta.__generator(
+ [&](auto __i) constexpr { return __tup[__i]; },
+ static_cast<_TupT*>(nullptr));
+ });
+ else
+ return [&]() {
+ __fixed_size_storage_t<_TupT, _S_first_size> __r;
+ __builtin_memcpy(__r.__as_charptr(), __tup.__as_charptr(), sizeof(__r));
+ return __r;
+ }();
+ }
+
+ template <typename _Tup>
+ _GLIBCXX_SIMD_INTRINSIC constexpr auto& __skip_argument(_Tup&& __tup) const
+ {
+ static_assert(_S_tuple_size > 1);
+ using _Up = __remove_cvref_t<_Tup>;
+ constexpr size_t __off = __offset<_Up>;
+ if constexpr (_S_first_size == _Up::_S_first_size && __off == 0)
+ return __tup.second;
+ else if constexpr (_S_first_size > _Up::_S_first_size
+ && _S_first_size % _Up::_S_first_size == 0 && __off == 0)
+ return __simd_tuple_pop_front<_S_first_size / _Up::_S_first_size>(__tup);
+ else if constexpr (_S_first_size + __off < _Up::_S_first_size)
+ return __add_offset<_S_first_size>(__tup);
+ else if constexpr (_S_first_size + __off == _Up::_S_first_size)
+ return __tup.second;
+ else
+ __assert_unreachable<_Tup>();
+ }
+
+ template <size_t _Offset, typename... _More>
+ _GLIBCXX_SIMD_INTRINSIC constexpr void
+ __assign_front(const _SimdTuple<_Tp, _Abi0, _More...>& __x) &
+ {
+ static_assert(_Offset == 0);
+ first = __x.first;
+ if constexpr (sizeof...(_More) > 0)
+ {
+ static_assert(sizeof...(_Abis) >= sizeof...(_More));
+ second.template __assign_front<0>(__x.second);
+ }
+ }
+
+ template <size_t _Offset>
+ _GLIBCXX_SIMD_INTRINSIC constexpr void __assign_front(const _FirstType& __x) &
+ {
+ static_assert(_Offset == 0);
+ first = __x;
+ }
+
+ template <size_t _Offset, typename... _As>
+ _GLIBCXX_SIMD_INTRINSIC constexpr void
+ __assign_front(const _SimdTuple<_Tp, _As...>& __x) &
+ {
+ __builtin_memcpy(__as_charptr() + _Offset * sizeof(value_type),
+ __x.__as_charptr(),
+ sizeof(_Tp) * _SimdTuple<_Tp, _As...>::size());
+ }
+
+ /*
+   * Iterate over the chunks (i.e. the `first` members) of this _SimdTuple and
+   * call __fun for each of them. If additional arguments are passed via
+   * __more, chunk them into _SimdTuple or __vector_type_t objects of the same
+   * number of values.
+ */
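+  // Illustration (hypothetical ABI names): for a _SimdTuple<float, _A8, _A4>
+  // holding 12 values, __fun is invoked twice, once with the 8-wide chunk and
+  // once with the 4-wide chunk; each argument in __more is re-chunked via
+  // __extract_argument/__skip_argument so that matching chunks line up.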
+ template <typename _Fp, typename... _More>
+ _GLIBCXX_SIMD_INTRINSIC constexpr _SimdTuple
+ __apply_per_chunk(_Fp&& __fun, _More&&... __more) const
+ {
+ if constexpr ((...
+ || conjunction_v<
+ is_lvalue_reference<_More>,
+ negation<is_const<remove_reference_t<_More>>>>) )
+ {
+ // need to write back at least one of __more after calling __fun
+ auto&& __first = [&](auto... __args) constexpr
+ {
+ auto __r
+ = __fun(__tuple_element_meta<_Tp, _Abi0, 0>(), first, __args...);
+ [[maybe_unused]] auto&& __ignore_me = {(
+ [](auto&& __dst, const auto& __src) {
+ if constexpr (is_assignable_v<decltype(__dst), decltype(__dst)>)
+ {
+ __dst.template __assign_front<__offset<decltype(__dst)>>(
+ __src);
+ }
+ }(static_cast<_More&&>(__more), __args),
+ 0)...};
+ return __r;
+ }
+ (__extract_argument(__more)...);
+ if constexpr (_S_tuple_size == 1)
+ return {__first};
+ else
+ return {__first,
+ second.__apply_per_chunk(static_cast<_Fp&&>(__fun),
+ __skip_argument(__more)...)};
+ }
+ else
+ {
+ auto&& __first = __fun(__tuple_element_meta<_Tp, _Abi0, 0>(), first,
+ __extract_argument(__more)...);
+ if constexpr (_S_tuple_size == 1)
+ return {__first};
+ else
+ return {__first,
+ second.__apply_per_chunk(static_cast<_Fp&&>(__fun),
+ __skip_argument(__more)...)};
+ }
+ }
+
+ template <typename _R = _Tp, typename _Fp, typename... _More>
+ _GLIBCXX_SIMD_INTRINSIC auto __apply_r(_Fp&& __fun,
+ const _More&... __more) const
+ {
+ auto&& __first
+ = __fun(__tuple_element_meta<_Tp, _Abi0, 0>(), first, __more.first...);
+ if constexpr (_S_tuple_size == 1)
+ return __first;
+ else
+ return __simd_tuple_concat<_R>(
+ __first, second.template __apply_r<_R>(static_cast<_Fp&&>(__fun),
+ __more.second...));
+ }
+
+ template <typename _Fp, typename... _More>
+ _GLIBCXX_SIMD_INTRINSIC constexpr friend _SanitizedBitMask<size()>
+ __test(const _Fp& __fun, const _SimdTuple& __x, const _More&... __more)
+ {
+ const _SanitizedBitMask<_S_first_size> __first
+ = _Abi0::_MaskImpl::__to_bits(__fun(__tuple_element_meta<_Tp, _Abi0, 0>(),
+ __x.first, __more.first...));
+ if constexpr (_S_tuple_size == 1)
+ return __first;
+ else
+ return __test(__fun, __x.second, __more.second...)._M_prepend(__first);
+ }
+
+ template <typename _Up, _Up _I>
+ _GLIBCXX_SIMD_INTRINSIC constexpr _Tp
+ operator[](std::integral_constant<_Up, _I>) const noexcept
+ {
+ if constexpr (_I < simd_size_v<_Tp, _Abi0>)
+ return __subscript_read(_I);
+ else
+ return second[std::integral_constant<_Up,
+ _I - simd_size_v<_Tp, _Abi0>>()];
+ }
+
+ _Tp operator[](size_t __i) const noexcept
+ {
+ if constexpr (_S_tuple_size == 1)
+ return __subscript_read(__i);
+ else
+ {
+#ifdef _GLIBCXX_SIMD_USE_ALIASING_LOADS
+ return reinterpret_cast<const __may_alias<_Tp>*>(this)[__i];
+#else
+ if constexpr (__is_scalar_abi<_Abi0>())
+ {
+ const _Tp* ptr = &first;
+ return ptr[__i];
+ }
+ else
+ return __i < simd_size_v<_Tp, _Abi0>
+ ? __subscript_read(__i)
+ : second[__i - simd_size_v<_Tp, _Abi0>];
+#endif
+ }
+ }
+
+ void __set(size_t __i, _Tp __val) noexcept
+ {
+ if constexpr (_S_tuple_size == 1)
+ return __subscript_write(__i, __val);
+ else
+ {
+#ifdef _GLIBCXX_SIMD_USE_ALIASING_LOADS
+ reinterpret_cast<__may_alias<_Tp>*>(this)[__i] = __val;
+#else
+ if (__i < simd_size_v<_Tp, _Abi0>)
+ __subscript_write(__i, __val);
+ else
+ second.__set(__i - simd_size_v<_Tp, _Abi0>, __val);
+#endif
+ }
+ }
+
+private:
+ // __subscript_read/_write {{{
+ _Tp __subscript_read([[maybe_unused]] size_t __i) const noexcept
+ {
+ if constexpr (__is_vectorizable_v<_FirstType>)
+ return first;
+ else
+ return first[__i];
+ }
+
+ void __subscript_write([[maybe_unused]] size_t __i, _Tp __y) noexcept
+ {
+ if constexpr (__is_vectorizable_v<_FirstType>)
+ first = __y;
+ else
+ first.__set(__i, __y);
+ }
+
+ // }}}
+};
+
+// __make_simd_tuple {{{1
+template <typename _Tp, typename _A0>
+_GLIBCXX_SIMD_INTRINSIC _SimdTuple<_Tp, _A0>
+__make_simd_tuple(std::experimental::simd<_Tp, _A0> __x0)
+{
+ return {__data(__x0)};
+}
+template <typename _Tp, typename _A0, typename... _As>
+_GLIBCXX_SIMD_INTRINSIC _SimdTuple<_Tp, _A0, _As...>
+__make_simd_tuple(const std::experimental::simd<_Tp, _A0>& __x0,
+ const std::experimental::simd<_Tp, _As>&... __xs)
+{
+ return {__data(__x0), __make_simd_tuple(__xs...)};
+}
+
+template <typename _Tp, typename _A0>
+_GLIBCXX_SIMD_INTRINSIC _SimdTuple<_Tp, _A0>
+__make_simd_tuple(const typename _SimdTraits<_Tp, _A0>::_SimdMember& __arg0)
+{
+ return {__arg0};
+}
+
+template <typename _Tp, typename _A0, typename _A1, typename... _Abis>
+_GLIBCXX_SIMD_INTRINSIC _SimdTuple<_Tp, _A0, _A1, _Abis...>
+__make_simd_tuple(
+ const typename _SimdTraits<_Tp, _A0>::_SimdMember& __arg0,
+ const typename _SimdTraits<_Tp, _A1>::_SimdMember& __arg1,
+ const typename _SimdTraits<_Tp, _Abis>::_SimdMember&... __args)
+{
+ return {__arg0, __make_simd_tuple<_Tp, _A1, _Abis...>(__arg1, __args...)};
+}
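+
+// Illustration (hypothetical ABI names _A8, _A4):
+//   __make_simd_tuple(simd<float, _A8>(), simd<float, _A4>())
+// yields a _SimdTuple<float, _A8, _A4> holding the two simd objects' data.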
+
+// __to_simd_tuple {{{1
+template <typename _Tp, size_t _Np, typename _V, size_t _NV, typename... _VX>
+_GLIBCXX_SIMD_INTRINSIC constexpr __fixed_size_storage_t<_Tp, _Np>
+__to_simd_tuple(const std::array<_V, _NV>& __from, const _VX... __fromX);
+
+template <typename _Tp, size_t _Np,
+ size_t _Offset = 0, // skip this many elements in __from0
+ typename _R = __fixed_size_storage_t<_Tp, _Np>, typename _V0,
+ typename _V0VT = _VectorTraits<_V0>, typename... _VX>
+_GLIBCXX_SIMD_INTRINSIC _R constexpr __to_simd_tuple(const _V0 __from0,
+ const _VX... __fromX)
+{
+ static_assert(std::is_same_v<typename _V0VT::value_type, _Tp>);
+ static_assert(_Offset < _V0VT::_S_width);
+ using _R0 = __vector_type_t<_Tp, _R::_S_first_size>;
+ if constexpr (_R::_S_tuple_size == 1)
+ {
+ if constexpr (_Np == 1)
+ return _R{__from0[_Offset]};
+ else if constexpr (_Offset == 0 && _V0VT::_S_width >= _Np)
+ return _R{__intrin_bitcast<_R0>(__from0)};
+ else if constexpr (_Offset * 2 == _V0VT::_S_width
+ && _V0VT::_S_width / 2 >= _Np)
+ return _R{__intrin_bitcast<_R0>(__extract_part<1, 2>(__from0))};
+ else if constexpr (_Offset * 4 == _V0VT::_S_width
+ && _V0VT::_S_width / 4 >= _Np)
+ return _R{__intrin_bitcast<_R0>(__extract_part<1, 4>(__from0))};
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else
+ {
+ if constexpr (1 == _R::_S_first_size)
+ { // extract one scalar and recurse
+ if constexpr (_Offset + 1 < _V0VT::_S_width)
+ return _R{__from0[_Offset],
+ __to_simd_tuple<_Tp, _Np - 1, _Offset + 1>(__from0,
+ __fromX...)};
+ else
+ return _R{__from0[_Offset],
+ __to_simd_tuple<_Tp, _Np - 1, 0>(__fromX...)};
+ }
+
+ // place __from0 into _R::first and recurse for __fromX -> _R::second
+ else if constexpr (_V0VT::_S_width == _R::_S_first_size && _Offset == 0)
+ return _R{__from0,
+ __to_simd_tuple<_Tp, _Np - _R::_S_first_size>(__fromX...)};
+
+ // place lower part of __from0 into _R::first and recurse with _Offset
+ else if constexpr (_V0VT::_S_width > _R::_S_first_size && _Offset == 0)
+ return _R{__intrin_bitcast<_R0>(__from0),
+ __to_simd_tuple<_Tp, _Np - _R::_S_first_size,
+ _R::_S_first_size>(__from0, __fromX...)};
+
+ // place lower part of second quarter of __from0 into _R::first and
+ // recurse with _Offset
+ else if constexpr (_Offset * 4 == _V0VT::_S_width
+ && _V0VT::_S_width >= 4 * _R::_S_first_size)
+ return _R{__intrin_bitcast<_R0>(__extract_part<2, 4>(__from0)),
+ __to_simd_tuple<_Tp, _Np - _R::_S_first_size,
+ _Offset + _R::_S_first_size>(__from0,
+ __fromX...)};
+
+ // place lower half of high half of __from0 into _R::first and recurse
+ // with _Offset
+ else if constexpr (_Offset * 2 == _V0VT::_S_width
+ && _V0VT::_S_width >= 4 * _R::_S_first_size)
+ return _R{__intrin_bitcast<_R0>(__extract_part<2, 4>(__from0)),
+ __to_simd_tuple<_Tp, _Np - _R::_S_first_size,
+ _Offset + _R::_S_first_size>(__from0,
+ __fromX...)};
+
+ // place high half of __from0 into _R::first and recurse with __fromX
+ else if constexpr (_Offset * 2 == _V0VT::_S_width
+ && _V0VT::_S_width / 2 >= _R::_S_first_size)
+ return _R{__intrin_bitcast<_R0>(__extract_part<1, 2>(__from0)),
+ __to_simd_tuple<_Tp, _Np - _R::_S_first_size, 0>(__fromX...)};
+
+      // ill-formed if some unforeseen pattern is needed
+ else
+ __assert_unreachable<_Tp>();
+ }
+}
+
+template <typename _Tp, size_t _Np, typename _V, size_t _NV, typename... _VX>
+_GLIBCXX_SIMD_INTRINSIC constexpr __fixed_size_storage_t<_Tp, _Np>
+__to_simd_tuple(const std::array<_V, _NV>& __from, const _VX... __fromX)
+{
+ if constexpr (std::is_same_v<_Tp, _V>)
+ {
+ static_assert(
+ sizeof...(_VX) == 0,
+ "An array of scalars must be the last argument to __to_simd_tuple");
+ return __call_with_subscripts(
+ __from,
+ std::make_index_sequence<_NV>(), [&](const auto... __args) constexpr {
+ return __simd_tuple_concat(
+ _SimdTuple<_Tp, simd_abi::scalar>{__args}..., _SimdTuple<_Tp>());
+ });
+ }
+ else
+ return __call_with_subscripts(
+ __from,
+ std::make_index_sequence<_NV>(), [&](const auto... __args) constexpr {
+ return __to_simd_tuple<_Tp, _Np>(__args..., __fromX...);
+ });
+}
+
+template <size_t, typename _Tp> using __to_tuple_helper = _Tp;
+template <typename _Tp, typename _A0, size_t _NOut, size_t _Np,
+ size_t... _Indexes>
+_GLIBCXX_SIMD_INTRINSIC __fixed_size_storage_t<_Tp, _NOut>
+__to_simd_tuple_impl(
+ std::index_sequence<_Indexes...>,
+ const std::array<__vector_type_t<_Tp, simd_size_v<_Tp, _A0>>, _Np>& __args)
+{
+ return __make_simd_tuple<_Tp, __to_tuple_helper<_Indexes, _A0>...>(
+ __args[_Indexes]...);
+}
+
+template <typename _Tp, typename _A0, size_t _NOut, size_t _Np,
+ typename _R = __fixed_size_storage_t<_Tp, _NOut>>
+_GLIBCXX_SIMD_INTRINSIC _R
+__to_simd_tuple_sized(
+ const std::array<__vector_type_t<_Tp, simd_size_v<_Tp, _A0>>, _Np>& __args)
+{
+ static_assert(_Np * simd_size_v<_Tp, _A0> >= _NOut);
+ return __to_simd_tuple_impl<_Tp, _A0, _NOut>(
+ std::make_index_sequence<_R::_S_tuple_size>(), __args);
+}
+
+template <typename _Tp, typename _A0, size_t _Np>
+[[deprecated]] _GLIBCXX_SIMD_INTRINSIC auto
+__to_simd_tuple(
+ const std::array<__vector_type_t<_Tp, simd_size_v<_Tp, _A0>>, _Np>& __args)
+{
+ return __to_simd_tuple<_Tp, _Np * simd_size_v<_Tp, _A0>>(__args);
+}
+
+// __optimize_simd_tuple {{{1
+template <typename _Tp>
+_GLIBCXX_SIMD_INTRINSIC _SimdTuple<_Tp>
+__optimize_simd_tuple(const _SimdTuple<_Tp>)
+{
+ return {};
+}
+
+template <typename _Tp, typename _Ap>
+_GLIBCXX_SIMD_INTRINSIC const _SimdTuple<_Tp, _Ap>&
+__optimize_simd_tuple(const _SimdTuple<_Tp, _Ap>& __x)
+{
+ return __x;
+}
+
+template <typename _Tp, typename _A0, typename _A1, typename... _Abis,
+ typename _R = __fixed_size_storage_t<
+ _Tp, _SimdTuple<_Tp, _A0, _A1, _Abis...>::size()>>
+_GLIBCXX_SIMD_INTRINSIC _R
+__optimize_simd_tuple(const _SimdTuple<_Tp, _A0, _A1, _Abis...>& __x)
+{
+ using _Tup = _SimdTuple<_Tp, _A0, _A1, _Abis...>;
+ if constexpr (std::is_same_v<_R, _Tup>)
+ return __x;
+ else if constexpr (is_same_v<typename _R::_FirstType,
+ typename _Tup::_FirstType>)
+ return {__x.first, __optimize_simd_tuple(__x.second)};
+ else if constexpr (__is_scalar_abi<_A0>()) // implies all entries are scalar
+ return {
+ __generate_from_n_evaluations<_R::_S_first_size, typename _R::_FirstType>(
+ [&](auto __i) { return __x[__i]; }),
+ __optimize_simd_tuple(__simd_tuple_pop_front<_R::_S_first_size>(__x))};
+ else if constexpr (_R::_S_first_size
+ == simd_size_v<
+ _Tp,
+ _A0> + simd_size_v<_Tp, _A1> && is_same_v<_A0, _A1>)
+ return {__concat(__x.template __at<0>(), __x.template __at<1>()),
+ __optimize_simd_tuple(__x.second.second)};
+ else if constexpr (
+ sizeof...(_Abis) >= 2
+ && _R::_S_first_size
+ == 4
+ * simd_size_v<
+ _Tp,
+ _A0> && simd_size_v<_Tp, _A0> == __simd_tuple_element_t<(sizeof...(_Abis) >= 2 ? 3 : 0), _Tup>::size())
+ return {__concat(__concat(__x.template __at<0>(), __x.template __at<1>()),
+ __concat(__x.template __at<2>(), __x.template __at<3>())),
+ __optimize_simd_tuple(__x.second.second.second.second)};
+ else
+ {
+ _R __r;
+ __builtin_memcpy(__r.__as_charptr(), __x.__as_charptr(),
+ sizeof(_Tp) * _R::size());
+ return __r;
+ }
+}
+
+// __for_each(const _SimdTuple &, Fun) {{{1
+template <size_t _Offset = 0, typename _Tp, typename _A0, typename _Fp>
+_GLIBCXX_SIMD_INTRINSIC constexpr void
+__for_each(const _SimdTuple<_Tp, _A0>& __t, _Fp&& __fun)
+{
+ static_cast<_Fp&&>(__fun)(__make_meta<_Offset>(__t), __t.first);
+}
+template <size_t _Offset = 0, typename _Tp, typename _A0, typename _A1,
+ typename... _As, typename _Fp>
+_GLIBCXX_SIMD_INTRINSIC constexpr void
+__for_each(const _SimdTuple<_Tp, _A0, _A1, _As...>& __t, _Fp&& __fun)
+{
+ __fun(__make_meta<_Offset>(__t), __t.first);
+ __for_each<_Offset + simd_size<_Tp, _A0>::value>(__t.second,
+ static_cast<_Fp&&>(__fun));
+}
+
+// __for_each(_SimdTuple &, Fun) {{{1
+template <size_t _Offset = 0, typename _Tp, typename _A0, typename _Fp>
+_GLIBCXX_SIMD_INTRINSIC constexpr void
+__for_each(_SimdTuple<_Tp, _A0>& __t, _Fp&& __fun)
+{
+ static_cast<_Fp&&>(__fun)(__make_meta<_Offset>(__t), __t.first);
+}
+template <size_t _Offset = 0, typename _Tp, typename _A0, typename _A1,
+ typename... _As, typename _Fp>
+_GLIBCXX_SIMD_INTRINSIC constexpr void
+__for_each(_SimdTuple<_Tp, _A0, _A1, _As...>& __t, _Fp&& __fun)
+{
+ __fun(__make_meta<_Offset>(__t), __t.first);
+ __for_each<_Offset + simd_size<_Tp, _A0>::value>(__t.second,
+ static_cast<_Fp&&>(__fun));
+}
+
+// __for_each(_SimdTuple &, const _SimdTuple &, Fun) {{{1
+template <size_t _Offset = 0, typename _Tp, typename _A0, typename _Fp>
+_GLIBCXX_SIMD_INTRINSIC constexpr void
+__for_each(_SimdTuple<_Tp, _A0>& __a, const _SimdTuple<_Tp, _A0>& __b,
+ _Fp&& __fun)
+{
+ static_cast<_Fp&&>(__fun)(__make_meta<_Offset>(__a), __a.first, __b.first);
+}
+template <size_t _Offset = 0, typename _Tp, typename _A0, typename _A1,
+ typename... _As, typename _Fp>
+_GLIBCXX_SIMD_INTRINSIC constexpr void
+__for_each(_SimdTuple<_Tp, _A0, _A1, _As...>& __a,
+ const _SimdTuple<_Tp, _A0, _A1, _As...>& __b, _Fp&& __fun)
+{
+ __fun(__make_meta<_Offset>(__a), __a.first, __b.first);
+ __for_each<_Offset + simd_size<_Tp, _A0>::value>(__a.second, __b.second,
+ static_cast<_Fp&&>(__fun));
+}
+
+// __for_each(const _SimdTuple &, const _SimdTuple &, Fun) {{{1
+template <size_t _Offset = 0, typename _Tp, typename _A0, typename _Fp>
+_GLIBCXX_SIMD_INTRINSIC constexpr void
+__for_each(const _SimdTuple<_Tp, _A0>& __a, const _SimdTuple<_Tp, _A0>& __b,
+ _Fp&& __fun)
+{
+ static_cast<_Fp&&>(__fun)(__make_meta<_Offset>(__a), __a.first, __b.first);
+}
+template <size_t _Offset = 0, typename _Tp, typename _A0, typename _A1,
+ typename... _As, typename _Fp>
+_GLIBCXX_SIMD_INTRINSIC constexpr void
+__for_each(const _SimdTuple<_Tp, _A0, _A1, _As...>& __a,
+ const _SimdTuple<_Tp, _A0, _A1, _As...>& __b, _Fp&& __fun)
+{
+ __fun(__make_meta<_Offset>(__a), __a.first, __b.first);
+ __for_each<_Offset + simd_size<_Tp, _A0>::value>(__a.second, __b.second,
+ static_cast<_Fp&&>(__fun));
+}
+
+// }}}1
+// __extract_part(_SimdTuple) {{{
+template <int _Index, int _Total, int _Combine, typename _Tp, typename _A0,
+ typename... _As>
+_GLIBCXX_SIMD_INTRINSIC auto // __vector_type_t or _SimdTuple
+__extract_part(const _SimdTuple<_Tp, _A0, _As...>& __x)
+{
+ // worst cases:
+ // (a) 4, 4, 4 => 3, 3, 3, 3 (_Total = 4)
+ // (b) 2, 2, 2 => 3, 3 (_Total = 2)
+ // (c) 4, 2 => 2, 2, 2 (_Total = 3)
+ using _Tuple = _SimdTuple<_Tp, _A0, _As...>;
+ static_assert(_Index + _Combine <= _Total && _Index >= 0 && _Total >= 1);
+ constexpr size_t _Np = _Tuple::size();
+ static_assert(_Np >= _Total && _Np % _Total == 0);
+ constexpr size_t __values_per_part = _Np / _Total;
+ [[maybe_unused]] constexpr size_t __values_to_skip
+ = _Index * __values_per_part;
+ constexpr size_t __return_size = __values_per_part * _Combine;
+ using _RetAbi = simd_abi::deduce_t<_Tp, __return_size>;
+
+ // handle (optimize) the simple cases
+ if constexpr (_Index == 0 && _Tuple::_S_first_size == __return_size)
+ return __x.first._M_data;
+ else if constexpr (_Index == 0 && _Total == _Combine)
+ return __x;
+ else if constexpr (_Index == 0 && _Tuple::_S_first_size >= __return_size)
+ return __intrin_bitcast<__vector_type_t<_Tp, __return_size>>(
+ __as_vector(__x.first));
+
+ // recurse to skip unused data members at the beginning of _SimdTuple
+ else if constexpr (__values_to_skip >= _Tuple::_S_first_size)
+ { // recurse
+ if constexpr (_Tuple::_S_first_size % __values_per_part == 0)
+ {
+ constexpr int __parts_in_first
+ = _Tuple::_S_first_size / __values_per_part;
+ return __extract_part<_Index - __parts_in_first,
+ _Total - __parts_in_first, _Combine>(
+ __x.second);
+ }
+ else
+ return __extract_part<__values_to_skip - _Tuple::_S_first_size,
+ _Np - _Tuple::_S_first_size, __return_size>(
+ __x.second);
+ }
+
+ // extract from multiple _SimdTuple data members
+ else if constexpr (__return_size > _Tuple::_S_first_size - __values_to_skip)
+ {
+#ifdef _GLIBCXX_SIMD_USE_ALIASING_LOADS
+ const __may_alias<_Tp>* const element_ptr
+ = reinterpret_cast<const __may_alias<_Tp>*>(&__x) + __values_to_skip;
+ return __as_vector(simd<_Tp, _RetAbi>(element_ptr, element_aligned));
+#else
+ [[maybe_unused]] constexpr size_t __offset = __values_to_skip;
+ return __as_vector(simd<_Tp, _RetAbi>([&](auto __i) constexpr {
+ constexpr _SizeConstant<__i + __offset> __k;
+ return __x[__k];
+ }));
+#endif
+ }
+
+ // all of the return values are in __x.first
+ else if constexpr (_Tuple::_S_first_size % __values_per_part == 0)
+ return __extract_part<_Index, _Tuple::_S_first_size / __values_per_part,
+ _Combine>(__x.first);
+ else
+ return __extract_part<__values_to_skip, _Tuple::_S_first_size,
+ _Combine * __values_per_part>(__x.first);
+}
+
+// }}}
+// __fixed_size_storage_t<_Tp, _Np>{{{
+template <typename _Tp, int _Np, typename _Tuple,
+ typename _Next = simd<_Tp, _AllNativeAbis::_BestAbi<_Tp, _Np>>,
+ int _Remain = _Np - int(_Next::size())>
+struct __fixed_size_storage_builder;
+
+template <typename _Tp, int _Np>
+struct __fixed_size_storage
+ : public __fixed_size_storage_builder<_Tp, _Np, _SimdTuple<_Tp>>
+{
+};
+
+template <typename _Tp, int _Np, typename... _As, typename _Next>
+struct __fixed_size_storage_builder<_Tp, _Np, _SimdTuple<_Tp, _As...>, _Next, 0>
+{
+ using type = _SimdTuple<_Tp, _As..., typename _Next::abi_type>;
+};
+
+template <typename _Tp, int _Np, typename... _As, typename _Next, int _Remain>
+struct __fixed_size_storage_builder<_Tp, _Np, _SimdTuple<_Tp, _As...>, _Next,
+ _Remain>
+{
+ using type = typename __fixed_size_storage_builder<
+ _Tp, _Remain, _SimdTuple<_Tp, _As..., typename _Next::abi_type>>::type;
+};
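+
+// Illustration: the builder greedily picks the best native ABI for the
+// remaining element count and recurses on the rest. Assuming native widths of
+// 8 and 4 floats are available, __fixed_size_storage_t<float, 12> would end up
+// as a _SimdTuple of an 8-element chunk followed by a 4-element chunk.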
+
+// }}}
+// _AbisInSimdTuple {{{
+template <typename _Tp> struct _SeqOp;
+template <size_t _I0, size_t... _Is>
+struct _SeqOp<std::index_sequence<_I0, _Is...>>
+{
+ using _FirstPlusOne = std::index_sequence<_I0 + 1, _Is...>;
+ using _NotFirstPlusOne = std::index_sequence<_I0, (_Is + 1)...>;
+ template <size_t _First, size_t _Add>
+ using _Prepend = std::index_sequence<_First, _I0 + _Add, (_Is + _Add)...>;
+};
+
+template <typename _Tp> struct _AbisInSimdTuple;
+template <typename _Tp> struct _AbisInSimdTuple<_SimdTuple<_Tp>>
+{
+ using _Counts = std::index_sequence<0>;
+ using _Begins = std::index_sequence<0>;
+};
+template <typename _Tp, typename _Ap>
+struct _AbisInSimdTuple<_SimdTuple<_Tp, _Ap>>
+{
+ using _Counts = std::index_sequence<1>;
+ using _Begins = std::index_sequence<0>;
+};
+template <typename _Tp, typename _A0, typename... _As>
+struct _AbisInSimdTuple<_SimdTuple<_Tp, _A0, _A0, _As...>>
+{
+ using _Counts = typename _SeqOp<typename _AbisInSimdTuple<
+ _SimdTuple<_Tp, _A0, _As...>>::_Counts>::_FirstPlusOne;
+ using _Begins = typename _SeqOp<typename _AbisInSimdTuple<
+ _SimdTuple<_Tp, _A0, _As...>>::_Begins>::_NotFirstPlusOne;
+};
+template <typename _Tp, typename _A0, typename _A1, typename... _As>
+struct _AbisInSimdTuple<_SimdTuple<_Tp, _A0, _A1, _As...>>
+{
+ using _Counts = typename _SeqOp<typename _AbisInSimdTuple<
+ _SimdTuple<_Tp, _A1, _As...>>::_Counts>::template _Prepend<1, 0>;
+ using _Begins = typename _SeqOp<typename _AbisInSimdTuple<
+ _SimdTuple<_Tp, _A1, _As...>>::_Begins>::template _Prepend<0, 1>;
+};
+
+// }}}
+// __autocvt_to_simd {{{
+template <typename _Tp, bool = std::is_arithmetic_v<__remove_cvref_t<_Tp>>>
+struct __autocvt_to_simd
+{
+ _Tp _M_data;
+ using _TT = __remove_cvref_t<_Tp>;
+ operator _TT() { return _M_data; }
+ operator _TT&()
+ {
+ static_assert(std::is_lvalue_reference<_Tp>::value, "");
+ static_assert(!std::is_const<_Tp>::value, "");
+ return _M_data;
+ }
+ operator _TT*()
+ {
+ static_assert(std::is_lvalue_reference<_Tp>::value, "");
+ static_assert(!std::is_const<_Tp>::value, "");
+ return &_M_data;
+ }
+
+ constexpr inline __autocvt_to_simd(_Tp dd) : _M_data(dd) {}
+
+ template <typename _Abi> operator simd<typename _TT::value_type, _Abi>()
+ {
+ return {__private_init, _M_data};
+ }
+
+ template <typename _Abi> operator simd<typename _TT::value_type, _Abi> &()
+ {
+ return *reinterpret_cast<simd<typename _TT::value_type, _Abi>*>(&_M_data);
+ }
+
+ template <typename _Abi> operator simd<typename _TT::value_type, _Abi> *()
+ {
+ return reinterpret_cast<simd<typename _TT::value_type, _Abi>*>(&_M_data);
+ }
+};
+template <typename _Tp> __autocvt_to_simd(_Tp &&) -> __autocvt_to_simd<_Tp>;
+
+template <typename _Tp> struct __autocvt_to_simd<_Tp, true>
+{
+ using _TT = __remove_cvref_t<_Tp>;
+ _Tp _M_data;
+ fixed_size_simd<_TT, 1> _M_fd;
+
+ constexpr inline __autocvt_to_simd(_Tp dd) : _M_data(dd), _M_fd(_M_data) {}
+ ~__autocvt_to_simd() { _M_data = __data(_M_fd).first; }
+
+ operator fixed_size_simd<_TT, 1>() { return _M_fd; }
+ operator fixed_size_simd<_TT, 1> &()
+ {
+ static_assert(std::is_lvalue_reference<_Tp>::value, "");
+ static_assert(!std::is_const<_Tp>::value, "");
+ return _M_fd;
+ }
+ operator fixed_size_simd<_TT, 1> *()
+ {
+ static_assert(std::is_lvalue_reference<_Tp>::value, "");
+ static_assert(!std::is_const<_Tp>::value, "");
+ return &_M_fd;
+ }
+};
+
+// }}}
+
+struct _CommonImplFixedSize;
+template <int _Np> struct _SimdImplFixedSize;
+template <int _Np> struct _MaskImplFixedSize;
+// simd_abi::_Fixed {{{
+template <int _Np> struct simd_abi::_Fixed
+{
+ template <typename _Tp> static constexpr size_t size = _Np;
+ template <typename _Tp> static constexpr size_t _S_full_size = _Np;
+ // validity traits {{{
+ struct _IsValidAbiTag : public __bool_constant<(_Np > 0)>
+ {
+ };
+ template <typename _Tp>
+ struct _IsValidSizeFor
+ : __bool_constant<(_Np <= simd_abi::max_fixed_size<_Tp>)>
+ {
+ };
+ template <typename _Tp>
+ struct _IsValid
+ : conjunction<_IsValidAbiTag, __is_vectorizable<_Tp>, _IsValidSizeFor<_Tp>>
+ {
+ };
+ template <typename _Tp>
+ static constexpr bool _S_is_valid_v = _IsValid<_Tp>::value;
+
+ // }}}
+ // __masked {{{
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SanitizedBitMask<_Np>
+ __masked(_BitMask<_Np> __x)
+ {
+ return __x._M_sanitized();
+ }
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SanitizedBitMask<_Np>
+ __masked(_SanitizedBitMask<_Np> __x)
+ {
+ return __x;
+ }
+
+ // }}}
+ // _*Impl {{{
+ using _CommonImpl = _CommonImplFixedSize;
+ using _SimdImpl = _SimdImplFixedSize<_Np>;
+ using _MaskImpl = _MaskImplFixedSize<_Np>;
+
+ // }}}
+ // __traits {{{
+ template <typename _Tp, bool = _S_is_valid_v<_Tp>>
+ struct __traits : _InvalidTraits
+ {
+ };
+
+ template <typename _Tp> struct __traits<_Tp, true>
+ {
+ using _IsValid = true_type;
+ using _SimdImpl = _SimdImplFixedSize<_Np>;
+ using _MaskImpl = _MaskImplFixedSize<_Np>;
+
+ // simd and simd_mask member types {{{
+ using _SimdMember = __fixed_size_storage_t<_Tp, _Np>;
+ using _MaskMember = _SanitizedBitMask<_Np>;
+ static constexpr size_t _S_simd_align
+ = __next_power_of_2(_Np * sizeof(_Tp));
+ static constexpr size_t _S_mask_align = alignof(_MaskMember);
+
+ // }}}
+ // _SimdBase / base class for simd, providing extra conversions {{{
+ struct _SimdBase
+ {
+      // The following ensures that function arguments are passed via the
+      // stack. This is important for ABI compatibility across TU boundaries.
+ _SimdBase(const _SimdBase&) {}
+ _SimdBase() = default;
+
+ explicit operator const _SimdMember &() const
+ {
+ return static_cast<const simd<_Tp, _Fixed>*>(this)->_M_data;
+ }
+ explicit operator std::array<_Tp, _Np>() const
+ {
+ std::array<_Tp, _Np> __r;
+ // _SimdMember can be larger because of higher alignment
+ static_assert(sizeof(__r) <= sizeof(_SimdMember), "");
+ __builtin_memcpy(__r.data(), &static_cast<const _SimdMember&>(*this),
+ sizeof(__r));
+ return __r;
+ }
+ };
+
+ // }}}
+ // _MaskBase {{{
+ // empty. The std::bitset interface suffices
+ struct _MaskBase
+ {
+ };
+
+ // }}}
+ // _SimdCastType {{{
+ struct _SimdCastType
+ {
+ _SimdCastType(const std::array<_Tp, _Np>&);
+ _SimdCastType(const _SimdMember& dd) : _M_data(dd) {}
+ explicit operator const _SimdMember &() const { return _M_data; }
+
+ private:
+ const _SimdMember& _M_data;
+ };
+
+ // }}}
+ // _MaskCastType {{{
+ class _MaskCastType
+ {
+ _MaskCastType() = delete;
+ };
+ // }}}
+ };
+ // }}}
+};
+
+// }}}
+// _CommonImplFixedSize {{{
+struct _CommonImplFixedSize
+{
+ // __store {{{
+ template <typename _Flags, typename _Tp, typename... _As>
+ _GLIBCXX_SIMD_INTRINSIC static void
+ __store(const _SimdTuple<_Tp, _As...>& __x, void* __addr, _Flags)
+ {
+ constexpr size_t _Np = _SimdTuple<_Tp, _As...>::size();
+ if constexpr (std::is_same_v<_Flags, vector_aligned_tag>)
+ __addr = __builtin_assume_aligned(
+ __addr, memory_alignment_v<fixed_size_simd<_Tp, _Np>, _Tp>);
+ else if constexpr (!std::is_same_v<_Flags, element_aligned_tag>)
+ __addr = __builtin_assume_aligned(__addr, _Flags::_S_alignment);
+ __builtin_memcpy(__addr, &__x, _Np * sizeof(_Tp));
+ }
+
+ // }}}
+};
+
+// }}}
+// _SimdImplFixedSize {{{1
+// fixed_size must not inherit from _SimdMathFallback, so that the
+// specializations in the ABIs used by the _SimdTuple are picked up instead
+template <int _Np> struct _SimdImplFixedSize
+{
+ // member types {{{2
+ using _MaskMember = _SanitizedBitMask<_Np>;
+ template <typename _Tp> using _SimdMember = __fixed_size_storage_t<_Tp, _Np>;
+ template <typename _Tp>
+ static constexpr std::size_t _S_tuple_size = _SimdMember<_Tp>::_S_tuple_size;
+ template <typename _Tp>
+ using _Simd = std::experimental::simd<_Tp, simd_abi::fixed_size<_Np>>;
+ template <typename _Tp> using _TypeTag = _Tp*;
+
+ // broadcast {{{2
+ template <typename _Tp>
+ static constexpr inline _SimdMember<_Tp> __broadcast(_Tp __x) noexcept
+ {
+ return _SimdMember<_Tp>::__generate([&](auto __meta) constexpr {
+ return __meta.__broadcast(__x);
+ });
+ }
+
+ // __generator {{{2
+ template <typename _Fp, typename _Tp>
+ static constexpr inline _SimdMember<_Tp> __generator(_Fp&& __gen,
+ _TypeTag<_Tp>)
+ {
+ return _SimdMember<_Tp>::__generate([&__gen](auto __meta) constexpr {
+ return __meta.__generator(
+ [&](auto __i) constexpr {
+ return __i < _Np ? __gen(_SizeConstant<__meta._S_offset + __i>()) : 0;
+ },
+ _TypeTag<_Tp>());
+ });
+ }
+
+ // __load {{{2
+ template <typename _Tp, typename _Up, typename _Fp>
+ static inline _SimdMember<_Tp> __load(const _Up* __mem, _Fp __f,
+ _TypeTag<_Tp>) noexcept
+ {
+ return _SimdMember<_Tp>::__generate([&](auto __meta) {
+ return __meta.__load(&__mem[__meta._S_offset], __f, _TypeTag<_Tp>());
+ });
+ }
+
+ // __masked_load {{{2
+ template <typename _Tp, typename... _As, typename _Up, typename _Fp>
+ static inline _SimdTuple<_Tp, _As...>
+ __masked_load(const _SimdTuple<_Tp, _As...>& __old, const _MaskMember __bits,
+ const _Up* __mem, _Fp __f) noexcept
+ {
+ auto __merge = __old;
+ __for_each(__merge, [&](auto __meta, auto& __native) {
+ if (__meta.__submask(__bits).any())
+#pragma GCC diagnostic push
+ // __mem + __meta._S_offset could be UB ([expr.add]/4.3), but this punts the
+ // responsibility for avoiding UB to the caller of the masked load via the
+ // mask. Consequently, the compiler may assume this branch is unreachable
+ // if the pointer arithmetic is UB.
+#pragma GCC diagnostic ignored "-Warray-bounds"
+ __native = __meta.__masked_load(__native, __meta.__make_mask(__bits),
+ __mem + __meta._S_offset, __f);
+#pragma GCC diagnostic pop
+ });
+ return __merge;
+ }
+
+ // __store {{{2
+ template <typename _Tp, typename _Up, typename _Fp>
+ static inline void __store(const _SimdMember<_Tp>& __v, _Up* __mem, _Fp __f,
+ _TypeTag<_Tp>) noexcept
+ {
+ __for_each(__v, [&](auto __meta, auto __native) {
+ __meta.__store(__native, &__mem[__meta._S_offset], __f, _TypeTag<_Tp>());
+ });
+ }
+
+ // __masked_store {{{2
+ template <typename _Tp, typename... _As, typename _Up, typename _Fp>
+ static inline void __masked_store(const _SimdTuple<_Tp, _As...>& __v,
+ _Up* __mem, _Fp __f,
+ const _MaskMember __bits) noexcept
+ {
+ __for_each(__v, [&](auto __meta, auto __native) {
+ if (__meta.__submask(__bits).any())
+#pragma GCC diagnostic push
+ // __mem + __meta._S_offset could be UB ([expr.add]/4.3), but this punts the
+ // responsibility for avoiding UB to the caller of the masked store via the
+ // mask. Consequently, the compiler may assume this branch is unreachable
+ // if the pointer arithmetic is UB.
+#pragma GCC diagnostic ignored "-Warray-bounds"
+ __meta.__masked_store(__native, __mem + __meta._S_offset, __f,
+ __meta.__make_mask(__bits));
+#pragma GCC diagnostic pop
+ });
+ }
+
+ // negation {{{2
+ template <typename _Tp, typename... _As>
+ static inline _MaskMember
+ __negate(const _SimdTuple<_Tp, _As...>& __x) noexcept
+ {
+ _MaskMember __bits = 0;
+ __for_each(
+ __x, [&__bits](auto __meta, auto __native) constexpr {
+ __bits |= __meta.__mask_to_shifted_ullong(__meta.__negate(__native));
+ });
+ return __bits;
+ }
+
+ // reductions {{{2
+ template <typename _Tp, typename _BinaryOperation>
+ static constexpr inline _Tp __reduce(const _Simd<_Tp>& __x,
+ const _BinaryOperation& __binary_op)
+ {
+ using _Tup = _SimdMember<_Tp>;
+ const _Tup& __tup = __data(__x);
+ if constexpr (_Tup::_S_tuple_size == 1)
+ return _Tup::_FirstAbi::_SimdImpl::__reduce(__tup.template __simd_at<0>(),
+ __binary_op);
+ else if constexpr (_Tup::_S_tuple_size == 2
+ && _Tup::size() > 2
+ && _Tup::_SecondType::size() == 1)
+ {
+ return __binary_op(simd<_Tp, simd_abi::scalar>(
+ reduce(__tup.template __simd_at<0>(),
+ __binary_op)),
+ __tup.template __simd_at<1>())[0];
+ }
+ else if constexpr (_Tup::_S_tuple_size == 2
+ && _Tup::size() > 4
+ && _Tup::_SecondType::size() == 2)
+ {
+ return __binary_op(
+ simd<_Tp, simd_abi::scalar>(
+ reduce(__tup.template __simd_at<0>(), __binary_op)),
+ simd<_Tp, simd_abi::scalar>(
+ reduce(__tup.template __simd_at<1>(), __binary_op)))[0];
+ }
+ else
+ {
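+ // General case: combine adjacent chunks of the tuple pairwise with
+ // __binary_op (extending the narrower chunk where the sizes differ), pack
+ // the results into a smaller fixed_size_simd, and recurse via reduce().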
+ const auto& __x2
+ = __call_with_n_evaluations<__div_roundup(_Tup::_S_tuple_size, 2)>(
+ [](auto __first_simd, auto... __remaining) {
+ if constexpr (sizeof...(__remaining) == 0)
+ return __first_simd;
+ else
+ {
+ using _Tup2
+ = _SimdTuple<_Tp, typename decltype(__first_simd)::abi_type,
+ typename decltype(__remaining)::abi_type...>;
+ return fixed_size_simd<_Tp, _Tup2::size()>(
+ __private_init,
+ __make_simd_tuple(__first_simd, __remaining...));
+ }
+ },
+ [&](auto __i) {
+ auto __left = __tup.template __simd_at<2 * __i>();
+ if constexpr (2 * __i + 1 == _Tup::_S_tuple_size)
+ return __left;
+ else
+ {
+ auto __right = __tup.template __simd_at<2 * __i + 1>();
+ using _LT = decltype(__left);
+ using _RT = decltype(__right);
+ if constexpr (_LT::size() == _RT::size())
+ return __binary_op(__left, __right);
+ else
+ {
+ _GLIBCXX_SIMD_CONSTEXPR typename _LT::mask_type __k(
+ __private_init, [](auto __j) constexpr {
+ return __j < _RT::size();
+ });
+ _LT __ext_right = __left;
+ where(__k, __ext_right)
+ = __proposed::resizing_simd_cast<_LT>(__right);
+ where(__k, __left) = __binary_op(__left, __ext_right);
+ return __left;
+ }
+ }
+ });
+ return reduce(__x2, __binary_op);
+ }
+ }
+
+ // __min, __max {{{2
+ template <typename _Tp, typename... _As>
+ static inline constexpr _SimdTuple<_Tp, _As...>
+ __min(const _SimdTuple<_Tp, _As...>& __a, const _SimdTuple<_Tp, _As...>& __b)
+ {
+ return __a.__apply_per_chunk(
+ [](auto __impl, auto __aa, auto __bb) constexpr {
+ return __impl.__min(__aa, __bb);
+ },
+ __b);
+ }
+
+ template <typename _Tp, typename... _As>
+ static inline constexpr _SimdTuple<_Tp, _As...>
+ __max(const _SimdTuple<_Tp, _As...>& __a, const _SimdTuple<_Tp, _As...>& __b)
+ {
+ return __a.__apply_per_chunk(
+ [](auto __impl, auto __aa, auto __bb) constexpr {
+ return __impl.__max(__aa, __bb);
+ },
+ __b);
+ }
+
+ // __complement {{{2
+ template <typename _Tp, typename... _As>
+ static inline constexpr _SimdTuple<_Tp, _As...>
+ __complement(const _SimdTuple<_Tp, _As...>& __x) noexcept
+ {
+ return __x.__apply_per_chunk([](auto __impl, auto __xx) constexpr {
+ return __impl.__complement(__xx);
+ });
+ }
+
+ // __unary_minus {{{2
+ template <typename _Tp, typename... _As>
+ static inline constexpr _SimdTuple<_Tp, _As...>
+ __unary_minus(const _SimdTuple<_Tp, _As...>& __x) noexcept
+ {
+ return __x.__apply_per_chunk([](auto __impl, auto __xx) constexpr {
+ return __impl.__unary_minus(__xx);
+ });
+ }
+
+ // arithmetic operators {{{2
+
+#define _GLIBCXX_SIMD_FIXED_OP(name_, op_) \
+ template <typename _Tp, typename... _As> \
+ static inline constexpr _SimdTuple<_Tp, _As...> name_( \
+ const _SimdTuple<_Tp, _As...> __x, const _SimdTuple<_Tp, _As...> __y) \
+ { \
+ return __x.__apply_per_chunk( \
+ [](auto __impl, auto __xx, auto __yy) constexpr { \
+ return __impl.name_(__xx, __yy); \
+ }, \
+ __y); \
+ }
+
+ _GLIBCXX_SIMD_FIXED_OP(__plus, +)
+ _GLIBCXX_SIMD_FIXED_OP(__minus, -)
+ _GLIBCXX_SIMD_FIXED_OP(__multiplies, *)
+ _GLIBCXX_SIMD_FIXED_OP(__divides, /)
+ _GLIBCXX_SIMD_FIXED_OP(__modulus, %)
+ _GLIBCXX_SIMD_FIXED_OP(__bit_and, &)
+ _GLIBCXX_SIMD_FIXED_OP(__bit_or, |)
+ _GLIBCXX_SIMD_FIXED_OP(__bit_xor, ^)
+ _GLIBCXX_SIMD_FIXED_OP(__bit_shift_left, <<)
+ _GLIBCXX_SIMD_FIXED_OP(__bit_shift_right, >>)
+#undef _GLIBCXX_SIMD_FIXED_OP
+
+ template <typename _Tp, typename... _As>
+ static inline constexpr _SimdTuple<_Tp, _As...>
+ __bit_shift_left(const _SimdTuple<_Tp, _As...>& __x, int __y)
+ {
+ return __x.__apply_per_chunk([__y](auto __impl, auto __xx) constexpr {
+ return __impl.__bit_shift_left(__xx, __y);
+ });
+ }
+
+ template <typename _Tp, typename... _As>
+ static inline constexpr _SimdTuple<_Tp, _As...>
+ __bit_shift_right(const _SimdTuple<_Tp, _As...>& __x, int __y)
+ {
+ return __x.__apply_per_chunk([__y](auto __impl, auto __xx) constexpr {
+ return __impl.__bit_shift_right(__xx, __y);
+ });
+ }
+
+ // math {{{2
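+ // The following macro applies a math function chunk-wise to the _SimdTuple:
+ // if the return type equals _Tp, the free simd overload of __name is called
+ // per chunk; otherwise each chunk's _SimdImpl provides __##__name and the
+ // result is assembled into a __fixed_size_storage_t of _RetTp.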
+#define _GLIBCXX_SIMD_APPLY_ON_TUPLE(_RetTp, __name) \
+ template <typename _Tp, typename... _As, typename... _More> \
+ static inline __fixed_size_storage_t<_RetTp, \
+ _SimdTuple<_Tp, _As...>::size()> \
+ __##__name(const _SimdTuple<_Tp, _As...>& __x, const _More&... __more) \
+ { \
+ if constexpr (sizeof...(_More) == 0) \
+ { \
+ if constexpr (is_same_v<_Tp, _RetTp>) \
+ return __x.__apply_per_chunk([](auto __impl, auto __xx) constexpr { \
+ using _V = typename decltype(__impl)::simd_type; \
+ return __data(__name(_V(__private_init, __xx))); \
+ }); \
+ else \
+ return __optimize_simd_tuple(__x.template __apply_r<_RetTp>( \
+ [](auto __impl, auto __xx) { return __impl.__##__name(__xx); })); \
+ } \
+ else if constexpr ( \
+ is_same_v< \
+ _Tp, \
+ _RetTp> && (... && std::is_same_v<_SimdTuple<_Tp, _As...>, _More>) ) \
+ return __x.__apply_per_chunk( \
+ [](auto __impl, auto __xx, auto... __pack) constexpr { \
+ using _V = typename decltype(__impl)::simd_type; \
+ return __data( \
+ __name(_V(__private_init, __xx), _V(__private_init, __pack)...)); \
+ }, \
+ __more...); \
+ else if constexpr (is_same_v<_Tp, _RetTp>) \
+ return __x.__apply_per_chunk( \
+ [](auto __impl, auto __xx, auto... __pack) constexpr { \
+ using _V = typename decltype(__impl)::simd_type; \
+ return __data( \
+ __name(_V(__private_init, __xx), __autocvt_to_simd(__pack)...)); \
+ }, \
+ __more...); \
+ else \
+ __assert_unreachable<_Tp>(); \
+ }
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, acos)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, asin)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, atan)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, atan2)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, cos)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, sin)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, tan)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, acosh)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, asinh)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, atanh)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, cosh)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, sinh)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, tanh)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, exp)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, exp2)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, expm1)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(int, ilogb)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, log)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, log10)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, log1p)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, log2)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, logb)
+ // modf implemented in simd_math.h
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, scalbn) // double scalbn(double x, int exp);
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, scalbln)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, cbrt)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, abs)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, fabs)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, pow)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, sqrt)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, erf)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, erfc)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, lgamma)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, tgamma)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, trunc)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, ceil)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, floor)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, nearbyint)
+
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, rint)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(long, lrint)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(long long, llrint)
+
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, round)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(long, lround)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(long long, llround)
+
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, ldexp)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, fmod)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, remainder)
+ // copysign in simd_math.h
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, nextafter)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, fdim)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, fmax)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, fmin)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(_Tp, fma)
+ _GLIBCXX_SIMD_APPLY_ON_TUPLE(int, fpclassify)
+#undef _GLIBCXX_SIMD_APPLY_ON_TUPLE
+
+ template <typename _Tp, typename... _Abis>
+ static _SimdTuple<_Tp, _Abis...>
+ __remquo(const _SimdTuple<_Tp, _Abis...>& __x,
+ const _SimdTuple<_Tp, _Abis...>& __y,
+ __fixed_size_storage_t<int, _SimdTuple<_Tp, _Abis...>::size()>* __z)
+ {
+ return __x.__apply_per_chunk(
+ [](auto __impl, const auto __xx, const auto __yy, auto& __zz) {
+ return __impl.__remquo(__xx, __yy, &__zz);
+ },
+ __y, *__z);
+ }
+
+ template <typename _Tp, typename... _As>
+ static inline _SimdTuple<_Tp, _As...>
+ __frexp(const _SimdTuple<_Tp, _As...>& __x,
+ __fixed_size_storage_t<int, _Np>& __exp) noexcept
+ {
+ return __x.__apply_per_chunk(
+ [](auto __impl, const auto& __a, auto& __b) {
+ return __data(
+ frexp(typename decltype(__impl)::simd_type(__private_init, __a),
+ __autocvt_to_simd(__b)));
+ },
+ __exp);
+ }
+
+ template <typename _Tp, typename... _As>
+ static inline __fixed_size_storage_t<int, _Np>
+ __fpclassify(const _SimdTuple<_Tp, _As...>& __x) noexcept
+ {
+ return __optimize_simd_tuple(__x.template __apply_r<int>(
+ [](auto __impl, auto __xx) { return __impl.__fpclassify(__xx); }));
+ }
+
+#define _GLIBCXX_SIMD_TEST_ON_TUPLE_(name_) \
+ template <typename _Tp, typename... _As> \
+ static inline _MaskMember __##name_( \
+ const _SimdTuple<_Tp, _As...>& __x) noexcept \
+ { \
+ return __test([](auto __impl, \
+ auto __xx) { return __impl.__##name_(__xx); }, \
+ __x); \
+ }
+ _GLIBCXX_SIMD_TEST_ON_TUPLE_(isinf)
+ _GLIBCXX_SIMD_TEST_ON_TUPLE_(isfinite)
+ _GLIBCXX_SIMD_TEST_ON_TUPLE_(isnan)
+ _GLIBCXX_SIMD_TEST_ON_TUPLE_(isnormal)
+ _GLIBCXX_SIMD_TEST_ON_TUPLE_(signbit)
+#undef _GLIBCXX_SIMD_TEST_ON_TUPLE_
+
+ // __increment & __decrement{{{2
+ template <typename... _Ts>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr void
+ __increment(_SimdTuple<_Ts...>& __x)
+ {
+ __for_each(
+ __x,
+ [](auto __meta, auto& native) constexpr { __meta.__increment(native); });
+ }
+
+ template <typename... _Ts>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr void
+ __decrement(_SimdTuple<_Ts...>& __x)
+ {
+ __for_each(
+ __x,
+ [](auto __meta, auto& native) constexpr { __meta.__decrement(native); });
+ }
+
+ // compares {{{2
+#define _GLIBCXX_SIMD_CMP_OPERATIONS(__cmp) \
+ template <typename _Tp, typename... _As> \
+ _GLIBCXX_SIMD_INTRINSIC constexpr static _MaskMember __cmp( \
+ const _SimdTuple<_Tp, _As...>& __x, const _SimdTuple<_Tp, _As...>& __y) \
+ { \
+ return __test( \
+ [](auto __impl, auto __xx, auto __yy) constexpr { \
+ return __impl.__cmp(__xx, __yy); \
+ }, \
+ __x, __y); \
+ }
+ _GLIBCXX_SIMD_CMP_OPERATIONS(__equal_to)
+ _GLIBCXX_SIMD_CMP_OPERATIONS(__not_equal_to)
+ _GLIBCXX_SIMD_CMP_OPERATIONS(__less)
+ _GLIBCXX_SIMD_CMP_OPERATIONS(__less_equal)
+ _GLIBCXX_SIMD_CMP_OPERATIONS(__isless)
+ _GLIBCXX_SIMD_CMP_OPERATIONS(__islessequal)
+ _GLIBCXX_SIMD_CMP_OPERATIONS(__isgreater)
+ _GLIBCXX_SIMD_CMP_OPERATIONS(__isgreaterequal)
+ _GLIBCXX_SIMD_CMP_OPERATIONS(__islessgreater)
+ _GLIBCXX_SIMD_CMP_OPERATIONS(__isunordered)
+#undef _GLIBCXX_SIMD_CMP_OPERATIONS
+
+ // smart_reference access {{{2
+ template <typename _Tp, typename... _As, typename _Up>
+ _GLIBCXX_SIMD_INTRINSIC static void __set(_SimdTuple<_Tp, _As...>& __v,
+ int __i, _Up&& __x) noexcept
+ {
+ __v.__set(__i, static_cast<_Up&&>(__x));
+ }
+
+ // __masked_assign {{{2
+ template <typename _Tp, typename... _As>
+ _GLIBCXX_SIMD_INTRINSIC static void
+ __masked_assign(const _MaskMember __bits, _SimdTuple<_Tp, _As...>& __lhs,
+ const __id<_SimdTuple<_Tp, _As...>>& __rhs)
+ {
+ __for_each(
+ __lhs,
+ __rhs, [&](auto __meta, auto& __native_lhs, auto __native_rhs) constexpr {
+ __meta.__masked_assign(__meta.__make_mask(__bits), __native_lhs,
+ __native_rhs);
+ });
+ }
+
+ // Optimization for the case where the RHS is a scalar. No need to broadcast
+ // the scalar to a simd first.
+ template <typename _Tp, typename... _As>
+ _GLIBCXX_SIMD_INTRINSIC static void
+ __masked_assign(const _MaskMember __bits, _SimdTuple<_Tp, _As...>& __lhs,
+ const __id<_Tp> __rhs)
+ {
+ __for_each(
+ __lhs, [&](auto __meta, auto& __native_lhs) constexpr {
+ __meta.__masked_assign(__meta.__make_mask(__bits), __native_lhs, __rhs);
+ });
+ }
+
+ // __masked_cassign {{{2
+ template <typename _Op, typename _Tp, typename... _As>
+ static inline void
+ __masked_cassign(const _MaskMember __bits, _SimdTuple<_Tp, _As...>& __lhs,
+ const _SimdTuple<_Tp, _As...>& __rhs, _Op __op)
+ {
+ __for_each(
+ __lhs,
+ __rhs, [&](auto __meta, auto& __native_lhs, auto __native_rhs) constexpr {
+ __meta.template __masked_cassign(__meta.__make_mask(__bits),
+ __native_lhs, __native_rhs, __op);
+ });
+ }
+
+ // Optimization for the case where the RHS is a scalar. No need to broadcast
+ // the scalar to a simd first.
+ template <typename _Op, typename _Tp, typename... _As>
+ static inline void __masked_cassign(const _MaskMember __bits,
+ _SimdTuple<_Tp, _As...>& __lhs,
+ const _Tp& __rhs, _Op __op)
+ {
+ __for_each(
+ __lhs, [&](auto __meta, auto& __native_lhs) constexpr {
+ __meta.template __masked_cassign(__meta.__make_mask(__bits),
+ __native_lhs, __rhs, __op);
+ });
+ }
+
+ // __masked_unary {{{2
+ template <template <typename> class _Op, typename _Tp, typename... _As>
+ static inline _SimdTuple<_Tp, _As...>
+ __masked_unary(const _MaskMember __bits,
+ const _SimdTuple<_Tp, _As...> __v) // TODO: const-ref __v?
+ {
+ return __v.__apply_wrapped([&__bits](auto __meta, auto __native) constexpr {
+ return __meta.template __masked_unary<_Op>(__meta.__make_mask(__bits),
+ __native);
+ });
+ }
+
+ // }}}2
+};
+
+// _MaskImplFixedSize {{{1
+template <int _Np> struct _MaskImplFixedSize
+{
+ static_assert(sizeof(_ULLong) * CHAR_BIT >= _Np,
+ "The fixed_size implementation relies on one "
+ "_ULLong being able to store all boolean "
+ "elements."); // required in load & store
+
+ // member types {{{
+ using _Abi = simd_abi::fixed_size<_Np>;
+ template <typename _Tp>
+ using _FirstAbi = typename __fixed_size_storage_t<_Tp, _Np>::_FirstAbi;
+ using _MaskMember = _SanitizedBitMask<_Np>;
+ template <typename _Tp> using _TypeTag = _Tp*;
+
+ // }}}
+ // __broadcast {{{
+ template <typename>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember __broadcast(bool __x)
+ {
+ return __x ? ~_MaskMember() : _MaskMember();
+ }
+
+ // }}}
+ // __load {{{
+ template <typename, typename _Fp>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember __load(const bool* __mem)
+ {
+ using _Up = make_unsigned_t<__int_for_sizeof_t<bool>>;
+ const simd<_Up, _Abi> __bools(reinterpret_cast<const __may_alias<_Up>*>(
+ __mem),
+ _Fp());
+ return __data(__bools != 0);
+ }
+
+ // }}}
+ // __to_bits {{{
+ template <bool _Sanitized>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SanitizedBitMask<_Np>
+ __to_bits(_BitMask<_Np, _Sanitized> __x)
+ {
+ if constexpr (_Sanitized)
+ return __x;
+ else
+ return __x._M_sanitized();
+ }
+
+ // }}}
+ // __convert {{{
+ template <typename _Tp, typename _Up, typename _UAbi>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember
+ __convert(simd_mask<_Up, _UAbi> __x)
+ {
+ return _UAbi::_MaskImpl::__to_bits(__data(__x))
+ .template _M_extract<0, _Np>();
+ }
+
+ // }}}
+ // __from_bitmask {{{2
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static _MaskMember
+ __from_bitmask(_MaskMember __bits, _TypeTag<_Tp>) noexcept
+ {
+ return __bits;
+ }
+
+ // __load {{{2
+ template <typename _Fp>
+ static inline _MaskMember __load(const bool* __mem, _Fp __f) noexcept
+ {
+ // TODO: _UChar is not necessarily the best type to use here. For smaller
+ // _Np, _UShort, _UInt, _ULLong, float, and double can be more efficient.
+ _ULLong __r = 0;
+ using _Vs = __fixed_size_storage_t<_UChar, _Np>;
+ __for_each(_Vs{}, [&](auto __meta, auto) {
+ __r |= __meta.__mask_to_shifted_ullong(
+ __meta._S_mask_impl.__load(&__mem[__meta._S_offset], __f,
+ _SizeConstant<__meta.size()>()));
+ });
+ return __r;
+ }
+
+ // __masked_load {{{2
+ template <typename _Fp>
+ static inline _MaskMember __masked_load(_MaskMember __merge,
+ _MaskMember __mask, const bool* __mem,
+ _Fp) noexcept
+ {
+ _BitOps::__bit_iteration(__mask.to_ullong(),
+ [&](auto __i) { __merge.set(__i, __mem[__i]); });
+ return __merge;
+ }
+
+ // __store {{{2
+ template <typename _Fp>
+ static inline void __store(const _MaskMember __bitmask, bool* __mem,
+ _Fp) noexcept
+ {
+ if constexpr (_Np == 1)
+ __mem[0] = __bitmask[0];
+ else
+ _FirstAbi<_UChar>::_CommonImpl::__store_bool_array(__bitmask, __mem,
+ _Fp());
+ }
+
+ // __masked_store {{{2
+ template <typename _Fp>
+ static inline void __masked_store(const _MaskMember __v, bool* __mem, _Fp,
+ const _MaskMember __k) noexcept
+ {
+ _BitOps::__bit_iteration(__k, [&](auto __i) { __mem[__i] = __v[__i]; });
+ }
+
+ // logical and bitwise operators {{{2
+ _GLIBCXX_SIMD_INTRINSIC static _MaskMember
+ __logical_and(const _MaskMember& __x, const _MaskMember& __y) noexcept
+ {
+ return __x & __y;
+ }
+
+ _GLIBCXX_SIMD_INTRINSIC static _MaskMember
+ __logical_or(const _MaskMember& __x, const _MaskMember& __y) noexcept
+ {
+ return __x | __y;
+ }
+
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember
+ __bit_not(const _MaskMember& __x) noexcept
+ {
+ return ~__x;
+ }
+
+ _GLIBCXX_SIMD_INTRINSIC static _MaskMember
+ __bit_and(const _MaskMember& __x, const _MaskMember& __y) noexcept
+ {
+ return __x & __y;
+ }
+
+ _GLIBCXX_SIMD_INTRINSIC static _MaskMember
+ __bit_or(const _MaskMember& __x, const _MaskMember& __y) noexcept
+ {
+ return __x | __y;
+ }
+
+ _GLIBCXX_SIMD_INTRINSIC static _MaskMember
+ __bit_xor(const _MaskMember& __x, const _MaskMember& __y) noexcept
+ {
+ return __x ^ __y;
+ }
+
+ // smart_reference access {{{2
+ _GLIBCXX_SIMD_INTRINSIC static void __set(_MaskMember& __k, int __i,
+ bool __x) noexcept
+ {
+ __k.set(__i, __x);
+ }
+
+ // __masked_assign {{{2
+ _GLIBCXX_SIMD_INTRINSIC static void __masked_assign(const _MaskMember __k,
+ _MaskMember& __lhs,
+ const _MaskMember __rhs)
+ {
+ __lhs = (__lhs & ~__k) | (__rhs & __k);
+ }
+
+ // Optimization for the case where the RHS is a scalar.
+ _GLIBCXX_SIMD_INTRINSIC static void
+ __masked_assign(const _MaskMember __k, _MaskMember& __lhs, const bool __rhs)
+ {
+ if (__rhs)
+ {
+ __lhs |= __k;
+ }
+ else
+ {
+ __lhs &= ~__k;
+ }
+ }
+
+ // }}}2
+ // __all_of {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static bool __all_of(simd_mask<_Tp, _Abi> __k)
+ {
+ return __data(__k).all();
+ }
+
+ // }}}
+ // __any_of {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static bool __any_of(simd_mask<_Tp, _Abi> __k)
+ {
+ return __data(__k).any();
+ }
+
+ // }}}
+ // __none_of {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static bool __none_of(simd_mask<_Tp, _Abi> __k)
+ {
+ return __data(__k).none();
+ }
+
+ // }}}
+ // __some_of {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static bool
+ __some_of([[maybe_unused]] simd_mask<_Tp, _Abi> __k)
+ {
+ if constexpr (_Np == 1)
+ return false;
+ else
+ return __data(__k).any() && !__data(__k).all();
+ }
+
+ // }}}
+ // __popcount {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static int __popcount(simd_mask<_Tp, _Abi> __k)
+ {
+ return __data(__k).count();
+ }
+
+ // }}}
+ // __find_first_set {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static int __find_first_set(simd_mask<_Tp, _Abi> __k)
+ {
+ return _BitOps::__firstbit(__data(__k).to_ullong());
+ }
+
+ // }}}
+ // __find_last_set {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static int __find_last_set(simd_mask<_Tp, _Abi> __k)
+ {
+ return _BitOps::__lastbit(__data(__k).to_ullong());
+ }
+
+ // }}}
+};
+// }}}1
+
+_GLIBCXX_SIMD_END_NAMESPACE
+#endif // __cplusplus >= 201703L
+#endif // _GLIBCXX_EXPERIMENTAL_SIMD_FIXED_SIZE_H_
+
+// vim: foldmethod=marker sw=2 noet ts=8 sts=2 tw=80
diff --git a/libstdc++-v3/include/experimental/bits/simd_math.h b/libstdc++-v3/include/experimental/bits/simd_math.h
new file mode 100644
index 00000000000..4185a3bcaa1
--- /dev/null
+++ b/libstdc++-v3/include/experimental/bits/simd_math.h
@@ -0,0 +1,1451 @@
+// Math overloads for simd -*- C++ -*-
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library. This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+// <http://www.gnu.org/licenses/>.
+
+#ifndef _GLIBCXX_EXPERIMENTAL_SIMD_MATH_H_
+#define _GLIBCXX_EXPERIMENTAL_SIMD_MATH_H_
+
+#if __cplusplus >= 201703L
+
+#include <utility>
+#include <iomanip>
+
+_GLIBCXX_SIMD_BEGIN_NAMESPACE
+template <typename _Tp, typename _V>
+using __samesize = fixed_size_simd<_Tp, _V::size()>;
+// __math_return_type {{{
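+// Maps the return type of the scalar <cmath> overload (evaluated for double
+// arguments) onto the corresponding simd type: double -> simd<_Tp, _Abi>,
+// bool -> simd_mask<_Tp, _Abi>, anything else (e.g. int for ilogb or
+// fpclassify) -> fixed_size_simd<_DoubleR, simd_size_v<_Tp, _Abi>>.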
+template <typename _DoubleR, typename _Tp, typename _Abi>
+struct __math_return_type;
+template <typename _DoubleR, typename _Tp, typename _Abi>
+using __math_return_type_t =
+ typename __math_return_type<_DoubleR, _Tp, _Abi>::type;
+
+template <typename _Tp, typename _Abi>
+struct __math_return_type<double, _Tp, _Abi>
+{
+ using type = std::experimental::simd<_Tp, _Abi>;
+};
+template <typename _Tp, typename _Abi>
+struct __math_return_type<bool, _Tp, _Abi>
+{
+ using type = std::experimental::simd_mask<_Tp, _Abi>;
+};
+template <typename _DoubleR, typename _Tp, typename _Abi>
+struct __math_return_type
+{
+ using type
+ = std::experimental::fixed_size_simd<_DoubleR, simd_size_v<_Tp, _Abi>>;
+};
+//}}}
+// _GLIBCXX_SIMD_MATH_CALL_ {{{
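+// Generates the one-argument simd overload of a <cmath> function __name: the
+// return type is derived from the scalar std::__name(double) via
+// __math_return_type_t, and the call is forwarded to _Abi::_SimdImpl::__##__name.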
+#define _GLIBCXX_SIMD_MATH_CALL_(__name) \
+ template <typename _Tp, typename _Abi, typename..., \
+ typename _R = std::experimental::__math_return_type_t< \
+ decltype(std::__name(std::declval<double>())), _Tp, _Abi>> \
+ enable_if_t<std::is_floating_point_v<_Tp>, _R> __name( \
+ std::experimental::simd<_Tp, _Abi> __x) \
+ { \
+ return {std::experimental::__private_init, \
+ _Abi::_SimdImpl::__##__name(std::experimental::__data(__x))}; \
+ }
+
+// }}}
+//__extra_argument_type{{{
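+// Maps the type of an extra argument of a <cmath> function onto the matching
+// simd argument type: _Tp -> simd<_Tp, _Abi>, _Tp* -> simd<_Tp, _Abi>*, and
+// an integral _Up (or _Up*) -> fixed_size_simd<_Up, simd_size_v<_Tp, _Abi>>
+// (or pointer to it), e.g. the int argument of ldexp or scalbn.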
+template <typename _Up, typename _Tp, typename _Abi>
+struct __extra_argument_type;
+
+template <typename _Tp, typename _Abi>
+struct __extra_argument_type<_Tp*, _Tp, _Abi>
+{
+ using type = std::experimental::simd<_Tp, _Abi>*;
+ static constexpr double* declval();
+ _GLIBCXX_SIMD_INTRINSIC static constexpr auto __data(type __x)
+ {
+ return &std::experimental::__data(*__x);
+ }
+ static constexpr bool __needs_temporary_scalar = true;
+};
+template <typename _Up, typename _Tp, typename _Abi>
+struct __extra_argument_type<_Up*, _Tp, _Abi>
+{
+ static_assert(std::is_integral_v<_Up>);
+ using type = std::experimental::fixed_size_simd<
+ _Up, std::experimental::simd_size_v<_Tp, _Abi>>*;
+ static constexpr _Up* declval();
+ _GLIBCXX_SIMD_INTRINSIC static constexpr auto __data(type __x)
+ {
+ return &std::experimental::__data(*__x);
+ }
+ static constexpr bool __needs_temporary_scalar = true;
+};
+template <typename _Tp, typename _Abi>
+struct __extra_argument_type<_Tp, _Tp, _Abi>
+{
+ using type = std::experimental::simd<_Tp, _Abi>;
+ static constexpr double declval();
+ _GLIBCXX_SIMD_INTRINSIC static constexpr decltype(auto)
+ __data(const type& __x)
+ {
+ return std::experimental::__data(__x);
+ }
+ static constexpr bool __needs_temporary_scalar = false;
+};
+template <typename _Up, typename _Tp, typename _Abi>
+struct __extra_argument_type
+{
+ static_assert(std::is_integral_v<_Up>);
+ using type = std::experimental::fixed_size_simd<
+ _Up, std::experimental::simd_size_v<_Tp, _Abi>>;
+ static constexpr _Up declval();
+ _GLIBCXX_SIMD_INTRINSIC static constexpr decltype(auto)
+ __data(const type& __x)
+ {
+ return std::experimental::__data(__x);
+ }
+ static constexpr bool __needs_temporary_scalar = false;
+};
+//}}}
+// _GLIBCXX_SIMD_MATH_CALL2_ {{{
+#define _GLIBCXX_SIMD_MATH_CALL2_(__name, arg2_) \
+ template <typename _Tp, typename _Abi, typename..., \
+ typename _Arg2 \
+ = std::experimental::__extra_argument_type<arg2_, _Tp, _Abi>, \
+ typename _R = std::experimental::__math_return_type_t< \
+ decltype(std::__name(std::declval<double>(), _Arg2::declval())), \
+ _Tp, _Abi>> \
+ enable_if_t<std::is_floating_point_v<_Tp>, _R> __name( \
+ const std::experimental::simd<_Tp, _Abi>& __x, \
+ const typename _Arg2::type& __y) \
+ { \
+ return {std::experimental::__private_init, \
+ _Abi::_SimdImpl::__##__name(std::experimental::__data(__x), \
+ _Arg2::__data(__y))}; \
+ } \
+ template <typename _Up, typename _Tp, typename _Abi> \
+ _GLIBCXX_SIMD_INTRINSIC std::experimental::__math_return_type_t< \
+ decltype(std::__name( \
+ std::declval<double>(), \
+ std::declval<enable_if_t< \
+ std::conjunction_v< \
+ std::is_same<arg2_, _Tp>, \
+ std::negation<std::is_same<__remove_cvref_t<_Up>, \
+ std::experimental::simd<_Tp, _Abi>>>, \
+ std::is_convertible<_Up, std::experimental::simd<_Tp, _Abi>>, \
+ std::is_floating_point<_Tp>>, \
+ double>>())), \
+ _Tp, _Abi> \
+ __name(_Up&& __xx, const std::experimental::simd<_Tp, _Abi>& __yy) \
+ { \
+ return std::experimental::__name(std::experimental::simd<_Tp, _Abi>( \
+ static_cast<_Up&&>(__xx)), \
+ __yy); \
+ }
+
+// }}}
+// _GLIBCXX_SIMD_MATH_CALL3_ {{{
+#define _GLIBCXX_SIMD_MATH_CALL3_(__name, arg2_, arg3_) \
+ template <typename _Tp, typename _Abi, typename..., \
+ typename _Arg2 \
+ = std::experimental::__extra_argument_type<arg2_, _Tp, _Abi>, \
+ typename _Arg3 \
+ = std::experimental::__extra_argument_type<arg3_, _Tp, _Abi>, \
+ typename _R = std::experimental::__math_return_type_t< \
+ decltype(std::__name(std::declval<double>(), _Arg2::declval(), \
+ _Arg3::declval())), \
+ _Tp, _Abi>> \
+ enable_if_t<std::is_floating_point_v<_Tp>, _R> __name( \
+ std::experimental::simd<_Tp, _Abi> __x, typename _Arg2::type __y, \
+ typename _Arg3::type __z) \
+ { \
+ return {std::experimental::__private_init, \
+ _Abi::_SimdImpl::__##__name(std::experimental::__data(__x), \
+ _Arg2::__data(__y), \
+ _Arg3::__data(__z))}; \
+ } \
+ template <typename _Tp, typename _Up, typename _V, typename..., \
+ typename _TT = __remove_cvref_t<_Tp>, \
+ typename _UU = __remove_cvref_t<_Up>, \
+ typename _VV = __remove_cvref_t<_V>, \
+ typename _Simd \
+ = std::conditional_t<std::experimental::is_simd_v<_UU>, _UU, _VV>> \
+ _GLIBCXX_SIMD_INTRINSIC decltype( \
+ std::experimental::__name(_Simd(std::declval<_Tp>()), \
+ _Simd(std::declval<_Up>()), \
+ _Simd(std::declval<_V>()))) \
+ __name(_Tp&& __xx, _Up&& __yy, _V&& __zz) \
+ { \
+ return std::experimental::__name(_Simd(static_cast<_Tp&&>(__xx)), \
+ _Simd(static_cast<_Up&&>(__yy)), \
+ _Simd(static_cast<_V&&>(__zz))); \
+ }
+
+// }}}
+// __cosSeries {{{
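+// Polynomial approximation of cos(__x) for the folded argument in [-¼π, ¼π]
+// (Taylor series with tuned coefficients), evaluated in Horner form.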
+template <typename _Abi>
+_GLIBCXX_SIMD_ALWAYS_INLINE static simd<float, _Abi>
+__cosSeries(const simd<float, _Abi>& __x)
+{
+ const simd<float, _Abi> __x2 = __x * __x;
+ simd<float, _Abi> __y;
+ __y = 0x1.ap-16f; // 1/8!
+ __y = __y * __x2 - 0x1.6c1p-10f; // -1/6!
+ __y = __y * __x2 + 0x1.555556p-5f; // 1/4!
+ return __y * (__x2 * __x2) - .5f * __x2 + 1.f;
+}
+template <typename _Abi>
+_GLIBCXX_SIMD_ALWAYS_INLINE static simd<double, _Abi>
+__cosSeries(const simd<double, _Abi>& __x)
+{
+ const simd<double, _Abi> __x2 = __x * __x;
+ simd<double, _Abi> __y;
+ __y = 0x1.AC00000000000p-45; // 1/16!
+ __y = __y * __x2 - 0x1.9394000000000p-37; // -1/14!
+ __y = __y * __x2 + 0x1.1EED8C0000000p-29; // 1/12!
+ __y = __y * __x2 - 0x1.27E4FB7400000p-22; // -1/10!
+ __y = __y * __x2 + 0x1.A01A01A018000p-16; // 1/8!
+ __y = __y * __x2 - 0x1.6C16C16C16C00p-10; // -1/6!
+ __y = __y * __x2 + 0x1.5555555555554p-5; // 1/4!
+ return (__y * __x2 - .5f) * __x2 + 1.f;
+}
+
+// }}}
+// __sinSeries {{{
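+// Analogous polynomial approximation of sin(__x) for the folded argument in
+// [-¼π, ¼π], evaluated in Horner form.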
+template <typename _Abi>
+_GLIBCXX_SIMD_ALWAYS_INLINE static simd<float, _Abi>
+__sinSeries(const simd<float, _Abi>& __x)
+{
+ const simd<float, _Abi> __x2 = __x * __x;
+ simd<float, _Abi> __y;
+ __y = -0x1.9CC000p-13f; // -1/7!
+ __y = __y * __x2 + 0x1.111100p-7f; // 1/5!
+ __y = __y * __x2 - 0x1.555556p-3f; // -1/3!
+ return __y * (__x2 * __x) + __x;
+}
+
+template <typename _Abi>
+_GLIBCXX_SIMD_ALWAYS_INLINE static simd<double, _Abi>
+__sinSeries(const simd<double, _Abi>& __x)
+{
+ // __x = [0, 0.7854 = pi/4]
+ // __x² = [0, 0.6169 = pi²/16]
+ const simd<double, _Abi> __x2 = __x * __x;
+ simd<double, _Abi> __y;
+ __y = -0x1.ACF0000000000p-41; // -1/15!
+ __y = __y * __x2 + 0x1.6124400000000p-33; // 1/13!
+ __y = __y * __x2 - 0x1.AE64567000000p-26; // -1/11!
+ __y = __y * __x2 + 0x1.71DE3A5540000p-19; // 1/9!
+ __y = __y * __x2 - 0x1.A01A01A01A000p-13; // -1/7!
+ __y = __y * __x2 + 0x1.1111111111110p-7; // 1/5!
+ __y = __y * __x2 - 0x1.5555555555555p-3; // -1/3!
+ return __y * (__x2 * __x) + __x;
+}
+
+// }}}
+// __zero_low_bits {{{
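+// Clears the _Bits least significant bits of every element's bit pattern
+// (bit_and with ~0 << _Bits); used below to split a value into a high part
+// for exact products and a low remainder.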
+template <int _Bits, typename _Tp, typename _Abi>
+_GLIBCXX_SIMD_INTRINSIC simd<_Tp, _Abi>
+__zero_low_bits(simd<_Tp, _Abi> __x)
+{
+ const simd<_Tp, _Abi> __bitmask = __bit_cast<_Tp>(
+ ~std::make_unsigned_t<__int_for_sizeof_t<_Tp>>() << _Bits);
+ return {__private_init,
+ _Abi::_SimdImpl::__bit_and(__data(__x), __data(__bitmask))};
+}
+
+// }}}
+// __fold_input {{{
+
+/**\internal
+ * Fold \p x into [-¼π, ¼π] and remember the quadrant it came from:
+ * quadrant 0: [-¼π, ¼π]
+ * quadrant 1: [ ¼π, ¾π]
+ * quadrant 2: [ ¾π, 1¼π]
+ * quadrant 3: [1¼π, 1¾π]
+ *
+ * The algorithm determines `y` as the multiple `x - y * ¼π = [-¼π, ¼π]`. Using
+ * a bitmask, `y` is reduced to `quadrant`. `y` can be calculated as
+ * ```
+ * y = trunc(x / ¼π);
+ * y += fmod(y, 2);
+ * ```
+ * This can be simplified by moving the (implicit) division by 2 into the
+ * truncation expression. The `+= fmod` effect can then be achieved by using
+ * rounding instead of truncation: `y = round(x / ½π) * 2`. If precision allows,
+ * `2/π * x` is better (faster).
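+ *
+ * Example: for x = 2, nearbyint(2 · 2/π) = nearbyint(1.273) = 1, so the
+ * quadrant is 1 and the folded value is 2 - π/2 ≈ 0.4292 ∈ [-¼π, ¼π].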
+ */
+template <typename _Tp, typename _Abi> struct __folded
+{
+ simd<_Tp, _Abi> _M_x;
+ rebind_simd_t<int, simd<_Tp, _Abi>> _M_quadrant;
+};
+
+namespace __math_float {
+inline constexpr float __pi_over_4 = 0x1.921FB6p-1f; // π/4
+inline constexpr float __2_over_pi = 0x1.45F306p-1f; // 2/π
+inline constexpr float __pi_2_5bits0
+ = 0x1.921fc0p0f; // π/2, 5 0-bits (least significant)
+inline constexpr float __pi_2_5bits0_rem
+ = -0x1.5777a6p-21f; // π/2 - __pi_2_5bits0
+} // namespace __math_float
+namespace __math_double {
+inline constexpr double __pi_over_4 = 0x1.921fb54442d18p-1; // π/4
+inline constexpr double __2_over_pi = 0x1.45F306DC9C883p-1; // 2/π
+inline constexpr double __pi_2 = 0x1.921fb54442d18p0; // π/2
+} // namespace __math_double
+
+template <typename _Abi>
+_GLIBCXX_SIMD_ALWAYS_INLINE __folded<float, _Abi>
+__fold_input(const simd<float, _Abi>& __x)
+{
+ using _V = simd<float, _Abi>;
+ using _IV = rebind_simd_t<int, _V>;
+ using namespace __math_float;
+ __folded<float, _Abi> __r;
+ __r._M_x = abs(__x);
+#if 0
+ // zero most mantissa bits:
+ constexpr float __1_over_pi = 0x1.45F306p-2f; // 1/π
+ const auto __y = (__r._M_x * __1_over_pi + 0x1.8p23f) - 0x1.8p23f;
+ // split π into 4 parts, the first three with 13 trailing zeros (to make the following
+ // multiplications precise):
+ constexpr float __pi0 = 0x1.920000p1f;
+ constexpr float __pi1 = 0x1.fb4000p-11f;
+ constexpr float __pi2 = 0x1.444000p-23f;
+ constexpr float __pi3 = 0x1.68c234p-38f;
+ __r._M_x - __y*__pi0 - __y*__pi1 - __y*__pi2 - __y*__pi3
+#else
+ if (_GLIBCXX_SIMD_IS_UNLIKELY(all_of(__r._M_x < __pi_over_4)))
+ __r._M_quadrant = 0;
+ else if (_GLIBCXX_SIMD_IS_LIKELY(all_of(__r._M_x < 6 * __pi_over_4)))
+ {
+ const _V __y = nearbyint(__r._M_x * __2_over_pi);
+ __r._M_quadrant = static_simd_cast<_IV>(__y) & 3; // __y mod 4
+ __r._M_x -= __y * __pi_2_5bits0;
+ __r._M_x -= __y * __pi_2_5bits0_rem;
+ }
+ else
+ {
+ using __math_double::__2_over_pi;
+ using __math_double::__pi_2;
+ using _VD = rebind_simd_t<double, _V>;
+ _VD __xd = static_simd_cast<_VD>(__r._M_x);
+ _VD __y = nearbyint(__xd * __2_over_pi);
+ __r._M_quadrant = static_simd_cast<_IV>(__y) & 3; // = __y mod 4
+ __r._M_x = static_simd_cast<_V>(__xd - __y * __pi_2);
+ }
+#endif
+ return __r;
+}
+
+template <typename _Abi>
+_GLIBCXX_SIMD_ALWAYS_INLINE __folded<double, _Abi>
+__fold_input(const simd<double, _Abi>& __x)
+{
+ using _V = simd<double, _Abi>;
+ using _IV = rebind_simd_t<int, _V>;
+ using namespace __math_double;
+
+ __folded<double, _Abi> __r;
+ __r._M_x = abs(__x);
+ if (_GLIBCXX_SIMD_IS_UNLIKELY(all_of(__r._M_x < __pi_over_4)))
+ {
+ __r._M_quadrant = 0;
+ return __r;
+ }
+ const _V __y = nearbyint(__r._M_x / (2 * __pi_over_4));
+ __r._M_quadrant = static_simd_cast<_IV>(__y) & 3;
+
+ if (_GLIBCXX_SIMD_IS_LIKELY(all_of(__r._M_x < 1025 * __pi_over_4)))
+ {
+ // x - y * pi/2, y uses no more than 11 mantissa bits
+ __r._M_x -= __y * 0x1.921FB54443000p0;
+ __r._M_x -= __y * -0x1.73DCB3B39A000p-43;
+ __r._M_x -= __y * 0x1.45C06E0E68948p-86;
+ }
+ else if (_GLIBCXX_SIMD_IS_LIKELY(all_of(__y <= 0x1.0p30)))
+ {
+ // x - y * pi/2, y uses no more than 29 mantissa bits
+ __r._M_x -= __y * 0x1.921FB40000000p0;
+ __r._M_x -= __y * 0x1.4442D00000000p-24;
+ __r._M_x -= __y * 0x1.8469898CC5170p-48;
+ }
+ else
+ {
+ // x - y * pi/2, y may require all mantissa bits
+ const _V __y_hi = __zero_low_bits<26>(__y);
+ const _V __y_lo = __y - __y_hi;
+ const auto __pi_2_1 = 0x1.921FB50000000p0;
+ const auto __pi_2_2 = 0x1.110B460000000p-26;
+ const auto __pi_2_3 = 0x1.1A62630000000p-54;
+ const auto __pi_2_4 = 0x1.8A2E03707344Ap-81;
+ __r._M_x = __r._M_x - __y_hi * __pi_2_1
+ - max(__y_hi * __pi_2_2, __y_lo * __pi_2_1)
+ - min(__y_hi * __pi_2_2, __y_lo * __pi_2_1)
+ - max(__y_hi * __pi_2_3, __y_lo * __pi_2_2)
+ - min(__y_hi * __pi_2_3, __y_lo * __pi_2_2)
+ - max(__y * __pi_2_4, __y_lo * __pi_2_3)
+ - min(__y * __pi_2_4, __y_lo * __pi_2_3);
+ }
+ return __r;
+}
+
+// }}}
+// __extract_exponent_bits {{{
+template <typename _Abi>
+rebind_simd_t<int, simd<float, _Abi>>
+__extract_exponent_bits(const simd<float, _Abi>& __v)
+{
+ using namespace std::experimental::__proposed;
+ using namespace std::experimental::__proposed::float_bitwise_operators;
+ _GLIBCXX_SIMD_CONSTEXPR simd<float, _Abi> __exponent_mask
+ = std::numeric_limits<float>::infinity(); // 0x7f800000
+ return __bit_cast<rebind_simd_t<int, simd<float, _Abi>>>(__v
+ & __exponent_mask);
+}
+
+template <typename _Abi>
+rebind_simd_t<int, simd<double, _Abi>>
+__extract_exponent_bits(const simd<double, _Abi>& __v)
+{
+ using namespace std::experimental::_P0918;
+ using namespace std::experimental::__proposed::float_bitwise_operators;
+ const simd<double, _Abi> __exponent_mask
+ = std::numeric_limits<double>::infinity(); // 0x7ff0000000000000
+ constexpr auto _Np = simd_size_v<double, _Abi> * 2;
+ constexpr auto _Max = simd_abi::max_fixed_size<int>;
+ if constexpr (_Np > _Max)
+ {
+ const auto __tup
+ = split<_Max / 2, (_Np - _Max) / 2>(__v & __exponent_mask);
+ return concat(
+ shuffle<strided<2, 1>>(
+ __bit_cast<simd<int, simd_abi::deduce_t<int, _Max>>>(
+ std::get<0>(__tup))),
+ shuffle<strided<2, 1>>(
+ __bit_cast<simd<int, simd_abi::deduce_t<int, _Np - _Max>>>(
+ std::get<1>(__tup))));
+ }
+ else
+ return shuffle<strided<2, 1>>(
+ __bit_cast<simd<int, simd_abi::deduce_t<int, _Np>>>(__v
+ & __exponent_mask));
+}
+
+// }}}
+// __impl_or_fallback {{{
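+// Invokes __impl_fun(__args...) if that expression is well-formed, otherwise
+// falls back to __fallback_fun(__args...). The int/float dummy parameter
+// ranks the __impl_fun overload as the better match; SFINAE removes it when
+// the call would be ill-formed.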
+template <typename ImplFun, typename FallbackFun, typename... _Args>
+_GLIBCXX_SIMD_INTRINSIC auto
+__impl_or_fallback_dispatch(int, ImplFun&& __impl_fun, FallbackFun&&,
+ _Args&&... __args)
+ -> decltype(__impl_fun(static_cast<_Args&&>(__args)...))
+{
+ return __impl_fun(static_cast<_Args&&>(__args)...);
+}
+
+template <typename ImplFun, typename FallbackFun, typename... _Args>
+inline auto
+__impl_or_fallback_dispatch(float, ImplFun&&, FallbackFun&& __fallback_fun,
+ _Args&&... __args)
+ -> decltype(__fallback_fun(static_cast<_Args&&>(__args)...))
+{
+ return __fallback_fun(static_cast<_Args&&>(__args)...);
+}
+
+template <typename... _Args>
+_GLIBCXX_SIMD_INTRINSIC auto
+__impl_or_fallback(_Args&&... __args)
+{
+ return __impl_or_fallback_dispatch(int(), static_cast<_Args&&>(__args)...);
+} //}}}
+
+// trigonometric functions {{{
+_GLIBCXX_SIMD_MATH_CALL_(acos)
+_GLIBCXX_SIMD_MATH_CALL_(asin)
+_GLIBCXX_SIMD_MATH_CALL_(atan)
+_GLIBCXX_SIMD_MATH_CALL2_(atan2, _Tp)
+
+/*
+ * algorithm for sine and cosine:
+ *
+ * The result can be calculated with sine or cosine depending on the π/4 section
+ * the input is in.
+ *   sine   ≈ __x + __x³
+ *   cosine ≈ 1 - __x²
+ *
+ * sine:
+ * Map -__x to __x and invert the output
+ * Extend precision of __x - n * π/4 by calculating
+ * ((__x - n * p1) - n * p2) - n * p3 (p1 + p2 + p3 = π/4)
+ *
+ * Calculate Taylor series with tuned coefficients.
+ * Fix sign.
+ */
+// cos{{{
+template <typename _Tp, typename _Abi>
+enable_if_t<std::is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
+cos(const simd<_Tp, _Abi>& __x)
+{
+ using _V = simd<_Tp, _Abi>;
+ if constexpr (__is_scalar_abi<_Abi>() || __is_fixed_size_abi_v<_Abi>)
+ return {__private_init, _Abi::_SimdImpl::__cos(__data(__x))};
+ else
+ {
+ if constexpr (is_same_v<_Tp, float>)
+ if (_GLIBCXX_SIMD_IS_UNLIKELY(any_of(abs(__x) >= 393382)))
+ return static_simd_cast<_V>(
+ cos(static_simd_cast<rebind_simd_t<double, _V>>(__x)));
+
+ const auto __f = __fold_input(__x);
+ // quadrant | effect
+ // 0 | cosSeries, +
+ // 1 | sinSeries, -
+ // 2 | cosSeries, -
+ // 3 | sinSeries, +
+ using namespace std::experimental::__proposed::float_bitwise_operators;
+ const _V __sign_flip
+ = _V(-0.f) & static_simd_cast<_V>((1 + __f._M_quadrant) << 30);
+
+ const auto __need_cos = (__f._M_quadrant & 1) == 0;
+ if (_GLIBCXX_SIMD_IS_UNLIKELY(all_of(__need_cos)))
+ return __sign_flip ^ __cosSeries(__f._M_x);
+ else if (_GLIBCXX_SIMD_IS_UNLIKELY(none_of(__need_cos)))
+ return __sign_flip ^ __sinSeries(__f._M_x);
+ else // some_of(__need_cos)
+ {
+ _V __r = __sinSeries(__f._M_x);
+ where(__need_cos.__cvt(), __r) = __cosSeries(__f._M_x);
+ return __r ^ __sign_flip;
+ }
+ }
+}
+
+template <typename _Tp>
+_GLIBCXX_SIMD_ALWAYS_INLINE
+ enable_if_t<std::is_floating_point<_Tp>::value, simd<_Tp, simd_abi::scalar>>
+ cos(simd<_Tp, simd_abi::scalar> __x)
+{
+ return std::cos(__data(__x));
+}
+//}}}
+// sin{{{
+template <typename _Tp, typename _Abi>
+enable_if_t<std::is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
+sin(const simd<_Tp, _Abi>& __x)
+{
+ using _V = simd<_Tp, _Abi>;
+ if constexpr (__is_scalar_abi<_Abi>() || __is_fixed_size_abi_v<_Abi>)
+ return {__private_init, _Abi::_SimdImpl::__sin(__data(__x))};
+ else
+ {
+ if constexpr (is_same_v<_Tp, float>)
+ if (_GLIBCXX_SIMD_IS_UNLIKELY(any_of(abs(__x) >= 527449)))
+ return static_simd_cast<_V>(
+ sin(static_simd_cast<rebind_simd_t<double, _V>>(__x)));
+
+ const auto __f = __fold_input(__x);
+ // quadrant | effect
+ // 0 | sinSeries
+ // 1 | cosSeries
+ // 2 | sinSeries, sign flip
+ // 3 | cosSeries, sign flip
+ using namespace std::experimental::__proposed::float_bitwise_operators;
+ const auto __sign_flip
+ = (__x ^ static_simd_cast<_V>(1 - __f._M_quadrant)) & _V(_Tp(-0.));
+
+ const auto __need_sin = (__f._M_quadrant & 1) == 0;
+ if (_GLIBCXX_SIMD_IS_UNLIKELY(all_of(__need_sin)))
+ return __sign_flip ^ __sinSeries(__f._M_x);
+ else if (_GLIBCXX_SIMD_IS_UNLIKELY(none_of(__need_sin)))
+ return __sign_flip ^ __cosSeries(__f._M_x);
+ else // some_of(__need_sin)
+ {
+ _V __r = __cosSeries(__f._M_x);
+ where(__need_sin.__cvt(), __r) = __sinSeries(__f._M_x);
+ return __sign_flip ^ __r;
+ }
+ }
+}
+
+template <typename _Tp>
+_GLIBCXX_SIMD_ALWAYS_INLINE
+ enable_if_t<std::is_floating_point<_Tp>::value, simd<_Tp, simd_abi::scalar>>
+ sin(simd<_Tp, simd_abi::scalar> __x)
+{
+ return std::sin(__data(__x));
+}
+//}}}
+
+_GLIBCXX_SIMD_MATH_CALL_(tan)
+_GLIBCXX_SIMD_MATH_CALL_(acosh)
+_GLIBCXX_SIMD_MATH_CALL_(asinh)
+_GLIBCXX_SIMD_MATH_CALL_(atanh)
+_GLIBCXX_SIMD_MATH_CALL_(cosh)
+_GLIBCXX_SIMD_MATH_CALL_(sinh)
+_GLIBCXX_SIMD_MATH_CALL_(tanh)
+// }}}
+// exponential functions {{{
+_GLIBCXX_SIMD_MATH_CALL_(exp)
+_GLIBCXX_SIMD_MATH_CALL_(exp2)
+_GLIBCXX_SIMD_MATH_CALL_(expm1)
+// }}}
+// frexp {{{
+#if _GLIBCXX_SIMD_X86INTRIN
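+// __getexp: wraps the AVX-512 vgetexp{ps,pd} instructions, which return the
+// unbiased exponent (floor(log2(|x|))) of each element as a floating-point
+// value.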
+template <typename _Tp, size_t _Np>
+_SimdWrapper<_Tp, _Np>
+__getexp(_SimdWrapper<_Tp, _Np> __x)
+{
+ if constexpr (__have_avx512vl && __is_sse_ps<_Tp, _Np>())
+ return __auto_bitcast(_mm_getexp_ps(__to_intrin(__x)));
+ else if constexpr (__have_avx512f && __is_sse_ps<_Tp, _Np>())
+ return __auto_bitcast(_mm512_getexp_ps(__auto_bitcast(__to_intrin(__x))));
+ else if constexpr (__have_avx512vl && __is_sse_pd<_Tp, _Np>())
+ return _mm_getexp_pd(__x);
+ else if constexpr (__have_avx512f && __is_sse_pd<_Tp, _Np>())
+ return __lo128(_mm512_getexp_pd(__auto_bitcast(__x)));
+ else if constexpr (__have_avx512vl && __is_avx_ps<_Tp, _Np>())
+ return _mm256_getexp_ps(__x);
+ else if constexpr (__have_avx512f && __is_avx_ps<_Tp, _Np>())
+ return __lo256(_mm512_getexp_ps(__auto_bitcast(__x)));
+ else if constexpr (__have_avx512vl && __is_avx_pd<_Tp, _Np>())
+ return _mm256_getexp_pd(__x);
+ else if constexpr (__have_avx512f && __is_avx_pd<_Tp, _Np>())
+ return __lo256(_mm512_getexp_pd(__auto_bitcast(__x)));
+ else if constexpr (__is_avx512_ps<_Tp, _Np>())
+ return _mm512_getexp_ps(__x);
+ else if constexpr (__is_avx512_pd<_Tp, _Np>())
+ return _mm512_getexp_pd(__x);
+ else
+ __assert_unreachable<_Tp>();
+}
+
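+// __getmant_avx512: wraps the AVX-512 vgetmant{ps,pd} instructions; with
+// _MM_MANT_NORM_p5_1 and _MM_MANT_SIGN_src the mantissa of each element is
+// normalized to [0.5, 1.0) while the sign of the source is preserved.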
+template <typename _Tp, size_t _Np>
+_SimdWrapper<_Tp, _Np>
+__getmant_avx512(_SimdWrapper<_Tp, _Np> __x)
+{
+ if constexpr (__have_avx512vl && __is_sse_ps<_Tp, _Np>())
+ return __auto_bitcast(
+ _mm_getmant_ps(__to_intrin(__x), _MM_MANT_NORM_p5_1, _MM_MANT_SIGN_src));
+ else if constexpr (__have_avx512f && __is_sse_ps<_Tp, _Np>())
+ return __auto_bitcast(_mm512_getmant_ps(__auto_bitcast(__to_intrin(__x)),
+ _MM_MANT_NORM_p5_1,
+ _MM_MANT_SIGN_src));
+ else if constexpr (__have_avx512vl && __is_sse_pd<_Tp, _Np>())
+ return _mm_getmant_pd(__x, _MM_MANT_NORM_p5_1, _MM_MANT_SIGN_src);
+ else if constexpr (__have_avx512f && __is_sse_pd<_Tp, _Np>())
+ return __lo128(_mm512_getmant_pd(__auto_bitcast(__x), _MM_MANT_NORM_p5_1,
+ _MM_MANT_SIGN_src));
+ else if constexpr (__have_avx512vl && __is_avx_ps<_Tp, _Np>())
+ return _mm256_getmant_ps(__x, _MM_MANT_NORM_p5_1, _MM_MANT_SIGN_src);
+ else if constexpr (__have_avx512f && __is_avx_ps<_Tp, _Np>())
+ return __lo256(_mm512_getmant_ps(__auto_bitcast(__x), _MM_MANT_NORM_p5_1,
+ _MM_MANT_SIGN_src));
+ else if constexpr (__have_avx512vl && __is_avx_pd<_Tp, _Np>())
+ return _mm256_getmant_pd(__x, _MM_MANT_NORM_p5_1, _MM_MANT_SIGN_src);
+ else if constexpr (__have_avx512f && __is_avx_pd<_Tp, _Np>())
+ return __lo256(_mm512_getmant_pd(__auto_bitcast(__x), _MM_MANT_NORM_p5_1,
+ _MM_MANT_SIGN_src));
+ else if constexpr (__is_avx512_ps<_Tp, _Np>())
+ return _mm512_getmant_ps(__x, _MM_MANT_NORM_p5_1, _MM_MANT_SIGN_src);
+ else if constexpr (__is_avx512_pd<_Tp, _Np>())
+ return _mm512_getmant_pd(__x, _MM_MANT_NORM_p5_1, _MM_MANT_SIGN_src);
+ else
+ __assert_unreachable<_Tp>();
+}
+#endif // _GLIBCXX_SIMD_X86INTRIN
+
+/**
+ * Splits \p __x into exponent and mantissa; the sign is kept with the mantissa.
+ *
+ * The magnitude of the return value will be in the range [0.5, 1.0).
+ * The value written to \p __exp is an integer defining the power-of-two exponent.
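+ * E.g. an element value of 8 yields a mantissa of 0.5 and an exponent of 4
+ * (8 = 0.5 · 2⁴).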
+ */
+template <typename _Tp, typename _Abi>
+enable_if_t<std::is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
+frexp(const simd<_Tp, _Abi>& __x, __samesize<int, simd<_Tp, _Abi>>* __exp)
+{
+ if constexpr (simd_size_v<_Tp, _Abi> == 1)
+ {
+ int __tmp;
+ const auto __r = std::frexp(__x[0], &__tmp);
+ (*__exp)[0] = __tmp;
+ return __r;
+ }
+ else if constexpr (__is_fixed_size_abi_v<_Abi>)
+ {
+ return {__private_init,
+ _Abi::_SimdImpl::__frexp(__data(__x), __data(*__exp))};
+#if _GLIBCXX_SIMD_X86INTRIN
+ }
+ else if constexpr (__have_avx512f)
+ {
+ using _IV = __samesize<int, simd<_Tp, _Abi>>;
+ constexpr size_t _Np = simd_size_v<_Tp, _Abi>;
+ constexpr size_t _NI = _Np < 4 ? 4 : _Np;
+ const auto __v = __data(__x);
+ const auto __isnonzero
+ = _Abi::_SimdImpl::__isnonzerovalue_mask(__v._M_data);
+ const _SimdWrapper<int, _NI> __exp_plus1
+ = 1 + __convert<_SimdWrapper<int, _NI>>(__getexp(__v))._M_data;
+ const _SimdWrapper<int, _Np> __e = __wrapper_bitcast<int, _Np>(
+ _Abi::_CommonImpl::_S_blend(_SimdWrapper<bool, _NI>(__isnonzero),
+ _SimdWrapper<int, _NI>(), __exp_plus1));
+ simd_abi::deduce_t<int, _Np>::_CommonImpl::__store(
+ __e, __exp, overaligned<alignof(_IV)>);
+ return {__private_init,
+ _Abi::_CommonImpl::_S_blend(_SimdWrapper<bool, _Np>(__isnonzero),
+ __v, __getmant_avx512(__v))};
+#endif // _GLIBCXX_SIMD_X86INTRIN
+ }
+ else
+ {
+ // fallback implementation
+ static_assert(sizeof(_Tp) == 4 || sizeof(_Tp) == 8);
+ using _V = simd<_Tp, _Abi>;
+ using _IV = rebind_simd_t<int, _V>;
+ using _Limits = std::numeric_limits<_Tp>;
+ using namespace std::experimental::__proposed;
+ using namespace std::experimental::__proposed::float_bitwise_operators;
+
+ constexpr int __exp_shift = sizeof(_Tp) == 4 ? 23 : 20;
+ constexpr int __exp_adjust = sizeof(_Tp) == 4 ? 0x7e : 0x3fe;
+ constexpr int __exp_offset = sizeof(_Tp) == 4 ? 0x70 : 0x200;
+ constexpr _Tp __subnorm_scale = sizeof(_Tp) == 4 ? 0x1p112 : 0x1p512;
+ _GLIBCXX_SIMD_CONSTEXPR _V __exponent_mask
+ = _Limits::infinity(); // 0x7f800000 or 0x7ff0000000000000
+ _GLIBCXX_SIMD_CONSTEXPR _V __p5_1_exponent
+ = _Tp(sizeof(_Tp) == 4 ? -0x1.fffffep-1 : -0x1.fffffffffffffp-1);
+
+ _V __mant = __p5_1_exponent & (__exponent_mask | __x);
+ const _IV __exponent_bits = __extract_exponent_bits(__x);
+ if (_GLIBCXX_SIMD_IS_LIKELY(all_of(isnormal(__x))))
+ {
+ *__exp = simd_cast<__samesize<int, _V>>(
+ (__exponent_bits >> __exp_shift) - __exp_adjust);
+ return __mant;
+ }
+
+ // can't use isunordered(x*inf, x*0) because inf*0 raises invalid
+ const auto __as_int
+ = __bit_cast<rebind_simd_t<__int_for_sizeof_t<_Tp>, _V>>(abs(__x));
+ const auto __inf = __bit_cast<rebind_simd_t<__int_for_sizeof_t<_Tp>, _V>>(
+ _V(std::numeric_limits<_Tp>::infinity()));
+ const auto __iszero_inf_nan = static_simd_cast<typename _V::mask_type>(
+ __as_int == 0 || __as_int >= __inf);
+
+ const _V __scaled_subnormal = __x * __subnorm_scale;
+ const _V __mant_subnormal
+ = __p5_1_exponent & (__exponent_mask | __scaled_subnormal);
+ where(!isnormal(__x), __mant) = __mant_subnormal;
+ where(__iszero_inf_nan, __mant) = __x;
+ _IV __e = __extract_exponent_bits(__scaled_subnormal);
+ using _MaskType = typename std::conditional_t<
+ sizeof(typename _V::mask_type) == sizeof(_IV), _V, _IV>::mask_type;
+ const _MaskType __value_isnormal = isnormal(__x).__cvt();
+ where(__value_isnormal.__cvt(), __e) = __exponent_bits;
+ static_assert(sizeof(_IV) == sizeof(__value_isnormal));
+ const _IV __offset
+ = (__bit_cast<_IV>(__value_isnormal) & _IV(__exp_adjust))
+ | (__bit_cast<_IV>(static_simd_cast<_MaskType>(__exponent_bits == 0)
+ & static_simd_cast<_MaskType>(__x != 0))
+ & _IV(__exp_adjust + __exp_offset));
+ *__exp = simd_cast<__samesize<int, _V>>((__e >> __exp_shift) - __offset);
+ return __mant;
+ }
+}
+// }}}
+_GLIBCXX_SIMD_MATH_CALL2_(ldexp, int)
+_GLIBCXX_SIMD_MATH_CALL_(ilogb)
+
+// logarithms {{{
+_GLIBCXX_SIMD_MATH_CALL_(log)
+_GLIBCXX_SIMD_MATH_CALL_(log10)
+_GLIBCXX_SIMD_MATH_CALL_(log1p)
+_GLIBCXX_SIMD_MATH_CALL_(log2)
+//}}}
+// logb{{{
+template <typename _Tp, typename _Abi>
+enable_if_t<std::is_floating_point<_Tp>::value, simd<_Tp, _Abi>>
+logb(const simd<_Tp, _Abi>& __x)
+{
+ constexpr size_t _Np = simd_size_v<_Tp, _Abi>;
+ if constexpr (_Np == 1)
+ return std::logb(__x[0]);
+ else if constexpr (__is_fixed_size_abi_v<_Abi>)
+ {
+ return {__private_init,
+ __data(__x).__apply_per_chunk([](auto __impl, auto __xx) {
+ using _V = typename decltype(__impl)::simd_type;
+ return __data(
+ std::experimental::logb(_V(__private_init, __xx)));
+ })};
+ }
+#if _GLIBCXX_SIMD_X86INTRIN // {{{
+ else if constexpr (__have_avx512vl && __is_sse_ps<_Tp, _Np>())
+ return {__private_init,
+ __auto_bitcast(_mm_getexp_ps(__to_intrin(__as_vector(__x))))};
+ else if constexpr (__have_avx512vl && __is_sse_pd<_Tp, _Np>())
+ return {__private_init, _mm_getexp_pd(__data(__x))};
+ else if constexpr (__have_avx512vl && __is_avx_ps<_Tp, _Np>())
+ return {__private_init, _mm256_getexp_ps(__data(__x))};
+ else if constexpr (__have_avx512vl && __is_avx_pd<_Tp, _Np>())
+ return {__private_init, _mm256_getexp_pd(__data(__x))};
+ else if constexpr (__have_avx512f && __is_avx_ps<_Tp, _Np>())
+ return {__private_init,
+ __lo256(_mm512_getexp_ps(__auto_bitcast(__data(__x))))};
+ else if constexpr (__have_avx512f && __is_avx_pd<_Tp, _Np>())
+ return {__private_init,
+ __lo256(_mm512_getexp_pd(__auto_bitcast(__data(__x))))};
+ else if constexpr (__is_avx512_ps<_Tp, _Np>())
+ return {__private_init, _mm512_getexp_ps(__data(__x))};
+ else if constexpr (__is_avx512_pd<_Tp, _Np>())
+ return {__private_init, _mm512_getexp_pd(__data(__x))};
+#endif // _GLIBCXX_SIMD_X86INTRIN }}}
+ else
+ {
+ using _V = simd<_Tp, _Abi>;
+ using namespace std::experimental::__proposed;
+ auto __is_normal = isnormal(__x);
+
+ // work on abs(__x) to reflect the return value on Linux for negative
+ // inputs (domain-error => implementation-defined value is returned)
+ const _V abs_x = abs(__x);
+
+ // __exponent(__x) returns the exponent value (bias removed) as simd<_Up>
+ // with integral _Up
+ auto&& __exponent = [](const _V& __v) {
+ using namespace std::experimental::__proposed;
+ using _IV = rebind_simd_t<
+ std::conditional_t<sizeof(_Tp) == sizeof(_LLong), _LLong, int>, _V>;
+ return (__bit_cast<_IV>(__v) >> (std::numeric_limits<_Tp>::digits - 1))
+ - (std::numeric_limits<_Tp>::max_exponent - 1);
+ };
+ _V __r = static_simd_cast<_V>(__exponent(abs_x));
+ if (_GLIBCXX_SIMD_IS_LIKELY(all_of(__is_normal)))
+ // without corner cases (nan, inf, subnormal, zero) we have our
+ // answer:
+ return __r;
+ const auto __is_zero = __x == 0;
+ const auto __is_nan = isnan(__x);
+ const auto __is_inf = isinf(__x);
+ where(__is_zero, __r) = -std::numeric_limits<_Tp>::infinity();
+ where(__is_nan, __r) = __x;
+ where(__is_inf, __r) = std::numeric_limits<_Tp>::infinity();
+ __is_normal |= __is_zero || __is_nan || __is_inf;
+ if (all_of(__is_normal))
+ // at this point everything but subnormals is handled
+ return __r;
+ // subnormals repeat the exponent extraction after multiplication of the
+      // input with a floating-point value that has 112 (0x70) in its exponent
+ // (not too big for sp and large enough for dp)
+      const _V __scaled = __abs_x * _Tp(0x1p112);
+ _V __scaled_exp = static_simd_cast<_V>(__exponent(__scaled) - 112);
+ where(__is_normal, __scaled_exp) = __r;
+ return __scaled_exp;
+ }
+}
+//}}}
+template <typename _Tp, typename _Abi>
+enable_if_t<std::is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
+modf(const simd<_Tp, _Abi>& __x, simd<_Tp, _Abi>* __iptr)
+{
+ const auto __integral = trunc(__x);
+ *__iptr = __integral;
+ auto __r = __x - __integral;
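+  // for infinite __x, __x - __integral is inf - inf == NaN; Annex F requires
+  // modf to return ±0 with the sign of __x, which the where() and copysign
+  // below establish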
+ where(isinf(__x), __r) = _Tp();
+ return copysign(__r, __x);
+}
+
+_GLIBCXX_SIMD_MATH_CALL2_(scalbn, int)
+_GLIBCXX_SIMD_MATH_CALL2_(scalbln, long)
+
+_GLIBCXX_SIMD_MATH_CALL_(cbrt)
+
+_GLIBCXX_SIMD_MATH_CALL_(abs)
+_GLIBCXX_SIMD_MATH_CALL_(fabs)
+
+// [parallel.simd.math] only asks for is_floating_point_v<_Tp> and forgot to
+// allow signed integral _Tp
+template <typename _Tp, typename _Abi>
+enable_if_t<!std::is_floating_point_v<_Tp> && std::is_signed_v<_Tp>,
+ simd<_Tp, _Abi>>
+abs(const simd<_Tp, _Abi>& __x)
+{
+ return {__private_init, _Abi::_SimdImpl::__abs(__data(__x))};
+}
+template <typename _Tp, typename _Abi>
+enable_if_t<!std::is_floating_point_v<_Tp> && std::is_signed_v<_Tp>,
+ simd<_Tp, _Abi>>
+fabs(const simd<_Tp, _Abi>& __x)
+{
+ return {__private_init, _Abi::_SimdImpl::__abs(__data(__x))};
+}
+
+// the following are overloads for functions in <cstdlib> and not covered by
+// [parallel.simd.math]. I don't see much value in making them work, though
+/*
+template <typename _Abi> simd<long, _Abi> labs(const simd<long, _Abi> &__x)
+{
+ return {__private_init, _Abi::_SimdImpl::abs(__data(__x))};
+}
+template <typename _Abi> simd<long long, _Abi> llabs(const simd<long long, _Abi>
+&__x)
+{
+ return {__private_init, _Abi::_SimdImpl::abs(__data(__x))};
+}
+*/
+
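+// The _GLIBCXX_SIMD_CVTING{2,3} macros define additional overloads where one
+// or two parameters are taken as __id<simd<_Tp, _Abi>>. Since __id<...> is a
+// non-deduced context, _Tp and _Abi are deduced from the remaining simd
+// parameter(s) only, and the __id parameters accept anything implicitly
+// convertible to simd<_Tp, _Abi> (e.g. a broadcast scalar); the bodies just
+// forward to the all-simd overload of the respective function.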
+#define _GLIBCXX_SIMD_CVTING2(_NAME) \
+ template <typename _Tp, typename _Abi> \
+ _GLIBCXX_SIMD_INTRINSIC simd<_Tp, _Abi> _NAME( \
+ const simd<_Tp, _Abi>& __x, const __id<simd<_Tp, _Abi>>& __y) \
+ { \
+ return _NAME(__x, __y); \
+ } \
+ template <typename _Tp, typename _Abi> \
+ _GLIBCXX_SIMD_INTRINSIC simd<_Tp, _Abi> _NAME( \
+ const __id<simd<_Tp, _Abi>>& __x, const simd<_Tp, _Abi>& __y) \
+ { \
+ return _NAME(__x, __y); \
+ }
+
+#define _GLIBCXX_SIMD_CVTING3(_NAME) \
+ template <typename _Tp, typename _Abi> \
+ _GLIBCXX_SIMD_INTRINSIC simd<_Tp, _Abi> _NAME( \
+ const __id<simd<_Tp, _Abi>>& __x, const simd<_Tp, _Abi>& __y, \
+ const simd<_Tp, _Abi>& __z) \
+ { \
+ return _NAME(__x, __y, __z); \
+ } \
+ template <typename _Tp, typename _Abi> \
+ _GLIBCXX_SIMD_INTRINSIC simd<_Tp, _Abi> _NAME( \
+ const simd<_Tp, _Abi>& __x, const __id<simd<_Tp, _Abi>>& __y, \
+ const simd<_Tp, _Abi>& __z) \
+ { \
+ return _NAME(__x, __y, __z); \
+ } \
+ template <typename _Tp, typename _Abi> \
+ _GLIBCXX_SIMD_INTRINSIC simd<_Tp, _Abi> _NAME( \
+ const simd<_Tp, _Abi>& __x, const simd<_Tp, _Abi>& __y, \
+ const __id<simd<_Tp, _Abi>>& __z) \
+ { \
+ return _NAME(__x, __y, __z); \
+ } \
+ template <typename _Tp, typename _Abi> \
+ _GLIBCXX_SIMD_INTRINSIC simd<_Tp, _Abi> _NAME( \
+ const simd<_Tp, _Abi>& __x, const __id<simd<_Tp, _Abi>>& __y, \
+ const __id<simd<_Tp, _Abi>>& __z) \
+ { \
+ return _NAME(__x, __y, __z); \
+ } \
+ template <typename _Tp, typename _Abi> \
+ _GLIBCXX_SIMD_INTRINSIC simd<_Tp, _Abi> _NAME( \
+ const __id<simd<_Tp, _Abi>>& __x, const simd<_Tp, _Abi>& __y, \
+ const __id<simd<_Tp, _Abi>>& __z) \
+ { \
+ return _NAME(__x, __y, __z); \
+ } \
+ template <typename _Tp, typename _Abi> \
+ _GLIBCXX_SIMD_INTRINSIC simd<_Tp, _Abi> _NAME( \
+ const __id<simd<_Tp, _Abi>>& __x, const __id<simd<_Tp, _Abi>>& __y, \
+ const simd<_Tp, _Abi>& __z) \
+ { \
+ return _NAME(__x, __y, __z); \
+ }
+
+template <typename _R, typename _ToApply, typename _Tp, typename... _Tps>
+_GLIBCXX_SIMD_INTRINSIC _R
+__fixed_size_apply(_ToApply&& __apply, const _Tp& __arg0, const _Tps&... __args)
+{
+ return {__private_init,
+ __data(__arg0).__apply_per_chunk(
+ [&](auto __impl, const auto&... __inner) {
+ using _V = typename decltype(__impl)::simd_type;
+ return __data(__apply(_V(__private_init, __inner)...));
+ },
+ __data(__args)...)};
+}
+
+template <typename _VV>
+__remove_cvref_t<_VV>
+__hypot(_VV __x, _VV __y)
+{
+ using _V = __remove_cvref_t<_VV>;
+ using _Tp = typename _V::value_type;
+ if constexpr (_V::size() == 1)
+ return std::hypot(_Tp(__x[0]), _Tp(__y[0]));
+ else if constexpr (__is_fixed_size_abi_v<typename _V::abi_type>)
+ {
+ return __fixed_size_apply<_V>([](auto __a,
+ auto __b) { return hypot(__a, __b); },
+ __x, __y);
+ }
+ else
+ {
+ // A simple solution for _Tp == float would be to cast to double and
+ // simply calculate sqrt(x²+y²) as it can't over-/underflow anymore with
+ // dp. It still needs the Annex F fixups though and isn't faster on
+ // Skylake-AVX512 (not even for SSE and AVX vectors, and really bad for
+ // AVX-512).
+ using namespace __proposed::float_bitwise_operators;
+ using _Limits = std::numeric_limits<_Tp>;
+ _V __absx = abs(__x); // no error
+ _V __absy = abs(__y); // no error
+ _V __hi = max(__absx, __absy); // no error
+ _V __lo = min(__absy, __absx); // no error
+
+ // round __hi down to the next power-of-2:
+ _GLIBCXX_SIMD_CONSTEXPR _V __inf(_Limits::infinity());
+
+ if (_GLIBCXX_SIMD_IS_LIKELY(all_of(isnormal(__x))
+ && all_of(isnormal(__y))))
+ {
+ const _V __hi_exp = __hi & __inf;
+ //((__hi + __hi) & __inf) ^ __inf almost works for computing __scale,
+ // except when (__hi + __hi) & __inf == __inf, in which case __scale
+ // becomes 0 (should be min/2 instead) and thus loses the information
+ // from __lo.
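+          // __mant_mask (min() - denorm_min(), i.e. the largest subnormal)
+          // has all mantissa bits set and a zero exponent field. Therefore
+          // (__hi & __mant_mask) | _V(1) equals __hi / __hi_exp in [1, 2) and
+          // (__hi_exp ^ __inf) * .5 equals 1 / __hi_exp, avoiding a division.
+          // Multiplying by __hi_exp at the end undoes the scaling without
+          // additional rounding error.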
+ const _V __scale = (__hi_exp ^ __inf) * _Tp(.5);
+ _GLIBCXX_SIMD_CONSTEXPR _V __mant_mask
+ = _Limits::min() - _Limits::denorm_min();
+ const _V __h1 = (__hi & __mant_mask) | _V(1);
+ const _V __l1 = __lo * __scale;
+ return __hi_exp * sqrt(__h1 * __h1 + __l1 * __l1);
+ }
+ else
+ {
+ // slower path to support subnormals
+ // if __hi is subnormal, avoid scaling by inf & final mul by 0 (which
+ // yields NaN) by using min()
+ _V __scale = _V(1 / _Limits::min());
+ // invert exponent w/o error and w/o using the slow divider unit:
+ // xor inverts the exponent but off by 1. Multiplication with .5
+ // adjusts for the discrepancy.
+ where(__hi >= _Limits::min(), __scale)
+ = ((__hi & __inf) ^ __inf) * _Tp(.5);
+ // adjust final exponent for subnormal inputs
+ _V __hi_exp = _Limits::min();
+ where(__hi >= _Limits::min(), __hi_exp) = __hi & __inf; // no error
+ _V __h1 = __hi * __scale; // no error
+ _V __l1 = __lo * __scale; // no error
+
+ // sqrt(x²+y²) = e*sqrt((x/e)²+(y/e)²):
+ // this ensures no overflow in the argument to sqrt
+ _V __r = __hi_exp * sqrt(__h1 * __h1 + __l1 * __l1);
+#ifdef __STDC_IEC_559__
+ // fixup for Annex F requirements
+ // the naive fixup goes like this:
+ //
+ // where(__l1 == 0, __r) = __hi;
+ // where(isunordered(__x, __y), __r) = _Limits::quiet_NaN();
+ // where(isinf(__absx) || isinf(__absy), __r) = __inf;
+ //
+ // The fixup can be prepared in parallel with the sqrt, requiring a
+ // single blend step after hi_exp * sqrt, reducing latency and
+ // throughput:
+ _V __fixup = __hi; // __lo == 0
+ where(isunordered(__x, __y), __fixup) = _Limits::quiet_NaN();
+ where(isinf(__absx) || isinf(__absy), __fixup) = __inf;
+ where(!(__lo == 0 || isunordered(__x, __y)
+ || (isinf(__absx) || isinf(__absy))),
+ __fixup)
+ = __r;
+ __r = __fixup;
+#endif
+ return __r;
+ }
+ }
+}
+
+template <typename _Tp, typename _Abi>
+_GLIBCXX_SIMD_INTRINSIC simd<_Tp, _Abi>
+hypot(const simd<_Tp, _Abi>& __x, const simd<_Tp, _Abi>& __y)
+{
+ return __hypot<conditional_t<__is_fixed_size_abi_v<_Abi>,
+ const simd<_Tp, _Abi>&, simd<_Tp, _Abi>>>(__x,
+ __y);
+}
+_GLIBCXX_SIMD_CVTING2(hypot)
+
+template <typename _VV>
+__remove_cvref_t<_VV>
+__hypot(_VV __x, _VV __y, _VV __z)
+{
+ using _V = __remove_cvref_t<_VV>;
+ using _Abi = typename _V::abi_type;
+ using _Tp = typename _V::value_type;
+ /* FIXME: enable after PR77776 is resolved
+ if constexpr (_V::size() == 1)
+ return std::hypot(_Tp(__x[0]), _Tp(__y[0]), _Tp(__z[0]));
+ else
+ */
+ if constexpr (__is_fixed_size_abi_v<_Abi> && _V::size() > 1)
+ {
+ return __fixed_size_apply<simd<_Tp, _Abi>>(
+ [](auto __a, auto __b, auto __c) { return hypot(__a, __b, __c); }, __x,
+ __y, __z);
+ }
+ else
+ {
+ using namespace __proposed::float_bitwise_operators;
+ using _Limits = std::numeric_limits<_Tp>;
+ const _V __absx = abs(__x); // no error
+ const _V __absy = abs(__y); // no error
+ const _V __absz = abs(__z); // no error
+ _V __hi = max(max(__absx, __absy), __absz); // no error
+ _V __l0 = min(__absz, max(__absx, __absy)); // no error
+ _V __l1 = min(__absy, __absx); // no error
+ if constexpr (numeric_limits<_Tp>::digits == 64
+ && numeric_limits<_Tp>::max_exponent == 0x4000
+ && numeric_limits<_Tp>::min_exponent == -0x3FFD
+ && _V::size() == 1)
+ { // Seems like x87 fp80, where bit 63 is always 1 unless subnormal or
+          // NaN. In this case the bit tricks don't work; they require IEC559
+          // binary32 or binary64 format.
+#ifdef __STDC_IEC_559__
+ // fixup for Annex F requirements
+ if (isinf(__absx[0]) || isinf(__absy[0]) || isinf(__absz[0]))
+ return _Limits::infinity();
+ else if (isunordered(__absx[0], __absy[0] + __absz[0]))
+ return _Limits::quiet_NaN();
+ else if (__l0[0] == 0 && __l1[0] == 0)
+ return __hi;
+#endif
+ _V __hi_exp = __hi;
+ const _ULLong __tmp = 0x8000'0000'0000'0000ull;
+ __builtin_memcpy(&__hi_exp, &__tmp, 8);
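+          // the memcpy overwrites the 64-bit significand (the low 8 bytes of
+          // the little-endian x87 fp80 object) with only the explicit integer
+          // bit set, keeping sign and exponent; __hi_exp is thus the largest
+          // power of 2 not greater than __hi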
+ const _V __scale = 1 / __hi_exp;
+ __hi *= __scale;
+ __l0 *= __scale;
+ __l1 *= __scale;
+ return __hi_exp * sqrt((__l0 * __l0 + __l1 * __l1) + __hi * __hi);
+ }
+ else
+ {
+ // round __hi down to the next power-of-2:
+ _GLIBCXX_SIMD_CONSTEXPR _V __inf(_Limits::infinity());
+
+ if (_GLIBCXX_SIMD_IS_LIKELY(all_of(isnormal(__x))
+ && all_of(isnormal(__y))
+ && all_of(isnormal(__z))))
+ {
+ const _V __hi_exp = __hi & __inf;
+              // ((__hi + __hi) & __inf) ^ __inf almost works for computing
+              // __scale, except when (__hi + __hi) & __inf == __inf, in which
+              // case __scale becomes 0 (should be min/2 instead) and thus
+              // loses the information from __lo.
+ const _V __scale = (__hi_exp ^ __inf) * _Tp(.5);
+ _GLIBCXX_SIMD_CONSTEXPR _V __mant_mask
+ = _Limits::min() - _Limits::denorm_min();
+ const _V __h1 = (__hi & __mant_mask) | _V(1);
+ __l0 *= __scale;
+ __l1 *= __scale;
+ const _V __lo
+ = __l0 * __l0 + __l1 * __l1; // add the two smaller values first
+ return __hi_exp * sqrt(__lo + __h1 * __h1);
+ }
+ else
+ {
+ // slower path to support subnormals
+ // if __hi is subnormal, avoid scaling by inf & final mul by 0
+ // (which yields NaN) by using min()
+ _V __scale = _V(1 / _Limits::min());
+ // invert exponent w/o error and w/o using the slow divider unit:
+ // xor inverts the exponent but off by 1. Multiplication with .5
+ // adjusts for the discrepancy.
+ where(__hi >= _Limits::min(), __scale)
+ = ((__hi & __inf) ^ __inf) * _Tp(.5);
+ // adjust final exponent for subnormal inputs
+ _V __hi_exp = _Limits::min();
+ where(__hi >= _Limits::min(), __hi_exp)
+ = __hi & __inf; // no error
+ _V __h1 = __hi * __scale; // no error
+ __l0 *= __scale; // no error
+ __l1 *= __scale; // no error
+ _V __lo
+ = __l0 * __l0 + __l1 * __l1; // add the two smaller values first
+ _V __r = __hi_exp * sqrt(__lo + __h1 * __h1);
+#ifdef __STDC_IEC_559__
+ // fixup for Annex F requirements
+ _V __fixup = __hi; // __lo == 0
+ // where(__lo == 0, __fixup) = __hi;
+ where(isunordered(__x, __y + __z), __fixup)
+ = _Limits::quiet_NaN();
+ where(isinf(__absx) || isinf(__absy) || isinf(__absz), __fixup)
+ = __inf;
+ // Instead of __lo == 0, the following could depend on __h1² ==
+ // __h1² + __lo (i.e. __hi is so much larger than the other two
+ // inputs that the result is exactly __hi). While this may improve
+ // precision, it is likely to reduce efficiency if the ISA has
+ // FMAs (because __h1² + __lo is an FMA, but the intermediate
+ // __h1² must be kept)
+ where(!(__lo == 0 || isunordered(__x, __y + __z) || isinf(__absx)
+ || isinf(__absy) || isinf(__absz)),
+ __fixup)
+ = __r;
+ __r = __fixup;
+#endif
+ return __r;
+ }
+ }
+ }
+}
+
+template <typename _Tp, typename _Abi>
+_GLIBCXX_SIMD_INTRINSIC simd<_Tp, _Abi>
+hypot(const simd<_Tp, _Abi>& __x, const simd<_Tp, _Abi>& __y,
+ const simd<_Tp, _Abi>& __z)
+{
+ return __hypot<conditional_t<__is_fixed_size_abi_v<_Abi>,
+ const simd<_Tp, _Abi>&, simd<_Tp, _Abi>>>(__x,
+ __y,
+ __z);
+}
+_GLIBCXX_SIMD_CVTING3(hypot)
+
+_GLIBCXX_SIMD_MATH_CALL2_(pow, _Tp)
+
+_GLIBCXX_SIMD_MATH_CALL_(sqrt)
+_GLIBCXX_SIMD_MATH_CALL_(erf)
+_GLIBCXX_SIMD_MATH_CALL_(erfc)
+_GLIBCXX_SIMD_MATH_CALL_(lgamma)
+_GLIBCXX_SIMD_MATH_CALL_(tgamma)
+_GLIBCXX_SIMD_MATH_CALL_(ceil)
+_GLIBCXX_SIMD_MATH_CALL_(floor)
+_GLIBCXX_SIMD_MATH_CALL_(nearbyint)
+_GLIBCXX_SIMD_MATH_CALL_(rint)
+_GLIBCXX_SIMD_MATH_CALL_(lrint)
+_GLIBCXX_SIMD_MATH_CALL_(llrint)
+
+_GLIBCXX_SIMD_MATH_CALL_(round)
+_GLIBCXX_SIMD_MATH_CALL_(lround)
+_GLIBCXX_SIMD_MATH_CALL_(llround)
+
+_GLIBCXX_SIMD_MATH_CALL_(trunc)
+
+_GLIBCXX_SIMD_MATH_CALL2_(fmod, _Tp)
+_GLIBCXX_SIMD_MATH_CALL2_(remainder, _Tp)
+_GLIBCXX_SIMD_MATH_CALL3_(remquo, _Tp, int*)
+template <typename _Tp, typename _Abi>
+enable_if_t<std::is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
+copysign(const simd<_Tp, _Abi>& __x, const simd<_Tp, _Abi>& __y)
+{
+ using namespace std::experimental::__proposed::float_bitwise_operators;
+ const auto __signmask = -simd<_Tp, _Abi>();
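+  // -simd() negates a value-initialized (all-zero) simd, i.e. __signmask is
+  // -0. in every element and therefore has only the sign bit set; the
+  // expression below clears the sign bit of __x and ORs in the sign bit of __y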
+ return (__x & (__x ^ __signmask)) | (__y & __signmask);
+}
+
+_GLIBCXX_SIMD_MATH_CALL2_(nextafter, _Tp)
+// not covered in [parallel.simd.math]:
+// _GLIBCXX_SIMD_MATH_CALL2_(nexttoward, long double)
+_GLIBCXX_SIMD_MATH_CALL2_(fdim, _Tp)
+_GLIBCXX_SIMD_MATH_CALL2_(fmax, _Tp)
+_GLIBCXX_SIMD_MATH_CALL2_(fmin, _Tp)
+
+_GLIBCXX_SIMD_MATH_CALL3_(fma, _Tp, _Tp)
+_GLIBCXX_SIMD_MATH_CALL_(fpclassify)
+_GLIBCXX_SIMD_MATH_CALL_(isfinite)
+
+// isnan and isinf require special treatment because old glibc may declare
+// `int std::isinf(double)`.
+template <typename _Tp, typename _Abi, typename...,
+ typename _R
+ = std::experimental::__math_return_type_t<bool, _Tp, _Abi>>
+enable_if_t<std::is_floating_point_v<_Tp>, _R>
+isinf(std::experimental::simd<_Tp, _Abi> __x)
+{
+ return {std::experimental::__private_init,
+ _Abi::_SimdImpl::__isinf(std::experimental::__data(__x))};
+}
+template <typename _Tp, typename _Abi, typename...,
+ typename _R
+ = std::experimental::__math_return_type_t<bool, _Tp, _Abi>>
+enable_if_t<std::is_floating_point_v<_Tp>, _R>
+isnan(std::experimental::simd<_Tp, _Abi> __x)
+{
+ return {std::experimental::__private_init,
+ _Abi::_SimdImpl::__isnan(std::experimental::__data(__x))};
+}
+_GLIBCXX_SIMD_MATH_CALL_(isnormal)
+
+template <typename..., typename _Tp, typename _Abi>
+std::experimental::simd_mask<_Tp, _Abi>
+signbit(std::experimental::simd<_Tp, _Abi> __x)
+{
+ if constexpr (std::is_integral_v<_Tp>)
+ {
+ if constexpr (std::is_unsigned_v<_Tp>)
+ return std::experimental::simd_mask<_Tp, _Abi>{}; // false
+ else
+ return __x < 0;
+ }
+ else
+ return {std::experimental::__private_init,
+ _Abi::_SimdImpl::__signbit(std::experimental::__data(__x))};
+}
+
+_GLIBCXX_SIMD_MATH_CALL2_(isgreater, _Tp)
+_GLIBCXX_SIMD_MATH_CALL2_(isgreaterequal, _Tp)
+_GLIBCXX_SIMD_MATH_CALL2_(isless, _Tp)
+_GLIBCXX_SIMD_MATH_CALL2_(islessequal, _Tp)
+_GLIBCXX_SIMD_MATH_CALL2_(islessgreater, _Tp)
+_GLIBCXX_SIMD_MATH_CALL2_(isunordered, _Tp)
+
+/* not covered in [parallel.simd.math]
+template <typename _Abi> __doublev<_Abi> nan(const char* tagp);
+template <typename _Abi> __floatv<_Abi> nanf(const char* tagp);
+template <typename _Abi> __ldoublev<_Abi> nanl(const char* tagp);
+
+template <typename _V> struct simd_div_t {
+ _V quot, rem;
+};
+template <typename _Abi>
+simd_div_t<_SCharv<_Abi>> div(_SCharv<_Abi> numer,
+ _SCharv<_Abi> denom);
+template <typename _Abi>
+simd_div_t<__shortv<_Abi>> div(__shortv<_Abi> numer,
+ __shortv<_Abi> denom);
+template <typename _Abi>
+simd_div_t<__intv<_Abi>> div(__intv<_Abi> numer, __intv<_Abi> denom);
+template <typename _Abi>
+simd_div_t<__longv<_Abi>> div(__longv<_Abi> numer,
+ __longv<_Abi> denom);
+template <typename _Abi>
+simd_div_t<__llongv<_Abi>> div(__llongv<_Abi> numer,
+ __llongv<_Abi> denom);
+*/
+
+// special math {{{
+template <typename _Tp, typename _Abi>
+enable_if_t<std::is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
+assoc_laguerre(const std::experimental::fixed_size_simd<
+ unsigned, std::experimental::simd_size_v<_Tp, _Abi>>& __n,
+ const std::experimental::fixed_size_simd<
+ unsigned, std::experimental::simd_size_v<_Tp, _Abi>>& __m,
+ const std::experimental::simd<_Tp, _Abi>& __x)
+{
+ return std::experimental::simd<_Tp, _Abi>([&](auto __i) {
+ return std::assoc_laguerre(__n[__i], __m[__i], __x[__i]);
+ });
+}
+
+template <typename _Tp, typename _Abi>
+enable_if_t<std::is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
+assoc_legendre(const std::experimental::fixed_size_simd<
+ unsigned, std::experimental::simd_size_v<_Tp, _Abi>>& __n,
+ const std::experimental::fixed_size_simd<
+ unsigned, std::experimental::simd_size_v<_Tp, _Abi>>& __m,
+ const std::experimental::simd<_Tp, _Abi>& __x)
+{
+ return std::experimental::simd<_Tp, _Abi>([&](auto __i) {
+ return std::assoc_legendre(__n[__i], __m[__i], __x[__i]);
+ });
+}
+
+_GLIBCXX_SIMD_MATH_CALL2_(beta, _Tp)
+_GLIBCXX_SIMD_MATH_CALL_(comp_ellint_1)
+_GLIBCXX_SIMD_MATH_CALL_(comp_ellint_2)
+_GLIBCXX_SIMD_MATH_CALL2_(comp_ellint_3, _Tp)
+_GLIBCXX_SIMD_MATH_CALL2_(cyl_bessel_i, _Tp)
+_GLIBCXX_SIMD_MATH_CALL2_(cyl_bessel_j, _Tp)
+_GLIBCXX_SIMD_MATH_CALL2_(cyl_bessel_k, _Tp)
+_GLIBCXX_SIMD_MATH_CALL2_(cyl_neumann, _Tp)
+_GLIBCXX_SIMD_MATH_CALL2_(ellint_1, _Tp)
+_GLIBCXX_SIMD_MATH_CALL2_(ellint_2, _Tp)
+_GLIBCXX_SIMD_MATH_CALL3_(ellint_3, _Tp, _Tp)
+_GLIBCXX_SIMD_MATH_CALL_(expint)
+
+template <typename _Tp, typename _Abi>
+enable_if_t<std::is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
+hermite(const std::experimental::fixed_size_simd<
+ unsigned, std::experimental::simd_size_v<_Tp, _Abi>>& __n,
+ const std::experimental::simd<_Tp, _Abi>& __x)
+{
+ return std::experimental::simd<_Tp, _Abi>(
+ [&](auto __i) { return std::hermite(__n[__i], __x[__i]); });
+}
+
+template <typename _Tp, typename _Abi>
+enable_if_t<std::is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
+laguerre(const std::experimental::fixed_size_simd<
+ unsigned, std::experimental::simd_size_v<_Tp, _Abi>>& __n,
+ const std::experimental::simd<_Tp, _Abi>& __x)
+{
+ return std::experimental::simd<_Tp, _Abi>(
+ [&](auto __i) { return std::laguerre(__n[__i], __x[__i]); });
+}
+
+template <typename _Tp, typename _Abi>
+enable_if_t<std::is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
+legendre(const std::experimental::fixed_size_simd<
+ unsigned, std::experimental::simd_size_v<_Tp, _Abi>>& __n,
+ const std::experimental::simd<_Tp, _Abi>& __x)
+{
+ return std::experimental::simd<_Tp, _Abi>(
+ [&](auto __i) { return std::legendre(__n[__i], __x[__i]); });
+}
+
+_GLIBCXX_SIMD_MATH_CALL_(riemann_zeta)
+
+template <typename _Tp, typename _Abi>
+enable_if_t<std::is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
+sph_bessel(const std::experimental::fixed_size_simd<
+ unsigned, std::experimental::simd_size_v<_Tp, _Abi>>& __n,
+ const std::experimental::simd<_Tp, _Abi>& __x)
+{
+ return std::experimental::simd<_Tp, _Abi>(
+ [&](auto __i) { return std::sph_bessel(__n[__i], __x[__i]); });
+}
+
+template <typename _Tp, typename _Abi>
+enable_if_t<std::is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
+sph_legendre(const std::experimental::fixed_size_simd<
+ unsigned, std::experimental::simd_size_v<_Tp, _Abi>>& __l,
+ const std::experimental::fixed_size_simd<
+ unsigned, std::experimental::simd_size_v<_Tp, _Abi>>& __m,
+             const std::experimental::simd<_Tp, _Abi>& __theta)
+{
+ return std::experimental::simd<_Tp, _Abi>([&](auto __i) {
+    return std::sph_legendre(__l[__i], __m[__i], __theta[__i]);
+ });
+}
+
+template <typename _Tp, typename _Abi>
+enable_if_t<std::is_floating_point_v<_Tp>, simd<_Tp, _Abi>>
+sph_neumann(const std::experimental::fixed_size_simd<
+ unsigned, std::experimental::simd_size_v<_Tp, _Abi>>& __n,
+ const std::experimental::simd<_Tp, _Abi>& __x)
+{
+ return std::experimental::simd<_Tp, _Abi>(
+ [&](auto __i) { return std::sph_neumann(__n[__i], __x[__i]); });
+}
+// }}}
+
+#undef _GLIBCXX_SIMD_MATH_CALL_
+#undef _GLIBCXX_SIMD_MATH_CALL2_
+#undef _GLIBCXX_SIMD_MATH_CALL3_
+
+_GLIBCXX_SIMD_END_NAMESPACE
+
+#endif // __cplusplus >= 201703L
+#endif // _GLIBCXX_EXPERIMENTAL_SIMD_MATH_H_
+
+// vim: foldmethod=marker sw=2 ts=8 noet sts=2
diff --git a/libstdc++-v3/include/experimental/bits/simd_neon.h b/libstdc++-v3/include/experimental/bits/simd_neon.h
new file mode 100644
index 00000000000..efff0150b8a
--- /dev/null
+++ b/libstdc++-v3/include/experimental/bits/simd_neon.h
@@ -0,0 +1,466 @@
+// Simd NEON specific implementations -*- C++ -*-
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library. This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+// <http://www.gnu.org/licenses/>.
+
+#ifndef _GLIBCXX_EXPERIMENTAL_SIMD_NEON_H_
+#define _GLIBCXX_EXPERIMENTAL_SIMD_NEON_H_
+
+#if __cplusplus >= 201703L
+
+#if !_GLIBCXX_SIMD_HAVE_NEON
+#error "simd_neon.h may only be included when NEON on ARM is available"
+#endif
+
+_GLIBCXX_SIMD_BEGIN_NAMESPACE
+
+// _CommonImplNeon {{{
+struct _CommonImplNeon : _CommonImplBuiltin
+{
+ // __store {{{
+ using _CommonImplBuiltin::__store;
+
+ // }}}
+};
+
+// }}}
+// _SimdImplNeon {{{
+template <typename _Abi> struct _SimdImplNeon : _SimdImplBuiltin<_Abi>
+{
+ using _Base = _SimdImplBuiltin<_Abi>;
+ template <typename _Tp> static constexpr size_t _S_max_store_size = 16;
+
+ // __masked_load {{{
+ template <typename _Tp, size_t _Np, typename _Up, typename _Fp>
+ static inline _SimdWrapper<_Tp, _Np>
+ __masked_load(_SimdWrapper<_Tp, _Np> __merge, _SimdWrapper<_Tp, _Np> __k,
+ const _Up* __mem, _Fp) noexcept
+ {
+ __execute_n_times<_Np>([&](auto __i) {
+ if (__k[__i] != 0)
+ __merge.__set(__i, static_cast<_Tp>(__mem[__i]));
+ });
+ return __merge;
+ }
+
+ // }}}
+ // __masked_store_nocvt {{{
+ template <typename _Tp, std::size_t _Np, typename _Fp>
+ _GLIBCXX_SIMD_INTRINSIC static void
+ __masked_store_nocvt(_SimdWrapper<_Tp, _Np> __v, _Tp* __mem, _Fp,
+ _SimdWrapper<_Tp, _Np> __k)
+ {
+ __execute_n_times<_Np>([&](auto __i) {
+ if (__k[__i] != 0)
+ __mem[__i] = __v[__i];
+ });
+ }
+
+ // }}}
+ // __reduce {{{
+ template <typename _Tp, typename _BinaryOperation>
+ _GLIBCXX_SIMD_INTRINSIC static _Tp __reduce(simd<_Tp, _Abi> __x,
+ _BinaryOperation&& __binary_op)
+ {
+ constexpr size_t _Np = __x.size();
+ if constexpr (sizeof(__x) == 16 && _Np >= 4 && !_Abi::_S_is_partial)
+ {
+ const auto __halves = split<simd<_Tp, simd_abi::_Neon<8>>>(__x);
+ const auto __y = __binary_op(__halves[0], __halves[1]);
+ return _SimdImplNeon<simd_abi::_Neon<8>>::__reduce(
+ __y, static_cast<_BinaryOperation&&>(__binary_op));
+ }
+ else if constexpr (_Np == 8)
+ {
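+        // log2(N) reduction steps via vector permutes: combine each lane with
+        // its neighbour, then with the lane two positions away, then with the
+        // lane four positions away; afterwards every lane holds the complete
+        // reduction and lane 0 is returned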
+ __x = __binary_op(__x, _Base::template __make_simd<_Tp, _Np>(
+ __vector_permute<1, 0, 3, 2, 5, 4, 7, 6>(
+ __x._M_data)));
+ __x = __binary_op(__x, _Base::template __make_simd<_Tp, _Np>(
+ __vector_permute<3, 2, 1, 0, 7, 6, 5, 4>(
+ __x._M_data)));
+ __x = __binary_op(__x, _Base::template __make_simd<_Tp, _Np>(
+ __vector_permute<7, 6, 5, 4, 3, 2, 1, 0>(
+ __x._M_data)));
+ return __x[0];
+ }
+ else if constexpr (_Np == 4)
+ {
+ __x = __binary_op(__x, _Base::template __make_simd<_Tp, _Np>(
+ __vector_permute<1, 0, 3, 2>(__x._M_data)));
+ __x = __binary_op(__x, _Base::template __make_simd<_Tp, _Np>(
+ __vector_permute<3, 2, 1, 0>(__x._M_data)));
+ return __x[0];
+ }
+ else if constexpr (_Np == 2)
+ {
+ __x = __binary_op(__x, _Base::template __make_simd<_Tp, _Np>(
+ __vector_permute<1, 0>(__x._M_data)));
+ return __x[0];
+ }
+ else
+ return _Base::__reduce(__x, static_cast<_BinaryOperation&&>(__binary_op));
+ }
+
+ // }}}
+ // math {{{
+ // __sqrt {{{
+ template <typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+ _GLIBCXX_SIMD_INTRINSIC static _Tp __sqrt(_Tp __x)
+ {
+ if constexpr (__have_neon_a64)
+ {
+ const auto __intrin = __to_intrin(__x);
+ if constexpr (_TVT::template __is<float, 2>)
+ return vsqrt_f32(__intrin);
+ else if constexpr (_TVT::template __is<float, 4>)
+ return vsqrtq_f32(__intrin);
+ else if constexpr (_TVT::template __is<double, 1>)
+ return vsqrt_f64(__intrin);
+ else if constexpr (_TVT::template __is<double, 2>)
+ return vsqrtq_f64(__intrin);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else
+ return _Base::__sqrt(__x);
+ } // }}}
+ // __trunc {{{
+ template <typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+ _GLIBCXX_SIMD_INTRINSIC static _Tp __trunc(_Tp __x)
+ {
+ if constexpr (__have_neon_a32)
+ {
+ const auto __intrin = __to_intrin(__x);
+ if constexpr (_TVT::template __is<float, 2>)
+ return vrnd_f32(__intrin);
+ else if constexpr (_TVT::template __is<float, 4>)
+ return vrndq_f32(__intrin);
+ else if constexpr (_TVT::template __is<double, 1>)
+ return vrnd_f64(__intrin);
+ else if constexpr (_TVT::template __is<double, 2>)
+ return vrndq_f64(__intrin);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else
+ return _Base::__trunc(__x);
+ } // }}}
+ // __floor {{{
+ template <typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+ _GLIBCXX_SIMD_INTRINSIC static _Tp __floor(_Tp __x)
+ {
+ if constexpr (__have_neon_a32)
+ {
+ const auto __intrin = __to_intrin(__x);
+ if constexpr (_TVT::template __is<float, 2>)
+ return vrndm_f32(__intrin);
+ else if constexpr (_TVT::template __is<float, 4>)
+ return vrndmq_f32(__intrin);
+ else if constexpr (_TVT::template __is<double, 1>)
+ return vrndm_f64(__intrin);
+ else if constexpr (_TVT::template __is<double, 2>)
+ return vrndmq_f64(__intrin);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else
+ return _Base::__floor(__x);
+ } // }}}
+ // __ceil {{{
+ template <typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+ _GLIBCXX_SIMD_INTRINSIC static _Tp __ceil(_Tp __x)
+ {
+ if constexpr (__have_neon_a32)
+ {
+ const auto __intrin = __to_intrin(__x);
+ if constexpr (_TVT::template __is<float, 2>)
+ return vrndp_f32(__intrin);
+ else if constexpr (_TVT::template __is<float, 4>)
+ return vrndpq_f32(__intrin);
+ else if constexpr (_TVT::template __is<double, 1>)
+ return vrndp_f64(__intrin);
+ else if constexpr (_TVT::template __is<double, 2>)
+ return vrndpq_f64(__intrin);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else
+ return _Base::__ceil(__x);
+ } //}}}
+ //}}}
+}; // }}}
+// _MaskImplNeonMixin {{{
+struct _MaskImplNeonMixin
+{
+ using _Base = _MaskImplBuiltinMixin;
+
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SanitizedBitMask<_Np>
+ __to_bits(_SimdWrapper<_Tp, _Np> __x)
+ {
+ if (__builtin_is_constant_evaluated())
+ return _Base::__to_bits(__x);
+
+ using _I = __int_for_sizeof_t<_Tp>;
+ if constexpr (sizeof(__x) == 16)
+ {
+ auto __asint = __vector_bitcast<_I>(__x);
+#ifdef __aarch64__
+ [[maybe_unused]] constexpr auto __zero = decltype(__asint)();
+#else
+ [[maybe_unused]] constexpr auto __zero = decltype(__lo64(__asint))();
+#endif
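+          // For element sizes up to 4 bytes: AND each mask lane with its bit
+          // value, then sum the lanes with pairwise additions (vpadd/vpaddq);
+          // the sums end up in the first lane(s) and are read back as the bit
+          // representation of the mask.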
+ if constexpr (sizeof(_Tp) == 1)
+ {
+ constexpr auto __bitsel
+ = __generate_from_n_evaluations<16, __vector_type_t<_I, 16>>(
+ [&](auto __i) {
+ return static_cast<_I>(
+ __i < _Np ? (__i < 8 ? 1 << __i : 1 << (__i - 8)) : 0);
+ });
+ __asint &= __bitsel;
+#ifdef __aarch64__
+ return __vector_bitcast<_UShort>(
+ vpaddq_s8(vpaddq_s8(vpaddq_s8(__asint, __zero), __zero),
+ __zero))[0];
+#else
+ return __vector_bitcast<_UShort>(
+ vpadd_s8(vpadd_s8(vpadd_s8(__lo64(__asint), __hi64(__asint)),
+ __zero),
+ __zero))[0];
+#endif
+ }
+ else if constexpr (sizeof(_Tp) == 2)
+ {
+ constexpr auto __bitsel
+ = __generate_from_n_evaluations<8, __vector_type_t<_I, 8>>(
+ [&](auto __i) {
+ return static_cast<_I>(__i < _Np ? 1 << __i : 0);
+ });
+ __asint &= __bitsel;
+#ifdef __aarch64__
+ return vpaddq_s16(vpaddq_s16(vpaddq_s16(__asint, __zero), __zero),
+ __zero)[0];
+#else
+ return vpadd_s16(
+ vpadd_s16(vpadd_s16(__lo64(__asint), __hi64(__asint)), __zero),
+ __zero)[0];
+#endif
+ }
+ else if constexpr (sizeof(_Tp) == 4)
+ {
+ constexpr auto __bitsel
+ = __generate_from_n_evaluations<4, __vector_type_t<_I, 4>>(
+ [&](auto __i) {
+ return static_cast<_I>(__i < _Np ? 1 << __i : 0);
+ });
+ __asint &= __bitsel;
+#ifdef __aarch64__
+ return vpaddq_s32(vpaddq_s32(__asint, __zero), __zero)[0];
+#else
+ return vpadd_s32(vpadd_s32(__lo64(__asint), __hi64(__asint)),
+ __zero)[0];
+#endif
+ }
+ else if constexpr (sizeof(_Tp) == 8)
+ return (__asint[0] & 1) | (__asint[1] & 2);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (sizeof(__x) == 8)
+ {
+ auto __asint = __vector_bitcast<_I>(__x);
+ [[maybe_unused]] constexpr auto __zero = decltype(__asint)();
+ if constexpr (sizeof(_Tp) == 1)
+ {
+ constexpr auto __bitsel
+ = __generate_from_n_evaluations<8, __vector_type_t<_I, 8>>(
+ [&](auto __i) {
+ return static_cast<_I>(__i < _Np ? 1 << __i : 0);
+ });
+ __asint &= __bitsel;
+ return vpadd_s8(vpadd_s8(vpadd_s8(__asint, __zero), __zero),
+ __zero)[0];
+ }
+ else if constexpr (sizeof(_Tp) == 2)
+ {
+ constexpr auto __bitsel
+ = __generate_from_n_evaluations<4, __vector_type_t<_I, 4>>(
+ [&](auto __i) {
+ return static_cast<_I>(__i < _Np ? 1 << __i : 0);
+ });
+ __asint &= __bitsel;
+ return vpadd_s16(vpadd_s16(__asint, __zero), __zero)[0];
+ }
+ else if constexpr (sizeof(_Tp) == 4)
+ {
+ __asint &= __make_vector<_I>(0x1, 0x2);
+ return vpadd_s32(__asint, __zero)[0];
+ }
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else
+ return _Base::__to_bits(__x);
+ }
+};
+
+// }}}
+// _MaskImplNeon {{{
+template <typename _Abi>
+struct _MaskImplNeon : _MaskImplNeonMixin, _MaskImplBuiltin<_Abi>
+{
+ using _MaskImplBuiltinMixin::__to_maskvector;
+ using _MaskImplNeonMixin::__to_bits;
+ using _Base = _MaskImplBuiltin<_Abi>;
+ using _Base::__convert;
+
+ // __all_of {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static bool __all_of(simd_mask<_Tp, _Abi> __k)
+ {
+ const auto __kk
+ = __vector_bitcast<char>(__k._M_data)
+ | ~__vector_bitcast<char>(_Abi::template __implicit_mask<_Tp>());
+ if constexpr (sizeof(__k) == 16)
+ {
+ const auto __x = __vector_bitcast<long long>(__kk);
+ return __x[0] + __x[1] == -2;
+ }
+ else if constexpr (sizeof(__k) <= 8)
+ return __bit_cast<__int_for_sizeof_t<decltype(__kk)>>(__kk) == -1;
+ else
+ __assert_unreachable<_Tp>();
+ }
+
+ // }}}
+ // __any_of {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static bool __any_of(simd_mask<_Tp, _Abi> __k)
+ {
+ const auto __kk
+ = __vector_bitcast<char>(__k._M_data)
+ | ~__vector_bitcast<char>(_Abi::template __implicit_mask<_Tp>());
+ if constexpr (sizeof(__k) == 16)
+ {
+ const auto __x = __vector_bitcast<long long>(__kk);
+ return (__x[0] | __x[1]) != 0;
+ }
+ else if constexpr (sizeof(__k) <= 8)
+ return __bit_cast<__int_for_sizeof_t<decltype(__kk)>>(__kk) != 0;
+ else
+ __assert_unreachable<_Tp>();
+ }
+
+ // }}}
+ // __none_of {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static bool __none_of(simd_mask<_Tp, _Abi> __k)
+ {
+ const auto __kk
+ = __vector_bitcast<char>(__k._M_data)
+ | ~__vector_bitcast<char>(_Abi::template __implicit_mask<_Tp>());
+ if constexpr (sizeof(__k) == 16)
+ {
+ const auto __x = __vector_bitcast<long long>(__kk);
+ return (__x[0] | __x[1]) == 0;
+ }
+ else if constexpr (sizeof(__k) <= 8)
+ return __bit_cast<__int_for_sizeof_t<decltype(__kk)>>(__kk) == 0;
+ else
+ __assert_unreachable<_Tp>();
+ }
+
+ // }}}
+ // __some_of {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static bool __some_of(simd_mask<_Tp, _Abi> __k)
+ {
+ if constexpr (sizeof(__k) <= 8)
+ {
+ const auto __kk
+ = __vector_bitcast<char>(__k._M_data)
+ | ~__vector_bitcast<char>(_Abi::template __implicit_mask<_Tp>());
+ using _Up = std::make_unsigned_t<__int_for_sizeof_t<decltype(__kk)>>;
+ return __bit_cast<_Up>(__kk) + 1 > 1;
+ }
+ else
+ return _Base::__some_of(__k);
+ }
+
+ // }}}
+ // __popcount {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static int __popcount(simd_mask<_Tp, _Abi> __k)
+ {
+ if constexpr (sizeof(_Tp) == 1)
+ {
+ const auto __s8 = __vector_bitcast<_SChar>(__k._M_data);
+ int8x8_t __tmp = __lo64(__s8) + __hi64z(__s8);
+ return -vpadd_s8(vpadd_s8(vpadd_s8(__tmp, int8x8_t()), int8x8_t()),
+ int8x8_t())[0];
+ }
+ else if constexpr (sizeof(_Tp) == 2)
+ {
+ const auto __s16 = __vector_bitcast<short>(__k._M_data);
+ int16x4_t __tmp = __lo64(__s16) + __hi64z(__s16);
+ return -vpadd_s16(vpadd_s16(__tmp, int16x4_t()), int16x4_t())[0];
+ }
+ else if constexpr (sizeof(_Tp) == 4)
+ {
+ const auto __s32 = __vector_bitcast<int>(__k._M_data);
+ int32x2_t __tmp = __lo64(__s32) + __hi64z(__s32);
+ return -vpadd_s32(__tmp, int32x2_t())[0];
+ }
+ else if constexpr (sizeof(_Tp) == 8)
+ {
+ static_assert(sizeof(__k) == 16);
+ const auto __s64 = __vector_bitcast<long>(__k._M_data);
+ return -(__s64[0] + __s64[1]);
+ }
+ }
+
+ // }}}
+ // __find_first_set {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static int __find_first_set(simd_mask<_Tp, _Abi> __k)
+ {
+ // TODO: the _Base implementation is not optimal for NEON
+ return _Base::__find_first_set(__k);
+ }
+
+ // }}}
+ // __find_last_set {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static int __find_last_set(simd_mask<_Tp, _Abi> __k)
+ {
+ // TODO: the _Base implementation is not optimal for NEON
+ return _Base::__find_last_set(__k);
+ }
+
+ // }}}
+}; // }}}
+
+_GLIBCXX_SIMD_END_NAMESPACE
+#endif // __cplusplus >= 201703L
+#endif // _GLIBCXX_EXPERIMENTAL_SIMD_NEON_H_
+// vim: foldmethod=marker sw=2 noet ts=8 sts=2 tw=80
diff --git a/libstdc++-v3/include/experimental/bits/simd_scalar.h b/libstdc++-v3/include/experimental/bits/simd_scalar.h
new file mode 100644
index 00000000000..fc4ffe12298
--- /dev/null
+++ b/libstdc++-v3/include/experimental/bits/simd_scalar.h
@@ -0,0 +1,877 @@
+// Simd scalar ABI specific implementations -*- C++ -*-
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library. This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+// <http://www.gnu.org/licenses/>.
+
+#ifndef _GLIBCXX_EXPERIMENTAL_SIMD_SCALAR_H_
+#define _GLIBCXX_EXPERIMENTAL_SIMD_SCALAR_H_
+#if __cplusplus >= 201703L
+
+#include <cmath>
+
+_GLIBCXX_SIMD_BEGIN_NAMESPACE
+
+// __promote_preserving_unsigned{{{
+// work around the crazy semantics of unsigned integers of lower rank than
+// int: before an operator is applied, the operands are promoted to int, in
+// which case over- or underflow is UB even though the operand types were
+// unsigned.
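+// Example: unsigned short(0xFFFF) * unsigned short(0xFFFF) promotes both
+// operands to int and overflows (UB). With the operands cast to unsigned int
+// instead, the multiplication wraps to 0xFFFE'0001 and the callers' final
+// static_cast<_Tp> truncates it to the expected 0x0001.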
+template <typename _Tp>
+_GLIBCXX_SIMD_INTRINSIC constexpr decltype(auto)
+__promote_preserving_unsigned(const _Tp& __x)
+{
+ if constexpr (std::is_signed_v<decltype(+__x)> && std::is_unsigned_v<_Tp>)
+ return static_cast<unsigned int>(__x);
+ else
+ return __x;
+}
+
+// }}}
+
+struct _CommonImplScalar;
+struct _CommonImplBuiltin;
+struct _SimdImplScalar;
+struct _MaskImplScalar;
+// simd_abi::_Scalar {{{
+struct simd_abi::_Scalar
+{
+ template <typename _Tp> static constexpr size_t size = 1;
+ template <typename _Tp> static constexpr size_t _S_full_size = 1;
+ static constexpr bool _S_is_partial = false;
+ struct _IsValidAbiTag : true_type
+ {
+ };
+ template <typename _Tp> struct _IsValidSizeFor : true_type
+ {
+ };
+ template <typename _Tp> struct _IsValid : __is_vectorizable<_Tp>
+ {
+ };
+ template <typename _Tp>
+ static constexpr bool _S_is_valid_v = _IsValid<_Tp>::value;
+
+ _GLIBCXX_SIMD_INTRINSIC static constexpr bool __masked(bool __x)
+ {
+ return __x;
+ }
+
+ using _CommonImpl = _CommonImplScalar;
+ using _SimdImpl = _SimdImplScalar;
+ using _MaskImpl = _MaskImplScalar;
+
+ template <typename _Tp, bool = _S_is_valid_v<_Tp>>
+ struct __traits : _InvalidTraits
+ {
+ };
+
+ template <typename _Tp> struct __traits<_Tp, true>
+ {
+ using _IsValid = true_type;
+ using _SimdImpl = _SimdImplScalar;
+ using _MaskImpl = _MaskImplScalar;
+ using _SimdMember = _Tp;
+ using _MaskMember = bool;
+ static constexpr size_t _S_simd_align = alignof(_SimdMember);
+ static constexpr size_t _S_mask_align = alignof(_MaskMember);
+
+ // nothing the user can spell converts to/from simd/simd_mask
+ struct _SimdCastType
+ {
+ _SimdCastType() = delete;
+ };
+ struct _MaskCastType
+ {
+ _MaskCastType() = delete;
+ };
+ struct _SimdBase
+ {
+ };
+ struct _MaskBase
+ {
+ };
+ };
+};
+// }}}
+// _CommonImplScalar {{{
+struct _CommonImplScalar
+{
+ // __store {{{
+ template <typename _Flags, typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static void __store(_Tp __x, void* __addr, _Flags)
+ {
+ __builtin_memcpy(__addr, &__x, sizeof(_Tp));
+ }
+
+ // }}}
+ // __store_bool_array(_BitMask) {{{
+ template <size_t _Np, typename _Flags, bool _Sanitized>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr void
+ __store_bool_array(_BitMask<_Np, _Sanitized> __x, bool* __mem, _Flags)
+ {
+ __make_dependent_t<_Flags, _CommonImplBuiltin>::__store_bool_array(__x, __mem,
+ _Flags());
+ }
+
+ // }}}
+};
+
+// }}}
+// _SimdImplScalar {{{
+struct _SimdImplScalar
+{
+ // member types {{{2
+ using abi_type = simd_abi::scalar;
+ template <typename _Tp> using _TypeTag = _Tp*;
+
+ // broadcast {{{2
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _Tp __broadcast(_Tp __x) noexcept
+ {
+ return __x;
+ }
+
+ // __generator {{{2
+ template <typename _Fp, typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _Tp __generator(_Fp&& __gen,
+ _TypeTag<_Tp>)
+ {
+ return __gen(_SizeConstant<0>());
+ }
+
+ // __load {{{2
+ template <typename _Tp, typename _Up, typename _Fp>
+ _GLIBCXX_SIMD_INTRINSIC static _Tp __load(const _Up* __mem, _Fp,
+ _TypeTag<_Tp>) noexcept
+ {
+ return static_cast<_Tp>(__mem[0]);
+ }
+
+ // __masked_load {{{2
+ template <typename _Tp, typename _Up, typename _Fp>
+ static inline _Tp __masked_load(_Tp __merge, bool __k, const _Up* __mem,
+ _Fp) noexcept
+ {
+ if (__k)
+ __merge = static_cast<_Tp>(__mem[0]);
+ return __merge;
+ }
+
+ // __store {{{2
+ template <typename _Tp, typename _Up, typename _Fp>
+ static inline void __store(_Tp __v, _Up* __mem, _Fp, _TypeTag<_Tp>) noexcept
+ {
+ __mem[0] = static_cast<_Tp>(__v);
+ }
+
+ // __masked_store {{{2
+ template <typename _Tp, typename _Up, typename _Fp>
+ static inline void __masked_store(const _Tp __v, _Up* __mem, _Fp,
+ const bool __k) noexcept
+ {
+ if (__k)
+ __mem[0] = __v;
+ }
+
+ // __negate {{{2
+ template <typename _Tp>
+ static constexpr inline bool __negate(_Tp __x) noexcept
+ {
+ return !__x;
+ }
+
+ // __reduce {{{2
+ template <typename _Tp, typename _BinaryOperation>
+ static constexpr inline _Tp __reduce(const simd<_Tp, simd_abi::scalar>& __x,
+ _BinaryOperation&)
+ {
+ return __x._M_data;
+ }
+
+ // __min, __max {{{2
+ template <typename _Tp>
+ static constexpr inline _Tp __min(const _Tp __a, const _Tp __b)
+ {
+ return std::min(__a, __b);
+ }
+
+ template <typename _Tp>
+ static constexpr inline _Tp __max(const _Tp __a, const _Tp __b)
+ {
+ return std::max(__a, __b);
+ }
+
+ // __complement {{{2
+ template <typename _Tp>
+ static constexpr inline _Tp __complement(_Tp __x) noexcept
+ {
+ return static_cast<_Tp>(~__x);
+ }
+
+ // __unary_minus {{{2
+ template <typename _Tp>
+ static constexpr inline _Tp __unary_minus(_Tp __x) noexcept
+ {
+ return static_cast<_Tp>(-__x);
+ }
+
+ // arithmetic operators {{{2
+ template <typename _Tp> static constexpr inline _Tp __plus(_Tp __x, _Tp __y)
+ {
+ return static_cast<_Tp>(__promote_preserving_unsigned(__x)
+ + __promote_preserving_unsigned(__y));
+ }
+
+ template <typename _Tp> static constexpr inline _Tp __minus(_Tp __x, _Tp __y)
+ {
+ return static_cast<_Tp>(__promote_preserving_unsigned(__x)
+ - __promote_preserving_unsigned(__y));
+ }
+
+ template <typename _Tp>
+ static constexpr inline _Tp __multiplies(_Tp __x, _Tp __y)
+ {
+ return static_cast<_Tp>(__promote_preserving_unsigned(__x)
+ * __promote_preserving_unsigned(__y));
+ }
+
+ template <typename _Tp>
+ static constexpr inline _Tp __divides(_Tp __x, _Tp __y)
+ {
+ return static_cast<_Tp>(__promote_preserving_unsigned(__x)
+ / __promote_preserving_unsigned(__y));
+ }
+
+ template <typename _Tp>
+ static constexpr inline _Tp __modulus(_Tp __x, _Tp __y)
+ {
+ return static_cast<_Tp>(__promote_preserving_unsigned(__x)
+ % __promote_preserving_unsigned(__y));
+ }
+
+ template <typename _Tp>
+ static constexpr inline _Tp __bit_and(_Tp __x, _Tp __y)
+ {
+ if constexpr (is_floating_point_v<_Tp>)
+ {
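+        // bitwise AND on the object representation; the __may_alias alias
+        // carries the may_alias attribute, so reading the float's bytes
+        // through the integer reference does not violate strict aliasing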
+ using _I = __int_for_sizeof_t<_Tp>;
+ const _I __r = reinterpret_cast<const __may_alias<_I>&>(__x)
+ & reinterpret_cast<const __may_alias<_I>&>(__y);
+ return reinterpret_cast<const __may_alias<_Tp>&>(__r);
+ }
+ else
+ return static_cast<_Tp>(__promote_preserving_unsigned(__x)
+ & __promote_preserving_unsigned(__y));
+ }
+
+ template <typename _Tp> static constexpr inline _Tp __bit_or(_Tp __x, _Tp __y)
+ {
+ if constexpr (is_floating_point_v<_Tp>)
+ {
+ using _I = __int_for_sizeof_t<_Tp>;
+ const _I __r = reinterpret_cast<const __may_alias<_I>&>(__x)
+ | reinterpret_cast<const __may_alias<_I>&>(__y);
+ return reinterpret_cast<const __may_alias<_Tp>&>(__r);
+ }
+ else
+ return static_cast<_Tp>(__promote_preserving_unsigned(__x)
+ | __promote_preserving_unsigned(__y));
+ }
+
+ template <typename _Tp>
+ static constexpr inline _Tp __bit_xor(_Tp __x, _Tp __y)
+ {
+ if constexpr (is_floating_point_v<_Tp>)
+ {
+ using _I = __int_for_sizeof_t<_Tp>;
+ const _I __r = reinterpret_cast<const __may_alias<_I>&>(__x)
+ ^ reinterpret_cast<const __may_alias<_I>&>(__y);
+ return reinterpret_cast<const __may_alias<_Tp>&>(__r);
+ }
+ else
+ return static_cast<_Tp>(__promote_preserving_unsigned(__x)
+ ^ __promote_preserving_unsigned(__y));
+ }
+
+ template <typename _Tp>
+ static constexpr inline _Tp __bit_shift_left(_Tp __x, int __y)
+ {
+ return static_cast<_Tp>(__promote_preserving_unsigned(__x) << __y);
+ }
+
+ template <typename _Tp>
+ static constexpr inline _Tp __bit_shift_right(_Tp __x, int __y)
+ {
+ return static_cast<_Tp>(__promote_preserving_unsigned(__x) >> __y);
+ }
+
+ // math {{{2
+ // frexp, modf and copysign implemented in simd_math.h
+ template <typename _Tp> using _ST = _SimdTuple<_Tp, simd_abi::scalar>;
+
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __acos(_Tp __x)
+ {
+ return std::acos(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __asin(_Tp __x)
+ {
+ return std::asin(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __atan(_Tp __x)
+ {
+ return std::atan(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __cos(_Tp __x)
+ {
+ return std::cos(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __sin(_Tp __x)
+ {
+ return std::sin(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __tan(_Tp __x)
+ {
+ return std::tan(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __acosh(_Tp __x)
+ {
+ return std::acosh(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __asinh(_Tp __x)
+ {
+ return std::asinh(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __atanh(_Tp __x)
+ {
+ return std::atanh(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __cosh(_Tp __x)
+ {
+ return std::cosh(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __sinh(_Tp __x)
+ {
+ return std::sinh(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __tanh(_Tp __x)
+ {
+ return std::tanh(__x);
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static _Tp __atan2(_Tp __x, _Tp __y)
+ {
+ return std::atan2(__x, __y);
+ }
+
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __exp(_Tp __x)
+ {
+ return std::exp(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __exp2(_Tp __x)
+ {
+ return std::exp2(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __expm1(_Tp __x)
+ {
+ return std::expm1(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __log(_Tp __x)
+ {
+ return std::log(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __log10(_Tp __x)
+ {
+ return std::log10(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __log1p(_Tp __x)
+ {
+ return std::log1p(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __log2(_Tp __x)
+ {
+ return std::log2(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __logb(_Tp __x)
+ {
+ return std::logb(__x);
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static _ST<int> __ilogb(_Tp __x)
+ {
+ return {std::ilogb(__x)};
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static _Tp __pow(_Tp __x, _Tp __y)
+ {
+ return std::pow(__x, __y);
+ }
+
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __abs(_Tp __x)
+ {
+ return std::abs(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __fabs(_Tp __x)
+ {
+ return std::fabs(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __sqrt(_Tp __x)
+ {
+ return std::sqrt(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __cbrt(_Tp __x)
+ {
+ return std::cbrt(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __erf(_Tp __x)
+ {
+ return std::erf(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __erfc(_Tp __x)
+ {
+ return std::erfc(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __lgamma(_Tp __x)
+ {
+ return std::lgamma(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __tgamma(_Tp __x)
+ {
+ return std::tgamma(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __trunc(_Tp __x)
+ {
+ return std::trunc(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __floor(_Tp __x)
+ {
+ return std::floor(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __ceil(_Tp __x)
+ {
+ return std::ceil(__x);
+ }
+
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static _Tp __nearbyint(_Tp __x)
+ {
+ return std::nearbyint(__x);
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __rint(_Tp __x)
+ {
+ return std::rint(__x);
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static _ST<long> __lrint(_Tp __x)
+ {
+ return {std::lrint(__x)};
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static _ST<long long> __llrint(_Tp __x)
+ {
+ return {std::llrint(__x)};
+ }
+ template <typename _Tp> _GLIBCXX_SIMD_INTRINSIC static _Tp __round(_Tp __x)
+ {
+ return std::round(__x);
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static _ST<long> __lround(_Tp __x)
+ {
+ return {std::lround(__x)};
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static _ST<long long> __llround(_Tp __x)
+ {
+ return {std::llround(__x)};
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static _Tp __ldexp(_Tp __x, _ST<int> __y)
+ {
+ return std::ldexp(__x, __y.first);
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static _Tp __scalbn(_Tp __x, _ST<int> __y)
+ {
+ return std::scalbn(__x, __y.first);
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static _Tp __scalbln(_Tp __x, _ST<long> __y)
+ {
+ return std::scalbln(__x, __y.first);
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static _Tp __fmod(_Tp __x, _Tp __y)
+ {
+ return std::fmod(__x, __y);
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static _Tp __remainder(_Tp __x, _Tp __y)
+ {
+ return std::remainder(__x, __y);
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static _Tp __nextafter(_Tp __x, _Tp __y)
+ {
+ return std::nextafter(__x, __y);
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static _Tp __fdim(_Tp __x, _Tp __y)
+ {
+ return std::fdim(__x, __y);
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static _Tp __fmax(_Tp __x, _Tp __y)
+ {
+ return std::fmax(__x, __y);
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static _Tp __fmin(_Tp __x, _Tp __y)
+ {
+ return std::fmin(__x, __y);
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static _Tp __fma(_Tp __x, _Tp __y, _Tp __z)
+ {
+ return std::fma(__x, __y, __z);
+ }
+
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static _Tp __remquo(_Tp __x, _Tp __y, _ST<int>* __z)
+ {
+ return std::remquo(__x, __y, &__z->first);
+ }
+ template <typename _Tp>
+ [[deprecated]] _GLIBCXX_SIMD_INTRINSIC static _Tp __remquo(_Tp __x, _Tp __y,
+ int* __z)
+ {
+ return std::remquo(__x, __y, __z);
+ }
+
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static _ST<int> __fpclassify(_Tp __x)
+ {
+ return {std::fpclassify(__x)};
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static bool __isfinite(_Tp __x)
+ {
+ return std::isfinite(__x);
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static bool __isinf(_Tp __x)
+ {
+ return std::isinf(__x);
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static bool __isnan(_Tp __x)
+ {
+ return std::isnan(__x);
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static bool __isnormal(_Tp __x)
+ {
+ return std::isnormal(__x);
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static bool __signbit(_Tp __x)
+ {
+ return std::signbit(__x);
+ }
+
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static bool __isgreater(_Tp __x, _Tp __y)
+ {
+ return std::isgreater(__x, __y);
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static bool __isgreaterequal(_Tp __x, _Tp __y)
+ {
+ return std::isgreaterequal(__x, __y);
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static bool __isless(_Tp __x, _Tp __y)
+ {
+ return std::isless(__x, __y);
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static bool __islessequal(_Tp __x, _Tp __y)
+ {
+ return std::islessequal(__x, __y);
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static bool __islessgreater(_Tp __x, _Tp __y)
+ {
+ return std::islessgreater(__x, __y);
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static bool __isunordered(_Tp __x, _Tp __y)
+ {
+ return std::isunordered(__x, __y);
+ }
+
+ // __increment & __decrement{{{2
+ template <typename _Tp> constexpr static inline void __increment(_Tp& __x)
+ {
+ ++__x;
+ }
+ template <typename _Tp> constexpr static inline void __decrement(_Tp& __x)
+ {
+ --__x;
+ }
+
+ // compares {{{2
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static bool __equal_to(_Tp __x, _Tp __y)
+ {
+ return __x == __y;
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static bool __not_equal_to(_Tp __x, _Tp __y)
+ {
+ return __x != __y;
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static bool __less(_Tp __x, _Tp __y)
+ {
+ return __x < __y;
+ }
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static bool __less_equal(_Tp __x, _Tp __y)
+ {
+ return __x <= __y;
+ }
+
+ // smart_reference access {{{2
+ template <typename _Tp, typename _Up>
+ constexpr static void __set(_Tp& __v, [[maybe_unused]] int __i,
+ _Up&& __x) noexcept
+ {
+ _GLIBCXX_DEBUG_ASSERT(__i == 0);
+ __v = static_cast<_Up&&>(__x);
+ }
+
+ // __masked_assign {{{2
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static void
+ __masked_assign(bool __k, _Tp& __lhs, _Tp __rhs)
+ {
+ if (__k)
+ __lhs = __rhs;
+ }
+
+ // __masked_cassign {{{2
+ template <typename _Op, typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static void
+ __masked_cassign(const bool __k, _Tp& __lhs, const _Tp __rhs, _Op __op)
+ {
+ if (__k)
+ __lhs = __op(_SimdImplScalar{}, __lhs, __rhs);
+ }
+
+ // __masked_unary {{{2
+ template <template <typename> class _Op, typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static _Tp __masked_unary(const bool __k,
+ const _Tp __v)
+ {
+ return static_cast<_Tp>(__k ? _Op<_Tp>{}(__v) : __v);
+ }
+
+ // }}}2
+};
+
+// }}}
+// _MaskImplScalar {{{
+struct _MaskImplScalar
+{
+ // member types {{{
+ template <typename _Tp> using _TypeTag = _Tp*;
+
+ // }}}
+ // __broadcast {{{
+ template <typename>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr bool __broadcast(bool __x)
+ {
+ return __x;
+ }
+
+ // }}}
+ // __load {{{
+ template <typename, typename _Fp>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr bool __load(const bool* __mem)
+ {
+ return __mem[0];
+ }
+
+ // }}}
+ // __to_bits {{{
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SanitizedBitMask<1>
+ __to_bits(bool __x)
+ {
+ return __x;
+ }
+
+ // }}}
+ // __convert {{{
+ template <typename _Tp, bool _Sanitized>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr bool
+ __convert(_BitMask<1, _Sanitized> __x)
+ {
+ return __x[0];
+ }
+
+ template <typename _Tp, typename _Up, typename _UAbi>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr bool
+ __convert(simd_mask<_Up, _UAbi> __x)
+ {
+ return __x[0];
+ }
+
+ // }}}
+ // __from_bitmask {{{2
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static bool
+ __from_bitmask(_SanitizedBitMask<1> __bits, _TypeTag<_Tp>) noexcept
+ {
+ return __bits[0];
+ }
+
+ // __masked_load {{{2
+ template <typename _Fp>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static bool
+ __masked_load(bool __merge, bool __mask, const bool* __mem, _Fp) noexcept
+ {
+ if (__mask)
+ __merge = __mem[0];
+ return __merge;
+ }
+
+ // __store {{{2
+ template <typename _Fp>
+ _GLIBCXX_SIMD_INTRINSIC static void __store(bool __v, bool* __mem,
+ _Fp) noexcept
+ {
+ __mem[0] = __v;
+ }
+
+ // __masked_store {{{2
+ template <typename _Fp>
+ _GLIBCXX_SIMD_INTRINSIC static void
+ __masked_store(const bool __v, bool* __mem, _Fp, const bool __k) noexcept
+ {
+ if (__k)
+ __mem[0] = __v;
+ }
+
+ // logical and bitwise operators {{{2
+ static constexpr bool __logical_and(bool __x, bool __y) { return __x && __y; }
+ static constexpr bool __logical_or(bool __x, bool __y) { return __x || __y; }
+ static constexpr bool __bit_not(bool __x) { return !__x; }
+ static constexpr bool __bit_and(bool __x, bool __y) { return __x && __y; }
+ static constexpr bool __bit_or(bool __x, bool __y) { return __x || __y; }
+ static constexpr bool __bit_xor(bool __x, bool __y) { return __x != __y; }
+
+ // smart_reference access {{{2
+ constexpr static void __set(bool& __k, [[maybe_unused]] int __i,
+ bool __x) noexcept
+ {
+ _GLIBCXX_DEBUG_ASSERT(__i == 0);
+ __k = __x;
+ }
+
+ // __masked_assign {{{2
+ _GLIBCXX_SIMD_INTRINSIC static void __masked_assign(bool __k, bool& __lhs,
+ bool __rhs)
+ {
+ if (__k)
+ __lhs = __rhs;
+ }
+
+ // }}}2
+ // __all_of {{{
+ template <typename _Tp, typename _Abi>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static bool
+ __all_of(simd_mask<_Tp, _Abi> __k)
+ {
+ return __k._M_data;
+ }
+
+ // }}}
+ // __any_of {{{
+ template <typename _Tp, typename _Abi>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static bool
+ __any_of(simd_mask<_Tp, _Abi> __k)
+ {
+ return __k._M_data;
+ }
+
+ // }}}
+ // __none_of {{{
+ template <typename _Tp, typename _Abi>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static bool
+ __none_of(simd_mask<_Tp, _Abi> __k)
+ {
+ return !__k._M_data;
+ }
+
+ // }}}
+ // __some_of {{{
+ template <typename _Tp, typename _Abi>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static bool __some_of(simd_mask<_Tp, _Abi>)
+ {
+ return false;
+ }
+
+ // }}}
+ // __popcount {{{
+ template <typename _Tp, typename _Abi>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static int
+ __popcount(simd_mask<_Tp, _Abi> __k)
+ {
+ return __k._M_data;
+ }
+
+ // }}}
+ // __find_first_set {{{
+ template <typename _Tp, typename _Abi>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static int
+ __find_first_set(simd_mask<_Tp, _Abi>)
+ {
+ return 0;
+ }
+
+ // }}}
+ // __find_last_set {{{
+ template <typename _Tp, typename _Abi>
+ _GLIBCXX_SIMD_INTRINSIC constexpr static int
+ __find_last_set(simd_mask<_Tp, _Abi>)
+ {
+ return 0;
+ }
+
+ // }}}
+};
+
+// }}}
+
+_GLIBCXX_SIMD_END_NAMESPACE
+#endif // __cplusplus >= 201703L
+#endif // _GLIBCXX_EXPERIMENTAL_SIMD_SCALAR_H_
+
+// vim: foldmethod=marker sw=2 noet ts=8 sts=2 tw=80
diff --git a/libstdc++-v3/include/experimental/bits/simd_x86.h b/libstdc++-v3/include/experimental/bits/simd_x86.h
new file mode 100644
index 00000000000..4e15aac8b62
--- /dev/null
+++ b/libstdc++-v3/include/experimental/bits/simd_x86.h
@@ -0,0 +1,5037 @@
+// Simd x86 specific implementations -*- C++ -*-
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library. This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+// <http://www.gnu.org/licenses/>.
+
+#ifndef _GLIBCXX_EXPERIMENTAL_SIMD_X86_H_
+#define _GLIBCXX_EXPERIMENTAL_SIMD_X86_H_
+
+#if __cplusplus >= 201703L
+
+#if !_GLIBCXX_SIMD_X86INTRIN
+#error \
+ "simd_x86.h may only be included when MMX or SSE on x86(_64) are available"
+#endif
+
+_GLIBCXX_SIMD_BEGIN_NAMESPACE
+
+// __interleave128_lo {{{
+template <typename _Ap, typename _B, typename _Tp = std::common_type_t<_Ap, _B>,
+ typename _Trait = _VectorTraits<_Tp>>
+_GLIBCXX_SIMD_INTRINSIC constexpr _Tp
+__interleave128_lo(const _Ap& __av, const _B& __bv)
+{
+ const _Tp __a(__av);
+ const _Tp __b(__bv);
+ if constexpr (sizeof(_Tp) == 16 && _Trait::_S_width == 2)
+ return _Tp{__a[0], __b[0]};
+ else if constexpr (sizeof(_Tp) == 16 && _Trait::_S_width == 4)
+ return _Tp{__a[0], __b[0], __a[1], __b[1]};
+ else if constexpr (sizeof(_Tp) == 16 && _Trait::_S_width == 8)
+ return _Tp{__a[0], __b[0], __a[1], __b[1], __a[2], __b[2], __a[3], __b[3]};
+ else if constexpr (sizeof(_Tp) == 16 && _Trait::_S_width == 16)
+ return _Tp{__a[0], __b[0], __a[1], __b[1], __a[2], __b[2], __a[3], __b[3],
+ __a[4], __b[4], __a[5], __b[5], __a[6], __b[6], __a[7], __b[7]};
+ else if constexpr (sizeof(_Tp) == 32 && _Trait::_S_width == 4)
+ return _Tp{__a[0], __b[0], __a[2], __b[2]};
+ else if constexpr (sizeof(_Tp) == 32 && _Trait::_S_width == 8)
+ return _Tp{__a[0], __b[0], __a[1], __b[1], __a[4], __b[4], __a[5], __b[5]};
+ else if constexpr (sizeof(_Tp) == 32 && _Trait::_S_width == 16)
+ return _Tp{__a[0], __b[0], __a[1], __b[1], __a[2], __b[2],
+ __a[3], __b[3], __a[8], __b[8], __a[9], __b[9],
+ __a[10], __b[10], __a[11], __b[11]};
+ else if constexpr (sizeof(_Tp) == 32 && _Trait::_S_width == 32)
+ return _Tp{__a[0], __b[0], __a[1], __b[1], __a[2], __b[2], __a[3],
+ __b[3], __a[4], __b[4], __a[5], __b[5], __a[6], __b[6],
+ __a[7], __b[7], __a[16], __b[16], __a[17], __b[17], __a[18],
+ __b[18], __a[19], __b[19], __a[20], __b[20], __a[21], __b[21],
+ __a[22], __b[22], __a[23], __b[23]};
+ else if constexpr (sizeof(_Tp) == 64 && _Trait::_S_width == 8)
+ return _Tp{__a[0], __b[0], __a[2], __b[2], __a[4], __b[4], __a[6], __b[6]};
+ else if constexpr (sizeof(_Tp) == 64 && _Trait::_S_width == 16)
+ return _Tp{__a[0], __b[0], __a[1], __b[1], __a[4], __b[4],
+ __a[5], __b[5], __a[8], __b[8], __a[9], __b[9],
+ __a[12], __b[12], __a[13], __b[13]};
+ else if constexpr (sizeof(_Tp) == 64 && _Trait::_S_width == 32)
+ return _Tp{__a[0], __b[0], __a[1], __b[1], __a[2], __b[2], __a[3],
+ __b[3], __a[8], __b[8], __a[9], __b[9], __a[10], __b[10],
+ __a[11], __b[11], __a[16], __b[16], __a[17], __b[17], __a[18],
+ __b[18], __a[19], __b[19], __a[24], __b[24], __a[25], __b[25],
+ __a[26], __b[26], __a[27], __b[27]};
+ else if constexpr (sizeof(_Tp) == 64 && _Trait::_S_width == 64)
+ return _Tp{__a[0], __b[0], __a[1], __b[1], __a[2], __b[2], __a[3],
+ __b[3], __a[4], __b[4], __a[5], __b[5], __a[6], __b[6],
+ __a[7], __b[7], __a[16], __b[16], __a[17], __b[17], __a[18],
+ __b[18], __a[19], __b[19], __a[20], __b[20], __a[21], __b[21],
+ __a[22], __b[22], __a[23], __b[23], __a[32], __b[32], __a[33],
+ __b[33], __a[34], __b[34], __a[35], __b[35], __a[36], __b[36],
+ __a[37], __b[37], __a[38], __b[38], __a[39], __b[39], __a[48],
+ __b[48], __a[49], __b[49], __a[50], __b[50], __a[51], __b[51],
+ __a[52], __b[52], __a[53], __b[53], __a[54], __b[54], __a[55],
+ __b[55]};
+ else
+ __assert_unreachable<_Tp>();
+}
+
+// }}}
+// __is_zero{{{
+template <typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+_GLIBCXX_SIMD_INTRINSIC constexpr bool
+__is_zero(_Tp __a)
+{
+ if (!__builtin_is_constant_evaluated())
+ {
+ if constexpr (__have_avx)
+ {
+ if constexpr (_TVT::template __is<float, 8>)
+ return _mm256_testz_ps(__a, __a);
+ else if constexpr (_TVT::template __is<double, 4>)
+ return _mm256_testz_pd(__a, __a);
+ else if constexpr (sizeof(_Tp) == 32)
+ return _mm256_testz_si256(__to_intrin(__a), __to_intrin(__a));
+ else if constexpr (_TVT::template __is<float>)
+ return _mm_testz_ps(__to_intrin(__a), __to_intrin(__a));
+ else if constexpr (_TVT::template __is<double, 2>)
+ return _mm_testz_pd(__a, __a);
+ else
+ return _mm_testz_si128(__to_intrin(__a), __to_intrin(__a));
+ }
+ else if constexpr (__have_sse4_1)
+ return _mm_testz_si128(__intrin_bitcast<__m128i>(__a),
+ __intrin_bitcast<__m128i>(__a));
+ }
+ else if constexpr (sizeof(_Tp) <= 8)
+ return reinterpret_cast<__int_for_sizeof_t<_Tp>>(__a) == 0;
+ else
+ {
+ const auto __b = __vector_bitcast<_LLong>(__a);
+ if constexpr (sizeof(__b) == 16)
+ return (__b[0] | __b[1]) == 0;
+ else if constexpr (sizeof(__b) == 32)
+ return __is_zero(__lo128(__b) | __hi128(__b));
+ else if constexpr (sizeof(__b) == 64)
+ return __is_zero(__lo256(__b) | __hi256(__b));
+ else
+ __assert_unreachable<_Tp>();
+ }
+}
+// }}}
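+// A minimal sketch of the generic fallback above (the branch that OR-combines
+// the 64-bit halves), assuming only GCC vector extensions; the names below
+// are hypothetical and only illustrate the reduction idea for a 16-byte
+// vector:
+//
+//   using __v2ll_sketch = long long __attribute__((vector_size(16)));
+//   constexpr bool
+//   __is_zero_sketch(__v2ll_sketch __v)
+//   { return (__v[0] | __v[1]) == 0; }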
+// __movemask{{{
+template <typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+_GLIBCXX_SIMD_INTRINSIC _GLIBCXX_CONST int
+__movemask(_Tp __a)
+{
+ if constexpr (sizeof(_Tp) == 32)
+ {
+ if constexpr (_TVT::template __is<float>)
+ return _mm256_movemask_ps(__to_intrin(__a));
+ else if constexpr (_TVT::template __is<double>)
+ return _mm256_movemask_pd(__to_intrin(__a));
+ else
+ return _mm256_movemask_epi8(__to_intrin(__a));
+ }
+ else if constexpr (_TVT::template __is<float>)
+ return _mm_movemask_ps(__to_intrin(__a));
+ else if constexpr (_TVT::template __is<double>)
+ return _mm_movemask_pd(__to_intrin(__a));
+ else
+ return _mm_movemask_epi8(__to_intrin(__a));
+}
+
+// }}}
+// __testz{{{
+template <typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+_GLIBCXX_SIMD_INTRINSIC _GLIBCXX_CONST constexpr int
+__testz(_Tp __a, typename _TVT::type __b)
+{
+ if (!__builtin_is_constant_evaluated())
+ {
+ if constexpr (sizeof(_Tp) == 32)
+ {
+ if constexpr (_TVT::template __is<float>)
+ return _mm256_testz_ps(__to_intrin(__a), __to_intrin(__b));
+ else if constexpr (_TVT::template __is<double>)
+ return _mm256_testz_pd(__to_intrin(__a), __to_intrin(__b));
+ else
+ return _mm256_testz_si256(__to_intrin(__a), __to_intrin(__b));
+ }
+ else if constexpr (_TVT::template __is<float> && __have_avx)
+ return _mm_testz_ps(__to_intrin(__a), __to_intrin(__b));
+ else if constexpr (_TVT::template __is<double> && __have_avx)
+ return _mm_testz_pd(__to_intrin(__a), __to_intrin(__b));
+ else if constexpr (__have_sse4_1)
+ return _mm_testz_si128(__intrin_bitcast<__m128i>(__to_intrin(__a)),
+ __intrin_bitcast<__m128i>(__to_intrin(__b)));
+ else
+ return __movemask(0 == __and(__a, __b)) != 0;
+ }
+ else
+ return __is_zero(__and(__a, __b));
+}
+
+// }}}
+// __testc{{{
+// requires SSE4.1 or above
+template <typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+_GLIBCXX_SIMD_INTRINSIC _GLIBCXX_CONST constexpr int
+__testc(_Tp __a, typename _TVT::type __b)
+{
+ if (__builtin_is_constant_evaluated())
+ return __is_zero(__andnot(__a, __b));
+
+ if constexpr (sizeof(_Tp) == 32)
+ {
+ if constexpr (_TVT::template __is<float>)
+ return _mm256_testc_ps(__a, __b);
+ else if constexpr (_TVT::template __is<double>)
+ return _mm256_testc_pd(__a, __b);
+ else
+ return _mm256_testc_si256(__to_intrin(__a), __to_intrin(__b));
+ }
+ else if constexpr (_TVT::template __is<float> && __have_avx)
+ return _mm_testc_ps(__to_intrin(__a), __to_intrin(__b));
+ else if constexpr (_TVT::template __is<double> && __have_avx)
+ return _mm_testc_pd(__to_intrin(__a), __to_intrin(__b));
+ else
+ {
+ static_assert(is_same_v<_Tp, _Tp> && __have_sse4_1);
+ return _mm_testc_si128(__intrin_bitcast<__m128i>(__to_intrin(__a)),
+ __intrin_bitcast<__m128i>(__to_intrin(__b)));
+ }
+}
+
+// }}}
+// __testnzc{{{
+template <typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+_GLIBCXX_SIMD_INTRINSIC _GLIBCXX_CONST constexpr int
+__testnzc(_Tp __a, typename _TVT::type __b)
+{
+ if (!__builtin_is_constant_evaluated())
+ {
+ if constexpr (sizeof(_Tp) == 32)
+ {
+ if constexpr (_TVT::template __is<float>)
+ return _mm256_testnzc_ps(__a, __b);
+ else if constexpr (_TVT::template __is<double>)
+ return _mm256_testnzc_pd(__a, __b);
+ else
+ return _mm256_testnzc_si256(__to_intrin(__a), __to_intrin(__b));
+ }
+ else if constexpr (_TVT::template __is<float> && __have_avx)
+ return _mm_testnzc_ps(__to_intrin(__a), __to_intrin(__b));
+ else if constexpr (_TVT::template __is<double> && __have_avx)
+ return _mm_testnzc_pd(__to_intrin(__a), __to_intrin(__b));
+ else if constexpr (__have_sse4_1)
+ return _mm_testnzc_si128(__intrin_bitcast<__m128i>(__to_intrin(__a)),
+ __intrin_bitcast<__m128i>(__to_intrin(__b)));
+ else
+ return __movemask(0 == __and(__a, __b)) == 0
+ && __movemask(0 == __andnot(__a, __b)) == 0;
+ }
+ else
+ return !(__is_zero(__and(__a, __b)) || __is_zero(__andnot(__a, __b)));
+}
+
+// }}}
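+// Scalar model of the three predicates above, following their
+// constant-evaluation branches and assuming __andnot(__a, __b) computes
+// ~__a & __b as elsewhere in this patch:
+//
+//   __testz(a, b)   is nonzero iff (a & b) == 0
+//   __testc(a, b)   is nonzero iff (~a & b) == 0
+//   __testnzc(a, b) is nonzero iff (a & b) != 0 && (~a & b) != 0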
+// __xzyw{{{
+// Shuffles the complete vector, swapping the inner two quarters. Often useful
+// on AVX for fixing up a shuffle result.
+template <typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+_GLIBCXX_SIMD_INTRINSIC _Tp
+__xzyw(_Tp __a)
+{
+ if constexpr (sizeof(_Tp) == 16)
+ {
+ const auto __x = __vector_bitcast<conditional_t<
+ is_floating_point_v<typename _TVT::value_type>, float, int>>(__a);
+ return reinterpret_cast<_Tp>(
+ decltype(__x){__x[0], __x[2], __x[1], __x[3]});
+ }
+ else if constexpr (sizeof(_Tp) == 32)
+ {
+ const auto __x = __vector_bitcast<conditional_t<
+ is_floating_point_v<typename _TVT::value_type>, double, _LLong>>(__a);
+ return reinterpret_cast<_Tp>(
+ decltype(__x){__x[0], __x[2], __x[1], __x[3]});
+ }
+ else if constexpr (sizeof(_Tp) == 64)
+ {
+ const auto __x = __vector_bitcast<conditional_t<
+ is_floating_point_v<typename _TVT::value_type>, double, _LLong>>(__a);
+ return reinterpret_cast<_Tp>(decltype(
+ __x){__x[0], __x[1], __x[4], __x[5], __x[2], __x[3], __x[6], __x[7]});
+ }
+ else
+ __assert_unreachable<_Tp>();
+}
+
+// }}}
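+// Worked example for __xzyw: for a 32-byte vector of 8 ints the element order
+// changes from {0 1 2 3 4 5 6 7} to {0 1 4 5 2 3 6 7}, i.e. the two inner
+// 64-bit quarters swap places. A standalone sketch using GCC vector
+// extensions (the names are hypothetical):
+//
+//   using __v8si_sketch = int __attribute__((vector_size(32)));
+//   constexpr __v8si_sketch
+//   __xzyw_sketch(__v8si_sketch __a)
+//   {
+//     return __v8si_sketch{__a[0], __a[1], __a[4], __a[5],
+//                          __a[2], __a[3], __a[6], __a[7]};
+//   }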
+
+#ifdef _GLIBCXX_SIMD_WORKAROUND_PR85048
+#include "simd_x86_conversions.h"
+#endif
+
+// ISA & type detection {{{
+template <typename _Tp, size_t _Np>
+constexpr bool
+__is_sse_ps()
+{
+ return __have_sse
+ && std::is_same_v<_Tp,
+ float> && sizeof(__intrinsic_type_t<_Tp, _Np>) == 16;
+}
+template <typename _Tp, size_t _Np>
+constexpr bool
+__is_sse_pd()
+{
+ return __have_sse2
+ && std::is_same_v<
+ _Tp, double> && sizeof(__intrinsic_type_t<_Tp, _Np>) == 16;
+}
+template <typename _Tp, size_t _Np>
+constexpr bool
+__is_avx_ps()
+{
+ return __have_avx
+ && std::is_same_v<_Tp,
+ float> && sizeof(__intrinsic_type_t<_Tp, _Np>) == 32;
+}
+template <typename _Tp, size_t _Np>
+constexpr bool
+__is_avx_pd()
+{
+ return __have_avx
+ && std::is_same_v<
+ _Tp, double> && sizeof(__intrinsic_type_t<_Tp, _Np>) == 32;
+}
+template <typename _Tp, size_t _Np>
+constexpr bool
+__is_avx512_ps()
+{
+ return __have_avx512f
+ && std::is_same_v<_Tp,
+ float> && sizeof(__intrinsic_type_t<_Tp, _Np>) == 64;
+}
+template <typename _Tp, size_t _Np>
+constexpr bool
+__is_avx512_pd()
+{
+ return __have_avx512f
+ && std::is_same_v<
+ _Tp, double> && sizeof(__intrinsic_type_t<_Tp, _Np>) == 64;
+}
+
+// }}}
+struct _MaskImplX86Mixin;
+// _CommonImplX86 {{{
+struct _CommonImplX86 : _CommonImplBuiltin
+{
+#ifdef _GLIBCXX_SIMD_WORKAROUND_PR85048
+ // __converts_via_decomposition {{{
+ template <typename _From, typename _To, size_t _ToSize>
+ static constexpr bool __converts_via_decomposition()
+ {
+ if constexpr (is_integral_v<
+ _From> && is_integral_v<_To> && sizeof(_From) == 8
+ && _ToSize == 16)
+ return (sizeof(_To) == 2 && !__have_ssse3)
+ || (sizeof(_To) == 1 && !__have_avx512f);
+ else if constexpr (is_floating_point_v<_From> && is_integral_v<_To>)
+ return ((sizeof(_From) == 4 || sizeof(_From) == 8) && sizeof(_To) == 8
+ && !__have_avx512dq)
+ || (sizeof(_From) == 8 && sizeof(_To) == 4 && !__have_sse4_1
+ && _ToSize == 16);
+ else if constexpr (
+ is_integral_v<_From> && is_floating_point_v<_To> && sizeof(_From) == 8
+ && !__have_avx512dq)
+ return (sizeof(_To) == 4 && _ToSize == 16)
+ || (sizeof(_To) == 8 && _ToSize < 64);
+ else
+ return false;
+ }
+
+ template <typename _From, typename _To, size_t _ToSize>
+ static inline constexpr bool __converts_via_decomposition_v
+ = __converts_via_decomposition<_From, _To, _ToSize>();
+
+ // }}}
+#endif
+ // __store {{{
+ using _CommonImplBuiltin::__store;
+
+ template <typename _Flags, typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static void __store(_SimdWrapper<_Tp, _Np> __x,
+ void* __addr, _Flags)
+ {
+ constexpr size_t _Bytes = _Np * sizeof(_Tp);
+
+ if constexpr ((_Bytes & (_Bytes - 1)) != 0 && __have_avx512bw_vl)
+ {
+ const auto __v = __to_intrin(__x);
+ if constexpr (std::is_same_v<_Flags, vector_aligned_tag>)
+ __addr
+ = __builtin_assume_aligned(__addr, alignof(_SimdWrapper<_Tp, _Np>));
+ else if constexpr (!std::is_same_v<_Flags, element_aligned_tag>)
+ __addr = __builtin_assume_aligned(__addr, _Flags::_S_alignment);
+
+ if constexpr (_Bytes & 1)
+ {
+ if constexpr (_Bytes < 16)
+ _mm_mask_storeu_epi8(__addr, 0xffffu >> (16 - _Bytes),
+ __intrin_bitcast<__m128i>(__v));
+ else if constexpr (_Bytes < 32)
+ _mm256_mask_storeu_epi8(__addr, 0xffffffffu >> (32 - _Bytes),
+ __intrin_bitcast<__m256i>(__v));
+ else
+ _mm512_mask_storeu_epi8(__addr,
+ 0xffffffffffffffffull >> (64 - _Bytes),
+ __intrin_bitcast<__m512i>(__v));
+ }
+ else if constexpr (_Bytes & 2)
+ {
+ if constexpr (_Bytes < 16)
+ _mm_mask_storeu_epi16(__addr, 0xffu >> (8 - _Bytes / 2),
+ __intrin_bitcast<__m128i>(__v));
+ else if constexpr (_Bytes < 32)
+ _mm256_mask_storeu_epi16(__addr, 0xffffu >> (16 - _Bytes / 2),
+ __intrin_bitcast<__m256i>(__v));
+ else
+ _mm512_mask_storeu_epi16(__addr,
+ 0xffffffffull >> (32 - _Bytes / 2),
+ __intrin_bitcast<__m512i>(__v));
+ }
+ else if constexpr (_Bytes & 4)
+ {
+ if constexpr (_Bytes < 16)
+ _mm_mask_storeu_epi32(__addr, 0xfu >> (4 - _Bytes / 4),
+ __intrin_bitcast<__m128i>(__v));
+ else if constexpr (_Bytes < 32)
+ _mm256_mask_storeu_epi32(__addr, 0xffu >> (8 - _Bytes / 4),
+ __intrin_bitcast<__m256i>(__v));
+ else
+ _mm512_mask_storeu_epi32(__addr, 0xffffull >> (16 - _Bytes / 4),
+ __intrin_bitcast<__m512i>(__v));
+ }
+ else
+ {
+ static_assert(
+ _Bytes > 16,
+ "_Bytes < 16 && (_Bytes & 7) == 0 && (_Bytes & (_Bytes "
+ "- 1)) != 0 is impossible");
+ if constexpr (_Bytes < 32)
+ _mm256_mask_storeu_epi64(__addr, 0xfu >> (4 - _Bytes / 8),
+ __intrin_bitcast<__m256i>(__v));
+ else
+ _mm512_mask_storeu_epi64(__addr, 0xffull >> (8 - _Bytes / 8),
+ __intrin_bitcast<__m512i>(__v));
+ }
+ }
+ else
+ _CommonImplBuiltin::__store(__x, __addr, _Flags());
+ }
+
+ // }}}
+ // __store_bool_array(_BitMask) {{{
+ template <size_t _Np, typename _Flags, bool _Sanitized>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr void
+ __store_bool_array(const _BitMask<_Np, _Sanitized> __x, bool* __mem, _Flags)
+ {
+ if constexpr (__have_avx512bw_vl) // don't care for BW w/o VL
+ __store<_Np>(1 & __vector_bitcast<_UChar, _Np>([=]() constexpr {
+ if constexpr (_Np <= 16)
+ return _mm_movm_epi8(__x._M_to_bits());
+ else if constexpr (_Np <= 32)
+ return _mm256_movm_epi8(__x._M_to_bits());
+ else if constexpr (_Np <= 64)
+ return _mm512_movm_epi8(__x._M_to_bits());
+ else
+ __assert_unreachable<_SizeConstant<_Np>>();
+ }()),
+ __mem, _Flags());
+ else if constexpr (__have_bmi2)
+ {
+ if constexpr (_Np <= 4)
+ __store<_Np>(_pdep_u32(__x._M_to_bits(), 0x01010101U), __mem,
+ _Flags());
+ else
+ __execute_n_times<__div_roundup(_Np, sizeof(size_t))>([&](auto __i) {
+ constexpr size_t __offset = __i * sizeof(size_t);
+ constexpr int __todo = std::min(sizeof(size_t), _Np - __offset);
+ if constexpr (__todo == 1)
+ __mem[__offset] = __x[__offset];
+ else
+ {
+ const auto __bools =
+#ifdef __x86_64__
+ _pdep_u64(__x.template _M_extract<__offset>().to_ullong(),
+ 0x0101010101010101ULL);
+#else // __x86_64__
+ _pdep_u32(__x.template _M_extract<__offset>()._M_to_bits(),
+ 0x01010101U);
+#endif // __x86_64__
+ __store<__todo>(__bools, __mem + __offset, _Flags());
+ }
+ });
+ }
+ else if constexpr (__have_sse2 && _Np > 7)
+ __execute_n_times<__div_roundup(_Np, 16)>([&](auto __i) {
+ constexpr int __offset = __i * 16;
+ constexpr int __todo = std::min(16, int(_Np) - __offset);
+ const int __bits = __x.template _M_extract<__offset>()._M_to_bits();
+ __vector_type16_t<_UChar> __bools;
+ if constexpr (__have_avx512f)
+ {
+ auto __as32bits
+ = _mm512_maskz_mov_epi32(__bits,
+ __to_intrin(__vector_broadcast<16>(1)));
+ auto __as16bits = __xzyw(
+ _mm256_packs_epi32(__lo256(__as32bits),
+ __todo > 8 ? __hi256(__as32bits) : __m256i()));
+ __bools = __vector_bitcast<_UChar>(
+ _mm_packs_epi16(__lo128(__as16bits), __hi128(__as16bits)));
+ }
+ else
+ {
+ using _V = __vector_type_t<_UChar, 16>;
+ auto __tmp = _mm_cvtsi32_si128(__bits);
+ __tmp = _mm_unpacklo_epi8(__tmp, __tmp);
+ __tmp = _mm_unpacklo_epi16(__tmp, __tmp);
+ __tmp = _mm_unpacklo_epi32(__tmp, __tmp);
+ _V __tmp2 = reinterpret_cast<_V>(__tmp);
+ __tmp2 &= _V{1, 2, 4, 8, 16, 32, 64, 128,
+ 1, 2, 4, 8, 16, 32, 64, 128}; // mask bit index
+ __bools = (__tmp2 == 0) + 1; // 0xff -> 0x00 | 0x00 -> 0x01
+ }
+ __store<__todo>(__bools, __mem + __offset, _Flags());
+ });
+ else
+ _CommonImplBuiltin::__store_bool_array(__x, __mem, _Flags());
+ }
+
+ // }}}
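+  // Worked example for the BMI2 branch above: _pdep_u32 deposits one mask bit
+  // into the low bit of each byte, so the subsequent store writes one bool
+  // per element. For the 4-bit mask 0b1011:
+  //
+  //   _pdep_u32(0b1011, 0x01010101u) == 0x01000101u
+  //
+  // which stores the bytes {1, 1, 0, 1} (little-endian), i.e.
+  // {true, true, false, true}.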
+ // _S_blend_avx512 {{{
+ // Returns: __k ? __b : __a
+ // TODO: reverse __a and __b to match COND_EXPR
+  // Requires: _TV to be a __vector_type_t whose value_type matches the
+  // bitmask __k
+ template <typename _Kp, typename _TV>
+ _GLIBCXX_SIMD_INTRINSIC static _TV
+ _S_blend_avx512(const _Kp __k, const _TV __a, const _TV __b) noexcept
+ {
+ static_assert(__is_vector_type_v<_TV>);
+ using _Tp = typename _VectorTraits<_TV>::value_type;
+ static_assert(sizeof(_TV) >= 16);
+ static_assert(sizeof(_Tp) <= 8);
+ using _IntT = conditional_t<(sizeof(_Tp) > 2),
+ conditional_t<sizeof(_Tp) == 4, int, long long>,
+ conditional_t<sizeof(_Tp) == 1, char, short>>;
+ [[maybe_unused]] const auto __aa = __vector_bitcast<_IntT>(__a);
+ [[maybe_unused]] const auto __bb = __vector_bitcast<_IntT>(__b);
+ if constexpr (sizeof(_TV) == 64)
+ {
+ if constexpr (sizeof(_Tp) == 1)
+ return reinterpret_cast<_TV>(
+ __builtin_ia32_blendmb_512_mask(__aa, __bb, __k));
+ else if constexpr (sizeof(_Tp) == 2)
+ return reinterpret_cast<_TV>(
+ __builtin_ia32_blendmw_512_mask(__aa, __bb, __k));
+ else if constexpr (sizeof(_Tp) == 4 && is_floating_point_v<_Tp>)
+ return __builtin_ia32_blendmps_512_mask(__a, __b, __k);
+ else if constexpr (sizeof(_Tp) == 4)
+ return reinterpret_cast<_TV>(
+ __builtin_ia32_blendmd_512_mask(__aa, __bb, __k));
+ else if constexpr (sizeof(_Tp) == 8 && is_floating_point_v<_Tp>)
+ return __builtin_ia32_blendmpd_512_mask(__a, __b, __k);
+ else if constexpr (sizeof(_Tp) == 8)
+ return reinterpret_cast<_TV>(
+ __builtin_ia32_blendmq_512_mask(__aa, __bb, __k));
+ }
+ else if constexpr (sizeof(_TV) == 32)
+ {
+ if constexpr (sizeof(_Tp) == 1)
+ return reinterpret_cast<_TV>(
+ __builtin_ia32_blendmb_256_mask(__aa, __bb, __k));
+ else if constexpr (sizeof(_Tp) == 2)
+ return reinterpret_cast<_TV>(
+ __builtin_ia32_blendmw_256_mask(__aa, __bb, __k));
+ else if constexpr (sizeof(_Tp) == 4 && is_floating_point_v<_Tp>)
+ return __builtin_ia32_blendmps_256_mask(__a, __b, __k);
+ else if constexpr (sizeof(_Tp) == 4)
+ return reinterpret_cast<_TV>(
+ __builtin_ia32_blendmd_256_mask(__aa, __bb, __k));
+ else if constexpr (sizeof(_Tp) == 8 && is_floating_point_v<_Tp>)
+ return __builtin_ia32_blendmpd_256_mask(__a, __b, __k);
+ else if constexpr (sizeof(_Tp) == 8)
+ return reinterpret_cast<_TV>(
+ __builtin_ia32_blendmq_256_mask(__aa, __bb, __k));
+ }
+ else if constexpr (sizeof(_TV) == 16)
+ {
+ if constexpr (sizeof(_Tp) == 1)
+ return reinterpret_cast<_TV>(
+ __builtin_ia32_blendmb_128_mask(__aa, __bb, __k));
+ else if constexpr (sizeof(_Tp) == 2)
+ return reinterpret_cast<_TV>(
+ __builtin_ia32_blendmw_128_mask(__aa, __bb, __k));
+ else if constexpr (sizeof(_Tp) == 4 && is_floating_point_v<_Tp>)
+ return __builtin_ia32_blendmps_128_mask(__a, __b, __k);
+ else if constexpr (sizeof(_Tp) == 4)
+ return reinterpret_cast<_TV>(
+ __builtin_ia32_blendmd_128_mask(__aa, __bb, __k));
+ else if constexpr (sizeof(_Tp) == 8 && is_floating_point_v<_Tp>)
+ return __builtin_ia32_blendmpd_128_mask(__a, __b, __k);
+ else if constexpr (sizeof(_Tp) == 8)
+ return reinterpret_cast<_TV>(
+ __builtin_ia32_blendmq_128_mask(__aa, __bb, __k));
+ }
+ }
+
+ // }}}
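+  // Scalar model of the blend semantics shared by _S_blend_avx512 above and
+  // the _S_blend_intrin/_S_blend overloads below (note that the result takes
+  // the *second* data argument where the mask is set):
+  //
+  //   for (size_t __i = 0; __i < _Np; ++__i)
+  //     __r[__i] = __k[__i] ? __b[__i] : __a[__i];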
+ // _S_blend_intrin {{{
+ // Returns: __k ? __b : __a
+ // TODO: reverse __a and __b to match COND_EXPR
+ // Requires: _Tp to be an intrinsic type (integers blend per byte) and 16/32
+ // Bytes wide
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static _Tp _S_blend_intrin(_Tp __k, _Tp __a,
+ _Tp __b) noexcept
+ {
+ static_assert(is_same_v<decltype(__to_intrin(__a)), _Tp>);
+ constexpr struct
+ {
+ _GLIBCXX_SIMD_INTRINSIC __m128 operator()(__m128 __a, __m128 __b,
+ __m128 __k) const noexcept
+ {
+ return __builtin_ia32_blendvps(__a, __b, __k);
+ }
+ _GLIBCXX_SIMD_INTRINSIC __m128d operator()(__m128d __a, __m128d __b,
+ __m128d __k) const noexcept
+ {
+ return __builtin_ia32_blendvpd(__a, __b, __k);
+ }
+ _GLIBCXX_SIMD_INTRINSIC __m128i operator()(__m128i __a, __m128i __b,
+ __m128i __k) const noexcept
+ {
+ return reinterpret_cast<__m128i>(
+ __builtin_ia32_pblendvb128(reinterpret_cast<__v16qi>(__a),
+ reinterpret_cast<__v16qi>(__b),
+ reinterpret_cast<__v16qi>(__k)));
+ }
+ _GLIBCXX_SIMD_INTRINSIC __m256 operator()(__m256 __a, __m256 __b,
+ __m256 __k) const noexcept
+ {
+ return __builtin_ia32_blendvps256(__a, __b, __k);
+ }
+ _GLIBCXX_SIMD_INTRINSIC __m256d operator()(__m256d __a, __m256d __b,
+ __m256d __k) const noexcept
+ {
+ return __builtin_ia32_blendvpd256(__a, __b, __k);
+ }
+ _GLIBCXX_SIMD_INTRINSIC __m256i operator()(__m256i __a, __m256i __b,
+ __m256i __k) const noexcept
+ {
+ return reinterpret_cast<__m256i>(
+ __builtin_ia32_pblendvb256(reinterpret_cast<__v32qi>(__a),
+ reinterpret_cast<__v32qi>(__b),
+ reinterpret_cast<__v32qi>(__k)));
+ }
+ } __eval;
+ return __eval(__a, __b, __k);
+ }
+
+ // }}}
+ // _S_blend {{{
+ // Returns: __k ? __at1 : __at0
+ // TODO: reverse __at0 and __at1 to match COND_EXPR
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ _S_blend(_SimdWrapper<bool, _Np> __k, _SimdWrapper<_Tp, _Np> __at0,
+ _SimdWrapper<_Tp, _Np> __at1)
+ {
+ static_assert(is_same_v<_Tp, _Tp> && __have_avx512f);
+ if (__k._M_is_constprop() && __at0._M_is_constprop()
+ && __at1._M_is_constprop())
+ return __generate_from_n_evaluations<_Np, __vector_type_t<_Tp, _Np>>([&](
+ auto __i) constexpr { return __k[__i] ? __at1[__i] : __at0[__i]; });
+ else if constexpr (sizeof(__at0) == 64
+ || (__have_avx512vl && sizeof(__at0) >= 16))
+ return _S_blend_avx512(__k._M_data, __at0._M_data, __at1._M_data);
+ else
+ {
+ static_assert((__have_avx512vl && sizeof(__at0) < 16)
+ || !__have_avx512vl);
+ constexpr size_t __size = (__have_avx512vl ? 16 : 64) / sizeof(_Tp);
+ return __vector_bitcast<_Tp, _Np>(
+ _S_blend_avx512(__k._M_data, __vector_bitcast<_Tp, __size>(__at0),
+ __vector_bitcast<_Tp, __size>(__at1)));
+ }
+ }
+
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ _S_blend(_SimdWrapper<_Tp, _Np> __k, _SimdWrapper<_Tp, _Np> __at0,
+ _SimdWrapper<_Tp, _Np> __at1)
+ {
+ if (__builtin_is_constant_evaluated()
+ || (__k._M_is_constprop() && __at0._M_is_constprop()
+ && __at1._M_is_constprop()))
+ {
+ auto __r = __or(__andnot(__k, __at0), __and(__k, __at1));
+ if (__r._M_is_constprop())
+ return __r;
+ }
+ if constexpr (((__have_avx512f && sizeof(__at0) == 64)
+ || __have_avx512vl)
+ && (sizeof(_Tp) >= 4 || __have_avx512bw))
+ // convert to bitmask and call overload above
+ return _S_blend(_SimdWrapper<bool, _Np>(
+ __make_dependent_t<_Tp, _MaskImplX86Mixin>::__to_bits(
+ __k)
+ ._M_to_bits()),
+ __at0, __at1);
+ else
+ {
+ // Since GCC does not assume __k to be a mask, using the builtin
+ // conditional operator introduces an extra compare against 0 before
+	  // blending. So we call the intrinsic here instead.
+ if constexpr (__have_sse4_1)
+ return _S_blend_intrin(__to_intrin(__k), __to_intrin(__at0),
+ __to_intrin(__at1));
+ else
+ return __or(__andnot(__k, __at0), __and(__k, __at1));
+ }
+ }
+
+ // }}}
+};
+
+// }}}
+// _SimdImplX86 {{{
+template <typename _Abi> struct _SimdImplX86 : _SimdImplBuiltin<_Abi>
+{
+ using _Base = _SimdImplBuiltin<_Abi>;
+ template <typename _Tp>
+ using _MaskMember = typename _Base::template _MaskMember<_Tp>;
+ template <typename _Tp>
+ static constexpr size_t _S_full_size = _Abi::template _S_full_size<_Tp>;
+ template <typename _Tp>
+ static constexpr size_t size = _Abi::template size<_Tp>;
+ template <typename _Tp>
+ static constexpr size_t _S_max_store_size
+ = (sizeof(_Tp) >= 4 && __have_avx512f) || __have_avx512bw
+ ? 64
+ : (std::is_floating_point_v<_Tp>&& __have_avx) || __have_avx2 ? 32 : 16;
+ using _MaskImpl = typename _Abi::_MaskImpl;
+
+ // __masked_load {{{
+ template <typename _Tp, size_t _Np, typename _Up, typename _Fp>
+ static inline _SimdWrapper<_Tp, _Np>
+ __masked_load(_SimdWrapper<_Tp, _Np> __merge, _MaskMember<_Tp> __k,
+ const _Up* __mem, _Fp) noexcept
+ {
+ static_assert(_Np == size<_Tp>);
+ if constexpr (std::is_same_v<_Tp, _Up> || // no conversion
+ (sizeof(_Tp) == sizeof(_Up)
+ && std::is_integral_v<
+ _Tp> == std::is_integral_v<_Up>) // conversion via bit
+ // reinterpretation
+ )
+ {
+ [[maybe_unused]] const auto __intrin = __to_intrin(__merge);
+ if constexpr ((__is_avx512_abi<_Abi>() || __have_avx512bw_vl)
+ && sizeof(_Tp) == 1)
+ {
+ const auto __kk = _MaskImpl::__to_bits(__k)._M_to_bits();
+ if constexpr (sizeof(__intrin) == 16)
+ __merge = __vector_bitcast<_Tp, _Np>(
+ _mm_mask_loadu_epi8(__intrin, __kk, __mem));
+ else if constexpr (sizeof(__merge) == 32)
+ __merge = __vector_bitcast<_Tp, _Np>(
+ _mm256_mask_loadu_epi8(__intrin, __kk, __mem));
+ else if constexpr (sizeof(__merge) == 64)
+ __merge = __vector_bitcast<_Tp, _Np>(
+ _mm512_mask_loadu_epi8(__intrin, __kk, __mem));
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr ((__is_avx512_abi<_Abi>() || __have_avx512bw_vl)
+ && sizeof(_Tp) == 2)
+ {
+ const auto __kk = _MaskImpl::__to_bits(__k)._M_to_bits();
+ if constexpr (sizeof(__intrin) == 16)
+ __merge = __vector_bitcast<_Tp, _Np>(
+ _mm_mask_loadu_epi16(__intrin, __kk, __mem));
+ else if constexpr (sizeof(__intrin) == 32)
+ __merge = __vector_bitcast<_Tp, _Np>(
+ _mm256_mask_loadu_epi16(__intrin, __kk, __mem));
+ else if constexpr (sizeof(__intrin) == 64)
+ __merge = __vector_bitcast<_Tp, _Np>(
+ _mm512_mask_loadu_epi16(__intrin, __kk, __mem));
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr ((__is_avx512_abi<_Abi>() || __have_avx512vl)
+ && sizeof(_Tp) == 4 && std::is_integral_v<_Up>)
+ {
+ const auto __kk = _MaskImpl::__to_bits(__k)._M_to_bits();
+ if constexpr (sizeof(__intrin) == 16)
+ __merge = __vector_bitcast<_Tp, _Np>(
+ _mm_mask_loadu_epi32(__intrin, __kk, __mem));
+ else if constexpr (sizeof(__intrin) == 32)
+ __merge = __vector_bitcast<_Tp, _Np>(
+ _mm256_mask_loadu_epi32(__intrin, __kk, __mem));
+ else if constexpr (sizeof(__intrin) == 64)
+ __merge = __vector_bitcast<_Tp, _Np>(
+ _mm512_mask_loadu_epi32(__intrin, __kk, __mem));
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr ((__is_avx512_abi<_Abi>() || __have_avx512vl)
+ && sizeof(_Tp) == 4 && std::is_floating_point_v<_Up>)
+ {
+ const auto __kk = _MaskImpl::__to_bits(__k)._M_to_bits();
+ if constexpr (sizeof(__intrin) == 16)
+ __merge = __vector_bitcast<_Tp, _Np>(
+ _mm_mask_loadu_ps(__intrin, __kk, __mem));
+ else if constexpr (sizeof(__intrin) == 32)
+ __merge = __vector_bitcast<_Tp, _Np>(
+ _mm256_mask_loadu_ps(__intrin, __kk, __mem));
+ else if constexpr (sizeof(__intrin) == 64)
+ __merge = __vector_bitcast<_Tp, _Np>(
+ _mm512_mask_loadu_ps(__intrin, __kk, __mem));
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (__have_avx2 && sizeof(_Tp) == 4
+ && std::is_integral_v<_Up>)
+ {
+ if constexpr (sizeof(__intrin) == 16)
+ __merge
+ = __or(__andnot(__k._M_data, __merge._M_data),
+ __vector_bitcast<_Tp, _Np>(
+ _mm_maskload_epi32(reinterpret_cast<const int*>(__mem),
+ __to_intrin(__k))));
+ else if constexpr (sizeof(__intrin) == 32)
+ __merge
+ = (~__k._M_data & __merge._M_data)
+ | __vector_bitcast<_Tp, _Np>(
+ _mm256_maskload_epi32(reinterpret_cast<const int*>(__mem),
+ __to_intrin(__k)));
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (__have_avx && sizeof(_Tp) == 4)
+ {
+ if constexpr (sizeof(__intrin) == 16)
+ __merge = __or(__andnot(__k._M_data, __merge._M_data),
+ __vector_bitcast<_Tp, _Np>(_mm_maskload_ps(
+ reinterpret_cast<const float*>(__mem),
+ __intrin_bitcast<__m128i>(__as_vector(__k)))));
+ else if constexpr (sizeof(__intrin) == 32)
+ __merge
+ = __or(__andnot(__k._M_data, __merge._M_data),
+ _mm256_maskload_ps(reinterpret_cast<const float*>(__mem),
+ __vector_bitcast<_LLong>(__k)));
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr ((__is_avx512_abi<_Abi>() || __have_avx512vl)
+ && sizeof(_Tp) == 8 && std::is_integral_v<_Up>)
+ {
+ const auto __kk = _MaskImpl::__to_bits(__k)._M_to_bits();
+ if constexpr (sizeof(__intrin) == 16)
+ __merge = __vector_bitcast<_Tp, _Np>(
+ _mm_mask_loadu_epi64(__intrin, __kk, __mem));
+ else if constexpr (sizeof(__intrin) == 32)
+ __merge = __vector_bitcast<_Tp, _Np>(
+ _mm256_mask_loadu_epi64(__intrin, __kk, __mem));
+ else if constexpr (sizeof(__intrin) == 64)
+ __merge = __vector_bitcast<_Tp, _Np>(
+ _mm512_mask_loadu_epi64(__intrin, __kk, __mem));
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr ((__is_avx512_abi<_Abi>() || __have_avx512vl)
+ && sizeof(_Tp) == 8 && std::is_floating_point_v<_Up>)
+ {
+ const auto __kk = _MaskImpl::__to_bits(__k)._M_to_bits();
+ if constexpr (sizeof(__intrin) == 16)
+ __merge = __vector_bitcast<_Tp, _Np>(
+ _mm_mask_loadu_pd(__intrin, __kk, __mem));
+ else if constexpr (sizeof(__intrin) == 32)
+ __merge = __vector_bitcast<_Tp, _Np>(
+ _mm256_mask_loadu_pd(__intrin, __kk, __mem));
+ else if constexpr (sizeof(__intrin) == 64)
+ __merge = __vector_bitcast<_Tp, _Np>(
+ _mm512_mask_loadu_pd(__intrin, __kk, __mem));
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (__have_avx2 && sizeof(_Tp) == 8
+ && std::is_integral_v<_Up>)
+ {
+ if constexpr (sizeof(__intrin) == 16)
+ __merge = __or(__andnot(__k._M_data, __merge._M_data),
+ __vector_bitcast<_Tp, _Np>(_mm_maskload_epi64(
+ reinterpret_cast<const _LLong*>(__mem),
+ __to_intrin(__k))));
+ else if constexpr (sizeof(__intrin) == 32)
+ __merge
+ = (~__k._M_data & __merge._M_data)
+ | __vector_bitcast<_Tp>(_mm256_maskload_epi64(
+ reinterpret_cast<const _LLong*>(__mem), __to_intrin(__k)));
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (__have_avx && sizeof(_Tp) == 8)
+ {
+ if constexpr (sizeof(__intrin) == 16)
+ __merge
+ = __or(__andnot(__k._M_data, __merge._M_data),
+ __vector_bitcast<_Tp, _Np>(
+ _mm_maskload_pd(reinterpret_cast<const double*>(__mem),
+ __vector_bitcast<_LLong>(__k))));
+ else if constexpr (sizeof(__intrin) == 32)
+ __merge = __or(__andnot(__k._M_data, __merge._M_data),
+ _mm256_maskload_pd(reinterpret_cast<const double*>(
+ __mem),
+ __vector_bitcast<_LLong>(__k)));
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else
+ _BitOps::__bit_iteration(_MaskImpl::__to_bits(__k), [&](auto __i) {
+ __merge.__set(__i, static_cast<_Tp>(__mem[__i]));
+ });
+ }
+    /* Very uncertain that the following improves anything. Needs benchmarking
+ * before it's activated.
+ else if constexpr (sizeof(_Up) <= 8 && // no long double
+ !__converts_via_decomposition_v<
+ _Up, _Tp,
+ sizeof(__merge)> // conversion via decomposition
+ // is better handled via the
+ // bit_iteration fallback below
+ )
+ {
+ // TODO: copy pattern from __masked_store, which doesn't resort to
+ // fixed_size
+ using _Ap = simd_abi::deduce_t<_Up, _Np>;
+ using _ATraits = _SimdTraits<_Up, _Ap>;
+ using _AImpl = typename _ATraits::_SimdImpl;
+ typename _ATraits::_SimdMember __uncvted{};
+ typename _ATraits::_MaskMember __kk = _Ap::_MaskImpl::template
+ __convert<_Up>(__k);
+ __uncvted = _AImpl::__masked_load(__uncvted, __kk, __mem, _Fp());
+ _SimdConverter<_Up, _Ap, _Tp, _Abi> __converter;
+ _Base::__masked_assign(__k, __merge, __converter(__uncvted));
+ }
+ */
+ else
+ __merge = _Base::__masked_load(__merge, __k, __mem, _Fp());
+    return __merge;
+ }
+
+ // }}}
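+  // Scalar model of the masked load above: elements whose mask bit is set are
+  // loaded from __mem (converted to _Tp), all other elements keep their value
+  // from __merge -- this is exactly what the __bit_iteration fallback does:
+  //
+  //   for (size_t __i = 0; __i < _Np; ++__i)
+  //     if (__k[__i])
+  //       __merge.__set(__i, static_cast<_Tp>(__mem[__i]));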
+ // __masked_store_nocvt {{{
+ template <typename _Tp, std::size_t _Np, typename _Fp>
+ _GLIBCXX_SIMD_INTRINSIC static void
+ __masked_store_nocvt(_SimdWrapper<_Tp, _Np> __v, _Tp* __mem, _Fp,
+ _SimdWrapper<bool, _Np> __k)
+ {
+ [[maybe_unused]] const auto __vi = __to_intrin(__v);
+ if constexpr (sizeof(__vi) == 64)
+ {
+ static_assert(sizeof(__v) == 64 && __have_avx512f);
+ if constexpr (__have_avx512bw && sizeof(_Tp) == 1)
+ _mm512_mask_storeu_epi8(__mem, __k, __vi);
+ else if constexpr (__have_avx512bw && sizeof(_Tp) == 2)
+ _mm512_mask_storeu_epi16(__mem, __k, __vi);
+ else if constexpr (__have_avx512f && sizeof(_Tp) == 4)
+ {
+ if constexpr (__is_aligned_v<_Fp, 64> && std::is_integral_v<_Tp>)
+ _mm512_mask_store_epi32(__mem, __k, __vi);
+ else if constexpr (__is_aligned_v<
+ _Fp, 64> && std::is_floating_point_v<_Tp>)
+ _mm512_mask_store_ps(__mem, __k, __vi);
+ else if constexpr (std::is_integral_v<_Tp>)
+ _mm512_mask_storeu_epi32(__mem, __k, __vi);
+ else
+ _mm512_mask_storeu_ps(__mem, __k, __vi);
+ }
+ else if constexpr (__have_avx512f && sizeof(_Tp) == 8)
+ {
+ if constexpr (__is_aligned_v<_Fp, 64> && std::is_integral_v<_Tp>)
+ _mm512_mask_store_epi64(__mem, __k, __vi);
+ else if constexpr (__is_aligned_v<
+ _Fp, 64> && std::is_floating_point_v<_Tp>)
+ _mm512_mask_store_pd(__mem, __k, __vi);
+ else if constexpr (std::is_integral_v<_Tp>)
+ _mm512_mask_storeu_epi64(__mem, __k, __vi);
+ else
+ _mm512_mask_storeu_pd(__mem, __k, __vi);
+ }
+#if 0 // with KNL either sizeof(_Tp) >= 4 or sizeof(__vi) <= 32
+ // with Skylake-AVX512, __have_avx512bw is true
+ else if constexpr (__have_sse2)
+ {
+ using _M = __vector_type_t<_Tp, _Np>;
+ using _MVT = _VectorTraits<_M>;
+ _mm_maskmoveu_si128(__auto_bitcast(__extract<0, 4>(__v._M_data)),
+ __auto_bitcast(_MaskImpl::template __convert<_Tp, _Np>(__k._M_data)),
+ reinterpret_cast<char*>(__mem));
+ _mm_maskmoveu_si128(__auto_bitcast(__extract<1, 4>(__v._M_data)),
+ __auto_bitcast(_MaskImpl::template __convert<_Tp, _Np>(
+ __k._M_data >> 1 * _MVT::_S_width)),
+ reinterpret_cast<char*>(__mem) + 1 * 16);
+ _mm_maskmoveu_si128(__auto_bitcast(__extract<2, 4>(__v._M_data)),
+ __auto_bitcast(_MaskImpl::template __convert<_Tp, _Np>(
+ __k._M_data >> 2 * _MVT::_S_width)),
+ reinterpret_cast<char*>(__mem) + 2 * 16);
+ if constexpr (_Np > 48 / sizeof(_Tp))
+ _mm_maskmoveu_si128(
+ __auto_bitcast(__extract<3, 4>(__v._M_data)),
+ __auto_bitcast(_MaskImpl::template __convert<_Tp, _Np>(
+ __k._M_data >> 3 * _MVT::_S_width)),
+ reinterpret_cast<char*>(__mem) + 3 * 16);
+ }
+#endif
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (sizeof(__vi) == 32)
+ {
+ if constexpr (__have_avx512bw_vl && sizeof(_Tp) == 1)
+ _mm256_mask_storeu_epi8(__mem, __k, __vi);
+ else if constexpr (__have_avx512bw_vl && sizeof(_Tp) == 2)
+ _mm256_mask_storeu_epi16(__mem, __k, __vi);
+ else if constexpr (__have_avx512vl && sizeof(_Tp) == 4)
+ {
+ if constexpr (__is_aligned_v<_Fp, 32> && std::is_integral_v<_Tp>)
+ _mm256_mask_store_epi32(__mem, __k, __vi);
+ else if constexpr (__is_aligned_v<
+ _Fp, 32> && std::is_floating_point_v<_Tp>)
+ _mm256_mask_store_ps(__mem, __k, __vi);
+ else if constexpr (std::is_integral_v<_Tp>)
+ _mm256_mask_storeu_epi32(__mem, __k, __vi);
+ else
+ _mm256_mask_storeu_ps(__mem, __k, __vi);
+ }
+ else if constexpr (__have_avx512vl && sizeof(_Tp) == 8)
+ {
+ if constexpr (__is_aligned_v<_Fp, 32> && std::is_integral_v<_Tp>)
+ _mm256_mask_store_epi64(__mem, __k, __vi);
+ else if constexpr (__is_aligned_v<
+ _Fp, 32> && std::is_floating_point_v<_Tp>)
+ _mm256_mask_store_pd(__mem, __k, __vi);
+ else if constexpr (std::is_integral_v<_Tp>)
+ _mm256_mask_storeu_epi64(__mem, __k, __vi);
+ else
+ _mm256_mask_storeu_pd(__mem, __k, __vi);
+ }
+ else if constexpr (__have_avx512f
+ && (sizeof(_Tp) >= 4 || __have_avx512bw))
+ {
+ // use a 512-bit maskstore, using zero-extension of the bitmask
+ __masked_store_nocvt(
+ _SimdWrapper64<_Tp>(
+ __intrin_bitcast<__vector_type64_t<_Tp>>(__v._M_data)),
+ __mem,
+ // careful, vector_aligned has a stricter meaning in the
+ // 512-bit maskstore:
+ std::conditional_t<std::is_same_v<_Fp, vector_aligned_tag>,
+ overaligned_tag<32>, _Fp>(),
+ _SimdWrapper<bool, 64 / sizeof(_Tp)>(__k._M_data));
+ }
+ else
+ __masked_store_nocvt(
+ __v, __mem, _Fp(),
+ _MaskImpl::template __to_maskvector<_Tp, 32 / sizeof(_Tp)>(__k));
+ }
+ else if constexpr (sizeof(__vi) == 16)
+ {
+      // The store is aligned if _Fp is overaligned_tag<16> (or higher), or if
+      // _Fp is vector_aligned_tag and __v really is a 16-byte vector (it
+      // could also be a 2-, 4-, or 8-byte vector).
+ [[maybe_unused]] constexpr bool __aligned
+ = __is_aligned_v<
+ _Fp,
+ 16> && (sizeof(__v) == 16 || !std::is_same_v<_Fp, vector_aligned_tag>);
+ if constexpr (__have_avx512bw_vl && sizeof(_Tp) == 1)
+ _mm_mask_storeu_epi8(__mem, __k, __vi);
+ else if constexpr (__have_avx512bw_vl && sizeof(_Tp) == 2)
+ _mm_mask_storeu_epi16(__mem, __k, __vi);
+ else if constexpr (__have_avx512vl && sizeof(_Tp) == 4)
+ {
+ if constexpr (__aligned && std::is_integral_v<_Tp>)
+ _mm_mask_store_epi32(__mem, __k, __vi);
+ else if constexpr (__aligned && std::is_floating_point_v<_Tp>)
+ _mm_mask_store_ps(__mem, __k, __vi);
+ else if constexpr (std::is_integral_v<_Tp>)
+ _mm_mask_storeu_epi32(__mem, __k, __vi);
+ else
+ _mm_mask_storeu_ps(__mem, __k, __vi);
+ }
+ else if constexpr (__have_avx512vl && sizeof(_Tp) == 8)
+ {
+ if constexpr (__aligned && std::is_integral_v<_Tp>)
+ _mm_mask_store_epi64(__mem, __k, __vi);
+ else if constexpr (__aligned && std::is_floating_point_v<_Tp>)
+ _mm_mask_store_pd(__mem, __k, __vi);
+ else if constexpr (std::is_integral_v<_Tp>)
+ _mm_mask_storeu_epi64(__mem, __k, __vi);
+ else
+ _mm_mask_storeu_pd(__mem, __k, __vi);
+ }
+ else if constexpr (__have_avx512f
+ && (sizeof(_Tp) >= 4 || __have_avx512bw))
+ {
+ // use a 512-bit maskstore, using zero-extension of the bitmask
+ __masked_store_nocvt(
+ _SimdWrapper64<_Tp>(
+ __intrin_bitcast<__intrinsic_type64_t<_Tp>>(__v._M_data)),
+ __mem,
+ // careful, vector_aligned has a stricter meaning in the 512-bit
+ // maskstore:
+ std::conditional_t<std::is_same_v<_Fp, vector_aligned_tag>,
+ overaligned_tag<sizeof(__v)>, _Fp>(),
+ _SimdWrapper<bool, 64 / sizeof(_Tp)>(__k._M_data));
+ }
+ else
+ __masked_store_nocvt(
+ __v, __mem, _Fp(),
+ _MaskImpl::template __to_maskvector<_Tp, 16 / sizeof(_Tp)>(__k));
+ }
+ else
+ __assert_unreachable<_Tp>();
+ }
+
+ template <typename _TW,
+ typename _Tp = typename _VectorTraits<_TW>::value_type,
+ typename _Fp>
+ _GLIBCXX_SIMD_INTRINSIC static void __masked_store_nocvt(_TW __v, _Tp* __mem,
+ _Fp, _TW __k)
+ {
+ if constexpr (sizeof(_TW) <= 16)
+ {
+ [[maybe_unused]] const auto __vi
+ = __intrin_bitcast<__m128i>(__as_vector(__v));
+ [[maybe_unused]] const auto __ki
+ = __intrin_bitcast<__m128i>(__as_vector(__k));
+ if constexpr (__have_avx512bw_vl && sizeof(_Tp) == 1)
+ _mm_mask_storeu_epi8(__mem, _mm_movepi8_mask(__ki), __vi);
+ else if constexpr (__have_avx512bw_vl && sizeof(_Tp) == 2)
+ _mm_mask_storeu_epi16(__mem, _mm_movepi16_mask(__ki), __vi);
+ else if constexpr (__have_avx2 && sizeof(_Tp) == 4
+ && std::is_integral_v<_Tp>)
+ _mm_maskstore_epi32(reinterpret_cast<int*>(__mem), __ki, __vi);
+ else if constexpr (__have_avx && sizeof(_Tp) == 4)
+ _mm_maskstore_ps(reinterpret_cast<float*>(__mem), __ki,
+ __vector_bitcast<float>(__vi));
+ else if constexpr (__have_avx2 && sizeof(_Tp) == 8
+ && std::is_integral_v<_Tp>)
+ _mm_maskstore_epi64(reinterpret_cast<_LLong*>(__mem), __ki, __vi);
+ else if constexpr (__have_avx && sizeof(_Tp) == 8)
+ _mm_maskstore_pd(reinterpret_cast<double*>(__mem), __ki,
+ __vector_bitcast<double>(__vi));
+ else if constexpr (__have_sse2)
+ _mm_maskmoveu_si128(__vi, __ki, reinterpret_cast<char*>(__mem));
+ }
+ else if constexpr (sizeof(_TW) == 32)
+ {
+ [[maybe_unused]] const auto __vi
+ = __intrin_bitcast<__m256i>(__as_vector(__v));
+ [[maybe_unused]] const auto __ki
+ = __intrin_bitcast<__m256i>(__as_vector(__k));
+ if constexpr (__have_avx512bw_vl && sizeof(_Tp) == 1)
+ _mm256_mask_storeu_epi8(__mem, _mm256_movepi8_mask(__ki), __vi);
+ else if constexpr (__have_avx512bw_vl && sizeof(_Tp) == 2)
+ _mm256_mask_storeu_epi16(__mem, _mm256_movepi16_mask(__ki), __vi);
+ else if constexpr (__have_avx2 && sizeof(_Tp) == 4
+ && std::is_integral_v<_Tp>)
+ _mm256_maskstore_epi32(reinterpret_cast<int*>(__mem), __ki, __vi);
+ else if constexpr (sizeof(_Tp) == 4)
+ _mm256_maskstore_ps(reinterpret_cast<float*>(__mem), __ki,
+ __vector_bitcast<float>(__v));
+ else if constexpr (__have_avx2 && sizeof(_Tp) == 8
+ && std::is_integral_v<_Tp>)
+ _mm256_maskstore_epi64(reinterpret_cast<_LLong*>(__mem), __ki, __vi);
+ else if constexpr (__have_avx && sizeof(_Tp) == 8)
+ _mm256_maskstore_pd(reinterpret_cast<double*>(__mem), __ki,
+ __vector_bitcast<double>(__v));
+ else if constexpr (__have_sse2)
+ {
+ _mm_maskmoveu_si128(__lo128(__vi), __lo128(__ki),
+ reinterpret_cast<char*>(__mem));
+ _mm_maskmoveu_si128(__hi128(__vi), __hi128(__ki),
+ reinterpret_cast<char*>(__mem) + 16);
+ }
+ }
+ else
+ __assert_unreachable<_Tp>();
+ }
+
+ // }}}
+ // __masked_store {{{
+ template <typename _Tp, size_t _Np, typename _Up, typename _Fp>
+ _GLIBCXX_SIMD_INTRINSIC static void
+ __masked_store(const _SimdWrapper<_Tp, _Np> __v, _Up* __mem, _Fp,
+ const _MaskMember<_Tp> __k) noexcept
+ {
+ if constexpr (std::is_integral_v<
+ _Tp> && std::is_integral_v<_Up> && sizeof(_Tp) > sizeof(_Up)
+ && __have_avx512f && (sizeof(_Tp) >= 4 || __have_avx512bw)
+ && (sizeof(__v) == 64 || __have_avx512vl))
+ { // truncating store
+ const auto __vi = __to_intrin(__v);
+ const auto __kk = _MaskImpl::__to_bits(__k)._M_to_bits();
+ if constexpr (sizeof(_Tp) == 8 && sizeof(_Up) == 4
+ && sizeof(__vi) == 64)
+ _mm512_mask_cvtepi64_storeu_epi32(__mem, __kk, __vi);
+ else if constexpr (sizeof(_Tp) == 8 && sizeof(_Up) == 4
+ && sizeof(__vi) == 32)
+ _mm256_mask_cvtepi64_storeu_epi32(__mem, __kk, __vi);
+ else if constexpr (sizeof(_Tp) == 8 && sizeof(_Up) == 4
+ && sizeof(__vi) == 16)
+ _mm_mask_cvtepi64_storeu_epi32(__mem, __kk, __vi);
+ else if constexpr (sizeof(_Tp) == 8 && sizeof(_Up) == 2
+ && sizeof(__vi) == 64)
+ _mm512_mask_cvtepi64_storeu_epi16(__mem, __kk, __vi);
+ else if constexpr (sizeof(_Tp) == 8 && sizeof(_Up) == 2
+ && sizeof(__vi) == 32)
+ _mm256_mask_cvtepi64_storeu_epi16(__mem, __kk, __vi);
+ else if constexpr (sizeof(_Tp) == 8 && sizeof(_Up) == 2
+ && sizeof(__vi) == 16)
+ _mm_mask_cvtepi64_storeu_epi16(__mem, __kk, __vi);
+ else if constexpr (sizeof(_Tp) == 8 && sizeof(_Up) == 1
+ && sizeof(__vi) == 64)
+ _mm512_mask_cvtepi64_storeu_epi8(__mem, __kk, __vi);
+ else if constexpr (sizeof(_Tp) == 8 && sizeof(_Up) == 1
+ && sizeof(__vi) == 32)
+ _mm256_mask_cvtepi64_storeu_epi8(__mem, __kk, __vi);
+ else if constexpr (sizeof(_Tp) == 8 && sizeof(_Up) == 1
+ && sizeof(__vi) == 16)
+ _mm_mask_cvtepi64_storeu_epi8(__mem, __kk, __vi);
+ else if constexpr (sizeof(_Tp) == 4 && sizeof(_Up) == 2
+ && sizeof(__vi) == 64)
+ _mm512_mask_cvtepi32_storeu_epi16(__mem, __kk, __vi);
+ else if constexpr (sizeof(_Tp) == 4 && sizeof(_Up) == 2
+ && sizeof(__vi) == 32)
+ _mm256_mask_cvtepi32_storeu_epi16(__mem, __kk, __vi);
+ else if constexpr (sizeof(_Tp) == 4 && sizeof(_Up) == 2
+ && sizeof(__vi) == 16)
+ _mm_mask_cvtepi32_storeu_epi16(__mem, __kk, __vi);
+ else if constexpr (sizeof(_Tp) == 4 && sizeof(_Up) == 1
+ && sizeof(__vi) == 64)
+ _mm512_mask_cvtepi32_storeu_epi8(__mem, __kk, __vi);
+ else if constexpr (sizeof(_Tp) == 4 && sizeof(_Up) == 1
+ && sizeof(__vi) == 32)
+ _mm256_mask_cvtepi32_storeu_epi8(__mem, __kk, __vi);
+ else if constexpr (sizeof(_Tp) == 4 && sizeof(_Up) == 1
+ && sizeof(__vi) == 16)
+ _mm_mask_cvtepi32_storeu_epi8(__mem, __kk, __vi);
+ else if constexpr (sizeof(_Tp) == 2 && sizeof(_Up) == 1
+ && sizeof(__vi) == 64)
+ _mm512_mask_cvtepi16_storeu_epi8(__mem, __kk, __vi);
+ else if constexpr (sizeof(_Tp) == 2 && sizeof(_Up) == 1
+ && sizeof(__vi) == 32)
+ _mm256_mask_cvtepi16_storeu_epi8(__mem, __kk, __vi);
+ else if constexpr (sizeof(_Tp) == 2 && sizeof(_Up) == 1
+ && sizeof(__vi) == 16)
+ _mm_mask_cvtepi16_storeu_epi8(__mem, __kk, __vi);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else
+ _Base::__masked_store(__v, __mem, _Fp(), __k);
+ }
+
+ // }}}
+ // __multiplies {{{
+ template <typename _V, typename _VVT = _VectorTraits<_V>>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _V __multiplies(_V __x, _V __y)
+ {
+ using _Tp = typename _VVT::value_type;
+ if (__builtin_is_constant_evaluated() || __x._M_is_constprop()
+ || __y._M_is_constprop())
+ return __as_vector(__x) * __as_vector(__y);
+ else if constexpr (sizeof(_Tp) == 1)
+ {
+ if constexpr (sizeof(_V) == 2)
+ {
+ const auto __xs = reinterpret_cast<short>(__x._M_data);
+ const auto __ys = reinterpret_cast<short>(__y._M_data);
+ return reinterpret_cast<__vector_type_t<_Tp, 2>>(
+ short(((__xs * __ys) & 0xff) | ((__xs >> 8) * (__ys & 0xff00))));
+ }
+ else if constexpr (sizeof(_V) == 4 && _VVT::_S_partial_width == 3)
+ {
+ const auto __xi = reinterpret_cast<int>(__x._M_data);
+ const auto __yi = reinterpret_cast<int>(__y._M_data);
+ return reinterpret_cast<__vector_type_t<_Tp, 3>>(
+ ((__xi * __yi) & 0xff)
+ | (((__xi >> 8) * (__yi & 0xff00)) & 0xff00)
+ | ((__xi >> 16) * (__yi & 0xff0000)));
+ }
+ else if constexpr (sizeof(_V) == 4)
+ {
+ const auto __xi = reinterpret_cast<int>(__x._M_data);
+ const auto __yi = reinterpret_cast<int>(__y._M_data);
+ return reinterpret_cast<__vector_type_t<_Tp, 4>>(
+ ((__xi * __yi) & 0xff)
+ | (((__xi >> 8) * (__yi & 0xff00)) & 0xff00)
+ | (((__xi >> 16) * (__yi & 0xff0000)) & 0xff0000)
+ | ((__xi >> 24) * (__yi & 0xff000000u)));
+ }
+ else if constexpr (sizeof(_V) == 8 && __have_avx2
+ && std::is_signed_v<_Tp>)
+ return __convert<typename _VVT::type>(
+ __vector_bitcast<short>(_mm_cvtepi8_epi16(__to_intrin(__x)))
+ * __vector_bitcast<short>(_mm_cvtepi8_epi16(__to_intrin(__y))));
+ else if constexpr (sizeof(_V) == 8 && __have_avx2
+ && std::is_unsigned_v<_Tp>)
+ return __convert<typename _VVT::type>(
+ __vector_bitcast<short>(_mm_cvtepu8_epi16(__to_intrin(__x)))
+ * __vector_bitcast<short>(_mm_cvtepu8_epi16(__to_intrin(__y))));
+ else
+ {
+ // codegen of `x*y` is suboptimal (as of GCC 9.0.1)
+ constexpr size_t __full_size = _VVT::_S_width;
+ constexpr int _Np = sizeof(_V) >= 16 ? __full_size / 2 : 8;
+ using _ShortW = _SimdWrapper<short, _Np>;
+ const _ShortW __even = __vector_bitcast<short, _Np>(__x)
+ * __vector_bitcast<short, _Np>(__y);
+ _ShortW __high_byte = _ShortW()._M_data - 256;
+ //[&]() { asm("" : "+x"(__high_byte._M_data)); }();
+ const _ShortW __odd
+ = (__vector_bitcast<short, _Np>(__x) >> 8)
+ * (__vector_bitcast<short, _Np>(__y) & __high_byte._M_data);
+ if constexpr (__have_avx512bw && sizeof(_V) > 2)
+ return _CommonImplX86::_S_blend_avx512(
+ 0xaaaa'aaaa'aaaa'aaaaLL, __vector_bitcast<_Tp>(__even),
+ __vector_bitcast<_Tp>(__odd));
+ else if constexpr (__have_sse4_1 && sizeof(_V) > 2)
+ return _CommonImplX86::_S_blend_intrin(__to_intrin(__high_byte),
+ __to_intrin(__even),
+ __to_intrin(__odd));
+ else
+ return __to_intrin(__or(__andnot(__high_byte, __even), __odd));
+ }
+ }
+ else
+ return _Base::__multiplies(__x, __y);
+ }
+
+ // }}}
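+  // Worked example for the byte-multiplication trick above, for one 16-bit
+  // lane holding the bytes {3, 5} of __x and {10, 20} of __y (little-endian,
+  // so the lane values are 0x0503 and 0x140a):
+  //
+  //   __even = 0x0503 * 0x140a        -> low byte  0x1e ==  30 == 3 * 10
+  //   __odd  = (0x0503 >> 8) * 0x1400 -> high byte 0x64 == 100 == 5 * 20
+  //
+  // Blending the low byte of __even with the high byte of __odd yields the
+  // bytes {30, 100}, i.e. the element-wise products (modulo 256).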
+ // __divides {{{
+#ifdef _GLIBCXX_SIMD_WORKAROUND_PR90993
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __divides(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y)
+ {
+ if (!__builtin_is_constant_evaluated()
+ && !__builtin_constant_p(__y._M_data))
+ if constexpr (is_integral_v<_Tp> && sizeof(_Tp) <= 4)
+ { // use divps - codegen of `x/y` is suboptimal (as of GCC 9.0.1)
+ // Note that using floating-point division is likely to raise the
+ // *Inexact* exception flag and thus appears like an invalid "as-if"
+ // transformation. However, C++ doesn't specify how the fpenv can be
+ // observed and points to C. C says that function calls are assumed to
+ // potentially raise fp exceptions, unless documented otherwise.
+ // Consequently, operator/, which is a function call, may raise fp
+ // exceptions.
+ /*const struct _CsrGuard
+ {
+ const unsigned _M_data = _mm_getcsr();
+ _CsrGuard()
+ {
+ _mm_setcsr(0x9f80); // turn off FP exceptions and flush-to-zero
+ }
+ ~_CsrGuard() { _mm_setcsr(_M_data); }
+ } __csr;*/
+ using _Float = conditional_t<sizeof(_Tp) == 4, double, float>;
+ constexpr size_t __n_intermediate
+ = std::min(_Np, (__have_avx512f ? 64 : __have_avx ? 32 : 16)
+ / sizeof(_Float));
+ using _FloatV = __vector_type_t<_Float, __n_intermediate>;
+ constexpr size_t __n_floatv = __div_roundup(_Np, __n_intermediate);
+ using _R = __vector_type_t<_Tp, _Np>;
+ const auto __xf = __convert_all<_FloatV, __n_floatv>(__x);
+ const auto __yf = __convert_all<_FloatV, __n_floatv>(
+ _Abi::__make_padding_nonzero(__as_vector(__y)));
+ return __call_with_n_evaluations<__n_floatv>(
+ [](auto... __quotients) {
+ return __vector_convert<_R>(__quotients...);
+ },
+ [&__xf, &__yf](auto __i) { return __xf[__i] / __yf[__i]; });
+ }
+ /* 64-bit int division is potentially optimizable via double division if
+ * the value in __x is small enough and the conversion between
+ * int<->double is efficient enough:
+ else if constexpr (is_integral_v<_Tp> && is_unsigned_v<_Tp> &&
+ sizeof(_Tp) == 8)
+ {
+ if constexpr (__have_sse4_1 && sizeof(__x) == 16)
+ {
+ if (_mm_test_all_zeros(__x, __m128i{0xffe0'0000'0000'0000ull,
+ 0xffe0'0000'0000'0000ull}))
+ {
+ __x._M_data | 0x __vector_convert<__m128d>(__x._M_data)
+ }
+ }
+ }
+ */
+ return _Base::__divides(__x, __y);
+ }
+#endif // _GLIBCXX_SIMD_WORKAROUND_PR90993
+
+ // }}}
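+  // A minimal sketch of the idea behind the divps/divpd path above for
+  // 4-element int vectors, using only GCC vector extensions (the names are
+  // hypothetical; double represents every 32-bit integer exactly, so the
+  // truncating conversion back reproduces C++ integer division):
+  //
+  //   using __v4si_sketch = int __attribute__((vector_size(16)));
+  //   using __v4df_sketch = double __attribute__((vector_size(32)));
+  //   __v4si_sketch
+  //   __divides_sketch(__v4si_sketch __x, __v4si_sketch __y)
+  //   {
+  //     const auto __xf = __builtin_convertvector(__x, __v4df_sketch);
+  //     const auto __yf = __builtin_convertvector(__y, __v4df_sketch);
+  //     return __builtin_convertvector(__xf / __yf, __v4si_sketch);
+  //   }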
+ // __modulus {{{
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __modulus(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y)
+ {
+ if (__builtin_is_constant_evaluated() || __builtin_constant_p(__y._M_data)
+ || sizeof(_Tp) >= 8)
+ return _Base::__modulus(__x, __y);
+ else
+ return _Base::__minus(__x, __multiplies(__y, __divides(__x, __y)));
+ }
+
+ // }}}
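+  // The fallback above relies on the identity x % y == x - y * (x / y) under
+  // C++'s truncating division, e.g. 7 % 3 == 7 - 3 * 2 == 1 and
+  // -7 % 3 == -7 - 3 * -2 == -1.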
+ // __bit_shift_left {{{
+ // Notes on UB. C++2a [expr.shift] says:
+ // -1- [...] The operands shall be of integral or unscoped enumeration type
+ // and integral promotions are performed. The type of the result is that
+ // of the promoted left operand. The behavior is undefined if the right
+ // operand is negative, or greater than or equal to the width of the
+ // promoted left operand.
+ // -2- The value of E1 << E2 is the unique value congruent to E1×2^E2 modulo
+ // 2^N, where N is the width of the type of the result.
+ //
+ // C++17 [expr.shift] says:
+ // -2- The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated
+ // bits are zero-filled. If E1 has an unsigned type, the value of the
+ // result is E1 × 2^E2 , reduced modulo one more than the maximum value
+ // representable in the result type. Otherwise, if E1 has a signed type
+ // and non-negative value, and E1 × 2^E2 is representable in the
+ // corresponding unsigned type of the result type, then that value,
+ // converted to the result type, is the resulting value; otherwise, the
+ // behavior is undefined.
+ //
+ // Consequences:
+ // With C++2a signed and unsigned types have the same UB
+ // characteristics:
+ // - left shift is not UB for 0 <= RHS < max(32, #bits(T))
+ //
+ // With C++17 there's little room for optimizations because the standard
+ // requires all shifts to happen on promoted integrals (i.e. int). Thus,
+ // short and char shifts must assume shifts affect bits of neighboring
+ // values.
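+  // Illustrative consequence for the byte shifts below: shifting all bytes
+  // via one wider shift lets bits leak into the neighboring byte, so the
+  // result must be masked. E.g. for one 16-bit lane holding the bytes
+  // {0x81, 0x01}:
+  //
+  //   0x0181 << 1 == 0x0302, i.e. the bytes {0x02, 0x03},
+  //
+  // whereas the desired per-byte result is {0x02, 0x02}; masking every byte
+  // with (0xff << 1) == 0xfe recovers it.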
+#ifndef _GLIBCXX_SIMD_NO_SHIFT_OPT
+ template <typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+ inline _GLIBCXX_CONST static typename _TVT::type __bit_shift_left(_Tp __xx,
+ int __y)
+ {
+ using _V = typename _TVT::type;
+ using _Up = typename _TVT::value_type;
+ _V __x = __xx;
+ [[maybe_unused]] const auto __ix = __to_intrin(__x);
+ if (__builtin_is_constant_evaluated())
+ return __x << __y;
+#if __cplusplus > 201703
+ // after C++17, signed shifts have no UB, and behave just like unsigned
+ // shifts
+ else if constexpr (sizeof(_Up) == 1 && is_signed_v<_Up>)
+ return __vector_bitcast<_Up>(
+ __bit_shift_left(__vector_bitcast<make_unsigned_t<_Up>>(__x), __y));
+#endif
+ else if constexpr (sizeof(_Up) == 1)
+ {
+ // (cf. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83894)
+ if (__builtin_constant_p(__y))
+ {
+ if (__y == 0)
+ return __x;
+ else if (__y == 1)
+ return __x + __x;
+ else if (__y == 2)
+ {
+ __x = __x + __x;
+ return __x + __x;
+ }
+ else if (__y > 2 && __y < 8)
+ {
+ if constexpr (sizeof(__x) > sizeof(unsigned))
+ {
+ const _UChar __mask = 0xff << __y; // precomputed vector
+ return __vector_bitcast<_Up>(
+ __vector_bitcast<_UChar>(__vector_bitcast<unsigned>(__x)
+ << __y)
+ & __mask);
+ }
+ else
+ {
+ const unsigned __mask
+ = (0xff & (0xff << __y)) * 0x01010101u;
+ return reinterpret_cast<_V>(
+ static_cast<__int_for_sizeof_t<_V>>(
+ unsigned(reinterpret_cast<__int_for_sizeof_t<_V>>(__x)
+ << __y)
+ & __mask));
+ }
+ }
+ else if (__y >= 8 && __y < 32)
+ return _V();
+ else
+ __builtin_unreachable();
+ }
+ // general strategy in the following: use an sllv instead of sll
+ // instruction, because it's 2 to 4 times faster:
+ else if constexpr (__have_avx512bw_vl && sizeof(__x) == 16)
+ return __vector_bitcast<_Up>(
+ _mm256_cvtepi16_epi8(_mm256_sllv_epi16(_mm256_cvtepi8_epi16(__ix),
+ _mm256_set1_epi16(__y))));
+ else if constexpr (__have_avx512bw && sizeof(__x) == 32)
+ return __vector_bitcast<_Up>(
+ _mm512_cvtepi16_epi8(_mm512_sllv_epi16(_mm512_cvtepi8_epi16(__ix),
+ _mm512_set1_epi16(__y))));
+ else if constexpr (__have_avx512bw && sizeof(__x) == 64)
+ {
+ const auto __shift = _mm512_set1_epi16(__y);
+ return __vector_bitcast<_Up>(
+ __concat(_mm512_cvtepi16_epi8(_mm512_sllv_epi16(
+ _mm512_cvtepi8_epi16(__lo256(__ix)), __shift)),
+ _mm512_cvtepi16_epi8(_mm512_sllv_epi16(
+ _mm512_cvtepi8_epi16(__hi256(__ix)), __shift))));
+ }
+ else if constexpr (__have_avx2 && sizeof(__x) == 32)
+ {
+#if 1
+ const auto __shift = _mm_cvtsi32_si128(__y);
+ auto __k
+ = _mm256_sll_epi16(_mm256_slli_epi16(~__m256i(), 8), __shift);
+ __k |= _mm256_srli_epi16(__k, 8);
+ return __vector_bitcast<_Up>(_mm256_sll_epi32(__ix, __shift) & __k);
+#else
+ const _Up __k = 0xff << __y;
+ return __vector_bitcast<_Up>(__vector_bitcast<int>(__x) << __y)
+ & __k;
+#endif
+ }
+ else
+ {
+ const auto __shift = _mm_cvtsi32_si128(__y);
+ auto __k = _mm_sll_epi16(_mm_slli_epi16(~__m128i(), 8), __shift);
+ __k |= _mm_srli_epi16(__k, 8);
+ return __intrin_bitcast<_V>(_mm_sll_epi16(__ix, __shift) & __k);
+ }
+ }
+ return __x << __y;
+ }
+
+ template <typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+ inline _GLIBCXX_CONST static typename _TVT::type
+ __bit_shift_left(_Tp __xx, typename _TVT::type __y)
+ {
+ using _V = typename _TVT::type;
+ using _Up = typename _TVT::value_type;
+ _V __x = __xx;
+ [[maybe_unused]] const auto __ix = __to_intrin(__x);
+ [[maybe_unused]] const auto __iy = __to_intrin(__y);
+ if (__builtin_is_constant_evaluated())
+ return __x << __y;
+#if __cplusplus > 201703
+      // since C++2a, left shifts of signed integers are defined for all
+      // values (given a valid shift count) and thus behave just like
+      // unsigned shifts
+ else if constexpr (is_signed_v<_Up>)
+ return __vector_bitcast<_Up>(
+ __bit_shift_left(__vector_bitcast<make_unsigned_t<_Up>>(__x),
+ __vector_bitcast<make_unsigned_t<_Up>>(__y)));
+#endif
+ else if constexpr (sizeof(_Up) == 1)
+ {
+ if constexpr (sizeof __ix == 64 && __have_avx512bw)
+ return __vector_bitcast<_Up>(
+ __concat(_mm512_cvtepi16_epi8(
+ _mm512_sllv_epi16(_mm512_cvtepu8_epi16(__lo256(__ix)),
+ _mm512_cvtepu8_epi16(__lo256(__iy)))),
+ _mm512_cvtepi16_epi8(_mm512_sllv_epi16(
+ _mm512_cvtepu8_epi16(__hi256(__ix)),
+ _mm512_cvtepu8_epi16(__hi256(__iy))))));
+ else if constexpr (sizeof __ix == 32 && __have_avx512bw)
+ return __vector_bitcast<_Up>(_mm512_cvtepi16_epi8(
+ _mm512_sllv_epi16(_mm512_cvtepu8_epi16(__ix),
+ _mm512_cvtepu8_epi16(__iy))));
+ else if constexpr (sizeof __x <= 8 && __have_avx512bw_vl)
+ return __intrin_bitcast<_V>(_mm_cvtepi16_epi8(
+ _mm_sllv_epi16(_mm_cvtepu8_epi16(__ix), _mm_cvtepu8_epi16(__iy))));
+ else if constexpr (sizeof __ix == 16 && __have_avx512bw_vl)
+ return __intrin_bitcast<_V>(_mm256_cvtepi16_epi8(
+ _mm256_sllv_epi16(_mm256_cvtepu8_epi16(__ix),
+ _mm256_cvtepu8_epi16(__iy))));
+ else if constexpr (sizeof __ix == 16 && __have_avx512bw)
+ return __intrin_bitcast<_V>(
+ __lo128(_mm512_cvtepi16_epi8(_mm512_sllv_epi16(
+ _mm512_cvtepu8_epi16(_mm256_castsi128_si256(__ix)),
+ _mm512_cvtepu8_epi16(_mm256_castsi128_si256(__iy))))));
+ else if constexpr (__have_sse4_1 && sizeof(__x) == 16)
+ {
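+	    // Apply the shift bit by bit: the byte blend (_S_blend_intrin,
+	    // here pblendvb) selects on each mask byte's MSB, so the count
+	    // bits of __y (values 4, 2, 1) are moved there in turn and the
+	    // lanes are conditionally replaced by copies shifted by 4, 2,
+	    // and 1.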
+ auto __mask
+ = __vector_bitcast<_Up>(__vector_bitcast<short>(__y) << 5);
+ auto __x4
+ = __vector_bitcast<_Up>(__vector_bitcast<short>(__x) << 4);
+ __x4 &= char(0xf0);
+ __x = reinterpret_cast<_V>(_CommonImplX86::_S_blend_intrin(
+ __to_intrin(__mask), __to_intrin(__x), __to_intrin(__x4)));
+ __mask += __mask;
+ auto __x2
+ = __vector_bitcast<_Up>(__vector_bitcast<short>(__x) << 2);
+ __x2 &= char(0xfc);
+ __x = reinterpret_cast<_V>(_CommonImplX86::_S_blend_intrin(
+ __to_intrin(__mask), __to_intrin(__x), __to_intrin(__x2)));
+ __mask += __mask;
+ auto __x1 = __x + __x;
+ __x = reinterpret_cast<_V>(_CommonImplX86::_S_blend_intrin(
+ __to_intrin(__mask), __to_intrin(__x), __to_intrin(__x1)));
+ return __x & ((__y & char(0xf8)) == 0); // y > 7 nulls the result
+ }
+ else if constexpr (sizeof(__x) == 16)
+ {
+ auto __mask
+ = __vector_bitcast<_UChar>(__vector_bitcast<short>(__y) << 5);
+ auto __x4
+ = __vector_bitcast<_Up>(__vector_bitcast<short>(__x) << 4);
+ __x4 &= char(0xf0);
+ __x = __vector_bitcast<_SChar>(__mask) < 0 ? __x4 : __x;
+ __mask += __mask;
+ auto __x2
+ = __vector_bitcast<_Up>(__vector_bitcast<short>(__x) << 2);
+ __x2 &= char(0xfc);
+ __x = __vector_bitcast<_SChar>(__mask) < 0 ? __x2 : __x;
+ __mask += __mask;
+ auto __x1 = __x + __x;
+ __x = __vector_bitcast<_SChar>(__mask) < 0 ? __x1 : __x;
+ return __x & ((__y & char(0xf8)) == 0); // y > 7 nulls the result
+ }
+ else
+ return __x << __y;
+ }
+ else if constexpr (sizeof(_Up) == 2)
+ {
+ if constexpr (sizeof __ix == 64 && __have_avx512bw)
+ return __vector_bitcast<_Up>(_mm512_sllv_epi16(__ix, __iy));
+ else if constexpr (sizeof __ix == 32 && __have_avx512bw_vl)
+ return __vector_bitcast<_Up>(_mm256_sllv_epi16(__ix, __iy));
+ else if constexpr (sizeof __ix == 32 && __have_avx512bw)
+ return __vector_bitcast<_Up>(
+ __lo256(_mm512_sllv_epi16(_mm512_castsi256_si512(__ix),
+ _mm512_castsi256_si512(__iy))));
+ else if constexpr (sizeof __ix == 32 && __have_avx2)
+ {
+ const auto __ux = __vector_bitcast<unsigned>(__x);
+ const auto __uy = __vector_bitcast<unsigned>(__y);
+ return __vector_bitcast<_Up>(_mm256_blend_epi16(
+ __auto_bitcast(__ux << (__uy & 0x0000ffffu)),
+ __auto_bitcast((__ux & 0xffff0000u) << (__uy >> 16)), 0xaa));
+ }
+ else if constexpr (sizeof __ix == 16 && __have_avx512bw_vl)
+ return __intrin_bitcast<_V>(_mm_sllv_epi16(__ix, __iy));
+ else if constexpr (sizeof __ix == 16 && __have_avx512bw)
+ return __intrin_bitcast<_V>(
+ __lo128(_mm512_sllv_epi16(_mm512_castsi128_si512(__ix),
+ _mm512_castsi128_si512(__iy))));
+ else if constexpr (sizeof __ix == 16 && __have_avx2)
+ {
+ const auto __ux = __vector_bitcast<unsigned>(__ix);
+ const auto __uy = __vector_bitcast<unsigned>(__iy);
+ return __intrin_bitcast<_V>(_mm_blend_epi16(
+ __auto_bitcast(__ux << (__uy & 0x0000ffffu)),
+ __auto_bitcast((__ux & 0xffff0000u) << (__uy >> 16)), 0xaa));
+ }
+ else if constexpr (sizeof __ix == 16)
+ {
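+	    // No 16-bit variable shift is available here; build 2^__y per
+	    // lane through the float exponent field (add the bias 127, shift
+	    // into the exponent bits, convert to int) and multiply, handling
+	    // the even and odd 16-bit lanes separately.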
+ __y += 0x3f8 >> 3;
+ return __x
+ * __intrin_bitcast<_V>(
+ __vector_convert<__vector_type16_t<int>>(
+ __vector_bitcast<float>(
+ __vector_bitcast<unsigned>(__to_intrin(__y)) << 23))
+ | (__vector_convert<__vector_type16_t<int>>(
+ __vector_bitcast<float>(
+ (__vector_bitcast<unsigned>(__to_intrin(__y)) >> 16)
+ << 23))
+ << 16));
+ }
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (sizeof(_Up) == 4 && sizeof __ix == 16 && !__have_avx2)
+ // latency is suboptimal, but throughput is at full speedup
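+      // (__y << 23) + 0x3f80'0000 is the IEEE-754 bit pattern of the float
+      // 2^__y; converting it to int yields the factor for the multiply.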
+ return __intrin_bitcast<_V>(
+ __vector_bitcast<unsigned>(__ix)
+ * __vector_convert<__vector_type16_t<int>>(__vector_bitcast<float>(
+ (__vector_bitcast<unsigned, 4>(__y) << 23) + 0x3f80'0000)));
+ else if constexpr (sizeof(_Up) == 8 && sizeof __ix == 16 && !__have_avx2)
+ {
+ const auto __lo = _mm_sll_epi64(__ix, __iy);
+ const auto __hi = _mm_sll_epi64(__ix, _mm_unpackhi_epi64(__iy, __iy));
+ if constexpr (__have_sse4_1)
+ return __vector_bitcast<_Up>(_mm_blend_epi16(__lo, __hi, 0xf0));
+ else
+ return __vector_bitcast<_Up>(
+ _mm_move_sd(__vector_bitcast<double>(__hi),
+ __vector_bitcast<double>(__lo)));
+ }
+ else
+ return __x << __y;
+ }
+#endif // _GLIBCXX_SIMD_NO_SHIFT_OPT
+
+ // }}}
+ // __bit_shift_right {{{
+#ifndef _GLIBCXX_SIMD_NO_SHIFT_OPT
+ template <typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+ inline _GLIBCXX_CONST static typename _TVT::type __bit_shift_right(_Tp __xx,
+ int __y)
+ {
+ using _V = typename _TVT::type;
+ using _Up = typename _TVT::value_type;
+ _V __x = __xx;
+ [[maybe_unused]] const auto __ix = __to_intrin(__x);
+ if (__builtin_is_constant_evaluated())
+ return __x >> __y;
+ else if (__builtin_constant_p(__y)
+ && std::is_unsigned_v<_Up> && __y >= int(sizeof(_Up) * CHAR_BIT))
+ return _V();
+ else if constexpr (sizeof(_Up) == 1 && is_unsigned_v<_Up>) //{{{
+ return __intrin_bitcast<_V>(__vector_bitcast<_UShort>(__ix) >> __y)
+ & _Up(0xff >> __y);
+ //}}}
+ else if constexpr (sizeof(_Up) == 1 && is_signed_v<_Up>) //{{{
+ return __intrin_bitcast<_V>(
+ (__vector_bitcast<_UShort>(__vector_bitcast<short>(__ix) >> (__y + 8))
+ << 8)
+ | (__vector_bitcast<_UShort>(
+ __vector_bitcast<short>(__vector_bitcast<_UShort>(__ix) << 8)
+ >> __y)
+ >> 8));
+ //}}}
+ // GCC optimizes sizeof == 2, 4, and unsigned 8 as expected
+ else if constexpr (sizeof(_Up) == 8 && is_signed_v<_Up>) //{{{
+ {
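+	// There is no 64-bit arithmetic right shift before AVX-512; compose
+	// it from a logical 64-bit shift and an arithmetic 32-bit shift that
+	// supplies the sign extension.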
+ if (__y > 32)
+ return (__intrin_bitcast<_V>(__vector_bitcast<int>(__ix) >> 32)
+ & _Up(0xffff'ffff'0000'0000ull))
+ | __vector_bitcast<_Up>(
+ __vector_bitcast<int>(__vector_bitcast<_ULLong>(__ix) >> 32)
+ >> (__y - 32));
+ else
+ return __intrin_bitcast<_V>(__vector_bitcast<_ULLong>(__ix) >> __y)
+ | __vector_bitcast<_Up>(
+ __vector_bitcast<int>(__ix & -0x8000'0000'0000'0000ll)
+ >> __y);
+ }
+ //}}}
+ else
+ return __x >> __y;
+ }
+
+ template <typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+ inline _GLIBCXX_CONST static typename _TVT::type
+ __bit_shift_right(_Tp __xx, typename _TVT::type __y)
+ {
+ using _V = typename _TVT::type;
+ using _Up = typename _TVT::value_type;
+ _V __x = __xx;
+ [[maybe_unused]] const auto __ix = __to_intrin(__x);
+ [[maybe_unused]] const auto __iy = __to_intrin(__y);
+ if (__builtin_is_constant_evaluated()
+ || (__builtin_constant_p(__x) && __builtin_constant_p(__y)))
+ return __x >> __y;
+ else if constexpr (sizeof(_Up) == 1) //{{{
+ {
+ if constexpr (sizeof(__x) <= 8 && __have_avx512bw_vl)
+ return __intrin_bitcast<_V>(_mm_cvtepi16_epi8(
+ is_signed_v<_Up>
+ ? _mm_srav_epi16(_mm_cvtepi8_epi16(__ix), _mm_cvtepi8_epi16(__iy))
+ : _mm_srlv_epi16(_mm_cvtepu8_epi16(__ix),
+ _mm_cvtepu8_epi16(__iy))));
+ if constexpr (sizeof(__x) == 16 && __have_avx512bw_vl)
+ return __intrin_bitcast<_V>(_mm256_cvtepi16_epi8(
+ is_signed_v<_Up> ? _mm256_srav_epi16(_mm256_cvtepi8_epi16(__ix),
+ _mm256_cvtepi8_epi16(__iy))
+ : _mm256_srlv_epi16(_mm256_cvtepu8_epi16(__ix),
+ _mm256_cvtepu8_epi16(__iy))));
+ else if constexpr (sizeof(__x) == 32 && __have_avx512bw)
+ return __vector_bitcast<_Up>(_mm512_cvtepi16_epi8(
+ is_signed_v<_Up> ? _mm512_srav_epi16(_mm512_cvtepi8_epi16(__ix),
+ _mm512_cvtepi8_epi16(__iy))
+ : _mm512_srlv_epi16(_mm512_cvtepu8_epi16(__ix),
+ _mm512_cvtepu8_epi16(__iy))));
+ else if constexpr (sizeof(__x) == 64 && is_signed_v<_Up>)
+ return __vector_bitcast<_Up>(_mm512_mask_mov_epi8(
+ _mm512_srav_epi16(__ix, _mm512_srli_epi16(__iy, 8)),
+ 0x5555'5555'5555'5555ull,
+ _mm512_srav_epi16(_mm512_slli_epi16(__ix, 8),
+ _mm512_maskz_add_epi8(0x5555'5555'5555'5555ull,
+ __iy,
+ _mm512_set1_epi16(8)))));
+ else if constexpr (sizeof(__x) == 64 && is_unsigned_v<_Up>)
+ return __vector_bitcast<_Up>(_mm512_mask_mov_epi8(
+ _mm512_srlv_epi16(__ix, _mm512_srli_epi16(__iy, 8)),
+ 0x5555'5555'5555'5555ull,
+ _mm512_srlv_epi16(
+ _mm512_maskz_mov_epi8(0x5555'5555'5555'5555ull, __ix),
+ _mm512_maskz_mov_epi8(0x5555'5555'5555'5555ull, __iy))));
+ /* This has better throughput but higher latency than the impl below
+ else if constexpr (__have_avx2 && sizeof(__x) == 16 &&
+ is_unsigned_v<_Up>)
+ {
+ const auto __shorts = __to_intrin(__bit_shift_right(
+ __vector_bitcast<_UShort>(_mm256_cvtepu8_epi16(__ix)),
+ __vector_bitcast<_UShort>(_mm256_cvtepu8_epi16(__iy))));
+ return __vector_bitcast<_Up>(
+ _mm_packus_epi16(__lo128(__shorts), __hi128(__shorts)));
+ }
+ */
+ else if constexpr (__have_avx2 && sizeof(__x) > 8)
+ // the following uses vpsr[al]vd, which requires AVX2
+ if constexpr (is_signed_v<_Up>)
+ {
+	    const auto __r3 = __vector_bitcast<_UInt>(
+				(__vector_bitcast<int>(__x)
+				 >> (__vector_bitcast<_UInt>(__y) >> 24)))
+			      & 0xff000000u;
+	    const auto __r2 = __vector_bitcast<_UInt>((
+				(__vector_bitcast<int>(__x) << 8)
+				>> ((__vector_bitcast<_UInt>(__y) << 8) >> 24)))
+			      & 0xff000000u;
+	    const auto __r1
+	      = __vector_bitcast<_UInt>(
+		  ((__vector_bitcast<int>(__x) << 16)
+		   >> ((__vector_bitcast<_UInt>(__y) << 16) >> 24)))
+		& 0xff000000u;
+	    const auto __r0 = __vector_bitcast<_UInt>(
+	      (__vector_bitcast<int>(__x) << 24)
+	      >> ((__vector_bitcast<_UInt>(__y) << 24) >> 24));
+	    return __vector_bitcast<_Up>(__r3 | (__r2 >> 8) | (__r1 >> 16)
+					 | (__r0 >> 24));
+ }
+ else
+ {
+	    const auto __r3 = (__vector_bitcast<_UInt>(__x)
+			       >> (__vector_bitcast<_UInt>(__y) >> 24))
+			      & 0xff000000u;
+	    const auto __r2 = ((__vector_bitcast<_UInt>(__x) << 8)
+			       >> ((__vector_bitcast<_UInt>(__y) << 8) >> 24))
+			      & 0xff000000u;
+	    const auto __r1 = ((__vector_bitcast<_UInt>(__x) << 16)
+			       >> ((__vector_bitcast<_UInt>(__y) << 16) >> 24))
+			      & 0xff000000u;
+	    const auto __r0 = (__vector_bitcast<_UInt>(__x) << 24)
+			      >> ((__vector_bitcast<_UInt>(__y) << 24) >> 24);
+	    return __vector_bitcast<_Up>(__r3 | (__r2 >> 8) | (__r1 >> 16)
+					 | (__r0 >> 24));
+ }
+ else if constexpr (__have_sse4_1
+ && is_unsigned_v<_Up> && sizeof(__x) > 2)
+ {
+ auto __x128 = __vector_bitcast<_Up>(__ix);
+ auto __mask
+ = __vector_bitcast<_Up>(__vector_bitcast<_UShort>(__iy) << 5);
+ auto __x4 = __vector_bitcast<_Up>(
+ (__vector_bitcast<_UShort>(__x128) >> 4) & _UShort(0xff0f));
+ __x128 = __vector_bitcast<_Up>(_CommonImplX86::_S_blend_intrin(
+ __to_intrin(__mask), __to_intrin(__x128), __to_intrin(__x4)));
+ __mask += __mask;
+ auto __x2 = __vector_bitcast<_Up>(
+ (__vector_bitcast<_UShort>(__x128) >> 2) & _UShort(0xff3f));
+ __x128 = __vector_bitcast<_Up>(_CommonImplX86::_S_blend_intrin(
+ __to_intrin(__mask), __to_intrin(__x128), __to_intrin(__x2)));
+ __mask += __mask;
+ auto __x1 = __vector_bitcast<_Up>(
+ (__vector_bitcast<_UShort>(__x128) >> 1) & _UShort(0xff7f));
+ __x128 = __vector_bitcast<_Up>(_CommonImplX86::_S_blend_intrin(
+ __to_intrin(__mask), __to_intrin(__x128), __to_intrin(__x1)));
+ return __intrin_bitcast<_V>(
+ __x128
+ & ((__vector_bitcast<_Up>(__iy) & char(0xf8))
+ == 0)); // y > 7 nulls the result
+ }
+ else if constexpr (__have_sse4_1 && is_signed_v<_Up> && sizeof(__x) > 2)
+ {
+ auto __mask
+ = __vector_bitcast<_UChar>(__vector_bitcast<_UShort>(__iy) << 5);
+ auto __maskl = [&]() {
+ return __to_intrin(__vector_bitcast<_UShort>(__mask) << 8);
+ };
+ auto __xh = __vector_bitcast<short>(__ix);
+ auto __xl = __vector_bitcast<short>(__ix) << 8;
+ auto __xh4 = __xh >> 4;
+ auto __xl4 = __xl >> 4;
+ __xh = __vector_bitcast<short>(_CommonImplX86::_S_blend_intrin(
+ __to_intrin(__mask), __to_intrin(__xh), __to_intrin(__xh4)));
+ __xl = __vector_bitcast<short>(
+ _CommonImplX86::_S_blend_intrin(__maskl(), __to_intrin(__xl),
+ __to_intrin(__xl4)));
+ __mask += __mask;
+ auto __xh2 = __xh >> 2;
+ auto __xl2 = __xl >> 2;
+ __xh = __vector_bitcast<short>(_CommonImplX86::_S_blend_intrin(
+ __to_intrin(__mask), __to_intrin(__xh), __to_intrin(__xh2)));
+ __xl = __vector_bitcast<short>(
+ _CommonImplX86::_S_blend_intrin(__maskl(), __to_intrin(__xl),
+ __to_intrin(__xl2)));
+ __mask += __mask;
+ auto __xh1 = __xh >> 1;
+ auto __xl1 = __xl >> 1;
+ __xh = __vector_bitcast<short>(_CommonImplX86::_S_blend_intrin(
+ __to_intrin(__mask), __to_intrin(__xh), __to_intrin(__xh1)));
+ __xl = __vector_bitcast<short>(
+ _CommonImplX86::_S_blend_intrin(__maskl(), __to_intrin(__xl),
+ __to_intrin(__xl1)));
+ return __intrin_bitcast<_V>(
+ (__vector_bitcast<_Up>((__xh & short(0xff00)))
+ | __vector_bitcast<_Up>(__vector_bitcast<_UShort>(__xl) >> 8))
+ & ((__vector_bitcast<_Up>(__iy) & char(0xf8))
+ == 0)); // y > 7 nulls the result
+ }
+ else if constexpr (is_unsigned_v<_Up> && sizeof(__x) > 2) // SSE2
+ {
+ auto __mask
+ = __vector_bitcast<_Up>(__vector_bitcast<_UShort>(__y) << 5);
+ auto __x4 = __vector_bitcast<_Up>(
+ (__vector_bitcast<_UShort>(__x) >> 4) & _UShort(0xff0f));
+ __x = __mask > 0x7f ? __x4 : __x;
+ __mask += __mask;
+ auto __x2 = __vector_bitcast<_Up>(
+ (__vector_bitcast<_UShort>(__x) >> 2) & _UShort(0xff3f));
+ __x = __mask > 0x7f ? __x2 : __x;
+ __mask += __mask;
+ auto __x1 = __vector_bitcast<_Up>(
+ (__vector_bitcast<_UShort>(__x) >> 1) & _UShort(0xff7f));
+ __x = __mask > 0x7f ? __x1 : __x;
+ return __x & ((__y & char(0xf8)) == 0); // y > 7 nulls the result
+ }
+ else if constexpr (sizeof(__x) > 2) // signed SSE2
+ {
+ static_assert(is_signed_v<_Up>);
+ auto __maskh = __vector_bitcast<_UShort>(__y) << 5;
+ auto __maskl = __vector_bitcast<_UShort>(__y) << (5 + 8);
+ auto __xh = __vector_bitcast<short>(__x);
+ auto __xl = __vector_bitcast<short>(__x) << 8;
+ auto __xh4 = __xh >> 4;
+ auto __xl4 = __xl >> 4;
+ __xh = __maskh > 0x7fff ? __xh4 : __xh;
+ __xl = __maskl > 0x7fff ? __xl4 : __xl;
+ __maskh += __maskh;
+ __maskl += __maskl;
+ auto __xh2 = __xh >> 2;
+ auto __xl2 = __xl >> 2;
+ __xh = __maskh > 0x7fff ? __xh2 : __xh;
+ __xl = __maskl > 0x7fff ? __xl2 : __xl;
+ __maskh += __maskh;
+ __maskl += __maskl;
+ auto __xh1 = __xh >> 1;
+ auto __xl1 = __xl >> 1;
+ __xh = __maskh > 0x7fff ? __xh1 : __xh;
+ __xl = __maskl > 0x7fff ? __xl1 : __xl;
+ __x = __vector_bitcast<_Up>((__xh & short(0xff00)))
+ | __vector_bitcast<_Up>(__vector_bitcast<_UShort>(__xl) >> 8);
+ return __x & ((__y & char(0xf8)) == 0); // y > 7 nulls the result
+ }
+ else
+ return __x >> __y;
+ } //}}}
+ else if constexpr (sizeof(_Up) == 2 && sizeof(__x) >= 4) //{{{
+ {
+ [[maybe_unused]] auto __blend_0xaa = [](auto __a, auto __b) {
+ if constexpr (sizeof(__a) == 16)
+ return _mm_blend_epi16(__to_intrin(__a), __to_intrin(__b), 0xaa);
+ else if constexpr (sizeof(__a) == 32)
+ return _mm256_blend_epi16(__to_intrin(__a), __to_intrin(__b), 0xaa);
+ else if constexpr (sizeof(__a) == 64)
+ return _mm512_mask_blend_epi16(0xaaaa'aaaaU, __to_intrin(__a),
+ __to_intrin(__b));
+ else
+ __assert_unreachable<decltype(__a)>();
+ };
+ if constexpr (__have_avx512bw_vl && sizeof(_Tp) <= 16)
+ return __intrin_bitcast<_V>(is_signed_v<_Up>
+ ? _mm_srav_epi16(__ix, __iy)
+ : _mm_srlv_epi16(__ix, __iy));
+ else if constexpr (__have_avx512bw_vl && sizeof(_Tp) == 32)
+ return __vector_bitcast<_Up>(is_signed_v<_Up>
+ ? _mm256_srav_epi16(__ix, __iy)
+ : _mm256_srlv_epi16(__ix, __iy));
+ else if constexpr (__have_avx512bw && sizeof(_Tp) == 64)
+ return __vector_bitcast<_Up>(is_signed_v<_Up>
+ ? _mm512_srav_epi16(__ix, __iy)
+ : _mm512_srlv_epi16(__ix, __iy));
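+	  // Without AVX-512BW only 32-bit variable shifts (vpsravd/vpsrlvd)
+	  // exist: shift the even and odd 16-bit lanes inside 32-bit
+	  // containers and re-blend the halves.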
+ else if constexpr (__have_avx2 && is_signed_v<_Up>)
+ return __intrin_bitcast<_V>(
+ __blend_0xaa(((__vector_bitcast<int>(__ix) << 16)
+ >> (__vector_bitcast<int>(__iy) & 0xffffu))
+ >> 16,
+ __vector_bitcast<int>(__ix)
+ >> (__vector_bitcast<int>(__iy) >> 16)));
+ else if constexpr (__have_avx2 && is_unsigned_v<_Up>)
+ return __intrin_bitcast<_V>(
+ __blend_0xaa((__vector_bitcast<_UInt>(__ix) & 0xffffu)
+ >> (__vector_bitcast<_UInt>(__iy) & 0xffffu),
+ __vector_bitcast<_UInt>(__ix)
+ >> (__vector_bitcast<_UInt>(__iy) >> 16)));
+ else if constexpr (__have_sse4_1)
+ {
+ auto __mask = __vector_bitcast<_UShort>(__iy);
+ auto __x128 = __vector_bitcast<_Up>(__ix);
+ //__mask *= 0x0808;
+ __mask = (__mask << 3) | (__mask << 11);
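+	    // This places count bit 4 (i.e. __y >= 16) in the MSB of both
+	    // bytes of each 16-bit lane; every subsequent `__mask += __mask`
+	    // exposes the next lower count bit in the byte MSBs tested by
+	    // pblendvb.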
+ // do __x128 = 0 where __y[4] is set
+ __x128 = __vector_bitcast<_Up>(
+ _mm_blendv_epi8(__to_intrin(__x128), __m128i(),
+ __to_intrin(__mask)));
+ // do __x128 =>> 8 where __y[3] is set
+ __x128 = __vector_bitcast<_Up>(
+ _mm_blendv_epi8(__to_intrin(__x128), __to_intrin(__x128 >> 8),
+ __to_intrin(__mask += __mask)));
+ // do __x128 =>> 4 where __y[2] is set
+ __x128 = __vector_bitcast<_Up>(
+ _mm_blendv_epi8(__to_intrin(__x128), __to_intrin(__x128 >> 4),
+ __to_intrin(__mask += __mask)));
+ // do __x128 =>> 2 where __y[1] is set
+ __x128 = __vector_bitcast<_Up>(
+ _mm_blendv_epi8(__to_intrin(__x128), __to_intrin(__x128 >> 2),
+ __to_intrin(__mask += __mask)));
+ // do __x128 =>> 1 where __y[0] is set
+ return __intrin_bitcast<_V>(
+ _mm_blendv_epi8(__to_intrin(__x128), __to_intrin(__x128 >> 1),
+ __to_intrin(__mask + __mask)));
+ }
+ else
+ {
+ auto __k = __vector_bitcast<_UShort>(__iy) << 11;
+ auto __x128 = __vector_bitcast<_Up>(__ix);
+ auto __mask = [](__vector_type16_t<_UShort> __kk) {
+ return __vector_bitcast<short>(__kk) < 0;
+ };
+ // do __x128 = 0 where __y[4] is set
+ __x128 = __mask(__k) ? decltype(__x128)() : __x128;
+ // do __x128 =>> 8 where __y[3] is set
+ __x128 = __mask(__k += __k) ? __x128 >> 8 : __x128;
+ // do __x128 =>> 4 where __y[2] is set
+ __x128 = __mask(__k += __k) ? __x128 >> 4 : __x128;
+ // do __x128 =>> 2 where __y[1] is set
+ __x128 = __mask(__k += __k) ? __x128 >> 2 : __x128;
+ // do __x128 =>> 1 where __y[0] is set
+ return __intrin_bitcast<_V>(__mask(__k + __k) ? __x128 >> 1
+ : __x128);
+ }
+ } //}}}
+ else if constexpr (sizeof(_Up) == 4 && !__have_avx2) //{{{
+ {
+ if constexpr (is_unsigned_v<_Up>)
+ {
+ // x >> y == x * 2^-y == (x * 2^(31-y)) >> 31
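+	    // 0x4f00'0000 is the bit pattern of the float 2^31; subtracting
+	    // (__y << 23) turns it into 2^(31-__y). pmuludq then produces
+	    // 64-bit products from which x >> y is extracted with a 31-bit
+	    // shift.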
+ const __m128 __factor_f = reinterpret_cast<__m128>(
+ 0x4f00'0000u - (__vector_bitcast<unsigned, 4>(__y) << 23));
+ const __m128i __factor
+ = __builtin_constant_p(__factor_f) ? __to_intrin(
+ __make_vector<unsigned>(__factor_f[0], __factor_f[1],
+ __factor_f[2], __factor_f[3]))
+ : _mm_cvttps_epi32(__factor_f);
+ const auto __r02
+ = _mm_srli_epi64(_mm_mul_epu32(__ix, __factor), 31);
+ const auto __r13 = _mm_mul_epu32(_mm_srli_si128(__ix, 4),
+ _mm_srli_si128(__factor, 4));
+ if constexpr (__have_sse4_1)
+ return __intrin_bitcast<_V>(
+ _mm_blend_epi16(_mm_slli_epi64(__r13, 1), __r02, 0x33));
+ else
+ return __intrin_bitcast<_V>(
+ __r02 | _mm_slli_si128(_mm_srli_epi64(__r13, 31), 4));
+ }
+ else
+ {
+ auto __shift = [](auto __a, auto __b) {
+ if constexpr (is_signed_v<_Up>)
+ return _mm_sra_epi32(__a, __b);
+ else
+ return _mm_srl_epi32(__a, __b);
+ };
+ const auto __r0
+ = __shift(__ix, _mm_unpacklo_epi32(__iy, __m128i()));
+ const auto __r1 = __shift(__ix, _mm_srli_epi64(__iy, 32));
+ const auto __r2
+ = __shift(__ix, _mm_unpackhi_epi32(__iy, __m128i()));
+ const auto __r3 = __shift(__ix, _mm_srli_si128(__iy, 12));
+ if constexpr (__have_sse4_1)
+ return __intrin_bitcast<_V>(
+ _mm_blend_epi16(_mm_blend_epi16(__r1, __r0, 0x3),
+ _mm_blend_epi16(__r3, __r2, 0x30), 0xf0));
+ else
+ return __intrin_bitcast<_V>(_mm_unpacklo_epi64(
+ _mm_unpacklo_epi32(__r0, _mm_srli_si128(__r1, 4)),
+ _mm_unpackhi_epi32(__r2, _mm_srli_si128(__r3, 4))));
+ }
+ } //}}}
+ else
+ return __x >> __y;
+ }
+#endif // _GLIBCXX_SIMD_NO_SHIFT_OPT
+
+ // }}}
+ // compares {{{
+ // __equal_to {{{
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp>
+ __equal_to(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y)
+ {
+ if constexpr (__is_avx512_abi<_Abi>()) // {{{
+ {
+ if (__builtin_is_constant_evaluated()
+ || (__x._M_is_constprop() && __y._M_is_constprop()))
+ return _MaskImpl::__to_bits(_SimdWrapper<_Tp, _Np>(
+ __vector_bitcast<_Tp>(__x._M_data == __y._M_data)));
+
+ constexpr auto __k1 = _Abi::template __implicit_mask<_Tp>();
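+	  // __k1 has bits set only for the elements that actually belong to
+	  // the simd object, so padding lanes of a partial register never
+	  // produce mask bits.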
+ [[maybe_unused]] const auto __xi = __to_intrin(__x);
+ [[maybe_unused]] const auto __yi = __to_intrin(__y);
+ if constexpr (std::is_floating_point_v<_Tp>)
+ {
+ if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 8)
+ return _mm512_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_EQ_OQ);
+ else if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 4)
+ return _mm512_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_EQ_OQ);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 8)
+ return _mm256_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_EQ_OQ);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 4)
+ return _mm256_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_EQ_OQ);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 8)
+ return _mm_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_EQ_OQ);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 4)
+ return _mm_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_EQ_OQ);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 8)
+ return _mm512_mask_cmpeq_epi64_mask(__k1, __xi, __yi);
+ else if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 4)
+ return _mm512_mask_cmpeq_epi32_mask(__k1, __xi, __yi);
+ else if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 2)
+ return _mm512_mask_cmpeq_epi16_mask(__k1, __xi, __yi);
+ else if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 1)
+ return _mm512_mask_cmpeq_epi8_mask(__k1, __xi, __yi);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 8)
+ return _mm256_mask_cmpeq_epi64_mask(__k1, __xi, __yi);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 4)
+ return _mm256_mask_cmpeq_epi32_mask(__k1, __xi, __yi);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 2)
+ return _mm256_mask_cmpeq_epi16_mask(__k1, __xi, __yi);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 1)
+ return _mm256_mask_cmpeq_epi8_mask(__k1, __xi, __yi);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 8)
+ return _mm_mask_cmpeq_epi64_mask(__k1, __xi, __yi);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 4)
+ return _mm_mask_cmpeq_epi32_mask(__k1, __xi, __yi);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 2)
+ return _mm_mask_cmpeq_epi16_mask(__k1, __xi, __yi);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 1)
+ return _mm_mask_cmpeq_epi8_mask(__k1, __xi, __yi);
+ else
+ __assert_unreachable<_Tp>();
+ } // }}}
+      else if constexpr (sizeof(__x) == 8) // {{{
+	{
+	  if (__builtin_is_constant_evaluated())
+	    return _Base::__equal_to(__x, __y);
+	  const auto __r128 = __vector_bitcast<_Tp, 16 / sizeof(_Tp)>(__x)
+			      == __vector_bitcast<_Tp, 16 / sizeof(_Tp)>(__y);
+	  _MaskMember<_Tp> __r64;
+	  __builtin_memcpy(&__r64._M_data, &__r128, sizeof(__r64));
+	  return __r64;
+	} // }}}
+ else
+ return _Base::__equal_to(__x, __y);
+ }
+
+ // }}}
+ // __not_equal_to {{{
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp>
+ __not_equal_to(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y)
+ {
+ if constexpr (__is_avx512_abi<_Abi>()) // {{{
+ {
+ if (__builtin_is_constant_evaluated()
+ || (__x._M_is_constprop() && __y._M_is_constprop()))
+ return _MaskImpl::__to_bits(_SimdWrapper<_Tp, _Np>(
+ __vector_bitcast<_Tp>(__x._M_data != __y._M_data)));
+
+ constexpr auto __k1 = _Abi::template __implicit_mask<_Tp>();
+ [[maybe_unused]] const auto __xi = __to_intrin(__x);
+ [[maybe_unused]] const auto __yi = __to_intrin(__y);
+ if constexpr (std::is_floating_point_v<_Tp>)
+ {
+ if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 8)
+ return _mm512_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_NEQ_UQ);
+ else if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 4)
+ return _mm512_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_NEQ_UQ);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 8)
+ return _mm256_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_NEQ_UQ);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 4)
+ return _mm256_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_NEQ_UQ);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 8)
+ return _mm_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_NEQ_UQ);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 4)
+ return _mm_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_NEQ_UQ);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 8)
+ return ~_mm512_mask_cmpeq_epi64_mask(__k1, __xi, __yi);
+ else if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 4)
+ return ~_mm512_mask_cmpeq_epi32_mask(__k1, __xi, __yi);
+ else if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 2)
+ return ~_mm512_mask_cmpeq_epi16_mask(__k1, __xi, __yi);
+ else if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 1)
+ return ~_mm512_mask_cmpeq_epi8_mask(__k1, __xi, __yi);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 8)
+ return ~_mm256_mask_cmpeq_epi64_mask(__k1, __xi, __yi);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 4)
+ return ~_mm256_mask_cmpeq_epi32_mask(__k1, __xi, __yi);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 2)
+ return ~_mm256_mask_cmpeq_epi16_mask(__k1, __xi, __yi);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 1)
+ return ~_mm256_mask_cmpeq_epi8_mask(__k1, __xi, __yi);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 8)
+ return ~_mm_mask_cmpeq_epi64_mask(__k1, __xi, __yi);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 4)
+ return ~_mm_mask_cmpeq_epi32_mask(__k1, __xi, __yi);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 2)
+ return ~_mm_mask_cmpeq_epi16_mask(__k1, __xi, __yi);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 1)
+ return ~_mm_mask_cmpeq_epi8_mask(__k1, __xi, __yi);
+ else
+ __assert_unreachable<_Tp>();
+ } // }}}
+      else if constexpr (sizeof(__x) == 8) // {{{
+	{
+	  if (__builtin_is_constant_evaluated())
+	    return _Base::__not_equal_to(__x, __y);
+	  const auto __r128 = __vector_bitcast<_Tp, 16 / sizeof(_Tp)>(__x)
+			      != __vector_bitcast<_Tp, 16 / sizeof(_Tp)>(__y);
+	  _MaskMember<_Tp> __r64;
+	  __builtin_memcpy(&__r64._M_data, &__r128, sizeof(__r64));
+	  return __r64;
+	} // }}}
+ else
+ return _Base::__not_equal_to(__x, __y);
+ }
+
+ // }}}
+ // __less {{{
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp>
+ __less(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y)
+ {
+ if constexpr (__is_avx512_abi<_Abi>()) // {{{
+ {
+ if (__builtin_is_constant_evaluated()
+ || (__x._M_is_constprop() && __y._M_is_constprop()))
+ return _MaskImpl::__to_bits(_SimdWrapper<_Tp, _Np>(
+ __vector_bitcast<_Tp>(__x._M_data < __y._M_data)));
+
+ constexpr auto __k1 = _Abi::template __implicit_mask<_Tp>();
+ [[maybe_unused]] const auto __xi = __to_intrin(__x);
+ [[maybe_unused]] const auto __yi = __to_intrin(__y);
+ if constexpr (sizeof(__xi) == 64)
+ {
+ if constexpr (std::is_same_v<_Tp, float>)
+ return _mm512_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_LT_OS);
+ else if constexpr (std::is_same_v<_Tp, double>)
+ return _mm512_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_LT_OS);
+ else if constexpr (std::is_signed_v<_Tp> && sizeof(_Tp) == 1)
+ return _mm512_mask_cmplt_epi8_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_signed_v<_Tp> && sizeof(_Tp) == 2)
+ return _mm512_mask_cmplt_epi16_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_signed_v<_Tp> && sizeof(_Tp) == 4)
+ return _mm512_mask_cmplt_epi32_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_signed_v<_Tp> && sizeof(_Tp) == 8)
+ return _mm512_mask_cmplt_epi64_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_unsigned_v<_Tp> && sizeof(_Tp) == 1)
+ return _mm512_mask_cmplt_epu8_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_unsigned_v<_Tp> && sizeof(_Tp) == 2)
+ return _mm512_mask_cmplt_epu16_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_unsigned_v<_Tp> && sizeof(_Tp) == 4)
+ return _mm512_mask_cmplt_epu32_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_unsigned_v<_Tp> && sizeof(_Tp) == 8)
+ return _mm512_mask_cmplt_epu64_mask(__k1, __xi, __yi);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (sizeof(__xi) == 32)
+ {
+ if constexpr (std::is_same_v<_Tp, float>)
+ return _mm256_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_LT_OS);
+ else if constexpr (std::is_same_v<_Tp, double>)
+ return _mm256_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_LT_OS);
+ else if constexpr (std::is_signed_v<_Tp> && sizeof(_Tp) == 1)
+ return _mm256_mask_cmplt_epi8_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_signed_v<_Tp> && sizeof(_Tp) == 2)
+ return _mm256_mask_cmplt_epi16_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_signed_v<_Tp> && sizeof(_Tp) == 4)
+ return _mm256_mask_cmplt_epi32_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_signed_v<_Tp> && sizeof(_Tp) == 8)
+ return _mm256_mask_cmplt_epi64_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_unsigned_v<_Tp> && sizeof(_Tp) == 1)
+ return _mm256_mask_cmplt_epu8_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_unsigned_v<_Tp> && sizeof(_Tp) == 2)
+ return _mm256_mask_cmplt_epu16_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_unsigned_v<_Tp> && sizeof(_Tp) == 4)
+ return _mm256_mask_cmplt_epu32_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_unsigned_v<_Tp> && sizeof(_Tp) == 8)
+ return _mm256_mask_cmplt_epu64_mask(__k1, __xi, __yi);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (sizeof(__xi) == 16)
+ {
+ if constexpr (std::is_same_v<_Tp, float>)
+ return _mm_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_LT_OS);
+ else if constexpr (std::is_same_v<_Tp, double>)
+ return _mm_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_LT_OS);
+ else if constexpr (std::is_signed_v<_Tp> && sizeof(_Tp) == 1)
+ return _mm_mask_cmplt_epi8_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_signed_v<_Tp> && sizeof(_Tp) == 2)
+ return _mm_mask_cmplt_epi16_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_signed_v<_Tp> && sizeof(_Tp) == 4)
+ return _mm_mask_cmplt_epi32_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_signed_v<_Tp> && sizeof(_Tp) == 8)
+ return _mm_mask_cmplt_epi64_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_unsigned_v<_Tp> && sizeof(_Tp) == 1)
+ return _mm_mask_cmplt_epu8_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_unsigned_v<_Tp> && sizeof(_Tp) == 2)
+ return _mm_mask_cmplt_epu16_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_unsigned_v<_Tp> && sizeof(_Tp) == 4)
+ return _mm_mask_cmplt_epu32_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_unsigned_v<_Tp> && sizeof(_Tp) == 8)
+ return _mm_mask_cmplt_epu64_mask(__k1, __xi, __yi);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else
+ __assert_unreachable<_Tp>();
+ } // }}}
+      else if constexpr (sizeof(__x) == 8) // {{{
+	{
+	  if (__builtin_is_constant_evaluated())
+	    return _Base::__less(__x, __y);
+	  const auto __r128 = __vector_bitcast<_Tp, 16 / sizeof(_Tp)>(__x)
+			      < __vector_bitcast<_Tp, 16 / sizeof(_Tp)>(__y);
+	  _MaskMember<_Tp> __r64;
+	  __builtin_memcpy(&__r64._M_data, &__r128, sizeof(__r64));
+	  return __r64;
+	} // }}}
+ else
+ return _Base::__less(__x, __y);
+ }
+
+ // }}}
+ // __less_equal {{{
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp>
+ __less_equal(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y)
+ {
+ if constexpr (__is_avx512_abi<_Abi>()) // {{{
+ {
+ if (__builtin_is_constant_evaluated()
+ || (__x._M_is_constprop() && __y._M_is_constprop()))
+ return _MaskImpl::__to_bits(_SimdWrapper<_Tp, _Np>(
+ __vector_bitcast<_Tp>(__x._M_data <= __y._M_data)));
+
+ constexpr auto __k1 = _Abi::template __implicit_mask<_Tp>();
+ [[maybe_unused]] const auto __xi = __to_intrin(__x);
+ [[maybe_unused]] const auto __yi = __to_intrin(__y);
+ if constexpr (sizeof(__xi) == 64)
+ {
+ if constexpr (std::is_same_v<_Tp, float>)
+ return _mm512_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_LE_OS);
+ else if constexpr (std::is_same_v<_Tp, double>)
+ return _mm512_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_LE_OS);
+ else if constexpr (std::is_signed_v<_Tp> && sizeof(_Tp) == 1)
+ return _mm512_mask_cmple_epi8_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_signed_v<_Tp> && sizeof(_Tp) == 2)
+ return _mm512_mask_cmple_epi16_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_signed_v<_Tp> && sizeof(_Tp) == 4)
+ return _mm512_mask_cmple_epi32_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_signed_v<_Tp> && sizeof(_Tp) == 8)
+ return _mm512_mask_cmple_epi64_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_unsigned_v<_Tp> && sizeof(_Tp) == 1)
+ return _mm512_mask_cmple_epu8_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_unsigned_v<_Tp> && sizeof(_Tp) == 2)
+ return _mm512_mask_cmple_epu16_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_unsigned_v<_Tp> && sizeof(_Tp) == 4)
+ return _mm512_mask_cmple_epu32_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_unsigned_v<_Tp> && sizeof(_Tp) == 8)
+ return _mm512_mask_cmple_epu64_mask(__k1, __xi, __yi);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (sizeof(__xi) == 32)
+ {
+ if constexpr (std::is_same_v<_Tp, float>)
+ return _mm256_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_LE_OS);
+ else if constexpr (std::is_same_v<_Tp, double>)
+ return _mm256_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_LE_OS);
+ else if constexpr (std::is_signed_v<_Tp> && sizeof(_Tp) == 1)
+ return _mm256_mask_cmple_epi8_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_signed_v<_Tp> && sizeof(_Tp) == 2)
+ return _mm256_mask_cmple_epi16_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_signed_v<_Tp> && sizeof(_Tp) == 4)
+ return _mm256_mask_cmple_epi32_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_signed_v<_Tp> && sizeof(_Tp) == 8)
+ return _mm256_mask_cmple_epi64_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_unsigned_v<_Tp> && sizeof(_Tp) == 1)
+ return _mm256_mask_cmple_epu8_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_unsigned_v<_Tp> && sizeof(_Tp) == 2)
+ return _mm256_mask_cmple_epu16_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_unsigned_v<_Tp> && sizeof(_Tp) == 4)
+ return _mm256_mask_cmple_epu32_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_unsigned_v<_Tp> && sizeof(_Tp) == 8)
+ return _mm256_mask_cmple_epu64_mask(__k1, __xi, __yi);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (sizeof(__xi) == 16)
+ {
+ if constexpr (std::is_same_v<_Tp, float>)
+ return _mm_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_LE_OS);
+ else if constexpr (std::is_same_v<_Tp, double>)
+ return _mm_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_LE_OS);
+ else if constexpr (std::is_signed_v<_Tp> && sizeof(_Tp) == 1)
+ return _mm_mask_cmple_epi8_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_signed_v<_Tp> && sizeof(_Tp) == 2)
+ return _mm_mask_cmple_epi16_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_signed_v<_Tp> && sizeof(_Tp) == 4)
+ return _mm_mask_cmple_epi32_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_signed_v<_Tp> && sizeof(_Tp) == 8)
+ return _mm_mask_cmple_epi64_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_unsigned_v<_Tp> && sizeof(_Tp) == 1)
+ return _mm_mask_cmple_epu8_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_unsigned_v<_Tp> && sizeof(_Tp) == 2)
+ return _mm_mask_cmple_epu16_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_unsigned_v<_Tp> && sizeof(_Tp) == 4)
+ return _mm_mask_cmple_epu32_mask(__k1, __xi, __yi);
+ else if constexpr (std::is_unsigned_v<_Tp> && sizeof(_Tp) == 8)
+ return _mm_mask_cmple_epu64_mask(__k1, __xi, __yi);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else
+ __assert_unreachable<_Tp>();
+ } // }}}
+      else if constexpr (sizeof(__x) == 8) // {{{
+	{
+	  if (__builtin_is_constant_evaluated())
+	    return _Base::__less_equal(__x, __y);
+	  const auto __r128 = __vector_bitcast<_Tp, 16 / sizeof(_Tp)>(__x)
+			      <= __vector_bitcast<_Tp, 16 / sizeof(_Tp)>(__y);
+	  _MaskMember<_Tp> __r64;
+	  __builtin_memcpy(&__r64._M_data, &__r128, sizeof(__r64));
+	  return __r64;
+	} // }}}
+ else
+ return _Base::__less_equal(__x, __y);
+ }
+
+ // }}}
+ // }}}
+ // negation {{{
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp>
+ __negate(_SimdWrapper<_Tp, _Np> __x) noexcept
+ {
+ if constexpr (__is_avx512_abi<_Abi>())
+ return __equal_to(__x, _SimdWrapper<_Tp, _Np>());
+ else
+ return _Base::__negate(__x);
+ }
+
+ // }}}
+ // math {{{
+ using _Base::__abs;
+ // __sqrt {{{
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static _SimdWrapper<_Tp, _Np>
+ __sqrt(_SimdWrapper<_Tp, _Np> __x)
+ {
+ if constexpr (__is_sse_ps<_Tp, _Np>())
+ return __auto_bitcast(_mm_sqrt_ps(__to_intrin(__x)));
+ else if constexpr (__is_sse_pd<_Tp, _Np>())
+ return _mm_sqrt_pd(__x);
+ else if constexpr (__is_avx_ps<_Tp, _Np>())
+ return _mm256_sqrt_ps(__x);
+ else if constexpr (__is_avx_pd<_Tp, _Np>())
+ return _mm256_sqrt_pd(__x);
+ else if constexpr (__is_avx512_ps<_Tp, _Np>())
+ return _mm512_sqrt_ps(__x);
+ else if constexpr (__is_avx512_pd<_Tp, _Np>())
+ return _mm512_sqrt_pd(__x);
+ else
+ __assert_unreachable<_Tp>();
+ }
+
+ // }}}
+ // __ldexp {{{
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static _SimdWrapper<_Tp, _Np>
+ __ldexp(_SimdWrapper<_Tp, _Np> __x, __fixed_size_storage_t<int, _Np> __exp)
+ {
+ if constexpr (__is_avx512_abi<_Abi>())
+ {
+ const auto __xi = __to_intrin(__x);
+ constexpr _SimdConverter<int, simd_abi::fixed_size<_Np>, _Tp, _Abi>
+ __cvt;
+ const auto __expi = __to_intrin(__cvt(__exp));
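+	// vscalefp[sd] computes __x * 2^__expi per element, which matches
+	// ldexp exactly for the integral exponents produced by the
+	// conversion above.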
+ constexpr auto __k1 = _Abi::template __implicit_mask<_Tp>();
+ if constexpr (sizeof(__xi) == 16)
+ {
+ if constexpr (sizeof(_Tp) == 8)
+ return _mm_maskz_scalef_pd(__k1, __xi, __expi);
+ else
+ return _mm_maskz_scalef_ps(__k1, __xi, __expi);
+ }
+ else if constexpr (sizeof(__xi) == 32)
+ {
+ if constexpr (sizeof(_Tp) == 8)
+ return _mm256_maskz_scalef_pd(__k1, __xi, __expi);
+ else
+ return _mm256_maskz_scalef_ps(__k1, __xi, __expi);
+ }
+ else
+ {
+ static_assert(sizeof(__xi) == 64);
+ if constexpr (sizeof(_Tp) == 8)
+ return _mm512_maskz_scalef_pd(__k1, __xi, __expi);
+ else
+ return _mm512_maskz_scalef_ps(__k1, __xi, __expi);
+ }
+ }
+ else
+ return _Base::__ldexp(__x, __exp);
+ }
+
+ // }}}
+ // __trunc {{{
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static _SimdWrapper<_Tp, _Np>
+ __trunc(_SimdWrapper<_Tp, _Np> __x)
+ {
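+    // roundscale immediate 0x0b: bits [1:0] = 3 selects round toward zero,
+    // bit 3 suppresses precision exceptions, and bits [7:4] = 0 keep no
+    // extra fraction bits, i.e. round to an integer.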
+ if constexpr (__is_avx512_ps<_Tp, _Np>())
+ return _mm512_roundscale_ps(__x, 0x0b);
+ else if constexpr (__is_avx512_pd<_Tp, _Np>())
+ return _mm512_roundscale_pd(__x, 0x0b);
+ else if constexpr (__is_avx_ps<_Tp, _Np>())
+ return _mm256_round_ps(__x, 0x3);
+ else if constexpr (__is_avx_pd<_Tp, _Np>())
+ return _mm256_round_pd(__x, 0x3);
+ else if constexpr (__have_sse4_1 && __is_sse_ps<_Tp, _Np>())
+ return __auto_bitcast(_mm_round_ps(__to_intrin(__x), 0x3));
+ else if constexpr (__have_sse4_1 && __is_sse_pd<_Tp, _Np>())
+ return _mm_round_pd(__x, 0x3);
+ else if constexpr (__is_sse_ps<_Tp, _Np>())
+ {
+ auto __truncated = _mm_cvtepi32_ps(_mm_cvttps_epi32(__to_intrin(__x)));
+ const auto __no_fractional_values
+ = __vector_bitcast<int>(__vector_bitcast<_UInt>(__to_intrin(__x))
+ & 0x7f800000u)
+ < 0x4b000000; // the exponent is so large that no mantissa bits
+ // signify fractional values (0x3f8 + 23*8 =
+ // 0x4b0)
+ return __no_fractional_values ? __truncated : __to_intrin(__x);
+ }
+ else
+ return _Base::__trunc(__x);
+ }
+
+ // }}}
+ // __round {{{
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static _SimdWrapper<_Tp, _Np>
+ __round(_SimdWrapper<_Tp, _Np> __x)
+ {
+ using _V = __vector_type_t<_Tp, _Np>;
+ _V __truncated;
+ if constexpr (__is_avx512_ps<_Tp, _Np>())
+ __truncated = _mm512_roundscale_ps(__x._M_data, 0x0b);
+ else if constexpr (__is_avx512_pd<_Tp, _Np>())
+ __truncated = _mm512_roundscale_pd(__x._M_data, 0x0b);
+ else if constexpr (__is_avx_ps<_Tp, _Np>())
+ __truncated
+ = _mm256_round_ps(__x._M_data, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
+ else if constexpr (__is_avx_pd<_Tp, _Np>())
+ __truncated
+ = _mm256_round_pd(__x._M_data, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
+ else if constexpr (__have_sse4_1 && __is_sse_ps<_Tp, _Np>())
+ __truncated = __auto_bitcast(
+ _mm_round_ps(__to_intrin(__x), _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC));
+ else if constexpr (__have_sse4_1 && __is_sse_pd<_Tp, _Np>())
+ __truncated
+ = _mm_round_pd(__x._M_data, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
+ else if constexpr (__is_sse_ps<_Tp, _Np>())
+ __truncated
+ = __auto_bitcast(_mm_cvtepi32_ps(_mm_cvttps_epi32(__to_intrin(__x))));
+ else
+ return _Base::__round(__x);
+
+ // x < 0 => truncated <= 0 && truncated >= x => x - truncated <= 0
+ // x > 0 => truncated >= 0 && truncated <= x => x - truncated >= 0
+
+ const _V __rounded
+ = __truncated
+ + (__and(_S_absmask<_V>, __x._M_data - __truncated) >= _Tp(.5)
+ ? __or(__and(_S_signmask<_V>, __x._M_data), _V() + 1)
+ : _V());
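+    // The SSE2 fallback truncation (cvttps) overflows for |x| >= 2^31;
+    // since values with |x| >= 2^23 have no fractional part anyway, those
+    // are returned unchanged below.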
+ if constexpr (__have_sse4_1)
+ return __rounded;
+ else
+ return __and(_S_absmask<_V>, __x._M_data) < 0x1p23f ? __rounded
+ : __x._M_data;
+ }
+
+ // }}}
+ // __nearbyint {{{
+ template <typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+ _GLIBCXX_SIMD_INTRINSIC static _Tp __nearbyint(_Tp __x) noexcept
+ {
+ if constexpr (_TVT::template __is<float, 16>)
+ return _mm512_roundscale_ps(__x, 0x0c);
+ else if constexpr (_TVT::template __is<double, 8>)
+ return _mm512_roundscale_pd(__x, 0x0c);
+ else if constexpr (_TVT::template __is<float, 8>)
+ return _mm256_round_ps(__x, _MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC);
+ else if constexpr (_TVT::template __is<double, 4>)
+ return _mm256_round_pd(__x, _MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC);
+ else if constexpr (__have_sse4_1 && _TVT::template __is<float, 4>)
+ return _mm_round_ps(__x, _MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC);
+ else if constexpr (__have_sse4_1 && _TVT::template __is<double, 2>)
+ return _mm_round_pd(__x, _MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC);
+ else
+ return _Base::__nearbyint(__x);
+ }
+
+ // }}}
+ // __rint {{{
+ template <typename _Tp, typename _TVT = _VectorTraits<_Tp>>
+ _GLIBCXX_SIMD_INTRINSIC static _Tp __rint(_Tp __x) noexcept
+ {
+ if constexpr (_TVT::template __is<float, 16>)
+ return _mm512_roundscale_ps(__x, 0x04);
+ else if constexpr (_TVT::template __is<double, 8>)
+ return _mm512_roundscale_pd(__x, 0x04);
+ else if constexpr (_TVT::template __is<float, 8>)
+ return _mm256_round_ps(__x, _MM_FROUND_CUR_DIRECTION);
+ else if constexpr (_TVT::template __is<double, 4>)
+ return _mm256_round_pd(__x, _MM_FROUND_CUR_DIRECTION);
+ else if constexpr (__have_sse4_1 && _TVT::template __is<float, 4>)
+ return _mm_round_ps(__x, _MM_FROUND_CUR_DIRECTION);
+ else if constexpr (__have_sse4_1 && _TVT::template __is<double, 2>)
+ return _mm_round_pd(__x, _MM_FROUND_CUR_DIRECTION);
+ else
+ return _Base::__rint(__x);
+ }
+
+ // }}}
+ // __floor {{{
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static _SimdWrapper<_Tp, _Np>
+ __floor(_SimdWrapper<_Tp, _Np> __x)
+ {
+ if constexpr (__is_avx512_ps<_Tp, _Np>())
+ return _mm512_roundscale_ps(__x, 0x09);
+ else if constexpr (__is_avx512_pd<_Tp, _Np>())
+ return _mm512_roundscale_pd(__x, 0x09);
+ else if constexpr (__is_avx_ps<_Tp, _Np>())
+ return _mm256_round_ps(__x, 0x1);
+ else if constexpr (__is_avx_pd<_Tp, _Np>())
+ return _mm256_round_pd(__x, 0x1);
+ else if constexpr (__have_sse4_1 && __is_sse_ps<_Tp, _Np>())
+ return __auto_bitcast(_mm_floor_ps(__to_intrin(__x)));
+ else if constexpr (__have_sse4_1 && __is_sse_pd<_Tp, _Np>())
+ return _mm_floor_pd(__x);
+ else
+ return _Base::__floor(__x);
+ }
+
+ // }}}
+ // __ceil {{{
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static _SimdWrapper<_Tp, _Np>
+ __ceil(_SimdWrapper<_Tp, _Np> __x)
+ {
+ if constexpr (__is_avx512_ps<_Tp, _Np>())
+ return _mm512_roundscale_ps(__x, 0x0a);
+ else if constexpr (__is_avx512_pd<_Tp, _Np>())
+ return _mm512_roundscale_pd(__x, 0x0a);
+ else if constexpr (__is_avx_ps<_Tp, _Np>())
+ return _mm256_round_ps(__x, 0x2);
+ else if constexpr (__is_avx_pd<_Tp, _Np>())
+ return _mm256_round_pd(__x, 0x2);
+ else if constexpr (__have_sse4_1 && __is_sse_ps<_Tp, _Np>())
+ return __auto_bitcast(_mm_ceil_ps(__to_intrin(__x)));
+ else if constexpr (__have_sse4_1 && __is_sse_pd<_Tp, _Np>())
+ return _mm_ceil_pd(__x);
+ else
+ return _Base::__ceil(__x);
+ }
+
+ // }}}
+ // __signbit {{{
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static _MaskMember<_Tp>
+ __signbit(_SimdWrapper<_Tp, _Np> __x)
+ {
+ if constexpr (__is_avx512_abi<_Abi>() && __have_avx512dq)
+ {
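+	// vpmovd2m / vpmovq2m copy the most significant bit of every
+	// element, i.e. the IEEE sign bit, straight into a mask register.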
+ if constexpr (sizeof(__x) == 64 && sizeof(_Tp) == 4)
+ return _mm512_movepi32_mask(__intrin_bitcast<__m512i>(__x._M_data));
+ else if constexpr (sizeof(__x) == 64 && sizeof(_Tp) == 8)
+ return _mm512_movepi64_mask(__intrin_bitcast<__m512i>(__x._M_data));
+ else if constexpr (sizeof(__x) == 32 && sizeof(_Tp) == 4)
+ return _mm256_movepi32_mask(__intrin_bitcast<__m256i>(__x._M_data));
+ else if constexpr (sizeof(__x) == 32 && sizeof(_Tp) == 8)
+ return _mm256_movepi64_mask(__intrin_bitcast<__m256i>(__x._M_data));
+ else if constexpr (sizeof(__x) <= 16 && sizeof(_Tp) == 4)
+ return _mm_movepi32_mask(__intrin_bitcast<__m128i>(__x._M_data));
+ else if constexpr (sizeof(__x) <= 16 && sizeof(_Tp) == 8)
+ return _mm_movepi64_mask(__intrin_bitcast<__m128i>(__x._M_data));
+ }
+ else if constexpr (__is_avx512_abi<_Abi>())
+ {
+ const auto __xi = __to_intrin(__x);
+ [[maybe_unused]] constexpr auto __k1
+ = _Abi::template __implicit_mask<_Tp>();
+ if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 4)
+ return _mm_movemask_ps(__xi);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 8)
+ return _mm_movemask_pd(__xi);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 4)
+ return _mm256_movemask_ps(__xi);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 8)
+ return _mm256_movemask_pd(__xi);
+ else if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 4)
+ return _mm512_mask_cmplt_epi32_mask(__k1,
+ __intrin_bitcast<__m512i>(__xi),
+ __m512i());
+ else if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 8)
+ return _mm512_mask_cmplt_epi64_mask(__k1,
+ __intrin_bitcast<__m512i>(__xi),
+ __m512i());
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else
+ return _Base::__signbit(__x);
+ /*{
+ using _I = __int_for_sizeof_t<_Tp>;
+ if constexpr (sizeof(__x) == 64)
+ return __less(__vector_bitcast<_I>(__x), _I());
+ else
+ {
+ const auto __xx = __vector_bitcast<_I>(__x._M_data);
+ [[maybe_unused]] constexpr _I __signmask =
+ std::numeric_limits<_I>::min();
+ if constexpr ((sizeof(_Tp) == 4 &&
+ (__have_avx2 || sizeof(__x) == 16)) ||
+ __have_avx512vl)
+ {
+ return __vector_bitcast<_Tp>(__xx >>
+ std::numeric_limits<_I>::digits);
+ }
+ else if constexpr ((__have_avx2 ||
+ (__have_ssse3 && sizeof(__x) == 16)))
+ {
+ return __vector_bitcast<_Tp>((__xx & __signmask) ==
+ __signmask);
+ }
+ else
+ { // SSE2/3 or AVX (w/o AVX2)
+ constexpr auto __one = __vector_broadcast<_Np, _Tp>(1);
+ return __vector_bitcast<_Tp>(
+ __vector_bitcast<_Tp>(
+ (__xx & __signmask) |
+ __vector_bitcast<_I>(__one)) // -1 or 1
+ != __one);
+ }
+ }
+ }*/
+ }
+
+ // }}}
+ // __isnonzerovalue_mask (isnormal | is subnormal == !isinf & !isnan & !is
+ // zero) {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static auto __isnonzerovalue_mask(_Tp __x)
+ {
+ using _Traits = _VectorTraits<_Tp>;
+ if constexpr (__have_avx512dq_vl)
+ {
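+	// fpclass immediate 0x9f selects SNaN|QNaN|+inf|-inf|+0|-0; the knot
+	// of that is exactly the set of non-zero finite (normal or
+	// subnormal) values.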
+ if constexpr (_Traits::template __is<
+ float, 2> || _Traits::template __is<float, 4>)
+ return _knot_mask8(_mm_fpclass_ps_mask(__to_intrin(__x), 0x9f));
+ else if constexpr (_Traits::template __is<float, 8>)
+ return _knot_mask8(_mm256_fpclass_ps_mask(__x, 0x9f));
+ else if constexpr (_Traits::template __is<float, 16>)
+ return _knot_mask16(_mm512_fpclass_ps_mask(__x, 0x9f));
+ else if constexpr (_Traits::template __is<double, 2>)
+ return _knot_mask8(_mm_fpclass_pd_mask(__x, 0x9f));
+ else if constexpr (_Traits::template __is<double, 4>)
+ return _knot_mask8(_mm256_fpclass_pd_mask(__x, 0x9f));
+ else if constexpr (_Traits::template __is<double, 8>)
+ return _knot_mask8(_mm512_fpclass_pd_mask(__x, 0x9f));
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else
+ {
+ using _Up = typename _Traits::value_type;
+ constexpr size_t _Np = _Traits::_S_width;
+ const auto __a
+ = __x * std::numeric_limits<_Up>::infinity(); // NaN if __x == 0
+ const auto __b = __x * _Up(); // NaN if __x == inf
+ if constexpr (__have_avx512vl && __is_sse_ps<_Up, _Np>())
+ return _mm_cmp_ps_mask(__to_intrin(__a), __to_intrin(__b),
+ _CMP_ORD_Q);
+ else if constexpr (__have_avx512f && __is_sse_ps<_Up, _Np>())
+ return __mmask8(0xf
+ & _mm512_cmp_ps_mask(__auto_bitcast(__a),
+ __auto_bitcast(__b),
+ _CMP_ORD_Q));
+ else if constexpr (__have_avx512vl && __is_sse_pd<_Up, _Np>())
+ return _mm_cmp_pd_mask(__a, __b, _CMP_ORD_Q);
+ else if constexpr (__have_avx512f && __is_sse_pd<_Up, _Np>())
+ return __mmask8(0x3
+ & _mm512_cmp_pd_mask(__auto_bitcast(__a),
+ __auto_bitcast(__b),
+ _CMP_ORD_Q));
+ else if constexpr (__have_avx512vl && __is_avx_ps<_Up, _Np>())
+ return _mm256_cmp_ps_mask(__a, __b, _CMP_ORD_Q);
+ else if constexpr (__have_avx512f && __is_avx_ps<_Up, _Np>())
+ return __mmask8(_mm512_cmp_ps_mask(__auto_bitcast(__a),
+ __auto_bitcast(__b), _CMP_ORD_Q));
+ else if constexpr (__have_avx512vl && __is_avx_pd<_Up, _Np>())
+ return _mm256_cmp_pd_mask(__a, __b, _CMP_ORD_Q);
+ else if constexpr (__have_avx512f && __is_avx_pd<_Up, _Np>())
+ return __mmask8(0xf
+ & _mm512_cmp_pd_mask(__auto_bitcast(__a),
+ __auto_bitcast(__b),
+ _CMP_ORD_Q));
+ else if constexpr (__is_avx512_ps<_Up, _Np>())
+ return _mm512_cmp_ps_mask(__a, __b, _CMP_ORD_Q);
+ else if constexpr (__is_avx512_pd<_Up, _Np>())
+ return _mm512_cmp_pd_mask(__a, __b, _CMP_ORD_Q);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ }
+
+ // }}}
+ // __isfinite {{{
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static _MaskMember<_Tp>
+ __isfinite(_SimdWrapper<_Tp, _Np> __x)
+ {
+ static_assert(is_floating_point_v<_Tp>);
+#if __FINITE_MATH_ONLY__
+ [](auto&&){}(__x);
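+    // The immediately invoked lambda only marks __x as used (presumably to
+    // avoid unused-parameter warnings, since the rest of the function is
+    // preprocessed away in this configuration).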
+ return __equal_to(_SimdWrapper<_Tp, _Np>(), _SimdWrapper<_Tp, _Np>());
+#else
+ if constexpr (__is_avx512_abi<_Abi>() && __have_avx512dq)
+ {
+ const auto __xi = __to_intrin(__x);
+ constexpr auto __k1 = _Abi::template __implicit_mask<_Tp>();
+ if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 4)
+ return __k1 ^ _mm512_mask_fpclass_ps_mask(__k1, __xi, 0x99);
+ else if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 8)
+ return __k1 ^ _mm512_mask_fpclass_pd_mask(__k1, __xi, 0x99);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 4)
+ return __k1 ^ _mm256_mask_fpclass_ps_mask(__k1, __xi, 0x99);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 8)
+ return __k1 ^ _mm256_mask_fpclass_pd_mask(__k1, __xi, 0x99);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 4)
+ return __k1 ^ _mm_mask_fpclass_ps_mask(__k1, __xi, 0x99);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 8)
+ return __k1 ^ _mm_mask_fpclass_pd_mask(__k1, __xi, 0x99);
+ }
+ else if constexpr (__is_avx512_abi<_Abi>())
+ {
+ // if all exponent bits are set, __x is either inf or NaN
+ using _I = __int_for_sizeof_t<_Tp>;
+ const auto __inf = __vector_bitcast<_I>(
+ __vector_broadcast<_Np>(std::numeric_limits<_Tp>::infinity()));
+ return __less<_I, _Np>(__vector_bitcast<_I>(__x) & __inf, __inf);
+ }
+ else
+ return _Base::__isfinite(__x);
+#endif
+ }
+
+ // }}}
+ // __isinf {{{
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static _MaskMember<_Tp>
+ __isinf(_SimdWrapper<_Tp, _Np> __x)
+ {
+#if __FINITE_MATH_ONLY__
+ [](auto&&){}(__x);
+ return {}; // false
+#else
+ if constexpr (__is_avx512_abi<_Abi>() && __have_avx512dq)
+ {
+ const auto __xi = __to_intrin(__x);
+ if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 4)
+ return _mm512_fpclass_ps_mask(__xi, 0x18);
+ else if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 8)
+ return _mm512_fpclass_pd_mask(__xi, 0x18);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 4)
+ return _mm256_fpclass_ps_mask(__xi, 0x18);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 8)
+ return _mm256_fpclass_pd_mask(__xi, 0x18);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 4)
+ return _mm_fpclass_ps_mask(__xi, 0x18);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 8)
+ return _mm_fpclass_pd_mask(__xi, 0x18);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (__have_avx512dq_vl)
+ {
+ if constexpr (__is_sse_pd<_Tp, _Np>())
+ return __vector_bitcast<double>(
+ _mm_movm_epi64(_mm_fpclass_pd_mask(__x, 0x18)));
+ else if constexpr (__is_avx_pd<_Tp, _Np>())
+ return __vector_bitcast<double>(
+ _mm256_movm_epi64(_mm256_fpclass_pd_mask(__x, 0x18)));
+ else if constexpr (__is_sse_ps<_Tp, _Np>())
+ return __auto_bitcast(
+ _mm_movm_epi32(_mm_fpclass_ps_mask(__to_intrin(__x), 0x18)));
+ else if constexpr (__is_avx_ps<_Tp, _Np>())
+ return __vector_bitcast<float>(
+ _mm256_movm_epi32(_mm256_fpclass_ps_mask(__x, 0x18)));
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else
+ return _Base::__isinf(__x);
+#endif
+ }
+
+ // }}}
+ // __isnormal {{{
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static _MaskMember<_Tp>
+ __isnormal(_SimdWrapper<_Tp, _Np> __x)
+ {
+#if __FINITE_MATH_ONLY__
+ [[maybe_unused]] constexpr int __mode = 0x26;
+#else
+ [[maybe_unused]] constexpr int __mode = 0xbf;
+#endif
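+    // fpclass bits: 0x26 selects zero|denormal (enough when NaN and inf
+    // cannot occur), 0xbf additionally selects NaN and inf; the masked
+    // fpclass result is then inverted (__k1 ^ ... or knot) to yield
+    // "is normal".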
+ if constexpr (__is_avx512_abi<_Abi>() && __have_avx512dq)
+ {
+ const auto __xi = __to_intrin(__x);
+ const auto __k1 = _Abi::template __implicit_mask<_Tp>();
+ if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 4)
+ return __k1 ^ _mm512_mask_fpclass_ps_mask(__k1, __xi, __mode);
+ else if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 8)
+ return __k1 ^ _mm512_mask_fpclass_pd_mask(__k1, __xi, __mode);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 4)
+ return __k1 ^ _mm256_mask_fpclass_ps_mask(__k1, __xi, __mode);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 8)
+ return __k1 ^ _mm256_mask_fpclass_pd_mask(__k1, __xi, __mode);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 4)
+ return __k1 ^ _mm_mask_fpclass_ps_mask(__k1, __xi, __mode);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 8)
+ return __k1 ^ _mm_mask_fpclass_pd_mask(__k1, __xi, __mode);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (__have_avx512dq)
+ {
+ if constexpr (__have_avx512vl && __is_sse_ps<_Tp, _Np>())
+ return __auto_bitcast(_mm_movm_epi32(
+ _knot_mask8(_mm_fpclass_ps_mask(__to_intrin(__x), __mode))));
+ else if constexpr (__have_avx512vl && __is_avx_ps<_Tp, _Np>())
+ return __vector_bitcast<float>(_mm256_movm_epi32(
+ _knot_mask8(_mm256_fpclass_ps_mask(__x, __mode))));
+ else if constexpr (__is_avx512_ps<_Tp, _Np>())
+ return _knot_mask16(_mm512_fpclass_ps_mask(__x, __mode));
+ else if constexpr (__have_avx512vl && __is_sse_pd<_Tp, _Np>())
+ return __vector_bitcast<double>(
+ _mm_movm_epi64(_knot_mask8(_mm_fpclass_pd_mask(__x, __mode))));
+ else if constexpr (__have_avx512vl && __is_avx_pd<_Tp, _Np>())
+ return __vector_bitcast<double>(_mm256_movm_epi64(
+ _knot_mask8(_mm256_fpclass_pd_mask(__x, __mode))));
+ else if constexpr (__is_avx512_pd<_Tp, _Np>())
+ return _knot_mask8(_mm512_fpclass_pd_mask(__x, __mode));
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (__is_avx512_abi<_Abi>())
+ {
+ using _I = __int_for_sizeof_t<_Tp>;
+	const auto __absn = __vector_bitcast<_I>(__abs(__x));
+	const auto __minn = __vector_bitcast<_I>(
+	  __vector_broadcast<_Np>(std::numeric_limits<_Tp>::min()));
+#if __FINITE_MATH_ONLY__
+	return __less_equal<_I, _Np>(__minn, __absn);
+#else
+	const auto __infn = __vector_bitcast<_I>(
+	  __vector_broadcast<_Np>(std::numeric_limits<_Tp>::infinity()));
+	return __and(__less_equal<_I, _Np>(__minn, __absn),
+		     __less<_I, _Np>(__absn, __infn));
+#endif
+ }
+ else
+ return _Base::__isnormal(__x);
+ }
+
+ // }}}
+ // __isnan {{{
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static _MaskMember<_Tp>
+ __isnan(_SimdWrapper<_Tp, _Np> __x)
+ {
+ return __isunordered(__x, __x);
+ }
+
+ // }}}
+ // __isunordered {{{
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static _MaskMember<_Tp>
+ __isunordered(_SimdWrapper<_Tp, _Np> __x, _SimdWrapper<_Tp, _Np> __y)
+ {
+#if __FINITE_MATH_ONLY__
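+      // with -ffinite-math-only no operand can be NaN, so the answer is always false;
+      // the empty lambda merely consumes __x to keep it formally used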
+ [](auto&&){}(__x);
+ return {}; // false
+#else
+ const auto __xi = __to_intrin(__x);
+ const auto __yi = __to_intrin(__y);
+ if constexpr (__is_avx512_abi<_Abi>())
+ {
+ constexpr auto __k1 = _Abi::template __implicit_mask<_Tp>();
+ if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 4)
+ return _mm512_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_UNORD_Q);
+ else if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 8)
+ return _mm512_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_UNORD_Q);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 4)
+ return _mm256_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_UNORD_Q);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 8)
+ return _mm256_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_UNORD_Q);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 4)
+ return _mm_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_UNORD_Q);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 8)
+ return _mm_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_UNORD_Q);
+ }
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 4)
+ return _mm256_cmp_ps(__xi, __yi, _CMP_UNORD_Q);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 8)
+ return _mm256_cmp_pd(__xi, __yi, _CMP_UNORD_Q);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 4)
+ return __auto_bitcast(_mm_cmpunord_ps(__xi, __yi));
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 8)
+ return __auto_bitcast(_mm_cmpunord_pd(__xi, __yi));
+ else
+ __assert_unreachable<_Tp>();
+#endif
+ }
+
+ // }}}
+ // __isgreater {{{
+ template <typename _Tp, size_t _Np>
+ static constexpr _MaskMember<_Tp> __isgreater(_SimdWrapper<_Tp, _Np> __x,
+ _SimdWrapper<_Tp, _Np> __y)
+ {
+ const auto __xi = __to_intrin(__x);
+ const auto __yi = __to_intrin(__y);
+ if constexpr (__is_avx512_abi<_Abi>())
+ {
+ const auto __k1 = _Abi::template __implicit_mask<_Tp>();
+ if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 4)
+ return _mm512_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_GT_OQ);
+ else if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 8)
+ return _mm512_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_GT_OQ);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 4)
+ return _mm256_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_GT_OQ);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 8)
+ return _mm256_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_GT_OQ);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 4)
+ return _mm_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_GT_OQ);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 8)
+ return _mm_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_GT_OQ);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (__have_avx)
+ {
+ if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 4)
+ return _mm256_cmp_ps(__xi, __yi, _CMP_GT_OQ);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 8)
+ return _mm256_cmp_pd(__xi, __yi, _CMP_GT_OQ);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 4)
+ return __auto_bitcast(_mm_cmp_ps(__xi, __yi, _CMP_GT_OQ));
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 8)
+ return _mm_cmp_pd(__xi, __yi, _CMP_GT_OQ);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (__have_sse2 && sizeof(__xi) == 16 && sizeof(_Tp) == 4)
+ {
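+          // pre-AVX fallback: map the IEEE bit pattern to a two's-complement integer with
+          // the same ordering (negative values become -(magnitude)), compare as integers,
+          // and clear unordered (NaN) lanes via cmpord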
+ const auto __xn = __vector_bitcast<int>(__xi);
+ const auto __yn = __vector_bitcast<int>(__yi);
+ const auto __xp = __xn < 0 ? -(__xn & 0x7fff'ffff) : __xn;
+ const auto __yp = __yn < 0 ? -(__yn & 0x7fff'ffff) : __yn;
+ return __auto_bitcast(__and(_mm_cmpord_ps(__xi, __yi),
+ reinterpret_cast<__m128>(__xp > __yp)));
+ }
+ else if constexpr (__have_sse2 && sizeof(__xi) == 16 && sizeof(_Tp) == 8)
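+        // compare lane by lane: _mm_ucomigt_sd yields 0/1, negation widens it to a mask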
+ return __auto_bitcast(__vector_type_t<__int_with_sizeof_t<8>, 2>{
+ -_mm_ucomigt_sd(__xi, __yi),
+ -_mm_ucomigt_sd(_mm_unpackhi_pd(__xi, __xi),
+ _mm_unpackhi_pd(__yi, __yi))});
+ else
+ return _Base::__isgreater(__x, __y);
+ }
+
+ // }}}
+ // __isgreaterequal {{{
+ template <typename _Tp, size_t _Np>
+ static constexpr _MaskMember<_Tp> __isgreaterequal(_SimdWrapper<_Tp, _Np> __x,
+ _SimdWrapper<_Tp, _Np> __y)
+ {
+ const auto __xi = __to_intrin(__x);
+ const auto __yi = __to_intrin(__y);
+ if constexpr (__is_avx512_abi<_Abi>())
+ {
+ const auto __k1 = _Abi::template __implicit_mask<_Tp>();
+ if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 4)
+ return _mm512_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_GE_OQ);
+ else if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 8)
+ return _mm512_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_GE_OQ);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 4)
+ return _mm256_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_GE_OQ);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 8)
+ return _mm256_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_GE_OQ);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 4)
+ return _mm_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_GE_OQ);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 8)
+ return _mm_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_GE_OQ);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (__have_avx)
+ {
+ if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 4)
+ return _mm256_cmp_ps(__xi, __yi, _CMP_GE_OQ);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 8)
+ return _mm256_cmp_pd(__xi, __yi, _CMP_GE_OQ);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 4)
+ return __auto_bitcast(_mm_cmp_ps(__xi, __yi, _CMP_GE_OQ));
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 8)
+ return _mm_cmp_pd(__xi, __yi, _CMP_GE_OQ);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (__have_sse2 && sizeof(__xi) == 16 && sizeof(_Tp) == 4)
+ {
+ const auto __xn = __vector_bitcast<int>(__xi);
+ const auto __yn = __vector_bitcast<int>(__yi);
+ const auto __xp = __xn < 0 ? -(__xn & 0x7fff'ffff) : __xn;
+ const auto __yp = __yn < 0 ? -(__yn & 0x7fff'ffff) : __yn;
+ return __auto_bitcast(__and(_mm_cmpord_ps(__xi, __yi),
+ reinterpret_cast<__m128>(__xp >= __yp)));
+ }
+ else if constexpr (__have_sse2 && sizeof(__xi) == 16 && sizeof(_Tp) == 8)
+ return __auto_bitcast(__vector_type_t<__int_with_sizeof_t<8>, 2>{
+ -_mm_ucomige_sd(__xi, __yi),
+ -_mm_ucomige_sd(_mm_unpackhi_pd(__xi, __xi),
+ _mm_unpackhi_pd(__yi, __yi))});
+ else
+ return _Base::__isgreaterequal(__x, __y);
+ }
+
+ // }}}
+ // __isless {{{
+ template <typename _Tp, size_t _Np>
+ static constexpr _MaskMember<_Tp> __isless(_SimdWrapper<_Tp, _Np> __x,
+ _SimdWrapper<_Tp, _Np> __y)
+ {
+ const auto __xi = __to_intrin(__x);
+ const auto __yi = __to_intrin(__y);
+ if constexpr (__is_avx512_abi<_Abi>())
+ {
+ const auto __k1 = _Abi::template __implicit_mask<_Tp>();
+ if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 4)
+ return _mm512_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_LT_OQ);
+ else if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 8)
+ return _mm512_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_LT_OQ);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 4)
+ return _mm256_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_LT_OQ);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 8)
+ return _mm256_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_LT_OQ);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 4)
+ return _mm_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_LT_OQ);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 8)
+ return _mm_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_LT_OQ);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (__have_avx)
+ {
+ if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 4)
+ return _mm256_cmp_ps(__xi, __yi, _CMP_LT_OQ);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 8)
+ return _mm256_cmp_pd(__xi, __yi, _CMP_LT_OQ);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 4)
+ return __auto_bitcast(_mm_cmp_ps(__xi, __yi, _CMP_LT_OQ));
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 8)
+ return _mm_cmp_pd(__xi, __yi, _CMP_LT_OQ);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (__have_sse2 && sizeof(__xi) == 16 && sizeof(_Tp) == 4)
+ {
+ const auto __xn = __vector_bitcast<int>(__xi);
+ const auto __yn = __vector_bitcast<int>(__yi);
+ const auto __xp = __xn < 0 ? -(__xn & 0x7fff'ffff) : __xn;
+ const auto __yp = __yn < 0 ? -(__yn & 0x7fff'ffff) : __yn;
+ return __auto_bitcast(__and(_mm_cmpord_ps(__xi, __yi),
+ reinterpret_cast<__m128>(__xp < __yp)));
+ }
+ else if constexpr (__have_sse2 && sizeof(__xi) == 16 && sizeof(_Tp) == 8)
+ return __auto_bitcast(__vector_type_t<__int_with_sizeof_t<8>, 2>{
+ -_mm_ucomigt_sd(__yi, __xi),
+ -_mm_ucomigt_sd(_mm_unpackhi_pd(__yi, __yi),
+ _mm_unpackhi_pd(__xi, __xi))});
+ else
+ return _Base::__isless(__x, __y);
+ }
+
+ // }}}
+ // __islessequal {{{
+ template <typename _Tp, size_t _Np>
+ static constexpr _MaskMember<_Tp> __islessequal(_SimdWrapper<_Tp, _Np> __x,
+ _SimdWrapper<_Tp, _Np> __y)
+ {
+ const auto __xi = __to_intrin(__x);
+ const auto __yi = __to_intrin(__y);
+ if constexpr (__is_avx512_abi<_Abi>())
+ {
+ const auto __k1 = _Abi::template __implicit_mask<_Tp>();
+ if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 4)
+ return _mm512_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_LE_OQ);
+ else if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 8)
+ return _mm512_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_LE_OQ);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 4)
+ return _mm256_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_LE_OQ);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 8)
+ return _mm256_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_LE_OQ);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 4)
+ return _mm_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_LE_OQ);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 8)
+ return _mm_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_LE_OQ);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (__have_avx)
+ {
+ if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 4)
+ return _mm256_cmp_ps(__xi, __yi, _CMP_LE_OQ);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 8)
+ return _mm256_cmp_pd(__xi, __yi, _CMP_LE_OQ);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 4)
+ return __auto_bitcast(_mm_cmp_ps(__xi, __yi, _CMP_LE_OQ));
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 8)
+ return _mm_cmp_pd(__xi, __yi, _CMP_LE_OQ);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (__have_sse2 && sizeof(__xi) == 16 && sizeof(_Tp) == 4)
+ {
+ const auto __xn = __vector_bitcast<int>(__xi);
+ const auto __yn = __vector_bitcast<int>(__yi);
+ const auto __xp = __xn < 0 ? -(__xn & 0x7fff'ffff) : __xn;
+ const auto __yp = __yn < 0 ? -(__yn & 0x7fff'ffff) : __yn;
+ return __auto_bitcast(__and(_mm_cmpord_ps(__xi, __yi),
+ reinterpret_cast<__m128>(__xp <= __yp)));
+ }
+ else if constexpr (__have_sse2 && sizeof(__xi) == 16 && sizeof(_Tp) == 8)
+ return __auto_bitcast(__vector_type_t<__int_with_sizeof_t<8>, 2>{
+ -_mm_ucomige_sd(__yi, __xi),
+ -_mm_ucomige_sd(_mm_unpackhi_pd(__yi, __yi),
+ _mm_unpackhi_pd(__xi, __xi))});
+ else
+ return _Base::__islessequal(__x, __y);
+ }
+
+ // }}}
+ // __islessgreater {{{
+ template <typename _Tp, size_t _Np>
+ static constexpr _MaskMember<_Tp> __islessgreater(_SimdWrapper<_Tp, _Np> __x,
+ _SimdWrapper<_Tp, _Np> __y)
+ {
+ const auto __xi = __to_intrin(__x);
+ const auto __yi = __to_intrin(__y);
+ if constexpr (__is_avx512_abi<_Abi>())
+ {
+ const auto __k1 = _Abi::template __implicit_mask<_Tp>();
+ if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 4)
+ return _mm512_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_NEQ_OQ);
+ else if constexpr (sizeof(__xi) == 64 && sizeof(_Tp) == 8)
+ return _mm512_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_NEQ_OQ);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 4)
+ return _mm256_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_NEQ_OQ);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 8)
+ return _mm256_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_NEQ_OQ);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 4)
+ return _mm_mask_cmp_ps_mask(__k1, __xi, __yi, _CMP_NEQ_OQ);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 8)
+ return _mm_mask_cmp_pd_mask(__k1, __xi, __yi, _CMP_NEQ_OQ);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (__have_avx)
+ {
+ if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 4)
+ return _mm256_cmp_ps(__xi, __yi, _CMP_NEQ_OQ);
+ else if constexpr (sizeof(__xi) == 32 && sizeof(_Tp) == 8)
+ return _mm256_cmp_pd(__xi, __yi, _CMP_NEQ_OQ);
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 4)
+ return __auto_bitcast(_mm_cmp_ps(__xi, __yi, _CMP_NEQ_OQ));
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 8)
+ return _mm_cmp_pd(__xi, __yi, _CMP_NEQ_OQ);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 4)
+ return __auto_bitcast(
+ __and(_mm_cmpord_ps(__xi, __yi), _mm_cmpneq_ps(__xi, __yi)));
+ else if constexpr (sizeof(__xi) == 16 && sizeof(_Tp) == 8)
+ return __and(_mm_cmpord_pd(__xi, __yi), _mm_cmpneq_pd(__xi, __yi));
+ else
+ __assert_unreachable<_Tp>();
+ }
+
+ //}}}
+ //}}}
+};
+
+// }}}
+// _MaskImplX86Mixin {{{
+struct _MaskImplX86Mixin
+{
+ template <typename _Tp> using _TypeTag = _Tp*;
+ using _Base = _MaskImplBuiltinMixin;
+
+ // __to_maskvector(bool) {{{
+ template <typename _Up, size_t _ToN = 1, typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr enable_if_t<is_same_v<_Tp, bool>,
+ _SimdWrapper<_Up, _ToN>>
+ __to_maskvector(_Tp __x)
+ {
+ using _I = __int_for_sizeof_t<_Up>;
+ return __vector_bitcast<_Up>(__x ? __vector_type_t<_I, _ToN>{~_I()}
+ : __vector_type_t<_I, _ToN>{});
+ }
+
+ // }}}
+ // __to_maskvector(_SanitizedBitMask) {{{
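+  // Expand a bit mask (one bit per element) into a vector mask of _ToN elements of type _Up.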
+ template <typename _Up, size_t _UpN = 0, size_t _Np,
+ size_t _ToN = _UpN == 0 ? _Np : _UpN>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Up, _ToN>
+ __to_maskvector(_SanitizedBitMask<_Np> __x)
+ {
+ using _UV = __vector_type_t<_Up, _ToN>;
+ using _UI = __intrinsic_type_t<_Up, _ToN>;
+ [[maybe_unused]] const auto __k = __x._M_to_bits();
+ if constexpr (_Np == 1)
+ return __to_maskvector<_Up, _ToN>(__k);
+ else if (__x._M_is_constprop() || __builtin_is_constant_evaluated())
+ {
+ using _Ip = __int_for_sizeof_t<_Up>;
+ return __vector_bitcast<_Up>(
+ __generate_from_n_evaluations<std::min(_ToN, _Np),
+ __vector_type_t<_Ip, _ToN>>(
+ [&](auto __i) -> _Ip { return -__x[__i.value]; }));
+ }
+ else if constexpr (sizeof(_Up) == 1)
+ {
+ if constexpr (sizeof(_UI) == 16)
+ {
+ if constexpr (__have_avx512bw_vl)
+ return __intrin_bitcast<_UV>(_mm_movm_epi8(__k));
+ else if constexpr (__have_avx512bw)
+ return __intrin_bitcast<_UV>(__lo128(_mm512_movm_epi8(__k)));
+ else if constexpr (__have_avx512f)
+ {
+ auto __as32bits = _mm512_maskz_mov_epi32(__k, ~__m512i());
+ auto __as16bits = __xzyw(
+ _mm256_packs_epi32(__lo256(__as32bits), __hi256(__as32bits)));
+ return __intrin_bitcast<_UV>(
+ _mm_packs_epi16(__lo128(__as16bits), __hi128(__as16bits)));
+ }
+ else if constexpr (__have_ssse3)
+ {
+ const auto __bitmask = __to_intrin(
+ __make_vector<_UChar>(1, 2, 4, 8, 16, 32, 64, 128, 1, 2, 4, 8,
+ 16, 32, 64, 128));
+ return __intrin_bitcast<_UV>(
+ __vector_bitcast<_Up>(
+ _mm_shuffle_epi8(__to_intrin(
+ __vector_type_t<_ULLong, 2>{__k}),
+ _mm_setr_epi8(0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
+ 1, 1, 1, 1, 1, 1))
+ & __bitmask)
+ != 0);
+ }
+ // else fall through
+ }
+ else if constexpr (sizeof(_UI) == 32)
+ {
+ if constexpr (__have_avx512bw_vl)
+ return __vector_bitcast<_Up>(_mm256_movm_epi8(__k));
+ else if constexpr (__have_avx512bw)
+ return __vector_bitcast<_Up>(__lo256(_mm512_movm_epi8(__k)));
+ else if constexpr (__have_avx512f)
+ {
+ auto __as16bits = // 0 16 1 17 ... 15 31
+ _mm512_srli_epi32(_mm512_maskz_mov_epi32(__k, ~__m512i()), 16)
+ | _mm512_slli_epi32(_mm512_maskz_mov_epi32(__k >> 16,
+ ~__m512i()),
+ 16);
+ auto __0_16_1_17 = __xzyw(_mm256_packs_epi16(
+ __lo256(__as16bits),
+ __hi256(__as16bits)) // 0 16 1 17 2 18 3 19 8 24 9 25 ...
+ );
+ // deinterleave:
+ return __vector_bitcast<_Up>(__xzyw(_mm256_shuffle_epi8(
+ __0_16_1_17, // 0 16 1 17 2 ...
+ _mm256_setr_epi8(0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11,
+ 13, 15, 0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5,
+ 7, 9, 11, 13,
+ 15)))); // 0-7 16-23 8-15 24-31 -> xzyw
+ // 0-3 8-11 16-19 24-27
+ // 4-7 12-15 20-23 28-31
+ }
+ else if constexpr (__have_avx2)
+ {
+ const auto __bitmask = _mm256_broadcastsi128_si256(__to_intrin(
+ __make_vector<_UChar>(1, 2, 4, 8, 16, 32, 64, 128, 1, 2, 4, 8,
+ 16, 32, 64, 128)));
+ return __vector_bitcast<_Up>(
+ __vector_bitcast<_Up>(
+ _mm256_shuffle_epi8(
+ _mm256_broadcastsi128_si256(
+ __to_intrin(__vector_type_t<_ULLong, 2>{__k})),
+ _mm256_setr_epi8(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1,
+ 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3,
+ 3, 3, 3, 3))
+ & __bitmask)
+ != 0);
+ }
+ // else fall through
+ }
+ else if constexpr (sizeof(_UI) == 64)
+ return reinterpret_cast<__vector_type_t<_SChar, 64>>(
+ _mm512_movm_epi8(__k));
+ if constexpr (std::min(_ToN, _Np) <= 4)
+ {
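+            // spread the low 4 mask bits into 4 bytes: multiplying by 0x00204081 places
+            // bit i at bit 8*i, the AND keeps only those bits, and *0xff widens each
+            // resulting 0/1 byte to 0x00/0xff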
+ if constexpr (_Np > 7) // avoid overflow
+ __x &= _SanitizedBitMask<_Np>(0x0f);
+ const _UInt __char_mask
+ = ((_UInt(__x.to_ulong()) * 0x00204081U) & 0x01010101ULL) * 0xff;
+ __vector_type_t<_Up, _ToN> __r = {};
+ __builtin_memcpy(&__r, &__char_mask,
+ std::min(sizeof(__r), sizeof(__char_mask)));
+ return __r;
+ }
+ else if constexpr (std::min(_ToN, _Np) <= 7)
+ {
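+            // same trick for up to 7 bits: 0x40810204081 places bit i at bit 8*i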
+ if constexpr (_Np > 7) // avoid overflow
+ __x &= _SanitizedBitMask<_Np>(0x7f);
+ const _ULLong __char_mask
+ = ((__x.to_ulong() * 0x40810204081ULL) & 0x0101010101010101ULL)
+ * 0xff;
+ __vector_type_t<_Up, _ToN> __r = {};
+ __builtin_memcpy(&__r, &__char_mask,
+ std::min(sizeof(__r), sizeof(__char_mask)));
+ return __r;
+ }
+ }
+ else if constexpr (sizeof(_Up) == 2)
+ {
+ if constexpr (sizeof(_UI) == 16)
+ {
+ if constexpr (__have_avx512bw_vl)
+ return __intrin_bitcast<_UV>(_mm_movm_epi16(__k));
+ else if constexpr (__have_avx512bw)
+ return __intrin_bitcast<_UV>(__lo128(_mm512_movm_epi16(__k)));
+ else if constexpr (__have_avx512f)
+ {
+ __m256i __as32bits;
+ if constexpr (__have_avx512vl)
+ __as32bits = _mm256_maskz_mov_epi32(__k, ~__m256i());
+ else
+ __as32bits = __lo256(_mm512_maskz_mov_epi32(__k, ~__m512i()));
+ return __intrin_bitcast<_UV>(
+ _mm_packs_epi32(__lo128(__as32bits), __hi128(__as32bits)));
+ }
+ // else fall through
+ }
+ else if constexpr (sizeof(_UI) == 32)
+ {
+ if constexpr (__have_avx512bw_vl)
+ return __vector_bitcast<_Up>(_mm256_movm_epi16(__k));
+ else if constexpr (__have_avx512bw)
+ return __vector_bitcast<_Up>(__lo256(_mm512_movm_epi16(__k)));
+ else if constexpr (__have_avx512f)
+ {
+ auto __as32bits = _mm512_maskz_mov_epi32(__k, ~__m512i());
+ return __vector_bitcast<_Up>(
+ __xzyw(_mm256_packs_epi32(__lo256(__as32bits),
+ __hi256(__as32bits))));
+ }
+ // else fall through
+ }
+ else if constexpr (sizeof(_UI) == 64)
+ return __vector_bitcast<_Up>(_mm512_movm_epi16(__k));
+ }
+ else if constexpr (sizeof(_Up) == 4)
+ {
+ if constexpr (sizeof(_UI) == 16)
+ {
+ if constexpr (__have_avx512dq_vl)
+ return __intrin_bitcast<_UV>(_mm_movm_epi32(__k));
+ else if constexpr (__have_avx512dq)
+ return __intrin_bitcast<_UV>(__lo128(_mm512_movm_epi32(__k)));
+ else if constexpr (__have_avx512vl)
+ return __intrin_bitcast<_UV>(
+ _mm_maskz_mov_epi32(__k, ~__m128i()));
+ else if constexpr (__have_avx512f)
+ return __intrin_bitcast<_UV>(
+ __lo128(_mm512_maskz_mov_epi32(__k, ~__m512i())));
+ // else fall through
+ }
+ else if constexpr (sizeof(_UI) == 32)
+ {
+ if constexpr (__have_avx512dq_vl)
+ return __vector_bitcast<_Up>(_mm256_movm_epi32(__k));
+ else if constexpr (__have_avx512dq)
+ return __vector_bitcast<_Up>(__lo256(_mm512_movm_epi32(__k)));
+ else if constexpr (__have_avx512vl)
+ return __vector_bitcast<_Up>(
+ _mm256_maskz_mov_epi32(__k, ~__m256i()));
+ else if constexpr (__have_avx512f)
+ return __vector_bitcast<_Up>(
+ __lo256(_mm512_maskz_mov_epi32(__k, ~__m512i())));
+ // else fall through
+ }
+ else if constexpr (sizeof(_UI) == 64)
+ return __vector_bitcast<_Up>(
+ __have_avx512dq ? _mm512_movm_epi32(__k)
+ : _mm512_maskz_mov_epi32(__k, ~__m512i()));
+ }
+ else if constexpr (sizeof(_Up) == 8)
+ {
+ if constexpr (sizeof(_UI) == 16)
+ {
+ if constexpr (__have_avx512dq_vl)
+ return __vector_bitcast<_Up>(_mm_movm_epi64(__k));
+ else if constexpr (__have_avx512dq)
+ return __vector_bitcast<_Up>(__lo128(_mm512_movm_epi64(__k)));
+ else if constexpr (__have_avx512vl)
+ return __vector_bitcast<_Up>(
+ _mm_maskz_mov_epi64(__k, ~__m128i()));
+ else if constexpr (__have_avx512f)
+ return __vector_bitcast<_Up>(
+ __lo128(_mm512_maskz_mov_epi64(__k, ~__m512i())));
+ // else fall through
+ }
+ else if constexpr (sizeof(_UI) == 32)
+ {
+ if constexpr (__have_avx512dq_vl)
+ return __vector_bitcast<_Up>(_mm256_movm_epi64(__k));
+ else if constexpr (__have_avx512dq)
+ return __vector_bitcast<_Up>(__lo256(_mm512_movm_epi64(__k)));
+ else if constexpr (__have_avx512vl)
+ return __vector_bitcast<_Up>(
+ _mm256_maskz_mov_epi64(__k, ~__m256i()));
+ else if constexpr (__have_avx512f)
+ return __vector_bitcast<_Up>(
+ __lo256(_mm512_maskz_mov_epi64(__k, ~__m512i())));
+ // else fall through
+ }
+ else if constexpr (sizeof(_UI) == 64)
+ return __vector_bitcast<_Up>(
+ __have_avx512dq ? _mm512_movm_epi64(__k)
+ : _mm512_maskz_mov_epi64(__k, ~__m512i()));
+ }
+
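+      // generic fallback: broadcast the mask bits and, for every element, test the bit
+      // that belongs to it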
+ using _UpUInt = std::make_unsigned_t<__int_for_sizeof_t<_Up>>;
+ using _V = __vector_type_t<_UpUInt, _ToN>;
+ constexpr size_t __bits_per_element = sizeof(_Up) * CHAR_BIT;
+ if constexpr (_ToN == 2)
+ {
+ return __vector_bitcast<_Up>(_V{_UpUInt(-__x[0]), _UpUInt(-__x[1])});
+ }
+ else if constexpr (!__have_avx2 && __have_avx && sizeof(_V) == 32)
+ {
+ if constexpr (sizeof(_Up) == 4)
+ return __vector_bitcast<_Up>(_mm256_cmp_ps(
+ _mm256_and_ps(_mm256_castsi256_ps(_mm256_set1_epi32(__k)),
+ _mm256_castsi256_ps(_mm256_setr_epi32(
+ 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80))),
+ _mm256_setzero_ps(), _CMP_NEQ_UQ));
+ else if constexpr (sizeof(_Up) == 8)
+ return __vector_bitcast<_Up>(_mm256_cmp_pd(
+ _mm256_and_pd(_mm256_castsi256_pd(_mm256_set1_epi64x(__k)),
+ _mm256_castsi256_pd(
+ _mm256_setr_epi64x(0x01, 0x02, 0x04, 0x08))),
+ _mm256_setzero_pd(), _CMP_NEQ_UQ));
+ else
+ __assert_unreachable<_Up>();
+ }
+ else if constexpr (__bits_per_element >= _ToN)
+ {
+ constexpr auto __bitmask
+ = __generate_vector<__vector_type_t<_UpUInt, _ToN>>(
+ [](auto __i) constexpr->_UpUInt {
+ return __i < _ToN ? 1ull << __i : 0;
+ });
+ const auto __bits = __vector_broadcast<_ToN, _UpUInt>(__k) & __bitmask;
+ if constexpr (__bits_per_element > _ToN)
+ return __vector_bitcast<_Up>(
+ __vector_bitcast<__int_for_sizeof_t<_Up>>(__bits) > 0);
+ else
+ return __vector_bitcast<_Up>(__bits != 0);
+ }
+ else
+ {
+ const _V __tmp
+ = __generate_vector<_V>([&](auto __i) constexpr {
+ return static_cast<_UpUInt>(
+ __k >> (__bits_per_element * (__i / __bits_per_element)));
+ })
+ & __generate_vector<_V>([](auto __i) constexpr {
+ return static_cast<_UpUInt>(1ull << (__i % __bits_per_element));
+ }); // mask bit index
+ return __vector_bitcast<_Up>(__tmp != _V());
+ }
+ }
+
+ // }}}
+ // __to_maskvector(_SimdWrapper) {{{
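+  // Convert a vector (or bit) mask over _Tp elements into a vector mask over _Up elements,
+  // widening or narrowing the element size as needed.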
+ template <typename _Up, size_t _UpN = 0, typename _Tp, size_t _Np,
+ size_t _ToN = _UpN == 0 ? _Np : _UpN>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Up, _ToN>
+ __to_maskvector(_SimdWrapper<_Tp, _Np> __x)
+ {
+ using _TW = _SimdWrapper<_Tp, _Np>;
+ using _UW = _SimdWrapper<_Up, _ToN>;
+ using _UI = __intrinsic_type_t<_Up, _ToN>;
+ if constexpr (sizeof(_Up) == sizeof(_Tp) && sizeof(_TW) == sizeof(_UW))
+ if constexpr (_ToN <= _Np)
+ return __wrapper_bitcast<_Up, _ToN>(__x);
+ else
+ return simd_abi::deduce_t<_Up, _ToN>::__masked(
+ __wrapper_bitcast<_Up, _ToN>(__x));
+ else if constexpr (is_same_v<_Tp, bool>) // bits -> vector
+ return __to_maskvector<_Up, _ToN>(
+ _BitMask<_Np>(__x._M_data)._M_sanitized());
+ else
+ { // vector -> vector {{{
+ if (__x._M_is_constprop() || __builtin_is_constant_evaluated())
+ {
+ const auto __y = __vector_bitcast<__int_for_sizeof_t<_Tp>>(__x);
+ using _Ip = __int_for_sizeof_t<_Up>;
+ return __vector_bitcast<_Up>(
+ __generate_from_n_evaluations<std::min(_ToN, _Np),
+ __vector_type_t<_Ip, _ToN>>(
+ [&](auto __i) -> _Ip { return __y[__i.value]; }));
+ }
+ using _To = __vector_type_t<_Up, _ToN>;
+ [[maybe_unused]] constexpr size_t _FromN = _Np;
+ constexpr int _FromBytes = sizeof(_Tp);
+ constexpr int _ToBytes = sizeof(_Up);
+ const auto __k = __x._M_data;
+
+ if constexpr (_FromBytes == _ToBytes)
+ return __intrin_bitcast<_To>(__k);
+ else if constexpr (sizeof(_UI) == 16 && sizeof(__k) == 16)
+ { // SSE -> SSE {{{
+ if constexpr (_FromBytes == 4 && _ToBytes == 8)
+ return __intrin_bitcast<_To>(__interleave128_lo(__k, __k));
+ else if constexpr (_FromBytes == 2 && _ToBytes == 8)
+ {
+ const auto __y
+ = __vector_bitcast<int>(__interleave128_lo(__k, __k));
+ return __intrin_bitcast<_To>(__interleave128_lo(__y, __y));
+ }
+ else if constexpr (_FromBytes == 1 && _ToBytes == 8)
+ {
+ auto __y
+ = __vector_bitcast<short>(__interleave128_lo(__k, __k));
+ auto __z = __vector_bitcast<int>(__interleave128_lo(__y, __y));
+ return __intrin_bitcast<_To>(__interleave128_lo(__z, __z));
+ }
+ else if constexpr (_FromBytes == 8 && _ToBytes == 4 && __have_sse2)
+ return __intrin_bitcast<_To>(
+ _mm_packs_epi32(__vector_bitcast<_LLong>(__k), __m128i()));
+ else if constexpr (_FromBytes == 8 && _ToBytes == 4)
+ return __vector_shuffle<1, 3, 6, 7>(__vector_bitcast<_Up>(__k),
+ _UI());
+ else if constexpr (_FromBytes == 2 && _ToBytes == 4)
+ return __intrin_bitcast<_To>(__interleave128_lo(__k, __k));
+ else if constexpr (_FromBytes == 1 && _ToBytes == 4)
+ {
+ const auto __y
+ = __vector_bitcast<short>(__interleave128_lo(__k, __k));
+ return __intrin_bitcast<_To>(__interleave128_lo(__y, __y));
+ }
+ else if constexpr (_FromBytes == 8 && _ToBytes == 2)
+ {
+ if constexpr (__have_sse2 && !__have_ssse3)
+ return __intrin_bitcast<_To>(_mm_packs_epi32(
+ _mm_packs_epi32(__vector_bitcast<_LLong>(__k), __m128i()),
+ __m128i()));
+ else
+ return __intrin_bitcast<_To>(
+ __vector_permute<3, 7, -1, -1, -1, -1, -1, -1>(
+ __vector_bitcast<_Up>(__k)));
+ }
+ else if constexpr (_FromBytes == 4 && _ToBytes == 2)
+ return __intrin_bitcast<_To>(
+ _mm_packs_epi32(__vector_bitcast<_LLong>(__k), __m128i()));
+ else if constexpr (_FromBytes == 1 && _ToBytes == 2)
+ return __intrin_bitcast<_To>(__interleave128_lo(__k, __k));
+ else if constexpr (_FromBytes == 8 && _ToBytes == 1 && __have_ssse3)
+ return __intrin_bitcast<_To>(
+ _mm_shuffle_epi8(__vector_bitcast<_LLong>(__k),
+ _mm_setr_epi8(7, 15, -1, -1, -1, -1, -1, -1,
+ -1, -1, -1, -1, -1, -1, -1,
+ -1)));
+ else if constexpr (_FromBytes == 8 && _ToBytes == 1)
+ {
+ auto __y
+ = _mm_packs_epi32(__vector_bitcast<_LLong>(__k), __m128i());
+ __y = _mm_packs_epi32(__y, __m128i());
+ return __intrin_bitcast<_To>(_mm_packs_epi16(__y, __m128i()));
+ }
+ else if constexpr (_FromBytes == 4 && _ToBytes == 1 && __have_ssse3)
+ return __intrin_bitcast<_To>(
+ _mm_shuffle_epi8(__vector_bitcast<_LLong>(__k),
+ _mm_setr_epi8(3, 7, 11, 15, -1, -1, -1, -1, -1,
+ -1, -1, -1, -1, -1, -1, -1)));
+ else if constexpr (_FromBytes == 4 && _ToBytes == 1)
+ {
+ const auto __y
+ = _mm_packs_epi32(__vector_bitcast<_LLong>(__k), __m128i());
+ return __intrin_bitcast<_To>(_mm_packs_epi16(__y, __m128i()));
+ }
+ else if constexpr (_FromBytes == 2 && _ToBytes == 1)
+ return __intrin_bitcast<_To>(
+ _mm_packs_epi16(__vector_bitcast<_LLong>(__k), __m128i()));
+ else
+ __assert_unreachable<_Tp>();
+ } // }}}
+ else if constexpr (sizeof(_UI) == 32 && sizeof(__k) == 32)
+ { // AVX -> AVX {{{
+ if constexpr (_FromBytes == _ToBytes)
+ __assert_unreachable<_Tp>();
+ else if constexpr (_FromBytes == _ToBytes * 2)
+ {
+ const auto __y = __vector_bitcast<_LLong>(__k);
+ return __intrin_bitcast<_To>(_mm256_castsi128_si256(
+ _mm_packs_epi16(__lo128(__y), __hi128(__y))));
+ }
+ else if constexpr (_FromBytes == _ToBytes * 4)
+ {
+ const auto __y = __vector_bitcast<_LLong>(__k);
+ return __intrin_bitcast<_To>(_mm256_castsi128_si256(
+ _mm_packs_epi16(_mm_packs_epi16(__lo128(__y), __hi128(__y)),
+ __m128i())));
+ }
+ else if constexpr (_FromBytes == _ToBytes * 8)
+ {
+ const auto __y = __vector_bitcast<_LLong>(__k);
+ return __intrin_bitcast<_To>(_mm256_castsi128_si256(
+ _mm_shuffle_epi8(_mm_packs_epi16(__lo128(__y), __hi128(__y)),
+ _mm_setr_epi8(3, 7, 11, 15, -1, -1, -1, -1,
+ -1, -1, -1, -1, -1, -1, -1,
+ -1))));
+ }
+ else if constexpr (_FromBytes * 2 == _ToBytes)
+ {
+ auto __y = __xzyw(__to_intrin(__k));
+ if constexpr (std::is_floating_point_v<_Tp>)
+ return __intrin_bitcast<_To>(_mm256_unpacklo_ps(__y, __y));
+ else
+ return __intrin_bitcast<_To>(_mm256_unpacklo_epi8(__y, __y));
+ }
+ else if constexpr (_FromBytes * 4 == _ToBytes)
+ {
+ auto __y
+ = _mm_unpacklo_epi8(__lo128(__vector_bitcast<_LLong>(__k)),
+ __lo128(__vector_bitcast<_LLong>(
+ __k))); // drops 3/4 of input
+ return __intrin_bitcast<_To>(
+ __concat(_mm_unpacklo_epi16(__y, __y),
+ _mm_unpackhi_epi16(__y, __y)));
+ }
+ else if constexpr (_FromBytes == 1 && _ToBytes == 8)
+ {
+ auto __y
+ = _mm_unpacklo_epi8(__lo128(__vector_bitcast<_LLong>(__k)),
+ __lo128(__vector_bitcast<_LLong>(
+ __k))); // drops 3/4 of input
+ __y = _mm_unpacklo_epi16(__y,
+ __y); // drops another 1/2 => 7/8 total
+ return __intrin_bitcast<_To>(
+ __concat(_mm_unpacklo_epi32(__y, __y),
+ _mm_unpackhi_epi32(__y, __y)));
+ }
+ else
+ __assert_unreachable<_Tp>();
+ } // }}}
+ else if constexpr (sizeof(_UI) == 32 && sizeof(__k) == 16)
+ { // SSE -> AVX {{{
+ if constexpr (_FromBytes == _ToBytes)
+ return __intrin_bitcast<_To>(
+ __intrinsic_type_t<_Tp, 32 / sizeof(_Tp)>(
+ __zero_extend(__to_intrin(__k))));
+ else if constexpr (_FromBytes * 2 == _ToBytes)
+ { // keep all
+ return __intrin_bitcast<_To>(
+ __concat(_mm_unpacklo_epi8(__vector_bitcast<_LLong>(__k),
+ __vector_bitcast<_LLong>(__k)),
+ _mm_unpackhi_epi8(__vector_bitcast<_LLong>(__k),
+ __vector_bitcast<_LLong>(__k))));
+ }
+ else if constexpr (_FromBytes * 4 == _ToBytes)
+ {
+ if constexpr (__have_avx2)
+ {
+ return __intrin_bitcast<_To>(_mm256_shuffle_epi8(
+ __concat(__vector_bitcast<_LLong>(__k),
+ __vector_bitcast<_LLong>(__k)),
+ _mm256_setr_epi8(0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3,
+ 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6,
+ 7, 7, 7, 7)));
+ }
+ else
+ {
+ return __intrin_bitcast<_To>(__concat(
+ _mm_shuffle_epi8(__vector_bitcast<_LLong>(__k),
+ _mm_setr_epi8(0, 0, 0, 0, 1, 1, 1, 1, 2,
+ 2, 2, 2, 3, 3, 3, 3)),
+ _mm_shuffle_epi8(__vector_bitcast<_LLong>(__k),
+ _mm_setr_epi8(4, 4, 4, 4, 5, 5, 5, 5, 6,
+ 6, 6, 6, 7, 7, 7, 7))));
+ }
+ }
+ else if constexpr (_FromBytes * 8 == _ToBytes)
+ {
+ if constexpr (__have_avx2)
+ {
+ return __intrin_bitcast<_To>(_mm256_shuffle_epi8(
+ __concat(__vector_bitcast<_LLong>(__k),
+ __vector_bitcast<_LLong>(__k)),
+ _mm256_setr_epi8(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1,
+ 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3,
+ 3, 3, 3, 3)));
+ }
+ else
+ {
+ return __intrin_bitcast<_To>(__concat(
+ _mm_shuffle_epi8(__vector_bitcast<_LLong>(__k),
+ _mm_setr_epi8(0, 0, 0, 0, 0, 0, 0, 0, 1,
+ 1, 1, 1, 1, 1, 1, 1)),
+ _mm_shuffle_epi8(__vector_bitcast<_LLong>(__k),
+ _mm_setr_epi8(2, 2, 2, 2, 2, 2, 2, 2, 3,
+ 3, 3, 3, 3, 3, 3, 3))));
+ }
+ }
+ else if constexpr (_FromBytes == _ToBytes * 2)
+ return __intrin_bitcast<_To>(__m256i(__zero_extend(
+ _mm_packs_epi16(__vector_bitcast<_LLong>(__k), __m128i()))));
+ else if constexpr (_FromBytes == 8 && _ToBytes == 2)
+ {
+ return __intrin_bitcast<_To>(__m256i(__zero_extend(
+ _mm_shuffle_epi8(__vector_bitcast<_LLong>(__k),
+ _mm_setr_epi8(6, 7, 14, 15, -1, -1, -1, -1,
+ -1, -1, -1, -1, -1, -1, -1,
+ -1)))));
+ }
+ else if constexpr (_FromBytes == 4 && _ToBytes == 1)
+ {
+ return __intrin_bitcast<_To>(__m256i(__zero_extend(
+ _mm_shuffle_epi8(__vector_bitcast<_LLong>(__k),
+ _mm_setr_epi8(3, 7, 11, 15, -1, -1, -1, -1,
+ -1, -1, -1, -1, -1, -1, -1,
+ -1)))));
+ }
+ else if constexpr (_FromBytes == 8 && _ToBytes == 1)
+ {
+ return __intrin_bitcast<_To>(__m256i(__zero_extend(
+ _mm_shuffle_epi8(__vector_bitcast<_LLong>(__k),
+ _mm_setr_epi8(7, 15, -1, -1, -1, -1, -1, -1,
+ -1, -1, -1, -1, -1, -1, -1,
+ -1)))));
+ }
+ else
+ static_assert(!std::is_same_v<_Tp, _Tp>, "should be unreachable");
+ } // }}}
+ else if constexpr (sizeof(_UI) == 16 && sizeof(__k) == 32)
+ { // AVX -> SSE {{{
+ if constexpr (_FromBytes == _ToBytes)
+ { // keep low 1/2
+ return __intrin_bitcast<_To>(__lo128(__k));
+ }
+ else if constexpr (_FromBytes == _ToBytes * 2)
+ { // keep all
+ auto __y = __vector_bitcast<_LLong>(__k);
+ return __intrin_bitcast<_To>(
+ _mm_packs_epi16(__lo128(__y), __hi128(__y)));
+ }
+ else if constexpr (_FromBytes == _ToBytes * 4)
+ { // add 1/2 undef
+ auto __y = __vector_bitcast<_LLong>(__k);
+ return __intrin_bitcast<_To>(
+ _mm_packs_epi16(_mm_packs_epi16(__lo128(__y), __hi128(__y)),
+ __m128i()));
+ }
+ else if constexpr (_FromBytes == 8 && _ToBytes == 1)
+ { // add 3/4 undef
+ auto __y = __vector_bitcast<_LLong>(__k);
+ return __intrin_bitcast<_To>(
+ _mm_shuffle_epi8(_mm_packs_epi16(__lo128(__y), __hi128(__y)),
+ _mm_setr_epi8(3, 7, 11, 15, -1, -1, -1, -1,
+ -1, -1, -1, -1, -1, -1, -1,
+ -1)));
+ }
+ else if constexpr (_FromBytes * 2 == _ToBytes)
+ { // keep low 1/4
+ auto __y = __lo128(__vector_bitcast<_LLong>(__k));
+ return __intrin_bitcast<_To>(_mm_unpacklo_epi8(__y, __y));
+ }
+ else if constexpr (_FromBytes * 4 == _ToBytes)
+ { // keep low 1/8
+ auto __y = __lo128(__vector_bitcast<_LLong>(__k));
+ __y = _mm_unpacklo_epi8(__y, __y);
+ return __intrin_bitcast<_To>(_mm_unpacklo_epi8(__y, __y));
+ }
+ else if constexpr (_FromBytes * 8 == _ToBytes)
+ { // keep low 1/16
+ auto __y = __lo128(__vector_bitcast<_LLong>(__k));
+ __y = _mm_unpacklo_epi8(__y, __y);
+ __y = _mm_unpacklo_epi8(__y, __y);
+ return __intrin_bitcast<_To>(_mm_unpacklo_epi8(__y, __y));
+ }
+ else
+ static_assert(!std::is_same_v<_Tp, _Tp>, "should be unreachable");
+ } // }}}
+ else
+ return _Base::template __to_maskvector<_Up, _ToN>(__x);
+ /*
+ if constexpr (_FromBytes > _ToBytes) {
+ const _To __y = __vector_bitcast<_Up>(__k);
+ return [&] <std::size_t... _Is> (std::index_sequence<_Is...>) {
+ constexpr int _Stride = _FromBytes / _ToBytes;
+ return _To{__y[(_Is + 1) * _Stride - 1]...};
+ }(std::make_index_sequence<std::min(_ToN, _FromN)>());
+ } else {
+ // {0, 0, 1, 1} (_Dups = 2, _Is<4>)
+ // {0, 0, 0, 0, 1, 1, 1, 1} (_Dups = 4, _Is<8>)
+ // {0, 0, 1, 1, 2, 2, 3, 3} (_Dups = 2, _Is<8>)
+ // ...
+ return [&] <std::size_t... _Is> (std::index_sequence<_Is...>) {
+ constexpr int __dup = _ToBytes / _FromBytes;
+ return __intrin_bitcast<_To>(_From{__k[_Is / __dup]...});
+ }(std::make_index_sequence<_FromN>());
+ }
+ */
+ } // }}}
+ }
+
+ // }}}
+ // __to_bits {{{
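+  // Collapse a vector mask (or an AVX-512 bit mask) into a bit mask with one bit per element.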
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SanitizedBitMask<_Np>
+ __to_bits(_SimdWrapper<_Tp, _Np> __x)
+ {
+ if constexpr (is_same_v<_Tp, bool>)
+ return _BitMask<_Np>(__x._M_data)._M_sanitized();
+ else
+ {
+ if (__builtin_is_constant_evaluated() || __builtin_constant_p(__x._M_data))
+ {
+ using _I = __int_for_sizeof_t<_Tp>;
+ const auto __bools = -__vector_bitcast<_I>(__x);
+ _ULLong __k = 0;
+ __execute_n_times<_Np>([&](auto __i) {
+ __k |= (_ULLong(__bools[int(__i)]) << __i);
+ });
+              if (__builtin_is_constant_evaluated() || __builtin_constant_p(__k))
+ return __k;
+ }
+ const auto __xi = __to_intrin(__x);
+ if constexpr (is_floating_point_v<_Tp>)
+ if constexpr (sizeof(_Tp) == 4) // float
+ if constexpr (sizeof(__xi) == 16)
+ return _BitMask<_Np>(_mm_movemask_ps(__xi));
+ else if constexpr (sizeof(__xi) == 32)
+ return _BitMask<_Np>(_mm256_movemask_ps(__xi));
+ else if constexpr (__have_avx512dq)
+ return _BitMask<_Np>(
+ _mm512_movepi32_mask(reinterpret_cast<__m512i>(__xi)));
+ else
+ return _BitMask<_Np>(
+ _mm512_cmp_ps_mask(__xi, __xi, _CMP_UNORD_Q));
+ else // implies double
+ if constexpr (sizeof(__xi) == 16)
+ return _BitMask<_Np>(_mm_movemask_pd(__xi));
+ else if constexpr (sizeof(__xi) == 32)
+ return _BitMask<_Np>(_mm256_movemask_pd(__xi));
+ else if constexpr (__have_avx512dq)
+ return _BitMask<_Np>(
+ _mm512_movepi64_mask(reinterpret_cast<__m512i>(__xi)));
+ else
+ return _BitMask<_Np>(_mm512_cmp_pd_mask(__xi, __xi, _CMP_UNORD_Q));
+
+ else if constexpr (sizeof(_Tp) == 1)
+ if constexpr (sizeof(__xi) == 16)
+ if constexpr (__have_avx512bw_vl)
+ return _BitMask<_Np>(_mm_movepi8_mask(__xi));
+ else // implies SSE2
+ return _BitMask<_Np>(_mm_movemask_epi8(__xi));
+ else if constexpr (sizeof(__xi) == 32)
+ if constexpr (__have_avx512bw_vl)
+ return _BitMask<_Np>(_mm256_movepi8_mask(__xi));
+ else // implies AVX2
+ return _BitMask<_Np>(_mm256_movemask_epi8(__xi));
+ else // implies AVX512BW
+ return _BitMask<_Np>(_mm512_movepi8_mask(__xi));
+
+ else if constexpr (sizeof(_Tp) == 2)
+ if constexpr (sizeof(__xi) == 16)
+ if constexpr (__have_avx512bw_vl)
+ return _BitMask<_Np>(_mm_movepi16_mask(__xi));
+ else if constexpr (__have_avx512bw)
+ return _BitMask<_Np>(_mm512_movepi16_mask(__zero_extend(__xi)));
+ else // implies SSE2
+ return _BitMask<_Np>(
+ _mm_movemask_epi8(_mm_packs_epi16(__xi, __m128i())));
+ else if constexpr (sizeof(__xi) == 32)
+ if constexpr (__have_avx512bw_vl)
+ return _BitMask<_Np>(_mm256_movepi16_mask(__xi));
+ else if constexpr (__have_avx512bw)
+ return _BitMask<_Np>(_mm512_movepi16_mask(__zero_extend(__xi)));
+ else // implies SSE2
+ return _BitMask<_Np>(_mm_movemask_epi8(
+ _mm_packs_epi16(__lo128(__xi), __hi128(__xi))));
+ else // implies AVX512BW
+ return _BitMask<_Np>(_mm512_movepi16_mask(__xi));
+
+ else if constexpr (sizeof(_Tp) == 4)
+ if constexpr (sizeof(__xi) == 16)
+ if constexpr (__have_avx512dq_vl)
+ return _BitMask<_Np>(_mm_movepi32_mask(__xi));
+ else if constexpr (__have_avx512vl)
+ return _BitMask<_Np>(_mm_cmplt_epi32_mask(__xi, __m128i()));
+ else if constexpr (__have_avx512dq)
+ return _BitMask<_Np>(_mm512_movepi32_mask(__zero_extend(__xi)));
+ else if constexpr (__have_avx512f)
+ return _BitMask<_Np>(
+ _mm512_cmplt_epi32_mask(__zero_extend(__xi), __m512i()));
+ else // implies SSE
+ return _BitMask<_Np>(
+ _mm_movemask_ps(reinterpret_cast<__m128>(__xi)));
+ else if constexpr (sizeof(__xi) == 32)
+ if constexpr (__have_avx512dq_vl)
+ return _BitMask<_Np>(_mm256_movepi32_mask(__xi));
+ else if constexpr (__have_avx512dq)
+ return _BitMask<_Np>(_mm512_movepi32_mask(__zero_extend(__xi)));
+ else if constexpr (__have_avx512vl)
+ return _BitMask<_Np>(_mm256_cmplt_epi32_mask(__xi, __m256i()));
+ else if constexpr (__have_avx512f)
+ return _BitMask<_Np>(
+ _mm512_cmplt_epi32_mask(__zero_extend(__xi), __m512i()));
+ else // implies AVX
+ return _BitMask<_Np>(
+ _mm256_movemask_ps(reinterpret_cast<__m256>(__xi)));
+          else // a 64-byte vector implies at least AVX-512F
+ if constexpr (__have_avx512dq)
+ return _BitMask<_Np>(_mm512_movepi32_mask(__xi));
+ else // implies AVX512F
+ return _BitMask<_Np>(_mm512_cmplt_epi32_mask(__xi, __m512i()));
+
+ else if constexpr (sizeof(_Tp) == 8)
+ if constexpr (sizeof(__xi) == 16)
+ if constexpr (__have_avx512dq_vl)
+ return _BitMask<_Np>(_mm_movepi64_mask(__xi));
+ else if constexpr (__have_avx512dq)
+ return _BitMask<_Np>(_mm512_movepi64_mask(__zero_extend(__xi)));
+ else if constexpr (__have_avx512vl)
+ return _BitMask<_Np>(_mm_cmplt_epi64_mask(__xi, __m128i()));
+ else if constexpr (__have_avx512f)
+ return _BitMask<_Np>(
+ _mm512_cmplt_epi64_mask(__zero_extend(__xi), __m512i()));
+ else // implies SSE2
+ return _BitMask<_Np>(
+ _mm_movemask_pd(reinterpret_cast<__m128d>(__xi)));
+ else if constexpr (sizeof(__xi) == 32)
+ if constexpr (__have_avx512dq_vl)
+ return _BitMask<_Np>(_mm256_movepi64_mask(__xi));
+ else if constexpr (__have_avx512dq)
+ return _BitMask<_Np>(_mm512_movepi64_mask(__zero_extend(__xi)));
+ else if constexpr (__have_avx512vl)
+ return _BitMask<_Np>(_mm256_cmplt_epi64_mask(__xi, __m256i()));
+ else if constexpr (__have_avx512f)
+ return _BitMask<_Np>(
+ _mm512_cmplt_epi64_mask(__zero_extend(__xi), __m512i()));
+ else // implies AVX
+ return _BitMask<_Np>(
+ _mm256_movemask_pd(reinterpret_cast<__m256d>(__xi)));
+          else // a 64-byte vector implies at least AVX-512F
+ if constexpr (__have_avx512dq)
+ return _BitMask<_Np>(_mm512_movepi64_mask(__xi));
+ else // implies AVX512F
+ return _BitMask<_Np>(_mm512_cmplt_epi64_mask(__xi, __m512i()));
+
+ else
+ __assert_unreachable<_Tp>();
+ }
+ }
+ // }}}
+};
+
+// }}}
+// _MaskImplX86 {{{
+template <typename _Abi>
+struct _MaskImplX86 : _MaskImplX86Mixin, _MaskImplBuiltin<_Abi>
+{
+ using _MaskImplX86Mixin::__to_bits;
+ using _MaskImplX86Mixin::__to_maskvector;
+ using _MaskImplBuiltin<_Abi>::__convert;
+
+ // member types {{{
+ template <typename _Tp>
+ using _SimdMember = typename _Abi::template __traits<_Tp>::_SimdMember;
+ template <typename _Tp>
+ using _MaskMember = typename _Abi::template __traits<_Tp>::_MaskMember;
+ template <typename _Tp> static constexpr size_t size = simd_size_v<_Tp, _Abi>;
+ using _Base = _MaskImplBuiltin<_Abi>;
+
+ // }}}
+ // __broadcast {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp>
+ __broadcast(bool __x)
+ {
+ if constexpr (__is_avx512_abi<_Abi>())
+ return __x ? _Abi::__masked(_MaskMember<_Tp>(-1)) : _MaskMember<_Tp>();
+ else
+ return _Base::template __broadcast<_Tp>(__x);
+ }
+
+ // }}}
+ // __load {{{
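+  // Load size<_Tp> bool values from __bool_mem and convert them to this ABI's mask
+  // representation.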
+ template <typename _Tp, typename _Flags>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _MaskMember<_Tp>
+ __load(const bool* __bool_mem)
+ {
+ const void* __mem = __bool_mem;
+ if constexpr (is_same_v<_Flags, vector_aligned_tag>)
+ __mem
+ = __builtin_assume_aligned(__mem,
+ memory_alignment_v<simd_mask<_Tp, _Abi>>);
+ else if constexpr (!is_same_v<_Flags, element_aligned_tag>)
+ __mem = __builtin_assume_aligned(__mem, _Flags::_S_alignment);
+
+ if constexpr (__have_avx512bw)
+ {
+ const auto __to_vec_or_bits = [](auto __bits) -> decltype(auto) {
+ if constexpr (__is_avx512_abi<_Abi>())
+ return __bits;
+ else
+ return __to_maskvector<_Tp>(
+ _BitMask<size<_Tp>>(__bits)._M_sanitized());
+ };
+
+ if constexpr (size<_Tp> <= 16 && __have_avx512vl)
+ {
+ __m128i __a = {};
+ __builtin_memcpy(&__a, __mem, size<_Tp>);
+ return __to_vec_or_bits(_mm_test_epi8_mask(__a, __a));
+ }
+ else if constexpr (size<_Tp> <= 32 && __have_avx512vl)
+ {
+ __m256i __a = {};
+ __builtin_memcpy(&__a, __mem, size<_Tp>);
+ return __to_vec_or_bits(_mm256_test_epi8_mask(__a, __a));
+ }
+ else if constexpr (size<_Tp> <= 64)
+ {
+ __m512i __a = {};
+ __builtin_memcpy(&__a, __mem, size<_Tp>);
+ return __to_vec_or_bits(_mm512_test_epi8_mask(__a, __a));
+ }
+ }
+ else if constexpr (__is_avx512_abi<_Abi>())
+ {
+ if constexpr (size<_Tp> <= 8)
+ {
+ __m128i __a = {};
+ __builtin_memcpy(&__a, __mem, size<_Tp>);
+ const auto __b = _mm512_cvtepi8_epi64(__a);
+ return _mm512_test_epi64_mask(__b, __b);
+ }
+ else if constexpr (size<_Tp> <= 16)
+ {
+ __m128i __a = {};
+ __builtin_memcpy(&__a, __mem, size<_Tp>);
+ const auto __b = _mm512_cvtepi8_epi32(__a);
+ return _mm512_test_epi32_mask(__b, __b);
+ }
+ else if constexpr (size<_Tp> <= 32)
+ {
+ __m128i __a = {};
+ __builtin_memcpy(&__a, __mem, 16);
+ const auto __b = _mm512_cvtepi8_epi32(__a);
+ __builtin_memcpy(&__a, __mem + 16, size<_Tp> - 16);
+ const auto __c = _mm512_cvtepi8_epi32(__a);
+ return _mm512_test_epi32_mask(__b, __b)
+ | (_mm512_test_epi32_mask(__c, __c) << 16);
+ }
+ else if constexpr (size<_Tp> <= 64)
+ {
+ __m128i __a = {};
+ __builtin_memcpy(&__a, __mem, 16);
+ const auto __b = _mm512_cvtepi8_epi32(__a);
+ __builtin_memcpy(&__a, __mem + 16, 16);
+ const auto __c = _mm512_cvtepi8_epi32(__a);
+ if constexpr (size<_Tp> <= 48)
+ {
+ __builtin_memcpy(&__a, __mem + 32, size<_Tp> - 32);
+ const auto __d = _mm512_cvtepi8_epi32(__a);
+ return _mm512_test_epi32_mask(__b, __b)
+ | (_mm512_test_epi32_mask(__c, __c) << 16)
+ | (_ULLong(_mm512_test_epi32_mask(__d, __d)) << 32);
+ }
+ else
+ {
+                  __builtin_memcpy(&__a, __mem + 32, 16);
+                  const auto __d = _mm512_cvtepi8_epi32(__a);
+                  __builtin_memcpy(&__a, __mem + 48, size<_Tp> - 48);
+ const auto __e = _mm512_cvtepi8_epi32(__a);
+ return _mm512_test_epi32_mask(__b, __b)
+ | (_mm512_test_epi32_mask(__c, __c) << 16)
+ | (_ULLong(_mm512_test_epi32_mask(__d, __d)) << 32)
+ | (_ULLong(_mm512_test_epi32_mask(__e, __e)) << 48);
+ }
+ }
+ else
+ __assert_unreachable<_Flags>();
+ }
+ else if constexpr (sizeof(_Tp) == 8 && size<_Tp> == 2)
+ return __vector_bitcast<_Tp>(
+ __vector_type16_t<int>{-int(__bool_mem[0]), -int(__bool_mem[0]),
+ -int(__bool_mem[1]), -int(__bool_mem[1])});
+ else if constexpr (sizeof(_Tp) == 8 && size<_Tp> <= 4 && __have_avx)
+ {
+ int __bool4;
+ __builtin_memcpy(&__bool4, __mem, size<_Tp>);
+ const auto __k
+ = __to_intrin((__vector_broadcast<4>(__bool4)
+ & __make_vector<int>(0x1, 0x100, 0x10000,
+ size<_Tp> == 4 ? 0x1000000 : 0))
+ != 0);
+ return __vector_bitcast<_Tp>(
+ __concat(_mm_unpacklo_epi32(__k, __k), _mm_unpackhi_epi32(__k, __k)));
+ }
+ else if constexpr (sizeof(_Tp) == 4 && size<_Tp> <= 4)
+ {
+ int __bools = 0;
+ __builtin_memcpy(&__bools, __mem, size<_Tp>);
+ if constexpr (__have_sse2)
+ {
+ __m128i __k = _mm_cvtsi32_si128(__bools);
+ __k = _mm_cmpgt_epi16(_mm_unpacklo_epi8(__k, __k), __m128i());
+ return __vector_bitcast<_Tp, size<_Tp>>(
+ _mm_unpacklo_epi16(__k, __k));
+ }
+ else
+ {
+ __m128 __k = _mm_cvtpi8_ps(_mm_cvtsi32_si64(__bools));
+ _mm_empty();
+ return __vector_bitcast<_Tp, size<_Tp>>(
+ _mm_cmpgt_ps(__k, __m128()));
+ }
+ }
+ else if constexpr (sizeof(_Tp) == 4 && size<_Tp> <= 8)
+ {
+ __m128i __k = {};
+ __builtin_memcpy(&__k, __mem, size<_Tp>);
+ __k = _mm_cmpgt_epi16(_mm_unpacklo_epi8(__k, __k), __m128i());
+ return __vector_bitcast<_Tp>(
+ __concat(_mm_unpacklo_epi16(__k, __k), _mm_unpackhi_epi16(__k, __k)));
+ }
+ else if constexpr (sizeof(_Tp) == 2 && size<_Tp> <= 16)
+ {
+ __m128i __k = {};
+ __builtin_memcpy(&__k, __mem, size<_Tp>);
+ __k = _mm_cmpgt_epi8(__k, __m128i());
+ if constexpr (size<_Tp> <= 8)
+ return __vector_bitcast<_Tp, size<_Tp>>(_mm_unpacklo_epi8(__k, __k));
+ else
+ return __concat(_mm_unpacklo_epi8(__k, __k),
+ _mm_unpackhi_epi8(__k, __k));
+ }
+ else
+ return _Base::template __load<_Tp, _Flags>(__bool_mem);
+ }
+
+ // }}}
+ // __from_bitmask{{{
+ template <size_t _Np, typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static _MaskMember<_Tp>
+ __from_bitmask(_SanitizedBitMask<_Np> __bits, _TypeTag<_Tp>)
+ {
+ if constexpr (__is_avx512_abi<_Abi>())
+ return __bits._M_to_bits();
+ else
+ return __to_maskvector<_Tp, size<_Tp>>(__bits);
+ }
+
+ // }}}
+ // __masked_load {{{2
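+  // Load bools only for the elements selected by __mask, keeping __merge elsewhere.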
+ template <typename _Tp, size_t _Np, typename _Fp>
+ static inline _SimdWrapper<_Tp, _Np>
+ __masked_load(_SimdWrapper<_Tp, _Np> __merge, _SimdWrapper<_Tp, _Np> __mask,
+ const bool* __mem, _Fp) noexcept
+ {
+ if constexpr (__is_avx512_abi<_Abi>())
+ {
+ if constexpr (__have_avx512bw_vl)
+ {
+ if constexpr (_Np <= 16)
+ {
+ const auto __a = _mm_mask_loadu_epi8(__m128i(), __mask, __mem);
+ return (__merge & ~__mask) | _mm_test_epi8_mask(__a, __a);
+ }
+ else if constexpr (_Np <= 32)
+ {
+ const auto __a
+ = _mm256_mask_loadu_epi8(__m256i(), __mask, __mem);
+ return (__merge & ~__mask) | _mm256_test_epi8_mask(__a, __a);
+ }
+ else if constexpr (_Np <= 64)
+ {
+ const auto __a
+ = _mm512_mask_loadu_epi8(__m512i(), __mask, __mem);
+ return (__merge & ~__mask) | _mm512_test_epi8_mask(__a, __a);
+ }
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else
+ {
+ _BitOps::__bit_iteration(__mask, [&](auto __i) {
+ __merge.__set(__i, __mem[__i]);
+ });
+ return __merge;
+ }
+ }
+ else if constexpr (__have_avx512bw_vl && _Np == 32 && sizeof(_Tp) == 1)
+ {
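+          // masked subtract from zero: 0 - b is 0 or -1, so the loaded bools become mask
+          // elements in the selected lanes while unselected lanes keep their __merge value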
+ const auto __k = __to_bits(__mask)._M_to_bits();
+ __merge
+ = _mm256_mask_sub_epi8(__to_intrin(__merge), __k, __m256i(),
+ _mm256_mask_loadu_epi8(__m256i(), __k, __mem));
+ }
+ else if constexpr (__have_avx512bw_vl && _Np == 16 && sizeof(_Tp) == 1)
+ {
+ const auto __k = __to_bits(__mask)._M_to_bits();
+ __merge
+ = _mm_mask_sub_epi8(__vector_bitcast<_LLong>(__merge), __k, __m128i(),
+ _mm_mask_loadu_epi8(__m128i(), __k, __mem));
+ }
+ else if constexpr (__have_avx512bw_vl && _Np == 16 && sizeof(_Tp) == 2)
+ {
+ const auto __k = __to_bits(__mask)._M_to_bits();
+ __merge = _mm256_mask_sub_epi16(
+ __vector_bitcast<_LLong>(__merge), __k, __m256i(),
+ _mm256_cvtepi8_epi16(_mm_mask_loadu_epi8(__m128i(), __k, __mem)));
+ }
+ else if constexpr (__have_avx512bw_vl && _Np == 8 && sizeof(_Tp) == 2)
+ {
+ const auto __k = __to_bits(__mask)._M_to_bits();
+ __merge = _mm_mask_sub_epi16(
+ __vector_bitcast<_LLong>(__merge), __k, __m128i(),
+ _mm_cvtepi8_epi16(_mm_mask_loadu_epi8(__m128i(), __k, __mem)));
+ }
+ else if constexpr (__have_avx512bw_vl && _Np == 8 && sizeof(_Tp) == 4)
+ {
+ const auto __k = __to_bits(__mask)._M_to_bits();
+ __merge = __vector_bitcast<_Tp>(_mm256_mask_sub_epi32(
+ __vector_bitcast<_LLong>(__merge), __k, __m256i(),
+ _mm256_cvtepi8_epi32(_mm_mask_loadu_epi8(__m128i(), __k, __mem))));
+ }
+ else if constexpr (__have_avx512bw_vl && _Np == 4 && sizeof(_Tp) == 4)
+ {
+ const auto __k = __to_bits(__mask)._M_to_bits();
+ __merge = __vector_bitcast<_Tp>(_mm_mask_sub_epi32(
+ __vector_bitcast<_LLong>(__merge), __k, __m128i(),
+ _mm_cvtepi8_epi32(_mm_mask_loadu_epi8(__m128i(), __k, __mem))));
+ }
+ else if constexpr (__have_avx512bw_vl && _Np == 4 && sizeof(_Tp) == 8)
+ {
+ const auto __k = __to_bits(__mask)._M_to_bits();
+ __merge = __vector_bitcast<_Tp>(_mm256_mask_sub_epi64(
+ __vector_bitcast<_LLong>(__merge), __k, __m256i(),
+ _mm256_cvtepi8_epi64(_mm_mask_loadu_epi8(__m128i(), __k, __mem))));
+ }
+ else if constexpr (__have_avx512bw_vl && _Np == 2 && sizeof(_Tp) == 8)
+ {
+ const auto __k = __to_bits(__mask)._M_to_bits();
+ __merge = __vector_bitcast<_Tp>(_mm_mask_sub_epi64(
+ __vector_bitcast<_LLong>(__merge), __k, __m128i(),
+ _mm_cvtepi8_epi64(_mm_mask_loadu_epi8(__m128i(), __k, __mem))));
+ }
+ else
+ {
+ return _Base::__masked_load(__merge, __mask, __mem, _Fp{});
+ }
+ return __merge;
+ }
+
+ // __store {{{2
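+  // Store the mask to __mem as one bool (byte) per element.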
+ template <typename _Tp, size_t _Np, typename _Fp>
+ _GLIBCXX_SIMD_INTRINSIC static void __store(_SimdWrapper<_Tp, _Np> __v,
+ bool* __mem, _Fp) noexcept
+ {
+ if constexpr (__is_avx512_abi<_Abi>())
+ {
+ if constexpr (__have_avx512bw_vl)
+ _CommonImplX86::__store<_Np>(
+ __vector_bitcast<char>([](auto __data) {
+ if constexpr (_Np <= 16)
+ return _mm_maskz_set1_epi8(__data, 1);
+ else if constexpr (_Np <= 32)
+ return _mm256_maskz_set1_epi8(__data, 1);
+ else
+ return _mm512_maskz_set1_epi8(__data, 1);
+ }(__v._M_data)),
+ __mem, _Fp());
+ else if constexpr (_Np <= 8)
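+          // _pdep deposits mask bit i into bit 8*i, i.e. into the low bit of byte i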
+ _CommonImplX86::__store<_Np>(
+ __vector_bitcast<char>(
+#if defined __x86_64__
+ __make_wrapper<_ULLong>(
+ _pdep_u64(__v._M_data, 0x0101010101010101ULL), 0ull)
+#else
+ __make_wrapper<_UInt>(_pdep_u32(__v._M_data, 0x01010101U),
+ _pdep_u32(__v._M_data >> 4, 0x01010101U))
+#endif
+ ),
+ __mem, _Fp());
+ else if constexpr (_Np <= 16)
+ _mm512_mask_cvtepi32_storeu_epi8(__mem, 0xffffu >> (16 - _Np),
+ _mm512_maskz_set1_epi32(__v._M_data,
+ 1));
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (__is_sse_abi<_Abi>()) //{{{
+ {
+ if constexpr (_Np == 2 && sizeof(_Tp) == 8)
+ {
+ const auto __k = __vector_bitcast<int>(__v);
+ __mem[0] = -__k[1];
+ __mem[1] = -__k[3];
+ }
+ else if constexpr (_Np <= 4 && sizeof(_Tp) == 4)
+ {
+ if constexpr (__have_sse2)
+ {
+ const unsigned __bool4
+ = __vector_bitcast<_UInt>(
+ _mm_packs_epi16(_mm_packs_epi32(__intrin_bitcast<__m128i>(
+ __to_intrin(__v)),
+ __m128i()),
+ __m128i()))[0]
+ & 0x01010101u;
+ __builtin_memcpy(__mem, &__bool4, _Np);
+ }
+ else if constexpr (__have_mmx)
+ {
+ const __m64 __k
+ = _mm_cvtps_pi8(__and(__to_intrin(__v), _mm_set1_ps(1.f)));
+ __builtin_memcpy(__mem, &__k, _Np);
+ _mm_empty();
+ }
+ else
+ return _Base::__store(__v, __mem, _Fp());
+ }
+ else if constexpr (_Np <= 8 && sizeof(_Tp) == 2)
+ {
+ _CommonImplX86::__store<_Np>(
+ __vector_bitcast<char>(_mm_packs_epi16(
+ __to_intrin(__vector_bitcast<_UShort>(__v) >> 15), __m128i())),
+ __mem, _Fp());
+ }
+ else if constexpr (_Np <= 16 && sizeof(_Tp) == 1)
+ _CommonImplX86::__store<_Np>(__v._M_data & 1, __mem, _Fp());
+ else
+ __assert_unreachable<_Tp>();
+ } // }}}
+ else if constexpr (__is_avx_abi<_Abi>()) // {{{
+ {
+ if constexpr (_Np <= 4 && sizeof(_Tp) == 8)
+ {
+ auto __k = __intrin_bitcast<__m256i>(__to_intrin(__v));
+ int __bool4;
+ if constexpr (__have_avx2)
+ __bool4 = _mm256_movemask_epi8(__k);
+ else
+ __bool4 = (_mm_movemask_epi8(__lo128(__k))
+ | (_mm_movemask_epi8(__hi128(__k)) << 16));
+ __bool4 &= 0x01010101;
+ __builtin_memcpy(__mem, &__bool4, _Np);
+ }
+ else if constexpr (_Np <= 8 && sizeof(_Tp) == 4)
+ {
+ const auto __k = __intrin_bitcast<__m256i>(__to_intrin(__v));
+ const auto __k2
+ = _mm_srli_epi16(_mm_packs_epi16(__lo128(__k), __hi128(__k)), 15);
+ const auto __k3
+ = __vector_bitcast<char>(_mm_packs_epi16(__k2, __m128i()));
+ _CommonImplX86::__store<_Np>(__k3, __mem, _Fp());
+ }
+ else if constexpr (_Np <= 16 && sizeof(_Tp) == 2)
+ {
+ if constexpr (__have_avx2)
+ {
+ const auto __x = _mm256_srli_epi16(__to_intrin(__v), 15);
+ const auto __bools = __vector_bitcast<char>(
+ _mm_packs_epi16(__lo128(__x), __hi128(__x)));
+ _CommonImplX86::__store<_Np>(__bools, __mem, _Fp());
+ }
+ else
+ {
+ const auto __bools
+ = 1
+ & __vector_bitcast<_UChar>(
+ _mm_packs_epi16(__lo128(__to_intrin(__v)),
+ __hi128(__to_intrin(__v))));
+ _CommonImplX86::__store<_Np>(__bools, __mem, _Fp());
+ }
+ }
+ else if constexpr (_Np <= 32 && sizeof(_Tp) == 1)
+ _CommonImplX86::__store<_Np>(1 & __v._M_data, __mem, _Fp());
+ else
+ __assert_unreachable<_Tp>();
+ } // }}}
+ else
+ __assert_unreachable<_Tp>();
+ }
+
+ // __masked_store {{{2
+ template <typename _Tp, size_t _Np, typename _Fp>
+ static inline void __masked_store(const _SimdWrapper<_Tp, _Np> __v,
+ bool* __mem, _Fp,
+ const _SimdWrapper<_Tp, _Np> __k) noexcept
+ {
+ if constexpr (__is_avx512_abi<_Abi>())
+ {
+ static_assert(is_same_v<_Tp, bool>);
+ if constexpr (_Np <= 16 && __have_avx512bw_vl)
+ _mm_mask_storeu_epi8(__mem, __k, _mm_maskz_set1_epi8(__v, 1));
+ else if constexpr (_Np <= 16)
+ _mm512_mask_cvtepi32_storeu_epi8(__mem, __k,
+ _mm512_maskz_set1_epi32(__v, 1));
+ else if constexpr (_Np <= 32 && __have_avx512bw_vl)
+ _mm256_mask_storeu_epi8(__mem, __k, _mm256_maskz_set1_epi8(__v, 1));
+ else if constexpr (_Np <= 32 && __have_avx512bw)
+ _mm256_mask_storeu_epi8(__mem, __k,
+ __lo256(_mm512_maskz_set1_epi8(__v, 1)));
+ else if constexpr (_Np <= 64 && __have_avx512bw)
+ _mm512_mask_storeu_epi8(__mem, __k, _mm512_maskz_set1_epi8(__v, 1));
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else
+ _Base::__masked_store(__v, __mem, _Fp(), __k);
+ }
+
+ // logical and bitwise operators {{{2
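+  // AVX-512 bit masks use the k-register mask intrinsics; vector masks defer to _Base.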
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __logical_and(const _SimdWrapper<_Tp, _Np>& __x,
+ const _SimdWrapper<_Tp, _Np>& __y)
+ {
+ if constexpr (std::is_same_v<_Tp, bool>)
+ {
+ if constexpr (__have_avx512dq && _Np <= 8)
+ return _kand_mask8(__x._M_data, __y._M_data);
+ else if constexpr (_Np <= 16)
+ return _kand_mask16(__x._M_data, __y._M_data);
+ else if constexpr (__have_avx512bw && _Np <= 32)
+ return _kand_mask32(__x._M_data, __y._M_data);
+ else if constexpr (__have_avx512bw && _Np <= 64)
+ return _kand_mask64(__x._M_data, __y._M_data);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else
+ return _Base::__logical_and(__x, __y);
+ }
+
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __logical_or(const _SimdWrapper<_Tp, _Np>& __x,
+ const _SimdWrapper<_Tp, _Np>& __y)
+ {
+ if constexpr (std::is_same_v<_Tp, bool>)
+ {
+ if constexpr (__have_avx512dq && _Np <= 8)
+ return _kor_mask8(__x._M_data, __y._M_data);
+ else if constexpr (_Np <= 16)
+ return _kor_mask16(__x._M_data, __y._M_data);
+ else if constexpr (__have_avx512bw && _Np <= 32)
+ return _kor_mask32(__x._M_data, __y._M_data);
+ else if constexpr (__have_avx512bw && _Np <= 64)
+ return _kor_mask64(__x._M_data, __y._M_data);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else
+ return _Base::__logical_or(__x, __y);
+ }
+
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __bit_not(const _SimdWrapper<_Tp, _Np>& __x)
+ {
+ if constexpr (std::is_same_v<_Tp, bool>)
+ {
+ if constexpr (__have_avx512dq && _Np <= 8)
+ return _kandn_mask8(__x._M_data,
+ _Abi::template __implicit_mask_n<_Np>());
+ else if constexpr (_Np <= 16)
+ return _kandn_mask16(__x._M_data,
+ _Abi::template __implicit_mask_n<_Np>());
+ else if constexpr (__have_avx512bw && _Np <= 32)
+ return _kandn_mask32(__x._M_data,
+ _Abi::template __implicit_mask_n<_Np>());
+ else if constexpr (__have_avx512bw && _Np <= 64)
+ return _kandn_mask64(__x._M_data,
+ _Abi::template __implicit_mask_n<_Np>());
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else
+ return _Base::__bit_not(__x);
+ }
+
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __bit_and(const _SimdWrapper<_Tp, _Np>& __x,
+ const _SimdWrapper<_Tp, _Np>& __y)
+ {
+ if constexpr (std::is_same_v<_Tp, bool>)
+ {
+ if constexpr (__have_avx512dq && _Np <= 8)
+ return _kand_mask8(__x._M_data, __y._M_data);
+ else if constexpr (_Np <= 16)
+ return _kand_mask16(__x._M_data, __y._M_data);
+ else if constexpr (__have_avx512bw && _Np <= 32)
+ return _kand_mask32(__x._M_data, __y._M_data);
+ else if constexpr (__have_avx512bw && _Np <= 64)
+ return _kand_mask64(__x._M_data, __y._M_data);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else
+ return _Base::__bit_and(__x, __y);
+ }
+
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __bit_or(const _SimdWrapper<_Tp, _Np>& __x, const _SimdWrapper<_Tp, _Np>& __y)
+ {
+ if constexpr (std::is_same_v<_Tp, bool>)
+ {
+ if constexpr (__have_avx512dq && _Np <= 8)
+ return _kor_mask8(__x._M_data, __y._M_data);
+ else if constexpr (_Np <= 16)
+ return _kor_mask16(__x._M_data, __y._M_data);
+ else if constexpr (__have_avx512bw && _Np <= 32)
+ return _kor_mask32(__x._M_data, __y._M_data);
+ else if constexpr (__have_avx512bw && _Np <= 64)
+ return _kor_mask64(__x._M_data, __y._M_data);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else
+ return _Base::__bit_or(__x, __y);
+ }
+
+ template <typename _Tp, size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdWrapper<_Tp, _Np>
+ __bit_xor(const _SimdWrapper<_Tp, _Np>& __x,
+ const _SimdWrapper<_Tp, _Np>& __y)
+ {
+ if constexpr (std::is_same_v<_Tp, bool>)
+ {
+ if constexpr (__have_avx512dq && _Np <= 8)
+ return _kxor_mask8(__x._M_data, __y._M_data);
+ else if constexpr (_Np <= 16)
+ return _kxor_mask16(__x._M_data, __y._M_data);
+ else if constexpr (__have_avx512bw && _Np <= 32)
+ return _kxor_mask32(__x._M_data, __y._M_data);
+ else if constexpr (__have_avx512bw && _Np <= 64)
+ return _kxor_mask64(__x._M_data, __y._M_data);
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else
+ return _Base::__bit_xor(__x, __y);
+ }
+
+ //}}}2
+ // __masked_assign{{{
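+  // Bitwise blend of bitmasks: bits of __lhs selected by __k are replaced by
+  // the corresponding bits of __rhs. The overload taking a plain bool simply
+  // sets or clears the selected bits.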
+ template <size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static void
+ __masked_assign(_SimdWrapper<bool, _Np> __k, _SimdWrapper<bool, _Np>& __lhs,
+ _SimdWrapper<bool, _Np> __rhs)
+ {
+ __lhs._M_data
+ = (~__k._M_data & __lhs._M_data) | (__k._M_data & __rhs._M_data);
+ }
+
+ template <size_t _Np>
+ _GLIBCXX_SIMD_INTRINSIC static void
+ __masked_assign(_SimdWrapper<bool, _Np> __k, _SimdWrapper<bool, _Np>& __lhs,
+ bool __rhs)
+ {
+ if (__rhs)
+ __lhs._M_data = __k._M_data | __lhs._M_data;
+ else
+ __lhs._M_data = ~__k._M_data & __lhs._M_data;
+ }
+
+ using _MaskImplBuiltin<_Abi>::__masked_assign;
+
+ //}}}
+ // __all_of {{{
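+  // SSE/AVX: test the vector mask against the ABI's implicit mask (__testc
+  // with SSE4.1, movemask otherwise) so that padding elements of partial
+  // registers are ignored. AVX512: kortest verifies that every bit covered
+  // by the implicit mask is set.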
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static bool __all_of(simd_mask<_Tp, _Abi> __k)
+ {
+ if constexpr (__is_sse_abi<_Abi>() || __is_avx_abi<_Abi>())
+ {
+ constexpr size_t _Np = simd_size_v<_Tp, _Abi>;
+ if constexpr (__have_sse4_1)
+ return 0
+ != __testc(__as_vector(__k),
+ _Abi::template __implicit_mask<_Tp>());
+ else if constexpr (std::is_same_v<_Tp, float>)
+ return (_mm_movemask_ps(__to_intrin(__k._M_data)) & ((1 << _Np) - 1))
+ == (1 << _Np) - 1;
+ else if constexpr (std::is_same_v<_Tp, double>)
+ return (_mm_movemask_pd(__to_intrin(__k._M_data)) & ((1 << _Np) - 1))
+ == (1 << _Np) - 1;
+ else
+ return (_mm_movemask_epi8(__to_intrin(__k._M_data))
+ & ((1 << (_Np * sizeof(_Tp))) - 1))
+ == (1 << (_Np * sizeof(_Tp))) - 1;
+ }
+ else if constexpr (__is_avx512_abi<_Abi>())
+ {
+ constexpr auto _Mask = _Abi::template __implicit_mask<_Tp>();
+ const auto __kk = __k._M_data._M_data;
+ if constexpr (sizeof(__kk) == 1)
+ {
+ if constexpr (__have_avx512dq)
+ return _kortestc_mask8_u8(__kk, _Mask == 0xff ? __kk
+ : __mmask8(~_Mask));
+ else
+ return _kortestc_mask16_u8(__kk, __mmask16(~_Mask));
+ }
+ else if constexpr (sizeof(__kk) == 2)
+ return _kortestc_mask16_u8(__kk, _Mask == 0xffff ? __kk
+ : __mmask16(~_Mask));
+ else if constexpr (sizeof(__kk) == 4 && __have_avx512bw)
+ return _kortestc_mask32_u8(__kk, _Mask == 0xffffffffU
+ ? __kk
+ : __mmask32(~_Mask));
+ else if constexpr (sizeof(__kk) == 8 && __have_avx512bw)
+ return _kortestc_mask64_u8(__kk, _Mask == 0xffffffffffffffffULL
+ ? __kk
+ : __mmask64(~_Mask));
+ else
+ __assert_unreachable<_Tp>();
+ }
+ }
+
+ // }}}
+ // __any_of {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static bool __any_of(simd_mask<_Tp, _Abi> __k)
+ {
+ if constexpr (__is_sse_abi<_Abi>() || __is_avx_abi<_Abi>())
+ {
+ constexpr size_t _Np = simd_size_v<_Tp, _Abi>;
+ if constexpr (__have_sse4_1)
+ {
+ if constexpr (_Abi::_S_is_partial || sizeof(__k) < 16)
+ return 0
+ == __testz(__as_vector(__k),
+ _Abi::template __implicit_mask<_Tp>());
+ else
+ return 0 == __testz(__as_vector(__k), __as_vector(__k));
+ }
+ else if constexpr (std::is_same_v<_Tp, float>)
+ return (_mm_movemask_ps(__to_intrin(__k._M_data)) & ((1 << _Np) - 1))
+ != 0;
+ else if constexpr (std::is_same_v<_Tp, double>)
+ return (_mm_movemask_pd(__to_intrin(__k._M_data)) & ((1 << _Np) - 1))
+ != 0;
+ else
+ return (_mm_movemask_epi8(__to_intrin(__k._M_data))
+ & ((1 << (_Np * sizeof(_Tp))) - 1))
+ != 0;
+ }
+ else if constexpr (__is_avx512_abi<_Abi>())
+ return (__k._M_data._M_data & _Abi::template __implicit_mask<_Tp>()) != 0;
+ }
+
+ // }}}
+ // __none_of {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static bool __none_of(simd_mask<_Tp, _Abi> __k)
+ {
+ if constexpr (__is_sse_abi<_Abi>() || __is_avx_abi<_Abi>())
+ {
+ constexpr size_t _Np = simd_size_v<_Tp, _Abi>;
+ if constexpr (__have_sse4_1)
+ {
+ if constexpr (_Abi::_S_is_partial || sizeof(__k) < 16)
+ return 0
+ != __testz(__as_vector(__k),
+ _Abi::template __implicit_mask<_Tp>());
+ else
+ return 0 != __testz(__as_vector(__k), __as_vector(__k));
+ }
+ else if constexpr (std::is_same_v<_Tp, float>)
+ return (__movemask(__to_intrin(__k._M_data)) & ((1 << _Np) - 1)) == 0;
+ else if constexpr (std::is_same_v<_Tp, double>)
+ return (__movemask(__to_intrin(__k._M_data)) & ((1 << _Np) - 1)) == 0;
+ else
+ return (__movemask(__to_intrin(__k._M_data))
+ & int((1ull << (_Np * sizeof(_Tp))) - 1))
+ == 0;
+ }
+ else if constexpr (__is_avx512_abi<_Abi>())
+ return (__k._M_data._M_data & _Abi::template __implicit_mask<_Tp>()) == 0;
+ }
+
+ // }}}
+ // __some_of {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static bool __some_of(simd_mask<_Tp, _Abi> __k)
+ {
+ if constexpr (__is_sse_abi<_Abi>() || __is_avx_abi<_Abi>())
+ {
+ constexpr size_t _Np = simd_size_v<_Tp, _Abi>;
+ if constexpr (__have_sse4_1)
+ return 0
+ != __testnzc(__as_vector(__k),
+ _Abi::template __implicit_mask<_Tp>());
+ else if constexpr (std::is_same_v<_Tp, float>)
+ {
+ constexpr int __allbits = (1 << _Np) - 1;
+ const auto __tmp
+ = _mm_movemask_ps(__to_intrin(__k._M_data)) & __allbits;
+ return __tmp > 0 && __tmp < __allbits;
+ }
+ else if constexpr (std::is_same_v<_Tp, double>)
+ {
+ constexpr int __allbits = (1 << _Np) - 1;
+ const auto __tmp
+ = _mm_movemask_pd(__to_intrin(__k._M_data)) & __allbits;
+ return __tmp > 0 && __tmp < __allbits;
+ }
+ else
+ {
+ constexpr int __allbits = (1 << (_Np * sizeof(_Tp))) - 1;
+ const auto __tmp
+ = _mm_movemask_epi8(__to_intrin(__k._M_data)) & __allbits;
+ return __tmp > 0 && __tmp < __allbits;
+ }
+ }
+ else if constexpr (__is_avx512_abi<_Abi>())
+ return __any_of(__k) && !__all_of(__k);
+ else
+ __assert_unreachable<_Tp>();
+ }
+
+ // }}}
+ // __popcount {{{
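+  // Count the set mask elements. AVX512 popcounts the (padding-masked)
+  // bitmask directly; with POPCNT available, the movemask result is
+  // popcounted (divided by the element size for integral _Tp); the remaining
+  // branches either derive the count from a movemask or sum the 0/-1 mask
+  // elements using SSE2 shuffles and additions.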
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static int __popcount(simd_mask<_Tp, _Abi> __k)
+ {
+ constexpr size_t _Np = simd_size_v<_Tp, _Abi>;
+ const auto __kk = _Abi::__masked(__k._M_data)._M_data;
+ if constexpr (__is_avx512_abi<_Abi>())
+ {
+ if constexpr (_Np > 32)
+ return __builtin_popcountll(__kk);
+ else
+ return __builtin_popcount(__kk);
+ }
+ else
+ {
+ if constexpr (__have_popcnt)
+ {
+ int __bits = __movemask(__to_intrin(__vector_bitcast<_Tp>(__kk)));
+ const int __count = __builtin_popcount(__bits);
+ return std::is_integral_v<_Tp> ? __count / sizeof(_Tp) : __count;
+ }
+ else if constexpr (_Np == 2 && sizeof(_Tp) == 8)
+ {
+	    const int __mask = _mm_movemask_pd(__auto_bitcast(__kk));
+	    return __mask - (__mask >> 1);
+ }
+ else if constexpr (_Np <= 4 && sizeof(_Tp) == 8)
+ {
+ auto __x = -(__lo128(__kk) + __hi128(__kk));
+ return __x[0] + __x[1];
+ }
+ else if constexpr (_Np <= 4 && sizeof(_Tp) == 4)
+ {
+ if constexpr (__have_sse2)
+ {
+ __m128i __x = __intrin_bitcast<__m128i>(__to_intrin(__kk));
+ __x = _mm_add_epi32(__x,
+ _mm_shuffle_epi32(__x,
+ _MM_SHUFFLE(0, 1, 2, 3)));
+ __x = _mm_add_epi32(
+ __x, _mm_shufflelo_epi16(__x, _MM_SHUFFLE(1, 0, 3, 2)));
+ return -_mm_cvtsi128_si32(__x);
+ }
+ else
+ return __builtin_popcount(_mm_movemask_ps(__auto_bitcast(__kk)));
+ }
+ else if constexpr (_Np <= 8 && sizeof(_Tp) == 2)
+ {
+ auto __x = __to_intrin(__kk);
+ __x
+ = _mm_add_epi16(__x,
+ _mm_shuffle_epi32(__x, _MM_SHUFFLE(0, 1, 2, 3)));
+ __x = _mm_add_epi16(__x,
+ _mm_shufflelo_epi16(__x,
+ _MM_SHUFFLE(0, 1, 2, 3)));
+ __x = _mm_add_epi16(__x,
+ _mm_shufflelo_epi16(__x,
+ _MM_SHUFFLE(2, 3, 0, 1)));
+ return -short(_mm_extract_epi16(__x, 0));
+ }
+ else if constexpr (_Np <= 16 && sizeof(_Tp) == 1)
+ {
+ auto __x = __to_intrin(__kk);
+ __x = _mm_add_epi8(__x,
+ _mm_shuffle_epi32(__x, _MM_SHUFFLE(0, 1, 2, 3)));
+ __x
+ = _mm_add_epi8(__x,
+ _mm_shufflelo_epi16(__x, _MM_SHUFFLE(0, 1, 2, 3)));
+ __x
+ = _mm_add_epi8(__x,
+ _mm_shufflelo_epi16(__x, _MM_SHUFFLE(2, 3, 0, 1)));
+ auto __y = -__vector_bitcast<_UChar>(__x);
+ if constexpr (__have_sse4_1)
+ return __y[0] + __y[1];
+ else
+ {
+ unsigned __z = _mm_extract_epi16(__to_intrin(__y), 0);
+ return (__z & 0xff) + (__z >> 8);
+ }
+ }
+ else if constexpr (sizeof(__kk) == 32)
+ {
+ // The following works only as long as the implementations above use
+ // a summation
+ using _I = __int_for_sizeof_t<_Tp>;
+ const auto __as_int = __vector_bitcast<_I>(__kk);
+	    return _MaskImplX86<simd_abi::__sse>::__popcount(
+ simd_mask<_I, simd_abi::__sse>(__private_init,
+ __lo128(__as_int)
+ + __hi128(__as_int)));
+ }
+ else
+ __assert_unreachable<_Tp>();
+ }
+ }
+
+ // }}}
+ // __find_first_set {{{
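+  // Index of the lowest set mask element: tzcnt on the AVX512 bitmask
+  // (_BitOps::__firstbit for more than 32 elements), otherwise the generic
+  // _Base implementation.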
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static int __find_first_set(simd_mask<_Tp, _Abi> __k)
+ {
+    if constexpr (__is_avx512_abi<_Abi>())
+      {
+	if constexpr (size<_Tp> <= 32)
+	  return _tzcnt_u32(__k._M_data._M_data);
+	else
+	  return _BitOps::__firstbit(__k._M_data._M_data);
+      }
+    else
+      return _Base::__find_first_set(__k);
+ }
+
+ // }}}
+ // __find_last_set {{{
+ template <typename _Tp>
+ _GLIBCXX_SIMD_INTRINSIC static int __find_last_set(simd_mask<_Tp, _Abi> __k)
+ {
+    if constexpr (__is_avx512_abi<_Abi>())
+      {
+	if constexpr (size<_Tp> <= 32)
+	  return 31 - _lzcnt_u32(__k._M_data._M_data);
+	else
+	  return _BitOps::__lastbit(__k._M_data._M_data);
+      }
+    else
+      return _Base::__find_last_set(__k);
+ }
+
+ // }}}
+};
+
+// }}}
+
+_GLIBCXX_SIMD_END_NAMESPACE
+#endif // __cplusplus >= 201703L
+#endif // _GLIBCXX_EXPERIMENTAL_SIMD_X86_H_
+
+// vim: foldmethod=marker sw=2 noet ts=8 sts=2 tw=80
diff --git a/libstdc++-v3/include/experimental/bits/simd_x86_conversions.h b/libstdc++-v3/include/experimental/bits/simd_x86_conversions.h
new file mode 100644
index 00000000000..f72d7809680
--- /dev/null
+++ b/libstdc++-v3/include/experimental/bits/simd_x86_conversions.h
@@ -0,0 +1,1993 @@
+// x86 specific conversion optimizations -*- C++ -*-
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library. This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+// <http://www.gnu.org/licenses/>.
+
+#ifndef _GLIBCXX_EXPERIMENTAL_SIMD_X86_CONVERSIONS_H
+#define _GLIBCXX_EXPERIMENTAL_SIMD_X86_CONVERSIONS_H
+
+#if __cplusplus >= 201703L
+
+// work around PR85827
+// 1-arg __convert_x86 {{{1
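+// Converts __v (_Np elements of type _Tp) into _To (_M elements of type _Up).
+// The constexpr flags below classify the conversion by register size and
+// element type; each branch then selects the cheapest intrinsic sequence for
+// the enabled ISA extensions. Conversions without a dedicated branch fall
+// through to the element-wise __vector_convert at the end.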
+template <typename _To, typename _V, typename _Traits>
+_GLIBCXX_SIMD_INTRINSIC _To
+__convert_x86(_V __v)
+{
+ static_assert(__is_vector_type_v<_V>);
+ using _Tp = typename _Traits::value_type;
+ constexpr size_t _Np = _Traits::_S_width;
+ [[maybe_unused]] const auto __intrin = __to_intrin(__v);
+ using _Up = typename _VectorTraits<_To>::value_type;
+ constexpr size_t _M = _VectorTraits<_To>::_S_width;
+
+ // [xyz]_to_[xyz] {{{2
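+  // x, y, and z denote the register size of input and output:
+  // x = xmm (<= 16 bytes), y = ymm (32 bytes), z = zmm (64 bytes).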
+ [[maybe_unused]] constexpr bool __x_to_x
+ = sizeof(__v) <= 16 && sizeof(_To) <= 16;
+ [[maybe_unused]] constexpr bool __x_to_y
+ = sizeof(__v) <= 16 && sizeof(_To) == 32;
+ [[maybe_unused]] constexpr bool __x_to_z
+ = sizeof(__v) <= 16 && sizeof(_To) == 64;
+ [[maybe_unused]] constexpr bool __y_to_x
+ = sizeof(__v) == 32 && sizeof(_To) <= 16;
+ [[maybe_unused]] constexpr bool __y_to_y
+ = sizeof(__v) == 32 && sizeof(_To) == 32;
+ [[maybe_unused]] constexpr bool __y_to_z
+ = sizeof(__v) == 32 && sizeof(_To) == 64;
+ [[maybe_unused]] constexpr bool __z_to_x
+ = sizeof(__v) == 64 && sizeof(_To) <= 16;
+ [[maybe_unused]] constexpr bool __z_to_y
+ = sizeof(__v) == 64 && sizeof(_To) == 32;
+ [[maybe_unused]] constexpr bool __z_to_z
+ = sizeof(__v) == 64 && sizeof(_To) == 64;
+
+ // iX_to_iX {{{2
+ [[maybe_unused]] constexpr bool __i_to_i
+ = is_integral_v<_Up> && is_integral_v<_Tp>;
+ [[maybe_unused]] constexpr bool __i8_to_i16
+ = __i_to_i && sizeof(_Tp) == 1 && sizeof(_Up) == 2;
+ [[maybe_unused]] constexpr bool __i8_to_i32
+ = __i_to_i && sizeof(_Tp) == 1 && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __i8_to_i64
+ = __i_to_i && sizeof(_Tp) == 1 && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __i16_to_i8
+ = __i_to_i && sizeof(_Tp) == 2 && sizeof(_Up) == 1;
+ [[maybe_unused]] constexpr bool __i16_to_i32
+ = __i_to_i && sizeof(_Tp) == 2 && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __i16_to_i64
+ = __i_to_i && sizeof(_Tp) == 2 && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __i32_to_i8
+ = __i_to_i && sizeof(_Tp) == 4 && sizeof(_Up) == 1;
+ [[maybe_unused]] constexpr bool __i32_to_i16
+ = __i_to_i && sizeof(_Tp) == 4 && sizeof(_Up) == 2;
+ [[maybe_unused]] constexpr bool __i32_to_i64
+ = __i_to_i && sizeof(_Tp) == 4 && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __i64_to_i8
+ = __i_to_i && sizeof(_Tp) == 8 && sizeof(_Up) == 1;
+ [[maybe_unused]] constexpr bool __i64_to_i16
+ = __i_to_i && sizeof(_Tp) == 8 && sizeof(_Up) == 2;
+ [[maybe_unused]] constexpr bool __i64_to_i32
+ = __i_to_i && sizeof(_Tp) == 8 && sizeof(_Up) == 4;
+
+ // [fsu]X_to_[fsu]X {{{2
+ // ibw = integral && byte or word, i.e. char and short with any signedness
+ [[maybe_unused]] constexpr bool __s64_to_f32
+ = is_integral_v<_Tp> && is_signed_v<_Tp> && sizeof(_Tp) == 8
+ && is_floating_point_v<_Up> && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __s32_to_f32
+ = is_integral_v<_Tp> && is_signed_v<_Tp> && sizeof(_Tp) == 4
+ && is_floating_point_v<_Up> && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __s16_to_f32
+ = is_integral_v<_Tp> && is_signed_v<_Tp> && sizeof(_Tp) == 2
+ && is_floating_point_v<_Up> && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __s8_to_f32
+ = is_integral_v<_Tp> && is_signed_v<_Tp> && sizeof(_Tp) == 1
+ && is_floating_point_v<_Up> && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __u64_to_f32
+ = is_integral_v<_Tp> && is_unsigned_v<_Tp> && sizeof(_Tp) == 8
+ && is_floating_point_v<_Up> && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __u32_to_f32
+ = is_integral_v<_Tp> && is_unsigned_v<_Tp> && sizeof(_Tp) == 4
+ && is_floating_point_v<_Up> && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __u16_to_f32
+ = is_integral_v<_Tp> && is_unsigned_v<_Tp> && sizeof(_Tp) == 2
+ && is_floating_point_v<_Up> && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __u8_to_f32
+ = is_integral_v<_Tp> && is_unsigned_v<_Tp> && sizeof(_Tp) == 1
+ && is_floating_point_v<_Up> && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __s64_to_f64
+ = is_integral_v<_Tp> && is_signed_v<_Tp> && sizeof(_Tp) == 8
+ && is_floating_point_v<_Up> && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __s32_to_f64
+ = is_integral_v<_Tp> && is_signed_v<_Tp> && sizeof(_Tp) == 4
+ && is_floating_point_v<_Up> && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __u64_to_f64
+ = is_integral_v<_Tp> && is_unsigned_v<_Tp> && sizeof(_Tp) == 8
+ && is_floating_point_v<_Up> && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __u32_to_f64
+ = is_integral_v<_Tp> && is_unsigned_v<_Tp> && sizeof(_Tp) == 4
+ && is_floating_point_v<_Up> && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __f32_to_s64
+ = is_integral_v<_Up> && is_signed_v<_Up> && sizeof(_Up) == 8
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 4;
+ [[maybe_unused]] constexpr bool __f32_to_s32
+ = is_integral_v<_Up> && is_signed_v<_Up> && sizeof(_Up) == 4
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 4;
+ [[maybe_unused]] constexpr bool __f32_to_u64
+ = is_integral_v<_Up> && is_unsigned_v<_Up> && sizeof(_Up) == 8
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 4;
+ [[maybe_unused]] constexpr bool __f32_to_u32
+ = is_integral_v<_Up> && is_unsigned_v<_Up> && sizeof(_Up) == 4
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 4;
+ [[maybe_unused]] constexpr bool __f64_to_s64
+ = is_integral_v<_Up> && is_signed_v<_Up> && sizeof(_Up) == 8
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 8;
+ [[maybe_unused]] constexpr bool __f64_to_s32
+ = is_integral_v<_Up> && is_signed_v<_Up> && sizeof(_Up) == 4
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 8;
+ [[maybe_unused]] constexpr bool __f64_to_u64
+ = is_integral_v<_Up> && is_unsigned_v<_Up> && sizeof(_Up) == 8
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 8;
+ [[maybe_unused]] constexpr bool __f64_to_u32
+ = is_integral_v<_Up> && is_unsigned_v<_Up> && sizeof(_Up) == 4
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 8;
+ [[maybe_unused]] constexpr bool __ibw_to_f32
+ = is_integral_v<_Tp> && sizeof(_Tp) <= 2
+ && is_floating_point_v<_Up> && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __ibw_to_f64
+ = is_integral_v<_Tp> && sizeof(_Tp) <= 2
+ && is_floating_point_v<_Up> && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __f32_to_ibw
+ = is_integral_v<_Up> && sizeof(_Up) <= 2
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 4;
+ [[maybe_unused]] constexpr bool __f64_to_ibw
+ = is_integral_v<_Up> && sizeof(_Up) <= 2
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 8;
+ [[maybe_unused]] constexpr bool __f32_to_f64
+ = is_floating_point_v<_Tp> && sizeof(_Tp) == 4
+ && is_floating_point_v<_Up> && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __f64_to_f32
+ = is_floating_point_v<_Tp> && sizeof(_Tp) == 8
+ && is_floating_point_v<_Up> && sizeof(_Up) == 4;
+
+ if constexpr (__i_to_i && __y_to_x && !__have_avx2) //{{{2
+ return __convert_x86<_To>(__lo128(__v), __hi128(__v));
+ else if constexpr (__i_to_i && __x_to_y && !__have_avx2) //{{{2
+ return __concat(__convert_x86<__vector_type_t<_Up, _M / 2>>(__v),
+ __convert_x86<__vector_type_t<_Up, _M / 2>>(
+ __extract_part<1, _Np / _M * 2>(__v)));
+ else if constexpr (__i_to_i) //{{{2
+ {
+ static_assert(__x_to_x || __have_avx2,
+ "integral conversions with ymm registers require AVX2");
+ static_assert(__have_avx512bw
+ || ((sizeof(_Tp) >= 4 || sizeof(__v) < 64)
+ && (sizeof(_Up) >= 4 || sizeof(_To) < 64)),
+ "8/16-bit integers in zmm registers require AVX512BW");
+      static_assert((sizeof(__v) < 64 && sizeof(_To) < 64) || __have_avx512f,
+		    "integral conversions with zmm registers require AVX512F");
+ }
+ if constexpr (is_floating_point_v<_Tp> == is_floating_point_v<_Up> && //{{{2
+ sizeof(_Tp) == sizeof(_Up))
+ {
+ // conversion uses simple bit reinterpretation (or no conversion at all)
+ if constexpr (_Np >= _M)
+ return __intrin_bitcast<_To>(__v);
+ else
+ return __zero_extend(__vector_bitcast<_Up>(__v));
+ }
+ else if constexpr (_Np < _M && sizeof(_To) > 16) // zero extend (eg. xmm -> ymm){{{2
+ return __zero_extend(
+ __convert_x86<__vector_type_t<
+ _Up, (16 / sizeof(_Up) > _Np) ? 16 / sizeof(_Up) : _Np>>(__v));
+ else if constexpr (_Np > _M && sizeof(__v) > 16) // partial input (eg. ymm -> xmm){{{2
+ return __convert_x86<_To>(__extract_part<0, _Np / _M>(__v));
+ else if constexpr (__i64_to_i32) //{{{2
+ {
+ if constexpr (__x_to_x && __have_avx512vl)
+ return __intrin_bitcast<_To>(_mm_cvtepi64_epi32(__intrin));
+ else if constexpr (__x_to_x)
+ return __auto_bitcast(
+ _mm_shuffle_ps(__vector_bitcast<float>(__v), __m128(), 8));
+ else if constexpr (__y_to_x && __have_avx512vl)
+ return __intrin_bitcast<_To>(_mm256_cvtepi64_epi32(__intrin));
+ else if constexpr (__y_to_x && __have_avx512f)
+ return __intrin_bitcast<_To>(
+ __lo128(_mm512_cvtepi64_epi32(__auto_bitcast(__v))));
+ else if constexpr (__y_to_x)
+ return __intrin_bitcast<_To>(
+ __lo128(_mm256_permute4x64_epi64(_mm256_shuffle_epi32(__intrin, 8),
+ 0 + 4 * 2)));
+ else if constexpr (__z_to_y)
+ return __intrin_bitcast<_To>(_mm512_cvtepi64_epi32(__intrin));
+ }
+ else if constexpr (__i64_to_i16) //{{{2
+ {
+ if constexpr (__x_to_x && __have_avx512vl)
+ return __intrin_bitcast<_To>(_mm_cvtepi64_epi16(__intrin));
+ else if constexpr (__x_to_x && __have_avx512f)
+ return __intrin_bitcast<_To>(
+ __lo128(_mm512_cvtepi64_epi16(__auto_bitcast(__v))));
+ else if constexpr (__x_to_x && __have_ssse3)
+ {
+ return __intrin_bitcast<_To>(
+ _mm_shuffle_epi8(__intrin,
+ _mm_setr_epi8(0, 1, 8, 9, -0x80, -0x80, -0x80,
+ -0x80, -0x80, -0x80, -0x80, -0x80,
+ -0x80, -0x80, -0x80, -0x80)));
+	    // (without SSSE3 this case is handled by the generic fallback at the end)
+ }
+ else if constexpr (__y_to_x && __have_avx512vl)
+ return __intrin_bitcast<_To>(_mm256_cvtepi64_epi16(__intrin));
+ else if constexpr (__y_to_x && __have_avx512f)
+ return __intrin_bitcast<_To>(
+ __lo128(_mm512_cvtepi64_epi16(__auto_bitcast(__v))));
+ else if constexpr (__y_to_x)
+ {
+ const auto __a = _mm256_shuffle_epi8(
+ __intrin,
+ _mm256_setr_epi8(0, 1, 8, 9, -0x80, -0x80, -0x80, -0x80, -0x80,
+ -0x80, -0x80, -0x80, -0x80, -0x80, -0x80, -0x80,
+ -0x80, -0x80, -0x80, -0x80, 0, 1, 8, 9, -0x80,
+ -0x80, -0x80, -0x80, -0x80, -0x80, -0x80, -0x80));
+ return __intrin_bitcast<_To>(__lo128(__a) | __hi128(__a));
+ }
+ else if constexpr (__z_to_x)
+ return __intrin_bitcast<_To>(_mm512_cvtepi64_epi16(__intrin));
+ }
+ else if constexpr (__i64_to_i8) //{{{2
+ {
+ if constexpr (__x_to_x && __have_avx512vl)
+ return __intrin_bitcast<_To>(_mm_cvtepi64_epi8(__intrin));
+ else if constexpr (__x_to_x && __have_avx512f)
+ return __intrin_bitcast<_To>(
+ __lo128(_mm512_cvtepi64_epi8(__zero_extend(__intrin))));
+ else if constexpr (__y_to_x && __have_avx512vl)
+ return __intrin_bitcast<_To>(_mm256_cvtepi64_epi8(__intrin));
+ else if constexpr (__y_to_x && __have_avx512f)
+ return __intrin_bitcast<_To>(
+ _mm512_cvtepi64_epi8(__zero_extend(__intrin)));
+ else if constexpr (__z_to_x)
+ return __intrin_bitcast<_To>(_mm512_cvtepi64_epi8(__intrin));
+ }
+ else if constexpr (__i32_to_i64) //{{{2
+ {
+ if constexpr (__have_sse4_1 && __x_to_x)
+ return __intrin_bitcast<_To>(is_signed_v<_Tp>
+ ? _mm_cvtepi32_epi64(__intrin)
+ : _mm_cvtepu32_epi64(__intrin));
+ else if constexpr (__x_to_x)
+ {
+ return __intrin_bitcast<_To>(
+ _mm_unpacklo_epi32(__intrin, is_signed_v<_Tp>
+ ? _mm_srai_epi32(__intrin, 31)
+ : __m128i()));
+ }
+ else if constexpr (__x_to_y)
+ return __intrin_bitcast<_To>(is_signed_v<_Tp>
+ ? _mm256_cvtepi32_epi64(__intrin)
+ : _mm256_cvtepu32_epi64(__intrin));
+ else if constexpr (__y_to_z)
+ return __intrin_bitcast<_To>(is_signed_v<_Tp>
+ ? _mm512_cvtepi32_epi64(__intrin)
+ : _mm512_cvtepu32_epi64(__intrin));
+ }
+ else if constexpr (__i32_to_i16) //{{{2
+ {
+ if constexpr (__x_to_x && __have_avx512vl)
+ return __intrin_bitcast<_To>(_mm_cvtepi32_epi16(__intrin));
+ else if constexpr (__x_to_x && __have_avx512f)
+ return __intrin_bitcast<_To>(
+ __lo128(_mm512_cvtepi32_epi16(__auto_bitcast(__v))));
+ else if constexpr (__x_to_x && __have_ssse3)
+ return __intrin_bitcast<_To>(_mm_shuffle_epi8(
+ __intrin, _mm_setr_epi8(0, 1, 4, 5, 8, 9, 12, 13, -0x80, -0x80, -0x80,
+ -0x80, -0x80, -0x80, -0x80, -0x80)));
+ else if constexpr (__x_to_x)
+ {
+ auto __a = _mm_unpacklo_epi16(__intrin, __m128i()); // 0o.o 1o.o
+ auto __b = _mm_unpackhi_epi16(__intrin, __m128i()); // 2o.o 3o.o
+ auto __c = _mm_unpacklo_epi16(__a, __b); // 02oo ..oo
+ auto __d = _mm_unpackhi_epi16(__a, __b); // 13oo ..oo
+ return __intrin_bitcast<_To>(
+ _mm_unpacklo_epi16(__c, __d)); // 0123 oooo
+ }
+ else if constexpr (__y_to_x && __have_avx512vl)
+ return __intrin_bitcast<_To>(_mm256_cvtepi32_epi16(__intrin));
+ else if constexpr (__y_to_x && __have_avx512f)
+ return __intrin_bitcast<_To>(
+ __lo128(_mm512_cvtepi32_epi16(__auto_bitcast(__v))));
+ else if constexpr (__y_to_x)
+ {
+ auto __a = _mm256_shuffle_epi8(
+ __intrin,
+ _mm256_setr_epi8(0, 1, 4, 5, 8, 9, 12, 13, -0x80, -0x80, -0x80,
+ -0x80, -0x80, -0x80, -0x80, -0x80, 0, 1, 4, 5, 8,
+ 9, 12, 13, -0x80, -0x80, -0x80, -0x80, -0x80,
+ -0x80, -0x80, -0x80));
+ return __intrin_bitcast<_To>(__lo128(
+ _mm256_permute4x64_epi64(__a,
+ 0xf8))); // __a[0] __a[2] | __a[3] __a[3]
+ }
+ else if constexpr (__z_to_y)
+ return __intrin_bitcast<_To>(_mm512_cvtepi32_epi16(__intrin));
+ }
+ else if constexpr (__i32_to_i8) //{{{2
+ {
+ if constexpr (__x_to_x && __have_avx512vl)
+ return __intrin_bitcast<_To>(_mm_cvtepi32_epi8(__intrin));
+ else if constexpr (__x_to_x && __have_avx512f)
+ return __intrin_bitcast<_To>(
+ __lo128(_mm512_cvtepi32_epi8(__zero_extend(__intrin))));
+ else if constexpr (__x_to_x && __have_ssse3)
+ {
+ return __intrin_bitcast<_To>(
+ _mm_shuffle_epi8(__intrin,
+ _mm_setr_epi8(0, 4, 8, 12, -0x80, -0x80, -0x80,
+ -0x80, -0x80, -0x80, -0x80, -0x80,
+ -0x80, -0x80, -0x80, -0x80)));
+ }
+ else if constexpr (__x_to_x)
+ {
+ const auto __a
+ = _mm_unpacklo_epi8(__intrin, __intrin); // 0... .... 1... ....
+ const auto __b
+ = _mm_unpackhi_epi8(__intrin, __intrin); // 2... .... 3... ....
+ const auto __c = _mm_unpacklo_epi8(__a, __b); // 02.. .... .... ....
+ const auto __d = _mm_unpackhi_epi8(__a, __b); // 13.. .... .... ....
+ const auto __e = _mm_unpacklo_epi8(__c, __d); // 0123 .... .... ....
+ return __intrin_bitcast<_To>(__e & _mm_cvtsi32_si128(-1));
+ }
+ else if constexpr (__y_to_x && __have_avx512vl)
+ return __intrin_bitcast<_To>(_mm256_cvtepi32_epi8(__intrin));
+ else if constexpr (__y_to_x && __have_avx512f)
+ return __intrin_bitcast<_To>(
+ _mm512_cvtepi32_epi8(__zero_extend(__intrin)));
+ else if constexpr (__z_to_x)
+ return __intrin_bitcast<_To>(_mm512_cvtepi32_epi8(__intrin));
+ }
+ else if constexpr (__i16_to_i64) //{{{2
+ {
+ if constexpr (__x_to_x && __have_sse4_1)
+ return __intrin_bitcast<_To>(is_signed_v<_Tp>
+ ? _mm_cvtepi16_epi64(__intrin)
+ : _mm_cvtepu16_epi64(__intrin));
+ else if constexpr (__x_to_x && is_signed_v<_Tp>)
+ {
+ auto __x = _mm_srai_epi16(__intrin, 15);
+ auto __y = _mm_unpacklo_epi16(__intrin, __x);
+ __x = _mm_unpacklo_epi16(__x, __x);
+ return __intrin_bitcast<_To>(_mm_unpacklo_epi32(__y, __x));
+ }
+ else if constexpr (__x_to_x)
+ return __intrin_bitcast<_To>(
+ _mm_unpacklo_epi32(_mm_unpacklo_epi16(__intrin, __m128i()),
+ __m128i()));
+ else if constexpr (__x_to_y)
+ return __intrin_bitcast<_To>(is_signed_v<_Tp>
+ ? _mm256_cvtepi16_epi64(__intrin)
+ : _mm256_cvtepu16_epi64(__intrin));
+ else if constexpr (__x_to_z)
+ return __intrin_bitcast<_To>(is_signed_v<_Tp>
+ ? _mm512_cvtepi16_epi64(__intrin)
+ : _mm512_cvtepu16_epi64(__intrin));
+ }
+ else if constexpr (__i16_to_i32) //{{{2
+ {
+ if constexpr (__x_to_x && __have_sse4_1)
+ return __intrin_bitcast<_To>(is_signed_v<_Tp>
+ ? _mm_cvtepi16_epi32(__intrin)
+ : _mm_cvtepu16_epi32(__intrin));
+ else if constexpr (__x_to_x && is_signed_v<_Tp>)
+ return __intrin_bitcast<_To>(
+ _mm_srai_epi32(_mm_unpacklo_epi16(__intrin, __intrin), 16));
+ else if constexpr (__x_to_x && is_unsigned_v<_Tp>)
+ return __intrin_bitcast<_To>(_mm_unpacklo_epi16(__intrin, __m128i()));
+ else if constexpr (__x_to_y)
+ return __intrin_bitcast<_To>(is_signed_v<_Tp>
+ ? _mm256_cvtepi16_epi32(__intrin)
+ : _mm256_cvtepu16_epi32(__intrin));
+ else if constexpr (__y_to_z)
+ return __intrin_bitcast<_To>(is_signed_v<_Tp>
+ ? _mm512_cvtepi16_epi32(__intrin)
+ : _mm512_cvtepu16_epi32(__intrin));
+ }
+ else if constexpr (__i16_to_i8) //{{{2
+ {
+ if constexpr (__x_to_x && __have_avx512bw_vl)
+ return __intrin_bitcast<_To>(_mm_cvtepi16_epi8(__intrin));
+ else if constexpr (__x_to_x && __have_avx512bw)
+ return __intrin_bitcast<_To>(
+ __lo128(_mm512_cvtepi16_epi8(__zero_extend(__intrin))));
+ else if constexpr (__x_to_x && __have_ssse3)
+ return __intrin_bitcast<_To>(_mm_shuffle_epi8(
+ __intrin, _mm_setr_epi8(0, 2, 4, 6, 8, 10, 12, 14, -0x80, -0x80,
+ -0x80, -0x80, -0x80, -0x80, -0x80, -0x80)));
+ else if constexpr (__x_to_x)
+ {
+ auto __a
+ = _mm_unpacklo_epi8(__intrin, __intrin); // 00.. 11.. 22.. 33..
+ auto __b
+ = _mm_unpackhi_epi8(__intrin, __intrin); // 44.. 55.. 66.. 77..
+ auto __c = _mm_unpacklo_epi8(__a, __b); // 0404 .... 1515 ....
+ auto __d = _mm_unpackhi_epi8(__a, __b); // 2626 .... 3737 ....
+ auto __e = _mm_unpacklo_epi8(__c, __d); // 0246 0246 .... ....
+ auto __f = _mm_unpackhi_epi8(__c, __d); // 1357 1357 .... ....
+ return __intrin_bitcast<_To>(_mm_unpacklo_epi8(__e, __f));
+ }
+ else if constexpr (__y_to_x && __have_avx512bw_vl)
+ return __intrin_bitcast<_To>(_mm256_cvtepi16_epi8(__intrin));
+ else if constexpr (__y_to_x && __have_avx512bw)
+ return __intrin_bitcast<_To>(
+ __lo256(_mm512_cvtepi16_epi8(__zero_extend(__intrin))));
+ else if constexpr (__y_to_x)
+ {
+ auto __a = _mm256_shuffle_epi8(
+ __intrin,
+ _mm256_setr_epi8(0, 2, 4, 6, 8, 10, 12, 14, -0x80, -0x80, -0x80,
+ -0x80, -0x80, -0x80, -0x80, -0x80, -0x80, -0x80,
+ -0x80, -0x80, -0x80, -0x80, -0x80, -0x80, 0, 2, 4,
+ 6, 8, 10, 12, 14));
+ return __intrin_bitcast<_To>(__lo128(__a) | __hi128(__a));
+ }
+ else if constexpr (__z_to_y && __have_avx512bw)
+ return __intrin_bitcast<_To>(_mm512_cvtepi16_epi8(__intrin));
+ else if constexpr (__z_to_y)
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (__i8_to_i64) //{{{2
+ {
+ if constexpr (__x_to_x && __have_sse4_1)
+ return __intrin_bitcast<_To>(is_signed_v<_Tp>
+ ? _mm_cvtepi8_epi64(__intrin)
+ : _mm_cvtepu8_epi64(__intrin));
+ else if constexpr (__x_to_x && is_signed_v<_Tp>)
+ {
+ if constexpr (__have_ssse3)
+ {
+ auto __dup = _mm_unpacklo_epi8(__intrin, __intrin);
+ auto __epi16 = _mm_srai_epi16(__dup, 8);
+		  // sign-extend the low two 8-bit elements to 64 bits
+		  return __intrin_bitcast<_To>(
+		    _mm_shuffle_epi8(__epi16,
+				     _mm_setr_epi8(0, 1, 1, 1, 1, 1, 1, 1, 2,
+						   3, 3, 3, 3, 3, 3, 3)));
+ }
+ else
+ {
+ auto __x = _mm_unpacklo_epi8(__intrin, __intrin);
+ __x = _mm_unpacklo_epi16(__x, __x);
+ return __intrin_bitcast<_To>(
+ _mm_unpacklo_epi32(_mm_srai_epi32(__x, 24),
+ _mm_srai_epi32(__x, 31)));
+ }
+ }
+ else if constexpr (__x_to_x)
+ {
+ return __intrin_bitcast<_To>(_mm_unpacklo_epi32(
+ _mm_unpacklo_epi16(_mm_unpacklo_epi8(__intrin, __m128i()),
+ __m128i()),
+ __m128i()));
+ }
+ else if constexpr (__x_to_y)
+ return __intrin_bitcast<_To>(is_signed_v<_Tp>
+ ? _mm256_cvtepi8_epi64(__intrin)
+ : _mm256_cvtepu8_epi64(__intrin));
+ else if constexpr (__x_to_z)
+ return __intrin_bitcast<_To>(is_signed_v<_Tp>
+ ? _mm512_cvtepi8_epi64(__intrin)
+ : _mm512_cvtepu8_epi64(__intrin));
+ }
+ else if constexpr (__i8_to_i32) //{{{2
+ {
+ if constexpr (__x_to_x && __have_sse4_1)
+ return __intrin_bitcast<_To>(is_signed_v<_Tp>
+ ? _mm_cvtepi8_epi32(__intrin)
+ : _mm_cvtepu8_epi32(__intrin));
+ else if constexpr (__x_to_x && is_signed_v<_Tp>)
+ {
+ const auto __x = _mm_unpacklo_epi8(__intrin, __intrin);
+ return __intrin_bitcast<_To>(
+ _mm_srai_epi32(_mm_unpacklo_epi16(__x, __x), 24));
+ }
+ else if constexpr (__x_to_x && is_unsigned_v<_Tp>)
+ return __intrin_bitcast<_To>(
+ _mm_unpacklo_epi16(_mm_unpacklo_epi8(__intrin, __m128i()),
+ __m128i()));
+ else if constexpr (__x_to_y)
+ return __intrin_bitcast<_To>(is_signed_v<_Tp>
+ ? _mm256_cvtepi8_epi32(__intrin)
+ : _mm256_cvtepu8_epi32(__intrin));
+ else if constexpr (__x_to_z)
+ return __intrin_bitcast<_To>(is_signed_v<_Tp>
+ ? _mm512_cvtepi8_epi32(__intrin)
+ : _mm512_cvtepu8_epi32(__intrin));
+ }
+ else if constexpr (__i8_to_i16) //{{{2
+ {
+ if constexpr (__x_to_x && __have_sse4_1)
+ return __intrin_bitcast<_To>(is_signed_v<_Tp>
+ ? _mm_cvtepi8_epi16(__intrin)
+ : _mm_cvtepu8_epi16(__intrin));
+ else if constexpr (__x_to_x && is_signed_v<_Tp>)
+ return __intrin_bitcast<_To>(
+ _mm_srai_epi16(_mm_unpacklo_epi8(__intrin, __intrin), 8));
+ else if constexpr (__x_to_x && is_unsigned_v<_Tp>)
+ return __intrin_bitcast<_To>(_mm_unpacklo_epi8(__intrin, __m128i()));
+ else if constexpr (__x_to_y)
+ return __intrin_bitcast<_To>(is_signed_v<_Tp>
+ ? _mm256_cvtepi8_epi16(__intrin)
+ : _mm256_cvtepu8_epi16(__intrin));
+ else if constexpr (__y_to_z && __have_avx512bw)
+ return __intrin_bitcast<_To>(is_signed_v<_Tp>
+ ? _mm512_cvtepi8_epi16(__intrin)
+ : _mm512_cvtepu8_epi16(__intrin));
+ else if constexpr (__y_to_z)
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (__f32_to_s64) //{{{2
+ {
+ if constexpr (__have_avx512dq_vl && __x_to_x)
+ return __intrin_bitcast<_To>(_mm_cvttps_epi64(__intrin));
+ else if constexpr (__have_avx512dq_vl && __x_to_y)
+ return __intrin_bitcast<_To>(_mm256_cvttps_epi64(__intrin));
+ else if constexpr (__have_avx512dq && __y_to_z)
+ return __intrin_bitcast<_To>(_mm512_cvttps_epi64(__intrin));
+ // else use scalar fallback
+ }
+ else if constexpr (__f32_to_u64) //{{{2
+ {
+ if constexpr (__have_avx512dq_vl && __x_to_x)
+ return __intrin_bitcast<_To>(_mm_cvttps_epu64(__intrin));
+ else if constexpr (__have_avx512dq_vl && __x_to_y)
+ return __intrin_bitcast<_To>(_mm256_cvttps_epu64(__intrin));
+ else if constexpr (__have_avx512dq && __y_to_z)
+ return __intrin_bitcast<_To>(_mm512_cvttps_epu64(__intrin));
+ // else use scalar fallback
+ }
+ else if constexpr (__f32_to_s32) //{{{2
+ {
+ if constexpr (__x_to_x || __y_to_y || __z_to_z)
+ {
+ // go to fallback, it does the right thing
+ }
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (__f32_to_u32) //{{{2
+ {
+ if constexpr (__have_avx512vl && __x_to_x)
+ return __auto_bitcast(_mm_cvttps_epu32(__intrin));
+ else if constexpr (__have_avx512f && __x_to_x)
+ return __auto_bitcast(
+ __lo128(_mm512_cvttps_epu32(__auto_bitcast(__v))));
+ else if constexpr (__have_avx512vl && __y_to_y)
+ return __vector_bitcast<_Up>(_mm256_cvttps_epu32(__intrin));
+ else if constexpr (__have_avx512f && __y_to_y)
+ return __vector_bitcast<_Up>(
+ __lo256(_mm512_cvttps_epu32(__auto_bitcast(__v))));
+ else if constexpr (__x_to_x || __y_to_y || __z_to_z)
+ {
+ // go to fallback, it does the right thing. We can't use the
+ // _mm_floor_ps - 0x8000'0000 trick for f32->u32 because it would
+ // discard small input values (only 24 mantissa bits)
+ }
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else if constexpr (__f32_to_ibw) //{{{2
+ return __convert_x86<_To>(__convert_x86<__vector_type_t<int, _Np>>(__v));
+ else if constexpr (__f64_to_s64) //{{{2
+ {
+ if constexpr (__have_avx512dq_vl && __x_to_x)
+ return __intrin_bitcast<_To>(_mm_cvttpd_epi64(__intrin));
+ else if constexpr (__have_avx512dq_vl && __y_to_y)
+ return __intrin_bitcast<_To>(_mm256_cvttpd_epi64(__intrin));
+ else if constexpr (__have_avx512dq && __z_to_z)
+ return __intrin_bitcast<_To>(_mm512_cvttpd_epi64(__intrin));
+ // else use scalar fallback
+ }
+ else if constexpr (__f64_to_u64) //{{{2
+ {
+ if constexpr (__have_avx512dq_vl && __x_to_x)
+ return __intrin_bitcast<_To>(_mm_cvttpd_epu64(__intrin));
+ else if constexpr (__have_avx512dq_vl && __y_to_y)
+ return __intrin_bitcast<_To>(_mm256_cvttpd_epu64(__intrin));
+ else if constexpr (__have_avx512dq && __z_to_z)
+ return __intrin_bitcast<_To>(_mm512_cvttpd_epu64(__intrin));
+ // else use scalar fallback
+ }
+ else if constexpr (__f64_to_s32) //{{{2
+ {
+ if constexpr (__x_to_x)
+ return __intrin_bitcast<_To>(_mm_cvttpd_epi32(__intrin));
+ else if constexpr (__y_to_x)
+ return __intrin_bitcast<_To>(_mm256_cvttpd_epi32(__intrin));
+ else if constexpr (__z_to_y)
+ return __intrin_bitcast<_To>(_mm512_cvttpd_epi32(__intrin));
+ }
+ else if constexpr (__f64_to_u32) //{{{2
+ {
+ if constexpr (__have_avx512vl && __x_to_x)
+ return __intrin_bitcast<_To>(_mm_cvttpd_epu32(__intrin));
+ else if constexpr (__have_sse4_1 && __x_to_x)
+ return __vector_bitcast<_Up, _M>(
+ _mm_cvttpd_epi32(_mm_floor_pd(__intrin) - 0x8000'0000u))
+ ^ 0x8000'0000u;
+ else if constexpr (__x_to_x)
+ {
+ // use scalar fallback: it's only 2 values to convert, can't get much
+ // better than scalar decomposition
+ }
+ else if constexpr (__have_avx512vl && __y_to_x)
+ return __intrin_bitcast<_To>(_mm256_cvttpd_epu32(__intrin));
+ else if constexpr (__y_to_x)
+ {
+ return __intrin_bitcast<_To>(
+ __vector_bitcast<_Up>(
+ _mm256_cvttpd_epi32(_mm256_floor_pd(__intrin) - 0x8000'0000u))
+ ^ 0x8000'0000u);
+ }
+ else if constexpr (__z_to_y)
+ return __intrin_bitcast<_To>(_mm512_cvttpd_epu32(__intrin));
+ }
+ else if constexpr (__f64_to_ibw) //{{{2
+ {
+ return __convert_x86<_To>(
+ __convert_x86<__vector_type_t<int, (_Np < 4 ? 4 : _Np)>>(__v));
+ }
+ else if constexpr (__s64_to_f32) //{{{2
+ {
+ if constexpr (__x_to_x && __have_avx512dq_vl)
+ return __intrin_bitcast<_To>(_mm_cvtepi64_ps(__intrin));
+ else if constexpr (__y_to_x && __have_avx512dq_vl)
+ return __intrin_bitcast<_To>(_mm256_cvtepi64_ps(__intrin));
+ else if constexpr (__z_to_y && __have_avx512dq)
+ return __intrin_bitcast<_To>(_mm512_cvtepi64_ps(__intrin));
+ else if constexpr (__z_to_y)
+ return __intrin_bitcast<_To>(
+ _mm512_cvtpd_ps(__convert_x86<__vector_type_t<double, 8>>(__v)));
+ }
+ else if constexpr (__u64_to_f32) //{{{2
+ {
+ if constexpr (__x_to_x && __have_avx512dq_vl)
+ return __intrin_bitcast<_To>(_mm_cvtepu64_ps(__intrin));
+ else if constexpr (__y_to_x && __have_avx512dq_vl)
+ return __intrin_bitcast<_To>(_mm256_cvtepu64_ps(__intrin));
+ else if constexpr (__z_to_y && __have_avx512dq)
+ return __intrin_bitcast<_To>(_mm512_cvtepu64_ps(__intrin));
+ else if constexpr (__z_to_y)
+ {
+ return __intrin_bitcast<_To>(
+ __lo256(_mm512_cvtepu32_ps(__auto_bitcast(
+ _mm512_cvtepi64_epi32(_mm512_srai_epi64(__intrin, 32)))))
+ * 0x100000000LL
+ + __lo256(_mm512_cvtepu32_ps(
+ __auto_bitcast(_mm512_cvtepi64_epi32(__intrin)))));
+ }
+ }
+ else if constexpr (__s32_to_f32) //{{{2
+ {
+ // use fallback (builtin conversion)
+ }
+ else if constexpr (__u32_to_f32) //{{{2
+ {
+ if constexpr (__x_to_x && __have_avx512vl)
+ {
+ // use fallback
+ }
+ else if constexpr (__x_to_x && __have_avx512f)
+ return __intrin_bitcast<_To>(
+ __lo128(_mm512_cvtepu32_ps(__auto_bitcast(__v))));
+ else if constexpr (__x_to_x && (__have_fma || __have_fma4))
+ // work around PR85819
+ return __auto_bitcast(0x10000 * _mm_cvtepi32_ps(__to_intrin(__v >> 16))
+ + _mm_cvtepi32_ps(__to_intrin(__v & 0xffff)));
+ else if constexpr (__y_to_y && __have_avx512vl)
+ {
+ // use fallback
+ }
+ else if constexpr (__y_to_y && __have_avx512f)
+ return __intrin_bitcast<_To>(
+ __lo256(_mm512_cvtepu32_ps(__auto_bitcast(__v))));
+ else if constexpr (__y_to_y)
+ // work around PR85819
+ return 0x10000 * _mm256_cvtepi32_ps(__to_intrin(__v >> 16))
+ + _mm256_cvtepi32_ps(__to_intrin(__v & 0xffff));
+ // else use fallback (builtin conversion)
+ }
+ else if constexpr (__ibw_to_f32) //{{{2
+ {
+ if constexpr (_M <= 4 || __have_avx2)
+ return __convert_x86<_To>(__convert_x86<__vector_type_t<int, _M>>(__v));
+ else
+ {
+ static_assert(__x_to_y);
+ __m128i __a, __b;
+ if constexpr (__have_sse4_1)
+ {
+ __a = sizeof(_Tp) == 2
+ ? (is_signed_v<_Tp> ? _mm_cvtepi16_epi32(__intrin)
+ : _mm_cvtepu16_epi32(__intrin))
+ : (is_signed_v<_Tp> ? _mm_cvtepi8_epi32(__intrin)
+ : _mm_cvtepu8_epi32(__intrin));
+ const auto __w
+ = _mm_shuffle_epi32(__intrin, sizeof(_Tp) == 2 ? 0xee : 0xe9);
+ __b = sizeof(_Tp) == 2
+ ? (is_signed_v<_Tp> ? _mm_cvtepi16_epi32(__w)
+ : _mm_cvtepu16_epi32(__w))
+ : (is_signed_v<_Tp> ? _mm_cvtepi8_epi32(__w)
+ : _mm_cvtepu8_epi32(__w));
+ }
+ else
+ {
+ __m128i __tmp;
+ if constexpr (sizeof(_Tp) == 1)
+ {
+ __tmp
+ = is_signed_v<_Tp>
+ ? _mm_srai_epi16(_mm_unpacklo_epi8(__intrin, __intrin),
+ 8)
+ : _mm_unpacklo_epi8(__intrin, __m128i());
+ }
+ else
+ {
+ static_assert(sizeof(_Tp) == 2);
+ __tmp = __intrin;
+ }
+ __a = is_signed_v<_Tp>
+ ? _mm_srai_epi32(_mm_unpacklo_epi16(__tmp, __tmp), 16)
+ : _mm_unpacklo_epi16(__tmp, __m128i());
+ __b = is_signed_v<_Tp>
+ ? _mm_srai_epi32(_mm_unpackhi_epi16(__tmp, __tmp), 16)
+ : _mm_unpackhi_epi16(__tmp, __m128i());
+ }
+ return __convert_x86<_To>(__vector_bitcast<int>(__a),
+ __vector_bitcast<int>(__b));
+ }
+ }
+ else if constexpr (__s64_to_f64) //{{{2
+ {
+ if constexpr (__x_to_x && __have_avx512dq_vl)
+ return __intrin_bitcast<_To>(_mm_cvtepi64_pd(__intrin));
+ else if constexpr (__y_to_y && __have_avx512dq_vl)
+ return __intrin_bitcast<_To>(_mm256_cvtepi64_pd(__intrin));
+ else if constexpr (__z_to_z && __have_avx512dq)
+ return __intrin_bitcast<_To>(_mm512_cvtepi64_pd(__intrin));
+ else if constexpr (__z_to_z)
+ {
+ return __intrin_bitcast<_To>(
+ _mm512_cvtepi32_pd(_mm512_cvtepi64_epi32(__to_intrin(__v >> 32)))
+ * 0x100000000LL
+ + _mm512_cvtepu32_pd(_mm512_cvtepi64_epi32(__intrin)));
+ }
+ }
+ else if constexpr (__u64_to_f64) //{{{2
+ {
+ if constexpr (__x_to_x && __have_avx512dq_vl)
+ return __intrin_bitcast<_To>(_mm_cvtepu64_pd(__intrin));
+ else if constexpr (__y_to_y && __have_avx512dq_vl)
+ return __intrin_bitcast<_To>(_mm256_cvtepu64_pd(__intrin));
+ else if constexpr (__z_to_z && __have_avx512dq)
+ return __intrin_bitcast<_To>(_mm512_cvtepu64_pd(__intrin));
+ else if constexpr (__z_to_z)
+ {
+ return __intrin_bitcast<_To>(
+ _mm512_cvtepu32_pd(_mm512_cvtepi64_epi32(__to_intrin(__v >> 32)))
+ * 0x100000000LL
+ + _mm512_cvtepu32_pd(_mm512_cvtepi64_epi32(__intrin)));
+ }
+ }
+ else if constexpr (__s32_to_f64) //{{{2
+ {
+ if constexpr (__x_to_x)
+ return __intrin_bitcast<_To>(_mm_cvtepi32_pd(__intrin));
+ else if constexpr (__x_to_y)
+ return __intrin_bitcast<_To>(_mm256_cvtepi32_pd(__intrin));
+ else if constexpr (__y_to_z)
+ return __intrin_bitcast<_To>(_mm512_cvtepi32_pd(__intrin));
+ }
+ else if constexpr (__u32_to_f64) //{{{2
+ {
+ if constexpr (__x_to_x && __have_avx512vl)
+ return __intrin_bitcast<_To>(_mm_cvtepu32_pd(__intrin));
+ else if constexpr (__x_to_x && __have_avx512f)
+ return __intrin_bitcast<_To>(
+ __lo128(_mm512_cvtepu32_pd(__auto_bitcast(__v))));
+ else if constexpr (__x_to_x)
+ return __intrin_bitcast<_To>(
+ _mm_cvtepi32_pd(__to_intrin(__v ^ 0x8000'0000u)) + 0x8000'0000u);
+ else if constexpr (__x_to_y && __have_avx512vl)
+ return __intrin_bitcast<_To>(_mm256_cvtepu32_pd(__intrin));
+ else if constexpr (__x_to_y && __have_avx512f)
+ return __intrin_bitcast<_To>(
+ __lo256(_mm512_cvtepu32_pd(__auto_bitcast(__v))));
+ else if constexpr (__x_to_y)
+ return __intrin_bitcast<_To>(
+ _mm256_cvtepi32_pd(__to_intrin(__v ^ 0x8000'0000u)) + 0x8000'0000u);
+ else if constexpr (__y_to_z)
+ return __intrin_bitcast<_To>(_mm512_cvtepu32_pd(__intrin));
+ }
+ else if constexpr (__ibw_to_f64) //{{{2
+ {
+ return __convert_x86<_To>(
+ __convert_x86<__vector_type_t<int, std::max(size_t(4), _M)>>(__v));
+ }
+ else if constexpr (__f32_to_f64) //{{{2
+ {
+ if constexpr (__x_to_x)
+ return __intrin_bitcast<_To>(_mm_cvtps_pd(__intrin));
+ else if constexpr (__x_to_y)
+ return __intrin_bitcast<_To>(_mm256_cvtps_pd(__intrin));
+ else if constexpr (__y_to_z)
+ return __intrin_bitcast<_To>(_mm512_cvtps_pd(__intrin));
+ }
+ else if constexpr (__f64_to_f32) //{{{2
+ {
+ if constexpr (__x_to_x)
+ return __intrin_bitcast<_To>(_mm_cvtpd_ps(__intrin));
+ else if constexpr (__y_to_x)
+ return __intrin_bitcast<_To>(_mm256_cvtpd_ps(__intrin));
+ else if constexpr (__z_to_y)
+ return __intrin_bitcast<_To>(_mm512_cvtpd_ps(__intrin));
+ }
+ else //{{{2
+ __assert_unreachable<_Tp>();
+
+ // fallback:{{{2
+ return __vector_convert<_To>(__v, make_index_sequence<std::min(_M, _Np)>());
+ //}}}
+} // }}}
+// 2-arg __convert_x86 {{{1
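+// Converts the two input vectors __v0 and __v1 into a single destination
+// vector _To. Where the target ISA can work on registers twice the input
+// size, the inputs are concatenated and passed to the 1-arg overload above;
+// the remaining branches interleave and shuffle the two inputs directly.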
+template <typename _To, typename _V, typename _Traits>
+_GLIBCXX_SIMD_INTRINSIC _To
+__convert_x86(_V __v0, _V __v1)
+{
+ static_assert(__is_vector_type_v<_V>);
+ using _Tp = typename _Traits::value_type;
+ constexpr size_t _Np = _Traits::_S_width;
+ [[maybe_unused]] const auto __i0 = __to_intrin(__v0);
+ [[maybe_unused]] const auto __i1 = __to_intrin(__v1);
+ using _Up = typename _VectorTraits<_To>::value_type;
+ constexpr size_t _M = _VectorTraits<_To>::_S_width;
+
+ static_assert(2 * _Np <= _M, "__v1 would be discarded; use the one-argument "
+ "__convert_x86 overload instead");
+
+ // [xyz]_to_[xyz] {{{2
+ [[maybe_unused]] constexpr bool __x_to_x
+ = sizeof(__v0) <= 16 && sizeof(_To) <= 16;
+ [[maybe_unused]] constexpr bool __x_to_y
+ = sizeof(__v0) <= 16 && sizeof(_To) == 32;
+ [[maybe_unused]] constexpr bool __x_to_z
+ = sizeof(__v0) <= 16 && sizeof(_To) == 64;
+ [[maybe_unused]] constexpr bool __y_to_x
+ = sizeof(__v0) == 32 && sizeof(_To) <= 16;
+ [[maybe_unused]] constexpr bool __y_to_y
+ = sizeof(__v0) == 32 && sizeof(_To) == 32;
+ [[maybe_unused]] constexpr bool __y_to_z
+ = sizeof(__v0) == 32 && sizeof(_To) == 64;
+ [[maybe_unused]] constexpr bool __z_to_x
+ = sizeof(__v0) == 64 && sizeof(_To) <= 16;
+ [[maybe_unused]] constexpr bool __z_to_y
+ = sizeof(__v0) == 64 && sizeof(_To) == 32;
+ [[maybe_unused]] constexpr bool __z_to_z
+ = sizeof(__v0) == 64 && sizeof(_To) == 64;
+
+ // iX_to_iX {{{2
+ [[maybe_unused]] constexpr bool __i_to_i
+ = std::is_integral_v<_Up> && std::is_integral_v<_Tp>;
+ [[maybe_unused]] constexpr bool __i8_to_i16
+ = __i_to_i && sizeof(_Tp) == 1 && sizeof(_Up) == 2;
+ [[maybe_unused]] constexpr bool __i8_to_i32
+ = __i_to_i && sizeof(_Tp) == 1 && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __i8_to_i64
+ = __i_to_i && sizeof(_Tp) == 1 && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __i16_to_i8
+ = __i_to_i && sizeof(_Tp) == 2 && sizeof(_Up) == 1;
+ [[maybe_unused]] constexpr bool __i16_to_i32
+ = __i_to_i && sizeof(_Tp) == 2 && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __i16_to_i64
+ = __i_to_i && sizeof(_Tp) == 2 && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __i32_to_i8
+ = __i_to_i && sizeof(_Tp) == 4 && sizeof(_Up) == 1;
+ [[maybe_unused]] constexpr bool __i32_to_i16
+ = __i_to_i && sizeof(_Tp) == 4 && sizeof(_Up) == 2;
+ [[maybe_unused]] constexpr bool __i32_to_i64
+ = __i_to_i && sizeof(_Tp) == 4 && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __i64_to_i8
+ = __i_to_i && sizeof(_Tp) == 8 && sizeof(_Up) == 1;
+ [[maybe_unused]] constexpr bool __i64_to_i16
+ = __i_to_i && sizeof(_Tp) == 8 && sizeof(_Up) == 2;
+ [[maybe_unused]] constexpr bool __i64_to_i32
+ = __i_to_i && sizeof(_Tp) == 8 && sizeof(_Up) == 4;
+
+ // [fsu]X_to_[fsu]X {{{2
+ // ibw = integral && byte or word, i.e. char and short with any signedness
+ [[maybe_unused]] constexpr bool __i64_to_f32
+ = is_integral_v<_Tp> && sizeof(_Tp) == 8
+ && is_floating_point_v<_Up> && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __s32_to_f32
+ = is_integral_v<_Tp> && is_signed_v<_Tp> && sizeof(_Tp) == 4
+ && is_floating_point_v<_Up> && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __s16_to_f32
+ = is_integral_v<_Tp> && is_signed_v<_Tp> && sizeof(_Tp) == 2
+ && is_floating_point_v<_Up> && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __s8_to_f32
+ = is_integral_v<_Tp> && is_signed_v<_Tp> && sizeof(_Tp) == 1
+ && is_floating_point_v<_Up> && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __u32_to_f32
+ = is_integral_v<_Tp> && is_unsigned_v<_Tp> && sizeof(_Tp) == 4
+ && is_floating_point_v<_Up> && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __u16_to_f32
+ = is_integral_v<_Tp> && is_unsigned_v<_Tp> && sizeof(_Tp) == 2
+ && is_floating_point_v<_Up> && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __u8_to_f32
+ = is_integral_v<_Tp> && is_unsigned_v<_Tp> && sizeof(_Tp) == 1
+ && is_floating_point_v<_Up> && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __s64_to_f64
+ = is_integral_v<_Tp> && is_signed_v<_Tp> && sizeof(_Tp) == 8
+ && is_floating_point_v<_Up> && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __s32_to_f64
+ = is_integral_v<_Tp> && is_signed_v<_Tp> && sizeof(_Tp) == 4
+ && is_floating_point_v<_Up> && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __s16_to_f64
+ = is_integral_v<_Tp> && is_signed_v<_Tp> && sizeof(_Tp) == 2
+ && is_floating_point_v<_Up> && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __s8_to_f64
+ = is_integral_v<_Tp> && is_signed_v<_Tp> && sizeof(_Tp) == 1
+ && is_floating_point_v<_Up> && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __u64_to_f64
+ = is_integral_v<_Tp> && is_unsigned_v<_Tp> && sizeof(_Tp) == 8
+ && is_floating_point_v<_Up> && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __u32_to_f64
+ = is_integral_v<_Tp> && is_unsigned_v<_Tp> && sizeof(_Tp) == 4
+ && is_floating_point_v<_Up> && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __u16_to_f64
+ = is_integral_v<_Tp> && is_unsigned_v<_Tp> && sizeof(_Tp) == 2
+ && is_floating_point_v<_Up> && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __u8_to_f64
+ = is_integral_v<_Tp> && is_unsigned_v<_Tp> && sizeof(_Tp) == 1
+ && is_floating_point_v<_Up> && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __f32_to_s64
+ = is_integral_v<_Up> && is_signed_v<_Up> && sizeof(_Up) == 8
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 4;
+ [[maybe_unused]] constexpr bool __f32_to_s32
+ = is_integral_v<_Up> && is_signed_v<_Up> && sizeof(_Up) == 4
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 4;
+ [[maybe_unused]] constexpr bool __f32_to_u64
+ = is_integral_v<_Up> && is_unsigned_v<_Up> && sizeof(_Up) == 8
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 4;
+ [[maybe_unused]] constexpr bool __f32_to_u32
+ = is_integral_v<_Up> && is_unsigned_v<_Up> && sizeof(_Up) == 4
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 4;
+ [[maybe_unused]] constexpr bool __f64_to_s64
+ = is_integral_v<_Up> && is_signed_v<_Up> && sizeof(_Up) == 8
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 8;
+ [[maybe_unused]] constexpr bool __f64_to_s32
+ = is_integral_v<_Up> && is_signed_v<_Up> && sizeof(_Up) == 4
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 8;
+ [[maybe_unused]] constexpr bool __f64_to_u64
+ = is_integral_v<_Up> && is_unsigned_v<_Up> && sizeof(_Up) == 8
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 8;
+ [[maybe_unused]] constexpr bool __f64_to_u32
+ = is_integral_v<_Up> && is_unsigned_v<_Up> && sizeof(_Up) == 4
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 8;
+ [[maybe_unused]] constexpr bool __f32_to_ibw
+ = is_integral_v<_Up> && sizeof(_Up) <= 2
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 4;
+ [[maybe_unused]] constexpr bool __f64_to_ibw
+ = is_integral_v<_Up> && sizeof(_Up) <= 2
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 8;
+ [[maybe_unused]] constexpr bool __f32_to_f64
+ = is_floating_point_v<_Tp> && sizeof(_Tp) == 4
+ && is_floating_point_v<_Up> && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __f64_to_f32
+ = is_floating_point_v<_Tp> && sizeof(_Tp) == 8
+ && is_floating_point_v<_Up> && sizeof(_Up) == 4;
+
+ if constexpr (__i_to_i && __y_to_x && !__have_avx2)
+ { //{{{2
+      // e.g. <long long, 4>, <long long, 4> => <short, 8>
+ return __convert_x86<_To>(__lo128(__v0), __hi128(__v0), __lo128(__v1),
+ __hi128(__v1));
+ }
+ else if constexpr (__i_to_i)
+ { // assert ISA {{{2
+ static_assert(__x_to_x || __have_avx2,
+ "integral conversions with ymm registers require AVX2");
+ static_assert(__have_avx512bw
+ || ((sizeof(_Tp) >= 4 || sizeof(__v0) < 64)
+ && (sizeof(_Up) >= 4 || sizeof(_To) < 64)),
+ "8/16-bit integers in zmm registers require AVX512BW");
+      static_assert((sizeof(__v0) < 64 && sizeof(_To) < 64) || __have_avx512f,
+		    "integral conversions with zmm registers require AVX512F");
+ }
+ // concat => use 1-arg __convert_x86 {{{2
+ if constexpr ((sizeof(__v0) == 16 && __have_avx2)
+ || (sizeof(__v0) == 16 && __have_avx
+ && std::is_floating_point_v<_Tp>)
+ || (sizeof(__v0) == 32 && __have_avx512f
+ && (sizeof(_Tp) >= 4 || __have_avx512bw)))
+ {
+ // The ISA can handle wider input registers, so concat and use one-arg
+ // implementation. This reduces code duplication considerably.
+ return __convert_x86<_To>(__concat(__v0, __v1));
+ }
+ else
+ { //{{{2
+ // conversion using bit reinterpretation (or no conversion at all) should
+ // all go through the concat branch above:
+ static_assert(!(
+ std::is_floating_point_v<
+ _Tp> == std::is_floating_point_v<_Up> && sizeof(_Tp) == sizeof(_Up)));
+ if constexpr (2 * _Np < _M && sizeof(_To) > 16)
+ { // handle all zero extension{{{2
+	  constexpr size_t _Min = 16 / sizeof(_Up);
+	  return __zero_extend(
+	    __convert_x86<
+	      __vector_type_t<_Up, (_Min > 2 * _Np) ? _Min : 2 * _Np>>(__v0,
+								       __v1));
+ }
+ else if constexpr (__i64_to_i32)
+ { //{{{2
+ if constexpr (__x_to_x)
+ return __auto_bitcast(
+ _mm_shuffle_ps(__auto_bitcast(__v0), __auto_bitcast(__v1), 0x88));
+ else if constexpr (__y_to_y)
+ {
+ // AVX512F is not available (would concat otherwise)
+ return __auto_bitcast(
+ __xzyw(_mm256_shuffle_ps(__auto_bitcast(__v0),
+ __auto_bitcast(__v1), 0x88)));
+ // alternative:
+ // const auto v0_abxxcdxx = _mm256_shuffle_epi32(__v0, 8);
+ // const auto v1_efxxghxx = _mm256_shuffle_epi32(__v1, 8);
+ // const auto v_abefcdgh = _mm256_unpacklo_epi64(v0_abxxcdxx,
+ // v1_efxxghxx); return _mm256_permute4x64_epi64(v_abefcdgh,
+ // 0x01 * 0 + 0x04 * 2 + 0x10 * 1 + 0x40 * 3); // abcdefgh
+ }
+ else if constexpr (__z_to_z)
+ return __intrin_bitcast<_To>(__concat(_mm512_cvtepi64_epi32(__i0),
+ _mm512_cvtepi64_epi32(__i1)));
+ }
+ else if constexpr (__i64_to_i16)
+ { //{{{2
+ if constexpr (__x_to_x)
+ {
+ // AVX2 is not available (would concat otherwise)
+ if constexpr (__have_sse4_1)
+ {
+ return __intrin_bitcast<_To>(_mm_shuffle_epi8(
+ _mm_blend_epi16(__i0, _mm_slli_si128(__i1, 4), 0x44),
+ _mm_setr_epi8(0, 1, 8, 9, 4, 5, 12, 13, -0x80, -0x80, -0x80,
+ -0x80, -0x80, -0x80, -0x80, -0x80)));
+ }
+ else
+ {
+ return __vector_type_t<_Up, _M>{_Up(__v0[0]), _Up(__v0[1]),
+ _Up(__v1[0]), _Up(__v1[1])};
+ }
+ }
+ else if constexpr (__y_to_x)
+ {
+ auto __a
+ = _mm256_unpacklo_epi16(__i0, __i1); // 04.. .... 26.. ....
+ auto __b
+ = _mm256_unpackhi_epi16(__i0, __i1); // 15.. .... 37.. ....
+ auto __c = _mm256_unpacklo_epi16(__a, __b); // 0145 .... 2367 ....
+ return __intrin_bitcast<_To>(
+ _mm_unpacklo_epi32(__lo128(__c), __hi128(__c))); // 0123 4567
+ }
+ else if constexpr (__z_to_y)
+ return __intrin_bitcast<_To>(__concat(_mm512_cvtepi64_epi16(__i0),
+ _mm512_cvtepi64_epi16(__i1)));
+ }
+ else if constexpr (__i64_to_i8)
+ { //{{{2
+ if constexpr (__x_to_x && __have_sse4_1)
+ {
+ return __intrin_bitcast<_To>(_mm_shuffle_epi8(
+ _mm_blend_epi16(__i0, _mm_slli_si128(__i1, 4), 0x44),
+ _mm_setr_epi8(0, 8, 4, 12, -0x80, -0x80, -0x80, -0x80, -0x80,
+ -0x80, -0x80, -0x80, -0x80, -0x80, -0x80,
+ -0x80)));
+ }
+ else if constexpr (__x_to_x && __have_ssse3)
+ {
+ return __intrin_bitcast<_To>(_mm_unpacklo_epi16(
+ _mm_shuffle_epi8(__i0, _mm_setr_epi8(0, 8, -0x80, -0x80, -0x80,
+ -0x80, -0x80, -0x80, -0x80,
+ -0x80, -0x80, -0x80, -0x80,
+ -0x80, -0x80, -0x80)),
+ _mm_shuffle_epi8(__i1, _mm_setr_epi8(0, 8, -0x80, -0x80, -0x80,
+ -0x80, -0x80, -0x80, -0x80,
+ -0x80, -0x80, -0x80, -0x80,
+ -0x80, -0x80, -0x80))));
+ }
+ else if constexpr (__x_to_x)
+ {
+ return __vector_type_t<_Up, _M>{_Up(__v0[0]), _Up(__v0[1]),
+ _Up(__v1[0]), _Up(__v1[1])};
+ }
+ else if constexpr (__y_to_x)
+ {
+ const auto __a = _mm256_shuffle_epi8(
+ _mm256_blend_epi32(__i0, _mm256_slli_epi64(__i1, 32), 0xAA),
+ _mm256_setr_epi8(0, 8, -0x80, -0x80, 4, 12, -0x80, -0x80, -0x80,
+ -0x80, -0x80, -0x80, -0x80, -0x80, -0x80,
+ -0x80, -0x80, -0x80, 0, 8, -0x80, -0x80, 4, 12,
+ -0x80, -0x80, -0x80, -0x80, -0x80, -0x80,
+ -0x80, -0x80));
+ return __intrin_bitcast<_To>(__lo128(__a) | __hi128(__a));
+ } // __z_to_x uses concat fallback
+ }
+ else if constexpr (__i32_to_i16)
+ { //{{{2
+ if constexpr (__x_to_x)
+ {
+ // AVX2 is not available (would concat otherwise)
+ if constexpr (__have_sse4_1)
+ {
+ return __intrin_bitcast<_To>(_mm_shuffle_epi8(
+ _mm_blend_epi16(__i0, _mm_slli_si128(__i1, 2), 0xaa),
+ _mm_setr_epi8(0, 1, 4, 5, 8, 9, 12, 13, 2, 3, 6, 7, 10, 11,
+ 14, 15)));
+ }
+ else if constexpr (__have_ssse3)
+ {
+ return __intrin_bitcast<_To>(
+ _mm_hadd_epi16(__to_intrin(__v0 << 16),
+ __to_intrin(__v1 << 16)));
+ /*
+ return _mm_unpacklo_epi64(
+ _mm_shuffle_epi8(__i0, _mm_setr_epi8(0, 1, 4, 5, 8, 9, 12,
+ 13, 8, 9, 12, 13, 12, 13, 14, 15)), _mm_shuffle_epi8(__i1,
+ _mm_setr_epi8(0, 1, 4, 5, 8, 9, 12, 13, 8, 9, 12, 13, 12, 13,
+ 14, 15)));
+ */
+ }
+ else
+ {
+ auto __a = _mm_unpacklo_epi16(__i0, __i1); // 04.. 15..
+ auto __b = _mm_unpackhi_epi16(__i0, __i1); // 26.. 37..
+ auto __c = _mm_unpacklo_epi16(__a, __b); // 0246 ....
+ auto __d = _mm_unpackhi_epi16(__a, __b); // 1357 ....
+ return __intrin_bitcast<_To>(
+ _mm_unpacklo_epi16(__c, __d)); // 0123 4567
+ }
+ }
+ else if constexpr (__y_to_y)
+ {
+ const auto __shuf
+ = _mm256_setr_epi8(0, 1, 4, 5, 8, 9, 12, 13, -0x80, -0x80,
+ -0x80, -0x80, -0x80, -0x80, -0x80, -0x80, 0,
+ 1, 4, 5, 8, 9, 12, 13, -0x80, -0x80, -0x80,
+ -0x80, -0x80, -0x80, -0x80, -0x80);
+ auto __a = _mm256_shuffle_epi8(__i0, __shuf);
+ auto __b = _mm256_shuffle_epi8(__i1, __shuf);
+ return __intrin_bitcast<_To>(
+ __xzyw(_mm256_unpacklo_epi64(__a, __b)));
+ } // __z_to_z uses concat fallback
+ }
+ else if constexpr (__i32_to_i8)
+ { //{{{2
+ if constexpr (__x_to_x && __have_ssse3)
+ {
+	    const auto __shufmask
+	      = _mm_setr_epi8(0, 4, 8, 12, -0x80, -0x80, -0x80, -0x80, -0x80,
+			      -0x80, -0x80, -0x80, -0x80, -0x80, -0x80,
+			      -0x80);
+	    return __intrin_bitcast<_To>(
+	      _mm_unpacklo_epi32(_mm_shuffle_epi8(__i0, __shufmask),
+				 _mm_shuffle_epi8(__i1, __shufmask)));
+ }
+ else if constexpr (__x_to_x)
+ {
+ auto __a = _mm_unpacklo_epi8(__i0, __i1); // 04.. .... 15.. ....
+ auto __b = _mm_unpackhi_epi8(__i0, __i1); // 26.. .... 37.. ....
+ auto __c = _mm_unpacklo_epi8(__a, __b); // 0246 .... .... ....
+ auto __d = _mm_unpackhi_epi8(__a, __b); // 1357 .... .... ....
+ auto __e = _mm_unpacklo_epi8(__c, __d); // 0123 4567 .... ....
+ return __intrin_bitcast<_To>(__e & __m128i{-1, 0});
+ }
+ else if constexpr (__y_to_x)
+ {
+ const auto __a = _mm256_shuffle_epi8(
+ _mm256_blend_epi16(__i0, _mm256_slli_epi32(__i1, 16), 0xAA),
+ _mm256_setr_epi8(0, 4, 8, 12, -0x80, -0x80, -0x80, -0x80, 2, 6,
+ 10, 14, -0x80, -0x80, -0x80, -0x80, -0x80,
+ -0x80, -0x80, -0x80, 0, 4, 8, 12, -0x80, -0x80,
+ -0x80, -0x80, 2, 6, 10, 14));
+ return __intrin_bitcast<_To>(__lo128(__a) | __hi128(__a));
+ } // __z_to_y uses concat fallback
+ }
+ else if constexpr (__i16_to_i8)
+ { //{{{2
+ if constexpr (__x_to_x && __have_ssse3)
+ {
+ const auto __shuf = reinterpret_cast<__m128i>(
+ __vector_type_t<_UChar, 16>{0, 2, 4, 6, 8, 10, 12, 14, 0x80,
+ 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
+ 0x80});
+ return __intrin_bitcast<_To>(
+ _mm_unpacklo_epi64(_mm_shuffle_epi8(__i0, __shuf),
+ _mm_shuffle_epi8(__i1, __shuf)));
+ }
+ else if constexpr (__x_to_x)
+ {
+ auto __a = _mm_unpacklo_epi8(__i0, __i1); // 08.. 19.. 2A.. 3B..
+ auto __b = _mm_unpackhi_epi8(__i0, __i1); // 4C.. 5D.. 6E.. 7F..
+ auto __c = _mm_unpacklo_epi8(__a, __b); // 048C .... 159D ....
+ auto __d = _mm_unpackhi_epi8(__a, __b); // 26AE .... 37BF ....
+ auto __e = _mm_unpacklo_epi8(__c, __d); // 0246 8ACE .... ....
+ auto __f = _mm_unpackhi_epi8(__c, __d); // 1357 9BDF .... ....
+ return __intrin_bitcast<_To>(_mm_unpacklo_epi8(__e, __f));
+ }
+ else if constexpr (__y_to_y)
+ {
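+              // keep the low byte of each __v0 element, move the low byte of
+              // each __v1 element into the high byte, shuffle the bytes into
+              // place per 128-bit lane, and fix the lane order with __xzyw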
+ return __intrin_bitcast<_To>(__xzyw(_mm256_shuffle_epi8(
+ (__to_intrin(__v0) & _mm256_set1_epi32(0x00ff00ff))
+ | _mm256_slli_epi16(__i1, 8),
+ _mm256_setr_epi8(0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11,
+ 13, 15, 0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7,
+ 9, 11, 13, 15))));
+ } // __z_to_z uses concat fallback
+ }
+ else if constexpr (__i64_to_f32)
+ { //{{{2
+ if constexpr (__x_to_x)
+ return __make_wrapper<float>(__v0[0], __v0[1], __v1[0], __v1[1]);
+ else if constexpr (__y_to_y)
+ {
+ static_assert(__y_to_y && __have_avx2);
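+	  // Same idea as the AVX512 branch below: convert the high 32 bits and
+	  // the two 16-bit halves of the low 32 bits separately; the 16-bit
+	  // pieces convert exactly, avoiding cancellation when the high part
+	  // is a small negative number.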
+ const auto __a = _mm256_unpacklo_epi32(__i0, __i1); // aeAE cgCG
+ const auto __b = _mm256_unpackhi_epi32(__i0, __i1); // bfBF dhDH
+ const auto __lo32 = _mm256_unpacklo_epi32(__a, __b); // abef cdgh
+ const auto __hi32
+ = __vector_bitcast<conditional_t<is_signed_v<_Tp>, int, _UInt>>(
+ _mm256_unpackhi_epi32(__a, __b)); // ABEF CDGH
+ const auto __hi
+ = 0x100000000LL
+ * __convert_x86<__vector_type_t<float, 8>>(__hi32);
+ const auto __mid
+ = 0x10000 * _mm256_cvtepi32_ps(_mm256_srli_epi32(__lo32, 16));
+ const auto __lo
+ = _mm256_cvtepi32_ps(_mm256_set1_epi32(0x0000ffffu) & __lo32);
+ return __xzyw((__hi + __mid) + __lo);
+ }
+ else if constexpr (__z_to_z && __have_avx512dq)
+ {
+ return std::is_signed_v<_Tp> ? __concat(_mm512_cvtepi64_ps(__i0),
+ _mm512_cvtepi64_ps(__i1))
+ : __concat(_mm512_cvtepu64_ps(__i0),
+ _mm512_cvtepu64_ps(__i1));
+ }
+ else if constexpr (__z_to_z && std::is_signed_v<_Tp>)
+ {
+ const __m512 __hi32 = _mm512_cvtepi32_ps(
+ __concat(_mm512_cvtepi64_epi32(__to_intrin(__v0 >> 32)),
+ _mm512_cvtepi64_epi32(__to_intrin(__v1 >> 32))));
+ const __m512i __lo32 = __concat(_mm512_cvtepi64_epi32(__i0),
+ _mm512_cvtepi64_epi32(__i1));
+	  // split the low 32 bits, because if __hi32 is a small negative
+	  // number, the 24-bit mantissa may lose important information if
+	  // any of the high 8 bits of __lo32 are set, leading to
+	  // catastrophic cancellation in the FMA
+ const __m512 __hi16
+ = _mm512_cvtepu32_ps(_mm512_set1_epi32(0xffff0000u) & __lo32);
+ const __m512 __lo16
+ = _mm512_cvtepi32_ps(_mm512_set1_epi32(0x0000ffffu) & __lo32);
+ return (__hi32 * 0x100000000LL + __hi16) + __lo16;
+ }
+ else if constexpr (__z_to_z && std::is_unsigned_v<_Tp>)
+ {
+ return __intrin_bitcast<_To>(
+ _mm512_cvtepu32_ps(
+ __concat(_mm512_cvtepi64_epi32(_mm512_srai_epi64(__i0, 32)),
+ _mm512_cvtepi64_epi32(_mm512_srai_epi64(__i1, 32))))
+ * 0x100000000LL
+ + _mm512_cvtepu32_ps(__concat(_mm512_cvtepi64_epi32(__i0),
+ _mm512_cvtepi64_epi32(__i1))));
+ }
+ }
+ else if constexpr (__f64_to_s32)
+ { //{{{2
+ // use concat fallback
+ }
+ else if constexpr (__f64_to_u32)
+ { //{{{2
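+      // cvttpd_epi32 only covers the signed 32-bit range: bias the input down
+      // by 2^31 before converting and XOR the sign bit back in afterwards. The
+      // floor makes the biased value integral, so truncation cannot round the
+      // wrong way for inputs that become negative after the bias.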
+ if constexpr (__x_to_x && __have_sse4_1)
+ {
+ return __vector_bitcast<_Up, _M>(_mm_unpacklo_epi64(
+ _mm_cvttpd_epi32(_mm_floor_pd(__i0) - 0x8000'0000u),
+ _mm_cvttpd_epi32(_mm_floor_pd(__i1) - 0x8000'0000u)))
+ ^ 0x8000'0000u;
+	  // without SSE4.1, just use the scalar fallback; it's only four
+	  // values
+ }
+ else if constexpr (__y_to_y)
+ {
+ return __vector_bitcast<_Up>(
+ __concat(_mm256_cvttpd_epi32(_mm256_floor_pd(__i0)
+ - 0x8000'0000u),
+ _mm256_cvttpd_epi32(_mm256_floor_pd(__i1)
+ - 0x8000'0000u)))
+ ^ 0x8000'0000u;
+ } // __z_to_z uses fallback
+ }
+ else if constexpr (__f64_to_ibw)
+ { //{{{2
+      // one-arg __f64_to_ibw goes via _SimdWrapper<int, ?>. The fallback
+      // would go via two independent conversions to _SimdWrapper<_To> and
+      // subsequent interleaving. This is better, because f64->__i32 allows
+      // combining __v0 and __v1 into one register:
+ // if constexpr (__z_to_x || __y_to_x) {
+ return __convert_x86<_To>(
+ __convert_x86<__vector_type_t<int, _Np * 2>>(__v0, __v1));
+ //}
+ }
+ else if constexpr (__f32_to_ibw)
+ { //{{{2
+ return __convert_x86<_To>(
+ __convert_x86<__vector_type_t<int, _Np>>(__v0),
+ __convert_x86<__vector_type_t<int, _Np>>(__v1));
+ //}}}
+ }
+
+ // fallback: {{{2
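+  // none of the specialized branches returned: split the destination for
+  // >= 32-byte results, interleave two independent conversions for 16-byte
+  // results, or convert element-wise via __vector_convert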
+ if constexpr (sizeof(_To) >= 32)
+ // if _To is ymm or zmm, then _SimdWrapper<_Up, _M / 2> is xmm or ymm
+ return __concat(__convert_x86<__vector_type_t<_Up, _M / 2>>(__v0),
+ __convert_x86<__vector_type_t<_Up, _M / 2>>(__v1));
+ else if constexpr (sizeof(_To) == 16)
+ {
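+      // each per-input conversion fills only the low sizeof(_Up) * _Np bytes
+      // of a 16-byte vector; combine the two results with an unpacklo of the
+      // matching element width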
+ const auto __lo = __to_intrin(__convert_x86<_To>(__v0));
+ const auto __hi = __to_intrin(__convert_x86<_To>(__v1));
+ if constexpr (sizeof(_Up) * _Np == 8)
+ {
+ if constexpr (is_floating_point_v<_Up>)
+ return __auto_bitcast(
+ _mm_unpacklo_pd(__vector_bitcast<double>(__lo),
+ __vector_bitcast<double>(__hi)));
+ else
+ return __intrin_bitcast<_To>(_mm_unpacklo_epi64(__lo, __hi));
+ }
+ else if constexpr (sizeof(_Up) * _Np == 4)
+ {
+ if constexpr (is_floating_point_v<_Up>)
+ return __auto_bitcast(
+ _mm_unpacklo_ps(__vector_bitcast<float>(__lo),
+ __vector_bitcast<float>(__hi)));
+ else
+ return __intrin_bitcast<_To>(_mm_unpacklo_epi32(__lo, __hi));
+ }
+ else if constexpr (sizeof(_Up) * _Np == 2)
+ return __intrin_bitcast<_To>(_mm_unpacklo_epi16(__lo, __hi));
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else
+ return __vector_convert<_To>(__v0, __v1, make_index_sequence<_Np>());
+ //}}}
+ }
+} //}}}1
+// 4-arg __convert_x86 {{{1
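+// Converts four equally-sized input vectors into one destination vector type
+// with at least 4 * _Np elements, placing the converted elements of
+// __v0 through __v3 in order.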
+template <typename _To, typename _V, typename _Traits>
+_GLIBCXX_SIMD_INTRINSIC _To
+__convert_x86(_V __v0, _V __v1, _V __v2, _V __v3)
+{
+ static_assert(__is_vector_type_v<_V>);
+ using _Tp = typename _Traits::value_type;
+ constexpr size_t _Np = _Traits::_S_width;
+ [[maybe_unused]] const auto __i0 = __to_intrin(__v0);
+ [[maybe_unused]] const auto __i1 = __to_intrin(__v1);
+ [[maybe_unused]] const auto __i2 = __to_intrin(__v2);
+ [[maybe_unused]] const auto __i3 = __to_intrin(__v3);
+ using _Up = typename _VectorTraits<_To>::value_type;
+ constexpr size_t _M = _VectorTraits<_To>::_S_width;
+
+ static_assert(4 * _Np <= _M,
+ "__v2/__v3 would be discarded; use the two/one-argument "
+ "__convert_x86 overload instead");
+
+ // [xyz]_to_[xyz] {{{2
+ [[maybe_unused]] constexpr bool __x_to_x
+ = sizeof(__v0) <= 16 && sizeof(_To) <= 16;
+ [[maybe_unused]] constexpr bool __x_to_y
+ = sizeof(__v0) <= 16 && sizeof(_To) == 32;
+ [[maybe_unused]] constexpr bool __x_to_z
+ = sizeof(__v0) <= 16 && sizeof(_To) == 64;
+ [[maybe_unused]] constexpr bool __y_to_x
+ = sizeof(__v0) == 32 && sizeof(_To) <= 16;
+ [[maybe_unused]] constexpr bool __y_to_y
+ = sizeof(__v0) == 32 && sizeof(_To) == 32;
+ [[maybe_unused]] constexpr bool __y_to_z
+ = sizeof(__v0) == 32 && sizeof(_To) == 64;
+ [[maybe_unused]] constexpr bool __z_to_x
+ = sizeof(__v0) == 64 && sizeof(_To) <= 16;
+ [[maybe_unused]] constexpr bool __z_to_y
+ = sizeof(__v0) == 64 && sizeof(_To) == 32;
+ [[maybe_unused]] constexpr bool __z_to_z
+ = sizeof(__v0) == 64 && sizeof(_To) == 64;
+
+ // iX_to_iX {{{2
+ [[maybe_unused]] constexpr bool __i_to_i
+ = std::is_integral_v<_Up> && std::is_integral_v<_Tp>;
+ [[maybe_unused]] constexpr bool __i8_to_i16
+ = __i_to_i && sizeof(_Tp) == 1 && sizeof(_Up) == 2;
+ [[maybe_unused]] constexpr bool __i8_to_i32
+ = __i_to_i && sizeof(_Tp) == 1 && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __i8_to_i64
+ = __i_to_i && sizeof(_Tp) == 1 && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __i16_to_i8
+ = __i_to_i && sizeof(_Tp) == 2 && sizeof(_Up) == 1;
+ [[maybe_unused]] constexpr bool __i16_to_i32
+ = __i_to_i && sizeof(_Tp) == 2 && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __i16_to_i64
+ = __i_to_i && sizeof(_Tp) == 2 && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __i32_to_i8
+ = __i_to_i && sizeof(_Tp) == 4 && sizeof(_Up) == 1;
+ [[maybe_unused]] constexpr bool __i32_to_i16
+ = __i_to_i && sizeof(_Tp) == 4 && sizeof(_Up) == 2;
+ [[maybe_unused]] constexpr bool __i32_to_i64
+ = __i_to_i && sizeof(_Tp) == 4 && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __i64_to_i8
+ = __i_to_i && sizeof(_Tp) == 8 && sizeof(_Up) == 1;
+ [[maybe_unused]] constexpr bool __i64_to_i16
+ = __i_to_i && sizeof(_Tp) == 8 && sizeof(_Up) == 2;
+ [[maybe_unused]] constexpr bool __i64_to_i32
+ = __i_to_i && sizeof(_Tp) == 8 && sizeof(_Up) == 4;
+
+ // [fsu]X_to_[fsu]X {{{2
+ // ibw = integral && byte or word, i.e. char and short with any signedness
+ [[maybe_unused]] constexpr bool __i64_to_f32
+ = is_integral_v<_Tp> && sizeof(_Tp) == 8
+ && is_floating_point_v<_Up> && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __s32_to_f32
+ = is_integral_v<_Tp> && is_signed_v<_Tp> && sizeof(_Tp) == 4
+ && is_floating_point_v<_Up> && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __s16_to_f32
+ = is_integral_v<_Tp> && is_signed_v<_Tp> && sizeof(_Tp) == 2
+ && is_floating_point_v<_Up> && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __s8_to_f32
+ = is_integral_v<_Tp> && is_signed_v<_Tp> && sizeof(_Tp) == 1
+ && is_floating_point_v<_Up> && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __u32_to_f32
+ = is_integral_v<_Tp> && is_unsigned_v<_Tp> && sizeof(_Tp) == 4
+ && is_floating_point_v<_Up> && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __u16_to_f32
+ = is_integral_v<_Tp> && is_unsigned_v<_Tp> && sizeof(_Tp) == 2
+ && is_floating_point_v<_Up> && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __u8_to_f32
+ = is_integral_v<_Tp> && is_unsigned_v<_Tp> && sizeof(_Tp) == 1
+ && is_floating_point_v<_Up> && sizeof(_Up) == 4;
+ [[maybe_unused]] constexpr bool __s64_to_f64
+ = is_integral_v<_Tp> && is_signed_v<_Tp> && sizeof(_Tp) == 8
+ && is_floating_point_v<_Up> && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __s32_to_f64
+ = is_integral_v<_Tp> && is_signed_v<_Tp> && sizeof(_Tp) == 4
+ && is_floating_point_v<_Up> && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __s16_to_f64
+ = is_integral_v<_Tp> && is_signed_v<_Tp> && sizeof(_Tp) == 2
+ && is_floating_point_v<_Up> && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __s8_to_f64
+ = is_integral_v<_Tp> && is_signed_v<_Tp> && sizeof(_Tp) == 1
+ && is_floating_point_v<_Up> && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __u64_to_f64
+ = is_integral_v<_Tp> && is_unsigned_v<_Tp> && sizeof(_Tp) == 8
+ && is_floating_point_v<_Up> && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __u32_to_f64
+ = is_integral_v<_Tp> && is_unsigned_v<_Tp> && sizeof(_Tp) == 4
+ && is_floating_point_v<_Up> && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __u16_to_f64
+ = is_integral_v<_Tp> && is_unsigned_v<_Tp> && sizeof(_Tp) == 2
+ && is_floating_point_v<_Up> && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __u8_to_f64
+ = is_integral_v<_Tp> && is_unsigned_v<_Tp> && sizeof(_Tp) == 1
+ && is_floating_point_v<_Up> && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __f32_to_s64
+ = is_integral_v<_Up> && is_signed_v<_Up> && sizeof(_Up) == 8
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 4;
+ [[maybe_unused]] constexpr bool __f32_to_s32
+ = is_integral_v<_Up> && is_signed_v<_Up> && sizeof(_Up) == 4
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 4;
+ [[maybe_unused]] constexpr bool __f32_to_u64
+ = is_integral_v<_Up> && is_unsigned_v<_Up> && sizeof(_Up) == 8
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 4;
+ [[maybe_unused]] constexpr bool __f32_to_u32
+ = is_integral_v<_Up> && is_unsigned_v<_Up> && sizeof(_Up) == 4
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 4;
+ [[maybe_unused]] constexpr bool __f64_to_s64
+ = is_integral_v<_Up> && is_signed_v<_Up> && sizeof(_Up) == 8
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 8;
+ [[maybe_unused]] constexpr bool __f64_to_s32
+ = is_integral_v<_Up> && is_signed_v<_Up> && sizeof(_Up) == 4
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 8;
+ [[maybe_unused]] constexpr bool __f64_to_u64
+ = is_integral_v<_Up> && is_unsigned_v<_Up> && sizeof(_Up) == 8
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 8;
+ [[maybe_unused]] constexpr bool __f64_to_u32
+ = is_integral_v<_Up> && is_unsigned_v<_Up> && sizeof(_Up) == 4
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 8;
+ [[maybe_unused]] constexpr bool __f32_to_ibw
+ = is_integral_v<_Up> && sizeof(_Up) <= 2
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 4;
+ [[maybe_unused]] constexpr bool __f64_to_ibw
+ = is_integral_v<_Up> && sizeof(_Up) <= 2
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 8;
+ [[maybe_unused]] constexpr bool __f32_to_f64
+ = is_floating_point_v<_Tp> && sizeof(_Tp) == 4
+ && is_floating_point_v<_Up> && sizeof(_Up) == 8;
+ [[maybe_unused]] constexpr bool __f64_to_f32
+ = is_floating_point_v<_Tp> && sizeof(_Tp) == 8
+ && is_floating_point_v<_Up> && sizeof(_Up) == 4;
+
+ if constexpr (__i_to_i && __y_to_x && !__have_avx2)
+ { //{{{2
+      // e.g. <long long, 4>, <long long, 4>, <long long, 4>, <long long, 4>
+      // => <char, 16>
+ return __convert_x86<_To>(__lo128(__v0), __hi128(__v0), __lo128(__v1),
+ __hi128(__v1), __lo128(__v2), __hi128(__v2),
+ __lo128(__v3), __hi128(__v3));
+ }
+ else if constexpr (__i_to_i)
+ { // assert ISA {{{2
+ static_assert(__x_to_x || __have_avx2,
+ "integral conversions with ymm registers require AVX2");
+ static_assert(__have_avx512bw
+ || ((sizeof(_Tp) >= 4 || sizeof(__v0) < 64)
+ && (sizeof(_Up) >= 4 || sizeof(_To) < 64)),
+ "8/16-bit integers in zmm registers require AVX512BW");
+ static_assert((sizeof(__v0) < 64 && sizeof(_To) < 64) || __have_avx512f,
+		    "integral conversions with zmm registers require AVX512F");
+ }
+ // concat => use 2-arg __convert_x86 {{{2
+ if constexpr ((sizeof(__v0) == 16 && __have_avx2)
+ || (sizeof(__v0) == 16 && __have_avx
+ && std::is_floating_point_v<_Tp>)
+ || (sizeof(__v0) == 32 && __have_avx512f))
+ {
+ // The ISA can handle wider input registers, so concat and use two-arg
+ // implementation. This reduces code duplication considerably.
+ return __convert_x86<_To>(__concat(__v0, __v1), __concat(__v2, __v3));
+ }
+ else
+ { //{{{2
+      // conversions using bit reinterpretation (or no conversion at all)
+      // should all go through the concat branch above:
+ static_assert(!(
+ std::is_floating_point_v<
+ _Tp> == std::is_floating_point_v<_Up> && sizeof(_Tp) == sizeof(_Up)));
+ if constexpr (4 * _Np < _M && sizeof(_To) > 16)
+ { // handle all zero extension{{{2
+ constexpr size_t Min = 16 / sizeof(_Up);
+ return __zero_extend(
+ __convert_x86<
+ __vector_type_t<_Up, (Min > 4 * _Np) ? Min : 4 * _Np>>(__v0, __v1,
+ __v2,
+ __v3));
+ }
+ else if constexpr (__i64_to_i16)
+ { //{{{2
+ if constexpr (__x_to_x && __have_sse4_1)
+ {
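+	      // stagger the inputs with byte shifts, blend one 16-bit lane
+	      // from each into a single register, then pshufb the words into
+	      // element order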
+ return __intrin_bitcast<_To>(_mm_shuffle_epi8(
+ _mm_blend_epi16(_mm_blend_epi16(__i0, _mm_slli_si128(__i1, 2),
+ 0x22),
+ _mm_blend_epi16(_mm_slli_si128(__i2, 4),
+ _mm_slli_si128(__i3, 6), 0x88),
+ 0xcc),
+ _mm_setr_epi8(0, 1, 8, 9, 2, 3, 10, 11, 4, 5, 12, 13, 6, 7, 14,
+ 15)));
+ }
+ else if constexpr (__y_to_y && __have_avx2)
+ {
+ return __intrin_bitcast<_To>(_mm256_shuffle_epi8(
+ __xzyw(_mm256_blend_epi16(
+ __auto_bitcast(
+ _mm256_shuffle_ps(__vector_bitcast<float>(__v0),
+ __vector_bitcast<float>(__v2),
+ 0x88)), // 0.1. 8.9. 2.3. A.B.
+ __to_intrin(__vector_bitcast<int>(_mm256_shuffle_ps(
+ __vector_bitcast<float>(__v1),
+ __vector_bitcast<float>(__v3), 0x88))
+ << 16), // .4.5 .C.D .6.7 .E.F
+ 0xaa) // 0415 8C9D 2637 AEBF
+ ), // 0415 2637 8C9D AEBF
+ _mm256_setr_epi8(0, 1, 4, 5, 8, 9, 12, 13, 2, 3, 6, 7, 10, 11,
+ 14, 15, 0, 1, 4, 5, 8, 9, 12, 13, 2, 3, 6, 7,
+ 10, 11, 14, 15)));
+ /*
+ auto __a = _mm256_unpacklo_epi16(__v0, __v1); // 04.. .... 26..
+ .... auto __b = _mm256_unpackhi_epi16(__v0, __v1); // 15..
+ .... 37.. .... auto __c = _mm256_unpacklo_epi16(__v2, __v3); //
+ 8C.. .... AE.. .... auto __d = _mm256_unpackhi_epi16(__v2, __v3);
+ // 9D.. .... BF.. .... auto __e = _mm256_unpacklo_epi16(__a, __b);
+ // 0145 .... 2367 .... auto __f = _mm256_unpacklo_epi16(__c, __d);
+ // 89CD .... ABEF .... auto __g = _mm256_unpacklo_epi64(__e, __f);
+ // 0145 89CD 2367 ABEF return __concat(
+ _mm_unpacklo_epi32(__lo128(__g), __hi128(__g)),
+ _mm_unpackhi_epi32(__lo128(__g), __hi128(__g))); // 0123 4567
+ 89AB CDEF
+ */
+ } // else use fallback
+ }
+ else if constexpr (__i64_to_i8)
+ { //{{{2
+ if constexpr (__x_to_x)
+ {
+ // TODO: use fallback for now
+ }
+ else if constexpr (__y_to_x)
+ {
+ auto __a = _mm256_srli_epi32(_mm256_slli_epi32(__i0, 24), 24)
+ | _mm256_srli_epi32(_mm256_slli_epi32(__i1, 24), 16)
+ | _mm256_srli_epi32(_mm256_slli_epi32(__i2, 24), 8)
+ | _mm256_slli_epi32(
+ __i3, 24); // 048C .... 159D .... 26AE .... 37BF ....
+ /*return _mm_shuffle_epi8(
+ _mm_blend_epi32(__lo128(__a) << 32, __hi128(__a), 0x5),
+ _mm_setr_epi8(4, 12, 0, 8, 5, 13, 1, 9, 6, 14, 2, 10, 7, 15,
+ 3, 11));*/
+ auto __b = _mm256_unpackhi_epi64(
+ __a, __a); // 159D .... 159D .... 37BF .... 37BF ....
+ auto __c = _mm256_unpacklo_epi8(
+ __a, __b); // 0145 89CD .... .... 2367 ABEF .... ....
+ return __intrin_bitcast<_To>(
+ _mm_unpacklo_epi16(__lo128(__c),
+ __hi128(__c))); // 0123 4567 89AB CDEF
+ }
+ }
+ else if constexpr (__i32_to_i8)
+ { //{{{2
+ if constexpr (__x_to_x)
+ {
+ if constexpr (__have_ssse3)
+ {
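+		  // pack one byte from each input into a distinct byte of
+		  // every 32-bit lane, then a single pshufb sorts the 16 bytes
+		  // into element order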
+ const auto __x0 = __vector_bitcast<_UInt>(__v0) & 0xff;
+ const auto __x1 = (__vector_bitcast<_UInt>(__v1) & 0xff) << 8;
+ const auto __x2 = (__vector_bitcast<_UInt>(__v2) & 0xff)
+ << 16;
+ const auto __x3 = __vector_bitcast<_UInt>(__v3) << 24;
+ return __intrin_bitcast<_To>(
+ _mm_shuffle_epi8(__to_intrin(__x0 | __x1 | __x2 | __x3),
+ _mm_setr_epi8(0, 4, 8, 12, 1, 5, 9, 13, 2,
+ 6, 10, 14, 3, 7, 11, 15)));
+ }
+ else
+ {
+ auto __a
+ = _mm_unpacklo_epi8(__i0, __i2); // 08.. .... 19.. ....
+ auto __b
+ = _mm_unpackhi_epi8(__i0, __i2); // 2A.. .... 3B.. ....
+ auto __c
+ = _mm_unpacklo_epi8(__i1, __i3); // 4C.. .... 5D.. ....
+ auto __d
+ = _mm_unpackhi_epi8(__i1, __i3); // 6E.. .... 7F.. ....
+ auto __e = _mm_unpacklo_epi8(__a, __c); // 048C .... .... ....
+ auto __f = _mm_unpackhi_epi8(__a, __c); // 159D .... .... ....
+ auto __g = _mm_unpacklo_epi8(__b, __d); // 26AE .... .... ....
+ auto __h = _mm_unpackhi_epi8(__b, __d); // 37BF .... .... ....
+ return __intrin_bitcast<_To>(_mm_unpacklo_epi8(
+ _mm_unpacklo_epi8(__e, __g), // 0246 8ACE .... ....
+ _mm_unpacklo_epi8(__f, __h) // 1357 9BDF .... ....
+ )); // 0123 4567 89AB CDEF
+ }
+ }
+ else if constexpr (__y_to_y)
+ {
+ const auto __a = _mm256_shuffle_epi8(
+ __to_intrin((__vector_bitcast<_UShort>(_mm256_blend_epi16(
+ __i0, _mm256_slli_epi32(__i1, 16), 0xAA))
+ & 0xff)
+ | (__vector_bitcast<_UShort>(_mm256_blend_epi16(
+ __i2, _mm256_slli_epi32(__i3, 16), 0xAA))
+ << 8)),
+ _mm256_setr_epi8(0, 4, 8, 12, 2, 6, 10, 14, 1, 5, 9, 13, 3, 7,
+ 11, 15, 0, 4, 8, 12, 2, 6, 10, 14, 1, 5, 9, 13,
+ 3, 7, 11, 15));
+ return __intrin_bitcast<_To>(_mm256_permutevar8x32_epi32(
+ __a, _mm256_setr_epi32(0, 4, 1, 5, 2, 6, 3, 7)));
+ }
+ }
+ else if constexpr (__i64_to_f32)
+ { //{{{2
+ // this branch is only relevant with AVX and w/o AVX2 (i.e. no ymm
+ // integers)
+ if constexpr (__x_to_y)
+ {
+ return __make_wrapper<float>(__v0[0], __v0[1], __v1[0], __v1[1],
+ __v2[0], __v2[1], __v3[0], __v3[1]);
+
+ const auto __a = _mm_unpacklo_epi32(__i0, __i1); // acAC
+ const auto __b = _mm_unpackhi_epi32(__i0, __i1); // bdBD
+ const auto __c = _mm_unpacklo_epi32(__i2, __i3); // egEG
+ const auto __d = _mm_unpackhi_epi32(__i2, __i3); // fhFH
+ const auto __lo32a = _mm_unpacklo_epi32(__a, __b); // abcd
+ const auto __lo32b = _mm_unpacklo_epi32(__c, __d); // efgh
+ const auto __hi32
+ = __vector_bitcast<conditional_t<is_signed_v<_Tp>, int, _UInt>>(
+ __concat(_mm_unpackhi_epi32(__a, __b),
+ _mm_unpackhi_epi32(__c, __d))); // ABCD EFGH
+ const auto __hi
+ = 0x100000000LL
+ * __convert_x86<__vector_type_t<float, 8>>(__hi32);
+ const auto __mid
+ = 0x10000
+ * _mm256_cvtepi32_ps(__concat(_mm_srli_epi32(__lo32a, 16),
+ _mm_srli_epi32(__lo32b, 16)));
+ const auto __lo = _mm256_cvtepi32_ps(
+ __concat(_mm_set1_epi32(0x0000ffffu) & __lo32a,
+ _mm_set1_epi32(0x0000ffffu) & __lo32b));
+ return (__hi + __mid) + __lo;
+ }
+ }
+ else if constexpr (__f64_to_ibw)
+ { //{{{2
+ return __convert_x86<_To>(
+ __convert_x86<__vector_type_t<int, _Np * 2>>(__v0, __v1),
+ __convert_x86<__vector_type_t<int, _Np * 2>>(__v2, __v3));
+ }
+ else if constexpr (__f32_to_ibw)
+ { //{{{2
+ return __convert_x86<_To>(
+ __convert_x86<__vector_type_t<int, _Np>>(__v0),
+ __convert_x86<__vector_type_t<int, _Np>>(__v1),
+ __convert_x86<__vector_type_t<int, _Np>>(__v2),
+ __convert_x86<__vector_type_t<int, _Np>>(__v3));
+ } //}}}
+
+ // fallback: {{{2
+ if constexpr (sizeof(_To) >= 32)
+ // if _To is ymm or zmm, then _SimdWrapper<_Up, _M / 2> is xmm or ymm
+ return __concat(__convert_x86<__vector_type_t<_Up, _M / 2>>(__v0, __v1),
+ __convert_x86<__vector_type_t<_Up, _M / 2>>(__v2,
+ __v3));
+ else if constexpr (sizeof(_To) == 16)
+ {
+ const auto __lo = __to_intrin(__convert_x86<_To>(__v0, __v1));
+ const auto __hi = __to_intrin(__convert_x86<_To>(__v2, __v3));
+ if constexpr (sizeof(_Up) * _Np * 2 == 8)
+ {
+ if constexpr (is_floating_point_v<_Up>)
+ return __auto_bitcast(_mm_unpacklo_pd(__lo, __hi));
+ else
+ return __intrin_bitcast<_To>(_mm_unpacklo_epi64(__lo, __hi));
+ }
+ else if constexpr (sizeof(_Up) * _Np * 2 == 4)
+ {
+ if constexpr (is_floating_point_v<_Up>)
+ return __auto_bitcast(_mm_unpacklo_ps(__lo, __hi));
+ else
+ return __intrin_bitcast<_To>(_mm_unpacklo_epi32(__lo, __hi));
+ }
+ else
+ __assert_unreachable<_Tp>();
+ }
+ else
+ return __vector_convert<_To>(__v0, __v1, __v2, __v3,
+ make_index_sequence<_Np>());
+ //}}}2
+ }
+} //}}}
+// 8-arg __convert_x86 {{{1
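+// Converts eight equally-sized input vectors into one destination vector type
+// with at least 8 * _Np elements. In practice only conversions that shrink
+// the element size by a factor of eight (64-bit integer or double to 8-bit
+// integer) end up here, which is what the branches below handle.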
+template <typename _To, typename _V, typename _Traits>
+_GLIBCXX_SIMD_INTRINSIC _To
+__convert_x86(_V __v0, _V __v1, _V __v2, _V __v3, _V __v4, _V __v5, _V __v6,
+ _V __v7)
+{
+ static_assert(__is_vector_type_v<_V>);
+ using _Tp = typename _Traits::value_type;
+ constexpr size_t _Np = _Traits::_S_width;
+ [[maybe_unused]] const auto __i0 = __to_intrin(__v0);
+ [[maybe_unused]] const auto __i1 = __to_intrin(__v1);
+ [[maybe_unused]] const auto __i2 = __to_intrin(__v2);
+ [[maybe_unused]] const auto __i3 = __to_intrin(__v3);
+ [[maybe_unused]] const auto __i4 = __to_intrin(__v4);
+ [[maybe_unused]] const auto __i5 = __to_intrin(__v5);
+ [[maybe_unused]] const auto __i6 = __to_intrin(__v6);
+ [[maybe_unused]] const auto __i7 = __to_intrin(__v7);
+ using _Up = typename _VectorTraits<_To>::value_type;
+ constexpr size_t _M = _VectorTraits<_To>::_S_width;
+
+ static_assert(8 * _Np <= _M,
+ "__v4-__v7 would be discarded; use the four/two/one-argument "
+ "__convert_x86 overload instead");
+
+ // [xyz]_to_[xyz] {{{2
+ [[maybe_unused]] constexpr bool __x_to_x
+ = sizeof(__v0) <= 16 && sizeof(_To) <= 16;
+ [[maybe_unused]] constexpr bool __x_to_y
+ = sizeof(__v0) <= 16 && sizeof(_To) == 32;
+ [[maybe_unused]] constexpr bool __x_to_z
+ = sizeof(__v0) <= 16 && sizeof(_To) == 64;
+ [[maybe_unused]] constexpr bool __y_to_x
+ = sizeof(__v0) == 32 && sizeof(_To) <= 16;
+ [[maybe_unused]] constexpr bool __y_to_y
+ = sizeof(__v0) == 32 && sizeof(_To) == 32;
+ [[maybe_unused]] constexpr bool __y_to_z
+ = sizeof(__v0) == 32 && sizeof(_To) == 64;
+ [[maybe_unused]] constexpr bool __z_to_x
+ = sizeof(__v0) == 64 && sizeof(_To) <= 16;
+ [[maybe_unused]] constexpr bool __z_to_y
+ = sizeof(__v0) == 64 && sizeof(_To) == 32;
+ [[maybe_unused]] constexpr bool __z_to_z
+ = sizeof(__v0) == 64 && sizeof(_To) == 64;
+
+ // [if]X_to_i8 {{{2
+ [[maybe_unused]] constexpr bool __i_to_i
+ = std::is_integral_v<_Up> && std::is_integral_v<_Tp>;
+ [[maybe_unused]] constexpr bool __i64_to_i8
+ = __i_to_i && sizeof(_Tp) == 8 && sizeof(_Up) == 1;
+ [[maybe_unused]] constexpr bool __f64_to_i8
+ = is_integral_v<_Up> && sizeof(_Up) == 1
+ && is_floating_point_v<_Tp> && sizeof(_Tp) == 8;
+
+ if constexpr (__i_to_i) // assert ISA {{{2
+ {
+ static_assert(__x_to_x || __have_avx2,
+ "integral conversions with ymm registers require AVX2");
+ static_assert(__have_avx512bw
+ || ((sizeof(_Tp) >= 4 || sizeof(__v0) < 64)
+ && (sizeof(_Up) >= 4 || sizeof(_To) < 64)),
+ "8/16-bit integers in zmm registers require AVX512BW");
+ static_assert((sizeof(__v0) < 64 && sizeof(_To) < 64) || __have_avx512f,
+		    "integral conversions with zmm registers require AVX512F");
+ }
+ // concat => use 4-arg __convert_x86 {{{2
+ if constexpr ((sizeof(__v0) == 16 && __have_avx2)
+ || (sizeof(__v0) == 16 && __have_avx
+ && std::is_floating_point_v<_Tp>)
+ || (sizeof(__v0) == 32 && __have_avx512f))
+ {
+      // The ISA can handle wider input registers, so concat and use the
+      // four-arg implementation. This reduces code duplication considerably.
+ return __convert_x86<_To>(__concat(__v0, __v1), __concat(__v2, __v3),
+ __concat(__v4, __v5), __concat(__v6, __v7));
+ }
+ else //{{{2
+ {
+      // conversions using bit reinterpretation (or no conversion at all)
+      // should all go through the concat branch above:
+ static_assert(!(
+ std::is_floating_point_v<
+ _Tp> == std::is_floating_point_v<_Up> && sizeof(_Tp) == sizeof(_Up)));
+ static_assert(!(8 * _Np < _M && sizeof(_To) > 16),
+ "zero extension should be impossible");
+ if constexpr (__i64_to_i8) //{{{2
+ {
+ if constexpr (__x_to_x && __have_ssse3)
+ {
+ // unsure whether this is better than the variant below
+ return __intrin_bitcast<_To>(_mm_shuffle_epi8(
+ __to_intrin((((__v0 & 0xff) | ((__v1 & 0xff) << 8))
+ | (((__v2 & 0xff) << 16) | ((__v3 & 0xff) << 24)))
+ | ((((__v4 & 0xff) << 32) | ((__v5 & 0xff) << 40))
+ | (((__v6 & 0xff) << 48) | (__v7 << 56)))),
+ _mm_setr_epi8(0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7,
+ 15)));
+ }
+ else if constexpr (__x_to_x)
+ {
+ const auto __a = _mm_unpacklo_epi8(__i0, __i1); // ac
+ const auto __b = _mm_unpackhi_epi8(__i0, __i1); // bd
+ const auto __c = _mm_unpacklo_epi8(__i2, __i3); // eg
+ const auto __d = _mm_unpackhi_epi8(__i2, __i3); // fh
+ const auto __e = _mm_unpacklo_epi8(__i4, __i5); // ik
+ const auto __f = _mm_unpackhi_epi8(__i4, __i5); // jl
+ const auto __g = _mm_unpacklo_epi8(__i6, __i7); // mo
+ const auto __h = _mm_unpackhi_epi8(__i6, __i7); // np
+ return __intrin_bitcast<_To>(_mm_unpacklo_epi64(
+ _mm_unpacklo_epi32(_mm_unpacklo_epi8(__a, __b), // abcd
+ _mm_unpacklo_epi8(__c, __d)), // efgh
+ _mm_unpacklo_epi32(_mm_unpacklo_epi8(__e, __f), // ijkl
+ _mm_unpacklo_epi8(__g, __h)) // mnop
+ ));
+ }
+ else if constexpr (__y_to_y)
+ {
+ auto __a = // 048C GKOS 159D HLPT 26AE IMQU 37BF JNRV
+ __to_intrin((((__v0 & 0xff) | ((__v1 & 0xff) << 8))
+ | (((__v2 & 0xff) << 16) | ((__v3 & 0xff) << 24)))
+ | ((((__v4 & 0xff) << 32) | ((__v5 & 0xff) << 40))
+ | (((__v6 & 0xff) << 48) | ((__v7 << 56)))));
+ /*
+ auto __b = _mm256_unpackhi_epi64(__a, __a); // 159D HLPT 159D
+ HLPT 37BF JNRV 37BF JNRV auto __c = _mm256_unpacklo_epi8(__a,
+ __b); // 0145 89CD GHKL OPST 2367 ABEF IJMN QRUV auto __d =
+ __xzyw(__c); // 0145 89CD 2367 ABEF GHKL OPST IJMN QRUV return
+ _mm256_shuffle_epi8(
+ __d, _mm256_setr_epi8(0, 1, 8, 9, 2, 3, 10, 11, 4, 5, 12, 13,
+ 6, 7, 14, 15, 0, 1, 8, 9, 2, 3, 10, 11, 4, 5, 12, 13, 6, 7, 14,
+ 15));
+ */
+ auto __b = _mm256_shuffle_epi8( // 0145 89CD GHKL OPST 2367 ABEF
+ // IJMN QRUV
+ __a, _mm256_setr_epi8(0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6,
+ 14, 7, 15, 0, 8, 1, 9, 2, 10, 3, 11, 4,
+ 12, 5, 13, 6, 14, 7, 15));
+ auto __c = __xzyw(__b); // 0145 89CD 2367 ABEF GHKL OPST IJMN QRUV
+ return __intrin_bitcast<_To>(_mm256_shuffle_epi8(
+ __c, _mm256_setr_epi8(0, 1, 8, 9, 2, 3, 10, 11, 4, 5, 12, 13, 6,
+ 7, 14, 15, 0, 1, 8, 9, 2, 3, 10, 11, 4, 5,
+ 12, 13, 6, 7, 14, 15)));
+ }
+ else if constexpr (__z_to_z)
+ {
+ return __concat(
+ __convert_x86<__vector_type_t<_Up, _M / 2>>(__v0, __v1, __v2,
+ __v3),
+ __convert_x86<__vector_type_t<_Up, _M / 2>>(__v4, __v5, __v6,
+ __v7));
+ }
+ }
+ else if constexpr (__f64_to_i8) //{{{2
+ {
+ return __convert_x86<_To>(
+ __convert_x86<__vector_type_t<int, _Np * 2>>(__v0, __v1),
+ __convert_x86<__vector_type_t<int, _Np * 2>>(__v2, __v3),
+ __convert_x86<__vector_type_t<int, _Np * 2>>(__v4, __v5),
+ __convert_x86<__vector_type_t<int, _Np * 2>>(__v6, __v7));
+ }
+ else // unreachable {{{2
+ __assert_unreachable<_Tp>();
+ //}}}
+
+ // fallback: {{{2
+ if constexpr (sizeof(_To) >= 32)
+ // if _To is ymm or zmm, then _SimdWrapper<_Up, _M / 2> is xmm or ymm
+ return __concat(
+ __convert_x86<__vector_type_t<_Up, _M / 2>>(__v0, __v1, __v2, __v3),
+ __convert_x86<__vector_type_t<_Up, _M / 2>>(__v4, __v5, __v6, __v7));
+ else if constexpr (sizeof(_To) == 16)
+ {
+ const auto __lo
+ = __to_intrin(__convert_x86<_To>(__v0, __v1, __v2, __v3));
+ const auto __hi
+ = __to_intrin(__convert_x86<_To>(__v4, __v5, __v6, __v7));
+ static_assert(sizeof(_Up) == 1 && _Np == 2);
+ return __intrin_bitcast<_To>(_mm_unpacklo_epi64(__lo, __hi));
+ }
+ else
+ {
+ __assert_unreachable<_Tp>();
+ // return __vector_convert<_To>(__v0, __v1, __v2, __v3, __v4, __v5,
+ // __v6, __v7,
+ // make_index_sequence<_Np>());
+ } //}}}2
+ }
+} //}}}
+// 16-arg __convert_x86 {{{1
+template <typename _To, typename _V, typename _Traits>
+_GLIBCXX_SIMD_INTRINSIC _To
+__convert_x86(_V __v0, _V __v1, _V __v2, _V __v3, _V __v4, _V __v5, _V __v6,
+ _V __v7, _V __v8, _V __v9, _V __v10, _V __v11, _V __v12, _V __v13,
+ _V __v14, _V __v15)
+{
+ // concat => use 8-arg __convert_x86 {{{2
+ return __convert_x86<_To>(__concat(__v0, __v1), __concat(__v2, __v3),
+ __concat(__v4, __v5), __concat(__v6, __v7),
+ __concat(__v8, __v9), __concat(__v10, __v11),
+ __concat(__v12, __v13), __concat(__v14, __v15));
+} //}}}
+
+#endif // __cplusplus >= 201703L
+#endif // _GLIBCXX_EXPERIMENTAL_SIMD_X86_CONVERSIONS_H
+
+// vim: foldmethod=marker
diff --git a/libstdc++-v3/include/experimental/simd b/libstdc++-v3/include/experimental/simd
new file mode 100644
index 00000000000..cb875bd0e40
--- /dev/null
+++ b/libstdc++-v3/include/experimental/simd
@@ -0,0 +1,66 @@
+// Components for element-wise operations on data-parallel objects -*- C++ -*-
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library. This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+// <http://www.gnu.org/licenses/>.
+
+/** @file experimental/simd
+ * This is a TS C++ Library header.
+ */
+
+//
+// N4773 §9 data-parallel types library
+//
+
+#ifndef _GLIBCXX_EXPERIMENTAL_SIMD
+#define _GLIBCXX_EXPERIMENTAL_SIMD
+
+#define __cpp_lib_experimental_parallel_simd 201803
+
+#pragma GCC diagnostic push
+// Many [[gnu::vector_size(N)]] types might trigger a -Wpsabi warning, which is
+// irrelevant here because the affected functions never cross an ABI boundary.
+#pragma GCC diagnostic ignored "-Wpsabi"
+
+// If __OPTIMIZE__ is not defined, some intrinsics are defined as macros that
+// use C casts internally. The warning has to be disabled, as it would
+// otherwise yield many false positives.
+#ifndef __OPTIMIZE__
+#pragma GCC diagnostic ignored "-Wold-style-cast"
+#endif
+
+#include "bits/simd_detail.h"
+#include "bits/simd.h"
+#include "bits/simd_fixed_size.h"
+#include "bits/simd_scalar.h"
+#include "bits/simd_builtin.h"
+#include "bits/simd_converter.h"
+#if _GLIBCXX_SIMD_X86INTRIN
+#include "bits/simd_x86.h"
+#elif _GLIBCXX_SIMD_HAVE_NEON
+#include "bits/simd_neon.h"
+#endif
+#include "bits/simd_math.h"
+
+#pragma GCC diagnostic pop
+
+#endif // _GLIBCXX_EXPERIMENTAL_SIMD
+// vim: ft=cpp
diff --git a/libstdc++-v3/testsuite/Makefile.am b/libstdc++-v3/testsuite/Makefile.am
index e19509d2534..9cef1e65e1b 100644
--- a/libstdc++-v3/testsuite/Makefile.am
+++ b/libstdc++-v3/testsuite/Makefile.am
@@ -47,6 +47,7 @@ site.exp: Makefile
@echo '## these variables are automatically generated by make ##' >site.tmp
@echo '# Do not edit here. If you wish to override these values' >>site.tmp
@echo '# edit the last section' >>site.tmp
+ @echo 'set tool libstdc++' >>site.tmp
@echo 'set srcdir $(srcdir)' >>site.tmp
@echo "set objdir `pwd`" >>site.tmp
@echo 'set build_alias "$(build_alias)"' >>site.tmp
@@ -55,7 +56,6 @@ site.exp: Makefile
@echo 'set host_triplet $(host_triplet)' >>site.tmp
@echo 'set target_alias "$(target_alias)"' >>site.tmp
@echo 'set target_triplet $(target_triplet)' >>site.tmp
- @echo 'set target_triplet $(target_triplet)' >>site.tmp
@echo 'set libiconv "$(LIBICONV)"' >>site.tmp
@echo 'set baseline_dir "$(baseline_dir)"' >> site.tmp
@echo 'set baseline_subdir_switch "$(baseline_subdir_switch)"' >> site.tmp
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-char-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/abs-char-constexpr.cc
new file mode 100644
index 00000000000..ffff65ee130
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-char-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-char-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/abs-char-fixed_size.cc
new file mode 100644
index 00000000000..f8dd7d4ef82
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-char-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-char.cc b/libstdc++-v3/testsuite/experimental/simd/abs-char.cc
new file mode 100644
index 00000000000..8b37d82caaa
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-char.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-char16_t-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/abs-char16_t-constexpr.cc
new file mode 100644
index 00000000000..4c11f64ea4f
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-char16_t-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<char16_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-char16_t-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/abs-char16_t-fixed_size.cc
new file mode 100644
index 00000000000..ef375ce9451
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-char16_t-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<char16_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-char16_t.cc b/libstdc++-v3/testsuite/experimental/simd/abs-char16_t.cc
new file mode 100644
index 00000000000..6618460cc38
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-char16_t.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<char16_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-char32_t-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/abs-char32_t-constexpr.cc
new file mode 100644
index 00000000000..e6c5ba261f3
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-char32_t-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<char32_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-char32_t-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/abs-char32_t-fixed_size.cc
new file mode 100644
index 00000000000..9b95c98421c
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-char32_t-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<char32_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-char32_t.cc b/libstdc++-v3/testsuite/experimental/simd/abs-char32_t.cc
new file mode 100644
index 00000000000..47e8fc78e90
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-char32_t.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<char32_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-double-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/abs-double-constexpr.cc
new file mode 100644
index 00000000000..4adce678c0f
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-double-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-double-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/abs-double-fixed_size.cc
new file mode 100644
index 00000000000..25bb1f8cb24
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-double-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-double.cc b/libstdc++-v3/testsuite/experimental/simd/abs-double.cc
new file mode 100644
index 00000000000..302389530c1
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-double.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-float-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/abs-float-constexpr.cc
new file mode 100644
index 00000000000..317d2b7db52
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-float-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<float>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-float-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/abs-float-fixed_size.cc
new file mode 100644
index 00000000000..e69ea67bc71
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-float-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<float>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-float.cc b/libstdc++-v3/testsuite/experimental/simd/abs-float.cc
new file mode 100644
index 00000000000..2ba2178d2b0
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-float.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<float>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-int-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/abs-int-constexpr.cc
new file mode 100644
index 00000000000..b515b5cb4a1
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-int-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-int-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/abs-int-fixed_size.cc
new file mode 100644
index 00000000000..c41eeb52641
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-int-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-int.cc b/libstdc++-v3/testsuite/experimental/simd/abs-int.cc
new file mode 100644
index 00000000000..7299e4af93d
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-int.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-long-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/abs-long-constexpr.cc
new file mode 100644
index 00000000000..5d3ac0a8217
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-long-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-long-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/abs-long-fixed_size.cc
new file mode 100644
index 00000000000..a5f27e384e7
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-long-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-long.cc b/libstdc++-v3/testsuite/experimental/simd/abs-long.cc
new file mode 100644
index 00000000000..64719277b7c
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-long.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-long_double-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/abs-long_double-constexpr.cc
new file mode 100644
index 00000000000..bdd51845a38
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-long_double-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<long double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-long_double-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/abs-long_double-fixed_size.cc
new file mode 100644
index 00000000000..0454574b3db
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-long_double-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<long double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-long_double.cc b/libstdc++-v3/testsuite/experimental/simd/abs-long_double.cc
new file mode 100644
index 00000000000..d18f45f8a45
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-long_double.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<long double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-long_long-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/abs-long_long-constexpr.cc
new file mode 100644
index 00000000000..736d0005a68
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-long_long-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-long_long-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/abs-long_long-fixed_size.cc
new file mode 100644
index 00000000000..2fdcfbb077c
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-long_long-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-long_long.cc b/libstdc++-v3/testsuite/experimental/simd/abs-long_long.cc
new file mode 100644
index 00000000000..f1dfe10f33f
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-long_long.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-short-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/abs-short-constexpr.cc
new file mode 100644
index 00000000000..5b95e5def75
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-short-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-short-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/abs-short-fixed_size.cc
new file mode 100644
index 00000000000..a09d81da561
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-short-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-short.cc b/libstdc++-v3/testsuite/experimental/simd/abs-short.cc
new file mode 100644
index 00000000000..d772aea85e2
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-short.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-signed_char-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/abs-signed_char-constexpr.cc
new file mode 100644
index 00000000000..e343396280b
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-signed_char-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<signed char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-signed_char-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/abs-signed_char-fixed_size.cc
new file mode 100644
index 00000000000..43146b7b6d5
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-signed_char-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<signed char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-signed_char.cc b/libstdc++-v3/testsuite/experimental/simd/abs-signed_char.cc
new file mode 100644
index 00000000000..bfd89fd5a96
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-signed_char.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<signed char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_char-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_char-constexpr.cc
new file mode 100644
index 00000000000..8d2eba06b13
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_char-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<unsigned char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_char-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_char-fixed_size.cc
new file mode 100644
index 00000000000..f3afe7b4548
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_char-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<unsigned char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_char.cc b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_char.cc
new file mode 100644
index 00000000000..1b00d8bfdd2
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_char.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<unsigned char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_int-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_int-constexpr.cc
new file mode 100644
index 00000000000..05354330541
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_int-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<unsigned int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_int-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_int-fixed_size.cc
new file mode 100644
index 00000000000..6aeec8b356b
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_int-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<unsigned int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_int.cc b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_int.cc
new file mode 100644
index 00000000000..d3581287a0a
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_int.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<unsigned int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_long-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_long-constexpr.cc
new file mode 100644
index 00000000000..547a3f77c71
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_long-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<unsigned long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_long-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_long-fixed_size.cc
new file mode 100644
index 00000000000..9bd90e8ea7e
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_long-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<unsigned long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_long.cc b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_long.cc
new file mode 100644
index 00000000000..79c2fd3f739
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_long.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<unsigned long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_long_long-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_long_long-constexpr.cc
new file mode 100644
index 00000000000..0976cbe1082
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_long_long-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<unsigned long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_long_long-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_long_long-fixed_size.cc
new file mode 100644
index 00000000000..ed7adc17ac1
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_long_long-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<unsigned long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_long_long.cc b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_long_long.cc
new file mode 100644
index 00000000000..60369fe7d23
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_long_long.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<unsigned long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_short-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_short-constexpr.cc
new file mode 100644
index 00000000000..e4078abd7e2
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_short-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<unsigned short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_short-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_short-fixed_size.cc
new file mode 100644
index 00000000000..5d24b6bd858
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_short-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<unsigned short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_short.cc b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_short.cc
new file mode 100644
index 00000000000..b7bde5c7487
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-unsigned_short.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<unsigned short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-wchar_t-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/abs-wchar_t-constexpr.cc
new file mode 100644
index 00000000000..7adee468f76
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-wchar_t-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<wchar_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-wchar_t-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/abs-wchar_t-fixed_size.cc
new file mode 100644
index 00000000000..006e31a4a9c
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-wchar_t-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<wchar_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/abs-wchar_t.cc b/libstdc++-v3/testsuite/experimental/simd/abs-wchar_t.cc
new file mode 100644
index 00000000000..1b837f3f005
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/abs-wchar_t.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/abs.h"
+
+int main()
+{
+ iterate_abis<wchar_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-char-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-char-constexpr.cc
new file mode 100644
index 00000000000..453e5f6c644
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-char-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-char-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-char-fixed_size.cc
new file mode 100644
index 00000000000..5c6ec25040d
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-char-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-char.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-char.cc
new file mode 100644
index 00000000000..0b4b81155f6
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-char.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-char16_t-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-char16_t-constexpr.cc
new file mode 100644
index 00000000000..f946ac69bed
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-char16_t-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<char16_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-char16_t-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-char16_t-fixed_size.cc
new file mode 100644
index 00000000000..a04511a6cf5
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-char16_t-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<char16_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-char16_t.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-char16_t.cc
new file mode 100644
index 00000000000..3a8dcf9acd3
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-char16_t.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<char16_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-char32_t-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-char32_t-constexpr.cc
new file mode 100644
index 00000000000..ba1cd1b9cbd
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-char32_t-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<char32_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-char32_t-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-char32_t-fixed_size.cc
new file mode 100644
index 00000000000..797d60f7822
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-char32_t-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<char32_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-char32_t.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-char32_t.cc
new file mode 100644
index 00000000000..874e27d70a1
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-char32_t.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<char32_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-double-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-double-constexpr.cc
new file mode 100644
index 00000000000..3f3e06b4e17
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-double-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-double-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-double-fixed_size.cc
new file mode 100644
index 00000000000..c8690e22f1b
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-double-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-double.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-double.cc
new file mode 100644
index 00000000000..bf4accfc59b
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-double.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-float-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-float-constexpr.cc
new file mode 100644
index 00000000000..5b97d8a5a67
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-float-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<float>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-float-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-float-fixed_size.cc
new file mode 100644
index 00000000000..6c88f4e5289
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-float-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<float>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-float.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-float.cc
new file mode 100644
index 00000000000..e4b0eec5742
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-float.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<float>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-int-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-int-constexpr.cc
new file mode 100644
index 00000000000..2e40f05da5a
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-int-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-int-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-int-fixed_size.cc
new file mode 100644
index 00000000000..801fa146563
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-int-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-int.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-int.cc
new file mode 100644
index 00000000000..9e5a74e8d0c
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-int.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-long-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-long-constexpr.cc
new file mode 100644
index 00000000000..959e7689c99
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-long-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-long-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-long-fixed_size.cc
new file mode 100644
index 00000000000..1d997953e63
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-long-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-long.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-long.cc
new file mode 100644
index 00000000000..67f04518350
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-long.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-long_double-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-long_double-constexpr.cc
new file mode 100644
index 00000000000..284c48f9399
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-long_double-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<long double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-long_double-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-long_double-fixed_size.cc
new file mode 100644
index 00000000000..0c2973a88c8
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-long_double-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<long double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-long_double.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-long_double.cc
new file mode 100644
index 00000000000..649f5dc5d42
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-long_double.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<long double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-long_long-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-long_long-constexpr.cc
new file mode 100644
index 00000000000..9f454a9bda5
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-long_long-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-long_long-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-long_long-fixed_size.cc
new file mode 100644
index 00000000000..d1295fbd4e4
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-long_long-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-long_long.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-long_long.cc
new file mode 100644
index 00000000000..7e8a3f91b23
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-long_long.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-short-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-short-constexpr.cc
new file mode 100644
index 00000000000..fcc2d52a097
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-short-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-short-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-short-fixed_size.cc
new file mode 100644
index 00000000000..92e3fb0bdeb
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-short-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-short.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-short.cc
new file mode 100644
index 00000000000..e294906388c
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-short.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-signed_char-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-signed_char-constexpr.cc
new file mode 100644
index 00000000000..a02e310f606
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-signed_char-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<signed char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-signed_char-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-signed_char-fixed_size.cc
new file mode 100644
index 00000000000..51545f8960d
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-signed_char-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<signed char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-signed_char.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-signed_char.cc
new file mode 100644
index 00000000000..67bb7e9493e
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-signed_char.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<signed char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_char-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_char-constexpr.cc
new file mode 100644
index 00000000000..a71a1cee92e
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_char-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<unsigned char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_char-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_char-fixed_size.cc
new file mode 100644
index 00000000000..8c32dd2a4fc
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_char-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<unsigned char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_char.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_char.cc
new file mode 100644
index 00000000000..ce4e416091b
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_char.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<unsigned char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_int-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_int-constexpr.cc
new file mode 100644
index 00000000000..bbafb7a5fba
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_int-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<unsigned int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_int-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_int-fixed_size.cc
new file mode 100644
index 00000000000..63b6e61e0ef
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_int-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<unsigned int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_int.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_int.cc
new file mode 100644
index 00000000000..8704ef8bc48
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_int.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<unsigned int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_long-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_long-constexpr.cc
new file mode 100644
index 00000000000..7279d391ec5
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_long-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<unsigned long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_long-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_long-fixed_size.cc
new file mode 100644
index 00000000000..2bbd1e3cd7e
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_long-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<unsigned long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_long.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_long.cc
new file mode 100644
index 00000000000..2ec36d041cb
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_long.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<unsigned long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_long_long-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_long_long-constexpr.cc
new file mode 100644
index 00000000000..579ee3cb787
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_long_long-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<unsigned long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_long_long-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_long_long-fixed_size.cc
new file mode 100644
index 00000000000..eb216b1daf2
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_long_long-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<unsigned long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_long_long.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_long_long.cc
new file mode 100644
index 00000000000..9d0502d3e80
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_long_long.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<unsigned long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_short-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_short-constexpr.cc
new file mode 100644
index 00000000000..ea8f8d9b68d
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_short-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<unsigned short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_short-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_short-fixed_size.cc
new file mode 100644
index 00000000000..fb8650d4ddd
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_short-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<unsigned short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_short.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_short.cc
new file mode 100644
index 00000000000..e5d45f12a58
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-unsigned_short.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<unsigned short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-wchar_t-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-wchar_t-constexpr.cc
new file mode 100644
index 00000000000..52b4b70fcda
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-wchar_t-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<wchar_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-wchar_t-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-wchar_t-fixed_size.cc
new file mode 100644
index 00000000000..51f485c17e4
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-wchar_t-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<wchar_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/algorithms-wchar_t.cc b/libstdc++-v3/testsuite/experimental/simd/algorithms-wchar_t.cc
new file mode 100644
index 00000000000..e5df7bc35ab
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/algorithms-wchar_t.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/algorithms.h"
+
+int main()
+{
+ iterate_abis<wchar_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-char-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-char-constexpr.cc
new file mode 100644
index 00000000000..2441ead5416
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-char-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-char-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-char-fixed_size.cc
new file mode 100644
index 00000000000..b6a8850f988
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-char-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-char.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-char.cc
new file mode 100644
index 00000000000..a7bcdbf8c26
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-char.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-char16_t-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-char16_t-constexpr.cc
new file mode 100644
index 00000000000..7030a405433
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-char16_t-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<char16_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-char16_t-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-char16_t-fixed_size.cc
new file mode 100644
index 00000000000..1124d75c645
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-char16_t-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<char16_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-char16_t.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-char16_t.cc
new file mode 100644
index 00000000000..cdb827041cc
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-char16_t.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<char16_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-char32_t-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-char32_t-constexpr.cc
new file mode 100644
index 00000000000..7d585299fb5
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-char32_t-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<char32_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-char32_t-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-char32_t-fixed_size.cc
new file mode 100644
index 00000000000..0a09d2a27d5
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-char32_t-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<char32_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-char32_t.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-char32_t.cc
new file mode 100644
index 00000000000..6d127fed41e
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-char32_t.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<char32_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-double-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-double-constexpr.cc
new file mode 100644
index 00000000000..38acf53cd86
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-double-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-double-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-double-fixed_size.cc
new file mode 100644
index 00000000000..5a8480383c6
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-double-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-double.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-double.cc
new file mode 100644
index 00000000000..8a258106dec
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-double.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-float-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-float-constexpr.cc
new file mode 100644
index 00000000000..02bd74edb45
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-float-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<float>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-float-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-float-fixed_size.cc
new file mode 100644
index 00000000000..b0326aebeba
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-float-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<float>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-float.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-float.cc
new file mode 100644
index 00000000000..210d01aeeec
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-float.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<float>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-int-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-int-constexpr.cc
new file mode 100644
index 00000000000..e810f8a379d
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-int-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-int-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-int-fixed_size.cc
new file mode 100644
index 00000000000..2199cae9850
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-int-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-int.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-int.cc
new file mode 100644
index 00000000000..d3945085de0
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-int.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-long-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-long-constexpr.cc
new file mode 100644
index 00000000000..af48c1de8eb
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-long-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-long-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-long-fixed_size.cc
new file mode 100644
index 00000000000..90531aa4c85
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-long-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-long.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-long.cc
new file mode 100644
index 00000000000..cb839c32c75
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-long.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-long_double-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-long_double-constexpr.cc
new file mode 100644
index 00000000000..e9fdce9bb0e
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-long_double-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<long double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-long_double-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-long_double-fixed_size.cc
new file mode 100644
index 00000000000..a237b259b93
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-long_double-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<long double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-long_double.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-long_double.cc
new file mode 100644
index 00000000000..537a3fcb4c9
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-long_double.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<long double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-long_long-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-long_long-constexpr.cc
new file mode 100644
index 00000000000..8c0ca3413a5
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-long_long-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-long_long-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-long_long-fixed_size.cc
new file mode 100644
index 00000000000..586e5380a5d
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-long_long-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-long_long.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-long_long.cc
new file mode 100644
index 00000000000..6af141ac126
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-long_long.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-short-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-short-constexpr.cc
new file mode 100644
index 00000000000..ab2f19dca8c
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-short-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-short-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-short-fixed_size.cc
new file mode 100644
index 00000000000..1d71a7328f9
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-short-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-short.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-short.cc
new file mode 100644
index 00000000000..2f3a937715c
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-short.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-signed_char-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-signed_char-constexpr.cc
new file mode 100644
index 00000000000..a7a65fc4869
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-signed_char-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<signed char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-signed_char-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-signed_char-fixed_size.cc
new file mode 100644
index 00000000000..58f30eac548
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-signed_char-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<signed char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-signed_char.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-signed_char.cc
new file mode 100644
index 00000000000..40e707496c7
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-signed_char.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<signed char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_char-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_char-constexpr.cc
new file mode 100644
index 00000000000..23ffa001a7d
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_char-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<unsigned char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_char-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_char-fixed_size.cc
new file mode 100644
index 00000000000..1dbc5313a22
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_char-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<unsigned char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_char.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_char.cc
new file mode 100644
index 00000000000..b266c3d0900
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_char.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<unsigned char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_int-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_int-constexpr.cc
new file mode 100644
index 00000000000..114c4d28258
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_int-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<unsigned int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_int-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_int-fixed_size.cc
new file mode 100644
index 00000000000..063559e6a9b
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_int-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<unsigned int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_int.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_int.cc
new file mode 100644
index 00000000000..234eb95a65c
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_int.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<unsigned int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_long-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_long-constexpr.cc
new file mode 100644
index 00000000000..86b2fce8258
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_long-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<unsigned long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_long-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_long-fixed_size.cc
new file mode 100644
index 00000000000..148134d7baf
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_long-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<unsigned long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_long.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_long.cc
new file mode 100644
index 00000000000..a30a6e76162
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_long.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<unsigned long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_long_long-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_long_long-constexpr.cc
new file mode 100644
index 00000000000..4f10a2ad029
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_long_long-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<unsigned long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_long_long-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_long_long-fixed_size.cc
new file mode 100644
index 00000000000..e04cb2b23b3
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_long_long-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<unsigned long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_long_long.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_long_long.cc
new file mode 100644
index 00000000000..73824beead2
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_long_long.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<unsigned long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_short-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_short-constexpr.cc
new file mode 100644
index 00000000000..c87091d3151
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_short-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<unsigned short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_short-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_short-fixed_size.cc
new file mode 100644
index 00000000000..0f886eacf2e
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_short-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<unsigned short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_short.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_short.cc
new file mode 100644
index 00000000000..69d786bdca5
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-unsigned_short.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<unsigned short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-wchar_t-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-wchar_t-constexpr.cc
new file mode 100644
index 00000000000..94604182e00
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-wchar_t-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<wchar_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-wchar_t-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-wchar_t-fixed_size.cc
new file mode 100644
index 00000000000..45aaf4689fd
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-wchar_t-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<wchar_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/broadcast-wchar_t.cc b/libstdc++-v3/testsuite/experimental/simd/broadcast-wchar_t.cc
new file mode 100644
index 00000000000..319243c448d
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/broadcast-wchar_t.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/broadcast.h"
+
+int main()
+{
+ iterate_abis<wchar_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-char-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/casts-char-constexpr.cc
new file mode 100644
index 00000000000..378e2fb2df7
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-char-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-char-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/casts-char-fixed_size.cc
new file mode 100644
index 00000000000..a098d2e3f4e
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-char-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-char.cc b/libstdc++-v3/testsuite/experimental/simd/casts-char.cc
new file mode 100644
index 00000000000..f64a0024052
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-char.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-char16_t-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/casts-char16_t-constexpr.cc
new file mode 100644
index 00000000000..1722271157b
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-char16_t-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<char16_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-char16_t-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/casts-char16_t-fixed_size.cc
new file mode 100644
index 00000000000..bd8faab2002
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-char16_t-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<char16_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-char16_t.cc b/libstdc++-v3/testsuite/experimental/simd/casts-char16_t.cc
new file mode 100644
index 00000000000..bd1268c3612
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-char16_t.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<char16_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-char32_t-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/casts-char32_t-constexpr.cc
new file mode 100644
index 00000000000..938dfbb75d8
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-char32_t-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<char32_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-char32_t-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/casts-char32_t-fixed_size.cc
new file mode 100644
index 00000000000..85951c4320f
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-char32_t-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<char32_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-char32_t.cc b/libstdc++-v3/testsuite/experimental/simd/casts-char32_t.cc
new file mode 100644
index 00000000000..8ef1e7ce7e5
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-char32_t.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<char32_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-double-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/casts-double-constexpr.cc
new file mode 100644
index 00000000000..0cf0b9e5c79
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-double-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-double-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/casts-double-fixed_size.cc
new file mode 100644
index 00000000000..8b7f0c9aed0
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-double-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-double.cc b/libstdc++-v3/testsuite/experimental/simd/casts-double.cc
new file mode 100644
index 00000000000..41be646beeb
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-double.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-float-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/casts-float-constexpr.cc
new file mode 100644
index 00000000000..f7a4eff264e
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-float-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<float>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-float-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/casts-float-fixed_size.cc
new file mode 100644
index 00000000000..b854f481ab6
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-float-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<float>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-float.cc b/libstdc++-v3/testsuite/experimental/simd/casts-float.cc
new file mode 100644
index 00000000000..f766426a834
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-float.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<float>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-int-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/casts-int-constexpr.cc
new file mode 100644
index 00000000000..7851a637593
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-int-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-int-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/casts-int-fixed_size.cc
new file mode 100644
index 00000000000..2cdfc6e91b1
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-int-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-int.cc b/libstdc++-v3/testsuite/experimental/simd/casts-int.cc
new file mode 100644
index 00000000000..97d288508b8
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-int.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-long-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/casts-long-constexpr.cc
new file mode 100644
index 00000000000..0cd85e095d9
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-long-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-long-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/casts-long-fixed_size.cc
new file mode 100644
index 00000000000..9d43269a0f2
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-long-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-long.cc b/libstdc++-v3/testsuite/experimental/simd/casts-long.cc
new file mode 100644
index 00000000000..88cb7299fb7
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-long.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-long_double-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/casts-long_double-constexpr.cc
new file mode 100644
index 00000000000..d2b2ed1fa29
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-long_double-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<long double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-long_double-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/casts-long_double-fixed_size.cc
new file mode 100644
index 00000000000..1705e0b6fbc
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-long_double-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<long double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-long_double.cc b/libstdc++-v3/testsuite/experimental/simd/casts-long_double.cc
new file mode 100644
index 00000000000..3ca613c7eab
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-long_double.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<long double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-long_long-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/casts-long_long-constexpr.cc
new file mode 100644
index 00000000000..cc135964ab0
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-long_long-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-long_long-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/casts-long_long-fixed_size.cc
new file mode 100644
index 00000000000..cdc5a31cee6
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-long_long-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-long_long.cc b/libstdc++-v3/testsuite/experimental/simd/casts-long_long.cc
new file mode 100644
index 00000000000..3a671406607
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-long_long.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-short-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/casts-short-constexpr.cc
new file mode 100644
index 00000000000..13b56956836
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-short-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-short-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/casts-short-fixed_size.cc
new file mode 100644
index 00000000000..fe52ac7db75
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-short-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-short.cc b/libstdc++-v3/testsuite/experimental/simd/casts-short.cc
new file mode 100644
index 00000000000..6e5dbe83784
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-short.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-signed_char-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/casts-signed_char-constexpr.cc
new file mode 100644
index 00000000000..a1598ca85bc
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-signed_char-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<signed char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-signed_char-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/casts-signed_char-fixed_size.cc
new file mode 100644
index 00000000000..27b4d595db1
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-signed_char-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<signed char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-signed_char.cc b/libstdc++-v3/testsuite/experimental/simd/casts-signed_char.cc
new file mode 100644
index 00000000000..2c5a99941ab
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-signed_char.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<signed char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_char-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_char-constexpr.cc
new file mode 100644
index 00000000000..661dce00177
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_char-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<unsigned char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_char-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_char-fixed_size.cc
new file mode 100644
index 00000000000..0bafc8a31fb
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_char-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<unsigned char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_char.cc b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_char.cc
new file mode 100644
index 00000000000..611642504cd
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_char.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<unsigned char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_int-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_int-constexpr.cc
new file mode 100644
index 00000000000..5289792e403
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_int-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<unsigned int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_int-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_int-fixed_size.cc
new file mode 100644
index 00000000000..00f1eb0d41b
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_int-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<unsigned int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_int.cc b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_int.cc
new file mode 100644
index 00000000000..f7b202c27ae
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_int.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<unsigned int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_long-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_long-constexpr.cc
new file mode 100644
index 00000000000..fabb1d94dd4
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_long-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<unsigned long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_long-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_long-fixed_size.cc
new file mode 100644
index 00000000000..36070941276
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_long-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<unsigned long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_long.cc b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_long.cc
new file mode 100644
index 00000000000..cf44cbc2c11
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_long.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<unsigned long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_long_long-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_long_long-constexpr.cc
new file mode 100644
index 00000000000..fee0dc4b452
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_long_long-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<unsigned long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_long_long-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_long_long-fixed_size.cc
new file mode 100644
index 00000000000..bc0d32f9fd8
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_long_long-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<unsigned long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_long_long.cc b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_long_long.cc
new file mode 100644
index 00000000000..5779b69f410
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_long_long.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<unsigned long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_short-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_short-constexpr.cc
new file mode 100644
index 00000000000..86462e6ebb5
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_short-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<unsigned short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_short-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_short-fixed_size.cc
new file mode 100644
index 00000000000..fc7c4a81b15
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_short-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<unsigned short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_short.cc b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_short.cc
new file mode 100644
index 00000000000..fbef0ec25f0
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-unsigned_short.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<unsigned short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-wchar_t-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/casts-wchar_t-constexpr.cc
new file mode 100644
index 00000000000..ea61dc67a2c
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-wchar_t-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<wchar_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-wchar_t-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/casts-wchar_t-fixed_size.cc
new file mode 100644
index 00000000000..a2f7f8eb820
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-wchar_t-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<wchar_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/casts-wchar_t.cc b/libstdc++-v3/testsuite/experimental/simd/casts-wchar_t.cc
new file mode 100644
index 00000000000..492bd52db7c
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/casts-wchar_t.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/casts.h"
+
+int main()
+{
+ iterate_abis<wchar_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/fpclassify-double-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/fpclassify-double-constexpr.cc
new file mode 100644
index 00000000000..0c5b3955c4c
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/fpclassify-double-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/fpclassify.h"
+
+int main()
+{
+ iterate_abis<double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/fpclassify-double-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/fpclassify-double-fixed_size.cc
new file mode 100644
index 00000000000..69fc7e9e28f
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/fpclassify-double-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/fpclassify.h"
+
+int main()
+{
+ iterate_abis<double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/fpclassify-double.cc b/libstdc++-v3/testsuite/experimental/simd/fpclassify-double.cc
new file mode 100644
index 00000000000..25693b2082c
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/fpclassify-double.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/fpclassify.h"
+
+int main()
+{
+ iterate_abis<double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/fpclassify-float-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/fpclassify-float-constexpr.cc
new file mode 100644
index 00000000000..cb2ce60a75c
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/fpclassify-float-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/fpclassify.h"
+
+int main()
+{
+ iterate_abis<float>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/fpclassify-float-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/fpclassify-float-fixed_size.cc
new file mode 100644
index 00000000000..80ca4a6043f
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/fpclassify-float-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/fpclassify.h"
+
+int main()
+{
+ iterate_abis<float>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/fpclassify-float.cc b/libstdc++-v3/testsuite/experimental/simd/fpclassify-float.cc
new file mode 100644
index 00000000000..886a4d0c83b
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/fpclassify-float.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/fpclassify.h"
+
+int main()
+{
+ iterate_abis<float>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/fpclassify-long_double-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/fpclassify-long_double-constexpr.cc
new file mode 100644
index 00000000000..3d3578f974e
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/fpclassify-long_double-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/fpclassify.h"
+
+int main()
+{
+ iterate_abis<long double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/fpclassify-long_double-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/fpclassify-long_double-fixed_size.cc
new file mode 100644
index 00000000000..6848e5db58c
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/fpclassify-long_double-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/fpclassify.h"
+
+int main()
+{
+ iterate_abis<long double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/fpclassify-long_double.cc b/libstdc++-v3/testsuite/experimental/simd/fpclassify-long_double.cc
new file mode 100644
index 00000000000..66eb5aaa7be
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/fpclassify-long_double.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/fpclassify.h"
+
+int main()
+{
+ iterate_abis<long double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/frexp-double-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/frexp-double-constexpr.cc
new file mode 100644
index 00000000000..e86618b52fb
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/frexp-double-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/frexp.h"
+
+int main()
+{
+ iterate_abis<double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/frexp-double-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/frexp-double-fixed_size.cc
new file mode 100644
index 00000000000..0cf0d6f6de6
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/frexp-double-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/frexp.h"
+
+int main()
+{
+ iterate_abis<double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/frexp-double.cc b/libstdc++-v3/testsuite/experimental/simd/frexp-double.cc
new file mode 100644
index 00000000000..ebfc0f1b738
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/frexp-double.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/frexp.h"
+
+int main()
+{
+ iterate_abis<double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/frexp-float-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/frexp-float-constexpr.cc
new file mode 100644
index 00000000000..7c5a9838a41
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/frexp-float-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/frexp.h"
+
+int main()
+{
+ iterate_abis<float>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/frexp-float-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/frexp-float-fixed_size.cc
new file mode 100644
index 00000000000..b2d82af9f16
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/frexp-float-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/frexp.h"
+
+int main()
+{
+ iterate_abis<float>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/frexp-float.cc b/libstdc++-v3/testsuite/experimental/simd/frexp-float.cc
new file mode 100644
index 00000000000..584ea43afe3
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/frexp-float.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/frexp.h"
+
+int main()
+{
+ iterate_abis<float>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/frexp-long_double-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/frexp-long_double-constexpr.cc
new file mode 100644
index 00000000000..00e2a6c9dbc
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/frexp-long_double-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/frexp.h"
+
+int main()
+{
+ iterate_abis<long double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/frexp-long_double-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/frexp-long_double-fixed_size.cc
new file mode 100644
index 00000000000..abdcd9d5126
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/frexp-long_double-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/frexp.h"
+
+int main()
+{
+ iterate_abis<long double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/frexp-long_double.cc b/libstdc++-v3/testsuite/experimental/simd/frexp-long_double.cc
new file mode 100644
index 00000000000..00a3b61b33d
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/frexp-long_double.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/frexp.h"
+
+int main()
+{
+ iterate_abis<long double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generate_testcases.sh b/libstdc++-v3/testsuite/experimental/simd/generate_testcases.sh
new file mode 100755
index 00000000000..7acf17c7eed
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generate_testcases.sh
@@ -0,0 +1,85 @@
+#!/bin/bash
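+# Generate the per-type test drivers from every test header in tests/*.h.
+# Each test gets a plain, a -constexpr, and a -fixed_size variant.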
+
+floattypes=(
+"long double"
+"double"
+"float"
+)
+alltypes=(
+"${floattypes[@]}"
+"long long"
+"unsigned long long"
+"unsigned long"
+"long"
+"int"
+"unsigned int"
+"short"
+"unsigned short"
+"char"
+"signed char"
+"unsigned char"
+"char32_t"
+"char16_t"
+"wchar_t"
+)
+
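+# Run from the directory containing this script so the tests/ paths resolve.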
+cd ${0%/*}
+for testcase in tests/*.h; do
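+ # Headers marked "test only floattypes" are instantiated for floating-point types only.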
+ if grep -q "test only floattypes" "$testcase"; then
+ typelist=("${floattypes[@]}")
+ else
+ typelist=("${alltypes[@]}")
+ fi
+ testcase=${testcase%.h}
+ testcase=${testcase##*/}
+ for type in "${typelist[@]}"; do
+ if [[ $testcase == sincos ]]; then
+ # The sincos test requires reference data to run
+ extra='// { dg-do compile }'
+ else
+ extra=''
+ fi
+ filename="${testcase}-${type// /_}"
+
+ cat > "${filename}.cc" <<EOF
+// { dg-options "-std=c++17" }
+${extra}
+#include "tests/${testcase}.h"
+
+int main()
+{
+ iterate_abis<${type}>();
+ return 0;
+}
+EOF
+ cat > "${filename}-constexpr.cc" <<EOF
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+${extra}
+#include "tests/${testcase}.h"
+
+int main()
+{
+ iterate_abis<${type}>();
+ return 0;
+}
+EOF
+ cat > "${filename}-fixed_size.cc" <<EOF
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+${extra}
+#define TESTFIXEDSIZE 1
+#include "tests/${testcase}.h"
+
+int main()
+{
+ iterate_abis<${type}>();
+ return 0;
+}
+EOF
+ done
+done
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-char-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/generator-char-constexpr.cc
new file mode 100644
index 00000000000..dceb9ab29b3
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-char-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-char-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/generator-char-fixed_size.cc
new file mode 100644
index 00000000000..f20cc441d74
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-char-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-char.cc b/libstdc++-v3/testsuite/experimental/simd/generator-char.cc
new file mode 100644
index 00000000000..790e4e3636a
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-char.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-char16_t-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/generator-char16_t-constexpr.cc
new file mode 100644
index 00000000000..59ea27c0802
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-char16_t-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<char16_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-char16_t-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/generator-char16_t-fixed_size.cc
new file mode 100644
index 00000000000..19fa325ed51
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-char16_t-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<char16_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-char16_t.cc b/libstdc++-v3/testsuite/experimental/simd/generator-char16_t.cc
new file mode 100644
index 00000000000..897ee1c7a88
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-char16_t.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<char16_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-char32_t-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/generator-char32_t-constexpr.cc
new file mode 100644
index 00000000000..4db121300fb
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-char32_t-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<char32_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-char32_t-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/generator-char32_t-fixed_size.cc
new file mode 100644
index 00000000000..62b5cd6c29f
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-char32_t-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<char32_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-char32_t.cc b/libstdc++-v3/testsuite/experimental/simd/generator-char32_t.cc
new file mode 100644
index 00000000000..2b04c8bda75
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-char32_t.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<char32_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-double-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/generator-double-constexpr.cc
new file mode 100644
index 00000000000..de491f79875
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-double-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-double-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/generator-double-fixed_size.cc
new file mode 100644
index 00000000000..e7af2ed7082
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-double-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-double.cc b/libstdc++-v3/testsuite/experimental/simd/generator-double.cc
new file mode 100644
index 00000000000..09ac4bdc33d
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-double.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-float-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/generator-float-constexpr.cc
new file mode 100644
index 00000000000..edabab7d3e8
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-float-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<float>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-float-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/generator-float-fixed_size.cc
new file mode 100644
index 00000000000..75d18751c02
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-float-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<float>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-float.cc b/libstdc++-v3/testsuite/experimental/simd/generator-float.cc
new file mode 100644
index 00000000000..40f44fae4d7
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-float.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<float>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-int-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/generator-int-constexpr.cc
new file mode 100644
index 00000000000..643a071d7c2
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-int-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-int-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/generator-int-fixed_size.cc
new file mode 100644
index 00000000000..acd38d02921
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-int-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-int.cc b/libstdc++-v3/testsuite/experimental/simd/generator-int.cc
new file mode 100644
index 00000000000..2166ba8d480
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-int.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-long-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/generator-long-constexpr.cc
new file mode 100644
index 00000000000..25b994c26a0
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-long-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-long-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/generator-long-fixed_size.cc
new file mode 100644
index 00000000000..a2d5ecfce3c
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-long-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-long.cc b/libstdc++-v3/testsuite/experimental/simd/generator-long.cc
new file mode 100644
index 00000000000..9529bcc37ab
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-long.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-long_double-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/generator-long_double-constexpr.cc
new file mode 100644
index 00000000000..f96beaa690a
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-long_double-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<long double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-long_double-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/generator-long_double-fixed_size.cc
new file mode 100644
index 00000000000..e60f903b48e
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-long_double-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<long double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-long_double.cc b/libstdc++-v3/testsuite/experimental/simd/generator-long_double.cc
new file mode 100644
index 00000000000..dbb5cac8e6b
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-long_double.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<long double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-long_long-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/generator-long_long-constexpr.cc
new file mode 100644
index 00000000000..e6b9f93fea7
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-long_long-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-long_long-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/generator-long_long-fixed_size.cc
new file mode 100644
index 00000000000..cb23b21fcc4
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-long_long-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-long_long.cc b/libstdc++-v3/testsuite/experimental/simd/generator-long_long.cc
new file mode 100644
index 00000000000..b1d1de2a2f1
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-long_long.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-short-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/generator-short-constexpr.cc
new file mode 100644
index 00000000000..84d3314be24
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-short-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-short-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/generator-short-fixed_size.cc
new file mode 100644
index 00000000000..44a6764f7e3
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-short-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-short.cc b/libstdc++-v3/testsuite/experimental/simd/generator-short.cc
new file mode 100644
index 00000000000..5343657320f
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-short.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-signed_char-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/generator-signed_char-constexpr.cc
new file mode 100644
index 00000000000..fd35555d54a
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-signed_char-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<signed char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-signed_char-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/generator-signed_char-fixed_size.cc
new file mode 100644
index 00000000000..bdca8349c33
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-signed_char-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<signed char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-signed_char.cc b/libstdc++-v3/testsuite/experimental/simd/generator-signed_char.cc
new file mode 100644
index 00000000000..0c1f5bb6118
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-signed_char.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<signed char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_char-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_char-constexpr.cc
new file mode 100644
index 00000000000..6802c31a3f8
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_char-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<unsigned char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_char-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_char-fixed_size.cc
new file mode 100644
index 00000000000..d990de8de5b
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_char-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<unsigned char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_char.cc b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_char.cc
new file mode 100644
index 00000000000..2c4a0c57404
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_char.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<unsigned char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_int-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_int-constexpr.cc
new file mode 100644
index 00000000000..daba85f07ef
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_int-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<unsigned int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_int-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_int-fixed_size.cc
new file mode 100644
index 00000000000..6bdbebcdd24
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_int-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<unsigned int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_int.cc b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_int.cc
new file mode 100644
index 00000000000..fed7b58d6ab
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_int.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<unsigned int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_long-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_long-constexpr.cc
new file mode 100644
index 00000000000..da209e2b894
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_long-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<unsigned long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_long-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_long-fixed_size.cc
new file mode 100644
index 00000000000..ab20c3f87ac
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_long-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<unsigned long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_long.cc b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_long.cc
new file mode 100644
index 00000000000..66b330f2d5f
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_long.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<unsigned long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_long_long-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_long_long-constexpr.cc
new file mode 100644
index 00000000000..047ff571237
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_long_long-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<unsigned long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_long_long-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_long_long-fixed_size.cc
new file mode 100644
index 00000000000..6c96a68f2b3
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_long_long-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<unsigned long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_long_long.cc b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_long_long.cc
new file mode 100644
index 00000000000..609e23f5df3
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_long_long.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<unsigned long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_short-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_short-constexpr.cc
new file mode 100644
index 00000000000..b24d0d9a60a
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_short-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<unsigned short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_short-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_short-fixed_size.cc
new file mode 100644
index 00000000000..456ece81cdc
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_short-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<unsigned short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_short.cc b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_short.cc
new file mode 100644
index 00000000000..cc7f8c3d287
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-unsigned_short.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<unsigned short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-wchar_t-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/generator-wchar_t-constexpr.cc
new file mode 100644
index 00000000000..5cf9521b7c3
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-wchar_t-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<wchar_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-wchar_t-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/generator-wchar_t-fixed_size.cc
new file mode 100644
index 00000000000..4f77cfe7a91
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-wchar_t-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<wchar_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/generator-wchar_t.cc b/libstdc++-v3/testsuite/experimental/simd/generator-wchar_t.cc
new file mode 100644
index 00000000000..6c775fdd0e9
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/generator-wchar_t.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/generator.h"
+
+int main()
+{
+ iterate_abis<wchar_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/hypot3_fma-double-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/hypot3_fma-double-constexpr.cc
new file mode 100644
index 00000000000..bd6936cb40f
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/hypot3_fma-double-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/hypot3_fma.h"
+
+int main()
+{
+ iterate_abis<double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/hypot3_fma-double-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/hypot3_fma-double-fixed_size.cc
new file mode 100644
index 00000000000..eba5aa120ae
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/hypot3_fma-double-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/hypot3_fma.h"
+
+int main()
+{
+ iterate_abis<double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/hypot3_fma-double.cc b/libstdc++-v3/testsuite/experimental/simd/hypot3_fma-double.cc
new file mode 100644
index 00000000000..442cec265eb
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/hypot3_fma-double.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/hypot3_fma.h"
+
+int main()
+{
+ iterate_abis<double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/hypot3_fma-float-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/hypot3_fma-float-constexpr.cc
new file mode 100644
index 00000000000..43fab5b1b8e
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/hypot3_fma-float-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/hypot3_fma.h"
+
+int main()
+{
+ iterate_abis<float>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/hypot3_fma-float-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/hypot3_fma-float-fixed_size.cc
new file mode 100644
index 00000000000..e933dc8aea4
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/hypot3_fma-float-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/hypot3_fma.h"
+
+int main()
+{
+ iterate_abis<float>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/hypot3_fma-float.cc b/libstdc++-v3/testsuite/experimental/simd/hypot3_fma-float.cc
new file mode 100644
index 00000000000..24132704a26
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/hypot3_fma-float.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/hypot3_fma.h"
+
+int main()
+{
+ iterate_abis<float>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/hypot3_fma-long_double-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/hypot3_fma-long_double-constexpr.cc
new file mode 100644
index 00000000000..658c8a2fb6d
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/hypot3_fma-long_double-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/hypot3_fma.h"
+
+int main()
+{
+ iterate_abis<long double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/hypot3_fma-long_double-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/hypot3_fma-long_double-fixed_size.cc
new file mode 100644
index 00000000000..afed35e475f
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/hypot3_fma-long_double-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/hypot3_fma.h"
+
+int main()
+{
+ iterate_abis<long double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/hypot3_fma-long_double.cc b/libstdc++-v3/testsuite/experimental/simd/hypot3_fma-long_double.cc
new file mode 100644
index 00000000000..78cd653f795
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/hypot3_fma-long_double.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/hypot3_fma.h"
+
+int main()
+{
+ iterate_abis<long double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-char-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-char-constexpr.cc
new file mode 100644
index 00000000000..c3c0bd70f9d
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-char-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-char-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-char-fixed_size.cc
new file mode 100644
index 00000000000..c934dac6e65
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-char-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-char.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-char.cc
new file mode 100644
index 00000000000..02c2324be0e
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-char.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-char16_t-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-char16_t-constexpr.cc
new file mode 100644
index 00000000000..16cd5b3e477
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-char16_t-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<char16_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-char16_t-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-char16_t-fixed_size.cc
new file mode 100644
index 00000000000..56914e866f7
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-char16_t-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<char16_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-char16_t.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-char16_t.cc
new file mode 100644
index 00000000000..708c36f3dd0
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-char16_t.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<char16_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-char32_t-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-char32_t-constexpr.cc
new file mode 100644
index 00000000000..fbabea41c66
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-char32_t-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<char32_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-char32_t-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-char32_t-fixed_size.cc
new file mode 100644
index 00000000000..50676c64bdf
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-char32_t-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<char32_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-char32_t.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-char32_t.cc
new file mode 100644
index 00000000000..64d23ab4b8d
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-char32_t.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<char32_t>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-double-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-double-constexpr.cc
new file mode 100644
index 00000000000..c80490ddfa9
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-double-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-double-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-double-fixed_size.cc
new file mode 100644
index 00000000000..65717f6a449
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-double-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-double.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-double.cc
new file mode 100644
index 00000000000..9caf5aadf4f
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-double.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-float-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-float-constexpr.cc
new file mode 100644
index 00000000000..4d57562fef6
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-float-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<float>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-float-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-float-fixed_size.cc
new file mode 100644
index 00000000000..3b7dd998e24
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-float-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<float>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-float.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-float.cc
new file mode 100644
index 00000000000..9a5219fd89e
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-float.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<float>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-int-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-int-constexpr.cc
new file mode 100644
index 00000000000..d829d8bc842
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-int-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-int-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-int-fixed_size.cc
new file mode 100644
index 00000000000..72e1647d920
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-int-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-int.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-int.cc
new file mode 100644
index 00000000000..61b1970c831
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-int.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<int>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-long-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-long-constexpr.cc
new file mode 100644
index 00000000000..fb74cbec9a7
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-long-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-long-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-long-fixed_size.cc
new file mode 100644
index 00000000000..6f1892f30c5
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-long-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-long.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-long.cc
new file mode 100644
index 00000000000..d2ae50b8800
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-long.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-long_double-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-long_double-constexpr.cc
new file mode 100644
index 00000000000..42884f0f483
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-long_double-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<long double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-long_double-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-long_double-fixed_size.cc
new file mode 100644
index 00000000000..a617c0a0a8c
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-long_double-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<long double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-long_double.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-long_double.cc
new file mode 100644
index 00000000000..67ed81b7001
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-long_double.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<long double>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-long_long-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-long_long-constexpr.cc
new file mode 100644
index 00000000000..521a4ef7a0e
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-long_long-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-long_long-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-long_long-fixed_size.cc
new file mode 100644
index 00000000000..232f0942d2c
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-long_long-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-long_long.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-long_long.cc
new file mode 100644
index 00000000000..4768c4fadba
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-long_long.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<long long>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-short-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-short-constexpr.cc
new file mode 100644
index 00000000000..aab5b19a523
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-short-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-short-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-short-fixed_size.cc
new file mode 100644
index 00000000000..ec7ed1c31c0
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-short-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-short.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-short.cc
new file mode 100644
index 00000000000..ba0d08ef1f4
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-short.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<short>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-signed_char-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-signed_char-constexpr.cc
new file mode 100644
index 00000000000..4cd49cc02f8
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-signed_char-constexpr.cc
@@ -0,0 +1,10 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<signed char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-signed_char-fixed_size.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-signed_char-fixed_size.cc
new file mode 100644
index 00000000000..9f2da6b998d
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-signed_char-fixed_size.cc
@@ -0,0 +1,11 @@
+// { dg-options "-std=gnu++17" }
+// { dg-require-effective-target run_expensive_tests }
+
+#define TESTFIXEDSIZE 1
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<signed char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-signed_char.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-signed_char.cc
new file mode 100644
index 00000000000..76491b7ae17
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/simd/integer_operators-signed_char.cc
@@ -0,0 +1,9 @@
+// { dg-options "-std=c++17" }
+
+#include "tests/integer_operators.h"
+
+int main()
+{
+ iterate_abis<signed char>();
+ return 0;
+}
diff --git a/libstdc++-v3/testsuite/experimental/simd/integer_operators-unsigned_char-constexpr.cc b/libstdc++-v3/testsuite/experimental/simd/integer_operators-unsigned_char-constexpr.cc
new file mode 100644
index 00000000000..33781182de0
--- /dev/null
+++