From mboxrd@z Thu Jan  1 00:00:00 1970
From: Matthias Kretz
To: ,
Subject: [PATCH 1/2] Add std::experimental::simd from the Parallelism TS 2
Date: Fri, 18 Dec 2020 16:49:43 +0100
Message-ID: <3133130.QemTDgPxUG@excalibur>
Organization: GSI Helmholtzzentrum für Schwerionenforschung
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="nextPart3621190.fJnDbHsrEp"
Content-Transfer-Encoding: 7Bit
List-Id: Libstdc++ mailing list

--nextPart3621190.fJnDbHsrEp
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="UTF-8"

Resending this patch with proper commit message and rebased on master.

From: Matthias Kretz

Adds the <experimental/simd> header. This implements the simd and
simd_mask class templates via [[gnu::vector_size(N)]] data members. It
implements math function overloads for simd. Explicit vectorization of
the math functions is not finished.

The majority of functions are marked as [[gnu::always_inline]] to enable
quasi-ODR-conforming linking of TUs with different -m flags.
Performance optimization was done for x86_64. ARM, Aarch64, and POWER
rely on the compiler to recognize reduction, conversion, and shuffle
patterns.

Besides verification using many different machine flags, the code was
also verified with different fast-math flags.

libstdc++-v3/ChangeLog:

	* doc/xml/manual/status_cxx2017.xml: Add implementation status
	of the Parallelism TS 2. Document implementation-defined types
	and behavior.
	* include/Makefile.am: Add new headers.
	* include/Makefile.in: Regenerate.
	* include/experimental/simd: New file. New header for
	Parallelism TS 2.
	* include/experimental/bits/numeric_traits.h: New file.
	Implementation of P1841R1 using internal naming. Addition of
	missing IEC559 functionality query.
	* include/experimental/bits/simd.h: New file. Definition of the
	public simd interfaces and general implementation helpers.
	* include/experimental/bits/simd_builtin.h: New file.
	Implementation of the _VecBuiltin simd_abi.
	* include/experimental/bits/simd_converter.h: New file. Generic
	simd conversions.
	* include/experimental/bits/simd_detail.h: New file. Internal
	macros for the simd implementation.
	* include/experimental/bits/simd_fixed_size.h: New file. Simd
	fixed_size ABI specific implementations.
	* include/experimental/bits/simd_math.h: New file. Math
	overloads for simd.
	* include/experimental/bits/simd_neon.h: New file. Simd NEON
	specific implementations.
	* include/experimental/bits/simd_ppc.h: New file. Implement bit
	shifts to avoid invalid results for integral types smaller than
	int.
	* include/experimental/bits/simd_scalar.h: New file. Simd scalar
	ABI specific implementations.
	* include/experimental/bits/simd_x86.h: New file. Simd x86
	specific implementations.
	* include/experimental/bits/simd_x86_conversions.h: New file.
	x86 specific conversion optimizations. The conversion patterns
	work around missing conversion patterns in the compiler and
	should be removed as soon as PR85048 is resolved.
	* testsuite/experimental/simd/standard_abi_usable.cc: New file.
	Test that all (not all fixed_size, though) standard simd and
	simd_mask types are usable.
	* testsuite/experimental/simd/standard_abi_usable_2.cc: New
	file. As above but with -ffast-math.
	* testsuite/libstdc++-dg/conformance.exp: Don't build simd tests
	from the standard test loop. Instead use
	check_vect_support_and_set_flags to build simd tests with the
	relevant machine flags.
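As a rough illustration of the vector_size approach described in the
commit message (a minimal sketch, not code from this patch), a class can
hold a GNU vector-extension object as its data member and forward
element-wise operators to it. The names float4, make, and sum are
hypothetical; only the [[gnu::vector_size(N)]] and [[gnu::always_inline]]
attributes come from the description above, and a GCC-compatible compiler
is assumed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical wrapper for illustration; none of these names come from the patch.
struct float4
{
  // Four floats stored in a single 16-byte GNU vector-extension object.
  typedef float storage_type [[gnu::vector_size(16)]];

  storage_type data;

  static constexpr std::size_t size()
  { return sizeof(storage_type) / sizeof(float); }

  // Read one element; vector objects support subscripting like arrays.
  float operator[](std::size_t i) const
  {
    assert(i < size());
    return data[i];
  }
};

// Build a float4 from four scalars (hypothetical helper).
inline float4 make(float x0, float x1, float x2, float x3)
{
  float4::storage_type v = {x0, x1, x2, x3};
  float4 r;
  r.data = v;
  return r;
}

// Element-wise addition. The compiler maps '+' on the vector type to the
// target's SIMD instructions where available, or to scalar code otherwise.
// always_inline asks the compiler to inline the body at every call site.
[[gnu::always_inline]] inline float4 operator+(float4 a, float4 b)
{
  float4 r;
  r.data = a.data + b.data;
  return r;
}

// Horizontal sum written as a plain loop, left to the compiler to optimize.
inline float sum(float4 v)
{
  float total = 0;
  for (std::size_t i = 0; i < float4::size(); ++i)
    total += v[i];
  return total;
}
```

For example, sum(make(1, 2, 3, 4) + make(10, 20, 30, 40)) evaluates to
110. Inlining every operation at the call site is presumably what lets
translation units built with different -m flags be linked together, as
the commit message notes, though this sketch does not demonstrate that.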
---
 .../doc/xml/manual/status_cxx2017.xml         |  216 +
 libstdc++-v3/include/Makefile.am              |   13 +
 libstdc++-v3/include/Makefile.in              |   13 +
 .../experimental/bits/numeric_traits.h        |  567 ++
 libstdc++-v3/include/experimental/bits/simd.h | 5051 ++++++++++++++++
 .../include/experimental/bits/simd_builtin.h  | 2949 ++++++++++
 .../experimental/bits/simd_converter.h        |  354 ++
 .../include/experimental/bits/simd_detail.h   |  306 +
 .../experimental/bits/simd_fixed_size.h       | 2066 +++++++
 .../include/experimental/bits/simd_math.h     | 1500 +++++
 .../include/experimental/bits/simd_neon.h     |  519 ++
 .../include/experimental/bits/simd_ppc.h      |  123 +
 .../include/experimental/bits/simd_scalar.h   |  772 +++
 .../include/experimental/bits/simd_x86.h      | 5169 +++++++++++++++++
 .../experimental/bits/simd_x86_conversions.h  | 2029 +++++++
 libstdc++-v3/include/experimental/simd        |   70 +
 .../experimental/simd/standard_abi_usable.cc  |   64 +
 .../simd/standard_abi_usable_2.cc             |    4 +
 .../testsuite/libstdc++-dg/conformance.exp    |   18 +-
 19 files changed, 21802 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/include/experimental/bits/numeric_traits.h
 create mode 100644 libstdc++-v3/include/experimental/bits/simd.h
 create mode 100644 libstdc++-v3/include/experimental/bits/simd_builtin.h
 create mode 100644 libstdc++-v3/include/experimental/bits/simd_converter.h
 create mode 100644 libstdc++-v3/include/experimental/bits/simd_detail.h
 create mode 100644 libstdc++-v3/include/experimental/bits/simd_fixed_size.h
 create mode 100644 libstdc++-v3/include/experimental/bits/simd_math.h
 create mode 100644 libstdc++-v3/include/experimental/bits/simd_neon.h
 create mode 100644 libstdc++-v3/include/experimental/bits/simd_ppc.h
 create mode 100644 libstdc++-v3/include/experimental/bits/simd_scalar.h
 create mode 100644 libstdc++-v3/include/experimental/bits/simd_x86.h
 create mode 100644 libstdc++-v3/include/experimental/bits/simd_x86_conversions.h
 create mode 100644 libstdc++-v3/include/experimental/simd
 create mode 100644 libstdc++-v3/testsuite/experimental/simd/standard_abi_usable.cc
 create mode 100644 libstdc++-v3/testsuite/experimental/simd/standard_abi_usable_2.cc

-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 std::experimental::simd              https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────

--nextPart3621190.fJnDbHsrEp
Content-Disposition: inline; filename="0001-Add-std-experimental-simd-from-the-Parallelism-TS-2.patch"
Content-Transfer-Encoding: quoted-printable
Content-Type: text/x-patch; charset="utf-8"; name="0001-Add-std-experimental-simd-from-the-Parallelism-TS-2.patch"

diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
index e6834b3607a..bc740f8e1ba 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
@@ -2869,6 +2869,17 @@ since C++14 and the implementation is complete.
 Library Fundamentals 2 TS

+
+ P0214R9
+ Data-Parallel Types
+ Y
+ Parallelism 2 TS
+
@@ -3014,6 +3025,211 @@ since C++14 and the implementation is complete.
 If !is_regular_file(p), an error is reported.

+ Parallelism 2 TS
+
+ 9.3 [parallel.simd.abi]
+ max_fixed_size<T> is 32, except when targeting
+ AVX512BW and sizeof(T) is 1.
+
+ When targeting 32-bit x86,
+ simd_abi::compatible<T> is an alias for
+ simd_abi::scalar.
+ When targeting 64-bit x86 (including x32) or Aarch64,
+ simd_abi::compatible<T> is an alias for
+ simd_abi::_VecBuiltin<16>,
+ unless T is long double, in which case it is
+ an alias for simd_abi::scalar.
+ When targeting ARM (but not Aarch64) with NEON support,
+ simd_abi::compatible<T> is an alias for
+ simd_abi::_VecBuiltin<16>,
+ unless sizeof(T) > 4, in which case it is
+ an alias for simd_abi::scalar. Additionally,
+ simd_abi::compatible<float> is an alias for
+ simd_abi::scalar unless compiling with
+ -ffast-math.
+
+ When targeting x86 (both 32-bit and 64-bit),
+ simd_abi::native<T> is an alias for one of
+ simd_abi::scalar,
+ simd_abi::_VecBuiltin<16>,
+ simd_abi::_VecBuiltin<32>, or
+ simd_abi::_VecBltnBtmsk<64>, depending on
+ T and the machine options the compiler was invoked with.
+
+ When targeting ARM/Aarch64 or POWER,
+ simd_abi::native<T> is an alias for
+ simd_abi::scalar or
+ simd_abi::_VecBuiltin<16>, depending on
+ T and the machine options the compiler was invoked with.
+
+ For any other targeted machine
+ simd_abi::compatible<T> and
+ simd_abi::native<T> are aliases for
+ simd_abi::scalar. (subject to change)
+
+ The extended ABI tag types defined in the
+ std::experimental::parallelism_v2::simd_abi namespace are:
+ simd_abi::_VecBuiltin<Bytes>, and
+ simd_abi::_VecBltnBtmsk<Bytes>.
+
+ simd_abi::deduce<T, N, Abis...>::type,
+ with N > 1 is an alias for an extended ABI tag, if a
+ supported extended ABI tag exists. Otherwise it is an alias for
+ simd_abi::fixed_size<N>. The
+ simd_abi::_VecBltnBtmsk ABI tag is preferred over
+ simd_abi::_VecBuiltin.
+
+ 9.4 [parallel.simd.traits]
+ memory_alignment<T, U>::value is
+ sizeof(U) * T::size() rounded up to the next power-of-two
+ value.
+
+ 9.6.1 [parallel.simd.overview]
+ On ARM, simd<T, _VecBuiltin<Bytes>>
+ is supported if __ARM_NEON is defined and
+ sizeof(T) <= 4. Additionally,
+ sizeof(T) == 8 with integral T is supported if
+ __ARM_ARCH >= 8, and double is supported if
+ __aarch64__ is defined.
+
+ On POWER, simd<T, _VecBuiltin<Bytes>>
+ is supported if __ALTIVEC__ is defined and sizeof(T)
+ < 8. Additionally, double is supported if
+ __VSX__ is defined, and any T with
+ sizeof(T) ≤ 8 is supported if __POWER8_VECTOR__
+ is defined.
+
+ On x86, given an extended ABI tag Abi,
+ simd<T, Abi> is supported according to the
+ following table:
+
+ Support for Extended ABI Tags
+
+ ABI tag Abi | value type T | values for Bytes | required machine option
+ _VecBuiltin<Bytes> | float | 8, 12, 16 | "-msse"
+ _VecBuiltin<Bytes> | float | 20, 24, 28, 32 | "-mavx"
+ _VecBuiltin<Bytes> | double | 16 | "-msse2"
+ _VecBuiltin<Bytes> | double | 24, 32 | "-mavx"
+ _VecBuiltin<Bytes> | integral types other than bool | Bytes ≤ 16 and Bytes divisible by sizeof(T) | "-msse2"
+ _VecBuiltin<Bytes> | integral types other than bool | 16 < Bytes ≤ 32 and Bytes divisible by sizeof(T) | "-mavx2"
+ _VecBuiltin<Bytes> and _VecBltnBtmsk<Bytes> | vectorizable types with sizeof(T) ≥ 4 | 32 < Bytes ≤ 64 and Bytes divisible by sizeof(T) | "-mavx512f"
+ _VecBuiltin<Bytes> and _VecBltnBtmsk<Bytes> | vectorizable types with sizeof(T) < 4 | 32 < Bytes ≤ 64 and Bytes divisible by sizeof(T) | "-mavx512bw"
+ _VecBltnBtmsk<Bytes> | vectorizable types with sizeof(T) ≥ 4 | Bytes ≤ 32 and Bytes divisible by sizeof(T) | "-mavx512vl"
+ _VecBltnBtmsk<Bytes> | vectorizable types with sizeof(T) < 4 | Bytes ≤ 32 and Bytes divisible by sizeof(T) | "-mavx512bw" and "-mavx512vl"
+

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 958dfea5a98..3ad24267cfd 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -746,6 +746,7 @@ experimental_headers = \
 	${experimental_srcdir}/ratio \
 	${experimental_srcdir}/regex \
 	${experimental_srcdir}/set \
+	${experimental_srcdir}/simd \
 	${experimental_srcdir}/socket \
 	${experimental_srcdir}/source_location \
 	${experimental_srcdir}/string \
@@ -765,7 +766,19 @@ experimental_bits_builddir = ./experimental/bits
 experimental_bits_headers = \
 	${experimental_bits_srcdir}/lfts_config.h \
 	${experimental_bits_srcdir}/net.h \
+	${experimental_bits_srcdir}/numeric_traits.h \
 	${experimental_bits_srcdir}/shared_ptr.h \
+	${experimental_bits_srcdir}/simd.h \
+	${experimental_bits_srcdir}/simd_builtin.h \
+	${experimental_bits_srcdir}/simd_converter.h \
+	${experimental_bits_srcdir}/simd_detail.h \
+	${experimental_bits_srcdir}/simd_fixed_size.h \
+	${experimental_bits_srcdir}/simd_math.h \
+	${experimental_bits_srcdir}/simd_neon.h \
+	${experimental_bits_srcdir}/simd_ppc.h \
+	${experimental_bits_srcdir}/simd_scalar.h \
+	${experimental_bits_srcdir}/simd_x86.h \
+	${experimental_bits_srcdir}/simd_x86_conversions.h \
 	${experimental_bits_srcdir}/string_view.tcc \
 	${experimental_bits_filesystem_headers}

diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in
index b3256a7835e..2692b8352be 100644
--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -1096,6 +1096,7 @@ experimental_headers = \
 	${experimental_srcdir}/ratio \
 	${experimental_srcdir}/regex \
 	${experimental_srcdir}/set \
+	${experimental_srcdir}/simd \
 	${experimental_srcdir}/socket \
 	${experimental_srcdir}/source_location \
 	${experimental_srcdir}/string \
@@ -1115,7 +1116,19 @@ experimental_bits_builddir = ./experimental/bits
 experimental_bits_headers = \
 	${experimental_bits_srcdir}/lfts_config.h \
 	${experimental_bits_srcdir}/net.h \
+	${experimental_bits_srcdir}/numeric_traits.h \
 	${experimental_bits_srcdir}/shared_ptr.h \
+	${experimental_bits_srcdir}/simd.h \
+	${experimental_bits_srcdir}/simd_builtin.h \
+	${experimental_bits_srcdir}/simd_converter.h \
+	${experimental_bits_srcdir}/simd_detail.h \
+	${experimental_bits_srcdir}/simd_fixed_size.h \
+	${experimental_bits_srcdir}/simd_math.h \
+	${experimental_bits_srcdir}/simd_neon.h \
+	${experimental_bits_srcdir}/simd_ppc.h \
+	${experimental_bits_srcdir}/simd_scalar.h \
+	${experimental_bits_srcdir}/simd_x86.h \
+	${experimental_bits_srcdir}/simd_x86_conversions.h \
 	${experimental_bits_srcdir}/string_view.tcc \
 	${experimental_bits_filesystem_headers}

diff --git a/libstdc++-v3/include/experimental/bits/numeric_traits.h b/libstdc++-v3/include/experimental/bits/numeric_traits.h
new file mode 100644
index 00000000000..1b60874b788
--- /dev/null
+++ b/libstdc++-v3/include/experimental/bits/numeric_traits.h
@@ -0,0 +1,567 @@
+// Definition of numeric_limits replacement traits P1841R1 -*- C++ -*-
+
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library. This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+// .
+
+#include
+
+namespace std {
+
+template