From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yw1-x1131.google.com (mail-yw1-x1131.google.com [IPv6:2607:f8b0:4864:20::1131]) by sourceware.org (Postfix) with ESMTPS id B55C53858C35; Fri, 9 Feb 2024 14:28:39 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org B55C53858C35 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org B55C53858C35 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::1131 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1707488926; cv=none; b=E2DNJhMT1ApK+BpVZ4yLcnnVVM2LxptES4RunCPzyEIQ91bED4gJfCML4Hqz6mdGWOKy1YJTvk/q3NwNMWtT67B+ZGreYNisUhuBTkt4XK59ty4ZhK4iT/HlnRdZmQpRjGlexOCOVKWzMyri5gK9ohUSFESTNlYAM5U7/+JJiT8= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1707488926; c=relaxed/simple; bh=uJKhcLL5KqslpqJTHoilCsNvc2znrfAIybIP5UNlTI0=; h=DKIM-Signature:From:To:Subject:Date:Message-Id:MIME-Version; b=ZZFQOr2K0MxovGz/WSxCDy4wfUPeQMvyHPWT3EkbSBtNwhwXSGL+8CbRX6X3Ca9yzmzWgh3kZWqLaLuhGlemj1eGVcXDiksjntiOob5PPqDtxLamvXobrNZGUd0IzfJGbZehdM8Ut4Y1mDy7xZKUNMZoBUukCnYU0HH5qp80nY8= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-yw1-x1131.google.com with SMTP id 00721157ae682-60498c31743so10002407b3.3; Fri, 09 Feb 2024 06:28:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1707488918; x=1708093718; darn=gcc.gnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=bSRog1NxU/lOHT9nop8FGQZz67r4J+8iW86gkq46s6w=; b=Blg3zsHXsJ2yGGN+s01pwdyzf4v2amkZpGbvmsKDwDrWmwqYdH7kT7de8GecnNZFO3 zRHIlyR8yE+vvOXudQF41fl4hCyFVBX4Qs35sMZO8rvqgzEWiqyGCMblS1tUubGgAljN /F8x+Mq05RjK2A7GrIim/v+5J/CW7vk7fQb2YYOGqklITIPW+0aRVxmeNb0cKTcHoE1M CFZzYk10XgUgcti8gbc55XBzqzZO6Cw76DvILMjFDsL2sexvqTAZB2roXhPPecGl6uUA Jp3NHY8UfGhnSBf/VBNcuc4KR9HWW41hsjCsCilObW69i4oUalpLNaBkHkE8Jt74OMzo A48A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707488918; x=1708093718; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=bSRog1NxU/lOHT9nop8FGQZz67r4J+8iW86gkq46s6w=; b=A+GAu83TS10B/oB6ma58SeZMbPF6gLYXvHSljwn2WE0AmwJ1NLUy+7uLWh6gV5yU8n M7rkXU2YeRwNcix62Ky7QVKR/Ia82KWQYZT5YdNDLkfLFQTdBCjY7RYRikQ4qC41P1pk KP87WpivMk9Ku76/Ej0uFTGKZqMC8q4/XJ+gkbklOQ7mUlf0VtY5bgw99D/e05a174Ac dquL0wZqU4mI07mRA5CpGlKbTqR8mdAnu9sWGTr6W77J2b2XXAqPqoE1T7+t4bEXu/jE gXF6atECuZq8qk49v1fB+F0WjyVZd54SBoR8IyIVnaLshjU5ucl7NWGGZuSlYSPLwR9E jS1A== X-Gm-Message-State: AOJu0Yzm3QOtYA15cNUOmdhMAxPahuZDBdac1Y6B5gipDRn9V4B797K8 UhSdY1pQHg39d/4vwvIu9cxCHXbG+pUrXICMeoqeeibofd1kwDi67ey0/VgSi5O3OQ== X-Google-Smtp-Source: AGHT+IHvaHYNrFeZt/q2dQnuz+XfGcB3tiPK1n0P+b0CbBUfHCEB5NhENCNixCDXYE94eMExYFfjOg== X-Received: by 2002:a81:bb54:0:b0:5ff:7ba4:8897 with SMTP id a20-20020a81bb54000000b005ff7ba48897mr1637900ywl.38.1707488917033; Fri, 09 Feb 2024 06:28:37 -0800 (PST) X-Forwarded-Encrypted: i=1; AJvYcCVsGK46i9ws22lzjzYZs6PaPgvKad9FeEI0fbn6z2hNlG9Qkni3gy0TTEDYpTZWEFS/NZdk9CHIinXXIqfYlEs92oCNtLUE+YrwtkkIJ0Crn3D+U1TUtVGe8jDf7KWlo1ZjkrPoYdJQ3kyZcpDCkLuQ5qw= Received: from localhost.localdomain ([2600:381:600e:6cbc:7cb3:2434:6a99:fa81]) by smtp.gmail.com with ESMTPSA id w204-20020a817bd5000000b006042eeb20e1sm334021ywc.29.2024.02.09.06.28.32 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Fri, 09 Feb 2024 06:28:36 -0800 (PST) From: Srinivas Yadav Singanaboina X-Google-Original-From: Srinivas Yadav Singanaboina To: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org, m.kretz@gsi.de, richard.sandiford@arm.com Cc: Srinivas Yadav Singanaboina Subject: [PATCH v2] libstdc++: add ARM SVE support to std::experimental::simd Date: Fri, 9 Feb 2024 08:28:10 -0600 Message-Id: <20240209142810.97817-1-vasu.srinivasvasu.14@gmail.com> X-Mailer: git-send-email 2.37.1 (Apple Git-137.1) In-Reply-To: References: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-10.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM,FREEMAIL_REPLY,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi, Thanks for review @Richard!. I have tried to address most of your comments in this patch. The major updates include optimizing operator[] for masks, find_first_set and find_last_set. My further comments on some of the pointed out issues are a. regarding the coverage of types supported for sve : Yes, all the types are covered by mapping any type using simple two rules : the size of the type and signedness of it. b. all the operator overloads now use infix operators. For division and remainder, the inactive elements are padded with 1 to avoid undefined behavior. c. isnan is optimized to have only two cases i.e finite_math_only case or case where svcmpuo is used. d. _S_load for masks (bool) now uses svld1 by reinterpret_casting the pointer to uint8_t pointer and then performing a svunpklo. The same optimization is not done for masked_load and stores, as conversion of mask from a higher size type to lower size type is not optimal (sequential). e. _S_unary_minus could not use svneg_x because it does not support unsigned types. f. added specializations for reductions. g. find_first_set and find_last_set are optimized using svclastb. libstdc++-v3/ChangeLog: * include/Makefile.am: Add simd_sve.h. * include/Makefile.in: Add simd_sve.h. * include/experimental/bits/simd.h: Add new SveAbi. * include/experimental/bits/simd_builtin.h: Use __no_sve_deduce_t to support existing Neon Abi. * include/experimental/bits/simd_converter.h: Convert sequentially when sve is available. * include/experimental/bits/simd_detail.h: Define sve specific macro. * include/experimental/bits/simd_math.h: Fallback frexp to execute sequntially when sve is available, to handle fixed_size_simd return type that always uses sve. * include/experimental/simd: Include bits/simd_sve.h. * testsuite/experimental/simd/tests/bits/main.h: Enable testing for sve128, sve256, sve512. * include/experimental/bits/simd_sve.h: New file. Signed-off-by: Srinivas Yadav Singanaboina vasu.srinivasvasu.14@gmail.com --- libstdc++-v3/include/Makefile.am | 1 + libstdc++-v3/include/Makefile.in | 1 + libstdc++-v3/include/experimental/bits/simd.h | 131 +- .../include/experimental/bits/simd_builtin.h | 35 +- .../experimental/bits/simd_converter.h | 57 +- .../include/experimental/bits/simd_detail.h | 7 +- .../include/experimental/bits/simd_math.h | 14 +- .../include/experimental/bits/simd_sve.h | 1863 +++++++++++++++++ libstdc++-v3/include/experimental/simd | 3 + .../experimental/simd/tests/bits/main.h | 3 + 10 files changed, 2084 insertions(+), 31 deletions(-) create mode 100644 libstdc++-v3/include/experimental/bits/simd_sve.h diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am index 6209f390e08..1170cb047a6 100644 --- a/libstdc++-v3/include/Makefile.am +++ b/libstdc++-v3/include/Makefile.am @@ -826,6 +826,7 @@ experimental_bits_headers = \ ${experimental_bits_srcdir}/simd_neon.h \ ${experimental_bits_srcdir}/simd_ppc.h \ ${experimental_bits_srcdir}/simd_scalar.h \ + ${experimental_bits_srcdir}/simd_sve.h \ ${experimental_bits_srcdir}/simd_x86.h \ ${experimental_bits_srcdir}/simd_x86_conversions.h \ ${experimental_bits_srcdir}/string_view.tcc \ diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in index 596fa0d2390..bc44582a2da 100644 --- a/libstdc++-v3/include/Makefile.in +++ b/libstdc++-v3/include/Makefile.in @@ -1172,6 +1172,7 @@ experimental_bits_headers = \ ${experimental_bits_srcdir}/simd_neon.h \ ${experimental_bits_srcdir}/simd_ppc.h \ ${experimental_bits_srcdir}/simd_scalar.h \ + ${experimental_bits_srcdir}/simd_sve.h \ ${experimental_bits_srcdir}/simd_x86.h \ ${experimental_bits_srcdir}/simd_x86_conversions.h \ ${experimental_bits_srcdir}/string_view.tcc \ diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h index 90523ea57dc..d274cd740fe 100644 --- a/libstdc++-v3/include/experimental/bits/simd.h +++ b/libstdc++-v3/include/experimental/bits/simd.h @@ -39,12 +39,16 @@ #include #include #include +#include #if _GLIBCXX_SIMD_X86INTRIN #include #elif _GLIBCXX_SIMD_HAVE_NEON #include #endif +#if _GLIBCXX_SIMD_HAVE_SVE +#include +#endif /** @ingroup ts_simd * @{ @@ -83,6 +87,12 @@ using __m512d [[__gnu__::__vector_size__(64)]] = double; using __m512i [[__gnu__::__vector_size__(64)]] = long long; #endif +#if _GLIBCXX_SIMD_HAVE_SVE +constexpr inline int __sve_vectorized_size_bytes = __ARM_FEATURE_SVE_BITS / 8; +#else +constexpr inline int __sve_vectorized_size_bytes = 0; +#endif + namespace simd_abi { // simd_abi forward declarations {{{ // implementation details: @@ -108,6 +118,9 @@ template template struct _VecBltnBtmsk; +template + struct _SveAbi; + template using _VecN = _VecBuiltin; @@ -123,6 +136,9 @@ template template using _Neon = _VecBuiltin<_UsedBytes>; +template + using _Sve = _SveAbi<_UsedBytes, __sve_vectorized_size_bytes>; + // implementation-defined: using __sse = _Sse<>; using __avx = _Avx<>; @@ -130,6 +146,7 @@ using __avx512 = _Avx512<>; using __neon = _Neon<>; using __neon128 = _Neon<16>; using __neon64 = _Neon<8>; +using __sve = _Sve<>; // standard: template @@ -250,6 +267,8 @@ constexpr inline bool __support_neon_float = false; #endif +constexpr inline bool __have_sve = _GLIBCXX_SIMD_HAVE_SVE; + #ifdef _ARCH_PWR10 constexpr inline bool __have_power10vec = true; #else @@ -356,12 +375,13 @@ namespace __detail | (__have_avx512vnni << 27) | (__have_avx512vpopcntdq << 28) | (__have_avx512vp2intersect << 29); - else if constexpr (__have_neon) + else if constexpr (__have_neon || __have_sve) return __have_neon | (__have_neon_a32 << 1) | (__have_neon_a64 << 2) | (__have_neon_a64 << 2) - | (__support_neon_float << 3); + | (__support_neon_float << 3) + | (__have_sve << 4); else if constexpr (__have_power_vmx) return __have_power_vmx | (__have_power_vsx << 1) @@ -733,6 +753,16 @@ template return _Bytes <= 16 && is_same_v, _Abi>; } +// }}} +// __is_sve_abi {{{ +template + constexpr bool + __is_sve_abi() + { + constexpr auto _Bytes = __abi_bytes_v<_Abi>; + return _Bytes <= __sve_vectorized_size_bytes && is_same_v, _Abi>; + } + // }}} // __make_dependent_t {{{ template @@ -998,6 +1028,9 @@ template template using _SimdWrapper64 = _SimdWrapper<_Tp, 64 / sizeof(_Tp)>; +template + struct _SveSimdWrapper; + // }}} // __is_simd_wrapper {{{ template @@ -2858,6 +2891,8 @@ template constexpr size_t __bytes = __vectorized_sizeof<_Tp>(); if constexpr (__bytes == sizeof(_Tp)) return static_cast(nullptr); + else if constexpr (__have_sve) + return static_cast<_SveAbi<__sve_vectorized_size_bytes>*>(nullptr); else if constexpr (__have_avx512vl || (__have_avx512f && __bytes == 64)) return static_cast<_VecBltnBtmsk<__bytes>*>(nullptr); else @@ -2951,6 +2986,9 @@ template > template struct __deduce_impl; +template + struct __no_sve_deduce_impl; + namespace simd_abi { /** * @tparam _Tp The requested `value_type` for the elements. @@ -2965,6 +3003,12 @@ template template using deduce_t = typename deduce<_Tp, _Np, _Abis...>::type; + +template + struct __no_sve_deduce : __no_sve_deduce_impl<_Tp, _Np> {}; + +template + using __no_sve_deduce_t = typename __no_sve_deduce<_Tp, _Np, _Abis...>::type; } // namespace simd_abi // }}}2 @@ -2974,13 +3018,23 @@ template template struct rebind_simd<_Tp, simd<_Up, _Abi>, - void_t, _Abi>>> - { using type = simd<_Tp, simd_abi::deduce_t<_Tp, simd_size_v<_Up, _Abi>, _Abi>>; }; + void_t(), + simd_abi::__no_sve_deduce_t<_Tp, simd_size_v<_Up, _Abi>, _Abi>, + simd_abi::deduce_t<_Tp, simd_size_v<_Up, _Abi>, _Abi>>>> + { using type = simd<_Tp, std::conditional_t(), + simd_abi::__no_sve_deduce_t<_Tp, simd_size_v<_Up, _Abi>, _Abi>, + simd_abi::deduce_t<_Tp, simd_size_v<_Up, _Abi>, _Abi>>>; + }; template struct rebind_simd<_Tp, simd_mask<_Up, _Abi>, - void_t, _Abi>>> - { using type = simd_mask<_Tp, simd_abi::deduce_t<_Tp, simd_size_v<_Up, _Abi>, _Abi>>; }; + void_t(), + simd_abi::__no_sve_deduce_t<_Tp, simd_size_v<_Up, _Abi>, _Abi>, + simd_abi::deduce_t<_Tp, simd_size_v<_Up, _Abi>, _Abi>>>> + { using type = simd_mask<_Tp, std::conditional_t(), + simd_abi::__no_sve_deduce_t<_Tp, simd_size_v<_Up, _Abi>, _Abi>, + simd_abi::deduce_t<_Tp, simd_size_v<_Up, _Abi>, _Abi>>>; + }; template using rebind_simd_t = typename rebind_simd<_Tp, _V>::type; @@ -3243,7 +3297,7 @@ template else if constexpr (_Tp::size() == 1) return __x[0]; else if constexpr (sizeof(_Tp) == sizeof(__x) - && !__is_fixed_size_abi_v<_Ap>) + && !__is_fixed_size_abi_v<_Ap> && !__is_sve_abi<_Ap>()) return {__private_init, __vector_bitcast( _Ap::_S_masked(__data(__x))._M_data)}; @@ -4004,18 +4058,29 @@ template & __x) { using _Tp = typename _V::value_type; + + auto __gen_fallback = [&]() constexpr _GLIBCXX_SIMD_ALWAYS_INLINE_LAMBDA { + return __generate_from_n_evaluations<_Parts, array<_V, _Parts>>( + [&](auto __i) constexpr _GLIBCXX_SIMD_ALWAYS_INLINE_LAMBDA { + return _V([&](auto __j) constexpr _GLIBCXX_SIMD_ALWAYS_INLINE_LAMBDA + { return __x[__i * _V::size() + __j]; }); + }); + }; + if constexpr (_Parts == 1) { return {simd_cast<_V>(__x)}; } else if (__x._M_is_constprop()) { - return __generate_from_n_evaluations<_Parts, array<_V, _Parts>>( - [&](auto __i) constexpr _GLIBCXX_SIMD_ALWAYS_INLINE_LAMBDA { - return _V([&](auto __j) constexpr _GLIBCXX_SIMD_ALWAYS_INLINE_LAMBDA - { return __x[__i * _V::size() + __j]; }); - }); + return __gen_fallback(); } +#if _GLIBCXX_SIMD_HAVE_SVE + else if constexpr(__is_sve_abi<_Ap>) + { + return __gen_fallback(); + } +#endif else if constexpr ( __is_fixed_size_abi_v<_Ap> && (is_same_v @@ -4115,7 +4180,8 @@ template constexpr size_t _N0 = _SL::template _S_at<0>(); using _V = __deduced_simd<_Tp, _N0>; - if (__x._M_is_constprop()) + auto __gen_fallback = [&]() constexpr _GLIBCXX_SIMD_ALWAYS_INLINE_LAMBDA + { return __generate_from_n_evaluations( [&](auto __i) constexpr _GLIBCXX_SIMD_ALWAYS_INLINE_LAMBDA { using _Vi = __deduced_simd<_Tp, _SL::_S_at(__i)>; @@ -4124,6 +4190,14 @@ template return __x[__offset + __j]; }); }); + }; + + if (__x._M_is_constprop()) + __gen_fallback(); +#if _GLIBCXX_SIMD_HAVE_SVE + else if constexpr (__have_sve) + __gen_fallback(); +#endif else if constexpr (_Np == _N0) { static_assert(sizeof...(_Sizes) == 1); @@ -4510,8 +4584,10 @@ template