public inbox for libstdc++@gcc.gnu.org
From: "yaozhongxiao" <yaozhongxiao@linux.alibaba.com>
To: "libstdc++" <libstdc++@gcc.gnu.org>
Subject: [RFC]  libstc++: Implement gather and scatter
Date: Thu, 18 Feb 2021 21:12:14 +0800
Message-ID: <8bf771b0-3a95-4256-93c4-163ad71cea42.yaozhongxiao@linux.alibaba.com>

[-- Attachment #1: Type: text/plain, Size: 1417 bytes --]


Gather and scatter memory loads/stores are common in SIMD code,
and I need them for my workload, so I have drafted support for them in simd.
I am sending this draft as a request for comments and suggestions before
submitting it officially. Corrections and comments are very welcome, thanks.

Dr. Matthias Kretz, thank you very much for your work on simd;
I look forward to your suggestions.

--------------------------------------------------------------
 libstdc++: Implement gather and scatter
    Memory loads/stores in gather and scatter mode are common with simd.
    Implement them in simd via calls to the _S_gather and _S_scatter
    methods of its ABI implementation.

    libstdc++-v3/ChangeLog:

            * include/experimental/bits/simd.h: Add simd::gather and
            simd::scatter as public methods.
            * include/experimental/bits/simd_builtin.h: Add
            _SimdImplBuiltin::_S_gather and _SimdImplBuiltin::_S_scatter
            for the simd native ABI implementation.
            * include/experimental/bits/simd_fixed_size.h: Add
            _SimdImplFixedSize::_S_gather and _SimdImplFixedSize::_S_scatter
            for the simd fixed_size ABI implementation;
            add _SimdTuple::_M_tuple_at for tuple access.
            * include/experimental/bits/simd_scalar.h: Add
            _SimdImplScalar::_S_gather and _SimdImplScalar::_S_scatter
            for the simd scalar ABI implementation.
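
For reviewers, the intended per-lane semantics can be sketched in plain C++
without simd. This is only a scalar reference model under my own assumptions:
the names gather_ref and scatter_ref and the mem_size bounds check are
hypothetical and are not part of the patch or of libstdc++.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar reference model; not part of libstdc++.
// gather: out[i] = mem[idx[i]] for every lane i.
template <typename T, std::size_t N>
std::array<T, N>
gather_ref(const T* mem, std::size_t mem_size, const std::array<int, N>& idx)
{
  std::array<T, N> out{};
  for (std::size_t i = 0; i < N; ++i)
    {
      // Assumed precondition: every index addresses a valid element.
      assert(idx[i] >= 0 && static_cast<std::size_t>(idx[i]) < mem_size);
      out[i] = mem[idx[i]];
    }
  return out;
}

// scatter: mem[idx[i]] = v[i] for every lane i, written in increasing lane
// order, so the highest lane wins when two indices coincide.
template <typename T, std::size_t N>
void
scatter_ref(const std::array<T, N>& v, T* mem, std::size_t mem_size,
            const std::array<int, N>& idx)
{
  for (std::size_t i = 0; i < N; ++i)
    {
      assert(idx[i] >= 0 && static_cast<std::size_t>(idx[i]) < mem_size);
      mem[idx[i]] = v[i];
    }
}
```

In this model a repeated index is harmless for gather, while for scatter the
result depends on the lane order, which is one of the points where comments
would be helpful.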

[-- Attachment #2: patch.txt --]
[-- Type: application/octet-stream, Size: 10375 bytes --]

commit 178ded0bbcf126f7b347045992c9ef501a050bd9
Author: zhongxiao.yzx <zhongxiao.yzx@gmail.com>
Date:   Thu Feb 18 20:32:28 2021 +0800

    libstdc++: Implement gather and scatter
    
    Memory loads/stores in gather and scatter mode are common with simd.
    Implement them in simd via calls to the _S_gather and _S_scatter
    methods of its ABI implementation.
    
    libstdc++-v3/ChangeLog:
    
            * include/experimental/bits/simd.h: Add simd::gather and
            simd::scatter as public methods.
            * include/experimental/bits/simd_builtin.h: Add
            _SimdImplBuiltin::_S_gather and _SimdImplBuiltin::_S_scatter
            for the simd native ABI implementation.
            * include/experimental/bits/simd_fixed_size.h: Add
            _SimdImplFixedSize::_S_gather and _SimdImplFixedSize::_S_scatter
            for the simd fixed_size ABI implementation;
            add _SimdTuple::_M_tuple_at for tuple access.
            * include/experimental/bits/simd_scalar.h: Add
            _SimdImplScalar::_S_gather and _SimdImplScalar::_S_scatter
            for the simd scalar ABI implementation.

diff --git a/libstdc++-v3/include/experimental/bits/simd.h b/libstdc++-v3/include/experimental/bits/simd.h
index c452778832f..603817ff2b0 100644
--- a/libstdc++-v3/include/experimental/bits/simd.h
+++ b/libstdc++-v3/include/experimental/bits/simd.h
@@ -29,6 +29,7 @@
 
 #include "simd_detail.h"
 #include "numeric_traits.h"
+#include <array>
 #include <bit>
 #include <bitset>
 #ifdef _GLIBCXX_DEBUG_UB
@@ -4960,6 +4961,15 @@ template <typename _Tp, typename _Abi>
 	  _Impl::_S_load(_Flags::template _S_apply<simd>(__mem), _S_type_tag))
       {}
 
+    // gather constructor
+    template <typename _Up, typename _Flags>
+      _GLIBCXX_SIMD_ALWAYS_INLINE
+      simd(const _Up* __mem, const __int_for_sizeof_t<_Up>* __idx, _Flags)
+      : _M_data(
+      _Impl::_S_gather(_Flags::template _S_apply<simd>(__mem),
+			   __idx, _S_type_tag))
+      {}
+
     // loads [simd.load]
     template <typename _Up, typename _Flags>
       _GLIBCXX_SIMD_ALWAYS_INLINE void
@@ -4978,6 +4988,47 @@ template <typename _Tp, typename _Abi>
 			_S_type_tag);
       }
 
+    // gather [simd.gather]
+    template <typename _Up, typename _Flags>
+      _GLIBCXX_SIMD_ALWAYS_INLINE void
+      gather(const _Vectorizable<_Up>* __mem,
+	     const std::array<int, size()>& __idx, _Flags)
+      {
+	_M_data = static_cast<decltype(_M_data)>(
+	  _Impl::_S_gather(_Flags::template _S_apply<simd>(__mem), __idx.data(),
+			   _S_type_tag));
+      }
+
+    template <typename _Up, typename _Flags>
+      _GLIBCXX_SIMD_ALWAYS_INLINE void
+      gather(const _Vectorizable<_Up>* __mem,
+	     const __int_for_sizeof_t<_Up>* __idx, _Flags)
+      {
+	_M_data = static_cast<decltype(_M_data)>(
+	  _Impl::_S_gather(_Flags::template _S_apply<simd>(__mem), __idx,
+			   _S_type_tag));
+      }
+
+    // scatter [simd.scatter]
+    template <typename _Up, typename _Flags>
+      _GLIBCXX_SIMD_ALWAYS_INLINE void
+      scatter(_Vectorizable<_Up>* __mem,
+	      const std::array<int, size()>& __idx, _Flags) const
+      {
+	_Impl::_S_scatter(_M_data, _Flags::template _S_apply<simd>(__mem),
+			  __idx.data(), _S_type_tag);
+      }
+
+    // scatter [simd.scatter]
+    template <typename _Up, typename _Flags>
+      _GLIBCXX_SIMD_ALWAYS_INLINE void
+      scatter(_Vectorizable<_Up>* __mem, const __int_for_sizeof_t<_Up>* __idx,
+	      _Flags) const
+      {
+	_Impl::_S_scatter(_M_data, _Flags::template _S_apply<simd>(__mem),
+			  __idx, _S_type_tag);
+      }
+
     // scalar access
     _GLIBCXX_SIMD_ALWAYS_INLINE _GLIBCXX_SIMD_CONSTEXPR reference
     operator[](size_t __i)
diff --git a/libstdc++-v3/include/experimental/bits/simd_builtin.h b/libstdc++-v3/include/experimental/bits/simd_builtin.h
index 7f728a10488..69594e1006c 100644
--- a/libstdc++-v3/include/experimental/bits/simd_builtin.h
+++ b/libstdc++-v3/include/experimental/bits/simd_builtin.h
@@ -1428,6 +1428,54 @@ template <typename _Abi>
 	});
       }
 
+    // _S_gather {{{2
+    template <typename _Tp, typename _Up>
+      _GLIBCXX_SIMD_INTRINSIC static _SimdMember<_Tp>
+      _S_gather(const _Up* __mem, const __int_for_sizeof_t<_Up>* __idx,
+		_TypeTag<_Tp>) noexcept
+      {
+	constexpr size_t _Np = _S_size<_Tp>;
+	return __generate_vector<_Tp, _SimdMember<_Tp>::_S_full_size>([&](
+	  auto __i) constexpr {
+	  return static_cast<_Tp>(__i < _Np ? __mem[__idx[__i]] : 0);
+	});
+      }
+    // _S_gather
+    template <typename _Tp, typename _Up>
+      _GLIBCXX_SIMD_INTRINSIC static _SimdMember<_Tp>
+      _S_gather(const _Up* __mem, const _SimdMember<_Tp>& __idx,
+		_TypeTag<_Tp>) noexcept
+      {
+	constexpr size_t _Np = _S_size<_Tp>;
+	return __generate_vector<_Tp, _SimdMember<_Tp>::_S_full_size>([&](
+	  auto __i) constexpr {
+	  return static_cast<_Tp>(__i < _Np ? __mem[__idx[__i]] : 0);
+	});
+      } // }}}
+
+    // _S_scatter {{{2
+    template <typename _Tp, typename _Up>
+      _GLIBCXX_SIMD_INTRINSIC static void
+      _S_scatter(_SimdMember<_Tp> __v, _Up* __mem,
+		 const __int_for_sizeof_t<_Up>* __idx, _TypeTag<_Tp>) noexcept
+      {
+	constexpr size_t _Np = _S_size<_Tp>;
+	__execute_n_times<_Np>([&](auto __i) constexpr {
+	  __mem[__idx[__i]] = static_cast<_Up>(__v[__i]);
+	});
+      }
+    // _S_scatter
+    template <typename _Tp, typename _Up>
+      _GLIBCXX_SIMD_INTRINSIC static void
+      _S_scatter(_SimdMember<_Tp> __v, _Up* __mem,
+		 const _SimdMember<_Tp>& __idx, _TypeTag<_Tp>) noexcept
+      {
+	constexpr size_t _Np = _S_size<_Tp>;
+	__execute_n_times<_Np>([&](auto __i) constexpr {
+	  __mem[__idx[__i]] = static_cast<_Up>(__v[__i]);
+	});
+      } // }}}
+
     // _S_load {{{2
     template <typename _Tp, typename _Up>
       _GLIBCXX_SIMD_INTRINSIC static _SimdMember<_Tp>
@@ -2813,7 +2861,7 @@ template <typename _Abi>
 
     // smart_reference access {{{2
     template <typename _Tp, size_t _Np>
-      static constexpr void _S_set(_SimdWrapper<_Tp, _Np>& __k, int __i,
+      static constexpr void _S_set(_SimdWrapper<_Tp, _Np>& __k, size_t __i,
 				   bool __x) noexcept
       {
 	if constexpr (is_same_v<_Tp, bool>)
diff --git a/libstdc++-v3/include/experimental/bits/simd_fixed_size.h b/libstdc++-v3/include/experimental/bits/simd_fixed_size.h
index 2722055c899..befa32547cc 100644
--- a/libstdc++-v3/include/experimental/bits/simd_fixed_size.h
+++ b/libstdc++-v3/include/experimental/bits/simd_fixed_size.h
@@ -392,6 +392,15 @@ template <typename _Tp, typename _Abi0, typename... _Abis>
 	  return second.template _M_simd_at<_Np - 1>();
       }
 
+    template <size_t _Offset>
+      _GLIBCXX_SIMD_INTRINSIC constexpr auto _M_tuple_at() const
+      {
+	if constexpr (_Offset == 0)
+	  return first;
+	else
+	  return second.template _M_tuple_at<_Offset - simd_size_v<_Tp, _Abi0>>();
+      }
+
     template <size_t _Offset = 0, typename _Fp>
       _GLIBCXX_SIMD_INTRINSIC static constexpr _SimdTuple
       _S_generate(_Fp&& __gen, _SizeConstant<_Offset> = {})
@@ -1328,6 +1337,56 @@ template <int _Np>
 	});
       }
 
+    // _S_gather {{{2
+    template <typename _Tp, typename _Up>
+      static inline _SimdMember<_Tp>
+      _S_gather(const _Up* __mem, const __int_for_sizeof_t<_Up>* __idx,
+		_TypeTag<_Tp>) noexcept
+      {
+	return _SimdMember<_Tp>::_S_generate([&](auto __meta) {
+	  return __meta._S_gather(__mem, &__idx[__meta._S_offset],
+				  _TypeTag<_Tp>());
+	});
+      }
+
+    // _S_gather {{{2
+    template <typename _Tp, typename _Up>
+      static inline _SimdMember<_Tp>
+      _S_gather(const _Up* __mem, const _SimdMember<_Tp>& __idx,
+		_TypeTag<_Tp>) noexcept
+      {
+	return _SimdMember<_Tp>::_S_generate([&](auto __meta) {
+	  return __meta._S_gather(
+	    __mem, __idx.template _M_tuple_at<__meta._S_offset>(),
+	    _TypeTag<_Tp>());
+	});
+      }
+
+    // _S_scatter {{{2
+    template <typename _Tp, typename _Up>
+      static inline void
+      _S_scatter(const _SimdMember<_Tp>& __v, _Up* __mem,
+		 const __int_for_sizeof_t<_Up>* __idx, _TypeTag<_Tp>) noexcept
+      {
+	__for_each(__v, [&](auto __meta, auto __native) {
+	  __meta._S_scatter(__native, __mem, &__idx[__meta._S_offset],
+			    _TypeTag<_Tp>());
+	});
+      }
+
+    // _S_scatter {{{2
+    template <typename _Tp, typename _Up>
+      static inline void
+      _S_scatter(const _SimdMember<_Tp>& __v, _Up* __mem,
+		 const _SimdMember<_Tp>& __idx, _TypeTag<_Tp>) noexcept
+      {
+	__for_each(__v, __idx,
+		   [&](auto __meta, auto __v_tuple, auto __idx_tuple) {
+		     __meta._S_scatter(__v_tuple, __mem, __idx_tuple,
+				       _TypeTag<_Tp>());
+		   });
+      }
+
     // _S_load {{{2
     template <typename _Tp, typename _Up>
       static inline _SimdMember<_Tp> _S_load(const _Up* __mem,
diff --git a/libstdc++-v3/include/experimental/bits/simd_scalar.h b/libstdc++-v3/include/experimental/bits/simd_scalar.h
index 48e13f6c719..243672dda39 100644
--- a/libstdc++-v3/include/experimental/bits/simd_scalar.h
+++ b/libstdc++-v3/include/experimental/bits/simd_scalar.h
@@ -147,6 +147,42 @@ struct _SimdImplScalar
 							      _TypeTag<_Tp>)
     { return __gen(_SizeConstant<0>()); }
 
+  // _S_gather {{{2
+  template <typename _Tp, typename _Up>
+    _GLIBCXX_SIMD_INTRINSIC static _Tp
+    _S_gather(const _Up* __mem, const __int_for_sizeof_t<_Up>* __idx,
+	      _TypeTag<_Tp>) noexcept
+    {
+      return static_cast<_Tp>(__mem[__idx[0]]);
+    }
+
+  // _S_gather
+  template <typename _Tp, typename _Up>
+    _GLIBCXX_SIMD_INTRINSIC static _Tp
+    _S_gather(const _Up* __mem, const _Tp& __idx, _TypeTag<_Tp>) noexcept
+    {
+      return static_cast<_Tp>(__mem[__idx]);
+    } // }}}
+
+  // _S_scatter {{{2
+  template <typename _Tp, typename _Up>
+    _GLIBCXX_SIMD_INTRINSIC static void
+    _S_scatter(const _Tp& __v, _Up* __mem,
+	       [[maybe_unused]] const __int_for_sizeof_t<_Up>* __idx,
+	       _TypeTag<_Tp>) noexcept
+    {
+      __mem[__idx[0]] = static_cast<_Up>(__v);
+    }
+
+  // _S_scatter
+  template <typename _Tp, typename _Up>
+    _GLIBCXX_SIMD_INTRINSIC static void
+    _S_scatter(const _Tp& __v, _Up* __mem, [[maybe_unused]] const _Tp& __idx,
+	       _TypeTag<_Tp>) noexcept
+    {
+      __mem[__idx] = static_cast<_Up>(__v);
+    } // }}}
+
   // _S_load {{{2
   template <typename _Tp, typename _Up>
     _GLIBCXX_SIMD_INTRINSIC static _Tp _S_load(const _Up* __mem,

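The fixed_size overloads in the patch forward each chunk of the tuple to the
chunk's own ABI and advance the index pointer by the chunk's lane offset
(&__idx[__meta._S_offset]). A minimal plain C++ model of that dispatch might
look as follows. The name chunked_gather and the explicit chunk width W are
assumptions made for illustration; the real _SimdTuple chunk sizes are chosen
by the library, not by the caller.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of the fixed_size dispatch; not part of libstdc++.
// N lanes are processed in chunks of at most W lanes, and each chunk sees
// the index array advanced by its own lane offset.
template <std::size_t W, typename T, std::size_t N>
std::array<T, N>
chunked_gather(const T* mem, const std::array<int, N>& idx)
{
  static_assert(W > 0, "chunk width must be positive");
  assert(mem != nullptr);
  std::array<T, N> out{};
  for (std::size_t offset = 0; offset < N; offset += W)
    {
      // Per-chunk index window, analogous to &__idx[__meta._S_offset].
      const int* chunk_idx = idx.data() + offset;
      // The last chunk may hold fewer than W lanes.
      const std::size_t lanes = (N - offset < W) ? N - offset : W;
      for (std::size_t i = 0; i < lanes; ++i)
        out[offset + i] = mem[chunk_idx[i]];
    }
  return out;
}
```

The result should not depend on W: chunked_gather<2> and chunked_gather<4>
over the same indices are expected to produce the same lanes, which is the
property the fixed_size implementation has to preserve.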
Thread overview: 3+ messages
2021-02-18 13:12 yaozhongxiao [this message]
2021-02-19  9:12 ` Matthias Kretz
2021-02-19 12:56   ` yao zhongxiao
